minChatGPT
Model alignment
This project demonstrates the effectiveness of reinforcement learning from human feedback (RLHF) in improving small language models like GPT-2.
A minimal example of aligning language models with RLHF, similar to ChatGPT
213 stars
5 watching
28 forks
Language: Python
Last commit: about 1 year ago

Related projects:

Repository | Description | Stars |
---|---|---|
rlhf-v/rlhf-v | Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy. | 233 |
pku-yuangroup/languagebind | Extends pretrained models to handle multiple modalities by aligning language and video representations | 723 |
llava-rlhf/llava-rlhf | Aligns large multimodal models with factually enhanced reward functions to improve performance and mitigate reward hacking in reinforcement learning | 319 |
openai/lm-human-preferences | Training methods and tools for fine-tuning language models using human preferences | 1,229 |
pku-alignment/align-anything | Aligns large models with human values and intentions across various modalities. | 244 |
jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 506 |
tristandeleu/pytorch-maml-rl | Replication of Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks in PyTorch for reinforcement learning tasks | 827 |
brightmart/xlnet_zh | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230 |
xrsrke/instructgoose | A framework for training language models using human feedback and reinforcement learning | 169 |
minqi/learning-to-communicate-pytorch | Implements a PyTorch-based framework for learning discrete communication protocols in multi-agent reinforcement learning environments. | 346 |
ymcui/macbert | Improves pre-trained Chinese language models by incorporating a correction task to alleviate inconsistency issues with downstream tasks | 645 |
sjtu-marl/malib | A framework for parallel population-based reinforcement learning | 497 |
guopengf/auto-fedrl | A reinforcement learning-based framework for optimizing hyperparameters in distributed machine learning environments. | 15 |
wangrongsheng/ivygpt | Develops large language models to support medical diagnoses and provide helpful suggestions | 59 |
x-plug/cvalues | Evaluates and aligns the values of Chinese large language models with safety and responsibility standards | 477 |