 minChatGPT
Model alignment

A minimal example of aligning language models with RLHF, similar to ChatGPT. This project demonstrates the effectiveness of reinforcement learning from human feedback (RLHF) in improving small language models such as GPT-2.
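As a hedged illustration (not code from this repository), the RLHF recipe the project demonstrates — start from a pretrained policy, fit a reward model from human preference pairs, then update the policy toward high-reward outputs — can be sketched on a toy example; the responses, scores, and update rule below are invented for illustration:

```python
# Toy sketch of the three RLHF stages on a "model" that just
# scores canned responses. Not the actual minChatGPT pipeline,
# which uses GPT-2 and PPO; this only shows the shape of the idea.

# Stage 1: a "pretrained policy" -- here, a score per candidate reply.
policy = {"helpful reply": 0.0, "rude reply": 0.0, "empty reply": 0.0}

# Stage 2: fit a reward model from human preference pairs
# (each pair is (chosen, rejected)).
preferences = [
    ("helpful reply", "rude reply"),
    ("helpful reply", "empty reply"),
    ("empty reply", "rude reply"),
]
reward = {resp: 0.0 for resp in policy}
for chosen, rejected in preferences:
    # crude win/loss update standing in for reward-model training
    reward[chosen] += 1.0
    reward[rejected] -= 1.0

# Stage 3: "RL" step -- nudge the policy toward high-reward responses
# (a stand-in for the PPO update against the reward model).
learning_rate = 0.5
for resp in policy:
    policy[resp] += learning_rate * reward[resp]

best = max(policy, key=policy.get)
print(best)  # the policy now prefers the human-preferred response
```

In the real pipeline each stage is a neural network (the policy is the language model, the reward model scores whole responses, and PPO replaces the additive update), but the data flow is the same.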
214 stars · 5 watching · 28 forks

Language: Python
Last commit: about 2 years ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
|  | Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy | 245 |
|  | Extends pretrained models to multiple modalities by aligning language and video representations | 751 |
|  | Aligns large multimodal models with factually enhanced reward functions to improve performance and mitigate reward hacking in reinforcement learning | 328 |
|  | Training methods and tools for fine-tuning language models using human preferences | 1,240 |
|  | Aligns large multimodal models with human intentions and values using various algorithms and fine-tuning methods | 270 |
|  | Trains and deploys large language models on computer vision tasks using region-of-interest inputs | 517 |
|  | Replication of Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks in PyTorch for reinforcement learning tasks | 830 |
|  | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230 |
|  | A framework for training language models using human feedback and reinforcement learning | 171 |
|  | A PyTorch-based framework for learning discrete communication protocols in multi-agent reinforcement learning environments | 349 |
|  | Improves pre-trained Chinese language models by incorporating a correction task to alleviate inconsistencies with downstream tasks | 646 |
|  | A framework for parallel population-based reinforcement learning | 507 |
|  | A reinforcement learning-based framework for optimizing hyperparameters in distributed machine learning environments | 15 |
|  | Develops large language models to support medical diagnosis and provide helpful suggestions | 59 |
|  | Evaluates and aligns the values of Chinese large language models with safety and responsibility standards | 481 |