minChatGPT

Model alignment

This project demonstrates the effectiveness of reinforcement learning from human feedback (RLHF) in improving small language models like GPT-2.

A minimal example of aligning language models with RLHF, similar to ChatGPT.
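RLHF pipelines of the ChatGPT/InstructGPT kind usually train a reward model on pairs of responses ranked by humans, then optimize the policy against that reward with a KL penalty that keeps it close to the supervised starting model. The sketch below illustrates those two scoring steps in plain Python. It is not taken from minChatGPT: the function names, the use of scalar (single-response) log-probabilities, and the default `beta=0.1` are illustrative assumptions. Real implementations work on batched tensors and typically apply the penalty per token.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise (Bradley-Terry style) reward-model loss.

    Computes -log(sigmoid(r_chosen - r_rejected)): small when the reward
    model scores the human-preferred response well above the rejected one.
    """
    margin = r_chosen - r_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch on the sign so exp()
    # never receives a large positive argument (avoids overflow).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def kl_shaped_reward(
    reward: float,
    logprob_policy: float,
    logprob_ref: float,
    beta: float = 0.1,  # hypothetical penalty strength
) -> float:
    """Reward used in the RL step: reward-model score minus a KL penalty.

    (logprob_policy - logprob_ref) is a sample estimate of the KL divergence
    between the current policy and the frozen reference (SFT) model.
    """
    return reward - beta * (logprob_policy - logprob_ref)
```

For example, `pairwise_reward_loss(0.0, 0.0)` equals log 2, and the loss shrinks as the chosen response's score pulls ahead of the rejected one; `kl_shaped_reward` lowers the reward whenever the policy assigns its sample more probability than the reference model does.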


214 stars

5 watching

28 forks

Language: Python

last commit: over 2 years ago

Related projects:

Repository	Description	Stars
rlhf-v/rlhf-v	Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy	245
pku-yuangroup/languagebind	Extends pretrained models to handle multiple modalities by aligning language and video representations	751
llava-rlhf/llava-rlhf	Aligns large multimodal models with factually enhanced reward functions to improve performance and mitigate reward hacking in reinforcement learning	328
openai/lm-human-preferences	Training methods and tools for fine-tuning language models using human preferences	1,240
pku-alignment/align-anything	Aligns large multimodal models with human intentions and values using various algorithms and fine-tuning methods	270
jshilong/gpt4roi	Trains and deploys large language models for computer vision tasks using region-of-interest inputs	517
tristandeleu/pytorch-maml-rl	A PyTorch replication of Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (MAML) for reinforcement learning tasks	830
brightmart/xlnet_zh	Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks	230
xrsrke/instructgoose	A framework for training language models using human feedback and reinforcement learning	171
minqi/learning-to-communicate-pytorch	Implements a PyTorch-based framework for learning discrete communication protocols in multi-agent reinforcement learning environments	349
ymcui/macbert	Improves pre-trained Chinese language models by incorporating a correction task that reduces the discrepancy between pre-training and downstream fine-tuning	646
sjtu-marl/malib	A framework for parallel population-based reinforcement learning	507
guopengf/auto-fedrl	A reinforcement learning-based framework for optimizing hyperparameters in distributed machine learning environments	15
wangrongsheng/ivygpt	Develops large language models to support medical diagnoses and provide helpful suggestions	59
x-plug/cvalues	Evaluates and aligns the values of Chinese large language models with safety and responsibility standards	481