instructGOOSE
RLHF framework
A framework for training language models using human feedback and reinforcement learning
Implementation of Reinforcement Learning from Human Feedback (RLHF)
171 stars
5 watching
21 forks
Language: Jupyter Notebook
Last commit: almost 2 years ago
Topics: chatgpt, human-feedback, instructgpt, reinforcement-learning, rlhf
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A framework for simulating and evaluating reinforcement learning from human feedback methods | 786 |
| | A high-performance implementation of reinforcement learning training pipelines using JAX and PyTorch-like functionality | 755 |
| | A framework for implementing complex reinforcement learning algorithms with flexibility and ease of implementation | 306 |
| | Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy. | 245 |
| | An open-source reinforcement learning framework for autonomous driving tasks using the Carla-Simulator environment and Ray/RLlib libraries. | 35 |
| | This project demonstrates the effectiveness of reinforcement learning from human feedback (RLHF) in improving small language models like GPT-2. | 214 |
| | A collection of implementations of reinforcement learning algorithms in MATLAB | 61 |
| | A framework for parallel population-based reinforcement learning | 507 |
| | A flexible RL training framework designed for large language models | 427 |
| | Provides a framework and theoretical foundation for Federated Reinforcement Learning with Byzantine Resilience in distributed systems | 85 |
| | A Python implementation of a deep reinforcement learning algorithm combining multiple techniques for improved performance in Atari games | 1,591 |
| | A Python library implementing state-of-the-art deep reinforcement learning algorithms for Keras and OpenAI Gym environments. | 8 |
| | Provides a framework for using CARLA as a reinforcement learning environment | 95 |
| | An RL framework for building and training reinforcement learning models in Python | 266 |
| | Provides a unified toolkit for constructing, computing, and optimizing intrinsic reward modules in reinforcement learning | 373 |