LLaVA-RLHF
Reward alignment system
Aligns large multimodal models using a factually augmented reward model to improve performance and mitigate reward hacking in reinforcement learning from human feedback
Aligning LMMs with Factually Augmented RLHF
328 stars
9 watching
24 forks
Language: Python
last commit: over 1 year ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy. | 245 |
| | Demonstrates the effectiveness of reinforcement learning from human feedback (RLHF) in improving small language models such as GPT-2. | 214 |
| | Replication of "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks" in PyTorch for reinforcement learning tasks. | 830 |
| | A system designed to enable large multimodal models to understand arbitrary visual prompts. | 302 |
| | A framework for parallel population-based reinforcement learning. | 507 |
| | An all-in-one demo for interactive image processing and generation. | 353 |
| | A platform for training and deploying large language and vision models that can use tools to perform tasks. | 717 |
| | A framework for simulating and evaluating reinforcement learning from human feedback methods. | 786 |
| | A Python library implementing state-of-the-art deep reinforcement learning algorithms for Keras and OpenAI Gym environments. | 8 |
| | A Python implementation of a deep reinforcement learning algorithm that combines multiple techniques for improved performance in Atari games. | 1,591 |
| | An open-source project that enhances visual instruction tuning for text-rich image understanding by integrating GPT-4 models with multimodal datasets. | 259 |
| | An MLLM architecture designed to align visual and textual embeddings through structural alignment. | 575 |
| | An open-source reinforcement learning library for PyTorch, providing simple and clear implementations of various algorithms. | 402 |
| | Debiasing techniques to minimize hallucinations in large visual language models. | 75 |
| | An implementation showing how an attacker can directly obtain user data from federated learning gradient updates by modifying the shared model architecture. | 23 |
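As background for the main project's approach: the factual augmentation in LLaVA-RLHF means the reward model scores a candidate response while also seeing ground-truth facts (such as reference image captions), so hallucinated answers score poorly and the policy has less room to exploit the reward. The sketch below illustrates that scoring pattern; `Sample`, `build_reward_input`, and `ToyRewardModel` are hypothetical names for illustration, not the repository's actual API, and the keyword-overlap scorer is a stand-in for a learned preference model.

```python
# Minimal sketch of factually augmented reward scoring (assumed names,
# not LLaVA-RLHF's real API): the scorer sees ground-truth facts alongside
# the response, so factually ungrounded responses receive low reward.
from dataclasses import dataclass


@dataclass
class Sample:
    question: str
    response: str
    facts: list[str]  # ground-truth context, e.g. human-written captions


def build_reward_input(s: Sample) -> str:
    """Prepend the factual context to the dialogue; this extra context is
    the 'augmentation' that lets a scorer check claims against facts."""
    fact_block = "\n".join(f"- {f}" for f in s.facts)
    return (f"Facts about the image:\n{fact_block}\n\n"
            f"Question: {s.question}\nResponse: {s.response}")


class ToyRewardModel:
    """Keyword-overlap stand-in for a learned preference model: it rewards
    responses that mention the supplied facts. Illustrative only."""

    def score(self, text: str) -> float:
        facts_part, _, response_part = text.partition("Response:")
        fact_words = {w.strip(".,:?").lower()
                      for w in facts_part.split() if len(w) > 3}
        resp_words = {w.strip(".,:?").lower() for w in response_part.split()}
        # Fraction of fact keywords the response actually grounds itself in.
        return len(fact_words & resp_words) / max(len(fact_words), 1)


if __name__ == "__main__":
    rm = ToyRewardModel()
    facts = ["a red apple on a wooden table"]
    grounded = Sample("What is on the table?",
                      "A red apple sits on the table.", facts)
    hallucinated = Sample("What is on the table?",
                          "Two cats are sleeping there.", facts)
    print(rm.score(build_reward_input(grounded)))      # higher score
    print(rm.score(build_reward_input(hallucinated)))  # lower score
```

In an actual RLHF loop this scalar would come from a trained reward model and serve as the PPO optimization target; the toy overlap metric only makes the factual-grounding contrast visible.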