LLaVA-RLHF
Reward alignment system
Aligns large multimodal models using reward models augmented with factual information, to improve performance and mitigate reward hacking in reinforcement learning from human feedback (RLHF)
Aligning LMMs with Factually Augmented RLHF
328 stars
9 watching
24 forks
Language: Python
last commit: about 1 year ago

Related projects:
Repository | Description | Stars |
---|---|---|
rlhf-v/rlhf-v | Aligns the behavior of multimodal large language models through fine-grained correctional human feedback to improve trustworthiness and accuracy. | 245 |
ethanyanjiali/minchatgpt | This project demonstrates the effectiveness of reinforcement learning from human feedback (RLHF) in improving small language models like GPT-2. | 214 |
tristandeleu/pytorch-maml-rl | Replication of Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks in PyTorch for reinforcement learning tasks | 830 |
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 302 |
sjtu-marl/malib | A framework for parallel population-based reinforcement learning | 507 |
llava-vl/llava-interactive-demo | An all-in-one demo for interactive image processing and generation | 353 |
llava-vl/llava-plus-codebase | A platform for training and deploying large language and vision models that can use tools to perform tasks | 717 |
tatsu-lab/alpaca_farm | A framework for simulating and evaluating reinforcement learning from human feedback methods | 786 |
matthiasplappert/keras-rl | A Python library implementing state-of-the-art deep reinforcement learning algorithms for Keras and OpenAI Gym environments. | 8 |
kaixhin/rainbow | A Python implementation of a deep reinforcement learning algorithm combining multiple techniques for improved performance in Atari games | 1,591 |
salt-nlp/llavar | An open-source project that enhances visual instruction tuning for text-rich image understanding by integrating GPT-4 models with multimodal datasets. | 259 |
aidc-ai/ovis | An MLLM architecture designed to align visual and textual embeddings through structural alignment | 575 |
iffix/machin | An open-source reinforcement learning library for PyTorch, providing a simple and clear implementation of various algorithms. | 402 |
yfzhang114/llava-align | Debiasing techniques to minimize hallucinations in large visual language models | 75 |
lhfowl/robbing_the_fed | This implementation allows an attacker to directly obtain user data from federated learning gradient updates by modifying the shared model architecture. | 23 |