LLaVA-RLHF

Reward alignment system

Aligns large multimodal models (LMMs) with factually augmented reward models to improve performance and mitigate reward hacking in reinforcement learning from human feedback (RLHF)

Aligning LMMs with Factually Augmented RLHF

GitHub

319 stars
9 watching
25 forks
Language: Python
last commit: about 1 year ago
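The core idea above (augmenting a learned preference reward with factual grounding to discourage reward hacking) can be illustrated with a toy sketch. This is not the repository's actual implementation, which conditions the reward model on ground-truth image information; the function name, the set-of-claims representation, and the `penalty` parameter are all hypothetical, chosen only to show how a factuality term can be folded into a scalar reward:

```python
# Hypothetical sketch: combine a base preference reward with a factuality
# penalty. All names here are illustrative, not from the LLaVA-RLHF codebase.

def factually_augmented_reward(base_reward, response_facts, reference_facts,
                               penalty=1.0):
    """Penalize claims in the response that are not grounded in the reference.

    base_reward: scalar score from a learned preference reward model.
    response_facts: set of claims extracted from the model's response.
    reference_facts: set of claims grounded in the image/reference data.
    penalty: weight subtracted per ungrounded (hallucinated) claim.
    """
    hallucinated = response_facts - reference_facts  # claims with no grounding
    return base_reward - penalty * len(hallucinated)

# Usage: a response with one ungrounded claim ("a red ball") loses reward.
r = factually_augmented_reward(2.5, {"a dog", "a red ball"}, {"a dog"},
                               penalty=0.5)
# r == 2.0
```

A policy optimized against this augmented signal is discouraged from inflating its reward with confident but unsupported statements, which is the reward-hacking failure mode the project targets.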

Related projects:

rlhf-v/rlhf-v (233 stars): Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy.
ethanyanjiali/minchatgpt (213 stars): Demonstrates the effectiveness of reinforcement learning from human feedback (RLHF) in improving small language models such as GPT-2.
tristandeleu/pytorch-maml-rl (827 stars): A PyTorch replication of Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, applied to reinforcement learning tasks.
wisconsinaivision/vip-llava (294 stars): A system designed to enable large multimodal models to understand arbitrary visual prompts.
sjtu-marl/malib (497 stars): A framework for parallel, population-based reinforcement learning.
llava-vl/llava-interactive-demo (351 stars): An all-in-one demo for interactive image processing and generation.
llava-vl/llava-plus-codebase (704 stars): A platform for training and deploying large language and vision models that can use tools to perform tasks.
tatsu-lab/alpaca_farm (782 stars): A framework for simulating and evaluating reinforcement learning from human feedback methods.
matthiasplappert/keras-rl (7 stars): A Python library implementing state-of-the-art deep reinforcement learning algorithms for Keras and OpenAI Gym environments.
kaixhin/rainbow (1,585 stars): A Python implementation of a deep reinforcement learning agent combining multiple techniques for improved performance on Atari games.
salt-nlp/llavar (258 stars): An open-source project that enhances visual instruction tuning for text-rich image understanding by integrating GPT-4 models with multimodal datasets.
aidc-ai/ovis (517 stars): An architecture designed to align visual and textual embeddings in multimodal learning.
iffix/machin (401 stars): An open-source reinforcement learning library for PyTorch with simple, clear implementations of various algorithms.
yfzhang114/llava-align (71 stars): Debiasing techniques that reduce hallucinations in large vision-language models.
lhfowl/robbing_the_fed (23 stars): An implementation that lets an attacker recover user data directly from federated learning gradient updates by modifying the shared model architecture.