LLaVA

Visual Instruction System

A large multimodal model that connects a vision encoder to a large language model, trained on machine-generated visual instruction-following data

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
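The core LLaVA recipe connects a frozen vision encoder to a language model through a small trainable projection, so that image patch features become "visual tokens" the LLM can attend to alongside the text instruction. Below is a minimal NumPy sketch of that data flow; all dimensions and the random stand-in features are illustrative, not the real model's sizes (LLaVA uses CLIP ViT-L/14 features and a LLaMA/Vicuna backbone).

```python
import numpy as np

rng = np.random.default_rng(0)

n_patches, d_vision = 16, 64   # vision encoder output: patch tokens (illustrative)
d_model = 128                  # LLM embedding size (illustrative)

# Frozen vision encoder output for one image (random stand-in features).
image_features = rng.standard_normal((n_patches, d_vision))

# Trainable projection maps vision features into the LLM token space;
# LLaVA-1.0 uses a single linear layer, LLaVA-1.5 a two-layer MLP.
W_proj = rng.standard_normal((d_vision, d_model)) * 0.02
visual_tokens = image_features @ W_proj        # (n_patches, d_model)

# Embedded text instruction (random stand-in for the tokenized prompt).
text_tokens = rng.standard_normal((8, d_model))

# The LLM consumes the visual tokens prepended to the text tokens.
llm_input = np.concatenate([visual_tokens, text_tokens], axis=0)
print(llm_input.shape)  # (24, 128)
```

Visual instruction tuning then trains the projection (and optionally the LLM) on image/instruction/response triples while keeping the vision encoder frozen.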

GitHub

20k stars
159 watching
2k forks
Language: Python
last commit: 4 months ago
Topics: chatbot, chatgpt, foundation-models, gpt-4, instruction-tuning, llama, llama-2, llama2, llava, multi-modality, multimodal, vision-language-model, visual-language-learning

Related projects:

llava-vl/llava-next — Develops large multimodal models for computer vision tasks, including image and video analysis. (3,005 stars)
instruction-tuning-with-gpt-4/gpt-4-llm — Generates instruction-following data with GPT-4 to fine-tune large language models for real-world tasks. (4,224 stars)
opengvlab/llama-adapter — An efficient method for fine-tuning language models to follow instructions. (5,763 stars)
pku-yuangroup/video-llava — Enables large language models to reason over images and videos simultaneously by learning a unified visual representation before projection. (3,040 stars)
hiyouga/llama-factory — A tool for efficiently fine-tuning large language models across many architectures and methods. (35,410 stars)
dvlab-research/mgm — An open-source framework for training large language models with vision capabilities. (3,217 stars)
alpha-vllm/llama2-accessory — An open-source toolkit for pretraining and fine-tuning large language models. (2,725 stars)
salt-nlp/llavar — Enhances visual instruction tuning for text-rich image understanding by combining GPT-4-generated data with multimodal datasets. (259 stars)
facico/chinese-vicuna — An instruction-following Chinese LLaMA-based model, trained and fine-tuned to run on modest hardware for efficient deployment. (4,146 stars)
damo-nlp-sg/video-llama — An audio-visual language model designed to understand and respond to video content, with improved instruction following. (2,826 stars)
luodian/otter — A multimodal model focused on improved instruction following and in-context learning, built on large-scale architectures and diverse training datasets. (3,564 stars)
qwenlm/qwen-vl — A large vision-language model with strong image reasoning and text recognition, suitable for a range of multimodal tasks. (5,118 stars)
sgl-project/sglang — A fast serving framework for large language models and vision-language models. (6,323 stars)
eleutherai/lm-evaluation-harness — A unified framework for evaluating generative language models across many tasks. (7,087 stars)
tloen/alpaca-lora — Fine-tunes a large language model on consumer hardware using low-rank adaptation (LoRA). (18,682 stars)
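Several of the projects above (alpaca-lora, llama-factory) rely on low-rank adaptation (LoRA) to make fine-tuning feasible on modest hardware: instead of updating a full weight matrix W, they train a low-rank update B @ A with far fewer parameters. A minimal NumPy sketch of the idea, with all sizes chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, rank = 64, 64, 4                 # illustrative sizes

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection,
                                              # zero-init so W + B@A == W at start

x = rng.standard_normal(d_in)
y_base = W @ x
y_lora = (W + B @ A) @ x                      # adapted forward pass

# Trainable parameters: rank*(d_in + d_out) for LoRA vs d_in*d_out for full tuning.
full_params, lora_params = d_out * d_in, rank * (d_in + d_out)
print(full_params, lora_params)  # 4096 512
```

Because B is zero-initialized, the adapted model is identical to the base model before training, and only the small A and B matrices need gradients and optimizer state.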