LLaVA
Visual Instruction System
A large multimodal model that connects a vision encoder with a large language model and is trained end-to-end on machine-generated visual instruction-following data
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
20k stars
159 watching
2k forks
Language: Python
last commit: 3 months ago

Topics: chatbot, chatgpt, foundation-models, gpt-4, instruction-tuning, llama, llama-2, llama2, llava, multi-modality, multimodal, vision-language-model, visual-language-learning
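As a rough illustration of how an instruction-tuned vision-language model like this is queried, the sketch below assembles a single-turn conversation prompt in the LLaVA-1.5 style. The `<image>` placeholder and the `USER:`/`ASSISTANT:` role labels are assumptions based on common usage; the exact template varies across LLaVA versions and checkpoints.

```python
def build_llava_prompt(question: str) -> str:
    """Build a single-turn LLaVA-1.5-style prompt.

    The "<image>" token marks where the processor splices in image
    features; "USER:"/"ASSISTANT:" are the role labels the model is
    typically trained on (template is an assumption and differs
    between LLaVA versions).
    """
    return f"USER: <image>\n{question} ASSISTANT:"


prompt = build_llava_prompt("What is shown in this image?")
print(prompt)
```

At inference time a processor pairs a prompt like this with the raw image, and the model's reply is generated after the trailing `ASSISTANT:` marker.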
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| llava-vl/llava-next | Develops large multimodal models for computer vision tasks, including image and video analysis | 2,872 |
| instruction-tuning-with-gpt-4/gpt-4-llm | Generates instruction-following data with GPT-4 for fine-tuning large language models on real-world tasks | 4,210 |
| opengvlab/llama-adapter | Implements an efficient method for fine-tuning language models to follow instructions | 5,754 |
| pku-yuangroup/video-llava | Enables large language models to reason over images and videos simultaneously by learning a unified visual representation before projection | 2,990 |
| hiyouga/llama-factory | A unified platform for fine-tuning many large language models with a variety of training approaches | 34,436 |
| dvlab-research/mgm | An open-source framework for training large language models with vision capabilities | 3,211 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,722 |
| salt-nlp/llavar | Enhances visual instruction tuning for text-rich image understanding using GPT-4-generated multimodal data | 258 |
| facico/chinese-vicuna | An instruction-following Chinese LLaMA-based model, trained and fine-tuned for efficient deployment on modest hardware | 4,142 |
| damo-nlp-sg/video-llama | An audio-visual language model that understands and responds to video content with improved instruction following | 2,802 |
| luodian/otter | A multimodal model built for improved instruction following and in-context learning, using large-scale architectures and diverse training datasets | 3,563 |
| qwenlm/qwen-vl | A large vision-language model with improved image reasoning and text recognition, suited to a range of multimodal tasks | 5,079 |
| sgl-project/sglang | A framework for serving large language and vision models with an efficient runtime and a flexible interface | 6,082 |
| eleutherai/lm-evaluation-harness | A unified framework for testing generative language models on a wide range of evaluation tasks | 7,028 |
| tloen/alpaca-lora | Tunes a large language model on consumer hardware using low-rank adaptation (LoRA) | 18,651 |