LLaVA

Visual Instruction System

A system that uses large language and vision models to generate and process visual instructions

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

GitHub

21k stars

159 watching

2k forks

Language: Python

last commit: about 1 year ago

chatbotchatgptfoundation-modelsgpt-4instruction-tuningllamallama-2llama2llavamulti-modalitymultimodalvision-language-modelvisual-language-learning

llava.hliu.cc

Related projects:

Repository	Description	Stars
llava-vl/llava-next	Develops large multimodal models for various computer vision tasks including image and video analysis	3,099
instruction-tuning-with-gpt-4/gpt-4-llm	This project generates instruction-following data using GPT-4 to fine-tune large language models for real-world tasks.	4,244
opengvlab/llama-adapter	An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy	5,775
pku-yuangroup/video-llava	A deep learning framework for generating videos from text inputs and visual features.	3,071
hiyouga/llama-factory	A tool for efficiently fine-tuning large language models across multiple architectures and methods.	36,219
dvlab-research/mgm	An open-source framework for training large language models with vision capabilities.	3,229
alpha-vllm/llama2-accessory	An open-source toolkit for pretraining and fine-tuning large language models	2,732
salt-nlp/llavar	An open-source project that enhances visual instruction tuning for text-rich image understanding by integrating GPT-4 models with multimodal datasets.	259
facico/chinese-vicuna	An instruction-following Chinese LLaMA-based model project aimed at training and fine-tuning models on specific hardware configurations for efficient deployment.	4,152
damo-nlp-sg/video-llama	An audio-visual language model designed to understand and respond to video content with improved instruction-following capabilities	2,842
luodian/otter	A multi-modal AI model developed for improved instruction-following and in-context learning, utilizing large-scale architectures and various training datasets.	3,570
qwenlm/qwen-vl	A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks	5,179
sgl-project/sglang	A fast serving framework for large language models and vision language models.	6,551
eleutherai/lm-evaluation-harness	Provides a unified framework to test generative language models on various evaluation tasks.	7,200
tloen/alpaca-lora	Tuning a large language model on consumer hardware using low-rank adaptation	18,710