LLaVA-NeXT

Multimodal model developer

Develops large multimodal models for computer vision tasks, including image and video analysis.

GitHub

Stars: 3k
Watching: 33
Forks: 239
Language: Python
Last commit: about 1 month ago

Related projects:

| Repository | Description | Stars |
|---|---|---|
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,232 |
| pku-yuangroup/video-llava | Enables large language models to reason over images and videos simultaneously by learning united visual representations before projection | 2,990 |
| opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754 |
| damo-nlp-sg/video-llama | An audio-visual language model designed to understand and respond to video content with improved instruction-following capabilities | 2,802 |
| dvlab-research/mgm | An open-source framework for training large language models with vision capabilities | 3,211 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720 |
| llava-vl/llava-interactive-demo | An all-in-one demo for interactive image processing and generation | 351 |
| llava-vl/llava-plus-codebase | A platform for training and deploying large language and vision models that can use tools to perform tasks | 704 |
| scisharp/llamasharp | A C#/.NET library to efficiently run Large Language Models (LLMs) on local devices | 2,673 |
| wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 294 |
| eleutherai/lm-evaluation-harness | A unified framework for testing generative language models on various evaluation tasks | 6,970 |
| hiyouga/llama-factory | A unified platform for fine-tuning multiple large language models with various training approaches and methods | 34,436 |
| optimalscale/lmflow | A toolkit for fine-tuning large language models and running efficient inference | 8,273 |
| qwenlm/qwen2-vl | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text | 3,093 |
| nvidia/megatron-lm | A research framework for training large language models at scale using GPU-optimized techniques | 10,562 |