LLaVA-NeXT
Multimodal model developer
Develops large multimodal models for various computer vision tasks including image and video analysis
3k stars
33 watching
239 forks
Language: Python
Last commit: about 1 month ago
Related projects:
Repository | Description | Stars |
---|---|---|
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,232 |
pku-yuangroup/video-llava | Enables large language models to perform visual reasoning on images and videos simultaneously by learning united visual representations before projection. | 2,990 |
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754 |
damo-nlp-sg/video-llama | An audio-visual language model designed to understand and respond to video content with improved instruction-following capabilities | 2,802 |
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities. | 3,211 |
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720 |
llava-vl/llava-interactive-demo | An all-in-one demo for interactive image processing and generation | 351 |
llava-vl/llava-plus-codebase | A platform for training and deploying large language and vision models that can use tools to perform tasks | 704 |
scisharp/llamasharp | A C#/.NET library to efficiently run Large Language Models (LLMs) on local devices | 2,673 |
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 294 |
eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks. | 6,970 |
hiyouga/llama-factory | A unified platform for fine-tuning multiple large language models with various training approaches and methods | 34,436 |
optimalscale/lmflow | A toolkit for fine-tuning large language models and running efficient inference | 8,273 |
qwenlm/qwen2-vl | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text. | 3,093 |
nvidia/megatron-lm | A research framework for training large language models at scale using GPU-optimized techniques. | 10,562 |