LLaVA-NeXT

Multimodal model developer

Develops large multimodal models for various computer vision tasks including image and video analysis

GitHub

3k stars
37 watching
266 forks
Language: Python
Last commit: 3 months ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| haotian-liu/llava | A large language-and-vision assistant built with visual instruction tuning | 20,683 |
| pku-yuangroup/video-llava | A vision-language model that learns a unified visual representation for understanding both images and videos | 3,071 |
| opengvlab/llama-adapter | An implementation of a parameter-efficient method for fine-tuning language models to follow instructions | 5,775 |
| damo-nlp-sg/video-llama | An audio-visual language model designed to understand and respond to video content with improved instruction-following capabilities | 2,842 |
| dvlab-research/mgm | An open-source framework for training large language models with vision capabilities | 3,229 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
| llava-vl/llava-interactive-demo | An all-in-one demo for interactive image processing and generation | 353 |
| llava-vl/llava-plus-codebase | A platform for training and deploying large language and vision models that can use tools to perform tasks | 717 |
| scisharp/llamasharp | An efficient C#/.NET library for running large language models (LLMs) on local devices | 2,750 |
| wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 302 |
| eleutherai/lm-evaluation-harness | A unified framework for testing generative language models on a range of evaluation tasks | 7,200 |
| hiyouga/llama-factory | A tool for efficiently fine-tuning large language models across multiple architectures and methods | 36,219 |
| optimalscale/lmflow | A toolkit for fine-tuning and running inference on large machine learning models | 8,312 |
| qwenlm/qwen2-vl | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text | 3,613 |
| nvidia/megatron-lm | A framework for training large language models using scalable and optimized GPU techniques | 10,804 |