InternLM-XComposer

Vision-Language Model

A large vision-language model that understands visual inputs and generates text from them, with support for long-context input and output, high-resolution image understanding, fine-grained video understanding, and multi-turn, multi-image dialogue.

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

GitHub

3k stars
43 watching
154 forks
Language: Python
last commit: about 1 month ago
Tags: chatgpt, foundation, gpt, gpt-4, instruction-tuning, language-model, large-language-model, large-vision-language-model, llm, mllm, multi-modality, multimodal, supervised-finetuning, vision-language-model, vision-transformer, visual-language-learning

Related projects:

Repository | Description | Stars
qwenlm/qwen-vl | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,045
qwenlm/qwen2-vl | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text | 3,093
internlm/internlm | Large language models for chatbot and natural language understanding applications | 6,473
vision-cair/minigpt-4 | Enables vision-language understanding by fine-tuning large language models on visual data | 25,422
openbmb/minicpm-v | A multimodal language model designed to understand image, video, and text inputs and generate high-quality text outputs | 12,619
x-plug/mplug-owl | Develops large language models that can understand and generate human-like visual and video content | 2,321
pku-yuangroup/video-llava | Enables large language models to perform visual reasoning on images and videos simultaneously by learning united visual representations before projection | 2,990
sgl-project/sglang | A framework for serving large language models and vision models with an efficient runtime and a flexible interface | 6,082
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,232
opengvlab/internvl | A pioneering open-source alternative to commercial multimodal models, with a family of large-scale language and vision models | 6,014
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,653
google-research/big_vision | Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines | 2,334
thudm/cogvlm | Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems | 6,080
young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,409