InternLM-XComposer
Vision-Language Model
A large vision language model that can understand and generate text from visual inputs, with capabilities for long-contextual input and output, high-resolution understanding, fine-grained video understanding, and multi-turn multi-image dialogue.
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
3k stars
43 watching
154 forks
Language: Python
last commit: about 1 month ago
Tags: chatgpt, foundation, gpt, gpt-4, instruction-tuning, language-model, large-language-model, large-vision-language-model, llm, mllm, multi-modality, multimodal, supervised-finetuning, vision-language-model, vision-transformer, visual-language-learning
Related projects:
Repository | Description | Stars |
---|---|---|
qwenlm/qwen-vl | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,045 |
qwenlm/qwen2-vl | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text. | 3,093 |
internlm/internlm | Large language models for chatbot and natural language understanding applications | 6,473 |
vision-cair/minigpt-4 | Enabling vision-language understanding by fine-tuning large language models on visual data. | 25,422 |
openbmb/minicpm-v | A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs. | 12,619 |
x-plug/mplug-owl | Develops large language models that can understand image and video content and generate human-like text about it | 2,321 |
pku-yuangroup/video-llava | Enables large language models to perform visual reasoning on images and videos simultaneously by learning a united visual representation before projection. | 2,990 |
sgl-project/sglang | A framework for serving large language models and vision models with efficient runtime and flexible interface. | 6,082 |
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720 |
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,232 |
opengvlab/internvl | A pioneering open-source alternative to commercial multimodal models with a family of large-scale language and vision models. | 6,014 |
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,653 |
google-research/big_vision | Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines. | 2,334 |
thudm/cogvlm | Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems. | 6,080 |
young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,409 |