InternLM-XComposer
Multimodal model
A comprehensive multimodal system for long-term streaming video and audio interactions, with text-image comprehension and composition capabilities.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
3k stars
44 watching
158 forks
Language: Python
last commit: about 1 month ago
Topics: chatgpt, foundation, gpt, gpt-4, instruction-tuning, language-model, large-language-model, large-vision-language-model, llm, mllm, multi-modality, multimodal, supervised-finetuning, vision-language-model, vision-transformer, visual-language-learning
Related projects:
Repository | Description | Stars |
---|---|---|
qwenlm/qwen-vl | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,179 |
qwenlm/qwen2-vl | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text. | 3,613 |
internlm/internlm | A collection of large language models designed to improve reasoning and tool use capabilities in chatbots. | 6,572 |
vision-cair/minigpt-4 | Enabling vision-language understanding by fine-tuning large language models on visual data. | 25,490 |
openbmb/minicpm-v | A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs. | 12,870 |
x-plug/mplug-owl | A family of multimodal large language models that understand image and video inputs and generate text responses. | 2,365 |
pku-yuangroup/video-llava | A large vision-language model that understands both images and videos through a unified visual representation. | 3,071 |
sgl-project/sglang | A fast serving framework for large language models and vision language models. | 6,551 |
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
haotian-liu/llava | A large language-and-vision assistant trained with visual instruction tuning. | 20,683 |
opengvlab/internvl | Develops large language models capable of processing multiple data types and modalities | 6,394 |
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,854 |
google-research/big_vision | Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines. | 2,439 |
thudm/cogvlm | Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems. | 6,182 |
young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,428 |