InternLM-XComposer

Multimodal model

A comprehensive multimodal system for long-term streaming video and audio interaction, with capabilities that include text-image comprehension and composition.

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

GitHub

3k stars
44 watching
158 forks
Language: Python
Last commit: about 1 month ago
Topics: chatgpt, foundation, gpt, gpt-4, instruction-tuning, language-model, large-language-model, large-vision-language-model, llm, mllm, multi-modality, multimodal, supervised-finetuning, vision-language-model, vision-transformer, visual-language-learning

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| qwenlm/qwen-vl | A large vision-language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks. | 5,179 |
| qwenlm/qwen2-vl | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text. | 3,613 |
| internlm/internlm | A collection of large language models designed to improve reasoning and tool use capabilities in chatbots. | 6,572 |
| vision-cair/minigpt-4 | Enables vision-language understanding by fine-tuning large language models on visual data. | 25,490 |
| openbmb/minicpm-v | A multimodal language model designed to understand image, video, and text inputs and generate high-quality text outputs. | 12,870 |
| x-plug/mplug-owl | A family of multimodal large language models for understanding images and video. | 2,365 |
| pku-yuangroup/video-llava | A large vision-language model that learns a unified visual representation for understanding both images and videos. | 3,071 |
| sgl-project/sglang | A fast serving framework for large language models and vision-language models. | 6,551 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models. | 2,732 |
| haotian-liu/llava | A large multimodal model trained with visual instruction tuning for general-purpose visual and language understanding. | 20,683 |
| opengvlab/internvl | Develops large language models capable of processing multiple data types and modalities. | 6,394 |
| internlm/lmdeploy | A toolkit for optimizing and serving large language models. | 4,854 |
| google-research/big_vision | Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines. | 2,439 |
| thudm/cogvlm | Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems. | 6,182 |
| young-geng/easylm | A framework for training and serving large language models using JAX/Flax. | 2,428 |