InternLM-XComposer

Multimodal model

A comprehensive multimodal system for long-term streaming video and audio interactions, with capabilities including text-image comprehension and composition.

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

GitHub

3k stars

44 watching

158 forks

Language: Python

Last commit: over 1 year ago

Topics: chatgpt, foundation, gpt, gpt-4, instruction-tuning, language-model, large-language-model, large-vision-language-model, llm, mllm, multi-modality, multimodal, supervised-finetuning, vision-language-model, vision-transformer, visual-language-learning

Related projects:

Repository	Description	Stars
qwenlm/qwen-vl	A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks	5,179
qwenlm/qwen2-vl	A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text.	3,613
internlm/internlm	A collection of large language models designed to improve reasoning and tool use capabilities in chatbots.	6,572
vision-cair/minigpt-4	Enabling vision-language understanding by fine-tuning large language models on visual data.	25,490
openbmb/minicpm-v	A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs.	12,870
x-plug/mplug-owl	A family of multimodal large language models that understand images and videos and generate text responses.	2,365
pku-yuangroup/video-llava	A vision-language model that learns a unified visual representation for understanding both images and videos.	3,071
sgl-project/sglang	A fast serving framework for large language models and vision language models.	6,551
alpha-vllm/llama2-accessory	An open-source toolkit for pretraining and fine-tuning large language models	2,732
haotian-liu/llava	A large language-and-vision assistant trained with visual instruction tuning.	20,683
opengvlab/internvl	Develops large language models capable of processing multiple data types and modalities	6,394
internlm/lmdeploy	A toolkit for optimizing and serving large language models	4,854
google-research/big_vision	Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines.	2,439
thudm/cogvlm	Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems.	6,182
young-geng/easylm	A framework for training and serving large language models using JAX/Flax	2,428