InternLM-XComposer
Multimodal model
A comprehensive multimodal system for long-term streaming video and audio interactions with capabilities including text-image comprehension and composition
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
3k stars
44 watching
158 forks
Language: Python
Last commit: 2 months ago
Tags: chatgpt, foundation, gpt, gpt-4, instruction-tuning, language-model, large-language-model, large-vision-language-model, llm, mllm, multi-modality, multimodal, supervised-finetuning, vision-language-model, vision-transformer, visual-language-learning
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,179 |
| | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text. | 3,613 |
| | A collection of large language models designed to improve reasoning and tool use capabilities in chatbots. | 6,572 |
| | Enabling vision-language understanding by fine-tuning large language models on visual data. | 25,490 |
| | A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs. | 12,870 |
| | Develops large language models that can understand and generate human-like visual and video content | 2,365 |
| | A deep learning framework for generating videos from text inputs and visual features. | 3,071 |
| | A fast serving framework for large language models and vision language models. | 6,551 |
| | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
| | A system that uses large language and vision models to generate and process visual instructions | 20,683 |
| | Develops large language models capable of processing multiple data types and modalities | 6,394 |
| | A toolkit for optimizing and serving large language models | 4,854 |
| | Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines. | 2,439 |
| | Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems. | 6,182 |
| | A framework for training and serving large language models using JAX/Flax | 2,428 |