VisCPM
Multimodal Models
A family of large multimodal models that support multimodal conversation and text-to-image generation in multiple languages
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series based on the CPM foundation model
1k stars
15 watching
92 forks
Language: Python
last commit: 8 months ago
Tags: diffusion-models, large-language-models, multimodal, transformers
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 73 |
| | A curated list of large machine learning models tracked over time | 341 |
| | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage. | 1,849 |
| | A live training platform for large-scale deep learning models, allowing community participation and collaboration in model development and deployment. | 511 |
| | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 15 |
| | A large multimodal language model designed to process and analyze video, image, text, and audio inputs in real-time. | 1,005 |
| | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
| | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 143 |
| | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks. | 296 |
| | An implementation of a multimodal language model with capabilities for comprehension and generation | 585 |
| | A framework to build versatile Multimodal Large Language Models with synergistic comprehension and creation capabilities | 402 |
| | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,568 |
| | An evaluation platform for comparing multi-modality models on visual question-answering tasks | 478 |
| | A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences. | 78 |
| | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,478 |