VisCPM
Multimodal Models
A family of large multimodal models supporting conversational interaction over images and text-to-image generation in multiple languages (see the usage sketch below)
[ICLR'24 spotlight] A Chinese-English bilingual multimodal large model series (chat and paint) built on the CPM foundation models
1k stars
15 watching
94 forks
Language: Python
last commit: 5 months ago
Topics: diffusion-models, large-language-models, multimodal, transformers
Related projects:
| Repository | Description | Stars |
|---|---|---|
| pleisto/yuren-baichuan-7b | A multimodal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 72 |
| openbmb/bmlist | A curated list of large machine learning models tracked over time | 341 |
| yuliang-liu/monkey | A toolkit for building conversational AI models that process image and text inputs | 1,825 |
| openbmb/cpm-live | A live training platform for large-scale deep learning models, allowing community participation and collaboration in model development and deployment | 511 |
| multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 14 |
| vita-mllm/vita | A large multimodal language model designed to process and analyze video, image, text, and audio inputs in real time | 961 |
| mbzuai-oryx/groundinglmm | An end-to-end trained model that generates natural language responses integrated with object segmentation masks | 781 |
| yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types | 137 |
| tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities, using image and question benchmarks | 288 |
| ailab-cvc/seed | An implementation of a multimodal language model with both comprehension and generation capabilities | 576 |
| runpeidong/dreamllm | A framework for building versatile multimodal large language models with synergistic comprehension and creation capabilities | 394 |
| lyuchenyang/macaw-llm | A multimodal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,550 |
| opengvlab/multi-modality-arena | An evaluation platform for comparing multimodal models on visual question-answering tasks | 467 |
| xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on 2.1B image-text pairs and 8.2M instruction sequences | 77 |
| open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,477 |