VisCPM


A family of large multimodal models supporting multimodal conversation and text-to-image generation in multiple languages.

[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A bilingual Chinese-English multimodal large model series based on the CPM foundation models

GitHub

1k stars
15 watching
92 forks
Language: Python
Last commit: 7 months ago
Topics: diffusion-models, large-language-models, multimodal, transformers

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 73 |
| openbmb/bmlist | A curated list of large machine learning models, tracked over time | 341 |
| yuliang-liu/monkey | An end-to-end image captioning system built on large multi-modal models, with tools for training, inference, and demos | 1,849 |
| openbmb/cpm-live | A live training platform for large-scale deep learning models, allowing community participation in model development and deployment | 511 |
| multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 15 |
| vita-mllm/vita | A large multimodal language model designed to process video, image, text, and audio inputs in real time | 1,005 |
| mbzuai-oryx/groundinglmm | An end-to-end trained model that generates natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
| yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types | 143 |
| tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities, using image and question benchmarks | 296 |
| ailab-cvc/seed | An implementation of a multimodal language model with both comprehension and generation capabilities | 585 |
| runpeidong/dreamllm | A framework for building versatile multimodal large language models with synergistic comprehension and creation capabilities | 402 |
| lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,568 |
| opengvlab/multi-modality-arena | An evaluation platform for comparing multi-modality models on visual question-answering tasks | 478 |
| xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on 2.1B image-text pairs and 8.2M instruction sequences | 78 |
| open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,478 |