VisCPM

Multimodal Models

A family of large multimodal models supporting multimodal conversation and text-to-image generation in both Chinese and English; a brief usage sketch follows the repository details below.

[ICLR'24 Spotlight] Chinese and English multimodal large model series (Chat and Paint), a bilingual multimodal model family built on the CPM foundation models

GitHub

1k stars
15 watching
94 forks
Language: Python
Last commit: 5 months ago
Topics: diffusion-models, large-language-models, multimodal, transformers
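
Since the project exposes both capabilities named above, a minimal usage sketch can make the two entry points concrete. The snippet below assumes the `VisCPMChat` and `VisCPMPaint` classes and call signatures shown in the project's README; the checkpoint paths and file names are placeholders, and the return conventions (a three-tuple from `chat`, a list of images from `generate`) are taken from the README examples rather than verified against the current code.

```python
# Minimal usage sketch for VisCPM's two capabilities, assuming the
# VisCPMChat / VisCPMPaint interfaces shown in the project's README.
# Checkpoint paths are placeholders; adjust them to your local files.
from PIL import Image

from VisCPM import VisCPMChat, VisCPMPaint

# Multimodal chat: ask a question about an image.
chat_model = VisCPMChat('/path/to/viscpm_chat_checkpoint.pt',
                        image_safety_checker=True)
image = Image.open('example.jpg').convert('RGB')
# chat() is assumed to return (answer, context, vision_hidden_states),
# per the README example; only the answer is used here.
answer, _, _ = chat_model.chat(image, 'What is happening in this picture?')
print(answer)

# Text-to-image generation from a Chinese or English prompt.
paint_model = VisCPMPaint('/path/to/viscpm_paint_checkpoint.pt',
                          image_safety_checker=True,
                          prompt_safety_checker=True,
                          add_ranker=True)
# generate() is assumed to return a list of PIL images.
generated = paint_model.generate('A misty mountain landscape at dawn')[0]
generated.save('generated.png')
```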

Related projects:

Repository | Description | Stars
--- | --- | ---
pleisto/yuren-baichuan-7b | A multimodal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 72
openbmb/bmlist | A curated list of large machine learning models, tracked over time | 341
yuliang-liu/monkey | A toolkit for building conversational AI models that process image and text inputs | 1,825
openbmb/cpm-live | A live training platform for large-scale deep learning models, enabling community participation in model development and deployment | 511
multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 14
vita-mllm/vita | A large multimodal language model designed to process and analyze video, image, text, and audio inputs in real time | 961
mbzuai-oryx/groundinglmm | An end-to-end trained model that generates natural language responses integrated with object segmentation masks | 781
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types | 137
tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities, using image-and-question benchmarks | 288
ailab-cvc/seed | An implementation of a multimodal language model with capabilities for both comprehension and generation | 576
runpeidong/dreamllm | A framework for building versatile multimodal large language models with synergistic comprehension and creation capabilities | 394
lyuchenyang/macaw-llm | A multimodal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,550
opengvlab/multi-modality-arena | An evaluation platform for comparing multimodal models on visual question-answering tasks | 467
xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on 2.1B image-text pairs and 8.2M instruction sequences | 77
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,477