VisCPM

Multimodal Models

A family of large multimodal models supporting multimodal conversation and text-to-image generation in both Chinese and English; a brief usage sketch follows the repository details below.

[ICLR'24 Spotlight] Chinese and English multimodal large model series (Chat and Paint), a bilingual multimodal model family built on the CPM foundation models

GitHub

1k stars
15 watching
94 forks
Language: Python
Last commit: 5 months ago
Topics: diffusion-models, large-language-models, multimodal, transformers
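
Since the project exposes both capabilities named above, a minimal usage sketch can make the two entry points concrete. The snippet below assumes the `VisCPMChat` and `VisCPMPaint` classes and call signatures shown in the project's README; the checkpoint paths and file names are placeholders, and the return conventions (a three-tuple from `chat`, a list of images from `generate`) are taken from the README examples rather than verified against the current code.

```python
# Minimal usage sketch for VisCPM's two capabilities, assuming the
# VisCPMChat / VisCPMPaint interfaces shown in the project's README.
# Checkpoint paths are placeholders; adjust them to your local files.
from PIL import Image

from VisCPM import VisCPMChat, VisCPMPaint

# Multimodal chat: ask a question about an image.
chat_model = VisCPMChat('/path/to/viscpm_chat_checkpoint.pt',
                        image_safety_checker=True)
image = Image.open('example.jpg').convert('RGB')
# chat() is assumed to return (answer, context, vision_hidden_states),
# per the README example; only the answer is used here.
answer, _, _ = chat_model.chat(image, 'What is happening in this picture?')
print(answer)

# Text-to-image generation from a Chinese or English prompt.
paint_model = VisCPMPaint('/path/to/viscpm_paint_checkpoint.pt',
                          image_safety_checker=True,
                          prompt_safety_checker=True,
                          add_ranker=True)
# generate() is assumed to return a list of PIL images.
generated = paint_model.generate('A misty mountain landscape at dawn')[0]
generated.save('generated.png')
```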

Related projects:

Repository | Description | Stars
--- | --- | ---
pleisto/yuren-baichuan-7b | A multimodal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 72
openbmb/bmlist | A curated list of large machine learning models, tracked over time | 341
yuliang-liu/monkey | A toolkit for building conversational AI models that process image and text inputs | 1,825
openbmb/cpm-live | A live training platform for large-scale deep learning models, enabling community participation in model development and deployment | 511
multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 14
vita-mllm/vita | A large multimodal language model designed to process and analyze video, image, text, and audio inputs in real time | 961
mbzuai-oryx/groundinglmm | An end-to-end trained model that generates natural language responses integrated with object segmentation masks | 781
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types | 137
tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities, using image-and-question benchmarks | 288
ailab-cvc/seed | An implementation of a multimodal language model with capabilities for both comprehension and generation | 576
runpeidong/dreamllm | A framework for building versatile multimodal large language models with synergistic comprehension and creation capabilities | 394
lyuchenyang/macaw-llm | A multimodal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,550
opengvlab/multi-modality-arena | An evaluation platform for comparing multimodal models on visual question-answering tasks | 467
xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on 2.1B image-text pairs and 8.2M instruction sequences | 77
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,477