ChatBridge

Multimodal Model

A unified multimodal language model capable of interpreting and reasoning about various modalities without requiring paired data for every combination of modalities.

ChatBridge is an approach to learning a unified multimodal model that can interpret, correlate, and reason about various modalities without relying on paired data for every combination of them.
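The description does not say how the modalities are linked. One common way to avoid collecting every combination of paired data is to align each modality only with text and let language act as the shared pivot. The toy sketch below assumes that pivot design. The concept embeddings, the per-modality "encoders" (`IMAGE_SCALE`, `AUDIO_SCALE`), and the per-dimension least-squares projection are all hypothetical stand-ins for illustration. This is not ChatBridge's code or architecture.

```python
import math

# Hypothetical "text embeddings" for two concepts: the shared language space.
TEXT = {"dog": [1.0, 0.2], "car": [0.1, 1.0]}

# Hypothetical raw encoders: each modality sees a concept through its own
# per-dimension distortion, so raw image and audio features are not comparable.
IMAGE_SCALE = [2.0, 0.5]
AUDIO_SCALE = [-1.0, 3.0]


def encode(concept, scale):
    """Raw modality feature for a concept (toy encoder)."""
    return [t * s for t, s in zip(TEXT[concept], scale)]


def fit_projection(pairs):
    """Least-squares fit of a per-dimension weight w so that x * w ~ t,
    using only (modality feature, text embedding) pairs."""
    dims = len(pairs[0][0])
    w = []
    for d in range(dims):
        num = sum(x[d] * t[d] for x, t in pairs)
        den = sum(x[d] * x[d] for x, _ in pairs)
        w.append(num / den)
    return w


def project(x, w):
    """Map a modality feature into the shared text space."""
    return [xi * wi for xi, wi in zip(x, w)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Training data: image-text and audio-text pairs only; no image-audio pairs.
image_text = [(encode(c, IMAGE_SCALE), TEXT[c]) for c in TEXT]
audio_text = [(encode(c, AUDIO_SCALE), TEXT[c]) for c in TEXT]

w_image = fit_projection(image_text)
w_audio = fit_projection(audio_text)


def bridged_similarity(image_concept, audio_concept):
    """Compare an image and an audio clip through the shared text space."""
    img = project(encode(image_concept, IMAGE_SCALE), w_image)
    aud = project(encode(audio_concept, AUDIO_SCALE), w_audio)
    return cosine(img, aud)


if __name__ == "__main__":
    print(f"dog image vs dog audio: {bridged_similarity('dog', 'dog'):.3f}")
    print(f"dog image vs car audio: {bridged_similarity('dog', 'car'):.3f}")
```

Compared directly, the raw dog image and dog audio features point in nearly opposite directions. After each modality is projected onto the text embeddings, the dog image scores about 1.0 against the dog audio and much lower against the car audio, even though no image-audio pair was used in fitting.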

GitHub

49 stars

2 watching

1 fork

Language: Python

Last commit: almost 2 years ago

Related projects:

Repository	Description	Stars
thunlp/muffin	A framework for building multimodal foundation models that can serve as bridges between different modalities and language models.	59
42wim/matterbridge	A bridge that connects multiple chat protocols to a unified interface.	6,745
andrewnguonly/chatabstractions	Provides a framework for creating custom chat models with dynamic failover and load-balancing features.	79
multimodal-art-projection/omnibench	Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously.	15
langboat/mengzi3	8B and 13B language models based on the Llama architecture, with multilingual capabilities.	2,031
mainframecomputer/fullmoon-ios	An iOS app that provides a chat interface to local large language models, optimized for Apple silicon.	450
openbmb/viscpm	A family of large multimodal models supporting multimodal conversation and text-to-image generation in multiple languages.	1,098
yuliang-liu/monkey	An end-to-end image captioning system that uses large multimodal models and provides tools for training, inference, and demo usage.	1,849
lyuchenyang/macaw-llm	A multimodal language model that integrates image, video, audio, and text data to improve language understanding and generation.	1,568
deltachat-bot/matterdelta	A tool that enables communication between Delta Chat and other supported chat services using Matterbridge.	13
xverse-ai/xverse-v-13b	A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences.	78
kohjingyu/fromage	A framework for grounding language models to images and handling multimodal inputs and outputs.	478
tele-ai/telechat-52b	An open-source chat model built on a 52B-parameter large language model, with improvements to position encoding, activation function, and layer normalization.	40
pleisto/yuren-baichuan-7b	A multimodal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks.	73
mshukor/unival	A unified model for image, video, audio, and language tasks that can be fine-tuned for various downstream applications.	224