Muffin

Multimodal bridge

A framework for building multimodal foundation models that can serve as bridges between different modalities and language models.

59 stars

8 watching

3 forks

Language: Python

Last commit: over 2 years ago

Related projects:

Repository	Description	Stars
yuliang-liu/monkey	An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage.	1,849
joez17/chatbridge	A unified multimodal language model capable of interpreting and reasoning about various modalities without paired data.	49
multimodal-art-projection/omnibench	Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously.	15
matrix-org/matrix-bifrost	A general-purpose bridge that connects multiple networks and protocols using various backends.	164
mautrix/telegram	Bridges the Matrix and Telegram networks, enabling communication between them.	1,360
mwotton/hubris	A bridge between Ruby and Haskell, allowing code reuse across the two languages.	262
sorunome/mx-puppet-bridge	A library that allows building bridges between Matrix and remote services by automating logins and interactions.	95
bendudson/py4cl	A bridge between Common Lisp and Python, enabling interaction between the two languages through a separate process.	235
mautrix/whatsapp	A software bridge connecting Matrix and WhatsApp.	1,301
subho406/omninet	An implementation of a unified architecture for multi-modal multi-task learning using PyTorch.	515
yglukhov/nimpy	A bridge between Nim and Python, allowing native language integration.	1,482
metawilm/cl-python	An implementation of Python in Common Lisp, allowing mixed execution and library access between the two languages.	369
openbmb/viscpm	A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages.	1,098
kohjingyu/fromage	A framework for grounding language models to images and handling multimodal inputs and outputs.	478
vita-mllm/vita	A large multimodal language model designed to process and analyze video, image, text, and audio inputs in real-time.	1,005