Muffin

Multimodal bridge

A framework for building multimodal foundation models that can serve as bridges between different modalities and language models.

GitHub repository

57 stars, 8 watching, 3 forks
Language: Python
Last commit: 10 months ago

Related projects:

yuliang-liu/monkey (1,825 stars): A toolkit for building conversational AI models that can process image and text inputs.
joez17/chatbridge (47 stars): A unified multimodal language model capable of interpreting and reasoning about various modalities without paired data.
multimodal-art-projection/omnibench (14 stars): Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously.
matrix-org/matrix-bifrost (162 stars): A general-purpose bridge that connects multiple networks and protocols using various backends.
mautrix/telegram (1,343 stars): Enables communication between the Matrix and Telegram networks by bridging them.
mwotton/hubris (262 stars): A bridge between Ruby and Haskell that allows code reuse across the two languages.
sorunome/mx-puppet-bridge (95 stars): A library for building bridges between Matrix and remote services by automating logins and interactions.
bendudson/py4cl (235 stars): A bridge between Common Lisp and Python that lets the two languages interact through a separate process.
mautrix/whatsapp (1,287 stars): A software bridge connecting Matrix and WhatsApp.
subho406/omninet (512 stars): An implementation of a unified architecture for multimodal, multi-task learning using PyTorch.
yglukhov/nimpy (1,477 stars): A bridge between Nim and Python that allows native language integration.
metawilm/cl-python (367 stars): An implementation of Python in Common Lisp, allowing mixed execution and library access between the two languages.
openbmb/viscpm (1,089 stars): A family of large multimodal models supporting multimodal conversation and text-to-image generation in multiple languages.
kohjingyu/fromage (478 stars): A framework for grounding language models to images and handling multimodal inputs and outputs.
vita-mllm/vita (961 stars): A large multimodal language model designed to process and analyze video, image, text, and audio inputs in real time.