Muffin
Multimodal bridge
A framework for building multimodal foundation models that can serve as bridges between different modalities and language models.
59 stars
8 watching
3 forks
Language: Python
Last commit: 12 months ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| yuliang-liu/monkey | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage. | 1,849 |
| joez17/chatbridge | A unified multimodal language model capable of interpreting and reasoning about various modalities without paired data. | 49 |
| multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 15 |
| matrix-org/matrix-bifrost | A general-purpose bridge that connects multiple networks and protocols using various backends. | 164 |
| mautrix/telegram | Enables communication between Matrix and Telegram networks by bridging them together. | 1,360 |
| mwotton/hubris | A bridge between Ruby and Haskell allowing code reuse across the two languages. | 262 |
| sorunome/mx-puppet-bridge | A library that allows building bridges between Matrix and remote services by automating logins and interactions. | 95 |
| bendudson/py4cl | A bridge between Common Lisp and Python, enabling interaction between the two languages through a separate process. | 235 |
| mautrix/whatsapp | A software bridge connecting Matrix and WhatsApp. | 1,301 |
| subho406/omninet | An implementation of a unified architecture for multi-modal multi-task learning using PyTorch. | 515 |
| yglukhov/nimpy | A bridge between Nim and Python, allowing native language integration. | 1,482 |
| metawilm/cl-python | An implementation of Python in Common Lisp, allowing mixed execution and library access between the two languages. | 369 |
| openbmb/viscpm | A family of large multimodal models supporting multimodal conversation and text-to-image generation in multiple languages. | 1,098 |
| kohjingyu/fromage | A framework for grounding language models to images and handling multimodal inputs and outputs. | 478 |
| vita-mllm/vita | A large multimodal language model designed to process and analyze video, image, text, and audio inputs in real time. | 1,005 |