 Muffin
 Multimodal bridge
 A framework for building multimodal foundation models that can serve as bridges between different modalities and language models.
59 stars
 8 watching
 3 forks
 
Language: Python 
Last commit: over 1 year ago

Related projects:

| Repository | Description | Stars | 
|---|---|---|
|  | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage. | 1,849 | 
|  | A unified multimodal language model capable of interpreting and reasoning about various modalities without paired data. | 49 | 
|  | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 15 | 
|  | A general-purpose bridge that connects multiple networks and protocols using various backends. | 164 | 
|  | Enables communication between the Matrix and Telegram networks by bridging them. | 1,360 |
|  | A bridge between Ruby and Haskell allowing code reuse across the two languages | 262 | 
|  | A library that allows building bridges between Matrix and remote services by automating logins and interactions. | 95 | 
|  | A bridge between Common Lisp and Python, enabling interaction between the two languages through a separate process. | 235 | 
|  | A software bridge connecting Matrix and WhatsApp. | 1,301 |
|  | An implementation of a unified architecture for multi-modal multi-task learning using PyTorch. | 515 | 
|  | A bridge between Nim and Python, allowing native language integration. | 1,482 | 
|  | An implementation of Python in Common Lisp, allowing mixed execution and library access between the two languages. | 369 | 
|  | A family of large multimodal models supporting multimodal conversation and text-to-image generation in multiple languages. | 1,098 |
|  | A framework for grounding language models to images and handling multimodal inputs and outputs | 478 | 
|  | A large multimodal language model designed to process and analyze video, image, text, and audio inputs in real-time. | 1,005 |