Emu
Model framework
A multimodal generative model framework
Emu Series: Generative Multimodal Models from BAAI
2k stars
22 watching
86 forks
Language: Python
last commit: 5 months ago
Topics: foundation-models, generative-pretraining-in-multimodality, in-context-learning, instruct-tuning, multimodal-generalist, multimodal-pretraining
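To give a sense of how the Emu series is typically consumed, the sketch below shows one way to load a checkpoint and run an image-to-text query through Hugging Face `transformers`. It is a minimal illustration under stated assumptions, not the repository's documented API: the `BAAI/Emu2-Chat` checkpoint name, the `build_input_ids` helper, and the `[<IMG_PLH>]` image placeholder are assumptions drawn from the Emu2 release and may differ across Emu versions.

```python
# Hedged sketch: loading an Emu checkpoint via Hugging Face transformers.
# Assumptions (not confirmed by this page): the checkpoint is published as
# "BAAI/Emu2-Chat", ships custom remote modeling code, and exposes a
# build_input_ids helper plus an "[<IMG_PLH>]" image placeholder token.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/Emu2-Chat")
model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Emu2-Chat",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,  # Emu's modeling code lives in the repo, not in transformers
).to("cuda").eval()

# One image plus a text prompt; the placeholder marks where the image embedding is inserted.
image = Image.open("example.jpg").convert("RGB")
query = "[<IMG_PLH>] Describe the image in detail:"

# build_input_ids is assumed to come from the model's remote code.
inputs = model.build_input_ids(text=[query], tokenizer=tokenizer, image=[image])

with torch.no_grad():
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        image=inputs["image"].to(torch.bfloat16),
        max_new_tokens=64,
    )

print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```

The same loading pattern (a `trust_remote_code=True` checkpoint plus a model-specific preprocessing helper) is common to the multimodal projects listed below, though each defines its own prompt and image interface.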
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 246 |
| | A generative model for 3D multi-object scenes built on a network architecture inspired by auto-encoders and generative adversarial networks | 103 |
| | A framework for grounding language models to images and handling multimodal inputs and outputs | 478 |
| | A repository of pre-trained language models for various tasks and domains | 121 |
| | Tools and techniques for designing and improving diffusion-based generative models | 1,447 |
| | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage | 1,849 |
| | Code and models for improving language understanding through generative pre-training with a transformer-based architecture | 2,167 |
| | An end-to-end trained model that generates natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
| | Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications | 279 |
| | An evaluation toolkit and platform for assessing large models in various domains | 307 |
| | Pre-trained language models and tools for fine-tuning and evaluation | 439 |
| | High-resolution multimodal LLMs built by combining vision encoders and various input resolutions | 549 |
| | A unified interface to various deep learning architectures | 818 |
| | A generative model with tractable likelihood and easy sampling, allowing for efficient data generation | 1,921 |
| | A large vision-language model using a mixture-of-experts architecture to improve performance on multi-modal learning tasks | 2,023 |