Emu
Model framework
A multimodal generative model framework
Emu Series: Generative Multimodal Models from BAAI
2k stars
22 watching
86 forks
Language: Python
last commit: 5 months ago
Topics: foundation-models, generative-pretraining-in-multimodality, in-context-learning, instruct-tuning, multimodal-generalist, multimodal-pretraining
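To give a sense of how the Emu series is typically consumed, the sketch below shows one way to load a checkpoint and run an image-to-text query through Hugging Face `transformers`. It is a minimal illustration under stated assumptions, not the repository's documented API: the `BAAI/Emu2-Chat` checkpoint name, the `build_input_ids` helper, and the `[<IMG_PLH>]` image placeholder are assumptions drawn from the Emu2 release and may differ across Emu versions.

```python
# Hedged sketch: loading an Emu checkpoint via Hugging Face transformers.
# Assumptions (not confirmed by this page): the checkpoint is published as
# "BAAI/Emu2-Chat", ships custom remote modeling code, and exposes a
# build_input_ids helper plus an "[<IMG_PLH>]" image placeholder token.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/Emu2-Chat")
model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Emu2-Chat",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,  # Emu's modeling code lives in the repo, not in transformers
).to("cuda").eval()

# One image plus a text prompt; the placeholder marks where the image embedding is inserted.
image = Image.open("example.jpg").convert("RGB")
query = "[<IMG_PLH>] Describe the image in detail:"

# build_input_ids is assumed to come from the model's remote code.
inputs = model.build_input_ids(text=[query], tokenizer=tokenizer, image=[image])

with torch.no_grad():
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        image=inputs["image"].to(torch.bfloat16),
        max_new_tokens=64,
    )

print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```

The same loading pattern (a `trust_remote_code=True` checkpoint plus a model-specific preprocessing helper) is common to the multimodal projects listed below, though each defines its own prompt and image interface.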
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 246 |
| | A generative model for 3D multi-object scenes built on a network architecture inspired by auto-encoders and generative adversarial networks | 103 |
| | A framework for grounding language models to images and handling multimodal inputs and outputs | 478 |
| | A repository of pre-trained language models for various tasks and domains | 121 |
| | Tools and techniques for designing and improving diffusion-based generative models | 1,447 |
| | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage | 1,849 |
| | Code and models for improving language understanding through generative pre-training with a transformer-based architecture | 2,167 |
| | An end-to-end trained model that generates natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
| | Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications | 279 |
| | An evaluation toolkit and platform for assessing large models in various domains | 307 |
| | Pre-trained language models and tools for fine-tuning and evaluation | 439 |
| | High-resolution multimodal LLMs built by combining vision encoders and various input resolutions | 549 |
| | A unified interface to various deep learning architectures | 818 |
| | A generative model with tractable likelihood and easy sampling, allowing for efficient data generation | 1,921 |
| | A large vision-language model using a mixture-of-experts architecture to improve performance on multi-modal learning tasks | 2,023 |