Emu

Model framework

A multimodal generative model framework

Emu Series: Generative Multimodal Models from BAAI

GitHub

2k stars
21 watching
86 forks
Language: Python
Last commit: about 2 months ago
Topics: foundation-models, generative-pretraining-in-multimodality, in-context-learning, instruct-tuning, multimodal-generalist, multimodal-pretraining

Related projects:

Repository | Description | Stars
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 230
yunishi3/3d-fcr-alphagan | This project aims to develop a generative model for 3D multi-object scenes using a novel network architecture inspired by auto-encoding and generative adversarial networks | 103
kohjingyu/fromage | A framework for grounding language models to images and handling multimodal inputs and outputs | 478
baai-wudao/model | A repository of pre-trained language models for various tasks and domains | 121
nvlabs/edm | This project provides a set of tools and techniques to design and improve diffusion-based generative models | 1,399
yuliang-liu/monkey | A toolkit for building conversational AI models that can process images and text inputs | 1,825
openai/finetune-transformer-lm | This project provides code and model for improving language understanding through generative pre-training using a transformer-based architecture | 2,160
mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks | 781
baai-wudao/brivl | Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications | 279
flageval-baai/flageval | An evaluation toolkit and platform for assessing large models in various domains | 300
flagai-open/aquila2 | Provides pre-trained language models and tools for fine-tuning and evaluation | 437
nvlabs/eagle | Develops high-resolution multimodal LLMs by combining vision encoders and various input resolutions | 539
keras-team/keras-hub | Provides pre-trained models and building blocks for natural language processing, computer vision, audio, and multimodal tasks | 797
openai/pixel-cnn | A generative model with tractable likelihood and easy sampling, allowing for efficient data generation | 1,921
pku-yuangroup/moe-llava | Develops a neural network architecture for multi-modal learning with large vision-language models | 1,980