Emu
A multimodal generative model framework
Emu Series: Generative Multimodal Models from BAAI
2k stars
21 watching
86 forks
Language: Python
Last commit: about 2 months ago
Topics: foundation-models, generative-pretraining-in-multimodality, in-context-learning, instruct-tuning, multimodal-generalist, multimodal-pretraining
Related projects:
| Repository | Description | Stars |
|---|---|---|
| baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 230 |
| yunishi3/3d-fcr-alphagan | A generative model for 3D multi-object scenes, built on a network architecture that combines auto-encoding and generative adversarial networks | 103 |
| kohjingyu/fromage | A framework for grounding language models to images and handling multimodal inputs and outputs | 478 |
| baai-wudao/model | A repository of pre-trained language models for various tasks and domains | 121 |
| nvlabs/edm | Tools and techniques for designing and improving diffusion-based generative models | 1,399 |
| yuliang-liu/monkey | A toolkit for building conversational AI models that process image and text inputs | 1,825 |
| openai/finetune-transformer-lm | Code and model for improving language understanding via generative pre-training with a transformer-based architecture | 2,160 |
| mbzuai-oryx/groundinglmm | An end-to-end trained model that generates natural language responses integrated with object segmentation masks | 781 |
| baai-wudao/brivl | Pre-trains a multilingual model that bridges vision and language modalities for various downstream applications | 279 |
| flageval-baai/flageval | An evaluation toolkit and platform for assessing large models across various domains | 300 |
| flagai-open/aquila2 | Pre-trained language models and tools for fine-tuning and evaluation | 437 |
| nvlabs/eagle | High-resolution multimodal LLMs built by combining multiple vision encoders and input resolutions | 539 |
| keras-team/keras-hub | Pre-trained models and building blocks for natural language processing, computer vision, audio, and multimodal tasks | 797 |
| openai/pixel-cnn | A generative model with tractable likelihood and easy sampling, enabling efficient data generation | 1,921 |
| pku-yuangroup/moe-llava | A mixture-of-experts architecture for multi-modal learning with large vision-language models | 1,980 |