MoAI
Vision Language Integrator
Improves performance on vision-language tasks by integrating computer vision capabilities into large language models.
[ECCV 2024] Official PyTorch implementation of Mixture of All Intelligence (MoAI), which improves performance on numerous zero-shot vision-language tasks.
314 stars
11 watching
32 forks
Language: Python
last commit: 11 months ago

Related projects:
| Description | Stars |
|---|---|
| Develops a PyTorch implementation of an enhanced vision language model | 93 |
| An implementation of Mamba-based traversal of rationale to improve performance of numerous vision language models. | 102 |
| A large vision-language model using a mixture-of-experts architecture to improve performance on multi-modal learning tasks | 2,023 |
| A PyTorch toolbox for supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision. | 1,236 |
| A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 246 |
| Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
| This project controls vision-language models to restore degraded images in various environments and conditions. | 673 |
| This project provides an official PyTorch implementation of a method to interpret and edit vision-language representations to mitigate hallucinations in image captions. | 46 |
| Develops a multimodal vision-language model to enable machines to understand complex relationships between instructions and images in various tasks. | 337 |
| Extending pretraining models to handle multiple modalities by aligning language and video representations | 751 |
| Training and deploying large language models on computer vision tasks using region-of-interest inputs | 517 |
| Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications | 279 |
| This repository provides a PyTorch implementation of an image manipulation technique using a pretrained StyleGAN model. | 380 |
| Analyzing and mitigating object hallucination in large vision-language models to improve their accuracy and reliability. | 136 |
| Improves image restoration performance by converting global operations to local ones during inference | 231 |