 MoAI
 Vision Language Integrator
 Improves performance of vision language tasks by integrating computer vision capabilities into large language models
[ECCV 2024] Official PyTorch implementation of Mixture of All Intelligence (MoAI), which improves performance on numerous zero-shot vision language tasks.
314 stars
 11 watching
 32 forks
 
Language: Python 
Last commit: over 1 year ago

Related projects:

| Repository | Description | Stars | 
|---|---|---|
|  | Develops a PyTorch implementation of an enhanced vision language model | 93 | 
|  | An implementation of Mamba-based traversal of rationale to improve performance of numerous vision language models. | 102 | 
|  | A large vision-language model using a mixture-of-experts architecture to improve performance on multi-modal learning tasks | 2,023 | 
|  | A PyTorch toolbox for supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision. | 1,236 | 
|  | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 246 | 
|  | Develops and trains models for vision-language learning with decoupled language pre-training | 24 | 
|  | This project controls vision-language models to restore degraded images in various environments and conditions. | 673 | 
|  | This project provides an official PyTorch implementation of a method to interpret and edit vision-language representations to mitigate hallucinations in image captions. | 46 | 
|  | Develops a multimodal vision-language model to enable machines to understand complex relationships between instructions and images in various tasks. | 337 | 
|  | Extending pretraining models to handle multiple modalities by aligning language and video representations | 751 | 
|  | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 517 | 
|  | Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications | 279 | 
|  | This repository provides a PyTorch implementation of an image manipulation technique using a pretrained StyleGAN model. | 380 | 
|  | Analyzing and mitigating object hallucination in large vision-language models to improve their accuracy and reliability. | 136 | 
|  | Improves image restoration performance by converting global operations to local ones during inference | 231 |