MoAI
Vision Language Integrator
Improves performance of vision language tasks by integrating computer vision capabilities into large language models
[ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve performance of numerous zero-shot vision language tasks.
311 stars
11 watching
32 forks
Language: Python
last commit: 8 months ago
Related projects:
Repository | Description | Stars |
---|---|---|
byungkwanlee/collavo | Develops a PyTorch implementation of an enhanced vision language model | 93 |
byungkwanlee/meteor | An implementation of Mamba-based traversal of rationale to improve performance of numerous vision language models. | 102 |
pku-yuangroup/moe-llava | Develops a neural network architecture for multi-modal learning with large vision-language models | 1,980 |
kaiyangzhou/dassl.pytorch | A PyTorch toolbox for supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision. | 1,217 |
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 230 |
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
algolzw/daclip-uir | This project controls vision-language models to restore degraded images in various environments and conditions. | 662 |
nickjiang2378/vl-interp | This project provides an official PyTorch implementation of a method to interpret and edit vision-language representations to mitigate hallucinations in image captions. | 31 |
haozhezhao/mic | Develops a multimodal vision-language model to enable machines to understand complex relationships between instructions and images in various tasks. | 334 |
pku-yuangroup/languagebind | Extends pretraining models to handle multiple modalities by aligning language and video representations | 723 |
jshilong/gpt4roi | Trains and deploys large language models on computer vision tasks using region-of-interest inputs | 506 |
baai-wudao/brivl | Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications | 279 |
mchong6/soat | This repository provides a PyTorch implementation of an image manipulation technique using a pretrained StyleGAN model. | 380 |
yiyangzhou/lure | Analyzes and mitigates object hallucination in large vision-language models to improve their accuracy and reliability. | 134 |
megvii-research/tlc | Improves image restoration performance by converting global operations to local ones during inference | 231 |