MoAI

Vision Language Integrator

Improves performance on vision-language tasks by integrating computer vision capabilities into large language models

[ECCV 2024] Official PyTorch implementation of Mixture of All Intelligence (MoAI), which improves performance on numerous zero-shot vision-language tasks.

311 stars
11 watching
32 forks
Language: Python
last commit: 8 months ago

Related projects:

Repository | Description | Stars
byungkwanlee/collavo | Provides a PyTorch implementation of an enhanced vision-language model | 93
byungkwanlee/meteor | Implements Mamba-based traversal of rationale to improve the performance of numerous vision-language models | 102
pku-yuangroup/moe-llava | Develops a neural network architecture for multimodal learning with large vision-language models | 1,980
kaiyangzhou/dassl.pytorch | Provides a PyTorch toolbox for research and development of domain adaptation, domain generalization, and semi-supervised learning methods in computer vision | 1,217
baaivision/eve | Provides a PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 230
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24
algolzw/daclip-uir | Controls vision-language models to restore images degraded in various environments and conditions | 662
nickjiang2378/vl-interp | Provides an official PyTorch implementation of a method to interpret and edit vision-language representations to mitigate hallucinations in image captions | 31
haozhezhao/mic | Develops a multimodal vision-language model that enables machines to understand complex relationships between instructions and images across various tasks | 334
pku-yuangroup/languagebind | Extends video-language pretraining to multiple modalities by aligning them with language representations | 723
jshilong/gpt4roi | Trains and deploys large language models on computer vision tasks using region-of-interest inputs | 506
baai-wudao/brivl | Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications | 279
mchong6/soat | Provides a PyTorch implementation of an image manipulation technique using a pretrained StyleGAN model | 380
yiyangzhou/lure | Analyzes and mitigates object hallucination in large vision-language models to improve their accuracy and reliability | 134
megvii-research/tlc | Improves image restoration performance by converting global operations to local ones during inference | 231