MoAI

Vision Language Integrator

Improves performance on vision-language tasks by integrating computer vision capabilities into large language models.

[ECCV 2024] Official PyTorch implementation of Mixture of All Intelligence (MoAI), which aims to improve performance on numerous zero-shot vision-language tasks.

GitHub

314 stars

11 watching

32 forks

Language: Python

Last commit: over 2 years ago

Related projects:

Repository	Description	Stars
byungkwanlee/collavo	Develops a PyTorch implementation of an enhanced vision language model	93
byungkwanlee/meteor	An implementation of Mamba-based traversal of rationale to improve performance of numerous vision language models.	102
pku-yuangroup/moe-llava	A large vision-language model using a mixture-of-experts architecture to improve performance on multi-modal learning tasks	2,023
kaiyangzhou/dassl.pytorch	A PyTorch toolbox for supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision.	1,236
baaivision/eve	A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities	246
yiren-jian/blitext	Develops and trains models for vision-language learning with decoupled language pre-training	24
algolzw/daclip-uir	This project controls vision-language models to restore degraded images in various environments and conditions.	673
nickjiang2378/vl-interp	This project provides an official PyTorch implementation of a method to interpret and edit vision-language representations to mitigate hallucinations in image captions.	46
haozhezhao/mic	Develops a multimodal vision-language model to enable machines to understand complex relationships between instructions and images in various tasks.	337
pku-yuangroup/languagebind	Extending pretraining models to handle multiple modalities by aligning language and video representations	751
jshilong/gpt4roi	Training and deploying large language models on computer vision tasks using region-of-interest inputs	517
baai-wudao/brivl	Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications	279
mchong6/soat	This repository provides a PyTorch implementation of an image manipulation technique using a pretrained StyleGAN model.	380
yiyangzhou/lure	Analyzing and mitigating object hallucination in large vision-language models to improve their accuracy and reliability.	136
megvii-research/tlc	Improves image restoration performance by converting global operations to local ones during inference	231