CoLLaVO

Vision Language Model

A PyTorch implementation of an enhanced vision language model

[ACL 2024 Findings] Official PyTorch implementation of CoLLaVO (Crayon Large Language and Vision mOdel), a model designed to significantly improve zero-shot vision language performance

GitHub

93 stars

5 watching

13 forks

Language: Python

last commit: about 2 years ago

Related projects:

Repository	Description	Stars
byungkwanlee/moai	Improves performance of vision language tasks by integrating computer vision capabilities into large language models	314
kaiyangzhou/dassl.pytorch	A PyTorch toolbox for supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision.	1,236
byungkwanlee/meteor	An implementation of Mamba-based traversal of rationale, intended to improve the performance of vision language models.	102
woozzu/dong_iccv_2017	An implementation of semantic image synthesis via adversarial learning using PyTorch	145
alankbi/detecto	A Python package for building and deploying computer vision models with PyTorch	614
lackel/agla	Improves large vision-language models' ability to accurately describe images by combining global and local attention mechanisms.	18
yiren-jian/blitext	Develops and trains models for vision-language learning with decoupled language pre-training	24
jayleicn/clipbert	An efficient framework for end-to-end learning on image-text and video-text tasks	709
baaivision/eve	A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities	246
nickjiang2378/vl-interp	An official PyTorch implementation of a method for interpreting and editing vision-language representations to mitigate hallucinations in image captions.	46
jiutian-vl/jiutian-lion	Integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations.	124
shizhediao/davinci	Implements a unified modal learning framework for generative vision-language models	43
xiadingz/video-caption.pytorch	PyTorch implementation of video captioning, combining deep learning and computer vision techniques.	402
evolvinglmms-lab/longva	An open-source project that enables the transfer of language understanding to vision capabilities through long context processing.	347
nvlabs/prismer	A deep learning framework for training multi-modal models with vision and language capabilities.	1,299