CoLLaVO

Vision Language Model

Develops a PyTorch implementation of an enhanced vision language model

[ACL 2024 Findings] Official PyTorch Implementation code for realizing the technical part of CoLLaVO: Crayon Large Language and Vision mOdel to significantly improve zero-shot vision language performances

GitHub

93 stars
5 watching
13 forks
Language: Python
last commit: 7 months ago

Related projects:

Repository Description Stars
byungkwanlee/moai Improves performance of vision language tasks by integrating computer vision capabilities into large language models 314
kaiyangzhou/dassl.pytorch A PyTorch toolbox for supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision. 1,236
byungkwanlee/meteor An implementation of Mamba-based traversal of rationale to improve performance of numerous vision language models. 102
woozzu/dong_iccv_2017 An implementation of semantic image synthesis via adversarial learning using PyTorch 145
alankbi/detecto A Python package for building and deploying computer vision models with PyTorch 614
lackel/agla Improves large vision-language models' ability to accurately describe images by combining global and local attention mechanisms. 18
yiren-jian/blitext Develops and trains models for vision-language learning with decoupled language pre-training 24
jayleicn/clipbert An efficient framework for end-to-end learning on image-text and video-text tasks 709
baaivision/eve A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities 246
nickjiang2378/vl-interp This project provides an official PyTorch implementation of a method to interpret and edit vision-language representations to mitigate hallucinations in image captions. 46
jiutian-vl/jiutian-lion This project integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations. 124
shizhediao/davinci Implementing a unified modal learning framework for generative vision-language models 43
xiadingz/video-caption.pytorch PyTorch implementation of video captioning, combining deep learning and computer vision techniques. 402
evolvinglmms-lab/longva An open-source project that enables the transfer of language understanding to vision capabilities through long context processing. 347
nvlabs/prismer A deep learning framework for training multi-modal models with vision and language capabilities. 1,299