CoLLaVO

Vision Language Model

A PyTorch implementation of an enhanced vision language model

[ACL 2024 Findings] Official PyTorch implementation of the technical part of CoLLaVO: Crayon Large Language and Vision mOdel, which aims to significantly improve zero-shot vision language performance

GitHub

93 stars
5 watching
13 forks
Language: Python
Last commit: 5 months ago

Related projects:

Repository | Description | Stars
byungkwanlee/moai | Improves performance on vision language tasks by integrating computer vision capabilities into large language models | 311
kaiyangzhou/dassl.pytorch | A PyTorch toolbox supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision | 1,217
byungkwanlee/meteor | An implementation of Mamba-based traversal of rationale to improve the performance of numerous vision language models | 102
woozzu/dong_iccv_2017 | A PyTorch implementation of semantic image synthesis via adversarial learning | 145
alankbi/detecto | A Python package for building and deploying computer vision models with PyTorch | 613
lackel/agla | Improves large vision-language models so that they describe images accurately without generating fictional objects | 15
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24
jayleicn/clipbert | An efficient framework for end-to-end learning on image-text and video-text tasks | 704
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 230
nickjiang2378/vl-interp | An official PyTorch implementation of a method to interpret and edit vision-language representations to mitigate hallucinations in image captions | 31
jiutian-vl/jiutian-lion | Integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations | 121
shizhediao/davinci | An implementation of generative vision-language models for multimodal learning tasks that can be fine-tuned for various applications | 43
xiadingz/video-caption.pytorch | A PyTorch implementation of video captioning, combining deep learning and computer vision techniques | 401
evolvinglmms-lab/longva | A model for long context transfer from language to vision, built on a deep learning framework | 334
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities | 1,298