CoLLaVO
Vision Language Model
Develops a PyTorch implementation of an enhanced vision language model
[ACL 2024 Findings] Official PyTorch Implementation code for realizing the technical part of CoLLaVO: Crayon Large Language and Vision mOdel to significantly improve zero-shot vision language performances
93 stars
5 watching
13 forks
Language: Python
last commit: 5 months ago Related projects:
Repository | Description | Stars |
---|---|---|
byungkwanlee/moai | Improves performance of vision language tasks by integrating computer vision capabilities into large language models | 311 |
kaiyangzhou/dassl.pytorch | A PyTorch toolbox for supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision. | 1,217 |
byungkwanlee/meteor | An implementation of Mamba-based traversal of rationale to improve performance of numerous vision language models. | 102 |
woozzu/dong_iccv_2017 | An implementation of semantic image synthesis via adversarial learning using PyTorch | 145 |
alankbi/detecto | A Python package for building and deploying computer vision models with PyTorch | 613 |
lackel/agla | Improving large vision-language models to accurately describe images without generating fictional objects | 15 |
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
jayleicn/clipbert | An efficient framework for end-to-end learning on image-text and video-text tasks | 704 |
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 230 |
nickjiang2378/vl-interp | This project provides an official PyTorch implementation of a method to interpret and edit vision-language representations to mitigate hallucinations in image captions. | 31 |
jiutian-vl/jiutian-lion | This project integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations. | 121 |
shizhediao/davinci | An implementation of vision-language models for multimodal learning tasks, enabling generative vision-language models to be fine-tuned for various applications. | 43 |
xiadingz/video-caption.pytorch | PyTorch implementation of video captioning, combining deep learning and computer vision techniques. | 401 |
evolvinglmms-lab/longva | This project provides a model for long context transfer from language to vision using a deep learning framework. | 334 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,298 |