DaVinci

Vision-Language Model Framework

Implementing a unified modal learning framework for generative vision-language models

Source code for the paper "Prefix Language Models are Unified Modal Learners"

GitHub

43 stars

10 watching

3 forks

Language: Jupyter Notebook

last commit: over 3 years ago

Related projects:

Repository	Description	Stars
nvlabs/prismer	A deep learning framework for training multi-modal models with vision and language capabilities.	1,299
kaiyangzhou/dassl.pytorch	A PyTorch toolbox for supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision.	1,236
hxyou/idealgpt	A deep learning framework for iteratively decomposing vision and language reasoning via large language models.	32
yiren-jian/blitext	Develops and trains models for vision-language learning with decoupled language pre-training	24
vlf-silkie/vlfeedback	An annotated preference dataset and training framework for improving large vision language models.	88
yuxie11/r2d2	A framework for large-scale cross-modal benchmarks and vision-language tasks in Chinese	157
byungkwanlee/collavo	Develops a PyTorch implementation of an enhanced vision language model	93
zhourax/vega	Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs.	33
donnyyou/pytorchcv	A PyTorch-based framework for building and training deep learning models in computer vision.	47
anuragranj/back2future.pytorch	An implementation of unsupervised learning for multi-frame optical flow with occlusions using PyTorch.	112
woozzu/dong_iccv_2017	An implementation of semantic image synthesis via adversarial learning using PyTorch	145
baaivision/eve	A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities	246
yassouali/pytorch-segmentation	Implementation of semantic segmentation models and datasets using PyTorch	1,705
zijundeng/pytorch-semantic-segmentation	Provides PyTorch implementations of various models and pipelines for semantic segmentation in deep learning.	1,729
vishaal27/sus-x	This is an open-source project that proposes a novel method to train large-scale vision-language models with minimal resources and no fine-tuning required.	94