DaVinci

Vision-Language Model Framework

Implementing a unified modal learning framework for generative vision-language models

Source code for the paper "Prefix Language Models are Unified Modal Learners"

GitHub

43 stars
10 watching
3 forks
Language: Jupyter Notebook
last commit: over 1 year ago

Related projects:

Repository Description Stars
nvlabs/prismer A deep learning framework for training multi-modal models with vision and language capabilities. 1,299
kaiyangzhou/dassl.pytorch A PyTorch toolbox for supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision. 1,236
hxyou/idealgpt A deep learning framework for iteratively decomposing vision and language reasoning via large language models. 32
yiren-jian/blitext Develops and trains models for vision-language learning with decoupled language pre-training 24
vlf-silkie/vlfeedback An annotated preference dataset and training framework for improving large vision language models. 88
yuxie11/r2d2 A framework for large-scale cross-modal benchmarks and vision-language tasks in Chinese 157
byungkwanlee/collavo Develops a PyTorch implementation of an enhanced vision language model 93
zhourax/vega Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. 33
donnyyou/pytorchcv A PyTorch-based framework for building and training deep learning models in computer vision. 47
anuragranj/back2future.pytorch An implementation of unsupervised learning for multi-frame optical flow with occlusions using PyTorch. 112
woozzu/dong_iccv_2017 An implementation of semantic image synthesis via adversarial learning using PyTorch 145
baaivision/eve A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities 246
yassouali/pytorch-segmentation Implementation of semantic segmentation models and datasets using PyTorch 1,705
zijundeng/pytorch-semantic-segmentation Provides PyTorch implementations of various models and pipelines for semantic segmentation in deep learning. 1,729
vishaal27/sus-x This is an open-source project that proposes a novel method to train large-scale vision-language models with minimal resources and no fine-tuning required. 94