DaVinci
Vision-Language Model Framework
Implementing a unified modal learning framework for generative vision-language models
Source code for the paper "Prefix Language Models are Unified Modal Learners"
43 stars
10 watching
3 forks
Language: Jupyter Notebook
last commit: almost 2 years ago
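As a rough illustration of the prefix language model idea named in the paper title (not code taken from this repository), the sketch below builds the attention mask at the heart of the approach: prefix tokens (e.g. image patches plus a text prompt) attend to each other bidirectionally, while the remaining text tokens are generated left-to-right. The function name and the PyTorch framing are assumptions made for illustration only.

```python
# Minimal sketch of a prefix-LM attention mask (assumed illustration,
# not the DaVinci implementation). Prefix positions are fully visible
# to every token; suffix positions follow a standard causal pattern.
import torch

def prefix_lm_mask(prefix_len: int, total_len: int) -> torch.Tensor:
    """Boolean mask of shape (total_len, total_len); True = attention allowed."""
    mask = torch.tril(torch.ones(total_len, total_len, dtype=torch.bool))  # causal base
    mask[:, :prefix_len] = True  # every position may attend to the whole prefix
    return mask

# Example: a 4-token prefix (e.g. visual context) followed by 3 generated text tokens.
print(prefix_lm_mask(prefix_len=4, total_len=7).int())
```

In a prefix LM, the training loss is typically applied only to the suffix tokens, so a single model can act as a bidirectional encoder over the multimodal prefix and an autoregressive decoder over the generated text.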
Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,299 |
| | A PyTorch toolbox supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision. | 1,236 |
| | A deep learning framework for iteratively decomposing vision and language reasoning via large language models. | 32 |
| | Develops and trains models for vision-language learning with decoupled language pre-training. | 24 |
| | An annotated preference dataset and training framework for improving large vision-language models. | 88 |
| | A framework for large-scale cross-modal benchmarks and vision-language tasks in Chinese. | 157 |
| | A PyTorch implementation of an enhanced vision-language model. | 93 |
| | A multimodal task and dataset for assessing vision-language models' ability to handle interleaved image-text inputs. | 33 |
| | A PyTorch-based framework for building and training deep learning models in computer vision. | 47 |
| | A PyTorch implementation of unsupervised learning for multi-frame optical flow with occlusions. | 112 |
| | A PyTorch implementation of semantic image synthesis via adversarial learning. | 145 |
| | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities. | 246 |
| | Implementations of semantic segmentation models and datasets using PyTorch. | 1,705 |
| | PyTorch implementations of various models and pipelines for semantic segmentation in deep learning. | 1,729 |
| | An open-source project proposing a method to train large-scale vision-language models with minimal resources and no fine-tuning required. | 94 |