SuS-X
Vision-Language Model Trainer
This open-source project proposes a method for adapting large-scale vision-language models to downstream tasks with minimal resources and without any training or fine-tuning.
Code for the paper: "SuS-X: Training-Free Name-Only Transfer of Vision-Language Models" [ICCV'23]
94 stars
3 watching
5 forks
Language: Python
last commit: over 1 year ago
Related projects:
Repository | Description | Stars |
---|---|---|
vlf-silkie/vlfeedback | An annotated preference dataset and training framework for improving large vision language models. | 88 |
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,145 |
baai-wudao/brivl | Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications | 279 |
llava-vl/llava-plus-codebase | A platform for training and deploying large language and vision models that can use tools to perform tasks | 717 |
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,299 |
shizhediao/davinci | Implementing a unified modal learning framework for generative vision-language models | 43 |
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 246 |
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270 |
ucsc-vlaa/sight-beyond-text | An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models | 19 |
openai/finetune-transformer-lm | Provides code and a model for improving language understanding through generative pre-training with a transformer-based architecture. | 2,167 |
jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 517 |
csuhan/onellm | A framework for training and fine-tuning multimodal language models on various data types | 601 |
maxpumperla/elephas | Enables distributed deep learning with Keras and Spark for scalable model training | 1,574 |
byungkwanlee/collavo | Develops a PyTorch implementation of an enhanced vision language model | 93 |