SuS-X

Vision-Language Model Trainer

This is an open-source project that proposes a novel method to train large-scale vision-language models with minimal resources and no fine-tuning required.

Code for the paper: "SuS-X: Training-Free Name-Only Transfer of Vision-Language Models" [ICCV'23]

GitHub

94 stars

3 watching

5 forks

Language: Python

last commit: almost 3 years ago

vishaal27.github.io/SuS-X-webpage/

Related projects:

Repository	Description	Stars
vlf-silkie/vlfeedback	An annotated preference dataset and training framework for improving large vision language models.	88
deepseek-ai/deepseek-vl	A multimodal AI model that enables real-world vision-language understanding applications	2,145
baai-wudao/brivl	Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications	279
llava-vl/llava-plus-codebase	A platform for training and deploying large language and vision models that can use tools to perform tasks	717
yiren-jian/blitext	Develops and trains models for vision-language learning with decoupled language pre-training	24
nvlabs/prismer	A deep learning framework for training multi-modal models with vision and language capabilities.	1,299
shizhediao/davinci	Implementing a unified modal learning framework for generative vision-language models	43
baaivision/eve	A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities	246
vpgtrans/vpgtrans	Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs	270
ucsc-vlaa/sight-beyond-text	An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models	19
openai/finetune-transformer-lm	This project provides code and model for improving language understanding through generative pre-training using a transformer-based architecture.	2,167
jshilong/gpt4roi	Training and deploying large language models on computer vision tasks using region-of-interest inputs	517
csuhan/onellm	A framework for training and fine-tuning multimodal language models on various data types	601
maxpumperla/elephas	Enables distributed deep learning with Keras and Spark for scalable model training	1,574
byungkwanlee/collavo	Develops a PyTorch implementation of an enhanced vision language model	93