BriVL

Vision-Language Bridge

Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications

Bridging Vision and Language Model


279 stars
4 watching
31 forks
Language: Python
last commit: over 1 year ago

Related projects:

yiren-jian/blitext: Develops and trains models for vision-language learning with decoupled language pre-training (24 stars)
baaivision/eve: A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities (230 stars)
baai-wudao/model: A repository of pre-trained language models for various tasks and domains (121 stars)
vishaal27/sus-x: Proposes a method for training large-scale vision-language models with minimal resources and no fine-tuning (94 stars)
byungkwanlee/moai: Improves performance on vision-language tasks by integrating computer vision capabilities into large language models (311 stars)
nvlabs/prismer: A deep learning framework for training multimodal models with vision and language capabilities (1,298 stars)
zhuiyitechnology/pretrained-models: A collection of pre-trained language models for natural language processing tasks (987 stars)
brightmart/xlnet_zh: Trains a large Chinese language model on massive data and provides the pre-trained model for downstream tasks (230 stars)
deepseek-ai/deepseek-vl: A multimodal AI model for real-world vision-language understanding applications (2,077 stars)
pku-yuangroup/languagebind: Extends pretraining models to handle multiple modalities by aligning language and video representations (723 stars)
vlf-silkie/vlfeedback: An annotated preference dataset and training framework for improving large vision-language models (85 stars)
openai/finetune-transformer-lm: Provides code and a model for improving language understanding through generative pre-training with a transformer-based architecture (2,160 stars)
yuxie11/r2d2: A framework for large-scale Chinese cross-modal benchmarks and vision-language tasks (157 stars)
meituan-automl/mobilevlm: A vision-language model designed for mobile devices, using a lightweight downsample projector and pre-trained language models (1,039 stars)
byungkwanlee/collavo: A PyTorch implementation of an enhanced vision-language model (93 stars)