BriVL
Vision-Language Bridge
Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications
Bridging Vision and Language Model
279 stars
4 watching
31 forks
Language: Python
last commit: over 1 year ago Related projects:
Repository | Description | Stars |
---|---|---|
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 230 |
baai-wudao/model | A repository of pre-trained language models for various tasks and domains. | 121 |
vishaal27/sus-x | This is an open-source project that proposes a novel method to train large-scale vision-language models with minimal resources and no fine-tuning required. | 94 |
byungkwanlee/moai | Improves performance of vision language tasks by integrating computer vision capabilities into large language models | 311 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,298 |
zhuiyitechnology/pretrained-models | A collection of pre-trained language models for natural language processing tasks | 987 |
brightmart/xlnet_zh | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230 |
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,077 |
pku-yuangroup/languagebind | Extending pretraining models to handle multiple modalities by aligning language and video representations | 723 |
vlf-silkie/vlfeedback | An annotated preference dataset and training framework for improving large vision language models. | 85 |
openai/finetune-transformer-lm | This project provides code and model for improving language understanding through generative pre-training using a transformer-based architecture. | 2,160 |
yuxie11/r2d2 | A framework for large-scale cross-modal benchmarks and vision-language tasks in Chinese | 157 |
meituan-automl/mobilevlm | An implementation of a vision language model designed for mobile devices, utilizing a lightweight downsample projector and pre-trained language models. | 1,039 |
byungkwanlee/collavo | Develops a PyTorch implementation of an enhanced vision language model | 93 |