LanguageBind
Multimodal alignment model
Extends video-language pretraining to N modalities by aligning each modality's representations with language as the shared semantic space
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
751 stars
15 watching
52 forks
Language: Python
last commit: 10 months ago
Topics: language-central, multi-modal, pretraining, zero-shot
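The core idea is that language acts as the bind across modalities: each non-language encoder is trained (via a contrastive objective) so its embeddings land in the language embedding space, which reduces zero-shot cross-modal retrieval to a cosine-similarity lookup. Below is a minimal sketch of that scoring step, not the repository's actual API; the encoder stand-ins, feature dimensions, and variable names are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
EMBED_DIM = 512  # dimensionality of the shared language-anchored space (assumed)

# Stand-in encoders: in LanguageBind each modality has its own tower, trained so
# its outputs align with the language encoder. Here we fake the towers with
# linear projections over random features; the input sizes are arbitrary.
text_encoder = torch.nn.Linear(300, EMBED_DIM)    # caption features -> joint space
video_encoder = torch.nn.Linear(1024, EMBED_DIM)  # clip features -> joint space
audio_encoder = torch.nn.Linear(128, EMBED_DIM)   # spectrogram features -> joint space

def embed(encoder, feats):
    """Project modality features into the shared space and L2-normalize,
    so dot products below are cosine similarities."""
    return F.normalize(encoder(feats), dim=-1)

# One caption, one video clip, one audio clip (random stand-in features).
text = embed(text_encoder, torch.randn(1, 300))
video = embed(video_encoder, torch.randn(1, 1024))
audio = embed(audio_encoder, torch.randn(1, 128))

# Every modality is aligned to language, so similarity to the text embedding
# scores each modality directly; video and audio also become comparable to
# each other through the shared space, without a paired video-audio objective.
print("text-video similarity:", (text @ video.T).item())
print("text-audio similarity:", (text @ audio.T).item())
print("video-audio similarity:", (video @ audio.T).item())
```

Because each modality is aligned to language independently, a new modality can be added by training only its own tower against the frozen language encoder, and it becomes comparable to the existing modalities through the shared space.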
Related projects:
Repository | Description | Stars |
---|---|---|
pku-alignment/align-anything | Aligns large multimodal models with human intentions and values using various algorithms and fine-tuning methods. | 270 |
pku-yuangroup/video-bench | Evaluates and benchmarks large language models' video understanding capabilities. | 121 |
multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 15 |
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training. | 24 |
jshilong/gpt4roi | Trains and deploys large language models on computer vision tasks using region-of-interest inputs. | 517 |
lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation. | 1,568 |
shawn-ieitsystems/yuan-1.0 | Large-scale language model with improved performance on NLP tasks through distributed training and efficient data processing. | 591 |
lancopku/iais | Proposes a method for calibrating attention distributions in multimodal models to improve contextualized representations of image-text pairs. | 30 |
ethanyanjiali/minchatgpt | Demonstrates the effectiveness of reinforcement learning from human feedback (RLHF) in improving small language models such as GPT-2. | 214 |
pku-yuangroup/moe-llava | A large vision-language model using a mixture-of-experts architecture to improve performance on multi-modal learning tasks. | 2,023 |
pku-yuangroup/chat-univi | A framework for unified visual representation in image and video understanding models, enabling efficient training of large language models on multimodal data. | 895 |
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks. | 73 |
sihengli99/textbind | Enables large language models to generate multi-turn multimodal instruction-response conversations from image-caption pairs with minimal annotations. | 47 |
brightmart/xlnet_zh | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks. | 230 |
haozhezhao/mic | Develops a multimodal vision-language model to enable machines to understand complex relationships between instructions and images in various tasks. | 337 |