LanguageBind

Multimodal alignment model

Extends video-language pretraining to additional modalities (audio, depth, thermal, image) by binding each modality's representation to a shared language-defined embedding space

[ICLR 2024] LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

GitHub

723 stars
15 watching
52 forks
Language: Python
last commit: 8 months ago
Topics: language-central, multi-modal, pretraining, zero-shot
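
The core idea behind language-based semantic alignment is to keep a pretrained language tower frozen and train each new modality encoder against it with a contrastive objective, so every modality is bound to the same language-anchored embedding space. Below is a minimal illustrative sketch in plain PyTorch; the `ModalityEncoder` class, the feature dimensions, and the `contrastive_alignment_loss` helper are assumptions made for clarity, not the repository's actual API.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ModalityEncoder(nn.Module):
    """Stand-in for a per-modality encoder (video/audio/depth/thermal).
    LanguageBind uses ViT-style towers; this MLP is only a placeholder."""

    def __init__(self, in_dim: int, embed_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.GELU(), nn.Linear(1024, embed_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unit-normalize so dot products below are cosine similarities.
        return F.normalize(self.net(x), dim=-1)


def contrastive_alignment_loss(modality_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: pulls each modality embedding toward the
    embedding of its paired caption (the frozen language anchor)."""
    logits = modality_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2


# Toy training step. The language tower stays frozen, so each modality
# encoder learns to map into the same language-defined semantic space.
if __name__ == "__main__":
    batch, feat_dim, embed_dim = 8, 2048, 512
    video_encoder = ModalityEncoder(in_dim=feat_dim, embed_dim=embed_dim)
    video_feats = torch.randn(batch, feat_dim)  # dummy video features
    # Placeholder for frozen text-tower output for the paired captions.
    text_emb = F.normalize(torch.randn(batch, embed_dim), dim=-1)

    loss = contrastive_alignment_loss(video_encoder(video_feats), text_emb)
    loss.backward()  # gradients flow only into the modality encoder
    print(f"alignment loss: {loss.item():.4f}")
```

Because the language embeddings never move, alignment learned for one modality transfers to the others, which is what enables the zero-shot cross-modal retrieval the project advertises.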

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| pku-alignment/align-anything | Aligns large models with human values and intentions across various modalities | 244 |
| pku-yuangroup/video-bench | Evaluates and benchmarks large language models' video understanding capabilities | 117 |
| multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 14 |
| yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pretraining | 24 |
| jshilong/gpt4roi | Trains and deploys large language models on computer vision tasks using region-of-interest inputs | 506 |
| lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,550 |
| shawn-ieitsystems/yuan-1.0 | A large-scale language model with improved performance on NLP tasks through distributed training and efficient data processing | 591 |
| lancopku/iais | Proposes a method for calibrating attention distributions in multimodal models to improve contextualized representations of image-text pairs | 30 |
| ethanyanjiali/minchatgpt | Demonstrates the effectiveness of reinforcement learning from human feedback (RLHF) in improving small language models such as GPT-2 | 213 |
| pku-yuangroup/moe-llava | Develops a mixture-of-experts architecture for multi-modal learning with large vision-language models | 1,980 |
| pku-yuangroup/chat-univi | A framework for unified visual representation in image and video understanding, enabling efficient training of large language models on multimodal data | 847 |
| pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural-language and visual capabilities, with fine-tuning for various tasks | 72 |
| sihengli99/textbind | Enables large language models to generate multi-turn multimodal instruction-response conversations from image-caption pairs with minimal annotations | 48 |
| brightmart/xlnet_zh | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230 |
| haozhezhao/mic | Develops a multimodal vision-language model that enables machines to understand complex relationships between instructions and images across various tasks | 334 |