IAIS
Attention calibrator
This project proposes a novel method for calibrating attention distributions in multimodal models to improve contextualized representations of image-text pairs.
[ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval
30 stars
4 watching
4 forks
Language: Python
Last commit: over 1 year ago
Topics: multimodal, retrieval, vision-and-language
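The description above talks about calibrating attention distributions for image-text pairs. What follows is a minimal, hypothetical sketch of what such a calibration penalty could look like. It is not taken from this repository's code. It assumes that the text self-attention is compared against the visual self-attention after projecting the latter into text space through a text-to-image attention matrix, with a row-wise KL divergence as the distance. All function names (`softmax_rows`, `project_visual_attention`, `attention_calibration_loss`, etc.) are illustrative only and are not the project's API.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax_rows(scores: Matrix) -> Matrix:
    """Row-wise softmax: turn raw attention scores into distributions."""
    out = []
    for row in scores:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def normalize_rows(a: Matrix, eps: float = 1e-12) -> Matrix:
    """Rescale each row to sum to 1 (eps guards against an all-zero row)."""
    result = []
    for row in a:
        total = sum(row) + eps
        result.append([x / total for x in row])
    return result


def project_visual_attention(cross: Matrix, visual_self: Matrix) -> Matrix:
    """Hypothetical projection: map an m x m visual self-attention into the
    n x n text space through the n x m text-to-image attention C, i.e.
    P = C @ A_v @ C^T, re-normalized per row because the column sums of C
    are generally not 1."""
    return normalize_rows(matmul(matmul(cross, visual_self), transpose(cross)))


def mean_row_kl(p: Matrix, q: Matrix, eps: float = 1e-12) -> float:
    """Average over rows of KL(p_i || q_i); zero when the distributions match."""
    total = 0.0
    for p_row, q_row in zip(p, q):
        total += sum(pi * math.log((pi + eps) / (qi + eps))
                     for pi, qi in zip(p_row, q_row))
    return total / len(p)


def attention_calibration_loss(text_self: Matrix, visual_self: Matrix,
                               cross: Matrix) -> float:
    """Hypothetical regularizer: penalize disagreement between the text
    self-attention and the visual self-attention projected into text space."""
    return mean_row_kl(text_self, project_visual_attention(cross, visual_self))


if __name__ == "__main__":
    # Toy example: 2 text tokens, 3 image regions, made-up raw scores.
    text_self = softmax_rows([[2.0, 0.5], [0.3, 1.5]])
    visual_self = softmax_rows([[1.0, 0.2, 0.1],
                                [0.2, 1.0, 0.4],
                                [0.1, 0.4, 1.0]])
    cross = softmax_rows([[1.2, 0.1, 0.3], [0.2, 0.9, 1.1]])
    print(attention_calibration_loss(text_self, visual_self, cross))
```

In a training setup, a term like this would presumably be weighted and added to the retrieval objective. Plain Python lists keep the sketch dependency-free; a practical version would operate on batched tensors taken from the model's attention layers.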
Related projects:
Repository | Description | Stars |
---|---|---|
pku-yuangroup/languagebind | Extending pretraining models to handle multiple modalities by aligning language and video representations | 723 |
mshukor/evalign-icl | Evaluating and improving large multimodal models through in-context learning | 20 |
pku-alignment/align-anything | Aligns large models with human values and intentions across various modalities. | 244 |
hekj/fda | This project proposes a novel data augmentation technique to enhance visual-textual matching in vision-and-language navigation tasks. | 13 |
aidc-ai/ovis | An architecture designed to align visual and textual embeddings in multimodal learning | 517 |
isekai-portal/link-context-learning | An implementation of a multimodal learning approach to improve language models' ability to recognize unseen images and understand novel concepts. | 89 |
ifl-camp/easy_handeye | Automated calibration tool for robotic vision systems | 871 |
pkunlp-icler/pca-eval | An open-source benchmark and evaluation tool for assessing multimodal large language models' performance in embodied decision-making tasks | 100 |
szagoruyko/attention-transfer | Improves the performance of convolutional neural networks by transferring attention maps from teacher models to student models. | 1,444 |
bryanplummer/pl-clc | This implementation provides a framework for phrase localization and visual relationship detection using comprehensive image-language cues. | 39 |
jiasenlu/adaptiveattention | Adaptive attention mechanism for image captioning using visual sentinels | 334 |
byungkwanlee/moai | Improves the performance of vision-language tasks by integrating computer vision capabilities into large language models | 311 |
tiger-ai-lab/uniir | Trains and evaluates a universal multimodal retrieval model to perform various information retrieval tasks. | 110 |
mop/bier | This project implements a deep metric learning framework using an adversarial auxiliary loss to improve robustness. | 39 |
megvii-research/tlc | Improves image restoration performance by converting global operations to local ones during inference | 231 |