IAIS

Attention calibrator

This project proposes a novel method for calibrating attention distributions in multimodal models to improve contextualized representations of image-text pairs.

[ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval

GitHub

30 stars
4 watching
4 forks
Language: Python
Last commit: over 1 year ago
multimodal, retrieval, vision-and-language

Related projects:

Repository | Description | Stars
pku-yuangroup/languagebind | Extending pretraining models to handle multiple modalities by aligning language and video representations | 723
mshukor/evalign-icl | Evaluating and improving large multimodal models through in-context learning | 20
pku-alignment/align-anything | Aligns large models with human values and intentions across various modalities | 244
hekj/fda | This project proposes a novel data augmentation technique to enhance visual-textual matching in vision-and-language navigation tasks | 13
aidc-ai/ovis | An architecture designed to align visual and textual embeddings in multimodal learning | 517
isekai-portal/link-context-learning | An implementation of a multimodal learning approach to improve language models' ability to recognize unseen images and understand novel concepts | 89
ifl-camp/easy_handeye | Automated calibration tool for robotic vision systems | 871
pkunlp-icler/pca-eval | An open-source benchmark and evaluation tool for assessing multimodal large language models' performance in embodied decision-making tasks | 100
szagoruyko/attention-transfer | Improves performance of convolutional neural networks by transferring knowledge from teacher models to student models using attention mechanisms | 1,444
bryanplummer/pl-clc | This implementation provides a framework for phrase localization and visual relationship detection using comprehensive image-language cues | 39
jiasenlu/adaptiveattention | Adaptive attention mechanism for image captioning using visual sentinels | 334
byungkwanlee/moai | Improves performance of vision language tasks by integrating computer vision capabilities into large language models | 311
tiger-ai-lab/uniir | Trains and evaluates a universal multimodal retrieval model to perform various information retrieval tasks | 110
mop/bier | This project implements a deep metric learning framework using an adversarial auxiliary loss to improve robustness | 39
megvii-research/tlc | Improves image restoration performance by converting global operations to local ones during inference | 231