IAIS

Attention calibrator

This project proposes a novel method for calibrating attention distributions in multimodal models to improve contextualized representations of image-text pairs.
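The description above is high-level. The sketch below is a rough, hypothetical illustration of what comparing attention distributions across modalities can look like. It is not the IAIS implementation; the exact formulation is defined in the paper and repository. It assumes three row-stochastic inputs: a text self-attention matrix (n x n), an image self-attention matrix (m x m), and a soft text-to-image alignment matrix (n x m). The image attention is projected into text-token space as A·S_V·Aᵀ, its rows are renormalized, and the mismatch is scored with a row-wise symmetric KL divergence. All function names (`attention_distance`, `kl_divergence`, `normalize_rows`, and so on) are invented for this example.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (r x k) matrix and a (k x c) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def normalize_rows(a: Matrix, eps: float = 1e-12) -> Matrix:
    """Rescale each row so it sums to 1 (a probability distribution)."""
    out = []
    for row in a:
        total = max(sum(row), eps)
        out.append([x / total for x in row])
    return out


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def attention_distance(text_attn: Matrix, image_attn: Matrix, align: Matrix) -> float:
    """Hypothetical mismatch score between text and image self-attention.

    text_attn:  n x n text self-attention (rows sum to 1)
    image_attn: m x m image self-attention (rows sum to 1)
    align:      n x m soft text-to-image alignment (rows sum to 1)
    """
    # Map image self-attention into text-token space: A @ S_V @ A^T (n x n).
    # Rows are renormalized because the columns of A need not sum to 1.
    projected = normalize_rows(matmul(matmul(align, image_attn), transpose(align)))
    # Symmetric KL per text token, averaged over the n tokens.
    per_row = [
        0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
        for p, q in zip(text_attn, projected)
    ]
    return sum(per_row) / len(per_row)


if __name__ == "__main__":
    text = [[0.9, 0.1], [0.1, 0.9]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(attention_distance(text, text, identity))  # prints 0.0: identical patterns
```

A score like this could, under these assumptions, be minimized as an auxiliary term during training so that the two modalities attend in consistent ways. Whether and how IAIS uses such a term should be checked against the paper.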

[ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval


30 stars

4 watching

4 forks

Language: Python

last commit: about 3 years ago

Topics: multimodal, retrieval, vision-and-language

arxiv.org/abs/2105.13868

Related projects:

Repository	Description	Stars
pku-yuangroup/languagebind	Extends video-language pretraining to additional modalities by aligning each of them to language representations	751
mshukor/evalign-icl	Evaluating and improving large multimodal models through in-context learning	21
pku-alignment/align-anything	Aligns large multimodal models with human intentions and values using various algorithms and fine-tuning methods.	270
hekj/fda	This project proposes a novel data augmentation technique to enhance visual-textual matching in vision-and-language navigation tasks.	13
aidc-ai/ovis	An MLLM architecture designed to align visual and textual embeddings through structural alignment	575
isekai-portal/link-context-learning	An implementation of a multimodal learning approach to improve language models' ability to recognize unseen images and understand novel concepts.	91
ifl-camp/easy_handeye	Automated calibration tool for robotic vision systems	893
pkunlp-icler/pca-eval	An open-source benchmark and evaluation tool for assessing multimodal large language models' performance in embodied decision-making tasks	99
szagoruyko/attention-transfer	Improves performance of convolutional neural networks by training student models to match the attention maps of teacher models.	1,449
bryanplummer/pl-clc	This implementation provides a framework for phrase localization and visual relationship detection using comprehensive image-language cues.	39
jiasenlu/adaptiveattention	Adaptive attention mechanism for image captioning using visual sentinels	335
byungkwanlee/moai	Improves performance of vision language tasks by integrating computer vision capabilities into large language models	314
tiger-ai-lab/uniir	Trains and evaluates a universal multimodal retrieval model to perform various information retrieval tasks.	114
mop/bier	This project implements a deep metric learning framework using an adversarial auxiliary loss to improve robustness.	39
megvii-research/tlc	Improves image restoration performance by converting global operations to local ones during inference	231