IAIS
Attention calibrator
This project proposes a novel method for calibrating attention distributions in multimodal models to improve contextualized representations of image-text pairs.
[ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval
30 stars
4 watching
4 forks
Language: Python
Last commit: almost 2 years ago
Topics: multimodal, retrieval, vision-and-language
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Extending pretraining models to handle multiple modalities by aligning language and video representations | 751 |
| | Evaluating and improving large multimodal models through in-context learning | 21 |
| | Aligns large multimodal models with human intentions and values using various algorithms and fine-tuning methods. | 270 |
| | This project proposes a novel data augmentation technique to enhance visual-textual matching in vision-and-language navigation tasks. | 13 |
| | An MLLM architecture designed to align visual and textual embeddings through structural alignment | 575 |
| | An implementation of a multimodal learning approach to improve language models' ability to recognize unseen images and understand novel concepts. | 91 |
| | Automated calibration tool for robotic vision systems | 893 |
| | An open-source benchmark and evaluation tool for assessing multimodal large language models' performance in embodied decision-making tasks | 99 |
| | Improves performance of convolutional neural networks by transferring knowledge from teacher models to student models using attention mechanisms. | 1,449 |
| | This implementation provides a framework for phrase localization and visual relationship detection using comprehensive image-language cues. | 39 |
| | Adaptive attention mechanism for image captioning using visual sentinels | 335 |
| | Improves performance of vision language tasks by integrating computer vision capabilities into large language models | 314 |
| | Trains and evaluates a universal multimodal retrieval model to perform various information retrieval tasks. | 114 |
| | This project implements a deep metric learning framework using an adversarial auxiliary loss to improve robustness. | 39 |
| | Improves image restoration performance by converting global operations to local ones during inference | 231 |