FDA

Navigation assistant

This project proposes a novel data augmentation technique to enhance visual-textual matching in vision-and-language navigation tasks.

Official implementation of "Frequency-enhanced Data Augmentation for Vision-and-Language Navigation" (NeurIPS 2023)

13 stars
3 watching
0 forks
Language: Python
Last commit: about 1 year ago

Related projects:

Repository | Description | Stars
lancopku/iais | This project proposes a novel method for calibrating attention distributions in multimodal models to improve contextualized representations of image-text pairs. | 30
byungkwanlee/moai | Improves performance on vision-language tasks by integrating computer vision capabilities into large language models. | 314
iamwangyunkai/carla_py | Generates data for CARLA's visual navigation system using raw camera images and instructions. | 8
aheze/accessiblereality | An AR navigation aid for visually impaired individuals. | 26
pku-yuangroup/languagebind | Extends pretrained models to handle multiple modalities by aligning language and video representations. | 751
microsoft/llava-med | A research project aimed at building large language and vision models for biomedical applications with capabilities comparable to GPT-4. | 1,622
megvii-research/tlc | Improves image restoration performance by converting global operations to local ones during inference. | 231
hofbi/mv-roi | A tool for annotating and labeling data for autonomous driving applications using semi-supervised machine learning. | 1
lalbj/pai | Improves the performance of large language models by intervening in their internal workings to reduce hallucinations. | 83
kentonishi/augmentation-for-lnl | Provides a framework for learning with noisy labels using data augmentation strategies. | 113
hit-scir/elmoformanylangs | Provides pre-trained ELMo representations for multiple languages to improve NLP tasks. | 1,462
vita-epfl/crowdnav | Develops robot navigation policies in crowded spaces using reinforcement learning and attention mechanisms. | 607
nkasmanoff/pi-card | An AI-powered conversational assistant built on top of a Raspberry Pi. | 747
jshilong/gpt4roi | Trains and deploys large language models on computer vision tasks using region-of-interest inputs. | 517
dfki-interactive-machine-learning/arasif | Provides sentence embeddings for Arabic using pre-trained word embeddings and the Smooth Inverse Frequency algorithm. | 5