FDA

Navigation assistant

This project proposes a novel data augmentation technique to enhance visual-textual matching in vision-and-language navigation tasks.

Official implementation of Frequency-enhanced Data Augmentation for Vision-and-Language Navigation (NeurIPS 2023)

13 stars
3 watching
0 forks
Language: Python
Last commit: 11 months ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| lancopku/iais | This project proposes a novel method for calibrating attention distributions in multimodal models to improve contextualized representations of image-text pairs. | 30 |
| byungkwanlee/moai | Improves performance of vision language tasks by integrating computer vision capabilities into large language models. | 311 |
| iamwangyunkai/carla_py | Generates data for CARLA's visual navigation system using raw camera images and instructions. | 8 |
| aheze/accessiblereality | An AR navigation aid for visually impaired individuals. | 26 |
| pku-yuangroup/languagebind | Extending pretraining models to handle multiple modalities by aligning language and video representations. | 723 |
| microsoft/llava-med | A research project aimed at building large language and vision models for biomedical applications with capabilities comparable to GPT-4. | 1,556 |
| megvii-research/tlc | Improves image restoration performance by converting global operations to local ones during inference. | 231 |
| hofbi/mv-roi | A tool for annotating and labeling data for autonomous driving applications using semi-supervised machine learning. | 1 |
| lalbj/pai | Improves the performance of large language models by intervening in their internal workings to reduce hallucinations. | 67 |
| kentonishi/augmentation-for-lnl | Provides a framework for learning with noisy labels using data augmentation strategies. | 113 |
| hit-scir/elmoformanylangs | Provides pre-trained ELMo representations for multiple languages to improve NLP tasks. | 1,463 |
| vita-epfl/crowdnav | Develops robot navigation policies in crowded spaces using reinforcement learning and attention mechanisms. | 598 |
| nkasmanoff/pi-card | An offline voice assistant built on Raspberry Pi using AI and natural language processing. | 736 |
| jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs. | 506 |
| dfki-interactive-machine-learning/arasif | Provides sentence embeddings for Arabic languages using pre-trained word embeddings and Smooth Inverse Frequency algorithm. | 5 |