FDA
Navigation assistant
This project proposes a novel data augmentation technique to enhance visual-textual matching in vision-and-language navigation tasks.
Official implementation of Frequency-enhanced Data Augmentation for Vision-and-Language Navigation (NeurIPS 2023)
13 stars
3 watching
0 forks
Language: Python
Last commit: 11 months ago
Related projects:
Repository | Description | Stars |
---|---|---|
lancopku/iais | This project proposes a novel method for calibrating attention distributions in multimodal models to improve contextualized representations of image-text pairs. | 30 |
byungkwanlee/moai | Improves performance of vision language tasks by integrating computer vision capabilities into large language models | 311 |
iamwangyunkai/carla_py | Generates data for CARLA's visual navigation system using raw camera images and instructions. | 8 |
aheze/accessiblereality | An AR navigation aid for visually impaired individuals. | 26 |
pku-yuangroup/languagebind | Extending pretraining models to handle multiple modalities by aligning language and video representations | 723 |
microsoft/llava-med | A research project aimed at building large language and vision models for biomedical applications with capabilities comparable to GPT-4. | 1,556 |
megvii-research/tlc | Improves image restoration performance by converting global operations to local ones during inference | 231 |
hofbi/mv-roi | A tool for annotating and labeling data for autonomous driving applications using semi-supervised machine learning | 1 |
lalbj/pai | Improves the performance of large language models by intervening in their internal workings to reduce hallucinations | 67 |
kentonishi/augmentation-for-lnl | Provides a framework for learning with noisy labels using data augmentation strategies. | 113 |
hit-scir/elmoformanylangs | Provides pre-trained ELMo representations for multiple languages to improve NLP tasks. | 1,463 |
vita-epfl/crowdnav | Develops robot navigation policies in crowded spaces using reinforcement learning and attention mechanisms. | 598 |
nkasmanoff/pi-card | An offline voice assistant built on Raspberry Pi using AI and natural language processing | 736 |
jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 506 |
dfki-interactive-machine-learning/arasif | Provides sentence embeddings for Arabic using pre-trained word embeddings and the Smooth Inverse Frequency algorithm | 5 |