bottom-up-attention-vqa
VQA system
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge: a VQA system built on bottom-up attention, with an emphasis on training efficiency and speed.
754 stars
34 watching
181 forks
Language: Python
last commit: 9 months ago
Topics: bottom-up-attention, pytorch, vqa
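The model follows the bottom-up / top-down attention design: an object detector (Faster R-CNN in the original work) extracts region-level image features bottom-up, and the question drives a top-down attention over those regions before a joint embedding feeds an answer classifier. The sketch below illustrates that top-down step in PyTorch. It is a minimal illustration under assumed dimensions, not the repository's actual code; the class name, layer sizes, and hyperparameters are all placeholders.

```python
import torch
import torch.nn as nn

class BottomUpTopDownVQA(nn.Module):
    """Minimal sketch of a bottom-up / top-down attention VQA model.

    Region features (the "bottom-up" part) are assumed to be precomputed
    by an object detector such as Faster R-CNN; this module only sketches
    the "top-down" question-guided attention and the answer classifier.
    All dimensions are illustrative, not the repository's exact values.
    """

    def __init__(self, vocab_size=10000, word_dim=300, hidden_dim=512,
                 feat_dim=2048, num_answers=3000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, word_dim, padding_idx=0)
        self.gru = nn.GRU(word_dim, hidden_dim, batch_first=True)
        # Score each region from the concatenated [region feature; question].
        self.att = nn.Sequential(
            nn.Linear(feat_dim + hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )
        self.q_proj = nn.Linear(hidden_dim, hidden_dim)
        self.v_proj = nn.Linear(feat_dim, hidden_dim)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim * 2),
            nn.ReLU(),
            nn.Linear(hidden_dim * 2, num_answers),
        )

    def forward(self, v, q):
        # v: (B, K, feat_dim) precomputed region features
        # q: (B, T) question token ids
        _, h = self.gru(self.embed(q))               # h: (1, B, hidden_dim)
        q_enc = h.squeeze(0)                         # (B, hidden_dim)
        q_tiled = q_enc.unsqueeze(1).expand(-1, v.size(1), -1)
        logits = self.att(torch.cat([v, q_tiled], dim=2))  # (B, K, 1)
        weights = torch.softmax(logits, dim=1)       # attention over regions
        v_att = (weights * v).sum(dim=1)             # attended image feature
        # Element-wise product fuses the question and image representations.
        joint = self.q_proj(q_enc) * self.v_proj(v_att)
        return self.classifier(joint)

if __name__ == "__main__":
    model = BottomUpTopDownVQA()
    v = torch.randn(2, 36, 2048)      # 36 detected regions per image
    q = torch.randint(1, 10000, (2, 14))
    print(model(v, q).shape)          # torch.Size([2, 3000])
```

The element-wise product is a deliberately simple multimodal fusion in the spirit of the winning entry; heavier fusions (e.g. bilinear pooling, as in vqa-mcb below) trade speed for capacity.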
Related projects:
Repository | Description | Stars |
---|---|---|
cadene/vqa.pytorch | A PyTorch implementation of visual question answering with multimodal representation learning. | 716 |
peteanderson80/bottom-up-attention | Trains a bottom-up attention model using Faster R-CNN and Visual Genome annotations for image captioning and VQA tasks. | 1,433 |
hyeonwoonoh/vqa-transfer-externaldata | Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source. | 20 |
markdtw/vqa-winner-cvprw-2017 | Implementations and tools for training and fine-tuning a visual question answering model based on the 2017 CVPR workshop winner's approach. | 164 |
jayleicn/tvqa | A PyTorch implementation of a video question answering system based on the TVQA dataset. | 172 |
jiasenlu/hiecoattenvqa | A framework for training Hierarchical Co-Attention models for visual question answering using preprocessed data and a specific image model. | 349 |
guoyang9/unk-vqa | A VQA dataset with unanswerable questions, designed to test the limits of large models' knowledge and reasoning abilities. | 2 |
milvlg/prophet | An implementation of a two-stage framework that prompts large language models with answer heuristics for knowledge-based visual question answering. | 267 |
penghao-wu/vstar | A PyTorch implementation of a guided visual search mechanism for multimodal LLMs. | 527 |
xiaoman-zhang/pmc-vqa | A medical visual question answering dataset and toolkit for training models to understand medical images and instructions. | 174 |
henryjunw/tag | A Python-based system for generating visual question-answer pairs using text-aware approaches to improve Text-VQA performance. | 21 |
hitvoice/drqa | A PyTorch implementation of reading comprehension over Wikipedia for open-domain question answering, trained on the SQuAD dataset. | 401 |
gt-vision-lab/vqa_lstm_cnn | A visual question answering model using a deeper LSTM and a normalized CNN architecture. | 376 |
akirafukui/vqa-mcb | A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling. | 222 |
noagarcia/roll-videoqa | A PyTorch-based model for answering questions about videos with unseen scenes and storylines. | 19 |