bottom-up-attention-vqa

VQA system

An implementation of a visual question answering (VQA) system built on bottom-up attention, with a focus on efficiency and speed.

An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.

GitHub

755 stars
34 watching
181 forks
Language: Python
Last commit: 10 months ago
Tags: bottom-up-attention, pytorch, vqa

Related projects:

Repository | Description | Stars
cadene/vqa.pytorch | A PyTorch implementation of visual question answering with multimodal representation learning. | 718
peteanderson80/bottom-up-attention | Trains a bottom-up attention model using Faster R-CNN and Visual Genome annotations for image captioning and VQA tasks. | 1,438
hyeonwoonoh/vqa-transfer-externaldata | Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source. | 20
markdtw/vqa-winner-cvprw-2017 | Implementations and tools for training and fine-tuning a visual question answering model based on the 2017 CVPR workshop winner's approach. | 164
jayleicn/tvqa | A PyTorch implementation of a video question answering system based on the TVQA dataset. | 172
jiasenlu/hiecoattenvqa | A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349
guoyang9/unk-vqa | A VQA dataset with unanswerable questions designed to test the limits of large models' knowledge and reasoning abilities. | 3
milvlg/prophet | An implementation of a two-stage framework designed to prompt large language models with answer heuristics for knowledge-based visual question answering tasks. | 270
penghao-wu/vstar | A PyTorch implementation of a guided visual search mechanism for multimodal LLMs. | 541
xiaoman-zhang/pmc-vqa | A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions. | 180
henryjunw/tag | A Python-based system for generating visual question-answer pairs using text-aware approaches to improve Text-VQA performance. | 21
hitvoice/drqa | A PyTorch implementation of reading comprehension for answering open-domain questions from Wikipedia, using the SQuAD dataset. | 401
gt-vision-lab/vqa_lstm_cnn | A Visual Question Answering model using a deeper LSTM and normalized CNN architecture. | 377
akirafukui/vqa-mcb | A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling. | 222
noagarcia/roll-videoqa | A PyTorch-based model for answering questions about videos based on unseen scenes and storylines. | 19