bottom-up-attention-vqa

VQA system

An implementation of a VQA system using bottom-up attention, aiming to improve the efficiency and speed of visual question answering tasks.

An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.

GitHub

755 stars

34 watching

181 forks

Language: Python

last commit: over 1 year ago

bottom-up-attentionpytorchvqa

Related projects:

Repository	Description	Stars
cadene/vqa.pytorch	A PyTorch implementation of visual question answering with multimodal representation learning	718
peteanderson80/bottom-up-attention	Trains a bottom-up attention model using Faster R-CNN and Visual Genome annotations for image captioning and VQA tasks	1,438
hyeonwoonoh/vqa-transfer-externaldata	Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source.	20
markdtw/vqa-winner-cvprw-2017	Implementations and tools for training and fine-tuning a visual question answering model based on the 2017 CVPR workshop winner's approach.	164
jayleicn/tvqa	PyTorch implementation of video question answering system based on TVQA dataset	172
jiasenlu/hiecoattenvqa	A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model.	349
guoyang9/unk-vqa	A VQA dataset with unanswerable questions designed to test the limits of large models' knowledge and reasoning abilities.	3
milvlg/prophet	An implementation of a two-stage framework designed to prompt large language models with answer heuristics for knowledge-based visual question answering tasks.	270
penghao-wu/vstar	PyTorch implementation of guided visual search mechanism for multimodal LLMs	541
xiaoman-zhang/pmc-vqa	A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions.	180
henryjunw/tag	A Python-based system for generating visual question-answer pairs using text-aware approaches to improve Text-VQA performance.	21
hitvoice/drqa	Implementing reading comprehension from Wikipedia questions to answer open-domain queries using PyTorch and SQuAD dataset	401
gt-vision-lab/vqa_lstm_cnn	A Visual Question Answering model using a deeper LSTM and normalized CNN architecture.	377
akirafukui/vqa-mcb	A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling.	222
noagarcia/roll-videoqa	A PyTorch-based model for answering questions about videos based on unseen scenes and storylines	19