vqa-winner-cvprw-2017

VQA Model Trainer

Implementations and tools for training and fine-tuning a visual question answering model based on the 2017 CVPR workshop winner's approach.

PyTorch implementation of the winner of the VQA Challenge Workshop at CVPR'17

GitHub

164 stars
11 watching
38 forks
Language: Python
last commit: almost 6 years ago
pytorch, visual-question-answering

Related projects:

Repository | Description | Stars
cadene/vqa.pytorch | A PyTorch implementation of visual question answering with multimodal representation learning | 716
hyeonwoonoh/vqa-transfer-externaldata | Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source. | 20
jayleicn/tvqa | PyTorch implementation of video question answering system based on TVQA dataset | 172
milvlg/prophet | An implementation of a two-stage framework designed to prompt large language models with answer heuristics for knowledge-based visual question answering tasks. | 267
hitvoice/drqa | Implementing reading comprehension from Wikipedia questions to answer open-domain queries using PyTorch and SQuAD dataset | 401
hengyuan-hu/bottom-up-attention-vqa | An implementation of a VQA system using bottom-up attention, aiming to improve the efficiency and speed of visual question answering tasks. | 754
jayleicn/clipbert | An efficient framework for end-to-end learning on image-text and video-text tasks | 704
pasqal-io/pyqtorch | A PyTorch-based simulator for quantum machine learning | 45
penghao-wu/vstar | PyTorch implementation of guided visual search mechanism for multimodal LLMs | 527
gt-vision-lab/vqa_lstm_cnn | A Visual Question Answering model using a deeper LSTM and normalized CNN architecture. | 376
kaiyangzhou/dassl.pytorch | A PyTorch toolbox for supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision. | 1,217
kunpengli1994/vsrn | An open-source PyTorch implementation of a visual semantic reasoning model for image-text matching | 294
volcengine/vescale | A PyTorch-based framework for training large language models in parallel on multiple devices | 663
fartashf/vsepp | A PyTorch implementation of visual-semantic embedding methods for image-caption retrieval | 489
vlgiitr/dmn-plus | A PyTorch implementation of an improved question answering architecture with dynamic memory networks and attention mechanisms | 64