VQA_LSTM_CNN

VQA model

A Visual Question Answering model using a deeper LSTM and normalized CNN architecture.

This code trains a deeper LSTM and normalized CNN Visual Question Answering model. The current version scores 58.16 on Open-Ended and 63.09 on Multiple-Choice on test-standard.
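As a rough illustration of what "normalized CNN" could mean in practice, the sketch below L2-normalizes an image feature vector and combines it with a question encoding. Everything here is an assumption made for illustration: the function names `l2_normalize` and `fuse`, the element-wise product as the fusion step, and the toy vectors standing in for CNN and LSTM outputs. It is written in Python rather than the repository's Lua and is not taken from the repository's code.

```python
import math


def l2_normalize(features, eps=1e-12):
    """Scale a feature vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in features))
    # Guard against division by zero for an all-zero vector.
    return [x / max(norm, eps) for x in features]


def fuse(image_vec, question_vec):
    """Element-wise product of two equal-length vectors (assumed fusion step)."""
    if len(image_vec) != len(question_vec):
        raise ValueError("vectors must have the same length")
    return [i * q for i, q in zip(image_vec, question_vec)]


# Toy stand-ins: a real model would use CNN activations and an LSTM state.
image_features = l2_normalize([3.0, 4.0])
question_encoding = [0.5, -1.0]
fused = fuse(image_features, question_encoding)
```

Normalizing the image vector keeps its scale comparable to the question encoding before the two are combined, so neither modality dominates the fused representation purely because of magnitude.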

GitHub

377 stars

25 watching

133 forks

Language: Lua

Last commit: over 6 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

carpedm20/awesome-torch

Related projects:

Repository	Description	Stars
jnhwkim/nips-mrn-vqa	This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework.	39
cadene/vqa.pytorch	A PyTorch implementation of visual question answering with multimodal representation learning	718
mlpc-ucsd/bliva	A multimodal LLM designed to handle text-rich visual questions	270
lxtgh/omg-seg	Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model.	1,336
peteanderson80/bottom-up-attention	Trains a bottom-up attention model using Faster R-CNN and Visual Genome annotations for image captioning and VQA tasks	1,438
akirafukui/vqa-mcb	A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling.	222
vpgtrans/vpgtrans	Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs	270
jayleicn/tvqa	PyTorch implementation of a video question answering system based on the TVQA dataset	172
zcyang/imageqa-san	This project provides code for training image question answering models using stacked attention networks and convolutional neural networks.	108
xiaoman-zhang/pmc-vqa	A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions.	180
davidmascharka/tbd-nets	An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks.	348
markdtw/vqa-winner-cvprw-2017	Implementations and tools for training and fine-tuning a visual question answering model based on the 2017 CVPR workshop winner's approach.	164
hyeonwoonoh/vqa-transfer-externaldata	Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source.	20
oyxhust/cnn-lstm-ctc-text-recognition	Develops CTC-based text recognition models with neural network architectures	259
guoyang9/unk-vqa	A VQA dataset with unanswerable questions designed to test the limits of large models' knowledge and reasoning abilities.	3