VQA_LSTM_CNN

VQA model

A Visual Question Answering model using a deeper LSTM and normalized CNN architecture.

Trains a Visual Question Answering model that combines a deeper LSTM with a normalized CNN. The current code scores 58.16 on OpenEnded and 63.09 on Multiple-Choice on the test-standard split.
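As a rough illustration of what "normalized CNN" can mean here, the sketch below L2-normalizes an image feature vector before combining it with a question encoding. It is a minimal sketch under stated assumptions, not the repository's code (which is written in Lua): the element-wise product used as the fusion step, the helper names `l2_normalize` and `fuse`, and the toy two-dimensional vectors are all hypothetical choices made for this example.

```python
import math


def l2_normalize(features, eps=1e-12):
    """Scale a feature vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in features))
    # eps guards against division by zero for an all-zero vector.
    return [x / max(norm, eps) for x in features]


def fuse(question_vec, image_vec):
    """Element-wise product of two equal-length vectors.

    Assumption: the fusion operator is chosen for illustration only.
    """
    if len(question_vec) != len(image_vec):
        raise ValueError("vectors must have the same length")
    return [q * v for q, v in zip(question_vec, image_vec)]


# Toy stand-ins: real CNN features and LSTM states have far more dimensions.
image_features = l2_normalize([3.0, 4.0])
question_encoding = [0.5, -1.0]
fused = fuse(question_encoding, image_features)
```

Normalizing the image features keeps their scale comparable to the question encoding, so neither input dominates the fused vector purely because of magnitude.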

GitHub

377 stars
25 watching
133 forks
Language: Lua
Last commit: almost 6 years ago
Linked from 1 awesome list

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| jnhwkim/nips-mrn-vqa | This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework. | 39 |
| cadene/vqa.pytorch | A PyTorch implementation of visual question answering with multimodal representation learning | 718 |
| mlpc-ucsd/bliva | A multimodal LLM designed to handle text-rich visual questions | 270 |
| lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,336 |
| peteanderson80/bottom-up-attention | Trains a bottom-up attention model using Faster R-CNN and Visual Genome annotations for image captioning and VQA tasks | 1,438 |
| akirafukui/vqa-mcb | A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling. | 222 |
| vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270 |
| jayleicn/tvqa | PyTorch implementation of video question answering system based on TVQA dataset | 172 |
| zcyang/imageqa-san | This project provides code for training image question answering models using stacked attention networks and convolutional neural networks. | 108 |
| xiaoman-zhang/pmc-vqa | A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions. | 180 |
| davidmascharka/tbd-nets | An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks. | 348 |
| markdtw/vqa-winner-cvprw-2017 | Implementations and tools for training and fine-tuning a visual question answering model based on the 2017 CVPR workshop winner's approach. | 164 |
| hyeonwoonoh/vqa-transfer-externaldata | Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source. | 20 |
| oyxhust/cnn-lstm-ctc-text-recognition | Develops CTC-based text recognition models with neural network architectures | 259 |
| guoyang9/unk-vqa | A VQA dataset with unanswerable questions designed to test the limits of large models' knowledge and reasoning abilities. | 3 |