nips-mrn-vqa

Visual QA Model

This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework.
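The core of the model is a stack of multimodal residual blocks: the question features pass through an identity shortcut, and a learned joint representation of question and image features is added to them. The sketch below illustrates one such block in Lua/Torch (the repository's language). The hidden size, the single linear layer per branch, and the tanh nonlinearity are illustrative assumptions, not the repository's actual configuration.

```lua
require 'nn'

-- Minimal sketch of one multimodal residual block (assumed hidden size of
-- 1200 for both modalities; hypothetical, not the paper's exact setup).
-- Input is a table {q, v} of question and image feature vectors;
-- output is q + tanh(W_q * q) .* tanh(W_v * v).
local dim = 1200

local joint = nn.ParallelTable()
  :add(nn.Sequential():add(nn.Linear(dim, dim)):add(nn.Tanh()))  -- question branch
  :add(nn.Sequential():add(nn.Linear(dim, dim)):add(nn.Tanh()))  -- image branch

local block = nn.Sequential()
  :add(nn.ConcatTable()
    :add(nn.SelectTable(1))                                   -- identity (shortcut) path: q
    :add(nn.Sequential():add(joint):add(nn.CMulTable())))     -- learned joint path
  :add(nn.CAddTable())                                        -- residual sum

-- Usage: forward a pair of feature vectors through the block.
local q = torch.randn(dim)
local v = torch.randn(dim)
local h = block:forward({q, v})
```

Stacking several such blocks, each taking the previous output as its question input while reusing the image features, gives the deep residual structure described in the paper.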

Multimodal Residual Learning for Visual QA (NIPS 2016)

GitHub

39 stars
4 watching
5 forks
Language: Lua
Last commit: almost 8 years ago

Related projects:

| Repository | Description | Stars |
|------------|-------------|-------|
| gt-vision-lab/vqa_lstm_cnn | A Visual Question Answering model using a deeper LSTM and normalized CNN architecture. | 376 |
| zcyang/imageqa-san | This project provides code for training image question answering models using stacked attention networks and convolutional neural networks. | 107 |
| visionlearninggroup/ask_attend_and_answer | Develops a deep learning model to answer questions about visual scenes based on spatial attention and question guidance. | 25 |
| mlpc-ucsd/bliva | A multimodal LLM designed to handle text-rich visual questions. | 269 |
| jiasenlu/hiecoattenvqa | A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349 |
| vlgiitr/dmn-plus | A PyTorch implementation of an improved question answering architecture with dynamic memory networks and attention mechanisms. | 64 |
| researchmm/sttn | Proposes a deep learning model to fill missing regions in video frames and generate completed videos. | 474 |
| cadene/vqa.pytorch | A PyTorch implementation of visual question answering with multimodal representation learning. | 716 |
| localminimum/qanet | An implementation of Google's QANet for machine reading comprehension using TensorFlow. | 983 |
| zeioth/markmap.nvim | A plugin for visualizing Markdown files as mindmaps. | 167 |
| priba/nmp_qc | An implementation of neural networks on graph structures for learning molecular properties. | 339 |
| rktjmp/highlight-current-n.nvim | Highlights current search matches under the cursor when pressing n or N. | 89 |
| nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,298 |
| davidmascharka/tbd-nets | An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks. | 348 |
| lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,300 |