nips-mrn-vqa

Visual QA Model

This project provides a neural network model that answers questions about images by combining question features and image features in a residual learning framework.
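To make "combining question and image features in a residual learning framework" concrete, here is a minimal sketch in plain Python. It is not the repository's Lua/Torch code. The names (`residual_block`, `make_params`, `Wq`, `W1`, `W2`, `Ws`), the tanh nonlinearity, the elementwise product used as the joint function, and the linear shortcut are all assumptions made for illustration; this page does not specify them.

```python
import math
import random


def linear(weights, x):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def tanh_vec(x):
    """Apply tanh to every element of a vector."""
    return [math.tanh(xi) for xi in x]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_params(q_dim, v_dim, hidden, rng):
    """Hypothetical parameter set for one block (names are illustrative)."""
    return {
        "Wq": random_matrix(hidden, q_dim, rng),    # question projection
        "W1": random_matrix(hidden, v_dim, rng),    # first image projection
        "W2": random_matrix(hidden, hidden, rng),   # second image projection
        "Ws": random_matrix(hidden, q_dim, rng),    # linear shortcut for the question
    }


def residual_block(q, v, params):
    """One assumed multimodal residual block.

    joint = tanh(Wq q) * tanh(W2 tanh(W1 v))   (elementwise product)
    out   = Ws q + joint                       (shortcut plus residual)
    """
    q_proj = tanh_vec(linear(params["Wq"], q))
    v_proj = tanh_vec(linear(params["W2"], tanh_vec(linear(params["W1"], v))))
    joint = [a * b for a, b in zip(q_proj, v_proj)]
    shortcut = linear(params["Ws"], q)
    return [s + j for s, j in zip(shortcut, joint)]


rng = random.Random(0)
q = [rng.uniform(-1, 1) for _ in range(4)]   # toy question feature vector
v = [rng.uniform(-1, 1) for _ in range(6)]   # toy image feature vector

# Stack two blocks: the output of the first becomes the "question" input
# of the second, while the same image vector is reused at every level.
block1 = make_params(q_dim=4, v_dim=6, hidden=3, rng=rng)
block2 = make_params(q_dim=3, v_dim=6, hidden=3, rng=rng)
h1 = residual_block(q, v, block1)
h2 = residual_block(h1, v, block2)
```

In this sketch the shortcut carries the question representation forward unchanged by the image, so each block only has to learn what the image adds on top of it. A classifier over candidate answers would normally follow the last block; it is omitted here.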

Multimodal Residual Learning for Visual QA (NIPS 2016)

GitHub: 39 stars, 4 watching, 5 forks

Language: Lua

Last commit: almost 9 years ago

Related projects:

Repository	Description	Stars
gt-vision-lab/vqa_lstm_cnn	A Visual Question Answering model using a deeper LSTM and normalized CNN architecture.	377
zcyang/imageqa-san	This project provides code for training image question answering models using stacked attention networks and convolutional neural networks.	108
visionlearninggroup/ask_attend_and_answer	Develops a deep learning model to answer questions about visual scenes based on spatial attention and question guidance	25
mlpc-ucsd/bliva	A multimodal LLM designed to handle text-rich visual questions	270
jiasenlu/hiecoattenvqa	A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model.	349
vlgiitr/dmn-plus	A PyTorch implementation of an improved question answering architecture with dynamic memory networks and attention mechanisms	64
researchmm/sttn	Proposes a deep learning model to fill missing regions in video frames and generate completed videos	480
cadene/vqa.pytorch	A PyTorch implementation of visual question answering with multimodal representation learning	718
localminimum/qanet	An implementation of Google's QANet for machine reading comprehension using TensorFlow.	983
zeioth/markmap.nvim	A plugin for visualizing Markdown files as mindmaps	174
priba/nmp_qc	An implementation of neural networks on graph structures for learning molecular properties	340
rktjmp/highlight-current-n.nvim	Highlights current search matches under the cursor when pressing n or N	89
nvlabs/prismer	A deep learning framework for training multi-modal models with vision and language capabilities.	1,299
davidmascharka/tbd-nets	An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks.	348
lxtgh/omg-seg	Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model.	1,336