nips-mrn-vqa
Visual QA Model
This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework.
Multimodal Residual Learning for Visual QA (NIPS 2016)
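To make the description above concrete, here is a minimal sketch of a multimodal residual block, written in standard-library Python rather than the repository's Lua. It assumes the joint function is an element-wise product of a tanh-projected question vector and a twice-projected image vector, added to a linear shortcut on the question. All names (`residual_block`, `make_params`, `W_q`, `W_v1`, `W_v2`, `W_s`), the dimensions, and the random weights are hypothetical and chosen for illustration; the repository's actual layers and hyperparameters may differ.

```python
import math
import random


def linear(weights, x):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def tanh_vec(x):
    """Apply tanh to every element of a vector."""
    return [math.tanh(xi) for xi in x]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_params(q_dim, v_dim, hidden, rng):
    """Hypothetical parameter set for one block."""
    return {
        "W_q": random_matrix(hidden, q_dim, rng),    # question projection
        "W_v1": random_matrix(hidden, v_dim, rng),   # first image projection
        "W_v2": random_matrix(hidden, hidden, rng),  # second image projection
        "W_s": random_matrix(hidden, q_dim, rng),    # linear shortcut on the question
    }


def residual_block(q, v, params):
    """Return shortcut(q) + joint(q, v) for one block."""
    q_branch = tanh_vec(linear(params["W_q"], q))
    v_hidden = tanh_vec(linear(params["W_v1"], v))
    v_branch = tanh_vec(linear(params["W_v2"], v_hidden))
    # Joint residual term: element-wise product of the two modalities.
    joint = [a * b for a, b in zip(q_branch, v_branch)]
    shortcut = linear(params["W_s"], q)
    return [s + j for s, j in zip(shortcut, joint)]


rng = random.Random(0)
q = [rng.uniform(-1, 1) for _ in range(4)]  # toy question embedding
v = [rng.uniform(-1, 1) for _ in range(6)]  # toy image feature vector

# Stack two blocks; the image features are fed to every block.
blocks = [make_params(4, 6, 5, rng), make_params(5, 6, 5, rng)]
h = q
for params in blocks:
    h = residual_block(h, v, params)
```

In this sketch the question representation is refined block by block while the same image features are reused at each step. In a full model, the final `h` would feed a classifier over candidate answers, which is omitted here.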
39 stars
4 watching
5 forks
Language: Lua
Last commit: about 8 years ago

Related projects:
Repository | Description | Stars |
---|---|---|
gt-vision-lab/vqa_lstm_cnn | A Visual Question Answering model using a deeper LSTM and normalized CNN architecture. | 377 |
zcyang/imageqa-san | This project provides code for training image question answering models using stacked attention networks and convolutional neural networks. | 108 |
visionlearninggroup/ask_attend_and_answer | A deep learning model that answers questions about visual scenes using spatial attention guided by the question. | 25 |
mlpc-ucsd/bliva | A multimodal LLM designed to handle text-rich visual questions. | 270 |
jiasenlu/hiecoattenvqa | A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349 |
vlgiitr/dmn-plus | A PyTorch implementation of an improved question answering architecture with dynamic memory networks and attention mechanisms. | 64 |
researchmm/sttn | A deep learning model that fills missing regions in video frames to generate completed videos. | 480 |
cadene/vqa.pytorch | A PyTorch implementation of visual question answering with multimodal representation learning. | 718 |
localminimum/qanet | An implementation of Google's QANet for machine reading comprehension using TensorFlow. | 983 |
zeioth/markmap.nvim | A plugin for visualizing Markdown files as mindmaps. | 174 |
priba/nmp_qc | An implementation of neural networks on graph structures for learning molecular properties. | 340 |
rktjmp/highlight-current-n.nvim | Highlights the current search match under the cursor when pressing n or N. | 89 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,299 |
davidmascharka/tbd-nets | An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks. | 348 |
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,336 |