nips-mrn-vqa
Visual QA Model
This project implements a neural network model that answers questions about images by combining question and image features in a residual learning framework.
Multimodal Residual Learning for Visual QA (NIPS 2016)
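The description above mentions combining question and image features in a residual learning framework. The following pure-Python sketch is a rough illustration only. It assumes that one residual block adds two terms: a linear shortcut of the previous joint state, and the element-wise product of a tanh-activated question branch with a two-layer tanh visual branch. The class and function names (`ResidualBlock`, `mrn_forward`), the dimensions, and the random weight initialization are all hypothetical. This is not the repository's Lua code, and the exact layer layout used in the paper may differ.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product w @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def tanh_vec(x: Vector) -> Vector:
    """Element-wise tanh."""
    return [math.tanh(xi) for xi in x]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights (hypothetical initialization, for illustration)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ResidualBlock:
    """Hypothetical multimodal residual block (assumed form):
    H_l = W_s @ H_{l-1} + tanh(W_q @ H_{l-1}) * tanh(W_2 @ tanh(W_1 @ v))
    """

    def __init__(self, q_dim: int, v_dim: int, hidden: int, rng: random.Random):
        self.w_s = random_matrix(hidden, q_dim, rng)   # linear shortcut (dims may change)
        self.w_q = random_matrix(hidden, q_dim, rng)   # question branch
        self.w_1 = random_matrix(hidden, v_dim, rng)   # visual branch, layer 1
        self.w_2 = random_matrix(hidden, hidden, rng)  # visual branch, layer 2

    def forward(self, h_prev: Vector, v: Vector) -> Vector:
        shortcut = matvec(self.w_s, h_prev)
        q_branch = tanh_vec(matvec(self.w_q, h_prev))
        v_branch = tanh_vec(matvec(self.w_2, tanh_vec(matvec(self.w_1, v))))
        # Joint residual: element-wise product of the question and visual branches.
        joint = [a * b for a, b in zip(q_branch, v_branch)]
        return [s + j for s, j in zip(shortcut, joint)]


def mrn_forward(q: Vector, v: Vector, blocks: List[ResidualBlock]) -> Vector:
    """Stack blocks: the question vector seeds the state, the image vector feeds every block."""
    h = q
    for block in blocks:
        h = block.forward(h, v)
    return h


if __name__ == "__main__":
    rng = random.Random(0)
    q_dim, v_dim, hidden = 4, 6, 5
    blocks = [ResidualBlock(q_dim, v_dim, hidden, rng)]
    blocks += [ResidualBlock(hidden, v_dim, hidden, rng) for _ in range(2)]
    q = [rng.uniform(-1, 1) for _ in range(q_dim)]
    v = [rng.uniform(-1, 1) for _ in range(v_dim)]
    print(len(mrn_forward(q, v, blocks)))  # 5
```

Stacking blocks lets the joint representation be refined over several layers, while the shortcut keeps a direct path from the question features. In a full VQA model, the final vector would feed an answer classifier, which this sketch omits.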
39 stars
4 watching
5 forks
Language: Lua
last commit: about 8 years ago
Related projects:
| Description | Stars |
|---|---|
| A Visual Question Answering model using a deeper LSTM and normalized CNN architecture. | 377 |
| This project provides code for training image question answering models using stacked attention networks and convolutional neural networks. | 108 |
| Develops a deep learning model to answer questions about visual scenes based on spatial attention and question guidance | 25 |
| A multimodal LLM designed to handle text-rich visual questions | 270 |
| A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349 |
| A PyTorch implementation of an improved question answering architecture with dynamic memory networks and attention mechanisms | 64 |
| Proposes a deep learning model to fill missing regions in video frames and generate completed videos | 480 |
| A PyTorch implementation of visual question answering with multimodal representation learning | 718 |
| An implementation of Google's QANet for machine reading comprehension using TensorFlow. | 983 |
| A plugin for visualizing Markdown files as mindmaps | 174 |
| An implementation of neural networks on graph structures for learning molecular properties | 340 |
| Highlights current search matches under the cursor when pressing n or N | 89 |
| A deep learning framework for training multi-modal models with vision and language capabilities. | 1,299 |
| An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks. | 348 |
| Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,336 |