Ask_Attend_and_Answer
Visual QA Model
A deep learning model that answers questions about visual scenes using question-guided spatial attention.
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
25 stars
4 watching
11 forks
Language: C++
last commit: over 4 years ago
Related projects:
Repository | Description | Stars |
---|---|---|
| A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349 |
| This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework. | 39 |
| This project provides code for training image question answering models using stacked attention networks and convolutional neural networks. | 108 |
| A Visual Question Answering model using a deeper LSTM and normalized CNN architecture. | 377 |
| An implementation of Attend, Infer, Repeat, a method for fast scene understanding using generative models. | 82 |
| Provides a simple way to perform question answering using a pre-trained model in Node.js. | 466 |
| Generates captions for images using an attention-based neural network. | 907 |
| A PyTorch implementation of visual question answering with multimodal representation learning. | 718 |
| A TensorFlow implementation of a neural caption generator using attention mechanisms. | 506 |
| An open-source project providing PyTorch code and data for a deep learning model that enables visual commonsense reasoning. | 466 |
| An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks. | 348 |
| An adaptive attention mechanism for image captioning using visual sentinels. | 335 |
| Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source. | 20 |
| Tools and a codebase for training neural question answering models on multiple paragraphs of text data. | 435 |
| A multimodal AI model that enables real-world vision-language understanding applications. | 2,145 |