Ask_Attend_and_Answer
Visual QA Model
A deep learning model that answers questions about visual scenes using question-guided spatial attention.
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
25 stars
4 watching
11 forks
Language: C++
last commit: about 4 years ago

Related projects:
Repository | Description | Stars |
---|---|---|
jiasenlu/hiecoattenvqa | A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349 |
jnhwkim/nips-mrn-vqa | This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework. | 39 |
zcyang/imageqa-san | This project provides code for training image question answering models using stacked attention networks and convolutional neural networks. | 107 |
gt-vision-lab/vqa_lstm_cnn | A Visual Question Answering model using a deeper LSTM and normalized CNN architecture. | 376 |
akosiorek/attend_infer_repeat | An implementation of Attend, Infer, Repeat, a method for fast scene understanding using generative models. | 82 |
huggingface/node-question-answering | Provides a simple way to perform question answering using a pre-trained model in Node.js. | 466 |
yunjey/show-attend-and-tell | Generates captions for images using an attention-based neural network. | 907 |
cadene/vqa.pytorch | A PyTorch implementation of visual question answering with multimodal representation learning. | 716 |
jazzsaxmafia/show_attend_and_tell.tensorflow | A TensorFlow implementation of a neural caption generator using attention mechanisms. | 506 |
rowanz/r2c | An open-source project providing PyTorch code and data for a deep learning model that enables visual commonsense reasoning. | 466 |
davidmascharka/tbd-nets | An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks. | 348 |
jiasenlu/adaptiveattention | Adaptive attention mechanism for image captioning using visual sentinels. | 334 |
hyeonwoonoh/vqa-transfer-externaldata | Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source. | 20 |
allenai/document-qa | Tools and codebase for training neural question answering models on multiple paragraphs of text data | 434 |
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications. | 2,077 |