Ask_Attend_and_Answer

Visual QA Model

A deep learning model that answers questions about visual scenes using question-guided spatial attention

Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

25 stars
4 watching
11 forks
Language: C++
Last commit: about 4 years ago

Related projects:

Repository | Description | Stars
jiasenlu/hiecoattenvqa | A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349
jnhwkim/nips-mrn-vqa | A neural network model designed to answer visual questions by combining question and image features in a residual learning framework. | 39
zcyang/imageqa-san | Code for training image question answering models using stacked attention networks and convolutional neural networks. | 107
gt-vision-lab/vqa_lstm_cnn | A Visual Question Answering model using a deeper LSTM and normalized CNN architecture. | 376
akosiorek/attend_infer_repeat | An implementation of Attend, Infer, Repeat, a method for fast scene understanding using generative models. | 82
huggingface/node-question-answering | A simple way to perform question answering using a pre-trained model in Node.js. | 466
yunjey/show-attend-and-tell | Generates captions for images using an attention-based neural network. | 907
cadene/vqa.pytorch | A PyTorch implementation of visual question answering with multimodal representation learning. | 716
jazzsaxmafia/show_attend_and_tell.tensorflow | A TensorFlow implementation of a neural caption generator using attention mechanisms. | 506
rowanz/r2c | PyTorch code and data for a deep learning model that enables visual commonsense reasoning. | 466
davidmascharka/tbd-nets | An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks. | 348
jiasenlu/adaptiveattention | An adaptive attention mechanism for image captioning using visual sentinels. | 334
hyeonwoonoh/vqa-transfer-externaldata | Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source. | 20
allenai/document-qa | Tools and codebase for training neural question answering models on multiple paragraphs of text data. | 434
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications. | 2,077