HieCoAttenVQA

Visual QA framework

A framework for training Hierarchical Co-Attention models for Visual Question Answering on preprocessed VQA data, using image features from a pretrained image model.
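
The core idea of the co-attention model is to attend jointly over image regions and question tokens through a shared affinity matrix. The sketch below shows one parallel co-attention pass in PyTorch under assumed shapes and names (ParallelCoAttention, feat_dim, hidden_dim are illustrative, not the repository's actual modules or dimensions):

```python
# A minimal sketch of a single parallel co-attention step in PyTorch.
# Shapes, hidden sizes, and module names here are illustrative assumptions,
# not the exact layout used in the HieCoAttenVQA repository.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParallelCoAttention(nn.Module):
    """Attend to image regions and question tokens jointly via an affinity matrix."""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.W_b = nn.Parameter(torch.randn(feat_dim, feat_dim) * 0.01)  # affinity weights
        self.W_v = nn.Linear(feat_dim, hidden_dim, bias=False)           # image projection
        self.W_q = nn.Linear(feat_dim, hidden_dim, bias=False)           # question projection
        self.w_hv = nn.Linear(hidden_dim, 1, bias=False)                 # image attention scores
        self.w_hq = nn.Linear(hidden_dim, 1, bias=False)                 # question attention scores

    def forward(self, V: torch.Tensor, Q: torch.Tensor):
        # V: (batch, num_regions, feat_dim) image features
        # Q: (batch, num_tokens,  feat_dim) question features
        # Affinity between every (token, region) pair: (batch, num_tokens, num_regions)
        C = torch.tanh(Q @ self.W_b @ V.transpose(1, 2))

        # Project each modality and mix in the other modality through the affinity matrix.
        H_v = torch.tanh(self.W_v(V) + C.transpose(1, 2) @ self.W_q(Q))  # (batch, regions, hidden)
        H_q = torch.tanh(self.W_q(Q) + C @ self.W_v(V))                  # (batch, tokens,  hidden)

        # Attention weights over image regions and question tokens.
        a_v = F.softmax(self.w_hv(H_v).squeeze(-1), dim=-1)  # (batch, regions)
        a_q = F.softmax(self.w_hq(H_q).squeeze(-1), dim=-1)  # (batch, tokens)

        # Attended (weighted-sum) image and question vectors.
        v_hat = (a_v.unsqueeze(-1) * V).sum(dim=1)  # (batch, feat_dim)
        q_hat = (a_q.unsqueeze(-1) * Q).sum(dim=1)  # (batch, feat_dim)
        return v_hat, q_hat, a_v, a_q


if __name__ == "__main__":
    coattn = ParallelCoAttention(feat_dim=512, hidden_dim=256)
    V = torch.randn(2, 196, 512)   # e.g. a flattened 14x14 CNN feature map
    Q = torch.randn(2, 20, 512)    # e.g. 20 question token features
    v_hat, q_hat, a_v, a_q = coattn(V, Q)
    print(v_hat.shape, q_hat.shape)  # torch.Size([2, 512]) torch.Size([2, 512])
```

In the full hierarchical model this step is applied at several levels of the question representation (word, phrase, sentence), and the attended vectors are combined to predict the answer.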


349 stars
15 watching
123 forks
Language: Jupyter Notebook
Last commit: about 6 years ago
Linked from 2 awesome lists


Related projects:

akirafukui/vqa-mcb (222 stars): A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling.
visionlearninggroup/ask_attend_and_answer (25 stars): Develops a deep learning model to answer questions about visual scenes based on spatial attention and question guidance.
jnhwkim/nips-mrn-vqa (39 stars): A neural network model designed to answer visual questions by combining question and image features in a residual learning framework.
hengyuan-hu/bottom-up-attention-vqa (754 stars): An implementation of a VQA system using bottom-up attention, aiming to improve the efficiency and speed of visual question answering tasks.
zcyang/imageqa-san (107 stars): Code for training image question answering models using stacked attention networks and convolutional neural networks.
yj-yu/lsmdc (31 stars): A framework implementing a joint sequence fusion model for video question answering and retrieval.
cadene/vqa.pytorch (716 stars): A PyTorch implementation of visual question answering with multimodal representation learning.
hyeonwoonoh/vqa-transfer-externaldata (20 stars): Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source.
gt-vision-lab/vqa_lstm_cnn (376 stars): A Visual Question Answering model using a deeper LSTM and normalized CNN architecture.
jy0205/lavit (528 stars): A unified framework for training large language models to understand and generate visual content.
milvlg/prophet (267 stars): An implementation of a two-stage framework that prompts large language models with answer heuristics for knowledge-based visual question answering tasks.
jiasenlu/adaptiveattention (334 stars): An adaptive attention mechanism for image captioning using visual sentinels.
qt/qtdeclarative (225 stars): A comprehensive collection of libraries and modules for building user interfaces and dynamic applications using Qt's declarative language.
nvlabs/relvit (64 stars): A deep learning framework designed to improve visual reasoning capabilities by utilizing concepts and semantic relations.
findalexli/scigraphqa (37 stars): A dataset and benchmarking framework for evaluating the performance of large language models on multi-turn question answering tasks for scientific graphs.