HieCoAttenVQA
Visual QA framework
A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model.
349 stars
15 watching
123 forks
Language: Jupyter Notebook
Last commit: about 6 years ago
Linked from 2 awesome lists
Related projects:
Repository | Description | Stars |
---|---|---|
akirafukui/vqa-mcb | A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling. | 222 |
visionlearninggroup/ask_attend_and_answer | A deep learning model that answers questions about visual scenes using question-guided spatial attention. | 25 |
jnhwkim/nips-mrn-vqa | This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework. | 39 |
hengyuan-hu/bottom-up-attention-vqa | An implementation of a VQA system using bottom-up attention, aiming to improve the efficiency and speed of visual question answering tasks. | 754 |
zcyang/imageqa-san | This project provides code for training image question answering models using stacked attention networks and convolutional neural networks. | 107 |
yj-yu/lsmdc | A framework implementing a joint sequence fusion model for video question answering and retrieval. | 31 |
cadene/vqa.pytorch | A PyTorch implementation of visual question answering with multimodal representation learning. | 716 |
hyeonwoonoh/vqa-transfer-externaldata | Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source. | 20 |
gt-vision-lab/vqa_lstm_cnn | A Visual Question Answering model using a deeper LSTM and normalized CNN architecture. | 376 |
jy0205/lavit | A unified framework for training large language models to understand and generate visual content. | 528 |
milvlg/prophet | An implementation of a two-stage framework designed to prompt large language models with answer heuristics for knowledge-based visual question answering tasks. | 267 |
jiasenlu/adaptiveattention | An adaptive attention mechanism for image captioning using visual sentinels. | 334 |
qt/qtdeclarative | A comprehensive collection of libraries and modules for building user interfaces and dynamic applications using Qt's declarative language. | 225 |
nvlabs/relvit | A deep learning framework designed to improve visual reasoning capabilities by utilizing concepts and semantic relations. | 64 |
findalexli/scigraphqa | A dataset and benchmarking framework for evaluating the performance of large language models on multi-turn question answering tasks for scientific graphs. | 37 |