bottom-up-attention
Attention model training
Trains a bottom-up attention model with Faster R-CNN on Visual Genome annotations, producing region-level image features for image captioning and VQA tasks.
1k stars
26 watching
378 forks
Language: Jupyter Notebook
Last commit: about 2 years ago
Linked from 1 awesome list
Tags: caffe, captioning-images, faster-rcnn, image-captioning, mscoco, mscoco-dataset, visual-question-answering, vqa
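The pretrained region features that accompany bottom-up-attention models are commonly distributed as TSV files, with each row holding an image's metadata plus base64-encoded float32 buffers for its region boxes and features. A minimal reader sketch follows, assuming the conventional field layout (`image_id`, `image_w`, `image_h`, `num_boxes`, `boxes`, `features`) and a 2048-dimensional feature vector per region; verify both against the files you actually download.

```python
import base64
import csv
import sys

import numpy as np

# Assumed field layout of the released feature TSVs; check your copy.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]


def read_tsv(path, feature_dim=2048):
    """Yield one dict per image with decoded region boxes and features."""
    # Rows can be long (base64 blobs), so raise the csv field-size limit.
    csv.field_size_limit(min(sys.maxsize, 2**31 - 1))
    with open(path) as f:
        for row in csv.DictReader(f, delimiter="\t", fieldnames=FIELDNAMES):
            num_boxes = int(row["num_boxes"])
            row["num_boxes"] = num_boxes
            # Boxes: (num_boxes, 4) float32 array of region coordinates.
            row["boxes"] = np.frombuffer(
                base64.b64decode(row["boxes"]), dtype=np.float32
            ).reshape(num_boxes, 4)
            # Features: (num_boxes, feature_dim) float32 array.
            row["features"] = np.frombuffer(
                base64.b64decode(row["features"]), dtype=np.float32
            ).reshape(num_boxes, feature_dim)
            yield row
```

The decoded per-region feature matrix is what downstream captioning or VQA models attend over in place of a uniform spatial grid.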
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | A VQA system using bottom-up attention, aiming to improve the efficiency and speed of visual question answering | 755 |
| | A Visual Question Answering model using a deeper LSTM and a normalized CNN architecture | 377 |
| | A Gluon implementation of Residual Attention Network for image classification | 108 |
| | Adaptive attention mechanism for image captioning using visual sentinels | 335 |
| | Code for training image question answering models with stacked attention networks and CNNs | 108 |
| | A deep learning model that generates sentence embeddings via structured self-attention, used for binary and multiclass classification | 494 |
| | An image classification neural network using attention mechanisms and residual learning | 94 |
| | Code for training a Faster R-CNN object detection model on the DOTA dataset | 337 |
| | Implementations of CNN architectures and improvement methods for image classification on the CIFAR benchmark | 703 |
| | A deep learning model for generating image captions with semantic attention | 51 |
| | Trains deep CNN denoisers to improve image restoration tasks such as deblurring and demosaicking via model-based optimization | 602 |
| | A deep neural network using attention mechanisms and residual connections for image classification | 554 |
| | A TensorFlow model for recognizing text in images using visual attention and a sequence-to-sequence architecture | 1,079 |
| | A PyTorch implementation of visual question answering with multimodal representation learning | 718 |
| | Improves convolutional neural network performance by transferring knowledge from teacher models to student models via attention mechanisms | 1,449 |