VQA_LSTM_CNN
VQA model
A Visual Question Answering model using a deeper LSTM and normalized CNN architecture.
Trains a deeper LSTM and normalized CNN model for Visual Question Answering. The current code reaches 58.16 on Open-Ended and 63.09 on Multiple-Choice on the test-standard split.
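As a rough illustration of the "normalized CNN" part, the sketch below L2-normalizes an image feature vector before combining it with a question embedding. It is not the repository's Lua code. The function names `l2_normalize` and `fuse`, the toy 4-dimensional vectors, and the element-wise product used for fusion are all assumptions made for this example.

```python
import math


def l2_normalize(features, eps=1e-12):
    """Scale a feature vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in features))
    # eps guards against division by zero for an all-zero vector.
    return [x / max(norm, eps) for x in features]


def fuse(image_vec, question_vec):
    """Combine two equal-length embeddings by element-wise product.

    The fusion rule is an assumption for this sketch, not taken from the source.
    """
    if len(image_vec) != len(question_vec):
        raise ValueError("embeddings must have the same length")
    return [i * q for i, q in zip(image_vec, question_vec)]


# Toy vectors stand in for real CNN activations and an LSTM question encoding.
image_features = l2_normalize([3.0, 4.0, 0.0, 0.0])
question_embedding = [0.5, -1.0, 2.0, 0.0]
joint = fuse(image_features, question_embedding)
```

In a full model, a joint vector like this would typically feed a classifier over candidate answers. Normalizing the image features first keeps their scale comparable to the question embedding.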
377 stars
25 watching
133 forks
Language: Lua
Last commit: over 6 years ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework. | 39 |
| | A PyTorch implementation of visual question answering with multimodal representation learning | 718 |
| | A multimodal LLM designed to handle text-rich visual questions | 270 |
| | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,336 |
| | Trains a bottom-up attention model using Faster R-CNN and Visual Genome annotations for image captioning and VQA tasks | 1,438 |
| | A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling. | 222 |
| | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270 |
| | PyTorch implementation of video question answering system based on TVQA dataset | 172 |
| | This project provides code for training image question answering models using stacked attention networks and convolutional neural networks. | 108 |
| | A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions. | 180 |
| | An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks. | 348 |
| | Implementations and tools for training and fine-tuning a visual question answering model based on the 2017 CVPR workshop winner's approach. | 164 |
| | Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source. | 20 |
| | Develops CTC-based text recognition models with neural network architectures | 259 |
| | A VQA dataset with unanswerable questions designed to test the limits of large models' knowledge and reasoning abilities. | 3 |