VQA_LSTM_CNN
VQA model
A Visual Question Answering model using a deeper LSTM and normalized CNN architecture.
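"Deeper LSTM and normalized CNN" usually means the question is encoded by a multi-layer LSTM while the CNN image feature is L2-normalized before the two are fused. The sketch below illustrates only that normalize-and-fuse step: it L2-normalizes an image feature, projects it into a shared space, multiplies it element-wise with a question vector, and applies a softmax over answers. It rests on several assumptions. The LSTM encoder is replaced by a random stand-in vector, and the dimensions, the weight names `w_img` and `w_ans`, and all helper functions are hypothetical. None of it is taken from the repository's Lua code.

```python
import math
import random

# Hypothetical sizes for illustration only; the repository's real sizes may differ.
EMBED_DIM = 8      # shared embedding size for question and image
NUM_ANSWERS = 5    # size of the answer vocabulary (assumed)
CNN_DIM = 16       # raw CNN image feature size (assumed)


def l2_normalize(vec):
    """Scale a feature vector to unit Euclidean length (the 'normalized CNN' step)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_score(question_vec, image_feat, w_img, w_ans):
    """Hypothetical fusion head: normalized image feature x question vector -> answer probs."""
    img = l2_normalize(image_feat)                    # unit-length image feature
    img = [math.tanh(x) for x in linear(w_img, img)]  # project image into the shared space
    q = [math.tanh(x) for x in question_vec]          # squash the question vector (assumed)
    fused = [a * b for a, b in zip(q, img)]           # element-wise product fusion
    return softmax(linear(w_ans, fused))              # distribution over candidate answers


# Usage with random weights; question_vec stands in for a trained LSTM's final state.
random.seed(0)
w_img = [[random.gauss(0, 0.1) for _ in range(CNN_DIM)] for _ in range(EMBED_DIM)]
w_ans = [[random.gauss(0, 0.1) for _ in range(EMBED_DIM)] for _ in range(NUM_ANSWERS)]
question_vec = [random.gauss(0, 1) for _ in range(EMBED_DIM)]
image_feat = [abs(random.gauss(0, 1)) for _ in range(CNN_DIM)]
probs = fuse_and_score(question_vec, image_feat, w_img, w_ans)
print("predicted answer index:", max(range(NUM_ANSWERS), key=probs.__getitem__))
```

Normalizing the image feature keeps its scale comparable to the tanh-bounded question vector. The element-wise product keeps the fused vector at the shared dimension, which keeps the answer classifier small.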
Train a deeper LSTM and normalized CNN Visual Question Answering model. The current code scores 58.16 on Open-Ended and 63.09 on Multiple-Choice on the test-standard split.
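These scores are presumably reported on the standard VQA accuracy metric. Under that metric, a predicted answer earns min(number of matching human answers / 3, 1), averaged over the 10 leave-one-annotator-out subsets of the 10 human answers. The result is then averaged over questions and reported as a percentage. In Multiple-Choice, the model picks from a given candidate list and is scored the same way. The sketch below is simplified: answer normalization is reduced to lowercasing and stripping whitespace, and the function names are hypothetical. The official evaluation script normalizes answers further, handling punctuation, articles, and number words.

```python
def vqa_accuracy(predicted, human_answers):
    """Simplified per-question VQA accuracy.

    Averages min(matches / 3, 1) over each leave-one-annotator-out subset.
    Normalization here is only lowercase + strip (an assumption; the official
    script does more).
    """
    pred = predicted.strip().lower()
    answers = [a.strip().lower() for a in human_answers]
    scores = []
    for i in range(len(answers)):
        others = answers[:i] + answers[i + 1:]
        matches = sum(1 for a in others if a == pred)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)


def overall_accuracy(predictions, annotations):
    """Mean per-question accuracy over a dataset, as a percentage."""
    total = sum(vqa_accuracy(p, a) for p, a in zip(predictions, annotations))
    return 100.0 * total / len(predictions)


# Usage: "two" was given by 2 of 10 annotators, "yes" by all 10.
humans_q1 = ["two"] * 2 + ["three"] * 8
humans_q2 = ["yes"] * 10
print(vqa_accuracy("two", humans_q1))
print(overall_accuracy(["two", "yes"], [humans_q1, humans_q2]))
```

With 2 matching annotators, the 2 subsets that drop a matching annotator score 1/3 and the other 8 score 2/3, for an average of 0.6. This is why an answer counts as fully correct only once at least 4 of the 10 humans gave it.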
377 stars
25 watching
133 forks
Language: Lua
Last commit: almost 6 years ago
Linked from 1 awesome list
Related projects:
| Description | Stars |
|---|---|
| This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework. | 39 |
| A PyTorch implementation of visual question answering with multimodal representation learning | 718 |
| A multimodal LLM designed to handle text-rich visual questions | 270 |
| Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,336 |
| Trains a bottom-up attention model using Faster R-CNN and Visual Genome annotations for image captioning and VQA tasks | 1,438 |
| A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling. | 222 |
| Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270 |
| A PyTorch implementation of a video question answering system based on the TVQA dataset | 172 |
| This project provides code for training image question answering models using stacked attention networks and convolutional neural networks. | 108 |
| A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions. | 180 |
| An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks. | 348 |
| Implementations and tools for training and fine-tuning a visual question answering model based on the 2017 CVPR workshop winner's approach. | 164 |
| Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source. | 20 |
| Develops CTC-based text recognition models with neural network architectures | 259 |
| A VQA dataset with unanswerable questions designed to test the limits of large models' knowledge and reasoning abilities. | 3 |