VQA_LSTM_CNN
VQA model
A Visual Question Answering model using a deeper LSTM and normalized CNN architecture.
Trains a deeper LSTM and normalized CNN model for Visual Question Answering. The current code scores 58.16 on Open-Ended and 63.09 on Multiple-Choice on the test-standard split.
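As a rough illustration of the "normalized CNN" part, the sketch below scales an image feature vector to unit Euclidean length before it would be combined with the question encoding. That "normalized" means L2 normalization is an assumption here, as are the function name `l2_normalize`, the `eps` guard, and the toy input values. None of them are taken from the repository's code, which is written in Lua.

```python
import math


def l2_normalize(features, eps=1e-12):
    """Scale a feature vector to unit Euclidean (L2) length.

    Hypothetical helper: `eps` guards against division by zero
    when the input is an all-zero vector.
    """
    norm = math.sqrt(sum(x * x for x in features))
    return [x / max(norm, eps) for x in features]


# Toy values standing in for CNN activations (assumed, not real model output).
image_features = l2_normalize([3.0, 4.0])
unit_length = math.sqrt(sum(x * x for x in image_features))
```

After normalization every image vector has the same length, so the magnitude of the raw CNN activations no longer dominates whatever fusion step follows.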
376 stars
25 watching
133 forks
Language: Lua
Last commit: over 5 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
jnhwkim/nips-mrn-vqa | A neural network model that answers visual questions by combining question and image features in a residual learning framework. | 39 |
cadene/vqa.pytorch | A PyTorch implementation of visual question answering with multimodal representation learning | 716 |
mlpc-ucsd/bliva | A multimodal LLM designed to handle text-rich visual questions | 269 |
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,300 |
peteanderson80/bottom-up-attention | Trains a bottom-up attention model using Faster R-CNN and Visual Genome annotations for image captioning and VQA tasks | 1,433 |
akirafukui/vqa-mcb | A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling. | 222 |
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 269 |
jayleicn/tvqa | PyTorch implementation of video question answering system based on TVQA dataset | 172 |
zcyang/imageqa-san | Code for training image question answering models using stacked attention networks and convolutional neural networks. | 107 |
xiaoman-zhang/pmc-vqa | A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions. | 174 |
davidmascharka/tbd-nets | An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks. | 348 |
markdtw/vqa-winner-cvprw-2017 | Implementations and tools for training and fine-tuning a visual question answering model based on the 2017 CVPR workshop winner's approach. | 164 |
hyeonwoonoh/vqa-transfer-externaldata | Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source. | 20 |
oyxhust/cnn-lstm-ctc-text-recognition | Develops CTC-based text recognition models with neural network architectures | 259 |
guoyang9/unk-vqa | A VQA dataset with unanswerable questions designed to test the limits of large models' knowledge and reasoning abilities. | 2 |