VQA-Transfer-ExternalData
VQA trainer
Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source.
Paper: "Transfer Learning via Unsupervised Task Discovery for Visual Question Answering"
- 20 stars
- 3 watching
- 2 forks
- Language: Python
- Last commit: almost 6 years ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Implementations and tools for training and fine-tuning a visual question answering model based on the 2017 CVPR workshop winner's approach. | 164 |
| | A PyTorch implementation of visual question answering with multimodal representation learning. | 718 |
| | An implementation of a VQA system using bottom-up attention, aiming to improve the efficiency and speed of visual question answering tasks. | 755 |
| | A PyTorch implementation of a video question answering system based on the TVQA dataset. | 172 |
| | A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling. | 222 |
| | A VQA dataset with unanswerable questions, designed to test the limits of large models' knowledge and reasoning abilities. | 3 |
| | A Python-based system for generating visual question-answer pairs using text-aware approaches to improve Text-VQA performance. | 21 |
| | A visual question answering model using a deeper LSTM and a normalized CNN architecture. | 377 |
| | A framework for training hierarchical co-attention models for visual question answering using preprocessed data and a specific image model. | 349 |
| | A neural network model that answers visual questions by combining question and image features in a residual learning framework. | 39 |
| | An implementation of reading comprehension over Wikipedia to answer open-domain questions, built with PyTorch and the SQuAD dataset. | 401 |
| | An implementation of a two-stage framework that prompts large language models with answer heuristics for knowledge-based visual question answering. | 270 |
| | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs. | 270 |
| | A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions. | 180 |
| | An open-source project proposing a method to train large-scale vision-language models with minimal resources and no fine-tuning required. | 94 |