vqa-mcb
VQA model framework
A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling.
222 stars
15 watching
79 forks
Language: Python
last commit: over 8 years ago Related projects:
Repository | Description | Stars |
---|---|---|
| An implementation of a two-stage framework designed to prompt large language models with answer heuristics for knowledge-based visual question answering tasks. | 270 |
| A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349 |
| A PyTorch implementation of visual question answering with multimodal representation learning | 718 |
| A multimodal LLM designed to handle text-rich visual questions | 270 |
| Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source. | 20 |
| A Visual Question Answering model using a deeper LSTM and normalized CNN architecture. | 377 |
| A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions. | 180 |
| A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences. | 78 |
| Implementations and tools for training and fine-tuning a visual question answering model based on the 2017 CVPR workshop winner's approach. | 164 |
| An implementation of a VQA system using bottom-up attention, aiming to improve the efficiency and speed of visual question answering tasks. | 755 |
| A VQA dataset with unanswerable questions designed to test the limits of large models' knowledge and reasoning abilities. | 3 |
| An evaluation framework using Chinese high school examination questions to assess large language model capabilities | 565 |
| An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks. | 296 |
| This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework. | 39 |
| A PyTorch-based model for answering questions about videos based on unseen scenes and storylines | 19 |