vqa-mcb
VQA model framework
A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling.
222 stars
15 watching
79 forks
Language: Python
Last commit: over 8 years ago
Related projects:
Repository | Description | Stars |
---|---|---|
milvlg/prophet | An implementation of a two-stage framework designed to prompt large language models with answer heuristics for knowledge-based visual question answering tasks. | 267 |
jiasenlu/hiecoattenvqa | A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349 |
cadene/vqa.pytorch | A PyTorch implementation of visual question answering with multimodal representation learning. | 716 |
mlpc-ucsd/bliva | A multimodal LLM designed to handle text-rich visual questions | 269 |
hyeonwoonoh/vqa-transfer-externaldata | Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source. | 20 |
gt-vision-lab/vqa_lstm_cnn | A Visual Question Answering model using a deeper LSTM and normalized CNN architecture. | 376 |
xiaoman-zhang/pmc-vqa | A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions. | 174 |
xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences. | 77 |
markdtw/vqa-winner-cvprw-2017 | Implementations and tools for training and fine-tuning a visual question answering model based on the 2017 CVPR workshop winner's approach. | 164 |
hengyuan-hu/bottom-up-attention-vqa | An implementation of a VQA system using bottom-up attention, aiming to improve the efficiency and speed of visual question answering tasks. | 754 |
guoyang9/unk-vqa | A VQA dataset with unanswerable questions designed to test the limits of large models' knowledge and reasoning abilities. | 2 |
openlmlab/gaokao-bench | An evaluation framework using Chinese high school examination questions to assess large language model capabilities. | 551 |
tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks. | 288 |
jnhwkim/nips-mrn-vqa | A neural network model that answers visual questions by combining question and image features in a residual learning framework. | 39 |
noagarcia/roll-videoqa | A PyTorch-based model for answering questions about videos based on unseen scenes and storylines. | 19 |