prophet

VQA prompter

An implementation of a two-stage framework designed to prompt large language models with answer heuristics for knowledge-based visual question answering tasks.

Implementation of the CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
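The two-stage idea can be illustrated with a short sketch. Per the paper, stage 1 trains a vanilla VQA model and extracts two kinds of answer heuristics from it: answer candidates (the model's top-k answers with confidence scores) and answer-aware examples (training samples that lie close to the query in the model's feature space). Stage 2 packs these heuristics into a few-shot prompt for the LLM. This is a minimal sketch, not the repository's code. The `Example` dataclass, the helper names, the use of a text caption as the image context, the cosine-similarity selection, and the prompt wording are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical VQA sample; a caption stands in for the image (assumption)."""
    caption: str
    question: str
    feature: list[float]             # assumed latent feature from the stage-1 VQA model
    answer_scores: dict[str, float]  # assumed raw per-answer logits from the stage-1 model
    answer: str = ""                 # ground truth, known only for in-context examples


def top_k_candidates(scores: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Answer candidates: softmax over the logits, keep the k most confident answers."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {a: math.exp(s - m) for a, s in scores.items()}
    z = sum(exps.values())
    probs = sorted(((a, e / z) for a, e in exps.items()), key=lambda p: p[1], reverse=True)
    return probs[:k]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def answer_aware_examples(query: Example, pool: list[Example], n: int = 2) -> list[Example]:
    """Answer-aware examples: the n pool samples whose features are closest to the query's."""
    return sorted(pool, key=lambda ex: cosine(query.feature, ex.feature), reverse=True)[:n]


def format_block(ex: Example, k: int) -> str:
    """Render one sample with its candidates; the query's answer is left blank for the LLM."""
    cands = ", ".join(f"{a} ({p:.2f})" for a, p in top_k_candidates(ex.answer_scores, k))
    return (f"Context: {ex.caption}\nQuestion: {ex.question}\n"
            f"Candidates: {cands}\nAnswer: {ex.answer}")


def build_prompt(query: Example, pool: list[Example], n: int = 2, k: int = 3) -> str:
    """Stage 2: instruction, then n answer-aware examples, then the query itself."""
    header = "Please answer the question according to the context and answer candidates."
    shots = [format_block(ex, k) for ex in answer_aware_examples(query, pool, n)]
    return "\n\n".join([header, *shots, format_block(query, k)])


if __name__ == "__main__":
    pool = [
        Example("a red double-decker bus on a street", "which city is this bus famous in?",
                [0.9, 0.1], {"london": 2.0, "paris": 0.5}, "london"),
        Example("a plate of sushi", "what country is this food from?",
                [0.1, 0.9], {"japan": 3.0, "china": 1.0}, "japan"),
    ]
    query = Example("a black taxi near big ben", "what city is this?",
                    [0.8, 0.2], {"london": 2.5, "new york": 1.0, "paris": 0.2})
    print(build_prompt(query, pool, n=1, k=2))
```

In this sketch the resulting string would be sent to the LLM as-is, and the completion after the final "Answer:" is taken as the prediction. Showing confidence scores next to each candidate lets the LLM weigh the VQA model's guesses rather than copy the top one.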


267 stars
3 watching
27 forks
Language: Python
last commit: over 1 year ago
Topics: a-okvqa, gpt-3, multimodal-deep-learning, okvqa, prompt-engineering, pytorch, visual-question-answering

Related projects:

cadene/vqa.pytorch: A PyTorch implementation of visual question answering with multimodal representation learning (716 stars)
markdtw/vqa-winner-cvprw-2017: Implementations and tools for training and fine-tuning a visual question answering model based on the 2017 CVPR workshop winner's approach (164 stars)
akirafukui/vqa-mcb: A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling (222 stars)
mlpc-ucsd/bliva: A multimodal LLM designed to handle text-rich visual questions (269 stars)
wisconsinaivision/vip-llava: A system designed to enable large multimodal models to understand arbitrary visual prompts (294 stars)
xiaoman-zhang/pmc-vqa: A medical visual question answering dataset and toolkit for training models to understand medical images and instructions (174 stars)
hengyuan-hu/bottom-up-attention-vqa: An implementation of a VQA system using bottom-up attention, aiming to make visual question answering faster and more efficient (754 stars)
lxtgh/omg-seg: An end-to-end model for multiple visual perception and reasoning tasks built on a single encoder, decoder, and large language model (1,300 stars)
makarandtapaswi/movieqa_cvpr2016: Explores question answering in movies using various machine learning approaches (80 stars)
jayleicn/tvqa: A PyTorch implementation of a video question answering system based on the TVQA dataset (172 stars)
gt-vision-lab/vqa_lstm_cnn: A visual question answering model using a deeper LSTM and a normalized CNN architecture (376 stars)
hitvoice/drqa: A PyTorch implementation of reading comprehension over Wikipedia for answering open-domain questions, using the SQuAD dataset (401 stars)
krrishdholakia/betterprompt: An API for evaluating the quality of text prompts for large language models (LLMs) based on perplexity estimation (38 stars)
hyeonwoonoh/vqa-transfer-externaldata: Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source (20 stars)
microsoft/pica: An empirical study on using GPT-3 for multimodal question answering tasks with few-shot learning (84 stars)