prophet

VQA prompter

An implementation of a two-stage framework designed to prompt large language models with answer heuristics for knowledge-based visual question answering tasks.

Implementation of the CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

GitHub

270 stars

3 watching

27 forks

Language: Python

Last commit: over 2 years ago

a-okvqa, gpt-3, multimodal-deep-learning, okvqa, prompt-engineering, pytorch, visual-question-answering

arxiv.org/abs/2303.01903

Related projects:

Repository	Description	Stars
cadene/vqa.pytorch	A PyTorch implementation of visual question answering with multimodal representation learning	718
markdtw/vqa-winner-cvprw-2017	Implementations and tools for training and fine-tuning a visual question answering model based on the 2017 CVPR workshop winner's approach.	164
akirafukui/vqa-mcb	A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling.	222
mlpc-ucsd/bliva	A multimodal LLM designed to handle text-rich visual questions	270
wisconsinaivision/vip-llava	A system designed to enable large multimodal models to understand arbitrary visual prompts	302
xiaoman-zhang/pmc-vqa	A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions.	180
hengyuan-hu/bottom-up-attention-vqa	An implementation of a VQA system using bottom-up attention, aiming to improve the efficiency and speed of visual question answering tasks.	755
lxtgh/omg-seg	Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model.	1,336
makarandtapaswi/movieqa_cvpr2016	This project explores question-answering in movies using various machine learning approaches.	80
jayleicn/tvqa	A PyTorch implementation of a video question answering system based on the TVQA dataset	172
gt-vision-lab/vqa_lstm_cnn	A Visual Question Answering model using a deeper LSTM and normalized CNN architecture.	377
hitvoice/drqa	Implements reading comprehension over Wikipedia to answer open-domain questions, using PyTorch and the SQuAD dataset	401
krrishdholakia/betterprompt	An API for evaluating the quality of text prompts used in Large Language Models (LLMs) based on perplexity estimation	43
hyeonwoonoh/vqa-transfer-externaldata	Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source.	20
microsoft/pica	An empirical study on using GPT-3 for multimodal question answering tasks with few-shot learning.	85