prophet
VQA prompter
An implementation of a two-stage framework designed to prompt large language models with answer heuristics for knowledge-based visual question answering tasks.
Implementation of the CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
270 stars
3 watching
27 forks
Language: Python
Last commit: over 1 year ago
Topics: a-okvqa, gpt-3, multimodal-deep-learning, okvqa, prompt-engineering, pytorch, visual-question-answering
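To make the two-stage idea in the description concrete, here is a minimal sketch of what "prompting with answer heuristics" could look like: a first-stage VQA model proposes answer candidates with confidence scores, and a second stage folds them into a text prompt for a large language model. Everything named here (`Candidate`, `format_candidates`, `build_prompt`, the prompt wording, and the use of an image caption as context) is a hypothetical illustration, not the repository's actual API or prompt template.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    confidence: float  # assumed: score produced by a stage-1 VQA model


def format_candidates(candidates, top_k=3):
    """Keep the top_k most confident candidates, rendered as 'answer (0.81)'."""
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)[:top_k]
    return ", ".join(f"{c.answer} ({c.confidence:.2f})" for c in ranked)


def build_prompt(question, caption, candidates, top_k=3):
    """Stage 2 (sketch): combine the question, an image caption, and the
    stage-1 answer candidates into one text prompt for an LLM."""
    return (
        "Answer the question using the context and the answer candidates. "
        "Each candidate is followed by a confidence score; "
        "the true answer may not be among them.\n"
        f"Context: {caption}\n"
        f"Question: {question}\n"
        f"Candidates: {format_candidates(candidates, top_k)}\n"
        "Answer:"
    )


# Made-up stage-1 output, for illustration only.
candidates = [
    Candidate("frisbee", 0.81),
    Candidate("ball", 0.12),
    Candidate("kite", 0.04),
    Candidate("stick", 0.03),
]
prompt = build_prompt(
    "What is the dog trying to catch?",
    "A dog jumping in a grassy park.",
    candidates,
)
print(prompt)
```

The resulting string would then be sent to whichever LLM is in use. Keeping only the top few candidates keeps the prompt short while still giving the model a hint about plausible answers.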
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A PyTorch implementation of visual question answering with multimodal representation learning. | 718 |
| | Implementations and tools for training and fine-tuning a visual question answering model based on the 2017 CVPR workshop winner's approach. | 164 |
| | A software framework for training and deploying multimodal visual question answering models using compact bilinear pooling. | 222 |
| | A multimodal LLM designed to handle text-rich visual questions. | 270 |
| | A system designed to enable large multimodal models to understand arbitrary visual prompts. | 302 |
| | A medical visual question answering dataset and toolkit for training models to understand medical images and instructions. | 180 |
| | An implementation of a VQA system using bottom-up attention, aiming to improve the efficiency and speed of visual question answering. | 755 |
| | An end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,336 |
| | An exploration of question answering about movies using various machine learning approaches. | 80 |
| | A PyTorch implementation of a video question answering system based on the TVQA dataset. | 172 |
| | A visual question answering model using a deeper LSTM and normalized CNN architecture. | 377 |
| | A PyTorch implementation of reading comprehension over Wikipedia for answering open-domain questions, using the SQuAD dataset. | 401 |
| | An API for evaluating the quality of text prompts for large language models (LLMs) based on perplexity estimation. | 43 |
| | Tools and scripts for training and evaluating a visual question answering model using transfer learning from an external data source. | 20 |
| | An empirical study on using GPT-3 for multimodal question answering with few-shot learning. | 85 |