GPT-4V-Evaluation

GPT-4V evaluation tool

An evaluation framework for GPT-4V models, using data from the paper "An Early Evaluation of GPT-4V(ision)".

Data for evaluating GPT-4V


11 stars
2 watching
0 forks
last commit: about 1 year ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| scut-dlvclab/gpt-4v_ocr | Evaluates the Optical Character Recognition capabilities of GPT-4V(ision) using various tasks and scenarios to identify its strengths and weaknesses | 120 |
| pjlab-adg/gpt4v-ad-exploration | An autonomous driving project exploring the capabilities of a visual-language model in understanding complex driving scenes and making decisions | 287 |
| prometheus-eval/prometheus-eval | An open-source framework that enables language model evaluation using Prometheus and GPT4 | 796 |
| 0xeb/gpt-analyst | A resource repository providing tools and guides for analyzing and reverse engineering GPT models | 181 |
| ai-secure/decodingtrust | An assessment tool for evaluating trustworthiness in GPT models across various aspects such as toxicity, bias, robustness, and fairness | 259 |
| allenai/olmo-eval | An evaluation framework for large language models | 310 |
| open-compass/vlmevalkit | A toolkit for evaluating large vision-language models on various benchmarks and datasets | 1,343 |
| yuweihao/mm-vet | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics | 267 |
| vchitect/vbench | A tool for evaluating and benchmarking video generative models in computer vision and artificial intelligence | 576 |
| jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 506 |
| langchain-ai/auto-evaluator | Automated evaluation of language models for question answering tasks | 744 |
| ailab-cvc/gpt4tools | An intelligent system that enables automatic control and utilization of visual foundation models to interact with images in conversational settings | 760 |
| gzcch/bingo | An analysis project investigating limitations of visual language models in understanding and processing images with potential biases and interference challenges | 53 |
| zzhanghub/eval-co-sod | An evaluation tool for co-saliency detection tasks | 96 |
| usepa/amet | Tools for evaluating and analyzing model predictions in atmospheric science | 21 |