GPT-4V-Evaluation

GPT-4V evaluation tool

An evaluation framework for GPT-4V models, using data from "An Early Evaluation of GPT-4V(ision)".

Data for evaluating GPT-4V


11 stars
2 watching
0 forks
last commit: about 1 year ago

Related projects:

Repository | Description | Stars
scut-dlvclab/gpt-4v_ocr | Evaluates the Optical Character Recognition capabilities of GPT-4V(ision) using various tasks and scenarios to identify its strengths and weaknesses | 121
pjlab-adg/gpt4v-ad-exploration | An autonomous driving project exploring the capabilities of a visual-language model in understanding complex driving scenes and making decisions | 288
prometheus-eval/prometheus-eval | An open-source framework that enables language model evaluation using Prometheus and GPT4 | 820
0xeb/gpt-analyst | A resource repository providing tools and guides for analyzing and reverse engineering GPT models | 184
ai-secure/decodingtrust | An assessment tool for evaluating trustworthiness in GPT models across various aspects such as toxicity, bias, robustness, and fairness | 267
allenai/olmo-eval | A framework for evaluating language models on NLP tasks | 326
open-compass/vlmevalkit | An evaluation toolkit for large vision-language models | 1,514
yuweihao/mm-vet | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics | 274
vchitect/vbench | A benchmark suite for evaluating the performance of video generative models | 643
jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 517
langchain-ai/auto-evaluator | Automated evaluation of language models for question answering tasks | 749
ailab-cvc/gpt4tools | An intelligent system that enables automatic control and utilization of visual foundation models to interact with images in conversational settings | 762
gzcch/bingo | An analysis project investigating limitations of visual language models in understanding and processing images with potential biases and interference challenges | 53
zzhanghub/eval-co-sod | An evaluation tool for co-saliency detection tasks | 97
usepa/amet | Tools for evaluating and analyzing model predictions in atmospheric science | 21