VLMEvalKit
Evaluation framework
An open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks.
2k stars · 11 watching · 211 forks
Language: Python
Last commit: about 1 month ago
Linked from 1 awesome list
Tags: chatgpt, claude, clip, computer-vision, evaluation, gemini, gpt, gpt-4v, gpt4, large-language-models, llava, llm, multi-modal, openai, openai-api, pytorch, qwen, vit, vqa
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmarking framework. | 273 |
| evolvinglmms-lab/lmms-eval | An evaluation framework that accelerates the development of large multimodal models by providing an efficient way to assess their performance. | 2,164 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 56 |
| open-compass/mmbench | A collection of benchmarks for evaluating the multi-modal understanding capability of large vision-language models. | 168 |
| chenllliang/mmevalpro | A benchmarking framework for evaluating large multimodal models with rigorous metrics and an efficient evaluation pipeline. | 22 |
| openai/simple-evals | Evaluates language models using standardized benchmarks and prompting techniques. | 2,059 |
| mshukor/evalign-icl | Evaluates and improves large multimodal models through in-context learning. | 21 |
| prometheus-eval/prometheus-eval | An open-source framework for language model evaluation using Prometheus and GPT-4. | 820 |
| huggingface/lighteval | An all-in-one toolkit for evaluating large language models (LLMs) across multiple backends. | 879 |
| pkunlp-icler/pca-eval | An open-source benchmark and evaluation tool for assessing multimodal large language models in embodied decision-making tasks. | 99 |
| mlgroupjlu/llm-eval-survey | A repository of papers and resources on evaluating large language models. | 1,450 |
| opengvlab/visionllm | A large language model designed to process and generate visual information. | 956 |
| edublancas/sklearn-evaluation | A tool for evaluating and visualizing machine learning model performance. | 3 |
| maluuba/nlg-eval | A toolset for evaluating and comparing natural language generation models. | 1,350 |
| esmvalgroup/esmvaltool | A community-developed tool for evaluating climate models and providing diagnostic metrics. | 230 |