TouchStone
Vision Model Evaluator
A tool that evaluates vision-language models by using a strong language model to judge their performance on tasks such as image recognition and text generation.
Touchstone: Evaluating Vision-Language Models by Language Models
78 stars
3 watching
0 forks
Language: Python
Last commit: 10 months ago
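
The paper's core idea (per the title above) is to have a language model act as the judge: it receives a textual description of the image together with the question and the candidate model's answer, and assigns a score. Below is a minimal sketch of that LLM-as-judge pattern using the `openai` Python client; the judge model, prompt wording, and 0-10 rubric are illustrative assumptions, not TouchStone's actual code.

```python
# Minimal LLM-as-judge sketch. Assumes OPENAI_API_KEY is set; the model name,
# prompt, and scoring rubric below are illustrative, not TouchStone's own.
from openai import OpenAI

client = OpenAI()

def judge_answer(question: str, reference: str, candidate: str) -> float:
    """Ask a judge LLM to score a candidate answer from 0 to 10."""
    prompt = (
        "You are grading a vision-language model.\n"
        f"Image description (ground truth): {reference}\n"
        f"Question: {question}\n"
        f"Model answer: {candidate}\n"
        "Reply with a single integer score from 0 (wrong) to 10 (perfect)."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # any strong judge model works; choice is illustrative
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the judge complies and replies with a bare integer.
    return float(resp.choices[0].message.content.strip())
```
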
Related projects:

Repository | Description | Stars |
---|---|---|
openai/simple-evals | A library for evaluating language models using standardized prompts and benchmarking tests. | 1,939 |
huggingface/evaluate | An evaluation framework for machine learning models and datasets, providing standardized metrics and tools for comparing model performance (see the usage sketch after this table). | 2,034 |
allenai/olmo-eval | An evaluation framework for large language models. | 310 |
pkunlp-icler/pca-eval | An open-source benchmark and evaluation tool for assessing multimodal large language models' performance in embodied decision-making tasks. | 100 |
edublancas/sklearn-evaluation | A tool for evaluating and visualizing machine learning model performance. | 3 |
open-compass/vlmevalkit | A toolkit for evaluating large vision-language models on various benchmarks and datasets. | 1,343 |
modelscope/evalscope | A framework for efficient large model evaluation and performance benchmarking. | 248 |
vchitect/vbench | A tool for evaluating and benchmarking video generative models. | 576 |
truskovskiyk/nima.pytorch | Assesses image quality using deep learning models (a PyTorch implementation of NIMA). | 335 |
tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks. | 288 |
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
ucsc-vlaa/vllm-safety-benchmark | A benchmark for evaluating the safety and robustness of vision language models against adversarial attacks. | 67 |
chenllliang/mmevalpro | A benchmarking framework for evaluating Large Multimodal Models by providing rigorous metrics and an efficient evaluation pipeline. | 22 |
huggingface/lighteval | A toolkit for evaluating Large Language Models across multiple backends. | 804 |
vishaal27/sus-x | An open-source method for adapting large-scale vision-language models to downstream tasks with minimal resources and no fine-tuning required. | 94 |
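
Several of the toolkits above follow the same load-a-metric-then-score pattern. As a concrete illustration, here is a minimal usage sketch for huggingface/evaluate; the metric name and toy labels are illustrative, not taken from any of the listed projects.

```python
import evaluate  # pip install evaluate

# Load a standardized metric by name and score predictions against references.
accuracy = evaluate.load("accuracy")
result = accuracy.compute(
    references=[0, 1, 1, 0],   # toy ground-truth labels
    predictions=[0, 1, 0, 0],  # toy model outputs
)
print(result)  # {'accuracy': 0.75}
```
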