MM-Vet

Model evaluator

Evaluates the capabilities of large multimodal models using a diverse set of tasks and metrics.

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)

GitHub

274 stars
2 watching
11 forks
Language: Python
last commit: 2 months ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
| tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks. | 296 |
| yuliang-liu/monkey | An end-to-end image captioning system that uses large multimodal models and provides tools for training, inference, and demo usage. | 1,849 |
| chenllliang/mmevalpro | A benchmarking framework for evaluating large multimodal models by providing rigorous metrics and an efficient evaluation pipeline. | 22 |
| mshukor/evalign-icl | Evaluates and improves large multimodal models through in-context learning. | 21 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 56 |
| allenai/olmo-eval | A framework for evaluating language models on NLP tasks. | 326 |
| haozhezhao/mic | Develops a multimodal vision-language model to enable machines to understand complex relationships between instructions and images in various tasks. | 337 |
| evolvinglmms-lab/lmms-eval | Tools and an evaluation framework that accelerate the development of large multimodal models by providing an efficient way to assess their performance. | 2,164 |
| yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 143 |
| fuxiaoliu/mmc | Develops a large-scale dataset and benchmark for training multimodal chart understanding models using large language models. | 87 |
| mikegu721/xiezhibenchmark | An evaluation suite to assess language models' performance on multiple-choice questions. | 93 |
| yuliang-liu/multimodalocr | An evaluation benchmark for OCR capabilities in large multimodal models. | 484 |
| tiger-ai-lab/uniir | Trains and evaluates a universal multimodal retrieval model to perform various information retrieval tasks. | 114 |
| felixgithub2017/mmcu | Measures how well large language models understand massive multitask Chinese datasets. | 87 |