MM-Vet

Model evaluator

Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)

GitHub

267 stars
2 watching
11 forks
Language: Python
Last commit: 19 days ago

Related projects:

| Repository | Description | Stars |
|---|---|---|
| zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
| tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks. | 288 |
| yuliang-liu/monkey | A toolkit for building conversational AI models that can process images and text inputs. | 1,825 |
| chenllliang/mmevalpro | A benchmarking framework for evaluating large multimodal models with rigorous metrics and an efficient evaluation pipeline. | 22 |
| mshukor/evalign-icl | Evaluates and improves large multimodal models through in-context learning. | 20 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 55 |
| allenai/olmo-eval | An evaluation framework for large language models. | 311 |
| haozhezhao/mic | Develops a multimodal vision-language model to enable machines to understand complex relationships between instructions and images across various tasks. | 334 |
| evolvinglmms-lab/lmms-eval | Tools and an evaluation suite for large multimodal models. | 2,058 |
| yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 137 |
| fuxiaoliu/mmc | Develops a large-scale dataset and benchmark for training multimodal chart-understanding models using large language models. | 84 |
| mikegu721/xiezhibenchmark | An evaluation suite to assess language models' performance on multiple-choice questions. | 91 |
| yuliang-liu/multimodalocr | An evaluation benchmark for OCR capabilities in large multimodal models. | 471 |
| tiger-ai-lab/uniir | Trains and evaluates a universal multimodal retrieval model for various information retrieval tasks. | 110 |
| felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset. | 87 |