MM-Vet

Model evaluator

Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)

GitHub

267 stars
2 watching
11 forks
Language: Python
Last commit: 19 days ago

Related projects:

| Repository | Description | Stars |
|---|---|---|
| zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
| tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks. | 288 |
| yuliang-liu/monkey | A toolkit for building conversational AI models that can process images and text inputs. | 1,825 |
| chenllliang/mmevalpro | A benchmarking framework for evaluating large multimodal models with rigorous metrics and an efficient evaluation pipeline. | 22 |
| mshukor/evalign-icl | Evaluates and improves large multimodal models through in-context learning. | 20 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 55 |
| allenai/olmo-eval | An evaluation framework for large language models. | 311 |
| haozhezhao/mic | Develops a multimodal vision-language model to enable machines to understand complex relationships between instructions and images across various tasks. | 334 |
| evolvinglmms-lab/lmms-eval | Tools and an evaluation suite for large multimodal models. | 2,058 |
| yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 137 |
| fuxiaoliu/mmc | Develops a large-scale dataset and benchmark for training multimodal chart-understanding models using large language models. | 84 |
| mikegu721/xiezhibenchmark | An evaluation suite to assess language models' performance on multiple-choice questions. | 91 |
| yuliang-liu/multimodalocr | An evaluation benchmark for OCR capabilities in large multimodal models. | 471 |
| tiger-ai-lab/uniir | Trains and evaluates a universal multimodal retrieval model for various information retrieval tasks. | 110 |
| felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset. | 87 |