CMMMU

Multimodal QA model evaluator

An evaluation benchmark and dataset for multimodal question answering models
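Since this page does not show the benchmark's actual schema or API, the sketch below only illustrates the general shape of a multimodal multiple-choice QA record and how an evaluation harness might render it as a text prompt; the `MCQAExample` class and its field names are assumptions for illustration, not the CMMMU format.

```python
# Illustrative sketch only: MCQAExample and its fields are a generic layout for a
# multimodal multiple-choice QA item, not the actual CMMMU schema.
from dataclasses import dataclass, field

@dataclass
class MCQAExample:
    question: str
    options: dict                                     # option letter -> option text
    answer: str                                       # gold option letter
    image_paths: list = field(default_factory=list)   # images referenced by the question

def format_prompt(ex: MCQAExample) -> str:
    """Render one example as the text portion of a model prompt."""
    lines = [ex.question]
    lines += [f"{letter}. {text}" for letter, text in sorted(ex.options.items())]
    lines.append("Answer with the option letter only.")
    return "\n".join(lines)

if __name__ == "__main__":
    ex = MCQAExample(
        question="Which of the charts in the image shows the largest increase?",
        options={"A": "Chart 1", "B": "Chart 2", "C": "Chart 3", "D": "Chart 4"},
        answer="B",
        image_paths=["charts.png"],
    )
    print(format_prompt(ex))
```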

GitHub

46 stars
2 watching
1 fork
Language: Python
Last commit: 3 months ago
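The scoring side of an evaluator like this typically reduces to comparing predicted option letters against gold answers; the snippet below is a generic accuracy computation for that pattern, not code taken from the CMMMU repository.

```python
# Generic multiple-choice scoring sketch; not taken from the CMMMU codebase.
def multiple_choice_accuracy(predictions, answers):
    """Exact-match accuracy over option letters such as 'A'..'D'."""
    if not answers:
        return 0.0
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)

if __name__ == "__main__":
    preds = ["A", "C", "B", "D"]   # hypothetical model outputs
    golds = ["A", "B", "B", "D"]   # hypothetical gold labels
    print(f"accuracy = {multiple_choice_accuracy(preds, golds):.2f}")  # prints 0.75
```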

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset | 87 |
| damo-nlp-sg/m3exam | A benchmark for evaluating large language models across multiple languages and formats | 92 |
| multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 14 |
| aifeg/benchlmm | An open-source benchmarking framework for evaluating the cross-style visual capability of large multimodal models | 83 |
| ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 315 |
| cmawer/reproducible-model | A project demonstrating how to create a reproducible machine learning model using Python and version control | 86 |
| qcri/llmebench | A benchmarking framework for large language models | 80 |
| mna/gocostmodel | A benchmarking package for the Go language | 61 |
| junyangwang0410/amber | An LLM-free benchmark suite for evaluating hallucination in MLLMs across various tasks and dimensions | 93 |
| mlcommons/inference | Measures the performance of deep learning models in various deployment scenarios | 1,236 |
| mikegu721/xiezhibenchmark | An evaluation suite for assessing language models' performance on multiple-choice questions | 91 |
| mariomka/regex-benchmark | A benchmarking project comparing the performance of different programming languages' regex engines | 315 |
| cmu-safari/prim-benchmarks | A benchmarking suite for evaluating the performance of memory-centric computing architectures | 137 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 55 |
| bradyfu/video-mme | A comprehensive evaluation benchmark for large language models' capabilities in video analysis | 406 |