CMMMU
Multimodal QA model evaluator
An evaluation benchmark and dataset for multimodal question answering models
46 stars
2 watching
1 fork
Language: Python
Last commit: 3 months ago
Related projects:
Repository | Description | Stars |
---|---|---|
felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of Chinese large language models using a multimodal dataset. | 87 |
damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 92 |
multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 14 |
aifeg/benchlmm | An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models | 83 |
ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 315 |
cmawer/reproducible-model | A project demonstrating how to create a reproducible machine learning model using Python and version control | 86 |
qcri/llmebench | A benchmarking framework for large language models | 80 |
mna/gocostmodel | A benchmarking package for the Go language. | 61 |
junyangwang0410/amber | An LLM-free benchmark suite for evaluating hallucination in MLLMs across various tasks and dimensions | 93 |
mlcommons/inference | Measures the performance of deep learning models in various deployment scenarios. | 1,236 |
mikegu721/xiezhibenchmark | An evaluation suite to assess language models' performance in multi-choice questions | 91 |
mariomka/regex-benchmark | A benchmarking project comparing the performance of different programming languages' regex engines | 315 |
cmu-safari/prim-benchmarks | A benchmarking suite for evaluating the performance of memory-centric computing architectures | 137 |
freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 55 |
bradyfu/video-mme | An evaluation framework for large language models in video analysis, providing a comprehensive benchmark of their capabilities. | 406 |