BenchLMM

Visual Model Benchmark

An open-source benchmarking framework for evaluating the cross-style visual capability of large multimodal models.

[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

GitHub

84 stars
0 watching
6 forks
Language: Python
last commit: 5 months ago
Topics: benchmark, cv, dataset, large-language-models, large-multimodal-models

Related projects:

| Repository | Description | Stars |
|---|---|---|
| ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 322 |
| damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 93 |
| qcri/llmebench | A benchmarking framework for large language models | 81 |
| ucsc-vlaa/vllm-safety-benchmark | A benchmark for evaluating the safety and robustness of vision language models against adversarial attacks | 72 |
| junyangwang0410/amber | An LLM-free benchmark suite for evaluating MLLMs' hallucination capabilities across various tasks and dimensions | 98 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 56 |
| szilard/benchm-ml | A benchmark for evaluating machine learning algorithms' performance on large datasets | 1,874 |
| multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 15 |
| ailab-cvc/seed | An implementation of a multimodal language model with capabilities for comprehension and generation | 585 |
| bradyfu/video-mme | A comprehensive benchmark for evaluating multimodal large language models on video analysis tasks | 422 |
| felixgithub2017/mmcu | Measures the understanding of massive multitask Chinese datasets using large language models | 87 |
| i-gallegos/fair-llm-benchmark | Compiles bias evaluation datasets and provides access to original data sources for large language models | 115 |
| mlcommons/inference | Measures the performance of deep learning models in various deployment scenarios | 1,256 |
| tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks | 296 |
| lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model | 1,336 |