BenchLMM
Visual Model Benchmark
An open-source benchmarking framework for evaluating the cross-style visual capabilities of large multimodal models.
[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
83 stars · 0 watching · 6 forks
Language: Python
Last commit: 3 months ago
Topics: benchmark, cv, dataset, large-language-models, large-multimodal-models
Related projects:
| Repository | Description | Stars |
|---|---|---|
| ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 315 |
| damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 92 |
| qcri/llmebench | A benchmarking framework for large language models | 80 |
| ucsc-vlaa/vllm-safety-benchmark | A benchmark for evaluating the safety and robustness of vision-language models against adversarial attacks | 67 |
| junyangwang0410/amber | An LLM-free benchmark suite for evaluating MLLMs' hallucination tendencies across various tasks and dimensions | 93 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 55 |
| szilard/benchm-ml | A benchmark for evaluating machine learning algorithms' performance on large datasets | 1,869 |
| multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 14 |
| ailab-cvc/seed | An implementation of a multimodal language model with capabilities for comprehension and generation | 576 |
| bradyfu/video-mme | An evaluation framework providing a comprehensive benchmark of large language models' capabilities in video analysis | 406 |
| felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset | 87 |
| i-gallegos/fair-llm-benchmark | Compiles bias evaluation datasets and provides access to original data sources for large language models | 110 |
| mlcommons/inference | Measures the performance of deep learning models in various deployment scenarios | 1,236 |
| tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks | 288 |
| lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model | 1,300 |