MMCBench
Model robustness tester
A benchmarking framework designed to evaluate the robustness of large multimodal models against common corruption scenarios
27 stars
5 watching
0 forks
Language: Python
Last commit: 10 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 55 |
felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset | 87 |
hendrycks/robustness | Evaluates and benchmarks the robustness of deep learning models to various corruptions and perturbations in computer vision tasks | 1,022 |
borealisai/advertorch | A toolbox for researching and evaluating robustness against attacks on machine learning models | 1,308 |
0x0mar/smod | A modular framework for testing and exploiting Modbus protocol vulnerabilities in industrial control systems | 73 |
ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 315 |
open-compass/mmbench | A collection of benchmarks to evaluate the multi-modal understanding capability of large vision language models | 163 |
robustbench/robustbench | A standardized benchmark for measuring the robustness of machine learning models against adversarial attacks | 667 |
sww9370/rocbert | A pre-trained Chinese language model designed to be robust against maliciously crafted texts | 15 |
chenllliang/mmevalpro | A benchmarking framework for evaluating Large Multimodal Models by providing rigorous metrics and an efficient evaluation pipeline | 22 |
guanghelee/neurips19-certificates-of-robustness | Tight certificates of adversarial robustness for randomly smoothed classifiers | 17 |
tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks | 288 |
fuxiaoliu/mmc | Develops a large-scale dataset and benchmark for training multimodal chart understanding models using large language models | 84 |
google-research/robustness_metrics | A toolset to evaluate the robustness of machine learning models | 466 |
vernamlab/medusa | Automated attack synthesis tool for discovering vulnerabilities in CPU architecture and cryptographic protocols | 18 |