MMCBench

Model robustness tester

A benchmarking framework for evaluating the robustness of large multimodal models against common corruptions.

27 stars

5 watching

0 forks

Language: Python

Last commit: over 1 year ago

Related projects:

Repository	Description	Stars
freedomintelligence/mllm-bench	Evaluates and compares the performance of multimodal large language models on various tasks	56
felixgithub2017/mmcu	Measures the understanding of massive multitask Chinese datasets using large language models	87
hendrycks/robustness	Evaluates and benchmarks the robustness of deep learning models to various corruptions and perturbations in computer vision tasks	1,030
borealisai/advertorch	A toolbox for researching and evaluating robustness against attacks on machine learning models	1,311
0x0mar/smod	A modular framework for testing and exploiting Modbus protocol vulnerabilities in industrial control systems	74
ailab-cvc/seed-bench	A benchmark for evaluating large language models' ability to process multimodal input	322
open-compass/mmbench	A collection of benchmarks to evaluate the multi-modal understanding capability of large vision language models	168
robustbench/robustbench	A standardized benchmark for measuring the robustness of machine learning models against adversarial attacks	682
sww9370/rocbert	A pre-trained Chinese language model designed to be robust against maliciously crafted texts	15
chenllliang/mmevalpro	A benchmarking framework for evaluating Large Multimodal Models by providing rigorous metrics and an efficient evaluation pipeline	22
guanghelee/neurips19-certificates-of-robustness	Provides a framework for computing tight certificates of adversarial robustness for randomly smoothed classifiers	17
tsb0601/mmvp	An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks	296
fuxiaoliu/mmc	Develops a large-scale dataset and benchmark for training multimodal chart understanding models using large language models	87
google-research/robustness_metrics	A toolset to evaluate the robustness of machine learning models	466
vernamlab/medusa	Automated attack synthesis tool for discovering vulnerabilities in CPU architecture and cryptographic protocols	18