MMCBench

Model robustness tester

A benchmarking framework for evaluating the robustness of large multimodal models against common corruption scenarios.

27 stars
5 watching
0 forks
Language: Python
Last commit: 12 months ago

Related projects:

Repository | Description | Stars
--- | --- | ---
freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 56
felixgithub2017/mmcu | Measures the understanding of massive multitask Chinese datasets using large language models | 87
hendrycks/robustness | Evaluates and benchmarks the robustness of deep learning models to various corruptions and perturbations in computer vision tasks | 1,030
borealisai/advertorch | A toolbox for researching and evaluating robustness against attacks on machine learning models | 1,311
0x0mar/smod | A modular framework for testing and exploiting Modbus protocol vulnerabilities in industrial control systems | 74
ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 322
open-compass/mmbench | A collection of benchmarks to evaluate the multi-modal understanding capability of large vision language models | 168
robustbench/robustbench | A standardized benchmark for measuring the robustness of machine learning models against adversarial attacks | 682
sww9370/rocbert | A pre-trained Chinese language model designed to be robust against maliciously crafted texts | 15
chenllliang/mmevalpro | A benchmarking framework for evaluating large multimodal models by providing rigorous metrics and an efficient evaluation pipeline | 22
guanghelee/neurips19-certificates-of-robustness | Provides a framework for computing tight certificates of adversarial robustness for randomly smoothed classifiers | 17
tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks | 296
fuxiaoliu/mmc | Develops a large-scale dataset and benchmark for training multimodal chart understanding models using large language models | 87
google-research/robustness_metrics | A toolset to evaluate the robustness of machine learning models | 466
vernamlab/medusa | Automated attack synthesis tool for discovering vulnerabilities in CPU architecture and cryptographic protocols | 18