MMCBench

Model robustness tester

A benchmarking framework for evaluating the robustness of large multimodal models against common corruptions.
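
To make the idea concrete, the sketch below shows the general shape of a corruption-robustness check: corrupt an input, query the model on both the clean and the corrupted version, and score how consistent the outputs stay. Everything here (the gaussian_noise corruption, the model_caption placeholder, and the word-overlap similarity) is an illustrative assumption, not MMCBench's actual API.

```python
# Minimal sketch of a corruption-robustness evaluation of the kind such a
# framework automates. model_caption and jaccard_similarity are hypothetical
# placeholders, not part of MMCBench.
import numpy as np

def gaussian_noise(image: np.ndarray, severity: float = 0.1) -> np.ndarray:
    """Apply one 'common corruption': additive Gaussian noise on [0, 1] pixels."""
    noisy = image + np.random.normal(0.0, severity, image.shape)
    return np.clip(noisy, 0.0, 1.0)

def model_caption(image: np.ndarray) -> str:
    """Stand-in for a multimodal model call (e.g., an image-to-text model)."""
    return "a photo of a cat"  # placeholder output

def jaccard_similarity(a: str, b: str) -> float:
    """Crude consistency score: word overlap between two model outputs."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1)

# Robustness here means: how consistent does the model's output stay when
# the input is corrupted at a given severity?
clean = np.random.rand(224, 224, 3)              # stand-in image
corrupted = gaussian_noise(clean, severity=0.2)
score = jaccard_similarity(model_caption(clean), model_caption(corrupted))
print(f"consistency under corruption: {score:.2f}")
```

A real harness would sweep multiple corruption types and severity levels and aggregate the consistency scores per model; this sketch only shows a single clean/corrupted pair.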

GitHub:

Stars: 27
Watchers: 5
Forks: 0
Language: Python
Last commit: 10 months ago

Related projects:

freedomintelligence/mllm-bench (55 stars): Evaluates and compares the performance of multimodal large language models on various tasks.
felixgithub2017/mmcu (87 stars): Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset.
hendrycks/robustness (1,022 stars): Evaluates and benchmarks the robustness of deep learning models to various corruptions and perturbations in computer vision tasks.
borealisai/advertorch (1,308 stars): A toolbox for researching and evaluating robustness against adversarial attacks on machine learning models.
0x0mar/smod (73 stars): A modular framework for testing and exploiting Modbus protocol vulnerabilities in industrial control systems.
ailab-cvc/seed-bench (315 stars): A benchmark for evaluating multimodal large language models' ability to process multimodal input.
open-compass/mmbench (163 stars): A collection of benchmarks to evaluate the multimodal understanding capability of large vision-language models.
robustbench/robustbench (667 stars): A standardized benchmark for measuring the robustness of machine learning models against adversarial attacks.
sww9370/rocbert (15 stars): A pre-trained Chinese language model designed to be robust against maliciously crafted texts.
chenllliang/mmevalpro (22 stars): A benchmarking framework for evaluating large multimodal models with rigorous metrics and an efficient evaluation pipeline.
guanghelee/neurips19-certificates-of-robustness (17 stars): Tight certificates of adversarial robustness for randomly smoothed classifiers.
tsb0601/mmvp (288 stars): An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks.
fuxiaoliu/mmc (84 stars): A large-scale dataset and benchmark for training multimodal chart understanding models using large language models.
google-research/robustness_metrics (466 stars): A toolset to evaluate the robustness of machine learning models.
vernamlab/medusa (18 stars): An automated attack synthesis tool for discovering vulnerabilities in CPU architecture and cryptographic protocols.