robustness_metrics

Model Robustness Tool

A toolset to evaluate the robustness of machine learning models


466 stars
11 watching
33 forks
Language: Jupyter Notebook
Last commit: 6 months ago
Linked from 1 awesome list
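The kind of measurement such a toolset automates can be illustrated with a short, self-contained sketch. Everything below (the linear `predict` stub, the toy clusters, the noise levels) is hypothetical and is not this repository's API; it only shows the basic pattern behind a corruption-robustness metric: re-evaluating one fixed model on increasingly perturbed copies of its evaluation set.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(x):
    # Hypothetical stand-in for a trained model: a fixed linear classifier.
    # Any callable that maps a batch of inputs to class labels would work here.
    weights = np.array([[1.0, 1.0], [-1.0, -1.0]])
    return np.argmax(x @ weights.T, axis=1)

def accuracy(inputs, labels):
    return float(np.mean(predict(inputs) == labels))

def accuracy_under_noise(inputs, labels, sigmas):
    # Re-evaluate the same model on Gaussian-corrupted copies of the inputs,
    # one corrupted copy per noise level sigma.
    return {s: accuracy(inputs + rng.normal(0.0, s, inputs.shape), labels)
            for s in sigmas}

# Toy data: two well-separated 2-D clusters with labels 0 and 1.
x = np.vstack([rng.normal(2.0, 0.5, (100, 2)),
               rng.normal(-2.0, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

print("clean accuracy:", accuracy(x, y))
print("accuracy per noise level:", accuracy_under_noise(x, y, sigmas=[0.5, 1.0, 2.0]))
```

The output traces how accuracy decays as the corruption strength grows, which is the core quantity a corruption-robustness benchmark reports.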


Related projects:

Repository | Description | Stars
hendrycks/robustness | Evaluates and benchmarks the robustness of deep learning models to various corruptions and perturbations in computer vision tasks. | 1,030
borealisai/advertorch | A toolbox for researching and evaluating robustness against attacks on machine learning models. | 1,311
robustbench/robustbench | A standardized benchmark for measuring the robustness of machine learning models against adversarial attacks. | 682
google-research/deep_ope | Provides benchmarking policies and datasets for offline reinforcement learning. | 85
guanghelee/neurips19-certificates-of-robustness | Provides a framework for computing tight certificates of adversarial robustness for randomly smoothed classifiers. | 17
jmgirard/mreliability | Tools for calculating the consistency of observer measurements in various contexts. | 41
edisonleeeee/greatx | A toolbox for graph reliability and robustness against noise, distribution shifts, and attacks. | 85
modeloriented/fairmodels | A tool for detecting bias in machine learning models and mitigating it using various techniques. | 86
google-research/rlds | A toolkit for storing and manipulating episodic data in reinforcement learning and related tasks. | 302
google/ml-fairness-gym | An open-source tool for simulating the long-term impacts of machine learning-based decision systems on social environments. | 314
freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 56
benhamner/metrics | Provides implementations of various supervised machine learning evaluation metrics in multiple programming languages. | 1,632
i-gallegos/fair-llm-benchmark | Compiles bias evaluation datasets and provides access to original data sources for large language models. | 115
sail-sg/mmcbench | A benchmarking framework designed to evaluate the robustness of large multimodal models against common corruption scenarios. | 27
cmawer/reproducible-model | A project demonstrating how to create a reproducible machine learning model using Python and version control. | 86
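Several of the projects above (advertorch, robustbench, neurips19-certificates-of-robustness) target adversarial rather than random perturbations. As a concept sketch only, not any listed library's API, the following evaluates a hand-set logistic-regression model under a one-step FGSM-style attack; all names, data, and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_accuracy(w, b, x, y, eps):
    # Accuracy after a one-step FGSM perturbation of L-infinity size eps.
    # For logistic loss the input gradient is (p - y) * w, so the attack
    # shifts each point by eps times the sign of that gradient.
    p = sigmoid(x @ w + b)
    grad = (p - y)[:, None] * w[None, :]   # d(loss)/d(x), one row per example
    x_adv = x + eps * np.sign(grad)
    preds = (sigmoid(x_adv @ w + b) > 0.5).astype(int)
    return float(np.mean(preds == y))

# Toy separable data and a hand-set decision boundary w.x + b = 0.
x = np.vstack([rng.normal(1.5, 0.5, (100, 2)),
               rng.normal(-1.5, 0.5, (100, 2))])
y = np.array([1] * 100 + [0] * 100)
w, b = np.array([1.0, 1.0]), 0.0

for eps in [0.0, 0.2, 0.5, 1.0]:
    print(f"eps={eps}: accuracy={fgsm_accuracy(w, b, x, y, eps):.2f}")
```

The accuracy-versus-eps curve this prints is the shape of result an adversarial-robustness benchmark reports: clean accuracy at eps=0, then the degradation as the attack budget grows.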