rlhf_trojan_competition
Backdoor detector
Detecting backdoors in language models to prevent malicious AI usage
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
109 stars
4 watching
9 forks
Language: Python
Last commit: 8 months ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
eth-sri/bayes-framework-leakage | Develops and evaluates a framework for detecting attacks on federated learning systems | 11 |
duo-labs/secret-bridge | Automates secret detection in GitHub repositories | 191 |
mzweilin/ipv6-attack-detector | Tools to detect and analyze link-local IPv6 attacks | 39 |
kaiyuanzh/flip | A framework for defending against backdoor attacks in federated learning systems | 48 |
logrhythm-labs/pie | A framework that detects and responds to phishing attacks by analyzing email contents, attachments, and links. | 180 |
git-disl/lockdown | A backdoor defense system for federated learning, designed to protect against data poisoning attacks by isolating subspace training and aggregating models with robust consensus fusion. | 18 |
legit-labs/legitify | Automates vulnerability detection and remediation across GitHub and GitLab assets to strengthen software security posture. | 782 |
sophos/sorel-20m | A large-scale dataset and codebase for training machine learning models to detect malicious software | 646 |
ai4risk/antifraud | Develops and evaluates machine learning models for detecting financial fraud | 195 |
rlhf-v/rlhf-v | Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy. | 245 |
hfzhang31/a3fl | A framework for attacking federated learning systems with adaptive backdoor attacks | 23 |
safellama/plexiglass | A toolkit to detect and protect against vulnerabilities in Large Language Models. | 122 |
newrelic/rusty-hog | Secret scanner built in Rust for performance to detect sensitive information | 461 |
ai-secure/crfl | A framework for robust federated learning against backdoor attacks. | 71 |
13o-bbr-bbq/machine_learning_security | An open-source project that explores the intersection of machine learning and security to develop tools for detecting vulnerabilities in web applications. | 1,987 |