rlhf_trojan_competition
Backdoor detector
Detecting backdoors in language models to prevent malicious AI usage
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
107 stars
4 watching
9 forks
Language: Python
Last commit: 5 months ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
eth-sri/bayes-framework-leakage | Develops and evaluates a framework for detecting attacks on federated learning systems | 11 |
duo-labs/secret-bridge | Automates secret detection in GitHub repositories | 189 |
mzweilin/ipv6-attack-detector | Tools to detect and analyze link-local IPv6 attacks | 39 |
kaiyuanzh/flip | A framework for defending against backdoor attacks in federated learning systems | 44 |
logrhythm-labs/pie | A framework that detects and responds to phishing attacks by analyzing email contents, attachments, and links | 180 |
git-disl/lockdown | A backdoor defense for federated learning, where models are trained on distributed datasets | 14 |
legit-labs/legitify | Automates vulnerability detection and remediation across GitHub and GitLab assets to strengthen software security posture | 774 |
sophos/sorel-20m | A large-scale dataset and codebase for training machine learning models to detect malicious software | 638 |
ai4risk/antifraud | Develops and evaluates machine learning models for detecting financial fraud | 174 |
rlhf-v/rlhf-v | Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy | 233 |
hfzhang31/a3fl | A framework for attacking federated learning systems with adaptive backdoor attacks | 22 |
safellama/plexiglass | A toolkit to detect and protect against vulnerabilities in large language models | 121 |
newrelic/rusty-hog | Secret scanner built in Rust for performance to detect sensitive information | 454 |
ai-secure/crfl | A framework for robust federated learning against backdoor attacks | 71 |
13o-bbr-bbq/machine_learning_security | Explores the intersection of machine learning and security, with tools and techniques for vulnerability detection and penetration testing of web applications | 1,979 |