rlhf_trojan_competition
Backdoor detector
Detecting backdoors in language models to prevent malicious AI usage
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
109 stars
4 watching
9 forks
Language: Python
last commit: 9 months ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---:|
| | Develops and evaluates a framework for detecting attacks on federated learning systems | 11 |
| | Automates secret detection in GitHub repositories | 191 |
| | Tools to detect and analyze link-local IPv6 attacks | 39 |
| | A framework for defending against backdoor attacks in federated learning systems | 48 |
| | A framework that detects and responds to phishing attacks by analyzing email contents, attachments, and links | 180 |
| | A backdoor defense system for federated learning that protects against data poisoning attacks by isolating subspace training and aggregating models with robust consensus fusion | 18 |
| | Automates vulnerability detection and remediation across GitHub and GitLab assets to strengthen software security posture | 782 |
| | A large-scale dataset and codebase for training machine learning models to detect malicious software | 646 |
| | Develops and evaluates machine learning models for detecting financial fraud | 195 |
| | Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy | 245 |
| | A framework for attacking federated learning systems with adaptive backdoor attacks | 23 |
| | A toolkit to detect and protect against vulnerabilities in large language models | 122 |
| | A secret scanner built in Rust for performance, used to detect sensitive information | 461 |
| | A framework for robust federated learning against backdoor attacks | 71 |
| | An open-source project at the intersection of machine learning and security that develops tools for detecting vulnerabilities in web applications | 1,987 |