rlhf_trojan_competition

Backdoor detector

Detecting backdoors in language models to prevent malicious AI usage

Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.

GitHub

107 stars
4 watching
9 forks
Language: Python
last commit: 5 months ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| eth-sri/bayes-framework-leakage | Develops and evaluates a framework for detecting attacks on federated learning systems | 11 |
| duo-labs/secret-bridge | Automates secret detection in GitHub repositories | 189 |
| mzweilin/ipv6-attack-detector | Tools to detect and analyze link-local IPv6 attacks | 39 |
| kaiyuanzh/flip | A framework for defending against backdoor attacks in federated learning systems | 44 |
| logrhythm-labs/pie | A framework that detects and responds to phishing attacks by analyzing email contents, attachments, and links. | 180 |
| git-disl/lockdown | A backdoor defense system against attacks in federated learning algorithms used for machine learning model training on distributed datasets. | 14 |
| legit-labs/legitify | Automates vulnerability detection and remediation across GitHub and GitLab assets to strengthen software security posture. | 774 |
| sophos/sorel-20m | A large-scale dataset and codebase for training machine learning models to detect malicious software | 638 |
| ai4risk/antifraud | Develops and evaluates machine learning models for detecting financial fraud | 174 |
| rlhf-v/rlhf-v | Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy. | 233 |
| hfzhang31/a3fl | A framework for attacking federated learning systems with adaptive backdoor attacks | 22 |
| safellama/plexiglass | A toolkit to detect and protect against vulnerabilities in Large Language Models. | 121 |
| newrelic/rusty-hog | Secret scanner built in Rust for performance to detect sensitive information | 454 |
| ai-secure/crfl | This project presents a framework for robust federated learning against backdoor attacks. | 71 |
| 13o-bbr-bbq/machine_learning_security | This project explores the intersection of machine learning and security, focusing on developing tools and techniques to improve vulnerability detection and penetration testing in web applications. | 1,979 |