rlhf_trojan_competition

Backdoor detector

Detecting backdoors in language models to prevent malicious AI usage

Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.

GitHub

109 stars

4 watching

9 forks

Language: Python

last commit: about 1 year ago

Linked from 1 awesome list

Backlinks from these awesome lists:

chawins/llm-sp

Related projects:

Repository	Description	Stars
eth-sri/bayes-framework-leakage	Develops and evaluates a framework for detecting attacks on federated learning systems	11
duo-labs/secret-bridge	Automates secret detection in GitHub repositories	191
mzweilin/ipv6-attack-detector	Tools to detect and analyze link-local IPv6 attacks	39
kaiyuanzh/flip	A framework for defending against backdoor attacks in federated learning systems	48
logrhythm-labs/pie	A framework that detects and responds to phishing attacks by analyzing email contents, attachments, and links	180
git-disl/lockdown	A backdoor defense system for federated learning, designed to protect against data poisoning attacks by isolating subspace training and aggregating models with robust consensus fusion	18
legit-labs/legitify	Automates vulnerability detection and remediation across GitHub and GitLab assets to strengthen software security posture	782
sophos/sorel-20m	A large-scale dataset and codebase for training machine learning models to detect malicious software	646
ai4risk/antifraud	Develops and evaluates machine learning models for detecting financial fraud	195
rlhf-v/rlhf-v	Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy	245
hfzhang31/a3fl	A framework for attacking federated learning systems with adaptive backdoor attacks	23
safellama/plexiglass	A toolkit to detect and protect against vulnerabilities in Large Language Models	122
newrelic/rusty-hog	Secret scanner built in Rust for performance to detect sensitive information	461
ai-secure/crfl	A framework for robust federated learning against backdoor attacks	71
13o-bbr-bbq/machine_learning_security	An open-source project that explores the intersection of machine learning and security to develop tools for detecting vulnerabilities in web applications	1,987