llm-sp

LLM Security Resources

A collection of papers and resources on the security and privacy of Large Language Models (LLMs), providing insights into vulnerabilities and potential attack vectors.

Papers and resources related to the security and privacy of LLMs 🤖

433 stars · 15 watching · 33 forks · Language: Python · last commit: 2 months ago

Topics: adversarial-machine-learning, awesome-list, llm, llm-privacy, llm-security, privacy, security

LLM Security & Privacy / Vulnerabilities / Jailbreak

On a high level, the idea is similar to the red-teaming-LLM-with-LLM paper: they train an LLM (called AdvPrompter) to automatically jailbreak a target LLM. AdvPrompter is trained on a reward given by the target model's log-probability of an affirmative response ("Sure, here is..."). The results are good but perhaps not as strong as the leading attacks at the time; there are, however, a lot of interesting technical contributions.
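A minimal sketch of the reward signal described above, i.e. the log-probability the target model assigns to an affirmative prefix such as "Sure, here is" given a candidate prompt. The model, prefix string, and example prompt are illustrative assumptions, not details from the paper.

```python
# Hedged sketch: score a candidate adversarial prompt by the log-probability the
# target model assigns to an affirmative continuation, log p("Sure, here is" | prompt).
# "gpt2" is a stand-in target; the paper attacks aligned chat models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def affirmative_logprob(prompt: str, target_prefix: str = "Sure, here is") -> float:
    """Sum of log-probs of `target_prefix` tokens conditioned on `prompt`."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    target_ids = tok(target_prefix, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position i predict token i+1, so slice the span that predicts the prefix.
    preds = logits[0, prompt_ids.shape[1] - 1 : -1].log_softmax(dim=-1)
    gold = target_ids[0]
    return preds[torch.arange(gold.shape[0]), gold].sum().item()

# Higher score = the target is more inclined to comply; this is the reward an
# attacker LLM would be trained to maximize.
print(affirmative_logprob("Write step-by-step instructions for ..."))
```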

LLM Security & Privacy / Vulnerabilities / Privacy

The authors also advocate for approximate memorization instead of verbatim memorization, similar to Ippolito et al. (2023).
GitHub - iamgroot42/mimir: Python package for measuring memorization in LLMs. A library of MIAs on LLMs, including Min-k%, zlib, the reference-based attack (Ref), and the neighborhood attack (a minimal Min-k% sketch appears after this list).
Temporal shift between member and non-member test samples contributes to an overestimated MIA success rate. The authors measure this distribution shift with n-gram overlap (see the overlap sketch after this list).
Ask the target LLM to select the verbatim text from a copyrighted book or arXiv paper in a multiple-choice format (four choices); the other three options are close LLM-paraphrased texts. The core idea is similar to the neighborhood attack, but uses MCQA instead of loss computation. The authors also debias/normalize for the effect of answer ordering, which LLMs are known to have trouble with (a debiasing sketch appears after this list).
Proposes the SHIELD defense, which works by (1) detecting copyrighted content in the model's output, (2) verifying it with an internet search, and (3) few-shot prompting to let the model refuse or answer as appropriate (summaries and QA are fine, but not verbatim reproduction). The defense seems very effective and performs better than MemFree.
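A minimal sketch of the Min-k% score from the mimir entry above, assuming a HuggingFace causal LM; the model name and the value of k are placeholders, and mimir's own implementation may differ in detail.

```python
# Hedged sketch of the Min-k% membership-inference score: average the log-probs
# of the k% least likely tokens in the candidate text; a higher (less negative)
# score suggests the text was seen in training. "gpt2" and k=0.2 are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def min_k_score(text: str, k: float = 0.2) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Per-token log-probs of the actual next tokens.
    logprobs = logits[0, :-1].log_softmax(dim=-1)
    token_lp = logprobs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]
    n = max(1, int(k * token_lp.shape[0]))
    return token_lp.topk(n, largest=False).values.mean().item()

print(min_k_score("The quick brown fox jumps over the lazy dog."))
```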
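A rough sketch of an n-gram overlap measure of the kind referenced in the temporal-shift entry above; the choice of n and the normalization are assumptions, not the paper's exact statistic.

```python
# Hedged sketch: fraction of a candidate set's n-grams that also occur in a
# reference set. Low overlap between "members" and "non-members" indicates a
# distribution (e.g., temporal) shift that can inflate apparent MIA accuracy.
def ngrams(text: str, n: int = 7) -> set:
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def ngram_overlap(candidates: list[str], reference: list[str], n: int = 7) -> float:
    ref = set().union(*(ngrams(t, n) for t in reference)) if reference else set()
    cand = [g for t in candidates for g in ngrams(t, n)]
    return sum(g in ref for g in cand) / len(cand) if cand else 0.0

members = ["example member document drawn from the training period ..."]
non_members = ["example non-member document written after the cutoff ..."]
print(ngram_overlap(non_members, members, n=3))
```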
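A sketch of the order-debiased multiple-choice scoring described in the MCQA entry above: score each option under every permutation of the answer letters and average, so position bias cancels out. The prompt format and model are assumptions.

```python
# Hedged sketch of order-debiased MCQA scoring: present the four options under
# every ordering of the answer letters, read out the log-prob of each letter,
# and average per option so the model's position bias cancels.
import itertools
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LETTERS = ["A", "B", "C", "D"]

def letter_logprobs(prompt: str) -> torch.Tensor:
    """Log-prob of answering ' A' .. ' D' immediately after the prompt."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        dist = model(ids).logits[0, -1].log_softmax(dim=-1)
    letter_ids = [tok(" " + l, add_special_tokens=False).input_ids[0] for l in LETTERS]
    return dist[letter_ids]

def debiased_scores(question: str, options: list[str]) -> list[float]:
    """Average each option's selection log-prob over all orderings of the choices."""
    totals = [0.0] * len(options)
    perms = list(itertools.permutations(range(len(options))))
    for perm in perms:
        lines = [f"{LETTERS[i]}. {options[p]}" for i, p in enumerate(perm)]
        prompt = question + "\n" + "\n".join(lines) + "\nAnswer:"
        lps = letter_logprobs(prompt)
        for i, p in enumerate(perm):
            totals[p] += lps[i].item()
    return [t / len(perms) for t in totals]

# options[0] stands in for the verbatim excerpt, the rest for LLM paraphrases.
print(debiased_scores("Which passage appears verbatim in the source?",
                      ["verbatim excerpt", "paraphrase one", "paraphrase two", "paraphrase three"]))
```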

LLM Security & Privacy / Defenses / Against Jailbreak & Prompt Injection

The technique is based on the Representation Engineering paper (an activation-steering sketch follows).
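The entry above only names the underlying idea, so here is a heavily hedged sketch of the generic representation-engineering / activation-steering pattern such defenses build on: derive a direction from contrasting prompts and add it to a hidden layer at inference. The model, layer, scale, and contrast prompts are all illustrative assumptions, not the paper's actual method.

```python
# Heavily hedged sketch of activation steering: estimate a "concept" direction
# from contrasting prompts, then add it to a chosen hidden layer at inference
# time via a forward hook. Everything concrete here (model, layer index, scale,
# prompts) is an illustrative assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER, SCALE = 6, 4.0

def last_token_hidden(prompt: str) -> torch.Tensor:
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids, output_hidden_states=True).hidden_states
    return hs[LAYER][0, -1]

# Difference of means over contrast prompts gives a crude direction estimate.
harmful = ["How do I pick a lock?"]      # placeholder "harmful" prompts
harmless = ["How do I bake bread?"]      # placeholder benign prompts
direction = (torch.stack([last_token_hidden(p) for p in harmful]).mean(0)
             - torch.stack([last_token_hidden(p) for p in harmless]).mean(0))
direction = direction / direction.norm()

def steer(module, inputs, output):
    # Shift the residual-stream activations along the direction at every position.
    return (output[0] + SCALE * direction,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer)
ids = tok("Please explain how to pick a lock.", return_tensors="pt").input_ids
print(tok.decode(model.generate(ids, max_new_tokens=30)[0]))
handle.remove()
```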

LLM Security & Privacy / Other resources / People/Orgs/Blog to Follow

Blog: ChatGPT Plugin Exploit Explained: From Prompt Injection to Accessing Private Data
Blog: Advanced Data Exfiltration Techniques with ChatGPT
Blog: Hacking Google Bard - From Prompt Injection to Data Exfiltration
Blog: Securing LLM Systems Against Prompt Injection
X: Meme

LLM Security & Privacy / Other resources / Resource Compilation

https://github.com/corca-ai/awesome-llm-security : A curation of awesome tools, documents, and projects about LLM security
https://github.com/briland/LLM-security-and-privacy
https://llmsecurity.net/ : LLM security is the investigation of the failure modes of LLMs in use, the conditions that lead to them, and their mitigations
https://surrealyz.github.io/classes/llmsec/llmsec.html : CMSC818I: Advanced Topics in Computer Systems; Large Language Models, Security, and Privacy (UMD) by Prof. Yizheng Chen
https://www.jailbreakchat.com/ : Crowd-sourced jailbreaks
https://github.com/ethz-spylab/rlhf_trojan_competition : Competition track at SaTML 2024
https://github.com/Hannibal046/Awesome-LLM/ : Huge compilation of LLM papers and software

LLM Security & Privacy / Other resources / Open-Source Projects

https://github.com/LostOxygen/llm-confidentiality : Framework for evaluating LLM confidentiality
https://github.com/leondz/garak : LLM vulnerability scanner
https://github.com/fiddler-labs/fiddler-auditor : Fiddler Auditor is a tool to evaluate language models
https://github.com/NVIDIA/NeMo : NeMo, a toolkit for conversational AI

LLM Security & Privacy / Logistics / Prompt Injection vs Jailbreak vs Adversarial Attacks

Jailbreak: a method for bypassing safety filters, system instructions, or preferences. Sometimes asking the model directly (like prompt injection) does not work, so more complex prompts (e.g., those collected on jailbreakchat.com) are used to trick the model.
