# awesome-llm-security

A curation of awesome tools, documents and projects about LLM Security.

## Papers

### White-box attack
[paper] | "Visual Adversarial Examples Jailbreak Large Language Models", 2023-06, AAAI(Oral) 24, , | ||
[paper] | "Are aligned neural networks adversarially aligned?", 2023-06, NeurIPS(Poster) 23, , | ||
[paper] | "(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs", 2023-07, | ||
[paper] | "Universal and Transferable Adversarial Attacks on Aligned Language Models", 2023-07, , | ||
[paper] | "Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models", 2023-07, , | ||
[paper] | "Image Hijacking: Adversarial Images can Control Generative Models at Runtime", 2023-09, , | ||
[paper] | "Weak-to-Strong Jailbreaking on Large Language Models", 2024-04, , | ||

### Black-box attack
[paper] | "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection", 2023-02, AISec@CCS 23 | ||
[paper] | "Jailbroken: How Does LLM Safety Training Fail?", 2023-07, NeurIPS(Oral) 23, | ||
[paper] | "Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models", 2023-07, | ||
[paper] | "Effective Prompt Extraction from Language Models", 2023-07, , | ||
[paper] | "Multi-step Jailbreaking Privacy Attacks on ChatGPT", 2023-04, EMNLP 23, , | ||
[paper] | "LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?", 2023-07, | ||
[paper] | "Jailbreaking chatgpt via prompt engineering: An empirical study", 2023-05, | ||
[paper] | "Prompt Injection attack against LLM-integrated Applications", 2023-06, | ||
[paper] | "MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots", 2023-07, , | ||
[paper] | "GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher", 2023-08, ICLR 24, , | ||
[paper] | "Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities", 2023-08, | ||
[paper] | "Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs", 2023-08, | ||
[paper] | "Detecting Language Model Attacks with Perplexity", 2023-08, | ||
[paper] | "Open Sesame! Universal Black Box Jailbreaking of Large Language Models", 2023-09, , | ||
[paper] | "Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!", 2023-10, ICLR(oral) 24, | ||
[paper] | "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models", 2023-10, ICLR(poster) 24, , , | ||
[paper] | "Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations", 2023-10, CoRR 23, , | ||
[paper] | "Multilingual Jailbreak Challenges in Large Language Models", 2023-10, ICLR(poster) 24, | ||
[paper] | "Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation", 2023-11, SoLaR(poster) 24, | ||
[paper] | "DeepInception: Hypnotize Large Language Model to Be Jailbreaker", 2023-11, | ||
[paper] | "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily", 2023-11, NAACL 24, | ||
[paper] | "AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models", 2023-10, | ||
[paper] | "Language Model Inversion", 2023-11, ICLR(poster) 24, | ||
[paper] | "An LLM can Fool Itself: A Prompt-Based Adversarial Attack", 2023-10, ICLR(poster) 24, | ||
[paper] | "GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts", 2023-09, | ||
[paper] | "Many-shot Jailbreaking", 2024-04, | ||
[paper] | "Rethinking How to Evaluate Language Model Jailbreak", 2024-04, | ||

### Backdoor attack
[paper] | "BITE: Textual Backdoor Attacks with Iterative Trigger Injection", 2022-05, ACL 23, | ||
[paper] | "Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models", 2023-05, EMNLP 23, | ||
[paper] | "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection", 2023-07, NAACL 24, | ||

### Fingerprinting
[paper] | "Instructional Fingerprinting of Large Language Models", 2024-01, NAACL 24 | ||
[paper] | "TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification", 2024-02, ACL 24 (findings) | ||
[paper] | "LLMmap: Fingerprinting For Large Language Models", 2024-07, | ||

### Defense
[paper] | "Baseline Defenses for Adversarial Attacks Against Aligned Language Models", 2023-09, | ||
[paper] | "LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked", 2023-08, ICLR 24 Tiny Paper, , | ||
[paper] | "Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM", 2023-09, , | ||
[paper] | "Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models", 2023-12, | ||
[paper] | "AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks", 2024-03, | ||
[paper] | "Protecting Your LLMs with Information Bottleneck", 2024-04, | ||
[paper] | "PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition", 2024-05, ICML 24, | ||
[paper] | “Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs”, 2024-06, | ||
[paper] | "Improving Alignment and Robustness with Circuit Breakers", 2024-06, NeurIPS 24, , | ||

### Platform Security
[paper] | "LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI’s ChatGPT Plugins", 2023-09, | ||

### Survey
[paper] | "Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks", 2023-10, ACL 24, | ||
[paper] | "Security and Privacy Challenges of Large Language Models: A Survey", 2024-02, | ||
[paper] | "Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models", 2024-03, | ||
[paper] | "Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)", 2024-07, | ||

## Benchmark
[paper] | "JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models", 2024-03, | ||
[paper] | "AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents", 2024-06, NeurIPS 24, | ||
[paper] | "AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents", 2024-10, | ||

## Tools
- Plexiglass: a security toolbox for testing and safeguarding LLMs
- PurpleLlama: a set of tools to assess and improve LLM security
- Rebuff: a self-hardening prompt injection detector
- Garak: an LLM vulnerability scanner
- LLMFuzzer: a fuzzing framework for LLMs
- LLM Guard: a security toolkit for LLM interactions
- Vigil: an LLM prompt injection detection toolkit
- jailbreak-evaluation: an easy-to-use Python package for language model jailbreak evaluation
- Prompt Fuzzer: an open-source tool to help you harden your GenAI applications
- WhistleBlower: an open-source tool that infers the system prompt of an AI agent from its generated text outputs
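
Detectors such as Rebuff, Vigil, and LLM Guard typically combine heuristic signatures, canary tokens, and model-based classifiers. The sketch below illustrates only a heuristic layer of that general kind; it is hypothetical and does not use or reflect any listed tool's actual API.

```python
# Hypothetical heuristic layer of a prompt injection detector (signature matching plus
# a canary check). Patterns and the canary scheme are illustrative assumptions.
import re
import secrets

INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now (dan|in developer mode)",
    r"reveal (the )?system prompt",
]

def looks_like_injection(user_input: str) -> bool:
    """Flag inputs matching known injection signatures."""
    text = user_input.lower()
    return any(re.search(pattern, text) for pattern in INJECTION_PATTERNS)

def add_canary(system_prompt: str):
    """Embed a random canary; if it later appears in model output, the prompt likely leaked."""
    canary = secrets.token_hex(8)
    return f"{system_prompt}\n[canary:{canary}]", canary

def leaked(model_output: str, canary: str) -> bool:
    return canary in model_output
```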

## Articles
- Hacking Auto-GPT and escaping its docker container
- Prompt Injection Cheat Sheet: How To Manipulate AI Language Models
- Indirect Prompt Injection Threats
- Prompt injection: What’s the worst that can happen?
- OWASP Top 10 for Large Language Model Applications
- PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news
- ChatGPT Plugins: Data Exfiltration via Images & Cross Plugin Request Forgery
- Jailbreaking GPT-4's code interpreter
- Securing LLM Systems Against Prompt Injection
- The AI Attack Surface Map v1.0
- Adversarial Attacks on LLMs
- How Anyone can Hack ChatGPT - GPT4o
- LLM Evaluation metrics, framework, and checklist
- How RAG Poisoning Made Llama3 Racist!

## Other Awesome Projects
- 0din GenAI Bug Bounty from Mozilla (https://0din.ai): the 0Day Investigative Network is a bug bounty program focused on flaws within GenAI models. Vulnerability classes include Prompt Injection, Training Data Poisoning, DoS, and more
- Gandalf: a prompt injection wargame
- LangChain vulnerable to code injection - CVE-2023-29374
- LLM Security startups
- Adversarial Prompting
- Epivolis: a prompt-injection-aware chatbot designed to mitigate adversarial efforts
- LLM Security Problems at DEFCON31 Quals (the world's top security competition)
- PromptBounty.io
- PALLMs (Payloads for Attacking Large Language Models)

## Other Useful Resources
- Twitter: @llm_sec
- Blog: LLM Security
- Blog: Embrace The Red
- Blog: Kai's Blog
- Newsletter: AI safety takes
- Newsletter & Blog: Hackstery