awesome-chaos-engineering

System stress test collection

A curated list of resources and examples on experimenting with distributed systems to improve resilience and reliability

A curated list of Chaos Engineering resources.

GitHub

6k stars
307 watching
649 forks
last commit: 11 months ago
Linked from 2 awesome lists

awesome, awesome-list, chaos, chaos-community, chaos-engineering, chaos-monkey, chaos-testing, netflix-chaos-monkey, resilience, simian-army, site-reliability-engineering

Awesome Chaos Engineering / Culture

Principles Of Chaos Engineering
Chaos Community
Chaos Engineering
O'Reilly Velocity San Jose 2017: Precision Chaos
The Discipline of Chaos Engineering
Chaos Monkey for Fun and Profit
Fault Injection in Production: Making the case for resilience testing
Lord of Chaos - Becoming a Chaos Engineer
Chaos testing - Preventing failure by instigation
Orchestrated Chaos
Choose your own adventure: Chaos Engineering (video)
AMA Chaos Engineering + DiRT
SRECON17: Principles of Chaos Engineering
Chaos & Intuition Engineering at Netflix
Mastering Chaos - A Netflix Guide to Microservices
Too big to test: Breaking a production brokerage platform without causing financial devastation
Inside Azure Search: Chaos Engineering
Netflix, the Simian Army, and the culture of freedom and responsibility
FIT: Failure Injection Testing
The Netflix Simian Army
Automated Failure Testing
The Verification of a Distributed System by Caitie McCaffrey
The Journey to Chaos Engineering begins with a single step - Bruce Wong and James Burns (Twilio)
Chaos Engineering by Lorin Hochstein
Aaron Rinehart - ChaoSlingr: Introducing Security based Chaos Testing
Chaos Engineering - Casey Rosenthal
The Road to Chaos - Velocity 2017 (video)
How Netflix DDoS’d Itself To Help Protect the Entire Internet
10 Years of Crashing Google
Weathering the Unexpected
SRECON17: Breaking Things on Purpose
PuppetConf 2016: Chaos Patterns - Architecting for Failure in Distributed Systems
Ship More, Sink Less - Changing Chaos Engineering and Distributed Tracing
Cloudcast - Discipline of Chaos Engineering
Software Engineering Daily - Failure Injection with Kolton Andrus podcast
Responding to Failures in Playback Features with Haley Tucker podcast
"Antics, drift, and chaos" by Lorin Hochstein
re:invent 2017: Nora Jones Describes Why We Need More Chaos - Chaos Engineering, That Is
Failure Friday: Four Years On
Monkeys & Lemurs and Locusts, Oh my!
Practical Chaos Engineering
Chaos Day in the Met Office Cloud
Cloud Native and Chaos Engineering
Chaos Engineering with Kolton Andrus
Chaos Engineering: the history, principles, and practice
Embracing the Chaos of Chaos Engineering
Designing Services for Resilience: Netflix Lessons
Chaos Engineering: A cheat sheet
How to convince your boss and make them say “Yes!” to Chaos Engineering?
Why the World Needs More Resilient Systems
Chaos Architecture
Gremlin’s Tammy Bütow on the Business Side of Chaos Engineering
Kubernetes Chaos Engineering: Lessons Learned
Chaos Engineering: managing complexity by breaking things
Podcast: Database Chaos with Tammy Butow
LinkedOut: A Request-Level Failure Injection Framework
GOTO 2018 - Breaking Things on Purpose - Kolton Andrus
Why should Chaos be part of your Distributed Systems Engineering?
Brian Holt - Chaos Monkeys in Your Browser What Chaos Engineering Means For the Front End
Chaos Engineering: Why the World Needs More Resilient Systems
QCon·Beijing 2017: The Practice of Failure Management and Fault Injection at Alibaba E-Commerce Platforms (video, Chinese speech)
Orchestrating Chaos using Grab's Experimentation Platform
Breaking to Learn: Chaos Engineering Explained
Chaos Engineering Traps
Chaos Engineering - The Art of Breaking Things Purposefully
Disasterpiece Theater: Slack’s process for approachable Chaos Engineering
Taming chaos: Preparing for your next incident
The Future of Chaos Engineering w/ Conde Nast
Chaos Engineering For People Systems w/ Dave Rensin of Google
Performing chaos engineering in a serverless world (AWS re:Invent 2019 CMY301)
Building Confidence in Healthcare Systems through Chaos Engineering
Break Your App before Someone Else Does
Preparing for Traffic Spikes with Chaos Engineering
Automating Chaos Engineering GameDays with Terraform
Postmortem Culture: Learning from failure
Problem Detection by John Allspaw
New Paradigms for the Next Era of Security
Cloud-Native Chaos Engineering
Building resilient services at Prime Video with chaos engineering
Making Chaos Part of Kubernetes/OpenShift Performance and Scalability Tests
Lucky Lotto, chaos engineering but for teams
Using Fault Injection Testing to Improve DoorDash Reliability
Chaos Engineering At Ant Group

Awesome Chaos Engineering / Books

Chaos Engineering: Building Confidence in System Behavior through Experiment
Site Reliability Engineering: How Google Runs Production Systems
The Practice Of Cloud System Administration: Designing and Operating Large Distributed Systems
Antifragile Systems and Teams
The InfoQ eMag: Chaos Engineering
Learning Chaos Engineering
Chaos Engineering: System Resilience in Practice
Chaos Engineering: Crash test your applications
Security Chaos Engineering: Gaining Confidence in Resilience and Safety at Speed and Scale
Chaos Engineering Observability

Awesome Chaos Engineering / Education

A Chaos Engineering Bootcamp for O'Reilly Velocity 2017 (slides)
Your First Chaos Experiment
Chaos Engineering 101
A Primer on Automating Chaos
Intro to Chaos Engineering
Learn the basics of the Chaos Toolkit
Build System Confidence with Chaos Engineering
How we break things at Twitter: failure testing
Run Chaos Experiments Without Risking Your Job
A Guide to Your First Chaos Day
Planning Your Own Chaos Day
How To Install Distributed Tensorflow on GCP and Perform Chaos Engineering Experiments
Monitoring Your Chaos Experiments
Increasing the Resilience of APIs with Chaos Engineering
3 key steps for running chaos engineering experiments
Exploring Multi-level Weaknesses using Automated Chaos Experiments
Chaos Monkey Guide for Engineers
Chaos Engineering for Serverless
Network Fire Drills with Chaos Engineering
DevOps Foundations: Chaos Engineering
Resilience Engineering: Short Course
The Chaos Engineering Collection
PenTester Academic
Consul and Chaos Engineering

Awesome Chaos Engineering / Notable Tools

Chaos Monkey 15,256 about 2 months ago A resiliency tool that helps applications tolerate random instance failures
orchestrator 5,637 4 months ago MySQL replication topology management and HA
kube-monkey 2,981 5 months ago An implementation of Netflix's Chaos Monkey for Kubernetes clusters
Gremlin Inc. Failure as a Service
Chaos Toolkit 1,891 4 months ago A chaos engineering toolkit to help you build confidence in your software system
steadybit A Chaos Engineering platform (SaaS or On-Prem) with auto discovery features, different attack types, user management and many more
PowerfulSeal 1,946 about 1 year ago Adds chaos to your Kubernetes clusters, so that you can detect problems in your systems as early as possible. It kills targeted pods and takes VMs up and down
drax 42 over 5 years ago DC/OS Resilience Automated Xenodiagnosis tool. It helps to test DC/OS deployments by applying a Chaos Monkey-inspired, proactive and invasive testing approach
Wiremock API mocking (Service Virtualization) which enables modeling real world faults and delays
MockLab API mocking (Service Virtualization) as a service which enables modeling real world faults and delays
Pod-Reaper 201 4 months ago A rules based pod killing container. Pod-Reaper was designed to kill pods that meet specific conditions that can be used for Chaos testing in Kubernetes
Muxy 823 almost 4 years ago A chaos testing tool for simulating real-world distributed system failures
Toxiproxy 10,841 12 days ago A TCP proxy to simulate network and system conditions for chaos and resiliency testing
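
Several of the tools above share one core idea: on each run, pick a random target and terminate it. The following is a minimal Python sketch of that selection step only, not the implementation of any listed tool. The function name `pick_victim`, the `probability` parameter, and the instance names are all hypothetical, and the actual termination call is deliberately left out.

```python
import random
from typing import Optional, Sequence


def pick_victim(
    instances: Sequence[str],
    probability: float,
    rng: random.Random,
) -> Optional[str]:
    """Return one instance to terminate, or None if this round is skipped.

    `probability` is the chance (0.0 to 1.0) that any termination happens
    in this round; it is a hypothetical knob for this sketch, not a setting
    taken from a real tool.
    """
    # Nothing to terminate, or this round lost the coin flip.
    if not instances or rng.random() >= probability:
        return None
    # Every instance is equally likely to be chosen.
    return rng.choice(list(instances))


if __name__ == "__main__":
    rng = random.Random(7)  # seeded so repeated runs pick the same victim
    group = ["web-1", "web-2", "web-3"]  # hypothetical instance names
    print(pick_victim(group, probability=1.0, rng=rng))
```

Passing the random generator in explicitly keeps an experiment reproducible, which helps when a run needs to be replayed after it exposes a weakness.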

Awesome Chaos Engineering / Notable Tools / Chaos engineering for Docker:

Pumba 2,791 3 months ago Chaos testing and network emulation for Docker containers (and clusters)
Blockade 907 over 3 years ago Docker-based utility for testing network failures and partitions in distributed applications

Awesome Chaos Engineering / Notable Tools

chaos-lambda 163 5 months ago Randomly terminate ASG instances during business hours
Namazu 493 about 6 years ago Programmable fuzzy scheduler for testing distributed systems
Chaos Monkey for Spring Boot Injects latencies, exceptions, and terminations into Spring Boot applications
Byte-Monkey 225 about 4 years ago Bytecode-level fault injection for the JVM. It works by instrumenting application code on the fly to deliberately introduce faults like exceptions and latency
GomJabbar 30 2 months ago ChaosMonkey for your private cloud
Turbulence 49 over 5 years ago Tool focused on BOSH environments capable of stressing VMs, manipulating network traffic, and more. It is very similar to Gremlin
chaosblade 5,982 15 days ago An Easy to Use and Powerful Chaos Engineering Toolkit
KubeInvaders 1,022 27 days ago Gamified Chaos engineering tool for Kubernetes Clusters
Cthulhu 93 about 5 years ago Chaos Engineering tool that helps evaluate the resiliency of microservice systems by simulating various disaster scenarios against a target infrastructure in a data-driven manner
VMware Mangle Orchestrating Chaos Engineering
Byteman A Swiss Army Knife for Byte Code Manipulation
Litmus 4,439 8 days ago Framework for Kubernetes environments that enables users to run test suites, capture logs, generate reports and perform chaos tests
Perses 66 over 3 years ago A project to cause (controlled) destruction to a JVM application
ChaosKube 1,810 22 days ago chaoskube periodically kills random pods in your Kubernetes cluster
Chaos Mesh 6,768 18 days ago Chaos Mesh is a cloud-native Chaos Engineering platform that orchestrates chaos on Kubernetes environments
failure-lambda 94 3 months ago A small Node module for injecting failure into AWS Lambda using latency, exception, statuscode or diskspace
aws-chaos-scripts 92 about 1 year ago Collection of python scripts to run failure injection on AWS infrastructure
chaos-ssm-documents 267 over 1 year ago Collection of AWS SSM Documents to perform Chaos Engineering experiments
aws-lambda-chaos-injection 100 16 days ago A library injecting chaos into AWS Lambda. It offers simple python decorators to do delay, exception and statusCode injection and a Class to add delay to any 3rd party dependencies
chaos-dingo 11 about 5 years ago A tool to mess with Azure services using the Azure NodeJS SDK
Chaos HTTP Proxy 145 12 months ago Introduce failures into HTTP requests via a proxy server
Chaos Lemur 62 over 6 years ago A self-hostable application to randomly destroy virtual machines in a BOSH-managed environment
Simoorg 191 almost 7 years ago LinkedIn’s very own failure inducer framework
react-chaos 593 almost 2 years ago A chaos engineering tool for your React apps
vue-chaos 2 about 4 years ago A chaos engineering tool for your Vue apps
Chaos Engine 68 9 months ago A tool designed to intermittently destroy or degrade application resources running in cloud-based infrastructure
kubedoom 2,016 3 months ago Kill Kubernetes pods by playing Id's DOOM
kubethanos 623 over 4 years ago Kills half of your randomly selected Kubernetes pods
go-fault 506 about 1 month ago Fault injection middleware in Go
Proofdock's Chaos Engineering Platform A chaos engineering platform that integrates with Azure DevOps and focuses on the Azure cloud platform
Pystol Pystol is a fault injection platform allowing users to execute fault injection Actions in cloud-native environments in a controlled and prescribed way
AWSSSMChaosRunner 249 about 1 year ago Amazon's light-weight open-source library for chaos engineering on AWS
Kraken 288 8 days ago Chaos and resiliency testing tool for Kubernetes and OpenShift
kube-burner 502 10 days ago A tool aimed at stressing Kubernetes clusters by creating or deleting a high quantity of objects
Chaos Experimentation Framework 1,691 6 days ago An extensible platform for infrastructure management including Chaos Engineering
NetHavoc A Chaos Engineering Tool for Linux, K8s, Windows, PCF, Cloud, and Containers for injecting Resource, Infrastructure, Network, and Application failures
gorm-sqlchaos 5 about 3 years ago A runtime SQL manipulator for your Golang applications based on gorm
Chaos Frontend Toolkit A set of tools to apply Chaos Engineering to frontend
Mitigant The Continuous Security Verification Platform, which enables confidence in cloud security posture by leveraging security chaos engineering
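
Some entries in this list describe decorator-style injection of delays and exceptions into application code. As a rough illustration of that technique, here is a self-contained Python sketch. It does not reproduce the API of any listed library: `inject_fault`, `InjectedFault`, `delay_s`, and `error_rate` are hypothetical names chosen for this example.

```python
import functools
import random
import time


class InjectedFault(RuntimeError):
    """Raised instead of running the wrapped call when a fault is injected."""


def inject_fault(delay_s=0.0, error_rate=0.0, rng=None):
    """Decorator factory that adds latency and random failures to a function.

    delay_s:    seconds to sleep before every call (hypothetical parameter).
    error_rate: fraction of calls, 0.0 to 1.0, that raise InjectedFault.
    rng:        optional random.Random, so experiments can be seeded.
    """
    rng = rng or random.Random()

    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            if delay_s > 0:
                time.sleep(delay_s)  # simulated latency
            if rng.random() < error_rate:
                raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


@inject_fault(delay_s=0.01, error_rate=0.5, rng=random.Random(3))
def fetch_profile(user_id):
    # Stand-in for a call to a downstream dependency.
    return {"id": user_id}


if __name__ == "__main__":
    for attempt in range(4):
        try:
            print(attempt, fetch_profile(42))
        except InjectedFault as exc:
            print(attempt, "failed:", exc)
```

Keeping the fault behind a decorator means it can be switched off by setting both parameters to zero, so the same code path can run with or without injection.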

Awesome Chaos Engineering / Retired tools

The Simian Army 7,979 almost 6 years ago A suite of tools for keeping your cloud operating in top form
ChaoSlingr 66 over 5 years ago Introducing Security Chaos Engineering. ChaoSlingr focuses primarily on the experimentation on AWS Infrastructure to proactively instrument system security failure through experimentation

Awesome Chaos Engineering / Cloud Services

Testing Amazon Aurora Using Fault Injection Queries
Azure Chaos Studio A managed fault injection service for Azure applications. See also for Azure Service Fabric applications
Security Chaos Engineering for Cloud Services

Awesome Chaos Engineering / Papers

Maelstrom: Mitigating Datacenter-level Disasters by Draining Interdependent Traffic Safely and Efficiently
Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems
Automating Failure Testing Research at Internet Scale
Principles of Antifragile Software
Why is random testing effective for partition tolerance bugs?
Chaos Engineering
A Platform for Automating Chaos Experiments
A Chaos Engineering System for Live Analysis and Falsification of Exception-handling in the JVM
TripleAgent: Monitoring, Perturbation And Failure-obliviousness for Automated Resilience Improvement in Java Applications
Lineage-driven Fault Injection
Antifragility is a Fragile Concept
Chaos Engineering Security
Security Chaos Engineering: A new paradigm for cybersecurity
Security Challenges around Chaos Engineering
CloudStrike: Security Chaos Engineering for Cloud Services
Observability and Chaos Engineering on System Calls for Containerized Applications in Docker
Maximizing Error Injection Realism for Chaos Engineering with System Calls
Chaos Engineering of Ethereum Blockchain Clients

Awesome Chaos Engineering / Gamedays

Target: What is a Gameday? Chaos Gamedays experience by Target
Codecentric: Chaos Engineering Gamedays Chaos Gamedays by Codecentric
New Relic: How to run a Gameday? Chaos Gamedays experience by New Relic
Dius: Gamedays resources Resources for getting started with GameDay and Chaos Engineering
Gremlin: Gamedays Resources for getting started with GameDay and Chaos Engineering
Gremlin: What is a Chaos Day? What is a Gameday according to Gremlin
Gremlin: Why run a Chaos Day? Reasons to run Gamedays according to Gremlin
Gremlin: How to run a Gameday? Methodology to run Gamedays according to Gremlin
Gremlin DB: Breaking Dynamo DB Example of a Gameday with DynamoDB by Gremlin
Gremlin: Introduction to Gameday What is a Gameday according to Gremlin
Gremlin: Planning your own Chaos Day Example of a Gameday with DynamoDB by Gremlin
Gremlin: Inside Gremlin 2019 Gremlin Gamedays Roadmap Chaos Gamedays experience by Gremlin
Gremlin: What I learned running the Chaos Lab with Kafka Example of a Gameday with Kafka by Gremlin
Chaos Toolkit: Chaos Engineering with Humans in the loop Article about Chaos Gamedays
GoCardless: All fun and games until you start with Gamedays Article about Chaos Gamedays
InfoQ: Gamedays - Achieving Resilience through Chaos Engineering InfoQ Presentation with experiences about Chaos Gamedays

Awesome Chaos Engineering / Blogs & Newsletters

Netflix Technology Blog Learn more about how Netflix designs, builds, and operates our systems and engineering organizations
Production Ready A mailing list about building resilient infrastructure and tools
SRE Weekly Weekly Site Reliability Newsletter
Site Reliability Engineering resources 11,989 6 months ago A curated list of awesome Site Reliability and Production Engineering resources
SysAdvent One article for each day of December, ending on the 25th article
Gremlin Blog Blogs on Chaos Engineering from Gremlin Inc
O’Reilly Systems Engineering and Operations Newsletter Weekly systems engineering and operations news and insights from industry insiders
LaunchDarkly Blog Continuous delivery and feature flags blog
Verica Chaos engineering, security chaos engineering and continuous verification
Proofdock Reliability, resilience and chaos engineering with a focus on MS Azure
LitmusChaos Blog Blogs on Chaos Engineering from LitmusChaos
ChaosEngineering.news Chaos Engineering newsletter. All things chaos engineering, directly to your inbox!
Chaos Mesh Blog Blogs on Chaos Engineering from Chaos Mesh
Chaos Experimentation Framework Chaos Experimentation, an open-source framework built on top of Envoy Proxy
Squadcast Blog on Site Reliability engineering
steadybit Blog Blogs on Chaos Engineering, Resilience, SRE and OPS from steadybit

Awesome Chaos Engineering / Podcasts

Break Things On Purpose Monthly podcast about Chaos Engineering presented by Gremlin Inc. Also available on Spotify, Google Play, and Stitcher

Awesome Chaos Engineering / Conferences & Meetups

Chaos Carnival A global two-day virtual conference for Cloud Native Chaos Engineering
Chaos Conf A day of Chaos Engineering demos, expert advice, and connecting with peers who are putting chaos into practice at their companies
SRECon Conferences The official SRE conference
LISA Conferences Prominent conference about SysAdmin/DevOps/SRE
O'Reilly Velocity Conference Prominent conference about Systems Engineering/DevOps/SRE
Chaos Engineering Community Meetup Group Bay Area Meetup group for Chaos Engineers
London Chaos Engineering Community London Area Meetup group for Chaos Engineers
Stockholm Chaos Engineering Meetup Stockholm Meetup group for Chaos Engineers
Chaos Engineering Community A collection of meetups across the globe about Chaos Engineering
Conf42.com: Chaos Engineering Chaos Engineering for practitioners and adopters - London UK, 23 Jan 2020
Kubernetes Chaos Engineering Meetup Group India India Meetup group for Chaos Engineers

Awesome Chaos Engineering / Forums

Chaos Community Google Group
Chaos Engineering LinkedIn Group
Chaos Engineering Slack Community
CNCF Chaos Engineering Working Group
CNCF Chaos Engineering Working Group GitHub 113 over 4 years ago
Chaos Toolkit Slack Community
Litmus Chaos Engineering Slack Community
