Principle |
| Lessons from Giant-Scale Services - Eric Brewer, UC Berkeley & Google | | | |
| Designs, Lessons and Advice from Building Large Distributed Systems - Jeff Dean, Google | | | |
| How to Design a Good API & Why it Matters - Joshua Bloch, CMU & Google | | | |
| On Efficiency, Reliability, Scaling - James Hamilton, VP at AWS | | | |
| Principles of Chaos Engineering | | | |
| Finding the Order in Chaos | | | |
| The Twelve-Factor App | | | |
| Clean Architecture | | | |
| High Cohesion and Low Coupling | | | |
| Monoliths and Microservices | | | |
| CAP Theorem and Trade-offs | | | |
| CP Databases and AP Databases | | | |
| Stateless vs Stateful Scalability | | | |
| Scale Up vs Scale Out: Hidden Costs | | | |
| ACID and BASE | | | |
| Blocking/Non-Blocking and Sync/Async | | | |
| Performance and Scalability of Databases | | | |
| Database Isolation Levels and Effects on Performance and Scalability | | | |
| The Probability of Data Loss in Large Clusters | | | |
| Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence | | | |
| SQL vs NoSQL | | | |
| SQL vs NoSQL - Lesson Learned at Salesforce | | | |
| NoSQL Databases: Survey and Decision Guidance | | | |
| How Sharding Works | | | |
| Consistent Hashing | | | |
| Consistent Hashing: Algorithmic Tradeoffs | | | |
| Don’t be tricked by the Hashing Trick | | | |
| Uniform Consistent Hashing at Netflix | | | |
| Eventually Consistent - Werner Vogels, CTO at Amazon | | | |
| Cache is King | | | |
| Anti-Caching | | | |
| Understand Latency | | | |
| Latency Numbers Every Programmer Should Know | | | |
| The Calculus of Service Availability | | | |
| Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO | | | |
| Common Bottlenecks | | | |
| Life Beyond Distributed Transactions | | | |
| Relying on Software to Redirect Traffic Reliably at Various Layers | | | |
| Breaking Things on Purpose | | | |
| Avoid Over Engineering | | | |
| Scalability Worst Practices | | | |
| Use Solid Technologies - Don’t Re-invent the Wheel - Keep It Simple! | | | |
| Simplicity by Distributing Complexity | | | |
| Why Over-Reusing is Bad | | | |
| Performance is a Feature | | | |
| Make Performance Part of Your Workflow | | | |
| The Benefits of Server Side Rendering over Client Side Rendering | | | |
| Automate and Abstract: Lessons at Facebook | | | |
| AWS Do's and Don'ts | | | |
| (UI) Design Doesn’t Scale - Stanley Wood, Design Director at Spotify | | | |
| Linux Performance | | | |
| Building Fast and Resilient Web Applications - Ilya Grigorik | | | |
| Accept Partial Failures, Minimize Service Loss | | | |
| Design for Resiliency | | | |
| Design for Self-healing | | | |
| Design for Scaling Out | | | |
| Design for Evolution | | | |
| Learn from Mistakes | | | |
Scalability |
| Microservices and Orchestration | | | |
Scalability / Microservices and Orchestration |
| Domain-Oriented Microservice Architecture at Uber | | | |
| Service Architecture (3 parts: Domain Gateways, Value-Added Services, BFF) at SoundCloud | | | |
| Container (8 parts) at Riot Games | | | |
| Containerization at Pinterest | | | |
| Evolution of Container Usage at Netflix | | | |
| Dockerizing MySQL at Uber | | | |
| Testing of Microservices at Spotify | | | |
| Docker in Production at Treehouse | | | |
| Microservice at SoundCloud | | | |
| Operate Kubernetes Reliably at Stripe | | | |
| Cross-Cluster Traffic Mirroring with Istio at Trivago | | | |
| Agrarian-Scale Kubernetes (3 parts) at New York Times | | | |
| Nanoservices at BBC | | | |
| PowerfulSeal: Testing Tool for Kubernetes Clusters at Bloomberg | | | |
| Conductor: Microservices Orchestrator at Netflix | | | |
| Docker Containers that Power Over 100,000 Online Shops at Shopify | | | |
| Microservice Architecture at Medium | | | |
| From bare-metal to Kubernetes at Betabrand | | | |
| Kubernetes at Tinder | | | |
| Kubernetes at Quora | | | |
| Kubernetes Platform at Pinterest | | | |
| Microservices at Nubank | | | |
| Payment Transaction Management in Microservices at Mercari | | | |
| Service Mesh at Snap | | | |
| GRIT: Protocol for Distributed Transactions across Microservices at eBay | | | |
| Rubix: Kubernetes at Palantir | | | |
| CRISP: Critical Path Analysis for Microservice Architectures at Uber | | | |
Scalability |
| Distributed Caching | | | |
Scalability / Distributed Caching |
| EVCache: Distributed In-memory Caching at Netflix | | | |
| EVCache Cache Warmer Infrastructure at Netflix | | | |
| Memsniff: Robust Memcache Traffic Analyzer at Box | | | |
| Caching with Consistent Hashing and Cache Smearing at Etsy | | | |
| Analysis of Photo Caching at Facebook | | | |
| Cache Efficiency Exercise at Facebook | | | |
| tCache: Scalable Data-aware Java Caching at Trivago | | | |
| Pycache: In-process Caching at Quora | | | |
| Reduce Memcached Memory Usage by 50% at Trivago | | | |
| Caching Internal Service Calls at Yelp | | | |
| Estimating the Cache Efficiency using Big Data at Allegro | | | |
| Distributed Cache at Zalando | | | |
| Application Data Caching from RAM to SSD at Netflix | | | |
| Tradeoffs of Replicated Cache at Skyscanner | | | |
| Location Caching with Quadtrees at Yext | | | |
| Video Metadata Caching at Vimeo | | | |
| Scaling Redis at Twitter | | | |
| Scaling Job Queue with Redis at Slack | | | |
| Moving persistent data out of Redis at GitHub | | | |
| Storing Hundreds of Millions of Simple Key-Value Pairs in Redis at Instagram | | | |
| Redis at Trivago | | | |
| Optimizing Redis Storage at Deliveroo | | | |
| Memory Optimization in Redis at Wattpad | | | |
| Redis Fleet at Heroku | | | |
| Solving Remote Build Cache Misses (2 parts) at SoundCloud | | | |
| Ratings & Reviews (2 parts) at Flipkart | | | |
| Prefetch Caching of Items at eBay | | | |
| Cross-Region Caching Library at Wix | | | |
| Improving Distributed Caching Performance and Efficiency at Pinterest | | | |
| Standardize and Improve Microservices Caching at DoorDash | | | |
| HTTP Caching and CDN | | | |
Scalability / Distributed Caching / HTTP Caching and CDN |
| Zynga Geo Proxy: Reducing Mobile Game Latency at Zynga | | | |
| Google AMP at Condé Nast | | | |
| A/B Tests on Hosting Infrastructure (CDNs) at Deliveroo | | | |
| HAProxy with Kubernetes for User-facing Traffic at SoundCloud | | | |
| Bandaid: Service Proxy at Dropbox | | | |
| Service Workers at Slack | | | |
| CDN Services at Spotify | | | |
Scalability |
| Distributed Locking | | | |
Scalability / Distributed Locking |
| Chubby: Lock Service for Loosely Coupled Distributed Systems at Google | | | |
| Distributed Locking at Uber | | | |
| Distributed Locks using Redis at GoSquared | | | |
| ZooKeeper at Twitter | | | |
| Eliminating Duplicate Queries using Distributed Locking at Chartio | | | |
Scalability |
| Distributed Tracking, Tracing, and Measuring | | | |
Scalability / Distributed Tracking, Tracing, and Measuring |
| Zipkin: Distributed Systems Tracing at Twitter | | | |
| Improve Zipkin Traces using Kubernetes Pod Metadata at SoundCloud | | | |
| Canopy: Scalable Distributed Tracing & Analysis at Facebook | | | |
| Pintrace: Distributed Tracing at Pinterest | | | |
| XCMetrics: All-in-One Tool for Tracking Xcode Build Metrics at Spotify | | | |
| Real-time Distributed Tracing at LinkedIn | | | |
| Tracking Service Infrastructure at Scale at Shopify | | | |
| Distributed Tracing at HelloFresh | | | |
| Analyzing Distributed Trace Data at Pinterest | | | |
| Distributed Tracing at Uber | | | |
| JVM Profiler: Tracing Distributed JVM Applications at Uber | | | |
| Data Checking at Dropbox | | | |
| Tracing Distributed Systems at Showmax | | | |
| osquery Across the Enterprise at Palantir | | | |
| StatsD at Etsy | | | |
Scalability |
| Distributed Scheduling | | | |
Scalability / Distributed Scheduling |
| Distributed Task Scheduling (3 parts) at PagerDuty | | | |
| Building Cron at Google | | | |
| Distributed Cron Architecture at Quora | | | |
| Chronos: A Replacement for Cron at Airbnb | | | |
| Scheduler at Nextdoor | | | |
| Peloton: Unified Resource Scheduler for Diverse Cluster Workloads at Uber | | | |
| Fenzo: OSS Scheduler for Apache Mesos Frameworks at Netflix | | | |
| Airflow - Workflow Orchestration | | | |
Scalability / Distributed Scheduling / Airflow - Workflow Orchestration |
| Airflow at Airbnb | | | |
| Airflow at Adyen | | | |
| Airflow at Pandora | | | |
| Airflow at Robinhood | | | |
| Airflow at Lyft | | | |
| Airflow at Drivy | | | |
| Airflow at Grab | | | |
| Airflow at Adobe | | | |
| Auditing Airflow Job Runs at Walmart | | | |
| MaaT: DAG-based Distributed Task Scheduler at Alibaba | | | |
| boundary-layer: Declarative Airflow Workflows at Etsy | | | |
Scalability |
| Distributed Monitoring and Alerting | | | |
Scalability / Distributed Monitoring and Alerting |
| Unicorn: Remediation System at eBay | | | |
| M3: Metrics and Monitoring Platform at Uber | | | |
| Athena: Automated Build Health Management System at Dropbox | | | |
| Vortex: Monitoring Server Applications at Dropbox | | | |
| Nuage: Cloud Management Service at LinkedIn | | | |
| Telltale: Application Monitoring at Netflix | | | |
| ThirdEye: Monitoring Platform at LinkedIn | | | |
| Periskop: Exception Monitoring Service at SoundCloud | | | |
| Securitybot: Distributed Alerting Bot at Dropbox | | | |
| Monitoring System at Alibaba | | | |
| Real User Monitoring at Dailymotion | | | |
| Alerting Ecosystem at Uber | | | |
| Alerting Framework at Airbnb | | | |
| Alerting on Service-Level Objectives (SLOs) at SoundCloud | | | |
| Job-based Forecasting Workflow for Observability Anomaly Detection at Uber | | | |
| Monitoring and Alert System using Graphite and Cabot at HackerEarth | | | |
| Observability (2 parts) at Twitter | | | |
| Distributed Security Alerting at Slack | | | |
| Real-Time News Alerting at Bloomberg | | | |
| Data Pipeline Monitoring System at LinkedIn | | | |
| Monitoring and Observability at Picnic | | | |
Scalability |
| Distributed Security | | | |
Scalability / Distributed Security |
| Approach to Security at Scale at Dropbox | | | |
| Aardvark and Repokid: AWS Least Privilege for Distributed, High-Velocity Development at Netflix | | | |
| LISA: Distributed Firewall at LinkedIn | | | |
| Secure Infrastructure To Store Bitcoin In The Cloud at Coinbase | | | |
| BinaryAlert: Real-time Serverless Malware Detection at Airbnb | | | |
| Scalable IAM Architecture to Secure Access to 100 AWS Accounts at Segment | | | |
| OAuth Audit Toolbox at Indeed | | | |
| Active Directory Password Blacklisting at Yelp | | | |
| Syscall Auditing at Scale at Slack | | | |
| Athenz: Fine-Grained, Role-Based Access Control at Yahoo | | | |
| WebAuthn Support for Secure Sign In at Dropbox | | | |
| Security Development Lifecycle at Slack | | | |
| Unprivileged Container Builds at Kinvolk | | | |
| Diffy: Differencing Engine for Digital Forensics in the Cloud at Netflix | | | |
| Detecting Credential Compromise in AWS at Netflix | | | |
| Scalable User Privacy at Spotify | | | |
| AVA: Audit Web Applications at Indeed | | | |
| TTL as a Service: Automatic Revocation of Stale Privileges at Yelp | | | |
| Enterprise Key Management at Slack | | | |
| Scalability and Authentication at Twitch | | | |
| Edge Authentication and Token-Agnostic Identity Propagation at Netflix | | | |
| Hardening Kubernetes Infrastructure with Cilium at Palantir | | | |
| Improving Web Vulnerability Management through Automation at Lyft | | | |
| Clock Skew when Syncing Password Payloads at Dropbox | | | |
Scalability |
| Distributed Messaging, Queuing, and Event Streaming | | | |
Scalability / Distributed Messaging, Queuing, and Event Streaming |
| Cape: Event Stream Processing Framework at Dropbox | | | |
| Brooklin: Distributed Service for Near Real-Time Data Streaming at LinkedIn | | | |
| Samza: Stream Processing System for Latency Insights at LinkedIn | | | |
| Bullet: Forward-Looking Query Engine for Streaming Data at Yahoo | | | |
| EventHorizon: Tool for Watching Events Streaming at Etsy | | | |
| Qmessage: Distributed, Asynchronous Task Queue at Quora | | | |
| Cherami: Message Queue System for Transporting Async Tasks at Uber | | | |
| Dynein: Distributed Delayed Job Queueing System at Airbnb | | | |
| Timestone: Queueing System for Non-Parallelizable Workloads at Netflix | | | |
| Messaging Service at Riot Games | | | |
| Debugging Production with Event Logging at Zillow | | | |
| Cross-platform In-app Messaging Orchestration Service at Netflix | | | |
| Video Gatekeeper at Netflix | | | |
| Scaling Push Messaging for Millions of Devices at Netflix | | | |
| Delaying Asynchronous Message Processing with RabbitMQ at Indeed | | | |
| Benchmarking Streaming Computation Engines at Yahoo | | | |
| Improving Stream Data Quality With Protobuf Schema Validation at Deliveroo | | | |
| Scaling Email Infrastructure at Medium | | | |
| Real-time Messaging at Slack | | | |
| Event Stream Database at Nike | | | |
| Event Tracking System at Udemy | | | |
| Event-Driven Messaging | | | |
Scalability / Distributed Messaging, Queuing, and Event Streaming / Event-Driven Messaging |
| Domain-Driven Design at Alibaba | | | |
| Domain-Driven Design at Weebly | | | |
| Domain-Driven Design at Moonpig | | | |
| Scaling Event Sourcing for Netflix Downloads | | | |
| Scaling Event-Sourcing at Jet.com | | | |
| Event Sourcing (2 parts) at eBay | | | |
| Event Sourcing at FREE NOW | | | |
| Scalable content feed using Event Sourcing and CQRS patterns at Brainly | | | |
Scalability / Distributed Messaging, Queuing, and Event Streaming |
| Pub-Sub Messaging | | | |
Scalability / Distributed Messaging, Queuing, and Event Streaming / Pub-Sub Messaging |
| Pulsar: Pub-Sub Messaging at Scale at Yahoo | | | |
| Wormhole: Pub-Sub System at Facebook | | | |
| MemQ: Cloud Native Pub-Sub System at Pinterest | | | |
| Pub-Sub in Microservices at Netflix | | | |
Scalability / Distributed Messaging, Queuing, and Event Streaming |
| Kafka - Message Broker | | | |
Scalability / Distributed Messaging, Queuing, and Event Streaming / Kafka - Message Broker |
| Kafka at LinkedIn | | | |
| Kafka at Pinterest | | | |
| Kafka at Trello | | | |
| Kafka at Salesforce | | | |
| Kafka at The New York Times | | | |
| Kafka at Yelp | | | |
| Kafka at Criteo | | | |
| Kafka on Kubernetes at Shopify | | | |
| Kafka on PaaSTA: Running Kafka on Kubernetes at Yelp (2 parts) | | | |
| Migrating Kafka's Zookeeper with No Downtime at Yelp | | | |
| Reprocessing and Dead Letter Queues with Kafka at Uber | | | |
| Chaperone: Audit Kafka End-to-End at Uber | | | |
| Finding Kafka throughput limit in infrastructure at Dropbox | | | |
| Cost Orchestration at Walmart | | | |
| InfluxDB and Kafka to Scale to Over 1 Million Metrics a Second at Hulu | | | |
| Scaling Kafka to Support Data Growth at PayPal | | | |
Scalability / Distributed Messaging, Queuing, and Event Streaming |
| Stream Data Deduplication | | | |
Scalability / Distributed Messaging, Queuing, and Event Streaming / Stream Data Deduplication |
| Exactly-once Semantics with Kafka | | | |
| Real-time Deduping at Tapjoy | | | |
| Deduplication at Segment | | | |
| Deduplication at Mail.Ru | | | |
| Petabyte Scale Data Deduplication at Mixpanel | | | |
Scalability |
| Distributed Logging | | | |
Scalability / Distributed Logging |
| Logging at LinkedIn | | | |
| Scalable and Reliable Log Ingestion at Pinterest | | | |
| High-performance Replicated Log Service at Twitter | | | |
| Logging Service with Spark at CERN Accelerator | | | |
| Logging and Aggregation at Quora | | | |
| Collection and Analysis of Daemon Logs at Badoo | | | |
| Log Parsing with Static Code Analysis at Palantir | | | |
| Centralized Application Logging at eBay | | | |
| Enrich VPC Flow Logs at Hyper Scale to provide Network Insight at Netflix | | | |
| BookKeeper: Distributed Log Storage at Yahoo | | | |
| LogDevice: Distributed Data Store for Logs at Facebook | | | |
| LogFeeder: Log Collection System at Yelp | | | |
| DBLog: Generic Change-Data-Capture Framework at Netflix | | | |
Scalability |
| Distributed Searching | | | |
Scalability / Distributed Searching |
| Search Architecture at Instagram | | | |
| Search Architecture at eBay | | | |
| Search Architecture at Box | | | |
| Search Discovery Indexing Platform at Coupang | | | |
| Universal Search System at Pinterest | | | |
| Improving Search Engine Efficiency by over 25% at eBay | | | |
| Indexing and Querying Telemetry Logs with Lucene at Palantir | | | |
| Query Understanding at TripAdvisor | | | |
| Search Federation Architecture at LinkedIn (2018) | | | |
| Search at Slack | | | |
| Search Engine at DoorDash | | | |
| Stability and Scalability for Search at Twitter | | | |
| Search Service at Twitter (2014) | | | |
| Autocomplete Search (2 parts) at Traveloka | | | |
| Data-Driven Autocorrection System at Canva | | | |
| Adapting Search to Indian Phonetics at Flipkart | | | |
| Nautilus: Search Engine at Dropbox | | | |
| Galene: Search Architecture of LinkedIn | | | |
| Manas: High Performing Customized Search System at Pinterest | | | |
| Sherlock: Near Real Time Search Indexing at Flipkart | | | |
| Nebula: Storage Platform to Build Search Backends at Airbnb | | | |
| ELK (Elasticsearch, Logstash, Kibana) Stack | | | |
Scalability / Distributed Searching / ELK (Elasticsearch, Logstash, Kibana) Stack |
| Predictions in Real Time with ELK at Uber | | | |
| Building a scalable ELK stack at Envato | | | |
| ELK at Robinhood | | | |
| Scaling Elasticsearch Clusters at Uber | | | |
| Elasticsearch Performance Tuning Practice at eBay | | | |
| Improve Performance using Elasticsearch Plugins (2 parts) at Tinder | | | |
| Elasticsearch at Kickstarter | | | |
| Log Parsing with Logstash and Google Protocol Buffers at Trivago | | | |
| Fast Order Search using Data Pipeline and Elasticsearch at Yelp | | | |
| Moving Core Business Search to Elasticsearch at Yelp | | | |
| Sharding out Elasticsearch at Vinted | | | |
| Self-Ranking Search with Elasticsearch at Wattpad | | | |
| Vulcanizer: a library for operating Elasticsearch at GitHub | | | |
Scalability |
| Distributed Storage | | | |
Scalability / Distributed Storage |
| In-memory Storage | | | |
Scalability / Distributed Storage / In-memory Storage |
| MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) And Familiar (SQL) | | | |
| Optimizing Memcached Efficiency at Quora | | | |
| Real-Time Data Warehouse with MemSQL on Cisco UCS | | | |
| Moving to MemSQL at Tapjoy | | | |
| MemSQL and Kinesis for Real-time Insights at Disney | | | |
| MemSQL to Query Hundreds of Billions of Rows in a Dashboard at Pandora | | | |
Scalability / Distributed Storage |
| Object Storage | | | |
Scalability / Distributed Storage / Object Storage |
| Scaling HDFS at Uber | | | |
| Reasons for Choosing S3 over HDFS at Databricks | | | |
| File System on Amazon S3 at Quantcast | | | |
| Image Recovery at Scale Using S3 Versioning at Trivago | | | |
| Cloud Object Store at Yahoo | | | |
| Ambry: Distributed Immutable Object Store at LinkedIn | | | |
| Dynamometer: Scale Testing HDFS on Minimal Hardware with Maximum Fidelity at LinkedIn | | | |
| Hammerspace: Persistent, Concurrent, Off-heap Storage at Airbnb | | | |
| MezzFS: Mounting Object Storage in Media Processing Platform at Netflix | | | |
| Magic Pocket: In-house Multi-exabyte Storage System at Dropbox | | | |
Scalability |
| Relational Databases | | | |
Scalability / Relational Databases |
| Building and Deploying MySQL Raft at Meta | | | |
| MySQL for Schema-less Data at FriendFeed | | | |
| MySQL at Pinterest | | | |
| PostgreSQL at Twitch | | | |
| Scaling MySQL-based Financial Reporting System at Airbnb | | | |
| Scaling MySQL at Wix | | | |
| MaxScale (MySQL) Database Proxy at Airbnb | | | |
| Switching from Postgres to MySQL at Uber | | | |
| Handling Growth with Postgres at Instagram | | | |
| Scaling the Analytics Database (Postgres) at TransferWise | | | |
| Updating a 50 Terabyte PostgreSQL Database at Adyen | | | |
| Scaling Database Access for 100s of Billions of Queries per Day at PayPal | | | |
| Minimizing Read-Write MySQL Downtime at Yelp | | | |
| Migrating MySQL from 5.6 to 8.0 at Facebook | | | |
| Migration from HBase to MyRocks at Quora | | | |
| Replication | | | |
Scalability / Relational Databases / Replication |
| MySQL Parallel Replication (4 parts) at Booking.com | | | |
| Mitigating MySQL Replication Lag and Reducing Read Load at GitHub | | | |
| Read Consistency with Database Replicas at Shopify | | | |
| Black-Box Auditing: Verifying End-to-End Replication Integrity between MySQL and Redshift at Yelp | | | |
| Partitioning Main MySQL Database at Airbnb | | | |
| Herb: Multi-DC Replication Engine for Schemaless Datastore at Uber | | | |
Scalability / Relational Databases |
| Sharding | | | |
Scalability / Relational Databases / Sharding |
| Sharding MySQL at Pinterest | | | |
| Sharding MySQL at Twilio | | | |
| Sharding MySQL at Square | | | |
| Sharding MySQL at Quora | | | |
| Sharding Layer of Schemaless Datastore at Uber | | | |
| Sharding & IDs at Instagram | | | |
| Sharding Postgres at Notion | | | |
| Solr: Improving Performance for Batch Indexing at Box | | | |
| Geosharded Recommendations (3 parts) at Tinder | | | |
| Scaling Services with Shard Manager at Facebook | | | |
Scalability / Relational Databases |
| Presto the Distributed SQL Query Engine | | | |
Scalability / Relational Databases / Presto the Distributed SQL Query Engine |
| Presto at Pinterest | | | |
| Presto Infrastructure at Lyft | | | |
| Presto at Grab | | | |
| Engineering Data Analytics with Presto and Apache Parquet at Uber | | | |
| Data Wrangling at Slack | | | |
| Presto in Big Data Platform on AWS at Netflix | | | |
| Presto Auto Scaling at Eventbrite | | | |
| Speed Up Presto with Alluxio Local Cache at Uber | | | |
Scalability |
| NoSQL Databases | | | |
Scalability / NoSQL Databases |
| Key-Value Databases | | | |
Scalability / NoSQL Databases / Key-Value Databases |
| DynamoDB at Nike | | | |
| DynamoDB at Segment | | | |
| DynamoDB at Mapbox | | | |
| Manhattan: Distributed Key-Value Database at Twitter | | | |
| Sherpa: Distributed NoSQL Key-Value Store at Yahoo | | | |
| HaloDB: Embedded Key-Value Storage Engine at Yahoo | | | |
| MPH: Fast and Compact Immutable Key-Value Stores at Indeed | | | |
| Venice: Distributed Key-Value Database at LinkedIn | | | |
Scalability / NoSQL Databases |
| Columnar Databases | | | |
Scalability / NoSQL Databases / Columnar Databases |
| Cassandra | | | |
Scalability / NoSQL Databases / Columnar Databases / Cassandra |
| Cassandra at Instagram | | | |
| Storing Images in Cassandra at Walmart | | | |
| Storing Messages with Cassandra at Discord | | | |
| Scaling Cassandra Cluster at Walmart | | | |
| Scaling Ad Analytics with Cassandra at Yelp | | | |
| Scaling to 100+ Million Reads/Writes using Spark and Cassandra at Dream11 | | | |
| Moving Food Feed from Redis to Cassandra at Zomato | | | |
| Benchmarking Cassandra Scalability on AWS at Netflix | | | |
| Service Decomposition at Scale with Cassandra at Intuit QuickBooks | | | |
| Cassandra for Keeping Counts In Sync at SoundCloud | | | |
| Cassandra Driver Configuration for Improved Performance and Load Balancing at Glassdoor | | | |
| cstar: Cassandra Orchestration Tool at Spotify | | | |
Scalability / NoSQL Databases / Columnar Databases |
| HBase | | | |
Scalability / NoSQL Databases / Columnar Databases / HBase |
| HBase at Salesforce | | | |
| HBase in Facebook Messages | | | |
| HBase in Imgur Notification | | | |
| Improving HBase Backup Efficiency at Pinterest | | | |
| HBase at Xiaomi | | | |
Scalability / NoSQL Databases / Columnar Databases |
| Redshift | | | |
Scalability / NoSQL Databases / Columnar Databases / Redshift |
| Redshift at GIPHY | | | |
| Redshift at Hudl | | | |
| Redshift at Drivy | | | |
Scalability / NoSQL Databases |
| Document Databases | | | |
Scalability / NoSQL Databases / Document Databases |
| eBay: Building Mission-Critical Multi-Data Center Applications with MongoDB | | | |
| MongoDB at Baidu: Multi-Tenant Cluster Storing 200+ Billion Documents across 160 Shards | | | |
| Migrating Mongo Data at Addepar | | | |
| The AWS and MongoDB Infrastructure of Parse (acquired by Facebook) | | | |
| Migrating Mountains of Mongo Data at Addepar | | | |
| Couchbase Ecosystem at LinkedIn | | | |
| SimpleDB at Zendesk | | | |
| Espresso: Distributed Document Store at LinkedIn | | | |
Scalability / NoSQL Databases |
| Graph Databases | | | |
Scalability / NoSQL Databases / Graph Databases |
| FlockDB: Distributed Graph Database at Twitter | | | |
| TAO: Distributed Data Store for the Social Graph at Facebook | | | |
| Akutan: Distributed Knowledge Graph Store at eBay | | | |
Scalability |
| Time Series Databases | | | |
Scalability / Time Series Databases |
| Beringei: High-performance Time Series Storage Engine at Facebook | | | |
| MetricsDB: TimeSeries Database for storing metrics at Twitter | | | |
| Atlas: In-memory Dimensional Time Series Database at Netflix | | | |
| Heroic: Time Series Database at Spotify | | | |
| Roshi: Distributed Storage System for Time-Series Event at SoundCloud | | | |
| Goku: Time Series Database at Pinterest | | | |
| Scaling Time Series Data Storage (2 parts) at Netflix | | | |
| Druid - Real-time Analytics Database | | | |
Scalability / Time Series Databases / Druid - Real-time Analytics Database |
| Druid at Airbnb | | | |
| Druid at Walmart | | | |
| Druid at eBay | | | |
| Druid at Netflix | | | |
Scalability |
| Distributed Repositories, Dependencies, and Configurations Management | | | |
Scalability / Distributed Repositories, Dependencies, and Configurations Management |
| DGit: Distributed Git at GitHub | | | |
| Stemma: Distributed Git Server at Palantir | | | |
| Configuration Management for Distributed Systems at Flickr | | | |
| Git Repository at Microsoft | | | |
| Solve Git Problem with Large Repositories at Microsoft | | | |
| Single Repository at Google | | | |
| Scaling Infrastructure and (Git) Workflow at Adyen | | | |
| Dotfiles Distribution at Booking.com | | | |
| Secret Detector: Preventing Secrets in Source Code at Yelp | | | |
| Managing Software Dependency at Scale at LinkedIn | | | |
| Merging Code in High-velocity Repositories at LinkedIn | | | |
| Dynamic Configuration at Twitter | | | |
| Dynamic Configuration at Mixpanel | | | |
| Dynamic Configuration at GoDaddy | | | |
Scalability |
| Scaling Continuous Integration and Continuous Delivery | | | |
Scalability / Scaling Continuous Integration and Continuous Delivery |
| Continuous Integration Stack at Facebook | | | |
| Continuous Integration with Distributed Repositories and Dependencies at Netflix | | | |
| Continuous Integration and Deployment with Bazel at Dropbox | | | |
| Adopting Bazel for Web at Airbnb | | | |
| Continuous Deployments at BuzzFeed | | | |
| Screwdriver: Continuous Delivery Build System for Dynamic Infrastructure at Yahoo | | | |
| CI/CD at Betterment | | | |
| CI/CD at Brainly | | | |
| Scaling iOS CI with Anka at Shopify | | | |
| Scaling Jira Server at Yelp | | | |
| Auto-scaling CI/CD cluster at Flexport | | | |
Availability |
| Resilience Engineering: Learning to Embrace Failure | | | |
Availability / Resilience Engineering: Learning to Embrace Failure |
| Resilience Engineering with Project Waterbear at LinkedIn | | | |
| Resiliency against Traffic Oversaturation at iHeartRadio | | | |
| Resiliency in Distributed Systems at GO-JEK | | | |
| Practical NoSQL Resilience Design Pattern for the Enterprise at eBay | | | |
| Ensuring Resilience to Disaster at Quora | | | |
| Site Resiliency at Expedia | | | |
| Resiliency and Disaster Recovery with Kafka at eBay | | | |
| Disaster Recovery for Multi-Region Kafka at Uber | | | |
Availability |
| Failover | | | |
Availability / Failover |
| The Evolution of Global Traffic Routing and Failover | | | |
| Testing for Disaster Recovery Failover Testing | | | |
| Designing a Microservices Architecture for Failure | | | |
| ELB for Automatic Failover at GoSquared | | | |
| Eliminate the Database for Higher Availability at American Express | | | |
| Failover with Redis Sentinel at Vinted | | | |
| High-availability SaaS Infrastructure at FreeAgent | | | |
| MySQL High Availability at GitHub | | | |
| MySQL High Availability at Eventbrite | | | |
| Business Continuity & Disaster Recovery at Walmart | | | |
Availability |
| Load Balancing | | | |
Availability / Load Balancing |
| Introduction to Modern Network Load Balancing and Proxying | | | |
| Top Five (Load Balancing) Scalability Patterns | | | |
| Load Balancing infrastructure to support more than 1.3 billion users at Facebook | | | |
| DHCPLB: DHCP Load Balancer at Facebook | | | |
| Katran: Scalable Network Load Balancer at Facebook | | | |
| Deterministic Aperture: A Distributed, Load Balancing Algorithm at Twitter | | | |
| Load Balancing with Eureka at Netflix | | | |
| Edge Load Balancing at Netflix | | | |
| Zuul 2: Cloud Gateway at Netflix | | | |
| Load Balancing at Yelp | | | |
| Load Balancing at GitHub | | | |
| Consistent Hashing to Improve Load Balancing at Vimeo | | | |
| UDP Load Balancing at 500px | | | |
| QALM: QoS Load Management Framework at Uber | | | |
| Traffic Steering using Rum DNS at LinkedIn | | | |
| Traffic Infrastructure (Edge Network) at Dropbox | | | |
| Intelligent DNS based load balancing at Dropbox | | | |
| Monitor DNS systems at Stripe | | | |
| Multi-DNS Architecture (3 parts) at Monday | | | |
| Dynamic Anycast DNS Infrastructure at Hulu | | | |
Availability |
| Rate Limiting | | | |
Availability / Rate Limiting |
| Rate Limiting for Scaling to Millions of Domains at Cloudflare | | | |
| Cloud Bouncer: Distributed Rate Limiting at Yahoo | | | |
| Scaling API with Rate Limiters at Stripe | | | |
| Distributed Rate Limiting at Allegro | | | |
| Ratequeue: Core Queueing-And-Rate-Limiting System at Twilio | | | |
| Quotas Service at Grab | | | |
| Rate Limiting at Figma | | | |
Availability |
| Autoscaling | | | |
Availability / Autoscaling |
| Autoscaling Pinterest | | | |
| Autoscaling Based on Request Queuing at Square | | | |
| Autoscaling Jenkins at Trivago | | | |
| Autoscaling Pub-Sub Consumers at Spotify | | | |
| Autoscaling Bigtable Clusters based on CPU Load at Spotify | | | |
| Autoscaling AWS Step Functions Activities at Yelp | | | |
| Scryer: Predictive Auto Scaling Engine at Netflix | | | |
| Bouncer: Simple AWS Auto Scaling Rollovers at Palantir | | | |
| Clusterman: Autoscaling Mesos Clusters at Yelp | | | |
Availability |
| Availability in Globally Distributed Storage Systems at Google | | | |
| NodeJS High Availability at Yahoo | | | |
| Operations (11 parts) at LinkedIn | | | |
| Monitoring Powers High Availability for LinkedIn Feed | | | |
| Supporting Global Events at Facebook | | | |
| High Availability at BlaBlaCar | | | |
| High Availability at Netflix | | | |
| High Availability Cloud Infrastructure at Twilio | | | |
| Automating Datacenter Operations at Dropbox | | | |
| Globalizing Player Accounts at Riot Games | | | |
Stability |
| Circuit Breaker | | | |
Stability / Circuit Breaker |
| Circuit Breaking in Distributed Systems | | | |
| Circuit Breaker for Scaling Containers | | | |
| Lessons in Resilience at SoundCloud | | | |
| Protector: Circuit Breaker for Time Series Databases at Trivago | | | |
| Improved Production Stability with Circuit Breakers at Heroku | | | |
| Circuit Breaker at Zendesk | | | |
| Circuit Breaker at Traveloka | | | |
| Circuit Breaker at Shopify | | | |
Stability |
| Timeouts | | | |
Stability / Timeouts |
| Fault Tolerance (Timeouts and Retries, Thread Separation, Semaphores, Circuit Breakers) at Netflix | | | |
| Enforce Timeout: A Reliability Methodology at DoorDash | | | |
| Troubleshooting a Connection Timeout Issue with tcp_tw_recycle Enabled at eBay | | | |
Stability |
| Crash-safe Replication for MySQL at Booking.com | | | |
| Bulkheads: Partition and Tolerate Failure in One Part | | | |
| Steady State: Always Put Logs on Separate Disk | | | |
| Throttling: Maintain a Steady Pace | | | |
| Multi-Clustering: Improving Resiliency and Stability of a Large-scale Monolithic API Service at LinkedIn | | | |
| Determinism (4 parts) in League of Legends Server | | | |
Performance |
| Performance Optimization on OS, Storage, Database, Network | | | |
Performance / Performance Optimization on OS, Storage, Database, Network |
| Improving Performance with Background Data Prefetching at Instagram | | | |
| Fixing Linux filesystem performance regressions at LinkedIn | | | |
| Compression Techniques to Solve Network I/O Bottlenecks at eBay | | | |
| Optimizing Web Servers for High Throughput and Low Latency at Dropbox | | | |
| Linux Performance Analysis in 60,000 Milliseconds at Netflix | | | |
| Live Downsizing Google Cloud Persistent Disks (PD-SSD) at Mixpanel | | | |
| Decreasing RAM Usage by 40% Using jemalloc with Python & Celery at Zapier | | | |
| Reducing Memory Footprint at Slack | | | |
| Continuous Load Testing at Slack | | | |
| Performance Improvements at Pinterest | | | |
| Server Side Rendering at Wix | | | |
| 30x Performance Improvements on MySQLStreamer at Yelp | | | |
| Optimizing APIs at Netflix | | | |
| Performance Monitoring with Riemann and Clojure at Walmart | | | |
| Performance Tracking Dashboard for Live Games at Zynga | | | |
| Optimizing CAL Report Hadoop MapReduce Jobs at eBay | | | |
| Performance Tuning on Quartz Scheduler at eBay | | | |
| Profiling C++ (Part 1: Optimization, Part 2: Measurement and Analysis) at Riot Games | | | |
| Profiling React Server-Side Rendering at HomeAway | | | |
| Hardware-Assisted Video Transcoding at Dailymotion | | | |
| Cross Shard Transactions at 10 Million RPS at Dropbox | | | |
| API Profiling at Pinterest | | | |
| Pagelets Parallelize Server-side Processing at Yelp | | | |
| Improving key expiration in Redis at Twitter | | | |
| Ad Delivery Network Performance Optimization with Flame Graphs at MindGeek | | | |
| Predictive CPU isolation of containers at Netflix | | | |
| Improving HDFS I/O Utilization for Efficiency at Uber | | | |
| Cloud Jewels: Estimating kWh in the Cloud at Etsy | | | |
| Unthrottled: Fixing CPU Limits in the Cloud (2 parts) at Indeed | | | |
Performance |
| Performance Optimization by Tuning Garbage Collection | | | |
Performance / Performance Optimization by Tuning Garbage Collection |
| Garbage Collection in Java Applications at LinkedIn | | | |
| Garbage Collection in High-Throughput, Low-Latency Machine Learning Services at Adobe | | | |
| Garbage Collection in Redux Applications at SoundCloud | | | |
| Garbage Collection in Go Application at Twitch | | | |
| Analyzing V8 Garbage Collection Logs at Alibaba | | | |
| Python Garbage Collection for Dropping 50% Memory Growth Per Request at Instagram | | | |
| Performance Impact of Removing Out of Band Garbage Collector (OOBGC) at GitHub | | | |
| Debugging Java Memory Leaks at Allegro | | | |
| Optimizing JVM at Alibaba | | | |
| Tuning JVM Memory for Large-scale Services at Uber | | | |
| Solr Performance Tuning at Walmart | | | |
| Memory Tuning a High Throughput Microservice at Flipkart | | | |
Performance |
| Performance Optimization on Image, Video, Page Load | | | |
Performance / Performance Optimization on Image, Video, Page Load |
| Optimizing 360 Photos at Scale at Facebook | | | |
| Reducing Image File Size in the Photos Infrastructure at Etsy | | | |
| Improving GIF Performance at Pinterest | | | |
| Optimizing Video Playback Performance at Pinterest | | | |
| Optimizing Video Stream for Low Bandwidth with Dynamic Optimizer at Netflix | | | |
| Adaptive Video Streaming at YouTube | | | |
| Reducing Video Loading Time at Dailymotion | | | |
| Improving Homepage Performance at Zillow | | | |
| The Process of Optimizing for Client Performance at Expedia | | | |
| Web Performance at BBC | | | |
Performance |
| Performance Optimization by Brotli Compression | | | |
Performance / Performance Optimization by Brotli Compression |
| Boosting Site Speed Using Brotli Compression at LinkedIn | | | |
| Brotli at Booking.com | | | |
| Brotli at Treebo | | | |
| Deploying Brotli for Static Content at Dropbox | | | |
| Progressive Enhancement with Brotli at Yelp | | | |
| Speeding Up Redis with Compression at DoorDash | | | |
Performance |
| Performance Optimization on Languages and Frameworks | | | |
Performance / Performance Optimization on Languages and Frameworks |
| Python at Netflix | | | |
| Python at scale (3 parts) at Instagram | | | |
| OCaml best practices (2 parts) at Issuu | | | |
| PHP at Slack | | | |
| Go at Trivago | | | |
| TypeScript at Etsy | | | |
| Kotlin for taming state at Etsy | | | |
| Kotlin at DoorDash | | | |
| BPF and Go at Bumble | | | |
| Ruby on Rails at GitLab | | | |
| Rust in production at Figma | | | |
| Choosing a Language Stack at WeWork | | | |
| Switching from Go to Rust at Discord | | | |
| ASP.NET Core Performance Optimization at Agoda | | | |
| Data Race Patterns in Go at Uber | | | |
| Java 21 Virtual Threads at Netflix | | | |
Intelligence |
| Big Data | | | |
Intelligence / Big Data |
| Data Platform at Uber | | | |
| Data Platform at BMW | | | |
| Data Platform at Netflix | | | |
| Data Platform at Flipkart | | | |
| Data Platform at Coupang | | | |
| Data Platform at DoorDash | | | |
| Data Platform at Khan Academy | | | |
| Data Infrastructure at Airbnb | | | |
| Data Infrastructure at LinkedIn | | | |
| Data Infrastructure at GO-JEK | | | |
| Data Ingestion Infrastructure at Pinterest | | | |
| Data Analytics Architecture at Pinterest | | | |
| Data Orchestration Service at Spotify | | | |
| Big Data Processing (2 parts) at Spotify | | | |
| Big Data Processing at Uber | | | |
| Analytics Pipeline at Lyft | | | |
| Analytics Pipeline at Grammarly | | | |
| Analytics Pipeline at Teads | | | |
| ML Data Pipelines for Real-Time Fraud Prevention at PayPal | | | |
| Big Data Analytics and ML Techniques at LinkedIn | | | |
| Self-Serve Reporting Platform on Hadoop at LinkedIn | | | |
| Privacy-Preserving Analytics and Reporting at LinkedIn | | | |
| Analytics Platform for Tracking Item Availability at Walmart | | | |
| Real-Time Analytics for Mobile App Crashes using Apache Pinot at Uber | | | |
| HALO: Hardware Analytics and Lifecycle Optimization at Facebook | | | |
| RBEA: Real-time Analytics Platform at King | | | |
| AresDB: GPU-Powered Real-time Analytics Engine at Uber | | | |
| AthenaX: Streaming Analytics Platform at Uber | | | |
| Jupiter: Config Driven Adtech Batch Ingestion Platform at Uber | | | |
| Delta: Data Synchronization and Enrichment Platform at Netflix | | | |
| Keystone: Real-time Stream Processing Platform at Netflix | | | |
| Databook: Turning Big Data into Knowledge with Metadata at Uber | | | |
| Amundsen: Data Discovery & Metadata Engine at Lyft | | | |
| Maze: Funnel Visualization Platform at Uber | | | |
| Metacat: Making Big Data Discoverable and Meaningful at Netflix | | | |
| SpinalTap: Change Data Capture System at Airbnb | | | |
| Accelerator: Fast Data Processing Framework at eBay | | | |
| Omid: Transaction Processing Platform at Yahoo | | | |
| TensorFlowOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo | | | |
| CaffeOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo | | | |
| Spark on Scala: Analytics Reference Architecture at Adobe | | | |
| Experimentation Platform (2 parts) at Spotify | | | |
| Experimentation Platform at Airbnb | | | |
| Smart Product Platform at Zalando | | | |
| Log Analysis Platform at LINE | | | |
| Data Visualisation Platform at Myntra | | | |
| Building and Scaling Data Lineage at Netflix | | | |
| Building a scalable data management system for computer vision tasks at Pinterest | | | |
| Structured Data at Etsy | | | |
| Scaling a Mature Data Pipeline - Managing Overhead at Airbnb | | | |
| Spark Partitioning Strategies at Airbnb | | | |
| Scaling the Hadoop Distributed File System at LinkedIn | | | |
| Scaling Hadoop YARN cluster beyond 10,000 nodes at LinkedIn | | | |
| Scaling Big Data Access Controls at Pinterest | | | |
Intelligence |
| Distributed Machine Learning | | | |
Intelligence / Distributed Machine Learning |
| Machine Learning Platform at Yelp | | | |
| Machine Learning Platform at Etsy | | | |
| Machine Learning Platform at Zalando | | | |
| Scaling AI/ML Infrastructure at Uber | | | |
| Recommendation System at Lyft | | | |
| Reinforcement Learning Platform at Lyft | | | |
| Platform for Serving Recommendations at Etsy | | | |
| Infrastructure to Run User Forecasts at Spotify | | | |
| Aroma: Using ML for Code Recommendation at Facebook | | | |
| Flyte: Cloud Native Machine Learning and Data Processing Platform at Lyft | | | |
| LyftLearn: ML Model Training Infrastructure built on Kubernetes at Lyft | | | |
| Horovod: Open Source Distributed Deep Learning Framework for TensorFlow at Uber | | | |
| Genie: Gen AI On-Call Copilot at Uber | | | |
| COTA: Improving Customer Care with NLP & Machine Learning at Uber | | | |
| Manifold: Model-Agnostic Visual Debugging Tool for Machine Learning at Uber | | | |
| Repo-Topix: Topic Extraction Framework at GitHub | | | |
| Concourse: Generating Personalized Content Notifications in Near-Real-Time at LinkedIn | | | |
| Altus Care: Applying a Chatbot to Platform Engineering at eBay | | | |
| PyKrylov: Accelerating Machine Learning Research at eBay | | | |
| Box Graph: Spontaneous Social Network at Box | | | |
| PricingNet: Pricing Modelling with Neural Networks at Skyscanner | | | |
| PinText: Multitask Text Embedding System at Pinterest | | | |
| SearchSage: Learning Search Query Representations at Pinterest | | | |
| Cannes: ML saves $1.7M a year on document previews at Dropbox | | | |
| Scaling Gradient Boosted Trees for Click-Through-Rate Prediction at Yelp | | | |
| Learning with Privacy at Scale at Apple | | | |
| Deep Learning for Image Classification Experiment at Mercari | | | |
| Deep Learning for Frame Detection in Product Images at Allegro | | | |
| Content-based Video Relevance Prediction at Hulu | | | |
| Moderating Inappropriate Video Content at Yelp | | | |
| Improving Photo Selection With Deep Learning at TripAdvisor | | | |
| Personalized Recommendations for Experiences Using Deep Learning at TripAdvisor | | | |
| Personalised Recommender Systems at BBC | | | |
| Machine Learning (2 parts) at Condé Nast | | | |
| Natural Language Processing and Content Analysis (2 parts) at Condé Nast | | | |
| Mapping the World of Music Using Machine Learning (2 parts) at iHeartRadio | | | |
| Machine Learning to Improve Streaming Quality at Netflix | | | |
| Machine Learning to Match Drivers & Riders at GO-JEK | | | |
| Improving Video Thumbnails with Deep Neural Nets at YouTube | | | |
| Quantile Regression for Delivering On Time at Instacart | | | |
| Cross-Lingual End-to-End Product Search with Deep Learning at Zalando | | | |
| Machine Learning at Jane Street | | | |
| Machine Learning for Ranking Answers End-to-End at Quora | | | |
| Clustering Similar Stories Using LDA at Flipboard | | | |
| Similarity Search at Flickr | | | |
| Large-Scale Machine Learning Pipeline for Job Recommendations at Indeed | | | |
| Deep Learning from Prototype to Production at Taboola | | | |
| Atom Smashing using Machine Learning at CERN | | | |
| Mapping Tags at Medium | | | |
| Clustering with the Dirichlet Process Mixture Model in Scala at Monsanto | | | |
| Map Pins with DBSCAN & Random Forests at Foursquare | | | |
| Forecasting at Uber | | | |
| Financial Forecasting at Uber | | | |
| Productionizing ML with Workflows at Twitter | | | |
| GUI Testing Powered by Deep Learning at eBay | | | |
| Scaling Machine Learning to Recommend Driving Routes at Pivotal | | | |
| Real-Time Predictions at DoorDash | | | |
| Machine Intelligence at Dropbox | | | |
| Machine Learning for Indexing Text from Billions of Images at Dropbox | | | |
| Modeling User Journeys via Semantic Embeddings at Etsy | | | |
| Automated Fake Account Detection at LinkedIn | | | |
| Building Knowledge Graph at Airbnb | | | |
| Core Modeling at Instagram | | | |
| Neural Architecture Search (NAS) for Prohibited Item Detection at Mercari | | | |
| Computer Vision at Airbnb | | | |
| 3D Home Backend Algorithms at Zillow | | | |
| Long-term Forecasts at Lyft | | | |
| Discovering Popular Dishes with Deep Learning at Yelp | | | |
| SplitNet Architecture for Ad Candidate Ranking at Twitter | | | |
| Jobs Filter at Indeed | | | |
| Architecting Restaurant Wait Time Predictions at Yelp | | | |
| Music Personalization at Spotify | | | |
| Deep Learning for Domain Name Valuation at GoDaddy | | | |
| Similarity Clustering to Catch Fraud Rings at Stripe | | | |
| Personalized Search at Etsy | | | |
| ML Feature Serving Infrastructure at Lyft | | | |
| Context-Specific Bidding System at Etsy | | | |
| Moderating Promotional Spam and Inappropriate Content in Photos at Scale at Yelp | | | |
| Optimizing Payments with Machine Learning at Dropbox | | | |
| Scaling Media Machine Learning at Netflix | | | |
| Similarity Engine at eBay | | | |
| Machine Learning in Content Moderation at Etsy | | | |
Architecture |
| Tech Stack at Medium | | | |
| Tech Stack at Shopify | | | |
| Building Services (4 parts) at Airbnb | | | |
| Architecture of Evernote | | | |
| Architecture of Chat Service (3 parts) at Riot Games | | | |
| Architecture of League of Legends Client Update | | | |
| Architecture of Ad Platform at Twitter | | | |
| Architecture of API Gateway at Uber | | | |
| Architecture of API Gateway at Tinder | | | |
| Basic Architecture of Slack | | | |
| Lightweight Distributed Architecture to Handle Thousands of Library Releases at eBay | | | |
| Back-end at LinkedIn | | | |
| Back-end at Flickr | | | |
| Infrastructure (3 parts) at Zendesk | | | |
| Cloud Infrastructure at Grubhub | | | |
| Real-time Presence Platform at LinkedIn | | | |
| Settings Platform at LinkedIn | | | |
| Nearline System for Scale and Performance (2 parts) at Glassdoor | | | |
| Real-time User Action Counting System for Ads at Pinterest | | | |
| API Platform at Riot Games | | | |
| Games Platform at The New York Times | | | |
| Kabootar: Communication Platform at Swiggy | | | |
| Simone: Distributed Simulation Service at Netflix | | | |
| Seagull: Distributed System that Helps Run More Than 20 Million Tests Per Day at Yelp | | | |
| PriceAggregator: Intelligent System for Hotel Price Fetching (3 parts) at Agoda | | | |
| Phoenix: Testing Platform (3 parts) at Tinder | | | |
| Hexagonal Architecture at Netflix | | | |
| Architecture of Sticker Services at LINE | | | |
| Stack Overflow Enterprise at Palantir | | | |
| Architecture of Following Feed, Interest Feed, and Picked For You at Pinterest | | | |
| API Specification Workflow at WeWork | | | |
| Media Database at Netflix | | | |
| Member Transaction History Architecture at Walmart | | | |
| Sync Engine (2 parts) at Dropbox | | | |
| Ads Pacing Service at Twitter | | | |
| Rapid Event Notification System at Netflix | | | |
| Architectures of Finance, Banking, and Payment Systems | | | |
Architecture / Architectures of Finance, Banking, and Payment Systems |
| Bank Backend at Monzo | | | |
| Trading Platform for Scale at Wealthsimple | | | |
| Core Banking System at Margo Bank | | | |
| Architecture of Nubank | | | |
| Tech Stack at TransferWise | | | |
| Tech Stack at Addepar | | | |
| Avoiding Double Payments in a Distributed Payments System at Airbnb | | | |
| Scaling Payments (3 parts) at Etsy | | | |
| Handling Millions of Digital Transactions Safely Every Day at Paytm | | | |
| Billing and Payment Platform at Grammarly | | | |
Interview |
| Designing Large-Scale Systems | | | |
Interview / Designing Large-Scale Systems |
| My Scaling Hero - Jeff Atwood (a dose of Endorphins before your interview, JK) | | | |
| Software Engineering Advice from Building Large-Scale Distributed Systems - Jeff Dean | | | |
| Introduction to Architecting Systems for Scale | | | |
| Anatomy of a System Design Interview | | | |
| 8 Things You Need to Know Before a System Design Interview | | | |
| Top 10 System Design Interview Questions | | | |
| Top 10 Common Large-Scale Software Architectural Patterns in a Nutshell | | | |
| Cloud Big Data Design Patterns - Lynn Langit | | | |
| How NOT to design Netflix in your 45-minute System Design Interview? | | | |
| API Best Practices: Webhooks, Deprecation, and Design | | | |
Interview |
| Explaining Low-Level Systems (OS, Network/Protocol, Database, Storage) | | | |
Interview / Explaining Low-Level Systems (OS, Network/Protocol, Database, Storage) |
| The Precise Meaning of I/O Wait Time in Linux | | | |
| Paxos Made Live – An Engineering Perspective | | | |
| How to do Distributed Locking | | | |
| SQL Transaction Isolation Levels Explained | | | |
Interview |
| "What Happens When... and How" Questions | | | |
Interview / "What Happens When... and How" Questions |
| Netflix: What Happens When You Press Play? | | | |
| Monzo: How Peer-To-Peer Payments Work | | | |
| Transit and Peering: How Your Requests Reach GitHub | | | |
| How Spotify Streams Music | | | |
Organization |
| Engineering Levels at SoundCloud | | | |
| Engineering Roles at Palantir | | | |
| Engineering Career Framework at Dropbox | | | |
| Scaling Engineering Teams at Twitter | | | |
| Scaling Decision-Making Across Teams at LinkedIn | | | |
| Scaling Data Science Team at GOJEK | | | |
| Scaling Agile at Zalando | | | |
| Scaling Agile at bol.com | | | |
| Lessons Learned from Scaling a Product Team at Intercom | | | |
| Hiring, Managing, and Scaling Engineering Teams at Typeform | | | |
| Scaling the Datagram Team at Instagram | | | |
| Scaling the Design Team at Flexport | | | |
| Team Model for Scaling a Design System at Salesforce | | | |
| Building Analytics Team (4 parts) at Wish | | | |
| From 2 Founders to 1000 Employees at TransferWise | | | |
| Lessons Learned Growing a UX Team from 10 to 170 at Adobe | | | |
| Five Lessons from Scaling at Pinterest | | | |
| Approach to Engineering at Vinted | | | |
| Using Metrics to Improve the Development Process (and Coach People) at Indeed | | | |
| Mistakes to Avoid while Creating an Internal Product at Skyscanner | | | |
| RACI (Responsible, Accountable, Consulted, Informed) at Etsy | | | |
| Four Pillars of Leading People (Empathy, Inspiration, Trust, Honesty) at Zalando | | | |
| Pair Programming at Shopify | | | |
| Distributed Responsibility at Asana | | | |
| Rotating Engineers at Zalando | | | |
| Experiment Idea Review at Pinterest | | | |
| Tech Migrations at Spotify | | | |
| Improving Code Ownership at Yelp | | | |
| Agile Code Base at eBay | | | |
| Agile Data Engineering at Miro | | | |
| Automated Incident Management through Slack at Airbnb | | | |
| Refactor Organization at BBC | | | |
| Code Review | | | |
Organization / Code Review |
| Code Review at Palantir | | | |
| Code Review at LINE | | | |
| Code Reviews at Medium | | | |
| Code Review at LinkedIn | | | |
| Code Review at Disney | | | |
| Code Review at Netlify | | | |
Talk |
| Distributed Systems in One Lesson - Tim Berglund, Senior Director of Developer Experience at Confluent | | | |
| Building Real Time Infrastructure at Facebook - Jeff Barber and Shie Erlich, Software Engineers at Facebook | | | |
| Building Reliable Social Infrastructure for Google - Marc Alvidrez, Senior Manager at Google | | | |
| Building a Distributed Build System at Google Scale - Aysylu Greenberg, SDE at Google | | | |
| Site Reliability Engineering at Dropbox - Tammy Butow, Site Reliability Engineering Manager at Dropbox | | | |
| How Google Does Planet-Scale Engineering for Planet-Scale Infra - Melissa Binde, SRE Director for Google Cloud Platform | | | |
| Netflix Guide to Microservices - Josh Evans, Director of Operations Engineering at Netflix | | | |
| Achieving Rapid Response Times in Large Online Services - Jeff Dean, Google Senior Fellow | | | |
| Architecture to Handle 80K RPS Celebrity Sales at Shopify - Simon Eskildsen, Engineering Lead at Shopify | | | |
| Lessons of Scale at Facebook - Bobby Johnson, Director of Engineering at Facebook | | | |
| Performance Optimization for the Greater China Region at Salesforce - Jeff Cheng, Enterprise Architect at Salesforce | | | |
| How GIPHY Delivers a GIF to 300 Million Users - Alex Hoang and Nima Khoshini, Services Engineers at GIPHY | | | |
| High Performance Packet Processing Platform at Alibaba - Haiyong Wang, Senior Director at Alibaba | | | |
| Solving Large-scale Data Center and Cloud Interconnection Problems - Ihab Tarazi, CTO at Equinix | | | |
| Scaling Dropbox - Kevin Modzelewski, Back-end Engineer at Dropbox | | | |
| Scaling Reliability at Dropbox - Sat Kriya Khalsa, SRE at Dropbox | | | |
| Scaling with Performance at Facebook - Bill Jia, VP of Infrastructure at Facebook | | | |
| Scaling Live Videos to a Billion Users at Facebook - Sachin Kulkarni, Director of Engineering at Facebook | | | |
| Scaling Infrastructure at Instagram - Lisa Guo, Instagram Engineering | | | |
| Scaling Infrastructure at Twitter - Yao Yue, Staff Software Engineer at Twitter | | | |
| Scaling Infrastructure at Etsy - Bethany Macri, Engineering Manager at Etsy | | | |
| Scaling Real-time Infrastructure at Alibaba for Global Shopping Holiday - Xiaowei Jiang, Senior Director at Alibaba | | | |
| Scaling Data Infrastructure at Spotify - Matti (Lepistö) Pehrs, Spotify | | | |
| Scaling Pinterest - Marty Weiner, Pinterest’s founding engineer | | | |
| Scaling Slack - Bing Wei, Software Engineer (Infrastructure) at Slack | | | |
| Scaling Backend at YouTube - Sugu Sougoumarane, SDE at YouTube | | | |
| Scaling Backend at Uber - Matt Ranney, Chief Systems Architect at Uber | | | |
| Scaling Global CDN at Netflix - Dave Temkin, Director of Global Networks at Netflix | | | |
| Scaling Load Balancing Infra to Support 1.3 Billion Users at Facebook - Patrick Shuff, Production Engineer at Facebook | | | |
| Scaling (a NSFW site) to 200 Million Views A Day And Beyond - Eric Pickup, Lead Platform Developer at MindGeek | | | |
| Scaling Counting Infrastructure at Quora - Chun-Ho Hung and Nikhil Garg, SEs at Quora | | | |
| Scaling Git at Microsoft - Saeed Noursalehi, Principal Program Manager at Microsoft | | | |
| Scaling Multitenant Architecture Across Multiple Data Centres at Shopify - Weingarten, Engineering Lead at Shopify | | | |