Principle |
Lessons from Giant-Scale Services - Eric Brewer, UC Berkeley & Google | | | |
Designs, Lessons and Advice from Building Large Distributed Systems - Jeff Dean, Google | | | |
How to Design a Good API & Why it Matters - Joshua Bloch, CMU & Google | | | |
On Efficiency, Reliability, Scaling - James Hamilton, VP at AWS | | | |
Principles of Chaos Engineering | | | |
Finding the Order in Chaos | | | |
The Twelve-Factor App | | | |
Clean Architecture | | | |
High Cohesion and Low Coupling | | | |
Monoliths and Microservices | | | |
CAP Theorem and Trade-offs | | | |
CP Databases and AP Databases | | | |
Stateless vs Stateful Scalability | | | |
Scale Up vs Scale Out: Hidden Costs | | | |
ACID and BASE | | | |
Blocking/Non-Blocking and Sync/Async | | | |
Performance and Scalability of Databases | | | |
Database Isolation Levels and Effects on Performance and Scalability | | | |
The Probability of Data Loss in Large Clusters | | | |
Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence | | | |
SQL vs NoSQL | | | |
SQL vs NoSQL - Lesson Learned at Salesforce | | | |
NoSQL Databases: Survey and Decision Guidance | | | |
How Sharding Works | | | |
Consistent Hashing | | | |
Consistent Hashing: Algorithmic Tradeoffs | | | |
Don’t be tricked by the Hashing Trick | | | |
Uniform Consistent Hashing at Netflix | | | |
Eventually Consistent - Werner Vogels, CTO at Amazon | | | |
Cache is King | | | |
Anti-Caching | | | |
Understand Latency | | | |
Latency Numbers Every Programmer Should Know | | | |
The Calculus of Service Availability | | | |
Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO | | | |
Common Bottlenecks | | | |
Life Beyond Distributed Transactions | | | |
Relying on Software to Redirect Traffic Reliably at Various Layers | | | |
Breaking Things on Purpose | | | |
Avoid Over Engineering | | | |
Scalability Worst Practices | | | |
Use Solid Technologies - Don’t Re-invent the Wheel - Keep It Simple! | | | |
Simplicity by Distributing Complexity | | | |
Why Over-Reusing is Bad | | | |
Performance is a Feature | | | |
Make Performance Part of Your Workflow | | | |
The Benefits of Server Side Rendering over Client Side Rendering | | | |
Automate and Abstract: Lessons at Facebook | | | |
AWS Do's and Don'ts | | | |
(UI) Design Doesn’t Scale - Stanley Wood, Design Director at Spotify | | | |
Linux Performance | | | |
Building Fast and Resilient Web Applications - Ilya Grigorik | | | |
Accept Partial Failures, Minimize Service Loss | | | |
Design for Resiliency | | | |
Design for Self-healing | | | |
Design for Scaling Out | | | |
Design for Evolution | | | |
Learn from Mistakes | | | |
Scalability |
Microservices and Orchestration | | | |
Domain-Oriented Microservice Architecture at Uber | | | |
Service Architecture (3 parts: Domain Gateways, Value-Added Services, BFF) at SoundCloud | | | |
Container (8 parts) at Riot Games | | | |
Containerization at Pinterest | | | |
Evolution of Container Usage at Netflix | | | |
Dockerizing MySQL at Uber | | | |
Testing of Microservices at Spotify | | | |
Docker in Production at Treehouse | | | |
Microservice at SoundCloud | | | |
Operate Kubernetes Reliably at Stripe | | | |
Cross-Cluster Traffic Mirroring with Istio at Trivago | | | |
Agrarian-Scale Kubernetes (3 parts) at New York Times | | | |
Nanoservices at BBC | | | |
PowerfulSeal: Testing Tool for Kubernetes Clusters at Bloomberg | | | |
Conductor: Microservices Orchestrator at Netflix | | | |
Docker Containers that Power Over 100,000 Online Shops at Shopify | | | |
Microservice Architecture at Medium | | | |
From bare-metal to Kubernetes at Betabrand | | | |
Kubernetes at Tinder | | | |
Kubernetes at Quora | | | |
Kubernetes Platform at Pinterest | | | |
Microservices at Nubank | | | |
Payment Transaction Management in Microservices at Mercari | | | |
Service Mesh at Snap | | | |
GRIT: Protocol for Distributed Transactions across Microservices at eBay | | | |
Rubix: Kubernetes at Palantir | | | |
CRISP: Critical Path Analysis for Microservice Architectures at Uber | | | |
Scalability |
Distributed Caching | | | |
EVCache: Distributed In-memory Caching at Netflix | | | |
EVCache Cache Warmer Infrastructure at Netflix | | | |
Memsniff: Robust Memcache Traffic Analyzer at Box | | | |
Caching with Consistent Hashing and Cache Smearing at Etsy | | | |
Analysis of Photo Caching at Facebook | | | |
Cache Efficiency Exercise at Facebook | | | |
tCache: Scalable Data-aware Java Caching at Trivago | | | |
Pycache: In-process Caching at Quora | | | |
Reduce Memcached Memory Usage by 50% at Trivago | | | |
Caching Internal Service Calls at Yelp | | | |
Estimating the Cache Efficiency using Big Data at Allegro | | | |
Distributed Cache at Zalando | | | |
Application Data Caching from RAM to SSD at Netflix | | | |
Tradeoffs of Replicated Cache at Skyscanner | | | |
Avoiding Cache Stampede at DoorDash | | | |
Location Caching with Quadtrees at Yext | | | |
Video Metadata Caching at Vimeo | | | |
Scaling Redis at Twitter | | | |
Scaling Job Queue with Redis at Slack | | | |
Moving persistent data out of Redis at GitHub | | | |
Storing Hundreds of Millions of Simple Key-Value Pairs in Redis at Instagram | | | |
Redis at Trivago | | | |
Optimizing Redis Storage at Deliveroo | | | |
Memory Optimization in Redis at Wattpad | | | |
Redis Fleet at Heroku | | | |
Solving Remote Build Cache Misses (2 parts) at SoundCloud | | | |
Ratings & Reviews (2 parts) at Flipkart | | | |
Prefetch Caching of Items at eBay | | | |
Cross-Region Caching Library at Wix | | | |
Improving Distributed Caching Performance and Efficiency at Pinterest | | | |
Standardize and Improve Microservices Caching at DoorDash | | | |
HTTP Caching and CDN | | | |
Zynga Geo Proxy: Reducing Mobile Game Latency at Zynga | | | |
Google AMP at Condé Nast | | | |
A/B Tests on Hosting Infrastructure (CDNs) at Deliveroo | | | |
HAProxy with Kubernetes for User-facing Traffic at SoundCloud | | | |
Bandaid: Service Proxy at Dropbox | | | |
Service Workers at Slack | | | |
CDN Services at Spotify | | | |
Scalability |
Distributed Locking | | | |
Chubby: Lock Service for Loosely Coupled Distributed Systems at Google | | | |
Distributed Locking at Uber | | | |
Distributed Locks using Redis at GoSquared | | | |
ZooKeeper at Twitter | | | |
Eliminating Duplicate Queries using Distributed Locking at Chartio | | | |
Scalability |
Distributed Tracking, Tracing, and Measuring | | | |
Zipkin: Distributed Systems Tracing at Twitter | | | |
Improve Zipkin Traces using Kubernetes Pod Metadata at SoundCloud | | | |
Canopy: Scalable Distributed Tracing & Analysis at Facebook | | | |
Pintrace: Distributed Tracing at Pinterest | | | |
XCMetrics: All-in-One Tool for Tracking Xcode Build Metrics at Spotify | | | |
Real-time Distributed Tracing at LinkedIn | | | |
Tracking Service Infrastructure at Scale at Shopify | | | |
Distributed Tracing at HelloFresh | | | |
Analyzing Distributed Trace Data at Pinterest | | | |
Distributed Tracing at Uber | | | |
JVM Profiler: Tracing Distributed JVM Applications at Uber | | | |
Data Checking at Dropbox | | | |
Tracing Distributed Systems at Showmax | | | |
osquery Across the Enterprise at Palantir | | | |
StatsD at Etsy | | | |
StatsD at DoorDash | | | |
Scalability |
Distributed Scheduling | | | |
Distributed Task Scheduling (3 parts) at PagerDuty | | | |
Building Cron at Google | | | |
Distributed Cron Architecture at Quora | | | |
Chronos: A Replacement for Cron at Airbnb | | | |
Scheduler at Nextdoor | | | |
Peloton: Unified Resource Scheduler for Diverse Cluster Workloads at Uber | | | |
Fenzo: OSS Scheduler for Apache Mesos Frameworks at Netflix | | | |
Airflow - Workflow Orchestration | | | |
Airflow at Airbnb | | | |
Airflow at Adyen | | | |
Airflow at Pandora | | | |
Airflow at Robinhood | | | |
Airflow at Lyft | | | |
Airflow at Drivy | | | |
Airflow at Grab | | | |
Airflow at Adobe | | | |
Auditing Airflow Job Runs at Walmart | | | |
MaaT: DAG-based Distributed Task Scheduler at Alibaba | | | |
boundary-layer: Declarative Airflow Workflows at Etsy | | | |
Scalability |
Distributed Monitoring and Alerting | | | |
Unicorn: Remediation System at eBay | | | |
M3: Metrics and Monitoring Platform at Uber | | | |
Athena: Automated Build Health Management System at Dropbox | | | |
Vortex: Monitoring Server Applications at Dropbox | | | |
Nuage: Cloud Management Service at LinkedIn | | | |
Telltale: Application Monitoring at Netflix | | | |
ThirdEye: Monitoring Platform at LinkedIn | | | |
Periskop: Exception Monitoring Service at SoundCloud | | | |
Securitybot: Distributed Alerting Bot at Dropbox | | | |
Monitoring System at Alibaba | | | |
Real User Monitoring at Dailymotion | | | |
Alerting Ecosystem at Uber | | | |
Alerting Framework at Airbnb | | | |
Alerting on Service-Level Objectives (SLOs) at SoundCloud | | | |
Job-based Forecasting Workflow for Observability Anomaly Detection at Uber | | | |
Monitoring and Alert System using Graphite and Cabot at HackerEarth | | | |
Observability (2 parts) at Twitter | | | |
Distributed Security Alerting at Slack | | | |
Real-Time News Alerting at Bloomberg | | | |
Data Pipeline Monitoring System at LinkedIn | | | |
Monitoring and Observability at Picnic | | | |
Scalability |
Distributed Security | | | |
Approach to Security at Scale at Dropbox | | | |
Aardvark and Repokid: AWS Least Privilege for Distributed, High-Velocity Development at Netflix | | | |
LISA: Distributed Firewall at LinkedIn | | | |
Secure Infrastructure To Store Bitcoin In The Cloud at Coinbase | | | |
BinaryAlert: Real-time Serverless Malware Detection at Airbnb | | | |
Scalable IAM Architecture to Secure Access to 100 AWS Accounts at Segment | | | |
OAuth Audit Toolbox at Indeed | | | |
Active Directory Password Blacklisting at Yelp | | | |
Syscall Auditing at Scale at Slack | | | |
Athenz: Fine-Grained, Role-Based Access Control at Yahoo | | | |
WebAuthn Support for Secure Sign In at Dropbox | | | |
Security Development Lifecycle at Slack | | | |
Unprivileged Container Builds at Kinvolk | | | |
Diffy: Differencing Engine for Digital Forensics in the Cloud at Netflix | | | |
Detecting Credential Compromise in AWS at Netflix | | | |
Scalable User Privacy at Spotify | | | |
AVA: Audit Web Applications at Indeed | | | |
TTL as a Service: Automatic Revocation of Stale Privileges at Yelp | | | |
Enterprise Key Management at Slack | | | |
Scalability and Authentication at Twitch | | | |
Edge Authentication and Token-Agnostic Identity Propagation at Netflix | | | |
Hardening Kubernetes Infrastructure with Cilium at Palantir | | | |
Improving Web Vulnerability Management through Automation at Lyft | | | |
Clock Skew when Syncing Password Payloads at Dropbox | | | |
Scalability |
Distributed Messaging, Queuing, and Event Streaming | | | |
Cape: Event Stream Processing Framework at Dropbox | | | |
Brooklin: Distributed Service for Near Real-Time Data Streaming at LinkedIn | | | |
Samza: Stream Processing System for Latency Insights at LinkedIn | | | |
Bullet: Forward-Looking Query Engine for Streaming Data at Yahoo | | | |
EventHorizon: Tool for Watching Events Streaming at Etsy | | | |
Qmessage: Distributed, Asynchronous Task Queue at Quora | | | |
Cherami: Message Queue System for Transporting Async Tasks at Uber | | | |
Dynein: Distributed Delayed Job Queueing System at Airbnb | | | |
Timestone: Queueing System for Non-Parallelizable Workloads at Netflix | | | |
Messaging Service at Riot Games | | | |
Debugging Production with Event Logging at Zillow | | | |
Cross-platform In-app Messaging Orchestration Service at Netflix | | | |
Video Gatekeeper at Netflix | | | |
Scaling Push Messaging for Millions of Devices at Netflix | | | |
Delaying Asynchronous Message Processing with RabbitMQ at Indeed | | | |
Benchmarking Streaming Computation Engines at Yahoo | | | |
Improving Stream Data Quality With Protobuf Schema Validation at Deliveroo | | | |
Scaling Email Infrastructure at Medium | | | |
Real-time Messaging at Slack | | | |
Event Stream Database at Nike | | | |
Event Tracking System at Udemy | | | |
Event-Driven Messaging | | | |
Domain-Driven Design at Alibaba | | | |
Domain-Driven Design at Weebly | | | |
Domain-Driven Design at Moonpig | | | |
Scaling Event Sourcing for Netflix Downloads | | | |
Scaling Event-Sourcing at Jet.com | | | |
Event Sourcing (2 parts) at eBay | | | |
Event Sourcing at FREE NOW | | | |
Scalable content feed using Event Sourcing and CQRS patterns at Brainly | | | |
Scalability / Distributed Messaging, Queuing, and Event Streaming |
Pub-Sub Messaging | | | |
Pulsar: Pub-Sub Messaging at Scale at Yahoo | | | |
Wormhole: Pub-Sub System at Facebook | | | |
MemQ: Cloud Native Pub-Sub System at Pinterest | | | |
Pub-Sub in Microservices at Netflix | | | |
Scalability / Distributed Messaging, Queuing, and Event Streaming |
Kafka - Message Broker | | | |
Kafka at LinkedIn | | | |
Kafka at Pinterest | | | |
Kafka at Trello | | | |
Kafka at Salesforce | | | |
Kafka at The New York Times | | | |
Kafka at Yelp | | | |
Kafka at Criteo | | | |
Kafka on Kubernetes at Shopify | | | |
Kafka on PaaSTA: Running Kafka on Kubernetes at Yelp (2 parts) | | | |
Migrating Kafka's Zookeeper with No Downtime at Yelp | | | |
Reprocessing and Dead Letter Queues with Kafka at Uber | | | |
Chaperone: Audit Kafka End-to-End at Uber | | | |
Finding Kafka throughput limit in infrastructure at Dropbox | | | |
Cost Orchestration at Walmart | | | |
InfluxDB and Kafka to Scale to Over 1 Million Metrics a Second at Hulu | | | |
Scaling Kafka to Support Data Growth at PayPal | | | |
Scalability / Distributed Messaging, Queuing, and Event Streaming |
Stream Data Deduplication | | | |
Exactly-once Semantics with Kafka | | | |
Real-time Deduping at Tapjoy | | | |
Deduplication at Segment | | | |
Deduplication at Mail.Ru | | | |
Petabyte Scale Data Deduplication at Mixpanel | | | |
Scalability |
Distributed Logging | | | |
Logging at LinkedIn | | | |
Scalable and Reliable Log Ingestion at Pinterest | | | |
High-performance Replicated Log Service at Twitter | | | |
Logging Service with Spark at CERN Accelerator | | | |
Logging and Aggregation at Quora | | | |
Collection and Analysis of Daemon Logs at Badoo | | | |
Log Parsing with Static Code Analysis at Palantir | | | |
Centralized Application Logging at eBay | | | |
Enrich VPC Flow Logs at Hyper Scale to provide Network Insight at Netflix | | | |
BookKeeper: Distributed Log Storage at Yahoo | | | |
LogDevice: Distributed Data Store for Logs at Facebook | | | |
LogFeeder: Log Collection System at Yelp | | | |
DBLog: Generic Change-Data-Capture Framework at Netflix | | | |
Scalability |
Distributed Searching | | | |
Search Architecture at Instagram | | | |
Search Architecture at eBay | | | |
Search Architecture at Box | | | |
Search Discovery Indexing Platform at Coupang | | | |
Universal Search System at Pinterest | | | |
Improving Search Engine Efficiency by over 25% at eBay | | | |
Indexing and Querying Telemetry Logs with Lucene at Palantir | | | |
Query Understanding at TripAdvisor | | | |
Search Federation Architecture at LinkedIn (2018) | | | |
Search at Slack | | | |
Search and Recommendations at DoorDash | | | |
Stability and Scalability for Search at Twitter | | | |
Search Service at Twitter (2014) | | | |
Autocomplete Search (2 parts) at Traveloka | | | |
Data-Driven Autocorrection System at Canva | | | |
Adapting Search to Indian Phonetics at Flipkart | | | |
Nautilus: Search Engine at Dropbox | | | |
Galene: Search Architecture of LinkedIn | | | |
Manas: High Performing Customized Search System at Pinterest | | | |
Sherlock: Near Real Time Search Indexing at Flipkart | | | |
Nebula: Storage Platform to Build Search Backends at Airbnb | | | |
ELK (Elasticsearch, Logstash, Kibana) Stack | | | |
Predictions in Real Time with ELK at Uber | | | |
Building a scalable ELK stack at Envato | | | |
ELK at Robinhood | | | |
Scaling Elasticsearch Clusters at Uber | | | |
Elasticsearch Performance Tuning Practice at eBay | | | |
Improve Performance using Elasticsearch Plugins (2 parts) at Tinder | | | |
Elasticsearch at Kickstarter | | | |
Log Parsing with Logstash and Google Protocol Buffers at Trivago | | | |
Fast Order Search using Data Pipeline and Elasticsearch at Yelp | | | |
Moving Core Business Search to Elasticsearch at Yelp | | | |
Sharding out Elasticsearch at Vinted | | | |
Self-Ranking Search with Elasticsearch at Wattpad | | | |
Vulcanizer: a library for operating Elasticsearch at GitHub | | | |
Scalability |
Distributed Storage | | | |
In-memory Storage | | | |
MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) And Familiar (SQL) | | | |
Optimizing Memcached Efficiency at Quora | | | |
Real-Time Data Warehouse with MemSQL on Cisco UCS | | | |
Moving to MemSQL at Tapjoy | | | |
MemSQL and Kinesis for Real-time Insights at Disney | | | |
MemSQL to Query Hundreds of Billions of Rows in a Dashboard at Pandora | | | |
Scalability / Distributed Storage |
Object Storage | | | |
Scaling HDFS at Uber | | | |
Reasons for Choosing S3 over HDFS at Databricks | | | |
File System on Amazon S3 at Quantcast | | | |
Image Recovery at Scale Using S3 Versioning at Trivago | | | |
Cloud Object Store at Yahoo | | | |
Ambry: Distributed Immutable Object Store at LinkedIn | | | |
Dynamometer: Scale Testing HDFS on Minimal Hardware with Maximum Fidelity at LinkedIn | | | |
Hammerspace: Persistent, Concurrent, Off-heap Storage at Airbnb | | | |
MezzFS: Mounting Object Storage in Media Processing Platform at Netflix | | | |
Magic Pocket: In-house Multi-exabyte Storage System at Dropbox | | | |
Scalability |
Relational Databases | | | |
Building and Deploying MySQL Raft at Meta | | | |
MySQL for Schema-less Data at FriendFeed | | | |
MySQL at Pinterest | | | |
PostgreSQL at Twitch | | | |
Scaling MySQL-based Financial Reporting System at Airbnb | | | |
Scaling MySQL at Wix | | | |
MaxScale (MySQL) Database Proxy at Airbnb | | | |
Switching from Postgres to MySQL at Uber | | | |
Handling Growth with Postgres at Instagram | | | |
Scaling the Analytics Database (Postgres) at TransferWise | | | |
Updating a 50 Terabyte PostgreSQL Database at Adyen | | | |
Scaling Database Access for 100s of Billions of Queries per Day at PayPal | | | |
Minimizing Read-Write MySQL Downtime at Yelp | | | |
Migrating MySQL from 5.6 to 8.0 at Facebook | | | |
Migration from HBase to MyRocks at Quora | | | |
Replication | | | |
MySQL Parallel Replication (4 parts) at Booking.com | | | |
Mitigating MySQL Replication Lag and Reducing Read Load at GitHub | | | |
Read Consistency with Database Replicas at Shopify | | | |
Black-Box Auditing: Verifying End-to-End Replication Integrity between MySQL and Redshift at Yelp | | | |
Partitioning Main MySQL Database at Airbnb | | | |
Herb: Multi-DC Replication Engine for Schemaless Datastore at Uber | | | |
Scalability / Relational Databases |
Sharding | | | |
Sharding MySQL at Pinterest | | | |
Sharding MySQL at Twilio | | | |
Sharding MySQL at Square | | | |
Sharding MySQL at Quora | | | |
Sharding Layer of Schemaless Datastore at Uber | | | |
Sharding & IDs at Instagram | | | |
Sharding Postgres at Notion | | | |
Solr: Improving Performance for Batch Indexing at Box | | | |
Geosharded Recommendations (3 parts) at Tinder | | | |
Scaling Services with Shard Manager at Facebook | | | |
Scalability / Relational Databases |
Presto the Distributed SQL Query Engine | | | |
Presto at Pinterest | | | |
Presto Infrastructure at Lyft | | | |
Presto at Grab | | | |
Engineering Data Analytics with Presto and Apache Parquet at Uber | | | |
Data Wrangling at Slack | | | |
Presto in Big Data Platform on AWS at Netflix | | | |
Presto Auto Scaling at Eventbrite | | | |
Speed Up Presto with Alluxio Local Cache at Uber | | | |
Scalability |
NoSQL Databases | | | |
Key-Value Databases | | | |
DynamoDB at Nike | | | |
DynamoDB at Segment | | | |
DynamoDB at Mapbox | | | |
Manhattan: Distributed Key-Value Database at Twitter | | | |
Sherpa: Distributed NoSQL Key-Value Store at Yahoo | | | |
HaloDB: Embedded Key-Value Storage Engine at Yahoo | | | |
MPH: Fast and Compact Immutable Key-Value Stores at Indeed | | | |
Venice: Distributed Key-Value Database at LinkedIn | | | |
Scalability / NoSQL Databases |
Columnar Databases | | | |
Cassandra | | | |
Cassandra at Instagram | | | |
Storing Images in Cassandra at Walmart | | | |
Storing Messages with Cassandra at Discord | | | |
Scaling Cassandra Cluster at Walmart | | | |
Scaling Ad Analytics with Cassandra at Yelp | | | |
Scaling to 100+ Million Reads/Writes using Spark and Cassandra at Dream11 | | | |
Moving Food Feed from Redis to Cassandra at Zomato | | | |
Benchmarking Cassandra Scalability on AWS at Netflix | | | |
Service Decomposition at Scale with Cassandra at Intuit QuickBooks | | | |
Cassandra for Keeping Counts In Sync at SoundCloud | | | |
Cassandra Driver Configuration for Improved Performance and Load Balancing at Glassdoor | | | |
cstar: Cassandra Orchestration Tool at Spotify | | | |
Scalability / NoSQL Databases / Columnar Databases |
HBase | | | |
HBase at Salesforce | | | |
HBase in Facebook Messages | | | |
HBase in Imgur Notification | | | |
Improving HBase Backup Efficiency at Pinterest | | | |
HBase at Xiaomi | | | |
Scalability / NoSQL Databases / Columnar Databases |
Redshift | | | |
Redshift at GIPHY | | | |
Redshift at Hudl | | | |
Redshift at Drivy | | | |
Scalability / NoSQL Databases |
Document Databases | | | |
eBay: Building Mission-Critical Multi-Data Center Applications with MongoDB | | | |
MongoDB at Baidu: Multi-Tenant Cluster Storing 200+ Billion Documents across 160 Shards | | | |
Migrating Mongo Data at Addepar | | | |
The AWS and MongoDB Infrastructure of Parse (acquired by Facebook) | | | |
Migrating Mountains of Mongo Data at Addepar | | | |
Couchbase Ecosystem at LinkedIn | | | |
SimpleDB at Zendesk | | | |
Espresso: Distributed Document Store at LinkedIn | | | |
Scalability / NoSQL Databases |
Graph Databases | | | |
FlockDB: Distributed Graph Database at Twitter | | | |
TAO: Distributed Data Store for the Social Graph at Facebook | | | |
Akutan: Distributed Knowledge Graph Store at eBay | | | |
Scalability |
Time Series Databases | | | |
Beringei: High-performance Time Series Storage Engine at Facebook | | | |
MetricsDB: TimeSeries Database for storing metrics at Twitter | | | |
Atlas: In-memory Dimensional Time Series Database at Netflix | | | |
Heroic: Time Series Database at Spotify | | | |
Roshi: Distributed Storage System for Time-Series Event at SoundCloud | | | |
Goku: Time Series Database at Pinterest | | | |
Scaling Time Series Data Storage (2 parts) at Netflix | | | |
Druid - Real-time Analytics Database | | | |
Druid at Airbnb | | | |
Druid at Walmart | | | |
Druid at eBay | | | |
Druid at Netflix | | | |
Scalability |
Distributed Repositories, Dependencies, and Configurations Management | | | |
DGit: Distributed Git at GitHub | | | |
Stemma: Distributed Git Server at Palantir | | | |
Configuration Management for Distributed Systems at Flickr | | | |
Git Repository at Microsoft | | | |
Solve Git Problem with Large Repositories at Microsoft | | | |
Single Repository at Google | | | |
Scaling Infrastructure and (Git) Workflow at Adyen | | | |
Dotfiles Distribution at Booking.com | | | |
Secret Detector: Preventing Secrets in Source Code at Yelp | | | |
Managing Software Dependency at Scale at LinkedIn | | | |
Merging Code in High-velocity Repositories at LinkedIn | | | |
Dynamic Configuration at Twitter | | | |
Dynamic Configuration at Mixpanel | | | |
Dynamic Configuration at GoDaddy | | | |
Scalability |
Scaling Continuous Integration and Continuous Delivery | | | |
Continuous Integration Stack at Facebook | | | |
Continuous Integration with Distributed Repositories and Dependencies at Netflix | | | |
Continuous Integration and Deployment with Bazel at Dropbox | | | |
Continuous Deployments at BuzzFeed | | | |
Screwdriver: Continuous Delivery Build System for Dynamic Infrastructure at Yahoo | | | |
CI/CD at Betterment | | | |
CI/CD at Brainly | | | |
Scaling iOS CI with Anka at Shopify | | | |
Scaling Jira Server at Yelp | | | |
Auto-scaling CI/CD cluster at Flexport | | | |
Availability |
Resilience Engineering: Learning to Embrace Failure | | | |
Resilience Engineering with Project Waterbear at LinkedIn | | | |
Resiliency against Traffic Oversaturation at iHeartRadio | | | |
Resiliency in Distributed Systems at GO-JEK | | | |
Practical NoSQL Resilience Design Pattern for the Enterprise at eBay | | | |
Ensuring Resilience to Disaster at Quora | | | |
Site Resiliency at Expedia | | | |
Resiliency and Disaster Recovery with Kafka at eBay | | | |
Disaster Recovery for Multi-Region Kafka at Uber | | | |
Availability |
Failover | | | |
The Evolution of Global Traffic Routing and Failover | | | |
Testing for Disaster Recovery Failover Testing | | | |
Designing a Microservices Architecture for Failure | | | |
ELB for Automatic Failover at GoSquared | | | |
Eliminate the Database for Higher Availability at American Express | | | |
Failover with Redis Sentinel at Vinted | | | |
High-availability SaaS Infrastructure at FreeAgent | | | |
MySQL High Availability at GitHub | | | |
MySQL High Availability at Eventbrite | | | |
Business Continuity & Disaster Recovery at Walmart | | | |
Availability |
Load Balancing | | | |
Introduction to Modern Network Load Balancing and Proxying | | | |
Top Five (Load Balancing) Scalability Patterns | | | |
Load Balancing infrastructure to support more than 1.3 billion users at Facebook | | | |
DHCPLB: DHCP Load Balancer at Facebook | | | |
Katran: Scalable Network Load Balancer at Facebook | | | |
Deterministic Aperture: A Distributed, Load Balancing Algorithm at Twitter | | | |
Load Balancing with Eureka at Netflix | | | |
Edge Load Balancing at Netflix | | | |
Zuul 2: Cloud Gateway at Netflix | | | |
Load Balancing at Yelp | | | |
Load Balancing at GitHub | | | |
Consistent Hashing to Improve Load Balancing at Vimeo | | | |
UDP Load Balancing at 500px | | | |
QALM: QoS Load Management Framework at Uber | | | |
Traffic Steering using Rum DNS at LinkedIn | | | |
Traffic Infrastructure (Edge Network) at Dropbox | | | |
Intelligent DNS based load balancing at Dropbox | | | |
Monitor DNS systems at Stripe | | | |
Multi-DNS Architecture (3 parts) at Monday | | | |
Dynamic Anycast DNS Infrastructure at Hulu | | | |
Availability |
Rate Limiting | | | |
Rate Limiting for Scaling to Millions of Domains at Cloudflare | | | |
Cloud Bouncer: Distributed Rate Limiting at Yahoo | | | |
Scaling API with Rate Limiters at Stripe | | | |
Distributed Rate Limiting at Allegro | | | |
Ratequeue: Core Queueing-And-Rate-Limiting System at Twilio | | | |
Quotas Service at Grab | | | |
Rate Limiting at Figma | | | |
Availability |
Autoscaling | | | |
Autoscaling Pinterest | | | |
Autoscaling Based on Request Queuing at Square | | | |
Autoscaling Jenkins at Trivago | | | |
Autoscaling Pub-Sub Consumers at Spotify | | | |
Autoscaling Bigtable Clusters based on CPU Load at Spotify | | | |
Autoscaling AWS Step Functions Activities at Yelp | | | |
Scryer: Predictive Auto Scaling Engine at Netflix | | | |
Bouncer: Simple AWS Auto Scaling Rollovers at Palantir | | | |
Clusterman: Autoscaling Mesos Clusters at Yelp | | | |
Availability |
Availability in Globally Distributed Storage Systems at Google | | | |
NodeJS High Availability at Yahoo | | | |
Operations (11 parts) at LinkedIn | | | |
Monitoring Powers High Availability for LinkedIn Feed | | | |
Supporting Global Events at Facebook | | | |
High Availability at BlaBlaCar | | | |
High Availability at Netflix | | | |
High Availability Cloud Infrastructure at Twilio | | | |
Automating Datacenter Operations at Dropbox | | | |
Globalizing Player Accounts at Riot Games | | | |
Stability |
Circuit Breaker | | | |
Circuit Breaking in Distributed Systems | | | |
Circuit Breaker for Scaling Containers | | | |
Lessons in Resilience at SoundCloud | | | |
Protector: Circuit Breaker for Time Series Databases at Trivago | | | |
Improved Production Stability with Circuit Breakers at Heroku | | | |
Circuit Breaker at Zendesk | | | |
Circuit Breaker at Traveloka | | | |
Circuit Breaker at Shopify | | | |
Stability |
Timeouts | | | |
Fault Tolerance (Timeouts and Retries, Thread Separation, Semaphores, Circuit Breakers) at Netflix | | | |
Enforce Timeout: A Reliability Methodology at DoorDash | | | |
Troubleshooting a Connection Timeout Issue with tcp_tw_recycle Enabled at eBay | | | |
Stability |
Crash-safe Replication for MySQL at Booking.com | | | |
Bulkheads: Partition and Tolerate Failure in One Part | | | |
Steady State: Always Put Logs on Separate Disk | | | |
Throttling: Maintain a Steady Pace | | | |
Multi-Clustering: Improving Resiliency and Stability of a Large-scale Monolithic API Service at LinkedIn | | | |
Determinism (4 parts) in League of Legends Server | | | |
|
Performance Optimization on OS, Storage, Database, Network | | | |
|
Improving Performance with Background Data Prefetching at Instagram | | | |
Fixing Linux filesystem performance regressions at LinkedIn | | | |
Compression Techniques to Solve Network I/O Bottlenecks at eBay | | | |
Optimizing Web Servers for High Throughput and Low Latency at Dropbox | | | |
Linux Performance Analysis in 60,000 Milliseconds at Netflix | | | |
Live Downsizing Google Cloud Persistent Disks (PD-SSD) at Mixpanel | | | |
Decreasing RAM Usage by 40% Using jemalloc with Python & Celery at Zapier | | | |
Reducing Memory Footprint at Slack | | | |
Continuous Load Testing at Slack | | | |
Performance Improvements at Pinterest | | | |
Server Side Rendering at Wix | | | |
30x Performance Improvements on MySQLStreamer at Yelp | | | |
Optimizing APIs at Netflix | | | |
Performance Monitoring with Riemann and Clojure at Walmart | | | |
Performance Tracking Dashboard for Live Games at Zynga | | | |
Optimizing CAL Report Hadoop MapReduce Jobs at eBay | | | |
Performance Tuning on Quartz Scheduler at eBay | | | |
Profiling C++ (Part 1: Optimization, Part 2: Measurement and Analysis) at Riot Games | | | |
Profiling React Server-Side Rendering at HomeAway | | | |
Hardware-Assisted Video Transcoding at Dailymotion | | | |
Cross Shard Transactions at 10 Million RPS at Dropbox | | | |
API Profiling at Pinterest | | | |
Pagelets Parallelize Server-side Processing at Yelp | | | |
Improving key expiration in Redis at Twitter | | | |
Ad Delivery Network Performance Optimization with Flame Graphs at MindGeek | | | |
Predictive CPU isolation of containers at Netflix | | | |
Improving HDFS I/O Utilization for Efficiency at Uber | | | |
Cloud Jewels: Estimating kWh in the Cloud at Etsy | | | |
Unthrottled: Fixing CPU Limits in the Cloud (2 parts) at Indeed | | | |
|
Performance Optimization by Tuning Garbage Collection | | | |
|
Garbage Collection in Java Applications at LinkedIn | | | |
Garbage Collection in High-Throughput, Low-Latency Machine Learning Services at Adobe | | | |
Garbage Collection in Redux Applications at SoundCloud | | | |
Garbage Collection in Go Application at Twitch | | | |
Analyzing V8 Garbage Collection Logs at Alibaba | | | |
Python Garbage Collection for Dropping 50% Memory Growth Per Request at Instagram | | | |
Performance Impact of Removing Out of Band Garbage Collector (OOBGC) at GitHub | | | |
Debugging Java Memory Leaks at Allegro | | | |
Optimizing JVM at Alibaba | | | |
Tuning JVM Memory for Large-scale Services at Uber | | | |
Solr Performance Tuning at Walmart | | | |
Memory Tuning a High Throughput Microservice at Flipkart | | | |
|
Performance Optimization on Image, Video, Page Load | | | |
Optimizing 360 Photos at Scale at Facebook | | | |
Reducing Image File Size in the Photos Infrastructure at Etsy | | | |
Improving GIF Performance at Pinterest | | | |
Optimizing Video Playback Performance at Pinterest | | | |
Optimizing Video Stream for Low Bandwidth with Dynamic Optimizer at Netflix | | | |
Adaptive Video Streaming at YouTube | | | |
Reducing Video Loading Time at Dailymotion | | | |
Improving Homepage Performance at Zillow | | | |
The Process of Optimizing for Client Performance at Expedia | | | |
Web Performance at BBC | | | |
|
Performance Optimization by Brotli Compression | | | |
|
Boosting Site Speed Using Brotli Compression at LinkedIn | | | |
Brotli at Booking.com | | | |
Brotli at Treebo | | | |
Deploying Brotli for Static Content at Dropbox | | | |
Progressive Enhancement with Brotli at Yelp | | | |
Speeding Up Redis with Compression at DoorDash | | | |
|
Performance Optimization on Languages and Frameworks | | | |
|
Python at Netflix | | | |
Python at scale (3 parts) at Instagram | | | |
OCaml best practices (2 parts) at Issuu | | | |
PHP at Slack | | | |
Go at Trivago | | | |
TypeScript at Etsy | | | |
Kotlin for taming state at Etsy | | | |
Kotlin at DoorDash | | | |
BPF and Go at Bumble | | | |
Ruby on Rails at GitLab | | | |
Rust in production at Figma | | | |
Choosing a Language Stack at WeWork | | | |
Switching from Go to Rust at Discord | | | |
ASP.NET Core Performance Optimization at Agoda | | | |
Data Race Patterns in Go at Uber | | | |
Java 21 Virtual Threads at Netflix | | | |
Intelligence |
Big Data | | | |
Data Platform at Uber | | | |
Data Platform at BMW | | | |
Data Platform at Netflix | | | |
Data Platform at Flipkart | | | |
Data Platform at Coupang | | | |
Data Platform at DoorDash | | | |
Data Platform at Khan Academy | | | |
Data Infrastructure at Airbnb | | | |
Data Infrastructure at LinkedIn | | | |
Data Infrastructure at GO-JEK | | | |
Data Ingestion Infrastructure at Pinterest | | | |
Data Analytics Architecture at Pinterest | | | |
Data Orchestration Service at Spotify | | | |
Big Data Processing (2 parts) at Spotify | | | |
Big Data Processing at Uber | | | |
Analytics Pipeline at Lyft | | | |
Analytics Pipeline at Grammarly | | | |
Analytics Pipeline at Teads | | | |
ML Data Pipelines for Real-Time Fraud Prevention at PayPal | | | |
Big Data Analytics and ML Techniques at LinkedIn | | | |
Self-Serve Reporting Platform on Hadoop at LinkedIn | | | |
Privacy-Preserving Analytics and Reporting at LinkedIn | | | |
Analytics Platform for Tracking Item Availability at Walmart | | | |
Real-Time Analytics for Mobile App Crashes using Apache Pinot at Uber | | | |
HALO: Hardware Analytics and Lifecycle Optimization at Facebook | | | |
RBEA: Real-time Analytics Platform at King | | | |
AresDB: GPU-Powered Real-time Analytics Engine at Uber | | | |
AthenaX: Streaming Analytics Platform at Uber | | | |
Jupiter: Config Driven Adtech Batch Ingestion Platform at Uber | | | |
Delta: Data Synchronization and Enrichment Platform at Netflix | | | |
Keystone: Real-time Stream Processing Platform at Netflix | | | |
Databook: Turning Big Data into Knowledge with Metadata at Uber | | | |
Amundsen: Data Discovery & Metadata Engine at Lyft | | | |
Maze: Funnel Visualization Platform at Uber | | | |
Metacat: Making Big Data Discoverable and Meaningful at Netflix | | | |
SpinalTap: Change Data Capture System at Airbnb | | | |
Accelerator: Fast Data Processing Framework at eBay | | | |
Omid: Transaction Processing Platform at Yahoo | | | |
TensorFlowOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo | | | |
CaffeOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo | | | |
Spark on Scala: Analytics Reference Architecture at Adobe | | | |
Experimentation Platform (2 parts) at Spotify | | | |
Experimentation Platform at Airbnb | | | |
Smart Product Platform at Zalando | | | |
Log Analysis Platform at LINE | | | |
Data Visualisation Platform at Myntra | | | |
Building and Scaling Data Lineage at Netflix | | | |
Building a scalable data management system for computer vision tasks at Pinterest | | | |
Structured Data at Etsy | | | |
Scaling a Mature Data Pipeline - Managing Overhead at Airbnb | | | |
Spark Partitioning Strategies at Airbnb | | | |
Scaling the Hadoop Distributed File System at LinkedIn | | | |
Scaling Hadoop YARN cluster beyond 10,000 nodes at LinkedIn | | | |
Scaling Big Data Access Controls at Pinterest | | | |
Distributed Machine Learning | | | |
Machine Learning Platform at Yelp | | | |
Machine Learning Platform at Etsy | | | |
Machine Learning Platform at Zalando | | | |
Scaling AI/ML Infrastructure at Uber | | | |
Recommendation System at Lyft | | | |
Reinforcement Learning Platform at Lyft | | | |
Platform for Serving Recommendations at Etsy | | | |
Infrastructure to Run User Forecasts at Spotify | | | |
Aroma: Using ML for Code Recommendation at Facebook | | | |
Flyte: Cloud Native Machine Learning and Data Processing Platform at Lyft | | | |
LyftLearn: ML Model Training Infrastructure built on Kubernetes at Lyft | | | |
Horovod: Open Source Distributed Deep Learning Framework for TensorFlow at Uber | | | |
Genie: Gen AI On-Call Copilot at Uber | | | |
COTA: Improving Customer Care with NLP & Machine Learning at Uber | | | |
Manifold: Model-Agnostic Visual Debugging Tool for Machine Learning at Uber | | | |
Repo-Topix: Topic Extraction Framework at GitHub | | | |
Concourse: Generating Personalized Content Notifications in Near-Real-Time at LinkedIn | | | |
Altus Care: Applying a Chatbot to Platform Engineering at eBay | | | |
PyKrylov: Accelerating Machine Learning Research at eBay | | | |
Box Graph: Spontaneous Social Network at Box | | | |
PricingNet: Pricing Modelling with Neural Networks at Skyscanner | | | |
PinText: Multitask Text Embedding System at Pinterest | | | |
SearchSage: Learning Search Query Representations at Pinterest | | | |
Cannes: ML saves $1.7M a year on document previews at Dropbox | | | |
Scaling Gradient Boosted Trees for Click-Through-Rate Prediction at Yelp | | | |
Learning with Privacy at Scale at Apple | | | |
Deep Learning for Image Classification Experiment at Mercari | | | |
Deep Learning for Frame Detection in Product Images at Allegro | | | |
Content-based Video Relevance Prediction at Hulu | | | |
Moderating Inappropriate Video Content at Yelp | | | |
Improving Photo Selection With Deep Learning at TripAdvisor | | | |
Personalized Recommendations for Experiences Using Deep Learning at TripAdvisor | | | |
Personalised Recommender Systems at BBC | | | |
Machine Learning (2 parts) at Condé Nast | | | |
Natural Language Processing and Content Analysis (2 parts) at Condé Nast | | | |
Mapping the World of Music Using Machine Learning (2 parts) at iHeartRadio | | | |
Machine Learning to Improve Streaming Quality at Netflix | | | |
Machine Learning to Match Drivers & Riders at GO-JEK | | | |
Improving Video Thumbnails with Deep Neural Nets at YouTube | | | |
Quantile Regression for Delivering On Time at Instacart | | | |
Cross-Lingual End-to-End Product Search with Deep Learning at Zalando | | | |
Machine Learning at Jane Street | | | |
Machine Learning for Ranking Answers End-to-End at Quora | | | |
Clustering Similar Stories Using LDA at Flipboard | | | |
Similarity Search at Flickr | | | |
Large-Scale Machine Learning Pipeline for Job Recommendations at Indeed | | | |
Deep Learning from Prototype to Production at Taboola | | | |
Atom Smashing using Machine Learning at CERN | | | |
Mapping Tags at Medium | | | |
Clustering with the Dirichlet Process Mixture Model in Scala at Monsanto | | | |
Map Pins with DBSCAN & Random Forests at Foursquare | | | |
Forecasting at Uber | | | |
Financial Forecasting at Uber | | | |
Productionizing ML with Workflows at Twitter | | | |
GUI Testing Powered by Deep Learning at eBay | | | |
Scaling Machine Learning to Recommend Driving Routes at Pivotal | | | |
Real-Time Predictions at DoorDash | | | |
Machine Intelligence at Dropbox | | | |
Machine Learning for Indexing Text from Billions of Images at Dropbox | | | |
Modeling User Journeys via Semantic Embeddings at Etsy | | | |
Automated Fake Account Detection at LinkedIn | | | |
Building Knowledge Graph at Airbnb | | | |
Core Modeling at Instagram | | | |
Neural Architecture Search (NAS) for Prohibited Item Detection at Mercari | | | |
Computer Vision at Airbnb | | | |
3D Home Backend Algorithms at Zillow | | | |
Long-term Forecasts at Lyft | | | |
Discovering Popular Dishes with Deep Learning at Yelp | | | |
SplitNet Architecture for Ad Candidate Ranking at Twitter | | | |
Jobs Filter at Indeed | | | |
Architecting Restaurant Wait Time Predictions at Yelp | | | |
Music Personalization at Spotify | | | |
Deep Learning for Domain Name Valuation at GoDaddy | | | |
Similarity Clustering to Catch Fraud Rings at Stripe | | | |
Personalized Search at Etsy | | | |
ML Feature Serving Infrastructure at Lyft | | | |
Context-Specific Bidding System at Etsy | | | |
Moderating Promotional Spam and Inappropriate Content in Photos at Scale at Yelp | | | |
Optimizing Payments with Machine Learning at Dropbox | | | |
Scaling Media Machine Learning at Netflix | | | |
Similarity Engine at eBay | | | |
Machine Learning in Content Moderation at Etsy | | | |
Architecture |
Tech Stack at Medium | | | |
Tech Stack at Shopify | | | |
Building Services (4 parts) at Airbnb | | | |
Architecture of Evernote | | | |
Architecture of Chat Service (3 parts) at Riot Games | | | |
Architecture of League of Legends Client Update | | | |
Architecture of Ad Platform at Twitter | | | |
Architecture of API Gateway at Uber | | | |
Architecture of API Gateway at Tinder | | | |
Basic Architecture of Slack | | | |
Lightweight Distributed Architecture to Handle Thousands of Library Releases at eBay | | | |
Back-end at LinkedIn | | | |
Back-end at Flickr | | | |
Infrastructure (3 parts) at Zendesk | | | |
Cloud Infrastructure at Grubhub | | | |
Real-time Presence Platform at LinkedIn | | | |
Settings Platform at LinkedIn | | | |
Nearline System for Scale and Performance (2 parts) at Glassdoor | | | |
Real-time User Action Counting System for Ads at Pinterest | | | |
API Platform at Riot Games | | | |
Games Platform at The New York Times | | | |
Kabootar: Communication Platform at Swiggy | | | |
Simone: Distributed Simulation Service at Netflix | | | |
Seagull: Distributed System that Helps Run More Than 20 Million Tests Per Day at Yelp | | | |
PriceAggregator: Intelligent System for Hotel Price Fetching (3 parts) at Agoda | | | |
Phoenix: Testing Platform (3 parts) at Tinder | | | |
Hexagonal Architecture at Netflix | | | |
Architecture of Sticker Services at LINE | | | |
Stack Overflow Enterprise at Palantir | | | |
Architecture of Following Feed, Interest Feed, and Picked For You at Pinterest | | | |
API Specification Workflow at WeWork | | | |
Media Database at Netflix | | | |
Member Transaction History Architecture at Walmart | | | |
Sync Engine (2 parts) at Dropbox | | | |
Ads Pacing Service at Twitter | | | |
Rapid Event Notification System at Netflix | | | |
Architectures of Finance, Banking, and Payment Systems | | | |
Bank Backend at Monzo | | | |
Trading Platform for Scale at Wealthsimple | | | |
Core Banking System at Margo Bank | | | |
Architecture of Nubank | | | |
Tech Stack at TransferWise | | | |
Tech Stack at Addepar | | | |
Avoiding Double Payments in a Distributed Payments System at Airbnb | | | |
Scaling Payments (3 parts) at Etsy | | | |
Handling Millions of Digital Transactions Safely Every Day at Paytm | | | |
Billing and Payment Platform at Grammarly | | | |
Interview |
Designing Large-Scale Systems | | | |
My Scaling Hero - Jeff Atwood (a dose of Endorphins before your interview, JK) | | | |
Software Engineering Advice from Building Large-Scale Distributed Systems - Jeff Dean | | | |
Introduction to Architecting Systems for Scale | | | |
Anatomy of a System Design Interview | | | |
8 Things You Need to Know Before a System Design Interview | | | |
Top 10 System Design Interview Questions | | | |
Top 10 Common Large-Scale Software Architectural Patterns in a Nutshell | | | |
Cloud Big Data Design Patterns - Lynn Langit | | | |
How NOT to design Netflix in your 45-minute System Design Interview? | | | |
API Best Practices: Webhooks, Deprecation, and Design | | | |
Explaining Low-Level Systems (OS, Network/Protocol, Database, Storage) | | | |
The Precise Meaning of I/O Wait Time in Linux | | | |
Paxos Made Live – An Engineering Perspective | | | |
How to do Distributed Locking | | | |
SQL Transaction Isolation Levels Explained | | | |
"What Happens When... and How" Questions | | | |
Netflix: What Happens When You Press Play? | | | |
Monzo: How Peer-To-Peer Payments Work | | | |
Transit and Peering: How Your Requests Reach GitHub | | | |
How Spotify Streams Music | | | |
Organization |
Engineering Levels at SoundCloud | | | |
Engineering Roles at Palantir | | | |
Engineering Career Framework at Dropbox | | | |
Scaling Engineering Teams at Twitter | | | |
Scaling Decision-Making Across Teams at LinkedIn | | | |
Scaling Data Science Team at GOJEK | | | |
Scaling Agile at Zalando | | | |
Scaling Agile at bol.com | | | |
Lessons Learned from Scaling a Product Team at Intercom | | | |
Hiring, Managing, and Scaling Engineering Teams at Typeform | | | |
Scaling the Datagram Team at Instagram | | | |
Scaling the Design Team at Flexport | | | |
Team Model for Scaling a Design System at Salesforce | | | |
Building Analytics Team (4 parts) at Wish | | | |
From 2 Founders to 1000 Employees at TransferWise | | | |
Lessons Learned Growing a UX Team from 10 to 170 at Adobe | | | |
Five Lessons from Scaling at Pinterest | | | |
Approach to Engineering at Vinted | | | |
Using Metrics to Improve the Development Process (and Coach People) at Indeed | | | |
Mistakes to Avoid while Creating an Internal Product at Skyscanner | | | |
RACI (Responsible, Accountable, Consulted, Informed) at Etsy | | | |
Four Pillars of Leading People (Empathy, Inspiration, Trust, Honesty) at Zalando | | | |
Pair Programming at Shopify | | | |
Distributed Responsibility at Asana | | | |
Rotating Engineers at Zalando | | | |
Experiment Idea Review at Pinterest | | | |
Tech Migrations at Spotify | | | |
Improving Code Ownership at Yelp | | | |
Agile Code Base at eBay | | | |
Agile Data Engineering at Miro | | | |
Automated Incident Management through Slack at Airbnb | | | |
Refactor Organization at BBC | | | |
Code Review | | | |
Code Review at Palantir | | | |
Code Review at LINE | | | |
Code Reviews at Medium | | | |
Code Review at LinkedIn | | | |
Code Review at Disney | | | |
Code Review at Netlify | | | |
Talk |
Distributed Systems in One Lesson - Tim Berglund, Senior Director of Developer Experience at Confluent | | | |
Building Real Time Infrastructure at Facebook - Jeff Barber and Shie Erlich, Software Engineers at Facebook | | | |
Building Reliable Social Infrastructure for Google - Marc Alvidrez, Senior Manager at Google | | | |
Building a Distributed Build System at Google Scale - Aysylu Greenberg, SDE at Google | | | |
Site Reliability Engineering at Dropbox - Tammy Butow, Site Reliability Engineering Manager at Dropbox | | | |
How Google Does Planet-Scale Engineering for Planet-Scale Infra - Melissa Binde, SRE Director for Google Cloud Platform | | | |
Netflix Guide to Microservices - Josh Evans, Director of Operations Engineering at Netflix | | | |
Achieving Rapid Response Times in Large Online Services - Jeff Dean, Google Senior Fellow | | | |
Architecture to Handle 80K RPS Celebrity Sales at Shopify - Simon Eskildsen, Engineering Lead at Shopify | | | |
Lessons of Scale at Facebook - Bobby Johnson, Director of Engineering at Facebook | | | |
Performance Optimization for the Greater China Region at Salesforce - Jeff Cheng, Enterprise Architect at Salesforce | | | |
How GIPHY Delivers a GIF to 300 Million Users - Alex Hoang and Nima Khoshini, Services Engineers at GIPHY | | | |
High Performance Packet Processing Platform at Alibaba - Haiyong Wang, Senior Director at Alibaba | | | |
Solving Large-scale Data Center and Cloud Interconnection Problems - Ihab Tarazi, CTO at Equinix | | | |
Scaling Dropbox - Kevin Modzelewski, Back-end Engineer at Dropbox | | | |
Scaling Reliability at Dropbox - Sat Kriya Khalsa, SRE at Dropbox | | | |
Scaling with Performance at Facebook - Bill Jia, VP of Infrastructure at Facebook | | | |
Scaling Live Videos to a Billion Users at Facebook - Sachin Kulkarni, Director of Engineering at Facebook | | | |
Scaling Infrastructure at Instagram - Lisa Guo, Instagram Engineering | | | |
Scaling Infrastructure at Twitter - Yao Yue, Staff Software Engineer at Twitter | | | |
Scaling Infrastructure at Etsy - Bethany Macri, Engineering Manager at Etsy | | | |
Scaling Real-time Infrastructure at Alibaba for Global Shopping Holiday - Xiaowei Jiang, Senior Director at Alibaba | | | |
Scaling Data Infrastructure at Spotify - Matti (Lepistö) Pehrs, Spotify | | | |
Scaling Pinterest - Marty Weiner, Pinterest’s founding engineer | | | |
Scaling Slack - Bing Wei, Software Engineer (Infrastructure) at Slack | | | |
Scaling Backend at YouTube - Sugu Sougoumarane, SDE at YouTube | | | |
Scaling Backend at Uber - Matt Ranney, Chief Systems Architect at Uber | | | |
Scaling Global CDN at Netflix - Dave Temkin, Director of Global Networks at Netflix | | | |
Scaling Load Balancing Infra to Support 1.3 Billion Users at Facebook - Patrick Shuff, Production Engineer at Facebook | | | |
Scaling (a NSFW site) to 200 Million Views A Day And Beyond - Eric Pickup, Lead Platform Developer at MindGeek | | | |
Scaling Counting Infrastructure at Quora - Chun-Ho Hung and Nikhil Garg, SEs at Quora | | | |
Scaling Git at Microsoft - Saeed Noursalehi, Principal Program Manager at Microsoft | | | |
Scaling Multitenant Architecture Across Multiple Data Centres at Shopify - Weingarten, Engineering Lead at Shopify | | | |