Key system design principles distilled from engineering blogs at top companies. Filter by difficulty or topic, and click any card for details.
No prior system design experience needed
Use a single primary for writes and scale reads horizontally with many replicas across regions.
Put a cache layer in front of your database to serve frequently read data without hitting the DB every time.
Use a connection pooler to reuse database connections instead of creating a new one per request.
Limit how many requests a client or endpoint can make in a time window to prevent overwhelming the system.
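One common way to implement this is a token bucket; a sketch, with made-up `rate` and `capacity` numbers, is:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter sketch: sustain `rate` requests/sec,
    with bursts up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, up to capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=10, capacity=3)
results = [bucket.allow() for _ in range(5)]   # burst of 5 back-to-back calls
print(results)   # the first 3 pass; the rest are rate-limited
```

Sliding-window counters and leaky buckets are the usual alternatives; the choice mainly affects how bursts are treated.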
Keep a synchronized standby ready to take over if the primary server fails, minimizing downtime.
A performance problem means your system is slow for a single user. A scalability problem means it's fast for one but slow under heavy load.
Latency is the time to complete one action. Throughput is how many actions complete per unit time. Aim for maximal throughput with acceptable latency.
The CAP theorem: a distributed system cannot simultaneously guarantee Consistency, Availability, and Partition tolerance. Since network partitions are unavoidable, the practical choice is between CP and AP.
Choose CP when correctness requires every read to see the latest write (e.g., account balances). Choose AP when the system must stay responsive even if data is temporarily stale.
After a write, reads may or may not see it. Best-effort delivery is acceptable when losing some data is tolerable.
After a write, reads will eventually see it (typically within milliseconds). Data is replicated asynchronously.
After a write, every subsequent read returns the updated value. Data is replicated synchronously.
Availability is measured in 'nines' — 99.9% (three 9s) allows ~8h 46min downtime/year, while 99.99% (four 9s) allows only ~52 minutes.
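The downtime figures follow directly from the availability target; a quick check:

```python
def yearly_downtime_minutes(availability):
    """Minutes of allowed downtime per year for a given availability."""
    return (1 - availability) * 365.25 * 24 * 60

for target in (0.999, 0.9999):
    print(f"{target:.2%} -> {yearly_downtime_minutes(target):.0f} min/year")
# 99.9%  -> ~526 min/year (~8h 46min)
# 99.99% -> ~53 min/year
```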
DNS translates domain names to IP addresses using a hierarchical system of servers. It's the first step in every web request.
Load balancers distribute incoming requests across multiple servers, preventing overload and eliminating single points of failure.
Vertical scaling (scale up) means bigger hardware. Horizontal scaling (scale out) means more machines. Horizontal is cheaper and more resilient but adds complexity.
ACID (Atomicity, Consistency, Isolation, Durability) guarantees that database transactions are reliable even during failures.
Key-value stores offer O(1) reads and writes, backed by memory or SSD. Best for simple data models and rapidly-changing data like caches.
Caching can happen at every layer: client (browser/OS), CDN, web server (reverse proxy), application (Redis/Memcached), and database.
The application checks the cache first. On a miss, it loads from the database, stores the result in cache, then returns it. Only requested data gets cached.
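The cache-aside read path above fits in a few lines; a sketch with in-memory dicts standing in for the cache and the database:

```python
cache, db = {}, {"user:1": {"name": "Ada"}}   # stand-ins for Redis and the DB
db_reads = 0

def get(key):
    """Cache-aside read: check the cache, fall back to the DB on a miss,
    then populate the cache so later reads are cheap."""
    global db_reads
    if key in cache:
        return cache[key]         # cache hit
    db_reads += 1
    value = db[key]               # cache miss: load from the database
    cache[key] = value            # store for subsequent reads
    return value

get("user:1")
get("user:1")
assert db_reads == 1   # the second read was served entirely from cache
```

In production the cached entry would also carry a TTL so stale data eventually expires.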
Load-balanced servers break server-local sessions. Fix it with a centralized session store or load-balancer-injected cookies that pin users to a backend.
RAID combines multiple disks for performance and/or redundancy. RAID 0 stripes for speed, RAID 1 mirrors for safety, RAID 5/6 balance economy and fault tolerance.
Accept dynamic input but serve pre-rendered static HTML files. Web servers are extremely fast at serving static content, avoiding per-request computation.
TCP guarantees ordered, reliable delivery via handshakes and retransmission. UDP is connectionless and faster but may lose or reorder packets.
Caching assembled objects instead of raw query results is easier to invalidate and enables async pre-assembly by worker servers.
For early-career engineers starting to design systems
Use cache locking/leasing so only one request fetches from the DB on a miss — others wait for the repopulated cache.
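A per-key lock is enough to sketch the idea: on a miss, one request takes the lock and loads from the DB while the others block, then read the repopulated cache. This is an illustrative in-process version; real systems use a lease in the cache itself (e.g. Memcached leases).

```python
import threading, time

cache, locks = {}, {}
meta_lock = threading.Lock()
db_hits = 0

def get(key, load):
    global db_hits
    if key in cache:
        return cache[key]
    with meta_lock:
        lock = locks.setdefault(key, threading.Lock())
    with lock:                    # only one request loads; others wait here
        if key not in cache:      # re-check after acquiring the lock
            db_hits += 1
            cache[key] = load(key)
    return cache[key]

def slow_load(key):               # stand-in for an expensive DB query
    time.sleep(0.05)
    return key.upper()

threads = [threading.Thread(target=get, args=("k", slow_load)) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
assert db_hits == 1   # the other nine waited for the repopulated cache
```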
Route low-priority and high-priority workloads to separate instances so one can't degrade the other.
At scale, multi-table joins become an OLTP anti-pattern. Break them apart and move join logic to the application layer.
Migrate write-heavy, shardable workloads to horizontally scalable systems to protect the primary.
Heartbeats between an active and passive server detect failures. The passive takes over the active's IP when a heartbeat is missed.
Both servers actively handle traffic, spreading load between them. If one fails, the other absorbs all traffic.
Push CDNs receive content when you upload it (good for low-traffic, rarely changing content). Pull CDNs fetch content on first request (good for high-traffic sites).
Layer 4 routes based on IP/port (fast, simple). Layer 7 routes based on request content like URL, headers, and cookies (flexible, smarter).
A reverse proxy sits in front of backend servers, providing a unified interface while adding security, caching, compression, and SSL termination.
The master handles all writes and replicates them to one or more slaves that serve read-only traffic.
Split databases by function (e.g., users, products, forums) to reduce per-database traffic and improve cache locality.
Store redundant copies of data to avoid expensive joins, trading write complexity for read performance.
Benchmark, profile, then optimize: tighten schemas, add proper indices, avoid expensive joins, and partition hot tables.
Document stores center around JSON/XML documents, providing flexible schemas and APIs to query document internals. Best for semi-structured, occasionally changing data.
Choose SQL for structured data, complex joins, and transactions. Choose NoSQL for flexible schemas, massive scale, and high-throughput workloads.
The application writes to the cache, and the cache synchronously writes to the database. Data is never stale, but writes are slower.
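With dicts standing in for the cache and the database, the write-through path looks like this:

```python
cache, db = {}, {}   # stand-ins for the cache layer and the database

def write_through(key, value):
    """Write-through sketch: the DB write happens synchronously,
    so the cache and DB can never disagree, at the cost of write latency."""
    cache[key] = value
    db[key] = value      # synchronous: the call doesn't return until this lands

write_through("user:1", "Ada")
assert cache["user:1"] == db["user:1"] == "Ada"
```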
The application writes to the cache, which asynchronously flushes to the database later. Fast writes, but risk of data loss if the cache crashes.
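A write-back sketch using a queue of dirty entries and a background flusher thread; the `dirty.join()` at the end is only there so the demo can observe the flush, since real callers never wait:

```python
import queue, threading

cache, db = {}, {}
dirty = queue.Queue()    # entries written to cache but not yet to the DB

def write_back(key, value):
    cache[key] = value           # fast path: only the cache is touched
    dirty.put((key, value))      # DB flush is deferred

def flusher():
    while True:
        key, value = dirty.get()
        db[key] = value          # async flush; lost if the cache host dies first
        dirty.task_done()

threading.Thread(target=flusher, daemon=True).start()
write_back("user:1", "Ada")
dirty.join()                     # demo only: wait for the background flush
assert db["user:1"] == "Ada"
```

Batching and coalescing repeated writes to the same key is where write-back earns its throughput win in practice.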
DNS round robin is the simplest load-distribution scheme — the DNS server cycles through IPs — but caching and lack of health awareness make it unreliable.
Split users across servers by a simple attribute (name range, school, geography) for a quick horizontal scaling win before investing in full sharding.
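Routing by a simple attribute needs nothing more than a lookup function; a sketch splitting users by first letter across three hypothetical servers:

```python
def server_for(username):
    """Federation-style routing sketch: map users to one of three
    hypothetical database servers by the first letter of their name."""
    c = username[0].lower()
    if "a" <= c <= "h":
        return "db1"
    if "i" <= c <= "p":
        return "db2"
    return "db3"

assert server_for("Alice") == "db1"
assert server_for("Mallory") == "db2"
assert server_for("Zoe") == "db3"
```

The catch is uneven distribution (far more names start with "a" than "z"), which is why full sharding usually hashes the key instead.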
Memcached is a dedicated in-memory key-value daemon that multiple web servers share. LRU eviction automatically discards cold entries when memory is full.
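LRU eviction itself is simple to sketch; an in-process illustration of what Memcached does when memory fills, using an ordered dict as the recency list:

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction sketch: when full, discard the least recently used entry."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()   # insertion order doubles as recency order

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)   # mark as most recently used
            return self.data[key]
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the coldest entry

c = LRUCache(capacity=2)
c.put("a", 1)
c.put("b", 2)
c.get("a")        # "a" is now the most recently used
c.put("c", 3)     # evicts "b", the least recently used
assert c.get("b") is None and c.get("a") == 1
```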
Restrict traffic between architecture tiers with port-level firewalls: only HTTP in from the internet, only MySQL between web and DB servers.
Enable MySQL's query cache to instantly return results for repeated identical queries without re-executing them (note: the query cache was removed in MySQL 8.0, so this applies to 5.7 and earlier).
Separating the web layer from the application (platform) layer lets you scale and configure each independently.
A suite of independently deployable, small, modular services. Each runs a unique process and communicates via lightweight protocols.
Systems like Consul, Etcd, and Zookeeper help services find each other by tracking registered names, addresses, and ports.
Message queues decouple producers from consumers: a publisher posts a job, a worker picks it up and processes it in the background.
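The producer/consumer decoupling fits in a few lines with an in-process queue standing in for a real broker like RabbitMQ or SQS:

```python
import queue, threading

jobs = queue.Queue()
results = []

def worker():
    """Consumer: pull jobs off the queue and process them in the background."""
    while True:
        job = jobs.get()
        results.append(job.upper())   # stand-in for real processing
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

jobs.put("send-email")   # producer enqueues and returns immediately
jobs.join()              # demo only: wait so we can observe the result
assert results == ["SEND-EMAIL"]
```

The key property is that the producer's latency is the enqueue time, not the processing time.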
Task queues receive tasks with their data, execute them, and return results. They support scheduling and are ideal for compute-intensive background work.
RPC exposes behaviors (actions). REST exposes resources (data). RPC is common for internal services; REST is preferred for public APIs.
RESTful APIs identify resources by URI, modify them with HTTP verbs, signal errors with status codes, and can embed links to related resources so clients navigate the API via hypermedia (HATEOAS).
For mid-to-senior engineers operating at scale
PostgreSQL's MVCC writes a full new row version on every update, causing write amplification, dead-tuple bloat, and vacuum pressure.
Apply rate limiting at every layer — application, connection pooler, proxy, and query — for defense in depth.
Only allow lightweight schema changes in production. Anything that rewrites the table is too dangerous at scale.
When the primary can't stream WAL to all replicas, use intermediate replicas to relay WAL downstream.
The classic failure loop is: load spike -> latency rise -> timeouts -> retries -> amplified load. Break it at every link.
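The standard way to break the retry link in that loop is exponential backoff with jitter, so retries spread out instead of arriving as a synchronized second spike; a sketch with illustrative `base` and `cap` values:

```python
import random

def backoff_delays(attempts, base=0.1, cap=10.0):
    """Exponential backoff with full jitter: attempt i waits a random
    time in [0, min(cap, base * 2**i)], desynchronizing clients."""
    return [random.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]

print(backoff_delays(5))   # delays whose upper bound doubles each attempt
```

Capping the delay (`cap`) and capping the number of attempts are both essential; unbounded retries are exactly the amplification the failure loop describes.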
Both masters serve reads and writes, coordinating with each other. If either goes down, the other continues operating.
Distribute data across different databases so each manages only a subset. Reduces traffic, replication, and index size per shard.
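The simplest routing scheme is hashing the key modulo the shard count; a sketch:

```python
import hashlib

def shard_for(key, num_shards=4):
    """Hash-based sharding sketch: each key maps deterministically to one
    shard, so every database manages only a subset of the data."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

assert shard_for("user:42") == shard_for("user:42")   # stable routing
print({k: shard_for(k) for k in ("user:1", "user:2", "user:3")})
```

The weakness of the modulo scheme is that changing `num_shards` remaps almost every key, which is why systems that reshard often use consistent hashing instead.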
Wide column stores (Bigtable, HBase, Cassandra) use column families with row keys. Built for very large datasets with high availability and scalability.
Graph databases represent data as nodes and relationships (edges). Optimized for complex many-to-many relationships like social networks.
The cache automatically refreshes recently accessed entries before their TTL expires, reducing read latency if predictions are accurate.
A single data center is a single point of failure. Distribute across availability zones with independent power and networking, using global DNS to route users.
When queues grow beyond a threshold, reject new work with HTTP 503 and let clients retry with exponential backoff. This preserves throughput for jobs already in the queue.
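Load shedding at the queue boundary is just a bounded queue plus an early rejection; a sketch with a deliberately tiny threshold:

```python
import queue

jobs = queue.Queue(maxsize=2)   # threshold beyond which new work is shed

def submit(job):
    """Accept a job (202) or shed it (503) when the queue is full;
    rejected clients should retry with exponential backoff."""
    try:
        jobs.put_nowait(job)
        return 202
    except queue.Full:
        return 503

print([submit(j) for j in ("a", "b", "c")])   # -> [202, 202, 503]
```

Rejecting at admission time is cheap; accepting work you will time out on anyway wastes the capacity the queued jobs need.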