System Design Interview Guide

A structured framework for approaching any system design interview. Follow the four steps, ask the right questions, and avoid common pitfalls.

The 4-Step Framework

Use this structure for every system design interview. Each step builds on the previous one.

Step 1

Requirements & Constraints

Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss assumptions. Never start designing before you understand what you're building.

Tips

Spend 5-10 minutes here — it sets the direction for the entire interview.
Explicitly list what is in scope and what is out of scope.
State your assumptions out loud so the interviewer can course-correct early.
Ask whether you should run back-of-the-envelope calculations.
Identify the read-to-write ratio: it tells you whether to optimize the read path (caches, replicas) or the write path (queues, partitioning).
Quantify scale: number of users, requests per second, storage growth over 3-5 years.
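The scale questions above can be turned into rough numbers with a few lines of arithmetic. The following is a minimal sketch, not a prescribed method: the `Workload` fields, the peak multiplier of 2, and the example inputs are all illustrative assumptions.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    """Hypothetical inputs you would gather in Step 1 (all values assumed)."""
    daily_active_users: int
    writes_per_user_per_day: float
    read_to_write_ratio: float   # reads per write
    bytes_per_write: int
    peak_factor: float = 2.0     # assumed peak-to-average multiplier


def estimate(w: Workload, years: int = 5) -> dict:
    """Return average/peak request rates and raw storage growth."""
    writes_per_day = w.daily_active_users * w.writes_per_user_per_day
    avg_write_rps = writes_per_day / SECONDS_PER_DAY
    avg_read_rps = avg_write_rps * w.read_to_write_ratio
    # Raw payload only: ignores replication, indexes, and compression.
    storage_bytes = writes_per_day * w.bytes_per_write * 365 * years
    return {
        "avg_write_rps": avg_write_rps,
        "avg_read_rps": avg_read_rps,
        "peak_read_rps": avg_read_rps * w.peak_factor,
        "storage_tb": storage_bytes / 1e12,
    }


example = estimate(Workload(
    daily_active_users=10_000_000,
    writes_per_user_per_day=2,
    read_to_write_ratio=100,
    bytes_per_write=1_000,
))
print(example)
```

With these assumed inputs the sketch gives roughly 231 writes/s, about 23,000 reads/s on average, and 36.5 TB of raw data over five years. In an interview you would round these aggressively and say so.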

Key Questions to Ask

Who are the users and how will they use the system?
What are the core features we need to support?
How many users / requests per second should we plan for?
What is the expected read-to-write ratio?
Are there latency requirements (e.g., p99 < 200 ms)?
Do we need strong consistency or is eventual consistency acceptable?
What data do we need to store, and for how long?
Are there geographic or compliance constraints?

Step 2

High-Level Design

Outline a high-level design with all important components. Draw a block diagram showing the major services, data stores, and how they communicate. Keep it simple — details come in Step 3.

Tips

Start with clients, then work inward: load balancer → web/API servers → services → data stores.
Identify the main data flow for each use case (write path and read path).
Use well-known building blocks: load balancers, CDNs, caches, message queues, databases.
Don't over-complicate — a clear 5-box diagram is better than a sprawling 20-box one.
Label arrows with the protocol or action (e.g., REST, gRPC, pub/sub).
Call out which components are stateless vs. stateful — it affects how you scale later.

Step 3

Core Component Deep-Dive

Dive into the details of each core component. Design the API, data model, and algorithms. Walk through the flow for each use case step-by-step.

Tips

Define the API contracts first (REST endpoints or RPC methods, request/response shapes).
Design the database schema — list tables, key columns, indexes, and primary/foreign keys.
Walk through the happy path end-to-end for each use case.
Discuss algorithmic choices (e.g., hashing for URL shorteners, fan-out for feeds).
Ask the interviewer how much code they expect — some want pseudocode, others want discussion only.
Consider edge cases: what happens on duplicates, failures, or timeouts?
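As an illustration of the algorithmic and edge-case tips above, here is one possible sketch of hash-based short codes for a URL shortener. The class name `UrlShortener`, the 7-character code length, and the in-memory dictionary (standing in for a table with a unique index on the code) are assumptions made for the example. Counter-based ID generation is an equally valid alternative.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_letters  # 62 symbols
CODE_LENGTH = 7  # assumed length; 62**7 is about 3.5 trillion codes


def base62(n: int, length: int) -> str:
    """Encode the low `length` base-62 digits of n as a fixed-width string."""
    chars = []
    for _ in range(length):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def short_code(url: str, attempt: int = 0) -> str:
    """Derive a code from a hash of the URL; `attempt` salts collision retries."""
    digest = hashlib.sha256(f"{url}#{attempt}".encode("utf-8")).digest()
    return base62(int.from_bytes(digest[:8], "big"), CODE_LENGTH)


class UrlShortener:
    def __init__(self):
        self._by_code = {}  # code -> original URL

    def shorten(self, url: str) -> str:
        attempt = 0
        while True:
            code = short_code(url, attempt)
            existing = self._by_code.get(code)
            if existing is None:
                self._by_code[code] = url
                return code
            if existing == url:
                return code   # duplicate request: return the same code
            attempt += 1      # collision with a different URL: re-salt

    def resolve(self, code: str):
        return self._by_code.get(code)  # None when the code is unknown
```

Because the code is derived from the URL, shortening the same URL twice returns the same code, which covers the duplicate case without a separate lookup table.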

Step 4

Scale the Design

Identify and address bottlenecks given the constraints. Do not jump straight to the final design — state that you would iteratively benchmark, profile for bottlenecks, address them, and repeat.

Tips

Walk through the iterative loop: Benchmark/Load Test → Profile bottlenecks → Address them → Repeat.
Start with vertical scaling, then explain when and why you'd move to horizontal scaling.
Add caching for read-heavy workloads — discuss cache invalidation strategies.
Add a load balancer when a single server can't handle the traffic.
Consider database scaling: read replicas, sharding, federation, denormalization.
Discuss trade-offs explicitly: every decision has a cost in complexity, consistency, latency, or money.
Mention CDNs for static content, message queues for async processing, autoscaling for traffic spikes.
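The caching tip above can be sketched as a cache-aside wrapper with a TTL and invalidate-on-write. This is a simplified, single-process illustration: the `CacheAside` class and its `load_from_db` / `write_to_db` callbacks are hypothetical names, and a real deployment would typically use an external cache rather than a dictionary.

```python
import time


class CacheAside:
    """Cache-aside: read through the cache, invalidate the entry on write."""

    def __init__(self, load_from_db, write_to_db, ttl_seconds=60.0,
                 clock=time.monotonic):
        self._load = load_from_db
        self._write = write_to_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            self.hits += 1
            return entry[0]
        # Miss or expired entry: load from the source of truth and repopulate.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def put(self, key, value):
        self._write(key, value)
        self._entries.pop(key, None)  # invalidate so the next read is fresh


db = {"user:1": "Ada"}  # stands in for the database
cache = CacheAside(db.get, db.__setitem__)

first = cache.get("user:1")    # miss: loaded from db
second = cache.get("user:1")   # hit: served from cache
cache.put("user:1", "Grace")   # write to db, then invalidate
third = cache.get("user:1")    # miss again, sees the new value
```

Invalidating on write keeps the cache from serving stale data after an update, at the price of an extra miss. The TTL bounds staleness for writes that bypass this wrapper.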

Clarifying Questions to Ask

Organized by category. Use these to scope the problem before designing.

Data & Storage

How much data will we store, and what is the growth rate?
What is the average size of each record/object?
Do we need to support search or complex queries?
What is the data retention policy — do we keep data forever or expire it?
Is the data relational, document-oriented, or graph-like?

Edge Cases & Constraints

What happens when a dependent service is down?
How do we handle duplicate requests or idempotency?
Are there rate-limiting or abuse-prevention requirements?
Do we need to support backward compatibility or migrations?
What is the budget — are we optimizing for cost, speed, or reliability?
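For the duplicate-request question above, one common approach is a client-supplied idempotency key. The sketch below is illustrative only: `PaymentService`, `charge`, and the in-memory response store are hypothetical, and a production version would need a durable store and protection against concurrent requests with the same key.

```python
import uuid


class PaymentService:
    """Replays the stored response when the same idempotency key is retried."""

    def __init__(self):
        self._responses = {}   # idempotency key -> first response
        self.charges_made = 0  # counts real side effects, for demonstration

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._responses:
            # Retry of a request already processed: no second side effect.
            return self._responses[idempotency_key]
        self.charges_made += 1
        response = {"charge_id": str(uuid.uuid4()), "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response


service = PaymentService()
key = str(uuid.uuid4())              # generated once by the client
first = service.charge(key, 1999)
retry = service.charge(key, 1999)    # e.g., the client timed out and retried
```

Here `retry` is the same response as `first`, and only one charge is made.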

Functional Requirements

What are the core use cases we must support?
Who are the primary users (end users, internal services, third-party integrations)?
Are there different user roles or permission levels?
What does the write path look like vs. the read path?
Do users need real-time updates (WebSockets, SSE) or is polling acceptable?

Non-Functional Requirements

What are the availability requirements (e.g., 99.9% uptime)?
What latency targets should we aim for (p50, p99)?
Is strong consistency required or is eventual consistency acceptable?
Are there durability requirements (e.g., zero data loss)?
What are the security and compliance requirements (encryption, GDPR, SOC 2)?
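An availability target is easier to discuss once it is converted into a downtime budget. The helper names below are made up for this sketch, and the serial formula assumes component failures are independent.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760


def allowed_downtime_hours(availability_percent: float) -> float:
    """Yearly downtime budget implied by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def serial_availability(*components_percent: float) -> float:
    """Availability of a request path that needs every component to be up.

    Assumes independent failures, which is a simplification.
    """
    total = 1.0
    for availability in components_percent:
        total *= availability / 100
    return total * 100


print(round(allowed_downtime_hours(99.9), 2))      # 8.76 hours per year
print(round(allowed_downtime_hours(99.99), 3))     # 0.876 hours per year
print(round(serial_availability(99.9, 99.9), 4))   # 99.8001
```

Two 99.9% components in series already fall below 99.9% overall, which is one reason the availability question belongs in the requirements phase.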

Scale & Traffic

How many daily active users should we plan for?
What is the expected requests-per-second (reads and writes)?
Is traffic evenly distributed or are there spikes (e.g., time-of-day, viral events)?
What is the read-to-write ratio?
Do we need to support multiple geographic regions?

Back-of-the-Envelope Cheat Sheet

Handy reference numbers for quick capacity estimation during interviews.

Seconds per day: ~86,400 (~10^5)
Seconds per month: ~2.5 million (~2.5 x 10^6)
1 req/s: ~2.5M requests/month
40 req/s: ~100M requests/month
400 req/s: ~1B requests/month
4,000 req/s: ~10B requests/month

1 KB x 1M: 1 GB
1 KB x 1B: 1 TB
1 MB x 1M: 1 TB
1 MB x 1B: 1 PB

L1 cache reference: ~1 ns
L2 cache reference: ~4 ns
Main memory reference: ~100 ns
Read 1 MB from memory: ~250 μs
Read 1 MB from SSD: ~1 ms
Read 1 MB from disk: ~20 ms
Round-trip within datacenter: ~0.5 ms
Round-trip CA → Netherlands: ~150 ms
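The reference numbers above compose by simple multiplication. As a rough sketch, the snippet below reuses the cheat-sheet figures to estimate sequential read time and monthly request volume. The function names are illustrative, and the results are order-of-magnitude estimates only.

```python
# Seconds to read 1 MB sequentially, taken from the cheat sheet above.
READ_1MB_SECONDS = {"memory": 250e-6, "ssd": 1e-3, "disk": 20e-3}
SECONDS_PER_MONTH = 2.5e6  # rounded, as in the cheat sheet


def sequential_read_seconds(megabytes: float, medium: str) -> float:
    """Rough time to read `megabytes` sequentially from the given medium."""
    return megabytes * READ_1MB_SECONDS[medium]


def requests_per_month(requests_per_second: float) -> float:
    """Monthly request volume at a steady request rate."""
    return requests_per_second * SECONDS_PER_MONTH


# Reading 1 GB (1,000 MB): about 0.25 s from memory, 1 s from SSD, 20 s from disk.
for medium in ("memory", "ssd", "disk"):
    print(medium, sequential_read_seconds(1_000, medium))

print(requests_per_month(400))  # about 1 billion requests per month
```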

Common Anti-Patterns

Mistakes to avoid during your system design interview.

Forgetting about failure modes

Discuss what happens when things go wrong: server crashes, network partitions, cache misses, thundering herds. Show you think about reliability.
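One mitigation worth being able to sketch for thundering herds is retrying with exponential backoff and jitter, so that clients do not all retry at the same instant. The guide does not prescribe this technique; it is offered here as an example, and the function name, defaults, and the choice of `ConnectionError` as the retryable failure are assumptions.

```python
import random
import time


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep, rng=random.random):
    """Retry `operation` on ConnectionError with capped exponential backoff.

    Uses "full jitter": each wait is a random duration between 0 and the
    current backoff cap, which spreads retries from many clients over time.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)
```

The `sleep` and `rng` parameters exist only so the behavior can be tested without real waiting. In a discussion you would also mention limiting total retries so that a struggling dependency is not overloaded further.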

Hand-waving the data model

The schema and access patterns drive your entire architecture. Define tables, indexes, and query patterns early.

Ignoring trade-offs

Every design decision has costs. Always state the trade-off explicitly: consistency vs. availability, latency vs. throughput, simplicity vs. scalability.

Jumping straight to the final design

Always start simple and iterate. Show the interviewer your thought process by evolving the design from a single-box architecture to a distributed system.

Monologuing without checking in

System design interviews are collaborative. Pause regularly to ask the interviewer if they'd like you to go deeper or move on.

Not asking clarifying questions

Designing without understanding requirements leads to solving the wrong problem. Spend the first 5-10 minutes scoping.

Over-engineering from the start

Don't add sharding, microservices, or Kafka on day one. Start with the simplest thing that works and scale incrementally.

Topics to Know

Core system design topics you should be comfortable discussing.

DNS & CDN
Load Balancing (L4 vs. L7, algorithms)
Reverse Proxy
Horizontal vs. Vertical Scaling
Caching (cache-aside, write-through, write-back, eviction policies)
Database types (RDBMS, key-value, document, wide-column, graph)
SQL vs. NoSQL trade-offs
Replication (leader-follower, leader-leader)
Sharding / Partitioning (hash-based, range-based)
Consistency patterns (strong, eventual, causal)
Availability patterns (failover, replication)
Message queues & async processing
REST vs. gRPC vs. GraphQL
Rate limiting & throttling
Consistent hashing
CAP theorem & PACELC
Blob / object storage
Search indexes (inverted index, Elasticsearch)
Monitoring, logging & alerting
Security (TLS, auth, encryption at rest)
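Of the topics above, consistent hashing is one that is easy to illustrate in a few lines. The following is a minimal sketch of a hash ring with virtual nodes. The `HashRing` class, the choice of SHA-256, and the default of 100 virtual nodes per server are assumptions for the example, not a reference implementation.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring: a key belongs to the next node position clockwise."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            pos = self._hash(f"{node}#{i}")
            if pos in self._owners:
                continue  # extremely unlikely position clash: keep existing owner
            self._owners[pos] = node
            bisect.insort(self._positions, pos)

    def remove(self, node: str) -> None:
        for i in range(self._vnodes):
            pos = self._hash(f"{node}#{i}")
            if self._owners.get(pos) == node:
                del self._owners[pos]
                self._positions.remove(pos)

    def node_for(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring is empty")
        # First position strictly after the key's hash, wrapping around the ring.
        idx = bisect.bisect_right(self._positions, self._hash(key)) % len(self._positions)
        return self._owners[self._positions[idx]]
```

The property to call out in an interview is that adding a node moves only the keys that the new node takes over. With plain `hash(key) % N`, changing N remaps most keys.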