System Design Fundamental Concepts
A beginner-friendly guide to essential system design concepts, with detailed explanations and real-world examples.
Introduction
System design interviews assess your ability to create large-scale software systems. Think of it like architecting a city - you need to understand how all the pieces work together, from individual buildings (services) to the roads connecting them (networks) and the utilities supporting them (infrastructure). This guide breaks down these concepts in simple terms.
Scaling Strategies
When your system grows, you need ways to handle more users and data. Think of it like managing a restaurant that's becoming more popular.
Vertical vs. Horizontal Scaling
Vertical Scaling (Scale Up)
- What it is: Adding more power to your existing machines (like upgrading your computer with more RAM or a better CPU)
- When to use: When you have predictable, steady growth and want to keep things simple
- Advantages:
- Simpler to manage (it's just one machine)
- No need to modify your application code
- Often sufficient for small to medium applications
- Limitations:
- There's a ceiling to how much you can upgrade
- Can be expensive for high-end hardware
- Single point of failure
- Real example: Upgrading a database server from 16GB RAM to 32GB RAM to handle more queries
Horizontal Scaling (Scale Out)
- What it is: Adding more machines to share the work (like adding more cashiers in a busy store)
- When to use: When you have unpredictable growth or need high availability
- How it works:
- Distribute incoming requests across multiple servers
- Split data across multiple databases
- Use load balancers to direct traffic
- Challenges:
- Need to handle data consistency across machines
- More complex to manage and monitor
- May require application code changes
- Real example: Adding more web servers to handle Black Friday traffic on an e-commerce site
Distribution Strategies
When you have multiple servers, you need smart ways to share the work between them.
Work Distribution
Load Balancing
- What it is: A system that distributes incoming requests across multiple servers
- How it works: Like a traffic cop directing cars to different lanes
- Common strategies explained:
- Round-robin: Takes turns sending requests to each server (like dealing cards)
- Least connections: Sends to the server handling the fewest requests (like picking the shortest checkout line)
- Utilization-based: Checks which server has the most available resources
- Real-world example: When you visit Netflix, a load balancer decides which server will stream your video
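The round-robin and least-connections strategies above can be sketched in a few lines of Python (server names and connection counts are made up for illustration):

```python
import itertools

class LoadBalancer:
    """Toy balancer over a fixed pool of servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._rr = itertools.cycle(self.servers)    # round-robin rotation
        self.active = {s: 0 for s in self.servers}  # in-flight requests per server

    def pick_round_robin(self):
        # Take turns, like dealing cards: server 1, 2, 3, then back to 1.
        return next(self._rr)

    def pick_least_connections(self):
        # Pick the shortest checkout line: fewest in-flight requests wins.
        return min(self.servers, key=lambda s: self.active[s])

lb = LoadBalancer(["web-1", "web-2", "web-3"])
picks = [lb.pick_round_robin() for _ in range(4)]  # web-1, web-2, web-3, web-1
```

Least connections pays off when requests vary widely in cost; round-robin is fine when they are roughly uniform.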
Queue-based Distribution
- What it is: A system where work items wait in line to be processed
- How it works: Like a restaurant kitchen with orders coming in and chefs processing them
- Benefits:
- Handles traffic spikes (orders can wait in line)
- Ensures no work is lost
- Allows for prioritization
- Example: Processing uploaded videos on YouTube - they wait in a queue to be converted to different formats
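A minimal producer/consumer sketch of the idea, with an in-process queue standing in for a real message broker (file names and the "transcoding" step are illustrative):

```python
import queue
import threading

jobs = queue.Queue()   # uploaded videos wait in line here until a worker is free
done = []

def transcode_worker():
    # A "chef" pulling orders off the line; real work would be CPU-heavy.
    while True:
        video = jobs.get()
        if video is None:          # sentinel: shut the worker down
            break
        done.append(f"transcoded {video}")

worker = threading.Thread(target=transcode_worker)
worker.start()
for video in ["a.mp4", "b.mp4", "c.mp4"]:
    jobs.put(video)                # producers enqueue and return immediately
jobs.put(None)
worker.join()
```

Because producers only enqueue, a traffic spike lengthens the line instead of overloading the workers.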
Data Distribution
Partitioning Strategies
Geographic partitioning:
- What: Storing data close to users in different regions
- Example: Netflix storing popular shows in servers near users
- Benefits: Faster access, compliance with local laws
Hash-based partitioning:
- What: Using a formula to decide where data goes
- How: Like assigning students to classrooms based on their last names
- Example: Instagram storing user posts across multiple servers based on user ID
Range-based partitioning:
- What: Grouping similar data together
- Example: Storing January orders on one server, February on another
- Best for: Time-series data or alphabetical grouping
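The hash- and range-based strategies reduce to simple routing functions; a sketch where the shard count and the quarterly grouping are arbitrary choices for illustration:

```python
NUM_SHARDS = 4

def hash_shard(user_id: int) -> int:
    # Hash-based: a formula (modulo here) spreads keys evenly across shards.
    return user_id % NUM_SHARDS

def range_shard(month: int) -> int:
    # Range-based: contiguous values stay together (Jan-Mar on shard 0, etc.).
    return (month - 1) // 3
```

Note that plain modulo reshuffles most keys whenever the shard count changes, which is why production systems usually reach for consistent hashing instead.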
Key Considerations
Cross-node communication:
- Challenge: When servers need to talk to each other
- Why it matters: More communication = slower system
- Solution: Keep related data together when possible
Scatter-gather pattern:
- What: When you need to get data from many servers and combine it
- Why it's challenging: Slow and prone to failures
- Example: Searching for a product across multiple warehouse databases
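A scatter-gather sketch, with a hypothetical in-memory dict standing in for the per-warehouse databases:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-warehouse stock; each lookup stands in for a remote query.
WAREHOUSES = {
    "east":  {"widget": 3},
    "west":  {"widget": 7, "gadget": 2},
    "north": {},
}

def query_warehouse(name: str, product: str) -> int:
    return WAREHOUSES[name].get(product, 0)

def total_stock(product: str) -> int:
    # Scatter: ask every node in parallel. Gather: combine the partial answers.
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(lambda w: query_warehouse(w, product), WAREHOUSES))
```

The slowest shard sets the overall latency, and one failed shard can sink the whole query, which is why real systems add timeouts and partial-result fallbacks.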
Consistency Models
How to handle data updates when you have multiple copies of the same information.
Types of Consistency
Strong Consistency
- What it means: All users see the same data at the same time
- How it works: Like a bank account - your balance must be accurate everywhere
- Trade-offs:
- More reliable but slower
- Requires more coordination between servers
- When to use:
- Financial transactions
- Inventory systems
- User authentication
Eventual Consistency
- What it means: Updates take time to reach all servers
- How it works: Like social media - your friend might see your post a few seconds after you post it
- Benefits:
- Faster response times
- Stays available even when servers temporarily can't reach each other
- Best for:
- Social media feeds
- Content management
- Non-critical updates
Mixed Consistency Approach
- What it means: Using different consistency levels for different features
- Real-world example: E-commerce site
- Product reviews: Can be eventually consistent (slight delay is okay)
- Shopping cart: Needs strong consistency (must be accurate)
- Order status: Can be eventually consistent
- Payment processing: Must be strongly consistent
Locking Mechanisms
How to manage shared resources when multiple users or processes need access.
Key Considerations
Lock Granularity
- What it means: How much data or resources you lock at once
- Examples:
- Fine-grained: Locking a single row in a database
- Coarse-grained: Locking an entire table
- Trade-offs:
- Fine-grained: Better concurrency but more overhead
- Coarse-grained: Simpler but limits concurrent access
- Real example: Like locking a single hotel room vs. the entire floor
Lock Duration
- What it means: How long you keep the resource locked
- Best practices:
- Lock only what you need
- Release as quickly as possible
- Plan your operations before locking
- Example: Like holding a bathroom key - get in, do your business, get out
Lock Alternatives
- Optimistic concurrency:
- What: Assume conflicts are rare
- How: Check for conflicts only when saving
- Example: Like Google Docs - multiple people can edit, conflicts resolved at save
- Compare-and-swap:
- What: Check if data changed before updating
- How: Like making sure no one changed the price while you were buying
- Lock-free algorithms:
- What: Special algorithms that don't need locks
- When: High-performance requirements
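Optimistic concurrency and compare-and-swap can be sketched with a version-stamped record (a toy in-memory model, not a real database API):

```python
class OptimisticRecord:
    """Toy versioned record: a write succeeds only if nothing changed since it was read."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def compare_and_swap(self, expected_version: int, new_value) -> bool:
        # No lock is held while "editing"; the version check happens only at save time.
        if self.version != expected_version:
            return False           # conflict: caller must re-read and retry
        self.value = new_value
        self.version += 1
        return True
```

The losing writer retries against the new version, which is cheap exactly when conflicts are rare.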
Indexing Strategies
How to organize data so you can find it quickly, like the index at the back of a book.
Basic Indexing
Hash Map-based
- What: Direct lookup by key
- How it works: Like finding a book by its ISBN
- Performance: Constant-time access (O(1)) on average
- Best for: Exact matches
Sorted List
- What: Ordered data for range queries
- How it works: Like a phone book sorted by name
- Performance: Fast for ranges (O(log n))
- Best for: Range queries, sorting
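A sorted-list range query sketch using Python's bisect module (the timestamps are arbitrary sample data):

```python
import bisect

# A sorted index over event timestamps (arbitrary sample values).
timestamps = [3, 7, 12, 15, 21, 30]

def range_query(lo: int, hi: int) -> list:
    # Binary search finds both slice boundaries in O(log n), no full scan needed.
    start = bisect.bisect_left(timestamps, lo)
    end = bisect.bisect_right(timestamps, hi)
    return timestamps[start:end]
```

This is the core trick behind B-tree indexes: keep data ordered so range endpoints can be located quickly.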
Database Native
- What: Built-in database features
- Benefits:
- Optimized for the database
- Automatically maintained
- Support for complex queries
Specialized Indexes
Geospatial Indexes
- What: Optimize location-based searches
- Use cases:
- Finding nearby restaurants
- Delivery radius calculations
- Location-based services
- Example: How Uber finds nearby drivers
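The underlying distance check can be sketched as a brute-force scan with hypothetical driver coordinates; a real geospatial index prunes candidates instead of visiting every point:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometers between two (lat, lon) points.
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def drivers_within(drivers, lat, lon, radius_km):
    # Brute-force scan; a real index (geohash, quadtree, R-tree) narrows the
    # candidate set first so most points are never examined.
    return [name for name, (dlat, dlon) in drivers.items()
            if haversine_km(lat, lon, dlat, dlon) <= radius_km]
```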
Vector Indexes
- What: Finding similar items in large datasets
- Use cases:
- Image similarity search
- Recommendation systems
- Natural language processing
- Example: How Spotify finds similar songs
Full-text Indexes
- What: Searching through text content
- Features:
- Word stemming (running → run)
- Relevance ranking
- Partial matching
- Example: How Google searches web pages
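A toy inverted index shows the core idea behind full-text search; real engines layer stemming, relevance ranking, and partial matching on top:

```python
import re
from collections import defaultdict

def build_index(docs):
    # Inverted index: each word maps to the set of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    # Return documents containing every query term (simple AND semantics).
    term_sets = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()
```

Searching becomes a few set lookups and an intersection rather than a scan of every document.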
Communication Protocols
How different parts of your system talk to each other.
External Communication
HTTP(S)
- What: Standard web protocol
- How it works: Request-response, like sending a letter and getting a reply
- When to use:
- Regular web applications
- Mobile app APIs
- Most client-server communication
- Benefits:
- Well understood
- Works everywhere
- Easy to debug
Long Polling
- What: Client asks for updates, and the server holds the request open until there is something to report
- How it works: Like waiting at the mailbox until a letter arrives, then going right back to wait for the next one
- When to use:
- Simple real-time updates
- Legacy system support
- Trade-offs:
- Simpler than WebSockets
- More server resources needed
Server-Sent Events (SSE)
- What: Server pushing updates to clients
- How it works: Like a news ticker - continuous updates
- Best for:
- Real-time dashboards
- Social media feeds
- Status updates
- Example: Live sports scores
WebSockets
- What: Two-way continuous communication
- How it works: Like a phone call - both sides can talk
- Best for:
- Chat applications
- Online gaming
- Real-time collaboration
- Example: How WhatsApp Web works
Internal Communication
HTTP(S)
- When: Service-to-service communication
- Benefits: Simple, standard, well-supported
gRPC
- What: Modern, high-performance protocol
- Benefits:
- Lower overhead than JSON over HTTP (binary Protocol Buffers over HTTP/2)
- Strong typing
- Better for microservices
- Example: How Google services communicate
Message Brokers
- What: Middleware for async communication
- How it works: Like a postal service for your system
- Benefits:
- Decoupled services
- Better reliability
- Handle traffic spikes
Security Considerations
Protecting your system and user data.
Authentication & Authorization
API Gateway
- What: Single entry point for all requests
- Benefits:
- Centralized security
- Rate limiting
- Request validation
- Example: Like a security desk in a building
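Rate limiting at the gateway is commonly a token bucket; a minimal sketch where the capacity and refill rate are illustrative:

```python
import time

class TokenBucket:
    """Toy per-client limiter: each request spends a token; tokens refill steadily."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Top up for elapsed time, never beyond the bucket's capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False               # over the limit: the gateway answers 429
```

The capacity absorbs short bursts while the refill rate caps sustained traffic.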
Third-party Solutions
- What: Using services like Auth0
- Benefits:
- Professional security
- Less maintenance
- More features
- Example: "Login with Google"
Encryption
Data in Transit
- What: Protecting data while it moves
- How: Like sending a sealed envelope
- Methods:
- HTTPS for web traffic
- TLS for service communication
- Certificate management
Data at Rest
- What: Protecting stored data
- How: Like a safe for your data
- Considerations:
- Key management
- Backup encryption
- Access controls
Monitoring Framework
How to know your system is healthy and performing well.
Infrastructure Level
- What to monitor:
- CPU: Processing power usage
- Memory: RAM usage
- Disk: Storage space and speed
- Network: Connection health
- Why it matters: Like checking vital signs
Service Level
- What to track:
- Response times: How fast services respond
- Error rates: How often things fail
- Throughput: How much work is done
- Why it matters: Shows service health
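Averages hide the slow tail, so response times are usually tracked as percentiles; a nearest-rank sketch over made-up latency samples:

```python
import math

def percentile(samples, pct):
    # Nearest-rank percentile: the value below which pct% of samples fall.
    ordered = sorted(samples)
    rank = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[rank]

latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 14, 13]  # sample response times
p50 = percentile(latencies_ms, 50)   # the typical request
p95 = percentile(latencies_ms, 95)   # the slow tail a mean would hide
error_rate = 3 / 1000                # 3 failures out of 1000 requests
```

Here the median is 13 ms while the p95 is 250 ms: exactly the gap dashboards exist to surface.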
Application Level
- What to measure:
- User activity
- Feature usage
- Business metrics
- Error reports
- Why it matters: Shows business impact
Key Numbers in System Design
Understanding these limits helps avoid premature optimization and fosters the creation of simpler, scalable systems. Demonstrating this knowledge in system design interviews reflects a candidate's ability to blend theoretical understanding with practical experience, which is especially important for senior roles.
Caching
- Modern In-Memory Caches: Handle TB-scale datasets with single-digit millisecond latency, processing hundreds of thousands of operations per second.
- Key Metrics:
- Memory: Up to 1 TB on optimized instances.
- Latency: < 1 ms (across availability zones within the same region), 50-150 ms (cross-region).
- Throughput: Over 100k reads/sec; hundreds of thousands of writes/sec.
- Sharding Considerations:
- Dataset size nearing 1 TB
- Sustained throughput over 100k ops/sec
- Read latency exceeding 1 ms
Databases
- Modern Databases: Handle up to 64 TB with millisecond response times and tens of thousands of transactions per second. For systems handling millions or even tens of millions of users, a well-tuned single database can often handle the load.
- Key Metrics:
- Storage: Up to 64 TB (128 TB for Aurora).
- Latency: 1-5 ms reads (cached), 5-30 ms reads and writes (disk).
- Throughput: Up to 50k TPS reads; Up to 10k TPS writes.
- Connections: 5-20k concurrent connections.
- Sharding Considerations:
- Dataset size over 50 TB
- Read throughput exceeding 50k TPS
- Write throughput exceeding 10k TPS
- Read latency above 5 ms
- Geographic distribution required
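These limits are most useful for back-of-envelope checks; a sketch with made-up traffic numbers:

```python
# Hypothetical traffic, checked against the database limits above.
daily_active_users = 10_000_000
requests_per_user_per_day = 20
read_write_ratio = 9        # 9 reads for every write
peak_multiplier = 3         # peak traffic vs. the daily average

avg_qps = daily_active_users * requests_per_user_per_day / 86_400   # ~2.3k
peak_qps = avg_qps * peak_multiplier                                # ~7k
peak_writes = peak_qps / (read_write_ratio + 1)                     # ~700
peak_reads = peak_qps - peak_writes                                 # ~6.3k

# Well under 50k read / 10k write TPS: one well-tuned database still fits.
fits_single_db = peak_reads < 50_000 and peak_writes < 10_000
```

Even ten million daily users can land comfortably within a single database's budget, which is the point the sharding-pitfall section below makes.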
Application Servers
- Modern Servers: Handle hundreds of thousands of concurrent connections with modest resource usage; cloud platforms allow rapid scaling.
- Key Metrics:
- Connections: 100k+ concurrent.
- CPU: 8-64 cores.
- Memory: 64-512 GB standard, up to 2 TB.
- Network: Up to 25 Gbps.
- Scaling Considerations:
- CPU is almost always your first bottleneck, not memory
- CPU utilization above 80%
- Memory usage trending above 80%
- Response latency exceeding SLA
Message Queues
- Modern Message Queues: Support millions of messages per second with low latency, expanding their role beyond traditional async processing.
- Key Metrics:
- Throughput: Up to 1 million messages/sec per broker.
- Latency: 1-5 ms end-to-end within a region, 5-30 ms cross-region.
- Storage: Up to 50 TB per broker; retention of weeks to months.
- Sharding Considerations:
- Throughput nearing 1 million messages/sec per broker
- Consistently growing consumer lag
- Cross-Region Replication: If geographic redundancy is required
Scaling Pitfalls
Premature Sharding:
- Candidates often assume sharding is necessary without considering data size. For example, a dataset of 10 million businesses (10 GB) and their reviews (100 GB total) can fit on a single database without sharding. Similarly, a LeetCode leaderboard with 400 GB of data can also be managed without sharding.
Unnecessary Caching:
- Candidates frequently overestimate the latency for simple key or row lookups on SSDs, often suggesting unnecessary caching layers. For simple lookups, the latency is typically around 10 ms, which is fast enough to forgo additional infrastructure, although caching is still wise for expensive queries.
Over-engineering for High Write Throughput:
- Many candidates incorrectly assume that a system with 5k writes per second requires a message queue. A well-tuned PostgreSQL instance can handle over 20,000 writes per second. Write capacity is usually limited by complex transactions or excessive indexing rather than the write volume itself. Message queues should be reserved for specific needs like guaranteed delivery, event sourcing, or handling spikes above 50k writes per second.