System Design Fundamental Concepts
A beginner-friendly guide to essential system design concepts, with detailed explanations and real-world examples.
Introduction
System design interviews assess your ability to create large-scale software systems. Think of it like architecting a city - you need to understand how all the pieces work together, from individual buildings (services) to the roads connecting them (networks) and the utilities supporting them (infrastructure). This guide breaks down these concepts in simple terms.
Scaling Strategies
When your system grows, you need ways to handle more users and data. Think of it like managing a restaurant that's becoming more popular.
Vertical vs. Horizontal Scaling
Vertical Scaling (Scale Up)
- What it is: Adding more power to your existing machines (like upgrading your computer with more RAM or a better CPU)
- When to use: When you have predictable, steady growth and want to keep things simple
- Advantages:
- Simpler to manage (it's just one machine)
- No need to modify your application code
- Often sufficient for small to medium applications
- Limitations:
- There's a ceiling to how much you can upgrade
- Can be expensive for high-end hardware
- Single point of failure
- Real example: Upgrading a database server from 16GB RAM to 32GB RAM to handle more queries
Horizontal Scaling (Scale Out)
- What it is: Adding more machines to share the work (like adding more cashiers in a busy store)
- When to use: When you have unpredictable growth or need high availability
- How it works:
- Distribute incoming requests across multiple servers
- Split data across multiple databases
- Use load balancers to direct traffic
- Challenges:
- Need to handle data consistency across machines
- More complex to manage and monitor
- May require application code changes
- Real example: Adding more web servers to handle Black Friday traffic on an e-commerce site
Distribution Strategies
When you have multiple servers, you need smart ways to share the work between them.
Work Distribution
Load Balancing
- What it is: A system that distributes incoming requests across multiple servers
- How it works: Like a traffic cop directing cars to different lanes
- Common strategies explained:
- Round-robin: Takes turns sending requests to each server (like dealing cards)
- Least connections: Sends to the server handling the fewest requests (like picking the shortest checkout line)
- Utilization-based: Checks which server has the most available resources
- Real-world example: When you visit Netflix, a load balancer decides which server will stream your video
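The round-robin and least-connections strategies above can be sketched in a few lines of Python (server names and connection counts are made up for illustration):

```python
import itertools

class LoadBalancer:
    """Toy balancer over a fixed pool of servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._rr = itertools.cycle(self.servers)    # round-robin rotation
        self.active = {s: 0 for s in self.servers}  # in-flight requests per server

    def pick_round_robin(self):
        # Take turns, like dealing cards: server 1, 2, 3, then back to 1.
        return next(self._rr)

    def pick_least_connections(self):
        # Pick the shortest checkout line: fewest in-flight requests wins.
        return min(self.servers, key=lambda s: self.active[s])

lb = LoadBalancer(["web-1", "web-2", "web-3"])
picks = [lb.pick_round_robin() for _ in range(4)]  # web-1, web-2, web-3, web-1
```

Least connections pays off when requests vary widely in cost; round-robin is fine when they are roughly uniform.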
Queue-based Distribution
- What it is: A system where work items wait in line to be processed
- How it works: Like a restaurant kitchen with orders coming in and chefs processing them
- Benefits:
- Handles traffic spikes (orders can wait in line)
- Ensures no work is lost
- Allows for prioritization
- Example: Processing uploaded videos on YouTube - they wait in a queue to be converted to different formats
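A minimal producer/consumer sketch of the idea, with an in-process queue standing in for a real message broker (file names and the "transcoding" step are illustrative):

```python
import queue
import threading

jobs = queue.Queue()   # uploaded videos wait in line here until a worker is free
done = []

def transcode_worker():
    # A "chef" pulling orders off the line; real work would be CPU-heavy.
    while True:
        video = jobs.get()
        if video is None:          # sentinel: shut the worker down
            break
        done.append(f"transcoded {video}")

worker = threading.Thread(target=transcode_worker)
worker.start()
for video in ["a.mp4", "b.mp4", "c.mp4"]:
    jobs.put(video)                # producers enqueue and return immediately
jobs.put(None)
worker.join()
```

Because producers only enqueue, a traffic spike lengthens the line instead of overloading the workers.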
Data Distribution
Partitioning Strategies
Geographic partitioning:
- What: Storing data close to users in different regions
- Example: Netflix storing popular shows in servers near users
- Benefits: Faster access, compliance with local laws
Hash-based partitioning:
- What: Using a formula to decide where data goes
- How: Like assigning students to classrooms based on their last names
- Example: Instagram storing user posts across multiple servers based on user ID
Range-based partitioning:
- What: Grouping similar data together
- Example: Storing January orders on one server, February on another
- Best for: Time-series data or alphabetical grouping
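The hash- and range-based strategies reduce to simple routing functions; a sketch where the shard count and the quarterly grouping are arbitrary choices for illustration:

```python
NUM_SHARDS = 4

def hash_shard(user_id: int) -> int:
    # Hash-based: a formula (modulo here) spreads keys evenly across shards.
    return user_id % NUM_SHARDS

def range_shard(month: int) -> int:
    # Range-based: contiguous values stay together (Jan-Mar on shard 0, etc.).
    return (month - 1) // 3
```

Note that plain modulo reshuffles most keys whenever the shard count changes, which is why production systems usually reach for consistent hashing instead.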
Key Considerations
Cross-node communication:
- Challenge: When servers need to talk to each other
- Why it matters: More communication = slower system
- Solution: Keep related data together when possible
Scatter-gather pattern:
- What: When you need to get data from many servers and combine it
- Why it's challenging: Slow and prone to failures
- Example: Searching for a product across multiple warehouse databases
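A scatter-gather sketch, with a hypothetical in-memory dict standing in for the per-warehouse databases:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-warehouse stock; each lookup stands in for a remote query.
WAREHOUSES = {
    "east":  {"widget": 3},
    "west":  {"widget": 7, "gadget": 2},
    "north": {},
}

def query_warehouse(name: str, product: str) -> int:
    return WAREHOUSES[name].get(product, 0)

def total_stock(product: str) -> int:
    # Scatter: ask every node in parallel. Gather: combine the partial answers.
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(lambda w: query_warehouse(w, product), WAREHOUSES))
```

The slowest shard sets the overall latency, and one failed shard can sink the whole query, which is why real systems add timeouts and partial-result fallbacks.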
Consistency Models
How to handle data updates when you have multiple copies of the same information.
Types of Consistency
Strong Consistency
- What it means: All users see the same data at the same time
- How it works: Like a bank account - your balance must be accurate everywhere
- Trade-offs:
- More reliable but slower
- Requires more coordination between servers
- When to use:
- Financial transactions
- Inventory systems
- User authentication
Eventual Consistency
- What it means: Updates take time to reach all servers
- How it works: Like social media - your friend might see your post a few seconds after you post it
- Benefits:
- Faster response times
- Stays available even when servers temporarily can't reach each other
- Best for:
- Social media feeds
- Content management
- Non-critical updates
Mixed Consistency Approach
- What it means: Using different consistency levels for different features
- Real-world example: E-commerce site
- Product reviews: Can be eventually consistent (slight delay is okay)
- Shopping cart: Needs strong consistency (must be accurate)
- Order status: Can be eventually consistent
- Payment processing: Must be strongly consistent
Locking Mechanisms
How to manage shared resources when multiple users or processes need access.
Key Considerations
Lock Granularity
- What it means: How much data or resources you lock at once
- Examples:
- Fine-grained: Locking a single row in a database
- Coarse-grained: Locking an entire table
- Trade-offs:
- Fine-grained: Better concurrency but more overhead
- Coarse-grained: Simpler but limits concurrent access
- Real example: Like locking a single hotel room vs. the entire floor
Lock Duration
- What it means: How long you keep the resource locked
- Best practices:
- Lock only what you need
- Release as quickly as possible
- Plan your operations before locking
- Example: Like holding a bathroom key - get in, do your business, get out
Lock Alternatives
- Optimistic concurrency:
- What: Assume conflicts are rare
- How: Check for conflicts only when saving
- Example: Like Google Docs - multiple people can edit, conflicts resolved at save
- Compare-and-swap:
- What: Check if data changed before updating
- How: Like making sure no one changed the price while you were buying
- Lock-free algorithms:
- What: Special algorithms that don't need locks
- When: High-performance requirements
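Optimistic concurrency and compare-and-swap can be sketched with a version-stamped record (a toy in-memory model, not a real database API):

```python
class OptimisticRecord:
    """Toy versioned record: a write succeeds only if nothing changed since it was read."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def compare_and_swap(self, expected_version: int, new_value) -> bool:
        # No lock is held while "editing"; the version check happens only at save time.
        if self.version != expected_version:
            return False           # conflict: caller must re-read and retry
        self.value = new_value
        self.version += 1
        return True
```

The losing writer retries against the new version, which is cheap exactly when conflicts are rare.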
Indexing Strategies
How to organize data so you can find it quickly, like the index at the back of a book.
Basic Indexing
Hash Map-based
- What: Direct lookup by key
- How it works: Like finding a book by its ISBN
- Performance: Constant-time access (O(1)) on average
- Best for: Exact matches
Sorted List
- What: Ordered data for range queries
- How it works: Like a phone book sorted by name
- Performance: Fast for ranges (O(log n))
- Best for: Range queries, sorting
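A sorted-list range query sketch using Python's bisect module (the timestamps are arbitrary sample data):

```python
import bisect

# A sorted index over event timestamps (arbitrary sample values).
timestamps = [3, 7, 12, 15, 21, 30]

def range_query(lo: int, hi: int) -> list:
    # Binary search finds both slice boundaries in O(log n), no full scan needed.
    start = bisect.bisect_left(timestamps, lo)
    end = bisect.bisect_right(timestamps, hi)
    return timestamps[start:end]
```

This is the core trick behind B-tree indexes: keep data ordered so range endpoints can be located quickly.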
Database Native
- What: Built-in database features
- Benefits:
- Optimized for the database
- Automatically maintained
- Support for complex queries
Specialized Indexes
Geospatial Indexes
- What: Optimize location-based searches
- Use cases:
- Finding nearby restaurants
- Delivery radius calculations
- Location-based services
- Example: How Uber finds nearby drivers
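The underlying distance check can be sketched as a brute-force scan with hypothetical driver coordinates; a real geospatial index prunes candidates instead of visiting every point:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometers between two (lat, lon) points.
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def drivers_within(drivers, lat, lon, radius_km):
    # Brute-force scan; a real index (geohash, quadtree, R-tree) narrows the
    # candidate set first so most points are never examined.
    return [name for name, (dlat, dlon) in drivers.items()
            if haversine_km(lat, lon, dlat, dlon) <= radius_km]
```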
Vector Indexes
- What: Finding similar items in large datasets
- Use cases:
- Image similarity search
- Recommendation systems
- Natural language processing
- Example: How Spotify finds similar songs
Full-text Indexes
- What: Searching through text content
- Features:
- Word stemming (running → run)
- Relevance ranking
- Partial matching
- Example: How Google searches web pages
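A toy inverted index shows the core idea behind full-text search; real engines layer stemming, relevance ranking, and partial matching on top:

```python
import re
from collections import defaultdict

def build_index(docs):
    # Inverted index: each word maps to the set of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    # Return documents containing every query term (simple AND semantics).
    term_sets = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()
```

Searching becomes a few set lookups and an intersection rather than a scan of every document.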
Communication Protocols
How different parts of your system talk to each other.
External Communication
HTTP(S)
- What: Standard web protocol
- How it works: Request-response, like sending a letter and getting a reply
- When to use:
- Regular web applications
- Mobile app APIs
- Most client-server communication
- Benefits:
- Well understood
- Works everywhere
- Easy to debug
Long Polling
- What: Client asks for updates, and the server holds the request open until there is something to report
- How it works: Like waiting at the mailbox until a letter arrives, then going right back to wait for the next one
- When to use:
- Simple real-time updates
- Legacy system support
- Trade-offs:
- Simpler than WebSockets
- More server resources needed
Server-Sent Events (SSE)
- What: Server pushing updates to clients
- How it works: Like a news ticker - continuous updates
- Best for:
- Real-time dashboards
- Social media feeds
- Status updates
- Example: Live sports scores
WebSockets
- What: Two-way continuous communication
- How it works: Like a phone call - both sides can talk
- Best for:
- Chat applications
- Online gaming
- Real-time collaboration
- Example: How WhatsApp Web works
Internal Communication
HTTP(S)
- When: Service-to-service communication
- Benefits: Simple, standard, well-supported
gRPC
- What: Modern, high-performance protocol
- Benefits:
- Lower overhead than JSON over HTTP (binary Protocol Buffers over HTTP/2)
- Strong typing
- Better for microservices
- Example: How Google services communicate
Message Brokers
- What: Middleware for async communication
- How it works: Like a postal service for your system
- Benefits:
- Decoupled services
- Better reliability
- Handle traffic spikes
Security Considerations
Protecting your system and user data.
Authentication & Authorization
API Gateway
- What: Single entry point for all requests
- Benefits:
- Centralized security
- Rate limiting
- Request validation
- Example: Like a security desk in a building
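Rate limiting at the gateway is commonly a token bucket; a minimal sketch where the capacity and refill rate are illustrative:

```python
import time

class TokenBucket:
    """Toy per-client limiter: each request spends a token; tokens refill steadily."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Top up for elapsed time, never beyond the bucket's capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False               # over the limit: the gateway answers 429
```

The capacity absorbs short bursts while the refill rate caps sustained traffic.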
Third-party Solutions
- What: Using services like Auth0
- Benefits:
- Professional security
- Less maintenance
- More features
- Example: "Login with Google"
Encryption
Data in Transit
- What: Protecting data while it moves
- How: Like sending a sealed envelope
- Methods:
- HTTPS for web traffic
- TLS for service communication
- Certificate management
Data at Rest
- What: Protecting stored data
- How: Like a safe for your data
- Considerations:
- Key management
- Backup encryption
- Access controls
Monitoring Framework
How to know your system is healthy and performing well.
Infrastructure Level
- What to monitor:
- CPU: Processing power usage
- Memory: RAM usage
- Disk: Storage space and speed
- Network: Connection health
- Why it matters: Like checking vital signs
Service Level
- What to track:
- Response times: How fast services respond
- Error rates: How often things fail
- Throughput: How much work is done
- Why it matters: Shows service health
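Averages hide the slow tail, so response times are usually tracked as percentiles; a nearest-rank sketch over made-up latency samples:

```python
import math

def percentile(samples, pct):
    # Nearest-rank percentile: the value below which pct% of samples fall.
    ordered = sorted(samples)
    rank = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[rank]

latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 14, 13]  # sample response times
p50 = percentile(latencies_ms, 50)   # the typical request
p95 = percentile(latencies_ms, 95)   # the slow tail a mean would hide
error_rate = 3 / 1000                # 3 failures out of 1000 requests
```

Here the median is 13 ms while the p95 is 250 ms: exactly the gap dashboards exist to surface.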
Application Level
- What to measure:
- User activity
- Feature usage
- Business metrics
- Error reports
- Why it matters: Shows business impact
Key Numbers in System Design
Understanding these limits helps avoid premature optimization and fosters the creation of simpler, scalable systems. Demonstrating this knowledge in system design interviews reflects a candidate's ability to blend theoretical understanding with practical experience, which is especially important for senior roles.
Caching
- Modern In-Memory Caches: Handle TB-scale datasets with single-digit millisecond latency, processing hundreds of thousands of operations per second.
- Key Metrics:
- Memory: Up to 1 TB on optimized instances.
- Latency: < 1 ms (across availability zones within the same region), 50-150 ms (cross-region).
- Throughput: Over 100k reads/sec; hundreds of thousands of writes/sec.
- Sharding Considerations:
- Dataset size nearing 1 TB
- Sustained throughput over 100k ops/sec
- Read latency exceeding 1 ms
Databases
- Modern Databases: Handle up to 64 TB with millisecond response times and tens of thousands of transactions per second. For systems handling millions or even tens of millions of users, a well-tuned single database can often handle the load.
- Key Metrics:
- Storage: Up to 64 TB (128 TB for Aurora).
- Latency: 1-5 ms reads (cached), 5-30 ms reads and writes (disk).
- Throughput: Up to 50k TPS reads; Up to 10k TPS writes.
- Connections: 5-20k concurrent connections.
- Sharding Considerations:
- Dataset size over 50 TB
- Read throughput exceeding 50k TPS
- Write throughput exceeding 10k TPS
- Read latency above 5 ms
- Geographic distribution required
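These limits are most useful for back-of-envelope checks; a sketch with made-up traffic numbers:

```python
# Hypothetical traffic, checked against the database limits above.
daily_active_users = 10_000_000
requests_per_user_per_day = 20
read_write_ratio = 9        # 9 reads for every write
peak_multiplier = 3         # peak traffic vs. the daily average

avg_qps = daily_active_users * requests_per_user_per_day / 86_400   # ~2.3k
peak_qps = avg_qps * peak_multiplier                                # ~7k
peak_writes = peak_qps / (read_write_ratio + 1)                     # ~700
peak_reads = peak_qps - peak_writes                                 # ~6.3k

# Well under 50k read / 10k write TPS: one well-tuned database still fits.
fits_single_db = peak_reads < 50_000 and peak_writes < 10_000
```

Even ten million daily users can land comfortably within a single database's budget, which is the point the sharding-pitfall section below makes.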
Application Servers
- Modern Servers: Handle hundreds of thousands of concurrent connections with modest resource usage; cloud platforms allow rapid scaling.
- Key Metrics:
- Connections: 100k+ concurrent.
- CPU: 8-64 cores.
- Memory: 64-512 GB standard, up to 2 TB.
- Network: Up to 25 Gbps.
- Scaling Considerations:
- CPU is almost always your first bottleneck, not memory
- CPU utilization above 80%
- Memory usage trending above 80%
- Response latency exceeding SLA
Message Queues
- Modern Message Queues: Support millions of messages per second with low latency, expanding their role beyond traditional async processing.
- Key Metrics:
- Throughput: Up to 1 million messages/sec per broker.
- Latency: 1-5 ms end-to-end within a region, 5-30 ms cross-region.
- Storage: Up to 50 TB per broker; retention of weeks to months.
- Sharding Considerations:
- Throughput nearing 1 million messages/sec per broker
- Consistently growing consumer lag
- Cross-Region Replication: If geographic redundancy is required
Scaling Pitfalls
Premature Sharding:
- Candidates often assume sharding is necessary without considering data size. For example, a dataset of 10 million businesses (10 GB) and their reviews (100 GB total) can fit on a single database without sharding. Similarly, a LeetCode leaderboard with 400 GB of data can also be managed without sharding.
Unnecessary Caching:
- Candidates frequently overestimate the latency for simple key or row lookups on SSDs, often suggesting unnecessary caching layers. For simple lookups, the latency is typically around 10 ms, which is fast enough to forgo additional infrastructure, although caching is still wise for expensive queries.
Over-engineering for High Write Throughput:
- Many candidates incorrectly assume that a system with 5k writes per second requires a message queue. A well-tuned PostgreSQL instance can handle over 20,000 writes per second. Write capacity is usually limited by complex transactions or excessive indexing rather than the write volume itself. Message queues should be reserved for specific needs like guaranteed delivery, event sourcing, or handling spikes above 50k writes per second.