System Design Key Technologies
A beginner-friendly guide to the fundamental technologies commonly used in system design.
API Design
Security
From a security perspective, it's often best to avoid passing sensitive information like user IDs in the request body. Instead, pass them as headers.
- Exposure Risk: Request bodies can be logged or intercepted, and sensitive information like user IDs could be exposed.
- Prevention of Manipulation: If user IDs are passed in the request body, there's a risk that malicious users could manipulate them.
- Mitigation: It's more secure to use authentication tokens (like JWT) or session IDs in headers that can securely identify the user. The server then validates the token and extracts the userId from it, as sketched below.
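As a minimal sketch of this server-side validation, the snippet below uses the PyJWT library to verify an HS256-signed token from the Authorization header; the secret and the `userId` claim name are illustrative assumptions, not a prescribed scheme.

```python
# Minimal sketch: extracting the user ID from a JWT in the Authorization
# header using the PyJWT library. The secret, algorithm, and "userId"
# claim name are illustrative assumptions.
import jwt  # pip install PyJWT

SECRET = "replace-with-a-real-secret"

def get_user_id(headers: dict) -> str:
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        raise PermissionError("missing bearer token")
    token = auth[len("Bearer "):]
    # Raises jwt.InvalidTokenError if the signature is bad or the token expired.
    payload = jwt.decode(token, SECRET, algorithms=["HS256"])
    return payload["userId"]
```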
Idempotent APIs
Idempotent APIs are designed so that multiple identical requests have the same effect as a single request. Idempotency plays a critical role in building reliable and robust web services: it makes a system easier to maintain and use, and gives clients predictable outcomes even in the face of errors, retries, or network issues.
Key Characteristics of Idempotent APIs
- Same Effect: Multiple identical requests should have the same effect as a single request.
- Repeatability: Operations can safely be repeated without unintended side effects.
- Consistent Outcome: The outcome of the operation is the same regardless of how many times it is repeated.
Examples of Idempotent APIs
- GET Requests: Fetching a resource does not change the server state.
- PUT Requests: Updating a resource to a specific state; repeated calls with the same data do not change the outcome.
- DELETE Requests: Deleting a resource; repeated calls to delete the same resource yield the same final state (the resource is gone).
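GET, PUT, and DELETE are naturally idempotent, but POST is not; a common remedy is a client-supplied idempotency key. Below is a minimal sketch of that idea, with an in-memory dict standing in for a durable store:

```python
# Sketch: making a POST-style operation idempotent with a client-supplied
# idempotency key. The in-memory dict stands in for a durable store.
results: dict[str, dict] = {}

def create_payment(idempotency_key: str, amount: int) -> dict:
    # A replayed request (same key) returns the stored result instead of
    # charging the customer twice.
    if idempotency_key in results:
        return results[idempotency_key]
    payment = {"id": idempotency_key, "amount": amount, "status": "charged"}
    results[idempotency_key] = payment
    return payment

assert create_payment("key-1", 100) == create_payment("key-1", 100)
```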
Pagination
API pagination is a technique used to divide a large set of data into smaller, manageable chunks or "pages" that can be retrieved sequentially. This approach is essential for optimizing performance, reducing load times, and improving the user experience when dealing with large datasets.
Types of Pagination
- Offset-Based Pagination: This method uses an offset (or page number) and a limit (number of items per page) to retrieve data. For example, if you want to access the second page of results with 10 items per page, you would use an offset of 10 (i.e., starting after the first 10 results).
Example:
GET /api/items?offset=10&limit=10
- Pros:
- Simplicity: Easy to implement and understand.
- Cons:
- Performance Issue: As the offset increases, query performance can degrade, especially in large datasets, since the database has to scan through all rows preceding the offset and discard them on each query.
- Result Inconsistency: If the underlying data changes while paginating (e.g., new items are inserted or deleted), results can be inconsistent (e.g., items might be missed or duplicated).
- Cursor-Based Pagination: A better approach is to use cursor pagination. The cursor is a unique identifier for a specific position in the dataset. For example, if you want to access the second page of results, you would use a cursor that points to the position after the first page.
Example:
GET /api/items?cursor={last_item_id}&limit=10
- Pros:
- Performance: Generally performs better with large datasets, as it does not require the database to count or skip records, since the cursor points to the specific position in the dataset (assuming we built an index on the cursor field).
- Result Consistency: More resilient to changes in the underlying dataset, since results are always returned relative to the cursor (the last item fetched).
- Cons:
- Complexity: Implementation can be more complex.
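To make the difference concrete, here is a small cursor-pagination sketch using Python's built-in sqlite3; the `items` table and the id-based cursor are illustrative assumptions, and the cursor column (here, the primary key) is what carries the index.

```python
# Sketch of cursor-based pagination using Python's built-in sqlite3.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO items (name) VALUES (?)",
                [(f"item-{i}",) for i in range(25)])

def get_page(cursor, limit=10):
    # WHERE id > ? seeks straight to the cursor position instead of
    # scanning and discarding OFFSET rows.
    rows = con.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (cursor or 0, limit),
    ).fetchall()
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor

page1, cur = get_page(None)   # first 10 items
page2, cur = get_page(cur)    # next 10, stable even if earlier rows change
```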
Database
Interview Tips
Avoid comparing relational and NoSQL databases directly in your interview responses, as this may indicate a lack of experience. Instead of broad statements about why to choose one over the other, concentrate on the specific database you're familiar with and how it addresses the problem at hand. If comparisons are necessary, focus on the differences relevant to your experience and their impact on your design. A strong statement might include specifics, such as highlighting the ACID properties of Postgres for data integrity.
For interviews, it’s best to choose a specific database type to focus on. If you're preparing for product design interviews, opt for a relational database (like Postgres). If you're preparing for infrastructure design interviews, choose a NoSQL database (such as DynamoDB).
Relational Databases
Key Features
- SQL Joins:
- Combine data from multiple tables, enabling complex queries.
- Joins can be a performance bottleneck, so they should be minimized when possible.
- Indexes:
- Improve query performance by allowing faster data retrieval.
- Common implementation: B-Trees or Hash Tables
- Support for multiple indexes, including multi-column and specialized types (e.g., geospatial, full-text)
- Indexes come with additional storage costs and write-performance overhead
- Transactions:
- Group multiple operations into a single atomic operation, ensuring data integrity.
- ACID compliant to ensure data integrity and reliability.
- Transactions introduce inherent overhead due to locking mechanisms, isolation management, and transaction logging (a minimal sketch follows below).
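As referenced above, here is a minimal transaction sketch using Python's built-in sqlite3 for illustration; the same pattern applies to Postgres via a driver such as psycopg. The `accounts` table is an illustrative assumption.

```python
# Minimal sketch: two updates grouped into one atomic transaction.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

try:
    with con:  # opens a transaction; commits on success, rolls back on error
        con.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        con.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'bob'")
except sqlite3.Error:
    pass  # both updates were rolled back together; no partial transfer
```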
Pros & Cons
Pros
- Data Integrity: Strong ACID (Atomicity, Consistency, Isolation, Durability) properties ensure reliable transactions.
- Structured Data: Well-defined schema allows for clear organization and relationships.
- Powerful Querying: SQL provides a robust language for complex queries and data manipulation.
- Mature Ecosystem: Extensive tools and community support for popular RDBMS like Postgres and MySQL.
Cons
- Scalability Limitations: Can struggle with horizontal scaling compared to NoSQL databases.
- Rigid Schema: Changes to the database schema can be complex and time-consuming.
- Performance Bottlenecks: Joins and complex queries can lead to performance issues, especially with large datasets.
NoSQL Databases
Key Features
- Flexible Schema: Support for various data structures without a fixed schema, allowing for easy adaptation to changing data requirements.
- Scalability: Designed to scale horizontally across many servers, accommodating large amounts of data and high traffic loads through techniques like sharding and consistent hashing.
- Diverse Consistency Models: Offer a range of consistency options, from strong consistency (ensuring all nodes have the same data at the same time) to eventual consistency (where all nodes will eventually converge on the same data).
- Indexing: Support for indexing (e.g., B-Tree, Hash Table) to enhance query performance, similar to relational databases.
- Variety of Data Models:
- Key-Value Stores: Fast access and simple data retrieval (e.g., Redis, DynamoDB; see the sketch after this list).
- Document Stores: Flexible and schema-less, ideal for JSON-like data (e.g., MongoDB).
- Column-Family Stores: Optimized for high write performance and scalability (e.g., Cassandra).
- Graph Databases: Efficiently manage and query relationships between data points (e.g., Neo4j).
- Time-Series Databases: Store data indexed by time, allowing efficient time-range queries; ideal for applications monitoring metrics and events over time (e.g., Prometheus).
- Geospatial Databases: Designed to store and query spatial data, with support for geographic information systems (GIS), enabling location-based queries and analysis (e.g., PostGIS, an extension for PostgreSQL).
- Search Engines: Optimized for full-text search capabilities and complex search queries, allowing users to index and retrieve data efficiently based on text patterns. (e.g., Elasticsearch).
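As a concrete example of the key-value model referenced above, the sketch below uses boto3 with DynamoDB; the `users` table (partition key `user_id`) is assumed to already exist, and credentials are assumed to come from the environment.

```python
# Sketch of key-value access against DynamoDB with boto3.
import boto3

table = boto3.resource("dynamodb").Table("users")

# Write: attributes beyond the key are schemaless.
table.put_item(Item={"user_id": "u123", "name": "Alice", "plan": "pro"})

# Read by primary key: fast lookups regardless of table size.
item = table.get_item(Key={"user_id": "u123"}).get("Item")
```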
Pros & Cons
Pros
- Flexibility: Easily accommodates varying data types and structures without the need for a predefined schema.
- High Scalability: Capable of handling large-scale applications with high throughput and low latency.
- Performance: Optimized for specific use cases, such as write-heavy workloads or real-time analytics.
- Diverse Use Cases: Suitable for applications dealing with big data, real-time web apps, and evolving data models.
Cons
- Consistency Trade-offs: Eventual consistency may lead to stale reads, which can be problematic for certain applications requiring real-time accuracy.
- Limited ACID Transactions: Many NoSQL databases do not fully support ACID transactions, which can be a drawback for applications needing strong transactional guarantees.
Blob Storage
Blob storage is a service designed for storing large, unstructured data blobs such as images, videos, and files. It is more cost-effective and efficient than traditional databases for handling these types of data. Popular services such as Amazon S3, Google Cloud Storage, and Azure Blob Storage allow users to upload blobs and retrieve them via URLs, and often integrate with Content Delivery Networks (CDNs) to enable fast global access.
Key Patterns
- Use Case: Blob storage is ideal for applications like YouTube (videos), Instagram (images), and Dropbox (files), where metadata is stored in a core database while the actual blobs are stored in blob storage.
- Architecture: Typically involves a core database (e.g., Postgres, DynamoDB) that stores metadata and URLs pointing to the blobs in blob storage.
Upload/Download Process
- Upload:
- The user issues an upload request to the server.
- The server registers the upload request with status `pending` in the database and returns a presigned URL to the user.
- The user uploads the data to blob storage using the presigned URL.
- The blob storage triggers a notification event to the server, which updates the record's status to `completed`.
- Download:
- The user issues a download request to the server.
- The server returns a presigned URL to the user.
- The user uses the presigned URL to download the data via CDN, which proxies the request to the underlying blob storage.
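A minimal sketch of generating the presigned URLs used in both flows, via boto3; the bucket and object key are illustrative, and the URLs are signed with the server's credentials rather than the end user's.

```python
# Sketch: presigned upload and download URLs for S3 with boto3.
import boto3

s3 = boto3.client("s3")

# Upload: the client PUTs the file directly to S3 before the URL expires.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-bucket", "Key": "videos/v1.mp4"},
    ExpiresIn=3600,  # seconds
)

# Download: same mechanism with get_object.
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "videos/v1.mp4"},
    ExpiresIn=3600,
)
```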
Key Features
- Durability: Blob storage services are designed to be extremely durable (S3, for example, advertises 99.999999999% durability). They ensure data safety through replication and erasure coding.
- Scalability: Services like AWS S3 are highly scalable, capable of handling unlimited data and requests.
- Cost-Effectiveness: Much cheaper than traditional databases for large objects (e.g., AWS S3 standard storage costs only a few cents per GB-month, far less than storing the same data in a database like DynamoDB).
- Security: Built-in encryption at rest and in transit, along with access control features, protects data.
- Direct Client Interaction: Clients can upload and download files directly using presigned URLs. Presigned URLs are temporary URLs signed with the credentials of the entity that creates them (typically the server), allowing controlled upload or download access. When a presigned URL is created, the authentication information is included in the query string, enabling access to otherwise private objects. This is useful for applications that need to store and retrieve large blobs of data, like images or videos.
- Chunking: When uploading large files, it's common to use chunking to upload the file in smaller pieces. This allows for resumable uploads, where an interrupted upload can continue without starting from the beginning, as well as parallel uploads. S3 supports this out of the box with multipart uploads (see the sketch after this list).
- CDN: Integrates with Content Delivery Networks (CDNs) for fast global access; the actual download is served from CDN edge locations around the world that cache the file.
- Versioning: Supports versioning of files, allowing for easy rollback to previous versions.
- Lifecycle Management: Supports lifecycle management policies to automatically transition files between storage classes, reducing costs.
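As referenced in the Chunking item, here is a sketch of a multipart upload with boto3; the bucket, key, file name, and 8 MB part size are illustrative (S3 requires every part except the last to be at least 5 MB).

```python
# Sketch of a chunked upload using S3 multipart uploads via boto3.
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-bucket", "videos/big.mp4"

mpu = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
parts, part_number = [], 1
with open("big.mp4", "rb") as f:
    while chunk := f.read(8 * 1024 * 1024):
        resp = s3.upload_part(Bucket=BUCKET, Key=KEY, UploadId=mpu["UploadId"],
                              PartNumber=part_number, Body=chunk)
        parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
        part_number += 1

# Parts can be retried or uploaded in parallel; completion stitches them together.
s3.complete_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=mpu["UploadId"],
                             MultipartUpload={"Parts": parts})
```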
Message Queues
Message queues are data structures that act as buffers for managing bursty traffic and distributing workloads across systems. They allow a producer (such as a compute resource) to send messages and forget about them, while a pool of workers processes these messages at their own pace. This mechanism helps smooth out system loads and decouples the producer from the consumer, enabling independent scaling.
Key Functions
- Buffer for Bursty Traffic: Queues can handle sudden spikes in requests without dropping messages.
- Distribute Work: Queues distribute tasks among worker nodes, ensuring efficient resource utilization.
- Backpressure: This mechanism prevents overwhelming the system by slowing down message production when the queue is full, helping to avoid bottlenecks. It may not be provided by queue services out of the box, but it can be implemented at the application level.
Common Queue Technologies
- SQS (Simple Queue Service): A fully managed queue service provided by AWS, designed for ease of use and integration with other AWS services.
Key Features:
- Scalability: SQS can automatically scale to handle an increasing number of messages, accommodating high volumes of traffic without pre-provisioning resources.
- Message Retention: Messages can be retained in the queue for a configurable period (from a few minutes up to 14 days), allowing consumers to process messages at their own pace.
- FIFO Queues: SQS provides FIFO (First-In-First-Out) queues to guarantee the order of message delivery, which is critical in scenarios where the order of messages matters.
- Message Visibility Timeout: After a message is retrieved from a queue, it becomes invisible to other consumers for a defined period. This ensures that while the consumer is processing the message, no other consumer can see or retrieve it. If processing succeeds, the consumer deletes the message from the queue via the `DeleteMessage` API. If processing fails, the message becomes visible again after the visibility timeout expires.
- At-Least-Once Delivery: SQS guarantees that each message is delivered at least once, so no message is lost (though duplicates are possible, so consumers should be idempotent).
- Manage Long-Running Tasks: Set an appropriate visibility timeout for messages that require extended processing time. If the consumer needs more time to process a message, it can use the `ChangeMessageVisibility` API to extend the visibility timeout.
- Delay Queues: Delay queues let you postpone the delivery of new messages to consumers for a specified period, from 0 seconds to 15 minutes. This delay is set at the queue level and applies to all messages sent to the queue. Useful for scheduling tasks.
- Delayed Messages: To set different delay times for individual messages, use the `DelaySeconds` parameter when sending a message. It allows a delay of 0 to 900 seconds (15 minutes) for that specific message. Useful for scheduling tasks.
- Dead Letter Queues (DLQ): SQS supports DLQs, where messages that cannot be processed after a specified number of attempts are sent. This is useful for debugging and handling failed messages without losing them.
- Batched Operations: SQS allows sending, receiving, and deleting messages in batches, which improves efficiency and reduces costs by minimizing API calls.
- Long Polling: SQS supports long polling, which reduces the number of empty responses and helps consumers receive messages more efficiently by waiting for messages to arrive instead of continuously polling (see the consumer sketch after this list).
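Putting several of these features together, here is a sketch of a worker loop using boto3; the queue URL and the `process()` handler are illustrative assumptions.

```python
# Sketch of an SQS worker combining long polling, the visibility timeout,
# and batched receives via boto3.
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"

def process(body: str) -> None:
    print("processing", body)  # stand-in for real work

while True:
    # Long polling: wait up to 20s for messages instead of busy-polling.
    resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                               MaxNumberOfMessages=10,
                               WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        try:
            process(msg["Body"])
            # Delete only after success; otherwise the message reappears
            # once the visibility timeout expires and is retried.
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])
        except Exception:
            pass  # leave the message to become visible again
```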
Challenges
- Complexity: Introducing a message queue adds architectural complexity, requiring developers to understand and manage an additional system and its configuration.
- Latency: While message queues enable asynchronous processing, they also introduce latency, as messages are placed in the queue and retrieved later by consumers.
Event Streams & Event Sourcing
Streams are used for processing large amounts of data in real-time and supporting complex processing scenarios, such as event sourcing. Event sourcing involves storing changes in application state as a sequence of events, allowing for state reconstruction, detailed auditing, and transaction replay.
Streams are essential for real-time data processing and event-driven architectures, providing flexibility, scalability, and fault tolerance in modern applications. Kafka is particularly prominent in system design discussions due to its robust features and widespread use.
Use Cases of Event Streams
- Real-Time Data Processing: Streams are ideal for applications that require immediate processing of high-volume data, like a social media platform needing real-time analytics of user interactions (likes, comments, shares). Stream processing systems (e.g., Apache Flink, Spark Streaming) can handle these events efficiently.
- Event Sourcing: In systems like banking, where every transaction must be recorded, streams enable event sourcing. Each transaction is treated as an event that can be stored, processed, and replayed, allowing for real-time processing, auditing, and state reconstruction of accounts.
- Multiple Consumers: Streams support multiple consumers reading from the same data source simultaneously. For example, in a real-time chat application, messages sent to a stream are distributed to all participants, facilitating instant communication through a publish-subscribe pattern.
Common Stream Technologies
- Kafka: A distributed streaming platform that can function as a queue, known for its scalability and complex ordering capabilities (see the sketch after this feature list).
Key Features:
- High Throughput: Kafka is designed to handle high volumes of data with minimal latency, allowing for the processing of millions of messages per second.
- Durability: Messages in Kafka are persisted on disk and can be replicated across multiple brokers, ensuring data durability and fault tolerance.
- Scalability: Kafka can easily scale horizontally by adding more brokers to the cluster. Topics can be partitioned, allowing for parallel processing of messages.
- Event Replay: Consumers can reprocess events by simply reading the event stream from a specific offset, facilitating debugging, data recovery, and processing of historical data.
- Consumer Groups: Kafka allows multiple consumers to work as part of a consumer group, enabling load balancing. Each message is consumed by only one consumer within a group, allowing for scalable message processing.
- Message Retention: Kafka retains messages for a configurable amount of time, allowing consumers to read messages at their own pace, even if they fall behind.
- Stream Processing: Kafka provides support for stream processing through Kafka Streams, which allows for powerful, real-time processing of data streams.
- Message Partitioning: Kafka topics can be divided into partitions, which enable parallel processing and help maintain order within a partition. Each partition can be hosted on different brokers.
- Idempotent Producer: Kafka supports idempotent producers, which ensures that messages are not duplicated in the event of a producer failure.
- Delivery Semantics: Kafka supports at-least-once and exactly-once delivery semantics, ensuring that messages are not lost or duplicated.
- Schema Registry: While not part of Kafka core, the Kafka ecosystem often includes a Schema Registry to manage message schemas, ensuring compatibility and preventing issues with data format changes.
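As referenced above, a minimal producer/consumer sketch using the kafka-python client; the broker address, topic, and group id are illustrative. Messages with the same key land on the same partition, which is what preserves per-key ordering.

```python
# Sketch: Kafka producer and consumer with kafka-python.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("user-events", key=b"user-123", value=b'{"action": "like"}')
producer.flush()

consumer = KafkaConsumer(
    "user-events",
    bootstrap_servers="localhost:9092",
    group_id="analytics",          # consumers in a group split the partitions
    auto_offset_reset="earliest",  # replay from the start if no committed offset
)
for record in consumer:
    print(record.partition, record.offset, record.value)
```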
Challenges
- Complexity of Cluster Management: Setting up and managing a Kafka cluster can be complex, requiring expertise in distributed systems, configuration tuning, and ongoing maintenance.
- Operational Overhead: Running a Kafka cluster requires operational vigilance, including monitoring metrics such as throughput, latency, and disk usage, which can add to the maintenance burden.
Distributed Caching
A distributed cache is a system that stores data in memory across multiple servers, helping to scale applications and reduce latency. It is particularly useful for storing data that is expensive to compute or retrieve from a database.
Benefits
- Reduced Latency: Frequently accessed data can be retrieved from memory much faster than from disk or a remote database, significantly decreasing response times for applications.
- Increased Scalability: Distributed caches can easily scale horizontally by adding more nodes, allowing systems to handle increasing loads and support more concurrent users efficiently.
- Offloading Backend Databases: By caching frequently accessed data, the load on primary databases is reduced, leading to lower resource consumption and improved overall performance.
- Increased Performance: Applications can handle more requests per second due to faster read access, improving user experience and responsiveness.
- Data Locality: Distributed caches can be located closer to application servers, reducing network latency and speeding up data access.
Challenges
- Consistency Issues: Maintaining consistency between the cache and the underlying data source can be difficult, especially in systems with frequent updates.
- Cache Invalidation Complexity: Implementing effective cache invalidation strategies is complex and may lead to stale data being served if not managed properly.
- Data Loss Risks: If the distributed cache does not persist data to disk or has replication issues, there is a risk of data loss in case of node failures.
- Network Overhead: Distributed caches rely on the network to communicate between nodes and with applications, introducing potential latency and vulnerabilities to network issues.
- Increased System Complexity: Adding a distributed cache increases overall system architecture complexity, requiring additional management and monitoring.
- Operational Overhead: Managing a distributed cache infrastructure requires additional resources for setup, configuration, monitoring, and maintenance, which can increase operational costs.
- Eviction Policies: The effectiveness of caching can be impacted by how eviction policies are managed.
Key Concepts
- Eviction Policy: Determines which items are removed when the cache is full. Common policies include:
- Least Recently Used (LRU): Evicts the least recently accessed items. The most commonly used policy, and memory-efficient.
- First In, First Out (FIFO): Evicts items in the order they were added.
- Least Frequently Used (LFU): Removes items that are least frequently accessed.
- Cache Patterns (a minimal cache-aside sketch appears after this Key Concepts list):
- Cache-Aside (Lazy Loading): The application explicitly retrieves data from the cache. If the data is not found (cache miss), it fetches it from the underlying data store and populates the cache for future requests.
- Use Case: Ideal for read-heavy applications like e-commerce sites where product details are requested frequently but change infrequently.
- Pros:
- Reduces load on the database by caching only the necessary data.
- Simple implementation; the cache is populated only when needed.
- Cons:
- Cache may become stale if not updated frequently.
- Initial read requests may incur higher latency due to the need to fetch from the database.
- Write-Through Cache: The application writes data to both the cache and the data store simultaneously, ensuring that both are always in sync.
- Use Case: Suitable for applications requiring strong consistency, such as financial systems where real-time data accuracy is critical.
- Pros:
- Ensures that the cache and data store are always consistent.
- Reduces the risk of stale data in the cache.
- Cons:
- Higher latency due to the additional write operation.
- Can lead to performance bottlenecks during write-heavy periods.
- Cache Invalidation: The application invalidates the cache entry when the underlying data store is updated, ensuring that stale data is promptly removed when changes occur.
- Use Case: Suitable for applications where the data store is updated frequently, such as a social media platform where new posts are constantly being added.
- Pros:
- Helps maintain the accuracy and relevance of the data in the cache.
- Reduces the risk of stale data in the cache.
- Cons:
- Complexity in managing invalidation strategies and synchronization.
- Can lead to performance bottlenecks during write-heavy periods.
- Data Structure: Be explicit about the data stored in the cache and the data structure used (e.g., sorted sets for lists of events) to optimize retrieval and processing.
- Hashes: Used for storing key-value pairs, allowing for efficient retrieval of specific data points.
- Sorted Sets: In Redis, a sorted set is a data structure that stores elements associated with a score, keeping the elements ordered by that score.
Use Cases:
- Leaderboard: Create leaderboards where each score represents a user's points or ranking. The sorted set keeps the scores in order, allowing quick retrieval of the top scorers.
- Time-Series Data: Store time-series data in sorted order by timestamp. Allows for efficient range queries, such as retrieving data points within a certain time frame.
- Priority Queue: Sorted sets can function as priority queues where items with higher scores (priority) are served before those with lower scores.
- Geospatial Indexing: Store geographic coordinates in sorted order by geohash. Allows for efficient range queries and proximity searches, such as retrieving all points within a certain distance of a specified location.
Inner Implementation:
- A hash table maps each element to its score.
- A skip list stores the elements in sorted order by score.
- Skip lists offer average-case O(log N) time complexity for search, insert, and delete operations, making ordered retrieval efficient.
- For range queries, the time complexity is O(log N + M), where N is the number of elements in the sorted set and M is the number of elements in the requested range.
- The skip list consists of a base sorted linked list plus multiple levels of indexes; each higher level is built by skipping every other node in the level below.
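As referenced under Cache Patterns, here is a minimal cache-aside sketch with redis-py; the key naming, 5-minute TTL, and `fetch_user_from_db` helper are illustrative assumptions.

```python
# Sketch: cache-aside (lazy loading) with Redis.
import json
import redis

r = redis.Redis()

def fetch_user_from_db(user_id: str) -> dict:
    return {"id": user_id, "name": "Alice"}  # stand-in for a real query

def get_user(user_id: str) -> dict:
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached:                           # cache hit
        return json.loads(cached)
    user = fetch_user_from_db(user_id)   # cache miss: go to the database
    r.setex(key, 300, json.dumps(user))  # populate with a TTL to limit staleness
    return user
```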
Common Solutions
- Redis: A popular in-memory data structure store that supports various data structures and provides a rich set of commands for data manipulation.
- Memcached: A simple key-value store that primarily supports strings and binary objects.
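As an example of the sorted-set use cases discussed above, a small leaderboard sketch with redis-py; key and member names are illustrative.

```python
# Sketch: leaderboard on a Redis sorted set.
import redis

r = redis.Redis()
r.zadd("leaderboard", {"alice": 120, "bob": 95, "carol": 150})
r.zincrby("leaderboard", 10, "bob")  # bob scores 10 more points

# Top 3 players, highest score first: O(log N + M) thanks to the skip list.
top = r.zrevrange("leaderboard", 0, 2, withscores=True)
```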
CDN & Edge Caching
A Content Delivery Network (CDN) is a system of distributed servers designed to deliver content to users based on their geographic location, improving load times and user experience. CDNs cache content, such as static files (images, videos, HTML) and dynamic content (like API responses), on servers closer to users. When a user requests content, the CDN serves it from the nearest server if available; if not, it retrieves it from the origin server, caches it, and then delivers it.
Edge caching is a technique that caches content at the "edge" of the network, on servers located geographically closer to end users. It is a fundamental technique used in CDNs; however, CDNs encompass more than edge caching, including routing algorithms, encryption, load balancing, and security features.
Key Points
- CDNs enhance performance by reducing latency and improving load times for global users.
- They cache not only static assets but also dynamic content that changes infrequently, such as blog posts.
- CDNs can cache API responses, alleviating server load and boosting API performance.
- Eviction policies manage cached content, determining when to remove items based on rules like time-to-live (TTL) or content changes.
Challenges
- Operational Overhead: Managing a CDN infrastructure requires ongoing maintenance, including monitoring, updating, and scaling the network as needed.
- Cache Invalidation: CDNs must manage cache invalidation to ensure that stale data is not served to users. This can be complex and may lead to trade-offs between consistency and performance.
- Cost: CDNs can be expensive, especially for high-traffic applications, due to the need for a global network of servers and the associated operational costs.
Distributed Lock
Distributed locks are mechanisms used to temporarily lock resources across different systems or processes, ensuring that only one process can access a resource at a time. They are particularly useful in scenarios where multiple users or systems might try to access the same resource simultaneously, such as in ticket sales or e-commerce.
Traditional databases with ACID properties use transaction locks to keep data consistent, which is great, but they're not designed for longer-term locking. This is where distributed locks come in handy.
Typically implemented using distributed key-value stores like Redis or Zookeeper, distributed locks utilize atomic operations to ensure that a resource can only be locked by one process at a time. For example, setting a key (e.g., ticket-123) to a "locked" state prevents other processes from acquiring the same lock until it is released.
Key Concepts
- Locking Mechanisms: Ensure that only one process can acquire the lock at a time. Be familiar with implementations like Redlock, which uses multiple Redis instances for safety and consistency.
- Lock Expiry: Distributed locks can be configured to expire after a certain period, which helps avoid situations where a lock remains active indefinitely due to process crashes. This feature ensures that resources can be reclaimed after a timeout, allowing other processes to acquire the lock.
- Locking Granularity: Distributed locks can be used to lock a single resource or a group of resources. For example, a distributed lock can be used to lock a single ticket or a group of tickets.
Challenges
- Deadlocks: When multiple processes attempt to acquire locks on multiple resources, they may end up in a deadlock situation where each process is waiting for the other to release a lock.
Mitigation:
- Timeouts: Setting a timeout for the lock ensures that the lock will be released if the process crashes.
- Try-Lock: Implement a "try-lock" technique that allows a process to attempt to acquire a lock without waiting. If it fails to acquire the lock, the process can then back off, release other locks, and retry.
- Lock-Free: Where feasible, design algorithms that do not require locks. Instead use alternative concurrency control methods, such as optimistic concurrency control (OCC).
- Lock Hierarchy: Establish a strict order in which locks must be acquired. Ensure that all processes acquire locks in the same predefined sequence.
- Deadlock-Detection: Implement mechanisms to detect and resolve deadlocks. For example, use a centralized service to monitor lock requests and detect when a deadlock is imminent.
- Performance: Distributed locks can introduce latency, as they require network communication to coordinate locks across different nodes.
- Consistency: Distributed locks must ensure consistency across different nodes, which can be difficult to achieve in highly concurrent systems.
- Scalability: Ensuring that distributed locks are scalable and fault-tolerant can be challenging, especially in large-scale systems with many nodes.
- Complexity: Implementing and managing distributed locks can be complex, requiring careful consideration of failure modes and ensuring that locks are released in all cases, even in the event of system failures.
Common Use Cases
- E-Commerce Checkout: Locking high-demand items in a user’s cart during checkout to prevent double-selling.
- Ride-Sharing Matchmaking: Locking a driver to a rider request to avoid multiple matches.
- Distributed Cron Jobs: Ensuring that scheduled tasks are executed by only one server at a time to avoid duplication.
- Online Auction Bidding: Briefly locking an item during the final moments of bidding to process bids without conflicts.
Common Solutions
- Redis: A popular distributed key-value store that supports distributed locks in various approaches.
- SETNX and TTL (Single Redis Instance): SETNX ("set if not exists") sets a key only if it does not already exist, and a TTL (Time To Live) bounds how long the lock is held (see the sketch after this list).
- Pros and Cons: Simple implementation, but not fault-tolerant.
- Redlock (Multiple Redis Instances): A distributed locking algorithm that uses multiple Redis instances for safety and consistency. A client must acquire the majority of locks (e.g., at least 3 out of 5) within a short timeframe to ensure the lock is valid.
- Pros and Cons: More complex setup, but provides fault-tolerance and consistency.
- AWS DynamoDB: A distributed database that supports distributed locking through the DynamoDB lock client, which uses a persistent table with additional features like automatic heartbeating.
- Zookeeper: A distributed coordination service that provides distributed locking through its coordination primitives (ephemeral, sequential znodes). Its replicated, file-system-like namespace stores the locks, providing durability and fault tolerance.
Compute Options
In system design, compute options refer to the different ways to execute code in a system. Here are some of the most common compute options:
Containers
Containers are similar to VMs in that they provide an isolated environment for running code, but they are much more lightweight and faster to start up. Let's break down the key differences between VMs and containers:
- Isolation: Containers share the kernel of the host machine, while VMs have their own kernel and virtual hardware.
- Resource Utilization: Containers are more resource-efficient than VMs, as they do not need to run a full virtual machine.
- Lightweight: Containers are much lighter than VMs; they include only the application and its dependencies, so they can be started and stopped much faster.
When it comes to production, containers are often used in conjunction with orchestration tools like Kubernetes or ECS to manage the lifecycle of the containers.
Pros:
- Cost-effective for steady workloads; costs can be reduced further by using spot instances.
- Better suited for long-running processes, since they can maintain application state.
Cons:
- More operational overhead compared to serverless, due to managing the lifecycle of the containers and the orchestration platform.
- Cannot scale as elastically as serverless.
Serverless
Serverless functions like AWS Lambda are small, stateless, event-driven functions that run in response to triggers (e.g., an HTTP request). They are managed by a cloud provider and automatically scale up or down based on demand, making them a great option for bursty or unpredictable workloads.
Pros:
- Minimal operational overhead, as the cloud provider manages the infrastructure.
- Ideal for short-lived tasks under 15 minutes.
- Automatically scales up or down based on demand.
- No need to manage the lifecycle of the compute resources.
Cons:
- Has cold start time, which can introduce latency for the first invocation.
- Has resource limits, which can impact the performance of long running or CPU intensive tasks.
- More expensive than containers for steady workloads.
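For context, this is the general shape of a minimal Lambda handler behind an HTTP trigger such as API Gateway; the event fields shown are illustrative.

```python
# Sketch: minimal AWS Lambda handler for an HTTP trigger.
import json

def handler(event, context):
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```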
Push Notification
Push notifications are a way to send messages to users from a server to their devices. They are particularly useful for real-time communication, such as in chat applications or social media platforms.
3rd Party Push Notification Services
- Apple Push Notification Service (APNS): A messaging service provided by Apple to push notifications to iOS devices.
- Firebase Cloud Messaging (FCM): A messaging service provided by Google to push notifications to Android devices.
- SMS Service: 3rd party services like Twilio to send SMS notifications to users.
- Email Service: 3rd party services like Resend to send email notifications to users.