System Design Advanced Technologies

A guide to advanced technologies commonly used in system design.

Database Transactions

A transaction is a sequence of one or more database operations that are treated as a single logical unit of work. It ensures that all operations within the transaction are completed successfully before committing any changes to the database. If any operation fails, the entire transaction can be rolled back, leaving the database in its previous state.
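
As a concrete sketch using Python's built-in sqlite3 module (the accounts table and amounts are illustrative), note how a failure anywhere in the block leaves no partial update behind:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
    conn.commit()

    try:
        with conn:  # BEGIN; COMMIT on success, ROLLBACK if the block raises
            conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
            conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
            raise RuntimeError("simulated failure before commit")
    except RuntimeError:
        pass

    # Both updates were rolled back; the database is back in its previous state.
    print(conn.execute("SELECT id, balance FROM accounts").fetchall())  # [(1, 100), (2, 0)]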

ACID Properties

When a database supports transactions, it typically means that the implementation adheres to the ACID properties: for a transaction to be considered robust and reliable, it must comply with ACID.

  • Atomicity: Ensures that all operations in a transaction are completed successfully; if any operation fails, the entire transaction fails and is rolled back.
  • Consistency: Guarantees that a transaction brings the database from one valid state to another, maintaining all defined rules and constraints (e.g., integrity constraints).
  • Isolation: Ensures that concurrently executed transactions do not interfere with each other, providing a clear and consistent view of the data.
  • Durability: Ensures that once a transaction is committed, it remains so, even in the event of a system failure.

ANSI Isolation Levels

Transaction isolation levels determine how and when the changes made by one transaction become visible to other transactions, and how competing transactions are managed. Different isolation levels provide varying balances of performance and consistency, impacting how transactions interact with one another.

  • Read Uncommitted: Transactions can read data that has been modified but not yet committed by other transactions.

    • Pros: Offers the highest level of concurrency and performance.
    • Cons: Can lead to dirty reads, where transactions read uncommitted data that may be rolled back later.
  • Read Committed: A transaction can only read data that has been committed at that moment. It cannot see uncommitted changes made by other transactions.

    • Pros: Prevents dirty reads, ensuring that only committed data is read.
    • Cons: Does not protect against non-repeatable reads or phantom reads, where the data read might change if read again within the same transaction.
  • Repeatable Read: Ensures that if a transaction reads a row, it can read that row again and receive the same values, even if other transactions are modifying it.

    • Pros: Prevents both dirty reads and non-repeatable reads.
    • Cons: May still allow phantom reads, where new rows added by other transactions can be seen on subsequent reads.
  • Serializable: The highest isolation level, where transactions are executed in such a way that they appear to be serial, meaning one transaction is completely executed before another begins.

    • Pros: Prevents dirty reads, non-repeatable reads, and phantom reads.
    • Cons: Most restrictive in terms of concurrency and can lead to performance issues due to increased locking and contention.

Common Default Isolation Levels

  • PostgreSQL: Default is Read Committed.
  • MySQL (InnoDB): Default is Repeatable Read.
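
A sketch of raising the isolation level for a single transaction, assuming PostgreSQL with the psycopg2 driver (the DSN and accounts table are illustrative):

    import psycopg2  # assumes a reachable PostgreSQL instance

    conn = psycopg2.connect("dbname=app user=app")
    with conn, conn.cursor() as cur:
        # Upgrade this transaction from PostgreSQL's Read Committed default.
        # SET TRANSACTION must be the first statement in the transaction.
        cur.execute("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ")
        cur.execute("SELECT balance FROM accounts WHERE id = %s", (1,))
        first = cur.fetchone()
        # A second read inside this transaction now sees the same value,
        # even if another transaction commits a change in between.
        cur.execute("SELECT balance FROM accounts WHERE id = %s", (1,))
        assert cur.fetchone() == first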

Locking Mechanisms

Locking Mechanisms are techniques used in databases to control concurrent access to data, ensuring data integrity and consistency during transactions. They can be broadly categorized into pessimistic and optimistic locking strategies.

  • Pessimistic Locking: This strategy involves locking the data being modified before the transaction begins, preventing other transactions from accessing it until the lock is released.
    • Subtypes:
      • Row-Level Locking: Locks individual rows in a table.
      • Table-Level Locking: Locks the entire table, preventing access to all rows in the table.
    • Pros & Cons:
      • Pros:
        • Ensures data integrity and consistency, preventing race conditions.
        • Easy to implement, and it avoids conflicts by serializing updates.
        • Very useful when data contention is heavy.
      • Cons:
        • Can lead to deadlocks and reduced concurrency.
        • Can be less efficient than optimistic locking, especially in low-concurrency environments.
      • Best Practices:
        • By locking only the necessary rows, more transactions can proceed concurrently.
        • Shorter lock durations reduce the risk of deadlocks and improve system responsiveness.
  • Optimistic Locking: A locking strategy where transactions proceed without acquiring locks initially. Before committing, the transaction checks whether the data has been modified by other transactions. If a conflict is detected, the transaction is rolled back.
    • Subtypes:

      • Versioning Check-and-Set (CAS): Each row has a version number that increments with each update. Transactions check this version number during commit.
      • Timestamping: Each row has a timestamp that is updated with each modification. Transactions check this timestamp during commit. However, server clock synchronization is required to ensure accurate timestamp comparison.
    • Pros & Cons:

      • Pros:
        • Higher concurrency and performance, as it avoids unnecessary locking, especially in low-contention environments.
        • Reduces the risk of deadlocks since locks are not held.
      • Cons:
        • Performance can degrade in high-contention due to frequent retries.
        • Requires additional logic to handle conflict detection and resolution.
    • Best Practices:

      • Use optimistic locking in low-conflict scenarios to maximize performance.
      • Monitor conflict rates to determine whether optimistic locking is appropriate for the workload.
    • Summary Table

    | Feature | Pessimistic Locking | Optimistic Locking |
    | --- | --- | --- |
    | Definition | Locks data upfront to prevent access | Checks for conflicts before commit |
    | Sub-Locking Types | Row-level, Table-level, Shared, Exclusive | Versioning, Timestamp-based |
    | Pros | Guarantees data integrity | Higher concurrency, no deadlocks |
    | Cons | Reduced concurrency, potential for deadlocks | Possible rollbacks, conflict overhead |
    | Best Practices | Use row-level locks, implement timeouts | Monitor conflicts, provide feedback |
    | Use Cases | High-conflict environments (e.g., banking) | Low-conflict (e.g., read-heavy, collaborative tools) |
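
As a concrete sketch of the optimistic column above, a version-number compare-and-set (CAS) using Python's sqlite3 (the docs table is hypothetical); the pessimistic counterpart on databases that support it would be SELECT ... FOR UPDATE:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, version INTEGER)")
    conn.execute("INSERT INTO docs VALUES (1, 'draft', 1)")
    conn.commit()

    def save(doc_id, new_body, read_version):
        # Compare-and-set: the UPDATE only succeeds if nobody bumped the
        # version since we read the row.
        cur = conn.execute(
            "UPDATE docs SET body = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_body, doc_id, read_version),
        )
        conn.commit()
        return cur.rowcount == 1  # False means conflict; caller re-reads and retries

    assert save(1, "edit A", read_version=1)      # wins
    assert not save(1, "edit B", read_version=1)  # stale version, rejected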

Database Constraints

Constraints are rules applied to columns in a table to ensure the integrity and validity of the data. They enforce specific conditions that data must adhere to when being inserted or updated, ensuring that the data maintains certain standards.

  • Types of Constraints

    | Constraint Type | Description |
    | --- | --- |
    | Primary Key | Ensures that each row in a table has a unique identifier. |
    | Foreign Key | Ensures that a value in one table corresponds to a value in another table. |
    | Unique | Ensures that all values in a column are unique. |
    | Check | Ensures that the values in a column meet certain conditions. |
    | Not Null | Ensures that a column cannot contain null values. |

Constraints and transactions are closely related. Constraints help ensure the integrity and consistency of the data being manipulated by transactions.

  • Enforce Data Integrity: Ensure that the data being inserted, updated, or deleted in a transaction adheres to the defined rules and constraints.
  • Rollback on Failure: If a transaction attempts to perform an operation that violates a database constraint, the transaction will be rolled back, and the database will remain in a consistent state.
  • Performance Considerations: While database constraints are essential for maintaining data integrity, they can also impact the performance of the database, especially for high-concurrency workloads.
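
A small sketch of these constraint types using Python's sqlite3 (the schema is illustrative; note that SQLite enforces foreign keys only when the pragma is enabled):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,  -- Primary Key
        email TEXT NOT NULL UNIQUE  -- Not Null + Unique
    );
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),  -- Foreign Key
        amount  INTEGER CHECK (amount > 0)              -- Check
    );
    """)
    conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")
    try:
        conn.execute("INSERT INTO orders VALUES (1, 999, 10)")  # no such user
    except sqlite3.IntegrityError as e:
        print("rejected:", e)  # the violating statement is refused, keeping data consistent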

Database Indexes

Indexes are data structures used to optimize the performance of database queries by allowing the database to quickly locate specific rows based on the values of one or more columns.

  • Types of Indexes

    | Index Type | Description |
    | --- | --- |
    | Primary Key | A unique index on a single column that serves as the primary key for the table. |
    | Unique Index | Ensures that all values in a column are unique. |
    | Composite Index | An index on multiple columns, allowing for efficient queries that involve combinations of those columns. |
    | B-Tree Index | A balanced tree structure that supports efficient range queries on one-dimensional data (e.g., WHERE column BETWEEN value1 AND value2), equality queries (e.g., WHERE column = value), and sorting (e.g., ORDER BY column). |
    | Hash Index | Uses a hash function to quickly locate rows based on the values of one or more columns; efficient for equality queries (e.g., WHERE column = value). |
    | Bitmap Index | Uses bitmaps to quickly locate rows based on the values of one or more columns. |
    | Spatial Index | Used for geospatial data, allowing for efficient queries that involve spatial relationships. |
    | Full-Text Index | Used for full-text search, allowing for efficient queries that involve searching for specific text patterns. |
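
A small composite-index sketch using Python's sqlite3 (the schema is illustrative); EXPLAIN QUERY PLAN shows whether the index is actually used:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, created_at TEXT, payload TEXT)")
    # Composite index: serves queries filtering on (user_id) or (user_id, created_at),
    # but not queries filtering on created_at alone (leftmost-prefix rule).
    conn.execute("CREATE INDEX idx_events_user_time ON events (user_id, created_at)")

    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = ? AND created_at > ?",
        (42, "2024-01-01"),
    ).fetchall()
    print(plan)  # the plan should mention idx_events_user_time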

Full Text Search

A search-optimized database is designed specifically for full-text search, allowing users to efficiently search through large volumes of text data. Traditional databases rely on exact matches (e.g., LIKE queries), which are slow and inefficient because they require scanning the entire table; search-optimized databases instead use techniques such as inverted indexes, tokenization, and stemming to enhance search performance. Use cases for full-text search include Ticketmaster, which needs to search through a large number of events to find relevant results, or a social media platform like Twitter, which needs to search through a large number of tweets.

Key Features

  • Inverted Indexes: A data structure mapping words to documents, enabling quick lookups for search terms.
    {
      "running": [doc1, doc2, doc3],
      "jumping": [doc1, doc3],
      "jumper": [doc2]
    }
    
  • Tokenization: The process of breaking text into individual words for mapping in the inverted index.
  • Stemming: Reducing words to their root forms (e.g., "running" to "run"), which allows different forms of the same word to match (a minimal build sketch follows this list).
  • Fuzzy Search: Allows for finding similar terms, accommodating typos and slight misspellings through algorithms that measure edit distances.
  • Relevance Scoring: Assigns a score to each result based on how well it matches the search query, allowing you to prioritize results based on relevance.
  • Scalability: Designed to scale horizontally across many servers, accommodating large amounts of data and high traffic loads through techniques like sharding and consistent hashing.
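
As referenced above, a toy sketch of tokenization, stemming, and inverted-index construction; the stem function is a crude stand-in for a real stemmer such as Porter:

    from collections import defaultdict

    def stem(word):
        # Toy stemmer: strip common suffixes, collapse a doubled final consonant.
        for suffix in ("ing", "ed", "er", "s"):
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                if len(word) > 2 and word[-1] == word[-2]:  # running -> runn -> run
                    word = word[:-1]
                return word
        return word

    def build_inverted_index(docs):
        index = defaultdict(set)
        for doc_id, text in docs.items():
            for token in text.lower().split():  # tokenization
                index[stem(token)].add(doc_id)  # stemmed term -> posting list
        return index

    docs = {"doc1": "running and jumping", "doc2": "the jumper runs", "doc3": "jumping high"}
    index = build_inverted_index(docs)
    print(sorted(index["run"]))   # ['doc1', 'doc2'] since "running" and "runs" both stem to "run"
    print(sorted(index["jump"]))  # ['doc1', 'doc2', 'doc3']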

Examples

  • Elasticsearch: The leading search optimized database, built on Apache Lucene, known for its speed and scalability, used by companies like Netflix and Uber.
  • Solr: A search platform built on Apache Lucene, known for its scalability and reliability, used by companies like Walmart and eBay.
  • Postgres: Offers GIN index for full-text search.
  • Redis: Offers full-text search capabilities, but is considered less mature.

Geospatial Search

Geospatial search is a technique for searching data based on its location. It is used in applications such as Yelp, which needs to find restaurants near a specific location, and Uber, which needs to find drivers near a rider.

Problems with using traditional index:

  • A traditional B-Tree index is efficient for one-dimensional data, but not for two-dimensional data.
  • Geospatial data is two-dimensional (latitude and longitude); even with an index on each dimension, the datasets returned per dimension could still be huge, and intersecting these two datasets can be quite slow. Thus we need a way to map the two-dimensional data into one dimension.

Type of Geospatial Index

In a broad sense, there are two types of geospatial indexing approaches: hash-based and tree-based. Even though the underlying implementations of these approaches differ, the high-level idea is the same: divide the map into smaller areas and then build indexes for fast search.

  • Google S2: A tree-based index structure that uses a Hilbert curve to map two-dimensional data to one dimension; it's widely used at companies like Google, Uber, and Tinder.

  • R-Tree: A tree-based index structure optimized for indexing multi-dimensional information (like geographic data) using bounding rectangles for efficient spatial queries.

    • Examples:

      • PostGIS: Uses an R-tree as its geospatial index.
  • Quadtree: A tree-based index structure that recursively partitions a two-dimensional space into four quadrants until each quadrant contains no more than a certain number of locations.

    • Only the leaf nodes store the actual location data.

    • A quadtree is an in-memory data structure; it is built at server startup time.

    • A quadtree index does not take much memory (a few GBs) and can easily fit on one server.

    • To find nearby locations, we start from the root and traverse the tree until we find the leaf node where the search location is. If there are enough locations in the leaf node, we can return them directly. Otherwise, we visit its neighboring nodes recursively until we find enough locations.

    • Pros:
      • Dynamic grid size that adjusts to population density. Well suited for fetching the k nearest locations, whether or not they fall within a specified radius.
    • Limitations:
      • Operational Overhead:
        • Updating the quadtree index is more expensive than updating a geohash index. In general, an index update takes O(log n) time, and if the index is accessed by a multi-threaded program it requires proper locking to avoid race conditions. Moreover, rebuilding the quadtree is time-consuming. Mitigation: perform a rolling update, incrementally rebuilding the quadtree on a small portion of servers at a time across the cluster; some servers will return stale data for a short period, which is generally acceptable for most applications. We can set up a business agreement with clients that new locations only become available after a certain period of time, or update the quadtree via a cron job during off-peak hours.
      • Not enough locations in the grid:
        • The current grid may not contain enough locations, leading to inaccurate results. Mitigation: increase the search radius by exploring the neighboring grids. Two options for finding neighboring grids:
          • Connect all leaf nodes with double-linked list.
          • Keep a parent pointer in each node.
    • Examples:
      • Elasticsearch: Uses both geohash and quadtree as geospatial indexes.
  • Geohash: A hash-based index structure that encodes geographic coordinates (latitude and longitude, two-dimensional data) into a short string (one-dimensional data) of characters.

    • Geohash usually uses base32 encoding, a 5-bit scheme that maps onto a set of 32 characters (0–9 and b–z, excluding "a", "i", "l", "o" to avoid confusion). Each group of 5 bits becomes one character in the geohash string (a minimal encoder sketch appears at the end of this section).

    • The length of the geohash string determines the precision. Longer strings provide more specific locations, while shorter ones cover larger areas. Normally, we are only interested in geohashes with lengths between 4 and 6.

    • Nearby locations share common prefixes in their geohash strings, making it efficient to find nearby points using string matching: the longer the shared prefix between two geohash strings, the closer the two locations are.

    • Pros:

      • Easy to use and implement; no need to build a tree.
      • Updating the index is cheap: we just update the geohash of the location in the index.
    • Limitations:

      • Fixed Grid Size:
        • Because the precision (level) of a geohash is fixed, the size of the grid is fixed as well; it cannot dynamically adjust to population density.
      • Boundary Issues:
        • Two locations can be very close to each other yet share no prefix at all because they sit on different sides of a grid boundary. Mitigation: when fetching nearby locations, search not only the current grid but also the neighboring grids. The geohashes of the neighboring grids can be calculated in constant time.
      • Not enough locations in the grid:
        • The current grid and its neighboring grids may not contain enough locations, leading to inaccurate results. Mitigation: increase the search radius by removing the last character of the geohash string, then use the new geohash to find nearby locations. We can repeat this process until enough locations are found.
    • Examples:

      • Redis: Utilize Geohash in conjunction with its sorted set data structure to implement efficient geospatial search.

        • Redis encodes geographic coordinates into a geohash.
        • Redis stores each geospatial point in a sorted set, where the member is the location's identifier and the score is the geohash-encoded coordinate.
        • The sorted set allows Redis to efficiently perform range queries and proximity searches, such as finding all points within a certain distance from a specified location.
      • Elasticsearch:

        • Geospatial Index Types:

          • Geo-point: A single geographic point on the earth's surface, represented by latitude and longitude coordinates. When a geo-point is indexed, its coordinates are converted into a geohash that aids spatial indexing.

            • Use Case: Point-based proximity search, such as finding all points within a certain distance from a specified location.
            • Query Types:
              • geo_distance: Find points within a certain distance from a specified location.
              • geo_bounding_box: Find points within a rectangular bounding box.

          • Geo-shape: A polygon on the earth's surface, represented by a list of coordinates.

            • Use Case: Spatial relationship checks, such as finding all points within custom boundaries.
            • Query Types:
              • geo_polygon: Find points within a polygon.
              • geo_shape: Find points within a complex shape.

        How to allow search by predefined location names such as city names or neighborhood names, e.g., "Pizza in San Francisco"?

          1. Create an index or table that maps predefined location names to their polygon coordinates. These data can be obtained from open data sources. In Elasticsearch, we can use a geo-shape index to store this info.
          2. When a user searches for "Pizza in San Francisco", we can use the predefined location name to perform a query that finds all points within the polygon of San Francisco. Both Postgres (via the PostGIS extension) and Elasticsearch have functionality for working with polygons. In Elasticsearch, we can use a geo_polygon query to find all points within the polygon.

          Further Optimization:

          Instead of filtering on polygon for each request, we can pre-compute the areas for each location upon creation and store them as a list of location identifiers in our location table. Example location document:

          {
            "id": "123",
            "name": "Pizza Place",
            "location_names": ["bay_area","san_francisco", "mission_district"],
            "category": "restaurant"
          }
          

          Now we can leverage the inverted index on the location_names to quickly find all locations within the predefined location names.

        The combination of geohashing and the inverted index allows Elasticsearch to efficiently handle spatial queries and search by geographic parameters.
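
As referenced earlier, a minimal sketch of base32 geohash encoding; production systems would use a library, but the bit interleaving below matches the description above (coordinates and precision are illustrative):

    BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

    def geohash(lat, lon, precision=6):
        lat_lo, lat_hi = -90.0, 90.0
        lon_lo, lon_hi = -180.0, 180.0
        chars, bits, ch, even = [], 0, 0, True  # even-numbered bits refine longitude
        while len(chars) < precision:
            if even:
                mid = (lon_lo + lon_hi) / 2
                ch = (ch << 1) | (lon >= mid)  # 1 if in the upper half, else 0
                lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                ch = (ch << 1) | (lat >= mid)
                lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
            even, bits = not even, bits + 1
            if bits == 5:  # every 5 bits become one base32 character
                chars.append(BASE32[ch])
                bits, ch = 0, 0
        return "".join(chars)

    sf = geohash(37.7749, -122.4194)      # San Francisco
    nearby = geohash(37.7793, -122.4193)  # a few blocks away
    print(sf, nearby)  # both start with the shared prefix 9q8yy, so prefix matching finds them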

Distributed Systems Architecture

Distributed systems involve organizing nodes in a system to manage data and requests efficiently across multiple machines. Two primary architectures are commonly used: Master-Slave (Leader-Follower) Architecture and Decentralized Peer-to-Peer Architecture.

  • Master-Slave Architecture: A master-slave architecture is a design pattern where a single master node is responsible for managing the data and a set of slave nodes are responsible for reading the data.

    • How it works:
      • Master Node: Acts as the coordinator, maintaining the authoritative state and managing cluster configuration.
      • Slave Nodes: Replicate data from the master and handle client requests. Write operations are forwarded to the master for processing.
      • Failover: If the master fails, a new master is elected from the slaves.
    • Pros:
      • Centralized control simplifies consistency management.
      • Efficient handling of failover scenarios.
      • Easy to implement and understand.
    • Cons:
      • Single point of failure (the master node).
      • Performance bottlenecks if the master becomes overloaded.
    • Common Use Cases:
      • Relational databases (e.g., MySQL, PostgreSQL)
      • Distributed caching systems (e.g., Memcached, Redis)
  • Decentralized Peer-to-Peer Architecture: A decentralized peer-to-peer architecture is a design pattern where each node is independent and self-contained, without a central master node.

    • How it works:
      • Equal Nodes: All nodes operate as peers with equal roles, sharing complete copies of the data.
      • Consensus Protocols: Updates are synchronized across all nodes using protocols like Raft or Paxos, which maintain data consistency and fault tolerance.
      • Dynamic Leadership: Leadership is elected dynamically for consensus, and client requests can be directed to any available node.
    • Pros:
      • High availability and scalability, with no single point of failure.
      • Robust against node failures, as remaining nodes can continue operating.
    • Cons:
      • More complex implementation and management due to the need for consensus protocols.
      • Requires mechanisms for conflict resolution to maintain data consistency across nodes.
    • Common Use Cases:
      • NoSQL databases (e.g., Apache Cassandra, DynamoDB)
      • Messaging systems (e.g., Apache Kafka)
      • Search engines (e.g., Elasticsearch)

Load Balancer

  • L4 Load Balancer: L4 load balancers are preferred for applications where performance and low latency are critical, particularly for protocols that maintain long-lived connections. They are efficient in handling high traffic volumes without the overhead of inspecting request content.
    • Key Concepts:
      • Operate at the Transport Layer (TCP/UDP).
      • Make routing decisions based on IP addresses and ports without inspecting packet content.
      • Best suited for protocols requiring persistent connections, like WebSockets and WebRTC.
  • L7 Load Balancer: L7 load balancers excel in scenarios where application-level routing is necessary, enabling more advanced features like A/B testing, user session persistence, and traffic management based on request characteristics. They are particularly useful for web applications that rely heavily on HTTP.
    • Key Concepts:
      • Operate at the Application Layer, understanding protocols like HTTP.
      • Can inspect the content of requests (e.g., URL, headers, cookies) to make intelligent routing decisions.

Client Realtime Updates

Simple Polling

  • How it works: Simple polling involves the client making regular HTTP requests to the server to check for updates. The client sends a request at fixed intervals (e.g., every 2 seconds), and the server responds with the current state of the data.

  • Load Balancer for Polling:

    • Polling is stateless, so a Layer 7 load balancer with round robin or least connections is usually sufficient.
  • Pros:

    • Simplicity: Easy to implement and understand.
    • Stateless: Does not require maintaining session state.
  • Cons:

    • Latency: Updates can be delayed by the polling interval plus processing time.
    • Limited Frequency: The update rate is constrained by the polling interval.
    • Resource Intensive: Can strain server resources with many clients, as each request may require establishing new connections.
  • When to Use:

    • Real-time updates are not critical.
    • The need for updates is short-lived.
    • Systems where lower-frequency updates suffice, as in certain scenarios on platforms like Leetcode.
  • Discussion Points in Interview:

    • Trade-offs: Clearly explain the decision to use polling over more complex methods, emphasizing its simplicity and how it allows focusing on other system components.
    • Addressing Objections: Be prepared to discuss concerns about speed and efficiency. Justify your chosen polling frequency and suggest optimizations, such as using HTTP keep-alive connections to minimize connection overhead.
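
A minimal client-side polling sketch using the requests library (the URL, since parameter, and response shape are assumptions, not a real API):

    import time
    import requests

    def poll_for_updates(url, interval_seconds=2):
        last_seen = None
        while True:
            resp = requests.get(url, params={"since": last_seen})
            for update in resp.json().get("updates", []):
                last_seen = update["id"]
                print("got update:", update)
            # Worst-case latency is roughly interval + processing time.
            time.sleep(interval_seconds)

    # poll_for_updates("https://api.example.com/updates")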

Long Polling

  • How it works: Long polling enhances the basic polling method to achieve near real-time updates. The client sends an HTTP request to the server, which holds the request open until new data is available. Once the server has new data, it responds to the client, which immediately sends a new request, creating a continuous cycle. If no new data arrives, the server may return an empty response after a timeout, prompting the client to make another request.

  • Load Balancer for Long Polling:

    • Long polling itself is stateless; a Layer 7 load balancer is typically used, with careful attention to connection timeout settings.
    • While not strictly required, sticky sessions are often recommended for long polling. This ensures that a client maintains a consistent connection with the same backend server, which can be important if the server is maintaining some kind of transient state related to the client's request.
  • Pros:

    • Standard HTTP: Leverages existing HTTP infrastructure, making it widely compatible.
    • Stateless: Maintains a stateless server-side architecture.
    • Reduced Bandwidth: Lower usage due to fewer requests.
    • Near Real-Time Updates: Provides more consistent and near real-time updates.
  • Cons:

    • Higher Latency: Can introduce delays, especially for high-frequency updates, due to the round-trip nature of requests.
    • HTTP Overhead: Increased overhead from frequent HTTP requests.
    • Resource Intensive: Can strain server resources with many clients, as each request may require establishing new connections.
  • When to Use:

    • Ideal for applications requiring near real-time updates where updates are infrequent.
    • Particularly useful for scenarios like payment processing, where you need to know the status of a long-running process as soon as it completes.
    • If the latency of simple polling is a concern, long polling provides an effective upgrade with minimal added complexity.
  • Discussion Points in Interview:

    • Infrastructure Compatibility: Emphasize that long polling uses standard HTTP, minimizing the need for additional infrastructure.
    • Polling Frequency: Be specific about the expected polling frequency and ensure that your infrastructure can handle long-lived requests. This is crucial to avoid issues with load balancers timing out connections prematurely (e.g., if they close connections after 30 seconds while your server can hold them for longer).
    • Trade-offs: Discuss the balance between the simplicity of long polling and its limitations, particularly in terms of latency and resource usage.
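
A server-side sketch of holding requests open, assuming aiohttp and Python 3.10+ (the endpoint paths and the 25-second hold are illustrative, chosen to sit below a typical load balancer idle timeout):

    import asyncio
    from aiohttp import web

    new_data = asyncio.Event()
    latest = {"value": None}

    async def publish(request):
        latest["value"] = await request.text()
        new_data.set()    # wake every held long-poll request
        new_data.clear()
        return web.Response(text="ok")

    async def long_poll(request):
        try:
            # Hold the request open until data arrives or we time out.
            await asyncio.wait_for(new_data.wait(), timeout=25)
            return web.json_response({"update": latest["value"]})
        except asyncio.TimeoutError:
            return web.json_response({"update": None})  # client immediately re-polls

    app = web.Application()
    app.router.add_post("/publish", publish)
    app.router.add_get("/updates", long_poll)
    # web.run_app(app)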

Server Sent Events (SSE)

  • How it works: Server-Sent Events (SSE) allow the server to send a continuous stream of data to the client over a single HTTP connection. Instead of closing the connection after sending a response, the server keeps it open and sends data chunks as they become available. This is accomplished using the Transfer-Encoding: chunked header, which indicates that the response consists of a series of unknown-sized chunks.

  • Load Balancer for SSE:

    • SSE is stateful due to the persistent connection.
    • Layer 7 load balancers are preferred because they can manage persistent HTTP connections effectively.
    • Sticky sessions may be required to maintain a consistent connection between the client and the same server throughout the SSE stream.
  • Pros:

    • Standard HTTP: Leverages existing HTTP infrastructure, making it widely compatible.
    • Reduced Overhead: Reduces overhead from connection initiation and teardown compared to long polling.
    • Automatic Reconnection: Clients can automatically reconnect if the connection drops.
    • Built-in Browser Support: Modern browsers natively support SSE through the EventSource object.
    • Real-Time Updates: Provides near real-time updates.
  • Cons:

    • One-Way Communication: Only allows data to flow from the server to the client, not vice versa.
    • Limited Browser Support: While most modern browsers support SSE, some older browsers do not.
    • Proxy Issues: Some proxies and networking equipment may not support streaming responses, complicating debugging.
  • When to Use:

    • SSE is ideal for applications requiring efficient, near real-time updates where the server needs to push data to the client without frequent client responses. It’s particularly suitable for scenarios like AI chat applications that need to stream data (e.g., words or tokens) to maintain a responsive user interface. However, ensure that the entire infrastructure supports streaming responses, as some proxies may not handle it well.
  • Discussion Points in Interview:

    • Infrastructure Compatibility: Highlight that SSE utilizes existing HTTP infrastructure, reducing the need for additional setup.
    • Connection Management: Discuss how SSE connections are typically short-lived (30-60 seconds) and how clients handle reconnections using the "last event ID" to resume data flow seamlessly.
    • Limitations and Considerations: Be prepared to mention the potential issues with proxies and load balancers that may not support streaming, and how this can affect application performance and debugging.
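
A minimal SSE endpoint sketch using Flask (the route, payload, and cadence are illustrative); any framework that can stream a response works:

    import json
    import time
    from flask import Flask, Response

    app = Flask(__name__)

    @app.route("/stream")
    def stream():
        def event_stream():
            for i in range(10):  # stand-in for a real update source
                # SSE wire format: optional "id:" line, "data:" line, blank line.
                yield f"id: {i}\ndata: {json.dumps({'tick': i})}\n\n"
                time.sleep(1)
        return Response(event_stream(), mimetype="text/event-stream")

    # Browser side: new EventSource("/stream") fires a message event per chunk and
    # auto-reconnects, sending the Last-Event-ID header so the server can resume.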

WebSockets

  • How it works: WebSockets provide a persistent, bi-directional communication channel over a single, long-lived connection between the client and the server. This allows for real-time, two-way communication without the overhead of HTTP requests and responses, making them ideal for high-frequency read and write operations.

  • Load Balancer for WebSockets:

    • WebSockets are stateful due to the persistent connection.
    • L4 load balancers support WebSockets natively, since the same TCP connection is reused for all messages.
  • Pros:

    • Full-Duplex: Supports simultaneous read and write operations.
    • Low Latency: Reduces overhead compared to HTTP due to the absence of headers in each message.
    • Efficient for Frequent Messages: Optimized for scenarios with frequent data exchange.
    • Wide Browser Support: Broadly supported by modern browsers.
  • Cons:

    • Complex Implementation: More challenging to implement compared to simpler alternatives.
    • Special Infrastructure: Requires infrastructure that supports WebSocket connections.
    • Stateful Connections: Stateful nature complicates load balancing and scaling.
    • Reconnection Handling: Requires logic to handle connection drops and reconnections.
  • When to Use:

    • WebSockets are the best choice when you need high-frequency, bi-directional communication.
    • However, consider whether you truly need bi-directional communication before opting for WebSockets, as simpler solutions like SSE or polling might suffice for scenarios primarily involving server-to-client data flow. A very common pattern is to have SSE subscriptions for updates and do writes over simple HTTP POST/PUT whenever they occur.
  • Discussion Points in Interview:

    • Connection Management: Be prepared to discuss how you'll manage connections, handle reconnections, and address server restarts during deployments.
    • State Management: Address how to minimize the spread of state across your architecture, especially in senior-level interviews.
    • Scalability: Discuss strategies for scaling WebSocket servers, such as using a "least connections" load balancing strategy and offloading intensive processing to separate, scalable services.
    • Reference Architecture: Consider using a WebSocket service to terminate WebSocket connections early in the infrastructure, which helps manage connection scaling and reduces the need for frequent deployments that churn connections.
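
A minimal broadcast sketch assuming the third-party websockets library (v10+, where a handler receives a single connection argument); the port and fan-out logic are illustrative:

    import asyncio
    import websockets  # pip install websockets

    CONNECTED = set()

    async def handler(ws):
        CONNECTED.add(ws)
        try:
            async for message in ws:  # reads until the client disconnects
                # Full-duplex: fan the message out to every connected client.
                websockets.broadcast(CONNECTED, message)
        finally:
            CONNECTED.discard(ws)  # reconnecting is the client's responsibility

    async def main():
        async with websockets.serve(handler, "localhost", 8765):
            await asyncio.Future()  # run forever

    # asyncio.run(main())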

WebRTC: The Peer-to-Peer Solution

  • How it works: WebRTC enables direct peer-to-peer communication between browsers, making it ideal for video/audio calls and collaborative applications. Clients use a central signaling server to discover peers and exchange connection information. WebRTC uses STUN (Session Traversal Utilities for NAT) for NAT traversal via "hole punching," and TURN (Traversal Using Relays around NAT) as a relay service when direct connections aren't possible.

In practice, the signaling server is relatively lightweight and isn't handling much of the bandwidth as the bulk of the traffic is handled by the peer-to-peer connections. But interestingly the signaling server does effectively act as a real-time update system for its clients (so they can find their peers and update their connection info) so it either needs to utilize WebSockets, SSE, or some other approach detailed above.

  • Setup Process:

      1. Peers connect to a signaling server to discover each other.
      2. Peers exchange connection information (ICE candidates).
      3. A direct peer connection is established, using STUN/TURN if needed.
      4. Audio/video streams or data are sent directly between peers.
  • Pros:

    • Direct Peer Communication: Enables direct connections between clients.
    • Lower Latency: Reduces latency by eliminating intermediary servers for data transmission.
    • Reduced Server Costs: Offloads traffic from servers to peer connections.
    • Native Audio/Video Support: Built-in support for audio and video streaming.
  • Cons:

    • Complex Setup: More complex to set up than WebSockets.
    • Requires Signaling Server: Needs a signaling server for peer discovery.
    • NAT/Firewall Issues: Vulnerable to issues related to NAT and firewalls.
    • Connection Setup Delay: Involves connection setup delays due to NAT traversal.
  • When to Use: WebRTC is suitable for video/audio calls, screen sharing, gaming, and collaborative applications where low latency and direct peer communication are important. It can also be used to reduce server load by having clients communicate directly, as seen in applications like collaborative document editing.

  • Discussion Points in Interview:

    • Justification: Clearly explain why WebRTC is necessary, such as for video conferencing or to handle scale by enabling direct client communication.
    • Infra Requirements: Demonstrate knowledge of infrastructure requirements like STUN/TURN servers and signaling servers.
    • Communication Patterns: Discuss communication patterns between peer clients and synchronization with a central server for data storage or reporting.
    • Trade-offs: Be prepared to discuss the trade-offs of using WebRTC versus other solutions, especially in scenarios with constraints like limited resources.

Client Realtime Updates Overview

To choose the best method for server-to-client event delivery, consider these options:

  • Simple Polling: Use if latency isn't a concern.
  • SSE (Server-Sent Events): Opt for this if you need server-to-client communication but not bi-directional.
  • WebSocket: Choose this for frequent, bi-directional communication.
  • WebRTC: Use this specifically for audio/video calls.

Server Realtime Updates

After choosing a method for server-to-client updates (Simple Polling, Long-Polling, SSE, WebSockets, WebRTC), the next step is to consider how updates propagate from their source to the server. There are three common patterns for triggering these updates:

  • Pulling via Polling: The server periodically checks for updates.
  • Pushing via Consistent Hashes: Updates are pushed to the server using a consistent hashing strategy.
  • Pushing via Pub/Sub: Updates are pushed to the server through a publish-subscribe system.

Pulling via Polling

  • How it works:

    Clients repeatedly ask the server for updates. The server maintains a database of updates and responds to client requests.

  • Pros:

    • Simple to implement.
    • State is constrained to the database.
    • No special infrastructure is needed.
  • Cons:

    • High latency.
    • Excessive database load if updates are infrequent and polling is frequent.
  • When to use:

    • When real-time updates are not required and some delay is acceptable.
  • Discussion Points in Interview:

    • Discuss how you are storing updates and querying the database.
    • Consider the read volume from frequent polling, especially with a large number of clients (e.g., a million clients polling every 10 seconds results in 100k TPS of read volume).

Pushing via Consistent Hashes

  • How it works: This approach ensures updates reach the correct clients, especially with persistent connections (long-polling, SSE, WebSockets). To enable this, you need to ensure that:

    • The server can determine which client is connected to which server:
      • Simple Hashing: Initially, a simple hashing method (using modulo) is applied to assign users to servers. A central service (like Zookeeper or etcd) keeps track of server assignments and user connections. This service helps servers know which user is connected to which server.

        • Connection Process:
          1. When a client connects, it can either connect directly to the appropriate server or randomly connect to any server, which then redirects them to the correct one.
          2. The server looks up the user's hash in the coordination service to determine the correct server and redirects the client accordingly.
      • Consistent Hashing: Addresses the scaling issues of simple hashing by mapping servers onto a ring. This minimizes the number of re-mappings when servers are added or removed.

        • Consistent hashing reduces the dependency on centralized coordination for the hashing process itself.
        • Each node in the system can independently calculate the hash ring and determine where data should be stored or retrieved without relying on a central authority.
        • However, centralized services like ZooKeeper or etcd may still be needed for cluster membership management, failure detection, and configuration management.
          • Cluster Membership Management: While consistent hashing handles data redistribution efficiently, it does not manage cluster membership (e.g., detecting when a node joins, leaves, or fails). A service like ZooKeeper or etcd can provide consensus and maintain an authoritative list of active nodes in the cluster.
          • Failure Detection and Recovery: In distributed systems, nodes may fail unpredictably. A coordination service can help detect failures and notify other nodes to update their hash rings accordingly.
          • Configuration Management: Centralized services can store shared configurations (e.g., replication factors or backup policies) required by all nodes.
          • Alternatively, you can use decentralized approaches like gossip protocol for node discovery and failure detection rather than relying on ZooKeeper.
    • Direct communication between servers:
      • Message Forwarding: When a user sends a message to another user, the sending server must forward that message to the server where the receiving user is connected. This requires direct communication between servers. For example, if User A is connected to Server 1 and sends a message to User C, who is connected to Server 2, Server 1 needs to communicate with Server 2 to deliver the message.
      • Decoupling of Servers: Direct communication allows servers to operate more independently, reducing bottlenecks and enhancing scalability. If each server only communicates via a central hub, it could become a performance bottleneck.
      • Protocol Choice: WebSocket is a good choice for inter-server communication, providing low-latency, real-time communication between servers.
  • Pros:

    • Predictable server assignment.
    • Minimal connection disruption (re-mappings) during scaling.
    • No central coordination needed for routing.
    • Works well with stateful connections.
    • Easy to add/remove servers.
  • Cons:

    • Complex to implement correctly.
    • Requires coordination service (like Zookeeper).
    • Can lead to uneven distribution if the hash function isn't well-designed.
    • All servers need to maintain routing information.
  • When to use:

    • When persistent connections are needed, the system needs to scale dynamically, and you can tolerate some connection loss during server failures.
    • Also suitable when maintaining a lot of state associated with a given connection (e.g., collaborative document editing).
  • Discussion Points in Interview:

    • Discuss how updates are routed using a coordination service like Zookeeper or etcd.
    • Explain the scaling process:
      1. Signaling the beginning of a scaling event.
      2. Recording both the old and new server assignments.
      3. Slowly disconnecting clients from the old server and having them reconnect to their newly assigned server.
      4. Signaling the end of the scaling event and updating the coordination service with the new server assignments.
      5. In the interim, sending messages to both the old and new servers until clients have fully transitioned.
    • The mechanics of discovering the initial server assignments is also interesting. There are performance tradeoffs associated with redirecting clients to the correct server or requiring them to do a round-trip to a central server to look up the correct one. Especially during scaling events, any central registration service may become a bottleneck so it's important to discuss the tradeoffs with your interviewer.
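
A minimal consistent-hash ring sketch in Python (server names, virtual-node count, and client IDs are illustrative), showing why scaling out re-maps only a fraction of connections:

    import bisect
    import hashlib

    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class HashRing:
        """Minimal consistent-hash ring with virtual nodes (a sketch, not production)."""

        def __init__(self, servers, vnodes=100):
            self.ring = sorted((_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
            self.keys = [h for h, _ in self.ring]

        def server_for(self, client_id):
            # The first virtual node clockwise from the client's hash owns the connection.
            idx = bisect.bisect(self.keys, _hash(client_id)) % len(self.ring)
            return self.ring[idx][1]

    ring = HashRing(["ws-1", "ws-2", "ws-3"])
    before = {c: ring.server_for(c) for c in ("alice", "bob", "carol", "dave")}
    ring2 = HashRing(["ws-1", "ws-2", "ws-3", "ws-4"])  # scale out by one server
    after = {c: ring2.server_for(c) for c in before}
    moved = [c for c in before if before[c] != after[c]]
    print(moved)  # only a fraction of clients re-map, versus nearly all with plain modulo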

Pushing via Pub/Sub

  • How it works:

    • Architecture Overview:
      • A single Pub/Sub service (e.g., Kafka, Redis) collects updates and broadcasts them to interested clients.
      • The pub/sub service becomes the biggest source of state for our realtime updates.
      • Endpoint servers act as lightweight intermediaries that subscribe to topics and forward updates to clients.
    • Client Connection:
      • Clients connect to any available endpoint server instead of a specific one.
      • The endpoint server registers the client with the Pub/Sub service by creating a topic, subscribing to it, and keeping a mapping from topics to the connections associated with them.
    • Message Sending:
      • When an update occurs, it is published to the relevant topic in the Pub/Sub service.
      • The Pub/Sub service pushes the update to all endpoint servers that subscribe to the topic, then endpoint servers will forward the message to the appropriate clients over existing connections.
  • Pros:

    • Decoupled Communication: Publishers and subscribers do not need to know about each other, allowing for flexible scaling.
    • Load Management: Easy to manage load on endpoint servers using load balancing strategies like "least connections" for the initial incoming connection establishment.
    • Efficient Broadcasting: Capable of broadcasting updates to a large number of clients efficiently.
    • Minimal State Spread: Reduces the amount of state information that needs to be managed across the system.
  • Cons:

    • Subscriber State: Difficulty in tracking whether subscribers are connected or when they disconnect.
    • Single Point of Failure: The Pub/Sub service can become a bottleneck and a single point of failure.
    • At-Most-Once Delivery: Redis Pub/Sub does not guarantee message delivery; if a subscriber is offline or misses a message, it will not be re-delivered.
    • No Persistence: If a subscriber is offline during message publication, it will miss the message since Redis Pub/Sub does not persist messages. For guaranteed delivery, you can use Redis Streams, which persist messages and allow replay.
    • Increased Latency: The additional layer of indirection can introduce latency in message delivery.
    • Complex Connections: Many-to-many connections between Pub/Sub service hosts and endpoint servers can complicate the architecture.
  • When to use:

    • Ideal For: Scenarios requiring efficient broadcasting to a large number of clients with minimal overhead on endpoint servers. The latency impact is minimal (<10ms).
    • Not Ideal For: Applications needing to respond to connect/disconnect events or maintain significant state associated with clients.
  • Discussion Points in Interview:

    • Single Point of Failure: Be prepared to discuss the implications of the Pub/Sub service being a single point of failure and potential bottlenecks. Redis cluster is a popular way to scale pub/sub service, which involves sharding subscriptions across multiple hosts.
    • Connection Management: Introducing a cluster for the Pub/Sub component means you'll manage the many-to-many connections between the pub/sub service and the endpoint servers (each endpoint server will be connected to all hosts in the cluster). In reality, the number of nodes in the cluster is small.
    • Load Balancing: For initial inbound connections to the endpoint servers, you'll probably want to use a load balancer with a "least connections" strategy. This will help ensure that you're distributing the load across the servers in the cluster. Since connections are effectively the only resource the endpoint servers consume, balancing on connection count is a great way to manage the load. Once a connection is established, it is crucial that clients remain connected to the same endpoint server for the lifetime of their session. This is where sticky sessions (or session affinity) come into play.
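
As a concrete sketch of the endpoint-server pattern above, assuming a local Redis and the redis-py client (channel names and payloads are illustrative):

    import redis

    r = redis.Redis()

    # Endpoint server side: subscribe on behalf of a connected client.
    pubsub = r.pubsub()
    pubsub.subscribe("updates:user:123")

    # Update source side (anywhere in the system):
    r.publish("updates:user:123", "order shipped")

    for message in pubsub.listen():
        if message["type"] == "message":
            # Forward over the client's existing SSE/WebSocket connection here.
            print("deliver to client:", message["data"])
            break  # note: at-most-once; offline subscribers miss this message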

Server Realtime Updates Overview

When propagating updates to endpoint servers, consider these options:

  • Polling Model:
    • The simplest method if feasible.
  • Pub/Sub Model:
    • A great alternative if polling isn't possible.
    • Minimizes state management, simplifies scaling, and efficiently broadcasts updates to many clients.
    • Suitable for most applications.
  • Consistent Hashing:
    • A specialized approach for specific scenarios.
    • Ideal when creating deep infrastructure components or when maintaining substantial state associated with a client.
    • Useful if your application requires a lot of state/resources for each connection.

Stream Processing

Stream processing is a technique for processing data as it flows in, rather than waiting for all the data to arrive; it enables processing data in real time, as it is generated.

Apache Spark

  • Spark is a distributed unified engine that supports batch and stream processing.
  • Spark adopts a micro-batch aggregation strategy, which can introduce latency, making it less suitable for ultra-low-latency applications.
  • The Spark window duration should be a multiple of the micro-batch interval, which determines the flush interval.
  • Spark's checkpointing mechanism ensures fault tolerance but can introduce significant latency due to its disk-based approach.
  • Spark has a larger user base and community support, and more comprehensive libraries for different types of data processing tasks, including machine learning, graph processing, and SQL.
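
A minimal PySpark Structured Streaming sketch of the micro-batch model (the socket source and intervals are illustrative; assumes a local Spark installation). The 10-second window is a multiple of the 2-second trigger, per the guideline above:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import window, current_timestamp

    spark = SparkSession.builder.appName("microbatch-sketch").getOrCreate()

    # Socket source is for demos only (run `nc -lk 9999`); real jobs read Kafka, files, etc.
    lines = (spark.readStream.format("socket")
             .option("host", "localhost").option("port", 9999).load())

    counts = (lines.withColumn("ts", current_timestamp())
              .groupBy(window("ts", "10 seconds"))  # window duration
              .count())

    query = (counts.writeStream.outputMode("update")
             .trigger(processingTime="2 seconds")   # micro-batch interval
             .format("console").start())
    # query.awaitTermination()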

Apache Flink

  • Flink adopts a real-time, event-based aggregation strategy: as soon as data arrives for a specific window, it can be processed immediately and flushed out.
  • Flink is a true stream processing engine, designed for ultra-low latency and high throughput.
  • Flink allows configuring the flush interval independently of the window duration, enabling lower latency by emitting results more frequently.
  • Flink supports complex event-driven applications with native support for stateful computations.
  • Flink's checkpointing mechanism is more efficient than Spark's, with lower overhead because it uses a hybrid approach: state is managed in memory and periodically snapshotted to disk.

Collaborative Editing Techniques

In collaborative editing, multiple users can edit the same document concurrently. One of the biggest challenges is resolving conflicts in real time while ensuring data consistency. There are several techniques to achieve this, including:

Operational Transformation (OT)

  • OT treats each user's edit as an operation, then transforms these operations so they are consistent with the edits of all other users.

  • The transformation ensures that operations do not conflict with each other and can be applied in any order while the final result remains the same.

    How it works:

    1. Editing Operations: When a user makes an edit (e.g., inserting, deleting, or modifying text), the client generates an operation that represents this edit. This operation contains information about the edit, such as the position, content, and metadata. The operation is sent to the server.
    2. Server Transforming Operations: The server receives the operations from all the users and transforms them in a way that is consistent with the edits of all the users. The server then broadcasts the transformed operations to all the users.
    3. Client Transforming Operations: Upon receiving the transformed operations, the client needs to transform them again to incorporate the effects of its own local operations.
    4. Applying Operations: The client applies the transformed operations to its local document.

    Pros:

    • Memory efficient
    • Low latency

    Cons:

    • OT requires a central server to coordinate transformation and provide a final ordering of operations, which can become a bottleneck. It's less scalable than CRDTs.
    • Complex to implement correctly.

    Real World Use Cases:

    • Google Docs
    • Collaborative code editors
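
A toy sketch of the insert-insert transform at the heart of OT (positions and strings are illustrative; real OT also handles deletes and uses site IDs for deterministic tie-breaking):

    def transform_insert(op_a, op_b):
        """Adjust op_b's position to account for a concurrent op_a (insert-insert case)."""
        pos_a, text_a = op_a
        pos_b, text_b = op_b
        if pos_a <= pos_b:
            return (pos_b + len(text_a), text_b)  # op_a shifted op_b to the right
        return op_b

    doc = "helo"
    op1 = (4, " world")  # user A appends at the end
    op2 = (3, "l")       # user B fixes the typo concurrently

    # Apply op2 first, then transform op1 against it (the mirror order converges too).
    doc = doc[:3] + "l" + doc[3:]           # "hello"
    pos, text = transform_insert(op2, op1)  # op1's position 4 becomes 5
    doc = doc[:pos] + text + doc[pos:]
    print(doc)  # "hello world" - same result regardless of arrival order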

Conflict Free Replicated Data Types (CRDTs)

  • CRDTs represent edits as a set of operations that can be applied in any order while the final result remains the same.
  • More complex to implement and understand than OT.

Pros:

  • More distributed, no need for a central server. Ideal for decentralized (peer-to-peer) applications.
  • More scalable than OT, supporting a large number of collaborators.
  • More easily adaptable to offline use cases.

Cons:

  • Computation and memory intensive
  • More complex to implement than OT

Real World Use Cases:

  • Trello
  • Collaborative drawing applications like Figma, Miro
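
As a minimal illustration of the CRDT idea, a grow-only counter (G-Counter) sketch; real collaborative-text CRDTs are far more involved, but the merge property is the same:

    class GCounter:
        """Grow-only counter CRDT: one slot per replica; merge = element-wise max.

        Because merge is commutative, associative, and idempotent, replicas can
        exchange state in any order and still converge.
        """

        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.counts = {}

        def increment(self, n=1):
            self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

        def merge(self, other):
            for rid, count in other.counts.items():
                self.counts[rid] = max(self.counts.get(rid, 0), count)

        def value(self):
            return sum(self.counts.values())

    a, b = GCounter("a"), GCounter("b")
    a.increment(3)          # edits happen independently, even offline
    b.increment(2)
    a.merge(b); b.merge(a)  # sync in either order
    print(a.value(), b.value())  # 5 5, since both replicas converge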