Trade-offs in System Design
1. Vertical vs Horizontal Scaling
Vertical scaling involves boosting the power of an existing machine (e.g., CPU, RAM, storage) to handle increased load.
Scaling vertically is simpler, but there's a physical limit to how much you can upgrade a single machine, and it introduces a single point of failure.
Horizontal scaling involves adding more servers or nodes to the system to distribute the load across multiple machines.
Scaling horizontally allows for almost limitless growth but brings the complexity of managing a distributed system.
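To make the horizontal option concrete, here is a minimal sketch of distributing load across a pool of machines using round-robin routing. The server names and the routing policy are illustrative assumptions, not part of the original text:

```python
from itertools import cycle

# Hypothetical pool of horizontally scaled servers (names are illustrative).
servers = ["app-1", "app-2", "app-3"]
round_robin = cycle(servers)

def route_request(request_id: int) -> str:
    """Assign each incoming request to the next server in rotation."""
    return next(round_robin)

# Requests are spread evenly across the pool instead of hitting one machine.
assignments = [route_request(i) for i in range(6)]
print(assignments)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2', 'app-3']
```

Real load balancers add health checks and weighting, but the core idea is the same: no single machine handles all traffic.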
2. Strong vs Eventual Consistency
Strong consistency ensures that any read operation returns the most recent write for a given piece of data.
This means that once a write is acknowledged, all subsequent reads will reflect that write.
Eventual consistency ensures that, given enough time, all nodes in the system will converge to the same value.
However, there are no guarantees about when this convergence will occur.
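The convergence behavior can be sketched with a toy last-write-wins model: two replicas accept writes independently and agree only after exchanging state. The class and method names are illustrative, not a real replication protocol:

```python
# Toy model of eventual consistency: replicas accept writes independently
# and converge via last-write-wins when they exchange state.

class Replica:
    def __init__(self):
        self.value = None
        self.timestamp = 0  # logical clock of the last accepted write

    def write(self, value, timestamp):
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp

    def sync_from(self, other):
        # Anti-entropy step: adopt the other replica's write if it is newer.
        self.write(other.value, other.timestamp)

a, b = Replica(), Replica()
a.write("v1", timestamp=1)   # this write lands on replica a only
b.write("v2", timestamp=2)   # a later write lands on replica b only

# Before synchronization, a read from replica a returns stale data.
print(a.value)  # 'v1' (stale)

# After the replicas exchange state, both converge to the newest write.
a.sync_from(b)
b.sync_from(a)
print(a.value, b.value)  # 'v2' 'v2'
```

Under strong consistency, the stale read of `'v1'` would never be allowed to happen.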
3. Stateful vs Stateless Design
In a stateful design, the system remembers client data from one request to the next.
It maintains a record of the client's state, which can include session information, transaction details, or any other data relevant to the ongoing interaction.
Stateless design treats each request as an independent transaction. The server does not store any information about the client's state between requests.
Each request must contain all the information necessary to understand and process it.
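A minimal sketch of the stateless idea: every request carries everything the server needs (identity, parameters), so any server instance can handle it without a session lookup. The field names and token format are illustrative assumptions:

```python
# Stateless handler: no server-side session; identity and parameters
# travel with the request itself. Field names are illustrative.

def handle_request(request: dict) -> str:
    user = request["auth_token"].removeprefix("user:")
    return f"page {request['page']} for {user}"

# Two independent requests; the server keeps no state between them.
print(handle_request({"auth_token": "user:alice", "page": 1}))
print(handle_request({"auth_token": "user:alice", "page": 2}))
```

In a stateful design, the second request could simply say "next page" because the server would remember where the client left off.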
4. Read-Through vs Write-Through Cache
A Read-Through cache sits between your application and your data store.
When your application requests data, it first checks the cache.
- If the data is found in the cache (a cache hit), it's returned to the application.
- If the data is not in the cache (a cache miss), the cache itself is responsible for loading the data from the data store, caching it, and then returning it to the application.
In a Write-Through cache strategy, data is written into the cache and the corresponding database simultaneously.
Every write operation writes data to both the cache and the data store.
The write operation is only considered complete when both writes are successful.
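Both strategies can be sketched in one small class, with a plain dict standing in for the backing database. The class and attribute names are illustrative:

```python
# Minimal sketch of read-through and write-through caching.
# The "data store" is a dict standing in for a database.

class CachedStore:
    def __init__(self, store: dict):
        self.store = store   # backing data store
        self.cache = {}      # in-memory cache

    def read(self, key):
        """Read-through: on a miss, the cache loads from the store itself."""
        if key in self.cache:
            return self.cache[key]          # cache hit
        value = self.store[key]             # cache miss: load from store
        self.cache[key] = value             # populate the cache
        return value

    def write(self, key, value):
        """Write-through: the cache and the store are updated together."""
        self.cache[key] = value
        self.store[key] = value             # both writes complete the operation

db = {"user:1": "alice"}
cs = CachedStore(db)
print(cs.read("user:1"))      # miss -> loaded from the store -> 'alice'
cs.write("user:2", "bob")     # written to cache and store together
print(db["user:2"])           # 'bob'
```

Write-through keeps the cache and store consistent at the cost of slower writes; read-through keeps cache-population logic out of the application.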
5. SQL vs NoSQL
SQL databases use Structured Query Language and have a predefined schema. They're ideal for:
- Complex queries: SQL is powerful for querying complex relationships between data.
- ACID compliance: Ensures data validity in high-stakes transactions (e.g., financial systems).
- Structured data: When your data structure is unlikely to change.
Examples: MySQL, PostgreSQL, Oracle
NoSQL databases are more flexible and scalable. They're best for:
- Big Data: Can handle large volumes of structured and unstructured data.
- Rapid development: Schema-less nature allows for quicker iterations.
- Scalability: Easier to scale horizontally.
Examples: MongoDB, Cassandra, Redis
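The schema difference can be shown side by side, using SQLite (from the standard library) for the SQL side and a plain dict as a stand-in for a schema-less document. The table and field names are illustrative:

```python
import sqlite3

# SQL side: a predefined schema and a declarative query (SQLite as a stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
row = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()
print(row[0])  # 'alice'

# NoSQL side: a schema-less document, modeled here with a plain dict.
# Fields can vary per document without any schema migration.
doc = {"_id": 1, "name": "alice", "tags": ["admin", "beta"]}
print(doc["tags"])  # ['admin', 'beta']
```

Adding a `tags` field to the SQL table would require an `ALTER TABLE`; the document store simply accepts the new field.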
6. REST vs RPC
When designing APIs, two popular architectural styles often come into consideration: REST (Representational State Transfer) and RPC (Remote Procedure Call). Both have their strengths and ideal use cases. Let's dive into their key differences to help you choose the right one for your project.
REST (Representational State Transfer)
REST is an architectural style that uses HTTP methods to interact with resources.
Key characteristics:
- Stateless: Each request contains all necessary information
- Resource-based: Uses URLs to represent resources
- Uses standard HTTP methods (GET, POST, PUT, DELETE)
- Typically returns data in JSON or XML format
RPC (Remote Procedure Call)
RPC is a protocol that one program can use to request a service from a program located on another computer in a network.
Key characteristics:
- Action-based: Focuses on operations or actions
- Can use various protocols (HTTP, TCP, etc.)
- Often uses custom methods
- Typically returns custom data formats
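The difference in shape is easiest to see with the same operation expressed both ways. The endpoints and procedure names below are illustrative, not a real API:

```python
# Same operation ("delete user 42") expressed in each style.

# REST: resource-oriented -- the URL names a thing,
# the standard HTTP verb names the action.
rest_request = {"method": "DELETE", "url": "/users/42"}

# RPC: action-oriented -- the call names an operation,
# and the parameters travel in the request body.
rpc_request = {
    "method": "POST",
    "url": "/api",
    "body": {"procedure": "deleteUser", "args": {"id": 42}},
}

print(rest_request["url"])               # '/users/42'
print(rpc_request["body"]["procedure"])  # 'deleteUser'
```

REST leans on the uniform HTTP vocabulary; RPC lets you define any verb you need, at the cost of a less predictable interface.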
7. Synchronous vs Asynchronous
Synchronous Processing:
- Tasks are executed sequentially.
- Makes it easier to reason about code and handle dependencies.
- Used in scenarios where tasks must be completed in order, such as reading a file line by line.
Asynchronous Processing:
- Tasks are executed concurrently.
- Improves responsiveness and performance, especially in I/O-bound operations
- Used when you need to handle multiple tasks simultaneously without blocking the main thread, such as background processing jobs.
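The performance difference for I/O-bound work can be sketched with `asyncio`: three simulated I/O waits of 0.1 s each would take about 0.3 s sequentially, but overlap when run concurrently. The `fetch` task is an illustrative stand-in for a network call:

```python
import asyncio
import time

async def fetch(n: int) -> int:
    await asyncio.sleep(0.1)  # stands in for a network or disk wait
    return n * 2

async def main() -> list:
    # Asynchronous: all three waits overlap instead of queueing up.
    return list(await asyncio.gather(fetch(1), fetch(2), fetch(3)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)        # [2, 4, 6]
print(elapsed < 0.3)  # faster than the ~0.3 s a sequential run would need
```

The same tasks run with plain blocking calls would execute one after another, and the total time would be the sum of the waits.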
8. Batch vs Stream Processing
Batch Processing:
- Process large volumes of data at once, typically at scheduled intervals.
- Efficient for handling massive datasets, ideal for tasks like reporting or data warehousing.
- High latency: results are available only after the entire batch is processed.
- Examples: ETL jobs, data aggregation, periodic backups.
Stream Processing:
- Process data in real-time as it arrives.
- Perfect for real-time analytics, monitoring, and alerting systems.
- Minimal latency since data is processed within milliseconds or seconds of arrival.
- Examples: Real-time fraud detection, live data feeds, IoT applications.
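The two models can be contrasted on the same event data: batch produces one result after seeing everything, while stream emits a running result per event. The event values are illustrative:

```python
events = [3, 1, 4, 1, 5, 9]

# Batch: collect everything first, process in one pass; the result is only
# available after the whole batch is done.
batch_total = sum(events)
print(batch_total)  # 23

# Stream: process each event as it "arrives", emitting a running result
# with minimal latency (a generator stands in for an incoming stream).
def running_total(stream):
    total = 0
    for event in stream:
        total += event
        yield total  # a result is available after every event

print(list(running_total(events)))  # [3, 4, 8, 9, 14, 23]
```

Note that both approaches end at the same final value; the trade-off is when intermediate results become visible.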
9. Long Polling vs WebSockets
In Long Polling, the client sends a request and the server holds the connection open until new data is available (or a timeout expires); only then does it respond, and the client immediately issues a new request.
This delivers updates faster than polling at fixed intervals, but the repeated request cycle still adds latency and server overhead, since every update requires a fresh HTTP request and each held connection consumes server resources.
A WebSocket establishes a persistent, full-duplex connection between the client and server, allowing real-time data exchange without the overhead of repeated HTTP requests.
Unlike the traditional HTTP protocol, where the client sends a request to the server and waits for a response, WebSockets allow both the client and server to send messages to each other independently and continuously after the connection is established.
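The "hold the request until data arrives" behavior of long polling can be sketched with a blocking queue: the handler waits up to a timeout instead of answering "no data" immediately. The queue and handler here are illustrative stand-ins for an HTTP server:

```python
import queue
import threading

# Toy long-poll handler: the server holds the request open (blocks)
# until data arrives or a timeout expires.

updates = queue.Queue()

def long_poll(timeout: float):
    """Server side: wait up to `timeout` seconds for new data."""
    try:
        return updates.get(timeout=timeout)
    except queue.Empty:
        return None  # the client would immediately issue a new request

# Simulate data arriving 0.05 s into a held request.
threading.Timer(0.05, updates.put, args=("new message",)).start()
result = long_poll(timeout=1.0)
print(result)  # 'new message' -- delivered as soon as it became available
```

With a WebSocket, no re-request cycle is needed at all: the server would push `"new message"` down the already-open connection.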
10. Normalization vs Denormalization
Normalization in database design involves splitting up data into related tables to ensure each piece of information is stored only once.
It aims to reduce redundancy and improve data integrity.
Example: A customer database can have two separate tables: one for customer details and another for orders, avoiding duplication of customer information for each order.
Denormalization is the process of combining data back into fewer tables to improve query performance.
This often means introducing redundancy (duplicate information) back into your database.
Example: A blog website can store the latest comments with the posts in the same table (denormalized) to speed up the display of post and comments, instead of storing them separately (normalized).
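The customer/orders example can be sketched with two dicts standing in for tables. All names and values below are illustrative:

```python
# Normalized: two "tables"; customer details are stored exactly once
# and orders reference them by id.
customers = {1: {"name": "alice", "email": "alice@example.com"}}
orders = [
    {"order_id": 100, "customer_id": 1, "item": "book"},
    {"order_id": 101, "customer_id": 1, "item": "pen"},
]

# Reading an order's customer name requires a join-style lookup.
name = customers[orders[0]["customer_id"]]["name"]
print(name)  # 'alice'

# Denormalized: the name is copied into every order for faster reads,
# at the cost of updating many rows if the customer's name ever changes.
orders_denorm = [
    {"order_id": 100, "customer_name": "alice", "item": "book"},
    {"order_id": 101, "customer_name": "alice", "item": "pen"},
]
print(orders_denorm[0]["customer_name"])  # 'alice'
```

The trade-off is visible directly: the normalized form stores "alice" once and pays a lookup per read; the denormalized form stores it per order and pays on every update.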
11. TCP vs UDP
When it comes to data transmission over the internet, two key protocols are at the forefront: TCP and UDP.
TCP (Transmission Control Protocol):
- Reliable: Ensures all data packets arrive in order and are error-free.
- Connection-Oriented: Establishes a connection before data transfer, making it ideal for tasks where accuracy is crucial (e.g., web browsing, file transfers).
- Slower: The overhead of managing connections and ensuring reliability can introduce latency.
UDP (User Datagram Protocol):
- Faster: Minimal overhead allows for quick data transfer, perfect for time-sensitive applications.
- Connectionless: No formal connection setup; data is sent without guarantees, making it ideal for real-time applications (e.g., video streaming, online gaming).
- Unreliable: No error-checking or ordering, so some data packets might be lost or arrive out of order.
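UDP's connectionless model is visible even in a minimal localhost exchange: the sender fires a datagram at an address with no handshake and no delivery guarantee (on localhost it reliably arrives, which keeps the sketch deterministic):

```python
import socket

# Minimal UDP exchange on localhost: no connection setup, just a datagram.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))      # let the OS pick a free port
receiver.settimeout(2.0)             # avoid blocking forever if the packet is lost
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", addr)         # no handshake, no delivery guarantee

data, _ = receiver.recvfrom(1024)
print(data)  # b'ping'

sender.close()
receiver.close()
```

A TCP version of the same exchange would first need `listen`/`connect`/`accept` to establish the connection, which is exactly the overhead that buys TCP its ordering and reliability guarantees.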