System Design 2
FreemanZhang / system-design: Preparing for system design interview questions (README.md)
Stateless applications
Structure of a session
Server-side session vs client-side cookie
Store session state in client-side cookies
Cookie Def
Cookie typical workflow
Cookie Pros and cons
Store session state in server-side
Typical server-side session workflow
Use a load balancer that supports sticky sessions:
DNS
Design
Initial design
A distributed, hierarchical database
Internals
DNS records
Insert records into DNS DB
DNS query parsing
Types
Round-robin DNS
GeoDNS
Functionality
DNS Caching
Load balancing
Host alias
DNS prefetching
Def
Control prefetching
Load balancers
Benefits
Round-robin algorithm
Security
SSL
Definition
How does HTTPS work
How to avoid public key being modified?
How to avoid computation consumption from PKI
NoSQL
NoSQL vs SQL
NoSQL flavors
Key-value
Document
Column-Family
Graph
Scaling
Functional partitioning
REST best practices
Consistency
Endpoint naming conventions
HTTP verbs and CRUD consistency
Versioning
Data transfer format
HTTP status codes and error handling
Paging
Scaling REST web services
Keeping service machine stateless
Benefits
Common use cases needing share state
Caching service responses
Cache-Control header
Expires
Last-Modified/If-Modified-Since/Max-age
ETag
Vary: Authorization
Functional partitioning
Security
Throttling
Use OAuth2 with HTTPS for authorization, authentication and confidentiality.
Documentation
Others
Data partitioning - Sharding
Sharding benefits
Sharding key
Sharding function
Static sharding
Dynamic sharding
Challenges
Cross-shard joins
Using AUTO_INCREMENT
Distributed transactions
Clones - Replication
Replication purpose
High availability by creating redundancy
Planning for failures
Replication for scaling read
When to use
When not to use
Replication Topology
Master-slave vs peer-to-peer
Master-slave replication
Number of slaves
Peer-to-peer replication
Replication mode
Synchronous and Asynchronous
Synchronous vs Asynchronous
Cache
Why does cache work
Cache hit ratio
How much will cache benefit
Access pattern
ActiveMQ
RabbitMQ
SQS
Kafka
Data Processing
Hadoop
Spark
EMR
Stream Processing
Samza
Storm
References
System design
The process of designing the architecture, components, modules, interfaces, and data for a system to satisfy
specified requirements. The result should be flexible, maintainable, and scalable.
Average QPS = number of daily active users * number of operations per user per day / 86,400 (the number of seconds in a day, often rounded to ~100,000 for quick estimates).
Since the traffic load is not evenly distributed across the day, let's assume that the peak QPS is 3 times the average QPS.
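A rough worked example of this estimate (all numbers below are made-up assumptions, not figures from the text):

# Back-of-the-envelope QPS estimate with hypothetical numbers
daily_active_users = 10_000_000        # assumed DAU
operations_per_user_per_day = 10       # assumed average operations per user per day
seconds_per_day = 86_400               # often rounded to ~100,000 for mental math

average_qps = daily_active_users * operations_per_user_per_day / seconds_per_day
peak_qps = 3 * average_qps             # assume peak is 3x the average, as above
print(round(average_qps), round(peak_qps))   # roughly 1157 average, 3472 peak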
List features
(Interviewee) First, let me list down all the features I could think of.
(Interviewee) Among all these use cases, these are the core features. I would like to focus on these core features
first. If we have extra time, then we consider XXX features.
Common features
User system
Register / Login
Profile display / Edit
History view
Friendship system
User interface (or is only an API needed?)
Payment
Search
Notification (Email/SMS)
Mobile / Desktop / Third party support
Abstract design
Diagram of components of your designed system and its connections.
Let's draw a high-level module diagram for the system.
Use rectangles for components
Use lines to connect them as communication traffic
Front-end layer
Data cache
Redis / Memcached
Cache key / Cache algorithm
Data storage
Single DB / Master-slave, sharding
Search
ElasticSearch
Scale
Replica
Sharding
Denormalization
OO design principles
Dependency injection provides references to objects that the class depends on instead of allowing the class to gather
the dependencies itself. In practice, dependency injection can be summarized as not using the "new" keyword in your
classes and demanding instances of your dependencies to be provided to your class by its clients.
Dependency injection is an important principle and a subclass of a broader principle called inversion of control.
Dependency injection is limited to object creation and assembly of its dependencies. Inversion of control, on the other
hand, is a more generic idea and can be applied to different problems on different levels of abstraction.
IOC is heavily used by several frameworks such as Spring, Rails and even Java EE containers. Instead of you being
in control of creating instances of your objects and invoking methods, you become the creator of plugins or
extensions to the framework. The IOC framework will look at the web request and figure out which classes should
be instantiated and which components should be delegated to. This means your classes do not have to know
when their instances are created, who is using them, or how their dependencies are put together.
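A minimal sketch of dependency injection in Python (the class and method names are hypothetical, just to illustrate the principle):

# Without DI: the class constructs its own dependency internally.
class EmailSender:
    def send(self, to, body):
        print(f"sending '{body}' to {to}")

class SignupServiceHardwired:
    def __init__(self):
        self.sender = EmailSender()        # gathers its own dependency

# With DI: the dependency is provided by the caller, so it can be swapped in tests.
class SignupService:
    def __init__(self, sender):
        self.sender = sender               # injected, not constructed here

    def register(self, email):
        self.sender.send(email, "Welcome!")

service = SignupService(EmailSender())     # the client wires the dependencies together
service.register("user@example.com")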
Copy/Paste programming
CAP theorem
If you get a network partition, you have to trade off consistency versus availability.
Consistency: Every read would get the most recent write.
Availability: Every request received by the nonfailing node in the system must result in a response.
Partition tolerance: The cluster can survive communication breakages in the cluster that separate the cluster into
multiple partitions unable to communicate with each other.
Consistency
Update consistency
Def: Write-write conflicts occur when two clients try to write the same data at the same time. The result is a lost update.
Solutions:
Pessimistic approach: Prevent conflicts from occurring.
The most common way: write locks. In order to change a value you need to acquire a lock, and the system ensures that only one client can hold the lock at a time.
Optimistic approach: Let conflicts occur, but detect them and take action to sort them out.
The most common way: conditional update. Any client that performs an update tests the value just before updating it to see whether it has changed since that client's last read.
Another way: save both updates and record that they are in conflict. This approach is usually used in version control systems.
Problems with these solutions: Both the pessimistic and the optimistic approach rely on a consistent serialization of the updates. Within a single server this is straightforward, but with more than one server, such as with peer-to-peer replication, two nodes might apply the updates in different orders.
Often, when people first encounter these issues, their reaction is to prefer pessimistic concurrency because they are determined to avoid conflicts. Concurrent programming involves a fundamental tradeoff between safety (avoiding errors such as update conflicts) and liveness (responding quickly to clients). Pessimistic approaches often severely degrade the responsiveness of a system to the degree that it becomes unfit for its purpose. This problem is made worse by the danger of errors such as deadlocks.
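A minimal sketch of the optimistic conditional-update approach (a compare-and-set on a version number; the in-memory store below is a stand-in, not a real database API):

# Each record carries a version; an update succeeds only if the version
# has not changed since the client last read it.
store = {"price": (100, 0)}            # key -> (value, version)

def read(key):
    return store[key]                  # returns (value, version)

def conditional_update(key, new_value, expected_version):
    value, version = store[key]
    if version != expected_version:
        return False                   # write-write conflict detected; caller retries
    store[key] = (new_value, version + 1)
    return True

value, version = read("price")
print(conditional_update("price", value + 10, version))   # True: first writer wins
print(conditional_update("price", value + 20, version))   # False: stale version rejected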
Read consistency
Def:
Read-write conflicts occur when one client reads inconsistent data in the middle of another client's write.
Types:
Logical consistency: Ensuring that different data items make sense together.
Example:
Martin begins an update by modifying a line item.
Pramod reads both records and sees an inconsistent state: the new line item together with the old shipping charge.
Martin completes the update by modifying the shipping charge.
Replication consistency: Ensuring that the same data item has the same value when read from different replicas.
Example:
There is one last hotel room for a desirable event. The reservation system runs on many nodes.
Martin and Cindy are a couple considering this room, but they are discussing this on the phone because
Martin is in London and Cindy is in Boston.
Meanwhile Pramod, who is in Mumbai, goes and books that last room.
That updates the replicated room availability, but the update gets to Boston quicker than it gets to
London.
When Martin and Cindy fire up their browsers to see if the room is available, Cindy sees it booked and
Martin sees it free.
Read-your-write consistency (Session consistency): Once you have made an update, you're guaranteed to
continue seeing that update. This can be difficult if the read and write happen on different nodes.
Solution 1: A sticky session, a session that's tied to one node. A sticky session allows you to ensure that as long as you keep read-your-writes consistency on a node, you'll get it for sessions too. The downside is that sticky sessions reduce the ability of the load balancer to do its job.
Solution 2: Version stamps: ensure every interaction with the data store includes the latest version stamp seen by the session.
Replication Consistency
Def: Slaves could return stale data.
Reason:
Replication is usually asynchronous, and any change made on the master needs some time to replicate to its
slaves. Depending on the replication lag, the delay between requests, and the speed of each server, you may get
the freshest data or you may get stale data.
Solution:
Send critical read requests to the master so that they would always return the most up-to-date data.
Cache the data that has been written on the client side so that you would not need to read the data you have just
written.
Minimize the replication lag to reduce the chance of stale data being read from stale slaves.
Message queue
Benefits
Enabling asynchronous processing:
Defer processing of time-consuming tasks without blocking our clients. Anything that is slow or unpredictable is a candidate for asynchronous processing. Examples include:
Interact with remote servers
Low-value processing in the critical path
Resource intensive work
Independent processing of high- and low- priority jobs
Message queues enable your application to operate in an asynchronous way, but they only add value if your application is not built in an asynchronous way to begin with. If you develop in an environment like Node.js, which is built with asynchronous processing at its core, you will not benefit from a message broker that much.
What is good about message brokers is that they allow you to easily introduce asynchronous processing to other
platforms, like those that are synchronous by nature (C, Java, Ruby)
Easier scalability:
Producers and consumers can be scaled separately. We can add more producers at any time without overloading
the system. Messages that cannot be consumed fast enough will just begin to line up in the message queue. We
can also scale consumers separately, as now they can be hosted on separate machines and the number of
consumers can grow independently of producers.
Decoupling:
All that publishers need to know is the format of the message and where to publish it. Consumers can become
oblivious as to who publishes messages and why. Consumers can focus solely on processing messages from the
queue. Such a high level of decoupling enables consumers and producers to be developed independently. They can even be developed by different teams using different technologies.
Evening out traffic spikes:
You should be able to keep accepting requests at high rates even at times of increased traffic. Even if your publishers generate messages much faster than consumers can keep up with, you can keep enqueueing
messages, and publishers do not have to be affected by a temporary capacity problem on the consumer side.
Isolating failures and self-healing:
The fact that consumers' availability does not affect producers allows us to stop message processing at any time. This means that we can perform maintenance and deployments on back-end servers at any time. We can simply restart, remove, or add servers without affecting producers' availability, which simplifies deployments and server management. Instead of breaking the entire application whenever a back-end server goes offline, all that we experience is reduced throughput, but there is no reduction of availability. Reduced throughput of asynchronous tasks is usually invisible to the user, so there is no user-facing impact.
Components
Message producer
Locate the message queue and send a valid message to it
Message broker - where messages are sent and buffered for consumers.
Be available at all times for producers and to accept their messages.
Buffering messages and allowing consumers to consume related messages.
Message consumer
Receive and process message from the message queue.
The two most common ways of implementing consumers are a "cron-like" and a "daemon-like" approach.
A cron-like consumer connects periodically to the queue and checks its status. If there are messages, it consumes them and stops when the queue is empty or after consuming a certain amount of messages. This model is common in scripting languages where you do not have a persistently running application container, such as PHP, Ruby, or Perl. Cron-like is also referred to as a pull model because the consumer pulls messages from the queue. It can also be used if messages are added to the queue rarely or if network connectivity is unreliable. For example, a mobile application may try to pull the queue from time to time, assuming that the connection may be lost at any point in time.
A daemon-like consumer runs constantly in an infinite loop, and it usually has a permanent connection to the message broker. Instead of checking the status of the queue periodically, it simply blocks on the socket read operation. This means that the consumer is waiting idly until messages are pushed by the message broker into the connection. This model is more common in languages with persistent application containers, such as Java, C#, and Node.js. This is also referred to as a push model because messages are pushed by the message broker onto the consumer as fast as the consumer can keep processing them.
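A rough sketch of the two consumer styles, using Python's in-process queue as a stand-in for a real message broker (no broker client library is assumed here):

import queue, threading, time

broker = queue.Queue()                      # stand-in for a broker-managed queue

def cron_like_consumer():
    # Pull model: connect, drain whatever is waiting, then exit until the next run.
    while True:
        try:
            message = broker.get_nowait()
        except queue.Empty:
            break                           # queue drained; stop
        print("cron-like consumer processed", message)

def daemon_like_consumer():
    # Push model: block until the broker hands over the next message.
    while True:
        message = broker.get()              # blocks instead of polling
        print("daemon-like consumer processed", message)

for i in range(3):
    broker.put(f"task-{i}")
cron_like_consumer()                        # drains task-0..task-2 and returns

threading.Thread(target=daemon_like_consumer, daemon=True).start()
broker.put("task-3")                        # picked up as soon as it arrives
time.sleep(0.1)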
Routing methods
Direct worker queue method
Consumers and producers only have to know the name of the queue.
Well suited for the distribution of time-consuming tasks such as sending out e-mails, processing videos, resizing
images, or uploading content to third-party web services.
Publish/Subscribe method
Producers publish message to a topic, not a queue. Messages arriving to a topic are then cloned for each
consumer that has a declared subscription to that topic.
Custom routing rules
A consumer can decide in a more flexible way what messages should be routed to its queue.
Logging and alerting are good examples of custom routing based on pattern matching.
Protocols
AMQP: A standardized protocol accepted by OASIS. Aims at enterprise integration and interoperability.
STOMP: A minimalist protocol.
Simplicity is one of its main advantages. It supports fewer than a dozen operations, so implementation and
debugging of libraries are much easier. It also means that the protocol layer does not add much performance
overhead.
But interoperability can be limited because there is no standard way of doing certain things. A good example of
impaired interoperability is message prefetch count. Prefetch is a great way of increasing throughput because messages are
received in batches instead of one message at a time. Although both RabbitMQ and ActiveMQ support this
feature, they both implement it using different custom STOMP headers.
JMS
It has a good feature set and is popular.
However, your ability to integrate with non-JVM-based languages will be very limited.
Challenges
No message ordering: Messages are processed in parallel and there is no synchronization between consumers. Each
consumer works on a single message at a time and has no knowledge of other consumers running in parallel to it.
Since your consumers are running in parallel and any of them can become slow or even crash at any point in time, it is
difficult to prevent messages from being occasionally delivered out of order.
Solutions:
Limit the number of consumers to a single thread per queue
Build the system to assume that messages can arrive in random order
Use a messaging broker that supports partial message ordering guarantee.
It is best to depend on the message broker to deliver messages in the right order by using a partial message ordering guarantee (ActiveMQ) or topic partitioning (Kafka). If your broker does not support such functionality, you will need to ensure that your application can handle messages being processed in an unpredictable order.
Partial message ordering is a clever mechanism provided by ActiveMQ called message groups. Messages
can be published with a special label called a message group ID. The group ID is defined by the application
developer. Then all messages belonging to the same group are guaranteed to be consumed in the same
order they were produced. Whenever a message with a new group ID gets published, the message broker
maps the new group Id to one of the existing consumers. From then on, all the messages belonging to the
same group are delivered to the same consumer. This may cause other consumers to wait idly without
messages as the message broker routes messages based on the mapping rather than random distribution.
Message ordering is a serious issue to consider when architecting a message-based application, and the RabbitMQ, ActiveMQ and Amazon SQS messaging platforms cannot guarantee global message ordering with parallel workers. In fact, Amazon SQS is known for unpredictable ordering of messages because its infrastructure is heavily distributed and ordering of messages is not supported.
Message requeueing
By allowing messages to be delivered to your consumers more than once, you make your system more robust and
reduce constraints put on the message queue and its workers. For this approach to work, you need to make all of
your consumers idempotent.
But it is not an easy thing to do. Sending emails is, by nature, not an idempotent operation. Adding an extra layer of tracking and persistence could help, but it would add a lot of complexity and may not be able to handle all of the failures.
Idempotent consumers may be more sensitive to messages being processed out of order. If we have two
messages, one to set the product's price to $55 and another one to set the price of the same product to $60,
we could end up with different results based on their processing order.
Race conditions become more likely
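A minimal sketch of one way to make a consumer idempotent: remember the IDs of messages that were already processed and skip redeliveries (the message shape and the in-memory set are assumptions for the example; a real system would keep this in a shared, durable store):

processed_ids = set()                       # would live in a shared, durable store

def send_welcome_email(address):
    print("emailing", address)              # the actual side effect

def handle(message):
    # message is assumed to look like {"id": ..., "address": ...}
    if message["id"] in processed_ids:
        return                              # duplicate delivery; safely ignored
    send_welcome_email(message["address"])
    processed_ids.add(message["id"])        # record only after success

handle({"id": "m-1", "address": "user@example.com"})
handle({"id": "m-1", "address": "user@example.com"})   # redelivered; email sent only once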
Networking
TCP vs UDP
TCP
Reliable: TCP is a connection-oriented protocol. When a file or message is sent, it will get delivered unless the connection fails. If the connection is lost, the server will request the lost part. There is no corruption while transferring a message.
Streaming: Data is read as a "stream," with nothing distinguishing where one packet ends and another begins. There may be multiple packets per read call.
UDP
Not reliable: UDP is a connectionless protocol. When you send data or a message, you don't know if it'll get there; it could get lost on the way. There may be corruption while transferring a message.
Datagrams: Packets are sent individually and are guaranteed to be whole if they arrive. One packet per read call.
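A small illustration of the difference using Python's standard sockets (loopback addresses and the message contents are arbitrary choices for the example):

import socket, threading

def udp_demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))                    # datagram socket: no connection setup
    port = server.getsockname()[1]
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.sendto(b"hello", ("127.0.0.1", port))     # fire-and-forget; could be lost on a real network
    data, _ = server.recvfrom(1024)                  # one whole datagram per recvfrom
    print("UDP got", data)

def tcp_demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    def client():
        conn = socket.create_connection(("127.0.0.1", port))   # handshake first
        conn.sendall(b"hello")
        conn.sendall(b"world")                                  # stream: no message boundaries
        conn.close()

    threading.Thread(target=client).start()
    accepted, _ = server.accept()
    print("TCP got", accepted.recv(1024))            # may return b"helloworld" in one read

udp_demo()
tcp_demo()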
HTTP
Status code
Groups
4XX Client error 401 Authentication failure; 403 Authorization failure; 404 Resource not found
3XX Redirect 301 Resource moved permanently; 302 Resource moved temporarily
2XX Success 200 OK; 201 Created; 203 Object marked for deletion
Status code, meaning, and examples:
400 (Malformed request): Frequently a problem with parameter formatting or missing headers.
401 (Authentication error): The system doesn't know who the request is from. Authentication signature errors or invalid credentials can cause this.
403 (Authorization error): The system knows who you are, but you don't have permission for the action you're requesting.
404 (Page not found): The resource doesn't exist.
405 (Method not allowed): Frequently a PUT when it needs a POST, or vice versa. Check the documentation carefully for the correct HTTP method.
Verbs
Action: order an iced tea. System call: add the order to the system. HTTP verb and address: POST /orders/. Request body: {"name" : "iced tea", "size" : "trenta"}. Successful response: 201 Created, with Location: /orders/1.
Action: cancel the order. System call: delete the order from the system. HTTP verb and address: DELETE /orders/1. Request body: none. Successful response: 202 Item Marked for Deletion or 204 No Content.
What about actions that don't fit into the world of CRUD operations?
Restructure the action to appear like a field of a resource. This works if the action doesn't take parameters. For
example an activate action could be mapped to a boolean activated field and updated via a PATCH to the
resource.
Treat it like a sub-resource with RESTful principles. For example, GitHub's API lets you star a gist with PUT
/gists/:id/star and unstar with DELETE /gists/:id/star.
Sometimes you really have no way to map the action to a sensible RESTful structure. For example, a multi-
resource search doesn't really make sense to be applied to a specific resource's endpoint. In this case, /search
would make the most sense even though it isn't a resource. This is OK - just do what's right from the perspective
of the API consumer and make sure it's documented clearly to avoid confusion.
Others
Headers
Request
Accept (e.g. text/html, application/json): The client's preferred format for the response body. Browsers tend to prefer text/html, which is a human-friendly format. Applications using an API are likely to request JSON, which is structured in a machine-parseable way. This can be a list, and if so, the list is parsed in priority order: the first entry is the most desired format, all the way down to the last one.
Accept-Language (e.g. en-US): The preferred written language for the response. This is most often used by browsers indicating the language the user has specified as a preference.
User-Agent (e.g. Mozilla/5.0): This header tells the server what kind of client is making the request. This is an important header because sometimes responses or JavaScript actions are performed differently for different browsers. This is used less frequently for this purpose by API clients, but it's a friendly practice to send a consistent user-agent for the server to use when determining how to send the information back.
Content-Length (size of the content body): When sending a PUT or POST, this can be sent so the server can verify that the request body wasn't truncated on the way to the server.
Content-Type (e.g. application/json): When a content body is sent, the client can indicate to the server what the format is for that content in order to help the server respond to the request correctly.
Response
Content-Type (e.g. application/json): As with the request, when the content body is sent back to the client, the Content-Type is generally set to help the client know how best to process it. Note that this is tied somewhat indirectly to the Accept header sent by the client. The server will generally do its best to send the first type of content from the list sent by the client but may not always provide the first choice.
Access-Control-Allow-Headers (e.g. Content-Type, Authorization, Accept): This restricts the headers that a client can use for the request to a particular resource.
Access-Control-Allow-Methods (e.g. GET, PUT, POST, DELETE, OPTIONS): What HTTP methods are allowed for this resource.
Access-Control-Allow-Origin (e.g. * or http://www.example.com): This restricts the locations that can refer requests to the resource.
Compression
Accept-Encoding/Content-Encoding:
Condition: Content compression occurs only when a client advertises that it wants to use it and a server indicates its willingness to enable it.
Clients indicate they want to use it by sending the Accept-Encoding header when making requests. The value
of this header is a comma-separated list of compression methods that the client will accept. For example,
Accept-Encoding: gzip, deflate.
If the server supports any of the compression methods that the client has advertised, it may deliver a
compressed version of the resource. It indicates that the content has been compressed with the Content-
Encoding header in the response. For example, Content-Encoding: gzip. Content-Length header in the
response indicates the size of the compressed content.
Methods
identity: no compression.
compress: the UNIX compress method, which is based on the Lempel-Ziv-Welch (LZW) algorithm
gzip: the most popular format.
deflate: just gzip without the checksum header.
What to compress:
Usually applied to text-based content such as HTML, XML, CSS, and JavaScript.
Not applied to binary data.
Many binary formats such as GIF, PNG, and JPEG already use compression.
Disadvantages:
There is additional CPU usage at both the server side and client side.
There will always be a small percentage of clients that simply can't accept compressed content.
Parameters
Parameters are frequently used in HTTP requests to filter responses or give additional information about the request.
They're used most frequently with GET(read) operations to specify exactly what's wanted from the server. Parameters
are added to the address. They're separated from the address with a question mark (?), and each key-value pair is
separated by an equals sign (=); pairs are separated from each other using the ampersand.
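For example (the endpoint and parameter names here are made up for illustration):

GET /api/products?category=plushies&page=2&per_page=20
// '?' introduces the query string, '=' joins each key to its value, and '&' separates the pairs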
HTTP session
Stateless applications
Web application servers are generally "stateless":
Each HTTP request is independent; server can't tell if 2 requests came from the same browser or user.
Web server applications maintain no information in memory from request to request (only information on disk
survives from one request to another).
Statelessness is not always convenient for application developers: they need to tie together a series of requests from the same user. Since the HTTP protocol is itself stateless, web applications developed techniques to create a concept of a session on top of HTTP so that servers could recognize multiple requests from the same user as parts of a more complex and longer-lasting sequence.
Structure of a session
The session is a key-value pair data structure. Think of it as a hashtable where each user gets a hashkey to put their
data in. This hashkey would be the “session id”.
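A toy illustration of that structure (an in-memory dict standing in for the real session store; the field names are made up):

import uuid

sessions = {}                        # session_id -> key/value session data

def create_session():
    session_id = uuid.uuid4().hex    # the "hashkey" handed back to the user
    sessions[session_id] = {}
    return session_id

sid = create_session()
sessions[sid]["user_id"] = 42
sessions[sid]["cart"] = ["iced tea"]
print(sessions[sid])                 # {'user_id': 42, 'cart': ['iced tea']}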
Server-side session vs client-side cookie
Amount of data: a server-side session can hold a large amount of data; a client-side cookie is limited.
Efficiency: a server-side session saves bandwidth by passing only a reference to the session (the session ID) on each page load; a client-side cookie must pass all of the data to the web server on each page load.
Scalability: a server-side session needs extra effort to scale because requests depend on server state; a client-side cookie is easier to implement.
Cookie Def
Cookies are key/value pairs used by websites to store state information in the browser. Say you have a website (example.com): when the browser requests a webpage, the website can send cookies to store information in the browser.
HTTP/1.1 200 OK
Content-type: text/html
Set-Cookie: foo=10
Set-Cookie: bar=20; Expires=Fri, 30 Sep 2011 11:48:00 GMT
... rest of the response
// Here two cookies foo=10 and bar=20 are stored on the browser. The second one will expire on 30
September. In each subsequent request the browser will send the cookies back to the server.
Advantage: You do not have to store the session state anywhere in your data center. The entire session state is handed to your web server with every web request, thus making your application stateless in the context of the HTTP session.
Disadvantage: Session storage can become expensive. Cookies are sent by the browser with every single request, regardless of the type of resource being requested. As a result, all requests within the same cookie domain will have session storage appended as part of the request.
Use case: When you can keep your data minimal. If all you need to keep in session scope is a user ID or some security token, you will benefit from the simplicity and speed of this solution. Unfortunately, if you are not careful, the data added to the session scope can quickly grow into kilobytes, making web requests much slower, especially on mobile devices. The cost of cookie-based session storage is also amplified by the fact that encrypting serialized data and then Base64 encoding it increases the overall byte count by one third, so that 1KB of session scope data becomes 1.3KB of additional data transferred with each web request and web response.
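A rough sketch of why cookie-based session state inflates every request (only the serialization and Base64 step; the encryption/signing detail mentioned above is omitted):

import base64, json

session = {"user_id": 42, "token": "a" * 1000}   # roughly 1KB of session-scope data
raw = json.dumps(session).encode()
cookie_value = base64.b64encode(raw)             # Base64 adds about one third

print(len(raw), len(cookie_value))               # the encoded value is ~4/3 the raw size
# This cookie is then attached to every request and response within the cookie domain.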
Approaches:
Keep state in main memory
Store session state in files on disk
Store session state in a database
Delegate the session storage to an external data store: Your web application would take the session identifier
from the web request and then load session data from an external data store. At the end of the web request
life cycle, just before a response is sent back to the user, the application would serialize the session data and
save it back in the data store. In this model, the web server does not hold any of the session data between
web requests, which makes it stateless in the context of an HTTP session.
Many data stores are suitable for this use case, for example, Memcached, Redis, DynamoDB, or Cassandra.
The only requirement here is to have very low latency on get-by-key and put-by-key operations. It is best if
your data store provides automatic scalability, but even if you had to do data partitioning yourself in the
application layer, it is not a problem, as sessions can be partitioned by the session ID itself.
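A sketch of that request life cycle with an external session store (a plain dict stands in for Memcached/Redis/etc.; the field names are invented):

session_store = {}                       # stand-in for Memcached, Redis, DynamoDB, ...

def handle_request(session_id, request):
    session = session_store.get(session_id, {})   # load session data by session ID
    session["last_path"] = request["path"]        # application logic mutates the session
    session_store[session_id] = session           # save it back just before responding
    return {"status": 200, "session": session}

print(handle_request("abc123", {"path": "/cart"}))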
Every time an internet user visits a specific website, a new session ID (a unique number that a web site's server assigns to a specific user for the duration of that user's visit) is generated, and an entry is created inside the server's session table.
The load balancer needs to be able to inspect the headers of the request to make sure that requests with the same session cookie always go to the server that initially issued the cookie.
But sticky sessions break the fundamental principle of statelessness, and I recommend avoiding them. Once you allow your web servers to be unique, by storing any local state, you lose flexibility. You will not be able to restart, decommission, or safely auto-scale web servers without breaking users' sessions because their session data will be bound to a single physical machine.
DNS
Resolve domain name to IP address
Design
Initial design
A simple design for DNS would have one DNS server that contains all the mappings. But the problems with a
centralized design include:
A single point of failure: If the DNS server crashes, so does the entire Internet.
Traffic volume: A single DNS server would have to handle all DNS queries.
Distant centralized database: A single DNS server cannot be close to all the querying clients.
Maintenance: The single DNS server would have to keep records for all Internet hosts and would need to be updated frequently.
Internals
DNS records
The DNS servers store resource records (RRs). A resource record is a four-tuple that contains the following fields:
(Name, Value, Type, TTL )
There are the following four types of records
If Type=A, then Name is a hostname and Value is the IP address for the hostname. Thus, a Type A record provides
the standard hostname-to-IP address mapping. For example, (relay1.bar.foo.com, 145.37.93.126, A) is a Type A
record
If Type=NS, then Name is a domain and Value is the hostname of an authoritative DNS server that knows how to
obtain the IP addresses for hosts in the domain. This record is used to route DNS queries further along in the
query chain. As an example, (foo.com, dns.foo.com, NS) is a Type NS record.
If Type=CNAME, then Value is a canonical hostname for the alias hostname Name.
If Type=MX, then Value is the canonical name of a mail server that has an alias hostname Name.
When a user enters a URL into the browser's address bar, the first step is for the browser to resolve the hostname
(http://www.amazon.com/index.html) to an IP address. The browser extracts the host name www.amazon.com from
the URL and delegates the resolving task to the operating system. At this stage, the operating system has a couple of
choices.
It can resolve the address using a static hosts file (such as /etc/hosts on Linux), or
It can query a local DNS server.
The local DNS server forwards the query to a root DNS server. The root DNS server takes note of the .com suffix and returns a list of IP addresses for the TLD servers responsible for the .com domain.
The local DNS server then resends the query to one of the TLD servers. The TLD server takes note of the amazon.com portion and responds with the IP address of the authoritative DNS server for amazon.com.
Finally, the local DNS server resends the query message directly to the authoritative DNS server, which responds with the IP address of www.amazon.com.
Once the browser receives the IP addresses from DNS, it can initiate a TCP connection to the HTTP server process
located at port 80 at that IP address.
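The same resolution can be observed from application code, for example with Python's standard library (this needs network access; the hostname is just the one from the walkthrough above):

import socket

# Delegates to the OS resolver, which consults /etc/hosts, its cache,
# and the configured local DNS server as described above.
print(socket.gethostbyname("www.amazon.com"))

# getaddrinfo returns the full set of address records, which is what a browser
# uses before opening a TCP connection to port 80 (or 443 for HTTPS).
for family, _, _, _, sockaddr in socket.getaddrinfo("www.amazon.com", 80):
    print(family, sockaddr)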
Types
Round-robin DNS
A DNS server feature that allows you to resolve a single domain name to one of many IP addresses.
GeoDNS
A DNS service that allows domain names to be resolved to IP addresses based on the location of the customer. A
client connecting from Europe may get a different IP address than the client connecting from Australia. The goal is to
direct the customer to the closest data center to minimize network latency.
Functionality
DNS Caching
Types:
Whenever the client issues a request to an ISP's resolver, the resolver caches the response for a short period
(TTL, set by the authoritative name server), and subsequent queries for this hostname can be answered directly
from the cache.
All major browsers also implement their own DNS cache, which removes the need for the browser to ask the operating system to resolve. Because this isn't particularly faster than querying the operating system's cache, the primary motivation here is better control over what is cached and for how long.
Performance:
DNS look-up times can vary dramatically - anything from a few milliseconds to perhaps one-half a second if a
remote name server must be queried. This manifests itself mostly as a slight delay when the user first loads the
site. On subsequent views, the DNS query is answered from a cache.
Load balancing
DNS can be used to perform load distribution among replicated servers, such as replicated web servers. For
replicated web servers, a set of IP addresses is thus associated with one canonical hostname. The DNS database
contains this set of IP addresses. When clients make a DNS query for a name mapped to a set of addresses, the
server responds with the entire set of IP addresses, but rotates the ordering of the addresses within each reply.
Because a client typically sends its HTTP request to the IP address that is the first in the set, DNS rotation distributes
the traffic among the replicated servers.
Host alias
A host with a complicated hostname can have one or more alias names. For example, a hostname such as relay1.west-coast.enterprise.com could have two aliases such as enterprise.com and www.enterprise.com.
DNS prefetching
Def
Performing DNS lookups on URLs linked to in the HTML document, in anticipation that the user may eventually click
one of these links. Typically, a single UDP packet can carry the question, and a second UDP packet can carry the
answer.
Control prefetching
Most browsers support a link tag with the nonstandard rel="dns-prefetch" attribute. This causes the browser to prefetch the given hostname and can be used to precache such redirect links. For example:
<link rel="dns-prefetch" href="//www.example.com">
In addition, site owners can disable or enable prefetching through the use of a special HTTP header like:
X-DNS-Prefetch-Control: off
Load balancers
Benefits
Decoupling
Hidden server maintenance. You can take a web server out of the load balancer pool, wait for all active
connections to drain, and then safely shutdown the web server without affecting even a single client. You can use
this method to perform rolling updates and deploy new software across the cluster without any downtime.
Seamlessly increase capacity. You can add more web servers at any time without your client ever realizing it. As
soon as you add a new server, it can start receiving connections.
Automated scaling. If you are on cloud-based hosting with the ability to configure auto-scaling (like Amazon,
Open Stack, or Rackspace), you can add and remove web servers throughout the day to best adapt to the traffic.
Security
SSL termination: By making the load balancer the termination point, the load balancers can inspect the contents of the HTTPS packets. This allows enhanced firewalling and means that you can balance requests based on the contents of the packets.
Filter out unwanted requests or limit them to authenticated users only because all requests to back-end servers
must first go past the balancer.
Protect against SYN floods (DoS attacks) because they pass traffic only on to a back-end server after a full TCP
connection has been set up with the client.
Round-robin algorithm
Def: Cycles through a list of servers and sends each new request to the next server. When it reaches the end of the
list, it starts over at the beginning.
Problems:
Not all requests have an equal performance cost on the server: a request for a static resource will be several orders of magnitude less resource-intensive than a request for a dynamic resource.
Not all servers have identical processing power. The load balancer would need to query each back-end server to discover memory and CPU usage, server load, and perhaps even network latency.
How to support sticky sessions: hashing based on the client's network address might help but is not a reliable option; alternatively, the load balancer can maintain a lookup table mapping session ID to server.
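A bare-bones round-robin selection, plus the session-ID lookup table variant mentioned above (server names are placeholders):

import itertools

servers = ["web1", "web2", "web3"]
rotation = itertools.cycle(servers)              # plain round-robin over the pool

def pick_server():
    return next(rotation)

print([pick_server() for _ in range(5)])         # web1, web2, web3, web1, web2

# Sticky sessions via a lookup table: the first request is balanced normally,
# and later requests carrying the same session ID return to the same server.
session_to_server = {}

def pick_sticky(session_id):
    if session_id not in session_to_server:
        session_to_server[session_id] = pick_server()
    return session_to_server[session_id]

print(pick_sticky("s1"), pick_sticky("s2"), pick_sticky("s1"))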
Security
SSL
Definition
Hyper Text Transfer Protocol Secure (HTTPS) is the secure version of HTTP, the protocol over which data is sent
between your browser and the website that you are connected to. The 'S' at the end of HTTPS stands for 'Secure'. It
means all communications between your browser and the website are encrypted. HTTPS is often used to protect
highly confidential online transactions like online banking and online shopping order forms.
HTTPS pages typically use one of two secure protocols to encrypt communications - SSL (Secure Sockets Layer) or
TLS (Transport Layer Security). Both the TLS and SSL protocols use what is known as an 'asymmetric' Public Key
Infrastructure (PKI) system. An asymmetric system uses two 'keys' to encrypt communications, a 'public' key and a
'private' key. Anything encrypted with the public key can only be decrypted by the private key and vice-versa.
As the names suggest, the 'private' key should be kept strictly protected and should only be accessible to the owner of the private key. In the case of a website, the private key remains securely ensconced on the web server. Conversely, the public key is intended to be distributed to anybody and everybody that needs to be able to decrypt information that was encrypted with the private key.
To avoid the computational cost of PKI, it is used only to exchange a session key; the session key is then used to encrypt further communications.
NoSQL
NoSQL vs SQL
There is no generally accepted definition. All we can do is discuss some common characteristics of the databases that
tend to be called "NoSQL".
Query flexibility: SQL has a low cost on changing queries; it allows you to easily look at the data in different ways, and standard SQL supports things like joins and subqueries. NoSQL has a high cost on changing queries; it does not allow you to easily look at the data in different ways, since NoSQL databases do not have the flexibility of joins or subqueries.
Scalability: SQL relies on transactions to handle consistency across the whole database, which inherently clashes with a cluster environment. NoSQL's aggregate orientation reduces how many nodes we need to query when we are gathering data; by using aggregates, we give the database important information about which bits of data (an aggregate) will be manipulated together, and thus should live on the same node.
NoSQL flavors
Key-value
Suitable use cases
Storing session information: Generally, every web session is unique and is assigned a unique sessionid value.
Applications that store the sessionid on disk or in an RDBMS will greatly benefit from moving to a key-value store,
since everything about the session can be stored by a single PUT request or retrieved using GET. This single-
request operation makes it very fast, as everything about the session is stored in a single object. Solutions such
as Memcached are used by many web applications, and Riak can be used when availability is important
User profiles, Preferences: Almost every user has a unique userId, username, or some other attributes, as well
as preferences such as language, color, timezone, which products the user has access to, and so on. This can all
be put into an object, so getting preferences of a user takes a single GET operation. Similarly, product profiles can
be stored.
Shopping Cart Data: E-commerce websites have shopping carts tied to the user. As we want the shopping carts
to be available all the time, across browsers, machines, and sessions, all the shopping information can be put into a value where the key is the userid. A Riak cluster would be best suited for these kinds of applications.
Relationships among Data: If you need to have relationships between different sets of data, or correlate the data between different sets of keys, key-value stores are not the best solution to use, even though some key-value stores provide link-walking features.
Multioperation transactions: If you're saving multiple keys and there is a failure to save any of them, and you
want to revert or roll back the rest of the operations, key-value stores are not the best solution to be used.
Query by data: If you need to search the keys based on something found in the value part of the key-value pairs, then key-value stores are not going to perform well for you. There is no way to inspect the value on the database side, with the exception of some products like Riak Search or indexing engines like Lucene.
Operations by sets: Since operations are limited to one key at a time, there is no way to operate upon multiple
keys at the same time. If you need to operate upon multiple keys, you have to handle this from the client side.
Document
Suitable use cases
Event logging: Applications have different event logging needs; within the enterprise, there are many different
applications that want to log events. Document databases can store all these different types of events and can
act as a central data store for event storage. This is especially true when the type of data being captured by the
events keeps changing. Events can be sharded by the name of the application where the event originated or by
the type of event such as order_processed or customer_logged.
Content Management Systems, Blogging Platforms: Since document databases have no predefined schemas and usually understand JSON documents, they work well in content management systems or applications for
publishing websites, managing user comments, user registrations, profiles, web-facing documents.
Web Analytics or Real-Time Analytics: Document databases can store data for real-time analytics; since parts of
the document can be updated, it's very easy to store page views or unique visitors, and new metrics can be easily
added without schema changes.
E-Commerce Applications: E-commerce applications often need to have flexible schema for products and
orders, as well as the ability to evolve their data models without expensive database refactoring or data migration.
Complex Transactions Spanning Different Operations: If you need to have atomic cross-document operations,
then document databases may not be for you. However, there are some document databases that do support
these kinds of operations, such as RavenDB.
Queries against Varying Aggregate Structure: Flexible schema means that the database does not enforce any
restrictions on the schema. Data is saved in the form of application entities. If you need to query these entities ad
hoc, your queries will be changing (in RDBMS terms, this would mean that as you join criteria between tables, the
tables to join keep changing). Since the data is saved as an aggregate, if the design of the aggregate is constantly
changing, you need to save the aggregates at the lowest level of granularity; basically, you need to normalize the data. In this scenario, document databases may not work.
Column-Family
Suitable use cases
Event Logging: Column-family databases with their ability to store any data structures are a great choice to store
event information, such as application state or errors encountered by the application. Within the enterprise, all
applications can write their events to Cassandra with their own columns and the row key of the form
appname:timestamp. Since we can scale writes, Cassandra would work ideally for an event logging system.
Content Management Systems, Blogging Platforms: Using column-families, you can store blog entries with
tags, categories, links, and trackbacks in different columns. Comments can be either stored in the same row or
moved to a different keyspace; similarly, blog users and the actual blogs can be put into different column families.
Counters: Often, in web applications, you need to count and categorize visitors of a page to calculate analytics; you can use the CounterColumnType during creation of a column family.
Expiring usage: You may provide demo to users, or may want to show ad banners on a website for a specific time.
You can do this by using expiring columns: Cassandra allows you to have columns which, after a given time, are
deleted automatically. This time is known as TTL and is defined in seconds. The column is deleted after the TTL
has elapsed; when the column does not exist, the access can be revoked or the banner can be removed.
// Once a column family is created, you can have arbitrary columns for each page visited within the web application
INCR visit_counter['mfowler'][home] BY 1;
INCR visit_counter['mfowler'][products] BY 1;
INCR visit_counter['mfowler'][contactus] BY 1;
// expiring columns
SET Customer['mfowler']['demo_access'] = 'allowed' WITH ttl=2592000;
Graph
Suitable use cases
Connected data:
Social networks are where graph databases can be deployed and used very effectively. These social graphs
don't have to be only of the friend kind; for example, they can represent employees, their knowledge, and
where they worked with other employees on different projects. Any link-rich domain is well-suited for graph
databases.
If you have relationships between domain entities from different domains (such as social, spatial, commerce)
in a single database, you can make these relationships more valuable by providing the ability to traverse
across domains.
Routing, Dispatch, and Location-Based Services: Every location or address that has a delivery is a node, and all the nodes where deliveries have to be made by the delivery person can be modeled as a graph of nodes.
Relationships between nodes can have the property of distance, thus allowing you to deliver the goods in an
efficient manner. Distance and location properties can also be used in graphs of places of interest, so that your
application can provide recommendations of good restaurants or entertainment options nearby. You can also
create nodes for your points of sales, such as bookstores or restaurants, and notify the users when they are close
to any of the nodes to provide location-based services.
Recommendation Engines:
As nodes and relationships are created in the system, they can be used to make recommendations like "your
friends also bought this product" or "when invoicing this item, these other items are usually invoiced." Or, it
can be used to make recommendations to travelers mentioning that when other visitors come to Barcelona
they usually visit Antonio Gaudi's creations.
An interesting side effect of using the graph databases for recommendations is that as the data size grows,
the number of nodes and relationships available to make the recommendations quickly increases. The same
data can also be used to mine information; for example, which products are always bought together, or which
items are always invoiced together; alerts can be raised when these conditions are not met. Like other
recommendation engines, graph databases can be used to search for patterns in relationships to detect
fraud in transactions.
When you want to update all or a subset of entities (for example, in an analytics solution where all entities may need to be updated with a changed property), graph databases may not be optimal, since changing a property on all the nodes is not a straightforward operation. Even if the data model works for the problem domain, some databases may be unable to handle lots of data, especially in global graph operations.
Scaling
Functional partitioning
Consistency
Use all lowercase, hyphenated endpoints such as /api/verification-tokens. This increases URL "hackability", which is
the ability to manually go in and modify the URL by hand. You can pick any naming scheme you like, as long as you're
consistent about it.
Use a noun or two to describe the resource, such as users, products, or verification-tokens.
Always describe resources in plural: /api/users rather than /api/user. This makes the API more semantic.
Collection resource: /users
Instance resource: /users/007
HEAD /products/:id Returns whether the product exists through a status code of 200 or 404
POST /authentication/login Most other API methods should use POST requests
Versioning
What is versioning? In traditional API scenarios, versioning is useful because it allows you to commit breaking
changes to your service without demolishing the interaction with existing consumers.
Whether you need versioning? Unless your team and your application are small enough that both live in the same
repository and developers touch on both indistinctly, go for the safe bet and use versions in your API.
Is the API public facing as well? In this case, versioning is necessary, baking a bit more predictability into your
service's behavior.
Is the API used by several applications? Are the API and the front end developed by separate teams? Is there a drawn-out process to change an API endpoint? If any of these cases apply, you're probably better off versioning your API.
How to implement versioning? There are two popular ways to do it:
The API version should be set in HTTP headers, and that if a version isn't specified in the request, you should get
a response from the latest version of the API. But it can lead to breaking changes inadvertently.
The API version should be embedded into the URL, for example /api/v1/users. This identifies right away which version of the API your application wants by looking at the requested endpoint. An API version should be included in the URL to ensure browser explorability.
Request: You should decide on a consistent data-transfer strategy to upload the data to the server when making PUT,
PATCH, or POST requests that modify a resource in the server. Nowadays, JSON is used almost ubiquitously as the
data transport of choice due to its simplicity, the fact that it's native to browsers, and the high availability of JSON
parsing libraries across server-side languages.
Response:
Responses should conform to a consistent data-transfer format, so you have no surprises when parsing the
response. Even when an error occurs on the server side, the response is still expected to be valid according to the
chosen transport. For example, if your API is built using JSON, then all the responses produced by your API should be valid JSON.
You should figure out the envelope in which you'll wrap your responses. An envelope, or message wrapper, is
crucial for providing a consistent experience across all your API endpoints, allowing consumers to make certain
assumptions about the responses the API provides. A useful starting point may be an object with a single field,
named data, that contains the body of your response.
{
"data" : {} // actual response
}
Choose the right status codes for the problems your server is encountering so that the client knows what to do, but
even more important is to make sure the error messages that are coming back are clear.
An authentication error can happen because the wrong keys are used, because the signature is generated
incorrectly, or because it's passed to the server in the wrong way. The more information you can give to
developers about how and why the command failed, the more likely they'll be able to figure out how to solve the
problem.
When you respond with status codes in the 2XX Success class, the response body should contain all of the relevant
data that was requested. Here's an example showing the response to a request on a product that could be found,
alongside with the HTTP version and status code:
HTTP/1.1 200 OK
{
"data": {
"id" : "baeb-b001",
"name" : "Angry Pirate Plush Toy",
"description" : "Batteries not included",
"price" : "$39.99",
"categories": ["plushies", "kids"]
}
}
If the request most likely failed due to an error made by the client side (the user wasn't properly authenticated, for instance), you should use 4XX Client Error codes. If the request most likely failed due to a server-side error, then you should use 5XX error codes. In these cases, you should use the error field to describe why the request was faulty.
// if input validation fails on a form while attempting to create a product, you could return a response using a 400 Bad Request status code, as below
HTTP/1.1 400 Bad Request
{
"error": {
"code": "bf-400",
"message": "Some required fields were invalid.",
"context": {
"validation": [
"The product name must be 6-20 alphanumeric characters",
"The price cann't be negative",
"At least one product category should be selected. "
]
}
}
}
"error": {
"code": "bf-500",
"message": "An unexpected error occurred while accessing the database",
"context": {
"id": "baeb-b001"
}
}
}
Paging
Suppose a user makes a query to your API for /api/products. How many products should that endpoint return? You
could set a default pagination limit across the API and have the ability to override that default for each individual
endpoint. Within a reasonable range, the consumer should have the ability to pass in a query string parameter and
choose a different limit.
Using Github paging API as an example, requests that return multiple items will be paginated to 30 items by
default. You can specify further pages with the ?page parameter. For some resources, you can also set a custom
page size up to 100 with the ?per_page parameter. Note that for technical reasons not all endpoints respect the ?
per_page parameter, see events for example. Note that page numbering is 1-based and that omitting the ?page
parameter will return the first page.
curl 'https://api.github.com/user/repos?page=2&per_page=100'
Common parameters
page and per_page. Intuitive for many use cases. Links to "page 2" may not always contain the same data.
offset and limit. This standard comes from the SQL database world, and is a good option when you need stable
permalinks to result sets.
since and limit. Get everything "since" some ID or timestamp. Useful when it's a priority to let clients efficiently
stay "in sync" with data. Generally requires result set order to be very stable.
Metadata
Include enough metadata so that clients can calculate how much data there is, and how and whether to fetch the
next set of results. Examples of how that might be implemented:
{
"results": [ ... actual results ... ],
"pagination": {
"count": 2340,
"page": 4,
"per_page": 20
}
}
Link header
The pagination info is included in the Link header. It is important to follow these Link header values instead of
constructing your own URLs. In some instances, such as in the Commits API, pagination is based on SHA1 and not
on page number.
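As an illustration, a paginated GitHub-style response could carry a Link header along these lines (the URLs and page numbers are examples, not exact output):
Link: <https://api.github.com/user/repos?page=3&per_page=100>; rel="next",
  <https://api.github.com/user/repos?page=50&per_page=100>; rel="last"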
Rel attribute
describes the relationship between the requested page and the linked page
next: The link relation for the immediate next page of results.
prev: The link relation for the immediate previous page of results.
Cases exist where data flows too rapidly for traditional paging methods to behave as expected. For instance, if a few
records make their way into the database between requests for the first page and the second one, the second page
results in duplicates of items that were on page one but were pushed to the second page as a result of the inserts.
This issue has two solutions:
The first is to use identifiers instead of page numbers (see the example after this list). This allows the API to figure out where you left off, and
even if new records get inserted, you'll still get the next page in the context of the last range of identifiers that the
API gave you.
The second is to give tokens to the consumer that allow the API to track the position they arrived at after the last
request and what the next page should look like.
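A minimal sketch of the first, identifier-based approach, assuming a hypothetical since_id query parameter: the client passes the last identifier it has seen, and the API returns the next batch together with the cursor to use for the following request (all values are illustrative):
GET /api/products?since_id=baeb-b001&limit=30 HTTP/1.1

HTTP/1.1 200 OK
{
"data": [ ... next 30 products ... ],
"pagination": {
"next_since_id": "baeb-b0d7"
}
}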
Benefits
You can distribute traffic among your web service machines on a per-request basis. You can deploy a load balancer
between your web services and their clients, and each request can be sent to any of the available web service
machines. Being able to distribute requests in a round-robin fashion allows for better load distribution and more
flexibility.
Since each web service request can be served by any of the web service machines, you can take service machines
out of the load balancer pool as soon as they crash. Most of the modern load balancers support heartbeat checks to
make sure that web services machines serving the traffic are available. As soon as a machine crashes or experiences
some other type of failure, the load balancer will remove that host from the load-balancing pool, reducing the capacity
of the cluster, but preventing clients from timing out or failing to get responses.
You can restart and decommission servers at any point in time without worrying about affecting your clients. For
example, if you want to shut down a server for maintenance, you need to take that machine out of the load balancer
pool. Most load balancers support graceful removal of hosts, so new connections from clients are not sent to that
server any more, but existing connections are not terminated to prevent client-side errors. After removing the host
from the pool, you need to wait for all of your open connections to be closed by your clients, which can take a minute
or two, and then you can safely shut down the machine without affecting even a single web service request.
You will be able to perform zero-downtime updates of your web services. You can roll out your changes to one server
at a time by taking it out of rotation, upgrading, and then putting it back into rotation. If your software does not allow
you to run two different versions at the same time, you can deploy to an alternative stack and switch all of the traffic at
once on the load balancer level.
By removing all of the application state from your web services, you will be able to scale your web services layer by
simply adding more clones. All you need to do is add more machines to the load balancer pool to be able to
support more concurrent connections, perform more network I/O, and compute more responses.
The first use case is related to security, as your web service is likely going to require clients to pass some
authentication token with each web service request. The token will have to be validated on the web service side, and
client permissions will have to be evaluated in some way to make sure that the user has access to the operation they
are attempting to perform. You could cache authentication and authorization details directly on your web service
machines, but that could cause problems when changing permissions or blocking accounts, as these objects would
need to expire before new permissions could take effect. A better approach is to use a shared in-memory object
cache and have each web service machine reach out for the data needed at request time. If not present, data could be
fetched from the original data store and placed in the object cache. By having a single central copy of each cached
object, you will be able to easily invalidate it when users' permissions change.
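A rough sketch of that read-through pattern for authorization data. The SharedObjectCache and PermissionStore interfaces below are hypothetical placeholders (standing in for a Memcached/Redis client and the authoritative data store), not a real library API:
import java.util.Set;
import java.util.concurrent.TimeUnit;

interface SharedObjectCache {
    Object get(String key);
    void set(String key, Object value, long ttlSeconds);
    void delete(String key);
}

interface PermissionStore {
    Set<String> loadPermissions(String userId);
}

/** Read-through lookup of permissions in a cache shared by all web service machines. */
public class AuthCache {
    private final SharedObjectCache cache;
    private final PermissionStore store;

    public AuthCache(SharedObjectCache cache, PermissionStore store) {
        this.cache = cache;
        this.store = store;
    }

    @SuppressWarnings("unchecked")
    public Set<String> permissionsFor(String userId) {
        String key = "perm:" + userId;
        Object cached = cache.get(key);
        if (cached != null) {
            return (Set<String>) cached;                   // shared-cache hit
        }
        Set<String> fresh = store.loadPermissions(userId); // fall back to the data store
        cache.set(key, fresh, TimeUnit.MINUTES.toSeconds(10));
        return fresh;
    }

    /** A single central copy means one delete makes a permission change visible everywhere. */
    public void invalidate(String userId) {
        cache.delete("perm:" + userId);
    }
}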
Another common problem when dealing with stateless web services is how to support resource locking. You can use
distributed lock systems like Zookeeper or even build your own simple lock service using a data store of your choice.
To make sure your web services scale well, you should avoid resource locks for as long as possible and look for
alternative ways to synchronize parallel processes.
Distributed locking creates an opportunity for your service to stall or fail. This, in turn, increases your latency and
reduces the number of parallel clients that your web service can serve. Instead of resource locks, you can
sometimes use optimistic concurrency control, where you check the state before the final update rather than
acquiring locks (a sketch follows this list). You can also consider message queues as a way to decouple components and remove the need
for resource locking in the first place.
If none of the above techniques work for you and you need to use resource locks, it is important to strike a
balance between acquiring a lot of fine-grained locks and having coarse locks that block access to large
sets of data. With too many fine-grained locks, you increase the risk of deadlocks; with only a few coarse locks,
you reduce concurrency, because multiple web services end up blocked waiting on the same resource lock.
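A minimal sketch of optimistic concurrency control using a version column, assuming a plain JDBC connection and a hypothetical accounts table; the write succeeds only if the row has not been changed since it was read, so no lock is held in between:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OptimisticUpdate {
    /** Returns true if the balance was updated, false if another writer won the race. */
    public static boolean addFunds(Connection db, long accountId, long amount) throws SQLException {
        long balance, version;
        try (PreparedStatement read = db.prepareStatement(
                "SELECT balance, version FROM accounts WHERE id = ?")) {
            read.setLong(1, accountId);
            try (ResultSet rs = read.executeQuery()) {
                if (!rs.next()) return false;
                balance = rs.getLong("balance");
                version = rs.getLong("version");
            }
        }
        // conditional update: no lock is held between the read above and this write
        try (PreparedStatement write = db.prepareStatement(
                "UPDATE accounts SET balance = ?, version = version + 1 " +
                "WHERE id = ? AND version = ?")) {
            write.setLong(1, balance + amount);
            write.setLong(2, accountId);
            write.setLong(3, version);
            return write.executeUpdate() == 1; // 0 rows means a concurrent modification happened
        }
    }
}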
The last challenge is application-level transactions. A distributed transaction is a set of internal service steps and
external web service calls that either complete together or fail entirely. It is very difficult to scale and coordinate
without sacrificing high availability. The most common method of implementing distributed transactions is the 2 Phase
Commit algorithm. An example of a distributed transaction would be a web service that creates an order within an
online shop.
The first alternative to distributed transactions is to not support them at all. As long as the core of your system's
functionality is not compromised, your company may be fine with such minor inconsistencies in return for the
time saved by not developing them.
The second alternative to distributed transactions is to provide a mechanism of compensating transactions. A
compensating transaction can be used to revert the result of an operation that was issued as part of a larger
logical transaction that has failed. The benefit of this approach is that web services do not need to wait for one
another; they do not need to maintain any state or resources for the duration of the overarching transaction
either.
From a caching perspective, the GET method is the most important one, as GET responses can be cached.
To be able to scale using cache, you would usually deploy reverse proxies between your clients and your web service.
As your web services layer grows, you may end up with a more complex deployment where each of your web services
has a reverse proxy dedicated to serving its results. Depending on the reverse proxy used, you may also have load
balancers deployed between reverse proxies and web services to distribute the underlying network traffic and provide
quick failure recovery.
Cache-Control header
Setting the Cache-Control header to private bypasses intermediaries (such as nginx, other caching layers like Varnish,
and all kinds of hardware in between) and only allows the end client to cache the response.
Setting it to public allows intermediaries to store a copy of the response in their cache.
Expires
Tells the browser that a resource should be cached and not requested again until the expiration date has elapsed.
It's hard to define future Expires headers in API responses because if the data on the server changes, the
client's cache becomes stale, but the client doesn't have any way of knowing that until the expiration date. A
conservative alternative to the Expires header in responses is using a pattern called "conditional requests".
Last-Modified/If-Modified-Since/Max-age
Specify a Last-Modified header in your response. It's best to also specify a max-age in the Cache-Control header, to let
the browser invalidate the cache after a certain period of time even if the modification date doesn't change.
The next time the browser requests this resource, it sends the If-Modified-Since request header, asking the server
for the full contents only if the resource has changed since that date. If the resource hasn't changed since Thu, 3 Jul 2014
18:31:12 GMT, the server returns an empty body with the 304 Not Modified status code.
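A sketch of the resulting exchange (the resource path and dates are illustrative): the first response carries the validators, the browser revalidates once max-age has passed, and the server answers 304 when nothing changed.
HTTP/1.1 200 OK
Cache-Control: max-age=600
Last-Modified: Thu, 3 Jul 2014 18:31:12 GMT

GET /products/baeb-b001 HTTP/1.1
If-Modified-Since: Thu, 3 Jul 2014 18:31:12 GMT

HTTP/1.1 304 Not Modified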
ETag
The ETag header is usually a hash that represents the resource in its current state. This allows the server to identify if the
cached contents of the resource are different than the most recent version:
On subsequent requests, the If-None-Match request header is sent with the ETag value of the last requested version
for the same resource. If the current version has the same ETag value, your current version is what the client has
cached and a 304 Not Modified response will be returned.
If-None-Match: "d5jiodjiojiojo"
Vary: Authorization
You could implement caching of authenticated REST resources by using headers like Vary: Authorization in your web
service responses. Responses with such headers instruct HTTP caches to store a separate response for each value of
the Authorization header.
Functional partitioning
By functional partitioning, you group closely related functionality together. The resulting web services are loosely
coupled and they can now be scaled independently.
Security
Throttling
This kind of safeguarding is usually unnecessary when dealing with an internal API, or an API meant only for your front
end, but it's a crucial measure to take when exposing the API publicly.
Suppose you define a rate limit of 2,000 requests per hour for unauthenticated users; the API should include the
following headers in its responses, with every request shaving off a point from the remainder. The X-RateLimit-Reset
header should contain a UNIX timestamp describing the moment when the limit will be reset
Once the request quota is drained, the API should return a 429 Too Many Requests response, with a helpful error
message wrapped in the usual error envelope:
X-RateLimit-Limit: 2000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1404429213925
{
"error": {
"code": "bf-429",
"message": "Request quota exceeded. Wait 3 minutes and try again.",
"context": {
"renewal": 1404429213925
}
}
}
However, it can be very useful to notify consumers of their limits before they actually hit them. This is an area that
currently lacks standards but has a number of popular conventions using HTTP response headers.
Documentation
Others
Sharding benefits
Scale horizontally to any size. Without sharding, sooner or later your data set will be too large for a single server
to manage, or you will get too many concurrent connections for a single server to handle. You are also likely to reach
your I/O throughput capacity as you keep reading and writing more data. By using application-level sharding, none of
the servers needs to hold all of the data. This allows you to have multiple MySQL servers, each with a reasonable
amount of RAM, hard drives, and CPUs, and each of them being responsible for a small subset of the overall data,
queries, and read/write throughput.
Since sharding splits data into disjoint subsets, you end up with a shared-nothing architecture. There is no overhead of
communication between servers, and there is no cluster-wide synchronization or blocking. Servers are independent
of each other because they share nothing. Each server can make authoritative decisions about data modifications.
You can implement sharding in the application layer and then apply it to any data store, regardless of whether it supports
sharding out of the box or not. You can apply sharding to object caches, message queues, nonstructured data stores,
or even file systems.
Sharding key
Determine what tables need to be sharded. A good starting point for deciding that is to look at the number of rows in
the tables as well as the dependencies between the tables.
Typically you use only a single column as the partition key. Composite keys built from multiple columns are possible,
but they are harder to maintain and to query against.
Sharding on a column that is a primary key offers significant advantages. The reason for this is that the column
should have a unique index, so that each value in the column uniquely identifies the row.
Sharding function
Static sharding
Def: The sharding key is mapped to a shard identifier using a fixed assignment that never changes.
Static sharding schemes run into problems when the distribution of the queries is not even.
Types:
Range partitioning (used in HBase)
Easy to implement, but the distribution can easily become uneven. For example, if you are using URIs as keys,
"hot" sites will be clustered together when you actually want the opposite, to spread them out. One shard can
become overloaded and you have to split it repeatedly to be able to cope with the increase in load.
Hash partitioning
Computes a hash of the input in some manner (MD5 or SHA-1) and then uses modulo arithmetic to get a
number between 1 and the number of shards.
Evenly distributed, but changing the number of servers requires rehashing and a large amount of data
migration.
Consistent hashing: Guaranteed to move rows from just one old shard to the new shard. The entire hash
range is shown as a ring. On the hash ring, the shards are assigned to points on the ring using the hash
function. In a similar manner, the rows are distributed over the ring using the same hash function. Each shard
is now responsible for the region of the ring that starts at the shard's point on the ring and continues to the
next shard point. Because a region may start at the end of the hash range and wrap around to the beginning
of the hash range, a ring is used here instead of a flat line.
Pick a hash function with a big range, hence a lot of "points" on the hash ring where rows
can be assigned. The most commonly used functions are MD5, SHA, and Murmur hash (e.g., murmur3, with a
128-bit output range).
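A toy sketch of such a hash ring in Java, using the first 64 bits of an MD5 digest as the ring position and a sorted map to find the next shard point clockwise; virtual nodes are added to smooth the distribution. This is illustrative, not a production implementation:
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

/** Shards are placed on a ring of hash values; each key is owned by the first shard clockwise from its hash. */
public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addShard(String shardName, int virtualNodes) {
        // virtual nodes smooth out the distribution when there are only a few shards
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(shardName + "#" + i), shardName);
        }
    }

    public String shardFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        // wrap around to the beginning of the ring if we fall past the last shard point
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            // fold the first 8 bytes of the MD5 digest into a long ring position
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}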
Dynamic sharding
The sharding key is looked up in a dictionary that indicates which shard contains the data.
More flexible. You are allowed to change the location of shards and it is also easy to move data between shards if
you have to. You do not need to migrate all of the data in one shot, but you can do it incrementally, one account at
a time. To migrate a user, you need to lock its account, migrate the data, and then unlock it. You could usually do
these migrations at night to reduce the impact on the system, and you could also migrate multiple accounts at the
same time. There is an additional level of flexibility, as you can cherry-pick users and migrate them to the shards
of your choice. Depending on the application requirements, you could migrate your largest or busiest clients to
separate dedicated database instances to give them more capacity.
Requires a centralized store called the sharding database and extra queries to find the correct shard to retrieve
the data from.
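A minimal sketch of that lookup, with an in-memory map standing in for the centralized sharding database (which in practice would be persistent and highly available):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Dynamic sharding: the sharding key is looked up in a directory that says which shard holds the data. */
public class ShardDirectory {
    private final Map<String, String> keyToShard = new ConcurrentHashMap<>();

    public String shardFor(String shardingKey) {
        // one extra lookup compared to static sharding, but the location stays flexible
        return keyToShard.getOrDefault(shardingKey, "shard-default");
    }

    /** Called after an account's data has been migrated to a new shard. */
    public void moveKey(String shardingKey, String newShard) {
        keyToShard.put(shardingKey, newShard);
    }
}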
Challenges
Cross-shard joins
Tricky to execute queries spanning multiple shards. The most common reason for using cross-shard joins is to create
reports. This usually requires collecting information from the entire database. There are basically two approaches to
solve this problem
Execute the query in a map-reduce fashion (i.e., send the query to all shards and collect the results into a single
result set). Note that simply running the same query on each of your servers and, say, picking the highest of
the values will not always guarantee a correct result; the partial results usually need to be merged properly.
Replicate all the shards to a separate reporting server and run the query there. This approach is easier. It is
usually feasible, as well, because most reporting is done at specific times, is long-running, and does not depend
on the current state of the database.
Using AUTO_INCREMENT
It is quite common to use AUTO_INCREMENT to create a unique identifier for a column. However, this fails in a sharded
environment because the shards do not synchronize their AUTO_INCREMENT identifiers. This means that if you insert
a row in one shard, it might well happen that the same identifier is used on another shard. If you truly want to generate
a unique identifier, there are basically three approaches.
Generate a unique UUID. The drawback is that the identifier takes 128 bits (16 bytes).
Use a composite identifier, where the first part is the shard identifier and the second part is a locally generated
identifier (see the sketch after this list). Note that the shard identifier is used when generating the key, so if a row with this identifier is moved,
the original shard identifier has to move with it. You can solve this by maintaining, in addition to the column with
the AUTO_INCREMENT, an extra column containing the shard identifier for the shard where the row was created.
Use atomic counters provided by some data stores. For example, if you already use Redis, you could create a
counter for each unique identifier. You would then use Redis' INCR command to increase the value of a selected
counter and get back a new, unique value.
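A sketch of the composite-identifier idea: pack the shard id into the high bits of a 64-bit value and a locally generated sequence into the low bits. The in-process counter below stands in for the shard's AUTO_INCREMENT, and the 12-bit shard field is an arbitrary choice:
import java.util.concurrent.atomic.AtomicLong;

public class CompositeIdGenerator {
    private static final int SHARD_BITS = 12;                       // up to 4096 shards
    private static final long LOCAL_MASK = (1L << (63 - SHARD_BITS)) - 1;

    private final long shardId;
    private final AtomicLong localCounter = new AtomicLong();       // placeholder for AUTO_INCREMENT

    public CompositeIdGenerator(long shardId) {
        this.shardId = shardId;
    }

    /** Ids from different shards can never collide because the shard id is part of the value. */
    public long nextId() {
        long local = localCounter.incrementAndGet() & LOCAL_MASK;
        return (shardId << (63 - SHARD_BITS)) | local;
    }

    /** Recover the shard that created a given id, e.g. when routing a lookup. */
    public static long shardOf(long id) {
        return id >>> (63 - SHARD_BITS);
    }
}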
Distributed transactions
Lose the ACID properties of your database as a whole. Maintaining ACID properties across shards requires you to use
distributed transactions, which are complex and expensive to execute (most open-source database engines like
MySQL do not even support distributed transactions).
Clones - Replication
Replication purpose
Duplicate components
Def: Keep duplicates around for each component - ready to take over immediately if the original component fails.
Characteristics: Do not lose performance when switching and switching to the standby is usually faster than
restructuring the system. But expensive.
For example: Hot standby
A dedicated server that just duplicates the main master. The hot standby is connected to the master as a
slave, so that it reads and applies all changes. This setup is often called primary-backup configuration.
Def: Have extra capacity in the system so that if a component fails, you can still handle the load.
Characteristics: Should one of the components fail, the system will still be responding, but the capacity of the
system will be reduced.
Slave failures
Because the slaves are used only for read queries, it is sufficient to inform the load balancer that the slave is
missing. Then we can take the failing slave out of rotation, rebuild it, and put it back.
Master failures
Problems:
All the slaves have stale data.
Some queries may block if they are waiting for changes to arrive at the slave. Some queries may make it into
the relay log of the slave and therefore will eventually be executed by the slave. No special consideration has
to be taken on behalf of these queries.
For queries that are waiting for events that did not leave the master before it crashed, they are usually
reported as failures so users should reissue the query.
Solutions:
If simply restarting does not work:
First find out which of your slaves is most up to date.
Then reconfigure it to become a master.
Finally reconfigure all remaining slaves to replicate from the new master.
Relay failures
For servers acting as relay servers, the situation has to be handled specially. If they fail, the remaining slaves have
to be redirected to use some other relay or the master itself.
Disaster recovery
Disaster does not have to mean earthquakes or floods; it just means that something went very bad for the
computer and it is not local to the machine that failed. Typical examples are lost power in the data center (not
necessarily because the power was lost in the city; just losing power in the building is sufficient.)
The nature of a disaster is that many things fail at once, making it impossible to handle redundancy by duplicating
servers at a single data center. Instead, it is necessary to ensure data is kept safe at another geographic location,
and it is quite common for companies to ensure high availability by having different components at different
offices.
When to use
Scale reads: Instead of a single server having to respond to all the queries, you can have many clones sharing the
load. You can keep scaling read capacity by simply adding more slaves. And if you ever hit the limit of how many
slaves your master can handle, you can use multilevel replication to further distribute the load and keep adding even
more slaves. By adding multiple levels of replication, your replication lag increases, as changes need to propagate
through more servers, but you can increase read capacity.
Scale the number of concurrently reading clients and the number of queries per second: If you want to scale your
database to support 5,000 concurrent read connections, then adding more slaves or caching more aggressively can
be a great way to go.
Scale writes: No matter what topology you use, all of your writes need to go through a single machine.
Although a dual master architecture appears to double the capacity for handling writes (because there are two
masters), it actually doesn't. Writes are just as expensive as before because each statement has to be executed
twice: once when it is received from the client and once when it is received from the other master. All the writes
done by the A clients, as well as B clients, are replicated and get executed twice, which leaves you in no better
position than before.
Not a good way to scale the overall data set size: If you want to scale your active data set to 5TB, replication would not
help you get there. The reason why replication does not help in scaling the data set size is that all of the data must be
present on each of the machines. The master and each of its slaves need to have all of the data.
Def of active data set: All of the data that must be accessed frequently by your application. (all of the data your
database needs to read from or write to disk within a time window, like an hour, a day, or a week.)
Size of active data set: When the active data set is small, the database can buffer most of it in memory. As your
active data set grows, your database needs to load more disk blocks because in-memory buffers are not large
enough to contain enough of the active disk blocks.
Access pattern of data set
Like a time-window: In an e-commerce website, you use tables to store information about each purchase.
This type of data is usually accessed right after the purchase and then it becomes less and less relevant as
time goes by. Sometimes you may still access older transactions after a few days or weeks to update
shipping details or to perform a refund, but after that, the data is pretty much dead except for an occasional
report query accessing it.
Unlimited data set growth: On a website that allows users to listen to music online, your users would likely
come back every day or every week to listen to their music. In such a case, no matter how old an account is,
the user is still likely to log in and request her playlists on a weekly or daily basis.
Replication Topology
Master-slave vs peer-to-peer
Master-slave replication speeds up recovery after a failure of the master, since a slave can be appointed as the new
master very quickly. On the other hand, reading from a slave can mean that a client cannot read a write it just made.
Master-slave replication
Responsibility:
Master is responsible for all data-modifying commands like updates, inserts, deletes or create table statements.
The master server records all of these statements in a log file called a binlog, together with a timestamp, and a
sequence number to each statement. Once a statement is written to a binlog, it can then be sent to slave servers.
Slave is responsible for all read statements.
Replication process: The master server writes commands to its own binlog, regardless if any slave servers are
connected or not. The slave server knows where it left off and makes sure to get the right updates. This asynchronous
process decouples the master from its slaves - you can always connect a new slave or disconnect slaves at any point
in time without affecting the master.
i. First the client connects to the master server and executes a data modification statement. The statement is
executed and written to a binlog file. At this stage the master server returns a response to the client and
continues processing other transactions.
ii. At any point in time the slave server can connect to the master server and ask for an incremental update of the
master's binlog file. In its request, the slave server provides the sequence number of the last command that it saw.
iii. Since all of the commands stored in the binlog file are sorted by sequence number, the master server can quickly
locate the right place and begin streaming the binlog file back to the slave server.
iv. The slave server then writes all of these statements to its own copy of the master's binlog file, called a relay log.
v. Once a statement is written to the relay log, it is executed on the slave data set, and the offset of the most
recently seen command is increased.
Number of slaves
It is a common practice to have two or more slaves for each master server. Having more than one slave machine has
the following benefits:
Distribute read-only statements among more servers, thus sharing the load among more servers
Use different slaves for different types of queries. E.g. Use one slave for regular application queries and another
slave for slow, long-running reports.
Losing a slave is a nonevent, as slaves do not have any information that would not be available via the master or
other slaves.
Peer-to-peer replication
Dual masters
Two masters replicate each other to keep both current. This setup is very simple to use because it is symmetric.
Failing over to the standby master does not require any reconfiguration of the main master, and failing back to the
main master again when the standby master fails in turn is very easy.
Active-active: Writes go to both servers, which then transfer changes to the other master.
Active-passive: One of the masters handles writes while the other server just keeps current with the active
master.
The most common use of an active-active dual masters setup is to have the servers geographically close to
different sets of users - for example, in branch offices at different places in the world. The users can then work
with the local server, and the changes will be replicated over to the other master so that both masters are kept in
sync.
Circular replication
Replication mode
Asynchronous: The master does not wait for the slaves to apply the changes, but instead just dispatches each change
request to the slaves and assume they will catch up eventually and replicate all the changes.
Synchronous: The master and slaves are always in sync, and a transaction is not allowed to be committed on the
master unless the slaves agree to commit it as well (i.e. synchronous replication makes the master wait for all the
slaves to keep up with the writes.)
Synchronous vs Asynchronous
Asynchronous replication is a lot faster than synchronous replication. Compared with asynchronous replication,
synchronous replication requires extra synchronization to guarantee consistency. It is usually implemented through a
protocol called two-phase commit, which guarantees consistency between the master and slaves. What makes this
protocol slow is that it requires a total of four messages, including messages with the transaction and the prepare
request. The major problem is not the amount of network traffic required to handle the synchronization, but the
latency introduced by the network and by processing the commit on the slave, together with the fact that the commit
is blocked on the master until all the slaves have acknowledged the transaction. In contrast, the master does not have
to wait for the slave, but can report the transaction as committed immediately, which improves performance
significantly.
The performance of asynchronous replication comes at the price of consistency. In asynchronous replication the
transaction is reported as committed immediately, without waiting for any acknowledgement from the slave.
Cache
Access pattern
Write-through. def: writes go through the cache, and a write is confirmed as successful only if the writes to both the DB and the cache succeed.
use-case: applications which write and re-read the information quickly. But the write latency might be much higher
because of the two write phases.
Write-around. def: writes go directly to the DB. The cache reads the info from the DB in case of a miss.
use-case: lower write load on the cache and faster writes, but can lead to higher read latency for applications which
write and re-read the information quickly.
Write-back. def: the write is done directly to the caching layer, and the write is confirmed as soon as the write to the cache completes. The
cache then asynchronously syncs this write to the DB.
use-case: quick write latency and high write throughput. But might lose data in case the cache layer dies. A toy sketch of the three policies follows.
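The toy illustration below uses an in-process map as the "cache" and a stub class as the "database"; real systems would use Memcached/Redis and a durable write queue. It only shows where the write and the acknowledgement happen in each policy:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WritePolicies {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Database db = new Database();
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();

    /** Write-through: confirm only after both the DB and the cache succeed. */
    public void writeThrough(String key, String value) {
        db.write(key, value);            // fails fast if the DB write fails
        cache.put(key, value);
    }

    /** Write-around: skip the cache on writes; the cache fills on a read miss. */
    public void writeAround(String key, String value) {
        db.write(key, value);
    }
    public String read(String key) {
        return cache.computeIfAbsent(key, db::read);
    }

    /** Write-back: acknowledge after the cache write, flush to the DB later. */
    public void writeBack(String key, String value) {
        cache.put(key, value);
        flusher.submit(() -> db.write(key, value)); // data loss risk if the cache dies before the flush
    }

    /** Stand-in for the real data store. */
    static class Database {
        private final Map<String, String> rows = new ConcurrentHashMap<>();
        void write(String key, String value) { rows.put(key, value); }
        String read(String key) { return rows.get(key); }
    }
}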
HTTP Cache
All of the caching technologies working in the HTTP layer work as read-through caches
Procedures
First the client connects to the cache and requests a particular web resource.
Then the cache has a chance to intercept the request and respond to it using a cached object.
Only if the cache does not have a valid cached response will it connect to the origin server itself and forward
the client's request.
Advantages: Read-through caches are especially attractive because they are transparent to the client. This
pluggable architecture gives a lot of flexibility, allowing you to add layers of caching to the HTTP stack without
needing to modify any of the clients.
Headers
must-revalidate: Once the response becomes stale, it cannot be returned to clients without revalidation. Although
it may seem odd, caches may return stale objects under certain conditions. For example, if the client explicitly
allows it or if the cache loses connection to the original server.
Expires:
Allows you to specify an absolute point in time when the object becomes stale.
Some of the functionality controlled by the Cache-Control header overlaps that of other HTTP headers. Expiration
time of the web response can be defined either by Cache-Control: max-age=600 or by setting an absolute
expiration time using the Expires header. Including both of these headers in the response is redundant and leads
to confusion and potentially inconsistent behavior.
Vary:
Tell caches that you may need to generate multiple variations of the response based on some HTTP request
headers. For example: Vary:Accept-Encoding is the most common Vary header indicating that you may return
responses encoded in different ways depending on the Accept-Encoding header that the client sends to your web
server. Some clients who accept gzip encoding will get a compressed response, while others who cannot
support gzip will get an uncompressed response.
How not to cache:
It's common to see meta tags used in the HTML of pages to control caching. This is a poor man's cache control
technique, which isn't terribly effective. Although most browsers honor these meta tags when caching locally,
most intermediate proxies do not.
Types
Browser cache
Browsers have built-in caching capabilities to reduce the number of requests sent out. These usually use a
combination of memory and local files.
There are several problems with browser cache
The size of the cache tends to be quite small by default. Usually around 1GB. Given that web pages have become
increasingly heavy, browsers would probably be more effective if they defaulted to much larger caches.
When the cache becomes full, the algorithm to decide what to remove is crude. Commonly, the LRU algorithm is
used to purge old items. It fails to take into account the relative "cost" to request different types of resources. For
example, the loading of Javascript resources typically blocks loading of the rest of the page. It makes more sense
for these to be given preference in the cache over, say, images.
Many browsers offer an easy way for the user to remove temporary data for the sake of privacy. Users often feel
that cleaning the browser cache is an important step in somehow stopping their PC from running slow.
Caching proxies
A caching proxy is a server, usually installed in a local corporate network or by the Internet service provider (ISP). It is
a read-through cache used to reduce the amount of traffic generated by the users of the network by reusing
responses between users of the network. The larger the network, the larger the potential savings - that is why it was
quite common among ISPs to install transparent caching proxies and route all of the HTTP traffic through them to
cache as many requests as possible.
In recent years, the practice of installing local proxy servers has become less popular as bandwidth has become
cheaper and as it becomes more popular for websites to serve their resources solely over Secure Sockets Layer (SSL).
Reverse proxy
A reverse proxy works in exactly the same way as a regular caching proxy, but the intent is to place a reverse proxy in
your own data center to reduce the load put on your web servers.
Purpose:
For caching, they can be used to lighten the load on the back-end servers by serving up cached versions of
dynamically generated pages (thus cutting CPU usage). Using reverse proxies can also give you more flexibility
because you can override HTTP headers and better control which requests are being cached and for how long.
For load balancing, they can be used for load-balancing multiple back-end web servers.
A CDN is a distributed network of cache servers that work in similar way as caching proxies. They depend on the same
HTTP headers, but they are controlled by the CDN service provider.
Advantages:
Reduce the load put on your servers.
Save network bandwidth.
Improve the user experience by pushing content closer to your users.
Procedures: Web applications would typically use a CDN to cache their static files like images, CSS, JavaScript, videos
or PDFs.
You can implement it easily by creating a static subdomain (e.g., s.example.org) and generating URLs for all of your static files using this
domain.
Then you configure the CDN provider to accept these requests on your behalf and point DNS for s.example.org to
the CDN provider.
Any time CDN fails to serve a piece of content from its cache, it forwards the request to your web servers and
caches the response for subsequent users.
Scaling
Do not worry about the scalability of browser caches or third-party proxy servers.
This usually leaves you to manage reverse proxy servers. For most young startups, a single reverse proxy should be
able to handle the incoming traffic, as both hardware reverse proxies and leading open-source ones can handle more
than 10,000 requests per second from a single machine.
First step: To be able to scale the reverse proxy layer efficiently, you need to focus on your cache hit ratio
first.
Cache key space: Describe how many distinct URLs your reverse proxies will observe in a period of time. The
more distinct URLs are served, the more memory or storage you need on each reverse proxy to be able to
serve a significant portion of traffic from cache. Avoid caching responses that depend on the user (for
example, that contain the user ID in the URL). These types of responses can easily pollute your cache with
objects that cannot be reused.
Average response TTL: Describe how long each response can be cached. The longer you cache objects, the
more chance you have to reuse them. Always try to cache objects permanently. If you cannot cache objects
forever, try to negotiate the longest acceptable cache TTL with your business stakeholders.
Average size of cached object: Affects how much memory or storage your reverse proxies will need to store
the most commonly accessed objects. Average size of cached objects is the most difficult to control, but you
should still keep it in mind because there are some techniques that help you "shrink" your objects.
Second step: Deploying multiple reverse proxies in parallel and distributing traffic among them. You can also scale
reverse proxies vertically by giving them more memory or switching their persistent storage to solid-state drive.
Types
Web storage allows a web application to use a limited amount of data (usually 5MB to 25MB).
Web storage works as a key-value store.
Caches co-located with code: One located directly on your web servers.
Interacting with a distributed object cache usually requires a network round trip to the cache server. On the plus side,
distributed object caches usually work as simple key-value stores, allowing clients to store data in the cache. You can
scale simply by adding more servers to the cache cluster. By adding servers, you can scale both the throughput and
overall memory pool of your cache.
Scaling
Cache priority
The higher up the call stack you can cache, the more resources you can save.
Aggregated time spent = time spent per request * number of requests
Cache reuse
Always try to reuse the same cached object for as many requests/users as you can.
Cache invalidation
LRU
TTL
Pains
Evict policies
FIFO ( first-in, first out )
LRU ( least recently used )
LFU ( least frequently used )
See reference section for more discussions
What to do with evicted one
Overflow to disk
Delete it
Expiration policy
TTI: time to idle - the entry expires if it is not accessed within this period; the countdown resets on each access
TTL: time to live - maximum tolerance for staleness, counted from when the entry was created
Pain of loading
def: on startup, create a background thread to pull the existing cache data from another peer
automatically bootstrap keys on startup
cache values on demand
Pain of duplication
Def
Many readers read an empty value from the cache and subsequently try to load it from the database. The result is
unnecessary database load as all readers simultaneously execute the same query against the database.
Let's say you have [lots] of webservers all hitting a single memcache key that caches the result of a slow database
query, say some sort of stat for the homepage of your site. When the memcache key expires, all the webservers may
think "ah, no key, I will calculate the result and save it back to memcache". Now you have [lots] of servers all doing the
same expensive DB query.
/* read some data, check the cache first, otherwise read from the SoR (system of record) */
public V readSomeData(K key) {
    Element element;
    if ((element = cache.get(key)) != null) {
        return (V) element.getValue();      // cache hit
    }
    // cache miss: every concurrent caller falls through to the slow query below;
    // readDataFromSoR is a placeholder for the expensive database read
    V value = readDataFromSoR(key);
    cache.put(new Element(key, value));
    return value;
}
Solutions
Stale date solution: The first client to request data past the stale date is asked to refresh the data, while subsequent
requests are given the stale but not-yet-expired data as if it were fresh, with the understanding that it will get
refreshed in a 'reasonable' amount of time by that initial request
When a cache entry is known to be getting close to expiry, continue to serve the cached entry while reloading it
before it expires.
When a cache entry is based on an underlying data store and the underlying data store changes in such a way
that the cache entry should be updated, either trigger an (a) update or (b) invalidation of that entry from the data
store.
Add entropy back into your system: If your system doesn't jitter then you get thundering herds.
For example, cache expirations. For a popular video they cache things as best they can. The most popular video
they might cache for 24 hours. If everything expires at one time then every machine will calculate the expiration at
the same time. This creates a thundering herd.
By jittering you are saying randomly expire between 18-30 hours. That prevents things from stacking up. They
use this all over the place. Systems have a tendency to self synchronize as operations line up and try to destroy
themselves. Fascinating to watch. You get slow disk system on one machine and everybody is waiting on a
request so all of a sudden all these other requests on all these other machines are completely synchronized. This
happens when you have many machines and you have many events. Each one actually removes entropy from the
system so you have to add some back in.
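A tiny sketch of adding that jitter when computing a cache TTL; the 24-hour base and 25% spread are arbitrary examples:
import java.util.concurrent.ThreadLocalRandom;

public final class JitteredTtl {
    private JitteredTtl() {}

    /** Returns a TTL in seconds, randomized around the base so entries do not all expire together. */
    public static long ttlSeconds(long baseHours, double jitterFraction) {
        long base = baseHours * 3600;
        long spread = (long) (base * jitterFraction);            // e.g. +/- 25% of 24 hours
        return base + ThreadLocalRandom.current().nextLong(-spread, spread + 1);
    }

    public static void main(String[] args) {
        // 24h +/- 25% spreads expirations roughly between 18 and 30 hours
        System.out.println(ttlSeconds(24, 0.25));
    }
}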
No expire solution: If cache items never expire then there can never be a recalculation storm. Then how do you update
the data? Use cron to periodically run the calculation and populate the cache. Take the responsibility for cache
maintenance out of the application space. This approach can also be used to pre-warm the cache so a newly
brought up system doesn't peg the database.
The problem is the solution doesn't always work. Memcached can still evict your cache item when it starts
running out of memory. It uses a LRU (least recently used) policy so your cache item may not be around when a
program needs it which means it will have to go without, use a local cache, or recalculate. And if we recalculate
we still have the same piling on issues.
This approach also doesn't work well for item specific caching. It works for globally calculated items like top N
posts, but it doesn't really make sense to periodically cache items for user data when the user isn't even active. I
suppose you could keep an active list to get around this limitation though.
Architecture
Lambda architecture
Building blocks
Load balancer
Hardware vs software
Def: Software load balancers run on standard PC hardware, using applications like Nginx and HAProxy. Hardware load
balancers run on special hardware and contain any software pre-installed and configured by the vendor.
Strength/Weakness: Software load balancers are more intelligent because they can talk HTTP (they can perform
compression of resources passing through and do routing based on the presence of cookies) and are more flexible for
hacking in new features or changes. Hardware load balancers offer higher throughput and lower latency, but come at a
high purchase cost: prices start from a few thousand dollars and go as high as over 100,000 dollars per device. They
also require specialized training, and it is harder to find people with the work experience necessary to operate them.
HAProxy vs Nginx
Nginx strengths: can cache HTTP responses from your servers.
HAProxy strengths: a little faster than Nginx and a wealth of extra features. It can be configured as either a layer 4 or
layer 7 load balancer.
Extra functionalities of HAProxy: it can be configured as either a layer 4 or layer 7 load balancer.
When HAProxy is set up to be a layer 4 proxy, it does not inspect higher-level protocols and it depends solely on
TCP/IP headers to distribute the traffic. This, in turn, allows HAProxy to be a load balancer for any protocol, not
just HTTP/HTTPS. You can use HAProxy to distribute traffic for services like cache servers, message queues, or
databases.
HAProxy can also be configured as a layer 7 proxy, in which case it supports sticky sessions and SSL termination,
but needs more resources to be able to inspect and track HTTP-specific information. The fact that HAProxy is
simpler in design makes it perform slightly better than Nginx, especially when configured as a layer 4 load
balancer. Finally, HAProxy has built-in high-availability support.
Web server
Apache vs Nginx
History: Tuning Apache to cope with real-world traffic efficiently is a complex art. Nginx offers features which make it
appropriate for microservices.
Architecture: Apache uses one process/thread per connection; each request is handled as a separate child/thread.
Nginx uses an asynchronous, event-driven model; there is a single master process with one or more worker processes.
Cache
Standalone cache
Memcached
Redis
Database
DynamoDB
Cassandra
Queue
ActiveMQ
RabbitMQ
SQS
Kafka
Data Processing
Hadoop
Spark
EMR
Stream Processing
Samza
Storm
References
Hired in Tech courses
Blogs and papers
Books: "Professional Website Performance" by Peter Smith
Books: "Web Scalability for Startup Engineers" by Artur Ejsmont
Books: "MySQL High Performance" and "MySQL High Availability" from O'Reilly
Jiuzhang/Bittiger system design class
Gainlo blog