Application Layer (Chapter 7) - CSHub
URLs
Why URLs?
• Remembering IP addresses is annoying
• Servers may change IP address
DHCP assigns an IP address to a machine that does not have one. It also provides the machine
with the address of a name server.
The IP addresses of name servers, and previously resolved records, may also be cached on the
local computer by the operating system. There are two different ways of querying name servers:
• Recursive query: The local name server does further queries to other name servers
(better if you have weak machines)
• Iterative query: The machine does further queries to other name servers
One name resolution can use both mechanisms. In the picture underneath, the local name
server is recursive while the other servers are iterative.
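The difference between the two query styles can be sketched with a toy model. The server names, zone data and IP address below are made up for illustration, and no real DNS traffic is sent.

```python
# Toy model: each "server" either knows the answer or refers to another server.
# All names and addresses here are hypothetical.
SERVERS = {
    "root": {"refer": {"org": "org-tld"}},
    "org-tld": {"refer": {"example.org": "example-ns"}},
    "example-ns": {"answer": {"www.example.org": "192.0.2.10"}},
}


def ask(server, name):
    """One query to one server: returns ("answer", ip) or ("referral", next_server)."""
    data = SERVERS[server]
    if name in data.get("answer", {}):
        return "answer", data["answer"][name]
    for suffix, next_server in data.get("refer", {}).items():
        if name.endswith(suffix):
            return "referral", next_server
    raise LookupError(name)


def iterative_resolve(name, start="root"):
    """The asker follows every referral itself (iterative querying)."""
    server, queries = start, 0
    while True:
        kind, value = ask(server, name)
        queries += 1
        if kind == "answer":
            return value, queries
        server = value  # follow the referral to the next server


def client_lookup(name):
    """The client sends ONE recursive query; the local name server does the rest."""
    ip, _ = iterative_resolve(name)  # work done by the local name server
    return ip, 1                     # the client itself sent a single query
```

In this model the local name server sends three queries on the client's behalf, while the client sends only one, which is why recursive querying suits weak machines.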
Email
Structure
The participating computers are split up into 2 categories:
• Users (the computers sending and receiving emails, may not always be online)
• Message transfer agents, aka mail servers (route the emails to their destination, should
always be online)
Message format
Email messages contain an envelope, header, and body.
IMAP
IMAP sends commands to a mail server to manipulate mailboxes. Some common commands:
1. LOGIN, log into server
2. FETCH, fetch messages from a folder
3. CREATE/DELETE, create or delete a folder
4. EXPUNGE, remove messages marked for deletion.
SMTP
Users and mail servers use SMTP to send email from a source to a destination. It is used
between users and mail servers (often with extensions, e.g. authentication) but also between
mail servers. The mail server to send a message to is found using DNS (MX records).
Some issues:
• SMTP uses ASCII to send emails, so you can't send arbitrary binary data.
• Basic SMTP does not include authentication
• The FROM field in the header is not checked
MIME
To send messages other than plain text, the Multipurpose Internet Mail Extensions (MIME)
exist. With MIME you can send more than just text, where the type of content is indicated by a
new header. We can't force all mail servers to be updated to support this, so MIME content still
has to travel as ASCII text.
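As a rough illustration, Python's standard email package can build such a message. The addresses, file name and attachment bytes below are made up, and nothing is actually sent.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.org"   # hypothetical addresses
msg["To"] = "bob@example.org"
msg["Subject"] = "Picture"
msg.set_content("See the attachment.")  # becomes a text/plain part

# Arbitrary bytes standing in for an image file.
payload = bytes([0, 255, 16, 128])
msg.add_attachment(payload, maintype="image", subtype="png", filename="dot.png")

# The serialized message is pure ASCII, even though the attachment is binary.
wire_text = msg.as_string()
attachment = next(msg.iter_attachments())
```

The attachment part carries its own Content-Type header (image/png here) and a Content-Transfer-Encoding header that tells the receiver how the bytes were turned into text.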
Base64
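To send binary data through mail servers that only handle ASCII, we use base64 encoding. We put
in a binary string and we get ASCII. Base64 only uses 64 characters, so 6 bits are translated into
1 character (we just regroup the bytes into 6-bit sequences). Unfortunately, each character is sent
as a 7-bit ASCII character, which gives 1 bit of overhead per 6-bit sequence (2 bits when every
character occupies a full byte, i.e. 4 output bytes for every 3 input bytes).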
To indicate padding, we use = signs. The input is processed in groups of 3 bytes, each giving 4
output characters, so padding is needed when the output length in base64 is not divisible by 4.
We then pad until the output length is divisible by 4. A final == indicates that the last group
contained only one byte, and a final = indicates that it contained two bytes.
For example, if our base64 output is YW55IGN, we pad with one = to make the length divisible
by 4. When the output is YW55IG, we pad with ==.
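The regrouping and padding rules can be written out by hand. This is a minimal sketch for illustration (the function name is hypothetical); in practice you would call base64.b64encode from the standard library, which the sketch should agree with.

```python
import base64

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def b64encode_manual(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        # Pack the group into 24 bits, filling missing bytes with zeros.
        n = int.from_bytes(group.ljust(3, b"\0"), "big")
        # Split the 24 bits into four 6-bit indices into the alphabet.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A group of k bytes yields k + 1 real characters; the rest is padding.
        keep = len(group) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)
```

For instance, b64encode_manual(b"a") gives "YQ==" (one byte in the last group), b64encode_manual(b"ab") gives "YWI=" (two bytes), and b64encode_manual(b"abc") gives "YWJj" (no padding).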
Some other problems / properties:
• If the message was already ASCII, each 7-bit character had already been padded with a 0 bit
to fill a byte. Encoding it in base64 adds overhead on top of that
• If the number of bits is not divisible by 6, the padding makes the overhead even bigger
HTTP
In order to query resources, we use HTTP. We send HTTP requests and get HTTP responses.
HTTP uses TCP, so before we can send an HTTP request, we need to set up a TCP connection.
We could create one connection for each HTTP request that we want to send. That is very
inefficient, so we use persistent connections, which allow browsers to issue multiple requests
over the same TCP connection. We can send sequential requests (wait for each response before
sending the next request) or pipelined requests (send multiple requests without waiting for the
responses in between).
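To make the idea concrete, here is a sketch of what is written to the TCP connection. The helper name, host and paths are hypothetical, and the code only builds the request bytes; it does not open a connection.

```python
def build_get(host: str, path: str, keep_alive: bool = True) -> bytes:
    """Serialize a minimal HTTP/1.1 GET request."""
    connection = "keep-alive" if keep_alive else "close"
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"Connection: {connection}",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# Pipelining: both requests are written to the same TCP connection
# back to back, before any response has been read.
pipelined = (build_get("example.org", "/index.html")
             + build_get("example.org", "/style.css"))
```

With sequential requests the second build_get result would only be written after the first response had been read completely.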
URLs
In order to locate web pages we use URLs (uniform resource locators). A URL specifies a protocol
(e.g. HTTPS), a domain name (relevant for the network and transport layer, as this is where the
request will be sent) and a path (only relevant for HTTP, specific to the particular server).
The double slash after the protocol is not really needed, but was used to improve readability in
the past.
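The three parts can be picked apart with Python's standard urllib.parse module. The URL below is a made-up example.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://www.example.org/courses/networks?week=7")

protocol = parts.scheme    # which protocol to speak
domain = parts.hostname    # resolved via DNS; where the request is sent
path = parts.path          # interpreted only by the web server itself
```

Only the domain is needed to set up the TCP connection; the path travels inside the HTTP request.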
MIME
On the web, content with the MIME type text/html is parsed by the browser itself. Other data is
passed to a plugin or another application. A plugin is integrated into your browser and handles
MIME types that are not text based.
CDN
Static content (e.g. JS files) can be hosted on a CDN. A Content Delivery Network is a type of
caching to increase system scalability.
An origin CDN server distributes content over multiple other CDN nodes (all over the world).
You then make sure that users in Europe are served by a CDN node in Europe, and so on. There
are a few possibilities for that:
• Use a front end to forward the requests to the right CDN node. This does mean that
there is still a single front end, which can cause delays if it is not placed well geographically
• Use DNS load balancing: you query a CDN name server, which responds differently to a
request from Europe than to a request from the USA
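The DNS load balancing option amounts to a name server whose answer depends on who is asking. A toy sketch, in which the region codes, addresses and function name are all hypothetical:

```python
# Hypothetical CDN nodes per client region (documentation-only IP addresses).
CDN_NODES = {
    "EU": "192.0.2.1",
    "US": "198.51.100.1",
}
ORIGIN = "203.0.113.1"  # fallback when no nearby node is known


def cdn_dns_answer(client_region: str) -> str:
    """Answer a query for the same name differently depending on its origin."""
    return CDN_NODES.get(client_region, ORIGIN)
```

A real CDN name server would have to estimate the client's region, for example from the address of the resolver that sent the query; that step is left out here.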
For video, MPEG can be used for compression. MPEG compresses over a sequence of frames,
using motion tracking to remove temporal redundancy. It uses three types of frames:
• I (Intra-coded): frames are self-contained, so a frame stands on its own instead of relying on
neighboring frames
• P (Predictive): frames only store changes relative to previous frames
• B (Bidirectional): frames may base their prediction on both previous and future frames
Using a high compression rate we can cut down the size of the file considerably.
When playing media, we buffer it. There are 2 things we worry about:
• We don't want the buffer to become too full, because then we lose content (we have a
high-water mark; if it is reached we need to download more slowly)
• We don't want an empty buffer, because then playback stalls (so we have a low-water mark; if
it is reached we need to download more quickly)
A partial solution to this is to change the compression rate based on the bandwidth:
• If the bandwidth decreases, use more lossy compression
• If the bandwidth increases, use less lossy compression
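Both rules can be sketched as two small decision functions. The thresholds (in seconds of buffered media) and the list of available bitrates are made-up values for illustration only.

```python
def next_action(buffer_seconds: float, low: float = 5.0, high: float = 30.0) -> str:
    """Compare the buffer level against the low- and high-water marks."""
    if buffer_seconds >= high:
        return "slow down"   # buffer nearly full: risk of losing content
    if buffer_seconds <= low:
        return "speed up"    # buffer nearly empty: risk of stalling
    return "keep rate"


def pick_bitrate(bandwidth_kbps: float, bitrates_kbps=(500, 1500, 4000)) -> int:
    """Highest available bitrate that fits the bandwidth (lowest if none fit)."""
    fitting = [b for b in bitrates_kbps if b <= bandwidth_kbps]
    return max(fitting) if fitting else min(bitrates_kbps)
```

A lower bitrate corresponds to more lossy compression, so pick_bitrate moves to a more compressed version of the media when the measured bandwidth drops.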
Peer-to-peer systems
Instead of relying on a central infrastructure, users can create their own infrastructure by
connecting to each other. This is naturally very scalable (to a certain extent), whereas a
centralized infrastructure isn't.
Napster
The original idea of Napster was the following:
• A user asks a central server who has a certain file
• The server responds with the IP address of a machine having that file
• The user requests the file from that machine
• The file gets returned
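The role of the central server in these steps can be sketched as a simple index. The function names, addresses and file name are hypothetical; the file transfer itself happens directly between the two users and is not shown.

```python
# Central index kept by the server: file name -> set of peer addresses.
index = {}


def register(peer_ip: str, filename: str) -> None:
    """A user tells the central server that it has a file."""
    index.setdefault(filename, set()).add(peer_ip)


def lookup(filename: str):
    """Server side: return the address of one peer that has the file, or None."""
    peers = index.get(filename)
    # min() is used only to make the choice deterministic in this sketch.
    return min(peers) if peers else None
```

Everything depends on the index: if the server holding it disappears, no user can find a peer any more, even though the files themselves are still on the users' machines.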
The problem with this design is that there is still a central server. Because Napster was widely
used for piracy, this central server was shut down.
BitTorrent
The idea of BitTorrent:
• You download the description of a file you want to get (from PirateBay for example)
• You connect to the tracker server, which provides you with a list of peers (people who
serve chunks of the file)
• You download chunks from the peers and seed (serve the chunks you have to others yourself)
The disadvantage of this is that the tracker still needs to be online. In modern versions of
BitTorrent the tracker is distributed.
For example, imagine that places 7, 17, 24 and 30 on the ring are taken by nodes and hash(t)
returns 12. We then store the data at node 17, the first node at or after place 12.
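The placement rule can be checked with a short sketch. Note that it assumes a global, sorted view of all nodes, which a real peer does not have; it only illustrates which node is responsible for a key.

```python
from bisect import bisect_left


def responsible_node(key, nodes):
    """First node at or after `key` when walking clockwise; wraps around the ring."""
    nodes = sorted(nodes)
    i = bisect_left(nodes, key)     # index of the first node >= key
    return nodes[i % len(nodes)]    # past the last node: wrap to the first


nodes = [7, 17, 24, 30]
owner = responsible_node(12, nodes)
```

Here owner is 17, and a key such as 31, which lies past the last node, wraps around to node 7.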
If each node only knows its neighbors, you need to query on average half of the ring to find a
file in this way. That's not efficient in a real situation, so we need a better structure.
So, besides keeping track of its neighbors, each node also keeps track of other nodes in a table:
the i-th entry of the table holds successor(location + 2^i). In total, the table keeps track of
m nodes (the ring contains 2^m places).
Here, we define successor(x) as the node address y for which (y − x) mod 2^m is minimal (or:
the first node you get when walking clockwise along the ring starting at x).
Example
Imagine we have the following ring (where m = 5):
We then get the following finger table for location 3:
i   Start (location + 2^i)   Address of successor
0   3 + 2^0 = 4              7
1   3 + 2^1 = 5              7
2   3 + 2^2 = 7              7
3   3 + 2^3 = 11             17
4   3 + 2^4 = 19             24
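The table can be recomputed from the definition of successor. Because the ring picture is not reproduced here, the node set 7, 17, 24, 30 is an assumption taken from the earlier placement example; it is consistent with every row of the table.

```python
def successor(x, nodes, m):
    """Node address y for which (y - x) mod 2^m is minimal."""
    return min(nodes, key=lambda y: (y - x) % (2 ** m))


def finger_table(location, nodes, m):
    """Rows of (i, start, successor(start)) for i = 0 .. m - 1."""
    rows = []
    for i in range(m):
        start = (location + 2 ** i) % (2 ** m)  # stay on the ring
        rows.append((i, start, successor(start, nodes, m)))
    return rows


# Assumed node set (see the note above); m = 5 gives a ring of 32 places.
table = finger_table(3, [7, 17, 24, 30], m=5)
```

Each entry covers twice the distance of the previous one, which is what lets a lookup skip across the ring instead of visiting every neighbor in turn.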