16. Introduction to the Web and HTTP

Network Applications

Network applications are programs that run on different end-systems and communicate with each other over the network (e.g. web browser and server).

Application protocols are a small part of a network application, and different network applications may share the same application protocol.

Services applications need:

Reliable data transfer (except some loss-tolerant applications like multimedia streaming)
Bandwidth
- Some are rate-sensitive such as audio calls
- Some are elastic e.g. email, FTP
Latency
- Some interactive real-time apps such as audio calls, VR, multi-player games etc. have tight end-to-end delay constraints

TCP provides a reliable transport service, flow control and congestion control, but no guarantees for latency or transmission rate.

UDP provides none of these guarantees.

Application Structure

Client-server:

Server:
- Always-on
- Permanent IP address
- May use server farms for scaling
Clients:
Communicate with the server
May be intermittently connected
May have dynamic IP addresses
Do not communicate directly with each other

Peer-to-peer:

No always-on server
Arbitrary end systems directly communicate
Peers intermittently connected and can change IP addresses
Highly scalable
Difficult to manage

Hybrid. Napster is an example:

File transfer uses peer-to-peer
File search is centralized; peers register content at central server and query it to locate content

The Web

On-demand service, compared to TV or radio which were scheduled

Hypertext Transfer Protocol (HTTP)

A web page consists of a base HTML-file and several referenced objects:

Can be a HTML file (iframe), image (img) etc.
Each object is addressable by a URL

HTTP uses the client-server model:

The browser requests, receives and displays web objects
The server sends objects in response to requests

Uses TCP:

Client initiates a TCP connection to the server on port 80
Server accepts the TCP connection
HTTP messages exchanged between the browser and server
TCP connection closed

HTTP is stateless: the server can work without maintaining any information about past client requests.

HTTP Connections

Non-persistent HTTP: At most one object is sent over a TCP connection. Used by HTTP/1.0.

Persistent HTTP: multiple objects can be sent over a single TCP connection; used by HTTP/1.1 by default.

The client will initialize the TCP connection, send an HTTP request message (containing some URL); the server will receive this response and form a response message containing the requested object.

HTTP/1.0 will close the TCP connection after the object is delivered, while persistent HTTP will leave the connection open and use the same connection for subsequent HTTP requests.

Response Time Modelling (RTT): time to send a small packet to travel from the client to the server and back.

For HTTP/1.0:

1 RTT to initiate the TCP connection (first two parts)
1 RTT for the HTTP request (and third part of handshake) and the request back

Hence, the total time is 2 RTTs, plus the transmission time for the file. After the file is received, the TCP connection must also be closed.

In addition to this, the OS must allocate resources for each TCP connection; browsers will often use parallel TCP connections to fetch referenced objects.

Persistent HTTP with pipelining is the default behavior in HTTP/1.1; the client sends requests as soon as it encounters a referenced file.

HTTP Messages

The HTTP request consists of the following ASCII-encoded text:

Request line (the tree values are separated by a space):
- Method (GET, POST, PUT, HEAD etc.)
- URL
- Version (HTTP/1.0, HTTP/1.1 etc.)
- \r\n to end the request line
Header lines:
- header_field_name: header_value\r\n (space required after the colon)
- Order of headers not significant, but fields sending control data should be sent field
- \r\n after all header lines to indicate the end of the header section
Entity body
- Used for POST and PUT requests

Some methods:

GET: get object
HEAD: get the headers that would be returned if it was a GET request
POST: filling out a form
PUT (1.1+): upload file in entity body to path specified in URL field
DELETE (1.1+): delete file specified in URL field

Some headers:

User-agent
Connection: close|keep-alive: defaults to keep-alive for HTTP/1.1
Accept-language: comma-separated list of languages and optional quality value for each language
- Language code can be language or language and region (e.g. en, en-gb)
- Weight: estimate of user’s preference for that language. lang-code;q=weight,. Defaults to q=1, can be any fraction between 0 and 1 (larger is better)

The response message follows this format:

Status line (space-separated):
- Version
- Status code (e.g. 200, 400)
- Phrase (text description of status code (e.g. OK))
- \r\n
Header lines:
- header_field_name: header_value\r\n (space required after the colon)
- \r\n after all header lines to indicate the end of the header section
Entity body

Some HTTP status codes:

1xx: request received, continuing process
2xx: request successful
- 200: OK
3xx: redirection
- 301/308: permanent redirect. The latter enforces that the request method (e.g. GET, POST) cannot change
- 302/307: temporary redirect. The latter enforces that the request method (e.g. GET, POST) cannot change
- 304: not modified since If-Modified-Since value
4xx: client error
- 400: bad request
- 404: not found
5xx: server error
- 500: internal server error
- 502: bad gateway (proxy/gateway received invalid response from upstream server)
- 503: service unavailable (e.g. overloaded)
- 505: HTTP version not supported

Cookies: User-Server State

HTTP is stateless, so to identify users cookies are used. There are four components:

Cookie header line in HTTP response message
- Multiple Set-cookie lines can be sent
- The header format is Set-cookie: cookie_name=cookie_value
  - By default, cookies are session cookies which are deleted after the current session ends
  - ; Expires = DoW, dd mmm yyyy, hh:mm:ss ttt; can be appended to make it a permanent cookie
  - ; Secure; ensures it is only sent on HTTPS connections
  - ; HttpOnly ensures it is not accessible through JS
Cookie header line in HTTP request message
- cookie: cookie1_name=cookie1_value; cookie2_name=cookie2_value...
Cookie file kept on the client and managed by the browser
Server’s database storing cookies

Web Caches (Proxy Servers)

Proxy servers sit between the origin server and client. There can be multiple proxy servers, one of which will hopefully be closer to the client than the origin server.

The proxy server/cache acts as both a client and server. It helps to:

Reduce response time
Reduce traffic on an institution’s access link
Lower bandwidth costs

If the requested object is in the cache, the browser will use it. Otherwise, it will make a request to the proxy server (and if the proxy does not have the resource, it will in turn make a request to the origin server).

Conditional GET

The server will not send the object if the cache has an up-to-date copy of the object.

Request header: If-modified-since: some_date.

If the object has not been modified since the given date, a 304 Not Modified response will be sent and there will be no body. Otherwise, it will generate a normal 200 OK response.