01. Introduction
- Fundamental design principles of data networks
- Networked applications, socket programming
- Selected data transmission, LAN
- Internet architecture and algorithms
- Routing protocols and algorithms
- Transport layer: UDP, TCP
Term 3 quizzes and assignments:
- 25%
- 4 lab quizzes: 2% each
- Week 1: problem sheet + optional quiz
- Week 5: problem sheet
- Super quiz: packet processing, 7% during week 3
- Assignment: socket programming, 10%
- Sunday August 16 11:59 pm
Mid-term test:
- 25% (electronic)
- Friday September 11 7:00-8:30 pm
- Lectures and lab from term 3
Term 4 quizzes:
- 4 lab quizzes, 2% each
Lab test:
- 17%
- Friday October 9 7:00-8:00 pm
- Term 4 lab content
Final exam:
- 25%
- Term 4 content only
- 90 minutes
02. Sockets
Ethernet/WLAN/PPP etc. -> Internet Protocol -> TCP/DNS/HTTP etc.
Everything over IP, IP over everything.
IP Address
- Each host (not just end hosts) is identified by one or more IP addresses
- Usually as many IP addresses as network adapters
- End hosts usually have one
- Exceptions e.g. laptops with Wi-Fi and ethernet
- The IP address both identifies the host and helps with routing
- DNS used to translate human readable host names to IPv4 addresses
Representation
- 32-bit wide address
- Worldwide unique (with caveats)
- Dotted-decimal notation: 192.168.1.1
IP Service - Best Effort
The basic IP service is packet delivery. It is:
- Stateless: no connection/shared state
- Unacknowledged: the IP service does not send an acknowledgement that a packet has been received
- Unreliable: no retransmissions on the IP level
- Unordered: does not guarantee in-sequence delivery
UDP and TCP
- UDP: User Datagram Protocol
- TCP: Transmission Control Protocol
- Both operate on top of IPv4: they generate segments that are carried as the IPv4 packet payload (protocol layering)
- Both are unicast: between a pair of IP addresses
- They add their own addressing capabilities: port numbers
- The IPv4 address refers to the end host; the port number refers to a particular application running on that host
UDP
IPv4, plus port numbers. It is:
- Connectionless
- Unacknowledged
- Unreliable
- Unordered
TCP
TCP allows safe and reliable transfer over the internet. It is:
- Connection-oriented
- Different from a connection in a circuit-switched network: resources are NOT reserved
- Goes through the same three phases of a connection: setup, use and teardown
- Reliable and in-sequence
- Byte-stream oriented
- Full-duplex
Ports
- 16-bit port number
- A process can allocate one or more ports
- One port is allocated to at most one process
- Some ports, well-known ports, are allocated to well-known applications
- 22: SSH; 25: SMTP; 53: DNS; 80: HTTP; 443: HTTPS
- This is convention, not a rule
- Ports 1023 and below are usually reserved for system services
- Admin permissions required in Linux
Processes and blocking socket calls
The OS creates an abstraction that makes each process think it has its own exclusive processor and exclusive memory (memory not accessible by other processes). Below the abstraction, time-sharing is used. A process can be:
- Running: running on the processor
- A process will be given some quantum of time to run, after which it will be swapped out with another runnable process
- Runnable: ready to run on the processor, but not currently running
- Blocked: waiting for external input e.g. file from disk, socket
- Blocked processes use no CPU resources
- Much better than busy-looping (polling)
- NB: if one thread makes a blocking call, the whole process is blocked
Socket API
- Follows the Unix philosophy that (almost) everything is a file (or acts like a file)
- One socket is bound to exactly one port
- Within the same process, multiple sockets can be bound to the same port
- Associated with its underlying protocol (e.g. UDP or TCP)
- Application does not need to know the details of the abstraction, but needs to specify which one to use
- Associated with buffers - decouples the application from the underlying data protocol
- Receiver's TCP/UDP entity places incoming data into the receive buffer; the process will read() from the buffer at its own discretion
- Transmitter write()s data into the transmit buffer; TCP/UDP sends it at its own discretion
- When writing to a socket:
- UDP: data is encapsulated in a UDP datagram and transferred - one write, one datagram
- TCP: data is buffered and may be combined with data from previous/future write()s before being transmitted
- A successful write means the data was successfully put into the buffer
Client/Server Paradigm
Client applications must:
- Know the server IP address and port number
- Have its own IP address and port number
- Have a socket bound to these
- Through the socket, it can initiate contact with the server
Server applications must:
- Have an IP address and port number
- Must be known to clients
- Must have a socket bound to these
- Accept service requests from clients and respond to them
The client and server can run on the same machine: 127.0.0.1 (localhost) refers to the local device itself.
Socket Types
Differ by OS, but these are universal:
- Stream socket: TCP (usually)
- Reliable, in-sequence transmission of a byte stream
- The application does not necessarily know what protocol is being used under the hood - usually TCP, but not guaranteed
- If delivery fails (for a prolonged time), the sending application is notified
- i.e. receiver not responding
- A write() does not guarantee a packet - the underlying protocol decides on its own whether to send a packet immediately or combine the data with other write()s
- The receiver cannot detect the boundary between write()s - record boundaries are not preserved
- Datagram socket: UDP (usually)
- Does not guarantee reliable or in-sequence delivery
- If delivery fails, the sending application is not notified
- One write() of x bytes produces one UDP packet
- A read() will return a block of at most x bytes (when nothing is lost) - record boundaries are preserved
Example Workflow: TCP client
socket()
- Creates a socket, allocates resources (buffer, socket handle etc.)
- Initially assigned to a random, unused port number
- Can fail (e.g. lack of buffer space, not enough handles, need root)
connect()
- Specify IP address and port number of the server
- (Tries to) establish a connection with the server
read()/write()
- Or sendto()/recvfrom() - additional parameters
- Reads data from/sends data to the server
close()
- Closes the connection; frees up resources
Example workflow: TCP Server
socket()
- Creates the socket
bind()
- Binds to a particular port number and IP address
listen()
- Declares the willingness to accept incoming connections and allocates resources (a queue) for incoming connection requests
- The queue (backlog) size is specified
accept()
- Accepts an incoming connection request
- Takes the oldest connection request from the queue
- Generates a new socket with the same port number (as the socket listen() is using)
- Allows a separate thread to process it for parallel processing
- On the new socket: read(), write(), close()
close() on the initial socket - to stop accepting new connections
Example workflow: UDP client
socket()
recvfrom(), sendto()
- sendto() requires you to explicitly specify the receiver address
- recvfrom() gives you the IP address of the sender as well as the data
close()
connect(), then read()/write(), can also be used: connect() gives you a default receiver (and expected sender) address, and can be called multiple times.
Example workflow: UDP server
socket()
bind()
recvfrom(), sendto()
close()
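Since only TCP examples appear later, here is a minimal UDP client sketch following the workflow above (hypothetical server at 127.0.0.1 port 9000; most error handling omitted). One sendto() produces one datagram; one recvfrom() returns one datagram plus the sender's address.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void) {
    int sockfd = socket(AF_INET, SOCK_DGRAM, 0);     // UDP socket
    if (sockfd < 0) { perror("socket"); return 1; }

    struct sockaddr_in srv;
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port   = htons(9000);                    // hypothetical server port
    inet_aton("127.0.0.1", &srv.sin_addr);           // hypothetical server address

    // One sendto() -> one UDP datagram (record boundaries preserved)
    const char *msg = "ping";
    sendto(sockfd, msg, strlen(msg), 0, (struct sockaddr *)&srv, sizeof(srv));

    // Block until one datagram arrives; recvfrom() also reveals the sender
    char buf[256];
    struct sockaddr_in peer;
    socklen_t peerlen = sizeof(peer);
    ssize_t n = recvfrom(sockfd, buf, sizeof(buf) - 1, 0, (struct sockaddr *)&peer, &peerlen);
    if (n >= 0) {
        buf[n] = '\0';
        printf("got \"%s\" from %s\n", buf, inet_ntoa(peer.sin_addr));
    }

    close(sockfd);
    return 0;
}
```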
Socket API Functions
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
Each call returns an integer; a negative value indicates an error, and the errno variable is set.
Socket (non-blocking)
- Creates a new socket structure
- Allocates resources e.g. socket buffer
- Assigns to random un-used port number
- Returns file descriptor representing the socket
int socket(int domain, int type, int protocol);
- domain: address family/networking technology, e.g. AF_INET (IPv4), AF_INET6 (IPv6)
- type: type of socket, e.g. SOCK_STREAM (TCP-like), SOCK_DGRAM (UDP-like), SOCK_RAW
- protocol: protocol for the given socket type. Usually only one sensible option; if in doubt, set to 0
If successful, returns a file descriptor (a non-negative integer). If it fails, returns -1 and sets errno.
Bind (non-blocking)
- Links a socket to a particular IP address/port number/address family combination
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
- sockfd: socket descriptor (returned from socket())
- addr: pointer to a struct containing address information. If IPv4, a sockaddr_in needs to be cast to sockaddr
- addrlen: length of the address structure
struct sockaddr tries to be general, so it is big enough to fit any type of address. sa_family holds the same address family value as the domain passed to socket().
struct sockaddr {
sa_family_t sa_family;
char sa_data[14];
}
When passing a sockaddr_in to the socket calls, the pointer is cast to struct sockaddr *.
struct sockaddr_in {
short sin_family;
unsigned short sin_port;
struct in_addr sin_addr;
char sin_zero[8];
};
struct in_addr {
unsigned long s_addr;
};
Listen (non-blocking)
int listen(int sockfd, int backlog);
Declares that you are willing to accept incoming streams (SOCK_STREAM or SOCK_SEQPACKET connections), and allocates resources (a queue).
backlog: how long the queue can be
Accept (blocking)
- Waits for incoming connection requests
- Returns a new socket (with the same port number) over which you can talk to the connecting client; the return value is the new socket's descriptor
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
addr: pointer to address struct. It will fill in the address details of the sender
Connect (blocking)
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
- For stream sockets: requests establishment of a connection with the server
- For datagram sockets: used to specify the default receiver for datagrams when send() or write() is used
Blocks until the fate of the connection setup is known:
- For stream sockets, the connection must be accepted using accept() and the acknowledgement must be received
- For datagram sockets, there is no connection setup to confirm, so the call does not block
Read
ssize_t recv(int sockfd, void *buf, size_t len, int flags);
ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *src_addr, socklen_t *addrlen);
ssize_t read(int fd, void *buf, size_t count);
read: prepare buffer area, size of the buffer area, and pass it to read, which will fill it up with data.
recvfrom: the caller also provides memory for address structure and the function fills it in with the address/port/address family of the host.
Read will be blocking if there is no data (unless a flag is set using recv/recvfrom, in which case it will return an error code).
Write
ssize_t send(int sockfd, const void *buf, size_t len, int flags);
ssize_t sendto(int sockfd, const void *buf, size_t len, int flags, const struct sockaddr *dest_addr, socklen_t addrlen);
ssize_t write(int fd, const void *buf, size_t count);
write: pass it a buffer of data to send, and the number of bytes to send.
If the underlying socket buffer is too full, it will be blocking (unless the flag is set, which will cause it to return an error code).
send/write require the addressee to be specified using connect.
Close
int close(int fd);
Frees up socket resources.
Endianness
For multi-byte data types (e.g. the 16-bit unsigned port number), in packet headers in particular, there must be a canonical representation for data being transmitted - the network byte order. Hosts must convert between their own internal representation (host byte order) and the network representation.
The canonical representation for the internet is big-endian (most significant byte at lowest memory address). Intel hates it.
Host (h) to network (n) conversion, or vice versa, available as helper functions for 16/32 bit integers.
#include <arpa/inet.h>
uint32_t htonl(uint32_t hostlong); // host to network long (32 bits)
uint16_t htons(uint16_t hostshort); // host to network short (16 bits)
uint32_t ntohl(uint32_t netlong); // network to host long
uint16_t ntohs(uint16_t netshort); // network to host short
Helper functions for address manipulation.
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int inet_aton(const char *cp, struct in_addr *inp);
in_addr_t inet_addr(const char *cp);
in_addr_t inet_network(const char *cp);
char *inet_ntoa(struct in_addr in);
struct in_addr inet_makeaddr(int net, int host);
in_addr_t inet_lnaof(struct in_addr in);
in_addr_t inet_netof(struct in_addr in);
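As a small usage sketch (not from the lecture notes), inet_aton() and inet_ntoa() convert between a dotted-decimal string and the 32-bit, network-byte-order value stored in struct in_addr:

```c
#include <stdio.h>
#include <arpa/inet.h>

int main(void) {
    struct in_addr a;
    if (inet_aton("192.168.1.1", &a) == 0) {   // string -> 32-bit address (network byte order)
        fprintf(stderr, "invalid address\n");
        return 1;
    }
    printf("raw (host order): 0x%08x\n", ntohl(a.s_addr));  // 0xc0a80101
    printf("back to string:   %s\n", inet_ntoa(a));         // "192.168.1.1"
    return 0;
}
```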
TCP Client Example
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
int main() {
char *server_name = "localhost";
uint16_t portno = 80;
struct sockaddr_in serv_addr; // Address structure for server
struct hostent *server; // Result of DNS resolver
char buffer[256] = "A message to send"; // Buffer for data to send
// Create socket with protocol 0 (default) for given address family - TCP
int sockfd = socket(AF_INET, SOCK_STREAM, 0);
if (sockfd < 0) {
perror("ERROR opening socket");
exit(1);
}
server = gethostbyname(server_name); // DNS resolution
if (server == NULL) {
fprintf(stderr, "ERROR, no such host\n");
close(sockfd); // Not strictly needed before exit, but good style to close the socket
exit(1);
}
bzero((char*) &serv_addr, sizeof(serv_addr)); // Zero out the struct
serv_addr.sin_family = AF_INET;
// Copy IP address from DNS response
bcopy((char*)server->h_addr,
(char*)&serv_addr.sin_addr.s_addr,
server->h_length
);
// Set port number
serv_addr.sin_port = htons(portno);
if (connect(sockfd, (struct sockaddr*) &serv_addr, sizeof(serv_addr)) < 0) {
perror("ERROR connecting");
close(sockfd);
exit(1);
}
int n = write(sockfd, buffer, strlen(buffer));
if (n < 0) {
perror("ERROR writing to socket");
exit(1);
}
bzero(buffer, 256); // Zero out buffer; wait for response
n = read(sockfd, buffer, 255);
if (n < 0) {
perror("ERROR reading from socket");
exit(1);
}
printf("%s\n", buffer);
}
TCP Server Example
int main() { // same includes as the client example
int portno = 80;
int sockfd = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in serv_addr;
struct sockaddr_in cli_addr;
char buffer[256]; // Buffer for received data
if (sockfd < 0) {
perror("ERROR opening socket");
exit(1);
}
bzero((char *) &serv_addr, sizeof(serv_addr));
serv_addr.sin_family = AF_INET;
serv_addr.sin_addr.s_addr = INADDR_ANY; // Bound to all local interfaces. Otherwise, use the IP address of a specific interface
serv_addr.sin_port = htons(portno);
if (bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
eprint("ERROR on binding");
exit(1);
}
listen(sockfd, 5); // Allow a maximum of 5 queued connections
socklen_t clilen = sizeof(cli_addr);
// Wait (blocking) for incoming connection request with `accept`
int newsockfd = accept(sockfd, (struct sockaddr *) &cli_addr, &clilen);
if (newsockfd < 0) {
eprint("ERROR on accept");
exit(1);
}
bzero(buffer, 256);
int n = read(newsockfd, buffer, 255);
if (n < 0) {
perror("ERROR reading from socket");
exit(1);
}
printf("%s\n", buffer);
n = write(newsockfd, "Message received", 17);
if (n < 0) {
perror("ERROR writing to socket");
exit(1);
}
close(newsockfd);
close(sockfd);
return 0;
}
03. Protocol Layering
Layering
The internet and POTS are complex and require vast amounts of software. Hence, structuring principles are used to achieve:
- Modularity, allowing software re-use
- Independence of network technologies (transparency)
- Separation of concerns
- Correctness
Networking software is layered: layer N provides services to layer N+1 through SAPs.
- The SAP is a service access point - an interface through which layer N provides a service to layer N+1
- Layer N can provide multiple SAPs
- e.g. UDP: the SAP is the socket API (and the associated port number)
- Layer N's protocol entity is the implementation of layer N. The protocol entity cannot communicate with the other host's protocol entity (at the same level) directly; instead, it must use the service provided by layer N-1 to communicate with the other host indirectly
- However, the protocol entity can act as if this is the case
- Layer N must never make any assumptions about what the other layers are doing internally - only the service interface matters
General layout of a Layer N-PDU (Protocol Data Unit)
| N-protocol header | N + 1 data (N-SDU) | N trailer |
- The N-PDU/packet is constructed by the N-protocol entity
- It carries the data given by layer N+1 for transmission; this is called the service data unit (SDU)/payload/user data
- The N-protocol entity adds a header containing control information relevant to the receiving layer-N entity
- It may also add a trailer (usually a checksum)
Thus, going down the stack, each layer sandwiches the data from the layer above between its own header and trailer:
| N-2 hdr | N-1 hdr | N hdr | N+1 data | N trl | N-1 trl | N-2 trl|
The receiver does the opposite, with each layer removing its header/trailer before handing the payload to the upper layer.
OSI Seven Layer Model
OSI: Open Systems Interconnection. A commercial failure, but the terminology and layering architecture/concepts are foundational to networking.
- Physical layer
- Link layer
- Network layer
- Transport Layer
- Session layer
- Presentation layer
- Application layer
If there are two end hosts, A and B, and a router in between:
- The lowest two layers, physical and link, are single-hop scope, exchanging PDUs only between two physically connected hosts
- The network layer uses hop-by-hop communication to achieve end-to-end communication
- Routers work only on the lowest three layers
- Routers are stupid; intelligence in the end hosts
- The upper four layers - transport, session, presentation and application - exchange PDUs between end hosts; they are purely concerned with end-to-end communication
Physical (PHY)
Transmission of digital data over a physical medium using modulated waveforms/signals.
Often requires the specification of:
- Cable types/frequencies
- Connectors
- Electrical specifications
- Modulation/demodulation and signal specification
- Carrier or bit synchronization methods
Link layer
Responsible for reliable transfer of messages over a physical link. Messages at this layer are called frames.
Often requires the specification of:
- Framing: delineation of frame start/end, frame size, frame format
- Error control (coding or retransmission)
- Error-correction coding sometimes regarded as PHY functionality
- Medium access control
- Allocating the right to send on channels with multiple senders
- Flow control: avoiding overwhelming a slow receiver
Network layer
Provides link technology-independent abstraction of the entire network to other layers. Handles addressing and routing and end-to-end delivery of messages.
Messages at this layer and above are called packets.
Often requires the specification of:
- Addressing format
- Exchange of routing information and route computation
- Depending on technology: establishment, maintenance and teardown of connections
Transport Layer
Provides reliable, in-sequence and transparent end-to-end transfer; gives programming abstractions to higher layers.
Often requires the specification of:
- Error control procedures (again)
- Link layer protects against transmission error; the router can still lose the packet (e.g. if it has no resources available)
- Flow control procedures (again)
- Link layer does flow control for stations one hop apart; transport layer does end-to-end flow control
- Congestion control procedures (sometimes considered a network-layer issue)
- Protect network against overloading
Session & Presentation Layers
No one really knows what these are.
Session layer:
- Establishes communication sessions between applications
- Can involve several transport layer connections between the same two endpoints to increase data transfer rates. They may run either sequentially or in parallel
- May control the way two partners interact (e.g. alternatingly)
Presentation layer:
- Translation between different representations of data types (e.g. endianness conversion)
Application Layer
Application support functions (high-level APIs) used in many applications. Applications sit on top of this layer.
Examples include file transfer services, directory services, transaction processing support (e.g. two-phase commit)
TCP/IP Reference Model
Broadly equivalent to the OSI model, but with the session and presentation layers removed.
- Physical Layer
- Network Interface
- Internet
- Transport Layer
- Application
Once again, routers only implement the lowest three layers, which are kept simple. This allows many things to be implemented in hardware, allowing routers to keep up with the deluge of packets.
Application Layer
Protocols such as SMTP, HTTP, FTP.
Transport Layer
- Provides end-to-end communications to applications
- Offers its services through the socket interface (TCP, UDP, etc.)
- SAPs are called ports; used for application multiplexing:
- Several applications/processes can use the transport service
- Port bound to one application
- PDUs generated by TCP/UDP called segments
- TCP has mechanisms for:
- Error control (retransmission-based) and in-order delivery
- Flow control
- Congestion control
- Packet loss used as sign of congestion; sender will reduce rate
Internet Layer
IP:
- PDUs are called datagrams
- All higher-layer segments are encapsulated in datagrams
- Specifies an addressing scheme
- Provides end-to-end delivery of datagrams (forwarding)
- Does not specify how routing is done; left to dedicated protocols
- Has no mechanisms for error, flow or congestion control
- Simple, so IP datagrams can be sent over any network interface
- IP over everything
Physical and Network Interface Layer
Network interface layer:
- Similar to link layer
- Accepts IP datagrams and delivers them over a physical link
Service Providers and Service Users
An N-entity acts as a service provider for the layer above it; the (N+1)-entity using the service is the service user.
The service user and provider:
- Talk to each other through service primitives
- Must obey rules in the usage of services (e.g. open file before reading from it)
There are standard service primitives for a service.
Confirmed Service
- Host A's service user issues an S.request service primitive to the service provider (possibly with some user data)
- The service provider at host A generates one or more PDUs (packets) and sends them to host B
- The service provider at host B receives the PDU and generates an S.indication primitive to its service user
- The service user processes the request and prepares a response, passing it to the service provider through an S.response primitive
- The service provider generates a response PDU, sending it to host A's service provider
- Host A's service provider issues an S.confirm primitive to its service user
Unconfirmed Service
The first three steps: S.request, PDU sent to B, then S.indication. Service user A does not know if their request has reached B’s service user.
Confirmed Delivery Service
Host B's service provider simultaneously sends an S.indication to its service user and an acknowledgement packet to A's service provider, which then generates an S.confirm primitive to its service user.
Thus, A’s service user knows that B’s service provider received the request, but not if the service user successfully processed it.
Multiplexing
- Allows several higher-layer entities/connections to share a single lower-layer SAP
- To do this, the PDUs must carry information (e.g. a port number) identifying which higher-layer entity they belong to, so the receiving entity can demultiplex them
- The sending entity must make scheduling decisions about which SDU to transmit next
Splitting
- Allows a single protocol entity to transmit data over several (lower-layer) SAPs
- Thus, the protocol entity must make scheduling decisions to decide which SAP to use for a given PDU
- Mechanisms for sequencing may be necessary, since PDUs sent over different SAPs may arrive out of order
Fragmentation and Reassembly
- PDUs may have a limited size: on lower layers, this is usually for physical reasons (e.g. Ethernet)
- To make PDU sizes transparent to higher layers, the SAP may accept large packets; the protocol entity then partitions the data into several PDUs, called fragments (each with its own header/trailer), and transmits them separately
- The fragments must be numbered so the receiver can reassemble them
- What happens if fragments are lost? IP does not have retransmission, so after some time it must conclude that a fragment will not arrive. Then it can free the reassembly buffer (the buffer needs enough space for all the fragments, even those that have not yet arrived). Otherwise, a malicious actor could cause a DoS by sending incomplete fragment sets
Blocking and Deblocking
- Some higher layers produce very small PDUs (i.e. very small SDUs for the layer below)
- Thus, the transmitter may wait until several SDUs are present, blocking the requests, before putting them into a single PDU to save overhead
- The receiver entity must deblock the PDU into the individual SDUs using the markers placed into the PDU by the transmitter
- When should the sender stop collecting SDUs and start sending? (a trade-off between overhead and delay)
Sequence Numbers
- An N-entity can maintain a sequence number
- For each new PDU, the sequence number is written into the PDU, then incremented
- Allows the receiver to:
- Detect duplicate PDUs
- Detect lost PDUs (and possibly request retransmission)
- Put PDUs back into the right order when reordered by the network
- Implementation issues:
- Overflow/wraparound of the counter (see the comparison sketch below)
- Choice of initial sequence number
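A minimal sketch of wraparound-safe ("serial number") comparison for a 16-bit sequence number - one common way to handle the overflow issue (assumes the two numbers are never more than half the sequence space apart):

```c
#include <stdint.h>
#include <stdio.h>

/* Returns non-zero if sequence number a is "after" b, assuming the two
 * numbers are never more than half the sequence space apart. */
static int seq_after(uint16_t a, uint16_t b) {
    return (int16_t)(a - b) > 0;   // subtraction wraps correctly modulo 2^16
}

int main(void) {
    printf("%d\n", seq_after(2, 65535));   // 1: 2 comes "after" 65535 (wrapped)
    printf("%d\n", seq_after(100, 200));   // 0: 100 is not after 200
    return 0;
}
```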
04. Physical Layering
Fundamentals
Digital Data
Digital data is a finite or countably infinite sequence of discrete symbols, each belonging to a finite set called the alphabet.
As any finite alphabet can be represented by a group of bits, any digital data can be represented as a sequence of bits.
Analog Data
Analog data can take on an uncountable number of values which change continuously over time.
Transmission of Analog and Digital Data
With digital data, we want to recover the transmitted sequence of bits exactly, regardless of any distortions or errors introduced by the channel.
When the receiver receives a group of bits, it must choose one out of a finite number of alternatives. The design of the system centers around maximizing the probability that it makes the correct choice.
With analog data, if the channel introduces distortions, the receiver cannot exactly recover the transmitted data, as it is faced with an uncountably infinite number of alternatives. Thus, the goal of an analog system is to extract the best approximation to the transmitted signal.
Transmission of Digital Data Using Analog Signals
- Most transmission media only allow transmission of an analog physical signal taking on a continuous range of values over a continuous time interval
- Signal/waveform denotes the evolution of some physical quantity over time
- Modulation is when this signal is modified according to the data
Structure of Communication Systems
Formatting
This step transforms a source signal into digital data (i.e. ADC).
Source Coding (Compression)
The step of encoding digital data with the goal of reducing size. There are two types, lossless and lossy coding.
Lossless coding, also called redundancy reduction, encodes the data such that it can be perfectly restored.
Lossy coding, also called relevancy reduction, is used where humans have tolerance to imperfections, such as video/audio. The decoded data may differ from the original data, but aims to do so while reducing the difference perceived by humans.
Channel Coding
This adds redundant symbols/bits to combat transmission errors in the channel, giving channel decoder a better chance to guess the correct data. Thus, it reduces bit-error probability.
Modulator
Takes the sequence of digital symbols and maps these to a physical signal or waveform of a certain duration (symbol time) - this determines the bit rate.
The amount of data sent per symbol time could be increased, but this places the signal alternatives closer together and thus makes them more susceptible to noise and interference. To combat this, the transmit power can be increased.
Demodulator and Channel Decoder
The demodulator must produce a guess of what could have been the transmitted waveform (and thus symbol) - this is possible as there is only a finite number of waveforms.
The probability of error, or symbol/bit error rate, is a key performance metric.
The channel decoder uses the redundancy introduced by the channel encoder to correct some or all of the errors made by the demodulator.
In the case of soft demodulation, the demodulator also sends its confidence that a certain symbol has been interpreted correctly.
Source Decoder and Formatter
The source decoder decompresses the compressed source data.
The formatter converts the digital data back into the form required by the user (e.g. DAC).
Baseband Transmission
NRZ (Non-Return-to-Zero) Encoding
A voltage level is associated with each bit value and transmitted for a fixed amount of time - the symbol time. The inverse is the symbol/baud rate.
As one bit is transmitted per symbol time, the symbol rate is the bit rate.
Bipolar NRZ maps one bit value to +V and the other to -V.
However, more than two levels can be used - a group of k bits can be mapped to one of 2^k signal levels, so each symbol carries k bits.
Signal Impairments
Assume unipolar NRZ is used below.
Attenuation
Attenuation - loss of signal power; in communications, this is often the loss of signal power for varying distance (path loss).
It is measured as the ratio of signal power at the transmitter output, P_tx, to the signal power at the receiver input, P_rx.
This is often expressed using decibels: A_dB = 10 * log10(P_tx / P_rx).
If the medium attenuates all frequencies equally, attenuation is a simple division by the attenuation factor.
Unfortunately, the medium does not attenuate all frequencies the same way.
Assuming there is a cut-off frequency above which all frequencies are attenuated completely, the signal degrades as the cut-off frequency is lowered (think of the Fourier series describing a square wave: only the low-frequency sinusoids remain, making the edges less 'sharp').
A more realistic scenario is a low-pass filter.
Thermal Noise
A standard model for thermal noise is additive white Gaussian noise (AWGN), where the noise signal n(t) is a white Gaussian random process.
The observed signal will be r(t) = s(t) + n(t), the transmitted signal plus the noise.
The noise is usually dependent on temperature.
Passband Transmission
To transmit several data signals simultaneously on a single channel (while avoiding collisions), passband transmission is required.
Data is transmitted by modulating a sinusoidal carrier of a certain frequency; one signal occupies a certain bandwidth around the center frequency.
By choosing center frequencies with sufficient separation, several data signals can be transmitted in parallel in a sufficiently wide frequency band (e.g. FM radio, Wi-Fi bands). This is also called frequency-division multiplexing.
In digital passband modulation, information is modulated onto a sinusoidal carrier of duration T (the symbol time): s(t) = A(t) * cos(2*pi*f(t)*t + phi(t)),
where A(t) is the amplitude, f(t) the frequency and phi(t) the phase - each may change from symbol to symbol.
As each symbol will have their own functions, there may be sudden jumps (in theory) in amplitude as it the waveform moves from one symbol to the next.
There are three forms of digital modulation, each of which varies one of the functions:
Amplitude Modulation (Amplitude Shift Keying) varies the amplitude A(t) according to the symbol.
Frequency Modulation (Frequency Shift Keying) varies the frequency f(t) according to the symbol.
Phase Modulation (Phase Shift Keying) varies the phase phi(t) according to the symbol.
Quadrature Amplitude Modulation (QAM)
A combination of ASK and PSK, varying amplitude and phase according to the symbol simultaneously.
(Constellation diagram: amplitude/phase plane with a square number of constellation points, equally spaced in a grid.)
QAM usually applied with 16, 64 or 256 different constellation points, allowing 4, 6 or 8 bits to be mapped to one QAM symbol. Greater throughput but higher error rate.
Synchronization
Sender and receiver need to agree on how long the symbol period is, the exact center frequency, where packets start and end, etc.
They have no common time or frequency reference; each has its own local time and frequency reference (derived from a crystal oscillator).
Oscillator frequency depends on manufacturing tolerances, ageing, temperature, pressure etc.
Hence, the receiver cannot assume the sender has the same time and frequency reference. Hence the receiver must:
- Synchronize its time/frequency reference with the transmitter’s
- Track and maintain it during packet reception
Oscillator mismatch and Doppler shift can mess up the frequency, and PSK/QAM require tracking signal phase. Synchronizing frequency and phase is called carrier synchronization.
- The transmitter transmits symbols at its own pace, using its local clock
- The receiver needs to extract information about symbol duration so it knows when symbols start and end
- For longer streams, the receiver clock must be synchronized continuously.
- In NRZ or on-off keying, long runs of the same bit mean long durations with the same voltage/amplitude, causing loss of synchronization
- Thus, the signal level must change sufficiently often
Manchester Encoding ensures there is at least one signal change per bit: a logical 1 is represented by a change from high to low in the middle of a symbol duration, and a logical 0 from a change from low to high. Voltage goes from -V to V.
The receiver will need to sample the incoming signal at a much higher frequency; if the transition occurs in an unexpected sample, the local clock can be adjusted so that that sample will be exactly in the middle of the symbol time.
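A bit-level sketch of the encoder, following the convention stated above (1 = high then low, 0 = low then high); each input bit becomes two half-symbols, guaranteeing a mid-symbol transition:

```c
#include <stdio.h>
#include <stdint.h>

/* Encode one byte (MSB first) into 16 half-symbols: 1 -> high,low; 0 -> low,high. */
static void manchester_encode(uint8_t byte, int out[16]) {
    for (int i = 0; i < 8; i++) {
        int bit = (byte >> (7 - i)) & 1;
        out[2 * i]     = bit ? 1 : 0;   // first half of the symbol
        out[2 * i + 1] = bit ? 0 : 1;   // second half: guaranteed transition
    }
}

int main(void) {
    int halves[16];
    manchester_encode(0xA5, halves);    // bit pattern 1010 0101
    for (int i = 0; i < 16; i++) printf("%d", halves[i]);   // prints 1001100101100110
    printf("\n");
    return 0;
}
```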
Frame Synchronization
Ethernet has a preamble, called SYNC, to allow for initial synchronization: a 7-byte sequence of alternating 1s and 0s, followed by a one-byte start-frame delimiter (SFD) - 64 bits in total. The SFD is required because, when the receiver acquires synchronization, it has no idea where in the preamble it is.
How does the receiver know when the frame ends?
- Time gap: medium kept idle between frames
- Code violation: deliberate violations in encoding method (e.g. bit with no transition for Manchester encoding)
- Length field
Ethernet has length fields, but a bit error could occur in the length field so it cannot be trusted.
Bit stuffing is another option: a protocol has some well-known start and end pattern; the pattern must not be allowed to appear in the payload data.
For example, in HDLC, the flag is 01111110; after a sequence of 5 consecutive 1’s, the sender will insert a zero bit. Thus, if the receiver gets six 1’s, it must be the flag; if it gets 5 1’s followed by a 0, it should remove the 0. Unfortunately, a bit error could occur.
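A sender-side sketch of the stuffing rule described above (flags are assumed to be added separately, before and after the stuffed bits):

```c
#include <stdio.h>

/* Bit-stuff an input bit string: after five consecutive 1s, insert a 0.
 * in/out are arrays of 0/1 ints; returns the stuffed length. */
static int bit_stuff(const int *in, int n, int *out) {
    int ones = 0, m = 0;
    for (int i = 0; i < n; i++) {
        out[m++] = in[i];
        ones = in[i] ? ones + 1 : 0;
        if (ones == 5) {        // five 1s in a row: stuff a 0 so the flag never appears
            out[m++] = 0;
            ones = 0;
        }
    }
    return m;
}

int main(void) {
    int in[] = {0, 1, 1, 1, 1, 1, 1, 0};   // contains six 1s in a row
    int out[16];
    int m = bit_stuff(in, 8, out);
    for (int i = 0; i < m; i++) printf("%d", out[i]);   // prints 011111010 - the run is broken up
    printf("\n");
    return 0;
}
```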
05. Local Area Networks and Ethernet
Misc.
Geometric RV:
- Type 1 (number of failures before the first success in Bernoulli trials): P(X = k) = (1 - p)^k * p, with expected value (1 - p)/p
- Type 2 (number of trials up to and including the first success): P(X = k) = (1 - p)^(k-1) * p, with expected value 1/p
LANs
- Packet-switched networks
- Limited geographic extension; usually less than a kilometer
- Usually uses a shared transmission medium to multiple stations
- Usually controlled by one particular entity
- Often higher data rates than in WAN
- Broadcast transmissions possible
Standards
Standards typically specify these layers:
- Physical layer (PHY)
- Transmission media
- Network adapters
- Modulation schemes
- Data rates
- Network topologies
- Medium Access Control (MAC) sub-layer
- Specifies how stations share the transmission medium
- Possibly the link layer; error control and flow control (error control coding often done in hardware in the PHY layer)
IEEE 802 Series
IEEE = Institute of Electrical and Electronics Engineers. ISO has adopted several IEEE standards, including:
- 802.1: Overview, bridging, network management
- 802.2 Logical Link Control
- 802.3: Ethernet
- 802.11: WLAN
- 802.15: WPAN including Bluetooth
802.2 and 802.1’s MAC bridging are layer 2 protocols and thus, they are independent of the layer 1 protocols being used (WLAN, MAN etc.).
Physical Topologies
Bus Topology
Rarely used today. All stations are attached via a tap line to the bus. Thus, signals are heard by all stations - it is a broadcast medium, so frames include a destination address. Parallel transmissions may collide, garbling packets.
Due to signal attenuation, the bus length is limited to a few hundred meters at most.
In addition, if the bus cable is damaged, the entire network will stop working.
Star Topology
Stations attached to a central unit via their own private cable. The cable often has separate cables for transmit and receive directions, allowing full-duplex.
The central unit can act as a repeater or bridge/switch. As a repeater, it copies incoming signals to all other outputs, which may congest the network. As a bridge/switch, it copies incoming frames to only where the destination is. The latter allows transmissions to occur in parallel.
There is still a single point of failure (the central unit), but the impact of a cut cable is much smaller.
Wireless Topologies
Wireless transmission media are much more prone to errors, and stations cannot necessarily hear all stations - the channel is time-varying and the links are volatile and unpredictable.
MAC Fundamentals
These schemes allow a set of distributed stations to efficiently coordinate access to a common channel. More specifically:
- There are a number of stations wishing to communicate
- There is a shared communications channel/resource that can be used by one station at a time
- No other means for information exchange between stations exists
It is often regarded as a separate sub-layer between the PHY and link-layer.
Assumptions:
- The shared channel is a broadcast medium - this may not be true for wireless transmission media
- If parallel transmissions occur, all contending transmissions are garbled - once again, this may not be true for wireless transmission media
Desirable properties:
- Small medium access delay - the time between the arrival of a packet and the start of a successful transmission
- Collisions, waiting times etc. affect this
- Small access delay desirable if medium is lightly loaded
- Fairness and fair re-use of unused resources
- Low overhead and high throughput
- Stability - increasing load should not decrease throughput
Orthogonal MAC Schemes
The behavior of one station does not influence the behavior of other stations. This is done by exclusively allocating resources to individual stations.
FDMA: frequency division multiple access; TDMA: time division; SDMA: space division; CDMA: code division.
FDMA
Each station is given a band of frequencies - a sub-channel they have exclusive transmission access to.
The channel bandwidth is subdivided into N sub-channels, with guard bands acting as safety margins between sub-channels and the fringe of the channel. This reduces interference between adjacent sub-channels.
To receive data, a station must either have a separate receiver for each sub-channel or a single tunable receiver - the latter has a coordination problem, as the receiver must tune (which takes time) to the right sub-channel before data is transmitted on it.
Performance
If the total available bandwidth is B (bps), station i is assigned 1/N of B (approximately - guard bands ignored) on a long-term basis.
Medium access delay is 0 as i can start the transmission immediately without risk of collision.
If a packet has size B/N bits, transmission takes one second. Thus, with zero access delay, the packet completes exactly one second after it arrives.
Advantages and Disadvantages
N stations can transmit in parallel, and there is no need for time synchronization between the transmitters.
However, each station either needs N receivers or tuneable receivers, frequency synchronization is needed, and there is no re-use.
In addition, there needs to be some way of allocating sub-channels; if there is a central station in charge of keeping track of which channels are in use and processing allocation requests, what happens if it crashes?
Hence, FDMA is good for CBR but bad for VBR.
TDMA
Time is subdivided into successive super-frames of duration T_SF, with each super-frame being subdivided into N subframes (plus guard times to compensate for synchronization errors). Hence, time slots are assigned exclusively, and on a longer-term basis, to a station i for transmission.
Performance
Each station gets the full bandwidth B (bps) for 1/N of the time (ignoring guard times).
If:
- Station i owns one time slot
- T_SF is 1 second
- A time slot is long enough to carry B/N bits (slot duration T_SF/N at the full rate B)
- A packet of B/N bits arrives at a random time to station i
Thus: station i must wait, on average, half a super-frame for its slot before transmitting at the full bandwidth B.
Thus, on average, TDMA allows starting later but finishing sooner than FDMA, due to the full bandwidth being exclusively available during the slot - see the worked comparison below.
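A rough worked comparison under these assumptions (packet of B/N bits arriving at a uniformly random time; guard bands/times ignored):

$$
T_{\mathrm{FDMA}} = 0 + \frac{B/N \text{ bits}}{B/N \text{ bps}} = 1\ \mathrm{s},
\qquad
E[T_{\mathrm{TDMA}}] = \frac{T_{SF}}{2} + \frac{T_{SF}}{N} = 0.5 + \frac{1}{N}\ \mathrm{s}
$$

For N > 2 the TDMA average completion time is below FDMA's 1 s, matching the claim above.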
Advantages and Disadvantages
Compared to FDMA, asymmetric bandwidth assignment is much easier - one station can simply be given multiple time slots, rather than needing multiple receivers running simultaneously. In addition, TDMA has better completion times on average, and does not require tunable or multiple receivers.
The downsides are that tight time synchronization between stations is required and there is a high expected access delay even on idle systems.
In addition, TDMA also requires signalling and shared state to allocate time slots and so has the same issues.
Hence, TDMA is also good for CBR but bad for VBR traffic.
Random Access Protocols
These protocols do not reserve channel resources or require a central station/shared state. Access to the medium is unpredictable, using randomness to avoid collisions.
ALOHA
N uncoordinated transmitters and one receiver (a base station). A collision occurs if two packets overlap at the receiver.
When a new packet arrives, it is put in a queue. Once the current packet is transmitted, it checks the queue and transmits the packet after a random backoff.
If the queue is empty, it calculates the checksum and transmits the frame immediately, then starts an acknowledgement timer.
The receiver sends an immediate ack on successful reception of a packet; if the ack timer expires, the transmitter enters backoff mode:
- The transmitter chooses a random backoff time
- When the backoff timer expires, the transmitter retransmits the frame
- If the number of failed trials exceeds the threshold, the frame is dropped.
The choice of random distribution for backoff times plays a key role in controlling the delay, throughput and stability. Often, the distribution depends on the number of collisions seen by the frame.
ALOHA is simple to implement and if the network load is small, new frames are sent immediately - zero access delay, with the full channel bandwidth.
However, if A and B send packets of the same length, there is a vulnerability period two frame times long during which a transmission by B will collide with A's - starting one frame time before A begins and lasting until A finishes.
ALOHA has good throughput for low loads, but beyond a certain point, the increasing collision rate leads to decreased throughput - it is an unstable protocol.
If all packets have the same length and the packet transmission time is τ, the vulnerable period around each transmission is therefore 2τ.
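Under the usual textbook assumption of Poisson packet arrivals with total offered load G (transmission attempts per packet time τ), this gives the classic pure-ALOHA throughput:

$$
S = G e^{-2G}, \qquad S_{\max} = \frac{1}{2e} \approx 0.18 \quad \text{at } G = \tfrac{1}{2}
$$

which is why throughput collapses once the offered load grows past G = 0.5.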
CSMA - Carrier Sense Multiple Access
These protocols assume that stations can determine the state of the medium (busy or idle) (almost) instantaneously.
This is called carrier-sensing or clear channel assessment, and allows for listen-before-talk: before transmitting a frame, it performs the CS operation. If the channel is busy, the station defers the transmission using some strategy. It usually has a maximum number of deferrals/backoffs for a frame. Note that this does not eliminate collisions completely - collisions can occur if the time difference between two stations starting a transmission is smaller than the propagation delay.
This works well when the propagation time is small compared to the packet transmission time, as other stations notice a transmission quickly after it starts. Thus, it works well in local area networks, but is useless if the sender may already have finished transmitting by the time a distant station detects the start of the frame.
Non-persistent CSMA
After getting a new packet from the higher layer (with an empty queue), the station performs the carrier-sense operation. If the medium is idle, it can transmit immediately but if the medium is busy, it generates a backoff time from a random distribution. Then it can repeat the process until it succeeds or reaches some threshold.
In the case of a collision, a backoff time is chosen and the process repeats. How can collisions be detected? One solution is to expect an acknowledgement for each frame.
Performance issue: there is a high probability that the medium will be idle for some time after the previous transmission finishes (especially if the packet is long), lowering utilization.
If it is known there is a small number of transmitters, using a distribution returning small waiting time reduces the time gaps between packets, increasing throughputs.
p-persistent CSMA
A parameter p is known to all stations.
If the medium is busy, it defers until the end of the ongoing transmission.
It will then divide time into slots, which should be just large enough to run the following:
- With probability p it starts its transmission; otherwise, it defers for a further time slot
- If it defers, it checks the medium, waiting until the end of the transmission if a new one starts
- If a collision occurs, the process starts over
Choosing p is an issue, with a low value leading to stability in high load but low throughput as the medium will be idle after a transmission ends.
1-persistent CSMA or CSMA/CD
If the medium is busy, it defers until the end of the transmission.
When the medium becomes idle, the station sends unconditionally. This avoids idle times between transmissions.
However, if multiple stations start transmitting, a collision occurs and the transmitters detect this immediately - they need to be able to transmit and sense simultaneously. In this case:
- The transmission is aborted
- A jamming signal is sent to inform all stations about the collision
- A collision resolution procedure is started (e.g. backoff)
Ethernet
Frame Format
| Length (bytes) | Name        |
| -------------- | ----------- |
| 7              | Preamble    |
| 1              | SOF         |
| 6              | DstAddr     |
| 6              | SrcAddr     |
| 2              | Length/Type |
| 46-1500        | Payload     |
| 4              | FCS         |
Uses Manchester encoding.
- Preamble: for bit synchronization
- Start of Frame: end of preamble, also for synchronization
- Destination address
- Source address
- Length/Type field
- Payload
- Minimum length due to CSMA protocol
- If the higher layer hands down a payload that is too small, the payload is padded: how does the receiver know which bits are padding? It can't; it must pass everything up to the higher layer
- The IP protocol has its own length information embedded in the header, so it uses that to discard the padding
- Frame Check Sequence: CRC checksum
Ethernet Addresses
A 48 bit, globally unique address. Each vendor gets its own address range and each Ethernet adapter has its own address.
Addresses are written as six colon-separated bytes in hexadecimal representation.
Special addresses:
- The broadcast address is
FF:FF:FF:FF:FF:FF - Addresses where the first bit is
1are multicast addresses (except the broadcast address) - Addresses where the first bit is
0are unicast addresses
Address filtering: a station will throw away any unicast addresses not addressed to itself.
Length/Type field
When L/T is bigger than or equal to 0x0600, it is a type field: it indicates the higher-layer protocol encapsulated in the Ethernet frame. Hence, it allows protocol multiplexing - a service to multiple higher-level protocols can be provided using a single lower-level SAP. Some common protocols:
| Type field | Protocol        |
| ---------- | --------------- |
| 0x0800     | IPv4            |
| 0x0806     | ARP             |
| 0x86DD     | IPv6            |
| 0x8863     | PPPoE Discovery |
| 0x8864     | PPPoE Session   |
When L/T is less than or equal to 1500, it indicates the length of the payload; the first two bytes of the payload then indicate the type (hopefully).
PHY
There are several different physical layers such as coaxial cables, twisted pair cables, and fibre cables.
Ethernet also supports two fundamentally different topologies: broadcast (almost never used today) and switched Ethernet.
Broadcast Topologies
- All stations share the medium and hence, MAC is needed
- Stations operate in half-duplex mode - they cannot simultaneously send and receive
- In Ethernet, this can be achieved using the bus topology with coaxial cable or with hub-based topology
- Called Half-Duplex Mode
Switched Topologies
Stations are attached to (possibly cascaded) switches via point-to-point links - a private cable. Hence, no MAC is needed (but is still used) and parallel transmissions are possible, although there may be contention for resources in the switch.
Stations operate in full-duplex mode - each link uses pairs of cables.
This is required for 100 Mbps and higher PHYs - basically all modern Ethernet.
Half-Duplex Ethernet MAC Protocol
CSMA/CD - Carrier-Sense-Multiple-Access with Collision-Detection.
This has very short medium access delays under light load, but the collision rate increases under heavy load with many stations - do not operate Ethernet beyond 50-60% load.
Assumes that parallel transmissions lead to collisions and that stations can monitor the medium for collision while transmitting - this works by checking the voltage on the medium and checking if this exceeds the voltage applied by the transmitter.
- A new packet arrives at the MAC. The local collision counter coll is set to 0
- Carrier-sense operation performed
- If idle, the transmission starts immediately
- If busy, listen until the channel becomes idle, then start the transmission immediately
- While transmitting, check for a collision. If one is detected:
- Abort the frame transmission and send a short jamming signal
- Increment coll. If coll > 16, drop the frame and set coll to 0
- Wait for a random time (the backoff time) and go back to the carrier-sense step
- If no collision is detected, transmit the whole frame and reset coll
Backoff time computation
The computation uses a truncated binary exponential backoff algorithm.
A random integer k is generated uniformly in the range [0, 2^min(coll, 10) - 1]; the backoff time is k slot times.
The slot time is a common parameter that is just large enough to cover the maximum round-trip time and processing delays such that if two stations start transmitting one slot apart, the later one will notice.
The algorithm starts by assuming the number of contenders is small - if this is the case, only a few backoff slots are needed. If transmission keeps failing, it is likely that there are many contenders, so the window (number of slots) is increased to reduce the probability of another collision (see the sketch below).
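A sketch of this backoff computation under the stated assumptions (10 Mbps slot time of 51.2 µs; rand() used for brevity where a real adapter has its own randomness source):

```c
#include <stdlib.h>
#include <stdio.h>

#define SLOT_TIME_US 51.2   /* slot time for 10 Mbps Ethernet */

/* Truncated binary exponential backoff: after the coll-th collision, wait
 * k slot times with k drawn uniformly from {0, ..., 2^min(coll,10) - 1}. */
static double backoff_us(int coll) {
    int k_exp  = coll < 10 ? coll : 10;    // truncate the exponent at 10
    int window = 1 << k_exp;               // 2^k_exp possible slots
    int k      = rand() % window;          // uniform in [0, window - 1]
    return k * SLOT_TIME_US;
}

int main(void) {
    for (int coll = 1; coll <= 16; coll++)
        printf("collision %2d: backoff %.1f us\n", coll, backoff_us(coll));
    return 0;
}
```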
Minimum Frame Size/Slot Time
The minimum frame size (64 bytes for 10 Mbps) is chosen to ensure the transmitter can detect collisions - it can only detect collisions while it is still transmitting.
NB: the preamble and SOF are not included in the frame size calculation.
The maximum cable length and processing time from repeaters/hub is defined in the standard, so it is possible to generate the worst case:
- Station A starts transmitting
- Just before it reaches station B, station B starts transmitting
- Station A can only detect the collision while it is still transmitting
- Hence, the worst-case detection delay is twice the propagation delay, plus some buffer for processing time (see the worked example below)
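A worked example with the classic 10 Mbps parameters: the 64-byte minimum frame takes

$$
\frac{64 \times 8\ \text{bits}}{10\ \text{Mbit/s}} = 51.2\ \mu\text{s}
$$

to transmit (the slot time), which must be at least the worst-case round-trip propagation delay plus repeater/processing margins, so any collision on the segment is detected before the transmitter finishes sending.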
Bridges and Switches
Often, existing LANs need to be coupled together - this can be done at several different levels.
| Layer | Name of device |
|---|---|
| PHY | Repeater/Hub |
| MAC | Bridge/Layer-2 switches |
| Network | Router |
| Application | Gateway |
Marketing departments do not necessarily follow this terminology.
Repeaters and Hubs
Repeater
Repeaters are dumb devices which simply amplify a signal on the analog level. They amplify existing noise and add their own noise to the signal, but have very small delays and are completely protocol or modulation-agnostic.
Regenerating Repeater
These demodulate an incoming signal on a symbol-by-symbol basis and modulate it again.
They do no error checking so regeneration can introduce errors.
Hub
Hubs are centralized repeaters - they broadcast signals coming in on one port to all other ports. Only one station can transmit at a time.
Hubs may or may not be regenerative.
Basically a bus, except in a star topology, so a single broken cable won’t kill the entire network.
Bridges
Bridges interconnect LANs on the MAC layer. They mostly connect LANs of the same type (e.g. Ethernet), or at least ones with the same MAC address structure. Bridges can be cascaded.
Bridges understand/interpret fields related to the MAC protocol (source/address fields); repeaters and hubs do not.
Basic Operation
- Packet transmitted on LAN A
- All stations on LAN A, plus the bridge, receive the packet
- If the bridge knows the destination address belongs to a station in LAN A, it does nothing
- The bridge learns which stations are attached to which LANs over time by reading the source addresses of the packets transmitted on each LAN.
- Otherwise, it transmits it to all other LANs it is connected to
- This is dangerous when several bridges are used and loops are present
Advantages
- Reliability: failures in one LAN do not affect others as they are only interconnected by the bridge
- Performance: allows several parallel transmissions as long as the packets do not cross LANs
- Security: local traffic is not transmitted to other LANs
Layer-2 Switches
Stations are attached to the switch via point-to-point/private links with separate transmit/receive line (full-duplex). Hence, there is no broadcast medium so no need for a MAC. Instead, stations contend for the resources of the switch.
Switches transmit frames only to the correct output port, and can process frames in parallel - hence, they can increase network capacity as compared to LANs with a broadcast medium.
Frames arriving in parallel to the same destination are buffered.
Switches are transparent to stations.
Comparison to hub
- Hub builds a broadcast medium - only one station can transmit at a time
- A switch can accept up to N parallel transmissions, where N is the number of stations in the LAN
- Each attached station can receive packets at full medium capacity
06. IP and Related Protocols
The Internet
A packet-switched network of networks. The networks and links follow the end-to-end principle:
- Keep routers dumb, put intelligence in hosts
- Routers know how to deliver packets
- Making deliveries reliable is done by the end host
IPv4
Terminology:
- IP Packets are called datagrams unless fragmentation/reassembly is being used - they are called fragments in this case
- End stations are called hosts
- IP routers are called routers
- IP addresses are assigned to network interfaces, not hosts
Best-Effort, Datagram Delivery:
- Connectionless: no connection/shared state
- Unacknowledged
- Unreliable: no retransmissions
- Unordered: no in-sequence guarantees
Packet Format
Big endian byte ordering.
| Length (bits) | Name |
|---|---|
| 4 | Version |
| 4 | Hdr Len |
| 6 | TOS/DSCP |
| 2 | Unused |
| 16 | Total Length |
| | (bytes 0-3) |
| 16 | Identification |
| 3 | Flags |
| 13 | Fragment Offset |
| | (bytes 4-7) |
| 8 | Time-To-Live |
| 8 | Protocol Type |
| 16 | Header Checksum |
| | (bytes 8-11) |
| 32 | Source Address |
| | (bytes 12-15) |
| 32 | Destination Address |
| | (bytes 16+) |
| 32n | Options + Padding |
| variable | Data |
Version (4)
Version of IP protocol. 4 for IPv4.
Header Length (4)
Length of header in multiples of 4 bytes. Without options, the header is 20 bytes long so the value would be 5.
TOS/DSCP (6)
Stands for Type of Service/DiffServ Code Point. Allows the priority for each packet to be defined (usually by the ISP depending on how much you pay) - mostly ignored.
Fill with zero.
Unused (2)
Fill with zero.
Total Length (16)
Total length (in bytes) of the datagram - required since some MAC layers may add padding to meet minimum length requirements.
This may be modified during fragmentation and reassembly.
Identification (16)
Identifies each IP payload accepted from higher layers for a given interface - a sequence number.
Flags (3)
Contains two flags for fragmentation and reassembly (DF = don’t fragment; MF = more fragments).
Fragmentation Offset (13)
Gives the offset of the current fragment within the entire datagram, in multiples of 8 bytes.
Time-to-Live (8)
Last resort to deal with loops - IP does not specify routing protocols and so there is a possibility that loops can occur.
This value is the upper limit on the number of routers a packet can traverse, and is decremented by each router the packet passes through (hence routers must recalculate the checksum). Once the TTL value reaches 0, the packet is discarded and the sender is notified via an ICMP message.
TTL is typically 32 or 64.
Protocol Type (8)
Determines which higher-level protocol generated the payload - provides different SAPs to allow protocol multiplexing.
| Value | Protocol |
|---|---|
| 0x01 | ICMP |
| 0x02 | IGMP |
| 0x04 | IP-in-IP Encapsulation |
| 0x06 | TCP |
| 0x11 | UDP |
Header Checksum (16)
Checksum for the IP header (NOT the data) - for some types of data, a few bad bits may not matter.
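For reference, this is the standard Internet checksum: the one's-complement of the one's-complement sum of the header taken as 16-bit words. A sketch, assuming the checksum field is zeroed before computation and the header length is even (IP headers are always a multiple of 4 bytes); the sample header in main is only illustrative:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Compute the Internet checksum over hdr_len bytes.
 * The Header Checksum field must be set to 0 before calling this. */
static uint16_t ip_checksum(const uint8_t *hdr, size_t hdr_len) {
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < hdr_len; i += 2)
        sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);   // big-endian 16-bit words
    while (sum >> 16)                                   // fold carries back in
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;                              // one's complement of the sum
}

int main(void) {
    uint8_t hdr[20] = {
        0x45, 0x00, 0x00, 0x3c,  0x1c, 0x46, 0x40, 0x00,
        0x40, 0x06, 0x00, 0x00,  0xac, 0x10, 0x0a, 0x63,
        0xac, 0x10, 0x0a, 0x0c                           /* checksum bytes 10-11 zeroed */
    };
    uint16_t c = ip_checksum(hdr, sizeof(hdr));
    printf("checksum = 0x%04x\n", c);                    /* 0xb1e6 for this sample header */
    hdr[10] = c >> 8; hdr[11] = c & 0xFF;                /* write it back into the header */
    printf("re-check = 0x%04x\n", ip_checksum(hdr, sizeof(hdr)));   /* 0x0000 when intact */
    return 0;
}
```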
Source/Destination Address (32)
Indicates the initial sender and final receiver of the datagram. Internally structured into two sections: the network and host ID. The network ID refers to some local network, and the host ID must be unique within that network.
Routing and Forwarding
Routing/Forwarding Tables
IP routers have several network interfaces - these are called ports. When a router receives a packet on some input port, it looks at the DestinationAddress field to determine the output port using the forwarding table.
The forwarding table is a table of all networks the router knows about, identified by their network ID, and the output port to send the packet through to reach that network. A table lookup is used to then select the output port for every incoming packet.
Notes:
- The time of the lookup is dependent on the number of entries. Expensive routers often implement this in hardware
- The table is filled by a separate routing protocol
- It is the responsibility of the last router on a path to deliver an IP datagram to the host.
Important: a host address is tied to its location in the network; there is no mobility support - if a host switches to another network, it obtains another address, breaking ongoing TCP connections.
Classless Inter-Domain Routing
How many bits should be allocated to the network ID? Classful addressing used to be used, allocating exactly 8, 16 or 24 bits to the network ID.
CIDR was introduced in 1993 and is now mandatory, and allows an arbitrary number of bits to be allocated to the network.
Netmask
The netmask specifies which bits belong to the network ID. It is a 32 bit value, with the leftmost k bits being 1s, and the remaining bits 0. A /k netmask will have k bits allocated to the network ID. The network ID is calculated using a logical AND with the IP address.
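As a quick illustration (addresses made up), Python's `ipaddress` module can derive the netmask, network ID and broadcast address from a /k prefix:

```python
import ipaddress

# Hypothetical host address with a /20 prefix (k = 20 bits of network ID)
iface = ipaddress.ip_interface("192.168.37.99/20")

print(iface.netmask)                    # 255.255.240.0  (20 leading 1-bits)
print(iface.network.network_address)    # 192.168.32.0   (address AND netmask)
print(iface.network.broadcast_address)  # 192.168.47.255 (host bits all 1s)
```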
Private host addresses
There are two special host addresses:
- The host address with all 0s (32-k zeroes) signifies that the network as a whole is being referred to
- The host address with all 1s (32-k ones) is the broadcast address - all hosts on the network will receive the packet
There are also some reserved IP blocks:
| Block | Usage |
|---|---|
| 10.0.0.0/8 | Private-use IP networks |
| 127.0.0.0/8 | Host loopback network |
| 169.254.0.0/16 | Link-local for point-to-point links |
| 172.16.0.0/12 | Private-use IP networks |
| 192.168.0.0/16 | Private-use IP networks |
These addresses can be used within the provider network; packets with private addresses are not routed in the public internet.
The loopback address is usually denoted as 127.0.0.1, but any valid address in 127.0.0.0/8 can be used.
Simplified Packet Processing
IP input:
- Run sanity check on packet
- Insert packet into IP input queue
- Process IP options (extremely rare - not discussed)
- Check if destined to its own IP address or the broadcast address; if so:
  - Protocol demultiplexing: check the `Protocol` field, pass the payload to the right higher-layer protocol (TCP/UDP/whatever)
- If not:
  - Check if forwarding is enabled; if not, the device is not a router, so drop the packet
  - Pass to IP output
IP output:
- Check if the packet is an immediate neighbor/directly reachable (e.g. directly via Ethernet)
- If so, attempt to deliver over a single hop
- Else:
- Calculate next hop using forwarding table lookup
- Forwarding table filled and maintained by a routing daemon
  - Decrement `TTL`
  - Recompute `HdrChksum`
  - Hand over packet to outgoing interface
Forwarding Table Contents and Lookup (rough approximation)
Each entry contains:
- Destination IP address: one of
- Full host address (i.e. non-zero host-id)
- Network address with netmask (more common)
- Information about next hop: one of
- IP address of next-hop router (which is directly reachable)
- IP address of directly-connected network (network address/netmask)
- Flags
- If destination IP is host or network
- If next hop is router or directly attached to the network
- Specification of outgoing interface
A router only knows the next hop, not the entire remaining route. The forwarding table lookup has three stages:
- Look for an entry that is a full host address completely matching `dst`. If found, send the packet to the indicated next hop/outgoing interface and stop processing
  - Extremely uncommon
- Look for an entry that is a network address matching `dst`'s network address, i.e. `dst & netmask == net-addr & netmask`. If found, send the packet to the indicated next hop/outgoing interface and stop processing
  - Usually, the network address already has the host bits set to zero, so masking `net-addr` again is slightly overkill
- Look for a special default entry - if found, send the packet to the indicated next hop (the default router) and stop processing
- Drop the packet and send an ICMP message back to the original sender of the datagram
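A rough sketch of this staged lookup, with a hypothetical table and ignoring longest-prefix matching (the first matching network entry wins):

```python
import ipaddress

# Hypothetical forwarding table: host routes, network routes, optional default
host_routes = {ipaddress.ip_address("203.0.113.7"): "eth2"}
net_routes = {ipaddress.ip_network("192.168.0.0/16"): "eth0",
              ipaddress.ip_network("10.0.0.0/8"): "eth1"}
default_route = "eth3"

def lookup(dst):
    dst = ipaddress.ip_address(dst)
    if dst in host_routes:                       # 1. full host match (rare)
        return host_routes[dst]
    for net, port in net_routes.items():         # 2. network match
        if dst in net:                            #    dst & netmask == net-addr
            return port
    if default_route is not None:                 # 3. default entry
        return default_route
    raise ValueError("no route: drop and send ICMP")  # 4. unreachable

print(lookup("10.20.30.40"))   # eth1
```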
In End-Hosts
Forwarding tables in end hosts generally contain two or more entries:
- The default route, typically configured via DHCP
- An entry for each network it is directly connected to - this way, it can communicate with other hosts on the network
In Routers
Most routers at the ‘border’ of the Internet (close to the customer) usually have a small forwarding table, often filled with other networks belonging to the same owner. For all other networks, they rely on the default router.
Core routers:
- Do not have a default router
- Are the default routers of other routers
- Know (almost) all of the Internet networks
Fragmentation and Reassembly
The link-layer technologies used by IP have different maximum packet sizes, called the maximum transmission unit. For several reasons, IP abstracts these concerns:
- For separation of concerns
- A packet could use different link-layer technologies in transit
- The route could change due to link failures/restores
Hence, IP offers its own maximum message length of 65515: (2^{16} - 1) - 20, where 20 is the minimum IP header size.
The overall process is as follows:
- The sender IP instance partitions the message into fragments
- Each fragment is transmitted individually as a full IP packet
- Header information specifies that this is a fragment, and stores the position of the fragment in the message
- Each fragment has a size no larger than the MTU of the outgoing link
- Hence, intermediate routers can fragment a full message or split fragments further, but never re-assemble them (as it is not guaranteed all packets will follow the same path)
- The destination IP instance buffers the received fragments, re-assembles the message, then delivers it to the higher layer
- When the first fragment arrives, a buffer large enough for the whole message is allocated and a timer is started
All fragment datagrams belonging to the same message have:
- A full IP header
- The same value in the `Identification` field (unique identifier set by the higher-layer protocol)
- A `TotalLength` field reflecting the size of the fragment
- A `FragmentOffset` field, specifying the offset (in units of 8 bytes) relative to the original, un-fragmented IP datagram
- All but the last fragment will have the `MF` (more-fragments) bit set
  - The last fragment does not have `MF` set. It instead has a non-zero `FragmentOffset` to differentiate it from an un-fragmented packet
The sender can set the DF (don’t fragment) bit in the IP header to forbid fragmentation by intermediate routers. If the outgoing link’s MTU is too small, the intermediate router may return an ICMP message.
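A small sketch (not the kernel's algorithm) of how fragment sizes and 8-byte offset units could be derived for a given MTU:

```python
IP_HEADER = 20

def fragment(payload_len, mtu):
    """Return (offset_in_8_byte_units, fragment_payload_len, more_fragments) tuples."""
    max_data = (mtu - IP_HEADER) // 8 * 8   # data per fragment must be a multiple of 8
    fragments, offset = [], 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = (offset + size) < payload_len
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments

# e.g. a 4000-byte payload over a link with a 1500-byte MTU
print(fragment(4000, 1500))  # [(0, 1480, True), (185, 1480, True), (370, 1040, False)]
```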
Address Resolution Protocol
IP datagrams are encapsulated into Ethernet frames, and Ethernet stations pick up a packet only if the destination MAC address matches its own. Hence, stations must know which MAC address a given IP address refers to.
ARP allows a host to map a given IP address to a MAC address. ARP is dynamic:
- The MAC address for a given IP address does not need to be statically configured
- Nodes can be moved or equipped with new network adapters without any re-configuration
This trades the downsides of static configuration for the complexity of running a separate protocol.
Basic Operation
- Two stations, A and B, are attached to the same Ethernet network
- Each has an IP and MAC address
- Each maintains an ARP cache
- Station A wishes to send an IP packet to station B’s IP address, but does not know B’s MAC address
- Station A broadcasts an ARP-request message containing:
- Its own IP and MAC address
- Station B’s IP address
- Hence, the packet is sent to the Ethernet broadcast address. The length/type field of the Ethernet frame has a value of `0x0806`
- Any other host recognizes that the IP address is not its own, and discards the packet
- If it cached every binding it ever heard, its ARP cache would get full quickly
- When Station B receives this, it:
- Stores a binding between A’s IP address and MAC address in its own cache
- Responds with a unicast ARP-reply packet including:
- Its own MAC and IP address
- A’s MAC and IP address
- A then stores the binding between B’s IP and MAC address in its ARP cache
ARP makes no retransmissions if a request is not answered.
If a sender wants to send an IP packet to a local destination, it first checks its ARP cache for a binding for the IP address. If there is none, it starts the address resolution procedure. Once this has finished, it encapsulates the IP packet in an Ethernet frame directed to the resolved MAC address.
Entries in the ARP cache are soft-state; they get discarded some fixed time (usually 20 minutes) after they are inserted into the cache. Some implementations restart the timer after using an entry.
This ensures that stale information gets flushed out (much more reliable than relying on the device of interest notifying everyone that its binding has changed).
Frame format
- Standard Ethernet header: `DstAddr`, `SrcAddr`, `FrameType`
- `HardType`: type of MAC addresses used
  - `0x0001` for Ethernet’s 48-bit addresses
- `ProtType`: identifier for the higher-layer protocol that requires the address resolution
  - `0x0800` for IPv4
- `HardSize`, `ProtSize`: size in bytes of the hardware and protocol addresses
  - 6 and 4 for Ethernet and IPv4
- `Op`: type of ARP packet; differentiates between an ARP-request and an ARP-reply
- Sender Ethernet/IP address
- Target Ethernet/IP address
ARP is a protocol working at a higher level than Ethernet, so although the destination and source MAC addresses already appear in the Ethernet header, they are included again in the ARP payload.
Internet Control Message Protocol
Allows routers or the destination host to inform the sender of error conditions such as when:
- No route to the destination
- Destination host is not reachable
- Fragmentation required, but `DF` set
ICMP messages are encapsulated into regular IP datagrams and, like IP, have no error control mechanisms.
ICMP messages are not required, and are often filtered out by firewalls even if they are sent. Thus, the sending host must not rely on ICMP messages.
Message Format
- `type`: 8-bit ICMP message type
- `code`: 8-bit ICMP sub-type
- `checksum`: 16-bit checksum covering the header and data (calculated with `checksum` set to zero)
- Data: depends on the `type`/`code` combination, but often includes the first few bytes of the offending IP datagram (including its header)
Some common type/code Combinations
| Type | Code | Meaning |
|---|---|---|
| 0 | 0 | Echo reply (ping) |
| 3 | 0 | Destination network unreachable |
| 3 | 1 | Destination host unreachable |
| 3 | 2 | Destination protocol unreachable |
| 3 | 3 | Destination port unreachable |
| 3 | 4 | Fragmentation required but DF set |
| 3 | 6 | Destination network unknown |
| 3 | 7 | Destination host unknown |
| 4 | 0 | Source-quench |
| 8 | 0 | Echo request (ping) |
| 11 | 0 | TTL expired in transit |
| 11 | 1 | Fragment reassembly time exceeded |
Destination Unreachable (type=3):
- `code=0`: router has no matching entry (or default entry) for the non-directly-connected destination address in its forwarding table
- `code=1`: reached the last router before the destination, but that router could not reach the directly connected host (e.g. no ARP response)
- `code=2`: reached the final destination, but the value of the `protocol` field is not implemented by the receiving host
- `code=3`: used with TCP/UDP when no socket is bound to the port number
Source-quench (type=4, code=0): if IP router drops packet due to congestion. The intention is to let the source host throttle itself, but the act of sending the packet adds more work to the router.
TTL expiration (type=11, code=0): if IP router drops packet due to TTL reaching zero. tracert uses this, incrementing the TTL each time it sends a packet.
Fragmentation reassembly timeout (type=11, code=1): if not all fragments of a message received within a timeout. This invites the higher-level protocol to re-transmit the message.
07. Introduction to Routing
Network Layer Overview
Names of data units:
- Application: message
- Transport: segment
- Network: datagram
- Link: frame
Responsibilities of layers:
- Network layer: responsible for finding a route from A to B
- Transport layer: responsible for transporting data from A to B reliably or easily
- Application layer: responsible for sharing data
Layering: each layer takes data from above, adds header information, then passes the new data unit to the layer below.
Possible services of a general network layer:
- Guaranteed delivery (with/without bounded delay)
- In-order delivery
- Guaranteed minimal bandwidth
- Guaranteed maximum jitter
- Jitter: variation in the gap between two consecutive messages
The Internet network layer, IPv4, guarantees none of these.
Routing Overview
Routing is a network-wide process that determines the end-to-end paths packets take from source to destination.
The network needs to work out the path from the sending to the receiving router automatically in a dynamic environment.
A router uses its routing table to forward a packet to some output link.
The router’s switch fabric - the physical architecture of the router - varies. Some implementations put the network interfaces/line cards close to the CPU to decrease processing times. The crossbar architecture is used to more quickly forward packets to outgoing interfaces.
Hierarchical Overview
There are inter-AS and intra-AS routing algorithms, where AS stands for an autonomous system.
Each ISP is an AS, and some corporations and universities are also ASs.
Each AS is a set of routers and networks managed by a single organization. The routers exchange information via a common routing protocol - the intra-AS routing protocol.
The only inter-AS routing protocol in use is BGP. RIP and OSPF are the two main intra-AS routing protocols. All three are load-insensitive.
There are three types of AS:
- Stub AS
- Has only one connection to another AS
- Like a leaf node
- Multi-homed AS
- Has more than one connection to other ASs, but does not allow traffic to pass through
- e.g. university network will not be doing forwarding
- Transit AS
- Has connections to more than one AS and allows traffic to pass through
- Run by ISPs
Forwarding vs Routing
Forwarding and routing are two independent processes.
Routing determines the path to take:
- Is not done packet-per-packet
- Is non-real time: up to two minutes of latency
- Requires communication between routers to jointly create forwarding tables
Forwarding transfers packets hop-by-hop:
- It determines which outgoing link to use on a packet-by-packet basis
- Is done in real time, possibly in hardware
The forwarding table:
- Maps a destination address to an interface
- Results from either the execution of the routing protocol or is static and preconfigured
- Is consulted for every packet
- Changes on relatively large timescales (e.g. topology or load changes)
Classification of Routing Algorithms
Routing algorithms solve a routing problem under idealized assumptions. Routing protocols embed a routing algorithm in a real network environment; they operate in a distributed setting and must work around failures.
Classifications
Static/dynamic
Static algorithms change very slowly over time, mostly due to human intervention.
Dynamic algorithms re-compute routes in response to topology or traffic changes. The computation may occur periodically or in response to changes.
Global/Decentralized
Global algorithms assume each node has global knowledge/state information on the network.
Decentralized algorithms assume no node has complete state information about the network and exchange information with their neighbors.
Load-sensitive/-insensitive
Load-sensitive algorithms assume link costs vary dynamically depending on the level of congestion in the link.
Internet routing protocols such as RIP, OSPF and BGP are load-insensitive and assign a static cost to each link.
08. Introduction to Routing - Link-State Routing
Modelling
Routers are modelled as nodes and links are modelled as edges. Edges have associated metrics:
- Cost of the link: delay, monetary costs, distance etc.
- Available resources: current capacity etc.
Dijkstra calculates the least-cost path.
Assumptions
Each router has a complete network picture: topology, link costs etc.
To do this, each router reliably floods information about its neighbors to every other router. All routers should have consistent information.
Flooding can sometimes cause broadcast storms; controlled flooding or spanning-tree broadcasts are used to prevent this.
Dijkstra’s Algorithm
After flooding, each router independently calculates the least-cost path from itself to every other router using Dijkstra’s algorithm and generates a forwarding table for every destination.
Definitions:
- u: the source node
- N': the set of nodes whose least-cost path is already known; initially N' = {u}. One node is added in each iteration
- c(x, y): cost of the edge between x and y
- D(v): current minimum cost of the path from u to node v. Initially:
  - D(v) = c(u, v) if v is adjacent to u
  - D(v) = infinity otherwise
  - D(u) = 0
- p(v): parent/predecessor node of v - the vertex before v along the shortest path from u to v

Until all nodes are in N':
- Find the node w not in N' with the smallest D(w)
- Add w to N'
- For all v adjacent to w and not in N':
  - D(v) = min(D(v), D(w) + c(w, v))
    - i.e. see if the cost D(w) plus the cost of the edge between w and v is less than the current cost D(v)
  - If D(v) changed, set p(v) = w
Running this algorithm, a forwarding table can be generated, mapping each destination to the next hop (the first node after u on the shortest path to that destination).
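A compact sketch of the computation using a min-heap (the example graph and its costs are made up):

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: {neighbour: cost}}. Returns (D, p): least costs and predecessors."""
    D = {source: 0}
    p = {}
    heap = [(0, source)]
    visited = set()                      # N': nodes whose least-cost path is known
    while heap:
        d, w = heapq.heappop(heap)
        if w in visited:
            continue
        visited.add(w)
        for v, cost in graph[w].items():
            if v not in visited and d + cost < D.get(v, float("inf")):
                D[v] = d + cost          # D(v) = min(D(v), D(w) + c(w, v))
                p[v] = w                 # predecessor along the shortest path
                heapq.heappush(heap, (D[v], v))
    return D, p

graph = {"u": {"v": 2, "w": 5}, "v": {"u": 2, "w": 1}, "w": {"u": 5, "v": 1}}
print(dijkstra(graph, "u"))   # D: u=0, v=2, w=3; p: v=u, w=v
```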
Complexity:
- If a min-heap is used, getting the least-cost node w not in N' is O(log n)
- For each iteration, D(v) needs to be updated for all of w's neighbors; across the whole run each edge is examined once, giving O((n + E) log n) overall
Oscillation with Link-State Routing
Since routers have a complete network picture, congestion-based link costs can cause oscillation: every router may simultaneously find a cheaper path through the same lightly loaded node and shift its traffic there, making that node congested; on the next computation the traffic all shifts away again, and the routes keep oscillating.
09. Introduction to Routing - Distance Vector Algorithm
A dynamic, decentralized and asynchronous routing algorithm.
Both RIP and BGP use the Bellman-Ford (distance-vector) algorithm.
Bellman-Ford Equation
Let d_x(y) be the cost of the least-cost path from node x to node y. Then d_x(y) = min over all neighbors v of x of { c(x, v) + d_v(y) }, where c(x, v) is the cost of the link from x to v; d_x(y) = 0 if x = y.
Distance-Vector Algorithm
The distance vector D_x = [D_x(y) : y in N] holds node x's estimate of its cost to every destination y in the network.
Each node x knows the cost c(x, v) to each of its neighbors v, maintains its own distance vector, and stores the most recently received distance vector of each neighbor.
In the algorithm, each node periodically sends its own distance vector estimates to its neighbors. When it receives a new estimate, it can update its own distance vector and, if it changes, update its own neighbors. The distance estimates D_x(y) will eventually converge to the actual least cost d_x(y).
The algorithm is iterative and asynchronous: each local iteration is called by either a local link cost change or a DV update message from its neighbor.
The algorithm is distributed: each node notifies its neighbors only when its DV changes, and each node only knows the next hop, not the entire path.
Hence, each node is in one of three states:
- Wait for change in local link cost, or message from a neighbor
- Recompute DV estimate
- Notify neighbors if required
Pseudo-code
distance_vector = {}            # own cost estimate to every destination y
neighbor_distance_vectors = {}  # most recently received DV from each neighbor

def init():
    for y in destinations:
        distance_vector[y] = cost(self, y)  # INFINITY if y is not a neighbor
    for w in neighbors:
        neighbor_distance_vectors[w] = {y: INFINITY for y in destinations}
        send(w, distance_vector)

while True:
    # Block until a link cost changes or a neighbor sends a new DV
    # (this updates cost()/neighbor_distance_vectors as a side effect)
    get_update_or_link_cost_change()
    change_occurred = False
    for y in destinations:
        new_cost = min(cost(self, v) + neighbor_distance_vectors[v][y] for v in neighbors)
        if new_cost != distance_vector[y]:
            distance_vector[y] = new_cost
            change_occurred = True
    if change_occurred:
        for w in neighbors:
            send(w, distance_vector)
Count-to-infinity problem
Good news travels fast: if a link cost is lowered, this change propagates through the network quickly.
However, there is an issue when a link cost increases. For example, consider three nodes x, y and z with link costs c(x, y) = 1, c(y, z) = 2 and c(x, z) = 20.
Initial state (each row is that node's distance vector; bracketed entries show the next hop where the path is not the direct link):
| | x | y | z |
|---|---|---|---|
| x | 0 | 1 | 3 (y) |
| y | 1 | 0 | 2 |
| z | 3 (y) | 2 | 0 |
At t1, the cost of link (x, y) increases to 40. y detects this and recomputes D_y(x) = min(c(y, x), c(y, z) + D_z(x)) = min(40, 2 + 3) = 5, now routing to x via z:
| | x | y | z |
|---|---|---|---|
| x | 0 | 1 | 3 (y) |
| y | 5 (z) | 0 | 2 |
| z | 3 (y) | 2 | 0 |
At t2, z receives y's updated vector and recomputes D_z(x) = min(c(z, x), c(z, y) + D_y(x)) = min(20, 2 + 5) = 7, still routing via y:
| | x | y | z |
|---|---|---|---|
| x | 0 | 1 | 3 (y) |
| y | 5 (z) | 0 | 2 |
| z | 7 (y) | 2 | 0 |
Thus, at each iteration, y and z take turns increasing their estimates by 2, each still routing to x through the other. This continues until z's cost via y exceeds its direct cost of 20, at which point the estimates finally converge:
| | x | y | z |
|---|---|---|---|
| x | 0 | 1 | 3 (y) |
| y | 22 (z) | 0 | 2 |
| z | 20 | 2 | 0 |
Poisoned reverse
If a node z routes through its neighbor y to reach a destination x, then z advertises to y that its distance to x is infinite (it keeps telling the truth to its other neighbors).
Before the link cost change in the example above, x and z each route to the other via y, so each advertises an infinite distance for that destination when talking to y:
| | x | y | z |
|---|---|---|---|
| x | 0 | 1 | 3 (y) |
| y | 1 | 0 | 2 |
| z | 3 (y) | 2 | 0 |
At t1, the cost of link (x, y) increases to 40. Because z has been advertising D_z(x) = infinity to y, y recomputes D_y(x) = min(40, 2 + infinity) = 40 and routes to x directly:
| | x | y | z |
|---|---|---|---|
| x | 0 | 1 | 3 (y) |
| y | 40 | 0 | 2 |
| z | 3 (y) | 2 | 0 |
At t2, z receives y's update and recomputes D_z(x) = min(20, 2 + 40) = 20, routing directly (so it stops poisoning its advertisement to y):
| | x | y | z |
|---|---|---|---|
| x | 0 | 1 | 3 (y) |
| y | 40 | 0 | 2 |
| z | 20 | 2 | 0 |
At t3, y receives z's update and recomputes D_y(x) = min(40, 2 + 20) = 22, routing via z:
| | x | y | z |
|---|---|---|---|
| x | 0 | 1 | 3 (y) |
| y | 22 (z) | 0 | 2 |
| z | 20 | 2 | 0 |
Hence, the lie makes it converge faster.
However, poisoned reverse can make the situation worse and cause routing loops. Example:
Consider only entries only to
So
At
Hence,
Now
Hence,
Now,
Hence,
Hence,
The computed path from
To fix this, RIP limits the maximum cost of a path to 15.
Comparison of Link-State and Distance-Vector Algorithms
Message complexity:
- LS: with n nodes and E links, O(nE) messages are sent
- DV: only exchanging information between neighbors
  - Convergence time varies
Speed of convergence:
- LS: O(n^2) algorithm, requiring O(nE) messages
  - May have oscillations
- DV: convergence time varies
  - Routing loops
  - Count-to-infinity problem
Robustness - behavior when router malfunctions:
- LS: node advertises incorrect link cost
- Each node computes only its own table
- DV: node advertises incorrect path cost
- Each node’s table used by others; errors propagate through the network
10. NAT, IPv6, RIP, OSPF, BGP
NAT
Motivations:
- ISP only needs to allocate one IP address per customer
- Can change addresses of devices in the local network without the outside world needing to know
- Can change ISP without changing local addresses
- Devices in local network not explicitly addressable by outside world
Each device has an internal IP address, but packets exiting the network have their source address transformed into a single public IP address.
Question: how does the router know which host to forward returning traffic if all devices have the same public IP address?
The router keeps a table mapping an (external IP, external port) tuple to an (internal IP, internal port) tuple. When a device sends a packet, the router chooses an unused external port for the connection, and when the response packet arrives, it maps the destination port back to the device's internal IP and port.
The port number is a 16 bit field, so it theoretically allows over 60,000 simultaneous connections.
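A toy sketch of that translation table (the public IP, port choice and helper names are hypothetical):

```python
import random

PUBLIC_IP = "203.0.113.5"     # hypothetical public address of the NAT router
nat_table = {}                # (public_ip, public_port) -> (private_ip, private_port)

def outbound(src_ip, src_port):
    """Rewrite the source of an outgoing packet and remember the mapping."""
    public_port = random.randint(1024, 65535)        # pick an unused external port
    while (PUBLIC_IP, public_port) in nat_table:
        public_port = random.randint(1024, 65535)
    nat_table[(PUBLIC_IP, public_port)] = (src_ip, src_port)
    return PUBLIC_IP, public_port

def inbound(dst_ip, dst_port):
    """Map a returning packet back to the internal host, if a mapping exists."""
    return nat_table.get((dst_ip, dst_port))          # None -> drop the packet

pub = outbound("192.168.1.23", 50512)
print(inbound(*pub))          # ('192.168.1.23', 50512)
```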
Controversy:
- Port numbers address processes, not hosts
- Port numbers are layer 4; routers should only process up to layer 3
- Violates the end-to-end argument; hosts no longer talk directly to each other
- IPv6 is a better solution
IPv6
Motivations:
- IPv4 address space getting full
- Header format helps speed up processing/forwarding
- Facilitates QoS
Datagram Format
IPv6 is simpler; no fragmentation, no checksums, fixed length (40 bytes) header:
- Version (4), value 6
- Traffic class (8): facilitates QoS
- Flow label (20): packet identifier
- Payload length (16): length of payload in bytes
- Next header (8): type of next (upper-layer) header (e.g. TCP, UDP)
- Hop limit (8): TTL
- Source address (128)
- Destination address (128)
- Payload
Transition
Old devices only support IPv4, and some routers cannot be upgraded. Hence, ‘flag days’, where the world switches protocol on some specified day, will not work.
With a dual-stack approach, all IPv6 nodes also have complete IPv4 implementations, allowing packets to be translated in both directions as they are routed. This loses IPv6- or IPv4-specific fields.
Another approach is tunnelling: stuff the whole IPv6 datagram into the payload of an IPv4 datagram. If the packet is small enough and DF is set, an IPv6 router at the far end of the tunnel can then parse the payload and forward it as an IPv6 packet.
Due to DHCP, CIDRised addresses and NAT partially solving the IP address shortage problem (in the short term), IPv6 adoption has been slow.
Routing in the Internet
Flat networks don’t scale: if you have millions of hosts, storing routing information for each requires large amounts of memory. If link-state routing is used, broadcasting LS updates takes too much bandwidth, and if distance-vector routing is used, it will never converge.
Hierarchical Routing
Hierarchical routing aggregates routers into autonomous regions, where routers belong to the same organization and run the same intra-AS routing protocol.
Gateway routers have a direct link to a router in another AS.
The forwarding table is configured by both intra- and inter-AS routing algorithms:
- The intra-AS algorithm sets entries for internal destinations
- The inter-AS algorithm (together with the intra-AS algorithm) sets entries for external destinations
Routing between ASes
Case 1: a single gateway router to only one other AS:
- Just forward all external traffic to that gateway router
Case 2: two or more physical links to other ASes (typically a transit AS):
- Learn from the inter-AS protocol that subnet X is reachable via multiple gateways
- Use routing information from intra-AS protocol to determine costs of path to each of the gateways
- Hot potato routing; choose the closest gateway
- Determine interface I that connects to first router on the path to the nearest gateway; insert
(X, I)into the forwarding table
RIP - Routing Information Protocol
Uses the distance vector algorithm.
Distance metric:
- Number of hops, max 15
- Number of subnets traversed along the shortest path from the source router to the destination subnet, including the destination subnet (hence it is always at least 1)
To determine subnets, detach each interface from its host/router to create islands of isolated networks; these isolated networks, whose endpoints are the interfaces, are called subnets.
RIP advertisements:
- Distance vectors exchanged among neighbors every 30 seconds via Response Message
- They are also called advertisements, and are sent in UDP packets (layer-4)
- An application-level process (daemon) called `route-d` manages the RIP routing tables
  - UDP is layer 4, so it cannot be managed at the router (network) level
- Each advertisement contains a list of up to 25 destination nets within the AS
Example:
w -- A -- x -- D -- y -- B -- ... -- z
|
C
The routing table in D would have:
| Dest. net | Next router | Num. hops |
|---|---|---|
| w | A | 2 |
| y | B | 2 |
| z | B | 7 |
| x | NA | 1 |
If A then advertises its own routing table where the cost to z (through C) is 4, D would use the Bellman-Ford equation and update its routing table to route requests to z through A with a cost of 5.
Link Failure and Recovery
After 180 seconds of no advertisements, the neighbor/link is declared dead; routes going through that neighbor are invalidated and new advertisements are sent to neighbors. Through this, the link failure information propagates through the entire network.
Differences between RIP and the DV Algorithm
- Can advertise a maximum of 25 subnets
- The link cost is always 1 in RIP
- The ‘nodes’ of the DV algorithm correspond to subnets
- Instead of distance vectors, it sends the routing table
- Routers do not store their neighbors’ DVs
- Routers exchange advertisements at fixed intervals, not when there are updates
- Does not have the count-to-infinity problem - has the count-to-16 problem
OSPF - Open Shortest Path First
- Uses the link state algorithm
- Link-state advertisements are disseminated to the entire AS via broadcast messages (flooding)
- OSPF messages sent directly over IP (layer 3), not TCP/UDP
- Typically deployed in upper-tier ISPs
Advanced features not in RIP:
- Security: all OSPF messages are authenticated
- Multiple same-cost paths allowed
- Link cost can be set by admins (traffic engineering)
- Unicast and multicast support (one-to-one and one-to-group)
- Two-level hierarchy
OSPF ASes are configured into areas. Within an area:
- Each area runs its own OSPF algorithm
- Link-state advertisements do not exit the area
- Each node has detailed area topology
- Area border routers summarize distance information for the nets in their own area and advertise it to other area border routers
There is one backbone area which links areas together. In the backbone:
- Routers that run OSPF routing within the backbone area are called backbone routers
- Boundary routers connect to other AS’s, typically through BGP
BGP - Border Gateway Protocol
The de facto standard. Provides each AS with the ability to:
- Obtain subnet reachability information
- Propagate the reachability information
- Determine ‘good’ routes to subnets
Pairs of routers (BGP peers) exchange routing information over TCP connections (called BGP sessions).
eBGP sessions allow routers in two different AS’s to exchange reachability information - when it advertises a prefix, it promises that it can forward any datagrams destined to that prefix. The receiver can then create an entry for that prefix in its forwarding table and re-advertise the same information to others via iBGP (between routers in the same AS) or eBGP sessions.
The advertisement is a combination of the prefix and attributes. Two important attributes are:
- `AS_PATH`: not updated in iBGP connections. Contains a list of the ASes through which the advertisement has passed - reject the advertisement if this AS is already in the path
- `NEXT_HOP`: not updated in iBGP connections. Advertisements flow from the destination outwards, so it contains the address of the boundary router for the next-hop AS
- An import policy is used to accept or decline adverts
Route selection
Routers may learn about more than one route to a given prefix, so it uses elimination rules to pick one:
- Local preference value attribute (policy decision)
- Shortest AS-PATH (DV algorithm)
- Closest NEXT-HOP router (hot potato routing)
- Additional criteria
Routing policy can affect routing. For example, a keep-silent policy could occur if X is a dual-homed (e.g. for redundancy) customer network connected to provider networks B and C; it will not want to route traffic between B and C, and hence will not advertise to B a route to C, or vice versa.
11. Reliable Data Transfer - Error Detection and Correction
Transmission errors can occur due to factors such as:
- Thermal noise
- Weak signal strength, interference, jamming, jitter etc.
- Router/switch/station crashing
- Packets being:
- routed in the wrong direction
- dropped due to congestion
- dropped due to insufficient resources
To correct:
- Bit errors due to signal attenuation/noise, use error detection and correction
- Buffer overflows due to speed mismatch, too much traffic etc., use flow/congestion control
- Lost packets due to buffer overflows, lack of resources, use acknowledgement and retransmission
- Out-of-order packets due to retransmissions/packets using different routes, use acknowledgement and retransmission
Error detection
Use redundancy bits, sometimes called error detection and correction bits.
These bits can be used to check if an error has been detected (not that it has occurred).
Parity Check
This is the simplest way of detecting an error; append a single parity bit to the end of a block of data.
- Even parity: number of 1’s in the data, plus parity bit is even
- Odd parity: number of 1’s in the data, plus parity bit is odd
2-D Parity Check
Divide bits into a matrix and use parity bits for each column and row, plus a parity bit for the parity bits (number of 1’s for the row and column parity bits).
Hence, it is possible to determine the bit which was flipped and hence correct single-bit errors.
This is called forward error correction - the receiver can detect and correct errors without needing to transmit a message to the sender.
Internet Checksum
Used for the IPv4 header checksum and the UDP checksum (optional in IPv4, mandatory in IPv6). The UDP checksum covers part of the IP header (source/destination addresses plus other fields), the UDP header and the data.
Segment contents are treated as a sequence of 16-bit integers.
The checksum is the wraparound sum of the integers (ones complement):
- Add the two integers
- If there is overflow, add 1 to the result (wraparound)
Once all integers have been summed, invert the bits.
The receiver will sum all segments, plus the checksum, and check that all the bits are 1.
This scheme is considerably less effective than CRC, but also very simple to implement.
Why does IP (not v6), UDP and TCP even need checksums?
- Some link-layers do not do error correction
- Bit errors can occur anywhere - e.g. router memory
- As a final end-to-end check
- Certain functionality must be implemented on an end-to-end basis
def ones_complement_addition(shorts):
    # Wraparound (ones'-complement) sum of a sequence of 16-bit integers
    result = 0
    for short in shorts:
        result += short
        if result > 0xFFFF:
            result -= 0xFFFF  # drop the carry out of bit 16 and add it back in
    return result

def checksum(shorts):
    # Invert the bits of the wraparound sum
    return 0xFFFF - ones_complement_addition(shorts)

def check_checksum(shorts, received_checksum):
    # Receiver check: data plus checksum should sum to all 1s (0xFFFF)
    return 0xFFFF == ones_complement_addition(shorts) + received_checksum
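For example, applying the functions above to two hypothetical 16-bit words:

```python
words = [0xF234, 0xF345]                   # two example 16-bit integers
total = ones_complement_addition(words)    # 0xF234 + 0xF345 = 0x1E579 -> wraps to 0xE57A
chk = checksum(words)                      # 0xFFFF - 0xE57A = 0x1A85
print(hex(total), hex(chk))
print(check_checksum(words, chk))          # sum + checksum == 0xFFFF -> True
```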
Cyclic Redundancy Check (CRC)
Assumptions:
- d data bits
- The sender and receiver agree on an r+1 pattern called a generator, where the left-most bit is 1
The sender chooses r additional bits to append to the data such that the d+r bits are divisible by the generator using modulo-2 arithmetic.
The receiver then divides the d+r bits by the generator; if there is no remainder, no error is detected.
- D: d-bit data block
- F: r additional bits (frame check sequence)
- T: (d+r)-bit frame to be transmitted
- G: predetermined pattern of r+1 bits (the generator)
Modulo-2 arithmetic:
- Addition/subtraction: exclusive OR (XOR)
- Multiplication/division: same as base-2 arithmetic, but no carries or borrows
The frame is formed by shifting D left by r bits and placing F in the low-order bits: T = D * 2^r + F (where the addition is XOR). F is chosen as the remainder of D * 2^r divided by G; adding this remainder back (mod 2) cancels it out.
This makes T divisible by the divisor G. Hence, the receiver detects no error exactly when the received frame leaves no remainder when divided by G.
Example
G=11 0101, D=10 1000 1101, r=5
Initially, set the F bits to zero:
Divide T = 101 0001 1010 0000 by G = 11 0101:
10 1000 XOR 11 0101 = 01 1101. The divisor and the dividend window have an equal number of bits, so the most significant bit of the quotient is 1.
Get rid of the leading 0, then pull down the next bit of T:
11 1011 XOR 11 0101 = 00 1110. The second MSB of the quotient is 1 again.
Get rid of the leading 0’s, then pull down the next bit of T:
1 1101 XOR 11 0101; this will not fit, so add a 0 as the next MSB of the quotient. Pull down the next bit of T:
11 1010 XOR 11 0101 = 00 1111
…
So the result of the division is 11 0101 0110 with a remainder of 1110. Thus, F=01110 and the transmitted frame is T=101 0001 1010 1110.
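A short mod-2 division sketch that reproduces the example above:

```python
def crc_remainder(data_bits, generator_bits):
    """Append len(generator)-1 zero bits to the data and return the mod-2 remainder."""
    r = len(generator_bits) - 1
    bits = [int(b) for b in data_bits] + [0] * r
    gen = [int(b) for b in generator_bits]
    for i in range(len(data_bits)):            # long division, MSB first
        if bits[i] == 1:                       # only XOR when the leading bit is 1
            for j in range(len(gen)):
                bits[i + j] ^= gen[j]
    return "".join(str(b) for b in bits[-r:])  # the last r bits are the remainder F

print(crc_remainder("1010001101", "110101"))   # '01110', matching the example
```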
Other notes
- It is called a cyclic redundancy check because a circular shift of a codeword is still divisible by `G`
- CRC can detect consecutive (burst) bit errors of `r` bits or fewer
- CRCs are also called polynomial codes: the bit string is treated as a polynomial in X with 0/1 coefficients, and the division is carried out on these polynomials using mod-2 arithmetic
- Four versions of `G` are widely used, e.g. CRC-32 is the sum of X to the powers 32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0
Forward Error Correction
Useful on wireless applications with propagation delay and high bit error rates.
Messages are divided into blocks of k-bits called datawords; the FEC encoder generates n-bit codewords.
Hamming distance: the number of bit positions in which two codewords of equal length differ.
Error correction can be done by finding the minimum distance from the received codeword to any valid codeword (and thus dataword). This fails when it is an equal distance away from multiple valid codewords.
Useful properties for the scheme:
- Maximize the overall hamming distances between any two valid codewords
- Ensure that no codeword is an equal distance away from two codewords
k-error correcting: minimum hamming distance between two valid codewords is at least 2k + 1.
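A minimal sketch of minimum-distance decoding over a made-up 2-bit-to-5-bit codebook (its minimum Hamming distance is 3, so it is 1-error correcting):

```python
def hamming(a, b):
    """Number of bit positions in which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical FEC code: 2-bit datawords mapped to 5-bit codewords
codebook = {"00": "00000", "01": "01011", "10": "10101", "11": "11110"}

def decode(received):
    # Pick the valid codeword at minimum Hamming distance from the received word
    best = min(codebook.values(), key=lambda c: hamming(c, received))
    return [d for d, c in codebook.items() if c == best][0]

print(hamming("00000", "01011"))  # 3
print(decode("01010"))            # '01' - one bit flipped, still corrected
```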
12. Reliable Data Transfer - ARQ Protocols
Lost packets and out of order packets: acknowledgement and retransmission required.
Reliable data transfer protocols based on retransmission are known as Automatic Repeat reQuest (ARQ) Protocols. These include:
- Alternating Bit Protocol
- Go-Back-N
- Selective Repeat (Selective Reject)
Reliable Data Transfer (RDT) Protocols
RDT 1.0: Perfectly-Reliable
A perfectly reliable channel with:
- No bit errors
- No buffer overflows
- No out-of-order packets
- No packet loss
RDT 2.0: Bit Errors
All packets will arrive at the receivers, but the packet may be corrupted. Checksumming/CRC required for error detection.
Once the sender transmits a packet, it waits for a NAK/ACK from the receiver:
- If NAK, retransmit the packet
- If ACK, send the next packet once it is received from the higher layer
This is known as a stop-and-wait protocol as the sender must wait for the NAK/ACK.
RDT 2.1: Handles Corrupted ACK/NACK
In RDT2.0, if the response is corrupted, the sender does not know if the receiver got a corrupted packet.
In this case, the sender will retransmit the packet. As the receiver would not know whether this is a new transmission or a retransmission, RDT 2.1 adds sequence numbers.
As there is a response after every single packet, only a single bit is required for the sequence number. If it is a retransmission, it will use the same value but if it is a transmission, it will flip the bit.
RDT 2.2: NAK-Free Protocol
The receiver only sends ACK messages, but it also sends a sequence number along with it.
The ACK sequence number is the sequence number of the last packet that was received successfully.
RDT 3.0: Lossy Channel with Bit Errors
Packets can get lost and bits can get flipped.
If ACK does not arrive after a timeout, the sender should retransmit the packet. Hence, the sender retransmits on both packet/ACK loss or delay.
If the timeout is too long, performance decreases, but if it is too short, duplicate data packets get sent.
This is also known as the alternating-bit protocol.
There are four cases:
- No loss
- Lost packet
- Lost ACK
- Premature timeout (ACK received after timeout)
Pipelined Protocols
RDT 3.0 is a stop-and-wait-protocol, so the sender spends most of its time waiting rather than transmitting. If RTT is the round trip time, L is the packet size and R is the transmission rate, then the sender utilization is U_sender = (L/R) / (RTT + L/R) (add L/R on denominator as the entire packet must be received before the ACK can be sent).
Pipelining allows the sender to send multiple packets without waiting for an ACK to arrive. This requires the sequence number to occupy more than one bit, and forces the sender and receiver to buffer more than one packet. There are two basic approaches:
Go-Back-N Protocol
The sender can send multiple packets but can have no more than N unacknowledged packets in the pipeline.
These protocols are sometimes called sliding-window protocols: there is a window of size N that must contain all the unacknowledged packets.
Packets to the left of the window are acknowledged and packets to the right of it are not yet sent. Inside the window are unacknowledged packets and packets that can be sent immediately.
The window size is related to flow and congestion control.
The protocol uses cumulative acknowledgement: an ACK with sequence number n acknowledges all packets with sequence numbers <= n. If any unexpected packets arrive (e.g. out-of-order), they are discarded and an ACK with the most recently received valid packet is sent.
On timeout, all packets in the window are resent.
Performance problem: if the transmission rate is high, there will be a lot of unacknowledged packets, so if the first packet is corrupted, all the following packets will be discarded.
Sender events:
- The sender has data to send or it receives an ACK:
- If there are less than N unacknowledged packets, send the next packet (and slide the window right)
- It reaches a timeout:
- Resend all packets in the window
Receiver events:
- An invalid or unexpected (e.g. sequence number already received) packet arrives:
- Send an ACK for the most recently received valid packet
- A good packet arrives:
- Move the window right and send an ACK for that packet
In the quiz:
- Base: first sequence number in the window
- Next sequence number: sequence number for next packet to send
- Usable sequence numbers: sequence numbers for frames that can be sent immediately
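A sketch of the sender-side bookkeeping in those terms (transmission, ACK arrival and timers are abstracted into plain function calls):

```python
N = 4                      # window size
base = 0                   # first unacknowledged sequence number
next_seq_num = 0           # sequence number of the next packet to send

def can_send():
    return next_seq_num < base + N           # fewer than N unacknowledged packets

def send_next():
    global next_seq_num
    if can_send():
        print(f"send packet {next_seq_num}")
        next_seq_num += 1

def on_ack(n):
    global base
    base = max(base, n + 1)                  # cumulative ACK: everything <= n is acknowledged

def on_timeout():
    for seq in range(base, next_seq_num):    # resend the whole window
        print(f"resend packet {seq}")

for _ in range(6):      # tries to send 6 packets; only 4 fit in the window
    send_next()
on_ack(1)               # packets 0 and 1 acknowledged; window slides right
send_next()             # now packet 4 can be sent
```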
Selective Repeat/Reject
The receiver individually acknowledges all correctly received packets, buffering packets as needed for in-order delivery to the higher layer.
The sender only resends the oldest packet for which the ACK is not received (timeout).
The receiver’s window can have:
- Expected packets, not yet received
- Buffered packets that arrived out of order
- Usable sequence numbers, not yet received
If a packet is re-received, this means the ACK for the packet was lost, and so the receiver should send an ACK for that packet again. If the packet was near the base of the window before the receiver sent the original ACK, this could mean that the receiver window is now ahead of the sender window.
The worst case scenario for window size is:
- All but the first packet in the window arrive and their ACKs reach the sender; the receiver buffers them
- The first packet is resent after a timeout and arrives, but its ACK is lost
- The receiver delivers the whole window and its window moves forward by N; the sender's window does not move
- Sender window is [1, N]
- Receiver window is [N + 1, 2N]
- The sender times out and sends the first packet again
To prevent this retransmission from being interpreted as a new packet, the sender's base sequence number must not fall inside the receiver window. Hence, the sequence number space must be at least twice the window size.
Hidden assumption: the packets will NOT be reordered within the channel between the sender and receiver - this is reasonable for a single link, but could be broken in a multi-hop network.
If this assumption is broken, old copies of a packet with a sequence number not in either the sender's or receiver's window can arrive. One solution is to give every packet a maximum lifetime so that it cannot stay in the network for very long - ideally, the sequence space is large enough that it is impossible to cycle through all the sequence numbers within this time. Note that the receiver will ignore these packets regardless.
13. Transport Layer Protocol - Reliable Data Transfer
TCP Segments
TCP will divide each message into segments - the sequence number for each segment is the index of the first data byte in the segment. In practice, a random sequence number is used for the initial segment’s sequence number.
The receiver’s ACK message contains the sequence number of the next byte it expects: the received segment’s sequence number plus the number of data bytes.
TCP uses a single retransmission timer.
Simplified TCP sender (without refinements below):
next_sequence_number = initial_sequence_number
send_base = initial_sequence_number
timer = None  # single retransmission timer

while True:
    if data_received_from_upper_layer:
        segment = create_tcp_segment(next_sequence_number)
        if timer is None:  # no unacknowledged segments in the window yet
            timer = start_timer()
        send_packet(segment)
        next_sequence_number += len(segment.data)
    elif timer_expired:
        # re-send only the segment with the oldest (smallest) sequence number
        segment = unacknowledged_packets_in_window()[0]
        send_packet(segment)
        timer = start_timer()
    elif ack_received:
        ack = get_sequence_number()
        if ack > send_base:
            send_base = ack  # cumulative ACK: all bytes before ack have arrived
            if len(unacknowledged_packets_in_window()) != 0:
                timer = start_timer()  # restart for the oldest remaining segment
            else:
                timer = None  # nothing outstanding: stop the timer
Some refinements:
- When the timer expires, double the timeout interval: `timeout_interval *= 2`
  - The expiration may be an indication of congestion
  - Typical initial value is one second
- When the timer is restarted, reset the timeout interval to its initial value
- Timer restarts occur when an ACK arrives or data is received from the upper layer AND there are currently no unacknowledged segments
- Fast retransmit: If the sender receives three ACKs with the same segment number, retransmit it before the timeout
- ACKs coming back may also be a signal that the network is not too congested
Events on the receiver side:
- In order segment with expected sequence number arrives:
- Wait 500 ms; if no segment comes within this time, send an ACK
- Above, but there is already another segment with pending ACK:
- Immediately send an ACK, acknowledging the arrival of both packets
- Out-of-order segment with higher-than-expected sequence number:
- Immediately send a duplicate ACK with the sequence number of the next expected byte
- Arrival of segment that partially/completely fills a gap:
- If the segment fills in the start of the gap, immediately send an ACK
TCP is a mixture of the go-back-N and selective repeat strategies:
- It uses cumulative ACK, similar to GBN
- Out-of-order segments are often buffered, similar to SR
- When there is a timeout, the sender retransmits only the smallest unacknowledged packet, unlike GBN
Setting the Timeout Value
The timeout value should be longer than the round-trip time, so the first step is to find this value.
The SampleRTT is the time from segment transmission to ACK, with the condition that retransmitted segments are never sampled - the ACK for the first transmission could arrive after the sender retransmits, giving a wrong value.
The RTT varies over time, so several samples are required. They are combined using an exponentially weighted moving average, which puts more weight on recent samples:
EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT, with alpha typically 0.125
An estimate of the variability is also required:
DevRTT = (1 - beta) * DevRTT + beta * |SampleRTT - EstimatedRTT|, with beta typically 0.25
With these two values, the timeout interval can be calculated:
TimeoutInterval = EstimatedRTT + 4 * DevRTT
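A sketch of how these estimates might be maintained in code (standard coefficients; the first-sample initialization follows common practice):

```python
ALPHA, BETA = 0.125, 0.25

estimated_rtt = None
dev_rtt = 0.0

def on_rtt_sample(sample_rtt):
    """Update the EWMA estimate and variability, then return the timeout interval."""
    global estimated_rtt, dev_rtt
    if estimated_rtt is None:                 # first sample initialises the estimate
        estimated_rtt, dev_rtt = sample_rtt, sample_rtt / 2
    else:
        dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
        estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    return estimated_rtt + 4 * dev_rtt        # TimeoutInterval

for s in (0.100, 0.120, 0.300):               # RTT samples in seconds
    print(round(on_rtt_sample(s), 4))
```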
14. Transport Layer Protocols - Flow and Congestion Control
TCP creates a reliable data transfer service on top of IP using pipelined segments, cumulative ACKs, retransmission timers etc.
Setting the window size
The window size is simply the difference between the last byte sent and last byte acknowledged. TCP simply limits this to LastByteSent - LastByteACKed <= min(CongestionWindow, RecvWindow), where the former is determined by the network congestion control algorithm and the latter by the TCP flow control algorithm.
Flow Control
Flow control is a speed-matching service - it matches the rate at which the sender writes against the rate at which the receiver can read.
The sender and receiver allocate buffers; the receiver simply sends the amount of spare room in its buffer (TCP is full-duplex):
RecvWindow = RecvBuffer - (LastByteReceived - LastByteRead)
This is written to the 16-bit ‘Receive Window’ section in the TCP header.
Extreme Case
If the receive window is 0 bytes (i.e. buffer full as the higher-level application has not read the buffer contents), the sender will stop sending data.
This will mean that the receiver can no longer send acknowledgement packets back with updated window sizes. Hence, when the RecvWindow is 0, the sender will periodically send 1 byte of data.
Congestion Control
Packet retransmission treats the symptom of packet loss, not the cause, which is typically overflowing router buffers. In fact, retransmissions can make the problem worse.
Congestion control is not so much a service provided to the application, but to the network as a whole.
There are two broad approaches to congestion control. The first is network-assisted congestion control, where routers provide feedback to end-systems by indicating an explicit rate the sender should send at and/or a single bit flag indicating congestion. Today’s systems try to make routers as simple as possible and so do not use this approach.
Instead, end-to-end congestion control is used, where congestion is inferred from end systems observing loss and delay.
Congestion is caused by too many sources sending too much data too fast for the network to handle. This manifests itself through lost packets (buffer overflows) and long delays (queueing).
Congestion control attempts to ensure everyone enjoys a share of resources without causing network congestion. However, there is no predefined share for everyone - it is dynamic.
Assuming that RecvWindow > CongestionWindow (i.e. LastByteSent - LastByteACKed <= CongestionWindow) and there is negligible transmission delay (the time between sending the first and last byte in a segment is negligible), within one RTT the sender sends at most CongestionWindow bytes.
Hence, the rate of transmission is approximately CongestionWindow / RTT - CongestionWindow can be adjusted according to network congestion.
The arrival of ACKs is taken as an indication of the network being congestion-free: if ACKs (for previously unacknowledged segments) arrive quickly, the congestion window should be increased quickly (and slowly if ACKs arrive slowly).
After a loss event - a timeout or 3 duplicate ACKs, the congestion window should decrease.
TCP Slow Start
Begin the connection with CongestionWindow = 1 MSS (maximum segment size). This means 1 segment is sent per RTT.
This may be much smaller than the available bandwidth. Hence, the rate should be increased exponentially until the first loss event: double the congestion window every RTT - this is done by incrementing CongestionWindow by 1 MSS for every ACK.
After the first loss event, congestion control moves on to AIMD:
TCP AIMD (Additive Increase, Multiplicative Decrease)
Additive increase (AI): increase CongestionWindow by 1 MSS every RTT in the absence of loss events. This can be done by increasing CongestionWindow by MSS * (MSS / CongestionWindow) bytes each time an ACK arrives.
Multiplicative decrease (MD): halve CongestionWindow after every loss event.
This leads to a saw-tooth pattern in the congestion window size.
Timeout Rules
After three duplicate ACKs arrive, the CongestionWindow should be halved and then the window should grow linearly (AIMD).
After a timeout event, CongestionWindow should be set to 1 MSS and grow exponentially (slow start) until it reaches a threshold - half of CongestionWindow from before the timeout.
Note that the duplicate ACK behavior occurs in TCP Reno; TCP Tahoe’s (Reno’s predecessor) behavior is the same as a timeout event.
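A rough sketch of the Reno behaviour described above, counting the window in MSS units (no actual transmission):

```python
MSS = 1

cwnd = 1 * MSS                # congestion window: starts in slow start
ssthresh = 64 * MSS           # slow-start threshold

def on_ack():
    global cwnd
    if cwnd < ssthresh:
        cwnd += MSS                    # slow start: +1 MSS per ACK (doubles every RTT)
    else:
        cwnd += MSS * MSS / cwnd       # additive increase: about +1 MSS per RTT

def on_triple_duplicate_ack():         # Reno behaviour
    global cwnd, ssthresh
    ssthresh = cwnd / 2
    cwnd = ssthresh                    # multiplicative decrease, then grow linearly

def on_timeout():
    global cwnd, ssthresh
    ssthresh = cwnd / 2
    cwnd = 1 * MSS                     # back to slow start up to the new threshold

for _ in range(10):
    on_ack()
print(cwnd)                            # 11: still in slow start, +1 MSS per ACK
```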
15. Transport-Layer Services - TCP and UDP
Transport protocols provide logical communication between application processes running on different hosts.
The sender side breaks the application messages into segments and passes it to the network layer, while the receiver reassembles segments into messages and passes it to the application layer.
Multiple transport layer protocols are available, the most common being TCP and UDP.
The internet transport layer extends IP’s host-to-host delivery to process-to-process delivery (hence requiring multiplexing and demultiplexing), with TCP specifically adding reliable delivery, flow control and congestion control.
Multiplexing and Demultiplexing
Demultiplexing: the host receives IP datagrams; each has source/destination IP addresses and carries a single transport-layer segment. Inside each segment is the source/destination port number.
Connectionless-demultiplexing
With connectionless-demultiplexing (e.g. UDP), a two-tuple of the destination IP and port number can be used to identify the socket. Packets with different source IP and/or port numbers will be directed to the same socket.
Connection-oriented-demultiplexing
TCP sockets are identified by a 4-tuple: all four of the source and destination IP addresses and port numbers are required to direct the segment to the appropriate socket.
Hence, the server host may support many simultaneous TCP sockets through a single port number.
UDP
A bare bones, best effort service: segments can be lost or delivered out of order.
UDP is often used for streaming multimedia applications as this use case is loss-tolerant and rate-sensitive. UDP is also used in RIP, DNS, SNMP.
Segment structure:
| 16 bits | 16 bits |
|---|---|
| Source port | Destination port |
| Length | Checksum |
| Data | |
The length is the length in bytes of the segment, including the header.
Advantages:
- No connection establishment - no delay
- Simple: no connection state at sender or receiver
- Small header - 8 bytes
- No congestion control - sender can send as fast as possible
The UDP socket for an incoming packet can be identified by its destination IP and port.
TCP
Properties:
- Point-to-point: one sender, one receiver
- Provides a reliable, in-order byte stream
- No application-layer message boundaries
- Pipelined: TCP congestion/flow control sets the window size
- Full-duplex data: bi-directional data flow
- Connection-oriented: handshaking to initialize sender/receiver state
- Flow-controlled: sender will not overwhelm the receiver
Packet structure:
| 16 bits | 16 bits |
|---|---|
| Source port | Destination port |
| Sequence Number | |
| Acknowledgement Number | |
| Flags | Receive window |
| Checksum | Urg. data pointer |
| Options (variable) | |
| Application Data | |
Notes:
- Sequence number: byte stream number of the first byte in the segment’s data
- Acknowledgement: sequence number of next byte expected from the other side (cumulative ACK and hence acknowledges that all bytes prior to that have arrived)
- Urgent data pointer field rarely used
- Options can be padded with zeros to ensure its size is a multiple of 4 bytes.
Flags:
- Header length (4 bits) - number of bytes in header divided by 4
- Reserved (3 bits)
- `NS`: ECN-nonce
- `CWR`: congestion window reduced
- `ECE`: ECN-echo; on a `SYN` it indicates the host is ECN capable, otherwise it marks a packet whose congestion-experienced flag was set
- `URG`: urgent pointer field significant
- `ACK`: acknowledgement field significant - all packets sent by the client should have this set, excluding the initial `SYN`
- `PSH`: push function: asks to push buffered data to the application
- `RST`: reset connection
- `SYN`: synchronize sequence numbers: only the first packet sent by each end should have this set
- `FIN`: last packet from the sender
TCP Connection Management
Opening: three-way handshake
NB: segments with SYN or FIN flags are treated as if they have a 1 byte payload.
Step 1:
- Client host sends a TCP `SYN` segment (`SYN = 1`) to the server
  - Randomly picks an initial sequence number: `client_isn`
  - No application data
The second step both acknowledges the segment and sends the server’s ISN:
- Server host receives the `SYN`, replies with a `SYNACK` segment
  - Allocates buffers
  - Randomly picks an initial sequence number: `server_isn`
  - `SYN = 1`, `ACK = 1`, `ACK_number = client_isn + 1`, `sequence_number = server_isn`
This third step is needed to confirm that the client has received the server’s ISN:
- Client host receives the `SYNACK`, replies with an `ACK`
  - Response may contain data
  - `SYN = 0`, `ACK = 1`, `ACK_number = server_isn + 1`, `sequence_number = client_isn + 1`
Termination
- Step 1: client host sends `FIN` to the server
- Step 2: server host receives `FIN`, responds with `ACK`
- Step 3: once the server successfully closes the connection (freeing buffers etc.), it sends `FIN`
- Step 4: client receives `FIN`, responds with `ACK`
Client waits for double the maximum segment lifespan to ensure the ACK arrives.
16. Introduction to the Web and HTTP
Network Applications
Network applications are programs that run on different end-systems and communicate with each other over the network (e.g. web browser and server).
Application protocols are a small part of a network application, and different network applications may share the same application protocol.
Services applications need:
- Reliable data transfer (except some loss-tolerant applications like multimedia streaming)
- Bandwidth
- Some are rate-sensitive such as audio calls
- Some are elastic e.g. email, FTP
- Latency
- Some interactive real-time apps such as audio calls, VR, multi-player games etc. have tight end-to-end delay constraints
TCP provides a reliable transport service, flow control and congestion control, but no guarantees for latency or transmission rate.
UDP provides none of these guarantees.
Application Structure
Client-server:
- Server:
- Always-on
- Permanent IP address
- May use server farms for scaling
- Clients:
- Communicate with the server
- May be intermittently connected
- May have dynamic IP addresses
- Do not communicate directly with each other
Peer-to-peer:
- No always-on server
- Arbitrary end systems directly communicate
- Peers intermittently connected and can change IP addresses
- Highly scalable
- Difficult to manage
Hybrid. Napster is an example:
- File transfer uses peer-to-peer
- File search is centralized; peers register content at central server and query it to locate content
The Web
- On-demand service, compared to TV or radio which were scheduled
Hypertext Transfer Protocol (HTTP)
A web page consists of a base HTML-file and several referenced objects:
- Can be an HTML file (`iframe`), image (`img`) etc.
- Each object is addressable by a URL
HTTP uses the client-server model:
- The browser requests, receives and displays web objects
- The server sends objects in response to requests
Uses TCP:
- Client initiates a TCP connection to the server on port 80
- Server accepts the TCP connection
- HTTP messages exchanged between the browser and server
- TCP connection closed
HTTP is stateless: the server can work without maintaining any information about past client requests.
HTTP Connections
Non-persistent HTTP: At most one object is sent over a TCP connection. Used by HTTP/1.0.
Persistent HTTP: multiple objects can be sent over a single TCP connection; used by HTTP/1.1 by default.
The client will initialize the TCP connection, send an HTTP request message (containing some URL); the server will receive this response and form a response message containing the requested object.
HTTP/1.0 will close the TCP connection after the object is delivered, while persistent HTTP will leave the connection open and use the same connection for subsequent HTTP requests.
Response Time Modelling (RTT): time to send a small packet to travel from the client to the server and back.
For HTTP/1.0:
- 1 RTT to initiate the TCP connection (first two parts)
- 1 RTT for the HTTP request (and third part of the handshake) and the response back
Hence, the total time is 2 RTTs, plus the transmission time for the file. After the file is received, the TCP connection must also be closed.
In addition to this, the OS must allocate resources for each TCP connection; browsers will often use parallel TCP connections to fetch referenced objects.
Persistent HTTP with pipelining is the default behavior in HTTP/1.1; the client sends requests as soon as it encounters a referenced file.
HTTP Messages
The HTTP request consists of the following ASCII-encoded text:
- Request line (the three values are separated by a space):
  - Method (`GET`, `POST`, `PUT`, `HEAD` etc.)
  - URL
  - Version (`HTTP/1.0`, `HTTP/1.1` etc.)
  - `\r\n` to end the request line
- Header lines:
  - `header_field_name: header_value\r\n` (space required after the colon)
  - Order of headers is not significant, but fields sending control data should be sent first
  - `\r\n` after all header lines to indicate the end of the header section
- Entity body
  - Used for `POST` and `PUT` requests
Some methods:
- `GET`: get object
- `HEAD`: get the headers that would be returned if it was a `GET` request
- `POST`: filling out a form
- `PUT` (1.1+): upload the file in the entity body to the path specified in the URL field
- `DELETE` (1.1+): delete the file specified in the URL field
Some headers:
- `User-agent`
- `Connection: close|keep-alive`: defaults to `keep-alive` for HTTP/1.1
- `Accept-language`: comma-separated list of languages and an optional quality value for each language
  - Language code can be a language or a language and region (e.g. `en`, `en-gb`)
  - Weight: estimate of the user's preference for that language, written `lang-code;q=weight`. Defaults to `q=1`; can be any value between `0` and `1` (larger is better)
The response message follows this format:
- Status line (space-separated):
  - Version
  - Status code (e.g. `200`, `400`)
  - Phrase (text description of the status code, e.g. `OK`)
  - `\r\n`
- Header lines:
  - `header_field_name: header_value\r\n` (space required after the colon)
  - `\r\n` after all header lines to indicate the end of the header section
- Entity body
Some HTTP status codes:
- `1xx`: request received, continuing process
- `2xx`: request successful
  - `200`: OK
- `3xx`: redirection
  - `301`/`308`: permanent redirect. The latter enforces that the request method (e.g. `GET`, `POST`) cannot change
  - `302`/`307`: temporary redirect. The latter enforces that the request method (e.g. `GET`, `POST`) cannot change
  - `304`: not modified since the `If-Modified-Since` value
- `4xx`: client error
  - `400`: bad request
  - `404`: not found
- `5xx`: server error
  - `500`: internal server error
  - `502`: bad gateway (proxy/gateway received an invalid response from the upstream server)
  - `503`: service unavailable (e.g. overloaded)
  - `505`: HTTP version not supported
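To make the message format concrete, here is a minimal sketch that sends a raw HTTP/1.1 `GET` over a TCP socket and prints the status line and headers; the host name `example.com` is just a placeholder.

```python
import socket

host, path = "example.com", "/"

request = (
    f"GET {path} HTTP/1.1\r\n"   # request line: method, URL, version
    f"Host: {host}\r\n"          # header lines
    "Connection: close\r\n"
    "\r\n"                       # blank line ends the header section
)

with socket.create_connection((host, 80)) as sock:
    sock.sendall(request.encode("ascii"))
    response = b""
    while chunk := sock.recv(4096):   # read until the server closes the connection
        response += chunk

headers, _, body = response.partition(b"\r\n\r\n")
print(headers.decode("ascii", errors="replace"))
```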
Cookies: User-Server State
HTTP is stateless, so to identify users cookies are used. There are four components:
- Cookie header line in the HTTP response message
  - Multiple `Set-cookie` lines can be sent
  - The header format is `Set-cookie: cookie_name=cookie_value`
    - By default, cookies are session cookies which are deleted after the current session ends
    - `; Expires=DoW, dd mmm yyyy hh:mm:ss GMT` can be appended to make it a persistent cookie
    - `; Secure` ensures it is only sent over HTTPS connections
    - `; HttpOnly` ensures it is not accessible through JS
- Cookie header line in the HTTP request message
  - `Cookie: cookie1_name=cookie1_value; cookie2_name=cookie2_value...`
- Cookie file kept on the client and managed by the browser
- Server's database storing cookies
Web Caches (Proxy Servers)
Proxy servers sit between the origin server and client. There can be multiple proxy servers, one of which will hopefully be closer to the client than the origin server.
The proxy server/cache acts as both a client and server. It helps to:
- Reduce response time
- Reduce traffic on an institution’s access link
- Lower bandwidth costs
If the requested object is in the browser's cache, the browser uses it directly. Otherwise, the browser requests it from the proxy server (and if the proxy does not have the object, the proxy in turn requests it from the origin server).
Conditional GET
The server will not send the object if the cache has an up-to-date copy of the object.
Request header: If-modified-since: some_date.
If the object has not been modified since the given date, a 304 Not Modified response will be sent and there will be no body. Otherwise, it will generate a normal 200 OK response.
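A sketch of a conditional GET using the standard library; the host and the `If-Modified-Since` date are placeholders standing in for a cached copy and its timestamp.

```python
import http.client

conn = http.client.HTTPConnection("example.com", 80)
conn.request("GET", "/", headers={
    "If-Modified-Since": "Sat, 01 Aug 2020 00:00:00 GMT",  # assumed cache timestamp
})
resp = conn.getresponse()
print(resp.status)   # 304 if not modified (empty body), otherwise 200 with the object
body = resp.read()   # b"" when the server answered 304
conn.close()
```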
17. Email and DNS
Three major components:
- User agent
- Agent through which the user accesses and sends email
- Mail server
- Has a mailbox containing incoming messages
- Has a message queue of outgoing messages to send
- SMTP
- Protocol used to exchange information between mail servers
- Mail servers run both an SMTP client and an SMTP server
Simple Mail Transfer Protocol (SMTP)
The standard protocol used to transfer email between servers.
Basic operation:
- A uses a user agent (UA) to compose a message
- A's UA sends the message to A's mail server, where it is placed in a message queue
  - Called email submission
- A's mail server's SMTP client opens a TCP connection with B's mail server's SMTP server
  - If multiple messages need to be sent, they are sent over a persistent TCP connection
- B's mail server places the message in B's mailbox
- B uses their UA to read the message
SMTP runs over TCP on port 25 by default; the exchange is plain text, so it can be driven interactively via telnet.
Sample telnet interaction:
- TCP connection established; server sends a three-digit reply code (probably `220`) and information about the server (host name, server name/version, date/time etc.)
- Client sends `HELO client_host_name`; server responds (with `250` if valid)
- Client sends `MAIL FROM: <user@client_host_name>`; server confirms validity
- Client sends `RCPT TO: <user@server_host_name>`; server confirms validity
- Client sends `DATA`; server responds with the delimiter used to end the email
- Client sends the body content, ending it with the specified delimiter
  - Body content typically restricted to 7-bit ASCII
- Server responds with a status message
- Client sends `QUIT`
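In practice the dialogue above is usually produced by a library rather than typed by hand. A minimal sketch with Python's standard library, using placeholder addresses and a placeholder mail server name:

```python
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@client.example"
msg["To"] = "bob@server.example"
msg["Subject"] = "Hello"
msg.set_content("Sent over SMTP")

# smtplib performs the HELO/EHLO, MAIL FROM, RCPT TO, DATA and QUIT exchange shown above.
with smtplib.SMTP("mail.server.example", 25) as server:
    server.send_message(msg)
```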
Mail message format:
From: user@client_host_name
To: user@server_host_name
Subject: subject_text
[BLANK LINE]
message_body
Each mail server the message passes through prepends a Received: header line to the message, so the headers record the servers the email passed through.
Multipurpose Internet Mail Extension (MIME)
For non-ASCII data, MIME is used. It specifies the type of content (e.g. jpeg, mp3) and the method in which it is encoded. For example:
...
Subject: subject_text
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Type: image/jpeg
[BLANK LINE]
base64_encoded data
In base64 encoding, every three octets (24 bits) are divided into four groups of 6 bits each - small enough to fit into a printable ASCII character. The characters `A-Za-z0-9+/` are used, with `A` corresponding to 0 and `/` to 63. `=` is used as padding.
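A quick check of the 3-octets-to-4-characters mapping using the standard library:

```python
import base64

# Three octets (24 bits) become four 6-bit groups, i.e. four ASCII characters.
print(base64.b64encode(b"Man"))   # b'TWFu'
# '=' padding appears when the input is not a multiple of three bytes.
print(base64.b64encode(b"Ma"))    # b'TWE='
print(base64.b64decode(b"TWFu"))  # b'Man'
```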
SMTP and Mail Access Protocols
Access protocols are used for communication between the receiver’s mail server and user agent. Two mail protocols are:
- POP3 (Post Office Protocol):
- After authorization, the message is simply downloaded
- IMAP (Internet Mail Access Protocol):
- More features
- Allows messages stored on the server to be manipulated
Web-based email is another alternative; the user agent uses HTTP to communicate with its remote mailbox.
POP3
Runs over TCP on port 110; like SMTP it is a text protocol, so it can be driven via telnet.
During the authorization phase, the client sends two commands:
- `user username`
- `pass password`
After each command is sent, the server responds with either +OK or -ERR.
During the transaction phase, the client can send:
- `list`: list message numbers
- `retr mail_num`: retrieve the message with that number
- `dele mail_num`: delete the message with that number
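A sketch of the same exchange with Python's `poplib`; the server name and credentials are placeholders.

```python
import poplib

mailbox = poplib.POP3("pop.server.example", 110)
mailbox.user("username")       # authorization phase
mailbox.pass_("password")

count, size = mailbox.stat()   # number of messages and total size in bytes
for i in range(1, count + 1):
    response, lines, octets = mailbox.retr(i)   # like "retr i"
    print(b"\r\n".join(lines).decode("utf-8", errors="replace"))

mailbox.quit()
```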
IMAP
IMAP:
- Keeps all messages in the server
- Allows the user to organize messages in folders
- Keeps state across sessions
- Allows the user to obtain components of messages instead of the entire message
Domain Name System (DNS)
DNS is:
- a directory service
- a distributed database implemented in a hierarchy of DNS servers
- an application-layer protocol (UDP on port 53)
Some DNS services:
- Hostname to IP address translation
  - On Linux, programs can do this via the `gethostbyname()` library call
- Host aliasing
  - Canonical names (`CNAME` records) and alias names (e.g. `www.`)
- Mail server aliasing (`MX` records)
- Load distribution
  - A reply to a DNS request can return a list of IP addresses. With DNS rotation the list is rotated, and as clients typically pick the first address, the load can be distributed over multiple servers
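From a program's point of view, the translation service is just a library call; a small sketch (the host name is only an example):

```python
import socket

# One IPv4 address for the name.
print(socket.gethostbyname("www.google.com"))

# getaddrinfo may return several addresses, which supports load distribution.
for family, _, _, _, sockaddr in socket.getaddrinfo("www.google.com", 80, type=socket.SOCK_STREAM):
    print(family, sockaddr)
```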
DNS is not centralized, because a single central server would:
- Be a single point of failure
- Have to handle an enormous traffic volume
- Be far away from most hosts (latency)
- Be almost impossible to maintain
Hierarchy
DNS is composed of three main levels of DNS servers:
- Root DNS servers
- Top-level domain (TLD) servers
  - `com`, `org` etc.
- Authoritative DNS servers
- May be provided by organization or service provider
- Non-authoritative DNS servers will cache responses from the latter, so may not be accurate
If a client wants the IP address for www.google.com, to a first approximation, it will:
- Query a root server for the `com` DNS server
- Query the `com` DNS server to get the `google.com` DNS server
- Query the `google.com` DNS server for the IP address of `www.google.com`
There are 13 root name servers worldwide (although each is a cluster of replicated servers).
Local name servers do not belong to the hierarchy, but are central to the DNS architecture. Each ISP provides one, which acts as the default name server.
Query Types and Caching
If a client wants the IP address for www.google.com through an iterated query:
- The client sends a query to the local DNS server
- The local DNS server sends a query to the root DNS server for the address of the `com` DNS server
- The local DNS server sends a query to the `com` DNS server for the address of the authoritative DNS server for the `google.com` host
- The local DNS server sends a query to the authoritative DNS server for the `google.com` host for the IP address of `www.google.com`
- The local DNS server forwards the reply back to the client
A recursive query puts the burden of name resolution to the contacted name server:
- The client sends a query to the local DNS server
- The local DNS server sends a query to the root DNS server for the IP address of `www.google.com`
- The root DNS server sends a query to the `com` DNS server for the IP address of `www.google.com`
- The `com` DNS server sends a query to the authoritative DNS server for the `google.com` host for the IP address of `www.google.com`
- The response recurses back through the same chain of servers to the client
Both queries require 8 messages. Hence, caching is used:
- Once any name server learns a mapping, it caches it
- TTL ensures cache entries expire
Hence, root name servers are usually not consulted.
DNS records
Each resource record stores a name, value, type, TTL and class (although the class is always `IN`, internet).
Some type values:
- `A`: `name` is a hostname; `value` is its IP address
  - IPv6 uses the type `AAAA`, as IPv6 addresses use four times as many bits
- `NS`: `name` is a domain, `value` is the hostname of the authoritative name server for the domain
- `CNAME`: `name` is the alias name for the canonical name stored in `value`
- `MX`: `value` is the canonical name of the mail server associated with the alias name
DNS Messages
Both query and reply messages use the same message format:
| 16 bits | 16 bits |
| Identification | Flags |
| Num. questions | Num. answer RRs |
| Num. authority RRs | Num. additional RRs |
| Questions (variable num.) |
| Answers (variable num. RRs) |
| Authority (variable num. RRs) |
| Additional info (variable num. RRs) |
The query and reply to the corresponding query have the same identification number.
The flags indicate:
- Whether the message is a query (`0`) or reply (`1`)
- Whether the query is recursive or not
- In a reply, whether the server supports recursion
The additional fields:
- Questions (`(Name, Type)`)
  - The reply repeats these
- Answers (`(Type, Value, TTL)` RRs)
- Authority: RRs for authoritative servers
- Additional information (e.g. the IP address of the authoritative DNS server)
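A sketch that builds a minimal query in this format and sends it over UDP; the query name and the resolver address `8.8.8.8` are only examples.

```python
import socket
import struct

def build_query(name, query_id=0x1234):
    flags = 0x0100                                    # standard query, recursion desired
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.split(".")) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)       # QTYPE = A, QCLASS = IN
    return header + question

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    sock.settimeout(2.0)
    sock.sendto(build_query("www.google.com"), ("8.8.8.8", 53))
    reply, _ = sock.recvfrom(512)

reply_id, flags, questions, answers, authority, additional = struct.unpack("!HHHHHH", reply[:12])
print(hex(reply_id), answers, "answer RRs")           # same identification as the query
```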
Adding New Records
Example: registering a domain name at a registrar:
- Need to provide registrar with host names and IP addresses of primary/secondary authoritative DNS servers
- Registrar inserts two RRs into the TLD server for each authoritative name server
  - `(host_name, authoritative_DNS_server_host_name, NS)`
  - `(authoritative_DNS_server_host_name, authoritative_DNS_server_IP_address, A)`
- Go to the authoritative name server and insert records pointing to your server there
Mid-Term Test Notes
Sockets
IP: stateless, unacknowledged, unreliable, unordered.
TCP/UDP add ports on top of IPv4 for addressing.
TCP/Stream:
- Acknowledgements of failure
- Record boundaries not preserved
UDP/Datagram:
- One `write`, one packet; one `read`, one packet
API:
- `bind`: bind the socket to an IP address/port and address family
- `listen`: declare willingness to accept incoming requests
- `accept`: waits until a request is received; returns a new socket
connect- TCP: request establishment of connection
- UDP: default receiver when using
send/write
read/recvfrom: get received data. Blocking if no datawrite/sendto: send data. Blocking if buffer is fullsendto/recvfromspecifies port/address family of sender/receiverclose: free resources
Network byte order: big endian.
Protocol Layering
Service access point: interface to service for higher layer.
| Header | Higher-layer payload/service data unit | Trailer |
OSI
Physical/Link layers have single-hop scope.
The network layer uses single hops to achieve end-to-end communication.
Routers implement the physical, link and network layers, which are intentionally kept simple (thus allowing hardware implementations for high performance).
Physical:
- Bits to modulation of signal
- Specifies signal, modulation, bit synchronization etc.
Link:
- MAC: flow control etc.
- LLC: frames, error detection/coding, retransmission
Network:
- Addressing and routing; end-to-end delivery of messages
- Specifies addressing format, exchange of routing information, possibly establishment etc. of connections
Transport:
- Reliable, in-sequence, transparent end-to-end transfer
- Error control to guard against routers dropping packets
- Congestion control (stopping network overload, fair resource allocation) and end-to-end flow control (stopping receiver overload)
Session:
- Communication sessions between applications
- May involve several transport layer connections
- Controls duplex/half-duplex/simplex operation
Presentation:
- Endian conversion
Application:
- High-level support functions e.g. HTTP
TCP/IP reference model: physical/network interface/internet/transport/application.
Transport:
- End-to-end communication using the socket interface
- SAPs called ports; multiplexing over a single network adapter
- PDUs generated by this layer called segments
Service Primitives
Confirmed service:
- User A sends `request` (possibly with data)
- Provider A generates PDU(s)
- Provider B receives the PDU(s), sends `indication` to user B
- User B generates `response` (possibly with data)
- Provider B generates a response PDU
- Provider A receives the response, sends `confirm` to user A
Unconfirmed: no response or confirm e.g. UDP.
Confirmed Delivery: Provider B sends the confirmation PDU as soon as the request PDU is received, regardless of what user B does.
Multiplexing:
- Several N-layer SAPs transmitting data over a single (N-1)-layer SAP
- Scheduling to determine which SAP to serve
- Requires identifier (e.g. port numbers)
Splitting:
- Single N entity using several N-1 SAPs
- Scheduling to determine which SAP to use for a given PDU
- May require sequencing mechanisms
Fragmentation:
- When PDU is too large for the lower-layer protocols; partition into fragments
- Numbering of fragments
Blocking:
- Buffering small SDUs into a single PDU
- Markers required to separate SDUs
Physical Layer
- Formatting: ADC - source signal into digital data
- Source coding/compression: lossy coding (relevancy reduction) and lossless coding (redundancy reduction)
- Channel coding: adding redundant information for error correction - reduces bit-error probability
- Modulator: maps bits to physical signal/waveform of a certain duration - symbol time
- Baseband transmission. Bipolar NRZ: `1 = +x V`, `0 = -x V`. Unipolar NRZ: `1 = x V`, `0 = 0 V`
- Attenuation: signal power at the transmitter divided by signal power at the receiver; lower is better
- Thermal noise: normal distribution added to signal
Passband transmission:
The transmitted symbol is `s(t) = A(t) cos(2π (f_c + f(t)) t + φ(t))` for `0 ≤ t < T`, where T is the symbol duration, A(t) is the amplitude, f(t) is the frequency offset, f_c is the centre frequency and φ(t) is the phase.
- Amplitude shift keying: varies `A(t)` for each symbol
- Frequency shift keying: varies `f(t)` for each symbol
- Phase shift keying: varies `φ(t)` for each symbol
Quadrature Amplitude Modulation: varies amplitude and phase. n unique amplitudes and n unique phases for n^2 symbols. Greater throughput but higher error rate.
Carrier synchronization: synchronizes frequency and phase.
NRZ can lead to long runs of the same bit and loss of synchronization.
Manchester encoding: [-x, x] voltage range:
- 1: high to low in the middle of the symbol duration
- 0: low to high in the middle of the symbol duration
Frame synchronization, ethernet: preamble at start for carrier synchronization, then start-frame-delimiter.
End of frame detection: gap between frames, deliberate code violations, dedicated length fields, special flags (with escape codes).
LAN
Geometric RV:
-
Type 1, Bernoulli trial: returns probability of k failures before success:
`P(k) = p(1-p)^k`; expected value `(1-p)/p`
Type 2, probability of k trials before first success:
P(k)=p(1-p)^(k-1), expected value1/p -
Bus: tap line attaches all stations to bus. Broadcast medium
-
Star: private full-duplex channel for each station. Central un it is a repeater or bridge/switch
-
MAC: multiple stations, shared channel usable by one station at a time, no other channels, broadcast medium
-
Want: small medium access delay, fairness, reuse of resources, low overhead, stability
Orthogonal: behavior of one station independent of another.
Frequency division multiple access:
- Bandwidth divided into sub-channels + guard channels
- Separate receivers/tunable receiver - latter requires switching before transmission starts
- 0 medium access delay, B/n bandwidth
Time:
- Full bandwidth for 1/n of the time. Avg MAD is half superframe time plus B/n
- Avg better than FDMA for n>2
- Requires time synchronization
Both require some way of allocating sub-channels/time slots.
Random access protocol: no central station/shared state.
ALOHA:
- Transmit packet + checksum immediately
- Receiver sends immediate ack
- If timer expires, random backoff time chosen; repeat until threshold reached
- Vulnerability period: a frame can collide with any frame whose transmission starts up to one frame time before it or during its own transmission (two frame times in total for equal-sized frames)
- Unstable for high loads due to increasing collision rates
CSMA - carrier sense multiple access. Carrier sense: almost instantaneous check of if the medium is busy. Fails if time difference between two stations starting collisions is smaller than the propagation delay.
Non-persistent CSMA:
- If medium busy, generate random backoff time, then check again
- High probability of medium being idle after transmission finishes
p-persistent CSMA:
- Time is divided into very small frames
- Once medium is free, probability p of starting transmission. If collision occurs, restart
- If defers, repeats the process the next time slot
- Low p leads to stability in high load but low throughput in low load
1-persistent CSMA, as used by CSMA/CD:
- Send unconditionally at end of transmission: avoid idle time
- Detect collision immediately if multiple stations transmit
- Abort transmissions
- Send jamming signal
- Begin collision resolution
Ethernet
Uses Manchester encoding. Frame format (field lengths in bytes):
| Length | Name |
|---|---|
| 7 | Preamble |
| 1 | SOF |
| 6 | DstAddr |
| 6 | SrcAddr |
| 2 | Length/Type |
| 46-1500 | Payload |
| 4 | FCS |
MAC address: 48 bits, globally unique. If the first bit of the address is 1, it is a multicast address; otherwise it is a unicast address.
If the length/type field value is greater than 1500, it is a type field (e.g. IPv4, ARP), allowing protocol multiplexing. Otherwise it is a length field, and the type is assumed to be the first two bytes of the payload.
Broadcast: half-duplex, MAC needed, 10 Mbps PHYs.
MAC protocol: CSMA/CD:
-
`coll` set to zero
Carrier-sense; once medium idle, transmission starts immediately
-
If collision detected
- Abort, send jamming signal, increment
coll - Drop packet if
coll > 16 - Wait for backoff time
- Backoff window:
[0, 2^{min(10, coll) - 1}] - Multiply by slot time; predefined time large enough for max round-trip time
- Backoff window:
- Abort, send jamming signal, increment
-
Minimum frame size: transmitter can only detect collision while it is still transmitting by measuring voltage on medium
-
Repeater: amplify signal on analog level
-
Regenerating repeater: demodulate then modulate on a symbol-by-symbol level. No error checking
-
Hub: centralized repeater; broadcast signals to all ports. Basically a bus
-
Bridge: interconnects LANs on the MAC layer - must have the same MAC address structure
- Peeks at source address to learn what stations are on what LANs
- If destination address unknown, broadcasts packet to all other LANs
-
Switches: private full-duplex links for all stations. Frames transmitted to only the correct port - allows parallel transmissions
IP
Big endian byte ordering
| Length | Name |
|---|---|
| Bytes 0-3 | |
| 4 | Version (=4) |
| 4 | Hdr Len (numBytes/4) |
| 6 | TOS/DSCP (zeros) |
| 2 | Unused (zeros) |
| 16 | TotalLength (bytes) |
| Bytes 4-7 | |
| 16 | Identification |
| 3 | Flags (0, DF, MF) |
| 13 | FragmentOffset (offset/8) |
| Bytes 8-11 | |
| 8 | Time-To-Live (usually 32/64) |
| 8 | Protocol Type |
| 16 | Header Checksum |
| Bytes 12-15 | |
| 32 | Source Address |
| Bytes 16-19 | |
| 32 | Destination Address |
| Bytes 20+ | |
| 32n | Options + Padding |
| Data |
Last fragment has MF unset, and non-zero FragmentOffset to differentiate from an unfragmented packet.
Protocol type: determines higher-level protocol that generated the payload (e.g. UDP/TCP).
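A sketch that unpacks the fixed 20-byte header described in the table; the sample bytes below are fabricated for illustration.

```python
import struct

def parse_ipv4_header(packet):
    (version_ihl, tos, total_length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": version_ihl >> 4,
        "header_len_bytes": (version_ihl & 0x0F) * 4,
        "total_length": total_length,
        "flags": flags_frag >> 13,
        "fragment_offset": (flags_frag & 0x1FFF) * 8,   # stored as offset/8
        "ttl": ttl,
        "protocol": proto,                              # e.g. 6 = TCP, 17 = UDP
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

# Fabricated header: version 4, IHL 5, TTL 64, protocol 17 (UDP), 192.168.1.1 -> 192.168.1.2.
sample = bytes.fromhex("450000540000400040110000c0a80101c0a80102")
print(parse_ipv4_header(sample))
```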
Routing table: map from destination address to output port.
Host address of all 0s: network as a whole being referred to. All 1s: broadcast address.
Lookup:
- Check for full host address match
- Check for network address match
- Use default
- Drop packet, send ICMP message
End hosts have default route + entry for each network it is directly connected to.
Core routers: know almost all internet networks, have no default routes.
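A sketch of the lookup order over a made-up forwarding table, using the standard `ipaddress` module; the addresses and port names are placeholders.

```python
import ipaddress

table = {
    "192.168.1.7/32": "eth0",   # full host address match
    "192.168.1.0/24": "eth0",   # directly connected network
    "0.0.0.0/0":      "eth1",   # default route
}

def lookup(destination):
    dst = ipaddress.ip_address(destination)
    best = None
    for prefix, port in table.items():
        net = ipaddress.ip_network(prefix)
        if dst in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, port)          # most specific matching entry wins
    return best[1] if best else None    # None: drop and send ICMP message

print(lookup("192.168.1.7"))   # eth0
print(lookup("8.8.8.8"))       # eth1 via the default route
```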
Address Resolution Protocol: map from IP to MAC:
- Each has ARP cache, entries discarded fixed time after insert (or last use)
- Broadcast message containing own IP/MAC + IP of target
- Target responds with own IP/MAC plus requester IP/MAC
Internet Control Message Protocol - optional protocol used to inform sender of error.
Test Notes
Distance-Vector
Periodically send the distance vector (current least-cost estimates to all destinations) to all neighbours; update own estimates whenever a neighbour's vector or a link cost changes.
Poisoned reverse: if a node routes to a destination through a neighbour, it advertises an infinite distance for that destination to that neighbour (helps avoid count-to-infinity loops).
RIP
-
Metric: number of hops (number of subnets traversed)
-
Max path cost: 15
-
DVs sent to neighbors every 30 seconds via UDP
- Each advertisement contains at max 25 entries
-
Table stores destination network, next hop and number of hops
-
After 3 minutes of no advertisements, node declared dead
Link-State (Dijkstra)
Each router reliably floods information about its neighbors and independently calculates the least-cost path to every other router.
Repeatedly add the nearest node not yet in the tree to the tree, then update the distance to each of that node's neighbours via the new node.
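A compact sketch of that computation, over a made-up graph given as `{node: {neighbour: link cost}}`:

```python
import heapq

def dijkstra(graph, source):
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                        # stale entry, node already settled
        for neighbour, cost in graph[node].items():
            new_dist = d + cost
            if new_dist < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_dist
                prev[neighbour] = node      # predecessor on the least-cost path
                heapq.heappush(heap, (new_dist, neighbour))
    return dist, prev

graph = {"u": {"v": 2, "w": 5}, "v": {"u": 2, "w": 1}, "w": {"u": 5, "v": 1}}
print(dijkstra(graph, "u"))   # ({'u': 0, 'v': 2, 'w': 3}, {'v': 'u', 'w': 'v'})
```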
OSPF
- Authenticated advertisements over IP (not UDP/TCP)
- Multiple same-cost paths allowed
- Link cost can be set by administrators
- Uni/multi/broadcast support
- Used mostly in upper-tier ISPs
Hierarchy
- Can be configured into areas. Each area runs OSPF:
- Broadcasts only within the same area
- Each node has detailed topology about its own area
- Inter-area routing through area-border routers
- ‘Summarizes’ distance to networks in its own area
- One backbone area which connects all areas
- Boundary routers connect to other ASes
IPv6
- Version (4): 6
- Traffic class (8): QoS
- Flow label (20): identifier for a group of packets
- Payload length (16): size of data in bytes
- Next header (8): payload type (e.g. UDP, TCP)
- Hop limit (8): TTL
- Source/Destination addresses: 128 bits each
When forwarding packet through IPv4 router:
- Dual stack: transform to IPv4, lose metadata
- Tunnelling: encapsulate the IPv6 packet as the payload of an IPv4 packet
BGP
Pairs of routers, either in the same (iBGP) or different AS (eBGP), exchange information over TCP. If a node advertises a prefix, it promises that it can forward datagrams to the prefix. Some attributes:
- `AS_PATH`: not updated on iBGP connections. Contains the list of ASes through which the advertisement has passed; reject if own AS is already in the path
- `NEXT_HOP`: not updated on iBGP connections. Advertisements flow from the destination outwards, so it contains the address of the boundary router for the next-hop AS
Import policy used to accept/reject advertisements.
Route selection uses local preference (policy decision), shortest AS_PATH, closest NEXT_HOP router etc.
Reliable Data Transfer
Internet checksum:
def ones_complement_addition(shorts):
    # Sum 16-bit words; when a carry comes out of the top bit, wrap it back around.
    result = 0
    for short in shorts:
        result += short
        if result > 0xFFFF:
            result -= 0xFFFF
    return result

def checksum(shorts):
    # The checksum is the ones' complement of the ones' complement sum.
    return 0xFFFF - ones_complement_addition(shorts)

def check_checksum(shorts, received_checksum):
    # Summing the received words together with the checksum should give all ones.
    return ones_complement_addition(list(shorts) + [received_checksum]) == 0xFFFF
CRC
- `d` data bits; `r` checksum bits appended to the end
- Agreed-upon `r+1`-bit generator pattern, with the MSB being `1`
- To find the `r` checksum bits: append `r` zero bits to the data, do long division (XOR) by the generator, and take the remainder
- Can detect consecutive (burst) bit errors up to `r` bits long
RDT
1.0: assumes no bit errors, buffer overflows, out-of-order packets or packet loss
2.0: bit errors; checksum. Stop and wait for ACK/NACK after each packet
2.1: 1-bit sequence numbers in case ACK/NACK corrupted
2.2: No NACK. Sends ACK with sequence number - last packet that was received successfully
3.0: bit errors and lost packets. Assumes packets are not reordered in the channel (although this can be solved by having a time to live and sufficiently large sequence space)
Go-Back-N
No more than N unacknowledged packets. The receiver sends cumulative ACKs. On timeout, all unacknowledged packets in the window are resent.
Selective Repeat
An ACK for each packet.
On timeout, resend oldest packet without ACK. If duplicate packet received, send ACK for that packet again.
Worst case scenario (window size N, sequence numbers starting at 0):
- All but the first packet in the window arrive; their ACKs reach the sender
- The first packet is resent after a timeout and received, but its ACK is lost
- The receiver window moves forward by a full window; the sender window does not move
  - Sender window is still `[0, N-1]`
  - Receiver window is now `[N, 2N-1]`
- The sender times out and sends the first packet (sequence number 0) again
To prevent this retransmission from being interpreted as a new packet, the sender's base sequence number must not fall inside the receiver window. Hence, the sequence number space must be at least twice the window size.
UDP and TCP
UDP
| 16 bits | 16 bits |
| Source port | Destination port |
| Length | Checksum |
| Data |
Identified by destination port and IP.
TCP
| 16 bits | 16 bits |
| Source port | Destination port |
| Sequence Number |
| Acknowledgement Number |
| Flags | Receive window |
| Checksum | Urg. data pointer |
| Options (variable) |
| Application Data |
- Sequence number: byte number for first byte in segment
- Acknowledgement: next expected byte number
- Flags starts with 4-bit header length - bytes in header divided by 4
LastByteSent - LastByteACKed <= min(CongestionWindow, RecvWindow)
Handshake and Termination
Segments with SYN or FIN treated as if they have a 1 byte payload.
Three-way handshake:
- Client sends TCP SYN segment
- Server replies with SYNACK: `ack = client_isn + 1`
- Client replies with ACK: `ack = server_isn + 1`, `sequence_number = client_isn + 1`
Termination:
- Client sends `FIN`
- Server responds with `ACK`
- Once the connection is closed, the server sends `FIN`
- Client responds with `ACK`, waiting twice the maximum segment lifetime to ensure the ACK arrived
Acknowledgements
ACK: largest byte received (cumulative ACK).
Sender events:
- One retransmission timer (for the oldest unacknowledged segment)
- If timer expires, resend oldest unacknowledged packet
- Double timeout and restart timer
- When ACK received, restart timer (if received ACK is not for base packet)
- Fast retransmit: if three ACKs with the same segment number received, retransmit it immediately
Receiver events:
- If an in-order segment arrives, wait up to 500 ms for another; if a second in-order segment arrives in that window, send a single cumulative ACK immediately
- If out-of-order packet arrives, buffer and send duplicate ACK
Timeout value requires estimating the round-trip time with an exponential weighted moving average: `EstimatedRTT = (1 - a) * EstimatedRTT + a * SampleRTT` (typically `a = 0.125`), `DevRTT = (1 - b) * DevRTT + b * |SampleRTT - EstimatedRTT|` (typically `b = 0.25`), and `TimeoutInterval = EstimatedRTT + 4 * DevRTT`.
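A small sketch of those updates, with the usual example weights and made-up RTT samples in seconds:

```python
def update_rtt(estimated_rtt, dev_rtt, sample_rtt, alpha=0.125, beta=0.25):
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    timeout = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout

est, dev = 0.10, 0.01                      # assumed starting values
for sample in (0.12, 0.09, 0.20):
    est, dev, timeout = update_rtt(est, dev, sample)
    print(round(est, 4), round(dev, 4), round(timeout, 4))
```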
Flow control
- Receiver advertises its receive window - the amount of free space in its buffer
- If the receive window is 0, the sender must periodically send small probe segments so it learns when buffer space becomes available
Congestion Control
Congestion shows up as overflowing router buffers (packet loss) and growing queueing delays; congestion control limits the sending rate to avoid this.
Slow start:
- Congestion window = 1 MSS (1 segment per RTT)
- Double window every RTT (increment by one after every ACK)
AIMD (after first loss):
- Increase by 1 after every RTT (window += 1/window)
- Halve after every loss
After 3 duplicate ACKs, halve the window and move to AIMD.
After a timeout, slow start until the window reaches half of its size before the timeout, then move to AIMD.
LastByteSent - LastByteAcked <= min(CongestionWindow, ReceiveWindow)
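A toy sketch of how the congestion window (in MSS) evolves under slow start and AIMD with a made-up loss pattern; real TCP reacts to individual ACKs and timers rather than whole RTTs.

```python
cwnd, ssthresh = 1.0, 64.0
loss_pattern = [False] * 6 + [True] + [False] * 5   # triple duplicate ACK at RTT 6

for rtt, triple_dup_ack in enumerate(loss_pattern):
    if triple_dup_ack:
        ssthresh = cwnd / 2        # remember half the window at the loss...
        cwnd = ssthresh            # ...and continue from there in AIMD
    elif cwnd < ssthresh:
        cwnd *= 2                  # slow start: double every RTT
    else:
        cwnd += 1                  # AIMD: +1 MSS per RTT
    print(rtt, round(cwnd, 1))
```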
Lab 4
- First ping to station on the same network slow as ARP request required
- Minimal forwarding table: `0.0.0.0/0` pointing to the default gateway
Ping failure modes:
- Network unreachable: no entry in forwarding table
- Destination host unreachable: directly-attached network, ARP fails
- Destination port not reachable: TCP/UDP packet, port not bound
- No response: if routing table of receiver empty
ASes:
- Stub: leaf node connected to single AS
- Multihomed: connected to multiple ASes
- Transit: multihomed, and traffic from other ASes can pass through it