All Files in ‘COSC264 (2020-S2)’ Merged

01. Introduction

Term 3 quizzes and assignments:

Mid-term test:

Term 4 quizzes:

Lab test:

Final exam:

02. Sockets

Ethernet/WLAN/PPP etc. -> Internet Protocol -> TCP/DNS/HTTP etc.

Everything over IP, IP over everything.

IP Address

Representation

IP Service - Best Effort

The basic IP service is packet delivery. It is:

UDP and TCP

UDP

IPv4, plus port numbers. It is:

TCP

TCP allows safe and reliable transfer over the internet. It is:

Ports

Processes and blocking socket calls

The OS creates an abstraction that makes each process believe it has an exclusive processor and exclusive memory (not accessible by other processes). Below the abstraction, time-sharing is used. A process can be:

Socket API

Client/Server Paradigm

Client applications must:

Server applications must:

The client and server can run on the same machine: 127.0.0.1 (localhost) maps a device to itself.

Socket Types

Differ by OS, but these are universal:

Example Workflow: TCP client

Example workflow: TCP Server

Example workflow: UDP client

Example workflow: UDP server

Socket API Functions

#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

Each call returns an integer; a negative value indicates an error and sets the errno variable.

Socket (non-blocking)

int socket(int domain, int type, int protocol);

If successful, returns file descriptor (positive integer). If fails, returns -1 and sets errno.

Bind (non-blocking)

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The sockaddr struct is deliberately general, big enough to hold any type of address. sa_family holds the address family (the same value as the domain argument to socket()).

struct sockaddr {
 sa_family_t sa_family;
 char sa_data[14];
}

A pointer to a sockaddr_in must be cast to struct sockaddr * when passed to these functions.

struct sockaddr_in {
 short sin_family;
 unsigned short sin_port;
 struct in_addr sin_addr;
 char sin_zero[8];
};

struct in_addr {
 unsigned long s_addr;
};

Listen (non-blocking)

int listen(int sockfd, int backlog);

Declares that you are willing to accept incoming stream connections (SOCK_STREAM or SOCK_SEQPACKET), and allocates resources (the connection queue).

Accept (blocking)

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

Connect (blocking)

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

Blocks until the fate of the connection setup is known:

Read

ssize_t recv(int sockfd, void *buf, size_t len, int flags);
ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *src_addr, socklen_t *addrlen);
ssize_t read(int fd, void *buf, size_t count);

read: prepare a buffer area, then pass it and its size to read, which fills it with data.

recvfrom: the caller also provides memory for address structure and the function fills it in with the address/port/address family of the host.

read blocks if no data is available (unless a non-blocking flag is set using recv/recvfrom, in which case it returns an error code).

Write

ssize_t send(int sockfd, const void *buf, size_t len, int flags);
ssize_t sendto(int sockfd, const void *buf, size_t len, int flags, const struct sockaddr *dest_addr, socklen_t addrlen);
ssize_t write(int fd, const void *buf, size_t count);

write: pass it a buffer of data to send, and the number of bytes to send.

If the underlying socket send buffer is full, the call blocks (unless a non-blocking flag is set, in which case it returns an error code).

send/write require the addressee to be specified using connect.

Close

int close(int fd);

Frees up socket resources.

Endianness

For certain data types (e.g. a 16-bit unsigned port number), and packet headers in particular, there must be a canonical representation for transmitted data - the network byte order. Hosts must convert between their internal representation (host byte order) and the network representation.

The canonical representation for the internet is big-endian (most significant byte at lowest memory address). Intel hates it.

Host (h) to network (n) conversions, and vice versa, are available as helper functions for 16/32-bit integers.

#include <arpa/inet.h>
uint32_t htonl(uint32_t hostlong); // host to network long (32 bits)
uint16_t htons(uint16_t hostshort); // host to network short (16 bits)

uint32_t ntohl(uint32_t netlong); // network to host long
uint16_t ntohs(uint16_t netshort); // network to host short

Helper functions for address manipulation.

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int inet_aton(const char *cp, struct in_addr *inp);
in_addr_t inet_addr(const char *cp);
in_addr_t inet_network(const char *cp);
char *inet_ntoa(struct in_addr in);
struct in_addr inet_makeaddr(int net, int host);
in_addr_t inet_lnaof(struct in_addr in);
in_addr_t inet_netof(struct in_addr in);

TCP Client Example

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

int main() {
  char server_name[] = "localhost";
  uint16_t portno = 80;
  struct sockaddr_in serv_addr; // Address structure for server
  struct hostent *server; // Result of DNS resolver
  char buffer[256] = "A message to send"; // Buffer for data to send

  // Create socket with protocol 0 (default) for given address family - TCP
  int sockfd = socket(AF_INET, SOCK_STREAM, 0);
  if (sockfd < 0) {
    perror("ERROR opening socket");
    exit(1);
  }

  server = gethostbyname(server_name); // DNS resolution
  if (server == NULL) {
    fprintf(stderr, "ERROR no such host\n");
    close(sockfd); // Not strictly needed, but stylistically bad to not close the socket
    exit(1);
  }

  bzero((char*) &serv_addr, sizeof(serv_addr)); // Zero out the struct
  serv_addr.sin_family = AF_INET;

  // Copy IP address from DNS response
  bcopy((char*)server->h_addr,
    (char*)&serv_addr.sin_addr.s_addr,
    server->h_length
  );

  // Set port number
  serv_addr.sin_port = htons(portno);

  if (connect(sockfd, (struct sockaddr*) &serv_addr, sizeof(serv_addr)) < 0) {
    perror("ERROR connecting");
    close(sockfd);
    exit(1);
  }

  int n = write(sockfd, buffer, strlen(buffer));
  if (n < 0) {
    perror("ERROR writing to socket");
    exit(1);
  }

  bzero(buffer, 256); // Zero out buffer; wait for response
  n = read(sockfd, buffer, 255);
  if (n < 0) {
    perror("ERROR reading from socket");
    exit(1);
  }

  printf("%s\n", buffer);

  close(sockfd);
  return 0;
}

TCP Server Example

int main() {
  int portno = 80;
  int sockfd = socket(AF_INET, SOCK_STREAM, 0);

  struct sockaddr_in serv_addr;
  struct sockaddr_in cli_addr;
  char buffer[256];
  int n;

  if (sockfd < 0) {
    perror("ERROR opening socket");
    exit(1);
  }

  bzero((char *) &serv_addr, sizeof(serv_addr));
  serv_addr.sin_family = AF_INET;
  serv_addr.sin_addr.s_addr = INADDR_ANY; // Bound to all local interfaces. Otherwise, use the IP address of a specific interface
  serv_addr.sin_port = htons(portno);

  if (bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
    perror("ERROR on binding");
    exit(1);
  }

  listen(sockfd, 5); // Allow maximum of 5 queued connections

  socklen_t clilen = sizeof(cli_addr);
  // Wait (blocking) for incoming connection request with `accept`
  int newsockfd = accept(sockfd, (struct sockaddr *) &cli_addr, &clilen);

  if (newsockfd < 0) {
    perror("ERROR on accept");
    exit(1);
  }

  bzero(buffer, 256);
  n = read(newsockfd, buffer, 255);
  if (n < 0) {
    perror("ERROR reading from socket");
    exit(1);
  }

  printf("%s\n", buffer);
  n = write(newsockfd, "Message received", 17);
  if (n < 0) {
    perror("ERROR writing to socket");
    exit(1);
  }

  close(newsockfd);
  close(sockfd);
  return 0;
}

03. Protocol Layering

Layering

The internet and POTS are complex and require vast amounts of software. Hence, structuring principles are used to achieve:

Networking software is layered - layer N offers an interface to layer N+1, implemented using only the service offered by layer N-1 (through its interface).

General layout of a Layer-N PDU (Protocol Data Unit)

| N-protocol header | N + 1 data (N-SDU) | N trailer |

Thus, going down the stack, each layer sandwiches the N+1 payload between its own header and trailer:

| N-2 hdr | N-1 hdr | N hdr | N+1 data | N trl | N-1 trl | N-2 trl|

The receiver does the opposite, with each layer removing its header/trailer before handing the payload to the upper layer.

OSI Seven Layer Model

OSI: Open Systems Interconnection. A commercial failure, but the terminology and layering architecture/concepts are foundational to networking.

If there are two end hosts, A and B, and a router in between:

Physical (PHY)

Transmission of digital data over a physical medium using modulated waveforms/signals.

Often requires the specification of:

Data Link Layer

Responsible for reliable transfer of messages over a physical link. Messages at this layer are called frames.

Often requires the specification of:

Network layer

Provides a link-technology-independent abstraction of the entire network to other layers. Handles addressing, routing, and end-to-end delivery of messages.

Messages at this layer and above are called packets.

Often requires the specification of:

Transport Layer

Provides reliable, in-sequence and transparent end-to-end transfer; gives programming abstractions to higher layers.

Often requires the specification of:

Session & Presentation Layers

No one really knows what these are.

Session layer:

Presentation layer:

Application Layer

Application support functions (high-level APIs) used in many applications. Applications sit on top of this layer.

Examples include file transfer services, directory services, transaction processing support (e.g. two-phase commit)

TCP/IP Reference Model

Broadly equivalent to the OSI model, but with the session and presentation layers removed.

Once again, routers only implement the lowest three layers, which are kept simple. This allows many things to be implemented in hardware, allowing routers to keep up with the deluge of packets.

Application Layer

Protocols such as SMTP, HTTP, FTP.

Transport Layer

Internet Layer

IP:

Physical and Network Interface Layer

Network interface layer:

Service Providers and Service Users

An N-protocol implements an N-service; it is an N-service provider, which is used by an N+1-protocol - the N-service user.

The service user and provider:

There are standard service primitives for a service.

Confirmed Service

Unconfirmed Service

The first three steps: S.request, PDU sent to B, then S.indication. Service user A does not know if its request has reached B's service user.

Confirmed Delivery Service

Host B's service provider simultaneously sends an S.indication to its service user and an acknowledgement packet to A's service provider, which generates an S.confirm primitive for its service user.

Thus, A’s service user knows that B’s service provider received the request, but not if the service user successfully processed it.

Multiplexing

Allows several N-SAPs to transmit data over a single N-1-SAP.

To do this, the N-protocol entity needs to make scheduling decisions about which N-SAP to serve next.

The sending N-protocol entity must include an SAP identifier in the N-PDUs so they can be delivered to the right SAP (e.g. UDP port numbers).

Splitting

Fragmentation and Reassembly

Blocking and Deblocking

Sequence Numbers

04. Physical Layering

Fundamentals

Digital Data

Digital data is a finite or countably infinite sequence of discrete symbols, each belonging to a finite set called the alphabet.

As any finite alphabet can be represented by a group of bits, any digital data can be represented as a sequence of bits.

Analog Data

Analog data can take on an uncountable number of values which change continuously over time.

Transmission of Analog and Digital Data

With digital data, we want to recover the transmitted sequence of bits exactly, regardless of any distortions or errors introduced by the channel.

When the receiver receives a group of bits, it must choose one out of a finite number of alternatives. The design of the system centers around maximizing the probability that it makes the correct choice.

With analog data, if the channel introduces distortions, the receiver cannot exactly recover the transmitted data, as it is faced with an uncountably infinite number of alternatives. Thus, the goal of an analog system is to extract the best approximation of the transmitted signal.

Transmission of Digital Data Using Analog Signals

Structure of Communication Systems

Formatting

This step transforms a source signal into digital data (i.e. ADC).

Source Coding (Compression)

The step of encoding digital data with the goal of reducing size. There are two types: lossless and lossy coding.

Lossless coding, also called redundancy reduction, encodes the data such that it can be perfectly restored.

Lossy coding, also called relevancy reduction, is used where humans have tolerance to imperfections, such as video/audio. The decoded data may differ from the original, but the aim is to minimize the difference perceived by humans.

Channel Coding

This adds redundant symbols/bits to combat transmission errors in the channel, giving the channel decoder a better chance of guessing the correct data. Thus, it reduces the bit-error probability.

Modulator

Takes the sequence of digital symbols and maps these to a physical signal or waveform of a certain duration (symbol time) - this determines the bit rate.

The amount of data sent per symbol time could be increased, but this brings the symbols closer together, making them more susceptible to interference. To combat this, the transmit power can be increased.

Demodulator and Channel Decoder

The demodulator must produce a guess of what could have been the transmitted waveform (and thus symbol) - this is possible as there is only a finite number of waveforms.

The probability of error, or symbol/bit error rate, is a key performance metric.

The channel decoder uses the redundancy introduced by the channel encoder to correct some or all of the errors made by the demodulator.

In the case of soft demodulation, the demodulator also sends its confidence that a certain symbol has been interpreted correctly.

Source Decoder and Formatter

The source decoder decompresses the compressed source data.

The formatter converts the digital data back into a form usable by the user (i.e. DAC).

Baseband Transmission

NRZ (Non-Return-to-Zero) Encoding

A voltage level is associated with each bit value and transmitted for a fixed amount of time - the symbol time. The inverse is the symbol/baud rate.

As one bit is transmitted per symbol time, the symbol rate is the bit rate.

Bipolar NRZ maps 1 to V and 0 to -V, while unipolar NRZ/on-off keying maps 1 to V and 0 to 0 volts.

However, more than two levels can be used - a group of k bits can be mapped to one of 2^k levels.

Signal Impairments

Assume unipolar NRZ is used below.

Attenuation

Attenuation - loss of signal power; in communications, this is often the loss of signal power over distance (path loss).

It is measured as the ratio of the signal power at the transmitter output, P_tx, to the signal power observed at the receiver input, P_rx:

\eta = \frac{P_{tx}}{P_{rx}}

This is often expressed in decibels:

\eta_{dB} = 10 \cdot \log_{10}(\eta)

If the medium attenuates all frequencies equally, attenuation is a simple division by the attenuation factor.

Unfortunately, the medium does not attenuate all frequencies the same way.

Assuming there is a cut-off frequency above which all frequencies are attenuated completely, the signal degrades as the cut-off frequency is lowered (consider the Fourier series describing a square wave: only the low-frequency sinusoids remain, making the edges less 'sharp').

A more realistic scenario is a low-pass filter.

Thermal Noise

A standard model for thermal noise is additive white Gaussian noise (AWGN), where the noise amplitude n(t) at any instant is a zero-mean, normally distributed random variable with variance \sigma^2, i.e. with probability density:

p(x) = \frac{1}{\sigma\sqrt{2\pi}} \cdot e^{-x^2 / (2\sigma^2)}

The observed signal will be r(t)+n(t)r(t) + n(t).

The noise is usually dependent on temperature.

Passband Transmission

To transmit several data signals simultaneously on a single channel (while avoiding collisions), passband transmission is required.

Data is transmitted by modulating a sinusoidal carrier of a certain frequency; one signal occupies a certain bandwidth around the center frequency.

By choosing center frequencies with sufficient separation, several data signals can be transmitted in parallel in a sufficiently wide frequency band (e.g. FM radio, Wi-Fi bands). This is also called frequency-division multiplexing.

In digital passband modulation, information is modulated onto a sinusoidal carrier of duration T - the symbol duration:

s(t) = A(t) \cdot \cos\big((f_c + f(t)) \cdot t + \phi(t)\big), \quad t \in [0, T]

where A(t) is the amplitude, f(t) is the frequency offset, \phi(t) is the phase, and f_c is the center frequency.

As each symbol has its own functions, there may be (in theory) sudden jumps in amplitude as the waveform moves from one symbol to the next.

There are three forms of digital modulation, each of which varies one of the functions:

Amplitude Modulation (Amplitude Shift Keying) varies A(t):

s_i(t) = A_i \cdot \cos(f_c \cdot t), where A_i is the amplitude corresponding to the i-th symbol, and s_i(t) the waveform for that symbol.

Frequency Modulation (Frequency Shift Keying) varies f(t):

s_i(t) = A \cdot \cos((f_c + f_i) \cdot t), where t \in [0, T] and f_i is the frequency offset for the given symbol.

Phase Modulation (Phase Shift Keying) varies \phi(t):

s_i(t) = A \cdot \cos(f_c \cdot t + \phi_i), where \phi_i is the phase for the i-th symbol.

Quadrature Amplitude Modulation (QAM)

A combination of ASK and PSK, varying amplitude and phase according to the symbol simultaneously.

An amplitude-phase diagram shows some (square) number of constellation points equally spaced, forming a grid.

QAM is usually applied with 16, 64 or 256 constellation points, allowing 4, 6 or 8 bits to be mapped to one QAM symbol. Greater throughput, but a higher error rate.

Synchronization

Sender and receiver need to agree on how long the symbol period is, the exact center frequency, where packets start, and so on.

They have no common time or frequency reference; each has its own local time and frequency reference (derived from a crystal oscillator).

Oscillator frequency depends on manufacturing tolerances, ageing, temperature, pressure etc.

Thus, the receiver cannot assume the sender has the same time and frequency reference, and must:

Oscillator mismatch and Doppler shift can mess up the frequency, and PSK/QAM require tracking signal phase. Synchronizing frequency and phase is called carrier synchronization.

Manchester Encoding ensures there is at least one signal change per bit: a logical 1 is represented by a change from high to low in the middle of a symbol duration, and a logical 0 by a change from low to high. The voltage swings between -V and V.

The receiver samples the incoming signal at a much higher frequency; if a transition occurs in an unexpected sample, the local clock can be adjusted so that the transition falls exactly in the middle of the symbol time.

Frame Synchronization

Ethernet has a preamble, called SYNC, to allow for initial synchronization: a 64-bit sequence of alternating 0s and 1s. It is followed by a start-frame delimiter (SFD). This is required because when the receiver acquires synchronization, it has no idea where in the preamble it is.

How does the receiver know when the frame ends?

Ethernet has length fields, but a bit error could occur in the length field so it cannot be trusted.

Bit stuffing is another option: a protocol has some well-known start and end pattern; the pattern must not be allowed to appear in the payload data.

For example, in HDLC, the flag is 01111110; after a sequence of 5 consecutive 1’s, the sender will insert a zero bit. Thus, if the receiver gets six 1’s, it must be the flag; if it gets 5 1’s followed by a 0, it should remove the 0. Unfortunately, a bit error could occur.

05. Local Area Networks and Ethernet

Misc.

Geometric RV:

LANs

Standards

Standards typically specify these layers:

IEEE 802 Series

IEEE = Institute of Electrical and Electronics Engineers. ISO has adopted several IEEE standards, including:

802.2 and 802.1’s MAC bridging are layer 2 protocols and thus, they are independent of the layer 1 protocols being used (WLAN, MAN etc.).

Physical Topologies

Bus Topology

Rarely used today. All stations are attached via a tap line to the bus. Thus, signals are heard by all stations - it is a broadcast medium, so frames include a destination address. Parallel transmissions may collide, garbling packets.

Due to signal attenuation, the bus length is limited to a few hundred meters at most.

In addition, if the bus cable is damaged, the entire network will stop working.

Star Topology

Stations attached to a central unit via their own private cable. The cable often has separate cables for transmit and receive directions, allowing full-duplex.

The central unit can act as a repeater or bridge/switch. As a repeater, it copies incoming signals to all other outputs, which may congest the network. As a bridge/switch, it copies incoming frames to only where the destination is. The latter allows transmissions to occur in parallel.

There is still a single point of failure, but the impact of a cut cable is much smaller.

Wireless Topologies

Wireless transmission media are much more prone to errors, and stations cannot necessarily hear all stations - the channel is time-varying and the links are volatile and unpredictable.

MAC Fundamentals

These schemes allow a set of distributed stations to efficiently coordinate access to a common channel. More specifically:

It is often regarded as a separate sub-layer between the PHY and link-layer.

Assumptions:

Desirable properties:

Orthogonal MAC Schemes

The behavior of one station does not influence the behavior of other stations. This is done by exclusively allocating resources to individual stations.

FDMA: frequency division multiple access; TDMA: time division; SDMA: space division; CDMA: code division.

FDMA

Each station is given a band of frequencies - a sub-channel they have exclusive transmission access to.

The channel bandwidth is subdivided into N sub-channels, with guard bands acting as safety margins between sub-channels and the fringe of the channel. This reduces interference between adjacent sub-channels.

To receive data, a station must have a separate receiver for each sub-channel, or a single tunable receiver - the latter creates a coordination problem, as the receiver must switch (and tuning takes time) to a channel before data is transmitted onto it.

Performance

If the total available bandwidth is B (bps), station i is assigned approximately 1/N of B (guard bands ignored) on a long-term basis.

The medium access delay is 0, as station i can start the transmission immediately without risk of collision.

If a packet has size B/N bits, transmission takes one second. Thus:

\begin{aligned}
E[\text{Completion Time}] &= E[\text{Transmission Delay}] + E[\text{Access Delay}] \\
&= E[\text{Transmission Delay}] + 0 \\
&= 1\,\text{s}
\end{aligned}

Advantages and Disadvantages

N stations can transmit in parallel, and there is no need for time synchronization between the transmitters.

However, each station either needs N receivers or tuneable receivers, frequency synchronization is needed, and there is no re-use.

In addition, there needs to be some way of allocating sub-channels; if there is a central station in charge of keeping track which channels are being used and processing allocations requests, what happens if it crashes?

Hence, FDMA is good for CBR (constant bit rate) traffic but bad for VBR (variable bit rate) traffic.

TDMA

Time is subdivided into successive super-frames of duration T_SF, with each super-frame subdivided into N subframes (plus guard times to compensate for synchronization errors). Hence, time slots are assigned exclusively, and on a longer-term basis, to a station i for transmission.

Performance

Each station gets the full bandwidth B (bps) for 1/N of the time (ignoring guard times).

If:

Thus:

\begin{aligned}
E[\text{Access Delay}] &= \frac{T_{SF}}{2} = 0.5\,\text{s (on average)} \\
E[\text{Completion Time}] &= E[\text{Access Delay}] + E[\text{Transmission Delay}] \\
&= 0.5\,\text{s} + \frac{1}{N}\,\text{s} \le 1\,\text{s} \quad \text{(less than 1 s for } N > 2\text{)}
\end{aligned}

Thus, on average, TDMA allows starting later and finishing sooner due to the full bandwidth being exclusively available.

Advantages and Disadvantages

Compared to FDMA, asymmetric bandwidth assignments are much easier - one station can use multiple time slots rather than running multiple receivers simultaneously. In addition, TDMA has better completion times on average, and does not require tunable or multiple receivers.

The downsides are that tight time synchronization between stations is required, and there is a high expected access delay on idle systems.

In addition, TDMA also requires signalling and shared state to allocate time slots and so has the same issues.

Hence, TDMA is also good for CBR but bad for VBR traffic.

Random Access Protocols

These protocols do not reserve channel resources or require a central station/shared state. Access to the medium is unpredictable, using randomness to avoid collisions.

ALOHA

N uncoordinated transmitters and one receiver (a base station). A collision occurs if two packets overlap at the receiver.

When a new packet arrives, it is put in a queue. Once the current packet is transmitted, the station checks the queue and transmits the next packet after a random backoff.

If the queue is empty, it calculates the checksum and transmits the frame immediately, then starts an acknowledgement timer.

The receiver sends an immediate ack on successful reception of a packet; if the ack timer expires, the transmitter enters backoff mode:

The choice of random distribution for backoff times plays a key role in controlling the delay, throughput and stability. Often, the distribution depends on the number of collisions seen by the frame.

ALOHA is simple to implement and if the network load is small, new frames are sent immediately - zero access delay, with the full channel bandwidth.

However, if A and B send packets of the same length, there is a vulnerability period two frame times long during which a collision will occur if B sends its frame - one frame time before, and while, A is sending its packet.

ALOHA has good throughput for low loads, but beyond a certain point, the increasing collision rate leads to decreased throughput - it is an unstable protocol.

If all packets have the same length and the packet transmission time is \tau, the vulnerability period is 2\tau.

CSMA - Carrier Sense Multiple Access

These protocols assume that stations can determine the state of the medium (busy or idle) (almost) instantaneously.

This is called carrier sensing or clear channel assessment, and enables listen-before-talk: before transmitting a frame, the station performs the CS operation. If the channel is busy, the station defers the transmission using some strategy, usually with a maximum number of deferrals/backoffs per frame. Note that this does not eliminate collisions completely - a collision can still occur if the time difference between two stations starting a transmission is smaller than the propagation delay.

This works well when the propagation time is small compared to the packet transmission time, as other stations notice a transmission quickly after it starts. This holds in local area networks, but carrier sensing is useless if the sender may have already finished transmitting by the time the receiver detects the start of the frame.

Non-persistent CSMA

After getting a new packet from the higher layer (with an empty queue), the station performs the carrier-sense operation. If the medium is idle, it can transmit immediately but if the medium is busy, it generates a backoff time from a random distribution. Then it can repeat the process until it succeeds or reaches some threshold.

In the case of a collision, a backoff time is chosen and the process repeats. How are collisions detected? One solution is to expect an acknowledgement for each frame.

Performance issue: there is a high probability that the medium will be idle for some time after the previous transmission finishes (especially if the packet is long), lowering utilization.

If it is known that there is a small number of transmitters, using a distribution returning small waiting times reduces the time gaps between packets, increasing throughput.

p-persistent CSMA

A parameter p is known to all stations.

If the medium is busy, it defers until the end of the ongoing transmission.

It then divides time into slots, which should be just large enough to run the following:

Choosing p is an issue: a low value gives stability under high load, but low throughput, as the medium will sit idle after a transmission ends.

1-persistent CSMA or CSMA/CD

If the medium is busy, it defers until the end of the transmission.

When the medium becomes idle, the station sends unconditionally. This avoids idle times between transmissions.

However, if multiple stations start transmitting, a collision occurs and the transmitters detect this immediately - they need to be able to transmit and sense simultaneously. In this case:

Ethernet

Frame Format

| Length  | Name        |
| ------- | ----------- |
| 7       | Preamble    |
| 1       | SOF         |
| 6       | DstAddr     |
| 6       | SrcAddr     |
| 2       | Length/Type |
| 46-1500 | Payload     |
| 4       | FCS         |

Uses Manchester encoding.

Ethernet Addresses

A 48 bit, globally unique address. Each vendor gets its own address range and each Ethernet adapter has its own address.

Addresses are written as six colon-separated bytes in hexadecimal representation.

Special addresses:

Address filtering: a station will throw away any unicast addresses not addressed to itself.

Length/Type field

When L/T is greater than or equal to 0x0600, it is a type field: it indicates the higher-layer protocol encapsulated in the Ethernet frame. Hence, it allows protocol multiplexing - a service for multiple higher-level protocols can be provided through a single lower-level SAP. Some common protocols:

| Type field | Protocol        |
| ---------- | --------------- |
| 0x0800     | IPv4            |
| 0x0806     | ARP             |
| 0x86DD     | IPv6            |
| 0x8863     | PPPoE Discovery |
| 0x8864     | PPPoE Session   |

When L/T is less than or equal to 1500, it indicates the length of the payload; the first two bytes of the payload indicates the type (hopefully).

PHY

There are several different physical layers such as coaxial cables, twisted pair cables, and fibre cables.

Ethernet also supports two fundamentally different topologies: broadcast (almost never used today) and switched Ethernet.

Broadcast Topologies

Switched Topologies

Stations are attached to (possibly cascaded) switches via point-to-point links - a private cable. Hence, no MAC is needed (but is still used) and parallel transmissions are possible, although there may be contention for resources in the switch.

Stations operate in full-duplex mode - each link uses pairs of cables.

This is required for 100 Mbps and higher PHYs - basically all modern Ethernet.

Half-Duplex Ethernet MAC Protocol

CSMA/CD - Carrier-Sense-Multiple-Access with Collision-Detection.

This has very short medium access delays under light load, but the collision rate increases under heavy load with many stations - do not operate Ethernet beyond 50-60% load.

Assumes that parallel transmissions lead to collisions and that stations can monitor the medium for collisions while transmitting - this works by monitoring the voltage on the medium and checking whether it exceeds the voltage applied by the transmitter.

Backoff time computation

The computation uses a truncated binary exponential backoff algorithm.

A random integer is uniformly generated in the range [0, 2^{min(10, coll)} - 1] - this is the backoff window - and is multiplied by the slot time.

The slot time is a common parameter that is just large enough to cover the maximum round-trip time and processing delays such that if two stations start transmitting one slot apart, the later one will notice.

The algorithm starts by assuming the number of contenders is small - if this is the case, you do not need many windows. If this fails many times, then it is likely that there are many contenders, so you need to increase the number of windows to reduce the probability of collision.
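The backoff computation can be sketched as follows, assuming the 10 Mbps slot time of 51.2 µs (512 bit times):

```python
import random

SLOT_TIME = 51.2e-6  # seconds: 512 bit times at 10 Mbps

def backoff_time(collisions):
    """Truncated binary exponential backoff: pick a random number of
    slots from a window that doubles per collision, capped at 2^10."""
    window = 2 ** min(10, collisions)
    return random.randint(0, window - 1) * SLOT_TIME
```

After one collision the station waits 0 or 1 slot; after 10 or more, anywhere from 0 to 1023 slots.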

Minimum Frame Size/Slot Time

The minimum frame size (64 bytes for 10 Mbps) is chosen to ensure the transmitter can detect collisions - it can only detect collisions while it is still transmitting.

NB: the preamble and SOF are not included in the frame size calculation.

The maximum cable length and the processing time of repeaters/hubs are defined in the standard, so it is possible to construct the worst case:

Bridges and Switches

Often, existing LANs need to be coupled together - this can be done at several different levels.

| Layer       | Name of device         |
| ----------- | ---------------------- |
| PHY         | Repeater/Hub           |
| MAC         | Bridge/Layer-2 switch  |
| Network     | Router                 |
| Application | Gateway                |

Marketing departments do not necessarily follow this terminology.

Repeaters and Hubs

Repeater

Repeaters are dumb devices which simply amplify a signal on the analog level. They amplify existing noise and add their own noise to the signal, but have very small delays and are completely protocol- and modulation-agnostic.

Regenerating Repeater

These demodulate an incoming signal on a symbol-per-symbol basis and modulate it again.

They do no error checking, so regeneration can introduce errors.

Hub

Hubs are centralized repeaters - they broadcast signals coming in on one port to all other ports. Only one station can transmit at a time.

Hubs may or may not be regenerative.

Basically a bus, except in a star topology, so a single broken cable won’t kill the entire network.

Bridges

Bridges interconnect LANs on the MAC layer. They mostly connect LANs of the same type (e.g. Ethernet), or at least ones with the same MAC address structure. Bridges can be cascaded.

Bridges understand/interpret fields related to the MAC protocol (source/destination address fields); repeaters and hubs do not.

Basic Operation

Advantages

Layer-2 Switches

Stations are attached to the switch via point-to-point/private links with separate transmit/receive line (full-duplex). Hence, there is no broadcast medium so no need for a MAC. Instead, stations contend for the resources of the switch.

Switches transmit frames only to the correct output port, and can process frames in parallel - hence, they can increase network capacity as compared to LANs with a broadcast medium.

Frames arriving in parallel to the same destination are buffered.

Switches are transparent to stations.

Comparison to hub

06. IP and Related Protocols

The Internet

A packet-switched network of networks. The networks and links follow the end-to-end principle:

IPv4

Terminology:

Best-Effort, Datagram Delivery:

Packet Format

Big endian byte ordering.

| Bytes | Length (bits) | Name                |
| ----- | ------------- | ------------------- |
| 0-3   | 4             | Version             |
|       | 4             | Hdr Len             |
|       | 6             | TOS/DSCP            |
|       | 2             | Unused              |
|       | 16            | Total Length        |
| 4-7   | 16            | Identification      |
|       | 3             | Flags               |
|       | 13            | Fragment Offset     |
| 8-11  | 8             | Time-To-Live        |
|       | 8             | Protocol Type       |
|       | 16            | Header Checksum     |
| 12-15 | 32            | Source Address      |
| 16-19 | 32            | Destination Address |
| 20+   | 32n           | Options + Padding   |

The data follows the header.

Version (4)

Version of IP protocol. 4 for IPv4.

Header Length (4)

Length of header in multiples of 4 bytes. Without options, the header is 20 bytes long so the value would be 5.

TOS/DSCP (2)

Stands for Type of Service/DiffServ Code Point. Allows the priority for each packet to be defined (usually by the ISP depending on how much you pay) - mostly ignored.

Fill with zero.

Unused (2)

Fill with zero.

Total Length (16)

Total length (in bytes) of the datagram - required since some MAC layers may add padding to meet minimum length requirements.

This may be modified during fragmentation and reassembly.

Identification (16)

Identifies each IP payload accepted from higher layers for a given interface - a sequence number.

Flags (3)

Contains two flags for fragmentation and reassembly (DF = don’t fragment; MF = more fragments).

Fragmentation Offset (13)

Gives the offset of the current fragment within the entire datagram, in multiples of 8 bytes.

Time-to-Live (8)

Last resort to deal with loops - IP does not specify routing protocols and so there is a possibility that loops can occur.

This value is the upper limit on the number of routers a packet can traverse, and is decremented by each router the packet passes through (hence routers must recalculate the checksum). Once the TTL value reaches 0, the packet is discarded and the sender is notified via an ICMP message.

TTL is typically 32 or 64.

Protocol Type (8)

Determines which higher-level protocol generated the payload - provides different SAPs to allow protocol multiplexing.

| Value | Protocol                |
| ----- | ----------------------- |
| 0x01  | ICMP                    |
| 0x02  | IGMP                    |
| 0x04  | IP-in-IP Encapsulation  |
| 0x06  | TCP                     |
| 0x11  | UDP                     |

Header Checksum (16)

Checksum for the IP header (NOT the data) - for some types of data, a few bad bits may not matter.

Source/Destination Address (32)

Indicates the initial sender and final receiver of the datagram. Internally structured into two sections: the network and host ID. The network ID refers to some local network, and the host ID must be unique within that network.
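As a sketch, the fixed part of the header described above can be unpacked with Python's `struct` module (the field names are my own):

```python
import struct

def parse_ipv4_header(data):
    """Unpack the fixed 20-byte IPv4 header (options, if any, ignored)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack('!BBHHHBBHII', data[:20])
    return {
        'version': ver_ihl >> 4,
        'header_len': (ver_ihl & 0x0F) * 4,            # stored in 4-byte units
        'total_length': total_len,
        'identification': ident,
        'flags': flags_frag >> 13,                      # DF = 0b010, MF = 0b001
        'fragment_offset': (flags_frag & 0x1FFF) * 8,   # stored in 8-byte units
        'ttl': ttl,
        'protocol': proto,                              # e.g. 0x06 = TCP
        'header_checksum': checksum,
        'src': src,
        'dst': dst,
    }
```

The `!` prefix selects big-endian (network) byte order, matching the note above.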

Routing and Forwarding

Routing/Forwarding Tables

IP routers have several network interfaces - these are called ports. When a router receives a packet on some input port, it looks at the DestinationAddress field to determine the output port using the forwarding table.

The forwarding table is a table of all networks the router knows about, identified by their network ID, and the output port to send the packet through to reach that network. A table lookup is used to then select the output port for every incoming packet.

Notes:

Important: a host address is tied to its location in the network; there is no mobility support - if a host switches to another network, it obtains another address, breaking ongoing TCP connections.

Classless Inter-Domain Routing

How many bits should be allocated to the network ID? Classful addressing used to be used, allocating exactly 8, 16 or 24 bits to the network ID.

CIDR was introduced in 1993 and is now mandatory, and allows an arbitrary number of bits to be allocated to the network.

Netmask

The netmask specifies which bits belong to the network ID. It is a 32 bit value, with the leftmost k bits being 1s, and the remaining bits 0. A /k netmask will have k bits allocated to the network ID. The network ID is calculated using a logical AND with the IP address.
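A minimal sketch of the network ID computation (the helper name is my own):

```python
def network_id(ip, prefix_len):
    """Extract the network ID of an IPv4 address under a /k netmask."""
    octets = [int(o) for o in ip.split('.')]
    addr = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
    # /k netmask: leftmost k bits set to 1, remaining bits 0.
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    net = addr & mask  # logical AND keeps only the network bits
    return '.'.join(str((net >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```

For example, `network_id('192.168.37.9', 16)` gives `'192.168.0.0'`.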

Private host addresses

There are two special host addresses:

There are also some reserved IP blocks:

| Block          | Usage                               |
| -------------- | ----------------------------------- |
| 10.0.0.0/8     | Private-use IP networks             |
| 127.0.0.0/8    | Host loopback network               |
| 169.254.0.0/16 | Link-local for point-to-point links |
| 172.16.0.0/12  | Private-use IP networks             |
| 192.168.0.0/16 | Private-use IP networks             |

These addresses can be used within the provider network; packets with private addresses are not routed in the public internet.

The loopback address is usually denoted as 127.0.0.1, but any valid address in 127.0.0.0/8 can be used.

Simplified Packet Processing

IP input:

IP output:

Forwarding Table Contents and Lookup (rough approximation)

Each entry contains:

A router only knows the next hop, not the entire remaining route. The forwarding table lookup has three stages:
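The lookup amounts to a longest-prefix match: among all entries whose network ID matches the destination, the most specific (longest netmask) entry wins. A linear-scan sketch (real routers use tries or TCAMs for speed):

```python
def lookup(table, dest):
    """table: list of (network_id, prefix_len, next_hop), with addresses
    as 32-bit ints. Returns the next hop of the longest matching prefix."""
    best = None
    for net, plen, next_hop in table:
        mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
        if dest & mask == net and (best is None or plen > best[0]):
            best = (plen, next_hop)
    return best[1] if best else None
```

A /0 entry (mask of all zeros) matches everything and so acts as the default route.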

In End-Hosts

Forwarding tables in end hosts generally contain two or more entries:

In Routers

Most routers at the ‘border’ of the Internet (close to the customer) usually have a small forwarding table, often filled with other networks belonging to the same owner. For all other networks, they rely on the default router.

Core routers:

Fragmentation and Reassembly

The link-layer technologies used by IP have different maximum packet sizes, called the maximum transmission unit. For several reasons, IP abstracts these concerns:

Hence, IP offers its own maximum message length of 65515 bytes: (2^{16} - 1) - 20, where 20 is the minimum IP header size.

The overall process is as follows:

All fragments belonging to the same message have:

The sender can set the DF (don’t fragment) bit in the IP header to forbid fragmentation by intermediate routers. If the outgoing link’s MTU is too small, the intermediate router may return an ICMP message.
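As a sketch, fragmentation for a given MTU (assuming a 20-byte header and no options):

```python
def fragment(payload_len, mtu, ident):
    """Split an IP payload into fragment descriptors for the given MTU.
    Every fragment except the last carries a multiple of 8 data bytes."""
    max_data = (mtu - 20) // 8 * 8
    fragments, offset = [], 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        fragments.append({
            'identification': ident,         # same for all fragments
            'fragment_offset': offset // 8,  # in units of 8 bytes
            'MF': int(offset + size < payload_len),
            'total_length': 20 + size,
        })
        offset += size
    return fragments
```

For a 4000-byte payload and a 1500-byte MTU this yields three fragments with offsets 0, 185 and 370 (in 8-byte units), the last with MF = 0.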

Address Resolution Protocol

IP datagrams are encapsulated into Ethernet frames, and Ethernet stations pick up a packet only if the destination MAC address matches its own. Hence, stations must know which MAC address a given IP address refers to.

ARP allows a host to map a given IP address to a MAC address. ARP is dynamic:

This trades the downsides of static configuration for the complexity of running a separate protocol.

Basic Operation

ARP makes no retransmissions if a request is not answered.

If a sender wants to send an IP packet to a local destination, it first checks its ARP cache for a binding for the IP address. If there is none, it starts the address resolution procedure. Once this finishes, it encapsulates the IP packet in an Ethernet frame directed to that MAC address.

Entries in the ARP cache are soft-state: they get discarded some fixed time (usually 20 minutes) after they are inserted into the cache. Some implementations restart the timer after using the entry.

This ensures that stale information gets flushed out (much more reliable than relying on the device of interest to notify everyone that its binding has changed).

Frame format

ARP is a protocol working at a higher level than Ethernet, so although the destination and source MAC addresses are included in the Ethernet header, they need to be included again inside the ARP packet (the Ethernet payload).

Internet Control Message Protocol

Allows routers or the destination host to inform the sender of error conditions such as when:

ICMP messages are encapsulated into regular IP datagrams and, like IP, ICMP has no error control mechanisms.

ICMP messages are not required, and are often filtered out by firewalls even if they are sent. Thus, the sending host must not rely on ICMP messages.

Message Format

Some common type/code Combinations

| Type | Code | Meaning                           |
| ---- | ---- | --------------------------------- |
| 0    | 0    | Echo reply (ping)                 |
| 3    | 0    | Destination network unreachable   |
| 3    | 1    | Destination host unreachable      |
| 3    | 2    | Destination protocol unreachable  |
| 3    | 3    | Destination port unreachable      |
| 3    | 4    | Fragmentation required but DF set |
| 3    | 6    | Destination network unknown       |
| 3    | 7    | Destination host unknown          |
| 4    | 0    | Source-quench                     |
| 8    | 0    | Echo request (ping)               |
| 11   | 0    | TTL expired in transit            |
| 11   | 1    | Fragment reassembly time exceeded |

Destination Unreachable (type=3):

Source-quench (type=4, code=0): sent if an IP router drops a packet due to congestion. The intention is to let the source host throttle itself, but sending the message adds more work to the router.

TTL expiration (type=11, code=0): sent if an IP router drops a packet because its TTL reached zero. tracert uses this, incrementing the TTL each time it sends a packet.

Fragment reassembly timeout (type=11, code=1): sent if not all fragments of a message are received within a timeout. This invites the higher-level protocol to re-transmit the message.

07. Introduction to Routing

Network Layer Overview

Names of data units:

Responsibilities of layers:

Layering: each layer takes data from above, adds header information then passes the new data unit to the layer before.

Possible services of a general network layer:

The Internet's network layer, IPv4, guarantees none of these.

Routing Overview

Routing is a network-wide process that determines the end-to-end paths packets take from source to destination.

The network needs to work out the path from the sending to the receiving router automatically, in a dynamic environment.

A router uses its routing table to forward a packet to some output link.

The router’s switch fabric - its physical architecture - varies. Some implementations put the network interfaces/line cards close to the CPU to decrease processing times. The crossbar architecture is used to forward packets to outgoing interfaces more quickly.

Hierarchical Overview

There are inter-AS and intra-AS routing algorithms, where AS stands for an autonomous system.

Each ISP is an AS, and some corporations and universities are also ASs.

Each AS is a set of routers and networks managed by a single organization. The routers exchange information via a common routing protocol - the intra routing protocol.

The only inter-AS routing protocol in use is BGP. The RIP and OSPF protocols are the two main intra-AS routing protocols. All three are load-insensitive.

There are three types of AS:

Forwarding vs Routing

Forwarding and routing are two independent processes.

Routing determines the path to take:

Forwarding transfers packets hop-by-hop:

The forwarding table:

Classification of Routing Algorithms

A routing algorithm solves a routing problem under ideal assumptions. A routing protocol embeds a routing algorithm in a real network environment; it operates in a distributed setting and must work around failure cases.

Classifications

Static/dynamic

In static algorithms, routes change very slowly over time, mostly due to human intervention.

Dynamic algorithms re-compute routes in response to topology or traffic changes. The computation may occur periodically or in response to changes.

Global/Decentralized

Global algorithms assume each node has global knowledge/state information about the network.

Decentralized algorithms assume no node has complete state information about the network; nodes exchange information with their neighbors.

Load-sensitive/-insensitive

Load-sensitive algorithms assume link costs vary dynamically depending on the level of congestion in the link.

Internet routing protocols such as RIP, OSPF and BGP are load-insensitive and assign a static cost to each link.

08. Introduction to Routing - Link-State Routing

Modelling

Routers are modelled as nodes and links are modelled as edges. Edges have associated metrics:

Dijkstra calculates the least-cost path.

Assumptions

Each router has a complete network picture: topology, link costs etc.

To do this, each router reliably floods information about its neighbors to every other router. All routers should have consistent information.

Flooding can sometimes cause broadcast storms; controlled flooding or spanning-tree broadcasts are used to prevent this.

Dijkstra’s Algorithm

After flooding, each router independently calculates the least-cost path from itself to every other router using Dijkstra’s algorithm and generates a forwarding table for every destination.

Definitions:

Until all nodes are in S:

By running this algorithm, a forwarding table can be generated, mapping each destination to the next hop.

Complexity:

Since routers have a complete network picture, if one router finds a cheaper path through node w, it is likely that other routers will also use a path through w. If the algorithm is load-sensitive, this will cause the link cost of paths through w to increase, repeating the cycle.

09. Introduction to Routing - Distance Vector Algorithm

A dynamic, decentralized and asynchronous routing algorithm.

Both RIP and BGP use the Bellman-Ford (distance-vector) algorithm.

Bellman-Ford Equation

If d_x(y) is the cost of the least-cost path from x to y, then:

d_x(y) = min_v { c(x, v) + d_v(y) }, where v ranges over the neighbors of x.

Distance-Vector Algorithm

D_x(y) is an estimate of the cost of the least-cost path from x to y.

The distance vector, \vec{D_x} = [ D_x(y) : y \in N ], maintains a list of estimates for each node in the graph, where N is the set of all destinations.

Each node x knows the cost c(x, v) to each neighbor v, its own distance vector, and its neighbors’ distance vectors.

In the algorithm, each node periodically sends its own distance vector estimates to its neighbors. When a node receives a new estimate, it updates its own distance vector and, if it changes, notifies its neighbors. The distance estimates will eventually converge to the actual least cost, d_x(y).

The algorithm is iterative and asynchronous: each local iteration is triggered by either a local link cost change or a DV update message from a neighbor.

The algorithm is distributed: each node notifies its neighbors only when its DV changes, and each node only knows the next hop, not the entire path.

Hence, each node is in one of three states:

Pseudo-code

distance_vector = {}
neighbor_distance_vectors = {}

def init():
  for y in destinations:
    distance_vector[y] = cost(self, y)  # INFINITY if y is not a neighbor

  for w in neighbors:
    neighbor_distance_vectors[w] = {y: INFINITY for y in destinations}
    send(w, distance_vector)

def loop():
  while True:
    get_update_or_link_cost_change()

    # Bellman-Ford update for every destination
    for y in destinations:
      distance_vector[y] = min(cost(self, v) + neighbor_distance_vectors[v][y] for v in neighbors)

    if change_occurred:
      for w in neighbors:
        send(w, distance_vector)

Count-to-infinity problem

Good news travels fast: if a link cost is lowered, this change propagates through the network quickly.

However, there is an issue when a link cost increases. For example, assume the direct cost x \rightarrow z is greater than the path cost x \rightarrow y \rightarrow z. If the link cost x \rightarrow y increases such that the new best path from x to y is x \rightarrow z \rightarrow y, an issue occurs because the algorithm is asynchronous: z still thinks the best path from z to y is z \rightarrow x \rightarrow y. Hence, a routing loop is generated: x \rightarrow z \rightarrow x \rightarrow z \rightarrow \dots

Initial state: link costs of c(x, y) = 1, c(y, z) = 2, c(z, x) = 20.

| From \ To | x     | y | z     |
| --------- | ----- | - | ----- |
| x         | 0     | 1 | 3 (y) |
| y         | 1     | 0 | 2     |
| z         | 3 (y) | 2 | 0     |

At t_0, y detects c(x, y) = 40. y thinks the best route to x is now through z:

D_y(x) = min(cost(y, x) + D_x(x), cost(y, z) + D_z(x)) = min(40 + 0, 2 + 3) = 5

| From \ To | x     | y | z     |
| --------- | ----- | - | ----- |
| x         | 0     | 1 | 3 (y) |
| y         | 5 (z) | 0 | 2     |
| z         | 3 (y) | 2 | 0     |

At t_1, z receives an update from y and calculates D_z(x):

D_z(x) = min(cost(z, x) + D_x(x), cost(z, y) + D_y(x)) = min(20 + 0, 2 + 5) = 7

| From \ To | x     | y | z     |
| --------- | ----- | - | ----- |
| x         | 0     | 1 | 3 (y) |
| y         | 5 (z) | 0 | 2     |
| z         | 7 (y) | 2 | 0     |

Thus, at each iteration, D_z(x) and D_y(x) increase by 2 (the link cost between y and z) and eventually converge:

| From \ To | x      | y | z     |
| --------- | ------ | - | ----- |
| x         | 0      | 1 | 3 (y) |
| y         | 22 (z) | 0 | 2     |
| z         | 20     | 2 | 0     |
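The iteration above can be reproduced with a small simulation (a sketch; the synchronous sweep order is a simplification of the asynchronous protocol):

```python
# Link costs after the change: c(x,y)=40, c(y,z)=2, c(z,x)=20.
COSTS = {('x', 'y'): 40, ('y', 'z'): 2, ('z', 'x'): 20}

def c(a, b):
    return COSTS.get((a, b)) or COSTS.get((b, a))

nodes = ['x', 'y', 'z']
# D[a][b]: a's current estimate of its least cost to b (pre-change values).
D = {'x': {'x': 0, 'y': 1, 'z': 3},
     'y': {'x': 1, 'y': 0, 'z': 2},
     'z': {'x': 3, 'y': 2, 'z': 0}}

iterations = 0
changed = True
while changed:
    changed = False
    iterations += 1
    for a in nodes:
        for dest in nodes:
            if a == dest:
                continue
            # Bellman-Ford update over all neighbors of a.
            best = min(c(a, v) + D[v][dest] for v in nodes if v != a)
            if best != D[a][dest]:
                D[a][dest] = best
                changed = True
```

Here `D['y']['x']` climbs 5, 9, 13, and so on before settling at 22, while `D['z']['x']` settles at 20 - many iterations for a single cost change.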

Poisoned reverse

If a node x routes packets to z through y, x will lie to y, telling it that D_x(z) = \infty.

Before the link cost change in the example above, z lies to y, saying D_z(x) = \infty as long as it uses y to route to x. In y’s DV table:

| From \ To | x      | y | z      |
| --------- | ------ | - | ------ |
| x         | 0      | 1 | \infty |
| y         | 1      | 0 | 2      |
| z         | \infty | 2 | 0      |

At t_0, y detects c(x, y) = 40:

D_y(x) = min(cost(y, x) + D_x(x), cost(y, z) + D_z(x)) = min(40 + 0, 2 + \infty) = 40

| From \ To | x      | y | z      |
| --------- | ------ | - | ------ |
| x         | 0      | 1 | \infty |
| y         | 40     | 0 | 2      |
| z         | \infty | 2 | 0      |

At t_1, z receives an update from y and calculates D_z(x):

D_z(x) = min(cost(z, x) + D_x(x), cost(z, y) + D_y(x)) = min(20 + 0, 2 + 40) = 20

| From \ To | x      | y | z      |
| --------- | ------ | - | ------ |
| x         | 0      | 1 | \infty |
| y         | 40     | 0 | 2      |
| z         | 20     | 2 | 0      |

At t_2, z updates y, now without lying. y calculates D_y(x):

D_y(x) = min(cost(y, x) + D_x(x), cost(y, z) + D_z(x)) = min(40 + 0, 2 + 20) = 22

| From \ To | x      | y | z      |
| --------- | ------ | - | ------ |
| x         | 0      | 1 | \infty |
| y         | 22 (z) | 0 | 2      |
| z         | 20     | 2 | 0      |

Hence, the lie makes it converge faster.

However, poisoned reverse can make the situation worse and cause routing loops. Example:

\begin{aligned} cost(w, y) &= 1 \\ cost(w, z) &= 1 \\ cost(x, y) &= 4 \\ cost(x, z) &= 50 \\ cost(y, z) &= 3 \end{aligned}

Consider only the entries to x:

\begin{aligned} cost(y, x) &= 4 \ (direct) \\ cost(w, x) &= 5 \ (w \rightarrow y \rightarrow x) \\ cost(z, x) &= 6 \ (z \rightarrow w \rightarrow y \rightarrow x) \end{aligned}

So z tells w that D_z(x) = \infty and w tells y that D_w(x) = \infty; each lies to the node it relies on for routing, ensuring that node does not route back through it.

At t_0, y detects cost(x, y) = 60 and recalculates its vector table: D_y(x) is the minimum of:

Hence, D_y(x) = 9 (through z). Now, y notifies w that D_y(x) = 9 and z that D_y(x) = \infty.

Now, w recomputes D_w(x):

Hence, D_w(x) = 10 (through y). Now, w notifies z that D_w(x) = 10 and y that D_w(x) = \infty.

Now, z recomputes D_z(x):

Hence, D_z(x) = 11 (through w). Now, z notifies y that D_z(x) = 11 and w that D_z(x) = \infty. Now, y recomputes D_y(x):

Hence, D_y(x) = 14 (through z). The process repeats.

The computed path from y to x is thus y \rightarrow z \rightarrow w \rightarrow y \rightarrow \dots; a routing loop occurs.

To fix this, RIP limits the maximum cost of a path to 15.

Message complexity:

Speed of convergence:

Robustness - behavior when router malfunctions:

10. NAT, IPv6, RIP, OSPF, BGP

NAT

Motivations:

Each device has an internal IP address, but packets exiting the network have their source address transformed into a single public IP address.

Question: how does the router know which host to forward returning traffic to, if all devices share the same public IP address?

A table maps each (external IP, external port) tuple to an (internal IP, internal port) tuple. When a device sends a packet, the router randomly chooses an external port to use, and when the response packet arrives, it simply maps this back to the device’s internal IP and port.

The port number is a 16 bit field, so it theoretically allows over 60,000 simultaneous connections.
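A sketch of the mapping table (the public address and helper names are illustrative, not from the notes):

```python
import random

PUBLIC_IP = '203.0.113.7'  # example/documentation address

nat_table = {}   # (public ip, public port) -> (internal ip, internal port)
reverse = {}     # the opposite direction, for outgoing packets

def translate_outgoing(internal_ip, internal_port):
    """Rewrite an outgoing (ip, port) pair; allocate a public port once."""
    key = (internal_ip, internal_port)
    if key not in reverse:
        port = random.randint(1024, 65535)
        while (PUBLIC_IP, port) in nat_table:   # avoid port collisions
            port = random.randint(1024, 65535)
        nat_table[(PUBLIC_IP, port)] = key
        reverse[key] = (PUBLIC_IP, port)
    return reverse[key]

def translate_incoming(public_port):
    """Map a returning packet back to the internal host, if known."""
    return nat_table.get((PUBLIC_IP, public_port))
```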

Controversy:

IPv6

Motivations:

Datagram Format

IPv6 is simpler; no fragmentation, no checksums, fixed length (40 bytes) header:

Transition

Old devices only support IPv4, and some routers cannot be upgraded. Hence, ‘flag days’, where the world switches protocol on some specified day, will not work.

With a dual-stack approach, all IPv6 nodes also have complete IPv4 implementations, allowing packets to be transformed in both directions as they are routed. This loses IPv6- or IPv4-specific fields.

Another approach is tunnelling: stuff the whole IPv6 datagram into the payload of an IPv4 datagram. If the packet is small enough and DF is set, an IPv6 router can then parse the payload and forward it as an IPv6 packet.

Due to DHCP, CIDRised addresses and NAT partially solving the IP address shortage problem (in the short term), IPv6 adoption has been slow.

Routing in the Internet

Flat networks don’t scale: if you have millions of hosts, storing routing information for each requires large amounts of memory. If link-state routing is used, broadcasting LS updates takes too much bandwidth, and if distance-vector routing is used, it will never converge.

Hierarchical Routing

Hierarchical routing aggregates routers into autonomous regions, where routers belong to the same organization and run the same intra-AS routing protocol.

Gateway routers have a direct link to a router in another AS.

The forwarding table is configured by both intra- and inter-AS routing algorithms:

Routing between ASes

Case 1: a single gateway router to only one other AS:

Case 2: two or more physical links to other ASes (typically a transit AS):

RIP - Routing Information Protocol

Uses the distance vector algorithm.

Distance metric:

To determine subnets, detach each interface from its host/router to create islands of isolated networks; the interfaces are the endpoints of the isolated networks, called subnets.

RIP advertisements:

Example:

w-- A --x-- D --y-- B-- ... --z
    |
    C

The routing table in D would have:

| Dest. net | Next router | Num. hops |
| --------- | ----------- | --------- |
| w         | A           | 2         |
| y         | B           | 2         |
| z         | B           | 7         |
| x         | NA          | 1         |

If A then advertises its own routing table where the cost to z (through C) is 4, D would use the Bellman-Ford equation and update its routing table to route requests to z through A with a cost of 5.

After 180 seconds of no advertisements, the neighbor/link is declared dead; routes going through that neighbor are invalidated and new advertisements are sent to neighbors. Through this, the link failure information propagates through the entire network.

Differences between RIP and the DV Algorithm

OSPF - Open Shortest Path First

Advanced features not in RIP:

OSPF ASes are configured into areas. Within an area:

There is one backbone area which links areas together. In the backbone:

BGP - Border Gateway Protocol

The de facto standard. Provides each AS with the ability to:

Pairs of routers - BGP peers - exchange routing information over TCP connections (called BGP sessions).

eBGP sessions allow routers in two different ASes to exchange reachability information - when a router advertises a prefix, it promises that it can forward any datagrams destined to that prefix. The receiver can then create an entry for that prefix in its forwarding table and re-advertise the same information to others via iBGP (between routers in the same AS) or eBGP sessions.

The advertisement is a combination of the prefix and attributes. Two important attributes are:

Route selection

Routers may learn about more than one route to a given prefix, so it uses elimination rules to pick one:

  1. Local preference value attribute (policy decision)
  2. Shortest AS-PATH (DV algorithm)
  3. Closest NEXT-HOP router (hot potato routing)
  4. Additional criteria

Routing policy can affect routing. For example, a keep-silent policy could occur if X is a dual-homed (e.g. for redundancy) customer network connected to provider networks B and C; it will not want to route traffic from B to C and hence will not advertise to B a route to C, or vice versa.

11. Reliable Data Transfer - Error Detection and Correction

Transmission errors can occur due to factors such as:

To correct:

Error detection

Use redundancy bits, sometimes called error detection and correction bits.

These bits can be used to detect that an error has occurred; however, failing to detect an error does not guarantee that none occurred.

Parity Check

This is the simplest way of detecting an error: append a single parity bit to the end of a block of data.

2-D Parity Check

Divide bits into a matrix and use parity bits for each column and row, plus a parity bit for the parity bits (number of 1’s for the row and column parity bits).

Hence, it is possible to determine the bit which was flipped and hence correct single-bit errors.

This is called forward error correction - the receiver can detect and correct errors without needing to transmit a message to the sender.
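A sketch of 2-D parity generation and single-bit correction:

```python
def parity_2d(rows):
    """Compute row and column parity bits for a matrix of 0/1 bits."""
    row_parity = [sum(r) % 2 for r in rows]
    col_parity = [sum(col) % 2 for col in zip(*rows)]
    corner = sum(row_parity) % 2  # parity bit for the parity bits
    return row_parity, col_parity, corner

def correct_single_error(rows, row_parity, col_parity):
    """Find the row and column whose parity fails; their intersection
    is the flipped bit."""
    bad_rows = [i for i, r in enumerate(rows) if sum(r) % 2 != row_parity[i]]
    bad_cols = [j for j, col in enumerate(zip(*rows)) if sum(col) % 2 != col_parity[j]]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        rows[bad_rows[0]][bad_cols[0]] ^= 1  # flip the erroneous bit back
    return rows
```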

Internet Checksum

Used for the IPv4 header checksum and the UDP data checksum (optional in IPv4, mandatory in IPv6). The UDP checksum covers part of the IP header (source/destination addresses, plus other fields), the UDP header and the data.

Segment contents are treated as a sequence of 16-bit integers.

The checksum is the wraparound sum of the integers (ones complement):

Once all integers have been summed, invert the bits.

The receiver will sum all segments, plus the checksum, and check that all the bits are 1.

This scheme is considerably less effective than CRC, but also very simple to implement.

Why do IP (not v6), UDP and TCP even need checksums?

def ones_complement_addition(shorts):
  # Wraparound addition of 16-bit words: a carry out of bit 16
  # is folded back into the sum.
  result = 0
  for short in shorts:
    result += short
    if result > 0xFFFF:
      result -= 0xFFFF  # drop the carry bit and add 1
  return result

def checksum(shorts):
  # Invert the bits of the wraparound sum.
  return 0xFFFF - ones_complement_addition(shorts)

def check_checksum(shorts, checksum):
  # Data plus checksum should sum to all 1s (0xFFFF).
  return 0xFFFF == (ones_complement_addition(shorts) + checksum)

Cyclic Redundancy Check (CRC)

Assumptions:

The sender chooses r additional bits to append to the data such that the resulting d+r bits are divisible by the generator using modulo-2 arithmetic.

The receiver then divides the d+r bits by the generator; if there is no remainder, no error is detected.

Modulo-2 arithmetic:

The D bits are shifted left by r bits, so T = D \cdot 2^r \oplus F.

This is divisible by the divisor: hence, D \cdot 2^r \oplus F = nG.

\oplus F on both sides: D \cdot 2^r = nG \oplus F

Hence, F is the remainder of \frac{D \cdot 2^r}{G} (using modulo-2 arithmetic).

Example

G = 11 0101, D = 10 1000 1101, r = 5

Initially, set the F bits to zero: T = 101 0001 1010 0000.

Divide T = 101 0001 1010 0000 by G = 11 0101:

10 1000 XOR 11 0101 = 01 1101. The divisor and dividend have an equal number of bits, so the most significant bit of the quotient is 1.

Get rid of the leading 0, then bring down the next bit of T:

11 1011 XOR 11 0101 = 00 1110. The second MSB of the quotient is 1 again.

Get rid of the leading 0’s, then bring down the next bit of T:

1 1101 XOR 11 0101: this will not fit, so add a 0 as the next MSB of the quotient. Bring down the next bit of T:

11 1010 XOR 11 0101 = 00 1111

Continuing this way, the quotient is 11 0101 0110 with a remainder of 1110. Thus, F = 01110 and the message is T = 101 0001 1010 1110.
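The division can be automated; this sketch reproduces the example above (F = 01110):

```python
def crc_remainder(data, gen, r):
    """Remainder of data * 2^r divided by gen, using modulo-2 (XOR)
    arithmetic, exactly as in the long division above."""
    dividend = data << r
    width = gen.bit_length()
    for shift in range(dividend.bit_length() - width, -1, -1):
        if dividend & (1 << (shift + width - 1)):
            dividend ^= gen << shift  # the leading bit fits: XOR it away
    return dividend

D, G, r = 0b1010001101, 0b110101, 5
F = crc_remainder(D, G, r)   # 0b01110
T = (D << r) | F             # the transmitted message
```

The receiver repeats the division on T; a zero remainder means no error was detected.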

Other notes

Forward Error Correction

Useful on wireless applications with propagation delay and high bit error rates.

Messages are divided into blocks of k-bits called datawords; the FEC encoder generates n-bit codewords.

Hamming distance: the distance d(v_1, v_2) between two n-bit binary sequences is the number of bits in which v_1 and v_2 disagree (assuming they are the same length).

Error correction can be done by finding the minimum distance from the received codeword to any valid codeword (and thus dataword). This fails when it is an equal distance away from multiple valid codewords.

Useful properties for the scheme:

k-error correcting: the minimum Hamming distance between any two valid codewords is at least 2k + 1.
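A sketch of minimum-distance decoding:

```python
def hamming_distance(v1, v2):
    """Number of bit positions in which two equal-length sequences disagree."""
    return sum(b1 != b2 for b1, b2 in zip(v1, v2))

def decode(received, codebook):
    """Pick the valid codeword closest to the received word; a tie means
    the error cannot be corrected unambiguously."""
    best = min(hamming_distance(received, cw) for cw in codebook)
    candidates = [cw for cw in codebook if hamming_distance(received, cw) == best]
    return candidates[0] if len(candidates) == 1 else None
```

For example, with the 3-bit repetition code {000, 111} (minimum distance 3 = 2(1) + 1, so 1-error correcting), any single flipped bit is corrected, while a word equidistant from both codewords cannot be.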

12. Reliable Data Transfer - ARQ Protocols

Lost packets and out of order packets: acknowledgement and retransmission required.

Reliable data transfer protocols based on retransmission are known as Automatic Repeat reQuest (ARQ) Protocols. These include:

Reliable Data Transfer (RDT) Protocols

RDT 1.0: Perfectly-Reliable

A perfectly reliable channel with:

RDT 2.0: Bit Errors

All packets will arrive at the receivers, but the packet may be corrupted. Checksumming/CRC required for error detection.

Once the sender transmits a packet, it waits for a NAK/ACK from the receiver:

This is known as a stop-and-wait protocol as the sender must wait for the NAK/ACK.

RDT 2.1: Handles Corrupted ACK/NACK

In RDT2.0, if the response is corrupted, the sender does not know if the receiver got a corrupted packet.

In this case, the sender will retransmit the packet. As the receiver cannot tell whether it is a new transmission or a retransmission, RDT 2.1 requires sequence numbers.

As there is a response after every single packet, only a single bit is required for the sequence number. A retransmission uses the same value; a new transmission flips the bit.

RDT 2.2: NAK-Free Protocol

The receiver only sends ACK messages, but it also sends a sequence number along with it.

The ACK sequence number is the sequence number of the last packet that was received successfully.

RDT 3.0: Lossy Channel with Bit Errors

Packets can get lost and bits can get flipped.

If ACK does not arrive after a timeout, the sender should retransmit the packet. Hence, the sender retransmits on both packet/ACK loss or delay.

If the timeout is too long, performance decreases, but if it is too short, duplicate data packets get sent.

This is also known as the alternating-bit protocol.

There are four cases:

Pipelined Protocols

RDT 3.0 is a stop-and-wait-protocol, so the sender spends most of its time waiting rather than transmitting. If RTT is the round trip time, L is the packet size and R is the transmission rate, then the sender utilization is U_sender = (L/R) / (RTT + L/R) (add L/R on denominator as the entire packet must be received before the ACK can be sent).
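Plugging in some assumed example figures (an 8000-bit packet, a 1 Gbps link and a 30 ms RTT - none of these are from the notes) shows how poor stop-and-wait utilization can be:

```python
# Stop-and-wait sender utilization: U = (L/R) / (RTT + L/R).
L = 8000        # packet size in bits (assumed)
R = 1e9         # transmission rate in bits per second (assumed)
RTT = 0.030     # round-trip time in seconds (assumed)

transmission_time = L / R   # 8 microseconds
utilization = transmission_time / (RTT + transmission_time)
print(f"{utilization:.6f}")   # ~0.000267: the sender is idle almost all the time
```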

Pipelining allows the sender to send multiple packets without waiting for an ACK to arrive. This requires the sequence number to occupy more than one bit, and forces the sender and receiver to buffer more than one packet. There are two basic approaches:

Go-Back-N Protocol

The sender can send multiple packets but can have no more than N unacknowledged packets in the pipeline.

These protocols are sometimes called sliding-window protocols: there is a window of size N that must contain all the unacknowledged packets.

Packets to the left of the window are acknowledged and packets to the right are not yet sent. Inside the window are unacknowledged packets or packets that can be sent immediately.

The window size is related to flow and congestion control.

The protocol uses cumulative acknowledgement: an ACK with sequence number n acknowledges all packets with sequence numbers <= n. If any unexpected packets arrive (e.g. out-of-order), they are discarded and an ACK with the sequence number of the most recently received in-order packet is sent.

On timeout, all packets in the window are resent.

Performance problem: if the transmission rate is high, there will be a lot of unacknowledged packets, so if the first packet is corrupted, all the following packets will be discarded.
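A minimal sketch of the sender side, assuming `send_packet` is a hypothetical hook into the underlying channel (the receiver, timers and checksums are omitted):

```python
class GoBackNSender:
    """Minimal Go-Back-N sender sketch (transport hook assumed)."""

    def __init__(self, window_size, send_packet):
        self.N = window_size
        self.send_packet = send_packet   # assumed callback into the channel
        self.base = 0                    # oldest unacknowledged sequence number
        self.next_seq = 0
        self.buffer = {}                 # seq -> packet, kept for retransmission

    def send(self, packet):
        if self.next_seq >= self.base + self.N:
            return False                 # window full; caller must wait
        self.buffer[self.next_seq] = packet
        self.send_packet(self.next_seq, packet)
        self.next_seq += 1
        return True

    def on_ack(self, ack_seq):
        # Cumulative ACK: everything up to and including ack_seq is confirmed.
        for seq in range(self.base, ack_seq + 1):
            self.buffer.pop(seq, None)
        self.base = max(self.base, ack_seq + 1)

    def on_timeout(self):
        # Resend every packet currently in the window.
        for seq in range(self.base, self.next_seq):
            self.send_packet(seq, self.buffer[seq])

sent = []
s = GoBackNSender(window_size=2, send_packet=lambda seq, p: sent.append(seq))
s.send("a"); s.send("b")
assert not s.send("c")   # window of 2 is full
s.on_ack(0)              # frees one slot
assert s.send("c")
s.on_timeout()           # resends sequence numbers 1 and 2
print(sent)              # [0, 1, 2, 1, 2]
```

Note how a single cumulative ACK frees every packet up to its sequence number, and a timeout resends the whole window.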

Sender events:

Receiver events:

In the quiz:

Selective Repeat/Reject

The receiver individually acknowledges all correctly received packets, buffering packets as needed for in-order delivery to the higher layer.

The sender only resends the oldest packet for which the ACK is not received (timeout).

The receiver’s window can have:

If a packet is re-received, this means the ACK for the packet was lost, and so the receiver should send an ACK for that packet again. If the packet was near the base of the window before the receiver sent the original ACK, this could mean that the receiver window is now ahead of the sender window.

The worst case scenario for window size is:

To prevent this from being interpreted as a new packet, the sender base number must not be in the receiver window. Hence, the sequence space must be larger than 2N - 1; in other words, the window size is at most half the sequence space.

Hidden assumption: the packets will NOT be reordered within the channel between the sender and receiver - this is reasonable for a single link, but could be broken in a multi-hop network.

If this assumption is broken, old copies of a packet with a sequence number not in either the sender's or receiver's window can arrive. One solution is to give every packet a time to live to ensure it will not stay in the network for very long - ideally, the sequence space will be large enough that it is impossible to fully cycle through the sequence numbers within this time. Note that the receiver will ignore these packets regardless.

13. Transport Layer Protocol - Reliable Data Transfer

TCP Segments

TCP will divide each message into segments - the sequence number for each segment is the index of the first data byte in the segment. In practice, a random sequence number is used for the initial segment’s sequence number.

The receiver’s ACK message contains the sequence number of the next byte expected: the segment’s sequence number plus the number of data bytes.

TCP uses a single retransmission timer.

Simplified TCP sender (without refinements below):

next_sequence_number = initial_sequence_number
send_base = initial_sequence_number
timer = None

def on_data_from_upper_layer(data):
    global next_sequence_number, timer
    segment = create_tcp_segment(next_sequence_number, data)
    if timer is None:  # no unacknowledged segments in the window
        timer = start_timer()
    send_packet(segment)
    next_sequence_number += len(segment.data)

def on_timeout():
    global timer
    # re-send the segment with the oldest sequence number
    segment = unacknowledged_segments_in_window()[0]
    send_packet(segment)
    timer = start_timer()

def on_ack_received(ack):
    global send_base, timer
    if ack > send_base:
        send_base = ack  # cumulative ACK
        # restart the timer only if unacknowledged segments remain
        timer = start_timer() if unacknowledged_segments_in_window() else None

Some refinements:

Events on the receiver side:

TCP is a mixture of the go-back-N and selective repeat strategies:

Setting the Timeout Value

The timeout value should be longer than the round-trip time, so the first step is to find this value.

The SampleRTT will be the time from segment transmission to ACK, with the condition that retransmissions should not be used - the ACK for the first transmission could arrive after the sender retransmits, giving a wrong value.

The RTT varies over time, so several samples are required. This is done using exponential weighted moving average, which puts more weights on more recent samples:

EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT, where \alpha is typically a value such as 0.125.

An estimate of the variability is also required:

DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|, where \beta is typically a value such as 0.25.

With these two values, the timeout interval can be calculated:

TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT
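The three formulas above can be applied sample-by-sample; the initial values and the SampleRTT sequence below are made up for illustration:

```python
# EWMA timeout estimation with the usual TCP constants (alpha = 0.125, beta = 0.25).
alpha, beta = 0.125, 0.25
estimated_rtt, dev_rtt = 0.100, 0.0   # assumed initial values, in seconds

for sample_rtt in [0.106, 0.120, 0.140, 0.090]:
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    timeout = estimated_rtt + 4 * dev_rtt
    print(f"sample={sample_rtt:.3f}s  estimate={estimated_rtt:.3f}s  timeout={timeout:.3f}s")
```

The estimate moves only an eighth of the way toward each new sample, while the deviation term keeps the timeout comfortably above the estimated RTT when samples fluctuate.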

14. Transport Layer Protocols - Flow and Congestion Control

TCP creates a reliable data transfer service on top of IP using pipelined segments, cumulative ACKs, retransmission timers etc.

Setting the window size

The window size is simply the difference between the last byte sent and last byte acknowledged. TCP simply limits this to LastByteSent - LastByteACKed <= min(CongestionWindow, RecvWindow), where the former is determined by the network congestion control algorithm and the latter by the TCP flow control algorithm.

Flow Control

Flow control is a speed-matching service - it matches the rate at which the sender writes against the rate at which the receiver can read.

The sender and receiver allocate buffers; the receiver simply sends the amount of spare room in its buffer (TCP is full-duplex):

RecvWindow = RecvBuffer - (LastByteReceived - LastByteRead)

This is written to the 16-bit ‘Receive Window’ section in the TCP header.

Extreme Case

If the receive window is 0 bytes (i.e. buffer full as the higher-level application has not read the buffer contents), the sender will stop sending data.

This will mean that the receiver can no longer send acknowledgement packets back with updated window sizes. Hence, when the RecvWindow is 0, the sender will periodically send 1 byte of data.

Congestion Control

Packet retransmission treats the symptom of packet loss, not the cause, which is typically overflowing router buffers. In fact, retransmissions can make the problem worse.

Congestion control is not so much a service provided to the application as a service provided to the network as a whole.

There are two broad approaches to congestion control. The first is network-assisted congestion control, where routers provide feedback to end-systems by indicating an explicit rate the sender should send at and/or a single bit flag indicating congestion. Today’s systems try to make routers as simple as possible and so do not use this approach.

Instead, end-to-end congestion control is used, where congestion is inferred from end systems observing loss and delay.

Congestion is caused by too many sources sending too much data too fast for the network to handle. This manifests itself through lost packets (buffer overflows) and long delays (queueing).

Congestion control attempts to ensure everyone enjoys a share of resources without causing network congestion. However, there is no predefined share for everyone - it is dynamic.

Assuming that RecvWindow > CongestionWindow (i.e. LastByteSent - LastByteACKed <= CongestionWindow) and negligible transmission delay (the time between sending the first and last byte in a segment is negligible), within one RTT the sender sends at most CongestionWindow bytes.

Hence, the rate of transmission is approximately CongestionWindow / RTT - CongestionWindow can be adjusted according to network congestion.

The arrival of ACKs is taken as an indication of the network being congestion-free: if ACKs (for previously unacknowledged segments) arrive quickly, the congestion window should be increased quickly (and slowly if ACKs arrive slowly).

After a loss event (a timeout or 3 duplicate ACKs), the congestion window should decrease.

TCP Slow Start

Begin the connection with CongestionWindow = 1 MSS (maximum segment size). This means 1 segment is sent per RTT.

This may be much smaller than the available bandwidth. Hence, the rate should be increased exponentially until the first loss event: double the congestion window every RTT - this is done by incrementing CongestionWindow by 1 MSS for every ACK.

After the first loss event, congestion control moves on to AIMD:

TCP AIMD (Additive Increase, Multiplicative Decrease)

Additive increase (AI): increase CongestionWindow by 1 MSS every RTT in the absence of loss events. This can be done by CongestionWindow += \frac{1\,MSS}{CongestionWindow} whenever a new ACK arrives: about \frac{CongestionWindow}{1\,MSS} ACKs arrive per RTT, so the total increase is \frac{1\,MSS}{CongestionWindow} \cdot \frac{CongestionWindow}{1\,MSS} = 1\,MSS per RTT. Additive increase is called congestion avoidance as it increases slowly to avoid congestion.

Multiplicative decrease (MD): halve CongestionWindow after every loss event.

This leads to a saw-tooth pattern in the congestion window size.

Timeout Rules

After three duplicate ACKs arrive, the CongestionWindow should be halved and then the window should grow linearly (AIMD).

After a timeout event, CongestionWindow should be set to 1 MSS and grow exponentially (slow start) until it reaches a threshold - half of CongestionWindow from before the timeout.

Note that the duplicate ACK behavior occurs in TCP Reno; TCP Tahoe’s (Reno’s predecessor) behavior is the same as a timeout event.
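A sketch of these rules with the window measured in units of MSS (the event stream and initial threshold below are assumptions, not part of the notes):

```python
# Sketch of TCP Reno's congestion-window updates, in units of MSS.
# Events: 'ack' (new ACK), 'dup3' (3 duplicate ACKs), 'timeout'.
cwnd, ssthresh = 1.0, 64.0   # assumed initial threshold

def on_event(event):
    global cwnd, ssthresh
    if event == "ack":
        if cwnd < ssthresh:
            cwnd += 1.0              # slow start: +1 MSS per ACK (doubles per RTT)
        else:
            cwnd += 1.0 / cwnd       # congestion avoidance: ~+1 MSS per RTT
    elif event == "dup3":            # fast retransmit (Reno): halve, grow linearly
        ssthresh = cwnd / 2
        cwnd = ssthresh
    elif event == "timeout":         # back to slow start from 1 MSS
        ssthresh = cwnd / 2
        cwnd = 1.0

for e in ["ack"] * 6 + ["dup3"] + ["ack", "timeout"]:
    on_event(e)
print(cwnd, ssthresh)
```

Repeated loss events produce the saw-tooth: the window climbs linearly, halves on duplicate ACKs, and collapses to 1 MSS on a timeout.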

15. Transport-Layer Services - TCP and UDP

Transport protocols provide logical communication between application processes running on different hosts.

The sender side breaks application messages into segments and passes them to the network layer, while the receiver reassembles segments into messages and passes them to the application layer.

Multiple transport layer protocols are available, the most common being TCP and UDP.

The internet transport layer extends IP’s host-to-host delivery to process-to-process delivery (hence requiring multiplexing and demultiplexing), with TCP specifically adding reliable delivery, flow control and congestion control.

Multiplexing and Demultiplexing

Demultiplexing: the host receives IP datagrams; each has source/destination IP addresses and carries a single transport-layer segment. Inside each segment is the source/destination port number.

Connectionless-demultiplexing

With connectionless-demultiplexing (e.g. UDP), a two-tuple of the destination IP and port number can be used to identify the socket. Packets with different source IP and/or port numbers will be directed to the same socket.

Connection-oriented-demultiplexing

TCP sockets are identified by a 4-tuple: all four of the source and destination IP addresses and port numbers are required to direct the segment to the appropriate socket.

Hence, the server host may support many simultaneous TCP sockets through a single port number.

UDP

A bare bones, best effort service: segments can be lost or delivered out of order.

UDP is often used for streaming multimedia applications as this use case is loss-tolerant and rate-sensitive. UDP is also used in RIP, DNS and SNMP.

Segment structure:

|   16 bits   |      16 bits     |
| Source port | Destination port |
|   Length    |     Checksum     |
|              Data              |

The length is the length in bytes of the segment, including the header.

Advantages:

The UDP socket for an incoming packet can be identified by its destination IP and port.
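Since all four header fields are 16 bits wide and sent in network byte order, a UDP header can be built with Python's struct module (the port numbers are arbitrary examples, and checksum computation over the pseudo-header is omitted):

```python
import struct

def udp_header(src_port, dst_port, payload, checksum=0):
    # Four 16-bit fields in network (big-endian) byte order:
    # source port, destination port, length (header + data), checksum.
    length = 8 + len(payload)   # the length field includes the 8-byte header
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

header = udp_header(5005, 53, b"hello")
print(header.hex())   # 138d0035000d0000
```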

TCP

Properties:

Packet structure:

| 16 bits     | 16 bits           |
| Source port | Destination port  |
|         Sequence Number         |
|      Acknowledgement Number     |
| Flags       | Receive window    |
| Checksum    | Urg. data pointer |
|     Options (variable)          |
|        Application Data         |

Notes:

Flags:

TCP Connection Management

Opening: three-way handshake

NB: segments with SYN or FIN flags are treated as if they have a 1 byte payload.

Step 1:

The second step both acknowledges the segment and sends the server’s ISN:

This third step is needed to confirm that the client has received the server’s ISN:

Termination

Client waits for double the maximum segment lifespan to ensure the ACK arrives.

16. Introduction to the Web and HTTP

Network Applications

Network applications are programs that run on different end-systems and communicate with each other over the network (e.g. web browser and server).

Application protocols are a small part of a network application, and different network applications may share the same application protocol.

Services applications need:

TCP provides a reliable transport service, flow control and congestion control, but no guarantees for latency or transmission rate.

UDP provides none of these guarantees.

Application Structure

Client-server:

Peer-to-peer:

Hybrid. Napster is an example:

The Web

Hypertext Transfer Protocol (HTTP)

A web page consists of a base HTML-file and several referenced objects:

HTTP uses the client-server model:

Uses TCP:

HTTP is stateless: the server can work without maintaining any information about past client requests.

HTTP Connections

Non-persistent HTTP: At most one object is sent over a TCP connection. Used by HTTP/1.0.

Persistent HTTP: multiple objects can be sent over a single TCP connection; used by HTTP/1.1 by default.

The client will initialize the TCP connection and send an HTTP request message (containing some URL); the server will receive this request and form a response message containing the requested object.

HTTP/1.0 will close the TCP connection after the object is delivered, while persistent HTTP will leave the connection open and use the same connection for subsequent HTTP requests.

Response time modelling: the round-trip time (RTT) is the time for a small packet to travel from the client to the server and back.

For HTTP/1.0:

Hence, the total time is 2 RTTs, plus the transmission time for the file. After the file is received, the TCP connection must also be closed.

In addition to this, the OS must allocate resources for each TCP connection; browsers will often use parallel TCP connections to fetch referenced objects.

Persistent HTTP with pipelining is the default behavior in HTTP/1.1; the client sends requests as soon as it encounters a referenced file.
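A back-of-envelope comparison, ignoring transmission times and assuming the object count divides evenly across parallel connections (all figures below are illustrative, not from the notes):

```python
# Rough page-load comparison: a base file plus n referenced objects; RTT in seconds.
def non_persistent(n, rtt, parallel=1):
    # Each object costs 2 RTTs: one to open the TCP connection, one for the
    # request/response. Referenced objects are fetched over `parallel`
    # simultaneous connections (assumed to divide n evenly for simplicity).
    return 2 * rtt + (n // parallel) * 2 * rtt

def persistent_pipelined(n, rtt):
    # One connection setup plus one RTT for the base file, then all referenced
    # objects requested back-to-back on the same connection: ~1 more RTT.
    return 2 * rtt + rtt

rtt = 0.050
print(non_persistent(10, rtt))              # ~1.1 s
print(non_persistent(10, rtt, parallel=5))  # ~0.3 s
print(persistent_pipelined(10, rtt))        # ~0.15 s
```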

HTTP Messages

The HTTP request consists of the following ASCII-encoded text:

Some methods:

Some headers:

The response message follows this format:

Some HTTP status codes:

Cookies: User-Server State

HTTP is stateless, so to identify users cookies are used. There are four components:

Web Caches (Proxy Servers)

Proxy servers sit between the origin server and client. There can be multiple proxy servers, one of which will hopefully be closer to the client than the origin server.

The proxy server/cache acts as both a client and server. It helps to:

If the requested object is in the cache, the browser will use it. Otherwise, it will make a request to the proxy server (and if the proxy does not have the resource, it will in turn make a request to the origin server).

Conditional GET

The server will not send the object if the cache has an up-to-date copy of the object.

Request header: If-modified-since: some_date.

If the object has not been modified since the given date, a 304 Not Modified response will be sent and there will be no body. Otherwise, it will generate a normal 200 OK response.

17. Email and DNS

Email

Three major components:

Simple Mail Transfer Protocol (SMTP)

The standard protocol used to transfer email between servers.

Basic operation:

SMTP runs over TCP on port 25 by default and can be driven manually with a telnet connection.

Sample telnet interaction:

Mail message format:

From: user@client_host_name
To: user@server_host_name
Subject: subject_text
[BLANK LINE]
message_body

The receiving server appends a Received: header line to the top of the message as a record of the mail servers the email passed through.

Multipurpose Internet Mail Extension (MIME)

For non-ASCII data, MIME is used. It specifies the type of content (e.g. jpeg, mp3) and the method in which it is encoded. For example:

...
Subject: subject_text
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Type: image/jpeg
[BLANK LINE]
base64_encoded data

In base64 encoding, every three octets (24 bits) are divided into four 6-bit groups - each small enough to map to an ASCII character. The characters A-Za-z0-9+/ are used, with A corresponding to 0 and / to 63. = is used as padding.
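Python's standard library demonstrates this regrouping (b"Man" is the classic example: its 24 bits become the four 6-bit values 19, 22, 5, 46, i.e. 'TWFu'):

```python
import base64

# Three octets (24 bits) regroup into four 6-bit values, each mapped to
# A-Za-z0-9+/; '=' pads when the input is not a multiple of three bytes.
print(base64.b64encode(b"Man"))   # b'TWFu'
print(base64.b64encode(b"Ma"))    # b'TWE=' (one '=' of padding)

# The same regrouping by hand for b"Man" (0x4D 0x61 0x6E):
bits = "".join(f"{byte:08b}" for byte in b"Man")        # 24 bits
groups = [int(bits[i:i + 6], 2) for i in range(0, 24, 6)]
print(groups)   # [19, 22, 5, 46] -> 'T', 'W', 'F', 'u'
```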

SMTP and Mail Access Protocols

Access protocols are used for communication between the receiver’s mail server and user agent. Two mail protocols are:

Web-based email is another alternative; the user agent uses HTTP to communicate with its remote mailbox.

POP3

TCP connection running on port 110 (can be driven manually via telnet).

During the authorization phase, the client sends two commands:

After each command is sent, the server responds with either +OK or -ERR.

During the transaction phase, the client can send:

IMAP

IMAP:

Domain Name System (DNS)

DNS is:

Some DNS services:

DNS is not centralized:

Hierarchy

DNS is composed of three main levels of DNS servers:

If a client wants the IP address for www.google.com, to a first approximation, it will:

There are 13 root name servers worldwide (although each is a cluster of replicated servers).

Local name servers do not belong to the hierarchy, but are central to the DNS architecture. Each ISP provides one, which acts as the default name server.

Query Types and Caching

If a client wants the IP address for www.google.com through an iterated query:

A recursive query puts the burden of name resolution on the contacted name server:

Both queries require 8 messages. Hence, caching is used:

Hence, root name servers are usually not consulted.

DNS records

Each resource record stores the name, value, type, ttl and class (although that is always IN (internet)).

Some type values:

DNS Messages

Both query and reply messages use the same message format:

|       16 bits      |       16 bits       |
|   Identification   |       Flags         |
|   Num. questions   |   Num. answer RRs   |
| Num. authority RRs | Num. additional RRs |
|        Questions (variable num.)         |
|       Answers (variable num. RRs)        |
|      Authority (variable num. RRs)       |
|    Additional info (variable num. RRs)   |

The query and reply to the corresponding query have the same identification number.

The flags indicate:

The additional fields:
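Following the table above, the fixed 12-byte header is six 16-bit big-endian fields. A minimal sketch that packs a standard-query header (the flag value and identification number are assumptions for illustration, and the variable-length sections are omitted):

```python
import struct

def dns_header(identification, num_questions):
    # Six 16-bit fields in network byte order: identification, flags,
    # question count, then answer/authority/additional RR counts (all
    # zero for a query). 0x0100 sets only the RD (recursion desired) bit.
    flags = 0x0100
    return struct.pack("!HHHHHH", identification, flags,
                       num_questions, 0, 0, 0)

header = dns_header(0xABCD, 1)
print(header.hex())   # abcd01000001000000000000
```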

Adding New Records

Example: registering a domain name at a registrar:

Mid-Term Test Notes

Sockets

IP: stateless, unacknowledged, unreliable, unordered.

TCP/UDP add ports on top of IPv4 for addressing.

TCP/Stream:

UDP/Datagram:

API:

Network byte order: big endian.

Protocol Layering

Service access point: interface to service for higher layer.

| Header | Higher-layer payload/service data unit | Trailer |

OSI

The physical and link layers have single-hop scope.

The network layer uses single hops to achieve end-to-end communication.

Routers implement the physical, link and network layers, which are intentionally kept simple (thus allowing hardware implementations for high performance).

Physical:

Link:

Network:

Transport:

Session:

Presentation:

Application:

TCP/IP reference model: physical/network interface/internet/transport/application.

Transport:

Service Primitives

Confirmed service:

Unconfirmed: no response or confirm e.g. UDP.

Confirmed Delivery: provider B sends the PDU when the request PDU is received, regardless of what user B does.

Multiplexing:

Splitting:

Fragmentation:

Blocking:

Physical Layer

Passband transmission:

s(t) = A(t) \cdot \cos[(f_c + f(t)) \cdot t + \phi(t)]

where t is less than the symbol duration T, A(t) is the amplitude, f(t) is the frequency offset, f_c is the center frequency and \phi(t) is the phase.

Quadrature Amplitude Modulation: varies amplitude and phase. n unique amplitudes and n unique phases for n^2 symbols. Greater throughput but higher error rate.

Carrier synchronization: synchronizes frequency and phase.

NRZ can lead to long runs of the same bit and loss of synchronization.

Manchester encoding: [-x, x] voltage range:

Frame synchronization, ethernet: preamble at start for carrier synchronization, then start-frame-delimiter.

End of frame detection: gap between frames, deliberate code violations, dedicated length fields, special flags (with escape codes).

LAN

Geometric RV:

Orthogonal: behavior of one station independent of another.

Frequency division multiple access:

Time:

Both require some way of allocating sub-channels/time slots.

Random access protocol: no central station/shared state.

ALOHA:

CSMA - carrier sense multiple access. Carrier sense: an almost instantaneous check of whether the medium is busy. Fails if the time difference between two stations starting to transmit is smaller than the propagation delay.

Non-persistent CSMA:

p-persistent CSMA:

1-persistent: CSMA/CD:

Ethernet

Manchester Encoding:

Length Name
7 Preamble
1 SOF
6 DstAddr
6 SrcAddr
2 Length/Type
46-1500 Payload
4 FCS

MAC address: 48 bits, globally unique. If the first bit of the address is 1, it is a multicast address; else, a unicast address.

If L/T >= 1536 (0x0600), it is a type field (e.g. IPv4, ARP), allowing protocol multiplexing; if <= 1500, it is a length field and the type is assumed to be the first two bytes of the payload.

Broadcast: half-duplex, MAC needed, 10 Mbps PHYs.

MAC protocol: CSMA/CD:

IP

Big endian byte ordering

Length Name
Bytes 0-3
4 Version (=4)
4 Hdr Len (numBytes/4)
6 TOS/DSCP (zeros)
2 Unused (zeros)
16 TotalLength (bytes)
Bytes 4-7
16 Identification
3 Flags (0, DF, MF)
13 FragmentOffset (offset/8)
Bytes 8-11
8 Time-To-Live (usually 32/64)
8 Protocol Type
16 Header Checksum
Bytes 12-15
32 Source Address
Bytes 16-19
32 Destination Address
Bytes 20+
32n Options + Padding
Data

Last fragment has MF unset, and non-zero FragmentOffset to differentiate from an unfragmented packet.
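A sketch of how FragmentOffset and MF are derived when splitting a datagram (the 4000-byte/1500-MTU figures are a common textbook example, not from the notes):

```python
# Each fragment's data length must be a multiple of 8 (except the last),
# and FragmentOffset stores the data's byte offset divided by 8.
def fragment(total_data, mtu, header_len=20):
    max_data = (mtu - header_len) // 8 * 8   # largest multiple of 8 that fits
    fragments, offset = [], 0
    while offset < total_data:
        data = min(max_data, total_data - offset)
        more_fragments = offset + data < total_data
        fragments.append({"offset_field": offset // 8,
                          "length": header_len + data,
                          "MF": int(more_fragments)})
        offset += data
    return fragments

# 4000 data bytes over a 1500-byte MTU:
for f in fragment(4000, 1500):
    print(f)
# offset fields 0, 185, 370; total lengths 1500, 1500, 1060; MF 1, 1, 0
```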

Protocol type: determines higher-level protocol that generated the payload (e.g. UDP/TCP).

Routing table: map from destination address to output port.

Host address of all 0s: network as a whole being referred to. All 1s: broadcast address.

Lookup:

End hosts have default route + entry for each network it is directly connected to.

Core routers: know almost all internet networks, have no default routes.

Address Resolution Protocol: map from IP to MAC:

Internet Control Message Protocol - optional protocol used to inform sender of error.

Test Notes

Distance-Vector

Periodically send D_x, node x's estimates of the least-cost paths to all nodes in the network, to its neighbors: D_x(y) = min([C(x, v) + D_v(y) \text{ for } v \text{ in } neighbors(x)]).

Poisoned reverse: if x routes to z through y, x will tell y that D_x(z) = \infty, ensuring y will not route to z through x if the link cost changes. This reduces the time to converge when a link cost increases.
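One update step of the distance-vector equation on a made-up three-node topology (node names and link costs are illustrative):

```python
# One Bellman-Ford update step for node x, given its neighbors'
# advertised distance vectors.
INF = float("inf")

def dv_update(link_costs, neighbor_vectors, destinations):
    # D_x(y) = min over neighbors v of C(x, v) + D_v(y)
    return {y: min(link_costs[v] + neighbor_vectors[v].get(y, INF)
                   for v in link_costs)
            for y in destinations}

link_costs = {"y": 2, "z": 7}                 # C(x, y), C(x, z)
neighbor_vectors = {"y": {"y": 0, "z": 1},    # D_y
                    "z": {"y": 1, "z": 0}}    # D_z
print(dv_update(link_costs, neighbor_vectors, ["y", "z"]))
# {'y': 2, 'z': 3}: x reaches z more cheaply via y (2 + 1) than directly (7)
```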

Link-State

Each router reliably floods information about its neighbors and independently calculates the least-cost path to every other router.

Add nearest node not in tree to the tree, calculate distance to each neighbor through the node.

O(n^2) algorithm, O(nE) messages.

OSPF

Hierarchy

IPv6

When forwarding a packet through an IPv4 router:

BGP

Pairs of routers, either in the same (iBGP) or different AS (eBGP), exchange information over TCP. If a node advertises a prefix, it promises that it can forward datagrams to the prefix. Some attributes:

Import policy used to accept/reject advertisements.

Route selection uses local preference (policy decision), shortest AS_PATH, closest NEXT_HOP router etc.

Reliable Data Transfer

Internet checksum:

def ones_complement_addition(shorts):
  # Sum 16-bit words with end-around carry: subtracting 0xFFFF is the same
  # as subtracting 0x10000 and adding the carry bit back in.
  result = 0
  for short in shorts:
    result += short
    if result > 0xFFFF:
      result -= 0xFFFF
  return result

def checksum(shorts):
  # The checksum is the one's complement of the one's-complement sum.
  return 0xFFFF - ones_complement_addition(shorts)

def check_checksum(shorts, checksum):
  # Sum plus checksum should give all ones if no error was detected.
  return 0xFFFF == (ones_complement_addition(shorts) + checksum)

CRC

RDT

1.0: no bit errors, buffer overflows, out-of-order packets, packet loss

2.0: bit errors; checksum. Stop and wait for ACK/NACK after each packet

2.1: 1-bit sequence numbers in case ACK/NACK corrupted

2.2: No NACK. Sends ACK with sequence number - last packet that was received successfully

3.0: bit errors and lost packets. Assumes packets are not reordered in the channel (although this can be solved by having a time to live and sufficiently large sequence space)

Go-Back-N

No more than N unacknowledged packets. Receiver sends cumulative ACKs. On timeout, all packets in the window are resent.

Selective Repeat

An ACK for each packet.

On timeout, resend oldest packet without ACK. If duplicate packet received, send ACK for that packet again.

Worst case scenario:

To prevent this from being interpreted as a new packet, the sender base number must not be in the receiver window. Hence, the sequence space must be larger than 2N - 1.

UDP and TCP

UDP

|   16 bits   |      16 bits     |
| Source port | Destination port |
|   Length    |     Checksum     |
|              Data              |

Identified by destination port and IP.

TCP

| 16 bits     | 16 bits           |
| Source port | Destination port  |
|         Sequence Number         |
|      Acknowledgement Number     |
| Flags       | Receive window    |
| Checksum    | Urg. data pointer |
|     Options (variable)          |
|        Application Data         |

Handshake and Termination

Segments with SYN or FIN treated as if they have a 1 byte payload.

Three-way handshake:

  1. Client sends TCP SYN segment
  2. Server replies with SYNACK: ack = client_isn + 1
  3. Client replies with ACK: ack = server_isn + 1, sequence_number = client_isn + 1

Termination:

  1. Client sends FIN
  2. Server responds with ACK
  3. Once connection closed, server sends FIN
  4. Client responds with ACK, waiting twice the max segment lifespan to ensure the ACK arrived

Acknowledgements

ACK: sequence number of the next byte expected (cumulative ACK).

Sender events:

Receiver events:

Timeout value requires finding round trip time: use exponential weighted moving average:

Flow control

Congestion Control

Overflowing router buffers:

Slow start:

AIMD (after first loss):

After 3 duplicate ACKs, halve the window and move to AIMD.

After timeout, slow start until it reaches half of window size before timeout. Then move to AIMD.

LastByteSent - LastByteAcked <= min(CongestionWindow, ReceiveWindow)

Lab 4

Ping failure modes:

ASes: