06. IP and Related Protocols

The Internet

A packet-switched network of networks. The networks and links follow the end-to-end principle:

Keep routers dumb, put intelligence in hosts
- Routers know how to deliver packets
- Making deliveries reliable is done by the end host

IPv4

Terminology:

IP Packets are called datagrams unless fragmentation/reassembly is being used - they are called fragments in this case
End stations are called hosts
IP routers are called routers
IP addresses are assigned to network interfaces, not hosts

Best-Effort, Datagram Delivery:

Connectionless: no connection/shared state
Unacknowledged
Unreliable: no retransmissions
Unordered: no in-sequence guarantees

Packet Format

Big endian byte ordering.

Bytes	Length (bits)	Name
0-3	4	Version
0-3	4	Hdr Len
0-3	6	TOS/DSCP
0-3	2	Unused
0-3	16	TotalLength
4-7	16	Identification
4-7	3	Flags
4-7	13	Fragment Offset
8-11	8	Time-To-Live
8-11	8	Protocol Type
8-11	16	Header Checksum
12-15	32	Source Address
16-19	32	Destination Address
20+	32n	Options + Padding
		Data
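
As a rough illustration of this layout, the following Python sketch decodes the fixed 20-byte part of a header with the standard `struct` module. The names `IPv4Header` and `parse_ipv4_header` are made up for this example, options are ignored, and the sample bytes are a hand-built header rather than a captured packet.

```python
import struct
from typing import NamedTuple


class IPv4Header(NamedTuple):
    version: int
    header_len_bytes: int
    dscp: int
    total_length: int
    identification: int
    dont_fragment: bool
    more_fragments: bool
    fragment_offset_bytes: int
    ttl: int
    protocol: int
    checksum: int
    src: str
    dst: str


def parse_ipv4_header(raw: bytes) -> IPv4Header:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    # "!" selects network (big-endian) byte order.
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return IPv4Header(
        version=ver_ihl >> 4,                      # upper 4 bits
        header_len_bytes=(ver_ihl & 0x0F) * 4,     # Hdr Len counts 4-byte words
        dscp=tos >> 2,                             # upper 6 bits of the TOS byte
        total_length=total_len,
        identification=ident,
        dont_fragment=bool(flags_frag & 0x4000),   # DF flag
        more_fragments=bool(flags_frag & 0x2000),  # MF flag
        fragment_offset_bytes=(flags_frag & 0x1FFF) * 8,  # offset is in 8-byte units
        ttl=ttl,
        protocol=proto,
        checksum=checksum,
        src=".".join(str(b) for b in src),
        dst=".".join(str(b) for b in dst),
    )


# Hand-built sample: a UDP datagram from 192.168.0.1 to 192.168.0.199.
sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
header = parse_ipv4_header(sample)
print(header.header_len_bytes, header.ttl, header.protocol, header.src, header.dst)
```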

Header Length (4)

Length of header in multiples of 4 bytes. Without options, the header is 20 bytes long so the value would be 5.

TOS/DSCP (6)

Stands for Type of Service/DiffServ Code Point. Allows the priority for each packet to be defined (usually by the ISP depending on how much you pay) - mostly ignored.

Fill with zero.

Total Length (16)

Total length (in bytes) of the datagram - required since some MAC layers may add padding to meet minimum length requirements.

This may be modified during fragmentation and reassembly.

Identification (16)

Identifies each IP payload accepted from higher layers for a given interface - a sequence number.

Flags (3)

Contains two flags for fragmentation and reassembly (DF = don’t fragment; MF = more fragments).

Fragment Offset (13)

Gives the offset of the current fragment within the entire datagram, in multiples of 8 bytes.

Time-to-Live (8)

Last resort to deal with loops - IP does not specify routing protocols and so there is a possibility that loops can occur.

This value is the upper limit on the number of routers a packet can traverse, and is decremented by each router the packet passes through (hence routers must recalculate the checksum). Once the TTL value reaches 0, the packet is discarded and the sender is notified via an ICMP message.

TTL is typically 32 or 64.

Protocol Type (8)

Determines which higher-level protocol generated the payload - provides different SAPs to allow protocol multiplexing.

Value	Protocol
`0x01`	ICMP
`0x02`	IGMP
`0x04`	IP-in-IP Encapsulation
`0x06`	TCP
`0x11`	UDP

Header Checksum (16)

Checksum for the IP header (NOT the data) - for some types of data, a few bad bits may not matter.
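
The notes do not spell out the checksum algorithm. The sketch below assumes the standard Internet checksum (the 16-bit ones' complement of the ones' complement sum of the header's 16-bit words), and also shows why a router that decrements the TTL has to recompute it. The function names `internet_checksum` and `decrement_ttl` are illustrative, not taken from any particular implementation.

```python
def internet_checksum(header: bytes) -> int:
    """Ones' complement of the ones' complement sum of all 16-bit words."""
    if len(header) % 2:
        header += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return ~total & 0xFFFF


def decrement_ttl(header: bytes) -> bytes:
    """Forwarding step: decrement the TTL (byte 8), then refresh the checksum (bytes 10-11)."""
    hdr = bytearray(header)
    if hdr[8] <= 1:
        raise ValueError("TTL would reach 0: discard and send ICMP instead")
    hdr[8] -= 1
    hdr[10:12] = b"\x00\x00"  # the checksum field counts as zero while computing
    hdr[10:12] = internet_checksum(bytes(hdr)).to_bytes(2, "big")
    return bytes(hdr)
```

A receiver can verify a header by running the same sum over all of its bytes, checksum field included: a valid header yields 0.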

Source/Destination Address (32)

Indicates the initial sender and final receiver of the datagram. Internally structured into two sections: the network and host ID. The network ID refers to some local network, and the host ID must be unique within that network.

Routing and Forwarding

Routing/Forwarding Tables

IP routers have several network interfaces - these are called ports. When a router receives a packet on some input port, it looks at the DestinationAddress field to determine the output port using the forwarding table.

The forwarding table is a table of all networks the router knows about, identified by their network ID, and the output port to send the packet through to reach that network. A table lookup is used to then select the output port for every incoming packet.

Notes:

Lookup time depends on the number of entries, so expensive routers often implement the lookup in hardware
The table is filled by a separate routing protocol
It is the responsibility of the last router on a path to deliver an IP datagram to the host.

Important: a host address is tied to its location in the network; there is no mobility support - if a host switches to another network, it obtains another address, breaking ongoing TCP connections.

Classless Inter-Domain Routing

How many bits should be allocated to the network ID? Originally, classful addressing allocated exactly 8, 16 or 24 bits to the network ID.

CIDR was introduced in 1993 and is now mandatory, and allows an arbitrary number of bits to be allocated to the network.

Netmask

The netmask specifies which bits belong to the network ID. It is a 32-bit value, with the leftmost k bits being 1s, and the remaining bits 0. A /k netmask will have k bits allocated to the network ID. The network ID is calculated using a bitwise AND of the netmask with the IP address.
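
A small worked example of the mask arithmetic, as a Python sketch. The helper names (`netmask`, `to_int`, `to_str`, `network_id`) and the address `172.16.37.9/20` are chosen only for illustration.

```python
def netmask(k: int) -> int:
    """32-bit mask with the leftmost k bits set to 1."""
    return (0xFFFFFFFF << (32 - k)) & 0xFFFFFFFF


def to_int(addr: str) -> int:
    """Dotted-quad string to a 32-bit integer."""
    a, b, c, d = (int(part) for part in addr.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def to_str(value: int) -> str:
    """32-bit integer back to a dotted-quad string."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def network_id(addr: str, k: int) -> int:
    """Bitwise AND of the address with the /k netmask."""
    return to_int(addr) & netmask(k)


print(to_str(netmask(20)))                    # 255.255.240.0
print(to_str(network_id("172.16.37.9", 20)))  # 172.16.32.0
```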

Private host addresses

There are two special host addresses:

The host address with all 0s (32-k zeroes) signifies that the network as a whole is being referred to
The host address with all 1s (32-k ones) is the broadcast address - all hosts on the network will receive the packet

There are also some reserved IP blocks:

Block	Usage
`10.0.0.0/8`	Private-use IP networks
`127.0.0.0/8`	Host loopback network
`169.254.0.0/16`	Link-local for point-to-point links
`172.16.0.0/12`	Private-use IP networks
`192.168.0.0/16`	Private-use IP networks

These addresses can be used within the provider network; packets with private addresses are not routed in the public internet.

The loopback address is usually denoted as 127.0.0.1, but any valid address in 127.0.0.0/8 can be used.
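
Python's standard `ipaddress` module can be used to check the special addresses and reserved blocks above. In this sketch, `RESERVED_BLOCKS` simply copies the table from these notes (it is not a complete list of reserved ranges), and `classify` and `special_addresses` are hypothetical helper names.

```python
import ipaddress
from typing import Optional, Tuple

# Copied from the table above; other reserved ranges exist but are not listed here.
RESERVED_BLOCKS = {
    "10.0.0.0/8": "Private-use IP networks",
    "127.0.0.0/8": "Host loopback network",
    "169.254.0.0/16": "Link-local for point-to-point links",
    "172.16.0.0/12": "Private-use IP networks",
    "192.168.0.0/16": "Private-use IP networks",
}


def classify(addr: str) -> Optional[str]:
    """Return the usage of the reserved block containing addr, or None if it is in none of them."""
    ip = ipaddress.ip_address(addr)
    for block, usage in RESERVED_BLOCKS.items():
        if ip in ipaddress.ip_network(block):
            return usage
    return None


def special_addresses(cidr: str) -> Tuple[str, str]:
    """Return (all-zeroes host address, all-ones broadcast address) of a network."""
    net = ipaddress.ip_network(cidr)
    return str(net.network_address), str(net.broadcast_address)


print(classify("127.0.0.53"))
print(classify("172.32.0.1"))
print(special_addresses("192.168.1.0/24"))
```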

Simplified Packet Processing

IP input:

Run sanity check on packet
Insert packet into IP input queue
Process IP options (extremely rare - not discussed)
Check if destined to its own IP address or the broadcast address:
- Protocol demultiplexing: check Protocol field, pass TCP/UDP/whatever to the right high-layer protocol
If not:
- Check if forwarding enabled; if not, the device is not a router so drop the packet
- Pass to IP output
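
The decision at the end of IP input can be condensed into a few lines. This is only a sketch of the branching described above: the function name `ip_input`, its parameters, and the returned action strings are invented for illustration, and sanity checks, queueing and options are left out.

```python
from typing import Set


def ip_input(dst: str, own_addresses: Set[str], broadcast: str,
             forwarding_enabled: bool) -> str:
    """Decide what to do with a received datagram based on its destination."""
    if dst in own_addresses or dst == broadcast:
        return "deliver"   # demultiplex on the Protocol field (TCP, UDP, ...)
    if not forwarding_enabled:
        return "drop"      # not a router, so the packet is discarded
    return "forward"       # hand over to IP output


print(ip_input("192.168.1.10", {"192.168.1.10"}, "192.168.1.255", False))
print(ip_input("8.8.8.8", {"192.168.1.10"}, "192.168.1.255", False))
print(ip_input("8.8.8.8", {"192.168.1.1"}, "192.168.1.255", True))
```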

IP output:

Check if the destination is an immediate neighbor/directly reachable (e.g. directly via Ethernet)
- If so, attempt to deliver over a single hop
Else:
- Calculate next hop using forwarding table lookup
  - Forwarding table filled and maintained by a routing daemon
- Decrement TTL
- Recompute HdrChksum
- Hand over packet to outgoing interface

Forwarding Table Contents and Lookup (rough approximation)

Each entry contains:

Destination IP address: one of
- Full host address (i.e. non-zero host-id)
- Network address with netmask (more common)
Information about next hop: one of
- IP address of next-hop router (which is directly reachable)
- IP address of directly-connected network (network address/netmask)
Flags
- If destination IP is host or network
- If next hop is router or directly attached to the network
Specification of outgoing interface

A router only knows the next hop, not the entire remaining route. The forwarding table lookup has three stages:

Look for an entry that is a full host address completely matching dst. If found, send packet to indicated next hop/outgoing interface and stop processing
- Extremely uncommon
Look for an entry that is a network address matching dst’s network address. If found, send packet to indicated next hop/outgoing interface and stop processing
- dst & netmask == net-addr & netmask
  - Usually, the network address already has the host bits set to zero, so slightly overkill
Look for a special default entry. If found, send packet to the indicated next hop (the default router) and stop processing
Drop packet and send ICMP message back to original sender of the datagram
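
The staged lookup above can be sketched in Python as follows. The `Route` class, the `"default"` marker and the sample table are assumptions made for this example. Like the notes, it is a rough approximation: it returns the first matching network entry, whereas real routers pick the longest matching prefix.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    destination: str  # full host address, "a.b.c.d/k" network, or "default"
    next_hop: str
    interface: str


def lookup(table: List[Route], dst: str) -> Optional[Route]:
    """Return the entry to use for dst, or None (drop and send ICMP)."""
    dst_ip = ipaddress.ip_address(dst)
    # Stage 1: entry that is a full host address matching dst exactly.
    for route in table:
        if route.destination != "default" and "/" not in route.destination:
            if ipaddress.ip_address(route.destination) == dst_ip:
                return route
    # Stage 2: entry whose network contains dst (dst & netmask == net-addr).
    for route in table:
        if "/" in route.destination:
            if dst_ip in ipaddress.ip_network(route.destination):
                return route
    # Stage 3: the default entry, if there is one.
    for route in table:
        if route.destination == "default":
            return route
    return None


table = [
    Route("192.168.1.7", "192.168.1.7", "eth1"),
    Route("192.168.1.0/24", "directly connected", "eth1"),
    Route("10.0.0.0/8", "192.168.1.254", "eth1"),
    Route("default", "203.0.113.1", "eth0"),
]
print(lookup(table, "10.2.3.4"))
print(lookup(table, "8.8.8.8"))
```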

In End-Hosts

Forwarding tables in end hosts generally contain two or more entries:

The default route, typically configured via DHCP
An entry for each network it is directly connected to - this way, it can communicate with other hosts on the network

In Routers

Routers at the ‘border’ of the Internet (close to the customer) usually have a small forwarding table, often filled with other networks belonging to the same owner. For all other networks, they rely on the default router.

Core routers:

Do not have a default router
Are the default routers of other routers
Know (almost) all of the Internet networks

Fragmentation and Reassembly

The link-layer technologies used by IP have different maximum packet sizes, called the maximum transmission unit (MTU). For several reasons, IP abstracts these concerns:

For separation of concerns
A packet could use different link-layer technologies in transit
The route could change due to link failures/restores

Hence, IP offers its own maximum message length of 65515: (2^{16} - 1) - 20, where 20 is the minimum IP header size.

The overall process is as follows:

The sender IP instance partitions the message into fragments
Each fragment is transmitted individually as a full IP packet
- Header information specifies that this is a fragment, and stores the position of the fragment in the message
Each fragment has a size no larger than the MTU of the outgoing link
- Hence, intermediate routers can fragment a full message or split fragments further, but never re-assemble them (as it is not guaranteed all packets will follow the same path)
The destination IP instance buffers the received fragments, re-assembles the message, then delivers it to the higher layer
- When the first fragment arrives, a buffer large enough for the whole message is allocated and a timer is started

All fragment datagrams belonging to the same message have:

A full IP header
The same value in the Identification field (a unique identifier assigned by the sender to each payload accepted from the higher layer)
A TotalLength field reflecting the size of the fragment
FragmentOffset field, specifying the offset (in units of 8 bytes) relative to the original, un-fragmented IP datagram
All but the last fragment will have the MF (more-fragments) bit set
- The last fragment does not have this set. It also has a non-zero FragmentOffset to differentiate it from an un-fragmented packet
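
To make the offset arithmetic concrete, here is a minimal Python sketch that splits a payload for a given MTU and puts it back together. `Fragment`, `fragment` and `reassemble` are hypothetical names, headers are modelled only by their length (20 bytes, i.e. no options), and the reassembly ignores timers and missing fragments.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    offset_units: int     # FragmentOffset, in units of 8 bytes
    more_fragments: bool  # MF flag
    payload: bytes


def fragment(payload: bytes, mtu: int, header_len: int = 20) -> List[Fragment]:
    """Split payload so that header + data fits into mtu for every fragment."""
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because offsets are expressed in 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    for start in range(0, len(payload), max_data):
        chunk = payload[start:start + max_data]
        more = start + max_data < len(payload)  # MF is set on all but the last
        fragments.append(Fragment(start // 8, more, chunk))
    return fragments


def reassemble(fragments: List[Fragment]) -> bytes:
    """Simplified reassembly: order by offset and concatenate."""
    ordered = sorted(fragments, key=lambda f: f.offset_units)
    return b"".join(f.payload for f in ordered)


frags = fragment(bytes(4000), mtu=1500)
for f in frags:
    print(f.offset_units, f.more_fragments, len(f.payload))
```

With a 1500-byte MTU, each fragment carries at most 1480 data bytes, so a 4000-byte payload becomes three fragments at offsets 0, 185 and 370 (in 8-byte units).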

The sender can set the DF (don’t fragment) bit in the IP header to forbid fragmentation by intermediate routers. If the outgoing link’s MTU is too small, the intermediate router drops the packet and may return an ICMP message.

Address Resolution Protocol

IP datagrams are encapsulated into Ethernet frames, and an Ethernet station picks up a frame only if the destination MAC address matches its own. Hence, stations must know which MAC address a given IP address refers to.

ARP allows a host to map a given IP address to a MAC address. ARP is dynamic:

The MAC address for a given IP address does not need to be statically configured
Nodes can be moved or equipped with new network adapters without any re-configuration

This trades the downsides of static configuration for the complexity of running a separate protocol.

Basic Operation

Two stations, A and B, are attached to the same Ethernet network
- Each has an IP and MAC address
- Each maintains an ARP cache
Station A wishes to send an IP packet to station B’s IP address, but does not know B’s MAC address
Station A broadcasts an ARP-request message containing:
- Its own IP and MAC address
- Station B’s IP address
Since it is a broadcast, the request is sent to the Ethernet broadcast address. The length/type field of the Ethernet frame has a value of 0x0806
Any other host recognizes that the IP address is not its own, and discards the packet
- If it cached every binding it ever heard, its ARP cache would get full quickly
When Station B receives this, it:
- Stores a binding between A’s IP address and MAC address in its own cache
- Responds with a unicast ARP-reply packet including:
  - Its own MAC and IP address
  - A’s MAC and IP address
A then stores the binding between B’s IP and MAC address in its ARP cache

ARP makes no retransmissions if a request is not answered.

If a sender wants to send an IP packet to a local destination, it first checks its ARP cache for a binding to the IP address. If there is none, it starts the address resolution procedure. Once this is finished, it will encapsulate the IP packet in an Ethernet frame and direct it to the MAC address.

Entries in the ARP cache are soft-state: each entry is discarded a fixed time (usually 20 minutes) after it is inserted into the cache. Some implementations restart the timer after using the entry.

This ensures that stale information gets flushed out (this is much more reliable than relying on the device of interest to notify everyone that its binding has changed).
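
A soft-state cache of this kind could look roughly like the Python sketch below. The class name `ArpCache` and its methods are invented for illustration; the injectable `clock` parameter exists only so that expiry can be demonstrated without waiting, and this variant does not restart the timer on use.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ArpCache:
    """IP-to-MAC bindings that silently expire a fixed time after insertion."""

    def __init__(self, ttl_seconds: float = 20 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def insert(self, ip: str, mac: str) -> None:
        # Store the binding together with its expiry time.
        self._entries[ip] = (mac, self._clock() + self._ttl)

    def lookup(self, ip: str) -> Optional[str]:
        entry = self._entries.get(ip)
        if entry is None:
            return None          # unknown: caller starts address resolution
        mac, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[ip]  # stale binding is flushed
            return None
        return mac


cache = ArpCache()
cache.insert("192.168.1.7", "aa:bb:cc:dd:ee:ff")
print(cache.lookup("192.168.1.7"))
print(cache.lookup("192.168.1.8"))
```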

Frame format

Standard Ethernet Header
- DstAddr, SrcAddr, FrameType
HardType
- Type of MAC addresses used
- 0x0001 for Ethernet’s 48 bit addresses
ProtType
- Identifier for the higher-layer protocol that requires the address resolution
- 0x0800 for IPv4
HardSize, ProtSize
- Size in bytes of hardware and protocol addresses
- 6 and 4 for Ethernet and IPv4
Op
- Type of ARP packet; differentiates between an ARP-request and an ARP-reply
Sender Ethernet/IP address
Target Ethernet/IP address

ARP is a protocol working at a higher level than Ethernet, so although the destination and source addresses are already included in the Ethernet header, they need to be included again in the Ethernet payload.
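
The frame layout can be reproduced with `struct`, as in the Python sketch below. The function name `build_arp_request` and the example addresses are made up. The `Op` value 1 for a request (2 for a reply) and the all-zero target Ethernet address are not given in these notes; they are assumed here from the ARP specification.

```python
import struct

ETH_BROADCAST = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
OP_REQUEST = 1  # assumption from the ARP specification; a reply would use 2


def build_arp_request(sender_mac: bytes, sender_ip: bytes, target_ip: bytes) -> bytes:
    """Build an Ethernet frame carrying an ARP-request for target_ip."""
    # Ethernet header: DstAddr, SrcAddr, FrameType.
    eth_header = ETH_BROADCAST + sender_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        0x0001,        # HardType: Ethernet
        0x0800,        # ProtType: IPv4
        6, 4,          # HardSize, ProtSize
        OP_REQUEST,    # Op
        sender_mac, sender_ip,
        b"\x00" * 6,   # target Ethernet address is what we are asking for
        target_ip,
    )
    return eth_header + arp


frame = build_arp_request(
    sender_mac=bytes.fromhex("aabbccddeeff"),
    sender_ip=bytes([192, 168, 1, 10]),
    target_ip=bytes([192, 168, 1, 7]),
)
print(len(frame), frame.hex())
```

The sender's MAC address appears twice in the result: once in the Ethernet header and once inside the ARP message itself.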

Internet Control Message Protocol

Allows routers or the destination host to inform the sender of error conditions such as when:

No route to the destination
Destination host is not reachable
Fragmentation required, but DF set

ICMP messages are encapsulated into regular IP datagrams and, like IP, ICMP has no error control mechanisms.

ICMP messages are not required, and are often filtered out by firewalls even if they are sent. Thus, the sending host must not rely on ICMP messages.

Message Format

type: 8-bit ICMP message type
code: 8-bit ICMP sub-type
checksum: 16-bit checksum covering the header and data (calculated with the checksum field set to zero)
Data: depends on the type/code combination, but often includes the first few bytes of the offending IP datagram (including its header)
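
As a sketch of this format, the Python function below assembles an ICMP message and fills in the checksum. It assumes the checksum is the same ones' complement Internet checksum used for the IP header; `build_icmp_message` and the payload bytes are illustrative only.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_icmp_message(icmp_type: int, code: int, data: bytes = b"") -> bytes:
    """type (8 bits), code (8 bits), checksum (16 bits), then type-specific data."""
    # First pass: checksum field set to zero while computing the checksum.
    unchecked = struct.pack("!BBH", icmp_type, code, 0) + data
    checksum = internet_checksum(unchecked)
    return struct.pack("!BBH", icmp_type, code, checksum) + data


# type=8, code=0 is an echo request; the payload here is arbitrary example data.
message = build_icmp_message(8, 0, b"\x00\x01\x00\x01ping")
print(message.hex())
```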

Some common `type`/`code` Combinations

Type	Code	Meaning
`0`	`0`	Echo reply (ping)
`3`	`0`	Destination network unreachable
`3`	`1`	Destination host unreachable
`3`	`2`	Destination protocol unreachable
`3`	`3`	Destination port unreachable
`3`	`4`	Fragmentation required but `DF` set
`3`	`6`	Destination network unknown
`3`	`7`	Destination host unknown
`4`	`0`	Source-quench
`8`	`0`	Echo request (ping)
`11`	`0`	TTL expired in transit
`11`	`1`	Fragment reassembly time exceeded

Destination Unreachable (type=3):

code=0: router has no matching entry (or default entry) for the non-directly connected destination address in its forwarding table
code=1: reached last router before reaching destination, but that router could not reach the directly connected host (e.g. no ARP response)
code=2: reached final destination, but value of protocol field not implemented by receiving host
code=3: used with TCP/UDP when no socket is bound to a port number

Source-quench (type=4, code=0): if IP router drops packet due to congestion. The intention is to let the source host throttle itself, but the act of sending the packet adds more work to the router.

TTL expiration (type=11, code=0): if IP router drops packet due to TTL reaching zero. tracert uses this, incrementing the TTL each time it sends a packet.
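
The idea can be shown with a pure simulation (no real packets are sent). In this Python sketch, `path` is a made-up list of router names ending with the destination host, and the destination is assumed to answer with an echo reply (type 0, code 0). `send_probe` and `trace` are hypothetical names.

```python
from typing import List, Tuple


def send_probe(path: List[str], ttl: int) -> Tuple[str, int, int]:
    """Simulate one probe; return (responder, ICMP type, ICMP code)."""
    for router in path[:-1]:
        ttl -= 1                   # each router decrements the TTL
        if ttl == 0:
            return router, 11, 0   # TTL expired in transit
    return path[-1], 0, 0          # reached the destination: echo reply


def trace(path: List[str], max_ttl: int = 30) -> List[str]:
    """Send probes with TTL 1, 2, 3, ... and record who answers each one."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, icmp_type, icmp_code = send_probe(path, ttl)
        hops.append(responder)
        if (icmp_type, icmp_code) != (11, 0):
            break                  # anything other than "TTL expired" ends the trace
    return hops


print(trace(["r1", "r2", "r3", "host"]))
```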

Fragment reassembly timeout (type=11, code=1): if not all fragments of a message are received within a timeout. This invites the higher-level protocol to re-transmit the message.