The Internet
A packet-switched network of networks. The networks and links follow the end-to-end principle:
- Keep routers dumb, put intelligence in hosts
- Routers know how to deliver packets
- Making deliveries reliable is done by the end host
IPv4
Terminology:
- IP Packets are called datagrams unless fragmentation/reassembly is being used - they are called fragments in this case
- End stations are called hosts
- IP routers are called routers
- IP addresses are assigned to network interfaces, not hosts
Best-Effort, Datagram Delivery:
- Connectionless: no connection/shared state
- Unacknowledged
- Unreliable: no retransmissions
- Unordered: no in-sequence guarantees
Packet Format
Big endian byte ordering.
| Length | Name |
|---|---|
| 4 | Version |
| 4 | Hdr Len |
| 6 | TOS/DSCP |
| 2 | Unused |
| 16 | TotalLength |
| Bytes 0-3 | |
| 16 | Identification |
| 3 | Flags |
| 13 | Fragment Offset |
| Bytes 4-7 | |
| 8 | Time-To-Live |
| 8 | Protocol Type |
| 16 | Header Checksum |
| Bytes 8-11 | |
| 32 | Source Address |
| Bytes 12-15 | |
| 32 | Destination Address |
| Bytes 16+ | |
| 32n | Options + Padding |
| Data |
Version (4)
Version of IP protocol. 4 for IPv4.
Header Length (4)
Length of header in multiples of 4 bytes. Without options, the header is 20 bytes long so the value would be 5.
TOS/DSCP (2)
Stands for Type of Service/DiffServ Code Point. Allows the priority for each packet to be defined (usually by the ISP depending on how much you pay) - mostly ignored.
Fill with zero.
Unused (2)
Fill with zero.
Total Length (16)
Total length (in bytes) of the datagram - required since some MAC layers may add padding to meet minimum length requirements.
This may be modified during fragmentation and reassembly.
Identification (16)
Identifies each IP payload accepted from higher layers for a given interface - a sequence number.
Flags (3)
Contains two flags for fragmentation and reassembly (DF = don’t fragment; MF = more fragments).
Fragmentation Offset (13)
Gives the offset of the current fragment within the entire datagram, in multiples of 8 bytes.
Time-to-Live (8)
Last resort to deal with loops - IP does not specify routing protocols and so there is a possibility that loops can occur.
This value is the upper limit on number of routers a packet can traverse, and is decremented by each router (hence routers must recalculate checksum) the packet passes through. Once the TTL values reaches 0, the packet is discarded and the sender is notified via an ICMP message.
TTL is typically 32 or 64.
Protocol Type (8)
Determines which higher-level protocol generated the payload - provides different SAPs to allow protocol multiplexing.
| Value | Protocol |
|---|---|
0x01 |
ICMP |
0x02 |
IGMP |
0x04 |
IP-in-IP Encapsulation |
0x06 |
TCP |
0x11 |
UDP |
Header Checksum (16)
Checksum for the IP header (NOT the data) - for some types of data, a few bad bits may not matter.
Source/Destination Address (32)
Indicates the initial sender and final receiver of the datagram. Internally structured into two sections: the network and host ID. The network ID refers to some local network, and the host ID must be unique within that network.
Routing and Forwarding
Routing/Forwarding Tables
IP routers have several network interfaces - these are called ports. When a router receives a packet on some input port, it looks at the DestinationAddress field to determine the output port using the forwarding table.
The forwarding table is a table of all networks the router knows about, identified by their network ID, and the output port to send the packet through to reach that network. A table lookup is used to then select the output port for every incoming packet.
Notes:
- The time of the lookup is dependent on the number of entries. Expensive routers often implement this in hardware
- The table is filled by a separate routing protocol
- It is the responsibility of the last router on a path to deliver an IP datagram to the host.
Important: a host address is tied to its location in the network; there is no mobility support - if a host switches to another network, it obtains another address, breaking ongoing TCP connections.
Classless Inter-Domain Routing
How many bits should be allocated to the network ID? Classful addressing used to be used, allocating exactly 8, 16 or 24 bits to the network ID.
CIDR was introduced in 1993 and is now mandatory, and allows an arbitrary number of bits to be allocated to the network.
Netmask
The netmask specifies which bits belong to the network ID. It is a 32 bit value, with the leftmost k bits being 1s, and the remaining bits 0. A /k netmask will have k bits allocated to the network ID. The network ID is calculated using a logical AND with the IP address.
Private host addresses
There are two special host addresses:
- The host address with all 0s (32-k zeroes) signifies that the network as a whole is being referred to
- The host address with all 1s ((32-k ones) is the broadcast address - all hosts on the network will receive the packet
There are also some reserved IP blocks:
| Block | Usage |
|---|---|
10.0.0.0/8 |
Private-use IP networks |
127.0.0.0/8 |
Host loopback network |
169.254.0.0/16 |
Link-local for point-to-point links |
172.16.0.0/12 |
Private-use IP networks |
192.168.0.0/16 |
Private-use IP networks |
These addresses can be used within the provider network; packets with private addresses are not routed in the public internet.
The loopback address is usually denoted as 127.0.0.1, but any valid address in 127.0.0.0/8 can be used.
Simplified Packet Processing
IP input:
- Run sanity check on packet
- Insert packet into IP input queue
- Process IP options (extremely rare - not discussed)
- Check if destined to its own IP address or the broadcast address:
- Protocol demultiplexing: check
Protocolfield, pass TCP/UDP/whatever to the right high-layer protocol
- Protocol demultiplexing: check
- If not:
- Check if forwarding enabled; if not, the device is not a router so drop the packet
- Pass to IP output
IP output:
- Check if the packet is an immediate neighbor/directly reachable (e.g. directly via Ethernet)
- If so, attempt to deliver over a single hop
- Else:
- Calculate next hop using forwarding table lookup
- Forwarding table filled and maintained by a routing daemon
- Decrement
TTL - Recompute
HdrChksum - Hand over packet to outgoing interface
- Calculate next hop using forwarding table lookup
Forwarding Table Contents and Lookup (rough approximation)
Each entry contains:
- Destination IP address: one of
- Full host address (i.e. non-zero host-id)
- Network address with netmask (more common)
- Information about next hop: one of
- IP address of next-hop router (which is directly reachable)
- IP address of directly-connected network (network address/netmask)
- Flags
- If destination IP is host or network
- If next hop is router or directly attached to the network
- Specification of outgoing interface
A router only knows the next hop, not the entire remaining route. The forwarding table lookup has three stages:
- Look for an entry that is a full host address completely matching
dst. If found, send packet to indicated next hop/outgoing interface and stop processing- Extremely uncommon
- Look for an entry that is a network address matching
dst’s network address. If found, send packet to indicated next hop/outgoing interface and stop processingdst & netmask == net-addr & netmask- Usually, the network address already has the host bits set to zero, so slightly overkill
- Look for a special default entry - if found, send packet to indicated next hop - the default router and stop processing
- Drop packet and send ICMP message back to original sender of the datagram
In End-Hosts
Forwarding tables in end hosts generally contain two or more entries:
- The default route, typically configured via DHCP
- An entry for each network it is directly connected to - this way, it can communicate with other hosts on the network
In Routers
Most routers at the ‘border’ of the Internet (close to the customer) usually have small a forwarding table, often filled with other networks belonging to the same owner. For all other networks, they rely on the default router.
Core routers:
- Do not have a default router
- Are the default routers of other routers
- Know (almost) all of the Internet networks
Fragmentation and Reassembly
The link-layer technologies used by IP have different maximum packet sizes, called the maximum transmission unit. For several reasons, IP abstracts these concerns:
- For separation of concerns
- A packet could use different link-layer technologies in transit
- The route could change due to link failures/restores
Hence, IP offers its own maximum message length of 65515: (2^{16} - 1) - 20, where 20 is the minimum IP header size.
The overall process is as follows:
- The sender IP instance partitions the the message into fragments
- Each fragment is transmitted individually as a full IP packet
- Header information specifies that this is a fragment, and stores the position of the fragment in the message
- Each fragment has a size no larger than the MTU of the outgoing link
- Hence, intermediate routers can fragment a full message or split fragments further, but never re-assemble them (as it is not guaranteed all packets will follow the same path)
- The destination IP instance buffers the received fragments, re-assembles the message, then delivers it to the higher layer
- When the first fragment arrives, a buffer large enough for the whole message is allocated and a timer is started
All fragments datagrams belonging to the same message have:
- A full IP header
- The same value in the
Identificationfield (unique identifier set by the higher layer protocol) - A
TotalLengthfield reflecting the size of the fragment FragmentOffsetfield, specifying the offset (in units of 8 bytes) relative to the original, un-fragmented IP datagram- All but the last fragment will have the
MF(more-fragments) bit set- The last fragment does not have this set. It also has a non-zero
FragmentOffsetto differentiate it from an un-fragmented packet
- The last fragment does not have this set. It also has a non-zero
The sender can set the DF (don’t fragment) bit in the IP header to forbid fragmentation by intermediate routers. If the outgoing link’s MTU is too small, the intermediate router may return an ICMP message.
Address Resolution Protocol
IP datagrams are encapsulated into Ethernet frames, and Ethernet stations pick up a packet only if the destination MAC address matches its own. Hence, stations must know which MAC address a given IP address refers to.
ARP allows a host to map a given IP address to an MAC address. ARP is dynamic:
- The MAC address for a given IP address does not need to be statically configured
- Nodes can be moved or equipped with new network adapters without any re-configuration
This trades the downsides of static configuration for the complexity of running a separate protocol.
Basic Operation
- Two stations, A and B, are attached to the same Ethernet network
- Each has an IP and MAC address
- Each maintains an ARP cache
- Station A wishes to send an IP packet to station B’s IP address, but does not know B’s MAC address
- Station A broadcasts an ARP-request message containing:
- Its own IP and MAC address
- Stations B’s IP address
- Hence, the packet is sent to the Ethernet broadcast address. The length/type field of the Ethernet frame has a value of
0x0806 - Any other host recognizes that the IP address is not its own, and discards the packet
- If it cached every binding it ever heard, its ARP cache would get full quickly
- When Station B receives this, it:
- Stores a binding between A’s IP address and MAC address in its own cache
- Responds with a unicast ARP-reply packet including:
- Its own MAC and IP address
- A’s MAC and IP address
- A then stores the binding between B’s IP and MAC address in its ARP cache
ARP makes no retransmissions if an request is not answered.
If a sender wants to send an IP packet to a local destination, it first checks its ARP cache for a binding to the IP address. If not, it starts the address resolution procedure. Once this is finished, it will encapsulate the IP packet in an Ethernet frame and direct it to the MAC address.
Entries in the ARP cache are soft-state; they get discarded some fixed time (usually 20 minutes) after it is inserted into the cache. Some implementations restart the timer after using the entry.
This ensures that stale information gets flushed out (this is much more reliable than that device of interesting notifying everyone that its binding has changed).
Frame format
- Standard Ethernet Header
DstAddr,SrcAddr,FrameType
HardType- Type of MAC addresses used
0x0001for Ethernet’s 48 bit addresses
ProtType- Identifier for the higher-layer protocol that requires the address resolution
0x0800for IPv4
HardSize,ProtSize- Size in bytes of hardware and protocol addresses
- 6 and 4 for Ethernet and IPv4
Op- Type of ARP packet; differentiates between and ARP-request and ARP-reply
- Sender Ethernet/IP address
- Target Ethernet/IP address
ARP is a protocol working an a higher-level than Ethernet, so although the destination and source address is included in the packet, it needs to be included again in the Ethernet payload.
Internet Control Message Protocol
Allows routers or the destination host to inform the sender of error conditions such as when:
- No route to the destination
- Destination host is not reachable
- Fragmentation required, but
DFset
ICMP messages are encapsulated into regular IP datagrams and like IP, has no error control mechanisms.
ICMP messages are not required, and are often filtered out by firewalls even if they are sent. Thus, the sending host must not rely on ICMP messages.
Message Format
type: 8-bit ICMP message typecode: 8-bit ICMP sub-typechecksum: 16-bit checksum covering the header and data (calculated withchecksumbeing zero)- Data: depends on the
type/codecombination, but often includes the first few bytes of the offending IP datagram (including its header)
Some common type/code Combinations
| Type | Code | Meaning |
|---|---|---|
0 |
0 |
Echo reply (ping) |
3 |
0 |
Destination network unreachable |
3 |
1 |
Destination host unreachable |
3 |
2 |
Destination protocol unreachable |
3 |
3 |
Destination port unreachable |
3 |
4 |
Fragmentation required but DF set |
3 |
6 |
Destination network unknown |
3 |
7 |
Destination host unknown |
4 |
0 |
Source-quench |
8 |
0 |
Echo request (ping) |
11 |
0 |
TTL expired in transit |
11 |
1 |
Fragment reassembly time exceeded |
Destination Unreachable (type=3):
code=0: router has no matching entry (or default entry) for the non-directly connected destination address in its forwarding tablecode=1: reached last router before reaching destination, but that router could not reach the directly connected host (e.g. no ARP response)code=2: reached final destination, but value ofprotocolfield not implemented by receiving hostcode=3: used with TCP/UDP when no socket is bound to a port number
Source-quench (type=4, code=0): if IP router drops packet due to congestion. The intention is to let the source host throttle itself, but the act of sending the packet adds more work to the router.
TTL expiration (type=11, code=0): if IP router drops packet due to TTL reaching zero. tracert uses this, incrementing the TTL each time it sends a packet.
Fragmentation reassembly timeout (type=11, code=1): if not all fragments of a message received within a timeout. This invites the higher-level protocol to re-transmit the message.