14. Transport Layer Protocols - Flow and Congestion Control

TCP creates a reliable data transfer service on top of IP using pipelined segments, cumulative ACKs, retransmission timers etc.

Setting the window size

The window size is simply the difference between the last byte sent and last byte acknowledged. TCP simply limits this to LastByteSent - LastByteACKed <= min(CongestionWindow, RecvWindow), where the former is determined by the network congestion control algorithm and the latter by the TCP flow control algorithm.

Flow Control

Flow control is a speed-matching service - it matches the rate at which the sender writes against the rate at which the receiver can read.

The sender and receiver allocate buffers; the receiver simply sends the amount of spare room its buffer (TCP is full-duplex):

RecvWindow = RecvBuffer - (LastByteReceived - LastByteRead)

This is written to the 16-bit ‘Receive Window’ section in the TCP header.

Extreme Case

If the receive window is 0 bytes (i.e. buffer full as the higher-level application has not read the buffer contents), the sender will stop sending data.

This will mean that the receiver can no longer send acknowledgement packets back with updated window sizes. Hence, when the RecvWindow is 0, the sender will periodically send 1 byte of data.

Congestion Control

Packet retransmission treats the symptom of packet loss, not the cause, which is typically overflowing router buffers. In fact, retransmissions can make the problem worse.

Congestion control is not so much a service provided to the application, but to the network to the whole.

There are two broad approaches to congestion control. The first is network-assisted congestion control, where routers provide feedback to end-systems by indicating an explicit rate the sender should send at and/or a single bit flag indicating congestion. Today’s systems try to make routers as simple as possible and so do not use this approach.

Instead, end-to-end congestion control is used, where congestion is inferred from end systems observing loss and delay.

Congestion is caused by too many sources sending too much data too fast for the network to handle. This manifests itself through lost packets (buffer overflows) and long delays (queueing).

Congestion control attempts to ensure everyone enjoys a share of resources without causing network congestion. However, there is no predefined share for everyone - it is dynamic.

Assuming that RecvWindow > CongestionWindow (i.e. LastByteSent - LastByteACKed <= CongestionWindow) and there is negligible transmission delay (time between sending first and last byte in a segment negligible), within in one RTT, the sender sends at most CongestionWindow bytes.

Hence, the rate of transmission is approximately CongestionWindow / RTT - CongestionWindow can be adjusted according to network congestion.

The arrival of ACKs is taken as an indication of the network being congestion-free: if ACKs (for previously unacknowledged segments) arrive quickly, the congestion window should be increased quickly (and slowly if ACKs arrive slowly).

After a loss event - a timeout or 3 duplicate ACKs, the congestion window should decrease.

TCP Slow Start

Begin the connection with CongestionWindow = 1 MSS (maximum segment size). This means 1 segment is sent per RTT.

This may be much smaller than the available bandwidth. Hence, the rate should be increased exponentially until the first loss event: double the congestion window every RTT - this is done by incrementing CongestionWindow by 1 MSS for every ACK.

After the first loss event, congestion control moves on to AIMD:

TCP AIMD (Additive Increase, Multiplicative Decrease)

Additive increase (AI): increase CongestionWindow by 1 MSS every RTT in the absence of loss events. This can be done by CongestionWindow+=1MSSCongestionWindowCongestionWindow \mathrel{+}= \frac{1 MSS}{CongestionWindow} whenever a new ACK arrives (1MSSCongestionWindowCongestionWindow1MSS=1MSS\frac{1 MSS}{CongestionWindow} \cdot \frac{CongestionWindow}{1 MSS} = 1 MSS). Additive increase is called congestion avoidance as it increases slowly to avoid congestion.

Multiplicative decrease (MD): halve CongestionWindow after every loss event.

This leads to a saw-tooth pattern in the congestion window size.

Timeout Rules

After three duplicate ACKs arrive, the CongestionWindow should be halved and then the window should grow linearly (AIMD).

After a timeout event, CongestionWindow should be set to 1 MSS and grow exponentially (slow start) until it reaches a threshold - half of CongestionWindow from before the timeout.

Note that the duplicate ACK behavior occurs in TCP Reno; TCP Tahoe’s (Reno’s predecessor) behavior is the same as a timeout event.