11. Reliable Data Transfer - Error Detection and Correction

Transmission errors can occur due to factor such as:

Thermal noise
Weak signal strength, interference, jamming, jitter erc.
Router/switch/station crashing
Packets being:
- routed in the wrong direction
- dropped due to congestion
- dropped due to insufficient resources

To correct:

Bit errors due to signal attenuation/noise, use error detection and correction
Buffer overflows due to speed mismatch, too much traffic etc., use flow/congestion control
Lost packets due to buffer overflows, lack of resources, use acknowledgement and retransmission
Out-of-order packets due to retransmissions/packets using different routes, use acknowledgement and retransmission

Error detection

Use redundancy bits, sometimes called error detection and correction bits.

These bits can be used to check if an error has been detected (not that it has occurred).

Parity Check

This is the simplest way of detection an error; append a single parity bit to the end of a block of data.

Even parity: number of 1’s in the data, plus parity bit is even
Odd parity: number of 1’s in the data, plus parity bit is odd

2-D Parity Check

Divide bits into a matrix and use parity bits for each column and row, plus a parity bit for the parity bits (number of 1’s for the row and column parity bits).

Hence, it is possible to determine the bit which was flipped and hence correct single-bit errors.

This is called forward error correction - the receiver can detect and correct errors without needing to transmit a message to the sender.

Used for IPv4 header checksum and UDP data checksum (optional in IPv4, mandatory in IPv6. Checksum of part of the IP header (source/destination addresses, plus other fields), the UDP header and data).

Segment contents are treated as a sequence of 16-bit integers.

The checksum is the wraparound sum of the integers (ones complement):

Add the two integers
If there is overflow, add 1 to the result (wraparound)

Once all integers have been summed, invert the bits.

The receiver will sum all segments, plus the checksum, and check that all the bits are 1.

This scheme is considerably less effective than CRC, but also very simple to implement.

Why does IP (not v6), UDP and TCP even need checksums?

Some link-layers do not do error correction
Bit errors can occur anywhere - e.g. router memory
As a final end-to-end check
Certain functionality must be implemented on an end-to-end basis

def ones_complement_addition(shorts):
  result = 0
  for short in shorts:
    result += short
    if result > 0xFFFF:
      result -= 0xFFFF
  return result

def checksum(shorts):
  return 0xFFFF - ones_complement_addition(shorts)

def check_checksum(shorts, checksum):
  return 0xFFFF == (ones_complement_addition(shorts) + checksum)

Cyclic Redundancy Check (CRC)

Assumptions:

d data bits
The sender and receiver agree on an r+1 pattern called a generator, where the left-most bit is 1

The sender chooses r additional bits to append to the data such that the d+r are divisible by the generator using modulo-2 arithmetic.

The receiver then divides the d+r bits by the generator; if there is no remainder, no error is detected.

D: d-bit data block
$F$ : r additional bits (frame check sequence)
$T$ : d+r-bit frame
$G$ : predetermined pattern of r+1 bits

Modulo-2 arithmetic:

Addition/subtraction: exclusive OR ( $\oplus$ )
Multiplication/division: same as base-2 arithmetic, but no carries or borrows

The $D$ bits are shifted left by r bits, so $T = D \cdot 2^r \oplus F$ .

This is divisible by the divisor: hence, $D \cdot 2^r \oplus F = nG$ .

$\oplus F$ both sides: $D \cdot 2^r = nG \oplus F$

Hence, $F$ is the remainder of $\frac{D \cdot 2^r}{G}` (using modulo-2 arithmetic).

Example

G=11 0101, D=10 1000 1101, r=5

Initially, set the F bits to zero:

Divide 11 0101 by T=101 0001 1010 0000:

10 1000 XOR 11 0101 = 01 1011. The divisor and divided have an equal number of bits, so the most significant bit of the result is 1.

Get rid of the leading 0, then pull the next least significant bit from T:

11 0111 XOR 11 0101 = 00 1110. Second MSB of result is 1 again.

Git rid of the leading 0’s, then pull the next least significant bit from T:

1 1101 XOR 11 0101; this will not fit, so add a 0 as the next MSB of the result. Pull the next LSB from T:

11 1010 XOR 11 0101 = 00 1011 …

So result of division is 11 0101 0110 with a remainder of 1110. Thus, F=01110 and the message is T=101 001 1010 1110.

Other notes

It is called a cyclic redundancy check as if you do a circular shift, it is still divisible by G.
CRC can detect consecutive bit errors of r bits or fewer.
CRCs are also called polynomial codes: do the division as sums of powers of X (2?)
Four versions of G are widely used e.g. CRC-32 is sums of X to the power of 32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0

Forward Error Correction

Useful on wireless applications with propagation delay and high bit error rates.

Messages are divided into blocks of k-bits called datawords; the FEC encoder generates n-bit codewords.

Hamming distance: distance $d(v_1, v_2)$ between two n-bit binary sequences; the number of bits in which $V_1$ and $v_2$ disagree (assuming they are the same length).

Error correction can be done by finding the minimum distance from the received codeword to any valid codeword (and thus dataword). This fails when it is an equal distance away from multiple valid codewords.

Useful properties for the scheme:

Maximize the overall hamming distances between any two valid codewords
Ensure that no codeword is an equal distance away from two codewords

k-error correcting: minimum hamming distance between two valid codewords is at least 2k + 1.