03. Classical Encryption

Terminology

Cryptography: the study of designing systems
Cryptoanalysis: the study of breaking systems
Steganography; the study of concealing information; not covered in this course

Cryptography transforms data based on a secret called the key. It provides confidentiality and authentication:

Confidentiality: key needed to read the message
Authentication: key needed to write the message

Cryptosystems:

A set of plaintexts holding the original message
A set of ciphertexts holding the encrypted message
- Sometimes called cryptogram
A set of keys
A function called the encryption or encipherment which transforms the plaintext into a ciphertext
An inverse function called the decryption or decipherment which transforms the ciphertext into the plaintext

Symmetric key cipher (secret key cipher):

Encryption/decryption keys known only to the sender/receiver
Secure channel required for transmission of keys

Asymmetric key cipher (public key cipher):

Each participant has a public and private key
Can be used to both encrypt messages and create digital signatures

Notation for Symmetric Encryption Algorithms

Encryption function $E$
Decryption function $D$
Message/plaintext $M$
Cryptogram/ciphertext $C$
Shared secret key $K$

Encryption: $C = E(M, K)$

Decryption: $M = D(C, K)$

Methods of Cryptanalysis

What resources are available to the adversary? Computational capabilities, inputs/outputs to the systems, etc.

What is the adversary aiming to achieve? Retrieving the whole secret key? Distinguishing between two messages?

Exhaustive Key Search

Adversary tries all possible keys. Impossible to prevent such attacks; can only ensure there are enough keys to make exhaustive search too difficult computationally.

Note that the adversary may find the key without exhaustive search or even break the cryptosystem without finding the key.

Preventing exhaustive key search is a minimum standard.

Attack Classification

Ciphertext only attack: the attacker has access only to intercepted ciphertexts.

A cryptosystem is highly insecure if it can be practically attacked using only intercepted ciphertexts.

Known plaintext attack: the attacker knows a small amount of plaintexts and their corresponding ciphertexts.

Chosen plaintext attack: the attacker can obtain the ciphertext from some plaintext it has selected (attacker has ‘inside encryptor’).

Chosen ciphertext attack: the attacker can obtain the plaintext from some ciphertext it has selected (attacker has ‘inside decryptor’).

A cryptosystem should be secure against chosen plaintext and ciphertext attacks.

Kerckhoff’s Principle

Kerckoff’s Principle states the that the attacker has complete knowledge of the cipher; the decryption key is the only item unknown to the attacker.

Secret, non-standard algorithms are often flawed, providing mainly security through obscurity.

Alphabets

Historical ciphers: define alphabet for the plaintext and ciphertext

Roman alphabet: $A, B, C, \dots, Z$

Sometimes it includes spaces, upper/lowercase characters, punctuation
Sometimes maps the alphabet to numbers: $A = 0, B = 1, \dots, Z = 25$

Statistical attacks depend on using the redundancy of the alphabet:

Distribution of single letters, digrams, trigrams are used
Exact statistics vary by sample

Basic Cipher Operations

Transposition: characters in plaintext are mixed up with each other (permutations)

Substitution: each character is replaced by a different character

Transposition Cipher

Permuting characters in a fixed period $d$ and permutation $f$ .

The plaintext is seen as a matrix with $d$ columns. For each row, the characters are mixed up in the order given by $f$ . The same permutation is used by each row.

Key is $(d, f)$ , each block of $d$ characters being re-ordered using the permutation $f$ .

There are $d!$ permutations of length $d$ .

Cryptanalysis

Frequency distribution of ciphertext and plaintext characters are the same.

If $d$ is small, transposition ciphers can be solved by hand using anagramming.

Knowledge of plaintext language digram/trigrams can help to optimize trials.

Simple Substitution Ciphers

Each character in plaintext alphabet replaced by character in ciphertext alphabet using a substitution table.

This is called a monoaphabetic substitution cipher.

Caesar cipher:

$i$ th letter of the alphabet mapped to the $(i + j)$ th letter using the key $j$
Encryption: $C_i = (M_i + j) \pmod n$
Decryption: $M_i = (C_i - j) \pmod n$
Guess $j$ by finding the most frequent character in the ciphertext and mapping it to the most frequent character in the language (e.g. $\Delta$ (space), ‘e’)

Random simple substitution cipher:

Each character assigned to a random character of the alphabet
Encryption/decryption done using substitution table
If the alphabet has $26$ characters, there are $26!$ keys
- One-to-one mapping, so second character can only be assigned to $n - 1$ characters
Caesar cipher is a special case of the random simple substitution cipher
Frequency analysis: use the most frequent characters, common di/trigrams such as ‘the’

Polyalphabetic Substitution

Multiple mappings from plaintext to ciphertext: smoothens frequency distribution.

Typically periodic substitution ciphers based on a period $d$ .

Given $d$ ciphertext alphabets $C_0, C_1, \dots, c_{d - 1}$ , let $f_i: A \rightarrow C_i$ .

A plaintext message:

M = M_0 \dots M_{d-1}M_d \dots M_{2d-1}M_{2d} \dots

it is encrypted to:

(K, M) = f_0(M_0)f_1(M_1) \dots f_{d-1}(M_{d - 1})f_0(M_d) \dots f_{d - 1}(M_{2d - 1}) \dots

If $d = 1$ , the cipher is monoalphabetic - a simple substitution cipher.

Key generation:

Select block length $d$
Generate $d$ random simple substitution ciphers

Encryption: encrypt the $i$ th character using the $j$ th substitution table such that $i \equiv j \pmod d$ .

Decryption: use the same substitution table as encryption.

Vigenère Cipher

Based on shifted alphabets.

The key $K$ is a sequence of characters $K = K_0 K_1 \dots K_{d - 1}$ .

Let $M$ be the plaintext character. for $0 \le i \le d - 1$ , $K$ gives the amount of shift in the $i$ th character e.g. $f_i(M) = (M + K_i) \bmod n$ .

e.g. if $K= LOCK = \{ 11, 14, 2, 10 \}$ , the first character is shifted by $11$ , the second is shifted by $14$ , …, the fifth is shifted by $11$ .

Cryptanalysis

Identifying period length:

Kasiski method
Cryptool uses autocorrelation to automatically estimate period

Once period identified, the $d$ substitution tables can be attacked separately - there needs to be sufficient ciphertext to do this.

Autocorrelation

Given ciphertext $C$ , computed the correlation between $C$ and its shift $C_i$ for all values $i$ of the period.

English is non-random; there is better correlation between two texts with the same shift size.

Find peaks in the value of $C_i$ when $i$ is a multiple of the period; results can be plotted on a histogram.

Kasiski Method

If you identify sequences of characters that occur multiple times, find the distance between them; the period is likely to be a multiple of the period.

If you find multiple sequences with different distances, the period is likely to be a common divisor.

Once the period is found, the separate alphabets can be attacked separately; at this point, it is just a Caesar cipher.

Other Ciphers (for use by hand)

Beaufort cipher: like Vigenère, but $f_i(M) = (K_i - M) \bmod n$

Autokey: starts off as the Vigenère cipher, but the plaintext defines the subsequent alphabet. Hence, the cipher is not periodic.

Running key cipher: practically infinite set of alphabets generated from a shared key. This is ofen an extract from a book called the book cipher.

Rotor Machines

Enigma: each character encrypted with a different alphabet with a period of ~17,000; would never repeat in the same message (in practice).

Hill Cipher

Polygram/polygraphic cipher: simple substitution of an extended alphabet consisting of multiple characters.

Has linearity, making known plaintext attacks easy.

Given $d$ plaintext characters:

Encryption: multiplying the $d \times d$ matrix $K$ by the block of plaintext $M$ : $C = KM$ .

Decryption: multiplying the matrix $K^{-1}$ by the block of ciphertext $C$ : $M = K^{-1}C$ .

Example

$d = 2$ ; takes digrams as input/output blocks.

Each plaintext pair written as column vector. If there are insufficient letters, they are filled with uncommon letters (e.g. $Z$ ).

K = \begin{pmatrix} 4 & 6 \\ 1 & 7 \end{pmatrix}, K^{-1} = \begin{pmatrix} 4 & 12 \\ 11 & 10 \end{pmatrix}

Plaintext:

M = (EG) = \begin{pmatrix} 4 \\ 6 \end{pmatrix}

Encryption:

\begin{aligned} C &= KM \\ &= \begin{pmatrix} 4 & 6 \\ 1 & 7 \end{pmatrix} \begin{pmatrix} 4 \\ 6 \end{pmatrix} \\ &= \begin{pmatrix} 52 \bmod{27} \\ 46 \bmod{27} \end{pmatrix} \\ &= \begin{pmatrix} 25 \\ 19 \end{pmatrix} \\ &= ZT \\ \end{aligned}

Decryption:

\begin{aligned} M &= K^{-1}C \\ &= \begin{pmatrix} 4 & 12 \\ 11 & 10 \end{pmatrix} \begin{pmatrix} 25 \\ 19 \end{pmatrix} \\ &= \begin{pmatrix} 4 \\ 6 \end{pmatrix} \\ &= (EG) \end{aligned}

Cryptanalysis

Known plaintext attacks possible given $d$ plaintext-ciphertext matching blocks: given blocks (column vectors) $M_i$ and $C_i$ , $0 \le i \le d - 1$ :

$C = [C_0 C_1 \dots C_{d-1}]$
$M = [M_0 M_1 \dots M_{d - 1}]$
$C = KM$ , so $K = CM^{-1}$

$C$ , $M$ and $K$ are all $d \times d$ vectors

Then $K^{-1}$ can be found to decrypt the ciphertext.

Comments:

The plaintext message $M$ may not be invertible
Ciphertext-only attacks follow known plaintext attacks with the extra task of finding probable blocks of matching plaintext-ciphertext
- e.g. if $d = 2$ , the frequency distribution of non-overlapping pairs of ciphertext characters can be compared with the distribution of pairs of plaintext characters
Cryptool defaults to an alphabet of $A = 1, B = 2, \Delta = 27$ (where $\Delta$ is space)