All Files in ‘COSC362 (2021-S2)’ Merged

01. Introduction

Lecturer: Clementine Gritti.

Course weighting:

Quizzes: 20%
- 9 quizzes, 10 multi-choice questions with 4 options and 1 correct answer
Assignment: 20%
- Released a week before the term break, due the week after
Lab attendance: 10%
- If lab is missed, report can be emailed to course coordinator and TA by the following Friday
Final Exam: 50%

CIA Triad:

Loss of Confidentiality: when unauthorized people can access the data
Loss of Integrity: when unauthorized changes can be made to the data
Loss of Availability: when authorized users cannot access the data

02. Course Overview

What is Cyber security?

The NIST (National Institute of Standards and Technology) computer security handbook defines it as protections afforded to an information system to preserve confidentiality, integrity, availability (CIA) of system resources.

Terminology:

Threat: potential security harm to an asset/system resource
Attack: threat that is carried out. If successful, leads to an undesirable violation of security
Countermeasure: means taken to deal with attacks (prevention, detection, recovery)
Vulnerabilities:
- Leaky: access to information given when it should not
- Corrupted: does the wrong thing or gives wrong answers
- Unavailable: impossible/impractical to access

Key questions:

What assets need to be protected
How are the assets threatened
How can the threats be countered

Assets include:

Hardware
Software: OS, system utilities, applications
Data: files, DB
Networks: local/wide area network links, bridges, routers etc.

Types of Attacks

Passive:

Does not alter information or resources in the system
Hard to detect, easy to prevent
Types of passive attacks:
- Eavesdropping/interception: attacker directly accesses sensitive data in transit
- Traffic analysis/inference: observations of the amount of traffic between the source and destination

Active:

Alters information or system resources
Hard to prevent, easy to detect
Types of active attacks:
- Masquerade: attacker claims to be a different entity
- Message modification (falsification): message modified in transit
- DDoS (misappropriation): attacker prevents legitimate users from accessing resources

Inside:

Initiated by entity inside the security perimeter
Authorized to use the systems, but used in a malicious way
Types of inside attacks:
- Exposure: sensitive data intentionally released to outsider
- Falsification: data altered or replaced

Outside:

Initiated from outside the perimeter by an unauthorized or illegitimate user
Types of outside attacks:
- Obstruction: communication links disabled, or communication control information altered
- Intrusion: attacker gains unauthorized access to sensitive data

Fundamental Requirements

Information security management requires:

Threat identification
Classification by likelihood and severity
Security controls applied based on cost-benefit analysis

Countermeasures to threats and vulnerabilities:

Computer security technical measures (access control, authentication etc.)
Management measures (awareness, training)

What is Information Security?

ISO security architecture defines:

Security: when vulnerabilities in assets/resources are minimized
Asset: anything of value
Vulnerability: any weakness that could be exploited to violate a system or its information
Threat: potential security violation

Hence, information security is security where the assets/resources are information systems

Security Services and Mechanisms

OSI Security Architecture X.800: dated, but most definitions/terminology still relevant. Defines security threats, services, and mechanisms.

Security Services

A security service is processing/communication service that gives a specific kind of protection to system resources.

Security services include:

Peer entity authentication: confirms entity is who they claim to be
Data origin authentication: confirms origin of data unit/message
Access control: protects against unauthorized use of resources
Data confidentiality: protects data against unauthorized disclosure
Traffic flow confidentiality: protects disclosure of data that can be derived from knowledge of traffic flow
Data integrity: detects modification/replay of data in messages
Non-repudiation: protects against the message creator falsely denying having created the data
Availability: protects against DDoS
Encipherment: transforms data to hide its content
Digital signature: mechanism to transform data using a signing key

From Stack Exchange:

Non-repudiation: entity cannot deny having sent/signed the message
Message (or data origin) authentication: entity originally made the message
Entity authentication: entity involved in current communication session

Security Mechanisms

A security mechanism is a method of implementing one or more security services.

Security mechanisms include:

Data integrity: corruption-detection techniques
- Message Authentication Codes
Authentication exchange: protocols to ensure identity of participants
- TLS
Traffic padding: spurious traffic generated to protect against traffic analysis; usually used in combination with encipherment
Control lists, passwords, tokens which indicate access rights
Routing control: use of specific secure routes
Notarization: use of trusted third party to assure source/receipt of data

03. Number Theory and Finite Fields

Discrete mathematics: cryptology deals with finite objects (e.g. alphabets, blocks of characters)

Modular arithmetic: deals with a finite number of values.

Basic Number Theory

Factorization

$\mathbb{Z} = \{ \dots, -3, -2, -1, 0, 1, 2, 3, \dots \}$ is the set of integers.

Given $a, b \in \mathbb{Z}$ , $a$ divides $b$ if there exists $k \in \mathbb{Z}$ such that $ak = b$ . In this case, $a$ is a factor of $b$ : $a|b$ .

An integer $p > 1$ is prime iff its only divisors are $1$ and $p$ .

Properties

If $a|b$ and $a|c$ , then $a|(b+c)$ .

If $p$ is prime and $p|ab$ , $p|a$ OR $p|b$ .

Division algorithm

Given $a, b \in \mathbb{Z}$ such that $a > b > 0$ , there exist $q, r \in \mathbb{Z}$ such that:

a = bq + r

$q$ is the quotient and $0 \le r \lt b$ is the remainder. Since $a > b$ , $r \lt \frac{a}{2}$ .

Greatest Common Divisor

$d$ is the GCD of $a$ and $b$ ; $gcd(a, b) = d$ if:

$d|a$ and $d|b$
If $c|a$ and $c|b$ , then $c|d$
$d \gt 0$

$a$ and $b$ are relatively prime if $gcd(a, b) = 1$

Euclidean Algorithm

To find $d = gcd(a, b)$ :

\begin{aligned} a &= bq_1 + r_1 \text{ for } 0 \lt r_1 \lt b \\ b &= r_1q_2 + r_2 \text{ for } 0 \lt r_2 \lt r_1 \\ r_1 &= r_2q_3 + r_3 \text{ for } 0 \lt r_3 \lt r_2 \\ & \dots \\ r_{k - 2} &= r_{k - 1}q_k + r_k \text{ for } 0 \lt r_k \lt r_{k - 1} \\ r_{k - 1} &= r_kq_{k + 1} \text{ with } r_{k + 1} = 0 \end{aligned}

Hence, $d = r_k = gcd(a, b)$ .

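In code (Python):

def gcd(a, b):
  # r_prev and r_curr play the roles of r[k - 1] and r[k]
  r_prev, r_curr = a, b
  while r_curr != 0:
    q = r_prev // r_curr  # quotient q[k]
    # r[k + 1] = r[k - 1] - q[k] * r[k]
    r_prev, r_curr = r_curr, r_prev - q * r_curr
  # the loop stops when the remainder is 0; the previous remainder is the GCD
  return r_prev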

Back-Substitution

Find $x$ , $y$ in $ax + by = d = r_k$ .

r_{k - 3} = r_{k - 2}q_{k - 1} + r_{k - 1}

Can be rewritten as:

r_{k - 1} = r_{k - 3} - r_{k - 2}q_{k - 1}

$r_{k - 2} = r_{k - 1}q_k + r_k$ can be rewritten as $r_k = r_{k - 2} - r_{k - 1}q_k$ . $r_{k - 1}$ can be substituted with the value above. This process can be repeated until $r_k$ is expressed in terms of the original values $a$ and $b$ .

Example: $gcd(17, 3)$ :

\begin{aligned} 17 &= 3 \times 5 + 2 \\ 3 &= 2 \times 1 + 1 \\ 2 &= 1 \times 2 \end{aligned}

Back-substitution:

\begin{aligned} 1 &= 3 - 2 \times 1 \\ &= 3 - (17 - 3 \times 5) \times 1 \\ &= 17 \times - 1 + 3 \times 6 \end{aligned}

The result shows that $3^{-1} \equiv 6 \pmod{17}$ .
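The back-substitution can also be carried out while the Euclidean algorithm runs, by tracking the coefficients of $a$ and $b$ alongside each remainder. The sketch below uses that iterative form rather than literal back-substitution; the function name `extended_gcd` is an illustrative choice, not something from the lectures.

```python
def extended_gcd(a, b):
    """Return (d, x, y) such that a*x + b*y == d == gcd(a, b)."""
    # Invariants: old_r == a*old_x + b*old_y and r == a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


d, x, y = extended_gcd(17, 3)
print(d, x, y)  # 1 -1 6, i.e. 17*(-1) + 3*6 = 1
```

The output matches the worked example: the coefficient of $3$ is $6$ .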

Modular Arithmetic

$b$ is a residue of $a \pmod n$ if $a - b = kn$ for some integer $k$ :

a \equiv b \pmod n \Longleftrightarrow a - b = kn

Given $a \equiv b \pmod n$ and $c \equiv d \pmod n$ :

\begin{aligned} a + c &\equiv b + d \pmod n \\ ac &\equiv bd \pmod n \\ ka &\equiv kb \pmod n \end{aligned}

$b \pmod n$ denotes the unique value $a$ in the complete set of residues $\{ 0, 1, \dots, n-1\}$ such that:

a \equiv b \pmod n

In other words, $b \pmod n$ is the remainder after dividing $b$ by $n$ .

Residue Class

The set $\{ r_0, r_1, \dots, r_{n - 1}\}$ is a complete set of residues modulo $n$ if for every $a \in \mathbb{N}$ , $a \equiv r_i \pmod n$ for exactly one $r_i$ .

The set $\{ 0, 1, \dots, n - 1\}$ is denoted as $\mathbb{Z}_n$ .

Groups

A group $\mathbb{G}$ is a set with binary operation $\cdot$ and:

Closure: $a \cdot b \in \mathbb{G}$ for any and all $a, b \in \mathbb{G}$
Identity: there is an element $1$ such that $a \cdot 1 = 1 \cdot a = a$ for any and all $a \in \mathbb{G}$
Inverse: for each $a \in \mathbb{G}$ there is an element $b \in \mathbb{G}$ such that $a \cdot b = b \cdot a = 1$
Associativity: $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ for any and all $a, b, c \in \mathbb{G}$

A group is abelian when its operation is commutative: $a \cdot b = b\cdot a$ for all $a, b \in \mathbb{G}$ .

Cyclic Groups

The order $|\mathbb{G}|$ of a group $\mathbb{G}$ is the number of elements in $\mathbb{G}$ .

$g^k$ denotes the repeated application of $g \in \mathbb{G}$ using the group operation. e.g. $g^3 = g \cdot g \cdot g$ .

The order $|g|$ of $g \in \mathbb{G}$ is the smallest positive integer $k$ such that $g^k = 1$ .

$g$ is a generator for $\mathbb{G}$ if $|g| = |\mathbb{G}|$ .

A group is cyclic if it has a generator.

Computing Inverses Modulo $n$

The inverse of $a$ (if it exists) is a value $x$ such that $ax \equiv 1 \pmod n$ ; it is written as $a^{-1} \pmod n$ . For an inverse to exist, $a$ must be coprime to $n$ .

Theorem: let $0 \lt a \lt n$ ; $a$ has an inverse modulo $n$ iff $gcd(a, n) = 1$ .

The Euclidean algorithm can be used to find the inverse of $a$ .

If $x$ satisfies $ax \equiv 1 \pmod n$ , there is an integer $y$ such that $ax = 1 + yn$ .

Hence, starting from $gcd(a, n) = 1$ , use back-substitution to find $x$ and $y$ in the equation $ax + ny = 1$ . The $x$ value (reduced modulo $n$ ) gives the inverse.
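A minimal sketch of this procedure, assuming only the coefficient of $a$ needs to be tracked (the coefficient of $n$ disappears modulo $n$ ). The name `mod_inverse` is hypothetical.

```python
def mod_inverse(a, n):
    """Return x in {0, ..., n-1} with a*x = 1 (mod n); raise if no inverse exists."""
    # Invariants (mod n): old_r = a*old_x and r = a*x.
    old_r, r = a % n, n
    old_x, x = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
    if old_r != 1:
        raise ValueError(f"{a} has no inverse modulo {n}: gcd is {old_r}")
    return old_x % n


print(mod_inverse(3, 17))  # 6
```

In Python 3.8 and later the built-in `pow(a, -1, n)` computes the same value.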

Group of Prime Modulus $\mathbb{Z}^*_p$

$\mathbb{Z}^*_p = \{1, 2, \dots, p - 1\}$ is a complete set of residues modulo the prime $p$ with the value $0$ removed.

Properties:

$|\mathbb{Z}^*_p| = p - 1$
$\mathbb{Z}^*_p$ is cyclic
$\mathbb{Z}^*_p$ has many generators (in general)

It can be thought of as the multiplicative group of integers $1, 2, \dots, p - 1$ which have inverses modulo $p$ .

Finding Generators

A generator of $\mathbb{Z}^*_p$ is an element of order $p - 1$

Lagrange theorem: the order of any element must exactly divide $p - 1$ .

To find a generator of $\mathbb{Z}^*_p$ :

Compute all distinct prime factors $f_1, f_2, \dots, f_r$ of $p - 1$
$g$ is a generator iff $g^{\frac{p - 1}{f_i}} \not\equiv 1 \pmod p$ for all $i = 1, 2, \dots, r$

Example

Find a generator for $\mathbb{Z}_{11}^*$ .

$|\mathbb{Z}_{11}^*|$ is $10$ as $11$ is prime. $10 = 2 \cdot 5$ , so to check if $g \in \mathbb{Z}_{11}^*$ is a generator, check if $g^2 \not\equiv 1 \pmod{11}$ , $g^5 \not\equiv 1 \pmod{11}$ :

$1$ : not a generator as $1^n \equiv 1 \pmod{11}$
$2$ : $2^5 \equiv 32 \equiv 10 \not\equiv 1 \pmod{11}$ and $2^2 \equiv 4 \not\equiv 1 \pmod{11}$ so $2$ is a generator
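The generator test can be sketched as follows. Trial division is assumed to be good enough for factoring $p - 1$ at this scale, and the helper names are illustrative.

```python
def prime_factors(m):
    """Return the distinct prime factors of m by trial division."""
    factors = []
    f = 2
    while f * f <= m:
        if m % f == 0:
            factors.append(f)
            while m % f == 0:
                m //= f
        f += 1
    if m > 1:
        factors.append(m)
    return factors


def is_generator(g, p):
    """Check whether g generates Z_p^* (p is assumed to be prime, not verified)."""
    return all(pow(g, (p - 1) // f, p) != 1 for f in prime_factors(p - 1))


print([g for g in range(1, 11) if is_generator(g, 11)])  # [2, 6, 7, 8]
```

This agrees with the property that $\mathbb{Z}^*_p$ generally has many generators.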

Groups of Composite Modulus $\mathbb{Z}^*_n$

For any non-prime $n$ , $\mathbb{Z}_n^*$ is a group of residues that have an inverse under multiplication.

Properties:

$\mathbb{Z}_n^*$ is a group
$\mathbb{Z}_n^*$ is not cyclic in general
Finding its order is difficult

e.g. $\mathbb{Z}_6^* = \{1, 5\}$ (elements coprime to $6$ ).

Example

Find a generator for $\mathbb{Z}_9^*$ .

$\mathbb{Z}_9^* = \{ 1, 2, 4, 5, 7, 8 \}$ so $|\mathbb{Z}_9^*| = 6$

$1$ : not a generator as $1^n \equiv 1 \pmod 9$
$2$ :
- $2^2 \equiv 4 \pmod 9$
- $2^3 \equiv 8 \pmod 9$
- $2^4 \equiv 16 \equiv 7 \pmod 9$
- $2^5 \equiv 32 \equiv 5 \pmod 9$
- $2^6 \equiv 64 \equiv 1 \pmod 9$
- $2^7 \equiv 128 \equiv 2 \pmod 9$
- Hence, the order of $2$ is equal to $|\mathbb{Z}_9^*|$ , cycling through all elements in the group before repeating
- NB: every element is guaranteed to be in $\mathbb{Z}_9^*$ as it is a group and hence has closure over the multiplication operation
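For small $n$ , the group and the order of each element can simply be enumerated. This is a brute-force sketch with hypothetical helper names; it is not practical for cryptographic sizes.

```python
from math import gcd


def units(n):
    """Return the elements of Z_n^*: residues in 1..n-1 that are coprime to n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def order(g, n):
    """Return the smallest k >= 1 with g^k = 1 (mod n); g must be coprime to n."""
    k, value = 1, g % n
    while value != 1:
        value = (value * g) % n
        k += 1
    return k


group = units(9)
print(group)                                            # [1, 2, 4, 5, 7, 8]
print([g for g in group if order(g, 9) == len(group)])  # [2, 5]
print([g for g in units(8) if order(g, 8) == 4])        # [] (Z_8^* is not cyclic)
```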

Fields

A field $\mathbb{F}$ is a set with two binary operations, $+$ and $\cdot$ , with the properties such that:

$\mathbb{F}$ is an abelian group under the operation $+$ with the identity element $0$
$\mathbb{F} \backslash \{ 0 \}$ is an abelian group under the operation $\cdot$ with identity element $1$
Distributivity: $a \cdot (b + c) = (a \cdot b) + (a \cdot c)$ for $a, b, c \in \mathbb{F}$

Theorem: finite fields exist only for sizes $p^n$ , where $p$ is a prime and $n$ is any positive integer.

Finite Field $GF(p)$

For a finite field $GF(p) = \mathbb{Z}_p$ :

Multiplication and addition are done modulo $p$
Its multiplicative group is exactly $\mathbb{Z}_p^*$

Finite Field $GF(2)$

$GF(2)$ is the simplest field with two elements, $0$ and $1$ :

Addition modulo $2$ : XOR ( $\oplus$ )
Multiplicative group: $\{ 1 \}$

Finite Field $GF(2^8)$

$GF(2^8)$ is the field used for calculations in AES (block cipher).

Arithmetic in this field is considered as polynomial arithmetic where the field elements are polynomials with binary coefficients. e.g. $0010 1101 \leftrightarrow x^5 + x^3 + x^2 + 1$

Properties:

Polynomial division can be done easily using shift registers
Adding two strings: add their coefficients modulo $2$ (XOR)
Multiplication with respect to a generator polynomial
- AES uses $m(x) = x^8 + x^4 + x^3 + x + 1$
Multiplying two strings: multiply them as polynomials, then take remainder of division by $m(x)$
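As a sketch of these rules, bytes can be treated as polynomials, with the reduction by $m(x)$ interleaved with a shift-and-add multiplication. The function names are illustrative; `0x11B` is the bit pattern of $m(x)$ .

```python
AES_MODULUS = 0x11B  # bit pattern of x^8 + x^4 + x^3 + x + 1


def gf_add(a, b):
    """Add two field elements: coefficient-wise addition modulo 2 is XOR."""
    return a ^ b


def gf_mul(a, b):
    """Multiply two bytes as polynomials over GF(2), reduced modulo m(x)."""
    result = 0
    while b:
        if b & 1:             # lowest remaining term of b is present
            result ^= a
        a <<= 1               # multiply a by x
        if a & 0x100:         # degree reached 8: subtract (XOR) m(x)
            a ^= AES_MODULUS
        b >>= 1
    return result


print(hex(gf_add(0x57, 0x83)))  # 0xd4
print(hex(gf_mul(0x57, 0x83)))  # 0xc1
```

The operands `0x57` and `0x83` are the worked example from the AES specification (FIPS 197).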

04. Classical Encryption

Terminology

Cryptography: the study of designing cryptosystems
Cryptanalysis: the study of breaking cryptosystems
Steganography: the study of concealing information; not covered in this course

Cryptography transforms data based on a secret called the key. It provides confidentiality and authentication:

Confidentiality: key needed to read the message
Authentication: key needed to write the message

Cryptosystems:

A set of plaintexts holding the original message
A set of ciphertexts holding the encrypted message
- Sometimes called cryptogram
A set of keys
A function called the encryption or encipherment which transforms the plaintext into a ciphertext
An inverse function called the decryption or decipherment which transforms the ciphertext into the plaintext

Symmetric key cipher (secret key cipher):

Encryption/decryption keys known only to the sender/receiver
Secure channel required for transmission of keys

Asymmetric key cipher (public key cipher):

Each participant has a public and private key
Can be used to both encrypt messages and create digital signatures

Notation for Symmetric Encryption Algorithms

Encryption function $E$
Decryption function $D$
Message/plaintext $M$
Cryptogram/ciphertext $C$
Shared secret key $K$

Encryption: $C = E(M, K)$

Decryption: $M = D(C, K)$

Methods of Cryptanalysis

What resources are available to the adversary? Computational capabilities, inputs/outputs to the systems, etc.

What is the adversary aiming to achieve? Retrieving the whole secret key? Distinguishing between two messages?

Exhaustive Key Search

Adversary tries all possible keys. Impossible to prevent such attacks; can only ensure there are enough keys to make exhaustive search too difficult computationally.

Note that the adversary may find the key without exhaustive search or even break the cryptosystem without finding the key.

Preventing exhaustive key search is a minimum standard.

Attack Classification

Ciphertext only attack: the attacker has access only to intercepted ciphertexts.

A cryptosystem is highly insecure if it can be practically attacked using only intercepted ciphertexts.

Known plaintext attack: the attacker knows a small amount of plaintexts and their corresponding ciphertexts.

Chosen plaintext attack: the attacker can obtain the ciphertext from some plaintext it has selected (attacker has ‘inside encryptor’).

Chosen ciphertext attack: the attacker can obtain the plaintext from some ciphertext it has selected (attacker has ‘inside decryptor’).

A cryptosystem should be secure against chosen plaintext and ciphertext attacks.

Kerckhoffs’ Principle

Kerckhoffs’ Principle states that the attacker has complete knowledge of the cipher; the decryption key is the only item unknown to the attacker.

Secret, non-standard algorithms are often flawed, providing mainly security through obscurity.

Alphabets

Historical ciphers: define alphabet for the plaintext and ciphertext

Roman alphabet: $A, B, C, \dots, Z$

Sometimes it includes spaces, upper/lowercase characters, punctuation
Sometimes maps the alphabet to numbers: $A = 0, B = 1, \dots, Z = 25$

Statistical attacks depend on using the redundancy of the alphabet:

Distribution of single letters, digrams, trigrams are used
Exact statistics vary by sample

Basic Cipher Operations

Transposition: characters in plaintext are mixed up with each other (permutations)

Substitution: each character is replaced by a different character

Transposition Cipher

Permuting characters in a fixed period $d$ and permutation $f$ .

The plaintext is seen as a matrix with $d$ columns. For each row, the characters are mixed up in the order given by $f$ . The same permutation is used by each row.

Key is $(d, f)$ , each block of $d$ characters being re-ordered using the permutation $f$ .

There are $d!$ permutations of length $d$ .
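A minimal sketch of a transposition cipher with key $(d, f)$ . It assumes $f$ is given as a list where output position $i$ takes the character at input position $f[i]$ , and that the last block is padded with `X`; both conventions are choices made for this example.

```python
def transpose_encrypt(plaintext, f, pad="X"):
    """Reorder each block of d = len(f) characters: output i takes input f[i]."""
    d = len(f)
    # Pad the final block so the length is a multiple of d (assumption of this sketch).
    plaintext += pad * (-len(plaintext) % d)
    blocks = [plaintext[i:i + d] for i in range(0, len(plaintext), d)]
    return "".join("".join(block[j] for j in f) for block in blocks)


def transpose_decrypt(ciphertext, f):
    """Undo transpose_encrypt by applying the inverse permutation to each block."""
    inverse = [0] * len(f)
    for i, j in enumerate(f):
        inverse[j] = i
    return transpose_encrypt(ciphertext, inverse)


f = [2, 0, 3, 1]
c = transpose_encrypt("ATTACKATDAWN", f)
print(c)                        # TAATACTKWDNA
print(transpose_decrypt(c, f))  # ATTACKATDAWN
```

The ciphertext contains exactly the same letters as the plaintext, which is why the frequency distribution is unchanged.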

Cryptanalysis

Frequency distribution of ciphertext and plaintext characters are the same.

If $d$ is small, transposition ciphers can be solved by hand using anagramming.

Knowledge of plaintext language digram/trigrams can help to optimize trials.

Simple Substitution Ciphers

Each character in plaintext alphabet replaced by character in ciphertext alphabet using a substitution table.

This is called a monoalphabetic substitution cipher.

Caesar cipher:

$i$ th letter of the alphabet mapped to the $(i + j)$ th letter using the key $j$
Encryption: $C_i = (M_i + j) \pmod n$
Decryption: $M_i = (C_i - j) \pmod n$
Guess $j$ by finding the most frequent character in the ciphertext and mapping it to the most frequent character in the language (e.g. $\Delta$ (space), ‘e’)
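The Caesar cipher and the frequency-based guess can be sketched as follows, assuming a 26-letter upper-case alphabet without spaces and that ‘E’ is the most frequent plaintext letter. The function names are illustrative.

```python
from collections import Counter
from string import ascii_uppercase as ALPHABET  # A = 0, ..., Z = 25, so n = 26


def caesar(text, j):
    """Shift every letter of an upper-case A-Z string by j positions modulo 26."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + j) % 26] for ch in text)


def guess_shift(ciphertext, most_common_plain="E"):
    """Guess j by assuming the most frequent ciphertext letter encrypts 'E'."""
    top = Counter(ciphertext).most_common(1)[0][0]
    return (ALPHABET.index(top) - ALPHABET.index(most_common_plain)) % 26


c = caesar("MEETMEATTHEELMTREE", 3)
print(c)                           # PHHWPHDWWKHHOPWUHH
print(guess_shift(c))              # 3
print(caesar(c, -guess_shift(c)))  # MEETMEATTHEELMTREE
```

On short or unusual texts the guess can be wrong; with only 26 keys, trying every shift is also feasible.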

Random simple substitution cipher:

Each character assigned to a random character of the alphabet
Encryption/decryption done using substitution table
If the alphabet has $26$ characters, there are $26!$ keys
- One-to-one mapping, so second character can only be assigned to $n - 1$ characters
Caesar cipher is a special case of the random simple substitution cipher
Frequency analysis: use the most frequent characters, common di/trigrams such as ‘the’

Polyalphabetic Substitution

Multiple mappings from plaintext to ciphertext: smoothens frequency distribution.

Typically periodic substitution ciphers based on a period $d$ .

Given $d$ ciphertext alphabets $C_0, C_1, \dots, C_{d - 1}$ , let $f_i: A \rightarrow C_i$ .

A plaintext message:

M = M_0 \dots M_{d-1}M_d \dots M_{2d-1}M_{2d} \dots

is encrypted to:

E(M, K) = f_0(M_0)f_1(M_1) \dots f_{d-1}(M_{d - 1})f_0(M_d) \dots f_{d - 1}(M_{2d - 1}) \dots

If $d = 1$ , the cipher is monoalphabetic - a simple substitution cipher.

Key generation:

Select block length $d$
Generate $d$ random simple substitution ciphers

Encryption: encrypt the $i$ th character using the $j$ th substitution table such that $i \equiv j \pmod d$ .

Decryption: use the same substitution table as encryption, read in the reverse direction.

Vigenère Cipher

Based on shifted alphabets.

The key $K$ is a sequence of characters $K = K_0 K_1 \dots K_{d - 1}$ .

Let $M$ be a plaintext character. For $0 \le i \le d - 1$ , $K_i$ gives the amount of shift for the $i$ th alphabet, i.e. $f_i(M) = (M + K_i) \bmod n$ .

e.g. if $K= LOCK = \{ 11, 14, 2, 10 \}$ , the first character is shifted by $11$ , the second is shifted by $14$ , …, the fifth is shifted by $11$ .
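A short sketch of the Vigenère cipher with the key $LOCK$ , assuming a 26-letter upper-case alphabet with $A = 0$ . The `decrypt` flag is a convenience added for this example.

```python
from string import ascii_uppercase as ALPHABET  # A = 0, ..., Z = 25, so n = 26


def vigenere(text, key, decrypt=False):
    """Shift the i-th letter by K_(i mod d), or by -K_(i mod d) when decrypting."""
    sign = -1 if decrypt else 1
    shifts = [ALPHABET.index(k) for k in key]
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + sign * shifts[i % len(shifts)]) % 26]
        for i, ch in enumerate(text)
    )


c = vigenere("ATTACKATDAWN", "LOCK")
print(c)                                  # LHVKNYCDOOYX
print(vigenere(c, "LOCK", decrypt=True))  # ATTACKATDAWN
```

The three occurrences of `A` in the plaintext encrypt to different letters whenever they fall under different key characters, which is the smoothing effect described above.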

Cryptanalysis

Identifying period length:

Kasiski method
Cryptool uses autocorrelation to automatically estimate period

Once period identified, the $d$ substitution tables can be attacked separately - there needs to be sufficient ciphertext to do this.

Autocorrelation

Given ciphertext $C$ , compute the correlation between $C$ and its shift $C_i$ for each candidate shift $i$ .

English is non-random; the correlation is higher when the shift is a multiple of the period, since the aligned characters were then encrypted with the same alphabet.

Look for peaks in the correlation between $C$ and $C_i$ , which occur when $i$ is a multiple of the period; results can be plotted on a histogram.
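One simple way to measure this correlation is to count the positions at which the ciphertext agrees with its shifted copy. This is a sketch of the idea only; it is not necessarily the exact statistic Cryptool computes.

```python
def autocorrelation(ciphertext, max_shift):
    """For each shift i, count positions where the text and its i-shifted copy agree."""
    return {
        i: sum(a == b for a, b in zip(ciphertext, ciphertext[i:]))
        for i in range(1, max_shift + 1)
    }


# Artificial text with an exact period of 3, to make the peak obvious.
print(autocorrelation("ABCABCABC", 4))  # {1: 0, 2: 0, 3: 6, 4: 0}
```

Real ciphertext gives much noisier counts, so a reasonably long sample is needed before the peaks stand out.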

Kasiski Method

If you identify sequences of characters that occur multiple times, find the distance between them; the distance is likely to be a multiple of the period.

If you find multiple sequences with different distances, the period is likely to be a common divisor of those distances.

Once the period is found, the alphabets can be attacked separately; at this point, each one is just a Caesar cipher.

Other Ciphers (for use by hand)

Beaufort cipher: like Vigenère, but $f_i(M) = (K_i - M) \bmod n$

Autokey: starts off as the Vigenère cipher, but the plaintext defines the subsequent alphabet. Hence, the cipher is not periodic.

Running key cipher: practically infinite set of alphabets generated from a shared key. The key is often an extract from a book, in which case it is called a book cipher.

Rotor Machines

Enigma: each character encrypted with a different alphabet with a period of ~17,000; would never repeat in the same message (in practice).

Hill Cipher

Polygram/polygraphic cipher: simple substitution of an extended alphabet consisting of multiple characters.

Has linearity, making known plaintext attacks easy.

Given $d$ plaintext characters:

Encryption: multiplying the $d \times d$ matrix $K$ by the block of plaintext $M$ : $C = KM$ .

Decryption: multiplying the matrix $K^{-1}$ by the block of ciphertext $C$ : $M = K^{-1}C$ .

Example

$d = 2$ ; takes digrams as input/output blocks.

Each plaintext pair written as column vector. If there are insufficient letters, they are filled with uncommon letters (e.g. $Z$ ).

K = \begin{pmatrix} 4 & 6 \\ 1 & 7 \end{pmatrix}, K^{-1} = \begin{pmatrix} 4 & 12 \\ 11 & 10 \end{pmatrix}

Plaintext:

M = (EG) = \begin{pmatrix} 4 \\ 6 \end{pmatrix}

Encryption:

\begin{aligned} C &= KM \\ &= \begin{pmatrix} 4 & 6 \\ 1 & 7 \end{pmatrix} \begin{pmatrix} 4 \\ 6 \end{pmatrix} \\ &= \begin{pmatrix} 52 \bmod{27} \\ 46 \bmod{27} \end{pmatrix} \\ &= \begin{pmatrix} 25 \\ 19 \end{pmatrix} \\ &= ZT \\ \end{aligned}

Decryption:

\begin{aligned} M &= K^{-1}C \\ &= \begin{pmatrix} 4 & 12 \\ 11 & 10 \end{pmatrix} \begin{pmatrix} 25 \\ 19 \end{pmatrix} \\ &= \begin{pmatrix} 4 \\ 6 \end{pmatrix} \\ &= (EG) \end{aligned}
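The arithmetic in this example can be checked with a few lines of code. The helper `hill_apply` is hypothetical, and the modulus of $27$ follows the example above.

```python
def hill_apply(matrix, block, n=27):
    """Multiply a d x d matrix by a column vector of d character codes, modulo n."""
    return [sum(k * m for k, m in zip(row, block)) % n for row in matrix]


K = [[4, 6], [1, 7]]
K_INV = [[4, 12], [11, 10]]

c = hill_apply(K, [4, 6])
print(c)                     # [25, 19], i.e. ZT
print(hill_apply(K_INV, c))  # [4, 6], i.e. EG
```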

Cryptanalysis

Known plaintext attacks possible given $d$ plaintext-ciphertext matching blocks: given blocks (column vectors) $M_i$ and $C_i$ , $0 \le i \le d - 1$ :

$C = [C_0 C_1 \dots C_{d-1}]$
$M = [M_0 M_1 \dots M_{d - 1}]$
$C = KM$ , so $K = CM^{-1}$

$C$ , $M$ and $K$ are all $d \times d$ matrices

Then $K^{-1}$ can be found to decrypt the ciphertext.

Comments:

The plaintext matrix $M$ may not be invertible
Ciphertext-only attacks follow known plaintext attacks with the extra task of finding probable blocks of matching plaintext-ciphertext
- e.g. if $d = 2$ , the frequency distribution of non-overlapping pairs of ciphertext characters can be compared with the distribution of pairs of plaintext characters
Cryptool defaults to an alphabet of $A = 1, B = 2, \Delta = 27$ (where $\Delta$ is space)

05. Block Ciphers

Block ciphers are the main bulk encryption algorithms used in commercial applications. AES is one example of such an algorithm.

Principles

Block ciphers are symmetric key ciphers where each block of plaintext is encrypted with the same key.

A block is a set of plaintext symbols of a fixed size, typically 64 to 256 bits in modern ciphers.

They are used in configurations called modes of operation.

Notation

$P$ : plaintext block of length $n$ bits
$C$ : ciphertext block of length $n$ bits
$K$ : key of length $k$ bits
Encryption: $C = E(P, K)$
Decryption: $P = D(C, K)$

Criteria

Shannon defined two encryption techniques:

Confusion: substitution used to make the relationship between $K$ and $C$ as complex as possible.
Diffusion: transformations used to dissipate the statistical properties of $P$ across $C$ .

Both techniques can be applied repeatedly using the concept of a product cipher.

Product & Iterated Ciphers

Product Cipher

Cryptosystem where encryption performed by applying/composing several sub-encryption algorithms: output of one block used as input to next block.

Often composed of simple functions $f_i$ for $1 \le i \le r$ such that each $f_i$ has its own key $K_i$ .

C = E(P, K) = f_r(\dots(f_2(f_1(P, K_1), K_2)\dots), K_r)

Iterated Cipher

Special product ciphers called iterated ciphers where:

Encryption divided into $r$ similar rounds
Sub-encryption functions are the same function $g$ : the round function
Each round key/subkey $K_i$ is derived from the master key $K$ using a process called key schedule

Encryption

Given plaintext block $P$ , round function $g$ , round keys $K_1, K_2, \dots, K_r$ , the ciphertext block $C$ is derived through $r$ rounds:

\begin{aligned} W_0 &= P \\ W_1 &= g(W_0, K_1) \\ W_2 &= g(W_1, K_2) \\ & \dots \\ W_r &= g(W_{r - 1}, K_r) = C \end{aligned}

Decryption

There must be an inverse function $g^{-1}$ such that $g^{-1}(g(W, K_i), K_i) = W$ for all keys $K_i$ and blocks $W$ .

\begin{aligned} W_r &= C \\ W_{r - 1} &= g^{-1}(W_r, K_r) \\ W_{r - 2} &= g^{-1}(W_{r - 1}, K_{r - 1}) \\ & \dots \\ W_0 &= g^{-1}(W_1, K_1) = P \end{aligned}

Types of Iterated Ciphers

Substitution-Permutation Network (SPN) e.g. Advanced Encryption Standard (AES).

Feistel Cipher e.g. Data Encryption Standard (DES).

Substitution-Permutation Network

Block length $n$ must allow each block to be split into $m$ sub-blocks of length $l$ : $n = lm$ .

Substitution $\pi_S$ (called substitution box or S-box) operates on sub-blocks of length $l$ bits:

\pi_S: \{ 0, 1 \}^l \rightarrow \{ 0, 1 \}^l

i.e. mapping some binary number of size $l$ bits to another.

Permutation $\pi_P$ (called permutation-box or P-box) swaps the inputs from $\{ 1, \dots, n \}$ :

\pi_P: \{ 1, \dots, n \} \rightarrow \{ 1, \dots, n \}

i.e. swapping the order of bits in the entire block around.

Operation

Round key $K_i$ XORed with current state block $W_i$ : $K_i \oplus W_i$
Each sub-block substituted applying $\pi_S$
The whole block permuted using $\pi_P$
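One SPN round can be sketched for a toy block of $n = 16$ bits with $m = 4$ sub-blocks of $l = 4$ bits. The S-box and P-box values below are illustrative choices for this example only; they are not taken from any standard cipher described in these notes.

```python
# Hypothetical 4-bit S-box (a permutation of 0..15).
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
# Hypothetical P-box: bit i (0 = most significant) moves to position PBOX[i].
PBOX = [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]


def spn_round(state, round_key):
    """Apply key mixing, substitution and permutation to a 16-bit state."""
    state ^= round_key                      # XOR with the round key
    substituted = 0
    for i in range(4):                      # substitute each 4-bit sub-block
        nibble = (state >> (4 * i)) & 0xF
        substituted |= SBOX[nibble] << (4 * i)
    permuted = 0
    for i in range(16):                     # permute the bits of the whole block
        bit = (substituted >> (15 - i)) & 1
        permuted |= bit << (15 - PBOX[i])
    return permuted


print(hex(spn_round(0x0000, 0x0000)))  # 0xfff0
```

Because the S-box and P-box are both permutations, each round is invertible once the round key is known.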

Example

4 round keys
4 S-boxes
1 P-box
Last round does have a P-block

Feistel Cipher

The round function swaps the two halves of the block and combines the old left half with the output of $f$ to form the new right-hand half.

Encryption

Feistel Cipher Network

Plaintext block $P = W_0$ split into two halves $(L_0, R_0)$ .

For each round:

$L_i = R_{i - 1}$
$R_i = L_{i - 1} \oplus f(R_{i - 1}, K_i)$

The output is the ciphertext block $C = W_r = (L_r, R_r)$ .

Decryption

Split $C$ into two halves, $(L_r, R_r)$ .

For each round:

$L_{i - 1} = R_i \oplus f(L_i, K_i)$
$R_{i - 1} = L_i$

$f$ does not need to be inverted: decryption recomputes the same value $f(R_{i - 1}, K_i)$ and XORs it in a second time, and since $x \oplus x = 0$ this cancels it out.

The choice of $f$ is critical for security; it is the only non-linear part of the encryption.
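The round equations translate directly into code. In this sketch the halves are 16-bit integers and `toy_f` is a made-up round function with no security value; it is deliberately not invertible, to show that decryption still works.

```python
def feistel_encrypt(left, right, round_keys, f):
    """Apply L_i = R_(i-1), R_i = L_(i-1) XOR f(R_(i-1), K_i) for each round key."""
    for k in round_keys:
        left, right = right, left ^ f(right, k)
    return left, right


def feistel_decrypt(left, right, round_keys, f):
    """Apply L_(i-1) = R_i XOR f(L_i, K_i), R_(i-1) = L_i with the keys reversed."""
    for k in reversed(round_keys):
        left, right = right ^ f(left, k), left
    return left, right


def toy_f(half, key):
    """Hypothetical round function on 16-bit halves (not invertible in general)."""
    return (half * half + key) & 0xFFFF


keys = [0x1234, 0xABCD, 0x0F0F, 0x5555]
l, r = feistel_encrypt(0xDEAD, 0xBEEF, keys, toy_f)
print(feistel_decrypt(l, r, keys, toy_f) == (0xDEAD, 0xBEEF))  # True
```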

Differential and Linear Cryptanalysis

Differential cryptanalysis: chosen plaintext attack using correlation in the differences between two input plaintexts and their corresponding ciphertexts.

Linear cryptanalysis: known plaintext attack that theoretically breaks DES.

Modern block ciphers are normally immune to both attacks.

Avalanche Effects

Key avalanche: a small change in key (with the same plaintext) should result in a large change in ciphertext.

Plaintext avalanche: a small change in plaintext should result in a large change in ciphertext: changing one plaintext bit should change each bit of the ciphertext with a probability of $1/2$ .

Key avalanche is related to Shannon’s notion of confusion, plaintext avalanche to Shannon’s notion of diffusion.

DES

Designed by IBM researchers, became US standard in 1976. 16-round Feistel cipher with key length of 56 bits and data block length of 64 bits.

The key length is actually 64 bits, but the last bit of every byte is redundant.

Encryption

$P$ is an input plaintext block of 64 bits:

All bits of $P$ permuted using an initial fixed permutation $IP$
16 rounds of Feistel operations applied, denoted by function $f$
- Each round uses a different 48-bit subkey
- The subkey is defined by a series of permutations and shifts
A final fixed inverse permutation $IP^{-1}$ is applied

The 64-bit ciphertext block $C$ is the output.

Decryption requires only reversing the order in which the subkeys are applied.

Feistel Operation

32 bits expanded to 48 bits using a padding scheme which repeats some bits
XOR the 48 bits with the 48-bit subkey
Break 48 bits into 8 blocks of 6 bits
Each block $W_i$ transformed using substitution table $S_i$ , resulting in blocks of length $4$ and hence a total of 32 bits.
- A transformation table is used to determine the output value
- If the input block is $W = x_1x_2x_3x_4x_5x_6$ , the row number is given by $x_1x_6$ and the column number by $x_2x_3x_4x_5$
A permutation is applied to the result
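The row and column selection for an S-box input can be sketched with bit operations. The function name is illustrative, and the actual substitution tables $S_i$ are not reproduced here.

```python
def sbox_coordinates(block):
    """Split a 6-bit input x1..x6 into (row, column): row = x1x6, column = x2x3x4x5."""
    x1 = (block >> 5) & 1
    x6 = block & 1
    row = (x1 << 1) | x6
    column = (block >> 1) & 0b1111
    return row, column


print(sbox_coordinates(0b011011))  # (1, 13)
```

The row is in the range 0 to 3 and the column in the range 0 to 15, so each table holds 64 entries of 4 bits.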

Brute Force Attacks

Testing all the possible $2^k$ keys (where $k$ is the size of the key $K$ ). $k = 56$ is fairly small, requiring only $2^{k}/2 = 2^{55}$ trials on average - this was criticized from the start.

The key can be identified by using a small number of ciphertext blocks and looking for low entropy in the decrypted plaintext.

Double Encryption

Let $K_1$ and $K_2$ be two block cipher keys.

Encryption: $C = E(E(P, K_1), K_2)$ .

If both keys have length $k$ , exhaustive attacks require $2^{2k - 1}$ trials on average.

Meet-in-the-Middle Attack

Let $(P, C)$ be a single plaintext-ciphertext pair:

For each key $K$ , store $C' = E(P, K)$ in memory
For any key $K'$ , check if $D(C, K') = C'$ (i.e. matches any ciphertext stored in memory)
- If this is found, $K$ is $K_1$ and $K'$ is $K_2$
- Check if key values work for other $(P, C)$ pairs

Requirements:

Storing one intermediate block $C'$ for every key: $2^{56}$ 64-bit blocks
An encryption operation for every key
A decryption operation for every key

Expensive but still much cheaper than brute-forcing $2^{111}$ keys.
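The attack can be demonstrated on a toy cipher small enough to enumerate. The 8-bit cipher below is invented for this example and has no security value; only the structure of the attack (forward table, backward lookup, confirmation on extra pairs) reflects the steps above.

```python
def toy_encrypt(p, k):
    """Hypothetical 8-bit block cipher with an 8-bit key (illustration only)."""
    return (((p ^ k) * 5) + k) % 256


def toy_decrypt(c, k):
    # 205 is the inverse of 5 modulo 256 (5 * 205 = 1025 = 4 * 256 + 1).
    return (((c - k) * 205) % 256) ^ k


def meet_in_the_middle(pairs):
    """Return all (K1, K2) consistent with every (P, C) pair under double encryption."""
    p, c = pairs[0]
    table = {}  # intermediate value E(P, K) -> list of keys K producing it
    for k in range(256):
        table.setdefault(toy_encrypt(p, k), []).append(k)
    candidates = []
    for k2 in range(256):
        for k1 in table.get(toy_decrypt(c, k2), []):
            # Confirm the candidate on the remaining pairs.
            if all(toy_encrypt(toy_encrypt(pp, k1), k2) == cc for pp, cc in pairs[1:]):
                candidates.append((k1, k2))
    return candidates


k1, k2 = 0x3A, 0xC5
pairs = [(p, toy_encrypt(toy_encrypt(p, k1), k2)) for p in (0x00, 0x41, 0x7F)]
print((k1, k2) in meet_in_the_middle(pairs))  # True
```

The table costs 256 encryptions and the lookup 256 decryptions, compared with up to $256^2$ trials for a naive search over both keys.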

Triple Encryption

Requires three keys: $C = E(D(E(P, K_1), K_2), K_3)$ . The middle operation is a decryption so that setting $K_1 = K_2 = K_3$ reduces to single encryption, keeping compatibility with single-key systems.

This increases the computational requirements enough to make it secure against meet-in-the-middle attacks.

NIST SP 800-131A (2015) approves two-key triple DES, where $K_1 = K_3$ , only for legacy use. Three-key triple DES is approved.

OpenSSL removed triple DES in 2016. Office 365 stopped using triple DES in 2018.

AES

Designed in an open competition after controversy over DES. Winning submission is ‘Rijndael’.

128-bit data block
128-, 192- or 256-bit master key
Byte-based design
Substitution-permutation network
- Initial round key addition
- 10, 12, or 14 rounds (depending on key size)
- Final round

Algorithm

State Matrix

16-byte data block size arranged in a $4 \times 4$ matrix.

Mixture of finite field operations in $GF(2^8)$ and bit string operations.

Round Transformation

Each round has four basic operations:

ByteSub (non-linear substitution): substitute each byte with a different value using a substitution table
ShiftRow (permutation): rotate first row left by zero bytes, second row left by one byte… (bytes wrap around to the right)
MixColumn (diffusion): each column is replaced with result of it being multiplied by a matrix
AddRoundKey: XORs array with round key

Substitution-permutation network with block length of $n = 128$ and sub-block length of $l = 8$ .

S-box uses look-up table.

Key Schedule

Master key is 128/192/256 bits.

Each of the 10/12/14 rounds uses a 128-bit subkey. There is one subkey per round plus one initial subkey, all derived from the master key.

Security

Some weaknesses but no significant break; most serious real attacks can reduce effective key size by around two bits.

Vulnerable to related-key attack: attacker obtains a ciphertext encrypted with a key related to an actual key in a specified way.

Comparisons with DES

Data block size: 64 vs 128 bits
Key size: 56 vs 128/192/256 bits
Design:
- Both iterated ciphers
- DES uses Feistel; AES uses SPN
- AES substantially faster in both hardware and software

06. Block Cipher Modes of Operation

Block ciphers encrypt single blocks of data, but many applications require multiple blocks to be encrypted sequentially and breaking the plaintext into blocks and encrypting them separately can be insecure.

Modes of operation are standardized with different security and efficiency characteristics.

NIST has many standards (e.g. SP 800-38 series) for this.

Important Features of Different Modes

Different modes can provide confidentiality, authentication (and integrity) or both.

Modes for confidentiality normally include randomization.

Different modes have different efficiency and communication properties.

Randomized Encryption

Problem: the same plaintext block is encrypted to the same ciphertext block every time - allows patterns to be found in long ciphertexts.

Prevention: randomizing encryption schemes by using an initialization vector (IV) which propagates through the entire ciphertext. It may need to be either random or unique.

Alternatively, there could be a variable state which is updated with each block.

Efficiency

Parallel processing: encrypting/decrypting multiple plaintext/ciphertext blocks in parallel.

Error propagation: a bit error in the ciphertext which results in multiple bit errors in the plaintext after decryption.

Padding

Requiring the plaintext to consist of complete blocks.

NIST suggested padding method: append a single $1$ bit to the data string, then pad with $0$ bits until the last block is complete.
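A byte-oriented sketch of this padding, assuming the data is a whole number of bytes so that the single $1$ bit followed by zeros becomes the byte `0x80` followed by `0x00` bytes. The names `pad` and `unpad` and the 16-byte default block size are choices made for this example.

```python
def pad(data: bytes, block_size: int = 16) -> bytes:
    """Append a 1 bit (0x80) and then zero bytes up to a block boundary."""
    padded = data + b"\x80"
    return padded + b"\x00" * (-len(padded) % block_size)


def unpad(padded: bytes) -> bytes:
    """Strip the trailing zeros and the 0x80 marker."""
    stripped = padded.rstrip(b"\x00")
    if not stripped.endswith(b"\x80"):
        raise ValueError("invalid padding")
    return stripped[:-1]


print(pad(b"abc").hex())
```

Because the marker is always added, a message that already fills its last block gains one extra block, which keeps the padding unambiguous to remove.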

Notation

Plaintext message $P$ of length $n$ blocks
$t$ -th plaintext block $P_t$ for $1 \le t \le n$
Ciphertext message $C$
$t$ -th ciphertext block $C_t$ for $1 \le t \le n$
Key $K$
Initialization vector $IV$

Modes can be applied to any block cipher.

Confidentiality Modes

Electronic Code Book (ECB) Mode

A basic mode of a block cipher; each block is encrypted with a key, IV is not used.

Encryption

C_t = E(P_t, K)

Decryption

P_t = D(C_t, K)

Properties

Randomized	Padding	IV	Parallel encryption	Parallel decryption
No	Required	None	Yes	Yes

Error propagation: within blocks.

Cipher Block Chaining (CBC) Mode

Blocks chained together: the plaintext XORed with previous ciphertext (or IV for the first block) and then encrypted.

Encryption

C_t = E(P_t \oplus C_{t - 1}, K)

for $1 \le t \le n$ where $C_0 = IV$ .

Decryption

P_t = D(C_t, K) \oplus C_{t - 1}

Where $C_0 = IV$ .

Properties

Randomized	Padding	IV	Parallel encryption	Parallel decryption
Yes	Required	Random	No	Yes

Error propagation: within blocks and into specific bits in the next block.

Parallel decryption means that decryption does not require the plaintext of previous block. However, it does require the ciphertext of the previous block.

Commonly used for bulk encryption, was often used in TLS up to TLS 1.2.
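The chaining can be sketched with any invertible block cipher. The example below substitutes a toy 4-round Feistel cipher built from SHA-256 for $E$ and $D$ (an assumption made purely so the code is self-contained; it is not a secure cipher) and applies the CBC equations above to a list of 16-byte blocks.

```python
import hashlib

BLOCK = 16  # toy block size in bytes


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(half: bytes, key: bytes, i: int) -> bytes:
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]


def E(block: bytes, key: bytes) -> bytes:
    """Toy Feistel encryption (illustrative stand-in, NOT secure)."""
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(right, key, i))
    return left + right


def D(block: bytes, key: bytes) -> bytes:
    """Inverse of E: undo the Feistel rounds in reverse order."""
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(left, key, i)), left
    return left + right


def cbc_encrypt(blocks, key, iv):
    out, prev = [], iv
    for p in blocks:
        prev = E(_xor(p, prev), key)   # C_t = E(P_t xor C_{t-1}, K)
        out.append(prev)
    return out


def cbc_decrypt(blocks, key, iv):
    prevs = [iv] + blocks[:-1]         # only ciphertext is needed, so this parallelizes
    return [_xor(D(c, key), prev) for c, prev in zip(blocks, prevs)]
```

Encrypting two identical plaintext blocks with this sketch gives two different ciphertext blocks, and flipping one bit of $C_1$ garbles $P_1$ while flipping exactly the same bit of $P_2$.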

Counter (CTR) Mode

Synchronous stream cipher mode.

Encryption

The counter and a nonce (IV) are initialized using a random value $N$ :

T_t = N \| t

That is, $T_t$ is the concatenation of the nonce $N$ with the block number $t$ .

Then, this result is encrypted with the key $K$ :

O_t = E(T_t, K)

Finally, it is XORed with the plaintext block $P_t$ :

\begin{aligned} C_t &= O_t \oplus P_t \\ &= E(N \| t, K) \oplus P_t \end{aligned}

Decryption

P_t = O_t \oplus C_t

Properties

Randomized	Padding	IV	Parallel encryption	Parallel decryption
Yes	Optional	Unique	Yes	Yes

A one-bit change in ciphertext produces one-bit change in the plaintext at the same location.

This allows access to specific plaintext blocks without decrypting the whole stream.

CTR mode is the basis for authenticated encryption in TLS 1.2.
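A minimal sketch of CTR mode follows. Since only the forward direction $E$ is ever used, a truncated SHA-256 of the key and input stands in for the block cipher; that stand-in, the 8-byte nonce with 8-byte block counter, and the name `ctr_xor` are all assumptions for illustration.

```python
import hashlib

BLOCK = 16


def E(block: bytes, key: bytes) -> bytes:
    """Stand-in for the forward direction of a block cipher (NOT a real cipher)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def ctr_xor(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """Encrypt or decrypt: XOR data with the keystream O_t = E(N || t, K)."""
    if len(nonce) != 8:
        raise ValueError("this sketch expects an 8-byte nonce")
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        t = offset // BLOCK + 1                        # block number, starting at 1
        keystream = E(nonce + t.to_bytes(8, "big"), key)
        chunk = data[offset:offset + BLOCK]            # last chunk may be short
        out += bytes(c ^ k for c, k in zip(chunk, keystream))
    return bytes(out)
```

The same function decrypts, no padding is needed because the last keystream block is simply truncated, and each block can be processed independently.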

Authentication Mode

Message Integrity

Ensuring messages are not altered in transmission: preventing an adversary from re-ordering, replacing, replicating or deleting message blocks to alter the received message.

Message integrity and authentication are treated as the same thing.

Proving message integrity is independent from using encryption for confidentiality.

Message Authentication Code (MAC)

T = \mathrm{MAC}(M, K)

Where $M$ is an arbitrary-length message and $K$ is a secret key.

The output $T$ is a short, fixed-length tag.

Given both parties share the key $K$ :

The sender computes $T = \mathrm{MAC}(M, K)$
The message $M$ and tag $T$ are sent
The receiver computes $T' = \mathrm{MAC}(M', K)$ on the received message $M'$ and checks that $T' = T$

MAC Properties

Only the sender and receiver can produce $T$ from $M$ .

If $T' = T$ , the receiver can conclude the message received is from the expected sender and has not been modified in transit. Otherwise, the receiver can conclude $(M', T)$ was not sent by the expected sender.

It has the basic security property of unforgeability: it is infeasible to produce $M$ and $T$ such that $T = \mathrm{MAC}(M, K)$ without knowledge of $K$ .

Basic CBC-MAC

Using block cipher to create a MAC providing message integrity (but not confidentiality).

If $P$ is the message with $n$ blocks:

C_t = E(P_t \oplus C_{t - 1}, K)

For $1 \le t \le n$ such that $C_0 = IV$ .

$T = \mathrm{CBC\text{-}MAC}(P, K)$ is the last ciphertext block: $T = C_n$ .

It is unforgeable as long as the message length is fixed.

$IV$ must be fixed and public (e.g. all zeroes): CBC-MAC with a random IV is NOT secure:

If the IV is random, the IV needs to be sent along with the MAC
For the first block, $C_1 = E(P_1 \oplus IV, K)$
Hence, the attacker can modify $P_1$ and $IV$ together such that XORing them gives the same result. As $C_1$ is not modified, all of the subsequent ciphertext blocks (and hence the tag) stay unchanged
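The forgery can be demonstrated in a few lines. This sketch uses a truncated SHA-256 as a stand-in for $E$ (CBC-MAC only needs the forward direction) and made-up message contents; the attacker never touches the key.

```python
import hashlib

BLOCK = 16


def E(block: bytes, key: bytes) -> bytes:
    """Stand-in for a block cipher's forward direction (NOT a real cipher)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_mac(blocks, key: bytes, iv: bytes) -> bytes:
    c = iv
    for p in blocks:
        c = E(xor(p, c), key)
    return c                             # the tag is the last ciphertext block


key = b"shared secret key"
iv = bytes(range(16))                    # "random" IV sent alongside the tag
message = [b"PAY ALICE $00100", b"ref: invoice 42."]
tag = cbc_mac(message, key, iv)

# Attacker: choose a new first block and compensate in the IV, without the key.
forged_first = b"PAY MALLORY $999"
forged_iv = xor(iv, xor(message[0], forged_first))
forged = [forged_first, message[1]]

print(cbc_mac(forged, key, forged_iv) == tag)
```

The check succeeds because `forged_first XOR forged_iv` equals `message[0] XOR iv`, so the input to the first encryption is unchanged.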

Cipher-based MAC (CMAC)

Standardized, NIST version of CBC-MAC. The IV is all zeroes. The below is as per RFC4493.

Two keys $K_1$ and $K_2$ are derived from the original key $K$ .

Either $K_1$ or $K_2$ is XORed with the final block $P_n$ (with padding as necessary).

For $1 \le t \le n - 1$ and $C_0 = IV = 00\dots00$ :

C_t = E(P_t \oplus C_{t - 1}, K)

For the final block:

P_n' = \begin{cases} K_1 \oplus P_n , & \text{block complete} \\ K_2 \oplus (P_n \| 100\dots00_2), & \text{block incomplete} \end{cases}

(That is, $P_n$ concatenated with 1 and then enough zeros to fill up a block)

Then do the same operation as with the previous blocks, except that $P_n'$ is used:

C_n = E(P_n' \oplus C_{n - 1}, K)

Finally, $\mathrm{CMAC}(P, K) = \mathrm{MSB}_{Tlen}(C_n)$

NIST allows the length of the tag, $Tlen$ , to be any number of bits, although 64 bits or greater is recommended.

The standard recommends the MAC tag $T$ to be of at least length $\mathrm{log}_2(lim/R)$ where:

$lim$ is a limit on how many invalid messages are detected before $K$ is changed
$R$ is the acceptable probability that a false message is accepted

Authenticated Encryption Mode

Two types of input data:

Payload: both encrypted and authenticated
Associated data: only authenticated

NIST specifies two modes:

NIST SP-800-38C (2004) for Counter with CBC-MAC
NIST SP-800-38D (2007) for Galois/Counter (GCM)

Both use CTR mode but add integrity in different ways.

Both are used in TLS 1.2 and 1.3.

Counter with CBC-MAC (CCM) Mode

CBC-MAC for authentication of all data, CTR mode encryption for the payload.

Inputs:

Nonce $N$ for CTR mode
Payload $P$ of $Plen$ bits
Associated data $A$

Compute the CBC-MAC tag, getting $T$ with length $Tlen$ .

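Split the payload $P$ into blocks of $128$ bits. That is, into $m = \lceil Plen/128 \rceil$ blocks. Use CTR mode to generate the keystream blocks $S_0, S_1, \dots, S_m$ and let:

S = S_1 \| S_2 \| \dots \| S_m

Then, the ciphertext is the payload XORed with $S$ , followed by the tag XORed with $S_0$ :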

C = (P \oplus \mathrm{MSB}_{Plen}(S)) \| (T \oplus \mathrm{MSB}_{Tlen}(S_0))

From RFC3610:

Authentication using CBC-MAC:
- Blocks $B_0 \dots B_n$ generated. $B_0$ contains the metadata such as the nonce, payload length etc. Later blocks contain the payload and associated data.
  
  $\begin{aligned} &X_i = \begin{cases} E(B_0, K), & i = 0 \\ E(B_i \oplus X_{i - 1}, K), & i = 1, \dots, n \end{cases} \\ \\ &T = \text{MSB}_{\text{Tlen}}(X_n) \end{aligned}$
Encryption using CTR mode:
- Generate a keystream $S_i = E(\text{Flags} \| N \| i, K)$ where $i$ is the block number.
- Output message $C_i = S_i \oplus P_i$ . $S_i$ starts with $S_1$ , not $S_0$
- Output authentication value $U = T \oplus S_0$
Decryption requires key $K$ , nonce $N$ , authenticated data $A$ and ciphertext $C$
- Authenticated data must be sent separately!

CCM Mode Format

Lengths of $N$ and $P$ are included in the first block.

If $A$ is non-empty, then it is formatted from the second block onwards, including its length.

e.g. TLS 1.2: $T$ 8 bytes, $N$ 12 octets, max payload size $2^{24} - 1$ bytes.

07. Pseudorandom Numbers and Stream Ciphers

Random Numbers

Randomness: any specific string of bits should be exactly as likely as any other string of the same length.

True random number generator (TRNG): physical process which outputs each valid string independently with equal probability.

Pseudorandom number generator (PRNG): deterministic algorithm which approximates a TRNG.

In practice, TRNGs are often used to provide a seed for a PRNG.

Pseudorandom Number Generator (PRNG)/Deterministic Random Bit Generator (DRBG)

Entropy source includes:

Physical noise source
Digitization process
Post-processing stages

Periodic health tests required to ensure reliable operation.

Each generator takes a seed as an input, outputting a bit string before updating its state.

The seed should be updated after some number of calls.

DRBGs expose some functions:

Instantiate: set the initial state using a seed
Generate: provide bit string for each request
Reseed: input new random seed and update the state
Test: check correct operation
Uninstantiate: delete/zeroise the state

The DRBG should prevent an attacker from being able to reliably distinguish between its output and a truly random string. There are two types of resistance:

Backtracking resistance: attacker with access to current state of DRBG should not be able to distinguish between the output of earlier calls and random strings
Forward prediction resistance: attacker with access to current state of DRBG should not be able to distinguish between output of later calls and random strings

CTR_DRBG

Block cipher in counter (CTR) mode - AES with 128-bit keys recommended.

DRBG initialized with seed whose length is equal to key length PLUS block length.

Seed defines a key $K$ and counter value $ctr$ (no separate nonce).

CTR run iteratively with no plaintext.

Update Function

Each request to the DRBG generates up to $2^{19}$ bits.

State $(K, ctr)$ must be updated after each Generate request by generating two blocks using the current key to obtain a new key and counter; provides backtracking resistance.

Up to $2^{48}$ requests to the Generate function are allowed before re-seeding is required; provides forwards prediction and backtracking resistance.
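The state update can be sketched as follows. This is a heavily simplified model, not the NIST algorithm: a truncated SHA-256 stands in for AES, there is no reseeding, request limit or additional input, and the class name `ToyCtrDrbg` is hypothetical.

```python
import hashlib

KEY_LEN, BLOCK = 16, 16


def E(block: bytes, key: bytes) -> bytes:
    """Stand-in for AES-128 encryption of one block (NOT a real cipher)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


class ToyCtrDrbg:
    def __init__(self, seed: bytes):
        if len(seed) != KEY_LEN + BLOCK:
            raise ValueError("seed must be key length plus block length")
        self.key = seed[:KEY_LEN]                        # seed defines K ...
        self.ctr = int.from_bytes(seed[KEY_LEN:], "big")  # ... and the counter

    def _next_block(self) -> bytes:
        self.ctr = (self.ctr + 1) % (1 << (8 * BLOCK))
        return E(self.ctr.to_bytes(BLOCK, "big"), self.key)

    def generate(self, n_bytes: int) -> bytes:
        out = b""
        while len(out) < n_bytes:
            out += self._next_block()                    # CTR mode with no plaintext
        # Update: two further blocks under the OLD key become the new key and counter,
        # so a later state compromise does not reveal this output.
        new_key, new_ctr = self._next_block(), self._next_block()
        self.key, self.ctr = new_key, int.from_bytes(new_ctr, "big")
        return out[:n_bytes]
```

Two generators instantiated from the same seed produce the same output sequence, which is why the seed must come from a good entropy source.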

Dual_EC_DRBG

Older standard based on elliptic curve discrete logarithm problem and with many flaws.

Much slower than other DRBGs.

Secret deal between NSA and RSA Security to use this as the default PRNG in their software was reported in 2013.

Stream Ciphers

Generates keystream using a short key and initialization vector $IV$ .

Each element of the keystream is used to successively encrypt one or more plaintext characters.

Stream ciphers are usually symmetric.

Synchronous Stream Ciphers

Keystream is generated independently of the plaintext; both the sender and receiver need to generate the same keystream and be synchronized.

Keystream and plaintext are XORed together, so receiver simply needs to XOR the ciphertext with the keystream to decrypt.

Vigenère cipher can be seen as a periodic synchronous stream cipher where each shift is defined by a key letter.

CTR mode of operation for a block cipher can be used to generate a keystream.

Binary Synchronous Stream Ciphers

For each time interval $t$ :

$s(t)$ is the binary keystream
$p(t)$ is the binary plaintext
$c(t)$ is the binary ciphertext

Encryption: $c(t) = p(t) \oplus s(t)$ .

Decryption: $p(t) = c(t) \oplus s(t)$ .

One-Time Pad

Key is random sequence of characters such that each is independently generated.

Each character in the key is only used once; this provides perfect secrecy.

Alphabet can be of any length but is usually either binary or a natural language alphabet.

Perfect Secrecy

Shannon’s definition:

Message set $\{ M_1, \dots, M_k \}$
Ciphertext set $\{ C_1, \dots, C_k \}$
$\mathrm{Pr}(M_i | C_j)$ ; probability that $M_i$ is encrypted given that $C_j$ is observed
- In most cases, messages $M_i$ are not equally likely; that is, given a ciphertext, some messages are more likely than others
For all messages $M_i$ and ciphertexts $C_j$ :

$\mathrm{Pr}(M_i | C_j) = \mathrm{Pr}(M_i)$

The ciphertext should be independent of the plaintext

One-Time Pad Perfect Secrecy

Let a ciphertext $C_j$ be observed.

The probability that $M_i$ was sent given that $C_j$ is observed is the probability that $M_i$ is chosen, weighted by the probability that the right keystream is chosen.

As each keystream is chosen with equal probability, $\mathrm{Pr}(M_i | C_j) = \mathrm{Pr}(M_i)$ .

Any keystream is possible and so given any plaintext, every possible ciphertext is generated with equal probability.

Vernam Binary One-Time Pad

Plaintext : binary sequence $p_1, \dots, p_r$
Ciphertext: binary sequence $c_1, \dots, c_r$
Keystream : binary sequence $k_1, \dots, k_r$
- Must be same length as the plaintext
Encryption: $c_i \equiv p_i \oplus k_i$
Decryption: $p_i \equiv c_i \oplus k_i$
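A byte-wise sketch of the Vernam one-time pad, using Python's `secrets` module for the random keystream. The message strings are made up. The second half illustrates perfect secrecy: for any other plaintext of the same length there is a keystream that maps it to the observed ciphertext.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("keystream must be the same length as the data")
    return bytes(x ^ y for x, y in zip(a, b))


plaintext = b"ATTACK AT DAWN"
keystream = secrets.token_bytes(len(plaintext))   # used once, then destroyed
ciphertext = xor_bytes(plaintext, keystream)
recovered = xor_bytes(ciphertext, keystream)

# Any same-length message is consistent with the ciphertext under some key.
alternative = b"RETREAT AT TEN"
alternative_key = xor_bytes(ciphertext, alternative)
print(xor_bytes(ciphertext, alternative_key))
```

An observer who sees only the ciphertext cannot tell which of the two keystreams was the real one.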

Properties

Shannon showed that any cipher with perfect secrecy must have at least as many keys as there are messages. Hence, one-time pad is the only unbreakable cipher.

However, this requires secure communication between fixed parties and secure key generation, transportation, synchronization, and destruction which are all difficult due to the size of the keys.

Visual Cryptography

Encryption splits an image into two shares, each pixel being shared in a random way (similar to splitting a bit in a one-time pad).

Each share alone reveals no information about the image, but the two shares can be overlaid to reveal the plaintext.

Prominent Stream Ciphers

A5 Cipher

Binary synchronous stream cipher applied in most GSM communications. Three variants:

A5/1 is the original algorithm defined in 1987
A5/2 is the weakened version intended for deployment outside Europe
A5/3 (KASUMI) used in 3G systems

A5/1 Design

Three linear feedback shift registers (LFSRs) whose outputs are combined.

The LFSRs are irregularly clocked, making the overall output non-linear.

It has a 64-bit key such that 10 bits are fixed to zero; hence, the effective key length is 54 bits.

RC4 Cipher

Ron’s Code #4. Originally owned by RSA but leaked in 1994. Too weak to use in real systems nowadays, but was widely deployed in TLS before 2013.

ChaCha Algorithm

Possible replacement of RC4 designed in 2008.

Faster than AES; as few as 4 cycles/byte on x86 processors.

08. Number Theory for Public Key Cryptography

Number theory problems used in public key cryptography
Need efficient ways of generating large prime numbers
Definitions of hard computational problems are the base of crypto systems

Chinese Remainder Theorem (CRT)

Let $p$ , $q$ be relatively prime.

Let $n = pq$ be the modulus.

Given integers $c_1$ and $c_2$ there exists a unique integer $0 \le x \lt n$ such that:

\begin{aligned} x &\equiv c_1 \pmod p \\ x &\equiv c_2 \pmod q \end{aligned}

x \equiv \frac{n}{p}y_1c_1 + \frac{n}{q}y_2c_2 \pmod n

Where:

\begin{aligned} y_1 \equiv \left(\frac{n}{p}\right)^{-1} \pmod p \equiv q^{-1} \pmod p \\ y_2 \equiv \left(\frac{n}{q}\right)^{-1} \pmod q \equiv p^{-1} \pmod q \end{aligned}

Condensed Equation:

x \equiv qc_1(q^{-1} \pmod p) + pc_2(p^{-1} \pmod q) \pmod{pq}

Example

Find $x$ such that $x \equiv 5 \pmod 6$ and $x \equiv 33 \pmod {35}$ :

$c_1 = 5$ and $c_2 = 33$
$p = 6$ and $q = 35$ are relatively prime so CRT can be used
$n = 6 \cdot 35 = 210$

For $y_1$ :

\begin{aligned} \frac{210}{6}y_1 &\equiv 1 \pmod 6 \\ 35y_1 &\equiv 1 \pmod 6 \\ 5y_1 &\equiv 1 \pmod 6 \\ y_1 &\equiv 5 \pmod 6 \end{aligned}

Make sure you replace $q$ with $q \bmod p$ (assuming $q \gt p$ ): otherwise instead of finding $5^{-1} \pmod 6$ , you will find $6^{-1} \bmod{35}$ .

For $y_2$ :

\begin{aligned} \frac{210}{35}y_2 &\equiv 1 \pmod {35} \\ 6y_2 &\equiv 1 \pmod {35} \\ y_2 &\equiv 6 \pmod {35} \end{aligned}

\begin{aligned} x &\equiv \frac{n}{p}y_1c_1 + \frac{n}{q}y_2c_2 \pmod n \\ &\equiv (35 \cdot 5 \cdot 5) + (6 \cdot 6 \cdot 33) \pmod {210} \\ &\equiv 173 \pmod {210} \end{aligned}

Example 2

Find $x$ such that $x \equiv 5 \pmod{7}$ and $x \equiv 7 \pmod{10}$

$\mathrm{gcd}(7, 10) = 1$ ; $p$ and $q$ are relatively prime. Hence, CRT applies.

Hence $n = 7 \cdot 10 = 70$

\begin{aligned} x &\equiv 10 \cdot y_1 \cdot 5 + 7 \cdot y_2 \cdot 7 &\pmod{70} \\ &\equiv 50 y_1 + 49 y_2 &\pmod{70} \end{aligned}

Where:

$y_1 \equiv 10^{-1} \pmod{7}$
- As $5 \cdot 10 = 50 \equiv 1 \pmod{7}$ , $y_1 = 5$
$y_2 \equiv 7^{-1} \pmod{10}$
- As $3 \cdot 7 = 21 \equiv 1 \pmod{10}$ , $y_2 = 3$

Hence:

\begin{aligned} x &\equiv 50 y_1 + 49 y_2 &\pmod{70} \\ &\equiv 50 \cdot 5 + 49 \cdot 3 &\pmod{70} \\ &\equiv 250 + 147 &\pmod{70} \\ &\equiv 40 + 7 &\pmod{70} \\ &\equiv 47 &\pmod{70} \end{aligned}

Test:

$47 = 42 + 5$ , so $47 \bmod{7} = 5$ as required
$47 = 40 + 7$ , so $47 \bmod{10} = 7$ as required
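The condensed CRT equation translates directly into code. This sketch relies on Python 3.8+, where `pow(q, -1, p)` computes a modular inverse; the function name `crt` is chosen for this example.

```python
from math import gcd


def crt(c1: int, p: int, c2: int, q: int) -> int:
    """Return the unique 0 <= x < p*q with x = c1 (mod p) and x = c2 (mod q)."""
    if gcd(p, q) != 1:
        raise ValueError("p and q must be relatively prime")
    y1 = pow(q, -1, p)      # q^{-1} mod p
    y2 = pow(p, -1, q)      # p^{-1} mod q
    return (q * y1 * c1 + p * y2 * c2) % (p * q)


print(crt(5, 6, 33, 35))
print(crt(5, 7, 7, 10))
```

The two calls reproduce the worked examples, giving $173$ and $47$ respectively.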

Euler Function

Given the positive integer $n$ , the Euler function $\phi(n)$ denotes the number of positive integers less than $n$ and relatively prime to $n$ .

e.g. $\phi(10) = 4$ as $\mathbb{Z}^{*}_{10} = \{ 1, 3, 7, 9 \}$ .

Properties

$\phi(p) = p - 1$ where $p$ is prime.

$\phi(pq) = (p - 1)(q - 1)$ where $p$ and $q$ are distinct primes.

if $n = p_1^{e_1} \dots p_t^{e_t}$ where $p_i$ are distinct primes, then:

\phi(n) = \prod_{i = 1}^{t}{p_i^{e_i - 1}(p_i - 1)}

e.g. for $24 = 2^3 \cdot 3^1$ , $\phi(24) = 2^2(2 - 1) \cdot 3^0(3 - 1) = 4 \cdot 2 = 8$
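The product formula can be checked with a short routine that factors $n$ by trial division, which is fine for small $n$ but not for cryptographic sizes. The name `phi` is chosen to match the notation above.

```python
def phi(n: int) -> int:
    """Euler function via the prime factorization of n (trial division)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            e = 0
            while n % p == 0:
                n //= p
                e += 1
            result *= p ** (e - 1) * (p - 1)   # contribution of p^e
        p += 1
    if n > 1:                                  # one prime factor left with exponent 1
        result *= n - 1
    return result


print(phi(10), phi(24))
```

For the two examples above this gives $4$ and $8$.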

Fermat’s Theorem

Let $p$ be a prime; for any integer $a$ such that $1 \lt a \le p - 1$ :

a^{p - 1} \mod p = 1

Euler’s Theorem

More general case of Fermat’s theorem.

If $\mathrm{gcd}(a, n) = 1$ (i.e. $a$ and $n$ coprime) then:

a^{\phi(n)} \mod n = 1

Primality Tests

Testing for primality by trial division not practical for large numbers.

There are many probabilistic methods, although these may fail in exceptional circumstances.

Agrawal, Saxena, Kayal 2002: polynomial time deterministic primality test, although still impractical.

Fermat Primality Test

Fermat’s little theorem: if $p$ is prime, $a^{p - 1} \mod p = 1$ for all $a$ such that $\mathrm{gcd}(a, p) = 1$ .

If $a^{n - 1} \mod n \neq 1$ , $n$ is NOT a prime.

The probability of failure can be reduced by repeating the test with different base values of $a$ .

Given the number $n$ to test for primality and $k$ , the number of times to run the test, repeat the following $k$ times and return probable prime if no round returns composite:

Pick $a$ at random such that $1 \lt a \lt n - 1$
If $a^{n - 1} \mod n \neq 1$ , return $n$ as being composite. Otherwise, continue with the next round
- The powers can be reduced using the following properties:
  - $ab \bmod n = (a \bmod n)(b \bmod n) \bmod n$
  - $(a^m)^k \bmod n = (a^m \bmod n)^k \bmod n$

Some composite numbers such as $561$ and $1105$ are called Carmichael numbers; the Fermat primality test returns these numbers as being probable primes for every base $a$ that is coprime to them.

Example

Check if $517$ is prime, running the test at most four times with values $a = 2, 3, 11, 17$ .

$n - 1 = 517 - 1 = 516 = 43 \cdot 3 \cdot 2^2$

Using $a = 2$ :

\begin{aligned} 2^{516} \bmod{517} &= \left(2^{43} \bmod{517}\right)^{12} \bmod{517} \\ &= (382)^{12} \bmod{517} \\ &= \left((382)^3 \bmod{517}\right)^4 \bmod{517} \\ &= (28)^{4} \bmod{517} \\ &= 460 \neq 1 \end{aligned}

Hence it is composite, so we do not need to make further checks.
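The Fermat test is a few lines with Python's built-in modular exponentiation `pow(a, e, n)`. The optional `bases` parameter is an addition for this sketch so that the worked example can be reproduced with fixed values of $a$.

```python
import random


def fermat_test(n: int, k: int = 5, bases=None) -> bool:
    """Return False if n is certainly composite, True if it is a probable prime."""
    if n < 4:
        return n in (2, 3)
    if bases is None:
        bases = [random.randrange(2, n - 1) for _ in range(k)]
    return all(pow(a, n - 1, n) == 1 for a in bases)


print(pow(2, 516, 517))                       # the value computed by hand above
print(fermat_test(517, bases=[2]))
print(fermat_test(561, bases=[2, 5, 7, 13]))  # Carmichael number, coprime bases
```

The last call shows the weakness of the test: $561 = 3 \cdot 11 \cdot 17$ is composite but passes for every base coprime to it.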

Miller-Rabin Test

Most widely used test for generating large prime numbers. It is guaranteed to detect composite numbers if the test is run sufficiently many times.

Modular square root of $1$ : a number $x$ such that $x^2 \mod n = 1$ . In other words, the root of the equation $x^2 - 1 \equiv (x - 1)(x + 1) \equiv 0 \pmod n$ .

There are four square roots of $1$ when $n = pq$ (i.e. composite):

Two are $1$ and $-1$
Two are called non-trivial square roots of $1$

If $x$ is a non-trivial square root of $1$ then $\mathrm{gcd}(x - 1, n)$ is a non-trivial factor of $n$ . Hence, $n$ must be composite.

Miller-Rabin Algorithm

Let $n$ and $u$ be odd and find $v$ such that $n - 1 = 2^v u$ . Then:

Pick $a$ at random such that $1 \lt a \lt n - 1$
Set $b = a^u \mod n$
If $b = 1$ , return probable prime
For $j = 0$ to $v - 1$ :
- If $b = -1$ , return probable prime
- Else, set $b = b^2 \mod n$
Return composite

If the test returns probable prime, $n$ may be composite.

If $n$ is composite then the test returns probable prime with a probability of at most $0.25$ .

As the algorithm is run $k$ times, it outputs probable prime when $n$ is composite with a probability of no more than $0.25^k$ .

In practice, the error probability is smaller: when $a$ is given the first seven primes, the smallest composite the algorithm returns as being a probable prime is $341,550,071,728,321$

Why The Miller-Rabin Test Works

Given a random $a$ such that $0 \lt a \lt n - 1$ , with $n - 1$ being equal to $2^v u$ .

Hence, $b$ can be given the values $\{ a^u \mod n, a^{2u} \mod n, \dots, a^{2^{v}u} \mod n \}$ .

Each number on the sequence is the square of the previous number (modulo $n$ ).

If $n$ is prime, $a^{2^v u} \mod n = a^{n - 1} \mod n = 1$ (Fermat’s theorem). Hence:

Either $a^u \mod n = 1$ OR
There is a square root of $1$ somewhere in the sequence whose value is $-1$ (which is equal to $n - 1$ modulo $n$ )

If a non-trivial square root of $1$ is found, $n$ must be composite.

Example

Let $n = 1729$ (a Carmichael number)

$n - 1 = 1728 = 64 \times 27 = 2^6 \times 27$ . Hence, $v = 6$ , $u = 27$ .

Choose $a = 2$
$b = 2^{27} \bmod{1729} = 645$
$b \neq 1$ so:
- $b = 645^2 \bmod n = 1065$
- $b = 1065^2 \bmod n = 1$
- $b = 1^2 \bmod n = 1$
- $\dots$
- Hence, $b = -1 = n - 1$ will never occur
- Hence, $n$ must be composite

NB: $1065$ is a non-trivial square root of $1$ modulo $1729$ : $\mathrm{gcd}(1065 - 1, 1729) = \mathrm{gcd}(1064, 1729) = 133$ is a factor of $1729$ .

Example 2

Let $n = 17$ .

$n - 1 = 16 = 2^vu$ .

$16 = 2^4$ and $u$ must be odd. Hence, the only valid values are $u = 1$ and $v = 4$ .

Pick a base $a$ such that $1 < a < n - 1$ , e.g. $a = 3$ :

$b = a^u \bmod n = 3^1 \bmod{17} = 3$
This is not $1$ so:
Repeat up to $v = 4$ times:
- $b = 3^2 \bmod{17} = 9$
- $b = 9^2 \bmod{17} = 13$
- $b = 13^2 \bmod{17} = 16 = -1$

Hence, $17$ is a probable prime. Repeat for other values of $a$ until satisfied.
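The Miller-Rabin algorithm can be sketched as below. `miller_rabin_round` runs one round for a given base so the two worked examples can be reproduced; `is_probable_prime` repeats it with random bases. Both names are chosen for this example.

```python
import random


def miller_rabin_round(n: int, a: int) -> bool:
    """One round for odd n > 3: True means probable prime, False means composite."""
    u, v = n - 1, 0
    while u % 2 == 0:            # write n - 1 = 2^v * u with u odd
        u //= 2
        v += 1
    b = pow(a, u, n)
    if b == 1:
        return True
    for _ in range(v):
        if b == n - 1:           # b = -1 (mod n)
            return True
        b = pow(b, 2, n)
    return False


def is_probable_prime(n: int, k: int = 5) -> bool:
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    return all(miller_rabin_round(n, random.randrange(2, n - 1)) for _ in range(k))


print(miller_rabin_round(1729, 2))
print(miller_rabin_round(17, 3))
```

The first call reports composite for the Carmichael number $1729$, which the Fermat test cannot detect with base $2$; the second reports probable prime for $17$.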

Generation of Large Primes

Choose a random odd integer $r$ of the same number of bits as the required prime
Test if $r$ is divisible by any of a small list of primes
If not:
- Apply the Miller-Rabin test with five random or fixed base values of $a$
- If $r$ fails any test, set $r = r + 2$ and return to step 2

This incremental method does not produce completely random primes. If this is an issue start from step 1 if $r$ is found to be composite.
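A sketch of the incremental search follows. The list of small primes, the restart when the candidate outgrows the requested bit length, and the function names are assumptions made for this example; the Miller-Rabin round is repeated here so the block is self-contained.

```python
import random
import secrets

SMALL_PRIMES = (3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def miller_rabin_round(n: int, a: int) -> bool:
    u, v = n - 1, 0
    while u % 2 == 0:
        u //= 2
        v += 1
    b = pow(a, u, n)
    if b == 1:
        return True
    for _ in range(v):
        if b == n - 1:
            return True
        b = pow(b, 2, n)
    return False


def generate_prime(bits: int, rounds: int = 5) -> int:
    if bits < 8:
        raise ValueError("this sketch expects at least 8 bits")
    while True:
        # Step 1: random odd integer with the top bit set, so it has exactly `bits` bits.
        r = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        while r.bit_length() == bits:
            # Step 2: cheap trial division, then Miller-Rabin with random bases.
            if all(r % p != 0 for p in SMALL_PRIMES) and all(
                miller_rabin_round(r, random.randrange(2, r - 1)) for _ in range(rounds)
            ):
                return r
            r += 2               # incremental search
```

Trial division by small primes is only a speed-up: it rejects most composites before the more expensive modular exponentiations are run.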

Basic Complexity Theory

Two aspects:

Algorithmic complexity: how long does it take to run a particular algorithm
Problem complexity: how long does it take to run the best known algorithm for the given problem

Express this complexity using ‘big O’ notation; in terms of the space and time required to solve the problem for a given size.

Hard Problems

Integer factorization: given an integer of $m$ bits, find its prime factors.

Discrete logarithm problem (base 2): given a prime $p$ of $m$ bits and an integer $0 \lt y \lt p$ , find $x$ such that $y = 2^x \mod p$

There are no known polynomial algorithms to solve the problems; the best known algorithms are sub-exponential.

Factorization Problem

Trial by division: exponential time algorithm

Some fast methods exist, although they only apply to integers with particular properties.

Best known general method: number field sieve, a sub-exponential algorithm.

As $n = pq$ , $n$ is large even with small keys; brute force search of 128-bit AES keys takes roughly the same computational effort as factorization of a 3072-bit number with two factors of roughly equal size.

Discrete Logarithm Problem

Given a prime $p$ and generator $g$ of $\mathbb{Z}_p^*$ , the discrete logarithm is:

Given $y \in \mathbb{Z}_p^*$ , find $x$ such that $y = g^x \bmod p$ .

(i.e. find the power given the remainder)

This can be written as $x = \log_g(y) \pmod p$ .

If $p$ is large enough, the problem is hard. For a modulus of the same length, the security level is equal to that of RSA (and hence $p$ should be at least 2048 bits).

Example

Find $x$ such that $2^x = 3 \bmod 5$ :

\begin{aligned} 1&: 2^1 \bmod 5 = 2 \\ 2&: 2^{2} \bmod 5 = 4 \bmod 5 = 4 \\ 3&: 2^{3} \bmod 5 = 8 \bmod 5 = 3 \\ \end{aligned}

Hence $x = 3$ .

Example 2

Find the discrete logarithm of the number $4$ with regard to base $2$ for the modulus $p = 7$ .

i.e. solve $2^x = 4 \bmod 7$ .

\begin{aligned} 1&: 2^1 &\equiv 2 &\pmod 7 \\ 2&: 2^2 \equiv 2 \cdot 2 &\equiv 4 &\pmod 7 \\ 3&: 2^3 \equiv 4 \cdot 2 \equiv 8 &\equiv 1 &\pmod 7 \\ 4&: 2^4 \equiv 1 \cdot 2 &\equiv 2 &\pmod 7 \\ 5&: 2^5 \equiv 2 \cdot 2 &\equiv 4 &\pmod 7 \end{aligned}

There is a cycle with powers of $2$ modulo $7$ taking on the values $2, 4, 1$ . Hence, $x = 2$ .
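Both examples use exhaustive search, which can be written as a short loop. It takes time proportional to $p$ , so it only illustrates the problem for toy moduli; the function name `discrete_log` is chosen for this sketch.

```python
def discrete_log(g: int, y: int, p: int):
    """Smallest x >= 1 with g^x = y (mod p), or None if no such x exists."""
    value = 1
    for x in range(1, p):
        value = (value * g) % p      # value is now g^x mod p
        if value == y % p:
            return x
    return None


print(discrete_log(2, 3, 5))
print(discrete_log(2, 4, 7))
print(discrete_log(2, 3, 7))
```

The last call returns `None`: powers of $2$ modulo $7$ only cycle through $2, 4, 1$ , so $2$ is not a generator of $\mathbb{Z}_7^*$ and $3$ has no discrete logarithm to that base.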

09. Hash Functions and MACs

Hash Functions

A public function $H$ such that:

$H$ is simple and fast to compute
$H$ takes as input message $m$ of arbitrary length and outputs a message digest $H(m)$ of fixed length

Security Properties

Collision resistant: it should be infeasible to find any two different values $x_1$ and $x_2$ such that $H(x_1) = H(x_2)$
Second-preimage resistant: given a value $x_1$ , it should be infeasible to find a different value $x_2$ such that $H(x_1) = H(x_2)$
Preimage resistant (one-way): given a value $y$ , it should be infeasible to find any input $x$ such that $H(x) = y$

If an attacker can break second-preimage resistance, they can also break collision resistance.

Birthday Paradox

If there is a group of 23 people, there is over a 50% chance that at least two people have the same birthday.

If choosing $\sqrt{|S|}$ values from a set $S$ , the probability of getting two values the same is about half ( $n \approx \sqrt{2|S| \cdot p_\text{collision}}$ ).

Hence, if $H$ is a hash function with an output size of $k$ bits, $2^{k/2}$ trials will be enough to find a collision half the time (assuming $H$ is a random function).

Today, $2^{128}$ trials is considered infeasible so hash functions should have an output of at least 256 bits for collision resistance.

In comparison, to get a 50% chance of guessing the key of a block cipher with a $k$ -bit key requires $2^{k - 1}$ trials, so hash functions require about double the number of bits compared to block ciphers for the same security.
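The birthday figure can be checked numerically. The sketch computes the exact probability that $n$ values drawn uniformly from a set of a given size contain at least one repeat; the function name is chosen for this example.

```python
def collision_probability(n: int, size: int) -> float:
    """Probability that n uniform draws from `size` values contain a repeat."""
    p_no_collision = 1.0
    for i in range(n):
        p_no_collision *= (size - i) / size   # i values are already taken
    return 1.0 - p_no_collision


print(round(collision_probability(23, 365), 4))
print(round(collision_probability(2 ** 8, 2 ** 16), 4))
```

The first call confirms that 23 people are enough to pass 50%. The second models a toy 16-bit hash: $2^{8} = \sqrt{2^{16}}$ trials already give a collision probability of roughly 40%, in line with the $2^{k/2}$ estimate.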

Iterated Hash Functions

Iterated hash functions split the input into fixed-size blocks and operate on them sequentially.

Merkle-Damgård Construction

A compression function $h$ takes fixed-size inputs and is applied to the blocks of the message in turn.

The compression function takes two $n$ -bit input strings $x_1$ and $x_2$ and produces an $n$ -bit output string $y$ :

m = m_1 || m_2 || m_3 || … || m_l

       m_1    m_2    m_3       m_l pad+len
        |      |      |         |     |
        |      |      |         |     |
        v      v      v         v     v
IV ---> h ---> h ---> h --...-> h---> h ---> H(m)

Security: if the compression function $h$ is collision-resistant, then so is $H$ .

Weaknesses:

Length extension attacks: the hash value is the full internal state of the hash function after processing the padded message $m \| p$ . An attacker who knows this hash (and the length of $m$ ) can continue the calculation over additional blocks, producing a valid hash for $m \| p \| z \| p'$ , where $z$ is the content added by the attacker and $p'$ is the new padding, without knowing the rest of the message. Including the message length in the padding fixes the padding for each message but does not by itself prevent this, as the attacker can recompute the length field; keyed constructions such as HMAC are designed to resist it
Second-preimage attacks: not as hard as they should be
Collisions for multiple messages: not much more difficult than finding collisions for two messages

The Merkle-Damgård construction is used in MD5, SHA-1 and SHA-2.
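The construction can be sketched with a toy compression function. Here $h$ is a truncated SHA-256 over the chaining value and the message block, with an 8-byte block and an all-zero IV; these are arbitrary choices for illustration and give no real security.

```python
import hashlib

BLOCK = 8                 # toy block and state size in bytes
IV = bytes(BLOCK)


def h(state: bytes, block: bytes) -> bytes:
    """Toy compression function: two BLOCK-sized inputs, one BLOCK-sized output."""
    return hashlib.sha256(state + block).digest()[:BLOCK]


def md_hash(message: bytes) -> bytes:
    # Padding: a 1 bit (0x80), zeros to a block boundary, then a length block.
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    padded += len(message).to_bytes(BLOCK, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = h(state, padded[i:i + BLOCK])   # chain through every block
    return state


print(md_hash(b"hello world").hex())
```

The returned digest is exactly the final chaining value, which is the property that length extension attacks exploit.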

Standardized Hash Functions

MDx Family

Proposed by Ron Rivest, widely used in the 1990s.

128-bit output, all broken.

Secure Hash Algorithm (SHA)

Based on MDx family, more complex design with 160 bit output.

SHA-0 introduced 1993, SHA-1 in 1995 with minor changes. Both broken.

SHA-2 Family

Standardized in FIPS 180-4 (2015); the family was first published in 2001.

Hash sizes of 224, 256, 384 or 512 bits.

Minimum recommended is SHA-256 (256 bit hash, 512 bit blocks) - same security as AES-128.

Most secure is SHA-512: 512 bit hash, 1024 bit blocks.

Padding:

Message length field: 64 bits for 512 bit blocks, 128 bits for 1024 bit blocks.
Always at least one bit of padding
There is one 1 bit, some number of 0 bits (enough to make blocks full) and then the length field
Padding and length fields may add an extra block

SHA-3

The MDx and SHA families are based on the same basic design, which proved vulnerable to a few unexpected attacks.

Keccak was selected by NIST as the winner of the SHA-3 competition in 2012 and standardized as SHA-3 in 2015: it uses a sponge function instead of a compression function.

Using Hash Functions

Hash functions do not depend on a key; anyone can calculate the hash, so it is not encryption.

However, it does help to provide data authentication:

Authenticating the hash of a message to authenticate the message
Building blocks for MACs
Building blocks for signatures

Password storage:

Pick random salt: makes it resistant to rainbow table attacks as there needs to be a different dictionary for every salt
Compute $h = H(\text{password}, \text{salt})$
Store salt and hash value
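The three steps can be sketched with the standard library. SHA-256 over the salt followed by the password is one possible reading of $H(\text{password}, \text{salt})$ ; the 16-byte salt and the function names are assumptions. Real systems normally use a deliberately slow function instead (for example `hashlib.pbkdf2_hmac` or `hashlib.scrypt`), which these notes do not cover.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, hash); a fresh random salt is picked unless one is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored)   # constant-time comparison


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Because each user has a different salt, two users with the same password end up with different stored hashes.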

Message Authentication Code

Ensures message integrity. Takes in a message $M$ of arbitrary length and secret key $K$ and outputs a fixed-sized tag $T = \mathrm{MAC}(M, K)$ .

The tag $T$ is appended to the message and the recipient can compute $T'$ with the message they receive and their shared secret, checking to ensure that $T' = T$ .

Unforgeability: not feasible to produce a valid pair $(M, T)$ such that $T = \mathrm{MAC}(M, K)$ without knowledge of $K$ .

Unforgeability under chosen message attack: even with access to an oracle that can calculate the MAC for an input message of the attacker’s choosing, the attacker cannot create a valid tag for a new message themselves (e.g. by recovering the secret key from tags they ask the oracle to generate).

MAC from Hash Function (HMAC)

Proposed by Bellare, Canetti and Krawczyk in 1996.

Can be built from any iterated hash function $H$ .

With a key $K$ that has been padded with zeroes to be the required block size and two fixed strings:

opad = 0x5c5c...5c
ipad = 0x3636...36

\mathrm{HMAC}(M, K) = H((K \oplus \mathrm{opad}) \| H((K \oplus \mathrm{ipad}) \| M))

HMAC is secure if $H$ is collision resistant or $H$ is a pseudorandom function. It is designed to resist length-extension attacks, even if $H$ is a Merkle-Damgård construct.

HMAC is often used as a pseudorandom function to derive subkeys.
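The formula can be written out by hand and compared against Python's `hmac` module. SHA-256 (64-byte block) is chosen as $H$ for this sketch; keys longer than one block are hashed first, as RFC 2104 specifies, which the description above leaves out.

```python
import hashlib
import hmac


def hmac_sha256(message: bytes, key: bytes) -> bytes:
    block_size = 64                               # SHA-256 block size in bytes
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()        # RFC 2104: hash over-long keys
    key = key.ljust(block_size, b"\x00")          # pad the key with zeroes
    ipad = bytes(k ^ 0x36 for k in key)
    opad = bytes(k ^ 0x5C for k in key)
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()


tag = hmac_sha256(b"message", b"secret key")
print(tag == hmac.new(b"secret key", b"message", hashlib.sha256).digest())
```

In practice the library function should be used directly; the manual version only shows where `ipad` and `opad` enter the computation.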

Authenticated Encryption

A and B share a key $K$ and $A$ wishes to send a message $M$ with confidentiality and authenticity/integrity.

Two options:

Split $K$ into $K_1$ and $K_2$ , encrypting $M$ with $K_1$ and using $K_2$ with a MAC
Use an authenticated encryption algorithm that provides both

Combining Encryption and MAC

Three options:

Encrypt-and-MAC: encrypt $M$ , apply MAC to $M$ and send the ciphertext $C$ and tag $T$
- Encryption algorithms usually have an IV but MACs usually don’t; if the same message is sent multiple times the MAC will be the same
MAC-then-encrypt: calculate the MAC on $M$ , encrypt $M \| T$ then send the ciphertext $C$
Encrypt-then-MAC: encrypt $M$ , calculate the MAC on the ciphertext $C$ and then send $C$ and tag $T$

Encrypt-then-MAC is the safest option: $C = E(M, K_1)$ , $T = \mathrm{MAC}(C, K_2)$ , send $C \| T$
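
A minimal sketch of Encrypt-then-MAC with Node's `crypto` module, assuming AES-256-CTR for $E$ and HMAC-SHA-256 for the MAC. The function names are hypothetical, and including the IV in the MAC input is an assumption of this sketch (the IV is treated as part of the ciphertext).

```javascript
const crypto = require("crypto");

// C = E(M, K1), T = MAC(IV || C, K2)
function encryptThenMac(plaintext, k1, k2) {
  const iv = crypto.randomBytes(16);
  const cipher = crypto.createCipheriv("aes-256-ctr", k1, iv);
  const c = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  const t = crypto.createHmac("sha256", k2).update(iv).update(c).digest();
  return { iv, c, t };
}

// The tag is checked before any decryption takes place
function verifyThenDecrypt({ iv, c, t }, k1, k2) {
  const expected = crypto.createHmac("sha256", k2).update(iv).update(c).digest();
  if (t.length !== expected.length || !crypto.timingSafeEqual(t, expected)) {
    throw new Error("invalid tag");
  }
  const decipher = crypto.createDecipheriv("aes-256-ctr", k1, iv);
  return Buffer.concat([decipher.update(c), decipher.final()]);
}
```

Because the tag is verified first, a modified ciphertext is rejected without the receiver ever processing attacker-controlled plaintext.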

MAC-then-encrypt was used in older versions of TLS while newer versions use authenticated encryption modes.

https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac

CTR Mode for Block Ciphers

Synchronous stream cipher. Counter initialized with random nonce $N$ , keystream generated by encrypting successive values of the counter:

O_t = E(N \| t, K)

where $t$ is the block number.

Encryption and decryption is simply XORing the plain/ciphertext with $O_t$ .
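
The keystream construction can be sketched by encrypting counter blocks with a raw block cipher call. This assumes AES-128 as $E$, a 12-byte nonce $N$ and a 32-bit big-endian block number $t$ starting at zero; `encryptBlock` and `ctrXor` are illustrative names.

```javascript
const crypto = require("crypto");

// E(block, K): one raw AES-128 block encryption (ECB, padding disabled)
function encryptBlock(block, key) {
  const cipher = crypto.createCipheriv("aes-128-ecb", key, null);
  cipher.setAutoPadding(false);
  return Buffer.concat([cipher.update(block), cipher.final()]);
}

// O_t = E(N || t, K); output = input XOR keystream
function ctrXor(input, key, nonce) {
  const out = Buffer.alloc(input.length);
  for (let offset = 0, t = 0; offset < input.length; offset += 16, t++) {
    const counterBlock = Buffer.alloc(16);
    nonce.copy(counterBlock, 0);       // 12-byte nonce N
    counterBlock.writeUInt32BE(t, 12); // 32-bit block number t
    const keystream = encryptBlock(counterBlock, key);
    for (let i = 0; i < 16 && offset + i < input.length; i++) {
      out[offset + i] = input[offset + i] ^ keystream[i];
    }
  }
  return out;
}
```

The same function performs encryption and decryption, since XORing with the keystream twice returns the original input.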

Galois Counter Mode (GCM)

CCM mode cannot be used for processing of streaming data: formatting function for $N$ , $A$ and $P$ requires knowledge of the length of $A$ and $P$ .

Combines CTR mode on the block cipher $E$ with a special keyed hash function GHASH (uses multiplication in the finite field $\mathrm{GF}(2^{128})$).

Input: plaintext $P$ , authenticated data $A$ , nonce $N$
Outputs: ciphertext $C$ , tag $T$
Lengths $\mathrm{len}_A$ and $\mathrm{len}_C$ are 64 bit values
- $u$ and $v$ are the minimum numbers of zeros required to expand $A$ and $C$ to complete blocks
Length $t$ of $T$ is 128 bits, $N$ is 96 bits long
Initial block input: $J_0 = N \| 0^{31} \| 1$
Function $\mathrm{inc}_{32}$ increments the 32 least significant (rightmost) bits of the input string by $1 \pmod{2^{32}}$

GCM diagram

GHASH:

GHASH diagram

$HK = E(0^{128}, K)$ is the hash subkey. $\cdot$ is multiplication in the finite field.

Output is $Y_m = \mathrm{GHASH}_{HK}(X_1, \dots, X_m)$

Decryption:

Receiver receives ciphertext $C$ , nonce $N$ , tag $T$ , authenticated data $A$
Receiver computes tag $T'$ using shared key $K$ , compares with $T$
If the same, $P$ can be computed by generating the same keystream from CTR mode
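
In practice GCM is normally used through a library rather than implemented by hand. A minimal sketch with Node's `crypto` module, assuming AES-256 as the block cipher; `gcmEncrypt` and `gcmDecrypt` are hypothetical wrapper names.

```javascript
const crypto = require("crypto");

// Inputs: plaintext P, authenticated data A; outputs: ciphertext C, tag T (plus the nonce N)
function gcmEncrypt(plaintext, aad, key) {
  const nonce = crypto.randomBytes(12); // 96-bit N
  const cipher = crypto.createCipheriv("aes-256-gcm", key, nonce);
  cipher.setAAD(aad);
  const c = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { c, nonce, aad, tag: cipher.getAuthTag() }; // 16-byte tag by default
}

function gcmDecrypt({ c, nonce, aad, tag }, key) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, nonce);
  decipher.setAAD(aad);
  decipher.setAuthTag(tag);
  // final() throws if the recomputed tag T' does not match T
  return Buffer.concat([decipher.update(c), decipher.final()]);
}
```

Changing the ciphertext, the tag or the authenticated data causes decryption to throw instead of returning a plaintext.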

10. Public Key Cryptography

Public key cryptography (PKC) has some features that symmetric key cryptography (SKC) does not and is applied for key management in protocols such as TLS and IPsec.

Discrete log-based ciphers are alternatives to RSA.

One-Way Functions

A function is one-way if $f(x) = y$ is easily computed given $x$, but $f^{-1}(y) = x$ is hard to compute given $y$.

It is not known if one-way functions exist, but there are many functions that are believed to be one-way:

Multiplication of large primes: the inverse function is integer factorization
Exponentiation: the inverse function takes discrete logarithms

Trapdoor One-Way Functions

A function such that $f^{-1}(y)$ is easily computed given additional information, called a trapdoor.

Example: let $f(x) = x^e \bmod n$ for any $x$ co-prime to $n$ :

The inverse function will be to find $d$ such that $\left(x^e\right)^d \bmod n = x^{ed} \bmod n = x$
From Euler’s theorem, $a^{\phi(n)} \bmod n = a^{k\phi(n)} \bmod n = 1$ for any integer $k$ , assuming $a$ co-prime to $n$
Hence, $x \cdot x^{k\phi(n)} \bmod n = x^{k\phi(n) + 1} \bmod n = x$ ( $x$ and $n$ are co-prime to each other)
- $ed = k\phi(n) + 1$ for some integer $k$
- That is, $ed \equiv 1 \pmod{\phi(n)}$
- Hence, $d = e^{-1} \bmod \phi(n)$
Integer factorization is assumed to be a hard problem; hence $\phi(n)$ cannot be computed easily
Hence, only someone with knowledge of $n$ ’s factors, the trapdoor, can find the inverse function

Asymmetric Cryptography

Another word to describe public key cryptography.

Public key cryptosystems, such as RSA, are designed by using a trapdoor one-way function: the trapdoor is the decryption key.

This allows a public key to be stored in a public directory: anyone can obtain the public key and use it to form an encrypted message that only a person with the private key can decrypt.

Asymmetry: encryption and decryption keys are different. Encryption key is public and known to anybody; the decryption key is private and known only to its owner.

Finding the private key from the public key must be a hard computational problem.

Advantages over shared key/symmetric cryptography:

Key management is simplified: public keys do not need to be transported confidentially
Digital signatures can be obtained

RSA

Designed in 1977 by Rivest, Shamir, and Adleman from MIT (patent expired 2000).

It is based on the integer factorization problem.

Algorithm

Key generation:

Randomly choose two distinct primes, $p$ and $q$ from the set of all primes of a certain size
Compute $n = pq$
Randomly choose $e$ such that $\mathrm{gcd}(e, \phi(n)) = \mathrm{gcd}(e, (p - 1)(q - 1)) = 1$
Compute $d = e^{-1} \bmod \phi(n)$
Set the public key $K_E = (n, e)$
Set the private key $K_D = (p, q, d)$

Encryption:

Input is value $M$ such that $0 < M < n$
Compute $C = \mathrm{Enc}(M, K_E) = M^e \bmod n$

Decryption:

Compute $M = \mathrm{Dec}(C, K_D) = C^d \bmod n$

Any message must be pre-processed:

Coding it as a number
Adding randomness (to avoid repeating ciphertext for the same plaintext)

Numerical Example

Key generation:

Let $p = 43$ , $q = 59$
$n = pq = 2537$
$\phi(n) = (p - 1)(q - 1) = 2436$
Let $e = 5$
- $d = e^{-1} \bmod \phi(n) = 5^{-1} \bmod{2436} = 1949$ (Solve $ed + k'\phi(n) = 1$ using the Euclidean algorithm)

Encryption:

Let $M = 50$
$C = M^e \bmod n = 50^5 \bmod{2537} = 2488$

Decryption:

$C^d \bmod n = 2488^{1949} \bmod{2537} = 50 = M$
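
The numbers in this example can be reproduced with a short BigInt sketch. `modPow` and `modInverse` are illustrative helpers (square-and-multiply and the extended Euclidean algorithm), not library functions.

```javascript
// Square-and-multiply: base^exp mod m
function modPow(base, exp, m) {
  let result = 1n;
  base %= m;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1n;
  }
  return result;
}

// Extended Euclidean algorithm: a^{-1} mod m, assuming gcd(a, m) = 1
function modInverse(a, m) {
  let [r0, r1] = [m, ((a % m) + m) % m];
  let [t0, t1] = [0n, 1n];
  while (r1 !== 0n) {
    const quotient = r0 / r1;
    [r0, r1] = [r1, r0 - quotient * r1];
    [t0, t1] = [t1, t0 - quotient * t1];
  }
  return ((t0 % m) + m) % m;
}

const p = 43n, q = 59n, e = 5n;
const n = p * q;                 // 2537n
const phi = (p - 1n) * (q - 1n); // 2436n
const d = modInverse(e, phi);    // 1949n
const c = modPow(50n, e, n);     // 2488n
const m = modPow(c, d, n);       // 50n, the original message
```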

Numerical Example 2

Key generation:

Let $p = 11$ , $q = 13$
$n = pq = 11 \cdot 13 = 143$
$\phi(n) = (p - 1)(q - 1) = 10 \cdot 12 = 120$
Let $e = 7$
- Find $d = e^{-1} \bmod \phi(n) = 7^{-1} \bmod{120}$
- Solve $ed + k'\phi(n) = 1$ using the Euclidean algorithm:
  
  $\begin{aligned} 120 &= 7 \cdot 17 + 1 \\ \therefore 1 &= 120 \cdot 1 + 7 \cdot (-17) \\ \therefore 7^{-1} \bmod{120} &= -17 = 103 \end{aligned}$

Encryption:

Let $M = 5$
$C = M^e \bmod n = 5^7 \bmod{143} = 47$

Decryption:

$C^d \bmod n = 47^{103} \bmod{143} = 5 = M$

Correctness

Decrypting an encrypted message: does $(M^e)^d \bmod n = M$ ?

As $d = e^{-1} \bmod \phi(n)$ , $ed \bmod \phi(n) = 1$ : this can be written as being some integer $k$ such that $ed = 1 + k\phi(n)$ .

Hence, $(M^e)^d \bmod n = M^{ed} \bmod n = M^{1 + k\phi(n)} \bmod n$ .

Now we must show that $M^{1 + k\phi(n)} \bmod n = M$ . There are two cases:

Case 1: Coprime to n

$\mathrm{gcd}(M, n) = 1$ . In this case, we can apply Euler’s theorem, $M^{\phi(n)} \bmod n = 1$ :

\begin{aligned} M^{1 + k\phi(n)} \bmod n &= M \cdot (M^{\phi(n)})^k \bmod n \\ &= M \cdot 1^k \bmod n \\ &= M \end{aligned}

Case 2: Multiple of p or q

Since $p$ and $q$ are prime and $M < pq$, if $\mathrm{gcd}(M, n) \neq 1$ then $M$ must be a multiple of exactly one of $p$ or $q$, and hence coprime to the other prime.

Suppose that $\mathrm{gcd}(M, p) = 1$ and $\mathrm{gcd}(M, q) = q$; then there is some integer $l$ such that $M = lq$.

Applying Fermat’s theorem, $M^{\phi(p)} \bmod p = M^{p - 1} \bmod p = 1$ :

\begin{aligned} M^{1 + k\phi(n)} \bmod p &= M \cdot \left(M^{\phi(n)}\right)^k &\bmod p \\ &= M \cdot \left(M^{p - 1}\right)^{k(q - 1)} &\bmod p \\ &= M \cdot 1^{k(q - 1)} &\bmod p \\ &= M &\bmod p \end{aligned}

As $M = lq$, $M^{1 + k\phi(n)} \bmod q = 0 = M \bmod q$. As $n = pq$ and $p$ and $q$ are distinct primes, we can use the Chinese Remainder Theorem:

There is a unique $x$ with $0 \le x < n$ satisfying both congruences:
- $x \equiv M^{1 + k\phi(n)} \equiv M \pmod p$
- $x \equiv M^{1 + k\phi(n)} \equiv M \pmod q \,( = 0)$
- Both $x = M^{1 + k\phi(n)} \bmod n$ and $x = M$ satisfy them; hence, by uniqueness, they are equal
Hence $M^{1 + k\phi(n)} \bmod n = M$ as required
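
Both cases can be checked exhaustively for a small modulus. The sketch below uses the parameters of the second numerical example ($n = 143$, $e = 7$, $d = 103$); of the 142 possible messages, 22 are multiples of 11 or 13 and fall under case 2. `failingMessages` is a hypothetical helper name.

```javascript
// Square-and-multiply: base^exp mod m
function modPow(base, exp, m) {
  let result = 1n;
  base %= m;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1n;
  }
  return result;
}

// Returns every M with 0 < M < n for which (M^e)^d mod n differs from M
function failingMessages(n, e, d) {
  const failures = [];
  for (let m = 1n; m < n; m++) {
    if (modPow(modPow(m, e, n), d, n) !== m) failures.push(m);
  }
  return failures;
}
```

`failingMessages(143n, 7n, 103n)` should return an empty array, including for messages that share a factor with $n$.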

Applications

Message encryption
Digital signature
Distribution of key for symmetric key encryption (hybrid encryption)
User authentication

Implementation

A few implementation details

Key Generation

Generating large primes $p$ and $q$ :

At least 1024 bits recommended for today
Simple algorithm: select random odd number $r$ of required length, check if prime, incrementing by two if not

Choice of $e$ :

Choose at random for best security
But small values are often used in practice: more efficient
$e = 3$ is smallest possible value; very fast but has security issues
$e = 2^{16} + 1$ is a popular choice
$d$ should be at least $\sqrt{n}$ to prevent known attacks such as Wiener’s attack

Encryption/decryption

Fast Exponentiation

A square-and-multiply modular exponentiation algorithm.

In binary, $e = e_0 \cdot 2^0 + e_1 \cdot 2^1 + \dots + e_k \cdot 2^k$ , where $e_i$ are bits.

If $M$ is the plaintext, $M^e = M^{e_0} \cdot (M^2)^{e_1} \cdot \dots \cdot (M^{2^k})^{e_k}$ where $(M^{2^i})^{e_i}$ is one when $e_i = 0$ and $M^{2^i}$ when $e_i = 1$.

Code:

// Computes 2^66 % 100 with square-and-multiply
let M = 2;     // base; holds base^(2^i) % n at the start of iteration i
const n = 100; // modulus
const e = [0, 1, 0, 0, 0, 0, 1]; // exponent 66 as bits, LSB first
let z = 1;     // result: base^e % n
for (let i = 0; i < e.length; i++) {
  if (e[i] == 1) {
    z = (z * M) % n;
    // z = 1 initially so the first multiplication is trivial
    // Hence e.filter(b => b == 1).length - 1 real multiplications
  }

  if (i + 1 < e.length) {
    M = (M * M) % n; // now equals base^(2^(i + 1)) % n
    // Skipped on the last iteration (i + 1 == e.length)
    // as the squared value would never be used
  }
}
// z is now 64

Cost:

If $2^k \le e \lt 2^{k + 1}$ the algorithm uses $k$ squarings
If $b$ bits of $e$ are high, there are $b - 1$ multiplications (first computation has $z = z \cdot M$ , but $z$ is initially one)
With a 2048-bit $n$, the exponent is at most 2048 bits. Hence, computing $M^e \bmod n$ requires at most 2048 modular squarings and at most 2048 modular multiplications
On average, only half of the exponent’s bits are high, so there are only about 1024 multiplications

Faster Decryption Using CRT

Decrypting $C$ with respect to $p$ and $q$ separately.

Compute $M_p = C^{d \bmod (p - 1)} \pmod p$ and $M_q = C^{d \bmod (q - 1)} \pmod q$

Solve for $M \bmod n$ using the CRT. $d = (d \bmod (p - 1)) + k(p - 1)$ for some $k$. Hence, using Fermat’s theorem ($C^{p - 1} \bmod p = 1$ when $C$ is coprime to $p$):

\begin{aligned} M \bmod p &= (C^d \bmod n) \bmod p = C^d \bmod p \\ &= C^{d \bmod (p - 1)} \left(C^{p - 1}\right)^k \bmod p = C^{d \bmod (p - 1)} \bmod p \\ &= M_p \end{aligned}

Hence, $M \equiv M_p \bmod p$ and similarly, $M \equiv M_q \bmod q$ .

Then we can output $M = q\cdot (q^{-1} \bmod p) \cdot M_p + p \cdot (p^{-1} \bmod q) \cdot M_q \bmod n$ .
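
The recombination formula can be tried on the first numerical example ($p = 43$, $q = 59$, $d = 1949$, $C = 2488$). `crtDecrypt`, `modPow` and `modInverse` are illustrative helper names.

```javascript
// Square-and-multiply: base^exp mod m
function modPow(base, exp, m) {
  let result = 1n;
  base %= m;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1n;
  }
  return result;
}

// Extended Euclidean algorithm: a^{-1} mod m, assuming gcd(a, m) = 1
function modInverse(a, m) {
  let [r0, r1] = [m, ((a % m) + m) % m];
  let [t0, t1] = [0n, 1n];
  while (r1 !== 0n) {
    const quotient = r0 / r1;
    [r0, r1] = [r1, r0 - quotient * r1];
    [t0, t1] = [t1, t0 - quotient * t1];
  }
  return ((t0 % m) + m) % m;
}

// M = q (q^{-1} mod p) M_p + p (p^{-1} mod q) M_q mod n
function crtDecrypt(c, p, q, d) {
  const mp = modPow(c, d % (p - 1n), p); // exponent reduced mod p - 1
  const mq = modPow(c, d % (q - 1n), q); // exponent reduced mod q - 1
  const n = p * q;
  return (q * modInverse(q, p) * mp + p * modInverse(p, q) * mq) % n;
}
```

`crtDecrypt(2488n, 43n, 59n, 1949n)` gives `50n`, the same result as computing $C^d \bmod n$ directly.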

Speedup

Exponents $d \bmod (p - 1)$ and $d \bmod (q - 1)$ are about half the length of $d$ and the complexity of exponentiation with square-and-multiply increases with the cube of input length.

Hence, computing each of $M_p$ and $M_q$ uses about an eighth of the computation, leading to four times less computation.

Hence, storing $p$ and $q$ instead of just $n$ allows for faster decryption.

Padding

Encryption directly on the message encoded as a number is bad as it is vulnerable to attacks:

Building up a dictionary of known plaintexts
Guessing the plaintext and checking if it encrypts to the ciphertext
Håstad’s attack

Hence, a padding mechanism must be used to prepare the message for encryption, adding redundancy and randomness.

Håstad’s Attack

The same message $M$ is encrypted without padding to three different ciphertexts $C_1$ , $C_2$ , $C_3$ with the public exponent $e = 3$ being used by all recipients.

\begin{aligned} C_1 = M^3 \bmod n_1 \\ C_2 = M^3 \bmod n_2 \\ C_3 = M^3 \bmod n_3 \end{aligned}

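The three congruences can be solved using the CRT to obtain $M^3$ over the ordinary (non-modular) integers: since $M < n_i$ for each $i$, $M^3 < n_1 n_2 n_3$, so no modular reduction takes place (this assumes the moduli are pairwise coprime). The attacker can then simply take the integer cube root to find $M$.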
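
A toy run of the attack, assuming three small pairwise coprime moduli chosen for illustration ($187 = 11 \cdot 17$, $667 = 23 \cdot 29$, $1927 = 41 \cdot 47$, each valid with $e = 3$). All function names are hypothetical.

```javascript
function modPow(base, exp, m) {
  let result = 1n;
  base %= m;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1n;
  }
  return result;
}

function modInverse(a, m) {
  let [r0, r1] = [m, ((a % m) + m) % m];
  let [t0, t1] = [0n, 1n];
  while (r1 !== 0n) {
    const quotient = r0 / r1;
    [r0, r1] = [r1, r0 - quotient * r1];
    [t0, t1] = [t1, t0 - quotient * t1];
  }
  return ((t0 % m) + m) % m;
}

// Chinese Remainder Theorem for pairwise coprime moduli
function crt(residues, moduli) {
  const product = moduli.reduce((acc, m) => acc * m, 1n);
  let x = 0n;
  for (let i = 0; i < moduli.length; i++) {
    const partial = product / moduli[i];
    x += residues[i] * partial * modInverse(partial, moduli[i]);
  }
  return x % product;
}

// Floor of the integer cube root, by binary search
function cubeRoot(n) {
  let lo = 0n, hi = n;
  while (lo < hi) {
    const mid = (lo + hi + 1n) / 2n;
    if (mid * mid * mid <= n) lo = mid; else hi = mid - 1n;
  }
  return lo;
}

// The attacker sees only the ciphertexts and the public moduli
function hastadAttack(ciphertexts, moduli) {
  return cubeRoot(crt(ciphertexts, moduli));
}
```

With $M = 50$, encrypting as $M^3 \bmod n_i$ under the three moduli and passing the ciphertexts to `hastadAttack` recovers `50n` without any private key.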

Padding Types

PKCS 1: simple ad-hoc design
Optimal asymmetric encryption padding (OAEP):
- Bellare and Rogaway, 1994
- Security proof in a suitable model
- Standardized in IEEE P1363

Security

Attacks

Mostly avoided through the use of standardized padding mechanisms.

Possible attacks:

Factorization of the modulus $n$ : this is believed to be a hard problem, so should be fine as long as $n$ is large enough.
Finding $d$ from $n$ and $e$ : as hard as factorizing $n$ (Miller’s theorem)
Quantum computers: Shor’s algorithm can theoretically factorize $n$ in polynomial time
Timing analysis: using timing of decryption process to obtain information about $d$
- Demonstrated in practice in smart cards
- Avoided by randomizing the decryption processes

Practical Problems with Key Generation

OpenSSL implementation in some systems would use massively-reduced randomness (2008)
Lenstra in 2012 analyzed 6 million RSA keys:
- Found 4% of keys were identical
- Found 0.2% of keys provided no security as they shared one prime factor with each other

Diffie-Hellman Key Exchange

Two users sharing a secret using only public communication.

Public elements:

Large prime $p$
Generator $g \in \mathbb{Z}_p^*$

Alice and Bob randomly select values $a$ and $b$ respectively, where $1 < a, b < p$ .

Over an insecure channel, Alice sends $g^a \bmod p$ and Bob sends $g^b \bmod p$.

Both compute the secret key $Z = g^{ab} \bmod p$ : $(g^b)^a \equiv (g^a)^b \pmod p$.

Example

Let $p = 181$ , $g = 2$ , $a = 50$ , $b = 33$ .

Alice sends $g^a \bmod p = 2^{50} \bmod{181} = 116$
Bob sends $g^b \bmod p = 2^{33} \bmod{181} = 30$

Both compute $Z = (g^b)^a \bmod p = (g^a)^b \bmod p = 30^{50} \bmod{181} = 116^{33} \bmod{181} = 49$
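
The exchange in this example can be reproduced with BigInt arithmetic; `modPow` is an illustrative square-and-multiply helper.

```javascript
// Square-and-multiply: base^exp mod m
function modPow(base, exp, m) {
  let result = 1n;
  base %= m;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1n;
  }
  return result;
}

const p = 181n, g = 2n;
const a = 50n, b = 33n;              // private values

const fromAlice = modPow(g, a, p);   // 116n, sent in the clear
const fromBob = modPow(g, b, p);     // 30n, sent in the clear

const zAlice = modPow(fromBob, a, p); // 49n
const zBob = modPow(fromAlice, b, p); // 49n
```

Only `fromAlice` and `fromBob` cross the channel; each side combines the other's value with its own private exponent.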

Properties

Security

Relies on the difficulty of the discrete logarithm problem.

If an attacker intercepts $g^a \bmod p$ and takes the discrete logarithm to find $a$, they can compute $(g^b)^a$ in the same way as Alice.

There is no better known way for a passive adversary to find the shared secret.

Authenticated Diffie-Hellman

In the basic protocol, messages are not authenticated: a man-in-the-middle attack is possible where the attacker acts as a proxy between the two parties, establishing a separate key with each, decrypting messages from one party and then re-encrypting them with the other key before sending them on to the other party.

Alice chooses $a$ , sending $A$ and $g^a \bmod p$
Bob chooses $b$ , sending $B$ , $g^b \bmod p$ and $\mathrm{Sig}_B(B, A, g^b)$
Alice sends $\mathrm{Sig}_A(A, B, g^a)$
Both compute $g^{ab} \bmod p$

Both parties must know each other’s public signature verification keys, $A$ and $B$ (identity + public key).

Static and Ephemeral Diffie-Hellman

The above protocol uses ephemeral keys: keys are used once. In the static protocol:

Alice chooses a long-term private key $x_A$ and public key $y_A = g^{x_A} \bmod p$
Bob chooses a long-term private key $x_B$ and public key $y_B = g^{x_B} \bmod p$
Alice and Bob find a shared secret $S = g^{x_A x_B} \bmod p$ which is static
- $S$ stays the same until either party changes their public key

Elgamal Cryptosystem

Proposed by Elgamal in 1985, turning the Diffie-Hellman protocol into a cryptosystem for encryption and for signature.

Alice combines her ephemeral private key with Bob’s long-term public key

Algorithms

Key generation:

Select prime $p$ , generator $g \in \mathbb{Z}_p^*$
Select long-term private key $1 < K_D < p$
Compute $y = g^{K_D} \bmod p$
Set the long-term public key $K_E = (p, g, y)$

Encryption:

Select a message $0 < M < p$
Choose at random an ephemeral private key $k$
Compute $g^k \bmod p$ and $My^k \bmod p$
Compute the ciphertext:
- $C = (C_1, C_2) = \mathrm{Enc}(M, K_E) = (g^k \bmod p, My^k \bmod p)$

Decryption:

Compute $C_1^{K_D} \bmod p$
$\mathrm{Dec}(C, K_D) = C_2 \cdot (C_1^{K_D})^{-1} \bmod p = M$

Correctness

Alice knows ephemeral private key $k$
Bob knows long-term private key $K_D$
Both compute the Diffie-Hellman value for the two public keys:
- $C_1 = g^k \bmod p$
- $y = g^{K_D} \bmod p$
Diffie-Hellman value $y^k \equiv C_1^{K_D} \pmod p$ used as the mask for the message $M$

Example

Key generation:

Prime $p = 181$
Generator $g = 2$
Long-term private key $K_D = 50$
Compute $y = g^{K_D} \bmod p = 2^{50} \bmod{181} = 116$
Bob’s public key is $(181, 2, 116)$

Encryption:

Alice wants to send $M = 97$
$k = 31$ chosen at random
Computes

$\begin{aligned} C &= (C_1,\, C_2) \\ &= (g^k \bmod p,\, My^k \bmod p) \\ &= (2^{31} \bmod{181},\, 97 \cdot 116^{31} \bmod{181}) \\ &= (98,\, 173) \end{aligned}$

Decryption:

Bob computes $C_1^{K_D} \bmod p = 98^{50} \bmod{181} = 138$
Bob recovers

$\begin{aligned} M &= C_2 \cdot (C_1^{K_D})^{-1} \bmod p \\ &= 173 \cdot 138^{-1} \bmod{181} \\ &= 173 \cdot 101 \bmod{181} \\ &= 97 \end{aligned}$
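
The example can be reproduced with a small sketch. `elgamalEncrypt` and `elgamalDecrypt` are hypothetical names, and the inverse is computed as $v^{p - 2} \bmod p$ using Fermat's theorem, which is an implementation choice of this sketch.

```javascript
// Square-and-multiply: base^exp mod m
function modPow(base, exp, m) {
  let result = 1n;
  base %= m;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1n;
  }
  return result;
}

// C = (g^k mod p, M y^k mod p); k is the ephemeral private key
function elgamalEncrypt(m, k, { p, g, y }) {
  return [modPow(g, k, p), (m * modPow(y, k, p)) % p];
}

// M = C2 (C1^{K_D})^{-1} mod p
function elgamalDecrypt([c1, c2], kD, p) {
  const mask = modPow(c1, kD, p);
  const maskInverse = modPow(mask, p - 2n, p); // inverse via Fermat, p prime
  return (c2 * maskInverse) % p;
}

const publicKey = { p: 181n, g: 2n, y: 116n };
const ciphertext = elgamalEncrypt(97n, 31n, publicKey); // [98n, 173n]
const recovered = elgamalDecrypt(ciphertext, 50n, 181n); // 97n
```

Choosing a different `k` gives a different ciphertext for the same message, which is why no padding is needed for randomization.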

Security

Dependent on the difficulty of the discrete logarithm problem: if it were broken, an attacker could determine the private key $K_D$ from $y = g^{K_D} \bmod p$
Many users could share the same $p$ and $g$
Padding not required: ephemeral key $k$ randomizes the ciphertext

Elliptic Curves

Algebraic structures formed from cubic equations.

Curves can be defined over any field.

e.g. the set of all $(x, y)$ pairs which satisfy $y^2 = x^3 + ax + b \bmod p$ forms a curve over the field $\mathbb{Z}_p$.

A special point at infinity is added as the identity element, and by defining a binary operation on the points (which can be written as addition or multiplication), we can form a group over the elliptic curve’s points: the elliptic curve group.

Any elliptic curve can be used but most applications use standardized curves generated in a verifiably random way (e.g. NIST curve P-192 has a curve of $n$ points over $\mathbb{Z}_p$ with generator $(G_x, G_y)$ and equation $y^2 = x^3 - 3x + b \bmod p$ with defined values for $p$ , $n$ , $b$ , $G_x$ , $G_y$ and seed $s$).

Discrete log is defined on elliptic curve groups: if the elliptic curve operation is denoted as multiplication, the definition is the same as in $\mathbb{Z}_p^*$.

Elliptic curve implementations require smaller keys compared to RSA:

Symmetric key (bits)	RSA modulus (bits)	EC element length (bits)
80	1024	160
128	3072	256
192	7680	384
256	15360	512

From Ars:

Symmetric across horizontal axis
Any non-vertical line intersects with the curve at most three times. Group operation:
- Infinity, $\mathcal{O}$ , is the identity element
- Draw line intersecting the two points
  - If the points are the same, use the tangent
- Find the third point intersecting the line
  - If the points are the same and it is at an inflection point, use the same point
- Find the opposite point $(x, y) \to (x, -y)$
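
Over $\mathbb{Z}_p$ the geometric steps become modular arithmetic on the slope of the line. The sketch below assumes a small textbook-style curve, $y^2 = x^3 + 2x + 2 \bmod 17$ with base point $(5, 1)$, chosen only for illustration. The group has 19 points including infinity, which is represented here as `null`.

```javascript
const P = 17n, A = 2n; // curve y^2 = x^3 + 2x + 2 mod 17 (toy parameters)
const G = [5n, 1n];

const mod = (v) => ((v % P) + P) % P;

function modPow(base, exp, m) {
  let result = 1n;
  base %= m;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1n;
  }
  return result;
}

const inverse = (v) => modPow(mod(v), P - 2n, P); // Fermat, P prime

// Group operation on points [x, y]; null is the identity (infinity)
function add(p1, p2) {
  if (p1 === null) return p2;
  if (p2 === null) return p1;
  const [x1, y1] = p1, [x2, y2] = p2;
  if (x1 === x2 && mod(y1 + y2) === 0n) return null; // vertical line
  const slope = x1 === x2
    ? mod((3n * x1 * x1 + A) * inverse(2n * y1)) // same point: tangent
    : mod((y2 - y1) * inverse(x2 - x1));         // distinct points: chord
  const x3 = mod(slope * slope - x1 - x2);       // third intersection point
  const y3 = mod(slope * (x1 - x3) - y1);        // reflected: (x, y) -> (x, -y)
  return [x3, y3];
}

// Repeated group operation (double-and-add), the analogue of exponentiation
function scalarMultiply(k, point) {
  let result = null, addend = point;
  while (k > 0n) {
    if (k & 1n) result = add(result, addend);
    addend = add(addend, addend);
    k >>= 1n;
  }
  return result;
}
```

The elliptic curve discrete log problem is to recover `k` from `scalarMultiply(k, G)`.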

11. Digital Signatures

MACs allow entities with shared secrets to generate valid tags, providing data integrity and authentication.

Digital signatures use public key cryptography to provide additional properties: only the owner of a private signing key can generate a valid signature.

Non-repudiation: the signer cannot deny they have signed a message.

Properties

Algorithms: Key & signature generation, signature verification.

Key generation algorithm outputs a private signing key, $K_S$ and public verification key, $K_V$ .

Signature generation: $s = \mathrm{Sig}(M, K_S)$ , where $M$ is a message of any length to sign. Only the owner should be able to generate a valid signature. The signature is usually a fixed size (and if there are multiple possible signatures for the same message, they will all be the same length).

Signature verification: $\mathrm{Ver}(M, s, K_V)$ outputs true or false.

Correctness: if $s = \mathrm{Sig}(M, K_S)$ , then $\mathrm{Ver}(M, s, K_V) = \mathrm{true}$ (for matching $K_S$ and $K_V$ ).

Unforgeability: computationally infeasible to generate signature for any message without key. Note that the signing algorithm may be randomized - many possible signatures for a message.

Stronger security definition: forging a new signature should be difficult even if they can obtain signatures for messages of their choice (chosen message oracle).

Security Goals

Key recovery: recovering the private signing key $K_S$ from the public verification key $K_V$ and some known signatures.

Selective forgery: choosing a message and obtaining a signature for that message.

Existential forgery: forging a signature on any message not previously signed (even a meaningless message).

Modern digital signatures should be able to resist existential forgery under a chosen message attack.

RSA Signatures

Key Generation

Key generation is the same as for encryption keys:

Public verification key: $n$ , $e$ where $n$ is the product of two large primes $p$ and $q$
Private signing key: $p$ , $q$ , $d$ such that $ed \bmod \phi(n) = 1$

A fixed public parameter, the hash function $h$ , is also required (e.g. SHA-256).

Signature Generation and Verification

Given message $M$, modulus $n$ and private exponent $d$, the signature is $s = h(M)^d \bmod n$. The signed message is the pair $(M, s)$.

Given claimed signature $(M, s)$ , modulus $n$ and public exponent $e$ , check if $s^e \bmod n = h(M)$
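
A toy sketch of hash-then-sign. The modulus is built from two small Mersenne primes purely for illustration, and the SHA-256 digest is reduced modulo $n$ to fit; both are assumptions of this sketch and not how real RSA signature padding works. `hashToInt`, `rsaSign` and `rsaVerify` are hypothetical names.

```javascript
const crypto = require("crypto");

function modPow(base, exp, m) {
  let result = 1n;
  base %= m;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1n;
  }
  return result;
}

function modInverse(a, m) {
  let [r0, r1] = [m, ((a % m) + m) % m];
  let [t0, t1] = [0n, 1n];
  while (r1 !== 0n) {
    const quotient = r0 / r1;
    [r0, r1] = [r1, r0 - quotient * r1];
    [t0, t1] = [t1, t0 - quotient * t1];
  }
  return ((t0 % m) + m) % m;
}

// h(M): SHA-256 digest as an integer, reduced mod n (toy simplification)
function hashToInt(message, n) {
  const digest = crypto.createHash("sha256").update(message).digest("hex");
  return BigInt("0x" + digest) % n;
}

const p = 2n ** 31n - 1n, q = 2n ** 61n - 1n; // toy primes, far too small for real use
const n = p * q;
const e = 65537n;
const d = modInverse(e, (p - 1n) * (q - 1n));

const rsaSign = (message) => modPow(hashToInt(message, n), d, n);          // s = h(M)^d mod n
const rsaVerify = (message, s) => modPow(s, e, n) === hashToInt(message, n); // s^e mod n = h(M)?
```

Verification needs only the public values $n$ and $e$, so anyone can check a signature that only the holder of $d$ could produce.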

Discrete Logarithm Signatures

Three versions:

Original Elgamal signatures in $\mathbb{Z}_p^*$ (1985)
Digital signature algorithm (DSA) standardized by NIST
DSA based on elliptic curve groups: ECDSA

Elgamal Elements in $\mathbb{Z}_p^*$

$p$ is a large prime
$g$ is a generator of $\mathbb{Z}_p^*$
$x$ is the private signing key, where $0 < x < p - 1$
$y = g^x \bmod p$ is part of the public key
A message $M$ where $0 < M < p - 1$

$p$ , $g$ and $y$ form the public verification key.

Signature generation:

Pick random $k$ such that $\mathrm{gcd}(k, p - 1) = 1$
Compute $r = g^k \bmod p$
Solve $M = xr + ks \bmod(p - 1)$ for $s$ :

$s = k^{-1}(M - xr) \bmod(p - 1)$
Output the tuple $(M, r, s)$

Signature verification: check if

g^M \equiv y^r r^s \pmod p
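
Signature generation and verification can be sketched with the small parameters used in the earlier Diffie-Hellman example ($p = 181$, $g = 2$, $x = 50$, $y = 116$); reusing them here is an assumption for illustration. Function names are hypothetical.

```javascript
function modPow(base, exp, m) {
  let result = 1n;
  base %= m;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1n;
  }
  return result;
}

function modInverse(a, m) {
  let [r0, r1] = [m, ((a % m) + m) % m];
  let [t0, t1] = [0n, 1n];
  while (r1 !== 0n) {
    const quotient = r0 / r1;
    [r0, r1] = [r1, r0 - quotient * r1];
    [t0, t1] = [t1, t0 - quotient * t1];
  }
  return ((t0 % m) + m) % m;
}

// r = g^k mod p, s = k^{-1}(M - xr) mod (p - 1); requires gcd(k, p - 1) = 1
function elgamalSign(m, x, k, { p, g }) {
  const r = modPow(g, k, p);
  const order = p - 1n;
  const diff = (((m - x * r) % order) + order) % order; // keep the value non-negative
  const s = (modInverse(k, order) * diff) % order;
  return [r, s];
}

// Accept if g^M = y^r r^s (mod p)
function elgamalVerify(m, [r, s], { p, g, y }) {
  return modPow(g, m, p) === (modPow(y, r, p) * modPow(r, s, p)) % p;
}
```

For example, signing $M = 97$ with $k = 31$ gives $(r, s) = (98, 147)$, which verifies against the public key $(181, 2, 116)$.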

Correctness

Fermat’s Little Theorem

Fermat’s little theorem: given a prime $p$, for any integer $a$ there is some integer $n$ such that:

\begin{aligned} a^p - a &= np \\ a(a^{p - 1} - 1) &= np \end{aligned}

If $\mathrm{gcd}(a, p) = 1$, $p$ does not divide $a$, so $p$ must divide the other factor. Hence, for some integer $n'$:

\begin{aligned} a^{p - 1} - 1 &= n'p \\ \therefore a^{p - 1} &\equiv 1 \pmod p \end{aligned}

Proving Correctness

Exponents can be reduced modulo $p - 1$. Let $M' \equiv M \pmod{p - 1}$, so that $M' = M + n(p - 1)$ for some integer $n$. For any $a$ coprime to $p$:

\begin{aligned} a^{M'} &\equiv a^{M + n(p - 1)} &\pmod{p} \\ &\equiv a^M \cdot \left(a^{p - 1}\right)^n &\pmod{p} \\ &\equiv a^M \cdot 1^n &\pmod{p} \\ &\equiv a^M &\pmod{p} \end{aligned}

Hence, coming back to the Elgamal signature verification equation (noting that $g$ is a generator and hence relatively prime to $p$, and that $M \equiv sk + xr \pmod{p - 1}$ by construction of $s$):

\begin{aligned} g^M &\equiv g^{sk + xr} &\pmod p \\ &\equiv g^{sk}g^{xr} &\pmod p \\ &\equiv (g^k)^s(g^x)^r &\pmod p \\ &\equiv r^s y^r &\pmod p \end{aligned}

as required.

Digital Signature Algorithm (DSA)

Published by NIST in 1994. Based on Elgamal signatures but calculations are simpler and signatures shorter: done in a subgroup of $\mathbb{Z}_p^*$ or an elliptic curve group. Also prevents attacks Elgamal signatures were vulnerable to.

Prime $p$ chosen such that $p - 1$ has a prime divisor $q$ that is much smaller.

Generator $g$ from Elgamal signatures is replaced by:

g = h^{\frac{p - 1}{q}} \bmod p

where $h$ is a generator of $\mathbb{Z}_p^*$.

$g$ has order $q$ as $g^q \bmod p = 1$ (by Fermat’s theorem, $g^q \bmod p = \left(h^\frac{p - 1}{q}\right)^q \bmod p = h^{p - 1} \bmod p = 1$ ). Hence, all exponents can be reduced modulo $q$ before exponentiation.

Let $H$ be a standard SHA hash algorithm such that the output is $N$ bits long (the same size as $q$ ).

$g^{H(M)} \equiv y^r r^s \bmod p$ so rearranged, we get the verification equation:

\left(g^{H(M)}\right)^{s^{-1}} \left(y^{-r}\right)^{s^{-1}} \equiv r \pmod p

Both sides of the equation are reduced modulo $q$ .

$p$ is an $L$-bit prime modulus and $q$ is an $N$-bit prime divisor of $p - 1$. There are several approved combinations for $(L, N, \text{Hash fn})$ : $(1024, 160, \text{SHA-1})$ , $(2048, 224, \text{SHA-224})$ , $(2048, 256, \text{SHA-256})$ , $(3072, 256, \text{SHA-256})$ . The first option may be insecure.

Key generation:

Choose secret key $0 < x < q$
Compute the public key, $y = g^x \bmod p$ .

Signature generation:

Message $M$
Pick $0 < k < q$
Compute $r = \left(g^k \bmod p \right) \bmod q$
- Why $\bmod q$ ?
Compute $s = k^{-1}(H(M) - xr) \bmod q$
Set the signature as $(M, r, s)$

Signature verification:

Check if $0 < r < q$ , $0 < s < q$
Compute $w = s^{-1} \bmod q$
Compute $u_1 = H(M)w \bmod q$
Compute $u_2 = rw \bmod q$
Check that $\left(g^{u_1} y^{-u_2} \bmod p \right) \bmod q = r$

The verification equation is the same as with Elgamal except that all exponents and the final result are reduced modulo $q$. Signature size is also smaller at $2N$ bits.
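
The signing and verification steps can be sketched with toy parameters: $p = 23$, $q = 11$, $h = 5$ so that $g = 5^2 \bmod 23 = 2$, and private key $x = 7$ giving $y = 13$. These values are assumptions for illustration, the hash value is passed in directly as a number, and the code follows the sign convention used in these notes ($H(M) - xr$ when signing and $y^{-u_2}$ when verifying).

```javascript
function modPow(base, exp, m) {
  let result = 1n;
  base %= m;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1n;
  }
  return result;
}

function modInverse(a, m) {
  let [r0, r1] = [m, ((a % m) + m) % m];
  let [t0, t1] = [0n, 1n];
  while (r1 !== 0n) {
    const quotient = r0 / r1;
    [r0, r1] = [r1, r0 - quotient * r1];
    [t0, t1] = [t1, t0 - quotient * t1];
  }
  return ((t0 % m) + m) % m;
}

// r = (g^k mod p) mod q, s = k^{-1}(H(M) - xr) mod q
function dsaSign(hash, x, k, { p, q, g }) {
  const r = modPow(g, k, p) % q;
  const diff = (((hash - x * r) % q) + q) % q; // keep the value non-negative
  const s = (modInverse(k, q) * diff) % q;
  return [r, s];
}

function dsaVerify(hash, [r, s], y, { p, q, g }) {
  if (r <= 0n || r >= q || s <= 0n || s >= q) return false;
  const w = modInverse(s, q);
  const u1 = (hash * w) % q;
  const u2 = (r * w) % q;
  // y^{-u2} is computed as (y^{-1})^{u2} mod p
  const v = (modPow(g, u1, p) * modPow(modInverse(y, p), u2, p)) % p;
  return v % q === r;
}
```

With hash value 9 and $k = 3$ the signature is $(r, s) = (8, 10)$, and verification recomputes $r = 8$.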

Correctness

Fermat’s Little Theorem

Where $p$ and $q$ are primes with $q$ dividing $p - 1$, and $g$ has order $q$ (so $g^q \bmod p = 1$, which follows from Fermat’s theorem as shown above), let $b$ and $n$ be any integers.

\begin{aligned} & g^{b + nq} &\bmod p \\ =& g^{b} \left(g^{q}\right)^{n} &\bmod p \\ =& g^{b} \cdot 1^{n} &\bmod p \\ =& g^b &\bmod p \end{aligned}

Hence $g^{b \,\bmod q} \bmod p = g^b \bmod p$; the same holds for $y = g^x \bmod p$, which is a power of $g$.

Proving Correctness

From the signing equation $s = k^{-1}(H(M) - xr) \bmod q$ we get $k \equiv s^{-1}(H(M) - xr) \equiv u_1 - x u_2 \pmod q$, where $u_1 = H(M)s^{-1} \bmod q$ and $u_2 = rs^{-1} \bmod q$.

As $g$ has order $q$, exponents of $g$ can be reduced modulo $q$:

\begin{aligned} g^{u_1} y^{-u_2} &\equiv g^{u_1} (g^x)^{-u_2} &\pmod p \\ &\equiv g^{u_1 - x u_2} &\pmod p \\ &\equiv g^k &\pmod p \end{aligned}

Hence $\left(g^{u_1} y^{-u_2} \bmod p\right) \bmod q = \left(g^k \bmod p\right) \bmod q = r$

as required.

Elliptic Curve DSA (ECDSA)

Similar to DSA except that:

$q$ becomes the order of the elliptic curve group
Multiplication modulo $p$ is replaced by the elliptic curve group operation
After operations on group elements, only the $x$ coordinate is kept from the pair $(x, y)$

Compared to DSA, public keys are shorter but signatures are not (326 to 1142 bits).

12. Public Key Infrastructure and Certificates

Public Key Infrastructure (PKI)

NIST: “… key management environment for public key information of a public key cryptographic system”

Must consider:

Key lifecycle: generation, distribution, storage and destruction
Trusted legal/business entities:
- Registration Authorities (RAs): vouch for the identity of a user
- Validation Authorities (VAs): verify identities
- Certification Authorities (CAs): issue digital certificates (certifying the binding between a public key and its owner)

Digital Certificates

How do you confirm the relationship between the public key and the claimed owner of that key? Through the use of digital certificates:

Contain public key and owner identity
Metadata such as signature algorithm, validity period

Certificates are signed by a certification authority (that should be trusted by the certificate verifier).

Certification Authority

A CA creates, issues and revokes certificates for subscribers and other CAs.

A CA has a certification practice statement (CPS) which covers processes such as checks before issuing certificates, physical/procedural security controls, revocation processes.

X.509 Certificate

Now RFC 5280, currently on version 3.

Important fields:

Version number
Serial number
Signature algorithm identifier
Issuer name (CA name)
Subject name (user to which the certificate is issued)
Public key information
Validity period
Digital signature (generated by CA)

Verification:

Check CA signature is valid
- Requires user to have public key of the CA
Check any conditions set in the certificate (e.g. validity period) are correct

Certification paths: CAs can issue certificates to other CAs. Hence, as long as there is a chain of CAs leading to a trusted root CA, the last CA can be trusted and hence the certificate can be validated.
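
The idea of walking a certification path can be sketched with toy certificates. This is not X.509: the certificate here is a plain JSON object with a few of the fields listed above, signatures use Ed25519 through Node's `crypto` module, and `issueCertificate` and `verifyChain` are hypothetical names.

```javascript
const crypto = require("crypto");

// Toy certificate: subject, issuer, subject public key (PEM), expiry, plus the issuer's signature
function issueCertificate(subject, subjectPublicKey, issuer, issuerPrivateKey, notAfter) {
  const body = {
    subject,
    issuer,
    publicKey: subjectPublicKey.export({ type: "spki", format: "pem" }),
    notAfter,
  };
  const signature = crypto.sign(null, Buffer.from(JSON.stringify(body)), issuerPrivateKey);
  return { body, signature };
}

// chain[0] is the end-entity certificate; each certificate is checked with the next one's key.
// The last certificate must be signed by a key in trustedRoots (issuer name -> PEM public key).
function verifyChain(chain, trustedRoots, now = Date.now()) {
  for (let i = 0; i < chain.length; i++) {
    const { body, signature } = chain[i];
    if (now > body.notAfter) return false; // validity period check
    const parent = chain[i + 1];
    if (parent && parent.body.subject !== body.issuer) return false;
    const issuerKey = parent ? parent.body.publicKey : trustedRoots[body.issuer];
    if (!issuerKey) return false;          // no trusted root at the end of the path
    const data = Buffer.from(JSON.stringify(body));
    if (!crypto.verify(null, data, issuerKey, signature)) return false;
  }
  return true;
}
```

A chain is accepted only if every signature verifies, no certificate has expired and the path ends at a root key the verifier already trusts.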

Phishing: attacker can make URL and interface similar to a genuine site

Extended validation certificates: certificate issued by only some CAs after they have validated the entity’s legal identity. Different icon in browsers, but mostly ignored by users.

Revocation:

Certificate marked as invalid even if its validity period is current
User must check which certificates have been revoked
Certificate Revocation List (CRL): each CA issues list of revoked certificates that must be downloaded by clients
Online Certificate Status Protocol (OCSP): server responds to requests about specific certificates

Public Key Pinning:

Deprecated feature that allowed websites to tell browsers to fix the public key used to verify certificates
If CA was compromised, attacker can issue another certificate for the website but the browser would continue to use the pinned key for some time period

PKI Examples

Hierarchical PKI:

   R           Root
  / \
 /   Y Intermediate
A     \         CAs
       Z
      / \
      B  C    Users

CA certifies public key of entity below. If non-hierarchical, certification can be done between any CAs.

Browser PKI:

Multiple hierarchies with preloaded public keys as root CAs
Intermediate CAs can be added
Users can add their own certificates
Most servers send their public key and certificate through TLS

OpenPGP PKI:

Used in PGP emails
Certificate includes ID, public key, validity period, self-signature
NO certification authorities
Various key servers store public keys
Web of trust: users can attest to association between public key and username

13. Key Establishment

Distribution of cryptographic keys to protect subsequent sessions. In TLS, public keys allow clients/servers to share a new communication key.

Kerberos: key establishment without public keys.

Key Management

Key management has four phases:

Key generation: keys should be generated such that they are all equally likely to occur
Key distribution: keys should be distributed in a secure fashion
Key protection: keys should be accessible only to authorized parties
Key destruction: once a key has performed its function, it should be destroyed so that it has no value to an attacker

Hierarchy

Keys often organized into a hierarchy. In a two-level hierarchy, there are long- and short-term keys:

Long-term/static keys are used to protect the distribution of session keys. They may last anywhere from a few hours to a few years, depending on the application
Short-term/session/ephemeral keys are used to protect communications in a session. They may last anywhere from a few seconds to a few hours, depending on the application

Key Establishment

Symmetric-key algorithms (e.g. AES, MACs) are used in practice with session keys as they are more efficient than public key algorithms.

Long-term keys can be symmetric or asymmetric.

Key Distribution Security Goals

Authentication: each party should be able to verify that the other holder of the key is the party they intended to share it with.

Confidentiality: no adversary can obtain the session key accepted by a particular party.

In formal models, the protocol is broken if the adversary can distinguish the session key from a random string.

Mutual and Unilateral Authentication

Mutual authentication: both parties achieve the authentication goal.

Unilateral authentication: only one party achieves the authentication goal. This is done by most real-world key establishment protocols: typically clients authenticate the server with client authentication happening later.

Adversary Capabilities

A strong adversary knows the details of the cryptographic algorithms and can:

Eavesdrop on all messages
Alter messages sent
Re-route messages to any other party
Obtain the session key used in any previous run

Distribution of Pre-Shared Keys

A trusted authority (TA) generates and distributes long-term keys to all users when they join the system. Their involvement ends here in the pre-distribution phase so if there are no new users they can go offline.

A simple scheme is to assign a secret key for each pair of users, but this will not scale well as the number of keys grows quadratically.

Probabilistic schemes reduce key material at each party by forwarding messages to other parties that hopefully have a link to the final receiver. Hence, it offers only a high probability of a secure channel between any two parties and requires other nodes to be trusted as they must decrypt and re-encrypt the message. This is suitable for sensor networks.

Key Distribution using Symmetric Keys

Key distribution with online server.

A TA shares a long-term shared key with each user. They distribute session keys to users when requested and hence, the TA is highly trusted and is a single point of attack. Scalability may also be a problem.

Needham-Schroeder Protocol

Widely-known key establishment protocol published in 1978. Found vulnerable to replay attacks in 1981: an attacker can replay old messages so that an honest party accepts an old session key.

Notation:

Two parties, $A$ and $B$
TA $S$
$A$ and $S$ share a long-term key $K_{AS}$
$B$ and $S$ share a long-term key $K_{BS}$
New session key $K_{AB}$ generated by $S$
Nonce $N_A$ , $N_B$ randomly generated by $A$ and $B$ respectively for one-time use
$A \to B: M$ : $A$ sends message $M$ to $B$
$\{M\}_K$ : message $M$ encrypted using key $K$ . There is assumed to be some authentication mechanism

Protocol:

\begin{aligned} \text{1.}\>& A \to S: \mathrm{ID}_A, \mathrm{ID}_B, N_A \\ \text{2.}\>& S \to A: \left\{ K_{AB}, \mathrm{ID}_A, \mathrm{ID}_B, N_A, \left\{ K_{AB}, \mathrm{ID}_A, \mathrm{ID}_B \right\}_{K_{BS}} \right\}_{K_{AS}} \\ \text{3.}\>& A \to B: \left\{ K_{AB}, \mathrm{ID}_A, \mathrm{ID}_B \right\}_{K_{BS}}, \mathrm{ID}_A \\ \text{4.}\>& B \to A: \{ N_B \}_{K_{AB}} \\ \text{5.}\>& A \to B: \{ N_B - 1 \}_{K_{AB}} \end{aligned}

Replay Attacks

If an attacker $C$ gets a previous session key $K'_{AB}$ , they can masquerade as $A$ and persuade $B$ to use the old key by replaying steps 3-5.

To defend against this, the established key must be fresh for each session:

Random challenges (nonces)
Timestamps
Counters

The repaired protocol uses random challenges. After $A$ requests a connection with $B$ :

\begin{aligned} \text{1.}\>& B \to A: \mathrm{ID}_B, N_B \\ \text{2.}\>& A \to S: \mathrm{ID}_A, \mathrm{ID}_B, N_A, N_B \\ \text{3.}\>& S \to A: \left\{ K_{AB}, \mathrm{ID}_A, \mathrm{ID}_B, N_A \right\}_{K_{AS}}, \left\{ K_{AB}, \mathrm{ID}_A, \mathrm{ID}_B, N_B \right\}_{K_{BS}} \\ \text{4.}\>& A \to B: \left\{ K_{AB}, \mathrm{ID}_A, \mathrm{ID}_B, N_B \right\}_{K_{BS}} \end{aligned}
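The difference between the original and repaired protocols can be sketched with a purely symbolic model. Everything here is an assumption made for illustration: `Sealed` stands in for authenticated encryption $\{\cdot\}_K$ (no real cryptography is performed), and the function names are hypothetical. The point is only that $B$ in the original protocol has no way to tell an old ticket from a new one, whereas the repaired protocol binds the ticket to $B$'s fresh nonce.

```python
import secrets
from dataclasses import dataclass

@dataclass(frozen=True)
class Sealed:
    """Symbolic stand-in for {fields}_K: only the matching key opens it."""
    key: bytes
    fields: tuple

def seal(key: bytes, *fields) -> Sealed:
    return Sealed(key, fields)

def unseal(key: bytes, box: Sealed) -> tuple:
    if box.key != key:
        raise ValueError("wrong key")
    return box.fields

K_BS = secrets.token_bytes(16)  # long-term key shared by B and S

def b_accepts_original(ticket: Sealed) -> bytes:
    """Original protocol: B accepts whatever session key the ticket carries."""
    k_ab, id_a, id_b = unseal(K_BS, ticket)
    return k_ab

def b_accepts_repaired(ticket: Sealed, expected_nonce: bytes) -> bytes:
    """Repaired protocol: the ticket must contain the nonce B just issued."""
    k_ab, id_a, id_b, n_b = unseal(K_BS, ticket)
    if n_b != expected_nonce:
        raise ValueError("stale ticket")
    return k_ab

# An old run whose session key the attacker has since obtained.
old_key = secrets.token_bytes(16)

# Original protocol: replaying the old ticket works.
old_ticket = seal(K_BS, old_key, "A", "B")
replay_accepted = b_accepts_original(old_ticket) == old_key

# Repaired protocol: the old ticket is bound to an old nonce, so B rejects it.
old_ticket_repaired = seal(K_BS, old_key, "A", "B", secrets.token_bytes(16))
fresh_nonce = secrets.token_bytes(16)
try:
    b_accepts_repaired(old_ticket_repaired, fresh_nonce)
    replay_rejected = False
except ValueError:
    replay_rejected = True

print("original accepts replay:", replay_accepted)
print("repaired rejects replay:", replay_rejected)
```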

Tickets can also be used: if $A$ wishes to communicate with $B$ , $S$ generates ticket $\left\{ K_{AB}, \mathrm{ID}_A, \mathrm{ID}_B, T_B \right\}_{K_{BS}}$ where $T_B$ is a validity period - $A$ can use the ticket for this duration.

\begin{aligned} \text{1.}\>& A \to S: \mathrm{ID}_A, \mathrm{ID}_B, N_A \\ \text{2.}\>& S \to A: \left\{ K_{AB}, \mathrm{ID}_A, \mathrm{ID}_B, N_A \right\}_{K_{AS}}, \text{ticket} \\ \text{3.}\>& A \to B: \text{ticket} \end{aligned}

Kerberos

Now on V5, released 1995. Standardized as RFC 4120. Used in Windows as the default domain authentication method.

Kerberos allows:

Secure network authentication service in an insecure network
Single sign-on: users only need to enter their username/password once per session
Access to different online services using individual tickets
Session keys to be established in an authenticated and confidential fashion

It is a 3-level protocol:

Level 1

Client $C$ interacts with authentication server $AS$ to obtain a ticket-granting ticket at the start of a session.

\begin{aligned} \text{1.}\> C \to AS&: \mathrm{ID}_C, \mathrm{ID}_{TGS}, N_1 \\ \text{2.}\> AS \to C&: \left\{ K_{C, TGS}, \mathrm{ID}_{TGS}, N_1 \right\}_{K_{C}}, \text{ticket}_{TGS} \end{aligned}

Where:

$K_C$ is the symmetric key shared between $AS$ and $C$ , usually generated on login from $C$
$K_{C, TGS}$ is the symmetric key generated by $AS$
$N_1$ is the nonce generated by $C$ to ensure $K_{C, TGS}$ is fresh
$K_{TGS}$ is the long-term key shared between $AS$ and $TGS$
$\text{ticket}_{TGS} = \{ K_{C, TGS}, \mathrm{ID}_C, T_1 \}_{K_{TGS}}$ is valid for some validity period $T_1$ .

At the end of this exchange, $C$ has a ticket-granting ticket that can be used to obtain different service-granting tickets from the ticket-granting server.

Level 2

Client $C$ interacts with ticket granting server $TGS$ :

\begin{aligned} \text{1.}\> C \to TGS&: \mathrm{ID}_V, N_2, \text{ticket}_{TGS}, \text{authenticator}_{TGS} \\ \text{2.}\> TGS \to C &: \mathrm{ID}_C, \text{ticket}_V, \{ K_{C, V}, N_2, \mathrm{ID}_V \}_{K_{C, TGS}} \end{aligned}

Where:

$K_{C, V}$ is a session key shared between $V$ and $C$
$K_V$ is a long-term key shared between $V$ and $TGS$
$N_2$ is a nonce
$\text{ticket}_V = \left\{ K_{C, V}, \mathrm{ID}_C, T_2 \right\}_{K_V}$ is a ticket for service $V$ with validity period $T_2$
$\text{authenticator}_{TGS} = \left\{ \mathrm{ID}_C, TS_1 \right\}_{K_{C, TGS}}$ where $TS_1$ is a timestamp
- The ticket-granting server must check that the timestamp is valid

The ticket-granting server must also check that the client has permission to access the service $V$ .

In practice, the AS and TGS are the same machine.

Level 3

Client $C$ interacts with application server $V$ :

\begin{aligned} \text{1.}\> C \to V&: \text{ticket}_V, \text{authenticator}_V \\ \text{2.}\> V \to C&: \left\{ TS_2 \right\}_{K_{C, V}} \end{aligned}

Where:

$\text{authenticator}_V = \left\{ \textrm{ID}_C, TS_2 \right\}_{K_{C, V}}$

The reply from $V$ is intended to provide mutual authentication, allowing $C$ to verify they are talking to the right $V$ .

Limitations

Realms (domains over which an authentication server has authority to authenticate a user) must share keys with every other realm, so although multiple realms are supported it has limited scalability.

$K_{C}$ is derived from the user’s password, so offline password guessing is possible.
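A rough sketch of why a password-derived $K_C$ allows offline guessing. The derivation below (PBKDF2 with a low iteration count), the salt, the candidate list and the HMAC tag standing in for the captured $AS$ reply are all assumptions for illustration, not the Kerberos message format. The attacker never contacts the server: each guess is checked locally against the captured value.

```python
import hashlib
import hmac

def derive_kc(password: str, salt: bytes) -> bytes:
    # Illustrative password-to-key step; iteration count kept low for the demo.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 1000)

def reply_tag(kc: bytes) -> bytes:
    # Stand-in for the captured AS reply: something the attacker can
    # recompute for any candidate key to confirm a correct guess.
    return hmac.new(kc, b"AS-REP", hashlib.sha256).digest()

def guess_password(candidates, salt: bytes, captured: bytes):
    """Try each candidate offline; return the first one that matches."""
    for candidate in candidates:
        if hmac.compare_digest(reply_tag(derive_kc(candidate, salt)), captured):
            return candidate
    return None

salt = b"EXAMPLE.ORGalice"  # hypothetical realm and user name
captured = reply_tag(derive_kc("sunshine", salt))

found = guess_password(["password", "letmein", "sunshine", "qwerty"], salt, captured)
print("recovered password:", found)
```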

Key Distribution using Asymmetric Cryptography

No online TA is required. Instead, public keys (managed by PKI - certificates and CAs) are used for authentication. Users are trusted to generate good session keys (so hopefully each party has a good PRNG).

Key transport: user chooses key material and sends it to another party (encrypted, possibly signed). Does NOT provide forward secrecy.

Key agreement: Diffie-Hellman or some other protocol where both parties provide input to the key. The messages are signed, providing authentication. Provides forward secrecy.

TLS supports both key transport and agreement.

Forward Secrecy

If a long-term key is compromised, the attacker can now claim to be the owner of the key. If key transport is used, all previous session keys will be compromised.

A protocol provides (perfect) forward secrecy if compromise of the long-term secret keys does not reveal session keys previously agreed using those long-term keys.

Signed Diffie-Hellman

Computations done in $\mathbb{Z}_p^*$ . Notation:

Generator $g$
Random values $a$ and $b$ chosen by each party where $1 \le a, b \le p - 1$
$\mathrm{Sign}_A(m)$ is a signature on message $m$ from $A$ signed with their signing/long-term key
$\mathrm{Sign}_B(m)$ is a signature on message $m$ from $B$ signed with their signing/long-term key
$\mathrm{ID}_A$ and $\mathrm{ID}_B$ are $A$ and $B$ ’s identities respectively
Both parties know each other’s public verification key
Long-term signing keys provide only authentication, hence it has perfect forward secrecy

Protocol:

$A$ sends $\mathrm{ID}_A$ , $g^a$
$B$ sends $\mathrm{ID}_B$ , $g^b$ and $\mathrm{Sign}_B(\mathrm{ID}_B \| \mathrm{ID}_A \| g^b \| g^a)$
- $A$ checks signature. If valid, computes session key $K_{AB} = (g^b)^a = g^{ab}$
$A$ sends $\mathrm{Sign}_A(\mathrm{ID}_A \| \mathrm{ID}_B \| g^a \| g^b)$
- $B$ checks signature. If valid, computes session key $K_{AB} = (g^a)^b = g^{ab}$
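The key computation underlying the protocol can be sketched as follows. The parameters are toy assumptions ($p = 2^{127} - 1$ is prime but far too small for real use, and $g = 3$ is an arbitrary base), and the signature steps are omitted because the Python standard library has no signature scheme. Without them this is plain, unauthenticated Diffie-Hellman.

```python
import secrets

# Toy parameters (assumptions for illustration only).
p = 2**127 - 1
g = 3

# Each party picks a fresh random exponent in [1, p - 1] for this session.
a = secrets.randbelow(p - 1) + 1
b = secrets.randbelow(p - 1) + 1

g_a = pow(g, a, p)  # sent by A
g_b = pow(g, b, p)  # sent by B

# In the signed protocol, each side would verify the other's signature
# before computing the key; that check is not modelled here.
k_at_a = pow(g_b, a, p)  # A computes (g^b)^a
k_at_b = pow(g_a, b, p)  # B computes (g^a)^b

print("keys match:", k_at_a == k_at_b)
```

Because $a$ and $b$ are discarded after the session, a later compromise of the long-term signing keys reveals nothing about $g^{ab}$.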

14. Transport Layer Security Protocol

The most widely used security protocol.

History:

SSL 2.0 developed by Netscape in 1994, 3.0 in 1995
Standardized as RFC 2246 in 1999, called TLS 1.0
TLS 1.1 (RFC 4346) in 2006, fixed issues with non-random IVs and weaknesses from padding
(the versions below are good to use)
TLS 1.2 (RFC 5246) in 2008, allows standard authenticated encryption
TLS 1.3 (RFC 8446) in 2018, had major changes and separated key agreement and authentication algorithms from cipher suites

Overview

Three higher-level protocols:

TLS handshake protocol
TLS alert protocol signals events
TLS change cipher spec protocol for changing cryptographic algorithm (not available in TLS 1.3)

TLS record protocol provides basic services to the higher-level protocols. Stack:

|----------------------------------------|
| TLS       | TLS change |  TLS  | HTTP/ |
| Handshake |   cipher   | alert | Other |
|----------------------------------------|
|           TLS record protocol          |
|----------------------------------------|
|                  TCP                   |
|----------------------------------------|
|                  IP                    |
|----------------------------------------|

TLS Record Protocol

TLS offers:

Message confidentiality - message contents cannot be read in transit
Message integrity - receiver can detect modifications made to the message in transit

These services may be provided by a symmetric encryption algorithm (confidentiality) and MAC (authentication/integrity). TLS 1.2 and above offer authenticated encryption modes (CCM, GCM), combining these two services into one.

The handshake protocol establishes the symmetric session keys used by the record protocol.

The record protocol also deals with dividing messages into blocks and re-assembling received blocks and possibly with compressing/decompressing block contents.

Format

                HEADER
|--------------------------------------|
| Content |  Major  |  Minor  | Length |
|  type   | version | version |        |
|--------------------------------------|

          ENCRYPTED CONTENTS
|--------------------------------------|
|              Plaintext               |
|        (possibly compressed)         |
|      (not available in TLS 1.3)      |
|--------------------------------------|
|                 MAC                  |
|  (unless authenticated encryption)   |
|--------------------------------------|

Content type can be change-cipher-spec, alert, handshake or application-data.

TLS 1.3 does not allow the cipher suite to be changed to prevent downgrade attacks - a new session must be created.

Major version is 3 for TLS and minor version 1 to 4 for TLS 1.0 to 1.3.

Length of data is in octets.
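As a small illustration of the header layout, the sketch below unpacks the five header bytes (one byte each for content type, major version and minor version, then a two-byte length). The function name is hypothetical; the content type codes are the standard ones.

```python
import struct

CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}

def parse_record_header(header: bytes):
    """Split a 5-byte TLS record header into (type, (major, minor), length)."""
    if len(header) != 5:
        raise ValueError("TLS record header is 5 bytes")
    # "!" selects network byte order: three single bytes, then a 16-bit length.
    content_type, major, minor, length = struct.unpack("!BBBH", header)
    return CONTENT_TYPES.get(content_type, "unknown"), (major, minor), length

# A handshake record with version 3.3 (TLS 1.2) and 42 octets of content.
print(parse_record_header(bytes.fromhex("160303002a")))
```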

Operation

Each application-layer message is $2^{14}$ bytes or less
Compression removed in TLS 1.3 after attacks discovered, null by default in TLS 1.2
Authenticated data: data, header and implicit record sequence number
Plaintext: data and MAC (unless using authenticated encryption)
Session keys: established in handshake protocol
- One key for each of MAC and encryption for each direction
- A single key if using authenticated encryption
Specification: encryption/MAC algorithms specified in negotiated cipher suite
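The role of the implicit sequence number can be sketched with a TLS 1.2-style record MAC: the tag covers the sequence number, the header fields and the data, so a record replayed or re-ordered by an attacker fails verification even though its bytes are unchanged. HMAC-SHA256, the placeholder key and the function name are assumptions for the example.

```python
import hashlib
import hmac
import struct

def record_mac(mac_key: bytes, seq_num: int, content_type: int,
               version: tuple, data: bytes) -> bytes:
    # MAC input: 64-bit sequence number, then the header fields, then the data.
    header = struct.pack("!BBBH", content_type, version[0], version[1], len(data))
    message = struct.pack("!Q", seq_num) + header + data
    return hmac.new(mac_key, message, hashlib.sha256).digest()

mac_key = b"\x01" * 32  # placeholder; real keys come from the handshake
data = b"GET / HTTP/1.1"

# Sender protects record number 0.
tag = record_mac(mac_key, 0, 23, (3, 3), data)

# The sequence number is never transmitted: the receiver uses its own counter.
ok_in_order = hmac.compare_digest(tag, record_mac(mac_key, 0, 23, (3, 3), data))
ok_replayed = hmac.compare_digest(tag, record_mac(mac_key, 1, 23, (3, 3), data))

print("accepted in order:", ok_in_order)
print("accepted when replayed as record 1:", ok_replayed)
```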

MAC algorithm:

SHA-2 allowed in TLS 1.2 and above
MD5 and SHA-1 not supported in TLS 1.3

Encryption algorithm:

Block cipher in CBC or a stream cipher
Most common block cipher is AES
3DES and RC4 not supported in TLS 1.3
For block ciphers, padding is applied after the MAC to get complete blocks

Authenticated encryption algorithm:

Allowed instead of separate encryption and MAC algorithms in TLS 1.2 and above
TLS 1.3 supports only AEAD algorithms: AES in either CCM or GCM mode, or ChaCha20 with Poly1305
Authenticated data in header and implicit record sequence number

TLS Handshake Protocol

For:

Negotiating the TLS version and cryptographic algorithms used
Establishing a shared symmetric session key for use in the record protocol
Authenticating the server and optionally authenticating the client

Many variations supported - many were dropped in TLS 1.3.

Phases

Phase 1: initiating the logical connection and establishing the capabilities of the partner
Phases 2 and 3: performing the key exchange
- Operation depends on the handshake variant negotiated in phase 1
Phase 4: completing setup

Cipher Suites

Specify the public key algorithms used for key establishment and symmetric algorithms used for authenticated encryption/key computation.

There were over 300 standardized cipher suites, many of which were discarded in TLS 1.3.

TLS 1.3 requires cipher suites to be Authenticated Encryption with Associated Data (AEAD). AEAD means that there is some associated data that must be sent in plain text (not confidential) but should be authenticated - for example, sequence numbers and other header information.

e.g. TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384:

Ephemeral elliptic curve Diffie-Hellman key agreement (new key for each session)
Key exchange parameters signed using ECDSA (and server’s certificate is also signed by CA using ECDSA)
AES with 256 bit key used as block cipher
Cipher block chaining mode of operation
SHA-2 with digest size of 384 used for the HMAC and for key generation/validation
- Hence required even if AEAD algorithm like GCM or CCM used

e.g. TLS_RSA_WITH_3DES_EDE_CBC_SHA:

Mandatory in TLS 1.0/1.1
RSA used for key exchange
3DES in CBC mode for encryption (confidentiality)
SHA-1 used for HMAC (data integrity) and key generation/validation
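The naming convention in these two examples can be unpicked mechanically. The sketch below is a hypothetical helper, not part of any TLS library, and only handles the `TLS_<handshake>_WITH_<record>` pattern used up to TLS 1.2 (TLS 1.3 suite names have no `_WITH_` part).

```python
def parse_cipher_suite(name: str) -> dict:
    """Split a TLS 1.0-1.2 style cipher suite name into its components."""
    if not name.startswith("TLS_") or "_WITH_" not in name:
        raise ValueError("not a TLS 1.0-1.2 style suite name")
    handshake, record = name[len("TLS_"):].split("_WITH_")
    # Handshake part: key exchange, then (optionally) the signature algorithm.
    key_exchange, _, authentication = handshake.partition("_")
    # Record part: everything up to the last underscore is the cipher and mode.
    cipher, _, hash_name = record.rpartition("_")
    return {
        "key_exchange": key_exchange,
        # A lone "RSA" means RSA is used for both key transport and authentication.
        "authentication": authentication or key_exchange,
        "cipher": cipher,
        "hash": hash_name,
    }

for suite in ("TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384",
              "TLS_RSA_WITH_3DES_EDE_CBC_SHA"):
    print(suite, "->", parse_cipher_suite(suite))
```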

Handshake algorithms:

DHE-DSS: Ephemeral Diffie-Hellman with Digital Signature Standard/Algorithm signatures (TLS 1.2 only)
DHE-RSA: Ephemeral Diffie-Hellman with RSA signatures (TLS 1.2/1.3)
ECDHE-RSA: Elliptic Curve DHE with RSA signatures (TLS 1.2/1.3)
ECDHE-ECDSA: Elliptic Curve DHE with Elliptic Curve Digital Signature Algorithm (TLS 1.2/1.3)

DH-RSA: permanent DH parameters are part of the server certificate (signed with RSA), so there is no forward secrecy.

Record algorithms:

AES-CBC-SHA256: AES in CBC mode, SHA256 HMAC (TLS 1.2)
AES-GCM: AES in GCM mode for authenticated encryption (TLS 1.2/1.3)
CHACHA20-POLY1305: ChaCha stream cipher with Poly1305 MAC (TLS 1.2/1.3)

Forward Secrecy:

Compromise of a long-term key should not lead to compromise of session keys established prior to the compromise
Ephemeral Diffie-Hellman handshakes offer forward secrecy, but RSA-based handshakes do not. Hence, TLS 1.3 drops support for static RSA

Handshake

Phase 1: client/server negotiate TLS version, cipher suite and compression, and exchange nonces
Phase 2: server sends certificate and key exchange message (if required by cipher suite)
Phase 3: client sends certificate and key exchange message (if required by cipher suite)
Phase 4: secure communications. Finished messages also include a check value computed over all previous messages

Client hello (phase 1):

Highest TLS version supported
Cipher suites supported
Client nonce $N_C$

Server hello (phase 1):

Selected TLS version and cipher suite
Server nonce $N_S$

Server key exchange (phase 2):

Server inputs to key exchange

Client key exchange (phase 3):

Client inputs to key exchange

Change cipher spec (phase 4):

Use negotiated cipher suite for record layer

Ephemeral Diffie-Hellman Handshake Variant

Server key exchange; server sends:

Diffie-Hellman generator
Group parameters (e.g. $P$ )
Server’s ephemeral Diffie-Hellman value

The response is signed by the server using their private key. The client must check this.

Client key exchange; client sends their ephemeral Diffie-Hellman value. Signed if the client has their own certificate.

Pre-master secret ( $\text{pms}$ ) is the shared Diffie-Hellman secret that both parties have computed from the key exchange.

Steps:

Client hello: TLS version, supported cipher suites, client nonce $N_C$
Server hello: server certificate, chosen cipher suite, server nonce $N_S$
Server signature: $N_C$ , $N_S$ and server’s DH parameter signed using server private key
Client checks server signature, sends client’s DH parameter
Pre-master secret calculated using DH parameters
Session keys computed with PRF
Client finished message: encrypted with session key
Server finished message: encrypted with session key

RSA Handshake Variant

Client hello: TLS version, supported cipher suites, client nonce $N_C$
Server hello: server certificate, chosen cipher suite, server nonce $N_S$
Client checks server certificate
Pre-master secret key transport: client randomly chooses pre-master secret $\text{pms}$ , encrypted with server’s public key
Session keys computed with PRF
Client finished message: encrypted with session key
Server finished message: encrypted with session key

Other Handshake Variants

Diffie-Hellman: static/fixed Diffie-Hellman with certified keys. If the client does not have a certificate, they use an ephemeral Diffie-Hellman key.

Anonymous Diffie-Hellman: ephemeral Diffie-Hellman, but keys are not signed at all - only protects against passive eavesdropping.

Session Key Generation

Master secret $\text{ms}$ generated using the pre-master secret and a pseudorandom function $\mathrm{PRF}$ :

\text{ms} = \mathrm{PRF}(\text{pms}, \text{\textquotedblleft master secret\textquotedblright}, N_C \| N_S)

The $\text{pms}$ to $\text{ms}$ conversion is required to ensure the $\text{ms}$ is in the right format, as the format of $\text{pms}$ may vary depending on the key exchange algorithm.

To generate the key material (the amount depends on cipher suite):

k = \mathrm{PRF}(\text{ms}, \text{\textquotedblleft key expansion\textquotedblright}, N_C \| N_S)

Independent session keys are partitioned from $k$ in each direction; a write key and a read key are used on each side.

Depending on cipher suite, key material may include encryption key, MAC key and IV.

The $\mathrm{PRF}$ is built from an HMAC specified by the TLS standard - SHA-2 in TLS 1.2 and a combination of MD5 and SHA-1 in TLS 1.0/1.1.

e.g. for TLS 1.2:

A(i) = \begin{cases} \text{label} \| \text{nonce}, & i = 0 \\ \mathrm{HMAC}(\text{key}, A(i - 1)), & \text{otherwise} \end{cases}

\begin{aligned} \mathrm{PRF}(\text{key}, \text{label}, \text{nonce}) = & \mathrm{HMAC}(\text{key}, A(1) \| \text{label} \| \text{nonce}) \| \\ & \mathrm{HMAC}(\text{key}, A(2) \| \text{label} \| \text{nonce}) \| \\ & \vdots \end{aligned}
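A minimal sketch of this construction with HMAC-SHA256, following RFC 5246, where $A(0)$ is the label concatenated with the nonce. The all-zero pre-master secret, the fixed nonces and the key sizes (32-byte MAC keys, 16-byte encryption keys) are placeholder assumptions. The key expansion below uses $N_C \| N_S$ as written in these notes; RFC 5246 puts the server nonce first at that step.

```python
import hashlib
import hmac

def prf(key: bytes, label: bytes, nonce: bytes, length: int) -> bytes:
    """TLS 1.2-style PRF: concatenate HMAC blocks until enough bytes exist."""
    seed = label + nonce
    a = seed  # A(0)
    out = b""
    while len(out) < length:
        a = hmac.new(key, a, hashlib.sha256).digest()            # A(i)
        out += hmac.new(key, a + seed, hashlib.sha256).digest()  # HMAC(key, A(i) || label || nonce)
    return out[:length]

# Placeholder inputs (assumptions, not real handshake values).
pms = bytes(48)
n_c, n_s = bytes(range(32)), bytes(range(32, 64))

ms = prf(pms, b"master secret", n_c + n_s, 48)

# Key material: one MAC key and one encryption key per direction.
key_block = prf(ms, b"key expansion", n_c + n_s, 2 * 32 + 2 * 16)
client_mac_key, server_mac_key = key_block[:32], key_block[32:64]
client_enc_key, server_enc_key = key_block[64:80], key_block[80:96]

print("master secret:", ms.hex())
```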

TLS Alert Protocol

Alert messages of varying degrees of severity:

Warning alerts
close_notify alerts
Fatal alerts

If alert messages are handled improperly, users may be vulnerable to truncation attacks.

Attacks

Backwards Compatibility

Insecure versions of TLS are deprecated slowly:

SSL 3.0 was deprecated in 2015
End-of-life for TLS 1.0 and 1.1 was in 2020

TLS 1.2 is secure as long as a good cipher suite is used:

RC4 has been shown to be vulnerable, but is still offered in TLS 1.2

TLS 1.2 supported by 99.5% of websites, TLS 1.3 (released 2018) ~50% as of August 2021. See SSL Pulse for up-to-date statistics.

BEAST (Browser Exploit Against SSL/TLS)

Exploits predictable IVs in CBC mode encryption: IVs are chained from previous ciphertexts, so an attacker could recover plaintext byte-by-byte.

A theoretical attack was found in 2002, but it required a full block to be guessed. In 2011 researchers found a way to arrange for all but one byte of the block to be known, so that only that byte needs to be guessed.

TLS 1.1 requires random IVs, and most browsers added mitigation strategies by putting only one byte of data in the first block (and padding the rest with random data) and the remaining bytes into the second block, forcing a randomized IV.

CRIME (Compression Ratio Info-leak Made Easy) and BREACH (Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext)

Side channel attacks based on compression: different inputs result in different amounts of compression.

CRIME is based on compression in the TLS level, BREACH on compression at the HTTP level.

CRIME: the attacker can control part of the request. If the compressed request gets smaller, the attacker-controlled content probably matches part of the secret content (e.g. cookies).

TLS 1.3 does not allow compression. Disabling compression at a HTTP level results in a large performance hit.
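The compression side channel can be demonstrated with `zlib`. This is a simplified sketch: the cookie value and request layout are made up, and the attacker here guesses the whole secret at once so that the size difference is unmistakable. The real attack refines the guess one byte at a time, where the differences are much smaller.

```python
import zlib

SECRET = "4f9d2c7a1b8e6053d1a7c9e2f4b80365"  # hypothetical session cookie

def compressed_length(attacker_part: str) -> int:
    """Length of the compressed request; this is all the attacker observes."""
    request = (
        f"GET /?q={attacker_part} HTTP/1.1\r\n"
        f"Cookie: session={SECRET}\r\n\r\n"
    )
    return len(zlib.compress(request.encode()))

# A correct guess duplicates the cookie, so the compressor replaces the
# second copy with a short back-reference; a wrong guess does not.
right = compressed_length("session=" + SECRET)
wrong = compressed_length("session=" + "b7e1500c93fa2d68e4a0c5f17d3b9826")

print("correct guess:", right, "bytes; wrong guess:", wrong, "bytes")
```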

POODLE (Padding Oracle On Downgraded Legacy Encryption)

A padding oracle enables an attacker to know if a message in a ciphertext was correctly padded.

Encryption in CBC mode can provide a padding oracle due to its error propagation properties: servers (after decrypting all blocks and validating the padding) may sometimes return an ‘invalid padding’ error.

Main mitigation is to have a uniform error response so that attacker cannot distinguish between padding and MAC errors.

Theoretical in 2002, practical in 2014 with POODLE, which forced a downgrade into SSL 3.0.

Heartbleed

Implementation error in OpenSSL found in 2014: improper input validation due to missing bounds check in heartbeat messages allowed memory leakage.

Heartbeats allow clients/servers to send a few bytes of data that the partner should echo back. OpenSSL did not validate that the length field matched the actual length of the heartbeat, so the echoed memory could include freed data that contained sensitive information (e.g. usernames/passwords, private keys).
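The missing bounds check can be modelled in a few lines. This is a toy simulation, not OpenSSL's code: `build_memory` is a hypothetical stand-in for process memory in which the received payload happens to sit next to unrelated sensitive data.

```python
def build_memory(payload: bytes) -> bytes:
    # The received payload sits next to unrelated data in process memory.
    return payload + b"user=alice;password=hunter2;"

def heartbeat_vulnerable(payload: bytes, claimed_length: int) -> bytes:
    """Echo back claimed_length bytes, trusting the sender's length field."""
    memory = build_memory(payload)
    return memory[:claimed_length]

def heartbeat_fixed(payload: bytes, claimed_length: int):
    """Discard the request if the length field exceeds the real payload."""
    if claimed_length > len(payload):
        return None
    return payload[:claimed_length]

# The attacker sends 4 bytes but claims to have sent 32.
leak = heartbeat_vulnerable(b"ping", 32)
print("vulnerable reply:", leak)
print("fixed reply:", heartbeat_fixed(b"ping", 32))
```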

Other Attacks

Man-In-The-Middle (MITM)
- Found in 2015: Lenovo was bundling Superfish spyware in their computers, which included a self-signed root certificate that used the same private key across devices
STARTTLS command injection
Sweet32
Triple Handshake
RC4 attacks
Lucky Thirteen
Renegotiation

TLS 1.3

Drafted 2014, adopted as RFC 8446 in August 2018.

TLS 1.3 did a large spring cleaning, removing:

Static RSA and static Diffie-Hellman key exchange
Cipher-suite renegotiation (change-cipher-spec)
SSL negotiation
DSA
Data compression
Non-AEAD (authenticated encryption with associated data) cipher suites
- AEAD is more efficient and secure, requires fewer keys and is easier to implement than separate encryption and MAC algorithms.
MD5, SHA-224 hash functions
Change Cipher Spec protocol

TLS 1.3 also added:

Separation of key agreement and authentication algorithms from cipher suites
Mandating perfect forward secrecy: ephemeral keys during EC Diffie-Hellman key agreement
Encrypting content type
0-RTT mode (using pre-shared key)
Post-handshake client authentication
ChaCha20 stream cipher with Poly1305 MAC

Handshake:

Client hello:
- TLS version, supported cipher suites
- Key share:
  - Guesses the cipher suite(s) the server will select and sends the matching key share(s)
  - If the guess was wrong, the server sends a hello retry request
Server hello:
- Selected TLS version, cipher suite
- Key share
  - Everything after this in the handshake can be encrypted
- Server certificate (and optionally the certificates up the chain)
- Hash of handshake messages signed with server certificate
Client handshake finished
- Hash of handshake messages signed with session key

0-RTT Overview

Even faster handshakes using pre-shared key (resumption master secret) obtained for the purpose of 0-RTT after a previous (and recent) connection.

TLS 1.3 1-RTT (with ephemeral Diffie-Hellman key exchange):

Client hello
Server hello plus:
- Diffie-Hellman key share
- Encrypted extensions
- Certificate
- Key to verify certificate
Client response:
- Diffie-Hellman key share
- Certificate
- Verification data

TLS 1.3 1-RTT session resumption:

Client hello plus:
- Diffie-Hellman key share
- Pre-shared key (from previous connection after handshake)
Server hello
- Diffie-Hellman key share
- Encrypted extensions

TLS 1.3 0-RTT:

Client hello:
- Key share
- Application data encrypted with resumption key negotiated previously
- End of early data alert
Server hello:
- Key share
- Application data encrypted with session key
Client handshake finished:
- And more application data
Further messages encrypted with new session key

Limitations:

Attackers can capture encrypted 0-RTT data and re-send it to the server: if the server is misconfigured, it may accept the replayed requests

15. IPsec and VPN

IP security: framework for ensuring secure communications over IP networks; similar security services as TLS, but running at a lower level of the protocol stack.

VPNs: extending a private network across a public network.

IP Layer Security

TLS runs at the transport layer; IPsec runs at the network layer. Hence, it allows protection for any higher levels, including TCP and UDP.

Provides encryption, authentication and key management algorithms.

Standardized in 2005 with RFC 4301-4305; commonly used to provide VPNs.

Security services:

Confidentiality: encryption to protect against unauthorized data disclosure
Integrity: MACs to determine if data has been modified in transit
Limited traffic analysis protection: difficult to know which parties are communicating, how often, or how much
- Possible by concealing IP datagram details (e.g. source/destination addresses)
Message replay protection: data not delivered multiple times or badly out-of-order
Peer authentication: each endpoint confirms its identity with the other IPsec endpoint

Architectures

Gateway-to-Gateway Architecture:

Secure communication between two networks
Protects only data between the two gateways

Host-to-Gateway Architecture:

For secure remote access (e.g. VPN gateway) - allowing access to resources on secured networks from insecure networks
Each remote access user establishes a connection to the gateway

Host-to-Host Architecture:

Most secure but also most costly; typically used for special purpose needs e.g. remote management of a single server
Provides end-to-end protection for data
All user systems and servers need to have VPN software installed/configured; resource intensive to implement
Key management through a manual process

Protocols

Encapsulating security payload (ESP): provides confidentiality, authentication, integrity and replay protection.

Authentication header (AH) (deprecated): authentication, integrity and replay protection, but NOT confidentiality.

Internet key exchange (IKE): negotiation, creation and management of session keys in security associations (SAs).

IPsec Connection Setup

With the IKEv2 protocol (RFC 7296, 2014):

Diffie-Hellman protocol authenticated with X.509 certificates
Includes cookies to mitigate DDOS attacks
- Provides proof of reachability before expensive cryptographic operations are run
- When the server is under load, it responds to the initial request with a stateless cookie: a cookie whose value can be derived from the initial request without storing responder-side state
  - Client must repeat the request, but this time with the stateless cookie
- Proof of work: pre-image for a partial hash value (find message whose hash is less than a given value)
  - Reduces the number of negotiations an attacker can initiate
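Both mitigations can be sketched with the standard library. This is an illustrative construction, not the exact IKEv2 encoding: the cookie is an HMAC over the initiator's address and nonce under a secret only the responder knows, and the puzzle asks for a counter whose SHA-256 hash falls below a target. All names and parameters are assumptions.

```python
import hashlib
import hmac
import secrets

SERVER_SECRET = secrets.token_bytes(32)  # known only to the responder

def make_cookie(initiator_ip: str, initiator_nonce: bytes) -> bytes:
    # Recomputable from the request plus the secret, so the responder
    # stores nothing per initiator until the cookie is returned.
    message = initiator_ip.encode() + initiator_nonce
    return hmac.new(SERVER_SECRET, message, hashlib.sha256).digest()

def check_cookie(initiator_ip: str, initiator_nonce: bytes, cookie: bytes) -> bool:
    return hmac.compare_digest(cookie, make_cookie(initiator_ip, initiator_nonce))

def puzzle_hash(challenge: bytes, counter: int) -> int:
    digest = hashlib.sha256(challenge + counter.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def solve_puzzle(challenge: bytes, difficulty_bits: int) -> int:
    """Find a counter whose hash is below the target (expected 2^bits tries)."""
    target = 1 << (256 - difficulty_bits)
    counter = 0
    while puzzle_hash(challenge, counter) >= target:
        counter += 1
    return counter

nonce = secrets.token_bytes(16)
cookie = make_cookie("192.0.2.1", nonce)
print("genuine initiator:", check_cookie("192.0.2.1", nonce, cookie))
print("spoofed address:", check_cookie("198.51.100.7", nonce, cookie))

solution = solve_puzzle(b"challenge", 10)
print("puzzle solved with counter", solution)
```

The cookie only proves that the initiator can receive packets at the address it claims; the puzzle makes each new negotiation cost the initiator some computation.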

Security Associations (SA)

Set up once connection setup has allowed keys to be established.

SAs contain information needed to support an IPsec connection.

It may include:

Cryptographic keys and algorithms
Key lifetimes
Security parameter index (SPI)
Security protocol identifier (ESP/AH)

SAs tell the endpoint how it should process inbound IPsec packets and/or generate outbound packets.

SAs are unidirectional: there is one SA for each direction.

Cryptographic Suites

Cryptographic suites in IPsec are:

Similar to TLS cipher suites
Allow specific groups for Diffie-Hellman (both finite field and elliptic curve)
3DES and AES for encryption in CBC or GCM mode
- HMAC or CMAC used for integrity if GCM mode not used

Modes of Operation

ESP and AH can run in two different modes:

Transport mode:
- Maintain the IP header of the original packet
- Only protects the payload
- Generally used in host-to-host architecture
Tunnel mode
- Encapsulates the entire packet in another packet
- Generally used in gateway-to-gateway or host-to-gateway architectures

Transport Mode ESP

ESP components:

Header: SPI identifying the SA and sequence numbers
Trailer: padding, length, and possibly extra padding for enhanced traffic flow confidentiality
Auth: MAC of encrypted data and ESP header (not required if authenticated-encryption mode used)

----------------------------------------------------------
| IP header | ESP header | Data | ESP trailer | ESP Auth |
-------------------------|--------------------|-----------
                         |     encrypted      |
            |         authenticated           |

Outbound packet processing:

Original IP data and ESP trailer encrypted with symmetric cipher
- Padding added in ESP trailer
- If SA using authentication service, MAC calculated for ESP header, data and ESP trailer, and appended to the end of the packet (ESP auth)
  - i.e. encrypt-then-MAC
IP header updated
- Protocol field updated to ESP
- Total length field updated
- Checksums recalculated
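The encrypt-then-MAC processing above can be sketched as follows. Important assumptions: the "cipher" is a toy keystream built from SHA-256 (NOT secure, standing in for AES so that only the standard library is needed), the IP header handling is omitted, and the MAC is HMAC-SHA256 truncated to 16 bytes. Function names are hypothetical.

```python
import hashlib
import hmac
import struct

def toy_keystream_xor(key: bytes, seq: int, data: bytes) -> bytes:
    # NOT a real cipher: hashed counter blocks stand in for AES.
    stream = b""
    block = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + struct.pack("!IQ", seq, block)).digest()
        block += 1
    return bytes(d ^ s for d, s in zip(data, stream))

def esp_protect(spi: int, seq: int, enc_key: bytes, mac_key: bytes,
                payload: bytes, next_header: int = 6) -> bytes:
    header = struct.pack("!II", spi, seq)        # ESP header: SPI, sequence number
    pad_len = (-(len(payload) + 2)) % 4          # pad to a 4-byte boundary
    trailer = bytes(range(1, pad_len + 1)) + bytes([pad_len, next_header])
    ciphertext = toy_keystream_xor(enc_key, seq, payload + trailer)
    # Encrypt-then-MAC: the tag covers the ESP header and the ciphertext.
    auth = hmac.new(mac_key, header + ciphertext, hashlib.sha256).digest()[:16]
    return header + ciphertext + auth

def esp_unprotect(packet: bytes, enc_key: bytes, mac_key: bytes) -> bytes:
    header, ciphertext, auth = packet[:8], packet[8:-16], packet[-16:]
    expected = hmac.new(mac_key, header + ciphertext, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(auth, expected):  # verify before decrypting
        raise ValueError("integrity check failed")
    _spi, seq = struct.unpack("!II", header)
    plaintext = toy_keystream_xor(enc_key, seq, ciphertext)
    pad_len = plaintext[-2]
    return plaintext[:len(plaintext) - 2 - pad_len]

enc_key, mac_key = b"e" * 16, b"m" * 32          # placeholder keys from the SA
packet = esp_protect(0x1234, 1, enc_key, mac_key, b"abc")
print(esp_unprotect(packet, enc_key, mac_key))
```

Because the MAC is checked before any decryption, a modified packet is rejected without the receiver ever processing attacker-chosen ciphertext.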

Tunnel Mode ESP

--------------------------------------------------------------------------
| New IP header | ESP header | IP header | Data | ESP trailer | ESP Auth |
--------------------------------------------------------------|-----------
                             |           encrypted            |
                |                authenticated                |

Outbound packet processing:

Entire packet, along with ESP trailer, encrypted
If authentication service being used by SA, MAC calculated for ESP header + encrypted section and appended to the end (ESP auth)
- i.e. encrypt-then-MAC
New outer IP header prepended
- Inner packet contains ultimate source/destination address
- Destination address of outer packet may be different (e.g. may be to a security gateway)
- Set to an ESP protocol packet

Security

Active attacks exist only for the encryption-only mode of ESP; encryption without integrity known to be insecure
Attacks due to MAC-then-encrypt configuration
- AH: encryption after MAC
- ESP: encryption before MAC

Virtual Private Networks

Secure channel over insecure connection.

Types:

Branch office interconnect (intranet VPN)
Supplier/business partner access (extranet VPN)
Remote access

Intranet VPN: Branch Office Interconnect

Enterprise | Firewall | Internet | Firewall | Branch |
                    <---- VPN ----->

VPN tunnel between router/firewalls of main company and branch office
- AH to authenticate data from tunnel endpoints
- ESP to encrypt data over the internet
Only routers/firewalls need to support IPsec; transparent to clients

Extranet VPN: Supplier Network

Enterprise | Firewall | Internet | Firewall | Supplier Clients |
                    <---------- VPN --------->

Supplier may not be part of the enterprise
VPN extended to operate between router/firewall of main company and individual parts/clients of the supplier

Remote Access

ISPs can provide VPN services across the un-trusted internet.

16. Email Security

Email Security Requirements

SMTP (Simple Mail Transfer Protocol, RFC 5321) used to transmit email.

Message user agent (MUA) connects client to a mail system, using POP/IMAP to retrieve mail from message store (MS) and SMTP to send mail to a message submission agent (MSA).

The message handling system (MHS) transfers messages from the MSA to the MS via one or more message transfer agents (MTAs).

 Message Transfer    ...     Message Transfer
   Agent (MTA) 1   -------->   Agent (MTA) n
        ^                            |
        |                            | (Local SMTP)
        |                            v
Message Submission             Mail Delivery
    Agent (MSA)                 Agent (MDA)
        ^                            |
        |      Message Handling      |
        |        System (MHS)        |
- - - - - - - - - - - - - - - - - - - - - - - - - - -
        |                            | (Local SMTP)
        |                            v
        |                         Message
        |                       Store (MS)
        |                            |
        |                            | (IMAP/POP)
        |                            v
   Message User                Message User
    Agent (MUA)                 Agent (MUA)

Email content should be confidential and/or authenticated. The email service should also have a high level of availability.

Spam:

Unsolicited bulk email (UBE)
Common vector for phishing
Email filtering to counter
Proposal for proof of work: email sender must solve some problem before MHS accepts the email

Link Security

DomainKeys Identified Mail (DKIM)

Domain-to-domain security.

Standard which provides email authentication. RFC 6376.

Sending mail domain signs outgoing emails using RSA signatures, which are verified by the receiving domain.

Public key of sending domain stored in a DNS record.

Widely used to prevent email spoofing, spam, phishing.

The email contains:

Version
Algorithm
Domain claiming origin
List of signed header fields
- Hash for those fields
Hash for the body
Selector subdividing namespace
- Name of DNS record which contains public key the email was signed with
- e.g. if you want to retire old keys, use different keys for each server
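As a sketch of how a receiving domain might use these fields, the helper below splits a `DKIM-Signature` header into its tags and builds the DNS name at which the public key is published (`<selector>._domainkey.<domain>`). The header value, domain and selector are made up, and `BODYHASH`/`SIGNATURE` are placeholders for base64 values; no signature verification is attempted.

```python
def parse_dkim_signature(value: str) -> dict:
    """Split 'tag=value; tag=value' into a dict, dropping folding whitespace."""
    tags = {}
    for part in value.split(";"):
        if "=" in part:
            # Split on the first "=" only: base64 values may end in "=".
            name, _, val = part.partition("=")
            tags[name.strip()] = "".join(val.split())
    return tags

def dkim_dns_name(tags: dict) -> str:
    """DNS name of the TXT record holding the signer's public key."""
    return f"{tags['s']}._domainkey.{tags['d']}"

header = (
    "v=1; a=rsa-sha256; d=example.org; s=mail2021; "
    "h=from:to:subject:date; bh=BODYHASH; b=SIGNATURE"
)
tags = parse_dkim_signature(header)
print("signed header fields:", tags["h"].split(":"))
print("public key record:", dkim_dns_name(tags))
```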

STARTTLS

Extension of IMAP/POP (RFC 2595) and SMTP (RFC 3207) to run over TLS.

Link-by-link security; not end-to-end. However, use of TLS means forward secrecy may be available, although this doesn’t help if an attacker controls one of the links.

Link-by-link security allows metadata (e.g. email destination) to be protected: most nodes transmit email for many users (à la VPN), making it hard to determine where a specific email is going from observing network traffic.

Opportunistic use of TLS; use if possible, continue if not available. This makes it vulnerable to STRIPTLS attacks where an attacker interrupts TLS negotiations, making it fail and fall back to plaintext.

End-to-End Security

Client-to-client security.

Pretty Good Privacy (PGP)

Email authentication and encryption for message contents.

OpenPGP: RFC 4880
GnuPG: open implementation

Hybrid encryption:

New random ‘session key’ generated for each new message
Message content encrypted with symmetric encryption
- Compress plaintext contents with zip
- OpenPGP requires 3DES with 3 keys (168 bits total), recommends AES-128, CAST5 and a few other algorithms to be supported
Session key encrypted using asymmetric encryption using receiver’s long-term public key
- OpenPGP requires Elgamal support, recommends RSA support

Optional authentication:

Optionally sign hash of plaintext (SHA1/SHA2) with sender’s private key
- OpenPGP (RFC 4880) requires DSA signature support, recommends RSA signature support
RSA-signed messages hashed with SHA1 (or SHA2)

Then packaging: content encoded with radix-64 so that binary strings can be sent.

Web of Trust:

Public keys available on distributed key servers
- Users generate their own public/private key pairs
Any PGP user can sign another user’s public key, indicating a level of trust
Users can revoke their own keys by signing a revocation certificate with the revoked key
- Or set an expiry date when generating their keys

Usability:

Average user can’t understand it
Difficult to make an interface that allows users to operate PGP correctly and safely
Vulnerable to EFail
- Client tricked into placing decrypted message within JS or HTML tag, allowing it to be sent to an attacker’s server

Criticisms:

Old algorithms used
No support for new ones
No support for authenticated encryption
Lots of metadata not protected: file length, recipient key identity

Secure/Multipurpose Internet Mail Extension (S/MIME)

Has similar features to PGP, providing authentication, integrity, non-repudiation and confidentiality of the message body, but it cannot interoperate with PGP.

It includes the sender’s public key in each message, keys being X.509 certificates issued by CAs. It is supported by most popular mail clients.

Authentication:

Sender:
- Creates message $m$
- Generates message digest $h(m)$ : SHA-256 hash of $m$
  - SHA guarantee: infeasible for anyone to find a different message $m'$ with the same hash $h(m)$
- Signs $h(m)$ with private RSA key to create signature $s$
  - RSA guarantee: only the owner of the private key can generate $s$
- Sends $s$ and $m$
Receiver:
- Uses the sender’s public key to verify $s$
- Calculate the digest themselves and check it corresponds to the received digest

Confidentiality:

Sender:
- Creates message $m$ , random 128 bit content-encryption key $k$ for this message
- Encrypts $m$ using $k$ using AES-128 with CBC mode
- Encrypts $k$ using the receiver’s public RSA key
- Sends the encrypted $m$ and the encrypted $k$
Receiver:
- Decrypts $k$ using their private RSA key
- Decrypts the encrypted message using $k$

The use of symmetric cryptography makes the process more efficient. By using a new ‘session key’ each time (one-time-mechanism), the encryption approach can be strengthened.

17. Malware and Cyber Attacks

Methods

Many different methods to gain access to a target computer:

Social engineering: persuading an authorized user to do something
Hacking/cracking: guessing, corrupting or stealing information
Viruses/worms
- Virus:
  - Attaches itself to legitimate programs
  - Often causes undesirable behavior
  - Automatically spreads to other computers (e.g. through email attachments)
- Worm:
  - Runs independently
  - Replicates complete copies of itself onto other hosts on a network, often using system vulnerabilities (e.g. WannaCry)
Trojan horse:
- Harmful piece of software that appears benign and legitimate
- Does not infect files and does not necessarily self-propagate
- Gives attacker remote access to a machine
- Network can be scanned by the attacker’s servers to locate infected machines, forming a botnet:
  - Bot: software agent interacting with a service intended for people.
  - Botnet: collection of bots running autonomously; usually a collection of compromised machines (e.g. services exposed to the internet using default usernames/passwords) running trojans, worms or backdoors.
- e.g. Zeus:
  - Stole bank information through a keylogger
  - Spread through drive-by-downloads and phishing attacks
Network-layer attacks:
- IP spoofing, sequence number prediction, TCP hijacking
Web-based attacks:
- XSS, SQL injection, session hijacking
Denial of Service (DoS)
- Operating system attacks:
  - Ping of death, tear drop, land, snork
- Network attacks:
  - SYN flood, TCP fin/rst
- Distributed DoS:
  - TCP flood, reflection

Persuading an authorized user to disclose sensitive information:

Inviting user to log into fake website
Impersonating an employee who has forgotten their user ID and/or password
Impersonating technical support staff and requesting that the user log in to ‘check’ their account
Persuading a user to install malicious software

Spear phishing:

Email appearing to be from an individual or business you know
Attempts to gain access to sensitive information such as credit card/bank account numbers, passwords etc.

Hacking/Cracking

Password discovery: default passwords.

Password cracking tools also readily available for many systems (e.g. zip files, Windows password files).

Password attacks:

Brute force: all possible permutations of characters
Dictionary attacks: real-world passwords or permutations of them
Tools such as L0phtcrack, John the Ripper available
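
A minimal sketch of a dictionary attack, assuming the stolen password file stores unsalted SHA-256 hashes (a hypothetical storage format chosen for illustration) and a tiny made-up word list. Real tools such as John the Ripper apply far larger word lists and rule sets.

```python
import hashlib
from typing import Iterable, Optional


def sha256_hex(password: str) -> str:
    """Unsalted SHA-256 of a password (illustrative storage format only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, wordlist: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose hash matches the target, or None."""
    for word in wordlist:
        # Try the word itself plus a few simple permutations of it.
        for candidate in (word, word.capitalize(), word + "1", word + "123"):
            if sha256_hex(candidate) == target_hash:
                return candidate
    return None


wordlist = ["password", "letmein", "dragon"]   # hypothetical word list
stolen_hash = sha256_hex("Dragon")             # hash the attacker obtained
print(dictionary_attack(stolen_hash, wordlist))
```

A brute-force attack would replace the word list with every string over the alphabet, which grows exponentially with password length.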

Denial of Service Attacks

Makes network services unavailable to users by overloading servers.

Financial incentive (DOS for hire services) and/or for extortion (stop attack when ransom paid).

No magic solution: use a properly-configured firewall to filter out illegitimate requests, and add more servers.

e.g. TCP SYN flood:

Normal TCP handshake: client asks for connection (SYN), server allocates resources and replies (SYN-ACK), client responds with ACK
Attacker can spoof the sender IP address and flood the target server with SYN requests; the attacker won’t receive the server’s responses, but each request forces the server to allocate resources for a connection that never completes

Rootkits

Collection of programs used to mask intrusion and obtain admin access.

After gaining user-level access to a target system, attacker can install rootkits through known vulnerabilities, password cracking etc.

They may collect user IDs and passwords from other machines on the network.

Once installed they may:

Monitor traffic and keystrokes
Add backdoors
Alter log files
Attack other machines on the network
Alter system tools to circumvent detection

Blended Threats

Combination of attacks using different vulnerabilities:

Worms dropping viruses
Destructive trojan horses
Password stealers
Remote access trojans (RATs)
- Previously used against energy sectors
- Now aimed at organizations using/making industrial machines/systems
- 2013, Havex: hacked into websites of manufacturers of industrial control systems and poisoned their software download files
Trojanized applications that replace system tools
Multi-platform attacks
Advanced persistent threats (APTs)
- Stealthy and continuous hacking processes: humans involved in real-time
- Attacking organizations or nation-states
- Requires high degree of covertness over a long period of time
- External command-and-control, continuous monitoring and data extraction

Zero Day Attacks

Taking advantage of software vulnerabilities before the manufacturer can release a patch/fix.

Blaster worm (Windows 2000):

Extremely virulent
Optional patch was available one month before the worm was released

Nachi worm:

Variant of Blaster
Carried dangerous payload
Released two days after patch released

Time available to install updates shrinks over time and may be negative in some cases.

Attack Methods

Buffer Overflow

Exploits inadequate buffer boundary checking.

It often involves overwriting return addresses on the stack, making the machine run attacker-controlled code. However, it could also leak memory contents to the attacker.

Heartbleed was an example of the latter:

https://xkcd.com/1354/
Bug existed for over two years
Leaked private keys, user details
More than 300,000 attacks in a single day

COSC362 Exam Notes

CIA: Confidentiality, Integrity, Availability

Maths

Groups:

One binary operation
Four properties:
- Closure: $a \cdot b \in \mathbb{G}$
- Identity: element $1$ such that $a \cdot 1 = a$
- Inverse: all elements have inverse
- Associativity: $(a \cdot b) \cdot c = a \cdot (b \cdot c)$
Abelian group:
- Additionally commutative: $a \cdot b = b \cdot a$
Order:
- of a group: number of elements in a group
- for $g$ in $\mathbb{G}$ , $|g|$ is the smallest positive integer $k$ such that $g^k = 1$
- $g$ generator if $| g | = |\mathbb{G}|$ ; group cyclic if it has a generator

Finding Generators:

$\mathbb{Z}_n^*$ : elements of $\mathbb{Z}_n$ relatively prime to $n$ ; for prime $p$ , $\mathbb{Z}_p^* = \mathbb{Z}_p \backslash \{0\}$
Primes: to find a generator of $\mathbb{Z}^*_p$ :
- Use Lagrange theorem; order of any element must exactly divide $p - 1$
- Compute all distinct prime factors $f_1, f_2, \dots, f_r$ of $p - 1$
- $g$ is a generator iff $g^\frac{p - 1}{f_i} \neq 1 \pmod p$ for all $i = 1, 2, \dots, r$
Composite: brute force. Find $\mathbb{Z}_n^*$ , iterate through all values of the group raised to powers up to $|\mathbb{Z}_n^*|$
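
The prime case above can be sketched as follows. This is an illustrative implementation using trial-division factoring, which is only practical for small $p$; the function names are made up for this example.

```python
from typing import List, Set


def prime_factors(n: int) -> Set[int]:
    """Distinct prime factors of n by trial division (small n only)."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_generator(g: int, p: int) -> bool:
    """g generates Z_p^* iff g^((p-1)/f) != 1 (mod p) for every prime factor f of p-1."""
    return all(pow(g, (p - 1) // f, p) != 1 for f in prime_factors(p - 1))


def generators(p: int) -> List[int]:
    """All generators of Z_p^* for a prime p."""
    return [g for g in range(1, p) if is_generator(g, p)]


print(generators(7))    # [3, 5]
print(generators(11))   # [2, 6, 7, 8]
```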

Field $\mathbb{F}$ : set with two operations:

Abelian group for $+$ with identity $0$
$\mathbb{F} \backslash \{ 0 \}$ abelian for $\cdot$ with identity $1$
Distributivity: $a \cdot (b + c) = (a \cdot b) + (a \cdot c)$
Finite field: field with a finite number of elements (e.g. integers modulo a prime, with modular arithmetic)

Chinese Remainder Theorem:

Relatively prime $p$ , $q$
Given integers $c_1$ and $c_2$ there exists a unique integer $0 \le x \lt pq$ such that:

$\begin{aligned} x &\equiv c_1 \pmod p \\ x &\equiv c_2 \pmod q \end{aligned}$
Solution:

$x \equiv qc_1(q^{-1} \pmod p) + pc_2(p^{-1} \pmod q) \pmod{pq}$
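
A direct transcription of the solution formula, as a sketch. It assumes Python 3.8+ so that `pow(q, -1, p)` computes the modular inverse; the function name `crt` is chosen for this example.

```python
def crt(c1: int, p: int, c2: int, q: int) -> int:
    """Unique 0 <= x < p*q with x = c1 (mod p) and x = c2 (mod q), for coprime p and q."""
    q_inv = pow(q, -1, p)   # q^{-1} mod p
    p_inv = pow(p, -1, q)   # p^{-1} mod q
    return (q * c1 * q_inv + p * c2 * p_inv) % (p * q)


x = crt(2, 3, 3, 5)
print(x, x % 3, x % 5)   # 8 2 3
```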

Euler function:

$\phi(n)$ : number of positive integers smaller than $n$ and relatively prime to $n$
$\phi(p) = p - 1$ for prime $p$ .
For integer $n$ with prime factorization $n = \prod_{i = 1}^{t}{p_i^{e_i}}$ :

$\phi(n) = \prod_{i = 1}^{t}{p_i^{e_i - 1}(p_i - 1)}$
Euler’s Theorem: for relatively prime $a$ and $n$ :

$a^{\phi(n)} \bmod n = 1$
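
As a quick check of the product formula and of Euler’s theorem, here is a small sketch that factors $n$ by trial division (fine for small $n$ only) and then tests every $a$ co-prime to $n$ .

```python
from math import gcd


def phi(n: int) -> int:
    """Euler's function via the product of p^(e-1) * (p-1) over prime powers p^e of n."""
    result = 1
    m = n
    d = 2
    while d * d <= m:
        if m % d == 0:
            e = 0
            while m % d == 0:
                m //= d
                e += 1
            result *= d ** (e - 1) * (d - 1)
        d += 1
    if m > 1:               # one remaining prime factor with exponent 1
        result *= m - 1
    return result


n = 36                      # 2^2 * 3^2, so phi(36) = (2 * 1) * (3 * 2) = 12
print(phi(n))
# Euler's theorem: a^phi(n) mod n == 1 whenever gcd(a, n) == 1
print(all(pow(a, phi(n), n) == 1 for a in range(1, n) if gcd(a, n) == 1))
```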

Fermat Primality Test:

Uses Fermat’s little theorem: if $n$ is prime, $a^{n - 1} \bmod n = 1$ for all $1 < a < n - 1$ ; so if $a^{n - 1} \bmod n \neq 1$ for some such $a$ , $n$ cannot be prime
Repeat with multiple values of $a$ (e.g. known small primes); return probable prime if all return $1$
Reduce powers using:
- $ab \bmod n = (a \bmod n)(b \bmod n) \bmod n$
- $(a^m)^k \bmod n = (a^m \bmod n)^k \bmod n$
Carmichael numbers: composite numbers that the Fermat primality test reports as probable primes for every base $a$ co-prime to $n$ (e.g. $561 = 3 \cdot 11 \cdot 17$ )
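
A short sketch of the test, with the default bases being an arbitrary choice of small primes. The last two calls illustrate the Carmichael number 561: it passes for bases co-prime to it and is only caught by a base that shares a factor with it.

```python
def fermat_test(n: int, bases=(2, 3, 5, 7)) -> bool:
    """Return True for 'probable prime', False for 'composite'."""
    if n < 4:
        return n in (2, 3)
    for a in bases:
        if not 1 < a < n - 1:
            continue                      # base outside the allowed range
        if pow(a, n - 1, n) != 1:
            return False                  # a is a witness: n is composite
    return True


print(fermat_test(97))                    # True  (prime)
print(fermat_test(91))                    # False (7 * 13)
print(fermat_test(561, bases=(2, 5, 7)))  # True  (Carmichael number fools the test)
print(fermat_test(561, bases=(3,)))       # False (3 divides 561)
```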

Miller-Rabin test:

Only for odd $n$
Write $n - 1 = 2^v u$ with $u$ odd
Pick any $1 < a < n - 1$
- e.g. first 7 primes
Set $b = a^u \bmod n$
If $b = 1$ return probable prime
Else repeat the following $v$ times:
- If $b = n - 1$ (i.e. $-1 \bmod n$ ) return probable prime
- $b = b^2 \bmod n$
Return composite

Returns probable prime for a composite number with probability at most 25% per randomly chosen base $a$ .
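
The steps above can be sketched as below, using the first 7 primes as bases. The loop is arranged slightly differently from the notes (the first check for $-1$ is folded into the initial test) but checks the same sequence of values.

```python
def miller_rabin(n: int, bases=(2, 3, 5, 7, 11, 13, 17)) -> bool:
    """Return True for 'probable prime', False for 'composite'."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # Write n - 1 = 2^v * u with u odd.
    u, v = n - 1, 0
    while u % 2 == 0:
        u //= 2
        v += 1
    for a in bases:
        if not 1 < a < n - 1:
            continue
        b = pow(a, u, n)
        if b == 1 or b == n - 1:
            continue                      # this base says 'probable prime'
        for _ in range(v - 1):
            b = pow(b, 2, n)
            if b == n - 1:
                break                     # reached -1: 'probable prime' for this base
        else:
            return False                  # never reached -1: composite
    return True


print(miller_rabin(97))    # True
print(miller_rabin(561))   # False: unlike the Fermat test, base 2 exposes 561
```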

Discrete logarithm problem:

Find exponent $x$ that solves $y = g^x \bmod p$
Hard problem: no efficient (polynomial-time) classical algorithm is known for well-chosen $p$

Classical Encryption

Confidentiality: reading message requires key.

Authentication: creating message requires key.

Attack classes:

Ciphertext only: only ciphertexts
Known plaintext: some plaintexts and corresponding ciphertexts known
Chosen plaintext: ciphertext of attacker-controlled plaintext known
Chosen ciphertext: plaintext of some ciphertext chosen by attacker known

Kerckhoffs’s Principle: assume the attacker knows everything about the system except the key.

Systems:

Transposition
- Plaintext split into blocks of $d$ characters; the key is a permutation $f$ applied to the characters of each block. For block length $d$ , $d!$ possible keys
- Frequency distribution of cipher/plaintext the same
- Common di/trigrams in the language can be used to optimize trials
Simple (Monoalphabetic) Substitution Cipher
- Each character replaced with another
- Caesar: shifted alphabet
- $n!$ keys where $n$ is the alphabet size
- Frequency analysis attacks
Polyalphabetic Substitution Cipher
- $d$ ciphertext alphabets; $i$ th character uses alphabet $i \bmod d$
Vigenère Cipher
- Polyalphabetic substitution, except that ciphertext alphabets are just shifted alphabets (Caesar)
- Once period identified, substitution tables can be attacked separately
- Autocorrelation: for each possible period length, generate frequency distribution for each alphabet and compare to (shifted) English alphabet frequency distribution
- Kasiski Method: identify sequences of characters that appear multiple times: distance between them likely to be a multiple of the period. Find common divisor
Hill Cipher: Polygram Cipher
- Simple substitution cipher, but substitute multiple characters at a time
- Linear function: multiply each group (column vector) by the key (a matrix)
- Vulnerable to known-plaintext attacks (but may fail if matrix not invertible)
- Ciphertext-only attacks: find probable blocks
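
To make the Vigenère description concrete, a minimal sketch assuming text and key consist only of the upper-case letters A to Z: character $i$ is shifted by key letter $i \bmod d$ , where $d$ is the key length (the period).

```python
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Vigenere cipher over A-Z: character i is Caesar-shifted by key[i mod d]."""
    out = []
    for i, ch in enumerate(text):
        shift = ord(key[i % len(key)]) - ord("A")
        if decrypt:
            shift = -shift
        out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
    return "".join(out)


ciphertext = vigenere("ATTACKATDAWN", "LEMON")
print(ciphertext)                                   # LXFOPVEFRNHR
print(vigenere(ciphertext, "LEMON", decrypt=True))  # ATTACKATDAWN
```

With a one-letter key this reduces to the Caesar cipher, which is why each position modulo the period can be attacked separately once the period is known.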

Modern Encryption

Hash Functions/MACs

Properties:

Collision resistance: infeasible to find any two distinct messages with the same hash
Second-preimage resistance: infeasible to find a second message with the same hash as a given message
Preimage resistance: infeasible to find a message given only its hash

Birthday paradox:

~50% chance of finding collision to hash function outputting $k$ bits given $2^{k/2}$ trials
- ~ $2^{128}$ trials infeasible today, so hash function outputs should be at least 256 bits
- c.f. block cipher key: need $2^{k - 1}$ trials for 50% probability of finding key
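
The birthday bound can be observed on a deliberately weakened hash. This sketch truncates SHA-256 to $k = 24$ bits (an assumption made purely so the search finishes instantly) and counts trials until two messages collide; the count is typically in the region of $2^{12}$ , not $2^{24}$ .

```python
import hashlib


def truncated_hash(data: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-256, standing in for a weak k-bit hash function."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int):
    """Hash the messages b'0', b'1', ... until two of them share a truncated hash."""
    seen = {}
    counter = 0
    while True:
        msg = str(counter).encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, counter + 1
        seen[h] = msg
        counter += 1


m1, m2, trials = find_collision(24)
print(m1, m2, trials)
```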

Merkle-Damgård Construction:

Compression function $h$ takes in two $n$ -bit inputs and outputs one $n$ -bit output
If $h$ collision resistant, whole hash function is collision resistant
Split message into $n$ -bit blocks
- Hash IV and first block
- Hash output of above and second block
- …
- Hash output of above and length (plus padding)
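Length extension attacks: the hash output is the full internal state, so if the hash is used as a MAC by hashing key and message together, an attacker can append extra blocks and compute a valid tag without the key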

Standards:

MDx: all broken
SHA-0/SHA-1: broken
SHA-2:
- Min 256 bits (i.e. AES-128-level security)
- SHA-512 most secure
SHA-3: uses a sponge construction instead of Merkle-Damgård compression-function chaining

MACs:

Tag generated from message and secret key
Unforgeability: cannot produce Message-Tag pair without key
Unforgeability under chosen message attack: above holds even with access to oracle that can calculate MAC for attacker-chosen messages
But not non-repudiation: tag computed with a shared secret key, not the sender’s private key

HMAC:

MAC from iterated hash function (compression function)
$\mathrm{HMAC}(M, K) = H((K' \oplus \mathrm{opad}) \| H((K' \oplus \mathrm{ipad}) \| M))$
- Where $K'$ is $H(K)$ if $K$ is larger than the block size and $K$ otherwise
- Where $\text{opad}$ and $\text{ipad}$ are known constants
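
The formula can be written out by hand and compared against Python’s standard `hmac` module, as a sketch with SHA-256 as $H$ . Two details come from RFC 2104, not from the formula above: $K'$ is zero-padded to the block size, and the pad bytes are `0x36` (ipad) and `0x5C` (opad). Note the argument order here is (key, message).

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC built directly from the definition, with H = SHA-256."""
    block_size = 64                              # SHA-256 block size in bytes
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()       # K' = H(K) for long keys
    key = key.ljust(block_size, b"\x00")         # zero-pad K' to the block size
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()


tag = hmac_sha256(b"secret key", b"hello")
print(tag == hmac.new(b"secret key", b"hello", hashlib.sha256).digest())   # True
```

The nested structure is what prevents the length extension attack that affects a plain hash of key and message.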

Encryption and MAC:

If not using authenticated encryption, there are three options:
- Encrypt-and-MAC: encrypt $M$ , apply MAC to $M$ and send ciphertext and tag
  - Insecure; don’t use
- MAC-then-Encrypt: calculate MAC on $M$ , encrypt $M\|T$ , send ciphertext
- Encrypt-then-MAC: encrypt $M$ , calculate MAC on $C$ , send ciphertext and tag
  - Most secure but also a bit harder

Block Cipher

Key sizes and equivalents for symmetric algorithms (block ciphers), factoring modulus (e.g. RSA’s $n$ ), discrete logarithm key (exponent) and group ( $p$ ), elliptic curve, hashes: https://www.keylength.com/en/4/

Product cipher: chain simple functions together, each using its own key.

Iterated cipher: product cipher but each round uses the same function using a key derived from a master key (using key schedule).

Substitution-Permutation Network (SPN):

Substitution/S-box/ $\pi_S$ : function given sub-blocks of $l$ bits and returns substituted bits
- Flips some bits; number of $1$ ’s in the sub-block may change
Permutation/P-box/ $\pi_P$ : function given full block $n$ and returning a permutation
- Number of $1$ ’s will not change but the location of them will
In each round, split block into $n/l$ sub-blocks, run substitution each, then run permutation on the whole block
Round key $K_i$ XORed with output of previous round, $W_{i - 1}$ (plaintext for the first round)

Feistel Cipher:

Split plaintext into left and right
In each round:
- $L_i = R_{i - 1}$
- $R_i = L_{i - 1} \oplus f(R_{i - 1}, K_i)$
- Output of (hopefully non-linear) function $f$ XORed, so decryption is same as encryption
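
A toy Feistel network on 64-bit blocks, as a sketch. The round function $f$ here is a truncated SHA-256 of the half-block and round key, a hypothetical choice for illustration only; the point is that $f$ never needs to be inverted, and decryption is the same routine with the round keys reversed.

```python
import hashlib


def round_function(right: int, key: int) -> int:
    """Toy non-linear f(R, K): 32 bits of SHA-256 over R and K (illustrative only)."""
    data = right.to_bytes(4, "big") + key.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def feistel(block: int, round_keys) -> int:
    """Run a Feistel network over a 64-bit block with 32-bit round keys."""
    left, right = block >> 32, block & 0xFFFFFFFF
    for k in round_keys:
        # L_i = R_{i-1};  R_i = L_{i-1} XOR f(R_{i-1}, K_i)
        left, right = right, left ^ round_function(right, k)
    # Swap the halves at the end so that decryption reuses this function.
    return (right << 32) | left


keys = [0x1111, 0x2222, 0x3333, 0x4444]          # hypothetical round keys
plaintext = 0x0123456789ABCDEF
ciphertext = feistel(plaintext, keys)
print(hex(ciphertext))
print(feistel(ciphertext, keys[::-1]) == plaintext)   # True
```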

Shannon:

Key avalanche: small change in key results in large change to ciphertext
- Relates to Shannon’s notion of confusion: substitution used to make the relationship between $K$ and $C$ as complex as possible
Plaintext avalanche: small change in plaintext results in large change to ciphertext
- Relates to Shannon’s notion of diffusion: transformations used to dissipate the statistical properties of $P$ across $C$ .

DES:

16-round Feistel cipher with 56-bit keys, block length of 64 bits
- Each round uses a different 48-bit subkey
Plaintext initially permuted with fixed permutation, inverse permutation applied at the end
Brute-force requires $2^{55}$ trials on average
Double encryption: run DES on plaintext, then run DES on the resultant ciphertext with a different key
Meet-in-the-Middle attack:
- Known plaintext attack
- For a given block:
  - Encrypt plaintext with all possible keys; store in memory
  - Decrypt ciphertext with any key; compare to above values
    - Repeat for all possible keys
    - If match found, check if it works for other pairs
- Instead of an average of $2^{2d - 1}$ attempts, requires storage of $2^d$ ciphertexts, and $2^d$ encryption and decryption operations
Triple DES:
- Three independent keys approved by NIST; two keys for legacy use only; never use one key, since it can then be brute-forced as easily as DES

AES:

128-bit data block
128, 192 or 256 bit master keys with 10, 12 or 14 rounds respectively
Byte-based: finite field operations in $\mathbb{F}_{256}$ (i.e. $\mathrm{GF}(2^8)$ )
SPN but not Feistel

Block Modes of Operation

ECB (Electronic Code Book):

Each block encrypted independently with the same key, so identical plaintext blocks give identical ciphertext blocks
$C_t = E(P_t, K)$

CBC (Cipher Block Chaining)

Plaintext XORed with previous ciphertext (or IV)
Parallel decryption possible
$C_t = E(P_t \oplus C_{t - 1}, K)$

CTR (Counter Mode):

Concatenation of nonce $N$ and block number $t$ encrypted
Parallel encryption and decryption
$C_t = E(N \| t, K) \oplus P_t$
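
A sketch of CTR mode following $C_t = E(N \| t, K) \oplus P_t$ . Since the standard library has no AES, $E$ is replaced here by a stand-in keyed function (truncated HMAC-SHA256); this is an assumption for illustration and not a real block cipher, but CTR only ever uses $E$ in the forward direction, so the mode itself is shown faithfully.

```python
import hashlib
import hmac

BLOCK = 16   # block size in bytes


def E(block: bytes, key: bytes) -> bytes:
    """Stand-in for a block cipher call E(block, K). NOT AES; illustration only."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def ctr_xcrypt(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """CTR mode: XOR each block with E(N || t, K). Encryption and decryption are identical."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        t = offset // BLOCK
        keystream = E(nonce + t.to_bytes(8, "big"), key)
        chunk = data[offset:offset + BLOCK]
        out.extend(c ^ k for c, k in zip(chunk, keystream))
    return bytes(out)


key, nonce = b"0123456789abcdef", b"\x00" * 8    # hypothetical key and nonce
plaintext = b"A" * 32                            # two identical plaintext blocks
ciphertext = ctr_xcrypt(plaintext, key, nonce)
print(ciphertext[:16] != ciphertext[16:])               # True, unlike ECB
print(ctr_xcrypt(ciphertext, key, nonce) == plaintext)  # True
```

Each block depends only on its own counter value, which is why both directions can be parallelized. Reusing a nonce with the same key repeats the keystream and must be avoided.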

Authentication/Integrity

Tag $T$ of message $M$ is unforgeable - impossible to produce $T = \mathrm{MAC}(M, K)$ without $K$ .

CBC-MAC:

$C_t = E(P_t \oplus C_{t - 1}, K)$
$C_0$ is the IV: must be fixed and public
- If not, IV must be sent along with the message and attacker has control over $P_1$
The tag is the last ciphertext

CMAC (Cipher-Based MAC):

NIST version of CBC-MAC
IV all zeroes

Authenticated Encryption

Data fits into one of two buckets:

Payload: encrypted and authenticated
Associated data: only authenticated

This is called AEAD - Authenticated Encryption with Associated Data.

CCM (Counter with CBC-MAC):

CBC-MAC for authentication; CTR mode encryption for payload
Nonce $N$ (for CTR encryption), payload $P$ , associated data $A$
Lengths of $N$ and $P$ included in first block; cannot be used for streaming

GCM (Galois Counter Mode):

CTR mode + $\mathrm{GHASH}$ function
Used in TLS 1.2 and 1.3
TODO more details?

RNGs

Seed obtained from true RNG; used in PRNG/deterministic random bit generator (DRBG).

DRBGs should have:

Backtracking resistance: access to current state does not allow attacker to distinguish between random noise and previous DRBG output
Forward prediction resistance: access to current state does not allow attacker to distinguish between random noise and later DRBG output

CTR_DRBG:

Block cipher in counter mode (e.g. AES)
- ‘Plaintext’ is just zeroes; XOR operation does nothing
Initialized with seed:
- Seed length = key length + block length
- Seed defines key and counter value (no nonce)
On each request, return $E(n, K)$ for the current counter value $n$ and increment the counter
Maximum of $2^{48}$ generate calls before re-seeding

Dual_EC_DRBG:

Very bad, don’t use, most likely backdoored by NSA

Synchronous Stream Ciphers:

Both parties must generate the same keystream and be synchronized
XOR the keystream and plain text to get the ciphertext
One-Time Pad
- Shannon’s Perfect Secrecy: distribution of messages given ciphertext the same as the distribution of messages
A5 Cipher, RC4, ChaCha

Public Key Cryptography

One-way function:

Functions where the inverse is hard to compute
Integer factorization, discrete logarithm problem believed to be one-way
Trapdoor one-way functions:
- Inverse easy to compute given additional information - trapdoor
Asymmetric cryptography: trapdoor is the decryption key

RSA:

Choose distinct primes $p$ , $q$
- At least 1024 bits
- $n = pq$
  - Shor’s algorithm: polynomial time factorization for $n$ with quantum computers
Choose $e$ such that $e$ and $\phi(n)$ are co-prime
- Random gives best security
- Small values faster
- $e = 3$ has security issues, prefer larger values (e.g. $e = 2^{16} + 1$ )
- $d$ should be at least $\sqrt{n}$
Compute $d = e^{-1} \bmod \phi(n)$
Public key $K_E = (n, e)$
Private key $K_D = (p, q, d)$
Encryption: $C = M^e \bmod n$
Decryption: $M = C^d \bmod n$
- CRT can be used to increase decryption speed
Padding required to add randomness
- Håstad’s Attack: if the same $M$ is encrypted with the same small $e$ for $e$ different recipients (i.e. different $n$ ), CRT can be used to find $M^e$ over the integers; taking the $e$ th root gives $M$
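
Textbook RSA with toy primes, as a sketch (no padding, so not secure; assumes Python 3.8+ for `pow(e, -1, phi_n)`). The values $p = 61$ , $q = 53$ , $e = 17$ are a common classroom example, and `decrypt_crt` shows the CRT speed-up: two small exponentiations modulo $p$ and $q$ in place of one modulo $n$ .

```python
p, q = 61, 53                  # toy primes; real ones are at least 1024 bits
n = p * q                      # 3233
phi_n = (p - 1) * (q - 1)      # 3120
e = 17                         # co-prime to phi_n
d = pow(e, -1, phi_n)          # 2753


def encrypt(m: int) -> int:
    return pow(m, e, n)


def decrypt(c: int) -> int:
    return pow(c, d, n)


def decrypt_crt(c: int) -> int:
    """Decrypt modulo p and q separately, then recombine with the CRT."""
    m_p = pow(c, d % (p - 1), p)
    m_q = pow(c, d % (q - 1), q)
    h = (pow(q, -1, p) * (m_p - m_q)) % p
    return m_q + h * q


c = encrypt(65)
print(c, decrypt(c), decrypt_crt(c))   # 2790 65 65
```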

Diffie-Hellman:

Prime $p$
Generator $g$ in $\mathbb{Z}_p^*$
$g^a \bmod p$ , $g^b \bmod p$ sent by Alice and Bob respectively
Key $Z = g^{ab} \bmod p$
Relies on discrete logarithm problem being hard
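
A toy run of the exchange, assuming $p = 23$ and generator $g = 5$ (far too small for real use). Each side combines its own private exponent with the other side’s public value and both arrive at $g^{ab} \bmod p$ .

```python
import secrets

p, g = 23, 5     # toy parameters; 5 generates Z_23^*


def keypair():
    """Pick a random private exponent and compute the public value g^x mod p."""
    private = secrets.randbelow(p - 2) + 1       # 1 <= private <= p - 2
    return private, pow(g, private, p)


def shared_key(private: int, other_public: int) -> int:
    return pow(other_public, private, p)


a, A = keypair()     # Alice sends A = g^a mod p
b, B = keypair()     # Bob sends B = g^b mod p
print(shared_key(a, B) == shared_key(b, A))      # True
```

An eavesdropper sees only $p$ , $g$ , $g^a \bmod p$ and $g^b \bmod p$ . Nothing here authenticates either party, which is the gap the authenticated variant addresses.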

Authenticated Diffie-Hellman:

Alice sends $g^a \bmod p$ and identity $A$
Bob sends $g^b \bmod p$ , identity $B$ and a signature, made with his private key, over $g^a \bmod p$ , $g^b \bmod p$ , $A$ , and $B$
Alice sends a signature, made with her private key, over $g^a \bmod p$ , $g^b \bmod p$ , $A$ , and $B$
Both compute $g^{ab} \bmod p$

Static Diffie Hellman: $a$ and $g^a$ are the long-term private/public keys

Elgamal Cryptosystem:

Key generation:
- Pick $p$ , generator $g$
- Pick long-term private key $K_D$
- $y = g^{K_D} \bmod p$
- Long-term public key $(p, g, y)$
Encryption:
- Pick ephemeral private key $k$
- Send $(C_1, C_2) = (g^k \bmod p, My^k \bmod p)$
Decryption:
- $M = C_2 \cdot \left(C_1^{K_D}\right)^{-1} \bmod p$
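
The three steps can be sketched with the same toy parameters $p = 23$ , $g = 5$ (assumed for illustration; Python 3.8+ for the modular inverse). Messages are integers in $1 \dots p - 1$ .

```python
import secrets

p, g = 23, 5     # toy parameters; 5 generates Z_23^*


def keygen():
    """Long-term private key K_D and public value y = g^K_D mod p."""
    k_d = secrets.randbelow(p - 2) + 1
    return k_d, pow(g, k_d, p)


def encrypt(m: int, y: int):
    """(C1, C2) = (g^k mod p, M * y^k mod p) with a fresh ephemeral key k."""
    k = secrets.randbelow(p - 2) + 1
    return pow(g, k, p), (m * pow(y, k, p)) % p


def decrypt(c1: int, c2: int, k_d: int) -> int:
    """M = C2 * (C1^K_D)^(-1) mod p."""
    return (c2 * pow(pow(c1, k_d, p), -1, p)) % p


k_d, y = keygen()
c1, c2 = encrypt(20, y)
print(decrypt(c1, c2, k_d))   # 20
```

Because $k$ is fresh for each message, encrypting the same $M$ twice generally gives different ciphertexts.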

Digital Signatures

Unforgeability: infeasible to generate a valid signature for any message without key.

Provides non-repudiation.

RSA signatures:

Signing: $s = h(M)^d \bmod n$ where $h$ is a fixed, public hash function
Verification: check that $s^e \bmod n = h(M)$

DSA signatures:

Elgamal:
- Private key $x$ , public key $y = g^x \bmod p$
- Signing:
  - Pick $k$ smaller than and co-prime to $p - 1$
  - $r = g^k \bmod p$
  - Find $s$ such that $M \equiv xr + ks \pmod{p - 1}$
  - Output $(M, r, s)$
  - Signatures twice the size of RSA signatures given same size for $p$ and $n$
- Verification:
  - Check $g^M \equiv y^r r^s \pmod p$
DSA
- $p$ such that $p - 1$ has small prime divisor $q$
- Generator $h$ in $\mathbb{Z}_p^*$
- Generator $g = h^{\frac{p - 1}{q}} \bmod p$
  - Order $q$
  - All exponents can be reduced modulo $q$ prior to exponentiation
- Signing:
  - $0 < k < q$
  - $r = (g^k \bmod p) \bmod q$
  - $s = k^{-1}(H(M) - xr) \bmod q$
- Verification:
  - $w = s^{-1} \bmod q$
  - $u_1 = H(M)w \bmod q$
  - $u_2 = rw \bmod q$
  - Check $(g^{u_1}y^{-u_2} \bmod p) \bmod q = r$
- DSA signatures shorter than RSA signatures
ECDSA:
- $q$ order of elliptic curve group
- Multiplication modulo $p$ replaced with elliptic curve group operation
- Public keys shorter c.f. DSA but signatures are not
- Takes longer to verify c.f. RSA
- c.f. AES, ~double key size
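
A toy sketch of the DSA signing and verification equations listed above, following the sign convention of these notes ( $s = k^{-1}(H(M) - xr)$ and $y^{-u_2}$ ); the standard in FIPS 186 uses $H(M) + xr$ and $y^{u_2}$ instead. The parameters $p = 23$ , $q = 11$ , $h = 5$ are assumptions for illustration and are trivially breakable.

```python
import hashlib
import secrets

p, q = 23, 11                    # q is a prime divisor of p - 1 = 22
h = 5                            # generator of Z_p^*
g = pow(h, (p - 1) // q, p)      # g = 2, an element of order q


def H(message: bytes) -> int:
    """Hash reduced modulo q (the toy q makes collisions trivial)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % q


def keygen():
    x = secrets.randbelow(q - 1) + 1             # private key, 0 < x < q
    return x, pow(g, x, p)                       # public key y = g^x mod p


def sign(message: bytes, x: int):
    while True:
        k = secrets.randbelow(q - 1) + 1         # 0 < k < q
        r = pow(g, k, p) % q
        s = (pow(k, -1, q) * (H(message) - x * r)) % q
        if r != 0 and s != 0:                    # retry degenerate values
            return r, s


def verify(message: bytes, signature, y: int) -> bool:
    r, s = signature
    if not (0 < r < q and 0 < s < q):
        return False
    w = pow(s, -1, q)
    u1 = (H(message) * w) % q
    u2 = (r * w) % q
    # y has order q, so y^(-u2) = y^((-u2) mod q)
    v = (pow(g, u1, p) * pow(y, (-u2) % q, p)) % p % q
    return v == r


x, y = keygen()
print(verify(b"hello", sign(b"hello", x), y))    # True
```

Verification works because $k = s^{-1}(H(M) - xr) = u_1 - x u_2 \pmod q$ , so $g^{u_1} y^{-u_2} = g^k \pmod p$ .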

Public Key Infrastructure

Trusted certification authority (CA) (CA public key required by clients) issues/signs and revokes certificates.

Certificates (e.g. X.509 v3) contain:

Public key
Owner identity
Signature on the above signed by the CA
Metadata (e.g. validity period, algorithms)

Usually signed with RSA since RSA verification is faster than ECDSA.

Revocation: each CA has list of revoked certificates.

Key Management

Key management phases:

Generation (all keys equally likely)
Distribution
Protection (only accessible to authorized parties)
Destruction

Mutual vs unilateral authentication:

Mutual: both parties authenticate the other party
Unilateral: only one party authenticates (e.g. client authenticates server)

Pre-Shared Keys:

Trusted authority (TA) generates and distributes long-term keys to all users when joining
Only involved during pre-distribution
Simple scheme: one key for each pair of users - $O(n^2)$ keys
Probabilistic schemes: high probability of secure channel between any two users

With symmetric keys:

User and TA share long-term keys
TA distributes session keys to users when requested
Fixed Needham-Schroeder protocol:
- $A$ asks for connection with $B$
- $B$ sends ID and nonce to $A$
- $A$ sends ID and nonce for both parties to the TA ( $S$ )
- $S$ generates session key $K_{AB}$ for $A$ and $B$ and sends it to $A$ in two parts, encrypted with the long-term keys $K_{AS}$ and $K_{BS}$ :
  
  $S \to A: \left\{ K_{AB}, \mathrm{ID}_A, \mathrm{ID}_B, N_A \right\}_{K_{AS}}, \left\{ K_{AB}, \mathrm{ID}_A, \mathrm{ID}_B, N_B \right\}_{K_{BS}}$
- $A$ forwards to $B$ the part encrypted with long-term key $K_{BS}$

With asymmetric keys:

Each user has public key signed by trusted CA
Users trusted to generate good session keys; parties must all have good PRNGs
Session keys encrypted with public keys
If long-term key compromised, attacker can act as owner
- Forwards secrecy: if compromise does not reveal previous session keys
- Diffie-Hellman: both parties provide input to key material, allowing forwards secrecy

TLS

MAC:

SHA-2 (>= TLS 1.2)
MD5, SHA-1 (< TLS 1.3)

Encryption:

Block cipher in CBC mode or stream cipher
Most commonly AES. 3DES, RC4 also available (< TLS 1.3)
Block ciphers: padding applied after MAC to get complete blocks
Plaintext optionally compressed (< TLS 1.3)

Authenticated-Encryption:

TLS 1.3: AES with CCM or GCM
One key for both encryption and MAC
Header data, (implicit) sequence number authenticated
- Otherwise, uses MAC-then-encrypt by default

Protocols:

Handshake: establish session keys
Record: sends data with established session keys
Change Cipher Spec (< 1.3)
Alert
- Warning
- Close notify (sender will not send further messages)
- Fatal

DH Handshake:

Client hello: highest available TLS version, available cipher suites, client nonce
Server hello: server’s public certificate, selected TLS version, cipher suite, server nonce
Server-key exchange: both nonces and DH parameters signed with the private key matching the server’s certificate
Client checks signature
Client-key exchange: client’s DH parameter
Both parties compute pre-master secret (PMS), then master secret (MS), then session keys
- One key for MAC and encryption for each direction
- IV may also be generated
Client finished: encrypted with session key
Server finished: encrypted with session key

RSA: client generates PMS and encrypts with server’s public key. No forwards secrecy.

Anonymous DH: against passive eavesdropping.

Attacks:

BEAST: CBC mode encryption, allowed byte-by-byte decryption
CRIME, BREACH: attacker controls part of request, attempts to get it to match cookies or passwords. If request is smaller due to compression, probable partial match
POODLE: CBC mode encryption, servers would return invalid padding error instead of uniform response.
Heartbleed: memory contents leaked via bad bounds check

IPSec

Services:

Confidentiality
Integrity
Limited traffic analysis protection
- Conceals IP source, destination addresses
Message replay protection
Peer authentication: each endpoint confirms identity of partner

Architectures:

Gateway-to-gateway
- e.g. connect two secure networks
- No protection between endpoint and gateway
Host-to-Gateway
- Secure hosts on insecure network
- Each user establishes VPN connection to gateway
Host-to-Host
- Mostly for special purposes
- End-to-end protection
- All systems need VPN software configured
- Key management system required

Protocols:

Encapsulating Security Payload (ESP)
Authentication Header (AH)
- No confidentiality
- Deprecated
Internet Key Exchange (IKE)
- Managing session keys in Security Associations (SA)
  - One SA per direction
  - Includes algorithms, keys, security protocol identifier (ESP and/or AH)
- IKEv2 used now
  - DH with X.509 certificates
  - Proof of reachability through cookies: server responds with cookie, client repeats request with cookie
    - Cookie can be calculated server-side without requiring server to store any state
  - Proof of work: pre-image for partial hash

Modes of operation: ESP/AH operate in:

Transport: IP header modified, contents protected
- Host-to-host
- IP header modified: protocol changed to ESP, length field modified, checksums
- IP packet contains:
  - Header
  - Packet data and ESP trailer (padding) encrypted
  - MAC for header and encrypted data (if SA using authentication service)
Tunnel mode: original packet encapsulated in new packet
- Gateway-to-gateway or host-to-gateway
- Similar structure, except that encrypted data is the full IP packet, not just the data

Email

Actors and systems:

Message User Agent (MUA) represents the client
- Message Store (MS) to client: POP/IMAP
- Client to Message Submission Agent (MSA): SMTP
Message Handling System (MHS):
- System handling email delivery
- MSA -> MS through Message Transfer Agents (MTA)

Link security:

DomainKeys Identified Mail (DKIM):
- Provides email authentication
- Sending domain signs contents and some headers with its private key (digital signature)
- Public key in DNS record
STARTTLS:
- Link-by-link encryption
- Running IMAP and SMTP/POP over TLS
- Opportunistic: use TLS if available

End-to-end security:

Pretty Good Privacy (PGP)
- Message encrypted with session key generated for each email
- Session key encrypted with receiver’s long-term public key
- Optional authentication: hash of plaintext signed with sender’s private key
- Web of trust:
  - Public keys on distributed servers
  - Users sign other users’ public keys, indicating trust
  - Revocation: sign revocation certificate with their old key
Secure/Multipurpose Internet Mail Extension (S/MIME)
- PGP, but with X.509 certificates issued by CAs
- Authentication is required

TODO 17. Malware and Cyber Attacks