09. Hash Functions and MACs

Hash Functions

A public function HH such that:

Security Properties

If an attacker can break second-preimage resistance, they can also break collision resistance.

Birthday Paradox

If there is a group of 23 people, there is over a 50% chance that at least two people have the same birthday.

If choosing S\sqrt{|S|} values from a set SS, the probability of getting two values the same is about half (n2Spcollisionn \approx \sqrt{2|S| \cdot p_\text{collision}}).

Hence, if HH is a hash function with an output size of kk bits, 2k/22^{k/2} trials will be enough to find a collision half the time (assuming HH is a random function).

Today, 21282^{128} trials is considered infeasible so hash functions should have an output of at least 256 bits for collision resistance.

In comparison, to get a 50% change of guessing the key of a block cipher requires only 2k12^{k - 1} trials, so hash functions require about double the number of bits compared to block ciphers for the same security.

Iterated Hash Functions

Iterated hash functions splits the input into fixed-size blocks and operates on them sequentially.

Merkle-Damgård Construction

A compression function hh taking fixed-sized inputs and applies them to the blocks of the message.

The compression function takes two nn-bit input strings x1x_1 and x2x_2 and produces an nn-bit output string yy:

m = m_1 || m_2 || m_3 || … || m_l

       m_1    m_2    m_3       m_l pad+len
        |      |      |         |     |
        |      |      |         |     |
        v      v      v         v     v
IV ---> h ---> h ---> h --...-> h---> h ---> H(m)

Security: if the compression function hh is collision-resistant, then so is HH.

Weaknesses:

The Merkle-Damgård construction is used in MD5, SHA-1 and SHA-2.

Standardized Hash Functions

MDx Family

Proposed by Ron Rivest, widely used in the 1990s.

128-bit output, all broken.

Secure Hash Algorithm (SHA)

Based on MDx family, more complex design with 160 bit output.

SHA-0 introduced 1993, SHA-1 in 1995 with minor changes. Both broken.

SHA-2 Family

Standardized 2015.

Hash sizes of 224, 256, 384 or 512 bits.

Minimum recommended is SHA-256 (256 bit hash, 512 bit blocks) - same security as AES-128.

Most secure is SHA-512: 512 bit hash, 1024 bit blocks.

Padding:

SHA-3

MDx, SHA families based on the same basic design which were vulnerable to a few unexpected attacks.

Keccak function picked by NIST in 2015 as as SHA-3: uses sponge function instead of compression.

Using Hash Functions

Hash functions do not depend a key; anyone can calculate it so it is not encryption.

However, it does help to provide data authentication:

Password storage:

Message Authentication Code

Ensures message integrity. takes in a message MM of arbitrary length and secret key KK and outputs a fixed-sized tag T=MAC(M,K)T = \mathrm{MAC}(M, K).

The tag TT is appended to the message and the recipient can compute TT' with the message they receive and their shared secret, checking to ensure that T=TT' = T.

Unforgeability: not feasible to produce a valid pair (M,T)(M, T) such that T=MAC(M,K)T = \mathrm{MAC}(M, K) without knowledge of KK.

Unforgeability under chosen message attack: even with access to an oracle that can calculate the MAC for an input message of the attacker’s choosing, they cannot create a valid tag themselves (i.e. guess the private key from tags they ask the oracle to generate).

MAC from Hash Function (HMAC)

Proposed by Bellare, Canetti and Krawczyk in 1996.

Can be built from any iterated hash function HH.

With a key KK that has been padded with zeroes to be the required block size and two fixed strings:

HMAC(M,K)=H((Kopad)H((Kipad)M)) \mathrm{HMAC}(M, K) = H((K \oplus \mathrm{opad}) \| H((K \oplus \mathrm{ipad}) \| M))

HMAC is secure if HH is collision resistant or HH is a pseudorandom function. It is designed to resist length-extension attacks, even if HH is a Merkle-Damgård construct.

HMAC is often used as a pseudorandom function to derive subkeys.

Authenticated Encryption

A and B share a key KK and AA wishes to send a message MM with confidentiality and authenticity/integrity.

Two options:

Combining Encryption and MAC

Three options:

Encrypt-then-MAC is the safest option: C=E(M,K1)C = E(M, K_1), T=MAC(C,K2)T = \mathrm{MAC}(C, K_2), send CTC \| T

MAC-then-encrypt was used in older versions of TLS while newer versions use authentication encryption modes.

https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac

CTR Mode for Block Ciphers

Synchronous stream cipher. Counter initialized with random nonce NN, keystream generated by encrypting successive values of the counter:

Ot=E(Nt,K) O_t = E(N||t, K)

where tt is the block number.

Encryption and decryption is simply XORing the plain/ciphertext with OtO_t.

Galois Counter Mode (GCM)

CCM mode cannot be used for processing of streaming data: formatting function for NN, AA and PP requires knowledge of the length of AA and PP.

Combines CTR mode on the block cipher EE with a special keyed hash function GHASH (uses multiplication in finite field GF(2128)\mathrm{GF}(2^{128}).

GCM diagram

GHASH:

GASH diagram

HK=E(0128,K)HK = E(0^{128}, K) is the hash subkey. \cdot is multiplication in the finite field.

Output is Ym=GHASHHK(X1,,Xm)Y_m = \mathrm{GHASH}_{HK}(X_1, \dots, X_m)

Decryption: