1 Introduction

HMAC [4] is a popular cryptographic-hash-function-based MAC. The basic construct is actually NMAC, of which HMAC can be viewed as a derivative.

The Constructions Succinctly:

$$\begin{aligned}\begin{array}{rcl} \mathsf {NMAC}(K_{\mathrm {out}}\Vert K_{\mathrm {in}},M) &{} = &{} H^*(K_{\mathrm {out}},H^*(K_{\mathrm {in}},M))\\ \mathsf {HMAC}(K_{\mathrm {out}}\Vert K_{\mathrm {in}},M)&{} = &{} H(K_{\mathrm {out}}\Vert H(K_{\mathrm {in}}\Vert M)). \end{array} \end{aligned}$$

Here \(H\) is a cryptographic hash function, e.g., MD5 [34], SHA-1 [31], or RIPEMD-160 [19]. Let \(h{:\;\;}\{0,1\}^c\times \{0,1\}^b \rightarrow \{0,1\}^c\) denote the underlying compression function. (Here \(b=512\) while \(c\) is \(128\) or \(160\).) Let \(h^*\) be the iterated compression function which on input \(K\in \{0,1\}^c\) and a message \(x=x[1]\ldots x[n]\) consisting of \(b\)-bit blocks, lets \(a[0]=K\) and \(a[i]=h(a[i-1],x[i])\) for \(i=1,\ldots ,n\), and finally returns \(a[n]\). Then \(H^*(K,M)=h^*(K,M^*)\) and \(H(M)= H^*(\mathrm {IV},M)\), where \(M^*\) denotes \(M\) padded appropriately to a length that is a positive multiple of \(b\) and \(\mathrm {IV}\) is a public \(c\)-bit initial vector that is fixed as part of the description of \(H\). Both NMAC and HMAC use two keys, which in the case of NMAC are of length \(c\) bits each, and in the case of HMAC of length \(b\) bits each and derived from a single \(b\)-bit key. HMAC is a non-intrusive version of NMAC in the sense that it uses the cryptographic hash function only as a blackbox, making it easier to implement.

Usage HMAC is standardized via an IETF RFC [27], a NIST FIPS [30], and ANSI X9.71 [1], and implemented in SSL, SSH, IPsec, and TLS among other places. It is often used as a PRF (pseudorandom function [21]) rather than merely as a MAC. In particular this is the case when it is used for key-derivation, as in TLS [18] and IKE (the Internet Key Exchange protocol of IPsec) [23]. HMAC is also used as a PRF in a standard for one-time passwords [29] that is the basis for Google authenticator.
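For concreteness, this is how HMAC is typically invoked in deployed code, here via Python's standard library (a usage sketch for orientation only; the key and message values are illustrative):

```python
import hashlib
import hmac

key = b"0123456789abcdef"           # shared secret key
msg = b"message to authenticate"

# compute the tag; HMAC-SHA256 is a common modern instantiation
tag = hmac.new(key, msg, hashlib.sha256).digest()

# the receiver recomputes the tag and compares in constant time
ok = hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).digest())
assert ok
```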

What’s Known The results are for NMAC, but can be lifted to HMAC. It is shown in [4] that NMAC is a secure PRF if (1) the underlying compression function \(h\) is a secure PRF, and (2) the hash function \(H\) is weakly collision resistant (WCR). The latter, introduced in [4], is a relaxation of collision resistance (CR) that asks that it be computationally infeasible for an adversary, given an oracle for \(H^*(K,\cdot )\) under a hidden key \(K\), to find a collision, meaning distinct inputs \(M_1,M_2\) such that \(H^*(K, M_1)=H^*(K, M_2)\).

The Problem HMAC is usually implemented with MD5 or SHA-1. But, due to the attacks of [37, 38], these functions are not WCR. Thus the assumption on which the proof of [4] is based is not true. This does not reflect any actual weaknesses in the NMAC or HMAC constructs, on which no attacks are known. (Being iterated MACs, the generic birthday-based forgery attacks of [33] always break NMAC and HMAC in time \(2^{c/2}\), but we mean no better-than-birthday attacks are known.) But it means that we have lost the proof-based guarantees. We are interested in recovering them.

Loss of WCR First we pause to expand on the claim above that our main hash functions are not WCR. Although WCR appears to be a weaker requirement than CR due to the hidden key, for iterated hash functions, it ends up not usually being so. The reason is that collision-finding attacks such as those on MD5 [38] and SHA-1 [37] extend to find collisions in \(H^*(\mathrm {IV},\cdot )\) for an arbitrary but given \(\mathrm {IV}\), and, any such attack, via a further extension attack, can be used to compromise WCR, meaning to find a collision in \(H^*(K,\cdot )\), given an oracle for this function, even with \(K\) hidden. This was pointed out in [4, 25], and, for the curious, we recall the attack in Sect. 6.

Main Result We show (Theorem 3.3) that NMAC is a PRF under the sole assumption that the underlying compression function \(h\) is itself a PRF. In other words, the additional assumption that the hash function is WCR is dropped. (And, in particular, as long as \(h\) is a PRF, the conclusion is true even if \(H\) is not WCR, let alone CR.)

The main advantage of our result is that it is based on an assumption that is not refuted by any known attacks. (There are to date no attacks that compromise the pseudorandomness of the compression functions of MD5 or SHA-1.) Another feature of our result is that it is the first proof for NMAC that is based solely on an assumption about the compression function rather than also assuming something about the entire iterated hash function.

Techniques We show (Lemma 3.1) that if a compression function \(h\) is a PRF then the iterated compression function \(h^*\) is computationally almost universal (cAU), a computational relaxation of the standard information-theoretic notion of almost universality (AU) of [14, 36, 39]. We then conclude with Lemma 3.2 which says that the composition of a PRF and a cAU function is a PRF. (This can be viewed as a computational relaxation of the Carter–Wegman paradigm [14, 39].)

Another Result The fact that compression functions are underlain by block ciphers, together with the fact that no known attacks compromise the pseudorandomness of the compression functions of MD5 and SHA-1, may give us some confidence that it is reasonable to assume these are PRFs, but it still behooves us to be cautious. What can we prove about NMAC without assuming the compression function is a PRF? We would not expect to be able to prove it is a PRF, but what about just a secure MAC? (Any PRF is a secure MAC [7, 10], so our main result implies NMAC is a secure MAC, but we are interested in seeing whether this can be proved under weaker assumptions.) We show (Theorem 4.3) that NMAC is a secure MAC if \(h\) is a privacy-preserving MAC (PP-MAC) [9] and \(h^*\) (equivalently, \(H^*\)) is cAU. A PP-MAC (the definition is provided in Sect. 4) is stronger than a MAC but weaker than a PRF. This result reverts to the paradigm of [4] of making assumptions both about the compression function and its iteration, but the point is that cAU is a very weak assumption compared to WCR and PP-MAC is a weaker assumption than PRF.

From NMAC to HMAC The formal results (both previous and new) we have discussed so far pertain to NMAC. However, discussions (above and in the literature) tend to identify NMAC and HMAC security-wise. This is explained by an observation of [4] which says that HMAC inherits the security of NMAC as long as the compression function is a PRF when keyed via the data input. (So far when we have talked of it being a PRF, it is keyed via the chaining variable.) In our case this means that HMAC is a PRF if the compression function is a “dual-PRF,” meaning a PRF when keyed by either of its two inputs.

However, the analysis above assumes that the two keys \(K_{\mathrm {out}},K_{\mathrm {in}}\) of HMAC are chosen independently at random, while in truth they are equal to \(K{\oplus }\mathsf {opad}\) and \(K{\oplus }\mathsf {ipad}\), respectively, where \(K\) is a random \(b\)-bit key and \(\mathsf {opad},\mathsf {ipad}\) are fixed, distinct constants. We apply the theory of PRFs under related-key attacks [8] to extend the observation of [4] to this single-key version of HMAC, showing it inherits the security of NMAC as long as the data input-keyed compression function is a PRF under an appropriate (and small) class of related-key attacks. Assuming additionally that the compression function is a PRF in the usual sense, we obtain a (in fact, the first) security proof of the single-key version of HMAC. These results are in Sect. 5.
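The single-key derivation just described (\(K_{\mathrm {out}}=K{\oplus }\mathsf {opad}\), \(K_{\mathrm {in}}=K{\oplus }\mathsf {ipad}\)) can be checked directly against a standard-library HMAC. The following sketch uses the standard HMAC constants \(\mathsf {ipad}=\mathtt{0x36}\cdots \) and \(\mathsf {opad}=\mathtt{0x5c}\cdots \) and the usual zero-padding of short keys; these are details of the HMAC standard, not of this paper's analysis:

```python
import hashlib
import hmac

def hmac_sha256(K: bytes, M: bytes) -> bytes:
    b = 64                                  # block length in bytes (b = 512 bits)
    K = K.ljust(b, b"\x00")                 # zero-pad a short key to a full block
    K_in = bytes(k ^ 0x36 for k in K)       # K xor ipad
    K_out = bytes(k ^ 0x5c for k in K)      # K xor opad
    inner = hashlib.sha256(K_in + M).digest()
    return hashlib.sha256(K_out + inner).digest()

# agrees with the standard library (for keys of at most one block)
assert hmac_sha256(b"k" * 16, b"msg") == hmac.new(b"k" * 16, b"msg", hashlib.sha256).digest()
```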

Related Work If \(h^*\) were a PRF, it would imply it is cAU, but \(h^*\) is not a PRF due to the extension attack. It is, however, shown by [5] that if \(h\) is a PRF then \(h^*\) (which they call the cascade) is a “pf-PRF” (prefix-free PRF), meaning a PRF as long as no query of the adversary is a prefix of another query. It was pointed out to us by Victor Shoup after seeing an early draft of our paper that it is possible to apply this in a blackbox way to show that \(h^*\) is cAU. Lemma 3.1, however, obtains a better bound. Although this is by only a constant factor in the context of the lemma itself, Theorem 3.4 shows that it can translate into an appreciably better security guarantee in the context of NMAC itself. The proof of Lemma 3.1 exploits and extends the ideas of [5] while also adding some new ones. For comparison, we do present the indirect proof in Sect. 7 and contrast the bounds obtained with ours.

Dodis et al. [20, Lemma 4] show that the cascade over a family of random functions is AU as long as the two messages whose collision probability one considers have the same length. (In this model, \(h(K,\cdot )\) is a random function for each \(K \in \{0,1\}^c\). That is, it is like Shannon’s ideal cipher model, except the component maps are functions not permutations.) This does not imply Lemma 3.1 (showing the cascade \(h^*\) is cAU if \(h\) is a PRF), because we need to allow the two messages to have different lengths, and also because it is not clear what implication their result has for the case when \(h\) is a PRF (A PRF does not permit one to securely instantiate a family of random functions.). A second result [20, Lemma 5] in the same paper says that if \(h^*(K,M)\) is close to uniformly distributed then so is \(h^*(K, M\Vert X)\). (Here \(M\) is chosen from some distribution, \(K\) is a random but known key, and \(X\) is a fixed block.) This result only assumes \(h\) is a PRF, but again we are not able to discern any implications for the problems we consider, because in our case, the last block of the input is not fixed, we are interested in the cAU property rather than randomness, and our inputs are not drawn from a distribution.

Versions and Subsequent Work A preliminary version of this paper appears in the proceedings of CRYPTO’06 [3]. The full version you are now reading contains a different and simpler proof of Claim 3.8, an additional result in the form of Theorem 3.4, and proofs and other material omitted from the preliminary version due to lack of space.

The proofs in our prior version gave blackbox reductions, but the results stated only the tighter, non-blackbox reductions derived via coin-fixing. The results now explicitly state both.

Tightness of our bounds is estimated based on conjectures about the prf security of the starting compression function \(h\). Koblitz and Menezes [26] point out that the conjectures made in prior versions of our work were overly optimistic in the context of non-blackbox reductions. (Time-memory tradeoff attacks [2, 17, 22, 24] should be factored in.) Tightness estimates are now based on the blackbox versions of our reductions and indicate that our bounds are not as tight as we had thought. The gap has been filled by Pietrzak [32], who gives blackbox reduction proofs for NMAC that he shows via a matching attack to be tight. In the process, he shows that an improvement to our bounds claimed by [26] was wrong.

2 Definitions

Notation We denote the concatenation of strings \(s_1,s_2\) by either \(s_1\Vert s_2\) or just \(s_1s_2\). We denote by \(|s|\) the length of string \(s\). Let \(b\) be a positive integer representing a block length, and let \(B=\{0,1\}^b\). Let \(B^+\) denote the set of all strings of length a positive multiple of \(b\) bits. Whenever we speak of blocks we mean \(b\)-bit ones. If \(M\in B^+\) then \(\Vert M\Vert _b=|M|/b\) is the number of blocks in \(M\), and \(M[i]\) denotes its \(i\)-th \(b\)-bit block, meaning \(M=M[1]\ldots M[n]\) where \(n=\Vert M\Vert _b\). If \(M_1,M_2\in B^+\), then \(M_1\) is a prefix of \(M_2\), written \(M_1\subseteq M_2\), if \(M_2=M_1\Vert A\) for some \(A\in B^*\). If \(S\) is a finite set then \(s\mathop {\leftarrow }\limits ^{\$}S\) denotes the operation of selecting \(s\) uniformly at random from \(S\). An adversary is a (possibly randomized) algorithm that may have access to one or more oracles. We let

$$\begin{aligned} A^{\mathcal{O}_1,\ldots }(x_1,\ldots ) \Rightarrow {1}\quad \text{ and } \quad y\mathop {\leftarrow }\limits ^{\$}A^{\mathcal{O}_1,\ldots }(x_1,\ldots ) \end{aligned}$$

denote, respectively, the event that \(A\) with the indicated oracles and inputs outputs 1, and the experiment of running \(A\) with the indicated oracles and inputs and letting \(y\) be the value returned. (This value is a random variable depending on the random choices made by \(A\) and its oracles.) Either the oracles or the inputs (or both) may be absent, and often will be.
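The block-oriented notation above translates directly into code. A minimal sketch of the \(\Vert M\Vert _b\), \(M[i]\), and blockwise-prefix operations (working with byte strings, so a \(b=512\)-bit block is 64 bytes; the helper names are ours, not the paper's):

```python
b = 512            # block length in bits
BLOCK = b // 8     # block length in bytes

def num_blocks(M: bytes) -> int:
    # ||M||_b: the number of b-bit blocks; M must lie in B^+
    assert len(M) > 0 and len(M) % BLOCK == 0
    return len(M) // BLOCK

def block(M: bytes, i: int) -> bytes:
    # M[i]: the i-th b-bit block (1-indexed, as in the paper)
    return M[(i - 1) * BLOCK : i * BLOCK]

def is_prefix(M1: bytes, M2: bytes) -> bool:
    # M1 is a prefix of M2 iff M2 = M1 || A for some A in B^*
    return num_blocks(M1) <= num_blocks(M2) and M2[: len(M1)] == M1
```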

A family of functions is a two-argument map \(f{:\;\;} Keys \times Dom \rightarrow Rng \) whose first argument is regarded as a key. We fix one such family \(h{:\;\;}\{0,1\}^c\times B\rightarrow \{0,1\}^c\) to model a compression function that we regard as being keyed via its \(c\)-bit chaining variable. Typical values are \(b=512\) and \(c=128\) or \(160\). The iteration of family \(h{:\;\;}\{0,1\}^c\times B\rightarrow \{0,1\}^c\) is the family of functions \(h^*{:\;\;}\{0,1\}^c\times B^{+}\rightarrow \{0,1\}^c\) defined via:

$$\begin{aligned} h^*(K,M):\quad n\leftarrow \Vert M\Vert _b\,;\;\; a[0]\leftarrow K\,;\;\; \text{ for } i=1,\ldots ,n \text{ do } a[i]\leftarrow h(a[i-1],M[i])\,;\;\; \text{ return } a[n] \end{aligned}$$

This represents the Merkle-Damgård [15, 28] iteration method used in all the popular hash functions but without the “strengthening,” meaning that there is no \(|M|\)-based message padding.
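The iteration can be sketched as follows. The compression function here is a toy stand-in built from SHA-256 for illustration only; it is not the compression function of any hash function discussed in the paper:

```python
import hashlib

B = 64   # block length in bytes (b = 512 bits)
C = 32   # chaining-variable length in bytes (a stand-in value for c)

def h(a: bytes, x: bytes) -> bytes:
    # toy compression function h: {0,1}^c x B -> {0,1}^c (illustration only)
    return hashlib.sha256(a + x).digest()

def h_star(K: bytes, M: bytes) -> bytes:
    # Merkle-Damgard iteration without strengthening:
    # a[0] = K; a[i] = h(a[i-1], M[i]); return a[n]
    assert len(K) == C and len(M) > 0 and len(M) % B == 0
    a = K
    for i in range(0, len(M), B):
        a = h(a, M[i : i + B])
    return a
```

Note that, by construction, \(h^*(K, M_1\Vert M_2) = h^*(h^*(K,M_1), M_2)\); this is the structural fact behind the extension attack discussed later.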

PRFs A prf-adversary \(A\) against a family of functions \(f{:\;\;} Keys \times Dom \rightarrow Rng \) takes as oracle a function \(g{:\;\;} Dom \rightarrow Rng \) and returns a bit. The prf-advantage of \(A\) against \(f\) is the difference between the probability that it outputs 1 when its oracle is \(g=f(K,\cdot )\) for a random key \(K\mathop {\leftarrow }\limits ^{\$} Keys \), and the probability that it outputs 1 when its oracle \(g\) is chosen at random from the set \(\mathsf {Maps}( Dom , Rng )\) of all functions mapping \( Dom \) to \( Rng \), succinctly written as

$$\begin{aligned} \mathbf {Adv}_{f}^{\mathrm{prf}}(A) \;=\;{\Pr \left[ \,{A^{f(K,\cdot )}\Rightarrow {1}}\,\right] }-{\Pr \left[ \,{A^{{\$}}\Rightarrow {1}}\,\right] }. \end{aligned}$$
(1)

In both cases, the probability is over the choice of oracle and the coins of \(A\).
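The prf-advantage definition can be turned into a small simulation harness. The sketch below estimates, by sampling, the advantage of a distinguisher against a deliberately insecure family whose output ignores its key; the tiny output length and the specific query point are our own choices for illustration:

```python
import random
import secrets

C = 16  # tiny output length in bits, so that the simulation is cheap

def f(K: int, x: int) -> int:
    # deliberately insecure family: the output never depends on K
    return x & ((1 << C) - 1)

def adversary(g) -> int:
    # distinguisher: test the tell-tale relation g(x) = x mod 2^C
    x = 12345
    return 1 if g(x) == x & ((1 << C) - 1) else 0

def estimate_prf_advantage(trials: int = 2000) -> float:
    real = 0
    for _ in range(trials):
        K = secrets.randbits(C)
        real += adversary(lambda x: f(K, x))         # oracle is f(K, .)
    ideal = 0
    for _ in range(trials):
        table = {}                                   # lazily sampled random function
        ideal += adversary(lambda x: table.setdefault(x, random.getrandbits(C)))
    return (real - ideal) / trials
```

Since the adversary always outputs 1 against \(f(K,\cdot )\) but only with probability \(2^{-C}\) against a random function, the estimate comes out close to 1.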

cAU and Collision Probability Let \(F{:\;\;}\{0,1\}^k\times Dom \rightarrow Rng \) be a family of functions. cAU is measured by considering an almost-universal (au) adversary \(A\) against \(F\). It (takes no inputs and) returns a pair of messages in \( Dom \). Its au-advantage is

$$\begin{aligned} \mathbf {Adv}_{F}^{\mathrm{au}}(A) \;=\;{\Pr }\left[ \,{F(K,M_1)=F(K,M_2)\;\wedge \;M_1\ne M_2}\, :\,{(M_1,M_2)\mathop {\leftarrow }\limits ^{\$}A\,;\,K\mathop {\leftarrow }\limits ^{\$} Keys }\,\right] . \end{aligned}$$

This represents a very weak form of collision resistance since \(A\) must produce \(M_1,M_2\) without being given any information about \(K\). WCR [4] is a stronger notion because here \(A\) gets an oracle for \(F(K,\cdot )\) and can query this in its search for \(M_1,M_2\).

For \(M_1,M_2\in Dom \) it is useful to let \(\mathsf {Coll}_{F}(M_1,M_2)=\Pr [ F(K,M_1)=F(K,M_2)]\), the probability being over \(K\mathop {\leftarrow }\limits ^{\$}\{0,1\}^k\).
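The collision probability \(\mathsf {Coll}_{F}(M_1,M_2)\) is easy to estimate by sampling keys. A sketch with a toy family (our own, chosen so the collision behavior is exact and checkable):

```python
import secrets

def coll(F, M1: bytes, M2: bytes, keylen: int = 16, trials: int = 2000) -> float:
    # Monte-Carlo estimate of Coll_F(M1, M2) = Pr[F(K, M1) = F(K, M2)] over random K
    hits = 0
    for _ in range(trials):
        K = secrets.token_bytes(keylen)
        hits += F(K, M1) == F(K, M2)
    return hits / trials

def F(K: bytes, M: bytes) -> int:
    # toy family for illustration: depends only on the first byte of each input
    return K[0] ^ M[0]

# messages agreeing on their first byte collide under every key; others never do
assert coll(F, b"sx", b"sy") == 1.0
assert coll(F, b"ax", b"bx") == 0.0
```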

MACs Any PRF is a MAC. (This was established for the basic notion of MAC security in [7], but holds even for the most stringent notions and with tight security reductions [10].) Accordingly, we do not explicitly discuss MACs until Sect. 4.1 where we consider PP-MACs.

3 Security of NMAC

Let \(h{:\;\;}\{0,1\}^c\times \{0,1\}^b\rightarrow \{0,1\}^c\) be a family of functions that represents the compression function, here assumed to be a PRF. Let \(\mathsf {pad}\) denote a padding function such that \(s^*=s\Vert \mathsf {pad}(|s|)\in B^+\) for any string \(s\). (Such padding functions are part of the description of current hash functions. Note the pad depends only on the length of \(s\).) Then, the family \(\mathsf {NMAC}{:\;\;}\{0,1\}^{2c}\times D\rightarrow \{0,1\}^c\) is defined by \(\mathsf {NMAC}(K_{\mathrm {out}}\Vert K_{\mathrm {in}},M)=h(K_{\mathrm {out}},h^*(K_{\mathrm {in}},M^*)\Vert \mathsf {fpad})\) where \(\mathsf {fpad}=\mathsf {pad}(c)\in \{0,1\}^{b-c}\) and \(h^*\) is the iterated compression function as defined in Sect. 2. The domain \(D\) is the set of all strings up to some maximum length, which is \(2^{64}\) for current hash functions.

It turns out that our security proof for \(\mathsf {NMAC}\) does not rely on any properties of \(\mathsf {pad}\) beyond the fact that \(M^*=M\Vert \mathsf {pad}(|M|)\in B^+\). (In particular, the Merkle-Damgård strengthening, namely inclusion of the message length in the padding, that is used in current hash functions and is crucial to collision resistance of the hash function, is not important to the security of \(\mathsf {NMAC}\).) Accordingly, we will actually prove the security of a more general construct that we call generalized NMAC. The family \(\mathsf {GNMAC}{:\;\;}\{0,1\}^{2c}\times B^+\rightarrow \{0,1\}^c\) is defined by \(\mathsf {GNMAC}( K_{\mathrm {out}}\Vert K_{\mathrm {in}},M)\;=\;h(K_{\mathrm {out}},h^*(K_{\mathrm {in}},M)\Vert \mathsf {fpad})\) where \(\mathsf {fpad}\) is any (fixed) \(b-c\) bit string. Note the domain is \(B^+\), meaning inputs have to have a length that is a positive multiple of \(b\) bits. NMAC is nonetheless a special case of GNMAC via \(\mathsf {NMAC}( K_{\mathrm {out}}\Vert K_{\mathrm {in}},M)=\mathsf {GNMAC}(K_{\mathrm {out}}\Vert K_{\mathrm {in}},M^*)\) and thus the security of NMAC is implied by that of GNMAC. (Security as a PRF or a MAC, respectively, for both.)
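The GNMAC construct can be sketched directly from its definition. As before, the compression function is a SHA-256-based toy standing in for a real \(h\), and the all-zero \(\mathsf {fpad}\) is one arbitrary choice of fixed \((b-c)\)-bit string:

```python
import hashlib

B, C = 64, 32                    # block and chaining lengths in bytes
FPAD = b"\x00" * (B - C)         # any fixed (b - c)-bit string

def h(a: bytes, x: bytes) -> bytes:
    # toy compression function (illustration only)
    return hashlib.sha256(a + x).digest()

def h_star(K: bytes, M: bytes) -> bytes:
    a = K
    for i in range(0, len(M), B):
        a = h(a, M[i : i + B])
    return a

def gnmac(K_out: bytes, K_in: bytes, M: bytes) -> bytes:
    # GNMAC(K_out || K_in, M) = h(K_out, h*(K_in, M) || fpad), for M in B^+
    assert len(M) > 0 and len(M) % B == 0
    return h(K_out, h_star(K_in, M) + FPAD)
```

NMAC is then recovered by first applying the padding function: \(\mathsf {NMAC}(K_{\mathrm {out}}\Vert K_{\mathrm {in}},M)=\mathsf {GNMAC}(K_{\mathrm {out}}\Vert K_{\mathrm {in}},M^*)\).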

3.1 The Results

Main Lemma The following says that if \(h\) is a PRF then its iteration \(h^*\) is cAU.

Lemma 3.1

Let \(B=\{0,1\}^b\). Let \(h{:\;\;}\{0,1\}^c\times B\rightarrow \{0,1\}^c\) be a family of functions and let \(n\ge 1\) be an integer. Let \(A^*\) be an au-adversary against \(h^*\) that has time complexity at most \(t\). Assume the two messages output by \(A^*\) are at most \(n_1, n_2\) blocks long, respectively, where \(1\le n_1,n_2\le n\). Then there exists a prf-adversary \(A\) against \(h\) such that

$$\begin{aligned} \mathbf {Adv}_{h^*}^{\mathrm{au}}(A^*) \;\le \;(n_1+n_2-1) \cdot \mathbf {Adv}_{h}^{\mathrm{prf}}(A) + \frac{1}{2^c}. \end{aligned}$$
(2)

\(A\) makes at most \(2\) oracle queries. Its time complexity is \(t\) under a blackbox reduction and \(O(nT_{h})\) under a non-blackbox reduction, where \(T_{h}\) is the time for one evaluation of \(h\).

By a blackbox reduction we mean that the proof gives an explicit procedure to construct \(A\) from \(A^*\). In the non-blackbox case, the time complexity of \(A\) is lower, and in fact independent of that of \(A^*\), meaning the reduction is quantitatively better, but in this case the proof establishes the existence of \(A\) without giving an explicit way to derive it from \(A^*\). (In the latter case, \(A\) in fact depends only on \(h\) and \(n\), not on \(A^*\).) The proof of Lemma 3.1 is in Sect. 3.2.

One might ask whether stronger results hold. For example, assuming \(h\) is a PRF, (1) Is \(h^*\) a PRF? (2) Is \(h^*\) WCR? (Either would imply that \(h^*\) is cAU.) But the answer is NO to both questions. The function \(h^*\) is never a PRF due to the extension attack. On the other hand, it is easy to give an example of a PRF \(h\) such that \(h^*\) is not WCR. Also MD5 and SHA-1 are candidate counter-examples, since their compression functions appear to be PRFs, but their iterations are not WCR.
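The extension attack mentioned above is concrete: given only the value \(t=h^*(K,M)\), anyone can compute \(h^*(K,M\Vert X)=h(t,X)\) without knowing \(K\), which immediately distinguishes \(h^*\) from a random function. A sketch with the same toy compression function as before:

```python
import hashlib

B = 64

def h(a: bytes, x: bytes) -> bytes:
    # toy compression function (illustration only)
    return hashlib.sha256(a + x).digest()

def h_star(K: bytes, M: bytes) -> bytes:
    a = K
    for i in range(0, len(M), B):
        a = h(a, M[i : i + B])
    return a

K = b"\x00" * 32                    # hidden from the attacker in the real game
M = b"a" * B
X = b"b" * B

t = h_star(K, M)                    # the one oracle answer the adversary sees
forged = h(t, X)                    # computed WITHOUT any knowledge of K
assert forged == h_star(K, M + X)   # agrees with the oracle, so h* is not a PRF
```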

The Prf(cAU) = Prf Lemma The composition of families \(h{:\;\;}\{0,1\}^c\times \{0,1\}^b\rightarrow \{0,1\}^c\) and \(F{:\;\;}\{0,1\}^k \times D\rightarrow \{0,1\}^b\) is the family \( hF {:\;\;}\{0,1\}^{c+k}\times {D}\rightarrow \{0,1\}^c\) defined by \( hF (K_{\mathrm {out}}\Vert K_{\mathrm {in}},M)=h(K_{\mathrm {out}},F(K_{\mathrm {in}},M))\). The following lemma says that if \(h\) is a PRF and \(F\) is cAU then \( hF \) is a PRF.

Lemma 3.2

Let \(B=\{0,1\}^b\). Let \(h{:\;\;}\{0,1\}^c\times B\rightarrow \{0,1\}^c\) and \(F{:\;\;}\{0,1\}^k\times {D}\rightarrow B\) be families of functions, and let \( hF {:\;\;}\{0,1\}^{c+k}\times {D}\rightarrow \{0,1\}^c\) be defined by

$$\begin{aligned} hF (K_{\mathrm {out}}\Vert K_{\mathrm {in}},M)=h(K_{\mathrm {out}},F(K_{\mathrm {in}},M)) \end{aligned}$$

for all \(K_{\mathrm {out}}\in \{0,1\}^c,K_{\mathrm {in}}\in \{0,1\}^k\) and \(M\in {D}\). Let \(A_{ hF }\) be a prf-adversary against \( hF \) that makes at most \(q\ge 2\) oracle queries, each of length at most \(n\), and has time complexity at most \(t\). Then there exists a prf-adversary \(A_h\) against \(h\) and an au-adversary \(A_{F}\) against \(F\) such that

$$\begin{aligned} \mathbf {Adv}_{ hF }^{\mathrm{prf}}(A_{ hF }) \;\le \;\mathbf {Adv}_{h}^{\mathrm{prf}}(A_{h})+\left( \begin{array}{l} q\\ 2 \end{array}\right) \cdot \mathbf {Adv}_{F}^{\mathrm{au}}(A_{F}) \; \end{aligned}$$
(3)

\(A_{h}\) makes at most \(q\) oracle queries, has time complexity at most \(t\) and is obtained via a blackbox reduction. The two messages \(A_{F}\) outputs have length at most \(n\). The time complexity of \(A_{F}\) is \(t\) under a blackbox reduction and \(O(T_{F}(n))\) under a non-blackbox reduction, where \(T_{F}(n)\) is the time to compute \(F\) on an \(n\)-bit input.

This extends the analogous Prf(AU) = Prf lemma by relaxing the condition on \(F\) from AU to cAU. (The Prf(AU) = Prf lemma is alluded to in [5, 12], and variants are in [12, 13].) A simple proof of Lemma 3.2, using games [11, 35], is in Sect. 3.3.

GNMAC is a PRF We now combine the two lemmas above to conclude that if \(h\) is a PRF then so is \(\mathsf {GNMAC}\).

Theorem 3.3

Assume \(b\ge c\) and let \(B=\{0,1\}^b\). Let \(h{:\;\;}\{0,1\}^c\times B\rightarrow \{0,1\}^c\) be a family of functions and let \(\mathsf {fpad}\in \{0,1\}^{b-c}\) be a fixed padding string. Let \(\mathsf {GNMAC}{:\;\;}\{0,1\}^{2c} \times B^+\rightarrow \{0,1\}^c\) be defined by

$$\begin{aligned} \mathsf {GNMAC}(K_{\mathrm {out}}\Vert K_{\mathrm {in}},M)=h(K_{\mathrm {out}},h^*(K_{\mathrm {in}},M)\Vert \mathsf {fpad}) \end{aligned}$$

for all \(K_{\mathrm {out}},K_{\mathrm {in}}\in \{0,1\}^c\) and \(M\in B^+\). Let \(A_{\mathsf {GNMAC}}\) be a prf-adversary against \(\mathsf {GNMAC}\) that makes at most \(q\ge 2\) oracle queries, each of at most \(m\) blocks, and has time complexity at most \(t\). Then there exist prf-adversaries \(A_1,A_2\) against \(h\) such that

$$\begin{aligned} \mathbf {Adv}_{\mathsf {GNMAC}}^{\mathrm{prf}}(A_{\mathsf {GNMAC}}) \;\le \;\mathbf {Adv}_{h}^{\mathrm{prf}}(A_{1}) + \left( \begin{array}{l} q\\ 2\end{array}\right) \left[ 2m\cdot \mathbf {Adv}_{h}^{\mathrm{prf}}(A_2)+ \frac{1}{2^c}\right] . \end{aligned}$$
(4)

\(A_1\) makes at most \(q\) oracle queries, has time complexity at most \(t\) and is obtained via a blackbox reduction. \(A_2\) makes at most \(2\) oracle queries. The time complexity of \(A_2\) is \(t\) under a blackbox reduction and \(O(mT_{h})\) under a non-blackbox reduction, where \(T_{h}\) is the time for one computation of \(h\).

Proof of Theorem 3.3

Define \(F{:\;\;}\{0,1\}^c\times B^+\rightarrow \{0,1\}^b\) by \(F(K_{\mathrm {in}},M)=h^*(K_{\mathrm {in}},M)\Vert \mathsf {fpad}\). Then \(\mathsf {GNMAC}= hF \). Apply Lemma 3.2 with \(k=c\), \(D=B^+\) and \(A_{ hF }=A_{\mathsf {GNMAC}}\) to get prf-adversary \(A_1\) and au-adversary \(A_{F}\) with the properties stated in the lemma. Note that \(\mathbf {Adv}_{F}^{\mathrm{au}}(A_{F}) =\mathbf {Adv}_{h^*}^{\mathrm{au}}(A_{F})\). This is because a pair of messages is a collision for \(h^*(K_{\mathrm {in}},\cdot )\Vert \mathsf {fpad}\) iff it is a collision for \(h^*(K_{\mathrm {in}}, \cdot )\). Let \(n=m\) and apply Lemma 3.1 to \(A^*=A_{F}\) to get prf-adversary \(A_2\).

The upper bound on the advantage of \(A_{\mathsf {GNMAC}}\) in (4) is a function of a single assumed bound \(m\) on the block lengths of all messages queried by \(A_{\mathsf {GNMAC}}\). The following more general result expresses the bound in terms of assumed bounds on the block lengths of each query taken individually:

Theorem 3.4

Assume \(b\ge c\) and let \(B=\{0,1\}^b\). Let \(h{:\;\;}\{0,1\}^c\times B\rightarrow \{0,1\}^c\) be a family of functions and let \(\mathsf {fpad}\in \{0,1\}^{b-c}\) be a fixed padding string. Let \(\mathsf {GNMAC}{:\;\;}\{0,1\}^{2c} \times B^+\rightarrow \{0,1\}^c\) be defined as in Theorem 3.3. Let \(A_{\mathsf {GNMAC}}\) be a prf-adversary against \(\mathsf {GNMAC}\) that makes at most \(q\ge 2\) oracle queries, where the \(i\)-th query is of at most \(m_i\ge 1\) blocks for \(1\le i\le q\). Let \(n=m_1+\cdots +m_q\) and \(m=\max (m_1,\ldots ,m_q)\). Assume \(A_{\mathsf {GNMAC}}\) has time complexity at most \(t\). Then there exist prf-adversaries \(A_1,A_2\) against \(h\) such that

$$\begin{aligned} \mathbf {Adv}_{\mathsf {GNMAC}}^{\mathrm{prf}}(A_{\mathsf {GNMAC}}) \;\le \;\mathbf {Adv}_{h}^{\mathrm{prf}}(A_{1}) \!+\! (q-1)\cdot (n-q/2)\cdot \mathbf {Adv}_{h}^{\mathrm{prf}}(A_2)\!+\! \left( \begin{array}{l} q\\ 2\end{array}\right) \frac{1}{2^c}.\nonumber \\ \end{aligned}$$
(5)

\(A_1\) makes at most \(q\) oracle queries, has time complexity at most \(t\) and is obtained via a blackbox reduction. \(A_2\) makes at most \(2\) oracle queries, has time complexity \(O(mT_{h})\) and is obtained via a non-blackbox reduction, where \(T_{h}\) is the time for one computation of \(h\).

Theorem 3.4 is more general than Theorem 3.3 in the sense that the former implies the (non-blackbox case of the) latter. Indeed, if we set \(m_i=m\) for all \(1\le i\le q\) then we get \(n=mq\) and (5) implies (4). On the other hand, Theorem 3.4 can give significantly better security guarantees than Theorem 3.3 in cases where the messages being authenticated are of varying lengths. For example suppose \(A_{\mathsf {GNMAC}}\) makes a single query of block length \(m\) followed by \(q-1\) queries, each a single block. Then the bound from (4) multiplies \(\mathbf {Adv}_{h}^{\mathrm{prf}}(A_2)\) by \(q(q-1)m\), whereas the bound from (5) multiplies \(\mathbf {Adv}_{h}^{\mathrm{prf}}(A_2)\) by \((q-1)(m+q/2-1)\). These are thus different by a \(\Theta (q)\) factor if, say, \(m=\Theta (q)\). The proof of Theorem 3.4 can be found in Sect. 3.4.
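The gap in the example above is simple arithmetic; a sketch comparing the two multipliers of \(\mathbf {Adv}_{h}^{\mathrm{prf}}(A_2)\) for the query pattern just described (one \(m\)-block query followed by \(q-1\) single-block queries):

```python
def multiplier_4(q: int, m: int) -> int:
    # bound (4): binom(q,2) * 2m = q(q-1)m
    return q * (q - 1) * m

def multiplier_5(q: int, m: int) -> float:
    # bound (5) with n = m + (q - 1): (q-1)(n - q/2) = (q-1)(m + q/2 - 1)
    return (q - 1) * (m + q / 2 - 1)

q = m = 1000
assert multiplier_4(q, m) == 999_000_000
assert multiplier_4(q, m) / multiplier_5(q, m) > 100   # the Theta(q)-factor gap
```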

3.2 Proof of Lemma 3.1

Some Definitions In this proof it will be convenient to consider prf-adversaries that take inputs. The advantage of \(A\) against \(h\) on inputs \(x_1,\ldots \) is defined as

$$\begin{aligned} \mathbf {Adv}_{h}^{\mathrm{prf}}(A(x_1,\ldots )) \;=\;{\Pr \left[ \,{A^{h(K,\cdot )}(x_1,\ldots )\Rightarrow {1}}\,\right] }- {\Pr \left[ \,{A^{{\$}}(x_1,\ldots )\Rightarrow {1}}\,\right] }\;, \end{aligned}$$

where in the first case \(K\mathop {\leftarrow }\limits ^{\$}\{0,1\}^c\) and in the second case the notation means that \(A\) is given as oracle a map chosen at random from \(\mathsf {Maps}(\{0,1\}^b, \{0,1\}^c)\).

Overview To start with, we ignore \(A^*\) and upper bound \(\mathsf {Coll}_{h^*}(M_1,M_2)\) as some appropriate function of the prf-advantage of a prf-adversary against \(h\) that takes \(M_1,M_2\) as input. We consider first the case that \(M_1 \subseteq M_2\) (\(M_1\) is a prefix of \(M_2\)) and then the case that \(M_1 \not \subseteq M_2\), building in each case a different adversary.

The Case \(M_1\subseteq M_2\). We begin with some high-level intuition. Suppose \(M_1\subseteq M_2\) with \(m_2=\Vert M_2\Vert _b\ge 1+m_1\), where \(m_1= \Vert M_1\Vert _b\). The argument to upper bound \(\mathsf {Coll}_{h^*}(M_1,M_2)\) has two parts. First, a hybrid argument is used to show that \(a[m_1]=h^*(K, M_1)\) is computationally close to random when \(K\) is drawn at random. Next, we imagine a game in which \(a[m_1]\) functions as a key to \(h\). Let \(a[m_1+1] =h(a[m_1],M_2[m_1+1])\) and \(a[m_2]=h^*(a[m_1+1],M_2[m_1+2]\ldots M_2[m_2])\). Now, if \(a[m_2]=a[m_1]\) then we effectively have a way to recover the “key” \(a[m_1]\) given \(a[m_1+1]\), amounting to a key-recovery attack on \(h(a[m_1],\cdot )\) based on one input–output example of this function. But being a PRF, \(h\) is also secure against key-recovery.

In the full proof that follows, we use the games and adversaries specified in Fig. 1. Adversaries \(A_1,A_2\) represent, respectively, the first and second parts of the argument outlined above, while \(A_3\) integrates the two.

Fig. 1

Games and adversaries taking as input distinct messages \(M_1,M_2\) such that \(M_1\subseteq M_2\). The adversaries take an oracle \(g{:\;\;}\{0,1\}^b\rightarrow \{0,1\}^c\)

Claim 3.5

Let \(M_1,M_2\in B^+\) with \(M_1\subseteq M_2\) and \(1+\Vert M_1\Vert _b\le \Vert M_2\Vert _b\). Suppose \(1\le l\le \Vert M_1\Vert _b\). Then

$$\begin{aligned} {\Pr \left[ \,{A_1^{{\$}}(M_1,M_2,l)\Rightarrow {1}}\,\right] }&= {\Pr \left[ \,{G_1(M_1,M_2,l)\Rightarrow {1}}\,\right] } \\ {\Pr \left[ \,{A_1^{h(K,\cdot )}(M_1,M_2,l)\Rightarrow {1}}\,\right] }&= {\Pr \left[ \,{G_1(M_1,M_2,l-1)\Rightarrow {1}}\,\right] }. \end{aligned}$$

Recall that the notation means that in the first case \(A_1\) gets as oracle \(g\mathop {\leftarrow }\limits ^{\$}\mathsf {Maps}(\{0,1\}^b,\{0,1\}^c)\) and in the second case \(K\mathop {\leftarrow }\limits ^{\$}\{0,1\}^c\).

Proof of Claim 3.5

\(A_1^g(M_1,M_2,l)\) sets \(a[l]=g(M_2[l])\). If \(g\) is chosen at random then this is equivalent to the \(a[l]\mathop {\leftarrow }\limits ^{\$}\{0,1\}^c\) assignment in \(G_1(M_1,M_2,l)\). On the other hand if \(g=h(K,\cdot )\) for a random \(K\), then \(K\) plays the role of \(a[l-1]\) in \(G_1(M_1,M_2,l-1)\).

Claim 3.6

Let \(M_1,M_2\in B^+\) with \(M_1\subseteq M_2\) and \(1+\Vert M_1\Vert _b\le \Vert M_2\Vert _b\). Then

$$\begin{aligned} {\Pr \left[ \,{A_2^{{\$}}(M_1,M_2)\Rightarrow {1}}\,\right] }&= 2^{-c} \\ {\Pr \left[ \,{A_2^{h(K,\cdot )}(M_1,M_2)\Rightarrow {1}}\,\right] }&\ge {\Pr \left[ \,{G_1(M_1,M_2,m_1)\Rightarrow {1}}\,\right] }. \end{aligned}$$

Proof of Claim 3.6

Suppose \(g\) is chosen at random. Since \(y\ne M_2[m_1+1]\), the quantity \(g(y)\) is not defined until the last line of the code of \(A_2\), at which point \(h(a[m_2],y)\) is fixed, and thus the probability that the two are equal is \(2^{-c}\) due to the randomness of \(g(y)\). Now suppose \(g=h(K,\cdot )\) for a random \(K\). Think of \(K\) as playing the role of \(a[m_1]\) in \(G_1(M_1,M_2,m_1)\). Then \(a[m_2]=K\) in \(A_2^g(M_1,M_2)\) exactly when \(a[m_1]=a[m_2]\) in \(G_1(M_1,M_2,m_1)\), meaning exactly when the latter game returns 1. But if \(a[m_2]=K\) then certainly \(h(a[m_2],y)=h(K,y)\), and the latter is \(g(y)\), so \(A_2^g(M_1,M_2)\) will return 1. (However, it could be that \(h(a[m_2],y)=h(K,y)\) even if \(a[m_2]\ne K\), which is why we have an inequality rather than an equality in the claim.)

Claim 3.7

Let \(M_1,M_2\in B^+\) with \(M_1\subseteq M_2\) and \(1+\Vert M_1\Vert _b\le \Vert M_2\Vert _b\). Let \(m_1=\Vert M_1\Vert _b\). Then

$$\begin{aligned} \mathbf {Adv}_{h}^{\mathrm{prf}}(A_3(M_1,M_2))&\ge \frac{1}{m_1+1}\left( \mathsf {Coll}_{h^*}(M_1,M_2)-2^{-c}\right) . \end{aligned}$$

Proof of Claim 3.7

From the description of \(A_3\), whether \(g={\$}\) or \(g=h(K,\cdot )\),

$$\begin{aligned}&{\Pr \left[ \,{A_3^g(M_1,M_2)\Rightarrow {1}}\,\right] }\\&\quad = \frac{1}{m_1+1}\left( {\Pr \left[ \,{A_2^g(M_1,M_2)\Rightarrow {1}}\,\right] }+ \sum _{l=1}^{m_1} {\Pr \left[ \,{A_1^g(M_1,M_2,l)\Rightarrow {1}}\,\right] }\right) . \end{aligned}$$

Now Claims 3.6 and 3.5 imply that \({\Pr \left[ \,{A_3^{h(K,\cdot )}(M_1,M_2)\Rightarrow {1}}\,\right] }\) is

$$\begin{aligned}&\ge \frac{1}{m_1+1}\left( {\Pr \left[ \,{G_1(M_1,M_2,m_1)\Rightarrow {1}}\,\right] }+{ \sum _{l=1}^{m_1}} {\Pr \left[ \,{G_1(M_1,M_2,l-1)\Rightarrow {1}}\,\right] }\right) \nonumber \\&= \frac{1}{m_1+1}\cdot { \sum _{l=0}^{m_1}} {\Pr \left[ \,{G_1(M_1,M_2,l)\Rightarrow {1}}\,\right] } . \end{aligned}$$
(6)

On the other hand, Claims 3.6 and 3.5 also imply that \({\Pr \left[ \,{A_3^{{\$}}(M_1,M_2)\Rightarrow {1}}\,\right] }\) is

$$\begin{aligned}&= \frac{1}{m_1+1}\left( 2^{-c}+ \sum _{l=1}^{m_1} {\Pr \left[ \,{G_1(M_1,M_2,l)\Rightarrow {1}}\,\right] }\right) . \end{aligned}$$
(7)

Subtracting (7) from (6) and exploiting the cancellation of terms, we get

$$\begin{aligned} \mathbf {Adv}_{h}^{\mathrm{prf}}(A_3(M_1,M_2)) \;\ge \;\frac{1}{m_1+1}\left( {\Pr \left[ \,{G_1(M_1,M_2,0)\Rightarrow {1}}\,\right] }-2^{-c}\right) . \end{aligned}$$

Now examination of Game \(G_1(M_1,M_2,0)\) shows that in this game, \(a[m_1]=h^*(a[0],M_1)\), \(a[m_2]=h^*(a[0],M_2)\), and \(a[0]\) is selected at random. Since the game returns 1 iff \(a[m_1]=a[m_2]\), the probability that it returns 1 is exactly \(\mathsf {Coll}_{h^*}(M_1,M_2)\).

The Case \(M_1\not \subseteq M_2\). For \(M_1,M_2\in B^+\) with \(\Vert M_1\Vert _b\le \Vert M_2\Vert _b\) and \(M_1\not \subseteq M_2\), we let \(\mathrm {LCP}(M_1,M_2)\) denote the length of the longest common blockwise prefix of \(M_1,M_2\), meaning the largest integer \(p\) such that \(M_1[1]\ldots M_1[p]=M_2[1]\ldots M_2[p]\); by maximality, \(M_1[p+1]\ne M_2[p+1]\). Letting \(p=\mathrm {LCP}(M_1,M_2)\), \(m_1=\Vert M_1\Vert _b\) and \(m_2=\Vert M_2\Vert _b\), we consider the following sequence of pairs:

$$\begin{aligned}&\underbrace{(0,0),\ldots ,(p,p)}_{I_1(M_1,M_2)},(p+1,p+1), \underbrace{(p+2,p+1),\ldots ,(m_1,p+1)}_{I_2(M_1,M_2)},\nonumber \\&\underbrace{(m_1,p+2),\ldots ,(m_1,m_2)}_{I_3(M_1,M_2)}. \end{aligned}$$
(8)

For \(j=1,2,3\) we let \(I_j(M_1,M_2)\) be the set of pairs indicated above. We then let

$$\begin{aligned} I(M_1,M_2)&= I_1(M_1,M_2)\cup \{(p+1,p+1)\} \cup I_2(M_1,M_2) \cup I_3(M_1,M_2) \\ I^*(M_1,M_2)&= I(M_1,M_2)-\{(0,0)\} \\ I_1^*(M_1,M_2)&= I_1(M_1,M_2)-\{(0,0)\}. \end{aligned}$$

We consider the pairs to be ordered as per (8), and for \((l_1,l_2) \in I^*(M_1,M_2)\) we let \(\textsc {Pd}(l_1,l_2)\) denote the predecessor of \((l_1,l_2)\), meaning the pair immediately preceding \((l_1,l_2)\) in the sequence of (8). Note that \(|I(M_1,M_2)|=m_1+m_2-p\). We consider the games and adversary of Fig. 2.
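To fix ideas, the blockwise prefix length and the ordered pair sequence of (8) can be computed directly. The helper names below are ours, purely for illustration; the assertions check the cardinality claim \(|I(M_1,M_2)|=m_1+m_2-p\).

```python
def lcp_blocks(M1, M2):
    """Length of the longest common blockwise prefix of two block lists."""
    p = 0
    while p < len(M1) and p < len(M2) and M1[p] == M2[p]:
        p += 1
    return p

def index_sequence(m1, m2, p):
    """The ordered pair sequence of (8): I1, then (p+1,p+1), then I2, then I3."""
    I1 = [(l, l) for l in range(0, p + 1)]            # (0,0),...,(p,p)
    I2 = [(l, p + 1) for l in range(p + 2, m1 + 1)]   # (p+2,p+1),...,(m1,p+1)
    I3 = [(m1, l) for l in range(p + 2, m2 + 1)]      # (m1,p+2),...,(m1,m2)
    return I1 + [(p + 1, p + 1)] + I2 + I3

# Toy example: a 5-block and a 7-block message agreeing on the first 2 blocks,
# so M1 is not a blockwise prefix of M2.
M1 = [b"a", b"b", b"c", b"d", b"e"]
M2 = [b"a", b"b", b"X", b"d", b"e", b"f", b"g"]
p = lcp_blocks(M1, M2)
seq = index_sequence(len(M1), len(M2), p)
assert p == 2
assert len(seq) == len(M1) + len(M2) - p     # |I(M1,M2)| = m1 + m2 - p
assert seq[0] == (0, 0) and seq[-1] == (len(M1), len(M2))
```

Each pair in the sequence other than \((0,0)\) has a predecessor, matching the definition of \(\textsc {Pd}\) above.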

Fig. 2

Games and adversaries taking input distinct messages \(M_1,M_2\in B^+\) such that \(M_1\not \subseteq M_2\) and \(\Vert M_1\Vert _b\le \Vert M_2\Vert _b\)

Claim 3.8

Let \(M_1,M_2\in B^+\) with \(M_1\not \subseteq M_2\), and \(\Vert M_1\Vert _b\le \Vert M_2\Vert _b\). Suppose \((l_1,l_2)\in I^*(M_1,M_2)\). Let \((l_1',l_2')=\textsc {Pd}(l_1,l_2)\). Then

$$\begin{aligned} {\Pr \left[ \,{A_4^{{\$}}(M_1,M_2,l_1,l_2)\Rightarrow {1}}\,\right] }&= {\Pr \left[ \,{G_2(M_1,M_2,l_1,l_2)\Rightarrow {1}}\,\right] } \\ {\Pr \left[ \,{A_4^{h(K,\cdot )}(M_1,M_2,l_1,l_2)\Rightarrow {1}}\,\right] }&= {\Pr \left[ \,{G_2(M_1,M_2,l_1',l_2')\Rightarrow {1}}\,\right] }. \end{aligned}$$

Proof of Claim 3.8

We begin by justifying the first equality, namely the one where \(g\mathop {\leftarrow }\limits ^{\$}\mathsf {Maps}(\{0,1\}^b,\{0,1\}^c)\). Let us compare the code of \(G_2(M_1,M_2,l_1,l_2)\) and \(A_4^g (M_1,M_2,l_1,l_2)\). Line a10 is equivalent to line 210 because \(g\) is random. Now consider line a20. If \((l_1,l_2)\in I_1^*(M_1,M_2)\) then we have \(a_2[l_2]=a_1[l_1]\) just as per line 220. This is true because, in this case, we have \(M_1[l_1]=M_2[l_2]\) and hence \(a_2[l_2]=g(M_2[l_2])=g(M_1[l_1])= a_1[l_1]\). If \((l_1,l_2)=(p+1,p+1)\) then setting \(a_2[l_2]=g(M_2[l_2])\) is equivalent to choosing \(a_2[l_2]\) at random because \(g\) is random and has not previously been invoked on \(M_2[l_2]\). (We use here in a crucial way that the point \(M_1[p+1]\) on which \(g\) has previously been invoked is different from \(M_2[p+1]\).) If \((l_1,l_2)\in I_3(M_1,M_2)\) then setting \(a_2[l_2]=g(M_2[l_2])\) is equivalent to choosing \(a_2[l_2]\) at random because \(g\) is random and has not previously been invoked anywhere. (In this case, \(a_1[l_1]\) is chosen at random and not via \(g\).) Otherwise, \(a_2[l_2]\) is chosen at random. We have shown that line a20 is equivalent to line 220. The rest of the code is the same in both cases.

Now we justify the second equality, namely the one where \(K\) is selected at random from \(\{0,1\}^c\) and \(g=h(K,\cdot )\). We will consider some auxiliary games, shown in Fig. 3. We claim that

$$\begin{aligned} {\Pr \left[ \,{A_4^{h(K,\cdot )}(M_1,M_2,l_1,l_2)\Rightarrow {1}}\,\right] }&= {\Pr \left[ \,{G_3(M_1,M_2,l_1,l_2)\Rightarrow {1}}\,\right] } \end{aligned}$$
(9)
$$\begin{aligned}&= {\Pr \left[ \,{G_4(M_1,M_2,l_1,l_2)\Rightarrow {1}}\,\right] } \end{aligned}$$
(10)
$$\begin{aligned}&= {\Pr \left[ \,{G_5(M_1,M_2,l_1,l_2)\Rightarrow {1}}\,\right] } \end{aligned}$$
(11)
$$\begin{aligned}&= {\Pr \left[ \,{G_2(M_1,M_2,l_1',l_2')\Rightarrow {1}}\,\right] } \end{aligned}$$
(12)

We now justify the above relations.

Since \(a_1[l_1']\) and \(a_2[l_2']\) in \(G_3(M_1,M_2,l_1,l_2)\) both equal the key \(K\), the output of this game is the same as that of \(A_4^{h(K,\cdot )} (M_1,M_2,l_1,l_2)\), justifying (9).

Game \(G_4(M_1,M_2,l_1,l_2)\) was obtained by dropping the application of \(h(K,\cdot )\) in 310, 320 and beginning the loops of 330, 340 at \(l_1', l_2'\) rather than \(l_1,l_2\), respectively. To justify (10) we consider some cases. If \((l_1,l_2)\in I_1^*(M_1,M_2)\cup \{(p+1,p+1)\}\cup I_2(M_1,M_2)\) then \(l_1'=l_1-1\), so dropping the application of \(h(K,\cdot )\) in 310 is compensated for by beginning the loop at 430 one step earlier. On the other hand if \((l_1,l_2)\in I_3(M_1,M_2)\) then \(l_1'=l_1=m_1\), so starting the loop of 430 at \(l_1'\) effectively makes no change, and we can also replace \(l_1\) by \(l_1'\) in the “else” clause as shown in 410. Similarly if \((l_1,l_2)\in I_1^*(M_1,M_2)\cup \{(p+1,p+1)\}\cup I_3(M_1,M_2)\) then \(l_2'=l_2-1\), so dropping the application of \(h(K,\cdot )\) in 320 is compensated for by beginning the loop at 440 one step earlier. On the other hand if \((l_1,l_2)\in I_2(M_1,M_2)\) then \(l_2'=l_2=p+1\), so starting the loop of 440 at \(l_2'\) effectively makes no change, and we can also replace \(l_2\) by \(l_2'\) in the “else” clause as shown in 420.

Rather than picking \(K\) upfront and assigning it to \(a_1[l_1']\), Game \(G_5(M_1,M_2,l_1,l_2)\) picks \(a_1[l_1']\) directly at random and then adjusts its choice of \(a_2[l_2']\) to ensure that it equals \(a_1[l_1']\) whenever they were both set to \(K\). This justifies (11). Finally the test at 520 is equivalent to testing whether \((l_1',l_2')\in I_1(M_1,M_2)\), justifying (12).

Fig. 3

Games equivalent to \(G_2(M_1,M_2,l_1',l_2')\), where \(M_1,M_2\in B^+\) are such that \(M_1\not \subseteq M_2\) and \(\Vert M_1\Vert _b\le \Vert M_2\Vert _b\), and \((l_1,l_2)\in I^*(M_1,M_2)\)

We now define prf-adversary \(A_5^{g}(M_1,M_2)\) against \(h\) as follows. It picks \((l_1,l_2)\mathop {\leftarrow }\limits ^{\$}I^*(M_1,M_2)\) and returns \(A_4^{g}(M_1,M_2,l_1,l_2)\).

Claim 3.9

Let \(M_1,M_2\in B^+\) with \(M_1\not \subseteq M_2\) and \(\Vert M_1\Vert _b\le \Vert M_2\Vert _b\). Let \(m=\Vert M_1\Vert _b+\Vert M_2\Vert _b -\mathrm {LCP}(M_1,M_2)-1\). Then

$$\begin{aligned} \mathbf {Adv}_{h}^{\mathrm{prf}}(A_5)&\ge \frac{1}{m}\cdot \left( \mathsf {Coll}_{h^*}(M_1,M_2) - 2^{-c}\right) . \end{aligned}$$

Proof of  Claim 3.9

Note that \(m=|I^*(M_1,M_2)|\). By Claim 3.8 we have the following, where the sums are over \((l_1,l_2)\in I^*(M_1,M_2)\) and we are letting \(\textsc {Pd}^{j}(l_1,l_2)\) be the \(j\)-th component of \(\textsc {Pd}(l_1, l_2)\) for \(j=1,2\):

$$\begin{aligned}&\mathbf {Adv}_{h}^{\mathrm{prf}}(A_5) \\&\quad = \frac{1}{m}\cdot \sum {\Pr \left[ \,{A_4^{h(K,\cdot )}(M_1,M_2,l_1,l_2)\Rightarrow {1}}\,\right] } \;-\;\frac{1}{m}\cdot \sum {\Pr \left[ \,{A_4^{{\$}}(M_1,M_2,l_1,l_2)\Rightarrow {1}}\,\right] } \\&\quad = \frac{1}{m}\cdot \sum {\Pr \left[ \,{G_2(M_1,M_2,\textsc {Pd}^{1}(l_1,l_2),\textsc {Pd}^{2}(l_1,l_2))\Rightarrow {1}}\,\right] }\;\\&\quad \quad -\frac{1}{m}\cdot \sum {\Pr \left[ \,{G_2(M_1,M_2,l_1,l_2)\Rightarrow {1}}\,\right] } \\&\quad = \frac{1}{m}\cdot \left( {\Pr \left[ \,{G_2(M_1,M_2,0,0)\Rightarrow {1}}\,\right] } - {\Pr \left[ \,{G_2(M_1,M_2,m_1,m_2)\Rightarrow {1}}\,\right] }\right) \;, \end{aligned}$$

where \(m_1=\Vert M_1\Vert _b\) and \(m_2=\Vert M_2\Vert _b\). Examination of Game \(G_2(M_1,M_2,0,0)\) shows that in this game, \(a_1[m_1]= h^*(a_1[0], M_1)\), \(a_2[m_2]=h^*(a_2[0],M_2)\), and \(a_1[0]=a_2[0]\) is selected at random. Since the game returns 1 iff \(a_1[m_1]=a_2[m_2]\), the probability that it returns 1 is exactly \(\mathsf {Coll}_{h^*}(M_1,M_2)\). On the other hand, the values \(a_1[m_1]\) and \(a_2[m_2]\) in \(G_2(M_1,M_2,m_1,m_2)\) are chosen independently at random, and so the probability that they are equal, which is the probability this game returns 1, is \(2^{-c}\).

Putting it Together We consider the following adversary \(A_6\) that takes input any distinct \(M_1,M_2\in B^+\) such that \(\Vert M_1\Vert _b\le \Vert M_2\Vert _b\):

figure b

Now let \(A^*\) be any au-adversary against \(h^*\) such that the two messages output by \(A^*\) are at most \(n_1, n_2\) blocks long, respectively, where \(1\le n_1,n_2\le n\) and \(n\) is as in the Lemma statement. We assume wlog that the two messages \(M_1,M_2\) output by \(A^*\) are always distinct, in \(B^+\), and satisfy \(\Vert M_1\Vert _b\le \Vert M_2\Vert _b\). Then we have

$$\begin{aligned} \mathbf {Adv}_{h^*}^{\mathrm{au}}(A^*)&= \sum _{M_1\subseteq M_2} \mathsf {Coll}_{h^*}(M_1,M_2)\cdot {\Pr \left[ \,{M_1,M_2}\,\right] }\\&+ \sum _{M_1\not \subseteq M_2} \mathsf {Coll}_{h^*}(M_1,M_2)\cdot {\Pr \left[ \,{M_1,M_2}\,\right] } \end{aligned}$$

where \(\Pr [M_1,M_2]\) denotes the probability that \(A^*\) outputs \((M_1,M_2)\). Now use Claims 3.7 and 3.9, and also the assumptions \(\Vert M_1\Vert _b\le n_1\), \(\Vert M_2\Vert _b\le n_2\) and \(n_2 \ge 1\) from the lemma statement, to get

$$\begin{aligned} \mathbf {Adv}_{h^*}^{\mathrm{au}}(A^*)&\le \sum _{M_1\subseteq M_2} \left[ (n_1+n_2-1)\cdot \mathbf {Adv}_{h}^{\mathrm{prf}}(A_3(M_1,M_2))+2^{-c}\right] \cdot {\Pr \left[ \,{M_1,M_2}\,\right] } \nonumber \\&\quad +\sum _{M_1\not \subseteq M_2} \left[ (n_1+n_2-1)\cdot \mathbf {Adv}_{h}^{\mathrm{prf}}(A_5(M_1,M_2))+2^{-c}\right] \cdot {\Pr \left[ \,{M_1,M_2}\,\right] } \nonumber \\&= \sum _{M_1,M_2} \left[ (n_1+n_2-1)\cdot \mathbf {Adv}_{h}^{\mathrm{prf}}(A_6(M_1,M_2))+2^{-c}\right] \cdot {\Pr \left[ \,{M_1,M_2}\,\right] }. \end{aligned}$$
(13)

Let \(A\), given oracle \(g\), run \(A^*\) to get \((M_1,M_2)\) and then return \(A_6^g(M_1,M_2)\). Equation (2) follows from (13), establishing the blackbox claim of the lemma. The time complexity of \(A\) remains around \(t\) because, by our convention, the time complexity is that of the overlying experiment, so that of \(A^*\) includes the time to compute \(h^*\) on the messages that \(A^*\) outputs. The non-blackbox case uses a standard “coin-fixing” argument. Let \(M_1^*,M_2^*\in B^+\) be distinct messages such that \(\Vert M_1^*\Vert _b\le \Vert M_2^*\Vert _b\le n\) and

$$\begin{aligned} \mathbf {Adv}_{h}^{\mathrm{prf}}(A_6(M_1,M_2)) \;\le \;\mathbf {Adv}_{h}^{\mathrm{prf}}(A_6(M_1^*,M_2^*)) \end{aligned}$$
(14)

for all distinct \(M_1,M_2\in B^+\) with \(\Vert M_1\Vert _b\le \Vert M_2\Vert _b\le n\). Now let \(A\) be the adversary that has \(M_1^*,M_2^*\) hardwired in its code and, given oracle \(g\), returns \(A_6^g(M_1^*,M_2^*)\).

3.3 Proof of Lemma 3.2

Game \(G0\) of Fig. 4 implements an oracle for \( hF (K_{\mathrm {out}}\Vert K_{\mathrm {in}},\cdot )\) with the keys chosen at random, while Game \(G2\) implements an oracle for a random function. So

$$\begin{aligned}&\mathbf {Adv}_{ hF }^{\mathrm{prf}}(A_{ hF }) \nonumber \\&\quad = {\Pr \left[ \,{A_{ hF }^{G0}\Rightarrow {1}}\,\right] }-{\Pr \left[ \,{A_{ hF }^{G2}\Rightarrow {1}}\,\right] } \nonumber \\&\quad =\left( {\Pr \left[ \,{A_{ hF }^{G0}\Rightarrow {1}}\,\right] }-{\Pr \left[ \,{A_{ hF }^{G1}\Rightarrow {1}}\,\right] }\right) \nonumber \\&\quad \quad + \left( {\Pr \left[ \,{A_{ hF }^{G1}\Rightarrow {1}}\,\right] }-{\Pr \left[ \,{A_{ hF }^{G2}\Rightarrow {1}}\,\right] }\right) , \end{aligned}$$
(15)

where in the last step we simply subtracted and then added back the value \({\Pr \left[ \,{A_{ hF }^{G1}\Rightarrow {1}}\,\right] }\).

Fig. 4

Games \(G0,G1,G2\) for the proof of Lemma 3.2

Let \(A_{h}\), given an oracle for a function \(f{:\;\;}B\rightarrow \{0,1\}^c\), pick \(K_{\mathrm {in}}\mathop {\leftarrow }\limits ^{\$}\{0,1\}^k\). It then runs \(A_{ hF }\), replying to oracle query \(M\) by \(f(F(K_{\mathrm {in}},M))\), and returns whatever output \(A_{ hF }\) returns. A simple analysis shows that

$$\begin{aligned} \mathbf {Adv}_{h}^{\mathrm{prf}}(A_{h}) \;=\;{\Pr \left[ \,{A_{ hF }^{G0}\Rightarrow {1}}\,\right] }-{\Pr \left[ \,{A_{ hF }^{G1}\Rightarrow {1}}\,\right] } . \end{aligned}$$
(16)
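The reduction defining \(A_{h}\) can be sketched concretely. In the snippet below the wrapper is faithful to the proof; the particular \(h\), \(F\), and one-query adversary are hypothetical stand-ins built from SHA-256, chosen only so the sketch runs.

```python
import os
import hashlib

def A_h(f, A_hF, F, key_len=16):
    """The reduction from the proof: given oracle f, pick K_in at random,
    run A_hF, answer each oracle query M with f(F(K_in, M)), and return
    whatever A_hF returns."""
    K_in = os.urandom(key_len)
    return A_hF(lambda M: f(F(K_in, M)))

# Illustrative stand-ins (not real compression-function interfaces):
h = lambda K, x: hashlib.sha256(K + x).digest()        # outer family h(K, .)
F = lambda K, M: hashlib.sha256(b"F" + K + M).digest()  # inner family F(K, .)
K_out = b"\x03" * 16

def probe(oracle):
    """A trivial one-query 'adversary' that just returns its oracle reply."""
    return oracle(b"message")

# When f = h(K_out, .), the simulated oracle is exactly hF(K_out || K_in, .),
# i.e., A_hF is run in game G0; when f is random, it is run in game G1.
tag = A_h(lambda x: h(K_out, x), probe, F)
assert len(tag) == 32
```

This is precisely why (16) holds: the two worlds of \(A_{h}\)'s own prf experiment correspond to games \(G0\) and \(G1\) for \(A_{ hF }\).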

The main part of the proof is to construct an au-adversary \(A_{F}'\) against \(F\) such that

$$\begin{aligned} {\Pr \left[ \,{A_{ hF }^{G1}\Rightarrow {1}}\,\right] }-{\Pr \left[ \,{A_{ hF }^{G2}\Rightarrow {1}}\,\right] } \;\le \;\left( \begin{array}{l} q\\ 2\end{array}\right) \cdot \mathbf {Adv}_{F}^{\mathrm{au}}(A_{F}'). \end{aligned}$$
(17)

Towards constructing \(A_{F}'\), consider Games \(G3\), \(G4\), and \(G5\) of Fig. 5. (Game \(G3\) is defined by the code on the left of the Figure. Game \(G4\) is the same except that the boxed code-statement is omitted.) We will assume from now on that a prf-adversary never repeats an oracle query. This is wlog, and is used below without explicit mention.

Fig. 5

Games \(G3,G4\) are defined by the code on the left, where Game \(G3\) includes the boxed statement, while Game \(G4\) does not

Claim 3.10

Game \(G4\) is equivalent to Game \(G2\), while Game \(G3\) is equivalent to Game \(G1\).

Proof of  Claim 3.10

In Game \(G4\), the “If” statement does nothing beyond setting the \(\mathsf {bad}\) flag, and the reply to query \(M_s\) is always the random value \(Z_s\). Thus, Game \(G4\) implements a random function just like Game \(G2\). Game \(G3\) returns random values except that it also ensures that if \(F(K_{\mathrm {in}},M_i)=F(K_{\mathrm {in}},M_j)\) then the answers to queries \(M_i,M_j\) are the same. Thus, it is equivalent to Game \(G1\).

Now we have:

$$\begin{aligned} {\Pr \left[ \,{A_{ hF }^{G1}\Rightarrow {1}}\,\right] }-{\Pr \left[ \,{A_{ hF }^{G2}\Rightarrow {1}}\,\right] }&= {\Pr \left[ \,{A_{ hF }^{G3}\Rightarrow {1}}\,\right] }-{\Pr \left[ \,{A_{ hF }^{G4}\Rightarrow {1}}\,\right] } \end{aligned}$$
(18)
$$\begin{aligned}&\le {\Pr \left[ \,{A_{ hF }^{G4}\,\mathrm {sets}\,\mathsf {bad}}\,\right] }. \end{aligned}$$
(19)

Above, (18) is by Claim 3.10. Since \(G3,G4\) differ only in statements that follow the setting of \(\mathsf {bad}\), (19) follows from the Fundamental Lemma of Game Playing [11].

We define au-adversary \(A_{F}'\) against \(F\), as follows: It runs \(A_{ hF }^{G5}\), then picks at random \(i,j\) subject to \(1\le i<j\le q\), and finally outputs the messages \(M_i,M_j\). In other words, it runs \(A_{ hF }\), replying to the oracle queries of the latter with random values, and then outputs a random pair of messages that \(A_{ hF }\) queries to its oracle. (In order for \(M_i,M_j\) to always be defined, we assume \(A_{ hF }\) always makes exactly \(q\) oracle queries rather than at most \(q\) where by “always” we mean no matter how its oracle queries are replied to. This is wlog.) We claim that

$$\begin{aligned} {\Pr \left[ \,{A_{ hF }^{G4}\,\mathrm {sets}\,\mathsf {bad}}\,\right] } \;\le \;\left( \begin{array}{l} q\\ 2\end{array}\right) \cdot \mathbf {Adv}_{F}^{\mathrm{au}}(A_{F}'). \end{aligned}$$
(20)

Combining (19) and (20) yields (17). We now justify (20). Intuitively, it is true because \(i,j\) are chosen at random after the execution of \(A_{ hF }\) is complete, so \(A_{ hF }\) has no information about them. A rigorous proof, however, needs a bit more work. Consider the experiment defining the au-advantage of \(A_{F}'\). (Namely, we run \(A_{ hF }^{G5}\), pick \(i,j\) at random subject to \(1\le i < j \le q\), and then pick \(K_{\mathrm {in}}\mathop {\leftarrow }\limits ^{\$}\{0,1\}^k\).) In this experiment, consider the following events defined for \(1\le \alpha < \beta \le q\):
$$\begin{aligned} C_{\alpha ,\beta }&:\quad F(K_{\mathrm {in}},M_{\alpha })=F(K_{\mathrm {in}},M_{\beta }) \\ C&:\quad C_{\alpha ,\beta }\text { holds for some } 1\le \alpha <\beta \le q. \end{aligned}$$

Notice that the events “\(C_{\alpha ,\beta }\wedge (i,j)=(\alpha ,\beta )\)” (\(1\le \alpha <\beta \le q\)) are disjoint. (Even though the events \(C_{\alpha ,\beta }\) for \(1\le \alpha <\beta \le q\) are not.) Thus:

$$\begin{aligned} \mathbf {Adv}_{F}^{\mathrm{au}}(A_{F}')&= {\Pr \left[ \,{ {\textstyle \bigvee _{1\le \alpha < \beta \le q}} \left( C_{\alpha ,\beta }\wedge (i,j)=(\alpha ,\beta )\right) }\,\right] } \\&= \sum _{1\le \alpha < \beta \le q} {\Pr \left[ \,{C_{\alpha ,\beta }\wedge (i,j)= (\alpha ,\beta )}\,\right] }. \end{aligned}$$

Since \(i,j\) are chosen at random after the execution of \(A_{ hF }^{G5}\) is complete, the events “\((i,j)=(\alpha ,\beta )\)” and \(C_{\alpha ,\beta }\) are independent. Thus the above equals

$$\begin{aligned} \sum _{1\le \alpha < \beta \le q} {\Pr \left[ \,{C_{\alpha ,\beta }}\,\right] }\cdot {\Pr \left[ \,{(i,j)= (\alpha ,\beta )}\,\right] }&= \sum _{1\le \alpha < \beta \le q} {\Pr \left[ \,{C_{\alpha ,\beta }}\,\right] }\cdot \frac{1}{{{q}\atopwithdelims (){2}}} \\&= \frac{1}{{{q}\atopwithdelims (){2}}} \cdot \sum _{1\le \alpha < \beta \le q} {\Pr \left[ \,{C_{\alpha ,\beta }}\,\right] } \\&\ge \frac{1}{{{q}\atopwithdelims (){2}}} \cdot {\Pr \left[ \,{C}\,\right] }. \end{aligned}$$

The proof of (20) is concluded by noting that \({\Pr \left[ \,{C}\,\right] }\) equals \({\Pr \left[ \,{G4\,\mathrm {sets}\,\mathsf {bad}}\,\right] }\).

To obtain the blackbox claim of the lemma, we set \(A_{F}=A_{F}'\). Combining (15), (16) and (17) we then get (3). We neglect the small overhead in time complexity for \(A_{F}\) compared to \(A_{ hF }\). The coin-fixing argument for the non-blackbox case is as follows. We note that

$$\begin{aligned} \mathbf {Adv}_{F}^{\mathrm{au}}(A_{F}') \;=\;{\mathbf {E}_{M_1,M_2}\left[ {\mathsf {Coll}_{F}(M_1,M_2)}\right] } \end{aligned}$$

where the expectation is over \((M_1,M_2)\mathop {\leftarrow }\limits ^{\$}A_{F}'\). Thus there must exist \(M_1,M_2\in B^+\) such that \(\mathsf {Coll}_{F}(M_1,M_2)\ge \mathbf {Adv}_{F}^{\mathrm{au}}(A_{F}')\), and these messages are distinct because \(A_{ hF }\) never repeats an oracle query. Let \(A_{F}\) be the au-adversary that has \(M_1,M_2\) hardwired as part of its code and, when run, simply outputs these messages and halts. Then

$$\begin{aligned} \mathbf {Adv}_{F}^{\mathrm{au}}(A_{F}') \;\le \;\mathbf {Adv}_{F}^{\mathrm{au}}(A_{F}). \end{aligned}$$
(21)

Furthermore the time complexity of \(A_{F}\) is \(O(mT_{h})\). (By our convention, the time complexity is that of the overlying experiment, so includes the time to compute \(F\) on the messages that \(A_{F}\) outputs.) Combining (15), (16), (17) and (21) we get (3).

3.4 Proof of Theorem 3.4

Define \(F{:\;\;}\{0,1\}^c\times B^+\rightarrow \{0,1\}^b\) by \(F(K_{\mathrm {in}},M) =h^*(K_{\mathrm {in}},M)\Vert \mathsf {fpad}\). Note \(\mathsf {GNMAC}= hF \). Consider the proof of Lemma 3.2 with \(A_{\mathsf {GNMAC}}\) playing the role of \(A_{ hF }\). We assume as usual that \(A_{ hF }\) does not repeat an oracle query and makes exactly \(q\) oracle queries. Let \(A_1\) be the adversary \(A_{h}\) from that proof. Referring also to the games in that proof, we have

$$\begin{aligned} \mathbf {Adv}_{\mathsf {GNMAC}}^{\mathrm{prf}}(A_{ hF }) \;\le \;\mathbf {Adv}_{h}^{\mathrm{prf}}(A_{1}) + {\Pr \left[ \,{A_{ hF }^{G4}\,\mathrm {sets}\,\mathsf {bad}}\,\right] }. \end{aligned}$$
(22)

The non-blackbox case of Lemma 3.1 tells us that there is an adversary \(A_2\), with resource bounds as in the theorem statement, such that, for all distinct \(M,M'\in B^+\) satisfying \(\Vert M\Vert _b,\Vert M'\Vert _b\le m\), we have

$$\begin{aligned} \mathsf {Coll}_{F}(M,M')\;\le \;(\Vert M\Vert _b+\Vert M'\Vert _b-1) \cdot \mathbf {Adv}_{h}^{\mathrm{prf}}(A_2) + \frac{1}{2^c}. \end{aligned}$$
(23)

(We obtain \(A_2\) by letting \(M,M'\) maximize the above collision probability over the indicated domain and then applying Lemma 3.1 to the adversary \(A^*_{M,M'}\) that has \(M,M'\) hardwired in its code and just outputs them.) Let \(M_1,\ldots ,M_q\) denote the queries made by \(A_{ hF }\) in its execution with \(G4\). Regarding these as random variables over the coins of this execution, and also taking the probability over the choice of \(K_{\mathrm {in}}\), we have

$$\begin{aligned} {\Pr \left[ \,{A_{ hF }^{G4}\,\mathrm {sets}\,\mathsf {bad}}\,\right] }&= {\Pr \left[ \,{\exists \,i<j\,:\,F(K_{\mathrm {in}},M_i)=F(K_{\mathrm {in}},M_j)}\,\right] } \end{aligned}$$
(24)
$$\begin{aligned}&\le \sum _{i<j} {\Pr \left[ \,{F(K_{\mathrm {in}},M_i)=F(K_{\mathrm {in}},M_j)}\,\right] } \nonumber \\&\le \sum _{i<j} \left[ (n_i+n_j-1) \cdot \mathbf {Adv}_{h}^{\mathrm{prf}}(A_2) + \frac{1}{2^c}\right] \\&= \mathbf {Adv}_{h}^{\mathrm{prf}}(A_2)\cdot \sum _{i<j} (n_i+n_j-1) \,+\, {{q}\atopwithdelims (){2}}\frac{1}{2^c} \nonumber \\&= \mathbf {Adv}_{h}^{\mathrm{prf}}(A_2)\cdot (q-1)\cdot (n-q/2) \,+\, {{q}\atopwithdelims (){2}}\frac{1}{2^c}. \nonumber \end{aligned}$$
(25)

Above, (24) is by the definition of game \(G4\). We obtained (25) by applying (23). To do this, one can first write each term of the sum as itself a sum, over all distinct messages \(M,M'\), of \(\mathsf {Coll}_{F}(M,M')\), weighted by the probability that \(M_i=M\) and \(M_j=M'\), the latter taken only over the execution of \(A_{ hF }\) with \(G4\). Note we use our convention that resource bounds assumed on an adversary (in this case, the block lengths of its queries) hold not only in the game defining security but across all possible answers to queries, and hence in particular in its execution with \(G4\).
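The final equality in the chain above rests on a counting identity: each \(n_i\) occurs in exactly \(q-1\) of the \(\binom{q}{2}\) pairs, so \(\sum _{i<j}(n_i+n_j-1)=(q-1)\sum _i n_i-\binom{q}{2}\), which equals \((q-1)(n-q/2)\) when \(\sum _i n_i=n\). A quick numerical check (illustrative):

```python
from itertools import combinations
from math import comb
import random

random.seed(0)
for _ in range(200):
    q = random.randint(2, 12)
    ns = [random.randint(1, 20) for _ in range(q)]  # block lengths n_1..n_q
    lhs = sum(ns[i] + ns[j] - 1 for i, j in combinations(range(q), 2))
    # Each n_i appears in q-1 pairs; the "-1" appears in C(q,2) pairs.
    rhs = (q - 1) * sum(ns) - comb(q, 2)
    assert lhs == rhs

# With n = sum(ns), this is the form used in (25):
n = sum(ns)
assert rhs == (q - 1) * (n - q / 2)
```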

3.5 Tightness of Bound

We can get concrete estimates of the security that Theorem 3.3 guarantees for \(\mathsf {GNMAC}\) by making conjectures about the prf security of \(h\) and then applying the theorem. As an example, one such conjecture is that the best attack against \(h\) as a PRF is exhaustive key search. (Birthday attacks do not apply since \(h\) is not a family of permutations.) This means that \(\mathbf {Adv}_{h}^{\mathrm{prf}}(A)\le \overline{t} \cdot 2^{-c}\) for any prf-adversary \(A\) of time complexity \(t\) making \(q\le \overline{t}\) queries, where \(\overline{t}= t/T_{h}\). Let us refer to this as the EKS (exhaustive key search) conjecture on \(h\).

Now let \(A_{\mathsf {GNMAC}}\) be a prf-adversary against \(\mathsf {GNMAC}\) who has time complexity \(t\) and makes at most \(q\approx \overline{t}\) queries, each of at most \(m\) blocks. The blackbox case of Theorem 3.3 says that the prf-advantage of \(A_{\mathsf {GNMAC}}\) is at most \(O(mq^2\overline{t}/2^{c}) \approx O(mq^3/2^c)\), which hits 1 when \(q \approx 2^{c/3}/m^{1/3}\), meaning the bound justifies \(\mathsf {GNMAC}\) up to roughly \(2^{c/3}/m^{1/3}\) queries. This falls short of the \(2^{c/2}/m^{1/2}\) queries of the best known attack on \(\mathsf {GNMAC}\), namely the birthday one of [33], indicating the bound is not tight.
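To make the comparison concrete, one can plug in representative numbers; the choices \(c=160\) (SHA-1-sized chaining value) and \(m=2^{10}\) blocks below are ours, for illustration only.

```python
import math

c, m = 160, 2 ** 10

# Query count at which the blackbox bound O(m q^3 / 2^c) reaches 1:
bound_threshold = (2 ** c / m) ** (1 / 3)    # 2^(c/3) / m^(1/3) = 2^50
# Query count of the best known (birthday) attack:
attack_threshold = (2 ** c / m) ** (1 / 2)   # 2^(c/2) / m^(1/2) = 2^75

assert math.isclose(math.log2(bound_threshold), 50.0)
assert math.isclose(math.log2(attack_threshold), 75.0)
# The bound justifies about 2^50 queries, while the attack needs about 2^75.
```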

If we use the non-blackbox case of Theorem 3.3 while continuing to make the EKS conjecture on \(h\), the bound on the prf-advantage of \(A_{\mathsf {GNMAC}}\) is \(O(m^2q^2/2^{c})\), which hits 1 when \(q\approx 2^{c/2}/m\), which is almost tight. This, indeed, was the estimate made in the preliminary version of this paper [3]. However, De, Trevisan and Tulsiani [17] use time-memory tradeoffs in the style of [2, 22, 24] to show that there exist attacks better than exhaustive key search on the prf security of any function family, and thus in particular on \(h\). These results are non-constructive, and hence can be ignored if we are using blackbox reductions, but in the non-blackbox case one must take them into account in making conjectures on the prf security of \(h\). This was pointed out by Koblitz and Menezes [26]. The EKS conjecture on \(h\) is thus not an appropriate starting point in the context of non-blackbox reductions. It is not clear to us what is.

In summary, while the non-blackbox case of Theorem 3.3 provides a tighter reduction, the gain is offset by our being forced to conjecture poorer security of the starting function \(h\). The gap has been filled by Pietrzak [32], who gives blackbox reduction proofs of prf security for NMAC that he shows, via a matching attack, to be tight. In the process he shows that an improvement to the bounds of Theorem 3.3 claimed by [26] was incorrect.

4 MAC-Security of NMAC Under Weaker Assumptions

Since any PRF is a secure MAC [7, 10], Theorem 3.3 implies that NMAC is a secure MAC if the compression function is a PRF. Here we show that NMAC is a secure MAC under a weaker assumption on the compression function, namely that it is a privacy-preserving MAC, coupled with the assumption that the hash function is cAU. This is of interest given the numerous current usages of HMAC as a MAC (rather than as a PRF). This result can be viewed as attempting to formalize the intuition given in [4, Remark 4.4].

4.1 Privacy-Preserving MACs

MAC Forgery Recall that the mac-advantage of mac-adversary \(A\) against a family of functions \(f{:\;\;} Keys \times Dom \rightarrow Rng \) is

$$\begin{aligned} \mathbf {Adv}_{f}^{\mathrm{mac}}(A) \;=\;{\Pr }\left[ \,{A^{f(K,\cdot ),\mathrm {VF}_{f}(K,\cdot ,\cdot )} \text{ forges }}\, :\,{K\mathop {\leftarrow }\limits ^{\$} Keys }\,\right] . \end{aligned}$$

The verification oracle \(\mathrm {VF}_{f}(K,\cdot ,\cdot )\) associated to \(f\) takes input \(M,T\), returning 1 if \(f(K,M)=T\) and \(0\) otherwise. Queries to the first oracle are called mac queries, and ones to the second are called verification queries. \(A\) is said to forge if it makes a verification query \(M,T\) the response to which is 1 but \(M\) was not previously a mac query. Note we allow multiple verification queries [10].
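As an illustration of this experiment, here is a toy Python version of the mac-game; HMAC-SHA256 stands in for the family \(f\) (an assumption made purely for illustration).

```python
import hmac
import hashlib
import os

def mac_game(adversary, key_len=16):
    """Toy mac-advantage experiment: the adversary gets a tagging (mac)
    oracle and a verification oracle, and forges if some verification
    query (M, T) is answered 1 on an M never sent as a mac query."""
    K = os.urandom(key_len)
    f = lambda M: hmac.new(K, M, hashlib.sha256).digest()  # stand-in f(K, .)
    mac_queried, forged = set(), False

    def mac_oracle(M):
        mac_queried.add(M)
        return f(M)

    def vf_oracle(M, T):                 # multiple verification queries allowed
        nonlocal forged
        ok = hmac.compare_digest(f(M), T)
        if ok and M not in mac_queried:
            forged = True
        return 1 if ok else 0

    adversary(mac_oracle, vf_oracle)
    return forged

# Replaying a legitimately obtained tag on the same message never forges.
def replay_adversary(mac_oracle, vf_oracle):
    T = mac_oracle(b"hello")
    vf_oracle(b"hello", T)               # verifies, but b"hello" was mac-queried

assert mac_game(replay_adversary) is False
```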

Privacy-Preserving MACs The privacy notion for MACs that we use adapts the notion of left-or-right indistinguishability of encryption [6] to deterministic functions, and was first introduced by [9], who called it indistinguishability under distinct chosen-plaintext attack. An oracle query of an ind-adversary \(A\) against family \(f{:\;\;}\{0,1\}^k\times \{0,1\}^l\rightarrow \{0,1\}^L\) is a pair of \(l\)-bit strings. The reply is provided by one or the other of the following games:

figure c

Each game has an initialization step in which it picks a key; it then uses this key in computing replies to all the queries made by \(A\). The ind-advantage of \(A\) is

$$\begin{aligned} \mathbf {Adv}_{f}^{\mathrm{ind}}(A) \;=\;{\Pr \left[ \,{A^{\mathrm {Right}}\Rightarrow {1}}\,\right] }- {\Pr \left[ \,{A^{\mathrm {Left}}\Rightarrow {1}}\,\right] }. \end{aligned}$$

However, unlike for encryption, the oracles here are deterministic. So \(A\) can easily win (meaning, obtain a high advantage), for example by making a pair of queries of the form \((x,z),(y,z)\), where \(x,y,z\) are distinct, and then returning 1 iff the replies returned are the same. (We expect that \(h(K,x)\ne h(K,y)\) with high probability over \(K\) for functions \(h\) of interest, for example compression functions.) We fix this by simply outlawing such behavior. To be precise, let us say that \(A\) is legitimate if for any sequence \((x_0^1,x_1^1),\ldots ,(x_0^q, x_1^q)\) of oracle queries that it makes, \(x_0^1,\ldots ,x_0^q\) are all distinct \(l\)-bit strings, and also \(x_1^1,\ldots , x_1^q\) are all distinct \(l\)-bit strings. (As a test, notice that the adversary who queried \((x,z),(y,z)\) was not legitimate.) It is to be understood henceforth that an ind-adversary means a legitimate one. When we say that \(f\) is privacy-preserving, we mean that the ind-advantage of any (legitimate) practical ind-adversary is low.
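A minimal sketch of the Left/Right oracles and the legitimacy condition, with HMAC-SHA256 as a stand-in deterministic family (our choice, for illustration only):

```python
import hmac
import hashlib
import os

def make_lr_oracle(right, key=None):
    """Left/Right oracle for a deterministic family f(K, .); a query is a
    pair (x0, x1) and the reply is f(K, x1) in the Right game, f(K, x0)
    in the Left game."""
    K = key if key is not None else os.urandom(16)
    return lambda x0, x1: hmac.new(K, x1 if right else x0,
                                   hashlib.sha256).digest()

def is_legitimate(queries):
    """An ind-adversary is legitimate iff all left components of its
    queries are distinct and all right components are distinct."""
    lefts = [x0 for x0, x1 in queries]
    rights = [x1 for x0, x1 in queries]
    return len(set(lefts)) == len(lefts) and len(set(rights)) == len(rights)

# The attack from the text repeats z on the right, so it is outlawed:
assert is_legitimate([(b"x", b"z"), (b"y", b"z")]) is False
assert is_legitimate([(b"x", b"u"), (b"y", b"v")]) is True
```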

Privacy-preservation is not, by itself, a demanding property. For example, it is achieved by a constant family such as the one defined by \(f(K,x)=0^L\) for all \(K,x\). We will, however, want the property for families that are also secure MACs.

PP-MAC \(<\) PRF We claim that a privacy-preserving MAC (PP-MAC) is strictly weaker than a PRF, in the sense that any PRF is (a secure MAC [7, 10] and) privacy-preserving, but not vice-versa. This means that when (below) we assume that a compression function \(h\) is a PP-MAC, we are indeed assuming less of it than that it is a PRF. Let us now provide some details about the claims made above. First, the following is the formal statement corresponding to the claim that any PRF is privacy-preserving:

Proposition 4.1

Let \(f{:\;\;}\{0,1\}^k\times \{0,1\}^l \rightarrow \{0,1\}^L\) be a family of functions, and \(A_{\mathrm {ind}}\) an ind-adversary against it that makes at most \(q\) oracle queries and has time complexity at most \(t\). Then there is a prf-adversary \(A_{\mathrm {prf}}\) against \(f\) such that \(\mathbf {Adv}_{f}^{\mathrm{ind}}(A_{\mathrm {ind}}) \;\le \;2\cdot \mathbf {Adv}_{f}^{\mathrm{prf}}(A_{\mathrm {prf}})\). \(A_{\mathrm {prf}}\) makes at most \(q\) oracle queries, has time complexity at most \(t\) and is obtained via a blackbox reduction.

The proof is a simple exercise and is omitted. Next we explain why a PP-MAC need not be a PRF. The reason (or one reason) is that if the output of a family of functions has some structure, for example always ending in a \(0\) bit, this disqualifies the family as a PRF but need not preclude its being a PP-MAC. To make this precise, let \(f{:\;\;}\{0,1\}^k\times \{0,1\}^l\rightarrow \{0,1\}^L\) be a PP-MAC. Define \(g{:\;\;}\{0,1\}^k\times \{0,1\}^l\rightarrow \{0,1\}^{L+1}\) by \(g(K,x)=f(K,x)\Vert 0\) for all \(K\in \{0,1\}^k\) and \(x\in \{0,1\}^l\). Then \(g\) is also a PP-MAC, but is clearly not a PRF.
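The separating example can be made concrete. The sketch below appends a zero byte (rather than a single bit, for byte alignment) to a stand-in \(f\) (HMAC-SHA256, our illustrative choice); the one-query distinguisher achieves prf-advantage \(1-2^{-8}\).

```python
import hmac
import hashlib
import os

def g(K, x):
    """g(K, x) = f(K, x) || 0, with a zero byte standing in for the 0 bit
    and HMAC-SHA256 standing in for the PP-MAC f."""
    return hmac.new(K, x, hashlib.sha256).digest() + b"\x00"

def distinguisher(oracle):
    """Outputs 1 iff the reply's last byte is zero. Against g it always
    outputs 1; against a truly random function it outputs 1 with
    probability 2^-8, so its prf-advantage is 1 - 2^-8."""
    return 1 if oracle(b"any query")[-1] == 0 else 0

K = os.urandom(16)
assert distinguisher(lambda x: g(K, x)) == 1   # real world: always 1
# In the random world the oracle would return fresh random 33-byte strings,
# whose last byte is zero only with probability 1/256.
```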

4.2 Results

The following implies that if \(h\) is a PP-MAC and \(F\) is cAU then their composition \( hF \) is a secure MAC.

Lemma 4.2

Let \(B=\{0,1\}^b\). Let \(h{:\;\;}\{0,1\}^c \times B\rightarrow \{0,1\}^c\) and \(F{:\;\;}\{0,1\}^k\times {D}\rightarrow B\) be families of functions, and let \( hF {:\;\;}\{0,1\}^{c+k}\times {D}\rightarrow \{0,1\}^c\) be defined by

$$\begin{aligned} hF (K_{\mathrm {out}}\Vert K_{\mathrm {in}},M)=h(K_{\mathrm {out}},F(K_{\mathrm {in}},M)) \end{aligned}$$

for all \(K_{\mathrm {out}}\in \{0,1\}^c,K_{\mathrm {in}}\in \{0,1\}^k\) and \(M\in {D}\). Let \(A_{ hF }\) be a mac-adversary against \( hF \) that makes at most \(q_{\mathrm {mac}}\) mac queries and at most \(q_{\mathrm {vf}}\) verification queries, with the messages in each of these queries being of length at most \(n\). Suppose \(A_{ hF }\) has time complexity at most \(t\). Let \(q=q_{\mathrm {mac}}+q_{\mathrm {vf}}\) and assume \(2\le q < 2^b\). Then there exists a mac-adversary \(A_1\) against \(h\), an ind-adversary \(A_2\) against \(h\), and an au-adversary \(A_{F}\) against \(F\) such that

$$\begin{aligned} \mathbf {Adv}_{ hF }^{\mathrm{mac}}(A_{ hF }) \;\le \;\mathbf {Adv}_{h}^{\mathrm{mac}}(A_1)+ \mathbf {Adv}_{h}^{\mathrm{ind}}(A_2)+{{q}\atopwithdelims (){2}}\cdot \mathbf {Adv}_{F}^{\mathrm{au}}(A_{F}). \end{aligned}$$
(26)

\(A_{1}\) makes at most \(q_{\mathrm {mac}}\) mac queries and at most \(q_{\mathrm {vf}}\) verification queries, has time complexity at most \(t\) and is obtained via a blackbox reduction. \(A_2\) makes at most \(q\) oracle queries, has time complexity at most \(t\) and is obtained via a blackbox reduction. \(A_{F}\) outputs messages of length at most \(n\) and makes \(2\) oracle queries. The time complexity of \(A_{F}\) is \(t\) under a blackbox reduction and \(O(T_{F}(n))\) under a non-blackbox reduction, where \(T_{F}(n)\) is the time to compute \(F\) on an \(n\)-bit input.

The proof is in Sect. 4.3. As a corollary we have the following, which says that if \(h\) is a PP-MAC and \(h^*\) is cAU then \(\mathsf {GNMAC}\) is a secure MAC.

Theorem 4.3

Assume \(b\ge c\) and let \(B=\{0,1\}^b\). Let \(h{:\;\;}\{0,1\}^c\times B\rightarrow \{0,1\}^c\) be a family of functions and let \(\mathsf {fpad}\in \{0,1\}^{b-c}\) be a fixed padding string. Let \(\mathsf {GNMAC}{:\;\;}\{0,1\}^{2c} \times B^+\rightarrow \{0,1\}^c\) be defined by

$$\begin{aligned} \mathsf {GNMAC}(K_{\mathrm {out}}\Vert K_{\mathrm {in}},M)=h(K_{\mathrm {out}},h^*(K_{\mathrm {in}},M)\Vert \mathsf {fpad}) \end{aligned}$$

for all \(K_{\mathrm {out}},K_{\mathrm {in}}\in \{0,1\}^c\) and \(M\in B^+\). Let \(A_{\mathsf {GNMAC}}\) be a mac-adversary against \(\mathsf {GNMAC}\) that makes at most \(q_{\mathrm {mac}}\) mac queries and at most \(q_{\mathrm {vf}}\) verification queries, with the messages in each of these queries being of at most \(m\) blocks. Suppose \(A_{\mathsf {GNMAC}}\) has time complexity at most \(t\). Let \(q=q_{\mathrm {mac}}+q_{\mathrm {vf}}\) and assume \(2\le q < 2^b\). Then there exists a mac-adversary \(A_1\) against \(h\), an ind-adversary \(A_2\) against \(h\), and an au-adversary \(A^*\) against \(h^*\) such that

$$\begin{aligned} \mathbf {Adv}_{\mathsf {GNMAC}}^{\mathrm{mac}}(A_{\mathsf {GNMAC}}) \;\le \;\mathbf {Adv}_{h}^{\mathrm{mac}}(A_1)+ \mathbf {Adv}_{h}^{\mathrm{ind}}(A_2)+{{q}\atopwithdelims (){2}}\cdot \mathbf {Adv}_{h^*}^{\mathrm{au}}(A^*). \end{aligned}$$
(27)

\(A_{1}\) makes at most \(q_{\mathrm {mac}}\) mac queries and at most \(q_{\mathrm {vf}}\) verification queries, has time complexity at most \(t\) and is obtained via a blackbox reduction. \(A_2\) makes at most \(q\) oracle queries, has time complexity at most \(t\) and is obtained via a blackbox reduction. \(A^*\) outputs messages of at most \(m\) blocks and makes \(2\) oracle queries. The time complexity of \(A^*\) is \(t\) under a blackbox reduction and \(O(mT_{h})\) under a non-blackbox reduction, where \(T_{h}\) is the time for one computation of \(h\).

We remark that Lemma 4.2 can be extended to show that \( hF \) is not only a MAC but itself privacy-preserving. (This assumes \(h\) is privacy-preserving and \(F\) is cAU. We do not prove this here.) This implies that \(\mathsf {GNMAC}\) is privacy-preserving as long as \(h\) is privacy-preserving and \(h^*\) is cAU. This is potentially useful because it may be possible to show that a PP-MAC is sufficient to ensure security in some applications where \(\mathsf {HMAC}\) is currently assumed to be a PRF.

4.3 Proof of Lemma 4.2

A mac-adversary against \(h\) gets a mac oracle \(h(K_{\mathrm {out}},\cdot )\) and corresponding verification oracle \(\mathrm {VF}_{h}(K_{\mathrm {out}},\cdot ,\cdot )\). By itself picking key \(K_{\mathrm {in}}\) and invoking these oracles, it can easily simulate the mac oracle \(h(K_{\mathrm {out}},F(K_{\mathrm {in}},\cdot ))\) and verification oracle \(\mathrm {VF}_{h}( K_{\mathrm {out}},F(K_{\mathrm {in}},\cdot ),\cdot )\) required by a mac-adversary against \( hF \). This leads to the following natural construction of \(A_1\):

[Figure: pseudocode for the construction of adversary \(A_1\), omitted]
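The construction just described can be sketched in Python. The oracle interfaces, helper names, and the stand-in primitives in the demo are assumptions of this sketch, not part of the paper:

```python
import hashlib
import os

def build_A1(A_hF, F, k):
    """Blackbox construction of mac-adversary A1 against h from A_hF:
    A1 gets oracles mac = h(K_out, .) and vf = VF_h(K_out, ., .),
    picks K_in itself, and simulates the oracles of hF(K_out||K_in, .)."""
    def A1(mac, vf):
        K_in = os.urandom(k)                     # A1's own key for F
        sim_mac = lambda M: mac(F(K_in, M))      # h(K_out, F(K_in, M))
        sim_vf = lambda M, T: vf(F(K_in, M), T)  # VF_h(K_out, F(K_in, M), T)
        return A_hF(sim_mac, sim_vf)             # run A_hF on the simulation
    return A1

# Tiny demo with stand-in primitives:
F = lambda K, M: hashlib.sha256(K + M).digest()   # plays the role of F
h = lambda K, y: hashlib.sha256(K + y).digest()   # plays the role of h
K_out = os.urandom(32)
mac = lambda y: h(K_out, y)
vf = lambda y, T: h(K_out, y) == T
# An A_hF that merely checks its two oracles are consistent:
A_hF = lambda m, v: v(b"msg", m(b"msg"))
assert build_A1(A_hF, F, 32)(mac, vf) is True
```

The point of the simulation is that any forgery of \(A_{ hF }\) against \( hF \) is, verbatim, a forgery of \(A_1\) against \(h\) on the corresponding \(F\)-image, unless two distinct messages collide under \(F(K_{\mathrm {in}},\cdot )\).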

Consider the experiment defining the mac-advantage of \(A_1\). Namely, choose \(K_{\mathrm {out}}\mathop {\leftarrow }\limits ^{\$}\{0,1\}^c\) and run \(A_1\) with oracles \(h(K_{\mathrm {out}},\cdot )\) and \(\mathrm {VF}_{h}(K_{\mathrm {out}},\cdot ,\cdot )\). Let \(M_j\) denote the message in the \(j\)-th query of \(A_{ hF }\) and \(y_j=F(K_{\mathrm {in}},M_j)\) the corresponding value computed by \(A_1\). Let \(\mathrm {Coll}\) (for “collision”) be the event that there exist \(j,l\) such that \(y_j=y_l\) but \(M_j\ne M_l\). Then

$$\begin{aligned} \mathbf {Adv}_{h}^{\mathrm{mac}}(A_1)&= {\Pr \left[ \,{A_1 \text{ forges }}\,\right] } \nonumber \\&\ge {\Pr \left[ \,{A_{ hF } \text{ forges } \;\wedge \, \overline{\mathrm {Coll}}}\,\right] } \nonumber \\&\ge {\Pr \left[ \,{A_{ hF } \text{ forges }}\,\right] } - {\Pr \left[ \,{\mathrm {Coll}}\,\right] } \nonumber \\&= \mathbf {Adv}_{ hF }^{\mathrm{mac}}(A_{ hF }) - {\Pr \left[ \,{\mathrm {Coll}}\,\right] }. \end{aligned}$$
(28)

The rest of the proof is devoted to upper bounding \({\Pr \left[ \,{\mathrm {Coll}}\,\right] }\). Consider the games \(G1,G2\) of Fig. 6, where we denote by \(\langle {i}\rangle _{b}\) the representation of integer \(i\) as a \(b\)-bit string. (The assumption \(q<2^b\) made in the lemma statement means that we can always represent \(i\) this way in \(G2\).) These games differ only in the manner in which tag \(T_i\) is computed. In \(G1\), it is equal to \( hF (K_{\mathrm {out}}\Vert K_{\mathrm {in}}, M_i)\). In \(G2\), however, it is the result of applying \(h(K_{\mathrm {out}},\cdot )\) to the current value of the counter \(i\), and as such does not depend on \(K_{\mathrm {in}}\). Now note that

$$\begin{aligned} {\Pr \left[ \,{\mathrm {Coll}}\,\right] }&= {\Pr \left[ \,{A_{ hF }^{G1}\,\mathrm {sets}\,\mathsf {bad}}\,\right] } \nonumber \\&= \left( {\Pr \left[ \,{A_{ hF }^{G1}\,\mathrm {sets}\,\mathsf {bad}}\,\right] } - {\Pr \left[ \,{A_{ hF }^{G2}\,\mathrm {sets}\,\mathsf {bad}}\,\right] }\right) + {\Pr \left[ \,{A_{ hF }^{G2}\,\mathrm {sets}\,\mathsf {bad}}\,\right] }.\nonumber \\ \end{aligned}$$
(29)
Fig. 6: Games for the proof of Lemma 4.2

Now consider the adversaries \(A_2,A_{F}'\) described in Fig. 7. We claim that

$$\begin{aligned} {\Pr \left[ \,{A_{ hF }^{G1}\,\mathrm {sets}\,\mathsf {bad}}\,\right] } - {\Pr \left[ \,{A_{ hF }^{G2}\,\mathrm {sets}\,\mathsf {bad}}\,\right] }&\le \mathbf {Adv}_{h}^{\mathrm{ind}}(A_2) \end{aligned}$$
(30)
$$\begin{aligned} {\Pr \left[ \,{A_{ hF }^{G2}\,\mathrm {sets}\,\mathsf {bad}}\,\right] }&\le {{q}\atopwithdelims (){2}}\cdot \mathbf {Adv}_{F}^{\mathrm{au}}(A_{F}'). \end{aligned}$$
(31)

To obtain the blackbox claim of the lemma, we set \(A_{F}=A_{F}'\). Clearly

$$\begin{aligned} \mathbf {Adv}_{F}^{\mathrm{au}}(A_{F}') \;\le \;\mathbf {Adv}_{F}^{\mathrm{au}}(A_{F}). \end{aligned}$$
(32)

In the non-blackbox case, coin-fixing can be used just as in the proof of Lemma 3.2 to derive from \(A_{F}'\) an au-adversary \(A_{F}\) that has time complexity \(O(T_{F}(n))\) (and also outputs messages of length at most \(n\)) such that (32) holds. Combining (32), (31), (30), (29) and (28) yields (26) and completes the proof of Lemma 4.2. It remains to prove (30) and (31). We begin with the first of these.

Fig. 7: Ind-adversary \(A_2\) against \(h\), taking an oracle \(g\) that on input a pair of \(b\)-bit strings returns a \(c\)-bit string, and au-adversary \(A_{F}'\) against \(F\), that outputs a pair of strings in \(D\)

Recall that an ind-adversary against \(h\) is given an oracle that takes as input a pair of \(b\)-bit strings \(x_0,x_1\). We are denoting this oracle by \(g\). Now it is easy to see that

$$\begin{aligned} {\Pr \left[ \,{A_2^{\mathrm {Right}}\Rightarrow {1}}\,\right] }&= {\Pr \left[ \,{A_{ hF }^{G1}\,\mathrm {sets}\,\mathsf {bad}}\,\right] } \\ {\Pr \left[ \,{A_2^{\mathrm {Left}}\Rightarrow {1}}\,\right] }&= {\Pr \left[ \,{A_{ hF }^{G2}\,\mathrm {sets}\,\mathsf {bad}}\,\right] } \;, \end{aligned}$$

which implies (30). However, there is one important thing we still need to verify, namely that \(A_2\) is legitimate. So consider the sequence \((x_1,y_1),(x_2,y_2),\ldots \) of oracle queries it makes. The left halves \(x_1,x_2,\ldots \) are values of the counter \(i\) in different loop iterations and are thus strictly increasing (although not necessarily successive) and in particular different. On the other hand, the right half values \(y_1,y_2,\ldots \) are distinct because as soon as \(y_i=y_j\) for some \(j<i\), adversary \(A_2\) halts (and returns 1), never making an oracle query whose right half is \(y_i\).

Next we turn to \(A_{F}'\). In order for the messages \(M_i,M_j\) it returns to always be defined, we assume wlog that \(A_{ hF }\) always makes exactly, rather than at most, \(q_{\mathrm {mac}}\) mac queries and exactly, rather than at most, \(q_{\mathrm {vf}}\) verification queries. Intuitively, (31) is true because \(i,j\) are chosen at random after the execution of \(A_{ hF }\) is complete, so \(A_{ hF }\) has no information about them. This can be made rigorous just as in the proof of Lemma 3.2, and the details follow. Consider the experiment defining the au-advantage of \(A_{F}'\). In this experiment, consider the following events defined for \(1\le \alpha < \beta \le q\):

$$\begin{aligned} \begin{array}{ll} C_{\alpha ,\beta }: &{} F(K_{\mathrm {in}},M_{\alpha })=F(K_{\mathrm {in}},M_{\beta }) \text{ and } M_{\alpha }\ne M_{\beta }\\ C: &{} C_{\alpha ,\beta } \text{ holds } \text{ for } \text{ some } 1\le \alpha < \beta \le q. \end{array} \end{aligned}$$

Notice that the events “\(C_{\alpha ,\beta }\wedge (i,j)=(\alpha ,\beta )\)” (\(1\le \alpha <\beta \le q\)) are disjoint. (Even though the events \(C_{\alpha ,\beta }\) for \(1\le \alpha <\beta \le q\) are not.) Thus:

$$\begin{aligned} \mathbf {Adv}_{F}^{\mathrm{au}}(A_{F}')&= {\Pr \left[ \,{ {\textstyle \bigvee _{1\le \alpha < \beta \le q}} \left( C_{\alpha ,\beta }\wedge (i,j)=(\alpha ,\beta )\right) }\,\right] } \\&= \sum _{1\le \alpha < \beta \le q} {\Pr \left[ \,{C_{\alpha ,\beta }\wedge (i,j) =(\alpha ,\beta )}\,\right] }. \end{aligned}$$

Since \(i,j\) are chosen at random after the execution of \(A_{ hF }\) is complete, the events “\((i,j)=(\alpha ,\beta )\)” and \(C_{\alpha ,\beta }\) are independent. Thus the above equals

$$\begin{aligned} \sum _{1\le \alpha < \beta \le q} {\Pr \left[ \,{C_{\alpha ,\beta }}\,\right] }\cdot {\Pr \left[ \,{(i,j) =(\alpha ,\beta )}\,\right] }&= \sum _{1\le \alpha < \beta \le q} {\Pr \left[ \,{C_{\alpha ,\beta }}\,\right] }\cdot \frac{1}{{{q}\atopwithdelims (){2}}} \\&= \frac{1}{{{q}\atopwithdelims (){2}}} \cdot \sum _{1\le \alpha < \beta \le q} {\Pr \left[ \,{C_{\alpha ,\beta }}\,\right] } \\&\ge \frac{1}{{{q}\atopwithdelims (){2}}} \cdot {\Pr \left[ \,{C}\,\right] }. \end{aligned}$$

The proof of (31) is concluded by noting that \({\Pr \left[ \,{C}\,\right] }\) equals \({\Pr \left[ \,{G2\,\mathrm {sets}\,\mathsf {bad}}\,\right] }\).

5 Security of \(\mathsf {HMAC}\)

In this section, we show how our security results about \(\mathsf {NMAC}\) lift to corresponding ones about \(\mathsf {HMAC}\). We begin by recalling the observation of [4] as to how this works for HMAC with two independent keys, and then discuss how to extend this to the single-keyed version of HMAC.

The Constructs Let \(h{:\;\;}\{0,1\}^c\times \{0,1\}^b\rightarrow \{0,1\}^c\) as usual denote the compression function. Let \(\mathsf {pad}\) be the padding function as described in Sect. 3, so that \(s^*=s\Vert \mathsf {pad}(|s|)\in B^+\) for any string \(s\). Recall that the cryptographic hash function \(H\) associated to \(h\) is defined by \(H(M)=h^*(\mathrm {IV},M^*)\), where \(\mathrm {IV}\) is a \(c\)-bit initial vector that is fixed as part of the description of \(H\) and \(M\) is a string of any length up to some maximum length that is related to \(\mathsf {pad}\). (This maximum length is \(2^{64}\) for current hash functions.) Then \(\mathsf {HMAC}(K_{\mathrm {out}}\Vert K_{\mathrm {in}},M)=H(K_{\mathrm {out}}\Vert H(K_{\mathrm {in}}\Vert M))\), where \(K_{\mathrm {out}}, K_{\mathrm {in}}\in \{0,1\}^b\). If we write this out in terms of \(h^*\) alone we get

$$\begin{aligned} \mathsf {HMAC}(K_{\mathrm {out}}\Vert K_{\mathrm {in}},M) \;=\;h^*(\mathrm {IV},\; K_{\mathrm {out}}\,\Vert \, h^*(\mathrm {IV},K_{\mathrm {in}}\Vert M\Vert \mathsf {pad}(b+|M|))\,\Vert \,\mathsf {pad}(b+c)\;). \end{aligned}$$

As with NMAC, the details of the padding conventions are not important to the security of HMAC as a PRF, and we will consider the more general construct \(\mathsf {GHMAC}{:\;\;}\{0,1\}^{2b}\times B^+\rightarrow \{0,1\}^c\) defined by

$$\begin{aligned} \mathsf {GHMAC}(K_{\mathrm {out}}\Vert K_{\mathrm {in}},M)= h^*(\mathrm {IV},\;K_{\mathrm {out}}\,\Vert \, h^*(\mathrm {IV},K_{\mathrm {in}}\Vert M)\,\Vert \,\mathsf {fpad}\;) \end{aligned}$$
(33)

for all \(K_{\mathrm {out}},K_{\mathrm {in}}\in \{0,1\}^b\) and all \(M\in B^+\). Here \(\mathrm {IV}\in \{0,1\}^c\) and \(\mathsf {fpad}\in \{0,1\}^{b-c}\) are fixed strings. HMAC is a special case, via \(\mathsf {HMAC}(K_{\mathrm {out}}\Vert K_{\mathrm {in}},M)=\mathsf {GHMAC}(K_{\mathrm {out}}\Vert K_{\mathrm {in}},M\Vert \mathsf {pad}(b+|M|))\) with \(\mathsf {fpad}=\mathsf {pad}(b+c)\), and thus security properties of \(\mathsf {GHMAC}\) (as a PRF or MAC) are inherited by \(\mathsf {HMAC}\), allowing us to focus on the former.

The Dual Family To state the results, it is useful to define \(\overline{h}{:\;\;}\{0,1\}^b\times \{0,1\}^c\rightarrow \{0,1\}^c\), the dual of family \(h\), by \(\overline{h}(x,y)=h(y,x)\). The assumption that \(h\) is a PRF when keyed by its data input is formally captured by the assumption that \(\overline{h}\) is a PRF.

5.1 Security of GHMAC

Let \(K'_{\mathrm {out}}=h(\mathrm {IV},K_{\mathrm {out}})\) and \(K'_{\mathrm {in}}= h(\mathrm {IV},K_{\mathrm {in}})\). The observation of [4] is that

$$\begin{aligned} \mathsf {GHMAC}(K_{\mathrm {out}}\Vert K_{\mathrm {in}},M)&= h(K'_{\mathrm {out}},h^*(K'_{\mathrm {in}},M)\Vert \mathsf {fpad}) \nonumber \\&= \mathsf {GNMAC}(K'_{\mathrm {out}}\Vert K'_{\mathrm {in}},M). \end{aligned}$$
(34)
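Identity (34) is purely structural, and can be checked concretely. In the sketch below SHA-256 stands in for the compression function \(h\); this stand-in, and the zero \(\mathrm {IV}\) and \(\mathsf {fpad}\), are assumptions made only for illustration:

```python
import hashlib

B, C = 64, 32                 # block and chaining lengths in bytes
IV = bytes(C)                 # illustrative fixed initial vector
fpad = bytes(B - C)           # illustrative fixed padding string

def h(a: bytes, x: bytes) -> bytes:
    return hashlib.sha256(a + x).digest()   # stand-in compression function

def h_star(a: bytes, M: bytes) -> bytes:
    # Iterated compression over a block-aligned message M.
    for i in range(0, len(M), B):
        a = h(a, M[i:i + B])
    return a

def gnmac(Ko: bytes, Ki: bytes, M: bytes) -> bytes:
    return h(Ko, h_star(Ki, M) + fpad)

def ghmac(Ko: bytes, Ki: bytes, M: bytes) -> bytes:
    # Ko, Ki are b-bit keys, absorbed as the first block of each h_star call.
    return h_star(IV, Ko + h_star(IV, Ki + M) + fpad)

Ko, Ki = b"\x5c" * B, b"\x36" * B
M = b"A" * (2 * B)            # a message in B^+
# (34): the first block of each outer/inner h_star call turns the b-bit
# keys into the derived GNMAC keys K'_out = h(IV, Ko), K'_in = h(IV, Ki).
assert ghmac(Ko, Ki, M) == gnmac(h(IV, Ko), h(IV, Ki), M)
```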

This effectively reduces the security of \(\mathsf {GHMAC}\) to that of \(\mathsf {GNMAC}\). Namely, if \(\overline{h}\) is a PRF and \(K_{\mathrm {out}},K_{\mathrm {in}}\) are chosen at random, then \(K'_{\mathrm {out}},K'_{\mathrm {in}}\) will be computationally close to random. Now (34) implies that if \(\mathsf {GNMAC}\) is a PRF then so is \(\mathsf {GHMAC}\). The formal statement follows.

Lemma 5.1

Assume \(b\ge c\) and let \(B=\{0,1\}^b\). Let \(h{:\;\;}\{0,1\}^c\times B\rightarrow \{0,1\}^c\) be a family of functions. Let \(\mathsf {fpad}\in \{0,1\}^{b-c}\) be a fixed padding string and \(\mathrm {IV}\in \{0,1\}^c\) a fixed initial vector. Let \(\mathsf {GHMAC}{:\;\;}\{0,1\}^{2b}\times B^+\rightarrow \{0,1\}^c\) be defined by (33) above. Let \(A\) be a prf-adversary against \(\mathsf {GHMAC}\) that has time complexity at most \(t\). Then there exists a prf-adversary \(A_{\overline{h}}\) against \(\overline{h}\) such that

$$\begin{aligned} \mathbf {Adv}_{\mathsf {GHMAC}}^{\mathrm{prf}}(A) \;\le \;2\cdot \mathbf {Adv}_{\overline{h}}^{\mathrm{prf}}(A_{\overline{h}}) + \mathbf {Adv}_{\mathsf {GNMAC}}^{\mathrm{prf}}(A). \end{aligned}$$

Furthermore, \(A_{\overline{h}}\) makes only \(1\) oracle query, this being \(\mathrm {IV}\), has time complexity at most \(t\) and is obtained via a blackbox reduction.

Proof of  Lemma 5.1

Assume wlog that \(A\) never repeats an oracle query. Consider the games in Fig. 8, and let \(p_i= \Pr [A^{Gi}\Rightarrow {1}]\) for \(0\le i\le 3\). Then

$$\begin{aligned} \mathbf {Adv}_{\mathsf {GHMAC}}^{\mathrm{prf}}(A) \;=\;p_3 - p_0 \;=\;(p_3-p_1)+(p_1-p_0). \end{aligned}$$

Clearly \(p_1-p_0=\mathbf {Adv}_{\mathsf {GNMAC}}^{\mathrm{prf}}(A)\). To complete the proof we construct \(A_{\overline{h}}\) so that

$$\begin{aligned} \mathbf {Adv}_{\overline{h}}^{\mathrm{prf}}(A_{\overline{h}}) \;=\;\frac{p_3+p_2}{2}-\frac{p_2+p_1}{2} \;=\;\frac{p_3-p_1}{2}. \end{aligned}$$
(35)

Given an oracle \(g{:\;\;}\{0,1\}^c\rightarrow \{0,1\}^c\), adversary \(A_{\overline{h}}\) picks a bit \(d\) at random. Then it picks keys via

If \(d=1\) then \(K'_{\mathrm {out}}\leftarrow g(\mathrm {IV})\,;\,K_{\mathrm {in}}\mathop {\leftarrow }\limits ^{\$}\{0,1\}^b\,;\,K'_{\mathrm {in}}\leftarrow h(\mathrm {IV},K_{\mathrm {in}})\)

Else \(K'_{\mathrm {out}}\mathop {\leftarrow }\limits ^{\$}\{0,1\}^c\,;\,K'_{\mathrm {in}}\leftarrow g(\mathrm {IV})\).

Finally it runs \(A\) with oracle \(\mathsf {GNMAC}(K'_{\mathrm {out}}\Vert K'_{\mathrm {in}},\cdot )\), and returns whatever \(A\) returns. If \(g\) is \(\overline{h}(K,\cdot )\) for a random key \(K\), then \(A_{\overline{h}}\) simulates \(G3\) when \(d=1\) and \(G2\) when \(d=0\); if \(g\) is a random function, it simulates \(G2\) when \(d=1\) and \(G1\) when \(d=0\). This yields (35).

Fig. 8: Games \(G0,G1,G2,G3\) for the proof of Lemma 5.1

Combining this with Theorem 3.3 or Theorem 3.4 yields the result that \(\mathsf {GHMAC}\) is a PRF assuming \(h,\overline{h}\) are both PRFs. Note that the PRF assumption on \(\overline{h}\) is mild because \(A_{\overline{h}}\) makes only one oracle query.

5.2 Single-Keyed HMAC

\(\mathsf {HMAC},\mathsf {GHMAC}\) as described and analyzed above use two keys that are assumed to be chosen independently at random. However, HMAC is in fact usually implemented with these keys derived from a single \(b\)-bit key. Here we provide the first security proofs for this single-key version of HMAC.

Specifically, let \(\mathsf {opad},\mathsf {ipad}\in \{0,1\}^b\) be distinct, fixed and known constants. (Their particular values can be found in [4, 27] and are not important here.) Then the single-key version of HMAC is defined by

$$\begin{aligned} \mathsf {HMAC\text{- }1}(K,M)\;=\;\mathsf {HMAC}(K{\oplus }\mathsf {opad}\Vert K{\oplus }\mathsf {ipad},M). \end{aligned}$$
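For the standardized constants \(\mathsf {opad}\) (the byte \(\mathtt {0x5c}\) repeated) and \(\mathsf {ipad}\) (the byte \(\mathtt {0x36}\) repeated), single-key HMAC over SHA-256 can be computed directly from this definition and checked against a library implementation; the function name below is ours:

```python
import hashlib
import hmac

def hmac1_sha256(K: bytes, M: bytes) -> bytes:
    """HMAC-1(K, M) = H(K xor opad || H(K xor ipad || M)) for H = SHA-256,
    assuming a full b-bit key (b = 512, i.e. 64 bytes)."""
    b = 64
    assert len(K) == b                   # the core construct uses a b-bit key
    opad = bytes(k ^ 0x5C for k in K)    # K xor opad
    ipad = bytes(k ^ 0x36 for k in K)    # K xor ipad
    inner = hashlib.sha256(ipad + M).digest()
    return hashlib.sha256(opad + inner).digest()

K, M = bytes(range(64)), b"message"
# Matches the standard HMAC implementation byte for byte:
assert hmac1_sha256(K, M) == hmac.new(K, M, hashlib.sha256).digest()
```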

As before, we look at this as a special case of a more general construct, namely \(\mathsf {GHMAC\text{- }1}{:\;\;}\{0,1\}^{b}\times B^+\rightarrow \{0,1\}^c\), defined by

$$\begin{aligned} \mathsf {GHMAC\text{- }1}(K,M)= \mathsf {GHMAC}(K{\oplus }\mathsf {opad}\Vert K{\oplus }\mathsf {ipad},M) \end{aligned}$$
(36)

for all \(K\in \{0,1\}^b\) and all \(M\in B^+\). We now focus on \(\mathsf {GHMAC\text{- }1}\). We will show that \(\mathsf {GHMAC\text{- }1}\) inherits the security of \(\mathsf {GNMAC}\) as long as \(\overline{h}\) is a PRF against an appropriate class of related-key attacks. In such an attack, the adversary can obtain input–output examples of \(\overline{h}\) under keys related to the target key. Let us recall the formal definitions following [8].

A related-key attack on a family of functions \(\overline{h}{:\;\;}\{0,1\}^b\times \{0,1\}^c\rightarrow \{0,1\}^c\) is parameterized by a set \(\Phi \subseteq \mathsf {Maps}(\{0,1\}^b, \{0,1\}^b)\) of key-derivation functions. We define the function \(\textsc {RK}{:\;\;}\Phi \times \{0,1\}^b\rightarrow \{0,1\}^b\) by \(\textsc {RK}(\phi ,K)=\phi (K)\) for all \(\phi \in \Phi \) and \(K\in \{0,1\}^b\). A rka-adversary \(A_{\overline{h}}\) may make an oracle query of the form \(\phi ,x\) where \(\phi \in \Phi \) and \(x\in \{0,1\}^c\). Its rka-advantage is defined by

$$\begin{aligned} \mathbf {Adv}_{\overline{h},\Phi }^{\mathrm{rka}}(A_{\overline{h}}) \;=\;{\Pr \left[ \,{A_{\overline{h}}^{\overline{h}(\textsc {RK}(\cdot ,K),\cdot )}\Rightarrow {1}}\,\right] }- {\Pr \left[ \,{A_{\overline{h}}^{G(\textsc {RK}(\cdot ,K),\cdot )}\Rightarrow {1}}\,\right] }. \end{aligned}$$

In the first case, \(K\) is chosen at random from \(\{0,1\}^b\) and the reply to query \(\phi ,x\) of \(A_{\overline{h}}\) is \(\overline{h}(\phi (K),x)\). In the second case, \(G\mathop {\leftarrow }\limits ^{\$}\mathsf {Maps}(\{0,1\}^b\times \{0,1\}^c,\{0,1\}^c)\) and \(K\mathop {\leftarrow }\limits ^{\$}\{0,1\}^b\), and the reply to query \(\phi ,x\) of \(A_{\overline{h}}\) is \(G(\phi (K),x)\). For any string \(s\in \{0,1\}^b\) let \(\Delta _s{:\;\;}\{0,1\}^b \rightarrow \{0,1\}^b\) be defined by \(\Delta _s(K)=K{\oplus }s\).
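The two worlds of the rka experiment, restricted to \(\Phi =\{\Delta _{\mathsf {opad}},\Delta _{\mathsf {ipad}}\}\), can be sketched as follows. SHA-256 stands in for \(\overline{h}\), and all names are assumptions of this sketch:

```python
import hashlib
import os

B, C = 64, 32
opad, ipad = b"\x5c" * B, b"\x36" * B

def h_bar(k: bytes, x: bytes) -> bytes:
    return hashlib.sha256(k + x).digest()   # stand-in for the dual family

def xor(a: bytes, b_: bytes) -> bytes:
    return bytes(u ^ v for u, v in zip(a, b_))

# The key-derivation set Phi = {Delta_opad, Delta_ipad}:
Phi = {"opad": lambda K: xor(K, opad), "ipad": lambda K: xor(K, ipad)}

def real_oracle():
    K = os.urandom(B)                        # hidden target key
    return lambda phi, x: h_bar(Phi[phi](K), x)

def random_oracle():
    K, table = os.urandom(B), {}
    def g(phi, x):                           # lazily sampled random G
        return table.setdefault((Phi[phi](K), x), os.urandom(C))
    return g

IV = bytes(C)
r, g = real_oracle(), random_oracle()
assert r("opad", IV) == r("opad", IV)        # both oracles answer consistently
assert g("ipad", IV) == g("ipad", IV)
```

A rka-adversary must distinguish `real_oracle` from `random_oracle`; Lemma 5.2 needs this to be hard only for the two queries \((\Delta _{\mathsf {opad}},\mathrm {IV})\) and \((\Delta _{\mathsf {ipad}},\mathrm {IV})\).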

Lemma 5.2

Assume \(b\ge c\) and let \(B=\{0,1\}^b\). Let \(h{:\;\;}\{0,1\}^c\times B\rightarrow \{0,1\}^c\) be a family of functions. Let \(\mathsf {fpad}\in \{0,1\}^{b-c}\) be a fixed padding string, \(\mathrm {IV}\in \{0,1\}^c\) a fixed initial vector, and \(\mathsf {opad},\mathsf {ipad}\in \{0,1\}^b\) fixed, distinct strings. Let \(\mathsf {GHMAC\text{- }1}{:\;\;}\{0,1\}^{b}\times B^+\rightarrow \{0,1\}^c\) be defined by (36) above. Let \(\Phi =\{\Delta _{\mathsf {opad}},\Delta _{\mathsf {ipad}}\}\). Let \(A\) be a prf-adversary against \(\mathsf {GHMAC\text{- }1}\) that has time complexity at most \(t\). Then there exists a rka-adversary \(A_{\overline{h}}\) against \(\overline{h}\) such that

$$\begin{aligned} \mathbf {Adv}_{\mathsf {GHMAC\text{- }1}}^{\mathrm{prf}}(A) \;\le \;\mathbf {Adv}_{\overline{h},\Phi }^{\mathrm{rka}}(A_{\overline{h}}) + \mathbf {Adv}_{\mathsf {GNMAC}}^{\mathrm{prf}}(A). \end{aligned}$$

Furthermore, \(A_{\overline{h}}\) makes \(2\) oracle queries, these being \(\Delta _{\mathsf {opad}},\mathrm {IV}\) and \(\Delta _{\mathsf {ipad}},\mathrm {IV}\), has time complexity at most \(t\) and is obtained via a blackbox reduction.

Proof of Lemma 5.2

Adversary \(A_{\overline{h}}\) queries its oracle with \(\Delta _{\mathsf {opad}},\mathrm {IV}\) and lets \(K'_{\mathrm {out}}\) denote the value returned. It also queries its oracle with \(\Delta _{\mathsf {ipad}},\mathrm {IV}\) and lets \(K'_{\mathrm {in}}\) denote the value returned. It then runs \(A\), answering the latter’s oracle queries via \(\mathsf {GNMAC}(K'_{\mathrm {out}}\Vert K'_{\mathrm {in}},\cdot )\), and returns whatever \(A\) returns.
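Combining (34) and (36), the two related-key queries of \(A_{\overline{h}}\) produce exactly the \(\mathsf {GNMAC}\) keys needed for the simulation. A concrete check, with SHA-256 as a stand-in compression function and illustrative \(\mathrm {IV},\mathsf {fpad}\) (assumptions of this sketch only):

```python
import hashlib

B, C = 64, 32
IV, fpad = bytes(C), bytes(B - C)            # illustrative fixed strings
opad, ipad = b"\x5c" * B, b"\x36" * B

def h(a, x): return hashlib.sha256(a + x).digest()   # stand-in for h

def h_star(a, M):
    for i in range(0, len(M), B):
        a = h(a, M[i:i + B])
    return a

def xor(a, b_): return bytes(u ^ v for u, v in zip(a, b_))

def gnmac(Ko, Ki, M): return h(Ko, h_star(Ki, M) + fpad)

def ghmac1(K, M):                            # GHMAC-1 via (33) and (36)
    inner = h_star(IV, xor(K, ipad) + M)
    return h_star(IV, xor(K, opad) + inner + fpad)

K, M = bytes(range(B)), b"A" * (2 * B)
# Real-world replies to the queries (Delta_opad, IV) and (Delta_ipad, IV):
Ko_p, Ki_p = h(IV, xor(K, opad)), h(IV, xor(K, ipad))
assert ghmac1(K, M) == gnmac(Ko_p, Ki_p, M)
```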

Combining this with Theorem 3.3 or Theorem 3.4 yields the result that \(\mathsf {GHMAC\text{- }1}\) is a PRF assuming \(h\) is a PRF and \(\overline{h}\) is a PRF under \(\Phi \)-restricted related-key attacks, where \(\Phi \) is as in Lemma 5.2. We remark that \(\Phi \) is a small set of simple functions, which is important because it is shown in [8] that if \(\Phi \) is too rich then no family can be a PRF under \(\Phi \)-restricted related-key attacks. Furthermore, the assumption on \(\overline{h}\) is rendered milder by the fact that \(A_{\overline{h}}\) makes only two oracle queries, in both of which the message is the same, namely the initial vector.

5.3 Hashing of Keys

Single-key HMAC as defined and analyzed above is the core construct of the RFC 2104 standard [27]. The standard allows the option that longer keys are hashed and padded before use with HMAC. Some standards [30] build this step into the description. This step can be justified under additional assumptions, for example that the hash function is close to regular.

5.4 Lifting the Results of Sect. 4

The procedure used above to lift the NMAC results of Sect. 3 to HMAC also lifts the results of Sect. 4 to HMAC. Specifically, if \(h\) is a PP-MAC, \(h^*\) is cAU and \(\overline{h}\) is a PRF, then \(\mathsf {GHMAC}\) is a (privacy-preserving) MAC. Also, if \(h\) is a PP-MAC, \(h^*\) is cAU and \(\overline{h}\) is a PRF under \(\Phi \)-restricted related-key attacks, with \(\Phi \) as in Lemma 5.2, then \(\mathsf {GHMAC\text{- }1}\) is a (privacy-preserving) MAC. Note that the assumption on \(\overline{h}\) continues to be that it is a PRF or a PRF under \(\Phi \)-restricted related-key attacks. (Namely, this has not been reduced to its being a PP-MAC.) This assumption is, however, mild in this context since (as indicated by Lemmas 5.1 and 5.2) it need only hold with respect to adversaries that make very few queries, and these of a very specific type.

5.5 Remarks

Let \(h{:\;\;}\{0,1\}^{128}\times \{0,1\}^{512}\rightarrow \{0,1\}^{128}\) denote the compression function of MD5 [34]. An attack by den Boer and Bosselaers [16] finds values \(x_0,x_1,K\) such that \(h(x_0,K)=h( x_1,K)\) but \(x_0\ne x_1\). In a personal communication, Rijmen has said that it seems possible to extend this to an attack that finds such \(x_0,x_1\) even when \(K\) is unknown. If so, this might translate into the following attack showing \(h\) is not a PRF when keyed by its data input. (That is, \(\overline{h}\) is not a PRF.) Given an oracle \(g{:\;\;}\{0,1\}^{128}\rightarrow \{0,1\}^{128}\), the attacker would find \(x_0,x_1\) and obtain \(y_0=g(x_0)\) and \(y_1=g(x_1)\) from the oracle. It would return 1 if \(y_0=y_1\) and \(0\) otherwise. How does this impact the above, where we are assuming \(\overline{h}\) is a PRF? Interestingly, the actual PRF assumptions we need on \(\overline{h}\) are so weak that even such an attack does not break them. In Lemma 5.1, we need \(\overline{h}\) to be a PRF only against adversaries that make just one oracle query. (Because \(A_{\overline{h}}\) makes only one query.) But the attack above makes two queries. On the other hand, in Lemma 5.2, we need \(\overline{h}\) to be a related-key PRF only against adversaries that make two related-key queries in both of which the 128-bit message for \(\overline{h}\) is the same, this value being the initial vector used by the hash function. Furthermore, the related-key functions must be \(\Delta _{\mathsf {opad}}\), \(\Delta _{\mathsf {ipad}}\). The above-mentioned attack, however, uses two different messages \(x_0,x_1\) and calls the oracle under the original key rather than the related keys. In summary, the attack does not violate the assumptions made in either of the lemmas.

6 Attacking Weak Collision Resistance

Recall that \(H\) represents the cryptographic hash function (eg. MD5, SHA-1) while \(H^*\) is the extended hash function, which is the hash function with the initial vector made explicit as an (additional) first input. Let us use the term general collision-finding attack to refer to an attack that finds collisions in \(H^*(\mathrm {IV},\cdot )\) for an arbitrary but given \(\mathrm {IV}\). As we discussed in Sect. 1, it was noted in [4, 25] that any general collision-finding attack can be used to compromise the weak collision resistance (WCR) of \(H\). (And since the known collision-finding attacks on MD5 and SHA-1 [37, 38] do extend to general ones, the WCR of these functions is no more than their CR.) Here we recall the argument that shows this. It is a simple extension attack, and works as follows.

To compromise WCR of \(H\), an attacker given an oracle for \(H^*(K,\cdot )\) under a hidden key \(K\) must output distinct \(M_1,M_2\) such that \(H^*(K,M_1)=H^*(K,M_2)\). Our attacker picks some string \(x\) and calls its oracle to obtain \(\mathrm {IV}=H^*(K,x)\). Then it runs the given general collision-finding attack on input \(\mathrm {IV}\) to obtain a collision \(X_1,X_2\) for \(H^*(\mathrm {IV},\cdot )\). (That is, \(X_1,X_2\) are distinct strings such that \(H^*(\mathrm {IV},X_1)= H^*(\mathrm {IV},X_2)\).) Now let \(M_1=x\Vert \mathsf {pad}(|x|) \Vert X_1\Vert \mathsf {pad}(|X_1|)\) and \(M_2=x\Vert \mathsf {pad}(|x|)\Vert X_2\Vert \mathsf {pad}(|X_2|)\). (Here \(\mathsf {pad}(n)\) is a padding string that when appended to a string of length \(n\) results in a string whose length is a positive multiple of \(b\) bits where \(b\) is the block length of the underlying compression function. The function \(\mathsf {pad}\) is part of the description of the cryptographic hash function.) Then it follows that \(H^*(K,M_1)=H^*(K,M_2)\).
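The extension attack rests on the fact that the iteration composes: \(h^*(K,x\Vert X)=h^*(h^*(K,x),X)\) for block-aligned \(x\). A sketch, with SHA-256 standing in for the compression function and padding sidestepped by using block-aligned strings (both illustration-only assumptions):

```python
import hashlib

B = 64                                   # block length b in bytes

def h(a: bytes, x: bytes) -> bytes:
    return hashlib.sha256(a + x).digest()    # stand-in compression function

def h_star(K: bytes, M: bytes) -> bytes:
    a = K
    for i in range(0, len(M), B):
        a = h(a, M[i:i + B])
    return a

K = b"\x00" * 32            # hidden key; the attacker never sees it
x = b"A" * B                # attacker's chosen block-aligned string
IV2 = h_star(K, x)          # obtained from the H*(K, .) oracle

# Any collision X1 != X2 that a general collision-finder outputs for
# h_star(IV2, .) extends to a collision under the hidden key, because:
X1, X2 = b"B" * B, b"C" * B     # placeholders; a real finder supplies these
assert h_star(K, x + X1) == h_star(IV2, X1)
assert h_star(K, x + X2) == h_star(IV2, X2)
```

So if \(h^*(\mathrm {IV2},X_1)=h^*(\mathrm {IV2},X_2)\) then \(h^*(K,x\Vert X_1)=h^*(K,x\Vert X_2)\), which is the claimed WCR break.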

We clarify that these attacks on the WCR of the hash function do not break HMAC. What these attacks show is that the WCR assumption made for the security proof of [4] is not true for MD5 and SHA-1. This means we lose the proof-based guarantee of [4], but it does not imply any weakness in the construct. Our results show that WCR of the iterated hash function is not necessary for the security of HMAC: pseudorandomness of the compression function suffices. This helps explain why no attacks have emerged on HMAC even when it is implemented with hash functions that are not WCR.

7 The Reduction-From-pf-PRF Proof

We sketch how one can obtain the result that \(h\) a PRF implies \(h^*\) is cAU using the result of [5] that says that \(h\) a PRF implies \(h^*\) is a pf-PRF (a PRF as long as no query of the adversary is a prefix of another query). We then compare this with the direct proof given in Sect. 3.2.

The Result of [5] A prf-adversary is said to be prefix-free if no query it makes is a prefix of another. The result of [5] is that if \(D\) is a prefix-free prf-adversary against \(h^*\) that makes at most \(q\) queries, each of at most \(m\) blocks, then there is a prf-adversary \(A\) against \(h\) such that

$$\begin{aligned} \mathbf {Adv}_{h^*}^{\mathrm{prf}}(D) \;\le \;qm\cdot \mathbf {Adv}_{h}^{\mathrm{prf}}(A) \end{aligned}$$
(37)

and \(A\) has about the same time complexity as \(D\). (We remark that there is a typo in the statement of Theorem 3.1 of the proceedings version of [5] in this regard: the factor \(q\) is missing from the bound. This is, however, corrected in the on-line version of the paper.)

The Reduction-From-pf-PRF The result obtained via the reduction-from-pf-PRF proof will be slightly worse than the one of Lemma 3.1. Namely we claim that, under the same conditions as in that lemma, (2) is replaced by

$$\begin{aligned} \mathbf {Adv}_{h^*}^{\mathrm{au}}(A^*) \;\le \;2\cdot [\max (n_1,n_2)+1]\cdot \mathbf {Adv}_{h}^{\mathrm{prf}}(A) + \frac{1}{2^c} \;, \end{aligned}$$
(38)

and, in the non-blackbox case, the time complexity of \(A\) increases from \((n_1+n_2)\) computations of \(h\) to \(2\max (n_1,n_2)\) computations of \(h\). For some intuition about the proof, imagine \(h^*\) were a PRF. (It is not.) Then \(\mathsf {Coll}_{h^*}(M_1,M_2)\) would be about the same as the probability that \(f(M_1)=f(M_2)\) for a random function \(f\), because otherwise the prf-adversary who queried its oracle with \(M_1,M_2\) and accepted iff the replies were the same would be successful. With \(h^*\) in fact only a pf-PRF, the difficulty is the case that \(M_1\subseteq M_2\), which renders the adversary just described not prefix-free. There is a simple (and well-known) observation —we will call it the extension trick— to get around this. Namely, assuming wlog \(M_1\ne M_2\) and \(\Vert M_1\Vert _b \le \Vert M_2\Vert _b\), let \(x\in B\) be a block different from \(M_2[\Vert M_1\Vert _b+1]\), and let \(M_1'=M_1\Vert x\) and \(M_2'=M_2\Vert x\). Then \(\mathsf {Coll}_{h^*}(M_1,M_2)\le \mathsf {Coll}_{h^*}(M_1',M_2')\) but \(M_1'\) is not a prefix of \(M_2'\). This leads to the prefix-free prf-adversary against \(h^*\) below:

[Figure: code of the prefix-free prf-adversary \(D\), omitted]

Here \(f{:\;\;}B^+\rightarrow \{0,1\}^c\) and we assume wlog that \(M_1,M_2\in B^+\) are distinct messages with \(\Vert M_1\Vert _b\le \Vert M_2\Vert _b\). Now the result of [5] gives us a prf-adversary \(A\) against \(h\) such that (37) holds. Thus:

$$\begin{aligned} \mathbf {Adv}_{h^*}^{\mathrm{au}}(A^*) - 2^{-c}&\le \mathbf {Adv}_{h^*}^{\mathrm{prf}}(D) \\&\le 2\cdot [\max (n_1,n_2)+1]\cdot \mathbf {Adv}_{h}^{\mathrm{prf}}(A). \end{aligned}$$

Re-arranging terms yields (38).
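The extension trick itself is easy to demonstrate with a deliberately weak XOR compression function, chosen purely so that a prefix collision is visible; every name in this sketch is an assumption of the illustration:

```python
B = 8                                    # toy block size in bytes

def h(a: bytes, x: bytes) -> bytes:
    return bytes(u ^ v for u, v in zip(a, x))    # weak toy compression

def h_star(K, blocks):
    a = K
    for x in blocks:
        a = h(a, x)
    return a

def extend(M1, M2):
    """Append a block x different from M2[len(M1)] (when that block
    exists) to both messages, so that M1' is no longer a prefix of M2'
    while any h*-collision between M1 and M2 survives."""
    nxt = M2[len(M1)] if len(M2) > len(M1) else None
    x = bytes(B) if nxt != bytes(B) else b"\x01" * B
    return M1 + [x], M2 + [x]

K = bytes(range(B))
M1 = [b"\x01" * B]
M2 = [b"\x01" * B, bytes(B)]             # M1 is a prefix of M2, yet ...
assert h_star(K, M1) == h_star(K, M2)    # ... they collide under h*

M1p, M2p = extend(M1, M2)
assert M2p[:len(M1p)] != M1p             # M1' is not a prefix of M2'
assert h_star(K, M1p) == h_star(K, M2p)  # and the collision survives
```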

The time complexity of \(A\) as per [5] is (essentially) the time complexity \(t\) of \(A^*\), rather than being a small quantity independent of \(t\) as in Lemma 3.1. It is not clear whether or not the coin-fixing argument of our proof can be applied to \(A\) to reduce this time complexity. (One would have to enter into the details of the proof of [5] to check.) However, instead, we can first modify \(A^*\) to an adversary that has embedded in its code a pair \(M_1,M_2\in B^+\) of distinct messages that maximize \(\mathsf {Coll}_{h^*}(M_1,M_2)\). It just outputs these messages and halts. We then apply the argument above to this modified \(A^*\), and now \(A\) will have time complexity of \(2\max (n_1,n_2)\) computations of \(h\) plus minor overhead.

Comparisons For the case that \(M_1\subseteq M_2\), our direct proof (meaning the one of Sect. 3.2) uses different (and novel) ideas as opposed to the extension trick, which leads to a factor of only \(n_1+1\) in the bound (Claim 3.7) in this case, as opposed to the \(2[\max (n_1,n_2)+1]\) factor obtained via the reduction-from-pf-PRF proof. The difference can be significant in the case that \(M_2\) is long and \(M_1\) is short. In the case \(M_1\not \subseteq M_2\) our direct proof relies heavily on ideas of [5], but avoids the intermediate reduction to the multi-oracle model they use and exploits the non-adaptive nature of the setting to improve the factor in the bound from \(2\max (n_1,n_2)\) to \(n_1+n_2-1\). These improvements (overall, a constant factor) may not seem significant in this context. But they become significant when translated to NMAC, as evidenced by Theorem 3.4. As we saw, the latter gives appreciably better bounds than Theorem 3.3 in the case that authenticated messages have varying lengths. However, our direct proof (meaning, Lemma 3.1) is crucial to obtaining Theorem 3.4. The reduction-from-pf-PRF proof will only yield (a result that is slightly worse than) Theorem 3.3.

The reduction-from-pf-PRF proof is certainly simpler than our direct proof if one is willing to take the result of [5] as given. However, especially for a construct that is as widely standardized as HMAC, we think it is useful to have from-scratch proofs that are as easily verifiable as possible. If the measure of complexity is that of a from-scratch (i.e., self-contained) proof, we contend that our direct one (although not trivial) is simpler than that of [5]. (In particular because we do not use the multi-oracle model.) We remark that if a reader’s primary interest is the simplest possible self-contained proof regardless of the quality of the bound, the way to get it is to use our direct proof for the case \(M_1 \not \subseteq M_2\) and then the extension trick (as opposed to our direct proof) for the case \(M_1\subseteq M_2\).