Generic Security of NMAC and HMAC with Input Whitening
Abstract
HMAC and its variant NMAC are the most popular approaches to deriving a MAC (and more generally, a PRF) from a cryptographic hash function. Despite nearly two decades of research, their exact security remains far from understood in many contexts. Indeed, recent works have revived interest in generic attacks, i.e., attacks that treat the compression function of the underlying hash function as a black box.
Generic security can be proved in a model where the underlying compression function is modeled as a random function – yet, to date, the question of proving tight, nontrivial bounds on the generic security of HMAC/NMAC even as a PRF remains a challenging open question.
In this paper, we ask whether a small modification to HMAC and NMAC can allow us to exactly characterize the security of the resulting constructions, while incurring only a small efficiency penalty. To this end, we present simple variants of NMAC and HMAC, for which we prove tight bounds on the generic PRF security, expressed in terms of the numbers of construction and compression function queries necessary to break the construction. All of our constructions are obtained via a (near) black-box modification of NMAC and HMAC, which can be interpreted as an initial step of key-dependent message preprocessing.
While our focus is on PRF security, a further attractive feature of our new constructions is that they clearly defeat all recent generic attacks against properties such as state recovery and universal forgery. These exploit properties of the so-called “functional graph” which are not directly accessible in our new constructions.
Keywords
Message authentication codes, HMAC, Generic attacks, Provable security
1 Introduction
This paper presents new variants of the HMAC/NMAC constructions of message authentication codes which enjoy provable security as a pseudorandom function (PRF) against generic distinguishing attacks, i.e., attacks which treat the compression function of the underlying hash function as a black box. In particular, we prove concrete tight bounds in terms of the number of queries to the construction and to the compression function necessary to distinguish our construction from a random function. Our constructions are the first HMAC/NMAC variants to enjoy such a tight analysis, and we see this as an important stepping stone towards the understanding of the generic security of such constructions.
Security of \(\mathsf {HMAC}\)/\(\mathsf {NMAC}\). The security of both constructions has been studied extensively, both by obtaining security proofs and proposing attacks. On the former side, \(\mathsf {NMAC}\) and \(\mathsf {HMAC}\) were proven to be secure pseudorandom functions (PRFs) in the standard model [3], later also using weaker assumptions [2] and via a tight bound in the uniform setting [7]. However, as argued in [7], this standard-model bound might be overly pessimistic, covering also very unnatural constructions of the underlying compression function \(\mathsf{f}\) (for example the one used in their tightness proof). The authors hence argue for the need of an analysis of the PRF security of \(\mathsf {HMAC}\) in the so-called ideal compression function model where the compression function is modeled as an ideal random function and the adversary is allowed to query it. This model was previously used by Dodis et al. [6] to study indifferentiability of \(\mathsf {HMAC}\), which however only holds for certain key lengths.
This is also the model implicitly underlying many of the recently proposed attacks on hash-based MACs [5, 10, 15, 17, 19, 20, 22]. These attacks are termed generic, meaning they can be mounted for any underlying hash function as long as it follows the Merkle-Damgård (MD) paradigm. The complexity of such a generic attack is then expressed in the number of key-dependent queries to the construction (denoted \({q_\mathrm {C}}\)) as well as the number of queries to the underlying compression function (denoted \({q_\mathsf{f}}\)). These two classes of queries are also often referred to as online and offline, respectively.
All iterated MACs are subject to the long-known attack of Preneel and van Oorschot [21], which implies a forgery (and hence also distinguishing) attack against \(\mathsf {HMAC}\)/\(\mathsf {NMAC}\) making \({q_\mathrm {C}}= 2^{c/2}\) construction queries (consisting of constant-length messages) and no direct compression function queries (i.e., \({q_\mathsf{f}}=0\)). This immediately raises two questions:
How does the security of \(\mathsf {HMAC}\) and \(\mathsf {NMAC}\) degrade (in terms of tolerable \({q_\mathrm {C}}\)) by increasing (1) the length \(\ell \) of the messages and (2) the number \({q_\mathsf{f}}\) of compression-function evaluations?
The first question has been partially addressed in [7]. Their result can be interpreted as giving tight bounds on the PRF security of \(\mathsf {NMAC}\) against an attacker making \({q_\mathrm {C}}\) key-dependent construction queries (of length at most \(\ell < 2^{c/3}\) b-bit blocks) but no queries to the compression function. They show that both constructions can only be distinguished from a random function with advantage roughly \(\epsilon ({q_\mathrm {C}}, \ell ) \approx \ell ^{1+o(1)} {q_\mathrm {C}}^2/2^{c}\), improving significantly on the bound \(\epsilon ({q_\mathrm {C}}, \ell ) \approx \ell ^2 {q_\mathrm {C}}^2/2^c\) provable using standard folklore techniques. From our perspective, this bound can be read as a smooth trade-off: with increasing maximum allowed query length \(\ell \) it tells us how many queries \({q_\mathrm {C}}\) can be tolerated for any acceptable upper bound on advantage.
Still, it is not clear how this trade-off changes when allowing extremely long messages (\(\ell >2^{c/3}\)) and/or some queries to the compression function (\({q_\mathsf{f}}>0\)). Note that while huge \(\ell \) can be prevented by standards, in practical settings \({q_\mathsf{f}}\) is very likely to be much higher than \({q_\mathrm {C}}\), as it represents cheap local (offline) computation of the attacker. We therefore focus on capturing the trade-off between \({q_\mathrm {C}}\) and \({q_\mathsf{f}}\) for values of \({q_\mathrm {C}}\) that do not allow mounting the attack from [21]. However, as we argue below, getting such a tight trade-off for \(\mathsf {NMAC}\)/\(\mathsf {HMAC}\) seems to be out of reach for now; we hence relax the problem by allowing for slight modifications to the vanilla \(\mathsf {NMAC}\)/\(\mathsf {HMAC}\) construction.
Our Contributions. We ask the following question here, and answer it positively:
Can we devise variants of \(\mathsf {HMAC}\)/\(\mathsf {NMAC}\) whose security provably degrades gracefully with an increasing number of compression function queries \({q_\mathsf{f}}\), possibly retaining security for \({q_\mathsf{f}}\) being much larger than \(2^c\)?
The main contribution of this paper is the introduction and analysis of a variant of \(\mathsf {NMAC}\) (which we then adapt to the \(\mathsf {HMAC}\) setting, as described below) which uses additional key material to “whiten” message blocks before they are processed by the compression function. Concretely, our construction, termed \(\mathsf {WNMAC}\) (for “whitened NMAC”), uses an additional b-bit key \(K_{\mathrm {w}}\), and given a message M padded as \(M[1], \ldots , M[\ell ]\), operates as \(\mathsf {NMAC}\) on the whitened blocks \(M'[i] = M[i] \oplus K_{\mathrm {w}}\), i.e., every message block is whitened with the same key (see also Fig. 1).
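The description above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the compression function `f` is a toy stand-in built from SHA-256 (not the ideal random function of the analysis), the sizes `B` and `C` are hypothetical choices corresponding to b = 512 and c = 256, and the message is assumed to be already padded into b-bit blocks.

```python
import hashlib
from typing import Sequence

B = 64  # block length b in bytes (assumption: b = 512 bits)
C = 32  # chaining/output length c in bytes (assumption: c = 256 bits)


def f(state: bytes, block: bytes) -> bytes:
    """Toy stand-in for f: {0,1}^c x {0,1}^b -> {0,1}^c (assumption, for illustration)."""
    assert len(state) == C and len(block) == B
    return hashlib.sha256(state + block).digest()


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))


def nmac(k1: bytes, k2: bytes, blocks: Sequence[bytes]) -> bytes:
    """NMAC on an already padded message given as a sequence of b-bit blocks."""
    state = k1
    for m in blocks:
        state = f(state, m)               # inner cascade, initial state K1
    return f(k2, state + bytes(B - C))    # outer call f(K2, s || 0^{b-c})


def wnmac(k1: bytes, k2: bytes, kw: bytes, blocks: Sequence[bytes]) -> bytes:
    """WNMAC: NMAC applied to the blocks M[i] xor Kw (same Kw for every block)."""
    return nmac(k1, k2, [xor(m, kw) for m in blocks])
```

With the all-zero whitening key the sketch degenerates to plain NMAC, which makes the "NMAC on whitened input" reading of the construction explicit.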
The rationale behind \(\mathsf {WNMAC}\) is twofold. First, from the security viewpoint, the justification comes from the rich line of research on generic attacks on hash-based MACs. Most recent attacks [10, 15, 19, 20] exploit the so-called “functional graph” of the compression function \(\mathsf{f}\), i.e., the graph capturing the structure of \(\mathsf{f}\) when repeatedly invoked with its b-bit input fixed to some constant (say \(0^b\)). Since our whitening denies the adversary the knowledge of b-bit inputs on which \(\mathsf{f}\) is invoked during construction queries, intuitively it seems to be the right way to foil such attacks. Moreover, a recent work by Sasaki and Wang [22] suggests that keying every invocation of \(\mathsf{f}\) is necessary in order to prevent suboptimal security against generic state recovery attacks. \(\mathsf {WNMAC}\) arguably provides the simplest and most natural such keying. Second, from the practical perspective, \(\mathsf {WNMAC}\) can be implemented on top of an existing implementation of \(\mathsf {NMAC}\), using it as a black box.
Other Security Properties. Additionally, we also analyze the security level \(\mathsf {WNMAC}\) achieves with respect to other security notions frequently considered in the attacks literature. By a series of reductions, we show that, roughly speaking, \(\epsilon _\mathsf {WNMAC}\) also upper-bounds the adversary’s advantage for distinguishing-H and state recovery. We believe that addressing these cryptanalytic notions also using the traditional toolbox of provable security is important and see this paper as taking the first step on that path.
Lifting to HMAC. We then move our attention from \(\mathsf {NMAC}\) to \(\mathsf {HMAC}\) and propose two analogous modifications to it. The first one, called \(\mathsf {WHMAC}\), is obtained from \(\mathsf {HMAC}\) in the same way \(\mathsf {WNMAC}\) is obtained from \(\mathsf {NMAC}\): by whitening the padded message blocks with an independent key. The second one, termed \(\mathsf {WHMAC}^+\), additionally processes a fresh key \(K^+\) instead of the first block of the message. Both variants can be implemented given only black-box access to \(\mathsf {HMAC}\), and we prove that they maintain the same security level as \(\mathsf {WNMAC}\) as long as the parameters b, c of \(\mathsf{f}\) satisfy \(b\gg 2c\) (for \(\mathsf {WHMAC}\)) or \(b\gg c\) (for \(\mathsf {WHMAC}^+\)). Note that for existing hash functions, the former condition is satisfied for both MD5 and SHA-1, while the latter holds also for SHA-256 and SHA-512.
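To illustrate what black-box access means here, the following sketch whitens a message and hands it to an unmodified HMAC implementation (Python's standard `hmac` module with SHA-256). It is a simplification and not the exact \(\mathsf {WHMAC}\) of the paper: the names `whiten` and `whmac_sketch` are hypothetical, and the padding block that the hash function appends internally is not whitened, since a caller cannot reach it through message preprocessing alone.

```python
import hashlib
import hmac

BLOCK = 64  # SHA-256 block length in bytes (b = 512 bits)


def whiten(message: bytes, kw: bytes) -> bytes:
    """XOR every BLOCK-byte chunk of the message with the same whitening key kw."""
    assert len(kw) == BLOCK
    return bytes(m ^ kw[i % BLOCK] for i, m in enumerate(message))


def whmac_sketch(key: bytes, kw: bytes, message: bytes) -> bytes:
    """Key-dependent preprocessing followed by a black-box HMAC-SHA-256 call."""
    return hmac.new(key, whiten(message, kw), hashlib.sha256).digest()
```

Because whitening is a plain XOR, applying `whiten` twice with the same key returns the original message, and an all-zero `kw` reduces the sketch to ordinary HMAC.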
The Dual Construction. Motivated by the most restrictive term \({q_\mathrm {C}}{q_\mathsf{f}}/2^{2c}\) in \(\epsilon _\mathsf {WNMAC}\), the final construction we propose in this paper is a “dual” version of \(\mathsf {WNMAC}\) denoted \(\mathsf {DWNMAC}\), that differs in the final, outer \(\mathsf{f}\)-call. Instead of \(\mathsf{f}(K_2,s\,\Vert \,0^{b-c})\) for a c-bit key \(K_2\) and a c-bit state s padded with zeroes, the outer call in \(\mathsf {DWNMAC}\) computes \(\mathsf{f}(s,K_2)\) for a longer, b-bit key. As expected, we prove that this tweak removes the need for the \({q_\mathrm {C}}{q_\mathsf{f}}/2^{2c}\) term and replaces it by the strictly more favourable term \({q_\mathrm {C}}{q_\mathsf{f}}/2^{b+c}\), proving that the zero-padding in the outer call of \(\mathsf {WNMAC}\) was actually responsible for the “bottleneck” term in its security bound.
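The difference between the two outer calls is small enough to show side by side. The sketch below is illustrative only: `f` is again a toy SHA-256-based stand-in for the compression function, and the byte lengths `B` and `C` are assumed values for b = 512 and c = 256.

```python
import hashlib

B, C = 64, 32  # assumed byte lengths for b = 512 and c = 256


def f(state: bytes, block: bytes) -> bytes:
    """Toy stand-in for f: {0,1}^c x {0,1}^b -> {0,1}^c (assumption, for illustration)."""
    assert len(state) == C and len(block) == B
    return hashlib.sha256(state + block).digest()


def outer_wnmac(k2: bytes, s: bytes) -> bytes:
    """Outer call of WNMAC: f(K2, s || 0^{b-c}) with a c-bit key K2."""
    return f(k2, s + bytes(B - C))


def outer_dwnmac(k2_long: bytes, s: bytes) -> bytes:
    """Outer call of DWNMAC: f(s, K2) with a b-bit key K2 in the block position."""
    return f(s, k2_long)
```

In the first variant only c bits of the input to the outer call are secret, whereas in the second the secret part is b bits long, which matches the change of the denominator from \(2^{2c}\) to \(2^{b+c}\) described above.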
Our Techniques. In our information-theoretic analysis of \(\mathsf {WNMAC}\) we employ the H-coefficient technique by Patarin [18], partially inheriting the notational framework from the recent analysis of keyed sponges by Gaži, Pietrzak, and Tessaro [8]. On a high level, the heart of our proof is a careful analysis of the probability that two sets intersect in the ideal experiment: (1) the set of adversarial queries to \(\mathsf{f}\), and (2) the set of inputs on which \(\mathsf{f}\) is invoked when answering the adversary’s queries to \(\mathsf {WNMAC}\). Obtaining a bound on the probability of this event then allows us to exclude it and use the result from [7] that considers \({q_\mathsf{f}}=0\), properly adapted to the \(\mathsf {WNMAC}\) setting.
Related Work. As mentioned above, the motivation for our work partially stems from the recent line of work on generic attacks against iterated hash-based MACs [5, 10, 15, 17, 19, 20, 22]. While our security bound for \(\mathsf {WNMAC}\) does not exclude attacks of the complexity (in terms of numbers of queries and message lengths) considered in these papers, the design of \(\mathsf {WNMAC}\) was partially guided by the structure of these attacks and seems to prevent them. We find in particular the work [22] to be a good justification for investigating the security of \(\mathsf {WNMAC}\) and related constructions. An iterated MAC that uses keying in every \(\mathsf{f}\)-invocation was already considered by An and Bellare [1]; their construction \({\mathsf {NI}}\) was later subject to an analysis [7] that we adapt and reuse. One can see \(\mathsf {WNMAC}\) as a conceptual simplification of \({\mathsf {NI}}\) where the key is simply used to whiten the b-bit input to the compression function. Finally, our dual construction considered in Sect. 5 bears resemblance to the Sandwich MAC analyzed by Yasuda [23]; we believe that our methods could be easily adapted to cover this construction as well.
Perspective and Open Problems. We stress that the reader should not conclude from this work that \(\mathsf {NMAC}\) and \(\mathsf {HMAC}\) are necessarily less secure than the constructions proposed in this paper, specifically with respect to PRF security. In fact, we are not aware of any attacks showing a separation between the PRF security of our constructions and that of the original \(\mathsf {NMAC}\)/\(\mathsf {HMAC}\) constructions; finding one is an interesting open problem.
While obtaining a non-tight birthday-type bound for \(\mathsf {NMAC}\)/\(\mathsf {HMAC}\) is feasible (for most key-length values, a bound follows directly from the indifferentiability analysis of [6]), proving tight bounds in terms of compression function and construction queries on the generic PRF security of \(\mathsf {NMAC}\)/\(\mathsf {HMAC}\) is a challenging open problem, on which little progress has been made. The main challenge is to understand how partial information in the form of \(\mathsf{f}\)-queries can help the attacker to break security (i.e., distinguish) in settings with \({q_\mathrm {C}}\ll 2^{c/2}/\sqrt{\ell }\), when the attack from [7] does not apply. This will require in particular developing a better understanding of the functional graph defined by queries to the function \(\mathsf{f}\). Some of its properties have indeed been exploited in existing generic attacks, but proving security appears to require a much deeper understanding: most of the recent attacks, which are probably still not tight, do not come with rigorous proofs but instead rely on conjectures on the structure of these graphs [10]. The difficulty of this question for \(\mathsf {NMAC}\)/\(\mathsf {HMAC}\) is also well documented by the fact that even proving security of the whitened constructions presented in this paper required some novel tricks and considerable effort.
Similarly, it remains equally challenging to prove that for the properties considered by the recent \(\mathsf {HMAC}\)/\(\mathsf {NMAC}\) attacks (such as distinguishing-H, state recovery or various types of forgeries), the security of \(\mathsf {WNMAC}\)/\(\mathsf {WHMAC}\) is provably superior. Yet, we note that our construction invalidates direct application of all existing attacks, and hence we feel confident conjecturing that its security is much higher.
Black-box Instantiations. Throughout the paper we implicitly assume we can add a key to each b-bit input block, even though we aim for a black-box instantiation. For many MD-based hash functions, such fine-grained control of the input to the compression function is generally not possible via a black-box message preprocessing. Concretely, the functions from the SHA-family with 512-bit blocks only allow effective control (via alterations of the message) of the first 447 bits of the last block, since the remaining 65 bits are reserved for the 64-bit length and an additional 1-bit. Our analysis can be easily modified to take this into account. The resulting bound will change very little, and will result in the term \(\ell {q_\mathrm {C}}{q_\mathsf{f}}/ 2^{b + c}\) being replaced by the term \((\ell - 1 + 2^d) \cdot {q_\mathrm {C}}\cdot {q_\mathsf{f}}/ 2^{b + c}\), where d is the length of the non-controllable part of the input (for SHA-functions, \(d = 65\)). Note that since \(d \ll b - c\), this will not affect the tightness of the bounds for concrete parameters.
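A quick numeric check can make the size of this change concrete. The helper below and the parameter values fed to it are hypothetical (SHA-256-like sizes b = 512 and c = 256, messages of \(2^{10}\) blocks, \(2^{64}\) construction queries and \(2^{128}\) compression function queries); it simply evaluates the base-2 logarithm of the two terms quoted above.

```python
from math import log2


def log2_collision_term(ell, log2_qc, log2_qf, b, c, d=None):
    """Return log2 of the term in the bound.

    Without d: ell * qC * qf / 2^(b+c).
    With d:    (ell - 1 + 2^d) * qC * qf / 2^(b+c).
    """
    factor = ell if d is None else ell - 1 + 2 ** d
    return log2(factor) + log2_qc + log2_qf - (b + c)


# Hypothetical parameters, chosen only for illustration.
plain = log2_collision_term(2 ** 10, 64, 128, b=512, c=256)
adjusted = log2_collision_term(2 ** 10, 64, 128, b=512, c=256, d=65)
print(plain, adjusted)
```

For these values the term moves from about \(2^{-566}\) to about \(2^{-511}\), so it stays negligible, in line with the remark that \(d \ll b - c\).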
2 Preliminaries
Basic Notation. We denote \(\left[ n \right] := \{1, \ldots , n\}\). Moreover, for a finite set \({\mathcal S}\) (e.g., \({\mathcal S} = \{0,1\}\)), we let \({\mathcal S}^n\), \({\mathcal S}^+\) and \({\mathcal S}^*\) be the sets of sequences of elements of \({\mathcal S}\) of length n, of arbitrary (but non-zero) length, and of arbitrary length, respectively (with \(\varepsilon \) denoting the empty sequence, as opposed to \(\epsilon \) which is a small quantity). As a shorthand, let \(\{0,1\}^{b*}\) denote \(\left( \{0,1\}^b \right) ^*\). We denote by S[i] the i-th element of \(S \in {\mathcal S}^n\) for all \(i \in [n]\). Similarly, we denote by \(S[i \ldots j]\), for every \(1 \le i \le j \le n\), the subsequence consisting of \(S[i], S[i + 1], \ldots , S[j]\), with the convention that \(S[i \ldots i] = S[i]\). Moreover, we denote by \(S \,\Vert \,S'\) the concatenation of two sequences in \({\mathcal S}^*\), and also, we let \(S \mid T\) be the usual prefix-of relation: \(S\mid T: \Leftrightarrow (\exists S'\in {\mathcal S}^*:S\,\Vert \,S'=T)\).
We also let \({\mathcal F}({\mathcal D},{\mathcal R})\) be the set of all functions from \({\mathcal D}\) to \({\mathcal R}\); and with a slight abuse of notation we sometimes write \({\mathcal F}(m, n)\) (resp. \({\mathcal F}(*, n)\)) to denote the set of functions mapping m-bit strings to n-bit strings (resp. from \(\{0,1\}^*\) to \(\{0,1\}^n\)). We denote by \(x\mathop {\leftarrow }\limits ^{{\tiny {\$}}}{\mathcal X}\) the act of sampling x uniformly at random from \({\mathcal X}\). Moreover, we denote the event that an adversary \({\mathsf A}\), given access to an oracle \(\mathsf {O}\), outputs a value y, as \({\mathsf A}^{\mathsf {O}} \Rightarrow y\). To emphasize the random experiment considered, we sometimes denote the probability of an event A in a random experiment \(\mathsf {E}\) by \(\mathsf {P}^{\mathsf {E}}[A]\). Finally, the min-entropy \(\mathsf {H}_{\infty }(X)\) of a random variable X with range \({\mathcal X}\) is defined as \( -\log \left( \max _{x \in {\mathcal X}} \mathsf {P}_{X}(x) \right) \).
Note that the notion of PRF-security is identical to the notion of distinguishing-R, first defined in [13] and often used in the cryptanalytic literature on hash-based MACs.
MACs and Unpredictability. It is well known that a good PRF also yields a good message-authentication code (MAC). A concrete security bound for unforgeability can be obtained from the PRF bound via a standard argument.
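As a small illustration of the last definition, min-entropy can be computed directly from a probability table. The function name `min_entropy` and the dictionary representation of the distribution are choices made for this sketch, and the logarithm is taken to base 2.

```python
from math import log2


def min_entropy(dist):
    """H_inf(X) = -log2(max_x P_X(x)) for a dict mapping outcomes to probabilities."""
    return -log2(max(dist.values()))


uniform = {x: 1 / 8 for x in range(8)}   # uniform over 8 values: 3 bits
biased = {0: 0.5, 1: 0.25, 2: 0.25}      # most likely outcome has probability 1/2: 1 bit
print(min_entropy(uniform), min_entropy(biased))
```

Min-entropy depends only on the most likely outcome, which is why it is the natural measure when a key is later used as a source for extraction.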
3 The Whitened NMAC Construction
Theorem 1
Note that as observed in Sect. 2, this also covers the so-called distinguishing-R security of \(\mathsf {WNMAC}\). Moreover, our analysis also implies security bounds for distinguishing-H and state recovery, as we discuss later.
3.1 Basic Notation, Message Trees and Repetition Patterns
Let us fix an adversary \({\mathsf A}\). We assume that \({\mathsf A}\) is deterministic, it makes exactly \({q_\mathsf{f}}\) queries to \(\mathsf{f}\) and \({q_\mathrm {C}}\) construction queries, and it never repeats the same query twice. All these assumptions are without loss of generality for an information-theoretic indistinguishability analysis, since an arbitrary (possibly randomized) adversary making at most this many queries can be transformed into one satisfying the above constraints and achieving advantage which is at least as large.

The vertex set is \(V := \left\{ M' \in \left( \{0,1\}^b \right) ^* \,:\, \exists M \in {\mathcal Q}_C: M' \mid M \right\} \), where \(\mid \) is the prefix-of partial ordering of strings. In particular, note that the empty string \(\varepsilon \) is a vertex and that \({\mathcal Q}_C\subseteq V\).
The set \(E \subseteq V \times V\) of (directed) edges is
$$E := \left\{ (M, M') \,:\, \exists m \in \{0,1\}^b: M' = M \,\Vert \,m \right\} .$$
To simplify our exposition, we also define the following two mappings based on \({T}({\mathcal Q}_C)\).

The mapping \(\pi (v):V\setminus \{\varepsilon \}\rightarrow V\) returns the unique parent node of \(v\in V\setminus \{\varepsilon \}\); i.e., the unique node u such that \((u,v)\in E\).

The mapping \(\mu (v):V\setminus \{\varepsilon \}\rightarrow \{0,1\}^b\) returns the unique message block \(m\in \{0,1\}^b\) such that \(\pi (v)\,\Vert \,\mu (v)=v\) (intuitively, this will be the message block that is processed when “arriving” in vertex v).
Alternatively, with a slight abuse of notation we will also refer to the vertices in V as \(v_1,\ldots , v_{|V|}\), which is an arbitrary ordering of them such that for all \(1\le i,j\le |V|\) it satisfies \(v_i \mid v_j \Rightarrow i\le j\). Note that one obtains such an ordering for example if one, intuitively speaking, processes the messages in \({\mathcal Q}_C\) block-wise and labels the vertices by their “first appearance”: in particular \(v_1=\varepsilon \) is the tree root.
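The message tree, the two mappings and the "first appearance" ordering can be sketched as follows. The representation is an assumption of this sketch: each query is a tuple of blocks, a vertex is the tuple of blocks on the path from the root, and the function name `message_tree` is hypothetical. The blocks are arbitrary byte strings here rather than b-bit strings.

```python
def message_tree(queries):
    """Build the message tree for a list of queries (each a tuple of blocks).

    Returns (order, parent, block): the vertices in order of first appearance,
    the parent map pi, and the map mu giving the last block of each vertex.
    """
    root = ()
    order, parent, block = [root], {}, {}
    seen = {root}
    for msg in queries:
        for i in range(1, len(msg) + 1):
            v = tuple(msg[:i])          # every block-wise prefix is a vertex
            if v not in seen:           # label vertices by first appearance
                seen.add(v)
                order.append(v)
                parent[v] = v[:-1]      # pi(v): drop the last block
                block[v] = v[-1]        # mu(v): the block processed on arrival
    return order, parent, block


queries = [(b"a", b"b"), (b"a", b"c"), (b"d",)]
order, parent, block = message_tree(queries)
```

Since a prefix is always visited before any of its extensions, the resulting list satisfies the ordering condition stated above.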

The set of vertices V and edges E are defined exactly as for \({T}({\mathcal Q}_C)\) above.

The vertex-labeling function \(\lambda :V\rightarrow \{0,1\}^c\) is defined iteratively: \(\lambda (\varepsilon ):=K_1\) and for each non-root vertex \(v\in V\setminus \{\varepsilon \}\) we put \(\lambda (v):=\mathsf{f}(\lambda (\pi (v)),\mu (v)\oplus K_{\mathrm {w}})\).
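The iterative labeling can be sketched in the same tuple-based representation. As before, `f` is a toy SHA-256-based stand-in for the compression function, the sizes are assumed values for b = 512 and c = 256, and the function name `label_tree` is hypothetical.

```python
import hashlib

B, C = 64, 32  # assumed byte lengths for b = 512 and c = 256


def f(state: bytes, block: bytes) -> bytes:
    """Toy stand-in for f: {0,1}^c x {0,1}^b -> {0,1}^c (assumption, for illustration)."""
    assert len(state) == C and len(block) == B
    return hashlib.sha256(state + block).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def label_tree(queries, k1: bytes, kw: bytes):
    """Compute lambda(v) for every vertex v (a tuple of blocks) of the message tree."""
    labels = {(): k1}                              # lambda(epsilon) := K1
    for msg in queries:
        for i in range(1, len(msg) + 1):
            v = tuple(msg[:i])
            if v not in labels:
                # lambda(v) := f(lambda(pi(v)), mu(v) xor Kw)
                labels[v] = f(labels[v[:-1]], xor(v[-1], kw))
    return labels
```

Queries that share a prefix share the corresponding labels, which is exactly the consistency that the tree structure is meant to capture.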
3.2 Interactions and Transcripts
Let \(\mathcal {QR}_C\) denote the set of \({q_\mathrm {C}}\) pairs (x, r) such that \(x\in \{0,1\}^{b*}\) is a construction query and \(r\in \{0,1\}^c\) is a potential response to it (what we mean by “potential” will be clear from below). Similarly let \(\mathcal {QR}_\mathsf{f}\) denote the set of \({q_\mathsf{f}}\) pairs (x, r) such that \(x\in \{0,1\}^c\times \{0,1\}^b\) is an \(\mathsf{f}\)-query and \(r\in \{0,1\}^c\) is a potential response to it. Let \({\mathcal Q}_C\subseteq \{0,1\}^{b*}\) and \({\mathcal Q}_\mathsf{f}\subseteq \{0,1\}^{c}\times \{0,1\}^b\) denote the sets of first coordinates (i.e., the queries) in \(\mathcal {QR}_C\) and \(\mathcal {QR}_\mathsf{f}\), respectively; we have \(|{\mathcal Q}_C|={q_\mathrm {C}}\) and \(|{\mathcal Q}_\mathsf{f}|={q_\mathsf{f}}\).
We call the pair of sets \((\mathcal {QR}_C,\mathcal {QR}_\mathsf{f})\) valid if the adversary \({\mathsf A}\) would indeed ask these queries throughout the experiment, assuming that each of her queries would be replied by the respective response in \(\mathcal {QR}_C\) or \(\mathcal {QR}_\mathsf{f}\) (note that once a deterministic \({\mathsf A}\) is fixed, this determines whether a given pair \((\mathcal {QR}_C,\mathcal {QR}_\mathsf{f})\) is valid).
Real World. The transcript \(\mathsf {T}_{\mathsf {real}}\) for the adversary \({\mathsf A}\) is obtained by sampling \(\mathsf{f}\mathop {\leftarrow }\limits ^{{\tiny {\$}}}{\mathcal F}(c+b,c)\) and \(K=(K_1,K_2,K_{\mathrm {w}})\leftarrow \{0,1\}^c\times \{0,1\}^c\times \{0,1\}^b\), and letting \(\mathsf {T}_{\mathsf {real}}\) denote
$$\begin{aligned} \left( \mathcal {QR}_C=\left\{ (M_i,Y_i) \right\} _{i=1}^{q_\mathrm {C}}, \mathcal {QR}_\mathsf{f}=\left\{ (X_i,R_i) \right\} _{i=1}^{q_\mathsf{f}}, K=(K_1,K_2,K_{\mathrm {w}}), {T}^{\mathsf{f}}_K({\mathcal Q}_C) \right) \! , \end{aligned}$$
where we execute \({\mathsf A}\), which asks construction queries \(M_1, \ldots , M_{q_\mathrm {C}}\) answered with \(Y_i :=\mathsf {WNMAC}[\mathsf{f}]_K(M_i)\) for all \(i \in [{q_\mathrm {C}}]\); and \(\mathsf{f}\)-queries \(X_1, \ldots , X_{q_\mathsf{f}}\) answered with \(R_i:=\mathsf{f}(X_i)\) for all \(i \in [{q_\mathsf{f}}]\) (note that the C-queries and \(\mathsf{f}\)-queries may in general be interleaved adaptively, depending on \({\mathsf A}\)). Finally, we let \({T}^{\mathsf{f}}_K({\mathcal Q}_C)\) be the labeled message tree corresponding to \({\mathcal Q}_C\), \(\mathsf{f}\) and K.
Ideal World. The transcript \(\mathsf {T}_{\mathsf {ideal}}\) for the adversary \({\mathsf A}\) is obtained similarly to the above, but here, together with the random function \(\mathsf{f}\mathop {\leftarrow }\limits ^{{\tiny {\$}}}{\mathcal F}(c+b,c)\) and the key tuple \(K=(K_1,K_2,K_{\mathrm {w}})\leftarrow \{0,1\}^c\times \{0,1\}^c\times \{0,1\}^b\), we also sample \({q_\mathrm {C}}\) independent random values \(Y_1, \ldots , Y_{q_\mathrm {C}}\in \{0,1\}^c\). Then we let \(\mathsf {T}_{\mathsf {ideal}}\) denote
$$\begin{aligned} \left( \mathcal {QR}_C=\left\{ (M_i,Y_i) \right\} _{i=1}^{q_\mathrm {C}}, \mathcal {QR}_\mathsf{f}=\left\{ (X_i,R_i) \right\} _{i=1}^{q_\mathsf{f}}, K=(K_1,K_2,K_{\mathrm {w}}), {T}^{\mathsf{f}}_K({\mathcal Q}_C) \right) \! , \end{aligned}$$
where we execute \({\mathsf A}\), answer each of its C-queries \(M_i\) with \(Y_i\) for all \(i \in [{q_\mathrm {C}}]\) and each of its \(\mathsf{f}\)-queries \(X_i\) with \(R_i:=\mathsf{f}(X_i)\) for all \(i\in [{q_\mathsf{f}}]\). Then we let \({T}^{\mathsf{f}}_K({\mathcal Q}_C)\) be the labeled message tree corresponding to \({\mathcal Q}_C\), \(\mathsf{f}\) and K.
Later we refer to the above two random experiments as \(\mathsf {real}\) and \(\mathsf {ideal}\), respectively. Note that the range of \(\mathsf {T}_{\mathsf {real}}\) is included in the range of \(\mathsf {T}_{\mathsf {ideal}}\) by definition, and that the range of \(\mathsf {T}_{\mathsf {ideal}}\) is easily seen to contain all valid transcripts.
3.3 The H-Coefficient Method
We are going to use Patarin’s H-coefficient method [18]. This means that we need to partition the set of valid transcripts into good transcripts \(\mathsf {GT}\) and bad transcripts \(\mathsf {BT}\) and then apply the following lemma.
Lemma 1

(a) \(\mathsf {P}\left[ \mathsf {T}_{\mathsf {ideal}} \in \mathsf {BT} \right] \le \delta \).
(b) For all \(\tau \in \mathsf {GT}\),
$$\begin{aligned} \frac{\mathsf {P}\left[ \mathsf {T}_{\mathsf {real}} = \tau \right] }{\mathsf {P}\left[ \mathsf {T}_{\mathsf {ideal}} = \tau \right] } \ge 1 - \epsilon . \end{aligned}$$
More verbally, we want a set of good transcripts \(\mathsf {GT}\) such that with very high probability (i.e., \(1 - \delta \)) a generated transcript in the ideal world is going to be in this set, and moreover, for each such good transcript, the probabilities that it occurs in the real and in the ideal worlds are roughly the same, i.e., at most a multiplicative factor \(1 - \epsilon \) apart.
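As a worked step, the way conditions (a) and (b) are typically combined can be sketched as follows. This is the standard H-coefficient argument rather than a statement taken from this section; it assumes that the adversary's distinguishing advantage is bounded by the statistical distance \(\mathsf {SD}\) between \(\mathsf {T}_{\mathsf {real}}\) and \(\mathsf {T}_{\mathsf {ideal}}\), and it uses the fact that every transcript with non-zero ideal probability lies in \(\mathsf {GT}\) or \(\mathsf {BT}\):

$$\begin{aligned} \mathsf {SD}(\mathsf {T}_{\mathsf {real}}, \mathsf {T}_{\mathsf {ideal}}) &= \sum _{\tau \,:\, \mathsf {P}[\mathsf {T}_{\mathsf {ideal}} = \tau ]> \mathsf {P}[\mathsf {T}_{\mathsf {real}} = \tau ]} \mathsf {P}\left[ \mathsf {T}_{\mathsf {ideal}} = \tau \right] \left( 1 - \frac{\mathsf {P}\left[ \mathsf {T}_{\mathsf {real}} = \tau \right] }{\mathsf {P}\left[ \mathsf {T}_{\mathsf {ideal}} = \tau \right] } \right) \\ &\le \sum _{\tau \in \mathsf {GT}} \mathsf {P}\left[ \mathsf {T}_{\mathsf {ideal}} = \tau \right] \cdot \epsilon + \sum _{\tau \in \mathsf {BT}} \mathsf {P}\left[ \mathsf {T}_{\mathsf {ideal}} = \tau \right] \\ &\le \epsilon + \delta . \end{aligned}$$

The first sum is bounded using condition (b), since the bracketed factor is at most \(\epsilon \) on good transcripts; the second uses only that the factor is at most 1, followed by condition (a).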
3.4 Good and Bad Transcripts
Definition 1

(1) The event \(\mathsf {C\text{ }f\text{ }coll_{out}}\) has not occurred.

(2) The event \(\mathsf {C\text{ }coll}\) has not occurred.

(3) For any \(v\in V\) we have \(\lambda (v)\ne K_2\).
We denote as \(\mathsf {GT}\) the set of all good transcripts, and \(\mathsf {BT}\) the set of all bad transcripts, i.e., transcripts which can possibly occur (i.e., they are in the range of \(\mathsf {T}_{\mathsf {ideal}}\)) and are not good. More specifically, we denote by \(\mathsf {BT}_i\) the set of all bad transcripts that do not satisfy the i-th property in the definition of a good transcript above, hence we have \(\mathsf {BT}=\bigcup _{i=1}^3\mathsf {BT}_i\).
3.5 Probability of a C-f-collision
In this section we upper-bound the probability of \(\mathsf {C\text{ }f\text{ }coll}\) by considering inner and outer C-\(\mathsf{f}\)-collisions separately.
Lemma 2
We have \( \mathsf {P}^\mathsf {ideal}[\mathsf {C\text{ }f\text{ }coll_{in}}] \le {\ell {q_\mathrm {C}}{q_\mathsf{f}}}/{2^{b+c}} \).
Proof
We now argue that \(\mathsf {P}^{\mathsf {ideal}}[\mathsf {C\text{ }f\text{ }coll_{in}}] = \mathsf {P}^{\mathsf {ideal}'}[\mathsf {C\text{ }f\text{ }coll_{in}}] \). To see this, consider an intermediate experiment \(\mathsf {ideal}''\) that is defined exactly as \(\mathsf {ideal}\) except that it uses a separate ideal compression function \(\mathsf{g}\) to generate the vertex labels of the tree contained in the transcript, where \(\mathsf{g}\) is completely independent of \(\mathsf{f}\) queried by the adversary (i.e., the adversary queries \(\mathsf{f}\) and the transcript contains \(\mathcal {QR}_\mathsf{f}\) and \({T}^\mathsf{g}_K({\mathcal Q}_C)\)). It is now clear that \( \mathsf {P}^{\mathsf {ideal}}[\mathsf {C\text{ }f\text{ }coll_{in}}] = \mathsf {P}^{\mathsf {ideal}''}[\mathsf {C\text{ }f\text{ }coll_{in}}] \) since as long as no inner C-\(\mathsf{f}\)-collision happens, the experiments are identical.
The remaining equality \( \mathsf {P}^{\mathsf {ideal}''}[\mathsf {C\text{ }f\text{ }coll_{in}}] = \mathsf {P}^{\mathsf {ideal}'}[\mathsf {C\text{ }f\text{ }coll_{in}}] \) follows from the definition of \(\mathsf {ideal}'\). It is easy to see that the distribution of vertex labels sampled in steps 2 and 3 of \(\mathsf {ideal}'\) and by labeling the tree \({T}^\mathsf{g}_K({\mathcal Q}_C)\) in \(\mathsf {ideal}''\) are the same. In both cases, repeated inputs to the compression function lead to consistent outputs, while fresh inputs lead to independent random outputs. The two experiments only differ in the order of sampling: \(\mathsf {ideal}''\) first samples \(\mathsf{g}\) and then performs the labeling, while \(\mathsf {ideal}'\) starts by sampling the repetition pattern, and then chooses the actual labels correspondingly. The same distribution of vertex labels in these two experiments then implies the same probability of \(\mathsf {C\text{ }f\text{ }coll_{in}}\) occurring.
We proceed by upper-bounding the probability of an outer C-\(\mathsf{f}\)-collision.
Lemma 3
Proof
Applying (4) to the event \(\mathsf {C\text{ }f\text{ }coll_{out}}\) as A, it remains to bound the probability \(\mathsf {P}^{\mathsf {ideal}''}\left[ \mathsf {C\text{ }f\text{ }coll_{out}} \right] \); for this we observe that \( \mathsf {P}^{\mathsf {ideal}''}[\mathsf {C\text{ }f\text{ }coll_{out}}] = \mathsf {P}^{\mathsf {ideal}'}[\mathsf {C\text{ }f\text{ }coll_{out}}] \) similarly as before: the repetition pattern \(\rho \) sampled in step 2 of \(\mathsf {ideal}'\) has the same distribution as the repetition pattern induced by the tree \({T}^\mathsf{g}_K({\mathcal Q}_C)\) in \(\mathsf {ideal}''\), and this together with the sampling performed in step 3 results in the same distribution of vertex labels in \(\mathsf {ideal}''\) and \(\mathsf {ideal}'\) and hence also in the same probability of \(\mathsf {C\text{ }f\text{ }coll_{out}}\) in both experiments.
3.6 Probability of Repeated Outer Invocations
In this section we analyze the probability that any of the outer \(\mathsf{f}\)-invocations in the ideal experiment will not be fresh; in particular, we upper-bound both \(\mathsf {P}[\mathsf {T}_\mathsf {ideal}\in \mathsf {BT}_2]\) and \(\mathsf {P}[\mathsf {T}_\mathsf {ideal}\in \mathsf {BT}_3]\).
Lemma 4
Proof
Lemma 5
Proof
As is clear from the description of the ideal experiment, the key \(K_2\) is chosen uniformly at random and independently of the rest of the experiment, in particular of the labels \(\lambda (v)\). The lemma hence follows by a simple union bound over all \(\ell {q_\mathrm {C}}\) vertices \(v\in V\). \(\square \)
3.7 Good Transcripts and Putting Pieces Together
3.8 Tightness
We now argue that the \({q_\mathrm {C}}{q_\mathsf{f}}/2^{2c}\) term in our bound on the security of \(\mathsf {WNMAC}\) as given in (2) is tight, by giving a matching attack (up to a linear factor O(c)). For most practical parameters, this will be the dominating term in (2), and thus for those parameters Theorem 1 gives a tight bound. Here we only describe an attack for the case where \({q_\mathrm {C}}=\varTheta (c)\) is very small, and defer the general case to the full version.
The \({q_\mathrm {C}}=\varTheta (c)\) Case. We must define an adversary \({\mathsf A}^{{\mathcal {O}},\mathsf{f}}\) who can distinguish the case where the first oracle \({\mathcal {O}}\) implements a random function \(\mathsf{R}\) from the case where it implements \(\mathsf {WNMAC}^\mathsf{f}((K_1,K_2,K_{\mathrm {w}}),\cdot )\) with random keys \(K_1,K_2,K_{\mathrm {w}}\) using the random function \(\mathsf{f}:\{0,1\}^{b+c}\rightarrow \{0,1\}^c\) which is given as the second oracle.
3.9 Distinguishing-H Security of WNMAC
The above results also imply a bound on the distinguishing-H security of \(\mathsf {WNMAC}\). To capture this, we first introduce the notion of distinguishing-C, which corresponds to PRF-security with the restriction that the distinguisher only uses construction queries.
Definition 2
The notion of distinguishing-C is useful for bridging distinguishing-H and PRF-security, as the following lemma shows (we omit its simple proof).
Lemma 6
One can readily obtain a bound on the distinguishing-C security of \(\mathsf {WNMAC}\) using Theorem 1 with \({q_\mathsf{f}}=0\).
Lemma 7
By combining Theorem 1 and Lemmas 6 and 7, we get the following theorem.
Theorem 2
3.10 State Recovery for WNMAC
Theorem 3
Proof (sketch)
First, we replace the compression function oracle \(\mathsf{f}\) by an independent random function \(\mathsf{g}\) completely unrelated to \(\mathsf {WNMAC}^\mathsf{f}\). The error introduced by this is upper-bounded by Theorem 2. After this replacement, compression-function queries are useless to the adversary, hence we can disregard them.
Let us denote by \(\mathcal E\) the experiment where \(\mathsf{A}\) interacts with \(\mathsf {WNMAC}^\mathsf{f}\) (without direct access to \(\mathsf{f}\)). Consider an alternative experiment \(\mathcal E'\) given in Fig. 4. As long as the key \(K_2\) chosen in step 4 does not hit any of the internal states that occurred during the query evaluation, the experiment \(\mathcal E'\) is identical to \(\mathcal E\). Moreover, since \(K_2\) is chosen independently at random, such a hit can only occur with probability at most \(\ell {q_\mathrm {C}}/2^c\). Since the vertex labels are only sampled after the adversary makes its guess for the state, the probability that the guess will be correct is at most \(\ell /2^c\). \(\square \)
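The hit probability used in the proof sketch follows from a union bound. As a worked sketch, assume (as in Sect. 3.6) that the internal states are the labels \(\lambda (v)\) of the at most \(\ell {q_\mathrm {C}}\) vertices \(v\in V\), and that \(K_2\) is uniform over \(\{0,1\}^c\) and independent of these labels. Then
\[ \mathsf {P}\left[ \exists v\in V :\lambda (v) = K_2 \right] \;\le \; \sum _{v\in V} \mathsf {P}\left[ \lambda (v) = K_2 \right] \;=\; \frac{|V|}{2^c} \;\le \; \frac{\ell {q_\mathrm {C}}}{2^c}. \]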
4 Whitening HMAC
The theorem below relates the security of \(\mathsf {WHMAC}\) and \(\mathsf {WHMAC}^+\) to the security of \(\mathsf {WNMAC}\).
Theorem 4
Proof
Intuitively, for \(\mathsf {WHMAC}\) one can think of \(\mathsf{f}\) as an extractor which extracts keys \(K'_1,K'_2\) from K, and the bound then readily follows by the leftover hash lemma. For \(\mathsf {WHMAC}^+\) one can roughly think of \(K'_1\) and \(K'_2\) as being extracted from independent keys \(K^+\) and K, respectively. For the latter it is thus sufficient that b (which is the length, and thus also the entropy, of the uniform K and \(K^+\)) is sufficiently larger than c (the length of each of \(K'_1,K'_2\)), whereas for the former we need b to be sufficiently larger than 2c. We now give the details of the proof for \(\mathsf {WHMAC}\) and postpone the treatment of \(\mathsf {WHMAC}^+\) to the full version.
5 The Dual WNMAC Construction
Looking at the security bounds for \(\mathsf {WNMAC}\) given in Sect. 3 from a distance, it seems that under reasonable assumptions the most restrictive term in the bounds is \({q_\mathsf{f}}{q_\mathrm {C}}/2^{2c}\). Intuitively speaking, the reason for this term is the outer \(\mathsf{f}\)-call in \(\mathsf {WNMAC}\) that only takes 2c bits of actual inputs and adds \(b-c\) padding zeroes.
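The input layout of the outer call can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sizes (c = 256, b = 512, as in SHA-256), the ordering of the key and the inner state inside the input, and the stand-in `f`, which simply hashes its input and is not a real compression function. The helper names `outer_input` and `outer_call` are hypothetical.

```python
import hashlib

# Toy sizes in bytes (assumption): c = 256 bits, b = 512 bits, as in SHA-256.
C_BYTES = 32
B_BYTES = 64


def f(x: bytes) -> bytes:
    """Hypothetical stand-in for f: {0,1}^(b+c) -> {0,1}^c (not a real compression function)."""
    assert len(x) == B_BYTES + C_BYTES
    return hashlib.sha256(x).digest()[:C_BYTES]


def outer_input(k2: bytes, inner_state: bytes) -> bytes:
    """Build the (b+c)-bit outer input: K2, the c-bit inner state, then b-c zero bits."""
    assert len(k2) == C_BYTES and len(inner_state) == C_BYTES
    return k2 + inner_state + bytes(B_BYTES - C_BYTES)


def outer_call(k2: bytes, inner_state: bytes) -> bytes:
    """Evaluate the outer f-call on the padded input."""
    return f(outer_input(k2, inner_state))


x = outer_input(b"\x01" * C_BYTES, b"\x02" * C_BYTES)
unknown_bits = 8 * 2 * C_BYTES          # 2c bits that are secret to the attacker
padding_bits = 8 * (B_BYTES - C_BYTES)  # b-c fixed, publicly known zero bits
```

Only the 2c bits counted in `unknown_bits` are hidden from an attacker who queries \(\mathsf{f}\) directly, which matches the \(2^{2c}\) denominator of the term discussed above.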
In an attempt to overcome this limitation, we propose a variant of the \(\mathsf {WNMAC}\) construction that we call Dual WNMAC (\(\mathsf {DWNMAC}\)). We prove a bound on the PRF-security of \(\mathsf {DWNMAC}\) that goes beyond the restrictive term \({q_\mathsf{f}}{q_\mathrm {C}}/2^{2c}\), and our proof again extends to distinguishing-H and state-recovery security. The price we pay for this improvement is a slight increase in the key length and the fact that \(\mathsf {DWNMAC}\) cannot be implemented using only black-box access to \(\mathsf {NMAC}\). Similarly, if we apply the same modification to \(\mathsf {WHMAC}\), the resulting construction can no longer be implemented using black-box access to \(\mathsf {HMAC}\).
We now summarize the security of \(\mathsf {DWNMAC}\).
Theorem 5
Proof (sketch)
The proofs are analogous to the proofs for \(\mathsf {WNMAC}\) given in Sect. 3, with the main modification needed in Lemma 3, where the probability of an outer C-f-collision can be upper-bounded by \({q_\mathrm {C}}{q_\mathsf{f}}/2^{b+c}\). Roughly speaking, this is because the outer call in \(\mathsf {DWNMAC}\) does not contain the \(0^{b-c}\) padding and instead processes \(b+c\) bits of input that are hard to predict for the attacker. \(\square \)
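To get a feel for the gap between the two terms, the following Python sketch evaluates \({q_\mathrm {C}}{q_\mathsf{f}}/2^{2c}\) and \({q_\mathrm {C}}{q_\mathsf{f}}/2^{b+c}\) exactly. The parameters (b = 512, c = 256) and the query counts are assumptions chosen for illustration, and the function names are hypothetical. The complete bounds contain further terms that are not modeled here.

```python
from fractions import Fraction


def wnmac_term(q_c: int, q_f: int, c: int) -> Fraction:
    """The term q_C * q_f / 2^(2c) from the WNMAC bound."""
    return Fraction(q_c * q_f, 2 ** (2 * c))


def dwnmac_term(q_c: int, q_f: int, b: int, c: int) -> Fraction:
    """The corresponding term q_C * q_f / 2^(b+c) for DWNMAC."""
    return Fraction(q_c * q_f, 2 ** (b + c))


# Illustrative parameters (assumption): SHA-256-like sizes and arbitrary query counts.
b, c = 512, 256
q_c, q_f = 2 ** 64, 2 ** 200

w = wnmac_term(q_c, q_f, c)       # 2^(264 - 512) = 2^-248
d = dwnmac_term(q_c, q_f, b, c)   # 2^(264 - 768) = 2^-504
ratio = w / d                     # always 2^(b - c), independent of the query counts
```

Whatever the query counts, the two terms differ by a factor of \(2^{b-c}\), which is exactly the number of values the removed \(0^{b-c}\) padding could have taken.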
Footnotes
 1. Some details such as padding and arbitrary key length are addressed in Sect. 2.
 2. Here we refer to Theorem 2 in [7] that formally considers a related construction \({\mathsf {NI}}\) in the standard model. However, its proof starts by a transition to the ideal-model analysis of a construction very closely related to \(\mathsf {NMAC}\), while disallowing compression-function queries.
Notes
Acknowledgments
We thank the anonymous reviewers for their helpful comments. Gaži and Pietrzak’s work was partly funded by the European Research Council under an ERC Starting Grant (259668-PSPC). Tessaro’s research was partially supported by NSF grant CNS-1423566 and by the Glen and Susanne Culler Chair.
References
 1. An, J.H., Bellare, M.: Constructing VIL-MACs from FIL-MACs: message authentication under weakened assumptions. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 252–269. Springer, Heidelberg (1999)
 2. Bellare, M.: New proofs for \({\sf NMAC}\) and \({\sf HMAC}\): security without collision-resistance. In: Dwork, C. (ed.) CRYPTO 2006. LNCS, vol. 4117, pp. 602–619. Springer, Heidelberg (2006)
 3. Bellare, M., Canetti, R., Krawczyk, H.: Keying hash functions for message authentication. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp. 1–15. Springer, Heidelberg (1996)
 4. Damgård, I.B.: A design principle for hash functions. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 416–427. Springer, Heidelberg (1990)
 5. Dinur, I., Leurent, G.: Improved generic attacks against hash-based MACs and HAIFA. In: Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014, Part I. LNCS, vol. 8616, pp. 149–168. Springer, Heidelberg (2014)
 6. Dodis, Y., Ristenpart, T., Steinberger, J., Tessaro, S.: To hash or not to hash again? (In)differentiability results for H\(^{2}\) and HMAC. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 348–366. Springer, Heidelberg (2012)
 7. Gaži, P., Pietrzak, K., Rybár, M.: The exact PRF-security of NMAC and HMAC. In: Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014, Part I. LNCS, vol. 8616, pp. 113–130. Springer, Heidelberg (2014)
 8. Gaži, P., Pietrzak, K., Tessaro, S.: The exact PRF security of truncation: tight bounds for keyed sponges and truncated CBC. In: Gennaro, R., Robshaw, M.J.B. (eds.) CRYPTO 2015, Part I. LNCS, vol. 9215, pp. 368–387. Springer, Heidelberg (2015)
 9. Goldreich, O., Goldwasser, S., Micali, S.: On the cryptographic applications of random functions. In: Blakely, G.R., Chaum, D. (eds.) CRYPTO 1984. LNCS, vol. 196, pp. 276–288. Springer, Heidelberg (1985)
 10. Guo, J., Peyrin, T., Sasaki, Y., Wang, L.: Updates on generic attacks against \({\sf HMAC}\) and \({\sf NMAC}\). In: Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014, Part I. LNCS, vol. 8616, pp. 131–148. Springer, Heidelberg (2014)
 11. Hardy, G.H., Wright, E.M.: An Introduction to the Theory of Numbers, 6th edn. Oxford University Press, Oxford (2008)
 12. Håstad, J., Impagliazzo, R., Levin, L.A., Luby, M.: A pseudorandom generator from any one-way function. SIAM J. Comput. 28(4), 1364–1396 (1999)
 13. Kim, J.S., Biryukov, A., Preneel, B., Hong, S.H.: On the security of HMAC and NMAC based on HAVAL, MD4, MD5, SHA-0 and SHA-1 (Extended abstract). In: De Prisco, R., Yung, M. (eds.) SCN 2006. LNCS, vol. 4116, pp. 242–256. Springer, Heidelberg (2006)
 14. Krawczyk, H., Bellare, M., Canetti, R.: HMAC: keyed-hashing for message authentication. In: IETF Internet Request for Comments 2104, February 1997
 15. Leurent, G., Peyrin, T., Wang, L.: New generic attacks against hash-based MACs. In: Sako, K., Sarkar, P. (eds.) ASIACRYPT 2013, Part II. LNCS, vol. 8270, pp. 1–20. Springer, Heidelberg (2013)
 16. Merkle, R.C.: One way hash functions and DES. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 428–446. Springer, Heidelberg (1990)
 17. Naito, Y., Sasaki, Y., Wang, L., Yasuda, K.: Generic state-recovery and forgery attacks on ChopMD-MAC and on NMAC/HMAC. In: Sakiyama, K., Terada, M. (eds.) IWSEC 2013. LNCS, vol. 8231, pp. 83–98. Springer, Heidelberg (2013)
 18. Patarin, J.: The “Coefficients H” technique. In: Avanzi, R.M., Keliher, L., Sica, F. (eds.) SAC 2008. LNCS, vol. 5381, pp. 328–345. Springer, Heidelberg (2009)
 19. Peyrin, T., Sasaki, Y., Wang, L.: Generic related-key attacks for HMAC. In: Wang, X., Sako, K. (eds.) ASIACRYPT 2012. LNCS, vol. 7658, pp. 580–597. Springer, Heidelberg (2012)
 20. Peyrin, T., Wang, L.: Generic universal forgery attack on iterative hash-based MACs. In: Nguyen, P.Q., Oswald, E. (eds.) EUROCRYPT 2014. LNCS, vol. 8441, pp. 147–164. Springer, Heidelberg (2014)
 21. Preneel, B., van Oorschot, P.C.: MDx-MAC and building fast MACs from hash functions. In: Coppersmith, D. (ed.) CRYPTO 1995. LNCS, vol. 963, pp. 1–14. Springer, Heidelberg (1995)
 22. Sasaki, Y., Wang, L.: Generic attacks on strengthened HMAC: n-bit secure HMAC requires key in all blocks. In: Abdalla, M., De Prisco, R. (eds.) SCN 2014. LNCS, vol. 8642, pp. 324–339. Springer, Heidelberg (2014)
 23. Yasuda, K.: “Sandwich” Is indeed secure: how to authenticate a message with just one hashing. In: Pieprzyk, J., Ghodosi, H., Dawson, E. (eds.) ACISP 2007. LNCS, vol. 4586, pp. 355–369. Springer, Heidelberg (2007)