## Abstract

Hash combiners are a practical way to make cryptographic hash functions more tolerant to future attacks and compatible with existing infrastructure. A combiner combines two or more hash functions in a way that is hopefully more secure than each of the underlying hash functions, or at least remains secure as long as one of them is secure. Two classical hash combiners are the exclusive-or (XOR) combiner \( \mathcal {H}_1(M) \oplus \mathcal {H}_2(M) \) and the concatenation combiner \( \mathcal {H}_1(M) \Vert \mathcal {H}_2(M) \). Both of them process the same message using the two underlying hash functions in parallel. Apart from parallel combiners, there are also cascade constructions sequentially calling the underlying hash functions to process the message repeatedly, such as Hash-Twice \(\mathcal {H}_2(\mathcal {H}_1(IV, M), M)\) and the Zipper hash \(\mathcal {H}_2(\mathcal {H}_1(IV, M), \overleftarrow{M})\), where \(\overleftarrow{M}\) is the reverse of the message *M*. In this work, we study the security of these hash combiners by devising the best-known generic attacks. The results show that the security of most of the combiners is not as high as commonly believed. We summarize our attacks and their computational complexities (ignoring the polynomial factors) as follows:

1. Several generic preimage attacks on the XOR combiner:

   - A first attack with a best-case complexity of \( 2^{5n/6} \) obtained for messages of length \( 2^{n/3} \). It relies on a novel technical tool named interchange structure. It is applicable for combiners whose underlying hash functions follow the Merkle–Damgård construction or the HAIFA framework.

   - A second attack with a best-case complexity of \( 2^{2n/3} \) obtained for messages of length \( 2^{n/2} \). It exploits properties of functional graphs of random mappings. It achieves a significant improvement over the first attack but is only applicable when the underlying hash functions use the Merkle–Damgård construction.

   - An improvement upon the second attack with a best-case complexity of \( 2^{5n/8} \) obtained for messages of length \( 2^{5n/8} \). It further exploits properties of functional graphs of random mappings and uses longer messages.

   These attacks show a rather surprising result: regarding preimage resistance, the sum of two *n*-bit narrow-pipe hash functions following the considered constructions can never provide *n*-bit security.

2. A generic second-preimage attack on the concatenation combiner of two Merkle–Damgård hash functions. This attack finds second preimages faster than \( 2^n \) for challenges longer than \( 2^{2n/7} \) and has a best-case complexity of \( 2^{3n/4} \), obtained for challenges of length \( 2^{3n/4} \). It also exploits properties of functional graphs of random mappings.

3. The first generic second-preimage attack on the Zipper hash with underlying hash functions following the Merkle–Damgård construction. The best-case complexity is \( 2^{3n/5} \), obtained for challenge messages of length \( 2^{2n/5} \).

4. An improved generic second-preimage attack on Hash-Twice with underlying hash functions following the Merkle–Damgård construction. The best-case complexity is \( 2^{13n/22} \), obtained for challenge messages of length \( 2^{13n/22} \).

The last three attacks show that regarding second-preimage resistance, the concatenation and cascade of two *n*-bit narrow-pipe Merkle–Damgård hash functions do not provide much more security than that provided by a single *n*-bit hash function.

Our main technical contributions include the following:

1. The interchange structure, which enables simultaneously controlling the behaviours of two hash computations sharing the same input.

2. The simultaneous expandable message, which is a set of messages whose lengths cover a whole appropriate range and which are multi-collisions for both of the underlying hash functions.

3. New ways to exploit the properties of functional graphs of random mappings generated by fixing the message block input to the underlying compression functions.

## Introduction

A cryptographic hash function \(\mathcal {H}: \{0, 1\}^{*} \rightarrow \{0, 1\}^n\) maps arbitrarily long messages to *n*-bit digests. It is a fundamental primitive in modern cryptography and is among the main building blocks of many widely utilized cryptographic protocols and cryptosystems. There are three *basic* security requirements on a hash function \(\mathcal {H}\):

- *Collision Resistance*: It should be computationally infeasible to find a pair of different messages *M* and \( M' \) such that \(\mathcal {H}(M) = \mathcal {H}(M')\).
- *Preimage Resistance*: Given an arbitrary *n*-bit value \(V\), it should be computationally infeasible to find any message *M* such that \(\mathcal {H}(M)=V\).
- *Second-Preimage Resistance*: Given a challenge message *M*, it should be computationally infeasible to find any different message \(M'\) such that \(\mathcal {H}(M) = \mathcal {H}(M')\).

As the birthday attack and the brute-force attack require \(2^{n/2}\) and \(2^{n}\) computations, respectively, to find a collision and a (second) preimage, a secure hash function is expected to provide the same level of resistance.

Unfortunately, widely deployed standards (such as MD5 and SHA-1) fail to provide the expected resistance because of cryptographic weaknesses [58, 59]. Moreover, Kelsey and Schneier have demonstrated a generic second-preimage attack against all hash functions based on the classical Merkle–Damgård construction (such as MD5, SHA-1 and SHA-2) when the challenge message is long [38]. As a result, countermeasures have been proposed in order to build more tolerant hash functions and to protect oneself against future attacks, while keeping the same interface for compatibility. A practical way is to combine the outputs of two (or more) independent hash functions to provide better security in case one or even both hash functions are weak. In particular, this reasoning was used by the designers of SSL [21] and TLS [15], who combined MD5 and SHA-1 in various ways. More precisely, the key derivation function of TLS version 1.0/version 1.1 uses a sum of HMAC-MD5 and HMAC-SHA-1.^{Footnote 1} The designers explain [15], “In order to make the PRF as secure as possible, it uses two hash algorithms in a way which should guarantee its security if either algorithm remains secure”. Formally, we call the resulting construction a *hash function combiner* (or combiner for short). The goal of a hash combiner is to achieve *security amplification*, i.e., the hash combiner has higher security than its underlying hash functions, or to achieve *security robustness*, i.e., the hash combiner is secure as long as (at least) one of its underlying hash functions is secure.

There are two classical hash combiners: the concatenation combiner and the exclusive-or (XOR) combiner. Both of them process the same message using two (independent) hash functions \(\mathcal {H}_1\) and \(\mathcal {H}_2\) in parallel. Then, the former concatenates their outputs, \(\mathcal {H}_1(M) \Vert \mathcal {H}_2(M)\), and the latter XORs them, \(\mathcal {H}_1(M) \oplus \mathcal {H}_2(M)\). More generally,^{Footnote 2} cryptographers have also studied cascade constructions of two (or more) hash functions, that is, constructions computing \(\mathcal {H}_1\) and \(\mathcal {H}_2\) in sequential order. Examples are Hash-Twice, \(\mathcal {HT}(M) \triangleq \mathcal {H}_2(\mathcal {H}_1(IV, M), M)\),^{Footnote 3} and the Zipper hash [40], \(\mathcal {H}_2(\mathcal {H}_1(IV, M), \overleftarrow{M})\), where \(\overleftarrow{M}\) is the message with the same blocks as *M* but in reversed order. Unlike the XOR combiner and the concatenation combiner, such cascade constructions are not strictly black-box hash combiners, because the initial vector of the second hash function is not fixed to the value in its specification; to give the construction black-box access to the underlying hash functions, the initial vector must instead be exposed as an input parameter. Apart from this point, however, all other parts of the hash functions are accessed as black boxes by the constructions. Since such constructions belong to the broader approach of building hash functions from multiple existing hash functions, and for the sake of conciseness, we regard these cascade constructions as hash combiners in this paper. In the sequel, for these combiners, we call the computation of the first (resp. the second) hash function the first (resp. the second) computation pass (or simply, the *first pass*, resp. the *second pass*).

In this paper, we study the security of these hash combiners. We focus on combiners of *iterated* hash functions. Iterated hash functions first pad and split the message *M* into message blocks of fixed length (e.g., *b* bits), i.e., \(M = m_1\Vert m_2\Vert \ldots \Vert m_L\). They then process the message blocks by iteratively applying a series of compression functions \(h_i\). These compression functions update an internal state, initialized with a public value *IV*, using the previous state value and the current message block, i.e., \(x_{i} = h_i(x_{i-1},m_{i})\). Finally, the internal state is updated by a finalization function, which can be either the compression function or another independent function, and its output is taken as the hash digest. For simplicity of description, we assume that the finalization function is the same as the compression function in the rest of the paper, but we stress that our attacks also work in a straightforward way with an independent finalization function. In this work, we mainly focus on hash functions whose internal state size is equal to their output size, known as “narrow-pipe” designs. We will discuss the applicability of our proposed attacks to “wide-pipe” designs, whose internal state size is (not much) larger than the output size, at the end of the paper. In particular, we consider hash functions following the classical Merkle–Damgård construction [16, 44] and the more general HAIFA framework [6], for which we naturally take the length padding mentioned next as the message padding scheme. The Merkle–Damgård construction (MD) applies the same compression function *h* in all iterations (see Fig. 1) and adds a padding of the message length to the final message block (known as length padding, or Merkle–Damgård strengthening).
The HAIFA framework is similar to the MD construction but feeds two extra inputs to the compression function, namely the number of message bits hashed so far and a salt value, i.e., \( x_i = h(x_{i-1}, m_i, \#\text{bits}_i, salt) \); this is equivalent to using a different compression function for each block. The primary goal of this construction is to thwart some narrow-pipe attacks (e.g., Kelsey and Schneier’s long-message second-preimage attack on MD hash functions [38]).^{Footnote 4}
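To make the two iteration modes concrete, the following sketch instantiates both on a toy scale. All names and parameters here are ours, and truncated SHA-256 merely stands in for an ideal compression function; this is an illustration of the two constructions, not code from the paper:

```python
import hashlib

N_BYTES = 4  # toy state size: n = 32 bits ("narrow-pipe": state size = output size)
B_BYTES = 8  # toy message-block size: b = 64 bits

def h(state: bytes, block: bytes, extra: bytes = b"") -> bytes:
    """Toy compression function (truncated SHA-256 as an ideal stand-in)."""
    return hashlib.sha256(state + block + extra).digest()[:N_BYTES]

def md_hash(iv: bytes, blocks: list) -> bytes:
    """Merkle-Damgard: the same compression function in every iteration,
    plus a final block encoding the message length (MD strengthening)."""
    x = iv
    for m in blocks:
        x = h(x, m)
    length_pad = len(blocks).to_bytes(B_BYTES, "big")
    return h(x, length_pad)

def haifa_hash(iv: bytes, blocks: list, salt: bytes = b"\x00") -> bytes:
    """HAIFA: the compression function also absorbs the number of message
    bits hashed so far and a salt, making every iteration distinct."""
    x = iv
    for i, m in enumerate(blocks, start=1):
        bits_so_far = (i * B_BYTES * 8).to_bytes(8, "big")
        x = h(x, m, bits_so_far + salt)
    length_pad = len(blocks).to_bytes(B_BYTES, "big")
    return h(x, length_pad, salt)
```

Because each HAIFA round depends on the position in the message, a precomputed block cannot be reused at a different offset, which is exactly what defeats position-reuse tricks such as the long-message second-preimage attack.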

### Related Works

Combiners have been studied in several settings, including generic attacks and security proofs. For generic attacks, the compression functions are modelled as random functions so as to devise attacks that do not exploit any weakness of the compression functions. Such attacks thus provide upper bounds on the security of the combiners.

Security proofs, which are more theoretical, focus on the notions of *robustness* and *security amplification*. A robust combiner is secure regarding property \(\alpha \) as long as one of the underlying hash functions is secure regarding \(\alpha \). Lines of research for those security notions include the study of advanced combiners in [13, 22,23,24,25, 30, 31, 39]. A series of studies on the minimum output length of robust combiners has shown that robust combiners for collision resistance and preimage resistance cannot have an output length significantly shorter than the sum of the output lengths of the underlying hash functions [5, 50, 51, 55] (more recent works include [45, 47]).

There are also some works that assume that the compression function is a *weak* random oracle (i.e., the attacker is given additional interfaces to receive random preimages of the compression functions) and prove that some constructions are still secure in this model [33, 40] or provide efficient attacks on the combiners by exploiting weakness of the underlying hash functions [11, 34].

#### Analysis of the Concatenation Combiner

The concatenation combiner \(\mathcal {H}_1(M) \Vert \mathcal {H}_2(M)\) (see Fig. 2) is probably the most well-known and most studied hash combiner. This combiner was described already in 1993 [53].

*Generic Attacks* In 2004, Joux [35] described surprising attacks on the concatenation of two narrow-pipe iterated hash functions using multi-collisions: while the output size is 2*n* bits, the concatenation combiner merely provides at most *n* / 2-bit security for collision resistance and *n*-bit security for preimage resistance^{Footnote 5} (see Sect. 2.1 for a description). In particular, the concatenation combiner is not security-amplifying. (It does not increase collision and preimage resistance.)
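Joux's observation is that \(2^t\) colliding messages cost only about \(t \cdot 2^{n/2}\) compression calls: chain *t* independent one-block collisions and pick either block at each position. A toy-scale sketch (our own names; a tiny 16-bit state so the birthday searches finish instantly):

```python
import hashlib
from itertools import product

N_BYTES = 2  # tiny 16-bit toy state so birthday collisions are found instantly
B_BYTES = 8

def h(state: bytes, block: bytes) -> bytes:
    """Toy narrow-pipe compression function."""
    return hashlib.sha256(state + block).digest()[:N_BYTES]

def find_collision(state: bytes):
    """Birthday search: two distinct blocks colliding from `state`
    (about 2^(n/2) compression calls)."""
    seen = {}
    ctr = 0
    while True:
        m = ctr.to_bytes(B_BYTES, "big")
        y = h(state, m)
        if y in seen:
            return seen[y], m, y  # seen[y] != m since every block is fresh
        seen[y] = m
        ctr += 1

def joux_multicollision(iv: bytes, t: int):
    """Chain t collisions: 2^t distinct t-block messages with one common
    final state, at a total cost of only about t * 2^(n/2)."""
    pairs, state = [], iv
    for _ in range(t):
        m0, m1, state = find_collision(state)
        pairs.append((m0, m1))
    messages = [b"".join(choice) for choice in product(*pairs)]
    return messages, state
```

For example, `joux_multicollision(b"\x00" * N_BYTES, 3)` returns \(2^3 = 8\) distinct messages that all iterate to the same internal state, which is the key step behind the attacks of [35] on the concatenation combiner.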

Following the results of Joux (which showed that the concatenation combiner does not increase collision and preimage resistance) and the later results of Kelsey and Schneier (which showed that the second-preimage resistance of the MD construction is less than \(2^n\)), a natural question is whether there exists a second-preimage attack on the concatenation combiner of MD hash functions that is faster than \(2^n\). Interestingly, the problem of devising such an attack remained open for a long time despite being explicitly mentioned in several papers including [19]. In fact, although the works of [35, 38] have attracted a significant amount of follow-up research on countermeasures against second-preimage attacks (such as Hash-Twice or dithered hash) and attacks that break them [1,2,3], there has been no progress concerning second-preimage attacks on the basic concatenation combiner. In this paper, we try to provide an answer to this question by devising the first second-preimage attack on the concatenation combiner of MD hash functions, which is faster than \(2^n\).

*Security Proof* From the theoretical side, the concatenation combiner is robust for collision resistance, e.g., a collision \(\mathcal {H}_1(M) \Vert \mathcal {H}_2(M) = \mathcal {H}_1(M') \Vert \mathcal {H}_2(M')\) implies \(\mathcal {H}_1(M) = \mathcal {H}_1(M')\) and \(\mathcal {H}_2(M) = \mathcal {H}_2(M')\).

Hoch and Shamir [33] evaluated the security of the concatenation combiner with two weak hash functions. More precisely, the two hash functions are narrow-pipe MD, and the compression functions are modelled as weak random oracles (as defined by Liskov [40]), i.e., the adversary is given additional interfaces to receive (random) preimages of the compression functions. They have proved that in this model, the concatenation combiner is still indifferentiable from a random oracle with *n* / 2-bit security, implying (at least) the same security bound for collision resistance and preimage resistance. The bound is matched by Joux’s attack for collisions, but there is a gap with Joux’s attack for preimages, with complexity \(2^{n}\), which might be interesting to investigate further.

*Analysis of Dedicated Instantiations* Mendel *et al.* analysed some dedicated instantiations of the concatenation combiner [48], in particular using the hash function MD5. We omit the details and refer interested readers to [48].

#### Analysis of the XOR Combiner

The XOR combiner \(\mathcal {H}_1(M) \oplus \mathcal {H}_2(M)\) (see Fig. 3) has received less analysis.

*Generic Attacks* To the best of our knowledge, no preimage attacks have been shown against the XOR combiner. Therefore, the preimage security of the XOR combiner against generic attacks is still an open problem and will be one of the main topics of our work.

*Security Proof* Theoretically, the XOR combiner is robust concerning PRF (pseudo-random function) and MAC (message authentication code) in the black-box reduction model [39]. Since the XOR combiner is length-preserving, from the conclusions regarding the minimum output length of robust combiners, it is not robust for collision resistance and preimage resistance. However, the work of Hoch and Shamir [33] actually proves the security of the XOR combiner as an intermediate result: it is also indifferentiable from a random oracle up to \(2^{n/2}\) queries in the weak random oracle model. In particular, this proves there are no generic attacks with complexity less than \(2^{n/2}\). For collision resistance, the bound is tight, since it is matched with the generic birthday attack bound. On the other hand, for preimage resistance, there exists a gap between the *n* / 2-bit proven bound and the *n*-bit expected ideal security bound. Note that the non-robustness result regarding preimage security does not imply that the XOR of two concrete hash functions is weak, and the simplicity and short output of this construction still make it quite attractive.

#### Analysis of Hash-Twice

Hash-Twice is a folklore hash construction that hashes a (padded) message twice, with the output of the first hash value as the value of the initialization vector of the second hash. In its original definition [2], the two underlying hash functions are identical, i.e., \(\mathcal {HT}(M) \triangleq \mathcal {H}(\mathcal {H}(IV, M), M) \); here, we consider a generalized version, where the underlying hash functions are independent, i.e., \(\mathcal {HT}(M) \triangleq \mathcal {H}_2(\mathcal {H}_1(IV, M), M) \) (see Fig. 4).

*Generic Attacks* Towards the three basic security requirements, a second-preimage attack on Hash-Twice (\(\mathcal {HT}(M) \triangleq \mathcal {H}(\mathcal {H}(IV, M), M) \)) has been published by Andreeva *et al.* in [2]. The attack is based on a herding attack, which exploits the diamond structure originally used in the herding attack on a single hash function [37] (see Sect. 2.3 for an introduction). The complexity of the attack is approximately \( 2^{(n + t)/2} + 2^{n - {\ell }} + 2^{n - t} \), where \( 2^t \) is the width of the diamond structure and \( 2^{\ell } \) is the length of the challenge.
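Balancing the terms of this complexity indicates the optimal diamond width (a standard back-of-the-envelope optimization, our own working rather than the original analysis): for long enough challenges the middle term \( 2^{n-\ell } \) is not dominant, and equating the first and last terms gives

```latex
2^{(n+t)/2} = 2^{n-t}
\;\Longrightarrow\; \frac{n+t}{2} = n - t
\;\Longrightarrow\; t = \frac{n}{3},
```

so the attack costs roughly \( 2^{2n/3} + 2^{n-\ell } \) computations.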

*Security Proof* To the best of our knowledge, there is no published formal proof regarding the security of Hash-Twice. However, we can argue that it is at least as secure as the underlying functions: a generic collision attack requires at least \( 2^{n/2} \) computations (because we need a collision in one of the compression functions); a preimage attack requires at least \( 2^{n} \) (because we need a preimage for the finalization function); and a second-preimage attack requires at least \( 2^{n/2} \) (because a second preimage implies a collision).

#### Analysis of the Zipper Hash

The Zipper hash has been proposed with the goal of constructing an ideal hash function from weak ideal compression functions. (By “weak ideal”, we mean that the compression function is vulnerable to strong forms of attack but is otherwise random.) Similar to Hash-Twice, it cascades two independent hash functions evaluating the same (padded) message. The difference is that the second hash processes the message blocks in reverse order, i.e., \( \mathcal {ZH} \triangleq \mathcal {H}_2(\mathcal {H}_1(IV_1, M), \overleftarrow{M}) \) (see Fig. 5). Note that the message is first padded and split into message blocks, which are then processed in forward and reverse order sequentially. Thus, the padded message block \( m_L \) is processed at the end of the first hash computation and at the beginning of the second hash computation, i.e., in the middle of the whole processing procedure. The padding scheme of Zipper was specified to be any injective function of the message [40]. In this paper, as for all other combiners, we take the length padding of the MD construction as the padding scheme.
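Both cascade combiners can be written in a few lines. The sketch below is a toy model of ours (arbitrary compression functions built from SHA-256, and the length padding omitted for brevity), intended only to make the difference in block ordering explicit:

```python
import hashlib

N_BYTES = 4  # toy 32-bit state

def h1(x: bytes, m: bytes) -> bytes:
    """Toy compression function of the first hash (domain-separated)."""
    return hashlib.sha256(b"1" + x + m).digest()[:N_BYTES]

def h2(x: bytes, m: bytes) -> bytes:
    """Toy compression function of the second hash (domain-separated)."""
    return hashlib.sha256(b"2" + x + m).digest()[:N_BYTES]

def iterate(h, x: bytes, blocks: list) -> bytes:
    for m in blocks:
        x = h(x, m)
    return x

def hash_twice(iv: bytes, blocks: list) -> bytes:
    """HT(M) = H2(H1(IV, M), M): the second pass reads M in the same order."""
    return iterate(h2, iterate(h1, iv, blocks), blocks)

def zipper_hash(iv: bytes, blocks: list) -> bytes:
    """ZH(M) = H2(H1(IV, M), reverse(M)): the second pass reads M backwards,
    so the last block m_L is processed twice in a row, in the middle."""
    return iterate(h2, iterate(h1, iv, blocks), list(reversed(blocks)))
```

Note that for a one-block message the two constructions coincide, since reversing a single block changes nothing; the differing block order only matters from two blocks onward.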

*Generic Attacks* To the best of our knowledge, no generic attacks on the Zipper hash regarding the three basic security notions have been shown. However, there are a number of works that consider other security notions, such as multi-collisions, herding attacks or attacks assuming weak compression functions. Examples include [2, 11, 32, 34, 49], some of which also consider the corresponding security of Hash-Twice.

*Security Proof* Zipper hash is proved ideal in the sense that the overall hash function is indistinguishable from a random oracle (up to the birthday bound) when instantiated with weak ideal compression functions. More precisely, its provable security is \( 2^{\min (b, n)/2} \) for collision resistance and \( 2^{\min (b,n)} \) for (second-) preimage resistance, where *b* is the size of the message block and *n* is the size of the internal state and the hash value (considering “narrow-pipe” design).

### Our Results

In this work, we study upper bounds on the security of these hash combiners by devising the best-known generic preimage attacks and second-preimage attacks with long challenges. We do not assume any weakness of the compression functions, i.e., they are accessed as black boxes in our attacks. The compression functions are chosen uniformly at random from all \((n+b)\)-bit to *n*-bit functions, which implies that our analysis applies to most compression functions. Table 1 summarizes the updated security status of various hash combiners after integrating our new results. It shows that the security of most combiners is not as high as commonly believed. Regarding certain basic security requirements, these combiners of two (or even more) *n*-bit hash functions fail to provide more security than (or even the same security as) that provided by a single *n*-bit ideal hash function.

Next, we briefly introduce our main attacks on combiners of two narrow-pipe hash functions and their computational complexities (ignoring the polynomial factors). In addition to the attacks summarized next, we also present improved but more complex attacks and apply our techniques to attack combiners of more than two hash functions.

#### Preimage Attacks on the XOR Combiner

We present several generic attacks:

- *A first attack with a best-case complexity of \( 2^{5n/6} \) computations obtained for messages of length \( 2^{n/3} \).* This attack involves a meet-in-the-middle procedure enabled by building a novel technical tool named *interchange structure*. This structure consists of a sequence of basic building modules named *switches* among different hash computation lanes, which break the pairwise dependency between the internal states of these hash computations on the same message. This attack is applicable to the XOR combiner with underlying hash functions following a wide range of iterated constructions (e.g., the classical MD construction and the more general HAIFA framework) and to Cryptophia’s short combiner [46, 47].

- *A second attack with a best-case complexity of \( 2^{2n/3} \) computations obtained for messages of length \( 2^{n/2} \).* This attack also involves a meet-in-the-middle procedure, but instead of using the interchange structure, it exploits properties of functional graphs of random mappings. The random mappings are generated by fixing the message block input to the underlying compression functions. The functional graph of such a mapping is formed by successive iteration of the mapping: its nodes are all possible input/output values, and its edges go from preimages to images. We exploit special nodes that are images of a large number of iterations of the mapping; we name them *deep iterates* because they are located at deep strata in the functional graph. By exploiting such deep iterates, the variability of the number of iterations provides extra freedom to find a linking message fragment mapping a pair of starting states to a predefined pair of states. When using this freedom, to overcome the hurdle set by the length padding, we construct a structure named *simultaneous expandable message*, which is a set of messages whose lengths cover a whole appropriate range and which are multi-collisions for both of the underlying hash functions. This attack achieves a trade-off of \(2^{n} \cdot L^{-2/3}\) between the maximal allowed message length *L* and the time complexity of the attack. This improves the trade-off of \(2^{n} \cdot L^{-1/2}\) obtained by the first, interchange structure-based attack. On the other hand, it only applies when both underlying hash functions use the MD construction.

- *An improvement upon the second attack with a best-case complexity of \( 2^{5n/8} \) obtained for messages of length \( 2^{5n/8} \).* In this attack, we exploit further special nodes in functional graphs of random mappings, called cyclic nodes, and we utilize a technique named *multi-cycles*. Linking pairs of states through cyclic nodes and allowing loops around the cycles make the attack more efficient; the improvement in complexity over the second attack is a factor of \( 2^{n/24} \). We point out that this attack has the limitation that the length of the message must be at least \(2^{n/2}\) blocks. Therefore, its practical impact is limited and its main significance is theoretical. This attack shows that the security level of the XOR combiner regarding preimage resistance is quite close to the provable security level \( 2^{n/2} \), which is also the level of collision resistance.
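The notion of a deep iterate can be illustrated at toy scale. In the sketch below (our own toy model: a 16-bit state and a truncated-SHA-256 compression function), we fix a message block *m*, iterate \( f(x) = h(x, m) \) from many random start points, and collect the values reached after many steps; because chains in a random functional graph merge quickly, these deep iterates have many ancestors and are therefore easy to reach again from a fresh starting point:

```python
import hashlib
import random

N_BITS = 16
N = 1 << N_BITS  # toy domain size N = 2^n

def h(state: int, block: bytes) -> int:
    """Toy compression function on a 16-bit state."""
    s = state.to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(s + block).digest()[:2], "big")

def deep_iterates(block: bytes, depth: int, n_chains: int) -> set:
    """Fix a block, iterate f(x) = h(x, block) from random start points,
    and return the values reached after `depth` steps (the deep iterates)."""
    f = lambda x: h(x, block)
    deep = set()
    for _ in range(n_chains):
        x = random.randrange(N)
        for _ in range(depth):
            x = f(x)
        deep.add(x)
    return deep

random.seed(1)
d = deep_iterates(b"m", depth=256, n_chains=64)
```

With `depth` around \( \sqrt{N} \), distinct chains typically collapse onto far fewer than `n_chains` deep iterates; it is this ease of hitting a chosen deep iterate, with a flexible number of steps, that provides the extra freedom exploited in the linking phase of the attacks.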

These attacks on the XOR combiner show a rather surprising result—regarding preimage resistance, the sum of two *n*-bit iterated hash functions can never provide *n*-bit security. In general, the sum is *weaker* than each of the two hash functions.

#### Second-Preimage Attack on the Concatenation Combiner

We describe the first generic second-preimage attack faster than \( 2^n \) on the concatenation combiner of two MD hash functions. The general framework follows that of the long-message second-preimage attack on a single MD hash [38]. The attack beats \( 2^n \) computations by overcoming two main challenges. The first is to overcome the length padding; we solve this by using the *simultaneous expandable message*. The second is to speed up the connection from a crafted message to the challenge message on chaining states. In the case of hash combiners, one must connect to the challenge message on *a pair of* *n*-bit states, while in the second-preimage attack on a single MD hash, one needs only to connect on a single *n*-bit state. Thus, the task is essentially to reach a 2*n*-bit state (in a set of *L* states, where *L* is the message length in blocks, and thus \( L < 2^n \)) faster than \( 2^n \) computations. We solve this by again exploiting deep iterates in functional graphs. In this attack, we choose a pair of deep iterates as target chaining states, so that the connection from our crafted message to the challenge message is more efficient. Indeed, this attack is closely related to our deep iterate-based preimage attack on the XOR combiner.
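The simultaneous expandable message generalizes the classical expandable message of Kelsey and Schneier to two hash functions at once. As background, a minimal single-function version can be sketched as follows (a toy model of ours with a 16-bit state, not the simultaneous construction itself): for \( i = 0, \ldots, k-1 \), collide a 1-block message with a \( (2^i + 1) \)-block message, so that afterwards a message of any length in \( [k, k + 2^k - 1] \) blocks reaching the same state can be assembled:

```python
import hashlib

N_BYTES = 2  # tiny 16-bit toy state
B_BYTES = 8

def h(state: bytes, block: bytes) -> bytes:
    """Toy narrow-pipe compression function."""
    return hashlib.sha256(state + block).digest()[:N_BYTES]

def h_star(state: bytes, blocks: list) -> bytes:
    for m in blocks:
        state = h(state, m)
    return state

def collide_short_long(state: bytes, long_len: int):
    """Find a 1-block message and a long_len-block message that collide
    when hashed from `state` (birthday search over the final block)."""
    prefix = [b"\x00" * B_BYTES] * (long_len - 1)
    mid = h_star(state, prefix)
    shorts = {}
    ctr = 0
    while True:
        m = ctr.to_bytes(B_BYTES, "big")
        shorts[h(state, m)] = m  # candidate 1-block messages
        y = h(mid, m)            # candidate final block of the long message
        if y in shorts:
            return shorts[y], prefix + [m], y
        ctr += 1

def expandable_message(iv: bytes, k: int):
    """After k collision searches, we hold messages of every length in
    [k, k + 2^k - 1] blocks, all reaching the same internal state."""
    pieces, state = [], iv
    for i in range(k):
        short, long_, state = collide_short_long(state, 2 ** i + 1)
        pieces.append((short, long_))
    return pieces, state

def select_length(pieces: list, L: int) -> list:
    """Assemble the message of exactly L blocks, k <= L <= k + 2^k - 1:
    piece i contributes 1 block or 2^i extra blocks, chosen by the
    binary expansion of L - k."""
    extra = L - len(pieces)
    msg = []
    for i, (short, long_) in enumerate(pieces):
        msg += long_ if extra & (1 << i) else [short]
    return msg
```

The simultaneous variant used in our attacks builds an analogous ladder whose pieces collide for *both* compression functions at once, which is what lets the crafted second preimage match the challenge's exact length despite the MD strengthening.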

We obtain a trade-off between the complexity of the attack and the length of the challenge message. This second-preimage attack is faster than \(2^n\) for input messages of length at least^{Footnote 6}\(2^{2n/7}\). The best-case complexity^{Footnote 7} is \(2^{3n/4}\), obtained for (very) long challenges of length \(2^{3n/4}\). Again, due to these constraints, the practical impact of our second-preimage attack is limited and its main significance is theoretical. Namely, it shows that regarding second-preimage resistance, the concatenation of two *n*-bit MD hash functions is not as strong as a single *n*-bit ideal hash function.

#### Second-Preimage Attack on the Zipper Hash

We show the first generic second-preimage attack on the Zipper hash. This attack combines multiple tools, including Joux’s multi-collision, the simultaneous expandable message, and deep iterates and multi-cycles in functional graphs of random mappings. The general framework is similar to that of the above attacks on combiners of MD hashes. However, certain features of the Zipper hash specification allow the attacker to choose an optimal configuration of the message length and to launch a more efficient meet-in-the-middle connecting procedure. The best-case complexity of this attack is \( 2^{3n/5} \), obtained for challenge messages of length \( 2^{2n/5} \). This result shows that the combination of two MD hash functions using the Zipper hash can be vulnerable to second-preimage attacks with long challenges.

#### Second-Preimage Attack on Hash-Twice

We give an improved second-preimage attack on Hash-Twice. This attack also combines multiple tools, including Joux’s multi-collision, the diamond structure, the interchange structure, the simultaneous expandable message, and deep iterates and multi-cycles in functional graphs. Like our other functional graph-based attacks, it improves on a previous attack from [2] thanks to the efficiency gained by exploiting special nodes in functional graphs. The best-case complexity of this attack is \( 2^{13n/22} \), obtained for challenge messages of length \( 2^{13n/22} \). This attack shows that regarding second-preimage resistance, hashing a message twice using two Merkle–Damgård hash functions does not provide much more security than hashing the message only once.

Finally, we highlight the technical interest of this paper. We believe that the tools introduced here—the interchange structure, the simultaneous expandable message, and deep iterates and multi-cycles in random functional graphs—are important technical advances and will hopefully find further applications or lead to new technical developments in related settings. In particular, we point out that the interchange structure can be used to optimize the complexity of the functional graph-based attacks on the XOR combiner, the concatenation combiner and Hash-Twice. Although this does not lead to a large improvement, it further demonstrates the wide applicability of this structure in the cryptanalysis of hash combiners.

### Notations and Roadmap of the Rest of Paper

#### Notations

We summarize below notations shared across various attacks.

| Notation | Meaning |
| --- | --- |
| \(\mathcal {H}_1\), \(\mathcal {H}_2\) | Underlying hash functions in a hash combiner |
| \(IV_1\), \(IV_2\) | Initialization vectors of \(\mathcal {H}_1\) and \(\mathcal {H}_2\), respectively |
| \(h_1\), \(h_2\) | Compression functions of \(\mathcal {H}_1\) and \(\mathcal {H}_2\), respectively |
| \(h_1^*\), \(h_2^*\) | Compression functions iterated over several blocks (in particular, \(\mathcal {H}_i(\varvec{M}) = h_i^*(IV_i, \varvec{M})\) for \( i \in \{1,2\} \)) |
| \(V\) | Targeted image |
| m | Message block |
| M | Message chunk |
| \([m]^{q}\) | Message chunk formed by concatenating q copies of the message block m, with \( [m] = [m]^1 \) |
| \(M_{\Vert q}\) | Message chunk of q message blocks |
| \(\varvec{M}= m_1 \Vert \ldots \Vert m_L \) | Target message or computed preimage (of L message blocks) |
| L | Length of \( \varvec{M}\) (measured in the number of blocks) |
| \({\ell }\) | The binary logarithm of the length of the message \( \varvec{M}\), i.e., \( L = 2^{\ell } \) |
| \( \varvec{M}' \) | Computed second preimage |
| \(L'\) | Length of \( \varvec{M}' \) (measured in the number of blocks) |
| \( a_0,\ldots ,a_L \) | Sequence of internal states computed during the invocation of \( h_1 \) on \(\varvec{M}\), with \( a_0 = IV_1 \) |
| \( b_0,\ldots ,b_L \) | Sequence of internal states computed during the invocation of \( h_2 \) on \(\varvec{M}\), with \( b_0 = IV_2 \) |
| x, y | Computed internal states |
| \(\vec {a}_j,\vec {b}_k\) | Chains of internal states for \(\mathcal {H}_1\) and \(\mathcal {H}_2\), respectively; \(\vec {a}_j\) denotes a generic chain, while \(\vec {a}_{j_{0}}\) denotes a particular chain |
| \(A_j,B_k\) | End points (final states) of the chains |
| n | Bit size of the output of each underlying hash function (\( \mathcal {H}_1 \) and \( \mathcal {H}_2 \)); in addition, we suppose their compression functions \( h_1 \) and \( h_2 \) have n-bit internal states (i.e., the underlying hash functions are narrow-pipe) |
| b | Bit size of a message block |
| N | The considered random mappings are from a finite N-set domain to a finite N-set range, with \( N = 2^n \) |
| \( \mathcal {FG}_{f_1} \), \( \mathcal {FG}_{f_2} \) | The functional graphs of the random mappings \( f_1(\cdot ) \triangleq h_1(\cdot , m) \) and \( f_2(\cdot ) \triangleq h_2(\cdot , m) \) generated by fixing a message block m as input to the compression functions |
| \( \mathcal {M} \) | A set of messages |
| \( \mathcal {M}_\mathtt{MC/EM/SEM/DS/IS} \) | The set of messages in a standard Joux multi-collision, an expandable message, a simultaneous expandable message, a diamond structure, or an interchange structure, respectively |
| \( x \xrightarrow {m} x' \), \( x \xrightarrow {\hat{M}} x' \) | We say that m (resp. \( \hat{M} \)) maps state x to state \( x' \) if \( x'=h(x,m) \) (resp. \( x'=h^{*}(x,\hat{M}) \)), where the compression function \( h \) is clear from the context |
| \( h_{[m]} \) | An n-bit random mapping obtained by feeding an arbitrary fixed message block m into a compression function h with an n-bit state |
| \( \tilde{O} \) | Soft-O, a variant of big-O notation that ignores logarithmic factors; thus, \( f(n) \in \tilde{O}(g(n)) \) is shorthand for \( \exists k: f(n) \in O(g(n)\log ^k g(n)) \) |

#### Roadmap

Section 2 exhibits preliminaries, including generic tools and known attacks on hash constructions. In particular, Sects. 2.5, 2.6 and 2.7 present the new tools (the interchange structure, the simultaneous expandable message, and deep iterates and multi-cycles in random functional graphs) that make the presented attacks possible. Sections 3, 4 and 5 illustrate preimage attacks on the XOR combiner using the interchange structure, deep iterates and multi-cycles in functional graphs, respectively. Sections 6, 7 and 8 describe the second-preimage attacks on the concatenation combiner, the Zipper hash and Hash-Twice, respectively. Within the description of each attack, we provide an overview followed by detailed steps. In Sect. 9, we discuss further applications and extensions of the proposed attacks, including applications to combiners of wide-pipe hash functions and extensions to combinations of more than two hash functions. Section 10 summarizes the attacks presented in this paper and discusses open problems.

## Preliminaries

In this section, we introduce the technical tools and general concepts used in our attacks. Those tools briefly introduced in Sect. 2.1 through Sect. 2.4 are all existing and well-known tools. Those tools described in detail in Sect. 2.5 through Sect. 2.7 are our new tools exploited in different attacks.

### Joux’s Multi-collision (MC) and Its Applications in Attacks on the Concatenation Combiner [35]

In 2004, Joux introduced multi-collisions on narrow-pipe Merkle–Damgård hash functions. Given a hash function \(\mathcal {H}\), a multi-collision refers to a set of messages \(\mathcal {M} =\{M_1, M_2, \ldots \}\) whose hash digests are all the same, i.e., \(\mathcal {H}(M_i)=\mathcal {H}(M_j)\) for any pair \(M_i,M_j \in \mathcal {M} \). The computational complexity of a generic brute-force search increases exponentially as the target size \(|\mathcal {M}|\) increases; more precisely, it is approximately \(2^{(|\mathcal {M}|-1) \cdot n/|\mathcal {M}|}\). Utilizing the iterative nature of the Merkle–Damgård structure, Joux’s algorithm (see Algorithm 1, whose pseudo-code is given in Appendix A) is able to find a multi-collision of size \(2^t\) with a complexity of \(t\cdot 2^{n/2}\), i.e., a complexity not much greater than that of finding a single collision.

It is trivial to see that the message set \(\mathcal {M} = \{\overline{m}_1 \Vert \overline{m}_2 \Vert \cdots \Vert \overline{m}_t ~~|~~ \overline{m}_i=m_i\text { or }m'_i \text { for }i = 1,2, \ldots , t\}\) forms a multi-collision of size \(2^t\), and the overall complexity is \(\mathcal {O}(t\cdot 2^{n/2})\). Moreover, a data structure \( \mathcal {M}_\mathtt{MC}\) of *t* pairs of message blocks fully defines the set \( \mathcal {M} \) of \( 2^t \) colliding messages.

With Joux’s multi-collision at hand, one can immediately deploy a collision attack and a preimage attack on concatenation combiner with complexities \( n\cdot 2^{n/2} \) and \( n\cdot 2^n \), respectively. The collision attack goes as follows: first, build a \( 2^{n/2} \)-Joux’s multi-collision for one of the underlying hash functions (iterated), and then, exploit the messages in the structure to launch a birthday attack for the other hash function to find a collision among the outputs. The preimage attack follows a similar framework (see [7] for an illustration and Joux’s original paper [35] for more details).
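The two steps of the collision attack can be sketched at toy scale. The following Python sketch is a hedged illustration, not the paper's algorithm verbatim: it uses hypothetical 16-bit narrow-pipe compression functions built from truncated SHA-256 (`make_h` and the parameters are ours), builds a \(2^t\)-Joux multi-collision for \(\mathcal {H}_1\), and then mounts the birthday step under \(\mathcal {H}_2\).

```python
import hashlib
from itertools import product

def make_h(tag):
    """Hypothetical toy 16-bit narrow-pipe compression function."""
    def h(state, block):
        data = tag + state.to_bytes(4, "big") + block.to_bytes(8, "big")
        return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")
    return h

h1, h2 = make_h(b"H1"), make_h(b"H2")

def h_star(h, state, blocks):
    """Iterate the compression function over a sequence of blocks."""
    for m in blocks:
        state = h(state, m)
    return state

def find_collision(h, state):
    """Birthday search: two distinct blocks colliding from `state`."""
    seen, m = {}, 0
    while True:
        x = h(state, m)
        if x in seen:
            return seen[x], m
        seen[x] = m
        m += 1

# Step 1: a 2^t Joux multi-collision for H1 (t successive collision searches;
# t = 10 > n/2 is chosen so the birthday step below succeeds at this toy size).
IV1, IV2, t = 0, 1, 10
state, pairs = IV1, []
for _ in range(t):
    m0, m1 = find_collision(h1, state)
    pairs.append((m0, m1))
    state = h1(state, m0)

# Step 2: birthday search among the 2^t colliding messages, under H2.
seen, collision = {}, None
for bits in product((0, 1), repeat=t):
    msg = tuple(p[i] for p, i in zip(pairs, bits))
    y = h_star(h2, IV2, msg)
    if y in seen and seen[y] != msg:
        collision = (seen[y], msg)
        break
    seen[y] = msg

assert collision is not None, "unlucky toy instance; retry with larger t"
M1, M2 = collision
# M1 != M2 collide under H1 (by construction) and under H2 (by birthday).
assert h_star(h1, IV1, M1) == h_star(h1, IV1, M2)
assert h_star(h2, IV2, M1) == h_star(h2, IV2, M2)
```

At this toy size the whole attack costs a few thousand compression calls, matching the \( n\cdot 2^{n/2} \) order of magnitude stated above (here \( 2^{n/2} = 2^8 \)).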

Since its invention, Joux’s multi-collision has been employed in numerous cryptanalytic results on hash functions, including the most relevant ones recalled below, as well as works such as [32, 49].

### Expandable Message (EM) and the Long Message Second-Preimage Attack [38]

In [14], Dean devised a second-preimage attack for long messages on specific Merkle–Damgård hash functions for which it is easy to find fixed points in their compression function. Given a challenge message \(\varvec{M}= m_1\Vert m_2\Vert \ldots \Vert m_L\), the attacker computes the sequence of internal states \(a_0,a_1,\ldots ,a_L\) generated during the invocation of the compression function on \(\varvec{M}\). A simplified attack would now start from the state \(x_0 = IV\) and evaluate the compression function with arbitrary message blocks until a collision \(h(x_0,m) = a_p\) is found for some message block *m* and index *p*. The attacker can now append the message suffix \(m_{p+1}\Vert \ldots \Vert m_L\) to *m*, hoping to obtain the target hash value \(\mathcal {H}(\varvec{M})\). However, this approach does not work due to the final padding of the message length, which will be different if the message prefixes are of different lengths. The solution of Dean was to compute an *expandable message* that consists of the initial state \(x_0\) and another state \(\hat{x}\) such that for each length \(\kappa \) (in some range), there is a message \(M_{\Vert \kappa }\) of \(\kappa \) blocks that maps \(x_0\) to \(\hat{x}\). Thus, the algorithm first finds a collision \(h(\hat{x},m) = a_p\), and the second preimage is computed as \(M_{\Vert p-1}\Vert m\Vert m_{p+1}\Vert \ldots \Vert m_L\). The assumption that it is easy to find fixed points in the compression function is used in efficient construction of the expandable message.

In [38], Kelsey and Schneier described a more generic attack that uses multi-collisions of a special form to construct an expandable message, removing the restriction of Dean regarding fixed points. As in Joux’s original algorithm, the multi-collisions are constructed iteratively in *t* steps. In the *i*th step, we find a collision between some \(m_i\) and \(m'_i\) such that \(|m_i|=1\) (it is a single block) and \(|m'_i|=2^{i-1}+1\), namely \(h(x_{i-1},m_i) = h(x_{i-1},m'_i)\). This is done by firstly picking an arbitrary prefix of size \(2^{i-1}\) of \(m'_i\) denoted by \( \hat{m}_i \), say \( [0]^{2^{i-1}} \), computing \(h(x_{i-1}, \hat{m}_i) = x'_i\) and then looking for a collision \(h(x_{i-1},m_i) = h(x'_i,\check{m}_i)\) using a final block \(\check{m}_i\) (namely \(m'_i = \hat{m}_i \Vert \check{m}_i\)) (see Fig. 6).

The construction of Kelsey and Schneier gives an expandable message that can be used to generate messages starting from \(x_0\) and reaching \(\hat{x} = x_t\) whose (integral) sizes are in the interval \([t,2^t+t-1]\). (Such a message structure is denoted as a \((t,2^t+t-1)\)-expandable message.) A message of length \(t \le \kappa \le 2^t+t-1\) is generated by looking at the *t*-bit binary representation of \(\kappa -t\). In iteration \(i \in \{1,2,\ldots ,t\}\), we select the long message fragment \(m'_i\) if the *i*th LSB of \(\kappa -t\) is set to 1 (otherwise, we select the single block \(m_i)\). In the sequel, we denote this type of expandable message by \( \mathcal {M}_\mathtt{EM}\). Given that the challenge message \(\varvec{M}\) is of \(L \le 2^{n/2}\) blocks, the construction of the expandable message in the first phase of the attack requires less than \(n \cdot 2^{n/2}\) computations, while obtaining the collision with one of the states computed during the computation of \(\varvec{M}\) requires approximately \(1/L \cdot 2^{n}\) computations according to the birthday paradox.
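Both the construction and the length-selection rule above can be made concrete at toy scale. The sketch below is a hedged illustration under our own assumptions (a hypothetical 16-bit compression function built from truncated SHA-256; helper names are ours): it builds a \((t,2^t+t-1)\)-expandable message and extracts a message of any length in that range.

```python
import hashlib

def h(state, block):
    """Hypothetical toy 16-bit compression function (truncated SHA-256)."""
    data = state.to_bytes(4, "big") + block.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")

def h_star(state, blocks):
    for m in blocks:
        state = h(state, m)
    return state

def cross_collision(x, y):
    """Blocks m, c with h(x, m) == h(y, c), for two starting states."""
    table = {h(x, m): m for m in range(1 << 10)}
    c = 0
    while h(y, c) not in table:
        c += 1
    return table[h(y, c)], c

def expandable_message(x0, t):
    """Kelsey-Schneier (t, 2^t + t - 1)-expandable message from x0."""
    frags, x = [], x0
    for i in range(1, t + 1):
        prefix = [0] * 2 ** (i - 1)        # fixed 2^(i-1)-block prefix of m'_i
        m, c = cross_collision(x, h_star(x, prefix))
        frags.append(([m], prefix + [c]))  # lengths 1 and 2^(i-1) + 1
        x = h(x, m)
    return frags, x                        # x is the common endpoint \hat{x}

def select(frags, kappa):
    """A kappa-block message mapping x0 to \\hat{x}, t <= kappa <= 2^t+t-1."""
    t = len(frags)
    msg = []
    for i in range(t):                     # i-th LSB of kappa - t picks m'_i
        msg += frags[i][(kappa - t) >> i & 1]
    return msg

t, x0 = 5, 7
frags, xhat = expandable_message(x0, t)
for kappa in (t, t + 3, 2 ** t + t - 1):
    msg = select(frags, kappa)
    assert len(msg) == kappa and h_star(x0, msg) == xhat
```

Selecting the long fragment in step *i* adds exactly \(2^{i-1}\) extra blocks, so the binary representation of \(\kappa - t\) yields exactly \(\kappa \) blocks in total.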

### Diamond Structure (DS) [37]

Like Joux’s multi-collisions and the expandable message, the diamond structure is also a type of multi-collision. The difference is that instead of mapping a common starting state to a final state, each message in a diamond maps a different state to a final state. A \( 2^t \)-diamond contains \( 2^t \) specially structured messages mapping \( 2^t \) starting states to a final state, and it forms a complete binary tree of depth *t*. The \( 2^t \) starting states are leaves, and the final state is the root. A \( 2^t \)-diamond can be built by launching several collision attacks requiring about \( \sqrt{t}\cdot 2^{\frac{(n+t)}{2}} \) messages and \( n\cdot \sqrt{t}\cdot 2^{\frac{(n+t)}{2}} \) computations in total [9]. In the sequel, we denote the set of messages in a diamond by \( \mathcal {M}_\mathtt{DS}\). The diamond was originally introduced by Kelsey and Kohno to devise herding attacks against MD hash functions [37], in which the attacker first commits to the digest value of a message using the root of his diamond and later “herds” any given prefix of a message to his commitment by choosing an appropriate message from his diamond as the suffix. Later, Andreeva *et al.* successfully exploited it to launch herding and/or second-preimage attacks beyond MD hash constructions, such as the dithered hash, Hash-Twice, the Zipper hash and hash trees [1,2,3]. Concretely, the second-preimage attack on Hash-Twice in [2] leverages techniques from the herding attack and from the above-mentioned second-preimage attack. One key point of this attack is that it builds a long Joux’s multi-collision in the first pass, exploits messages in this multi-collision to build a diamond structure in the second pass and finally uses the diamond as a connector to connect one crafted message to the challenge message on some states. Let \( 2^t \) be the width of the diamond and \( 2^{\ell } \) be the length of the message; the complexity of this attack is approximately \( 2^{(n + t)/2} + 2^{n - \ell } + 2^{n - t} \).
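The tree shape of a diamond can be illustrated at toy scale. The sketch below is a hedged simplification under our own assumptions: it pairs up the states at each level and finds one cross-collision per pair (the naive variant, more expensive than the optimized \( \sqrt{t}\cdot 2^{(n+t)/2} \) construction of [9], but with the same herding property), using a hypothetical 16-bit compression function.

```python
import hashlib

def h(state, block):
    """Hypothetical toy 16-bit compression function (truncated SHA-256)."""
    data = state.to_bytes(4, "big") + block.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")

def h_star(state, blocks):
    for m in blocks:
        state = h(state, m)
    return state

def cross_collision(x, y):
    """Blocks mx, my with h(x, mx) == h(y, my)."""
    table = {h(x, m): m for m in range(1 << 10)}
    c = 0
    while h(y, c) not in table:
        c += 1
    return table[h(y, c)], c

def build_diamond(leaves):
    """Herd 2^t distinct states to a single root, level by level."""
    cur = {s: (s, []) for s in leaves}      # leaf -> (current node, suffix)
    level = list(leaves)
    while len(level) > 1:
        nxt, step = [], {}
        for x, y in zip(level[::2], level[1::2]):
            mx, my = cross_collision(x, y)  # h(x, mx) == h(y, my)
            step[x], step[y] = mx, my
            nxt.append(h(x, mx))
        for leaf, (node, suf) in cur.items():
            cur[leaf] = (h(node, step[node]), suf + [step[node]])
        level = nxt
    return level[0], {leaf: suf for leaf, (_, suf) in cur.items()}

leaves = [11, 22, 33, 44, 55, 66, 77, 88]   # 2^3 arbitrary starting states
root, suffix = build_diamond(leaves)
# Every leaf is herded to the root by its own t = 3-block suffix.
assert all(h_star(s, suffix[s]) == root for s in leaves)
```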

### Distinguished Points (DP)

The memory complexity of many algorithms that are based on functional graphs (e.g., parallel collision search [56]) can be reduced by utilizing the *distinguished points* method (which is attributed to Ron Rivest). Assume that our goal is to detect a collision of a chain (starting from an arbitrary node) with the nodes of \(\mathcal {G}\) computed in Algorithm 5, but without storing all the \(2^{t}\) nodes in memory. The idea is to define a set of \(2^{t}\) distinguished points (nodes) using a simple predicate (e.g., the \(n-t\) LSBs of a node are zero). The nodes of \(\mathcal {G}\) contain approximately \(2^{t} \cdot 2^{t-n} = 2^{2t-n}\) distinguished points, and only they are stored in memory. A collision of an arbitrary chain with \(\mathcal {G}\) is expected to occur at depth of about \(2^{n-t}\) and will be detected at the next distinguished point which is located (approximately) after traversing additional \(2^{n-t}\) nodes. Consequently, we can detect the collision with a small overhead in time complexity, but a significant saving factor of \(2^{n-t}\) in memory.
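The memory saving can be demonstrated at toy scale. The sketch below is a hedged illustration under our own assumptions (\(n = 16\), \(t = 12\), a stand-in random mapping built from truncated SHA-256): chains are developed and only distinguished points are stored; a fresh chain's merge with the developed graph is then detected at the next distinguished point.

```python
import hashlib

N_BITS, T = 16, 12
MASK = (1 << (N_BITS - T)) - 1           # distinguished: n - t low bits zero

def f(x):
    """Stand-in random mapping on 16-bit values (truncated SHA-256)."""
    return int.from_bytes(hashlib.sha256(x.to_bytes(3, "big")).digest()[:2],
                          "big")

def is_dp(x):
    return x & MASK == 0

# Develop ~2^T graph nodes in chains, storing only distinguished points;
# each chain is stopped (and a fresh one started) at a distinguished point.
stored, developed, seed, x = set(), 0, 1, 1
while developed < 1 << T:
    x = f(x)
    developed += 1
    if is_dp(x):
        stored.add(x)
        seed += 1
        x = seed
while not is_dp(x):                      # close the last, partial chain
    x = f(x)
stored.add(x)

# A fresh chain merges with the developed graph after ~2^(n-T) steps; the
# merge is only *detected* at the next distinguished point, ~2^(n-T) later.
y, steps = 0xBEEF, 0
while not (is_dp(y) and y in stored):
    y = f(y)
    steps += 1
    assert steps < 1 << N_BITS, "chain cycled before meeting a stored DP"

# Memory holds ~2^(2T-n) distinguished points instead of 2^T developed nodes.
assert len(stored) * 4 < developed
```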

Interestingly, in the specific attack of Sect. 4, the distinguished points method is essential for reducing the time complexity of the algorithm.

### Interchange Structure (IS)

In this subsection, we present how to build a structure that enables us to simultaneously control two (or more) hash computation lanes sharing the same input message and succeed in further relaxing the pairwise relation between the internal states of computation lanes. We name the structure the *interchange structure*.

The main idea is to consider several chains of internal states reached by processing a common message \(\varvec{M}\) from different starting points. (Note that the message \(\varvec{M}\) is not fixed in advance, but will be determined when building the structure.) More precisely, the message \(\varvec{M}\) is denoted as the *primary* message and divided into several chunks: \(\varvec{M}= M_0\Vert M_1\Vert \ldots \). (As discussed later, a chunk consists of approximately \(n/2\) message blocks.) We denote chains of internal states for \(\mathcal {H}_1\) as \(\vec {a}_j\) and the individual states of the chain as \(\vec {a}_j^{i}\), with \(h^*_1(\vec {a}_j^{i}, M_i) = \vec {a}_j^{i+1}\). Similarly, we denote chains for \(\mathcal {H}_2\) as \(\vec {b}_k\), with \(h^*_2(\vec {b}_k^{i}, M_i) = \vec {b}_k^{i+1}\). When considering both hash functions, message chunk \(M_i\) leads from the pair of states \((\vec {a}_j^{i},\vec {b}_k^{i})\) to \((\vec {a}_j^{i+1},\vec {b}_k^{i+1})\), which is denoted as

\( (\vec {a}_j^{i},\vec {b}_k^{i}) \xrightarrow {M_i} (\vec {a}_j^{i+1},\vec {b}_k^{i+1}). \)

#### Switch Structure

To construct a desired interchange structure, we first create the basic building blocks to jump between chains in a controlled way; we named them *switches*. A switch allows to jump from a specific pair of chains \((\vec {a}_{j_0}, \vec {b}_{k_0})\) to a different pair of chains \((\vec {a}_{j_0}, \vec {b}_{k_1})\) using a secondary message chunk \(M_i'\), in addition to the normal transitions using chunk \(M_i\) of the primary message \(\varvec{M}\):

\( (\vec {a}_{j_0}^{i}, \vec {b}_{k_0}^{i}) \xrightarrow {M_i'} (\vec {a}_{j_0}^{i+1}, \vec {b}_{k_1}^{i+1}), \quad \text {while} \quad (\vec {a}_{j}^{i}, \vec {b}_{k}^{i}) \xrightarrow {M_i} (\vec {a}_{j}^{i+1}, \vec {b}_{k}^{i+1}) \text { for every pair of chains.} \)

To simplify the notation, we often omit the chunk index to show only the chains that are affected by the switch.

The main message chunk \(M_i\) and the secondary message chunk \(M_i'\) are determined when building the switch, and the main message defines the next state of all the chains. We note that the secondary message chunk \(M'_i\) should only be used when the state is \((\vec {a}_{j_0}^{i}, \vec {b}_{k_0}^{i})\). A simple example is depicted in Fig. 7.

Alternatively, a switch can be designed to jump from \((\vec {a}_{j_0}, \vec {b}_{k_0})\) to \((\vec {a}_{j_1}, \vec {b}_{k_0})\). It can be built with a complexity of \(\tilde{O}(2^{n/2})\).

We now explain how to build the switch structure at the core of some of our attacks. This construction is strongly based on the multi-collision technique of Joux presented in Sect. 2.1.

Given states \(\vec {a}_{j_0}^i\), \(\vec {b}_{k_0}^i\) and \(\vec {b}_{k_1}^i\), we want to build message chunks \(M_i\) and \(M_i'\) in order to have the following transitions:

\( (\vec {a}_{j_0}^{i}, \vec {b}_{k_0}^{i}) \xrightarrow {M_i'} (\vec {a}_{j_0}^{i+1}, \vec {b}_{k_1}^{i+1}) \quad \text {and} \quad (\vec {a}_{j}^{i}, \vec {b}_{k}^{i}) \xrightarrow {M_i} (\vec {a}_{j}^{i+1}, \vec {b}_{k}^{i+1}) \text { for all other pairs of chains.} \)

The main message chunk \(M_i\) is used to define the next state of all the remaining chains, while the secondary message chunk \(M_i'\) will be used to jump from chains \((\vec {a}_{j_0}, \vec {b}_{k_0})\) to \((\vec {a}_{j_0}, \vec {b}_{k_1})\). We note that \(M_i'\) will only be used when the state is \((\vec {a}_{j_0}^{i},\vec {b}_{k_0}^{i})\). In particular, \(M_i\) and \(M_i'\) must satisfy the following:

\( h_1^*(\vec {a}_{j_0}^{i}, M_i') = h_1^*(\vec {a}_{j_0}^{i}, M_i) = \vec {a}_{j_0}^{i+1} \quad \text {and} \quad h_2^*(\vec {b}_{k_0}^{i}, M_i') = h_2^*(\vec {b}_{k_1}^{i}, M_i) = \vec {b}_{k_1}^{i+1}. \)

The full building procedure is shown in Algorithm 2 whose pseudo-code is given in Appendix A; it requires approximately \(n/2 \cdot 2^{n/2}\) evaluations of the compression functions.
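The core idea of the switch-building procedure can be sketched at toy scale. The code below is a hedged illustration under our own assumptions (hypothetical 16-bit compression functions; the chunk length \(t = 10\) is chosen for the toy state size, where the real construction uses roughly \(n/2\) blocks): a Joux multi-collision from \(\vec {a}_{j_0}^i\) keeps the \(h_1\) chain fixed, and a birthday match across \(\vec {b}_{k_0}^i\) and \(\vec {b}_{k_1}^i\) selects the main and secondary chunks.

```python
import hashlib
from itertools import product

def make_h(tag):
    """Hypothetical toy 16-bit compression function (truncated SHA-256)."""
    def h(state, block):
        data = tag + state.to_bytes(4, "big") + block.to_bytes(8, "big")
        return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")
    return h

h1, h2 = make_h(b"1"), make_h(b"2")

def h_star(h, s, blocks):
    for m in blocks:
        s = h(s, m)
    return s

def self_collision(h, s):
    """Two distinct blocks colliding from state s under h."""
    seen, m = {}, 0
    while True:
        x = h(s, m)
        if x in seen:
            return seen[x], m
        seen[x] = m
        m += 1

def build_switch(a, b0, b1, t=10):
    # Joux multi-collision for h1 from a: all 2^t messages reach a_next,
    # so the h1 chain is unaffected by which chunk is chosen.
    s, pairs = a, []
    for _ in range(t):
        m0, m1 = self_collision(h1, s)
        pairs.append((m0, m1))
        s = h1(s, m0)
    a_next = s
    # Birthday-match the multi-collision messages across b0 and b1.
    from_b0 = {}
    for bits in product((0, 1), repeat=t):
        msg = tuple(p[i] for p, i in zip(pairs, bits))
        from_b0[h_star(h2, b0, msg)] = msg
    for bits in product((0, 1), repeat=t):
        M = tuple(p[i] for p, i in zip(pairs, bits))
        b_next = h_star(h2, b1, M)
        if b_next in from_b0:
            return M, from_b0[b_next], a_next, b_next  # main, secondary
    raise RuntimeError("unlucky toy instance; retry with larger t")

a, b0, b1 = 3, 5, 9
M, M2, a_next, b_next = build_switch(a, b0, b1)
# Main chunk M advances every chain; secondary chunk M2 jumps b0 onto b1's lane.
assert h_star(h1, a, M) == h_star(h1, a, M2) == a_next
assert h_star(h2, b0, M2) == h_star(h2, b1, M) == b_next
```

The two final assertions are exactly the two conditions that \(M_i\) and \(M_i'\) must satisfy; the cost is dominated by the \(t\) collision searches and the two birthday sweeps, matching the stated \(n/2 \cdot 2^{n/2}\) order of magnitude.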

#### Interchange Structure

By combining several simple switches, we can build an interchange structure with starting points \(IV_1\) and \(IV_2\) and ending points \(\big \{ A_j \mid j=0 \ldots 2^t-1\big \}\) and \(\big \{ B_k \mid k=0 \ldots 2^t-1\big \}\), so that we can select a message ending in any state \((A_j, B_k)\). An interchange structure with \(2^t\) chains for each function requires about \(2^{2t}\) switches. Since we can build a switch for a cost of \(\tilde{O}(2^{n/2})\), the total structure is built with \(\tilde{O}(2^{2t+n/2})\) operations.

Let us now describe the combination of switch structures into an interchange structure. The goal of this structure is to select the final value of the \(\mathcal {H}_1\) computation and the \(\mathcal {H}_2\) computation independently. More precisely, the structure defines two sets of final values \(A_j\) and \(B_k\), and a set of messages \(\varvec{M}_{jk}\) such that

\( h_1^*(IV_1, \varvec{M}_{jk}) = A_j \quad \text {and} \quad h_2^*(IV_2, \varvec{M}_{jk}) = B_k. \)

Algorithm 3 describes the combination of switches to build an interchange structure. Its pseudo-code is given in Appendix A, where the Interchange function builds the structure and the SelectMessage function extracts the message reaching \((\vec {a}_j, \vec {b}_k)\).

The structure can be somewhat optimized using the fact that the extra chains have no prespecified initial values. We show how to take advantage of this in Appendix B, using multi-collision structures in addition to the switch structures. However, this does not significantly change the complexity: we need \((2^t-1)(2^t-1)\) switches instead of \(2^{2t}-1\). In total, we need approximately \(n/2 \cdot 2^{2t+n/2}\) evaluations of the compression functions to build a \(2^t\)-interchange structure.

We believe that a \(2^{t}\)-interchange structure based on switches will need at least \(\varTheta (2^{2t})\) switches, because every switch can only increase the number of reachable pairs \((\vec {a}_j,\vec {b}_k)\) by one. As shown in Appendix B, some switches can be saved in the beginning, but it seems that new ideas are needed to reduce the total complexity below \(\varTheta (2^{2t+n/2})\).

### Simultaneous Expandable Messages (SEMs)

In this subsection, we build a *simultaneous expandable message* for two MD hash functions based on the multi-collision described in Sect. 2.1 and the expandable message for a single MD hash function described in Sect. 2.2. This expandable message consists of the initial states \((IV_1,IV_2)\) and final states \((\hat{x},\hat{y})\) such that for each length \(\kappa \) in some appropriate range (determined below), there is a message \(M_{\Vert \kappa }\) of \(\kappa \) blocks that maps \((IV_1,IV_2)\) to \((\hat{x},\hat{y})\). A similar construction of an expandable message over two hash functions was proposed in the independent paper [34] by Jha and Nandi, which analyses the Zipper hash assuming weak compression functions. We describe our construction of this simultaneous expandable message in detail next.

We set \(C \approx n/2 + \log (n)\) as a parameter that depends on the state size *n*. Our basic building block consists of two pairs of states \((x_0,y_0)\) and \((x_1,y_1)\) and two message fragments \(m_s\) and \(m_l\) that map the state pair \((x_0,y_0)\) to \((x_1,y_1)\). The message \(m_s\) is the (shorter) fragment of fixed size *C*, while \(m_l\) is of size \(i > C\). We will show how to construct this building block for any state pair \((x_0,y_0)\) and length \(i > C\) in Algorithm 4.

Given this building block and a positive parameter *t*, we build an expandable message in the range of \([C(C-1)+tC, C^2-1+C(2^t+t-1)]\). This is done by utilizing a sequence of \(C-1+t\) basic building blocks. The first \(C-1\) building blocks are built with parameters \(i \in \{C+1,C+2,\ldots ,2C-1\}\). It is easy to see that these structures give a \((C(C-1),C^2-1)\)-expandable message by selecting at most one longer message fragment from the sequence, where the remaining \(C-2\) (or \(C-1\)) fragments are of length *C*. The final *t* building blocks give a standard expandable message, but it is built in intervals of *C*. These *t* building blocks are constructed with parameters \(i = C(2^{j-1}+1)\) for \(j \in \{1,\ldots ,t\}\). See Fig. 8 for a visual illustration.

Given a length \(\kappa \) in the range of \([C(C-1)+tC, C^2-1+C(2^t+t-1)]\), we can construct a corresponding message by first computing \(\kappa \bmod C\). We then select the length \(\kappa ' \in [C(C-1),C^2-1]\) such that \(\kappa ' \equiv \kappa \pmod {C}\), defining the first \(C-1\) message fragment choices. Finally, we compute \((\kappa -\kappa ')/C\), which is an integer in the range of \([t,2^t+t-1]\), and select the final *t* message fragment choices as in a standard expandable message using the binary representation of \((\kappa -\kappa ')/C\).
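The length decomposition above is pure arithmetic and can be checked directly. The sketch below (parameters \(C\) and \(t\) are arbitrary toy choices, not the \(C \approx n/2 + \log n\) of a real attack) computes the fragment-length choices for a given \(\kappa \) and verifies they sum to \(\kappa \).

```python
def sem_select(kappa, C, t):
    """Fragment lengths reproducing a kappa-block message in the SEM."""
    lo = C * (C - 1) + t * C
    hi = C * C - 1 + C * (2 ** t + t - 1)
    assert lo <= kappa <= hi, "kappa outside the expandable range"
    # First C-1 building blocks fix kappa mod C: kappa' = C(C-1) + r.
    r = kappa % C
    # Remaining length in units of C, encoded over the last t blocks.
    x = (kappa - (C * (C - 1) + r)) // C - t
    assert 0 <= x < 2 ** t
    # Block with long length C + r is taken long (if r > 0), others short.
    first = [C + r if (r > 0 and i == r) else C for i in range(1, C)]
    # j-th of the last t blocks is long iff bit j-1 of x is set.
    last = [C * (2 ** (j - 1) + 1) if (x >> (j - 1)) & 1 else C
            for j in range(1, t + 1)]
    assert sum(first) + sum(last) == kappa
    return first, last

# Every kappa in the advertised range decomposes exactly.
C, t = 8, 5
lo, hi = C * (C - 1) + t * C, C * C - 1 + C * (2 ** t + t - 1)
for kappa in range(lo, hi + 1):
    sem_select(kappa, C, t)
```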

#### Construction of the Building Block

Given state pair \((x_0,y_0)\) and length \(i > C\), the algorithm for constructing the building block for the expandable message is based on multi-collisions, as described below; its pseudo-code is given in Appendix A.

The complexity of Step 1 is less than *i* compression function evaluations. The complexity of Step 2 is approximately \(2^{n/2}\), while the complexity of Step 3 is approximately \(C \cdot 2^{n/2} \approx n \cdot 2^{n/2}\). The complexity of Step 4 is approximately \(i + n \cdot 2^{n/2}\). In total, the complexity of constructing the basic building block is approximately \(i + n \cdot 2^{n/2}\) (ignoring small factors).

#### Complexity Analysis of the Full Building Procedure

The full expandable message requires computing \(C-1+t\) building blocks whose sum of length parameters (dominated by the final building block) is approximately \(C\cdot {2^t} \approx n \cdot 2^t\). Assuming that \(t<n\), we construct \(C-1+t \approx n\) building blocks, and the total time complexity of constructing the expandable message is approximately \(n\cdot 2^t + n^2 \cdot 2^{n/2}\). Our attacks require the \((C(C-1)+tC, C^2-1+C(2^t+t-1))\)-expandable message to extend up to length *L*, implying that \(L \approx n\cdot 2^t\) and giving a time complexity of approximately

\( L + n^2 \cdot 2^{n/2}. \)

### Functional Graph (FG) of Random Mappings

In many of our attacks, we evaluate a compression function \(h\) with a fixed message input block *m* (e.g., the zero block) and simplify our notation by defining \(f(x)= h_{[m]}(x) = h(x,m)\). The mapping *f* yields a directed functional graph.

The functional graph of a random mapping *f* is defined via successive iteration on this mapping.

Let *f* be an element of \(\mathcal {F}_N\), the set of all mappings with the same finite set of size *N* as both domain and range. The functional graph of *f* is a directed graph whose nodes are the elements \(0, \dots , N-1\) and whose edges are the ordered pairs \(\langle x, f(x) \rangle \), for all \(x\in \{0, \dots , N-1\}\). Starting from any \(x_0\) and iterating *f*, that is, \(x_1 = f(x_0), x_2 = f(x_1),\dots \), we find, before *N* iterations, a value \(x_j\) equal to one of \(x_0, x_1, \dots , x_{j-1}\); suppose the repeated value is \( x_i \). In this case, we say the path \(x_0\rightarrow x_1\rightarrow \dots \rightarrow x_{i-1}\rightarrow x_{i}\) connects to a *cycle* \( x_i \rightarrow x_{i+1} \rightarrow \cdots \rightarrow x_{j - 1} \rightarrow x_i \). If we consider all possible starting points \(x_0\), the paths exhibit confluence and form trees; trees grafted on cycles form components; a collection of components forms a functional graph. That is, a functional graph can be viewed as a set of connected components; a component is a cycle of trees; a tree is recursively defined by appending a node to a set of trees; a node is a basic atomic object labelled by an integer [26].

Structures of functional graph of random mappings have been studied for a long time, and some parameters have accurate asymptotic evaluations [26]. Below, we list some of the most relevant ones. These properties have been extensively studied and exploited in cryptography, e.g., in the classical works of Hellman [29] and van Oorschot and Wiener [56], and much more recently in generic attacks on hash-based MACs [18, 28, 41, 52, 54] (refer to [7] for a systematization of knowledge regarding the applications of random functional graphs in generic attacks).

### Theorem 1

[26]. The expected number of components, number of cyclic nodes (nodes belong to a cycle), number of terminal nodes (nodes without preimage: \(f^{-1}(x) = \emptyset \)), number of image nodes (nodes with preimage) and number of *k*th iterate image nodes (image nodes of the *k*th iterate \(f^{k}\) of *f*) in a random mapping of size *N* have the following asymptotic forms as \(N\rightarrow \infty \):

- 1.
Number of components: \(\frac{1}{2}\log N = 0.5 \cdot n\)

- 2.
Number of cyclic nodes: \(\sqrt{\pi N /2} \approx 1.2 \cdot 2^{n/2}\)

- 3.
Number of terminal nodes: \(e^{-1} N \approx 0.37 \cdot 2^n\)

- 4.
Number of image nodes: \((1 - e^{-1}) N \approx 0.62 \cdot 2^n\)

- 5.
Number of *k*th iterate image nodes: \((1 - \tau _k) N\), where \(\tau _k\) satisfies the recurrence \(\tau _0=0\), \(\tau _{k+1} = e^{-1 + \tau _k}\)

Seen from an arbitrary node \( x_0 \), we call the length (measured in the number of edges) of the path starting from \( x_0 \) before it enters a cycle the *tail length* of \( x_0 \), denoted \( \lambda (x_0) \); the length of the cycle reached from \( x_0 \) its *cycle length*, denoted \( \mu (x_0) \); and the length of the non-repeating trajectory of \( x_0 \) its *rho length*, denoted \( \rho (x_0) = \lambda (x_0) + \mu (x_0) \).
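These three quantities are easy to compute for a toy mapping; the sketch below (truncated SHA-256 as a stand-in random mapping on a \(2^{14}\)-element domain, an assumption of ours) walks from \( x_0 \) until the first repeated node.

```python
import hashlib

N_BITS = 14  # toy domain size N = 2^14

def f(x):
    """Stand-in random mapping on {0, ..., N-1} (truncated SHA-256)."""
    return int.from_bytes(hashlib.sha256(x.to_bytes(2, "big")).digest()[:2],
                          "big") % (1 << N_BITS)

def tail_cycle_rho(x0):
    """(lambda, mu, rho) of x0: walk until the first repeated node."""
    seen, x, i = {}, x0, 0
    while x not in seen:
        seen[x] = i        # step index at which each node was first reached
        x = f(x)
        i += 1
    lam = seen[x]          # steps taken before entering the cycle
    mu = i - seen[x]       # cycle length
    return lam, mu, lam + mu

lam, mu, rho = tail_cycle_rho(0)
# Theorem 2: averaged over random starting nodes these are ~0.62 * 2^(n/2)
# (tail, cycle) and ~1.2 * 2^(n/2) (rho); one sample only shows the definition.
assert rho == lam + mu
```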

### Theorem 2

[26]. Seen from a random point (any of the *N* nodes in the associated functional graph is taken equally likely) in a random mapping of \(\mathcal {F}_N\), the expected tail length, cycle length and rho length have the following asymptotic forms:

- 1.
Tail length (\(\lambda \)) \( \sqrt{\pi N / 8} \approx 0.62 \cdot 2^{n/2}\)

- 2.
Cycle length (\( \mu \)) \( \sqrt{\pi N / 8} \approx 0.62 \cdot 2^{n/2}\)

- 3.
Rho length (\( \rho \)) \( \sqrt{\pi N / 2} \approx 1.2 \cdot 2^{n/2}\)

### Theorem 3

[26]. The expected maximum cycle length (\(\mu ^{max}\)), maximum tail length (\( \lambda ^{max} \)) and maximum rho length (\( \rho ^{max} \)) in the functional graph of a random mapping of \(\mathcal {F}_N\) satisfy the following:

- 1.
\(\mathbf {E}\{\mu ^{max}\mid \mathcal {F}_N\} = 0.78248 \cdot 2^{n/2}\)

- 2.
\(\mathbf {E}\{\lambda ^{max}\mid \mathcal {F}_N\} = 1.73746 \cdot 2^{n/2} \)

- 3.
\(\mathbf {E}\{\rho ^{max}\mid \mathcal {F}_N\} = 2.41490 \cdot 2^{n/2}\)

### Theorem 4

[26]. Assuming the smoothness condition, the expected value of the size of the largest tree and the size of the largest connected component in a random mapping of \(\mathcal {F}_N\) are asymptotically

- 1.
Largest tree: \(0.48 \cdot 2^{n}\)

- 2.
Largest component: \(0.75782 \cdot 2^{n}\)

The results from these theorems indicate that in a random mapping, most of the points tend to be grouped together in a single giant component. This component is therefore expected to have very tall trees and a large cycle [26].
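The statistics of Theorems 1 and 4 can be checked empirically on a toy mapping. The sketch below is a hedged experiment under our own assumptions (a \(2^{12}\)-element stand-in mapping built from truncated SHA-256): it counts terminal nodes directly, obtains the cyclic nodes by repeatedly peeling off nodes of in-degree zero, and finds the largest component with a union-find pass.

```python
import hashlib
from collections import Counter

N_BITS = 12
N = 1 << N_BITS

def f(x):
    """Stand-in random mapping on {0, ..., N-1} (truncated SHA-256)."""
    return int.from_bytes(hashlib.sha256(x.to_bytes(2, "big")).digest()[:2],
                          "big") % N

images = [f(x) for x in range(N)]

# Terminal nodes (Theorem 1, item 3): nodes with no preimage.
terminal = N - len(set(images))

# Cyclic nodes: peel nodes of in-degree 0 repeatedly; what remains is cyclic.
indeg = Counter(images)
stack = [x for x in range(N) if indeg[x] == 0]
removed = set()
while stack:
    x = stack.pop()
    removed.add(x)
    y = images[x]
    indeg[y] -= 1
    if indeg[y] == 0 and y not in removed:
        stack.append(y)
cyclic = N - len(removed)

# Largest (weakly) connected component via union-find (Theorem 4).
parent = list(range(N))
def find(a):
    while parent[a] != a:
        parent[a] = parent[parent[a]]   # path halving
        a = parent[a]
    return a
for x in range(N):
    ra, rb = find(x), find(images[x])
    if ra != rb:
        parent[ra] = rb
largest = max(Counter(find(x) for x in range(N)).values())

print(f"terminal/N = {terminal / N:.3f} (theory ~0.37), "
      f"cyclic = {cyclic} (theory ~{1.2 * 2 ** (N_BITS // 2):.0f}), "
      f"largest/N = {largest / N:.3f} (theory ~0.758)")
```

At this size the terminal-node fraction concentrates sharply, while the cyclic-node count and the giant-component fraction fluctuate more from mapping to mapping, as the theorems only give expected values.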

A useful algorithm for expanding the functional graph of *f* is given below (see Algorithm 5 whose pseudo-code is given in Appendix A). This algorithm is not new and has been previously used (for example, in [28, 54]). It takes an input parameter \(t \ge n/2\) that determines the number of expanded nodes (and the running time of the algorithm).

#### Deep Iterates in the Functional Graphs (FGDI)

Next, we describe our observations on the functional graphs of random mappings. The efficiency of the following attacks relies mostly on these observations about special nodes in functional graphs.

In our attacks, we are particularly interested in nodes of *f* that are located deep in the functional graph. More specifically, \(x'\) is an iterate of depth *i* if there exists some ancestor node *x* such that \(x' = f^i(x)\), i.e., \( x' \) is an *i*th iterate image node (an *i*th iterate for short). If *i* is relatively large, we say that \( x' \) is a deep iterate. Deep iterates are usually obtained using *chains* evaluated from an arbitrary starting point \(x_0\) by computing a sequence of nodes using the relation \(x_{i+1} = f(x_i)\). We denote this sequence by \(\vec {x}\). The following two observations regarding deep iterates make them helpful in the proposed attacks:

### Observation 1

It is easy to obtain a large set of deep iterates. Specifically, by running Algorithm 5 with input parameter *t* (\( t\ge n/2 \)), one can obtain a set of \( 2^t \) nodes, among which a constant fraction (\( \varTheta (2^t) \)) are \( 2^{n - t} \)th iterates. The theoretical reasoning is as follows. After the algorithm has developed \(2^t\) nodes, another chain from an arbitrary starting point is expected to collide with the evaluated graph at depth of roughly \(2^{n - t}\). This is a direct consequence of the birthday paradox. Moreover, for two chains from two different starting points *x* and *y*, we have \( \Pr [f^{2^{n - t}}(x) = f^{2^{n - t}}(y)] = \varTheta (2^{-t}) \) [18, Lemma 1] (note that \( n - t < n/2 \)). That is, for \( t \ge n/2 \), when the number of new chains (of length \( 2^{n - t} \) and from arbitrary starting points) is less than \( 2^{t} \), they are expected to collide with the evaluated graph at distinct points. In particular, this observation implies that most chains developed by the algorithm will be extended to depth \(\varOmega (2^{n-t})\) (without colliding with \(\mathcal {G}\) or cycling); therefore, a constant fraction of the developed nodes are iterates of depth \(2^{n-t}\). In total, the algorithm develops \(\varTheta (2^t)\) iterates of *f* of depth \(2^{n-t}\) in \(2^t\) time. This conclusion was also verified experimentally.
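The birthday-paradox step of this reasoning can be checked empirically at toy scale. The sketch below is a hedged experiment under our own assumptions (\(n = 16\), \(t = 10\), a stand-in mapping built from truncated SHA-256, and a simplified chain-development loop in place of Algorithm 5): after developing \(2^t\) nodes, fresh chains should hit the developed set at depth around \(2^{n-t} = 64\).

```python
import hashlib

N_BITS, T = 16, 10
CAP = 1 << (N_BITS - T + 4)     # generous cap on fresh-chain length

def f(x):
    """Stand-in random mapping on 16-bit values (truncated SHA-256)."""
    return int.from_bytes(hashlib.sha256(x.to_bytes(3, "big")).digest()[:2],
                          "big")

# Develop ~2^T nodes by iterating chains from fresh starting points
# (a simplified stand-in for Algorithm 5).
G, start = set(), 0
while len(G) < 1 << T:
    start += 1
    x = start
    while x not in G and len(G) < 1 << T:
        G.add(x)
        x = f(x)

# Fresh chains are expected to collide with G at depth ~2^(n-T) = 64 here;
# chains that cycle without merging within CAP steps are discarded.
depths = []
for s in range(50_000, 50_060):
    x, d = s, 0
    while x not in G and d < CAP:
        x = f(x)
        d += 1
    if x in G:
        depths.append(d)

avg = sum(depths) / len(depths)
print(f"average collision depth {avg:.1f} (expected order 2^(n-t) = 64)")
```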

### Observation 2

A deep iterate has a relatively high probability to be encountered during the evaluation of a chain from an arbitrary starting node. Let \( f_1 \) and \( f_2 \) be two independent *n*-to-*n*-bit mappings. Suppose \( \bar{x} \) (resp. \( \bar{y} \)) is an iterate of depth \( 2^{g} \) in \( \mathcal {FG}_{f_1} \) (resp. \( \mathcal {FG}_{f_2} \)); then, it is an endpoint of a chain of states of length \( 2^g \). Let *d* be in the interval \( [1, 2^g] \) and \( x_0 \) (resp. \( y_0 \)) be a random point. Then, according to Lemma 1, \( \Pr [x_d = \bar{x}] \approx d\cdot 2^{-n} \) (resp. \( \Pr [y_d = \bar{y}] \approx d\cdot 2^{-n} \)), which is the probability that \( \bar{x} \) (resp. \( \bar{y} \)) will be encountered at distance *d* from \( x_0 \) (resp. \( y_0 \)). Due to the independence of \( f_1 \) and \( f_2 \), \(\Pr [x_d = \bar{x} \bigwedge y_d = \bar{y}] \approx (d \cdot 2^{-n})^2\). Summing the probabilities of the (disjoint) events over all distances *d* in the interval \([1,2^{g}]\), we conclude that the probability that \(\bar{x}\) and \(\bar{y}\) will be encountered at the same distance is approximately \((2^{g})^3 \cdot 2^{-2n} = 2^{3g - 2n}\).

The probability calculation in Observation 2 yields the conclusion that we need to compute approximately \( 2^{2n - 3g} \) chains from different starting points to find a pair of starting points \( (x_0, y_0) \) reaching a pair of \( 2^g \)th iterates \( (\bar{x}, \bar{y})\) at the same distance. This conclusion was verified experimentally. Note that since various trials performed by selecting different starting points for the chains are dependent, the proof of this conclusion is incomplete. However, this dependency is negligible in our attacks, and thus, we can ignore it. More details are given in Appendix C.

### Lemma 1

Let *f* be an *n*-bit random mapping and \(x'_0\) an arbitrary point. Let \(D \le 2^{n/2}\) and define the chain \(x'_{i} = f(x'_{i-1})\) for \(i \in \{1,\ldots , D\}\) (namely \(x'_D\) is an iterate of depth *D*). Let \(x_0\) be a randomly chosen point, and define \(x_{d} = f(x_{d-1})\) for integer \( d \ge 1 \). Then, for any \(d \in \{1,\ldots , D\}\), \(\Pr [x_d = x'_D] = \varTheta (d \cdot 2^{-n})\).

### Proof

(Sketch.) We can assume that the chains do not cycle (i.e., each chain contains distinct nodes), as \(D \le 2^{n/2}\). For \(x_d = x'_D\) to occur, \(x_{d-i}\) should collide with \(x'_{D-i}\) for some^{Footnote 8} \(0 \le i \le d\). For a fixed *i*, the probability of this collision is roughly^{Footnote 9} \(2^{-n}\), and summing over all \(0 \le i \le d\) (all events are disjoint), we get that the probability is approximately \(d \cdot 2^{-n}\). \(\square \)
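The estimate of Lemma 1 can be checked by a small Monte Carlo experiment. The sketch below is a hedged verification under our own assumptions (\(n = 10\), \(D = 30\), \(d = 20\), a family of keyed stand-in mappings built from SHA-256 so that the estimate is averaged over many random mappings); Lemma 1 predicts a hit rate of roughly \(d \cdot 2^{-n} \approx 20/1024\).

```python
import hashlib
import random

N_BITS, D, d_test = 10, 30, 20
N = 1 << N_BITS

def make_f(key):
    """A keyed stand-in random mapping on {0, ..., N-1}."""
    def f(x):
        data = key.to_bytes(4, "big") + x.to_bytes(2, "big")
        return int.from_bytes(hashlib.sha256(data).digest()[:2], "big") % N
    return f

def walk(f, x, k):
    for _ in range(k):
        x = f(x)
    return x

rng = random.Random(42)
trials = hits = 0
for key in range(300):                      # average over 300 random mappings
    f = make_f(key)
    target = walk(f, rng.randrange(N), D)   # an iterate of depth D
    for _ in range(30):                     # random starting points x_0
        x0 = rng.randrange(N)
        trials += 1
        hits += walk(f, x0, d_test) == target

estimate = hits / trials
print(f"Pr[x_d = x'_D] ~ {estimate:.4f} "
      f"(Lemma 1 predicts Theta(d * 2^-n) ~ {d_test / N:.4f})")
```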

#### Multi-cycles in Functional Graphs (FGMC)

Next, we study a property of some more special nodes—*cyclic nodes* in random functional graphs. There are efficient cycle search algorithms (with \( O(2^{n/2}) \) time complexity) to detect the cycle length and collect cyclic nodes in the largest component of a random functional graph [36, Chapter 7], and cycles have been exploited in generic attacks on hash-based MACs [28, 41]. Here, we exploit them in a new way. Each cyclic node in a functional graph defined by *f* loops along the cycle when computed by *f* iteratively and returns to itself after a number of function calls equal to a multiple of the cycle length. This property can be utilized to provide extra degrees of freedom when estimating the distance from other nodes to a cyclic node in the functional graph: the distance can be expanded to a set of discrete values by using multiple cycles. For example, let *x* and \(x'\) be two nodes in a component of the functional graph defined by *f*, where *x* is a cyclic node, and let the cycle length of the component be denoted by *L*. Clearly, there exists a path from \(x'\) to *x* as they are in the same component; denote its length by *d*. Then, we have the following:

\( f^{\,d + i\cdot L}(x') = x \quad \text {for any integer } i \ge 0. \)

Suppose we are limited to using at most *t* cycles (a limitation imposed by the length of the message). Then, the distance from \(x'\) to *x* expands to a set of \(t+1\) values \(\{d+i\cdot L \mid i=0, 1, 2, \ldots , t\}\).

Now, let us consider a special case of reaching two deep iterates from two random starting nodes: *select two cyclic nodes within the largest components in the functional graphs as the deep iterates*. More specifically, let \( \mathcal {FG}_{f_1} \) and \( \mathcal {FG}_{f_2} \) be the functional graphs defined by \(f_1\) and \(f_2\). Let \(\bar{x}\) and \(x_0\) be two nodes in the largest component of \(\mathcal {FG}_{f_1}\), where \(\bar{x}\) is a cyclic node. Let \(L_1\) denote the cycle length of the component and \(d_1\) denote the path length from \(x_0\) to \(\bar{x}\). Similarly, we define \(\bar{y}\), \(y_0\), \(L_2\) and \(d_2\) in \(\mathcal {FG}_{f_2}\). We are interested in the probability of linking \(x_0\) to \(\bar{x}\) and \(y_0\) to \(\bar{y}\) at a common distance. Thanks to the use of multiple cycles, the distance values from \(x_0\) to \(\bar{x}\) and from \(y_0\) to \(\bar{y}\) can be selected from the two sets \(\{ d_1+i\cdot L_1 \mid i=0, 1, 2, \ldots , t\}\) and \(\{ d_2+j\cdot L_2 \mid j=0, 1, 2, \ldots , t\}\), respectively. Hence, as long as there exists a pair of integers (*i*, *j*) such that \(0 \le i, j \le t\) and \(d_1+i\cdot L_1=d_2+j\cdot L_2\), we obtain a common distance \(d=d_1+i\cdot L_1=d_2+j\cdot L_2\) such that \(f_1^{d}(x_0) = \bar{x}\) and \(f_2^{d}(y_0) = \bar{y}\).

Next, we evaluate the probability amplification of reaching \((\bar{x}, \bar{y})\) from a random pair \((x_0, y_0)\) at the same distance. Without loss of generality, we assume \(L_1 \le L_2\). Let \(\varDelta L \triangleq L_2 \mod L_1\). Then, it follows that \(d_1+i\cdot L_1=d_2+j\cdot L_2\) holds for a pair of non-negative integers (*i*, *j*) if and only if \(d_1-d_2 = j\cdot L_2 - i\cdot L_1\), that is, \(d_1-d_2 \equiv j\cdot \varDelta L \mod L_1\).

Letting *j* range over all integer values in the interval [0, *t*], we collect a set of \(t+1\) values \(\mathcal {D}=\{ j\cdot \varDelta L \mod L_1 \mid j=0, 1, \ldots , t\}\).^{Footnote 10} Since \(d_1 = \mathcal {O}(2^{n/2})\), \(d_2=\mathcal {O}(2^{n/2})\) and \(L_1=\varTheta (2^{n/2})\), it follows that \(|d_1-d_2| =\mathcal {O}(L_1)\), and we assume \(|d_1-d_2| < L_1\) by ignoring the constant factor. Therefore, for a randomly sampled pair \((x_0, y_0)\) that encounters \((\bar{x}, \bar{y})\), we are able to derive a pair (*i*, *j*) such that \(d_1+i\cdot L_1=d_2+j\cdot L_2\) as long as the distance bias \(d_1-d_2\) lies in the set \(\mathcal {D}\). In other words, we are able to *correct such a distance bias by using multi-cycles.* Hereafter, the set \(\mathcal {D}\) is referred to as the set of *correctable distance biases*. Thus, the probability of reaching \((\bar{x}, \bar{y})\) from a random pair \((x_0, y_0)\) at a common distance is amplified by a factor of roughly *t*, where *t* is the maximum number of usable cycles.
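The bias-correction mechanism can be sketched with toy cycle lengths (illustrative numbers of our choosing; in the attack, \(L_1, L_2 = \varTheta (2^{n/2})\), and the function names below are assumptions):

```python
def correctable_biases(L1, L2, t):
    """The set D of correctable distance biases: d1 - d2 is correctable
    with at most t cycles iff it is congruent to j * (L2 mod L1) mod L1
    for some 0 <= j <= t (assuming L1 <= L2)."""
    assert L1 <= L2
    dL = L2 % L1
    return {(j * dL) % L1 for j in range(t + 1)}

def common_distance(d1, d2, L1, L2, t):
    """Search 0 <= i, j <= t with d1 + i*L1 == d2 + j*L2; return (i, j)
    or None if the bias d1 - d2 cannot be corrected within t cycles."""
    for j in range(t + 1):
        diff = d2 + j * L2 - d1      # must be a non-negative multiple of L1
        if diff >= 0 and diff % L1 == 0 and diff // L1 <= t:
            return diff // L1, j
    return None
```

For example, with \(L_1=10\), \(L_2=11\), \(t=5\), the bias \(d_1-d_2=3\) is correctable: `common_distance(4, 1, 10, 11, 5)` yields \((i,j)=(3,3)\), i.e., \(4+3\cdot 10 = 1+3\cdot 11 = 34\).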

## Preimage Attack on XOR Combiners Based on the Interchange Structure

In this section, we introduce our first attack—the preimage attack on the XOR combiner. In this attack, we are given an *n*-bit target value *V*, and our goal is to find a message \(\varvec{M}\) such that \(\mathcal {H}_1(\varvec{M}) \oplus \mathcal {H}_2(\varvec{M}) = V\). Notice that if the goal were to find two messages \( \varvec{M}_1 \) and \( \varvec{M}_2 \) such that \(\mathcal {H}_1(\varvec{M}_1) \oplus \mathcal {H}_2(\varvec{M}_2) = V\), we could immediately launch a meet-in-the-middle procedure to find a solution of the equation \( \mathcal {H}_1(\varvec{M}_1) = \mathcal {H}_2(\varvec{M}_2) \oplus V\) with \( 2^{n/2} \) computations. That is because in the last equation, the left-hand side and the right-hand side are independent. By separately computing \( 2^{n/2} \) values on each side, we obtain \( 2^n \) pairs and find a match with high probability due to the birthday paradox. Thus, the above is essentially the task of finding a collision, which is an easy challenge. In the real challenge, however, the collision must be generated from the same message, so the computations on the two sides of the equation are pairwise related: a computation on one side can pair with only a single computation on the other side. Consequently, unlike in the easy challenge, \( 2^{n/2} \) computations on each side generate only \( 2^{n/2} \) pairs instead of \( 2^n \). Therefore, to launch a similar meet-in-the-middle procedure for the real challenge, a crucial part of our attack is to construct a structure that breaks the pairwise dependency between the two computations. The structure playing this important role is the *interchange structure* introduced in Sect. 2.5. Next, we provide an overview of our attack based on the interchange structure and then give the detailed attack procedure.

### Overview of the Attack

Next, we give an overview of the first preimage attack on the XOR combiner. Let \(V\) denote the target value. The two hash functions \(\mathcal {H}_1\) and \(\mathcal {H}_2\) share the same input message, and hence, the internal states of their iterative compression function computations are pairwise related. We first manage to simultaneously control the computation chains of \(\mathcal {H}_1\) and \(\mathcal {H}_2\) by constructing an interchange structure including a message structure \(\mathcal {M}\) and two sets of internal states \(\mathcal {A}\) (for \(\mathcal {H}_1\)) and \(\mathcal {B}\) (for \(\mathcal {H}_2\)) such that for any state *A* picked from \(\mathcal {A}\) and *B* picked from \(\mathcal {B}\), we can easily derive a message \(M_{A, B}\) from \(\mathcal {M}\) such that the computation of \(\mathcal {H}_1\) on \(M_{A,B}\) reaches *A* and that of \(\mathcal {H}_2\) reaches *B*. Hence, we can select states from \(\mathcal {A}\) and \(\mathcal {B}\) independently in the next phase of the attack. In that phase, we use a birthday match to find a message block *m*, a state *A* in \(\mathcal {A}\) and a state *B* in \(\mathcal {B}\) such that \(h_1(A, m) \oplus h_2(B, m)\) equals the target hash digest \( V\), where \(h_1\) and \(h_2\) are the compression functions of \(\mathcal {H}_1\) and \(\mathcal {H}_2\), respectively. Finally, given states *A* and *B*, we derive the message \(M_{A, B}\) from \(\mathcal {M}\) and output \(M_{A, B} \Vert m\) as a preimage of *V*.^{Footnote 11} The birthday match in the second phase of the attack is essentially a meet-in-the-middle procedure enabled by the interchange structure built in the first phase. Thus, the entire attack is more efficient than a brute-force attack.
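The birthday-match phase can be sketched with toy parameters. Below, \(h_1\) and \(h_2\) are modeled by a truncated SHA-256 (an illustrative stand-in of ours, not the paper's concrete compression functions), and the two state sets play the roles of \(\mathcal {A}\) and \(\mathcal {B}\):

```python
import hashlib

def h(tag, state, block, n_bits=16):
    """Toy n-bit compression function (illustrative stand-in for h1/h2)."""
    data = f"{tag}|{state}|{block}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") % (1 << n_bits)

def birthday_match(A, B, V, n_bits=16, max_blocks=1 << 12):
    """Search a block m and states A_j, B_k with h1(A_j,m) ^ h2(B_k,m) == V.
    Because the interchange structure lets A_j and B_k be chosen
    independently, each m offers |A|*|B| candidate pairs."""
    for m in range(max_blocks):
        table = {h("h1", a, m, n_bits): a for a in A}   # meet in the middle
        for b in B:
            a = table.get(V ^ h("h2", b, m, n_bits))
            if a is not None:
                return a, b, m
    return None
```

With \(n=16\) and \(|\mathcal {A}|=|\mathcal {B}|=16\), each block offers \(2^8\) candidate pairs, so about \(2^8\) blocks are expected to be tried before a match, mirroring the \(2^{n-2t}\) repetitions in the analysis.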

The complexity of the preimage search is approximately \(2^{n-t}\) evaluations of the compression function, using an interchange structure with \(2^t\) endpoints.

#### Complexity Analysis

Building the interchange structure requires approximately \(2^{2t+n/2}\) evaluations of the compression function, while the preimage search requires approximately \(2^{n-t}\). The optimal complexity^{Footnote 12} is reached when both steps take the same time, i.e., \(t = n/6\). This gives a complexity of \(\tilde{O}(2^{5n/6})\). Since the attack uses messages of length at least \( n/2\cdot 2^{2t} \), the optimal complexity is obtained for messages of length at least \( 2^{n/3} \). For messages shorter than \( 2^{n/3} \), it provides a trade-off of \( 2^n \cdot L^{-1/2} \) between the maximal allowed message length *L* and the time complexity of the attack (see Fig. 12 for a trade-off curve).

### Details of the Preimage Attack on XOR Combiners Using the Interchange Structure

Now, we describe the full preimage attack in detail. We first build an interchange structure with \(2^t\) chains for each of \(\mathcal {H}_1\) and \(\mathcal {H}_2\). We denote the endpoints by \(\big \{ A_j \mid j=0 \ldots 2^t-1\big \}\) and \(\big \{ B_k \mid k=0 \ldots 2^t-1\big \}\), and we know how to select a message \(\varvec{M}_{jk}\) to reach any state pair \((A_j, B_k)\). When appending a message fragment \(m\Vert pad\) to one of the messages \(\varvec{M}_{jk}\) in the interchange structure, where *m* is a message block and *pad* is the final block padded with the length *L* of the preimage message, the output of the combiner can be written as follows: \(\mathcal {H}_1(\varvec{M}_{jk}\Vert m) \oplus \mathcal {H}_2(\varvec{M}_{jk}\Vert m) = h_1(h_1(A_j,m), pad) \oplus h_2(h_2(B_k,m), pad)\).

Note that we fix the finalization functions of \(\mathcal {H}_1\) and \(\mathcal {H}_2\) as their compression functions, \( h_1 \) and \( h_2 \), respectively.

To reach a target value \(V\), we select a random block *m*, and we evaluate \(\big \{ A'_j = h_1(h_1(A_j,m), pad) \mid j=0 \ldots 2^t-1\big \}\) and \(\big \{ B'_k = V\oplus h_2(h_2(B_k,m),pad) \mid k=0 \ldots 2^t-1\big \}\). If there is a match \((j^*, k^*)\) between the two lists, i.e., \(A'_{j^*} = B'_{k^*}\), we have the following: \(h_1(h_1(A_{j^*},m), pad) \oplus h_2(h_2(B_{k^*},m), pad) = V\), and hence \(\varvec{M}_{j^*k^*}\Vert m\) is a preimage of \(V\).

For a random choice of *m*, we expect that a match exists with probability \(2^{2t-n}\), and testing it requires approximately \(2^t\) operations.^{Footnote 13} We will have to repeat this procedure \(2^{n-2t}\) times on average; therefore, the total cost of the preimage search is approximately \(2^{n-t}\) evaluations of \(h_1\) and \(h_2\).

As explained in the previous section, building a \(2^t\)-interchange structure requires approximately \(n/2 \cdot 2^{2t+n/2}\) operations. Using \(t=n/6\), we balance the two steps of the attack and reach the optimal complexity of approximately \(n/2 \cdot 2^{5n/6}\) operations for this preimage attack.

## Improved Preimage Attack on XOR Combiners Based on Deep Iterates

The first attack works identically whether the combined hash functions use the HAIFA mode or the MD construction. However, when they are limited to the MD construction, we can launch a more efficient attack than the first. In this case, the pairwise dependency between internal states can be broken efficiently by using repeated message blocks. Explicitly, we use a different approach to obtain the two sets of states \(\mathcal {A}= \big \{ A_j \mid j=0 \ldots 2^t-1\big \}\) and \(\mathcal {B}= \big \{ B_k \mid k=0 \ldots 2^t-1\big \}\) such that for any pair of states \((A_j, B_k)\) with \(A_j\in \mathcal {A}\) and \(B_k \in \mathcal {B}\), we can find a message \( M_{A,B} \) such that \( (IV_1, IV_2) \xrightarrow {M_{A,B}} (A_j, B_k) \). For convenience, we name this abstract procedure GenPairableStates; it is implemented by quite different approaches in the different attacks—the first attack implements it using the interchange structure, this second attack implements it using deep iterates in functional graphs, and, as will be seen, the third attack implements it using cyclic nodes in functional graphs.

The first step is to fix an arbitrary message block *m* for the compression functions, giving rise to *n*-to-*n*-bit random mappings \(f_1(\cdot ) \triangleq h_1(\cdot ,m)\) and \(f_2(\cdot ) \triangleq h_2(\cdot ,m)\). Such random mappings and their functional graphs have many interesting properties and have been extensively studied and used in cryptanalysis, as shown in Sect. 2.7. However, to attack hash combiners, we exploit them in new ways. In this attack, instead of precisely controlling every computational step in chains of equal length to obtain two sets of endpoints as in building an interchange structure, we loosely herd computational chains of various lengths to collect two sets of states. The collected states lie deep in these chains, which are iteratively computed using the above-defined random mappings \( f_1 \) and \( f_2 \). Thus, the collected states are essentially *deep iterates* in the functional graphs of \( f_1 \) and \( f_2 \), which are introduced in Sect. 2.7.1. As shown in Sect. 2.7.1, such special states are relatively easy to reach (i.e., reached with relatively high probability) from randomly selected starting states. This is where the advantage of the attack mainly comes from.

In this attack, given a pair of such special states \( (A_j, B_k) \), finding a common message mapping a pair of starting states to them under the two hash computations is not as efficient as selecting a message from an interchange structure in Attack 1. However, collecting those target states by expanding the functional graphs is much more efficient than computing endpoints by building an interchange structure. This attack amortizes the computational cost over the different steps and thus provides a better balance among them. Moreover, it also provides a better trade-off between the message length and the time complexity.

Unlike the interchange structure-based attack, however, this approach uses chains of various lengths, which implies that the lengths of the message fragments in intermediate attack steps are not fixed in advance, whereas the length of the preimage needs to be predefined. Thus, the length padding at the end of the hash computations becomes a problem. We overcome this problem using our tool, the simultaneous expandable message for two MD hash functions, introduced in Sect. 2.6.

Next, we provide a high-level overview of this attack and then present the detailed attack steps.

### Overview of the Attack

Suppose that we are given an *n*-bit target value \(V\), and our goal is to find a message \(\varvec{M}\) such that \(\mathcal {H}_1(\varvec{M}) \oplus \mathcal {H}_2(\varvec{M}) = V\). Although the formal problem does not restrict \(\varvec{M}\) in any way, several concrete hash functions restrict the length of \(\varvec{M}\). Therefore, we first assume that the length of \(\varvec{M}\) is bounded by a parameter *L*.

The attack is composed of three main phases.

#### Complexity Analysis

Denote \(L = 2^\ell \). For parameters \( g_1 \ge \max (n/2, n-\ell )\) and \(s \ge 0\), the complexity of the phases of the attack (as computed in their detailed descriptions) is as follows (ignoring constant factors):

- Phase 1: \(2^{\ell } + n^2 \cdot 2^{n/2}\);
- Phase 2: \(2^{n+s-g_1}\);
- Phase 3: \(2^{3g_1/2 - s/2} + 2^{\ell +9g_1/2-2n-3s/2} + 2^{\ell +2g_1-n}\).

We balance the time complexities of the second phase and the first term in the expression of the third phase by setting \(n + s - g_1 = 3g_1/2 - s/2\), or \(s = 5g_1/3 - 2n/3\), giving a value of \(2^{n/3 + 2g_1/3}\) for these terms. Furthermore, \(\ell +9g_1/2-2n-3s/2 = \ell +2g_1-n\), and the time complexity expression of Phase 3 simplifies to \(2^{n/3 + 2g_1/3} + 2^{\ell +2g_1-n}\). Since \(g_1\) appears with a positive coefficient in all the exponents, we optimize the attack by picking the minimal value of \(g_1\) under the restriction \(g_1 \ge \max (n/2, n-\ell )\). In case \(\ell \le n/2\), we set \(g_1 = n-\ell \), and the total time complexity of the attack^{Footnote 14} is \(2^{\ell } + 2^{n - 2\ell /3}\).

The optimal complexity is \(2^{2n/3}\), obtained for messages of length \(2^{n/2}\) (see Fig. 12 for a trade-off curve).

### Details of the Preimage Attack on XOR Combiners Using Deep Iterates

Details of Phase 1 can be found in Sect. 2.6. In the following, we describe details of the other two phases.

#### Details of Phase 2: Finding a Set of Target State Pairs

In the second phase, we fix an arbitrary message block *m*, giving rise to the functional graphs \( \mathcal {FG}_{f_1} \) and \( \mathcal {FG}_{f_2} \) defined by the random mappings \(f_1(\cdot ) \triangleq h_1(\cdot ,m)\) and \(f_2(\cdot ) \triangleq h_2(\cdot ,m)\). Given parameters \(g_1 \ge n/2\) and \(s \ge 0\), our goal is to compute a set \(\mathcal {S}\) (of size \(2^s\)) of tuples of the form ((*x*, *y*), *w*), where *w* is a single block such that for each tuple, the following hold:

- 1.
The state *x* is a \(2^{n-g_1}\)th iterate in \( \mathcal {FG}_{f_1} \), and *y* is a \(2^{n-g_1}\)th iterate in \( \mathcal {FG}_{f_2} \).

- 2.
\((x,y) \xrightarrow {w} (a,b)\) and \(h_1(a,pad) \oplus h_2(b,pad) = V\), where *pad* is the final block of the (padded) preimage message of length *L*.

This algorithm resembles the one used in the final phase of the previous interchange structure-based preimage attack (Attack 1 in Sect. 3), as both look for state pairs (*x*, *y*) that give \(h_1(x,w\Vert pad) \oplus h_2(y, w\Vert pad) = V\) (for some message block *w*). The difference is that in the previous attack, (*x*, *y*) is an arbitrary endpoint pair of the interchange structure, while here, we look for *x* and *y* that are deep iterates.

The time complexity of Steps 2 and 3 is approximately \(2^{g_1}\). The time complexity of Step 4.(a) and Step 4.(b) is also bounded by \(2^{g_1}\). We now calculate the expected number of executions of Step 4 until \(2^s\) matches are found and inserted into \(\mathcal {S}\).

According to Observation 1 in Sect. 2.7.1, the expected size of \(\mathcal {T}_1\) and \(\mathcal {T}_2\) (the number of deep iterates) is close to \(2^{g_1}\). Thus, for each execution of Step 4, the expected number of matches on the *n*-bit values \(h_2(y,w \Vert pad) \oplus V = h_1(x,w \Vert pad)\) is \(2^{2g_1 - n}\). Consequently, Step 4 is executed \(2^{s + n - 2g_1 }\) times in order to obtain \(2^s\) matches. Altogether, the total time complexity of this step is \(2^{s+n-2g_1} \cdot 2^{g_1} = 2^{n+s-g_1}\).
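Observation 1's size estimate can be checked exhaustively on a small functional graph. The sketch below (toy parameters of our choosing) computes \(f^{D}\) over the whole domain of a random *n*-bit mapping and counts the distinct depth-*D* iterates, which should number \(\varTheta (2^{n}/D)\):

```python
import random

def count_deep_iterates(n=12, depth_log=6, seed=7):
    """Count the distinct depth-D iterates (D = 2**depth_log) of a random
    n-bit mapping by computing f^D for every point of the domain.
    Observation 1 predicts Theta(2**(n - depth_log)) such nodes."""
    rng = random.Random(seed)
    N = 1 << n
    f = [rng.randrange(N) for _ in range(N)]
    D = 1 << depth_log
    iterates = set()
    for x in range(N):
        for _ in range(D):   # walk D steps from x
            x = f[x]
        iterates.add(x)      # x is now a depth-D iterate
    return len(iterates)
```

In the attack, exhaustive enumeration is of course replaced by expanding \(2^{g_1}\) nodes of the graph; this sketch only illustrates the counting claim.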

#### Details of Phase 3: Hitting a Target State Pair

In the third and final phase, we start from \((\hat{x},\hat{y})\) and compute a message \(\hat{M}_{\Vert q}\) of length *q* (valid as long as it is shorter than \(L-2\)) such that \((\hat{x},\hat{y}) \xrightarrow {\hat{M}_{\Vert q}} (\bar{x},\bar{y})\) for some \(((\bar{x},\bar{y}),\bar{m}) \in \mathcal {S}\). We use in a strong way the fact that the states \( \bar{x} \) and \( \bar{y} \) in \( \mathcal {S}\) are deep iterates (of depth \( 2^{n - g_1} \)) in \( \mathcal {FG}_{f_1} \) and \( \mathcal {FG}_{f_2} \), respectively.

The goal of this phase is to find a pair of starting points of chains reaching some \( ((\bar{x}, \bar{y}), \bar{m}) \in \mathcal {S}\) at the same distance. This phase is carried out by picking an arbitrary starting message block \(\hat{m}\), which gives the points \(x_0 = h_1(\hat{x},\hat{m})\) and \(y_0 = h_2(\hat{y},\hat{m})\). We then continue to evaluate the chains \(x_{i+1} = h_1(x_i , m)\) and \(y_{j+1} = h_2(y_j , m)\) up to length at most \(L-3\). We hope to encounter \(\bar{x}\) at some distance \(q-1\) from \(x_0\) and to encounter \(\bar{y}\) at the same distance \(q-1\) from \(y_0\), where \(((\bar{x},\bar{y}),\bar{m}) \in \mathcal {S}\). If, for all pairs \( ((\bar{x},\bar{y}), \bar{m}) \in \mathcal {S}\), \( \bar{x} \) and \( \bar{y} \) are encountered at different distances in the chains, or at least one of them is not encountered at all, we pick a different value for \( \hat{m} \) and start again. Once we find such a value of \( \hat{m} \) and a pair of iterates \( (\bar{x}, \bar{y}) \), this gives the required \(\hat{M}_{\Vert q} \triangleq \hat{m} \Vert [m]^{q-1}\).

According to Observation 2 in Sect. 2.7.1, for a pair of \( 2^{n - g_1} \)th iterates \( \bar{x} \) and \( \bar{y} \), the probability that they are encountered at the same distance from arbitrary starting points \( x_0 \) and \( y_0 \) of the chains is \( (2^{n- g_1})^3 \cdot 2^{-2n} = 2^{n - 3g_1} \). Since \( \mathcal {S}\) contains \( 2^s \) elements, we conclude that we need to compute about \( 2^{3g_1 - n - s} \) chains from different starting points to find a value of \( \hat{m} \) whose starting points \( (x_0, y_0) \) reach some pair of deep iterates \( (\bar{x}, \bar{y})\) in \( \mathcal {S}\) at the same distance.

The next question we address is to what maximal length \( L' \) we should evaluate the chains \( \vec {x} \) and \( \vec {y} \). As we wish to reach iterates \( \bar{x} \) and \( \bar{y} \) of depth \( 2^{n - g_1} \), it can be shown that \( L' = 2^{n - g_1} \) is optimal. Since the total chain length should be less than \( L - 3 \), this imposes the restriction \( L' = 2^{n - g_1} < L - 3 \), or \( 2^{g_1} > 2^n / L \).

The naive algorithm described above performs about \(2^{3g_1-n-s}\) trials, where each trial evaluates chains of length \(L' = 2^{n - g_1}\) from arbitrary points, giving a total time complexity of approximately \(2^{3g_1-n-s + n - g_1} = 2^{2g_1-s}\). Since \(g_1 \ge n/2\), the time complexity of this phase is at least \(2^{n-s}\); after balancing it with that of Phase 2, the time complexity can be reduced to \( 2^{3n/4} \) by setting \( s = n/4 \).
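The naive trial of this phase — evaluating two chains in lockstep and testing whether some stored target pair is hit at a common distance — can be sketched as follows (toy arrays stand in for \(f_1, f_2\), and random starting pairs stand in for varying the block \(\hat{m}\); all names are ours):

```python
import random

def same_distance_hit(f1, f2, targets, x0, y0, max_len):
    """Walk x_{i+1} = f1[x_i] and y_{i+1} = f2[y_i] in lockstep; return the
    first common distance q at which (x_q, y_q) is a stored target pair."""
    pairs = set(targets)
    x, y = x0, y0
    for q in range(1, max_len + 1):
        x, y = f1[x], f2[y]
        if (x, y) in pairs:
            return q
    return None

def phase3_trials(f1, f2, targets, n_state, max_len, trials, seed=0):
    """Repeatedly sample starting pairs (standing in for varying m_hat)
    until some target pair is reached at a common distance."""
    rng = random.Random(seed)
    for _ in range(trials):
        x0, y0 = rng.randrange(n_state), rng.randrange(n_state)
        q = same_distance_hit(f1, f2, targets, x0, y0, max_len)
        if q is not None:
            return x0, y0, q
    return None
```

The optimized algorithm described next replaces the full-length lockstep walk with distinguished-point bookkeeping so that each chain is only evaluated until it hits the precomputed graph.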

However, it is possible to optimize this naive algorithm by further expanding the graphs of \(f_1\) and \(f_2\). As a result, the evaluated chains are expected to collide with the graphs sooner (before they are evaluated to the full length of \(2^{n - g_1}\)). Once a collision occurs, we use a look-ahead procedure to calculate the distance of the chain’s starting point from \(\bar{x}\) (or \(\bar{y}\)) in each tuple \( ((x,y),w) \in \mathcal {S}\). This look-ahead procedure resembles the one used in attacks on hash-based MACs [28, 54] (although the setting and actual algorithm in our case are obviously different).

We define an \(\mathcal {S}\)-*node* (for \(f_1\)) as a node *x* such that there exists a node *y* and a message block *w* such that \(((x,y),w) \in \mathcal {S}\). An \(\mathcal {S}\)-node for \(f_2\) is defined in a similar way. To avoid heavy update operations for the distances from all the \(\mathcal {S}\)-nodes, we use distinguished points. Essentially, each computed chain is partitioned into intervals according to distinguished points, where each distinguished point stores only the distances to all the \(\mathcal {S}\)-nodes that are contained in its interval up to the next distinguished point. Given a parameter \(g_2 > g_1\), the algorithm for this phase is described below.
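The distinguished-point bookkeeping can be sketched as follows (a toy illustration of the idea only, not the exact algorithm of this phase; the predicate `is_dp`, the per-interval index layout, and all names are our assumptions):

```python
def build_dp_index(f, starts, is_dp, s_nodes, max_walk=10**6):
    """Expand chains and, for every distinguished point (DP) met, record
    (next_dp, interval_length, s_hits), where s_hits lists (offset, node)
    for the S-nodes inside the interval (offsets counted from the DP)."""
    s_set = set(s_nodes)
    index = {}
    for x in starts:
        budget = max_walk
        while not is_dp(x) and budget > 0:      # reach the first DP
            x = f[x]; budget -= 1
        while is_dp(x) and x not in index and budget > 0:
            y, steps, hits = x, 0, []
            while budget > 0:                   # walk one interval
                y = f[y]; steps += 1; budget -= 1
                if y in s_set:
                    hits.append((steps, y))
                if is_dp(y):
                    break
            index[x] = (y, steps, hits)
            x = y
    return index

def distances_to_s_nodes(f, x0, is_dp, index, s_nodes, max_walk=10**6):
    """Distance from x0 to each S-node on its chain: walk explicitly to the
    first DP, then hop interval by interval using the precomputed index."""
    s_set, out = set(s_nodes), {}
    x, steps = x0, 0
    if x in s_set:
        out[x] = 0
    while not is_dp(x) and steps < max_walk:
        x = f[x]; steps += 1
        if x in s_set:
            out.setdefault(x, steps)
    seen = set()
    while x in index and x not in seen:         # stop once the cycle repeats
        seen.add(x)
        nxt, length, hits = index[x]
        for off, node in hits:
            out.setdefault(node, steps + off)
        x, steps = nxt, steps + length
    return out
```

In the attack, a node would be distinguished if, e.g., its \(n-g_2\) least significant bits are zero, so intervals have expected length \(2^{n-g_2}\), matching the cost analysis below.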

The time complexity of Step 1 is approximately \(2^{g_2}\). (Note that we always perform a constant amount of work per developed node.)

The analysis of the time complexity of Step 2 is as follows. As concluded above, the expected number of values of \( \hat{m} \) that we need to test until we find a pair of starting points \( (x_0, y_0) \) whose chains encounter a pair of \( 2^{n - g_1} \)th iterates \( (\bar{x}, \bar{y}) \) in \( \mathcal {S}\) at the same distance is approximately \( 2^{3g_1 - n - s} \).

The analysis of the complexity of Step 2.(a) is as follows. First, we estimate the expected number of nodes that we visit during the computation of a chain. Initially, we compute approximately \(2^{n-g_2}\) nodes until we hit stored distinguished points. Then, we continue by traversing (only) distinguished points up to a depth of about *L*. The expected number of such points is \(L \cdot 2^{g_2-n}\). Therefore, we expect to visit approximately \(2^{n-g_2} + L \cdot 2^{g_2-n}\) nodes while computing a chain. Finally, we need to account for all the \(\mathcal {S}\)-nodes encountered while traversing the chains of depth *L*. Basically, there are \(2^s\) \(\mathcal {S}\)-nodes, which are iterates of depth \(2^{n-g_1}\), chosen (essentially at random) in Phase 2 out of approximately \(2^{g_1}\) such deep iterates. As a result, the probability of such a deep iterate being an \(\mathcal {S}\)-node is approximately \(2^{s-g_1}\) (while other nodes have probability 0). Therefore, while traversing chains of depth *L*, we expect to encounter at most \(L \cdot 2^{s-g_1}\) \(\mathcal {S}\)-nodes (which bounds the sizes of \(\mathcal {T}_1\) and \(\mathcal {T}_2\)). Altogether, the expected time complexity of a single execution of Step 2.(a) is at most \(2^{n-g_2} + L \cdot 2^{g_2-n} + L \cdot 2^{s-g_1}\).

The total time complexity of this phase is \(2^{g_2} + 2^{3g_1-n-s} \cdot (2^{n-g_2} + L \cdot 2^{g_2-n} + L \cdot 2^{s-g_1}) = 2^{g_2} + 2^{3g_1-g_2-s} + L \cdot 2^{3g_1+g_2-2n-s} + L \cdot 2^{2g_1-n}\). We set \(g_2 = 3g_1/2 - s/2\), which balances the first two terms and gives a time complexity of \(2^{3g_1/2-s/2} + L \cdot 2^{9g_1/2-2n-3s/2} + L \cdot 2^{2g_1-n}\).

The time complexity evaluation of the full attack at the beginning of this section shows that for the optimal parameters of this attack, the two extra terms \(L \cdot 2^{9g_1/2-2n-3s/2} + L \cdot 2^{2g_1-n}\) are negligible compared to the other terms in the complexity equation. In other words, the distinguished points method allows us to resolve, with no overhead, the complication of keeping track of distances from the \(\mathcal {S}\)-nodes.

### Optimizing the Deep Iterate-Based Preimage Attack on XOR Combiners Using the Interchange Structure

The above deep iterate-based preimage attack on XOR combiners can be slightly improved using an interchange structure. Recall that the interchange structure helps to break the dependency between the two hash computations on a common message. When building a \( 2^r \)-interchange structure starting from the pair of endpoints \( (\hat{x}, \hat{y}) \) of the simultaneous expandable message and ending at two sets of states \( \mathcal {A} = \{A_1, A_2, \dots , A_{2^r} \} \) and \( \mathcal {B} = \{B_1, B_2, \dots , B_{2^r} \}\), any \( A_i \in \mathcal {A} \) can be paired with any \( B_j \in \mathcal {B} \). (For any such pair, one can easily find a common message mapping \( (\hat{x}, \hat{y}) \) to it.) Therefore, by using a single message block \( \hat{m} \) to generate two sets of \( 2^r \) random starting nodes from \( \mathcal {A} \) and \( \mathcal {B} \), respectively, we obtain \( 2^{2r} \) pairs of starting nodes. As a result, the required number of samplings of the random message block \( \hat{m} \) is reduced by a factor of \( 2^{2r}\).

The detailed complexity analysis of the attack using a \( 2^r \)-interchange structure is as follows: Denote \( L = 2^{\ell } \). For parameters \( g_1 \ge \max (n/2, n - {\ell }) \), \( g_2 \ge 0 \), \( s \ge 0 \) and \( 0 \le r \le {\ell }/2 \) (because the length \( 2^{2r} \) of the interchange structure should be less than the message length \( 2^{\ell } \)), the complexity of the phases of the attack is given below (ignoring constant factors).

Compared with the complexity of the attack in Sect. 4.1, the difference lies in Phase 3. In the complexity formula of Phase 3, the term \( 2^{g_2} \) is the number of nodes developed in the look-ahead procedure; the term \(2^{3g_1 - n - s - 2r} \) is the required number of samplings on the value of message block \( \hat{m} \) to get pairs of starting nodes, which is reduced by a factor of \( 2^{2r} \) when building a \( 2^r \)-interchange structure; the term \( 2^r \cdot (2^{n - g_2} + 2^{{\ell } + g_2 - n} + 2^{{\ell } + s - g_1}) \) is the time complexity for computing distances of pairs of starting nodes (generated using the same value for \( \hat{m} \)) from all \( 2^s \) target nodes; the term \( 2^{n/2 + 2r} \) is the time complexity for building the \( 2^r \)-interchange structure.

We first balance the first two terms in Phase 3 by setting \( g_2 = 3g_1 - g_2 - s - r\), which gives \( g_2 = 3g_1/2 - s/2 - r/2 \). Thus, the complexity of Phase 3 becomes \(2^{3g_1/2 - s/2 - r/2} + 2^{9g_1/2 - 3s/2 - 3r/2 + {\ell } -2n} + 2^{2g_1 + {\ell } -n - r} + 2^{n/2 + 2r}.\) We then balance Phase 2 and Phase 3 by setting \( n + s - g_1 = 3g_1/2 - s/2 - r/2 \), which gives \( s = 5g_1/3 - r/3 - 2n/3 \). The sum of all dominant terms turns out to be \( 2^{\ell } + 2^{n/3 + 2g_1/3 - r/3} + 2^{2g_1 + {\ell } - n - r} + 2^{n/2 + 2r}. \) Finally, we pick the minimal value of \( g_1 \) under the restriction \( g_1 \ge \max (n/2, n - {\ell }) \). In case \( {\ell } \le n / 2 \), we set \( g_1 = n - {\ell } \). The sum of dominant terms turns out to be \( 2^{\ell } + 2^{n - 2{\ell }/3 - r/3} + 2^{n - {\ell } - r} + 2^{n/2 + 2r}.\) Since \( n - 2{\ell }/3 - r/3 > n - {\ell } - r \) always holds, the sum of dominant terms is \( 2^{\ell } + 2^{n - 2{\ell }/3 - r/3} + 2^{n/2 + 2r} \).

Note that there is a restriction on *r*, namely \( r \le \ell /2 \). As a result (see Fig. 12 for a trade-off curve):

For the case \( {\ell } \le 3n/11 \), we have \( n - 2{\ell }/3 - r/3 > n/2 + {\ell } \ge n/2 + 2r \). We set \( r = {\ell }/2 \) to optimize the complexity. Thus, the sum of dominant terms is \( 2^{n - 5{\ell }/6} \). The optimal complexity is \( 2^{17n/22} \), obtained for messages of length \( 2^{\ell } = 2^{3n/11} \).

For the case \( 3n/11 < {\ell } \le n/2 \), we set \( r = 3n/14 - 2{\ell }/7 \) to balance the terms, which fulfils \( r < {\ell }/2 \) in this case. The sum of dominant terms is \( 2^{13n/14 - 4{\ell }/7}\). The optimal complexity is \( 2^{9n/14} \), obtained for messages of length \( 2^{\ell } = 2^{n/2} \).

## Improved Preimage Attack on XOR Combiners Based on Multi-cycles

When the underlying hash functions use the MD construction and the maximum message length is allowed to exceed \( 2^{n/2} \) blocks, we can further improve the previous deep iterate-based preimage attack. The idea is to utilize even more special nodes in the functional graphs, the cyclic nodes, and to exploit a technique named *multi-cycles*, as introduced in Sect. 2.7.2. Recall that in the deep iterate-based attack, a key step is to find two starting nodes \( x_0 \) and \( y_0 \) in the functional graphs of \( f_1 \) and \( f_2 \) that reach the selected target nodes \( \bar{x} \) and \( \bar{y} \) at a common distance. We find that when cyclic nodes are selected as the target nodes \( \bar{x} \) and \( \bar{y} \), the probability of a pair of random nodes \( (x_0, y_0) \) reaching them at a common distance can be greatly amplified. Indeed, cyclic nodes are essentially special deep iterates that are located not only *deep* in the functional graph but also in a *cycle* of the graph. Therefore, for two cyclic nodes in two independent functional graphs, by looping around the cycles, some differences between the distances from two random nodes to the two cyclic nodes can be corrected by the difference between the two cycle lengths. More precisely, if the members of a target node pair \((\bar{x}, \bar{y})\) are both cyclic nodes within the largest components of the two functional graphs, the probability of a random pair \((x_0, y_0)\) reaching \((\bar{x}, \bar{y})\) at a common distance is amplified by a factor of \(\#C\), the maximum number of cycles that can be used, by using the set of correctable distance biases as stated in Sect. 2.7.2. Moreover, this probability amplification comes with almost no increase in the complexity of Step 2, which leads to a new complexity trade-off between Steps 2 and 3. Thus, the use of cyclic nodes and multi-cycles enables us to reduce the computational complexity of preimage attacks on the XOR combiner.

### Overview of the Attack

Here, we briefly list the *main* steps of our preimage attack on the XOR combiner.

By balancing the complexities of these steps, we obtain an optimal complexity of \(2^{5n/8}\).

### Details of the Preimage Attack on the XOR Combiner Using Multi-cycles

In this section, we present the complete description of the attack procedure and its complexity evaluation. We point out that the length of our preimage is at least \(2^{n/2}\) blocks due to the use of (multi-)cycles.

#### Attack Procedure

Denote by *V* the target hash digest. Suppose the attacker is going to produce a preimage message of length *L*. The value of *L* will be discussed later. The attack procedure is described below.

#### Complexity Analysis

For parameters \( L \ge 2^{n/2} \), \( s \ge 0 \) and \( t \ge 0 \), the complexity of the steps of the attack is given below (ignoring constant and polynomial factors for simplicity of description).

**Step 1:** \(L+n^2 \cdot 2^{n/2}\) (refer to Sect. 2.6);

**Step 2:** \(2^{n/2} + L/L_1 \approx 2^{n/2} + 2^{-n/2} \cdot L \approx 2^{n/2}\);

**Step 3:** \(2^{s+n/2}\); one execution of the search procedure has a complexity of \(L_1 + L_2\) and contributes \(L_1 \cdot L_2\) pairs. As \(L_1 \cdot L_2 = \varTheta (2^{n})\), one tuple can be obtained by a constant number of executions. Hence, the number of necessary executions is \(\varTheta (2^s)\), and the complexity of this step is \(\varTheta (2^{s+n/2})\).

**Step 4:** \(2^{t}+2^{n/2}\); the complexity of developing \(2^t\) nodes and computing their distances to a particular target node is \(2^t\) (refer to Algorithm 5 and Step 4a). The complexity of computing the distances of all the other target nodes to the particular target node is upper bounded by \(2^{n/2}\) (refer to the expectation of the maximum cycle length in Thm. 3). Hence, the complexity of this step is \(2^{t}+2^{n/2}\).

**Step 5:** \(2^{2n-t-s}/L\); one execution of the search procedure requires a time complexity of \(2^{n-t}\). Clearly, a constant fraction of the chains encounter nodes stored in \( \mathcal {G}_{1} \) and \( \mathcal {G}_{2} \). We mainly need to evaluate the probability of deriving a common distance for each chain. For every pair of target nodes \((\bar{x}_i, \bar{y}_i)\), the value of \(d_{x_{i}}-d_{y_i}\) equals a correctable distance bias in \(\mathcal {D}\) with probability \(\#C \cdot 2^{-n/2} \approx L \cdot 2^{-n}\). Since there are \(2^s\) pairs of target nodes, the success probability of each chain is \(L \cdot 2^{s-n}\). Hence, the total number of chains is \(2^{n-s}/L\), and the complexity of this step is \(2^{n-t} \cdot 2^{n-s} / L = 2^{2n-t-s}/L\).

**Steps 6 and 7:**\(\mathcal {O}(L)\).

The overall complexity is computed as (denote \( L = 2^{\ell } \))

\( 2^{\ell } + 2^{s + n/2} + 2^{t} + 2^{2n - t - s - \ell }, \)

where the complexity of Step 2 is ignored.

Now, we search for parameters *t* and *s* that give the lowest complexity. First, we balance Step 3 and Step 4 by setting \( s + n/2 = t \). That gives \( s = t - n/2 \). The complexity becomes (ignoring constant factors) \(2^{\ell } + 2^t + 2^{5n/2 - 2t - \ell }.\) We then make a further balance by setting \( t = 2n - 2t + n/2 - \ell \), i.e., \( t = 5n/6 - \ell /3 \). Thus, the total complexity becomes \( 2^{\ell } + 2^{5n/6 - \ell /3} \).

Hence,

For the case \( n/2 \le \ell \le 5n/8 \), the final complexity is \( 2^{5n/6 - \ell /3} \);

For the case \( 5n/8 < \ell \), the final complexity is \( 2^{\ell } \).

The optimal complexity is \( 2^{5n/8} \), obtained for messages of length \( 2^{5n/8} \) (see Fig. 12 for a trade-off curve).
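The parameter balancing above can also be checked numerically. The sketch below is an illustration of the optimization only (not of the attack): it normalizes all exponents so that \( n = 1 \), brute-forces *t* and *s* over a grid, and recovers the claimed exponent \( \max (\ell , 5n/6 - \ell /3) \).

```python
# Brute-force the dominant exponent max(ell, s + n/2, t, 2n - t - s - ell)
# over a grid of (t, s); exponents are normalized so that n = 1.
def best_exponent(ell, steps=600):
    best = float("inf")
    for i in range(steps + 1):
        t = i / steps  # t ranges over [0, n]
        for j in range(steps + 1):
            s = j / steps  # s ranges over [0, n]
            best = min(best, max(ell, s + 0.5, t, 2.0 - t - s - ell))
    return best
```

For example, `best_exponent(0.5)` returns approximately \( 2/3 \) (i.e., \( 2^{2n/3} \) at \( \ell = n/2 \)), and `best_exponent(0.625)` returns approximately \( 0.625 \), the optimal point \( 2^{5n/8} \).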

### Optimizing the Multi-cycle-Based Preimage Attack on the XOR Combiner Using the Interchange Structure

Again, similar to the previous deep iterate-based preimage attack, this multi-cycle-based preimage attack on the XOR combiner can also be improved using an interchange structure. The complexity of the attack using a \( 2^r \)-interchange structure is analysed as follows.

Denote \( L = 2^{\ell } \). For parameters \( t \ge n/2 \), \( {\ell } \ge n/2 \) and \( 0 \le r \le {\ell }/2 \) (because the length \( 2^{2r} \) of the interchange structure should be no larger than the message length \( L = 2^{\ell } \)), the complexity of each step is as follows:

The sum of dominant terms is \( 2^{\ell } + 2^{s + n/2} + 2^{t} + 2^{2n - t - s - r - \ell } + 2^{n/2 + 2r} \).

We first balance the last four terms by setting \( s + n/2 = t = 2n - t - s - r - {\ell } = n/2 + 2r\). Thus, \( t = 11n/14 - 2\ell /7 \), \( s = 2n/7 - 2\ell /7\) and \( r = n/7 - \ell /7 \). Note that for \( {\ell } \ge n/2 \), we have \( n/7 - {\ell }/7 \le n/14 < n/2 \). Thus, the restriction \( r \le {\ell }/2 \) always holds in this setting. The total complexity turns out to be \( 2^{\ell } + 2^{11n/14 - 2\ell /7} \).

Hence,

For the case \( n/2 \le {\ell } \le 11n/18 \), the final complexity is \( 2^{11n/14 - 2\ell /7} \);

For the case \( 11n/18 < {\ell } \), the final complexity is \( 2^{\ell } \).

The optimal complexity is \( 2^{11n/18} \), obtained for messages of length \( 2^{11n/18} \) (see Fig. 12 for a trade-off curve).

## Second-Preimage Attack on Concatenation Combiners Based on Deep Iterates

In this section, we introduce the first second-preimage attack faster than \( 2^{n} \) on concatenation combiners of MD hashes.

In this attack, we are given a challenge message \(\varvec{M}= m_1\Vert m_2\Vert \ldots \Vert m_L\), and our goal is to find another message \(\varvec{M}'\) such that \(\mathcal {H}_1(\varvec{M}') \Vert \mathcal {H}_2(\varvec{M}') = \mathcal {H}_1(\varvec{M}) \Vert \mathcal {H}_2(\varvec{M})\) (or equivalently \(\mathcal {H}_1(\varvec{M}') = \mathcal {H}_1(\varvec{M})\) and \(\mathcal {H}_2(\varvec{M}') = \mathcal {H}_2(\varvec{M})\)). We denote the sequence of internal states computed during the invocation of \(h_1\) (respectively, \(h_2\)) on \(\varvec{M}\) by \(a_0,a_1,\ldots ,a_L\) (respectively, \(b_0,b_1,\ldots ,b_L\)).

The general framework of our attack is similar to that of the long-message second-preimage attack on a single MD hash proposed by Kelsey and Schneier and described in Sect. 2.2. Namely, we first compute the sequences of internal states \(a_1,\ldots ,a_L\) and \(b_1,\ldots ,b_L\) by applying the compression functions \(h_1\) and \(h_2\) on the challenge message \(\varvec{M}= m_{1}\Vert \ldots \Vert m_L\). Our goal is then to “connect” to one of the state pairs \((a_i,b_i)\) using a different message prefix of the same size. Once we manage to achieve this, we can reuse the same message suffix as in \(\varvec{M}\) and obtain a second preimage.

There are two main challenges in this approach. The primary challenge is to connect to some state pair \((a_i,b_i)\) generated by \(\varvec{M}\) from a different message. The secondary challenge is to ensure that the connecting message prefixes are of the same length. We overcome the secondary challenge by building a simultaneous expandable message for two Merkle–Damgård hash functions, as described in Sect. 2.6.

A much more difficult challenge is to actually connect to the challenge message on a state pair \((a_i,b_i)\) from a different message of arbitrary (smaller) length. Indeed, the obvious approach of attempting to reach an arbitrary 2*n*-bit state pair by trying random messages requires more than \(2^n\) time, since the number of target state pairs is equal to the message length which is smaller than \(2^n\). A more promising approach is to use the *interchange structure* introduced in Sect. 3. Recall that the interchange structure consists of an initial state pair (*a*, *b*), a set of message fragments \( \mathcal {M} \) and two sets of internal states \(\mathcal {A}\) (for \(\mathcal {H}_1\)) and \(\mathcal {B}\) (for \(\mathcal {H}_2\)) such that for any value \(A \in \mathcal {A}\) and any value \(B \in \mathcal {B}\), it is possible to efficiently construct a message \(M_{A,B} \in \mathcal {M}\) that maps (*a*, *b*) to (*A*, *B*). Assume that there exists an index \(i \in \{1,2,\ldots ,L\}\) such that \(a_i \in \mathcal {A}\) and \(b_i \in \mathcal {B}\); then, we can connect to \((a_i,b_i)\) using \(M_{a_i,b_i}\) as required. Unfortunately, this does not result in an efficient attack, essentially because building an interchange structure with sufficiently large sets \(\mathcal {A}\) and \(\mathcal {B}\) is too expensive.

Here, as in the deep iterate-based preimage attack on the XOR combiner, we use deep iterates in the functional graphs of \( f_1(\cdot ) \triangleq h_1(\cdot ,m) \) and \( f_2(\cdot ) \triangleq h_2(\cdot ,m) \). (As a result, the attack is not applicable when either of the underlying hash functions follows the HAIFA framework.) More specifically, our goal is to find a state pair \((a_p,b_p)\) composed of two deep iterates in \(\mathcal {FG}_{f_1}\) and \(\mathcal {FG}_{f_2}\), respectively.^{Footnote 15} Once we find such a “special” state pair, we show how to simultaneously reach both of its states in an efficient manner from an arbitrary state pair. Combined with the simultaneous expandable message, this gives the desired second preimage.
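The collapsing behaviour of iterates that underlies this idea can be observed even in a toy experiment; `toy_h` and the 16-bit state space below are invented stand-ins for an *n*-bit compression function, not the paper's \(h_1, h_2\).

```python
import hashlib

def toy_h(state: int, block: int) -> int:
    # A 16-bit stand-in compression function built from SHA-256.
    data = state.to_bytes(2, "big") + block.to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")

def iterates_of_depth(f, depth, starts):
    """Set of depth-th iterates f^depth(x) over all given start points."""
    out = set()
    for x in starts:
        for _ in range(depth):
            x = f(x)
        out.add(x)
    return out
```

Iterating \( f(x) = \texttt{toy\_h}(x, m) \) for 256 steps from 1024 distinct starting points yields far fewer than 1024 distinct endpoints, matching the estimate that a random mapping on *N* points has only about 2*N*/*d* distinct *d*-th iterates.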

Next, as in the previous attack, we start by providing a high-level overview of the attack and then give technical details.

### Overview of the Attack

The attack is composed of three main phases.

#### Complexity Analysis

Denote \(L = 2^\ell \). For a parameter \( g_1 \ge \max (n/2, n-\ell )\), the complexity of the phases of the attack (as computed in their detailed descriptions) is given below (ignoring constant factors): **Phase 1:** \(2^{\ell } + n^2 \cdot 2^{n/2}\); **Phase 2:** \(2^{2n - g_1 - \ell }\); **Phase 3:** \(2^{3g_1/2}\).

We balance the second and third phases by setting \(2n - g_1 - \ell = 3g_1/2\), i.e., \(g_1 = 2/5 \cdot (2n-\ell )\), giving a time complexity of \(2^{3/5 \cdot (2n-\ell )}\). This trade-off holds as long as \(2^{\ell } + n^2 \cdot 2^{n/2} \le 2^{3/5 \cdot (2n-\ell )}\), or \(\ell \le 3n/4\). The optimal complexity is \(2^{3n/4}\), obtained for \(\ell = 3n/4\) (see Fig. 12 for a trade-off curve). The attack is faster than \(2^n\) (Joux’s preimage attack) for^{Footnote 16}\(\ell > n/3\). The message range for which the attack is faster than \(2^n\) can be slightly improved to \(L \ge 2^{2n/7}\) using the optimized attack described in Sect. 6.3.
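The balancing of \(g_1\) can be verified numerically; the sketch below (an illustration only, with exponents normalized so that \( n = 1 \)) brute-forces \(g_1\) over its admissible range and recovers the claimed exponent \( \max (\ell , 3/5 \cdot (2n - \ell )) \).

```python
def concat_sp_exponent(ell, steps=2000):
    # Admissible range for g1 is [max(n/2, n - ell), n], with n = 1.
    lo = max(0.5, 1.0 - ell)
    best = float("inf")
    for i in range(steps + 1):
        g1 = lo + (1.0 - lo) * i / steps
        # Phase 1: 2^ell; Phase 2: 2^(2n - g1 - ell); Phase 3: 2^(3*g1/2).
        best = min(best, max(ell, 2.0 - g1 - ell, 1.5 * g1))
    return best
```

For instance, at \( \ell = n/2 \) the search settles on exponent \( 0.9 \), i.e., \( 2^{9n/10} = 2^{3/5 \cdot (2n - n/2)} \), and for \( \ell \le n/3 \) it degrades to \( 2^n \), consistent with the threshold above.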

### Details of the Second-Preimage Attack on Concatenation Combiners Using Deep Iterates

Details of Phase 1 can be found in Sect. 2.6.

#### Details of Phase 2: Finding a Target State Pair

In the second phase, we fix an arbitrary message block *m*, giving rise to the functional graph \( \mathcal {FG}_{f_1} \) of \(f_1(\cdot ) \triangleq h_1(\cdot ,m)\) and \( \mathcal {FG}_{f_2} \) of \(f_2(\cdot ) \triangleq h_2(\cdot ,m)\) and let \(g_1 \ge n/2\) be a parameter (to be determined later). Our goal is to find a pair of states \((\bar{x},\bar{y})\), a message block \(\bar{m}\) and an index *p* such that the following two conditions hold:

- 1.
The state \(\bar{x}\) is a \(2^{n-g_1}\)th iterate in \( \mathcal {FG}_{f_1} \) and \(\bar{y}\) is a \(2^{n-g_1}\)th iterate in \( \mathcal {FG}_{f_2}\).

- 2.
The state pair \((\bar{x},\bar{y})\) is mapped to \((a_p,b_p)\) by \(\bar{m}\), or \((\bar{x},\bar{y}) \xrightarrow {\bar{m}} (a_p,b_p)\).

The algorithm of this phase is given below.

The time complexity of Steps 2 and 3 (which execute Algorithm 5) is approximately \(2^{g_1}\). The time complexity of Step 4.(a) and Step 4.(b) is also bounded by \(2^{g_1}\) (given that \(a_1,\ldots ,a_L\) and \(b_1,\ldots ,b_L\) are sorted in memory), as the size of \(\mathcal {T}_1\) and \(\mathcal {T}_2\) is at most \(2^{g_1}\) and the number of matches found in each step can only be smaller.

We now calculate the expected number of executions of Step 4 until the required \((a_p,b_p)\) is found. According to Observation 1 in Sect. 2.7.1, the expected size of \(\mathcal {T}_1\) and \(\mathcal {T}_2\) (that contain iterates of depth \(2^{n-g_1}\)) is close to \(2^{g_1}\). According to the birthday paradox, the expected size of \(\mathcal {T}'_1\) is approximately \(L \cdot 2^{g_1 - n}\). Similarly, the number of matches \(y' = b_{j}\) is also approximately \(L \cdot 2^{g_1 - n}\). The probability of a match \(i=j\) in Step 4.(b) is computed using a birthday paradox on the *L* possible indexes, namely \(1/L \cdot (L \cdot 2^{g_1 - n})^2 = L \cdot 2^{2g_1 - 2n}\). As a result, Step 4 is executed approximately \(1/L \cdot 2^{2n - 2g_1}\) times until the required \((a_p,b_p)\) is found. (The executions with different blocks \(m'\) are essentially independent.) Altogether, the total time complexity of this step is \( 2^{g_1} \cdot 1/L \cdot 2^{2n - 2g_1} = 2^{2n - g_1 - \ell } \).

Since the index *p* is uniformly distributed in the interval [1, *L*], we will assume that \(p = \varTheta (L)\).

#### Details of Phase 3: Hitting the Target State Pair

In the third and final phase, we start from the pair of endpoints \((\hat{x},\hat{y})\) of the simultaneous expandable message constructed in Phase 1 and compute a message fragment \(\hat{M}_{\Vert q}\) of length \(q < p-1\) such that \((\hat{x},\hat{y}) \xrightarrow {\hat{M}_{\Vert q}} (\bar{x},\bar{y})\). As in the deep iterate-based preimage attack on the XOR combiner, we again rely strongly on the fact that the state \(\bar{x}\) (and \(\bar{y}\)) is a deep iterate (of depth \(2^{n-g_1}\)) in the functional graph of \(f_1(x)\) (\(f_2(y)\)).

This phase is carried out by picking an arbitrary starting message block \(\hat{m}\), which gives points \(x_0 = h_1(\hat{x},\hat{m})\) and \(y_0 = h_2(\hat{y},\hat{m})\). We then continue to evaluate the chains \(x_i = h_1(x_{i-1} , m)\) and \(y_j = h_2(y_{j-1} , m)\) up to a maximal length \(L'=2^{n-g_1}\). We hope to encounter \(\bar{x}\) at some distance \(q-1\) from \(x_0\) and to encounter \(\bar{y}\) at the same distance \(q-1\) from \(y_0\). Given that \(q-1 < p\), this will give the required \(\hat{M} = \hat{m} \Vert [m]^{q-1}\) (where \([m]^{q-1}\) denotes the concatenation of \(q-1\) message blocks *m*), which is of length \(q < p-1\). If \(\bar{x}\) and \(\bar{y}\) are encountered at different distances in the chains, or at least one of them is not encountered at all, we pick a different value for \(\hat{m}\) and start again.
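One trial of this procedure can be sketched as follows; `h1`, `h2` and all inputs below are placeholders for a toy illustration, not the actual compression functions.

```python
def distance_to(f, start, target, limit):
    """Number of f-steps from start to target, or None if more than limit."""
    x = start
    for d in range(limit + 1):
        if x == target:
            return d
        x = f(x)
    return None

def trial(h1, h2, x_hat, y_hat, m_hat, m, x_bar, y_bar, limit):
    """One attempt: both chains must reach (x_bar, y_bar) at a common distance.

    On success (distance d), the connecting fragment is m_hat || m^d.
    """
    dx = distance_to(lambda v: h1(v, m), h1(x_hat, m_hat), x_bar, limit)
    dy = distance_to(lambda v: h2(v, m), h2(y_hat, m_hat), y_bar, limit)
    if dx is not None and dx == dy:
        return dx
    return None  # retry with a different m_hat
```

With toy functions, e.g. \( h(v, m) = 2v + m \bmod 101 \), a trial either returns the common distance or `None`, and the attacker keeps sampling fresh values of \( \hat{m} \) until a trial succeeds.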

Since \(\bar{x}\) (resp. \( \bar{y} \)) is an iterate of depth \(2^{n - g_1}\) in \(\mathcal {FG}_{f_1}\) (resp. \( \mathcal {FG}_{f_2} \)), it is an endpoint of a chain of states of length \(L' = 2^{n - g_1}\) (such a chain was computed in Phase 2). According to Observation 2, the probability that \(\bar{x}\) and \(\bar{y}\) will be encountered at the same distance from arbitrary starting points \(x_0\) and \(y_0\) of chains is \( 2^{n - 3g_1} \). It follows that we need to compute approximately \(2^{3g_1-n}\) chains from different starting points. Each chain is of length up to \(L' = 2^{n - g_1}\). This gives a total time complexity of about \(2^{3g_1-n + n - g_1} = 2^{2g_1}\). Since \(g_1 \ge n/2\), the time complexity of the full algorithm is at least \(2^n\), and the attack is not faster than Joux’s preimage attack.

To optimize the algorithm, as we did in the deep iterate-based preimage attack, we use a look-ahead procedure by further expanding the graphs of \(f_1\) and \(f_2\). The difference is that since we only select a single pair of deep iterates as the target, we do not use the distinguished point technique^{Footnote 17}\(^{,}\)^{Footnote 18}. We pick a parameter \(g_2 > g_1\) and execute the algorithm below.

The time complexity of Step 1 is approximately \(2^{g_2}\). As concluded above, in Step 2, we perform approximately \(2^{3g_1-n}\) trials on the value of \( \hat{m} \) before finding two starting points \( x_0 \) and \( y_0 \) at the same distance from \(\bar{x}\) and \(\bar{y}\). According to the analysis of Sect. 2.7.1, each trial requires approximately \(2^{n-g_2}\) computation (before hitting \(\mathcal {G}_1\) and \(\mathcal {G}_2\)). Therefore, the total time complexity of this phase is \(2^{g_2} + 2^{3g_1-n}\cdot 2^{n-g_2} = 2^{g_2} + 2^{3g_1-g_2}\). The complexity is minimized by setting \(g_2 = 3g_1/2\), which balances the two terms and gives a time complexity of \( 2^{3g_1/2} \).

Finally, we note that the memory complexity of this algorithm can be optimized using distinguished points. A detailed way to achieve this has been presented in the closely related algorithm in Sect. 4.2.2.

### Optimizing the Deep Iterate-Based Second-Preimage Attack on Concatenation Combiners Using an Interchange Structure

Similar to the deep iterate-based preimage attack on the XOR combiner, this deep iterate-based second-preimage attack can also be slightly improved using an interchange structure. The detailed complexity analysis of the attack using a \( 2^r \)-interchange structure is as follows:

Denote \( L = 2^{\ell } \). For parameters \( g_1 \ge \max (n/2, n - {\ell }) \), \( g_2 \ge 0 \) and \( 0 \le r \le {\ell }/2 \) (because the length \( 2^{2r} \) of the interchange structure should be less than the message length \( 2^{\ell } \)), the complexity of the phases of the attack is (ignoring constant factors)

Compared with the complexity of the attack in Sect. 6.1, the difference lies in Phase 3. In the complexity formula of Phase 3, the term \( 2^{g_2} \) is the number of nodes developed in the look-ahead procedure; the term \( 2^{3g_1 - n - 2r} \) is the required number of samplings on the random message block \( \hat{m} \) when trying to find a pair of starting nodes reaching the pair of \( 2^{n-g_1}\)th iterates \( (\bar{x}, \bar{y}) \) at a common distance; the term \( 2^{r + n - g_2} \) is the time complexity for computing the distances of a set of \( 2^r \) starting nodes (generated using the same value for \( \hat{m} \)) from a target node; and the term \( 2^{n/2 + 2r} \) is the time complexity for building a \( 2^r \)-interchange structure.

We first balance the first two terms in Phase 3 by setting \( g_2 = 3g_1 - g_2 - r \), which gives \( g_2 = 3g_1/2 - r/2 \). The sum of all dominant terms is \( 2^{\ell } + 2^{2n - g_1 - \ell } + 2^{3g_1/2 - r/2} + 2^{n/2 + 2r} \).

For \( {\ell } > 7n/17 \), we set \( 2n - g_1 - {\ell } = 3g_1/2 - r/2 = n/2 + 2r \), which gives \( g_1 = 19n/22 - 5\ell /11 \) and \( r = 7n/22 - 3\ell /11 \). The total complexity is then \(2^{\ell } + 2^{25n/22 - 6{\ell }/11}\). For \( {\ell } > 25n/34 \), we have \( 25n/22 - 6{\ell }/11 < {\ell } \). Thus, the time complexity is \( 2^{\ell } \) for \( {\ell } > 25n/34 \). Note that \( g_1 = 19n/22 - 5{\ell }/11\) fulfils the restriction \( g_1 \ge \max (n/2, n - \ell ) \) for \( n/4 \le \ell \le 4n/5 \), and \( r = 7n/22 - 3\ell /11 \) fulfils the restriction \( 2r < \ell \) as long as \( \ell > 7n/17 \). Thus, the time complexity is \( 2^{25n/22 - 6{\ell }/11} \) for \( 7n/17 < {\ell } \le 25n/34 \).

For \( {\ell } \le 7n/17 \), we directly set \( r = {\ell }/2 \) (the maximum under the restriction \( 2r < \ell \)) to optimize the complexity (because, as shown next, the balanced sum \( 2^{2n - g_1 - {\ell }} + 2^{3g_1/2 - r/2} \) is greater than \( 2^{n/2 + 2r} \) under the restrictions \( g_1 \ge \max (n/2, n-{\ell }) \), \( 2r < \ell \) and \( {\ell } \le 7n/17 \)). The formula is \(2^{2n - g_1 - {\ell }} + 2^{3g_1/2 - {\ell }/4} + 2^{n/2 + {\ell }}.\) We balance the first two terms by setting \( 2n - g_1 - {\ell } = 3g_1/2 - {\ell }/4 \), from which we deduce that \( g_1 = 4n/5 - 3{\ell }/10 \) (fulfilling the restriction \( g_1 \ge \max (n/2, n-{\ell }) \) as long as \( \ell > 2n/7 \)). Then, the complexity is \( 2^{6n/5 - 7{\ell }/10} + 2^{n/2 + {\ell }} \). Since \( 6n/5 - 7{\ell }/10 \ge n/2 + {\ell } \) for \( {\ell } \le 7n/17 \), the total time complexity is then \( 2^{6n/5 - 7{\ell }/10} \). It is no less than \( 2^n \) for \( \ell < 2n/7 \).

Thus, the final time complexity of this attack using the interchange structure is summarized as follows:

For the case \( {\ell } < 2n / 7 \), the complexity is \( 2^n \) achieved by Joux’s attack;

For the case \( 2n / 7 \le {\ell } \le 7n / 17 \), the complexity of this attack is \( 2^{6n/5 - 7{\ell }/10} \);

For the case \( 7n / 17 < {\ell } \le 25n/34 \), the complexity of this attack is \( 2^{25n/22 - 6{\ell }/11} \);

For the case \( {\ell } > 25n/34 \), the complexity of this attack is \( 2^{\ell } \).

The optimal complexity is \( 2^{25n/34} \), obtained for messages of length \( 2^{25n/34} \) (see Fig. 12 for a trade-off curve).

## Second-Preimage Attack on the Zipper Hash

In this section, we present the first second-preimage attack on the Zipper hash, which is applicable for *idealized* compression functions and is hence a generic attack. Again, the attack is based on the deep iterates and multi-cycles in the functional graphs defined by \(f_1(\cdot ) \triangleq h_1(\cdot , m) \) and \( f_2(\cdot ) \triangleq h_2(\cdot , m) \) with a fixed single-block message value *m*. The general framework is similar to that of the above attacks on combiners of MD hashes. However, certain structural features of the Zipper hash allow the attacker to choose an optimal configuration for the attack and to launch a more efficient connecting phase. More precisely, as opposed to the two parallel combiners, in the Zipper hash, the message length is placed in the middle of the two passes. Thus, when we first connect our crafted message to the challenge message on an internal state in the second pass, the message prefix of our crafted message is fixed. This prefix does not include the length padding. As a result, the length of our crafted message is not necessarily equal to the length of the challenge message. Thus, we can choose a proper length for our crafted message that optimizes the attack complexity. Another distinctive feature of the Zipper hash is that its second pass processes the message blocks in reversed order. Thus, in the attack, when looking for a pair of nodes \((\check{x}, \check{y})\) reaching two predefined deep iterates \((\bar{x}, \bar{y})\) at a common distance, \(\check{x}\) and \(\check{y}\) are computed with different message blocks. This enables us to launch an efficient meet-in-the-middle procedure during the connecting phase. Accordingly, Joux’s multi-collision (see Sect. 2.1) is used to facilitate the meet-in-the-middle procedure, and the simultaneous expandable message from Sect. 2.6 is fine-tuned to adapt to the Zipper hash.
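As a concrete reference for the structure under attack, here is a minimal Merkle–Damgård sketch of the Zipper hash; `toy_h`, the 16-bit state and the omitted length padding are illustrative simplifications, not the paper's functions.

```python
import hashlib

def toy_h(tag: int, state: int, block: bytes) -> int:
    # 16-bit stand-in for the compression function h_tag.
    data = bytes([tag]) + state.to_bytes(2, "big") + block
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")

def md_pass(tag: int, iv: int, blocks) -> int:
    state = iv
    for b in blocks:
        state = toy_h(tag, state, b)
    return state

def zipper_hash(iv: int, blocks) -> int:
    # First pass: h_1 over M; second pass: h_2 over the reverse of M,
    # chained from the first pass's final state.
    mid = md_pass(1, iv, blocks)
    return md_pass(2, mid, list(reversed(blocks)))
```

Note how the message length would be padded at the end of the first pass, i.e., in the middle of the whole computation, which is precisely the property the attack exploits to free the length of the second preimage.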

### Overview of the Attack

Given a message \( \varvec{M}= m_1\Vert \cdots \Vert m_L \), the goal of the second-preimage attack on the Zipper hash is to find another message \(\varvec{M}'\) such that \( \mathcal {H}_2(\mathcal {H}_1(IV, \varvec{M}), \overleftarrow{\varvec{M}}) = \mathcal {H}_2(\mathcal {H}_1(IV, \varvec{M}'), \overleftarrow{\varvec{M}'})\), where \( \overleftarrow{\varvec{M}} \) is a message generated by reversing the order of message blocks of \( \varvec{M}\) (we call \( \overleftarrow{\varvec{M}} \) the reverse of \( \varvec{M}\)), i.e., \( \overleftarrow{\varvec{M}} = m_L \Vert m_{L-1} \Vert \cdots \Vert m_1 \), and \( \overleftarrow{\varvec{M}'} \) is the reverse of \( \varvec{M}' \). Here, we briefly list the *main* steps of the attack.

There are two main differences between the attack on the Zipper hash and the second-preimage attack on concatenation combiners (Attack 4) and the preimage attacks on XOR combiners (Attacks 2 and 3). One is that linking \(\tilde{x}\) to \(\bar{x}\) and \(\tilde{y}\) to \(\bar{y}\) can be carried out independently, resulting in a meet-in-the-middle-like effect. The other is that the message length is embedded inside the expandable message \(\mathcal {M}_\mathtt{SEM}\), which enables us to choose the length of the second preimage to optimize the complexity.

### Details of the Second-Preimage Attack on the Zipper Hash

In this subsection, we present the detailed procedure of Attack 5.

#### Complexity Analysis

The complexity of each step in this attack is as follows (ignoring constant factors and the factor *n*):

The sum of dominant terms is \( 2^{t} + 2^{{\ell }'} + 2^{n - {\ell }} + 2^{r} \cdot 2^{n - t} \),

where \( 2^t \) is the complexity for developing more nodes in the look-ahead procedure; \( 2^{{\ell }'} \) is the complexity for building the simultaneous expandable message; \( 2^r\cdot 2^{n - t} \) is the complexity for generating tables \( \mathcal {T}_1 \) and \( \mathcal {T}_2 \).

We first balance the first term with the fourth term by setting \( t = r + n - t \), which gives \( t = n/2 + r/2 \). As a result, the sum of dominant terms is \( 2^{n/2 + r/2} + 2^{{\ell }'} + 2^{n - {\ell }} \).

When the allowed length \( L' \) of the second preimage is limited by \( 2^{n/2} \), we set \( {\ell }' = n/2 \) to optimize the complexity. The multi-cycle technique is not applicable. The required number of samplings on pairs of starting nodes before finding one pair reaching any one of the \( 2^{2r} \) pairs of target nodes at a common distance is \( 2^{2n - 3n/2} \). Thus, it is required that \( 2^{2r} = 2^{2n - 3n/2} = 2^{n/2} \). This gives \( r = n/4 \). The total complexity is then \( 2^{5n/8} \) for all \( {\ell } \ge 3n/8 \).

When the allowed length \( L' \) of the second preimage is not limited and can be greater than \( 2^{n/2} \), multi-cycles can be used. In this case, the required number of samplings on pairs of starting nodes before finding one pair reaching any one of the \( 2^{2r} \) pairs of target nodes at a common distance is \( 2^{2n - 3n/2 - ({\ell }' - n/2)} \). Thus, \( 2^{2r} = 2^{n - {\ell }'} \). This gives \( r = n/2 - {\ell }'/2 \). The total complexity is then \( 2^{3n/4 - {\ell }'/4} + 2^{{\ell }'} + 2^{n - {\ell }} \). We then balance the first two terms by setting \( 3n/4 - {\ell }'/4 = {\ell }' \), which gives \( {\ell }'= 3n/5 \).

For the case \( {\ell } < 2n/5 \), the total complexity is \( 2^{n-{\ell }} \);

For the case \( 2n/5 \le {\ell } \), the total complexity stabilizes at \( 2^{3n/5} \).

(see Fig. 13 for trade-off curves).

### Step 4: Constructing an Expandable Message

The constructing method is similar to that in Sect. 2.6, with slight modifications. Detailed steps and the method used are shown in Algorithm 6, where \( C \approx n/2 + \log (n) \):

## Second-Preimage Attack on Hash-Twice

In this section, we present an efficient second-preimage attack on another cascade hash construction—Hash-Twice (a generalized specification \( \mathcal {HT}(M) \triangleq \mathcal {H}_2(\mathcal {H}_1(IV, M), M) \)). Similar to the previous second-preimage attack on Hash-Twice in [2], this attack builds a diamond structure for one hash function by exploiting messages in a long multi-collision built for the other hash function. Like all our previous functional graph-based (deep iterate-based and multi-cycle-based) attacks, it improves on the attack from [2] thanks to the efficiency gained by exploiting special nodes in the functional graphs. It follows the same structure as the second-preimage attack on the concatenation combiner, but the result shows that attacking Hash-Twice can be much more efficient than attacking the concatenation combiner. That is mainly because the attack needs to connect only to an *n*-bit internal state in the case of Hash-Twice, instead of a 2*n*-bit internal state in the case of the concatenation combiner. Note that, as with all previous functional graph-based attacks, this attack applies only when the underlying hash functions use the MD construction.
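For concreteness, Hash-Twice can be sketched with toy Merkle–Damgård passes; `toy_h`, the 16-bit state and the omitted padding are invented simplifications for illustration.

```python
import hashlib

def toy_h(tag: int, state: int, block: bytes) -> int:
    # 16-bit stand-in for the compression function h_tag.
    data = bytes([tag]) + state.to_bytes(2, "big") + block
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")

def md_pass(tag: int, iv: int, blocks) -> int:
    state = iv
    for b in blocks:
        state = toy_h(tag, state, b)
    return state

def hash_twice(iv: int, blocks) -> int:
    # Both passes read M in the same order; the second pass is chained
    # from the first pass's final state, so a second preimage only needs
    # to connect to a single n-bit internal state of the second pass.
    return md_pass(2, md_pass(1, iv, blocks), blocks)
```

Contrast this with the concatenation combiner, where both *n*-bit chains must be connected simultaneously, i.e., a 2*n*-bit target.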

### Overview of the Attack

Given a challenge message \( \varvec{M}= m_1 \Vert m_2 \Vert \dots \Vert m_L \), the goal of the second-preimage attack on Hash-Twice is to find another message \( \varvec{M}' \) such that \( \mathcal {H}_2(\mathcal {H}_1(IV, \varvec{M}'), \varvec{M}') = \mathcal {H}_2(\mathcal {H}_1(IV, \varvec{M}), \varvec{M}) \). The framework of this attack is sketched as follows.

### Details of the Second-Preimage Attack on Hash-Twice Based on Multi-cycles, Diamond and Interchange Structure for Long Messages

The detailed steps of the second-preimage attack on Hash-Twice are as follows:

#### Complexity Analysis

The complexity of each step in this second-preimage attack on Hash-Twice is as follows (denote \( L = 2^{\ell }\)):

For \( {\ell } > n/2 \), we have \( 2^{n-{\ell }} < 2^{n/2} \). The sum of dominant terms is (ignoring polynomial factors) \( 2^{\ell } + 2^{t} + 2^{n/2 + 2r} + 2^{n/2 + s/2} + 2^{2n - t - r - s - \ell } \).

We balance the different terms by setting \( t = n/2 + 2r = n/2 + s/2 = 2n - t - r - s - {\ell }\), i.e., \( r = t/2 - n/4 \), \( s = 2t - n \) and \( t = 13n/18 - 2{\ell }/9 \). Consequently, the total complexity is approximately \( 2^{\ell } + 2^{13n/18 - 2{\ell }/9} \).

The improved attack is valid for all \( {\ell } > n/2 \) (even when we account for the message length \( 2^{2r} \) of the interchange structure, which should be less than \( 2^{\ell } \)). The optimal complexity for this attack is \( 2^{13n/22} \), obtained when \( 2^{\ell } = 2^{13n/22} \). Compared with the original optimal complexity \( 2^{2n/3} \) for messages of length \( 2^{13n/22} \), the improvement is \( 2^{5n/66} \) (see Fig. 13 for a trade-off curve).

### Details of the Second-Preimage Attack on Hash-Twice Based on Deep Iterates, Diamond and Interchange Structure for Short Messages

Note that for \( L = 2^{\ell } \le 2^{n/2} \), we can no longer apply the multi-cycles technique. However, we can still choose deep iterates with depth less than \( 2^{n/2} \) as target nodes and select a proper iterate depth to improve the second-preimage attack on Hash-Twice for short messages.

The procedure of this new attack is similar to that of the previous attack in Sect. 8.2. The difference lies in that in Step 1, we collect a set of \( 2^s \) pairs of target nodes \(\{ (\bar{x}_1, \bar{y}), (\bar{x}_2, \bar{y}), \dots , (\bar{x}_{2^s}, \bar{y}) \} \), where the \( \bar{x}_i \)’s are \( 2^{n - g} \)th iterate nodes in \( \mathcal {FG}_{f_1} \) for \( 1 \le i \le 2^s \) and \( \bar{y} \) is a \( 2^{n - g} \)th iterate node in \( \mathcal {FG}_{f_2} \). It is required that \( g \ge \max (n/2, n - {\ell }) \). In addition, in Steps 2 and 7, we use the distinguished point technique in the look-ahead procedure as we did in the preimage attack on the XOR combiner in Sect. 4.2. The procedures in the other steps are the same as those in Sect. 8.2.

In this case, the required number of samplings before finding a pair \( (\check{x}, \check{y}) \) such that it reaches one of the target pairs of nodes \( (\bar{x}_i, \bar{y}) \) at a common distance is \( 2^{3g - n - s - 2r} \). Consequently, the complexity of Step 7 is \( 2^{r} \cdot 2^{3g - n - s - 2r} \cdot (2^{n - t} + 2^{\ell + t - n} + 2^{\ell + s - g}) \). Thus, the complexity of each step in this second-preimage attack on Hash-Twice for short messages is as follows:

For \( {\ell } \le n/2 \), we have \( 2^{n - {\ell }} \ge 2^{n/2} \). Thus, the sum of dominant terms is \( 2^{g} + 2^{n - \ell } + 2^{t} + 2^{n/2 + s/2} + 2^{n/2 + 2r} + 2^{3g - s - t - r} + 2^{3g + t + \ell - s - r - 2n} + 2^{2g + \ell - r - n} \).

We first set \( t = n/2 + s/2 = n/2 + 2r = 3g - s - t - r\), i.e., \( t = 2g/3 + 5n/18 \), \( s = 4g/3 - 4n/9 \) and \( r = g/3 - n/9 \) to make a balance. The sum of dominant terms is \( 2^g + 2^{2g/3 + 5n/18} + 2^{n - \ell } + 2^{2g + \ell - 7n/6} + 2^{5g/3 + \ell - 8n/9} \). We optimize the attack by picking the minimal value of *g* under the restriction \( g \ge \max (n/2, n-{\ell }) \), i.e., \( g = n - {\ell } \) for \( {\ell } \le n/2 \). Consequently, for \( g = n - {\ell } \) and \( \ell < n/2 \), the total complexity becomes \( 2^{n - \ell } + 2^{17n/18 - 2\ell /3} \) (the last two terms are always less than \( 2^{n - \ell } \), and the distinguished points method allows us to resolve with no overhead the complication of keeping track of distances from \( 2^s \) target nodes).

The improved attack is better than that in [2] for \( {\ell } > 5n/12 \) (in which case \( 17n/18 - 2{\ell }/3 < 2n/3 \), and the message length \( 2^{2r} \) of the interchange structure is less than \( 2^{\ell } \), so the attack is applicable). The optimal complexity for this attack is \( 2^{11n/18} \), obtained when \( {\ell } = n/2 \). Compared with the previous best-known complexity \( 2^{2n/3} \) at message length \( L = 2^{n/2} \), the improvement is \( 2^{n/18} \) (see Fig. 13 for a trade-off curve).

## More Applications and Extensions

### Applications Beyond MD Construction and Beyond XOR Operation

*Application to HAIFA Mode* The first preimage attack on the XOR combiner is based purely on the interchange structure. Thus, it works identically if the hash functions use the HAIFA mode rather than the plain Merkle–Damgård iteration. In contrast, the other attacks are all based on functional graphs, which require the compression function calls to be identical across message blocks; hence, they cannot work if the underlying hash functions use the HAIFA mode.

*Application to Cryptophia’s Short Combiner* All of our attacks on XOR combiner can also be applied to Cryptophia’s short combiner, as proposed by Mittelbach [46], and to the revised version of Mennink and Preneel [47]. This combiner computes the sum of two hash functions with some preprocessing of the message to allow non-independent functions:

where \(k_1, k_2, l_1, l_2\) are randomly chosen keys. The security proof in the ideal model shows that *C* is optimally preimage resistant if at least one of the hash functions is ideal.

However, if both \(\mathcal {H}_1\) and \(\mathcal {H}_2\) are narrow-pipe, we can apply our preimage attacks with the same corresponding complexity. This does not violate the security proof because we need both functions to be narrow-pipe and hence not *n*-bit ideal.^{Footnote 19} From a practical point of view, though, our attacks show that in many cases (*e.g.*, using SHA-512 and Whirlpool) the combiner is *weaker* than the initial functions.

*Application Beyond XOR* All preimage attacks on the XOR combiner can easily be extended to \(\mathcal {H}_1(M) \odot \mathcal {H}_2(M)\), where \(\odot \) denotes an easy-to-invert group operation (for instance, a modular addition rather than the exclusive-or). These attacks can also be extended to hash functions \(\mathcal {H}_1\) and/or \(\mathcal {H}_2\) using an internal checksum, such as the GOST family of hash functions, by using pairs of blocks with a constant sum.
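The reason the extension works is simply that the group operation can be inverted: once one digest is fixed, the digest required from the other hash function is determined. A minimal sketch (the function name and interface below are ours, for illustration only):

```python
def required_h2_digest(target: int, h1_digest: int, n: int, op: str = "xor") -> int:
    """Value H2(M) must take so that H1(M) op H2(M) equals target."""
    if op == "xor":
        return target ^ h1_digest
    if op == "add":  # modular-addition combiner H1(M) + H2(M) mod 2^n
        return (target - h1_digest) % (1 << n)
    raise ValueError("unsupported operation")
```

In both cases, computing the required \( \mathcal {H}_2 \) digest costs one group operation, so the meet-in-the-middle phases of the attacks carry over unchanged.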

### Applications to the Combination of Wide-Pipe Hash Functions

Our attacks can also be used when the internal state size \(n'\) is (not much) larger than the output size *n*.

#### The Interchange Structure-Based Preimage Attack on the XOR Combiner

In Attack 1, the complexity of building a \(2^t\)-interchange structure is related to \(n'\) as \((n'/2)\cdot 2^{2t+n'/2}\). On the other hand, the complexity of the meet-in-the-middle preimage search is related to *n* as \(2^{n-t}\). The optimal complexity is \((n'/2) \cdot 2^{2n/3+n'/6}\) by matching the two complexities with \(t = n/3-n'/6\). Therefore, our attack can be applied as long as \(n'+6\log (n') \le 2n\) holds. For instance, we can compute preimages of \(\text {SHA-224} \oplus \text {BLAKE-224}\) using the interchange structure-based attack with complexity roughly \(2^{199}\).^{Footnote 20}
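The \(2^{199}\) figure can be reproduced with a short calculation (the helper below is ours; it simply evaluates both cost terms at the balancing point \( t = n/3 - n'/6 \)):

```python
import math

def interchange_log2_cost(n: int, n_prime: int) -> float:
    # Balancing (n'/2) * 2^(2t + n'/2) against 2^(n - t) gives t = n/3 - n'/6.
    t = n / 3 - n_prime / 6
    build = math.log2(n_prime / 2) + 2 * t + n_prime / 2  # structure building
    search = n - t                                        # MitM preimage search
    return max(build, search)
```

For \( \text{SHA-224} \oplus \text{BLAKE-224} \) (\( n = 224 \), \( n' = 256 \)), this evaluates to roughly 199, matching the complexity quoted above.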

#### The Deep Iterate-Based Preimage Attack on the XOR Combiner

For parameters \( g_1 \ge \max (n'/2, n'- \ell ) \) and \( s \ge 0 \), the complexity of each phase in Attack 2 is as follows:

We balance the time complexities of the different phases similar to what we did before. In the case \( {\ell } \le n'/2 \), we set \( g_1 = n'- \ell \), and the total time complexity of the attack is \( 2^{n/3 + 2n'/3 - 2\ell /3} + 2^{n'- \ell } \). The optimal complexity is \( 2^{n/3 + n'/3} \),^{Footnote 21} obtained for \( \ell = n'/ 2 \). It is less than \( 2^n \) when \( n'< 2n \).

#### The Multi-cycle-Based Preimage Attack on the XOR Combiner

The complexity of each detailed step in Attack 3 is no longer related to the output size *n* but rather to the internal state size \( n'\), except for Step 3. For Step 3, the complexity is \( 2^{s + n - n'/ 2} \). For the other steps, the complexity can be obtained by simply replacing *n* with \( n'\) in the original formula. Then, the overall complexity is approximately (note that \( \ell \ge n'/2 \))

We balance the last three terms by setting \( s + n - n'/2 = t = 2n'- t - s - \ell \), i.e., \( s = n'- 2n/3 - \ell /3 \), \( t = n'/2 + n/3 - \ell /3 \). Then, the total complexity is \( 2^{\ell } + 2^{n'/2 + n/3 - \ell /3} \). The optimal complexity is \( 2^{3n'/ 8 + n/4} \) for \( \ell = 3n'/ 8 + n/4 \). The complexity is less than \( 2^n \) when \( n'/2< \ell < n \) and \( n'< 2n \).

#### The Deep Iterate-Based Second-Preimage Attack on the Concatenation Combiner

The complexity of Attack 4 is no longer related to the output size *n* but rather is related to the internal state size \( n' \) and the message length \( L = 2^{\ell } \). The time complexity is \( 2^{(3/5) \cdot (2n'- \ell )} \) as long as \( \ell \le 3n'/ 4 \) and is less than \( 2^{n} \) for \( \ell > 2n'- 5n/3 \). Therefore, this attack can be applied when \( n'< 4n/3 \).

#### The Second-Preimage Attack on the Zipper Hash

The complexity of Attack 5 is no longer related to the output size *n* but is related to the internal state size \( n'\). We can get the complexity by simply replacing *n* with \( n'\) in the formula. Accordingly, when the length \( L' \) of the second preimage is limited by \( 2^{n'/ 2} \), the optimal complexity is \( 2^{5n'/ 8} \) for all \( {\ell } \ge 3n'/ 8 \). It is less than \( 2^n \) when \( n'< 8n/5 \). When the length \( L' \) is not limited, the optimal complexity is \( 2^{3n'/ 5} \) for all \( {\ell } \ge 2n'/ 5 \). It is less than \( 2^n \) when \( n'< 5n/3 \).

#### The Second-Preimage Attack on Hash-Twice

The complexity of Attack 6 is no longer related to the output size *n* but rather is related to the internal state size \( n'\). Accordingly, when the length \( L = 2^\ell \) is limited by \( 2^{n'/ 2} \), the total complexity is \( 2^{n'- \ell } + 2^{17n'/18 - 2\ell /3}\). The optimal complexity is \( 2^{11n'/ 18} \), obtained for \( \ell = n'/ 2 \), which is less than \( 2^n \) when \( n'< 18 n / 11 \). When the length is not limited, the total complexity is \( 2^{\ell } + 2^{13n'/ 18 - 2\ell / 9} \). The optimal complexity is \( 2^{13n'/ 22} \), obtained for \( \ell = 13 n'/ 22 \), which is less than \( 2^n \) when \( n'< 22 n / 13 \).

### Extensions to the Combination of Three or More Hash Functions

#### The Interchange Structure-Based Preimage Attack on the XOR Combiner

The interchange structure-based attack on the XOR combiner, i.e., Attack 1, can be extended to the sum of three or more hash functions. To attack the sum of *k* functions, two different strategies are possible: either we use a simpler structure that only gives two degrees of freedom and fixes \(k-2\) functions to a constant value, or we build an interchange structure to control all the *k* functions independently.

*Controlling Only Two Functions* The easiest way to extend the attack is to use a single chain in the \(k-2\) extra hash functions. The procedure to build a switch is modified to use multi-collisions for \(k-1\) functions instead of simple multi-collisions for one function; this costs \(O(n^{k-1} \cdot 2^{n/2})\) using Joux’s method [35]. As in the basic attack, we need \(O(t^2)\) switches to generate a \(2^t\)-interchange for two functions, and the preimage search costs \(O(2^{n-t})\); the optimal complexity is therefore \(O(n^{k-1} \cdot 2^{5n/6})\) with \(t = n/6\).
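To make the building block concrete, here is a minimal sketch of Joux’s multi-collision technique [35] on a toy 16-bit compression function (the function `h` and all parameters are illustrative toys, far smaller than any real hash):

```python
import hashlib
from itertools import product

def h(state: bytes, block: bytes) -> bytes:
    """Toy 16-bit narrow-pipe compression function (illustration only)."""
    return hashlib.sha256(state + block).digest()[:2]

def digest(iv: bytes, blocks) -> bytes:
    """Iterate h over a sequence of message blocks, Merkle-Damgard style."""
    state = iv
    for block in blocks:
        state = h(state, block)
    return state

def joux_multicollision(iv: bytes, t: int):
    """Joux's 2^t-multi-collision: find t successive one-block collisions;
    choosing either block at each step yields 2^t messages with the same
    final state, for a cost of only t birthday searches (t * 2^(n/2))."""
    state, pairs = iv, []
    for _ in range(t):
        seen, i = {}, 0
        while True:
            block = i.to_bytes(4, "big")
            out = h(state, block)
            if out in seen:              # birthday collision for this step
                pairs.append((seen[out], block))
                state = out
                break
            seen[out] = block
            i += 1
    return pairs, state

pairs, final = joux_multicollision(b"\x00\x00", 3)
messages = list(product(*pairs))         # 2^3 = 8 colliding messages
assert all(digest(b"\x00\x00", m) == final for m in messages)
print(len(messages))                     # 8
```

The extension used above nests this construction across \(k-1\) functions, which is where the \(n^{k-1}\) factor comes from.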

*Controlling All the Functions* Alternatively, we can build a more complex interchange structure in order to control all the functions independently. When attacking three functions, we will use the switch structure to jump from chains \((\vec {a}_{j_0},\vec {b}_{k_0},\vec {c}_{l_0})\) to \((\vec {a}_{j_0},\vec {b}_{k_0},\vec {c}_{l_1})\) (or \((\vec {a}_{j_0},\vec {b}_{k_1},\vec {c}_{l_0})\) or \((\vec {a}_{j_1},\vec {b}_{k_0},\vec {c}_{l_0})\)). We need \(2^{3t}-1\) switches in the interchange structure to reach all the \(2^{3t}\) triplets of chains. (A switch makes only one new triplet reachable.) Each switch is built using a \(2^{n/2}\)-multi-collision on two functions, which can be built for a cost of \(O(n^2 \cdot 2^{n/2})\) following Joux’s technique [35]. Therefore, we can build a \(2^t\)-interchange for a cost of \(O(n^2 \cdot 2^{3t+n/2})\). More generally, for the sum of *k* hash functions, we can build an interchange structure for *k* functions for a cost of \(O(n^{k-1} \cdot 2^{kt+n/2})\).

In the preimage search phase, we generate *k* lists of size \(2^{t}\), and we need to efficiently detect whether we can combine them to generate a zero sum. This problem can be solved using an algorithm similar to Wagner’s generalized birthday algorithm [57]. If \(k=2^\kappa \), we find a solution with probability \(O(2^{(\kappa +1)\cdot t - n})\) for a cost of \(O(k \cdot 2^t)\). Therefore, the preimage search costs \(O(k \cdot 2^{n-\kappa t})\). With \(k=4\) (i.e., \(\kappa = 2\)), this yields a complexity of \(O(n^3 \cdot 2^{5n/6})\). However, this approach is less efficient than the previous one for \(k=3\) and for \(k>4\).
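The zero-sum step can be illustrated with a toy instance of Wagner’s \(k = 4\) algorithm [57]; the parameters below (\(n = 24\), lists of \(2^{11}\) random words) are hypothetical and chosen only so the sketch runs instantly, not the parameters of the attack:

```python
import random
from collections import defaultdict

def join(L1, L2, mask):
    """All pairs (x, y) with (x ^ y) & mask == 0, found via a hash join."""
    buckets = defaultdict(list)
    for y in L2:
        buckets[y & mask].append(y)
    return [(x, y) for x in L1 for y in buckets[x & mask]]

def four_list_zero_sum(L1, L2, L3, L4, n):
    """Wagner-style 4-list algorithm: first cancel the low n/2 bits within
    (L1, L2) and (L3, L4), then match the surviving partial sums on the
    remaining bits, yielding x1 ^ x2 ^ x3 ^ x4 = 0."""
    mask = (1 << (n // 2)) - 1
    left = {x ^ y: (x, y) for x, y in join(L1, L2, mask)}
    for x3, x4 in join(L3, L4, mask):
        if x3 ^ x4 in left:
            x1, x2 = left[x3 ^ x4]
            return x1, x2, x3, x4
    return None

random.seed(2024)
n = 24
lists = [[random.getrandbits(n) for _ in range(1 << 11)] for _ in range(4)]
sol = four_list_zero_sum(*lists, n)
assert sol is not None and sol[0] ^ sol[1] ^ sol[2] ^ sol[3] == 0
print([hex(x) for x in sol])
```

With lists of size \(2^m\), the joins keep about \(2^{2m - n/2}\) pairs each, so a full zero sum is expected once \(4m \ge 3n/2\), mirroring the \(2^{n/3}\) behavior of the real algorithm.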

To summarize, attacking the sum of *k* hash functions (\(k \ge 2\)) using the interchange structure costs \(O(n^{k-1} \cdot 2^{5n/6})\). Controlling chains independently in more than two hash functions might be useful for future work, but it does not improve the preimage attack on the sum of *k* hash functions.

#### The Deep Iterate-Based Preimage Attack on the XOR Combiner

Suppose the combiner outputs the sum of *k* hash digests of *n* bits. To extend Attack 2, one needs to first construct a simultaneous expandable message for *k* independent hash functions, and this costs \( O(2^{\ell } + n^{2(k-1)} \cdot 2^{n/2}) \) by setting \( C \approx (n/2 + \log (n))^{k-1} \) (see Fig. 9a).

To extend Phase 2, one collects \( 2^{g_1} \) deep iterates for each of the *k* random functional graphs and then tries to find a set of \( 2^s \) tuples of *k* deep iterates by mapping a single-block message to states whose sum equals the target *V*. Again, this can be solved using an algorithm similar to Wagner’s generalized birthday algorithm, with *k* lists of size \( 2^{g_1} \). Let \( k = 2^{\kappa } \), and then, finding \( 2^s \) tuples costs \( O(2^s \cdot k \cdot 2^{n - \kappa g_1}) \approx 2^{n + s - \kappa g_1}\).

To extend Phase 3, one expands the *k* functional graphs independently with parameter \( g_2 \) in the look-ahead procedure and then tries to find a tuple of *k* starting nodes simultaneously hitting one of the \( 2^s \) tuples of *k* deep iterates obtained in Phase 2. Note that for one tuple of *k* deep iterates of depth \( 2^d \) in independent random functional graphs, the probability for a tuple of *k* random nodes reaching them simultaneously is approximately \( 2^{-kn} \sum _{i=1}^{2^d} i^k \approx 2^{(k + 1)d - kn} \). In the attack, \( d = n - g_1 \). Thus, the complexity of Phase 3 becomes

Note that in this attack, the restrictions are \( \ell \le n/2 \) and \( g_1 \ge \max (n/2, n - \ell ) \). Thus, the last term in the complexity of Phase 3, i.e., \( 2^{kg_1 + \ell - n} \), cannot be less than \( 2^n \) for \( k \ge 3 \). In other words, when combining more than two hashes, no configuration of the attack parameters allows the distinguished points method to resolve, without overhead, the complication of keeping track of distances from the \( \mathcal {S}\)-nodes. Thus, this extended attack is not more efficient than \( 2^n \).

#### The Multi-cycle-Based Preimage Attack on the XOR Combiner

The extension of Attack 3 is very similar to the above extension of the deep iterate-based preimage attack. However, because this attack does not need to use the distinguished points method, it can be more efficient than \( 2^n \) for \( k \ge 3 \). Let \( k = 2^{\kappa } \). Following a similar analysis for the complexity of each step to the above one, we have the following:

We make a balance by setting \( s + n - \kappa \cdot n/2 = t = n - t + kn/2 - s - {\ell } \), i.e., \( s = (k + 2\kappa - 2)n/6 - \ell /3 \) and \( t = (k - \kappa )n/6 + 2n/3 - \ell /3 \). Then, the complexity becomes \( 2^{\ell } + 2^{(k - \kappa )n/6 + 2n/3 - \ell /3} \). The optimal complexity is \( 2^{(k - \kappa + 4)n/8} \), obtained by setting \( \ell = (k - \kappa + 4)n/8 \). (In this attack, we assume \( n/2< \ell < n \).)

Accordingly, it costs less than \( 2^n \) computations for \( k < 7 \). For \( k = 3 \), the optimal complexity is approximately \( 2^{(7 - \log _23)n/8} \approx 2^{0.677n}\). For \( k = 4 \), the optimal complexity is approximately \( 2^{3n/4} \approx 2^{0.75n}\). For \( k = 5 \) and \(k = 6 \), this extension is less efficient than the above extended interchange structure-based attack.
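These exponents are easy to tabulate; the sketch below (the helper name is ours) reproduces the figures quoted above, treating \( \kappa = \log _2 k \) also for odd *k*, as the text does for \( k = 3 \):

```python
from math import log2

def multicycle_ksum_exponent(k):
    """Optimal exponent, as a multiple of n, of the multi-cycle preimage
    attack on the sum of k = 2^kappa hashes: (k - log2(k) + 4) / 8."""
    return (k - log2(k) + 4) / 8

for k in range(2, 8):
    e = multicycle_ksum_exponent(k)
    print(k, round(e, 3), e < 1)
# k = 3 gives ~0.677 and k = 4 gives 0.75; the exponent stays below 1
# (i.e., below 2^n) up to k = 6, matching the text.
```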

#### The Deep Iterate-Based Second-Preimage Attack on the Concatenation Combiner

Suppose the combiner outputs the concatenation of *k* hash digests of *n* bits. The direct way to extend Attack 4 is to simultaneously control the *k* hash functions. To do that, again, one first constructs a simultaneous expandable message for *k* independent hash functions; this costs \( O(2^{\ell } + n^{2(k-1)} \cdot 2^{n/2}) \) (see Fig. 9a).

To extend Phase 2, one collects \( 2^{g_1} \) deep iterates for each of the *k* random functional graphs and then tries to find a tuple of *k* deep iterates hitting *k* internal states at the same offset *p*, which is uniformly distributed in the interval \( [1, 2^{\ell }] \). The complexity of Phase 2 is \( 2^{g_1} \cdot 2^{-{\ell }} \cdot 2^{kn - kg_1} = 2^{kn - (k-1)g_1 - {\ell }} \).

To extend Phase 3, one expands the *k* functional graphs independently with parameter \( g_2 \) and tries to find a tuple of *k* starting nodes simultaneously hitting the tuple of *k* deep iterates obtained in Phase 2. As calculated above, for a tuple of *k* deep iterates of depth \( 2^{n-g_1} \) in independent random functional graphs, the probability of a tuple of *k* random nodes reaching them simultaneously is about \( 2^{(k + 1)(n - g_1) - kn} \). Thus, the complexity of Phase 3 becomes \( 2^{(k+1)g_1/2 } \) by setting \( g_2 = (n - g_2) + kn - (k+1)(n - g_1)\), i.e., \( g_2 = (k+1)g_1/2 \).

After balancing the different phases by setting \( g_1 = 2(kn-\ell )/(3k-1) \), one finds that the optimal complexity is \( 2^{(k+1)n/4} \), obtained for \( \ell = (k+1)n/4 \). Thus, for \( k \ge 3 \), the attack is not more efficient than \( 2^n \).
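The parameter balancing in this paragraph can be checked symbolically; the sketch below (hypothetical helper names, exponents in units of *n*) confirms that the two dominating phases meet at \( (k+1)n/4 \):

```python
from fractions import Fraction as F

def phase_exponents(k, ell, n=1):
    """Exponents (in units of n) of the two dominating phases of the
    extended attack, with g1 = 2(kn - ell)/(3k - 1) chosen to balance
    Phase 2, 2^(kn - (k-1)g1 - ell), against Phase 3, 2^((k+1)g1/2)."""
    g1 = 2 * (k * n - ell) / (3 * k - 1)
    phase2 = k * n - (k - 1) * g1 - ell
    phase3 = (k + 1) * g1 / 2
    return phase2, phase3

for k in (2, 3):
    ell = F(k + 1, 4)                  # optimal message length (k+1)n/4
    print(k, *phase_exponents(k, ell)) # both exponents equal (k+1)/4
```

For \( k = 2 \) both exponents are \(3/4\), while for \( k = 3 \) they already reach 1, i.e., \(2^n\).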

#### The Second-Preimage Attack on the Zipper Hash

When the Zipper hash combines \( k > 2 \) hash functions by alternately computing forward and backward, one can still extend Attack 5.

To construct a simultaneous expandable message adaptable to the Zipper hash, one first constructs partial building blocks for all the computational passes in one direction and then constructs them for all the computational passes in the other direction. Note that each building block in a simultaneous expandable message for the Zipper hash contains two pieces of Joux’s multi-collisions, one at each end of the building block. The left Joux’s multi-collision is used to synchronize the reverse computations, and the right one to synchronize the forward computations. The complexity of building a cascade simultaneous expandable message for *k* hash functions can be the same as that of building a parallel one, which is approximately \( O(2^{\ell } + n^{2(k-1)}\cdot 2^{n/2}) \). Note that for even *k*, the simultaneous expandable message is placed at the end of each pass, and the length of the second preimage encoded in it can be chosen to optimize the attack, while for odd *k*, the simultaneous expandable message is placed at the front of each pass (see Fig. 9b, c).

Next, we take \( k = 3 \) as an example to briefly describe the extended attack (see Fig. 10). The attack first collects a triple of cyclic nodes \( (\bar{x}, \bar{y}, \bar{z}) \), one in each of the three random functional graphs generated by the three compression functions. Starting from \( \bar{z} \), we build a \( 2^{r \cdot \frac{n}{2}} \)-Joux’s multi-collision \( {\mathcal {M}_\mathtt{MC}}_{3} \), ending with \( \hat{z} \). By exploiting messages in \( {\mathcal {M}_\mathtt{MC}}_{3} \), we can start from \( \bar{x} \) and build a \( 2^r \)-Joux’s multi-collision \( {\mathcal {M}_\mathtt{MC}}_{1} \), ending with \( \hat{x} \). This essentially computes a \( 2^r \)-simultaneous Joux’s multi-collision starting from the state pair \( (\bar{x}, \bar{z}) \) and ending with \( (\hat{x}, \hat{z}) \); we denote it by \( {\mathcal {M}_\mathtt{MC}}_{13} \). Starting from \( \bar{y} \), we build a \( 2^r \)-Joux’s multi-collision \( {\mathcal {M}_\mathtt{MC}}_{2} \) for the reverse computation, ending with \( \hat{y} \). Then, we find a message block \( \bar{m} \) mapping \( \hat{z} \) to an internal state \( c_p \) in the original computation with \( \varvec{M}\). Afterwards, the suffix of the second preimage is fixed to be \( m_{p+1} \Vert m_{p+2} \Vert \cdots \Vert m_{L} \).

Starting from \( \hat{y} \), we compute the cascade simultaneous expandable message. Note that only after we have completed the computation in the second pass (a reverse computation between the two forward computations) and obtained the terminal point \( \ddot{y} \) can we start the computations in the first and third passes. The first pass starts from *IV*, and the third pass starts from the terminal point of the second pass, i.e., \( \ddot{z} = \ddot{y} \). Because there is only one reverse pass, only the right piece of Joux’s multi-collision is required in each building block (see Fig. 9b; if \( k \ge 4 \), there should be another piece of Joux’s multi-collision at the left end of each building block). Finally, we launch a meet-in-the-middle procedure using messages in \( {\mathcal {M}_\mathtt{MC}}_{13} \) and \( {\mathcal {M}_\mathtt{MC}}_2 \) to find a triple of starting nodes \( (\check{x}, \check{y}, \check{z}) \) that simultaneously reach the triple of deep iterates \( (\bar{x}, \bar{y}, \bar{z}) \), and we output the concatenation of the obtained message fragments.

We analyze the complexity of the extended attack in general, supposing that there are *k* computational passes. Note that, among the *k* computational passes, there are \( \lceil k / 2 \rceil \) forward computations and \( \lfloor k / 2 \rfloor \) reverse computations. We have \( 2^{2r} \) pairs of starting nodes in the meet-in-the-middle procedure of the extended attack, namely the Cartesian product between the set of \( 2^r \) nodes for the forward computations and the set of \( 2^r \) nodes for the reverse computations. Let the deep iterates be of depth \( 2^d \); then, each pair succeeds with probability \( 2^{(k+1)d - kn} \) without using multi-cycles, and the probability is amplified to \( 2^{(k+1)n/2 - kn - n/2 + {\ell }'} \) by using multi-cycles.

If the message length is limited to be no more than \( 2^{n/2} \), we cannot use the multi-cycles technique. The success of the attack requires \( 2r = kn - (k+1)\cdot d \), and the attack complexity is approximately \( 2^t + 2^{{\ell }'} + 2^{n - {\ell }} + 2^r \cdot 2^{n - t} \). Setting \( d = n/2 \) and \( {\ell }' = {\ell } = n/2 \), we get \( r = kn/4 - n/4 \), and the optimal complexity is \( 2^{kn/8 + 3n/8} \), which is less than \( 2^n \) for \( k < 5 \). Concretely, the complexity is \( 2^{3n/4} \) for \( k = 3 \) and \( 2^{7n/8} \) for \( k = 4 \).

If the message length is not limited, we can use the multi-cycles technique. The success of the attack requires \( 2r = kn - (k + 1) \cdot n/2 + n/2 - {\ell }' \), i.e., \( r = kn/4 - {\ell }'/2 \). The attack complexity is approximately \( 2^t + 2^{{\ell }'} + 2^{n - {\ell }} + 2^{kn/4 - {\ell }'/2 + n - t} \). We balance the terms by setting \( t = kn/4 - {\ell }'/2 + n - t = {\ell }' \) and achieve the optimized complexity \( 2^{kn/10 + 2n/5} \) for \( n - \ell < {\ell }' \) and \( {\ell }' = {kn/10 + 2n/5} \). It is less than \( 2^n \) when \( k < 6 \). Concretely, the complexity is \( 2^{7n/10} \), \( 2^{4n/5} \), \( 2^{9n/10} \) for \( k = 3,~ 4,~ 5 \), respectively.
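The optimal exponents for both length regimes can be tabulated as follows (a sketch; `zipper_exponents` is a hypothetical helper, exponents in units of *n*):

```python
from fractions import Fraction as F

def zipper_exponents(k):
    """Optimal exponents (in units of n) of the extended second-preimage
    attack on a k-pass Zipper hash: kn/8 + 3n/8 when the message length is
    capped at 2^(n/2), and kn/10 + 2n/5 with the multi-cycles technique."""
    short = F(k, 8) + F(3, 8)
    long_ = F(k, 10) + F(2, 5)
    return short, long_

for k in (2, 3, 4, 5):
    print(k, *zipper_exponents(k))
# k = 3: 3/4 and 7/10; k = 4: 7/8 and 4/5; k = 5: 1 and 9/10.
```

The \( k = 2 \) row (5/8 and 3/5) matches the two-pass results quoted earlier for \( n' = n \).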

#### The Second-Preimage Attack on Hash-Twice

Attack 6 can be extended using two different strategies: either we build an interchange structure for only two of the *k* computational passes, or we build an interchange structure to control all *k* computational passes. The second strategy cannot be more efficient than the first, so we only describe the extended attack following the first strategy.

In the extended attack on hashing *k* times, the frameworks for constructing the first and last passes are almost identical to those of the original attack (see Fig. 11). We collect \( 2^s \) special nodes in the functional graph of the first hash function and one special node in each of the last \( k - 1 \) hash functions. If the message length is not limited, we select cyclic nodes as the special nodes to be targeted (when using the multi-cycles technique, we only consider the correctable distances between two of the *k* cycles); if the message length is limited, we select deep iterates as the special nodes to be targeted. We thus get \( 2^s \) tuples of *k* target nodes. From the target nodes in the last \( k - 1 \) passes, we build a \( 2^{s\cdot n/2} \)-simultaneous Joux’s multi-collision \( {\mathcal {M}_\mathtt{MC}} \) (that is, a set of messages that is a Joux’s multi-collision for the \( k - 1 \) independent hash functions simultaneously and a \( 2^{s\cdot (n/2)^{k - i + 1}} \)-Joux’s multi-collision for the *i*th hash function). We herd the \( 2^s \) target nodes in the first pass to a single state \( \tilde{x} \) by building a diamond using messages in \( \mathcal {M}_\mathtt{MC}\). We try to hit an internal state in the last computational pass from the endpoint of the last Joux’s multi-collision; after that, the suffix of the second preimage is fixed. We can then compute the final states of the first \( k - 1 \) passes, which are also the initial states of the last \( k - 1 \) passes. Starting from the initial states of the *k* computational passes, we compute a *k*-pass simultaneous expandable message (see Fig. 9a). Starting from the terminal states of the simultaneous expandable message, we build an interchange structure in which two of the *k* passes have \( 2^r \) chains and the remaining \( k - 2 \) passes have a single chain (which essentially is a simultaneous Joux’s multi-collision for the remaining \( k - 2 \) passes).
We finally try to use the endpoints of the interchange structure to find a tuple of *k* starting nodes reaching one of the \( 2^s \) tuples of *k* target nodes at the same distance.

If the message length is limited to be no more than \( 2^{n/2} \), we use deep iterates (with depth \( 2^{n - g} \), where \( g \ge \max (n/2, n - {\ell }) \)) as targeted nodes in the above extended attack. Following an analysis similar to the one in Sect. 8.3, one finds that the complexity is as follows (ignoring constant and polynomial factors):

In Sect. 8.3, for the case \( k = 2 \), we balance the complexity using the first term in the formula of Step 7, because the last two terms in the formula of Step 7 are less than the first term under the best configuration. Here, for the case \( k \ge 3 \), we balance the complexity using the third term in the formula of Step 7, because the third term is greater than the first two terms. Specifically, we set \( t = {n}/{2} + {s}/{2} = {n}/{2} + 2r = kg + \ell - n - r\), i.e., \( t = 2kg/3 + 2\ell /3 - n/2 \), \( s = 4kg/3 + 4\ell /3 - 2n\) and \( r = kg/3 + \ell /3 - n/2 \). The complexity of the dominant terms is \(2^g + 2^{2kg/3 + 2\ell /3 - n/2} + 2^{-4kg/3 + g - 7\ell /3 + 3n} \), in which the mid term is always greater than \( 2^n \) for \( k \ge 4 \) under the restrictions \(g \ge \max (n/2, n - \ell )\) and \(\ell \le n/2 \). Thus, the attack can be more efficient than \( 2^n \) only for \( k < 4 \). For \( k = 3 \), we set \( g = n - \ell \) to optimize the complexity. In this case, \( r = k(n - \ell )/3 + \ell /3 - n/2 \), which fulfils the restriction \( 2r \le \ell \) when \( k \le 3 \). The complexity is then

The optimal complexity is \( 2^{(2k - 1)n/6} \), obtained for \( \ell = n/2 \). As a result, for \( k = 3 \), the optimal complexity is \( 2^{5n/6} \).

If the message length is not limited and \( {\ell } > n/2 \), we use the multi-cycles technique. Following an analysis similar to the one in Sect. 8.2, one can find that the complexity is as follows (ignoring the constant and polynomial factors):

We make a balance by setting \( t = n/2+s/2 = n/2 + 2r = (k + 2)n/2 - t - r -s -{\ell }\), i.e., \( t = (2k + 9)n/18 -2\ell /9 \), \( s = 2kn/9 - 4\ell /9 \) and \( r = kn/18 - \ell /9 \). The complexity is then

The optimal complexity is \( 2^{(2k + 9) n/22} \), obtained when \( {\ell } = {(2k + 9) n}/{22} \). Thus, the attack costs less than \( 2^n \) for \( k < 7 \). Concretely, it is \( 2^{15n/22} \), \( 2^{17n/22} \), \( 2^{19n/22} \), and \( 2^{21n/22} \) for \( k = 3,~ 4,~ 5,~ 6 \), respectively.
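Again, the exponent formula can be tabulated (a sketch with a hypothetical helper; exponents in units of *n*):

```python
from fractions import Fraction as F

def hash_k_times_exponent(k):
    """Optimal exponent (2k + 9)/22, in units of n, of the extended
    second-preimage attack on hashing k times (unbounded message length)."""
    return F(2 * k + 9, 22)

for k in range(2, 8):
    e = hash_k_times_exponent(k)
    print(k, e, e < 1)
# 13/22 for k = 2, then 15/22, 17/22, 19/22, 21/22; below 2^n for k < 7.
```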

### Remark

From the above results, in the functional graph-based attacks, when using a set of deep iterates as targeted states for attacks with short messages, one has to use the distinguished points method, which is not efficient enough to attack combiners of more than three hash functions. In contrast, when using a set of cyclic nodes for attacks with long messages, one does not need the distinguished points method and can thus extend the attacks to combiners with *n*-bit output of up to six Merkle–Damgård hash functions.

## Summary and Open Problems

In this paper, we study the security of various hash combiners by devising generic attacks. These attacks show rather surprising results: the security upper bounds of most hash combiners are not as high as commonly believed. Regarding basic security requirements (preimage resistance and second-preimage resistance), they fail to provide more security than (or even as much security as) a single ideal hash function, and in some cases even less than their underlying hash functions. See Table 1 for a summary of their current security status. In Figs. 12 and 13, we summarize their detailed security status by drawing trade-off curves between the length of the message and the complexity of the attacks.

From these trade-off curves, for combiners whose underlying hash functions use the Merkle–Damgård construction, the gaps between the security upper bounds and the security lower bounds provided by security proofs are quite narrow. However, this is true only for very long messages; for short messages, the gap remains large. This mainly results from the limitations of the key techniques used in our attacks, which heavily exploit the iterated structure of the underlying hash functions. In particular, most of our attacks exploit properties of functional graphs of random mappings generated by fixing a message-block input to the compression functions. They therefore usually involve iterating the compression functions many times with a fixed message block. As a result, our crafted messages are very long and typically composed of a large number of repeated message blocks (which can be easily recognized). Thus, one open problem is how to extend the attacks to apply to short messages, or to messages with small patches. Another open problem is how to extend the attacks to combiners in which at least one underlying hash function follows the HAIFA framework.

## Notes

1. We note that this MD5/SHA-1 combiner has been replaced by primitives based on a single hash function (e.g., SHA-256) since TLS version 1.2 [20].
2. Here, we generalize the syntax of hash functions to also regard the initial value *IV* as an input parameter.
3. The original specification of Hash-Twice is \( \mathcal {HT}(M) \triangleq \mathcal {H}(\mathcal {H}(IV, M), M) \), which processes the same message twice using a single hash function, as shown in [2].
4. For simplicity of description, we omit the computation of the initial value \( IV_n = h(IV, n, 0, 0) \), which is used to support variable hash size in the specification of HAIFA in [6]. This does not influence the attacks.
5. The attacks essentially only require one of the functions to be iterated.
6. For example, for \(n=160\) and message blocks of length 512 bits (as in SHA-1), the attack is faster than \(2^{160}\) for messages containing at least \(2^{46}\) blocks, or \(2^{52}\) bytes.
7. The complexity formulas do not take into account (small) constant factors, which are generally ignored throughout this paper.
8. A collision between \(x_{d-i}\) and \(x'_{D-i}\) occurs if \(x_{d-i} = x'_{D-i}\) but \(x_{d-i-1} \ne x'_{D-i-1}\).
9. A more accurate analysis would take into account the event that the chains collide before \(x_{d-i}\), but the probability of this is negligible.
10. There is a very low probability that the set contains repeated values, particularly when *t* is significantly small compared with \(L_1\); here, we omit the discussion.
11. Note that for simplicity of description, we omit the description of the finalization transformation on the internal state with the padding block and refer to Sect. 3.2 for the formal description.
12. From now on, we will use “optimal complexity” to mean the minimized complexity under the optimal choice of parameters for each attack.
13. It takes \(O(t \cdot 2^{t})\) operations by sorting the lists, but only \(2 \cdot 2^{t}\) using a hash table.
14. Note that \(\ell +2g_1-n = n - \ell < n-2\ell /3\).
15. The actual attack is slightly different, as it searches for deep iterates from which \((a_p,b_p)\) can be reached with a common message block.
16. Note that for \(\ell > n/3\), \(g_1 = 2/5 \cdot (2n-\ell )> 2n/3 > \max (n/2, n-\ell )\), as required.
17. One may ask why we did not compute a larger set \(\mathcal {S}\), as we did in Phase 2 of Attack 2. The reason is that it can be shown that in this case, a set of size 1 is optimal.
18. One may also ask why we did not use cyclic nodes and multi-cycles to further improve this second-preimage attack on concatenation combiners, as we did for the preimage attack on XOR combiners. The reason is that the optimization of Phase 2 of Attack 4 has reached its limitation because of the limited number of candidate state pairs for \( (\bar{x}, \bar{y}) \). Thus, the complexity of Phase 2 becomes the bottleneck and cannot be improved using cyclic nodes.
19. A large multi-collision can be built with a cost of roughly \(2^{n/2}\) in a narrow-pipe function, but this costs almost \(2^n\) for an ideal hash function.
20. However, the message length can be a problem with some hash functions that do not accept long inputs. For example, SHA-256 and SHA-224 are only defined for messages of less than \( 2^{64} \) bits (i.e., \( 2^{55} \) blocks). In this case, one can apply the attack with a smaller value of *t*: this reduces the length of the messages at the cost of more time spent in the preimage search step. Thus, to mount a preimage attack against \(\text {SHA-224} \oplus \text {BLAKE-224}\), we should use \( t = 24 \) instead of \( t = 32 \). Then, the optimal complexity is \( 2^{200} \) instead of \( 2^{199} \).
21. Note that \( n/3 + n'/3 > n'/2 \) when \( n'< 2n \).

## References

1. E. Andreeva, C. Bouillaguet, O. Dunkelman, P.-A. Fouque, J.J. Hoch, J. Kelsey, A. Shamir, S. Zimmer, New second-preimage attacks on hash functions. *J. Cryptol.* **29**(4), 657–696 (2016)
2. E. Andreeva, C. Bouillaguet, O. Dunkelman, J. Kelsey, Herding, second preimage and trojan message attacks beyond Merkle–Damgård, in M.J. Jacobson Jr., V. Rijmen, R. Safavi-Naini, editors, *Selected Areas in Cryptography, 16th Annual International Workshop, SAC 2009, Calgary, Alberta, Canada, August 13–14, 2009, Revised Selected Papers*. Lecture Notes in Computer Science, vol. 5867 (Springer, 2009), pp. 393–414
3. E. Andreeva, C. Bouillaguet, P.-A. Fouque, J.J. Hoch, J. Kelsey, A. Shamir, S. Zimmer, Second preimage attacks on dithered hash functions, in N.P. Smart, editor, *Advances in Cryptology—EUROCRYPT 2008, 27th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Istanbul, Turkey, April 13–17, 2008. Proceedings*. Lecture Notes in Computer Science, vol. 4965 (Springer, 2008), pp. 270–288
4. L. Aceto, I. Damgård, L.A. Goldberg, M.M. Halldórsson, A. Ingólfsdóttir, I. Walukiewicz, editors, *Automata, Languages and Programming, 35th International Colloquium, ICALP 2008, Reykjavik, Iceland, July 7–11, 2008, Proceedings, Part II—Track B: Logic, Semantics, and Theory of Programming & Track C: Security and Cryptography Foundations*. Lecture Notes in Computer Science, vol. 5126 (Springer, 2008)
5. D. Boneh, X. Boyen, On the impossibility of efficiently combining collision resistant hash functions, in C. Dwork, editor, *Advances in Cryptology—CRYPTO 2006, 26th Annual International Cryptology Conference, Santa Barbara, California, USA, August 20–24, 2006, Proceedings*. Lecture Notes in Computer Science, vol. 4117 (Springer, 2006), pp. 570–583
6. E. Biham, O. Dunkelman, A framework for iterative hash functions—HAIFA. *IACR Cryptol. ePrint Arch.* **2007**, 278 (2007)
7. Z. Bao, J. Guo, L. Wang, Functional graphs and their applications in generic attacks on iterated hash constructions. *IACR Trans. Symmetric Cryptol.* **2018**(1), 201–253 (2018)
8. G. Brassard, editor, *Advances in Cryptology—CRYPTO ’89, 9th Annual International Cryptology Conference, Santa Barbara, California, USA, August 20–24, 1989, Proceedings*. Lecture Notes in Computer Science, vol. 435 (Springer, 1990)
9. S.R. Blackburn, D.R. Stinson, J. Upadhyay, On the complexity of the herding attack and some related attacks on hash functions. *Des. Codes Cryptogr.* **64**(1–2), 171–193 (2012)
10. Z. Bao, L. Wang, J. Guo, D. Gu, Functional graph revisited: updates on (second) preimage attacks on hash combiners, in J. Katz, H. Shacham, editors, *Advances in Cryptology—CRYPTO 2017—37th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 20–24, 2017, Proceedings, Part II*. Lecture Notes in Computer Science, vol. 10402 (Springer, 2017), pp. 404–427
11. S. Chen, C. Jin, A second preimage attack on Zipper hash. *Secur. Commun. Netw.* **8**(16), 2860–2866 (2015)
12. R. Cramer, editor, *Advances in Cryptology—EUROCRYPT 2005, 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Aarhus, Denmark, May 22–26, 2005, Proceedings*. Lecture Notes in Computer Science, vol. 3494 (Springer, 2005)
13. R. Canetti, R.L. Rivest, M. Sudan, L. Trevisan, S.P. Vadhan, H. Wee, Amplifying collision resistance: a complexity-theoretic treatment, in Menezes [43], pp. 264–283
14. R.D. Dean, A. Appel, *Formal Aspects of Mobile Code Security*. PhD thesis, Princeton University, Princeton (1999)
15. T. Dierks, C. Allen, The TLS protocol version 1.0. *RFC* **2246**, 1–80 (1999)
16. I. Damgård, A design principle for hash functions, in Brassard [8], pp. 416–427
17. I. Dinur, New attacks on the concatenation and XOR hash combiners, in M. Fischlin, J.-S. Coron, editors, *Advances in Cryptology—EUROCRYPT 2016—35th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Vienna, Austria, May 8–12, 2016, Proceedings, Part I*. Lecture Notes in Computer Science, vol. 9665 (Springer, 2016), pp. 484–508
18. I. Dinur, G. Leurent, Improved generic attacks against hash-based MACs and HAIFA, in Garay and Gennaro [27], pp. 149–168
19. O. Dunkelman, B. Preneel, Generalizing the herding attack to concatenated hashing schemes, in *ECRYPT Hash Function Workshop* (Citeseer, 2007)
20. T. Dierks, E. Rescorla, The transport layer security (TLS) protocol version 1.2. *RFC* **5246**, 1–104 (2008)
21. A.O. Freier, P. Karlton, P.C. Kocher, The secure sockets layer (SSL) protocol version 3.0. *RFC* **6101**, 1–67 (2011)
22. M. Fischlin, A. Lehmann, Security-amplifying combiners for collision-resistant hash functions, in Menezes [43], pp. 224–243
23. M. Fischlin, A. Lehmann, Multi-property preserving combiners for hash functions, in R. Canetti, editor, *Theory of Cryptography, Fifth Theory of Cryptography Conference, TCC 2008, New York, USA, March 19–21, 2008*. Lecture Notes in Computer Science, vol. 4948 (Springer, 2008), pp. 375–392
24. M. Fischlin, A. Lehmann, K. Pietrzak, Robust multi-property combiners for hash functions revisited, in Aceto et al. [4], pp. 655–666
25. M. Fischlin, A. Lehmann, K. Pietrzak, Robust multi-property combiners for hash functions. *J. Cryptol.* **27**(3), 397–428 (2014)
26. P. Flajolet, A.M. Odlyzko, Random mapping statistics, in J.-J. Quisquater, J. Vandewalle, editors, *Advances in Cryptology—EUROCRYPT ’89, Workshop on the Theory and Application of Cryptographic Techniques, Houthalen, Belgium, April 10–13, 1989, Proceedings*. Lecture Notes in Computer Science, vol. 434 (Springer, 1989), pp. 329–354
27. J.A. Garay, R. Gennaro, editors, *Advances in Cryptology—CRYPTO 2014—34th Annual Cryptology Conference, Santa Barbara, CA, USA, August 17–21, 2014, Proceedings, Part I*. Lecture Notes in Computer Science, vol. 8616 (Springer, 2014)
28. J. Guo, T. Peyrin, Y. Sasaki, L. Wang, Updates on generic attacks against HMAC and NMAC, in Garay and Gennaro [27], pp. 131–148
29. M.E. Hellman, A cryptanalytic time-memory trade-off. *IEEE Trans. Inf. Theory* **26**(4), 401–406 (1980)
30. A. Herzberg, On tolerant cryptographic constructions, in A. Menezes, editor, *Topics in Cryptology—CT-RSA 2005, The Cryptographers’ Track at the RSA Conference 2005, San Francisco, CA, USA, February 14–18, 2005, Proceedings*. Lecture Notes in Computer Science, vol. 3376 (Springer, 2005), pp. 172–190
31. A. Herzberg, Folklore, practice and theory of robust combiners. *J. Comput. Secur.* **17**(2), 159–189 (2009)
32. J.J. Hoch, A. Shamir, Breaking the ICE—finding multicollisions in iterated concatenated and expanded (ICE) hash functions, in M.J.B. Robshaw, editor,

*Fast Software Encryption, 13th International Workshop, FSE 2006, Graz, Austria, March 15–17, 2006, Revised Selected Papers*. Lecture Notes in Computer Science, vol. 4047 (Springer, 2006), pp. 179–194J.J. Hoch, A. Shamir. On the strength of the concatenated hash combiner when all the hash functions are weak, in Aceto et al. [4], pp. 616–630

A. Jha, M. Nandi, Some cryptanalytic results on Zipper hash and concatenated hash.

*IACR Cryptol. ePrint Arch.***2015**, 973 (2015)A. Joux, Multicollisions in iterated hash functions. Application to cascaded constructions, in M.K. Franklin, editor,

*Advances in Cryptology—CRYPTO 2004, 24th Annual International Cryptology Conference, Santa Barbara, California, USA, August 15–19, 2004, Proceedings*Lecture Notes in Computer Science, vol. 3152 (Springer, 2004), pp. 306–316A. Joux,

*Algorithmic Cryptanalysis*(Chapman and Hall/CRC, Boca Raton, 2009)J. Kelsey, T. Kohno, Herding hash functions and the nostradamus attack, in Serge Vaudenay, editor,

*Advances in Cryptology—EUROCRYPT 2006, 25th Annual International Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, May 28–June 1, 2006, Proceedings*. Lecture Notes in Computer Science, vol. 4004 (Springer, 2006), pp. 183–200J. Kelsey, B. Schneier, Second preimages on n-bit hash functions for much less than \(2{}^{{\rm n}}\) work, in Cramer [12], pp. 474–490

A. Lehmann.

*On the Security of Hash Function Combiners*. PhD thesis, Darmstadt University of Technology (2010)M. Liskov, Constructing an ideal hash function from weak ideal compression functions, in E. Biham, A.M. Youssef, editors,

*Selected Areas in Cryptography, 13th International Workshop, SAC 2006, Montreal, Canada, August 17-18, 2006 Revised Selected Papers*. Lecture Notes in Computer Science, vol. 4356 (Springer, 2006), pp. 358–375G. Leurent, T. Peyrin, L. Wang, New generic attacks against hash-based MACs, in K. Sako, P. Sarkar, editors,

*Advances in Cryptology—ASIACRYPT 2013—19th International Conference on the Theory and Application of Cryptology and Information Security, Bengaluru, India, December 1–5, 2013, Proceedings, Part II*. Lecture Notes in Computer Science, vol. 8270 (Springer, 2013), pp. 1–20G. Leurent, L. Wang, The sum can be weaker than each part, in E. Oswald, M. Fischlin, editors,

*Advances in Cryptology—EUROCRYPT 2015—34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Sofia, Bulgaria, April 26–30, 2015, Proceedings, Part I*. Lecture Notes in Computer Science, vol. 9056 (Springer, 2015), pp. 345–367A. Menezes, editor.

*Advances in Cryptology—CRYPTO 2007, 27th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 19–23, 2007, Proceedings*. Lecture Notes in Computer Science, vol. 4622. (Springer, 2007)R.C. Merkle. One way hash functions and DES, in Brassard [8], pp. 428–446

A. Mittelbach. Hash combiners for second pre-image resistance, target collision resistance and pre-image resistance have long output, in I. Visconti, R. De Prisco, editors,

*Security and Cryptography for Networks—8th International Conference, SCN 2012, Amalfi, Italy, September 5–7, 2012. Proceedings*. Lecture Notes in Computer Science, vol. 7485 (Springer, 2012), pp. 522–539A. Mittelbach, Cryptophia’s short combiner for collision-resistant hash functions, in M.J. Jacobson Jr., M.E. Locasto, P. Mohassel, R. Safavi-Naini, editors,

*Applied Cryptography and Network Security—11th International Conference, ACNS 2013, Banff, AB, Canada, June 25–28, 2013. Proceedings*. Lecture Notes in Computer Science, vol. 7954 (Springer, 2013), pp. 136–153B. Mennink, B. Preneel, Breaking and fixing cryptophia’s short combiner, in D. Gritzalis, A. Kiayias, I.G. Askoxylakis, editors,

*Cryptology and Network Security—13th International Conference, CANS 2014, Heraklion, Crete, Greece, October 22–24, 2014. Proceedings*. Lecture Notes in Computer Science, vol. 8813 (Springer, 2014), pp. 50–63F. Mendel, C. Rechberger, M. Schläffer, MD5 is weaker than weak: attacks on concatenated combiners, in M. Matsui, editor,

*Advances in Cryptology—ASIACRYPT 2009, 15th International Conference on the Theory and Application of Cryptology and Information Security, Tokyo, Japan, December 6–10, 2009. Proceedings*. Lecture Notes in Computer Science, vol. 5912 (Springer, 2009), pp. 144–161M. Nandi, D. R. Stinson, Multicollision attacks on some generalized sequential hash functions.

*IEEE Trans. Inf. Theory***53**(2), 759–767 (2007)K. Pietrzak, Non-trivial black-box combiners for collision-resistant hash-functions don’t exist, in M. Naor, editor,

*Advances in Cryptology—EUROCRYPT 2007, 26th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Barcelona, Spain, May 20–24, 2007, Proceedings*. Lecture Notes in Computer Science, vol. 4515 (Springer, 2007), pp. 23–33K. Pietrzak, Compression from collisions, or Why CRHF combiners have a long output, in D.A. Wagner, editor,

*Advances in Cryptology—CRYPTO 2008, 28th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 17–21, 2008. Proceedings*. Lecture Notes in Computer Science, vol. 5157 (Springer, 2008), pp. 413–432L. Perrin, D. Khovratovich, Collision spectrum, entropy loss, T-sponges, and cryptanalysis of GLUON-64, in C. Cid, C. Rechberger, editors,

*Fast Software Encryption—21st International Workshop, FSE 2014, London, UK, March 3–5, 2014. Revised Selected Papers*. Lecture Notes in Computer Science, vol. 8540 (Springer, 2014), pp. 82–103B. Preneel,

*Analysis and Design of Cryptographic Hash Functions*. PhD thesis, Katholieke Universiteit te Leuven (1993)T. Peyrin, L. Wang, Generic universal forgery attack on iterative hash-based MACs, in P.Q. Nguyen, E. Oswald, editors,

*Advances in Cryptology—EUROCRYPT 2014—33rd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Copenhagen, Denmark, May 11–15, 2014. Proceedings*. Lecture Notes in Computer Science, vol. 8441 (Springer, 2014), pp. 147–164M. Rjasko, On existence of robust combiners for cryptographic hash functions. In P. Vojtás, editor,

*Proceedings of the Conference on Theory and Practice of Information Technologies, ITAT 2009, Horský hotel Kralova studna, Slovakia, September 25–29, 2009, volume 584 of CEUR Workshop Proceedings*(CEUR-WS.org, 2009), pp. 71–76P.C. van Oorschot, M.J. Wiener, Parallel collision search with cryptanalytic applications.

*J. Cryptol.***12**(1), 1–28 (1999)D.A. Wagner, A generalized birthday problem, in M. Yung, editor,

*Advances in Cryptology—CRYPTO 2002, 22nd Annual International Cryptology Conference, Santa Barbara, California, USA, August 18–22, 2002, Proceedings*. Lecture Notes in Computer Science, vol. 2442 (Springer, 2002), pp. 288–303X. Wang, H. Yu, How to break MD5 and other hash functions, in Cramer [12], pp. 19–35

X. Wang, Y.L. Yin, H. Yu, Finding collisions in the full SHA-1, in V. Shoup, editor,

*Advances in Cryptology—CRYPTO 2005: 25th Annual International Cryptology Conference, Santa Barbara, California, USA, August 14–18, 2005, Proceedings*. Lecture Notes in Computer Science, vol. 3621 (Springer, 2005), pp. 17–36

## Acknowledgements

This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore, under its Strategic Capability Research Centres Funding Initiative, Nanyang Technological University under research Grant M4082123 and Singapore’s Ministry of Education under Grant M4012049. Itai Dinur is supported in part by the Israeli Science Foundation through Grant No. 573/16. Lei Wang is supported by National Natural Science Foundation of China (61602302, 61472250, 61672347), Natural Science Foundation of Shanghai (16ZR1416400), Shanghai Excellent Academic Leader Funds (16XD1401300), 13th five-year National Development Fund of Cryptography (MMJJ20170114).


## Appendices

### A Pseudo-codes of Algorithms

### B Optimized Interchange Structure

We now describe an optimized attack that uses only \((2^t-1)^2\) switches rather than \(2^{2t}-1\). It additionally requires multi-collision structures, as introduced by Joux [35].

We replace the first \(2^t-1\) switches with a \(2^t\)-Joux’s multi-collision in \(\mathcal {H}_1\), and we use those messages to initialize all the \(b_k\) chains in \(\mathcal {H}_2\). We can also optimize the first series of switches in \(\mathcal {H}_2\) in the same way: we build a \(2^{t}\)-multi-collision in \(\mathcal {H}_2\) starting from \(b_0\), and we use those messages to initialize the \(a_j\) chains in \(\mathcal {H}_1\). This is illustrated in Fig. 14, and the detailed attack is given in Algorithm 6.
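The Joux multi-collision used as a building block above is easy to sketch in code. The following is a minimal illustration, not the paper's construction: the compression function is a truncated SHA-256 (a stand-in with a deliberately small 3-byte state so the birthday searches finish instantly), and `joux_multicollision` chains \(t\) one-block collisions so that any of the \(2^t\) block choices reaches the same final chaining value.

```python
import hashlib
from itertools import product

def compress(h: bytes, m: bytes) -> bytes:
    """Toy Merkle-Damgard compression with a 3-byte chaining value (illustration only)."""
    return hashlib.sha256(h + m).digest()[:3]

def find_block_collision(h: bytes) -> tuple[bytes, bytes]:
    """Birthday search: two distinct one-block messages colliding from state h."""
    seen = {}
    counter = 0
    while True:
        m = counter.to_bytes(8, "big")
        out = compress(h, m)
        if out in seen:
            return seen[out], m
        seen[out] = m
        counter += 1

def joux_multicollision(h0: bytes, t: int):
    """Build a 2^t Joux multi-collision: t successive collision pairs.
    Choosing either block at each step leads to the same final state."""
    pairs, h = [], h0
    for _ in range(t):
        m0, m1 = find_block_collision(h)
        pairs.append((m0, m1))
        h = compress(h, m0)  # both m0 and m1 map h to this same state
    return pairs, h

# All 2^t messages (one block chosen per pair) reach the same final state:
pairs, final = joux_multicollision(b"\x00" * 3, 3)
for choice in product([0, 1], repeat=3):
    h = b"\x00" * 3
    for bit, (m0, m1) in zip(choice, pairs):
        h = compress(h, (m0, m1)[bit])
    assert h == final
```

In the optimized structure, such a \(2^t\)-multi-collision replaces \(2^t-1\) switches: each of the \(2^t\) colliding messages leaves \(\mathcal{H}_1\) in the same state while driving \(\mathcal{H}_2\) to a distinct state, which is exactly what is needed to initialize the \(b_k\) chains at once.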

### C On Problem Raised by Dependency Between Chain Evaluations

Suppose \( \bar{x} \) and \( \bar{y} \) are both of depth \( 2^{n - g_1} \). From Observation 2 in Sect. 2.7.1, we conclude that the probability of encountering \(\bar{x}\) and \(\bar{y}\) at the same distance in chains (of \(f_1\) and \(f_2\)) evaluated from \(x_0\) and \(y_0\) is approximately \(2^{n-3g_1}\). Thus, in Sect. 4.2.2 and Sect. 6.2.2, we conclude that if the trials of chain evaluations were independent, we would need to compute about \(2^{3g_1-n}\) chains from different starting points. However, since the trials performed by selecting different starting points for the chains are not independent, this conclusion requires further justification.

More specifically, once the number of nodes evaluated along chains exceeds \( 2^{n - d} \), a new chain of length \( 2^d \) is very likely to collide with a previously evaluated node due to the birthday paradox (\( 2^d \times 2^{n - d} = 2^n \)). From the collision point onward, the new chain simply follows the old one, so its outcome is determined by previously evaluated chains. As a result, new chains are all related to already evaluated chains, and this dependency affects the outcome non-negligibly once \( 2^{n - d} \) nodes have been evaluated.

However, we note that in our attacks, the actual birthday bound for non-negligible dependency between trials is \( 2^{2n - 2d} \) rather than \( 2^{n - d} \), because each trial consists of two chain evaluations: one in \( \mathcal {FG}_{f_1} \) and the other in \( \mathcal {FG}_{f_2} \). The chain evaluation in \( \mathcal {FG}_{f_1} \) can be seen as independent of the chain evaluation in \( \mathcal {FG}_{f_2} \). After \( 2^{n - d} \) nodes have been evaluated in each of the two functional graphs, each new chain does indeed collide with previously evaluated chains with high probability. However, for a new *pair of* chain evaluations, the probability that *both* chains collide with the chains evaluated in one and the same previous trial becomes significant only after \( 2^{2n - 2d} \) nodes have been evaluated, again by the birthday paradox. That is, the trials can be treated as independent until about \( 2^{2n - 2d} \) nodes have been evaluated. In our attacks, the required number of trials is \( 2^{2n - 3d} \); thus, the total number of evaluated nodes is \( 2^{2n - 3d} \cdot 2^{d + 1} \approx 2^{2n - 2d} \), which falls exactly on this birthday bound. Hence, the dependency between the trials is negligible, and the complexity analysis of the corresponding attacks is justified.
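The single-graph birthday effect underlying this argument is easy to observe experimentally. The following sketch uses toy parameters of our own choosing (a random mapping on \(2^{16}\) points standing in for a compression function with fixed message block): it first covers about \(2^{n-d}\) nodes with chains of length \(2^d\), then measures how often a fresh chain collides with an already-evaluated node.

```python
import random

# Toy parameters: N = 2^n states, chains of length L = 2^d.
n, d = 16, 6
N, L = 1 << n, 1 << d

random.seed(1)
f = [random.randrange(N) for _ in range(N)]  # a random mapping on N points

seen = set()

def run_chain(x: int) -> bool:
    """Evaluate a chain of length L from x; report whether it hit a seen node."""
    hit = False
    for _ in range(L):
        x = f[x]
        if x in seen:
            hit = True  # from here on the chain follows an old chain
        seen.add(x)
    return hit

# Phase 1: evaluate chains until about 2^(n-d) nodes are covered.
while len(seen) < (N >> d):
    run_chain(random.randrange(N))

# Phase 2: fresh chains now collide with old nodes with high probability,
# since L * 2^(n-d) = 2^n (the birthday bound).
hits = sum(run_chain(random.randrange(N)) for _ in range(200))
print(f"{hits}/200 fresh chains touched a previously evaluated node")
```

With \(2^{n-d}\) of the \(2^n\) nodes covered, each of the \(2^d\) steps of a fresh chain hits an old node with probability about \(2^{-d}\), so a substantial fraction of the 200 test chains collide, matching the estimate in the text.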


## About this article

### Cite this article

Bao, Z., Dinur, I., Guo, J. *et al.* Generic Attacks on Hash Combiners.
*J Cryptol* **33**, 742–823 (2020). https://doi.org/10.1007/s00145-019-09328-w


### Keywords

- Hash function
- Generic attack
- Hash combiner
- XOR combiner
- Concatenation combiner
- Zipper hash
- Hash-Twice
- (Second) Preimage attack