
1 Introduction

Finding collisions or multicollisions is a fundamental problem in theoretical computer science and one of the most critical problems in cryptography in particular. For given finite sets X and Y with \(|Y | = N\), and a function \(H :X \rightarrow Y\), an l-collision finding problem is to find a set of l distinct inputs \(x_1,\dots ,x_l\) such that \(H(x_1) = \dots = H(x_l)\). Both upper and lower bounds on the query and time complexity of the l-collision finding problem are fundamental and have several applications in cryptography.

Applications of Multicollisions. We often use the lower bound of query complexity (or the upper bound of the success probability) to prove the security of cryptographic schemes. Consider a cryptographic scheme based on Pseudo-Random Functions (PRFs). In the security proof, we replace the PRFs with truly random functions (or random oracles) and show the security of the scheme with the random oracles by information-theoretic arguments. In these arguments, we often use a lower bound on the number of queries needed to find multicollisions of random functions. For example, Chang and Nandi [CN08] proved the indifferentiability of the chopMD hash function construction; Jaulmes et al. [JJV02] proved the indistinguishability of RMAC; Hirose et al. [HIK+10] proved the indifferentiability of the ISO standard lightweight hash function Lesamnta-LW; Naito and Ohta [NO14] improved the indifferentiability bounds of the PHOTON and Parazoa hash functions; and Jovanovic et al. [JLM14] greatly improved the security lower bounds of the authenticated-encryption mode of KeyedSponge. The upper bound on the probability of obtaining multicollisions after q queries plays an important role in their proofs.

In addition, studying and improving the upper bound for the l-collision finding problem deepens our understanding of the problem and often determines the complexity of generic attacks. For example, l-collisions are exploited in the collision attack on the MDC-2 hash function construction by Knudsen et al. [KMRT09], the preimage attack on the JH hash function by Mendel and Thomsen [MT08], the internal state recovery attack on HMAC by Naito et al. [NSWY13], the key recovery attack on iterated Even-Mansour by Dinur et al. [DDKS14], and the key recovery attack on the LED block cipher by Nikolić et al. [NWW13].

Furthermore, multicollisions also have applications in protocols. An interesting example is the micro-payment scheme MicroMint [RS96]. Here, a coin is a bit string whose validity is easy to check but which is hard to produce. In MicroMint, coins are 4-collisions of a function. If 4-collisions can be produced quickly, a malicious user can counterfeit coins.

Existing Results for Multicollisions in Classical Setting. The problem of finding (multi-)collisions has been extensively discussed in the classical setting. Suppose that we can access the function H via classical queries; that is, we can send \(x \in X\) to the oracle H and obtain \(y = H(x) \in Y\). For a random function H, q queries to H find a collision of H with probability at most \(q^2/N\). The birthday bound shows that when \(q \approx N^{1/2}\), we obtain a collision with constant probability. This can be extended to the l-collision case. Suzuki et al. [STKT08] showed that with \(N^{(l-1)/l}\) queries the probability of finding an l-collision is upper bounded by 1/l! and lower bounded by \(1/l! - 1/(2(l!)^2)\), which shows that the query complexity can be approximated by \(N^{(l-1)/l}\) for a small constant l. More precisely, \(O\big ((l!)^{1/l} N^{(l-1)/l}\big )\) evaluations of H find an l-collision with probability about 1/2 if \(H :X \rightarrow Y\) is a random function.
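The \(N^{(l-1)/l}\) estimate can be checked empirically. The sketch below is a Monte Carlo simulation under the assumption that H is a random function: the outputs of H on q distinct inputs are then independent and uniform over Y, so they are modeled as q uniform draws from \(\{0,\dots ,N-1\}\). The function name and parameters are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def l_collision_probability(n: int, l: int, q: int, trials: int, seed: int = 0) -> float:
    """Estimate Pr[q queries to a random H: X -> Y with |Y| = n contain an l-collision]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Outputs of a random function on q distinct inputs: q independent uniform values.
        counts = Counter(rng.randrange(n) for _ in range(q))
        if max(counts.values()) >= l:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # q = N^((l-1)/l); [STKT08] bounds the probability by 1/l! - 1/(2(l!)^2) and 1/l!.
    print(l_collision_probability(2 ** 16, 2, 256, 2000))  # l = 2: bounds 0.375 and 0.5
    print(l_collision_probability(4096, 3, 256, 2000))     # l = 3: bounds ~0.153 and ~0.167
```

The estimates are statistical, so they should land near the stated bounds rather than exactly between them.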

The above argument only focuses on the number of queries. To implement an l-collision finding algorithm, the computational cost \(T\), the memory amount \(S\), or their tradeoff should also be considered. The simple method needs to store all the query results. Hence, it requires \(T= S= O(N^{1/2})\) for collisions and \(T= S= O(N^{(l-1)/l})\) for l-collisions. The collision finding algorithm can be made memoryless by using Floyd’s cycle-finding algorithm [Flo67]. However, no such memoryless algorithm is known for l-collisions, so researchers aim to achieve a better \(T\times S\) or to trade \(T\) against \(S\) for a given \(T\times S\).
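To illustrate the memoryless approach, here is a minimal sketch of collision finding with Floyd’s cycle-finding algorithm. It assumes \(X = Y\), so that H can be iterated, and uses SHA-256 truncated to 24 bits as a hypothetical stand-in for H (`toy_hash`, `BITS`, and the helper names are illustrative only). Only a constant number of values is stored, while the expected number of evaluations is \(O(N^{1/2})\) for a random-looking function.

```python
import hashlib
from typing import Callable, Optional, Tuple

BITS = 24  # hypothetical toy size: X = Y = {0, ..., 2**24 - 1}


def toy_hash(x: int) -> int:
    """Hypothetical stand-in for H: SHA-256 truncated to BITS bits."""
    digest = hashlib.sha256(x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - BITS)


def floyd_collision(f: Callable[[int], int], x0: int) -> Optional[Tuple[int, int]]:
    """Return x != x' with f(x) == f(x') using O(1) memory, or None if x0 lies on the cycle."""
    # Phase 1: the tortoise moves one step, the hare two steps, until they meet on the cycle.
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(f(hare))
    # Phase 2: restart the tortoise at x0 and move both one step at a time.
    # They first agree at the cycle entry; their predecessors collide under f.
    tortoise = x0
    if tortoise == hare:
        return None  # x0 is on the cycle: no tail, hence no collision from this start
    while f(tortoise) != f(hare):
        tortoise = f(tortoise)
        hare = f(hare)
    return tortoise, hare


def find_collision(f: Callable[[int], int]) -> Tuple[int, int]:
    """Try starting points 0, 1, 2, ... until Floyd's method yields a collision."""
    x0 = 0
    while True:
        pair = floyd_collision(f, x0)
        if pair is not None:
            return pair
        x0 += 1


if __name__ == "__main__":
    a, b = find_collision(toy_hash)
    print(a, b, toy_hash(a), toy_hash(b))
```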

An l-collision can be found with \(T= l \cdot N\) and \(S= O(1)\) by running a brute-force preimage attack l times for a fixed target. Although this method achieves a better \(T\times S\) than the simple method, it cannot trade \(T\) for \(S\). Joux and Lucks [JL09] discovered a 3-collision finding algorithm with \(T= N^{1-\alpha }\) and \(S= N^\alpha \) for \(\alpha < 1/3\) by using the parallel collision search technique. Nikolić and Sasaki [NS16] achieved the same complexity as Joux and Lucks by using an unbalanced meet-in-the-middle attack.

1.1 Collisions and Multicollisions in Quantum Setting

Algorithmic speedup using quantum computers has been actively discussed recently. For example, Grover’s seminal result [Gro96] attracted cryptographers’ attention because of the quantum speedup of database search. Given a function \(F :X \rightarrow \{0,1\}\) such that there exists a unique \(x_0 \in X\) that satisfies \(F(x_0)=1\), Grover’s algorithm finds \(x_0\) in \(O\big (|X |^{1/2}\big )\) queries.

This paper discusses the complexity of quantum algorithms in the quantum query model. In this model, a function H is given as a black box, and the complexity of a quantum algorithm is measured as the number of quantum queries to H. The quantum query model is widely adopted, and previous studies on finding collisions in the quantum setting follow it [BHT97, Amb07, Bel12, Yue14, Zha15].

Previous research on finding collisions and multicollisions can be classified with respect to two types of dichotomies.

 

Domain size and codomain size.

The relation between the domain size and codomain size of the function \(H :X \rightarrow Y\) matters considerably for quantum algorithms. Some quantum algorithms aim to find collisions and multicollisions of H with \(|X | \ge |Y |\), while others target H with \(|X | < |Y |\). The former can be directly applied to find collisions and multicollisions of real hash functions such as SHA-3. The latter mainly target database search rather than hash functions. A (multi-)collision search on a database can still be adapted to hash functions, but this generally increases the complexity significantly. (Conversely, a (multi-)collision search for hash functions with \(|X | \ge |Y |\) cannot be adapted to a database with \(|X | < |Y |\).)

Hereafter, we use “H” and “D” to denote the cases with \(|X | \ge |Y |\) and \(|X | < |Y |\), respectively. We note that our goal is finding a new multicollision algorithm that can be applied to real hash functions, namely the H setting.

Random function and any function.

In both the classical and quantum settings, existing algorithms often assume randomness: they can find collisions only on average when H is chosen uniformly at random from \(\mathop {\mathrm {Map}}(X,Y) := \{f \mid f:X \rightarrow Y\}\). If an algorithm finds collisions of any function \(H \in \mathop {\mathrm {Map}}(X,Y)\), it also finds collisions of randomly chosen functions. Hence algorithms that apply to any function are stronger than those that apply only to a random function. Hereafter, we use “Rnd” and “Arb” to denote the cases in which H is chosen uniformly at random and H is chosen arbitrarily, respectively. We note that the Rnd setting is sufficient for our goal, and we will show that our new algorithm can also be applied to the Arb setting.

 

In the following, we revisit the existing results of collision and multicollision-finding algorithms in the quantum setting.

  • Brassard et al. [BHT97] proposed a quantum algorithm \(\mathbf {BHT}\), which can be classified as H-Arb for 2-collisions. To be more precise, \(\mathbf {BHT}\) finds a 2-collision of any l-to-one function with \(O(N^{1/3})\) quantum queries and a memory amount of \(O(N^{1/3})\).

  • Ambainis [Amb07] studied the element distinctness problem rather than the collision finding problem, but his algorithm can be directly applied to find (multi-)collisions of functions. The algorithm is for D-Arb for l-collisions with \(O(M^{l/(l+1)})\) quantum queries, where M is the domain size.

  • Belovs [Bel12] improved the complexity of Ambainis’ algorithm [Amb07].

  • Zhandry [Zha15] observed that Ambainis’ algorithm [Amb07] can be modified to H-Rnd for 2-collisions with \(O(N^{1/3})\) quantum queries, when \(|X | = \varOmega (N^{1/2})\) and \(N=|Y |\).

  • Yuen [Yue14] discussed the application of \(\mathbf {BHT}\) when \(|X | = |Y |\) and the target function H is restricted to Rnd. The complexity is \(O(N^{1/3})\) quantum queries. We do not discuss its details because the case discussed by Yuen [Yue14] is a subset of the one covered by Zhandry’s extension of Ambainis’ algorithm.

  • Regarding the lower bound, the \(O((N/l)^{1/3})\) complexity of \(\mathbf {BHT}\) for finding 2-collisions of an l-to-one function was proved to be tight by several researchers [AS04, Amb05, Kut05]. Zhandry proved that \(O(N^{1/3})\) for 2-collisions of a random function is tight. That is, any quantum algorithm that finds a 2-collision of a random function requires \(\varOmega (N^{1/3})\) quantum queries [Zha15]. Obviously, \(\varOmega (N^{1/3})\) is also a lower bound for \(l > 2\), but no better lower bound is known for \(l > 2\). Hülsing et al. [HRS16] studied the quantum generic security of hash functions by considering quantum query complexity in the quantum random-oracle model. They showed upper and lower bounds on the quantum query complexity of breaking one-wayness, second-preimage resistance, extended target-collision resistance, and their variants. Unfortunately, they did not treat collision and multicollision resistance.

The classifications of the existing algorithms are shown in Table 1. As mentioned earlier, Ambainis’ algorithm [Amb07] and its improvement by Belovs [Bel12] originally focused on database search, but they can be converted to the hash function setting at an extra cost. However, all the other approaches for the hash function setting only analyze 2-collisions. Hence, we conclude that no existing quantum algorithm is optimized to find l-collisions of hash functions.

Table 1. Summary of existing quantum algorithms to find (multi-)collisions.

1.2 Our Contributions

In this paper, we study quantum algorithms to find l-collisions against a function \(H :X \rightarrow Y\).

First, the problem of finding l-collisions against hash functions has not received much attention in the literature. Even though previous work could be directly applied to l-collisions against hash functions, nobody has considered this problem, and no generic attack is known. This motivates us to systematize the knowledge about existing quantum algorithms. Namely, we provide, for the first time in this field, the state of the art of the complexity of finding l-collisions against hash functions through direct application, trivial extension, and simple combination of existing results.

This state of the art sheds light on problems that require further investigation. As the second, and main, contribution of this paper, we present a new quantum algorithm to find l-collisions against hash functions.

Our contributions in each part are detailed below.

Systematization of Knowledge (Combination of Existing Results)

  • Our first observation is that, when H is a random function and \(|X | = l |Y |\) for a small constant l, the query complexity of the l-collision finding problem is lowered to \(O(N^{1/2})\) by simply applying Grover’s algorithm. Hence, any meaningful generic attack in the quantum setting must achieve the query complexity below \(O(N^{1/2})\). Intuitively, a preimage of the hash value can be generated with \(O(N^{1/2})\) queries in the quantum setting and l-collisions are generated by generating l preimages. This corresponds to the upper bound of O(N) complexity in the classical setting. (Note that this upper bound is for the Rnd setting and does not hold for the Arb setting.)

  • The above observation is quite straightforward but useful for measuring the effect of other attacks. For example, Ambainis’ l-collision search for databases [Amb07] can be converted to the hash function setting with \(O(M^{l/(l+1)})\) complexity, where M is the domain size. However, this cannot be below \(O(N^{1/2})\) for any l. The same applies to the improvement by Belovs [Bel12]. These converted algorithms are meaningful only in the Arb setting.

  • Zhandry [Zha15] discussed the application of Ambainis’ l-collision search in H-Rnd and D-Rnd only for \(l=2\), although it can trivially be extended to \(l > 2\). If it is extended, the complexity for \(l=3\) reaches \(O(N^{1/2})\). Thus, Zhandry’s idea only works for \(l=2\).

  • Zhandry [Zha15] considered Ambainis’ l-collision search rather than Belovs’ improvement [Bel12]. If we consider Zhandry + Belovs, the complexity in H-Rnd for \(l=3\) becomes \(O(N^{10/21})\), which is faster than the simple application of Grover’s algorithm. Thus, it is a meaningful generic attack. For \(l \ge 4\), the complexity of Zhandry + Belovs reaches \(O(N^{1/2})\).

  • In summary, for the Rnd setting, the tight algorithm with \(O(N^{1/2})\) complexity exists for \(l=2\). There is a better generic attack than the simple application of Grover’s algorithm for \(l=3\), although the lower bound is unknown. For \(l \ge 4\), there is no known algorithm better than the application of Grover’s algorithm, and the lower bound is also unknown. For the Arb setting, direct application of Belovs’ algorithm is the existing best attack.

New Quantum Multicollision-Finding Algorithm

  • Given the above state of the art, our main contribution is a new l-collision finding algorithm with \(O\big ( N^{(3^{l-1}-1)/(2 \cdot 3^{l-1})} \big )\) quantum queries against an arbitrary function \(H :X \rightarrow Y\) with \(|X | = l |Y |\). Applied in the Rnd setting, this algorithm beats the simple upper bound of \(O(N^{1/2})\) for every l. Its complexity matches the tight bound of \(O(N^{1/3})\) for \(l=2\) and is lower than the \(O(N^{10/21})\) of Zhandry + Belovs for \(l=3\). The complexity of our algorithm for small constant l is shown in Table 2, and the complexities are compared in Fig. 1.

  • Unlike other algorithms for Arb, the complexity of our algorithm approaches \(O(N^{1/2})\) as l increases, whereas that of Ambainis [Amb07] approaches O(M) and that of Belovs [Bel12] approaches \(O(M^{3/4})\), where \(M = |X |\). Our algorithm therefore improves on these results for \(M \ge l \cdot N\). The complexities are compared in Fig. 2 for \(M = l \cdot N\).

  • The core idea of our algorithm is a sophisticated combination of the 3-collision algorithm in the classical setting by Joux and Lucks [JL09] and the generalized Grover algorithm for the quantum setting [BBHT98].

    In short, we recursively call a collision finding algorithm and Grover’s algorithm. For example, to generate 3-collisions, we first obtain \(O(N^{1/9})\) 2-collisions by running the \(O(N^{1/3})\)-query 2-collision finding algorithm \(O(N^{1/9})\) times. Then, we use Grover’s algorithm to search for a preimage of one of these \(O(N^{1/9})\) 2-collisions, which costs \(O(\sqrt{N/N^{1/9}}) = O(N^{4/9})\). To generate 4-collisions, we run the \(O(N^{4/9})\)-query 3-collision finding algorithm \(O(N^{1/27})\) times, and then search for a preimage of one of the resulting \(O(N^{1/27})\) 3-collisions with \(O(N^{13/27})\) complexity.

    In the classical setting, the recursive application of the algorithm of [JL09] has never been discussed in the literature, because the resulting complexity easily exceeds the information-theoretic bound of \(O(N^{(l-1)/l})\). In contrast, no such bound is known for quantum query complexity, so the recursive application can yield an advantage.

  • Finally, we provide a rigorous complexity evaluation of our algorithm, which is another main focus of this paper. The key point of our proof is that lower-bounding the success probability requires both lower and upper bounds on the number of collisions of H. Our evaluation suggests that our algorithm finds a 2-collision of SHA3-512 with \(2^{179}\) quantum queries and a 3-collision with \(2^{238}\) quantum queries.
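The exponents quoted above follow from a simple balancing argument, sketched below under the same heuristic assumptions as the intuitive description (constants and success probabilities are ignored). If finding an \((l-1)\)-collision costs \(N^{a}\) queries, building a list of \(N^{b}\) such collisions costs \(N^{a+b}\), and a Grover search for an input hitting one of their \(N^{b}\) images costs \(\sqrt{N/N^{b}}\). Setting \(a+b = (1-b)/2\) gives \(b = (1-2a)/3\) and hence the recursion \(a_l = (a_{l-1}+1)/3\) with \(a_1 = 0\). The function names are hypothetical.

```python
from fractions import Fraction


def closed_form_exponent(l: int) -> Fraction:
    """log_N of the stated query complexity (3^(l-1) - 1) / (2 * 3^(l-1))."""
    return Fraction(3 ** (l - 1) - 1, 2 * 3 ** (l - 1))


def recursive_exponents(l_max: int) -> dict[int, tuple[Fraction, Fraction]]:
    """Map l -> (a_l, b_l): cost exponent a_l and list-size exponent b_l."""
    a = Fraction(0)  # a_1 = 0: a 1-collision is a single query
    result = {}
    for l in range(2, l_max + 1):
        # List of N^b (l-1)-collisions: list cost N^(a+b) equals Grover cost N^((1-b)/2).
        b = (1 - 2 * a) / 3
        a = a + b
        result[l] = (a, b)
    return result


if __name__ == "__main__":
    for l, (a, b) in recursive_exponents(6).items():
        print(f"l={l}: list size N^({b}), queries N^({a}) ~ N^{float(a):.4f}")
```

Because \(a_l = (a_{l-1}+1)/3\) has fixed point 1/2, the exponents increase toward, but never reach, the trivial bound.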

Table 2. Quantum query complexity of our l-collision finding algorithm. Query denotes \(\log _N(\text {query})\), which asymptotically approaches 1/2 as l increases.
Fig. 1. Quantum query complexity needed to find an l-collision in the H-Rnd setting. Query denotes \(\log _N(\text {query})\).

Fig. 2. Quantum query complexity for finding an l-collision in the H-Arb setting. Query denotes \(\log _N(\text {query})\).

2 Preliminaries

Notation. We define l-collision as follows.

Definition 2.1

(l-collision). Let l be a positive integer. Let X and Y be finite sets. Let \(H :X \rightarrow Y\) be a function from X to Y. Let \(\{x_1,x_2,\dots ,x_l \}\) be a subset of X and let y be an element of Y. We define \(\big (\{x_1,x_2,\dots ,x_l \}, y\big )\) as an l-collision of H if the pair satisfies \(x_i \ne x_j\) for \(i \ne j\) and \(y = H(x_1) = H(x_2) = \cdots = H(x_{l})\). Two l-collisions \(c = \big (\{x_1,x_2,\dots ,x_l \}, y\big )\) and \(c' = \big (\{x'_1,x'_2,\dots ,x'_l \}, y' \big )\) are said to be equal if and only if \(\{x_1,x_2,\dots ,x_l \} = \{x'_1,x'_2,\dots ,x'_l \}\) as sets and \(y= y'\).

If \(|X | = l \cdot |Y |\) and a function \(H :X \rightarrow Y\) satisfies \(|H^{-1}(H(x)) | = l\) for every \(x \in X\), we call H an l-to-one function. If l is clear from context, we simply call H a regular function.
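Definition 2.1 and the notion of an l-to-one function translate directly into checks on small, explicitly given functions. The helpers below are hypothetical and assume that H is given as a Python callable on integer inputs and that the domain is listed without repetitions; passing the inputs of a collision as a set mirrors the set equality in Definition 2.1.

```python
from collections import Counter
from typing import Callable, Collection, Iterable


def is_l_collision(h: Callable[[int], int], xs: Collection[int], y: int, l: int) -> bool:
    """Definition 2.1: l pairwise-distinct inputs that all map to y."""
    return len(xs) == l and len(set(xs)) == l and all(h(x) == y for x in xs)


def is_l_to_one(h: Callable[[int], int], domain: Iterable[int], codomain_size: int, l: int) -> bool:
    """|X| = l|Y| and every fiber H^{-1}(H(x)) has exactly l elements."""
    domain = list(domain)
    if len(domain) != l * codomain_size:
        return False
    fiber_sizes = Counter(h(x) for x in domain)
    # With |X| = l|Y|, all fibers of size l also forces H to be onto Y.
    return all(size == l for size in fiber_sizes.values())


if __name__ == "__main__":
    h = lambda x: x % 5  # toy 3-to-one function on {0, ..., 14}
    print(is_l_to_one(h, range(15), 5, 3), is_l_collision(h, {1, 6, 11}, 1, 3))
```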

Complexity of quantum algorithm. Suppose that we are given a function H as a black box and can query a quantum state to the function H; that is, we can send a quantum superposition, say, \(\sum _{x \in X} \alpha _x{\,|x\rangle }{\,|b\rangle }\) to the oracle H and obtain \(\sum _{x \in X} \alpha _x {\,|x\rangle } {\,|b \oplus H(x)\rangle }\). In the quantum query model, the complexity of a quantum algorithm is measured by the number of quantum queries to H that the algorithm makes. Many existing studies on collision problems in quantum setting follow this model [BHT97, Amb07, Bel12, Zha15], and the quantum query complexity of collision problems must be understood when we make security proofs in the quantum random oracle model [BDF+11], which corresponds to the random oracle model in a classical setting. As for time complexity, we will discuss it in Sect. 6.
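As a classical illustration of the oracle model (not of any quantum speedup), the sketch below represents a state as a sparse map from basis states \({\,|x\rangle }{\,|b\rangle }\) to amplitudes and applies \({\,|x\rangle }{\,|b\rangle } \mapsto {\,|x\rangle }{\,|b \oplus H(x)\rangle }\), identifying Y with m-bit strings so that \(\oplus\) is bitwise XOR. Because this map permutes basis states and is its own inverse, the oracle is unitary. The names `apply_oracle` and `State` and the toy function are hypothetical.

```python
import math
from typing import Callable, Dict, Tuple

State = Dict[Tuple[int, int], complex]  # basis state (x, b) -> amplitude


def apply_oracle(h: Callable[[int], int], state: State) -> State:
    """Apply the query |x>|b> -> |x>|b XOR H(x)> to a sparse state."""
    out: State = {}
    for (x, b), amp in state.items():
        target = (x, b ^ h(x))  # a permutation of basis states, so targets never clash
        out[target] = out.get(target, 0) + amp
    return out


if __name__ == "__main__":
    h = lambda x: (3 * x + 1) % 4  # toy H: {0, ..., 7} -> {0, 1}^2 (hypothetical)
    psi = {(x, 0): 1 / math.sqrt(8) for x in range(8)}  # sum_x alpha_x |x>|0>
    print(apply_oracle(h, psi))  # sum_x alpha_x |x>|H(x)>
```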

In the rest of the paper, we assume that readers already have sufficient basic knowledge about the quantum circuit model and omit a detailed explanation of it.

2.1 Grover’s Algorithm and Its Generalization

Grover’s algorithm [Gro96] was proposed for fast database search in a quantum setting. The problem of database search is modeled as follows:

Problem 2.1

(Quantum Database Search). Suppose that there is a function \(F :X \rightarrow \{0,1\}\) such that there is only one element \(x_0 \in X\) that satisfies \(F(x_0) =1\). The problem is to find \(x_0\) under the condition that we are allowed to access a quantum oracle of F.

Grover’s algorithm solves this problem with high probability using roughly \(\sqrt{|X |}\) quantum queries to F. This means that the complexity of an exhaustive search in the quantum setting is the square root of that in the classical setting. For example, an exhaustive key search against AES-128 succeeds with approximately \(2^{64}\) quantum queries.
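The square-root behavior can be observed in a small classical state-vector simulation of Grover’s algorithm, sketched below under the assumption of a single marked element \(x_0\) (the name `grover_success_probability` is hypothetical). Each iteration uses one query to F, realized as a phase flip on \(x_0\), followed by the inversion about the mean; about \((\pi /4)\sqrt{|X |}\) iterations suffice.

```python
import math
from typing import Optional


def grover_success_probability(n_items: int, marked: int, iterations: Optional[int] = None) -> float:
    """Simulate Grover search over n_items basis states; return Pr[measuring `marked`]."""
    if iterations is None:
        iterations = math.floor(math.pi / 4 * math.sqrt(n_items))
    amp = [1 / math.sqrt(n_items)] * n_items  # uniform superposition over X
    for _ in range(iterations):
        amp[marked] = -amp[marked]            # oracle: phase flip on x0 (one query to F)
        mean = sum(amp) / n_items
        amp = [2 * mean - a for a in amp]     # diffusion: inversion about the mean
    return amp[marked] ** 2


if __name__ == "__main__":
    for n in (4, 64, 1024):
        print(n, grover_success_probability(n, marked=2))
```

For \(|X | = 4\), a single iteration already yields the marked element with certainty, whereas a classical search needs several queries on average.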

The database search problem described above is naturally extended so that F has more than one preimage of 1. A formal description is given below.

Problem 2.2

(Generalized Quantum Database Search). Suppose that there is a function \(F :X \rightarrow \{0,1\}\) and we are allowed to make quantum queries to F. Then, find \(x_0\) that satisfies \(F(x_0) = 1\).

Boyer et al. proposed a quantum algorithm solving this problem [BBHT98]. The advantage of their algorithm is that it can be applied without knowing the number of \(x \in X\) that satisfies \(F(x) = 1\) in advance.

Theorem 2.1

([BBHT98] Theorem 3). Let X be a finite set and \(F :X \rightarrow \{0,1\}\) be a function. Let \(t= \big | \{x \in X \mid F(x)=1\} \big |\). If \(t \le \frac{3}{4} |X |\), there exists a quantum algorithm \(\mathbf {BBHT}\) that finds \(x \in X\) that satisfies \(F(x) = 1\) with an expected number of quantum queries to F at most \(9/2 \cdot \sqrt{|X |/t}\), without knowing t in advance. When \(t=0\), this algorithm will never abort.

The above algorithm \(\mathbf {BBHT}\) is applicable only to the case \(t \le \frac{3}{4} |X |\), but we want an algorithm that is also applicable to the case \(t > \frac{3}{4}|X |\). Now consider the following algorithm \(\mathcal {A}\): \(\mathcal {A}\) runs \(\mathbf {BBHT}\) and, in parallel, chooses elements of X independently and uniformly at random and queries them to F. \(\mathcal {A}\) makes exactly one such query whenever \(\mathbf {BBHT}\) makes one query, and \(\mathcal {A}\) stops as soon as it finds \(x \in X\) such that \(F(x)=1\). If \(t > \frac{3}{4}|X |\), each random query succeeds with probability greater than 3/4, so the random queries alone succeed after fewer than 4/3 queries on average. Hence \(\mathcal {A}\) is also applicable to the case \(t > \frac{3}{4}|X |\), and it finds \(x \in X\) such that \(F(x)=1\) with an expected number of quantum queries to F at most

$$ \max \left\{ 2 \cdot \frac{9}{2} \sqrt{\frac{|X |}{t}}, 2 \cdot \frac{4}{3} \right\} = 9 \sqrt{\frac{|X |}{t}}. $$

Here the equality holds because \(t \le |X |\) implies \(9\sqrt{|X |/t} \ge 9 > 8/3\). We also call this algorithm \(\mathbf {BBHT}\). Now we have the following corollary.

Corollary 2.1

Let X be a finite set and \(F :X \rightarrow \{0,1\}\) be a function. Let \(t= \big | \{x \in X \mid F(x)=1\} \big |\). There exists a quantum algorithm \(\mathbf {BBHT}\) that finds \(x \in X\) that satisfies \(F(x) = 1\) with an expected number of quantum queries to F at most \(9 \cdot \sqrt{|X |/t}\), without knowing t in advance. When \(t=0\), this algorithm will never abort.
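The sketch below classically simulates the query cost of \(\mathbf {BBHT}\) without knowing t, following the strategy of Boyer et al. [BBHT98]: in each round, pick j uniformly from the nonnegative integers below the current bound m, run j Grover iterations, measure, check the outcome with one query, and otherwise grow m by the factor \(\lambda = 6/5\), capped at \(\sqrt{|X |}\). Instead of simulating amplitudes, it samples success with the known probability \(\sin ^2((2j+1)\theta )\), where \(\sin ^2\theta = t/|X |\). The function name is hypothetical, and the simulation assumes \(1 \le t \le |X |\) (for \(t = 0\) it would loop forever).

```python
import math
import random


def bbht_query_count(n_items: int, n_marked: int, rng: random.Random, lam: float = 6 / 5) -> int:
    """Simulated number of queries of one BBHT run (j Grover iterations + 1 check per round)."""
    theta = math.asin(math.sqrt(n_marked / n_items))
    m = 1.0
    queries = 0
    while True:
        j = rng.randrange(math.ceil(m))  # j uniform in {0, ..., ceil(m) - 1}
        queries += j + 1                 # j Grover iterations, then one query to check the outcome
        if rng.random() < math.sin((2 * j + 1) * theta) ** 2:
            return queries               # measured a marked element
        m = min(lam * m, math.sqrt(n_items))


if __name__ == "__main__":
    rng = random.Random(1)
    n, t = 2 ** 16, 4
    runs = [bbht_query_count(n, t, rng) for _ in range(500)]
    print(sum(runs) / len(runs), 9 * math.sqrt(n / t))  # empirical mean vs. Corollary 2.1 bound
```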

3 Systematization of Knowledge on Quantum Multicollision Algorithms

In the classical setting, an l-collision of a hash function can be found with \(O(N^{(l-1)/l})\) queries for a small constant l.

However, the problem has not received much attention in the quantum setting. This section surveys previous work and integrates the findings of different researchers to make several new observations on this topic.

3.1 Survey of Previous Work

We review the algorithm \(\mathbf {BHT}\) [BHT97] because our new algorithm explained in Sect. 4 is an extension of it. We also survey previous studies, classifying them into two types: the element l-distinctness problem (D-Arb) and the collision finding problem on random functions (D-Rnd and H-Rnd).

BHT: Collision Finding Problem on \(l\)-to-one Functions. For simplicity, we describe \(\mathbf {BHT}\) only for the case \(l=2\). Let X, Y be sets that satisfy \(|X | = 2 \cdot |Y |\) and \(|Y | = N\), and let \(H :X \rightarrow Y\) be a 2-to-one function.

The basic idea of \(\mathbf {BHT}\) is as follows. First, we choose a parameter k (\(k = N^{1/3}\) will turn out to be optimal) and a subset \(X' \subset X\) of cardinality k. We then make a list \(L = \{ (x,H(x)) \}_{x \in X'}\). Second, we use the \(\mathbf {BBHT}\) algorithm to find an element \(x \in X\) such that there exists \(x_0 \in X'\) that satisfies \((x_0,H(x)) \in L\) and \(x \ne x_0\), i.e., we try to find a pair \((x_0,H(x_0)) \in L\) that can be extended to a collision \((\{x,x_0\},H(x_0))\). The precise description of \(\mathbf {BHT}\) is as follows.

Definition 3.1

(\(\mathbf {BHT}(H,k)\))

  1. Choose an arbitrary subset \(X' \subset X\) of cardinality k.

  2. Make a list \(L = \big \{ \big (x,H(x)\big ) \big \}_{x \in X'}\) by querying \(x \in X'\) to H.

  3. Sort L in accordance with H(x).

  4. Check whether L contains a 2-collision, i.e., whether there exist \((x,H(x)),(y,H(y)) \in L\) such that \(x \ne y \) and \(H(x) = H(y)\). If so, output the 2-collision \((\{x,y\},H(x))\). Otherwise, proceed to the next step.

  5. Construct the oracle \(F :X \rightarrow \{0,1\}\) by defining \(F(x)=1\) if and only if there exists \(x_0 \in X'\) such that \((x_0,H(x)) \in L\) and \(x \ne x_0\).

  6. Run \(\mathbf {BBHT}(F)\) to find \(\tilde{x} \in X\) such that \(F(\tilde{x}) = 1\).

  7. Find \(x_0 \in X'\) that satisfies \(H(\tilde{x}) = H(x_0)\) from the list L. Output the 2-collision \((\{\tilde{x},x_0\},H(x_0))\).
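The control flow of Definition 3.1 can be sketched classically. In the sketch below, a dictionary keyed by H(x) stands in for the sorted list L, and a linear scan over X stands in for \(\mathbf {BBHT}(F)\); the scan costs on the order of \(|X |/k\) evaluations for a random choice of \(X'\), instead of the \(O(\sqrt{N/k})\) quantum queries of the real algorithm, so the sketch shows the structure only, not the speedup. The function name `bht_skeleton` is hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

Collision = Tuple[frozenset, int]


def bht_skeleton(h: Callable[[int], int], domain_size: int,
                 subset: Iterable[int]) -> Optional[Collision]:
    # Steps 1-3: query H on X' and index L by hash value (a dict instead of sorting).
    table: dict[int, int] = {}
    for x in subset:
        y = h(x)
        if y in table and table[y] != x:
            # Step 4: L already contains a 2-collision.
            return frozenset({table[y], x}), y
        table[y] = x
    # Steps 5-6: F(x) = 1 iff some x0 in X' with x0 != x has H(x0) = H(x).
    # BHT finds such an x with BBHT(F); this classical stand-in scans X instead.
    for x in range(domain_size):
        y = h(x)
        x0 = table.get(y)
        if x0 is not None and x0 != x:
            # Step 7: combine x with its partner x0 from L.
            return frozenset({x, x0}), y
    return None


if __name__ == "__main__":
    h = lambda x: x % 1000  # toy 2-to-one function on {0, ..., 1999}
    print(bht_skeleton(h, 2000, range(10)))
```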

This algorithm makes k quantum queries in Step 2 and \(O(\sqrt{N/k})\) quantum queries in Step 6 (in fact, constructing the list L needs no quantum computation, so the queries in Step 2 can also be made classically if we have access to a classical oracle of H). Thus, the total number of quantum queries is \(O(k + \sqrt{N/k})\), which is minimized when \(k=N^{1/3}\). Brassard et al. gave the following theorem [BHT97].

Theorem 3.1

([BHT97, Theorem 1]). Suppose that X and Y are finite sets that satisfy \(|X | = 2 \cdot |Y |\), and \(H :X \rightarrow Y\) is a 2-to-one function. Let \(N = |Y |\) and k be an integer such that \(1 \le k \le N\). \(\mathbf {BHT}\) finds a 2-collision of H with an expected quantum query complexity \(O( k + \sqrt{N/k})\) and memory complexity O(k). In particular, when \(k = N^{1/3}\), \(\mathbf {BHT}\) finds a 2-collision of H with expected quantum query complexity \(O({N}^{1/3})\) and memory complexity \(O(N^{1/3})\).

Element \(l\)-distinctness Problem (\(l\)-collisions in D-Arb). Consider the element l-distinctness problem, in which we are given access to the oracle \(H :X' \rightarrow Y\) and must determine whether there exist distinct \(x_1,\dots ,x_l\) such that \(H(x_1) = \dots = H(x_l)\), i.e., whether there exists an l-collision of H. Note that H obviously has an l-collision if \(|X' | > (l-1)|Y |\), and the element l-distinctness problem considers collision detection on a database rather than on a hash function.

Ambainis [Amb07] proposed a quantum algorithm based on quantum walks that solves the element l-distinctness problem. His algorithm determines not only whether there exists an l-collision but also finds an actual l-collision \((\{x_1, \dots , x_l\},y)\), and it can be applied even to find collisions when \(|X' | \ge (l-1)|Y |\). His algorithm requires \(O\big (|X' |^{l/(l+1)}\big )\) quantum queries to H. This algorithm was later improved by Belovs [Bel12], who developed an algorithm that requires \(O\big (|X' |^{1-2^{l-2}/(2^l-1)}\big ) = o(|X' |^{3/4})\) quantum queries.

Although the algorithms by Ambainis and Belovs can be applied to find an l-collision for \(|X' | \ge (l-1)|Y |\), the complexity increases as the domain size \(|X' |\) increases. These algorithms are inefficient to find collisions of hash functions, since the domain size of cryptographic hash functions is exponentially larger than the codomain size, and we often regard the problem size as dependent on the codomain size \(|Y |\) not the domain size \(|X' |\). Hence we need another dedicated quantum algorithm to efficiently find collisions of hash functions. The black circles and rectangles in Fig. 2 correspond to the query complexity for naïve applications of Ambainis’ algorithm and Belovs’ algorithm for hash functions, respectively.

Collision Finding Problem on Random Functions (\(l\)-collisions in D-Rnd and H-Rnd). Among variants of the collision problem, the collision finding problem on random functions is the most significant problem in the context of cryptography. We introduce algorithms for \(l=2\) in the following.

A modification of \(\mathbf {BHT}\). Let us consider a modification of \(\mathbf {BHT}\), denoted \(\mathbf {BHT'}\), in which we choose a subset \(X'\) uniformly at random. This small modification yields two important improvements of \(\mathbf {BHT}\):

  • Brassard et al. [BHT97] mentioned that if \(|X | \ge 2 |Y |\), then \(\mathbf {BHT'}(H,N^{1/3})\) finds a collision with quantum query complexity \(O(N^{1/3})\) with constant probability.

  • Yuen [Yue14] showed that if \(|X | = |Y |\) and H is random, then \(\mathbf {BHT'}(H,N^{1/3})\) finds a collision with quantum query complexity \(O(N^{1/3})\) with constant probability.

Zhandry’s algorithm. Zhandry [Zha15] proposed a quantum algorithm that finds a collision with \(O(N^{1/3})\) quantum queries as long as \(|X | = \varOmega (N^{1/2})\) and H is random. This relaxes the restrictions of \(\mathbf {BHT}\) and \(\mathbf {BHT'}\), namely \(|X | \ge 2 |Y |\) [BHT97], or \(|X | = |Y |\) with H random [Yue14].

His algorithm is summarized as follows:

  1. Choose a random subset \(X' \subset X\) of size \(N^{1/2}\).

  2. Invoke Ambainis’ algorithm for \(H|_{X'} :X' \rightarrow Y\) and obtain a collision.

If H is random, \(H|_{X'}\) has a collision with constant probability by the birthday bound, and the query complexity is \(O\big (|X' |^{2/3}\big ) = O\big ( (N^{1/2})^{2/3} \big ) = O(N^{1/3})\).

3.2 New Observations

This section gives our new observations, which are summarized as:

  1. In the quantum setting, the trivial upper bound for finding an l-collision of a random function is \(O(N^{1/2})\).

  2. We can find a 3-collision of a random function with quantum query complexity \(O(N^{10/21})\).

Observation 1 is obtained by applying a generalized Grover algorithm, and Observation 2 is obtained by combining the idea of Zhandry [Zha15] and the result of Belovs [Bel12].

Trivial Upper Bound for Finding \(l\)-collisions in the Quantum Setting. In the classical setting, the trivial upper bound for finding an l-collision is O(N) because of the following algorithm:

  1. Choose an element \(x_1 \in X\) uniformly at random.

  2. Perform an exhaustive search to find \(x_i\) for \(i = 2,\dots ,l\) that satisfy \(H(x_i) = H(x_1)\).

  3. Output \((\{x_1,\dots ,x_l\},H(x_1))\) as an l-collision.
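The classical algorithm above can be sketched in a few lines. Apart from the l outputs it stores only a constant number of values, and it evaluates H at most \(|X |\) times; the quantum setting replaces exactly this scan. The function name is hypothetical, H is assumed to be a Python callable on \(\{0,\dots ,|X |-1\}\), and the sketch returns None when \(H(x_1)\) has fewer than l preimages.

```python
import random
from typing import Callable, Optional, Tuple


def trivial_l_collision(h: Callable[[int], int], domain_size: int, l: int,
                        rng: random.Random) -> Optional[Tuple[frozenset, int]]:
    x1 = rng.randrange(domain_size)  # Step 1: random x_1
    y = h(x1)
    found = [x1]
    for x in range(domain_size):     # Step 2: exhaustive search, at most |X| evaluations
        if len(found) == l:
            break
        if x != x1 and h(x) == y:
            found.append(x)
    if len(found) < l:
        return None                  # H(x_1) has fewer than l preimages
    return frozenset(found), y       # Step 3: output the l-collision


if __name__ == "__main__":
    h = lambda x: (7919 * x) % 1000  # toy 4-to-one function on {0, ..., 3999}
    print(trivial_l_collision(h, 4000, 4, random.Random(0)))
```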

In the quantum setting, we can replace the exhaustive search with \(\mathbf {BBHT}\). We call this algorithm \(\mathbf {Multi}\text{-}\mathbf {Grover}\); it is described as follows:

Definition 3.2

(\(\mathbf {Multi}\text{-}\mathbf {Grover}(H)\))

  1. Choose an element \(x_1 \in X\) uniformly at random and set \(L = \{x_1\}\).

  2. While \(|L | < l\), do:

     (a) Invoke \(\mathbf {BBHT}(F)\) to find \(x \in X\) such that \(H(x) = H(x_1)\), where we implement \(F :X \rightarrow \{0,1\}\) as \(F(x) = 1\) if and only if \(H(x) = H(x_1)\).

     (b) If \(x \not \in L\), then \(L \leftarrow L \cup \{x\}\).

  3. Output \((L,H(x_1))\) as an l-collision.

Roughly speaking, each step in the loop requires \(O(N^{1/2})\) queries to find \(x_i\). Thus, the total query complexity is \(O(N^{1/2})\) for a small constant l. Therefore, to achieve a meaningful improvement, we need to find an l-collision with fewer than \(O(N^{1/2})\) quantum queries.

We note that the lower bound for 2-collisions in [Zha15] also applies to multicollisions. Hence, for a random function, the quantum query complexity of any multicollision-finding algorithm lies between \(\varOmega (N^{1/3})\), given by the 2-collision lower bound, and \(O(N^{1/2})\), given by the trivial upper bound. This corresponds to the range between the birthday bound and the preimage bound in the classical setting.

Extension of Element \(l\)-distinctness to \(l\)-collision. We observe that algorithms for the l-distinctness problem can be used to find l-collisions of a random function \(H :X \rightarrow Y\) by extending Zhandry’s idea. Let X, Y be finite sets with \(|Y | = N\) and \(|X | \ge (l!)^{1/l} N^{(l-1)/l}\), and let \(H :X \rightarrow Y\) be a random function.

  1. Choose a random subset \(X' \subset X\) of size \((l!)^{1/l} N^{(l-1)/l}\).

  2. Invoke Belovs’ algorithm for \(H|_{X'} :X' \rightarrow Y\) and obtain an l-collision.

According to Suzuki et al. [STKT08], \(H|_{X'}\) has an l-collision with probability approximately 1/2. Thus, Belovs’ algorithm can find an l-collision of \(H|_{X'}\) with quantum query complexity \(O\left( (N^{1-2^{l-2}/(2^l-1)})^{(l-1)/l} \right) \) (see Footnote 4). This matches the tight bound \(\varTheta ( N^{1/3} )\) for \(l=2\) [Zha15] and gives a new upper bound \(O(N^{10/21})\) for \(l=3\), which is strictly lower than the trivial bound \(O(N^{1/2})\) (see Sect. 3.2). The white rectangles for \(l = 2,3\) in Fig. 1 correspond to this algorithm.

Note that for the case of H-Rnd, if \(l \ge 4\), then \((N^{1-2^{l-2}/(2^l-1)})^{(l-1)/l}\) is greater than \(N^{1/2}\) (for example, the exponent is 11/20 for \(l=4\)), so this approach does not improve on the trivial bound for finding l-collisions. Therefore, a different quantum algorithm is needed to find l-collisions for \(l \ge 4\) with fewer than \(N^{1/2}\) quantum queries.
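The exponent of the Belovs-based bound can be tabulated with exact rational arithmetic to see where it crosses 1/2. This is a small sketch; the function name `belovs_exponent` is ours, and it only evaluates the formula stated above, ignoring constant factors.

```python
from fractions import Fraction


def belovs_exponent(l):
    """Exponent e in O(N^e) for the Belovs-based approach:
    (1 - 2^{l-2} / (2^l - 1)) * (l - 1) / l."""
    return (1 - Fraction(2 ** (l - 2), 2 ** l - 1)) * Fraction(l - 1, l)


for l in range(2, 7):
    e = belovs_exponent(l)
    print(l, e, round(float(e), 4), "below 1/2" if e < Fraction(1, 2) else "not below 1/2")
```

For large l the first factor tends to 3/4 and the second to 1, so the exponent approaches 3/4 and never returns below 1/2.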

The algorithm given in the next section finds an l-collision with the same query complexity as existing work for \(l=2\), and with lower query complexity than the approaches above for \(l \ge 3\).

4 New Quantum Algorithm for Finding Multicollisions

Now we describe our algorithm for finding multicollisions. We begin with an intuitive argument showing how the \(\mathbf {BHT}\) algorithm can be extended to find multicollisions, and then give a formal description of our algorithm.

4.1 Intuitive Discussion from 2-Collisions to l-Collisions

First, we intuitively assume that \(\mathbf {BHT}(H,k)\) finds a collision of a function \(H :X \rightarrow Y\) with \(|X | = 2 N\) without any modification, because the expected number of preimages \(|H^{-1}(y) |\) of each \(y \in Y\) is 2 when H is chosen uniformly at random from \(\mathop {\mathrm {Map}}(X,Y)\). (See Sect. 3.1 and the original paper [BHT97] for a justification.) Recall that the principle of \(\mathbf {BHT}(H,k)\) is to make a list L of k 1-collisions and to extend the 1-collisions in L to 2-collisions with the \(\mathbf {BBHT}\) algorithm. Constructing the list L requires k quantum queries, and \(\mathbf {BBHT}\) makes \(O(\sqrt{N/k})\) quantum queries, so the total number of quantum queries is \(O(k + \sqrt{N/k})\). Up to constant factors, \(k + \sqrt{N/k}\) is minimized when the two terms balance, i.e., \(k = \sqrt{N/k}\), which gives \(k = N^{1/3}\) and \(O(k + \sqrt{N/k}) = O(N^{1/3})\).

Next, we consider finding a 3-collision of a function \(H :X \rightarrow Y\) under the condition \(|X | = 3 N\). We take a strategy similar to that of \(\mathbf {BHT}\): we make a list L of k 2-collisions and extend the 2-collisions in L to 3-collisions with the \(\mathbf {BBHT}\) algorithm. We can find a 2-collision of H with \(\mathbf {BHT}(H,N^{1/3})\), which makes \(O(N^{1/3})\) quantum queries. Constructing the list L requires \(k \cdot N^{1/3}\) queries, and \(\mathbf {BBHT}\) makes \(O(\sqrt{N/k})\) quantum queries, so the total number of quantum queries is \(O(k \cdot N^{1/3} + \sqrt{N/k})\). Balancing the two terms, \(k \cdot N^{1/3} = \sqrt{N/k}\), gives \(k = N^{1/9}\) and then \(O(k \cdot N^{1/3} + \sqrt{N/k}) = O(N^{4/9})\). Hence, our new algorithm improves on the bound \(O\left( N^{10/21} \right) \) for \(l=3\) observed in the previous section.
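For \(l=3\), the balancing step can be written out explicitly. The following worked equation only spells out the arithmetic behind \(k = N^{1/9}\), ignoring constant factors:

$$\begin{aligned} k \cdot N^{1/3} = \sqrt{N/k}&\iff k^{3/2} = N^{1/2 - 1/3} = N^{1/6} \iff k = N^{1/9},\\ k \cdot N^{1/3} + \sqrt{N/k}&= N^{1/9 + 1/3} + N^{(1 - 1/9)/2} = 2 \cdot N^{4/9}. \end{aligned}$$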

Similarly, we can find l-collisions of a function \(H :X \rightarrow Y\) under the condition \(|X | = l N\): we construct a list L of k \((l-1)\)-collisions and extend the \((l-1)\)-collisions in L to l-collisions using \(\mathbf {BBHT}\). By induction, constructing the list L requires \(k \cdot N^{(3^{l-2}-1)/(2 \cdot 3^{l-2})}\) queries, and \(\mathbf {BBHT}\) makes \(O(\sqrt{N/k})\) quantum queries, so the total number of quantum queries is \(O(k \cdot N^{(3^{l-2}-1)/(2 \cdot 3^{l-2})} + \sqrt{N/k})\). Balancing the two terms, \(k \cdot N^{(3^{l-2}-1)/(2 \cdot 3^{l-2})} = \sqrt{N/k}\), gives \(k = N^{1/3^{l-1}}\), and then

$$O\left( k \cdot N^{(3^{l-2}-1)/(2 \cdot 3^{l-2})} + \sqrt{N/k} \right) = O\left( N^{ ( 3^{l-1}-1) / ( 2 \cdot 3^{l-1}) } \right) $$

holds. Again, our new algorithm improves the trivial bound \(N^{1/2}\) for l-collisions, \(l \ge 4\).
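The inductive balancing argument can be checked with exact rational arithmetic. The sketch below (the function name `balanced_exponents` is ours) sets \(k = N^{a_l}\), solves \(a_l + e_{l-1} = (1 - a_l)/2\), and compares the resulting total exponent \(e_l\) with the closed form \((3^{l-1}-1)/(2 \cdot 3^{l-1})\); constant factors are ignored throughout.

```python
from fractions import Fraction


def balanced_exponents(l_max):
    """Return {l: (a_l, e_l)}: list size k = N^{a_l} and total cost N^{e_l},
    obtained by balancing k * N^{e_{l-1}} against sqrt(N / k)."""
    e_prev = Fraction(0)  # a 1-collision costs a single query: N^0
    out = {1: (None, e_prev)}
    for l in range(2, l_max + 1):
        # a + e_prev = (1 - a) / 2  =>  a = (1 - 2 * e_prev) / 3
        a = (1 - 2 * e_prev) / 3
        e_prev = a + e_prev
        out[l] = (a, e_prev)
    return out


for l, (a, e) in balanced_exponents(6).items():
    closed = Fraction(3 ** (l - 1) - 1, 2 * 3 ** (l - 1))
    print(l, a, e, e == closed)
```

Since \(e_l = 1/2 - 1/(2 \cdot 3^{l-1})\), the exponent stays strictly below 1/2 for every fixed l.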

If there exists an algorithm that finds l-collisions when \(|X | = l N\), then we can use it to find l-collisions when \(|X | > l N\) with the same number of queries and the same memory size, by choosing a subset \(X' \subset X\) of size lN and running the algorithm on \(H|_{X'}\).

4.2 Formal Description of Our Algorithm

Formalizing the above arguments, we obtain a quantum algorithm that finds l-collisions of any function \(H :X \rightarrow Y\) with \(|Y | = N\) and \(|X | \ge l N\). As briefly introduced in Sect. 4.1, our main idea is to construct a recursive algorithm \(\mathbf {MColl}\). The algorithm below focuses on the procedure. Complexity analysis of \(\mathbf {MColl}\) will be given in the next section. Although our algorithm is an extension of \(\mathbf {BHT}\), the definition of the function F is slightly modified to simplify the complexity analysis.

\(\mathbf {MColl}(H,l)\)

  1.

    If \(|X | > l N\), then choose a subset \(X' \subset X\) with \(|X' |=l N\) uniformly at random and run \(\mathbf {MColl}(H|_{X'},l)\). Otherwise, proceed to the next step.

  2.

    If \(l=1\), then choose x from X uniformly at random and output \((\{x\},H(x))\). Otherwise, proceed to the next step.

  3.

    Run \(\mathbf {MColl}(H,l-1)\) a total of \(N^{1/{3^{l-1}}}\) times to obtain \((l-1)\)-collisions \(c^{(i)} = (\{x^{(i)}_1,x^{(i)}_2,\dots ,x^{(i)}_{l-1} \}, y^{(i)})\). Store these \((l-1)\)-collisions in a list L.

  4.

    Sort L by \(y^{(i)}\).

  5.

    Check whether L contains duplicates, i.e., whether there exist indices \(i \ne j\) such that \(c^{(i)} = c^{(j)}\). If it does, then stop and restart from Step 3. Otherwise, proceed to the next step.

  6.

    Check whether L contains an l-collision, i.e., two entries with the same \(y^{(i)}\). If there is an l-collision, then output it. Otherwise, proceed to the next step.

  7.

    Define \(F :X \rightarrow \{0,1\}\) by \(F(x)=1\) if and only if \(H(x)=y^{(i)}\) holds for some \(i\) with \(1 \le i \le N^{1/3^{l-1}}\) (F can be implemented as a quantum circuit that calls H twice, as shown in Fig. 3).

  8.

    Run \(\mathbf {BBHT}(F)\). Let \(\tilde{x} \in X\) be the obtained answer, which satisfies \(F(\tilde{x})=1\).

  9.

    Find \(i_0\) that satisfies \(H(\tilde{x}) = y^{(i_0)}\) from the list L. If \(\tilde{x} \in \{ x^{(i_0)}_1, x^{(i_0)}_2, \dots , x^{(i_0)}_{l-1} \}\), then stop and restart from Step 3. Otherwise, output the l-collision \((\{x^{(i_0)}_1,x^{(i_0)}_2,\dots ,x^{(i_0)}_{l-1}, \tilde{x}\}, y^{(i_0)})\).
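The control flow of \(\mathbf {MColl}\) (restriction, recursion, restarts, and the two output paths) can be simulated classically with a hypothetical Grover black box. The sketch below is ours: `CountingOracle`, `grover_blackbox`, and `mcoll` are illustrative names, Step 4's sorting is replaced by a hash-based duplicate check, the list size \(N^{1/3^{l-1}}\) is rounded to an integer, and the charge of \(2\lceil \sqrt{|X|/t}\rceil \) H-queries per search only models the \(O(\sqrt{|X|/t})\) cost of \(\mathbf {BBHT}\). It is not meant to reproduce the exact query counts analyzed in Sect. 5.

```python
import math
import random
from collections import defaultdict


class CountingOracle:
    """Classical table for H that counts evaluations (a stand-in for queries)."""

    def __init__(self, table):
        self.table = table
        self.queries = 0

    def __call__(self, x):
        self.queries += 1
        return self.table[x]


def grover_blackbox(domain, marked, H, rng):
    """Hypothetical stand-in for BBHT(F): a uniformly random marked element.

    Charges 2 * ceil(sqrt(|X| / t)) queries to H (each F-query uses H twice,
    as in Fig. 3). The marked list is computed from the raw table, uncharged."""
    H.queries += 2 * math.ceil(math.sqrt(len(domain) / len(marked)))
    return rng.choice(marked)


def mcoll(H, l, domain, N, rng):
    """Classical simulation of MColl(H, l); returns (frozenset_of_inputs, image)."""
    # Step 1: restrict H to a random subset of size l*N.
    if len(domain) > l * N:
        domain = rng.sample(domain, l * N)
    # Step 2: any single point is a 1-collision.
    if l == 1:
        x = rng.choice(domain)
        return frozenset([x]), H(x)
    k = max(1, round(N ** (1 / 3 ** (l - 1))))  # list size about N^{1/3^{l-1}}
    while True:
        # Step 3: build the list L of (l-1)-collisions recursively.
        L = [mcoll(H, l - 1, domain, N, rng) for _ in range(k)]
        # Steps 4-5: restart if some (l-1)-collision appears twice.
        if len(set(L)) < len(L):
            continue
        # Step 6: two distinct entries with a common image give an l-collision.
        by_image = defaultdict(list)
        for xs, y in L:
            by_image[y].append(xs)
        for y, entries in by_image.items():
            if len(entries) >= 2:
                union = sorted(entries[0] | entries[1])
                return frozenset(union[:l]), y
        # Steps 7-8: search for x whose image equals some y^{(i)} in L.
        images = {y for _, y in L}
        marked = [x for x in domain if H.table[x] in images]
        x_tilde = grover_blackbox(domain, marked, H, rng)
        # Step 9: find the matching entry (one classical query to H).
        y0 = H(x_tilde)
        xs0 = next(xs for xs, y in L if y == y0)
        if x_tilde in xs0:
            continue  # restart from Step 3
        return xs0 | {x_tilde}, y0


rng = random.Random(2024)
N, l = 64, 3
H = CountingOracle([rng.randrange(N) for _ in range(l * N)])
xs, y = mcoll(H, l, list(range(l * N)), N, rng)
print(sorted(xs), y, H.queries)
```

Every entry of L consists of elements of the current domain, so the marked list is never empty and the simulated search is always well defined.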

Fig. 3.

Quantum circuit of F. H is the function for which we want to find collisions, and \(\mathop { BL }:Y \rightarrow \{0,1\}\) is the binary function defined by \(\mathop { BL }(y) = 1\) if and only if there exists \(c^{(i)} = \left( \{x^{(i)}_1,\dots ,x^{(i)}_{l-1}\},y^{(i)} \right) \in L\) such that \(y=y^{(i)}\). Here \(\mathop { BL }\) corresponds to a quantum circuit \({\,|y\rangle }{\,|b\rangle } \mapsto {\,|y\rangle }{\,|b \oplus \mathop { BL }(y)\rangle }\).
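The compute/uncompute structure of the circuit for F, and why it costs exactly two queries to H, can be illustrated on computational basis states. This is a classical sketch only (it does not model superpositions), and the names `make_F`, `U_H`, and `U_BL` are hypothetical; images are encoded as integers so that \(\oplus \) is bitwise XOR.

```python
def make_F(H, images):
    """Basis-state sketch of the circuit for F (classical simulation only).

    H: list mapping inputs to integer images; images: the set of y^{(i)} in L.
    Returns U_F(x, b) -> (x, b xor F(x)) and a dict counting calls to H."""
    count = {"H": 0}

    def U_H(x, y):
        # |x>|y>  ->  |x>|y xor H(x)>
        count["H"] += 1
        return x, y ^ H[x]

    def U_BL(y, b):
        # |y>|b>  ->  |y>|b xor BL(y)>, where BL(y) = 1 iff y is some y^{(i)}
        return y, b ^ (1 if y in images else 0)

    def U_F(x, b):
        x, anc = U_H(x, 0)      # compute H(x) into a zeroed ancilla
        anc, b = U_BL(anc, b)   # flip b iff H(x) is an image stored in L
        x, anc = U_H(x, anc)    # uncompute: the ancilla returns to 0
        assert anc == 0
        return x, b

    return U_F, count


H = [3, 1, 4, 1, 5, 9, 2, 6]
F, count = make_F(H, images={1, 9})
bits = [F(x, 0)[1] for x in range(len(H))]
print(bits, count["H"])
```

Uncomputing the ancilla is what makes F a clean oracle for Grover search; it is also why each F-query is counted as two H-queries in the analysis.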

5 Complexity Analysis of \(\mathbf {MColl}\)

In this section we analyze the complexity of \(\mathbf {MColl}\). First, we discuss complexity intuitively in Sect. 5.1 and then give formal arguments and proofs in Sect. 5.2.

5.1 Intuitive Analysis

We intuitively discuss the complexity of our algorithm. In the following, we show that \(\mathbf {MColl}(H,l)\) finds an l-collision with memory complexity approximately \(N^{1/3}\) and expected quantum query complexity at most approximately \(l! \cdot N^{ (3^{l-1}-1) / (2 \cdot 3^{l-1}) } \).

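First, we consider memory complexity. The claim obviously holds for \(l = 1\). In the case \(l \ge 2\), the algorithm uses memory only for storing the list L at each level of the recursion. The list at level j (\(2 \le j \le l\)) has \(N^{1/3^{j-1}}\) entries, which is at most \(N^{1/3}\), and since the recursive calls in Step 3 are made one after another, at most one list per level is stored at any time. Thus, the memory complexity is approximately \(N^{1/3}\), dominated by the level-2 list.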

Next, we consider quantum query complexity. We upper bound the expected number of quantum queries by approximately

$$\begin{aligned} Q_l := (N/P_l) \cdot l! \cdot N^{ (3^{l-1}-1) / (2 \cdot 3^{l-1}) }, \end{aligned}$$

where \(P_l\) is the number of points in Y that have at least l preimages under a fixed H. Regarding \(N/P_l\) as a constant, we obtain the desired bound.

The claim obviously holds for \(l=1\). Assume that the claim holds for \(l-1\). Since Step 3 calls \(\mathbf {MColl}(H,l-1)\) a total of \(N^{1/{3^{l-1}}}\) times, the number of queries made in one execution of Step 3 is approximately

$$\begin{aligned} N^{1/{3^{l-1}}} \cdot Q_{l-1}&= N^{1/{3^{l-1}}} \cdot \left( (N / P_{l-1}) \cdot (l-1)! \cdot N^{(3^{l-2} - 1) / (2 \cdot 3^{l-2})} \right) \nonumber \\&= (N / P_{l-1}) \cdot (l-1)! \cdot N^{(3^{l-1} - 1) / (2 \cdot 3^{l-1})}. \end{aligned}$$
(1)

Note that \(\mathbf {BBHT}(F)\) finds an element x that satisfies \(F(x)=1\) with approximately \(\sqrt{1/p}\) queries to F, where \(p = \Pr _{x \leftarrow X}[F(x)=1]\). Since L contains \(N^{1/{3^{l-1}}}\) elements, we roughly estimate \(p \approx N^{1/{3^{l-1}}} / |X | \approx N^{(1 - 3^{l-1} ) /3^{l-1} }\), so the number of queries to F is approximately \(\sqrt{1/p} \approx N^{(3^{l-1} - 1) / ( 2 \cdot 3^{l-1})}\). By the construction of F, each query to F makes two queries to H (see Fig. 3), so the number of queries to H in Step 8 is

$$\begin{aligned} 2 \cdot N^{(3^{l-1} - 1) / ( 2 \cdot 3^{l-1})}. \end{aligned}$$
(2)

Summing the query counts of Steps 3 and 8 in Eqs. (1) and (2), we obtain the number of queries to H when \(\mathbf {MColl}(H,l)\) does not stop at Step 5 or 9:

$$\begin{aligned} \left( (N / P_{l-1}) \cdot (l-1)! + 2 \right) \cdot N^{ (3^{l-1}-1) / (2 \cdot 3^{l-1})} \approx (N / P_{l-1}) \cdot (l-1)! \cdot N^{ (3^{l-1}-1) / (2 \cdot 3^{l-1})}. \end{aligned}$$
(3)

Now let q denote the probability that \(\mathbf {MColl}(H,l)\) produces an output without stopping at Step 5 or 9. Then the overall quantum query complexity is approximately \((1/q) \cdot \left( (N / P_{l-1}) \cdot (l-1)! \cdot N^{ (3^{l-1}-1) / (2 \cdot 3^{l-1}) } \right) \). We assume that q equals the probability that an l-collision is output in Step 9, since the probability that Step 5 finds a duplicate in L is very small when l is a small constant, and ignoring Step 6 only decreases q and increases the overall complexity. Intuitively, we can assume that q equals the product of two probabilities in Step 9:

  1.

    The probability that the \((l-1)\)-collision \(\{ x^{(i_0)}_1, x^{(i_0)}_2, \dots , x^{(i_0)}_{l-1} \}\) can be extended to an l-collision.

  2.

    The probability that \(\tilde{x} \not \in \{ x^{(i_0)}_1, x^{(i_0)}_2, \dots , x^{(i_0)}_{l-1} \}\) holds, conditioned on \(\{ x^{(i_0)}_1, x^{(i_0)}_2, \dots , x^{(i_0)}_{l-1} \}\) being extendable to an l-collision.

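The probability of 1. is approximately \(P_l / P_{l-1}\), and the probability of 2. is lower bounded by 1/l, because \(\tilde{x}\) is a uniformly random preimage of \(y^{(i_0)}\), which has at least l preimages, only \(l-1\) of which are already in the collision. Thus, we have \(q \ge (P_l / P_{l-1}) \cdot (1/l)\). Consequently, the overall complexity is approximately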

$$\begin{aligned} (1/q) \cdot \left( (N / P_{l-1}) \cdot (l-1)! \cdot N^{ (3^{l-1}-1) /( 2 \cdot 3^{l-1} )} \right) , \end{aligned}$$

which is at most

$$\begin{aligned} Q_l = (N/P_{l}) \cdot l! \cdot N^{ (3^{l-1}-1) / (2 \cdot 3^{l-1}) }. \end{aligned}$$

This validates the claim.

5.2 Precise Analysis

The discussion in the previous section is very informal with many approximations. This section gives the precise bound and proof. The main theorem in this section is as follows:

Theorem 5.1

Let X and Y be finite sets with \(|Y | = N\) and \(|X | \ge l \cdot |Y |\). Let \(H :X \rightarrow Y\) be an arbitrary function. For \(l \ge 1\), \(\mathbf {MColl}(H,l)\) finds an l-collision with expected quantum query complexity at most

$$\begin{aligned} \left( 1 + 18\sqrt{2}e \right) \cdot \left( \frac{2lN^{1/3}}{2lN^{1/3} - 1} \right) ^{l-1} \cdot l \cdot l! \cdot N^{ (3^{l-1}-1) / (2 \cdot 3^{l-1}) } \end{aligned}$$

and memory complexity \(N^{1/3}\).
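As a worked instantiation of Theorem 5.1 (our own arithmetic), the bound for \(l=2\) and \(l=3\) reads as follows. It uses \(N^{1/3} \ge 1\) and the fact that \(r/(r-1)\) is decreasing for \(r > 1\), so the middle factor is at most \(4/3\) and \((6/5)^2\), respectively:

$$\begin{aligned} l=2:&\quad \left( 1 + 18\sqrt{2}e \right) \cdot \frac{4N^{1/3}}{4N^{1/3} - 1} \cdot 4 \cdot N^{1/3} \le \left( 1 + 18\sqrt{2}e \right) \cdot \frac{16}{3} \cdot N^{1/3} = O(N^{1/3}),\\ l=3:&\quad \left( 1 + 18\sqrt{2}e \right) \cdot \left( \frac{6N^{1/3}}{6N^{1/3} - 1} \right) ^{2} \cdot 18 \cdot N^{4/9} \le \left( 1 + 18\sqrt{2}e \right) \cdot \frac{648}{25} \cdot N^{4/9} = O(N^{4/9}). \end{aligned}$$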

Remark 5.1

Expected time complexity of \(\mathbf {MColl}(H,l)\) is upper bounded by the product of expected quantum query complexity and \(O(T_H + \lg N)\), where \(T_H\) is the time needed to make a quantum query to H once, using \(O(N^{1/3})\) qubits. See Sect. 6 for details.

Proof

It suffices to show that the claim holds in the case \(|X | = l \cdot |Y |\). The proof for memory complexity is the same as that we described in the previous section. In the following, we consider quantum query complexity.

For \(l \ge 1\), define \(A_l\) as

$$\begin{aligned} A_l := \text {the total number of quantum queries to }H \text { that }\mathbf {MColl}(H,l) \text { makes}, \end{aligned}$$

and for \(l \ge 2\), define \(B_l, C_l\) as

$$\begin{aligned} B_l&:= \text {the number of quantum queries to }H \text { made in Step 3}, \\ C_l&:= \text {the number of quantum queries to }H \text { made in Step 8}. \end{aligned}$$

For \(l \ge 2\), we consider a modification of \(\mathbf {MColl}(H,l)\), denoted by \(\mathbf {MColl'}(H,l)\), which halts without output instead of restarting from Step 3 whenever it stops at Step 5 or 9. Let \(D_l\) be the total number of quantum queries to H that \(\mathbf {MColl'}(H,l)\) makes. Let \(\mathsf {success}\) denote the event that \(\mathbf {MColl'}(H,l)\) outputs an l-collision. Then we have

$$ \mathop {\mathrm {E}}[D_l] = \mathop {\mathrm {E}}[B_l] + \mathop {\mathrm {E}}[C_l] $$

and

$$\begin{aligned} \mathop {\mathrm {E}}[A_l] = \frac{\mathop {\mathrm {E}}[D_l]}{\Pr [\mathsf {success}]} = \frac{\mathop {\mathrm {E}}[B_l] + \mathop {\mathrm {E}}[C_l]}{\Pr [\mathsf {success}]} \end{aligned}$$
(4)

for \(l \ge 2\). In addition, since \(\mathop {\mathrm {E}}[B_l] = N^{1/3^{l-1}} \cdot \mathop {\mathrm {E}}[A_{l-1}]\) for \(l \ge 2\), we have

$$\begin{aligned} \mathop {\mathrm {E}}[A_l] = \frac{N^{1/3^{l-1}} \cdot \mathop {\mathrm {E}}[A_{l-1}] + \mathop {\mathrm {E}}[C_l] }{ \Pr [\mathsf {success}]}. \end{aligned}$$
(5)

We will show two lemmas on bounds for \(\mathop {\mathrm {E}}[C_l]\) and \(\Pr [\mathsf {success}]\) in Sects. 5.3 and 5.4, respectively:

Lemma 5.1

For \(l \ge 2\), \(\mathop {\mathrm {E}}[C_l] \le 18 \cdot \sqrt{\frac{l}{l-1}} \cdot N^{\frac{3^{l-1}-1}{2 \cdot 3^{l-1}}}\) holds.

Lemma 5.2

For \(l \ge 2\), \(\Pr [\mathsf {success}] \ge \frac{l-1}{l} \cdot \frac{1}{l} \cdot \left( 1 - \frac{1}{2l} \cdot N^{\frac{2}{3^{l-1}}-1} \right) \) holds.

Substituting them into Eq. (5), we obtain

$$\begin{aligned} \mathop {\mathrm {E}}[A_l] \le \left( N^{1/3^{l-1}} \mathop {\mathrm {E}}[A_{l-1}] + 18 \sqrt{\frac{l}{l-1}} N^{ \frac{3^{l-1} -1}{2 \cdot 3^{l-1}} } \right) \cdot \frac{l^2 f_l}{l-1}, \end{aligned}$$
(6)

where

$$\begin{aligned} f_l = \frac{1}{1 - \frac{1}{2l} \cdot N^{\frac{2}{3^{l-1}}-1}}. \end{aligned}$$

Let \(\{ g_l \}_{l \ge 1}\) be a sequence of numbers defined by \(g_1 = 1\) and

$$\begin{aligned} g_l = \left( g_{l-1} + \frac{18\sqrt{l/(l-1)}}{ (l-1) \cdot (l-1)!} \right) f_l \end{aligned}$$

for \(l \ge 2\).

We show the following claims:

Claim

For \(l \ge 1\), \(\mathop {\mathrm {E}}[A_l] \le g_l \cdot l \cdot l! \cdot N^{ \frac{3^{l-1} -1}{2 \cdot 3^{l-1}}}\) holds.

Claim

For \(l \ge 1\), \(g_l \le \left( 1 + 18\sqrt{2}e \right) \cdot \left( \frac{2lN^{1/3}}{2lN^{1/3} - 1} \right) ^{l-1}\) holds.

Combining them, we obtain for \(l \ge 1\),

$$ \mathop {\mathrm {E}}[A_l] \le \left( 1 + 18\sqrt{2}e \right) \cdot \left( \frac{2lN^{1/3}}{2lN^{1/3} - 1} \right) ^{l-1} \cdot l \cdot l! \cdot N^{ \frac{3^{l-1} -1}{2 \cdot 3^{l-1}}} $$

as we wanted.    \(\square \)

Proof

(Proof of Claim). We prove this claim by induction on l. Since \(\mathop {\mathrm {E}}[A_1] = 1\), the claim holds for \(l=1\). Now assume that the claim holds for \(l-1\). By inequality (6) and the induction hypothesis, we have

$$\begin{aligned} \mathop {\mathrm {E}}[A_l]&\le \left( N^{1/3^{l-1}} \mathop {\mathrm {E}}[A_{l-1}] + 18 \sqrt{\frac{l}{l-1}} N^{ \frac{3^{l-1} -1}{2 \cdot 3^{l-1}} } \right) \cdot \frac{l^2 f_l}{l-1}\\&\le \left( N^{1/3^{l-1}} \left( g_{l-1} \cdot (l-1) \cdot (l-1)! \cdot N^{ \frac{3^{l-2} -1}{2 \cdot 3^{l-2}}} \right) + 18 \sqrt{\frac{l}{l-1}} N^{ \frac{3^{l-1} -1}{2 \cdot 3^{l-1}} } \right) \cdot \frac{l^2 f_l}{l-1} \\&= \left( g_{l-1} + \frac{18\sqrt{l/(l-1)}}{(l-1) \cdot (l-1)!} \right) \cdot f_l \cdot l \cdot l! \cdot N^{ \frac{3^{l-1} -1}{2 \cdot 3^{l-1}} }\\&= g_l \cdot l \cdot l! \cdot N^{ \frac{3^{l-1} -1}{2 \cdot 3^{l-1}}} \end{aligned}$$

Hence, the claim holds for l, and by induction it holds for all \(l \ge 1\).    \(\square \)

Proof

(Proof of Claim). Finally, we upper bound \(g_l\). Letting \(h_l = \frac{18\sqrt{l/(l-1)}}{ (l-1) \cdot (l-1)!}\), we have \(g_l = (g_{l-1} + h_l) f_l\). Since \(f_l \ge 1\) holds for \(l \ge 2\), we have

$$\begin{aligned} g_l = \left( g_{l-1} + h_l\right) f_l = \left( \left( g_{l-2} + h_{l-1}\right) f_{l-1} + h_l\right) f_l \le \left( g_{l-2} + h_{l-1} + h_l\right) f_{l-1}f_l. \end{aligned}$$

Repeating this argument, we obtain \( g_l \le \left( 1 + \sum ^l_{i=2} h_i \right) \cdot \prod ^l_{i=2}f_i. \) Thus, we have

$$\begin{aligned} g_l&\le \left( 1 + \sum ^l_{i=2} h_i \right) \cdot \prod ^l_{i=2}f_i \\&= \left( 1 + \sum ^l_{i=2} \frac{18\sqrt{i/(i-1)}}{(i-1) \cdot (i-1)!} \right) \prod ^l_{i=2} \frac{1}{1 - \frac{1}{2l} \cdot N^{\frac{2}{3^{i-1}}-1}}\\&\le \left( 1 + \sum ^l_{i=2} \frac{18 \sqrt{2}}{(i-1)!} \right) \prod ^l_{i=2} \frac{1}{1 - \frac{1}{2l} \cdot N^{-1/3}}\\&\le \left( 1 + 18\sqrt{2} \left( \sum ^l_{i=2} \frac{1}{(i-1)!} \right) \right) \prod ^l_{i=2} \frac{2lN^{1/3}}{2lN^{1/3} - 1}\\&\le \left( 1 {+} 18\sqrt{2} \left( \sum ^\infty _{i=0} \frac{1}{i!} \right) \right) \left( \frac{2lN^{1/3}}{2lN^{1/3} - 1} \right) ^{l-1} {=} \left( 1 + 18\sqrt{2}e \right) \left( \frac{2lN^{1/3}}{2lN^{1/3} - 1} \right) ^{l-1}, \end{aligned}$$

as we wanted.    \(\square \)
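The recursion for \(g_l\) and the bound in the second claim are easy to evaluate numerically. The sketch below (the names `f`, `h`, `g`, and `claim_bound` are ours) computes \(g_l\) directly from its definition, using the per-level factor \(f_i = 1/(1 - N^{2/3^{i-1}-1}/(2i))\), and compares it with the claimed bound for a few values of N and l. It is a sanity check for those parameters, not a proof.

```python
import math


def f(i, N):
    """f_i = 1 / (1 - N^{2/3^{i-1} - 1} / (2 i))."""
    return 1 / (1 - N ** (2 / 3 ** (i - 1) - 1) / (2 * i))


def h(i):
    """h_i = 18 sqrt(i / (i - 1)) / ((i - 1) (i - 1)!)."""
    return 18 * math.sqrt(i / (i - 1)) / ((i - 1) * math.factorial(i - 1))


def g(l, N):
    """g_1 = 1 and g_i = (g_{i-1} + h_i) f_i for i >= 2."""
    val = 1.0
    for i in range(2, l + 1):
        val = (val + h(i)) * f(i, N)
    return val


def claim_bound(l, N):
    """(1 + 18 sqrt(2) e) * (2 l N^{1/3} / (2 l N^{1/3} - 1))^{l - 1}."""
    r = 2 * l * N ** (1 / 3)
    return (1 + 18 * math.sqrt(2) * math.e) * (r / (r - 1)) ** (l - 1)


for N in (2 ** 3, 2 ** 20, 2 ** 64):
    for l in range(1, 9):
        print(N, l, round(g(l, N), 3), round(claim_bound(l, N), 3))
```

Most of the slack comes from the first factor: the partial sums of \(h_i\) stay far below \(18\sqrt{2}e\).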

5.3 Proof of Lemma 5.1

Note that

$$\begin{aligned} \mathop {\mathrm {E}}[C_l] \le \mathop {\mathrm {E}}[C_l \mid \text {Step 8 is operated}] \end{aligned}$$

holds, so it suffices to upper bound the conditional expectation \(\mathop {\mathrm {E}}[C_l \mid \text {Step 8 is operated}]\). When the algorithm reaches Step 8, it has already passed Steps 5 and 6. Thus, L contains neither duplicates nor an l-collision. In particular, L is a list of pairwise distinct \((l-1)\)-collisions of H with pairwise distinct images, i.e., \(y^{(i_1)} = y^{(i_2)}\) holds if and only if \(i_1 = i_2\). Thus, we have

$$\begin{aligned} \big |F^{-1}(1) \big | = \left|\bigcup _{i = 1}^{N^{1/3^{l-1}}} H^{-1}(y^{(i)}) \right|= \sum _{i=1}^{N^{1/3^{l-1}}} \big |H^{-1}(y^{(i)}) \big | \ge (l-1) \cdot N^{1 / 3^{l-1}} \end{aligned}$$

and

$$\begin{aligned} \frac{\big |F^{-1}(1) \big |}{|X |} \ge \frac{(l-1) \cdot N^{1 / 3^{l-1}}}{l \cdot N}. \end{aligned}$$

Since each query to F makes two quantum queries to H (see Fig. 3), we have

$$\begin{aligned} \mathop {\mathrm {E}}[C_l \mid \text {Step 8 is operated}] \le 2 \cdot 9 \cdot \sqrt{\frac{l \cdot N}{(l-1) \cdot N^{1 / 3^{l-1}}}} = 18 \cdot \sqrt{\frac{l}{l-1}} \cdot N^{\frac{3^{l-1}-1}{2 \cdot 3^{l-1}}} \end{aligned}$$

by Corollary 2.1 as we wanted.

5.4 Proof of Lemma 5.2

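Next, we lower bound \(\Pr [\mathsf {success}]\). Since \(\mathbf {MColl'}\) halts without output whenever L contains duplicates, note that

$$\begin{aligned} \Pr [\mathsf {success}] = \Pr \left[ \mathsf {success}\mid c^{(i)} \ne c^{(j)} \text { for } i \ne j \right] \cdot \Pr \left[ c^{(i)} \ne c^{(j)} \text { for } i \ne j \right] \end{aligned}$$

holds.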

We need two lemmas. For the proof of Lemma 5.3, we refer readers to Shoup’s textbook [Sho08]. The proof of Lemma 5.4 is given in Appendix A.

Lemma 5.3

([Sho08, Theorem 8.26]). Let [d] be the set of integers \(\{1,2, \dots , d\}\), and let \([d]^{\times n}\) be the n-fold Cartesian power of [d] for positive integers d, n. If \(s = (s_1,s_2,\dots ,s_n)\) is chosen uniformly at random from \([d]^{\times n}\), then the probability that \(s_i \ne s_j\) holds for all \(i \ne j\) is lower bounded by \(1 - {n^2}/({2d})\).
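The birthday-type bound of Lemma 5.3 can be compared with the exact probability for small parameters. This sketch (the function name `prob_all_distinct` is ours) computes the exact probability with rational arithmetic as the product of the chances that each new draw avoids all earlier values.

```python
from fractions import Fraction


def prob_all_distinct(d, n):
    """Exact probability that n independent uniform draws from [d] are pairwise distinct."""
    p = Fraction(1)
    for i in range(n):
        p *= Fraction(d - i, d)  # draw i+1 must avoid the i earlier values
    return p


for d in (10, 365, 1000):
    for n in (1, 5, 20, 40):
        exact = prob_all_distinct(d, n)
        bound = 1 - Fraction(n * n, 2 * d)
        print(d, n, round(float(exact), 4), round(float(bound), 4), exact >= bound)
```

When \(n > d\) the product contains a zero factor, and the bound is negative, so the inequality still holds trivially.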

Lemma 5.4

Let X and Y be finite sets with \(|Y | = N\) and \(|X | = l N\). Let H be a function from X to Y. Then the numbers of l-collisions and \((l-1)\)-collisions of H are at least N and lN, respectively.
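Lemma 5.4 can be spot-checked on random tables: an m-collision is an m-element set of inputs with a common image, so their number is \(\sum _y \binom{|H^{-1}(y)|}{m}\). The sketch below (the name `count_collisions` is ours) evaluates this sum; convexity of \(\binom{c}{m}\) in c is what makes the all-fibers-of-size-l function the extremal case.

```python
import math
import random
from collections import Counter


def count_collisions(H, m):
    """Number of m-collisions of the table H: m-element input sets with one common image."""
    return sum(math.comb(c, m) for c in Counter(H).values())


rng = random.Random(7)
for N in (1, 2, 5, 20):
    for l in (2, 3, 4):
        H = [rng.randrange(N) for _ in range(l * N)]  # |X| = l N, |Y| = N
        print(N, l, count_collisions(H, l) >= N, count_collisions(H, l - 1) >= l * N)
```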

First, we lower bound \(\Pr \left[ c^{(i)} \ne c^{(j)} \text { for }i \ne j \right] \). From the construction of \(\mathbf {MColl}\), we can assume that \(\mathbf {MColl}(H,l-1)\) outputs an \((l-1)\)-collision of H uniformly at random. Thus, we can assume that the elements \(c^{(i)} \in L\) are chosen independently and uniformly at random from the set of \((l-1)\)-collisions of H. By Lemma 5.4, the number of \((l-1)\)-collisions of H is at least \(l \cdot N\). Moreover, for fixed n, \(1 - n^2/(2d)\) is monotonically increasing in d. Therefore, by Lemma 5.3 with \(n = N^{1/3^{l-1}}\), we have

$$\begin{aligned} \Pr \left[ c^{(i)} \ne c^{(j)} \text { for }i \ne j \right] \ge 1 - \frac{(N^{1/3^{l-1}})^2}{2lN} = 1 - \frac{1}{2l} \cdot N^{\frac{2}{3^{l-1}}-1}. \end{aligned}$$
(7)

Second, we lower bound \(\Pr \left[ \mathsf {success}\mid c^{(i)} \ne c^{(j)} \text { for } i \ne j \right] \). Under this condition, the event \(\mathsf {success}\) occurs if and only if

$$\begin{aligned} y^{(i)} = y^{(j)}\text { for some }i \ne j, \end{aligned}$$
(8)

or

$$\begin{aligned} c^{(i_0)} \text { can be extended to an } l\text {-collision, and } \tilde{x} \not \in \left\{ x^{(i_0)}_1, x^{(i_0)}_2, \dots , x^{(i_0)}_{l-1}\right\} . \end{aligned}$$
(9)

occurs. Recall that \(\tilde{x}\) is the output of Step 8 and \(i_0\) is an index satisfying \(H(\tilde{x})=y^{(i_0)}\). The event (8) corresponds to the event in which \(\mathbf {MColl}\) finds an l-collision in Step 6, and the event (9) corresponds to the event in which \(\mathbf {MColl}\) finds an l-collision in Step 9.

Now, let \({\mathcal {L}}\) be the set of all possible lists L that satisfy \(c^{(i)} \ne c^{(j)} \text { for } i \ne j\). Let \({\mathcal {L}}_1 \subset {\mathcal {L}}\) denote the set of lists that contain an l-collision, i.e., there are two indices \(i \ne j\) such that \(y^{(i)} = y^{(j)}\), and let \({\mathcal {L}}_{2} \subset {\mathcal {L}}\) denote the set of lists that contain no l-collision, i.e., \(y^{(i)} \ne y^{(j)}\) holds for \(i \ne j\). Then we have \({\mathcal {L}} = {\mathcal {L}}_{1} \coprod {\mathcal {L}}_{2}\), and \(\mathbf {MColl}\) finds an l-collision in Step 6 if and only if \(L \in {\mathcal {L}}_{1}\). For simplicity, in the following we ignore Step 4 and regard L as unsorted.

For a fixed \(L \in {\mathcal {L}}\), let \(A^L\) and \(B^L\) denote the sets of elements in L that can and cannot be extended to l-collisions, respectively. We have that \(L = A^L \coprod B^L\), \(\big |A^L \big |\) equals the number of \(y^{(i)}\) such that \(\big |H^{-1}(y^{(i)}) \big | \ge l\), and \(\big |B^L \big |\) equals the number of \(y^{(i)}\) such that \(\big |H^{-1}(y^{(i)}) \big | = l-1\). Define \(\langle A^L \rangle \), \(\langle B^L \rangle \) by

$$\begin{aligned} \langle A^L \rangle := \left|\bigcup _{c^{(i)} = (..., y^{(i)}) \in A^L } H^{-1}(y^{(i)}) \right|\text { and } \langle B^L \rangle := \left|\bigcup _{c^{(i)} = (..., y^{(i)}) \in B^L } H^{-1}(y^{(i)}) \right|, \end{aligned}$$

which are the numbers of preimages of \(y^{(i)}\)’s in \(A^L\) and \(B^L\), respectively. Note that

$$\begin{aligned} \Pr [\mathsf {success}\mid c^{(i)} \ne c^{(j)} \text { for } i \ne j] = \sum _{L \in {\mathcal {L}}} \Pr [\mathsf {success}\mid L] \Pr [L]. \end{aligned}$$

holds.

If \(L \in {\mathcal {L}}_{2}\), then \(\mathsf {success}\) occurs if and only if the event (9) occurs, that is, \(\tilde{x}\) can be used to construct an l-collision with an \((l-1)\)-collision in L. Note that \(\tilde{x}\) is chosen uniformly at random from the set

$$\begin{aligned} \left( \bigcup _{c^{(i)} = (...,y^{(i)}) \in A^L } H^{-1}(y^{(i)})\right) \bigcup \left( \bigcup _{c^{(i)} = (...,y^{(i)}) \in B^L } H^{-1}(y^{(i)}) \right) , \end{aligned}$$

and the event (9) occurs if and only if

$$\begin{aligned} {\tilde{x} \in \bigcup _{c^{(i)} = (...,y^{(i)}) \in A^L } H^{-1}(y^{(i)})} \wedge {\tilde{x} \ne x^{(i)}_j \text { for all } i \text { and } j} \end{aligned}$$

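holds. Now we have

$$\begin{aligned} \Pr \left[ \tilde{x} \in \bigcup _{c^{(i)} = (...,y^{(i)}) \in A^L } H^{-1}(y^{(i)}) \,\Big |\, L \right] = \frac{ \big \langle A^L \big \rangle }{ \big \langle A^L \big \rangle + \big \langle B^L \big \rangle } \end{aligned}$$

and, for \(A^L \ne \emptyset \),

$$\begin{aligned} \Pr \left[ \tilde{x} \ne x^{(i)}_j \text { for all } i \text { and } j \,\Big |\, L,\ \tilde{x} \in \bigcup _{c^{(i)} = (...,y^{(i)}) \in A^L } H^{-1}(y^{(i)}) \right] = \frac{ \big \langle A^L \big \rangle - (l-1)\big |A^L \big | }{ \big \langle A^L \big \rangle }, \end{aligned}$$

which suggests that

$$\begin{aligned} \Pr [\mathsf {success}\mid L] = \frac{ \big \langle A^L \big \rangle - (l-1)\big |A^L \big | }{ \big \langle A^L \big \rangle + \big \langle B^L \big \rangle }. \end{aligned}$$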

In addition, we have \(\big \langle A^L \big \rangle \ge l \cdot |A^L |\) since \(y^{(i)} \ne y^{(j)}\) holds for \(i \ne j\) if \(L \in {\mathcal {L}}_{2}\). Thus, we have \(\big \langle A^L \big \rangle \ge l \cdot \big |A^L \big | \ge (l-1)\cdot \big |A^L \big |\) and \(\big \langle B^L \big \rangle = (l-1)\cdot \big |B^L \big |\). This yields that

$$\begin{aligned} \frac{ \big \langle A^L \big \rangle }{ \big \langle A^L \big \rangle + \big \langle B^L \big \rangle } = \frac{ 1 }{ 1 + \frac{ \big \langle B^L \big \rangle }{ \big \langle A^L \big \rangle } } \ge \frac{ 1 }{ 1 + \frac{ (l-1)\big |B^L \big | }{ (l-1)\big |A^L \big | } } = \frac{\big |A^L \big |}{\big |A^L \big | + \big |B^L \big |}. \end{aligned}$$

Thus, we have

$$\begin{aligned} \Pr [\mathsf {success}\mid L] \ge \frac{ \big |A^L \big | }{ \big |A^L \big | + \big |B^L \big | } \cdot \frac{1}{l} \end{aligned}$$
(10)

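for \(L \in {\mathcal {L}}_{2}\). Moreover, since \(\Pr [\mathsf {success}\mid L]=1\) for \(L \in {\mathcal {L}}_{1}\), inequality (10) also holds for \(L \in {\mathcal {L}}_{1}\). Therefore, we have

$$\begin{aligned} \Pr [\mathsf {success}\mid c^{(i)} \ne c^{(j)} \text { for } i \ne j] \ge \frac{1}{l} \sum _{L \in {\mathcal {L}}} \frac{ \big |A^L \big | }{ \big |A^L \big | + \big |B^L \big | } \cdot \Pr [L]. \end{aligned}$$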

Now we use the following lemmas, whose proofs are given in Appendices B and C, respectively.

Lemma 5.5

Let X, Y be finite sets such that \(|X | = l \cdot |Y |\), and let H be a function from X to Y. Let A, B denote the sets of \((l-1)\)-collisions of H that can and cannot be extended to l-collisions, respectively. Then we have

$$ \frac{|A |}{|A |+|B |} \ge \frac{l-1}{l}. $$
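Lemma 5.5 can also be spot-checked on small tables. An \((l-1)\)-collision is extendable exactly when its image has at least l preimages, and non-extendable exactly when it is a whole fiber of size \(l-1\). The sketch below (the name `extendable_ratio` is ours) counts both kinds from the fiber sizes; by pigeonhole, some fiber has at least l elements when \(|X| = l|Y|\), so the denominator is never zero.

```python
import math
import random
from collections import Counter
from fractions import Fraction


def extendable_ratio(H, l):
    """|A| / (|A| + |B|) for the table H: A counts (l-1)-collisions inside fibers
    with at least l preimages, B counts fibers of size exactly l-1."""
    sizes = Counter(H).values()
    a = sum(math.comb(c, l - 1) for c in sizes if c >= l)
    b = sum(1 for c in sizes if c == l - 1)
    return Fraction(a, a + b)


# Fibers of sizes 2, 2, 5 with l = 3: |A| = C(5, 2) = 10 and |B| = 2.
print(extendable_ratio([0, 0, 1, 1, 2, 2, 2, 2, 2], 3))
```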

Lemma 5.6

Let X, Y be finite sets such that \(|X | = l \cdot |Y |\), and let H be a function from X to Y. Let A, B denote the sets of \((l-1)\)-collisions of H that can and cannot be extended to l-collisions, respectively. Then we have

$$\begin{aligned} \sum _{L \in {\mathcal {L}}} \frac{ \big |A^L \big | }{ \big |A^L \big | + \big |B^L \big | } \cdot \Pr [L] = \frac{|A |}{|A | + |B |}. \end{aligned}$$

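By the above lemmas, we have

$$\begin{aligned} \Pr [\mathsf {success}\mid c^{(i)} \ne c^{(j)} \text { for } i \ne j] \ge \frac{1}{l} \cdot \frac{|A |}{|A | + |B |} \ge \frac{1}{l} \cdot \frac{l-1}{l}. \end{aligned}$$

Consequently, combining this with inequality (7), \(\Pr [\mathsf {success}]\) is lower bounded as \(\Pr [\mathsf {success}] \ge \frac{l-1}{l} \cdot \frac{1}{l} \cdot \left( 1 - \frac{1}{2l} \cdot N^{\frac{2}{3^{l-1}}-1} \right) \), which completes the proof.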

6 Discussions on Time Complexity

The previous section focused only on quantum query complexity. This section discusses the time complexity of \(\mathbf {MColl}\). We measure time complexity by the number of executions of quantum gates, which perform basic binary operations on \(\lg N\)-bit strings such as \(\mathsf {NOT}, \mathsf {AND}, \mathsf {OR}\), and \(\mathsf {XOR}\). For a function \(F:X \rightarrow \{0,1\}\), \(\mathbf {BBHT}\) finds an \(x_0\) such that \(F(x_0)=1\) in time \(O(\sqrt{|X |/t} \cdot T')\), where \(t = | \{x \in X \mid F(x)=1 \} |\) and \(T'\) is the time for evaluating F once.

To begin with, we show the following theorem.

Theorem 6.1

Let X, Y be finite sets with \(|Y | = N\) and \(|X | \ge l \cdot |Y |\). For any function \(H :X \rightarrow Y\), \(\mathbf {MColl}(H,l)\) runs in expected time

$$\begin{aligned} C \cdot \left( \frac{2lN^{1/3}}{2lN^{1/3} - 1} \right) ^{l-1} \cdot l \cdot l! \cdot N^{ (3^{l-1}-1) / (2 \cdot 3^{l-1}) } \cdot (T_H + \lg N) \end{aligned}$$

for some constant C, using \(O(N^{1/3})\) qubits, where \(T_H\) denotes the time needed to make a quantum query to H.

Proof

Let \(A'_l\) be the running time of \(\mathbf {MColl}(H,l)\) for \(l \ge 1\). For \(l \ge 2\), let \(B'_l,G'_l,C'_l,K'_l\) be the running times of Steps 3, 4, 8, and 9, respectively. Similarly to Eq. (4), we have

$$\begin{aligned} \mathop {\mathrm {E}}[A'_l] = \frac{ \mathop {\mathrm {E}}[B'_l] + \mathop {\mathrm {E}}[G'_l] + \mathop {\mathrm {E}}[C'_l] + \mathop {\mathrm {E}}[K'_l] }{ \Pr [\mathsf {success}]} \end{aligned}$$
(11)

for \(l \ge 2.\) We have

$$\begin{aligned} \mathop {\mathrm {E}}[B'_l]&= N^{1/3^{l-1}} \cdot \mathop {\mathrm {E}}[A'_{l-1}], \\ \mathop {\mathrm {E}}[G'_l]&= O\left( N^{1/3^{l-1}} \lg N^{1/3^{l-1}} \right) = O\left( N^{1/3^{l-1}} \lg N \right) , \\ \mathop {\mathrm {E}}[K'_l]&= O\left( \lg N^{1/3^{l-1}} \right) = O\left( \lg N \right) , \end{aligned}$$

since Steps 4 and 9 can be done classically. In addition, we have that

$$\begin{aligned} \mathop {\mathrm {E}}[C'_l]&= \mathop {\mathrm {E}}[C_l] \cdot \left( T_H + O\left( \lg N^{1/3^{l-1}} \right) \right) = O\left( \mathop {\mathrm {E}}[C_l] \cdot \left( T_H + \lg N^{1/3^{l-1}} \right) \right) \\&= O\left( \sqrt{\frac{l}{l-1}} \cdot N^{\frac{3^{l-1}-1}{2 \cdot 3^{l-1}}} \cdot \left( T_H + \lg N \right) \right) , \end{aligned}$$

which follows from the construction of the quantum circuit of F (See Fig. 3) and the claim below. See Appendix D for details of this claim.

Claim

The quantum circuit \(\mathop { BL }\) can be constructed so that it runs in time \(O(\lg N^{1/3^{l-1}})\) using \(O(N^{1/3^{l-1}})\) qubits.

Eventually, we have

$$\begin{aligned} \mathop {\mathrm {E}}[A'_l]&= O\left( \frac{N^{1/3^{l-1}} \cdot \mathop {\mathrm {E}}[A'_{l-1}] + N^{1/3^{l-1}} \lg N + \sqrt{\frac{l}{l-1}} \cdot N^{\frac{3^{l-1}-1}{2 \cdot 3^{l-1}}} \cdot \left( T_H + \lg N \right) + \lg N}{ \Pr [\mathsf {success}]} \right) \\&\le O\left( \frac{N^{1/3^{l-1}} \cdot \mathop {\mathrm {E}}[A'_{l-1}] + \sqrt{\frac{l}{l-1}} \cdot N^{\frac{3^{l-1}-1}{2 \cdot 3^{l-1}}} }{ \Pr [\mathsf {success}]} \cdot \left( T_H + \lg N \right) \right) \\&= O\left( \left( N^{1/3^{l-1}} \mathop {\mathrm {E}}[A'_{l-1}] + \sqrt{\frac{l}{l-1}} N^{ \frac{3^{l-1} -1}{2 \cdot 3^{l-1}} } \right) \cdot \frac{l^2 f_l}{l-1} \cdot \left( T_H + \lg N \right) \right) . \end{aligned}$$

The above equation yields the claim of Theorem 6.1 due to the same argument as that in the proof of Theorem 5.1.

Remark 6.1

From the viewpoint of time complexity, there are a few criticisms of existing quantum 2-collision finding algorithms [GR03, Ber09]. They are based on the observation that memory size is essentially the same as machine size for quantum machines, since we have to embed data that we use in a quantum algorithm into the quantum circuit of the algorithm.

Note that these criticisms only focus on collisions of random functions and thus are invalid when we consider finding collisions of any function. Furthermore, the target of these criticisms is time complexity, and our main result (Theorem 5.1), which focuses on quantum query complexity, is out of the scope of these criticisms.

7 Conclusion

Finding multicollisions is one of the most important problems in cryptology, both for attacks and for provable security. In the post-quantum era, this problem needs to be studied in a quantum setting to realize quantum-secure cryptographic schemes. This paper systematized knowledge on the multicollision-finding problem in a quantum setting and proposed a new quantum multicollision-finding algorithm. Our algorithm finds an l-collision of any function \(H :X \rightarrow Y\), where \(|X | \ge l \cdot |Y |\), with expected quantum query complexity \(O(N^{ (3^{l-1}-1)/(2 \cdot 3^{l-1})}) = o(N^{1/2})\) and memory complexity \(O(N^{1/3})\) for a small constant l. When our algorithm is applied to a random function, its complexity matches the known tight bound for \(l=2\), improves on the simple combination of Zhandry’s and Belovs’ results for \(l=3\), and, for the first time, improves on the simple bound of \(O(N^{1/2})\) for \(l\ge 4\). Removing the condition \(|X | \ge l \cdot |Y |\) and proving a lower bound for finding an l-collision are left for future work.

The quantum part of this paper is encapsulated in Grover’s algorithm, and the results can equally well be understood as query complexity given a “Grover black box”, without assuming any knowledge of quantum theory on the reader’s part. We hope this paper encourages researchers in the classical setting to actively discuss quantum algorithms.