Time–space complexity of quantum search algorithms in symmetric cryptanalysis: applying to AES and SHA-2
Abstract
The performance of cryptanalytic quantum search algorithms is mainly inferred from query complexity, which hides the overhead induced by an implementation. To shed light on quantitative complexity analysis without these hidden factors, we provide a framework for estimating time–space complexity that carefully accounts for the characteristics of the target cryptographic functions. Processor and circuit parallelization methods are taken into account, resulting in time–space tradeoff curves in terms of depth and qubits. The method shows how to rank different circuit designs in order of their efficiency. The framework is applied to the representative cryptosystems that NIST refers to as a guideline for security parameters, reassessing the security strengths of AES and SHA-2.
Keywords
Quantum circuit · Grover · Parallelization · Resource estimates · AES · SHA-2
Mathematics Subject Classification
94A60 · 68Q12 · 81P68
1 Introduction
Quantum cryptanalysis is an area of study that has long developed alongside the field of quantum computing, as many cryptosystems are expected to be directly affected by quantum algorithms. It is thus natural that the cryptographic community is putting more and more effort into preparing for the post-quantum era as the quantum computing community makes progress. A notable effort by the National Institute of Standards and Technology (NIST) primarily concerns new public-key cryptosystems, motivated by the quantum period-finding algorithm, which might make some currently used public-key schemes obsolete once a practical quantum computer becomes available [1]. Unlike for public-key schemes, however, the significance of quantum search algorithms for symmetric cryptosystems is debatable. It has been widely held that the security levels of most symmetric cryptosystems will simply be reduced by half, owing to the asymptotic behavior of the query complexity of Grover’s algorithm under the oracle assumption [2]. On the one hand, a square-root improvement in exhaustive-search ability is not negligible and may affect current symmetric cryptosystems, for example in their key sizes. On the other hand, when the detailed mechanism of how quantum objects or Grover’s speedup work is taken into account, there are claims that the threat is not as serious as it has been believed to be [1, 3].^{1} The main reason for devaluing the algorithm in symmetric cryptography is its ‘poor parallelizability.’
As the field has matured over the decades, not merely asymptotic but more quantitative approaches to cryptanalysis have also been considered recently [4, 5, 6, 7, 8, 9]. These works have substantially improved the understanding of quantum attacks by systematically estimating quantum resources. Nevertheless, the existing works on resource estimates are noticeably more intended to suggest exemplary quantum circuits (so that one can count the required gates and qubits explicitly) than to fine-tune actual attack designs. Furthermore, although the parallelizability of quantum search algorithms is the main source of the debate on the quantum threat in symmetric cryptography, the resource cost of a parallel quantum attack has never been estimated quantitatively. In fact, quantum search algorithms have never been applied to parallel applications at the level of gate details, not just in quantum cryptanalysis but in the whole field of quantum information. Various difficulties could hamper the parallelization of quantum algorithms, just as many serial classical applications cannot easily find parallel counterparts.
The importance of estimating the costs of quantum search algorithms beyond the pioneering works should be emphasized, as such estimates can be used to suggest practical security levels in the post-quantum era. Indeed, NIST suggested security levels based on the resistance of the Advanced Encryption Standard (AES) and the Secure Hash Algorithm (SHA) to quantum attacks in its PQC standardization call for proposals [10]. In addition, the difficulty of measuring the complexity of quantum attacks was raised at the first NIST PQC standardization workshop.^{2} The main purpose of this work is to formulate the time–space complexity of quantum search algorithms in order to provide reliable quantum security strengths of classical symmetric cryptosystems.
1.1 This work
There are two noteworthy points overlooked in previous works. First, the target function to be inverted is generally a pseudorandom function or a cryptographic hash function. For such functions, a bijective correspondence between input and output is not guaranteed. This makes Grover’s algorithm seemingly inapplicable, because the number of targets is unpredictable. The second point is the time–space tradeoff of quantum resources. Earlier works on quantitative resource estimates have implicitly or explicitly assumed a single quantum processor. Given that resources in classical estimates include the number of processors the adversary is equipped with, the single-processor assumption should be revised.
Being aware of these issues, we develop a framework for analyzing the time–space complexity of cryptanalytic quantum search algorithms. The main results presented in this paper are threefold:
1.1.1 Precise query complexity involving parallelization
The number of oracle queries, or equivalently Grover iterations, is first estimated as exactly as possible, reasonably accounting for the previously overlooked points. The random statistics of the target function are carefully handled, which leads to an increase in the number of iterations compared with the case of a unique target. Surprisingly, however, the cost of dealing with random statistics in this paper is not high compared with the previous work [6] under the single-processor assumption. Furthermore, when processor parallelization is considered, we observe that this extra cost becomes even more negligible. It is also interesting that the appropriate parallelization method can vary with the search problem. After removing the asymptotic big-O notation, the relation between time and space in terms of Grover iterations and the number of processors, called the tradeoff curve, is obtained. Apart from resource estimates, investigating the tradeoff curve of the state-of-the-art collision finding algorithm in [11] with optimized parameters is one of our major concerns.
1.1.2 Depth–qubit tradeoff and circuit design tuning
In the next stage, time and space resources are defined so that they can be interpreted as physical quantities. The cost of quantum circuits for cryptanalytic algorithms can be estimated in units of Toffoli-depth and logical qubits. Taking the total number of gates as the time complexity hinders accurate estimates of the speed of quantum algorithms, because different gates introduce very different overheads in real operation. With these definitions of quantum resources, the tradeoff curve describes the relation between circuit depth and the number of qubits. Since a relation between time and space is given, it is then possible to rank various quantum circuits in order of efficiency. In other words, the method enables one to tell which attack design is more cost-effective.
By applying the newly introduced generic methodology, the time–space complexities of AES and SHA-2 against quantum attacks are measured in the following way.^{3} Various designs are constructed by assembling different circuit components, with options such as reduced depth at the cost of more qubits (or vice versa). The design candidates are then compared through the tradeoff relation. The tradeoff coefficient of the most efficient design represents the hardness of quantum cryptanalysis. Compared with pre-existing circuit designs, we have improved the circuits by reducing the required qubits and/or depth in various ways. However, we do not claim to have found the optimal attacks on AES and SHA-2; the method enables us to select the best one among the candidates at hand.
1.1.3 Revisiting the security levels of NIST PQC standardization
We end this subsection with two important caveats. One is that classical resources are used in this paper, but we do not handle the complexity they induce, because criteria for comparing quantum and classical resources are unclear. The other is that our focus is on specific algorithms implemented at the level of elementary gates. Readers are, however, encouraged not to rule out other algorithms that the quantum computing community also pursues, for example, ones using quantum memory.
1.2 Organization
The next section covers the background, including Grover’s algorithm, its parallelization, other variants, and a short introduction to AES and SHA-2. In Sect. 3, the time–space complexity of the relevant algorithms is investigated in units of Grover iterations. A basic unit of quantum computation is proposed in Sect. 4, together with the concept of a tradeoff in quantum resources. Sections 5, 6, and 7 present the results of applying the time–space analysis to AES and SHA-2. In Sect. 8, based on the observations made in the previous sections, a comprehensive figure summarizing the quantum security strengths of AES and SHA-2 is drawn. Section 9 summarizes the paper.
2 Backgrounds
Grover’s algorithm, its success probability, parallelization methods, and some generalizations or variants are explained briefly. A brief review of AES and SHA-2 and an introduction to related works on resource estimates follow. We do not cover the basics of quantum computing, but refer interested readers to [12, 13]. Throughout the paper, the target function is denoted by f, \(N=2^n\) for some \(n\in \mathbb {N}\), and every bra or ket state is normalized.
2.1 Grover’s algorithm
The operators \(S_f\) and \(A S_0 A^{-1}\), which make up the Grover iteration \(Q=A S_0 A^{-1} S_f\), are known as the oracle and diffusion operators, respectively. Acting with the oracle operator on a state marks only the target state, through a sign change. The diffusion operator inverts the amplitudes about their average.
The success probability of measurement as a function of the number of iterations has been studied in [14], which identifies the optimal number of iterations as the one minimizing the ratio of the number of iterations to the success probability. We introduce the results below with the notation used throughout the paper.
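The quantity studied in [14] can be evaluated numerically. The sketch below is a minimal illustration under the usual large-N approximation, not the paper's derivation: with t targets among N elements, a run of i iterations succeeds with probability \(\sin^2((2i+1)\theta_t)\), where \(\sin^2\theta_t=t/N\), and repeating runs until success costs i divided by that probability on average. Writing \(x=(2i+1)\theta_t\), this cost is approximately \(x/(2\sin^2 x)\) in units of \(\sqrt{N/t}\). The function names are ours.

```python
import math

def expected_cost(x: float) -> float:
    """Expected Grover iterations, in units of sqrt(N/t), when each run stops at
    the angle x = (2i+1)*theta_t and is repeated until the measurement succeeds."""
    return x / (2.0 * math.sin(x) ** 2)

def optimal_constant(steps: int = 100_000) -> tuple[float, float]:
    """Grid search over (0, pi/2) for the angle minimising iterations / success."""
    best_x, best_cost = 0.0, float("inf")
    for k in range(1, steps):
        x = (math.pi / 2) * k / steps
        cost = expected_cost(x)
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost

x_opt, c_opt = optimal_constant()
print(f"optimal angle {x_opt:.4f} rad, I_t ~ {c_opt:.4f} * sqrt(N/t)")
```

The minimizer satisfies \(\tan x = 2x\) (the derivative of \(x/\sin^2 x\) vanishes there), giving \(x\approx 1.166\) and a constant close to the \(0.690\ldots\) that appears later in the paper.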
2.2 Parallelization
Parallelization of Grover’s algorithm using multiple quantum computers has been investigated in cryptanalytic applications [1, 10, 15]. Parallelization in a hybrid algorithm is considered in [16]. Asymptotically, the execution time is reduced by a factor of the square root of the number of quantum computers. There are two straightforward parallelization methods with this property, called inner and outer parallelization.
The parameters \(T_q\) and \(S_q\) denote the number of sequential Grover iterations and the number of quantum computers, respectively, and \(S_c\) denotes the amount of classical resources, such as the size of storage and/or the number of processors. The two parallelization methods are defined as follows.
Definition 1 (Inner parallelization)
After dividing the entire search space into \(S_q\) disjoint sets, each machine searches one of the sets for the target. The number of iterations is reduced owing to the reduced domain size.
Definition 2 (Outer parallelization)
Copies of Grover’s algorithm on the entire search space are run on \(S_q\) machines. Since the search succeeds if any of the \(S_q\) machines finds the target, the number of iterations can be reduced.
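The two definitions can be compared with the same large-N approximation. The following sketch assumes a single target and treats an outer-parallel round as successful when at least one of the \(S_q\) copies measures the target; both costs are expressed as expected iterations per machine in units of \(\sqrt{N/S_q}\). It illustrates the definitions only and is not a reproduction of the propositions in Sect. 3; the function names are hypothetical.

```python
import math

def best(cost, lo: float = 1e-4, hi: float = 3.0, steps: int = 30_000) -> float:
    """Minimum of a one-variable cost function on a uniform grid over [lo, hi]."""
    return min(cost(lo + (hi - lo) * k / steps) for k in range(steps + 1))

def inner_constant() -> float:
    # Inner: each machine searches N/S_q elements and only the machine holding the
    # target can succeed, so with y = (2i+1)*theta' the cost y / (2 sin^2 y) does
    # not depend on S_q.
    return best(lambda y: y / (2 * math.sin(y) ** 2))

def outer_constant(s_q: int) -> float:
    # Outer: S_q independent copies search all N elements. With x = (2i+1)*theta and
    # y = x * sqrt(S_q), at least one copy succeeds with probability 1 - cos(x)^(2 S_q).
    def cost(y: float) -> float:
        x = y / math.sqrt(s_q)
        return y / (2 * (1 - math.cos(x) ** (2 * s_q)))
    return best(cost)

print(f"inner, any S_q : {inner_constant():.3f}")
for s_q in (1, 2, 16, 1024):
    print(f"outer, S_q={s_q:<5d}: {outer_constant(s_q):.3f}")
```

Under these assumptions the inner constant stays at about 0.690 for every \(S_q\), while the outer constant starts at the same value for \(S_q=1\) and grows toward roughly 0.78 as \(S_q\) increases, so inner parallelization is the cheaper choice in this simple single-target model.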

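The NIST call for proposals [10] constrains quantum attacks by a maximum circuit depth, MAXDEPTH, and gives the following plausible values:

\(2^{40}\): Approximate number of logical gates that presently envisioned quantum computing architectures are expected to perform serially in a year.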

\(2^{64}\): Approximate number of logical gates that current classical computing architectures can perform serially in a decade.

\(2^{96}\): Approximate number of logical gates that atomic scale qubits with speed of light propagation times could perform in a millennium.
2.3 Generalizations and variants
Fixed-point [17] and quantum amplitude amplification (QAA) [18] algorithms are generalizations of Grover’s algorithm. This subsection gives a brief review of QAA, which appears as a component of a collision finding algorithm in later sections. We skip the fixed-point algorithm, as it has no advantage over Grover’s algorithm and QAA in this work.^{5}
There are a number of variants of Grover’s algorithm for collision finding. In [19], Brassard, Høyer, and Tapp suggested a quantum collision finding algorithm (BHT) of \(O(N^{1/3})\) query complexity using quantum memory holding \(O(N^{1/3})\) classical data. A multicollision algorithm using BHT was suggested in [20]. In this work, however, we do not consider BHT as a candidate algorithm, for the following reasons. One is that the algorithm requires quantum memory, whose realization and usage cost are controversial [21]; the other is that we are unable to come up with any implementation restricted to elementary gates whose total cost does not exceed \(O(N^{1/2})\).
Apart from quantum circuits, there also exist algorithms primarily designed for other types of models, such as measurement-based quantum computation, for example quantum walk search [22, 23] or element distinctness [24], but we do not cover them, as state-of-the-art quantum architectures target circuit computation. Interested readers may refer to [20] and the references therein for more information on quantum collision finding.
Bernstein analyzed quantum and classical collision finding algorithms in [3]. According to that work, no quantum algorithm had been reported with a better time–space product than \(O(N^{1/2})\), which is achieved by the state-of-the-art classical algorithm [25]. If Grover’s algorithm is parallelized with the distinguished point method, a complexity of \(O(N^{1/2})\) can be achieved. This is one example of the immediate ways to combine quantum search with the rho method, as mentioned in [3]. We call it the Grover with distinguished point (GwDP) algorithm in this paper.
At ASIACRYPT 2017, Chailloux, Naya-Plasencia, and Schrottenloher suggested a new quantum collision finding algorithm, called the CNS algorithm, of \(O(N^{2/5})\) query complexity using \(O(N^{1/5})\) classical memory [11].
2.3.1 QAA algorithm
The basic structure of QAA is the same as that of Grover’s original algorithm. An initial state \(|\Psi \rangle = A |0 \rangle \) is prepared, and then the Grover iteration Q is applied i times to obtain the success probability of Eq. 2. The only difference is that in QAA the preparation operator A is not restricted to \(H^{\otimes n}\) where \(N=2^n\), and thus the search space can be defined arbitrarily. A detailed derivation is not covered here; instead, we describe the key feature with an example.
As a trivial example, let us assume we are given a quantum computer and try to find the target bitstring 110011 in the set \(X=\{ x \mid x \in \{0,1\}^6 \text{ and the two middle bits of } x \text{ are } 0 \}\). The domain size is \(2^4\) rather than \(2^6\), and the initial state can be prepared by \(A=H_1 H_2 H_5 H_6\), where \(H_r\) is the Hadamard gate acting on the r-th qubit. The remaining process is to apply Grover iterations \(Q=A S_0 A^{-1} S_f\) with A given by the state preparation operator just mentioned. The search space examined here is rather trivial, but QAA also works on arbitrary domains. A nontrivial domain can be given as something like \(X=\{ x \mid x \in \{0,1\}^6,\; f(x) \ne 0 \}\) for some given function f. It is a matter of preparing a state that encodes an appropriate search space, in other words, of finding an operator A. Once A is constructed, QAA works in the same way as Grover’s algorithm.
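The toy example can be checked with a small state-vector simulation. The sketch below is written in plain Python without any quantum library, uses real amplitudes only, and numbers qubits from 1 at the leftmost bit of the bitstring; the helper names are hypothetical. Since the domain contains \(2^4\) strings and one target, the target probability after i iterations should follow \(\sin^2((2i+1)\theta)\) with \(\sin\theta = 1/4\).

```python
import math

N_QUBITS = 6
PREP_QUBITS = (1, 2, 5, 6)      # A = H_1 H_2 H_5 H_6 (qubit 1 = leftmost bit)

def hadamard(state: list[float], qubit: int) -> list[float]:
    """Apply a Hadamard gate to `qubit` of a real state vector."""
    bit = 1 << (N_QUBITS - qubit)
    out = state[:]
    for idx in range(len(state)):
        if not idx & bit:
            a0, a1 = state[idx], state[idx | bit]
            out[idx] = (a0 + a1) / math.sqrt(2)
            out[idx | bit] = (a0 - a1) / math.sqrt(2)
    return out

def apply_a(state: list[float]) -> list[float]:
    # The four Hadamards commute and are self-inverse, so A^{-1} = A here.
    for q in PREP_QUBITS:
        state = hadamard(state, q)
    return state

def qaa(target: str, iterations: int) -> list[float]:
    t = int(target, 2)
    state = [0.0] * (1 << N_QUBITS)
    state[0] = 1.0
    state = apply_a(state)                 # |Psi> = A|0>
    for _ in range(iterations):            # Q = A S_0 A^{-1} S_f
        state[t] = -state[t]               # S_f: mark the target
        state = apply_a(state)             # A^{-1}
        state[0] = -state[0]               # S_0: flip the sign of |000000>
        state = apply_a(state)             # A
    return state

for i in range(5):
    amp = qaa("110011", i)
    print(i, round(amp[int("110011", 2)] ** 2, 4))
```

With \(\theta=\arcsin(1/4)\), the probability is highest after three iterations (about 0.96), and no amplitude ever reaches strings whose middle bits are nonzero, because neither A nor the reflections act on qubits 3 and 4.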
2.3.2 GwDP algorithm
The GwDP algorithm is a parallelization of Grover’s algorithm. Distinguished points (DPs) can be defined as function outputs whose d most significant bits are zero, called d-bit DPs. We also use the term DP for the inputs that produce DPs, or for pairs of a DP and its corresponding input.
For \(S_q=S_c=2^s\), we use \((n-2s)\)-bit DPs. By running \(T_q=O\left( 2^{n/2-s}\right) \) Grover iterations, a DP is expected to be found on each machine. By storing \(O(2^s)\) DPs sorted according to their outputs, a collision is found with high probability. The time–space product is always \(T_qS_q=O\left( N^{1/2}\right) \).
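The distinguished-point bookkeeping of GwDP can be illustrated classically. In the toy below, which is a sketch rather than the paper's construction, random sampling stands in for the Grover search that each machine would run, the machines are emulated one after another, and the function f (SHA-256 truncated to n = 20 bits), n, and s = 4 are hypothetical choices. Each emulated machine returns an \((n-2s)\)-bit DP, and DPs are stored by output until two distinct inputs share an output.

```python
import hashlib
import random

N_BITS = 20                  # toy output size n (hypothetical)
S = 4                        # 2**S emulated machines (hypothetical)
DP_BITS = N_BITS - 2 * S     # (n - 2s)-bit distinguished points

def f(x: int) -> int:
    """Toy random-like function: SHA-256 of x truncated to N_BITS bits."""
    digest = hashlib.sha256(x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - N_BITS)

def is_dp(y: int) -> bool:
    """A d-bit DP has its d most significant output bits equal to zero."""
    return y >> (N_BITS - DP_BITS) == 0

def find_dp(rng: random.Random) -> tuple[int, int]:
    # Classical stand-in for one machine's Grover search for a DP: about
    # 2**DP_BITS samples here versus about 2**(DP_BITS/2) Grover iterations.
    while True:
        x = rng.getrandbits(64)
        y = f(x)
        if is_dp(y):
            return x, y

def gwdp_collision(seed: int = 1) -> tuple[int, int, int]:
    rng = random.Random(seed)
    stored: dict[int, int] = {}      # DP output -> input (plays the sorted list)
    while True:
        x, y = find_dp(rng)
        if y in stored and stored[y] != x:
            return stored[y], x, y
        stored[y] = x

x1, x2, y = gwdp_collision()
print(f"f({x1:#x}) = f({x2:#x}) = {y:#x}")
```

Since there are only \(2^{2s}\) possible DP outputs, a repeat is expected after roughly \(2^s\) DPs by the birthday bound, which matches the \(O(2^s)\) storage stated above.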
2.3.3 CNS algorithm
Instead of giving the details of the CNS algorithm [11], we briefly state its high-level description and the corresponding complexities.
The CNS algorithm consists of two phases: list preparation and collision finding. In the list preparation phase, a list of \(2^l\) d-bit DPs is drawn up with time complexity \(O(2^{l+d/2})\) and classical storage of size \(O(2^l)\). In the collision finding phase, the QAA algorithm is used. Each QAA iteration consists of \(O(2^{d/2})\) Grover iterations and \(O(2^l)\) operations for the list comparison. After \(O\left( 2^{(n-d-l)/2}\right) \) QAA iterations, a collision is expected to be found. In total, the CNS algorithm has time complexity \(O\left( 2^{l+d/2}+2^{(n-d-l)/2}(2^{d/2}+2^l)\right) \) and uses \(O(2^l)\) classical memory. With the optimal parameters \(l=d/2\) and \(d=2n/5\), a collision is found in \(T_q=O(N^{2/5})\) with \(S_c=O(N^{1/5})\).
If \(S_q=2^s\), the time complexity becomes \(O\left(2^{(n-d-l-s)/2}(2^{d/2}+2^l)+2^{l+d/2-s}\right)\) for \(s \le \min (l,n-d-l)\). When \(l=d/2\) and \(d=\frac{2}{5}(n+s)\), the complexities satisfy \((T_q)^5(S_q)^3=O(N^2)\) and \(T_q(S_c)^3=O(N)\).
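The exponent bookkeeping above can be checked with exact rational arithmetic. The sketch below, with hypothetical function and key names, drops all constant factors (including \(t_L\)), treats \(2^{d/2}+2^l\) as \(2^{\max(d/2,l)}\), and evaluates the log2 costs for \(l=d/2\) and \(d=\frac{2}{5}(n+s)\). Substituting these parameters into the range condition gives \(s\le n/4\).

```python
from fractions import Fraction

def cns_log2_costs(n: int, s: int) -> dict[str, Fraction]:
    """log2 of the leading terms of the CNS costs with S_q = 2**s quantum machines,
    for l = d/2 and d = 2(n+s)/5; all constant factors (including t_L) dropped."""
    d = Fraction(2, 5) * (n + s)
    l = d / 2
    if not s <= min(l, n - d - l):
        raise ValueError("requires s <= min(l, n - d - l), i.e. s <= n/4")
    list_phase = l + d / 2 - s                        # 2^(l + d/2 - s)
    qaa_phase = (n - d - l - s) / 2 + max(d / 2, l)   # 2^((n-d-l-s)/2) (2^(d/2) + 2^l)
    return {"d": d, "l": l, "list": list_phase, "qaa": qaa_phase,
            "T_q": max(list_phase, qaa_phase), "S_c": l}

for n, s in [(256, 0), (256, 32), (256, 64)]:
    c = cns_log2_costs(n, s)
    print(n, s, {key: float(v) for key, v in c.items()})
    assert 5 * c["T_q"] + 3 * s == 2 * n      # (T_q)^5 (S_q)^3 = N^2
    assert c["T_q"] + 3 * c["S_c"] == n       # T_q (S_c)^3 = N
```

For \(s=0\) this recovers \(T_q=2^{2n/5}\) and \(S_c=2^{n/5}\), and the two phases are balanced for every admissible s; in practice d and l would be rounded to integers.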
2.4 AES and SHA2 algorithms
A brief review of AES and SHA-2 is given in this subsection. Specifically, the AES-128 and SHA-256 algorithms, which form the main body of later sections, are described.
2.4.1 AES-128
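Only the encryption procedure of AES-128, which is relevant to this work, is briefly reviewed; see [26] for details. The round function transforms a \(4\times 4\)-byte internal state using the first four operations below, and the key expansion uses the last three.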

ShiftRows cyclically shifts the last three rows of the internal state by different offsets.

MixColumns applies a linear transformation to each column of the internal state, mixing the data.

AddRoundKey adds the round key to the internal state by an XOR operation.

SubBytes applies a nonlinear transformation to each byte. SubBytes works as substitution boxes (S-boxes), each computing the multiplicative inverse in \(\mathrm{GF}(2^8)\), followed by a linear transformation and the addition of the S-box constant.

RotWord cyclically shifts a word of four bytes.

Rcon adds a round constant to the word by an XOR operation.

SubWord applies the S-box to each byte of a word.
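The S-box construction described for SubBytes can be written out directly. The sketch below follows the FIPS-197 definition, assuming the AES reduction polynomial \(x^8+x^4+x^3+x+1\) (0x11B) and the constant 0x63; the inverse is computed naively as \(a^{254}\), which is fine for a 256-entry table but says nothing about how a reversible circuit would compute it. Function names are ours.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p

def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254; this also maps 0 to 0, as AES requires."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def rotl8(x: int, k: int) -> int:
    return ((x << k) | (x >> (8 - k))) & 0xFF

def sbox(a: int) -> int:
    b = gf_inv(a)
    # Affine map: XOR of b with four left-rotations of itself, then the constant 0x63.
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

SBOX = [sbox(a) for a in range(256)]
print(hex(SBOX[0x00]), hex(SBOX[0x01]), hex(SBOX[0x53]))
```

The entries 0x63, 0x7c, and 0xed for inputs 0x00, 0x01, and 0x53 agree with FIPS-197. In a quantum implementation, the field inversion is the nonlinear part that requires Toffoli gates, whereas the affine map needs only CNOT and X gates.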
2.4.2 SHA-256
For brevity, only the SHA-256 hashing of a single message block, which is relevant to this work, is reviewed. The preprocessing, including message padding, parsing, and setting the initial hash value, is also omitted here; see [27] for details. SHA-256 uses the following logical functions on 32-bit words:

\(Ch(x,y,z) = (x \wedge y) \oplus (\lnot x \wedge z)\),

\(Maj(x,y,z) = (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z)\),

\(\Sigma _0(x) = ROT\!R^2(x) \oplus ROT\!R^{13}(x) \oplus ROT\!R^{22}(x)\),

\(\Sigma _1(x) = ROT\!R^6(x) \oplus ROT\!R^{11}(x) \oplus ROT\!R^{25}(x)\), where \(ROT\!R^n(x)\) is the circular right shift of x by n positions.

\(\sigma _0(x) = ROT\!R^7(x) \oplus ROT\!R^{18}(x) \oplus S\!H\!R^3(x)\),

\(\sigma _1(x) = ROT\!R^{17}(x) \oplus ROT\!R^{19}(x) \oplus S\!H\!R^{10}(x)\), where \(S\!H\!R^n(x)\) is the right shift of x by n positions.
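The six functions above translate directly into operations on 32-bit words. The sketch below is a plain Python rendering of those definitions with hypothetical function names; since Python integers are unbounded, results are masked to 32 bits where needed.

```python
MASK32 = 0xFFFFFFFF

def rotr(x: int, n: int) -> int:
    """Circular right shift of a 32-bit word by n positions."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def shr(x: int, n: int) -> int:
    """Right shift of a 32-bit word by n positions."""
    return x >> n

def ch(x: int, y: int, z: int) -> int:
    return (x & y) ^ (~x & z & MASK32)

def maj(x: int, y: int, z: int) -> int:
    return (x & y) ^ (x & z) ^ (y & z)

def big_sigma0(x: int) -> int:
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)

def big_sigma1(x: int) -> int:
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)

def small_sigma0(x: int) -> int:
    return rotr(x, 7) ^ rotr(x, 18) ^ shr(x, 3)

def small_sigma1(x: int) -> int:
    return rotr(x, 17) ^ rotr(x, 19) ^ shr(x, 10)

print(hex(big_sigma0(1)), hex(small_sigma1(1)))
```

Only Ch and Maj contain AND terms; the \(\Sigma\) and \(\sigma\) functions are GF(2)-linear, so in a reversible circuit they need only CNOT gates, which matters under the Toffoli-depth metric of Sect. 4.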
2.5 Quantum resource estimates
Quantum resource estimates of Shor’s period-finding algorithm have long been studied in the literature; see, for example, [8, 28] and the references therein. On the other hand, quantitative quantum analysis of cryptographic schemes other than period finding is still at an early stage. A partial list includes attacks on multivariate-quadratic problems [9], hash functions [5, 7], and AES [4, 6]. We introduce the two that are most relevant to our work.
2.5.1 AES key search
Grassl et al. reported the quantum costs of AES-k key search for \(k\in \{128,192,256\}\) in units of logical qubits and gates [6]. In estimating the time cost, the authors focused on a specific gate, the T gate, and its depth, although the overall gate count was also provided. The space cost was simply estimated as the total number of qubits required to run Grover’s algorithm.
There are two points we pay attention to. First, the authors ensured a single target key. Since AES works like a random function, there is a non-negligible probability that a plaintext is encrypted to the same ciphertext under two different keys. To avoid such cases, the authors encrypt \(r ~(\in \{3,4,5\})\) plaintext blocks simultaneously to obtain r ciphertexts, so that only the true key results in the given ciphertexts. This procedure removes the ambiguity in the number of iterations. Note, however, that removing the ambiguity comes at the price of at least tripling the space cost. The other point is that the reversible circuit implementation of the internal functions of AES always aimed at reducing the number of qubits; the proposed circuit design may be regarded as space-optimized.
2.5.2 SHA-2 and SHA-3 preimage search
Amy et al. reported the quantum costs of SHA-2 and SHA-3 preimage search in units of logical and physical qubits and gates [5]. The method considers an error-correction scheme called the surface code, and the time cost was set with this scheme in mind. Estimating the costs of T gates in terms of physical resources was one of the main results. One point we would like to address is that the random-like behavior of the SHA functions was not considered: the paper assumes that a unique preimage of a given hash exists.
3 Tradeoff in query complexity
In this section, the definitions of cryptographic search problems and the query-based time cost of the corresponding quantum search algorithms are discussed. As a result, tradeoff equations between the number of queries and the number of machines are given.
3.1 Types of search problems
We assume that \(f:X\rightarrow Y\) is a random function which means f is selected from the set of all functions from X to Y uniformly at random. Useful statistics of random functions can be found in [29]. The probabilities related to the number of preimages are quoted below. When an element x is selected from a set X uniformly at random, it is denoted by \(x {\overset{\$}{\leftarrow }}X\).
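For \(|X|=|Y|=N\) and large N, the number of preimages of a fixed y is well approximated by a Poisson distribution with mean 1, so exactly t preimages occur with probability about \(e^{-1}/t!\). The simulation below is a sketch with an arbitrary toy size \(N=2^{16}\) and a fixed seed; it samples a random function and compares the empirical preimage counts with that approximation. The function name is ours.

```python
import math
import random
from collections import Counter

def preimage_count_distribution(n_bits: int = 16, seed: int = 2024) -> dict[int, float]:
    """Fraction of outputs y with exactly t preimages under a random function on
    {0,1}^n_bits, where every f(x) is drawn uniformly and independently."""
    size = 1 << n_bits
    rng = random.Random(seed)
    hits = Counter(rng.randrange(size) for _ in range(size))   # y -> |f^{-1}(y)|
    counts = Counter(hits.values())
    counts[0] = size - len(hits)                                 # outputs never hit
    return {t: counts[t] / size for t in sorted(counts)}

empirical = preimage_count_distribution()
for t in range(5):
    print(f"t={t}: empirical {empirical.get(t, 0.0):.4f}, "
          f"Poisson(1) {math.exp(-1) / math.factorial(t):.4f}")
```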
The target function in cryptanalytic search problems is usually modeled as a pseudorandom function (PRF) or a cryptographic hash function (CHF). The precise interpretation of these notions can be found in Sects. 3.5 and 5.5 of [30]. It can be assumed that PRFs and CHFs behave statistically like a random function.
The formal definitions of search problems relevant to symmetric cryptanalysis can be described with random functions. The way the given information is generated in each problem is carefully distinguished. The first is Key Search, generalized from the secret key search problem using a plaintext–ciphertext pair of an encryption algorithm.
Definition 3
For a random function \(f:X\rightarrow Y\), \(y=f(x_0)\) is generated from an \(x_0 \in X\). Key Search is to find the target \(x_0\) for given f and y.
The existence of the target \(x_0\) in X is always ensured. However, preimages of y other than \(x_0\) can be found; these are called false alarms. False alarms have to be resolved with additional information, since the problem itself gives no clue that helps recognize the real target.
Definitions generalized from the preimage and the collision problems of CHF are given as follows.
Definition 4
For a random function \(f:{\{0,1\}^*}\rightarrow Y\), y is chosen at random, \(y {\overset{\$}{\leftarrow }}Y\), or equivalently, \(y=f(x_0)\) for an \(x_0 \in {\{0,1\}^*}\). Preimage Search is to find any \(x\in {\{0,1\}^*}\) satisfying \(f(x)=y\) for given f and y.
There is no false alarm in Preimage Search. However, the existence of a preimage in a fixed subset of \({\{0,1\}^*}\) cannot be ensured.
Definition 5
For a given random function \(f:{\{0,1\}^*}\rightarrow Y\), Collision Finding is to find any inputs \(x_1,x_2\in {\{0,1\}^*}\) satisfying \(f(x_1)=f(x_2)\).
3.2 Tradeoff in Grover’s algorithm for Key Search
In this subsection, the expected number of iterations and the parallelization tradeoff of Grover’s algorithm are given. We assume that \(f:X \rightarrow Y\) and \(|X|=|Y|=N\).
Proposition 1
Proof
This proof is similar to the one in Sect. 4 of [14].
Comparing \(I_{\mathrm{rand}}^{\mathrm{KS}}\) with \(I_{1}\) of Eq. 4, the expected number of iterations increases by 37.8...%.
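One simple model yields a figure close to this one; it is our assumption for illustration, not necessarily the model behind Proposition 1. Suppose \(y=f(x_0)\) has \(T=1+\mathrm{Poisson}(1)\) preimages, a Grover run measures one of them uniformly at random, and only \(x_0\) counts as success (false alarms are detected with additional information and the search is repeated). The success probability at single-target angle x is then \(\sum_{t\ge1} \frac{e^{-1}}{t!}\sin^2(\sqrt{t}\,x)\), and the sketch below minimizes iterations over success probability for this model and for a unique target.

```python
import math

# Assumed model (illustrative): y = f(x0) has T = 1 + Poisson(1) preimages, a run
# measures one of them uniformly, and only x0 counts. P(T=t)/t = e^{-1}/t!.
WEIGHTS = [math.exp(-1) / math.factorial(t) for t in range(1, 31)]

def p_key_search(x: float) -> float:
    """Success probability at single-target angle x = (2i+1)*theta_1,
    using theta_t ~ sqrt(t)*theta_1 for t preimages."""
    return sum(w * math.sin(math.sqrt(t) * x) ** 2
               for t, w in enumerate(WEIGHTS, start=1))

def p_unique(x: float) -> float:
    return math.sin(x) ** 2

def optimal_cost(p, steps: int = 20_000) -> float:
    """min over x in (0, pi/2) of x / (2 p(x)): expected iterations / sqrt(N)."""
    xs = ((math.pi / 2) * k / steps for k in range(1, steps))
    return min(x / (2 * p(x)) for x in xs)

i_unique = optimal_cost(p_unique)
i_key_search = optimal_cost(p_key_search)
print(f"unique target: {i_unique:.4f} sqrt(N)")
print(f"key search   : {i_key_search:.4f} sqrt(N), ratio {i_key_search / i_unique:.4f}")
```

Under this model the ratio comes out at about 1.378, in line with the 37.8...% increase quoted above.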
Proposition 2
In the following, the optimal expected numbers of iterations and the tradeoff curves are defined and analyzed in the same way as in this subsection, but more briefly.
3.3 Tradeoff in Grover’s algorithm for Preimage Search
Since a preimage of y is not guaranteed to exist in a fixed domain X (Sect. 3.1), two resolutions can be sought. The first is to change the domain X in every execution of Grover’s algorithm; in this case, the optimal iteration number for Preimage Search is the same as in Proposition 1. The second is to expand the domain to \(|X|=aN\in \mathbb {N}\) for some \(a>1\). The success probability then reads \(P_{\mathrm{rand},(aN)}^{\mathrm{PS}}(i) = \sum _{t\ge 1}q_{(aN)}(t)\cdot p_{t,(aN)}(i)\).
Proposition 3
When \(N=2^{256}\), the proposition can be assumed to hold for \(a\ge 2^{10}\); the subscript ‘\(\gg 1\)’ indicates this assumption. Note that \(I_{\mathrm{rand},(\gg N)}^{\mathrm{PS}}\approx I_{1,N}\), that is, performance improves with the domain size up to a converged value. Already for \(a=8\), the failure probability falls below 0.0004 (it is about \(e^{-8}\approx 0.00034\)).
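The failure probability for an expanded domain follows from the random-function model: each of the aN domain elements maps to y independently with probability 1/N, so no preimage exists with probability \((1-1/N)^{aN}\approx e^{-a}\). A short numerical sketch (the function name is ours):

```python
import math

def no_preimage_probability(a: float, n: int = 256) -> float:
    """Probability (1 - 1/N)^(aN) that none of a*N random inputs maps to a fixed y,
    with N = 2**n, evaluated through log1p to avoid rounding 1 - 1/N to 1."""
    N = 2.0 ** n
    return math.exp(a * N * math.log1p(-1.0 / N))

for a in (1, 2, 4, 8):
    print(f"a = {a}: {no_preimage_probability(a):.6f}  (e^-a = {math.exp(-a):.6f})")
```

Using `math.log1p` matters here: for \(N=2^{256}\), the expression `1 - 1/N` rounds to exactly 1.0 in double precision, which would wrongly give a failure probability of 1.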
There are subtleties in comparing inner and outer parallelization that we do not go into here. We conclude that it is always favorable to enlarge the domain size, and that for large \(S_q\) the two parallelization methods show asymptotically the same performance. Denoting the optimal time and space complexities for Preimage Search by \(T_q^{\mathrm{PS}}\) and \(S_q^{\mathrm{PS}}\), the tradeoff curve is given as follows.
Proposition 4
Note that while the inner parallelization is a better option in Key Search, both parallelization methods have similar behaviors in Preimage Search.
3.4 Tradeoff in quantum collision finding algorithms
A collision could be found by using Grover’s algorithm as a second-preimage search. This gives the same result as Sect. 3.3 if the input of the given ‘first preimage’ is excluded from the domain. Apart from Grover’s algorithm, this subsection gives the optimal expected iterations and tradeoff curves for the parallelizations of two collision finding algorithms, GwDP and CNS.
In collision finding algorithms, a preimage of a large set must be searched for. Let \(f:{\{0,1\}^*}\rightarrow Y\) and let \(X\subset {\{0,1\}^*}\) be a set of size N. For \(f_X\) and \(y {\overset{\$}{\leftarrow }}Y\), the expected number of preimages of y is \(\sum _{j\ge 1}j\cdot q(j)\approx 1\). If a set \(A\subset Y\) is large enough, it can be assumed that the number of preimages of A is \(|A|\).
3.4.1 GwDP algorithm
Let \(S_q=2^s\) for some \(s\in \mathbb {N}\) and let \(X\subset {\{0,1\}^*}\) be a set of size N. In each quantum machine, \((n-2s+2)\) is used as the number of bits to be fixed in DPs. The parameter \((n-2s+2)\) is chosen as the optimal one among integers only, to allow easier implementation with quantum gates.
Proposition 5
Note that the algorithm also requires \(S_c^{\mathrm{GwDP}}=O(2^s)\) classical storage.
3.4.2 CNS algorithm
Since there are about \(2^{n-d}\left( =|X_1|/2^d\right) \) DPs in \(X_1\), the expected number of Grover iterations to find a DP is the same as \(I_{2^{n-d}}=0.690\ldots \cdot 2^{d/2}\) of Eq. 4. The expected number of Grover iterations to build L is \(0.690\ldots \cdot 2^{d/2}\cdot 2^l\). A classical storage of size \(O(2^l)\) is additionally required.
The QAA iteration \(Q_2\) of the collision finding phase consists of two steps. The first is the action of the oracle operator \(S_{f_L}\). Let \(t_L\) be the ratio of the time cost of \(S_{f_L}\) per list element of L to that of a Grover iteration. The second step is the action of the diffusion operator \(A_2S_0A_2^{-1}\).
Proposition 6
Proposition 7
The algorithm also requires the classical resource \(S_c^{\mathrm{CNS}}=O\left( N^{1/5} (S_q^\mathrm{CNS})^{1/5}\right) \). Once the constant \(t_L\) is determined, the time–space complexity of the CNS algorithm can be derived from this tradeoff curve.
4 Depth–qubit cost metric
Universal quantum computers are capable of carrying out elementary logic operations such as Pauli X, Hadamard, CNOT, and T; see [13] for details on quantum gates. Every cryptographic operation in this paper is implemented using only these gates. One may think of this restriction as a quantum version of software implementation in classical computing. The quantum security of symmetric cryptosystems can then be estimated in units of elementary logic gates.
It is generally known that each elementary gate has a different physical implementation time. Considering various aspects of quantum computing, we suggest simplifying the measure of computation time and ignoring all other factors or gates that complicate the analysis of quantum algorithms.
Two primary resources in quantum computing, circuit depth and qubits, can be exchanged to meet certain attack design criteria. The time–space complexity investigated in the previous section can be used to assign an ‘efficiency’ attribute to every design. To further quantify the depth–qubit complexity and to be able to rank efficiency, we briefly cover the time–space tradeoff of quantum resources in this section.
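As an illustration of how a tradeoff relation can rank designs, assume the Grover-type relation \(T_q^2S_q\approx cN\) obtained with inner parallelization. If one iteration of a design has Toffoli-depth d on q logical qubits, the total depth \(D=T_q d\) and total qubit count \(Q=S_q q\) satisfy \(D^2Q\approx cNd^2q\), so a smaller \(d^2q\) means a more efficient design at every point of the curve. The designs and numbers below are hypothetical and serve only to show the comparison.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Design:
    name: str
    depth: int    # Toffoli-depth of one Grover iteration (hypothetical values)
    qubits: int   # logical qubits of one machine (hypothetical values)

    def tradeoff_coefficient(self) -> int:
        # If T_q^2 * S_q = c * N, then D = T_q * depth and Q = S_q * qubits
        # satisfy D^2 * Q = c * N * depth^2 * qubits.
        return self.depth ** 2 * self.qubits

def rank(designs: list[Design]) -> list[Design]:
    """Most efficient (smallest depth^2 * qubits) first."""
    return sorted(designs, key=Design.tradeoff_coefficient)

candidates = [
    Design("fewer qubits", depth=200, qubits=900),
    Design("lower depth", depth=120, qubits=2000),
    Design("balanced", depth=150, qubits=1300),
]
for d in rank(candidates):
    print(f"{d.name:13s} depth^2*qubits = {d.tradeoff_coefficient():.3e}")
```

In this made-up comparison the lower-depth design wins even though it uses more than twice the qubits of the first one, because depth enters the coefficient squared.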
4.1 Cost measure
Difficulties often arise in setting quantum complexity measures that are physically interpretable. A number of factors complicate this, for example, the different architectures pursued by different experimental groups. A qubit or a certain gate may cost differently in each architecture. It is therefore hardly possible to assess the operational time of each type of gate accurately in general and to estimate the overall run time. Despite the notable difficulty in quantifying the basic unit cost of quantum computation, a number of groups have attempted to estimate algorithm costs in various applications [5, 6, 7]. The cost metric varies depending on the authors’ viewpoint. For example, those considering fault-tolerant computation would estimate the cost involving specific hardware implementations or error-correction schemes. On the other hand, those not imposing constraints on hardware or error-correction schemes would estimate the cost in logical qubits and gates. The latter approach is adopted in this work. Readers should keep in mind that this approach ignores the overheads introduced by fault tolerance.^{7}
A high-level circuit description of the Grover iteration involves not only elementary gates but also larger gates such as C\(^k\)NOT. It is very unlikely that such gates can be operated directly on any realistic universal quantum computer, so decomposing them into smaller gates is required for practical estimates.
Determining the unit time cost is a subtle matter. We would like to point out that the simplest, yet justified, time cost measure involves the Toffoli gate.
Definition 6
A unit of quantum computational time cost is the time required to operate a nonparallelizable logical Toffoli gate.
In other words, the Toffoli-depth is the time cost of the algorithm. We look into its justification in Sect. 4.3.
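Definition 6 can be made operational with a simple scheduler. The sketch below is our own simplification: Toffoli gates (labelled "ccx") take one time unit, all other gates (such as "cx" or "h") take none but still order the Toffolis that follow them, and gates are scheduled as early as possible. The gate labels are hypothetical names, not a specific library's format.

```python
from typing import Iterable, Sequence

Gate = tuple[str, Sequence[int]]   # (gate label, qubits it acts on)

def toffoli_depth(circuit: Iterable[Gate], n_qubits: int) -> int:
    """Number of non-parallelizable Toffoli layers under as-early-as-possible
    scheduling; non-Toffoli gates are free but propagate dependencies."""
    level = [0] * n_qubits                   # Toffoli layers finished on each qubit
    for name, qubits in circuit:
        start = max(level[q] for q in qubits)
        end = start + 1 if name == "ccx" else start
        for q in qubits:
            level[q] = end
    return max(level, default=0)

parallel = [("ccx", (0, 1, 2)), ("ccx", (3, 4, 5))]
serial = [("ccx", (0, 1, 2)), ("ccx", (2, 3, 4))]
linked = [("ccx", (0, 1, 2)), ("cx", (2, 5)), ("h", (6,)), ("ccx", (5, 6, 7))]
print(toffoli_depth(parallel, 6), toffoli_depth(serial, 5), toffoli_depth(linked, 8))
```

The third circuit shows why free gates must still propagate ordering: the CNOT carries the first Toffoli's output into the second, so the two Toffolis cannot share a time step.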
The space cost is estimated as the total number of logical qubits required to perform the quantum search algorithm.
Definition 7
Quantum computational space cost is the number of logical qubits required to run the entire circuit.
 1.
Data qubits are the qubits whose space is searched by the quantum search algorithm. For example, in AES-128, the key space has size \(2^{128}\), which requires 128 data qubits.
 2.
Work qubits are initialized qubits those assist certain operation. Whether it stays in an initialized value or gets written depends on the operation.
 3.
Garbage qubits are previously initialized work qubits, which then get written unwanted information after a certain operation.
 4.
Output qubits are previously initialized work qubits, which then get written the output information of a certain operation.
 5.
Oracle qubit is a single qubit used for phase kickback (sign change) in oracle and diffusion operators.
4.2 Time–space tradeoff
Readers who are familiar with the quantum circuit model can safely skip this subsection, as it covers some general facts about the depth–qubit tradeoff. In the quantum circuit model, it is often possible to sacrifice efficiency in qubits for better performance in time, and vice versa. The quantum version of such a time–space tradeoff forms the main body of Sects. 5 and 6. As a preliminary, we give an example that introduces the general concept of tradeoff in quantum circuits.
Consider a function f that computes the product of k single-bit values. At the end of this subsection we deal with general k, but for now let us explicitly write down the description for \(k=2\), the product of two bits a and b, \(f(a, b) = ab\), which a single Toffoli gate computes onto a zeroed target qubit.
Similarly, the product of four bits can be computed with a C\(^4\)-NOT gate, as shown in Fig. 4b. A C\(^4\)-NOT gate applies a NOT operation to the target bit if \(a=b=c=d=1\) and does nothing otherwise.
A less straightforward decomposition can be found in Fig. 5b. It uses twice as many Toffoli gates as Fig. 5a but requires only a single work qubit in an arbitrary state.^{8} Similar to Eq. 17, ten Toffoli gates transform the input state into the output state.
Both designs work as desired. In fact, for general k, the time-efficient design as in Fig. 5a requires \(k-2\) zeroed work qubits with depth \(2k-3\), whereas the space-efficient design as in Fig. 5b uses only one arbitrary qubit with depth \(8k-24\) (for \(k \ge 5\)) [32]. We call the time- and space-efficient designs the lower-depth and less-qubit C\(^k\)-NOT, respectively.
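The lower-depth construction can be checked on classical basis states, because a circuit made only of Toffoli gates acts as a reversible Boolean circuit. The sketch below builds one standard compute, flip, uncompute ladder with \(k-2\) zeroed work qubits (the qubit indexing is our own assumption and not necessarily the layout of Fig. 5a), verifies it exhaustively for small k, and evaluates the two cost formulas quoted above from [32].

```python
from itertools import product
from typing import List, Tuple

Toffoli = Tuple[int, int, int]  # (control, control, target)


def lower_depth_cknot(k: int) -> List[Toffoli]:
    """Toffoli ladder for C^k-NOT (k >= 3) using k-2 zeroed work qubits.

    Hypothetical layout: controls 0..k-1, work qubits k..2k-3, target 2k-2.
    Every gate shares a qubit with the next one, so the Toffoli-depth equals
    the gate count, 2k-3.
    """
    work = list(range(k, 2 * k - 2))
    target = 2 * k - 2
    compute = [(0, 1, work[0])]  # work[0] = c0 AND c1
    for i in range(2, k - 1):
        compute.append((i, work[i - 2], work[i - 1]))  # fold in control c_i
    final = (k - 1, work[-1], target)  # flip target iff all controls are 1
    return compute + [final] + compute[::-1]  # uncompute work qubits to zero


def run(gates: List[Toffoli], bits: List[int]) -> List[int]:
    """Apply Toffoli gates to a classical basis state."""
    bits = list(bits)
    for c1, c2, t in gates:
        bits[t] ^= bits[c1] & bits[c2]
    return bits


def is_correct(k: int) -> bool:
    """Exhaustively check: controls unchanged, work qubits clean, target flipped."""
    gates = lower_depth_cknot(k)
    for controls in product((0, 1), repeat=k):
        for t in (0, 1):
            out = run(gates, list(controls) + [0] * (k - 2) + [t])
            if out[:k] != list(controls) or any(out[k:-1]):
                return False
            if out[-1] != t ^ int(all(controls)):
                return False
    return True


def cknot_cost(k: int, design: str) -> Tuple[int, int]:
    """(Toffoli-depth, work qubits) for the two designs quoted from [32]."""
    if design == "lower-depth":
        return 2 * k - 3, k - 2
    if design == "less-qubit":
        if k < 5:
            raise ValueError("the 8k-24 count is stated for k >= 5")
        return 8 * k - 24, 1
    raise ValueError(f"unknown design: {design}")


print(len(lower_depth_cknot(4)), is_correct(4))  # 5 True
print(cknot_cost(256, "lower-depth"), cknot_cost(256, "less-qubit"))  # (509, 254) (2024, 1)
```

For k = 256 the two formulas give the Toffoli-depths 509 and 2024 with 254 and 1 work qubits, the values used for the hash comparison in Sect. 6.2.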
Bit multiplication is one example in which qubits and depth can be traded for each other. In Sects. 5 and 6 we compare multiple circuits that do the same job with different numbers of qubits, and examine the consequences of each design when parallelized.
4.3 Remarks on Toffoli gate
The Toffoli gate plays an important role in this work, since it defines the basic unit of time. Some remarks on Toffoli gates are given below.
First, Toffoli gates together with single-qubit gates are universal [33, 34, 35]: any computation permitted by quantum mechanics can be implemented (to arbitrary accuracy) with these gates.
Second, circuits consisting only of Clifford gates offer no advantage over classical computing, since they can be simulated efficiently on a classical computer; the use of non-Clifford gates such as Toffoli is therefore essential for any quantum benefit [36, 37].
Third, logical Toffoli gates are expected to be the main time bottleneck in real applications [5, 38, 39, 40]. Interested readers are encouraged to refer to [39], where resources for quantum applications are counted in terms of Toffoli gates. To summarize their reasoning, presently envisioned quantum computing architectures will devote most of their performance to producing a special gate called the T gate [41]. The production or preparation of T gates is hardware-dependent, whereas the number of Toffoli gates (each of which consists of several T gates) is machine-independent and depends only on the algorithm, which justifies this choice of resource unit. A similar analysis showing that T gates are much more expensive than all other gates can be found in [5], where the ratio of the physical execution time of all Clifford gates to that of all T gates is about 0.0001 in breaking SHA-256. Because of the importance of T gates, there are research communities focusing on finding better implementations of T gates [41, 42, 43, 44] and on reducing the number of T gates applied [8, 45, 46, 47, 48]. Therefore, it is more transparent to connect the time complexity with Toffoli gates than with any other gates.
Typically, in previous studies a quantum algorithm is first implemented at the Toffoli level, and the circuit then undergoes a kind of ‘compilation’ process that produces an elementary-level circuit [5, 8]. Finding an optimal compiling method is complicated and worth researching in its own right [7]. At this stage, however, it is hardly possible to find a truly optimal elementary-level circuit by compiling a huge high-level circuit. In this work, therefore, we stay at the Toffoli-level implementation, in keeping with the purpose of providing a general framework.
5 Complexity of AES-128 Key Search
The idea of applying Grover’s algorithm to an exhaustive key search on AES-128 is as follows. A superposition of all \(2^{128}\) input keys, encoded in 128 data qubits, is fed as input to AES-\(\mathcal{C}\), shown in Fig. 7. AES-\(\mathcal{C}\) contains a reversible circuit implementation of the AES-128 encryption algorithm. It encrypts the given plaintext and outputs superposed ciphertexts encoded in output qubits. The superposed ciphertexts are then compared with the given ciphertext via a C\(^{128}\)-NOT gate to mark the target. After marking, every qubit except the oracle qubit is passed on to AES-\(\mathcal{C}\) Reverse to disentangle the data qubits from the other qubits.
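The oracle-plus-diffusion structure can be illustrated with a classical statevector simulation at toy scale. The sketch below assumes a hypothetical 8-bit cipher `toy_encrypt` (not AES) with an 8-bit key, so the search runs over \(2^8\) basis states. The oracle is modeled only by its net effect, a sign flip on every key whose ciphertext matches, which is what remains once the encryption circuit, the comparison with phase kickback, and the reverse circuit have run and the work qubits are restored.

```python
import math
from typing import Tuple


def toy_encrypt(key: int, plaintext: int) -> int:
    """Hypothetical 8-bit toy cipher; a bijection in `key` for a fixed plaintext."""
    return ((plaintext ^ key) * 167 + 13) % 256


def grover_key_search(plaintext: int, ciphertext: int) -> Tuple[int, float, int]:
    n = 256  # 8 data qubits
    amps = [1 / math.sqrt(n)] * n  # equal superposition of all keys
    iterations = math.floor(math.pi / 4 * math.sqrt(n))
    for _ in range(iterations):
        # Oracle: encrypt, compare with the known ciphertext, uncompute.
        # Only the phase kickback on matching keys survives.
        amps = [-a if toy_encrypt(k, plaintext) == ciphertext else a
                for k, a in enumerate(amps)]
        # Diffusion operator: inversion about the mean amplitude.
        mean = sum(amps) / n
        amps = [2 * mean - a for a in amps]
    probs = [a * a for a in amps]
    best = max(range(n), key=probs.__getitem__)
    return best, probs[best], iterations


secret = 0xA7
pt = 0x3C
key, prob, its = grover_key_search(pt, toy_encrypt(secret, pt))
print(hex(key), round(prob, 4), its)
```

Because `toy_encrypt` is a bijection in the key for a fixed plaintext, there is exactly one target, and after twelve iterations almost all of the probability sits on the secret key. A real AES-128 attack differs in that one plaintext–ciphertext pair need not determine the key uniquely.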
5.1 Circuit implementation cost
AES-128 encryption internally performs SubBytes, MixColumns, ShiftRows, AddRoundKey, SubWord, RotWord, and Rcon. Quantum circuits for these operations are mostly adopted from [6], with improvements and fixes.
AddRoundKey and Rcon are XORs with fixed-size strings; like the linear operations ShiftRows, RotWord, and MixColumns, they can be realized efficiently with CNOT or X gates only.
SubBytes and SubWord are the only operations that require Toffoli gates and work qubits. Since SubBytes and SubWord consist of 16 and 4 S-boxes, respectively, the S-box is the only operation that needs careful discussion.
Classically, the S-box can be implemented as a lookup table. However, a quantum counterpart of such a table would involve the quantum memory mentioned in Sect. 2.3. Therefore, in this work the S-box is realized by explicitly computing the multiplicative inverse followed by an affine transformation, as described in Sect. 3.2.1 of [6].
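The S-box arithmetic can be checked classically before any circuit is built. The sketch below computes \(\alpha^{254}\), which equals \(\alpha^{-1}\) for \(\alpha \ne 0\) and maps 0 to 0, using four multiplications interleaved with squarings, and then applies the AES affine transformation of FIPS 197. The multiplication chain is our own illustration, not the reversible sequence of Fig. 8, where intermediate registers must also be uncomputed, so the multiplication counts are not directly comparable.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:  # reduce as soon as a degree-8 term appears
            a ^= 0x11B
        b >>= 1
    return product


def gf_sq(a: int) -> int:
    return gf_mul(a, a)


def gf_inv(a: int) -> int:
    """Return a^254, i.e. a^-1 for a != 0 and 0 for a == 0."""
    a3 = gf_mul(gf_sq(a), a)                    # a^3
    a7 = gf_mul(gf_sq(a3), a)                   # a^7
    a63 = gf_mul(gf_sq(gf_sq(gf_sq(a7))), a7)   # a^56 * a^7 = a^63
    a127 = gf_mul(gf_sq(a63), a)                # a^127
    return gf_sq(a127)                          # a^254


def affine(b: int) -> int:
    """AES affine map on a byte: b ^ rotl1 ^ rotl2 ^ rotl3 ^ rotl4 ^ 0x63."""
    def rotl(x: int, s: int) -> int:
        return ((x << s) | (x >> (8 - s))) & 0xFF
    return b ^ rotl(b, 1) ^ rotl(b, 2) ^ rotl(b, 3) ^ rotl(b, 4) ^ 0x63


def sbox(a: int) -> int:
    return affine(gf_inv(a))


print(hex(sbox(0x00)), hex(sbox(0x01)), hex(sbox(0x53)))  # 0x63 0x7c 0xed
```

The printed values match the FIPS 197 S-box entries for 0x00, 0x01, and 0x53.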
The resource estimate of quantum AES-128 encryption thus reduces to estimating the cost of finding the multiplicative inverse of an element \(\alpha \) in GF(\(2^8\)). In [6], the multiplicative inverse of \(\alpha \) is calculated using two arithmetic circuits: Maslov et al.’s modular multiplier [51] and in-place squaring [6]. In this work we found a slight modification of the previous method that uses seven multipliers, which we verified by quantum circuit simulation with matrix product states [52]. We visualize the sequence in a simplified notation; for example, one such symbol means that CNOT gates are used to copy the string in the first eight-bit register to the third register.
The entire sequence is given in Fig. 8, where each ket represents an eight-bit register, and Sq and Mul denote modular squaring and multiplication, respectively. As Fig. 8 shows, only seven multipliers are used. Almazrooie et al. also proposed a design for the multiplicative inverse [4]. We briefly compare the existing circuit designs in Table 1.
As squaring in GF(\(2^8\)) is linear, it requires neither Toffoli gates nor work qubits. Therefore, only the cost of the multipliers needs to be estimated. Table 2 summarizes the costs of elementary operations in AES-128. Two distinct multipliers are considered in this work: Maslov et al.’s design [51] and Kepley and Steinwandt’s design [53].
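In characteristic 2, \((a+b)^2 = a^2 + b^2\), so squaring acts linearly on the eight coefficient bits of an element of GF(\(2^8\)). A minimal sketch, assuming the AES reduction polynomial \(x^8+x^4+x^3+x+1\): it builds the \(8 \times 8\) binary matrix of squaring column by column, checks that the matrix reproduces squaring, and checks that it is invertible, which is what permits an in-place, CNOT-only circuit. How the matrix is then synthesized into a specific CNOT sequence is not shown here.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


# Column j of the GF(2) matrix is the square of the basis element x^j.
SQ_COLUMNS = [gf_mul(1 << j, 1 << j) for j in range(8)]


def square_linear(a: int) -> int:
    """Square `a` as a matrix-vector product over GF(2): XOR the selected columns."""
    out = 0
    for j in range(8):
        if (a >> j) & 1:
            out ^= SQ_COLUMNS[j]
    return out


# The matrix form agrees with direct squaring on every byte...
assert all(square_linear(a) == gf_mul(a, a) for a in range(256))
# ...and is a bijection, so an in-place circuit of CNOT gates can realize it.
print(len({square_linear(a) for a in range(256)}))  # 256
print([f"{c:08b}" for c in SQ_COLUMNS])
```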
Table 1. Comparison of circuit designs for finding the multiplicative inverse

| | Grassl et al.’s [6] | Almazrooie et al.’s [4] | This work |
|---|---|---|---|
| Number of qubits | 40 | 48 | 40 |
| Number of multiplications | 8 | 7 | 7 |
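Table 2. Costs of elementary operations in AES-128

| | Multiplier (less-qubit) | S-box (less-qubit) | Multiplier (lower-depth) | S-box (lower-depth) |
|---|---|---|---|---|
| Toffoli-depth | 18 | 126 | 8 | 56 |
| Work qubits | 8 | 32 | 27 | 108 |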
5.2 Design candidates
Four main tradeoff points are considered. The first point, which affects the overall design, is whether the key schedule and the AES rounds are carried out in parallel. As the S-box is used in both the key schedule and the AES rounds, a parallel schedule–round implementation requires more work qubits. This option is denoted by serial/parallel schedule-round.
Second, AES round functions can be reversed in the middle of the encryption process to save work qubits. The idea of reverse AES rounds was suggested in Sect. 3.2.3 of [6]. Since each run of the round function produces garbage qubits, running 10 rounds forward accumulates \(\ge 1280\) garbage qubits. Interleaving reverse rounds with forward rounds saves a large number of work qubits at the cost of a longer Toffoli-depth. This option is denoted by reverse round when applied.
Third, the choice of multiplier is an important tradeoff point, with less-qubit and lower-depth multipliers as the two options. For simplicity, we do not consider mixing both multipliers, although efficiency could be improved by using the appropriate multiplier in different parts of the circuit. This option is denoted by less-qubit/lower-depth multiplier.
Fourth, to obtain an extremely depth-optimized circuit design, the cleaning process in the S-box can be skipped, leaving every work qubit used in the S-box as garbage. This option is denoted by S-box uncleaning when applied.

AES-\(\mathcal{C}_1\): Serial schedule-round, reverse round, less-qubit multiplier
AES-\(\mathcal{C}_2\): Serial schedule-round, reverse round, lower-depth multiplier
AES-\(\mathcal{C}_3\): Parallel schedule-round, less-qubit multiplier
AES-\(\mathcal{C}_4\): Parallel schedule-round, lower-depth multiplier
AES-\(\mathcal{C}_5\): Parallel schedule-round, less-qubit multiplier, S-box uncleaning
AES-\(\mathcal{C}_6\): Parallel schedule-round, lower-depth multiplier, S-box uncleaning.
5.3 Comparison
Table 3. Costs of the AES-128 encryption circuit and of the entire attack circuit on a single quantum processor

| Design | AES-128 Toffoli-depth | AES-128 qubits | Grover Toffoli-depth | Grover qubits |
|---|---|---|---|---|
| AES-\(\mathcal{C}_1\) | 11088 | 984 | \(1.360\ldots \times 2^{78}\) | 985 |
| AES-\(\mathcal{C}_2\) | 4928 | 3017 | \(1.290\ldots \times 2^{77}\) | 3018 |
| AES-\(\mathcal{C}_3\) | 1260 | 2208 | \(1.405\ldots \times 2^{75}\) | 2209 |
| AES-\(\mathcal{C}_4\) | 560 | 7148 | \(1.510\ldots \times 2^{74}\) | 7149 |
| AES-\(\mathcal{C}_5\) | 720 | 6654 | \(1.808\ldots \times 2^{74}\) | 6655 |
| AES-\(\mathcal{C}_6\) | 320 | 21854 | \(1.064\ldots \times 2^{74}\) | 21855 |
Table 4. Comparison of time–space complexity of different AES-128 circuit designs

| | AES-\(\mathcal{C}_1\) | AES-\(\mathcal{C}_2\) | AES-\(\mathcal{C}_6\) | AES-\(\mathcal{C}_5\) | AES-\(\mathcal{C}_3\) | AES-\(\mathcal{C}_4\) |
|---|---|---|---|---|---|---|
| \(c_\#^{\mathrm{KS}}/c_4^{\mathrm{KS}}\) | \(28.606\ldots\) | \(19.705\ldots\) | \(1.519\ldots\) | \(1.333\ldots\) | \(1.070\ldots\) | 1 |
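The ratios in Table 4 can be reproduced from the Grover columns of Table 3 if the tradeoff coefficient is taken to be the squared single-processor Toffoli-depth times the qubit count, \(D^2 Q\). This form is our inference from the tabulated numbers, not a restatement of the paper’s tradeoff equation. It is the quantity that stays fixed when a Grover search is split over \(S\) processors, since each then needs depth about \(D/\sqrt{S}\) while the total qubit count grows to \(SQ\). The labels "AES-C1" and so on are plain-text stand-ins for AES-\(\mathcal{C}_1\) and the other designs.

```python
# Single-processor Grover costs from Table 3, written as
# (mantissa, exponent, qubits) with Toffoli-depth = mantissa * 2**exponent.
TABLE3 = {
    "AES-C1": (1.360, 78, 985),
    "AES-C2": (1.290, 77, 3018),
    "AES-C3": (1.405, 75, 2209),
    "AES-C4": (1.510, 74, 7149),
    "AES-C5": (1.808, 74, 6655),
    "AES-C6": (1.064, 74, 21855),
}


def coefficient(mantissa: float, exponent: int, qubits: int) -> float:
    """Assumed tradeoff coefficient D^2 * Q (invariant under splitting over S machines)."""
    depth = mantissa * 2.0 ** exponent
    return depth * depth * qubits


def relative_coefficients(table, reference):
    ref = coefficient(*table[reference])
    return {name: coefficient(*row) / ref for name, row in table.items()}


ratios = relative_coefficients(TABLE3, "AES-C4")
for name, r in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.3f}")
```

The recomputed values agree with Table 4 to within about 0.2%, consistent with the truncated three-digit mantissas, and reproduce its ordering from AES-\(\mathcal{C}_1\) down to AES-\(\mathcal{C}_4\).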
5.4 Comparison to ensured single target
Table 5. Comparison of attack designs with and without a single target

| Design | AES-128 Toffoli-depth | AES-128 qubits | Grover Toffoli-depth | Grover qubits |
|---|---|---|---|---|
| Unique Key | 560 | 14296 | \(1.269\ldots \times 2^{74}\) | 14297 |
| AES-\(\mathcal{C}_4\) | 560 | 7148 | \(1.510\ldots \times 2^{74}\) | 7149 |
It is now natural to ask whether an oracle operator with a single target is more cost-efficient than the random-function oracle, which uses fewer qubits. Assuming that \(r=2\) guarantees a single target, we compare a design dubbed Unique Key with AES-\(\mathcal{C}_4\). The encryption circuit of Unique Key is chosen to be the same as that of AES-\(\mathcal{C}_4\), so the difference in efficiency comes solely from ensuring a single target. Results are summarized in Table 5. The full Toffoli-depth of Unique Key is estimated by considering \(I_{1}\) in Eq. 4. With a guaranteed single target, the Toffoli-depth is shortened compared with AES-\(\mathcal{C}_4\), at the cost of doubling the number of qubits. Although ensuring a single target can be regarded as an optimization when a single processor is used, it is strictly not an option in a parallel attack, since inner parallelization removes the penalty of the random characteristics, as in Eq. 8. Finally, regarding Almazrooie et al.’s design [4], we note that the authors reduce the number of qubits at the cost of lengthening the oracle circuit.^{9} However, both previous designs result in inefficient tradeoff curves.
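If the tradeoff coefficient is read as the squared single-processor Toffoli-depth times the qubit count, \(D^2 Q\) (an assumption we infer from how the ratios in Table 4 follow from Table 3, not a formula stated in this section), the two rows of Table 5 can be compared directly. The sketch below shows that Unique Key is about 16% shallower on a single processor, yet with twice the qubits its coefficient comes out roughly 1.4 times that of AES-\(\mathcal{C}_4\).

```python
# Grover rows of Table 5: (Toffoli-depth mantissa, exponent, qubits).
UNIQUE_KEY = (1.269, 74, 14297)
AES_C4 = (1.510, 74, 7149)


def coefficient(mantissa: float, exponent: int, qubits: int) -> float:
    """Assumed tradeoff coefficient: (Toffoli-depth)^2 * qubits."""
    depth = mantissa * 2.0 ** exponent
    return depth * depth * qubits


depth_ratio = UNIQUE_KEY[0] / AES_C4[0]  # same exponent, so mantissas suffice
qubit_ratio = UNIQUE_KEY[2] / AES_C4[2]
coef_ratio = coefficient(*UNIQUE_KEY) / coefficient(*AES_C4)
print(f"depth x{depth_ratio:.3f}, qubits x{qubit_ratio:.3f}, D^2*Q x{coef_ratio:.3f}")
```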
6 Complexity of SHA-256 Preimage Search
6.1 Circuit implementation cost
SHA-256 internally performs five elementary operations: \(\sigma _{0(1)}\), \(\Sigma _{0(1)}\), Ch, Maj, and ADDER (modular addition) [27].
Among the internal operations of SHA-256, \(\Sigma _{0(1)}\) consists only of XORs of bit permutations. The results of three ROTR operations are successively XORed onto a 32-bit output register, so the implementation involves only CNOT gates, with 32 work qubits.
Similarly, \(\sigma _{0(1)}\) is implemented with one difference from \(\Sigma _{0(1)}\), namely SHR. Unlike ROTR, SHR is not a bit permutation: it discards bits and is therefore not invertible, so it cannot be applied in place. Writing the result of SHR onto the 32-bit output register is nevertheless possible. Therefore, \(\sigma _{0(1)}\) is also efficiently realized by CNOT gates with 32 work qubits.
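Linearity over GF(2) is what lets these four functions be computed with CNOT gates onto a fresh 32-bit register. The sketch below implements them from their FIPS 180-4 definitions [27], spot-checks linearity on random words, and counts one CNOT per nonzero entry of the corresponding \(32 \times 32\) binary matrix. That is a naive fan-out count chosen for illustration, not necessarily the gate count of the circuits used in this work. The last line shows that SHR, unlike ROTR, maps distinct inputs to the same output.

```python
import random

MASK = 0xFFFFFFFF  # SHA-256 words are 32 bits


def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK


def shr(x: int, n: int) -> int:
    return x >> n


FUNCS = {
    "Sigma0": lambda x: rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22),
    "Sigma1": lambda x: rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25),
    "sigma0": lambda x: rotr(x, 7) ^ rotr(x, 18) ^ shr(x, 3),
    "sigma1": lambda x: rotr(x, 17) ^ rotr(x, 19) ^ shr(x, 10),
}


def is_linear(f, trials: int = 1000) -> bool:
    """Spot-check f(x ^ y) == f(x) ^ f(y) on random 32-bit words."""
    rng = random.Random(0)
    for _ in range(trials):
        x, y = rng.getrandbits(32), rng.getrandbits(32)
        if f(x ^ y) != f(x) ^ f(y):
            return False
    return True


def naive_cnot_count(f) -> int:
    """One CNOT per 1-entry of the matrix: input bit j feeds output bit i."""
    return sum(bin(f(1 << j)).count("1") for j in range(32))


for name, f in FUNCS.items():
    print(name, is_linear(f), naive_cnot_count(f))
# Sigma0 True 96 / Sigma1 True 96 / sigma0 True 93 / sigma1 True 86

print(shr(1, 3) == shr(0, 3))  # True: SHR is not injective
```

The shifted-out bits are why \(\sigma_{0(1)}\) needs slightly fewer CNOTs than \(\Sigma_{0(1)}\) in this naive count.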
Ch and Maj are bitwise operations that do require Toffoli gates. We adopt Amy et al.’s design, in which Ch and Maj require one and two Toffoli gates, respectively; see Figs. 4 and 5 in [5].
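One way to see where one and two AND operations per bit come from is to rewrite the Boolean definitions: \(\mathrm{Ch}(x,y,z) = z \oplus x(y \oplus z)\) and \(\mathrm{Maj}(x,y,z) = xy \oplus z(x \oplus y)\). Each AND becomes a Toffoli gate and each XOR a CNOT, and since the 32 bit positions act on disjoint qubits they can run in parallel. These identities are illustrative only; they are not necessarily the exact circuits of Amy et al. [5]. The sketch below checks them on all eight input combinations.

```python
from itertools import product


def ch(x: int, y: int, z: int) -> int:
    """FIPS 180-4 Ch on single bits: take y where x = 1, else z."""
    return (x & y) ^ ((1 ^ x) & z)


def maj(x: int, y: int, z: int) -> int:
    """FIPS 180-4 Maj on single bits: majority vote."""
    return (x & y) ^ (x & z) ^ (y & z)


def ch_one_and(x: int, y: int, z: int) -> int:
    return z ^ (x & (y ^ z))  # one AND -> one Toffoli per bit


def maj_two_and(x: int, y: int, z: int) -> int:
    return (x & y) ^ (z & (x ^ y))  # two ANDs -> two Toffolis per bit


bits = list(product((0, 1), repeat=3))
print(all(ch(*b) == ch_one_and(*b) for b in bits))    # True
print(all(maj(*b) == maj_two_and(*b) for b in bits))  # True
```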
Table 6. Costs of elementary operations in SHA-256

| | ADDER (poly-depth) | ADDER (log-depth) | \(\sigma _{0(1)}\), \(\Sigma _{0(1)}\) | Ch | Maj |
|---|---|---|---|---|---|
| Toffoli-depth | 61 | 22 | 0 | 1 | 2 |
| Work qubits | 1 | 53 | 32 | 32 | 32 |
6.2 Design candidates
Three optimization points are considered. The first point, which affects the overall design, is whether the message schedule and the round functions are carried out in parallel. Figure 10 shows a serial circuit implementation of SHA-256. In the algorithm description, the \(i\)th round function is fed by the \(i\)th word from the schedule, so a parallel implementation is possible if enough work qubits are given. This option is denoted by serial/parallel schedule-round.
The second point is which ADDER to use. The poly-depth ADDER is better at saving work space, whereas the log-depth ADDER shortens the execution time. For simplicity, we do not consider mixing both ADDERs, although efficiency could be improved by using the appropriate ADDER in different parts of the circuit. This option is denoted by poly-depth/log-depth ADDER.
Lastly, one can now choose how many work qubits to use in implementing the C\(^{256}\)-NOT gate that marks the targets (hash comparison). As discussed in Sect. 4.2, a C\(^k\)-NOT gate can itself be a tradeoff point. In AES-128, however, we do not need to treat C\(^{128}\)-NOT seriously as an optimization point, since the encryption process comes with enough work qubits that can be reused in a lower-depth C\(^{128}\)-NOT gate. The situation is different in SHA-256. The hashing process of SHA-256 does not involve as many work qubits as AES-128, so the lower-depth C\(^{256}\)-NOT gate cannot be implemented unless more qubits are introduced solely for the hash comparison. The Toffoli-depth and the number of work qubits required for the lower-depth (less-qubit) C\(^{256}\)-NOT gate are 509 (2024) and 254 (1), respectively. Note that the lower-depth and less-qubit C\(^{k}\)-NOT gates presented here are only two extreme example designs. This option is denoted by less-qubit/lower-depth C\(^{256}\)-NOT.

SHA-\(\mathcal{C}_1\): Serial schedule-round, poly-depth ADDER, less-qubit C\(^{256}\)-NOT
SHA-\(\mathcal{C}_2\): Serial schedule-round, log-depth ADDER, less-qubit C\(^{256}\)-NOT
SHA-\(\mathcal{C}_3\): Serial schedule-round, log-depth ADDER, lower-depth C\(^{256}\)-NOT
SHA-\(\mathcal{C}_4\): Parallel schedule-round, poly-depth ADDER, less-qubit C\(^{256}\)-NOT
SHA-\(\mathcal{C}_5\): Parallel schedule-round, log-depth ADDER, less-qubit C\(^{256}\)-NOT
SHA-\(\mathcal{C}_6\): Parallel schedule-round, log-depth ADDER, lower-depth C\(^{256}\)-NOT.
6.3 Comparison
Table 7. Costs of the SHA-256 hashing circuit and of the entire attack circuit on a single quantum processor

| Design | SHA-256 Toffoli-depth | SHA-256 qubits | Grover Toffoli-depth | Grover qubits |
|---|---|---|---|---|
| SHA-\(\mathcal{C}_1\) | 36368 | 801 | \(1.586\ldots \times 2^{143}\) | 802 |
| SHA-\(\mathcal{C}_2\) | 13280 | 853 | \(1.227\ldots \times 2^{142}\) | 854 |
| SHA-\(\mathcal{C}_3\) | 13280 | 853 | \(1.163\ldots \times 2^{142}\) | 1023 |
| SHA-\(\mathcal{C}_4\) | 27584 | 834 | \(1.216\ldots \times 2^{143}\) | 835 |
| SHA-\(\mathcal{C}_5\) | 10112 | 938 | \(1.919\ldots \times 2^{141}\) | 939 |
| SHA-\(\mathcal{C}_6\) | 10112 | 938 | \(1.792\ldots \times 2^{141}\) | 1023 |
Table 8. Comparison of tradeoff coefficients of different SHA-256 circuit designs

| | SHA-\(\mathcal{C}_1\) | SHA-\(\mathcal{C}_4\) | SHA-\(\mathcal{C}_3\) | SHA-\(\mathcal{C}_2\) | SHA-\(\mathcal{C}_5\) | SHA-\(\mathcal{C}_6\) |
|---|---|---|---|---|---|---|
| \(c_\#^\mathrm{PS}/c_6^\mathrm{PS}\) | \(9.830\ldots\) | \(6.015\ldots\) | \(1.685\ldots\) | \(1.565\ldots\) | \(1.053\ldots\) | 1 |
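The ratios in Table 8 can be reproduced from the Grover columns of Table 7 by assuming that the tradeoff coefficient is the squared single-processor Toffoli-depth times the qubit count, \(D^2 Q\). This is our inference from the tabulated values, not a restatement of the paper’s tradeoff equation. The labels "SHA-C1" and so on stand in for SHA-\(\mathcal{C}_1\) and the other designs.

```python
# Single-processor Grover costs from Table 7: (mantissa, exponent, qubits),
# with Toffoli-depth = mantissa * 2**exponent.
TABLE7 = {
    "SHA-C1": (1.586, 143, 802),
    "SHA-C2": (1.227, 142, 854),
    "SHA-C3": (1.163, 142, 1023),
    "SHA-C4": (1.216, 143, 835),
    "SHA-C5": (1.919, 141, 939),
    "SHA-C6": (1.792, 141, 1023),
}


def coefficient(mantissa: float, exponent: int, qubits: int) -> float:
    """Assumed tradeoff coefficient: (Toffoli-depth)^2 * qubits."""
    depth = mantissa * 2.0 ** exponent  # about 1e43, squared still fits a float
    return depth * depth * qubits


ref = coefficient(*TABLE7["SHA-C6"])
ratios = {name: coefficient(*row) / ref for name, row in TABLE7.items()}
for name, r in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.3f}")
```

Under this reading, the lower-depth C\(^{256}\)-NOT lowers the coefficient for the parallel schedule-round designs (SHA-\(\mathcal{C}_6\) versus SHA-\(\mathcal{C}_5\)) but raises it for the serial ones (SHA-\(\mathcal{C}_3\) versus SHA-\(\mathcal{C}_2\)), matching the ordering in Table 8.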
7 Complexity of SHA-256 Collision Finding
The costs of two collision-finding algorithms, GwDP and CNS, are estimated in this section. We adopt SHA-\(\mathcal{C}_6\), which also turns out to be the most efficient in time–space complexity for both the GwDP and CNS algorithms.^{10}
7.1 GwDP algorithm
Table 9. Costs of the GwDP algorithm for various numbers of machines

| \(S_q\) | Toffoli-depth | Qubits |
|---|---|---|
| \(2^2\) | \(1.986\ldots \times 2^{141}\) | 4084 |
| \(2^4\) | \(1.985\ldots \times 2^{139}\) | 16272 |
| \(2^8\) | \(1.984\ldots \times 2^{135}\) | 258304 |
| \(2^{16}\) | \(1.981\ldots \times 2^{127}\) | \(6.508\ldots \times 10^7\) |
| \(2^{32}\) | \(1.975\ldots \times 2^{111}\) | \(4.127\ldots \times 10^{12}\) |
| \(2^{64}\) | \(1.963\ldots \times 2^{79}\) | \(1.732\ldots \times 10^{22}\) |
7.2 CNS algorithm
Table 10. Parameter values and costs of the CNS algorithm for various numbers of machines

| \(S_q\) | \(l\) | \(d\) | \(t_L\) | Toffoli-depth | Qubits |
|---|---|---|---|---|---|
| \(2^2\) | \(55.155\ldots\) | 97 | \(0.015064\ldots\) | \(1.353\ldots \times 2^{116}\) | 3756 |
| \(2^4\) | \(55.558\ldots\) | 98 | \(0.014987\ldots\) | \(1.203\ldots \times 2^{115}\) | 15024 |
| \(2^8\) | \(56.364\ldots\) | 99 | \(0.014834\ldots\) | \(1.729\ldots \times 2^{112}\) | 240384 |
| \(2^{16}\) | \(57.976\ldots\) | 102 | \(0.014527\ldots\) | \(1.960\ldots \times 2^{107}\) | \(6.154\ldots \times 10^{7}\) |
| \(2^{32}\) | \(61.201\ldots\) | 109 | \(0.013914\ldots\) | \(1.352\ldots \times 2^{98}\) | \(4.033\ldots \times 10^{12}\) |
| \(2^{64}\) | \(67.654\ldots\) | 121 | \(0.012692\ldots\) | \(1.100\ldots \times 2^{79}\) | \(1.732\ldots \times 10^{22}\) |
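Reading the GwDP and CNS tables side by side is easier in log scale. The sketch below only transcribes the Toffoli-depth columns (no new estimates) and computes, for each \(S_q\), the depth of each algorithm in bits, the gap between them, and \(\log_2\) of GwDP depth times \(S_q\). Over the listed range that product stays near \(2^{144}\), so GwDP depth falls roughly as \(1/S_q\); CNS is shallower at every listed \(S_q\), with the gap shrinking to under one bit at \(S_q = 2^{64}\).

```python
import math

# (log2 S_q, Toffoli-depth mantissa, exponent) transcribed from the tables above.
GWDP = [(2, 1.986, 141), (4, 1.985, 139), (8, 1.984, 135),
        (16, 1.981, 127), (32, 1.975, 111), (64, 1.963, 79)]
CNS = [(2, 1.353, 116), (4, 1.203, 115), (8, 1.729, 112),
       (16, 1.960, 107), (32, 1.352, 98), (64, 1.100, 79)]


def log2_depth(mantissa: float, exponent: int) -> float:
    return exponent + math.log2(mantissa)


def compare():
    """Rows of (log2 S_q, GwDP bits, CNS bits, gap in bits, log2(GwDP depth * S_q))."""
    rows = []
    for (s, mg, eg), (s2, mc, ec) in zip(GWDP, CNS):
        assert s == s2, "tables must list the same machine counts"
        g, c = log2_depth(mg, eg), log2_depth(mc, ec)
        rows.append((s, g, c, g - c, g + s))
    return rows


for s, g, c, gap, product_log in compare():
    print(f"S_q=2^{s:<2} GwDP 2^{g:6.2f}  CNS 2^{c:6.2f}  gap {gap:5.2f} bits  "
          f"GwDP depth*S_q 2^{product_log:.2f}")
```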
8 Security strengths of AES and SHA-2
Table 11. Tradeoff coefficients of AES-k key search for \(k\in \{128,192,256\}\) and SHA-m collision finding for \(m\in \{256,384,512\}\). Coefficients \(c_k^{\mathrm{KS}}\) and \(c_m^{\mathrm{CF}}\) are divided by their respective minimal values \(c_{128}^{\mathrm{KS}} = c_4^{\mathrm{KS}}\) and \(c_{256}^{\mathrm{CF}} = c^{\mathrm{GwDP}}\)

| | AES-128 | AES-192 | AES-256 |
|---|---|---|---|
| \(c_k^\mathrm{KS}/c_{128}^\mathrm{KS}\) | 1 | \(1.560\ldots\) | \(2.586\ldots\) |

| | SHA-256 | SHA-384 | SHA-512 |
|---|---|---|---|
| \(c_m^{\mathrm{CF}}/c_{256}^\mathrm{CF}\) | 1 | \(3.837\ldots\) | \(3.940\ldots\) |
Resource estimates for AES-128 key search with circuit AES-\(\mathcal{C}_4\) are extended to AES-192 and AES-256; similarly, those for SHA-256 collision finding with circuit SHA-\(\mathcal{C}_6\) are applied to SHA-384 and SHA-512. Since the depth–qubit tradeoff curves, Eqs. 20 and 22, must hold for larger key and message digest sizes, we only compare their tradeoff coefficients in Table 11. The coefficients tend to grow as the key or message digest size grows. The increasing values reflect the various complexity factors added: more rounds, longer schedules, larger word size, and so on. In particular, for the hash functions, the message block size of SHA-384 is double that of SHA-256, which leads to the large gap between \(c_{256}^{\mathrm{CF}}\) and \(c_{384}^{\mathrm{CF}}\). In contrast, \(c_{384}^{\mathrm{CF}}\) and \(c_{512}^{\mathrm{CF}}\) differ little, as the SHA-384 and SHA-512 algorithms are identical except for truncation and initial values. The result of Sect. 7.2 is also extended to SHA-384 and reflected in Fig. 11.
Figure 11 summarizes the results, which can be interpreted as another threshold for the security strength classification of proposed schemes in the NIST PQC standardization process.
9 Summary
Instead of the conventional query complexity, we have examined the time–space complexity of Grover’s algorithm and its variants. Three categories of cryptographic search problems and their characteristics have been carefully considered in conjunction with the probabilistic nature of quantum search algorithms.
To relate the time–space complexity to a physical quantity, we have proposed a way of quantifying the computational power of quantum computers. Despite its simplicity, counting the number of sequential Toffoli gates yields a reliable time complexity for estimating the security levels of symmetric cryptosystems. With this simplified cost measure, one can estimate the quantum complexity of a cryptosystem concisely by counting (and focusing on) only the relevant operations. It is worth noting that this scheme applies generally to quantum resource estimates in symmetric cryptanalysis.
The scheme has been applied to resource estimates for AES and SHA-2. When multiple quantum tradeoff options are available, the time–space complexity provides clear criteria for telling which is more efficient. Based on the tradeoff observations made for AES and SHA-2, the security strengths of the respective systems have been investigated under the MAXDEPTH assumption.
Footnotes
 1.
See also ‘S. Fluhrer, Reassessing Grover’s algorithm, http://eprint.iacr.org/2017/811,’ which analyzed parallelizability of Grover’s algorithm in conjunction with cryptosystems.
 2.
Interested readers are referred to ‘Moody, D.: Let’s get ready to rumble—the NIST PQC “competition.” PQCrypto 2018 invited presentation (2018).’
 3.
 4.
In trivial cases, the rounding function is not written explicitly in this paper, for simplicity.
 5.
There are two reasons. One is that fixed-point search requires two oracle queries per iteration; the other is the \(\log (2/\delta )\) factor in Eq. 3 of [17], which also increases the required number of iterations depending on the bounding parameter \(\delta \). Comparing these factors with the overhead introduced into our method by random statistics, we concluded that the fixed-point algorithm is not favorable.
 6.
The first and the last rounds are different, but will not be covered in detail here.
 7.
Fault-tolerant costs can in general be huge, but we expect the conversion from logical to fault-tolerant costs to be more or less uniform.
 8.
The first Toffoli gate in Fig. 5b is redundant in this case, but it is needed if one wants to compute \(z \oplus abcd\), where z is the initial value of the last qubit.
 9.
Compared with [6], the circuit is about three times longer but requires one-third of the qubits.
 10.
Details of the circuit comparisons for the GwDP and CNS algorithms are omitted from the main text. An interesting point is that SHA-\(\mathcal{C}_5\) has a small range of advantage, \(S_q < 2^{8}\), over SHA-\(\mathcal{C}_6\). The reason is that while SHA-\(\mathcal{C}_5\) requires no additional qubits for the hash comparison, SHA-\(\mathcal{C}_6\) needs \((256-d-2)\) qubits for the comparison, where d is the number of fixed bits in a distinguished point. Since d grows as \(S_q\) increases, a crossover point occurs. Note also that SHA-\(\mathcal{C}_6\) does not fit Proposition 5 exactly, for the same reason, but the deviation is small.
Notes
Acknowledgements
We are grateful to Brandon Langenberg, Martin Rötteler, and Rainer Steinwandt for helpful discussion and sharing details of their previous work which has motivated us.
References
1. Bernstein, D.J., Lange, T.: Post-quantum cryptography. Nature 549(7671), 188–194 (2017)
2. Grover, L.K.: Quantum mechanics helps in searching for a needle in a haystack. Phys. Rev. Lett. 79(2), 325–328 (1997)
3. Bernstein, D.J.: Cost analysis of hash collisions: Will quantum computers make SHARCS obsolete? In: SHARCS ’09, pp. 105–116 (2009)
4. Almazrooie, M., Samsudin, A., Abdullah, R., Mutter, K.N.: Quantum reversible circuit of AES-128. Quantum Inf. Process. 17(5), 112 (2018)
5. Amy, M., Di Matteo, O., Gheorghiu, V., Mosca, M., Parent, A., Schanck, J.: Estimating the cost of generic quantum preimage attacks on SHA-2 and SHA-3. In: SAC 2016, pp. 317–337. ISBN: 9783319694535 (2017)
6. Grassl, M., Langenberg, B., Rötteler, M., Steinwandt, R.: Applying Grover’s algorithm to AES: quantum resource estimates. In: PQCrypto 2016, pp. 29–43. ISBN: 9783319293608 (2016)
7. Parent, A., Rötteler, M., Svore, K.M.: Reversible circuit compilation with space constraints. arXiv preprint arXiv:1510.00377 (2015)
8. Rötteler, M., Naehrig, M., Svore, K.M., Lauter, K.: Quantum resource estimates for computing elliptic curve discrete logarithms. In: ASIACRYPT 2017, pp. 241–270. ISBN: 9783319706979 (2017)
9. Schwabe, P., Westerbaan, B.: Solving binary \(\cal{MQ}\) with Grover’s algorithm. In: SPACE 2016, pp. 303–322. ISBN: 9783319494456 (2016)
10. NIST: Post-quantum cryptography—call for proposals (2017). https://csrc.nist.gov/Projects/PostQuantumCryptography/PostQuantumCryptographyStandardization/CallforProposals
11. Chailloux, A., Naya-Plasencia, M., Schrottenloher, A.: An efficient quantum collision search algorithm and implications on symmetric cryptography. In: ASIACRYPT 2017, pp. 211–240. ISBN: 9783319706979 (2017)
12. Bernstein, D.J., Buchmann, J., Dahmen, E.: Post-Quantum Cryptography, 1st edn. Springer, Berlin (2008)
13. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information, 10th Anniversary edn. Cambridge University Press, Cambridge (2010)
14. Boyer, M., Brassard, G., Høyer, P., Tapp, A.: Tight bounds on quantum searching. Fortschritte der Physik 46(4–5), 493–505 (1998)
15. Zalka, C.: Grover’s quantum searching algorithm is optimal. Phys. Rev. A 60(4), 2746–2751 (1999)
16. Bernstein, D.J., Yang, B.Y.: Asymptotically faster quantum algorithms to solve multivariate quadratic equations. In: PQCrypto 2018, pp. 487–506. ISBN: 9783319790633 (2018)
17. Yoder, T.J., Low, G.H., Chuang, I.L.: Fixed-point quantum search with an optimal number of queries. Phys. Rev. Lett. 113(21), 210501 (2014)
18. Brassard, G., Høyer, P., Mosca, M., Tapp, A.: Quantum amplitude amplification and estimation. AMS Contemp. Math. 305, 53–74 (2002)
19. Brassard, G., Høyer, P., Tapp, A.: Quantum cryptanalysis of hash and claw-free functions. In: LATIN ’98, pp. 163–169. ISBN: 9783540697152 (1998)
20. Hosoyamada, A., Sasaki, Y., Xagawa, K.: Quantum multicollision-finding algorithm. In: ASIACRYPT 2017, pp. 179–210. ISBN: 9783319706979 (2017)
21. Arunachalam, S., Gheorghiu, V., Jochym-O’Connor, T., Mosca, M., Srinivasan, P.V.: On the robustness of bucket brigade quantum RAM. New J. Phys. 17(12), 123010 (2015)
22. Shenvi, N., Kempe, J., Whaley, K.B.: Quantum random-walk search algorithm. Phys. Rev. A 67(5), 052307 (2003)
23. Childs, A.M., Goldstone, J.: Spatial search by quantum walk. Phys. Rev. A 70(2), 022314 (2004)
24. Ambainis, A.: Quantum walk algorithm for element distinctness. SIAM J. Comput. 37(1), 210–239 (2007)
25. van Oorschot, P.C., Wiener, M.J.: Parallel collision search with cryptanalytic applications. J. Cryptol. 12(1), 1–28 (1999)
26. NIST: Advanced Encryption Standard (AES), FIPS PUB 197 (2001)
27. NIST: Secure Hash Standard (SHS), FIPS PUB 180-4 (2015)
28. Pavlidis, A., Gizopoulos, D.: Fast quantum modular exponentiation architecture for Shor’s factoring algorithm. Quantum Inf. Comput. 14(7 & 8), 649–682 (2014)
29. Flajolet, P., Odlyzko, A.M.: Random mapping statistics. In: EUROCRYPT ’89, pp. 329–354. Springer, Berlin Heidelberg. ISBN: 9783540468851 (1990)
30. Katz, J., Lindell, Y.: Introduction to Modern Cryptography, 2nd edn. Chapman & Hall/CRC, London (2007)
31. Häner, T., Rötteler, M., Svore, K.M.: Factoring using \(2n+2\) qubits with Toffoli based modular multiplication. Quantum Inf. Comput. 17(7 & 8), 673 (2017)
32. Barenco, A., Bennett, C.H., Cleve, R., DiVincenzo, D.P., Margolus, N., Shor, P., Sleator, T., Smolin, J.A., Weinfurter, H.: Elementary gates for quantum computation. Phys. Rev. A 52, 3457–3467 (1995)
33. Deutsch, D.: Quantum computational networks. Proc. R. Soc. Lond. A 425(1868), 73–90 (1989)
34. Shi, Y.: Both Toffoli and controlled-NOT need little help to do universal quantum computing. Quantum Inf. Comput. 3(1), 84–92 (2003)
35. Toffoli, T.: Reversible computing. In: de Bakker, J., van Leeuwen, J. (eds.) Automata, Languages and Programming, pp. 632–644. Springer, Berlin Heidelberg (1980)
36. Gottesman, D.: The Heisenberg representation of quantum computers. In: GP22: ICGTMP ’98, pp. 32–43 (1998)
37. Aaronson, S., Gottesman, D.: Improved simulation of stabilizer circuits. Phys. Rev. A 70(5), 052328 (2004)
38. Fowler, A.G., Mariantoni, M., Martinis, J.M., Cleland, A.N.: Surface codes: towards practical large-scale quantum computation. Phys. Rev. A 86(3), 032324 (2012)
39. Jones, N.C., Van Meter, R., Fowler, A.G., McMahon, P.L., Kim, J., Ladd, T.D., Yamamoto, Y.: Layered architecture for quantum computing. Phys. Rev. X 2(3), 031007 (2012)
40. O’Gorman, J., Campbell, E.T.: Quantum computation with realistic magic-state factories. Phys. Rev. A 95(3), 032338 (2017)
41. Bravyi, S., Kitaev, A.: Universal quantum computation with ideal Clifford gates and noisy ancillas. Phys. Rev. A 71(2), 022316 (2005)
42. Bravyi, S., Haah, J.: Magic-state distillation with low overhead. Phys. Rev. A 86(5), 052329 (2012)
43. Jones, C.: Multilevel distillation of magic states for quantum computing. Phys. Rev. A 87(4), 042305 (2013)
44. Meier, A.M., Eastin, B., Knill, E.: Magic-state distillation with the four-qubit code. Quantum Inf. Comput. 13(3–4), 195–209 (2013)
45. Amento, B., Rötteler, M., Steinwandt, R.: Efficient quantum circuits for binary elliptic curve arithmetic: reducing T-gate complexity. Quantum Inf. Comput. 13(7–8), 631–644 (2013)
46. Amy, M., Maslov, D., Mosca, M., Rötteler, M.: A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 32(6), 818–830 (2013)
47. Bocharov, A., Svore, K.M.: Resource-optimal single-qubit quantum circuits. Phys. Rev. Lett. 109(19), 190501 (2012)
48. Gosset, D., Kliuchnikov, V., Mosca, M., Russo, V.: An algorithm for the T-count. Quantum Inf. Comput. 14(15–16), 1261–1276 (2014)
49. Beth, T., Rötteler, M.: Quantum algorithms: applicable algebra and quantum physics. In: Alber, G. (ed.) Quantum Information: An Introduction to Basic Theoretical Concepts and Experiments, pp. 96–150. Springer, Berlin Heidelberg (2001)
50. Patel, K.N., Markov, I.L., Hayes, J.P.: Optimal synthesis of linear reversible circuits. Quantum Inf. Comput. 8(3), 282–294 (2008)
51. Maslov, D., Mathew, J., Cheung, D., Pradhan, D.K.: An O\((m^2)\)-depth quantum algorithm for the elliptic curve discrete logarithm problem over GF\((2^m)\). Quantum Inf. Comput. 9(7), 610–621 (2009)
52. Vidal, G.: Efficient classical simulation of slightly entangled quantum computations. Phys. Rev. Lett. 91(14), 147902 (2003)
53. Kepley, S., Steinwandt, R.: Quantum circuits for \(\mathbb{F}_{2^n}\)-multiplication with subquadratic gate count. Quantum Inf. Process. 14(7), 2373–2386 (2015)
54. Cuccaro, S.A., Draper, T.G., Kutin, S.A., Moulton, D.P.: A new quantum ripple-carry addition circuit. arXiv preprint arXiv:quant-ph/0410184 (2004)
55. Draper, T.G., Kutin, S.A., Rains, E.M., Svore, K.M.: A logarithmic-depth quantum carry-lookahead adder. Quantum Inf. Comput. 6(4 & 5), 351–369 (2006)
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.