On post-processing in the quantum algorithm for computing short discrete logarithms

We revisit the quantum algorithm for computing short discrete logarithms that was recently introduced by Ekerå and Håstad. By carefully analyzing the probability distribution induced by the algorithm, we show its success probability to be higher than previously reported. Inspired by our improved understanding of the distribution, we propose an improved post-processing algorithm that is considerably more efficient, enables better tradeoffs to be achieved, and requires fewer runs, than the original post-processing algorithm. To prove these claims, we construct a classical simulator for the quantum algorithm by sampling the probability distribution it induces for given logarithms. This simulator is in itself a key contribution. We use it to demonstrate that Ekerå–Håstad achieves an advantage over Shor, not only in each individual run, but also overall, when targeting cryptographically relevant instances of RSA and Diffie–Hellman with short exponents.

short DLP. The order r may be assumed to be known or unknown. Both cases are cryptographically relevant.

Earlier works
In 2016 Ekerå [4] introduced a modified version of Shor's algorithm [16,17] for computing discrete logarithms that is more efficient than Shor's original algorithm when the logarithm is short. This work was originally motivated by the use of short discrete logarithms in instantiations of cryptographic schemes based on the computational intractability of the DLP in finite fields. A concrete example is the use of short exponents in the Diffie-Hellman key exchange protocol when instantiated with safe-prime groups.
This work was subsequently generalized by Ekerå and Håstad [5] so as to enable tradeoffs between the number of times that the algorithm needs to be executed, and the requirements it imposes on the quantum computer. These ideas parallel earlier ideas by Seifert [15] for making tradeoffs in Shor's order finding algorithm, the quantum part of Shor's general factoring algorithm.
Ekerå and Håstad furthermore explained how the RSA integer factoring problem may be expressed as a short DLP. This gives rise to a new algorithm for factoring RSA integers that does not rely directly on order-finding, and that imposes fewer requirements on the quantum computer in each run compared to Shor's or Seifert's general factoring algorithms.

Background on quantum cost metrics
The hard part when implementing Shor's algorithms and their various derivatives in practice is to exponentiate group elements in superposition. The algorithms of Seifert, Ekerå and Ekerå-Håstad reduce the requirements on the quantum computer compared to Shor by reducing the exponent length in each run.
In virtually all practical implementations, an exponent length reduction translates into a reduction in the number of controlled group operations that need to be evaluated quantumly. This in turn translates into a reduction in the number of logical qubit operations, the circuit depth, the runtime and the required coherence time. The number of logical qubits required is typically not reduced, however. This is because control qubits can be recycled [10,11] in implementations.
When accounting for the need for quantum error correction, the number of physical qubits required may nevertheless be reduced as a result of reducing the exponent length, and it is physical rather than logical qubit counts that matter in practice. To understand why physical qubits may be saved, note that quantum error correcting codes, such as the surface code (see [6] for a good overview), use an array of physical qubits to construct each logical qubit. In simple terms, the fewer the operations that need to be performed on a logical qubit, for some fixed maximum tolerable probability of uncorrectable errors arising, the fewer the physical qubits needed to construct said logical qubit.
Quantum spacetime volume, the product of the runtime and the number of physical qubits required, is sometimes used as a complexity metric in the literature. For the reasons explained above, a reduction in the exponent length typically gives rise to a reduction in the spacetime volume in each run, and in some cases overall.

Our contributions
To compute an m bit short discrete logarithm d, the quantum algorithm of Ekerå and Håstad exponentiates group elements in superposition to m + 2m/s bit exponents for s ≥ 1 an integer. Given a set of s good outputs, the classical post-processing algorithm in [5] recovers d by enumerating vectors in an s + 1-dimensional lattice. The probability of observing a good output in a run is lower bounded by 1/8 in [5], so we expect to find s good outputs after at most about 8s runs. As good outputs cannot be efficiently recognized, d is recovered in [5] by exhaustively post-processing all subsets of s outputs from a set of about 8s outputs.
We refer to s as the tradeoff factor, as it controls the tradeoff between the number of runs required and the exponent length in each run. To minimize the exponent length we would like to select s large. However, the fact that the post-processing complexity grows exponentially in s limits the achievable tradeoff. Furthermore, the fact that we expect to have to perform about 8s runs implies that an overall reduction in the number of group operations performed quantumly may not be achieved, even if the number of operations in each run is reduced.
In this work, we analyze and capture the probability distribution induced by the quantum algorithm of Ekerå and Håstad. This analysis allows us to tightly estimate the probability of observing a good output, and to show that it is significantly higher than the lower bound of 1/8 in [5] guarantees.
More importantly, the fact that we can capture the probability distribution induced by the quantum algorithm of Ekerå and Håstad for known d enables us to classically simulate the quantum algorithm by sampling the distribution. This simulator is in itself a key contribution. Inspired by our improved understanding of the distribution, we design an improved and considerably more efficient classical post-processing algorithm. We then use the simulator to heuristically estimate the number of runs required to solve for d for different tradeoff factors, and to practically verify the heuristic estimates by post-processing simulated outputs.
With our new post-processing and analysis, Ekerå-Håstad's algorithm achieves an advantage over Shor's algorithms, not only in each individual run, but also overall, when targeting cryptographically relevant instances of RSA and Diffie-Hellman with short exponents. When making tradeoffs, Ekerå-Håstad's algorithm with our new post-processing furthermore outperforms Seifert's algorithm in practice when factoring RSA integers. To the best of our knowledge, this makes Ekerå-Håstad's algorithm the currently most efficient polynomial time quantum algorithm for breaking the RSA cryptosystem [12], and DLP-based cryptosystems such as Diffie-Hellman [3] when instantiated with safe-prime groups with short exponents, as in TLS [7], IKE [8] and standards such as NIST SP 800-56A [2].
For example, compared to Shor's original algorithms, Ekerå-Håstad can achieve a reduction in the number of group operations that are evaluated quantumly in each run, of a factor 6.1 or 14.2 for FF-DH-2048 in safe-prime groups with 224 bit exponents, and of a factor 1.35 or 3.6 for RSA-2048, depending on whether tradeoffs are made. When not making tradeoffs, a single run of Ekerå-Håstad in general suffices. For other options and technical details, please see Appendix A.
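The per-run factors quoted for FF-DH-2048 can be reproduced with a few lines of arithmetic. The Python sketch below is only an illustration under stated assumptions: it approximates 2 log_2 r by twice the 2048-bit modulus length, it takes the exponent length to be m + 2ℓ with ℓ = ⌈m/s⌉, and the tradeoff factor s = 7 is a hypothetical choice made here because it matches the quoted factor, not a value stated in the text. The function names are likewise illustrative.

```python
def exponent_length(m, s):
    """Total exponent length m + 2*ell in bits, with ell = ceil(m / s)."""
    ell = -(-m // s)  # integer ceiling of m / s (the rounding is an assumption)
    return m + 2 * ell


def per_run_advantage(order_bits, m, s):
    """Quotient of 2*log2(r), approximated by 2*order_bits, and m + 2*ell."""
    return 2 * order_bits / exponent_length(m, s)


# FF-DH-2048 in a safe-prime group with 224-bit exponents.
no_tradeoff = per_run_advantage(2048, 224, 1)
with_tradeoff = per_run_advantage(2048, 224, 7)  # s = 7 is an illustrative choice
print(round(no_tradeoff, 1), round(with_tradeoff, 1))
```

The same helper can be evaluated for other m and s to see how quickly the exponent length approaches m as s grows.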
Both RSA and Diffie-Hellman are widely deployed cryptosystems. It is important to understand the complexity of attacking these systems quantumly to project the remaining lifetime of confidential information protected by these systems, and to inform migration strategies. This paper helps further this understanding.

Overview
The remainder of this paper is structured as follows: We recall the quantum algorithm of Ekerå and Håstad in Sect. 2 and we analyze the probability distribution it induces in Sects. 3-4. In particular, we derive a closed form expression for the probability of observing a given output.
In Sect. 5 we use the closed form expression to generate a high resolution histogram for the probability distribution, and describe how it may be sampled to classically simulate the quantum algorithm, for known d. In Sect. 6 we introduce our improved post-processing algorithm. We furthermore use the simulator to heuristically estimate and experimentally verify the number of runs required to solve for d. We summarize and conclude the paper in Sect. 7. Concrete cost estimates for RSA and Diffie-Hellman are given in Appendix A.

Notation
Before proceeding, we introduce notation used throughout this paper:
- $u \bmod n$ denotes $u$ reduced modulo $n$ constrained to $0 \le u \bmod n < n$.
- $\{u\}_n$ denotes $u$ reduced modulo $n$ constrained to $-n/2 \le \{u\}_n < n/2$.
- $\lceil u \rceil$, $\lfloor u \rfloor$ and $\lceil u \rfloor$ denote $u$ rounded up, down and to the closest integer.
- $u \ll v$ is used to denote that $u$ is less than $v$ by some order of magnitude.
- $u \sim v$ is used to denote that $u$ and $v$ are approximately of similar size.

The quantum algorithm
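Recall that given a generator $g$ of a finite cyclic group of order $r$ and $x = [d]\,g$, where $d$ is such that $0 < d < 2^m \ll r$, the quantum algorithm of Ekerå and Håstad in [5] computes the logarithm $d = \log_g x$ by inducing the system
$$\frac{1}{2^{2\ell+m}} \sum_{a,\,j=0}^{2^{\ell+m}-1} \; \sum_{b,\,k=0}^{2^{\ell}-1} \exp\left[ \frac{2\pi i}{2^{\ell+m}} \left( aj + 2^m bk \right) \right] \left|\, j, k, [a - bd]\,g \,\right\rangle \quad (1)$$
where $\ell \sim m/s$ is an integer for $s \ge 1$ the tradeoff factor. Observing (1) yields a pair $(j, k)$ for $j$ and $k$ integers on the intervals $0 \le j < 2^{\ell+m}$ and $0 \le k < 2^{\ell}$, respectively, and some group element $y = [e]\,g \in G$.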
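
The probability distribution
The above analysis implies that the probability $P$ of observing a given pair $(j, k)$ is determined by its argument $\alpha$ or, equivalently, by its angle $\theta$, where
$$\alpha(j, k) = \{dj + 2^m k\}_{2^{\ell+m}} \quad \text{and} \quad \theta(\alpha) = \frac{2\pi\alpha}{2^{\ell+m}}$$
and the probability
$$P(\theta) = \frac{1}{2^{2(2\ell+m)}} \sum_e \left| \sum_{b = b_0(e)}^{b_1(e)-1} e^{i\theta b} \right|^2 = \frac{1}{2^{2(2\ell+m)}} \sum_e \zeta(\theta, \#b(e)),$$
where, for each $e$, the inner sum is over the $\#b(e) = b_1(e) - b_0(e)$ consecutive values of $b$ for which $e = a - bd$ has a solution $a$ on $0 \le a < 2^{\ell+m}$, and where $\zeta(\theta, \#b)$ denotes the squared modulus of a geometric sum of $\#b$ consecutive powers of $e^{i\theta}$.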

The summation interval for a given e
Recall that $e = a - bd$ where $0 < d < 2^m$, $0 \le a < 2^{\ell+m}$ and $0 \le b < 2^{\ell}$. This implies that $e$ is an integer on the interval $-(2^{\ell} - 1)d \le e < 2^{\ell+m}$. Divide the interval for $e$ into three regions, and denote these regions A, B and C, respectively. Define the middle region B to be the region in $e$ where $\#b(e) = 2^{\ell}$ or equivalently $0 = b_0(e) \le b < b_1(e) = 2^{\ell}$. Then by (2) region B spans the interval $0 \le e < 2^{\ell+m} - (2^{\ell} - 1)d$. This is easy to see, as for all $b$ on the interval $0 \le b < 2^{\ell}$ there must exist an $a$ on $0 \le a < 2^{\ell+m}$ such that $e = a - bd$. It follows that region A to the left of B spans $-(2^{\ell} - 1)d \le e < 0$ and that region C to the right of B spans $2^{\ell+m} - (2^{\ell} - 1)d \le e < 2^{\ell+m}$. Regions A and C are hence both of length $(2^{\ell} - 1)d$. Regions A and C are furthermore both divided into $2^{\ell} - 1$ plateaus of length $d$, as there is, for each multiple of $d$ that is subtracted from $e$ in these regions, one fewer value of $b$ for which there exists an $a$ such that $e = a - bd$. The situation that arises is depicted in Fig. 1.
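The region and plateau structure described above can be checked by brute force for toy parameters. The Python sketch below counts, for each e, the number of b for which a suitable a exists, and compares the count with the plateau description. The parameter values m = 4, ℓ = 3 and d = 11, as well as the function names, are illustrative assumptions and not taken from the paper.

```python
from collections import Counter


def count_b(m, ell, d):
    """Map each e to #b(e): the number of b on 0 <= b < 2^ell for which
    some a on 0 <= a < 2^(ell+m) satisfies e = a - b*d."""
    counts = Counter()
    for a in range(2 ** (ell + m)):
        for b in range(2 ** ell):
            counts[a - b * d] += 1  # each (a, b) contributes one b to its e
    return counts


def expected_count(e, m, ell, d):
    """#b(e) as predicted by the description of regions A, B and C."""
    top = 2 ** (ell + m)
    width = (2 ** ell - 1) * d
    if -width <= e < 0:                      # region A: plateaus of length d
        return 2 ** ell - (-e + d - 1) // d
    if 0 <= e < top - width:                 # region B: all b admissible
        return 2 ** ell
    if top - width <= e < top:               # region C: plateaus of length d
        return 2 ** ell - 1 - (e - (top - width)) // d
    return 0                                 # e outside the interval


m, ell, d = 4, 3, 11  # toy parameters with 0 < d < 2^m
counts = count_b(m, ell, d)
print(all(counts[e] == expected_count(e, m, ell, d) for e in range(-100, 200)))
```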

The probability of observing (j, k) summed over all e
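We may now sum $\zeta(\theta, \#b(e))$ over all $e$ to express
$$P(\theta) = \frac{1}{2^{2(2\ell+m)}} \sum_e \zeta(\theta, \#b(e))$$
in closed form by first using that if the sum is split into partial sums over the regions A, B and C, respectively, it follows from the previous two sections that the sums over regions A and C must yield the same contribution.
Furthermore, region B is rectangular, and each plateau in regions A and C is rectangular. Closed form expressions for the contribution from these rectangular regions may be trivially derived. This implies that
$$\sum_e \zeta(\theta, \#b(e)) = \left( 2^{\ell+m} - (2^{\ell} - 1)d \right) \zeta(\theta, 2^{\ell}) + 2d \sum_{\#b = 1}^{2^{\ell}-1} \zeta(\theta, \#b)$$
so the probability of observing the pair $(j, k)$ over all $e$ is
$$P(\theta) = \frac{1}{2^{2(2\ell+m)}} \left[ \left( 2^{\ell+m} - (2^{\ell} - 1)d \right) \zeta(\theta, 2^{\ell}) + 2d \sum_{\#b = 1}^{2^{\ell}-1} \zeta(\theta, \#b) \right].$$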
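As a sanity check on the split into regions A, B and C, the probability may be evaluated numerically for toy parameters and summed over all pairs, which should give one. The Python sketch below assumes that ζ(θ, #b) is the squared modulus of a geometric sum of #b unit phases, which is consistent with the bound ζ(0, #b(e)) ≤ 2^{2ℓ} used later but is not spelled out in this section. The parameters m = 3, ℓ = 2, d = 5 and all function names are illustrative.

```python
import cmath
import math


def zeta(theta, count):
    """Assumed form of zeta: |sum_{b=0}^{count-1} exp(i*theta*b)|^2."""
    return abs(sum(cmath.exp(1j * theta * b) for b in range(count))) ** 2


def centered_mod(u, n):
    """{u}_n: u reduced modulo n, constrained to -n/2 <= {u}_n < n/2."""
    return (u + n // 2) % n - n // 2


def probability(j, k, m, ell, d):
    """P(theta) for the pair (j, k), using the split into regions A, B, C."""
    modulus = 2 ** (ell + m)
    alpha = centered_mod(d * j + 2 ** m * k, modulus)
    theta = 2 * math.pi * alpha / modulus
    region_b = (modulus - (2 ** ell - 1) * d) * zeta(theta, 2 ** ell)
    plateaus = 2 * d * sum(zeta(theta, c) for c in range(1, 2 ** ell))
    return (region_b + plateaus) / 2 ** (2 * (2 * ell + m))


m, ell, d = 3, 2, 5  # toy parameters with 0 < d < 2^m
total = sum(probability(j, k, m, ell, d)
            for j in range(2 ** (ell + m)) for k in range(2 ** ell))
print(total)  # expected to be one up to floating-point error
```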

The distribution of pairs (j, k) with argument α
We now proceed to analyze the distribution and multiplicity of angles θ, or equivalently of arguments α, to complete our analysis.

Definition 1
Let κ denote the greatest integer such that $2^{\kappa}$ divides $d$.

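Claim 1 All admissible arguments $\alpha = \{dj + 2^m k\}_{2^{\ell+m}}$ are multiples of $2^{\kappa}$.
Proof As $2^{\kappa} \mid d < 2^m$ and the modulus is a power of two the claim follows.

Lemma 1
Each multiple $\alpha$ of $2^{\kappa}$ on $-2^{\ell+m-1} \le \alpha < 2^{\ell+m-1}$ is an admissible argument, and it is yielded by $2^{\ell+\kappa}$ distinct pairs $(j, k)$.
Proof For each such $\alpha$ and each $k$ on $0 \le k < 2^{\ell}$, the congruence $\alpha \equiv dj' + 2^m k \pmod{2^{\ell+m}}$ has a unique solution in $j'$ with $0 \le j' < 2^{\ell+m-\kappa}$. For any such $j'$, there are $2^{\kappa}$ values of $j$ on $0 \le j < 2^{\ell+m}$ with $j \equiv j' \pmod{2^{\ell+m-\kappa}}$. Hence there are $2^{\ell+\kappa}$ pairs $(j, k)$ with $\alpha = \{dj + 2^m k\}_{2^{\ell+m}}$. By Claim 1, these are the only admissible $\alpha$, and so the lemma follows.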
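The counting argument can be confirmed by exhaustive enumeration for small parameters. The Python sketch below tabulates the argument of every pair (j, k) and checks that exactly the multiples of 2^κ occur, each with multiplicity 2^{ℓ+κ}. The toy values m = 4, ℓ = 3 and d = 12 (so that κ = 2) and the function names are assumptions chosen for illustration.

```python
from collections import Counter


def centered_mod(u, n):
    """{u}_n: u reduced modulo n, constrained to -n/2 <= {u}_n < n/2."""
    return (u + n // 2) % n - n // 2


def argument_multiplicities(m, ell, d):
    """Map each argument alpha to the number of pairs (j, k) that yield it."""
    modulus = 2 ** (ell + m)
    return Counter(centered_mod(d * j + 2 ** m * k, modulus)
                   for j in range(modulus) for k in range(2 ** ell))


m, ell, d = 4, 3, 12                   # d = 2^2 * 3
kappa = (d & -d).bit_length() - 1      # greatest kappa with 2^kappa | d
multiplicities = argument_multiplicities(m, ell, d)
half = 2 ** (ell + m - 1)
print(set(multiplicities) == set(range(-half, half, 2 ** kappa)))
print(set(multiplicities.values()) == {2 ** (ell + kappa)})
```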

The probability of observing a pair (j, k) with argument α
It follows from Lemma 1 that the probability $\Phi(\alpha)$ of observing one of the pairs $(j, k)$ with argument $\alpha$ is $\Phi(\alpha) = N(\alpha) \cdot P(\theta(\alpha))$ where the multiplicity $N(\alpha) = 2^{\ell+\kappa}$ for admissible $\alpha$.

Understanding the probability distribution
In this section, we introduce the notion of t-good pairs, prove upper bounds on the probability of obtaining a t-good pair, and show how the pairs are distributed as a function of t to build an understanding for the distribution.
Definition 4 Let ρ(t) denote the probability of observing a t-good pair.

Corollary 1
The probability of observing a t-good $(j, k)$ for $t$ such that $|m - t| \le \Delta$, and $\Delta$ some positive integer, is lower-bounded by where we have used Lemma 2, and that the sum is a geometric series. Finally, the probability of observing a t-good $(j, k)$ such that $t = 0$ is
$$\rho(0) = N(0) \cdot P(0) \le 2^{\ell+\kappa} \cdot \frac{2^{\ell+m+1} \cdot 2^{2\ell}}{2^{2(2\ell+m)}} = 2^{\kappa - m + 1}$$
where we have used that $N(0) = 2^{\ell+\kappa}$, that the sum is over at most $2^{\ell+m+1}$ values of $e$, and that $\zeta(0, \#b(e)) \le 2^{2\ell}$, as in the proof of Lemma 2. Hence, the probability of observing a where we have used that ρ(t) sums to one, and so the corollary follows.
Recall that ρ(t) is a probability distribution, and hence normalized, whilst Lemma 2 states that ρ(t) is exponentially suppressed as $|m - t|$ grows large for $0 < t < \ell + m$. The probability mass is hence concentrated on the t-good pairs for $t \sim m$, except for artificially large κ as $t = 0$ then attracts a non-negligible fraction of the probability mass. This is formally stated in Corollary 1.
This implies that each pair $(j, k)$ observed yields $\sim \ell$ bits of information on $d$, as we expect $|\alpha(j, k)| = |\{dj + 2^m k\}_{2^{\ell+m}}| \sim 2^m$, whilst for random $(j, k)$ we expect $|\alpha(j, k)| \sim 2^{\ell+m}$. The situation that arises for different $d$ is visualized in Fig. 2.
In the region denoted A in Fig. 2, $d$ is odd and hence κ is zero. As $d$ decreases from $2^m - 1$ to $2^{m-1} + 1$, the probability mass is redistributed towards slightly smaller values of $t$. This reflects the fact that it becomes easier to solve for $d$ as $d$ decreases in size in relation to $2^m$. The redistributive effect weakens if $d$ is further decreased: The histogram in region B where $d = 2^{m-5} - 1$ is roughly representative of the distribution that arises when $d$ is smaller than $2^m$ by a few orders of magnitude. Further decreasing $d$ has no significant effect.
In the region denoted C in Fig. 2, $d$ is selected to be divisible by a great power of two to illustrate the redistribution that occurs when κ is close to maximal. As the admissible arguments are multiples of $2^{\kappa}$, it follows that $\rho(t) = 0$ for $0 < t < \kappa$, so the corresponding histogram entries are forced to zero and the probability mass redistributed. This effect is only visible for artificially large κ. It does not arise in cryptographic applications where $d$ may be presumed to be random.
Recall that [5] introduces the notion of a "good" pair $(j, k)$, which in our vocabulary is a pair $(j, k)$ such that $|\alpha(j, k)| \le 2^{m-2}$, and that it establishes a lower bound of 1/8 on the probability of observing a good pair. The quantum algorithm is therefore run 8s times and all subsets of s outputs post-processed.
Compared to the original analysis in [5], our new analysis is much more precise: It indicates that the probability of observing a good pair is significantly higher than the 1/8 bound guarantees. Empirically, it is at least 3/10 for random $d$. This immediately gives rise to an efficiency gain when exhausting subsets. More importantly, as is stated formally in Corollary 1, our new analysis shows that essentially all observed pairs $(j, k)$ are "somewhat good", in that they yield $\sim \ell$ bits of information on $d$. This fact enables us to altogether forego exhausting subsets, giving rise to the improved post-processing algorithm in Sect. 6.

Simulating the quantum algorithm
In this section, we combine results from the previous sections to construct a high-resolution histogram for the probability distribution for a given d. We furthermore describe how the histogram may be sampled to simulate the quantum algorithm.

Constructing the histogram
To construct the high-resolution histogram, we divide the argument axis into regions and subregions and integrate $\Phi(\alpha)$ numerically in each subregion.
For each subregion, we compute the probability mass contained within the subregion by applying Simpson's rule, followed by Richardson extrapolation to cancel the linear error term. Simpson's rule is hence applied $2^{\nu}(1 + 2)$ times in each region. Each application requires $\Phi(\alpha)$ to be evaluated in up to three points (the two endpoints and the midpoint). When evaluating $\Phi(\alpha)$, we divide the result by $2^{\kappa}$ to account for the density of admissible pairs.
Note that as the distribution is symmetric around zero, we need only compute one side in practice. Note furthermore that the above approach to constructing the histogram is only valid provided κ is not artificially large in relation to $m$, as we must otherwise account for which α are admissible. This is not a concern in cryptographic applications as $d$ may then be presumed random.
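The integration step can be sketched as follows. This Python fragment applies Simpson's rule once over a subregion and once over each of its two halves, and then combines the two estimates by Richardson extrapolation. The extrapolation weight 1/15 is the standard one for Simpson's rule and is an assumption here, as are the equal-width subregions and the function names; the paper does not specify these details in this section.

```python
import math


def simpson(f, a, b):
    """Simpson's rule on [a, b], using the two endpoints and the midpoint."""
    mid = (a + b) / 2
    return (b - a) / 6 * (f(a) + 4 * f(mid) + f(b))


def subregion_mass(f, a, b):
    """One coarse and two fine Simpson applications, then extrapolation."""
    mid = (a + b) / 2
    coarse = simpson(f, a, b)
    fine = simpson(f, a, mid) + simpson(f, mid, b)
    return fine + (fine - coarse) / 15  # standard Richardson weight (assumed)


def region_masses(f, a, b, nu):
    """Split the region [a, b] into 2^nu subregions and integrate f on each."""
    count = 2 ** nu
    step = (b - a) / count
    return [subregion_mass(f, a + i * step, a + (i + 1) * step)
            for i in range(count)]


# Stand-in integrand; in the paper the integrand would be the density of Phi.
masses = region_masses(math.sin, 0.0, math.pi, 3)
print(sum(masses))  # close to the exact integral, which is 2
```

Each subregion costs three applications of Simpson's rule, in line with the count of $2^{\nu}(1 + 2)$ applications per region.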

Sampling the distribution
To sample an argument α from the histogram for the distribution, we first sample a subregion and then sample α uniformly at random from the set of all admissible arguments in this subregion. To sample the subregion, we first order all subregions in the histogram by probability, and compute the cumulative probability up to and including each subregion in the resulting ordered sequence. Then, we sample a pivot uniformly at random from [0, 1) ⊂ R, and return the first subregion in the ordered sequence for which the cumulative probability is greater than or equal to the pivot. The sampling operation fails if the pivot is greater than the total cumulative probability.
To sample a pair ( j, k) from the distribution, we first sample an admissible argument α as described above, and then sample ( j, k) uniformly at random from the set of all integer pairs ( j, k) yielding α using Lemma 3 below.
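The subregion sampling procedure may be sketched as below. The Python fragment orders the subregions by probability, accumulates the probabilities, and returns the first subregion whose cumulative probability is at least the pivot, or None when the pivot exceeds the total mass covered by the histogram. Ordering in descending order is an assumption, as the text does not state the direction, and the labels and masses in the example are made up for illustration.

```python
import bisect
import itertools
import random


def build_sampler(masses):
    """masses maps a subregion label to its probability mass."""
    ordered = sorted(masses.items(), key=lambda item: item[1], reverse=True)
    labels = [label for label, _ in ordered]
    cumulative = list(itertools.accumulate(mass for _, mass in ordered))

    def sample(pivot=None):
        if pivot is None:
            pivot = random.random()  # uniform on [0, 1)
        # First index whose cumulative probability is >= pivot.
        index = bisect.bisect_left(cumulative, pivot)
        return labels[index] if index < len(labels) else None  # None: failure

    return sample


# Hypothetical histogram covering 90% of the probability mass.
sample = build_sampler({"inner": 0.5, "middle": 0.3, "outer": 0.1})
print(sample(0.45), sample(0.6), sample(0.85), sample(0.95))
```

Passing the pivot explicitly makes the behavior reproducible; omitting it draws the pivot at random as in the description above.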

Lemma 3
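The set of integer pairs $(j, k)$ on $0 \le j < 2^{\ell+m}$ and $0 \le k < 2^{\ell}$ that yield the admissible argument $\alpha$ is given by
$$(j, k) = \left( \left( \frac{\alpha - 2^m k}{2^{\kappa}} \left( \frac{d}{2^{\kappa}} \right)^{-1} + 2^{\ell+m-\kappa}\, t \right) \bmod 2^{\ell+m}, \; k \right)$$
as $t$ runs through all integers on $0 \le t < 2^{\kappa}$ and $k$ through all integers on $0 \le k < 2^{\ell}$.
Proof As $\alpha \equiv dj + 2^m k \pmod{2^{\ell+m}}$, it follows by solving for $j$ as in Lemma 1 that
$$j \equiv \frac{\alpha - 2^m k}{2^{\kappa}} \left( \frac{d}{2^{\kappa}} \right)^{-1} + 2^{\ell+m-\kappa}\, t \pmod{2^{\ell+m}},$$
as $k$ and $t$ run through $0 \le k < 2^{\ell}$ and $0 \le t < 2^{\kappa}$, enumerates the $2^{\ell+\kappa}$ distinct pairs $(j, k)$ that yield the admissible argument $\alpha$. By dividing both $\alpha - 2^m k$ and $d$ by $2^{\kappa}$, so as to show that the required inverse exists, the lemma follows.
More specifically, we sample integers $t, k$ uniformly at random from $0 \le t < 2^{\kappa}$ and $0 \le k < 2^{\ell}$, and then compute $j$ from $\alpha$ and $t, k$ as in Lemma 3.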
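The enumeration in Lemma 3 translates directly into code. The Python sketch below computes all pairs that yield a given admissible argument and can be checked against the definition of the argument. Taking the inverse of d/2^κ modulo 2^{ℓ+m−κ} is an implementation choice made here, and the toy parameters and function names are illustrative assumptions.

```python
def centered_mod(u, n):
    """{u}_n: u reduced modulo n, constrained to -n/2 <= {u}_n < n/2."""
    return (u + n // 2) % n - n // 2


def pairs_for_argument(alpha, m, ell, d):
    """All pairs (j, k) yielding the admissible argument alpha."""
    kappa = (d & -d).bit_length() - 1
    if alpha % 2 ** kappa:
        raise ValueError("alpha is not a multiple of 2^kappa")
    reduced = 2 ** (ell + m - kappa)
    inverse = pow(d >> kappa, -1, reduced)  # d / 2^kappa is odd, so it exists
    pairs = []
    for k in range(2 ** ell):
        # The shift is an exact division, as 2^kappa divides alpha - 2^m k.
        base = ((alpha - 2 ** m * k) >> kappa) * inverse % reduced
        for t in range(2 ** kappa):
            pairs.append((base + reduced * t, k))
    return pairs


m, ell, d, alpha = 4, 3, 12, 8  # toy values: kappa = 2, alpha a multiple of 4
pairs = pairs_for_argument(alpha, m, ell, d)
print(len(set(pairs)))
print(all(centered_mod(d * j + 2 ** m * k, 2 ** (ell + m)) == alpha
          for j, k in pairs))
```

To sample a single pair, one may pick k and t at random and evaluate only the corresponding entry instead of building the full list.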

Our improved post-processing algorithm
We are now ready to introduce our improved post-processing algorithm.
In analogy with the post-processing algorithm in [5], it recovers d from a set {( j 1 , k 1 ), . . . , ( j n , k n )} of n pairs produced by executing the quantum algorithm n times. However, where the original post-processing sets n = s and requires subsets of s pairs from a larger set of about 8s pairs to be exhaustively solved for d, we allow n to assume larger values and simultaneously solve all n pairs for d.
Note that we may admit larger $n$ because of the analysis in Sect. 4 which shows that even if a pair is not good by the definition in [5], then it is close to being good with overwhelming probability, in the sense that the pairs $(j_i, k_i)$ have associated arguments $\alpha_i = \{dj_i + 2^m k_i\}_{2^{\ell+m}}$ with $|\alpha_i| \sim 2^m$. In analogy with the original post-processing algorithm in [5], the set of $n$ pairs is used to form a vector
$$\mathbf{v} = \left( \{-2^m k_1\}_{2^{\ell+m}}, \ldots, \{-2^m k_n\}_{2^{\ell+m}}, 0 \right) \in \mathbb{Z}^D$$
and a $D$-dimensional integer lattice $L$ with basis matrix
$$\begin{bmatrix} j_1 & j_2 & \cdots & j_n & 1 \\ 2^{\ell+m} & 0 & \cdots & 0 & 0 \\ 0 & 2^{\ell+m} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 2^{\ell+m} & 0 \end{bmatrix}$$
where $D = n + 1$. For some constants $m_1, \ldots, m_n \in \mathbb{Z}$, the vector
$$\mathbf{u} = \left( \{dj_1\}_{2^{\ell+m}} + m_1 2^{\ell+m}, \ldots, \{dj_n\}_{2^{\ell+m}} + m_n 2^{\ell+m}, d \right) \in L$$
is such that the distance
$$R = |\mathbf{u} - \mathbf{v}| = \sqrt{\sum_{i=1}^{n} \alpha_i^2 + d^2}.$$
This implies that $\mathbf{u}$ and hence $d$ may be found by enumerating all vectors in $L$ within a $D$-dimensional hypersphere of radius $R$ centered on $\mathbf{v}$. The volume of such a hypersphere is
$$V_D(R) = \frac{\pi^{D/2}}{\Gamma(D/2 + 1)} R^D.$$
For comparison, the fundamental parallelepiped in $L$ contains a single lattice vector and is of volume $\det L = 2^{(\ell+m)n}$. Heuristically, we therefore expect the hypersphere to contain approximately $v = V_D(R) / \det L$ lattice vectors. The exact number depends on the placement of the hypersphere in $\mathbb{Z}^D$ and on the shape of the fundamental parallelepiped in $L$.
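The relation between the lattice, the target vector and the distance can be made concrete with a small script. The Python sketch below builds the basis matrix and the target vector from a set of pairs, constructs the vector whose last component is d, and confirms that it is an integer combination of the basis rows at squared distance Σα_i² + d² from the target. The pairs are drawn uniformly at random purely for illustration, since the identity holds for any pairs; the parameter values and function names are assumptions.

```python
import random


def centered_mod(u, n):
    """{u}_n: u reduced modulo n, constrained to -n/2 <= {u}_n < n/2."""
    return (u + n // 2) % n - n // 2


def build_lattice_instance(pairs, m, ell, d):
    """Return the basis matrix, the target v, the vector u and the arguments."""
    modulus = 2 ** (ell + m)
    n = len(pairs)
    basis = [[j for j, _ in pairs] + [1]]
    for i in range(n):
        row = [0] * (n + 1)
        row[i] = modulus
        basis.append(row)
    v = [centered_mod(-(2 ** m) * k, modulus) for _, k in pairs] + [0]
    alphas = [centered_mod(d * j + 2 ** m * k, modulus) for j, k in pairs]
    u = [v_i + alpha for v_i, alpha in zip(v, alphas)] + [d]
    return basis, v, u, alphas


def coefficients(u, basis, d):
    """Integer coefficients of the basis rows that produce u, or None."""
    modulus = basis[1][0]
    rest = [u_i - d * j for u_i, j in zip(u, basis[0][:-1])]
    if u[-1] != d or any(r % modulus for r in rest):
        return None
    return [d] + [r // modulus for r in rest]


rng = random.Random(2020)
m, ell, d = 8, 4, 201  # toy parameters with 0 < d < 2^m
pairs = [(rng.randrange(2 ** (ell + m)), rng.randrange(2 ** ell))
         for _ in range(3)]
basis, v, u, alphas = build_lattice_instance(pairs, m, ell, d)
coeffs = coefficients(u, basis, d)
squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
print(coeffs is not None, squared_distance == sum(a * a for a in alphas) + d * d)
```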
Assuming v to be sufficiently small for it to be computationally feasible to enumerate all lattice vectors in the hypersphere in practice, the above algorithm may be used to recover d. As the volume quotient v decreases in n, the number of vectors that need to be enumerated may be reduced by running the quantum algorithm more times and including the resulting pairs in L. However, there are limits to the number of pairs that may be included in L, as a reduced basis must be computed to enumerate L, and the complexity of computing such a basis grows fairly rapidly in the dimension of L.
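Because the quantities involved are far too large for floating-point arithmetic at cryptographic sizes, the volume quotient is conveniently handled on a logarithmic scale. The Python sketch below computes log_2 of the hypersphere volume via the log-gamma function and subtracts log_2 det L = (ℓ + m)n. The function names, the toy parameters and the rough radius model used in the example are assumptions for illustration only.

```python
import math


def log2_ball_volume(dimension, log2_radius):
    """log2 of V_D(R) = pi^(D/2) / Gamma(D/2 + 1) * R^D."""
    log_unit = dimension / 2 * math.log(math.pi) - math.lgamma(dimension / 2 + 1)
    return log_unit / math.log(2) + dimension * log2_radius


def log2_volume_quotient(n, m, ell, log2_radius):
    """log2 of v = V_D(R) / det L with D = n + 1 and det L = 2^((ell+m)n)."""
    return log2_ball_volume(n + 1, log2_radius) - (ell + m) * n


# Toy illustration: m = ell = 32 and R roughly sqrt(n + 1) * 2^m (assumed).
m, ell = 32, 32
quotients = [log2_volume_quotient(n, m, ell, m + 0.5 * math.log2(n + 1))
             for n in (1, 2, 3)]
print(quotients)  # decreases as n grows
```

Under this rough radius model the quotient drops by about ℓ bits for every additional pair, which mirrors the statement that v decreases in n.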
In what follows, we show how to heuristically estimate the minimum number of runs n required to solve a specific known problem instance for d with minimum success probability q, for a given tradeoff factor s, and for a given bound on the number of vectors v that we at most accept to enumerate in L.

Estimating the minimum n required to solve for d
The radius $R$ depends on the pairs $(j_i, k_i)$ via the arguments $\alpha_i$. For fixed $n$ and fixed probability $q$, we may estimate the minimum radius $\tilde{R}$ such that
$$\Pr\left[ R = \sqrt{\sum_{i=1}^{n} \alpha_i^2 + d^2} \le \tilde{R} \right] \ge q$$
by sampling $\alpha_i$ from the probability distribution as in Sect. 6.2. Then
$$\Pr\left[ v = \frac{V_D(R)}{\det L} \le \frac{V_D(\tilde{R})}{2^{(\ell+m)n}} \right] \ge q \quad (9)$$
providing a heuristic bound on the number of lattice vectors $v$ that at most have to be enumerated that holds with probability at least $q$.
Given an upper limit on the number of lattice vectors that we accept to enumerate, equation (9) may be used as a heuristic to estimate the minimum value of $n$ such that $v$ is below this limit with probability at least $q$.
To compute the estimate in practice, we use the heuristic to compute an upper bound on v for n = 1, 2, . . . and return the minimum n for which the bound is below the limit on the number of vectors that we accept to enumerate.
As the volume quotient v decreases by approximately a constant factor for every increment in n, the minimum n may be found efficiently via interpolation once the heuristic bound on v has been computed for a few values of n.
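The search for the minimum n can be sketched in two ways: a direct scan over n = 1, 2, . . . and a linear interpolation on the logarithmic scale from two evaluated points. In the Python fragment below, the callable that returns the heuristic bound on log_2 v for a given n is a hypothetical stand-in for the estimation procedure; the toy bound used in the example is made up and only mimics the roughly constant decrease per increment in n.

```python
import math


def minimum_runs(log2_bound, log2_limit, n_max):
    """Smallest n <= n_max with log2_bound(n) < log2_limit, or None."""
    for n in range(1, n_max + 1):
        if log2_bound(n) < log2_limit:
            return n
    return None


def interpolated_runs(n0, bound0, n1, bound1, log2_limit):
    """Predict the minimum n from two points, assuming a linear decrease."""
    slope = (bound1 - bound0) / (n1 - n0)
    if slope >= 0:
        return None  # the bound does not decrease; no prediction possible
    crossing = n0 + (log2_limit - bound0) / slope
    return max(1, math.floor(crossing) + 1)  # strictly below the limit


def toy_bound(n):
    """Made-up bound on log2(v) that drops by 7 bits per additional run."""
    return 40 - 7 * n


print(minimum_runs(toy_bound, 1.0, 20))
print(interpolated_runs(1, toy_bound(1), 2, toy_bound(2), 1.0))
```

The interpolated value is only a prediction and would still be confirmed by evaluating the bound at that n.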

Estimating R
To estimate $R$ for $m$, $s$ and $n$, explicitly known $d$, and a given target success probability $q$, we sample $N$ sets of $n$ arguments $\{\alpha_1, \ldots, \alpha_n\}$ from the probability distribution as described in Sect. 5.2. For each set, we compute $R$, sort the resulting list of values in ascending order, and select the value at index $\lceil (N - 1)\, q \rceil$ to arrive at our estimate for $R$. Note that the constant $N$ controls the accuracy of the estimate. For $N$ sufficiently large in relation to $q$, and to the variance of the arguments, we expect to obtain sufficiently good estimates. Note furthermore that the sampling of arguments may fail. This occurs when the argument being sampled is not in the regions of the argument axis covered by the histogram.
If the sampling of at least one argument in a set fails, we err on the side of caution and over-estimate R by letting R = ∞ for the set. The entries for the failed sets will then all be sorted to the end of the list. If the value of R selected from the sorted list is ∞, no estimate is produced.
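The quantile estimate for R, including the cautious treatment of failed samples, may be sketched as follows. The Python fragment takes a hypothetical callable that returns a sampled argument or None on failure, and returns log_2 of the estimated radius so that large values remain manageable. Rounding the index up is an assumption, and the samplers in the example are trivial stand-ins for the histogram sampler.

```python
import math


def estimate_log2_radius(sample_argument, n, d, trials, q):
    """Estimate log2(R) at quantile q from `trials` sets of n arguments."""
    squared_radii = []
    for _ in range(trials):
        arguments = [sample_argument() for _ in range(n)]
        if any(alpha is None for alpha in arguments):
            squared_radii.append(math.inf)  # over-estimate R for failed sets
        else:
            squared_radii.append(sum(a * a for a in arguments) + d * d)
    squared_radii.sort()  # failed sets end up last
    selected = squared_radii[math.ceil((trials - 1) * q)]
    if selected == math.inf:
        return None  # no estimate is produced
    return 0.5 * math.log2(selected)


# Stand-in samplers: a constant argument, and one with a single failure.
constant = estimate_log2_radius(lambda: 3, 4, 4, 10, 0.9)
values = iter([None] + [3] * 100)
one_failure = estimate_log2_radius(lambda: next(values), 4, 4, 10, 0.5)
all_failures = estimate_log2_radius(lambda: None, 4, 4, 10, 0.9)
print(constant, one_failure, all_failures)
```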

Results and analysis
To illustrate the efficiency of our new post-processing algorithm, we heuristically estimate $n$ as a function of $m$ and $s$ for the hard case $d = 2^m - 1$, and verify the estimates in practical simulations. To this end, we let $\ell = \lceil m/s \rceil$ and fix $q = 0.99$. We fix $N = 10^6$ when estimating $R$, and record the smallest $n$ for which the volume quotient $v < 2$. Note that the latter requirement ensures that the lattice need not be enumerated: If $v < 2$ it heuristically suffices to reduce a single basis of dimension $D = n + 1$ and to then apply Babai's nearest plane algorithm [1] to recover $\mathbf{u}$ and hence $d$ from $\mathbf{v}$.
We verify the estimate by sampling $M = 10^3$ sets of $n$ pairs $\{(j_1, k_1), \ldots, (j_n, k_n)\}$ and solving each set for $d$ with our improved post-processing algorithm. If $d$ is thus recovered, the verification succeeds, otherwise it fails. We record the smallest $n$ such that at most $M(1 - q) = 10$ verifications fail.
Table 1 was produced by executing the procedure described above. As may be seen in the table, the heuristically estimated values of $n$ are in general verified by the simulations, except when $v$ is close to two, in which case the values may differ slightly. Note that for large tradeoff factors $s$ in relation to $m$, increasing or decreasing $n$ typically has a small effect on $v$. This may lead to slight instabilities in the estimates, as $v$ may be close to two for several values of $n$. Note furthermore that the difference in the sample sizes $N$ and $M$ may also give rise to slight discrepancies when verifying the estimates.
Compared to the post-processing algorithm originally proposed in [5], that required 8s runs to be performed and all subsets of s outputs from the resulting 8s outputs to be exhaustively solved for d, the new post-processing algorithm is considerably more efficient and achieves considerably better tradeoffs. It furthermore requires considerably fewer quantum algorithm runs. Asymptotically, the number of runs required n tends to s + 1 as m tends to infinity for fixed s, when we require a solution to be found without enumerating the lattice, as may be seen in Table 1. If we accept to enumerate a limited number of vectors in the lattice, this may potentially be slightly improved. In particular, a single run then suffices for s = 1.
To reduce the lattice bases, we used the LLL [9] and BKZ [13,14] algorithms, as implemented in fpLLL 5.2, with default parameters and for BKZ a block size of min(10, n + 1) for all combinations of m, s and n. For these parameter choices, a basis in general takes at most minutes to reduce and solve for d in a single thread, except when using BKZ for the largest combinations of m and s.

Further improvements
Above, we conservatively fixed a high minimum success probability q = 0.99 and considered the hard case d = 2 m − 1. Furthermore, we required that mapping v to the closest vector in L should yield u without enumerating L.
In practice, some of these choices may be relaxed: Instead of requiring $\mathbf{u}$ to be the closest vector to $\mathbf{v}$ in $L$, we may enumerate all vectors in a hypersphere of limited radius centered on $\mathbf{v}$. In cryptographic applications, the logarithm $d$ may in general be assumed to be randomly selected. If not, the logarithm may be randomized: solve $x \odot [c]\,g = [d + c]\,g$ for $d + c$ with respect to the basis $g$ for some random $c$.

Summary and conclusion
We have introduced a new efficient post-processing algorithm that is practical for greater tradeoff factors, and requires considerably fewer quantum algorithm runs, than the original post-processing algorithm in [5]. To develop the new post-processing algorithm, and to estimate the number of runs it requires, we have analyzed the probability distribution induced by the quantum algorithm, and developed a method for simulating it when d is known.
With our new analysis and post-processing, Ekerå-Håstad's algorithm achieves an advantage over Shor's algorithms, not only in each individual run, but also overall, when targeting cryptographically relevant instances of RSA and Diffie-Hellman with short exponents. When making tradeoffs, Ekerå-Håstad's algorithm with our new post-processing furthermore outperforms Seifert's algorithm when factoring RSA integers. To the best of our knowledge, this makes Ekerå-Håstad's algorithm the currently most efficient polynomial time quantum algorithm for breaking the RSA cryptosystem [12], and DLP-based cryptosystems such as Diffie-Hellman [3] when instantiated with safe-prime groups with short exponents, as in TLS [7], IKE [8] and standards such as NIST SP 800-56A [2]. The reader is referred to Appendix A for a detailed analysis supporting these claims.
In Table 2 we provide estimates of the complexity of computing short discrete logarithms in these standardized safe-prime groups. The estimates were computed as in Sect. 6.3, for maximal $d = 2^m - 1$ and ≥ 99% success probability, except that we accept to enumerate a limited number of vectors in $L$. We tabulate $m$, $s$ and $n$, for $s = 1$, and for $s$ the greatest tradeoff factor such that $n - s \le 3$ with $v$ close to two.
The number of group operations that need to be computed, in each run of the quantum algorithm, and overall in $n$ runs, are tabulated for each tuple $m$, $s$ and $n$, along with the advantage, defined as the quotient between $2 \log_2 r$, a low estimate of the number of group operations in each run of Shor's algorithm for general discrete logarithms, and the number of group operations, in each run or overall, in Ekerå-Håstad with our improved post-processing.
Note that the advantage is intentionally slightly underestimated as Ekerå-Håstad with our improved post-processing has ≥ 99% probability of recovering $d$ after $n$ runs. More than one run of Shor's algorithm may be required to achieve a comparably high success probability, or a few more than $2 \log_2 r$ group operations may need to be computed in each run. The success probability depends on how the control registers are initialized, their lengths in relation to $r$, and on how the output from the quantum algorithm is post-processed.
As may be seen in Table 2, Ekerå-Håstad with our improved post-processing reduces the number of group operations in each run by up to a factor of 34.6 for these m, s and n. It achieves an advantage, not only in each run, but also overall, even for large tradeoff factors. We have verified the estimates by post-processing simulated outputs.
To aid interpretation, when an advantage is achieved in each run, it implies that Ekerå-Håstad's algorithm is easier to fit on a large-scale but constrained quantum computer compared to Shor's algorithm. Furthermore, it also implies that Ekerå-Håstad can be parallelized to achieve a reduction in the time required to solve a given problem instance. When an overall advantage is achieved, it implies that Ekerå-Håstad's algorithm requires fewer group operations to be computed overall in the $n$ runs compared to a single run of Shor's algorithm.

A.2 RSA
Let $p, q$ be two large distinct primes of length $l$ bits. Then $N = pq$ is said to be an RSA integer. To factor $N$ into $p$ and $q$, we follow [5] and execute the below procedure: Pick an element $g$, of unknown order $r$, uniformly at random from $\mathbb{Z}_N^*$. Let $p' = (p - 1)/2$ and $q' = (q - 1)/2$. Then $r$ divides $2p'q' / \gcd(p', q')$. Let $f(N) = (N - 1)/2 - 2^{l-1}$ and $x = [f(N)]\,g$. Then $d = f(N) \bmod r = p' + q' - 2^{l-1}$, assuming $r > 2^{l-1}$, which holds with overwhelming probability. The discrete logarithm $d$ is then short. To factor $N$, it suffices to compute $d$ using the quantum algorithm of Ekerå and Håstad. Given $d$ we may immediately solve $p + q = 2(d + 2^{l-1} + 1)$ and $pq = N$ for $p$ and $q$.
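The final classical step, recovering p and q from d, amounts to solving a quadratic equation. The Python sketch below does so for a toy RSA integer and also checks that raising g to f(N) = (N − 1)/2 − 2^{l−1} agrees with raising g to d. The 6-bit primes 37 and 53, the generator g = 2 and the function name are illustrative assumptions, and multiplicative notation is used for the group operation.

```python
import math


def recover_factors(modulus, d, bit_length):
    """Solve p + q = 2(d + 2^(l-1) + 1) and p*q = N for p <= q."""
    total = 2 * (d + 2 ** (bit_length - 1) + 1)  # p + q
    discriminant = total * total - 4 * modulus   # equals (q - p)^2
    if discriminant < 0:
        raise ValueError("d is inconsistent with the modulus")
    root = math.isqrt(discriminant)
    if root * root != discriminant:
        raise ValueError("d is inconsistent with the modulus")
    return (total - root) // 2, (total + root) // 2


# Toy instance with two distinct 6-bit primes.
p, q, bit_length = 37, 53, 6
modulus = p * q
f = (modulus - 1) // 2 - 2 ** (bit_length - 1)
d = (p - 1) // 2 + (q - 1) // 2 - 2 ** (bit_length - 1)
g = 2
print(pow(g, f, modulus) == pow(g, d, modulus))
print(recover_factors(modulus, d, bit_length))
```

In an actual attack d would of course come from the quantum algorithm and its post-processing rather than from the known factors.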
Recall that in our analysis of the probability distribution, we required $d$ to be on the interval $0 < d = p' + q' - 2^{l-1} < 2^m$, which implies $m = l - 1$. Furthermore, we required $r \ge 2^{\ell+m} + (2^{\ell} - 1)d$, where we let $\ell = \lceil m/s \rceil$ for $s$ an integer. This implies that we must select $s \ge 2$ if the requirement on the size of $r$ is to be respected with high probability.
In Table 3 we provide estimates of the complexity of factoring RSA integers. Again the estimates were computed as described in Sect. 6.3, for ≥ 99% success probability and maximal $d = 2^m - 1$. We tabulate $m$, $s$ and $n$, for $s = n = 2$, and for $s$ the greatest tradeoff factor such that $n - s \le 3$ whilst keeping $v$ sufficiently small to avoid enumerating $L$.
The estimates are otherwise tabulated as described in Sect. A.1, except that we compute the advantage with respect to a single run of Shor's order finding algorithm. We have verified the estimates by post-processing simulated quantum algorithm outputs.
As may be seen in Table 3, an advantage is achieved in each run of the algorithm. For s = 2, we perform the same number of operations overall in the two runs, as Shor's algorithm does in a single run. Hence, the benefit of having a reduced circuit in each individual run comes at no cost in terms of the overall number of operations that need to be performed. For s > 2, we obtain a greater advantage in each individual run of the algorithm, at the expense of performing a greater number of operations overall than Shor's algorithm does in a single run. Compared to the advantage Seifert's algorithm [15] achieves over Shor, Ekerå-Håstad with our improved post-processing achieves a further advantage by up to a factor of two.

A.2.1 Breaking RSA with an advantage in a single run
In the previous section, and in [5], we had to select $s \ge 2$ to respect the requirement that $r \ge 2^{\ell+m} + (2^{\ell} - 1)d$ where we let $\ell = m/s$. Assume that we instead let $\ell = m - \delta$, for some small positive integer $\delta$. If $\delta$ is sufficiently large, then the requirement that $r \ge 2^{\ell+m} + (2^{\ell} - 1)d$ is respected with high probability. At the same time, if $\delta$ is not too large, we only need to enumerate a moderately large set of vectors in $L$ in the classical post-processing.
Let us consider how $\delta$ affects the probability of respecting the requirement on $r$. The requirement that $r \ge 2^{\ell+m} + (2^{\ell} - 1)d$ may be simplified to $r \ge 2^{\ell+m+1} = 2^{2l-\delta-1}$, by using that $\ell = m - \delta$ where $m = l - 1$ and $d < 2^m$. Also $2^{2(l-1)} \le \varphi(N) = (p - 1)(q - 1) < 2^{2l}$, where $\varphi$ is Euler's totient function, as $p, q$ are large distinct $l$ bit primes.
By Lemma 4 in the next section, which for proof-technical reasons operates under the slightly different assumption that $p, q$ are primes drawn from $[2, x]$ as $x \to \infty$, the probability that $r \ge \varphi(N)/2^{\tau}$ is lower-bounded by $P_N(\tau)$. It is reasonable to assume that this result carries over to our situation where $p, q$ are of large fixed length. This assumption is supported by simulations where $N = pq$ and $g \in \mathbb{Z}_N^*$ are sampled and $r$ estimated, see Sect. A.2.3. Under this assumption, we have that $r \ge 2^{2(l-1)-\tau}$ with probability at least $P_N(\tau)$.
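To make the link between $\tau$ and $\delta$ explicit, the two bounds stated above may be combined. The short derivation below uses only those inequalities; it is a worked restatement rather than a new result.

```latex
r \;\ge\; \frac{\varphi(N)}{2^{\tau}} \;\ge\; 2^{2(l-1)-\tau},
\qquad
2^{2(l-1)-\tau} \;\ge\; 2^{2l-\delta-1}
\;\iff\; \tau \le \delta - 1 .
```

Hence the simplified requirement on $r$ is met with probability at least $P_N(\tau)$ for any $\tau \le \delta - 1$. For example, $\delta = 20$ admits $\tau = 19$.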
At the same time, for $\delta = 20$, we expect to have to enumerate at most $v = 1.4 \cdot 10^9$ vectors in $L$ in the classical post-processing to recover $d$ with success probability ≥ 99%, see Table 4 where we have again estimated $v$ as in Sect. 6.3 for maximal $d = 2^m - 1$.
Such an enumeration is easy to perform in practice. It is facilitated by the lattice being two-dimensional, and by the last component of the vector sought being $d$. The facts that $0 < d \le 2^m$ and $d \equiv (N - 1)/2 \pmod 2$ may hence be used to prune the enumeration. We have verified that this slightly tweaked version of Ekerå-Håstad's algorithm achieves ≥ 99% success probability by post-processing simulated outputs.
Note that we tweaked the algorithm to ensure that $r \ge 2^{\ell+m} + (2^{\ell} - 1)d$. This requirement serves to prevent modular reductions from occurring in our analysis of the probability distribution. It may be possible to relax this requirement, at the expense of carrying out a more complicated analysis. We circumvent this complication by letting $\ell = m - \delta$.

A.2.2 On the probability of picking a generator with sufficiently large order
In this section we analyze the probability of picking $g \in \mathbb{Z}_N^*$ with sufficiently large order. The bound is asymptotically correct in the limit as $x$ tends to infinity.
The order $r$ is hence always reduced by a factor $\gcd(p - 1, q - 1)$ compared to $\varphi(N)$ due to shared factors. It may be further reduced: This depends on $g$, and on how $p - 1$ and $q - 1$ factor. To understand the distribution of $r$, we therefore first seek an expression for the probability of a factor $z$ dividing $p - 1$, and $q - 1$, respectively.
To this end, we use Siegel-Walfisz' theorem [18,19]. In essence it states that as $x \to \infty$ the probability over all primes $p \le x$ that $p \equiv a \pmod z$ is $1/\varphi(z)$, for $z \in [2, \ln^c(x)]$ a fixed integer, where $c$ is an arbitrary positive constant, and $a$ a fixed residue coprime to $z$. For $a = 1$ this implies that $p - 1 \equiv 0 \pmod z$, i.e. $z$ divides $p - 1$, with probability $1/\varphi(z)$. The same holds for $q - 1$. Furthermore, as $\varphi(z_1 z_2) = \varphi(z_1) \cdot \varphi(z_2)$ for $z_1, z_2$ coprime factors, we may independently analyze the reductions caused by distinct prime powers.
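The $1/\varphi(z)$ heuristic can be checked empirically for small parameters. The sketch below is an illustration only, with hypothetical function names. It counts the fraction of primes $p \le x$ for which $z$ divides $p - 1$ and compares it to $1/\varphi(z)$. It says nothing about the error term for finite $x$.

```python
from math import gcd


def primes_up_to(x: int) -> list[int]:
    """Sieve of Eratosthenes returning all primes <= x (for x >= 1)."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            # Clear every multiple of i starting from i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    return [i for i in range(x + 1) if sieve[i]]


def euler_phi(z: int) -> int:
    """Euler's totient by direct counting; adequate for small z."""
    return sum(1 for a in range(1, z + 1) if gcd(a, z) == 1)


def fraction_dividing(z: int, x: int) -> float:
    """Fraction of primes p <= x such that z divides p - 1."""
    primes = primes_up_to(x)
    return sum(1 for p in primes if (p - 1) % z == 0) / len(primes)


for z in (3, 4, 5, 8):
    print(z, fraction_dividing(z, 10**5), 1 / euler_phi(z))
```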
To obtain the lower bound $P_N(\tau)$ on the probability of $r \ge \varphi(N)/2^{\tau}$, we let $F$ be the set of all primes less than $2^{\tau}$, and $F'$ the set of all primes greater than $2^{\tau}$, and proceed as follows: In Part 1 of the proof, we compute all reductions $\varphi(N)/r \le 2^{\tau}$ to which combinations of powers of factors in $F$ give rise, and sum up the associated probabilities, to obtain the probability $\rho_N(\tau)$. In Part 2, we upper-bound the probability of factors in $F'$ giving rise to a reduction in the order greater than $2^{\tau}$, and subtract it from $\rho_N(\tau)$ to obtain $P_N(\tau)$.
Part 1 For each $f \in F$, let $n$ be the greatest integer such that $f^n \le 2^{\tau}$. As $f^e$ divides $p - 1$, or equivalently $q - 1$, with probability $1/\varphi(f^e)$, we have that $1/\varphi(f^e) - 1/\varphi(f^{e+1})$ is the probability that $f^e$ is the greatest power of $f$ to divide $p - 1$, or equivalently $q - 1$, for $e \in [0, n]$, and that $1/\varphi(f^{n+1})$ is the probability that $f^{n+1}$ divides $p - 1$, or equivalently $q - 1$, when $e = n + 1$. For pairwise combinations of $e_p, e_q \in [0, n + 1]$, the probability that $f^{e_p}$ and $f^{e_q}$ divide $p - 1$ and $q - 1$, respectively, in the special manner described above, is the product of the two corresponding probabilities for $(e_p, f)$ and $(e_q, f)$.
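The per-prime probabilities used in Part 1 follow directly from the $1/\varphi(f^e)$ divisibility probability. As a sanity check, the sketch below computes them with exact rational arithmetic. The function names are hypothetical and are not the paper's notation.

```python
from fractions import Fraction


def phi_prime_power(f: int, e: int) -> int:
    """Euler's totient of f^e for f prime: 1 if e = 0, else (f - 1) f^(e-1)."""
    return 1 if e == 0 else (f - 1) * f ** (e - 1)


def prob_exact_power(e: int, f: int, n: int) -> Fraction:
    """Probability that f^e is the greatest power of f dividing p - 1 for
    e in [0, n], or that f^(n+1) divides p - 1 when e = n + 1."""
    if not 0 <= e <= n + 1:
        raise ValueError("e must lie in [0, n + 1]")
    if e == n + 1:
        return Fraction(1, phi_prime_power(f, n + 1))
    return Fraction(1, phi_prime_power(f, e)) - Fraction(1, phi_prime_power(f, e + 1))


def joint_prob(e_p: int, e_q: int, f: int, n: int) -> Fraction:
    """Joint probability for p - 1 and q - 1, treated as independent."""
    return prob_exact_power(e_p, f, n) * prob_exact_power(e_q, f, n)


# The cases e = 0, ..., n + 1 are exhaustive, so the probabilities sum to one.
print(sum(prob_exact_power(e, 3, 4) for e in range(6)))
```

Note that for $f = 2$ the case $e = 0$ has probability zero in this model, which matches $p - 1$ always being even for odd primes $p$.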
In the formulation of this lemma, we pick $g$ uniformly at random from $\mathbb{Z}_N^*$, and consider the order of $g$. This is equivalent to selecting $g_p$ and $g_q$ uniformly at random from the cyclic groups $\mathbb{Z}_{p-1}^+$ and $\mathbb{Z}_{q-1}^+$, respectively, and considering the order of $(g_p, g_q) \in \mathbb{Z}_{p-1}^+ \times \mathbb{Z}_{q-1}^+ \cong \mathbb{Z}_N^*$.
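In the additive picture, the order of a pair is straightforward to compute. The following sketch, with hypothetical function names, illustrates how the order depends on the common factors that the components share with $p - 1$ and $q - 1$. It requires Python 3.9 or later for `math.lcm`.

```python
from math import gcd, lcm


def additive_order(a: int, m: int) -> int:
    """Order of a in the additive cyclic group of integers modulo m."""
    return m // gcd(a, m)


def order_of_pair(g_p: int, g_q: int, p: int, q: int) -> int:
    """Order of (g_p, g_q) in the product of the additive groups
    modulo p - 1 and modulo q - 1."""
    return lcm(additive_order(g_p, p - 1), additive_order(g_q, q - 1))


# Toy example with p = 7 and q = 11. The maximal order is
# lcm(6, 10) = 30 = phi(77) / gcd(6, 10).
print(order_of_pair(1, 1, 7, 11))  # both components are generators
print(order_of_pair(2, 5, 7, 11))  # orders 3 and 2 in the components
```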
In Lemma 4 above, we consider $N = pq$ with $p, q$ distinct primes drawn uniformly at random from $[2, x]$ as $x \to \infty$. In practice, we would instead like to select $p, q \ne p$ uniformly at random from the set of all primes of length $l$ bits, so that $N$ is an RSA integer by the definition in this paper. In practice, we may also wish to require that $N$ must be a $2l$ bit number. Imposing such restrictions on the sizes of $p$ and $q$ should not have any significant impact on the outcome of the analysis, for the reasons elaborated on in the next section.
If other special requirements are imposed on p and q, the analysis may however need to be adapted to meet these requirements. In particular, this is the case if it is required that p − 1 and q − 1 have no small divisors besides two. The success probability then increases.

A.2.3 Supplementary supporting simulations
The primary obstacle to selecting p, q as l bit primes in Lemma 4 is the unknown constant in the error term in Siegel-Walfisz' theorem. Asymptotically, the error term goes to zero regardless of the value of the constant, allowing us to state that z divides p−1 with probability 1/φ(z) in Lemma 4 for p selected uniformly at random on [2, x] as x → ∞.
For large fixed x, taking the probability to be 1/φ(z) is a reasonable approximation. However, without a good explicit bound on the error in the approximation, technical problems arise if one seeks to follow the approach taken in Lemma 4. In particular, problems arise when exhausting all combinations of products of small prime factor powers in Part 1 of the proof.
One way to obtain an estimate anyway is to instead run numerical simulations: Draw two distinct primes $p, q$ uniformly at random from $(2^{l-1}, 2^l]$. Compute $N = pq$ and select a generator $g$ uniformly at random from $\mathbb{Z}_N^*$. Let $r = \varphi(N)/\gcd(p - 1, q - 1)$. For all prime factors $f \le 2^{\eta}$ for some sufficiently large $\eta$, test if $f$ divides $r$. If so, and if $g^{r/f} \equiv 1$, reduce $r$ by a factor of $f$. Go back and test again recursively if $f$ still divides $r$. Otherwise, process the next factor. The order $r$ thus computed is a good heuristic estimate of the order of $g$.
We have run such experiments, in which we repeated the simulation procedure $10^6$ times, for $l = 1024$, $\tau = 19$ and $\eta = 24$. On average, only $\sim 130$ problem instances failed to meet the requirement that $r \ge \varphi(N)/2^{\tau}$. This is in agreement with the asymptotic bound. It furthermore indicates that the success probability is high already for 2048 bit RSA integers.
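The simulation procedure described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code used for the experiments: all function names are hypothetical, primes are found by rejection sampling with a probabilistic Miller-Rabin test, and the congruence is taken modulo N.

```python
import random
from math import gcd


def is_probable_prime(n: int, rng: random.Random, rounds: int = 24) -> bool:
    """Miller-Rabin test with random bases (probabilistic)."""
    if n < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13):
        if n % sp == 0:
            return n == sp
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(rng.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True


def random_prime(l: int, rng: random.Random) -> int:
    """Draw a prime uniformly at random from (2^(l-1), 2^l] by rejection."""
    while True:
        c = rng.randrange(2 ** (l - 1) + 1, 2 ** l + 1)
        if is_probable_prime(c, rng):
            return c


def small_primes(bound: int) -> list[int]:
    """All primes <= bound, by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (bound + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(bound ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, bound + 1, i)))
    return [i for i in range(bound + 1) if sieve[i]]


def estimate_order(g: int, p: int, q: int, primes: list[int]) -> int:
    """Heuristic order of g modulo N = p*q, reducing by the given primes."""
    N = p * q
    r = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
    for f in primes:
        # Keep dividing out f while it divides r and g^(r/f) is still 1.
        while r % f == 0 and pow(g, r // f, N) == 1:
            r //= f
    return r


def sample_instance(l: int, primes: list[int], rng: random.Random):
    """Sample p, q, g as in the text and return (p, q, g, r)."""
    p = random_prime(l, rng)
    q = random_prime(l, rng)
    while q == p:
        q = random_prime(l, rng)
    N = p * q
    g = rng.randrange(1, N)
    while gcd(g, N) != 1:
        g = rng.randrange(1, N)
    return p, q, g, estimate_order(g, p, q, primes)


def meets_requirement(p: int, q: int, r: int, tau: int) -> bool:
    """Check r >= phi(N) / 2^tau using integer arithmetic only."""
    return (r << tau) >= (p - 1) * (q - 1)
```

A run would call `small_primes(2 ** eta)` once, then repeatedly call `sample_instance` and count the instances for which `meets_requirement` is false. For small toy parameters such as $l = 16$ and $\eta = 16$, every prime factor of $p - 1$ and $q - 1$ lies below $2^{\eta}$, so the estimate equals the exact order.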