Abstract
We present improved quantum circuits for elliptic curve scalar multiplication, the most costly component in Shor’s algorithm to compute discrete logarithms in elliptic curve groups. We optimize low-level components such as reversible integer and modular arithmetic through windowing techniques and more adaptive placement of uncomputing steps, and improve over previous quantum circuits for modular inversion by reformulating the binary Euclidean algorithm. Overall, we obtain an affine Weierstrass point addition circuit that has lower depth and uses fewer T gates than previous circuits. While previous work mostly focuses on minimizing the total number of qubits, we present various tradeoffs between different cost metrics including the number of qubits, circuit depth and T-gate count. Finally, we provide a full implementation of point addition in the Q# quantum programming language that allows unit tests and automatic quantum resource estimation for all components.
S. Jaques—Partially supported by the University of Oxford Clarendon fund.
Most of this work was done by Samuel Jaques, while he was an intern at Microsoft Research.
Notes
1. Our code will be released under an open source license.
References
Amy, M., Maslov, D., Mosca, M., Roetteler, M.: A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 32(6), 818–830 (2013)
Babbush, R., et al.: Encoding electronic spectra in quantum circuits with linear T complexity. Phys. Rev. X 8(4), 041015 (2018). arXiv: quant-ph/1805.03662
Barenco, A., et al.: Elementary gates for quantum computation. Phys. Rev. A 52(5), 3457–3467 (1995). arXiv: quant-ph/9503016
Bennett, C.H.: Logical reversibility of computation. IBM J. Res. Dev. 17(6), 525–532 (1973)
Bennett, C.H.: Time/space tradeoffs for reversible computation. SIAM J. Comput. 18(4), 766–776 (1989)
Bernstein, D.J., Lange, T.: Explicit formulas database (2007). https://www.hyperelliptic.org/EFD
Bernstein, D.J., Yang, B.-Y.: Fast constant-time GCD computation and modular inversion. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019(3), 340–398 (2019)
Cuccaro, S.A., Draper, T.G., Kutin, S.A., Moulton, D.P.: A new quantum ripple-carry addition circuit (2004). arXiv: quant-ph/0410184
Draper, T.G., Kutin, S.A., Rains, E.M., Svore, K.M.: A logarithmic-depth quantum carry-lookahead adder, June 2004. arXiv: quant-ph/0406142
Fowler, A.G., Mariantoni, M., Martinis, J.M., Cleland, A.N.: Surface codes: towards practical large-scale quantum computation. Phys. Rev. A 86, 032324 (2012)
Gheorghiu, V., Mosca, M.: Benchmarking the quantum cryptanalysis of symmetric, public-key and hash-based cryptographic schemes (2019)
Gidney, C.: Halving the cost of quantum addition. Quantum 2, 74 (2018)
Gidney, C.: Windowed quantum arithmetic (2019). arXiv: quant-ph/1905.07682
Gidney, C., Ekerå, M.: How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits, May 2019. arXiv: quant-ph/1905.09749
Griffiths, R., Niu, C.-S.: Semiclassical Fourier transform for quantum computation. Phys. Rev. Lett. 76(17), 3228–3231 (1996)
Jones, C.: Low-overhead constructions for the fault-tolerant Toffoli gate. Phys. Rev. A 87(2), 022328 (2013)
Kaliski, B.S.: The Montgomery inverse and its applications. IEEE Trans. Comput. 44(8), 1064–1065 (1995)
Meuli, G., Soeken, M., Campbell, E., Roetteler, M., De Micheli, G.: The role of multiplicative complexity in compiling low T-count oracle circuits (2019). arXiv: quant-ph/1908.01609
Meuli, G., Soeken, M., Roetteler, M., Bjørner, N., De Micheli, G.: Reversible pebbling game for quantum memory management. In: Design, Automation & Test in Europe Conference, pp. 288–291 (2019)
Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44(170), 519–521 (1985)
Moore, C.: Quantum circuits: fanout, parity, and counting (1999). arXiv: quant-ph/9903046
Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2000)
Proos, J., Zalka, C.: Shor’s discrete logarithm quantum algorithm for elliptic curves, January 2003. arXiv: quant-ph/0301141
Roetteler, M., Naehrig, M., Svore, K.M., Lauter, K.: Quantum resource estimates for computing elliptic curve discrete logarithms. In: Takagi, T., Peyrin, T. (eds.) ASIACRYPT 2017. LNCS, vol. 10625, pp. 241–270. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70697-9_9
Selinger, P.: Quantum circuits of T-depth one. Phys. Rev. A 87(4), 042302 (2013). arXiv: 1210.0974
Shor, P.W.: Algorithms for quantum computation: discrete logarithms and factoring. In: FOCS 1994, pp. 124–134 (1994)
Shor, P.W.: Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM J. Comput. 26(5), 1484–1509 (1997)
Svore, K.M., et al.: Q#: enabling scalable quantum computing and development with a high-level DSL. In: RWDSL@CGO 2018 (2018)
Takahashi, Y., Tani, S., Kunihiro, N.: Quantum addition circuits and unbounded fanout. Quantum Inf. Comput. 10, 10 (2009)
Testa, E., Soeken, M., Amarù, L.G., De Micheli, G.: Reducing the multiplicative complexity in logic networks for cryptography and security applications. In: Design Automation Conference, p. 74 (2019)
U.S. Department of Commerce/National Institute of Standards and Technology. Digital signature standard (DSS). FIPS 186-4 (2013). http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-4.pdf
Acknowledgements
We thank Dan Bernstein, Martin Ekerå, Iggy van Hoof, and Tanja Lange for helpful suggestions about elliptic curve arithmetic. We thank Martin Albrecht for lending computing power to run resource estimates.
Appendices
A Alternative Approaches
A.1 Modular Multiplication
RNSL provide two circuits for modular multiplication. The first is the one proposed by Proos and Zalka [23], which uses a double-and-add approach, where doubling and addition are both modular operations modulo p. The other is reversible Montgomery multiplication, which uses an add-and-halve approach and works in Montgomery form. The primary motivation for considering Montgomery multiplication instead of the straightforward double-and-add method is that modular reduction is achieved by suitable additions to clear lower order bits and divisions by 2 (i.e. bit rotations) as part of the whole circuit, not delegated to the addition and halving circuits. This results in simpler operations per bit.
However, Montgomery multiplication has the downside that it entangles with a register of auxiliary qubits which must be cleared. In our case, at every point in an elliptic curve point addition, we have enough spare auxiliary qubits for this. Overall, it is cheaper, even with the Bennett method, and especially with the multiply-then-add technique of Sect. 4.1.
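As a classical illustration of the add-and-halve structure, the following Python sketch multiplies two residues bit by bit and records the parity bit consumed in each round. It is not the reversible circuit of RNSL or of this paper; the function name `montgomery_multiply` and the returned list `parity_bits` are hypothetical names chosen here. The recorded bits play the role of the auxiliary register that a quantum version would later have to clear.

```python
def montgomery_multiply(x, y, p, n):
    """Classical add-and-halve sketch: returns (x*y*2^(-n) mod p, parity_bits).

    Assumes p is odd and 0 <= x, y < p < 2**n.
    """
    acc = 0
    parity_bits = []  # one bit per round; a quantum circuit must uncompute these
    for i in range(n):
        if (x >> i) & 1:
            acc += y          # conditional addition of y
        odd = acc & 1
        parity_bits.append(odd)
        if odd:
            acc += p          # adding p clears the lowest bit without changing acc mod p
        acc >>= 1             # exact halving
    if acc >= p:              # acc stays below 2p, so one subtraction suffices
        acc -= p
    return acc, parity_bits
```

If both inputs are in Montgomery form (a value times \(2^n\) modulo p), the output is the Montgomery form of the product.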
A.2 Modular Inversion
Proos and Zalka [23] (PZ) gave an approach to modular inversion based on precise control of a bit-shift division operation, with asymptotic complexity of \(O(n^2)\). There are O(n) iterations of a round. Each round implements conditional logic by computing state qubits, then using those state qubits to control some operations on the integer registers.
RNSL use a similar round-based construction, which implements a reversible binary extended Euclidean algorithm. As with multiplication, the primary difference between the PZ division and the RNSL division is that PZ’s is based on doubling and integer long division, while RNSL’s is based on halving and binary operations. The PZ inversion leaves only \(O(\lg n)\) auxiliary qubits, while RNSL creates \(2n+O(\lg n)\) auxiliary qubits, but PZ has a higher depth and gate cost.
Naively, the PZ approach uses 5n qubits, though they show that, with fidelity loss on the order of \(O(n^{-3})\) per round, they require only \(2n+8\sqrt{n}+O(\log n)\) qubits. The RNSL approach uses 6n qubits. We choose to use the RNSL algorithm. It is exactly correct, so it can be used for higher-depth algorithms, and the total T-cost and depth are less than half of the PZ approach.
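For orientation, here is a classical sketch of a binary extended Euclidean algorithm in the style of Kaliski [17], which returns a pseudo-inverse \(x^{-1}2^k \bmod p\) together with the round count k. It is an assumption of this sketch that this classical routine is a fair stand-in for the round structure; the reversible circuit itself has to run a fixed number of rounds and track the branch taken in each round. The name `almost_inverse` is hypothetical.

```python
def almost_inverse(x, p):
    """Return (r, k) with r = x^(-1) * 2^k mod p, for an odd prime p and 0 < x < p."""
    u, v, r, s = p, x, 0, 1
    k = 0
    while v > 0:
        if u % 2 == 0:            # u even: halve u, double s
            u //= 2
            s *= 2
        elif v % 2 == 0:          # v even: halve v, double r
            v //= 2
            r *= 2
        elif u > v:               # both odd, u larger: subtract, then halve
            u = (u - v) // 2
            r += s
            s *= 2
        else:                     # both odd, v at least as large
            v = (v - u) // 2
            s += r
            r *= 2
        k += 1                    # every branch is one halving round
    if r >= p:
        r -= p
    return p - r, k
```

Each round halves one of the two working values, which is the "halving and binary operations" structure contrasted with long division above. For an n-bit modulus the loop finishes within 2n rounds, which is why a fixed 2n-round circuit can cover all inputs.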
A.3 Recursive GCD Algorithms
There are several subquadratic GCD algorithms (such as [7]). These work by defining a series of \(2\times 2\) matrices \(T_n\) such that \(T_nT_{n-1}\cdots T_1(u,v)^T\) will map integers u and v to the nth step of the Euclidean algorithm. These can be computed and multiplied together recursively.
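The matrix view can be sketched classically with the textbook quotient-based Euclidean step, where each step matrix is \(\begin{pmatrix}0&1\\1&-q\end{pmatrix}\). This is an illustrative assumption: [7] uses a different step function, and the helper names below are hypothetical.

```python
def euclid_step_matrices(u, v):
    """Return the list [T_1, ..., T_n] of 2x2 step matrices for Euclid on (u, v)."""
    mats = []
    while v != 0:
        q = u // v
        mats.append(((0, 1), (1, -q)))   # maps (u, v) to (v, u - q*v)
        u, v = v, u - q * v
    return mats

def matmul(a, b):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def apply_all(mats, u, v):
    """Compute T_n ... T_1 (u, v)^T by first multiplying the matrices together."""
    prod = ((1, 0), (0, 1))
    for t in mats:
        prod = matmul(t, prod)           # left-multiply so that T_1 acts first
    return (prod[0][0] * u + prod[0][1] * v,
            prod[1][0] * u + prod[1][1] * v)
```

Applying the full product to (u, v) yields (gcd(u, v), 0). A recursive algorithm would split the list of matrices in half and multiply the two partial products.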
Adapted to quantum circuits, these approaches require quantum matrix multiplication. We could find no efficient method to do this in-place, meaning that each recursive call would require a new set of auxiliary qubits to store the matrix output. This would quickly overwhelm our qubit budget. The base case of [7] is nearly identical to our approach for a single round.
One of the primary advantages of [7] is that the recursive process allows much of the arithmetic to be done with small integers which fit into the registers of classical CPUs. All the qubits in our model of a quantum computer are identical, so it has no caching or register issues. If quantum technologies arise with different kinds of qubits (perhaps a “memory” with higher coherence times but lower gate fidelity), then recursive GCD algorithms should be revisited. It is also possible that the specific structure of the matrices in this approach permits an easy, in-place multiplication circuit. We leave this to future work.
A.4 Alternate Curve Representations
Projective Coordinates. Projective coordinates use equivalence classes (X : Y : Z) of triples (X, Y, Z) to represent an elliptic curve point, where \((X_1,Y_1,Z_1)\sim (X_2,Y_2,Z_2)\) iff there is some nonzero constant c such that \(X_1=cX_2\), \(Y_1=cY_2\), and \(Z_1=cZ_2\). These can be used with many different families of curves. Projective coordinates lend themselves to efficient, inversion-free arithmetic, which is appealing for classical computers.
Projective coordinates do not give a unique representation of each point, which Shor’s algorithm requires to ensure history independence and thus proper interference of states in superposition. Dividing by the Z coordinate produces a unique representation but requires an expensive division. It is an open problem to provide a unique projective representation with division-free arithmetic.
Another issue is that the classical elliptic curve formulas, naively adapted to quantum circuits, operate out-of-place. An out-of-place addition circuit is easy to adapt into an in-place addition circuit. If we can construct a circuit \(U_{+Q}\) to add a point Q, we can construct a circuit \(U_{-Q}\), and we can construct an in-place point addition by writing \(P+Q\) into another register, then subtracting Q from \(P+Q\) to clear the input. This doubles the cost of point addition.
This technique requires a unique representation. If \((P+Q)-Q\) does not have the same representation as P, we cannot cancel them out. Thus, for any current algorithm to compute addition with projective coordinates with cost C, we can transform it to a quantum-suitable in-place version with cost \(2C+2D\), where D is the cost of division. The division creates a unique representation.
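The role of the unique representation can be seen in a small classical toy model. The sketch below does not use elliptic curve points: as a stand-in, it represents elements of the additive group modulo a small prime as fractions (X, Z) meaning X/Z, with an inversion-free addition formula whose output denominator differs from the input. All names and the prime 101 are assumptions made for illustration.

```python
P_MOD = 101  # small prime, chosen only for illustration

def add(P, Q):
    """Out-of-place, inversion-free addition of fractions X/Z modulo P_MOD."""
    (x1, z1), (x2, z2) = P, Q
    return ((x1 * z2 + x2 * z1) % P_MOD, (z1 * z2) % P_MOD)

def negate(Q):
    """Negation of a fraction: -(X/Z) = (-X)/Z."""
    return ((-Q[0]) % P_MOD, Q[1])

def normalize(P):
    """Divide by Z to obtain the unique representative (x, 1)."""
    x, z = P
    return (x * pow(z, -1, P_MOD) % P_MOD, 1)

def in_place_add(P, Q, canon):
    """Return (P + Q, leftover). leftover == (0, 0) means the input register cleared."""
    total = canon(add(P, Q))                    # write P + Q to a second register
    recomputed = canon(add(total, negate(Q)))   # recompute P from P + Q
    leftover = (P[0] ^ recomputed[0], P[1] ^ recomputed[1])  # XOR onto the input
    return total, leftover
```

With `canon=normalize` and a normalized input, the leftover is (0, 0). With the identity map as `canon`, the recomputed value represents the same group element but with different coordinates, so the input register is not cleared. Each call to `normalize` costs one division, and it is applied twice, matching the \(2C+2D\) count above.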
According to the Explicit Formulas Database [6], the lowest-cost addition uses 6 squares and/or multiplications. With the required reductions, the total cost is 12 squares/multiplications and 2 divisions, much higher than affine Weierstrass coordinates. Thus, we choose not to use projective coordinates in this work.
A.5 Precomputation
Precomputed tables of certain powers of the base element can speed up exponentiations. The “comb” method is a standard technique used for elliptic curve scalar multiplication. To multiply a point P by a scalar k, we divide k into \(k_1+2k_2+\cdots + 2^\ell k_\ell \) for some \(\ell \), with the property that \(k_j\) contains bits of k in positions congruent to j modulo \(\ell \) (each \(k_j\) looks like a comb of bits). We then precompute a table of all multiples of P by scalars of the form \(b_0\,+\,b_12^\ell \,+\,b_2 2^{2\ell }\dots \), with \(b_i\in \{0,1\}\). By the definition of \(k_j\), each \(k_jP\) is a precomputed point in this table for all j. Thus, we can compute kP by using \(k_j\) to look up elements of the table, adding them to a running total, and doubling the running total.
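A classical sketch of the comb method follows, using the additive group of integers modulo N as a stand-in for an elliptic curve group (so that kP is simply k·P mod N). The indexing is an assumption of this sketch: combs are numbered \(j=0,\dots,\ell-1\), comb j collects the bits of k at positions \(j, j+\ell, j+2\ell,\dots\) shifted down by j, and \(k=\sum_j 2^j k_j\). Function names are hypothetical.

```python
def comb_decompose(k, ell, width):
    """Split k (< 2**(ell*width)) into ell combs of the form sum_i b_i * 2**(i*ell)."""
    combs = []
    for j in range(ell):
        kj = 0
        for i in range(width):
            bit = (k >> (j + i * ell)) & 1
            kj |= bit << (i * ell)
        combs.append(kj)
    return combs

def comb_multiply(k, P, N, ell, width):
    """Compute k*P mod N with one precomputed table, ell lookups and ell doublings."""
    table = {}
    for pattern in range(1 << width):
        scalar = sum(((pattern >> i) & 1) << (i * ell) for i in range(width))
        table[scalar] = scalar * P % N          # classical precomputation
    total = 0
    for kj in reversed(comb_decompose(k, ell, width)):
        total = (2 * total + table[kj]) % N     # double the running total, then look up
    return total
```

The loop performs one table lookup per comb, which is the point made in the text: the table is shared, but the number of lookups is not reduced.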
The advantage of the comb technique is that it saves precomputation. We only precompute one table and use it for the entire computation. Unfortunately for the quantum case, precomputation is essentially free because it is entirely classical, but lookups are expensive. The comb technique does not reduce the number of table lookups, since we must do a separate lookup for each index \(k_j\).
Further, efficient in-place point doubling is unlikely, since it implies efficient in-place point halving. Thus, doubling points in the comb would require some pebbling technique which would likely add significant depth or width costs.
B Modular Division and Addition
For elliptic curve addition, we only need to divide integers and copy the result to a blank output, but other applications may wish to construct a circuit that, given registers containing x, y, and z, will compute \(yx^{-1}+z\).
We might simply invert, multiply, and then add the output of the multiplication instead of copying. However, doubling the output to correct the pseudo-inverse while uncomputing will also multiply z by a factor of \(2^{2n-k}\). To correct for this, we can repeatedly halve z during the forward computation of the modular inverse. This means that while we compute the modular inverse, we control a modular halving of the register containing z by the counter, which will halve z exactly \(2n-k\) times. Then we multiply the pseudo-inverse by y and add the result to the register with z, producing the state \(\left| x^{-1}2^{-(2n-k)}\bmod p\right\rangle \left| y2^n\bmod p\right\rangle \left| z2^{-(2n-k)}+x^{-1}y2^{-(2n-k)}\bmod p\right\rangle \). From here, if we perform controlled modular doublings of the register containing z as we uncompute the inversion circuit, this will correct both z and the pseudo-inverse of x, producing the desired output.
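The bookkeeping of halvings and doublings can be checked classically. The sketch below assumes that all three inputs are stored in Montgomery form (value times \(2^n\) modulo p) and that the pseudo-inverse has the Kaliski-style form \(X^{-1}2^k\); both are assumptions of this illustration, and the function names are hypothetical. It tracks only the arithmetic, not the reversible circuit or its control logic.

```python
def almost_inverse(x, p):
    """Return (r, k) with r = x^(-1) * 2^k mod p, for an odd prime p and 0 < x < p."""
    u, v, r, s, k = p, x, 0, 1, 0
    while v > 0:
        if u % 2 == 0:
            u, s = u // 2, 2 * s
        elif v % 2 == 0:
            v, r = v // 2, 2 * r
        elif u > v:
            u, r, s = (u - v) // 2, r + s, 2 * s
        else:
            v, s, r = (v - u) // 2, s + r, 2 * r
        k += 1
    if r >= p:
        r -= p
    return p - r, k

def divide_and_add(X, Y, Z, p, n):
    """Inputs are x, y, z in Montgomery form; returns Montgomery forms of (1/x, y/x + z)."""
    half = pow(2, -1, p)
    pseudo, k = almost_inverse(X, p)
    for _ in range(2 * n - k):                 # halve z during the forward inversion
        Z = Z * half % p
    product = pseudo * Y * pow(2, -n, p) % p   # Montgomery product of pseudo-inverse and y
    Z = (Z + product) % p
    for _ in range(2 * n - k):                 # doublings correct both registers
        Z = 2 * Z % p
        pseudo = 2 * pseudo % p
    return pseudo, Z
```

After the final doublings, the first value is the Montgomery form of \(x^{-1}\) and the second is the Montgomery form of \(yx^{-1}+z\), so the two scaling factors cancel as described.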
C Analysis of Windowed Arithmetic
A quantum lookup to N elements requires 4N T gates [2]. To optimize window costs, we balance this cost against the operations we save.
Multiplication. Section 4.1 describes a single windowed multiplication round. For n-bit integers with window size k, repeating this round \(\left\lceil n/k \right\rceil \) times performs the full multiplication. Since the quantum lookup will cost \(4\cdot 2^k\) T gates [2] and uncontrolled \((n+k)\)-bit addition costs \(O(n+k)\) T gates, we expect the optimal window size to be approximately \(k=O(\lg n)\). The total multiplication cost is still \(O(n^2)\) because we only window addition by p, not addition of the quantum register y. Compared to unwindowed add-and-halve multiplication, windowing should save a factor of roughly \(\frac{1}{2}+ O(\frac{1}{\lg n})\). Similar reasoning suggests savings of \(\frac{1}{2}+O(\frac{1}{\lg \lg n})\) in depth.
Numerical estimates show a window size of \(k\approx 0.7\lg n + 0.5\) optimizes T-count, and \(1.97\lg \lg n-1.11\) optimizes T-depth. At the scale we estimate, this is only noticeable in the leading coefficient of the cost. We found a 22% reduction in T-depth at 384 bits, for example.
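To get a feel for the sizes involved, the fitted window-size formulas and the lookup cost \(4\cdot 2^k\) can be evaluated directly. The function names below are hypothetical, and rounding the fitted value to the nearest integer is an assumption of this sketch rather than a rule stated in the text.

```python
import math

def t_count_window(n):
    """Fitted window size optimizing T-count for n-bit multiplication."""
    return 0.7 * math.log2(n) + 0.5

def t_depth_window(n):
    """Fitted window size optimizing T-depth for n-bit multiplication."""
    return 1.97 * math.log2(math.log2(n)) - 1.11

def lookup_t_count(k):
    """T gates for one quantum lookup over 2**k table entries (4N with N = 2**k)."""
    return 4 * 2 ** k

for n in (256, 384, 521):
    k = round(t_count_window(n))
    print(n, k, lookup_t_count(k), math.ceil(n / k))  # bits, window, T per lookup, rounds
```

For 256-bit integers the two formulas give window sizes of about 6.1 and 4.8 respectively, so a window tuned for T-count is slightly larger than one tuned for T-depth.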
Windowing adds a significant cost of roughly \(n+k\) auxiliary qubits, but the full elliptic curve point addition circuit has enough unused auxiliary qubits during any multiplication that this does not make a difference.
Point Addition. Windowing requires 2 extra registers as the cache to load the precomputed points. We use the components of the second point three times during point addition. We could perform the lookup once and keep the values, increasing total circuit width by two registers. Alternatively, we can fit the lookups within the existing space. At every point where \(x_2\) or \(y_2\) are added, the circuit has spare auxiliary qubits available. Thus, we can perform the lookup, add the point to the quantum register, then uncompute the quantum lookup to free the qubits for the expensive modular division. This requires six lookups (including uncomputing) rather than just two, but uses no extra registers.
With a window size of \(\ell \), including sign bit, each lookup costs \(4\cdot 2^{\ell -1}\) T gates and T-depth. The windowing saves us \(\ell -1\) point additions. If point addition costs \(\mathsf {A}\) T gates, we would expect \(\ell \approx \lg (\mathsf {A}/24)\) to be the optimal value, leading to a factor \(\ell \) reduction in T-gate cost.
D Automatic Compilation for Aggressive T-Count and T-Depth Reduction
In this section, we motivate automatic compilation methods to drastically reduce the T-count and the T-depth if we allow a significant increase in circuit width.
The modular multiplication followed by an addition is one of the most costly operations in the overall algorithm. It is implemented as a unitary \(U : |x\rangle |y\rangle |z\rangle |0\rangle \mapsto |x\rangle |y\rangle |(xy + z) \bmod p\rangle |0\rangle \) that adds the result of the multiplication of two numbers x and y onto a third number z, all in Montgomery form with bit width n and modulus p. We apply the following procedure to automatically obtain a quantum circuit for this operation:

1. We generate logic networks over the gate basis \(\{\mathrm {AND}, \mathrm {XOR}, \mathrm {INV}\}\), called Xor-And-inverter Graphs (XAGs), for the functions \(xy \bmod p\), \((x + y) \bmod p\), and \((x - y) \bmod p\), where x and y are integers in Montgomery form.
2. We apply the logic optimization method described in [30] to minimize the number of AND gates in the XAGs.
3. The optimized XAGs are then translated into out-of-place quantum circuits using the method in [18], which requires 4 T gates for each AND gate in the XAG. Optimizing these circuits for depth requires roughly 2 qubits for each AND gate in the XAG, by using the AND gate construction from Sect. 3.
4. The automatically generated unitaries are composed as described in Fig. 9, which uses a technique similar to that described in Appendix A.4 to turn the out-of-place addition and subtraction into an in-place addition.
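The link between AND count and T-count in step 3 can be illustrated on a much smaller function than the ones used here. The sketch below builds a plain (non-modular) ripple-carry adder from XOR and AND operations, using the standard one-AND majority formula for the carry, and applies the 4 T gates per AND rule. It is not one of the XAGs generated for this work, no optimization from [30] is applied, and all names are hypothetical.

```python
def full_adder(a, b, c):
    """One-bit full adder over {AND, XOR}: two XOR-only sums and a single AND."""
    total = a ^ b ^ c
    carry = ((a ^ c) & (b ^ c)) ^ c   # majority(a, b, c) with one AND gate
    return total, carry

def ripple_add(x, y, n):
    """Add two n-bit integers; return (x + y, number of AND gates used)."""
    carry, result, and_count = 0, 0, 0
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
        and_count += 1                # one AND per bit position
    result |= carry << n              # final carry becomes the top output bit
    return result, and_count

def t_count_estimate(and_count):
    """T-count under the 4-T-gates-per-AND translation described in step 3."""
    return 4 * and_count
```

For an n-bit adder this gives n AND gates and hence an estimate of 4n T gates. XOR and inversion contribute no T gates in this model, which is why the optimization in step 2 targets only the AND count.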
Table 2 lists the resource costs in terms of T-count, T-depth, and circuit width, for both the manual construction and the automatic construction. Several factors of reduction in T-count and T-depth are possible, while the increase in the number of qubits is significant. However, such a design point can be of high interest, in particular when combined with automatic quantum memory strategies, e.g., pebbling [19], that can find intermediate tradeoff points that lie in between the manual and automatic construction.
© 2020 Springer Nature Switzerland AG
Häner, T., Jaques, S., Naehrig, M., Roetteler, M., Soeken, M. (2020). Improved Quantum Circuits for Elliptic Curve Discrete Logarithms. In: Ding, J., Tillich, J.-P. (eds) Post-Quantum Cryptography. PQCrypto 2020. Lecture Notes in Computer Science, vol 12100. Springer, Cham. https://doi.org/10.1007/978-3-030-44223-1_23
DOI: https://doi.org/10.1007/978-3-030-44223-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44222-4
Online ISBN: 978-3-030-44223-1
eBook Packages: Computer Science, Computer Science (R0)