Abstract
We present improved quantum circuits for elliptic curve scalar multiplication, the most costly component in Shor’s algorithm to compute discrete logarithms in elliptic curve groups. We optimize low-level components such as reversible integer and modular arithmetic through windowing techniques and more adaptive placement of uncomputing steps, and improve over previous quantum circuits for modular inversion by reformulating the binary Euclidean algorithm. Overall, we obtain an affine Weierstrass point addition circuit that has lower depth and uses fewer T gates than previous circuits. While previous work mostly focuses on minimizing the total number of qubits, we present various trade-offs between different cost metrics including the number of qubits, circuit depth and T-gate count. Finally, we provide a full implementation of point addition in the Q# quantum programming language that allows unit tests and automatic quantum resource estimation for all components.
S. Jaques—Partially supported by the University of Oxford Clarendon fund.
Most of this work was done by Samuel Jaques, while he was an intern at Microsoft Research.
Notes
1. Our code will be released under an open source license.
References
Amy, M., Maslov, D., Mosca, M., Roetteler, M.: A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 32(6), 818–830 (2013)
Babbush, R., et al.: Encoding electronic spectra in quantum circuits with linear T complexity. Phys. Rev. X 8(4), 041015 (2018). arXiv: quant-ph/1805.03662
Barenco, A., et al.: Elementary gates for quantum computation. Phys. Rev. A 52(5), 3457–3467 (1995). arXiv: quant-ph/9503016
Bennett, C.H.: Logical reversibility of computation. IBM J. Res. Dev. 17(6), 525–532 (1973)
Bennett, C.H.: Time/space trade-offs for reversible computation. SIAM J. Comput. 18(4), 766–776 (1989)
Bernstein, D.J., Lange, T.: Explicit-formulas database (2007). https://www.hyperelliptic.org/EFD
Bernstein, D.J., Yang, B.-Y.: Fast constant-time GCD computation and modular inversion. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019(3), 340–398 (2019)
Cuccaro, S.A., Draper, T.G., Kutin, S.A., Moulton, D.P.: A new quantum ripple-carry addition circuit (2004). arXiv:quant-ph/0410184
Draper, T.G., Kutin, S.A., Rains, E.M., Svore, K.M.: A logarithmic-depth quantum carry-lookahead adder, June 2004. arXiv: quant-ph/0406142
Fowler, A.G., Mariantoni, M., Martinis, J.M., Cleland, A.N.: Surface codes: towards practical large-scale quantum computation. Phys. Rev. A 86, 032324 (2012)
Gheorghiu, V., Mosca, M.: Benchmarking the quantum cryptanalysis of symmetric, public-key and hash-based cryptographic schemes (2019)
Gidney, C.: Halving the cost of quantum addition. Quantum 2, 74 (2018)
Gidney, C.: Windowed quantum arithmetic (2019). arXiv: quant-ph/1905.07682
Gidney, C., Ekerå, M.: How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits, May 2019. arXiv: quant-ph/1905.09749
Griffiths, R., Niu, C.-S.: Semiclassical Fourier transform for quantum computation. Phys. Rev. Lett. 76(17), 3228–3231 (1996)
Jones, C.: Low-overhead constructions for the fault-tolerant Toffoli gate. Phys. Rev. A 87(2), 022328 (2013)
Kaliski, B.S.: The Montgomery inverse and its applications. IEEE Trans. Comput. 44(8), 1064–1065 (1995)
Meuli, G., Soeken, M., Campbell, E., Roetteler, M., De Micheli, G.: The role of multiplicative complexity in compiling low T-count oracle circuits (2019). arXiv: quant-ph/1908.01609
Meuli, G., Soeken, M., Roetteler, M., Bjørner, N., De Micheli, G.: Reversible pebbling game for quantum memory management. In: Design, Automation & Test in Europe Conference, pp. 288–291 (2019)
Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44(170), 519–521 (1985)
Moore, C.: Quantum circuits: fanout, parity, and counting (1999). arXiv: quant-ph/9903046
Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2000)
Proos, J., Zalka, C.: Shor’s discrete logarithm quantum algorithm for elliptic curves, January 2003. arXiv: quant-ph/0301141
Roetteler, M., Naehrig, M., Svore, K.M., Lauter, K.: Quantum resource estimates for computing elliptic curve discrete logarithms. In: Takagi, T., Peyrin, T. (eds.) ASIACRYPT 2017. LNCS, vol. 10625, pp. 241–270. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70697-9_9
Selinger, P.: Quantum circuits of T-depth one. Phys. Rev. A 87(4), 042302 (2013). arXiv: 1210.0974
Shor, P.W.: Algorithms for quantum computation: discrete logarithms and factoring. In: FOCS 1994, pp. 124–134 (1994)
Shor, P.W.: Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM J. Comput. 26(5), 1484–1509 (1997)
Svore, K.M., et al.: Q#: enabling scalable quantum computing and development with a high-level DSL. In: RWDSL@CGO 2018 (2018)
Takahashi, Y., Tani, S., Kunihiro, N.: Quantum addition circuits and unbounded fan-out. Quantum Inf. Comput. 10, 10 (2009)
Testa, E., Soeken, M., Amarù, L.G., De Micheli, G.: Reducing the multiplicative complexity in logic networks for cryptography and security applications. In: Design Automation Conference, p. 74 (2019)
U.S. Department of Commerce/National Institute of Standards and Technology. Digital signature standard (DSS). FIPS-186-4 (2013). http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-4.pdf
Acknowledgements
We thank Dan Bernstein, Martin Ekerå, Iggy van Hoof, and Tanja Lange for helpful suggestions about elliptic curve arithmetic. We thank Martin Albrecht for lending computing power to run resource estimates.
Appendices
A Alternative Approaches
A.1 Modular Multiplication
RNSL provide two circuits for modular multiplication. The first is the one proposed by Proos and Zalka [23], which uses a double-and-add approach, where doubling and addition are both modular operations modulo p. The other is reversible Montgomery multiplication, which uses an add-and-halve approach and works in Montgomery form. The primary motivation for considering Montgomery multiplication instead of the straightforward double-and-add method is that modular reduction is achieved by suitable additions to clear lower order bits and divisions by 2 (i.e. bit rotations) as part of the whole circuit, not delegated to the addition and halving circuits. This results in simpler operations per bit.
However, Montgomery multiplication has the downside that it entangles with a register of auxiliary qubits which must be cleared. In our case, at every point in an elliptic curve point addition, we have enough spare auxiliary qubits for this. Overall, it is cheaper, even with the Bennett method, and especially with the multiply-then-add technique of Sect. 4.1.
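The add-and-halve structure can be illustrated with a purely classical sketch. The function name `montgomery_multiply`, the returned list `m_bits`, and the toy modulus in the usage note are illustrative assumptions; nothing here models the reversible circuit itself. The list `m_bits` records, per round, whether p was added to clear the low bit, which is the kind of by-product a reversible version would have to hold in auxiliary qubits and later clear.

```python
def montgomery_multiply(x: int, y: int, p: int, n: int):
    """Classical add-and-halve Montgomery product (illustrative sketch).

    Returns (x * y * 2^(-n) mod p, m_bits) for an odd modulus p < 2^n
    and inputs 0 <= x, y < p.
    """
    assert p % 2 == 1 and p < (1 << n)
    assert 0 <= x < p and 0 <= y < p
    acc = 0
    m_bits = []
    for i in range(n):
        if (x >> i) & 1:      # add y when bit i of x is set
            acc += y
        m = acc & 1           # is the accumulator odd?
        if m:
            acc += p          # adding the odd modulus clears the low bit
        m_bits.append(m)
        acc >>= 1             # exact halving, no remainder is lost
    if acc >= p:              # acc < 2p holds here, so one subtraction suffices
        acc -= p
    return acc, m_bits
```

For inputs already in Montgomery form, \(x2^n\) and \(y2^n\), the result is \(xy2^n \bmod p\), so products stay in Montgomery form. For example, `montgomery_multiply(200, 123, 251, 8)` returns the product scaled by \(2^{-8}\) modulo 251 together with eight recorded bits.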
A.2 Modular Inversion
Proos and Zalka [23] (PZ) gave an approach to modular inversion based on precise control of a bit-shift division operation, with asymptotic complexity of \(O(n^2)\). There are O(n) iterations of a round. Each round implements conditional logic by computing state qubits, then using those state qubits to control some operations on the integer registers.
RNSL use a similar round-based construction, which implements a reversible binary extended Euclidean algorithm. As with multiplication, the primary difference between the PZ division and the RNSL division is that PZ’s is based on doubling and integer long division, while RNSL’s is based on halving and binary operations. The PZ inversion leaves only \(O(\lg n)\) auxiliary qubits, while RNSL creates \(2n+O(\lg n)\) auxiliary qubits, but PZ has a higher depth and gate cost.
Naively, the PZ approach uses 5n qubits, though they show that, with fidelity loss on the order of \(O(n^{-3})\) per round, they require only \(2n+8\sqrt{n}+O(\log n)\) qubits. The RNSL approach uses 6n qubits. We choose to use the RNSL algorithm. It is exactly correct, so it can be used for higher depth algorithms, and the total T-cost and depth are less than half of the PZ approach.
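A halving-based binary extended Euclidean algorithm can be illustrated classically with Kaliski's almost-inverse (see the Kaliski entry in the references). The sketch below is an assumption-laden illustration, not the RNSL circuit: the function name `almost_inverse` is hypothetical, the loop runs for a data-dependent number of rounds k, and none of the state qubits, counters, or fixed round counts of a reversible version are modelled. It returns the pseudo-inverse \(x^{-1}2^k \bmod p\) together with k.

```python
def almost_inverse(x: int, p: int):
    """Return (r, k) with r = x^(-1) * 2^k mod p, for an odd prime p and 0 < x < p."""
    assert p % 2 == 1 and 0 < x < p
    # Invariants: x*r = -u*2^k (mod p) and x*s = v*2^k (mod p).
    u, v, r, s, k = p, x, 0, 1, 0
    while v > 0:
        if u % 2 == 0:
            u, s = u // 2, 2 * s
        elif v % 2 == 0:
            v, r = v // 2, 2 * r
        elif u > v:
            u, r, s = (u - v) // 2, r + s, 2 * s
        else:
            v, s, r = (v - u) // 2, s + r, 2 * r
        k += 1
    if r >= p:                # r < 2p at this point
        r -= p
    return p - r, k           # negate r to obtain x^(-1) * 2^k mod p
```

Each round performs one halving, so the factor \(2^k\) has to be removed afterwards by k further halvings (or compensated by doublings elsewhere). Kaliski's analysis bounds k between n and 2n for an n-bit modulus.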
A.3 Recursive GCD Algorithms
There are several sub-quadratic GCD algorithms (such as [7]). These work by defining a series of \(2\times 2\) matrices \(T_n\) such that \(T_nT_{n-1}\cdots T_1(u,v)^T\) will map integers u and v to the nth step of the Euclidean algorithm. These can be computed and multiplied together recursively.
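The matrix formulation can be made concrete with classical Euclidean quotient steps. This sketch is only an illustration of the idea that a product of \(2\times 2\) matrices reproduces the state after several steps; it uses ordinary quotient matrices rather than the specific transition matrices of [7], and all function names are hypothetical.

```python
def euclid_step_matrices(u: int, v: int, steps: int):
    """Run up to `steps` Euclidean steps; return the step matrices and final (u, v)."""
    mats = []
    for _ in range(steps):
        if v == 0:
            break
        q = u // v
        mats.append(((0, 1), (1, -q)))   # maps (u, v) to (v, u - q*v)
        u, v = v, u - q * v
    return mats, (u, v)

def matmul2(a, b):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def product(mats):
    """Return T_n ... T_1 for mats = [T_1, ..., T_n]."""
    total = ((1, 0), (0, 1))
    for m in mats:                        # later steps multiply on the left
        total = matmul2(m, total)
    return total

def apply(m, vec):
    """Apply a 2x2 matrix to a column vector (u, v)."""
    return (m[0][0] * vec[0] + m[0][1] * vec[1],
            m[1][0] * vec[0] + m[1][1] * vec[1])
```

Applying `product(mats)` to the original pair gives the same state as stepping through the algorithm, which is what lets a recursive algorithm compute the matrices for the two halves separately and multiply them.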
Adapted to quantum circuits, these approaches require quantum matrix multiplication. We could find no efficient method to do this in-place, meaning that each recursive call would require a new set of auxiliary qubits to store the matrix output. This would quickly overwhelm our qubit budget. The base case of [7] is nearly identical to our approach for a single round.
One of the primary advantages of [7] is that the recursive process allows much of the arithmetic to be done with small integers which fit into the registers of classical CPUs. All the qubits in our model of a quantum computer are identical, so it has no caching or register issues. If quantum technologies arise with different kinds of qubits (perhaps a “memory” with higher coherence times but lower gate fidelity), then recursive GCD algorithms should be revisited. It is also possible that the specific structure of the matrices in this approach permits an easy, in-place multiplication circuit. We leave this to future work.
A.4 Alternate Curve Representations
Projective Coordinates. Projective coordinates use equivalence classes (X : Y : Z) of triples (X, Y, Z) to represent an elliptic curve point, where \((X_1,Y_1,Z_1)\sim (X_2,Y_2,Z_2)\) iff there is some non-zero constant c such that \(X_1=cX_2\), \(Y_1=cY_2\), and \(Z_1=cZ_2\). These can be used with many different families of curves. Projective coordinates lend themselves to efficient, inversion-free arithmetic, which is appealing for classical computers.
Projective coordinates do not give a unique representation of each point, which Shor’s algorithm requires to ensure history independence and thus proper interference of states in superposition. Dividing by the Z coordinate produces a unique representation but requires an expensive division. It is an open problem to provide a unique projective representation with division-free arithmetic.
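The non-uniqueness is easy to see numerically. The following sketch, with hypothetical helper names and a toy prime, scales a triple by a constant and shows that only the division by Z brings both triples back to the same representative. The triples are not checked against any curve equation; the example concerns the representation only.

```python
def scale(point, c, p):
    """Multiply every coordinate of a projective triple by c modulo p."""
    X, Y, Z = point
    return (c * X % p, c * Y % p, c * Z % p)

def to_affine(point, p):
    """Normalise (X : Y : Z) with Z != 0 to the affine pair (X/Z, Y/Z) modulo p."""
    X, Y, Z = point
    z_inv = pow(Z, -1, p)     # the expensive modular division
    return (X * z_inv % p, Y * z_inv % p)

p = 97
P1 = (3, 5, 1)
P2 = scale(P1, 11, p)         # same projective point, different triple
```

Here `P1` and `P2` differ as bit strings, so two computational paths reaching them would not interfere, while `to_affine` maps both to the same pair at the cost of one inversion.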
Another issue is that the classical elliptic curve formulas, naively adapted to quantum circuits, operate out-of-place. An out-of-place addition circuit is easy to adapt into an in-place addition circuit. If we can construct a circuit \(U_{+Q}\) to add a point Q, we can construct a circuit \(U_{-Q}\), and we can construct an in-place point addition by writing \(P+Q\) into another register, then subtracting Q from \(P+Q\) to clear the input. This doubles the cost of point addition.
This technique requires a unique representation. If \((P\,+\,Q)\,-\,Q\) does not have the same representation as P, we cannot cancel them out. Thus, for any current algorithm to compute addition with projective coordinates with cost C, we can transform it to a quantum-suitable in-place version with cost \(2C+2D\), where D is the cost of division. The division creates a unique representation.
According to the Explicit Formulas Database [6], the lowest-cost addition uses 6 squares and/or multiplications. With the required reductions, the total cost is 12 squares/multiplications and 2 divisions, much higher than affine Weierstrass coordinates. Thus, we choose not to use projective coordinates in this work.
A.5 Precomputation
Precomputed tables of certain powers of the base element can speed up exponentiations. The “comb” method is a standard technique used for elliptic curve scalar multiplication. To multiply a point P by a scalar k, we write k as \(k_1+2k_2+\cdots + 2^{\ell -1}k_\ell \) for some \(\ell \), with the property that \(k_j\) contains bits of k in positions congruent to j modulo \(\ell \) (each \(k_j\) looks like a comb of bits). We then precompute a table of all multiples of P by scalars of the form \(b_0+b_12^\ell +b_2 2^{2\ell }+\cdots \), with \(b_i\in \{0,1\}\). By the definition of \(k_j\), each \(k_jP\) is a precomputed point in this table for all j. Thus, we can compute kP by using \(k_j\) to look up elements of the table, adding them to a running total, and doubling the running total.
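A classical sketch of the comb method follows. It makes several assumptions for illustration: the group is the additive group of integers modulo a toy constant `M` standing in for the curve group, comb indices run from 0 to \(\ell -1\) rather than from 1, and all function names are hypothetical.

```python
M = 1009  # toy group Z_M standing in for an elliptic curve group (assumption)

def add(a, b):
    return (a + b) % M

def double(a):
    return add(a, a)

def comb_pattern(k, j, ell, teeth):
    """Collect bits j, j+ell, j+2*ell, ... of k into a table index."""
    return sum(((k >> (i * ell + j)) & 1) << i for i in range(teeth))

def build_table(P, ell, teeth):
    """Precompute (b_0 + b_1*2^ell + b_2*2^(2*ell) + ...) * P for all bit patterns."""
    table = []
    for pattern in range(1 << teeth):
        scalar = sum(((pattern >> i) & 1) << (i * ell) for i in range(teeth))
        table.append(scalar * P % M)      # purely classical precomputation
    return table

def comb_multiply(k, P, n, ell):
    """Compute k*P for an n-bit scalar k with ell look-ups and ell doublings."""
    teeth = -(-n // ell)                  # ceil(n / ell) bits per comb
    table = build_table(P, ell, teeth)
    total = 0
    for j in reversed(range(ell)):        # Horner evaluation over the combs
        total = double(total)
        total = add(total, table[comb_pattern(k, j, ell, teeth)])
    return total
```

A single table serves every comb, but the loop still performs one look-up per comb and a doubling of the running total between look-ups.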
The advantage of the comb technique is that it saves precomputation. We only precompute one table and use it for the entire computation. Unfortunately for the quantum case, precomputation is essentially free because it is entirely classical, but look-ups are expensive. The comb technique does not reduce the number of table look-ups, since we must do a separate look-up for each index \(k_j\).
Further, efficient in-place point doubling is unlikely, since it implies efficient in-place point halving. Thus, doubling points in the comb would require some pebbling technique which would likely add significant depth or width costs.
B Modular Division and Addition
For elliptic curve addition, we only need to divide integers and copy the result to a blank output, but other applications may wish to construct a circuit that, given registers containing x, y, and z, will compute \(yx^{-1}+z\).
We might simply invert, multiply, and then add the output of the multiplication instead of copying. However, doubling the output to correct the pseudo-inverse while uncomputing will also multiply z by a factor of \(2^{2n-k}\). To correct for this, we can repeatedly halve z during the forward computation of the modular inverse. This means that while we compute the modular inverse, we control a modular halving of the register containing z by the counter, which will halve z exactly \(2n-k\) times. Then we multiply the pseudo-inverse by y and add the result to the register with z, producing the state \(\left| x^{-1}2^{2n-k}\bmod p\right\rangle \left| y2^n\bmod p\right\rangle \left| z2^{2n-k}+x^{-1}y2^{2n-k}\bmod p\right\rangle \). From here, if we perform controlled modular doublings of the register containing z as we uncompute the inversion circuit, this will correct both z and the pseudo-inverse of x, producing the desired output.
C Analysis of Windowed Arithmetic
A quantum look-up to N elements requires 4N T-gates [2]. To optimize window costs, we balance this cost against the operations we save.
Multiplication. Section 4.1 describes a single windowed multiplication round. For n-bit integers with window size k, repeating this round \(\left\lceil n/k \right\rceil \) times performs the full multiplication. Since the quantum look-up will cost \(4\cdot 2^k\) T gates [2] and uncontrolled \(n+k\)-bit addition costs \(O(n+k)\) T gates, we expect the optimal window size to be approximately \(k=O(\lg n)\). The total multiplication cost is still \(O(n^2)\) because we only window addition by p, not addition of the quantum register y. Compared to un-windowed add-and-halve multiplication, windowing should save a factor of roughly \(\frac{1}{2}+ O(\frac{1}{\lg n})\). Similar reasoning suggests savings of \(\frac{1}{2}+O(\frac{1}{\lg \lg n})\) in depth.
Numerical estimates show a window size of \(k\approx 0.7\lg n + 0.5\) optimizes T-count, and \(1.97\lg \lg n-1.11\) optimizes T-depth. At the scale we estimate, this is only noticeable in the leading coefficient of the cost. We found a 22% reduction in T-depth at 384 bits, for example.
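The two fitted expressions can be evaluated directly. The sketch below simply transcribes them; the function names, the choice of bit lengths, and the rounding to the nearest integer are illustrative assumptions and not part of the numerical estimates themselves.

```python
import math

def t_count_window(n: int) -> float:
    """Fitted window size minimising T-count: 0.7 * lg(n) + 0.5."""
    return 0.7 * math.log2(n) + 0.5

def t_depth_window(n: int) -> float:
    """Fitted window size minimising T-depth: 1.97 * lg(lg(n)) - 1.11."""
    return 1.97 * math.log2(math.log2(n)) - 1.11

if __name__ == "__main__":
    for n in (256, 384, 521):             # illustrative bit lengths
        # Rounding to an integer window size is an assumption made here.
        print(n, round(t_count_window(n)), round(t_depth_window(n)))
```

Because the T-depth fit grows with \(\lg \lg n\), the depth-optimal window stays smaller than the count-optimal one across this range.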
Windowing adds a significant cost of roughly \(n+k\) auxiliary qubits, but the full elliptic curve point addition circuit has enough unused auxiliary qubits during any multiplication that this does not make a difference.
Point Addition. Windowing requires 2 extra registers as the cache to load the precomputed points. We use the components of the second point three times during point addition. We could perform the look-up once and keep the values, increasing total circuit width by two registers. Alternatively, we can fit the look-ups within the existing space. At every point where \(x_2\) or \(y_2\) is added, the circuit has spare auxiliary qubits available. Thus, we can perform the look-up, add the point to the quantum register, then uncompute the quantum look-up to free the qubits for the expensive modular division. This requires six look-ups (including uncomputing) rather than just two, but uses no extra registers.
With a window size of \(\ell \), including the sign bit, each look-up costs \(4\cdot 2^{\ell - 1}\) in both T-count and T-depth. Windowing saves us \(\ell -1\) point additions. If point addition costs \(\mathsf {A}\) T gates, we would expect \(\ell \approx \lg (\mathsf {A}/24)\) to be the optimal value, leading to a factor \(\ell \) reduction in T-gate cost.
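The trade-off between look-up cost and saved point additions can be explored with a toy cost model. Everything in this sketch is an assumption made for illustration: the model charges one point addition plus six look-ups of \(4\cdot 2^{\ell -1}\) T gates per window, reads the constant 24 in the heuristic as six look-ups times four T gates per entry, and uses hypothetical function names and parameter values.

```python
import math

def lookup_cost(ell: int) -> int:
    """T gates for one look-up with window size ell (one bit is the sign)."""
    return 4 * 2 ** (ell - 1)

def windowed_cost(scalar_bits: int, ell: int, addition_cost: int,
                  lookups_per_addition: int = 6) -> int:
    """Toy model: each window costs one point addition plus its look-ups."""
    windows = math.ceil(scalar_bits / ell)
    return windows * (addition_cost + lookups_per_addition * lookup_cost(ell))

def heuristic_window(addition_cost: int) -> float:
    """The rule of thumb ell = lg(A / 24)."""
    return math.log2(addition_cost / 24)

def best_window(scalar_bits: int, addition_cost: int, max_ell: int = 32) -> int:
    """Brute-force minimiser of the toy model over integer window sizes."""
    return min(range(1, max_ell + 1),
               key=lambda ell: windowed_cost(scalar_bits, ell, addition_cost))
```

The rule of thumb is an order-of-magnitude guide, so the brute-force minimum of this toy model need not coincide with it exactly; both grow logarithmically in the point addition cost.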
D Automatic Compilation for Aggressive T-Count and T-Depth Reduction
In this section, we motivate automatic compilation methods to drastically reduce the T-count and the T-depth if we allow a significant increase in circuit width.
The modular multiplication followed by an addition is one of the most costly operations in the overall algorithm. It is implemented as a unitary \(U : |x\rangle |y\rangle |z\rangle |0\rangle \mapsto |x\rangle |y\rangle |(xy + z) \bmod p\rangle |0\rangle \) that adds the result of the multiplication of two numbers x and y onto a third number z, all in Montgomery form with bit-width n and modulus p. We apply the following procedure to automatically obtain a quantum circuit for this operation:
1. We generate logic networks over the gate basis \(\{\mathrm {AND}, \mathrm {XOR}, \mathrm {INV}\}\), called Xor-And-inverter Graphs (XAGs), for the functions \(xy \bmod p\), \((x + y) \bmod p\), and \((x - y) \bmod p\), where x and y are integers in Montgomery form.
2. We apply the logic optimization method described in [30] to minimize the number of AND gates in the XAGs.
3. The optimized XAGs are then translated into out-of-place quantum circuits using the method in [18], which requires 4 T gates for each AND gate in the XAG. Optimizing these circuits for depth requires roughly 2 qubits for each AND gate in the XAG, by using the AND gate construction from Sect. 3.
4. The automatically generated unitaries are composed as described in Fig. 9, which uses a technique similar to that described in Appendix A.4 to turn the out-of-place addition and subtraction into an in-place addition.
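The link between AND-gate count and T-count can be sketched on a toy example. The code below evaluates a plain (non-modular) ripple-carry adder using only XOR and AND, computing each carry with a single AND through the identity \(\mathrm {maj}(a,b,c)=((a\oplus c)\wedge (b\oplus c))\oplus c\), and then applies the 4 T gates per AND rule from the procedure above. It is not one of the XAGs used in this work, and the function names are hypothetical.

```python
def ripple_adder_xag(a_bits, b_bits):
    """Add two little-endian bit lists with XOR/AND only.

    Returns (sum_bits, and_count), where sum_bits has one extra carry-out bit.
    """
    carry, and_count, out = 0, 0, []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)                     # sum bit: XOR only
        carry = ((a ^ carry) & (b ^ carry)) ^ carry   # majority with one AND
        and_count += 1
    out.append(carry)
    return out, and_count

def t_count_estimate(and_count: int) -> int:
    """4 T gates per AND gate, as stated for the out-of-place translation."""
    return 4 * and_count

def to_bits(value: int, width: int):
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))
```

For a 4-bit addition this network uses four AND gates, giving an estimate of 16 T gates. XOR gates do not contribute, which is why the optimization targets the number of AND gates.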
Table 2 lists the resource costs in terms of T-count, T-depth, and circuit width, for both the manual construction and the automatic construction. Several factors of reduction in T-count and T-depth are possible, while the increase in the number of qubits is significant. However, such a design point can be of high interest, in particular when combined with automatic quantum memory strategies, e.g., pebbling [19], that can find intermediate trade-off points that lie in between the manual and automatic construction.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Häner, T., Jaques, S., Naehrig, M., Roetteler, M., Soeken, M. (2020). Improved Quantum Circuits for Elliptic Curve Discrete Logarithms. In: Ding, J., Tillich, JP. (eds) Post-Quantum Cryptography. PQCrypto 2020. Lecture Notes in Computer Science, vol 12100. Springer, Cham. https://doi.org/10.1007/978-3-030-44223-1_23
DOI: https://doi.org/10.1007/978-3-030-44223-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44222-4
Online ISBN: 978-3-030-44223-1
eBook Packages: Computer Science (R0)