Improved Quantum Circuits for Elliptic Curve Discrete Logarithms

Häner, Thomas; Jaques, Samuel; Naehrig, Michael; Roetteler, Martin; Soeken, Mathias

doi:10.1007/978-3-030-44223-1_23

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12100)

Included in the following conference series:

International Conference on Post-Quantum Cryptography

Abstract

We present improved quantum circuits for elliptic curve scalar multiplication, the most costly component in Shor’s algorithm to compute discrete logarithms in elliptic curve groups. We optimize low-level components such as reversible integer and modular arithmetic through windowing techniques and more adaptive placement of uncomputing steps, and improve over previous quantum circuits for modular inversion by reformulating the binary Euclidean algorithm. Overall, we obtain an affine Weierstrass point addition circuit that has lower depth and uses fewer T gates than previous circuits. While previous work mostly focuses on minimizing the total number of qubits, we present various trade-offs between different cost metrics including the number of qubits, circuit depth and T-gate count. Finally, we provide a full implementation of point addition in the Q# quantum programming language that allows unit tests and automatic quantum resource estimation for all components.

S. Jaques—Partially supported by the University of Oxford Clarendon fund.

Most of this work was done by Samuel Jaques, while he was an intern at Microsoft Research.

Notes

1. Our code will be released under an open source license.

References

1. Amy, M., Maslov, D., Mosca, M., Roetteler, M.: A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 32(6), 818–830 (2013)
2. Babbush, R., et al.: Encoding electronic spectra in quantum circuits with linear T complexity. Phys. Rev. X 8(4), 041015 (2018). arXiv: quant-ph/1805.03662
3. Barenco, A., et al.: Elementary gates for quantum computation. Phys. Rev. A 52(5), 3457–3467 (1995). arXiv: quant-ph/9503016
4. Bennett, C.H.: Logical reversibility of computation. IBM J. Res. Dev. 17(6), 525–532 (1973)
5. Bennett, C.H.: Time/space trade-offs for reversible computation. SIAM J. Comput. 18(4), 766–776 (1989)
6. Bernstein, D.J., Lange, T.: Explicit-Formulas Database (2007). https://www.hyperelliptic.org/EFD
7. Bernstein, D.J., Yang, B.-Y.: Fast constant-time GCD computation and modular inversion. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019(3), 340–398 (2019)
8. Cuccaro, S.A., Draper, T.G., Kutin, S.A., Moulton, D.P.: A new quantum ripple-carry addition circuit (2004). arXiv: quant-ph/0410184
9. Draper, T.G., Kutin, S.A., Rains, E.M., Svore, K.M.: A logarithmic-depth quantum carry-lookahead adder, June 2004. arXiv: quant-ph/0406142
10. Fowler, A.G., Mariantoni, M., Martinis, J.M., Cleland, A.N.: Surface codes: towards practical large-scale quantum computation. Phys. Rev. A 86, 032324 (2012)
11. Gheorghiu, V., Mosca, M.: Benchmarking the quantum cryptanalysis of symmetric, public-key and hash-based cryptographic schemes (2019)
12. Gidney, C.: Halving the cost of quantum addition. Quantum 2, 74 (2018)
13. Gidney, C.: Windowed quantum arithmetic (2019). arXiv: quant-ph/1905.07682
14. Gidney, C., Ekerå, M.: How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits, May 2019. arXiv: quant-ph/1905.09749
15. Griffiths, R., Niu, C.-S.: Semiclassical Fourier transform for quantum computation. Phys. Rev. Lett. 76(17), 3228–3231 (1996)
16. Jones, C.: Low-overhead constructions for the fault-tolerant Toffoli gate. Phys. Rev. A 87(2), 022328 (2013)
17. Kaliski, B.S.: The Montgomery inverse and its applications. IEEE Trans. Comput. 44(8), 1064–1065 (1995)
18. Meuli, G., Soeken, M., Campbell, E., Roetteler, M., De Micheli, G.: The role of multiplicative complexity in compiling low T-count oracle circuits (2019). arXiv: quant-ph/1908.01609
19. Meuli, G., Soeken, M., Roetteler, M., Bjørner, N., De Micheli, G.: Reversible pebbling game for quantum memory management. In: Design, Automation & Test in Europe Conference, pp. 288–291 (2019)
20. Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44(170), 519–521 (1985)
21. Moore, C.: Quantum circuits: fanout, parity, and counting (1999). arXiv: quant-ph/9903046
22. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2000)
23. Proos, J., Zalka, C.: Shor’s discrete logarithm quantum algorithm for elliptic curves, January 2003. arXiv: quant-ph/0301141
24. Roetteler, M., Naehrig, M., Svore, K.M., Lauter, K.: Quantum resource estimates for computing elliptic curve discrete logarithms. In: Takagi, T., Peyrin, T. (eds.) ASIACRYPT 2017. LNCS, vol. 10625, pp. 241–270. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70697-9_9
25. Selinger, P.: Quantum circuits of T-depth one. Phys. Rev. A 87(4), 042302 (2013). arXiv: 1210.0974
26. Shor, P.W.: Algorithms for quantum computation: discrete logarithms and factoring. In: FOCS 1994, pp. 124–134 (1994)
27. Shor, P.W.: Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM J. Comput. 26(5), 1484–1509 (1997)
28. Svore, K.M., et al.: Q#: enabling scalable quantum computing and development with a high-level DSL. In: RWDSL@CGO 2018 (2018)
29. Takahashi, Y., Tani, S., Kunihiro, N.: Quantum addition circuits and unbounded fan-out. Quantum Inf. Comput. 10, 10 (2009)
30. Testa, E., Soeken, M., Amarù, L.G., De Micheli, G.: Reducing the multiplicative complexity in logic networks for cryptography and security applications. In: Design Automation Conference, p. 74 (2019)
31. U.S. Department of Commerce/National Institute of Standards and Technology. Digital signature standard (DSS). FIPS-186-4 (2013). http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-4.pdf

Acknowledgements

We thank Dan Bernstein, Martin Ekerå, Iggy van Hoof, and Tanja Lange for helpful suggestions about elliptic curve arithmetic. We thank Martin Albrecht for lending computing power to run resource estimates.

Author information

Authors and Affiliations

Microsoft Quantum, Redmond, WA, USA
Thomas Häner, Martin Roetteler & Mathias Soeken
Department of Materials, University of Oxford, Oxford, UK
Samuel Jaques
Microsoft Research, Redmond, WA, USA
Michael Naehrig

Corresponding author

Correspondence to Samuel Jaques.

Editor information

Editors and Affiliations

University of Cincinnati, Cincinnati, OH, USA
Jintai Ding
Inria, Paris, France
Jean-Pierre Tillich

Appendices

A Alternative Approaches

A.1 Modular Multiplication

Roetteler, Naehrig, Svore, and Lauter [24] (RNSL) provide two circuits for modular multiplication. The first is the one proposed by Proos and Zalka [23], which uses a double-and-add approach, where doubling and addition are both performed modulo p. The other is reversible Montgomery multiplication, which uses an add-and-halve approach and works in Montgomery form. The primary motivation for considering Montgomery multiplication instead of the straightforward double-and-add method is that modular reduction is achieved by suitable additions to clear lower order bits and divisions by 2 (i.e. bit rotations) as part of the whole circuit, rather than being delegated to the addition and halving circuits. This results in simpler operations per bit.

However, Montgomery multiplication has the downside that it entangles with a register of auxiliary qubits which must be cleared. In our case, at every point in an elliptic curve point addition, we have enough spare auxiliary qubits for this. Overall, Montgomery multiplication is cheaper, even with the Bennett method, and especially with the multiply-then-add technique of Sect. 4.1.
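The add-and-halve structure can be illustrated classically. The following is a minimal Python sketch under stated assumptions, not the Q# circuit of this work: the function name `montgomery_multiply` and the toy parameters are illustrative. Each round conditionally adds y, adds p if that is needed to clear the lowest bit, and then halves.

```python
def montgomery_multiply(x: int, y: int, p: int, n: int) -> int:
    """Return x * y * 2^(-n) mod p for odd p, with 0 <= x, y < p < 2^n.

    Classical sketch of bitwise add-and-halve Montgomery multiplication.
    """
    acc = 0
    for i in range(n):
        # Add y if bit i of x is set.
        if (x >> i) & 1:
            acc += y
        # Clear the lowest bit by adding p (p is odd), so halving is exact.
        # In a reversible circuit this parity decision would presumably be
        # kept in an auxiliary bit, one per round (an assumption here).
        if acc & 1:
            acc += p
        acc >>= 1
    # The loop keeps acc < 2p, so one conditional subtraction suffices.
    if acc >= p:
        acc -= p
    return acc


def to_montgomery(a: int, p: int, n: int) -> int:
    """Map a to its Montgomery form a * 2^n mod p."""
    return (a << n) % p
```

For example, with p = 13 and n = 4, `montgomery_multiply(5, 7, 13, 4)` yields 3, which equals 5·7·2^{-4} mod 13. Multiplying two values in Montgomery form returns the Montgomery form of their product.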

A.2 Modular Inversion

Proos and Zalka [23] (PZ) gave an approach to modular inversion based on precise control of a bit-shift division operation, with asymptotic complexity of \(O(n^2)\). There are O(n) iterations of a round. Each round implements conditional logic by computing state qubits, then using those state qubits to control some operations on the integer registers.

RNSL use a similar round-based construction, which implements a reversible binary extended Euclidean algorithm. As with multiplication, the primary difference between the PZ division and the RNSL division is that PZ’s is based on doubling and integer long division, while RNSL’s is based on halving and binary operations. The PZ inversion leaves only \(O(\lg n)\) auxiliary qubits, while RNSL creates \(2n+O(\lg n)\) auxiliary qubits, but PZ has a higher depth and gate cost.

Naively, the PZ approach uses 5n qubits, though they show that, with fidelity loss on the order of \(O(n^{-3})\) per round, they require only \(2n+8\sqrt{n}+O(\log n)\) qubits. The RNSL approach uses 6n qubits. We choose to use the RNSL algorithm. It is exactly correct, so it can be used for higher depth algorithms, and the total T-cost and depth are less than half of the PZ approach.
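The halving-based round structure can be sketched classically. The Python below follows Kaliski's binary extended Euclidean algorithm [17], which is assumed here as a stand-in for the reversible rounds; the function name `almost_inverse` is illustrative, and a real circuit must additionally keep per-round state to remain reversible.

```python
def almost_inverse(x: int, p: int) -> tuple[int, int]:
    """Return (r, k) with r = x^(-1) * 2^k mod p, for odd prime p and 0 < x < p.

    Classical sketch of the first phase of Kaliski's algorithm: every round
    halves one of u, v and doubles one of r, s.
    """
    u, v, r, s = p, x, 0, 1
    k = 0
    while v > 0:
        if u % 2 == 0:
            u, s = u // 2, 2 * s
        elif v % 2 == 0:
            v, r = v // 2, 2 * r
        elif u > v:
            u, r, s = (u - v) // 2, r + s, 2 * s
        else:
            v, s, r = (v - u) // 2, s + r, 2 * r
        k += 1  # one round per loop iteration
    if r >= p:
        r -= p
    return p - r, k


def inverse_from_almost(x: int, p: int) -> int:
    """Remove the 2^k factor to recover the ordinary inverse x^(-1) mod p."""
    r, k = almost_inverse(x, p)
    return r * pow(2, -k, p) % p
```

The pair returned for x = 5 and p = 13 is (5, 6), and indeed 5·5 ≡ 2^6 (mod 13). The leftover power of two is what the later doubling steps have to correct.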

A.3 Recursive GCD Algorithms

There are several sub-quadratic GCD algorithms (such as [7]). These work by defining a series of \(2\times 2\) matrices \(T_n\) such that \(T_nT_{n-1}\cdots T_1(u,v)^T\) will map integers u and v to the nth step of the Euclidean algorithm. These can be computed and multiplied together recursively.
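To make the matrix view concrete, the following Python sketch builds one \(2\times 2\) matrix per step and multiplies them together. It uses ordinary Euclidean quotient steps, which is an assumption made for illustration and not the specific step of [7]; all function names are hypothetical.

```python
def euclid_matrices(u: int, v: int):
    """Return one 2x2 matrix per Euclidean division step on (u, v)."""
    mats = []
    while v != 0:
        q = u // v
        mats.append(((0, 1), (1, -q)))  # maps (u, v) to (v, u - q*v)
        u, v = v, u - q * v
    return mats


def matmul(a, b):
    """Multiply two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def apply(m, vec):
    """Apply a 2x2 matrix to a column vector (u, v)."""
    return (m[0][0] * vec[0] + m[0][1] * vec[1],
            m[1][0] * vec[0] + m[1][1] * vec[1])


def product_up_to(mats, n: int):
    """Return T_n * T_{n-1} * ... * T_1 for the first n step matrices."""
    total = ((1, 0), (0, 1))
    for t in mats[:n]:
        total = matmul(t, total)
    return total
```

Applying `product_up_to(mats, n)` to the original pair reproduces the state after n steps; the full product maps (240, 46) to (2, 0), exposing the GCD. A recursive algorithm computes such products for halves of the steps and combines them, which is where the out-of-place matrix storage discussed next arises.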

Adapted to quantum circuits, these approaches require quantum matrix multiplication. We could find no efficient method to do this in-place, meaning that each recursive call would require a new set of auxiliary qubits to store the matrix output. This would quickly overwhelm our qubit budget. The base case of [7] is nearly identical to our approach for a single round.

One of the primary advantages of [7] is that the recursive process allows much of the arithmetic to be done with small integers which fit into the registers of classical CPUs. All the qubits in our model of a quantum computer are identical, so it has no caching or register issues. If quantum technologies arise with different kinds of qubits (perhaps a “memory” with higher coherence times but lower gate fidelity), then recursive GCD algorithms should be revisited. It is also possible that the specific structure of the matrices in this approach permits an easy, in-place multiplication circuit. We leave this to future work.

A.4 Alternate Curve Representations

Projective Coordinates. Projective coordinates use equivalence classes (X : Y : Z) of triples (X, Y, Z) to represent an elliptic curve point, where \((X_1,Y_1,Z_1)\sim (X_2,Y_2,Z_2)\) iff there is some non-zero constant c such that \(X_1=cX_2\), \(Y_1=cY_2\), and \(Z_1=cZ_2\). These can be used with many different families of curves. Projective coordinates lend themselves to efficient, inversion-free arithmetic, which is appealing for classical computers.

Projective coordinates do not give a unique representation of each point, which Shor’s algorithm requires to ensure history independence and thus proper interference of states in superposition. Dividing by the Z coordinate produces a unique representation but requires an expensive division. It is an open problem to provide a unique projective representation with division-free arithmetic.
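The non-uniqueness is easy to see on a toy example. The Python sketch below works over a small prime field chosen only for illustration; the triple is not checked against any curve equation, and the helper names `scale` and `normalize` are hypothetical.

```python
def scale(point, c: int, p: int):
    """Multiply every coordinate by a non-zero constant c modulo p."""
    return tuple(c * t % p for t in point)


def normalize(point, p: int):
    """Divide by Z to obtain the unique representative (X/Z, Y/Z, 1).

    This is the step that costs a modular division (here: one inversion).
    """
    x, y, z = point
    z_inv = pow(z, -1, p)
    return (x * z_inv % p, y * z_inv % p, 1)


p = 97                      # illustrative field size
P1 = (3, 6, 5)              # some triple with Z != 0
P2 = scale(P1, 7, p)        # same projective point, different triple
```

Here `P1` and `P2` differ as register contents, so two computational paths reaching them would not interfere, yet `normalize(P1, p)` and `normalize(P2, p)` coincide.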

Another issue is that the classical elliptic curve formulas, naively adapted to quantum circuits, operate out-of-place. An out-of-place addition circuit is easy to adapt into an in-place addition circuit. If we can construct a circuit \(U_{+Q}\) to add a point Q, we can construct a circuit \(U_{-Q}\), and we can construct an in-place point addition by writing \(P+Q\) into another register, then subtracting Q from \(P+Q\) to clear the input. This doubles the cost of point addition.

This technique requires a unique representation. If \((P\,+\,Q)\,-\,Q\) does not have the same representation as P, we cannot cancel them out. Thus, for any current algorithm to compute addition with projective coordinates with cost C, we can transform it to a quantum-suitable in-place version with cost \(2C+2D\), where D is the cost of division. The division creates a unique representation.
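The out-of-place to in-place conversion can be traced on classical registers. In the Python sketch below, the integers modulo m under addition stand in for the curve group, which is an assumption made purely for illustration; the function names are hypothetical.

```python
def add_out_of_place(a: int, b: int, q: int, m: int):
    """U_{+Q}: (a, b) -> (a, b + (a + q)) in the stand-in group Z_m."""
    return a, (b + a + q) % m


def sub_out_of_place_inverse(a: int, b: int, q: int, m: int):
    """Inverse of U_{-Q}: (a, b) -> (a, b - (a - q)) in Z_m."""
    return a, (b - (a - q)) % m


def add_in_place(p_val: int, q: int, m: int):
    """Turn the out-of-place adder into an in-place one.

    Returns (cleared input register, output register).
    """
    reg_in, reg_out = p_val, 0
    # Step 1: write P + Q into the blank register: (P, P + Q).
    reg_in, reg_out = add_out_of_place(reg_in, reg_out, q, m)
    # Step 2: use P + Q as the input and uncompute (P + Q) - Q = P
    # from the old input register: (0, P + Q).
    reg_out, reg_in = sub_out_of_place_inverse(reg_out, reg_in, q, m)
    return reg_in, reg_out
```

Two calls to the out-of-place circuit are used, matching the doubled cost. Step 2 only clears the input because \((P+Q)-Q\) is stored with exactly the same bits as P, which is the uniqueness requirement.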

According to the Explicit Formulas Database [6], the lowest-cost addition uses 6 squares and/or multiplications. With the required reductions, the total cost is 12 squares/multiplications and 2 divisions, much higher than affine Weierstrass coordinates. Thus, we choose not to use projective coordinates in this work.

A.5 Precomputation

Precomputed tables of certain powers of the base element can speed up exponentiations. The “comb” method is a standard technique used for elliptic curve scalar multiplication. To multiply a point P by a scalar k, we write \(k = k_0+2k_1+\cdots + 2^{\ell -1} k_{\ell -1}\) for some \(\ell \), where \(k_j\) collects the bits of k in positions congruent to j modulo \(\ell \), shifted down by j positions (each \(k_j\) looks like a comb of bits). We then precompute a table of all multiples of P by scalars of the form \(b_0\,+\,b_12^\ell \,+\,b_2 2^{2\ell }+\cdots \), with \(b_i\in \{0,1\}\). By the definition of \(k_j\), each \(k_jP\) is a precomputed point in this table. Thus, we can compute kP by using \(k_j\) to look up elements of the table, adding them to a running total, and doubling the running total between look-ups.
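The comb decomposition can be sketched classically as follows. This Python example indexes the combs from 0 so that k equals the sum of \(2^j k_j\), and it uses the integers modulo `order` under addition as a stand-in for the curve group; both choices, and all function names, are assumptions made for illustration.

```python
def comb_split(k: int, ell: int, nbits: int):
    """Split k into ell combs k_j with k == sum(k_j << j).

    k_j holds the bits of k at positions j, j + ell, j + 2*ell, ...,
    shifted down by j so its set bits sit at multiples of ell.
    """
    combs = []
    for j in range(ell):
        k_j = 0
        for pos in range(j, nbits, ell):
            if (k >> pos) & 1:
                k_j |= 1 << (pos - j)
        combs.append(k_j)
    return combs


def comb_table(point: int, ell: int, nbits: int, order: int):
    """Precompute s * point for every comb-shaped scalar s."""
    teeth = -(-nbits // ell)  # ceil(nbits / ell) bits per comb
    table = {}
    for bits in range(1 << teeth):
        scalar = sum(((bits >> i) & 1) << (i * ell) for i in range(teeth))
        table[scalar] = scalar * point % order
    return table


def comb_multiply(k: int, point: int, ell: int, nbits: int, order: int) -> int:
    """Compute k * point with one table look-up and one doubling per comb."""
    table = comb_table(point, ell, nbits, order)
    total = 0
    for k_j in reversed(comb_split(k, ell, nbits)):
        total = (2 * total + table[k_j]) % order
    return total
```

With k = 182, \(\ell = 4\), and 8-bit scalars, the combs are 16, 17, 1, and 16. The loop performs one look-up per comb, so the number of look-ups is \(\ell\) regardless of how the table was shared.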

The advantage of the comb technique is that it saves precomputation. We only precompute one table and use it for the entire computation. Unfortunately for the quantum case, precomputation is essentially free because it is entirely classical, but look-ups are expensive. The comb technique does not reduce the number of table look-ups, since we must do a separate look-up for each index \(k_j\).

Further, efficient in-place point doubling is unlikely, since it implies efficient in-place point halving. Thus, doubling points in the comb would require some pebbling technique which would likely add significant depth or width costs.

B Modular Division and Addition

For elliptic curve addition, we only need to divide integers and copy the result to a blank output, but other applications may wish to construct a circuit that, given registers containing x, y, and z, will compute \(yx^{-1}+z\).

We might simply invert, multiply, and then add the output of the multiplication instead of copying. However, doubling the output to correct the pseudo-inverse while uncomputing will also multiply z by a factor of \(2^{2n-k}\). To correct for this, we can repeatedly halve z during the forward computation of the modular inverse. This means that while we compute the modular inverse, we control a modular halving of the register containing z by the counter, which will halve z exactly \(2n-k\) times. Then we multiply the pseudo-inverse by y and add the result to the register with z, producing the state \(\left| x^{-1}2^{2n-k}\bmod p\right\rangle \left| y2^n\bmod p\right\rangle \left| z2^{2n-k}+x^{-1}y2^{2n-k}\bmod p\right\rangle \). From here, if we perform controlled modular doublings of the register containing z as we uncompute the inversion circuit, this will correct both z and the pseudo-inverse of x, producing the desired output.

C Analysis of Windowed Arithmetic

A quantum look-up to N elements requires 4N T-gates [2]. To optimize window costs, we balance this cost against the operations we save.

Multiplication. Section 4.1 describes a single windowed multiplication round. For n-bit integers with window size k, repeating this round \(\left\lceil n/k \right\rceil \) times performs the full multiplication. Since the quantum look-up will cost \(4\cdot 2^k\) T gates [2] and uncontrolled \(n+k\)-bit addition costs \(O(n+k)\) T gates, we expect the optimal window size to be approximately \(k=O(\lg n)\). The total multiplication cost is still \(O(n^2)\) because we only window addition by p, not addition of the quantum register y. Compared to un-windowed add-and-halve multiplication, windowing should save a factor of roughly \(\frac{1}{2}+ O(\frac{1}{\lg n})\). Similar reasoning suggests savings of \(\frac{1}{2}+O(\frac{1}{\lg \lg n})\) in depth.

Numerical estimates show a window size of \(k\approx 0.7\lg n + 0.5\) optimizes T-count, and \(1.97\lg \lg n-1.11\) optimizes T-depth. At the scale we estimate, this is only noticeable in the leading coefficient of the cost. We found a 22% reduction in T-depth at 384 bits, for example.
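The fitted window sizes and the look-up cost are simple to evaluate for a given bit length. The Python sketch below only restates the formulas quoted above; the function names are illustrative, and rounding the real-valued window size to an integer is left to the reader.

```python
import math


def t_count_window(n: int) -> float:
    """Fitted window size minimizing T-count: 0.7 * lg(n) + 0.5."""
    return 0.7 * math.log2(n) + 0.5


def t_depth_window(n: int) -> float:
    """Fitted window size minimizing T-depth: 1.97 * lg(lg(n)) - 1.11."""
    return 1.97 * math.log2(math.log2(n)) - 1.11


def lookup_t_gates(k: int) -> int:
    """T gates for one look-up over 2^k table entries, at 4 T gates each."""
    return 4 * 2 ** k


def total_lookup_t_gates(n: int, k: int) -> int:
    """Look-up T gates over all ceil(n / k) windowed rounds."""
    return math.ceil(n / k) * lookup_t_gates(k)
```

For n = 256 the two fits give roughly 6.1 and 4.8, so the T-count and T-depth optima sit at slightly different window sizes. With k = 6, the 43 rounds spend 11008 T gates on look-ups in total.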

Windowing adds a significant cost of roughly \(n+k\) auxiliary qubits, but the full elliptic curve point addition circuit has enough unused auxiliary qubits during any multiplication that this does not make a difference.

Point Addition. Windowing requires 2 extra registers as the cache to load the precomputed points. We use the components of the second point three times during point addition. We could perform the look-up once and keep the values, increasing total circuit width by two registers. Alternatively, we can fit the look-ups within the existing space. At every point where \(x_2\) or \(y_2\) is added, the circuit has spare auxiliary qubits available. Thus, we can perform the look-up, add the point to the quantum register, then uncompute the quantum look-up to free the qubits for the expensive modular division. This requires six look-ups (including uncomputing) rather than just two, but uses no extra registers.

With a window size of \(\ell \), including sign bit, each look-up costs \(4\cdot 2^{\ell - 1}\) T gates and T-depth. The windowing saves us \(\ell -1\) point additions. If point addition costs \(\mathsf {A}\) T gates, we would expect \(\ell \approx \lg (\mathsf {A}/24)\) to be the optimal value, leading to a factor \(\ell \) reduction in T-gate cost.
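The trade-off behind this estimate can be explored numerically. The Python sketch below assumes a simple cost model that is not taken from the paper's code: roughly \(n/\ell\) windowed point additions, each costing \(\mathsf{A}\) plus six look-ups of \(4\cdot 2^{\ell-1}\) T gates. The value of `add_cost` in any call is a placeholder, not a measured figure.

```python
import math


def windowed_cost(ell: int, n: int, add_cost: float, lookups: int = 6) -> float:
    """Assumed T-gate model: (n / ell) * (A + lookups * 4 * 2^(ell - 1))."""
    lookup = 4 * 2 ** (ell - 1)
    return (n / ell) * (add_cost + lookups * lookup)


def best_window(n: int, add_cost: float, max_ell: int = 40) -> int:
    """Integer window size minimizing the assumed model."""
    return min(range(1, max_ell + 1),
               key=lambda ell: windowed_cost(ell, n, add_cost))


def heuristic_window(add_cost: float) -> float:
    """The rule of thumb from the text: lg(A / 24)."""
    return math.log2(add_cost / 24)
```

Under this model and a placeholder cost of \(24\cdot 2^{16}\) T gates per point addition, the numerical optimum is 14 while the rule of thumb gives 16, so the heuristic should be read as an order-of-magnitude guide.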

D Automatic Compilation for Aggressive T-Count and T-Depth Reduction

In this section, we motivate automatic compilation methods to drastically reduce the T-count and the T-depth if we allow a significant increase in circuit width.

The modular multiplication followed by an addition is one of the most costly operations in the overall algorithm. It is implemented as a unitary \(U : |x\rangle |y\rangle |z\rangle |0\rangle \mapsto |x\rangle |y\rangle |(xy + z) \bmod p\rangle |0\rangle \) that adds the result of the multiplication of two numbers x and y onto a third number z, all in Montgomery form with bit-width n and modulus p. We apply the following procedure to automatically obtain a quantum circuit for this operation:

1. We generate logic networks over the gate basis \(\{\mathrm {AND}, \mathrm {XOR}, \mathrm {INV}\}\), called Xor-And-inverter Graphs (XAGs), for the functions \(xy \bmod p\), \((x + y) \bmod p\), and \((x - y) \bmod p\), where x and y are integers in Montgomery form.
2. We apply the logic optimization method described in [30] to minimize the number of AND gates in the XAGs.
3. The optimized XAGs are then translated into out-of-place quantum circuits using the method in [18], which requires 4 T gates for each AND gate in the XAG. Optimizing these circuits for depth requires roughly 2 qubits for each AND gate in the XAG, by using the AND gate construction from Sect. 3.
4. The automatically generated unitaries are composed as described in Fig. 9, which uses a technique similar to that described in Appendix A.4 to turn the out-of-place addition and subtraction into an in-place addition.

Table 2. Comparison of resource costs between a manual and automatic construction to implement \(|xy + z \bmod p\rangle \).

Table 2 lists the resource costs in terms of T-count, T-depth, and circuit width, for both the manual construction and the automatic construction. Several factors of reduction in T-count and T-depth are possible, while the increase in the number of qubits is significant. However, such a design point can be of high interest, in particular when combined with automatic quantum memory strategies, e.g., pebbling [19], that can find intermediate trade-off points that lie in between the manual and automatic construction.

Häner, T., Jaques, S., Naehrig, M., Roetteler, M., Soeken, M. (2020). Improved Quantum Circuits for Elliptic Curve Discrete Logarithms. In: Ding, J., Tillich, JP. (eds) Post-Quantum Cryptography. PQCrypto 2020. Lecture Notes in Computer Science, vol 12100. Springer, Cham. https://doi.org/10.1007/978-3-030-44223-1_23

DOI: https://doi.org/10.1007/978-3-030-44223-1_23
Published: 10 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44222-4
Online ISBN: 978-3-030-44223-1
eBook Packages: Computer Science, Computer Science (R0)
