1 Introduction

Post-quantum Key Establishment. The existence of a quantum computer that is capable of implementing Shor’s algorithm [36] at a large enough scale would have devastating consequences on the current public-key cryptographic standards and thus on the current state of cybersecurity [32]. Subsequently, the field of post-quantum cryptography (PQC) [4] is rapidly growing as cryptographers look for public-key solutions that can resist large-scale quantum adversaries. Recently, the USA’s National Institute of Standards and Technology (NIST) began a process to develop new cryptographic standards and announced a call for PQC proposals with a deadline of November 30, 2017 [40].

Although the PQC community is currently examining alternatives to replace both traditional key establishment and traditional digital signature algorithms, there is an argument for scrutinising proposals in the former category with more haste than those in the latter. While digital signatures only need to be quantum-secure at the moment a powerful enough quantum adversary is realised, the realistic threat of long-term archival of sensitive data and a retroactive quantum break means that, ideally, key establishment protocols will offer quantum resistance long before such a quantum adversary exists [38].

Post-quantum key establishment proposals typically fall under one of three umbrellas:

  (i) Code-based. Based on the McEliece cryptosystem [28] and its variants [28], modern proposals include Bernstein, Chou and Schwabe’s McBits [5] and Misoczki et al.’s specialised MDPC-McEliece [10, 29].

  (ii) Lattice-based. Proposals here began with Hoffstein, Pipher and Silverman’s standardised NTRUEncrypt [20], and in more recent times have been based on either Regev’s learning with errors (LWE) problem [34] or Lyubashevsky, Peikert and Regev’s ring variant (R-LWE) [27]. Peikert brought these problems to life in [33], and his protocols served as a basis for a number of recent implementations, including Bos et al.’s R-LWE key establishment software [7], Alkim et al.’s R-LWE successor NewHope [1], and Bos et al.’s LWE key establishment software Frodo [6].

  (iii) Isogeny-based. Starting with the work of Couveignes [13] and with later work by Rostovsev and Stolbunov [35, 39], Jao and De Feo proposed and implemented supersingular isogeny Diffie-Hellman (SIDH) key exchange [21]. In recent times a number of improvements and optimisations of their SIDH protocol have been proposed and implemented [2, 11, 12, 15, 26].

To date there is no clear frontrunner among the post-quantum key establishment proposals. In terms of functionality, all of the public implementations resulting from (i), (ii) and (iii) suffer the same drawback of requiring modifications (e.g., the Fujisaki-Okamoto transformation [17]) to achieve active security (Footnote 1). However, there are bandwidth versus performance trade-offs to consider when examining the above proposals; while SIDH affords significantly smaller public keys than its code- and lattice-based counterparts, the performance of the state-of-the-art SIDH software is currently orders of magnitude slower than the state-of-the-art implementations mentioned in (i) and (ii) above. The reason for this wide performance gap is that well-chosen code- and lattice-based instantiations typically involve simple matrix/vector operations over special, and comparatively tiny, implementation-friendly moduli that are either powers of 2 or very close to a power of 2. On the other hand, in addition to SIDH inheriting several of the more complex operations from traditional curve-based cryptography like scalar multiplications and pairings, it also involves a new style of isogeny arithmetic and requires a new breed of significantly larger underlying finite fields. Whereas classical elliptic curve cryptography affords implementers the flexibility to cherry-pick the fastest underlying finite fields of sizes as small as 256 bits, most of the SIDH implementations to date have required extension fields of over one thousand bits whose underlying characteristics are of the form \(p=2^{i}3^j-1\). Imposing this special form of prime restricts both the number of SIDH-friendly fields available at a given security level and the number of field arithmetic optimisations possible for implementers.

Our Contributions. This paper presents a new algorithm for computing the fundamental operation in isogeny-based public-key cryptography, and in particular, within the SIDH protocol.

  • Odd-degree Montgomery isogenies. We derive a new formula for odd-degree isogenies between Montgomery curves – see Theorem 1. Compared to Vélu’s formulas for isogenies between Weierstrass curves, this formula is elegant and simple, both to write down and to implement. This formula immediately lends itself to a compact algorithm that computes arbitrary odd-degree isogenies.

  • Unifying the two isogeny operations. SIDH operations require isogeny computations to be applied to elliptic curves within the isogeny class and to the points that lie on those curves. These two operations are typically different and require independent functions. For odd-degree isogenies, we show that both of these operations can be performed using the same core function by exploiting the simple connection between 2-torsion points and the Montgomery curve coefficient. This streamlines SIDH code, and for isogenies of degree 5 and above, has the added benefit of being significantly faster than performing the computations independently.

  • Simplified algorithm. Together, the above two improvements culminate in a general-purpose algorithm that can efficiently compute isogenies of any odd degree. Coupled with specialised code for 2- and/or 4-isogenies, this allows arbitrary SIDH computations and gives rise to new possibilities within the SIDH framework. Our implementation benchmarks show that practitioners can lift the restriction of primes of the form \(p=2^{i}3^j-1\) without paying a huge performance penalty.

  • Faster 3- and 4-isogenies. While the contributions mentioned above broaden the scope of curves that can be considered SIDH-friendly, they do not give an immediate speedup to existing SIDH implementations because the pre-existing formulas for 3-isogenies are a special case of Theorem 1. Nevertheless, as an auxiliary result, we give new dedicated 3- and 4-isogeny algorithms that do give immediate speedups. When plugging these new algorithms into Microsoft’s recent v2.0 release of their SIDH library (Footnote 2), Alice and Bob’s key generations are both sped up by a factor of 1.18x, while their shared secret computations are both sped up by a factor of 1.11x.

Although this paper is largely geared towards SIDH key exchange, we note that almost all of the discussion applies analogously to other supersingular isogeny-based cryptographic schemes, e.g., to the other schemes proposed by De Feo et al. [15], and to the recent isogeny-based signature scheme from Yoo et al. [43].

Organisation. We give the preliminaries in Sect. 2. We provide the new formula for odd-degree Montgomery isogenies in Sect. 3 and discuss its connection to related works. We show how the point and curve isogeny computations can be performed using the same function in Sect. 4, before presenting the general-purpose odd-degree isogeny algorithm in Sect. 5. We provide implementation benchmarks and conclude with some potential implications in Sect. 6. The faster explicit formulas for 3- and 4-isogenies are presented in Appendix A.

Remark 1

(Even degree isogenies). Since any separable isogeny can be written as a chain of prime degree isogenies [18, Theorem 25.1.2], our claim of treating arbitrary degree isogenies on Montgomery curves follows from the coupling of Theorem 1 (which covers isogenies of any odd degree) with the prior treatment of 2-isogenies on Montgomery curves by De Feo et al. [15]. It is worth noting that a technicality arises in the treatment of 2-isogenies on Montgomery curves: there is currently no known way of computing a 2-isogeny directly from a generic 2-torsion point without extracting a square root to transform the image curve into Montgomery form. De Feo, Jao and Plût overcome this obstruction by making use of a special 8-torsion point lying above the 2-torsion point in the kernel, which is already available for use in the SIDH framework. In broader contexts, however, the preservation of the Montgomery form under general 2-isogenies might become problematic; in these cases even powers of 2 can be treated by the application of 4-isogenies which do not need to compute square roots in order to preserve the Montgomery form [15]. In Remark 2 we discuss the related work of Moody and Shumow [31] on the (twisted) Edwards model [31]; in their case the 2-isogeny formula also requires a square root computation to preserve the Edwards form. Although Vélu’s formulas for 2-isogenies between short Weierstrass curves do not require square root computations, we believe it worthwhile to pose the open question of finding efficient 2-isogenies that preserve either of the faster Montgomery and/or twisted Edwards models (on input of a generic 2-torsion point).

2 Preliminaries

Montgomery Curves. Unless stated otherwise, all elliptic curves E/K in this paper are assumed to be written in Montgomery form [30]

$$\begin{aligned} E / K \,:by^2=x^3+ax^2+x. \end{aligned}$$

We will be dealing with the group of K-rational points on E, denoted E(K), which is the set of solutions \((x,y) \in K \times K\) to the above equation, furnished with a point at infinity, \(\mathcal {O}_E\). This point looks different under different projective embeddings of E. The usual embedding into \(\mathbb {P}^2\) via \(x=X/Z\) and \(y=Y/Z\) gives \(\mathcal {O}_E = (0 \,:1 \,:0)\), but the proof of Theorem 1 makes use of the alternative embedding into \(\mathbb {P}(1,2,1)\) via \(x=X/Z\) and \(y=Y/Z^2\), under which \(\mathcal {O}_E = (1 \,:0 \,:0)\). The inverse of a point \((x,y)\) is \((x,-y)\), and the number of points in E(K) is always divisible by 4.

Let \(P=(x_P,y_P)\) and \(Q=(x_Q,y_Q)\) be such that \(x_P \ne x_Q\). Then the coordinates of these points and the x-coordinates of their sum \(P+Q\) and difference \(P-Q\) are related by Montgomery’s group law identities [30, p. 261]

$$\begin{aligned} x_{P\,+\,Q}(x_P-x_Q)^2x_Px_Q&= b(x_Py_Q-x_Qy_P)^2, \quad \mathrm{and}\nonumber \\ x_{P\,-\,Q}(x_P-x_Q)^2x_Px_Q&= b(x_Py_Q+x_Qy_P)^2. \end{aligned}$$
(1)

Montgomery multiplies these equations to produce his celebrated differential arithmetic formulas [30, p. 262]

$$\begin{aligned} x_{P\,+\,Q}x_{P-Q}&= (x_Px_Q-1)^2/(x_P-x_Q)^2, \quad \mathrm{and} \nonumber \\ x_{[2]P}&= (x_{P}^2-1)^2/(4x_{P}(x_P^2+ax_P+1)). \end{aligned}$$
(2)

Assuming the usual embedding of E into \(\mathbb {P}^2\), then following [30], we use \(\mathbf {x}\) throughout to denote the subsequent projection of points into \(\mathbb {P}^1\) that drops the Y-coordinate, i.e.,

$$\begin{aligned} \mathbf {x}\,:\quad E \setminus \{\mathcal {O}_E\} \rightarrow \mathbb {P}^1, \quad \quad (X \,:Y \,:Z) \mapsto (X \,:Z). \end{aligned}$$

Applying this to (2) gives Montgomery’s two algorithms for differential arithmetic in \(\mathbb {P}^1\), i.e.,

$$\begin{aligned} \mathtt{xDBL}&\,:\quad (\mathbf {x}(P),a) \mapsto \mathbf {x}([2]P), \quad \mathrm{and} \nonumber \\ \mathtt{xADD}&\,:\quad (\mathbf {x}(P),\mathbf {x}(Q),\mathbf {x}(P-Q)) \mapsto \mathbf {x}(P+Q). \end{aligned}$$
(3)
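As a concrete illustration of (3), the following Python sketch implements both maps projectively over a small prime field. The toy prime \(p=103\), the affine integer constant `a`, and the `(X, Z)` tuple representation are assumptions made for this example only; they are not the parameters or the data layout of any SIDH library, and no attempt is made at constant-time behaviour.

```python
def xDBL(P, a, p):
    """x-only doubling on b*y^2 = x^3 + a*x^2 + x over F_p, from Eq. (2).
    P = (X, Z) represents x(P) = X/Z; a is the affine curve constant (toy assumption)."""
    X, Z = P
    X2 = pow(X * X - Z * Z, 2, p)
    Z2 = 4 * X * Z * (X * X + a * X * Z + Z * Z) % p
    return X2, Z2

def xADD(P, Q, D, p):
    """x-only differential addition: returns x(P+Q) given x(P), x(Q) and D = x(P-Q)."""
    (XP, ZP), (XQ, ZQ), (XD, ZD) = P, Q, D
    X = ZD * pow(XP * XQ - ZP * ZQ, 2, p) % p
    Z = XD * pow(XP * ZQ - ZP * XQ, 2, p) % p
    return X, Z

# Toy usage: two routes to x([4]P) must agree as points of P^1.
p, a = 103, 1                      # hypothetical toy parameters
P = (3, 1)
P2 = xDBL(P, a, p)                 # x([2]P)
P3 = xADD(P2, P, P, p)             # x([3]P), using x([2]P - P) = x(P)
P4_double = xDBL(P2, a, p)         # x([4]P) as [2]([2]P)
P4_add = xADD(P3, P, P2, p)        # x([4]P) as [3]P + P, using x([3]P - P) = x([2]P)
same = (P4_double[0] * P4_add[1] - P4_double[1] * P4_add[0]) % p == 0
```

Equality in \(\mathbb {P}^1\) is tested by cross-multiplication, since the two routes generally return different projective representatives of the same x-coordinate.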

If \(\ell \) is odd, then the \(\ell \)-th division polynomial of an elliptic curve E / K is written as \(\psi _\ell (x) \in K[x]\), and this vanishes precisely at the nontrivial \(\ell \)-torsion points, i.e., the points P such that \([\ell ]P =\mathcal {O}_E\). The first two nontrivial odd division polynomials on the Montgomery curve \(E / K \,:by^2=x^3+ax^2+x\) are

$$\begin{aligned} \psi _3(x)&=3x^4+4ax^3+6x^2-1, \nonumber \\ \psi _5(x)&=5x^{12} + 20ax^{11}+(16a^2+62)x^{10}+80ax^9-105x^8-360ax^7\nonumber \\&-60(5+4a^2)x^6-16a(23+4a^2)x^5-5(25+32a^2)x^4-140ax^3-50x^2+1. \end{aligned}$$
(4)
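As a quick consistency check between (2) and (4), note that a point \(P \ne \mathcal {O}_E\) has order 3 exactly when \([2]P=-P\), i.e., when \(x_{[2]P}=x_P\). Writing \(x=x_P\) and substituting the doubling formula from (2) gives

$$\begin{aligned} (x^2-1)^2 = 4x^2(x^2+ax+1) \quad \Longleftrightarrow \quad 3x^4+4ax^3+6x^2-1=0, \end{aligned}$$

which recovers \(\psi _3(x)\) as stated above.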

SIDH. Let \(p = f \cdot n_A n_B \pm 1\) be a large prime where \(\mathrm{gcd}(n_A,n_B)=1\) and f is a small cofactor. SIDH [21] works in the isogeny class of supersingular elliptic curves over \(\mathbb {F}_{p^2}\), all of which have cardinality \((p \mp 1)^2\). Let E be a public starting curve in this isogeny class. To generate her public key, Alice chooses a secret subgroup \(G_A\) of order \(n_A\) on E and computes her public key as \(E/ G_A\). Likewise, Bob chooses a secret subgroup \(G_B\) of order \(n_B\) and computes his public key as \(E/ G_B\). The shared secret is then \(E/\langle G_A, G_B \rangle \), and so long as computing this from E, \(E/G_A\) and \(E/G_B\) is hard, this offers an alternative instantiation of the Diffie-Hellman protocol [14]. The key to the SIDH construction is ensuring that both parties exchange enough information to allow the mutual computation of \(E/\langle G_A, G_B \rangle \), while still hiding their secret keys.

To achieve this, Jao and De Feo [21] propose that the public keys also contain the images of certain public points under the isogenies defined by their secret subgroups. If \(\phi _A \,:E \rightarrow E/G_A\) is the secret isogeny corresponding to the subgroup \(G_A\), then Alice not only sends Bob the curve \(E/G_A\), but also the image of \(\phi _A\) on two points \(P_B\) and \(Q_B\) whose linear combinations generate the set of subgroups chosen by Bob, i.e., Alice’s public key is \(\mathrm{PK}_A=(E/G_A\, , \, \phi _A(P_B) \, , \, \phi _A(Q_B))\). Similarly, if linear combinations of \(P_A\) and \(Q_A\) generate the set of subgroups chosen by Alice, then Bob’s public key is \(\mathrm{PK}_B=(E/G_B\, , \, \phi _B(P_A) \, , \, \phi _B(Q_A))\). In this way Alice’s key generation amounts to randomly choosing two secret integers \(u_A, v_A \in \mathbb {Z}_{n_A}\), computing \(G_A = \langle [u_A]P_A+[v_A]Q_A\rangle \), and upon receipt of Bob’s public key, she can then compute \(E/\langle G_A, G_B \rangle = (E/G_B)/\langle [u_A]\phi _B(P_A)+ [v_A]\phi _B(Q_A)\rangle \). Bob proceeds analogously, and both parties compute the shared secret as the j-invariant of \(E/\langle G_A, G_B \rangle \).

In order for SIDH to be secure, \(n_A\) and \(n_B\) must be exponentially large so that Alice and Bob have an exponentially large keyspace. On the other hand, in order for SIDH to be practical, the computation of the \(n_A\)- and \(n_B\)-isogenies must be manageable. To achieve this, Jao and De Feo propose that \(n_A=\ell _A^{e_A}\) and \(n_B = \ell _B^{e_B}\) for \(\ell _A\) and \(\ell _B\) small; in this way there are \(\ell _A^{e_A-1}(\ell _A+1)\) secret cyclic subgroups of order \(n_A\) for Alice to choose from, and her secret isogeny computations can be performed as the composition of \(e_A\) low-degree \(\ell _A\)-isogenies (the analogous statement applies to Bob). In all of the SIDH implementations to date [2, 11, 12, 15, 26], \(\ell _A=2\) and \(\ell _B=3\), and Alice computes her \(2^{e_A}\)-isogeny as a composition of 2- and/or 4-isogenies (see [12, 15]), while Bob computes his \(3^{e_B}\)-isogeny as a composition of 3-isogenies. One consequence of this paper is to facilitate practical \(\ell ^e\)-isogenies where \(\ell \ge 5\).
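The subgroup count \(\ell ^{e-1}(\ell +1)\) can be checked by brute force for tiny parameters. The sketch below models the \(\ell ^e\)-torsion abstractly as \((\mathbb {Z}/\ell ^e\mathbb {Z})^2\), which is an assumption that ignores all curve arithmetic, and counts the orbits of full-order elements \((u,v)\) under multiplication by units. The function name is hypothetical.

```python
from math import gcd

def count_cyclic_subgroups(l, e):
    """Number of cyclic subgroups of order l^e in (Z/l^e)^2 (abstract torsion model)."""
    n = l ** e
    units = [c for c in range(1, n) if gcd(c, n) == 1]
    subgroups = set()
    for u in range(n):
        for v in range(n):
            if u % l == 0 and v % l == 0:
                continue                     # (u, v) has order strictly less than l^e
            # all generators of the cyclic subgroup generated by (u, v)
            orbit = frozenset(((c * u) % n, (c * v) % n) for c in units)
            subgroups.add(orbit)
    return len(subgroups)

# Compare with the closed form l^(e-1) * (l + 1).
checks = [(l, e, count_cyclic_subgroups(l, e), l ** (e - 1) * (l + 1))
          for (l, e) in [(2, 3), (3, 2), (5, 1)]]
```

Each pair \((u,v)\) here corresponds to a candidate kernel generator \([u]P_A+[v]Q_A\), and pairs in the same orbit generate the same subgroup \(G_A\).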

Following [21, Fig. 2], one way to compute an \(\ell ^e\) cyclic isogeny \(\phi \) on the curve \(E_0\) is to start with a point \(P_0\) of order \(\ell ^e\), compute the point \([\ell ^{e-1}]P_0\) of order \(\ell \), then use Vélu’s formulas [41] to compute the \(\ell \)-isogeny \(\phi _0 \,:E_0 \rightarrow E_1\) with \(\mathrm{ker}(\phi _0)=\langle [\ell ^{e-1}]P_0\rangle \), and evaluate it at \(P_0\) to give \(\phi _0(P_0) = P_1 \in E_1\). Note that pushing \(P_0\) through the \(\ell \)-isogeny reduces the order of its image point, \(P_1\), by a factor of \(\ell \) on \(E_1\). This process is then repeated at each new iteration by first computing the order \(\ell \) point \([\ell ^{e-1-i}]P_i\), then the \(\ell \)-isogeny \(\phi _i \,:E_i \rightarrow E_{i+1}\), and finally the computation of the new point \(P_{i+1}=\phi _i(P_i)\); this is done until \(i=e-1\) and we have the final curve \(E_e=\phi _{e-1}\circ \dots \circ \phi _0(E_0) = \phi (E_0)\).

In their extended article, De Feo et al. [15] detailed a much faster approach towards the computation described above. Roughly speaking, they achieve large speedups by storing intermediate multiples of the \(P_i\) at each step and evaluating \(\phi _i\) at these multiples in such a way that the length of the scalar multiplication to find an order-\(\ell \) point on \(E_{i+1}\) is reduced. They aim to minimise the overall cost of the \(\ell ^e\)-isogeny computation by comparing the costs of point multiplications and isogeny computations and studying the optimisation problem in a combinatorial context – see [15, Sect. 4.2.2].

Following [12], in order to thwart simple timing attacks [24], the fastest way to compute SIDH operations in a constant-time fashion is to (i) perform point operations on the projective line \(\mathbb {P}^1\) associated to Montgomery’s x-coordinate, i.e., using the map \(\mathbf {x}\) above, and (ii) to also perform isogeny operations projectively in \(\mathbb {P}^1\) by ignoring the b coefficient (in the same way the Y-coordinate is ignored in the point arithmetic). The reasoning here is that the j-invariant (i.e., the isomorphism class) of the Montgomery curve \(E/K \,:by^2=x^3+ax^2+x\) is \(j(E) = 256(a^2-3)^3/(a^2-4)\), which is independent of b, as is the differential arithmetic arising from (3). All of the formulas and algorithms we describe in the remainder of this paper fit into this same framework.
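To make the independence from b concrete, here is a minimal Python sketch of the j-invariant written for a projective curve constant \((a \,:1)=(A \,:C)\). The prime field \(\mathbb {F}_{103}\) and the function name are assumptions for illustration; SIDH itself works over \(\mathbb {F}_{p^2}\) with a much larger p.

```python
def j_invariant(A, C, p):
    """j(E) = 256 (a^2 - 3)^3 / (a^2 - 4) with a = A/C, cleared of denominators:
    j = 256 (A^2 - 3 C^2)^3 / (C^4 (A^2 - 4 C^2)).  Note that b never appears."""
    num = 256 * pow(A * A - 3 * C * C, 3, p)
    den = pow(C, 4, p) * (A * A - 4 * C * C) % p
    return num * pow(den, -1, p) % p         # modular inverse needs Python 3.8+

p = 103                                       # hypothetical toy prime
j1 = j_invariant(1, 1, p)                     # a = 1
j2 = j_invariant(5, 5, p)                     # same a, different projective representative
j3 = j_invariant(-1, 1, p)                    # a = -1: j depends only on a^2
```

All three calls return the same value, since rescaling \((A \,:C)\) or negating a does not change the isomorphism class.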

3 Coordinate Maps for Odd-Degree Montgomery Isogenies

At the heart of this paper are the coordinate maps in Eq. (6) of Theorem 1 below. Although we are mostly concerned with the SIDH-specific applications to come in the following sections, we believe that the simplicity and usability of the formula may be of interest outside the realm of SIDH, so we leave the underlying field unspecified and state the isogeny formula in full. We follow the theorem with a discussion of the related work of Moody and Shumow [31].

Theorem 1

For a field K with \(\mathrm{char}(K) \ne 2\), let \(P \in E(\bar{K})\) be a point of order \(\ell =2d+1\) on the Montgomery curve \(E/K \,:by^2 = x^3+ax^2+x\) and write \(\sigma = \sum _{i=1}^d x_{[i]P}\), \(\tilde{\sigma }=\sum _{i=1}^d 1/x_{[i]P}\) and \(\pi = \prod _{i=1}^d x_{[i]P}\). The Montgomery curve

$$\begin{aligned} E'/K \,:b'y^2 = x^3+a'x^2+x \end{aligned}$$
(5)

with

$$\begin{aligned} a' = (6 \tilde{\sigma }-6\sigma +a ) \cdot \pi ^2 \quad \quad \quad \mathrm{and} \quad \quad \quad b' = b\cdot \pi ^2 \end{aligned}$$

is the codomain of the \(\ell \)-isogeny \(\phi \,:E \rightarrow E'\) with \(\ker (\phi ) = \langle P \rangle \), which is defined by the coordinate maps

$$\begin{aligned} \phi \,:(x,y) \mapsto (f(x), y \cdot f'(x)), \end{aligned}$$
(6)

where

$$\begin{aligned} f (x) = x \cdot \prod _{i=1}^d \left( \frac{x \cdot x_{[i]P}-1}{x-x_{[i]P}} \right) ^2, \end{aligned}$$

and \(f'(x)\) is its derivative.
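The statement of Theorem 1 can be exercised numerically. The Python sketch below is an affine, non-constant-time illustration over a toy prime field; the parameters \(p=103\), \(a=1\), \(b=1\) and the kernel x-coordinate 2 (a root of \(\psi _3\) modulo 103) are assumptions chosen for this example, as are all function names. The derivative is evaluated through the logarithmic derivative of f, so it is only defined where \(x \ne 0\) and \(f(x) \ne 0\).

```python
def odd_isogeny(a, b, kernel_xs, p):
    """Theorem 1 over F_p. kernel_xs = [x_{[i]P} for i = 1..d] for a point P of order 2d+1.
    Returns (a', b', f, f_prime) with f the x-coordinate map of Eq. (6)."""
    inv = lambda v: pow(v % p, -1, p)              # modular inverse (Python 3.8+)
    sigma = sum(kernel_xs) % p
    sigma_tilde = sum(inv(xi) for xi in kernel_xs) % p
    pi = 1
    for xi in kernel_xs:
        pi = pi * xi % p
    a_new = (6 * sigma_tilde - 6 * sigma + a) * pi * pi % p
    b_new = b * pi * pi % p

    def f(x):
        out = x % p
        for xi in kernel_xs:
            g = (x * xi - 1) * inv(x - xi) % p
            out = out * g * g % p
        return out

    def f_prime(x):
        # f'/f = 1/x + 2 * sum_i (1 - xi^2) / ((x*xi - 1) * (x - xi))
        log_derivative = inv(x)
        for xi in kernel_xs:
            log_derivative += 2 * (1 - xi * xi) * inv((x * xi - 1) * (x - xi))
        return f(x) * log_derivative % p

    return a_new, b_new, f, f_prime

# Toy instance (assumed parameters): psi_3(2) = 3*16 + 4*8 + 6*4 - 1 = 103 = 0 mod p.
p, a, b = 103, 1, 1
a_new, b_new, f, f_prime = odd_isogeny(a, b, [2], p)

def image_on_codomain(x):
    """Check b'*(y*f'(x))^2 == X^3 + a'*X^2 + X with X = f(x), using b*y^2 = x^3 + a*x^2 + x."""
    X = f(x)
    lhs = b_new * pow(b, -1, p) * (x**3 + a * x * x + x) * f_prime(x) ** 2 % p
    rhs = (X**3 + a_new * X * X + X) % p
    return lhs == rhs
```

Because \(y^2\) is eliminated through the curve equation, the check never needs a square root, and it applies equally to x-coordinates of points on the quadratic twist.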

Proof

The proof follows along the lines of Washington’s proof of Vélu’s formulas on general Weierstrass curves [42, Theorem 12.16]. Write

$$\begin{aligned} \phi : (x,y) \mapsto (X,Y), \end{aligned}$$

where \(X=f(x)=x \cdot u(x)^2/w(x)^2\) and \(Y=y\cdot f'(x)\), with \(u(x)=\prod _{i=1}^d (x\cdot x_i-1)\) and \(w(x)=\prod _{i=1}^d (x- x_i)\), where \(x_i\) abbreviates \(x_{[i]P}\), and write \(G=\langle P \rangle \). Since X and Y are rational functions of x and y, they are functions on E, and it is clear that the only poles of X and Y are at the points in G. Our main goal is to show that the function

$$\begin{aligned} F(X,Y)=b'Y^2-(X^3+a'X^2+X) \end{aligned}$$
(7)

is 0. The idea is to introduce a uniformising parameter t at \(\mathcal {O}_E\) and ultimately show that \(F(X,Y) \in O(t)\), i.e., that \(F(X,Y)\) vanishes at \(\mathcal {O}_E\). Now, since \(x_{[i]P}=x_{[\ell -i]P}\) for \(i=1 \dots d\), it follows from (2) that for any \(Q=(x_Q,y_Q) \not \in G\),

$$\begin{aligned} f(x_Q)= \prod _{T \in G} x_{Q+T}, \end{aligned}$$

and therefore that the functions X and Y are invariant under translation by elements of G. Thus, if we can show that \(F(X,Y)\) vanishes at \(\mathcal {O}_E\), we will have also shown that it vanishes at all of the points in G. Since the function \(F(X,Y)\) can only have poles at the points in G, the only possibility is that \(F(X,Y)\) has no poles, which means that it is constant [42, Proposition 11.1(3)]. Furthermore, since it is 0 at infinity, it must be that \(F(X,Y)\) is identically zero, and therefore that X and Y satisfy the Montgomery curve equation in (5).

To show that \(F(X,Y)\) vanishes at \(\mathcal {O}_E\), let \(t=x/y\) be a uniformising parameter at \(\mathcal {O}_E\) and let \(s=1/y\). Dividing \(by^2 = x^3+ax^2+x\) by \(y^3\) and rearranging yields

$$\begin{aligned} s=(t^3+at^2s+ts^2)/b. \end{aligned}$$

Continually substituting this value for s into the above equation eventually yields

$$\begin{aligned} s=t_3 \cdot t^3 \, + \, t_5 \cdot t^5 \, + \, t_7 \cdot t^7 \, + \, t_9 \cdot t^9 \, + \, \dots , \end{aligned}$$

where

$$\begin{aligned} t_3 =1/b, \, \quad \quad t_5 =a/b^2, \quad \quad t_7 =(1+a^2)/b^3, \quad \quad t_9 =a(3+a^2)/b^4, \, \nonumber \\ t_{11} =(6a^2+a^4+2)/b^5, \quad \quad t_{13} =a(a^4+10a^2+10)/b^6. \quad \quad \end{aligned}$$
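The repeated substitution can be reproduced mechanically. The following Python sketch iterates \(s \leftarrow (t^3+at^2s+ts^2)/b\) on truncated power series with exact rational coefficients; the sample values \(a=2\), \(b=3\), the truncation order and the helper names are assumptions for illustration only.

```python
from fractions import Fraction

def mul(f, g, n):
    """Product of two power series given as coefficient lists of length n, truncated at t^n."""
    out = [Fraction(0)] * n
    for i, fi in enumerate(f):
        if fi == 0:
            continue
        for j in range(n - i):
            out[i + j] += fi * g[j]
    return out

def s_series(a, b, n=14):
    """Coefficients of s(t) = 1/y up to t^(n-1) on b*y^2 = x^3 + a*x^2 + x, with t = x/y."""
    a, b = Fraction(a), Fraction(b)
    t1 = [Fraction(0)] * n
    t1[1] = Fraction(1)
    t2 = mul(t1, t1, n)
    t3 = mul(t2, t1, n)
    s = [Fraction(0)] * n
    for _ in range(n):                       # each pass fixes at least two more orders of t
        at2s = mul(t2, s, n)
        ts2 = mul(t1, mul(s, s, n), n)
        s = [(t3[k] + a * at2s[k] + ts2[k]) / b for k in range(n)]
    return s

s = s_series(2, 3)                           # s[3], s[5], ... hold t_3, t_5, ...
```

Only odd powers of t from \(t^3\) onwards appear, and the entries `s[3]` to `s[13]` can be compared against the closed forms for \(t_3,\dots ,t_{13}\) at the chosen a and b.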

Since \(y=1/s\) and \(x=ty\), we invert the above equation to give

$$\begin{aligned} y&= bt^{-3} \cdot (y_0 \, + \, y_2 \cdot t^2 \, + \, y_4 \cdot t^4 \, + \, y_6 \cdot t^6 \, + \dots ) \, , \quad \mathrm{and} \nonumber \\ x&= bt^{-2} \cdot (y_0 \, + \, y_2 \cdot t^2 \, + \, y_4 \cdot t^4 \, + \, y_6 \cdot t^6 \, + \dots ) \end{aligned}$$
(8)

where

$$\begin{aligned}&\quad \quad \quad \quad y_0=1, \, \quad \quad y_2=-a/b, \quad \quad y_4=-1/b^2, \quad \quad y_6=-a/b^3, \, \nonumber \\&y_8=-(a^2+1)/b^4, \quad y_{10}=-(a^3+3a)/b^5, \quad y_{12}=-(a^4+6a^2+2)/b^6. \end{aligned}$$

From (6), we write

$$\begin{aligned} X = x \cdot f_1(x)^2 \cdot f_2(x)^2 \cdot \dots \cdot f_d(x)^2 , \end{aligned}$$
(9)

where

$$\begin{aligned} f_i(x) = \frac{x_{[i]P} \cdot x -1}{x-x_{[i]P}}, \end{aligned}$$

for \(1 \le i \le d\). Substitution of (8) gives

$$\begin{aligned} f_i(t)&= \frac{x_{[i]P} \cdot \left( bt^{-2} \cdot (y_0 \, + \, y_2 \cdot t^2 \, + \, y_4 \cdot t^4 \, + \, y_6 \cdot t^6 \, + \dots ) \right) -1}{ \left( bt^{-2} \cdot (y_0 \, + \, y_2 \cdot t^2 \, + \, y_4 \cdot t^4 \, + \, y_6 \cdot t^6 \, + \dots ) \right) -x_{[i]P} }, \nonumber \\&= x_{[i]P} \, + \, \left[ \frac{x_{[i]P}^2-1}{b} \right] \cdot t^2 \, + \, \left[ \frac{(x_{[i]P}^2-1) (a+x_{[i]P})}{b^2} \right] \cdot t^4 \nonumber \\&\quad \quad \quad \quad \quad \, + \, \left[ \frac{(x_{[i]P}^2-1) ((a+x_{[i]P})^2+1)}{b^3} \right] \cdot t^6 \quad + \quad O(t^8). \end{aligned}$$

Squaring the above equation yields

$$\begin{aligned}&f_i(t)^2 = x_{[i]P}^2 \, + \, \left[ \frac{2x_{[i]P}(x_{[i]P}^2-1)}{b} \right] \cdot t^2 \, + \, \left[ \frac{(x_{[i]P}^2-1) (3x_{[i]P}^2+2ax_{[i]P}-1)}{b^2} \right] \cdot t^4 \nonumber \\&\, + \, \left[ \frac{2(x_{[i]P}^2-1) (a^2 x_{[i]P}+3ax_{[i]P}^2+2x_{[i]P}^3-a)}{b^3} \right] \cdot t^6 \quad + \quad O(t^8). \end{aligned}$$
(10)

Substitution of (8) and (10) into (9) gives

$$\begin{aligned}&X(t) = X_{-2} \cdot t^{-2} \, + \, X_0 \, + \, X_2 \cdot t^2 \, + \, X_4 \cdot t^4 \, + \, \, O(t^6). \end{aligned}$$
(11)

where

$$\begin{aligned}&X_{-2} = b \pi ^2 \nonumber \\&X_{0} = -\pi ^2(2(\tilde{\sigma }-\sigma )+a) \nonumber \\&X_{2} = -\frac{4\pi ^4((\sigma -\tilde{\sigma })(a+3(\tilde{\sigma }-\sigma ))+1)+1}{5b\pi ^2} \nonumber \\&X_{4} = -\frac{12\pi ^4(\sigma -\tilde{\sigma })a^2+(3-16\pi ^4((\sigma -\tilde{\sigma })^2-2))a-10(\sigma -\tilde{\sigma })(8\pi ^4(\sigma -\tilde{\sigma })^2-1)}{35b^2\pi ^2}. \end{aligned}$$

Now, the chain rule gives \(X'(x) =X'(t) \cdot (x'(t))^{-1}\), so from (11) we have

$$\begin{aligned} X'(t) = t^{-3} \cdot \left( -2X_{-2} +2X_{2}t^4 + 4X_4 t^6 + O(t^8) \right) , \end{aligned}$$
(12)

and from (8) we have

$$\begin{aligned} x'(t)&= bt^{-3} \cdot (-2y_0 \, + \, 2y_4 \cdot t^4 \, + \, 4y_6 \cdot t^6 \, + \, 6y_8 \cdot t^8 \dots ). \end{aligned}$$

Inverting the above equation yields

$$\begin{aligned} x'(t)^{-1}&= -\frac{t^3}{2by_0^2} \cdot \Big (y_0 + y_4 \cdot t^4 + 2y_6 \cdot t^6 + \left[ \frac{3y_0y_8+y_4^2}{y_0}\right] \cdot t^8 \nonumber \\&\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad + \left[ \frac{4y_0y_{10}+4y_{4}y_{6}}{y_0}\right] \cdot t^{10} + O(t^{12}) \Big ). \end{aligned}$$
(13)

We now have all the ingredients to write \(F(X,Y)\) entirely in terms of t. We write

$$\begin{aligned} F(t)&= b' \left( y(t) \cdot X'(t) \cdot x'(t)^{-1}\right) ^2 - \left( X(t)^3 +a' X(t)^2 + X(t)\right) . \end{aligned}$$

Substituting (8), (11), (12) and (13) into the above equation, and collecting coefficients, yields

$$\begin{aligned} F(X,Y)&= F_{-6} \cdot t^{-6} + F_{-4} \cdot t^{-4} + F_{-2} \cdot t^{-2} + F_{0} + O(t), \end{aligned}$$
(14)

where

$$\begin{aligned}&F_{-6} = X_{-2}^2 \cdot (b'-X_{-2}), \nonumber \\&F_{-4} = -X_{-2}^2 \cdot \frac{2ab'+b(a'+3X_0)}{b}, \nonumber \\&F_{-2} = X_{-2} \cdot \frac{((a^2-4)X_{-2}-2b^2X_2) \cdot b'-b^2(2a'X_0+3X_0^2+3X_2X_{-2}+1)}{b^2}, \nonumber \\&F_0 = \frac{4X_{-2}(aX_2-bX_4)b'-b\cdot (a'X_0^2+2a'X_2X_{-2}+X_0^3+6X_{-2}X_0X_2+3X_4X_{-2}^2+X_0)}{b}. \end{aligned}$$

With \(X_{-2}\), \(X_0\), \(X_{2}\) and \(X_4\) as in (11), and with \(a'\) and \(b'\) from (5), we get \(F_{-6}=F_{-4}=F_{-2}=F_0=0\). Thus, we have \(F(X,Y) \in O(t)\), which means that \(F(X,Y)\) vanishes at \(\mathcal {O}_E\), and thus (as detailed above) that \(F(X,Y)\) is identically zero. It follows that X and Y satisfy the equation for \(E'\) in (5), and thus that \(\phi \) is a rational map from E to \(E'\). Since E is a smooth curve, we have that \(\phi \) is a morphism [37, Proposition II.2.1]. To show that \(\phi \) is an isogeny, we project into \(\mathbb {P}(1,2,1)\), where \(\mathcal {O}_{E}=(1 :0 :0)\). Substitution of \(x = X/Z\) and \(y=Y/Z^2\) into (6) reveals that \(\phi (\mathcal {O}_E) = (1 :0 :0) = \mathcal {O}_{E'}\), and we have established that \(\phi \) is an isogeny [37, III.4]. This completes the proof.    \(\square \)

In comparison to Vélu’s formulas [41] on general Weierstrass curves, the simplicity of Eq. (6) lies in the fact that it factors neatly across the different multiples of P. This lends itself to the simple algorithm we describe in Sect. 5.

Remark 2

To our knowledge, the work of Moody and Shumow [31] is the only prior work to investigate arbitrary degree isogenies on non-Weierstrass models. They managed to successfully derive general isogenies on both (twisted) Edwards curves [31, Theorem 3] and on Huff curves [31, Theorem 5] without passing back and forth to Vélu’s formulas on Weierstrass models. Given the ‘uniform-variable’ formulas in [31, Sect. 4.4], we could have presumably arrived at (6) by exploiting the birational equivalence between twisted Edwards and Montgomery curves [3, Theorem 3.2]. In particular, there is a simple relationship between the twisted Edwards y-coordinate and the Montgomery x-coordinate, and subsequently, there are Edwards y-only analogues of Montgomery’s x-only differential arithmetic that offer favourable trade-offs in certain ECC scenarios (Footnote 3). However, our experiments seemed to suggest that these trade-offs evaporate in SIDH when the curve constants are treated projectively. Nevertheless, given the similarities between the y-only isogeny formula in [31, Theorem 4] and our Theorem 1, it could be that there are savings to be gained in a twisted Edwards version of SIDH, or perhaps in some sort of hybrid that passes back and forth between the two models – see [9]. We leave this investigation open, pointing out that the sorts of trade-offs discussed in [9] can become especially favourable in SIDH, due to the large field sizes and the nature of arithmetic in quadratic extension fields.

We conclude this section by pointing out that (in our case) it is a simple exercise to transform Eq. (6) into an analogue that, rather than writing the isogeny map in terms of the coordinates of the torsion points à la Vélu, instead writes it in terms of the coefficients of the polynomial defining the kernel subgroup à la Kohel [25, Sect. 2.4]. While a Kohel-style formulation of our formula is arguably more natural from a mathematical perspective, the way it is factored and written in (6) is more natural from an algorithmic perspective.

4 Computing the Isogenous Curve Using the 2-Torsion

Let \(\phi \,:E \mapsto E'\) be the isogeny of Montgomery curves in Theorem 1 and let \(Q \in E\) be any point where \(Q \not \in \mathrm{ker}(\phi )\). All supersingular isogeny-based cryptosystems, and in particular all known implementations of SIDH [2, 11, 12, 15, 26], require separate functions for computing isogenous curves, i.e., \(\mathtt{iso\_curve}\,:E \mapsto \phi (E)\), and for evaluating the isogeny at points, i.e., \(\mathtt{iso\_point}\,:Q \mapsto \phi (Q)\). In this brief section we show that these two functions can be unified in the computation of odd-degree isogenies. The idea is to exploit the correspondence between the 2-torsion points and the curve-twist isomorphism class, and to replace calls to the \(\mathtt{iso\_curve}\) function with calls to \(\mathtt{iso\_point}\) on the input of 2-torsion points. Pushing 2-torsion points through an odd-degree isogeny preserves their order on the image curve, and so the correspondence between 2-torsion points and the isogenous curves they lie on remains an invariant throughout the SIDH algorithm.

On the Montgomery curve \(E/K \,:y^2=x^3+ax^2+x\), the three affine points of order 2 in \(E(\bar{K})\) are

$$\begin{aligned} P_0=(0 , 0) , \quad \quad \quad P_\alpha =(\alpha ,0), \quad \quad \quad \mathrm{and} \quad \quad \quad P_{1/\alpha }=(1/\alpha ,0), \end{aligned}$$

where \(a=-(\alpha ^2 + 1)/\alpha \). Note that the full 2-torsion is K-rational if \(x^2 + ax + 1\) is reducible in K[x], i.e., if \(\alpha \in K\); this is typically the case in SIDH and is therefore assumed in this section.

Under the \(\mathbf {x}\) map from Sect. 2, the 2-torsion points are then

$$\begin{aligned} \mathbf {x}(P_0) = (0 \,:1) , \quad \quad \mathbf {x}(P_\alpha ) = (X_\alpha \,:Z_\alpha ), \quad \quad \mathrm{and} \quad \quad \mathbf {x}(P_{1/\alpha }) =(Z_\alpha \,:X_\alpha ), \end{aligned}$$

and in \(\mathbb {P}^1\) we now have

$$\begin{aligned} (a \,:1) = (X_\alpha ^2+Z_\alpha ^2 \,:-X_\alpha Z_\alpha ). \end{aligned}$$
(15)

Observe that for the isogeny \(\phi \) described in Theorem 1, the point \(P_0=(0,0) \in E\) is mapped to the point \(\phi (P_0) = (0,0) \in E'\). Since \(E'\) is a Montgomery curve and 2-torsion points preserve their order under odd isogenies, it must be that

$$\begin{aligned} \mathbf {x}(\phi (P_0)) = (0 \,:1) , \quad \,\, \mathbf {x}(\phi (P_\alpha )) = (X_\alpha ' \,:Z_\alpha '), \quad \,\, \mathrm{and} \quad \,\, \mathbf {x}(\phi (P_{1/\alpha })) =(Z_\alpha ' \,:X_\alpha '), \end{aligned}$$

so that the relation in (15) between 2-torsion coordinates and the curve coefficient holds on the new curve.

Rather than thinking of the Montgomery curve as being represented by the coefficient \((a \,:1) = (A \,:C)\), we can (without loss of generality) think of it as being represented by the 2-torsion point \((X_\alpha \,:Z_\alpha )\). A close inspection of Theorem 1 reveals that, for values of d greater than 3, computing the isogenous curve via (5) becomes increasingly more expensive than passing a 2-torsion point through (6). In these cases a function for computing (5) is no longer needed. If, during the current iteration, the curve constant \((A \,:C)\) is needed for point operations (e.g., the multiplication-by-\(\ell \) map), then we can recover A using (15) at a cost (Footnote 4) of \({2}\mathbf {S}+{5}\mathbf {a}\). In fact, the general multiplication-by-\(\ell \) routine is the Montgomery ladder [30] that calls \(\mathtt{xDBL}\) as a subroutine, and (in SIDH) \(\mathtt{xDBL}\) makes use of the constant \((a-2 \,:4) = ((A-2C)/4 \,:C)\). Parsing directly to this format is slightly faster than parsing to \((A \,:C)\), since from (15) we have \(((A-2C)/4 \,:C) = ((X_\alpha +Z_\alpha )^2 \,:(X_\alpha -Z_\alpha )^2-(X_\alpha +Z_\alpha )^2)\), which can be computed in \({2}\mathbf {S}+{3}\mathbf {a}\).

Although parsing from \(\alpha \) to \((a \,:1)=(1+\alpha ^2 \,:-\alpha )\) is trivial, parsing in the other direction requires a square root computation in general. We never have to do this, however, since (i) during key generation, the starting curve is fixed and so the corresponding 2-torsion point(s) can be thought of as system parameters (see Footnote 5), and (ii) for the subsequent shared secret computations, we can simply replace a with \(\alpha \) as the description of the supersingular curve in the (compressed or uncompressed) public key (see Footnote 6). In the next section we use the notation \(\mathtt{a\_from\_alpha}\) to represent the function that performs this cheap parsing.
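The two parsings above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's software: it works over a toy prime field \(\mathbb {F}_p\) instead of \(\mathbb {F}_{p^2}\), the prime 1019 and the value \(\alpha = 7\) are arbitrary, and the helper name a24_from_alpha is hypothetical (only a_from_alpha is named in the text).

```python
# Toy setting: a prime field F_p stands in for F_{p^2}; the prime is illustrative only.
p = 1019

def a_from_alpha(X, Z):
    """Return (A : C) = (X^2 + Z^2 : -X*Z) as in Eq. (15), using 2S + 5a."""
    s = (X + Z) ** 2 % p              # X^2 + 2XZ + Z^2
    t = (X - Z) ** 2 % p              # X^2 - 2XZ + Z^2
    u = (s + t) % p                   # 2(X^2 + Z^2)
    return (u + u) % p, (t - s) % p   # (4(X^2 + Z^2) : -4XZ), same projective point

def a24_from_alpha(X, Z):
    """Return (a - 2 : 4) = ((X+Z)^2 : (X-Z)^2 - (X+Z)^2), using 2S + 3a."""
    s = (X + Z) ** 2 % p
    t = (X - Z) ** 2 % p
    return s, (t - s) % p

# Usage: compare against the affine definition a = -(alpha + 1/alpha).
alpha = 7
a = -(alpha + pow(alpha, p - 2, p)) % p
A, C = a_from_alpha(alpha, 1)
A_hat, C_hat = a24_from_alpha(alpha, 1)
```

Both outputs are projective, so any common scaling of \((X_\alpha \,:Z_\alpha )\) leaves the represented coefficient unchanged and no field inversion is needed.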

Remark 3

If P and \(Q \ne \pm P\) are two points on the Montgomery curve \(E/K \,:by^2 = x^3 + ax^2 + x\), then the curve constant a relates to their x-coordinates and the x-coordinate of their difference via [12, Remark 4]

$$\begin{aligned} a = \frac{(1-x_Px_Q-x_Px_{Q-P}-x_Qx_{Q-P})^2}{4x_Px_Qx_{Q-P}}-x_P-x_Q-x_{Q-P}. \end{aligned}$$
(16)

Thus, if we ever have three points on an isogenous curve whose coefficient has not been computed, we can use the projective version of the above equation to recover \((a \,:1)=(A \,:C)\) from \(\mathbf {x}(P)\), \(\mathbf {x}(Q)\) and \(\mathbf {x}(Q-P)\). Since the cost of computing the isogenous curve using the 2-torsion technique grows as the degree of the isogeny grows, while the cost of computing the isogenous curve via (16) is fixed, there will obviously be a crossover point beyond which taking advantage of three available points becomes faster. Based on the cost of computing one \(\ell \)-isogeny presented in the next section, and on the projective version of (16) costing \({8}\mathbf {M}+{5}\mathbf {S}+{11}\mathbf {a}\), it will be faster to use the above once \(d\ge 2\). Following [12], we note that there are always three such x-coordinates that can be exploited during key generation, namely the three x-coordinates whose images under the isogeny form (part of) the public key. During the shared secret computation, however, there will not always be three points available at each stage. Thus, we recommend that unless d is very large (so that the performance benefits of using (16) over an additional isogeny evaluation will be visible), it is simplest to stick to the 2-torsion approach in this section throughout the SIDH algorithm.
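As a sanity check of (16), the following Python sketch recovers a from three x-coordinates. It makes several simplifying assumptions that are not in the paper: a toy prime field with \(p=1019\), the arbitrary coefficient \(a=6\) with \(b=1\), and the affine form of (16) with one inversion rather than the projective version costed above. All function names are illustrative.

```python
p = 1019           # toy prime with p ≡ 3 (mod 4); illustrative only
a = 6              # toy Montgomery coefficient (b = 1), chosen so that a^2 != 4

def inv(v):
    return pow(v % p, p - 2, p)

def lift_x(x):
    """Return y with y^2 = x^3 + a*x^2 + x, or None if no such y exists in F_p."""
    rhs = (x * x * x + a * x * x + x) % p
    y = pow(rhs, (p + 1) // 4, p)       # square-root candidate, valid since p ≡ 3 (mod 4)
    return y if y * y % p == rhs else None

def affine_sub(P, Q):
    """Return Q - P on y^2 = x^3 + a*x^2 + x, assuming x(P) != x(Q)."""
    x1, y1 = P[0], -P[1] % p            # negate P, then add it to Q
    x2, y2 = Q
    lam = (y2 - y1) * inv(x2 - x1) % p
    x3 = (lam * lam - a - x1 - x2) % p
    return x3, (lam * (x1 - x3) - y1) % p

def recover_a(xP, xQ, xD):
    """Affine form of Eq. (16), with xD = x(Q - P)."""
    num = (1 - xP * xQ - xP * xD - xQ * xD) ** 2
    return (num * inv(4 * xP * xQ * xD) - xP - xQ - xD) % p

# Usage: collect affine points with y != 0, then pick P and Q with x(Q - P) != 0.
points = []
for x in range(1, p):
    y = lift_x(x)
    if y:                               # skips non-points and 2-torsion points
        points.append((x, y))
P = points[0]
Q = next(T for T in points[1:] if affine_sub(P, T)[0] != 0)
D = affine_sub(P, Q)
recovered = recover_a(P[0], Q[0], D[0])
```

The formula never uses the y-coordinates or the constant b, which is why it fits the x-only setting of this section.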

5 A General-Purpose Algorithm for Arbitrary Odd-Degree Isogenies

We now turn to deriving an optimised algorithm for arbitrary odd-degree isogeny evaluation based on Theorem 1. Since we are working exclusively within the \(\mathbb {P}^1\) framework under the map \(\mathbf {x}\), the only equation we need to recall is (6), which we rewrite as

$$\begin{aligned} f (x) = x \cdot \left( \prod _{i=1}^d \left( \frac{x \cdot x_{[i]P}-1}{x-x_{[i]P}} \right) \right) ^2. \end{aligned}$$

We begin working this into an algorithm by first projectivising into \(\mathbb {P}^1\), writing \((X_i \,:Z_i) = (x_{[i]P} \,:1)\) for \(i=1\dots d\), \((X \,:Z) = (x \,:1)\) for the indeterminate coordinate where the isogeny is evaluated, and \((X' \,:Z') = \mathbf {x}(\phi (x,y))\) for the result. Then

$$\begin{aligned} X'&= X \cdot \Big ( \prod _{i=1}^d (X \cdot X_i -Z_i \cdot Z)\Big )^2, \quad \mathrm{and} \nonumber \\ Z'&=Z \cdot \Big ( \prod _{i=1}^d (X \cdot Z_i -X_i \cdot Z) \Big )^2 \end{aligned}$$

At first glance it appears that computing the pairs \((X \cdot X_i -Z_i \cdot Z)\) and \((X \cdot Z_i -X_i \cdot Z)\) will cost \({4}\mathbf {M}+{2}\mathbf {a}\) each, but following Montgomery [30], we can achieve this in \({2}\mathbf {M}+{6}\mathbf {a}\) by rewriting the above as

$$\begin{aligned} X'&= X \cdot \Big ( \prod _{i=1}^d \Big [ (X-Z)(X_i+Z_i)+(X+Z)(X_i-Z_i) \Big ] \Big )^2, \quad \mathrm{and} \nonumber \\ Z'&=Z \cdot \Big ( \prod _{i=1}^d \Big [ (X-Z)(X_i+Z_i)-(X+Z)(X_i-Z_i) \Big ] \Big )^2. \end{aligned}$$
(17)

Observe that when \(d>1\) the values of \(X-Z\) and \(X+Z\) can be reused across the d expressions in both of the products above. Furthermore, the isogeny \(\phi \) is usually going to be evaluated at multiple points of the form \((X \,:Z)\), and this will always be the case if the 2-torsion technique from the previous section is employed. Thus, suppose the isogeny is to be evaluated at the n elements \((\mathbf {X}_1 \,:\mathbf {Z}_1), \dots ,(\mathbf {X}_n \,:\mathbf {Z}_n)\), where we use boldface to distinguish these points from the coordinates of the i-th multiples of the kernel generator P. We note at once that the values of \((X_i+Z_i)\) and \((X_i-Z_i)\) can now also be reused across the n elements evaluated by the isogeny. This mutual recycling across both sets of points suggests a simple subroutine that merely computes the sum and difference of these pairwise products as in (17): we dub this routine CrissCross and present it in Algorithm 1 for completeness.

figure a
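Since the algorithm figure is not reproduced here, the following Python sketch shows one way the routine could look when read off (17). The argument order, the function name criss_cross and the toy prime are assumptions; only the sum-and-difference structure is taken from the text.

```python
p = 1019  # toy prime field; real SIDH arithmetic is over F_{p^2}

def criss_cross(alpha, beta, gamma, delta):
    """Return (alpha*delta + beta*gamma, alpha*delta - beta*gamma): 2 multiplications, 2 additions."""
    t1 = alpha * delta % p
    t2 = beta * gamma % p
    return (t1 + t2) % p, (t1 - t2) % p

# Usage: one factor of each product in (17), for a kernel multiple (Xi : Zi) and a point (X : Z).
Xi, Zi, X, Z = 5, 7, 11, 13
num, den = criss_cross(Xi + Zi, Xi - Zi, X + Z, X - Z)
```

With these inputs the two outputs are twice the factors \(X \cdot X_i - Z_i \cdot Z\) and \(X \cdot Z_i - X_i \cdot Z\); the common factor of 2 is harmless because it cancels projectively.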

Now, on input of the kernel generator \(\mathbf {x}(P) = (X_{1} \,:Z_{1})\), the first step of the main algorithm will be to generate the \(d-1\) additional elements \(\mathbf {x}([i]P) = (X_{i} \,:Z_{i})\). This subroutine is called KernelPoints and we present it in Algorithm 2. Since it must start with a call to \(\mathtt{xDBL}\) (see Footnote 7), we also need to input the modified curve constant \((\hat{A} \,:\hat{C}) = (a-2 \,:4)\).

figure b

Looking back at (17), we can see that once the \((X_i \,:Z_i)\) have been computed for \(i=1, \dots , d\), they can immediately be overwritten by their sum and difference pairs through assigning \((\hat{X}_i, \hat{Z}_i) \leftarrow (X_i+Z_i,X_i-Z_i)\) in preparation for CrissCross. Based on (17), we now present an algorithm for evaluating a single isogeny that takes as input the modified set of kernel point coordinates: OddIsogeny is given in Algorithm 3.

figure c

We are now in a position to present SimultaneousOddIsogeny, which is the main algorithm (see Algorithm 4). It takes as input \(\mathbf {x}(P) = (X_1 \,:Z_1) \in \mathbb {P}^1\) and \((\hat{A} \,:\hat{C}) = (a-2 \,:4)\), which correspond to a point P of order \(\ell \) on \(E/K \,:by^2=x^3+ax^2+x\), as well as an n-tuple \((\mathbf {x}(Q_1), \dots , \mathbf {x}(Q_n)) = ((\mathbf {X}_1 \,:\mathbf {Z}_1), \dots ,(\mathbf {X}_n \,:\mathbf {Z}_n)) \in (\mathbb {P}^1)^n\) where the \(Q_i \in E\) are such that \(Q_i \not \in \langle P \rangle \). The output is an n-tuple corresponding to \((\mathbf {x}(\phi (Q_1)), \dots , \mathbf {x}(\phi (Q_n))) \in (\mathbb {P}^1)^n\), where \(\mathrm{ker}(\phi ) = \langle P \rangle \).

figure d
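The pipeline described in this section (KernelPoints, then the sum and difference pairs, then one OddIsogeny call per input point) can be sketched in Python as follows. This is a reconstruction from the text and (17), not a transcription of Algorithms 1–4. It assumes a toy prime field with \(p=1019\) and the curve \(y^2=x^3+x\) (so \(a=0\)), which has \(p+1=1020=5\cdot 204\) points because \(p \equiv 3 \bmod 4\); all function names and signatures are illustrative.

```python
p = 1019                          # toy prime, p ≡ 3 (mod 4)
A_HAT, C_HAT = (0 - 2) % p, 4     # (a - 2 : 4) for the toy coefficient a = 0

def criss_cross(alpha, beta, gamma, delta):
    t1, t2 = alpha * delta % p, beta * gamma % p
    return (t1 + t2) % p, (t1 - t2) % p

def xdbl(P, A_hat, C_hat):
    """x([2]P) using the constant (A_hat : C_hat) = (a - 2 : 4)."""
    X, Z = P
    s, t = (X + Z) ** 2 % p, (X - Z) ** 2 % p
    u = (s - t) % p                                   # 4XZ
    return C_hat * s * t % p, u * (C_hat * s + A_hat * u) % p

def xadd(P, Q, D):
    """x(P + Q) from x(P), x(Q) and D = x(P - Q)."""
    u = (P[0] - P[1]) * (Q[0] + Q[1]) % p
    v = (P[0] + P[1]) * (Q[0] - Q[1]) % p
    return D[1] * (u + v) ** 2 % p, D[0] * (u - v) ** 2 % p

def ladder(P, A_hat, C_hat, k):
    """Montgomery ladder computing x([k]P) for k >= 1."""
    R0, R1 = P, xdbl(P, A_hat, C_hat)
    for bit in bin(k)[3:]:                            # bits after the leading 1
        if bit == "1":
            R0, R1 = xadd(R0, R1, P), xdbl(R1, A_hat, C_hat)
        else:
            R0, R1 = xdbl(R0, A_hat, C_hat), xadd(R0, R1, P)
    return R0

def kernel_points(P, A_hat, C_hat, d):
    """Return [x(P), x([2]P), ..., x([d]P)]."""
    pts = [P]
    if d >= 2:
        pts.append(xdbl(P, A_hat, C_hat))
    for i in range(2, d):
        pts.append(xadd(pts[i - 1], P, pts[i - 2]))
    return pts

def odd_isogeny(Q, kernel_hat):
    """Evaluate (17) at Q, given the pairs (X_i + Z_i, X_i - Z_i)."""
    X, Z = Q
    Xh, Zh = (X + Z) % p, (X - Z) % p
    num, den = 1, 1
    for Xi_hat, Zi_hat in kernel_hat:
        s, t = criss_cross(Xi_hat, Zi_hat, Xh, Zh)
        num, den = num * s % p, den * t % p
    return X * num * num % p, Z * den * den % p

def simultaneous_odd_isogeny(P, A_hat, C_hat, points, d):
    kernel_hat = [((X + Z) % p, (X - Z) % p) for X, Z in kernel_points(P, A_hat, C_hat, d)]
    return [odd_isogeny(Q, kernel_hat) for Q in points]

# Usage: a kernel generator of order 5 is obtained by clearing the cofactor 204.
d = 2
P = next(R for R in (ladder((x, 1), A_HAT, C_HAT, 204) for x in range(2, p)) if R[1] != 0)
Q = (3, 1)                                            # arbitrary evaluation point
image = simultaneous_odd_isogeny(P, A_HAT, C_HAT, [P, Q], d)
```

The kernel generator itself is sent to the point at infinity, so the first output has a zero Z-coordinate. A stronger check that needs no knowledge of the image curve is that the outputs respect differential addition, since the \(\mathtt{xADD}\) formula does not depend on the curve coefficient.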

Simplified Odd-Degree Isogenies in SIDH. Together with an algorithm for computing the multiplication-by-\(\ell \) map, Algorithm 4 is essentially all that is needed to compute an odd \(\ell ^e\)-degree isogeny in the context of SIDH. Regardless of which high-level strategy is used to compute the \(\ell ^e\)-isogeny (i.e., whether it be the multiplication-based approach [21, Fig. 2] or the optimal strategy [15, Sect. 4.2.2]), Algorithm 4 will be called e times to compute e isogenies of degree \(\ell \). In Algorithm 5 we show how SimultaneousOddIsogeny is to be used in conjunction with the simple conversion function a_from_alpha from Sect. 4 and the Montgomery ladder for computing the multiplication-by-\(\ell \) map. We assume the use of the function \(\mathtt{LADDER}\) as discussed in Sect. 4, where the Montgomery coefficient a is passed in projectively as \((\hat{A} \,:\hat{C}) = (a-2 :4)\); i.e.,

$$\begin{aligned} \mathtt{LADDER} \,:\mathbb {P}^1 \times \mathbb {P}^1 \times \mathbb {Z} \rightarrow \mathbb {P}^1, \quad (\mathbf {x}(P),(\hat{A} \,:\hat{C}), \ell ^z) \mapsto \mathbf {x}([\ell ^z]P). \end{aligned}$$
(18)

For ease of exposition, we adopt the multiplication-based approach [21, Fig. 2] for computing the degree \(\ell ^e\)-isogeny, but note that the way in which the proposed algorithms are called in Lines 4–6 of Algorithm 5 is analogous if the optimal strategy mentioned above is used; the only difference worth mentioning is that the length of the list of the \((\mathbf {X}_i' \,:\mathbf {Z}_i')\) passed in and out of SimultaneousOddIsogeny on Line 6 can change when it is called within the code executing the optimal strategy.

figure e

In the notation of Sect. 2, let \(E/\mathbb {F}_{p^2} :y^2=x(x-\alpha )(x-1/\alpha )\) be the public starting curve in the SIDH protocol. For public key generation, Alice would compute her secret kernel generator as \(R_A = [u_A]P_A+[v_A]Q_A\) of order \(\ell _A^{e_A}\) (see [15, Algorithm 1]), and with Bob’s public basis \(P_B\) and \(Q_B\), she can then compute her public key by calling Algorithm 5 as

$$\begin{aligned} \big ( (X_{\alpha ,A}&\,:Z_{\alpha ,A}), (\mathbf {x}(\phi _A(P_B)), \mathbf {x}(\phi _A(Q_B)), \mathbf {x}(\phi _A(Q_B-P_B))) \big ) \nonumber \\ \, \,&=\mathtt{SIDH\_Isogeny} \big (\mathbf {x}(R_A),(\ell _A,e_A),(\alpha \,:1), (\mathbf {x}(P_B), \mathbf {x}(Q_B), \mathbf {x}(Q_B-P_B)) \big ), \end{aligned}$$

where \(\mathrm{ker}(\phi _A) = \langle R_A \rangle \), and where \(\mathbf {x}(Q_B-P_B)\) is included as an input to avoid sign ambiguities in the subsequent shared secret computations – see [12]. Alice would normalise each of these elements, i.e., convert them all from \(\mathbb {P}^1(\mathbb {F}_{p^2})\) into \(\mathbb {F}_{p^2}\) via a simultaneous inversion [30], then send them to Bob. Writing \(\alpha _A = X_{\alpha ,A} / Z_{\alpha ,A}\), Bob can then compute \(\mathbf {x}(S_B) =\mathbf {x}\left( [u_B]\phi _A(P_B)+[v_B]\phi _A(Q_B)\right) \), and compute the shared secret by calling Algorithm 5 as

$$\begin{aligned} (X_{\alpha ,AB} \,:Z_{\alpha ,AB})= \mathtt{SIDH\_Isogeny} \big (\mathbf {x}(S_B),(\ell _B,e_B),(\alpha _A \,:1) \big ), \end{aligned}$$

before computing the j-invariant of the Montgomery curve whose coefficient is the output of the function \(\mathtt{a\_from\_alpha}((X_{\alpha ,AB}:Z_{\alpha ,AB}))\). Note that, during the shared secret computation, the \((\mathbb {P}^1)^k\) input to \(\mathtt{SIDH\_Isogeny}\) is empty, i.e., has \(k=0\).

We note that the operation counts presented in Algorithms 2–4 do not apply to the special case of \(d=1\). Although Algorithm 4 still performs the 3-isogeny computations in the same number of operations as the dedicated formulas in Appendix A, the claimed operation counts only hold if \(\mathtt{KernelPoints}\) is called, which is not the case for 3-isogenies (where no additional kernel elements are required).

We conclude this section with a remark on a more compact version of Algorithm 4.

Remark 4

(A low-storage version). The description of the general odd-degree isogeny function in Algorithm 4 aims to minimise the total number of field operations needed for an \(\ell \)-isogeny computation and its evaluation at an arbitrary number of points. However, the recycling of the additions computed in (17) requires us to generate the entire list of d kernel elements before entering the loop that repeatedly calls Algorithm 3. If d is large, the space required to store d elements in \(\mathbb {F}_{p^2}\) might become infeasible, especially given the size of the fields used in real-world SIDH implementations. Moreover, this recycling only saves \(\mathbb {F}_{p^2}\) additions, and our benchmarking of the SIDH v2.0 software accompanying [12] in the following section revealed that their software has \({1}\mathbf {M} \approx {20}\mathbf {a}\), which means the above recycling will only have a minor benefit on the overall performance. Thus, a more streamlined version of Algorithm 4 would simply compute one of the elements \(\mathbf {x}([i]P)=(X_i :Z_i)\) at a time and absorb its contribution to (17) immediately before calling \(\mathtt{xADD}\) to replace it by \(\mathbf {x}([i+1]P)=(X_{i+1} :Z_{i+1})\), and so on, with no need for Algorithm 2. Since the required storage would then remain fixed as d increases, this would give a much more compact algorithm for larger d, both in its description and in terms of the storage required to implement it.
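A minimal Python sketch of such a streaming variant is given below. It is an illustration under stated assumptions rather than the authors' implementation: the field is the toy prime field with \(p=1019\), the curve is \(y^2=x^3+x\) (so \(a=0\) and the group order is \(1020 = 17 \cdot 60\)), and the function name low_storage_odd_isogeny is hypothetical. Only the previous and current kernel multiples are kept, while each evaluation point carries two running products.

```python
p = 1019                          # toy prime, p ≡ 3 (mod 4)
A_HAT, C_HAT = (0 - 2) % p, 4     # (a - 2 : 4) for the toy coefficient a = 0

def xdbl(P, A_hat, C_hat):
    X, Z = P
    s, t = (X + Z) ** 2 % p, (X - Z) ** 2 % p
    u = (s - t) % p                                   # 4XZ
    return C_hat * s * t % p, u * (C_hat * s + A_hat * u) % p

def xadd(P, Q, D):
    u = (P[0] - P[1]) * (Q[0] + Q[1]) % p
    v = (P[0] + P[1]) * (Q[0] - Q[1]) % p
    return D[1] * (u + v) ** 2 % p, D[0] * (u - v) ** 2 % p

def ladder(P, A_hat, C_hat, k):
    R0, R1 = P, xdbl(P, A_hat, C_hat)
    for bit in bin(k)[3:]:
        if bit == "1":
            R0, R1 = xadd(R0, R1, P), xdbl(R1, A_hat, C_hat)
        else:
            R0, R1 = xdbl(R0, A_hat, C_hat), xadd(R0, R1, P)
    return R0

def low_storage_odd_isogeny(P, A_hat, C_hat, d, points):
    """Evaluate (17) at all points while storing only two kernel multiples at a time."""
    hats = [((X + Z) % p, (X - Z) % p) for X, Z in points]
    num, den = [1] * len(points), [1] * len(points)
    prev, cur = None, P
    for i in range(1, d + 1):
        Xi_hat, Zi_hat = (cur[0] + cur[1]) % p, (cur[0] - cur[1]) % p
        for j, (Xh, Zh) in enumerate(hats):           # absorb x([i]P) into every product
            t1, t2 = Xi_hat * Zh % p, Zi_hat * Xh % p
            num[j] = num[j] * (t1 + t2) % p
            den[j] = den[j] * (t1 - t2) % p
        if i < d:                                     # replace x([i]P) by x([i+1]P)
            nxt = xdbl(cur, A_hat, C_hat) if i == 1 else xadd(cur, P, prev)
            prev, cur = cur, nxt
    return [(X * n * n % p, Z * m * m % p) for (X, Z), n, m in zip(points, num, den)]

# Usage: a 17-isogeny (d = 8) whose kernel generator is found by clearing the cofactor 60.
d = 8
P = next(R for R in (ladder((x, 1), A_HAT, C_HAT, 60) for x in range(2, p)) if R[1] != 0)
image = low_storage_odd_isogeny(P, A_HAT, C_HAT, d, [P, (3, 1)])
```

The trade-off is the one described in the remark: the sums \(X_i+Z_i\) and differences \(X_i-Z_i\) are recomputed on the fly, but the memory footprint no longer grows with d.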

6 Implementation Results and Implications

In this section we provide benchmarks for SimultaneousOddIsogeny, i.e., the general odd-degree isogeny function in Algorithm 4. We stress that we are not aiming to outperform the 3- and 4-isogenies; indeed, the relative performance of odd \(\ell \)-isogenies decreases as \(\ell \) grows larger. The point of this paper is to broaden the class of curves for which SIDH is practical in all of the relevant aspects, i.e., memory requirements, code size, simplicity of the implementation, as well as efficiency. Nevertheless, there are scenarios where larger odd-degree isogenies could be preferred over the low-degree ones, as we will discuss later in this section.

Table 1 presents benchmarks for the evaluation of isogenies of degree \(\ell \in \{3,5, \dots , 15\}\) at \(n \in \{1,2,5,8\}\) input points. These timings were obtained by wrapping Algorithm 4 around the SIDH v2.0 software (see Footnote 8) accompanying [11, 12]; this software uses the supersingular isogeny class containing the curve \(E/\mathbb {F}_{p^2} :y^2=x^3\,+\,x\) where \(p=2^{372}\cdot 3^{239}-1\), and where all curves in the class have cardinality \((2^{372}\cdot 3^{239})^2\). We note that this curve does not have \(\mathbb {F}_{p^2}\)-rational points of order \(\ell \) for odd \(\ell >3\), but this is immaterial; the timings for Algorithm 4 would be exactly the same when working with a curve with rational \(\ell \)-torsion over the same field. We benchmarked in this way in order to get a fair comparison of the cost of different values of \(\ell \) and n when the field arithmetic stays fixed at a size that is relevant to real-world SIDH implementations. We discuss the influence of needing rational \(\ell \)-torsion on the field arithmetic later in this section. The reason we chose to benchmark \(n \in \{1,2,5,8\}\) is based on the average number of isogeny evaluations for both Alice and Bob at each step of the SIDH v2.0 software that uses the optimal tree traversal (see Sect. 2 or [15, Sect. 4.2.2]) in the main loop: Alice and Bob use roughly 7.15 and 7.70 evaluations of every 4- and 3-isogeny (respectively) during key generation, and this would include one more evaluation if our 2-torsion technique from Sect. 4 was employed (hence \(n=8\)), and they use 4.15 and 4.70 respective isogeny evaluations per step during the shared secret phase (hence \(n=5\)). We also include \(n=1\) to benchmark the cost of a single isogeny evaluation and \(n=2\) assuming a single isogeny evaluation is included alongside the 2-torsion technique from Sect. 4; this is to view the relative performance of the simple SIDH loop in Algorithm 5 that evaluates each isogeny at one point during the main loop.

Table 1. Cycle counts for SimultaneousOddIsogeny for different values of \(\ell \) and n. Timing benchmarks were taken on an Intel Core i7-6500U Skylake processor running Ubuntu 14.04.5 LTS with TurboBoost disabled and all cores but one switched off. To obtain the executables, we used GNU-gcc version 4.8.4 with the -O2 flag set and GNU assembler version 2.24.

Table 1 shows a natural increase in latency as \(\ell \) grows. A single 5-isogeny evaluation costs around 2.71x that of a single 3-isogeny, and the cost of a 15-isogeny evaluation is around 11.40x that of a 3-isogeny. Because multiple isogeny evaluations share the computations performed on the kernel elements (see Sect. 5), these ratios become slightly more favourable for larger \(\ell \) as n increases: the evaluation of a 5-isogeny (resp. 15-isogeny) at \(n=8\) points costs around 2.03x (resp. 7.18x) the same number of 3-isogeny evaluations. These numbers are depicted graphically on the left of Fig. 1, and the approximate relative slowdown of using \(\ell \)-isogenies within the SIDH framework is depicted on the right. An analogous version of Fig. 1 for \(\ell \) up to \(\ell =301\) is given in Fig. 2. In both figures the cycle counts have been divided by n in order to give a cost per isogeny evaluation.

Fig. 1.
figure 1

Average cycle counts for SimultaneousOddIsogeny for different values of \(\ell \) and n. Timing benchmarks were taken on an Intel Core i7-6500U Skylake processor running Ubuntu 14.04.5 LTS with TurboBoost disabled and all cores but one switched off. To obtain the executables, we used GNU-gcc version 4.8.4 with the -O2 flag set and GNU assembler version 2.24. Raw cycle counts per isogeny evaluation are given on the left, while on the right they are scaled by the factor \(\mathrm{log}(3)/(\mathrm{log}(\ell ) \cdot \mathbf{C}_3)\), where \(\mathbf{C}_3\) is the cost of a 3-isogeny, in order to approximate the relative factor slowdown within the SIDH framework.

Fig. 2.
figure 2

Average cycle counts for SimultaneousOddIsogeny for different values of \(\ell \) and n. Timing benchmarks were taken on an Intel Core i7-6500U Skylake processor running Ubuntu 14.04.5 LTS with TurboBoost disabled and all cores but one switched off. To obtain the executables, we used GNU-gcc version 4.8.4 with the -O2 flag set and GNU assembler version 2.24. Raw cycle counts per isogeny evaluation are given on the left, while on the right they are scaled by the factor \(\mathrm{log}(3)/(\mathrm{log}(\ell ) \cdot \mathbf{C}_3)\), where \(\mathbf{C}_3\) is the cost of a 3-isogeny, in order to approximate the relative factor slowdown within the SIDH framework.

The right graphs in Figs. 1 and 2 aim to depict the relative factor slowdowns of computing an \(\ell ^e\) isogeny versus a \(3^{e_3}\) isogeny assuming that \(\ell ^e \approx 3^{e_3}\). However, we must note that a more accurate depiction of the relative slowdown in the SIDH framework would incorporate the relative costs of the multiplication-by-\(\ell \) functions, since these are called almost as frequently as the \(\ell \)-isogeny functions in an optimised implementation (and significantly more often than the \(\ell \)-isogeny functions in the simple SIDH loop; see [12, Sect. 6]). To that end, we point out that the relative slowdown of using \(\ell \)-isogenies would be much less than these graphs depict (as \(\ell \) increases), under the assumption that the Montgomery ladder is called to compute \(\mathbf {x}(P) \mapsto \mathbf {x}([\ell ]P)\). Table 2 and Fig. 3 exhibit the obvious trend in the performance of \(\mathtt{LADDER}\) as \(\ell \) increases: unlike the linear increase in \(\ell \)-isogeny latencies, the performance of \(\mathtt{LADDER}\) is asymptotically logarithmic, being (roughly) fixed by the value \(\lceil \mathrm{log}_2(\ell )\rceil \). In any case, we make the obvious comment that a practically meaningful representation of the performance trade-offs for different values of \(\ell \) can only be obtained by benchmarking similarly optimised implementations in all cases. As we discuss below, such implementations might call for vastly different styles of field arithmetic, so we leave this open for future work.
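The scaling used in the right-hand graphs follows from the fact that \(\ell ^e \approx 3^{e_3}\) gives \(e \approx e_3 \cdot \mathrm{log}(3)/\mathrm{log}(\ell )\), so fewer (but individually slower) \(\ell \)-isogenies are needed. A small Python sketch applies this to the two \(n=1\) ratios quoted above for Table 1; the function name is illustrative, and the calculation deliberately ignores the multiplication-by-\(\ell \) costs discussed in this paragraph.

```python
import math

def relative_slowdown(cost_ratio, ell):
    """Scale the per-isogeny ratio C_ell / C_3 by log(3)/log(ell), the ratio of walk lengths e / e_3."""
    return cost_ratio * math.log(3) / math.log(ell)

# Ratios quoted in the text for a single evaluation (n = 1).
slowdown_5 = relative_slowdown(2.71, 5)      # 5-isogenies versus 3-isogenies
slowdown_15 = relative_slowdown(11.40, 15)   # 15-isogenies versus 3-isogenies
```

Under this approximation the per-isogeny factors of 2.71 and 11.40 correspond to overall factors of roughly 1.85 and 4.62, respectively.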

Table 2. Cycle counts for \([\ell ](X \,:Z)\) using the Montgomery ladder with projective inputs: \((X \,:Z)\) and \((A_{24} \,:C_{24})\). Timing benchmarks were taken on an Intel Core i7-6500U Skylake processor running Ubuntu 14.04.5 LTS with TurboBoost disabled and all cores but one switched off. To obtain the executables, we used GNU-gcc version 4.8.4 with the -O2 flag set and GNU assembler version 2.24. For the fixed odd low degrees of \(\ell \in \{3,5,7\}\), we also present the cycle counts of our own optimised, dedicated algorithms for computing the multiplication-by-\(\ell \) maps, since this might be of interest for future implementers; see Appendix A for justification.

Fig. 3.
figure 3

Cycle counts chart for \([\ell ](X \,:Z)\) using the Montgomery ladder with projective inputs: \((X \,:Z)\) and \((A_{24} \,:C_{24})\). Timing benchmarks were taken on an Intel Core i7-6500U Skylake processor running Ubuntu 14.04.5 LTS with TurboBoost disabled and all cores but one switched off. To obtain the executables, we used GNU-gcc version 4.8.4 with the -O2 flag set and GNU assembler version 2.24.

Implications. At first glance, Table 1 and Figs. 1 and 2 seem to suggest that, unless faster isogenies of degree \(\ell \ge 5\) are found, such higher degree isogenies will not find any meaningful real-world application. However, the ability to compute arbitrary degree isogenies in SIDH already opens up some interesting possibilities, as we now discuss.

Firstly, recent work by Bos and Friedberger [8] studied SIDH-friendly primes of the form \(p=2^{i}r^j-1\), where r can be any small prime. They investigated a number of different arithmetic techniques and, interestingly, found that arithmetic over the comparably sized field with \(p=2^{391}19^{88}-1\) was actually significantly faster than arithmetic over the field with \(p=2^{372}3^{239}-1\) used above [8, Table 3]. The more severe slowdown of 19-isogenies versus 3-isogenies means that, overall, 3-isogenies will still be preferred. However, in real-world applications like the transport layer security (TLS) protocol, it is typically one side of the protocol (i.e., the server, who is processing many SIDH instances) where performance is the bottleneck, while the performance of a single SIDH instance on the client side is ultimately a non-issue. In such a situation, we could envision affording the server the luxury of the faster prime \(p=2^{391}19^{88}-1\) and the faster 4-isogenies in order to get the best of both worlds, while the client could put up with the 19-isogenies and not be noticeably hampered by the increased latency on their side.

Another possibility opened up by Algorithm 4 is the abandonment of even-degree isogenies on either side of the protocol, in the name of implementation simplicity. For example, SIDH implementations using primes of the form (see Footnote 9) \(p=4 \cdot 3^i 5^j-1\) could be implemented using Algorithm 4 for isogenous curve and point operations on both sides. This would make for a much simpler and more compact code-base, and could be an attractive option if the relatively modest slowdown from 4- to 5-isogenies (and the possible slowdown of the newly shaped primes) is justifiable.

Finally, we leave it as an open question whether primes not of the form \(p=f \cdot 2^i3^j-1\) can be found where arithmetic is fast enough to justify isogenies of degree \(\ell \ge 5\). It could even be possible to find fast primes where \(p\pm 1\) is smooth but contains many small, distinct prime factors, and where isogeny walks on either or both sides of the protocol involve isogenies of different degrees. Of course, the security implications of such a choice are also left open.