Abstract
CSIDH is a recent quantum-resistant primitive based on the difficulty of finding isogeny paths between supersingular curves. Recently, two constant-time versions of CSIDH have been proposed: first by Meyer, Campos and Reith, and then by Onuki, Aikawa, Yamazaki and Takagi. While both offer protection against timing attacks and simple power consumption analysis, they are vulnerable to more powerful attacks such as fault injections. In this work, we identify and repair two oversights in these algorithms that compromised their constant-time character. By exploiting Edwards arithmetic and optimal addition chains, we produce the fastest constant-time version of CSIDH to date. We then consider the stronger attack scenario of fault injection, which is relevant for the security of CSIDH static keys in embedded hardware. We propose and evaluate a dummy-free CSIDH algorithm. While these CSIDH variants are slower, their performance is still within a small constant factor of less-protected variants. Finally, we discuss derandomized CSIDH algorithms.
1 Introduction
Isogeny-based cryptography was introduced by Couveignes [10], who defined a key exchange protocol similar to Diffie–Hellman based on the action of an ideal class group on a set of ordinary elliptic curves. Couveignes’ protocol was independently rediscovered by Rostovtsev and Stolbunov [27, 28], who were the first to recognize its potential as a post-quantum candidate. Recent efforts to make this system practical have put it back at the forefront of research in post-quantum cryptography [13]. A major breakthrough was achieved by Castryck, Lange, Martindale, Panny, and Renes with CSIDH [6], a reinterpretation of Couveignes’ system using supersingular curves defined over a prime field.
The first implementation of CSIDH completed a key exchange in less than 0.1 seconds, and its performance has been further improved by Meyer and Reith [22]. However, both [6] and [22] recognized the difficulty of implementing CSIDH with constant-time algorithms, that is, algorithms whose running time, sequence of operations, and memory access patterns do not depend on secret data. The implementations of [6] and [22] are thus vulnerable to simple timing attacks.
The first attempt at implementing CSIDH in constant time was made by Bernstein, Lange, Martindale, and Panny [3], but their goal was to obtain a fully deterministic reversible circuit implementing the class group action, to be used in quantum cryptanalyses. The distinct problem of efficient CSIDH implementation with side-channel protection was first tackled by Jalali, Azarderakhsh, Mozaffari Kermani, and Jao [16], and independently by Meyer, Campos, and Reith [21], whose work was improved by Onuki, Aikawa, Yamazaki, and Takagi [26].
The approach of Jalali et al. is similar to that of [3], in that they achieve a stronger notion of constant time (running time independent of all inputs), at the cost of allowing the algorithm to fail with small probability. In order to make the failure probability sufficiently low, they introduce a large number of useless operations, which makes the performance significantly worse than that of the original CSIDH algorithm. This poor performance and the possibility of failure reduce the interest of this implementation; we will not analyze it further here.
Meyer et al. take a different path: the running time of their algorithm is independent of the secret key, but not of the output of an internal random number generator. They claim a slowdown factor of only 3.10 compared to the unprotected algorithm of [22]. Onuki et al. introduced further improvements, claiming a speedup of \(27.35 \%\) over Meyer et al., i.e., a net slowdown factor of 2.25 compared to [22].
Our Contribution. In this work we take a new look at side-channel-protected implementations of CSIDH. We start by reviewing the implementations in [21] and [26]. We highlight some flaws that make their constant-time claims disputable, and propose fixes for them. Since these fixes introduce some minor slowdowns, we report on the performance of the revised algorithms.
Then, we introduce new optimizations to make both [21] and [26] faster: we improve the isogeny formulas for the twisted Edwards model, and we introduce the use of optimal addition chains in the scalar multiplications. With these improvements, we obtain a version of CSIDH protected against timing and some simple power analysis (SPA) attacks that is 25% more efficient than [21] and 15% more efficient than a repaired version of [26].
Then, we shift our focus to stronger security models. All constant-time versions of CSIDH presented so far use so-called “dummy operations”, i.e., computations whose result is not used, but whose role is to hide the conditional structure of the algorithm from timing and SPA attacks that read the sequence of operations performed off a single power trace. However, this countermeasure is easily defeated by fault-injection attacks, where the adversary may modify values during the computation. We propose a new constant-time variant of CSIDH without dummy operations as a first line of defence. The new version is only twice as slow as the simple constant-time version.
We conclude with a discussion of derandomized variants of CSIDH. The versions discussed previously are “constant-time” in the sense that their running time is uncorrelated to the secret key; however, it depends on some (necessarily secret) seed to a PRNG. While this notion of “constant-time” is usually considered good enough for side-channel protection, one may object that a compromise of the PRNG or of the seed generation would put the security of the implementation at risk, even if the secret was securely generated beforehand (with an uncompromised PRNG) as part of a long-term or static keypair. We observe that this dependence on additional randomness is not necessary: a simple modification of CSIDH, already considered in isogeny-based signature schemes [11, 14], can easily be made constant-time and free of randomness. Unfortunately, this modification requires substantially increasing the size of the base field, and is thus considerably slower and not compatible with the original version. On the positive side, the increased field size makes it much more resistant to quantum attacks, a non-negligible asset in a context where the quantum security of CSIDH is still unclear; it can thus be seen as a CSIDH variant for the paranoid.
Organization. In Sect. 2 we briefly recall ideas, algorithms and parameters from CSIDH [6]. In Sect. 3 we highlight shortcomings in [21] and [26] and propose ways to fix them. In Sect. 4 we introduce new optimizations compatible with all previous versions of CSIDH. In Sect. 5 we introduce a new algorithm for evaluating the CSIDH group action that is resistant against timing and some simple power analysis attacks, while providing protection against some fault injections. Finally, in Sect. 6 we discuss a more costly variant of CSIDH with stronger security guarantees.
Notation. M, S, and A denote the cost of computing a single multiplication, squaring, and addition (or subtraction) in \(\mathbb {F}_p\), respectively. We assume that a constant-time equality test \(\texttt {isequal}(X,Y)\) is defined, returning \(1\) if \(X = Y\) and \(0\) otherwise. We also assume that a constant-time conditional swap \(\texttt {cswap}(X,Y,b)\) is defined, exchanging \((X,Y)\) if \(b = 1\) (and not if \(b = 0\)).
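For illustration, both primitives can be realized branch-free with integer masking. The Python sketch below is our own, written for 64-bit operands; it mirrors what a C implementation would do on unsigned words. (Python integers are not actually constant-time, so this only illustrates the branch-free logic.)

```python
MASK64 = (1 << 64) - 1

def isequal(x, y):
    """Return 1 if x == y, else 0, without data-dependent branches.

    For 64-bit operands, d - 1 underflows to all-ones exactly when d == 0,
    so the bit at position 64 after the subtraction is the equality flag.
    """
    d = (x ^ y) & MASK64
    return ((d - 1) >> 64) & 1

def cswap(x, y, b):
    """Exchange (x, y) if b == 1, leave them if b == 0, branch-free."""
    mask = (-b) & MASK64      # all-ones if b == 1, zero if b == 0
    t = mask & (x ^ y)
    return x ^ t, y ^ t
```

A real implementation would apply the same masking limb by limb to \(\mathbb {F}_p\) elements.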
2 CSIDH
CSIDH is an isogeny-based primitive, similar to Diffie–Hellman, that can be used for key exchange and encapsulation [6], signatures [4, 11, 14], and other more advanced protocols. Compared to the other main isogeny-based primitive, SIDH [12, 17], CSIDH is slower. On the positive side, CSIDH has smaller public keys, is based on a better-understood security assumption, and supports an easy key validation procedure, making it better suited than SIDH for CCA-secure encryption and for static-dynamic and static-static key exchange. In this work we will use the jargon of key exchange when referring to cryptographic concepts.
CSIDH works over a finite field \(\mathbb {F}_p\), where p is a prime of the special form

\(p = 4\,\ell _1 \cdots \ell _n - 1,\)

with \(\ell _1,\dots ,\ell _n\) a set of small odd primes. Concretely, the original CSIDH article [6] defined a 511-bit p with \(\ell _1,\dots ,\ell _{n-1}\) the first 73 odd primes, and \(\ell _n=587\).
The set of public keys in CSIDH is a subset of all supersingular elliptic curves defined over \(\mathbb {F}_p\), in Montgomery form \(y^2=x^3+Ax^2+x\), where \(A\in \mathbb {F}_p\) is called the A-coefficient of the curve.^{Footnote 1} The endomorphism rings of these curves are isomorphic to orders in the imaginary quadratic field \(\mathbb {Q}(\sqrt{-4p})\). Castryck et al. [6] choose to restrict the public keys to the horizontal isogeny class of the curve with \(A=0\), so that all endomorphism rings are isomorphic to \(\mathbb {Z}[\sqrt{-p}]\).
2.1 The Class Group Action
Let \(E/\mathbb {F}_p\) be an elliptic curve with \({{\,\mathrm{End}\,}}(E) \cong \mathbb {Z}[\sqrt{-p}]\). If \(\mathfrak {a}\) is a nonzero ideal in \(\mathbb {Z}[\sqrt{-p}]\), then it defines a finite subgroup \(E[\mathfrak {a}] = \bigcap _{\alpha \in \mathfrak {a}}\ker (\alpha )\), where we identify each \(\alpha \) with its image in \({{\,\mathrm{End}\,}}(E)\). We then have a quotient isogeny \(\phi : E \rightarrow E' = E/E[\mathfrak {a}]\) with kernel \(E[\mathfrak {a}]\); this isogeny and its codomain are well-defined up to isomorphism. If \(\mathfrak {a}= (\alpha )\) is principal, then \(\phi \cong \alpha \) and \(E/E[\mathfrak {a}] \cong E\). Hence, we get an action of the ideal class group \({{\,\mathrm{Cl}\,}}(\mathbb {Z}[\sqrt{-p}])\) on the set of isomorphism classes of elliptic curves \(E\) over \(\mathbb {F}_p\) with \({{\,\mathrm{End}\,}}(E) \cong \mathbb {Z}[\sqrt{-p}]\); this action is faithful and transitive. We write \(\mathfrak {a}*E\) for the image of (the class of) \(E\) under the action of \(\mathfrak {a}\), which is (the class of) \(E/E[\mathfrak {a}]\) above.
For CSIDH, we are interested in computing the action of small prime ideals. Consider one of the primes \(\ell _i\) dividing \(p+1\); the principal ideal \((\ell _i) \subset \mathbb {Z}[\sqrt{-p}]\) splits into two primes, namely \(\mathfrak {l}_i = (\ell _i,\pi -1)\) and \(\bar{\mathfrak {l}}_{i} = (\ell _{i},\pi +1)\), where \(\pi \) is the element of \(\mathbb {Z}[\sqrt{-p}]\) mapping to the Frobenius endomorphism of the curves. Since \(\bar{\mathfrak {l}}_{i}\mathfrak {l}_i = (\ell _i)\) is principal, we have \(\bar{\mathfrak {l}}_{i} = \mathfrak {l}_i^{-1}\) in \({{\,\mathrm{Cl}\,}}(\mathbb {Z}[\sqrt{-p}])\), and hence

\(\bar{\mathfrak {l}}_{i} * E = \mathfrak {l}_i^{-1} * E\)

for all \(E/\mathbb {F}_p\) with \({{\,\mathrm{End}\,}}(E) \cong \mathbb {Z}[\sqrt{-p}]\).
2.2 The CSIDH Algorithm
At the heart of CSIDH is an algorithm that evaluates the class group action described above on any supersingular curve over \(\mathbb {F}_p\). Cryptographically, this plays the same role as modular exponentiation in classic Diffie–Hellman.
The input to the algorithm is an elliptic curve \(E:y^2=x^3+Ax^2+x\), represented by its A-coefficient, and an ideal class \(\mathfrak {a} = \prod _{i =1}^{n} \mathfrak {l}_i^{e_i},\) represented by its list of exponents \((e_1,\dots ,e_n)\in \mathbb {Z}^n\). The output is the (A-coefficient of the) elliptic curve \(\mathfrak {a}* E = \mathfrak {l}_1^{e_1} * \cdots * \mathfrak {l}_n^{e_n} * E\).
The isogenies corresponding to \(\mathfrak {l}_i=(\ell _i,\pi -1)\) can be efficiently computed using Vélu’s formulæ and their generalizations: exploiting the fact that \(\#E(\mathbb {F}_p)=p+1=4\prod \ell _i\), one looks for a point R of order \(\ell _i\) in \(E(\mathbb {F}_p)\) (i.e., a point that is in the kernels of both the multiplication-by-\(\ell _i\) map and \(\pi -1\)), computes the isogeny \(\phi :E\rightarrow E/\langle R\rangle \) with kernel \(\langle R\rangle \), and sets \(\mathfrak {l}_i*E=E/\langle R\rangle \). Iterating this procedure lets us compute \(\mathfrak {l}_i^e*E\) for any exponent \(e\ge 0\).
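The cofactor-multiplication idea behind this search for R can be sketched in any group of known smooth order. The Python sketch below (our own illustration, not from the implementations discussed) uses the multiplicative group \(\mathbb {F}_p^*\), of order \(p-1\), as a stand-in for \(E(\mathbb {F}_p)\), of order \(p+1\): to obtain an element of order \(\ell \), sample a random element and raise it to the cofactor.

```python
import random

def element_of_order(ell, p):
    """Find an element of exact order ell (an odd prime) in F_p^*.

    This is the analogue of finding a point of order ell_i in E(F_p)
    by computing R = [(p+1)/ell_i]P for a random point P.
    """
    assert (p - 1) % ell == 0
    cofactor = (p - 1) // ell
    while True:
        g = random.randrange(2, p)
        r = pow(g, cofactor, p)   # kills every other prime factor of the order
        if r != 1:                # analogue of the check R != infinity
            return r              # ell prime and r != 1 imply order exactly ell
```

For a curve, `pow` would be replaced by x-only scalar multiplication and `1` by the point at infinity.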
The isogenies corresponding to \(\mathfrak {l}_i^{-1}\) are computed in a similar fashion: this time one looks for a point R of order \(\ell _i\) in the kernel of \(\pi +1\), i.e., a point in \(E(\mathbb {F}_{p^2})\) of the form \((x,iy)\) where both \(x\) and \(y\) are in \(\mathbb {F}_p\) (since \(i = \sqrt{-1}\) is in \(\mathbb {F}_{p^2}\setminus \mathbb {F}_p\) and satisfies \(i^p = -i\)). Then one proceeds as before, setting \(\mathfrak {l}_i^{-1}*E=E/\langle R\rangle \).
In the sequel we assume that we are given an algorithm QuotientIsogeny which, given a curve \(E/\mathbb {F}_p\) and a point \(R\) on \(E\), computes the quotient isogeny \(\phi : E \rightarrow E' \cong E/\langle R\rangle \) and returns the pair \((\phi ,E')\). We refer to this operation as isogeny computation. Algorithm 1, taken from the original CSIDH article [6], computes the class group action.
For cryptographic purposes, the exponent vectors \((e_1,\dots ,e_n)\) must be taken from a space of size at least \(2^{2\lambda }\), where \(\lambda \) is the (classical) security parameter. The CSIDH-512 parameters in [6] take \(n=74\), and all \(e_i\) in the interval \([-5, 5]\), so that \(74 \log _2(2\,\cdot \,5\,+\,1) \simeq 255.99\), consistent with the NIST-1 security level. With this choice, the implementation of [6] computes one class group action in 40 ms on average. Meyer and Reith [22] further improved this to 36 ms on average. Neither implementation is constant-time.
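The key-space estimate is easy to check numerically:

```python
import math

n, interval = 74, 11            # 74 primes; exponents in [-5, 5], i.e. 11 choices each
bits = n * math.log2(interval)  # log2 of the number of exponent vectors
# bits is just below 256 = 2 * 128, the target for lambda = 128
```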
2.3 The Meyer–Campos–Reith Constant-Time Algorithm
As Meyer, Campos and Reith observe in [21], Algorithm 1 performs fewer scalar multiplications when the key has the same number of positive and negative exponents than it does in the unbalanced case where these numbers differ. Algorithm 1 thus leaks information about the distribution of positive and negative exponents under timing attacks. Besides this, analysis of power traces would reveal the cost of each isogeny computation, and the number of such isogenies computed, which would leak the exact exponents of the private key.
In view of this vulnerability, Meyer, Campos and Reith proposed in [21] a constant-time CSIDH algorithm whose running time does not depend on the private key (though, unlike [16], it still varies due to randomness). The essential differences between the algorithm of [21] and classic CSIDH are as follows. First, to address the vulnerability to timing attacks, they use only positive exponents in [0, 10] for each \(\ell _i\), instead of \([-5, 5]\) as in the original version, while keeping the same prime \(p = 4\prod _{i= 1}^{74} \ell _i - 1\). To mitigate power consumption analysis attacks, their algorithm always computes the maximal number of isogenies allowed by the exponent bound, using dummy isogeny computations when needed.
Since these modifications generally make the group action computation more costly, the authors also provide several optimizations that limit the slowdown of their algorithm to a factor of 3.10 compared to [22]. These include the Elligator 2 map of [2] and [3], splitting the isogeny computations into multiple batches (SIMBA), and sampling the exponents \(e_i\) from intervals of different sizes depending on \(\ell _i\).
2.4 The Onuki–Aikawa–Yamazaki–Takagi Constant-Time Algorithm
Still assuming that the attacker can perform only power consumption analysis and timing attacks, Onuki, Aikawa, Yamazaki and Takagi proposed a faster constant-time version of CSIDH in [26].
The key idea is to use two points to evaluate the action of an ideal, one in \(\ker (\pi -1)\) (i.e., in \(E(\mathbb {F}_p)\)) and one in \(\ker (\pi +1)\) (i.e., in \(E(\mathbb {F}_{p^2})\) with x-coordinate in \(\mathbb {F}_p\)). This allows them to avoid timing attacks while keeping the same prime and exponent range \([-5, 5]\) as in the original CSIDH algorithm. Their algorithm also employs dummy isogenies to mitigate some power analysis attacks, as in [21]. With these improvements, they achieve a speedup of \(27.35 \%\) compared to [21].
We include pseudocode for the algorithm of [26] in Algorithm 2, to serve both as a reference for the discussion of some subtle leaks in Sect. 3 and as a departure point for our dummy-free algorithm in Sect. 5.
3 Repairing Constant-Time Versions
3.1 Projective Elligator
Both [21] and [26] use the Elligator 2 map to sample a random point on the current curve \(E_A\) in step 6 of Algorithm 2. Elligator takes as input a random field element \(u\in \{2,\dots ,\frac{p-1}{2}\}\) and the Montgomery A-coefficient of the current curve, and returns a pair of points in \(E_A[\pi - 1]\) and \(E_A[\pi + 1]\), respectively.
To avoid a costly inversion of \(u^2 - 1\), instead of sampling u randomly, Meyer, Campos and Reith^{Footnote 2} follow [3] and precompute a set of ten pairs \((u,(u^2-1)^{-1})\); they try them in order until one that produces a point Q passing the test in Step 12 is found. When this happens, the algorithm moves on to the next curve, and Elligator keeps using the next precomputed value of u, going back to the first value when the tenth has been reached. This is a major departure from [3], where all precomputed values of u are tried for each isogeny computation, and the algorithm succeeds if at least one passes the test. And indeed the implementation of [21] leaks information on the secret via the timing channel:^{Footnote 3} since Elligator uses no randomness for u, its output depends only on the A-coefficient of the current curve, which itself depends on the secret key; but the running time of the algorithm varies and, not being correlated to u, it is necessarily correlated to A and thus to the secret.
Fortunately this can be easily fixed by (re)introducing randomness in the input to Elligator. To avoid field inversions, we use a projective variant: given \(u\ne 0,\pm 1\) and assuming \(A\ne 0\), we write \(V = (A : u^2-1)\), and we want to determine whether V is the abscissa of a projective point on \(E_A\). Plugging V into the homogeneous equation
gives
We can test the existence of a solution for Y by computing the Legendre symbol of the right hand side: if it is a square, the points with projective XZcoordinates
are in \(E_A[\pi -1]\) and \(E_A[\pi +1]\), respectively; otherwise their roles are swapped.
We are left with the case \(A=0\). Following [3], Meyer, Campos and Reith precompute once and for all a pair of generators \(T_+,T_-\) of \(E_0[\pi -1]\) and \(E_0[\pi +1]\), and output those instead of random points. This choice suffers from an issue similar to the previous one: because the points are output deterministically, the running time of the whole algorithm is correlated to the number of times the curve \(E_0\) is encountered during the isogeny walk.
In practice, \(E_0\) is unlikely to ever be encountered in a random isogeny walk, except as the starting curve in the first phase of a key exchange, so this flaw seems hard to exploit. Nevertheless, we find it not significantly more expensive to use a different approach, also suggested in [3]: for \(u\ne 0\), on \(E_0\) we define the output of Elligator as \(T_+=(u:1)\), \(T_-=(-u:1)\) when \(u^3+u\) is a square, and we swap the two points when \(u^3+u\) is not a square.
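As a sketch, the \(A=0\) case amounts to a single quadratic-character computation via Euler's criterion. Helper names below are ours, and the branch shown would be a constant-time cswap in a real implementation:

```python
def legendre(x, p):
    """Euler's criterion: returns 1 for nonzero squares mod p, p - 1 otherwise."""
    return pow(x, (p - 1) // 2, p)

def elligator_on_E0(u, p):
    """Pair of projective X:Z abscissas on E_0 : y^2 = x^3 + x, for u != 0.

    If u^3 + u is a square, T_plus = (u : 1) and T_minus = (-u : 1);
    otherwise the two outputs are swapped.
    """
    t = (pow(u, 3, p) + u) % p            # u^3 + u
    T_plus, T_minus = (u, 1), (p - u, 1)  # (u : 1) and (-u : 1)
    if legendre(t, p) != 1:               # real code: constant-time cswap on this bit
        T_plus, T_minus = T_minus, T_plus
    return T_plus, T_minus
```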
With these choices, under reasonable heuristics experimentally verified in [3], the running time of the whole algorithm is uncorrelated to the secret key as long as the values of u are unknown to an adversary. We summarize our implementation of Elligator in Algorithm 3, generalizing it to the case of Montgomery curves represented by projective coefficients (see also Sect. 4.1.1).
3.2 Fixing a Leaking Branch in Onuki–Aikawa–Yamazaki–Takagi
The algorithm from [26], essentially reproduced in Algorithm 2, includes a conditional statement at Line 12 which branches on the value of the point Q computed at Line 10. But this value depends on the sign s of the secret exponent \(e_i\), so the branch leaks information about the secret. We propose repairing this by always computing both \(Q_0 \leftarrow [k/\ell _i]P_0\) and \(Q_1 \leftarrow [k/\ell _i]P_1\) at Line 10, and replacing the condition in Line 12 with a test for \((Q_0 = \infty )\ \text {or}\ (Q_1 = \infty )\) (and using constant-time conditional swaps throughout).^{Footnote 4} This fix is visible in Line 13 of Algorithm 5.
4 Optimizing Constant-Time Implementations
In this section we propose several optimizations that are compatible with both non-constant-time and constant-time implementations of CSIDH.
4.1 Isogeny and Point Arithmetic on Twisted Edwards Curves
In this subsection, we present efficient formulas in twisted Edwards coordinates for four fundamental operations: point addition, point doubling, isogeny computation (as presented in [25]; cf. § 2.2), and isogeny evaluation (i.e., computing the image of a point under an isogeny). Our approach obtains a modest but still noticeable improvement with respect to previous proposals based on the Montgomery representation, or hybrid strategies combining the Montgomery and twisted Edwards representations [5, 18,19,20, 23].
Castryck, Galbraith, and Farashahi [5] proposed using a hybrid representation to reduce the cost of point doubling on certain Montgomery curves, exploiting the fact that converting between the Montgomery and twisted Edwards models can be done at almost no cost. In [23], Meyer, Reith and Campos considered using twisted Edwards formulas for isogeny computation and elliptic curve arithmetic, but concluded that a pure twisted-Edwards-only approach would not be advantageous in the context of SIDH. Bernstein, Lange, Martindale, and Panny observed in [3] that the conversion from Montgomery XZ coordinates to twisted Edwards YZ coordinates occurs naturally during the Montgomery ladder. Kim, Yoon, Kwon, Park, and Hong presented a hybrid model in [19] using the Edwards and Montgomery models for isogeny computations and point arithmetic, respectively; in [18] and [20], they suggested computing isogenies using a modified twisted Edwards representation that introduces a fourth coordinate w.
To the best of our knowledge, the quest for elliptic curve and isogeny arithmetic more efficient than that offered by pure Montgomery and hybrid twisted Edwards–Montgomery representations remains an open problem. As a step in this direction, Moody and Shumow [25] showed that when dealing with isogenies of odd degree \(d = 2\ell - 1\) with \(\ell \ge 2\), the twisted Edwards representation offers a cheaper formulation for isogeny computation than the corresponding one using Montgomery curves; nevertheless, they did not address the problem of obtaining a cheaper twisted Edwards formulation for the isogeny evaluation operation.
4.1.1 Montgomery Curves
A Montgomery curve [24] is defined by the equation \(E_{A,B}: By^2 = x^3 + Ax^2 + x\), such that \(B \ne 0\) and \(A^2 \ne 4\) (we often write \(E_{A}\) for \(E_{A,1}\)). We refer to [9] for a survey on Montgomery curves. When performing isogeny computations and evaluations, it is often more convenient to represent the constant A in the projective space \(\mathbb {P}^1\) as \((A': C'),\) such that \(A = A'/C'.\) Montgomery curves are attractive because they are exceptionally wellsuited to performing the differential point addition operation which computes \(x(P+Q)\) from \(x(P)\), \(x(Q)\), and \(x(PQ)\). Equations (1) and (2) describe the differential point doubling and addition operations proposed by Montgomery in [24]:
where \(A_{24p} = A + 2C\) and \(C_{24} = 4C\), and
Montgomery curves can be used to efficiently compute isogenies using Vélu’s formulas [30]. Suppose we want the image of a point \(Q\) under an \(\ell \)isogeny \(\phi \), where \(\ell = 2k+1\). For each \(1 \le i \le k\) we let \((X_i: Z_i) = x([i]P)\), where \(\langle {P}\rangle = \ker \phi \). Equation (3) computes \((X':Z') = x(\phi (Q))\) from \((X_Q:Z_Q) = x(Q)\).
4.1.2 Twisted Edwards Curves
In [1] we see that every Montgomery curve \(E_{A,B}: By^2 = x^3 + Ax^2 + x\) is birationally equivalent to a twisted Edwards curve \(E_{a,d} : ax^2 + y^2 = 1 + dx^2y^2\); the curve constants are related by

\(a = (A+2)/B \quad \text {and}\quad d = (A-2)/B,\)
and the rational maps \(\phi : E_{a,d} \rightarrow E_{A,B}\) and \(\psi : E_{A,B} \rightarrow E_{a,d}\) are defined by
Rewriting this relationship for Montgomery curves with projective constants, \(E_{a,d}\) is equivalent to the Montgomery curve \(E_{(A: C)} = E_{A/C,1}\) with constants

\(a = A + 2C \quad \text {and}\quad d = A - 2C.\)
To avoid notational ambiguities, we write \((Y_P: T_P)\) for the \(\mathbb {P}^1\) projection of the y-coordinate of the point \(P \in E_{a,d}\). Let \(P\in E_{(A:C)}\). In projective coordinates, the map \(\psi \) of (4) becomes

\((Y_P : T_P) = (X_P - Z_P : X_P + Z_P).\)
Comparing (5) with (1) reveals that \(Y_P\) and \(T_P\) appear in the doubling formula, so we can substitute them at no cost. Replacing \(A_{24p}\) and \(C_{24}\) with their twisted Edwards equivalents a and \(e=ad\), respectively, we obtain a doubling formula for twisted Edwards YT coordinates:
Similarly, the coordinates \(Y_P, T_P, Y_Q, T_Q, Y_{PQ}\) and \(T_{PQ}\) appear in (2), and thus we derive differential addition formulas for twisted Edwards coordinates:
The computational costs of doubling and differential addition are \(4\mathbf M + 2\mathbf S + 4\mathbf A \) (the same as evaluating (1)) and \(4\mathbf M + 2\mathbf S + 6\mathbf A \) (the same as (2)), respectively.
The Moody–Shumow formulas for isogeny computation [25] are given in terms of twisted Edwards YT-coordinates. It remains to derive a twisted Edwards YT-coordinate isogeny-evaluation formula for \(\ell \)-isogenies where \(\ell = 2k+1\). We do this by applying the map in (5) to (3), which yields
The main advantage of the approach outlined here is that by using only points given in YT coordinates, we can compute point doubling, point addition, and isogeny construction and evaluation at a lower computational cost. Indeed, isogeny evaluation in XZ coordinates costs \(4k\mathbf M + 2\mathbf S + 6k\mathbf A \), whereas the above YT-coordinate formula costs \(4k\mathbf M + 2\mathbf S + (2k + 4)\mathbf A \), thus saving \(4k - 4\) field additions.
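The claimed saving follows directly from the two addition counts:

```python
# Addition counts of the two ell-isogeny evaluation formulas, ell = 2k + 1:
#   Montgomery XZ:      4k M + 2 S + 6k A
#   twisted Edwards YT: 4k M + 2 S + (2k + 4) A
def addition_saving(k):
    """Field additions saved per evaluated point by the YT formula."""
    return 6 * k - (2 * k + 4)   # simplifies to 4k - 4
```

The saving is zero for 3-isogenies (\(k = 1\)) and grows linearly with the degree.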
4.2 Addition Chains for a Faster Scalar Multiplication
Since the coefficients in CSIDH scalar multiplications are always known in advance (they are essentially system parameters), there is no need to hide them by using constant-time scalar multiplication algorithms such as the classical Montgomery ladder. Instead, we can use shorter differential addition chains.^{Footnote 5}
In the CSIDH group action computation, any given scalar k is a product of a subset of the 74 small primes \(\ell _i\) dividing \(\frac{p+1}{4}\). We can take advantage of this structure to use shorter differential addition chains than those we might derive for general scalars of comparable size. First, we precompute the shortest differential addition chain for each of the small primes \(\ell _i\). One then computes the scalar multiplication [k]P as the composition of the differential addition chains of the primes \(\ell \) dividing k.
Power analysis on the coefficient computation might reveal the degree of the isogeny that is currently being computed, but, since we compute exactly one \(\ell _i\)isogeny for each \(\ell _i\) per loop, this does not leak any secret information.
This simple trick allows us to compute scalar multiplications [k]P using differential addition chains of length roughly \(1.5\lceil \log _2(k)\rceil \). This yields a saving of about 25% compared with the cost of the classical Montgomery ladder.
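To illustrate, a differential addition chain builds each new term as the sum of two earlier terms whose difference is also an earlier term, so that x-only differential addition applies at every step. The Python sketch below (our own) executes such a chain abstractly, using integers n as stand-ins for the multiples [n]P; the chain for \(\ell = 7\) is one valid example, not taken from the implementation's precomputed tables.

```python
def run_chain(steps):
    """Execute a differential addition chain.

    `steps` is a list of (i, j, k) index triples meaning: append
    chain[i] + chain[j], where chain[i] - chain[j] == chain[k] is the
    already-known difference required by x-only differential addition.
    The chain starts as [1, 2], i.e. P and [2]P (one doubling).
    """
    chain = [1, 2]
    for i, j, k in steps:
        assert chain[i] - chain[j] == chain[k], "difference must be in the chain"
        chain.append(chain[i] + chain[j])
    return chain[-1]

# A chain for 7: 1, 2, 3 (= 2 + 1, diff 1), 4 (= 3 + 1, diff 2), 7 (= 4 + 3, diff 1)
steps_for_7 = [(1, 0, 0), (2, 0, 1), (3, 2, 0)]
```

In the real implementation, the integer additions become xADD/xDBL calls on projective points.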
5 Removing Dummy Operations for Fault-Attack Resistance
The use of dummy operations in the previous constanttime algorithms implies that the attacker can obtain information on the secret key by injecting faults into variables during the computation. If the final result is correct, then she knows that the fault was injected in a dummy operation; if it is incorrect, then the operation was real. For example, if one of the values in Line 18 of Algorithm 2 is modified without affecting the final result, then the adversary learns whether the corresponding exponent \(e_i\) was zero at that point.
Fault-injection attacks have been considered in the context of SIDH ([15, 29]), but to the best of our knowledge, they have not yet been studied for dummy operations in the context of CSIDH. Below we propose an approach to constant-time CSIDH without dummy computations, making every computation essential for a correct final result. This gives us some natural resistance to faults, at the cost of an approximately twofold slowdown.
Our approach to avoiding fault-injection attacks is to change the format of the secret exponent vectors \((e_1,\dots ,e_n)\). In both the original CSIDH and the Onuki et al. variants, the exponents \(e_i\) are sampled from an integer interval \([-m_i,m_i]\) centered in 0. For naive CSIDH, evaluating the action of \(\mathfrak {l}_i^{e_i}\) requires evaluating between \(0\) and \(m_i\) isogenies, corresponding to either the ideal \(\mathfrak {l}_i\) (for positive \(e_i\)) or \(\mathfrak {l}_i^{-1}\) (for negative \(e_i\)). If we follow the approach of [26], then we must also compute \(m_i - |e_i|\) dummy \(\ell _i\)-isogenies to ensure constant-time behaviour.
For our new algorithm, the exponents \(e_i\) are uniformly sampled from the sets

\({{\,\mathrm{\mathcal {S}}\,}}(m) = \{-m, -m+2, \dots , m-2, m\},\)
i.e., centered intervals containing only even or only odd integers. The interesting property of these sets is that a vector drawn from \({{\,\mathrm{\mathcal {S}}\,}}(m)^n\) can always be rewritten (in a non-unique way) as a sum of m vectors with entries in \(\{-1,+1\}\) (i.e., vectors in \({{\,\mathrm{\mathcal {S}}\,}}(1)^n\)). But the action of a vector drawn from \({{\,\mathrm{\mathcal {S}}\,}}(1)^n\) can clearly be implemented in constant time without dummy operations: for each coefficient \(e_i\), we compute and evaluate the isogeny associated to \(\mathfrak {l}_i\) if \(e_i=1\), or the one associated to \(\mathfrak {l}_i^{-1}\) if \(e_i=-1\). Thus, we can compute the action of vectors drawn from \({{\,\mathrm{\mathcal {S}}\,}}(m)^n\) by repeating this step m times.
More generally, we want to evaluate the action of vectors \((e_1,\ldots ,e_n)\) drawn from \({{\,\mathrm{\mathcal {S}}\,}}(m_1)\times \cdots \times {{\,\mathrm{\mathcal {S}}\,}}(m_n)\). Algorithm 4 achieves this in constant time and without using dummy operations. The outer loop at line 3 is repeated exactly \(\max (m_i)\) times, but the inner “if” block at line 5 is only executed \(m_i\) times for each i; it is clear that this flow does not depend on secrets. Inside the “if” block, each coefficient \(e_i\) is implicitly interpreted as a sum of \(m_i\) terms equal to \(\pm 1\): the algorithm starts by acting by \(\mathfrak {l}_i^{\texttt {sign}(e_i)}\) for \(|e_i|\) iterations, then alternates between \(\mathfrak {l}_i\) and \(\mathfrak {l}_i^{-1}\) for the remaining \(m_i-|e_i|\) iterations. We assume that the \(\texttt {sign}:\mathbb {Z}\rightarrow \{\pm 1\}\) operation is implemented in constant time, and that \(\texttt {sign}(0)=1\). If one is careful to implement the isogeny evaluations in constant time, then it is clear that the full algorithm is also constant-time.
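This decomposition can be sketched directly (helper names are ours): any \(e \in {{\,\mathrm{\mathcal {S}}\,}}(m)\) splits into m signs, |e| of them equal to sign(e), followed by an alternating tail that sums to zero.

```python
def sign(e):
    """Sign convention from the text: sign(0) = 1."""
    return 1 if e >= 0 else -1

def decompose(e, m):
    """Write e in S(m) (|e| <= m, same parity as m) as a list of m signs in {-1, +1}."""
    assert abs(e) <= m and (e - m) % 2 == 0
    head = [sign(e)] * abs(e)                              # the net contribution
    tail = [1 if i % 2 == 0 else -1 for i in range(m - abs(e))]  # cancels pairwise
    return head + tail
```

Each entry of the list corresponds to one real \(\ell _i\)-isogeny step (towards \(\mathfrak {l}_i\) or \(\mathfrak {l}_i^{-1}\)); no step is a dummy.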
However, Algorithm 4 is only an idealized version of the CSIDH group action algorithm. Indeed, as in [21, 26], it may happen in some iterations that Elligator outputs points of order not divisible by \(\ell _i\), so that the action of \(\mathfrak {l}_i\) or \(\mathfrak {l}_i^{-1}\) cannot be computed in that iteration. In this case, we simply skip the iteration and retry later: this translates into the variable \(z_i\) not being decremented, so the total number of iterations may end up being larger than \(\max (m_i)\). Fortunately, if the input value u fed to Elligator is random, its output is uncorrelated to secret values,^{Footnote 6} and thus the fact that an iteration is skipped does not leak information on the secret. The resulting algorithm is summarized in Algorithm 5.
To maintain the security of standard CSIDH, the bounds \(m_i\) must be chosen so that the key space is at least as large. For example, the original implementation [6] samples secrets in \([-5,5]^{74}\), which gives a key space of size \(11^{74}\); hence, to get the same security we would need to sample secrets in \({{\,\mathrm{\mathcal {S}}\,}}(10)^{74}\). But a constant-time version of CSIDH à la Onuki et al. only needs to evaluate five isogeny steps per prime \(\ell _i\), whereas the present variant needs to evaluate ten. We thus expect an approximately twofold slowdown for this variant compared to Onuki et al., which is confirmed by our experiments.
6 Derandomized CSIDH Algorithms
As we stressed in Sect. 3, all of the algorithms presented here depend on the availability of high-quality randomness for their security. Indeed, the input to Elligator must be randomly chosen to ensure that the total running time is uncorrelated with the secret key. Typically, this implies the use of a PRNG seeded with high-quality true randomness that must be kept secret. An attack scenario where the attacker may know the output of the PRNG, or where the quality of the PRNG output is less than ideal, therefore degrades the security of all of these algorithms. This is true even when the secret was generated with a high-quality PRNG, if the key pair is static and the secret key is later used by an algorithm with low-quality randomness.
We can avoid this issue completely if points of order \(\prod \ell _i^{m_i}\), where \(m_i\) is the maximum possible exponent (in absolute value) for \(\ell _i\), are available from the start. Unfortunately this is not possible with standard CSIDH, because such points are defined over field extensions of exponential degree.
Instead, we suggest modifying CSIDH as follows. First, we take a prime \(p = 4 \prod _{i=1}^{n} \ell _i - 1\) such that \(\lceil n\log _2(3) \rceil = 2\lambda \), where \(\lambda \) is a security parameter, and we restrict the exponents of the private key to values sampled from \(\{-1, 0, 1\}\). Then, we compute two points of order \((p+1)/4\) on the starting public curve, one in \(\ker (\pi - 1)\) and the other in \(\ker (\pi + 1)\), where \(\pi \) is the Frobenius endomorphism. This computation involves no secret information and can be implemented in variable time; furthermore, if the starting curve is the initial curve with \(A=0\), or a public curve corresponding to a long-term secret key, these points can be precomputed offline and attached to the system parameters or the public key. We also remark that even for ephemeral public keys, a point of order \(p+1\) must be computed anyway for key validation purposes, and thus this computation only slows down key validation by a factor of two.
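The split of x-coordinates between \(\ker (\pi - 1)\) and \(\ker (\pi + 1)\) can be illustrated on a toy CSIDH-style prime (a sketch only: we classify x-coordinates with a Legendre-symbol test and count points; cofactor clearing by \([4]\) and the x-only Montgomery ladder are omitted):

```python
p = 4 * 3 * 5 * 7 - 1              # toy CSIDH prime p = 419, p = 3 (mod 4)

def legendre(u):
    """Euler's criterion: 1 if u is a nonzero square mod p, p-1 otherwise."""
    return pow(u, (p - 1) // 2, p)

# On the supersingular curve E : y^2 = x^3 + x over F_p, an x-coordinate
# with x^3 + x a nonzero square yields two F_p-rational points, i.e. points
# of ker(pi - 1); a non-square value means x belongs to the quadratic
# twist, whose points correspond to ker(pi + 1).  Counting the rational
# points confirms #E(F_p) = p + 1, as expected for a supersingular curve.
count = 1                          # the point at infinity
for x in range(p):
    u = (x ** 3 + x) % p
    if u == 0:
        count += 1                 # single point with y = 0
    elif legendre(u) == 1:
        count += 2                 # (x, y) and (x, -y) in ker(pi - 1)

assert count == p + 1              # trace of Frobenius is 0
```

In the real scheme one samples such x-coordinates, clears the cofactor 4, and keeps one point of each type of order \((p+1)/4\).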
Since we have restricted the exponents to \(\{-1, 0, 1\}\), every \(\ell _i\)-isogeny in Algorithm 2 can be computed using only (the images of) the two precomputed points. There is no possibility of failure in the test of Line 12, and no need to sample any other point.
We note that this algorithm still uses dummy operations. If fault-injection attacks are a concern, the exponents can be further restricted to \(\{-1,1\}\), and the group action evaluated as in (a stripped-down form of) Algorithm 5. However, this further increases the size of p, as n must now be equal to \(2\lambda \).
This protection comes at a steep price: at the 128-bit security level, the prime p grows from 511 bits to almost 1500. The resulting field arithmetic would be considerably slower, although the global running time would be slightly offset by the smaller number of isogenies to evaluate.
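The growth of \(p\) can be estimated with a short script (illustrative only: we take the first \(n\) odd primes, whereas concrete CSIDH parameter sets adjust the largest prime, so exact figures depend on the final parameter choice):

```python
from math import ceil, log2, prod

def first_odd_primes(n):
    """First n odd primes by trial division (fast enough at this size)."""
    primes, k = [], 3
    while len(primes) < n:
        if all(k % q for q in primes):  # k is odd, so odd divisors suffice
            primes.append(k)
        k += 2
    return primes

lam = 128                              # security parameter
n = ceil(2 * lam / log2(3))            # key space {-1,0,1}^n of size >= 2^(2*lam)
assert n == 162
p = 4 * prod(first_odd_primes(n)) - 1  # derandomized-CSIDH-style prime
assert p % 4 == 3                      # same shape as the standard CSIDH prime
assert p.bit_length() > 2 * 511        # far larger than the 511-bit prime
```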
On the positive side, the resulting system would have much stronger quantum security. Indeed, the best known quantum attacks are exponential in the size of the key space (\(\approx 2^{2\lambda }\) here), but only subexponential in p (see [6, 7, 13]). Since our modification more than doubles the size of p without changing the size of the key space, quantum security is automatically increased. For this same reason, for security levels beyond NIST-1 (64 quantum bits of security), the size of p increases more than linearly in \(\lambda \), and the variant proposed here becomes natural. Finally, parameter sets with a similar imbalance between the size of p and the security parameter \(\lambda \) have already been considered in the context of isogeny-based signatures [11], where they provide tight security proofs in the QROM.
Hence, while at the moment this costly modification of CSIDH may seem overkill, we believe further research is necessary to try to bridge the efficiency gap between it and the other side-channel-protected implementations of CSIDH.
7 Experimental Results
Tables 1 and 2 summarize our experimental results and compare our algorithms with those of [6, 21], and [26]. Table 1 compares algorithms in terms of elementary field operations, while Table 2 compares cycle counts of C implementations. All of our experiments were run on an Intel(R) Core(TM) i7-6700K CPU @ 4.00 GHz machine with 16 GB of RAM, with Turbo Boost disabled. The software environment was the Ubuntu 16.04 operating system and gcc version 5.5.
In all of the algorithms considered here (except the original [6]), the group action is evaluated using the SIMBA method (Splitting Isogeny computations into Multiple BAtches) proposed by Meyer, Campos, and Reith in [21]. Roughly speaking, SIMBA-\(m\)-\(k\) partitions the set of primes \(\ell _i\) into \(m\) disjoint subsets \(S_i\) (batches) of approximately the same size. SIMBA-\(m\)-\(k\) proceeds by computing isogenies for each batch \(S_i\); after \(k\) steps, the unreached primes \(\ell _i\) from each batch are merged.
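A minimal model of the batching step (illustrative; [21] describes the precise strategy, and real batches also track primes whose isogeny step failed):

```python
def simba_batches(indices, m):
    """Split the prime indices into m disjoint, near-equal batches
    (round-robin), in the spirit of the SIMBA-m-k partition."""
    return [indices[i::m] for i in range(m)]

batches = simba_batches(list(range(74)), 5)   # e.g. the SIMBA-5-11 setting
assert len(batches) == 5
# The batches are disjoint and together cover all 74 primes.
assert sorted(i for batch in batches for i in batch) == list(range(74))
```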
Castryck et al. We used the reference CSIDH implementation made available for download by the authors of [6]. None of our countermeasures or algorithmic improvements were applied.
Meyer–Campos–Reith. We used the software library freely available from the authors of [21]. This software batches isogenies using SIMBA-5-11. The improvements we describe in Sects. 3 and 4 were not applied.
Onuki et al. Unfortunately, the source code for the implementation in [26] was not freely available, so a direct comparison with our implementation was impossible. Table 1 includes the field operation counts for their unmodified algorithm (which, as noted in Sect. 3, is insecure) using SIMBA-3-8, and our estimates for a repaired version applying our fix from Sect. 3. We did not apply the optimizations of Sect. 4 here. (We do not replicate the cycle counts from [26] in Table 2, since they may have been obtained with Turbo Boost enabled, which would render any comparison invalid.)
Our Implementations. We implemented three constant-time CSIDH algorithms, using the standard primes with the exponent bounds \(m_i\) from [26, § 5.2].

- MCR-style. This is essentially our version of Meyer–Campos–Reith (with one torsion point and dummy operations, batching isogenies with SIMBA-5-11), but applying the techniques of Sects. 3 and 4.
- OAYT-style. This is essentially our version of Onuki et al. (using two torsion points and dummy operations, batching isogenies with SIMBA-3-8), but applying the techniques of Sects. 3 and 4.
- No-dummy. This is Algorithm 5 (with two torsion points and no dummy operations), batching isogenies using SIMBA-5-11.
In each case, the improvements and optimizations of Sects. 3 and 4 are applied, including projective Elligator, short differential addition chains, and twisted Edwards arithmetic and isogenies. Our software library is freely available from
https://github.com/JJChiDguez/csidh.
The field arithmetic is based on the Meyer–Campos–Reith software library [21]; since the underlying arithmetic is essentially identical, the performance comparisons below reflect differences in the CSIDH algorithms.
Results.
We see in Table 2 that the techniques we introduced in Sects. 3 and 4 produce substantial savings compared with the implementation of [21]. In particular, our OAYT-style implementation yields a 25% improvement over [21]. Since the implementations use the same underlying field arithmetic library, these improvements are entirely due to the techniques introduced in this paper. While our no-dummy variant is (unsurprisingly) slower, the performance penalty is not prohibitive: it is less than twice as slow as our fastest dummy-operation algorithm, and only 44% slower than [21].
8 Conclusion and Perspectives
We studied side-channel-protected implementations of the isogeny-based primitive CSIDH. Previous implementations failed to be constant-time because of some subtle mistakes. We fixed those problems, and proposed new improvements, to achieve the most efficient version of CSIDH to date that is protected against timing and simple power analysis attacks. All of our algorithms were implemented in C, and the source was made publicly available online.
We also studied the security of CSIDH in stronger attack scenarios. We proposed a protection against some fault-injection and timing attacks that comes at the cost of only a twofold slowdown. We also sketched an alternative version of CSIDH “for the paranoid”, with much stronger security guarantees; at the moment this version seems too costly for the security benefits, and more work is required to make it competitive with the original definition of CSIDH.
Notes
 1.
 2.
Presumably, Onuki et al. do the same, however their exposition is not clear on this point, and we do not have access to their code.
 3.
The Elligator optimization is described in § 5.3 of [21]. The unoptimized constant-time version described in Algorithm 2 therein is not affected by this problem.
 4.
We also found a branch on secret data in the code provided with [21] at https://zenon.cs.hs-rm.de/pqcrypto/faster-csidh, during the 3-isogeny computation, when computing \([\ell ]P = [(\ell -1)/2]P + [(\ell +1)/2]P\). This can be easily fixed by a conditional swap, without any significant impact on running time.
 5.
A differential addition chain is an addition chain such that for every chain element c computed as \(a + b\), the difference \(a - b\) is already present in the chain.
 6.
Under the usual heuristic assumptions on the distribution of the output of Elligator; see [21].
References
Bernstein, D.J., Birkner, P., Joye, M., Lange, T., Peters, C.: Twisted Edwards curves. In: Vaudenay, S. (ed.) AFRICACRYPT 2008. LNCS, vol. 5023, pp. 389–405. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-68164-9_26
Bernstein, D.J., Hamburg, M., Krasnova, A., Lange, T.: Elligator: elliptic-curve points indistinguishable from uniform random strings. In: 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS 2013, Berlin, Germany, 4–8 November 2013, pp. 967–980 (2013)
Bernstein, D.J., Lange, T., Martindale, C., Panny, L.: Quantum circuits for the CSIDH: optimizing quantum evaluation of isogenies. In: Ishai, Y., Rijmen, V. (eds.) EUROCRYPT 2019. LNCS, vol. 11477, pp. 409–441. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17656-3_15
Beullens, W., Kleinjung, T., Vercauteren, F.: CSI-FiSh: efficient isogeny-based signatures through class group computations. IACR Cryptology ePrint Archive, Report 2019/498 (2019)
Castryck, W., Galbraith, S.D., Farashahi, R.R.: Efficient arithmetic on elliptic curves using a mixed Edwards-Montgomery representation. Cryptology ePrint Archive, Report 2008/218 (2008)
Castryck, W., Lange, T., Martindale, C., Panny, L., Renes, J.: CSIDH: an efficient post-quantum commutative group action. In: Peyrin, T., Galbraith, S. (eds.) ASIACRYPT 2018. LNCS, vol. 11274, pp. 395–427. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03332-3_15
Childs, A.M., Jao, D., Soukharev, V.: Constructing elliptic curve isogenies in quantum subexponential time. J. Math. Cryptol. 8(1), 1–29 (2014)
Costello, C., Longa, P., Naehrig, M.: Efficient algorithms for supersingular isogeny Diffie-Hellman. In: Robshaw, M., Katz, J. (eds.) CRYPTO 2016. LNCS, vol. 9814, pp. 572–601. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53018-4_21
Costello, C., Smith, B.: Montgomery curves and their arithmetic – the case of large characteristic fields. J. Cryptogr. Eng. 8(3), 227–240 (2018)
Couveignes, J.M.: Hard homogeneous spaces. Cryptology ePrint Archive, Report 2006/291 (2006)
De Feo, L., Galbraith, S.D.: SeaSign: compact isogeny signatures from class group actions. Cryptology ePrint Archive, Report 2018/824 (2018)
De Feo, L., Jao, D., Plût, J.: Towards quantum-resistant cryptosystems from supersingular elliptic curve isogenies. J. Math. Cryptol. 8(3), 209–247 (2014)
De Feo, L., Kieffer, J., Smith, B.: Towards practical key exchange from ordinary isogeny graphs. In: Peyrin, T., Galbraith, S. (eds.) ASIACRYPT 2018. LNCS, vol. 11274, pp. 365–394. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03332-3_14
Decru, T., Panny, L., Vercauteren, F.: Faster SeaSign signatures through improved rejection sampling. In: Ding, J., Steinwandt, R. (eds.) PQCrypto 2019. LNCS, vol. 11505, pp. 271–285. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25510-7_15
Gélin, A., Wesolowski, B.: Loop-abort faults on supersingular isogeny cryptosystems. In: Lange, T., Takagi, T. (eds.) PQCrypto 2017. LNCS, vol. 10346, pp. 93–106. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59879-6_6
Jalali, A., Azarderakhsh, R., Kermani, M.M., Jao, D.: Towards optimized and constant-time CSIDH on embedded devices. In: Polian, I., Stöttinger, M. (eds.) COSADE 2019. LNCS, vol. 11421, pp. 215–231. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-16350-1_12
Jao, D., De Feo, L.: Towards quantum-resistant cryptosystems from supersingular elliptic curve isogenies. In: Yang, B.-Y. (ed.) PQCrypto 2011. LNCS, vol. 7071, pp. 19–34. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25405-5_2
Kim, S., Yoon, K., Kwon, J., Hong, S., Park, Y.-H.: Efficient isogeny computations on twisted Edwards curves. Secur. Commun. Netw. 2018, 11 (2018)
Kim, S., Yoon, K., Kwon, J., Park, Y.-H., Hong, S.: New hybrid method for isogeny-based cryptosystems using Edwards curves. Cryptology ePrint Archive, Report 2018/1215 (2018). https://eprint.iacr.org/2018/1215
Kim, S., Yoon, K., Kwon, J., Park, Y.-H., Hong, S.: Optimized method for computing odd-degree isogenies on Edwards curves. Cryptology ePrint Archive, Report 2019/110 (2019). https://eprint.iacr.org/2019/110
Meyer, M., Campos, F., Reith, S.: On Lions and Elligators: an efficient constant-time implementation of CSIDH. In: Ding, J., Steinwandt, R. (eds.) PQCrypto 2019. LNCS, vol. 11505, pp. 307–325. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25510-7_17
Meyer, M., Reith, S.: A faster way to the CSIDH. In: Chakraborty, D., Iwata, T. (eds.) INDOCRYPT 2018. LNCS, vol. 11356, pp. 137–152. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-05378-9_8
Meyer, M., Reith, S., Campos, F.: On hybrid SIDH schemes using Edwards and Montgomery curve arithmetic. Cryptology ePrint Archive 2017/1213 (2017)
Montgomery, P.L.: Speeding the Pollard and elliptic curve methods of factorization. Math. Comput. 48, 243–264 (1987)
Moody, D., Shumow, D.: Analogues of Vélu’s formulas for isogenies on alternate models of elliptic curves. Math. Comput. 85(300), 1929–1951 (2016)
Onuki, H., Aikawa, Y., Yamazaki, T., Takagi, T.: A faster constant-time algorithm of CSIDH keeping two torsion points. In: IWSEC 2019 – The 14th International Workshop on Security (2019, to appear)
Rostovtsev, A., Stolbunov, A.: Public-key cryptosystem based on isogenies. Cryptology ePrint Archive, Report 2006/145 (2006)
Stolbunov, A.: Constructing publickey cryptographic schemes based on class group action on a set of isogenous elliptic curves. Adv. Math. Commun. 4(2), 215–235 (2010)
Ti, Y.B.: Fault attack on supersingular isogeny cryptosystems. In: Lange, T., Takagi, T. (eds.) PQCrypto 2017. LNCS, vol. 10346, pp. 107–122. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59879-6_7
Vélu, J.: Isogénies entre courbes elliptiques. Comptes-rendus de l’Académie des Sciences de Paris (1971)
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Cervantes-Vázquez, D., Chenu, M., Chi-Domínguez, J.-J., De Feo, L., Rodríguez-Henríquez, F., Smith, B. (2019). Stronger and Faster Side-Channel Protections for CSIDH. In: Schwabe, P., Thériault, N. (eds) Progress in Cryptology – LATINCRYPT 2019. Lecture Notes in Computer Science, vol 11774. Springer, Cham. https://doi.org/10.1007/978-3-030-30530-7_9
Print ISBN: 978-3-030-30529-1
Online ISBN: 978-3-030-30530-7