1 Introduction

The Gallant–Lambert–Vanstone (GLV) method is a generic approach to speed up the computation of scalar multiplication on some elliptic curves defined over fields of large prime characteristic. Given a curve with a point P of prime order n, it consists essentially of an algorithm that finds a decomposition of an arbitrary scalar multiplication kP, for k∈[1,n], into two scalar multiplications, where the new scalars have only about half the bitlength of the original scalar. This immediately enables the elimination of half the doublings by employing the Straus–Shamir trick for simultaneous point multiplication, as sketched below.
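To make the trick concrete, the following minimal sketch (ours, not from the paper) interleaves the two scalar multiplications with a single shared chain of doublings; the group operations are abstracted, and a toy additive group modulo n stands in for the curve group so that the fragment is runnable.

```python
def straus_shamir(k1, k2, P, Q, add, dbl, zero):
    """Compute k1*P + k2*Q scanning both scalars simultaneously:
    one doubling per bit position, shared by both scalars."""
    PQ = add(P, Q)                      # precompute P + Q once
    R = zero
    for i in range(max(k1.bit_length(), k2.bit_length()) - 1, -1, -1):
        R = dbl(R)
        b1, b2 = (k1 >> i) & 1, (k2 >> i) & 1
        if b1 and b2:
            R = add(R, PQ)
        elif b1:
            R = add(R, P)
        elif b2:
            R = add(R, Q)
    return R

# Toy check in the additive group Z/1009 (a stand-in for <P> on the curve):
n = 1009
add, dbl = (lambda a, b: (a + b) % n), (lambda a: 2 * a % n)
P, Q = 5, 7            # stand-ins for P and Phi(P)
k1, k2 = 123, 456      # half-length scalars produced by the decomposition
assert straus_shamir(k1, k2, P, Q, add, dbl, 0) == (k1 * P + k2 * Q) % n
```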

Whereas the original GLV method as defined in [13] works on curves over \(\mathbb{F}_{p}\) with an endomorphism of small degree (GLV curves), Galbraith–Lin–Scott (GLS) in [11] have shown that over \(\mathbb{F}_{p^{2}}\) one can expect to find many more such curves, essentially by exploiting the action of the Frobenius endomorphism. One can therefore expect that, on GLV curves in particular, this new insight will lead to improvements over \(\mathbb{F}_{p^{2}}\). Indeed, the GLS article itself considers four-dimensional decompositions on GLV curves with nontrivial automorphisms (corresponding to the degree one cases) but leaves the other cases open to investigation.

In this work, we generalize the GLS method to all GLV curves by exploiting fast endomorphisms Φ,Ψ over \(\mathbb{F}_{p^{2}}\) acting on a cyclic group generated by a point P of prime order n to construct a proven decomposition, with no heuristics involved, for any scalar k∈[1,n]

$$kP=k_1P+ k_2\varPhi(P)+ k_3\varPsi(P) + k_4\varPsi\varPhi(P)\quad \text{with}\ \max_i \bigl(|k_i| \bigr)< C n^{1/4} $$

for some explicitly computable C. In doing this we provide a reduction algorithm for the relevant four-dimensional lattice which runs in \(O(\log^2 n)\) by implementing two Cornacchia-type algorithms [9, 25], one in ℤ, the other in ℤ[i]. The algorithm is remarkably simple to implement and allows us to demonstrate an improved \(C=O(\sqrt{s})\) (compared to the value obtained with LLL, which is only \(\varOmega(s^{3/2})\)). Thus, it guarantees a relative speedup practically independent of the curve when moving from a two-dimensional to a four-dimensional GLV method over the same underlying field. If parallel computation is available, then the computation of kP can possibly be implemented (close to) four times faster in this case. When moving from two-dimensional GLV over \(\mathbb {F}_{p}\) to the four-dimensional case over \(\mathbb{F}_{p^{2}}\), our method still guarantees a relative speedup that is quasi-uniform among all GLV curves (see Sect. 8 for details). In fact, we present experimental results on different GLV curves that demonstrate that the relative speedup between the original GLV method and the proposed method (termed GLV–GLS in the remainder) is as high as 1.5 times.

Twisted Edwards curves [2] are efficient generalizations of the popular Edwards curves [10], which exhibit high-performance arithmetic. By exploiting this curve model, Galbraith, Lin, and Scott [12] showed that the GLS method can be improved in practice by a further 10 %, approximately. Similar findings were later reported by Longa and Gebotys [23] (see also Longa [22]). Galbraith et al. also described how to write down j-invariant 0 and 1728 curves in Edwards form to combine a four-dimensional decomposition with the fast arithmetic provided by this curve model. We exploit this approach and, most remarkably, lift the restriction to those special curves, showing that in practice the GLV–GLS curves discussed in this work may achieve extremely high performance and become virtually equivalent in terms of speed when written in Twisted Edwards form.

In recent years, multiple works have incrementally shown the impact of using the GLS method for high performance [11, 16, 23]. However, it is still unclear how well the method behaves in settings where side-channel attacks are a threat. Since it is usually assumed that the required countermeasures, once in place, degrade performance significantly, it is also unclear whether the GLS method would retain its current superiority in the case of side-channel protected implementations. Here, we study this open problem and describe how to protect implementations based on the GLV–GLS method against timing attacks, cache attacks, and similar attacks while still achieving very high performance. The techniques discussed naturally apply to GLV-based implementations in general. Finally, we discuss different strategies to implement GLV-based scalar multiplication on modern multicore processors, and include the case in which countermeasures against side-channel attacks are required.

The presented implementations corresponding to the GLV–GLS method improve the state-of-the-art performance of point multiplication for all the cases under study: protected and unprotected versions with sequential and parallel execution. For instance, on one core of an Intel Core i7-2600 processor and at roughly 128 bits of security, we compute an unprotected scalar multiplication in only 91,000 cycles (which is 1.34 times faster than a previous result reported by Hu, Longa, and Xu [16]) and a side-channel protected scalar multiplication in only 137,000 cycles (which is 1.42 times faster than the protected implementation presented by Bernstein et al. [3]).

Related Work

Recently, a paper by Zhou, Hu, Xu, and Song [32] has shown that it is possible to combine the GLV and GLS approaches by introducing a three-dimensional version of the GLV method, which seems to work to a certain degree, albeit with no justification other than practical implementations. The first author, together with Hu and Xu [16], studied the case of curves with j-invariant 0 and provided a bound for this particular case. Our analysis supplements [16] by considering all GLV curves and providing a unified treatment.

2 The GLV Method

In this section we briefly summarize the GLV method following [29]. Let E be an elliptic curve defined over a finite field \(\mathbb{F}_{q}\), and P be a point on this curve with prime order n such that the cofactor \(h=\#E(\mathbb{F}_{q})/n\) is small, say h≤4. Let us consider a nontrivial endomorphism Φ defined over \(\mathbb{F}_{q}\) and its characteristic polynomial \(X^2+rX+s\). In all the examples, r and s are actually small fixed integers, and q is varying in some family. By hypothesis there is only one subgroup of order n in \(E(\mathbb {F}_{q})\), implying that Φ(P)=λP for some λ∈[0,n−1], since Φ(P) has order dividing the prime n. In particular, λ is obtained as a root of \(X^2+rX+s\) modulo n.

Define the group homomorphism (the GLV reduction map)

$$\mathfrak{f}\colon \mathbb{Z}\times \mathbb{Z}\longrightarrow \mathbb{Z}/n, \qquad (x,y)\longmapsto x+\lambda y \ (\mathrm{mod}\ n). $$
Let \(\mathcal {K}=\ker \mathfrak {f}\). It is a sublattice of ℤ×ℤ of rank 2 since the quotient is finite. Let \(\Bbbk>0\) be a constant (depending on the curve) such that we can find two linearly independent vectors \(v_1, v_2\) of \(\mathcal {K}\) satisfying \(\max\{\vert v_{1}\vert _{\infty}, \vert v_{2}\vert _{\infty}\}< \Bbbk\sqrt{n}\), where \(\vert \cdot \vert _{\infty}\) denotes the rectangle norm. Express

$$(k,0)= \beta_1v_1 + \beta_2v_2, $$

where \(\beta_i \in \mathbb{Q}\). Then round \(\beta_i\) to the nearest integer \(b_i=\lfloor \beta_i \rceil = \lfloor \beta_i +1/2 \rfloor\) and let \(v=b_1 v_1+b_2 v_2\). Note that \(v\in \mathcal {K}\) and that \(u\overset{\text{def}}{=} (k,0)-v\) is short. Indeed, by the triangle inequality we have that

$$\vert \vphantom{v_1}u\vert _\infty\leq \frac{\vert v_1\vert _\infty + \vert v_2 \vert _\infty}{2} <\Bbbk\sqrt{n}. $$

If we set \((k_1,k_2)=u\), then we get \(k\equiv k_1+k_2 \lambda \ (\mathrm{mod}\ n)\), or equivalently \(kP=k_1 P+k_2 \varPhi(P)\), with \(\max (|k_{1}|,|k_{2}|)<\Bbbk\sqrt{n}\).
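As a concrete illustration (our sketch, with hypothetical toy parameters rather than a real curve), the following fragment carries out this decomposition with exact rational arithmetic:

```python
from fractions import Fraction
from math import floor

def glv_decompose(k, n, v1, v2):
    """Babai rounding: write (k, 0) = beta1*v1 + beta2*v2 over Q, round the
    betas, and return u = (k, 0) - b1*v1 - b2*v2 = (k1, k2)."""
    det = v1[0] * v2[1] - v1[1] * v2[0]    # = +-[Z^2 : K] = +-n for a basis of K
    beta1 = Fraction(k * v2[1], det)       # Cramer's rule for (k, 0)
    beta2 = Fraction(-k * v1[1], det)
    b1 = floor(beta1 + Fraction(1, 2))
    b2 = floor(beta2 + Fraction(1, 2))
    return k - b1 * v1[0] - b2 * v2[0], -b1 * v1[1] - b2 * v2[1]

# Toy parameters: n = 1009 = 15^2 + 28^2, and lam = 540 satisfies
# lam^2 = -1 (mod n); (15, 28) and (-28, 15) form a short basis of ker(f).
n, lam = 1009, 540
v1, v2 = (15, 28), (-28, 15)
assert all((v[0] + v[1] * lam) % n == 0 for v in (v1, v2))  # both in ker(f)
k = 777
k1, k2 = glv_decompose(k, n, v1, v2)
assert (k1 + k2 * lam - k) % n == 0 and max(abs(k1), abs(k2)) ** 2 < n
```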

In [29], the optimal value of \(\Bbbk\) (with respect to large values of n, i.e., large fields, keeping \(X^2+rX+s\) constant) is determined. Let \(\varDelta=r^2-4s\) be the discriminant of the characteristic polynomial of Φ. Then the optimal \(\Bbbk\) is given by the following result.

Theorem 1

[29, Theorem 4]

Assuming that n is the norm of an element of ℤ[Φ], the optimal value of \(\Bbbk\) is

$$\Bbbk= \begin{cases} \frac{\sqrt{s}}{2} (1+\frac{1}{|\varDelta |} ) &\text{\textit{if} $r$ \textit{is odd,}}\\[4pt] \frac{\sqrt{s}}{2} \sqrt{1+\frac{4}{|\varDelta |}} &\text{\textit{if} $r$ \textit{is even.}} \end{cases} $$

3 The GLS Improvement

In 2009, Galbraith, Lin, and Scott [11] realized that we do not need to have \(\varPhi^2+r\varPhi+s=0\) in \(\operatorname{End}(E)\) but only in a subgroup of \(E(\mathbb{F})\) for a specific finite field \(\mathbb{F}\). In particular, considering \(\varPsi=\operatorname{Frob}_{p}\), the p-power Frobenius endomorphism of a curve E defined over \(\mathbb{F}_{p}\), we know that \(\varPsi^m(P)=P\) for all \(P\in E(\mathbb{F}_{p^{m}})\). While this tells nothing useful if m=1,2, it does offer new nontrivial relations for higher-degree extensions. The case m=4 is particularly useful here.

In this case, if \(P\in E(\mathbb{F}_{p^{4}}) \backslash E(\mathbb {F}_{p^{2}})\), then \(\varPsi^2(P)=-P\), and hence on the subgroup generated by P, Ψ satisfies the equation \(X^2+1=0\). This implies that if Ψ(P) is a multiple of P (which happens as soon as the order n of P is sufficiently large, say at least 2p), we can apply the previous GLV construction and split again a scalar multiplication as \(kP=k_1 P+k_2 \varPsi(P)\) with \(\max(|k_{1}|,|k_{2}|) = O(\sqrt{n})\). Contrast this with the characteristic polynomial of Ψ, which is \(X^2-a_p X+p\) for some integer \(a_p\), a nonconstant polynomial to which we cannot apply the GLV paradigm efficiently.

For efficiency reasons, however, one does not work with \(E/\mathbb{F}_{p^{4}}\) directly but with \(E'/\mathbb{F}_{p^{2}}\), isomorphic to E over \(\mathbb{F}_{p^{4}}\) but not over \(\mathbb{F}_{p^{2}}\), that is, a quadratic twist over \(\mathbb{F}_{p^{2}}\). In this case, it is possible for \(\#E'(\mathbb{F}_{p^{2}})=n\geq(p-1)^{2}\) to be prime. Furthermore, if ψ:E′→E is an isomorphism defined over \(\mathbb{F}_{p^{4}}\), then the endomorphism \(\varPsi= \psi \operatorname{Frob}_{p} \psi^{-1} \in \operatorname{End}(E')\) satisfies the equation \(X^2+1=0\), and if \(p\equiv 5 \ (\mathrm{mod}\ 8)\), it can be defined over \(\mathbb{F}_{p}\).

This idea is at the heart of the GLS approach, but it only works for curves over \(\mathbb{F}_{p^{m}}\) with m>1, and therefore it does not generalize the original GLV method but rather complements it.

4 Combining GLV and GLS

Let \(E/\mathbb{F}_{p}\) be a GLV curve. As in Sect. 3, we will denote by \(E'/\mathbb{F}_{p^{2}}\) a quadratic twist \(\mathbb {F}_{p^{4}}\)-isomorphic to E via the isomorphism ψ:E→E′. We also suppose that \(\# E'(\mathbb{F}_{p^{2}}) = nh\), where n is prime and h≤4. We then have the two endomorphisms of E′, \(\varPsi= \psi \operatorname{Frob}_{p} \psi^{-1}\) and \(\varPhi=\psi \phi \psi^{-1}\), with ϕ the GLV endomorphism coming with the definition of a GLV curve. They are both defined over \(\mathbb {F}_{p^{2}}\), since if σ is the nontrivial Galois automorphism of \(\mathbb {F}_{p^{4}}/ \mathbb {F}_{p^{2}}\), then \(\psi^{\sigma}=-\psi\), so that \(\varPsi^{\sigma}= \psi^{\sigma} \operatorname{Frob}_{p}^{\sigma}(\psi^{-1} )^{\sigma}= (-\psi)\operatorname{Frob}_{p}(-\psi^{-1}) = \varPsi\), meaning that \(\varPsi\in \operatorname{End}_{ \mathbb {F}_{p^{2}}}(E')\). Similarly for Φ, where we are using the fact that \(\phi\in \operatorname{End}_{ \mathbb {F}_{p}}(E)\). Notice that \(\varPsi^2+1=0\) and that Φ has the same characteristic polynomial as ϕ. Furthermore, since we have a large subgroup \(\langle P \rangle\subset E'(\mathbb{F}_{p^{2}})\) of prime order, Φ(P)=λP and Ψ(P)=μP for some λ,μ∈[1,n−1]. We will assume that Φ and Ψ, when viewed as algebraic integers, generate disjoint quadratic extensions of ℚ. In particular, we are not dealing with Example 1 from Appendix A, but this case can be treated separately with a quartic twist as described in Appendix B.

Consider the biquadratic (Galois of degree 4, with Galois group ℤ/2×ℤ/2) number field K=ℚ(Φ,Ψ). Let \(\mathfrak{o}_{K}\) be its ring of integers. The following analysis is inspired by [29, Sect. 8].

We have \(\mathbb{Z}[\varPhi, \varPsi] \subseteq\mathfrak{o}_{K}\). Since the degrees of Φ and Ψ are much smaller than n, the prime n is unramified in K, and the existence of λ and μ above means that n splits in ℚ(Φ) and ℚ(Ψ), namely that n splits completely in K. There exists therefore a prime ideal \(\mathfrak{n}\) of \(\mathfrak{o}_{K}\) dividing \(n\mathfrak {o}_{K}\), such that its norm is n. We can also suppose that \(\varPhi \equiv \lambda\pmod{\mathfrak{n}}\) and \(\varPsi\equiv\mu\pmod{\mathfrak{n}}\). The four-dimensional GLV–GLS method works as follows.

Consider the GLV–GLS reduction map F defined by

$$F\colon \mathbb{Z}^4\longrightarrow \mathbb{Z}/n, \qquad (x_1,x_2,x_3,x_4)\longmapsto x_1+x_2\lambda+x_3\mu +x_4\lambda\mu \ (\mathrm{mod}\ n). $$
If we can find four linearly independent vectors \(v_1,\dots,v_4\in\ker F\) with \(\max_i \vert v_i\vert_{\infty}\leq C n^{1/4}\) for some constant C>0, then for any k∈[1,n−1], we write

$$(k,0,0,0) = \sum_{j=1}^4 \beta_j v_j $$

with \(\beta_j \in \mathbb{Q}\). As in the GLV method, one performs a Babai rounding to obtain the closest lattice vector \(v= \sum_{j=1}^{4} \lfloor \beta_{j} \rceil v_{j}\) and defines

$$u = (k,0,0,0)-v = (k_1, k_2, k_3, k_4) . $$

We then get

$$ kP=k_1P+ k_2\varPhi(P)+ k_3 \varPsi(P) + k_4\varPsi\varPhi(P)\quad \text{with } \max_i \bigl(|k_i|\bigr)\leq2C n^{1/4} . $$
(1)

We next focus on the study of \(\ker F\) in order to find a reduced basis \(v_1,v_2,v_3,v_4\) with an explicit C. We can factor the GLV–GLS map F as

$$\mathbb{Z}^4 \overset{f}{\longrightarrow} \mathbb{Z}[\varPhi,\varPsi] \longrightarrow \mathbb{Z}/n, \qquad f(x_1,x_2,x_3,x_4)= x_1+x_2\varPhi+x_3\varPsi+x_4\varPhi\varPsi. $$
Notice that the kernel of the second map (reduction mod \(\mathfrak{n}\cap \mathbb {Z}[\varPhi,\varPsi]\)) is exactly \(\mathfrak{n}\cap \mathbb {Z}[\varPhi,\varPsi]\). This can be seen as follows. The reduction map factors as

$$\mathbb{Z}[\varPhi,\varPsi] \longrightarrow\mathfrak{o}_K \longrightarrow \mathfrak{o}_K / \mathfrak{n} \cong\mathbb{Z}/n, $$

where the first arrow is inclusion, and the second is reduction mod \(\mathfrak{n}\), corresponding to reducing the \(x_i\)'s mod \(\mathfrak {n}\cap\mathbb{Z}= n\mathbb{Z}\) and using \(\varPhi\equiv\lambda, \varPsi \equiv\mu\pmod{\mathfrak{n}}\). But the kernel of this map consists precisely of the elements of ℤ[Φ,Ψ] which are in \(\mathfrak {n}\), and that is what we want.

Moreover, since the reduction map is surjective, we obtain an isomorphism \(\mathbb {Z}[\varPhi,\varPsi]/\mathfrak{n}\cap \mathbb {Z}[\varPhi ,\varPsi] \cong\mathbb {Z}/n\), which says that the index of \(\mathfrak{n}\cap \mathbb {Z}[\varPhi ,\varPsi]\) inside ℤ[Φ,Ψ] is n. Since the first map f is an isomorphism, we get that \(\ker F = f^{-1} (\mathfrak{n}\cap \mathbb {Z}[\varPhi,\varPsi])\) and that \(\ker F\) has index \([\mathbb{Z}^4 : \ker F]=n\) inside \(\mathbb{Z}^4\).

We can also produce a basis of \(\ker F\) by the following observation. Let \(\varPhi'=\varPhi-\lambda\), \(\varPsi'=\varPsi-\mu\), and hence \(\varPhi'\varPsi'=\varPhi\varPsi-\lambda\varPsi-\mu\varPhi+\lambda\mu\). In matrix form,

$$\begin{pmatrix} 1 \\ \varPhi' \\ \varPsi' \\ \varPhi'\varPsi' \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -\lambda & 1 & 0 & 0 \\ -\mu & 0 & 1 & 0 \\ \lambda\mu & -\mu & -\lambda & 1 \end{pmatrix} \begin{pmatrix} 1 \\ \varPhi \\ \varPsi \\ \varPhi\varPsi \end{pmatrix}. $$
Since the determinant of the square matrix is 1, we deduce that ℤ[Φ,Ψ]=ℤ[Φ′,Ψ′]. But in this new basis, we claim that

$$\mathfrak{n}\cap \mathbb {Z}\bigl[\varPhi',\varPsi'\bigr] = n \mathbb{Z} + \mathbb {Z}\varPhi' + \mathbb{Z}\varPsi' + \mathbb{Z}\varPhi'\varPsi' . $$

Indeed, the reverse inclusion (⊇) is easy: \(\varPhi',\varPsi', \varPhi'\varPsi' \in\mathfrak{n}\), and so is n, because \(\mathfrak{n}\) dividing \(n\mathfrak{o}_{K}\) is equivalent to \(\mathfrak{n} \supseteq n\mathfrak {o}_{K}\). On the other hand, the index of both sides in ℤ[Φ′,Ψ′] is n, which can only happen, once one inclusion is proved, if the two sides are equal. Using the isomorphism f, we see that a basis of \(\ker F\subset\mathbb{Z}^4\) is therefore given by

$$w_1= (n,0,0,0), w_2= (-\lambda, 1 ,0,0), w_3 = (-\mu, 0, 1, 0), w_4 = ( \lambda\mu, -\mu, - \lambda, 1) . $$
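A small sketch (ours, with hypothetical toy parameters) that builds this basis and checks that each \(w_j\) is killed by \(F(x)=x_1+x_2\lambda+x_3\mu+x_4\lambda\mu \bmod n\):

```python
def kernel_basis(n, lam, mu):
    """The basis w1, ..., w4 of ker F read off above."""
    return [(n, 0, 0, 0),
            (-lam, 1, 0, 0),
            (-mu, 0, 1, 0),
            (lam * mu, -mu, -lam, 1)]

def F(x, n, lam, mu):
    return (x[0] + x[1] * lam + x[2] * mu + x[3] * lam * mu) % n

# Hypothetical toy parameters; any lam, mu in [1, n-1] make the check pass,
# since membership in ker F is an algebraic identity in lam and mu.
n, lam, mu = 1009, 123, 540
assert all(F(w, n, lam, mu) == 0 for w in kernel_basis(n, lam, mu))
# The matrix of the w_j is triangular with diagonal (n, 1, 1, 1), so the
# sublattice they span indeed has index n in Z^4, as required.
```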

The LLL algorithm [20] then finds, for a given basis \(w_1,\dots,w_4\) of \(\ker F\), a reduced basis \(v_1,\dots,v_4\) in polynomial time (in the logarithm of the norm of the \(w_i\)'s) such that (cf. [8, Theorem 2.6.2, p. 85])

$$ \prod_{i=1}^4 |v_i|_\infty \leq8 \bigl[\mathbb{Z}^4 \colon \ker F\bigr] = 8n. $$
(2)

Lemma 1

Let Φ and Ψ be as defined at the beginning of this section, and let

$$\mathcal{N}(x_1,x_2,x_3,x_4)= \sum_{\substack{i_1,i_2,i_3,i_4\\ i_1+i_2+i_3+i_4=4}} b_{i_1,i_2,i_3,i_4}\, x_1^{i_1} x_2^{i_2} x_3^{i_3} x_4^{i_4} $$

be the norm of an element \(x_1+x_2 \varPhi+x_3 \varPsi+x_4 \varPhi\varPsi\in\mathbb{Z}[\varPhi,\varPsi]\), where the \(b_{i_{1},i_{2},i_{3},i_{4}}\)'s lie in ℤ. Then, for any nonzero \(v\in\ker F\), one has

$$ |v|_\infty\geq \frac{n^{1/4}}{ (\sum_{\substack{i_1,i_2,i_3,i_4\\ i_1+i_2+i_3+i_4=4}} |b_{i_1,i_2,i_3,i_4}| )^{1/4}} . $$
(3)

Proof

For \(v\in\ker F\), we have \(\mathcal{N}(v)\equiv 0 \ (\mathrm{mod}\ n)\), and if v≠0, we must therefore have \(|\mathcal{N}(v)|\geq n\). On the other hand, if we did not have (3), then every component of v would be strictly less than the right-hand side, and plugging this upper bound into the definition of \(\mathcal{N}\) would yield a quantity <n, a contradiction. □

Let B be the denominator of the right-hand side of (3). Then (2) and (3) imply that

$$ |v_i|_\infty\leq8B^{3} n^{1/4}, \quad i=1,2,3,4 . $$
(4)

Remark 1

In our case, where \(\varPsi^2+1=0\) and \(\varPhi^2+r\varPhi+s=0\), we get as norm function

$$\mathcal{N}(x_1,x_2,x_3,x_4)= \bigl(x_1^2-x_3^2-r(x_1x_2-x_3x_4)+s\bigl(x_2^2-x_4^2\bigr)\bigr)^2 + \bigl(2x_1x_3-r(x_1x_4+x_2x_3)+2sx_2x_4\bigr)^2, $$

and therefore,

$$ B= \bigl(4+4s^2 + 8s + 8|r| + 8 |r| s + 2 \bigl(r^2+2s\bigr) + 2 |r^2-2s| \bigr)^{1/4} . $$
(5)

From (1) and (4) we have proved the following theorem.

Theorem 2

Let \(E/\mathbb{F}_{p}\) be a GLV curve, and \(E'/\mathbb{F}_{p^{2}}\) a twist, together with the two efficient endomorphisms Φ and Ψ, where everything is defined as at the start of this section. Suppose that the minimal polynomial of Φ is \(X^2+rX+s\). Let \(P\in E'(\mathbb{F}_{p^{2}})\) be a generator of the large subgroup of prime order n. There exists an efficient algorithm which for any k∈[1,n] finds integers \(k_1,k_2,k_3,k_4\) such that

$$kP=k_1P+ k_2\varPhi(P)+ k_3\varPsi(P) + k_4\varPsi\varPhi(P)\quad \text{\textit{with} } \max_i \bigl(|k_i|\bigr)\leq16 B^3 n^{1/4} $$

and

$$B= \bigl(4+4s^2 + 8s + 8|r| + 8 |r| s + 2 \bigl(r^2+2s \bigr) + 2 \bigl|r^2-2s\bigr| \bigr)^{1/4} . $$
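As a quick numerical check (ours), for the family with \(\Phi^2+2=0\) (\(r=0\), \(s=2\), the case of \(E'_3\) in Sect. 8) the formula gives \(B=52^{1/4}\approx 2.69\), so the constant \(16B^3\) is about 310:

```python
def theorem2_constant(r, s):
    """The constant 16*B^3 of Theorem 2 for Phi^2 + r*Phi + s = 0, Psi^2 + 1 = 0."""
    B4 = (4 + 4 * s**2 + 8 * s + 8 * abs(r) + 8 * abs(r) * s
          + 2 * (r**2 + 2 * s) + 2 * abs(r**2 - 2 * s))
    return 16 * B4 ** 0.75

print(theorem2_constant(0, 2))   # ~309.8 for Phi^2 + 2 = 0 (curve E'_3)
print(theorem2_constant(1, 1))   # ~254.4 for Phi^2 + Phi + 1 = 0 (j = 0 curves)
```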

5 Uniform Improvements and a Tale of Two Cornacchia Algorithms

The previous analysis is only the first step of our work. It shows that the GLV–GLS method works as predicted in a four-way decomposition on twists of GLV curves over \(\mathbb{F}_{p^{2}}\). However, the constant \(B^3\) involved is rather large and, hence, does not guarantee a non-negligible gain when switching from two to four dimensions (especially on those GLV curves with more complicated endomorphism rings). A much deeper argument allows us to prove the following result.

Theorem 3

When performing an optimal lattice reduction on \(\ker F\), it is possible to decompose any k∈[1,n] into integers \(k_1,k_2,k_3,k_4\) such that

$$kP=k_1P+ k_2\varPhi(P)+ k_3\varPsi(P) + k_4\varPsi\varPhi(P) $$

with \(\max_{i} (|k_{i}|) < 103 (\sqrt{1+|r|+s}) \, n^{1/4}\).

The significance of this theorem lies in the improvement of the constant \(16B^3\), which is \(\varOmega(s^{3/2})\) in Theorem 2, to a value that is only an absolute constant times the minimal bound for the two-dimensional GLV method (Theorem 1). Hence, this guarantees in practice a more uniform improvement when switching from two-dimensional to four-dimensional GLV, independently of the curve.

To prove Theorem 3, first note that Lemma 1 gives a rather poor bound when applied to more than one vector, as is done three times for the proof of Theorem 2. A more direct treatment of the reduced vectors of kerF becomes necessary, and this is done via a modification of the original GLV approach. This results in a new, easy-to-implement lattice reduction algorithm which employs two Cornacchia-type algorithms [8, Sect. 1.5.2], one in ℤ (as in the original GLV method), the other one in ℤ[i] (Gaussian Cornacchia).

The full proof of Theorem 3 via the new lattice reduction algorithm can be found in Appendix D.

5.1 The Euclidean Algorithm in ℤ

The first step is to find \(\nu=a+ib\in\mathbb{Z}[i]\) such that \(|\nu|^2=a^2+b^2=n\), i.e., a Gaussian prime above n. Recall that n splits in ℤ[i]. Let ν=a+ib be a prime above n. We can furthermore assume that \(\nu P = aP+b(iP) := aP+b\varPsi(P)=0\): since \(\nu\bar{\nu}P = nP=0\), either \(\bar{\nu}P\) is a nonzero multiple of P, and therefore νP=0, or else \(\bar{\nu}P=0\), so that in any case one of the Gaussian primes (WLOG ν) above n satisfies νP=0. We can find ν by Cornacchia's algorithm [8, Sect. 1.5.2], which is a truncated form of the Euclidean algorithm. For completeness and consistency with what will follow, we recall how this is done.

Let μ∈[1,n] be such that \(\mu^2\equiv -1 \ (\mathrm{mod}\ n)\), with μ defined by Ψ(P)=μP (so that \(\mu\equiv i \pmod{\nu}\)). Actually, in the GLS approach [11], it has been pointed out that this value of μ can be readily computed from \(\#E( \mathbb {F}_{p})\). The extended Euclidean algorithm to compute the gcd of n and μ produces three terminating sequences of integers \((r_j)_{j\geq0}\), \((s_j)_{j\geq0}\), and \((t_j)_{j\geq0}\) such that

$$ r_j = q_{j+1} r_{j+1} + r_{j+2}, \qquad s_j = q_{j+1} s_{j+1} + s_{j+2}, \qquad t_j = q_{j+1} t_{j+1} + t_{j+2} $$
(6)

for some integer q j+1>0 and initial data

$$ (r_0, r_1) = (n, \mu), \qquad (s_0, s_1) = (1, 0), \qquad (t_0, t_1) = (0, 1). $$
(7)

This means that at step j≥0,

$$r_j = q_{j+1} r_{j+1} + r_{j+2} $$

and similarly for the other sequences. The sequence \((q_j)_{j\geq1}\) is uniquely defined by imposing that the previous equation be the integer division of \(r_j\) by \(r_{j+1}\). In other terms, \(q_{j+1}=\lfloor r_j/r_{j+1}\rfloor\). This implies by induction that all the sequences are well defined in the integers, together with the following properties.

Lemma 2

The sequences \((r_j)_{j\geq0}\), \((s_j)_{j\geq0}\), and \((t_j)_{j\geq0}\) defined by (6) and (7) with \(q_{j+1}=\lfloor r_j/r_{j+1}\rfloor\) satisfy the following properties, valid for all j≥0.

  1. \(r_j > r_{j+1}\geq 0\) and \(q_{j+1}\geq 1\),

  2. \((-1)^j s_j \geq 0\) and \(|s_j|<|s_{j+1}|\) (this last inequality valid for j≥1),

  3. \((-1)^{j+1} t_j \geq 0\) and \(|t_j|<|t_{j+1}|\),

  4. \(s_{j+1} r_j - s_j r_{j+1}=(-1)^{j+1} r_1\),

  5. \(t_{j+1} r_j - t_j r_{j+1}=(-1)^j r_0\),

  6. \(r_0 s_j + r_1 t_j = r_j\).

These properties lie at the heart of the original GLV algorithm. They imply in particular, via property 1, that the algorithm terminates (once \(r_j\) reaches zero) and that it has \(O(\log n)\) steps, as \(r_j = q_{j+1} r_{j+1}+r_{j+2}\geq r_{j+1}+r_{j+2}>2r_{j+2}\). Note that properties 1, 2, and 3 imply that properties 4 and 5 can be rewritten in our case respectively as

$$ |s_{j+1} r_j| + |s_j r_{j+1}| = \mu\quad\text{and} \quad|t_{j+1} r_j| + |t_j r_{j+1}| = n . $$
(8)

The Cornacchia (as well as the GLV) algorithm does not make use of the full sequences \((r_j)\), \((s_j)\), and \((t_j)\) but rather stops at the m≥0 such that \(r_{m}\geq\sqrt{n}\) and \(r_{m+1}< \sqrt{n}\). An application of (8) with j=m yields \(|t_{m+1} r_m|<n\), or \(|t_{m+1}| < \sqrt{n}\). Since by property 6 we have \(r_{m+1}-\mu t_{m+1}=ns_{m+1}\equiv 0 \ (\mathrm{mod}\ n)\), we deduce that \(r_{m+1}^{2}+ t_{m+1}^{2}\equiv(r_{m+1}-\mu t_{m+1})(r_{m+1}+\mu t_{m+1}) \equiv0 \pmod{n}\). Moreover, \(t_{m+1}\neq 0\) by property 3, so that \(0< r_{m+1}^{2}+t_{m+1}^{2} < n + n = 2n\), which therefore implies that \(r_{m+1}^{2}+ t_{m+1}^{2} = n\) and finally that \(\nu=r_{m+1}-it_{m+1}\).

We present here the pseudo-code of this Euclidean algorithm in ℤ.

Algorithm 1

(Cornacchia’s GCD in ℤ)

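A minimal Python sketch of this truncated Euclidean algorithm (ours, not the paper's exact formulation), assuming inputs n and μ with \(\mu^2 \equiv -1 \pmod{n}\); it returns the pair (a, b) with \(a^2+b^2=n\), representing \(\nu = a + ib\):

```python
def cornacchia_z(n, mu):
    """Truncated extended Euclidean algorithm on (r0, r1) = (n, mu):
    stop at the first remainder r_{m+1} < sqrt(n); then
    r_{m+1}^2 + t_{m+1}^2 = n and nu = r_{m+1} - i*t_{m+1}."""
    assert (mu * mu + 1) % n == 0
    r0, r1 = n, mu
    t0, t1 = 0, 1
    while r1 * r1 >= n:                 # i.e., r1 >= sqrt(n)
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    return r1, -t1                      # (a, b) with nu = a + i*b

# Toy check: 540^2 = -1 (mod 1009) and indeed 1009 = 28^2 + 15^2.
a, b = cornacchia_z(1009, 540)
assert a * a + b * b == 1009
```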

5.2 The Euclidean Algorithm in ℤ[i]

In the previous subsection we have given a meaning to zP, where z∈ℤ[i], and we have seen how to construct ν, a Gaussian prime such that νP=0. By identifying \((x_1,x_2,x_3,x_4)\in\mathbb{Z}^4\) with \((z_1,z_2)=(x_1+ix_3,x_2+ix_4)\in\mathbb{Z}[i]^2\), we can rewrite the 4-GLV reduction map F of Sect. 4 as (using the same letter F by abuse of notation)

$$F\colon \mathbb{Z}[i]\times\mathbb{Z}[i]\longrightarrow \mathbb{Z}[i]/\nu\cong\mathbb{Z}/n, \qquad (z_1,z_2)\longmapsto z_1+\lambda z_2 \ (\mathrm{mod}\ \nu). $$
This F should be compared with the map \(\mathfrak {f}\) of Sect. 2. In mimicking the original GLV paper [13], we would like to apply the extended Euclidean algorithm (defined exactly as before, with integer divisions occurring in ℤ[i], henceforth denoted EGEA, short for extended Gaussian Euclidean algorithm) to the pair \((r_0,r_1)=(\lambda,\nu)\) if \(\lambda\geq\sqrt{2}\, |\nu|\) and \((r_0,r_1)=(\lambda+n,\nu)\) otherwise (the latter case being exceptionally rare). This should output short vectors in \(\mathbb{Z}[i]^2\), which we can transform into short vectors in \(\mathbb{Z}^4\) using the previous isomorphism, thus proving Theorem 3 by the Babai rounding argument given in Sect. 4.

What are the difficulties in following this path? Let us note that properties 4, 5, and 6 of Lemma 2 still hold, and property 1 holds in modulus (in particular, the algorithm terminates). However, in the analysis of this algorithm, especially in [29], a crucial role is played by (8), in order to derive a bound on \(|s_{j+1} r_j|\) and \(|s_j r_{j+1}|\) from a bound on

$$ s_{j+1}r_j - s_jr_{j+1} = (-1)^{j+1}\nu $$
(9)

in the present case. This fact, as we saw, stems from the alternating sign of the sequence \((s_j)\), which results from taking a canonical form of integer division with positive quotients \(q_{j+1}\) and nonnegative remainders \(r_{j+2}\), a property which is not available here. Nevertheless, we can still use a similar reasoning using (9), provided that the arguments of \(s_{j+1} r_j\) and \(s_j r_{j+1}\) are not too close, so as to avoid a high degree of cancellation. In other terms, in order to follow the argument of [29, Theorem 1], we need a property of the kind

$$| s_{j+1}r_j - s_jr_{j+1} | \leq M \quad\Longrightarrow\quad\max \bigl(|s_{j+1}r_j|, |s_jr_{j+1}|\bigr) \leq c M $$

for some explicit absolute constant c (equal to 1 in [29]). This is in general impossible to attain, because in the EGEA, in contrast to the usual extended Euclidean algorithm, we have no control over the arguments of the \(r_j\)'s or the \(s_j\)'s. However, in most cases something of the sort can be proved. This is the content of Lemma 4 (Appendix D). We define the corresponding indices (terms) of the sequences \(r_j, s_j\) as “good” when this happens. If all the terms were good, then the proof of [29, Theorem 1] could be carried over to prove Theorem 3 almost without change (the final constant of the theorem would be different, depending on c). However, this is not the case, and the main difficulty here lies in the treatment of the terms which are not good (called therefore “bad”). The surprising fact is that we can still control the contribution of bad terms to our advantage (see Lemma 5) and, ultimately, the combination of Lemmas 4 and 5 becomes the main ingredient in the proof of Theorem 3. All of the above makes the reasoning noticeably more sophisticated than in [29].

We now turn to the description of the EGEA. The first observation is that in the case of Gaussian integers, there can be 2, 3, or 4 possible choices for a remainder in the jth step of the integer division \(r_j=q_{j+1} r_{j+1}+r_{j+2}\). It turns out that choosing at each step j≥0 of the EGEA a remainder \(r_{j+2}\) with smallest modulus will yield Theorem 3.

We give the pseudo-code of Cornacchia’s Algorithm in ℤ[i] in two forms, working with complex numbers (see Algorithm 2) and separating real and imaginary parts (see Algorithm 3, Appendix C).

Algorithm 2

(EGEA or Cornacchia’s algorithm in ℤ[i]—compact form)

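A Python sketch of the Gaussian Euclidean loop with smallest-modulus remainders (ours, not the paper's exact formulation; Gaussian integers are represented as integer pairs to keep the arithmetic exact, and the stopping threshold is left as a parameter):

```python
def gmul(a, b):          # product in Z[i]; (x, y) represents x + iy
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def gsub(a, b):
    return (a[0] - b[0], a[1] - b[1])

def gnorm(a):            # |a|^2
    return a[0] * a[0] + a[1] * a[1]

def iround(num, den):    # nearest integer to num/den (den > 0), exact
    return (2 * num + den) // (2 * den)

def gdivmod(a, b):
    """Integer division a = q*b + r in Z[i] with a smallest-modulus remainder:
    q is a/b rounded coordinate-wise, giving |r| <= |b|/sqrt(2) (termination)."""
    nb = gnorm(b)
    t = gmul(a, (b[0], -b[1]))          # a * conj(b), so a/b = t/nb
    q = (iround(t[0], nb), iround(t[1], nb))
    return q, gsub(a, gmul(q, b))

def egea(r0, r1, stop_norm):
    """Euclidean loop in Z[i] tracking (s_j) as in Sect. 5.1; stops once
    gnorm(r1) < stop_norm. Returns the last two pairs (r_j, s_j), from which
    the full algorithm assembles short vectors of the kernel lattice."""
    s0, s1 = (1, 0), (0, 1)
    while gnorm(r1) >= stop_norm:
        q, r2 = gdivmod(r0, r1)
        r0, r1 = r1, r2
        s0, s1 = s1, gsub(s0, gmul(q, s1))
    return (r0, s0), (r1, s1)

# Toy run with hypothetical inputs:
(rm, sm), (rn, sn) = egea((1234, 0), (56, 78), stop_norm=100)
assert gnorm(rn) < 100 <= gnorm(rm)
```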

Remark 2

In the case of the LLL algorithm, we have not managed to demonstrate a bound as good as the one obtained with our lattice reduction algorithm.

Remark 3

Nguyen and Stehlé [26] have produced an efficient lattice reduction in four dimensions which finds successive minima and hence produces a decomposition with relatively good bounds. Our algorithm represents a very simple and easy-to-implement alternative that may be ideal for certain cryptographic libraries.

6 GLV–GLS using the Twisted Edwards Model

The GLV–GLS method can be sped up in practice by writing down GLV–GLS curves in the Twisted Edwards model. Note that arithmetic on j-invariant 0 Weierstrass curves is already very efficient. However, some GLV curves do not exhibit such high-speed arithmetic. In particular, the curves in Examples 3–6 from Appendix A have Weierstrass coefficients \(a_4 a_6\neq 0\) for curve parameters \(a_4\) and \(a_6\), and hence they have a more expensive point doubling (even more so if we consider the extra multiplication by the twisted parameter u when using the GLS method). So the impact of using Twisted Edwards is expected to be especially significant for these curves. In fact, if we consider that suitable parameters can always be chosen, the use of Twisted Edwards curves isomorphic to the original Weierstrass GLV–GLS curves uniformizes the performance of all of them.

Let us illustrate how to produce a Twisted Edwards GLV–GLS curve with the GLV curve from Example 4, Appendix A. First, consider its quadratic twist over \(\mathbb{F}_{p^{2}}\)

$$E'/\mathbb{F}_{p^2}{:}\ x^3 - \frac{15}{2} u^2 x -7 u^3 = (x + 2u) \cdot \biggl(x^2 - 2ux - \frac{7}{2} u^2\biggr). $$

The change of variables \(x_1=x+2u\) transforms E′ into

$$y^2 = x_1^3 -6u x_1^2 + \frac{9u^2}{2} x_1 . $$

Let \(\beta= 3u/\sqrt{2} \in\mathbb{F}_{p^{2}}\) and substitute \(x_1=\beta x'\) to get

$$\frac{1}{\beta^3} y^2 = x'^3 - \frac{6u}{\beta} x'^2 + x', $$

and this is a Montgomery curve \(M_{A,B}\colon Bv^2=u^3+Au^2+u\), where A≠±2, B≠0, with

$$B= \frac{1}{\beta^3} = \frac{2\sqrt{2}}{27u^3} , \qquad A=- \frac{6u}{\beta}= -2\sqrt{2} . $$

The corresponding Twisted Edwards GLV–GLS curve is then \(E_{a,d}\colon ax^2+y^2=1+dx^2 y^2\) with

$$a =\frac{A+2}{B} = 27u^3 \biggl(\frac{\sqrt{2}}{2} - 1 \biggr), \qquad d = \frac{A-2}{B} = -27u^3 \biggl(\frac{\sqrt{2}}{2} + 1 \biggr). $$

The map \(E'\to E_{a,d}\) is

$$(x,y) \mapsto \biggl(\frac{x+2u}{\beta y} , \frac{x+2u-\beta }{x+2u+\beta } \biggr) = (X,Y) $$

with inverse

$$(X,Y) \mapsto \biggl( \frac{\beta-2u + (\beta+2u)Y}{1-Y} , \frac {1+Y}{(1-Y) X} \biggr) . $$
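The Montgomery-to-Twisted-Edwards conversion used in this derivation is the standard birational equivalence from [2]; the sketch below (ours, over a small toy prime field rather than \(\mathbb{F}_{p^2}\), with hypothetical parameters) checks that \(a=(A+2)/B\), \(d=(A-2)/B\) and \((u,v)\mapsto(u/v,(u-1)/(u+1))\) send a Montgomery point onto the twisted Edwards curve:

```python
def mont_to_tedwards_params(A, B, p):
    """a = (A+2)/B, d = (A-2)/B for the Montgomery curve B*v^2 = u^3 + A*u^2 + u."""
    invB = pow(B, p - 2, p)                   # Fermat inversion
    return (A + 2) * invB % p, (A - 2) * invB % p

def mont_to_tedwards_point(u, v, p):
    """(u, v) -> (x, y) = (u/v, (u-1)/(u+1)), defined when v != 0, u != -1."""
    return (u * pow(v, p - 2, p) % p,
            (u - 1) * pow(u + 1, p - 2, p) % p)

# Toy field and curve (hypothetical parameters, not the paper's):
p, A, B = 103, 6, 1
a, d = mont_to_tedwards_params(A, B, p)
invB = pow(B, p - 2, p)
for u in range(2, p - 1):                     # find any affine point, map it
    rhs = (u**3 + A * u**2 + u) * invB % p
    v = next((w for w in range(1, p) if w * w % p == rhs), None)
    if v is not None:
        x, y = mont_to_tedwards_point(u, v, p)
        assert (a * x * x + y * y) % p == (1 + d * x * x * y * y) % p
        break
```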

We now specify the formulas for Φ and Ψ, obtained by composing these endomorphisms on the Weierstrass model with the birational maps above. We found an extremely appealing expression in the case where u=1+i and \(i^2=-1\). Then \(\beta= 3u/\sqrt{2} = 3\zeta_8\), where \(\zeta_8\) is a primitive 8th root of unity. We have

and

$$\varPsi(X,Y) = \biggl(\zeta_8 X^p, \frac{1}{Y^p} \biggr) . $$

In this case,

$$a = 54 \bigl(\zeta_8^3-\zeta_8^2+1 \bigr) , \qquad d = -54 \bigl(\zeta_8^3+ \zeta_8^2-1\bigr). $$

Finally, one would want to use the efficient formulas given in [15] for the case a=−1. After ensuring that −a is a square in \(\mathbb{F}_{p^{2}}\), we use the map \((x,y) \mapsto(x/\sqrt{-a},y)\) to convert to the isomorphic curve \(-x^2+y^2=1+d'x^2 y^2\), where \(d'=-d/a\).

7 Side-Channel Protection and Parallelization of the GLV–GLS Method

Given the potential threat posed by attacks that exploit timing information to deduce secret keys ([7, 19]), many works have proposed countermeasures to minimize the risks and achieve so-called constant-time execution during cryptographic computations. In general, to avoid leakage, the execution flow should be independent of the secret key. This means that conditional branches and secret-dependent table lookup indices should be avoided [4, 18]. There are five key points that are especially vulnerable during the computation of scalar multiplication: inversion, modular reduction in field operations, precomputation, scalar recoding, and double-and-add execution.

A well-known technique that is secure and easy to implement for inverting any field element a consists of computing the exponentiation \(a^{p-2} \bmod p\) using a short addition chain for p−2.
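For instance (a generic sketch of the idea, not the paper's specific addition chain), inversion becomes one fixed exponentiation with no secret-dependent branching:

```python
def inv_mod_p(a, p):
    # Always the same exponentiation a^(p-2) mod p; a production version
    # would expand pow() into a fixed addition chain for p - 2.
    return pow(a, p - 2, p)

p = 2**127 - 5997          # the prime p3 used for E'_3 / E'_T3 in Sect. 8
assert inv_mod_p(12345, p) * 12345 % p == 1
```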

To protect field operations, one may exploit conditional move instructions typically found on modern x86 and x64 processors (a.k.a. cmove). Since conditional checks happen during operations such as addition and subtraction as part of the reduction step, it is standard practice to replace conditional branches with the conditional move instruction. Luckily, these conditional branches are highly unpredictable, and, hence, the substitution above not only makes the execution constant-time but also more efficient in most cases. An exception happens when performing modular reduction during a field multiplication or squaring, where a final correction step occurs very rarely, and hence a conditional branch may be more efficient.

For the case of precomputation in the setting of elliptic curves, recent work [18] and later [3] showed how to enable the use of precomputed points by employing constant-time table lookups that mask the extraction of points, which is a known technique in the literature (see, for example, [5]). In our implementations (see Sect. 8), we exploit a similar approach based on cmove and conditional vector instructions instead, which is expected to achieve higher performance on some platforms than implementations based on logical instructions (see Listing 1 in [18]). Note that it is straightforward to enable the use of signed-digit representations that allow negative points by performing a second table lookup between the point selected in the first table lookup and its negated value.
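To illustrate the masking idea behind such table lookups, here is a model of ours in Python (Python integers are not actually constant-time, so this only mirrors the logic a cmove- or vector-instruction-based extraction would follow):

```python
def ct_lookup(table, index, limbs):
    """Scan every entry and accumulate it under an all-zero/all-one mask,
    so the memory access pattern is independent of the secret index."""
    result = [0] * limbs
    for i, entry in enumerate(table):
        diff = i ^ index
        bit = ((diff | -diff) >> 63) + 1     # 1 iff diff == 0, no branches
        mask = -bit                          # all-ones when i == index
        for j in range(limbs):
            result[j] |= entry[j] & mask
    return tuple(result)

# Toy table of eight "points", two limbs each:
table = [(11 * i, 13 * i) for i in range(8)]
assert ct_lookup(table, 5, limbs=2) == (55, 65)
```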

To protect the scalar recoding and its corresponding double-and-add algorithm, one needs a regular pattern of execution. Based on a method in [27], Joye and Tunstall [17] proposed a constant-time recoding that supports a regular-execution double-and-add algorithm that exploits precomputations. The nonzero density of the method is 1/(w−1), where w is the window width. Therefore, there is a certain loss in performance in comparison with an unprotected version with nonzero density 1/(w+1). In GLV-based implementations one has to deal with more than one scalar, and these scalars are scanned simultaneously, using interleaving [13], for instance, during multi-exponentiation. So there are two issues that arise. First, how are the several scalars aligned with respect to their zero and nonzero digit representation? And second, how do we guarantee the same representation length for all scalars so that no dummy operations are required? The first issue is inherently solved by the recoding algorithm itself. The input is always an odd number, which means that, from left to right, one obtains the execution pattern (w−1) doublings, d additions, (w−1) doublings, d additions, … , (w−1) doublings and d additions, for d-dimensional GLV. For dealing with even numbers, one may employ the technique described in [17] in a constant-time fashion, namely, scalars \(k_i\) that are even are replaced by \(k_i+1\), and scalars that are odd are replaced by \(k_i+2\) (the correction, also constant-time, is performed after the scalar multiplication computation using d point additions). A solution to the second issue was also hinted at by [17]. We present in Appendix E the modified recoding algorithm that outputs a regular pattern representation with fixed length; a sketch of the underlying recoding appears below. Note that in the case of Twisted Edwards, one can alternatively use unified addition formulas that also work for doubling (see [2, 15] for details). However, our analysis indicates that this approach is consistently slower because of the high cost of these unified formulas in comparison to doubling and the extra cost incurred by the increase in constant-time table lookup accesses.
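A sketch of this style of regular recoding (ours, following the Joye–Tunstall idea; the fixed-length variant of Appendix E additionally pads to a constant number of digits): every digit is odd and nonzero, so the double-and-add loop has a fully regular pattern.

```python
def regular_recode(k, w):
    """Recode an odd k > 0 into odd digits d_i in {+-1, +-3, ..., +-(2^(w-1)-1)}
    with k = sum_i d_i * 2^((w-1)*i): one nonzero digit every w - 1 bits."""
    assert k > 0 and k % 2 == 1
    digits = []
    while k > 2 ** (w - 1):
        d = (k % 2 ** w) - 2 ** (w - 1)   # odd, in (-2^(w-1), 2^(w-1))
        digits.append(d)
        k = (k - d) >> (w - 1)            # the new k is again odd
    digits.append(k)
    return digits

# Every digit odd, reconstruction exact:
w = 4
for k in range(1, 2001, 2):
    ds = regular_recode(k, w)
    assert all(d % 2 == 1 for d in ds)    # in Python, (-3) % 2 == 1
    assert sum(d << ((w - 1) * i) for i, d in enumerate(ds)) == k
```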

7.1 Multicore Computation and Its Side-Channel Protection

Parallelization of scalar multiplication over prime fields is particularly difficult on modern multicore processors. This is due to the difficulty of performing point operations concurrently when executing the double-and-add algorithm from left to right. From right to left, parallelization is easier, but performance is hurt because the use of precomputations is cumbersome. Hence, parallelization should ideally be performed at the field arithmetic level. Unfortunately, current multicore processors still impose a severe overhead for thread creation/destruction. During our tests, we observed overheads of a few thousand cycles on modern 64-bit CPUs (that is, much more costly than a point addition or doubling). Given this limitation, for the GLV method, the ideal approach (from a speed perspective) seems to be to let each core manage a separate scalar multiplication with \(k_i\), as sketched below. This is simple to implement, minimizes thread management overhead, and also eases the task of protecting the implementation against side-channel attacks, since each scalar can be recoded using Algorithm 4, Appendix E. Using d cores, the total cost of a protected d-dimensional GLV l-bit scalar multiplication (disregarding precomputation) is approximately l/d doublings and l/((w−1)⋅d) mixed additions. A somewhat slower approach (but more power efficient) would be to let one core manage all doublings and let one or two extra cores manage the additions corresponding to nonzero digits. For instance, for dimension four and three cores, the total cost (disregarding precomputation) is approximately l/d doublings and l/((w−1)⋅d) general additions, provided that the latency of (w−1) doublings is equal to or greater than that of the addition part (otherwise, the cost is dominated by nonmixed additions).
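A sketch of the first strategy (ours; a toy additive group stands in for the curve, and Python threads only model the four-core dispatch):

```python
from concurrent.futures import ThreadPoolExecutor

n = 1009                      # toy group Z/n stands in for <P>

def partial_mul(args):
    k_i, P_i = args           # placeholder for a double-and-add on one core
    return k_i * P_i % n

def glv_parallel(ks, Ps):
    """Each core handles one k_i and one precomputed point among
    P, Phi(P), Psi(P), PsiPhi(P); the partial results are added at the end."""
    with ThreadPoolExecutor(max_workers=len(ks)) as pool:
        return sum(pool.map(partial_mul, zip(ks, Ps))) % n

ks = [123, -45, 67, -89]      # the four quarter-length scalars k1..k4
Ps = [5, 11, 17, 23]          # stand-ins for P, Phi(P), Psi(P), PsiPhi(P)
assert glv_parallel(ks, Ps) == sum(k * P for k, P in zip(ks, Ps)) % n
```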

8 Performance Analysis and Experimental Results

For our analysis and experiments, we consider the five curves below: two GLV curves in Weierstrass form with and without nontrivial automorphisms, their corresponding GLV–GLS counterparts, and one curve in Twisted Edwards form isomorphic to the GLV–GLS curve \(E'_{3}\) (see below).

  • GLV–GLS curve with j-invariant 0 in Weierstrass form \(E'_{1}/\mathbb{F}_{p_{1}^{2}}: y^{2}=x^{3} + 9u\), where \(p_1=2^{127}-58309\) and \(\#E'_{1}(\mathbb{F}_{p_{1}^{2}}) = r\), where r is a 254-bit prime. We use \(\mathbb{F}_{p_{1}^{2}} = \mathbb {F}_{p_{1}}[i]/(i^{2} +1)\) and \(u=1+i \in\mathbb{F}_{p_{1}^{2}}\). \(E'_{1}\) is the quadratic twist of the curve in Example 2, Appendix A. \(\varPhi(x,y)=\lambda P=(\xi x,y)\) and \(\varPsi(x,y)=\mu P=(u^{(1-p)/3} x^p,u^{(1-p)/2} y^p)\), where \(\xi^3=1 \bmod p_1\). We have that \(\varPhi^2+\varPhi+1=0\) and \(\varPsi^2+1=0\).

  • GLV curve with j-invariant 0 in Weierstrass form \(E_{2}/\mathbb {F}_{p_{2}}: y^{2} = x^{3} + 2\), where \(p_2=2^{256}-11733\), and \(\# E_{2}(\mathbb{F}_{p_{2}})\) is a 256-bit prime. This curve corresponds to Example 2, Appendix A.

  • GLV–GLS curve in Weierstrass form \(E'_{3}/\mathbb{F}_{p_{3}^{2}}: y^{2}=x^{3}-15/2\; u^{2} x-7u^{3}\), where \(p_3=2^{127}-5997\) and \(\#E'_{3}(\mathbb{F}_{p_{3}^{2}}) = 8r\), where r is a 251-bit prime. We use \(\mathbb{F}_{p_{3}^{2}} = \mathbb {F}_{p_{3}}[i]/(i^{2} + 1)\) and \(u=1+i \in\mathbb{F}_{p_{3}^{2}}\). \(E'_{3}\) is the quadratic twist of a curve isomorphic to the one in Example 4, Appendix A. The formula for \(\varPhi(x,y)=\lambda P\) can be easily derived from ψ(x,y), and \(\varPsi(x,y)=\mu P=(u^{1-p} x^p,u^{3(1-p)/2} y^p)\). It can be verified that \(\varPhi^2+2=0\) and \(\varPsi^2+1=0\).

  • GLV–GLS curve in Twisted Edwards form \(E'_{T3}/\mathbb {F}_{p_{3}^{2}}: -x^{2} + y^{2}=1+ dx^{2} y^{2}\), where

    \(p_3=2^{127}-5997\) and \(\# E'_{T3}(\mathbb{F}_{p_{3}^{2}}) = 8r\), where r is a 251-bit prime. We use again \(\mathbb{F}_{p_{3}^{2}} = \mathbb{F}_{p_{3}}[i]/(i^{2} + 1)\) and \(u=1+i \in\mathbb{F}_{p_{3}^{2}}\). \(E'_{T3}\) is isomorphic to curve \(E'_{3}\) above and was obtained following the procedure in Sect. 6. The formulas for Φ(x,y) and Ψ(x,y) are also given in Sect. 6. It can be verified that \(\varPhi^2+2=0\) and \(\varPsi^2+1=0\).

  • GLV curve \(E_{4}/\mathbb{F}_{p_{4}}: y^{2}=x^{3}-15/2\; x-7\), where \(p_4=2^{256}-45717\) and \(\#E_{4}(\mathbb{F}_{p_{4}}) = 2r\), where r is a 256-bit prime. This curve is isomorphic to the curve in Example 4, Appendix A.

For our experiments, we also explored the case of \(p=2^{128}-c\), with a relatively small integer c, for GLV–GLS curves. We finally decided on \(p=2^{127}-c\) because it was consistently faster thanks to the use of lazy reduction in the multiplication over \(\mathbb {F}_{p^{2}}\) [21], at the expense of a slight reduction in security.

Let us first analyze the performance of the GLV–GLS method over \(\mathbb{F}_{p^{2}}\) in comparison with the traditional 2-GLV case over \(\mathbb{F}_{p}\). We assume the use of a pseudo-Mersenne prime of the form \(p=2^m-c\) with small c (for our targeted curves, groups with (near) prime order cannot be constructed using the attractive Mersenne prime \(p=2^{127}-1\)). Given that we have a proven ratio \(C_2/C_1<412\) that is independent of the curve, the only values left that could significantly affect a uniform speedup between GLV–GLS and 2-GLV are the quadratic nonresidue β used to build \(\mathbb{F}_{p^{2}}\) as \(\mathbb{F}_{p}[i]/(i^{2}-\beta)\), the value of the twisting parameter u, and the cost of applying the endomorphisms Φ and Ψ. In particular, if |β|>1, a few extra additions (or a multiplication by a small constant) are required per \(\mathbb{F}_{p^{2}}\) multiplication and squaring. Luckily, for all the GLV curves listed in Appendix A, one can always use a suitably chosen modulus p so that |β| can be one or at least very close to it. Similar comments apply to the twisting parameter u. In this case, the extra cost (equivalent to a few additions) is added to the cost of point doubling whenever the curve parameter a in the Weierstrass equation is different from zero (e.g., it does not affect j-invariant 0 curves). In the case of Twisted Edwards, we applied a better strategy, that is, we eliminated the twisting parameter u in the isomorphic curve. The cost of applying Φ and Ψ does depend on the chosen curve, and it could be relatively expensive. If computing Φ(P), Ψ(P), or ΨΦ(P) is more expensive than a point addition, then its use can be limited to only one application (i.e., multiples of those values, if using precomputations, should be computed with point additions). Further, the extra cost can be minimized by choosing the optimal window width for each \(k_i\).

To illustrate how the parameters above may affect the performance gain, we detail in Table 1 estimates for the cost of computing a scalar multiplication with our representative curves. For the remainder, we use the following notation: M, S, A, and I represent field multiplication, squaring, addition, and inversion over \(\mathbb {F}_{p}\), respectively, and m, s, a, and i represent the same operations over \(\mathbb{F}_{p^{2}}\). Side-channel protected multiplication and squaring are denoted by \(m_s\) and \(s_s\). We consider the cost of addition, subtraction, negation, multiplication by 2, and division by 2 as equivalent. For the targeted curves in Weierstrass form, a mixed addition consists of 8 multiplications, 3 squarings, and 7 additions, and a general addition consists of 12 multiplications, 4 squarings, and 7 additions. For \(E'_{1}\) and \(E_2\), a doubling consists of 3 multiplications, 4 squarings, and 7 additions, and for \(E'_{3}\) and \(E_4\), a doubling consists of 3 multiplications, 6 squarings, and 12 additions. For Twisted Edwards, we consider the use of mixed homogeneous/extended homogeneous projective coordinates [15]. In this case, a mixed addition consists of 7 multiplications and 7 additions, a general addition consists of 8 multiplications and 6 or 7 additions, and a doubling consists of 4 multiplications, 3 squarings, and 5 additions. We also assume the use of interleaving [13] with width-w nonadjacent form (wNAF) and the use of the LM scheme for precomputing points on the Weierstrass curves [24] (see also [22, Chap. 3]).

Table 1. Operation counts and performance for scalar multiplication at approximately 128 bits of security. To determine the total costs, we consider 1i=66m, 1s=0.76m, and 1a=0.18m for \(E'_{1}\), \(E'_{3}\), and \(E'_{T3}\); and 1I=290M, 1S=0.85M, and 1A=0.18M for E 2 and E 4. The cost ratio of multiplications over \(\mathbb{F}_{p}\) and \(\mathbb{F}_{p^{2}}\) is M/m=0.91. These values and the performance figures (in cycles) were obtained by benchmarking full implementations on a single core of a 3.4 GHz Intel Core i7-2600 (Sandy Bridge) processor.

According to our theoretical estimates, the relative speedup when moving from 2-GLV to GLV–GLS is expected to be as high as 1.5 times, approximately. To confirm our findings, we produced full implementations of the methods. Experimental results, also displayed in Table 1, closely follow our estimates and confirm that speedups in practice are about 1.52 times. Most remarkably, the use of the Twisted Edwards model pushes performance even further. In Table 1, the expected gains for \(E'_{T3}\) are 31 % and 97 % in comparison with 4-GLV–GLS and 2-GLV in Weierstrass form (respectively). In practice, we achieved similar speedups, namely, 33 % and 102 % (respectively). Likewise, a rough analysis indicates that a Twisted Edwards GLV–GLS curve for a j-invariant 0 curve would achieve roughly similar speed to \(E'_{T3}\), which means that in comparison to its corresponding Weierstrass counterpart the gains are on the order of 9 % and 66 % (respectively). This highlights the impact of using Twisted Edwards, especially over those GLV–GLS curves that are relatively slower in the Weierstrass model. Timings were measured on a single core of a 3.4 GHz Intel Core i7-2600 (Sandy Bridge) processor.

Let us now focus on curves \(E'_{1}\), \(E_2\), and \(E'_{T3}\) to assess the performance of implementations targeting four scenarios of interest: unprotected and side-channel protected versions with sequential and multicore execution. Operation counts for computing a scalar multiplication at approximately 128 bits of security for the different cases are displayed in Table 2. The techniques used to protect and parallelize our implementations are described in Sect. 7. In particular, the execution flow and memory address access of side-channel protected versions are not secret and are fully independent of the scalar. For our versions running on several cores, we used OpenMP. We use an implementation in which each core is in charge of one scalar multiplication with \(k_i\). Given the high cost of thread creation/destruction, this approach guarantees the fastest computation in our case (see Sect. 7 for a discussion). Note that these multicore figures are only relevant for scenarios in which latency rather than throughput is targeted. Finally, we consider the cost of constant-time table lookups (denoted by t), given their nonnegligible cost in protected implementations.

Table 2. Operation counts for scalar multiplication at approximately 128 bits of security using curves \(E'_{1}\), \(E_2\), and \(E'_{T3}\) in up to four variants: unprotected and side-channel protected implementations with sequential and multicore execution. To determine the total costs we consider 1i=66m, 1s=0.76m, and 1a=0.18m for unprotected versions of \(E'_{1}\) and \(E'_{T3}\); 1i=79\(m_s\), 1\(s_s\)=0.81\(m_s\), and 1a=0.17\(m_s\) for protected versions of \(E'_{1}\) and \(E'_{T3}\); t=0.83\(m_s\) for \(E'_{1}\) (32 pts.); t=1.28\(m_s\) for \(E'_{T3}\) (36 pts.); t=0.78\(m_s\) for \(E'_{T3}\) (20 pts.); and 1I=290M, 1S=0.85M, and 1A=0.18M for \(E_2\). In our case, M/m=0.91 and \(m_s\)/m=1.11. These values were obtained by benchmarking full implementations on a 3.4 GHz Intel Core i7-2600 (Sandy Bridge) processor.

Focusing on curve \(E'_{1}\), a significant cost reduction can be noted when switching from non-GLV to a GLV–GLS implementation. The speedup is more than twofold for sequential, unprotected versions. Significant improvements are also expected when using multiple cores. A remarkable factor-3 speedup is expected when using GLV–GLS on four cores in comparison with a traditional execution (listed as non-GLV).

In general for our targeted GLV–GLS curves, the speedup obtained by using four cores is between 1.42 and 1.80 times. Interestingly, the improvement is greater for protected implementations, since the overhead of using a regular pattern execution is minimized when distributing the computation among several cores. Remarkably, protecting implementations against timing attacks slows down performance by a factor between 1.28 and 1.52, approximately. On the other hand, in comparison with curve \(E_2\), an optimal execution of GLV–GLS on four cores is expected to run 1.81 times faster than an optimal execution of the standard 2-GLV on two cores.

To confirm our findings, we implemented the different versions using curves \(E'_{1}\), \(E_2\), and \(E'_{T3}\). To achieve maximum performance and ease the task of parallelizing and protecting the implementations, we wrote our own standalone software without employing any external library. For our experiments we used a 3.4 GHz Intel Core i7-2600 processor, which contains four cores. The timings in terms of clock cycles are displayed in Table 3. As can be seen, closely following our analysis, GLV–GLS achieves a twofold speedup over a non-GLV implementation on a single core. Parallel execution improves performance by up to 1.76 times for side-channel protected versions. In comparison with the non-GLV implementation, the four-core implementation runs 3 times faster. Our results also confirm the lower-than-expected cost of adding side-channel protection. Sequential versions lose about 50 % in performance, whereas parallel versions only lose about 28 %. The relative speedup when moving from 2-GLV to GLV–GLS on j-invariant 0 curves is 1.53 times, closely following the theoretical factor-1.5 speedup estimated previously. Four-core GLV–GLS supports a computation that runs 1.81 times faster than the standard 2-GLV on two cores. Finally, in practice our Twisted Edwards curve achieves up to a 9 % speedup in the sequential, unprotected scenario in comparison with the efficient j-invariant 0 curve based on Jacobian coordinates.

Table 3. Point multiplication timings (in clock cycles), 64-bit processor.

Comparison to Related Work

Let us now compare our best numbers with recent results in the literature for elliptic curves over large prime characteristic fields. Focusing on one-core unprotected implementations, the first author together with Hu and Xu reported in [16] 122,000 cycles for a j-invariant 0 Weierstrass curve on an Intel Core i7-2600 (Sandy Bridge) processor. We report 91,000 cycles with the GLV–GLS Twisted Edwards curve \(E'_{T3}\), improving that number by a factor-1.34 speedup. We benchmarked on the same processor the side-channel protected software recently presented by Bernstein et al. [3] and obtained 194,000 cycles. Thus, our protected implementation, which runs in 137,000 cycles, is 1.42 times faster. Our result is also 1.12 times faster than the recent implementation by Hamburg [14].

It is also relevant to mention very recent results in settings other than elliptic curves over large prime characteristic fields. Taverne et al. [31] reported a protected implementation of a binary Edwards curve that runs in 225,000 cycles on an Intel Core i7-2600 (Sandy Bridge) machine, which is 1.64 times slower than our corresponding result. Aranha et al. [1] presented an implementation of the Koblitz curve K-283 that runs in 99,000 cycles on the same machine, which is 9 % slower than our GLV–GLS Twisted Edwards curve \(E'_{T3}\) (unprotected sequential execution). Aranha et al. do not report timings for side-channel protected implementations. A faster (although also unprotected) implementation of a GLS binary curve over a quadratic extension field of characteristic two was recently announced at ECC 2012. The running time in this case is about 73,000 cycles on the same Sandy Bridge processor [28]. These results highlight the significant impact of the carryless multiplier on the efficiency of characteristic two fields in the newest Intel processors. Efficient implementations on genus-2 (hyperelliptic) curves were recently reported in Bos et al. [6]. For instance, a protected implementation on a Kummer surface over a prime field runs in approximately 117,000 cycles on an Intel Core i7-3520M (Ivy Bridge) processor. Note that this processor architecture is in general more efficient than Sandy Bridge.

To the best of our knowledge, we have presented the first scalar multiplication implementation running on multiple cores that is protected against timing attacks, cache attacks, and several others.

9 Conclusion

We have shown how to generalize the GLV scalar multiplication method by combining it with Galbraith–Lin–Scott's ideas to perform a proven almost fourfold speedup on GLV curves over \(\mathbb{F}_{p^{2}}\). We have introduced a new and easy-to-implement reduction algorithm, consisting of two applications of the extended Euclidean algorithm, one in ℤ and the other in ℤ[i]. The refined bound obtained from this algorithm has allowed us to get a relative improvement from 2-GLV to 4-GLV–GLS that is practically independent of the curve. Our analysis and experimental results on different GLV curves show that in practice one should expect a factor-1.5 speedup, approximately. We improve performance even further by exploiting the Twisted Edwards model over a larger set of curves and show that this approach is especially significant for certain GLV curves with slow arithmetic in the Weierstrass model. This makes available to implementers new curves that achieve close to optimal performance. Moreover, we have shown how to protect GLV-based implementations against certain side-channel attacks with relatively low overhead and carried out a performance analysis on modern multicore processors. Our implementations of the generalized GLV–GLS method improve the state-of-the-art performance of elliptic curve point multiplication over fields of large prime characteristic for multiple scenarios: unprotected and side-channel protected versions with sequential and parallel execution. Finally, we have produced new families of GLV curves and written down all such curves (up to isomorphism) with nontrivial endomorphisms of degree ≤3.