Abstract
We propose efficient algorithms and formulas that improve the performance of side-channel protected elliptic curve computations, with special focus on scalar multiplication exploiting the Gallant–Lambert–Vanstone (CRYPTO 2001) and Galbraith–Lin–Scott (EUROCRYPT 2009) methods. First, by adapting Feng et al.’s recoding to the GLV setting, we derive new regular algorithms for variable-base scalar multiplication that offer protection against simple side-channel and timing attacks. Second, we propose an efficient, side-channel protected algorithm for fixed-base scalar multiplication which combines Feng et al.’s recoding with Lim–Lee’s comb method. Third, we propose an efficient technique that interleaves ARM and NEON-based multiprecision operations over an extension field to improve performance of GLS curves on modern ARM processors. Finally, we showcase the efficiency of the proposed techniques by implementing a state-of-the-art GLV–GLS curve in twisted Edwards form defined over \(\mathbb {F}_{p^2}\), which supports a four-dimensional decomposition of the scalar and is fully protected against timing attacks. Analysis and performance results are reported for modern x64 and ARM processors. For instance, we compute a variable-base scalar multiplication in 89,000 and 244,000 cycles on an Intel Ivy Bridge and an ARM Cortex-A15 processor, respectively; using a precomputed table of 6 KB, we compute a fixed-base scalar multiplication in 49,000 and 116,000 cycles, respectively; and using a precomputed table of 3 KB, we compute a double-scalar multiplication in 115,000 and 285,000 cycles, respectively. The proposed techniques represent an important improvement over the state-of-the-art performance of elliptic curve computations, and allow us to set new speed records on several modern processors. The techniques also reduce the cost of adding protection against timing attacks in the computation of GLV-based variable-base scalar multiplication to below 10%.
This work is the extended version of a publication that appeared at CT-RSA (Faz-Hernández et al., Topics in Cryptology, CT-RSA 2014, vol. 8366, pp. 1–27, 2014).
Notes
However, in some cases, one can afford the reduction of precomputations from 16 to 8 when using the windowed recoding if endomorphisms are cheap and can be computed on-the-fly during the evaluation stage; e.g., see [35].
In the case of unprotected software on x64, Oliveira et al. [35] hold the current speed record with 72,000 cycles on an Intel Sandy Bridge. Their protected version is significantly more costly and runs in about 115,000 cycles.
References
Aranha, D.F., Karabina, K., Longa, P., Gebotys, C.H., López, J.: Faster explicit formulas for computing pairings over ordinary curves. In: Paterson, K.G. (ed.) Advances in Cryptology, EUROCRYPT, LNCS, vol. 6632, pp. 48–68. Springer, New York (2011)
Bernstein, D.: Cache-timing attacks on AES. http://cr.yp.to/antiforgery/cachetiming-20050414.pdf (2005)
Bernstein, D., Birkner, P., Joye, M., Lange, T., Peters, C.: Twisted Edwards curves. In: Vaudenay, S. (ed.) Proceedings of Africacrypt 2008, LNCS, vol. 5023, pp. 389–405. Springer, New York (2008)
Bernstein, D., Chuengsatiansup, C., Lange, T., Schwabe, P.: Kummer strikes back: new DH speed records. In: Cryptology ePrint Archive, Report 2014/134 (2014). Available at: http://eprint.iacr.org/2014/134
Bernstein, D., Duif, N., Lange, T., Schwabe, P., Yang, B.-Y.: High-speed high-security signatures. In: Preneel, B., Takagi, T. (eds.) Proceedings of CHES 2011, LNCS, vol. 6917, pp. 124–142. Springer, New York (2011)
Bernstein, D., Lange, T.: eBACS: ECRYPT Benchmarking of Cryptographic Systems. http://bench.cr.yp.to/results-dh.html (2013). Accessed 12 Dec 2013
Bernstein, D., Schwabe, P.: NEON crypto. In: Prouff, E., Schaumont, P.R. (eds.) Cryptographic Hardware and Embedded Systems, CHES 2012, Lecture Notes in Computer Science, vol. 7428, pp. 320–339. Springer, New York (2012)
Bos, J.W., Costello, C., Hisil, H., Lauter, K.: Fast cryptography in genus 2. In: Johansson, T., Nguyen, P.Q. (eds.) Advances in Cryptology, EUROCRYPT, LNCS, vol. 7881, pp. 194–210. Springer, New York (2013)
Bos, J.W., Costello, C., Hisil, H., Lauter, K.: High-performance scalar multiplication using 8-dimensional GLV/GLS decomposition. In: Bertoni, G., Coron, J.-S. (eds.) Cryptographic Hardware and Embedded Systems, CHES 2013, LNCS, vol. 8086, pp. 331–348. Springer, New York (2013)
Bos, J.W., Costello, C., Longa, P., Naehrig, M.: Selecting elliptic curves for cryptography: an efficiency and security analysis. In: Cryptology ePrint Archive, Report 2014/130 (2014). Available at: http://eprint.iacr.org/2014/130
Brumley, D., Boneh, D.: Remote timing attacks are practical. In: Proceedings of the 12th USENIX Security Symposium (2003)
Faz-Hernández, A., Longa, P., Sánchez, A.H.: Efficient and secure algorithms for GLV-based scalar multiplication and their implementation on GLV-GLS curves. In: Benaloh, J. (ed.) Topics in Cryptology, CT-RSA 2014, vol. 8366, pp. 1–27. Springer, New York (2014)
Feng, M., Zhu, B.B., Xu, M., Li, S.: Efficient comb elliptic curve multiplication methods resistant to power analysis. In: Cryptology ePrint Archive, Report 2005/222 (2005). Available at: http://eprint.iacr.org/2005/222
Feng, M., Zhu, B.B., Zhao, C., Li, S.: Signed MSB-set comb method for elliptic curve point multiplication. In: Chen, K., Deng, R., Lai, X., Zhou, J. (eds.) Proceedings of Information Security Practice and Experience (ISPEC 2006), LNCS, vol. 3903, pp. 13–24. Springer, New York (2006)
Galbraith, S.D., Lin, X., Scott, M.: Endomorphisms for faster elliptic curve cryptography on a large class of curves. J. Cryptol. 24(3), 446–469 (2011)
Galbraith, S.D., Lin, X., Scott, M.: Endomorphisms for faster elliptic curve cryptography on a large class of curves. In: Joux, A. (ed.) Advances in Cryptology, EUROCRYPT, LNCS, vol. 5479, pp. 518–535. Springer, New York (2009)
Gallant, R.P., Lambert, J.L., Vanstone, S.A.: Faster point multiplication on elliptic curves with efficient endomorphisms. In: Kilian, J. (ed.) Advances in Cryptology, CRYPTO, LNCS, vol. 2139, pp. 190–200. Springer, New York (2001)
Guillevic, A., Ionica, S.: Four dimensional GLV via the Weil restriction. In: Sako, K., Sarkar, P. (eds.) Advances in Cryptology, ASIACRYPT, LNCS, vol. 8269, pp. 79–96. Springer, New York (2013)
Hamburg, M.: Fast and compact elliptic-curve cryptography. In: Cryptology ePrint Archive, Report 2012/309 (2012). Available at: http://eprint.iacr.org/2012/309
Hankerson, D., Karabina, K., Menezes, A.: Analyzing the Galbraith–Lin–Scott point multiplication method for elliptic curves over binary fields. IEEE Trans. Comput. 58(10), 1411–1420 (2009)
Hankerson, D., Menezes, A., Vanstone, S.: Guide to elliptic curve cryptography. Springer, New York (2004)
Hedabou, M., Pinel, P., Beneteau, L.: Countermeasures for preventing comb method against SCA attacks. In: Deng, R., Bao, F., Pang, H., Zhou, J. (eds.) Proceedings of Information Security Practice and Experience (ISPEC 2005), LNCS, vol. 3439, pp. 85–96. Springer, New York (2005)
Hisil, H., Wong, K., Carter, G., Dawson, E.: Twisted Edwards curves revisited. In: Pieprzyk, J. (ed.) Advances in Cryptology, ASIACRYPT, LNCS, vol. 5350, pp. 326–343. Springer, New York (2008)
Hu, Z., Longa, P., Xu, M.: Implementing 4-dimensional GLV method on GLS elliptic curves with j-invariant 0. Des. Codes Cryptogr. 63(3), 331–343 (2012). http://eprint.iacr.org/2011/315
Joye, M., Tunstall, M.: Exponent recoding and regular exponentiation algorithms. In: Joye, M. (ed.) Proceedings of Africacrypt 2009, LNCS, vol. 5580, pp. 334–349. Springer, New York (2009)
Kocher, P.C.: Timing attacks on implementations of Diffie–Hellman, RSA, DSS, and other systems. In: Koblitz, N. (ed.) Advances in Cryptology, CRYPTO, LNCS, vol. 1109, pp. 104–113. Springer, New York (1996)
Kocher, P.C., Jaffe, J., Jun, B.: Differential power analysis. In: Wiener, M. (ed.) Advances in Cryptology, CRYPTO, LNCS, vol. 1666, pp. 388–397. Springer, New York (1999)
Lim, C.H., Lee, P.J.: More flexible exponentiation with precomputation. In: Desmedt, Y. (ed.) Advances in Cryptology, CRYPTO, LNCS, vol. 839, pp. 95–107. Springer, New York (1994)
ARM Limited: ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition (2012)
Longa, P., Gebotys, C.: Efficient techniques for high-speed elliptic curve cryptography. In: Mangard, S., Standaert, F.-X. (eds.) Proceedings of CHES 2010, LNCS, vol. 6225, pp. 80–94. Springer, New York (2010)
Longa, P., Sica, F.: Four-dimensional Gallant–Lambert–Vanstone scalar multiplication. In: Wang, X., Sako, K. (eds.) Advances in Cryptology, ASIACRYPT, LNCS, vol. 7658, pp. 718–739. Springer, New York (2012)
Longa, P., Sica, F.: Four-dimensional Gallant–Lambert–Vanstone scalar multiplication. J. Cryptol. 27(2), 248–283 (2014)
Möller, B.: Algorithms for multi-exponentiation. In: Vaudenay, S., Youssef, A.M. (eds.) Proceedings of SAC 2001, LNCS, vol. 2259, pp. 165–180. Springer, New York (2001)
Okeya, K., Takagi, T.: The width-\(w\) NAF method provides small memory and fast elliptic curve scalar multiplications against side-channel attacks. In: Joye, M. (ed.) Proceedings of CT-RSA 2003, LNCS, vol. 2612, pp. 328–342. Springer, New York (2003)
Oliveira, T., López, J., Aranha, D.F., Rodríguez-Henríquez, F.: Lambda coordinates for binary elliptic curves. In: Bertoni, G., Coron, J.-S. (eds.) Cryptographic Hardware and Embedded Systems, CHES 2013, LNCS, vol. 8086, pp. 311–330. Springer, New York (2013)
Osvik, D.A., Shamir, A., Tromer, E.: Cache attacks and countermeasures: the case of AES. In: Pointcheval, D. (ed.) Topics in Cryptology, CT-RSA 2006, vol. 3860, pp. 1–20. Springer, New York (2006)
Microsoft Research. MSR Elliptic Curve Cryptography Library (MSR ECCLib) (2014). Available at: http://research.microsoft.com/en-us/projects/nums
Sánchez, A.H., Rodríguez-Henríquez, F.: NEON implementation of an attribute-based encryption scheme. In: Jacobson, M., Locasto, M., Mohassel, P., Safavi-Naini, R. (eds.) International Conference on Applied Cryptography and Network Security, ACNS 2013, LNCS, vol. 7954, pp. 322–338. Springer, New York (2013)
Smith, B.: Families of fast elliptic curves from \(\mathbb{Q}\)-curves. In: Sako, K., Sarkar, P. (eds.) Advances in Cryptology, ASIACRYPT, LNCS, vol. 8269, pp. 61–78. Springer, New York (2013)
Weber, D., Denny, T.F.: The solution of McCurley’s discrete log challenge. In: Krawczyk, H. (ed.) Advances in Cryptology, CRYPTO, LNCS, vol. 1462, pp. 458–471. Springer, New York (1998)
Yanik, T., Savaş, E., Koç, Ç.K.: Incomplete reduction in modular arithmetic. IEE Proc. Comput. Digital Tech. 149(2), 46–52 (2002)
Yen, S.-M., Joye, M.: Checking before output may not be enough against fault-based cryptanalysis. IEEE Trans. Comput. 49(9), 967–970 (2000)
Yen, S.-M., Kim, S., Lim, S., Moon, S.-J.: A countermeasure against one physical cryptanalysis may benefit another attack. In: Kim, K. (ed.) Information Security and Cryptology, ICISC 2001, LNCS, vol. 2288, pp. 414–427. Springer, New York (2002)
Acknowledgments
We would like to thank Joppe Bos, Craig Costello, Francisco Rodríguez-Henríquez and the reviewers for their useful comments that helped us improve the quality of this work. Also, we would like to thank Francisco Rodríguez-Henríquez for giving us access to the Arndale board for the development of the ARM implementation.
Appendices
Appendix A: Comb method using the modified LSB-set representation
Let \(t\) be the bitlength of the prime subgroup order \(r\). Assume that \(k \in [1,r-1]\) is partitioned in \(w\) consecutive parts of \(d\) digits each, and each part is partitioned in \(v\) strings of \(e\) digits each, padding \(k\) with \((dw-t)\) zeros to the left, where \(l = dw\), \(d = ev\) and \(e = \lceil t/(wv) \rceil \). The modified LSB-set representation of \(k\) is given by
where \(b_i \in \{1,-1\}\) for \(0 \le i<d\), and \(b_i \in \{0,b_{i \bmod d}\}\) for \(d \le i \le l-1\). If \(wv \mid t\) then the carry bit \(c \in \{0,1\}\). Otherwise, \(c\) is always zero. Disregarding the carry bit, rewrite the representation (2) in matrix form as follows:
where each \(K^{w'}\) consists of \(v\) strings of \(e\) digits each. Let the \(v'\)-th string in a given \(K^{w'}\) be denoted by \(K^{w'}_{v'}\), and the \(e'\)-th digit in a given \(K^{w'}_{v'}\) be denoted by \(K^{w'}_{v',e'}\), such that \(K^{w'}_{v',e'} = b_{dw'+ev'+e'}\). Then, to compute the scalar multiplication, we have
Assuming that \(P[w'] = 2^{dw'} P\) for \(0 \le w' \le w-1\), then
Recall that by definition of the \(m\)LSB-set representation \(K^0_{v',e'} \in \{1,-1\}\) and \(K^{w'}_{v',e'} \in \{0,K^0_{v',e'}\}\) for \(1 \le w' \le w-1\), for a given pair of indices \((v',e')\). Assume that the following values are precomputed for all \(0 \le u < 2^{w-1}\) and \(0 \le v' < v\)
where \(u = (u_{w-2}, \ldots , u_0)_2\). Then, \(k'P\) can be rewritten as:
where digit-columns \(\mathbb {K}_{v',e'} = [K^{w-1}_{v',e'}, \ldots , K^2_{v',e'} , K^1_{v',e'}]\) \(\equiv | K^{w-1}_{v',e'} 2^{w-2} + \cdots + K^2_{v',e'} 2 + K^1_{v',e'} |\), and the sign \(s_{v',e'} = K^0_{v',e'} \in \{1,-1\}\).
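Written out from the definitions above, the identities behind Eqs. (2) to (7) take the following form. This is a sketch derived only from \(K^{w'}_{v',e'} = b_{dw'+ev'+e'}\) and \(P[w'] = 2^{dw'}P\); the name \(P[u][v']\) for the precomputed points is an assumption made here for illustration, and the typeset layout of the numbered displays may differ.

```latex
\begin{align*}
k    &= c\,2^{l} + \sum_{i=0}^{l-1} b_i\,2^{i} \;=\; c\,2^{l} + k', \\
k'   &= \sum_{w'=0}^{w-1}\sum_{v'=0}^{v-1}\sum_{e'=0}^{e-1}
        K^{w'}_{v',e'}\,2^{dw'+ev'+e'}, \\
k'P  &= \sum_{e'=0}^{e-1} 2^{e'} \sum_{v'=0}^{v-1} 2^{ev'}
        \sum_{w'=0}^{w-1} K^{w'}_{v',e'}\,P[w'], \\
P[u][v'] &= 2^{ev'}\bigl(u_{w-2}P[w-1] + \cdots + u_{0}P[1] + P[0]\bigr), \\
k'P  &= \sum_{e'=0}^{e-1} 2^{e'} \sum_{v'=0}^{v-1}
        s_{v',e'}\,P[\mathbb{K}_{v',e'}][v'].
\end{align*}
```

The last line follows from the third because every nonzero \(K^{w'}_{v',e'}\) with \(w' \ge 1\) equals the sign \(s_{v',e'} = K^0_{v',e'}\), so the sign can be factored out of the inner sum and the remaining 0/1 pattern selects the table index \(\mathbb {K}_{v',e'}\).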
Based on Eq. (7), \(k'P\) can be computed from left-to-right using precomputed points (6) together with a variant of the double-and-add algorithm (see Algorithm 5). The final result is obtained after a final correction computing \(kP = k'P + c \cdot 2^{wd}P\) using the precomputed value \(2^{wd}P\).
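The recoding and the left-to-right evaluation described in this appendix can be sketched in Python as follows. This is a minimal illustration and not the authors' Algorithm 5: it assumes an odd scalar, uses integers modulo `m` as a stand-in for the elliptic curve group (so the "point" \(P\) is the integer `g`), and makes no attempt at constant-time table lookups. The function names `mlsb_recode` and `comb_mul` are hypothetical.

```python
def mlsb_recode(k, t, w, v):
    """Recode an odd k < 2**t into mLSB-set digits b[0..l-1] plus a carry."""
    if k <= 0 or k % 2 == 0 or k >= 1 << t:
        raise ValueError("this sketch expects an odd k with 0 < k < 2**t")
    e = -(-t // (w * v))          # e = ceil(t / (w*v))
    d = e * v
    l = d * w
    b = [0] * l
    b[d - 1] = 1                  # top digit of the lowest block is always +1
    for i in range(d - 1):        # remaining low digits are +1 or -1
        b[i] = 2 * ((k >> (i + 1)) & 1) - 1
    c = k >> d
    for i in range(d, l):         # upper digits are 0 or copy the sign below
        b[i] = b[i % d] * (c & 1)
        c = (c - b[i]) // 2       # exact division: c - b[i] is even
    return b, c, d, e


def comb_mul(k, g, m, t, w, v):
    """Return k*g mod m via the comb evaluation (integers mod m replace points)."""
    b, carry, d, e = mlsb_recode(k, t, w, v)
    base = [(g << (d * wp)) % m for wp in range(w)]      # P[w'] = 2^(d*w') P
    table = []                                            # table[u][v']
    for u in range(1 << (w - 1)):
        acc = base[0] + sum(base[j + 1] for j in range(w - 1) if (u >> j) & 1)
        table.append([(acc << (e * vp)) % m for vp in range(v)])
    q = 0
    for ep in reversed(range(e)):                         # left-to-right over e'
        q = (2 * q) % m                                   # one doubling per e'
        for vp in range(v):
            i0 = e * vp + ep
            sign = b[i0]                                  # s = K^0, always +1 or -1
            u = sum(abs(b[d * wp + i0]) << (wp - 1) for wp in range(1, w))
            q = (q + sign * table[u][vp]) % m
    return (q + carry * (g << (w * d))) % m               # final carry correction
```

A production implementation would replace the indexed lookup `table[u][vp]` and the multiplication by `sign` with a constant-time table scan and a masked conditional negation, which is what makes the regular digit pattern useful against timing attacks.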
Appendix B: Formulas for endomorphisms \(\Phi \) and \(\Psi \) on curve Ted127-glv4
Let \(P = (X_1,Y_1,Z_1)\) be a point in homogeneous projective coordinates on a twisted Edwards curve with Eq. (1), \(u = 1+i\) be a quadratic non-residue in \(\mathbb {F}_{p^2}\), and \(\zeta _8 = u/\sqrt{2}\) be a primitive 8th root of unity. Then, we can compute \(\Phi (P) = (X_2,Y_2,Z_2,T_2)\) as follows:
where \(\alpha = \zeta _8^3 + 2\zeta _8^2 + \zeta _8, \;\theta = \zeta _8^3 - 2\zeta _8^2 + \zeta _8, \;\mu = 2\zeta _8^3 + \zeta _8^2 - 1, \;\gamma = 2\zeta _8^3 - \zeta _8^2 + 1\) and \(\phi = \zeta _8^2 - 1\). For curve Ted127-glv4, we have the fixed values
where \(A \!=\!143485135153817520976780139629062568752\), \( B = 170141183460469231731687303715884099729\).
Computing the endomorphism \(\Phi \) with the formula above costs \(12m + 2s + 5a\), or only \(8m + 1s + 5a\) if \(Z_1 = 1\). Similarly, we can compute \(\Psi (P) = (X_2,Y_2,Z_2,T_2)\) as follows:
Given the value for \(\zeta _8\) on curve Ted127-glv4, computing the endomorphism \(\Psi \) with the formula above costs approximately \(3m + 1s + 2M + 5A\), or only \(1m + 2M + 4A\) if \(Z_1 = 1\).
Appendix C: Algorithms for quadratic extension field operations exploiting interleaved ARM/NEON operations
Algorithms targeting ARM platforms for multiplication and squaring over \(\mathbb {F}_{p^2}\), with \(p = 2^{127}-c\), are detailed by Algorithms 13 and 14, respectively. These algorithms exploit functions interleaving ARM/NEON-based operations, namely double_mul_neonarm, triple_mul_neonarm and double_red_neonarm, which are detailed in Algorithms 8, 9 and 10, respectively.
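To make the arithmetic behind these routines concrete, the following Python sketch shows one common way to multiply and square in \(\mathbb {F}_{p^2}\) with three and two base-field multiplications, respectively, and with reductions applied only to the two output coordinates. It assumes \(\mathbb {F}_{p^2} = \mathbb {F}_p[i]/(i^2+1)\) and the value \(c = 5997\); the text only states \(p = 2^{127}-c\), so both are assumptions, and the function names are hypothetical. It says nothing about how the work is split between the ARM and NEON units.

```python
C = 5997                 # assumption: the text only gives p = 2^127 - c
P127 = 2**127 - C


def fp2_mul(a, b, p=P127):
    """(a0 + a1*i) * (b0 + b1*i) with i^2 = -1, using three multiplications."""
    a0, a1 = a
    b0, b1 = b
    t0 = a0 * b0                         # three unreduced integer products
    t1 = a1 * b1
    t2 = (a0 + a1) * (b0 + b1)
    # two reductions, one per output coordinate
    return ((t0 - t1) % p, (t2 - t0 - t1) % p)


def fp2_sqr(a, p=P127):
    """(a0 + a1*i)^2 with i^2 = -1, using two multiplications."""
    a0, a1 = a
    return (((a0 + a1) * (a0 - a1)) % p, (2 * a0 * a1) % p)
```

Delaying the modular reduction until the products have been combined is the same idea as the incomplete or lazy reduction commonly used in optimized field arithmetic; Python's arbitrary-precision integers hide the word-level carry handling that a real implementation has to manage.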
Appendix D: Cost of fixed-base scalar multiplication using the \(m\)LSB-set comb method
In Table 6, we present estimated costs in terms of multiplications over \(\mathbb {F}_{p^2}\) per bit for fixed-base scalar multiplication on curve Ted127-glv4 using the \(m\)LSB-set comb method (Algorithm 5). Precomputed points are stored as \((x,y)\) coordinates (“affine”) or as \((x+y, y-x,2t)\) coordinates (“extended”). Best results for a given memory requirement are in bold.
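The parameters that drive such a table follow directly from the definitions in Appendix A: \(e = \lceil t/(wv) \rceil \), \(d = ev\), \(l = dw\), and \(v \cdot 2^{w-1}\) precomputed points. The Python sketch below tabulates them for a given \((w, v)\). The operation counts assume the accumulator is initialized from the first table entry, so that the first doubling and addition are free; the function name and the example values for `t` and `bytes_per_point` are illustrative assumptions, not figures taken from Table 6.

```python
def comb_parameters(t, w, v, bytes_per_point):
    """Derive comb-method sizes and point-operation counts for given (t, w, v)."""
    e = -(-t // (w * v))               # e = ceil(t / (w*v))
    d = e * v
    l = d * w
    points = v * (1 << (w - 1))        # one entry per (u, v') pair
    return {
        "e": e,
        "d": d,
        "l": l,
        "points": points,
        "table_bytes": points * bytes_per_point,
        "doublings": e - 1,            # one per e' iteration after the first
        "additions": e * v - 1,        # one per (e', v') pair after the first
    }


# Illustrative values only: a 251-bit order and 96-byte table entries
# (three 32-byte field elements) are assumptions, not data from the text.
params = comb_parameters(t=251, w=5, v=4, bytes_per_point=96)
```

Increasing \(v\) lowers the number of doublings at the price of a proportionally larger table, while increasing \(w\) lowers both doublings and additions but doubles the table with every step.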
Appendix E: Cost of fixed/variable-base double-scalar multiplication on curve Ted127-glv4 using \(w\)NAF with interleaving
In Table 7, we present estimated costs in terms of multiplications over \(\mathbb {F}_{p^2}\) per bit for fixed/variable-base double-scalar multiplication on curve Ted127-glv4 using \(w\)NAF with interleaving. Precomputations for the fixed base are stored as \((x,y)\) coordinates (“affine”) or as \((x+y, y-x,2t)\) coordinates (“extended”). The window size \(w_{1,j}\) for each sub-scalar \(j\), number of points and memory listed in the first column correspond to requirements for the fixed base. For the variable base, we fix \(w_2=4\), corresponding to the use of 16 precomputed points (see Sect. 6). Best results for a given memory requirement are in bold.
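For readers unfamiliar with the technique, the following Python sketch shows the standard width-\(w\) NAF recoding and a two-scalar interleaved evaluation in which both scalars share a single chain of doublings. It is a generic textbook formulation, not the four-dimensional variant costed in Table 7: integers modulo `m` stand in for curve points, the multiplication by a digit stands in for a lookup of a precomputed odd multiple, and the function names are hypothetical. The digit pattern is not regular, so this form is normally reserved for public inputs.

```python
def wnaf(k, w):
    """Width-w NAF digits of k > 0, least significant first."""
    digits = []
    while k > 0:
        if k & 1:
            d = k % (1 << w)
            if d >= 1 << (w - 1):      # map to the odd range (-2^(w-1), 2^(w-1))
                d -= 1 << w
            k -= d                      # k becomes divisible by 2^w
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def interleaved_double_mul(k1, g1, w1, k2, g2, w2, m):
    """Return (k1*g1 + k2*g2) mod m with one shared doubling per digit position."""
    n1, n2 = wnaf(k1, w1), wnaf(k2, w2)
    q = 0
    for i in reversed(range(max(len(n1), len(n2)))):
        q = (2 * q) % m
        if i < len(n1) and n1[i]:
            q = (q + n1[i] * g1) % m   # stands in for adding a precomputed odd multiple
        if i < len(n2) and n2[i]:
            q = (q + n2[i] * g2) % m
    return q
```

Using a larger window for the fixed base than for the variable base is natural here, since the fixed-base table can be computed once offline while the variable-base table must be built for every call.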
Cite this article
Faz-Hernández, A., Longa, P. & Sánchez, A.H. Efficient and secure algorithms for GLV-based scalar multiplication and their implementation on GLV–GLS curves (extended version). J Cryptogr Eng 5, 31–52 (2015). https://doi.org/10.1007/s13389-014-0085-7
DOI: https://doi.org/10.1007/s13389-014-0085-7