Abstract
AVX2 is the newest instruction set on the Intel Haswell processor; it provides simultaneous execution of operations over 256-bit vectors. This work presents advances in applying AVX2 to the development of an efficient software implementation of the elliptic curve Diffie-Hellman protocol using the Curve25519 elliptic curve. We also discuss some advantages that vector instructions offer as an alternative method to accelerate prime field and elliptic curve arithmetic. Our implementation shows a slight performance improvement over the fastest state-of-the-art implementations.
Armando Faz-Hernández and Julio López were partially supported by the Intel Labs University Research Office.
Julio López was partially supported by FAPESP, Projeto Temático grant number 2013/25.977-7.
References
Aranha, D.F., Gouvêa, C.P.L.: RELIC is an Efficient LIbrary for Cryptography. http://code.google.com/p/relic-toolkit/
Aranha, D.F., Barreto, P.S.L.M., Pereira, G.C.C.F., Ricardini, J.E.: A note on high-security general-purpose elliptic curves. Cryptology ePrint Archive, Report 2013/647 (2013). http://eprint.iacr.org/
Bernstein, D.J.: Curve25519: new Diffie-Hellman speed records. In: Yung, M., Dodis, Y., Kiayias, A., Malkin, T. (eds.) PKC 2006. LNCS, vol. 3958, pp. 207–228. Springer, Heidelberg (2006)
Bernstein, D.J.: Cryptography in NaCl, March 2009. http://cr.yp.to/highspeed/naclcrypto-20090310.pdf
Bernstein, D.J.: DNSCurve: usable security for DNS, June 2009. http://dnscurve.org
Bernstein, D.J., Lange, T.: eBACS: ECRYPT benchmarking of cryptographic systems, March 2015. Accessed on 20 March 2015 http://bench.cr.yp.to/supercop.html
Bernstein, D.J., Lange, T.: SafeCurves: choosing safe curves for elliptic-curve cryptography (2015). Accessed 20 March 2015 http://safecurves.cr.yp.to
Bernstein, D.J., Lange, T., Schwabe, P.: NaCl: Networking and Cryptography library, October 2013. http://nacl.cr.yp.to/
Bernstein, D.J., Schwabe, P.: NEON Crypto. In: Prouff, E., Schaumont, P. (eds.) CHES 2012. LNCS, vol. 7428, pp. 320–339. Springer, Heidelberg (2012). http://dx.doi.org/10.1007/978-3-642-33027-8_19
Bos, J.W., Costello, C., Longa, P., Naehrig, M.: Selecting Elliptic Curves for Cryptography: An Efficiency and Security Analysis. Cryptology ePrint Archive, Report 2014/130 (2014). http://eprint.iacr.org/
Cohen, H., Frey, G., Avanzi, R., Doche, C., Lange, T., Nguyen, K., Vercauteren, F.: Handbook of Elliptic and Hyperelliptic Curve Cryptography, (2nd edn). Chapman & Hall/CRC (2012)
Intel Corporation: Intel Pentium processor with MMX technology documentation, January 2008. http://www.intel.com/design/archives/Processors/mmx/
Intel Corporation: Define SSE2, SSE3 and SSE4, January 2009. http://www.intel.com/support/processors/sb/CS-030123.htm
Intel Corporation: Intel Advanced Vector Extensions Programming Reference, June 2011. https://software.intel.com/sites/default/files/m/f/7/c/36945
Fog, A.: Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs, December 2014
Granger, R., Scott, M.: Faster ECC over \(\mathbb{F}_{2^{521}-1}\). Cryptology ePrint Archive, Report 2014/852 (2014). http://eprint.iacr.org/
Granlund, T., the GMP development team: GNU MP: The GNU Multiple Precision Arithmetic Library, (5.0.5 edn) (2012). http://gmplib.org/
Itoh, T., Tsujii, S.: A fast algorithm for computing multiplicative inverses in GF\((2^m)\) using normal bases. Inf. Comput. 78(3), 171–177 (1988). http://dx.doi.org/10.1016/0890-5401(88)90024-7
Montgomery, P.L.: Speeding the Pollard and elliptic curve methods of factorization. Math. Comput. 48(177), 243–264 (1987). http://dx.doi.org/10.2307/2007888
National Institute of Standards and Technology: Digital Signature Standard (DSS). FIPS Publication 186, May 1994. http://www.bibsonomy.org/bibtex/2a98c67565fa98cc7c90d7d622c1ad252/dret
Shell, O.S.: OpenSSH, January 2014. http://www.openssh.com/txt/release-6.5
Solinas, J.A.: Generalized Mersenne Numbers. Technical report, Center of Applied Cryptographic Research (CACR) (1999)
The OpenSSL Project: OpenSSL: The Open Source toolkit for SSL/TLS, April 2003. www.openssl.org
Acknowledgments
The authors would like to thank the anonymous reviewers for their helpful suggestions and comments. Additionally, they would like to show their gratitude to Jérémie Detrey for his valuable comments on an earlier version of the manuscript.
Appendices
A Relevant AVX2 Instructions
This appendix lists the most relevant instructions used in this work. For clarity, the instructions are grouped by functionality. In Table 4, the second column gives the mnemonic used in this document, the third column gives the assembler name of the instruction, and the last columns give the latency and the reciprocal throughput of each instruction. The entries were taken from Agner Fog's measurements published in [15].
B Algorithms
B.1 Implementation of Modular Squaring Using AVX2
Modular squaring follows an approach similar to that of modular multiplication. Algorithm 4 shows the scheduling of instructions used to compute the modular squaring of an interleaved tuple \(\langle \mathbf {A},\mathbf {B}\rangle \). The products \(a_{x,y}\) such that \(\nu _{x,y}=2\) are computed in the inner loops (lines 12 to 15 and 20 to 23); once these products have been accumulated, they are multiplied by 2 using shift instructions. Finally, lines 26 to 29 compute the modular reduction.
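The doubling of accumulated cross products can be illustrated with a small scalar sketch in Python. It is not Algorithm 4: it assumes five limbs in a uniform radix of \(2^{51}\) rather than the interleaved AVX2 layout, and the names `to_limbs`, `from_limbs`, and `square_limbs` are hypothetical. Cross products are accumulated once and doubled with a single shift, and the upper half is folded back using \(2^{255} \equiv 19 \pmod p\).

```python
P = 2**255 - 19          # Curve25519 prime
RADIX = 51               # assumed uniform limb width (not the paper's layout)
NLIMBS = 5
MASK = (1 << RADIX) - 1


def to_limbs(x):
    """Split 0 <= x < 2^255 into NLIMBS limbs of RADIX bits each."""
    return [(x >> (RADIX * i)) & MASK for i in range(NLIMBS)]


def from_limbs(limbs):
    """Recombine (possibly oversized) limbs into an integer modulo P."""
    return sum(l << (RADIX * i) for i, l in enumerate(limbs)) % P


def square_limbs(a):
    """Square a limb vector, doubling the cross products with a shift."""
    n = len(a)
    diag = [0] * (2 * n - 1)
    cross = [0] * (2 * n - 1)
    for x in range(n):
        diag[2 * x] = a[x] * a[x]            # products that occur once
        for y in range(x + 1, n):
            cross[x + y] += a[x] * a[y]      # products that occur twice
    # Multiply the accumulated cross products by 2 using a shift.
    c = [d + (t << 1) for d, t in zip(diag, cross)]
    # Fold the upper half: 2^(51*(n+k)) = 2^255 * 2^(51*k) = 19 * 2^(51*k) mod P.
    out = c[:n]
    for k in range(n, 2 * n - 1):
        out[k - n] += 19 * c[k]
    return out
```

The output limbs are not carried here; a coefficient reduction step would normally follow before the next multiplication.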
B.2 Implementation of Coefficient Reduction Using AVX2
Coefficient reduction is processed coefficient-wise. We split each coefficient into three parts \(a_i=h_i\parallel m_i\parallel l_i\) and apply the process described in Sect. 3.2. Simultaneously, each \(m_i\) (medium coefficient) is added to the corresponding \(l_{i+1}\) (low coefficient) and to \(h_{i-1}\) (high coefficient). For those coefficients that need to be reduced modulo p, the multiplication by c is computed using just shift instructions. After the coefficient reduction, each coefficient in the updated tuple has at most \(\beta _i+1\) bits.
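The three-way split can be sketched in scalar Python as follows. The limb widths `BITS` (alternating 26 and 25 bits, ten limbs) are an assumption made for illustration, as are the names `times19`, `value`, and `reduce_coefficients`; the sketch takes \(c = 19\) for \(p = 2^{255}-19\) and multiplies by it with shifts and additions. It checks only that the represented value is preserved and does not try to reproduce the \(\beta _i+1\) bound, which depends on the input sizes discussed in the paper.

```python
P = 2**255 - 19
BITS = [26, 25] * 5                      # assumed limb widths beta_i, sum = 255
N = len(BITS)
WEIGHT = [sum(BITS[:i]) for i in range(N)]


def times19(x):
    """19*x computed as 16x + 2x + x, using shifts and additions."""
    return (x << 4) + (x << 1) + x


def value(a):
    """Integer represented by the coefficient vector, modulo P."""
    return sum(c << w for c, w in zip(a, WEIGHT)) % P


def reduce_coefficients(a):
    """One parallel-style pass: a_i = h_i || m_i || l_i for every i at once."""
    out = [0] * N
    for i in range(N):
        b0 = BITS[i]
        b1 = BITS[(i + 1) % N]
        low = a[i] & ((1 << b0) - 1)           # l_i stays at position i
        mid = (a[i] >> b0) & ((1 << b1) - 1)   # m_i moves to position i+1
        high = a[i] >> (b0 + b1)               # h_i moves to position i+2
        out[i] += low
        # Parts that wrap past the top limb pick up a factor 2^255 = 19 mod P.
        out[(i + 1) % N] += times19(mid) if i + 1 >= N else mid
        out[(i + 2) % N] += times19(high) if i + 2 >= N else high
    return out
```

Because every coefficient is split independently of the others, the loop body has no data dependency between iterations, which is what makes a vectorized version possible.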
B.3 Point Multiplication Using Montgomery Ladder
Algorithm 6 shows the Montgomery point multiplication, which calculates the x-coordinate of kP given the x-coordinate of P and an integer scalar k. This algorithm also requires the ladder step presented in Algorithm 1.
For use in the elliptic curve Diffie-Hellman protocol with Curve25519, the document [4] describes an encoding for the secret key when it is given as a string of bytes. Accordingly, the description of Algorithm 6 assumes that the secret key has already been encoded.
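For orientation, a Montgomery ladder on Curve25519 can be sketched in a few lines of Python. This follows the commonly published formulation (as in RFC 7748), not the paper's Algorithm 6 or its vectorized ladder step; the names `clamp` and `x25519_ladder` are hypothetical, and the clamping shown is the widely used Curve25519 key encoding, assumed here to be the one referred to above. The sketch is not constant-time and is meant for checking values only.

```python
P = 2**255 - 19
A24 = 121665                      # (486662 - 2) / 4 for Curve25519


def clamp(key_bytes):
    """Encode a 32-byte secret key as a scalar (assumed standard clamping)."""
    k = bytearray(key_bytes)
    k[0] &= 248                   # clear the three lowest bits
    k[31] &= 127                  # clear the top bit
    k[31] |= 64                   # set bit 254
    return int.from_bytes(k, "little")


def x25519_ladder(k, u):
    """Return the x-coordinate of k*P given the x-coordinate u of P."""
    x1 = u % P
    x2, z2, x3, z3 = 1, 0, x1, 1
    swap = 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        if swap ^ bit:            # branch on secret data: NOT constant-time
            x2, x3 = x3, x2
            z2, z3 = z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P                 # differential addition
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P                        # doubling
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, z2 = x3, z3
    return x2 * pow(z2, P - 2, P) % P           # x = X / Z via Fermat inversion
```

A Diffie-Hellman exchange then amounts to each party computing `x25519_ladder(clamp(secret), 9)` as its public value and applying the ladder again to the peer's public value.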
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Faz-Hernández, A., López, J. (2015). Fast Implementation of Curve25519 Using AVX2. In: Lauter, K., Rodríguez-Henríquez, F. (eds) Progress in Cryptology -- LATINCRYPT 2015. LATINCRYPT 2015. Lecture Notes in Computer Science, vol 9230. Springer, Cham. https://doi.org/10.1007/978-3-319-22174-8_18
DOI: https://doi.org/10.1007/978-3-319-22174-8_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22173-1
Online ISBN: 978-3-319-22174-8
eBook Packages: Computer Science (R0)