Fast Implementation of Curve25519 Using AVX2

Faz-Hernández, Armando; López, Julio

doi:10.1007/978-3-319-22174-8_18

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9230)

Abstract

AVX2 is the newest instruction set on the Intel Haswell processor; it provides simultaneous execution of operations over 256-bit vectors. This work presents advances in applying AVX2 to the development of an efficient software implementation of the elliptic curve Diffie-Hellman protocol using the Curve25519 elliptic curve. We also discuss some advantages that vector instructions offer as an alternative method to accelerate prime field and elliptic curve arithmetic. Our implementation shows a slight performance improvement over the fastest state-of-the-art implementations.

Armando Faz-Hernández and Julio López were partially supported by the Intel Labs University Research Office.

Julio López was partially supported by FAPESP, Projeto Temático grant number 2013/25.977-7.

References

1. Aranha, D.F., Gouvêa, C.P.L.: RELIC is an Efficient LIbrary for Cryptography. http://code.google.com/p/relic-toolkit/
2. Aranha, D.F., Barreto, P.S.L.M., Pereira, G.C.C.F., Ricardini, J.E.: A note on high-security general-purpose elliptic curves. Cryptology ePrint Archive, Report 2013/647 (2013). http://eprint.iacr.org/
3. Bernstein, D.J.: Curve25519: new Diffie-Hellman speed records. In: Yung, M., Dodis, Y., Kiayias, A., Malkin, T. (eds.) PKC 2006. LNCS, vol. 3958, pp. 207–228. Springer, Heidelberg (2006)
4. Bernstein, D.J.: Cryptography in NaCl, March 2009. http://cr.yp.to/highspeed/naclcrypto-20090310.pdf
5. Bernstein, D.J.: DNSCurve: usable security for DNS, June 2009. http://dnscurve.org
6. Bernstein, D.J., Lange, T.: eBACS: ECRYPT benchmarking of cryptographic systems, March 2015. Accessed 20 March 2015. http://bench.cr.yp.to/supercop.html
7. Bernstein, D.J., Lange, T.: SafeCurves: choosing safe curves for elliptic-curve cryptography (2015). Accessed 20 March 2015. http://safecurves.cr.yp.to
8. Bernstein, D.J., Lange, T., Schwabe, P.: NaCl: Networking and Cryptography library, October 2013. http://nacl.cr.yp.to/
9. Bernstein, D.J., Schwabe, P.: NEON Crypto. In: Prouff, E., Schaumont, P. (eds.) CHES 2012. LNCS, vol. 7428, pp. 320–339. Springer, Heidelberg (2012). http://dx.doi.org/10.1007/978-3-642-33027-8_19
10. Bos, J.W., Costello, C., Longa, P., Naehrig, M.: Selecting Elliptic Curves for Cryptography: An Efficiency and Security Analysis. Cryptology ePrint Archive, Report 2014/130 (2014). http://eprint.iacr.org/
11. Cohen, H., Frey, G., Avanzi, R., Doche, C., Lange, T., Nguyen, K., Vercauteren, F.: Handbook of Elliptic and Hyperelliptic Curve Cryptography, 2nd edn. Chapman & Hall/CRC (2012)
12. Intel Corporation: Intel Pentium processor with MMX technology documentation, January 2008. http://www.intel.com/design/archives/Processors/mmx/
13. Intel Corporation: Define SSE2, SSE3 and SSE4, January 2009. http://www.intel.com/support/processors/sb/CS-030123.htm
14. Intel Corporation: Intel Advanced Vector Extensions Programming Reference, June 2011. https://software.intel.com/sites/default/files/m/f/7/c/36945
15. Fog, A.: Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs, December 2014
16. Granger, R., Scott, M.: Faster ECC over \(\mathbb{F}_{2^{521}-1}\). Cryptology ePrint Archive, Report 2014/852 (2014). http://eprint.iacr.org/
17. Granlund, T., the GMP development team: GNU MP: The GNU Multiple Precision Arithmetic Library, 5.0.5 edn (2012). http://gmplib.org/
18. Itoh, T., Tsujii, S.: A fast algorithm for computing multiplicative inverses in GF\((2^m)\) using normal bases. Inf. Comput. 78(3), 171–177 (1988). http://dx.doi.org/10.1016/0890-5401(88)90024-7
19. Montgomery, P.L.: Speeding the Pollard and elliptic curve methods of factorization. Math. Comput. 48(177), 243–264 (1987). http://dx.doi.org/10.2307/2007888
20. National Institute of Standards and Technology: Digital Signature Standard (DSS). FIPS Publication 186, May 1994. http://www.bibsonomy.org/bibtex/2a98c67565fa98cc7c90d7d622c1ad252/dret
21. Shell, O.S.: OpenSSH, January 2014. http://www.openssh.com/txt/release-6.5
22. Solinas, J.A.: Generalized Mersenne Numbers. Technical report, Center of Applied Cryptographic Research (CACR) (1999)
23. The OpenSSL Project: OpenSSL: The Open Source toolkit for SSL/TLS, April 2003. www.openssl.org

Acknowledgments

The authors would like to thank the anonymous reviewers for their helpful suggestions and comments. Additionally, they would like to show their gratitude to Jérémie Detrey for his valuable comments on an earlier version of the manuscript.

Author information

Authors and Affiliations

Institute of Computing, University of Campinas, 1251 Albert Einstein, Cidade Universitaria, Campinas, Brazil
Armando Faz-Hernández & Julio López

Corresponding author

Correspondence to Armando Faz-Hernández.

Editor information

Editors and Affiliations

Microsoft Research, Redmond, Washington, USA
Kristin Lauter
CINVESTAV-IPN, Mexico City, Distrito Federal, Mexico
Francisco Rodríguez-Henríquez

Appendices

A Relevant AVX2 Instructions

This appendix lists the most relevant instructions used in this work, grouped by functionality. In Table 4, the second column gives the mnemonic used in this document, the third column gives the assembler name of the instruction, and the last columns give the latency and the reciprocal throughput of each instruction. These entries were taken from Agner Fog's measurements published in [15].

B Algorithms

B.1 Implementation of Modular Squaring Using AVX2

To compute the modular squaring, we follow an approach similar to the one used for modular multiplication. Algorithm 4 shows the scheduling of instructions used to compute the modular squaring of an interleaved tuple \(\langle \mathbf {A},\mathbf {B}\rangle \). The products \(a_{x,y}\) such that \(\nu _{x,y}=2\) are computed in the inner loops (lines 12 to 15 and 20 to 23); once these products have been accumulated, they are multiplied by 2 using shift instructions. Finally, lines 26 to 29 compute the modular reduction.
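The doubling trick described above can be illustrated with a minimal scalar sketch. It is not Algorithm 4 and uses no AVX2: it assumes a hypothetical uniform radix of \(2^{51}\) with five limbs (not the representation used in the paper), and it assumes that the products with \(\nu_{x,y}=2\) are the cross terms with \(x \ne y\). All function names are illustrative.

```python
# Scalar sketch of squaring modulo p = 2^255 - 19 (illustrative only).
# Assumption: 5 limbs of 51 bits each; the paper's limb layout differs.
P = 2**255 - 19
BETA = 51
N = 5
MASK = (1 << BETA) - 1


def to_limbs(a):
    """Split an integer 0 <= a < 2^255 into N limbs of BETA bits."""
    return [(a >> (BETA * i)) & MASK for i in range(N)]


def from_limbs(limbs):
    """Recombine limbs (possibly oversized) into a single integer."""
    return sum(c << (BETA * i) for i, c in enumerate(limbs))


def square_limbs(a):
    """Return limbs congruent to a^2 modulo P; limbs are not carry-reduced."""
    acc = [0] * (2 * N - 1)
    # Cross terms a[x]*a[y] with x < y occur twice in the square:
    # compute each once and accumulate.
    for x in range(N):
        for y in range(x + 1, N):
            acc[x + y] += a[x] * a[y]
    # Multiply the accumulated cross terms by 2 with a shift.
    acc = [c << 1 for c in acc]
    # Diagonal terms a[x]^2 occur only once.
    for x in range(N):
        acc[2 * x] += a[x] * a[x]
    # Modular reduction: 2^255 = 19 (mod P), so column k >= N folds
    # into column k - N with a factor of 19.
    out = acc[:N]
    for k in range(N, 2 * N - 1):
        out[k - N] += 19 * acc[k]
    return out
```

Computing each cross term once and doubling the sum saves roughly half of the limb multiplications compared with a generic multiplication of the same operand by itself.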

B.2 Implementation of Coefficient Reduction Using AVX2

The coefficient reduction is processed coefficient-wise. We split each coefficient into three parts \(a_i=h_i\parallel m_i\parallel l_i\) and apply the process described in Sect. 3.2. Simultaneously, each \(m_i\) (medium part) is added to the corresponding \(l_{i+1}\) (low part) and to \(h_{i-1}\) (high part). For those coefficients that need to be reduced modulo p, we compute the multiplication by c using only shift instructions. After the coefficient reduction, each coefficient in the updated tuple has at most \(\beta _i+1\) bits.
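The three-way split can be sketched in scalar form as follows. This is an illustration under stated assumptions, not Algorithm 5: it assumes a hypothetical uniform limb size of 51 bits with five limbs, takes \(c = 19\) for \(p = 2^{255}-19\), and uses plain Python integers instead of vector registers. The function names are illustrative.

```python
# Scalar sketch of coefficient-wise reduction (illustrative only).
# Assumption: 5 limbs with a uniform size of 51 bits; c = 19.
P = 2**255 - 19
BETA = 51
N = 5
MASK = (1 << BETA) - 1


def mul19(x):
    """Multiply by c = 19 with shifts and additions: 16x + 2x + x."""
    return (x << 4) + (x << 1) + x


def value(limbs):
    """Integer represented by a list of (possibly oversized) limbs."""
    return sum(c << (BETA * i) for i, c in enumerate(limbs))


def reduce_coefficients(a):
    """Split every limb as h || m || l and recombine the parts in one pass."""
    low = [c & MASK for c in a]
    mid = [(c >> BETA) & MASK for c in a]
    high = [c >> (2 * BETA) for c in a]
    out = list(low)
    for i in range(N):
        # m_i carries into position i + 1, h_i into position i + 2.
        # Parts that overflow past the last limb wrap around times 19.
        j = i + 1
        out[j % N] += mid[i] if j < N else mul19(mid[i])
        j = i + 2
        out[j % N] += high[i] if j < N else mul19(high[i])
    return out
```

Because all three parts are extracted before any addition, the limbs can be processed independently, which is what makes the step suitable for vector instructions. In this uniform-radix sketch the wrapped parts are scaled by 19, so it demonstrates only that the value is preserved modulo p; the \(\beta_i+1\) bound stated above depends on the paper's representation and input bounds.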

B.3 Point Multiplication Using Montgomery Ladder

Algorithm 6 shows the Montgomery point multiplication, which calculates the x-coordinate of kP given the x-coordinate of P and an integer scalar k. This algorithm also requires the ladder step presented in Algorithm 1.

For its use in the elliptic curve Diffie-Hellman protocol with Curve25519, the document [4] describes an encoding for the secret key when it is given as a string of bytes. Accordingly, the description of Algorithm 6 assumes that the secret key has already been encoded.
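For orientation, an x-coordinate-only Montgomery ladder can be sketched in a few lines of Python. This is not Algorithm 6 or Algorithm 1 from the paper: it follows the widely used formulation from RFC 7748, operates on plain integers rather than interleaved vectors, and is not constant-time. The `clamp` helper shows the usual Curve25519 secret-key encoding, on the assumption that this is the encoding the text refers to; the names `ladder`, `cswap`, and `clamp` are illustrative.

```python
# Sketch of an x-only Montgomery ladder on Curve25519 (RFC 7748 style).
# Not constant-time; for illustration only.
P = 2**255 - 19
A24 = 121665  # (486662 - 2) / 4


def clamp(secret):
    """Encode a 32-byte secret key as an integer scalar (assumed encoding)."""
    b = bytearray(secret)
    b[0] &= 248    # clear the three lowest bits
    b[31] &= 127   # clear bit 255
    b[31] |= 64    # set bit 254
    return int.from_bytes(b, "little")


def cswap(swap, a, b):
    """Swap a and b when swap == 1, using a mask instead of a branch."""
    t = (-swap) & (a ^ b)
    return a ^ t, b ^ t


def ladder(k, u):
    """Return the x-coordinate of kP, given x(P) = u and 0 <= k < 2^255."""
    x1 = u % P
    x2, z2, x3, z3 = 1, 0, x1, 1
    swap = 0
    for t in reversed(range(255)):
        kt = (k >> t) & 1
        swap ^= kt
        x2, x3 = cswap(swap, x2, x3)
        z2, z3 = cswap(swap, z2, z3)
        swap = kt
        # Ladder step: one differential addition and one doubling.
        a = (x2 + z2) % P
        aa = a * a % P
        b = (x2 - z2) % P
        bb = b * b % P
        e = (aa - bb) % P
        da = (x3 - z3) * a % P
        cb = (x3 + z3) * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    x2, x3 = cswap(swap, x2, x3)
    z2, z3 = cswap(swap, z2, z3)
    # Convert from projective to affine x with a Fermat inversion.
    return x2 * pow(z2, P - 2, P) % P
```

A Diffie-Hellman exchange with base point x = 9 would then compute `ladder(clamp(sk_a), ladder(clamp(sk_b), 9))` on one side and the same expression with the keys exchanged on the other; both sides obtain the same value.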

Cite this paper

Faz-Hernández, A., López, J. (2015). Fast Implementation of Curve25519 Using AVX2. In: Lauter, K., Rodríguez-Henríquez, F. (eds) Progress in Cryptology -- LATINCRYPT 2015. LATINCRYPT 2015. Lecture Notes in Computer Science, vol 9230. Springer, Cham. https://doi.org/10.1007/978-3-319-22174-8_18

DOI: https://doi.org/10.1007/978-3-319-22174-8_18
Published: 15 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22173-1
Online ISBN: 978-3-319-22174-8
eBook Packages: Computer Science, Computer Science (R0)
