Software Implementation of Binary Elliptic Curves: Impact of the Carry-Less Multiplier on Scalar Multiplication

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6917)


The availability of a new carry-less multiplication instruction in the latest Intel desktop processors significantly accelerates multiplication in binary fields, and hence presents an opportunity to reevaluate algorithms for binary field arithmetic and scalar multiplication over elliptic curves. We describe how best to employ this instruction in field multiplication, and its effect on the performance of doubling and halving operations. Alternative strategies for implementing inversion and half-trace are examined to restore most of their competitiveness relative to the new multiplier. These improvements in field arithmetic are complemented by a study of serial and parallel approaches for Koblitz and random curves, in which parallelization strategies are implemented and compared. The contributions are illustrated with experimental results that improve the state-of-the-art performance of halving-based and doubling-based scalar multiplication on NIST curves at the 112- and 192-bit security levels, and that set a new speed record for side-channel-resistant scalar multiplication on a random curve at the 128-bit security level.
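
For readers unfamiliar with the operation, the following is a minimal sketch of what a carry-less multiplication computes and how it yields a binary field product. It is a bit-at-a-time software model, not the implementation described in the paper: the hardware instruction (PCLMULQDQ on Intel processors) produces the carry-less product of two 64-bit operands directly. The function names `clmul`, `reduce_poly` and `field_mul` are hypothetical, and the small field GF(2^8) with the AES reduction polynomial is an assumption chosen only to keep the example short.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product: multiply a and b as polynomials over GF(2).

    Bit i of an integer is the coefficient of x^i. Partial products are
    combined with XOR instead of addition, so no carries propagate.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a  # add (XOR) the current shifted copy of a
        a <<= 1
        b >>= 1
    return result


def reduce_poly(c: int, modulus: int) -> int:
    """Reduce the polynomial c modulo `modulus` over GF(2)."""
    deg = modulus.bit_length() - 1
    while c.bit_length() > deg:
        # Cancel the leading term of c with an aligned copy of the modulus.
        shift = c.bit_length() - 1 - deg
        c ^= modulus << shift
    return c


def field_mul(a: int, b: int, modulus: int) -> int:
    """Binary field multiplication: carry-less product, then reduction."""
    return reduce_poly(clmul(a, b), modulus)


# Illustrative field only: GF(2^8) with x^8 + x^4 + x^3 + x + 1 (0x11B).
AES_POLY = 0x11B

if __name__ == "__main__":
    print(bin(clmul(0b11, 0b11)))             # (x + 1)^2 = x^2 + 1 -> 0b101
    print(hex(field_mul(0x57, 0x83, AES_POLY)))  # -> 0xc1
```

The same two-step structure (carry-less product followed by reduction modulo the field polynomial) applies to the much larger fields used for elliptic curves; there the operands span several machine words and the product is assembled from several word-sized carry-less multiplications.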


Keywords: Elliptic curve cryptography, finite field arithmetic, parallel algorithm, efficient software implementation



Copyright information

© International Association for Cryptologic Research 2011

Authors and Affiliations

  1. Université de Lyon, Université Lyon1, ISFA, France
  2. Computer Science Department, CINVESTAV-IPN, México
  3. Institute of Computing, University of Campinas, Brazil
  4. Auburn University, USA
