Skip to main content

SMCOS: Fast and Parallel Modular Multiplication on ARM NEON Architecture for ECC

  • Conference paper
  • First Online:
Information Security and Cryptology (Inscrypt 2021)

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 13007))

Included in the following conference series:

Abstract

Elliptic Curve Cryptography (ECC) is considered a more effective public-key cryptographic algorithm in some scenarios, because it uses shorter key sizes while providing a considerable level of security. Modular multiplication constitutes the “arithmetic foundation” of modern public-key cryptography such as ECC. In this paper, we propose the Cascade Operand Scanning for Specific Modulus (SMCOS) vectorization method to speed up the prime field multiplication of ECC on Single Instruction Multiple Data (SIMD) architecture. Two key features of our design sharply reduce the number of instructions. 1) SMCOS uses operands based on non-redundant representation to perform a “trimmed” Cascade Operand Scanning (COS) multiplication, which minimizes the cost of multiplication and other instructions. 2) One round of fast vector reduction is designed to replace the conventional Montgomery reduction, which consumes less instructions for reducing intermediate results of multiplication. Further more, we offer a general method for pipelining vector instructions on ARM NEON platforms. By this means, the prime field multiplication of ECC using the SMCOS method reaches an ever-fastest execution speed on 32-bit ARM NEON platforms. Detailed benchmark results show that the proposed SMCOS method performs modular multiplication of NIST P192, Secp256k1, and Numsp256d1 within only 205, 310 and 306 clock cycles respectively, which are roughly 32% faster than the Multiplicand Reduction method, and about 47% faster than the Coarsely Integrated Cascade Operand Scanning method.

This work was partially supported by Shandong Province Key Research & Development Plan/Major Science & Technology Innovation Project (Grant No. 2020CXGC010115).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 79.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 99.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Azarderakhsh, R., Liu, Z., Seo, H., Kim, H.: NEON PQCRYTO: fast and parallel ring-LWE encryption on ARM NEON architecture. IACR Cryptol. ePrint Arch. 2015, 1081 (2015)

    Google Scholar 

  2. Bernstein, D.J., Schwabe, P.: NEON crypto. In: Prouff, E., Schaumont, P. (eds.) CHES 2012. LNCS, vol. 7428, pp. 320–339. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33027-8_19

    Chapter  Google Scholar 

  3. Bos, J.W., Montgomery, P.L., Shumow, D., Zaverucha, G.M.: Montgomery multiplication using vector instructions. In: Lange, T., Lauter, K., Lisoněk, P. (eds.) SAC 2013. LNCS, vol. 8282, pp. 471–489. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43414-7_24

    Chapter  Google Scholar 

  4. Câmara, D., Gouvêa, C.P.L., López, J., Dahab, R.: Fast software polynomial multiplication on arm processors using the NEON engine. In: Cuzzocrea, A., Kittl, C., Simos, D.E., Weippl, E., Xu, L. (eds.) CD-ARES 2013. LNCS, vol. 8128, pp. 137–154. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40588-4_10

    Chapter  Google Scholar 

  5. Cheng, H., Großschädl, J., Tian, J., Rønne, P.B., Ryan, P.Y.A.: High-throughput elliptic curve cryptography using AVX2 vector instructions. In: Dunkelman, O., Jacobson, Jr., M.J., O’Flynn, C. (eds.) SAC 2020. LNCS, vol. 12804, pp. 698–719. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-81652-0_27

    Chapter  MATH  Google Scholar 

  6. ARM Cortex: A9 NEON media processing engine technical reference manual revision: r4p1 (2012)

    Google Scholar 

  7. Faz-Hernández, A., López, J.: Fast implementation of Curve25519 using AVX2. In: Lauter, K., Rodríguez-Henríquez, F. (eds.) LATINCRYPT 2015. LNCS, vol. 9230, pp. 329–345. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22174-8_18

    Chapter  Google Scholar 

  8. Faz-Hernández, A., Lopez, J., Dahab, R.: High-performance implementation of elliptic curve cryptography using vector instructions. ACM Trans. Math. Softw. (TOMS) 45(3), 1–35 (2019)

    Article  MathSciNet  Google Scholar 

  9. Grewal, G., Azarderakhsh, R., Longa, P., Hu, S., Jao, D.: Efficient implementation of bilinear pairings on arm processors. In: Knudsen, L.R., Wu, H. (eds.) SAC 2012. LNCS, vol. 7707, pp. 149–165. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35999-6_11

    Chapter  Google Scholar 

  10. Gueron, S., Krasnov, V.: Software implementation of modular exponentiation, using advanced vector instructions architectures. In: Özbudak, F., Rodríguez-Henríquez, F. (eds.) WAIFI 2012. LNCS, vol. 7369, pp. 119–135. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31662-3_9

    Chapter  MATH  Google Scholar 

  11. Hisil, H., Egrice, B., Yassi, M.: Fast 4 way vectorized ladder for the complete set of montgomery curves. IACR Cryptol. ePrint Arch. 2020, 388 (2020)

    Google Scholar 

  12. Holdings, A.: Arm architecture reference manual, ARMV7-A AND ARMV7-R edition. Arm Holdings (2014)

    Google Scholar 

  13. Huang, J., Liu, Z., Hu, Z., Großschädl, J.: Parallel implementation of SM2 elliptic curve cryptography on Intel processors with AVX2. In: Liu, J.K., Cui, H. (eds.) ACISP 2020. LNCS, vol. 12248, pp. 204–224. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-55304-3_11

    Chapter  Google Scholar 

  14. Intel Corporation: Using streaming SIMD extensions (SSE2) to perform big multiplications, application note AP-941, July 2000. http://software.intel.com/sites/default/files/14/4f/24960

  15. Kocher, P.C.: Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp. 104–113. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-68697-5_9

    Chapter  Google Scholar 

  16. Longa, P.: Four\(\mathbb{Q}\)NEON: faster elliptic curve scalar multiplications on ARM processors. In: Avanzi, R., Heys, H. (eds.) SAC 2016. LNCS, vol. 10532, pp. 501–519. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69453-5_27

    Chapter  Google Scholar 

  17. Márquez, R.C., Sarmiento, A.J.C., Sánchez-Solano, S.: Speeding up elliptic curve arithmetic on arm processors using neon instructions. Revista Ingeniería Electrónica, Automática y Comunicaciones 41(3), 1–20 (2020). ISSN: 1815-5928

    Google Scholar 

  18. Microsoft Research: MSR Elliptic Curve Cryptography library (MSR ECClib) (2014). http://research.microsoft.com/en-us/projects/nums

  19. Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44(170), 519–521 (1985)

    Article  MathSciNet  Google Scholar 

  20. Oder, T., Pöppelmann, T., Güneysu, T.: Beyond ecdsa and rsa: Lattice-based digital signatures on constrained devices. In: 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC). pp. 1–6. IEEE (2014)

    Google Scholar 

  21. OpenSSL: The open source toolkit for SSL. Download at https://www.openssl.org

  22. Orisaka, G., Aranha, D.F., López, J.: Finite field arithmetic using AVX-512 for isogeny-based cryptography. In: Anais do XVIII Simpósio Brasileiro em Segurança da Informação e de Sistemas Computacionais, pp. 49–56. SBC (2018)

    Google Scholar 

  23. Wuille, P., et al.: libsecp256k1: Optimized C library for EC operations on curve Secp256k1 (2015)

    Google Scholar 

  24. Pabbuleti, K.C., Mane, D.H., Desai, A., Albert, C., Schaumont, P.: SIMD acceleration of modular arithmetic on contemporary embedded platforms. In: 2013 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–6. IEEE (2013)

    Google Scholar 

  25. Page, D., Smart, N.P.: Parallel cryptographic arithmetic using a redundant montgomery representation. IEEE Trans. Comput. 53(11), 1474–1482 (2004)

    Article  Google Scholar 

  26. Sánchez, A.H., Rodríguez-Henríquez, F.: NEON implementation of an attribute-based encryption scheme. In: Jacobson, M., Locasto, M., Mohassel, P., Safavi-Naini, R. (eds.) ACNS 2013. LNCS, vol. 7954, pp. 322–338. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38980-1_20

    Chapter  Google Scholar 

  27. Seo, H., Liu, Z., Großschädl, J., Choi, J., Kim, H.: Montgomery modular multiplication on ARM-NEON revisited. In: Lee, J., Kim, J. (eds.) ICISC 2014. LNCS, vol. 8949, pp. 328–342. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-15943-0_20

    Chapter  Google Scholar 

  28. Seo, H., Liu, Z., Großschädl, J., Kim, H.: Efficient arithmetic on ARM-NEON and its application for high-speed RSA implementation. Secur. Commun. Netw. 9(18), 5401–5411 (2016)

    Article  Google Scholar 

  29. Walter, C.D., Thompson, S.: Distinguishing exponent digits by observing modular subtractions. In: Naccache, D. (ed.) CT-RSA 2001. LNCS, vol. 2020, pp. 192–207. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45353-9_15

    Chapter  Google Scholar 

  30. Wang, J., Vadnala, P.K., Großschädl, J., Xu, Q.: Higher-order masking in practice: a vector implementation of masked AES for ARM NEON. In: Nyberg, K. (ed.) CT-RSA 2015. LNCS, vol. 9048, pp. 181–198. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16715-2_10

    Chapter  MATH  Google Scholar 

  31. Zhao, Y., Pan, W., Lin, J., Liu, P., Xue, C., Zheng, F.: PhiRSA: exploiting the computing power of vector instructions on Intel Xeon Phi for RSA. In: Avanzi, R., Heys, H. (eds.) SAC 2016. LNCS, vol. 10532, pp. 482–500. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69453-5_26

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Wei Wang or Jingqiang Lin .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Wang, W., Wang, W., Lin, J., Fu, Y., Meng, L., Wang, Q. (2021). SMCOS: Fast and Parallel Modular Multiplication on ARM NEON Architecture for ECC. In: Yu, Y., Yung, M. (eds) Information Security and Cryptology. Inscrypt 2021. Lecture Notes in Computer Science(), vol 13007. Springer, Cham. https://doi.org/10.1007/978-3-030-88323-2_28

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-88323-2_28

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88322-5

  • Online ISBN: 978-3-030-88323-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics