SMCOS: Fast and Parallel Modular Multiplication on ARM NEON Architecture for ECC

Wang, Wenjie; Wang, Wei; Lin, Jingqiang; Fu, Yu; Meng, Lingjia; Wang, Qiongxiao

doi:10.1007/978-3-030-88323-2_28

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 13007)

Included in the following conference series:

International Conference on Information Security and Cryptology

Abstract

Elliptic Curve Cryptography (ECC) is considered a more effective public-key cryptographic algorithm in some scenarios, because it uses shorter keys while providing a considerable level of security. Modular multiplication constitutes the “arithmetic foundation” of modern public-key cryptography such as ECC. In this paper, we propose the Cascade Operand Scanning for Specific Modulus (SMCOS) vectorization method to speed up the prime field multiplication of ECC on Single Instruction Multiple Data (SIMD) architectures. Two key features of our design sharply reduce the number of instructions. 1) SMCOS uses operands in a non-redundant representation to perform a “trimmed” Cascade Operand Scanning (COS) multiplication, which minimizes the cost of multiplication and other instructions. 2) One round of fast vector reduction replaces the conventional Montgomery reduction and consumes fewer instructions when reducing the intermediate results of multiplication. Furthermore, we offer a general method for pipelining vector instructions on ARM NEON platforms. With these techniques, prime field multiplication for ECC using the SMCOS method achieves the fastest execution speed reported so far on 32-bit ARM NEON platforms. Detailed benchmark results show that the proposed SMCOS method performs modular multiplication for NIST P192, Secp256k1, and Numsp256d1 within only 205, 310, and 306 clock cycles respectively, which is roughly 32% faster than the Multiplicand Reduction method and about 47% faster than the Coarsely Integrated Cascade Operand Scanning method.
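
As a rough illustration of why a modulus of special form can be reduced without Montgomery reduction, the sketch below multiplies two field elements and folds the high half of the product back into the low half. It is a scalar Python model, not the NEON-vectorized SMCOS routine: the names `CURVES`, `fold_reduce`, and `mulmod` are hypothetical, and the folding loop is the generic technique for primes of the form 2^k - c, assumed here as a stand-in for the one-round vector reduction that the paper designs. It is also not constant-time.

```python
# Illustrative scalar sketch only; NOT the vectorized SMCOS method of the paper.
# All names below are hypothetical and chosen for this example.

# Primes of the three curves named in the abstract, written as p = 2^k - c.
CURVES = {
    "NIST P192": (192, (1 << 64) + 1),     # p = 2^192 - 2^64 - 1
    "Secp256k1": (256, (1 << 32) + 977),   # p = 2^256 - 2^32 - 977
    "Numsp256d1": (256, 189),              # p = 2^256 - 189
}


def fold_reduce(x, k, c):
    """Reduce a non-negative integer x modulo p = 2^k - c.

    Uses the congruence 2^k = c (mod p): the bits above position k are
    multiplied by c and added back onto the low k bits.
    """
    p = (1 << k) - c
    mask = (1 << k) - 1
    while x >> k:                      # fold until x fits in k bits
        x = (x & mask) + (x >> k) * c
    if x >= p:                         # now x < 2^k < 2p, so one subtraction suffices
        x -= p
    return x


def mulmod(a, b, curve):
    """Multiply a and b, then reduce with the curve's special-form prime."""
    k, c = CURVES[curve]
    return fold_reduce(a * b, k, c)


def prime(curve):
    """Return the field prime p = 2^k - c for the named curve."""
    k, c = CURVES[curve]
    return (1 << k) - c
```

The result can be checked against Python's built-in `%` operator, for example `mulmod(a, b, "Secp256k1") == (a * b) % prime("Secp256k1")`. The folding step needs only a shift, a small multiplication, and an addition, which is the general reason special-form primes are attractive for fast reduction.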

This work was partially supported by Shandong Province Key Research & Development Plan/Major Science & Technology Innovation Project (Grant No. 2020CXGC010115).

Author information

Authors and Affiliations

State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100089, China
Wenjie Wang, Wei Wang, Yu Fu, Lingjia Meng & Qiongxiao Wang
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, 100089, China
Wenjie Wang, Yu Fu, Lingjia Meng & Qiongxiao Wang
Data Assurance and Communication Security Research Center, CAS, Beijing, 100089, China
Wei Wang
School of Cyber Security, University of Science and Technology of China, Hefei, 230027, Anhui, China
Jingqiang Lin
Beijing Institute, University of Science and Technology of China, Beijing, China
Jingqiang Lin

Corresponding authors

Correspondence to Wei Wang or Jingqiang Lin.

Editor information

Editors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Yu Yu
Columbia University, New York, NY, USA
Moti Yung

About this paper

Cite this paper

Wang, W., Wang, W., Lin, J., Fu, Y., Meng, L., Wang, Q. (2021). SMCOS: Fast and Parallel Modular Multiplication on ARM NEON Architecture for ECC. In: Yu, Y., Yung, M. (eds) Information Security and Cryptology. Inscrypt 2021. Lecture Notes in Computer Science, vol 13007. Springer, Cham. https://doi.org/10.1007/978-3-030-88323-2_28

DOI: https://doi.org/10.1007/978-3-030-88323-2_28
Published: 18 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88322-5
Online ISBN: 978-3-030-88323-2
eBook Packages: Computer Science, Computer Science (R0)
