Abstract
Elliptic Curve Cryptography (ECC) is considered a more effective public-key cryptographic algorithm in some scenarios, because it uses shorter key sizes while providing a considerable level of security. Modular multiplication constitutes the “arithmetic foundation” of modern public-key cryptography such as ECC. In this paper, we propose the Cascade Operand Scanning for Specific Modulus (SMCOS) vectorization method to speed up the prime field multiplication of ECC on Single Instruction Multiple Data (SIMD) architecture. Two key features of our design sharply reduce the number of instructions. 1) SMCOS uses operands based on non-redundant representation to perform a “trimmed” Cascade Operand Scanning (COS) multiplication, which minimizes the cost of multiplication and other instructions. 2) One round of fast vector reduction is designed to replace the conventional Montgomery reduction, which consumes less instructions for reducing intermediate results of multiplication. Further more, we offer a general method for pipelining vector instructions on ARM NEON platforms. By this means, the prime field multiplication of ECC using the SMCOS method reaches an ever-fastest execution speed on 32-bit ARM NEON platforms. Detailed benchmark results show that the proposed SMCOS method performs modular multiplication of NIST P192, Secp256k1, and Numsp256d1 within only 205, 310 and 306 clock cycles respectively, which are roughly 32% faster than the Multiplicand Reduction method, and about 47% faster than the Coarsely Integrated Cascade Operand Scanning method.
This work was partially supported by Shandong Province Key Research & Development Plan/Major Science & Technology Innovation Project (Grant No. 2020CXGC010115).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Azarderakhsh, R., Liu, Z., Seo, H., Kim, H.: NEON PQCRYTO: fast and parallel ring-LWE encryption on ARM NEON architecture. IACR Cryptol. ePrint Arch. 2015, 1081 (2015)
Bernstein, D.J., Schwabe, P.: NEON crypto. In: Prouff, E., Schaumont, P. (eds.) CHES 2012. LNCS, vol. 7428, pp. 320–339. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33027-8_19
Bos, J.W., Montgomery, P.L., Shumow, D., Zaverucha, G.M.: Montgomery multiplication using vector instructions. In: Lange, T., Lauter, K., Lisoněk, P. (eds.) SAC 2013. LNCS, vol. 8282, pp. 471–489. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43414-7_24
Câmara, D., Gouvêa, C.P.L., López, J., Dahab, R.: Fast software polynomial multiplication on arm processors using the NEON engine. In: Cuzzocrea, A., Kittl, C., Simos, D.E., Weippl, E., Xu, L. (eds.) CD-ARES 2013. LNCS, vol. 8128, pp. 137–154. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40588-4_10
Cheng, H., Großschädl, J., Tian, J., Rønne, P.B., Ryan, P.Y.A.: High-throughput elliptic curve cryptography using AVX2 vector instructions. In: Dunkelman, O., Jacobson, Jr., M.J., O’Flynn, C. (eds.) SAC 2020. LNCS, vol. 12804, pp. 698–719. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-81652-0_27
ARM Cortex: A9 NEON media processing engine technical reference manual revision: r4p1 (2012)
Faz-Hernández, A., López, J.: Fast implementation of Curve25519 using AVX2. In: Lauter, K., Rodríguez-Henríquez, F. (eds.) LATINCRYPT 2015. LNCS, vol. 9230, pp. 329–345. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22174-8_18
Faz-Hernández, A., Lopez, J., Dahab, R.: High-performance implementation of elliptic curve cryptography using vector instructions. ACM Trans. Math. Softw. (TOMS) 45(3), 1–35 (2019)
Grewal, G., Azarderakhsh, R., Longa, P., Hu, S., Jao, D.: Efficient implementation of bilinear pairings on arm processors. In: Knudsen, L.R., Wu, H. (eds.) SAC 2012. LNCS, vol. 7707, pp. 149–165. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35999-6_11
Gueron, S., Krasnov, V.: Software implementation of modular exponentiation, using advanced vector instructions architectures. In: Özbudak, F., Rodríguez-Henríquez, F. (eds.) WAIFI 2012. LNCS, vol. 7369, pp. 119–135. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31662-3_9
Hisil, H., Egrice, B., Yassi, M.: Fast 4 way vectorized ladder for the complete set of montgomery curves. IACR Cryptol. ePrint Arch. 2020, 388 (2020)
Holdings, A.: Arm architecture reference manual, ARMV7-A AND ARMV7-R edition. Arm Holdings (2014)
Huang, J., Liu, Z., Hu, Z., Großschädl, J.: Parallel implementation of SM2 elliptic curve cryptography on Intel processors with AVX2. In: Liu, J.K., Cui, H. (eds.) ACISP 2020. LNCS, vol. 12248, pp. 204–224. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-55304-3_11
Intel Corporation: Using streaming SIMD extensions (SSE2) to perform big multiplications, application note AP-941, July 2000. http://software.intel.com/sites/default/files/14/4f/24960
Kocher, P.C.: Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp. 104–113. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-68697-5_9
Longa, P.: Four\(\mathbb{Q}\)NEON: faster elliptic curve scalar multiplications on ARM processors. In: Avanzi, R., Heys, H. (eds.) SAC 2016. LNCS, vol. 10532, pp. 501–519. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69453-5_27
Márquez, R.C., Sarmiento, A.J.C., Sánchez-Solano, S.: Speeding up elliptic curve arithmetic on arm processors using neon instructions. Revista Ingeniería Electrónica, Automática y Comunicaciones 41(3), 1–20 (2020). ISSN: 1815-5928
Microsoft Research: MSR Elliptic Curve Cryptography library (MSR ECClib) (2014). http://research.microsoft.com/en-us/projects/nums
Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44(170), 519–521 (1985)
Oder, T., Pöppelmann, T., Güneysu, T.: Beyond ecdsa and rsa: Lattice-based digital signatures on constrained devices. In: 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC). pp. 1–6. IEEE (2014)
OpenSSL: The open source toolkit for SSL. Download at https://www.openssl.org
Orisaka, G., Aranha, D.F., López, J.: Finite field arithmetic using AVX-512 for isogeny-based cryptography. In: Anais do XVIII Simpósio Brasileiro em Segurança da Informação e de Sistemas Computacionais, pp. 49–56. SBC (2018)
Wuille, P., et al.: libsecp256k1: Optimized C library for EC operations on curve Secp256k1 (2015)
Pabbuleti, K.C., Mane, D.H., Desai, A., Albert, C., Schaumont, P.: SIMD acceleration of modular arithmetic on contemporary embedded platforms. In: 2013 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–6. IEEE (2013)
Page, D., Smart, N.P.: Parallel cryptographic arithmetic using a redundant montgomery representation. IEEE Trans. Comput. 53(11), 1474–1482 (2004)
Sánchez, A.H., Rodríguez-Henríquez, F.: NEON implementation of an attribute-based encryption scheme. In: Jacobson, M., Locasto, M., Mohassel, P., Safavi-Naini, R. (eds.) ACNS 2013. LNCS, vol. 7954, pp. 322–338. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38980-1_20
Seo, H., Liu, Z., Großschädl, J., Choi, J., Kim, H.: Montgomery modular multiplication on ARM-NEON revisited. In: Lee, J., Kim, J. (eds.) ICISC 2014. LNCS, vol. 8949, pp. 328–342. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-15943-0_20
Seo, H., Liu, Z., Großschädl, J., Kim, H.: Efficient arithmetic on ARM-NEON and its application for high-speed RSA implementation. Secur. Commun. Netw. 9(18), 5401–5411 (2016)
Walter, C.D., Thompson, S.: Distinguishing exponent digits by observing modular subtractions. In: Naccache, D. (ed.) CT-RSA 2001. LNCS, vol. 2020, pp. 192–207. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45353-9_15
Wang, J., Vadnala, P.K., Großschädl, J., Xu, Q.: Higher-order masking in practice: a vector implementation of masked AES for ARM NEON. In: Nyberg, K. (ed.) CT-RSA 2015. LNCS, vol. 9048, pp. 181–198. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16715-2_10
Zhao, Y., Pan, W., Lin, J., Liu, P., Xue, C., Zheng, F.: PhiRSA: exploiting the computing power of vector instructions on Intel Xeon Phi for RSA. In: Avanzi, R., Heys, H. (eds.) SAC 2016. LNCS, vol. 10532, pp. 482–500. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69453-5_26
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, W., Wang, W., Lin, J., Fu, Y., Meng, L., Wang, Q. (2021). SMCOS: Fast and Parallel Modular Multiplication on ARM NEON Architecture for ECC. In: Yu, Y., Yung, M. (eds) Information Security and Cryptology. Inscrypt 2021. Lecture Notes in Computer Science(), vol 13007. Springer, Cham. https://doi.org/10.1007/978-3-030-88323-2_28
Download citation
DOI: https://doi.org/10.1007/978-3-030-88323-2_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88322-5
Online ISBN: 978-3-030-88323-2
eBook Packages: Computer ScienceComputer Science (R0)