Algorithmic Views of Vectorized Polynomial Multipliers – NTRU Prime

  • Conference paper
Applied Cryptography and Network Security (ACNS 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14584)

Abstract

In this paper, we explore the cost of vectorization for multiplying polynomials with coefficients in \(\mathbb{Z}_q\) for an odd prime \(q\), as exemplified by NTRU Prime, a post-quantum cryptosystem that found early adoption due to its inclusion in OpenSSH.

If there is a large power of two dividing \(q - 1\), we can apply radix-2 Cooley–Tukey fast Fourier transforms to multiply polynomials in \(\mathbb{Z}_q[x]\). The radix-2 nature admits efficient vectorization. Conversely, if 2 is the largest power of two dividing \(q - 1\), we can apply Schönhage’s and Nussbaumer’s FFTs to craft radix-2 roots of unity, but these double the number of coefficients.
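To make the radix-2 case concrete, here is a minimal Python sketch of a recursive Cooley–Tukey transform over \(\mathbb{Z}_q\), used for cyclic multiplication. The modulus 7681 (where \(2^9\) divides \(q - 1\)), the transform size, and all function names are illustrative assumptions and are not taken from the paper. The same code would fail for \(q = 4591\), where no such root of unity exists beyond size 2.

```python
# Illustrative sketch only: the modulus, sizes, and names are assumptions.
Q = 7681  # prime with Q - 1 = 2^9 * 15, so radix-2 roots of unity exist


def root_of_unity(n, q):
    """Primitive n-th root of unity mod q, for n a power of two >= 2 dividing q - 1."""
    for g in range(2, q):
        w = pow(g, (q - 1) // n, q)
        # For n a power of two, w has order exactly n iff w^(n/2) == -1.
        if pow(w, n // 2, q) == q - 1:
            return w
    raise ValueError("no primitive n-th root of unity found")


def ntt(a, w, q):
    """Recursive radix-2 Cooley-Tukey transform: X[k] = sum_n a[n] * w^(n*k) mod q."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], w * w % q, q)
    odd = ntt(a[1::2], w * w % q, q)
    out = [0] * n
    t = 1  # running twiddle factor w^k
    for k in range(n // 2):
        u, v = even[k], t * odd[k] % q
        out[k] = (u + v) % q
        out[k + n // 2] = (u - v) % q  # uses w^(n/2) == -1
        t = t * w % q
    return out


def cyclic_mul(a, b, q=Q):
    """Multiply a and b in Z_q[x]/(x^n - 1), with n = len(a) = len(b) a power of two."""
    n = len(a)
    w = root_of_unity(n, q)
    c_hat = [x * y % q for x, y in zip(ntt(a, w, q), ntt(b, w, q))]
    c = ntt(c_hat, pow(w, q - 2, q), q)  # inverse transform uses w^-1
    n_inv = pow(n, q - 2, q)
    return [x * n_inv % q for x in c]


def schoolbook_cyclic(a, b, q=Q):
    """Quadratic-time reference used only to cross-check cyclic_mul."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % q
    return c
```

Each level of the recursion applies the same butterfly to every pair of entries, which is the regularity that makes the radix-2 form easy to map onto vector lanes.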

We show how to avoid the doubling while maintaining the vectorization friendliness with Good–Thomas, Rader’s, and Bruun’s FFTs. In particular, in sntrup761, the most common instance of NTRU Prime, we have \(q=4591\), and we exploit the existing Fermat-prime factor of \(q - 1\) for Rader’s FFT and the power-of-two factor of \(q + 1\) for Bruun’s FFT.
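The arithmetic behind this choice can be checked directly. The sketch below factors \(q - 1\) and \(q + 1\) for \(q = 4591\) and then computes a size-17 transform over \(\mathbb{Z}_{4591}\) with Rader's reindexing, which turns it into a cyclic convolution of power-of-two length 16. Picking 17 as the Fermat prime (5 also divides \(q - 1\)), the generator 3 modulo 17, and all function names are assumptions made for illustration; the abstract does not spell out these details.

```python
# Illustrative sketch only: names and the choice of 17 are assumptions.
Q = 4591


def factor(n):
    """Trial-division factorization, returned as {prime: exponent}."""
    out, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            out[p] = out.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        out[n] = out.get(n, 0) + 1
    return out


def element_of_prime_order(p, q):
    """Some element of order p modulo q, for a prime p dividing q - 1."""
    for g in range(2, q):
        w = pow(g, (q - 1) // p, q)
        if w != 1:  # w^p == 1 and w != 1, so the order is exactly p
            return w
    raise ValueError("no element of order p")


def naive_dft(x, w, q):
    """Direct evaluation X[k] = sum_n x[n] * w^(n*k) mod q."""
    n = len(x)
    return [sum(x[i] * pow(w, i * k, q) for i in range(n)) % q for k in range(n)]


def rader_dft(x, w, q, p=17, g=3):
    """Size-p transform via a length-(p - 1) cyclic convolution (Rader's reindexing).

    g must generate the multiplicative group mod p; 3 does so for p = 17.
    """
    m = p - 1
    g_inv = pow(g, p - 2, p)
    a = [x[pow(g, i, p)] for i in range(m)]             # inputs permuted by g^i
    b = [pow(w, pow(g_inv, j, p), q) for j in range(m)]  # twiddles w^(g^-j)
    # Length-16 cyclic convolution; written naively here, but a power-of-two
    # size is what lets a radix-2 style method take over in practice.
    conv = [sum(a[i] * b[(j - i) % m] for i in range(m)) % q for j in range(m)]
    out = [0] * p
    out[0] = sum(x) % q
    for j in range(m):
        out[pow(g_inv, j, p)] = (x[0] + conv[j]) % q     # output index g^-j
    return out


print(factor(Q - 1))  # factorization of 4590
print(factor(Q + 1))  # factorization of 4592
```

The inner convolution is deliberately left in schoolbook form so that the reindexing itself stays visible.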

Polynomial multiplication in \(\mathbb{Z}_{4591}[x]/\left\langle x^{761}-x-1 \right\rangle\) is still a worthwhile target: although NTRU Prime is out of the NIST PQC competition, sntrup761 is still going to be used with OpenSSH by default in the near future.
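For reference, the operation being optimized can be written as a plain schoolbook product followed by reduction with \(x^{761} = x + 1\). This is only a quadratic-time baseline sketch for checking faster code against; the function names and the list-of-coefficients representation are assumptions, and it makes no attempt at constant-time behavior.

```python
# Illustrative baseline only: names and representation are assumptions.
Q = 4591
P = 761


def ring_mul(f, g):
    """Multiply f and g in Z_4591[x]/(x^761 - x - 1).

    Polynomials are lists of P coefficients in [0, Q), lowest degree first.
    """
    prod = [0] * (2 * P - 1)
    for i, fi in enumerate(f):
        if fi == 0:
            continue
        for j, gj in enumerate(g):
            prod[i + j] = (prod[i + j] + fi * gj) % Q
    # Reduce with x^P = x + 1: x^k becomes x^(k-P+1) + x^(k-P).
    # For P <= k <= 2P - 2 both targets already lie below degree P.
    for k in range(2 * P - 2, P - 1, -1):
        c = prod[k]
        prod[k - P + 1] = (prod[k - P + 1] + c) % Q
        prod[k - P] = (prod[k - P] + c) % Q
        prod[k] = 0
    return prod[:P]


def monomial(k, c=1):
    """The polynomial c * x^k as a coefficient list of length P."""
    m = [0] * P
    m[k] = c % Q
    return m
```

For example, `ring_mul(monomial(760), monomial(1))` returns the coefficient list of \(x + 1\), since \(x^{761}\) wraps around to \(x + 1\).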

Our polynomial multiplication outperforms the state-of-the-art vector-optimized implementation by \(6.1 \times \). For ntrulpr761, our keygen, encap, and decap are \(2.98 \times \), \(2.79 \times \), and \(3.07 \times \) faster than the state-of-the-art vector-optimized implementation. For sntrup761, we outperform the reference implementation significantly.

Notes

  1. https://marc.info/?l=openssh-unix-dev&m=164939371201404&w=2.

  2. ARMv8-A, which naturally comes with the SIMD technology Neon, is currently the most prevalent architecture for mobile devices and is used for all Apple hardware.

  3. There are some exceptions, including addv, smaxv, and sadalp. We do not use them in this paper and refer to [ARM15] for more details.

  4. We wrote some assembly but obtained only comparable performance, so we keep the implementations with intrinsics for readability.

  5. There are several options for sign-extending vector elements: saddl{,2} and ssubl{,2}, which go to either F0 or F1; sxtl{,2}, which goes to F1; and smull{,2}, which goes to F0.

  6. \(\forall \text{ coprime } q_0, q_1\), \(\left\{ \omega_{q_0}^{i_0} \omega_{q_1}^{i_1} \mid 0 \le i_0 < q_0,\ 0 \le i_1 < q_1 \right\} = \left\{ \omega_{q_0 q_1}^{i} \mid 0 \le i < q_0 q_1 \right\}\) in the splitting field of \(x^{q_0 q_1} - 1\).

  7. ARM’s DIT flag, according to https://developer.arm.com/documentation/ddi0595/2021-06/AArch64-Registers/DIT--Data-Independent-Timing, does not guarantee that the high-half multiplications sqrdmulh and sqdmulh are constant-time.
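The identity in note 6, which underlies Good–Thomas, can be checked numerically. The sketch below does so in \(\mathbb{Z}_{4591}\) for \(q_0 = 5\) and \(q_1 = 17\), where \(85\) divides \(4590\) so all the roots already live in the base field. The choice of field and sizes, and every function name, are assumptions made for illustration.

```python
# Illustrative sketch only: field, sizes, and names are assumptions.
Q = 4591  # 5 * 17 = 85 divides Q - 1 = 4590


def order(w, q):
    """Multiplicative order of w modulo the prime q, by brute force."""
    k, t = 1, w % q
    while t != 1:
        t = t * w % q
        k += 1
    return k


def element_of_order(n, q):
    """Some element of multiplicative order exactly n modulo q, for n dividing q - 1."""
    for g in range(2, q):
        w = pow(g, (q - 1) // n, q)
        if order(w, q) == n:
            return w
    raise ValueError("no element of that order")


def root_sets(q0, q1, q=Q):
    """Return ({w_q0^i0 * w_q1^i1}, {w_(q0*q1)^i}) as Python sets."""
    w0 = element_of_order(q0, q)
    w1 = element_of_order(q1, q)
    w01 = element_of_order(q0 * q1, q)
    products = {
        pow(w0, i0, q) * pow(w1, i1, q) % q
        for i0 in range(q0)
        for i1 in range(q1)
    }
    powers = {pow(w01, i, q) for i in range(q0 * q1)}
    return products, powers


products, powers = root_sets(5, 17)
print(products == powers, len(powers))
```

Coprimality matters: calling `root_sets(3, 3)` yields two different sets, because products of cube roots of unity only ever give the three cube roots, not all nine 9th roots.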

References

  1. Alagic, G., et al.: NISTIR 8413 – Status report on the third round of the NIST post-quantum cryptography standardization process (2022). https://doi.org/10.6028/NIST.IR.8413-upd1

  2. Alkim, E., et al.: Polynomial multiplication in NTRU Prime: comparison of optimization strategies on Cortex-M4. IACR Trans. Cryptogr. Hardware Embed. Syst. 2021(1), 217–238 (2021). https://tches.iacr.org/index.php/TCHES/article/view/8733

  3. Alkim, E., Hwang, V., Yang, B.Y.: Multi-parameter support with NTTs for NTRU and NTRU Prime on Cortex-M4. IACR Trans. Cryptogr. Hardware Embed. Syst. 349–371 (2022)

  4. ARM. Cortex-A72 Software Optimization Guide (2015). https://developer.arm.com/documentation/uan0016/a/

  5. ARM. Arm Architecture Reference Manual, Armv8, for Armv8-A architecture profile (2021). https://developer.arm.com/documentation/ddi0487/gb/?lang=en

  6. Barrett, P.: Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor. In: Odlyzko, A.M. (ed.) CRYPTO 1986. LNCS, vol. 263, pp. 311–323. Springer, Heidelberg (1986). https://doi.org/10.1007/3-540-47721-7_24

  7. Bernstein, D.J., et al.: NTRU Prime. In: Submission to the NIST Post-Quantum Cryptography Standardization Project [?] (2020). https://ntruprime.cr.yp.to/

  8. Bernstein, D.J., Brumley, B.B., Chen, M.S., Tuveri, N.: OpenSSLNTRU: faster post-quantum TLS key exchange. In: 31st USENIX Security Symposium (USENIX Security 2022), pp. 845–862 (2022)

  9. Brawley, J.V., Carlitz, L.: Irreducibles and the composed product for polynomials over a finite field. Disc. Math. 65(2), 115–139 (1987)

  10. Bernstein, D.J.: Multidigit multiplication for mathematicians (2001)

  11. Blake, I.F., Gao, S., Mullin, R.C.: Explicit factorization of \(x^{2^k} + 1\) over \(\mathbb{F}_p\) with prime \(p \equiv 3 \pmod{4}\). Appl. Algebra Eng. Commun. Comput. 4(2), 89–94 (1993)

  12. Becker, H., Hwang, V., Kannwischer, M.J., Yang, B.Y., Yang, S.Y.: Neon NTT: faster Dilithium, Kyber, and Saber on Cortex-A72 and Apple M1. IACR Trans. Cryptogr. Hardware Embed. Syst. 2022(1), 221–244 (2022). https://tches.iacr.org/index.php/TCHES/article/view/9295

  13. Becker, H., Kannwischer, M.J.: Hybrid scalar/vector implementations of Keccak and SPHINCS+ on AArch64. Cryptology ePrint Archive (2022)

  14. Bruun, G.: z-transform DFT filters and FFT’s. IEEE Trans. Acoust. Speech Signal Process. 26(1), 56–63 (1978)

  15. Bernstein, D.J., Yang, B.Y.: Fast constant-time GCD computation and modular inversion. IACR Trans. Cryptogr. Hardware Embed. Syst. 2019(3), 340–398 (2019). https://tches.iacr.org/index.php/TCHES/article/view/8298

  16. Chung, C.M.M., Hwang, V., Kannwischer, M.J., Seiler, G., Shih, C.J., Yang, B.Y.: NTT multiplication for NTT-unfriendly rings: new speed records for Saber and NTRU on Cortex-M4 and AVX2. IACR Trans. Cryptogr. Hardware Embed. Syst. 2021(2), 159–188 (2021). https://tches.iacr.org/index.php/TCHES/article/view/8791

  17. Cooley, J.W., Tukey, J.W.: An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19(90), 297–301 (1965)

  18. Dubois, E., Venetsanopoulos, A.: A new algorithm for the radix-3 FFT. IEEE Trans. Acoust. Speech Signal Process. 26(3), 222–225 (1978)

  19. Good, I.J.: The interaction algorithm and practical Fourier analysis. J. Roy. Stat. Soc.: Ser. B (Methodol.) 20(2), 361–372 (1958)

  20. Haasdijk, J.: Optimizing NTRU LPRime on the ARM Cortex-A72 (2021). https://github.com/jhaasdijk/KEMobi

  21. Kannwischer, M.J., Schwabe, P., Stebila, D., Wiggers, T.: PQClean. https://github.com/PQClean

  22. Meyn, H.: Factorization of the cyclotomic polynomial \(x^{2^n} + 1\) over finite fields. Finite Fields Appl. 2(4), 439–442 (1996)

  23. Murakami, H.: Real-valued fast discrete Fourier transform and cyclic convolution algorithms of highly composite even length. In: 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, vol. 3, pp. 1311–1314 (1996)

  24. Martínez, F.E., Vergara, C.R., de Oliveira, L.B.: Explicit factorization of \(x^n-1 \in \mathbb{F} _q[x]\). arXiv preprint arXiv:1404.6281 (2014)

  25. Nguyen, D.T., Gaj, K.: Optimized software implementations of CRYSTALS-Kyber, NTRU, and Saber using NEON-based special instructions of ARMv8. In: Third PQC Standardization Conference (2021)

  26. Nussbaumer, H.: Fast polynomial transform algorithms for digital convolution. IEEE Trans. Acoust. Speech Signal Process. 28(2), 205–215 (1980)

  27. Rader, C.M.: Discrete Fourier transforms when the number of data samples is prime. Proc. IEEE 56(6), 1107–1108 (1968)

  28. Schönhage, A.: Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2. Acta Informatica 7(4), 395–398 (1977)

  29. Tuxanidy, A., Wang, Q.: Composed products and factors of cyclotomic polynomials over finite fields. Des. Codes Crypt. 69(2), 203–231 (2013)

  30. van der Hoeven, J.: The truncated Fourier transform and applications. In: Proceedings of the 2004 International Symposium on Symbolic and Algebraic Computation, pp. 290–296 (2004)

  31. Wu, Y., Yue, Q.: Further factorization of \(x^n - 1\) over a finite field (II). Disc. Math. Algor. Appl. 13(06), 2150070 (2021)

  32. Wu, Y., Yue, Q., Fan, S.: Further factorization of \(x^n - 1\) over a finite field. Finite Fields Appl. 54, 197–215 (2018)

Acknowledgments

This work was supported in part by the Academia Sinica Investigator Award AS-IA-109-M01, and Taiwan’s National Science and Technology Council grants 112-2634-F-001-001-MBK and 112-2119-M-001-006.

Author information

Corresponding authors

Correspondence to Vincent Hwang, Chi-Ting Liu or Bo-Yin Yang.

A Detailed Performance Numbers

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hwang, V., Liu, CT., Yang, BY. (2024). Algorithmic Views of Vectorized Polynomial Multipliers – NTRU Prime. In: Pöpper, C., Batina, L. (eds) Applied Cryptography and Network Security. ACNS 2024. Lecture Notes in Computer Science, vol 14584. Springer, Cham. https://doi.org/10.1007/978-3-031-54773-7_2

  • DOI: https://doi.org/10.1007/978-3-031-54773-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-54772-0

  • Online ISBN: 978-3-031-54773-7

  • eBook Packages: Computer Science, Computer Science (R0)
