FFT Program Generation for Ring LWE-Based Cryptography

Masuda, Masahiro; Kameyama, Yukiyoshi

doi:10.1007/978-3-030-85987-9_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12835)

Included in the following conference series:

International Workshop on Security

Abstract

Fast Fourier Transform (FFT) enables an efficient implementation of polynomial multiplication, which is at the core of any cryptographic construction based on the hardness of the Ring learning with errors (RLWE) problem. Existing implementations of FFT for RLWE-based cryptography rely on hand-written assembly code for performance, which makes them difficult to understand, maintain, and extend to new architectures.

We present a novel framework for implementing FFT for RLWE-based cryptography, based on a principled program-generation approach. We start with a high-level, abstract definition of an FFT program and generate low-level code by interpreting the high-level primitives and delegating low-level details to an architecture-specific module. Since low-level details concerning modular arithmetic and vectorization are separated from the high-level logic, we can easily generate both AVX2- and AVX512-optimized low-level code from the same high-level description of the FFT program. Our generated code is highly competitive with expert-written assembly code: the AVX2 (resp. AVX512) version runs 1.13x (resp. 1.39x) faster than the AVX2-optimized assembly implementation in the NewHope key-exchange protocol.
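
The separation between high-level logic and its interpretation is what the tagless-final style provides. The following OCaml sketch illustrates the idea under our own assumptions. The signature SYM, the functor Butterfly, and the interpreters Eval_mod and Emit_c are hypothetical names, not the paper's actual modules. A single butterfly is written once against SYM. It can then be evaluated modulo q = 12289 (NewHope's modulus), or printed as C-like source that calls hypothetical helpers add_mod, sub_mod, and mul_mod.

```ocaml
(* Hypothetical DSL signature: the primitives a high-level FFT program may use. *)
module type SYM = sig
  type t
  val lit : int -> t
  val add : t -> t -> t
  val sub : t -> t -> t
  val mul : t -> t -> t
end

(* High-level program, written once: a Cooley-Tukey butterfly (x + w*y, x - w*y). *)
module Butterfly (S : SYM) = struct
  let run (x : S.t) (y : S.t) (w : S.t) : S.t * S.t =
    let t = S.mul w y in
    (S.add x t, S.sub x t)
end

(* Interpretation 1: evaluate directly in Z_q with q = 12289. *)
module Eval_mod = struct
  type t = int
  let q = 12289
  let norm a = ((a mod q) + q) mod q
  let lit a = norm a
  let add a b = norm (a + b)
  let sub a b = norm (a - b)
  let mul a b = norm (a * b)
end

(* Interpretation 2: generate C-like code calling hypothetical helpers. *)
module Emit_c = struct
  type t = string
  let lit = string_of_int
  let add a b = Printf.sprintf "add_mod(%s, %s)" a b
  let sub a b = Printf.sprintf "sub_mod(%s, %s)" a b
  let mul a b = Printf.sprintf "mul_mod(%s, %s)" a b
end

module B_eval = Butterfly (Eval_mod)
module B_emit = Butterfly (Emit_c)

let () =
  let (s, d) = B_emit.run "x" "y" "w" in
  print_endline s;  (* add_mod(x, mul_mod(w, y)) *)
  print_endline d   (* sub_mod(x, mul_mod(w, y)) *)
```

Emit_c duplicates the twiddle product mul_mod(w, y) in both outputs. A realistic generator would add a let-binding primitive so that the product is computed only once.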

Notes

1. FFT in which coefficients are taken from a finite field is often called NTT (Number Theoretic Transform), but we use the term FFT throughout this paper.
2. https://github.com/newhopecrypto/newhope-usenix.
3. The requirement on q comes from the fact that, in general, multiplying two polynomials of degree n requires a transform of size 2n. However, thanks to the property of negative wrapped convolution, a transform of size n suffices in practice.
4. It is similar to “interface” in other languages.
5. Recall that NewHope uses double-precision floating-point instructions to compute reductions.
6. The AVX2 support code and the AVX512 support code are each fewer than 90 lines of OCaml.
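
Notes 1 and 3 can be illustrated with a small OCaml sketch. It assumes q = 12289 (NewHope's modulus) and a toy size n = 8. The code first searches for psi with psi^n ≡ -1 (mod q); because n is a power of two, such a psi is a primitive 2n-th root of unity. It then evaluates each polynomial at the n roots psi^(2i+1) of x^n + 1, multiplies pointwise, and interpolates back. This computes the negative wrapped convolution, that is, the product in Z_q[x]/(x^n + 1), using a transform of size n rather than 2n. For clarity, the transform is a direct O(n^2) evaluation rather than a fast O(n log n) FFT. All names are hypothetical.

```ocaml
(* Toy parameters (assumptions): NewHope's modulus and a tiny power-of-two size. *)
let q = 12289
let n = 8

let md a = ((a mod q) + q) mod q

(* Modular exponentiation by squaring. *)
let rec pow b e =
  if e = 0 then 1
  else
    let h = pow b (e / 2) in
    let h2 = md (h * h) in
    if e mod 2 = 0 then h2 else md (h2 * b)

(* Modular inverse via Fermat's little theorem (q is prime). *)
let inv a = pow a (q - 2)

(* psi with psi^n = -1 mod q; since n is a power of two, psi has order exactly 2n. *)
let psi =
  let rec find x = if pow x n = q - 1 then x else find (x + 1) in
  find 2

(* Forward transform: evaluate a(x) at the n roots psi^(2i+1) of x^n + 1 (Horner). *)
let forward a =
  Array.init n (fun i ->
      let r = pow psi ((2 * i) + 1) in
      let acc = ref 0 in
      for j = n - 1 downto 0 do
        acc := md ((!acc * r) + a.(j))
      done;
      !acc)

(* Inverse transform: a_k = n^-1 * sum_i A_i * psi^(-(2i+1)k). *)
let inverse aa =
  let ninv = inv n and psi_inv = inv psi in
  Array.init n (fun k ->
      let s = ref 0 in
      for i = 0 to n - 1 do
        s := md (!s + (aa.(i) * pow psi_inv (((2 * i) + 1) * k)))
      done;
      md (!s * ninv))

(* Product in Z_q[x]/(x^n + 1) using transforms of size n only. *)
let negacyclic_mul a b =
  let fa = forward a and fb = forward b in
  inverse (Array.init n (fun i -> md (fa.(i) * fb.(i))))

(* Schoolbook reference: a term x^(i+j) with i+j >= n wraps around as -x^(i+j-n). *)
let schoolbook a b =
  let c = Array.make n 0 in
  for i = 0 to n - 1 do
    for j = 0 to n - 1 do
      let k = i + j in
      if k < n then c.(k) <- md (c.(k) + (a.(i) * b.(j)))
      else c.(k - n) <- md (c.(k - n) - (a.(i) * b.(j)))
    done
  done;
  c
```

For example, multiplying x by x^7 yields x^8 ≡ -1, which appears as the coefficient q - 1 in position 0.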

References

Aguilar-Melchor, C., Barrier, J., Guelton, S., Guinet, A., Killijian, M.-O., Lepoint, T.: NFLlib: NTT-based fast lattice library. In: Sako, K. (ed.) CT-RSA 2016. LNCS, vol. 9610, pp. 341–356. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29485-8_20
Alkim, E., Ducas, L., Pöppelmann, T., Schwabe, P.: Post-quantum key exchange: a new hope. In: Proceedings of the 25th USENIX Conference on Security Symposium, SEC 2016, pp. 327–343. USENIX Association, USA (2016)
Barrett, P.: Implementing the Rivest Shamir and Adleman Public key encryption algorithm on a standard digital signal processor. In: Odlyzko, A.M. (ed.) CRYPTO 1986. LNCS, vol. 263, pp. 311–323. Springer, Heidelberg (1987). https://doi.org/10.1007/3-540-47721-7_24
Bos, J., et al.: CRYSTALS-Kyber: a CCA-secure module-lattice-based KEM. In: 2018 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 353–367 (2018). https://doi.org/10.1109/EuroSP.2018.00032
Carette, J., Kiselyov, O., Shan, C.: Finally tagless, partially evaluated. In: Shao, Z. (ed.) APLAS 2007. LNCS, vol. 4807, pp. 222–238. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76637-7_15
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press, Cambridge (2009)
Güneysu, T., Oder, T., Pöppelmann, T., Schwabe, P.: Software speed records for lattice-based signatures. In: Gaborit, P. (ed.) PQCrypto 2013. LNCS, vol. 7932, pp. 67–82. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38616-9_5
Johnson, S.G., Frigo, M.: A modified split-radix FFT with fewer arithmetic operations. Trans. Sig. Proc. 55(1), 111–119 (2007). https://doi.org/10.1109/TSP.2006.882087
Kiselyov, O.: Typed tagless final interpreters. In: Gibbons, J. (ed.) Generic and Indexed Programming. LNCS, vol. 7470, pp. 130–174. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32202-0_3
Kiselyov, O.: Reconciling abstraction with high performance: a MetaOCaml approach. Found. Trends Program. Lang. 5(1), 1–101 (2018). https://doi.org/10.1561/2500000038
Kiselyov, O., Taha, W.: Relating FFTW and split-radix. In: Wu, Z., Chen, C., Guo, M., Bu, J. (eds.) ICESS 2004. LNCS, vol. 3605, pp. 488–493. Springer, Heidelberg (2005). https://doi.org/10.1007/11535409_71
Leroy, X., Doligez, D., Frisch, A., Garrigue, J., Rémy, D., Vouillon, J.: The OCaml system release 4.11 (2020). https://caml.inria.fr/pub/docs/manual-ocaml/
Liu, Z., et al.: High-performance ideal lattice-based cryptography on 8-bit AVR microcontrollers. ACM Trans. Embed. Comput. Syst. 16(4) (2017). https://doi.org/10.1145/3092951
Longa, P., Naehrig, M.: Speeding up the number theoretic transform for faster ideal lattice-based cryptography. In: Foresti, S., Persiano, G. (eds.) CANS 2016. LNCS, vol. 10052, pp. 124–139. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48965-0_8
Lyubashevsky, V., Peikert, C., Regev, O.: On ideal lattices and learning with errors over rings. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 1–23. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13190-5_1
Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44, 519–521 (1985)
Navas, J.A., Dutertre, B., Mason, I.A.: Verification of an optimized NTT algorithm. In: Christakis, M., Polikarpova, N., Duggirala, P.S., Schrammel, P. (eds.) NSV/VSTTE 2020. LNCS, vol. 12549, pp. 144–160. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-63618-0_9
Roy, S.S., Vercauteren, F., Mentens, N., Chen, D.D., Verbauwhede, I.: Compact ring-LWE cryptoprocessor. In: Batina, L., Robshaw, M. (eds.) CHES 2014. LNCS, vol. 8731, pp. 371–391. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44709-3_21
Seiler, G.: Faster AVX2 optimized NTT multiplication for Ring-LWE lattice cryptography. IACR Cryptology ePrint Archive 2018/39 (2018)

Acknowledgement

We thank Tadanori Teruya for helpful discussion. Feedback from anonymous reviewers helped improve this paper and is greatly appreciated. The second author is supported in part by JSPS Grant-in-Aid for Scientific Research (B) 18H03218.

Author information

Authors and Affiliations

University of Tsukuba, Tsukuba, Japan
Masahiro Masuda & Yukiyoshi Kameyama

Corresponding author

Correspondence to Masahiro Masuda.

Editor information

Editors and Affiliations

Hiroshima University, Hiroshima, Japan
Toru Nakanishi
National Institute of Information and Communications Technology, Tokyo, Japan
Ryo Nojima

Appendices

Appendix A Vectorize Module

The Vectorize module is used to generate vectorized code for trivially vectorizable loops. It simply redefines the meaning of the language primitives used in a sequential program so that the same program can be evaluated to vectorized loop code. It is implemented as an OCaml functor, a construct often used in the tagless-final style to extend the meaning of an existing DSL.
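
The idea of reinterpreting the same sequential program as vectorized code can be sketched as follows. This is a hypothetical illustration, not the paper's code. LANG is a tiny code-generating DSL over int16_t arrays, and Scalar gives it a sequential meaning. The functor Vectorize takes a LANG implementation, reuses its loop construct, and redefines load, add, and store lane-wise for a target described by ISA. The paper keeps ISA details in a separate SIMD backend (Appendix C); this sketch instead passes the AVX2/AVX512 intrinsic names (_mm256_add_epi16, _mm512_add_epi16, and the corresponding unaligned loads and stores) directly as a functor argument. The program Vadd is written once and emits either scalar or vectorized C.

```ocaml
(* Hypothetical code-generating DSL over int16_t arrays; expressions are C source strings. *)
module type LANG = sig
  type exp = string
  val load : string -> string -> exp               (* arr[i] *)
  val add : exp -> exp -> exp
  val store : string -> string -> exp -> string    (* arr[i] = e; *)
  val for_step : int -> int -> (string -> string) -> string
  val for_ : int -> (string -> string) -> string   (* loop over [0, n) *)
end

(* Sequential meaning: one element per iteration. *)
module Scalar : LANG = struct
  type exp = string
  let load a i = Printf.sprintf "%s[%s]" a i
  let add x y = Printf.sprintf "(int16_t)(%s + %s)" x y
  let store a i e = Printf.sprintf "%s[%s] = %s;" a i e
  let for_step n step body =
    Printf.sprintf "for (int i = 0; i < %d; i += %d) { %s }" n step (body "i")
  let for_ n body = for_step n 1 body
end

(* Target description: number of 16-bit lanes and intrinsic naming. *)
module type ISA = sig
  val lanes : int
  val pre : string  (* intrinsic prefix *)
  val vty : string  (* vector type *)
  val si : string   (* suffix of the unaligned load/store intrinsics *)
end

(* Vectorize: reuse L's loop construct, redefine the other primitives lane-wise. *)
module Vectorize (L : LANG) (I : ISA) : LANG = struct
  type exp = string
  let load a i = Printf.sprintf "%s_loadu_%s((const %s *)&%s[%s])" I.pre I.si I.vty a i
  let add x y = Printf.sprintf "%s_add_epi16(%s, %s)" I.pre x y
  let store a i e = Printf.sprintf "%s_storeu_%s((%s *)&%s[%s], %s);" I.pre I.si I.vty a i e
  let for_step = L.for_step
  let for_ n body =
    assert (n mod I.lanes = 0);  (* trivially vectorizable: no remainder loop *)
    L.for_step n I.lanes body
end

module Avx2 = Vectorize (Scalar) (struct
  let lanes = 16 let pre = "_mm256" let vty = "__m256i" let si = "si256" end)
module Avx512 = Vectorize (Scalar) (struct
  let lanes = 32 let pre = "_mm512" let vty = "__m512i" let si = "si512" end)

(* The program is written once: c[i] = a[i] + b[i] for i in [0, n). *)
module Vadd (L : LANG) = struct
  let gen n = L.for_ n (fun i -> L.store "c" i (L.add (L.load "a" i) (L.load "b" i)))
end
```

The assertion in for_ reflects the restriction to trivially vectorizable loops. The sketch emits no remainder loop, so the trip count must be a multiple of the lane count.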

Appendix B Lazy Reduction Implementation

We again implement lazy reduction as an OCaml functor, which extends the original meanings of vadd and vsub to give them lazy-reduction semantics. As explained in Sect. 4.3, we allow the result of addition to stay in 15 bits and apply Barrett reduction every other stage, while the result of subtraction is reduced to 14 bits in every stage by Barrett reduction. Such a specification of lazy reduction can be implemented as follows:

This functor is used in our FFT code generator as follows. Lazy_reduction is instantiated for each stage s. By simply wrapping the original meanings of vectorized primitives such as vadd and vsub defined in V_lang, it makes the innermost loop execute with lazy reduction enabled. Note that we do not have to change the code of the innermost loop at all. The tagless-final style allows such an extension in a highly modular manner.
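
Here is a minimal OCaml sketch of this kind of extension, written under our own assumptions. It is not the paper's listing. Lanes are modeled as int arrays, and q = 12289. The function barrett uses the illustrative constant m = floor(2^28 / q) = 21843 followed by one conditional subtraction, and returns x mod q for 0 <= x < 2^16. The base interpreter V_base is a hypothetical stand-in for V_lang: it computes unreduced sums and computes x + 2q - y for differences. The hypothetical functor Lazy_red wraps V_base so that subtractions are always reduced, while additions are reduced only on odd stages. The innermost-loop program Butterfly_stage (twiddle multiplication omitted) is not modified.

```ocaml
let q = 12289

(* Barrett reduction (illustrative constants): for 0 <= x < 2^16 returns x mod q.
   m = floor(2^28 / q) = 21843, so t underestimates floor(x / q) by at most 1. *)
let barrett x =
  let m = (1 lsl 28) / q in
  let t = (x * m) lsr 28 in
  let r = x - (t * q) in
  if r >= q then r - q else r

(* Hypothetical vector language: lanes modeled as int arrays. *)
module type VLANG = sig
  type t = int array
  val vadd : t -> t -> t
  val vsub : t -> t -> t
end

(* Base meaning (stand-in for the original primitives): no reduction.
   Subtraction adds 2q so that results stay non-negative for inputs below 2q. *)
module V_base : VLANG = struct
  type t = int array
  let vadd a b = Array.map2 ( + ) a b
  let vsub a b = Array.map2 (fun x y -> x + (2 * q) - y) a b
end

(* Hypothetical lazy-reduction functor, instantiated once per stage. *)
module Lazy_red (V : VLANG) (S : sig val stage : int end) : VLANG = struct
  type t = int array
  let reduce = Array.map barrett
  (* sums are left unreduced on even stages and reduced on odd stages *)
  let vadd a b =
    let r = V.vadd a b in
    if S.stage mod 2 = 1 then reduce r else r
  (* differences are reduced to [0, q) on every stage *)
  let vsub a b = reduce (V.vsub a b)
end

(* Innermost-loop body, unchanged whichever interpreter it is given
   (twiddle multiplication omitted for brevity). *)
module Butterfly_stage (V : VLANG) = struct
  let run x y = (V.vadd x y, V.vsub x y)
end
```

The bounds work out as follows. If the inputs to an even stage are below q, sums stay below 2q < 2^15. On the following odd stage, every value passed to barrett (x + y or x + 2q - y with x, y < 2q) stays below 4q < 2^16, which is within the range where this barrett is exact.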

Appendix C Details on SIMD Backend Implementation

This is the full mapping between the language primitives used in the vectorized reductions of Sect. 4.2 and the corresponding AVX2 instructions.

The not_zero primitive is implemented in a somewhat cumbersome way, because AVX comparison instructions return 0xFFFF or 0x0000 in each lane, whereas we need 0x0001 or 0x0000 to represent the presence or absence of the carry bit. The not_zero primitive hides such ISA-specific details and provides a straightforward interface to the programmer.
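
One plausible way to realize not_zero on AVX2 is sketched below in OCaml. This is our assumption, not necessarily the instruction sequence used in the paper. The first step compares each lane with zero using _mm256_cmpeq_epi16, which yields 0xFFFF where the lane is zero. The second step clears those lanes from a vector of ones using _mm256_andnot_si256, which computes (NOT a) AND b. The hypothetical functions cmpeq_epi16 and andnot_si model the two instructions on 16-bit lanes. The function emit_not_zero returns the corresponding C intrinsic expression as a string, as a code generator might.

```ocaml
(* Lane-level models of two AVX2 instructions on 16-bit lanes (values 0..0xFFFF). *)
let cmpeq_epi16 a b = Array.map2 (fun x y -> if x = y then 0xFFFF else 0x0000) a b
let andnot_si a b = Array.map2 (fun x y -> lnot x land y land 0xFFFF) a b

(* not_zero: 0x0001 where a lane is non-zero, 0x0000 otherwise. *)
let not_zero v =
  let zero = Array.make (Array.length v) 0 in
  let one = Array.make (Array.length v) 1 in
  andnot_si (cmpeq_epi16 v zero) one

(* What a code generator might emit for a __m256i expression x. *)
let emit_not_zero x =
  Printf.sprintf
    "_mm256_andnot_si256(_mm256_cmpeq_epi16(%s, _mm256_setzero_si256()), _mm256_set1_epi16(1))"
    x
```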

Shuffle operations can be implemented with shift, blend, unpack, and permute instructions. The implementation using AVX2 is shown below. The AVX512 counterpart is similar but uses different instruction combinations to realize the desired permutations.
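
To illustrate why shuffles are needed, the following OCaml sketch models a 128-bit register as eight 16-bit lanes. It reproduces the semantics of the SSE2 instructions _mm_unpacklo_epi64 and _mm_unpackhi_epi64, and is an assumption, not the paper's listing. Consider an FFT stage whose butterfly partners sit four lanes apart inside the same vector. Unpacking two input vectors a and b moves partners into matching lanes of two new vectors. A lane-wise butterfly (twiddle factor 1 for brevity) is then applied, and the same two unpacks restore the original layout. The 256-bit AVX2 unpack instructions operate within each 128-bit half, so any movement across halves additionally needs permute instructions.

```ocaml
let q = 12289
let md x = ((x mod q) + q) mod q

(* A 128-bit register modeled as eight 16-bit lanes; semantics of the SSE2 unpacks. *)
let unpacklo_epi64 a b = Array.init 8 (fun i -> if i < 4 then a.(i) else b.(i - 4))
let unpackhi_epi64 a b = Array.init 8 (fun i -> if i < 4 then a.(i + 4) else b.(i))

(* Lane-wise butterfly with twiddle factor 1. *)
let butterfly x y =
  (Array.map2 (fun u v -> md (u + v)) x y, Array.map2 (fun u v -> md (u - v)) x y)

(* One stage whose butterfly partners are lanes i and i + 4 of the same vector. *)
let stage_distance4 a b =
  (* x = [a0..a3 b0..b3], y = [a4..a7 b4..b7]: partners now share a lane index *)
  let x = unpacklo_epi64 a b and y = unpackhi_epi64 a b in
  let x', y' = butterfly x y in
  (* the same unpacks restore the original element order *)
  (unpacklo_epi64 x' y', unpackhi_epi64 x' y')
```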

About this paper

Cite this paper

Masuda, M., Kameyama, Y. (2021). FFT Program Generation for Ring LWE-Based Cryptography. In: Nakanishi, T., Nojima, R. (eds.) Advances in Information and Computer Security. IWSEC 2021. Lecture Notes in Computer Science, vol. 12835. Springer, Cham. https://doi.org/10.1007/978-3-030-85987-9_9

DOI: https://doi.org/10.1007/978-3-030-85987-9_9
Published: 27 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85986-2
Online ISBN: 978-3-030-85987-9