Abstract
The Fast Fourier Transform (FFT) enables an efficient implementation of polynomial multiplication, which is at the core of any cryptographic construction based on the hardness of the Ring Learning with Errors (RLWE) problem. Existing implementations of FFT for RLWE-based cryptography rely on hand-written assembly code for performance, making them difficult to understand, maintain, and extend to new architectures.
We present a novel framework to implement FFT for RLWE-based cryptography, based on a principled program-generation approach. We start with a high-level, abstract definition of an FFT program, and generate low-level code by interpreting high-level primitives and delegating low-level details to an architecture-specific module. Since low-level details concerning modular arithmetic and vectorization are separated from the high-level logic, we can easily generate both AVX2- and AVX512-optimized low-level code from the same high-level description of the FFT program. Our generated code is highly competitive with expert-written assembly code: for AVX2 and AVX512, it runs 1.13x and 1.39x faster, respectively, than the AVX2-optimized assembly implementation in the NewHope key-exchange protocol.
Notes
- 1. An FFT whose coefficients are taken from a finite field is often called an NTT (Number Theoretic Transform), but we use the term FFT throughout this paper.
- 2.
- 3. The requirement on q comes from the fact that, in general, multiplying two polynomials of degree n requires a transform of size 2n. However, thanks to the property of negative wrapped convolution, a transform of size n is enough in practice.
- 4. It is similar to an “interface” in other languages.
- 5. Recall that NewHope uses double-precision floating-point instructions to compute reductions.
- 6. Our AVX2 and AVX512 support code is less than 90 lines of OCaml each.
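As a hedged illustration of note 3, the product computed by a size-n negative wrapped convolution is multiplication in Z_q[x]/(x^n + 1). The OCaml sketch below is a schoolbook reference for that product, not the FFT-based method of the paper. The names negacyclic_mul and supports_negacyclic and the toy parameters are assumptions made for illustration. The divisibility check assumes that the requirement on q is q ≡ 1 (mod 2n), and uses q = 12289 and n = 1024, which we take to be the NewHope parameters.

```ocaml
(* Schoolbook multiplication in Z_q[x]/(x^n + 1), where n = Array.length a.
   Reference semantics only; coefficients are assumed to lie in [0, q). *)
let negacyclic_mul ~q a b =
  let n = Array.length a in
  if Array.length b <> n then invalid_arg "negacyclic_mul: length mismatch";
  let c = Array.make n 0 in
  for i = 0 to n - 1 do
    for j = 0 to n - 1 do
      let prod = a.(i) * b.(j) mod q in
      let k = i + j in
      if k < n then c.(k) <- (c.(k) + prod) mod q
      else
        (* x^n = -1, so the term wraps around with a sign flip. *)
        c.(k - n) <- (c.(k - n) - prod + q) mod q
    done
  done;
  c

(* Assumed requirement on q: q = 1 (mod 2n), so that a primitive
   2n-th root of unity exists modulo q. *)
let supports_negacyclic ~q ~n = (q - 1) mod (2 * n) = 0
```

For example, (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2, which becomes -5 + 10x after substituting x^2 = -1, so negacyclic_mul ~q:17 [| 1; 2 |] [| 3; 4 |] yields [| 12; 10 |].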
Acknowledgement
We thank Tadanori Teruya for helpful discussions. Feedback from the anonymous reviewers helped improve this paper and is greatly appreciated. The second author is supported in part by JSPS Grant-in-Aid for Scientific Research (B) 18H03218.
Appendices
Appendix A Vectorize Module
The Vectorize module is used to generate vectorized code for trivially vectorizable loops. It simply redefines the meaning of the language primitives used in a sequential program, so that the same program can be evaluated to vectorized loop code. It is implemented as an OCaml functor, a construct that is often used in the tagless-final style to extend the meaning of an existing DSL.
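The idea of reinterpreting primitives through a functor can be sketched in a few lines of OCaml. This is a minimal illustration under stated assumptions, not the paper's Vectorize module: the signature LANG, the modules Scalar and Prog, and the function butterfly_top are hypothetical names, and the vectorized interpretation here evaluates lane-wise on arrays instead of emitting SIMD loop code.

```ocaml
(* Hypothetical DSL signature; names are illustrative, not the paper's. *)
module type LANG = sig
  type t
  val lit : int -> t
  val add : t -> t -> t
  val mul : t -> t -> t
end

(* Scalar interpretation: direct evaluation on integers modulo q. *)
module Scalar = struct
  type t = int
  let q = 12289
  let lit x = x mod q
  let add a b = (a + b) mod q
  let mul a b = a * b mod q
end

(* Functor that gives every primitive a lane-wise meaning. *)
module Vectorize (L : LANG) (W : sig val width : int end) = struct
  type t = L.t array
  let lit x = Array.make W.width (L.lit x)
  let add = Array.map2 L.add
  let mul = Array.map2 L.mul
end

(* A program written once against LANG. *)
module Prog (L : LANG) = struct
  let butterfly_top a b w = L.add a (L.mul b w)
end

module V4 = Vectorize (Scalar) (struct let width = 4 end)
module Prog_scalar = Prog (Scalar)
module Prog_vector = Prog (V4)

let scalar_result = Prog_scalar.butterfly_top 1 2 3
let vector_result =
  Prog_vector.butterfly_top [| 1; 2; 3; 4 |] [| 2; 2; 2; 2 |] (V4.lit 3)
```

The body of Prog is unchanged between the two instantiations; only the module passed to it differs, which is the property the tagless-final style relies on.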
Appendix B Lazy Reduction Implementation
We again implement lazy reduction as an OCaml functor, extending the original meanings of vadd and vsub to give them lazy-reduction semantics. As explained in Sect. 4.3, we allow the result of an addition to stay within 15 bits and apply Barrett reduction every other stage, while the result of a subtraction is reduced to 14 bits in every stage by Barrett reduction. We can implement such a specification for lazy reduction as follows:
This is used in our FFT code generator as follows. Lazy_reduction is instantiated for each stage s, and by simply wrapping the original meanings of vectorized primitives such as vadd and vsub defined in V_lang, the innermost loop now executes with lazy reduction enabled. Note that we do not have to change the code of the innermost loop at all. The tagless-final style allows such an extension in a highly modular manner.
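The wrapping described in this appendix can be approximated by the following OCaml sketch, which is an illustration under stated assumptions and not the paper's code. Only the names vadd, vsub, and Lazy_reduction come from the text. The signature V_LANG, the module Scalar_model, the primitive reduce, and the choice that even-numbered stages reduce the sum are all hypothetical. The sketch works on plain integers with q = 12289, and reduce uses an exact mod as a stand-in for Barrett reduction.

```ocaml
(* Hypothetical signature of the primitives being wrapped. *)
module type V_LANG = sig
  type v
  val vadd : v -> v -> v
  val vsub : v -> v -> v
  val reduce : v -> v
end

(* Toy scalar model; [reduce] is a stand-in for Barrett reduction. *)
module Scalar_model = struct
  type v = int
  let q = 12289
  let vadd a b = a + b            (* unreduced sum *)
  let vsub a b = a + 3 * q - b    (* non-negative as long as b < 3q *)
  let reduce x = x mod q
end

module Lazy_reduction (L : V_LANG) (S : sig val stage : int end) = struct
  include L
  (* Assumption: even stages reduce the sum, odd stages leave it as is. *)
  let vadd a b =
    let r = L.vadd a b in
    if S.stage mod 2 = 0 then L.reduce r else r
  (* Differences are reduced in every stage. *)
  let vsub a b = L.reduce (L.vsub a b)
end

(* Instantiate the functor once per stage, here as a local module. *)
let add_at_stage s a b =
  let module M = Lazy_reduction (Scalar_model) (struct let stage = s end) in
  M.vadd a b

let sub_at_stage s a b =
  let module M = Lazy_reduction (Scalar_model) (struct let stage = s end) in
  M.vsub a b
```

With inputs below q, a sum left unreduced in an odd stage is at most 2q - 2 = 24576, which fits in 15 bits; code written against V_LANG does not need to change when the wrapped module is substituted for the original one.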
Appendix C Details on SIMD Backend Implementation
This is the full mapping between language primitives used in vectorized reductions of Sect. 4.2 and corresponding AVX2 instructions.
The not_zero primitive is implemented in a cumbersome way, because AVX comparison instructions return 0xFFFF or 0x0000 as the result of a comparison, whereas we need 0x0001 or 0x0000 to represent the presence or absence of the carry bit. The not_zero primitive hides such ISA-specific details and provides a straightforward interface to the programmer.
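One possible way to obtain 0x0001 or 0x0000 from an all-ones comparison mask is sketched below in OCaml, with each 16-bit lane modelled as an int. This is an assumption for illustration and not necessarily the instruction sequence used in the paper's backend: the helper names cmpeq16, add16, and not_zero_lane are hypothetical, and the trick shown (compare with zero, then add one with 16-bit wraparound) is only one of several options.

```ocaml
(* One 16-bit lane of a SIMD equality comparison:
   all ones if the operands are equal, all zeros otherwise. *)
let cmpeq16 a b = if a = b then 0xFFFF else 0x0000

(* One 16-bit lane of wrapping addition. *)
let add16 a b = (a + b) land 0xFFFF

(* not_zero on one lane: if x = 0 the mask is 0xFFFF and 0xFFFF + 1 wraps
   to 0x0000; otherwise the mask is 0x0000 and 0x0000 + 1 = 0x0001. *)
let not_zero_lane x = add16 (cmpeq16 x 0) 1

(* Lift to a whole vector, modelled here as an int array of lanes. *)
let not_zero (v : int array) = Array.map not_zero_lane v
```

Callers see only not_zero, so the mask convention of the underlying comparison stays inside the backend.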
Shuffle operations can be implemented by shift, blend, unpack, and permute instructions. The implementation using AVX2 is shown below. The AVX512 counterpart is entirely similar but uses different instruction combinations to realize desired permutations.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Masuda, M., Kameyama, Y. (2021). FFT Program Generation for Ring LWE-Based Cryptography. In: Nakanishi, T., Nojima, R. (eds) Advances in Information and Computer Security. IWSEC 2021. Lecture Notes in Computer Science(), vol 12835. Springer, Cham. https://doi.org/10.1007/978-3-030-85987-9_9
DOI: https://doi.org/10.1007/978-3-030-85987-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85986-2
Online ISBN: 978-3-030-85987-9
eBook Packages: Computer Science, Computer Science (R0)