FFT Program Generation for Ring LWE-Based Cryptography

  • Conference paper
Advances in Information and Computer Security (IWSEC 2021)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 12835)

Abstract

The Fast Fourier Transform (FFT) enables an efficient implementation of polynomial multiplication, which is at the core of cryptographic constructions based on the hardness of the Ring Learning with Errors (RLWE) problem. Existing implementations of FFT for RLWE-based cryptography rely on hand-written assembly code for performance, which makes them difficult to understand, maintain, and extend to new architectures.

We present a novel framework to implement FFT for RLWE-based cryptography, based on a principled program-generation approach. We start with a high-level, abstract definition of an FFT program, and generate low-level code by interpreting high-level primitives and delegating low-level details to an architecture-specific module. Since low-level details concerning modular arithmetic and vectorization are separated from the high-level logic, we can easily generate both AVX2- and AVX512-optimized low-level code from the same high-level description of the FFT program. Our generated code is highly competitive with expert-written assembly code: for AVX2 (resp. AVX512), it runs 1.13x (resp. 1.39x) faster than the AVX2-optimized assembly implementation in the NewHope key-exchange protocol.

Notes

  1. FFT in which coefficients are taken from a finite field is often called NTT (Number Theoretic Transform), but we use the term FFT throughout this paper.

  2. https://github.com/newhopecrypto/newhope-usenix

  3. The requirement on q comes from the fact that, in general, multiplying two polynomials of degree n requires a transform of size 2n. However, thanks to the property of negative wrapped convolution, a transform of size n suffices in practice.

  4. It is similar to “interface” in other languages.

  5. Recall that NewHope uses double precision floating-point instructions to compute reductions.

  6. Our AVX2 and AVX512 support code is each less than 90 lines of OCaml.
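
The remark in Note 3 about negative wrapped convolution can be checked on a small example. The following is a minimal sketch, not the paper's code: it assumes toy parameters q = 12289 and n = 4, uses a naive O(n^2) transform in place of an FFT, and all function names (pow_mod, find_psi, transform, mul_via_transform, negacyclic_mul) are hypothetical. It twists the inputs by powers of a 2n-th root of unity psi, so that a size-n transform yields the product modulo x^n + 1.

```ocaml
(* Assumed toy modulus for this sketch (the NewHope modulus). *)
let q = 12289

(* b^e mod q by square-and-multiply; expects 0 <= b < q. *)
let rec pow_mod b e =
  if e = 0 then 1
  else
    let h = pow_mod (b * b mod q) (e / 2) in
    if e land 1 = 1 then h * b mod q else h

(* Modular inverse computed as x^(q-2), valid because q is prime. *)
let inv x = pow_mod x (q - 2)

(* Reference: schoolbook product in Z_q[x]/(x^n + 1). A term whose index
   reaches n wraps around with a sign flip, since x^n = -1. *)
let negacyclic_mul a b =
  let n = Array.length a in
  let c = Array.make n 0 in
  for i = 0 to n - 1 do
    for j = 0 to n - 1 do
      let p = a.(i) * b.(j) mod q in
      let k = (i + j) mod n in
      if i + j < n then c.(k) <- (c.(k) + p) mod q
      else c.(k) <- (c.(k) - p + q) mod q
    done
  done;
  c

(* Smallest psi with psi^n = -1 (mod q). For n a power of two this is a
   primitive 2n-th root of unity; it exists when q = 1 (mod 2n). *)
let find_psi n =
  let rec go c = if pow_mod c n = q - 1 then c else go (c + 1) in
  go 2

(* Naive size-n transform with root w; it stands in for the FFT. *)
let transform w a =
  let n = Array.length a in
  Array.init n (fun k ->
      let s = ref 0 in
      Array.iteri (fun j x -> s := (!s + x * pow_mod w (j * k)) mod q) a;
      !s)

(* Product modulo x^n + 1 using only size-n transforms. *)
let mul_via_transform a b =
  let n = Array.length a in
  let psi = find_psi n in
  let w = psi * psi mod q in
  let twist r v = Array.mapi (fun i x -> x * pow_mod r i mod q) v in
  let fa = transform w (twist psi a) and fb = transform w (twist psi b) in
  let fc = Array.mapi (fun k x -> x * fb.(k) mod q) fa in
  let c = Array.map (fun x -> x * inv n mod q) (transform (inv w) fc) in
  twist (inv psi) c
```

Both functions expect coefficient arrays of equal length n with entries in [0, q). Comparing mul_via_transform against negacyclic_mul on a few inputs is a quick way to confirm that no zero-padding to size 2n is needed.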

Acknowledgement

We thank Tadanori Teruya for helpful discussion. Feedback from the anonymous reviewers helped improve this paper and is greatly appreciated. The second author is supported in part by JSPS Grant-in-Aid for Scientific Research (B) 18H03218.

Author information

Corresponding author

Correspondence to Masahiro Masuda.

Appendices

Appendix A Vectorize Module

The Vectorize module is used to generate vectorized code for trivially vectorizable loops. It simply redefines the meaning of the language primitives used in a sequential program, so that the same program can be evaluated to vectorized loop code. It is implemented as an OCaml functor, a construct often used in the tagless-final style to extend the meaning of an existing DSL.

figure x
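
The idea of reinterpreting the same primitives can be illustrated with a small string-generating DSL. This is a minimal sketch under stated assumptions, not the paper's Vectorize module: the signatures LANG and SIMD, the modules Scalar, Avx2, and Add_arrays, and the shape of the emitted C are all hypothetical. The only real names are the AVX2 intrinsics that appear as strings.

```ocaml
(* Hypothetical mini-DSL that generates C source as strings. *)
module type LANG = sig
  val add : string -> string -> string             (* a + b *)
  val load : string -> string -> string            (* arr[i] *)
  val store : string -> string -> string -> string (* arr[i] = v *)
  val for_ : int -> (string -> string) -> string   (* loop over 0..n-1 *)
end

(* Sequential meaning of the primitives. *)
module Scalar : LANG = struct
  let add a b = Printf.sprintf "(%s + %s)" a b
  let load arr i = Printf.sprintf "%s[%s]" arr i
  let store arr i v = Printf.sprintf "%s[%s] = %s;" arr i v
  let for_ n body =
    Printf.sprintf "for (int i = 0; i < %d; i++) { %s }" n (body "i")
end

(* Architecture-specific details, kept apart from the loop logic. *)
module type SIMD = sig
  val lanes : int      (* number of 16-bit elements per vector *)
  val vty : string     (* C vector type *)
  val vadd : string    (* intrinsic names *)
  val vload : string
  val vstore : string
end

module Avx2 : SIMD = struct
  let lanes = 16
  let vty = "__m256i"
  let vadd = "_mm256_add_epi16"
  let vload = "_mm256_loadu_si256"
  let vstore = "_mm256_storeu_si256"
end

(* Functor giving the same primitives a vectorized meaning: the loop
   advances by one vector per iteration and each operation becomes an
   intrinsic call. Assumes the trip count is a multiple of S.lanes. *)
module Vectorize (S : SIMD) : LANG = struct
  let ptr arr i = Printf.sprintf "(%s *)(%s + %s)" S.vty arr i
  let add a b = Printf.sprintf "%s(%s, %s)" S.vadd a b
  let load arr i = Printf.sprintf "%s(%s)" S.vload (ptr arr i)
  let store arr i v = Printf.sprintf "%s(%s, %s);" S.vstore (ptr arr i) v
  let for_ n body =
    Printf.sprintf "for (int i = 0; i < %d; i += %d) { %s }"
      n S.lanes (body "i")
end

(* One program, written once against LANG. *)
module Add_arrays (L : LANG) = struct
  let code =
    L.for_ 1024 (fun i -> L.store "c" i (L.add (L.load "a" i) (L.load "b" i)))
end

module Scalar_prog = Add_arrays (Scalar)
module Vector_prog = Add_arrays (Vectorize (Avx2))

let () =
  print_endline Scalar_prog.code;
  print_endline Vector_prog.code
```

Add_arrays is never edited when the target changes; only the module passed to it differs. An AVX512 variant would, under the same assumptions, be another SIMD instance with a wider lane count and different intrinsic names.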

Appendix B Lazy Reduction Implementation

We again implement lazy reduction as an OCaml functor, extending the original meanings of vadd and vsub to give them the semantics of lazy reduction. As explained in Sect. 4.3, we allow the result of addition to stay within 15 bits and apply Barrett reduction every other stage, while the result of subtraction is reduced to 14 bits in every stage by Barrett reduction. Such a specification for lazy reduction can be implemented as follows:

figure y

This is used in our FFT code generator as follows. Lazy_reduction is instantiated for each stage s, and, simply by wrapping the original meanings of vectorized primitives such as vadd and vsub defined in V_lang, the innermost loop now executes with lazy reduction enabled. Note that we do not have to change the code of the innermost loop at all. The tagless-final style allows such an extension in a highly modular manner.

figure z
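
The lazy-reduction scheme described in this appendix can be modelled on plain integers. The following is a minimal sketch, not the paper's listing: the module names V_lang and Lazy_reduction are borrowed from the text, but their definitions here are assumptions, as are the signature V_LANG, the helper barrett, the modulus q = 12289, the choice of which stage parity performs the reduction, and the operand bounds stated in the comments. Each int stands for one 16-bit lane.

```ocaml
(* Assumed modulus for this sketch (the NewHope modulus). *)
let q = 12289

module type V_LANG = sig
  val vadd : int -> int -> int
  val vsub : int -> int -> int
end

(* Original meanings: plain arithmetic with no reduction. *)
module V_lang : V_LANG = struct
  let vadd a b = a + b
  let vsub a b = a - b
end

(* Short Barrett reduction for 0 <= a < 2^16. The constant 5 is
   floor(2^16 / q). The result is congruent to a modulo q and fits in
   14 bits, but it is not necessarily fully reduced below q. *)
let barrett a = a - ((a * 5) lsr 16) * q

(* Wrap vadd and vsub with lazy reduction for a given stage.
   Assumed operand bounds: a < 2^15 and b < 2^14. *)
module Lazy_reduction (V : V_LANG) (S : sig val stage : int end) : V_LANG =
struct
  (* Additions are reduced only on odd stages (assumed parity); on even
     stages the sum is left unreduced in 15 bits. *)
  let vadd a b =
    let r = V.vadd a b in
    if S.stage land 1 = 1 then barrett r else r

  (* Subtractions add 2q first to stay non-negative, then are reduced
     to 14 bits on every stage. *)
  let vsub a b = barrett (V.vsub (V.vadd a (2 * q)) b)
end

(* One instance per stage; code written against V_LANG is unchanged. *)
module Even = Lazy_reduction (V_lang) (struct let stage = 0 end)
module Odd = Lazy_reduction (V_lang) (struct let stage = 1 end)

let () =
  Printf.printf "%d %d %d\n"
    (Even.vadd 16383 16383) (Odd.vadd 32766 16383) (Even.vsub 0 16383)
```

Because Lazy_reduction returns a module with the same signature as its argument, a butterfly written against V_LANG can be instantiated with either the plain or the lazily reduced primitives.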

Appendix C Details on SIMD Backend Implementation

The following is the full mapping between the language primitives used in the vectorized reductions of Sect. 4.2 and the corresponding AVX2 instructions.

figure aa

The not_zero primitive is implemented in a roundabout way, because AVX comparison instructions return 0xFFFF or 0x0000 as their result, whereas we need 0x0001 or 0x0000 to represent the presence or absence of the carry bit. The not_zero primitive hides such ISA-specific details and provides a straightforward interface to the programmer.
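
The mask-to-bit conversion can be modelled on a single 16-bit lane. This is a minimal sketch of one possible instruction combination, not necessarily the one used in the paper: the helpers cmpeq16 and andnot16 are hypothetical scalar stand-ins for the AVX2 intrinsics _mm256_cmpeq_epi16 and _mm256_andnot_si256.

```ocaml
(* Scalar model of one 16-bit lane; helper names are illustrative. *)

(* Models a lane of _mm256_cmpeq_epi16: all ones if equal, else zero. *)
let cmpeq16 a b = if a = b then 0xFFFF else 0x0000

(* Models a lane of _mm256_andnot_si256: (NOT a) AND b, kept to 16 bits. *)
let andnot16 a b = (lnot a) land b land 0xFFFF

(* 1 if the lane is nonzero, 0 otherwise: compare against zero, then
   invert the mask and keep only the lowest bit. *)
let not_zero x = andnot16 (cmpeq16 x 0) 1
```

Comparing against zero produces the mask for the opposite condition, so the and-not with the constant 1 both inverts it and narrows it to a single bit.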

Shuffle operations can be implemented by shift, blend, unpack, and permute instructions. The implementation using AVX2 is shown below. The AVX512 counterpart is analogous but uses different instruction combinations to realize the desired permutations.

figure ab

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Masuda, M., Kameyama, Y. (2021). FFT Program Generation for Ring LWE-Based Cryptography. In: Nakanishi, T., Nojima, R. (eds) Advances in Information and Computer Security. IWSEC 2021. Lecture Notes in Computer Science, vol 12835. Springer, Cham. https://doi.org/10.1007/978-3-030-85987-9_9

  • DOI: https://doi.org/10.1007/978-3-030-85987-9_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85986-2

  • Online ISBN: 978-3-030-85987-9

  • eBook Packages: Computer Science (R0)
