Efficient Montgomery Multiplication on GPUs

  • Nicolae Roşia
  • Virgil Cervicescu
  • Mihai Togan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9522)


Public-key cryptosystems and algorithms, including RSA [20], elliptic curve (EC) cryptography, and Diffie-Hellman key exchange [5], require efficient large-integer arithmetic in finite fields. Contemporary processors are not designed to support such operations efficiently, since most of them natively work on 8- to 64-bit words. Thus, an expensive cryptographic accelerator is frequently required to offload the computational burden. In this paper, we focus on a highly parallel architecture that is commonly found in commodity computers: the Graphics Processing Unit (GPU). In recent years, GPUs have seen exponential growth in computing power, becoming a cost-effective option for offloading computationally intensive tasks. This paper describes a parallel implementation of Montgomery multiplication, as well as optimizations that enable efficient exploitation of the CUDA GPU architecture.
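For readers unfamiliar with the technique, the following is a minimal sequential sketch of Montgomery multiplication [14] in Python. It is not the parallel CUDA implementation described in this paper. It only illustrates how the reduction replaces division by the modulus with shifts and masks by a power of two R. The function names (`montgomery_setup`, `redc`, `mod_mul`) and the 32-bit word size are assumptions chosen for illustration, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n, word_bits=32):
    """Precompute Montgomery constants for an odd modulus n.

    R = 2**r_bits is the smallest power of 2**word_bits that exceeds n.
    """
    if n <= 1 or n % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus > 1")
    words = (n.bit_length() + word_bits - 1) // word_bits
    r_bits = words * word_bits
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r   # n * n_prime == -1 (mod R)
    r2 = (r * r) % n                 # R^2 mod n, used to enter Montgomery form
    return r_bits, n_prime, r2


def redc(t, n, r_bits, n_prime):
    """Montgomery reduction: return t * R^-1 mod n for 0 <= t < n * R."""
    mask = (1 << r_bits) - 1
    m = ((t & mask) * n_prime) & mask   # m = (t mod R) * n' mod R
    u = (t + m * n) >> r_bits           # t + m*n is divisible by R
    return u - n if u >= n else u       # single conditional subtraction


def mod_mul(a, b, n):
    """Compute a * b mod n through Montgomery form (illustrative only)."""
    r_bits, n_prime, r2 = montgomery_setup(n)
    a_bar = redc((a % n) * r2, n, r_bits, n_prime)   # a * R mod n
    b_bar = redc((b % n) * r2, n, r_bits, n_prime)   # b * R mod n
    c_bar = redc(a_bar * b_bar, n, r_bits, n_prime)  # a * b * R mod n
    return redc(c_bar, n, r_bits, n_prime)           # leave Montgomery form
```

Converting into and out of Montgomery form costs extra reductions, so the method pays off mainly when many multiplications share one modulus, as in modular exponentiation.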


Montgomery multiplication, Modular exponentiation, CUDA, GPGPU


  1. OpenSSL: The Open Source toolkit for SSL/TLS.
  2. Antao, S., Bajard, J.C., Sousa, L.: Elliptic curve point multiplication on GPUs. In: Charot, F., Hannig, F., Teich, J., Wolinski, C. (eds.) ASAP, pp. 192–199. IEEE (2010)
  3. Cohen, A.E., Parhi, K.K.: GPU accelerated elliptic curve cryptography in GF(\(2^m\)). In: Proceedings of the 2010 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), Seattle, WA, pp. 57–60 (2010)
  4. Cook, D.L., Ioannidis, J., Keromytis, A.D., Luck, J.: CryptoGraphics: secret key cryptography using graphics cards. In: Menezes, A. (ed.) CT-RSA 2005. LNCS, vol. 3376, pp. 334–350. Springer, Heidelberg (2005)
  5. Diffie, W., Hellman, M.: New directions in cryptography. IEEE Trans. Inf. Theor. 22(6), 644–654 (1976)
  6. Dussé, S.R., Kaliski, Jr., B.S.: A cryptographic library for the Motorola DSP 56000. In: Damgård, I.B. (ed.) EUROCRYPT 1990. LNCS, vol. 473, pp. 230–244. Springer, Heidelberg (1991)
  7. Fleissner, S.: GPU-accelerated Montgomery exponentiation. In: Shi, Y., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds.) ICCS 2007, Part I. LNCS, vol. 4487, pp. 213–220. Springer, Heidelberg (2007)
  8. Giorgi, P., Izard, T., Tisserand, A.: Comparison of modular arithmetic algorithms on GPUs. In: Proceedings of International Conference on Parallel Computing ParCo, Lyon, France (2009)
  9. Harrison, O., Waldron, J.: Practical symmetric key cryptography on modern graphics hardware. In: 17th USENIX Security Symposium, pp. 195–209 (2008)
  10. Koç, C., Acar, T., Kaliski, B.J.: Analyzing and comparing Montgomery multiplication algorithms. IEEE Micro 16(3), 26–33 (1996)
  11. Leboeuf, K., Muscedere, R., Ahmadi, M.: High performance prime field multiplication for GPU. In: 2012 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 93–96, May 2012
  12. Leboeuf, K., Muscedere, R., Ahmadi, M.: A GPU implementation of the Montgomery multiplication algorithm for elliptic curve cryptography. In: IEEE International Symposium on Circuits and Systems (ISCAS 2013), pp. 2593–2596, May 2013
  13. Manavski, S.A.: CUDA compatible GPU as an efficient hardware accelerator for AES cryptography. In: IEEE International Conference on Signal Processing and Communications (ICSPC 2007), 24–27 November 2007, Dubai, United Arab Emirates, pp. 65–68 (2007)
  14. Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44(170), 519–521 (1985)
  15. Moss, A., Page, D., Smart, N.P.: Toward acceleration of RSA using 3D graphics hardware. In: Galbraith, S.D. (ed.) Cryptography and Coding 2007. LNCS, vol. 4887, pp. 364–383. Springer, Heidelberg (2007)
  16. NVIDIA Corporation: GeForce GTX 750 Specifications
  17. NVIDIA Corporation: CUDA C Best Practices Guide, 7.0 edn. (2015)
  18. NVIDIA Corporation: CUDA C Programming Guide, 7.0 edn. (2015)
  19. NVIDIA Corporation: Tuning CUDA Applications for Maxwell, 7.0 edn. (2015)
  20. Rivest, R., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 21, 120–126 (1978)
  21. Szerwinski, R., Güneysu, T.: Exploiting the power of GPUs for asymmetric cryptography. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008. LNCS, vol. 5154, pp. 79–99. Springer, Heidelberg (2008)
  22. Trei, W.: Efficient Modular Arithmetic for SIMD Devices. In: IACR Cryptology ePrint Archive 2013, 652 (2013)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Nicolae Roşia (1, 2)
  • Virgil Cervicescu (2)
  • Mihai Togan (3)
  1. Advanced Technology Institute, Bucharest, Romania
  2. Military Technical Academy, Bucharest, Romania
  3. certSIGN, Bucharest, Romania
