Efficient Acceleration of Asymmetric Cryptography on Graphics Hardware

  • Owen Harrison
  • John Waldron
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5580)

Abstract

Graphics processing units (GPUs) are increasingly being used for general-purpose computing. We present implementations of large-integer modular exponentiation, the core of public-key cryptosystems such as RSA, on a DirectX 10 compliant GPU. DirectX 10 compliant graphics processors are the latest generation of GPU architecture and provide increased programming flexibility and support for integer operations. We present high-performance modular exponentiation implementations based on integers represented in both standard radix form and residue number system (RNS) form. We show how a GPU implementation of a 1024-bit RSA decrypt primitive can outperform a comparable CPU implementation by up to 4 times, and can improve on previous GPU implementations by decreasing latency by up to 7 times and doubling throughput. We show that an adaptive approach to modular exponentiation, combining implementations based on both a radix and a residue number system, gives the best all-around performance on the GPU in terms of both latency and throughput. We also highlight the usage criteria necessary for the GPU to reach peak performance on public-key cryptographic operations.
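As a rough illustration of the two integer representations named in the abstract, the following Python sketch shows a binary square-and-multiply modular exponentiation on ordinary (radix) integers, and a toy residue number system in which a multiplication splits into independent per-modulus channels. This is not the authors' GPU implementation: the function names and the four small moduli are hypothetical choices made for readability, and reduction modulo an RSA modulus inside the residue number system needs additional machinery that is not shown here.

```python
from math import prod

# Hypothetical small pairwise coprime moduli (all prime), for illustration only.
# A 1024-bit implementation would need many word-sized moduli.
MODULI = [251, 241, 239, 233]


def modexp(base, exponent, modulus):
    """Left-to-right binary square-and-multiply on ordinary integers."""
    result = 1 % modulus
    base %= modulus
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # square for every exponent bit
        if bit == "1":
            result = (result * base) % modulus    # multiply when the bit is set
    return result


def to_rns(x, moduli=MODULI):
    """Represent x by its residues modulo each modulus."""
    return [x % m for m in moduli]


def rns_mul(a, b, moduli=MODULI):
    """Multiply channel by channel; no carries pass between channels."""
    return [(x * y) % m for x, y, m in zip(a, b, moduli)]


def from_rns(residues, moduli=MODULI):
    """Recover the integer (modulo the product of the moduli) via the CRT."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)          # pow(..., -1, m) is the inverse mod m
    return total % M


if __name__ == "__main__":
    print(modexp(4, 13, 497))
    print(from_rns(rns_mul(to_rns(1234), to_rns(5678))))
```

The per-channel independence in `rns_mul` is what makes a residue number system a natural fit for highly parallel hardware, whereas the radix form has to propagate carries between words. The result of `from_rns` is only exact while the true product stays below the product of the moduli.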

Keywords

Graphics Processor, Public-Key Cryptography, RSA, Residue Number System

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Owen Harrison (1)
  • John Waldron (1)
  1. Computer Architecture Group, Trinity College Dublin, Dublin 2, Ireland
