High-Speed Elliptic Curve Cryptography on the NVIDIA GT200 Graphics Processing Unit

  • Shujie Cui
  • Johann Großschädl
  • Zhe Liu
  • Qiuliang Xu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8434)


This paper describes a high-speed software implementation of Elliptic Curve Cryptography (ECC) for GeForce GTX graphics cards equipped with an NVIDIA GT200 Graphics Processing Unit (GPU). In order to maximize throughput, our ECC software allocates just a single thread per scalar multiplication and aims to launch as many threads in parallel as possible. We adopt elliptic curves in Montgomery as well as twisted Edwards form, both defined over a special family of finite fields known as Optimal Prime Fields (OPFs). All field-arithmetic operations use a radix-2^24 representation for the operands (i.e., 24 operand bits are contained in a 32-bit word) to comply with the native (24 × 24)-bit integer multiply instruction of the GT200 platform. We implemented the OPF arithmetic without conditional statements (e.g., if-then clauses) to prevent thread divergence and unrolled the loops to minimize execution time. The scalar multiplication on the twisted Edwards curve employs a comb approach if the base point is fixed and uses extended projective coordinates so that a point addition requires only seven multiplications in the underlying OPF. Our software currently supports elliptic curves over 160-bit and 224-bit OPFs. After a detailed evaluation of numerous implementation options and configurations, we managed to launch 2880 threads on the 30 multiprocessors of the GT200 when the elliptic curve has Montgomery form and is defined over a 224-bit OPF. The resulting throughput is 115k scalar multiplications per second (for arbitrary base points) and we achieved a minimum latency of 19.2 ms. In a fixed-base setting with 256 precomputed points, the throughput increases to some 345k scalar multiplications per second and the latency drops to 4.52 ms.
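
The radix-2^24 operand layout and the branch-free arithmetic mentioned in the abstract can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not the authors' GPU code: the function names are hypothetical, the modulus `P` is an arbitrary 160-bit odd placeholder (not claimed to be prime or to be one of the paper's OPFs), and Python integers stand in for the 32-bit words of the GT200. A 160-bit operand needs ceil(160 / 24) = 7 limbs.

```python
# Illustrative model of radix-2^24 operand handling; all names are hypothetical.
LIMB_BITS = 24
LIMB_MASK = (1 << LIMB_BITS) - 1
N_LIMBS = 7  # ceil(160 / 24) limbs for a 160-bit operand

# Placeholder 160-bit odd modulus for demonstration only; it is NOT claimed
# to be prime or to be one of the OPFs used in the paper.
P = (0xFFFB << 144) + 1


def to_limbs(x, n=N_LIMBS):
    """Split a non-negative integer into n 24-bit limbs, least significant first."""
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(n)]


def from_limbs(limbs):
    """Recombine 24-bit limbs (least significant first) into an integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def mul_limbs(a, b):
    """Schoolbook product of two equal-length limb arrays.

    Every partial product a[i] * b[j] is a (24 x 24)-bit multiplication
    whose result fits in 48 bits; the intermediate t never exceeds 2^48 - 1.
    """
    n = len(a)
    acc = [0] * (2 * n)
    for i in range(n):
        carry = 0
        for j in range(n):
            t = acc[i + j] + a[i] * b[j] + carry
            acc[i + j] = t & LIMB_MASK
            carry = t >> LIMB_BITS
        acc[i + n] = carry
    return acc


def add_mod(a, b, p):
    """Return (a + b) mod p for limb arrays with a, b < p, without any if.

    Both a + b and a + b - p are computed; a mask then selects the
    reduced value, so every thread would follow the same instruction path.
    """
    n = len(a)
    s, carry = [], 0
    for i in range(n):
        t = a[i] + b[i] + carry
        s.append(t & LIMB_MASK)
        carry = t >> LIMB_BITS
    d, borrow = [], 0
    for i in range(n):
        t = s[i] - p[i] - borrow
        d.append(t & LIMB_MASK)
        borrow = (t >> LIMB_BITS) & 1
    keep = carry | (borrow ^ 1)    # 1 when a + b >= p
    mask = -keep & LIMB_MASK       # all ones or all zeros
    return [(d[i] & mask) | (s[i] & (mask ^ LIMB_MASK)) for i in range(n)]


if __name__ == "__main__":
    x, y = P - 1, P - 2
    xl, yl, pl = to_limbs(x), to_limbs(y), to_limbs(P)
    print(from_limbs(mul_limbs(xl, yl)) == x * y)
    print(from_limbs(add_mod(xl, yl, pl)) == (x + y) % P)
```

Storing 24 bits per 32-bit word leaves eight spare bits in every limb, which is what allows carries to be propagated with plain shifts and masks as above. On the actual hardware the selection mask would be applied to 32-bit registers rather than Python integers.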


Keywords (machine-generated, not supplied by the authors): Graphics Processing Unit, Shared Memory, Elliptic Curve, Scalar Multiplication, Global Memory





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Shujie Cui (1, 2)
  • Johann Großschädl (2)
  • Zhe Liu (2)
  • Qiuliang Xu (1)

  1. School of Computer Science and Technology, Shandong University, Jinan, P.R. China
  2. Laboratory of Algorithmics, Cryptology and Security, University of Luxembourg, Luxembourg
