Montgomery Multiplication on the Cell

  • Joppe W. Bos
  • Marcelo E. Kaihara
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6067)


A technique to speed up Montgomery multiplication on the Synergistic Processor Elements (SPEs) of the Cell Broadband Engine is proposed. The technique splits a number into four consecutive parts and places each part in one of the four element positions of a vector, so that the parts form the columns of a 4-SIMD organization. This representation enables arithmetic to be performed in a 4-SIMD fashion. For odd moduli of length 160 to 2048 bits, an implementation of Montgomery multiplication using this technique is up to 2.47 times faster than the unrolled implementation of Montgomery multiplication that is part of the IBM multi-precision math library. The presented technique can also be applied to speed up Montgomery multiplication on other SIMD architectures.
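For readers unfamiliar with the operation being accelerated, the following is a minimal Python sketch of plain (non-SIMD) Montgomery multiplication for an odd modulus, together with an illustrative split of an operand into four consecutive parts. It is not the authors' SPE implementation: all function names are hypothetical, the equal-width lane layout in `split_into_lanes` is an assumption made only to illustrate the idea of four vector element positions, and Python integers stand in for multi-word arithmetic.

```python
def montgomery_setup(modulus: int, bits: int):
    """Return (R, n_prime) with R = 2**bits and n_prime = -modulus^-1 mod R."""
    if modulus % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus")
    if modulus >= (1 << bits):
        raise ValueError("R = 2**bits must exceed the modulus")
    r = 1 << bits
    n_prime = (-pow(modulus, -1, r)) % r  # modular inverse, Python 3.8+
    return r, n_prime


def montgomery_multiply(a: int, b: int, modulus: int, bits: int, n_prime: int) -> int:
    """Return a * b * R^-1 mod modulus for 0 <= a, b < modulus."""
    mask = (1 << bits) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # chosen so that t + m*modulus is divisible by R
    u = (t + m * modulus) >> bits       # exact division by R
    return u - modulus if u >= modulus else u  # single conditional subtraction


def to_montgomery(x: int, modulus: int, bits: int) -> int:
    """Map x to its Montgomery form x * R mod modulus."""
    return (x << bits) % modulus


def from_montgomery(x: int, modulus: int, bits: int, n_prime: int) -> int:
    """Map a Montgomery-form value back to the ordinary residue."""
    return montgomery_multiply(x, 1, modulus, bits, n_prime)


def split_into_lanes(x: int, bits: int, lanes: int = 4):
    """Split x into `lanes` consecutive parts, least significant first.

    Assumption for illustration: equal-width parts, one per vector element.
    """
    part_bits = -(-bits // lanes)  # ceiling division
    mask = (1 << part_bits) - 1
    return [(x >> (i * part_bits)) & mask for i in range(lanes)]


def join_lanes(parts, bits: int, lanes: int = 4) -> int:
    """Inverse of split_into_lanes."""
    part_bits = -(-bits // lanes)
    return sum(p << (i * part_bits) for i, p in enumerate(parts))


if __name__ == "__main__":
    n = (1 << 160) - 47          # an odd 160-bit modulus, chosen only as an example
    r, n_prime = montgomery_setup(n, 160)
    a, b = 123456789, 987654321
    am, bm = to_montgomery(a, n, 160), to_montgomery(b, n, 160)
    product = from_montgomery(montgomery_multiply(am, bm, n, 160, n_prime), n, 160, n_prime)
    assert product == (a * b) % n
    assert join_lanes(split_into_lanes(am, 160), 160) == am
```

The sketch only shows the data layout and the arithmetic identity. The speed-up reported in the abstract comes from carrying out the multiplication and reduction on the four parts in parallel on the SPE, which this scalar code does not attempt.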


Cell Broadband Engine, Cryptology, Computer Arithmetic, Montgomery Multiplication, Single Instruction Multiple Data (SIMD)





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Joppe W. Bos (1)
  • Marcelo E. Kaihara (1)
  1. Laboratory for Cryptologic Algorithms, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
