Architectural Enhancements for Montgomery Multiplication on Embedded RISC Processors

  • Johann Großschädl
  • Guy-Armand Kamendje
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2846)


Montgomery multiplication normally spends over 90% of its execution time in inner loops that perform multiply-and-add operations. The performance of these critical code sections can be greatly improved by customizing the processor's instruction set for low-level arithmetic functions. In this paper, we investigate the potential of architectural enhancements for multiple-precision Montgomery multiplication according to the so-called Finely Integrated Product Scanning (FIPS) method. We present instruction set extensions that accelerate the FIPS inner loop and rely on a multiply/accumulate (MAC) unit with a wide accumulator. Finally, we estimate the execution time of a 1024-bit Montgomery multiplication on an extended MIPS32 core and discuss the impact of the multiplier latency.
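The following is a minimal Python sketch of word-level Montgomery multiplication in the FIPS style, as the method is commonly formulated by Koç, Acar and Kaliski. It illustrates the inner-loop structure only. It assumes 32-bit words (as on MIPS32) and an odd modulus. The Python integer `acc` stands in for the wide (t:u:v) accumulator of a MAC unit, and the helper names (`to_words`, `from_words`, `neg_inv_word`, `fips_montmul`) are illustrative and not taken from the paper. Each inner-loop iteration performs two multiply-and-accumulate steps, which is the operation the proposed instruction set extensions target.

```python
# Sketch only: 32-bit words assumed (as on MIPS32); all names are illustrative.
W = 32
MASK = (1 << W) - 1


def to_words(x: int, s: int) -> list[int]:
    """Split a non-negative integer into s little-endian 32-bit words."""
    return [(x >> (W * i)) & MASK for i in range(s)]


def from_words(ws: list[int]) -> int:
    """Reassemble little-endian 32-bit words into an integer."""
    return sum(w << (W * i) for i, w in enumerate(ws))


def neg_inv_word(n0: int) -> int:
    """Return -n0^(-1) mod 2^32 for odd n0 (needs Python 3.8+ pow)."""
    return (-pow(n0, -1, 1 << W)) & MASK


def fips_montmul(a: list[int], b: list[int], n: list[int], n0p: int) -> list[int]:
    """Return a*b*2^(-32*s) mod n as s words (FIPS method).

    a, b, n are s-word little-endian operands with n odd and a, b < n;
    n0p is -n[0]^(-1) mod 2^32.
    """
    s = len(n)
    m = [0] * s   # holds quotient digits first, then result digits
    acc = 0       # models the wide accumulator of a MAC unit
    # Phase 1: columns 0..s-1, producing the quotient digits m[i].
    for i in range(s):
        for j in range(i):
            acc += a[j] * b[i - j]   # MAC step 1
            acc += m[j] * n[i - j]   # MAC step 2
        acc += a[i] * b[0]
        m[i] = ((acc & MASK) * n0p) & MASK  # chosen so the low word cancels
        acc += m[i] * n[0]                  # low word of acc is now zero
        acc >>= W                           # shift accumulator by one word
    # Phase 2: columns s..2s-1, producing the result digits.
    for i in range(s, 2 * s):
        for j in range(i - s + 1, s):
            acc += a[j] * b[i - j]
            acc += m[j] * n[i - j]
        m[i - s] = acc & MASK  # m[i-s] is no longer needed as a quotient digit
        acc >>= W
    # acc now holds the top word (0 or 1), since the result is below 2n.
    res = from_words(m) + (acc << (W * s))
    modulus = from_words(n)
    if res >= modulus:  # final subtraction (not constant-time here)
        res -= modulus
    return to_words(res, s)


if __name__ == "__main__":
    s = 32                        # 1024-bit operands in 32-bit words
    N = (1 << 1023) + 12345       # arbitrary odd example modulus
    a, b = N - 1, N - 2
    m = fips_montmul(to_words(a, s), to_words(b, s), to_words(N, s),
                     neg_inv_word(N & MASK))
    print(hex(from_words(m)))
```

The final subtraction uses Python's arbitrary-precision integers only for brevity. Because it depends on the operand values, a real implementation would have to take timing side channels into account.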




Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Johann Großschädl (1)
  • Guy-Armand Kamendje (1)
  1. Institute for Applied Information Processing and Communications, Graz University of Technology, Graz, Austria
