Reverse Product-Scanning Multiplication and Squaring on 8-Bit AVR Processors

  • Zhe Liu
  • Hwajeong Seo
  • Johann Großschädl
  • Howon Kim
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8958)


High performance, small code size, and good scalability are important requirements for software implementations of multi-precision arithmetic algorithms to fit resource-limited embedded systems. In this paper, we describe optimization techniques to speed up multi-precision multiplication and squaring on the AVR ATmega series of 8-bit microcontrollers. First, we present a new approach to perform multi-precision multiplication, called Reverse Product Scanning (RPS), that resembles the hybrid technique of Gura et al., but calculates the byte-products in the inner loop in reverse order. The RPS method processes four bytes of the two operands in each iteration of the inner loop and employs two carry-catcher registers to minimize the number of add instructions. We also describe an optimized algorithm for multi-precision squaring based on the RPS technique that is, depending on the operand length, up to 44.3 % faster than multiplication. Our AVR Assembly implementations of RPS multiplication and RPS squaring occupy less than 1 kB of code space each and are written in a parameterized fashion so that they can support operands of varying length without recompilation. Despite this high level of flexibility, our RPS multiplication outperforms the looped variant of Hutter et al.’s operand-caching technique and saves between 40 and 51 % of code size. We also combine our RPS multiplication and squaring routines with Karatsuba’s method to further reduce execution time. When executed on an ATmega128 processor, the “karatsubarized RPS method” needs only 85 k clock cycles for a 1024-bit multiplication (or 48 k cycles for a squaring). These results show that it is possible to achieve high performance without sacrificing code size or scalability.
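To make the column-wise ordering of byte-products concrete, the following is a minimal sketch of plain product-scanning (Comba-style) multiplication and squaring on little-endian byte arrays. It is not the RPS method: it does not process the byte-products in reverse order, does not handle four bytes per inner-loop iteration, and has no carry-catcher registers. The names `product_scanning_mul`, `product_scanning_sqr`, `to_limbs`, and `from_limbs` are hypothetical helpers chosen for illustration, and operands are assumed to be non-empty.

```python
def to_limbs(x, n):
    """Split a non-negative integer into n little-endian bytes (illustrative helper)."""
    return list(x.to_bytes(n, "little"))


def from_limbs(limbs):
    """Reassemble a little-endian byte list into an integer (illustrative helper)."""
    return int.from_bytes(bytes(limbs), "little")


def product_scanning_mul(a, b):
    """Multiply two non-empty little-endian byte lists column by column."""
    n, m = len(a), len(b)
    result = [0] * (n + m)
    acc = 0  # running column sum; on an 8-bit CPU this would occupy a few registers
    for k in range(n + m - 1):
        # Every byte-product a[i] * b[k - i] contributes to result byte k.
        for i in range(max(0, k - m + 1), min(k, n - 1) + 1):
            acc += a[i] * b[k - i]
        result[k] = acc & 0xFF  # the low byte of the column sum is final
        acc >>= 8               # the remainder carries into the next column
    result[n + m - 1] = acc & 0xFF
    return result


def product_scanning_sqr(a):
    """Square a non-empty little-endian byte list, computing each cross product once."""
    n = len(a)
    result = [0] * (2 * n)
    acc = 0
    for k in range(2 * n - 1):
        cross = 0
        # Off-diagonal products a[i] * a[k - i] with i < k - i occur twice in the square.
        for i in range(max(0, k - n + 1), (k + 1) // 2):
            cross += a[i] * a[k - i]
        acc += 2 * cross
        if k % 2 == 0:
            acc += a[k // 2] * a[k // 2]  # diagonal product occurs once
        result[k] = acc & 0xFF
        acc >>= 8
    result[2 * n - 1] = acc & 0xFF
    return result
```

In this sketch the squaring routine computes each off-diagonal byte-product only once and doubles it, which is the general reason a dedicated squaring can be cheaper than a multiplication; the speed-up figures quoted in the abstract refer to the authors' AVR assembly code, not to this illustration.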


Clock Cycle, Partial Product, Flash Memory, Nest Loop, Code Size
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. Atmel Corporation: 8-bit AVR® Instruction Set. User Guide, July 2008.
  2. Atmel Corporation: 8-bit AVR® Microcontroller with 128 K Bytes In-System Programmable Flash: ATmega128, ATmega128L. Datasheet, June 2008.
  3. Bernstein, D.J.: Batch binary Edwards. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 317–336. Springer, Heidelberg (2009)
  4. Boneh, D., Franklin, M.K.: Identity-based encryption from the Weil pairing. SIAM J. Comput. 32(3), 586–615 (2003)
  5. Brent, R.P., Zimmermann, P.: Modern Computer Arithmetic, Cambridge Monographs on Applied and Computational Mathematics, vol. 18. Cambridge University Press, Cambridge (2010)
  6. Comba, P.G.: Exponentiation cryptosystems on the IBM PC. IBM Syst. J. 29(4), 526–538 (1990)
  7. Großschädl, J., Avanzi, R.M., Savaş, E., Tillich, S.: Energy-efficient software implementation of long integer modular arithmetic. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 75–90. Springer, Heidelberg (2005)
  8. Großschädl, J., Savaş, E.: Instruction set extensions for fast arithmetic in finite fields GF(p) and GF(2^m). In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 133–147. Springer, Heidelberg (2004)
  9. Gura, N., Patel, A., Wander, A., Eberle, H., Shantz, S.C.: Comparing elliptic curve cryptography and RSA on 8-bit CPUs. In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 119–132. Springer, Heidelberg (2004)
  10. Hankerson, D.R., Menezes, A.J., Vanstone, S.A.: Guide to Elliptic Curve Cryptography. Springer Verlag, New York (2004)
  11. Hutter, M., Schwabe, P.: Multiprecision multiplication on AVR revisited. Cryptology ePrint Archive, Report 2014/592 (2014)
  12. Hutter, M., Wenger, E.: Fast multi-precision multiplication for public-key cryptography on embedded microprocessors. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 459–474. Springer, Heidelberg (2011)
  13. Karatsuba, A.A., Ofman, Y.P.: Multiplication of multidigit numbers on automata. Soviet Physics - Doklady 7(7), 595–596 (1963)
  14. Koblitz, N.I.: Elliptic curve cryptosystems. Math. Comput. 48(177), 203–209 (1987)
  15. Liu, Z., Großschädl, J.: New speed records for Montgomery modular multiplication on 8-bit AVR microcontrollers. In: Pointcheval, D., Vergnaud, D. (eds.) AFRICACRYPT 2014. LNCS, vol. 8469, pp. 215–234. Springer, Heidelberg (2014)
  16. Liu, Z., Großschädl, J., Kizhvatov, I.: Efficient and side-channel resistant RSA implementation for 8-bit AVR microcontrollers. In: Proceedings of the 1st International Workshop on the Security of the Internet of Things (SECIOT 2010) (2010)
  17. Miller, V.S.: Use of elliptic curves in cryptography. In: Williams, H.C. (ed.) CRYPTO 1985. LNCS, vol. 218, pp. 417–426. Springer, Heidelberg (1986)
  18. Rivest, R.L., Shamir, A., Adleman, L.M.: A method for obtaining digital signatures and public key cryptosystems. Commun. ACM 21(2), 120–126 (1978)
  19. Scott, M., Szczechowiak, P.: Optimizing multiprecision multiplication for public key cryptography. Cryptology ePrint Archive, Report 2007/299 (2007)
  20. Seo, H., Kim, H.: Multi-precision multiplication for public-key cryptography on embedded microprocessors. In: Lee, D.H., Yung, M. (eds.) WISA 2012. LNCS, vol. 7690, pp. 55–67. Springer, Heidelberg (2012)
  21. Seo, H., Kim, H.: Optimized multi-precision multiplication for public-key cryptography on embedded microprocessors. Int. J. Comput. Commun. Eng. 2(3), 255–259 (2013)
  22. Seo, H., Liu, Z., Choi, J., Kim, H.: Multi-precision squaring for public-key cryptography on embedded microprocessors. In: Paul, G., Vaudenay, S. (eds.) INDOCRYPT 2013. LNCS, vol. 8250, pp. 227–243. Springer, Heidelberg (2013)
  23. Uhsadel, L., Poschmann, A., Paar, C.: Enabling full-size public-key algorithms on 8-bit sensor nodes. In: Stajano, F., Meadows, C., Capkun, S., Moore, T. (eds.) ESAS 2007. LNCS, vol. 4572, pp. 73–86. Springer, Heidelberg (2007)
  24. Zhang, Y., Großschädl, J.: Efficient prime-field arithmetic for elliptic curve cryptography on wireless sensor nodes. In: Proceedings of the 1st International Conference on Computer Science and Network Technology (ICCSNT 2011), vol. 1, pp. 459–466. IEEE (2011)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Zhe Liu (1), corresponding author
  • Hwajeong Seo (2)
  • Johann Großschädl (1)
  • Howon Kim (2)

  1. Laboratory of Algorithmics, Cryptology and Security (LACS), University of Luxembourg, Luxembourg, Luxembourg
  2. School of Computer Science and Engineering, Pusan National University, Busan, Republic of Korea
