All the AES You Need on Cortex-M3 and M4

  • Peter Schwabe
  • Ko Stoffelen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10532)


This paper describes highly optimized AES-\(\{128,192,256\}\)-CTR assembly implementations for the popular ARM Cortex-M3 and M4 embedded microprocessors. These implementations are about twice as fast as existing implementations. Additionally, we provide the fastest bitsliced constant-time and masked implementations of AES-128-CTR to protect against timing attacks, power analysis, and other (first-order) side-channel attacks. All implementations are released into the public domain, including an architecture-specific instruction scheduler and register allocator, which we use to minimize expensive loads.


AES · Software implementation · ARM Cortex-M · Constant-time · Bitslicing · Masking



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Digital Security Group, Radboud University, Nijmegen, The Netherlands
