Efficient Vector Implementations of AES-Based Designs: A Case Study and New Implementations for Grøstl

  • Severin Holzer-Graf
  • Thomas Krinninger
  • Martin Pernull
  • Martin Schläffer
  • Peter Schwabe
  • David Seywald
  • Wolfgang Wieser
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7779)


In this paper, we evaluate and improve different vector implementation techniques for AES-based designs. We analyze how well the T-table, bitsliced and bytesliced implementation techniques apply to the SHA-3 finalist Grøstl. We present a number of new Grøstl implementations that improve upon many previous results. For example, our fastest ARM NEON implementation of Grøstl is 40% faster than the previously fastest ARM implementation. We present the first Intel AVX2 implementations of Grøstl, which require 40% fewer instructions than previous implementations. Furthermore, we present ARM Cortex-M0 implementations of Grøstl that improve the speed by 55% or the memory requirements by 15%.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Severin Holzer-Graf (1)
  • Thomas Krinninger (1)
  • Martin Pernull (1)
  • Martin Schläffer (1)
  • Peter Schwabe (2)
  • David Seywald (1)
  • Wolfgang Wieser (1)

  1. IAIK, Graz University of Technology, Austria
  2. Digital Security Group, Radboud University Nijmegen, The Netherlands
