A Program Generator for Intel AES-NI Instructions

  • Raymond Manley
  • David Gregg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6498)


Recent Intel processors provide hardware instructions that implement a full AES round in a single instruction. Existing libraries use hand-tuned assembly language to overlap the execution of multiple AES instructions and extract maximum performance. We present a program generator that creates optimized AES code automatically from a simple, annotated C version of the code. We show how this generator can be used to rapidly create highly optimized versions of several AES modes. The generated code matches the performance of the hand-tuned assembly libraries from Intel, or runs up to 7% faster.
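
To make the idea of overlapping AES instructions concrete, the following is a minimal sketch of a template-style generator. It is not the authors' tool, which works from annotated C. The function name `generate_interleaved_encrypt`, its parameters, and the shape of the emitted C function are assumptions made for illustration. The sketch emits C that encrypts several independent blocks, issuing each round's `_mm_aesenc_si128` for every block before moving to the next round, so that consecutive instructions do not depend on one another.

```python
def generate_interleaved_encrypt(n_blocks: int, n_rounds: int = 10) -> str:
    """Return C source that encrypts n_blocks independent blocks with AES-NI.

    Hypothetical illustration: the round instructions of different blocks
    are interleaved so that adjacent instructions are independent.
    n_rounds=10 corresponds to AES-128; rk holds n_rounds + 1 round keys.
    """
    lines = [
        "#include <wmmintrin.h>  /* AES-NI intrinsics */",
        "",
        f"void encrypt_{n_blocks}_blocks(const __m128i *in, __m128i *out,",
        "                      const __m128i *rk)",
        "{",
    ]
    # Load each block and XOR it with the first round key.
    for b in range(n_blocks):
        lines.append(
            f"    __m128i b{b} = _mm_xor_si128(_mm_loadu_si128(&in[{b}]), rk[0]);"
        )
    # Middle rounds: one AESENC per block per round, blocks interleaved.
    for r in range(1, n_rounds):
        for b in range(n_blocks):
            lines.append(f"    b{b} = _mm_aesenc_si128(b{b}, rk[{r}]);")
    # Final round uses AESENCLAST.
    for b in range(n_blocks):
        lines.append(f"    b{b} = _mm_aesenclast_si128(b{b}, rk[{n_rounds}]);")
    # Store the encrypted blocks.
    for b in range(n_blocks):
        lines.append(f"    _mm_storeu_si128(&out[{b}], b{b});")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print a 4-way interleaved AES-128 routine.
    print(generate_interleaved_encrypt(4))
```

The number of blocks to interleave is left as a parameter because the best value depends on the latency and throughput of the AES instructions on the target processor. The emitted C would typically need a compiler flag such as `-maes` to build.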


Keywords: AES, AES-NI, program generator, code generation





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Raymond Manley (1)
  • David Gregg (1)
  1. Lero@TCD, School of Computer Science and Statistics, Trinity College Dublin, Ireland
