On the Power of Bitslice Implementation on Intel Core2 Processor

  • Mitsuru Matsui
  • Junko Nakajima
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4727)


This paper discusses the state-of-the-art fast software implementation of block ciphers on Intel’s new Core2 microprocessor, concentrating in particular on “bitslice implementation”. The bitslice parallel encryption technique, originally proposed by Biham for speeding up DES, has been successful on RISC processors with many long registers. On PC platforms, however, bitsliced ciphers are not widely used in real applications, because in many cases they were not actually very fast on earlier PC processors. Moreover, the bitslice mode requires a non-standard data format, so compatibility with an existing parallel mode of operation needs an additional format conversion, which has been considered expensive.
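To make the bitslice idea concrete, here is a minimal toy sketch (not the authors' code). It assumes a Python int stands in for one wide register and that 64 blocks are processed in parallel (`N_BLOCKS` is an assumed value). Each "slice" holds one bit position of every block, so a Boolean formula evaluated with word-wide AND/XOR/NOT computes the same function for all blocks at once. The 3-bit S-box `toy_sbox` and all function names are hypothetical illustrations, not KASUMI or AES components. Note that the bitsliced version indexes no table, so no memory address depends on the data.

```python
# Toy model of bitslicing: a Python int stands in for one wide register.
N_BLOCKS = 64  # assumed register width = number of blocks processed in parallel


def to_bitslice(blocks, width):
    """Standard format -> bitslice format (naive, one bit at a time).

    Bit j of slices[i] is bit i of blocks[j].
    """
    slices = [0] * width
    for j, block in enumerate(blocks):
        for i in range(width):
            slices[i] |= ((block >> i) & 1) << j
    return slices


def from_bitslice(slices, n_blocks):
    """Bitslice format -> standard format (inverse of to_bitslice)."""
    blocks = [0] * n_blocks
    for i, s in enumerate(slices):
        for j in range(n_blocks):
            blocks[j] |= ((s >> j) & 1) << i
    return blocks


def toy_sbox(x):
    """Hypothetical 3-bit S-box, y_i = x_i ^ (NOT x_{i+1} AND x_{i+2}), one block."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    y0 = x0 ^ ((1 - x1) & x2)
    y1 = x1 ^ ((1 - x2) & x0)
    y2 = x2 ^ ((1 - x0) & x1)
    return y0 | (y1 << 1) | (y2 << 2)


def toy_sbox_bitsliced(s0, s1, s2):
    """The same S-box for every block at once: each operator acts on a whole slice.

    No table is indexed, so no memory address depends on the data.
    (~a & b) stays non-negative because b is, so no extra masking is needed.
    """
    return [s0 ^ (~s1 & s2),
            s1 ^ (~s2 & s0),
            s2 ^ (~s0 & s1)]


if __name__ == "__main__":
    blocks = [j % 8 for j in range(N_BLOCKS)]   # 64 three-bit blocks
    sliced = to_bitslice(blocks, 3)             # 3 slices of 64 bits each
    out = from_bitslice(toy_sbox_bitsliced(*sliced), N_BLOCKS)
    assert out == [toy_sbox(b) for b in blocks]
    print("bitsliced and per-block results agree")
```

Here nine word-wide logic operations evaluate the S-box for all 64 blocks, while the per-block version repeats its work 64 times. The naive conversion loops, by contrast, cost one operation per bit, which is why the format conversion was considered expensive.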

This paper demonstrates that some bitsliced ciphers achieve a remarkable performance gain on Intel’s Core2 processor thanks to its enhanced SIMD architecture. We show that KASUMI, a UMTS/GSM mobile standard block cipher, can be four times faster when implemented using a bitslice technique on this processor. Our bitsliced AES code also runs at 9.2 cycles/byte, the fastest AES speed ever recorded on a PC processor. Next, we focus for the first time on how to optimize a conversion algorithm between the bitslice format and the standard format on a specific processor. As a result, the bitsliced AES code can be faster than a highly optimized “standard AES” code on Core2, even when the conversion overhead is taken into account. This means that in CTR mode, bitsliced AES is not only fast but also fully compatible with existing implementations, and moreover secure against cache timing attacks, since a bitsliced cipher does not use any lookup tables with key- or data-dependent addresses.
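Converting between the standard format and the bitslice format is a bit-matrix transpose. The authors' optimized conversion algorithm is not described in this abstract, so as a stand-in this sketch uses a well-known generic technique: a transpose built from masked shift-and-XOR ("delta swap") stages, shown on a toy matrix of eight 8-bit blocks packed into one 64-bit word. The function names and the 64-bit word size are assumptions for illustration; a real bitsliced AES converts much larger matrices held in SIMD registers.

```python
M64 = (1 << 64) - 1  # model of one 64-bit register


def transpose8x8(x):
    """Transpose an 8x8 bit matrix packed into a 64-bit integer.

    Element (row r, column c) is stored at bit 8*r + c and moves to bit 8*c + r.
    Three masked "delta swap" stages do the work of 64 single-bit moves:
    swap inside 2x2 blocks, then 2x2 blocks inside 4x4 blocks, then 4x4 blocks.
    """
    t = (x ^ (x >> 7)) & 0x00AA00AA00AA00AA
    x ^= t ^ (t << 7)
    t = (x ^ (x >> 14)) & 0x0000CCCC0000CCCC
    x ^= t ^ (t << 14)
    t = (x ^ (x >> 28)) & 0x00000000F0F0F0F0
    x ^= t ^ (t << 28)
    return x & M64


def blocks_to_slices(blocks):
    """Eight 8-bit blocks (standard format) -> eight slices (hypothetical helper).

    Block j becomes row j; afterwards byte i (slice i) holds bit i of every block.
    """
    x = int.from_bytes(bytes(blocks), "little")
    return list(transpose8x8(x).to_bytes(8, "little"))


def slices_to_blocks(slices):
    """Inverse conversion; a transpose is its own inverse."""
    return blocks_to_slices(slices)


if __name__ == "__main__":
    blocks = [0x01, 0x80, 0xFF, 0x00, 0x5A, 0xA5, 0x3C, 0xC3]
    slices = blocks_to_slices(blocks)
    assert slices_to_blocks(slices) == blocks
    print([hex(s) for s in slices])
```

Each stage is a few word-wide shifts, XORs and ANDs, so the 8x8 conversion takes log2(8) = 3 stages rather than 64 individual bit moves; the same pattern extends to wider matrices with more stages.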


Keywords: Fast Software Encryption, Bitslice, AES, KASUMI, Core2


References

  1. 3GPP TS 35.202 v6.1.0, 3G Security; Specification of the 3GPP Confidentiality and Integrity Algorithms; Document 2: KASUMI Specification (Release 6), 3rd Generation Partnership Project (2005)
  2. Anderson, R., Biham, E., Knudsen, L.: Serpent: A proposal for the Advanced Encryption Standard, Available at
  3. Aoki, K., Ichikawa, T., Kanda, M., Matsui, M., Moriai, S., Nakajima, J., Tokita, T.: The 128-Bit Block Cipher Camellia. IEICE Trans. Fundamentals E85-A(1), 11–24 (2002)
  4. Bhaskar, R., Dubey, P., Kumar, V., Rudra, A.: Efficient Galois field arithmetic on SIMD architectures. In: Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures, pp. 256–257. ACM Press, New York (2003)
  5. Biham, E.: A Fast New DES Implementation in Software. In: Biham, E. (ed.) FSE 1997. LNCS, vol. 1267, pp. 260–272. Springer, Heidelberg (1997)
  6. Canright, D.: A Very Compact S-Box for AES. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 441–455. Springer, Heidelberg (2005)
  7. The distributed.net project: Available at
  8. Federal Information Processing Standards Publication 197, Advanced Encryption Standard (AES), NIST (2001)
  9. Fog, A.: Software optimization resources, Available at
  10. Gladman, B.: Serpent Performance, Available at
  11. Granlund, T.: Instruction latencies and throughput for AMD and Intel x86 Processors, Available at
  12. ISO/IEC 18033-3, Information technology - Security techniques - Encryption algorithms - Part 3: Block ciphers (2005)
  13. Matsui, M.: New encryption algorithm MISTY. In: Biham, E. (ed.) FSE 1997. LNCS, vol. 1267, pp. 54–68. Springer, Heidelberg (1997)
  14. Matsui, M.: How Far Can We Go on the x64 Processors? In: Robshaw, M. (ed.) FSE 2006. LNCS, vol. 4047, pp. 341–358. Springer, Heidelberg (2006)
  15. Nakajima, J., Matsui, M.: Fast Software Implementations of MISTY1 on Alpha Processors. IEICE Trans. Fundamentals E82-A(1), 107–116 (1999)
  16. Mentens, N., Batina, L., Preneel, B., Verbauwhede, I.: A Systematic Evaluation of Compact Hardware Implementations for the Rijndael S-Box. In: Menezes, A.J. (ed.) CT-RSA 2005. LNCS, vol. 3376, pp. 323–333. Springer, Heidelberg (2005)
  17. Osvik, D.A., Shamir, A., Tromer, E.: Full AES key extraction in 65 milliseconds using cache attacks. In: Crypto 2005 rump session
  18. Rudra, A., Dubey, P., Jutla, C., Kumar, V., Rao, J., Rohatgi, P.: Efficient Rijndael Encryption Implementation with Composite Field Arithmetic. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) CHES 2001. LNCS, vol. 2162, pp. 171–184. Springer, Heidelberg (2001)
  19. Satoh, A., Morioka, S., Takano, K., Munetoh, S.: A Compact Rijndael Hardware Architecture with S-Box Optimization. In: Boyd, C. (ed.) ASIACRYPT 2001. LNCS, vol. 2248, pp. 239–254. Springer, Heidelberg (2001)
  20. Shimoyama, T., Amada, S., Moriai, S.: Improved fast software implementation of block ciphers. In: Proceedings of the First International Conference on Information and Communication Security, pp. 269–273. Springer, Heidelberg (1997)

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Mitsuru Matsui (1)
  • Junko Nakajima (1)
  1. Information Technology R&D Center, Mitsubishi Electric Corporation, 5-1-1 Ofuna, Kamakura, Kanagawa, Japan
