Light-Weight Instruction Set Extensions for Bit-Sliced Cryptography

  • Philipp Grabher
  • Johann Großschädl
  • Dan Page
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5154)

Abstract

Bit-slicing is a non-conventional implementation technique for cryptographic software where an n-bit processor is considered as a collection of n 1-bit execution units operating in SIMD mode. Particularly when implementing symmetric ciphers, the bit-slicing approach has several advantages over more conventional alternatives: it often allows one to reduce memory footprint by eliminating large look-up tables, and it permits more predictable performance characteristics that can foil time based side-channel attacks. Both features are attractive for mobile and embedded processors, but the performance overhead that results from bit-sliced implementation often represents a significant disadvantage. In this paper we describe a set of light-weight Instruction Set Extensions (ISEs) that can improve said performance while retaining all advantages of bit-sliced implementation. Contrary to other crypto-ISE, our design is generic and allows for a high degree of algorithm agility: we demonstrate applicability to several well-known cryptographic primitives including four block ciphers (DES, Serpent, AES, and PRESENT), a hash function (SHA-1), as well as multiplication of ternary polynomials.

References

  1. 1.
    Anderson, R., Biham, E., Knudsen, L.: Serpent: A proposal for the Advanced Encryption Standard. Technical report, http://www.cl.cam.ac.uk/~rja14/serpent.html
  2. 2.
    Bartolini, S., Branovic, I., Giorgi, R., Martinelli, E.: A performance evaluation of ARM ISA extension for elliptic curve cryptography over binary finite fields. In: Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), pp. 238–245. IEEE Computer Society Press, Los Alamitos (2004)Google Scholar
  3. 3.
    Bertoni, G., Breveglieri, L., Fragneto, P., Macchetti, M., Marchesin, S.: Efficient software implementation of AES on 32-bit platforms. In: Kaliski Jr., B.S., Koç, Ç.K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 129–142. Springer, Heidelberg (2003)CrossRefGoogle Scholar
  4. 4.
    Bertoni, G.M., Breveglieri, L., Farina, R., Regazzoni, F.: Speeding up AES by extending a 32-bit processor instruction set. In: Proceedings of the 17th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP 2006), pp. 275–279. IEEE Computer Society Press, Los Alamitos (2006)CrossRefGoogle Scholar
  5. 5.
    Biham, E.: A fast new DES implementation in software. In: Biham, E. (ed.) FSE 1997. LNCS, vol. 1267, pp. 260–272. Springer, Heidelberg (1997)CrossRefGoogle Scholar
  6. 6.
    Bogdanov, A., Knudsen, L.R., Leander, G., Paar, C., Poschmann, A., Robshaw, M.J., Seurin, Y., Vikkelsoe, C.: PRESENT: An ultra-lightweight block cipher. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 450–466. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  7. 7.
    Bonneau, J., Mironov, I.: Cache-collision timing attacks against AES. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 201–215. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  8. 8.
    Buchty, R., Heintze, N., Oliva, D.: Cryptonite – A programmable crypto processor architecture for high-bandwidth applications. In: Müller-Schloer, C., Ungerer, T., Bauer, B. (eds.) ARCS 2004. LNCS, vol. 2981, pp. 184–198. Springer, Heidelberg (2004)Google Scholar
  9. 9.
    Burke, J., McDonald, J., Austin, T.: Architectural support for fast symmetric-key cryptography. In: Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2000), pp. 178–189. ACM Press, New York (2000)CrossRefGoogle Scholar
  10. 10.
    Canright, D.: A very compact S-box for AES. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 441–455. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  11. 11.
    Daemen, J., Rijmen, V.: The Design of Rijndael: AES – The Advanced Encryption Standard. Springer, Heidelberg (2002)MATHGoogle Scholar
  12. 12.
    Davies, P.L., Robsky, S.R.: Customized processor extension speeds network cryptology. Electronic Design 50(19), 83–88 (2002)Google Scholar
  13. 13.
    Elbirt, A.J.: Fast and efficient implementation of AES via instruction set extensions. In: Proceedings of the 21st International Conference on Advanced Information Networking and Applications (AINA 2007), vol. 1, pp. 481–490. IEEE Computer Society Press, Los Alamitos (2007)Google Scholar
  14. 14.
    Fiskiran, A.M., Lee, R.B.: PAX: A datapath-scalable minimalist cryptographic processor for mobile devices. In: Embedded Cryptographic Hardware: Design and Security, pp. 19–34. Nova Science Publishers (2004)Google Scholar
  15. 15.
    Fiskiran, A.M., Lee, R.B.: On-chip lookup tables for fast symmetric-key encryption. In: Proceedings of the 16th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2005), pp. 356–363. IEEE Computer Society Press, Los Alamitos (2005)CrossRefGoogle Scholar
  16. 16.
    Großschädl, J., Savaş, E.: Instruction set extensions for fast arithmetic in finite fields GF(p) and GF(2m). In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 133–147. Springer, Heidelberg (2004)Google Scholar
  17. 17.
    Großschädl, J., Tillich, S., Szekely, A., Wurm, M.: Cryptography instruction set extensions to the SPARC V8 architecture (preprint submitted for publication, 2007)Google Scholar
  18. 18.
    Hankerson, D., Menezes, A., Vanstone, S.: Guide to Elliptic Curve Cryptography. Springer, Heidelberg (2004)MATHGoogle Scholar
  19. 19.
    Harrison, K., Page, D., Smart, N.P.: Software implementation of finite fields of characteristic three, for use in pairing pased cryptosystems. LMS Journal of Computation and Mathematics 5(1), 181–193 (2002)MATHMathSciNetGoogle Scholar
  20. 20.
    Institute of Electrical and Electronics Engineers (IEEE). IEEE Std 1363-2000: IEEE Standard Specifications for Public-Key CryptographyGoogle Scholar
  21. 21.
    Könighofer, R.: A fast and cache-timing resistant implementation of the AES. In: Topics in Cryptology — CT-RSA 2008. LNCS, vol. 4964, pp. 187–202. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  22. 22.
    Kwan, M.: Reducing the gate count of bitslice DES. Cryptology ePrint Archive, Report 2000/051 (2000), http://eprint.iacr.org
  23. 23.
    Lee, R.B., Shi, Z., Yang, X.: Efficient permutation instructions for fast software cryptography. IEEE Mirco. 21(6), 56–69 (2001)CrossRefGoogle Scholar
  24. 24.
    Matsui, M.: How far can we go on the x64 processors? In: Robshaw, M.J.B. (ed.) FSE 2006. LNCS, vol. 4047, pp. 341–358. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  25. 25.
    Matsui, M., Nakajima, J.: On the power of bitslice implementation on Intel Core2 processor. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 121–134. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  26. 26.
    Mimosys. Clarity Product Datasheet (July 2006), http://www.mimosys.com/pdf/Mimosys_Clarity_Product_Datasheet.pdf
  27. 27.
    O’Melia, S.R.: Instruction Set Extensions for Enhancing the Performance of Symmetric-Key Cryptography. M.Sc. Thesis. University of Massachusetts, Lowell (2007)Google Scholar
  28. 28.
    Osvik, D.A., Shamir, A., Tromer, E.: Cache attacks and countermeasures: The case of AES. In: Pointcheval, D. (ed.) CT-RSA 2006. LNCS, vol. 3860, pp. 1–20. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  29. 29.
    Patterson, C.: A dynamic FPGA implementation of the Serpent block cipher. In: Paar, C., Koç, Ç.K. (eds.) CHES 2000. LNCS, vol. 1965, pp. 141–155. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  30. 30.
    Phillips, B.J., Burgess, N.: Implementing 1,024-bit RSA exponentiation on a 32-bit processor core. In: Proceedings of the 12th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2000), pp. 127–137. IEEE Computer Society Press, Los Alamitos (2000)CrossRefGoogle Scholar
  31. 31.
    Pozzi, L., Ienne, P.: Exploiting pipelining to relax register-file port constraints of instruction-set extensions. In: Proceedings of the 8th International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES 2005), pp. 2–10. ACM Press, New York (2005)CrossRefGoogle Scholar
  32. 32.
    Ravi, S., Raghunathan, A., Potlapally, N.R., Sankaradass, M.: System design methodologies for a wireless security processing platform. In: Proceedings of the 39th Design Automation Conference (DAC 2002), pp. 777–782. ACM Press, New York (2002)CrossRefGoogle Scholar
  33. 33.
    Ravi, S., Raghunathan, A., Potlapally, N.R.: Securing wireless data: System architecture challenges. In: Proceedings of the 15th International Symposium on System Synthesis (ISSS 2002), pp. 195–200. ACM Press, New York (2002)Google Scholar
  34. 34.
    Shi, Z., Lee, R.B.: Bit permutation instructions for accelerating software cryptography. In: Proceedings of the 12th IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP 2000), pp. 138–148. IEEE Computer Society Press, Los Alamitos (2000)Google Scholar
  35. 35.
    Tillich, S., Großschädl, J.: Instruction set extensions for efficient AES implementation on 32-bit processors. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 270–284. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  36. 36.
    Wu, L., Weaver, C., Austin, T.M.: CryptoManiac: A fast flexible architecture for secure communication. In: Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA 2001), pp. 110–119. ACM Press, New York (2001)Google Scholar
  37. 37.
    Yang, X., Vachharajani, M., Lee, R.B.: Fast subword permutation instructions based on butterfly networks. In: Media Processors 2000. Proceedings of the SPIE, vol. 3970, pp. 80–86. SPIE (1999)Google Scholar
  38. 38.
    Yehia, S., Clark, N.T., Mahlke, S.A., Flautner, K.: Exploring the design space of LUT-based transparent accelerators. In: Proceedings of the 8th International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES 2005), pp. 238–249. ACM Press, New York (2005)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Philipp Grabher
    • 1
  • Johann Großschädl
    • 1
  • Dan Page
    • 1
  1. 1.Department of Computer ScienceUniversity of BristolBristolU.K.

Personalised recommendations