Parallel Implementations of LEA

  • Hwajeong Seo
  • Zhe Liu
  • Taehwan Park
  • Hyunjin Kim
  • Yeoncheol Lee
  • Jongseok Choi
  • Howon Kim
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8565)


LEA is a new lightweight and low-power encryption algorithm. It has certain useful features that make it especially suitable for parallel hardware and software implementations, namely simple ARX operations, an S-box-free architecture, and a 32-bit word size. In this paper we evaluate the performance of LEA on ARM-NEON and GPUs by taking advantage of both these features and the parallel computing platforms and programming models provided by NEON and CUDA. Specifically, we propose novel parallel LEA implementations on representative SIMT and SIMD architectures, CUDA and NEON respectively. In the case of CUDA, we first designed a thread-based computation model that exploits functional parallelism by computing several encryptions in one thread. To alleviate the memory transfer delay, we allocate memory so that accesses are coalesced. Second, our block cipher implementation is written in assembly language, which provides an efficient and flexible programming environment. With these optimization techniques, we achieved throughputs of 17.352 and 2.5 GBps (gigabytes per second) without and with memory transfer, respectively. In the case of NEON, we adopted pipelined instructions and SIMD-based execution models, which improved encryption performance by 49.85 % compared to previous ARM implementations.
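To make two of the ideas in the abstract concrete, the following is a minimal Python model, not the paper's code. It shows (a) what a generic ARX step on 32-bit words looks like and (b) one common data layout that lets neighbouring GPU threads read neighbouring words, which is the usual way to obtain coalesced access. The function names, the rotation amount passed by the caller, and the interleaved layout are all assumptions chosen for illustration; the step below is not LEA's actual round function or key schedule.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def rol32(x, r):
    """Rotate a 32-bit word left by r bits (0 <= r < 32)."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def arx_step(a, b, k0, k1, r):
    """Generic ARX step (illustrative only, not LEA's round function):
    XOR key words into both inputs, add modulo 2**32, then rotate."""
    return rol32(((a ^ k0) + (b ^ k1)) & MASK32, r)


def interleave_blocks(blocks):
    """Hypothetical layout helper: store word j of every 4-word block
    contiguously, so word j of block t lands at index j * n + t.
    If thread t handles block t, threads 0..n-1 then read adjacent words."""
    n = len(blocks)
    flat = [0] * (4 * n)
    for t, block in enumerate(blocks):
        for j, word in enumerate(block):
            flat[j * n + t] = word
    return flat
```

Because every operation is an addition, rotation, or XOR on a 32-bit word, the same step maps directly onto one lane of a SIMD register or one GPU thread, with no table lookups involved.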


Low-power encryption algorithm; Single instruction multiple data; Single instruction multiple threads; NEON; GPGPU; Software implementation; Block cipher; ARM



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Hwajeong Seo
    • 1
  • Zhe Liu
    • 2
  • Taehwan Park
    • 1
  • Hyunjin Kim
    • 1
  • Yeoncheol Lee
    • 1
  • Jongseok Choi
    • 1
  • Howon Kim
    • 1
  1. School of Computer Science and Engineering, Pusan National University, Busan, Republic of Korea
  2. Laboratory of Algorithmics, Cryptology and Security (LACS), University of Luxembourg, Luxembourg-Kirchberg, Luxembourg
