Linking Application Description with Efficient SIMD Code Generation for Low-Precision Signed-Integer GEMM

  • Günther Schindler
  • Manfred Mücke
  • Holger Fröning
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10659)


The need to implement demanding numerical algorithms within a constrained power budget has led to renewed interest in low-precision number formats. However, exploring the degrees of freedom provided both by better architectural support for low-precision number formats and by the respective application domain remains a demanding task.

In this work, we extend the machine learning framework Theano and the Eigen linear algebra library to support matrix multiplication for formats from 32 bits down to 1 bit by packing multiple values into a 32-bit vector. This approach retains all of Eigen's optimizations of the overall matrix operation while maximizing the performance available from SIMD units on modern embedded CPUs. Relative to 32-bit formats, we achieve speedups between 0.45 and 21.17 on an ARM Cortex-A15.
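The packing idea can be illustrated with a minimal sketch. This is not the authors' Theano/Eigen implementation. It is plain Python that stores several signed low-precision integers as two's-complement lanes in one 32-bit word, under the assumption that the bit width divides 32. The function names `pack_lanes`, `unpack_lanes`, and `packed_dot` are hypothetical and chosen for illustration only.

```python
def pack_lanes(values, bits):
    """Pack signed `bits`-wide integers into 32-bit words, lowest lane first.

    Illustrative assumption: `bits` divides 32, so each word holds 32 // bits lanes.
    """
    if 32 % bits != 0:
        raise ValueError("bits must divide 32")
    lanes = 32 // bits
    mask = (1 << bits) - 1
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    words = []
    for start in range(0, len(values), lanes):
        word = 0
        for lane, value in enumerate(values[start:start + lanes]):
            if not lo <= value <= hi:
                raise ValueError(f"{value} does not fit in {bits} signed bits")
            # Two's-complement encode the value and shift it into its lane.
            word |= (value & mask) << (lane * bits)
        words.append(word)  # a short final group leaves its unused lanes at zero
    return words


def unpack_lanes(words, bits, count):
    """Recover the first `count` signed integers from packed 32-bit words."""
    lanes = 32 // bits
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    out = []
    for word in words:
        for lane in range(lanes):
            raw = (word >> (lane * bits)) & mask
            # Sign-extend: a set top bit means the lane holds a negative value.
            out.append(raw - (1 << bits) if raw & sign_bit else raw)
    return out[:count]


def packed_dot(a_words, b_words, bits, count):
    """Dot product of two packed vectors (scalar reference, no SIMD)."""
    a = unpack_lanes(a_words, bits, count)
    b = unpack_lanes(b_words, bits, count)
    return sum(x * y for x, y in zip(a, b))
```

For example, `pack_lanes([1, -2, 3, -4], 8)` yields the single word `0xFC03FE01`. A SIMD implementation would operate on the lanes of such words in parallel rather than unpacking them one by one as this scalar reference does.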



The main author is sponsored by the German Research Foundation (DFG). The financial support of the Austrian Federal Government within the framework of the COMET Funding Programme is gratefully acknowledged. We also acknowledge valuable discussions with various people, including Franz Pernkopf and Matthias Zöhrer (Graz University of Technology, Austria) and Michaela Blott (Xilinx).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Günther Schindler (1)
  • Manfred Mücke (2)
  • Holger Fröning (1)

  1. Institute of Computer Engineering, Ruprecht Karls University, Heidelberg, Mannheim, Germany
  2. Materials Center Leoben Forschung GmbH, Leoben, Austria
