Scalable and Efficient Linear Algebra Kernel Mapping for Low Energy Consumption on the Layers CGRA

  • Zoltán Endre Rákossy
  • Dominik Stengele
  • Axel Acosta-Aponte
  • Saumitra Chafekar
  • Paolo Bientinesi
  • Anupam Chattopadhyay
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9040)


A scalable mapping is proposed for three important kernels from the Numerical Linear Algebra domain; it exploits architectural features to reach asymptotically optimal efficiency and low energy consumption. Performance and power were evaluated on input matrices ranging from 64×64 to 16384×16384. Twelve architectural variants with up to 10×10 processing elements (PEs) were used to explore the scalability of both the mapping and the architecture, yielding an energy increase of less than 10% for architectures up to 8×8 PEs together with performance speed-ups of more than an order of magnitude. This enables a clean area-performance trade-off on the Layers architecture while keeping energy consumption nearly constant across the variants.
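To see how an energy increase below 10% can coexist with a speed-up of more than an order of magnitude, consider an idealized model. This is an assumption made here for illustration, not the power model used in the paper: each PE draws the same average power p_pe, static and memory power are ignored, and the runtime on n PEs is t1/(n·η), where η is the parallel efficiency. Energy then reduces to E = n·p_pe·t1/(n·η) = p_pe·t1/η, which does not depend on n, so the relative energy change between two variants is η_base/η_n − 1. The sketch below evaluates this model with a hypothetical 2×2 baseline and made-up efficiencies; none of the numbers are measured results.

```python
# Idealized energy model (an illustrative assumption, NOT the paper's power model):
#   - every PE draws the same average power p_pe while the kernel runs,
#   - static/uncore/memory power is ignored,
#   - runtime on n_pe PEs is t1 / (n_pe * eta), with eta the parallel efficiency.


def runtime(t1: float, n_pe: int, eta: float) -> float:
    """Runtime on n_pe PEs, given single-PE runtime t1 and parallel efficiency eta."""
    return t1 / (n_pe * eta)


def energy(p_pe: float, t1: float, n_pe: int, eta: float) -> float:
    """Energy = total power * runtime = (n_pe * p_pe) * t1 / (n_pe * eta) = p_pe * t1 / eta."""
    return n_pe * p_pe * runtime(t1, n_pe, eta)


def energy_increase(eta_base: float, eta_n: float) -> float:
    """Relative energy change vs. the baseline; under this model it depends only on efficiency."""
    return eta_base / eta_n - 1.0


if __name__ == "__main__":
    p_pe, t1 = 1.0, 64.0       # hypothetical per-PE power and single-PE runtime
    base_n, base_eta = 4, 1.0  # hypothetical 2x2 baseline variant
    # Hypothetical (grid side, efficiency) pairs, chosen only to exercise the model.
    for side, eta in [(2, 1.0), (4, 0.97), (8, 0.92)]:
        n = side * side
        speedup = runtime(t1, base_n, base_eta) / runtime(t1, n, eta)
        rel = energy(p_pe, t1, n, eta) / energy(p_pe, t1, base_n, base_eta) - 1.0
        print(f"{side}x{side}: speed-up {speedup:.2f}x, energy change {rel:+.1%}")
```

Under these assumptions, staying within a 10% energy budget amounts to keeping parallel efficiency above roughly 1/1.1 ≈ 0.91 of the baseline's; a real CGRA also spends static and memory power that this model deliberately omits.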







Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Zoltán Endre Rákossy (1)
  • Dominik Stengele (2)
  • Axel Acosta-Aponte (1)
  • Saumitra Chafekar (1)
  • Paolo Bientinesi (2)
  • Anupam Chattopadhyay (3)
  1. Institute for Communication Technologies and Embedded Systems (ICE), Aachen, Germany
  2. Algorithmically-Driven Code Generation for High-Performance Computing Architectures, AICES, RWTH Aachen University, Aachen, Germany
  3. School of Computer Engineering, Nanyang Technological University, Nanyang, Singapore
