Are our dense linear algebra libraries energy-friendly?

Time–power–energy trade-offs in BLAS and LAPACK

  • Special Issue Paper
Computer Science - Research and Development

Abstract

In this paper we conduct a detailed analysis of the sources of power dissipation and energy consumption during the execution of current dense linear algebra kernels on multicore processors, binding these two metrics together with performance to the arithmetic intensity of the operations. In particular, by leveraging the RAPL interface of an Intel E5 (“Sandy Bridge”) six-core CPU, we decompose the power-energy duo into its core (mainly due to floating-point units and cache), RAM (off-chip accesses), and uncore components, performing a series of illustrative experiments for a range of memory-bound to CPU-bound high-performance kernels. Additionally, we investigate the energy proportionality of these three architecture components for the execution of linear algebra routines on the Intel E5.
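The decomposition described in the abstract can be sketched in a few lines of Python. This is only an illustration, not the authors' measurement code: the `EnergySample` container, the `decompose` helper, and the sample readings are hypothetical, and the sketch assumes that uncore energy can be approximated as package energy minus core energy, with RAM energy reported separately.

```python
from dataclasses import dataclass


@dataclass
class EnergySample:
    """Cumulative energy readings in joules (hypothetical container)."""
    package: float  # whole processor socket
    core: float     # cores and their private caches
    dram: float     # off-chip memory


def decompose(start, end, seconds):
    """Split the energy spent between two samples into core, uncore, and RAM.

    Assumption: uncore energy is taken as package energy minus core energy.
    Returns {component: (energy in J, average power in W)}.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    pkg = end.package - start.package
    core = end.core - start.core
    dram = end.dram - start.dram
    energy = {"core": core, "uncore": pkg - core, "ram": dram}
    return {name: (e, e / seconds) for name, e in energy.items()}


# Made-up readings around a 2 s kernel execution, for illustration only.
before = EnergySample(package=1000.0, core=600.0, dram=200.0)
after = EnergySample(package=1120.0, core=690.0, dram=216.0)
result = decompose(before, after, seconds=2.0)
```

Dividing each energy difference by the elapsed time gives the average power of that component, which is how a single pair of counter readings yields both the energy and the power view of a kernel run.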

Notes

  1. According to [18], the “core” mainly comprises the floating-point execution units, branch prediction logic, and the higher levels of cache. The “uncore” is basically composed of the last level of cache (L3 in this processor), the memory and interconnect controllers, and the power control logic.

  2. We also evaluated Intel MKL 10.3 and GotoBLAS 1.13, but observed higher performance with OpenBLAS for the kernels and platform targeted in this study.

  3. http://www.netlib.org/lapack version 3.5.0.

  4. For simplicity, hereafter we neglect lower order terms in the arithmetic and storage costs.

  5. http://www.cs.virginia.edu/stream.
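As a rough illustration of the simplification in note 4, the sketch below computes arithmetic intensity (flops per matrix or vector element touched) for three representative kernels using leading-order terms only. The cost formulas are standard textbook counts assumed here, not figures taken from the article, and the function names are hypothetical.

```python
def leading_order_costs(kernel, n):
    """Return (flops, elements touched), keeping only leading-order terms.

    Assumed counts for square operands of dimension n:
      dot  (x^T y):    2n flops,   2n elements
      gemv (y = A x):  2n^2 flops, n^2 elements (the vectors are lower order)
      gemm (C = A B):  2n^3 flops, 3n^2 elements
    """
    costs = {
        "dot": (2 * n, 2 * n),
        "gemv": (2 * n * n, n * n),
        "gemm": (2 * n ** 3, 3 * n * n),
    }
    if kernel not in costs:
        raise ValueError(f"unknown kernel: {kernel}")
    return costs[kernel]


def arithmetic_intensity(kernel, n):
    """Flops performed per element touched, to leading order."""
    flops, elements = leading_order_costs(kernel, n)
    return flops / elements


intensities = {k: arithmetic_intensity(k, 3000) for k in ("dot", "gemv", "gemm")}
```

Under these assumptions the intensity of the vector and matrix-vector kernels stays constant as n grows, while that of the matrix-matrix product grows linearly with n, which is the usual basis for calling the former memory-bound and the latter CPU-bound.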

References

  1. Alonso P, Dolz MF, Mayo R, Quintana-Ortí ES (2014) Modeling power and energy consumption of dense matrix factorizations on multicore processors. Concurr. Computat. Practice Exp. (to appear)

  2. Anderson E, Bai Z, Bischof C, Blackford LS, Demmel J, Dongarra JJ, Croz JD, Hammarling S, Greenbaum A, McKenney A, Sorensen D (1999) LAPACK Users’ guide, 3rd edn. SIAM, Philadelphia

  3. Asanovic K, Bodik R, Catanzaro BC, Gebis JJ, Husbands P, Keutzer K, Patterson DA, Plishker WL, Shalf J, Williams SW, Yelick KA (2006) The landscape of parallel computing research: a view from Berkeley. Tech. Rep. UCB/EECS-2006-183, EECS Department, University of California, Berkeley

  4. Barroso LA (2005) The price of performance. ACM Queue 3:48–53

  5. Barroso LA, Hölzle U (2007) The case for energy-proportional computing. Computer 40(12):33–37

  6. Beckett J, Bradfield R (2011) Power efficiency comparison of enterprise-class blade servers and enclosures. A Dell Technical White Paper

  7. Bosilca G, Ltaief H, Dongarra J (2012) Power profiling of Cholesky and QR factorizations on distributed memory systems. In: Third international conference on energy-aware high performance computing (Ena-HPC), Hamburg, pp 1–9

  8. Choi JW, Bedard D, Fowler R, Vuduc R (2013) A roofline model of energy. In: 27th IEEE Int Symp Parallel Distributed Processing (IPDPS), pp 661–672

  9. Curtis-Maury M, Dzierwa J, Antonopoulos CD, Nikolopoulos DS (2006) Online power-performance adaptation of multithreaded programs using hardware event-based prediction. In: Proc. 20th Annual Int. Conf. Supercomputing, ICS ’06, pp 157–166

  10. David H, Gorbatov E, Hanebutte UR, Khanna R, Le C (2010) RAPL: memory power estimation and capping. In: 2010 ACM/IEEE Int. Symp. Low-Power Electronics and Design (ISLPED), pp 189–194

  11. Demmel J, Gearhart A (2012) Instrumenting linear algebra energy consumption via on-chip energy counters. Tech. Rep. UCB/EECS-2012-168, EECS Department, University of California, Berkeley

  12. Dongarra JJ, Du Croz J, Hammarling S, Duff I (1990) A set of level 3 basic linear algebra subprograms. ACM Trans Math Softw 16(1):1–17

  13. Elnozahy E, Kistler M, Rajamony R (2003) Energy-efficient server clusters. In: Power-Aware Computer Systems, Second International Workshop, PACS 2002. Lecture Notes in Computer Science (LNCS), vol 2325. Springer, Cambridge, pp 179–197

  14. Esmaeilzadeh H, Blem E, St. Amant R, Sankaralingam K, Burger D (2011) Dark silicon and the end of multicore scaling. In: Proc. 38th Annual Int. Symp. Computer architecture, ISCA ’11, pp 365–376

  15. Freeh VW, Lowenthal D, Pan F, Kappiah N, Springer R, Rountree B, Femal M (2007) Analyzing the energy-time trade-off in high-performance computing applications. IEEE Trans Parallel Distrib Syst 18(6):835–848

  16. Golub GH, Loan CFV (1989) Matrix computations, 2nd edn. The Johns Hopkins Univ. Press, Baltimore

  17. Goto K, van de Geijn R (2008) High performance implementation of the level-3 BLAS. ACM Trans Math Soft 35(1), 4:1–4:14

  18. Hill DL, Huff T, Kulick S, Safranek R (2010) The uncore: a modular approach to feeding the high-performance cores. Intel Technol J 14(3):30–49

  19. Intel: Math Kernel Library (2012). http://developer.intel.com/software/products/mkl/. Accessed Apr 2014

  20. OpenBLAS (2012). http://xianyi.github.com/OpenBLAS/. Accessed Apr 2014

  21. Ryckbosch F, Polfliet S, Eeckhout L (2011) Trends in server energy proportionality. Computer 44(9):69–72

  22. Van Zee FG, van de Geijn RA (2013) BLIS: A framework for generating BLAS-like libraries. ACM Trans Math Soft (to appear)

Acknowledgments

This work was supported by the CICYT project TIN2011-23283 of MINECO and FEDER, and the EU Project FP7 318793 “EXA2GREEN”.

Author information

Corresponding author

Correspondence to María Barreda.

About this article

Cite this article

Aliaga, J.I., Barreda, M., Dolz, M.F. et al. Are our dense linear algebra libraries energy-friendly?. Comput Sci Res Dev 30, 187–196 (2015). https://doi.org/10.1007/s00450-014-0263-y

  • DOI: https://doi.org/10.1007/s00450-014-0263-y
