Are our dense linear algebra libraries energy-friendly?

Aliaga, José I.; Barreda, María; Dolz, Manuel F.; Quintana-Ortí, Enrique S.

doi:10.1007/s00450-014-0263-y

Time–power–energy trade-offs in BLAS and LAPACK

Special Issue Paper
Published: 05 July 2014

Volume 30, pages 187–196, (2015)

Computer Science - Research and Development

Abstract

In this paper we conduct a detailed analysis of the sources of power dissipation and energy consumption during the execution of current dense linear algebra kernels on multicore processors, binding these two metrics together with performance to the arithmetic intensity of the operations. In particular, by leveraging the RAPL interface of an Intel E5 (“Sandy Bridge”) six-core CPU, we decompose the power-energy duo into its core (mainly due to floating-point units and cache), RAM (off-chip accesses), and uncore components, performing a series of illustrative experiments for a range of memory-bound to CPU-bound high-performance kernels. Additionally, we investigate the energy proportionality of these three architecture components for the execution of linear algebra routines on the Intel E5.
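The core/RAM/uncore decomposition described in the abstract can be illustrated with a minimal Python sketch. It assumes that cumulative energy readings for the package, core, and DRAM domains are already available in joules, and that the uncore share is estimated as package energy minus core energy; neither assumption is stated in the abstract. The names `RaplSample` and `decompose` and all numeric values are hypothetical, and counter wrap-around is ignored.

```python
from dataclasses import dataclass


@dataclass
class RaplSample:
    """Cumulative energy readings in joules (hypothetical container)."""
    time_s: float      # wall-clock time of the reading, in seconds
    package_j: float   # whole-package energy counter
    core_j: float      # core-domain energy counter
    dram_j: float      # DRAM-domain energy counter


def decompose(start: RaplSample, end: RaplSample) -> dict:
    """Split the energy spent between two samples into core, uncore and RAM."""
    elapsed = end.time_s - start.time_s
    if elapsed <= 0:
        raise ValueError("end sample must be later than start sample")
    package = end.package_j - start.package_j
    core = end.core_j - start.core_j
    dram = end.dram_j - start.dram_j
    # Assumption: whatever the package spent outside the cores is uncore.
    uncore = package - core
    energy = {"core": core, "uncore": uncore, "ram": dram}
    # Average power of each component over the measured interval.
    power = {name: joules / elapsed for name, joules in energy.items()}
    return {"energy_j": energy, "avg_power_w": power}


if __name__ == "__main__":
    # Made-up readings taken before and after a kernel run.
    before = RaplSample(time_s=0.0, package_j=100.0, core_j=60.0, dram_j=10.0)
    after = RaplSample(time_s=2.0, package_j=200.0, core_j=130.0, dram_j=20.0)
    print(decompose(before, after))
```

Wrapping a kernel call between two such samples would give per-component energy and average power for that kernel, which is the kind of quantity the abstract relates to arithmetic intensity.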

Notes

According to [18], the “core” mainly comprises the floating-point execution units, branch prediction logic, and the higher levels of cache. The “uncore” is basically composed of the last level of cache (L3 in this processor), the memory and interconnect controllers, and the power control logic.
We also evaluated Intel MKL 10.3 and GotoBLAS 1.13, but observed higher performance with OpenBLAS for the kernels and platform targeted in this study.
http://www.netlib.org/lapack version 3.5.0.
For simplicity, hereafter we neglect lower order terms in the arithmetic and storage costs.
http://www.cs.virginia.edu/stream.

References

Alonso P, Dolz MF, Mayo R, Quintana-Ortí ES (2014) Modeling power and energy consumption of dense matrix factorizations on multicore processors. Concurr. Computat. Practice Exp. (to appear)
Anderson E, Bai Z, Bischof C, Blackford LS, Demmel J, Dongarra JJ, Du Croz J, Hammarling S, Greenbaum A, McKenney A, Sorensen D (1999) LAPACK Users’ guide, 3rd edn. SIAM, Philadelphia
Asanovic K, Bodik R, Catanzaro BC, Gebis JJ, Husbands P, Keutzer K, Patterson DA, Plishker WL, Shalf J, Williams SW, Yelick KA (2006) The landscape of parallel computing research: a view from Berkeley. Tech. Rep. UCB/EECS-2006-183, EECS Department, University of California, Berkeley
Barroso LA (2005) The price of performance. ACM Queue 3:48–53
Barroso LA, Hölzle U (2007) The case for energy-proportional computing. Computer 40(12):33–37
Beckett J, Bradfield R (2011) Power efficiency comparison of enterprise-class blade servers and enclosures. A Dell Technical White Paper
Bosilca G, Ltaief H, Dongarra J (2012) Power profiling of Cholesky and QR factorizations on distributed memory systems. In: Third international conference on energy-aware high performance computing (Ena-HPC), Hamburg, pp 1–9
Choi JW, Bedard D, Fowler R, Vuduc R (2013) A roofline model of energy. In: 27th IEEE Int Symp Parallel Distributed Processing (IPDPS), pp 661–672
Curtis-Maury M, Dzierwa J, Antonopoulos CD, Nikolopoulos DS (2006) Online power-performance adaptation of multithreaded programs using hardware event-based prediction. In: Proc. 20th Annual Int. Conf. Supercomputing, ICS ’06, pp 157–166
David H, Gorbatov E, Hanebutte UR, Khanna R, Le C (2010) RAPL: memory power estimation and capping. In: 2010 ACM/IEEE Int. Symp. Low-Power Electronics and Design (ISLPED), pp 189–194
Demmel J, Gearhart A (2012) Instrumenting linear algebra energy consumption via on-chip energy counters. Tech. Rep. UCB/EECS-2012-168, EECS Department, University of California, Berkeley
Dongarra JJ, Du Croz J, Hammarling S, Duff I (1990) A set of level 3 basic linear algebra subprograms. ACM Trans Math Softw 16(1):1–17
Elnozahy E, Kistler M, Rajamony R (2003) Energy-efficient server clusters. In: Power-Aware Computer Systems, Second International Workshop, PACS 2002. Lecture Notes in Computer Science (LNCS), vol 2325. Springer, Cambridge, pp 179–197
Esmaeilzadeh H, Blem E, St. Amant R, Sankaralingam K, Burger D (2011) Dark silicon and the end of multicore scaling. In: Proc. 38th Annual Int. Symp. Computer architecture, ISCA ’11, pp 365–376
Freeh VW, Lowenthal D, Pan F, Kappiah N, Springer R, Rountree B, Femal M (2007) Analyzing the energy-time trade-off in high-performance computing applications. IEEE Trans Parallel Distrib Syst 18(6):835–848
Golub GH, Van Loan CF (1989) Matrix computations, 2nd edn. The Johns Hopkins Univ. Press, Baltimore
Goto K, van de Geijn R (2008) High performance implementation of the level-3 BLAS. ACM Trans Math Soft 35(1), 4:1–4:14
Hill DL, Huff T, Kulick S, Safranek R (2010) The uncore: a modular approach to feeding the high-performance cores. Intel Technol J 14(3):30–49
Intel: Math Kernel Library (2012). http://developer.intel.com/software/products/mkl/. Accessed Apr 2014
OpenBLAS (2012). http://xianyi.github.com/OpenBLAS/. Accessed Apr 2014
Ryckbosch F, Polfliet S, Eeckhout L (2011) Trends in server energy proportionality. Computer 44(9):69–72
Van Zee FG, van de Geijn RA (2013) BLIS: A framework for generating BLAS-like libraries. ACM Trans Math Soft (to appear)

Acknowledgments

This work was supported by the CICYT project TIN2011-23283 of MINECO and FEDER, and the EU Project FP7 318793 “EXA2GREEN”.

Author information

Authors and Affiliations

Depto. de Ingeniería y Ciencia de Computadores, Universitat Jaume I, 12.071, Castellón, Spain
José I. Aliaga, María Barreda & Enrique S. Quintana-Ortí
Department of Informatics, University of Hamburg, 22.527, Hamburg, Germany
Manuel F. Dolz

Corresponding author

Correspondence to María Barreda.

About this article

Cite this article

Aliaga, J.I., Barreda, M., Dolz, M.F. et al. Are our dense linear algebra libraries energy-friendly?. Comput Sci Res Dev 30, 187–196 (2015). https://doi.org/10.1007/s00450-014-0263-y

Published: 05 July 2014
Issue Date: May 2015
DOI: https://doi.org/10.1007/s00450-014-0263-y
