Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency

Ltaief, Hatem; Luszczek, Piotr; Dongarra, Jack

doi:10.1007/s00450-011-0191-z

Special Issue Paper
Published: 31 August 2011

Volume 27, pages 277–287, (2012)
Computer Science - Research and Development

Hatem Ltaief¹,
Piotr Luszczek² &
Jack Dongarra²

Abstract

This paper presents the power profiles of two high-performance dense linear algebra libraries, LAPACK and PLASMA. The former is based on block algorithms that use the fork-join paradigm to achieve parallel performance. The latter uses fine-grained task parallelism and recasts the computation to operate on submatrices called tiles, which yields tile algorithms. We show results from power profiling of the most common routines, which lets us clearly identify the different phases of the computations and thereby isolate the bottlenecks in terms of energy efficiency. Our results show that PLASMA surpasses LAPACK not only in performance but also in energy efficiency.
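
As a rough illustration of how a power profile relates to energy, the sketch below integrates a sampled power trace over time with the trapezoidal rule. It is not the authors' measurement code: the function names `energy_joules` and `flops_per_joule` are hypothetical, and the sample values in the usage lines are made up for illustration, not taken from the paper.

```python
def energy_joules(times_s, power_w):
    """Integrate a sampled power trace (watts) over time (seconds).

    Uses the trapezoidal rule, so samples need not be evenly spaced.
    """
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt < 0:
            raise ValueError("time stamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def flops_per_joule(flop_count, times_s, power_w):
    """Energy efficiency: floating-point operations per joule consumed."""
    return flop_count / energy_joules(times_s, power_w)


# Illustrative (made-up) trace: constant 100 W drawn for 2 seconds.
times = [0.0, 1.0, 2.0]
power = [100.0, 100.0, 100.0]
print(energy_joules(times, power))         # 200.0
print(flops_per_joule(4e9, times, power))  # 20000000.0
```

Under this model, a routine that finishes sooner at a similar power draw consumes less energy, which is why performance and energy efficiency can improve together.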

Author information

Authors and Affiliations

KAUST Supercomputing Laboratory, Thuwal, Saudi Arabia
Hatem Ltaief
Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, USA
Piotr Luszczek & Jack Dongarra

Corresponding author

Correspondence to Hatem Ltaief.

About this article

Cite this article

Ltaief, H., Luszczek, P. & Dongarra, J. Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency. Comput Sci Res Dev 27, 277–287 (2012). https://doi.org/10.1007/s00450-011-0191-z

Issue Date: November 2012
DOI: https://doi.org/10.1007/s00450-011-0191-z
