Power profiling of Cholesky and QR factorizations on distributed memory systems

Bosilca, George; Ltaief, Hatem; Dongarra, Jack

doi:10.1007/s00450-012-0224-2

Special Issue Paper
Published: 30 August 2012

Volume 29, pages 139–147, (2014)
Computer Science - Research and Development

George Bosilca¹,
Hatem Ltaief² &
Jack Dongarra¹

Abstract

This paper presents the power profile of two high-performance dense linear algebra libraries on distributed memory systems, ScaLAPACK and DPLASMA. From the algorithmic perspective, the two libraries take opposite approaches. The former is based on block algorithms and relies on multithreaded BLAS and a two-dimensional block cyclic data distribution to achieve high parallel performance. The latter is based on tile algorithms running on top of a tile data layout and uses fine-grained task parallelism combined with a dynamic distributed scheduler (DAGuE) to leverage distributed memory systems. We present performance results (Gflop/s) as well as the power profile (Watts) of two common dense factorizations needed to solve linear systems of equations, namely Cholesky and QR. The reported numbers show that DPLASMA surpasses ScaLAPACK not only in terms of performance (up to 2x speedup) but also in terms of energy efficiency (up to 62%).
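To make the relation between the two reported quantities (Gflop/s and Watts) concrete, the following is a minimal sketch, not the authors' measurement code. It assumes the standard leading-order flop counts for a square n x n matrix (n^3/3 for Cholesky, 4n^3/3 for Householder QR) and assumes energy is obtained by integrating sampled power over time with the trapezoidal rule. All function names are hypothetical, and the abstract does not state how the paper itself defines energy efficiency.

```python
def cholesky_flops(n):
    """Leading-order flop count for Cholesky of an n x n matrix: n^3 / 3."""
    return n ** 3 / 3.0


def qr_flops(n):
    """Leading-order flop count for Householder QR of an n x n matrix: 4 n^3 / 3."""
    return 4.0 * n ** 3 / 3.0


def energy_joules(times_s, power_w):
    """Integrate sampled power (Watts) over time (seconds) with the trapezoidal rule.

    Assumption: the samples are timestamped power readings; the sampling
    source is not specified here.
    """
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    total = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        total += 0.5 * (power_w[k] + power_w[k - 1]) * dt
    return total


def gflops_per_second(flops, seconds):
    """Performance: billions of floating-point operations per second."""
    return flops / seconds / 1e9


def gflops_per_watt(flops, joules):
    """One possible efficiency metric: flop per joule equals (flop/s) per Watt."""
    return flops / joules / 1e9
```

Under these assumptions, a run that finishes faster at a similar average power draw consumes less energy, which is one way a performance advantage can translate into an energy advantage.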

Notes

To the best of our knowledge.

Acknowledgements

The authors would like to thank Prof. Kirk Cameron of the Department of Computer Science at Virginia Tech for granting access to his platform.

Author information

Authors and Affiliations

Innovative Computing Laboratory, University of Tennessee, Knoxville, USA
George Bosilca & Jack Dongarra
Supercomputing Laboratory, KAUST, Thuwal, Saudi Arabia
Hatem Ltaief

Corresponding author

Correspondence to George Bosilca.

About this article

Cite this article

Bosilca, G., Ltaief, H. & Dongarra, J. Power profiling of Cholesky and QR factorizations on distributed memory systems. Comput Sci Res Dev 29, 139–147 (2014). https://doi.org/10.1007/s00450-012-0224-2

Published: 30 August 2012
Issue Date: May 2014
DOI: https://doi.org/10.1007/s00450-012-0224-2
