Modeling 1D Distributed-Memory Dense Kernels for an Asynchronous Multifrontal Sparse Solver

Amestoy, Patrick R.; L’Excellent, Jean-Yves; Rouet, François-Henry; Sid-Lakhdar, Wissam M.

doi:10.1007/978-3-319-17353-5_14

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8969)

Included in the following conference series:

International Conference on High Performance Computing for Computational Science

Abstract

To solve sparse systems of linear equations, multifrontal methods rely on dense partial \(LU\) decompositions of so-called frontal matrices. We consider a parallel asynchronous setting in which several frontal matrices can be factored simultaneously. In this context, to address the performance and scalability issues of acyclic pipelined asynchronous factorization kernels, we use models to revisit the properties of left-looking and right-looking variants of partial \(LU\) decompositions, study the use of several levels of blocking, and then focus on communication issues. The general-purpose sparse solver MUMPS has been modified to implement the proposed algorithms and to confirm the properties demonstrated by the models.
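
As an illustration of the two variants named in the abstract, the following is a minimal sequential sketch of a partial \(LU\) decomposition of a small dense frontal matrix, in which only the first `npiv` variables are eliminated and the trailing block is left as a Schur complement. It is not the MUMPS kernel: it assumes no pivoting, no blocking and no distribution across processes, and the function names, the variable `npiv` and the example matrix are all hypothetical choices made for this sketch.

```python
from copy import deepcopy


def partial_lu_right_looking(front, npiv):
    """Eliminate the first npiv pivots, updating the trailing matrix eagerly.

    Assumption for this sketch: no pivoting, so every pivot must be nonzero.
    """
    a = deepcopy(front)
    n = len(a)
    for k in range(npiv):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]                # multiplier: entry of column k of L
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]  # immediate rank-1 update
    return a


def partial_lu_left_looking(front, npiv):
    """Compute the same factors, but update each column only when it is needed."""
    a = deepcopy(front)
    n = len(a)
    for j in range(npiv):
        # Apply all previously computed pivot columns to column j.
        for k in range(j):
            for i in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
        # Column j is now up to date: scale it to obtain column j of L.
        for i in range(j + 1, n):
            a[i][j] /= a[j][j]
    # The non-eliminated columns receive all their updates at the very end.
    for j in range(npiv, n):
        for k in range(npiv):
            for i in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a


# Hypothetical 4x4 frontal matrix with 2 fully summed (eliminated) variables.
front = [
    [4.0, 1.0, 2.0, 1.0],
    [2.0, 5.0, 1.0, 2.0],
    [1.0, 2.0, 6.0, 1.0],
    [2.0, 1.0, 1.0, 7.0],
]
npiv = 2

rl = partial_lu_right_looking(front, npiv)
ll = partial_lu_left_looking(front, npiv)

# Trailing block left after the partial factorization (Schur complement).
schur = [row[npiv:] for row in rl[npiv:]]
print(schur)
```

Both functions perform the same arithmetic and differ only in when the updates are applied: the right-looking variant modifies the whole trailing matrix after each pivot, while the left-looking variant delays updates to a column until that column is about to be factored. In this sketch the trailing block is the part that a multifrontal method would pass on to the parent front.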

Acknowledgement

This work was granted access to the HPC resources of CALMIP under the allocation 2013-0989 and GENCI/IDRIS resources under allocation x2013065063.

Author information

Authors and Affiliations

University of Toulouse, INPT(ENSEEIHT)-IRIT, Toulouse, France
Patrick R. Amestoy
University of Lyon, Inria and LIP (CNRS, ENS Lyon, Inria, UCBL), Lyon, France
Jean-Yves L’Excellent
Lawrence Berkeley National Laboratory, Berkeley, USA
François-Henry Rouet
University of Lyon, ENS Lyon and LIP (CNRS, ENS Lyon, Inria, UCBL), Lyon, France
Wissam M. Sid-Lakhdar

Corresponding author

Correspondence to Wissam M. Sid-Lakhdar.

Editor information

Editors and Affiliations

IRIT, ENSEEIHT, Toulouse Cedex, France
Michel Daydé
Lawrence Berkeley National Laboratory, Berkeley, California, USA
Osni Marques
Information Technology Center, The University of Tokyo, Tokyo, Japan
Kengo Nakajima

About this paper

Cite this paper

Amestoy, P.R., L’Excellent, J.-Y., Rouet, F.-H., Sid-Lakhdar, W.M. (2015). Modeling 1D Distributed-Memory Dense Kernels for an Asynchronous Multifrontal Sparse Solver. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science -- VECPAR 2014. VECPAR 2014. Lecture Notes in Computer Science, vol 8969. Springer, Cham. https://doi.org/10.1007/978-3-319-17353-5_14

DOI: https://doi.org/10.1007/978-3-319-17353-5_14
Published: 18 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17352-8
Online ISBN: 978-3-319-17353-5
eBook Packages: Computer Science, Computer Science (R0)
