Parallel Communication-Avoiding Algorithm for Triangular Matrix Inversion on Homogeneous and Heterogeneous Platforms

Mahfoudhi, Ryma; Mahjoub, Zaher; Nasri, Wahid

doi:10.1007/s10766-014-0310-0

Published: 30 March 2014

Volume 43, pages 631–655 (2015)

International Journal of Parallel Programming

Ryma Mahfoudhi¹,
Zaher Mahjoub¹ &
Wahid Nasri²

Abstract

This paper addresses the parallelization of a recursive algorithm for large-scale triangular matrix inversion based on the ‘Divide and Conquer’ (D&C) paradigm. Several versions of an original sequential algorithm are presented first, and a theoretical performance study allows an accurate comparison of the designed algorithms. In the second part of the paper, we develop an optimal parallel communication-avoiding algorithm for a given number of available homogeneous or heterogeneous processors. To this end, we use a so-called ‘non equitable and incomplete’ version of the D&C paradigm, which recursively decomposes the original problem into two sub-problems of unequal sizes and then decomposes only one of these sub-problems in the same manner. The theoretical study is validated by a series of experiments carried out on three target platforms, namely an 8-core shared-memory machine, a distributed-memory cluster and a heterogeneous CPU-GPU cluster. The results illustrate the value of the contribution.
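As an illustration of the recursive scheme the abstract refers to, the sketch below inverts a lower-triangular matrix with the standard block identity: if A = [[A11, 0], [A21, A22]], then the inverse has diagonal blocks A11^{-1} and A22^{-1} and off-diagonal block -A22^{-1} A21 A11^{-1}. This is a minimal sequential sketch and not the authors' parallel algorithm. The function names and the `split` parameter are hypothetical; `split` only hints at how an unequal decomposition could be selected and does not reproduce the 'incomplete' variant, in which only one sub-problem is decomposed further.

```python
from fractions import Fraction


def matmul(x, y):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(xi * yj for xi, yj in zip(row, col)) for col in zip(*y)]
            for row in x]


def invert_lower_triangular(a, split=None):
    """Invert a nonsingular lower-triangular matrix given as a list of rows.

    `split` is a hypothetical hook returning the size k of the leading
    block (1 <= k < n); by default the matrix is cut in half.
    """
    n = len(a)
    if n == 1:
        return [[1 / Fraction(a[0][0])]]
    k = split(n) if split else n // 2
    a11 = [row[:k] for row in a[:k]]
    a21 = [row[:k] for row in a[k:]]
    a22 = [row[k:] for row in a[k:]]
    # The two diagonal blocks are independent sub-problems.
    b11 = invert_lower_triangular(a11, split)
    b22 = invert_lower_triangular(a22, split)
    # Off-diagonal block: B21 = -A22^{-1} * A21 * A11^{-1}.
    b21 = [[-v for v in row] for row in matmul(matmul(b22, a21), b11)]
    top = [row + [Fraction(0)] * (n - k) for row in b11]
    bottom = [r21 + r22 for r21, r22 in zip(b21, b22)]
    return top + bottom


# Usage: balanced split, then an uneven split with a 1x1 leading block.
a = [[2, 0, 0], [1, 4, 0], [3, 5, 1]]
inverse_balanced = invert_lower_triangular(a)
inverse_uneven = invert_lower_triangular(a, split=lambda n: 1)
```

Exact rational arithmetic (`fractions.Fraction`) is used only so that the result can be checked exactly; a practical implementation would work in floating point and replace the explicit products with tuned triangular kernels.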

References

Quarteroni, A., Sacco, R., Saleri, F.: Méthodes Numériques. Algorithmes, Analyse et Applications. Springer, Milano (2007)
Heller, D.: A survey of parallel algorithms in numerical linear algebra. SIAM Rev. 20, 740–777 (1978)
Modi, J.J.: Parallel Algorithms and Matrix Computation. Oxford University Press, Oxford (1988)
JáJá, J.: An Introduction to Parallel Algorithms. Addison-Wesley, Reading (1992)
Schikarski, A., Wagner, D.: Efficient parallel matrix inversion on interconnection networks. J. Parallel Distrib. Comput. 34, 196–201 (1996)
Nasri, W.: Optimal parallelization of a recursive algorithm for triangular matrix inversion on MIMD computers. Doctoral thesis, Faculty of Sciences of Tunis, Tunis (2002)
Nasri, W., Mahjoub, Z.: Design and implementation of a general parallel divide and Conquer algorithm for triangular matrix inversion. Int. J. Parallel Distrib. Syst. Netw. 5(1), 35–42 (2002)
Karlsson, L.: Computing explicit matrix inverses by recursion. MS thesis, Umeå University, Department of Computing Science, Sweden (2006)
Li, K.: Fast and highly scalable parallel computations for fundamental matrix problems on distributed memory systems. J. Supercomput. http://www.springerlink.com/content/x03424q12666w3t4/fulltext.pdf (2009)
Gengler, M., Ubéda, S., Desprez, F.: Initiation au parallélisme: concepts, architectures et algorithmes. Masson, Paris (1996)
Choi, J., Dongarra, J., Ostrouchov, S., Petitet, A., Walker, D., Whaley, R.C.: A proposal for a set of parallel basic linear algebra subprograms. TR CS-95-292, Computer Science Department, University of Tennessee, Knoxville, TN (1995)
Marrakchi, M.: Conception et analyse d’ordonnancements efficaces pour algorithmes parallèles d’algèbre linéaire. Doctoral thesis, Faculty of Sciences of Tunis (2001)
Ries, F., De Marco, T., Guerrieri, R.: Triangular matrix inversion on heterogeneous multicore systems. IEEE Trans. Parallel Distrib. Syst. 23, 177–184 (2012)
Georganas, E., González-Domínguez, J., Solomonik, E., Zheng, Y., Touriño, J., Yelick, K.: Communication avoiding and overlapping for numerical linear algebra. In: SC ’12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (2012)
Donfack, S., Grigori, L., Khabou, A.: Avoiding communication through a multilevel LU factorization. In: Euro-Par 2012 Parallel Processing, pp. 551–562 (2012)
ChronoMath http://serge.mehl.free.fr/anx/equ_deg3.html
Nasri, W., Mahjoub, Z., Trystram, D.: Computing the inverse of a triangular matrix on heterogeneous clusters. In: Algorithms and Tools for Parallel Computing on Heterogeneous Clusters, pp. 67–78 (2007)
Karmarkar, N., Karp, R.M., Lueker, G.S., Odlyzko, A.M.: Probabilistic analysis of optimum partitioning. J. Appl. Prob. 23, 626–645 (1986)
Khabou, A.: Dense Matrix Computations: Communication Cost and Numerical Stability. Thesis, University Paris-Sud (2013)
Chergui, J.: OpenMP: Parallélisation multitâches pour machines à mémoire partagée. Course, Institut du développement et des ressources en informatique scientifique, France (2006)
OpenMP. http://www.openmp.org
Creel, M., Goffe, W.L.: Multi-Core CPUs, Clusters, and Grid Computing. Kluwer, Dordrecht (2007)
Message Passing Interface Forum. http://www.mpi-forum.org
Plaza, A., Valencia, D., Plaza, J.: An experimental comparison of parallel algorithms for hyperspectral analysis using heterogeneous and homogeneous networks of workstations. Parallel Comput. 34, 92–114 (2008)
Kumar, V., Grama, A., Gupta, A., Karypis, G.: Introduction to Parallel Computing: Design and Analysis of Algorithms. Addison-Wesley, Reading (1994)
Nvidia. https://developer.nvidia.com/cuBLAS
Tomov, S., Nath, R., Dongarra, J.: Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing. Parallel Comput. 36, 645–654 (2010)
Mahfoudhi, R., Mahjoub, Z., Nasri, W.: Une nouvelle méthode de parallélisation optimale pour l’inversion de matrice triangulaire, RenPar’20 / SympA’14 / CFSE 8. Saint-Malo, France (2011)
Mahfoudhi, R., Mahjoub, Z., Nasri, W.: Parallel communication-free algorithm for triangular matrix inversion on heterogeneous platform. In: Proceedings of the Federated Conference on Computer Science and Information Systems, pp. 553–560, Wrocław, Poland (2012)

Acknowledgments

We are deeply grateful to Dr. N. Jaïdane for his invaluable help and to an anonymous referee for judicious comments and suggestions.

Author information

Authors and Affiliations

Faculty of Sciences of Tunis, University of Tunis El Manar, Manar II, 2092 Tunis, Tunisia
Ryma Mahfoudhi & Zaher Mahjoub
Higher School of Sciences and Techniques of Tunis, 1008 Montfleury, Tunis, Tunisia
Wahid Nasri

Corresponding author

Correspondence to Ryma Mahfoudhi.

Additional information

This paper is based on two previous communications published in RenPar’20/SympA’14/CFSE 8, Saint-Malo, France, 2011 and CANA, Wrocław, Poland, 2012 [28, 29].

About this article

Cite this article

Mahfoudhi, R., Mahjoub, Z. & Nasri, W. Parallel Communication-Avoiding Algorithm for Triangular Matrix Inversion on Homogeneous and Heterogeneous Platforms. Int J Parallel Prog 43, 631–655 (2015). https://doi.org/10.1007/s10766-014-0310-0

Received: 16 September 2013
Accepted: 13 March 2014
Published: 30 March 2014
Issue Date: August 2015
DOI: https://doi.org/10.1007/s10766-014-0310-0
