Parallel Communication-Avoiding Algorithm for Triangular Matrix Inversion on Homogeneous and Heterogeneous Platforms

International Journal of Parallel Programming

Abstract

This paper addresses the parallelization of a recursive algorithm for large-scale triangular matrix inversion based on the ‘Divide and Conquer’ (D&C) paradigm. Several versions of an original sequential algorithm are presented first, and a theoretical performance study provides an accurate comparison between them. The second part of the paper develops an optimal parallel communication-avoiding algorithm for a given number of available homogeneous or heterogeneous processors. To this end, we use a so-called ‘non-equitable and incomplete’ version of the D&C paradigm, which recursively decomposes the original problem into two sub-problems of unequal sizes and then decomposes only one of them in the same manner. The theoretical study is validated by a series of experiments on three target platforms: an 8-core shared-memory machine, a distributed-memory cluster, and a heterogeneous CPU-GPU cluster. The results illustrate the benefit of the contribution.
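The ‘non-equitable and incomplete’ decomposition can be illustrated with a minimal sequential sketch. It assumes a lower triangular matrix partitioned as L = [[A, 0], [C, B]], whose inverse is [[A⁻¹, 0], [−B⁻¹·C·A⁻¹, B⁻¹]]. The function names, the `ratio` and `cutoff` parameters, and the choice to recurse only on the trailing block B are illustrative assumptions; the abstract does not say which sub-problem is decomposed or how the split sizes are chosen, and this sketch is not the authors' parallel algorithm.

```python
def invert_lower_direct(L):
    """Invert a lower triangular matrix column by column (forward substitution)."""
    n = len(L)
    X = [[0.0] * n for _ in range(n)]
    for j in range(n):
        X[j][j] = 1.0 / L[j][j]
        for i in range(j + 1, n):
            s = sum(L[i][k] * X[k][j] for k in range(j, i))
            X[i][j] = -s / L[i][i]
    return X


def matmul(A, B):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def invert_lower_dc(L, ratio=0.3, cutoff=2):
    """Unequal-split, one-sided D&C inversion of a lower triangular matrix.

    ratio  : hypothetical fraction of rows given to the leading block A
    cutoff : hypothetical size below which the direct method is used
    """
    n = len(L)
    if n <= max(1, cutoff):
        return invert_lower_direct(L)
    k = min(n - 1, max(1, int(n * ratio)))      # unequal split point
    A = [row[:k] for row in L[:k]]              # leading k x k block
    C = [row[:k] for row in L[k:]]              # off-diagonal block
    B = [row[k:] for row in L[k:]]              # trailing block
    A_inv = invert_lower_direct(A)              # not decomposed further
    B_inv = invert_lower_dc(B, ratio, cutoff)   # the only block decomposed again
    M = matmul(matmul(B_inv, C), A_inv)         # B^-1 * C * A^-1
    top = [A_inv[i] + [0.0] * (n - k) for i in range(k)]
    bottom = [[-m for m in M[i]] + B_inv[i] for i in range(n - k)]
    return top + bottom
```

Calling `invert_lower_dc(L, ratio=0.3, cutoff=1)` on a small lower triangular `L` and multiplying the result by `L` with `matmul` should give the identity up to rounding error. In a parallel setting, the direct inversions and the block products are the natural units of work to distribute, which is presumably where the split sizes matter.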

References

  1. Quarteroni, A., Sacco, R., Saleri, F.: Méthodes Numériques. Algorithmes, Analyse et Applications. Springer, Milano (2007)

  2. Heller, D.: A survey of parallel algorithms in numerical linear algebra. SIAM Rev. 20, 740–777 (1978)

  3. Modi, J.J.: Parallel Algorithms and Matrix Computation. Oxford University Press, Oxford (1988)

  4. JáJá, J.: An Introduction to Parallel Algorithms. Addison-Wesley, Reading (1992)

  5. Schikarski, A., Wagner, D.: Efficient parallel matrix inversion on interconnection networks. J. Parallel Distrib. Comput. 34, 196–201 (1996)

  6. Nasri, W.: Optimal parallelization of a recursive algorithm for triangular matrix inversion on MIMD computers. Doctoral thesis, Faculty of Sciences of Tunis, Tunis (2002)

  7. Nasri, W., Mahjoub, Z.: Design and implementation of a general parallel Divide and Conquer algorithm for triangular matrix inversion. Int. J. Parallel Distrib. Syst. Netw. 5(1), 35–42 (2002)

  8. Karlsson, L.: Computing explicit matrix inverses by recursion. MS thesis, Umeå University, Department of Computing Science, Sweden (2006)

  9. Li, K.: Fast and highly scalable parallel computations for fundamental matrix problems on distributed memory systems. J. Supercomput. http://www.springerlink.com/content/x03424q12666w3t4/fulltext.pdf (2009)

  10. Gengler, M., Ubéda, S., Desprez, F.: Initiation au parallélisme: concepts, architectures et algorithmes. Masson, Paris (1996)

  11. Choi, J., Dongarra, J., Ostrouchov, S., Petitet, A., Walker, D., Whaley, R.C.: A proposal for a set of parallel basic linear algebra subprograms. Technical Report CS-95-292, Computer Science Department, University of Tennessee, Knoxville, TN (1995)

  12. Marrakchi, M.: Conception et analyse d’ordonnancements efficaces pour algorithmes parallèles d’algèbre linéaire. Doctoral thesis, Faculty of Sciences of Tunis (2001)

  13. Ries, F., De Marco, T., Guerrieri, R.: Triangular matrix inversion on heterogeneous multicore systems. IEEE Trans. Parallel Distrib. Syst. 23, 177–184 (2012)

  14. Georganas, E., González-Domínguez, J., Solomonik, E., Zheng, Y., Touriño, J., Yelick, K.: Communication avoiding and overlapping for numerical linear algebra. In: SC ’12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (2012)

  15. Donfack, S., Grigori, L., Khabou, A.: Avoiding communication through a multilevel LU factorization. In: Euro-Par 2012 Parallel Processing, pp. 551–562 (2012)

  16. ChronoMath http://serge.mehl.free.fr/anx/equ_deg3.html

  17. Nasri, W., Mahjoub, Z., Trystram, D.: Computing the inverse of a triangular matrix on heterogeneous clusters. In: Algorithms and Tools for Parallel Computing on Heterogeneous Clusters, pp. 67–78 (2007)

  18. Karmarkar, N., Karp, R.M., Lueker, G.S., Odlyzko, A.M.: Probabilistic analysis of optimum partitioning. J. Appl. Prob. 23, 626–645 (1986)

  19. Khabou, A.: Dense Matrix Computations: Communication Cost and Numerical Stability. Thesis, University Paris-Sud (2013)

  20. Chergui, J.: OpenMP: Parallélisation multitâches pour machines à mémoire partagée. Course, Institut du développement et des ressources en informatique scientifique, France (2006)

  21. OpenMP. http://www.openmp.org

  22. Creel, M., Goffe, W.L.: Multi-Core CPUs, Clusters, and Grid Computing. Kluwer, Dordrecht (2007)

  23. Message Passing Interface Forum. http://www.mpi-forum.org

  24. Plaza, A., Valencia, D., Plaza, J.: An experimental comparison of parallel algorithms for hyperspectral analysis using heterogeneous and homogeneous networks of workstations. Parallel Comput. 34, 92–114 (2008)

  25. Kumar, V., Grama, A., Gupta, A., Karypis, G.: Introduction to Parallel Computing: Design and Analysis of Algorithms. Addison-Wesley, Reading (1994)

  26. Nvidia. https://developer.nvidia.com/cuBLAS

  27. Tomov, S., Nath, R., Dongarra, J.: Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing. Parallel Comput. 36, 645–654 (2010)

  28. Mahfoudhi, R., Mahjoub, Z., Nasri, W.: Une nouvelle méthode de parallélisation optimale pour l’inversion de matrice triangulaire, RenPar’20 / SympA’14 / CFSE 8. Saint-Malo, France (2011)

  29. Mahfoudhi, R., Mahjoub, Z., Nasri, W.: Parallel communication-free algorithm for triangular matrix inversion on heterogeneous platform. In: Proceedings of the Federated Conference on Computer Science and Information Systems, pp. 553–560, Wrocław, Poland (2012)

Acknowledgments

We are deeply grateful to Dr. N. Jaïdane for his invaluable help and to an anonymous referee for judicious comments and suggestions.

Author information

Corresponding author

Correspondence to Ryma Mahfoudhi.

Additional information

This paper is based on two previous communications presented at RenPar’20/SympA’14/CFSE 8, Saint-Malo, France, 2011, and at CANA, Wrocław, Poland, 2012 [28, 29].

About this article

Cite this article

Mahfoudhi, R., Mahjoub, Z. & Nasri, W. Parallel Communication-Avoiding Algorithm for Triangular Matrix Inversion on Homogeneous and Heterogeneous Platforms. Int J Parallel Prog 43, 631–655 (2015). https://doi.org/10.1007/s10766-014-0310-0

  • DOI: https://doi.org/10.1007/s10766-014-0310-0
