Computing and Visualization in Science, Volume 16, Issue 3, pp 105–117

\(\mathcal{H}\)-LU factorization on many-core systems

Abstract

A version of the \(\mathcal{H}\)-LU factorization is introduced that is built from the individual computational tasks occurring during the block-wise \(\mathcal{H}\)-LU factorization. The dependencies between these tasks form a directed acyclic graph (DAG), which is used for efficient scheduling on parallel systems. The algorithm is especially suited for many-core processors and shows much better parallel scaling than previous \(\mathcal{H}\)-LU factorization algorithms.
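The task-and-dependency idea can be illustrated with a small sketch. It is not the article's algorithm: it assumes a plain right-looking block LU on a uniform grid of dense blocks in place of the hierarchical block structure, performs no arithmetic, and uses hypothetical task labels and helper names (`build_block_lu_dag`, `schedule_in_waves`) chosen only for illustration. It records which task last wrote each block, derives the dependency graph from that, and groups tasks into waves that could run concurrently.

```python
from collections import defaultdict


def build_block_lu_dag(nb):
    """Build the task DAG of a right-looking block LU on an nb x nb block grid.

    Hypothetical task labels (assumptions, not taken from the article):
      ("lu", k)            factorize diagonal block (k, k)
      ("trsm_l", k, j)     solve for U block (k, j), j > k
      ("trsm_u", i, k)     solve for L block (i, k), i > k
      ("update", i, j, k)  A[i][j] -= L[i][k] * U[k][j], with i, j > k
    Returns a dict mapping each task to the set of tasks it must wait for.
    """
    deps = defaultdict(set)
    last_writer = {}  # block (i, j) -> most recent task that wrote it

    def add(task, reads, writes):
        deps[task]  # register the task even if it has no dependencies
        # Depend on the last writer of every block that is read or overwritten.
        # In this access pattern a block is never rewritten after it has been
        # read, so write-after-read hazards do not need separate tracking.
        for block in reads + [writes]:
            if block in last_writer:
                deps[task].add(last_writer[block])
        last_writer[writes] = task

    for k in range(nb):
        add(("lu", k), [], (k, k))
        for j in range(k + 1, nb):
            add(("trsm_l", k, j), [(k, k)], (k, j))
        for i in range(k + 1, nb):
            add(("trsm_u", i, k), [(k, k)], (i, k))
        for i in range(k + 1, nb):
            for j in range(k + 1, nb):
                add(("update", i, j, k), [(i, k), (k, j)], (i, j))
    return dict(deps)


def schedule_in_waves(deps):
    """Group tasks into waves; all tasks in one wave are mutually independent."""
    remaining = {task: set(d) for task, d in deps.items()}
    waves = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cycle detected: the graph is not a DAG")
        waves.append(ready)
        for task in ready:
            del remaining[task]
        for d in remaining.values():
            d.difference_update(ready)
    return waves


if __name__ == "__main__":
    dag = build_block_lu_dag(3)
    for level, wave in enumerate(schedule_in_waves(dag)):
        print(level, wave)
```

Wave-by-wave execution is the simplest way to read parallelism off the graph. A runtime scheduler would more likely start each task as soon as its own dependencies are finished, without waiting for a whole wave to complete.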

Keywords

Hierarchical matrices, Parallel algorithms, Many-core processors, DAG-based

Mathematics Subject Classification

65F05 65Y05 65Y20 68W10 68W40 

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Max-Planck-Institute for Mathematics in the Sciences, Leipzig, Germany
