Parallel and Fully Recursive Multifrontal Supernodal Sparse Cholesky

  • Dror Irony
  • Gil Shklarski
  • Sivan Toledo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2330)

Abstract

We describe the design, implementation, and performance of a new parallel sparse Cholesky factorization code. The code uses a supernodal multifrontal factorization strategy. Operations on small dense submatrices are performed using new dense-matrix subroutines that are part of the code, although the code can also use the BLAS and LAPACK. The new code is recursive at both the sparse and the dense levels; it uses a novel recursive data layout for dense submatrices; and it is parallelized using Cilk, an extension of C specifically designed to parallelize recursive codes. We demonstrate that the new code performs well and scales well on SMPs.
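To make the idea of recursion at the dense level concrete, here is a minimal sketch of a recursive dense Cholesky factorization in plain Python. It is not the authors' code: the function name `recursive_cholesky` and the list-of-lists storage are assumptions made for illustration, and the sketch uses neither Cilk nor the recursive data layout mentioned in the abstract. It only shows the standard block recursion (factor the leading block, solve for the off-diagonal block, update the trailing block, recurse).

```python
import math


def recursive_cholesky(a, lo=0, hi=None):
    """Overwrite the lower triangle of a[lo:hi][lo:hi] with its Cholesky factor L.

    `a` is a symmetric positive definite matrix stored as a list of lists
    (an illustrative assumption). The strictly upper triangle is left untouched.
    """
    if hi is None:
        hi = len(a)
    n = hi - lo
    if n == 1:
        # Base case: a 1-by-1 block.
        a[lo][lo] = math.sqrt(a[lo][lo])
        return a
    mid = lo + n // 2

    # 1. Factor the leading block recursively: A11 = L11 * L11^T.
    recursive_cholesky(a, lo, mid)

    # 2. Triangular solve for the off-diagonal block: L21 = A21 * L11^{-T}.
    for i in range(mid, hi):
        for j in range(lo, mid):
            s = a[i][j] - sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = s / a[j][j]

    # 3. Symmetric update of the trailing block (lower part only):
    #    A22 <- A22 - L21 * L21^T.
    for i in range(mid, hi):
        for j in range(mid, i + 1):
            a[i][j] -= sum(a[i][k] * a[j][k] for k in range(lo, mid))

    # 4. Factor the updated trailing block recursively.
    recursive_cholesky(a, mid, hi)
    return a


if __name__ == "__main__":
    A = [[4.0, 2.0, 2.0],
         [2.0, 5.0, 3.0],
         [2.0, 3.0, 6.0]]
    recursive_cholesky(A)
    # Print only the lower triangle, which now holds L.
    for i, row in enumerate(A):
        print(row[: i + 1])
```

In a production code, steps 2 and 3 would themselves be recursive or delegated to tuned kernels; the loops here stand in for those kernels only to keep the sketch self-contained.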

Keywords

Cholesky Factorization, Cache Line, Runtime System, Data Layout, False Sharing
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Dror Irony
    • 1
  • Gil Shklarski
    • 1
  • Sivan Toledo
    • 1
  1. School of Computer Science, Tel-Aviv University, Israel
