Evaluation of Sparse LU Factorization and Triangular Solution on Multicore Platforms

  • Xiaoye Sherry Li
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5336)

Abstract

The Chip Multiprocessor (CMP) will be the basic building block for computer systems ranging from laptops to supercomputers. New software developments at all levels are needed to fully utilize these systems. In this work, we evaluate the performance of different high-performance sparse LU factorization and triangular solution algorithms on several representative multicore machines. We include both pthreads and MPI implementations in this study, and find that the pthreads implementation consistently delivers good performance and that a left-looking algorithm is usually superior.
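For readers unfamiliar with the terms, the following is a minimal dense sketch of a left-looking (column-by-column) LU factorization followed by the two triangular solves. It is an illustration only, not the sparse supernodal code evaluated in the paper: it assumes a small dense matrix, does no pivoting, and the function names (lu_left_looking, forward_solve, backward_solve) are hypothetical.

```python
def lu_left_looking(A):
    """Factor A = L * U one column at a time (dense, no pivoting).

    "Left-looking" means that column j is computed by reading the
    already finished columns 0..j-1 of L, which lie to its left.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Work vector: a copy of column j of A.
        x = [float(A[i][j]) for i in range(n)]
        # Apply the updates from every finished column k < j.
        for k in range(j):
            for i in range(k + 1, n):
                x[i] -= L[i][k] * x[k]
        if x[j] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        # Upper part (rows 0..j) becomes column j of U.
        for i in range(j + 1):
            U[i][j] = x[i]
        # Lower part (rows j+1..n-1), scaled by the pivot, becomes column j of L.
        for i in range(j + 1, n):
            L[i][j] = x[i] / x[j]
    return L, U


def forward_solve(L, b):
    """Solve L * y = b, where L is unit lower triangular."""
    y = [float(v) for v in b]
    for i in range(len(b)):
        for k in range(i):
            y[i] -= L[i][k] * y[k]
    return y


def backward_solve(U, y):
    """Solve U * x = y, where U is upper triangular."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = y[i]
        for k in range(i + 1, n):
            s -= U[i][k] * x[k]
        x[i] = s / U[i][i]
    return x


if __name__ == "__main__":
    A = [[4.0, 3.0], [6.0, 3.0]]
    b = [10.0, 12.0]
    L, U = lu_left_looking(A)
    print(backward_solve(U, forward_solve(L, b)))
```

A sparse solver stores only the nonzero entries and skips updates from columns that cannot affect column j, but the order in which data is read and written follows the same pattern.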

Keywords

Memory Bandwidth; Task Queue; Chip Multiprocessor; Multicore Platform; Hardware Thread

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Xiaoye Sherry Li, Lawrence Berkeley National Laboratory, MS 50F-1650, Berkeley, USA
