VECPAR 2008: High Performance Computing for Computational Science - VECPAR 2008 pp 287-300
Evaluation of Sparse LU Factorization and Triangular Solution on Multicore Platforms
Abstract
The Chip Multiprocessor (CMP) will be the basic building block for computer systems ranging from laptops to supercomputers. New software developments at all levels are needed to fully utilize these systems. In this work, we evaluate the performance of different high-performance sparse LU factorization and triangular solution algorithms on several representative multicore machines. We include both pthreads and MPI implementations in this study and find that the pthreads implementation consistently delivers good performance and that a left-looking algorithm is usually superior.
Keywords
Memory Bandwidth, Task Queue, Chip Multiprocessor, Multicore Platform, Hardware Thread