Abstract
We shall say that an algorithm is scalable if its parallel efficiency remains bounded away from zero as the number of processors and the problem size grow, with the size of the data structures growing linearly in the number of processors. In this paper we show that the column-oriented approach to sparse Cholesky factorization for distributed-memory machines is not scalable. By considering message volume, node contention, and bisection width, one may obtain lower bounds on the time required for communication in a distributed algorithm. Applying this technique to distributed, column-oriented, dense Cholesky leads to the conclusion that N (the order of the matrix) must scale linearly with P (the number of processors), so that storage grows like P². The algorithm is therefore not scalable. The same conclusion has previously been reached by considering communication and computation latency on the critical path of the algorithm; our results complement and reinforce it.
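The scaling argument above can be illustrated with a simple performance model (my own back-of-the-envelope sketch, not the paper's exact bounds): take sequential work N³/3 flops and charge each processor a communication term proportional to N², the order suggested by the volume and contention bounds for the column-oriented scheme. Efficiency then stays constant only when N grows linearly with P, which forces per-machine storage N² to grow like P².

```python
import math

# Illustrative model (an assumption for this sketch, not the paper's derivation):
#   T_seq  = N**3 / 3                      sequential flops
#   T_par >= N**3 / (3*P) + c * N**2       computation plus O(N^2) communication
def efficiency(N, P, c=1.0):
    """Parallel efficiency T_seq / (P * T_par) under the model above;
    algebraically this equals 1 / (1 + 3*c*P/N)."""
    t_seq = N**3 / 3
    t_par = N**3 / (3 * P) + c * N**2
    return t_seq / (P * t_par)

# Scaling N linearly with P (so storage N^2 grows like P^2) holds efficiency fixed:
for P in (16, 64, 256, 1024):
    print(P, round(efficiency(100 * P, P), 3))

# Scaling storage only linearly with P (N ~ sqrt(P)) drives efficiency to zero:
for P in (16, 64, 256, 1024):
    print(P, round(efficiency(int(100 * math.sqrt(P)), P), 3))
```

The second loop is the point of the abstract: with linearly growing storage, the O(N²) communication term eventually dominates the O(N³/P) computation term.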
For the sparse case, both theory and new experimental measurements, reported here, make the same point: for column-oriented distributed methods, the number of gridpoints (which is O(N)) must grow like P² in order to keep the parallel efficiency bounded away from zero. Our sparse matrix results employ the “fan-in” distributed scheme, implemented on machines with either a grid or a fat-tree interconnect, using a subtree-to-submachine mapping of the columns.
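The subtree-to-submachine idea can be sketched in a few lines (a hypothetical minimal version for illustration; the `Node` type and recursion below are my own simplification, not the paper's implementation): each separator of a nested-dissection elimination tree is shared by a set of processors, and that set is split between the two child subtrees.

```python
from dataclasses import dataclass, field

# Hypothetical minimal elimination-tree node (illustrative only).
@dataclass
class Node:
    id: int
    children: list = field(default_factory=list)

def map_subtrees(node, procs, assignment):
    """Subtree-to-submachine mapping: the whole processor sublist owns the
    columns of this separator; each child subtree gets half of the sublist."""
    assignment[node.id] = procs
    if node.children:  # assume a binary tree from nested dissection
        half = len(procs) // 2
        map_subtrees(node.children[0], procs[:half], assignment)
        map_subtrees(node.children[1], procs[half:], assignment)
    return assignment

# Root separator shared by all 4 processors; each child subtree gets 2.
tree = Node(0, [Node(1), Node(2)])
a = map_subtrees(tree, [0, 1, 2, 3], {})
# a[0] == [0, 1, 2, 3], a[1] == [0, 1], a[2] == [2, 3]
```

On a grid or fat-tree interconnect this mapping keeps most fan-in traffic local to a submachine, which is why it is the natural column assignment to test.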
The alternative of distributing the rows and columns of the matrix to the rows and columns of a grid of processors is shown to be scalable for the dense case. Its scalability for the sparse case has been established previously [10]. To date, however, none of these methods has achieved high efficiency on a highly parallel machine.
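A rough model of why the two-dimensional distribution fares better (again my own sketch under stated assumptions, not the paper's analysis): when both rows and columns are mapped to a √P × √P processor grid, the per-processor communication term drops from O(N²) to O(N²/√P), so efficiency stays bounded when N grows only like √P, i.e. with storage linear in P.

```python
import math

# Assumed model: computation N^3/(3P) plus communication c*N^2/sqrt(P) per processor.
def efficiency_2d(N, P, c=1.0):
    """Efficiency under the 2-D grid mapping model; equals 1 / (1 + 3*c*sqrt(P)/N)."""
    t_par = N**3 / (3 * P) + c * N**2 / math.sqrt(P)
    return (N**3 / 3) / (P * t_par)

# Memory per processor (~N^2/P) stays constant when N ~ sqrt(P),
# and efficiency stays bounded away from zero under that same scaling:
for P in (16, 64, 256, 1024):
    print(P, round(efficiency_2d(int(100 * math.sqrt(P)), P), 3))
```

Under this model the same linear-storage scaling that kills the column-oriented scheme leaves the 2-D scheme at constant efficiency, which is the sense in which it is scalable.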
Finally, open problems and other approaches that may be more fruitful are discussed.
Research Institute for Advanced Computer Science, MS T045-1 NASA Ames Research Center, Moffett Field, CA 94035. This author’s work was supported by the NAS Systems Division via Cooperative Agreement NCC 2-387 between NASA and the University Space Research Association (USRA).
References
E. Anderson, A. Benzoni, J. Dongarra, S. Moulton, S. Ostrouchov, B. Tourancheau, and R. van de Geijn, LAPACK for distributed memory architectures: progress report, in Parallel Processing for Scientific Computing, SIAM, 1992.
C. Ashcraft, S. C. Eisenstat, and J. W. H. Liu, A fan-in algorithm for distributed sparse numerical factorization, SIAM J. Sci. Stat. Comput. 11 (1990), pp. 593–599.
C. Ashcraft, S. C. Eisenstat, J. W. H. Liu, and A. H. Sherman, A comparison of three column-based distributed sparse factorization schemes, Research Report YALEU/DCS/RR810, Comp. Sci. Dept., Yale Univ., 1990.
C. Ashcraft, S. C. Eisenstat, J. W. H. Liu, B. W. Peyton, and A. H. Sherman, A compute-ahead fan-in scheme for parallel sparse matrix factorization, in D. Pelletier, ed., Proceedings, Supercomputing Symposium ’90, pp. 351–361, École Polytechnique de Montréal, 1990.
C. Ashcraft, The fan-both family of column-based distributed Cholesky factorization algorithms, these proceedings.
P. Bjørstad and M. D. Skogen, Domain decomposition algorithms of Schwarz type, designed for massively parallel computers, in Proceedings of the Fifth International Symposium on Domain Decomposition, SIAM, 1992.
J. Dongarra, R. van de Geijn, and D. Walker, A look at scalable dense linear algebra libraries, in Proceedings, Scalable High Performance Computer Conference, Williamsburg, VA, 1992.
A. George, J. W. H. Liu, and E. Ng, Communication results for parallel sparse Cholesky factorization on a hypercube, Parallel Comput. 10 (1989), pp. 287–298.
A. George, M. T. Heath, J. W. H. Liu, and E. Ng, Solution of sparse positive definite systems on a hypercube, J. Comput. Appl. Math. 27 (1989), pp. 129–156.
J. R. Gilbert and R. Schreiber, Highly parallel sparse Cholesky factorization, SIAM J. Sci. Stat. Comput., to appear.
J. R. Gilbert, C. Moler, and R. Schreiber, Sparse matrices in MATLAB: design and implementation, SIAM J. Matrix Anal. Appl. 13 (1992), pp. 333–356.
S. W. Hammond, Mapping Unstructured Grid Computations to Massively Parallel Computers, PhD thesis, Dept. of Comp. Sci., Rensselaer Polytechnic Institute, 1992.
S. W. Hammond and R. Schreiber, Mapping unstructured grid problems to the Connection Machine, in P. Mehrotra, J. Saltz, and R. Voigt, eds., Unstructured Scientific Computation on Multiprocessors, pp. 11–30, MIT Press, 1992.
M. T. Heath, E. Ng, and B. W. Peyton, Parallel algorithms for sparse linear systems, SIAM Review 33 (1991), pp. 420–460.
S. G. Kratzer, Massively parallel sparse matrix computations, in P. Mehrotra, J. Saltz, and R. Voigt, eds., Unstructured Scientific Computation on Multiprocessors, pp. 178–186, MIT Press, 1992. A more complete version will appear in J. Supercomputing.
C. E. Leiserson, Fat-trees: universal networks for hardware-efficient supercomputing, IEEE Trans. Comput. C-34 (1985), pp. 892–901.
G. Li and T. F. Coleman, A parallel triangular solver for a distributed memory multiprocessor, SIAM J. Sci. Stat. Comput. 9 (1988), pp. 485–502.
M. Mu and J. R. Rice, Performance of PDE sparse solvers on hypercubes, in P. Mehrotra, J. Saltz, and R. Voigt, eds., Unstructured Scientific Computation on Multiprocessors, pp. 345–370, MIT Press, 1992.
M. Mu and J. R. Rice, A grid-based subtree-subcube assignment strategy for solving PDEs on hypercubes, SIAM J. Sci. Stat. Comput. 13 (1992), pp. 826–839.
A. T. Ogielski and W. Aiello, Sparse matrix algebra on parallel processor arrays, these proceedings.
D. P. O’Leary and G. W. Stewart, Data-flow algorithms for parallel matrix computations, Comm. ACM 28 (1985), pp. 840–853.
L. S. Ostrouchov, M. T. Heath, and C. H. Romine, Modeling speedup in parallel sparse matrix factorization, Tech. Report ORNL/TM-11786, Mathematical Sciences Section, Oak Ridge National Lab., December 1990.
D. Patterson, Massively parallel computer architecture: observations and ideas on a new theoretical model, Comp. Sci. Dept., Univ. of California at Berkeley, 1992.
C. Pommerell, M. Annaratone, and W. Fichtner, A set of new mapping and coloring heuristics for distributed-memory parallel processors, SIAM J. Sci. Stat. Comput. 13 (1992), pp. 194–226.
A. Pothen, H. D. Simon, and L. Wang, Spectral nested dissection, Report CS-92-01, Comp. Sci. Dept., Penn State Univ. Submitted to J. Parallel and Distrib. Comput.
E. Rothberg and A. Gupta, The performance impact of data reuse in parallel dense Cholesky factorization, Report STAN-CS-92-1401, Comp. Sci. Dept., Stanford Univ.
E. Rothberg and A. Gupta, An efficient block-oriented approach to parallel sparse Cholesky factorization, Tech. Report, Comp. Sci. Dept., Stanford Univ., 1992.
Y. Saad and M. H. Schultz, Data communication in parallel architectures, Parallel Comput. 11 (1989), pp. 131–150.
S. Venugopal and V. K. Naik, Effects of partitioning and scheduling sparse matrix factorization on communication and load balance, in Proceedings, Supercomputing ’91, pp. 866–875, IEEE Computer Society Press, 1991.
L. Hulbert and E. Zmijewski, Limiting communication in parallel sparse Cholesky factorization, SIAM J. Matrix Anal. Appl. 12 (1991), pp. 1184–1197.
Copyright information
© 1993 Springer-Verlag New York, Inc.
Cite this paper
Schreiber, R. (1993). Scalability of Sparse Direct Solvers. In: George, A., Gilbert, J.R., Liu, J.W.H. (eds) Graph Theory and Sparse Matrix Computation. The IMA Volumes in Mathematics and its Applications, vol 56. Springer, New York, NY. https://doi.org/10.1007/978-1-4613-8369-7_9
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4613-8371-0
Online ISBN: 978-1-4613-8369-7