Parallel solution of irregular, sparse matrix problems using High Performance Fortran
For regular, sparse linear systems, such as those derived from regular grids, implementing iterative solvers in High Performance Fortran (HPF) is straightforward. For irregular matrices, however, an efficient HPF implementation of these solvers is much harder.
First, the locality in the computations (a good partitioning) is unclear. Second, for efficiency we often use storage schemes that obscure even the simplest structure in the matrix (like rows and columns). Third, the limited capabilities of HPF to distribute data structures make it hard to implement the desired distribution. Fourth, data structures often have very different sizes and shapes, and matching the distributions for efficient implementation (locality) is a problem. Fifth, after implementing the distributions, we still must write the program in such a way that the compiler recognizes the efficient implementation and leaves out unnecessary communication, synchronization, etc.
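The first two difficulties can be made concrete with a small sketch. It is written in Python rather than HPF and is only an illustration under stated assumptions: the matrix is held in compressed sparse row (CSR) form, rows and vector entries are assigned to processors in contiguous blocks (as an HPF `BLOCK` distribution would do), and the helper names `csr_matvec`, `block_owner`, and `remote_entries` are hypothetical. It lists, for each processor, the vector entries that the matrix-vector product would have to fetch from other processors.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Sequential y = A*x for a matrix stored in CSR form."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def block_owner(i, n, nprocs):
    """Owner of index i under a block distribution (assumed layout)."""
    block = -(-n // nprocs)  # ceil(n / nprocs)
    return i // block


def remote_entries(row_ptr, col_idx, nprocs):
    """For each processor, the indices of x it needs but does not own."""
    n = len(row_ptr) - 1
    needed = [set() for _ in range(nprocs)]
    for i in range(n):
        p = block_owner(i, n, nprocs)
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if block_owner(j, n, nprocs) != p:
                needed[p].add(j)
    return needed


# Small 4x4 example (made-up data):
#   [4 0 0 1]
#   [0 3 0 0]
#   [2 0 5 0]
#   [0 0 1 6]
row_ptr = [0, 2, 3, 5, 7]
col_idx = [0, 3, 1, 0, 2, 2, 3]
values = [4.0, 1.0, 3.0, 2.0, 5.0, 1.0, 6.0]

y = csr_matvec(row_ptr, col_idx, values, [1.0, 1.0, 1.0, 1.0])
needed = remote_entries(row_ptr, col_idx, 2)
```

The access `x[col_idx[k]]` is the indirect addressing that hides the row and column structure from a compiler, and the size of each set in `needed` is one rough measure of how well a given partitioning preserves locality.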
We discuss techniques for handling these problems, and our results demonstrate that efficient implementations are possible. In fact, we show that on larger numbers of processors the efficiency of our irregular, sparse matrix-vector product is higher than the efficiency of the inner product, another essential kernel in iterative methods. For comparison we show results for regular, sparse matrices.
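One reason the inner product can scale worse than a well-partitioned matrix-vector product is that it requires a global reduction over all processors, whereas the matrix-vector product only needs the remote vector entries referenced by local rows. The following Python sketch simulates this under simple assumptions that do not come from the experiments reported here: a block distribution of the vectors and a pairwise (tree) combination of the partial sums. The function name `distributed_dot` is hypothetical.

```python
def distributed_dot(x, y, nprocs):
    """Simulate a block-distributed inner product with a tree reduction.

    Returns the inner product and the number of combining stages, which
    grows like log2(nprocs) regardless of how small the local work is.
    """
    n = len(x)
    block = -(-n // nprocs)  # ceil(n / nprocs)

    # Local phase: each processor sums over the entries it owns.
    partial = [
        sum(x[i] * y[i] for i in range(p * block, min((p + 1) * block, n)))
        for p in range(nprocs)
    ]

    # Global phase: combine partial sums pairwise until one value remains.
    stages = 0
    while len(partial) > 1:
        partial = [sum(partial[i:i + 2]) for i in range(0, len(partial), 2)]
        stages += 1
    return partial[0], stages


result, stages = distributed_dot([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, 4)
```

In this model the local work per processor shrinks as processors are added while the number of reduction stages grows, so the communication share of the inner product increases with the processor count.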
All our experiments are carried out using the Portland Group (PGI) HPF compiler (version 2.1) on the Intel Paragon at the Swiss Federal Institute of Technology (ETH Zurich).