# The design of a parallel dense linear algebra software library: Reduction to Hessenberg, tridiagonal, and bidiagonal form

## Abstract

This paper discusses issues in the design of ScaLAPACK, a software library for performing dense linear algebra computations on distributed memory concurrent computers. These issues are illustrated using the ScaLAPACK routines for reducing matrices to Hessenberg, tridiagonal, and bidiagonal forms. These routines are important in the solution of eigenproblems. The paper focuses on how building blocks are used to create higher-level library routines. Results are presented that demonstrate the scalability of the reduction routines. The building blocks most commonly used in ScaLAPACK are the sequential BLAS, the parallel BLAS (PBLAS) and the Basic Linear Algebra Communication Subprograms (BLACS). Each of the matrix reduction algorithms proceeds in a series of steps. In each step, one block column (or *panel*), and/or block row, of the matrix is reduced, and the portion of the matrix that has not been factorized so far is then updated. This update phase is performed using Level 3 PBLAS operations and contains the bulk of the computation. However, the panel reduction phase involves a significant amount of communication, and is important in determining the scalability of the algorithm. The simplest way to parallelize the panel reduction phase is to replace the BLAS routines appearing in the LAPACK routine (mostly matrix-vector and matrix-matrix multiplications) with the corresponding PBLAS routines. However, in some cases it is possible to reduce communication startup costs by combining the communication needed for consecutive BLAS operations into a single BLACS call. Thus, there is a tradeoff between efficiency and software engineering considerations, such as ease of programming and simplicity of code.
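The two-phase structure described in the abstract (reduce a panel, then update the rest of the matrix) can be sketched serially for the symmetric tridiagonal case. This is a minimal pure-Python illustration and not ScaLAPACK code: the names `householder`, `tridiagonalize_blocked`, `pending` and the block size `nb` are hypothetical, there is no data distribution or communication, and trailing-matrix entries are evaluated on demand, where a real implementation would use matrix-vector products in the panel and a single Level 3 rank-2k update afterwards.

```python
import math

def householder(x):
    """Return (v, tau, beta) with v[0] == 1 and (I - tau*v*v^T) x = beta*e_1."""
    alpha = x[0]
    sigma = math.sqrt(sum(t * t for t in x[1:]))
    if sigma == 0.0:  # nothing below the first entry to annihilate
        return [1.0] + [0.0] * (len(x) - 1), 0.0, alpha
    beta = -math.copysign(math.hypot(alpha, sigma), alpha)
    v = [1.0] + [t / (alpha - beta) for t in x[1:]]
    return v, (beta - alpha) / beta, beta

def tridiagonalize_blocked(A, nb=2):
    """Reduce symmetric A to tridiagonal form; return (diagonal, off-diagonal)."""
    n = len(A)
    A = [[float(a) for a in row] for row in A]  # work on a copy
    d, e = [0.0] * n, [0.0] * (n - 1)
    for j in range(0, n - 1, nb):
        k = min(nb, n - 1 - j)               # columns in this panel
        V = [[0.0] * k for _ in range(n)]    # Householder vectors
        W = [[0.0] * k for _ in range(n)]    # companion update vectors

        def pending(r, c, m):
            """Entry (r, c) after applying the first m reflectors of this panel."""
            return A[r][c] - sum(V[r][q] * W[c][q] + W[r][q] * V[c][q]
                                 for q in range(m))

        # Panel reduction: one column at a time, A itself is not modified.
        for i in range(k):
            c = j + i
            rows = range(c + 1, n)
            d[c] = pending(c, c, i)
            v, tau, e[c] = householder([pending(r, c, i) for r in rows])
            # p = tau * (current trailing matrix) * v
            p = [tau * sum(pending(r, s, i) * v[s - c - 1] for s in rows)
                 for r in rows]
            half = 0.5 * tau * sum(pr * vr for pr, vr in zip(p, v))
            for r, vr, pr in zip(rows, v, p):
                V[r][i] = vr
                W[r][i] = pr - half * vr     # w = p - (tau/2)(p^T v) v

        # Trailing update: A <- A - V W^T - W V^T on the unreduced block.
        for r in range(j + k, n):
            for s in range(j + k, n):
                A[r][s] = pending(r, s, k)
    d[n - 1] = A[n - 1][n - 1]
    return d, e
```

Because the reduction is an orthogonal similarity transformation, the trace and Frobenius norm of the input are preserved, and different values of `nb` should give the same tridiagonal matrix up to rounding. Calling `tridiagonalize_blocked(S, nb=1)` and `tridiagonalize_blocked(S, nb=3)` on the same symmetric `S` is a quick consistency check.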

## Keywords

Software Library; Startup Cost; Block Column; Algebra Software; Engineering Consideration
