Numerical Algorithms, Volume 10, Issue 2, pp 379–399

The design of a parallel dense linear algebra software library: Reduction to Hessenberg, tridiagonal, and bidiagonal form

  • Jaeyoung Choi
  • Jack J. Dongarra
  • David W. Walker


This paper discusses issues in the design of ScaLAPACK, a software library for performing dense linear algebra computations on distributed memory concurrent computers. These issues are illustrated using the ScaLAPACK routines for reducing matrices to Hessenberg, tridiagonal, and bidiagonal forms. These routines are important in the solution of eigenproblems. The paper focuses on how building blocks are used to create higher-level library routines. Results are presented that demonstrate the scalability of the reduction routines. The most commonly used building blocks in ScaLAPACK are the sequential BLAS, the parallel BLAS (PBLAS), and the Basic Linear Algebra Communication Subprograms (BLACS). Each of the matrix reduction algorithms consists of a series of steps, in each of which one block column (or panel), and/or block row, of the matrix is reduced, followed by an update of the portion of the matrix that has not yet been reduced. This latter phase is performed using Level 3 PBLAS operations and contains the bulk of the computation. However, the panel reduction phase involves a significant amount of communication, and is important in determining the scalability of the algorithm. The simplest way to parallelize the panel reduction phase is to replace the BLAS routines appearing in the LAPACK routine (mostly matrix-vector and matrix-matrix multiplications) with the corresponding PBLAS routines. However, in some cases it is possible to reduce communication startup costs by performing the communication necessary for consecutive BLAS operations in a single communication using a BLACS call. Thus, there is a tradeoff between efficiency and software engineering considerations, such as ease of programming and simplicity of code.
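To make the reduction step concrete, the following is a minimal, illustrative NumPy sketch of unblocked Householder tridiagonalization of a symmetric matrix — the sequential kernel underlying the tridiagonal case discussed above. It is not the ScaLAPACK implementation (which is blocked, distributed, and built on PBLAS/BLACS); the function name and structure here are our own, and the sketch applies each reflector immediately rather than accumulating a panel for a Level 3 update.

```python
import numpy as np

def householder_tridiagonalize(A):
    """Reduce a symmetric matrix to tridiagonal form T = Q^T A Q
    via Householder reflectors (unblocked, illustrative sketch)."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n - 2):
        # Build a reflector that annihilates column k below the subdiagonal.
        x = A[k + 1:, k]
        alpha = -np.copysign(np.linalg.norm(x), x[0])
        v = x.copy()
        v[0] -= alpha
        vnorm = np.linalg.norm(v)
        if vnorm == 0.0:
            continue  # column already in the desired form
        v /= vnorm
        # Two-sided similarity update with H = I - 2 v v^T:
        # apply H from the left to the trailing rows ...
        A[k + 1:, k:] -= 2.0 * np.outer(v, v @ A[k + 1:, k:])
        # ... and from the right to the trailing columns.
        A[:, k + 1:] -= 2.0 * np.outer(A[:, k + 1:] @ v, v)
    return A
```

Because each step is an orthogonal similarity transformation, the eigenvalues of the returned tridiagonal matrix agree with those of the input; a blocked variant would instead collect several reflectors (e.g. in WY form) and apply them together as matrix-matrix products.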







Copyright information

© J.C. Baltzer AG, Science Publishers 1995

Authors and Affiliations

  • Jaeyoung Choi (1)
  • Jack J. Dongarra (1, 2)
  • David W. Walker (2)
  1. Department of Computer Science, University of Tennessee at Knoxville, Knoxville, USA
  2. Mathematical Sciences Section, Oak Ridge National Laboratory, Oak Ridge, USA
