Recursive blocked data formats and BLAS’s for dense linear algebra algorithms

  • Fred Gustavson
  • André Henriksson
  • Isak Jonsson
  • Bo Kågström
  • Per Ling
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1541)


Recursive blocked data formats and recursive blocked BLAS’s are introduced and applied to dense linear algebra algorithms of the kind typified by LAPACK. The new data formats maintain data locality at every level of the memory hierarchy and hence provide high performance on today’s processors with tiered memories. The new data format is hybrid. It contains blocking parameters chosen so that the associated submatrices of a block-partitioned matrix A fit into level 1 cache. The recursive part of the data format chooses a linear order of the blocks that preserves the two-dimensional data locality of A in a one-dimensional tiered memory structure. We argue that, out of the NB! possible orderings of the NB blocks, our recursive ordering leads to one of the best. This is because our algorithms are also recursive and perform their computations on submatrices that follow the new recursive data structure definition, in analogy with the well-known principle that the data structure should be matched to the algorithm. Performance results in support of our recursive approach are also presented.
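As a rough illustration of the hybrid format described in the abstract, the sketch below stores an n × n matrix as b × b blocks (each block column-major, as LAPACK-style kernels expect) and lays the blocks out in Z-order (Morton order), one common recursive ordering. This is an assumption for illustration only: the abstract does not specify the exact recursive ordering used in the paper, and the helper names (`block_index`, `offset`, `to_recursive_blocked`, `quadrant`) are hypothetical. What the sketch shows is the property the abstract relies on: each quadrant produced by a recursive split is a contiguous slice of the buffer with the same layout one level down, so a recursive algorithm working on that quadrant stays inside one locality region.

```python
from __future__ import annotations


def block_index(bi: int, bj: int) -> int:
    """Z-order (Morton) position of block (bi, bj).

    Interleaves the bits of the block-row and block-column indices, with each
    row bit placed just above the matching column bit, so the four quadrants
    are ordered A11, A12, A21, A22 at every level of the recursion.
    """
    idx, bit = 0, 0
    while (bi >> bit) or (bj >> bit):
        idx |= ((bi >> bit) & 1) << (2 * bit + 1)
        idx |= ((bj >> bit) & 1) << (2 * bit)
        bit += 1
    return idx


def offset(i: int, j: int, b: int) -> int:
    """Flat-buffer position of element (i, j): b x b blocks stored in
    Z-order, elements inside each block stored column-major."""
    return block_index(i // b, j // b) * b * b + (j % b) * b + (i % b)


def to_recursive_blocked(A: list[list[float]], b: int) -> list[float]:
    """Copy a square matrix (given as a list of rows) into the hypothetical
    recursive blocked layout.

    Assumes n = b * 2**k, so the block grid is a power of two on each side.
    """
    n = len(A)
    nblk = n // b
    if n % b != 0 or nblk & (nblk - 1) != 0:
        raise ValueError("matrix order must be b * 2**k")
    buf = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            buf[offset(i, j, b)] = A[i][j]
    return buf


def quadrant(buf: list[float], q: int) -> list[float]:
    """Quadrant q (0 = A11, 1 = A12, 2 = A21, 3 = A22) of the stored matrix.

    In Z-order each quadrant is one contiguous quarter of the buffer and has
    the same recursive blocked layout one level down.
    """
    quarter = len(buf) // 4
    return buf[q * quarter:(q + 1) * quarter]


# Usage: an 8 x 8 matrix with 2 x 2 blocks (a 4 x 4 block grid).
n, b = 8, 2
A = [[float(10 * i + j) for j in range(n)] for i in range(n)]
buf = to_recursive_blocked(A, b)
A21 = [row[: n // 2] for row in A[n // 2:]]
```

Here `quadrant(buf, 2)` (the lower-left quadrant A21) is identical to `to_recursive_blocked(A21, b)`, so a recursive routine can hand that slice straight to the next level without copying. In practice b would be chosen so that a block fits in level 1 cache, as the abstract states; the Z-order choice is only one of the NB! possible block orderings.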







Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Fred Gustavson (1)
  • André Henriksson (2)
  • Isak Jonsson (2)
  • Bo Kågström (2)
  • Per Ling (2)
  1. IBM T.J. Watson Research Center, Yorktown Heights, U.S.A.
  2. Department of Computing Science and HPC2N, Umeå University, Umeå, Sweden
