Recursive blocked data formats and BLAS’s for dense linear algebra algorithms
Recursive blocked data formats and recursive blocked BLAS's are introduced and applied to dense linear algebra algorithms typified by LAPACK. The new data formats maintain data locality at every level of the memory hierarchy and hence provide high performance on today's memory-tiered processors. The new data format is hybrid: it contains blocking parameters chosen so that the associated submatrices of a block-partitioned matrix A fit into level 1 cache. The recursive part of the data format chooses a linear order of the blocks that maintains the two-dimensional data locality of A in a one-dimensional tiered memory structure. We argue that, of the NB! (NB factorial) possible orderings of the NB blocks, our recursive ordering is among the best. This is because our algorithms are also recursive and perform their computations on submatrices that follow the recursive data structure definition, in analogy with the well-known principle that the data structure should be matched to the algorithm. Performance results in support of our recursive approach are also presented.
Keywords: Recursive Algorithm · Memory Hierarchy · Factorization Algorithm · Recursive Data Format · Block Column
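To make the recursive ordering idea concrete, the following is a minimal sketch (not the authors' actual code) of one way to linearize a grid of cache-sized blocks recursively: split the grid in half along its longer dimension and recurse on each half, so that blocks that are neighbors in two dimensions stay close together in the resulting one-dimensional order. The function name and the halving rule are illustrative assumptions.

```python
def recursive_block_order(r0, r1, c0, c1):
    """Return block coordinates (i, j) for block rows [r0, r1) and block
    columns [c0, c1) in a recursive order: split the longer dimension in
    half and concatenate the orders of the two halves.

    This is an illustrative sketch of recursive blocking, not the paper's
    exact layout.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows == 0 or cols == 0:
        return []
    if rows == 1 and cols == 1:
        return [(r0, c0)]          # a single level-1-cache-sized block
    if rows >= cols:               # split along block rows
        rm = r0 + rows // 2
        return (recursive_block_order(r0, rm, c0, c1)
                + recursive_block_order(rm, r1, c0, c1))
    cm = c0 + cols // 2            # split along block columns
    return (recursive_block_order(r0, r1, c0, cm)
            + recursive_block_order(r0, r1, cm, c1))

# For a 4-by-4 grid of blocks, the order begins with the top-left
# 2-by-2 quadrant, illustrating the preserved 2-D locality.
order = recursive_block_order(0, 4, 0, 4)
```

Storing the blocks contiguously in this order gives a recursive algorithm, which subdivides the matrix the same way, unit-stride access to each submatrix it touches.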