# An adaptive blocking strategy for matrix factorizations

## Abstract

On most high-performance architectures, data movement is slow compared to floating-point (in particular, vector) performance. On these architectures, block algorithms have been successful for matrix computations: by treating a matrix as a collection of submatrices (the so-called blocks), one naturally arrives at algorithms that require little data movement. The optimal blocking strategy, however, depends on the computing environment and on the problem parameters, and current approaches use fixed-width blocking strategies that are not optimal. This paper presents an “adaptive blocking” methodology for systematically determining an optimal blocking strategy on a uniprocessor machine, and demonstrates the technique on a block QR factorization routine. After generating timing models for the high-level kernels of the algorithm, we formulate the optimal blocking strategy as a recurrence relation that can be solved inexpensively with a dynamic programming technique. Experiments on one processor of a CRAY-2 show that the resulting blocking strategy is as good as any fixed-width blocking strategy. So, whereas the optimum fixed-width blocking strategy can be found only by re-running the same problem several times, adaptive blocking provides optimum performance in the very first run.
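The abstract does not state the recurrence itself, so the following is only a minimal sketch of how a blocking strategy could be chosen by dynamic programming once a timing model is available. It assumes a hypothetical function `cost(j, b)` that predicts the time to process a block of `b` columns starting at column `j`; the recurrence used is `best[j] = min over b of cost(j, b) + best[j + b]`, with `best[n] = 0`. The names `optimal_blocking` and `toy_cost` and the toy cost model are illustrative assumptions, not the paper's timing models.

```python
from typing import Callable, List, Tuple


def optimal_blocking(
    n: int,
    max_width: int,
    cost: Callable[[int, int], float],
) -> Tuple[float, List[int]]:
    """Return (minimal predicted time, block widths) for n columns.

    cost(j, b) is an assumed timing model: the predicted time to process
    a block of b columns that starts at column j.
    """
    best = [0.0] * (n + 1)   # best[j]: minimal predicted time for columns j..n-1
    choice = [0] * (n + 1)   # choice[j]: width of the first block at column j

    # Fill the table from the last column backwards.
    for j in range(n - 1, -1, -1):
        best[j] = float("inf")
        for b in range(1, min(max_width, n - j) + 1):
            t = cost(j, b) + best[j + b]
            if t < best[j]:
                best[j], choice[j] = t, b

    # Recover the sequence of block widths from the recorded choices.
    widths: List[int] = []
    j = 0
    while j < n:
        widths.append(choice[j])
        j += choice[j]
    return best[0], widths


def toy_cost(j: int, b: int) -> float:
    """Hypothetical model: fixed per-block overhead plus work quadratic in b."""
    return 10 + b * b


total, widths = optimal_blocking(6, 6, toy_cost)
print(total, widths)  # prints "38.0 [3, 3]"
```

The table has `n + 1` entries and each entry examines at most `max_width` candidates, so the search costs on the order of `n * max_width` evaluations of the timing model, which is small compared with the factorization itself. With a realistic model, `cost(j, b)` would depend on `j` (the trailing matrix shrinks as the factorization proceeds), so the chosen widths would generally vary from block to block.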

## Keywords

block algorithm, adaptive blocking, performance evaluation, performance portability, QR factorization
