Abstract
We present a new recursive algorithm for the QR factorization of an m by n matrix A. The recursion leads to an automatic variable blocking that allows us to replace the level 2 part of a standard block algorithm with level 3 operations. However, performing the updates incurs additional costs, which prohibit the efficient use of the recursion for large n. This obstacle is overcome by a hybrid recursive algorithm that outperforms the LAPACK algorithm DGEQRF by 78% to 21% as m=n increases from 100 to 1000. We also present a successful parallel implementation, based on dynamic load balancing, on a PowerPC 604 based IBM SMP node. For 2, 3, and 4 processors and m=n=2000, it shows speedups of 1.96, 2.99, and 3.92 compared to our uniprocessor algorithm.
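The way column recursion produces a variable blocking can be illustrated with a minimal Python sketch. This is not the algorithm of the paper: for brevity it uses block Gram-Schmidt orthogonalization, and it leaves out the hybrid scheme and the parallel implementation. The names recursive_qr, matmul, and transpose are hypothetical helpers, and the sketch assumes m >= n and full column rank. The point is only that splitting the columns in half at every level turns the updates into matrix-matrix products, which a tuned implementation would hand to level 3 routines.

```python
import math


def matmul(X, Y):
    """Plain matrix product; stands in for a level 3 (matrix-matrix) kernel."""
    inner = len(Y)
    return [[sum(X[i][k] * Y[k][j] for k in range(inner))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*X)]


def recursive_qr(A):
    """Return (Q, R) with A = Q R.

    Q is m x n with orthonormal columns, R is n x n upper triangular.
    Assumption (not from the paper): m >= n and A has full column rank.
    """
    m, n = len(A), len(A[0])

    # Base case: a single column is simply normalized.
    if n == 1:
        norm = math.sqrt(sum(row[0] ** 2 for row in A))
        return [[row[0] / norm] for row in A], [[norm]]

    # Split the columns in half; the block sizes vary with recursion depth.
    n1 = n // 2
    A1 = [row[:n1] for row in A]
    A2 = [row[n1:] for row in A]

    # Factor the left half recursively.
    Q1, R11 = recursive_qr(A1)

    # Update the right half with two matrix-matrix products.
    R12 = matmul(transpose(Q1), A2)
    Q1R12 = matmul(Q1, R12)
    A2_updated = [[A2[i][j] - Q1R12[i][j] for j in range(n - n1)]
                  for i in range(m)]

    # Factor the updated right half recursively.
    Q2, R22 = recursive_qr(A2_updated)

    # Assemble Q = [Q1 Q2] and R = [[R11, R12], [0, R22]].
    Q = [Q1[i] + Q2[i] for i in range(m)]
    R = ([R11[i] + R12[i] for i in range(n1)]
         + [[0.0] * n1 + R22[i] for i in range(n - n1)])
    return Q, R
```

Calling recursive_qr on a small full-rank matrix, for example a 4 by 3 list of rows, returns factors whose product reproduces the input up to rounding error. Gram-Schmidt loses orthogonality on ill-conditioned inputs, so this sketch is suitable only for illustrating the recursion pattern.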
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Elmroth, E., Gustavson, F. (1998). New serial and parallel recursive QR factorization algorithms for SMP systems. In: Kågström, B., Dongarra, J., Elmroth, E., Waśniewski, J. (eds) Applied Parallel Computing Large Scale Scientific and Industrial Problems. PARA 1998. Lecture Notes in Computer Science, vol 1541. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0095328
DOI: https://doi.org/10.1007/BFb0095328
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65414-8
Online ISBN: 978-3-540-49261-0
eBook Packages: Springer Book Archive