New serial and parallel recursive QR factorization algorithms for SMP systems
We present a new recursive algorithm for the QR factorization of an m by n matrix A. The recursion leads to an automatic variable blocking that allows us to replace a level 2 part in a standard block algorithm by level 3 operations. However, performing the updates incurs additional costs, which prohibit the efficient use of the recursion for large n. This obstacle is overcome by using a hybrid recursive algorithm that outperforms the LAPACK algorithm DGEQRF by 78% to 21% as m = n increases from 100 to 1000. We also present a successful parallel implementation, based on dynamic load balancing, on a PowerPC 604 based IBM SMP node. For 2, 3, and 4 processors and m = n = 2000, it shows speedups of 1.96, 2.99, and 3.92 compared to our uniprocessor algorithm.
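The recursive idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes m ≥ n and a compact WY representation Q = I − Y T Yᵀ, which the abstract does not spell out, and the function names `householder` and `rqr` are hypothetical. The sketch splits the columns in half, factors the left half recursively, updates the right half with matrix-matrix products, factors the updated trailing block, and merges the two results.

```python
import numpy as np

def householder(x):
    """Return (v, tau, alpha) with (I - tau v v^T) x = alpha e_1 and v[0] = 1."""
    normx = np.linalg.norm(x)
    if normx == 0.0:
        v = np.zeros(len(x))
        v[0] = 1.0
        return v, 0.0, 0.0           # tau = 0 gives the identity reflector
    alpha = -np.copysign(normx, x[0])  # opposite sign avoids cancellation
    v = x.astype(float)              # astype returns a copy
    v[0] -= alpha
    v /= v[0]                        # normalise so that v[0] = 1
    tau = 2.0 / (v @ v)
    return v, tau, alpha

def rqr(A):
    """Recursive QR sketch (assumes m >= n): returns Y, T, R with
    Q = I - Y T Y^T and A = Q[:, :n] R."""
    A = np.array(A, dtype=float)     # work on a copy
    m, n = A.shape
    if n == 1:                       # base case: one Householder reflector
        v, tau, alpha = householder(A[:, 0])
        return v.reshape(m, 1), np.array([[tau]]), np.array([[alpha]])
    n1 = n // 2
    Y1, T1, R1 = rqr(A[:, :n1])      # factor the left half
    C = A[:, n1:]
    C = C - Y1 @ (T1.T @ (Y1.T @ C))  # apply Q1^T with level 3 style products
    Y2s, T2, R2 = rqr(C[n1:, :])     # factor the updated trailing block
    Y2 = np.vstack([np.zeros((n1, n - n1)), Y2s])
    T3 = -T1 @ (Y1.T @ Y2) @ T2      # coupling block of the merged T
    zeros = np.zeros((n - n1, n1))
    Y = np.hstack([Y1, Y2])
    T = np.block([[T1, T3], [zeros, T2]])
    R = np.block([[R1, C[:n1, :]], [zeros, R2]])
    return Y, T, R

# Usage: factor a random 8 x 5 matrix and form Q explicitly for checking.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
Y, T, R = rqr(A)
Q = np.eye(8) - Y @ T @ Y.T
```

The extra products needed to form the coupling block `T3` at every level of the recursion are one source of the additional update cost mentioned above. A hybrid scheme would presumably apply a recursion like this only to narrow column panels inside a conventional blocked algorithm.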