On Aggressive Early Deflation in Parallel Variants of the QR Algorithm

  • Bo Kågström
  • Daniel Kressner
  • Meiyue Shao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7133)

Abstract

The QR algorithm computes the Schur form of a matrix and is by far the most popular approach for solving dense nonsymmetric eigenvalue problems. Multishift and aggressive early deflation (AED) techniques have led to significantly more efficient sequential implementations of the QR algorithm during the last decade. More recently, these techniques have been incorporated in a novel parallel QR algorithm on hybrid distributed memory HPC systems. While leading to significant performance improvements, it has turned out that AED may become a computational bottleneck as the number of processors increases. In this paper, we discuss a two-level approach for performing AED in a parallel environment, where the lower level consists of a novel combination of AED with the pipelined QR algorithm implemented in the ScaLAPACK routine PDLAHQR. Numerical experiments demonstrate that this new implementation further improves the performance of the parallel QR algorithm.
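
As a point of reference for the abstract's first sentence, the following minimal sketch shows how repeated QR factorizations drive a small nonsymmetric matrix toward Schur (upper triangular) form. It is the textbook unshifted iteration, not the multishift or AED variants discussed in the paper, and it is unrelated to PDLAHQR. The function name `unshifted_qr_iteration`, the example matrix, and the iteration count are illustrative assumptions.

```python
import numpy as np

def unshifted_qr_iteration(a, iterations=200):
    """Textbook unshifted QR iteration (illustrative only).

    Returns (q, t) such that a is approximately q @ t @ q.T, with t
    close to upper triangular when the eigenvalues are real and have
    distinct magnitudes.
    """
    t = np.array(a, dtype=float)
    q_total = np.eye(t.shape[0])
    for _ in range(iterations):
        q, r = np.linalg.qr(t)
        # r @ q equals q.T @ t @ q, an orthogonal similarity transform
        t = r @ q
        q_total = q_total @ q
    return q_total, t

# Hypothetical test matrix with eigenvalues 5 and 2
A = np.array([[4.0, 1.0], [2.0, 3.0]])
Q, T = unshifted_qr_iteration(A)
```

Here `T` holds the eigenvalues of `A` on its diagonal and `Q` is orthogonal. Practical implementations first reduce the matrix to Hessenberg form and use shifts and deflation to accelerate convergence, which is where techniques such as AED enter.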

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Bo Kågström (1)
  • Daniel Kressner (2)
  • Meiyue Shao (1)

  1. Department of Computing Science and HPC2N, Umeå University, Umeå, Sweden
  2. Seminar for Applied Mathematics, ETH Zürich, Switzerland
