I/O Efficient Algorithms for Block Hessenberg Reduction Using Panel Approach

  • Sraban Kumar Mohanty
  • Gopalan Sajith
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7678)


Reduction to Hessenberg form is a major performance bottleneck in computing the eigenvalues of a nonsymmetric matrix; it takes \(O(N^3)\) flops. All known blocked and unblocked direct Hessenberg reduction algorithms have an I/O complexity of \(O(N^3/B)\), where B is the block size of a disk transfer. To improve performance by incorporating matrix-matrix operations, the Hessenberg reduction is usually computed in two steps: the first reduces the matrix to banded Hessenberg form, and the second further reduces it to Hessenberg form. We propose and analyse algorithms for the first step, i.e., the reduction of a nonsymmetric matrix to banded Hessenberg form of bandwidth t, for varying values of N and M (the size of the internal memory), on the external memory model introduced by Aggarwal and Vitter, and show that the reduction can be performed in \(O(N^3/(\min\{t,\sqrt{M}\}B))\) I/Os.
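To make the two bounds in the abstract concrete, the following minimal Python sketch evaluates them with all constant factors dropped, so the numbers are only order-of-magnitude estimates and not measured I/O counts. The function names are hypothetical and do not come from the paper. The structure check assumes the common convention that a matrix is banded upper Hessenberg with bandwidth t when every entry more than t positions below the diagonal is zero (t = 1 gives ordinary Hessenberg form); the paper's exact convention may differ.

```python
import math


def direct_reduction_ios(n: int, b: int) -> float:
    """Order-of-magnitude I/O count N^3 / B quoted for direct reduction."""
    return n**3 / b


def banded_reduction_ios(n: int, m: int, b: int, t: int) -> float:
    """Order-of-magnitude I/O count N^3 / (min(t, sqrt(M)) * B)."""
    return n**3 / (min(t, math.sqrt(m)) * b)


def is_banded_hessenberg(a: list[list[float]], t: int) -> bool:
    """Assumed convention: a[i][j] must be zero whenever i > j + t."""
    n = len(a)
    return all(
        a[i][j] == 0
        for i in range(n)
        for j in range(n)
        if i > j + t
    )


if __name__ == "__main__":
    n, m, b = 1000, 10_000, 100  # illustrative sizes only
    print("direct:", direct_reduction_ios(n, b))
    for t in (10, 50, 200):
        print("t =", t, "banded:", banded_reduction_ios(n, m, b, t))
```

In this model the saving over the direct bound is the factor \(\min\{t,\sqrt{M}\}\): increasing the bandwidth t helps only until it reaches \(\sqrt{M}\), after which the internal memory size becomes the limiting term.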


Large Matrix Computation; External Memory Algorithms; Out-of-Core Algorithms; Matrix Computations; Hessenberg Reduction; I/O Efficient Eigenvalue Problem




Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sraban Kumar Mohanty (1)
  • Gopalan Sajith (2)
  1. Computer Science & Engineering Discipline, PDPM Indian Institute of Information Technology, Design and Manufacturing Jabalpur, Jabalpur, India
  2. Computer Science & Engineering Department, Indian Institute of Technology Guwahati, Guwahati, India
