Numerical Algorithms, Volume 80, Issue 2, pp 635–660

Look-ahead in the two-sided reduction to compact band forms for symmetric eigenvalue problems and the SVD

  • Rafael Rodríguez-Sánchez
  • Sandra Catalán
  • José R. Herrero
  • Enrique S. Quintana-Ortí
  • Andrés E. Tomás
Original Paper


Abstract

We address the reduction to compact band forms, via unitary similarity transformations, for the solution of symmetric eigenvalue problems and the computation of the singular value decomposition (SVD). In the first case, we revisit the reduction to symmetric band form. In the second case, we propose a similar alternative that transforms the original matrix to (unsymmetric) band form, replacing the conventional reduction method that produces a triangular–band output. In both cases, we describe algorithmic variants of the standard Level 3 Basic Linear Algebra Subroutines (BLAS)-based procedures, enhanced with look-ahead, to overcome the performance bottleneck imposed by the panel factorization. Furthermore, our solutions employ an algorithmic block size that differs from the target bandwidth, and we illustrate the important performance benefits of this decision. Finally, we show that our alternative compact band form for the SVD is key to introducing an effective look-ahead strategy into the corresponding reduction procedure.
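As an illustration of the first case, the following is a minimal NumPy sketch of a two-sided reduction of a symmetric matrix to band form. It is not the authors' implementation: it has no look-ahead, it uses a block size equal to the target bandwidth (the paper decouples the two), and the names `reduce_to_band` and `w` are hypothetical. Each iteration factorizes one panel with a QR factorization and applies the resulting orthogonal factor from both sides to the trailing matrix. This trailing update is the step a look-ahead variant would overlap with the next panel factorization.

```python
import numpy as np

def reduce_to_band(A, w):
    """Reduce symmetric A to band form of bandwidth w (illustrative sketch).

    Uses an orthogonal similarity transformation built from panel QR
    factorizations, so the eigenvalues of the result match those of A.
    """
    B = np.array(A, dtype=float)  # work on a copy
    n = B.shape[0]
    # Process panels of w columns; stop when nothing lies below the band.
    for k in range(0, n - w - 1, w):
        # Panel: the part of columns k..k+w-1 that lies below the diagonal block.
        panel = B[k + w:, k:k + w]
        # Full orthogonal factor Q such that Q.T @ panel is upper triangular.
        Q, _ = np.linalg.qr(panel, mode="complete")
        # Two-sided update: rows from the left, columns from the right.
        B[k + w:, :] = Q.T @ B[k + w:, :]
        B[:, k + w:] = B[:, k + w:] @ Q
    return B

# Small symmetric test matrix (arbitrary data, for demonstration only).
rng = np.random.default_rng(0)
M = rng.standard_normal((12, 12))
A = M + M.T
B = reduce_to_band(A, w=3)
```

In this sketch the whole trailing matrix is updated before the next panel is touched, which is the serialization that the panel factorization bottleneck refers to.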


Keywords

Two-sided reduction to compact band form; Look-ahead; Symmetric eigenvalue problems; Singular value decomposition




Funding information

This research was partially sponsored by projects TIN2014-53495-R and TIN2015-65316-P of the Spanish Ministerio de Economía y Competitividad, project 2014-SGR-1051 from the Generalitat de Catalunya, and the EU H2020 project 732631 OPRECOMP.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Departamento Ingeniería y Ciencia de Computadores, Universidad Jaume I, Castellón, Spain
  2. Departament d’Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain
