Synchronization-Reducing Variants of the Biconjugate Gradient and the Quasi-Minimal Residual Methods

  • Stefan Feuerriegel
  • H. Martin Bücker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8285)


The Biconjugate Gradient (BiCG) and the Quasi-Minimal Residual (QMR) methods are popular iterative methods for solving large, sparse, non-symmetric systems of linear equations. When these methods are implemented on large-scale parallel computers, their scalability is limited by the synchronization incurred by inner product-like operations. We therefore propose two new synchronization-reducing variants of BiCG and QMR to mitigate these negative performance effects. The idea behind these new s-step variants is to group several dot products for joint execution. Although the new algorithms still exhibit numerical instabilities, they are shown to keep the cost of inner product-like operations almost independent of the number of processes, thus improving scalability significantly.
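The grouping idea can be illustrated with a small sketch. It is not the s-step BiCG or QMR algorithm of the paper. It only shows, under simplifying assumptions, how several dot products can share one global reduction instead of triggering one each. The processes are simulated by splitting NumPy arrays into chunks, and the names `ReductionCounter`, `separate_dots`, and `grouped_dots` are hypothetical helpers introduced for this illustration.

```python
import numpy as np


class ReductionCounter:
    """Simulated all-reduce (assumption: stands in for a real global reduction).

    It sums the per-process contributions and counts how many times it is
    called, i.e. how many global synchronization points occur.
    """

    def __init__(self):
        self.calls = 0

    def allreduce_sum(self, local_values):
        # local_values: list with one 1-D array per simulated process
        self.calls += 1
        return np.sum(local_values, axis=0)


def split(v, nprocs):
    """Distribute a vector over nprocs simulated processes."""
    return np.array_split(v, nprocs)


def separate_dots(pairs, nprocs, comm):
    """One global reduction per dot product."""
    results = []
    for x, y in pairs:
        local = [np.array([xs @ ys])
                 for xs, ys in zip(split(x, nprocs), split(y, nprocs))]
        results.append(comm.allreduce_sum(local)[0])
    return np.array(results)


def grouped_dots(pairs, nprocs, comm):
    """All dot products share a single global reduction."""
    chunks = [(split(x, nprocs), split(y, nprocs)) for x, y in pairs]
    # Each process packs its partial sums for every pair into one small array.
    local = [np.array([xs[p] @ ys[p] for xs, ys in chunks])
             for p in range(nprocs)]
    return comm.allreduce_sum(local)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pairs = [(rng.standard_normal(1000), rng.standard_normal(1000))
             for _ in range(4)]
    c_sep, c_grp = ReductionCounter(), ReductionCounter()
    separate_dots(pairs, 8, c_sep)
    grouped_dots(pairs, 8, c_grp)
    print("reductions (separate):", c_sep.calls)
    print("reductions (grouped): ", c_grp.calls)
```

In this sketch the grouped version produces the same values while the number of synchronization points no longer grows with the number of dot products. On a real parallel machine the reduction would be a collective communication call rather than a local sum.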


s-step BiCG, s-step QMR, synchronization-reducing





Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Stefan Feuerriegel (1)
  • H. Martin Bücker (2)
  1. University of Freiburg, Freiburg, Germany
  2. Friedrich Schiller University Jena, Jena, Germany
