Iteration-fusing conjugate gradient for sparse linear systems with MPI + OmpSs


In this paper, we target the parallel solution of sparse linear systems via an iterative Krylov subspace method enhanced with a block-Jacobi preconditioner on a cluster of multicore processors. To tackle large-scale problems, we develop task-parallel implementations of the preconditioned conjugate gradient (PCG) method that improve the interoperability between the message-passing interface (MPI) and the OmpSs programming model. Specifically, we progressively integrate several communication-reduction and iteration-fusing strategies into the initial code, obtaining increasingly efficient versions of the method. For all these implementations, we analyze the communication patterns and perform a comparative analysis of their performance and scalability on a cluster consisting of 32 nodes with 24 cores each. The experimental analysis shows that the techniques described in the paper outperform the classical method by margins between 6 and 48%, depending on the evaluation scenario.
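To make the baseline concrete, the following is a minimal serial sketch of the PCG iteration with a block-Jacobi preconditioner, the method the paper parallelizes with MPI + OmpSs. The matrix, block size, and tolerance are illustrative assumptions, not values from the paper, and the real solver operates on sparse distributed data rather than dense Python lists.

```python
def solve_dense(B, rhs):
    """Tiny Gaussian elimination with partial pivoting for one small diagonal block."""
    m = len(B)
    M = [row[:] + [rhs[i]] for i, row in enumerate(B)]  # augmented system [B | rhs]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for c in range(m - 1, -1, -1):  # back substitution
        x[c] = (M[c][m] - sum(M[c][k] * x[k] for k in range(c + 1, m))) / M[c][c]
    return x

def block_jacobi_apply(A, r, bs):
    """Apply M^{-1} r, where M is the block-diagonal part of A with blocks of size bs."""
    n = len(A)
    z = [0.0] * n
    for start in range(0, n, bs):
        end = min(start + bs, n)
        B = [[A[i][j] for j in range(start, end)] for i in range(start, end)]
        z[start:end] = solve_dense(B, r[start:end])
    return z

def pcg(A, b, bs=2, tol=1e-10, maxit=200):
    """Preconditioned conjugate gradient with a block-Jacobi preconditioner."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                  # residual r = b - A x, with x = 0
    z = block_jacobi_apply(A, r, bs)
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(maxit):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(pi * Api for pi, Api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * Api for ri, Api in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:   # global reduction: a sync point
            break
        z = block_jacobi_apply(A, r, bs)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        beta = rz_new / rz
        rz = rz_new
        p = [zi + beta * pi for zi, pi in zip(z, p)]
    return x
```

In the distributed setting, each dot product in this loop becomes a global reduction (e.g., `MPI_Allreduce`), which is exactly the synchronization that the communication-reduction and iteration-fusing strategies of the paper aim to hide or amortize.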







Acknowledgements

This research was partially supported by the H2020 EU FETHPC Project 671602 “INTERTWinE.” The researchers from Universidad Jaume I were sponsored by Project TIN2017-82972-R of the Spanish Ministerio de Economía y Competitividad. María Barreda was supported by the POSDOC-A/2017/11 project of the Universitat Jaume I.

Author information



Corresponding author

Correspondence to María Barreda.


Rights and permissions

Reprints and Permissions

About this article


Cite this article

Barreda, M., Aliaga, J.I., Beltran, V. et al. Iteration-fusing conjugate gradient for sparse linear systems with MPI + OmpSs. J Supercomput 76, 6669–6689 (2020).



Keywords

  • Sparse linear systems
  • Multicore processors
  • Distributed systems
  • Communication-reduction strategies
  • Iteration fusing