In this paper, we target the parallel solution of sparse linear systems via an iterative Krylov subspace-based method enhanced with a block-Jacobi preconditioner on a cluster of multicore processors. In order to tackle large-scale problems, we develop task-parallel implementations of the preconditioned conjugate gradient method that improve the interoperability between the Message Passing Interface (MPI) and OmpSs programming models. Specifically, we progressively integrate several communication-reduction and iteration-fusing strategies into the initial code, obtaining more efficient versions of the method. For all these implementations, we analyze the communication patterns and perform a comparative analysis of their performance and scalability on a cluster consisting of 32 nodes with 24 cores each. The experimental analysis shows that the techniques described in the paper outperform the classical method by a margin that varies between 6% and 48%, depending on the evaluation.
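For readers unfamiliar with the baseline, the following is a minimal sequential sketch of the classical preconditioned conjugate gradient method with a block-Jacobi preconditioner. It is not the MPI + OmpSs implementation evaluated in the paper: the function names (`matvec`, `dot`, `invert`, `block_jacobi`, `pcg`), the dense list-of-lists storage, the fixed block size, and the absolute residual tolerance are all assumptions made for illustration. In a distributed version, each call to `dot` would typically become a global reduction, which is the kind of synchronization that communication-reduction and iteration-fusing strategies aim to limit.

```python
def matvec(A, x):
    """Dense matrix-vector product (a stand-in for a sparse kernel)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product; in a distributed code this would be a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def invert(B):
    """Invert a small dense block by Gauss-Jordan elimination with partial pivoting."""
    n = len(B)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(B)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def block_jacobi(A, bs):
    """Return a function applying the inverse of the block diagonal of A.

    The block size bs is a hypothetical tuning parameter.
    """
    n = len(A)
    blocks = []
    for s in range(0, n, bs):
        e = min(s + bs, n)
        blocks.append((s, e, invert([row[s:e] for row in A[s:e]])))

    def apply(r):
        z = [0.0] * n
        for s, e, inv in blocks:
            z[s:e] = matvec(inv, r[s:e])
        return z

    return apply


def pcg(A, b, apply_prec, tol=1e-10, max_iter=1000):
    """Classical PCG for a symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:   # convergence test on the residual norm
            return x, k
        q = matvec(A, p)
        alpha = rz / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        z = apply_prec(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    # Small 1D Laplacian test problem whose exact solution is the all-ones vector.
    n = 8
    A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
          for j in range(n)] for i in range(n)]
    b = matvec(A, [1.0] * n)
    x, iters = pcg(A, b, block_jacobi(A, 2))
    print(iters, x)
```

Each iteration of this sketch performs one matrix-vector product, one preconditioner application, and three inner products (including the one in the convergence test). The convergence test inside the loop is evaluated at every iteration here; checking it less often is one way, under the assumptions above, to picture how consecutive iterations could be allowed to overlap.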
This research was partially supported by the H2020 EU FETHPC Project 671602 “INTERTWinE.” The researchers from Universidad Jaume I were sponsored by Project TIN2017-82972-R of the Spanish Ministerio de Economía y Competitividad. Maria Barreda was supported by the POSDOC-A/2017/11 project from the Universitat Jaume I.
Barreda, M., Aliaga, J.I., Beltran, V. et al. Iteration-fusing conjugate gradient for sparse linear systems with MPI + OmpSs. J Supercomput 76, 6669–6689 (2020). https://doi.org/10.1007/s11227-019-03100-4
- Sparse linear systems
- Multicore processors
- Distributed systems
- Communication-reduction strategies
- Iteration fusing