Abstract
This article discusses the parallel solution of systems of linear algebraic equations on a heterogeneous platform that combines a central processing unit (CPU) with graphics accelerators (GPUs). When the CPU and GPU are used together, the performance of parallel algorithms based on the classical conjugate gradient schemes is significantly limited by synchronization points. The article investigates a pipelined version of the conjugate gradient method with a single synchronization point, the possibility of asynchronous computation, and load balancing between the CPU and GPU when solving large linear systems. Numerical experiments were carried out on test matrices and on computational nodes of differing performance within a heterogeneous platform, which made it possible to estimate the contribution of communication costs. The algorithms are implemented using a combination of MPI, OpenMP, and CUDA. In addition to reducing execution time, the proposed algorithms make it possible to solve large linear systems that exceed the memory of a single GPU or a single computing node. Moreover, the pipelined block algorithm decreases the total execution time by reducing the number of synchronization points and aggregating several messages into one.
Supported by the Russian Foundation for Basic Research (RFBR), research project 17-01-00402. The work was carried out with the financial support of Udmurt State University under the “Scientific Potential” grant competition, project No. 2020-04-03.
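To make the idea of a conjugate gradient iteration with a single synchronization point concrete, the following is a minimal serial sketch. It is not the authors' block CPU/GPU implementation. It assumes the widely published unpreconditioned pipelined recurrences of Ghysels and Vanroose, uses dense Python lists instead of distributed sparse storage, and the names `matvec`, `fused_dots`, and `pipelined_cg` are hypothetical. In an MPI code, `fused_dots` would correspond to one non-blocking all-reduce carrying both scalars, and the product `A w` would be computed while that reduction is in flight.

```python
import math


def matvec(A, v):
    """Dense matrix-vector product; stands in for a distributed sparse SpMV."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def fused_dots(r, w):
    """Both inner products of one iteration, computed together.

    Assumption: in a distributed code these two scalars would travel in a
    single all-reduce, which is the only synchronization point per iteration.
    """
    gamma = sum(ri * ri for ri in r)               # (r, r)
    delta = sum(wi * ri for wi, ri in zip(w, r))   # (A r, r)
    return gamma, delta


def pipelined_cg(A, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned pipelined CG for a symmetric positive definite A."""
    n = len(b)
    x = [0.0] * n
    r = list(b)            # r = b - A x with x = 0
    w = matvec(A, r)       # w = A r
    p = [0.0] * n          # search direction
    s = [0.0] * n          # s = A p
    z = [0.0] * n          # z = A s
    gamma_old = alpha_old = None
    for it in range(max_iter):
        gamma, delta = fused_dots(r, w)   # single reduction
        q = matvec(A, w)                  # overlappable with the reduction
        if math.sqrt(gamma) <= tol:
            return x, it
        if it == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)
        # Recurrences replace the extra products A p and A s.
        z = [qi + beta * zi for qi, zi in zip(q, z)]
        s = [wi + beta * si for wi, si in zip(w, s)]
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, s)]
        w = [wi - alpha * zi for wi, zi in zip(w, z)]
        gamma_old, alpha_old = gamma, alpha
    return x, max_iter


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    x, iters = pipelined_cg(A, b)
    print(x, iters)
```

The trade-off in this formulation is extra vector storage and updates (`s`, `z`, `w`) in exchange for fewer global reductions. The recurrences can also accumulate more rounding error than classical CG, so a production code would typically monitor the true residual.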
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Nedozhogin, N.S., Kopysov, S.P., Novikov, A.K. (2020). Resource-Efficient Parallel CG Algorithms for Linear Systems Solving on Heterogeneous Platforms. In: Voevodin, V., Sobolev, S. (eds) Supercomputing. RuSCDays 2020. Communications in Computer and Information Science, vol 1331. Springer, Cham. https://doi.org/10.1007/978-3-030-64616-5_11
DOI: https://doi.org/10.1007/978-3-030-64616-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64615-8
Online ISBN: 978-3-030-64616-5
eBook Packages: Computer Science, Computer Science (R0)