Scalability Pipelined Algorithm of the Conjugate Gradient Method on Heterogeneous Platforms

  • Conference paper
Mesh Methods for Boundary-Value Problems and Applications

Abstract

This paper presents a parallel iterative solver for large sparse linear systems, implemented on a heterogeneous platform. Traditionally, such problems do not scale well on multi-CPU/multi-GPU clusters. We consider the standard preconditioned Conjugate Gradient (PCG) algorithm and, as an alternative, its pipelined variant, a formulation that is potentially better suited to hybrid CPU/GPU computing because it requires only one synchronization point per iteration instead of the two in standard CG. On a heterogeneous cluster, each PCG iteration needs vector entries produced both by the local GPU and by the other GPUs, so communication between GPUs becomes a major performance bottleneck. We study the implementation of pipelined PCG on a multi-CPU/multi-GPU platform and present an approach that reduces communication between the compute nodes of the cluster. In addition, computation and communication are overlapped to reduce the impact of data exchange. To achieve scalability when solving large linear systems in parallel, we combine the pipelined version of the conjugate gradient method, which has a single synchronization point, with asynchronous computation and load balancing between the CPU and GPU. The algorithm is implemented using a combination of MPI, OpenMP, and CUDA. We show that nearly optimal speedup can be reached on an 8-CPU/2-GPU configuration (relative to execution on one GPU). The parallel solver achieves a speedup of up to 5.49 times on 16 NVIDIA Tesla GPUs compared with a single GPU.
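The single synchronization point mentioned in the abstract can be illustrated with a minimal serial sketch. It assumes the unpreconditioned pipelined CG recurrences in the Ghysels-Vanroose form, not the authors' preconditioned MPI/OpenMP/CUDA implementation, and the names `matvec`, `fused_dots`, and `pipelined_cg`, as well as the 1D Laplacian test matrix, are hypothetical choices made for illustration. Both inner products of an iteration are computed together (one global reduction on a cluster), and the matrix-vector product `q = A w` does not depend on their result, so it could run while the reduction is in flight.

```python
# Illustrative sketch only: unpreconditioned pipelined CG run serially.
# This is NOT the paper's MPI/OpenMP/CUDA code; all names are hypothetical.

def matvec(v):
    """Return A v for the SPD matrix tridiag(-1, 2, -1) (assumed test matrix)."""
    n = len(v)
    y = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * v[i] - left - right)
    return y


def fused_dots(r, w):
    """Both inner products of one iteration, computed together.

    On a cluster these two scalars would travel in a single all-reduce,
    which is the one synchronization point per iteration.
    """
    gamma = sum(a * a for a in r)             # (r, r)
    delta = sum(a * c for a, c in zip(w, r))  # (w, r)
    return gamma, delta


def pipelined_cg(b, tol=1e-8, max_iter=500):
    """Solve A x = b; return (x, iterations, number of reduction phases)."""
    n = len(b)
    b_norm2 = sum(v * v for v in b)
    x = [0.0] * n
    r = list(b)        # r0 = b - A x0 with x0 = 0
    w = matvec(r)      # w0 = A r0
    p = s = z = [0.0] * n
    gamma_old = alpha_old = 1.0  # placeholders, unused in the first iteration
    iters = 0
    reductions = 0
    for _ in range(max_iter):
        gamma, delta = fused_dots(r, w)  # single reduction phase
        reductions += 1
        q = matvec(w)  # independent of gamma and delta, so it can overlap
        if gamma <= tol * tol * b_norm2:
            break
        if iters == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)
        z = [qi + beta * zi for qi, zi in zip(q, z)]  # z = A s
        s = [wi + beta * si for wi, si in zip(w, s)]  # s = A p
        p = [ri + beta * pi for ri, pi in zip(r, p)]  # search direction
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, s)]
        w = [wi - alpha * zi for wi, zi in zip(w, z)]  # w = A r by recurrence
        gamma_old, alpha_old = gamma, alpha
        iters += 1
    return x, iters, reductions


if __name__ == "__main__":
    solution, iterations, phases = pipelined_cg([1.0] * 20)
    print(iterations, phases)
```

Standard CG, by contrast, computes its two inner products at different points of the iteration, each depending on a vector update that follows the previous one, which is why it needs two separate reductions. The price of the pipelined form is three extra vector recurrences (`z`, `s`, `w`) and somewhat weaker rounding-error behaviour, so this sketch should be read as a statement of the idea rather than a tuned solver.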

Acknowledgements

This work was carried out with the financial support of Udmurt State University as part of the “Scientific Potential” grant competition, project No. 2020-04-03.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Nedozhogin, N.S., Kopysov, S.P., Novikov, A.K. (2022). Scalability Pipelined Algorithm of the Conjugate Gradient Method on Heterogeneous Platforms. In: Badriev, I.B., Banderov, V., Lapin, S.A. (eds) Mesh Methods for Boundary-Value Problems and Applications. Lecture Notes in Computational Science and Engineering, vol 141. Springer, Cham. https://doi.org/10.1007/978-3-030-87809-2_27
