Skip to main content
Log in

Parallel sparse linear solver with GMRES method using minimization techniques of communications for GPU clusters

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

In this paper, we aim at exploiting the power computing of a graphics processing unit (GPU) cluster for solving large sparse linear systems. We implement the parallel algorithm of the generalized minimal residual iterative method using the Compute Unified Device Architecture programming language and the MPI parallel environment. The experiments show that a GPU cluster is more efficient than a CPU cluster. In order to optimize the performances, we use a compressed storage format for the sparse vectors and the hypergraph partitioning. These solutions improve the spatial and temporal localization of the shared data between the computing nodes of the GPU cluster.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

References

  1. Ament M, Knittel G, Weiskopf D, Strasser W (2010) A parallel preconditioned conjugate gradient solver for the poisson problem on a multi-GPU platform. In: Proceedings of the 2010 18th Euromicro conference on parallel, distributed and network-based processing, IEEE Computer Society, pp 583–592

  2. Arnoldi W (1951) The principle of minimized iteration in the solution of the matrix eigenvalue problem. Quart Appl Math 9:17–29

    MATH  MathSciNet  Google Scholar 

  3. Bahi J, Contassot-Vivier S, Couturier R (2008) Parallel iterative algorithms: from sequential to grid computing. In: Numerical analysis and scientific computing. Chapman & Hall/CRC

  4. Bahi J, Couturier R, Ziane Khodja L (2011) Parallel GMRES implementation for solving sparse linear systems on GPU clusters. In: Proceedings of the 19th high performance computing symposia, HPC ’11, SCS, International, pp 12–19

  5. Bahi J, Couturier R, Ziane Khodja L (2012) Parallel sparse linear solver gmres for gpu clusters with compression of exchanged data. In: Euro-Par 2011: parallel processing workshops, volume 7155 of LNCS, Springer, pp 471–480

  6. Bell N, Garland M (2009) Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: SC’09, Portland, Oregon, ACM, pp 1–11

  7. Bolz J, Farmer I, Grinspun E, Schröder P (2003) Sparse matrix solvers on the GPU: conjugate gradients and multigrid. ACM Trans Graph 22(3):917–924

    Article  Google Scholar 

  8. Çatalyürek Ü, Aykanat C (1999) Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication. IEEE Trans Parallel Distrib Syst 10(7):673–693

    Google Scholar 

  9. Çatalyürek Ü, Aykanat C (1999) PaToH: partitioning tool for hypergraphs. http://bmi.osu.edu/~umit/PaToH/manual.pdf. Accessed 28 Feb 2014

  10. Cevahir A, Nukada A, Matsuoka S (2009) Fast conjugate gradients with multiple GPUs. In: Computational science ICCS 2009, volume 5544 of LNCS, Springer, pp 893–903

  11. Cevahir A, Nukada A, Matsuoka S (2010) High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning. Comput Sci Res Dev 25:83–91

    Article  Google Scholar 

  12. Chen C, Taha T (2013) A communication reduction approach to iteratively solve large sparse linear systems on a GPGPU cluster. Cluster Comput 1–11

  13. Contassot-Vivier S, Jost T, Vialle S (2012) Impact of asynchronism on GPU accelerated parallel iterative computations. In: Applied parallel and scientific computing, volume 7133 of LNCS, Springer, pp 43–53

  14. Couturier R, Domas S (2012) Sparse systems solving on GPUs with GMRES. J Supercomput 59(3):1504–1516

    Article  Google Scholar 

  15. CUSP Library. http://cusplibrary.github.io/. Accessed 28 Feb 2014

  16. Davis T, Hu Y (1997) The University of Florida sparse matrix collection, Digest. http://www.cise.ufl.edu/research/sparse/matrices/. Accessed 28 Feb 2014

  17. Devine K, Boman E, Heaphy R, Bisseling R, Çatalyürek Ü (2006) Parallel hypergraph partitioning for scientific computing. In: Proceedings of the 20th international conference on parallel and distributed processing, IPDPS’06, IEEE Computer Society, pp 124–124

  18. DeVries B, Iannelli J, Trefftz C, O’Hearn K, Wolffe G (2013) Parallel implementations of FGMRES for solving large, sparse non-symmetric linear systems. Proc Comput Sci 18:491–500

    Article  Google Scholar 

  19. Gaikwad A, Toke I (2010) Parallel iterative linear solvers on GPU: a financial engineering case. In: Proceedings of the 2010 18th Euromicro conference on parallel, distributed and network-based processing, IEEE Computer Society, pp 607–614

  20. Ghaemian N, Abdollahzadeh A, Heinemann Z, Harrer A, Sharifi M, Heinemann G (2008) Accelerating the GMRES iterative linear solver of an oil reservoir simulator using the multi-processing power of compute unified device architecture of graphics cards. In: PARA 2008

  21. Göddeke D, Strzodka R, Mohd-Yusof J, McCormick P, Buijssen S, Grajewski M, Turek S (2007) Exploring weak scalability for FEM calculations on a GPU-enhanced cluster. Parallel Comput Spec Issue High-perform Comput Accel 33(10–11):685–699

    Google Scholar 

  22. Haase G, Liebmann M, Douglas C, Plank G (2010) A parallel algebraic multigrid solver on graphics processing units. In: High performance computing and applications, volume 5938 of LNCS, Springer, pp 38–47

  23. Jost T, Contassot-Vivier S, Vialle S (2009) An efficient multi-algorithms sparse linear solver for GPUs. In International conference on parallel computing, ParCo2009

  24. Karypis G, Kumar V (1998) hMETIS: a hypergraph partitioning package. http://glaros.dtc.umn.edu/gkhome/fetch/sw/hmetis/manual.pdf. Accessed 28 Feb 2014

  25. Li R, Saad Y (2013) GPU-accelerated preconditioned iterative linear solvers. J Supercomput 63(2):443–466

    Article  Google Scholar 

  26. Neic A, Liebmann M, Haase G, Plank G (2012) Algebraic multigrid solver on clusters of CPUs and GPUs. In: Applied parallel and scientific computing, volume 7134 of LNCS, Springer, pp 389–398

  27. NVIDIA Corporation (2012) CUDA Toolkit 4.2 CUBLAS Library.

  28. NVIDIA Corporation (2012) NVIDIA CUDA C Programming Guide.

  29. Paige C, Saunders M (1975) Solution of sparse indefinite systems of linear equations. SIAM J Numer Anal 12(4):617–629

    Article  MATH  MathSciNet  Google Scholar 

  30. PHG—parallel hypergraph and graph partitioning with Zoltan. http://www.cs.sandia.gov/Zoltan/ug_html/ug_alg_phg.html. Accessed 28 Feb 2014

  31. Saad Y, Schultz M (1986) GMRES : a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J Sci Stat Comput 7(3):856–869

    Article  MATH  MathSciNet  Google Scholar 

  32. Wang M, Klie H, Parashar M, Sudan H (2009) Solving sparse linear systems on NVIDIA Tesla GPUs. In: Computational science ICCS 2009, volume 5544 of LNCS, Springer, pp 864–873

  33. Weber D, Bender J, Schnoes M, Stork A, Fellner D (2013) Efficient GPU data structures and methods to solve sparse linear systems in dynamics applications. Comput Graph Forum 32:16–26

    Article  Google Scholar 

  34. Zhao N, Wang X (2012) A parallel preconditioned Bi-Conjugate Gradient stabilized solver for the Poisson problem. J Comput 7(12): 3088–3095

    Google Scholar 

  35. Zoltan: parallel partitioning, load balancing and data-management services. User’s guide. http://www.cs.sandia.gov/Zoltan/ug_html/ug.html. Accessed 28 Feb 2014

Download references

Acknowledgments

This paper is based upon work supported by the Région de Franche-Comté and partially funded by the Labex ACTION program (contract ANR-11-LABX-01-01).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Raphaël Couturier.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ziane Khodja, L., Couturier, R., Giersch, A. et al. Parallel sparse linear solver with GMRES method using minimization techniques of communications for GPU clusters. J Supercomput 69, 200–224 (2014). https://doi.org/10.1007/s11227-014-1143-8

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-014-1143-8

Keywords

Navigation