A communication reduction approach to iteratively solve large sparse linear systems on a GPGPU cluster

Cluster Computing

Abstract

Finite element methods (FEM) are widely used in academia and industry, particularly in mechanical, civil, aerospace, and electrical engineering. These methods typically convert partial differential equations into large sparse linear systems, and for complex problems solving these systems is time-consuming. This paper presents a parallelized iterative solver for large sparse linear systems implemented on a GPGPU cluster. Such solvers traditionally do not scale well on GPGPU clusters, so the paper presents an approach that reduces communication between the cluster's compute nodes. In addition, computation and communication are overlapped to reduce the impact of data exchange. The parallelized system achieved a speedup of up to 15.3 times on 16 NVIDIA Tesla GPUs compared with a single GPU. The paper also evaluates the algorithm analytically, and the analytical equations for predicting performance are presented and validated.
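
The overlap of computation and communication mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the algorithm, so the sketch assumes a common pattern in which each node owns a block of matrix rows, splits every row into entries that touch locally owned vector values and entries that need values from other nodes, and computes the local part while the remote values are still in transit. All names here (`spmv_rows`, `split_local_remote`, `overlapped_spmv`, `fetch_remote`) are hypothetical, and a background thread stands in for the real inter-node transfer.

```python
from concurrent.futures import ThreadPoolExecutor

def spmv_rows(rows, x):
    """Sparse matrix-vector product; each row is a {column: value} dict."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]

def split_local_remote(rows, lo, hi):
    """Split owned rows into entries in owned columns [lo, hi) and all others."""
    local, remote = [], []
    for row in rows:
        local.append({j: v for j, v in row.items() if lo <= j < hi})
        remote.append({j: v for j, v in row.items() if not lo <= j < hi})
    return local, remote

def overlapped_spmv(rows, x_owned, lo, hi, fetch_remote):
    """Compute the owned rows of A @ x, overlapping local work with the fetch.

    fetch_remote is a hypothetical callable that returns {column: value} for
    the vector entries owned by other nodes (an assumption of this sketch).
    """
    local, remote = split_local_remote(rows, lo, hi)
    x_local = {lo + k: v for k, v in enumerate(x_owned)}
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_remote)   # start the simulated exchange
        y_local = spmv_rows(local, x_local)   # local work while it is in flight
        x_remote = pending.result()           # block only when remote data is needed
    y_remote = spmv_rows(remote, x_remote)
    return [a + b for a, b in zip(y_local, y_remote)]

# Rows 0-1 of a 4x4 tridiagonal matrix, owned by one simulated node.
owned_rows = [{0: 2, 1: -1}, {0: -1, 1: 2, 2: -1}]
x_full = [1, 0, 2, 5]

y = overlapped_spmv(owned_rows, x_full[0:2], 0, 2, lambda: {2: x_full[2]})
reference = spmv_rows(owned_rows, x_full)
```

In this toy case the node needs only one remote value (column 2), so `y` matches `reference`, which is computed from the full vector. On a real cluster the fetch would be a non-blocking message or an asynchronous device copy rather than a thread, but the ordering of the steps would be the same.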

Author information

Corresponding author

Correspondence to Tarek M. Taha.

About this article

Cite this article

Chen, C., Taha, T.M. A communication reduction approach to iteratively solve large sparse linear systems on a GPGPU cluster. Cluster Comput 17, 327–337 (2014). https://doi.org/10.1007/s10586-013-0279-2

  • DOI: https://doi.org/10.1007/s10586-013-0279-2
