The Journal of Supercomputing, Volume 73, Issue 8, pp 3411–3432

Efficient implementation of Jacobi iterative method for large sparse linear systems on graphic processing units

  • Abal-Kassim Cheik Ahamed
  • Frédéric Magoulès

Abstract

In this paper, an original Jacobi implementation is considered for the solution of sparse linear systems of equations. The proposed algorithm helps to optimize the parallel implementation on GPU. The performance of the GPU-based (CUDA) implementation of this algorithm is compared to that of the corresponding serial CPU-based implementation. Numerical experiments performed on a set of matrices arising from the finite element discretization of various equations (3D Laplace equation, 3D gravitational potential equation, 3D heat equation) with different meshes illustrate the performance, robustness and efficiency of our algorithm, with a speedup of up to 23\(\times\) in double-precision arithmetic.
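
For orientation, the textbook Jacobi iteration on a matrix stored in CSR format can be sketched as follows. This is a minimal serial illustration in plain Python, not the authors' implementation: the function name `jacobi_csr`, its parameters, the stopping test on the largest component change, and the small 3×3 test system are all assumptions chosen for the example. Each row update depends only on the previous iterate, which is what makes the method a natural candidate for a one-row-per-thread mapping on a GPU.

```python
def jacobi_csr(row_ptr, col_idx, values, b, tol=1e-10, max_iter=10_000):
    """Textbook Jacobi iteration for A x = b, with A given in CSR format.

    row_ptr, col_idx, values are the usual CSR arrays (hypothetical names).
    Returns the approximate solution and the number of sweeps performed.
    """
    n = len(b)
    x = [0.0] * n  # initial guess: zero vector (an assumption of this sketch)
    for sweep in range(1, max_iter + 1):
        x_new = [0.0] * n
        # Rows are independent of each other within one sweep.
        for i in range(n):
            diag = 0.0
            off_diag_sum = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = values[k]
                else:
                    off_diag_sum += values[k] * x[j]
            if diag == 0.0:
                raise ZeroDivisionError(f"zero or missing diagonal in row {i}")
            x_new[i] = (b[i] - off_diag_sum) / diag
        # Stop when the largest component change falls below tol.
        change = max(abs(x_new[i] - x[i]) for i in range(n))
        x = x_new
        if change < tol:
            return x, sweep
    return x, max_iter


# Small strictly diagonally dominant test system (made up for illustration):
# A = [[ 4, -1,  0],
#      [-1,  4, -1],
#      [ 0, -1,  4]],  b = [2, 4, 10],  exact solution x = [1, 2, 3]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
b = [2.0, 4.0, 10.0]

x, sweeps = jacobi_csr(row_ptr, col_idx, values, b)
```

Strict diagonal dominance of the test matrix guarantees convergence here; for general matrices the Jacobi iteration converges only when the spectral radius of the iteration matrix is below one.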

Keywords

Jacobi method · GPU · Sparse matrices · CSR format · Finite element method

Mathematics Subject Classification

65F10 65F50 65K05 68W10 74S05 

Notes

Acknowledgments

The authors acknowledge partial financial support from the OpenGPU project (Pôle de Compétitivité Systematic, France), and from the CRESTA (Collaborative Research into Exascale Systemware, Tools and Applications) project (European Commission). The authors also acknowledge the CUDA Research Center at CentraleSupélec (formerly, École Centrale Paris), Université Paris Saclay, France, for its support and for providing the computing facilities.

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. CentraleSupélec, Université Paris-Saclay, Châtenay-Malabry Cedex, France