Preliminary Implementation of PETSc Using GPUs

Chapter

Abstract

PETSc is a scalable solver library for algebraic equations arising from the discretization of partial differential equations and related problems. PETSc is organized as a class library with classes for vectors, matrices, Krylov methods, preconditioners, nonlinear solvers, and differential equation integrators. A new subclass of the vector class has been introduced that performs its operations on NVIDIA GPUs. In addition, a new sparse matrix subclass has been introduced that performs matrix-vector products on the GPU. The Krylov methods, nonlinear solvers, and integrators in PETSc run unchanged in parallel using these new subclasses, which can be used transparently from existing PETSc application codes in C, C++, Fortran, or Python. The implementation uses the Thrust and Cusp C++ libraries from NVIDIA.
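The central design point above is that solvers are written against an abstract vector class, so a new subclass can change where and how operations run without touching solver code. The following is a minimal, CPU-only sketch of that idea under stated assumptions; it is not PETSc's API. The names `Vector`, `ListVec`, `ArrayVec`, `CSRMatrix`, and `cg` are hypothetical, `ArrayVec` merely stands in for a new backend such as a GPU subclass, and compressed sparse row (CSR) storage for the matrix is an assumption of the sketch. The conjugate gradient routine calls only interface methods, so it runs unchanged with either vector subclass.

```python
import math
from abc import ABC, abstractmethod
from array import array


class Vector(ABC):
    """Hypothetical abstract vector class: solvers call only these methods."""
    @abstractmethod
    def __getitem__(self, i): ...
    @abstractmethod
    def __setitem__(self, i, value): ...
    @abstractmethod
    def dot(self, other): ...
    @abstractmethod
    def axpy(self, alpha, x): ...  # self <- self + alpha * x
    @abstractmethod
    def aypx(self, beta, x): ...  # self <- x + beta * self
    @abstractmethod
    def duplicate(self): ...  # new vector of the same subclass and contents


class ListVec(Vector):
    """Host subclass: a Python list and plain loops."""

    def __init__(self, values):
        self.data = [float(v) for v in values]

    def __getitem__(self, i):
        return self.data[i]

    def __setitem__(self, i, value):
        self.data[i] = value

    def dot(self, other):
        return sum(a * b for a, b in zip(self.data, other.data))

    def axpy(self, alpha, x):
        for i, xi in enumerate(x.data):
            self.data[i] += alpha * xi

    def aypx(self, beta, x):
        for i, xi in enumerate(x.data):
            self.data[i] = xi + beta * self.data[i]

    def duplicate(self):
        return type(self)(self.data)


class ArrayVec(ListVec):
    """Stand-in for a new backend: different storage and a different dot
    product; the remaining operations are inherited from ListVec."""

    def __init__(self, values):
        self.data = array("d", values)

    def dot(self, other):
        return math.fsum(a * b for a, b in zip(self.data, other.data))


class CSRMatrix:
    """Sparse matrix in compressed sparse row form; mult computes y <- A x."""

    def __init__(self, rowptr, colidx, vals):
        self.rowptr, self.colidx, self.vals = rowptr, colidx, vals

    def mult(self, x, y):
        for i in range(len(self.rowptr) - 1):
            s = 0.0
            for k in range(self.rowptr[i], self.rowptr[i + 1]):
                s += self.vals[k] * x[self.colidx[k]]
            y[i] = s


def cg(A, b, x, tol=1e-12, maxit=100):
    """Unpreconditioned conjugate gradients for symmetric positive definite A.
    Uses vectors only through the Vector interface; returns the iteration
    count at convergence, or -1 if maxit is reached first."""
    Ap = b.duplicate()
    A.mult(x, Ap)
    r = b.duplicate()
    r.axpy(-1.0, Ap)  # r = b - A x
    p = r.duplicate()
    rr = r.dot(r)
    for it in range(maxit):
        if math.sqrt(rr) < tol:
            return it
        A.mult(p, Ap)
        alpha = rr / p.dot(Ap)
        x.axpy(alpha, p)
        r.axpy(-alpha, Ap)
        rr_new = r.dot(r)
        p.aypx(rr_new / rr, r)  # p = r + beta * p
        rr = rr_new
    return -1


if __name__ == "__main__":
    # A = [[4, 1], [1, 3]], b = [1, 2]; exact solution x = (1/11, 7/11).
    A = CSRMatrix([0, 2, 4], [0, 1, 0, 1], [4.0, 1.0, 1.0, 3.0])
    for cls in (ListVec, ArrayVec):
        x = cls([0.0, 0.0])
        its = cg(A, cls([1.0, 2.0]), x)
        print(cls.__name__, its, x[0], x[1])
```

Because `ArrayVec` overrides only storage and `dot`, it shows how a subclass can replace part of an implementation while inheriting the rest; the solver never needs to know which subclass it was handed.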


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Victor Minden (1)
  • Barry Smith (2)
  • Matthew G. Knepley (3)
  1. School of Engineering, Tufts University, Medford, USA
  2. Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA
  3. Computation Institute, University of Chicago, Chicago, USA
