Abstract
PETSc is a scalable solver library for algebraic equations arising from the discretization of partial differential equations and related problems. PETSc is organized as a class library with classes for vectors, matrices, Krylov methods, preconditioners, nonlinear solvers, and differential equation integrators. A new subclass of the vector class has been introduced that performs its operations on NVIDIA GPUs. In addition, a new sparse matrix subclass has been introduced that performs matrix-vector products on the GPU. The Krylov methods, nonlinear solvers, and integrators in PETSc run unchanged in parallel using these new subclasses, which can be used transparently from existing PETSc application codes in C, C++, Fortran, or Python. The implementation uses the Thrust and Cusp C++ packages from NVIDIA.
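The class-library idea in the abstract can be illustrated with a small sketch: a solver written only against a vector interface keeps working when a different vector subclass is supplied. The code below is plain Python and does not use PETSc or a GPU. All names in it (`Vec`, `HostVec`, `MockDeviceVec`, `CsrMat`, `cg`, `laplacian_1d`) are hypothetical and chosen for illustration, and `MockDeviceVec` only counts the operations that a real GPU subclass would run as device kernels.

```python
from abc import ABC, abstractmethod


class Vec(ABC):
    """Vector interface; the solver below calls only these methods."""

    @abstractmethod
    def duplicate(self): ...          # new zero vector of the same type and size

    @abstractmethod
    def dot(self, other): ...         # inner product

    @abstractmethod
    def axpy(self, alpha, x): ...     # self = self + alpha * x

    @abstractmethod
    def aypx(self, alpha, x): ...     # self = x + alpha * self


class HostVec(Vec):
    """Reference implementation storing values in a Python list."""

    def __init__(self, values):
        self.data = [float(v) for v in values]

    def duplicate(self):
        # type(self) keeps the subclass, so work vectors match the input type.
        return type(self)([0.0] * len(self.data))

    def dot(self, other):
        return sum(a * b for a, b in zip(self.data, other.data))

    def axpy(self, alpha, x):
        self.data = [yi + alpha * xi for yi, xi in zip(self.data, x.data)]

    def aypx(self, alpha, x):
        self.data = [xi + alpha * yi for yi, xi in zip(self.data, x.data)]

    def to_list(self):
        return list(self.data)


class MockDeviceVec(HostVec):
    """Stand-in for a GPU vector subclass (assumption: no real device here).

    Each overridden operation bumps a counter where a real subclass
    would launch a device kernel, then reuses the host arithmetic.
    """

    launches = 0

    def dot(self, other):
        MockDeviceVec.launches += 1
        return super().dot(other)

    def axpy(self, alpha, x):
        MockDeviceVec.launches += 1
        super().axpy(alpha, x)

    def aypx(self, alpha, x):
        MockDeviceVec.launches += 1
        super().aypx(alpha, x)


class CsrMat:
    """Sparse matrix in compressed sparse row (CSR) form."""

    def __init__(self, indptr, indices, values):
        self.indptr, self.indices, self.values = indptr, indices, values

    def mult(self, x, y):
        """Compute y = A x."""
        xs = x.data
        y.data = [
            sum(self.values[k] * xs[self.indices[k]]
                for k in range(self.indptr[i], self.indptr[i + 1]))
            for i in range(len(self.indptr) - 1)
        ]


def laplacian_1d(n):
    """Tridiagonal (-1, 2, -1) matrix, symmetric positive definite."""
    indptr, indices, values = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                values.append(v)
        indptr.append(len(indices))
    return CsrMat(indptr, indices, values)


def cg(A, b, tol=1e-10, max_it=100):
    """Unpreconditioned conjugate gradients, written only against Vec."""
    x = b.duplicate()                 # zero initial guess
    r = b.duplicate()
    r.axpy(1.0, b)                    # r = b - A*0 = b
    p = r.duplicate()
    p.axpy(1.0, r)                    # p = r
    Ap = b.duplicate()
    rr = r.dot(r)
    for _ in range(max_it):
        if rr ** 0.5 <= tol:
            break
        A.mult(p, Ap)
        alpha = rr / p.dot(Ap)
        x.axpy(alpha, p)              # x = x + alpha * p
        r.axpy(-alpha, Ap)            # r = r - alpha * A p
        rr_new = r.dot(r)
        p.aypx(rr_new / rr, r)        # p = r + beta * p
        rr = rr_new
    return x
```

Calling `cg(laplacian_1d(4), HostVec([0, 0, 0, 5]))` and `cg(laplacian_1d(4), MockDeviceVec([0, 0, 0, 5]))` runs the same solver code; only the type of the right-hand side changes, and every work vector inherits that type through `duplicate`.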
The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.
Notes
1. Thrust is a CUDA library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL). Thrust provides a flexible high-level interface for GPU programming that greatly enhances developer productivity.
2. Cusp is a library for sparse linear algebra and graph computations on CUDA that uses Thrust.
Acknowledgments
We thank Nathan Bell from NVIDIA and Lisandro Dalcin for their assistance with this project. This work was supported by the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Minden, V., Smith, B., Knepley, M.G. (2013). Preliminary Implementation of PETSc Using GPUs. In: Yuen, D., Wang, L., Chi, X., Johnsson, L., Ge, W., Shi, Y. (eds) GPU Solutions to Multi-scale Problems in Science and Engineering. Lecture Notes in Earth System Sciences. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16405-7_7
DOI: https://doi.org/10.1007/978-3-642-16405-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16404-0
Online ISBN: 978-3-642-16405-7
eBook Packages: Earth and Environmental Science (R0)