Preliminary Implementation of PETSc Using GPUs

Minden, Victor; Smith, Barry; Knepley, Matthew G.

doi:10.1007/978-3-642-16405-7_7

Part of the book series: Lecture Notes in Earth System Sciences (LNESS)

Abstract

PETSc is a scalable solver library for the solution of algebraic equations arising from the discretization of partial differential equations and related problems. PETSc is organized as a class library with classes for vectors, matrices, Krylov methods, preconditioners, nonlinear solvers, and differential equation integrators. A new subclass of the vector class has been introduced that performs its operations on NVIDIA GPUs. In addition, a new sparse matrix subclass that performs matrix-vector products on the GPU has been introduced. The Krylov methods, nonlinear solvers, and integrators in PETSc run unchanged in parallel using these new subclasses, which can be used transparently from existing PETSc application codes in C, C++, Fortran, or Python. The implementation is done with the Thrust and Cusp C++ packages from NVIDIA.
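
The design idea in the abstract, that solvers written against a vector and matrix interface run unchanged when a new subclass supplies the operations, can be illustrated with a small sketch. This is not PETSc's API: the names `Vec`, `DeviceVec`, `CSRMat`, `cg`, and `solve_example` are hypothetical, everything runs on the host in pure Python, and `DeviceVec` only counts calls where a real GPU subclass would launch device kernels.

```python
class Vec:
    """Hypothetical host vector; names are illustrative, not PETSc's API."""

    def __init__(self, data):
        self.data = [float(v) for v in data]

    def duplicate(self):
        # Same subclass, same contents, so solvers never name a concrete type.
        return type(self)(self.data)

    def dot(self, other):
        return sum(a * b for a, b in zip(self.data, other.data))

    def axpy(self, alpha, x):
        # self <- self + alpha * x
        self.data = [s + alpha * v for s, v in zip(self.data, x.data)]

    def aypx(self, beta, x):
        # self <- x + beta * self
        self.data = [v + beta * s for s, v in zip(self.data, x.data)]


class DeviceVec(Vec):
    """Stand-in for a GPU-backed subclass: same interface, different backend.

    A real version would launch device kernels; this one only counts calls.
    """

    launches = 0

    def dot(self, other):
        DeviceVec.launches += 1
        return super().dot(other)

    def axpy(self, alpha, x):
        DeviceVec.launches += 1
        super().axpy(alpha, x)

    def aypx(self, beta, x):
        DeviceVec.launches += 1
        super().aypx(beta, x)


class CSRMat:
    """Sparse matrix in compressed sparse row form with a mat-vec product."""

    def __init__(self, indptr, indices, values):
        self.indptr, self.indices, self.values = indptr, indices, values

    def mult(self, x, y):
        # y <- A x
        for i in range(len(self.indptr) - 1):
            lo, hi = self.indptr[i], self.indptr[i + 1]
            y.data[i] = sum(self.values[k] * x.data[self.indices[k]]
                            for k in range(lo, hi))


def cg(A, b, x, tol=1e-12, max_it=50):
    """Conjugate gradients written only against the Vec/Mat interface."""
    r, w = b.duplicate(), b.duplicate()
    A.mult(x, w)
    r.axpy(-1.0, w)              # r = b - A x
    p = r.duplicate()
    rr = r.dot(r)
    for it in range(max_it):
        if rr ** 0.5 <= tol:
            return it
        A.mult(p, w)
        alpha = rr / p.dot(w)
        x.axpy(alpha, p)
        r.axpy(-alpha, w)
        rr_new = r.dot(r)
        p.aypx(rr_new / rr, r)   # p = r + beta * p
        rr = rr_new
    return max_it


def solve_example(vec_type):
    # Tridiagonal SPD system [2 -1 0; -1 2 -1; 0 -1 2] x = [1, 0, 1].
    A = CSRMat([0, 2, 5, 7], [0, 1, 0, 1, 2, 1, 2],
               [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])
    b, x = vec_type([1, 0, 1]), vec_type([0, 0, 0])
    cg(A, b, x)
    return x.data
```

Calling `solve_example(Vec)` and `solve_example(DeviceVec)` runs the same `cg` routine with either vector type; only the class passed in changes, which is the sense in which the solver code is untouched by the choice of backend.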

The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.

Notes

1. Thrust is a CUDA library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL). Thrust provides a flexible high-level interface for GPU programming that greatly enhances developer productivity.
2. Cusp is a library for sparse linear algebra and graph computations on CUDA that uses Thrust.

Acknowledgments

We thank Nathan Bell from NVIDIA and Lisandro Dalcin for their assistance with this project. This work was supported by the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357.

Author information

Authors and Affiliations

School of Engineering, Tufts University, Medford, MA, 02155, USA
Victor Minden
Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, 60439-4844, USA
Barry Smith
Computation Institute, University of Chicago, Chicago, IL, 60637, USA
Matthew G. Knepley

Corresponding author

Correspondence to Victor Minden.

Editor information

Editors and Affiliations

University of Minnesota, Dep. of Earth Sciences and Minnesota Supercomputing Institute, Pillsbury Hall 23, Minneapolis, 55455, Minnesota, USA
David A. Yuen
Network Information Center, Computer Center and Computer, Zhong Guan Cun 4, Beijing, 100190, China, People's Republic
Long Wang
Supercomputing Center, Zhong Guan Cun 4, Beijing, 100190, China, People's Republic
Xuebin Chi
Computer Science, University of Houston, Calhoun Street 4800, Houston, 77204, Texas, USA
Lennart Johnsson
Inst. Process Engineering (IPE), Chinese Academy of Sciences, Zhongguancun North Second Street 1, Beijing, 100190, China, People's Republic
Wei Ge
Laboratory of Computational Geodynamics, Chinese Academy of Sciences, Yu Quan Lu 19a, Beijing, 100049, China, People's Republic
Yaolin Shi

About this chapter

Cite this chapter

Minden, V., Smith, B., Knepley, M.G. (2013). Preliminary Implementation of PETSc Using GPUs. In: Yuen, D., Wang, L., Chi, X., Johnsson, L., Ge, W., Shi, Y. (eds) GPU Solutions to Multi-scale Problems in Science and Engineering. Lecture Notes in Earth System Sciences. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16405-7_7

DOI: https://doi.org/10.1007/978-3-642-16405-7_7
Published: 09 January 2013
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16404-0
Online ISBN: 978-3-642-16405-7
eBook Packages: Earth and Environmental Science, Earth and Environmental Science (R0)
