High-Order Discontinuous Galerkin Methods by GPU Metaprogramming

Klöckner, Andreas; Warburton, Timothy; Hesthaven, Jan S.

doi:10.1007/978-3-642-16405-7_23


Part of the book series: Lecture Notes in Earth System Sciences (LNESS)


Abstract

Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: they allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. In a recent publication, we have shown that DG methods also adapt readily to execution on modern, massively parallel graphics processors (GPUs). Several qualities of the method contribute to this suitability, ranging from locality of reference through regularity of access patterns to high arithmetic intensity. In this article, we illuminate a few of the more practical aspects of bringing DG onto a GPU, including the use of a Python-based metaprogramming infrastructure that was created specifically to support DG but has since found uses in many areas of computational science.
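
The abstract does not show the metaprogramming infrastructure itself. As a rough illustration of the general idea only, the following minimal sketch uses Python's standard `string.Template` to generate CUDA C source text in which the number of nodes per tetrahedral element, (N+1)(N+2)(N+3)/6 for polynomial order N, is baked in as a compile-time constant and the element-local matrix-vector product is fully unrolled. The kernel name `apply_local_operator`, the data layout, and the helper functions are hypothetical and are not taken from the chapter. Handing the generated text to a run-time GPU compiler is outside the scope of the sketch.

```python
from string import Template

# Hypothetical CUDA C kernel skeleton; names and data layout are assumptions.
KERNEL_TEMPLATE = Template("""
#define NODES_PER_ELEMENT $n_nodes

__global__ void apply_local_operator(const float *op, const float *u, float *out)
{
    // One thread block per element, one thread per output node.
    const int element = blockIdx.x;
    const int i = threadIdx.x;
    const float *u_el = u + element * NODES_PER_ELEMENT;

    float acc = 0.0f;
$unrolled_body
    out[element * NODES_PER_ELEMENT + i] = acc;
}
""")


def nodes_per_tetrahedron(order):
    """Number of nodes of a nodal basis of the given order on a tetrahedron."""
    return (order + 1) * (order + 2) * (order + 3) // 6


def generate_kernel_source(order):
    """Return CUDA C source with the inner product loop unrolled at generation time."""
    n = nodes_per_tetrahedron(order)
    lines = [
        f"    acc += op[i * NODES_PER_ELEMENT + {j}] * u_el[{j}];"
        for j in range(n)
    ]
    return KERNEL_TEMPLATE.substitute(n_nodes=n, unrolled_body="\n".join(lines))
```

Calling `generate_kernel_source(3)` yields source text for 20 nodes per element. Because the order is fixed when the text is generated, a different order simply produces a different kernel, without any run-time branching inside the kernel.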



Acknowledgments

AK’s research was partially funded by AFOSR under contract number FA9550-07-1-0422, through the AFOSR/NSSEFF Program Award FA9550-10-1-0180 and also under contract DEFG0288ER25053 by the Department of Energy. TW acknowledges the support of AFOSR under grant number FA9550-05-1-0473 and of the National Science Foundation under grant number DMS 0810187. JSH was partially supported by AFOSR, NSF, and DOE. The opinions expressed are the views of the authors. They do not necessarily reflect the official position of the funding agencies.

Author information

Authors and Affiliations

Courant Institute of Mathematical Sciences, New York University, New York, NY, 10012, USA
Andreas Klöckner
Department of Computational and Applied Mathematics, Rice University, Houston, TX, 77005, USA
Timothy Warburton
Division of Applied Mathematics, Brown University, Providence, RI, 02912, USA
Jan S. Hesthaven

Corresponding author

Correspondence to Andreas Klöckner.

Editor information

Editors and Affiliations

University of Minnesota, Dep. of Earth Sciences and Minnesota Supercomputing Institute, Pillsbury Hall 23, Minneapolis, 55455, Minnesota, USA
David A. Yuen
Network Information Center, Computer Center and Computer, Zhong Guan Cun 4, Beijing, 100190, People's Republic of China
Long Wang
Supercomputing Center, Zhong Guan Cun 4, Beijing, 100190, People's Republic of China
Xuebin Chi
Computer Science, University of Houston, Calhoun Street 4800, Houston, 77204, Texas, USA
Lennart Johnsson
Inst. Process Engineering (IPE), Chinese Academy of Sciences, Zhongguancun North Second Street 1, Beijing, 100190, People's Republic of China
Wei Ge
Laboratory of Computational Geodynamics, Chinese Academy of Sciences, Yu Quan Lu 19a, Beijing, 100049, People's Republic of China
Yaolin Shi

About this chapter

Cite this chapter

Klöckner, A., Warburton, T., Hesthaven, J.S. (2013). High-Order Discontinuous Galerkin Methods by GPU Metaprogramming. In: Yuen, D., Wang, L., Chi, X., Johnsson, L., Ge, W., Shi, Y. (eds) GPU Solutions to Multi-scale Problems in Science and Engineering. Lecture Notes in Earth System Sciences. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16405-7_23

DOI: https://doi.org/10.1007/978-3-642-16405-7_23
Published: 09 January 2013
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16404-0
Online ISBN: 978-3-642-16405-7
eBook Packages: Earth and Environmental Science (R0)
