High-Order Discontinuous Galerkin Methods by GPU Metaprogramming

GPU Solutions to Multi-scale Problems in Science and Engineering

Abstract

Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: they allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. In a recent publication, we have shown that DG methods also adapt readily to execution on modern, massively parallel graphics processors (GPUs). Several qualities of the method contribute to this suitability, ranging from locality of reference, through regularity of access patterns, to high arithmetic intensity. In this article, we illuminate a few of the more practical aspects of bringing DG onto a GPU, including the use of a Python-based metaprogramming infrastructure that was created specifically to support DG, but has found many uses across all disciplines of computational science.
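To make the idea of Python-based metaprogramming concrete, the following is a minimal sketch of run-time code generation, not the chapter's actual infrastructure. It assumes tetrahedral elements, for which polynomial degree N gives (N+1)(N+2)(N+3)/6 nodes, and emits CUDA C source text in which the per-element matrix-vector product is fully unrolled for that node count. The kernel name `apply_elementwise`, its arguments, and the memory layout are hypothetical choices made for illustration.

```python
from string import Template

# Hypothetical kernel template: the name, arguments and data layout
# are illustrative assumptions, not taken from the chapter.
KERNEL_TEMPLATE = Template("""\
__global__ void apply_elementwise(const float *mat, const float *u, float *out)
{
    // one thread block per element, one thread per output node
    const int elem = blockIdx.x;
    const int row = threadIdx.x;
    float acc = 0.0f;
$body
    out[elem * $n_nodes + row] = acc;
}
""")


def nodes_per_tetrahedron(order):
    """Number of nodes for polynomial degree `order` on a tetrahedron."""
    return (order + 1) * (order + 2) * (order + 3) // 6


def generate_kernel(order):
    """Return CUDA C source with the dot-product loop fully unrolled."""
    n = nodes_per_tetrahedron(order)
    lines = [
        f"    acc += mat[row * {n} + {j}] * u[elem * {n} + {j}];"
        for j in range(n)
    ]
    return KERNEL_TEMPLATE.substitute(body="\n".join(lines), n_nodes=n)


# Generate source for quadratic elements; the result is plain text.
source = generate_kernel(order=2)
```

The generated string could then be handed to a run-time compiler such as PyCUDA's `SourceModule`; that step is omitted here so the sketch runs without a GPU. Because the node count is baked into the source as a literal, the compiler sees fixed loop bounds and array strides for each polynomial degree.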

Acknowledgments

AK’s research was partially funded by AFOSR under contract number FA9550-07-1-0422, through the AFOSR/NSSEFF Program Award FA9550-10-1-0180 and also under contract DEFG0288ER25053 by the Department of Energy. TW acknowledges the support of AFOSR under grant number FA9550-05-1-0473 and of the National Science Foundation under grant number DMS 0810187. JSH was partially supported by AFOSR, NSF, and DOE. The opinions expressed are the views of the authors. They do not necessarily reflect the official position of the funding agencies.

Corresponding author

Correspondence to Andreas Klöckner.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

Cite this chapter

Klöckner, A., Warburton, T., Hesthaven, J.S. (2013). High-Order Discontinuous Galerkin Methods by GPU Metaprogramming. In: Yuen, D., Wang, L., Chi, X., Johnsson, L., Ge, W., Shi, Y. (eds) GPU Solutions to Multi-scale Problems in Science and Engineering. Lecture Notes in Earth System Sciences. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16405-7_23
