
Efficient High-Order Discontinuous Galerkin Finite Elements with Matrix-Free Implementations

Part of the Progress in IS book series (PROIS)


This work presents high-order discontinuous Galerkin finite element kernels optimized for node-level performance on a series of Intel architectures ranging from Sandy Bridge to Skylake. The kernels implement matrix-free evaluation of integrals with sum factorization techniques. To increase performance and thereby achieve higher energy efficiency, this work proposes an element-based shared-memory parallelization option and compares it to a well-established shared-memory parallelization with global face data. The new algorithm is supported by the relevant metrics in terms of arithmetic and memory transfer. On a single node with \(2\times 24\) cores of Intel Skylake Scalable, we report more than 1,200 GFLOP/s in double precision for the full operator evaluation and up to 175 GB/s of memory throughput. Finally, we show that merging the arithmetically heavier operator evaluation with the vector operations in application code more than doubles efficiency on the latest hardware, both in terms of energy and in terms of time to solution.
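Sum factorization makes matrix-free evaluation on high-order tensor-product elements affordable. The sketch below is written in plain Python for readability and is not taken from the chapter's kernels. It interpolates the \(n^3\) nodal values of one hexahedral cell to \(q^3\) quadrature points with a 1D matrix \(S\) of size \(q\times n\). The symbols \(n\), \(q\), \(S\) and all function names are assumptions made for this illustration. Applying the full tensor product naively costs \(q^3 n^3\) multiplications per cell. Applying \(S\) one direction at a time costs \(qn^3 + q^2n^2 + q^3n\), i.e. \(\mathcal{O}(n^4)\) for \(q \approx n\).

```python
from itertools import product


def zeros3(s0, s1, s2):
    """Allocate a nested-list 3D array of shape (s0, s1, s2) filled with zeros."""
    return [[[0.0] * s2 for _ in range(s1)] for _ in range(s0)]


def apply_1d(S, u, axis):
    """Contract the 1D matrix S (q rows = target points, n columns = source
    nodes) with the 3D array u along a single axis (0, 1 or 2)."""
    q, n = len(S), len(S[0])
    shape = [len(u), len(u[0]), len(u[0][0])]
    assert shape[axis] == n, "S must match the extent of u along axis"
    out_shape = list(shape)
    out_shape[axis] = q
    out = zeros3(*out_shape)
    for idx in product(*(range(s) for s in out_shape)):
        src = list(idx)
        total = 0.0
        for m in range(n):
            src[axis] = m  # only the contracted index runs over source nodes
            total += S[idx[axis]][m] * u[src[0]][src[1]][src[2]]
        out[idx[0]][idx[1]][idx[2]] = total
    return out


def interpolate_sum_factorized(S, u):
    """Apply S along x, then y, then z: three cheap 1D sweeps."""
    for axis in range(3):
        u = apply_1d(S, u, axis)
    return u


def interpolate_naive(S, u):
    """Reference: the full tensor-product operator applied entry by entry."""
    q, n = len(S), len(S[0])
    out = zeros3(q, q, q)
    for a, b, c in product(range(q), repeat=3):
        out[a][b][c] = sum(
            S[a][i] * S[b][j] * S[c][k] * u[i][j][k]
            for i, j, k in product(range(n), repeat=3)
        )
    return out


def mults_naive(n, q):
    """Multiplications per cell for the naive tensor-product evaluation."""
    return q**3 * n**3


def mults_sum_factorized(n, q):
    """Multiplications per cell for the three 1D sweeps (x, then y, then z)."""
    return q * n**3 + q**2 * n**2 + q**3 * n


if __name__ == "__main__":
    n, q = 3, 4
    S = [[1.0 / (1 + a + i) for i in range(n)] for a in range(q)]
    u = [[[float(i + 2 * j - k) for k in range(n)] for j in range(n)] for i in range(n)]
    fast, ref = interpolate_sum_factorized(S, u), interpolate_naive(S, u)
    print(max(abs(fast[a][b][c] - ref[a][b][c])
              for a, b, c in product(range(q), repeat=3)))
```

For polynomial degree 3 (\(n = q = 4\)), the sweeps need 768 multiplications per cell, compared with 4096 for the naive evaluation. Production implementations typically add vectorization and further optimizations, which this sketch omits.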


  • High-order discontinuous Galerkin method
  • Sum factorization
  • Matrix-free method
  • Shared-memory parallelization
  • Energy efficiency by software optimization
  • Merged vector operations
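The merged vector operations mentioned in the abstract can be illustrated with a small sketch. It is built on two assumptions. The vector operation is an explicit update \(x_\text{new} = x + \Delta t\,Ax\). The operator is a hypothetical DG-like one on a 1D chain of cells, where each cell couples only to its two neighbours as a stand-in for face terms.

- The unfused step stores the full result \(y = Ax\) and then sweeps over \(x\) and \(y\) again.
- The merged step applies the update to each cell right after computing that cell's contribution.

Writing into a separate output vector keeps the neighbours' reads of \(x\) valid. None of the names below come from the chapter.

```python
from typing import Callable, List

Vector = List[float]
CellOperator = Callable[[int, Vector], Vector]


def make_chain_operator(n_cells: int, dofs_per_cell: int) -> CellOperator:
    """Hypothetical DG-like operator: the result for local index i of a cell
    is 2*own - left - right, where left/right are the same local index in
    the neighbouring cells (zero outside the chain)."""

    def apply_cell(cell: int, x: Vector) -> Vector:
        base = cell * dofs_per_cell
        out = []
        for i in range(dofs_per_cell):
            left = x[base - dofs_per_cell + i] if cell > 0 else 0.0
            right = x[base + dofs_per_cell + i] if cell < n_cells - 1 else 0.0
            out.append(2.0 * x[base + i] - left - right)
        return out

    return apply_cell


def step_unfused(apply_cell: CellOperator, n_cells: int, dofs_per_cell: int,
                 x: Vector, dt: float) -> Vector:
    # Sweep 1: operator evaluation stores the whole result vector y = A x.
    y = [0.0] * len(x)
    for cell in range(n_cells):
        base = cell * dofs_per_cell
        y[base:base + dofs_per_cell] = apply_cell(cell, x)
    # Sweep 2: a separate vector operation reads x and y from memory again.
    return [xi + dt * yi for xi, yi in zip(x, y)]


def step_merged(apply_cell: CellOperator, n_cells: int, dofs_per_cell: int,
                x: Vector, dt: float) -> Vector:
    # Single sweep: each cell's update follows its operator evaluation
    # immediately, so y is never stored. Writing into x_new (not x) keeps
    # the neighbour reads of later cells consistent.
    x_new = [0.0] * len(x)
    for cell in range(n_cells):
        base = cell * dofs_per_cell
        for i, val in enumerate(apply_cell(cell, x)):
            x_new[base + i] = x[base + i] + dt * val
    return x_new
```

In an idealized count, the unfused step streams five vectors through memory: it reads \(x\) and writes \(y\), then reads \(x\) and \(y\) again and writes the result. The merged step streams only two. This simplified count ignores caches and neighbour reads.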





The authors acknowledge the support given by the Bayerisches Kompetenznetzwerk für Technisch-Wissenschaftliches Hoch- und Höchstleistungsrechnen (KONWIHR) in the framework of the project Matrix-free GPU kernels for complex applications in fluid dynamics. This work was supported by the German Research Foundation (DFG) under the project High-order discontinuous Galerkin for the exa-scale (ExaDG) within the priority program Software for Exascale Computing (SPPEXA). The authors acknowledge collaboration with Katharina Kormann, Igor Pasichnyk, and Matthias Brehm.

Author information

Corresponding author

Correspondence to Martin Kronbichler.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Kronbichler, M., Allalen, M. (2018). Efficient High-Order Discontinuous Galerkin Finite Elements with Matrix-Free Implementations. In: Bungartz, HJ., Kranzlmüller, D., Weinberg, V., Weismüller, J., Wohlgemuth, V. (eds) Advances and New Trends in Environmental Informatics. Progress in IS. Springer, Cham.


  • DOI: 10.1007/978-3-319-99654-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99653-0

  • Online ISBN: 978-3-319-99654-7

  • eBook Packages: Computer Science, Computer Science (R0)