Fast Matrix-Free Discontinuous Galerkin Kernels on Modern Computer Architectures

Kronbichler, Martin (ORCID: 0000-0001-8406-835X); Kormann, Katharina; Pasichnyk, Igor; Allalen, Momme

doi:10.1007/978-3-319-58667-0_13

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10266)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

This study compares the performance of high-order discontinuous Galerkin finite elements on modern hardware. The main computational kernel is the matrix-free evaluation of differential operators by sum factorization. It is exemplified by the symmetric interior penalty discretization of the Laplacian, which serves as a performance proxy for a complex application code in fluid dynamics. State-of-the-art implementations of these kernels stress both arithmetic throughput and memory transfer. The implementations of SIMD vectorization and shared-memory parallelization are described in detail. Computational results are presented for a dual-socket Intel Haswell system with 28 cores, a 64-core Intel Knights Landing processor, and a 16-core IBM Power8 processor. Up to polynomial degree six, Knights Landing is approximately twice as fast as Haswell. Power8 performs similarly to Haswell, trading a higher clock frequency for narrower SIMD units. The comparison also shows that expressing parallelism through simple parallel for loops performs better at medium and high core counts than a more elaborate task-based parallelization with dynamic scheduling according to dependency graphs, even though the task-based algorithm transfers less data from memory.
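The core idea of sum factorization can be illustrated on the interpolation of a tensor-product polynomial to quadrature points on a 3D cell: instead of evaluating u(q1,q2,q3) = Σ_{i,j,k} S[q1,i] S[q2,j] S[q3,k] u[i,j,k] directly, the 1D matrix S is applied one direction at a time. The following is a minimal, unoptimized sketch, assuming a hexahedral cell with n basis functions and nq quadrature points per direction and a generic row-major nq×n matrix S; all function names are hypothetical. It is not the authors' production kernel and omits the SIMD vectorization and shared-memory parallelization described in the abstract.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: linear index of (i0, i1, i2) in a 3D array with
// extents e, index i0 running fastest.
inline int lin(const std::array<int, 3>& e, int i0, int i1, int i2) {
  return i0 + e[0] * (i1 + e[1] * i2);
}

// Contract the 3D tensor `in` (extents `ext`) with the 1D matrix S (nq x n,
// row-major) along direction `dir`. The result has extent nq in `dir` and
// unchanged extents in the other two directions. Work: (#outputs) * n.
std::vector<double> apply_1d(const std::vector<double>& S, int nq, int n,
                             const std::vector<double>& in,
                             const std::array<int, 3>& ext, int dir) {
  assert(ext[dir] == n && static_cast<int>(S.size()) == nq * n);
  std::array<int, 3> out_ext = ext;
  out_ext[dir] = nq;
  std::vector<double> out(out_ext[0] * out_ext[1] * out_ext[2], 0.0);
  for (int i2 = 0; i2 < out_ext[2]; ++i2)
    for (int i1 = 0; i1 < out_ext[1]; ++i1)
      for (int i0 = 0; i0 < out_ext[0]; ++i0) {
        std::array<int, 3> src{i0, i1, i2};
        const int q = src[dir];  // output index along dir
        double sum = 0.0;
        for (int k = 0; k < n; ++k) {
          src[dir] = k;  // walk the input along dir
          sum += S[q * n + k] * in[lin(ext, src[0], src[1], src[2])];
        }
        out[lin(out_ext, i0, i1, i2)] = sum;
      }
  return out;
}

// Sum factorization: three 1D passes, about n^4 multiply-adds each if nq == n.
std::vector<double> interpolate_sum_factorized(const std::vector<double>& S,
                                               int nq, int n,
                                               const std::vector<double>& u) {
  std::array<int, 3> ext{n, n, n};
  std::vector<double> t = u;
  for (int dir = 0; dir < 3; ++dir) {
    t = apply_1d(S, nq, n, t, ext, dir);
    ext[dir] = nq;
  }
  return t;
}

// Reference: direct evaluation of the full tensor product, nq^3 * n^3 work.
std::vector<double> interpolate_naive(const std::vector<double>& S, int nq,
                                      int n, const std::vector<double>& u) {
  const std::array<int, 3> in_ext{n, n, n}, out_ext{nq, nq, nq};
  std::vector<double> out(nq * nq * nq, 0.0);
  for (int q2 = 0; q2 < nq; ++q2)
    for (int q1 = 0; q1 < nq; ++q1)
      for (int q0 = 0; q0 < nq; ++q0) {
        double sum = 0.0;
        for (int k2 = 0; k2 < n; ++k2)
          for (int k1 = 0; k1 < n; ++k1)
            for (int k0 = 0; k0 < n; ++k0)
              sum += S[q0 * n + k0] * S[q1 * n + k1] * S[q2 * n + k2] *
                     u[lin(in_ext, k0, k1, k2)];
        out[lin(out_ext, q0, q1, q2)] = sum;
      }
  return out;
}

// Largest pointwise difference between two equally sized vectors.
double max_abs_diff(const std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() == b.size());
  double m = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    m = std::max(m, std::fabs(a[i] - b[i]));
  return m;
}
```

Calling both interpolation functions with the same S and u should give results that agree up to rounding error. With nq = n, the factorized version performs about 3n^4 multiply-adds instead of n^6, which is what makes high polynomial degrees affordable in matrix-free operator evaluation.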

Notes

1. https://hpgmg.org.
2. https://github.com/RRZE-HPC/likwid, retrieved on September 18, 2016.
3. As a complement to the numbers reported by likwid, which count an FMA as one FLOP, we recorded FMAs, additions, and multiplications separately with the Intel Software Development Emulator.
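The distinction in note 3 matters because, for kernels dominated by fused multiply-adds, the two counting conventions differ by almost a factor of two. The sketch below assumes that a length-n dot product is compiled to one multiplication followed by n − 1 FMA instructions, and applies this to a hypothetical single 1D sum-factorization pass on a 3D cell with n points per direction (n^3 outputs, each a length-n dot product). It is an illustration of the arithmetic only, not a model of the measured kernels.

```cpp
#include <cassert>
#include <cstdint>

// Two conventions for counting the arithmetic of FMA-based code.
struct FlopCount {
  std::int64_t fma_as_one;  // each FMA instruction counted as one FLOP
  std::int64_t fma_as_two;  // each FMA counted as one multiplication plus one addition
};

// Assumption: a dot product of length n is executed as one multiplication
// followed by n - 1 FMA instructions.
FlopCount dot_product_flops(std::int64_t n) {
  assert(n >= 1);
  return {1 + (n - 1), 1 + 2 * (n - 1)};
}

// Hypothetical example: one 1D sum-factorization pass on a 3D cell with n
// points per direction produces n^3 outputs, each a length-n dot product.
FlopCount one_pass_flops(std::int64_t n) {
  const FlopCount d = dot_product_flops(n);
  const std::int64_t outputs = n * n * n;
  return {outputs * d.fma_as_one, outputs * d.fma_as_two};
}
```

For n = 4, a dot product counts as 4 FLOPs when FMAs are counted once and as 7 FLOPs when they are split; one full pass then amounts to 256 versus 448 FLOPs. The ratio (2n − 1)/n approaches two as n grows.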

Acknowledgements

The authors acknowledge the support given by the Bayerische Kompetenznetzwerk für Technisch-Wissenschaftliches Hoch- und Höchstleistungsrechnen (KONWIHR) in the framework of the project High performance finite difference stencils for modern parallel processors. This work was supported by the German Research Foundation (DFG) under the project High-order discontinuous Galerkin for the exa-scale (ExaDG) within the priority program Software for Exascale Computing (SPPEXA). The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time on the GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, www.lrz.de) through project id pr83te.

The authors acknowledge collaboration with Benjamin Krank, Niklas Fehn, and Matthias Brehm.

Author information

Authors and Affiliations

Institute for Computational Mechanics, Technical University of Munich, Boltzmannstr. 15, 85747, Garching, Germany
Martin Kronbichler
Max Planck Institute for Plasma Physics, Boltzmannstr. 2, 85748, Garching, Germany
Katharina Kormann
Zentrum Mathematik, Technical University of Munich, Boltzmannstr. 3, 85747, Garching, Germany
Katharina Kormann
IBM Deutschland, Boltzmannstr. 1, 85748, Garching, Germany
Igor Pasichnyk
Leibniz-Rechenzentrum der Bayerischen Akademie der Wissenschaften, Boltzmannstr. 1, 85748, Garching, Germany
Momme Allalen

Corresponding author

Correspondence to Martin Kronbichler.

Editor information

Editors and Affiliations

Deutsches Klimarechenzentrum (DKRZ), Hamburg, Germany
Julian M. Kunkel
Tokyo Institute of Technology, Tokyo, Japan
Rio Yokota
Argonne National Laboratory, Argonne, IL, USA
Pavan Balaji
KAUST, Thuwal, Saudi Arabia
David Keyes

About this paper

Cite this paper

Kronbichler, M., Kormann, K., Pasichnyk, I., Allalen, M. (2017). Fast Matrix-Free Discontinuous Galerkin Kernels on Modern Computer Architectures. In: Kunkel, J.M., Yokota, R., Balaji, P., Keyes, D. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10266. Springer, Cham. https://doi.org/10.1007/978-3-319-58667-0_13

DOI: https://doi.org/10.1007/978-3-319-58667-0_13
Published: 12 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58666-3
Online ISBN: 978-3-319-58667-0
eBook Packages: Computer Science, Computer Science (R0)
