Fast Matrix-Free Discontinuous Galerkin Kernels on Modern Computer Architectures

  • Conference paper
High Performance Computing (ISC High Performance 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10266)

Abstract

This study compares the performance of high-order discontinuous Galerkin finite element kernels on modern hardware. The main computational kernel is the matrix-free evaluation of differential operators by sum factorization, exemplified by the symmetric interior penalty discretization of the Laplacian, which serves as a proxy for a complex application code in fluid dynamics. State-of-the-art implementations of these kernels stress both arithmetic throughput and memory transfer. The implementations of SIMD vectorization and shared-memory parallelization are detailed. Computational results are presented for a dual-socket Intel Haswell system with 28 cores, a 64-core Intel Knights Landing, and a 16-core IBM Power8 processor. Up to polynomial degree six, Knights Landing is approximately twice as fast as Haswell. Power8 performs similarly to Haswell, trading a higher frequency for narrower SIMD units. The performance comparison shows that simple ways to express parallelism through for loops perform better on medium and high core counts than a more elaborate task-based parallelization with dynamic scheduling according to dependency graphs, even though the latter algorithm involves less memory transfer.
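
To illustrate the idea behind sum factorization mentioned in the abstract, the following minimal sketch applies a two-dimensional tensor-product operator S ⊗ S to a grid of coefficients in two ways: as one dense contraction, and as two successive one-dimensional passes. It is not the paper's kernel. The function names, the small integer matrix `S`, and the data `U` are assumptions chosen for illustration, and the sketch is written in plain Python rather than the vectorized C++ a production code would use.

```python
# Sum factorization sketch for a 2D tensor-product operator (S ⊗ S).
# All names and values here are hypothetical and for illustration only.

def apply_along_second_index(S, U):
    """W[i][j] = sum_l S[j][l] * U[i][l] (1D operator applied to each row)."""
    return [[sum(S[j][l] * row[l] for l in range(len(row)))
             for j in range(len(S))]
            for row in U]

def apply_along_first_index(S, W):
    """V[i][j] = sum_k S[i][k] * W[k][j] (1D operator applied to each column)."""
    return [[sum(S[i][k] * W[k][j] for k in range(len(W)))
             for j in range(len(W[0]))]
            for i in range(len(S))]

def sum_factorized_apply(S, U):
    """Two 1D passes: about 2*n^3 multiply-adds for an n x n grid."""
    return apply_along_first_index(S, apply_along_second_index(S, U))

def naive_apply(S, U):
    """Direct contraction V[i][j] = sum_{k,l} S[i][k]*S[j][l]*U[k][l]: n^4 terms."""
    n = len(S)
    return [[sum(S[i][k] * S[j][l] * U[k][l]
                 for k in range(n) for l in range(n))
             for j in range(n)]
            for i in range(n)]

# Small integer example so that both results can be compared exactly.
S = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]
U = [[1, 0, 2],
     [3, 1, 0],
     [0, 2, 1]]

V_fast = sum_factorized_apply(S, U)
V_slow = naive_apply(S, U)
print("results agree:", V_fast == V_slow)
```

Both routines compute the same result, but in d dimensions with n points per direction the factorized form needs on the order of d·n^(d+1) operations instead of n^(2d), which is why the approach pays off at high polynomial degree.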

Notes

  1. https://hpgmg.org

  2. https://github.com/RRZE-HPC/likwid, retrieved on September 18, 2016.

  3. As a complement to the numbers given by likwid, which count an FMA as one FLOP, we recorded FMAs, additions, and multiplications separately with the Intel Software Development Emulator.

Acknowledgements

The authors acknowledge the support given by the Bayerische Kompetenznetzwerk für Technisch-Wissenschaftliches Hoch- und Höchstleistungsrechnen (KONWIHR) in the framework of the project High performance finite difference stencils for modern parallel processors. This work was supported by the German Research Foundation (DFG) under the project High-order discontinuous Galerkin for the exa-scale (ExaDG) within the priority program Software for Exascale Computing (SPPEXA). The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time on the GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, www.lrz.de) through project id pr83te.

The authors acknowledge collaboration with Benjamin Krank, Niklas Fehn, and Matthias Brehm.

Author information

Corresponding author

Correspondence to Martin Kronbichler.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kronbichler, M., Kormann, K., Pasichnyk, I., Allalen, M. (2017). Fast Matrix-Free Discontinuous Galerkin Kernels on Modern Computer Architectures. In: Kunkel, J.M., Yokota, R., Balaji, P., Keyes, D. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10266. Springer, Cham. https://doi.org/10.1007/978-3-319-58667-0_13

  • DOI: https://doi.org/10.1007/978-3-319-58667-0_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58666-3

  • Online ISBN: 978-3-319-58667-0

  • eBook Packages: Computer Science (R0)
