
OpenMP Parallelization Strategies for a Discontinuous Galerkin Solver

Published in: International Journal of Parallel Programming

Abstract

This paper reports on the open multi-processing (OpenMP) parallel implementation of a fully unstructured high-order discontinuous Galerkin (DG) solver for computational fluid dynamics and computational aeroacoustics applications. Although the OpenMP paradigm is confined to shared-memory systems, it has some advantages over the message passing interface (MPI) library, and exploiting it well can improve the parallel efficiency of codes running on clusters of multi-core nodes. While with MPI the use of a domain decomposition algorithm is almost unavoidable, the OpenMP shared-memory context offers several options. Three strategies, here optimised for a DG solver, are presented and compared: the first is a customised colouring approach, the second mimics an MPI implementation in the OpenMP context, and the third lies halfway between the other two. Numerical tests on both inviscid and viscous cases indicate that, thanks to the compactness of the DG discretization, all code versions perform quite satisfactorily. In particular, the domain decomposition algorithm reaches the highest parallel efficiency at low computational loads, while the colouring approach excels at larger loads and can easily be implemented within an existing MPI code. Moreover, colouring is well suited to hardware accelerators, an opportunity offered by the OpenMP 4.0 standard. Finally, the performance gain obtained with a hybrid MPI/OpenMP version of the DG code on high-performance computing facilities is demonstrated.



Notes

  1. The auto scheduling option, introduced only with release 3.0 of the OpenMP standard, delegates the scheduling decision to the compiler.


Acknowledgements

We acknowledge Aki Mäkivirta from Genelec, Timo Lähivaara and Simo-Pekka Simonaho from University of Eastern Finland and Tomi Huttunen from Kuava Oy for providing us the Genelec speaker geometry and details of their numerical and experimental tests. We also acknowledge “Centro per le Tecnologie Didattiche e la Comunicazione”, University of Bergamo, for the resources provided by CINECA, within the “Convenzione di Ateneo Università degli Studi di Bergamo”. Moreover, the CINECA award, under the ISCRA initiative (grant numbers HP10CE90VW and HP10BMA1AP), is acknowledged for the availability of high performance computing resources and support.

Author information


Corresponding author

Correspondence to Andrea Crivellini.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Crivellini, A., Franciolini, M., Colombo, A. et al. OpenMP Parallelization Strategies for a Discontinuous Galerkin Solver. Int J Parallel Prog 47, 838–873 (2019). https://doi.org/10.1007/s10766-018-0589-3

