
OpenMP Parallelization Strategies for a Discontinuous Galerkin Solver

Published in: International Journal of Parallel Programming

Abstract

This paper reports on the open multi-processing (OpenMP) parallel implementation of a fully unstructured high-order discontinuous Galerkin (DG) solver for computational fluid dynamics and computational aeroacoustics applications. Although the OpenMP paradigm is confined to shared-memory systems, it has some advantages over the message passing interface (MPI) library, and exploiting it well can improve the parallel efficiency of codes running on clusters of multi-core nodes. While with MPI the use of a domain decomposition algorithm is almost unavoidable, the OpenMP shared-memory context offers several options. Three strategies, here optimised for a DG solver, are presented and compared: the first is a customised colouring approach, the second mimics an MPI implementation in the OpenMP context, and the third lies halfway between the other two. Numerical tests on both inviscid and viscous cases indicate that, thanks to the compactness of the DG discretization, all code versions perform quite satisfactorily. In particular, the domain decomposition algorithm reaches the highest parallel efficiency at low computational loads, while the colouring approach excels at larger loads and can easily be implemented within an existing MPI code. Moreover, colouring is well suited to hardware accelerators, an opportunity offered by the OpenMP 4.0 standard. Finally, the performance gain obtained with a hybrid MPI/OpenMP version of the DG code on high-performance computing facilities is demonstrated.



Notes

  1. The auto scheduling option, introduced only with release 3.0 of the OpenMP standard, delegates the scheduling decision to the compiler.


Acknowledgements

We acknowledge Aki Mäkivirta from Genelec, Timo Lähivaara and Simo-Pekka Simonaho from University of Eastern Finland and Tomi Huttunen from Kuava Oy for providing us the Genelec speaker geometry and details of their numerical and experimental tests. We also acknowledge “Centro per le Tecnologie Didattiche e la Comunicazione”, University of Bergamo, for the resources provided by CINECA, within the “Convenzione di Ateneo Università degli Studi di Bergamo”. Moreover, the CINECA award, under the ISCRA initiative (grant numbers HP10CE90VW and HP10BMA1AP), is acknowledged for the availability of high performance computing resources and support.

Author information


Corresponding author

Correspondence to Andrea Crivellini.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Crivellini, A., Franciolini, M., Colombo, A. et al. OpenMP Parallelization Strategies for a Discontinuous Galerkin Solver. Int J Parallel Prog 47, 838–873 (2019). https://doi.org/10.1007/s10766-018-0589-3

