Asynchronous OpenCL/MPI Numerical Simulations of Conservation Laws

  • Philippe Helluy
  • Thomas Strub
  • Michel Massaro
  • Malcolm Roberts
Conference paper
Part of the Lecture Notes in Computational Science and Engineering book series (LNCSE, volume 113)


Hyperbolic conservation laws are important mathematical models for describing many phenomena in physics and engineering. The Finite Volume (FV) method and the Discontinuous Galerkin (DG) method are two popular methods for solving conservation laws numerically. In this paper, we present several FV and DG numerical simulations that we have implemented with OpenCL and MPI. First, we compare two optimized implementations of the FV method on a regular grid: an OpenCL implementation and a more traditional OpenMP implementation. We compare their efficiency on several CPU and GPU architectures from different vendors. We then describe how we implemented the DG method in the OpenCL/MPI framework in order to achieve high efficiency. The implementation relies on splitting the DG mesh into subdomains and subzones, and different kernels are compiled according to the properties of each zone. In addition, we rely on the OpenCL asynchronous task graph to overlap OpenCL computations, memory transfers and MPI communications.
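To make the FV method on a regular grid concrete, here is a minimal sketch in plain Python. It is an illustration only, not the OpenCL or OpenMP code discussed in the paper: it assumes the simplest conservation law, linear advection with flux f(u) = a·u, on a periodic 1D grid, with a first-order Rusanov (local Lax-Friedrichs) numerical flux. The function names (`rusanov_flux`, `fv_step`, `solve_advection`) and all parameter values are hypothetical choices made for this example.

```python
def rusanov_flux(ul, ur, a):
    """Rusanov numerical flux for the linear flux f(u) = a * u (assumed model)."""
    return 0.5 * a * (ul + ur) - 0.5 * abs(a) * (ur - ul)


def fv_step(u, a, dx, dt):
    """One explicit first-order finite volume update on a periodic 1D grid."""
    n = len(u)
    # flux[i] is the numerical flux through the interface between
    # cell i-1 and cell i (u[-1] wraps around, giving periodicity).
    flux = [rusanov_flux(u[i - 1], u[i], a) for i in range(n)]
    # Each cell average changes by the difference of its interface fluxes.
    return [u[i] - dt / dx * (flux[(i + 1) % n] - flux[i]) for i in range(n)]


def solve_advection(n=100, a=1.0, cfl=0.8, t_final=0.5):
    """Advect a square pulse on [0, 1) and return (cell averages, dx)."""
    dx = 1.0 / n
    dt = cfl * dx / abs(a)  # time step restricted by the CFL condition
    u = [1.0 if 0.25 <= (i + 0.5) * dx < 0.5 else 0.0 for i in range(n)]
    t = 0.0
    while t < t_final - 1e-12:
        step = min(dt, t_final - t)  # do not overshoot the final time
        u = fv_step(u, a, dx, step)
        t += step
    return u, dx


if __name__ == "__main__":
    u, dx = solve_advection()
    print("total mass:", sum(u) * dx)
```

Because every interface flux is added to one cell and subtracted from its neighbour, the total mass `sum(u) * dx` is preserved up to rounding, which is the defining property of a conservative scheme. The per-cell update is independent of the other cells within a time step, which is what makes this kind of scheme a natural fit for data-parallel kernels.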


Keywords: Discontinuous Galerkin, Gauss Point, Task Graph, Interface Zone, Volume Zone



This work was supported by the French Defense Agency DGA, the Labex ANR-11-LABX-0055-IRMIA and the AxesSim company. We also thank Vincent Loechner for his helpful advice regarding the optimization of the OpenMP code.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Philippe Helluy (1, corresponding author)
  • Thomas Strub (2)
  • Michel Massaro (3)
  • Malcolm Roberts (3)

  1. IRMA, Université de Strasbourg and Inria Tonus, Strasbourg, France
  2. AxesSim Illkirch, Illkirch-Graffenstaden, France
  3. IRMA, Université de Strasbourg, Strasbourg, France
