1 Introduction

In this paper we address the resolution of basic cloud formation processes on modern supercomputer systems. The simulation of cloud formations, as part of convective processes, is expected to play an important role in future numerical weather prediction [1]. This requires both suitable physical models and effective computational realizations. Here we focus on the simulation of simple benchmark scenarios [10]. They contain relatively small-scale effects which are well approximated by the compressible Navier-Stokes equations. We use the ader-dg method of [5], which allows us to simulate the Navier-Stokes equations with a space-time discretization of arbitrarily high order. In contrast to Runge-Kutta time integrators or semi-implicit methods, an increase of the order of ader-dg only results in larger computational kernels and does not affect the complexity of the scheme. Additionally, ader-dg is a communication-avoiding scheme, which reduces the overhead at larger scales. We see our scheme in the regime of already established methods for cloud simulations, as seen for example in [10, 12, 13].

Due to the viscous components of the Navier-Stokes equations, it is not straightforward to apply the ader-dg formalism of [5], which addresses hyperbolic systems of partial differential equations (pdes) in first-order formulation. To include viscosity, we use the numerical flux for the compressible Navier-Stokes equations of Gassner et al. [8]. This flux has already been applied to the ader-dg method in [4]. In contrast to [4], we focus on the simulation of complex flows with a gravitational source term and a realistic background atmosphere. Additionally, we use adaptive mesh refinement (amr) to increase the spatial resolution in areas of interest. This has been shown to work well for the simulation of cloud dynamics [12]. Regarding the issue of limiting in high-order dg methods, we note that viscosity not only models the correct physics of the problem but also smooths oscillations and discontinuities, thus stabilizing the simulation.

We base our work on the ExaHyPE Engine (www.exahype.eu), which is a framework that can solve arbitrary hyperbolic pde systems. A user of the engine is provided with a simple code interface which mirrors the parts required to formulate a well-posed Cauchy problem for a system of hyperbolic pdes of first order. The underlying ader-dg method, parallelization techniques and dynamic adaptive mesh refinement are available for simulations while the implementations are left as a black box to the user. An introduction to the communication-avoiding implementation of the whole numerical scheme can be found in [3].

To summarize, we make the following contributions in this paper:

  • We extend the ExaHyPE Engine to allow viscous terms.

  • We thus provide an implementation of the compressible Navier-Stokes equations. In addition, we tailor the equation set to stratified flows with a gravitational source term. We emphasize that we use a standard formulation of the Navier-Stokes equations as seen in the field of computational fluid mechanics and only apply small modifications to the governing equations, in contrast to an equation set that is tailored exactly to the application area.

  • We present a general amr-criterion that is based on the detection of outlier cells w.r.t. their total variation. Furthermore, we show how to utilize this criterion for stratified flows.

  • We evaluate our implementation with standard cfd scenarios and atmospheric flows and inspect the effectiveness of our proposed amr-criterion. We thus examine whether our proposed general implementation can achieve results that are competitive with state-of-the-art models that rely on heavily specialized equations and numerics.

2 Equation Set

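The compressible Navier-Stokes equations in the conservative form are given as

$$\begin{aligned} \frac{\partial \varvec{Q}}{\partial t} + \nabla \cdot \varvec{F}(\varvec{Q}, \nabla \varvec{Q}) = \varvec{S}(\varvec{Q}), \end{aligned}$$

with the vector of conserved quantities \(\varvec{Q}\), flux \(\varvec{F}(\varvec{Q}, \nabla \varvec{Q})\) and source \(\varvec{S}(\varvec{Q})\). Note that the flux can be split into a hyperbolic part \(\varvec{F}^{h}(\varvec{Q})\), which is identical to the flux of the Euler equations, and a viscous part \(\varvec{F}^{v}(\varvec{Q}, \nabla \varvec{Q})\). The conserved quantities \(\varvec{Q}\) are the density \(\rho \), the two or three-dimensional momentum \(\rho \varvec{v}\) and the energy density \(\rho E\). The rows of Eq. (1) are, in this order, the conservation of mass, the conservation of momentum and the conservation of energy.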

The pressure \(p\) is given by the equation of state of an ideal gas

$$\begin{aligned} p= (\gamma - 1) \left( \rho E- \frac{1}{2} \left( \varvec{v}\cdot \rho \varvec{v}\right) - gz \right) . \end{aligned}$$

The term gz is the geopotential height with the gravity of Earth g [10]. The temperature T relates to the pressure by the thermal equation of state

$$\begin{aligned} p= \rho R T, \end{aligned}$$

where R is the specific gas constant of a fluid.

We model the diffusivity by the stress tensor


with constant viscosity \(\mu \). The heat diffusion is governed by the coefficient

$$\begin{aligned} \kappa = \frac{\mu \gamma }{\Pr } \frac{1}{\gamma - 1} R = \frac{\mu c_p}{\Pr }, \end{aligned}$$

where the ratio of specific heats \(\gamma \), the heat capacity at constant pressure \(c_p\) and the Prandtl number \(\Pr \) depend on the fluid.
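As a small numerical illustration of the two equivalent forms of the heat diffusion coefficient, the following sketch evaluates \(c_p\) and \(\kappa \). It assumes the ideal gas relation \(c_p = \gamma R / (\gamma - 1)\), which is implied by the equality above, and borrows the constants of the stratified flow scenarios (Sect. 4.2) together with \(\mu = 0.001\). The function names are hypothetical and not part of the ExaHyPE Engine.

```python
def heat_capacity_cp(gamma: float, gas_constant: float) -> float:
    """Heat capacity at constant pressure of an ideal gas: c_p = gamma R / (gamma - 1)."""
    return gamma * gas_constant / (gamma - 1.0)


def heat_conduction(mu: float, gamma: float, prandtl: float, gas_constant: float) -> float:
    """Heat diffusion coefficient: kappa = mu gamma / Pr * R / (gamma - 1)."""
    return mu * gamma / prandtl * gas_constant / (gamma - 1.0)


# Constants of the stratified flow scenarios; mu is the viscosity of the 2D bubble runs.
gamma, prandtl, gas_constant, mu = 1.4, 0.71, 287.058, 0.001

c_p = heat_capacity_cp(gamma, gas_constant)
kappa = heat_conduction(mu, gamma, prandtl, gas_constant)
print(f"c_p = {c_p:.3f}, kappa = {kappa:.4f}")
```

Both forms of the coefficient agree: with these constants \(c_p \approx 1004.7\) and \(\kappa = \mu c_p / \Pr \approx 1.415\).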

Many realistic atmospheric flows can be described by a perturbation over a background state that is in hydrostatic equilibrium

$$\begin{aligned} \frac{\partial }{\partial z} \overline{p}(z) = -g \overline{\rho }(z), \end{aligned}$$

i.e., a state in which the pressure gradient is exactly in balance with the gravitational source term, \(\nabla \overline{p}= -\varvec{k} \overline{\rho } g\). The vector \(\varvec{k}\) is the unit vector pointing in z-direction. The momentum equation is dominated by the background flow in this case. Because this can lead to numerical instabilities, problems of this kind are challenging and require some care. To lessen the impact of this, we split the pressure \(p= \overline{p}+ p'\) into a sum of the background pressure \(\overline{p}(z)\) and perturbation \(p'(\varvec{x}, t)\). We split the density \(\rho = \overline{\rho }+ \rho '\) in the same manner, so that only the perturbations \(p'\) and \(\rho '\) enter the pressure gradient and the gravitational source term of the momentum equation.


Note that a similar and more complex splitting is performed in [10, 12]. In contrast to this, we use the true compressible Navier-Stokes equations with minimal modifications.
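The effect of the splitting can be made explicit with a short derivation. This is a sketch under stated assumptions: the momentum row of Eq. (1) is taken in its usual form with the momentum flux \(\varvec{v} \otimes \rho \varvec{v} + \varvec{I} p\) and the gravitational source \(-\varvec{k} \rho g\), and the symbol \(\varvec{F}^{v}_{\rho \varvec{v}}\) is our shorthand for the viscous momentum flux.

```latex
\begin{aligned}
\frac{\partial \rho \varvec{v}}{\partial t}
  + \nabla \cdot \left( \varvec{v} \otimes \rho \varvec{v} + \varvec{I} p
  + \varvec{F}^{v}_{\rho \varvec{v}} \right)
  &= -\varvec{k} \rho g
  && \text{(assumed momentum equation)} \\
\nabla \cdot \left( \varvec{I} p \right)
  = \nabla \overline{p} + \nabla p'
  &= -\varvec{k} \overline{\rho} g + \nabla p'
  && \text{(hydrostatic background)} \\
\frac{\partial \rho \varvec{v}}{\partial t}
  + \nabla \cdot \left( \varvec{v} \otimes \rho \varvec{v} + \varvec{I} p'
  + \varvec{F}^{v}_{\rho \varvec{v}} \right)
  &= -\varvec{k} \left( \rho - \overline{\rho} \right) g
  = -\varvec{k} \rho' g
  && \text{(split equation)}
\end{aligned}
```

The large background terms \(\nabla \overline{p}\) and \(-\varvec{k} \overline{\rho } g\) cancel analytically, so the discretization only has to balance the much smaller perturbations.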

3 Numerics

The ExaHyPE Engine implements an ader-dg-scheme and a muscl-Hancock finite volume method. Both can be considered as instances of the more general PnPm schemes of [5]. We use a Rusanov-style flux that is adapted to pdes with viscous terms [7, 8]. The finite volume scheme is stabilized with the van Albada limiter [15]. The user can state dynamic amr rules by supplying custom criteria that are evaluated point-wise. Our criterion uses an element-local error estimate based on the total variation of the numerical solution. We exploit the fact that the total variation of a numerical solution is a perfect indicator for edges of a wavefront. Let \(\varvec{f}(\varvec{x}): \mathbb {R}^{N_\text {vars}} \rightarrow \mathbb {R}\) be a sufficiently smooth function that maps the discrete solution at a point \(\varvec{x}\) to an arbitrary indicator variable. The total variation (tv) of this function is defined by

$$\begin{aligned} \mathrm {TV}\left[ \varvec{f}(\varvec{x}) \right] = \left\Vert \int _C \left| \nabla \varvec{f}\left( \varvec{Q}(\varvec{x}) \right) \right| \,\mathrm {d}\varvec{x} \right\Vert _1 \end{aligned}$$

for each cell \(C\). The operator \(\Vert \cdot \Vert _1\) denotes the discrete \(L_1\) norm in this equation. We compute the integral efficiently with Gaussian quadrature over the collocated quadrature points. To decide whether a cell is important, we compute the mean and the population standard deviation of the total variation of all cells. It is important that we use the method of [2] to compute these moments in a parallel and numerically stable manner. A cell is then considered to contain significant information if its total variation deviates from the mean by more than a given threshold. This criterion can be described formally by

$$\begin{aligned} \text {evaluate-refinement}(\varvec{Q}, \mu , \sigma ) = {\left\{ \begin{array}{ll} \text {refine} &{} \text {if } \mathrm {TV}(\varvec{Q}) \ge \mu + T_\text {refine}\sigma , \\ \text {coarsen} &{} \text {if } \mathrm {TV}(\varvec{Q}) < \mu + T_\text {coarsen}\sigma , \\ \text {keep} &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

The parameters \(T_\text {refine}> T_\text {coarsen}\) can be chosen freely. Chebyshev’s inequality

$$\begin{aligned} \mathbb {P}\bigl (\vert X - \mu \vert \ge c \sigma \bigr ) \le \frac{1}{c^2}, \end{aligned}$$

with probability \(\mathbb {P}\) guarantees that we neither mark all cells for refinement nor for coarsening. This inequality holds for arbitrary distributions under the weak assumption that they have a finite mean \(\mu \) and a finite standard deviation \(\sigma \) [16]. Note that subcells are coarsened only if all subcells belonging to the coarse cell are marked for coarsening. In contrast to already published criteria, which are either designed solely for the simulation of clouds [12] or computationally expensive [7], our criterion works for arbitrary pdes, yet is easy to compute and intuitive.
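The criterion can be sketched in a few lines. The following is an illustration only and not the ExaHyPE implementation: all class and function names are hypothetical, the total variation values are made up, and the merge step uses the well-known pairwise update formula for mean and variance, which we assume is the kind of method referred to by [2]. The thresholds are those of the colliding bubbles scenario in Sect. 4.2.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    """Count, mean and sum of squared deviations (m2) of a set of values."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @classmethod
    def from_values(cls, values):
        values = list(values)
        if not values:
            return cls()
        mean = sum(values) / len(values)
        return cls(len(values), mean, sum((v - mean) ** 2 for v in values))

    def merge(self, other):
        """Combine two partial results without revisiting the raw data."""
        n = self.n + other.n
        if n == 0:
            return Stats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta ** 2 * self.n * other.n / n
        return Stats(n, mean, m2)

    @property
    def std(self):
        """Population standard deviation."""
        return math.sqrt(self.m2 / self.n) if self.n > 0 else 0.0


def evaluate_refinement(tv, mean, std, t_refine=2.5, t_coarsen=-0.5):
    """Return the amr decision for a cell with total variation tv."""
    if tv >= mean + t_refine * std:
        return "refine"
    if tv < mean + t_coarsen * std:
        return "coarsen"
    return "keep"


# Two hypothetical ranks hold the total variation of their local cells.
rank_a = [0.1, 0.2, 0.1, 0.3]
rank_b = [0.2, 0.1, 5.0, 0.2, 0.1, 0.2, 0.1, 0.3]

# Each rank reduces its own cells; only the small Stats objects are exchanged.
stats = Stats.from_values(rank_a).merge(Stats.from_values(rank_b))
decisions = [evaluate_refinement(tv, stats.mean, stats.std) for tv in rank_a + rank_b]
print(stats.mean, stats.std, decisions)
```

In this toy data set only the single outlier cell is marked for refinement. As a worked instance of Chebyshev's inequality, \(T_\text {refine}= 2.5\) implies that at most \(1/2.5^2 = 16\%\) of all cells can deviate from the mean by \(2.5\sigma \) or more, so the refined region stays bounded regardless of the distribution of the total variation.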

4 Results

In this section, we evaluate the quality of the results of our numerical methods and the scalability of our implementation. We use a mix of various benchmarking scenarios. After investigating the numerical convergence rate, we look at three standard cfd scenarios: the Taylor-Green vortex, the three-dimensional Arnold-Beltrami-Childress flow and a lid-driven cavity flow. Finally, we evaluate the performance for stratified flow scenarios in both two and three dimensions.

4.1 CFD Testing Scenarios

We begin with a manufactured solution scenario which we can use for a convergence test. We use the following fluid constants for all scenarios in this section:

$$\begin{aligned} \gamma = 1.4, \quad \Pr = 0.7, \quad c_v = 1.0. \end{aligned}$$

Our description of the manufactured solution follows [4]. To construct this solution, we assume that


solves our pde. We use the constants , and simulate a domain of size \(\left[ 10 \times 10 \right] \) for \({0.5}\,\text {s}\). The viscosity is set to \(\mu = 0.1\). Note that Eq. (12) does not solve the compressible Navier-Stokes equations Eq. (1) directly. It rather solves our equation set with an added source term which can be derived with a computer algebra system. We ran this for a combination of orders \(1, \ldots , 6\) and multiple grid sizes. Note that by order we mean the polynomial order throughout the entire paper and not the theoretical convergence order. For this scenario, we achieve high-order convergence (Fig. 1) but notice some diminishing returns for large orders.

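Having established that the implementation of our numerical method converges, we investigate three established testing scenarios from the field of computational fluid mechanics. A simple scenario is the Taylor-Green vortex. Assuming an incompressible fluid, it can be written as

$$\begin{aligned} \rho (\varvec{x}, t) = 1, \quad \varvec{v}(\varvec{x}, t) = \exp (-2 \mu t) \begin{pmatrix} \sin (x) \cos (y) \\ -\cos (x) \sin (y) \end{pmatrix}, \quad p(\varvec{x}, t) = \exp (-4 \mu t) \, \frac{1}{4} \left( \cos (2x) + \cos (2y) \right) + C. \end{aligned}$$

The constant \(C = 100/\gamma \) governs the speed of sound and thus the Mach number \(\text {Ma} = 0.1\) [6]. The viscosity is set to \(\mu = 0.1\).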

We simulate on the domain \([0,2\pi ]^2\) and impose the analytical solution at the boundary. A comparison at time \(t = 10.0\) of the analytical solution for the pressure with our approximation (Fig. 2a) shows excellent agreement. Note that we only show a qualitative analysis because this is not an exact solution for our equation set, as we assume compressibility of the fluid. This is nevertheless a valid comparison because, for very low Mach numbers, the incompressible and compressible equations behave very similarly. We used an ader-dg-scheme of order 5 with a grid of \(25^2\) cells.
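The Mach number quoted for the Taylor-Green vortex can be checked with a short script. This is a sketch under assumptions: it uses the standard two-dimensional Taylor-Green form with unit density and unit velocity amplitude, and a background pressure of \(100/\gamma \), the value that yields \(\text {Ma} = 0.1\) under these assumptions. The function names are hypothetical.

```python
import math

GAMMA = 1.4
MU = 0.1
BACKGROUND_PRESSURE = 100.0 / GAMMA  # assumed value of the pressure constant


def taylor_green(x, y, t):
    """Return (rho, v_x, v_y, p) of the standard 2D Taylor-Green vortex."""
    decay = math.exp(-2.0 * MU * t)
    v_x = decay * math.sin(x) * math.cos(y)
    v_y = -decay * math.cos(x) * math.sin(y)
    p = decay ** 2 * 0.25 * (math.cos(2.0 * x) + math.cos(2.0 * y)) + BACKGROUND_PRESSURE
    return 1.0, v_x, v_y, p


def mach_number(rho, v_x, v_y, p):
    """Local Mach number with the ideal gas speed of sound sqrt(gamma p / rho)."""
    return math.hypot(v_x, v_y) / math.sqrt(GAMMA * p / rho)


def divergence(x, y, t, h=1.0e-5):
    """Central finite difference approximation of the velocity divergence."""
    dvx_dx = (taylor_green(x + h, y, t)[1] - taylor_green(x - h, y, t)[1]) / (2.0 * h)
    dvy_dy = (taylor_green(x, y + h, t)[2] - taylor_green(x, y - h, t)[2]) / (2.0 * h)
    return dvx_dx + dvy_dy


# Point of maximal initial velocity: v = (1, 0) and p equals the background pressure.
ma_max = mach_number(*taylor_green(math.pi / 2.0, 0.0, 0.0))
div = divergence(0.3, 1.1, 0.5)
print(ma_max, div)
```

At the point of maximal initial velocity the speed of sound is \(\sqrt{\gamma \cdot 100/\gamma } = 10\), which gives \(\text {Ma} = 0.1\). The velocity field is divergence-free up to the finite difference error, as expected for an incompressible solution.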

The Arnold-Beltrami-Childress (abc) flow is similar to the Taylor-Green vortex but is an analytical solution for the three-dimensional incompressible Navier-Stokes equations [14]. It is defined in the domain \( \left[ -\pi , \pi \right] ^3 \) as


The constant is chosen as before. We use a viscosity of \(\mu = 0.01\) and analytical boundary conditions. Our results (Fig. 2b) show a good agreement between the analytical solution and our approximation with an ader-dg-scheme of order 3 with a mesh consisting of \(27^3\) cells at time \(t = {0.1}\,\text {s}\). Again, we do not perform a quantitative analysis as the abc-flow only solves our equation set approximately.

Fig. 1. Mesh size vs. error for various polynomial orders P. Dashed lines show the theoretical convergence order of \(P+1\).

Fig. 2. Two-dimensional cfd scenarios

As a final example of standard flow scenarios, we consider the lid-driven cavity flow where the fluid is initially at rest, with \(\rho = 1\) and . We consider a domain of size \({1}\,\text {m} \times {1}\,\text {m}\) which is surrounded by no-slip walls. The flow is driven entirely by the upper wall which has a velocity of \(v_x = {1}\,\text {m/s}\). The simulation runs for \({10}\,\text {s}\). Again, our results (Fig. 3) have an excellent agreement with the reference solution of [9]. We used an ader-dg-method of order 3 with a mesh of size \(27^2\).

4.2 Stratified Flow Scenarios

Our main focus is the simulation of stratified flow scenarios. In the following, we present bubble convection scenarios in both two and three dimensions. With the constants

$$\begin{aligned} \gamma = 1.4 ,\quad \Pr = 0.71 ,\quad R = 287.058 ,\quad p_0 = 10^5\text {Pa}, \quad g = {9.8}\,{\text {m}/\text {s}^2}, \end{aligned}$$

all following scenarios are described in terms of the potential temperature

$$\begin{aligned} \theta = T \left( \frac{p_0}{p} \right) ^{R/c_p}, \end{aligned}$$

with reference pressure \(p_0\) [10, 12]. We compute the initial background density and pressure by inserting the assumption of a constant background energy in Eq. (6). The background atmosphere is then perturbed. We set the density and energy at the boundary such that it corresponds to the background atmosphere. Furthermore, to ensure that the atmosphere stays in hydrostatic balance, we need to impose the viscous heat flux


at the boundary [10]. In this equation, \(\overline{T}(z)\) is the background temperature at position z, which can be computed from Eqs. (2) and (6).
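Because all stratified scenarios are specified in terms of the potential temperature, a conversion between \(T\) and \(\theta \) is needed when setting up initial conditions. The following sketch uses the constants given above and assumes the ideal gas relation \(c_p = \gamma R/(\gamma - 1)\); the function names are hypothetical.

```python
GAMMA = 1.4
GAS_CONSTANT = 287.058  # R
P_REF = 1.0e5           # reference pressure p_0 in Pa
C_P = GAMMA * GAS_CONSTANT / (GAMMA - 1.0)  # assumed ideal gas relation


def potential_temperature(temperature, pressure):
    """theta = T (p_0 / p)^(R / c_p)."""
    return temperature * (P_REF / pressure) ** (GAS_CONSTANT / C_P)


def temperature_from_theta(theta, pressure):
    """Inverse relation: T = theta (p / p_0)^(R / c_p)."""
    return theta * (pressure / P_REF) ** (GAS_CONSTANT / C_P)


theta_ref = potential_temperature(300.0, P_REF)   # equals T at the reference pressure
theta_aloft = potential_temperature(300.0, 9.0e4)  # larger than T below p_0
print(theta_ref, theta_aloft)
```

At the reference pressure the potential temperature coincides with the temperature, and for \(p < p_0\) it is larger than \(T\). The exponent \(R/c_p\) reduces to \((\gamma - 1)/\gamma \) under the stated assumption.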

Our first scenario is the colliding bubbles scenario [12]. We use perturbations of the form

$$\begin{aligned} \theta '= {\left\{ \begin{array}{ll} A &{} r \le a, \\ A \exp \left( - \frac{(r-a)^2}{s^2} \right) &{} r > a, \end{array}\right. } \end{aligned}$$

where s is the decay rate and r is the radius to the center

$$\begin{aligned} r = \Vert \varvec{x} - \varvec{x_c} \Vert _2, \end{aligned}$$

i.e., r denotes the Euclidean distance between the spatial positions \(\varvec{x} = (x, z)\) and the center of a bubble \(\varvec{x_c} = (x_c, z_c)\) – for three-dimensional scenarios \(\varvec{x}\) and \(\varvec{x_c}\) also contain a y coordinate.

We have two bubbles, with constants

$$\begin{aligned} \begin{array}{llllll} \displaystyle \text {warm:} &{}\qquad A = {0.5}\,\text {K}, &{}\quad a = {150}\,\text {m}, &{}\quad s = {50}\,\text {m}, &{}\quad x_c = {500}\,\text {m,} &{}\quad z_c = {300}\,\text {m},\\ \displaystyle \text {cold:} &{}\qquad A = {-0.15}\,\text {K}, &{}\quad a = {0}\,\text {m}, &{}\quad s = {50}\,\text {m}, &{}\quad x_c = {560}\,\text {m}, &{}\quad z_c = {640}\,\text {m}. \end{array} \end{aligned}$$
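The perturbation profile and the two parameter sets translate directly into code. The sketch below evaluates each bubble separately; how the two perturbations are combined into one initial condition is not specified here, so no combination is attempted. The function and dictionary names are hypothetical.

```python
import math


def bubble_perturbation(x, z, A, a, s, x_c, z_c):
    """Potential temperature perturbation: constant core with Gaussian decay outside."""
    r = math.hypot(x - x_c, z - z_c)  # Euclidean distance to the bubble center
    if r <= a:
        return A
    return A * math.exp(-((r - a) ** 2) / s ** 2)


# Parameters of the colliding bubbles scenario (K and m).
WARM = dict(A=0.5, a=150.0, s=50.0, x_c=500.0, z_c=300.0)
COLD = dict(A=-0.15, a=0.0, s=50.0, x_c=560.0, z_c=640.0)

warm_center = bubble_perturbation(500.0, 300.0, **WARM)
warm_one_decay_length = bubble_perturbation(700.0, 300.0, **WARM)  # r = a + s
cold_center = bubble_perturbation(560.0, 640.0, **COLD)
print(warm_center, warm_one_decay_length, cold_center)
```

The warm bubble has a flat core of radius 150 m, whereas the cold bubble with \(a = 0\) is a pure Gaussian. One decay length \(s\) outside the core the perturbation has dropped to \(A/e\).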

Similar to [12], we use a constant viscosity of \(\mu = 0.001\) to regularize the solution. Note that we use a different implementation of viscosity than [12]. Hence, it is difficult to compare the parametrization directly. We ran this scenario twice: once without amr on a uniform mesh with a cell size of \({1000/81}\,\text {m} = {12.35}\,\text {m}\) and once with amr with two adaptive refinement levels and parameters \(T_\text {refine}= 2.5\) and \(T_\text {coarsen}= -0.5\). For both settings we used polynomials of order 6. We specialize the amr-criterion (Eq. 9) to our stratified flows by using the potential temperature as the indicator variable. The amr run resulted in a mesh with cell sizes of approx. 111.1 m, 37.04 m, and 12.35 m. The resulting mesh can be seen in Fig. 5.
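The quoted cell sizes follow from repeated three-way splitting of the domain. The sketch below assumes a domain width of 1000 m, as implied by the cell size \({1000/81}\,\text {m}\), and a coarsest mesh of \(9^2\) cells; the function name is hypothetical.

```python
def cell_sizes(domain_length, coarsest_level, refinement_levels):
    """Cell sizes for three-way splitting, from the coarsest to the finest level."""
    levels = range(coarsest_level, coarsest_level + refinement_levels + 1)
    return [domain_length / 3 ** level for level in levels]


sizes = cell_sizes(1000.0, coarsest_level=2, refinement_levels=2)
uniform_cells = (3 ** 4) ** 2  # fully refined two-dimensional mesh
print([round(size, 2) for size in sizes], uniform_cells)
```

This reproduces the three cell sizes of the amr run and the \(81^2 = 6561\) cells of the fully refined mesh.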

Fig. 3. Our approximation (solid lines) of the lid-driven cavity flow vs. reference solution (crosses) of [9]. The respective other coordinate is held constant at a value of 0.

Fig. 4. Colliding bubbles with muscl-Hancock. Contour values for potential temperature perturbation are \(-0.05, 0.05, 0.1, \ldots 0.45\).

Fig. 5. Left: Colliding bubbles with ader-dg. Contour values for potential temperature perturbation are \(-0.05, 0.05, 0.1, \ldots 0.45\). Right: Comparison of small scale structure between order 3 (top) and order 6 (bottom).

We observe that the \(L_2\) difference between the potential temperature of the amr run, which uses 1953 cells, and the one of the fully refined run with 6561 cells, is only 1.87. The relative error is \({2.6\times 10^{-6}}\). We further emphasize that our amr-criterion accurately tracks the edges of the cloud rather than only its position. This is the main advantage of our gradient-based method in contrast to methods working directly with the value of the solution, as for example [12]. Overall, our result for this benchmark is in excellent agreement with the previous solutions of [12]. In addition, we ran a simulation with the same settings for a polynomial order of 3. The lower resolution leads to spurious waves (Fig. 5) and does not capture the behavior of the cloud.

Furthermore, we simulated the same scenario with our muscl-Hancock method, using \(7^2\) patches with \(90^2\) finite volume cells each. As we use limiting, we do not need any viscosity. The results of this method (Fig. 4) also agree with the reference but contain fewer details. Note that the numerical dissipativity of the finite volume scheme has a smoothing effect that is similar to the smoothing caused by viscosity.

Fig. 6. Cosine bubble scenario.

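For our second scenario, the cosine bubble, we use a perturbation of the form

$$\begin{aligned} \theta '= {\left\{ \begin{array}{ll} \frac{A}{2} \left[ 1 + \cos \left( \frac{\pi r}{a} \right) \right] &{} r \le a, \\ 0 &{} r > a, \end{array}\right. } \end{aligned}$$

where A denotes the maximal perturbation and a is the size of the bubble. We use the constants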

$$\begin{aligned} A = {0.5}\,\text {K}, \quad a = {250}\,\text {m}, \quad x_c = {500}\,\text {m}, \quad z_c = {350}\,\text {m}. \end{aligned}$$

For the three-dimensional bubble, we set \(y_c = x_c = {500}\,\text {m}\). This corresponds to the parameters used in [11]. For the 2D case, we use a constant viscosity of \(\mu = 0.001\) and an ader-dg-method of order 6 with two levels of dynamic amr, resulting again in cell sizes of roughly \({111.1}\,\text {m}, {37.04}\,\text {m}, {12.35}\,\text {m}\). We use slightly different amr parameters of \(T_\text {refine}= 1.5\) and \(T_\text {coarsen}= -0.5\) and let the simulation run for 700 s. Note that, as seen in Fig. 6a, our amr-criterion tracks the wavefront of the cloud accurately. This result is in excellent agreement with those achieved in [10, 12].

For the 3D case, we use an ader-dg-scheme of order 3 with a static mesh with cell sizes of 40 m and a shorter simulation duration of 400 s. Due to the relatively coarse resolution and the resulting increase in aliasing errors, we need to increase the viscosity to \(\mu = 0.005\). This corresponds to a larger amount of smoothing. Our results (Fig. 6b) capture the dynamics of the scenario well and agree with the reference solution of [11].

4.3 Scalability

All two-dimensional scenarios presented in this paper can be run on a single workstation in less than two days. Parallel scalability was thus not the primary goal of this paper. Nevertheless, our implementation allows us to scale to small to medium scale setups using a combined mpi + Threading Building Blocks (tbb) parallelization strategy, which works as follows: We typically first choose a number of mpi ranks that ensures a balanced load. ExaHyPE achieves best scalability for \(1, 10, 83, \ldots \) ranks, as our underlying framework uses three-way splittings for each level and per dimension and an additional communication rank per level. For the desired number of compute nodes, we then determine the number of tbb threads per rank to match the total number of available cores.
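The sequence of well-balanced rank counts can be reproduced with a simple formula. This is our reading of the description above and an assumption rather than documented ExaHyPE behavior: each level splits every dimension three ways, and each level adds one communication rank. The function names and the figure of 48 cores per node (taken from the full-node measurement reported for SuperMUC-NG) are illustrative.

```python
def balanced_rank_count(levels, dimensions=2):
    """Compute ranks from three-way splitting plus one communication rank per level."""
    return 3 ** (dimensions * levels) + levels


def threads_per_rank(nodes, cores_per_node, ranks):
    """Number of tbb threads per mpi rank that matches the available cores."""
    return (nodes * cores_per_node) // ranks


rank_counts = [balanced_rank_count(level) for level in range(3)]
threads = threads_per_rank(nodes=5, cores_per_node=48, ranks=rank_counts[1])
print(rank_counts, threads)
```

For two dimensions this yields 1, 10 and 83 ranks. Five nodes with 10 ranks leave 24 threads per rank, which matches the largest multi-node configuration used in the measurements.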

We ran the two-bubble scenario for a uniform grid with a mesh size of \(729 \times 729\) with order 6, resulting in roughly 104 million degrees of freedom (dof), for 20 timesteps and for multiple combinations of mpi ranks and tbb threads. This simulation was performed on the SuperMUC-NG system using computational kernels that are optimized for its Skylake architecture. Using a single mpi rank, we get roughly 4.9 million dof updates per second (\(\text {MDOF/s}\)) using two tbb threads and \(20.2 \text {MDOF/s}\) using 24 threads (i.e., half a node). For a full node with 48 threads, we get a performance of \(12 \text {MDOF/s}\). When using 5 nodes with 10 mpi ranks, we achieve \(29.3 \text {MDOF/s}\) for two threads and \(137.3 \text {MDOF/s}\) for 24 threads.
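The reported numbers can be put into perspective with a little arithmetic. The sketch below assumes \((N+1)^2\) nodal values per cell for order \(N = 6\) and four conserved variables in two dimensions (density, two momentum components, energy); the helper function is hypothetical.

```python
def parallel_efficiency(perf_base, threads_base, perf, threads):
    """Measured speedup divided by the ideal speedup."""
    return (perf / perf_base) / (threads / threads_base)


cells = 729 * 729
nodes_per_cell = (6 + 1) ** 2  # assumption: (N + 1)^2 nodes per cell for order N = 6
variables = 4                  # assumption: rho, rho v_x, rho v_z, rho E
dof = cells * nodes_per_cell * variables

# Single rank, 2 -> 24 threads, and 10 ranks, 2 -> 24 threads per rank.
efficiency_one_rank = parallel_efficiency(4.9, 2, 20.2, 24)
efficiency_ten_ranks = parallel_efficiency(29.3, 2, 137.3, 24)
print(dof, efficiency_one_rank, efficiency_ten_ranks)
```

The count of 104,162,436 agrees with the roughly 104 million dof stated above. Going from 2 to 24 threads gives a speedup of about 4.1 on one rank and about 4.7 on ten ranks, i.e., a parallel efficiency of roughly 34% and 39%.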

We further note that for our scenarios weak scaling is more important than strong scaling, as we currently cover only a small area containing a single cloud, whereas in practical applications one would like to simulate more complex scenarios.

5 Conclusion

We presented an implementation of a muscl-Hancock-scheme and an ader-dg-method with amr for the Navier-Stokes equations, based on the ExaHyPE Engine. We showed that our method achieves high-order convergence and evaluated it successfully for standard cfd scenarios: we obtained competitive results for the two-dimensional scenarios (Taylor-Green vortex and lid-driven cavity) and for the three-dimensional abc-flow.

Furthermore, our method allows us to simulate flows in hydrostatic equilibrium correctly, as our results for the cosine and colliding bubble scenarios showed. We showed that our amr-criterion is able to vastly reduce the number of grid cells while preserving the quality of the results.

Future work should be directed towards improving the scalability. With an improved amr scaling and some fine tuning of the parallelization strategy, the numerical method presented here might be a good candidate for the simulation of the small-scale convection processes that lead to cloud formation.