Abstract
We develop error-control-based time integration algorithms for compressible fluid dynamics (CFD) applications and show that they are efficient and robust in both the accuracy-limited and stability-limited regime. Focusing on discontinuous spectral element semidiscretizations, we design new controllers for existing methods and for some new embedded Runge-Kutta pairs. We demonstrate the importance of choosing adequate controller parameters and provide a means to obtain these in practice. We compare a wide range of error-control-based methods, along with the common approach in which step size control is based on the Courant-Friedrichs-Lewy (CFL) number. The optimized methods give improved performance and naturally adopt a step size close to the maximum stable CFL number at loose tolerances, while additionally providing control of the temporal error at tighter tolerances. The numerical examples include challenging industrial CFD applications.
1 Introduction
Systems of hyperbolic conservation laws are used to model many areas of science and engineering, such as fluid dynamics, acoustics, and electrodynamics. In practical applications, these systems must often be solved numerically. Explicit Runge-Kutta schemes are the most commonly used time discretizations for hyperbolic partial differential equations (PDEs), because of their efficiency and parallel scalability [24, 46, 51]. Overall efficiency of the method also depends on choosing a time step that is as large as possible while still satisfying stability and accuracy requirements. Since stability requirements are frequently more restrictive in this setting, hyperbolic PDE practitioners often adapt the time step size based on a desired Courant-Friedrichs-Lewy (CFL) number. The CFL number involves the ratio of the maximum characteristic speed to the mesh spacing, which is essentially a proxy for the norm of the Jacobian. The optimal CFL number depends on the space and time discretizations chosen, and possibly on the problem; it is often determined by trial and error.
On the other hand, time integration research has long emphasized the efficiency of error-based step size control. Much effort has gone into the design of embedded Runge-Kutta pairs and step size controllers for this purpose. Compared to CFL-based control, error-based control has the advantage of not requiring a manually tuned CFL number and allowing for control of the temporal error when necessary. CFL-based control has the advantage of (usually) yielding near-optimal efficiency once the appropriate CFL value has been found, as long as the calculation is indeed stability-limited. Error-based step size control for convection-dominated problems has been attempted previously; see e.g., [7, 80]. An ideal time integration algorithm would achieve the efficiency of the CFL-based controller in the stability-limited regime without the need for manually tuned parameters, while automatically reducing the step size if error control becomes a more restrictive requirement. In this work, we develop such algorithms in the context of computational fluid dynamics (CFD).
Specifically, we focus on low-storage Runge-Kutta pairs (reviewed in Sect. 2) combined with PID step size controllers (reviewed in Sect. 3) and spectral element methods. Spectral element methods can be very efficient for large-scale computations [5, 27, 32, 34, 79]. Because stability is a challenging issue for these schemes, a lot of effort has been devoted to developing energy stable (linearly stable) [3, 54, 78], and entropy stable (nonlinearly stable) spatial discretizations [13, 15, 20, 22, 23, 67, 71, 72]. Stable fully discrete schemes can be obtained from these semidiscretizations by using a slight modification of classical time integration schemes, based on the relaxation approach [41, 63, 65, 66].
In the paradigm of CFL-based step size control, a common approach to time integrator design is to seek a large region of absolute stability (see e.g., [21] for a recent example of this approach in the context of CFD). For error-based control, a large region of absolute stability is again important (for both the main method and the embedded method). Additionally, when automatic error control is used with step sizes near the stability limit, the concept of step size control stability becomes crucial to the design of the controller. We demonstrate the importance of choosing good step size controllers in Sect. 4. There exists some previous work on developing error-based step size control techniques for convection-dominated problems, principally by Berzins et al. [7, 80].
We compare some existing Runge-Kutta pairs in Sect. 5, and develop optimized Runge-Kutta pairs for discontinuous spectral element semidiscretizations of hyperbolic conservation laws in Sect. 6. The spectral element methods applied for the numerical experiments are implemented in the hp-adaptive, unstructured, curvilinear grid solver SSDC [55]. SSDC is built on top of the Portable and Extensible Toolkit for Scientific computing (PETSc) [6], its mesh topology abstraction (DMPLEX) [44], and its scalable ODE/DAE solver library [1]. Further details on the spatial semidiscretizations can be found in [13, 14, 20, 56, 57, 67]. We perform numerical experiments using the novel schemes in Sect. 7, both for the compressible Euler and Navier-Stokes equations. Finally, we summarize and discuss our results in Sect. 8. We contributed our optimized methods to the freely available open source library DifferentialEquations.jl [62] written in Julia [8].
2 Runge-Kutta Methods and Adaptive Time Stepping
Using the method of lines, a spatial semidiscretization of a hyperbolic PDE yields an ordinary differential equation (ODE) system
\[ \frac{\hbox {d}}{\hbox {d}t} u(t) = f(t, u(t)), \qquad u(0) = u^0, \tag{1} \]
where \(u:[0,T] \rightarrow {\mathbb {R}}^m\) and \(m\) is the number of degrees of freedom in the spatial discretization. An explicit first-same-as-last (FSAL) Runge-Kutta pair with s stages can be described by its Butcher tableau [12, 28]
\[ \begin{array}{c|c} c & A \\ \hline & b^{\text{T}} \\ & {\widehat{b}}^{\text{T}} \end{array} \tag{2} \]
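where \(A \in {\mathbb {R}}^{s \times s}\) is strictly lower-triangular, \(b, c \in {\mathbb {R}}^s\), and \({\widehat{b}}\in {\mathbb {R}}^{s+1}\). For (1), a step from \(u^n \approx u(t_n)\) to \(u^{n+1} \approx u(t_{n+1})\), where \(t_{n+1} = t_n + {\Delta t}_n\), is given by
\[ \begin{aligned} y_i &= u^n + {\Delta t}_n \sum _{j=1}^{i-1} a_{ij} \, f(t_n + c_j {\Delta t}_n, y_j), \qquad i \in \{1, \dots , s\}, \\ u^{n+1} &= u^n + {\Delta t}_n \sum _{i=1}^{s} b_i \, f(t_n + c_i {\Delta t}_n, y_i), \\ {\widehat{u}}^{n+1} &= u^n + {\Delta t}_n \sum _{i=1}^{s} {\widehat{b}}_i \, f(t_n + c_i {\Delta t}_n, y_i) + {\Delta t}_n {\widehat{b}}_{s+1} f(t_{n+1}, u^{n+1}). \end{aligned} \tag{3} \]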
Here, \(y_i\) are the stage values of the Runge-Kutta method and the difference \(u - {\widehat{u}}\) is used to estimate the local truncation error. If \({\widehat{b}}_{s+1}=0,\) then (3) is an ordinary Runge-Kutta pair; otherwise, it is referred to as an FSAL Runge-Kutta pair. The FSAL idea is to use the derivative of the new solution as an additional input for the error estimator [19]. If the step is accepted, this costs nothing since the value \(f(t_{n+1}, u^{n+1})\) must be computed at the next step anyway. Usually, for a main method of order q, the embedded method is chosen to be of order \({\widehat{q}}= q - 1\); i.e., the schemes are used in local extrapolation mode.
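To make the step (3) and the FSAL idea concrete, the following minimal sketch performs one step of the Bogacki-Shampine 3(2) pair, whose coefficients are standard. The function name `bs32_step`, its interface, and the use of plain Python numbers are illustrative assumptions; this is not the implementation used for the experiments in this work.

```python
def bs32_step(f, t, u, dt, k1=None):
    """One step of the Bogacki-Shampine 3(2) FSAL pair for u' = f(t, u).

    Returns the third-order solution, the error estimate u_new - u_hat,
    and the derivative at the new point, which can be passed back in as
    `k1` for the next step (the FSAL property).
    """
    if k1 is None:
        k1 = f(t, u)  # only needed for the very first step
    k2 = f(t + dt / 2, u + dt * k1 / 2)
    k3 = f(t + 3 * dt / 4, u + 3 * dt * k2 / 4)
    # Main (third-order) method with b = (2/9, 1/3, 4/9).
    u_new = u + dt * (2 * k1 / 9 + k2 / 3 + 4 * k3 / 9)
    # FSAL stage: derivative of the new solution, reused in the next step.
    k4 = f(t + dt, u_new)
    # Embedded (second-order) method with b_hat = (7/24, 1/4, 1/3, 1/8).
    u_hat = u + dt * (7 * k1 / 24 + k2 / 4 + k3 / 3 + k4 / 8)
    return u_new, u_new - u_hat, k4
```

If a step is accepted, the returned derivative is reused, so each accepted step costs three new evaluations of `f` even though four stage derivatives enter the error estimate.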
Remark 1
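There are different notations for FSAL methods. A common alternative to our choice of using \(A \in {\mathbb {R}}^{s \times s}\), \(b, c \in {\mathbb {R}}^{s}\), and \({\widehat{b}}\in {\mathbb {R}}^{s+1}\) is to embed the baseline s-stage Runge-Kutta method in a method with \(s+1\) stages and Butcher coefficients
\[ {\tilde{A}} = \begin{pmatrix} A & 0 \\ b^{\text{T}} & 0 \end{pmatrix}, \qquad {\tilde{b}} = \begin{pmatrix} b \\ 0 \end{pmatrix}, \qquad {\tilde{c}} = \begin{pmatrix} c \\ 1 \end{pmatrix}. \tag{4} \]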
Then, the last row of \({\tilde{A}}\) is equal to \({\tilde{b}}\) and \({\widehat{b}}\in {\mathbb {R}}^{s+1}\) can be defined as usual.
The common assumption \(\sum _j a_{ij} = c_i\) is used throughout this article. For methods with error-based step size control, the initial step size is chosen using the algorithm described in [28, p. 169].
2.1 Low-Storage Methods
A typical Runge-Kutta implementation requires simultaneous storage of all of the stages and/or their derivatives. Each stage or derivative occupies m words; we refer to this amount of storage (sufficient for holding a copy of the solution on the spatial grid at one point in time) as a register. A low-storage Runge-Kutta method is one that can be implemented using only a few registers; herein we consider methods that require just three or four registers. Note that three registers are the fewest possible if one requires an error estimator and the ability to reject a step.
We consider the low-storage method classes (with and without the FSAL technique):

3S*: three-register methods that include an error estimate;

3S*_{+}: three-register methods that require a fourth register for the error estimate.
Let \(S_j\) denote a given storage register. The class 3S* methods, introduced in [40], can be implemented using only three storage registers of size \(m\) if assignments of the form
can be made with only \(m+ o(m)\) memory. Otherwise, an additional register is required. The 3S* method family is parameterized by the coefficients \(c_i, \gamma _{1,i}, \gamma _{2,i}, \gamma _{3,i}, \beta _{i}, \delta _{i}\) and can be implemented as described in Algorithm 1. We will also use 3S* or 3S*_{+} to denote some strong stability preserving (SSP) Runge-Kutta methods that can be implemented using a slight modification of these algorithms, as described in [39].
The FSAL technique has been applied to low-storage methods in [53]; these schemes append an FSAL stage to 3S* methods to get more coefficients for the embedded error estimator.
All 3S*_{+} methods use an additional storage location for the embedded error estimator. If the embedded method is not used, they reduce to 3S* methods without an embedded scheme. Their low-storage implementation is delineated in Algorithm 2.
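The register-update pattern behind such methods can be sketched as follows. This is a minimal illustration that assumes the commonly stated form of the 3S* loop (accumulate into the second register, then overwrite the first); the algorithms referenced above may differ in details such as how the error estimate is formed. The function name `step_3s_star` and the coefficient set in the usage note are hypothetical and are not taken from any method in this work.

```python
def step_3s_star(f, t, u, dt, c, gamma1, gamma2, gamma3, beta, delta):
    """One step of a 3S*-type scheme without an embedded error estimate.

    s1, s2, s3 play the role of the three storage registers. With arrays,
    each update below would be performed in place.
    """
    s1 = u          # current stage value
    s2 = 0.0 * u    # running weighted sum of previous stage values
    s3 = u          # copy of the old solution, kept for the whole step
    for i in range(len(beta)):
        s2 = s2 + delta[i] * s1
        s1 = (gamma1[i] * s1 + gamma2[i] * s2 + gamma3[i] * s3
              + beta[i] * dt * f(t + c[i] * dt, s1))
    return s1
```

As a check of the pattern, the illustrative choice c = (0, 1/2), gamma1 = (1, 0), gamma2 = (0, 1), gamma3 = (0, 0), beta = (1/2, 1), delta = (1, 0) reproduces the explicit midpoint rule, so one step of u' = -u with step size 0.1 from u = 1 gives 1 - 0.1 + 0.005 = 0.905.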
2.2 Error-Based Step Size Control
We use step size controllers based on digital signal processing [25, 26, 73, 74, 75] implemented in PETSc [1, 6]. In particular, we use PID controllers that select a new time step using the formula
where q is the order of the main method, \({\widehat{q}}\) is the order of the embedded method (usually \({\widehat{q}}= q - 1\)), \(k = \min (q, {\widehat{q}}) + 1\) (usually \(k = {\widehat{q}}+ 1 = q\)), \(\beta _i\) are the controller parameters, and
where \(m\) is the number of degrees of freedom in u, and \(\texttt {atol}\), \(\texttt {rtol}\) are the absolute and relative error tolerances, respectively. Some common controller parameters recommended in the literature are given in Table 1. Unless stated otherwise, we use equal absolute and relative error tolerances. The choice of the weighted/relative error estimate \(w_{n+1}\) is common in the literature [28, Equations (4.10) and (4.11)] and often the default choice in general purpose ODE software such as PETSc [1] or DifferentialEquations.jl [62]. This choice of \(w_{n+1}\) allows one to decouple the time integration parameters from a possible spatial semidiscretization. In contrast to a quadrature-based approach, it weighs degrees of freedom of different refinement levels in the same way, which can be beneficial, since refined regions (of interest) are not weighed less than coarse regions (without interesting solution features).
If the factor multiplying the old time step \({\Delta t}_{n}\) is too small or the solution is out of physical bounds, e.g., because of negative density/pressure in CFD, the step is rejected and retried with a smaller time step \({\Delta t}_{n}\). The default options used in all numerical experiments described in this work accept a step if the factor multiplying the step size is at least \(0.9^2\). Otherwise, the step is rejected and retried with the step size predicted by the PID controller. If the solution is out of physical bounds, the step is rejected and retried with a time step reduced by a factor of four.
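A minimal sketch of such a controller is given below. It assumes the common multiplicative form in which the step size factor is the product of \(\varepsilon ^{\beta _i / k}\) over the three most recent values of \(\varepsilon = 1/w\), and a root-mean-square weighted error norm with scale \(\texttt {atol} + \texttt {rtol} \max (|u_i|, |{\widehat{u}}_i|)\). Both forms, all function names, and the omission of any step size limiter are assumptions of this sketch and may differ from the PETSc implementation.

```python
import math

def weighted_error(u_new, u_hat, atol, rtol):
    """Root-mean-square of the componentwise error relative to the tolerances."""
    total = 0.0
    for x, x_hat in zip(u_new, u_hat):
        scale = atol + rtol * max(abs(x), abs(x_hat))
        total += ((x - x_hat) / scale) ** 2
    return math.sqrt(total / len(u_new))

def pid_factor(w_hist, beta, k):
    """Factor multiplying the old step size.

    w_hist = (w_{n+1}, w_n, w_{n-1}) are weighted error estimates and
    beta = (beta_1, beta_2, beta_3) are the controller parameters.
    """
    factor = 1.0
    for w, b in zip(w_hist, beta):
        eps = 1.0 / max(w, 1e-16)  # guard against a vanishing error estimate
        factor *= eps ** (b / k)
    return factor

def propose_step(dt, w_hist, beta, k, accept_threshold=0.9**2):
    """Accept the step if the factor is at least 0.9^2 and return the new step size."""
    factor = pid_factor(w_hist, beta, k)
    return factor >= accept_threshold, factor * dt
```

With beta = (1, 0, 0) the sketch reduces to a plain I controller. For example, `propose_step(0.1, (4.0, 1.0, 1.0), (1.0, 0.0, 0.0), 1)` rejects the step and proposes a retry with a quarter of the step size.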
3 CFL vs. Error-Based Step Size Control
The error-based step size control described above is efficient if the practical time step is limited by the constraint of accuracy. On the other hand, if the allowable time step is determined by stability, and an explicit time discretization is employed, then it is natural to use a step size of the form \({\Delta t}_n \propto 1/L_n\), where \(L_n\) is an approximation of the norm of the Jacobian of the ODE system.
In the time integration of hyperbolic PDEs, it is indeed often the case that the step size is limited in practice by stability rather than accuracy. Therefore it is common practice to use a step size control of the kind just described. For such systems, the norm of the Jacobian is proportional to \(\max _i(\lambda _\mathrm {max}(u^n_i)/{\Delta x}_i)\), where \({\Delta x}_i\) is a local measure of the mesh spacing (at grid point/cell/element i), and \(\lambda _\mathrm {max}\) is the maximal (local) wave speed, related to the largest-magnitude eigenvalue of the flux Jacobian of the hyperbolic system. The step size control thus takes the form (referred to herein as a CFL-based control)
\[ {\Delta t}_n = \nu \min _i \frac{{\Delta x}_i}{\lambda _\mathrm {max}(u^n_i)}, \tag{8} \]
where \(\nu \) is the desired CFL number. The appropriate choice of \(\nu \) depends on the details of the space and time discretizations; it can be studied theoretically using linearization (see e.g., [49]) but is often determined experimentally. An additional complication is the question of how to define \({\Delta x}\). Even on uniform Cartesian grids and regular triangulations [48], multiple waves traveling in different directions make an optimal choice of \(\nu \) difficult. This question is even more challenging for unstructured grids.
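A CFL-based control of this kind can be sketched in a few lines, assuming that the step size is \(\nu \) times the smallest ratio of local mesh spacing to local maximal wave speed. The function name `cfl_step_size` and the list-based inputs are illustrative assumptions.

```python
def cfl_step_size(nu, dx, lam_max):
    """CFL-based step size: nu * min_i(dx_i / lam_max_i).

    `dx` holds local mesh spacings and `lam_max` the local maximal wave
    speeds. Nodes with zero wave speed impose no restriction.
    """
    ratios = (h / lam for h, lam in zip(dx, lam_max) if lam > 0.0)
    return nu * min(ratios, default=float("inf"))
```

The returned step size scales linearly with both \(\nu \) and the chosen definition of \({\Delta x}\), so any mis-estimate of either parameter translates directly into a proportionally smaller (inefficient) or larger (possibly unstable) step.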
A clear advantage of error-based control is the availability of an estimate of the temporal error. At first glance, error-based step size control seems inappropriate in the stability-limited regime, since the local error may not be very sensitive to small differences between stable and unstable step sizes, near the stable step size limit. A tight error tolerance that ensures stability at all steps might result in an excessively small step size. However, as described in [29, Section IV.2] and discussed below, it is possible to design error-based step size controllers that behave appropriately in the stability-limited regime.
Both classes of controllers require some user-determined parameters: \(\nu \) and \({\Delta x}\) for the CFL-based controller, and \(\texttt {atol}\) and \(\texttt {rtol}\) for the error-based controllers. In this section we show through an example that carefully designed error-based controllers can achieve near-maximal efficiency in a way that is relatively insensitive to changes in the user parameters. In contrast, the efficiency of the CFL-based controller always bears a linear sensitivity to the parameters \(\nu \) and \({\Delta x}\).
To demonstrate, we consider the two-dimensional advection equation with constant velocity \(a = (1, 1)^{\text{T}}\) in the domain \([-5,5]^2\) with periodic boundary conditions. An initial sinusoid of one wavelength in each direction is advected over the time interval [0, 100]. In space we apply the spectral collocation method of SSDC based on solution polynomials of degree \(p = 4\) [55].
For the CFL-based controller, the ratio of the local mesh spacing and the maximal speed at a node i is estimated in this case as
where d is the number of spatial dimensions (\(d = 2\) for this example), \(a = (1, 1)^{\text{T}}\) is the constant advection velocity, \(J_i\) is the determinant of the grid Jacobian \(\partial _{x} \xi \) at node i, \((J \partial _{x} \xi ^j)_i\) is the contravariant basis vector in direction j at node i [45, Chapter 6], and \(\sigma \) is a normalizing factor depending on the solution polynomial degree, p, which is usually chosen such that a real stability interval of 2 corresponds to \(\nu = 1\).
Two grids will be used: a regular, uniform grid with \(8^2\) elements and the curvilinear grid shown in Fig. 1. The results are summarized in Fig. 2 for the uniform grid and in Fig. 3 for the unstructured grid. We test three time discretizations: the popular fourth-order, five-stage, low-storage method \(\hbox {CK}{4}(3){5}\)[2N] of [35]; the method \(\hbox {KCL}4(3)5[2\hbox {R}_{+}]\hbox {C}\) of [37], which comes with an embedded third-order error estimator; and the SSP method \(\hbox {SSP3(2)3[3S*}_{+}]\) of [70], equipped with the embedded method of [18].
Remark 2
We use the same naming convention as [37], referring to an s-stage Runge-Kutta method of order q with embedded method of order \({\widehat{q}}\) as \(\hbox {NAME}q({\widehat{q}})s\). Additional identifiers indicating low-storage requirements or other properties are appended, e.g., a subscript “F” for FSAL methods. The number of stages s denotes the effective number of RHS evaluations per step, which is one less than the number of stages for FSAL methods. For low-storage methods, the required amount of memory based on certain assumptions is listed following the notation of [40]. In particular, nN methods need only n memory registers of size m if assignments of the form \(S_j \leftarrow \alpha S_j + f(t, S_i)\) can be made without additional allocations. Similarly, nR methods use n memory registers and assignments of the form \(S_j \leftarrow f(t, S_j)\); mS methods were described in Sect. 2.1. As described there, a subscript \(_+\) indicates methods that require an additional storage register if an embedded error estimator is used. Additional parts of the names of Runge-Kutta methods are usually taken directly from their sources. For example, the method \(\hbox {KCL}4(3)5[2\hbox {R}_{+}]\hbox {C}\) is a fourth-order method with embedded third-order error estimator. It has five stages and requires two memory registers based on the nR assumption. If the embedded error estimator is used, it requires three memory registers. The C suffix is appended as suggested in [37] to indicate a particular design criterion (in this particular case, looking for a compromise between linear stability and accuracy).
The widely used method \(\hbox {CK}{4}(3){5}\)[2N] has linear stability properties very similar to \(\hbox {KCL}4(3)5[2\hbox {R}_{+}]\hbox {C}\); both methods have the same maximum stable CFL number \(\nu = 2.1\). Thus, they use the same number of RHS evaluations while yielding nearly the same errors. We therefore omit the method \(\hbox {CK}{4}(3){5}\)[2N] in the following plots and use only the \(\hbox {KCL}4(3)5[2\hbox {R}_{+}]\hbox {C}\) pair, for which it is possible to use error-based step size control. Impressively, the error-based controller manages, for a wide range of tolerances, to use almost exactly the same number of steps as the carefully tuned CFL-based controller. Over this range of tolerances, including \(\texttt {tol}\in [10^{-5}, 10^{-3}]\), the step size is determined by stability. Hence, the number of RHS evaluations and the error are nearly independent of the tolerance in this regime. For tolerances larger than \(10^{-3}\), the final error increases for this long-time simulation while the number of RHS evaluations stays nearly the same. For tighter tolerances (below \(10^{-6}\)), the error-based controller detects accuracy restrictions and increases the number of RHS evaluations. This also leads to a reduction of the final error until it plateaus again because of the dominant spatial error (at ca. \(\texttt {tol}= 10^{-7}\)). However, the number of RHS evaluations keeps increasing.
Using the same CFL number of \(\nu = 2.1\) on the unstructured grid still results in a stable simulation. However, the CFL number can be doubled there without increasing the error significantly. Hence, the user has to tune this parameter carefully to get a stable and efficient simulation. In contrast, using error-based step size control we see behavior very similar to what was observed for the uniform grid. The same error tolerance can be used, resulting in the same optimal number of function evaluations determined manually for the CFL-based step size controller. This demonstrates the enhanced robustness properties of error-based step size control.
These examples suggest that error-based control is more robust to changes in the grid and less sensitive to the required user parameters. Similar results have been obtained using other Runge-Kutta schemes for this problem and for more challenging problems, some of which are presented later in this work. For practitioners whose primary interest is in applying the schemes to solve challenging scientific problems or developing spatial semidiscretizations, error-based time step controllers seem favorable, since the most important design choices have to be provided by the developers of the time integration schemes and the practitioners have to choose only the rather robust error tolerance of the solver.
4 Importance of Controller Parameters
Standard error-based controllers will often work acceptably in the asymptotic regime (i.e., the regime where the leading truncation error term strongly dominates all subsequent terms). However, as demonstrated in Sect. 3, applications involving convection-dominated problems are often constrained by stability, so that one may be working outside the asymptotic regime. In this case, the standard theory does not apply; instead, step size control stability has to be considered [30].
Following [29, Section IV.2], step size control stability can be explained using the linear model problem \(\frac{\hbox {d}}{\hbox {d}t} u(t) = \lambda u(t)\). Given an explicit Runge-Kutta method with embedded error estimator and a PID controller (6) with parameters \(\beta _i\), the update formulae become
where R is the stability polynomial of the main method, E is the difference of the stability polynomials of the embedded and the main method, and \(e\) is the (local) error estimate. By taking logarithms, this update formula can be reduced to a difference recursion with fixed points on the boundary of the stability region of the main method. To get a stable behavior, the spectral radius of the associated Jacobian has to be less than unity [29, Proposition IV.2.3]. For a PID controller (6), this Jacobian becomes [37]
where k is the order of the error estimator; if \({\widehat{q}}= q - 1\) is the order of the embedded method, then \(k = q = {\widehat{q}}+ 1\). To get step size control stability, one can fix a controller such as the standard I controller and optimize the Runge-Kutta pair accordingly as demonstrated in [31]. The other possibility, pursued here, is to optimize the controller parameters for a given Runge-Kutta pair.
While one might hope that a controller designed to work well with one method will also work well with other methods, this is generally not the case. Rather, a controller should be designed for the given error estimator; cf. [4] for the case of linear multistep methods. To demonstrate this, we consider again the linear advection problem described in Sect. 3 with a uniform mesh. We will take the PI34 controller with \(\beta _1 = 0.7, \beta _2 = -0.4\) [25], designed for use with the classical \(\hbox {DP}{5}(4){6}_{\mathrm{F}}\) method of [61], but use instead the \(\hbox {BS}5(4)7_{\mathrm{F}}\) method of [10]. Note that both are 5(4) pairs designed with similar purposes in mind. Using a tolerance of \(\texttt {tol}= 10^{-5}\), the integration requires 5 015 RHS evaluations and includes many rejected steps. Applying instead the optimized coefficients \(\beta = (0.28, -0.23)\) derived later in this manuscript results in only 4 119 RHS evaluations and a nearly identical final error. A significant performance gain is obtained by applying appropriate controller parameters, cf. Table 2.
The spectral radius of the Jacobian (11) determining step size control stability is plotted in Fig. 4. We see that the standard PI34 controller is unstable near the negative real axis while the optimized one is stable.
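The stability check can be illustrated with a short sketch. It assumes the controller multiplies the old step size by \((\texttt {tol}/|e|)^{\beta _i/k}\) for the three most recent error estimates, takes logarithms of the model recursion as in [29, Section IV.2], and linearizes around a fixed point on the stability boundary. The resulting 6x6 layout, the state ordering, the names `control_jacobian` and `spectral_radius`, and the inputs `u_re` \(= \mathrm {Re}(z R'(z)/R(z))\) and `v_re` \(= \mathrm {Re}(z E'(z)/E(z))\) are assumptions of this sketch, which may be arranged differently from the Jacobian (11).

```python
import math

def control_jacobian(u_re, v_re, beta, k):
    """Jacobian of the log-transformed recursion for a three-term controller.

    Assumed state ordering:
    (log|u_n|, log dt_n, log|u_{n-1}|, log dt_{n-1}, log|u_{n-2}|, log dt_{n-2}).
    """
    b1, b2, b3 = (b / k for b in beta)
    return [
        [1.0, u_re, 0.0, 0.0, 0.0, 0.0],
        [-b1, 1.0 - b1 * v_re, -b2, -b2 * v_re, -b3, -b3 * v_re],
        [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    ]

def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][p] * b[p][j] for p in range(n)) for j in range(n)]
            for i in range(n)]

def _frobenius(a):
    return math.sqrt(sum(x * x for row in a for x in row))

def spectral_radius(mat, squarings=40):
    """Estimate the spectral radius via Gelfand's formula ||M^n||^(1/n).

    The matrix is squared repeatedly (n = 2^squarings) and renormalized
    after every squaring; the logarithm of the removed scale is tracked.
    """
    norm = _frobenius(mat)
    if norm == 0.0:
        return 0.0
    m = [[x / norm for x in row] for row in mat]
    log_scale = math.log(norm)
    for _ in range(squarings):
        m = _matmul(m, m)
        norm = _frobenius(m)
        if norm == 0.0:  # nilpotent matrix
            return 0.0
        m = [[x / norm for x in row] for row in m]
        log_scale = 2.0 * log_scale + math.log(norm)
    return math.exp(log_scale / 2.0 ** squarings)
```

In this setting, a controller would be considered stable at a boundary point if the returned value is below one. The inputs `u_re` and `v_re` have to be evaluated from the stability polynomials of the specific pair at points where \(|R(z)| = 1\), which is not done here.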
5 Comparison of Existing Methods
Here, we compare some general purpose methods and schemes designed for semidiscretizations of hyperbolic conservation laws. Since we are interested in error-based step size control, we consider only schemes with embedded error estimators. Hence, we consider the general purpose schemes

\(\hbox {BS}3(2)3_{\mathrm{F}}\), third-order, four-stage FSAL method of [9],

\(\hbox {BS}5(4)7_{\mathrm{F}}\), fifth-order, eight-stage FSAL method of [10],

\(\hbox {DP}{5}(4){6}_{\mathrm{F}}\), fifth-order, seven-stage FSAL method of [61],
the SSP schemes

\(\hbox {SSP3(2)3[3S*}_{+}]\), third-order, three-stage SSP method of [70] with the embedded method of [18],

\(\hbox {SSP3(2)4[3S*}_{+}]\), third-order, four-stage SSP method of [47] with the embedded method of [18] which can be implemented efficiently in low-storage form as described in Appendix A,
and the low-storage methods optimized for hyperbolic conservation laws

\(\hbox {KCL}3(2)4[2\hbox {R}_{+}]\hbox {C}\), third-order, four-stage method of [37],

\(\hbox {KCL}4(3)5[2\hbox {R}_{+}]\hbox {C}\), fourth-order, five-stage method of [37],

\(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\), fourth-order, five-stage method of [37],

\(\hbox {KCL5(4)9[2R}_{+}]\hbox {S}\), fifth-order, nine-stage method of [37].
The results shown here, obtained with three commonly used general purpose methods, are typical of what we have found in tests with a much wider range of methods. These results are sufficient to illustrate our main conclusions. We do not consider \({\hbox {SSP4(3)10[3S}^{*}}_{+}]\) of [39] with the embedded method of [18] because step size control stability cannot be achieved for this method and any PID controller tested. The embedded method for \({\hbox {SSP3(2)9[3S}^{*}}_{+}]\) proposed in [18] also does not lead to step size control stability. We have created a new embedded method with a stable optimized controller. However, it does not perform better than \(\hbox {SSP3(2)4[3S*}_{+}]\), even with manually tuned CFL-based step size control.
In the following, we will use three representative test problems to compare the performance of these schemes. All test problems are semidiscretizations of the compressible Euler equations in d space dimensions
\[ \partial _t u + \sum _{j=1}^{d} \partial _j f^j(u) = 0, \tag{12} \]
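where the conserved variables \(u = (\rho , \rho v^{\text{T}}, \rho e)^{\text{T}}\) are the density \(\rho \), the momentum \(\rho v\), and the energy \(\rho e\). The flux for the spatial coordinate j is
\[ f^j(u) = \begin{pmatrix} \rho v_j \\ (\rho v_i v_j + p \delta _{ij})_{i=1}^{d} \\ (\rho e + p) v_j \end{pmatrix}, \tag{13} \]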
where \(p = \rho T = (\gamma - 1) (\rho e - \rho v^2 / 2)\) is the pressure, T is the temperature, and an ideal gas law with ratio of specific heats \(\gamma = \nicefrac {7}{5}\) is assumed. The spatial semidiscretizations use entropy-dissipative nodal DG methods with polynomials of degree p on Legendre-Gauss-Lobatto nodes with upwind interface fluxes implemented in SSDC. We present detailed results for \(p = 2\), which is a relevant choice in practical CFD applications. The results are similar for higher-order semidiscretizations such as polynomials of degree \(p \in \{3, 4\}\), presented in the supplementary material in more detail.
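The relation between the conserved variables and the pressure, \(p = (\gamma - 1)(\rho e - \rho v^2/2)\), together with the physical-bounds check used for step rejection, can be sketched as follows. The function names and the tuple-based momentum argument are illustrative assumptions and do not reflect the SSDC interface.

```python
GAMMA = 7.0 / 5.0  # ratio of specific heats assumed in the text

def pressure(rho, rho_v, rho_e, gamma=GAMMA):
    """Pressure from density, momentum components, and total energy."""
    kinetic = 0.5 * sum(m * m for m in rho_v) / rho  # rho * |v|^2 / 2
    return (gamma - 1.0) * (rho_e - kinetic)

def temperature(rho, rho_v, rho_e, gamma=GAMMA):
    """Temperature from the nondimensional ideal gas law p = rho * T."""
    return pressure(rho, rho_v, rho_e, gamma) / rho

def is_physical(rho, rho_v, rho_e, gamma=GAMMA):
    """True if density and pressure are both positive."""
    return rho > 0.0 and pressure(rho, rho_v, rho_e, gamma) > 0.0
```

A state such as `is_physical(1.0, (3.0, 0.0, 0.0), 1.0)` is flagged as unphysical because the kinetic energy exceeds the total energy, which yields a negative pressure.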
5.1 Inviscid Taylor-Green Vortex
The inviscid Taylor-Green vortex in \(d=3\) space dimensions is a classical test case to study the stability of numerical methods [23]. The initial condition given by
with the Mach number \({Ma}= 0.1\) is evolved in the periodic domain \([-\uppi , \uppi ]^3\). Unless stated otherwise, we use 8 elements per coordinate direction and the final time of \(t = 20\). This test case is chosen as an example where the time step is mostly restricted by stability, the solution becomes turbulent, and a relatively low Mach number is used.
5.2 Isentropic Vortex
The isentropic vortex is a widely used benchmark problem [69] with an analytical solution. For the stationary case, the exact solution is given by
where r is the distance from the axis of the vortex and \(v_{\mathrm {t}}\) is the tangential velocity. The moving vortex solution is obtained by a uniform translation in the direction of the velocity vector field.
Herein, the simulation domain is a cube \([-5, 5]^3\) with periodic boundaries where the vortex rotates around the axis \((1, 1, 0)^{\text{T}}\), a direction not aligned with the grid. The parameters for this test are \(\gamma = 1.4\), \({Ma}= 0.5\), \(\beta = 5\) and \(T_\infty = 1\). Unless stated otherwise, we use 8 elements per coordinate direction for optimizing controllers and 20 elements for examples with the final time of \(t = 20\). This test case is chosen as an example where the time step can be restricted by accuracy for tight tolerances and because of the existence of an analytical solution.
5.3 Smooth Flow with Source Terms
The analytical solution
is imposed as initial condition in the periodic domain \([-1, 1]\) and the source term
is added to the right-hand side of the energy equation. The variation of the pressure with amplitude \(A_\mathrm{p} = 50\) and frequency \(\omega _\mathrm{p} = \nicefrac {\uppi }{5}\) results in a cyclic variation of the CFL restriction on the time step. Unless stated otherwise, we use 20 elements and the final time of \(t = 20\). This test case is chosen to assess the ability of the schemes to adapt to varying time step restrictions and because of the existence of an analytical solution.
5.4 Optimization of Step Size Controllers
As explained in Sect. 4, the choice of appropriate step size controller parameters is important to obtain good performance when the schemes are run at the stability limit. Hence, we have optimized controller parameters for each scheme. In general, the optimal time step controller parameters for a given Runge-Kutta pair will depend somewhat on the problem under consideration. No single controller is optimal for all test cases, but for the experiments conducted in this work, good controllers are usually within ca. 5% of the optimal performance.
For the low-storage schemes of [37], we used the PI34 controller proposed originally with them. We also tested the PID controller using \(\beta = (0.49, -0.34, 0.10)\) proposed in [36]. We also performed an optimization of controller parameters for each method, as follows.
We ran simulations of all three test cases described above and measured the performance of each scheme (in terms of the number of right-hand side evaluations). We used a brute-force search over the domain \(\texttt {tol}\in [10^{-8}, 10^{-1}]\), sampling at each power of ten, and \(\beta _1 \in [0.1, 1.0]\), \(\beta _2 \in [-0.4, -0.05]\), \(\beta _3 \in [0.0, 0.1]\) sampling at an interval of 0.01 in each parameter, and restricting a priori to parameter values yielding step size control stability for the given scheme (computed using NodePy [43]). The final time for these simulations was set to \(t = 8\) for (14), \(t = 4\) for (15), and \(t = 20\) for (16) to make the brute-force optimization feasible. From the resulting data, consisting of thousands of runs with each method, an overall best choice of parameters was selected as in Sect. 6.1. Usually, this kind of min-max problem was approached by comparing the controllers minimizing the maximum, the median, or the 95th percentile of the RHS evaluations across all CFD simulations. Then, the final choice was made by human interaction taking into account step size control stability and design criteria for PID controllers.
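The structure of such a brute-force min-max search can be sketched as follows. The parameter grid uses the ranges stated above, with \(\beta _2\) taken in \([-0.4, -0.05]\). The names `beta_grid` and `select_minmax`, the callable `cost` (a stand-in for the number of RHS evaluations of one simulation), and the stability predicate `is_stable` are hypothetical; the sketch covers only the minimization of the maximum and omits the median, percentile, and manual selection steps.

```python
def beta_grid(step=0.01):
    """Yield all (beta_1, beta_2, beta_3) on the sampled parameter grid."""
    def frange(lo, hi):
        n = round((hi - lo) / step)
        return [round(lo + i * step, 2) for i in range(n + 1)]
    for b1 in frange(0.1, 1.0):
        for b2 in frange(-0.4, -0.05):
            for b3 in frange(0.0, 0.1):
                yield (b1, b2, b3)

def select_minmax(candidates, cost, cases, is_stable=lambda beta: True):
    """Return the candidate minimizing the worst-case cost over all cases.

    Candidates failing the a priori stability check are skipped.
    """
    best, best_cost = None, float("inf")
    for beta in candidates:
        if not is_stable(beta):
            continue
        worst = max(cost(beta, case) for case in cases)
        if worst < best_cost:
            best, best_cost = beta, worst
    return best, best_cost
```

The full grid contains 91 x 36 x 11 = 36 036 parameter triples per method before the stability filter is applied, which, combined with several test cases and eight tolerances each, explains why shortened final times are needed to keep the search feasible.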
5.5 Results for Existing Schemes
For \(\hbox {BS}3(2)3_{\mathrm{F}}\), all of the controllers from Table 1 perform reasonably well, PI42 being slightly better than the others. In general, a wide range of controller parameters is acceptable for this scheme. As typified by the example in Sect. 4, standard controllers do not perform well for \(\hbox {BS}5(4)7_{\mathrm{F}}\). We found instead that \(\beta = (0.28, -0.23, 0.00)\) is a reasonable choice for this scheme. For \(\hbox {DP}{5}(4){6}_{\mathrm{F}}\), the PI34 controller (which was originally designed for it by Gustafsson [25]) performs reasonably well in our test cases and optimized controllers like \(\beta = (0.61, -0.27, 0.01)\) do not perform significantly better.
Subsequently, we used the optimized controller parameters and ran full simulations (up to \(t=20\)) for each method with a range of tolerances. Results are shown in Tables 3, 4, and 5, where polynomials of degree \(p = 2\) have been used. There, we only show results for a tolerance \(\texttt {tol}= 10^{-5}\), since this choice is usually good for these small-scale test problems. Extended details are available in the supplementary material. For the inviscid Taylor-Green vortex, the time step is indeed restricted by stability for most tolerances, indicated by the approximately constant number of function evaluations, except for the very tight tolerance \(\texttt {tol}= 10^{-8}\) and some schemes. For the isentropic vortex (15), the step size is restricted by stability for tolerances \(\gtrapprox 10^{-6}\). For smaller tolerances, the number of function evaluations increases. However, this does not result in a significant change of the total error, which is determined mostly by the spatial semidiscretization. Finally, for the smooth flow with the source term (16), the step size is again restricted mostly by stability constraints.
5.5.1 General Purpose Methods
For very loose tolerances \(\gtrapprox 10^{-3}\), \(\hbox {BS}5(4)7_{\mathrm{F}}\) and \(\hbox {DP}{5}(4){6}_{\mathrm{F}}\) result in a significant overhead caused by step rejections for some test cases. Otherwise, \(\hbox {BS}5(4)7_{\mathrm{F}}\) performs better than \(\hbox {DP}{5}(4){6}_{\mathrm{F}}\). Other fifth-order general purpose schemes like \(\hbox {T5(4)6}_{\mathrm{F}}\) of [77] perform slightly better, usually yielding an improvement of ca. 5%. However, \(\hbox {BS}3(2)3_{\mathrm{F}}\) is ca. 50% more efficient as long as the time step is restricted by stability.
These results do not change significantly if slightly higher-order semidiscretizations are employed in space (see supplementary material), up to polynomials of degree \(p = 4\), resulting in fifth-order convergence in space. Hence, matching the order of accuracy in space and time is not strictly necessary if one is interested in fixed mesh sizes, especially in common CFD applications. This remains true even if the polynomial degree is increased to \(p = 7\) for the test problems considered here. Then, the temporal error becomes significant and the error of the fully discrete method plateaus only at relatively tight tolerances such as \(10^{-8}\). Nevertheless, \(\hbox {BS}3(2)3_{\mathrm{F}}\) is still the most efficient method for such high-order methods and tight tolerances.
5.5.2 SSP Methods
The popular method \(\hbox {SSP3(2)3[3S*}_{+}]\) can be equipped with the PI34 controller to give acceptable step size control performance; slightly better behavior can be achieved by choosing \(\beta = (0.70, -0.37, 0.05)\). For loose and medium tolerances, this scheme performs similarly to \(\hbox {BS}3(2)3_{\mathrm{F}}\). \(\hbox {BS}3(2)3_{\mathrm{F}}\) is significantly more efficient than \(\hbox {SSP3(2)3[3S*}_{+}]\) at tight tolerances.
\(\hbox {SSP3(2)4[3S*}_{+}]\) performs ca. 50% better than \(\hbox {SSP3(2)3[3S*}_{+}]\) or \(\hbox {BS}3(2)3_{\mathrm{F}}\) at loose and medium tolerances for the inviscid Taylor-Green vortex using the optimized controller \(\beta = (0.55, -0.27, 0.05)\). At loose tolerances, it is also ca. 15% more efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) for the isentropic vortex. However, the number of RHS evaluations increases drastically for tighter tolerances, making \(\hbox {SSP3(2)4[3S*}_{+}]\) less efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) for these parameters. The results for the flow with source term are similar but less pronounced. Hence, \(\hbox {SSP3(2)4[3S*}_{+}]\) can be more efficient than the best schemes so far but the embedded method does not seem to be reliable enough to make the choice of the tolerance as robust as for other schemes. Additionally, the choice of appropriate controller parameters can be crucial for \(\hbox {SSP3(2)4[3S*}_{+}]\), since some standard controllers do not perform well.
For higher polynomial degrees \(p \in \{3, 4\}\), \(\hbox {SSP3(2)4[3S*}_{+}]\) is still a very interesting method that can even beat \(\hbox {BS}3(2)3_{\mathrm{F}}\). \(\hbox {SSP3(2)3[3S*}_{+}]\) is less efficient than \(\hbox {SSP3(2)4[3S*}_{+}]\) also for these higher polynomial degrees. For even higher polynomial degrees such as \(p = 7\), the situation changes a bit since the temporal error becomes significant. While \(\hbox {SSP3(2)4[3S*}_{+}]\) is still the most efficient method so far for medium tolerances, it becomes less efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) for the vortex problems at a tolerance of \(10^{-8}\), since it is less optimized for accuracy than that general purpose method.
5.5.3 Low-Storage Methods
Some standard controllers like PI34 perform acceptably for \(\hbox {KCL}3(2)4[2\hbox {R}_{+}]\hbox {C}\) in most cases (based on the number of step rejections). Nevertheless, an optimized controller with parameter \(\beta = (0.50, -0.35, 0.10)\) results in a few percent fewer function evaluations. However, \(\hbox {BS}3(2)3_{\mathrm{F}}\) is up to 20% more efficient, in accordance with the real stability interval scaled by the effective number of stages, which is three for \(\hbox {BS}3(2)3_{\mathrm{F}}\) because of the FSAL property.
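The notion of a real stability interval scaled by the effective number of stages can be illustrated with a short computation. The sketch below assumes only the standard fact that a three-stage, third-order explicit Runge-Kutta method has the stability polynomial \(R(z) = 1 + z + z^2/2 + z^3/6\); the helper names are hypothetical, and the scan-plus-bisection search is a simple choice for illustration rather than the tool used in this work.

```python
def taylor_stability_poly(z, degree):
    """Truncated exponential series: sum of z**j / j! for j = 0..degree."""
    term, total = 1.0, 1.0
    for j in range(1, degree + 1):
        term *= z / j
        total += term
    return total

def real_stability_interval(R, x_max=20.0, n_scan=20000, iters=80):
    """Largest x with |R(-t)| <= 1 for all scanned t in [0, x]."""
    dx = x_max / n_scan
    lo, hi = 0.0, None
    for i in range(1, n_scan + 1):  # coarse scan for the first violation
        x = i * dx
        if abs(R(-x)) > 1.0:
            hi = x
            break
        lo = x
    if hi is None:
        return x_max
    for _ in range(iters):  # refine the boundary by bisection
        mid = 0.5 * (lo + hi)
        if abs(R(-mid)) <= 1.0:
            lo = mid
        else:
            hi = mid
    return lo

interval = real_stability_interval(lambda z: taylor_stability_poly(z, 3))
scaled = interval / 3  # three effective stages per step with FSAL
```

The computed interval is about 2.51, so the scaled value is roughly 0.84 per RHS evaluation. Comparing such scaled values between methods gives a first estimate of relative efficiency when the step size is limited by stability along the negative real axis.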
The PI34 controller does not perform well for the other low-storage schemes. For \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\) (but not for the other schemes), the PID controller with \(\beta = (0.49, -0.34, 0.10)\) proposed in [36] performs much better. An optimized controller with parameter \(\beta = (0.41, -0.28, 0.08)\) performs even slightly better, making this scheme more efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) for the inviscid Taylor-Green vortex and slightly more efficient for the isentropic vortex. However, \(\hbox {BS}3(2)3_{\mathrm{F}}\) is still better for the other test case. \(\hbox {KCL}4(3)5[2\hbox {R}_{+}]\hbox {C}\) and \(\hbox {KCL5(4)9[2R}_{+}]\hbox {S}\) were more challenging for optimizing controller parameters and less efficient than \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\).
As for \(p = 2\), \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\) is usually the most efficient existing low-storage method of [37] for \(p \in \{3, 4\}\), and it can also be more efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) for the vortex problems. However, \(\hbox {SSP3(2)4[3S*}_{+}]\) is even more efficient there. Additionally, the sensitivity of the step size controller for the low-storage methods is bigger than for \(\hbox {BS}3(2)3_{\mathrm{F}}\). For \(p = 7\), all of the low-storage methods considered here result in a non-negligible amount of step rejections for the inviscid Taylor-Green vortex. Nevertheless, the fourth-order accurate methods can be up to 15% more efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) there. However, \(\hbox {SSP3(2)4[3S*}_{+}]\) is still more efficient for this test problem. At tight tolerances such as \(10^{-8}\), \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\) is the most efficient method considered so far for the isentropic vortex and the smooth flow with source term.
5.5.4 Discussion
All of the general purpose schemes make use of the FSAL technique. Additionally, the stability region of the embedded scheme is always at least as big as the one of the main method. Although the 2R low-storage schemes were optimized for convection-dominated problems, they were outperformed for all test problems and at almost all tolerances by the general purpose method \(\hbox {BS}3(2)3_{\mathrm{F}}\). Possible reasons for this are that the 2R method coefficients are chosen subject to more stringent low-storage requirements, they do not exploit the FSAL technique, and they have embedded methods with a stability region that is smaller than that of the main method in some areas. When the time step is restricted by stability, this can effectively reduce the allowable time step for the main method.
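The effect of the FSAL (first same as last) technique on the cost per step can be seen in a small adaptive integrator. The sketch below uses the classical Bogacki-Shampine 3(2) coefficients for a scalar ODE and, for brevity, a plain deadbeat controller with a safety factor and hard limits instead of the PID controllers studied here. The error norm, the acceptance test `err <= 1`, and the initial step size are simplifying assumptions.

```python
import math

def integrate_bs32(f, t0, y0, t_end, tol):
    """Adaptive Bogacki-Shampine 3(2) integration of y' = f(t, y), scalar y."""
    k = 3  # min(order of main, order of embedded method) + 1
    t, y = t0, y0
    h = 1e-2 * (t_end - t0)
    k1 = f(t, y)  # evaluated once; later steps reuse k4 (FSAL)
    n_rhs, n_accept, n_reject = 1, 0, 0
    while t < t_end:
        h = min(h, t_end - t)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + 3 * h / 4, y + 3 * h / 4 * k2)
        y_new = y + h * (2 * k1 + 3 * k2 + 4 * k3) / 9  # third-order solution
        k4 = f(t + h, y_new)  # also the first stage of the next step
        n_rhs += 3
        y_hat = y + h * (7 * k1 / 24 + k2 / 4 + k3 / 3 + k4 / 8)  # second order
        err = abs(y_new - y_hat) / (tol * (1 + max(abs(y), abs(y_new))))
        factor = 0.9 * (1.0 / max(err, 1e-16)) ** (1 / k)
        factor = min(5.0, max(0.2, factor))
        if err <= 1.0:  # accept the step and reuse k4 as the new k1
            t, y, k1 = t + h, y_new, k4
            n_accept += 1
        else:  # reject: keep t, y, and k1, retry with a smaller step
            n_reject += 1
        h *= factor
    return y, n_rhs, n_accept, n_reject

y_end, n_rhs, n_accept, n_reject = integrate_bs32(
    lambda t, y: -y, 0.0, 1.0, 1.0, 1e-6)
```

Each attempted step costs three new RHS evaluations although the pair has four stages, which is why the effective number of stages is three. A pair without FSAL and with the same number of stages would need four evaluations per attempted step.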
This last point is illustrated in Fig. 5, which shows the stability regions (scaled by the effective number of stages) of the main and embedded method for three pairs. We see that although the stability region of \(\hbox {BS}3(2)3_{\mathrm{F}}\) includes less of the real axis than that of \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\), the embedded method for \(\hbox {BS}3(2)3_{\mathrm{F}}\) extends further than that of \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\). The last stability region in the figure corresponds to a new method developed in the next section. Like \(\hbox {BS}3(2)3_{\mathrm{F}}\), it has the useful property that the stability region of the embedded method contains that of the main method.
In the results described above, the behavior of the methods for the inviscid Taylor-Green vortex was often slightly different from that for the other test cases. This can partly be explained by the lower Mach number chosen for this example. Indeed, numerical experiments show that low Mach numbers put more stress on real axis stability than on the rest of the spectrum generated by linear advection. Hence, methods with stability regions that include more parts of the negative real axis but are not optimal for the linear advection spectrum can perform better for low Mach numbers; see also [17]. In this article, we focus on applications in CFD with medium to high Mach numbers. Flows with small Mach numbers are usually computed using incompressible solvers and implicit time integration methods; hence, we do not focus on this regime. Nevertheless, we use the inviscid Taylor-Green vortex as a test case to study the step size control stability on the negative real axis.
6 New Optimized Runge-Kutta Pairs
In the previous section we developed optimized step size controllers for existing Runge-Kutta pairs. Now we consider the optimization of Runge-Kutta pairs themselves (along with controllers). To do so, we begin with the 3S*_{+} methods of [59], without embedded error estimator. Then, we design an embedded method and optimized controller parameters. The embedded method is optimized for step size control stability, good error metrics (see [37] and the supplementary material for the present work), to have a large stability region that includes that of the main method, and to have coefficients that are not too large. The resulting schemes given by double precision floating point numbers are optimized further using extended precision numbers in Julia [8] and the package Optim.jl [52], such that the order conditions are satisfied at least to quadruple precision. Coefficients of the new optimized methods are available in the accompanying repository [64] in full precision. Double precision coefficients are given in Appendix B. The stability region of a representative method is shown in Fig. 5c.
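Checking that order conditions hold beyond double precision amounts to evaluating their residuals in a more accurate arithmetic. As a minimal sketch, the following uses exact rational arithmetic and the classical Bogacki-Shampine 3(2) tableau as a stand-in, since its coefficients are simple fractions; the new pairs of this section are not reproduced here, and the dictionary layout is a hypothetical choice.

```python
from fractions import Fraction as F

# Bogacki-Shampine 3(2) pair with FSAL (four stages, b[3] = 0).
A = [[0, 0, 0, 0],
     [F(1, 2), 0, 0, 0],
     [0, F(3, 4), 0, 0],
     [F(2, 9), F(1, 3), F(4, 9), 0]]
b = [F(2, 9), F(1, 3), F(4, 9), 0]               # third-order main method
b_hat = [F(7, 24), F(1, 4), F(1, 3), F(1, 8)]    # second-order embedded method

def dot(u, v):
    """Exact inner product of two coefficient lists."""
    return sum(x * y for x, y in zip(u, v))

c = [sum(row) for row in A]          # abscissae from the row-sum condition
c2 = [ci * ci for ci in c]
Ac = [dot(row, c) for row in A]

# Residuals of the order conditions; all must vanish.
residuals_main = {
    "b.1 = 1": sum(b) - 1,
    "b.c = 1/2": dot(b, c) - F(1, 2),
    "b.c^2 = 1/3": dot(b, c2) - F(1, 3),
    "b.A.c = 1/6": dot(b, Ac) - F(1, 6),
}
residuals_embedded = {
    "b_hat.1 = 1": sum(b_hat) - 1,
    "b_hat.c = 1/2": dot(b_hat, c) - F(1, 2),
}
```

For coefficients that are only known as floating point numbers, the same residuals could be evaluated in extended precision and driven towards zero by an optimizer, which is the role played by Julia and Optim.jl in the text above.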
We also developed new pairs from scratch, based on the approach used in [59]. Specifically, we compute the Fourier footprint of the spectral element semidiscretization of the linear advection equation by varying the direction of the wave propagation velocity vector, the solution orientation, and the modulus of the wave vector, and construct optimized stability polynomials using the algorithm described in [38]. Afterwards, low-storage Runge-Kutta schemes are constructed by minimizing their principal error constants, given their class, the number of stages s, the order of accuracy q, and the optimized stability polynomial as constraints. These optimizations are carried out using RKOpt [42], based on the optimization toolbox of MATLAB. However, the resulting methods did not perform better than the pairs based on starting with methods from [59].
6.1 Optimization of Controller Parameters
The controller parameters for these new pairs are optimized using the same approach as described in Sect. 5.
Typical performance results of the optimization procedure are shown in Fig. 6 using \({\hbox {RK3(2)5}_{\mathrm{F}}[\hbox {3S}^{*}}_{+}]\) as an example. For the isentropic vortex (15) with \(\texttt {tol}= 10^{-5}\), the temporal accuracy starts to play a role and the controllers are not necessarily limited by stability. In this regime, controllers with larger \(\beta _1\) and \(\beta _2\) closer to zero perform better; they are closer to the simple deadbeat (I) controller, which is in some sense optimal in the asymptotic regime. In contrast, the controllers operate near the stability boundary for the test case with source term (16). Here, controllers with more damping and smaller \(\beta _1\) perform better. To find an acceptable controller for the scheme \({\hbox {RK3(2)5}_{\mathrm{F}}[\hbox {3S}^{*}}_{+}]\), both kinds of problems have to be considered, seeking a compromise between efficiency in the asymptotic regime and near the stability boundary.
6.2 Results for New Schemes
Analogously to Tables 3, 4 and 5, results are summarized in Table 6 for the new optimized low-storage schemes with error control for \(p = 2\); extended details and results for higher-order spatial semidiscretizations using solution polynomials of degree \(p \in \{3, 4, 7\}\) are available in the supplementary material.
In general, the novel schemes are more efficient than all methods tested in Sect. 5. In particular, the novel third-order schemes are up to 18% more efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\), in accordance with the relative lengths of the real stability intervals. They are also up to 5% more efficient than \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\) with the optimized PID controller for the inviscid Taylor-Green vortex and up to 13% more efficient for the isentropic vortex. Recall that \(\hbox {BS}3(2)3_{\mathrm{F}}\) is more efficient than \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\) for the other test problem. Only \(\hbox {SSP3(2)4[3S*}_{+}]\) is more efficient than \({\hbox {RK3(2)5}_{\mathrm{F}}[\hbox {3S}^{*}}_{+}]\) for the inviscid Taylor-Green vortex, in accordance with the particularly large real stability interval, cf. Sect. 5.5. However, the new schemes are more efficient at realistic (medium to high) Mach numbers, for which they have been optimized.
The optimized fourth-order schemes can be even more efficient for the inviscid Taylor-Green vortex and the smooth flow with source term at medium tolerances, giving an improvement of a few percent. For the isentropic vortex, the third-order schemes are still up to 6% more efficient. The optimized fourth-order schemes are up to 25% more efficient than the best corresponding methods of [37] with the controller recommended there. However, the fourth-order accurate main methods of [59] make it particularly difficult to design good embedded methods and controllers. This can already be seen in the optimized controller coefficients, where the magnitudes of \(\beta _1\) and \(\beta _2\) differ less than for other optimized methods. While the controllers can be tuned to result in acceptable performance for these test problems, they do not necessarily lead to good performance for other setups.
The optimized fifth-order schemes are less efficient than the optimized third- and fourth-order schemes for these test cases unless the tolerance is very tight (so the spatial error dominates and the influence of the time integrator is negligible). These schemes (used with the controllers designed here) are much more efficient than the corresponding method of [37] (used with the controller prescribed there), by up to 25% for the source term problem, up to 35% for the isentropic vortex, and up to 18% for the inviscid Taylor-Green vortex.
For \(p = 3\), the third-order accurate schemes are the most efficient ones of the optimized low-storage methods, except for the inviscid Taylor-Green vortex, where the fourth-order methods are up to 6% more efficient. For \(p = 4\), the fourth-order accurate schemes are the most efficient new ones for these experiments, followed closely by the third-order accurate methods. For \(p = 7\), the fourth-order accurate schemes are still the most efficient new ones; the third-order methods do not match the same small errors for tight tolerances and their temporal error dominates the spatial one. However, the fourth-order methods are difficult to control for loose tolerances, resulting in a significant number of step rejections. For sufficiently tight tolerances, the optimized fourth-order methods are more efficient than the fourth-order methods of [37].
In general, FSAL methods are often more efficient than non-FSAL schemes, especially at loose tolerances. Thus, we recommend using the novel 3S*_{+} (FSAL) methods for hyperbolic problems where the time step is restricted mostly by stability constraints.
Optimization of Runge-Kutta pairs for higher numbers of stages was not as successful. While we were able to obtain schemes with good theoretical properties, their performance did not show improvement compared to the schemes listed above. Improvements of the underlying optimization algorithms or the imposition of additional constraints might lead to better schemes in the future. However, the novel schemes developed in this work are already a significant improvement over the state of the art and perform well.
6.3 Further Optimizations
As shown in [59] and the numerical experiments above, the spatial error usually dominates the temporal error. Hence, it is interesting to optimize lower-order time integrators for higher-order spatial discretizations. Such an approach is presented in [58], but focused on first-order accurate time integrators. The results shown there demonstrate some speedup compared to the schemes presented in [59], but these come with a reduced accuracy.
Here, we choose third-order accurate Runge-Kutta methods and optimize them for a fifth-order spatial discretization. This resulted in a speedup compared to the optimized schemes described in the previous sections for some test cases; for other test cases, no speedup could be observed. This is in accordance with the general similarity of the scaled convex hulls of the spectra for different polynomial degrees \(p \in \{1, 2, 3, 4\}\) shown in Fig. 7. Hence, we do not pursue this path of research further.
7 Additional Numerical Experiments and Comparisons
Hitherto, a careful selection of test cases was used to demonstrate issues and design criteria for explicit Runge-Kutta schemes applied to semidiscretizations of hyperbolic conservation laws. Next, more involved examples are used to demonstrate that the novel methods can be applied successfully to large-scale CFD problems including the compressible Navier-Stokes equations.
To be useful for engineering and applied problems in CFD, a CFL-based control must be automated as much as possible. Therefore, we use the approach described in Sect. 3 also for the viscous CFL number. The normalizing factor \(\sigma \) in (9) is chosen depending on the solution polynomial degree such that a method with a real stability interval of 2 is stable for the linear advection-diffusion equation on a uniform grid with a CFL factor \(\nu = 1\). On top of that, a safety factor of 0.95 is applied, cf. [2].
Except for the viscous shock described next, the other simulations conducted here start from a checkpoint of a developed solution and run on 8 nodes of Shaheen XC40 using 32 CPU cores each (one compute node of Shaheen XC40^{Footnote 2}). The general purpose and SSP methods are implemented using the explicit Runge-Kutta interface of PETSc. The other methods are implemented using their respective low-storage forms in PETSc.
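For comparison with error-based control, a CFL-based step size selection can be sketched in a few lines. The formula below is a generic convective CFL rule and only an assumption; it is not the definition (9) referred to above, which also covers the viscous CFL number. The function name, the per-element inputs, and the default values other than the safety factor 0.95 are hypothetical.

```python
def cfl_time_step(dx, speeds, nu=1.0, sigma=1.0, safety=0.95):
    """Generic CFL-based step size (illustrative, convective part only).

    dx     : characteristic mesh sizes, one per element
    speeds : maximum characteristic speeds, one per element
    nu     : user-chosen CFL factor
    sigma  : normalizing factor, assumed to depend on the polynomial degree
    safety : additional safety factor
    """
    ratios = [d / s for d, s in zip(dx, speeds) if s > 0]
    if not ratios:
        raise ValueError("at least one positive characteristic speed is needed")
    # The most restrictive element determines the global step size.
    return safety * nu * sigma * min(ratios)

dt = cfl_time_step(dx=[0.1, 0.2], speeds=[2.0, 1.0])
```

In this kind of rule, the factor `nu` is the parameter that typically has to be tuned by hand for each combination of scheme and problem, whereas an error-based controller infers the admissible step size from the embedded error estimate.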
7.1 Viscous Shock
The propagating viscous shock is a classical test problem for the compressible Navier-Stokes equations. The momentum \(\mathscr {V}\) of the analytical solution satisfies the ODE
The solution of this ODE can be written implicitly as
where
Here, \(\mathscr {U}_{\text{L}/\text{R}}\) are the known velocities to the left and right of the shock at \(\pm \infty \), \(\dot{\mathscr {M}}\) is the constant mass flow across the shock, \(Pr = 3/4\) is the Prandtl number, and \(\mu \) is the dynamic viscosity. The mass and total enthalpies are constant across the shock. Moreover, the momentum and energy equations become redundant.
For our tests, we compute \(\mathscr {V}\) from (19) to machine precision using bisection. The moving shock solution is obtained by applying a uniform translation to the above solution. Initially, at \(t = 0\), the shock is located at the center of the domain. We use the parameters \({Ma}=2.5\), \({Re}=10\), and \(\gamma =1.4\) in the domain given by \(x \in [-0.5,0.5]\) until the final time \(t = 2\). The boundary conditions are prescribed by penalizing the numerical solution against the analytical solution, which is also used to prescribe the initial condition.
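Solving an implicit relation to machine precision by bisection can be sketched as follows. The residual function used in the example is a placeholder with a known root, not the implicit shock relation (19); the function name `bisect_root` and the stopping rule (stop when the bracket can no longer be split in floating point arithmetic) are illustrative choices.

```python
def bisect_root(g, a, b, max_iter=200):
    """Find a root of g in [a, b] by bisection; g(a), g(b) must differ in sign."""
    ga, gb = g(a), g(b)
    if ga == 0:
        return a
    if gb == 0:
        return b
    if (ga > 0) == (gb > 0):
        raise ValueError("root is not bracketed")
    for _ in range(max_iter):
        m = 0.5 * (a + b)
        if m == a or m == b:  # bracket cannot be split any further
            break
        gm = g(m)
        if gm == 0:
            return m
        if (gm > 0) == (ga > 0):
            a, ga = m, gm  # root lies in the right half
        else:
            b = m          # root lies in the left half
    return 0.5 * (a + b)

# Placeholder residual with the known root 2**(1/3).
root = bisect_root(lambda v: v**3 - 2.0, 1.0, 2.0)
```

For the viscous shock, the residual would be the implicit relation evaluated at a given position, and the bracket would be chosen between the left and right states.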
Some results for the most promising methods and optimized controllers are shown in Table 7; extended details are available in the Supplementary Material. The new scheme \({\hbox {RK3(2)5}_{\mathrm{F}}[\hbox {3S}^{*}}_{+}]\) is ca. 18% more efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) for relevant tolerances, in accordance with the relative real stability intervals. \(\hbox {SSP3(2)4[3S*}_{+}]\) is a very promising scheme for this kind of problem because of its improved stability properties around the negative real axis. In particular, \(\hbox {SSP3(2)4[3S*}_{+}]\) is ca. 50% more efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) for relevant tolerances, also in accordance with the relative real stability intervals. Except for very tight tolerances and low solution polynomial degrees, the step size controllers detect the stability constraint accurately; the spatial error dominates and the error of the time integration schemes is negligible.
7.2 NASA Juncture Flow
We consider the NASA juncture flow problem as described in [55, Section 3.8]. The NASA juncture flow test was designed to validate CFD for wing juncture trailing edge separation and progression, and it is a collaborative effort between CFD computationalists and experimentalists [16]. Specifically, the NASA juncture flow experiment is a series of wind tunnel tests conducted in the NASA Langley subsonic tunnel to collect validation data in the juncture region of a wing-body configuration [68].
Here, we simulate the NASA juncture flow with a wing based on the DLR-F6 geometry and a leading edge horn to mitigate the effect of the horseshoe vortex over the wing-fuselage juncture [50]. A general view of the geometry is shown in Fig. 8b. The model crank chord is \(\ell ={557.1}\,{\mathrm{mm}}\), the wing span is \(77.89\ell \), and the fuselage length is \(f=8.69\ell \). The wing leading edge horn meets the fuselage at \(x_1=3.45\ell \), and the wing root trailing edge is located at \(x_1=5.31\ell \). In the wind tunnel, the model is mounted on a sting aligned with the fuselage axis. The sting is attached to a mast that emerges from the wind tunnel floor. The Reynolds number is \({Re}=2.4 \times 10^6\) and the freestream Mach number is \({Ma}=0.189\). The angle of attack is AoA \(=2.5^\circ \). We perform simulations in free air conditions, ignoring both the sting and the mast.
As shown in Fig. 8b, the grid is subdivided into three blocks, corresponding to three different approximation degrees, p, for the solution field. In particular, we use \(p=1\) in the far-field region, \(p=3\) in the region surrounding the model, and \(p=2\) elsewhere. In total, we use \(\approx 6.762 \times 10^5\) hexahedral elements and \(\approx 4.091 \times 10^7\) degrees of freedom (DOFs). We highlight that the boundary layer thickness over the fuselage for \(x = 1000\)–2000 mm is about 16 mm, while it is about 20 mm over the wing upstream of the separation bubble [33]. In the present simulation we use between eight and nine solution points in the boundary layer thickness \(\delta _{99}\). The mesh features a maximum aspect ratio of ca. 110. The grid is constructed using the commercial software Pointwise V18.3 released in September 2019; solid boundaries are described using a quadratic mesh.
Figure 9 shows the Q-criterion colored by the velocity magnitude of flow past the aircraft. The separation of the flow on the wing near the junction with the fuselage is visible.
A summary of the performance of the different methods is presented in Table 8. Here, the CFL adaptor with \(\nu = 1.0\) tuned for linear advection-diffusion works for all Runge-Kutta methods. However, it was significantly less efficient than the error-based controller with a conservative tolerance of \(10^{-8}\); the CFL controller used ca. 50% more RHS evaluations and wall-clock time. Thus, tedious manual tuning to increase the CFL factor would be necessary to match the efficiency of the error-based controller, which works out of the box.
\(\hbox {BS}3(2)3_{\mathrm{F}}\) is an efficient general purpose method for this CFD problem. Nevertheless, the optimized third- and fourth-order accurate methods are more efficient. Interestingly, \(\hbox {SSP3(2)4[3S*}_{+}]\) is again significantly more efficient, nearly 50% faster than \(\hbox {BS}3(2)3_{\mathrm{F}}\).
7.3 Viscous Flow Past a Formula 1 Front Wing
Here, we consider the flow past a Formula 1 front wing with a relatively complex geometry, supported by the availability of a CAD model and experimental results [60]. We refer to this test case as the Imperial Front Wing, originally based on the front wing and endplate design of the McLaren 17D race car [11]. The panel of Fig. 10 gives an overview of the Imperial Front Wing geometry. We denote by h the distance between the ground and the lowest part of the front wing endplate and by c the chord length of the main element. The position of the wing in the tunnel is further characterized by a pitch angle of \(1.094^{\circ }\). Here we use \(h/c = 0.36\) which can be considered as a relatively low front ride height, with high ground effect and hence higher loads on the wing. The corresponding Reynolds number is \({Re}= 2.2 \times 10^5\), based on the main element chord c of 250 mm and a free stream velocity U of 25 m/s. The Mach number is set to \({Ma}= 0.036\). This corresponds to a practically incompressible flow.
The computational domain is divided into \(3.4 \times 10^6\) hexahedral elements with a maximum aspect ratio of ca. 250. Two different semidiscretizations with solution polynomials of degree \(p = 1\) and \(p = 2\) are used. The grid is constructed using the commercial software Pointwise V18.3 released in September 2019; solid boundaries are described using a quadratic mesh.
In Fig. 11, we present the contour plot of the time-averaged pressure coefficient on the surface of the front wing. The statistics have been obtained by averaging the solution for approximately five flow-through time units.
The performance characteristics of the different methods for \(p = 2\) are summarized in Table 9. The results are in agreement with those obtained for the NASA juncture flow. The CFL adaptor with \(\nu = 1.0\) works for all methods and is less efficient than error-based step size controllers. Again, \(\hbox {BS}3(2)3_{\mathrm{F}}\) is an efficient general purpose scheme for this problem. Nevertheless, the optimized third- and fourth-order methods are more efficient and \(\hbox {SSP3(2)4[3S*}_{+}]\) is the most efficient scheme for this problem. The fifth-order method is less efficient than the other four schemes, as expected.
The results for \(p = 1\) summarized in Table 10 are mostly similar to the ones presented before. In contrast to the results for \(p = 2\), the CFL adaptor with \(\nu = 1.0\) did not work for \(\hbox {SSP3(2)4[3S*}_{+}]\); the simulation crashed when using \(\nu = 1.0\) and manual tuning was necessary to get a working setup^{Footnote 3}. For \(\nu = 0.85\), the CFL adaptor worked and was a few percent more efficient than the error-based controller. However, the latter did not require any manual tuning at all and worked robustly with default parameters for all Runge-Kutta methods. Here, \(\hbox {SSP3(2)4[3S*}_{+}]\) is less efficient than \({\hbox {RK3(2)5}_{\mathrm{F}}[\hbox {3S}^{*}}_{+}]\) and \({\hbox {RK4(3)9[3S}^{*}}_{+}]\). Otherwise, the results are similar to the ones obtained for the juncture flow and the setup using \(p = 2\).
8 Summary and Conclusions
We studied explicit Runge-Kutta methods applied to dissipative spectral element semidiscretizations of hyperbolic conservation laws and CFD problems based on the compressible Euler and Navier-Stokes equations. In this context, we argued in Sect. 3 that error-based step size control can be advantageous compared to CFL-based approaches, since associated user-defined parameters are usually more robust and can be varied in rather large ranges without affecting accuracy or efficiency. Additionally, error-based step size control moves the burden of constructing critical parts of the controller from the developer of the spatial semidiscretization to the developer of the time integrator, easing the workflow for most researchers. The results for more complex test problems in Sect. 7 also support this conclusion.
In Sect. 4, we demonstrated that choosing good step size controller parameters is especially important if the time step is restricted by stability constraints, as is typical for many convection-dominated problems. We compared existing Runge-Kutta pairs in Sect. 5 and proposed an approach to optimize controller parameters for such methods. In general, the third-order method \(\hbox {BS}3(2)3_{\mathrm{F}}\) of Bogacki and Shampine [9] performs well compared to both general purpose schemes and methods designed specifically for CFD applications. The strong-stability-preserving method \(\hbox {SSP3(2)4[3S*}_{+}]\) of [47] with embedded method of [18] also performs well, but is more sensitive to the choice of error tolerance.
In Sect. 6, we developed explicit low-storage Runge-Kutta pairs with optimized step size controllers. These novel schemes are more efficient than all of the existing schemes when applied to advection-dominated problems. We demonstrated their performance in several CFD applications with increasing complexity, including the compressible Euler and Navier-Stokes equations. We contributed our optimized methods to the freely available open source library DifferentialEquations.jl [62] written in Julia [8].
Although not demonstrated in this article, another advantage of error-based step size control becomes apparent for a cold startup of CFD problems, i.e., simulations around complex geometries that are initialized with a free stream flow. CFL-based approaches often need to adjust the CFL scaling at the beginning to cope with the initial transient period. In contrast, our error-based approach does not need special tuning and is robust in our experience.
A summary of existing and novel methods with optimized controller parameters is given in Table 11. Depending on whether dissipative/low-Mach effects dominate, \(\hbox {SSP3(2)4[3S*}_{+}]\) and \({\hbox {RK3(2)5}_{\mathrm{F}}[\hbox {3S}^{*}}_{+}]\) are the most efficient schemes in our experience. Additionally, \(\hbox {BS}3(2)3_{\mathrm{F}}\) is a surprisingly efficient general purpose method. It becomes increasingly complicated to design controllers that are stable and efficient across different applications for methods of higher order and/or with more stages. However, this is not necessarily a severe drawback, since third-order accurate methods like \({\hbox {RK3(2)5}_{\mathrm{F}}[\hbox {3S}^{*}}_{+}]\), \(\hbox {SSP3(2)4[3S*}_{+}]\), and \(\hbox {BS}3(2)3_{\mathrm{F}}\) are usually more efficient in CFD applications. As argued in Sect. 3, the error-based control usually has a relatively mild sensitivity with respect to the choice of the tolerance. In our experience, it is usually good to choose a relatively tight tolerance around \(10^{-8}\) for applied CFD problems. Since the step size is almost always limited by stability, the tolerance does not matter that much, but a relatively tight tolerance helps for methods that are more difficult to control (e.g., those of higher order or with more stages).
The present work was influenced and partially motivated by the landmark work of Kennedy et al. [37], which focused on developing optimized RungeKutta methods for CFD. Herein, we focus on modern spectral element semidiscretizations that introduce dissipation at element interfaces, e.g., using upwind numerical fluxes. Hence, the stability regions of our methods focus also on the negative real axis, whereas “imaginary axis stability is a high priority to the methods” designed in [37, p. 183]. Furthermore, we concentrate on the common case where the spatial error dominates and the step size is restricted by stability rather than temporal accuracy. Hence, we design RungeKutta pairs with large stability regions, for both the main and the embedded methods. In particular, the stability regions of our novel embedded schemes are larger than the ones of the corresponding main methods.
To our knowledge, this article is the first exploring the impact of controller parameters and step size control stability on the efficiency of explicit RungeKutta methods for CFD systematically. This provides important insights into the construction of new methods and augments best practices published before. In particular, we think the conventional wisdom that “coping with step size control instability is probably best accomplished by reducing step sizes” [37, p. 208] can be improved upon by instead optimizing the controller, since that results in a more robust and efficient scheme. As noted in [37, p. 208], “doing this optimization requires some caution because it is not sufficient in the design of a good controller for each of the eigensolutions to be damped. The time constants associated with these eigensolutions must not be too large or too small”. Herein, we proposed a way to conduct this optimization systematically and applied it to a wide range of schemes.
Of course, such an approach also comes with limitations, in particular if the main method is fixed such that only the embedded method and the controller can be designed freely. Some methods, such as the fourth-order method used in this article, constrain the range of embedded methods and controllers such that a good general-purpose optimization is not necessarily successful. While the combined method can be efficient for certain problems, it is not necessarily similarly efficient for other problems, e.g., when going from inviscid to viscous flows. Other schemes, such as the novel third- and fifth-order accurate optimized methods, result in fewer stability restrictions, making the resulting methods and controllers efficient for a broad range of CFD problems. Thus, we would like to stress that designing a good time integration method should not only focus on the main method but consider the interaction of a main method, an error estimator, and a step size controller. Applying this principle to viscous flows will be a subject of future work.
Some previous work has focused on automated step size control for convection-dominated problems with the goal of achieving a temporal error that is of similar magnitude to that of the spatial error [7, 80]. The addition of such a control on top of the techniques employed here might lead to an even more efficient controller that is not adversely affected by an excessively tight temporal error tolerance.
We expect the new methods developed in this article to perform well for convection-dominated flows in the subsonic regime; here, we tested them mainly with reference Mach numbers in the range 0.1–0.5. We have demonstrated their improved performance compared to some standard schemes also in other regimes, including viscous flows and the transonic/supersonic regime. For small Mach numbers, incompressible solvers with implicit time discretizations are usually applied. However, if compressible solvers are to be used for low Mach numbers, methods could be optimized following the approach of this article.
Notes
Shaheen XC40 is the petascale supercomputer hosted at KAUST, which features 6 174 dual-socket compute nodes based on the 16-core Intel Haswell processors running at 2.3 GHz. Each node is equipped with 128 GB of DDR4 memory running at 2.3 GHz. Overall, the system has a total of 197 568 processor cores and 790 TB of aggregate memory.
This behavior can be explained by the different shape of the stability region of \(\hbox {SSP3(2)4[3S*}_{+}]\) compared to the other methods, with a relatively larger real stability interval, cf. Table 1 in the supplementary material. Another measure of the size of the stability region might help but would not remove the necessity of manual tuning to get good performance for CFL-based controllers.
References
Abhyankar, S., Brown, J., Constantinescu, E.M., Ghosh, D., Smith, B.F., Zhang, H.: PETSc/TS: a modern scalable ODE/DAE solver library (2018). arXiv:1806.01437 [math.NA]
Al Jahdali, R., Boukharfane, R., Dalcin, L., Parsani, M.: Optimized explicit Runge-Kutta schemes for entropy stable discontinuous collocated methods applied to the Euler and Navier-Stokes equations. In: AIAA Scitech 2021 Forum, p. 0633 (2021). https://doi.org/10.2514/6.20210633
Almquist, M., Dunham, E.M.: Elastic wave propagation in anisotropic solids using energy-stable finite differences with weakly enforced boundary and interface conditions (2020). arXiv:2003.12811 [math.NA]
Arévalo, C., Söderlind, G., Hadjimichael, Y., Fekete, I.: Local error estimation and step size control in adaptive linear multistep methods. Numer. Algorithm 86, 537–563 (2021). https://doi.org/10.1007/s11075020009001
Baggag, A., Atkins, H., Keyes, D.: Parallel implementation of the discontinuous Galerkin method. Tech. Rep. NASA/CR1999209546, NASA, Institute for Computer Applications in Science and Engineering, NASA Langley Research Center, Hampton VA United States (1999)
Balay, S., Abhyankar, S., Adams, M.F., Brown, J., Brune, P., Buschelman, K., Dalcin, L., Dener, A., Eijkhout, V., Gropp, W.D., Kaushik, D., Knepley, M.G., May, D.A., McInnes, L.C., Mills, R.T., Munson, T., Rupp, K., Sanan, P., Smith, B.F., Zampini, S., Zhang, H., Zhang, H.: PETSc users manual. Tech. Rep. ANL95/11—Revision 3.13, Argonne National Laboratory (2020)
Berzins, M.: Temporal error control for convection-dominated equations in two space dimensions. SIAM J. Sci. Comput. 16(3), 558–580 (1995)
Bezanson, J., Edelman, A., Karpinski, S., Shah, V.B.: Julia: a fresh approach to numerical computing. SIAM Rev. 59(1), 65–98 (2017). https://doi.org/10.1137/141000671. arXiv:1411.1607 [cs.MS]
Bogacki, P., Shampine, L.F.: A 3(2) pair of Runge-Kutta formulas. Appl. Math. Lett. 2(4), 321–325 (1989). https://doi.org/10.1016/08939659(89)900797
Bogacki, P., Shampine, L.F.: An efficient Runge-Kutta (4,5) pair. Comput. Math. Appl. 32(6), 15–28 (1996). https://doi.org/10.1016/08981221(96)001411
Buscariolo, F.F., Hoessler, J., Moxey, D., Jassim, A., Gouder, K., Basley, J., Murai, Y., Assi, G.R.S., Sherwin, S.J.: Spectral/hp element simulation of flow past a Formula One front wing: validation against experiments (2019). http://arxiv.org/abs/1909.06701v1
Butcher, J.C.: Numerical Methods for Ordinary Differential Equations. Wiley, Chichester (2016). https://doi.org/10.1002/9781119121534
Carpenter, M.H., Fisher, T.C., Nielsen, E.J., Frankel, S.H.: Entropy stable spectral collocation schemes for the Navier-Stokes equations: discontinuous interfaces. SIAM J. Sci. Comput. 36(5), B835–B867 (2014). https://doi.org/10.1137/130932193
Carpenter, M.H., Parsani, M., Fisher, T.C., Nielsen, E.J.: Towards an entropy stable spectral element framework for computational fluid dynamics. In: 54th AIAA Aerospace Sciences Meeting. American Institute of Aeronautics and Astronautics (2016). https://doi.org/10.2514/6.20161058
Chan, J., Fernández, D.C.D.R., Carpenter, M.H.: Efficient entropy stable Gauss collocation methods. SIAM J. Sci. Comput. 41(5), A2938–A2966 (2019). https://doi.org/10.1137/18M1209234
Christopher, L.R.: The NASA juncture flow test as a model for effective CFD/experimental collaboration. In: 2018 Applied Aerodynamics Conference. American Institute of Aeronautics and Astronautics (2018). https://doi.org/10.2514/6.20183319
Citro, V., Giannetti, F., Sierra, J.: Optimal explicit Runge-Kutta methods for compressible Navier-Stokes equations. Appl. Numer. Math. 152, 511–526 (2020). https://doi.org/10.1016/j.apnum.2019.11.005
Conde, S., Fekete, I., Shadid, J.N.: Embedded error estimation and adaptive stepsize control for optimal explicit strong stability preserving Runge-Kutta methods (2018). arXiv:1806.08693 [math.NA]
Dormand, J.R., Prince, P.J.: A family of embedded Runge-Kutta formulae. J. Comput. Appl. Math. 6(1), 19–26 (1980). https://doi.org/10.1016/0771050X(80)900133
Fernández, D.C.D.R., Carpenter, M.H., Dalcin, L., Zampini, S., Parsani, M.: Entropy stable h/p-nonconforming discretization with the summation-by-parts property for the compressible Euler and Navier-Stokes equations. SN Partial Differ. Equ. Appl. 1(2), 1–54 (2020). https://doi.org/10.1007/s4298502000009z
Figueroa, A., Jackiewicz, Z., Löhner, R.: Explicit two-step Runge-Kutta methods for computational fluid dynamics solvers. Int. J. Numer. Methods Fluids 93(2), 429–444 (2021). https://doi.org/10.1002/fld.4890
Fisher, T.C., Carpenter, M.H.: High-order entropy stable finite difference schemes for nonlinear conservation laws: finite domains. J. Comput. Phys. 252, 518–557 (2013). https://doi.org/10.1016/j.jcp.2013.06.014
Gassner, G.J., Winters, A.R., Kopriva, D.A.: Split form nodal discontinuous Galerkin schemes with summation-by-parts property for the compressible Euler equations. J. Comput. Phys. 327, 39–66 (2016). https://doi.org/10.1016/j.jcp.2016.09.013
Gottlieb, S., Ketcheson, D.I.: Time discretization techniques. In: Abgrall, R., Shu, C.W. (eds) Handbook of Numerical Analysis, vol. 17, pp. 549–583. Elsevier (2016). https://doi.org/10.1016/bs.hna.2016.08.001
Gustafsson, K.: Control theoretic techniques for stepsize selection in explicit Runge-Kutta methods. ACM Trans. Math. Softw. (TOMS) 17(4), 533–554 (1991). https://doi.org/10.1145/210232.210242
Gustafsson, K., Lundh, M., Söderlind, G.: A PI stepsize control for the numerical solution of ordinary differential equations. BIT Numer. Math. 28(2), 270–287 (1988). https://doi.org/10.1007/BF01934091
Hadri, B., Parsani, M., Hutchinson, M., Heinecke, A., Dalcin, L., Keyes, D.: Performance study of sustained petascale direct numerical simulation on Cray XC40 systems. Concurrency Computat. Pract. Exper. 32(20), e5725 (2020). https://doi.org/10.1002/cpe.5725
Hairer, E., Nørsett, S.P., Wanner, G.: Solving Ordinary Differential Equations I: Nonstiff Problems, Springer Series in Computational Mathematics, vol. 8. Springer-Verlag, Berlin/Heidelberg (2008). https://doi.org/10.1007/9783540788621
Hairer, E., Wanner, G.: Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems, Springer Series in Computational Mathematics, vol. 14. Springer-Verlag, Berlin/Heidelberg (2010). https://doi.org/10.1007/9783642052217
Hall, G., Higham, D.J.: Analysis of stepsize selection schemes for Runge-Kutta codes. IMA J. Numer. Anal. 8(3), 305–310 (1988). https://doi.org/10.1093/imanum/8.3.305
Higham, D.J., Hall, G.: Embedded Runge-Kutta formulae with stable equilibrium states. J. Comput. Appl. Math. 29(1), 25–33 (1990). https://doi.org/10.1016/03770427(90)901923
Hutchinson, M., Heinecke, A., Pabst, H., Henry, G., Parsani, M., Keyes, D.: Efficiency of high order spectral element methods on petascale architectures. In: Kunkel, J., Balaji, P., Dongarra, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol. 9697. Springer, Cham (2016). https://doi.org/10.1007/9783319413211_23
Iyer, P.S., Malik, M.R.: Wall-modeled LES of the NASA juncture flow experiment. In: AIAA Scitech 2020 Forum, pp. 1–23 (2020). https://doi.org/10.2514/6.20201307
Karniadakis, G.E., Sherwin, S.: Spectral/hp Element Methods for Computational Fluid Dynamics. Oxford University Press, Oxford (2013). https://doi.org/10.1093/acprof:oso/9780198528692.001.0001
Kennedy, C.A., Carpenter, M.H.: Fourth order 2N-storage Runge-Kutta schemes. Technical Memorandum NASA-TM-109112, NASA, NASA Langley Research Center, Hampton VA 23681-0001, United States (1994)
Kennedy, C.A., Carpenter, M.H.: Additive Runge-Kutta schemes for convection-diffusion-reaction equations. Appl. Numer. Math. 44(1/2), 139–181 (2003). https://doi.org/10.1016/S01689274(02)001381
Kennedy, C.A., Carpenter, M.H., Lewis, R.M.: Low-storage, explicit Runge-Kutta schemes for the compressible Navier-Stokes equations. Appl. Numer. Math. 35(3), 177–219 (2000). https://doi.org/10.1016/S01689274(99)001415
Ketcheson, D.I., Ahmadia, A.J.: Optimal stability polynomials for numerical integration of initial value problems. Commun. Appl. Math. Comput. Sci. 7(2), 247–271 (2013). https://doi.org/10.2140/camcos.2012.7.247
Ketcheson, D.I.: Highly efficient strong stability-preserving Runge-Kutta methods with low-storage implementations. SIAM J. Sci. Comput. 30(4), 2113–2136 (2008). https://doi.org/10.1137/07070485X
Ketcheson, D.I.: Runge-Kutta methods with minimum storage implementations. J. Comput. Phys. 229(5), 1763–1773 (2010). https://doi.org/10.1016/j.jcp.2009.11.006
Ketcheson, D.I.: Relaxation Runge-Kutta methods: conservation and stability for inner-product norms. SIAM J. Numer. Anal. 57(6), 2850–2870 (2019). https://doi.org/10.1137/19M1263662. arXiv:1905.09847 [math.NA]
Ketcheson, D.I., Parsani, M., Grant, Z.J., Ahmadia, A., Ranocha, H.: RKOpt: a package for the design of numerical ODE solvers. J. Open Source Softw. 5(54), 2514 (2020). https://doi.org/10.21105/joss.02514. https://github.com/ketch/RKOpt
Ketcheson, D.I., Ranocha, H., Parsani, M., bin Waheed, U., Hadjimichael, Y.: NodePy: a package for the analysis of numerical ODE solvers. J. Open Source Softw. 5(55), 2515 (2020). https://doi.org/10.21105/joss.02515. https://github.com/ketch/nodepy
Knepley, M.G., Karpeev, D.A.: Mesh algorithms for PDE with Sieve I: mesh distribution. Sci. Program. 17(3), 215–230 (2009). https://doi.org/10.3233/SPR20090249
Kopriva, D.A.: Implementing Spectral Methods for Partial Differential Equations: Algorithms for Scientists and Engineers. Springer Science & Business Media, New York (2009). https://doi.org/10.1007/9789048122615
Kopriva, D.A., Jimenez, E.: An assessment of the efficiency of nodal discontinuous Galerkin spectral element methods. In: Ansorge, R., Bijl, H., Meister, A., Sonar, T. (eds) Recent Developments in the Numerics of Nonlinear Hyperbolic Conservation Laws, pp. 223–235. Springer, Berlin/Heidelberg (2013). https://doi.org/10.1007/9783642332210_13
Kraaijevanger, J.F.B.M.: Contractivity of Runge-Kutta methods. BIT Numer. Math. 31(3), 482–528 (1991). https://doi.org/10.1007/BF01933264
Kubatko, E.J., Dawson, C., Westerink, J.J.: Time step restrictions for Runge-Kutta discontinuous Galerkin methods on triangular grids. J. Comput. Phys. 227(23), 9697–9710 (2008)
Langseth, J.O., LeVeque, R.J.: A wave propagation method for three-dimensional hyperbolic conservation laws. J. Comput. Phys. 165(1), 126–166 (2000)
Langtry, R.B., Kuntz, M., Menter, F.R.: Drag prediction of engine-airframe interference effects with CFX-5. J. Aircr. 42(6), 1523–1529 (2005)
LeVeque, R.J.: Finite Difference Methods for Ordinary and Partial Differential Equations: Steady-State and Time-Dependent Problems. SIAM, Philadelphia, PA, USA (2007)
Mogensen, P.K., Riseth, A.N.: Optim: a mathematical optimization package for Julia. J. Open Source Softw. 3(24), 615 (2018). https://doi.org/10.21105/joss.00615
Montijano, J.I., Rández, L., Ketcheson, D.I.: Low-storage FSAL embedded pairs of Runge-Kutta methods (2020) (In preparation)
O'Reilly, O., Lundquist, T., Dunham, E.M., Nordström, J.: Energy stable and high-order-accurate finite difference methods on staggered grids. J. Comput. Phys. 346, 572–589 (2017)
Parsani, M., Boukharfane, R., Nolasco, I.R., Fernández, D.C.D.R., Zampini, S., Hadri, B., Dalcin, L.: High-order accurate entropy-stable discontinuous collocated Galerkin methods with the summation-by-parts property for compressible CFD frameworks: scalable SSDC algorithms and flow solver. J. Comput. Phys. 424, 109844 (2021). https://doi.org/10.1016/j.jcp.2020.109844
Parsani, M., Carpenter, M.H., Nielsen, E.J.: Entropy stable discontinuous interfaces coupling for the three-dimensional compressible Navier-Stokes equations. J. Comput. Phys. 290, 132–138 (2015). https://doi.org/10.1016/j.jcp.2015.02.042
Parsani, M., Carpenter, M.H., Nielsen, E.J.: Entropy stable wall boundary conditions for the three-dimensional compressible Navier-Stokes equations. J. Comput. Phys. 292, 88–113 (2015). https://doi.org/10.1016/j.jcp.2015.03.026
Parsani, M., Ketcheson, D.I., Deconinck, W.: Optimized low-order explicit Runge-Kutta schemes for the high-order spectral difference method. In: Proceedings of the 11th Finnish Mechanics Days, pp. 49–54. University of Oulu, Department of Mechanical Engineering (2012). http://hdl.handle.net/10754/333613
Parsani, M., Ketcheson, D.I., Deconinck, W.: Optimized explicit Runge-Kutta schemes for the spectral difference method applied to wave propagation problems. SIAM J. Sci. Comput. 35(2), A957–A986 (2013). https://doi.org/10.1137/120885899
Pegrum, J.: Experimental study of the vortex system generated by a Formula 1 front wing. Ph.D. thesis, Imperial College London (2007)
Prince, P.J., Dormand, J.R.: High order embedded Runge-Kutta formulae. J. Comput. Appl. Math. 7(1), 67–75 (1981). https://doi.org/10.1016/0771050X(81)900103
Rackauckas, C., Nie, Q.: DifferentialEquations.jl: a performant and feature-rich ecosystem for solving differential equations in Julia. J. Open Res. Softw. 5(1), 15 (2017). https://doi.org/10.5334/jors.151
Ranocha, H., Dalcin, L., Parsani, M.: Fully-discrete explicit locally entropy-stable schemes for the compressible Euler and Navier-Stokes equations. Comput. Math. Appl. 80(5), 1343–1359 (2020). https://doi.org/10.1016/j.camwa.2020.06.016. arXiv:2003.08831 [math.NA]
Ranocha, H., Dalcin, L., Parsani, M., Ketcheson, D.I.: Coefficients of optimized low-storage Runge-Kutta methods with automatic step size control for spectral element methods applied to compressible computational fluid dynamics (2021). https://github.com/ranocha/OptimizedRKCFD, https://doi.org/10.5281/zenodo.4671927
Ranocha, H., Lóczi, L., Ketcheson, D.I.: General relaxation methods for initial-value problems with application to multistep schemes. Numer. Math. 146, 875–906 (2020). https://doi.org/10.1007/s00211020011584. arXiv:2003.03012 [math.NA]
Ranocha, H., Sayyari, M., Dalcin, L., Parsani, M., Ketcheson, D.I.: Relaxation Runge-Kutta methods: fully-discrete explicit entropy-stable schemes for the compressible Euler and Navier-Stokes equations. SIAM J. Sci. Comput. 42(2), A612–A638 (2020). https://doi.org/10.1137/19M1263480. arXiv:1905.09129 [math.NA]
Rojas, D., Boukharfane, R., Dalcin, L., Fernández, D.C.D.R., Ranocha, H., Keyes, D.E., Parsani, M.: On the robustness and performance of entropy stable discontinuous collocation methods. J. Comput. Phys. 426, 109891 (2021). https://doi.org/10.1016/j.jcp.2020.109891. arXiv:1911.10966 [math.NA]
Rumsey, C.L., Morrison, J.H.: Goals and status of the NASA juncture flow experiment, p. STOMPAVT246. NATO (2016)
Shu, C.W.: Essentially non-oscillatory and weighted essentially non-oscillatory schemes for hyperbolic conservation laws. Final Report NASA/CR97206253, NASA, Institute for Computer Applications in Science and Engineering, NASA Langley Research Center, Hampton VA United States (1997)
Shu, C.W., Osher, S.: Efficient implementation of essentially non-oscillatory shock-capturing schemes. J. Comput. Phys. 77(2), 439–471 (1988). https://doi.org/10.1016/00219991(88)901775
Sjögreen, B., Yee, H.: High order entropy conservative central schemes for wide ranges of compressible gas dynamics and MHD flows. J. Comput. Phys. 364, 153–185 (2018). https://doi.org/10.1016/j.jcp.2018.02.003
Sjögreen, B., Yee, H.C., Kotov, D.: Skew-symmetric splitting and stability of high order central schemes. In: Journal of Physics: Conference Series, vol. 837, p. 012019. IOP Publishing (2017). https://doi.org/10.1088/17426596/837/1/012019
Söderlind, G.: Automatic control and adaptive timestepping. Numer. Algorithms 31(1/2/3/4), 281–310 (2002). https://doi.org/10.1023/A:1021160023092
Söderlind, G.: Digital filters in adaptive timestepping. ACM Trans. Math. Softw. (TOMS) 29(1), 1–26 (2003). https://doi.org/10.1145/641876.641877
Söderlind, G.: Timestep selection algorithms: adaptivity, control, and signal processing. Appl. Numer. Math. 56(3/4), 488–502 (2006). https://doi.org/10.1016/j.apnum.2005.04.026
Söderlind, G., Wang, L.: Adaptive timestepping and computational stability. J. Comput. Appl. Math. 185(2), 225–243 (2006). https://doi.org/10.1016/j.cam.2005.03.008
Tsitouras, C.: Runge-Kutta pairs of order 5(4) satisfying only the first column simplifying assumption. Comput. Math. Appl. 62(2), 770–775 (2011). https://doi.org/10.1016/j.camwa.2011.06.002
Vincent, P.E., Castonguay, P., Jameson, A.: A new class of high-order energy stable flux reconstruction schemes. J. Sci. Comput. 47, 50–72 (2011). https://doi.org/10.1007/s109150109420z
Vincent, P., Witherden, F., Vermeire, B., Park, J.S., Iyer, A.: Towards green aviation with Python at petascale. In: SC’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–11. IEEE Press (2016)
Ware, J., Berzins, M.: Adaptive finite volume methods for time-dependent PDEs. In: Modeling, Mesh Generation, and Adaptive Numerical Methods for Partial Differential Equations, pp. 417–430. Springer (1995). https://doi.org/10.1007/9781461242482_20
Acknowledgements
Research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST). We are thankful for the computing resources of the Supercomputing Laboratory and the Extreme Computing Research Center at KAUST. Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy EXC 2044-390685587, Mathematics Münster: Dynamics-Geometry-Structure. Special thanks are extended to the McLaren F1 racing Team for providing data, CAD geometries, and setup of the Imperial Front Wing test case.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Ethics declarations
Conflict of Interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Appendices
Appendix A Efficient implementation of \(\hbox {SSP3(2)4[3S*}_{+}]\)
The Butcher coefficients of \(\hbox {SSP3(2)4[3S*}_{+}]\) are
where spaces indicate zeros. Because of its low-storage structure, the method can be implemented efficiently and in a memory-friendly way as
At the end of one step (A2), \(u^{n+1}\) is stored in u and \({\widehat{u}}^{n+1}\) is stored in \({\widehat{u}}\). Usually, it is not \({\widehat{u}}^{n+1}\) itself that is needed but the difference \({\widehat{u}}^{n+1} - u^{n+1}\) used to estimate the error; this difference can be obtained by computing \(({\widehat{u}} - u) / 2\) instead of \(({\widehat{u}}+ u) / 2\) in the last assignment in (A2). If the low-storage assumption introduced in [40] can be applied, \(\hbox {SSP3(2)4[3S*}_{+}]\) can be implemented using only three memory locations for \(u^n\), u and \({\widehat{u}}\). Otherwise, an additional storage location is necessary to evaluate the right-hand side f. Note that the previous value \(u^n\) is already included in this count of memory locations.
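The tableau (A1) and the update formula (A2) are not reproduced in this text, so the following Python sketch is not a transcription of them. It assumes the well-known four-stage, third-order SSP method with weights (1/6, 1/6, 1/6, 1/2) as the main method and equal weights 1/4 for the embedded second-order method; the latter is an assumption that is consistent with the \(({\widehat{u}}+ u) / 2\) final assignment described above. The function name `ssp3_2_4_step` is hypothetical.

```python
import numpy as np

def ssp3_2_4_step(f, t, u_n, dt):
    """One step of the assumed pair; returns (u_new, err).

    err approximates u_hat_new - u_new and is obtained as (u_hat - u) / 2.
    Only u_n, u and u_hat are kept; NumPy temporaries are ignored here.
    """
    u = u_n + 0.5 * dt * f(t, u_n)           # stage value at t + dt/2
    u = u + 0.5 * dt * f(t + 0.5 * dt, u)    # stage value at t + dt
    u = u + 0.5 * dt * f(t + dt, u)          # u_n + dt/2 * (k1 + k2 + k3)
    u_hat = (u_n + 2.0 * u) / 3.0            # u_n + dt/3 * (k1 + k2 + k3)
    u = (2.0 * u_n + u) / 3.0                # stage value at t + dt/2
    u = u + 0.5 * dt * f(t + 0.5 * dt, u)    # third-order solution
    err = 0.5 * (u_hat - u)                  # embedded minus main solution
    return u, err

# Linear test problem u' = -u with one step of size 0.1.
u_new, err = ssp3_2_4_step(lambda t, u: -u, 0.0, np.array([1.0]), 0.1)
```

In this sketch the error estimate is formed directly, so the embedded solution itself is never stored, which mirrors the remark above about replacing the sum by the difference in the last assignment.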
Appendix B Coefficients of the Novel Runge-Kutta Pairs
The low-storage coefficients of the novel methods are listed in double precision in Tables B1, B2, B3, B4, B5 and B6. Full-precision results in electronic form are available in the accompanying repository [64]. We contributed our optimized methods to the freely available open source library DifferentialEquations.jl [62] written in Julia [8].
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ranocha, H., Dalcin, L., Parsani, M. et al. Optimized Runge-Kutta Methods with Automatic Step Size Control for Compressible Computational Fluid Dynamics. Commun. Appl. Math. Comput. 4, 1191–1228 (2022). https://doi.org/10.1007/s4296702100159w
DOI: https://doi.org/10.1007/s4296702100159w
Keywords
 Explicit Runge-Kutta methods
 Step size control
 Compressible Euler equations
 Compressible Navier-Stokes equations
 hp-adaptive spatial discretizations