1 Introduction

Systems of hyperbolic conservation laws are used to model many areas of science and engineering, such as fluid dynamics, acoustics, and electrodynamics. In practical applications, these systems must often be solved numerically. Explicit Runge-Kutta schemes are the most commonly used time discretizations for hyperbolic partial differential equations (PDEs), because of their efficiency and parallel scalability [24, 46, 51]. Overall efficiency of the method also depends on choosing a time step that is as large as possible while still satisfying stability and accuracy requirements. Since stability requirements are frequently more restrictive in this setting, hyperbolic PDE practitioners often adapt the time step size based on a desired Courant-Friedrichs-Lewy (CFL) number. The CFL number involves the ratio of the maximum characteristic speed to the mesh spacing, which is essentially a proxy for the norm of the Jacobian. The optimal CFL number depends on the space and time discretizations chosen, and possibly on the problem; it is often determined by trial and error.

On the other hand, time integration research has long emphasized the efficiency of error-based step size control. Much effort has gone into the design of embedded Runge-Kutta pairs and step size controllers for this purpose. Compared to CFL-based control, error-based control has the advantage of not requiring a manually-tuned CFL number and allowing for control of the temporal error when necessary. CFL-based control has the advantage of (usually) yielding near-optimal efficiency once the appropriate CFL value has been found, as long as the calculation is indeed stability-limited. Error-based step size control for convection-dominated problems has been attempted previously; see e.g., [7, 80]. An ideal time integration algorithm would achieve the efficiency of the CFL-based controller in the stability-limited regime without the need for manually-tuned parameters, while automatically reducing the step size if error control becomes a more restrictive requirement. In this work, we develop such algorithms in the context of computational fluid dynamics (CFD).

Specifically, we focus on low-storage Runge-Kutta pairs (reviewed in Sect. 2) combined with PID step size controllers (reviewed in Sect. 3) and spectral element methods. Spectral element methods can be very efficient for large-scale computations [5, 27, 32, 34, 79]. Because stability is a challenging issue for these schemes, a lot of effort has been devoted to developing energy stable (linearly stable) [3, 54, 78], and entropy stable (nonlinearly stable) spatial discretizations [13, 15, 20, 22, 23, 67, 71, 72]. Stable fully-discrete schemes can be obtained from these semi-discretizations by using a slight modification of classical time integration schemes, based on the relaxation approach [41, 63, 65, 66].

In the paradigm of CFL-based step size control, a common approach to time integrator design is to seek a large region of absolute stability (see e.g., [21] for a recent example of this approach in the context of CFD). For error-based control, a large region of absolute stability is again important (for both the main method and the embedded method). Additionally, when automatic error control is used with step sizes near the stability limit, the concept of step size control stability becomes crucial to the design of the controller. We demonstrate the importance of choosing good step size controllers in Sect. 4. There exists some previous work on developing error-based step size control techniques for convection-dominated problems, principally by Berzins et al. [7, 80].

We compare some existing Runge-Kutta pairs in Sect. 5, and develop optimized Runge-Kutta pairs for discontinuous spectral element semidiscretizations of hyperbolic conservation laws in Sect. 6. The spectral element methods applied for the numerical experiments are implemented in the hp-adaptive, unstructured, curvilinear grid solver SSDC [55]. SSDC is built on top of the Portable and Extensible Toolkit for Scientific computing (PETSc) [6], its mesh topology abstraction (DMPLEX) [44], and its scalable ODE/DAE solver library [1]. Further details on the spatial semidiscretizations can be found in [13, 14, 20, 56, 57, 67]. We perform numerical experiments using the novel schemes in Sect. 7, both for the compressible Euler and Navier-Stokes equations. Finally, we summarize and discuss our results in Sect. 8. We contributed our optimized methods to the freely available open source library DifferentialEquations.jl [62] written in Julia [8].

2 Runge-Kutta Methods and Adaptive Time Stepping

Using the method of lines, a spatial semidiscretization of a hyperbolic PDE yields an ordinary differential equation (ODE) system

$$\left \{ \begin{aligned} \begin{aligned} &\frac{\hbox {d}}{\hbox {d}t} u(t)= f(t, u(t)),&t \in (0,T), \\ &u(0)= u_0, \end{aligned} \end{aligned} \right. $$
(1)

where \(u:[0,T] \rightarrow {\mathbb {R}}^m\) and \(m\) is the number of degrees of freedom in the spatial discretization. An explicit first-same-as-last (FSAL) Runge-Kutta pair with s stages can be described by its Butcher tableau [12, 28]

$$\begin{aligned} \begin{array}{c | c} c &{} A \\ \hline &{} b^{\text{T}} \\ &{} {\widehat{b}}^{\text{T}} \end{array} \end{aligned}$$
(2)

where \(A \in {\mathbb {R}}^{s \times s}\) is strictly lower-triangular, \(b, c \in {\mathbb {R}}^s\), and \({\widehat{b}}\in {\mathbb {R}}^{s+1}\). For (1), a step from \(u^n \approx u(t_n)\) to \(u^{n+1} \approx u(t_{n+1})\), where \(t_{n+1} = t_n + {\Delta t}_n\), is given by

$$\left \{ \begin{aligned} \begin{aligned} &y_i = u^n + {\Delta t}_n \sum _{j=1}^{i-1} a_{ij} \, f\left( t_n + c_j {\Delta t}_n, y_j\right) , \qquad i \in \{1, \cdots , s\}, \\ &u^{n+1}= u^n + {\Delta t}_n \sum _{i=1}^{s} b_{i} \, f\left( t_n + c_i {\Delta t}_n, y_i\right) , \\ &{\widehat{u}}^{n+1}= u^n + {\Delta t}_n \left( \sum _{i=1}^{s} {\widehat{b}}_{i} \, f\left( t_n + c_i {\Delta t}_n, y_i\right) + {\widehat{b}}_{s+1} f\left( t_{n+1}, u^{n+1}\right) \right) . \end{aligned} \end{aligned} \right.$$
(3)

Here, \(y_i\) are the stage values of the Runge-Kutta method and the difference \(u-{\widehat{u}}\) is used to estimate the local truncation error. If \({\widehat{b}}_{s+1}=0,\) then (3) is an ordinary Runge‑Kutta pair; otherwise, it is referred to as an FSAL Runge‑Kutta pair. The FSAL idea is to use the derivative of the new solution as an additional input for the error estimator [19]. If the step is accepted, this costs nothing since the value \(f(t_{n+1}, u^{n+1})\) must be computed at the next step anyway. Usually, for a main method of order q, the embedded method is chosen to be of order \({\widehat{q}}= q-1\); i.e., the schemes are used in local extrapolation mode.
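To make the update (3) concrete, the following minimal sketch performs one step of an FSAL pair for a scalar ODE. The coefficients are those of the Bogacki-Shampine pair \(\hbox {BS}3(2)3_{\mathrm{F}}\); the function name `fsal_pair_step` and the scalar setting are illustrative assumptions, and the FSAL derivative is assumed to be scaled by \({\Delta t}_n\) like the other stage derivatives.

```python
# Bogacki-Shampine 3(2) pair, written in the (A, b, c, b_hat) notation of (2).
A = [[0.0, 0.0, 0.0],
     [1 / 2, 0.0, 0.0],
     [0.0, 3 / 4, 0.0]]
B = [2 / 9, 1 / 3, 4 / 9]
C = [0.0, 1 / 2, 3 / 4]
B_HAT = [7 / 24, 1 / 4, 1 / 3, 1 / 8]  # last entry multiplies the FSAL derivative


def fsal_pair_step(f, t, u, dt, a=A, b=B, c=C, b_hat=B_HAT):
    """One step of (3) for a scalar ODE; returns (u_new, u_hat, f_new)."""
    s = len(b)
    k = []  # stage derivatives f(t_n + c_i dt, y_i)
    for i in range(s):
        y_i = u + dt * sum(a[i][j] * k[j] for j in range(i))
        k.append(f(t + c[i] * dt, y_i))
    u_new = u + dt * sum(b[i] * k[i] for i in range(s))
    # Derivative at the new solution; a production code would reuse it as the
    # first stage derivative of the next step instead of recomputing it.
    f_new = f(t + dt, u_new)
    # Assumption: the FSAL term is scaled by dt like all other terms.
    u_hat = u + dt * (sum(b_hat[i] * k[i] for i in range(s)) + b_hat[s] * f_new)
    return u_new, u_hat, f_new
```

For example, `fsal_pair_step(lambda t, u: u, 0.0, 1.0, 0.1)` returns a third-order approximation of \(\exp (0.1)\) together with the embedded second-order value; their difference is the quantity fed to the step size controller.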

Remark 1

There are different notations for FSAL methods. A common alternative to our choice of using \(A \in {\mathbb {R}}^{s \times s}\), \(b, c \in {\mathbb {R}}^{s}\), and \({\widehat{b}}\in {\mathbb {R}}^{s+1}\) is to embed the baseline s-stage Runge-Kutta method in a method with \(s+1\) stages and Butcher coefficients

$$\begin{aligned} {\tilde{A}} = \begin{pmatrix} A &{} 0 \\ b^{\text{T}} &{} 0 \end{pmatrix} \in {\mathbb {R}}^{(s+1) \times (s+1)}, \quad {\tilde{b}} = \begin{pmatrix} b \\ 0 \end{pmatrix} \in {\mathbb {R}}^{s+1}, \quad {\tilde{c}} = \begin{pmatrix} c \\ 1 \end{pmatrix} \in {\mathbb {R}}^{s+1}. \end{aligned}$$
(4)

Then, the last row of \({\tilde{A}}\) is equal to \({\tilde{b}}\) and \({\widehat{b}}\in {\mathbb {R}}^{s+1}\) can be defined as usual.
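The alternative notation (4) can be sketched in a few lines; the helper name `extend_tableau` is an assumption made for illustration, and tableaux are stored as plain nested lists.

```python
def extend_tableau(a, b, c):
    """Build the (s+1)-stage coefficients of (4) from an s-stage method."""
    a_ext = [list(row) + [0.0] for row in a]  # append a zero column to A
    a_ext.append(list(b) + [0.0])             # new last row: (b^T, 0)
    b_ext = list(b) + [0.0]
    c_ext = list(c) + [1.0]                   # the extra stage lives at t_{n+1}
    return a_ext, b_ext, c_ext
```

By construction, the last row of the extended matrix coincides with the extended weight vector, which is the FSAL property.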

The common assumption \(\sum _j a_{ij} = c_i\) is used throughout this article. For methods with error-based step size control, the initial step size is chosen using the algorithm described in [28, p. 169].

2.1 Low-Storage Methods

A typical Runge‑Kutta implementation requires simultaneous storage of all of the stages and/or their derivatives. Each stage or derivative occupies m words; we refer to this amount of storage (sufficient for holding a copy of the solution on the spatial grid at one point in time) as a register. A low-storage Runge‑Kutta method is one that can be implemented using only a few registers; herein we consider methods that require just three or four registers. Note that three registers are the fewest possible if one requires an error estimator and the ability to reject a step.

We consider the low-storage method classes (with and without the FSAL technique):

  • 3S*: three-register methods that include an error estimate;

  • 3S*+: three-register methods that require a fourth register for the error estimate.

Let \(S_j\) denote a given storage register. The class 3S* methods, introduced in [40], can be implemented using only three storage registers of size \(m\) if assignments of the form

$$\begin{aligned} S_j \leftarrow S_j + f(t, S_j) \end{aligned}$$
(5)

can be made with only \(m+ o(m)\) memory. Otherwise, an additional register is required. The 3S* method family is parameterized by the coefficients \(c_i, \gamma _{1,i}, \gamma _{2,i}, \gamma _{3,i}, \beta _{i}, \delta _{i}\) and can be implemented as described in Algorithm 1. We will also use 3S* or 3S*+ to denote some strong stability preserving (SSP) Runge-Kutta methods that can be implemented using a slight modification of these algorithms, as described in [39].

The FSAL technique has been applied to low-storage methods in [53]; these schemes append an FSAL stage to 3S* methods to get more coefficients for the embedded error estimator.

Algorithm 1 Low-storage implementation of 3S* methods

All 3S*+ methods use an additional storage location for the embedded error estimator. If the embedded method is not used, they reduce to 3S* methods without an embedded scheme. Their low-storage implementation is delineated in Algorithm 2.

Algorithm 2 Low-storage implementation of 3S*+ methods

2.2 Error-Based Step Size Control

We use step size controllers based on digital signal processing [25, 26, 73, 74, 75] implemented in PETSc [1, 6]. In particular, we use PID controllers that select a new time step using the formula

$$\begin{aligned} {\Delta t}_{n+1} = \varepsilon _{n+1}^{\beta _1 / k} \varepsilon _{n }^{\beta _2 / k} \varepsilon _{n-1}^{\beta _3 / k} {\Delta t}_{n}, \end{aligned}$$
(6)

where q is the order of the main method, \({\widehat{q}}\) is the order of the embedded method (usually \({\widehat{q}}= q - 1\)), \(k = \min (q, {\widehat{q}}) + 1\) (usually \(k = {\widehat{q}}+ 1 = q\)), \(\beta _i\) are the controller parameters, and

$$\begin{aligned} \varepsilon _{n+1} = \frac{1}{w_{n+1}}, \quad w_{n+1} = \left( \frac{1}{m} \sum _{i=1}^{m} \left( \frac{u_i^{n+1} - {\widehat{u}}_i^{n+1}}{\texttt {atol}+ \texttt {rtol}\max \{ |u_i^{n+1}|, |{\widehat{u}}_i^{n+1}| \}} \right) ^2 \right) ^{1/2}, \end{aligned}$$
(7)

where \(m\) is the number of degrees of freedom in u, and \(\texttt {atol}\), \(\texttt {rtol}\) are the absolute and relative error tolerances, respectively. Some common controller parameters recommended in the literature are given in Table 1. Unless stated otherwise, we use equal absolute and relative error tolerances. The choice of the weighted/relative error estimate \(w_{n+1}\) is common in the literature [28, Equations (4.10) and (4.11)] and often the default choice in general purpose ODE software such as PETSc [1] or DifferentialEquations.jl [62]. This choice of \(w_{n+1}\) allows the time integration parameters to be decoupled from a possible spatial semidiscretization. In contrast to a quadrature-based approach, it weighs degrees of freedom of different refinement levels in the same way, which can be beneficial, since refined regions (of interest) are not weighed less than coarse regions (without interesting solution features).
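The formulas (6) and (7) translate almost literally into code. The following is a minimal sketch, not the PETSc implementation; the function names are assumptions, solutions are plain lists of floats, and the guard against a vanishing error estimate is an added assumption that is not part of (7).

```python
from math import sqrt


def weighted_error(u, u_hat, atol, rtol):
    """Weighted RMS error estimate w_{n+1} of (7)."""
    total = 0.0
    for ui, vi in zip(u, u_hat):
        scale = atol + rtol * max(abs(ui), abs(vi))
        total += ((ui - vi) / scale) ** 2
    return sqrt(total / len(u))


def inverse_error(w, tiny=1e-300):
    """epsilon_{n+1} = 1 / w_{n+1}; `tiny` avoids division by zero (assumption)."""
    return 1.0 / max(w, tiny)


def pid_step_size(dt, eps_new, eps_old, eps_older, beta, k):
    """New step size from (6) with controller parameters beta = (b1, b2, b3)."""
    b1, b2, b3 = beta
    factor = eps_new ** (b1 / k) * eps_old ** (b2 / k) * eps_older ** (b3 / k)
    return factor * dt
```

A PI controller is obtained by setting the third parameter to zero, e.g. `beta = (0.7, -0.4, 0.0)`. If all three error estimates equal the tolerance, so that every \(\varepsilon \) equals one, the step size is left unchanged.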

If the factor multiplying the old time step \({\Delta t}_{n}\) is too small or the solution is out of physical bounds, e.g., because of negative density/pressure in CFD, the step is rejected and retried with a smaller time step \({\Delta t}_{n}\). The default options used in all numerical experiments described in this work accept a step if the factor multiplying the step size is at least \(0.9^2\). Otherwise, the step is rejected and retried with the step size predicted by the PID controller. If the solution is out of physical bounds, the step is rejected and retried with a time step reduced by a factor of four.
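The acceptance logic just described can be summarized in a short sketch. The function name `next_attempt` and the order in which the two rejection criteria are checked are assumptions; `factor` denotes the factor multiplying \({\Delta t}_{n}\) in (6).

```python
def next_attempt(dt, factor, in_physical_bounds, accept_threshold=0.9 ** 2):
    """Return (accepted, dt_next) for a step of size dt.

    factor: ratio of the step size proposed by the PID controller to dt
    in_physical_bounds: False if, e.g., density or pressure became negative
    """
    if not in_physical_bounds:
        # Unphysical state: reject and retry with a quarter of the step size.
        return False, dt / 4.0
    if factor >= accept_threshold:
        # Accept; the controller prediction is used for the next step.
        return True, factor * dt
    # Reject and retry with the step size predicted by the controller.
    return False, factor * dt
```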

Table 1 Classical step size controllers recommended in the literature

3 CFL- vs. Error-Based Step Size Control

The error-based step size control described above is efficient if the practical time step is limited by the constraint of accuracy. On the other hand, if the allowable time step is determined by stability, and an explicit time discretization is employed, then it is natural to use a step size of the form \({\Delta t}_n \propto 1/L_n\), where \(L_n\) is an approximation of the norm of the Jacobian of the ODE system.

In the time integration of hyperbolic PDEs, it is indeed often the case that the step size is limited in practice by stability rather than accuracy. Therefore it is common practice to use a step size control of the kind just described. For such systems, the norm of the Jacobian is proportional to \(\max _i(\lambda _\mathrm {max}(u^n_i)/{\Delta x}_i)\), where \({\Delta x}_i\) is a local measure of the mesh spacing (at grid point/cell/element i), and \(\lambda _\mathrm {max}\) is the maximal (local) wave speed, related to the largest-magnitude eigenvalue of the flux Jacobian of the hyperbolic system. The step size control thus takes the form (referred to herein as a CFL-based control)

$$\begin{aligned} {\Delta t}_n = \nu \, \min _i \frac{{\Delta x}_i}{\lambda _\mathrm {max}\left( u^n_i\right) }, \end{aligned}$$
(8)

where \(\nu \) is the desired CFL number. The appropriate choice of \(\nu \) depends on the details of the space and time discretizations; it can be studied theoretically using linearization (see e.g., [49]) but is often determined experimentally. An additional complication is the question of how to define \({\Delta x}\). Even on uniform Cartesian grids and regular triangulations [48], multiple waves traveling in different directions make an optimal choice of \(\nu \) difficult. This is an even more challenging question for unstructured grids.
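For comparison with the error-based controller, the CFL-based rule (8) amounts to a single reduction over the grid. The sketch below assumes that the local mesh spacings and maximal wave speeds are available as two lists of equal length; the function name is illustrative.

```python
def cfl_step_size(nu, dx, lam_max):
    """Step size from (8): nu * min_i dx_i / lambda_max(u_i)."""
    return nu * min(h / lam for h, lam in zip(dx, lam_max))
```

For instance, `cfl_step_size(2.0, [0.1, 0.2], [1.0, 4.0])` is limited by the second node, where the ratio of mesh spacing to wave speed is 0.05. The step size scales linearly with both \(\nu \) and the chosen definition of \({\Delta x}\).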

A clear advantage of error-based control is the availability of an estimate of the temporal error. At first glance, error-based step size control seems inappropriate in the stability-limited regime, since the local error may not be very sensitive to small differences between stable and unstable step sizes, near the stable step size limit. A tight error tolerance that ensures stability at all steps might result in an excessively small step size. However, as described in [29, Section IV.2] and discussed below, it is possible to design error-based step size controllers that behave appropriately in the stability-limited regime.

Both classes of controllers require some user-determined parameters: \(\nu \) and \({\Delta x}\) for the CFL-based controller, and \(\texttt {atol}\) and \(\texttt {rtol}\) for the error-based controllers. In this section we show through an example that carefully-designed error-based controllers can achieve near-maximal efficiency in a way that is relatively insensitive to changes in the user parameters. In contrast, the efficiency of the CFL-based controller always bears a linear sensitivity to the parameters \(\nu \) and \({\Delta x}\).

To demonstrate, we consider the two-dimensional advection equation with constant velocity \(a = (1, 1)^{\text{T}}\) in the domain \([-5,5]^2\) with periodic boundary conditions. An initial sinusoid of one wavelength in each direction is advected over the time interval [0, 100]. In space we apply the spectral collocation method of SSDC based on solution polynomials of degree \(p = 4\) [55].

For the CFL-based controller, the ratio of the local mesh spacing and the maximal speed at a node i is estimated in this case as

$$\begin{aligned} \frac{{\Delta x}_i}{\lambda _\mathrm {max}\left( u^n_i\right) } = \sigma \frac{J_i}{\sum _{j=1}^d \left| (J \partial _{x} \xi ^j)_i \cdot a \right| }, \end{aligned}$$
(9)

where d is the number of spatial dimensions (\(d = 2\) for this example), \(a = (1, 1)^{\text{T}}\) is the constant advection velocity, \(J_i\) is the determinant of the grid Jacobian \(\partial _{x} \xi \) at node i, \((J \partial _{x} \xi ^j)_i\) is the contravariant basis vector in direction j at node i [45, Chapter 6], and \(\sigma \) is a normalizing factor depending on the solution polynomial degree, p, which is usually chosen such that a real stability interval of 2 corresponds to \(\nu = 1\).

Fig. 1 Unstructured grid used in the comparison of CFL- and error-based step size control

Two grids will be used: a regular, uniform grid with \(8^2\) elements and the curvilinear grid shown in Fig. 1. The results are summarized in Fig. 2 for the uniform grid and in Fig. 3 for the unstructured grid. We test three time discretizations: the popular fourth-order, five-stage, low-storage method \(\hbox {CK}{4}(3){5}\)[2N] of [35]; the method \(\hbox {KCL}4(3)5[2\hbox {R}_{+}]\hbox {C}\) of [37], which comes with an embedded third-order error estimator; and the SSP method \(\hbox {SSP3(2)3[3S*}_{+}]\) of [70], equipped with the embedded method of [18].

Remark 2

We use the same naming convention as [37], referring to an s-stage Runge-Kutta method of order q with embedded method of order \({\widehat{q}}\) as \(\hbox {NAME}q({\widehat{q}})s\). Additional identifiers indicating low-storage requirements or other properties are appended, e.g., a subscript “F” for FSAL methods. The number of stages s denotes the effective number of RHS evaluations per step, which is one less than the number of stages for FSAL methods. For low-storage methods, the required amount of memory based on certain assumptions is listed following the notation of [40]. In particular, nN methods need only n memory registers of size m if assignments of the form \(S_j \leftarrow \alpha S_j + f(t, S_i)\) can be made without additional allocations. Similarly, nR methods use n memory registers and assignments of the form \(S_j \leftarrow f(t, S_j)\); mS methods were described in Sect. 2.1. As described there, a subscript \(_+\) indicates methods that require an additional storage register if an embedded error estimator is used. Additional parts of the names of Runge‑Kutta methods are usually taken directly from their sources. For example, the method \(\hbox {KCL}4(3)5[2\hbox {R}_{+}]\hbox {C}\) is a fourth-order method with embedded third-order error estimator. It has five stages and requires two memory registers based on the nR assumption. If the embedded error estimator is used, it requires three memory registers. The C suffix is appended as suggested in [37] to indicate a particular design criterion (in this particular case, looking for a compromise between linear stability and accuracy).

Fig. 2 Performance of CFL- and error-based step size controls for a linear advection problem with \(p = 4\) on a uniform grid. For CFL-based controllers, the maximal CFL number is always included. The error-based controller is a standard PI controller with \(\beta _1 = 0.7, \beta _2 = -0.4\) and uses equal absolute and relative tolerances \(\texttt {atol}= \texttt {rtol}= \texttt {tol}\)

The widely-used method \(\hbox {CK}{4}(3){5}\)[2N] has linear stability properties very similar to \(\hbox {KCL}4(3)5[2\hbox {R}_{+}]\hbox {C}\); both methods have the same maximum stable CFL number \(\nu = 2.1\). Thus, they use the same number of RHS evaluations while yielding nearly the same errors. Hence, we omit the method \(\hbox {CK}{4}(3){5}\)[2N] in the following plots and use only the \(\hbox {KCL}4(3)5[2\hbox {R}_{+}]\hbox {C}\) pair, for which it is possible to use error-based step size control. Impressively, the error-based controller manages, for a wide range of tolerances, to use almost exactly the same number of steps as the carefully tuned CFL-based controller. Over this range of tolerances, including \(\texttt {tol}\in [10^{-5}, 10^{-3}]\), the step size is determined by stability. Hence, the number of RHS evaluations and the error are nearly independent of the tolerance in this regime. For tolerances larger than \(10^{-3}\), the final error increases for this long-time simulation while the number of RHS evaluations stays nearly the same. For tighter tolerances (below \(10^{-6}\)), the error-based controller detects accuracy restrictions and increases the number of RHS evaluations. This also leads to a reduction of the final error until it plateaus again because of the dominant spatial error (at ca. \(\texttt {tol}= 10^{-7}\)). However, the number of RHS evaluations keeps increasing.

Fig. 3 Performance of CFL- and error-based step size controls for a linear advection problem with \(p = 4\) on a nonuniform grid. For CFL-based controllers, the maximal CFL number is always included. The error-based controller is a standard PI controller with \(\beta _1 = 0.7, \beta _2 = -0.4\) and uses equal absolute and relative tolerances \(\texttt {atol}= \texttt {rtol}= \texttt {tol}\)

Using the same CFL number of \(\nu = 2.1\) on the unstructured grid still results in a stable simulation. However, the CFL number can be doubled there without increasing the error significantly. Hence, the user has to tune this parameter carefully to get a stable and efficient simulation. In contrast, using error-based step size control we see behavior very similar to what was observed for the uniform grid. The same error tolerance can be used, resulting in the same optimal number of function evaluations determined manually for the CFL-based step size controller. This demonstrates the enhanced robustness properties of error-based step size control.

These examples suggest that error-based control is more robust to changes in the grid and less sensitive to the required user parameters. Similar results have been obtained using other Runge-Kutta schemes for this problem and for more challenging problems, some of which are presented later in this work. For practitioners whose primary interest is in applying the schemes to solve challenging scientific problems or developing spatial semidiscretizations, error-based time step controllers seem favorable, since the most important design choices have to be provided by the developers of the time integration schemes and the practitioners have to choose only the rather robust error tolerance of the solver.

4 Importance of Controller Parameters

Standard error-based controllers will often work acceptably in the asymptotic regime (i.e., the regime where the leading truncation error term strongly dominates all subsequent terms). However, as demonstrated in Sect. 3, applications involving convection-dominated problems are often constrained by stability, so that one may be working outside the asymptotic regime. In this case, the standard theory does not apply; instead, step size control stability has to be considered [30].

Following [29, Section IV.2], step size control stability can be explained using the linear model problem \(\frac{\hbox {d}}{\hbox {d}t} u(t) = \lambda u(t)\). Given an explicit Runge-Kutta method with embedded error estimator and a PID controller (6) with parameters \(\beta _i\), the update formulae become

$$ \left \{ \begin{aligned} \begin{aligned} u^{n+1}&= R\left( {\Delta t}_n \lambda \right) u^n, \\ e^{n+1}&= E\left( {\Delta t}_n \lambda \right) u^n, \\ {\Delta t}_{n+1}&= \text {PID}\left( \beta , \texttt {atol}, \texttt {rtol}, e^{n+1}, e^{n}, e^{n-1}\right) , \end{aligned} \end{aligned} \right. $$
(10)

where R is the stability polynomial of the main method, E is the difference of the stability polynomials of the embedded and the main method, and \(e\) is the (local) error estimate. By taking logarithms, this update formula can be reduced to a difference recursion with fixed points on the boundary of the stability region of the main method. To get a stable behavior, the spectral radius of the associated Jacobian has to be less than unity [29, Proposition IV.2.3]. For a PID controller (6), this Jacobian becomes [37]

(11)

where k is the order of the error estimator; if \({\widehat{q}}= q - 1\) is the order of the embedded method, then \(k = q = {\widehat{q}}+ 1\). To get step size control stability, one can fix a controller such as the standard I controller and optimize the Runge‑Kutta pair accordingly as demonstrated in [31]. The other possibility, pursued here, is to optimize the controller parameters for a given Runge‑Kutta pair.
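The stability polynomials R and E entering (10) can be evaluated by applying the pair to the linear model problem with unit step size. The sketch below is one way to do this for an explicit pair in the notation of (2) and (3); the function name is an assumption, and the FSAL term is assumed to be scaled by the step size like the other terms. It does not assemble the Jacobian (11).

```python
def stability_functions(z, a, b, b_hat):
    """Return (R(z), E(z)) for an explicit pair applied to u' = lambda*u.

    z = dt * lambda may be complex. E is the stability polynomial of the
    embedded method minus that of the main method.
    """
    s = len(b)
    y = []  # stage values for u^n = 1
    for i in range(s):
        y.append(1.0 + z * sum(a[i][j] * y[j] for j in range(i)))
    r = 1.0 + z * sum(b[i] * y[i] for i in range(s))
    r_hat = 1.0 + z * sum(b_hat[i] * y[i] for i in range(s))
    if len(b_hat) == s + 1:
        # FSAL contribution: derivative at the new solution, lambda * R(z).
        r_hat += b_hat[s] * z * r
    return r, r_hat - r
```

With the Bogacki-Shampine coefficients `a = [[0, 0, 0], [1/2, 0, 0], [0, 3/4, 0]]`, `b = [2/9, 1/3, 4/9]`, and `b_hat = [7/24, 1/4, 1/3, 1/8]`, this reproduces \(R(z) = 1 + z + z^2/2 + z^3/6\), and the leading term of E is of order \(z^3\), consistent with \(k = 3\) for a 3(2) pair. Points on the boundary of the stability region satisfy \(|R(z)| = 1\).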

While one might hope that a controller designed to work well with one method will also work well with other methods, this is generally not the case. Rather, a controller should be designed for the given error estimator; cf. [4] for the case of linear multistep methods. To demonstrate this, we consider again the linear advection problem described in Sect. 3 with a uniform mesh. We will take the PI34 controller with \(\beta _1 = 0.7, \beta _2 = -0.4\) [25], designed for use with the classical \(\hbox {DP}{5}(4){6}_{\mathrm{F}}\) method of [61], but use instead the \(\hbox {BS}5(4)7_{\mathrm{F}}\) method of [10]. Note that both are 5(4) pairs designed with similar purposes in mind. Using a tolerance of \(\texttt {tol}= 10^{-5}\), the integration requires 5 015 RHS evaluations and includes many rejected steps. Applying instead the optimized coefficients \(\beta = (0.28, -0.23)\) derived later in this manuscript results in only 4 119 RHS evaluations and a nearly identical final error. A significant performance gain is thus obtained by applying appropriate controller parameters, cf. Table 2.

Table 2 Performance of different controllers for \(\hbox {BS}5(4)7_{\mathrm{F}}\): number of function evaluations (#FE) and rejected steps (#R) for the linear advection problem with uniform grid using polynomials of degree \(p = 4\) as in Sect. 3

The spectral radius of the Jacobian (11) determining step size control stability is plotted in Fig. 4. We see that the standard PI34 controller is unstable near the negative real axis while the optimized one is stable.

Fig. 4 Stability region scaled by the effective number of stages and spectral radius of the Jacobian (11) determining step size control stability for \(\hbox {BS}5(4)7_{\mathrm{F}}\). The standard PI34 controller is unstable near the negative real axis while the optimized PI controller is stable along the boundary of the stability region

5 Comparison of Existing Methods

Here, we compare some general purpose methods and schemes designed for semidiscretizations of hyperbolic conservation laws. Since we are interested in error-based step size control, we consider only schemes with embedded error estimators. Hence, we consider the general purpose schemes

  • \(\hbox {BS}3(2)3_{\mathrm{F}}\), third-order, four-stage FSAL method of [9],

  • \(\hbox {BS}5(4)7_{\mathrm{F}}\), fifth-order, eight-stage FSAL method of [10],

  • \(\hbox {DP}{5}(4){6}_{\mathrm{F}}\), fifth-order, seven-stage FSAL method of [61],

the SSP schemes

  • \(\hbox {SSP3(2)3[3S*}_{+}]\), third-order, three-stage SSP method of [70] with the embedded method of [18],

  • \(\hbox {SSP3(2)4[3S*}_{+}]\), third-order, four-stage SSP method of [47] with the embedded method of [18] which can be implemented efficiently in low-storage form as described in Appendix A,

and the low-storage methods optimized for hyperbolic conservation laws

  • \(\hbox {KCL}3(2)4[2\hbox {R}_{+}]\hbox {C}\), third-order, four-stage method of [37],

  • \(\hbox {KCL}4(3)5[2\hbox {R}_{+}]\hbox {C}\), fourth-order, five-stage method of [37],

  • \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\), fourth-order, five-stage method of [37],

  • \(\hbox {KCL5(4)9[2R}_{+}]\hbox {S}\), fifth-order, nine-stage method of [37].

The results shown here, obtained with three commonly-used general purpose methods, are typical of what we have found in tests with a much wider range of methods. These results are sufficient to illustrate our main conclusions. We do not consider \({\hbox {SSP4(3)10[3S}^{*}}_{+}]\) of [39] with the embedded method of [18] because step size control stability cannot be achieved for this method and any PID controller tested. The embedded method for \({\hbox {SSP3(2)9[3S}^{*}}_{+}]\) proposed in [18] also does not lead to step size control stability. We have created a new embedded method with a stable optimized controller. However, it does not perform better than \(\hbox {SSP3(2)4[3S*}_{+}]\), even with manually tuned CFL-based step size control.

In the following, we will use three representative test problems to compare the performance of these schemes. All test problems are semidiscretizations of the compressible Euler equations in d space dimensions

$$\begin{aligned} \partial _t u + \sum _{j=1}^d \partial _j f^j(u) = \sigma , \quad u(0) = u_0, \end{aligned}$$
(12)

where the conserved variables \(u = (\rho , \rho v^{\text{T}}, \rho e)^{\text{T}}\) are the density \(\rho \), the momentum \(\rho v\), and the energy \(\rho e\). The flux for the spatial coordinate j is

$$\begin{aligned} f^j(u) = \begin{pmatrix} \rho v_j \\ \left( \rho v v_j + p \delta _{i,j} \right) _i \\ \left( \rho e + p \right) v_j \end{pmatrix}, \end{aligned}$$
(13)

where \(p = \rho T = (\gamma -1) (\rho e - \rho v^2 / 2)\) is the pressure, T is the temperature, and an ideal gas law with ratio of specific heats \(\gamma = \nicefrac {7}{5}\) is assumed. The spatial semidiscretizations use entropy-dissipative nodal DG methods with polynomials of degree p on Legendre-Gauss-Lobatto nodes with upwind interface fluxes implemented in SSDC. We present detailed results for \(p = 2\), which is a relevant choice in practical CFD applications. The results are similar for higher-order semidiscretizations such as polynomials of degree \(p \in \{3, 4\}\), presented in the supplementary material in more detail.
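As an illustration of (13), the following sketch evaluates the flux in one coordinate direction from a vector of conserved variables. It is not the SSDC implementation; the function name, the list-based storage, and the zero-based direction index `j` are assumptions.

```python
def euler_flux(u, j, gamma=7 / 5):
    """Flux f^j(u) of (13) for u = (rho, rho*v_1, ..., rho*v_d, rho*e).

    j is the zero-based index of the spatial coordinate.
    """
    d = len(u) - 2
    rho, mom, rho_e = u[0], u[1:1 + d], u[-1]
    v = [mi / rho for mi in mom]
    # Ideal gas law: p = (gamma - 1) * (rho*e - rho*|v|^2 / 2)
    p = (gamma - 1.0) * (rho_e - 0.5 * rho * sum(vi * vi for vi in v))
    flux = [rho * v[j]]                                   # mass
    flux += [mom[i] * v[j] + (p if i == j else 0.0)       # momentum
             for i in range(d)]
    flux.append((rho_e + p) * v[j])                       # energy
    return flux
```

For a gas at rest, only the pressure term in the momentum component of direction j is nonzero.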

5.1 Inviscid Taylor-Green Vortex

The inviscid Taylor-Green vortex in \(d=3\) space dimensions is a classical test case to study the stability of numerical methods [23]. The initial condition given by

$$ \left \{ \begin{aligned} \begin{aligned} &p(x) = 1 + \frac{1}{16} \left( \cos (2 x_1) + \cos (2 x_2) \right) \left( \cos (2 x_3) + 2\right) , \\ &v_1(x) = \sin (x_1) \cos (x_2) \cos (x_3), \quad v_2(x) = -\cos (x_1) \sin (x_2) \cos (x_3), \\ & v_3(x) = 0, \quad T(x) = 1, \end{aligned} \end{aligned} \right. $$
(14)

with the Mach number \({Ma}= 0.1\) is evolved in the periodic domain \([-\uppi , \uppi ]^3\). Unless stated otherwise, we use 8 elements per coordinate direction and the final time of \(t = 20\). This test case is chosen as an example where the time step is mostly restricted by stability, the solution becomes turbulent, and a relatively low Mach number is used.

5.2 Isentropic Vortex

The isentropic vortex is a widely used benchmark problem [69] with an analytical solution. For the stationary case, the exact solution is given by

$$ \left \{ \begin{aligned} \begin{aligned} &T = T_{\infty } - \frac{(\gamma -1) {Ma}_\infty ^2\beta ^2}{8 \gamma \uppi ^2}\exp \bigl (1-r^2\bigr ), \\ & v_{\mathrm {t}} = \frac{r\beta }{2\uppi }\exp {\left( \frac{1-r^2}{2}\right) }, \qquad \rho = T^{1 / (\gamma -1)}, \end{aligned} \end{aligned} \right. $$
(15)

where r is the distance from the axis of the vortex and \(v_{\mathrm {t}}\) is the tangential velocity. The moving vortex solution is obtained by a uniform translation in the direction of the velocity vector field.

Herein, the simulation domain is a cube \([-5, 5]^3\) with periodic boundaries where the vortex rotates around the axis \((1, 1, 0)^{\text{T}}\), a direction not aligned with the grid. The parameters for this test are \(\gamma = 1.4\), \({Ma}= 0.5\), \(\beta = 5\) and \(T_\infty = 1\). Unless stated otherwise, we use 8 elements per coordinate direction for optimizing controllers and 20 elements for examples with the final time of \(t = 20\). This test case is chosen as an example where the time step can be restricted by accuracy for tight tolerances and because of the existence of an analytical solution.
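The stationary solution (15) with the parameters listed above can be evaluated as in the following sketch. The function name is an assumption, and \({Ma}_\infty \) in (15) is assumed to be the Mach number \({Ma}= 0.5\) quoted in the text; the rotation of the vortex axis and the translation of the moving vortex are not included.

```python
from math import exp, pi


def isentropic_vortex(r, gamma=1.4, mach=0.5, beta=5.0, t_inf=1.0):
    """Return (T, v_t, rho) of (15) at distance r from the vortex axis."""
    temp = t_inf - ((gamma - 1.0) * mach ** 2 * beta ** 2
                    / (8.0 * gamma * pi ** 2)) * exp(1.0 - r * r)
    v_t = r * beta / (2.0 * pi) * exp((1.0 - r * r) / 2.0)
    rho = temp ** (1.0 / (gamma - 1.0))
    return temp, v_t, rho
```

The tangential velocity vanishes on the axis, and far from the axis the temperature and density approach their free-stream values.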

5.3 Smooth Flow with Source Terms

The analytical solution

$$\begin{aligned} \rho (t,x) = \frac{3}{2} + \sin (\uppi (x-t) ), \quad v(t,x) = 1, \quad p(t,x) = 1 + A_\mathrm{p} \left( 1 + \sin (\omega _\mathrm{p} t) \right) \end{aligned}$$
(16)

is imposed as initial condition in the periodic domain \([-1, 1]\) and the source term

$$\begin{aligned} \sigma _{\rho e}(t,x) = \frac{A_\mathrm{p} \omega _\mathrm{p}}{\gamma - 1} \cos (\omega _\mathrm{p} t) \end{aligned}$$
(17)

is added to the right-hand side of the energy equation. The variation of the pressure with amplitude \(A_\mathrm{p} = 50\) and frequency \(\omega _\mathrm{p} = \nicefrac {\uppi }{5}\) results in a cyclic variation of the CFL restriction on the time step. Unless stated otherwise, we use 20 elements and the final time of \(t = 20\). This test case is chosen to assess the ability of the schemes to adapt to varying time step restrictions and because of the existence of an analytical solution.
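To illustrate why this setup produces a cyclic step size restriction, the sketch below evaluates the analytical solution (16), checks by finite differences that the energy equation is balanced by the source term (17), and samples the maximal wave speed \(|v| + c\) at two times. The value \(\gamma = 1.4\) and the total energy \(\rho e = p / (\gamma - 1) + \rho v^2 / 2\) of an ideal gas are assumptions of this sketch; all function names are illustrative.

```python
import math

GAMMA = 1.4              # assumed ratio of specific heats
A_P = 50.0               # pressure amplitude
OMEGA_P = math.pi / 5    # pressure frequency

def density(t, x):
    return 1.5 + math.sin(math.pi * (x - t))

def pressure(t):
    return 1.0 + A_P * (1.0 + math.sin(OMEGA_P * t))

def total_energy(t, x):
    # rho * e = p / (gamma - 1) + rho * v**2 / 2 with v = 1
    return pressure(t) / (GAMMA - 1.0) + 0.5 * density(t, x)

def energy_source(t):
    """Source term (17)."""
    return A_P * OMEGA_P / (GAMMA - 1.0) * math.cos(OMEGA_P * t)

def energy_residual(t, x, h=1e-5):
    """Central-difference estimate of d/dt(rho e) + d/dx((rho e + p) v) with v = 1."""
    dt_term = (total_energy(t + h, x) - total_energy(t - h, x)) / (2 * h)
    flux_right = total_energy(t, x + h) + pressure(t)
    flux_left = total_energy(t, x - h) + pressure(t)
    return dt_term + (flux_right - flux_left) / (2 * h)

def max_wave_speed(t, n=200):
    """Largest |v| + c over n equidistant sample points of the domain [-1, 1]."""
    speeds = []
    for i in range(n):
        x = -1.0 + 2.0 * i / n
        speeds.append(1.0 + math.sqrt(GAMMA * pressure(t) / density(t, x)))
    return max(speeds)

print(energy_residual(1.3, 0.2) - energy_source(1.3))   # close to zero
print(max_wave_speed(2.5), max_wave_speed(7.5))          # pressure maximum vs. minimum
```

Under these assumptions, the maximal wave speed varies by more than a factor of six over one pressure period of length \(2\uppi / \omega _\mathrm{p} = 10\), so a stability-limited time step has to vary by a comparable factor.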

5.4 Optimization of Step Size Controllers

As explained in Sect. 4, the choice of appropriate step size controller parameters is important to obtain good performance when the schemes are run at the stability limit. Hence, we have optimized controller parameters for each scheme. In general, the optimal time step controller parameters for a given Runge-Kutta pair will depend somewhat on the problem under consideration. No single controller is optimal for all test cases, but for the experiments conducted in this work, good controllers are usually within ca. 5% of the optimal performance.

For the low-storage schemes of [37], we used the PI34 controller proposed originally with them. We also tested the PID controller using \(\beta = (0.49, -0.34, 0.10)\) proposed in [36]. In addition, we performed an optimization of controller parameters for each method, as follows.

We ran simulations of all three test cases described above and measured the performance of each scheme (in terms of the number of right-hand side evaluations). We used a brute-force search over the domain \(\texttt {tol}\in [10^{-8}, 10^{-1}]\), sampling at each power of ten, and \(\beta _1 \in [0.1, 1.0]\), \(\beta _2 \in [-0.4, -0.05]\), \(\beta _3 \in [0.0, 0.1]\), sampling at an interval of 0.01 in each parameter and restricting a priori to parameter values yielding step size control stability for the given scheme (computed using NodePy [43]). The final time for these simulations was set to \(t = 8\) for (14), \(t = 4\) for (15), and \(t = 20\) for (16) to make the brute-force optimization feasible. From the resulting data, consisting of thousands of runs with each method, an overall best choice of parameters was selected as in Sect. 6.1. Usually, this kind of min-max problem was approached by comparing the controllers minimizing the maximum, the median, or the 95th percentile of the RHS evaluations across all CFD simulations. Then, the final choice was made manually, taking into account step size control stability and design criteria for PID controllers.
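The search space and the selection criteria can be sketched as follows. This is an illustration only: the RHS evaluation counts are made up, the a priori stability filter computed with NodePy is omitted, and the nearest-rank definition of the percentile is our assumption.

```python
import math
import statistics
from itertools import product

def candidate_betas():
    """All (beta1, beta2, beta3) triples on the 0.01 grid described in the text."""
    # Work in integer hundredths to avoid floating-point drift in the grid.
    b1 = range(10, 101)     # 0.10, ..., 1.00
    b2 = range(-40, -4)     # -0.40, ..., -0.05
    b3 = range(0, 11)       # 0.00, ..., 0.10
    return [(i / 100, j / 100, k / 100) for i, j, k in product(b1, b2, b3)]

def percentile(values, q):
    """Nearest-rank percentile (q in (0, 100]) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def best_controller(rhs_counts, statistic):
    """Return the beta whose `statistic` of the RHS evaluation counts is smallest."""
    return min(rhs_counts, key=lambda beta: statistic(rhs_counts[beta]))

# Hypothetical RHS evaluation counts of three controllers over four runs.
rhs_counts = {
    (0.60, -0.20, 0.00): [1000, 1020, 1500, 990],
    (0.70, -0.23, 0.00): [1050, 1040, 1060, 1045],
    (0.38, -0.18, 0.01): [1100, 1110, 1090, 1105],
}
by_max = best_controller(rhs_counts, max)
by_median = best_controller(rhs_counts, statistics.median)
by_p95 = best_controller(rhs_counts, lambda counts: percentile(counts, 95))
print(len(candidate_betas()), by_max, by_median, by_p95)
```

In this made-up data set, the median favors a controller that is fast in most runs but has one bad outlier, whereas the maximum and the 95th percentile favor a more robust one. Such disagreements are one reason for a final manual choice.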

5.5 Results for Existing Schemes

For \(\hbox {BS}3(2)3_{\mathrm{F}}\), all of the controllers from Table 1 perform reasonably well, PI42 being slightly better than the others. In general, a wide range of controller parameters is acceptable for this scheme. As typified by the example in Sect. 4, standard controllers do not perform well for \(\hbox {BS}5(4)7_{\mathrm{F}}\). We found instead that \(\beta = (0.28, -0.23, 0.00)\) is a reasonable choice for this scheme. For \(\hbox {DP}{5}(4){6}_{\mathrm{F}}\), the PI34 controller (which was originally designed for it by Gustafsson [25]) performs reasonably well in our test cases and optimized controllers like \(\beta = (0.61, -0.27, 0.01)\) do not perform significantly better.

Subsequently, we used the optimized controller parameters and ran full simulations (up to \(t=20\)) for each method with a range of tolerances. Results are shown in Tables 3, 4, and 5, where polynomials of degree \(p = 2\) have been used. There, we only show results for a tolerance \(\texttt {tol}= 10^{-5}\), since this choice is usually good for these small-scale test problems. Extended details are available in the supplementary material. For the inviscid Taylor-Green vortex, the time step is indeed restricted by stability for most tolerances, indicated by the approximately constant number of function evaluations, except for the very tight tolerance \(\texttt {tol}= 10^{-8}\) and some schemes. For the isentropic vortex (15), the step size is restricted by stability for tolerances \(\gtrapprox 10^{-6}\). For smaller tolerances, the number of function evaluations increases. However, this does not result in a significant change of the total error, which is determined mostly by the spatial semidiscretization. Finally, for the smooth flow with the source term (16), the step size is again restricted mostly by stability constraints.

5.5.1 General Purpose Methods

For very loose tolerances \(\gtrapprox 10^{-3}\), \(\hbox {BS}5(4)7_{\mathrm{F}}\) and \(\hbox {DP}{5}(4){6}_{\mathrm{F}}\) result in a significant overhead caused by step rejections for some test cases. Otherwise, \(\hbox {BS}5(4)7_{\mathrm{F}}\) performs better than \(\hbox {DP}{5}(4){6}_{\mathrm{F}}\). Other fifth-order general purpose schemes like \(\hbox {T5(4)6}_{\mathrm{F}}\) of [77] perform slightly better, usually yielding an improvement of ca. 5%. However, \(\hbox {BS}3(2)3_{\mathrm{F}}\) is ca. 50% more efficient as long as the time step is restricted by stability.

These results do not change significantly if slightly higher-order semidiscretizations are employed in space (see supplementary material), up to polynomials of degree \(p = 4\), resulting in fifth-order convergence in space. Hence, matching the order of accuracy in space and time is not strictly necessary if one is interested in fixed mesh sizes, especially in common CFD applications. This remains true even if the polynomial degree is increased to \(p = 7\) for the test problems considered here. Then, the temporal error becomes significant and the error of the fully discrete method plateaus only at relatively tight tolerances such as \(10^{-8}\). Nevertheless, \(\hbox {BS}3(2)3_{\mathrm{F}}\) is still the most efficient method for such high-order methods and tight tolerances.

Table 3 Performance of general purpose schemes: number of function evaluations (#FE), rejected steps (#R), and \(L^2\) error of the density for the inviscid Taylor-Green vortex (14), the isentropic vortex (15), and the flow with source term (16) using polynomials of degree \(p = 2\)

5.5.2 SSP Methods

The popular method \(\hbox {SSP3(2)3[3S*}_{+}]\) can be equipped with the PI34 controller to give acceptable step size control performance; slightly better behavior can be achieved by choosing \(\beta = (0.70, -0.37, 0.05)\). For loose and medium tolerances, this scheme performs similarly to \(\hbox {BS}3(2)3_{\mathrm{F}}\). \(\hbox {BS}3(2)3_{\mathrm{F}}\) is significantly more efficient than \(\hbox {SSP3(2)3[3S*}_{+}]\) at tight tolerances.

\(\hbox {SSP3(2)4[3S*}_{+}]\) performs ca. 50% better than \(\hbox {SSP3(2)3[3S*}_{+}]\) or \(\hbox {BS}3(2)3_{\mathrm{F}}\) at loose and medium tolerances for the inviscid Taylor-Green vortex using the optimized controller \(\beta = (0.55, -0.27, 0.05)\). At loose tolerances, it is also ca. 15% more efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) for the isentropic vortex. However, the number of RHS evaluations increases drastically for tighter tolerances, making \(\hbox {SSP3(2)4[3S*}_{+}]\) less efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) for these parameters. The results for the flow with source term are similar but less pronounced. Hence, \(\hbox {SSP3(2)4[3S*}_{+}]\) can be more efficient than the best schemes so far but the embedded method does not seem to be reliable enough to make the choice of the tolerance as robust as for other schemes. Additionally, the choice of appropriate controller parameters can be crucial for \(\hbox {SSP3(2)4[3S*}_{+}]\), since some standard controllers do not perform well.

For higher polynomial degrees \(p \in \{3, 4\}\), \(\hbox {SSP3(2)4[3S*}_{+}]\) is still a very interesting method that can even beat \(\hbox {BS}3(2)3_{\mathrm{F}}\). \(\hbox {SSP3(2)3[3S*}_{+}]\) is less efficient than \(\hbox {SSP3(2)4[3S*}_{+}]\) also for these higher polynomial degrees. For even higher polynomial degrees such as \(p = 7\), the situation changes a bit since the temporal error becomes significant. While \(\hbox {SSP3(2)4[3S*}_{+}]\) is still the most efficient method so far for medium tolerances, it becomes less efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) for the vortex problems at a tolerance of \(10^{-8}\), since it is less optimized for accuracy than that general purpose method.

Table 4 Performance of SSP schemes: number of function evaluations (#FE), rejected steps (#R), and \(L^2\) error of the density for the inviscid Taylor-Green vortex (14), the isentropic vortex (15), and the flow with source term (16) using polynomials of degree \(p = 2\)

5.5.3 Low-Storage Methods

Some standard controllers like PI34 perform mostly acceptably well for \(\hbox {KCL}3(2)4[2\hbox {R}_{+}]\hbox {C}\) (based on the number of step rejections). Nevertheless, an optimized controller with parameter \(\beta = (0.50, -0.35, 0.10)\) results in a few percent fewer function evaluations. However, \(\hbox {BS}3(2)3_{\mathrm{F}}\) is up to 20% more efficient, in accordance with the real stability interval scaled by the effective number of stages, which is three for \(\hbox {BS}3(2)3_{\mathrm{F}}\) because of the FSAL property.

The PI34 controller does not perform well for the other low-storage schemes. For \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\) (but not for the other schemes), the PID controller with \(\beta = (0.49, -0.34, 0.10)\) proposed in [36] performs much better. An optimized controller with parameter \(\beta = (0.41, -0.28, 0.08)\) performs even slightly better, making this scheme more efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) for the inviscid Taylor-Green vortex and slightly more efficient for the isentropic vortex. However, \(\hbox {BS}3(2)3_{\mathrm{F}}\) is still better for the other test case. \(\hbox {KCL}4(3)5[2\hbox {R}_{+}]\hbox {C}\) and \(\hbox {KCL5(4)9[2R}_{+}]\hbox {S}\) were more challenging for optimizing controller parameters and less efficient than \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\).

As for \(p = 2\), \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\) is usually the most efficient existing low-storage method of [37] for \(p \in \{3, 4\}\), and it can also be more efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) for the vortex problems. However, \(\hbox {SSP3(2)4[3S*}_{+}]\) is even more efficient there. Additionally, the sensitivity of the step size controller for the low-storage methods is greater than for \(\hbox {BS}3(2)3_{\mathrm{F}}\). For \(p = 7\), all of the low-storage methods considered here result in a non-negligible number of step rejections for the inviscid Taylor-Green vortex. Nevertheless, the fourth-order accurate methods can be up to 15% more efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) there, although \(\hbox {SSP3(2)4[3S*}_{+}]\) is still more efficient for this test problem. At tight tolerances such as \(10^{-8}\), \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\) is the most efficient method considered so far for the isentropic vortex and the smooth flow with source term.

Table 5 Performance of low-storage schemes: number of function evaluations (#FE), rejected steps (#R), and \(L^2\) error of the density for the inviscid Taylor-Green vortex (14), the isentropic vortex (15), and the flow with source term (16) using polynomials of degree \(p = 2\)

5.5.4 Discussion

All of the general purpose schemes make use of the FSAL technique. Additionally, the stability region of the embedded scheme is always at least as big as the one of the main method. Although the 2R low-storage schemes were optimized for convection-dominated problems, they were outperformed for all test problems and at almost all tolerances by the general purpose method \(\hbox {BS}3(2)3_{\mathrm{F}}\). Possible reasons for this are that the 2R method coefficients are chosen subject to more stringent low-storage requirements, they do not exploit the FSAL technique, and they have embedded methods with a stability region that is smaller than that of the main method in some areas. When the time step is restricted by stability, this can effectively reduce the allowable time step for the main method.

This last point is illustrated in Fig. 5, which shows the stability regions (scaled by the effective number of stages) of the main and embedded method for three pairs. We see that although the stability region of \(\hbox {BS}3(2)3_{\mathrm{F}}\) includes less of the real axis than that of \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\), the embedded method for \(\hbox {BS}3(2)3_{\mathrm{F}}\) extends further than that of \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\). The last stability region in the figure corresponds to a new method developed in the next section. Like \(\hbox {BS}3(2)3_{\mathrm{F}}\), it has the useful property that the stability region of the embedded method contains that of the main method.
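The real stability interval used in these comparisons can be computed directly from a stability polynomial. The sketch below does this by a scan followed by bisection. It uses the polynomial \(R(z) = 1 + z + z^2/2 + z^3/6\) shared by all explicit three-stage, third-order Runge-Kutta methods, which we take to apply to the main method of \(\hbox {BS}3(2)3_{\mathrm{F}}\). The stability polynomials of the other pairs are not reproduced here, and the function names are illustrative.

```python
def stability_polynomial(coeffs, z):
    """Evaluate R(z) = sum_j coeffs[j] * z**j by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * z + c
    return result

def real_stability_interval(coeffs, step=1e-3, x_max=50.0):
    """Largest x with |R(-s)| <= 1 for all sampled s in [0, x].

    The negative real axis is scanned with spacing `step` until the first
    violation (or until x_max); the boundary is then refined by bisection.
    """
    x = 0.0
    while x + step <= x_max and abs(stability_polynomial(coeffs, -(x + step))) <= 1.0:
        x += step
    lo, hi = x, x + step
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if abs(stability_polynomial(coeffs, -mid)) <= 1.0:
            lo = mid
        else:
            hi = mid
    return lo

rk3 = [1.0, 1.0, 1.0 / 2.0, 1.0 / 6.0]
interval = real_stability_interval(rk3)
effective_stages = 3   # value stated in the text for BS3(2)3_F
print(interval, interval / effective_stages)
```

Under these assumptions, the real stability interval is about 2.51, i.e., roughly 0.84 per effective stage. Scaling by the effective number of stages is what makes methods with different stage counts comparable in Fig. 5.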

In the results described above, the behavior of the methods for the inviscid Taylor-Green vortex was often slightly different from that for the other test cases. This can partly be explained by the lower Mach number chosen for this example. Indeed, numerical experiments show that low Mach numbers put more stress on real axis stability than on the rest of the spectrum generated by linear advection. Hence, methods with stability regions that include more parts of the negative real axis but are not optimal for the linear advection spectrum can perform better for low Mach numbers; see also [17]. In this article, we focus on applications in CFD with medium to high Mach numbers, since flows with small Mach numbers are usually computed using incompressible solvers and implicit time integration methods. Nevertheless, we use the inviscid Taylor-Green vortex as a test case to study the step size control stability on the negative real axis.

Fig. 5 Stability regions scaled by the effective number of stages of three representative methods (taking the FSAL property into account). The stability region of the main method is marked in gray and the boundary of the embedded method’s stability region is drawn as a black line

6 New Optimized Runge-Kutta Pairs

In the previous section we developed optimized step size controllers for existing Runge-Kutta pairs. Now we consider the optimization of Runge-Kutta pairs themselves (along with controllers). To do so, we begin with the 3S*+ methods of [59], without embedded error estimator. Then, we design an embedded method and optimized controller parameters. The embedded method is optimized for step size control stability, good error metrics (see [37] and the supplementary material for the present work), to have a large stability region that includes that of the main method, and to have coefficients that are not too large. The resulting schemes given by double precision floating point numbers are optimized further using extended precision numbers in Julia [8] and the package Optim.jl [52], such that the order conditions are satisfied at least to quadruple precision. Coefficients of the new optimized methods are available in the accompanying repository [64] in full precision. Double precision coefficients are given in Appendix B. The stability region of a representative method is shown in Fig. 5c.

We also developed new pairs from scratch, based on the approach used in [59]. Specifically, we compute the Fourier footprint of the spectral element semidiscretization of the linear advection equation by varying the direction of the wave propagation velocity vector, the solution orientation, and the magnitude of the wave vector, and construct optimized stability polynomials using the algorithm described in [38]. Afterwards, low-storage Runge-Kutta schemes are constructed by minimizing their principal error constants, given their class, the number of stages s, the order of accuracy q, and the optimized stability polynomial as constraints. These optimizations are carried out using RK-Opt [42], based on the optimization toolbox of MATLAB. However, the resulting methods did not perform better than the pairs based on starting with methods from [59].

6.1 Optimization of Controller Parameters

The controller parameters for these new pairs are optimized using the same approach as described in Sect. 5.

Fig. 6 Performance of different controllers with \(\beta _3 = 0\) for \({\hbox {RK3(2)5}_{\mathrm{F}}[\hbox {3S}^{*}}_{+}]\) and two of the test problems with tolerance \(\texttt {tol}= 10^{-5}\). The chosen controller for this scheme uses \(\beta = (0.70, -0.23, 0.00)\) (marked with a black +). The number of function evaluations (#FE) is visualized only for those controllers that result in step size control stability along the whole boundary of the main method’s stability region

Typical performance results of the optimization procedure are shown in Fig. 6 using \({\hbox {RK3(2)5}_{\mathrm{F}}[\hbox {3S}^{*}}_{+}]\) as an example. For the isentropic vortex (15) with \(\texttt {tol}= 10^{-5}\), the temporal accuracy starts to play a role and the controllers are not necessarily limited by stability. In this regime, controllers with larger \(\beta _1\) and \(\beta _2\) closer to zero perform better; they are closer to the simple deadbeat (I-) controller, which is in some sense optimal in the asymptotic regime. In contrast, the controllers operate near the stability boundary for the test case with source term (16). Here, controllers with more damping and smaller \(\beta _1\) perform better. To find an acceptable controller for the scheme \({\hbox {RK3(2)5}_{\mathrm{F}}[\hbox {3S}^{*}}_{+}]\), both kinds of problems have to be considered, seeking a compromise between efficiency in the asymptotic regime and near the stability boundary.
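For readers who want to experiment with this trade-off, the sketch below implements a PID step size controller in a form commonly used in the literature, \(\Delta t_{n+1} = \kappa \bigl (\varepsilon _{n+1}^{\beta _1 / k} \varepsilon _{n}^{\beta _2 / k} \varepsilon _{n-1}^{\beta _3 / k}\bigr ) \Delta t_n\). The definitions used here (\(\varepsilon = 1/w\) with a tolerance-weighted error estimate w, an exponent scaling k tied to the orders of the pair, and the limiter \(\kappa (a) = 1 + \arctan (a - 1)\)) are assumptions of this sketch, not a restatement of the implementation used for the experiments.

```python
import math

def pid_step_factor(eps_history, beta, k):
    """Factor dt_new / dt_old from the three most recent values of eps = 1 / w.

    eps_history = (eps_new, eps_old, eps_older); a larger eps means a smaller
    error estimate relative to the tolerance.
    """
    raw = 1.0
    for eps, b in zip(eps_history, beta):
        raw *= eps ** (b / k)
    # Smooth limiter (assumed here) that bounds the change of the step size.
    return 1.0 + math.atan(raw - 1.0)

deadbeat = (1.0, 0.0, 0.0)      # simple I controller
damped = (0.70, -0.23, 0.00)    # controller marked in Fig. 6
history = (4.0, 1.0, 1.0)       # the error estimate suddenly drops by a factor of 4

print(pid_step_factor(history, deadbeat, k=3))
print(pid_step_factor(history, damped, k=3))
```

For this input, the damped controller proposes a smaller step size increase than the deadbeat controller. That restraint helps near the stability boundary but costs some efficiency when the step size is limited by accuracy.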

6.2 Results for New Schemes

Analogously to Tables 3, 4 and 5, results are summarized in Table 6 for the new optimized low-storage schemes with error control for \(p = 2\); extended details and results for higher-order spatial semidiscretizations using solution polynomials of degree \(p \in \{3, 4, 7\}\) are available in the supplementary material.

Table 6 Performance of optimized low-storage schemes: number of function evaluations (#FE), rejected steps (#R), and \(L^2\) error of the density for the inviscid Taylor-Green vortex (14), the isentropic vortex (15), and the flow with source term (16) using polynomials of degree \(p = 2\)

In general, the novel schemes are more efficient than all methods tested in Sect. 5. In particular, the novel third-order schemes are up to 18% more efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\), in accordance with the relative lengths of the real stability intervals. They are also up to 5% more efficient than \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\) with the optimized PID controller for the inviscid Taylor-Green vortex and up to 13% more efficient for the isentropic vortex. Recall that \(\hbox {BS}3(2)3_{\mathrm{F}}\) is more efficient than \(\hbox {KCL}4(3)5[3\hbox {R}_{+}]\hbox {C}\) for the other test problem. Only \(\hbox {SSP3(2)4[3S*}_{+}]\) is more efficient than \({\hbox {RK3(2)5}_{\mathrm{F}}[\hbox {3S}^{*}}_{+}]\) for the inviscid Taylor-Green vortex, in accordance with the particularly large real stability interval, cf. Sect. 5.5. However, the new schemes are more efficient at realistic (medium to high) Mach numbers, for which they have been optimized.

The optimized fourth-order schemes can be even more efficient for the inviscid Taylor-Green vortex and the smooth flow with source term at medium tolerances, giving an improvement of a few percent. For the isentropic vortex, the third-order schemes are still up to 6% more efficient. The optimized fourth-order schemes are up to 25% more efficient than the best corresponding methods of [37] with the controller recommended there. However, the fourth-order accurate main methods of [59] make it particularly difficult to design good embedded methods and controllers. This can already be seen in the optimized controller coefficients, where the magnitudes of \(\beta _1\) and \(\beta _2\) differ less than for other optimized methods. While the controllers can be tuned to result in acceptable performance for these test problems, they do not necessarily lead to good performance for other setups.

The optimized fifth-order schemes are less efficient than the optimized third- and fourth-order schemes for these test cases unless the tolerance is very tight (so the spatial error dominates and the influence of the time integrator is negligible). These schemes (used with the controllers designed here) are much more efficient than the corresponding method of [37] (used with the controller prescribed there), by up to 25% for the source term problem, up to 35% for the isentropic vortex, and up to 18% for the inviscid Taylor-Green vortex.

For \(p = 3\), the third-order accurate schemes are the most efficient ones of the optimized low-storage methods, except for the inviscid Taylor-Green vortex, where the fourth-order methods are up to 6% more efficient. For \(p = 4\), the fourth-order accurate schemes are the most efficient new ones for these experiments, followed closely by the third-order accurate methods. For \(p = 7\), the fourth-order accurate schemes are still the most efficient new ones; the third-order methods do not match the same small errors for tight tolerances and their temporal error dominates the spatial one. However, the fourth-order methods are difficult to control for loose tolerances, resulting in a significant number of step rejections. For sufficiently tight tolerances, the optimized fourth-order methods are more efficient than the fourth-order methods of [37].

In general, FSAL methods are often more efficient than non-FSAL schemes, especially at loose tolerances. Thus, we recommend using the novel 3S*+ (FSAL) methods for hyperbolic problems where the time step is restricted mostly by stability constraints.

Optimization of Runge-Kutta pairs for higher numbers of stages was not as successful. While we were able to obtain schemes with good theoretical properties, their performance did not show improvement compared to the schemes listed above. Improvements of the underlying optimization algorithms or the imposition of additional constraints might lead to better schemes in the future. However, the novel schemes developed in this work are already a significant improvement over the state of the art and perform well.

6.3 Further Optimizations

As shown in [59] and the numerical experiments above, the spatial error usually dominates the temporal error. Hence, it is interesting to optimize lower-order time integrators for higher-order spatial discretizations. Such an approach is presented in [58], but focused on first-order accurate time integrators. The results shown there demonstrate some speedup compared to the schemes presented in [59], but these come with a reduced accuracy.

Here, we choose third-order accurate Runge-Kutta methods and optimize them for a fifth-order spatial discretization. This resulted in a speedup compared to the optimized schemes described in the previous sections for some test cases; for other test cases, no speedup could be observed. This is in accordance with the general similarity of the scaled convex hulls of the spectra for different polynomial degrees \(p \in \{1, 2, 3, 4\}\) shown in Fig. 7. Hence, we do not pursue this path of research further.

Fig. 7 Convex hulls of the spectra of the spectral element semidiscretizations used for the optimization of stability polynomials as in [59]. The spectra are scaled such that \(\min \mathrm {Re}(\lambda ) = -1\)

7 Additional Numerical Experiments and Comparisons

Hitherto, a careful selection of test cases was used to demonstrate issues and design criteria for explicit Runge-Kutta schemes applied to semidiscretizations of hyperbolic conservation laws. Next, more involved examples are used to demonstrate that the novel methods can be applied successfully to large-scale CFD problems including the compressible Navier-Stokes equations.

To be useful for engineering and applied problems in CFD, a CFL-based control must be automated as much as possible. Therefore, we use the approach described in Sect. 3 also for the viscous CFL number. The normalizing factor \(\sigma \) in (9) is chosen depending on the solution polynomial degree such that a method with a real stability interval of 2 is stable for the linear advection-diffusion equation on a uniform grid with a CFL factor \(\nu = 1\). On top of that, a safety factor of 0.95 is applied, cf. [2].

Except for the viscous shock described next, all simulations conducted here start from a checkpoint of a developed solution and run on 8 nodes of Shaheen XC40 using 32 CPU cores each (one compute node of Shaheen XC40; see Footnote 2). The general purpose and SSP methods are implemented using the explicit Runge-Kutta interface of PETSc. The other methods are implemented using their respective low-storage forms in PETSc.

7.1 Viscous Shock

The propagating viscous shock is a classical test problem for the compressible Navier-Stokes equations. The momentum \(\mathscr {V}\) of the analytical solution satisfies the ODE

$$\begin{aligned} \alpha \mathscr {V}\frac{\partial \mathscr {V}}{\partial x} - (\mathscr {V}-1)(\mathscr {V}-\mathscr {V}_{f}) = 0, \qquad -\infty \leqslant x \leqslant +\infty . \end{aligned}$$
(18)

The solution of this ODE can be written implicitly as

$$\begin{aligned} x-\frac{1}{2}\alpha \left( \log \left| (\mathscr {V}(x)-1)(\mathscr {V}(x)-\mathscr {V}_{f})\right| +\frac{1+\mathscr {V}_{f}}{1-\mathscr {V}_{f}}\log \left| \frac{\mathscr {V}(x)-1}{\mathscr {V}(x)-\mathscr {V}_{f}}\right| \right) = 0, \end{aligned}$$
(19)

where

$$\begin{aligned} \mathscr {V}_{f} \equiv \frac{\mathscr {U}_{\text{L}}}{\mathscr {U}_{\text{R}}}, \qquad \alpha \equiv \frac{2\gamma }{\gamma + 1}\frac{\,\mu }{Pr\dot{\mathscr {M}}}. \end{aligned}$$
(20)

Here, \(\mathscr {U}_{\text{L}/\text{R}}\) are the known velocities to the left and right of the shock at \(\pm \infty \), \(\dot{\mathscr {M}}\) is the constant mass flow across the shock, \(Pr = 3/4\) is the Prandtl number, and \(\mu \) is the dynamic viscosity. The mass and total enthalpies are constant across the shock. Moreover, the momentum and energy equations become redundant.

For our tests, we compute \(\mathscr {V}\) from (19) to machine precision using bisection. The moving shock solution is obtained by applying a uniform translation to the above solution. Initially, at \(t = 0\), the shock is located at the center of the domain. We use the parameters \({Ma}=2.5\), \({Re}=10\), and \(\gamma =1.4\) in the domain given by \(x \in [-0.5,0.5]\) until the final time \(t = 2\). The boundary conditions are prescribed by penalizing the numerical solution against the analytical solution, which is also used to prescribe the initial condition.
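The bisection step can be sketched as follows. We assume \(\alpha > 0\) and \(0< \mathscr {V}_{f} < 1\), in which case the left-hand side of (19) increases monotonically in \(\mathscr {V}\) on \((\mathscr {V}_{f}, 1)\), so the root is unique. The numerical values of \(\alpha \) and \(\mathscr {V}_{f}\) below are hypothetical and not those of the experiments.

```python
import math

def implicit_residual(v, x, alpha, v_f):
    """Left-hand side of the implicit relation (19) for a trial value v in (v_f, 1)."""
    ratio = (1.0 + v_f) / (1.0 - v_f)
    bracket = (math.log(abs((v - 1.0) * (v - v_f)))
               + ratio * math.log(abs((v - 1.0) / (v - v_f))))
    return x - 0.5 * alpha * bracket

def shock_profile(x, alpha, v_f, max_iter=200):
    """Solve (19) for v at position x by bisection on the open interval (v_f, 1)."""
    lo, hi = v_f, 1.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:   # the interval cannot shrink any further
            break
        if implicit_residual(mid, x, alpha, v_f) < 0.0:
            lo = mid                 # the residual increases with v: root is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha, v_f = 0.1, 0.5                # hypothetical values for illustration
v0 = shock_profile(0.0, alpha, v_f)

# Finite-difference check of the ODE (18): alpha * v * v' = (v - 1) * (v - v_f).
h = 1e-4
dv_dx = (shock_profile(h, alpha, v_f) - shock_profile(-h, alpha, v_f)) / (2 * h)
print(v0, alpha * v0 * dv_dx - (v0 - 1.0) * (v0 - v_f))
```

For positions far from the shock relative to \(\alpha \), the profile saturates at one of the end states to within rounding, so the bisection simply returns a value next to the corresponding end of the interval.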

Table 7 Number of function evaluations (#FE), rejected steps (#R), and \(L^2\) error of the density for the viscous shock (19) using polynomials of degree p

Some results for the most promising methods and optimized controllers are shown in Table 7; extended details are available in the supplementary material. The new scheme \({\hbox {RK3(2)5}_{\mathrm{F}}[\hbox {3S}^{*}}_{+}]\) is ca. 18% more efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) for relevant tolerances, in accordance with the relative real stability intervals. \(\hbox {SSP3(2)4[3S*}_{+}]\) is a very promising scheme for this kind of problem because of its improved stability properties around the negative real axis. In particular, \(\hbox {SSP3(2)4[3S*}_{+}]\) is ca. 50% more efficient than \(\hbox {BS}3(2)3_{\mathrm{F}}\) for relevant tolerances, also in accordance with the relative real stability intervals. Except for very tight tolerances and low solution polynomial degrees, the step size controllers detect the stability constraint accurately; the spatial error dominates and the error of the time integration schemes is negligible.

7.2 NASA Juncture Flow

We consider the NASA juncture flow problem as described in [55, Section 3.8]. The NASA juncture flow test was designed to validate CFD for wing juncture trailing edge separation and progression, and it is a collaborative effort between CFD computationalists and experimentalists [16]. Specifically, the NASA juncture flow experiment is a series of wind tunnel tests conducted in the NASA Langley subsonic tunnel to collect validation data in the juncture region of a wing-body configuration [68].

Here, we simulate the NASA juncture flow with a wing based on the DLR-F6 geometry and a leading edge horn to mitigate the effect of the horseshoe vortex over the wing-fuselage juncture [50]. A general view of the geometry is shown in Fig. 8b. The model crank chord is \(\ell ={557.1}\,{\mathrm{mm}}\), the wing span is \(77.89\ell \), and the fuselage length is \(f=8.69\ell \). The wing leading edge horn meets the fuselage at \(x_1=3.45\ell \), and the wing root trailing-edge is located at \(x_1=5.31\ell \). In the wind tunnel, the model is mounted on a sting aligned with the fuselage axis. The sting is attached to a mast that emerges from the wind tunnel floor. The Reynolds number is \({Re}=2.4 \times 10^6\) and the freestream Mach number is \({Ma}=0.189\). The angle of attack is AoA \(=-2.5^\circ \). We perform simulations in free air conditions, ignoring both the sting and the mast.

Fig. 8 Solution polynomial degree distribution, computational domain and boundary mesh elements for the NASA juncture experiment [55]; f is the fuselage length

As shown in Fig. 8b, the grid is subdivided into three blocks, corresponding to three different approximation degrees, p, for the solution field. In particular, we use \(p=1\) in the far-field region, \(p=3\) in the region surrounding the model, and \(p=2\) elsewhere. In total, we use \(\approx 6.762 \times 10^5\) hexahedral elements and \(\approx 4.091 \times 10^7\) degrees of freedom (DOFs). We highlight that the boundary layer thickness over the fuselage for \(x = 1\,000\)–\(2\,000\) mm is about 16 mm, while it is about 20 mm over the wing upstream of the separation bubble [33]. In the present simulation we use between eight and nine solution points in the boundary layer thickness \(\delta _{99}\). The mesh features a maximum aspect ratio of ca. 110. The grid is constructed using the commercial software Pointwise V18.3 released in September 2019; solid boundaries are described using a quadratic mesh.

Fig. 9 Q-criterion colored by the velocity magnitude of the NASA juncture flow

Figure 9 shows the Q-criterion colored by the velocity magnitude of flow past the aircraft. The separation of the flow on the wing near the junction with the fuselage is visible.

Table 8 Performance of different methods for error and CFL-based step size controllers: number of function evaluations (#FE), rejected steps (#R), and wall-clock time in seconds for the NASA juncture flow problem

A summary of the performance of the different methods is presented in Table 8. Here, the CFL adaptor with \(\nu = 1.0\) tuned for linear advection-diffusion works for all Runge-Kutta methods. However, it was significantly less efficient than the error-based controller with a conservative tolerance of \(10^{-8}\); the CFL controller used ca. 50% more RHS evaluations and wall-clock time. Thus, a tedious manual tuning to increase the CFL factor would be necessary to match the efficiency of the error-based controller, which just works out of the box.

\(\hbox {BS}3(2)3_{\mathrm{F}}\) is an efficient general purpose method for this CFD problem. Nevertheless, the optimized third- and fourth-order accurate methods are more efficient. Interestingly, \(\hbox {SSP3(2)4[3S*}_{+}]\) is again significantly more efficient, nearly 50% faster than \(\hbox {BS}3(2)3_{\mathrm{F}}\).

7.3 Viscous Flow Past a Formula 1 Front Wing

Here, we consider the flow past a Formula 1 front wing with a relatively complex geometry, for which a CAD model and experimental results are available [60]. We refer to this test case as the Imperial Front Wing, originally based on the front wing and endplate design of the McLaren 17D race car [11]. Figure 10 gives an overview of the Imperial Front Wing geometry. We denote by \(h\) the distance between the ground and the lowest part of the front wing endplate and by \(c\) the chord length of the main element. The position of the wing in the tunnel is further characterized by a pitch angle of \(1.094^{\circ }\). Here, we use \(h/c = 0.36\), which can be considered a relatively low front ride height, with high ground effect and hence higher loads on the wing. The corresponding Reynolds number is \({Re}= 2.2 \times 10^5\), based on the main element chord \(c\) of 250 mm and a free stream velocity \(U\) of 25 m/s. The Mach number is set to \({Ma}= 0.036\), which corresponds to a practically incompressible flow.

The computational domain is divided into \(3.4 \times 10^6\) hexahedral elements with a maximum aspect ratio of ca. 250. Two different semidiscretizations with solution polynomials of degree \(p = 1\) and \(p = 2\) are used. The grid is constructed using the commercial software Pointwise V18.3 released in September 2019; solid boundaries are described using a quadratic mesh.

Fig. 10

Overview of the Imperial Front Wing

In Fig. 11, we present the contour plot of the time-averaged pressure coefficient on the surface of the front wing. The statistics have been obtained by averaging the solution for approximately five flow-through time units.

Fig. 11

Time-averaged pressure coefficient, \(C_\mathrm{p}\), on the surface of the Imperial Front Wing

Table 9 Performance of different methods for error and CFL-based step size controllers: number of function evaluations (#FE), rejected steps (#R), and wall-clock time in seconds for the Imperial Front Wing using polynomials of degree \(p = 2\)

The performance characteristics of the different methods for \(p = 2\) are summarized in Table 9. The results are in agreement with those obtained for the NASA juncture flow. The CFL adaptor with \(\nu = 1.0\) works for all methods but is less efficient than the error-based step size controllers. Again, \(\hbox {BS}3(2)3_{\mathrm{F}}\) is an efficient general purpose scheme for this problem. Nevertheless, the optimized third- and fourth-order methods are more efficient, and \(\hbox {SSP3(2)4[3S*}_{+}]\) is the most efficient scheme for this problem. The fifth-order method is less efficient than the other four schemes, as expected.

Table 10 Performance of different methods for error and CFL-based step size controllers: number of function evaluations (#FE), rejected steps (#R), and wall-clock time in seconds for the Imperial Front Wing using polynomials of degree \(p = 1\). \(\hbox {SSP3(2)4[3S*}_{+}]\) crashed for \(\nu = 1.0\)

The results for \(p = 1\) summarized in Table 10 are mostly similar to the ones presented before. In contrast to the results for \(p = 2\), the CFL adaptor with \(\nu = 1.0\) did not work for \(\hbox {SSP3(2)4[3S*}_{+}]\); the simulation crashed when using \(\nu = 1.0\), and manual tuning was necessary to get a working setup (see Footnote 3). For \(\nu = 0.85\), the CFL adaptor worked and was a few percent more efficient than the error-based controller. However, the latter did not require any manual tuning at all and worked robustly with default parameters for all Runge-Kutta methods. Here, \(\hbox {SSP3(2)4[3S*}_{+}]\) is less efficient than \({\hbox {RK3(2)5}_{\mathrm{F}}[\hbox {3S}^{*}}_{+}]\) and \({\hbox {RK4(3)9[3S}^{*}}_{+}]\). Otherwise, the results are similar to the ones obtained for the juncture flow and the setup using \(p = 2\).

8 Summary and Conclusions

We studied explicit Runge-Kutta methods applied to dissipative spectral element semidiscretizations of hyperbolic conservation laws and CFD problems based on the compressible Euler and Navier-Stokes equations. In this context, we argued in Sect. 3 that error-based step size control can be advantageous compared to CFL-based approaches, since associated user-defined parameters are usually more robust and can be varied in rather large ranges without affecting accuracy or efficiency. Additionally, error-based step size control moves the burden of constructing critical parts of the controller from the developer of the spatial semidiscretization to the developer of the time integrator, easing the workflow for most researchers. The results for more complex test problems in Sect. 7 also support this conclusion.

In Sect. 4, we demonstrated that choosing good step size controller parameters is especially important if the time step is restricted by stability constraints, as is typical for many convection-dominated problems. We compared existing Runge-Kutta pairs in Sect. 5 and proposed an approach to optimize controller parameters for such methods. In general, the third-order method \(\hbox {BS}3(2)3_{\mathrm{F}}\) of Bogacki and Shampine [9] performs well compared to both general purpose schemes and methods designed specifically for CFD applications. The strong-stability preserving method \(\hbox {SSP3(2)4[3S*}_{+}]\) of [47] with embedded method of [18] also performs well, but is more sensitive to the choice of error tolerance.

In Sect. 6, we developed explicit low-storage Runge-Kutta pairs with optimized step size controllers. These novel schemes are more efficient than all of the existing schemes when applied to advection-dominated problems. We demonstrated their performance in several CFD applications with increasing complexity, including the compressible Euler and Navier-Stokes equations. We contributed our optimized methods to the freely available open source library DifferentialEquations.jl [62] written in Julia [8].

Although not demonstrated in this article, another advantage of error-based step size control becomes apparent for a cold startup of CFD problems, i.e., simulations around complex geometries that are initialized with a free stream flow. CFL-based approaches often need to adjust the CFL scaling at the beginning to cope with the initial transient period. In contrast, our error-based approach does not need special tuning and is robust in our experience.

Table 11 Optimized PID controller parameters for some explicit Runge-Kutta methods with embedded error estimator

A summary of existing and novel methods with optimized controller parameters is given in Table 11. Depending on whether dissipative/low-Mach effects dominate, \(\hbox {SSP3(2)4[3S*}_{+}]\) and \({\hbox {RK3(2)5}_{\mathrm{F}}[\hbox {3S}^{*}}_{+}]\) are the most efficient schemes in our experience. Additionally, \(\hbox {BS}3(2)3_{\mathrm{F}}\) is a surprisingly efficient general purpose method. It becomes increasingly complicated to design controllers that are stable and efficient across different applications for methods of higher order and/or with more stages. However, this is not necessarily a severe drawback, since third-order accurate methods like \({\hbox {RK3(2)5}_{\mathrm{F}}[\hbox {3S}^{*}}_{+}]\), \(\hbox {SSP3(2)4[3S*}_{+}]\), and \(\hbox {BS}3(2)3_{\mathrm{F}}\) are usually more efficient in CFD applications. As argued in Sect. 3, error-based control is usually only mildly sensitive to the choice of the tolerance. In our experience, a relatively tight tolerance around \(10^{-8}\) is usually a good choice for applied CFD problems. Since the step size is almost always limited by stability, the precise value of the tolerance has little influence, but a relatively tight tolerance helps for methods that are more difficult to control (e.g., those of higher order or with more stages).
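As an illustration of how such controller parameters enter the step size selection, the following minimal sketch implements a generic PID step size update. It assumes the common form in which the step size is multiplied by a limited product of powers of the inverse, tolerance-scaled error estimates of the last three steps. The function name, the arctangent limiter, and the parameter values in the usage lines are assumptions made for illustration; the optimized values are those listed in Table 11.

```python
import math


def pid_step_factor(errors, betas, k):
    """Factor by which the current step size is multiplied.

    errors -- tolerance-scaled error estimates (w_n, w_{n-1}, w_{n-2});
              w <= 1 means the estimate satisfies the tolerance
    betas  -- PID parameters (beta_1, beta_2, beta_3)
    k      -- min(order of main method, order of embedded method) + 1
    The limiter 1 + arctan(a - 1) is an assumed choice that keeps the
    step size change bounded and smooth.
    """
    raw = 1.0
    for w, beta in zip(errors, betas):
        eps = 1.0 / max(w, 1e-16)  # guard against a vanishing estimate
        raw *= eps ** (beta / k)
    return 1.0 + math.atan(raw - 1.0)


# Illustrative parameters only; see Table 11 for the optimized ones.
betas = (0.6, -0.2, 0.0)
k = 3  # third-order main method, second-order embedded method
grow = pid_step_factor((0.5, 1.0, 1.0), betas, k)    # error below tolerance
shrink = pid_step_factor((2.0, 1.0, 1.0), betas, k)  # error above tolerance
```

With all error estimates exactly at the tolerance, the factor is one and the step size is kept. Setting \(\beta_2 = \beta_3 = 0\) reduces the update to a classical integral controller that uses only the most recent error estimate.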

The present work was influenced and partially motivated by the landmark work of Kennedy et al. [37], which focused on developing optimized Runge-Kutta methods for CFD. Herein, we focus on modern spectral element semidiscretizations that introduce dissipation at element interfaces, e.g., using upwind numerical fluxes. Hence, the stability regions of our methods focus also on the negative real axis, whereas “imaginary axis stability is a high priority to the methods” designed in [37, p. 183]. Furthermore, we concentrate on the common case where the spatial error dominates and the step size is restricted by stability rather than temporal accuracy. Hence, we design Runge-Kutta pairs with large stability regions, for both the main and the embedded methods. In particular, the stability regions of our novel embedded schemes are larger than the ones of the corresponding main methods.
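The distinction between stability along the negative real axis and along the imaginary axis can be made concrete with a small numerical experiment. The sketch below assumes a method whose stability polynomial is the truncated exponential series, which holds for any explicit three-stage, third-order Runge-Kutta method, and measures how far the stability region extends in a given direction of the complex plane. The function names are hypothetical, and the optimized schemes of this article have more stages and different stability polynomials that are not reproduced here.

```python
import math


def stability_function(z, order=3):
    """Truncated exponential series R(z) = sum_{k=0}^{order} z^k / k!."""
    return sum(z**k / math.factorial(k) for k in range(order + 1))


def stability_limit(direction, order=3, step=1e-2, r_max=10.0, tol=1e-12):
    """Largest r with |R(s * direction)| <= 1 for all sampled s in (0, r].

    Scans outward from the origin in increments of `step` until the
    stability condition fails, then refines the boundary by bisection.
    """
    def stable(r):
        return abs(stability_function(r * direction, order)) <= 1.0 + tol

    lo, hi = 0.0, step
    while hi <= r_max and stable(hi):
        lo, hi = hi, hi + step
    if hi > r_max:
        return r_max  # no boundary found within the search range
    for _ in range(60):  # bisection between stable lo and unstable hi
        mid = 0.5 * (lo + hi)
        if stable(mid):
            lo = mid
        else:
            hi = mid
    return lo


real_limit = stability_limit(-1.0)  # extent along the negative real axis
imag_limit = stability_limit(1j)    # extent along the imaginary axis
```

For this polynomial, the stability interval on the negative real axis extends to about 2.51, whereas the interval on the imaginary axis extends to \(\sqrt{3} \approx 1.73\). Which of the two matters more depends on where the spectrum of the semidiscretization lies, which is why dissipative discretizations shift the design focus towards the negative real axis.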

To our knowledge, this article is the first to systematically explore the impact of controller parameters and step size control stability on the efficiency of explicit Runge-Kutta methods for CFD. This provides important insights into the construction of new methods and augments best practices published before. In particular, we think the conventional wisdom that “coping with step size control instability is probably best accomplished by reducing step sizes” [37, p. 208] can be improved upon by instead optimizing the controller, since that results in a more robust and efficient scheme. As noted in [37, p. 208], “doing this optimization requires some caution because it is not sufficient in the design of a good controller for each of the eigensolutions to be damped. The time constants associated with these eigensolutions must not be too large or too small”. Herein, we proposed a way to conduct this optimization systematically and applied it to a wide range of schemes.

Of course, such an approach also comes with limitations, in particular if the main method is fixed such that only the embedded method and the controller can be designed freely. Some methods, such as the fourth-order method used in this article, constrain the range of embedded methods and controllers such that a good general purpose optimization is not necessarily successful. While the combined method can be efficient for certain problems, it is not necessarily similarly efficient for other problems, e.g., when going from inviscid to viscous flows. Other schemes, such as the novel third- and fifth-order accurate optimized methods, result in fewer stability restrictions, making the resulting methods and controllers efficient for a broad range of CFD problems. Thus, we would like to stress that designing a good time integration method should not only focus on the main method but also consider the interaction of a main method, an error estimator, and a step size controller. Applying this principle to viscous flows will be a subject of future work.

Some previous work has focused on automated step size control for convection-dominated problems with the goal of achieving a temporal error that is of similar magnitude to that of the spatial error [7, 80]. The addition of such a control on top of the techniques employed here might lead to an even more efficient controller that is not adversely affected by excessively tight temporal error tolerance specification.

We expect the new methods developed in this article to perform well for convection-dominated flows in the subsonic regime; here, we tested them mainly with reference Mach numbers in the range 0.1–0.5. We have demonstrated their improved performance compared to some standard schemes also in other regimes, including viscous flows and the transonic/supersonic regime. For small Mach numbers, incompressible solvers with implicit time discretizations are usually applied. However, if compressible solvers should be used for low Mach numbers, methods could be optimized following the approach of this article.