Optimized Runge-Kutta Methods with Automatic Step Size Control for Compressible Computational Fluid Dynamics

We develop error-control based time integration algorithms for compressible fluid dynamics (CFD) applications and show that they are efficient and robust in both the accuracy-limited and stability-limited regime. Focusing on discontinuous spectral element semidiscretizations, we design new controllers for existing methods and for some new embedded Runge-Kutta pairs. We demonstrate the importance of choosing adequate controller parameters and provide a means to obtain these in practice. We compare a wide range of error-control-based methods, along with the common approach in which step size control is based on the Courant-Friedrichs-Lewy (CFL) number. The optimized methods give improved performance and naturally adopt a step size close to the maximum stable CFL number at loose tolerances, while additionally providing control of the temporal error at tighter tolerances. The numerical examples include challenging industrial CFD applications.


Introduction
Systems of hyperbolic conservation laws are used to model many areas of science and engineering, such as fluid dynamics, acoustics, and electrodynamics. In practical applications, these systems must often be solved numerically. Explicit Runge-Kutta schemes are the most commonly used time discretizations for hyperbolic partial differential equations (PDEs), because of their efficiency and parallel scalability [24, 46,51]. Overall efficiency of the method also depends on choosing a time step that is as large as possible while still satisfying stability and accuracy requirements. Since stability requirements are frequently more restrictive in this setting, hyperbolic PDE practitioners often adapt the time step size based on a desired CFL number. The CFL number involves the ratio of the maximum characteristic speed to the mesh spacing, which is essentially a proxy for the norm of the Jacobian. The optimal CFL number depends on the space and time discretizations chosen, and possibly on the problem; it is often determined by trial and error.
On the other hand, time integration research has long emphasized the efficiency of errorbased step size control. Much effort has gone into the design of embedded Runge-Kutta pairs and step size controllers for this purpose. Compared to CFL-based control, error-based control has the advantage of not requiring a manually-tuned CFL number and allowing for control of the temporal error when necessary. CFL-based control has the advantage of (usually) yielding near-optimal efficiency once the appropriate CFL value has been found, as long as the calculation is indeed stability-limited. Error-based step size control for convection-dominated problems has been attempted previously; see e.g. [7,80]. An ideal time integration algorithm would achieve the efficiency of the CFL-based controller in the stability-limited regime without the need for manually-tuned parameters, while automatically reducing the step size if error control becomes a more restrictive requirement. In this work, we develop such algorithms in the context of computational fluid dynamics (CFD).
Specifically, we focus on low-storage Runge-Kutta pairs (reviewed in Section 2) combined with PID step size controllers (reviewed in Section 3) and spectral element methods. Spectral element methods can be very efficient for large-scale computations [5,27,32,34,79]. Because stability is a challenging issue for these schemes, a lot of effort has been devoted to developing energy stable (linearly stable) [3,54,78], and entropy stable (nonlinearly stable) spatial discretizations [13,15,20,22,23,67,71,72]. Stable fully-discrete schemes can be obtained from these semidiscretizations by using a slight modification of classical time integration schemes, based on the relaxation approach [40, 63,65,66].
In the paradigm of CFL-based error control, a common approach to time integrator design is to seek a large region of absolute stability (see e.g. [21] for a recent example of this approach in the context of CFD). For error-based control, a large region of absolute stability is again important (for both the main method and the embedded method). Additionally, when automatic error control is used with step sizes near the stability limit, the concept of step size control stability becomes crucial to the design of the controller. We demonstrate the importance of choosing good step size controllers in Section 4. There exists some previous work on developing errorbased step size control techniques for convection-dominated problems, principally by Berzins and co-authors [7,80].
We compare some existing Runge-Kutta pairs in Section 5, and develop optimized Runge-Kutta pairs for discontinuous spectral element semidiscretizations of hyperbolic conservation laws in Section 6. The spectral element methods applied for the numerical experiments are implemented in the ℎ -adaptive, unstructured, curvilinear grid solver SSDC [55]. SSDC is built on top of the Portable and Extensible Toolkit for Scientific computing (PETSc) [6], its mesh topology abstraction (DMPLEX) [44], and its scalable ODE/DAE solver library [1]. Further details on the spatial semidiscretizations can be found in [13,14,20,56,57,67]. We perform numerical experiments using the novel schemes in Section 7, both for the compressible Euler and Navier-Stokes equations. Finally, we summarize and discuss our results in Section 8. We contributed our optimized methods to the freely available open source library DifferentialEquations.jl [62] written in Julia [8].
Here, are the stage values of the Runge-Kutta method and the difference − is used to estimate the local truncation error. If +1 = 0 then (2.3) is an ordinary RK pair; otherwise it is referred to as an FSAL RK pair. The FSAL idea is to use the derivative of the new solution as an additional input for the error estimator [19]. If the step is accepted, this costs nothing since the value ( +1 , +1 ) must be computed at the next step anyway. Usually, for a main method of order , the embedded method is chosen to be of order = − 1; i.e. the schemes are used in local extrapolation mode.
Remark 2.1. There are different notations for FSAL methods. A common alternative to our choice of using ∈ R × , , ∈ R , and ∈ R +1 is to embed the baseline -stage Runge-Kutta method in a method with + 1 stages and Butcher coefficients (2.4) Then, the last row of˜ is equal to˜ and ∈ R +1 can be defined as usual.
The common assumption = is used throughout this article. For methods with errorbased step size control, the initial step size is chosen using the algorithm described in [28, p. 169].

Low-storage methods
A typical RK implementation requires simultaneous storage of all of the stages and/or their derivatives. Each stage or derivative occupies words; we refer to this amount of storage (sufficient for holding a copy of the solution on the spatial grid at one point in time) as a register. A low-storage RK method is one that can be implemented using only a few registers; herein we consider methods that require just three or four registers. Note that three registers is the fewest possible if one requires an error estimator and the ability to reject a step.
We consider the low-storage method classes (with and without the FSAL technique): • 3S*: three-register methods that include an error estimate • 3S* + : three-register methods that require a fourth register for the error estimate Let denote a given storage register. The class 3S* methods, introduced in [41], can be implemented using only three storage registers of size if assignments of the form ← + ( , ) (2.5) can be made with only + ( ) memory. Otherwise, an additional register is required. The 3S* method family is parameterized by the coefficients , 1, , 2, , 3, , , and can be implemented as described in Algorithm 1. We will also use 3S* or 3S* + to denote some strong stability where is the order of the main method, is the order of the embedded method (usually = − 1), = min( , ) + 1 (usually = + 1 = ), are the controller parameters, and +1 = 1 atol + rtol max{| +1 |, | +1 |} 2 1/2 , (2.7) 1 Open source libraries such as PETSc [1] and DifferentialEquations.jl, [62] where we implemented a PID step size control, will usually limit the factor multiplying the time step size, e.g. using a limiter of the form ( ) = 1 + arctan( − 1) [76].
where is the number of degrees of freedom in , and atol, rtol are the absolute and relative error tolerances. Some common controller parameters recommended in the literature are given in Table 1. Unless stated otherwise, we use equal absolute and relative error tolerances. The choice of the weighted/relative error estimate +1 is common in the literature [28, Equations (4.10) and (4.11)] and often the default choice in general purpose ODE software such as PETSc [1] or DifferentialEquations.jl [62]. This choice of +1 allows to decouple the time integration parameters from a possible spatial semidiscretization. In contrast to a quadrature-based approach, it weighs degrees of freedom of different refinement levels in the same way, which can be beneficial, since refined regions (of interest) are not weighed less than coarse regions (without interesting solution features).
If the factor multiplying the old time step Δ is too small or the solution is out of physical bounds, e.g. because of negative density/pressure in CFD, the step is rejected and retried with a smaller time step Δ . The default options used in all numerical experiments described in this work accept a step if the factor multiplying the step size is at least 0.9 2 . Otherwise, the step is rejected and retried with the step size predicted by the PID controller. If the solution is out of physical bounds, the step is rejected and retried with a time step reduced by a factor of four.

CFL-vs. error-based step size control
The error-based step size control described above is efficient if the practical time step is limited by the constraint of accuracy. On the other hand, if the allowable time step is determined by stability, and an explicit time discretization is employed, then it is natural to use a step size of the form Δ ∝ 1/ , where is an approximation of the norm of the Jacobian of the ODE system.
In the time integration of hyperbolic PDEs, it is indeed often the case that the step size is limited in practice by stability rather than accuracy. Therefore it is common practice to use a step size control of the kind just described. For such systems, the norm of the Jacobian is proportional to max ( max ( )/Δ ), where Δ is a local measure of the mesh spacing (at grid point/cell/element ), and max is the maximal (local) wave speed, related to the largestmagnitude eigenvalue of the flux Jacobian of the hyperbolic system. The step size control thus takes the form (referred to herein as a CFL-based control) where is the desired CFL number. The appropriate choice of depends on the details of the space and time discretizations; it can be studied theoretically using linearization (see e.g. [49]) but is often determined experimentally. An additional complication is the question of how to define Δ . Even on uniform Cartesian grids and regular triangulations [48], multiple waves traveling in different directions make an optimal choice of difficult. This is even more a challenging question for unstructured grids. A clear advantage of error-based control is the availability of an estimate of the temporal error. At first glance, error-based step size control seems inappropriate in the stability-limited regime, since the local error may not be very sensitive to small differences between stable and unstable step sizes, near the stable step size limit. A tight error tolerance that ensures stability at all steps might result in an excessively small step size. However, as described in [29, Section IV.2] and discussed below, it is possible to design error-based step size controllers that behave appropriately in the stability-limited regime.
Both classes of controllers require some user-determined parameters: and Δ for the CFLbased controller, and atol and rtol for the error-based controller. In this section we show through an example that carefully-designed error-based controllers can achieve near-maximal efficiency in a way that is relatively insensitive to changes in the user parameters. In contrast, the efficiency of the CFL-based controller always bears a linear sensitivity to the parameters and Δ .
To demonstrate, we consider the two-dimensional advection equation with constant velocity = (1, 1) in the domain [−5, 5] 2 with periodic boundary conditions. An initial sinusoid of one wavelength in each direction is advected over the time interval [0, 100]. In space we apply the spectral collocation method of SSDC based on solution polynomials of degree = 4 [55].
For the CFL-based controller, the ratio of the local mesh spacing and the maximal speed at a node is estimated in this case as where is the number of spatial dimensions ( = 2 for this example), = (1, 1) is the constant advection velocity, the determinant of the grid Jacobian at node , ( ) is the contravariant basis vector in direction at node [45, Chapter 6], and is a normalizing factor depending on the solution polynomial degree, , which is usually chosen such that a real stability interval of 2 corresponds to = 1. Two grids will be used: A regular, uniform grid with 8 2 elements and the curvilinear grid shown in Figure 1. The results are summarized in Figure 2 for the uniform grid and in Figure 3 for the unstructured grid. We test three time discretizations: the popular fourth-order, fivestage, low-storage method CK4 (3) , referring to an -stage Runge-Kutta method of order with embedded method of order as NAME ( ) . Additional identifiers indicating low-storage requirements or other properties are appended, e.g. a subscript "F" for FSAL methods. The number of stages denotes the effective number of RHS evaluations per step, which is one less than the number of stages for FSAL methods. For low-storage methods, the required amount of memory based on certain assumptions is listed following the notation of [41]. In particular, N methods need only memory registers of size if assignments of the form ← + ( , ) can be made without additional allocations. Similarly, methods use memory registers and assignments of the form ← ( , ); methods were described in Section 2.1. As described there, a subscript + indicates methods that require an additional storage register if an embedded error estimator is used. Additional parts of the names of RK methods are usually taken directly from their sources. For example, the method KCL4(3)5[2R + ]C is a fourth-order method with embedded third-order error estimator. It has five stages and requires two memory registers based on the R assumption. If the embedded error estimator is used, it requires three memory registers. The C suffix is appended as suggested in [37] to indicate a particular design criterion (in this particular case, looking for a compromise between linear stability and accuracy).  The widely-used method CK4(3)5[2N] has linear stability properties very similar to KCL4(3)5[2R + ]C -both methods have the same maximum stable CFL number = 2.1. Thus, they use the same number of RHS evaluations while yielding nearly the same errors. Thus, we omit the method CK4(3)5[2N] in the following plots and use only the KCL4(3)5[2R + ]C pair, for which it is possible to use error-based step size control. Impressively, the error-based controller manages, for a wide range of tolerances, to use almost exactly the same number of steps as the carefully tuned CFL-based controller. Over this range of tolerances, including tol ∈ [10 −5 , 10 −3 ], the step size is determined by stability. Hence, the number of RHS evaluations and the error are nearly independent of the tolerance in this regime. For tolerances larger than 10 −3 , the final error increases for this long-time simulation while the number of RHS evaluations stays nearly the same. For tighter tolerances (below 10 −6 ), the error-based controller detects accuracy restrictions and increases the number of RHS evaluations. This also leads to a reduction of the final error until it plateaus again because of the dominant spatial error (at ca. tol = 10 −7 ). However, the number of RHS evaluations keeps increasing.
Using the same CFL number of = 2.1 on the unstructured grid still results in a stable simulation. However, the CFL number can be doubled there without increasing the error significantly. Hence, the user has to tune this parameter carefully to get a stable and efficient simulation. In contrast, using error-based step size control we see behavior very similar to what was observed for the uniform grid. The same error tolerance can be used, resulting in the same optimal number of function evaluations determined manually for the CFL-based step size controller. This demonstrates the enhanced robustness properties of error-based step size

control.
These examples suggest that error-based control is more robust to changes in the grid and less sensitive to the required user parameters. Similar results have been obtained using other Runge-Kutta schemes for this problem and for more challenging problems, some of which are presented later in this work. For practitioners whose primary interest is in applying the schemes to solve challenging scientific problems or developing spatial semidiscretizations, error-based time step controllers seem favorable, since the most important design choices have to be provided by the developers of the time integration schemes and the practitioners have to choose only the rather robust error tolerance of the solver.

Importance of controller parameters
Standard error-based controllers will often work acceptably in the asymptotic regime (i.e., the regime where the leading truncation error term strongly dominates all subsequent terms). However, as demonstrated in Section 3, applications involving convection-dominated problems are often constrained by stability, so that one may be working outside the asymptotic regime. In this case, the standard theory does not apply; instead, step size control stability has to be considered [30].
Following [29, Section IV.2], step size control stability can be explained using the linear model problem d d ( ) = ( ). Given an explicit Runge-Kutta method with embedded error estimator and a PID controller (2.6) with parameters , the update formulae become where is the stability polynomial of the main method, is the difference of the stability polynomials of the embedded and the main method, and is the (local) error estimate. By taking logarithms, this update formula can be reduced to a difference recursion with fixed points on the boundary of the stability region of the main method. To get a stable behavior, the spectral radius of the associated Jacobian has to be less than unity [29, Proposition IV.
where is the order of the error estimator; if = − 1 is the order of the embedded method, then = = + 1. To get step size control stability, one can fix a controller such as the standard I controller and optimize the RK pair accordingly as demonstrated in [31]. The other possibility, pursued here, is to optimize the controller parameters for a given RK pair. While one might hope that a controller designed to work well with one method will also work well with other methods, this is generally not the case. Rather, a controller should be designed for the given error estimator; cf. [4] for the case of linear multistep methods. To demonstrate this, we consider again the linear advection problem described in Section 3 with a uniform mesh. We will take the PI34 controller with 1 = 0.7, 2 = −0. 4 [25], designed for use with the classical DP5(4)6 F method of [61], but use instead the BS5(4)7 F method of [9]. Note that both are 5(4) pairs designed with similar purposes in mind. Using a tolerance of tol = 10 −5 , the integration requires 5015 RHS evaluations and includes many rejected steps. Applying instead the optimized coefficients = (0.28, −0.23) derived later in this manuscript results in only 4119 RHS evaluations and a nearly identical final error. A significant performance gain is obtained by applying appropriate controller parameters, cf. Table 2. The spectral radius of the Jacobian (4.2) determining step size control stability is plotted in Figure 4. We see that the standard PI34 controller is unstable near the negative real axis while the optimized one is stable.

Comparison of existing methods
Here, we compare some general purpose methods and schemes designed for semidiscretizations of hyperbolic conservation laws. Since we are interested in error-based step size control, we consider only schemes with embedded error estimators. Hence, we consider the general purpose schemes • BS3(2)3 F , third-order, four-stage FSAL method of [10], • BS5(4)7 F , fifth-order, eight-stage FSAL method of [9], • DP5(4)6 F , fifth-order, seven-stage FSAL method of [61], the SSP schemes   2) determining step size control stability for BS5(4)7 F . The standard PI34 controller is unstable near the negative real axis while the optimized PI controller is stable along the boundary of the stability region.
• SSP3(2)4[3S* + ], third-order, four-stage SSP method of [47] with embedded method of [18] which can be implemented efficiently in low-storage form as described in Appendix A, and the low-storage methods optimized for hyperbolic conservation laws In the following, we will use three representative test problems to compare the performance of these schemes. All test problems are semidiscretizations of the compressible Euler equations in space dimensions where the conserved variables = ( , , ) are the density , the momentum , and the energy . The flux for the spatial coordinate is where = = ( − 1)( − 2 /2) is the pressure, the temperature, and an ideal gas law with ratio of specific heats = 7 /5 is assumed. The spatial semidiscretizations use entropydissipative nodal DG methods with polynomials of degree on Legendre-Gauss-Lobatto nodes with upwind interface fluxes implemented in SSDC. We present detailed results for = 2, which is a relevant choice in practical CFD applications. The results are similar for higher-order semidiscretizations such as polynomials of degree ∈ {3, 4}, presented in the supplementary material in more detail.

Inviscid Taylor Green vortex
The inviscid Taylor-Green vortex in = 3 space dimensions is a classical test case to study the stability of numerical methods [23]. The initial condition given by

Isentropic vortex
The isentropic vortex is a widely used benchmark problem [69] with analytical solution. For the stationary case, the exact solution is given by where is the distance from the axis of the vortex and t is the tangential velocity. The moving vortex solution is obtained by a uniform translation in the direction of the velocity vector field. Herein, the simulation domain is a cube [−5, 5] 3 with periodic boundaries where the vortex rotates around the axis (1, 1, 0) , a direction not aligned with the grid. The parameters for this test are = 1.4, Ma = 0.5, = 5 and ∞ = 1. Unless stated otherwise, we use 8 elements per coordinate direction for optimizing controllers and 20 elements for examples with a final time of = 20. This test case is chosen as an example where the time step can be restricted by accuracy for tight tolerances and because of the existence of an analytical solution.

Smooth flow with source terms
The analytical solution is imposed as initial condition in the periodic domain [−1, 1] and the source term is added to the right hand side of the energy equation. The variation of the pressure with amplitude = 50 and frequency = /5 results in a cyclic variation of the CFL restriction on the time step. Unless stated otherwise, we use 20 elements and a final time of = 20. This test case is chosen to assess the ability of the schemes to adapt to varying time step restrictions and because of the existence of an analytical solution.

Optimization of step size controllers
As explained in Section 4, the choice of appropriate step size controller parameters is important to obtain good performance when the schemes are run at the stability limit. Hence, we have optimized controller parameters for each scheme. In general, the optimal time step controller parameters for a given Runge-Kutta pair will depend somewhat on the problem under consideration. No single controller is optimal for all test cases, but for the experiments conducted in this work, good controllers are usually within ca. 5 % of the optimal performance.
For the low-storage schemes of [37], we used the PI34 controller proposed originally with them. We also tested the PID controller using = (0.49, −0.34, 0.10) proposed in [35]. We also performed an optimization of controller parameters for each method, as follows.
We ran simulations of all three test cases described above and measured the performance of each scheme (in terms of the number of right-hand side evaluations). We used a brute-force search over the domain tol ∈ [10 −8 , 10 −1 ], sampling at each power of ten, and 1 ∈ [0.1, sampling at an interval of 0.01 in each parameter, and restricting a priori to parameter values yielding step size control stability for the given scheme (computed using NodePy [43]). The final times for these simulations were set to = 8 for (5.3), = 4 for (5.4), and = 20 for (5.5) to make the brute-force optimization feasible. From the resulting data, consisting of thousands of runs with each method, an overall best choice of parameters was selected as in Section 6.1. Usually, this kind of min-max problem was approached by comparing the controllers minimizing the maximum, the median, or the 95 % percentile of the RHS evaluations across all CFD simulations. Then, the final choice was made by human interaction taking into account step size control stability and design criteria for PID controllers.

Results for existing schemes
For BS3(2)3 F , all of the controllers from Table 1 perform reasonably well, PI42 being slightly better than the others. In general, a wide range of controller parameters is acceptable for this scheme. As typified by the example in Section 4, standard controllers do not perform well for BS5(4)7 F . We found instead that = (0.28, −0.23, 0.00) is a reasonable choice for this scheme. For DP5(4)6 F , the PI34 controller (which was originally designed for it by Gustafsson [25]) performs reasonably well in our test cases and optimized controllers like = (0.61, −0.27, 0.01) do not perform significantly better.
Subsequently, we used the optimized controller parameters and ran full simulations (up to = 20) for each method with a range of tolerances. Results are shown in Tables 3, 4, and 5, where polynomials of degree = 2 have been used. There, we only show results for a tolerance tol = 10 −5 , since this choice is usually good for these small-scale test problems. Extended details are available in the supplementary material. For the inviscid Taylor-Green vortex, the time step is indeed restricted by stability for most tolerances, indicated by the approximately constant number of function evaluations, except for the very tight tolerance tol = 10 −8 and some schemes. For the isentropic vortex (5.4), the step size is restricted by stability for tolerances 10 −6 . For smaller tolerances, the number of function evaluations increases. However, this does not result in a significant change of the total error, which is determined mostly by the spatial semidiscretization. Finally, for the smooth flow with source term (5.5), the step size is again restricted mostly by stability constraints.

General purpose methods
For very loose tolerances 10 −3 , BS5(4)7 F and DP5(4)6 F result in a significant overhead caused by step rejections for some test cases. Otherwise, BS5(4)7 F performs better than DP5(4)6 F . Other fifth-order general purpose schemes like T5(4)6 F of [77] perform slightly better, usually yielding an improvement of ca. 5 %. However, BS3(2)3 F is ca. 50 % more efficient as long as the time step is restricted by stability.
These results do not change significantly if slightly higher-order semidiscretizations are employed in space (see supplementary material), up to polynomials of degree = 4, resulting in fifth-order convergence in space. Hence, matching the order of accuracy in space and time is not strictly necessary if one is interested in fixed mesh sizes, especially in common CFD applications. This remains true even if the polynomial degree is increased to = 7 for the test problems considered here. Then, the temporal error becomes significant and the error of the fully discrete method plateaus only at relatively tight tolerances such as 10 −8 . Nevertheless, BS3(2)3 F is still the most efficient method for such high-order methods and tight tolerances. Table 3.: Performance of general purpose schemes: Number of function evaluations (#FE), rejected steps (#R), and 2 error of the density for the inviscid Taylor-Green vortex (5.3), the isentropic vortex (5.4) , and the flow with source term (5.5) using polynomials of degree = 2.

SSP methods
The popular method SSP3(2)3[3S* + ] can be equipped with the PI34 controller to give acceptable step size control performance; slightly better behavior can be achieved by choosing = (0.70, −0.37, 0.05). For loose and medium tolerances, this scheme performs similarly to BS3(2)3 F . BS3(2)3 F is significantly more efficient than SSP3(2)3[3S* + ] at tight tolerances. SSP3(2)4[3S* + ] performs ca. 50 % better than SSP3(2)3[3S* + ] or BS3(2)3 F at loose and medium tolerances for the inviscid Taylor-Green vortex using the optimized controller = (0.55, −0.27, 0.05). At loose tolerances, it is also ca. 15 % more efficient than BS3(2)3 F for the isentropic vortex. However, the number of RHS evaluations increases drastically for tighter tolerances, making SSP3(2)4[3S* + ] less efficient than BS3(2)3 F for these parameters. The results for the flow with source term are similar but less pronounced. Hence, SSP3(2)4[3S* + ] can be more efficient than the best schemes so far but the embedded method does not seem to be reliable enough to make the choice of the tolerance as robust as for other schemes. Additionally, the choice of appropriate controller parameters can be crucial for SSP3(2)4[3S* + ], since some standard controllers do not perform well.
As for = 2, KCL4(3)5[3R + ]C is usually the most efficient existing low-storage method of [37] for ∈ {3, 4}, which can also be more efficient than BS3(2)3 F for the vortex problems. However, SSP3(2)4[3S* + ] is even more efficient there. Additionally, the sensitivity of the step size controller for the low-storage methods is bigger than for BS3(2)3 F . For = 7, all of the low-storage methods considered here result in a non-negligible amount of step rejections for the inviscid Taylor-Green vortex. Nevertheless, the fourth-order accurate methods can be up to 15 % more efficient than BS3(2)3 F there. Nevertheless, SSP3(2)4[3S* + ] is still more efficient for this test problem. At tight tolerances such as 10 −8 , KCL4(3)5[3R + ]C is the most efficient method considered so far for the isentropic vortex and the smooth flow with source term.

Discussion
All of the general purpose schemes make use of the FSAL technique. Additionally, the stability region of the embedded scheme is always at least as big as the one of the main method. Although the 2R low-storage schemes were optimized for convection-dominated problems, they were outperformed for all test problems and at almost all tolerances by the general purpose method BS3(2)3 F . Possible reasons for this are that the 2R method coefficients are chosen subject to more stringent low-storage requirements, they do not exploit the FSAL technique, and they have embedded methods with a stability region that is smaller than that of the main method in some areas. When the time step is restricted by stability, this can effectively reduce the allowable time step for the main method. This last point is illustrated in Figure 5, which shows the stability regions (scaled by the effective number of stages) of the main and embedded method for three pairs. We see that although the stability region of BS3(2)3 F includes less of the real axis than that of KCL4(3)5[3R + ]C, the embedded method for BS3(2)3 F extends further than that of KCL4(3)5[3R + ]C. The last stability region in the figure corresponds to a new method developed in the next section. Like BS3(2)3 F , it has the useful property that the stability region of the embedded method contains that of the main method.
In the results described above, the behavior of the methods for the inviscid Taylor-Green vortex was often slightly different than for the other test cases. This can partly be explained by the lower Mach number chosen for this example. Indeed, numerical experiments show that low Mach numbers put more stress on real axis stability than on the rest of the spectrum generated by linear advection. Hence, methods with stability regions that include more parts of the negative real axis but are not optimal for the linear advection spectrum can perform better for low Mach numbers; see also [17]. In this article, we focus on applications in CFD with medium to high Mach numbers. However, flows with small Mach numbers are usually computed using incompressible solvers and implicit time integration methods. Hence, we do not focus on this regime in this article. Nevertheless, we use the inviscid Taylor-Green vortex as test case to study the step size control stability on the negative real axis.

New Optimized Runge-Kutta pairs
In the previous section we developed optimized step size controllers for existing Runge-Kutta pairs. Now we consider the optimization of Runge-Kutta pairs themselves (along with controllers). To do so, we begin with the 3S* + methods of [58], without embedded error estimator. Then, we design an embedded method and optimized controller parameters. The embedded method is optimized for step size control stability, good error metrics (see [37] and the supplementary material for the present work), to have a large stability region that includes that of the main method, and to have coefficients that are not too large. The resulting schemes given by double precision floating point numbers are optimized further using extended precision numbers in Julia [8] and the package Optim.jl [52], such that the order conditions are satisfied at least to quadruple precision. Coefficients of the new optimized methods are available in the accompanying repository [64] in full precision. Double precision coefficients are given in Appendix B. The stability region of a representative method is shown in Figure 5c. We also developed new pairs from scratch, based on the approach used in [58]. Specifically, we compute the Fourier footprint of the spectral element semidiscretization of the linear advection equation by varying the direction of the wave propagation velocity vector, the solution orientation, and the wave vector module and construct optimized stability polynomials using the algorithm described in [38]. Afterwards, low-storage Runge-Kutta schemes are constructed by minimizing their principal error constants, given their class, the number of stages , the order of accuracy , and the optimized stability polynomial as constraints. These optimizations are carried out using RK-Opt [42], based on the optimization toolbox of MATLAB. However, the resulting methods did not perform better than the pairs based on starting with methods from [58].

Optimization of controller parameters
The controller parameters for these new pairs are optimized using the same approach as described in Section 5.  Typical performance results of the optimization procedure are shown in Figure 6 using RK3(2)5 F [3S* + ] as an example. For the isentropic vortex (5.4) with tol = 10 −5 , the temporal accuracy starts to play a role and the controllers are not necessarily limited by stability. In this regime, controllers with larger 1 and 2 closer to zero perform better; they are more near to the simple deadbeat (I-) controller, which is in some sense optimal in the asymptotic regime. In contrast, the controllers operate near the stability boundary for the test case with source term (5.5). Here, controllers with more damping and smaller 1 perform better. To find an acceptable controller for the scheme RK3(2)5 F [3S* + ], both kinds of problems have to be considered, seeking a compromise between efficiency in the asymptotic regime and near the stability boundary. Analogously to Tables 3-5, results are summarized in Table 6 for the new optimized lowstorage schemes with error control for = 2; extended details and results for higher-order spatial semidiscretizations using solution polynomials of degree ∈ {3, 4, 7} are available in the supplementary material.  In general, the novel schemes are more efficient than all methods tested in Section 5. In particular, the novel third-order schemes are up to 18 % more efficient than BS3(2)3 F , in accordance with the relative lengths of the real stability intervals. They are also up to 5 % more efficient than KCL4(3)5[3R + ]C with the optimized PID controller for the inviscid Taylor-Green vortex and up to 13 % more efficient for the isentropic vortex. Recall that BS3(2)3 F is more efficient than KCL4(3)5[3R + ]C for the other test problem. Only SSP3(2)4[3S* + ] is more efficient than RK3(2)5 F [3S* + ] for the inviscid Taylor-Green vortex, in accordance with the particularly large real stability interval, cf. Section 5.5. However, the new schemes are more efficient at realistic (medium to high) Mach numbers, for which they have been optimized.

Results for new schemes
The optimized fourth-order schemes can be even more efficient for the inviscid Taylor-Green vortex and the smooth flow with source term at medium tolerances, giving an improvement of a few percent. For the isentropic vortex, the third-order schemes are still up to 6 % more efficient. The optimized fourth-order schemes are up to 25 % more efficient than the best corresponding methods of [37] with the controller recommended there. However, the fourthorder accurate main methods of [58] make it particularly difficult to design good embedded methods and controllers. This can already be seen in the optimized controller coefficients, where the magnitudes of 1 and 2 differ less than for other optimized methods. While the controllers can be tuned to result in acceptable performance for these test problems, they do not necessarily lead to good performance for other setups.
The optimized fifth-order schemes are less efficient than the optimized third-and fourth-order schemes for these test cases unless the tolerance is very tight (so the spatial error dominates and the influence of the time integrator is negligible). These schemes (used with the controllers designed here) are much more efficient than the corresponding method of [37] (used with the controller prescribed there), by up to 25 % for the source term problem, up to 35 % for the isentropic vortex, and up to 18 % for the inviscid Taylor-Green vortex. For = 3, the third-order accurate schemes are the most efficient ones of the optimized lowstorage methods, except for the inviscid Taylor-Green vortex, where the fourth-order methods are up to 6 % more efficient. For = 4, the fourth-order accurate schemes are the most efficient new ones for these experiments, followed closely by the third-order accurate methods. For = 7, the fourth-order accurate schemes are still the most efficient new ones; the third-order methods do not match the same small errors for tight tolerances and their temporal error dominates the spatial one. However, the fourth-order methods are difficult to control for loose tolerances, resulting in a significant number of step rejections. For sufficiently tight tolerances, the optimized fourth-order methods are more efficient than the fourth-order methods of [37].
In general, FSAL methods are often more efficient than non-FSAL schemes, especially at loose tolerances. Thus, we recommend to use the novel 3S* + (FSAL) methods for hyperbolic problems where the time step is restricted mostly by stability constraints.
Optimization of Runge-Kutta pairs for higher numbers of stages was not as successful. While we were able to obtain schemes with good theoretical properties, their performance did not show improvement compared to the schemes listed above. Improvements of the underlying optimization algorithms or the imposition of additional constraints might lead to better schemes in the future. However, the novel schemes developed in this work are already a significant improvement over the state of the art and perform well.

Further optimizations
As shown in [58] and the numerical experiments above, the spatial error usually dominates the temporal error. Hence, it is interesting to optimize lower-order time integrators for higher-order spatial discretizations. Such an approach is presented in [59], but focused on first-order accurate time integrators. The results shown there demonstrate some speedup compared to the schemes presented in [58], but these come with a reduced accuracy.
Here, we choose third-order accurate Runge-Kutta methods and optimize them for a fifthorder spatial discretization. This resulted in a speedup compared to the optimized schemes described in the previous sections for some test cases; for other test cases, no speedup could be observed. This is in accordance with the general similarity of the scaled convex hulls of the spectra for different polynomial degrees ∈ 1, 2, 3, 4 shown in Figure 7. Hence, we do not pursue this path of research further.  [58]. The spectra are scaled such that min Re( ) = −1.

Additional numerical experiments and comparisons
Hitherto, a careful selection of test cases was used to demonstrate issues and design criteria for explicit Runge-Kutta schemes applied to semidiscretizations of hyperbolic conservation laws. Next, more involved examples are used to demonstrate that the novel methods can be applied successfully to large-scale CFD problems including the compressible Navier-Stokes equations.
To be useful for engineering and applied problems in CFD, a CFL-based control must be automated as much as possible. Therefore, we use the approach described in Section 3 also for the viscous CFL number. The normalizing factor in (3.2) is chosen depending on the solution polynomial degree such that a method with a real stability interval of 2 is stable for the linear advection-diffusion equation on a uniform grid with a CFL factor = 1. On top of that, a safety factor of 0.95 is applied, cf. [2].
Except for the viscous shock described next, the other simulations conducted here start from a checkpoint of a developed solution and run on 8 nodes of Shaheen XC40 using 32 CPU cores each (one compute node of Shaheen XC40 2 ). The general purpose and SSP methods are implemented using the explicit Runge-Kutta interface of PETSc. The other methods are implemented using their respective low-storage forms in PETSc.

Viscous shock
The propagating viscous shock is a classical test problem for the compressible Navier-Stokes equations. The momentum of the analytical solution satisfies the ODE The solution of this ODE can be written implicitly as Here, / are the known velocities to the left and right of the shock at ±∞, ℳ is the constant mass flow across the shock, = 3/4 is the Prandtl number, and is the dynamic viscosity. The mass and total enthalpy are constant across the shock. Moreover, the momentum and energy equations become redundant.
For our tests, we compute from (7.2) to machine precision using bisection. The moving shock solution is obtained by applying a uniform translation to the above solution. Initially, at = 0, the shock is located at the center of the domain. We use the parameters Ma = 2.5, Re = 10, and = 1.4 in the domain given by ∈ [−0.5, 0.5] till the final time = 2. The boundary conditions are prescribed by penalizing the numerical solution against the analytical solution, which is also used to prescribe the initial condition.  Some results for the most promising methods and optimized controllers are shown in Table 7; extended details are available in the supplementary material. The new scheme RK3(2)5 F [3S* + ] is ca. 18 % more efficient than BS3(2)3 F for relevant tolerances, in accordance with the relative real stability intervals. SSP3(2)4[3S* + ] is a very promising scheme for this kind of problem because of its improved stability properties around the negative real axis. In particular, SSP3(2)4[3S* + ] is ca. 50 % more efficient than BS3(2)3 F for relevant tolerances, also in accordance with the relative real stability intervals. Except for very tight tolerances and low solution polynomial degrees, the step size controllers detect the stability constraint accurately; the spatial error dominates and the error of the time integration schemes is negligible.

NASA juncture flow
We consider the NASA juncture flow problem as described in [55,Section 3.8]. The NASA juncture flow test was designed to validate CFD for wing juncture trailing edge separation and progression, and it is a collaborative effort between CFD computationalists and experimentalists [16]. Specifically, the NASA juncture flow experiment is a series of wind tunnel tests conducted in the NASA Langley subsonic tunnel to collect validation data in the juncture region of a wing-body configuration [68].
Here, we simulate the NASA juncture flow with a wing based on the DLR-F6 geometry and a leading edge horn to mitigate the effect of the horseshoe vortex over the wing-fuselage juncture [50]. A general view of the geometry is shown in Figure 8(b). The model crank chord is ℓ = 557.1 mm, the wing span is 77.89ℓ , and the fuselage length is = 8.69ℓ . The wing leading edge horn meets the fuselage at 1 = 3.45ℓ , and the wing root trailing-edge is located at 1 = 5.31ℓ . In the wind tunnel, the model is mounted on a sting aligned with the fuselage axis. The sting is attached to a mast that emerges from the wind tunnel floor. The Reynolds number is Re = 2.4 × 10 6 and the freestream Mach number is Ma = 0.189. The angle of attack is AoA = −2.5 • . We perform simulations in free air conditions, ignoring both the sting and the mast. for the NASA juncture experiment [55]; is the fuselage length.
As shown in Figure 8(b), the grid is subdivided into three blocks, corresponding to three different approximation degrees, , for the solution field. In particular, we use = 1 in the far-field region, = 3 in the region surrounding the model, and = 2 elsewhere. In total, we use ≈ 6.762 × 10 5 hexahedral elements and ≈ 4.091 × 10 7 degrees of freedom (DOFs). We highlight that the boundary layer thickness over the fuselage for = 1, 000-2, 000 mm is about 16 mm, while it is about 20 mm over the wing upstream of the separation bubble [33]. In the present simulation we use between eight and nine solution points in the boundary layer thickness 99 . The mesh features a maximum aspect ratio of ca. 110. The grid is constructed using the commercial software Pointwise V18.3 released in September 2019; solid boundaries are described using a quadratic mesh.    A summary of the performance of the different methods is presented in Table 8. Here, the CFL adaptor with = 1.0 tuned for linear advection-diffusion works for all RK methods. However, it was significantly less efficient than the error-based controller with a conservative tolerance of 10 −8 ; the CFL controller used ca. 50 % more RHS evaluations and wall-clock time. Thus, a tedious manual tuning to increase the CFL factor would be necessary to match the efficiency of the error-based controller which just works out of the box. BS3(2)3 F is an efficient general purpose method for this CFD problem. Nevertheless, the optimized third-and fourth-order accurate methods are more efficient. Interestingly, SSP3(2)4[3S* + ] is again significantly more efficient, nearly 50 % faster than BS3(2)3 F .

Viscous flow past a Formula 1 front wing
Here, we consider the flow past a Formula 1 front wing with a relatively complex geometry, supported by the availability of a CAD model and experimental results [60]. We refer to this test case as the Imperial Front Wing, originally based on the front wing and endplate design of the McLaren 17D race car [11]. The panel of Figure 10 gives an overview of the Imperial Front Wing geometry. We denote by ℎ the distance between the ground and the lowest part of the front wing endplate and by the chord length of the main element. The position of the wing in the tunnel is further characterized by a pitch angle of 1.094 • . Here we use ℎ/ = 0.36 which can be considered as a relatively low front ride height, with high ground effect and hence higher loads on the wing. The corresponding Reynolds number is Re = 2.2 × 10 5 , based on the main element chord of 250 mm and a free stream velocity of 25 m/s. The Mach number is set to Ma = 0.036. This corresponds to a practically incompressible flow.
The computational domain is divided into 3.4 × 10 6 hexahedral elements with a maximum aspect ratio of ca. 250. Two different semidiscretizations with solution polynomials of degree = 1 and = 2 are used. The grid is constructed using the commercial software Pointwise V18.3 released in September 2019; solid boundaries are described using a quadratic mesh. In Figure 11, we present the contour plot of the time-averaged pressure coefficient on the surface of the front wing. The statistics have been obtained by averaging the solution for approximately five flow-through time units.  The performance characteristics of the different methods for = 2 are summarized in Table 9. The results are in agreement with those obtained for the NASA juncture flow. The CFL adaptor with = 1.0 works for all methods and is less efficient than error-based step size controllers. Again, BS3(2)3 F is an efficient general purpose scheme for this problem. Nevertheless, the optimized third-and fourth-order methods are more efficient and SSP3(2)4[3S* + ] is the most  The results for = 1 summarized in Table 10 are mostly similar to the ones presented before. In contrast to the results for = 2, the CFL adaptor with = 1.0 did not work for SSP3(2)4[3S* + ]; the simulation crashed when using = 1.0 and manual tuning was necessary to get a working setup 3 . For = 0.85, the CFL adaptor worked and was a few percent more efficient than the error-based controller. However, the latter did not require any manual tuning at all and worked robustly with default parameters for all RK methods. Here, SSP3(2)4[3S* + ] is less efficient than RK3(2)5 F [3S* + ] and RK4(3)9[3S* + ]. Otherwise, the results are similar to the ones obtained for the juncture flow and the setup using = 2.

Summary and conclusions
We studied explicit Runge-Kutta methods applied to dissipative spectral element semidiscretizations of hyperbolic conservation laws and CFD problems based on the compressible Euler and Navier-Stokes equations. In this context, we argued in Section 3 that error-based step size control can be advantageous compared to CFL-based approaches, since associated userdefined parameters are usually more robust and can be varied in rather large ranges without affecting accuracy or efficiency. Additionally, error-based step size control moves the burden of constructing critical parts of the controller from the developer of the spatial semidiscretization to the developer of the time integrator, easing the workflow for most researchers. The results for more complex test problems in Section 7 also support this conclusion.
In Section 4, we demonstrated that choosing good step size controller parameters is especially important if the time step is restricted by stability constraints, as is typical for many convectiondominated problems. We compared existing Runge-Kutta pairs in Section 5 and proposed an approach to optimize controller parameters for such methods. In general, the third-order method BS3(2)3 F of Bogacki and Shampine [10] performs well compared to both general purpose schemes and methods designed specifically for CFD applications. The strong-stability preserving method SSP3(2)4[3S* + ] of [47] with embedded method of [18] also performs well, but is more sensitive to the choice of error tolerance.
In Section 6, we developed explicit low-storage Runge-Kutta pairs with optimized step size controllers. These novel schemes are more efficient than all of the existing schemes when applied to advection-dominated problems. We demonstrated their performance in several CFD applications with increasing complexity, including the compressible Euler and Navier-Stokes equations. We contributed our optimized methods to the freely available open source library DifferentialEquations.jl [62] written in Julia [8].
Although not demonstrated in this article, another advantage of error-based step size control becomes apparent for a cold startup of CFD problems, i.e. simulations around complex geometries that are initialized with a free stream flow. CFL-based approaches often need to adjust the CFL scaling at the beginning to cope with the initial transient period. In contrast, our error-based approach does not need special tuning and is robust in our experience.
A summary of existing and novel methods with optimized controller parameters is given in Table 11. Depending on whether dissipative/low-Mach effects dominate, SSP3(2)4[3S* + ] and RK3(2)5 F [3S* + ] are the most efficient schemes in our experience. Additionally, BS3(2)3 F is a surprisingly efficient general purpose method. It becomes increasingly complicated to design controllers that are stable and efficient across different applications for methods of higher order and/or with more stages. However, this is not necessarily a severe drawback, since third-order accurate methods like RK3(2)5 F [3S* + ], SSP3(2)4[3S* + ], and BS3(2)3 F are usually more efficient in CFD applications. As argued in Section 3, the error-based control has usually a relatively mild sensitivity with respect to the choice of the tolerance. In our experience, it is usually good to choose a relatively tight tolerance around 10 −8 for applied CFD problems. Since the step size is almost always limited by stability, the tolerance does not matter that much, but a relatively tight tolerance helps for methods that are more difficult to control (e.g. those of higher order or with more stages).
The present work was influenced and partially motivated by the landmark work of Kennedy, Carpenter, and Lewis [37], which focused on developing optimized Runge-Kutta methods for CFD. Herein, we focus on modern spectral element semidiscretizations that introduce dissipation at element interfaces, e.g. using upwind numerical fluxes. Hence, the stability regions of our methods focus also on the negative real axis, whereas "imaginary axis stability is a high priority to the methods" designed in [37, p. 183]. Furthermore, we concentrate on the common case where the spatial error dominates and the step size is restricted by stability rather than temporal accuracy. Hence, we design Runge-Kutta pairs with large stability regions, for both the main and the embedded method. In particular, the stability regions of our novel embedded schemes are larger than the ones of the corresponding main methods.
To our knowledge, this article is the first exploring the impact of controller parameters and step size control stability on the efficiency of explicit Runge-Kutta methods for CFD systematically. This provides important insights into the construction of new methods and augments best practices published before. In particular, we think the conventional wisdom that "coping with step size control instability is probably best accomplished by reducing step sizes" [37, p. 208] can be improved upon by instead optimizing the controller, since that results in a more robust and efficient scheme. As noted in [37, p. 208], "doing this optimization requires some caution because it is not sufficient in the design of a good controller for each of the eigensolutions to be damped. The time constants associated with these eigensolutions must not be too large or too small.". Herein, we proposed a way to conduct this optimization systematically and applied it to a wide range of schemes.
Of course, such an approach also comes with limitations, in particular if the main method is fixed such that only the embedded method and the controller can be designed freely. Some methods such as the fourth-order method used in this article constrain the range of embedded methods and controllers such that a good general purpose optimization is not necessarily successful. While the combined method can be efficient for certain problems, it is not necessarily similarly efficient for other problems, e.g. when going from inviscid to viscous flows. Other schemes such as the novel third-and fifth-order accurate optimized methods result in less stability restrictions, making the resulting methods and controllers efficient for a broad range of CFD problems. Thus, we would like to stress that designing a good time integration method should not only focus on the main method but consider the interaction of a main method, an error estimator, and a step size controller. Applying this principle to viscous flows will be a subject of future work.
Some previous work has focused on automated step size control for convection-dominated problems with the goal of achieving a temporal error that is of similar magnitude to that of the spatial error [7,80]. The addition of such a control on top of the techniques employed here might lead to an even more efficient controller that is not adversely affected by excessively tight temporal error tolerance specification.
We expect the new methods developed in this article to perform well for convection-dominated flows in the subsonic regime; here, we tested them mainly with reference Mach numbers in the range 0.1-0.5. We have demonstrated their improved performance compared to some standard schemes also in other regimes, including viscous flows and the transonic/supersonic regime. For small Mach numbers, incompressible solvers with implicit time discretizations are usually applied. However, if compressible solvers should be used for low Mach numbers, methods could be optimized following the approach of this article.

Acknowledgments
Research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST). We are thankful for the computing resources of the Supercomputing Laboratory and the Extreme Computing Research Center at KAUST.

A. Efficient implementation of SSP3(2)4[3S* + ]
The Butcher coefficients of SSP3(2)4[3S* + ] are 0 1 /2 1 /2 where spaces indicate zeros. Because of its low-storage structure, the method can be implemented efficiently and memory-friendly as At the end of one step (A.2), +1 is stored in and +1 is stored in . Usually, it is not important to know +1 but +1 − +1 to estimate the error; this difference can be obtained as ( − )/2 instead of ( + )/2 in the last assignment in (A.2). If the low-storage assumption introduced in [41] can be applied, SSP3(2)4[3S* + ] can be implemented using only three memory locations for , , and . Otherwise, an additional storage location is necessary to evaluate the right-hand side . Note that the previous value is already included in this count of memory locations.

B. Coefficients of the novel Runge-Kutta pairs
The low-storage coefficients of the novel methods are listed in double precision in Tables 12-17. Full-precision results in electronic form are available in the accompanying repository [64]. We contributed our optimized methods to the freely available open source library DifferentialEquations.jl [62] written in Julia [8].       Table 1 gives the values of certain measures used to predict the accuracy and efficiency of Runge-Kutta pairs, for each of the pairs discussed in the paper. For the definitions of these quantities, see [37].

Detailed results for the inviscid test problems
Here, we provide detailed performance characteristics of the numerical methods for numerical studies described in Sections 5 and 6 of the manuscript.         (2) (2)

Detailed results for the viscous shock
Here, we provide detailed performance characteristics of the numerical methods for numerical studies of the viscous shock described in Section 7.1 of the manuscript.