1 Introduction

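Runge-Kutta (RK) methods achieve high-order accuracy in time by combining approximations to the solution at multiple stages. An s-stage RK scheme can be represented via the Butcher tableau

$$\displaystyle \begin{aligned} \begin{array}{c|c} \mathbf{c} & A \\ \hline & {\mathbf{b}}^{\,T} \end{array}\;. \end{aligned}$$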

Throughout the whole paper we assume that c = A e, where e is the vector of all ones. The scheme’s stability function [12] \(R(\zeta) = 1 + \zeta\,{\mathbf{b}}^{\,T}(I-\zeta A)^{-1}\mathbf{e}\) measures the growth \(u^{n+1}/u^{n}\) per step Δt, when applying the scheme to the linear model equation u′(t) = λu, with ζ = λΔt.
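The stability function can be evaluated directly from the tableau. The following is a minimal sketch, assuming a lower-triangular A (as for the DIRK schemes considered later) so that the linear system can be solved by forward substitution; the function name `stability_function` and the two one-stage example schemes are illustrative choices, not part of the paper.

```python
def stability_function(A, b, zeta):
    """Evaluate R(zeta) = 1 + zeta * b^T (I - zeta*A)^{-1} e for lower-triangular A."""
    s = len(b)
    v = [0j] * s  # v solves (I - zeta*A) v = e, computed by forward substitution
    for i in range(s):
        rhs = 1 + zeta * sum(A[i][j] * v[j] for j in range(i))
        v[i] = rhs / (1 - zeta * A[i][i])
    return 1 + zeta * sum(b[i] * v[i] for i in range(s))


# Backward Euler as a one-stage scheme: R(zeta) = 1 / (1 - zeta).
A_be, b_be = [[1.0]], [1.0]
# Implicit midpoint rule: R(zeta) = (1 + zeta/2) / (1 - zeta/2).
A_mid, b_mid = [[0.5]], [1.0]

for zeta in (-0.1, -1.0, -100.0):
    print(zeta,
          stability_function(A_be, b_be, zeta),
          stability_function(A_mid, b_mid, zeta))
```

For large negative ζ the backward Euler value tends to 0, while the implicit midpoint value tends to −1, which is the familiar distinction between L-stable and merely A-stable schemes.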

A particular interest lies in the accuracy of the RK scheme for stiff problems, i.e., problems in which a larger time step is chosen than the fastest time scale of the problem’s dynamics. A standard stiff model problem [8] is the scalar linear ordinary differential equation (ODE)

$$\displaystyle \begin{aligned} u' = \lambda (u-\phi(t)) + \phi'(t)\;, \end{aligned} $$
(1)

with i.c. u(0) = ϕ(0) and Re λ ≤ 0. The true solution u(t) = ϕ(t) evolves on an O(1) time scale. Hence, λ-values with large negative real part result in stiffness. Considering a family of test problems (parametrized by λ), one can now establish the scheme’s convergence via two different limits: (a) the non-stiff limit Δt → 0 and ζ → 0; and (b) the stiff limit Δt → 0 and ζ → −∞. A characteristic property of most RK schemes is that, while the non-stiff limit recovers the scheme’s order (as given by the order conditions [2, 5]), the error decays at a reduced order in the stiff limit. This phenomenon is called “order reduction” (OR) [1, 3, 7, 10, 11] and it manifests in various ways for more complex problems, including numerical boundary layers [6]. The OR phenomenon can be seen by studying the RK scheme applied to (1). The approximation error at time \(t_{n+1}\) reads [12, Chapter IV.15]

$$\displaystyle \begin{aligned} \epsilon^{n+1} = R(\zeta)\,\epsilon^{n} + \zeta{\mathbf{b}}^{\,T}(I-\zeta A)^{-1}\mathbf{\delta}_s^{\,{n+1}} + \delta^{n+1}\;, \end{aligned} $$
(2)

where R(ζ) is the growth factor, and

$$\displaystyle \begin{aligned} \mathbf{\delta}_s^{\,{n+1}} =\! \sum_{j\geq 2}\textstyle\frac{{\varDelta t}^{\,j}}{(j-1)!}\,\mathbf{\tau}^{\,(j)}\phi^{(j)}(t_n) \;,\quad \delta^{n+1} =\! \displaystyle\sum_{j\geq 1}\textstyle\frac{{\varDelta t}^{\,j}}{(j-1)!} \!\left({\mathbf{b}}^{\,T}{\mathbf{c}}^{\,j-1}-\textstyle\frac{1}{j}\right)\! \phi^{(j)}(t_n) \end{aligned}$$

are the truncation errors incurred at the intermediate stages and at the end of the step, respectively. Here, \(\phi^{(j)}\) denotes the j-th derivative of the solution, and the vectors

$$\displaystyle \begin{aligned} \mathbf{\tau}^{\,(j)} = A{\mathbf{c}}^{\,j-1} - \tfrac{1}{j}{\mathbf{c}}^{\,j}\;, \quad j=1,2,\dots \end{aligned}$$

we call the stage order residuals or stage order vectors. The condition \(\mathbf{\tau}^{\,(j)} = 0\) for 1 ≤ j ≤ η appears often in the literature and is also referred to as the simplifying assumption C(η) [12]. In (2), the step error \(\delta^{n+1}\) is of the formal order (in Δt) of the scheme (due to the order conditions). Moreover, the growth factor carries over (more or less, see [4]) the accuracy from one step to the next. Hence, the critical expression for OR is the term involving the stage error \(\mathbf {\delta }_s^{\,{n+1}}\). Specifically, the asymptotic behavior of the expression

$$\displaystyle \begin{aligned} g^{(j)} = \zeta{\mathbf{b}}^{\,T}(I-\zeta A)^{-1}\mathbf{\tau}^{\,(j)} \end{aligned} $$
(3)

matters. In the non-stiff limit (|ζ| ≪ 1), a Neumann expansion yields \(\zeta(I-\zeta A)^{-1} = \zeta I + \zeta^2 A + \zeta^3 A^2 + \dots\), leading to expressions \({\mathbf{b}}^{\,T}A^{\ell}\mathbf{\tau}^{\,(j)}\) with ℓ ≥ 0. And in fact the order conditions guarantee that \({\mathbf{b}}^{\,T}A^{\ell}\mathbf{\tau}^{\,(j)} = 0\) for 0 ≤ ℓ + j ≤ p − 1 to ensure the formal order of the scheme.

Conversely, in the stiff limit we can treat \(\zeta^{-1}\) as the small parameter and expand \(\zeta(I-\zeta A)^{-1} = -A^{-1}(I-\zeta^{-1}A^{-1})^{-1} = -A^{-1} - \zeta^{-1}A^{-2} - \zeta^{-2}A^{-3} - \dots\), leading to expressions \({\mathbf{b}}^{\,T}A^{\ell}\mathbf{\tau}^{\,(j)}\) with ℓ < 0. The order conditions do not imply that these quantities vanish, and in general one may observe a reduced rate of convergence.
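As a worked illustration of the two expansions (an example chosen here for its simplicity, not one taken from the analysis above), consider backward Euler, i.e., the one-stage scheme with A = b = c = 1. Its first nonvanishing stage order residual is \(\tau^{(2)} = A c - \tfrac{1}{2}c^2 = \tfrac{1}{2}\), so that (3) becomes

$$\displaystyle \begin{aligned} g^{(2)} = \frac{\zeta}{2(1-\zeta)} = \begin{cases} \tfrac{1}{2}\zeta + \tfrac{1}{2}\zeta^{2} + \dots & \text{for}\;\;|\zeta|\ll 1\;, \\ -\tfrac{1}{2} - \tfrac{1}{2}\zeta^{-1} - \dots & \text{for}\;\;|\zeta|\gg 1\;. \end{cases} \end{aligned}$$

Hence \(g^{(2)}\) vanishes together with ζ in the non-stiff limit, whereas in the stiff limit it tends to the nonzero constant \(-{\mathbf{b}}^{\,T}A^{-1}\mathbf{\tau}^{\,(2)} = -\tfrac{1}{2}\), in agreement with the leading terms of the two expansions.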

A key question is therefore whether additional conditions can be imposed on the RK scheme that recover the scheme’s order in the stiff regime. A well-known answer to the question is:

Definition 1

Let \(\hat {p}\) denote the order of the quadrature rule of an RK scheme. Let \(\hat {q}\) denote the largest integer such that \(\mathbf{\tau}^{\,(j)} = 0\) for \(1\le j\le \hat {q}\). The stage order of an RK scheme is \(q=\min (\hat {p},\hat {q})\).

Having stage order q implies that the error decays at an order of (at least) q in the stiff regime (see also [12]). This work focuses particularly on diagonally-implicit Runge-Kutta (DIRK) schemes, for which A is lower triangular. A known drawback of DIRK schemes is that they cannot have high stage order:

Theorem 1

The stage order of an irreducible DIRK scheme is at most 2. The stage order of a DIRK scheme with non-singular A is at most 1.

Proof

Since c = A e, we have \(\mathbf {\tau }^{\,(2)}_1 = a_{11}c_1 - \frac {1}{2}(c_1)^2 = \frac {1}{2}(a_{11})^2\). Thus if A is non-singular, one has \(\mathbf{\tau}^{\,(2)} \neq 0\), so q ≤ 1. Consider now the case that \(a_{11} = c_1 = 0\), and suppose that the method has stage order 3. The conditions \(\mathbf {\tau }^{\,(2)}_2 = \mathbf {\tau }^{\,(3)}_2 = 0\) then imply \(a_{21} = a_{22} = c_2 = 0\), which would render the scheme reducible. Hence, q ≤ 2. □
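Definition 1 and Theorem 1 can be checked on small examples. The sketch below computes the stage order residuals and the stage order in exact rational arithmetic. It assumes that the quadrature order \(\hat{p}\) is the largest integer with \({\mathbf{b}}^{\,T}{\mathbf{c}}^{\,j-1} = \frac{1}{j}\) for \(1\le j\le\hat{p}\); the function names and the three textbook schemes are illustrative choices, not schemes from this paper.

```python
from fractions import Fraction as F


def stage_order_residual(A, j):
    """tau^(j) = A c^(j-1) - c^j / j with c = A e (powers taken componentwise)."""
    s = len(A)
    c = [sum(row) for row in A]
    return [sum(A[i][k] * c[k] ** (j - 1) for k in range(s)) - c[i] ** j / j
            for i in range(s)]


def stage_order(A, b, max_order=10):
    """q = min(p_hat, q_hat) as in Definition 1, searched up to max_order."""
    s = len(b)
    c = [sum(row) for row in A]
    p_hat = 0  # quadrature order: b^T c^(j-1) = 1/j for 1 <= j <= p_hat
    while p_hat < max_order and \
            sum(b[i] * c[i] ** p_hat for i in range(s)) == F(1, p_hat + 1):
        p_hat += 1
    q_hat = 0  # tau^(j) = 0 for 1 <= j <= q_hat
    while q_hat < max_order and \
            all(t == 0 for t in stage_order_residual(A, q_hat + 1)):
        q_hat += 1
    return min(p_hat, q_hat)


# Trapezoidal rule written as a two-stage DIRK scheme with explicit first stage.
A_trap, b_trap = [[F(0), F(0)], [F(1, 2), F(1, 2)]], [F(1, 2), F(1, 2)]
# Backward Euler and implicit midpoint: one-stage schemes with non-singular A.
A_be, b_be = [[F(1)]], [F(1)]
A_mid, b_mid = [[F(1, 2)]], [F(1)]

print(stage_order_residual(A_trap, 3))   # second component is 1/6
print(stage_order(A_trap, b_trap), stage_order(A_be, b_be), stage_order(A_mid, b_mid))
```

The trapezoidal rule, with \(a_{11} = 0\), reaches stage order 2, while the two schemes with non-singular A are limited to stage order 1, consistent with Theorem 1.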

Hence, while DIRK schemes possess an implementation-friendly structure (each stage is a backward-Euler-type solve), their potential to avoid OR by means of high stage order is limited. We therefore move to a weaker condition that can avoid OR in some situations for higher order in the context of DIRK schemes.

2 Weak Stage Order

To avoid order reduction, the expressions \(g^{(j)}\) in (3) need to vanish in the stiff limit. In line with [9], we define the following criteria:

Definition 2 (Weak Stage Order)

An RK scheme has weak stage order (WSO) \(\tilde {q}\) if there is an A-invariant subspace that is orthogonal to b and that contains the stage order vectors \(\mathbf{\tau}^{\,(j)}\) for \(1\le j\le \tilde {q}\).

Theorem 2 (WSO Is the Most General Condition that Ensures \(g^{(j)} = 0\) for All ζ > 0)

Let coefficients A, b be given. Then \(g^{(j)} = 0\) for all ζ > 0 and \(1\leq j \leq \tilde {q}\) if and only if the corresponding RK scheme has weak stage order \(\tilde {q}\).

Proof

Let C(G) denote the column space of

$$\displaystyle \begin{aligned} G := \begin{bmatrix} \mathbf{\tau}^{\,(1)}, A \mathbf{\tau}^{\,(1)},A^2 \mathbf{\tau}^{\,(1)}, \ldots, A^{s-1}\mathbf{\tau}^{\,(1)}, \mathbf{\tau}^{\,(2)}, A\mathbf{\tau}^{\,(2)}, \ldots, A^{s-1}\mathbf{\tau}^{\,(\tilde{q})} \end{bmatrix}. \end{aligned}$$

From the Cayley-Hamilton theorem it follows that WSO \(\tilde {q}\) is equivalent to

$$\displaystyle \begin{aligned} {\mathbf{b}}^T A^\ell \mathbf{\tau}^{\,(j)} = 0, \quad \quad 0\leq \ell \leq s-1, \ 1 \leq j \leq \tilde{q}\;. \end{aligned} $$
(4)

⇒ Because C(G) is A-invariant, C(G) is invariant under multiplication by \((I-\zeta A)^{-1}\), i.e., if v ∈ C(G) then for any ζ > 0, the product \((I-\zeta A)^{-1}\mathbf{v} \in C(G)\). Since b is orthogonal to C(G), we have \(g^{(j)} = 0\) for all \(1\leq j \leq \tilde {q}\).

⇐ If \(g^{(j)} = 0\), then \(\zeta^{-1}g^{(j)} = {\mathbf{b}}^{\,T}(I-\zeta A)^{-1}\mathbf{\tau}^{\,(j)} = 0\) for all ζ > 0. Differentiating both sides of this equation ℓ times with respect to ζ, and taking the limit as \(\zeta \to 0^{+}\), yields the conditions in Eq. (4). □
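The conditions (4) give a finite test for weak stage order. A minimal sketch in exact rational arithmetic follows; the function names and the two textbook example schemes are assumptions made for illustration, and for tableaux with irrational entries the exact comparison with zero would have to be replaced by a tolerance.

```python
from fractions import Fraction as F


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def weak_stage_order(A, b, max_order=10):
    """Largest q (up to max_order) with b^T A^l tau^(j) = 0 for 0<=l<=s-1, 1<=j<=q."""
    s = len(b)
    c = [sum(row) for row in A]
    wso = 0
    for j in range(1, max_order + 1):
        Ac = matvec(A, [ck ** (j - 1) for ck in c])
        v = [Ac[i] - c[i] ** j / j for i in range(s)]  # v = tau^(j)
        for _ in range(s):                             # l = 0, 1, ..., s-1
            if sum(bi * vi for bi, vi in zip(b, v)) != 0:
                return wso
            v = matvec(A, v)                           # v = A^(l+1) tau^(j)
        wso = j
    return wso


# Trapezoidal rule (explicit first stage) and backward Euler.
A_trap, b_trap = [[F(0), F(0)], [F(1, 2), F(1, 2)]], [F(1, 2), F(1, 2)]
A_be, b_be = [[F(1)]], [F(1)]

print(weak_stage_order(A_trap, b_trap), weak_stage_order(A_be, b_be))
```

For these two schemes the weak stage order coincides with the stage order (2 and 1, respectively); the point of the schemes constructed later is that the two notions can differ.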

Definition 3 (Weak Stage Order Eigenvector Criterion)

An RK scheme satisfies the WSO eigenvector criterion of order \(\tilde {q}_e\) if for each \(1\le j\le \tilde {q}_e\), there exists \(\mu_j\) such that \(A\mathbf{\tau}^{\,(j)} = \mu_j\mathbf{\tau}^{\,(j)}\), and moreover, \({\mathbf{b}}^{\,T}\mathbf{\tau}^{\,(j)} = 0\).

The WSO eigenvector criterion of order \(\tilde {q}_e\) implies WSO (of at least) \(\tilde {q}_e\). For a given scheme, let p denote the classical order, q the stage order, and \(\tilde {q}\) the weak stage order. Then we have \(\tilde {q} \ge q\) and p ≥ q. Note however that a method with WSO \(\tilde {q}\ge 1\) need not even be consistent; order conditions must be imposed separately.

The WSO eigenvector criterion may serve to avoid OR because it implies that

$$\displaystyle \begin{aligned} g^{(j)} = \zeta{\mathbf{b}}^{\,T}(1-\zeta\mu_j)^{-1}\mathbf{\tau}^{\,(j)} = \frac{\zeta}{1-\zeta\mu_j}{\mathbf{b}}^{\,T}\mathbf{\tau}^{\,(j)}\;, \end{aligned}$$

i.e., it allows one to “push” the stage order residuals past the matrix \((I-\zeta A)^{-1}\), and then use \({\mathbf{b}}^{\,T}\mathbf{\tau}^{\,(j)} = 0\). Note that the condition \({\mathbf{b}}^{\,T}\mathbf{\tau}^{\,(j)} = 0\) that is required in Definition 3 is actually automatically satisfied (due to the order conditions) if \(p>\tilde {q}_e\) (or \(p\ge \tilde {q}_e\) for stiffly accurate schemes).

It must be stressed that the concept of WSO (both criteria) is based on the linear test equation (1), hence it is not clear to what extent WSO will remedy OR for nonlinear problems or problems with time-dependent coefficients. In Sect. 4 we numerically investigate some nonlinear test problems.

Finally, we present a limitation theorem on the WSO eigenvector criterion.

Theorem 3

DIRK schemes with invertible A have \(\tilde {q}_e\le 3\).

Proof

Because the \(\mathbf{\tau}^{\,(j)}\) only depend on A, the eigenvector relation in Definition 3 depends only on A, not on b. With A lower triangular, the first k components of \(\mathbf{\tau}^{\,(j)}\) depend only on the first k rows of A; and the same is true for the eigenvector relation as well. Hence, for a scheme to have an A that allows for the WSO eigenvector criterion of order \(\tilde {q}_e\), all leading (upper-left) sub-matrices of A must admit the same, too. We can therefore study A row by row. The first component of \(\mathbf{\tau}^{\,(j)}\) equals \((1-\frac {1}{j})a_{11}^j\), which is nonzero for j > 1 (because \(a_{11}\neq 0\) for invertible A). Hence, the first row of the equation \(A\mathbf{\tau}^{\,(j)} = \mu_j\mathbf{\tau}^{\,(j)}\) is equivalent to \(\mu_j = a_{11}\). With that, we can move to the second row of the equation, which reads

$$\displaystyle \begin{aligned} (1\!-\!\tfrac{1}{j})a_{11}^j a_{21} + (a_{22}\!-\!a_{11}) \left(a_{11}^{j-1}a_{21} + (a_{21}\!+\!a_{22})^{j-1}a_{22} - \tfrac{1}{j}(a_{21}\!+\!a_{22})^j\right) = 0\;. \end{aligned} $$
(5)

To determine the set of solutions \((a_{11}, a_{21}, a_{22})\) of (5), we first observe that (5) is homogeneous, i.e., if \((a_{11}, a_{21}, a_{22})\) solves (5), then \((\mu a_{11}, \mu a_{21}, \mu a_{22})\) solves (5) as well for any \(\mu \in \mathbb {R}\). It therefore suffices to consider the solutions of (5) in the 2D plane \((\frac {a_{11}}{a_{21}},\frac {a_{22}}{a_{21}})\). Figure 1 shows the resulting solution curves for j ∈ {2, 3, 4}.

Fig. 1. Curves of WSO orders 2, 3, and 4 as functions of the re-scaled parameters \( \frac {a_{11}}{a_{21}}\) and \( \frac {a_{22}}{a_{21}}\). Left panel: scale 10; right panel: scale 1. All orders are satisfied along the line of slope 1 going through (1,0), corresponding to equal-time DIRK schemes. Moreover, there are two further points (other than the origin), where orders 2 and 3 are satisfied. Neither of these two points satisfies order 4.

One class of solutions lies on the straight line of slope 1 passing through (1, 0). Those schemes are equal-time methods, i.e., RK schemes that have \(\mathbf{c} = \nu\,\mathbf{e}\), where \(\nu \in \mathbb {R}\) is a constant. In fact, equal-time schemes satisfy the eigenvector relation for all j. However, they are not particularly useful RK methods because, among other limitations, they are restricted to second order. This follows because the order 1 and 2 conditions require \({\mathbf{b}}^{\,T}\mathbf{e} = 1\) and \({\mathbf {b}}^{\,T}\mathbf {c} = \frac {1}{2}\). Thus \(\nu = \frac {1}{2}\), and \({\mathbf {b}}^{\,T}{\mathbf {c}}^2 = \nu ^2 = \frac {1}{4}\), which contradicts the order 3 condition \({\mathbf {b}}^{\,T}{\mathbf {c}}^2 = \frac {1}{3}\). Note that the equal-time scenario also covers the points at infinity in Fig. 1, i.e., the schemes with \(a_{21} = 0\).

The non-equal-time schemes that satisfy (5) for both j = 2 and j = 3 correspond to the following two points in the \((\frac {a_{11}}{a_{21}},\frac {a_{22}}{a_{21}})\) plane: \(P_1 = (-4+3\sqrt {2},\sqrt {2}-1) \approx (0.2426,0.4142)\) and \(P_2 = (-(\sqrt {2}+1)(\sqrt {2}+2),-(\sqrt {2}+1)) \approx (-8.2426,-2.4142)\). Neither of these two points satisfies (5) for j = 4 (green curve in Fig. 1). Therefore \(\tilde {q}_e\le 3\). □
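The last step of the proof can be checked numerically by evaluating the left-hand side of (5) at the two points. The sketch below does this in floating-point arithmetic with the scaling \(a_{21} = 1\); the function name `eq5_residual` and the sample point on the equal-time line are illustrative choices.

```python
import math


def eq5_residual(j, a11, a21, a22):
    """Left-hand side of (5): second row of A tau^(j) = a11 tau^(j)."""
    c2 = a21 + a22
    tau1 = (1 - 1 / j) * a11 ** j
    tau2 = a11 ** (j - 1) * a21 + c2 ** (j - 1) * a22 - c2 ** j / j
    return tau1 * a21 + (a22 - a11) * tau2


r2 = math.sqrt(2)
P1 = (-4 + 3 * r2, r2 - 1)
P2 = (-(r2 + 1) * (r2 + 2), -(r2 + 1))

for name, (x, y) in (("P1", P1), ("P2", P2)):
    # residuals for j = 2 and j = 3 are at rounding level, j = 4 is not
    print(name, [eq5_residual(j, x, 1.0, y) for j in (2, 3, 4)])

# A sample point on the equal-time line a22/a21 = a11/a21 - 1 (so that c1 = c2).
print([eq5_residual(j, 0.7, 1.0, -0.3) for j in (2, 3, 4, 5)])
```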

Of the two solutions found in the proof, \(P_1\) implies that \(a_{11}\), \(a_{21}\), and \(a_{22}\) all have the same sign, which is a desirable property. In contrast, \(P_2\) implies that \(a_{21} < 0\). Both WSO 3 schemes presented below correspond to the \(P_1\) solution.
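To make the \(P_1\) solution concrete, the following sketch builds a 2×2 lower-triangular matrix from the \(P_1\) ratios and checks the eigenvector relation of Definition 3 directly. The scaling \(a_{21} = 1\) and the weight vector b (chosen orthogonal to the j = 2 residual and normalized to sum to one) are hypothetical choices made only for this illustration: no order conditions or stability requirements are imposed, so this is not one of the schemes of Sect. 3.

```python
import math

r2 = math.sqrt(2)
a21 = 1.0                       # free scaling (arbitrary choice)
a11 = (-4 + 3 * r2) * a21       # first coordinate of P1
a22 = (r2 - 1) * a21            # second coordinate of P1
A = [[a11, 0.0], [a21, a22]]
c = [a11, a21 + a22]


def tau(j):
    """Stage order residual tau^(j) = A c^(j-1) - c^j / j."""
    return [sum(A[i][k] * c[k] ** (j - 1) for k in range(2)) - c[i] ** j / j
            for i in range(2)]


def eig_defect(j):
    """Max-norm of A tau^(j) - a11 tau^(j)."""
    t = tau(j)
    At = [A[i][0] * t[0] + A[i][1] * t[1] for i in range(2)]
    return max(abs(At[i] - a11 * t[i]) for i in range(2))


# Hypothetical weights: b orthogonal to tau^(2), normalized so that b^T e = 1.
t2 = tau(2)
b = [-t2[1] / (t2[0] - t2[1]), t2[0] / (t2[0] - t2[1])]

print("entries:", a11, a21, a22)
print("eigenvector defects for j = 2, 3, 4:", [eig_defect(j) for j in (2, 3, 4)])
print("b^T tau^(3):", b[0] * tau(3)[0] + b[1] * tau(3)[1])
```

Because the eigenspace of A for the eigenvalue \(a_{11}\) is one-dimensional here, the residuals for j = 2 and j = 3 are parallel, so a single orthogonality condition on b covers both.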

3 DIRK Schemes with High Weak Stage Order

Imposing the classical order conditions [2, 5], together with the WSO eigenvector relation (Definition 3), we determine RK schemes by searching the parameter space of DIRK schemes (with all diagonal entries non-zero). A stiffly accurate structure (\({\mathbf{b}}^{\,T}\) equals the last row of A) is imposed, as is A-stability (verified by evaluating the stability function R(ζ) along the imaginary axis). Together this implies that the resulting scheme is L-stable; i.e., it ensures that unresolved stiff modes decay [5]. The number of stages is chosen so that the constraints admit solutions. The optimization itself is carried out using MATLAB’s optimization toolbox, using multiple local optimization algorithms included in the function fmincon. An effort was made to minimize the \(L^2\) norm of the local truncation error coefficients. However, in multiple cases the solver exhibited poor convergence properties; so while the schemes below yield reasonable truncation errors, it should not be expected that they are optimal. We find an order 3 scheme with WSO 2 (see also [9]),

an order 3 scheme with WSO 3,

and an order 4 scheme with WSO 3,
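The stability requirements described in this section (a stiffly accurate structure together with A-stability checked along the imaginary axis) can be sketched as follows. As a stand-in for the schemes found here, the sketch uses the classical two-stage, second-order, stiffly accurate SDIRK scheme with γ = 1 − √2/2; the sampling of the imaginary axis and the function names are assumptions of this illustration. Sampling the imaginary axis is a numerical check rather than a proof, and it relies on R having no poles in the left half-plane, which holds for a DIRK scheme with positive diagonal entries.

```python
import math


def stability_function(A, b, zeta):
    """R(zeta) = 1 + zeta * b^T (I - zeta*A)^{-1} e for lower-triangular A."""
    s = len(b)
    v = [0j] * s
    for i in range(s):
        rhs = 1 + zeta * sum(A[i][j] * v[j] for j in range(i))
        v[i] = rhs / (1 - zeta * A[i][i])
    return 1 + zeta * sum(b[i] * v[i] for i in range(s))


def max_modulus_on_imag_axis(A, b, y_values):
    """Largest |R(i*y)| over the sampled points y (y > 0 suffices by symmetry)."""
    return max(abs(stability_function(A, b, 1j * y)) for y in y_values)


gamma = 1 - math.sqrt(2) / 2
A = [[gamma, 0.0], [1 - gamma, gamma]]
b = A[-1]                                        # stiffly accurate: b^T is the last row of A

ys = [10 ** (k / 10) for k in range(-40, 61)]    # logarithmic samples from 1e-4 to 1e6
print("max |R(iy)| :", max_modulus_on_imag_axis(A, b, ys))
print("|R(-1e8)|   :", abs(stability_function(A, b, -1e8)))
```

A maximum modulus not exceeding 1 on the sampled axis, combined with a vanishing stability function for ζ → −∞, is the numerical signature of L-stability.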

4 Numerical Results

In this section we verify the order of accuracy of the schemes above and demonstrate that WSO remedies order reduction for linear problems. We confirm that WSO p is required for ODEs, and WSO p − 1 is required for PDE IBVPs. In addition, we study the effect of WSO for two nonlinear problems.

4.1 Linear ODE Test Problem

We consider the linear ODE test problem (1) with the true solution \(\phi (t) = \sin (t + \frac {\pi }{4})\), the stiffness parameter \(\lambda = -10^{4}\), and the initial condition \(u(0) = \sin (\frac {\pi }{4})\). The problem is solved using three 3rd order DIRK schemes (with WSO 1, 2, and 3) and two 4th order DIRK schemes (with WSO 1 and 3) up to the final time T = 10. The convergence results are shown in Fig. 2. In the stiff regime where |ζ| = |λ|Δt ≫ 1, first order convergence is observed for the WSO 1 schemes as expected, the WSO 2 scheme improves the convergence rate to 2, and the WSO 3 schemes exhibit 3rd order convergence. In addition to yielding better convergence orders in the stiff regime, the schemes with higher WSO also turn out to yield substantially smaller error constants in the non-stiff regime (Δt ≪ 1∕|λ|). For comparison, we also display a DIRK scheme with explicit first stage (EDIRK), that is, \(a_{11} = 0\), of stage order 2 (see Theorem 1). The left panel of Fig. 2 shows that the WSO 2 scheme exhibits the same convergence behavior as the stage order 2 EDIRK scheme and performs equally well in terms of accuracy.

Fig. 2. Error convergence for linear ODE test problem (1). Left: 3rd order DIRK schemes with WSO 1 (blue circles), WSO 2 (red triangles), WSO 3 (black squares), and a 3rd order EDIRK scheme with stage order 2 (light red dots). Right: 4th order DIRK schemes with WSO 1 (blue circles) and WSO 3 (red triangles).
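The setup of this experiment can be sketched in a few lines. Because problem (1) is linear, each DIRK stage equation can be solved in closed form. The sketch below is not the code used for Fig. 2: it assumes the classical two-stage SDIRK scheme with γ = 1 − √2/2 as a stand-in for the schemes of Sect. 3, measures the error at the final time only, and uses the hypothetical function name `dirk_solve`.

```python
import math


def dirk_solve(A, b, lam, phi, dphi, T, n_steps):
    """Integrate u' = lam*(u - phi(t)) + phi'(t), u(0) = phi(0), with a DIRK scheme."""
    s = len(b)
    c = [sum(row) for row in A]
    dt = T / n_steps
    u = phi(0.0)
    for n in range(n_steps):
        t = n * dt
        f = []                                   # stage derivatives
        for i in range(s):
            ti = t + c[i] * dt
            known = u + dt * sum(A[i][j] * f[j] for j in range(i))
            # the stage equation is linear in U_i, so it is solved directly
            Ui = (known + dt * A[i][i] * (dphi(ti) - lam * phi(ti))) \
                / (1 - dt * A[i][i] * lam)
            f.append(lam * (Ui - phi(ti)) + dphi(ti))
        u += dt * sum(b[i] * f[i] for i in range(s))
    return u


gamma = 1 - math.sqrt(2) / 2                     # stand-in scheme, not from Sect. 3
A = [[gamma, 0.0], [1 - gamma, gamma]]
b = [1 - gamma, gamma]


def phi(t):
    return math.sin(t + math.pi / 4)


def dphi(t):
    return math.cos(t + math.pi / 4)


lam, T = -1e4, 10.0
for n_steps in (10, 100, 1000, 10000, 100000):
    err = abs(dirk_solve(A, b, lam, phi, dphi, T, n_steps) - phi(T))
    print(f"dt = {T / n_steps:.1e}   |zeta| = {abs(lam) * T / n_steps:.1e}   error = {err:.3e}")
```

Plotting the printed errors against Δt on doubly logarithmic axes gives a convergence curve of the type shown in Fig. 2, with the stiff regime on the side where |ζ| ≫ 1.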

4.2 Linear PDE Test Problem: Schrödinger Equation

As a linear PDE test problem, we study the dispersive Schrödinger equation. The method of manufactured solutions is used, i.e., the forcing, the boundary conditions (b.c.) and initial conditions (i.c.) are selected to generate a desired true solution. The spatial approximation is carried out using 4th order centered differences on a fixed spatial grid of 10,000 cells. This renders spatial approximation errors negligible and thus isolates the temporal errors due to DIRK schemes. The errors are measured in the maximum norm in space.

We consider

$$\displaystyle \begin{aligned} u_t = \frac{i \omega}{k^2}u_{xx}\;\;\mbox{for}\;\;(x,t)\in(0,1)\times(0,1.2],\quad u = g\;\;\mbox{on}\;\;\{0,1\}\times(0,1.2]\;, \end{aligned} $$
(6)

with the true solution \(u(x,t) = e^{i(kx-\omega t)}\), ω = 2π and k = 5. Figure 3 shows the convergence orders of u, \(u_x\) and \(u_{xx}\) for 3rd order DIRK schemes with WSO 1 (left), WSO 3 (middle) and a 4th order DIRK scheme with WSO 3 (right). For IBVPs, spatial boundary layers are produced by RK methods, thus limiting the convergence order in u to \(\tilde {q} + 1\), with an additional loss of half an order per derivative when \(\tilde {q}<p\) [9]. As a result, the 4th order WSO 3 scheme recovers 4th order convergence in u and improves the convergence in \(u_x\) and \(u_{xx}\). When \(\tilde {q}=p\), the full convergence order in u, \(u_x\) and \(u_{xx}\) is achieved, as seen in the middle panel in Fig. 3.

Fig. 3. Error convergence for the Schrödinger equation using 3rd order DIRK schemes with WSO 1 (left) and WSO 3 (middle), and a 4th order DIRK with WSO 3 (right).
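The spatial approximation used for this test can be illustrated with the standard 4th order centered stencil for the second derivative, applied to the manufactured solution of (6). The sketch below is only a consistency check at one interior grid point: it uses a much coarser grid than the 10,000 cells of the actual computation, and it does not address the treatment of grid points next to the boundary, which is not specified here.

```python
import cmath
import math

k, omega = 5.0, 2 * math.pi


def u_exact(x, t):
    """Manufactured solution u(x, t) = exp(i (k x - omega t))."""
    return cmath.exp(1j * (k * x - omega * t))


def d2_fourth_order(u, i, h):
    """4th order centered approximation of u_xx at grid index i."""
    return (-u[i - 2] + 16 * u[i - 1] - 30 * u[i] + 16 * u[i + 1] - u[i + 2]) / (12 * h * h)


n_cells = 100                  # coarse grid for illustration only
h = 1.0 / n_cells
t = 0.3
u = [u_exact(i * h, t) for i in range(n_cells + 1)]

i = n_cells // 2
u_t = -1j * omega * u[i]       # exact time derivative of the manufactured solution
residual = u_t - (1j * omega / k ** 2) * d2_fourth_order(u, i, h)
print(abs(residual))           # small: only the spatial truncation error remains
```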

4.3 Nonlinear PDE Test Problem: Burgers’ Equation

This example demonstrates that WSO avoids order reduction for certain nonlinear IBVPs as well. We consider the viscous Burgers’ equation with pure Neumann b.c.

$$\displaystyle \begin{aligned} u_t + u u_x = \nu u_{xx} + f\;\;\mbox{for}\;\;(x,t)\in(0,1)\times(0,1],\quad u_x= h\;\;\mbox{on}\;\;\{0,1\}\times(0,1]\;. \end{aligned} $$
(7)

Here ν = 0.1 and \(u(x,t) = \cos (2+10t)\sin (0.2+20x)\). The nonlinear implicit equations arising at each time step are solved using a standard Newton iteration. The choice of Neumann b.c. distinguishes this example from the one given in [9]. With Neumann b.c., the convergence order in u is limited to \(\tilde {q}+1.5\) (half an order better than with Dirichlet b.c.). Figure 4 shows that order reduction arises with the stage order 1 scheme, and that the WSO 2 scheme recovers 3rd order convergence for u and \(u_x\), and the 3rd order WSO 3 scheme yields 3rd order convergence for u, \(u_x\) and \(u_{xx}\).

Fig. 4. Error convergence for the viscous Burgers’ equation using 3rd order DIRK schemes with WSO 1 (left), WSO 2 (middle) and WSO 3 (right).
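For this test, the forcing f and the Neumann data h in (7) follow from the prescribed solution by the method of manufactured solutions. A minimal sketch, with the derivatives of u written out by hand and with illustrative function names, could read:

```python
import math

nu = 0.1


def u(x, t):
    return math.cos(2 + 10 * t) * math.sin(0.2 + 20 * x)


def u_t(x, t):
    return -10 * math.sin(2 + 10 * t) * math.sin(0.2 + 20 * x)


def u_x(x, t):
    return 20 * math.cos(2 + 10 * t) * math.cos(0.2 + 20 * x)


def u_xx(x, t):
    return -400 * math.cos(2 + 10 * t) * math.sin(0.2 + 20 * x)


def forcing(x, t):
    """f = u_t + u u_x - nu u_xx, so that u solves (7) exactly."""
    return u_t(x, t) + u(x, t) * u_x(x, t) - nu * u_xx(x, t)


def neumann_data(x_boundary, t):
    """h = u_x evaluated at a boundary point x_boundary in {0, 1}."""
    return u_x(x_boundary, t)


print(forcing(0.5, 0.25), neumann_data(0.0, 0.25), neumann_data(1.0, 0.25))
```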

4.4 Stiff Nonlinear ODE: Van der Pol Oscillator

This example illustrates that DIRK schemes with high WSO may not remove order reduction for all types of nonlinear problems. Consider the Van der Pol oscillator

$$\displaystyle \begin{aligned} x' = y\;\;\;\text{and}\;\;\; y' = \mu(1-x^2)y-x\;, \end{aligned} $$
(8)

with i.c. (x(0), y(0)) = (2, 0), stiffness parameter μ = 500, and final time T = 10. The nonlinear system at each time step is solved via MATLAB’s built-in nonlinear system solver. The “exact” solution is computed using explicit RK4 with a time step \(\varDelta t = 10^{-6}\). In this case, the presented DIRK schemes with high WSO do not improve the convergence rates in the stiff regime and they perform worse than the WSO 1 scheme in terms of accuracy (see Fig. 5). On the other hand, an EDIRK with stage order 2 improves the rate of convergence in the stiff regime (see right panel in Fig. 5). However, it does so, interestingly, by yielding larger errors for large time steps.

Fig. 5. Error convergence for Van der Pol’s equation. Left: 3rd order DIRK schemes with WSO 1 (blue circles), WSO 2 (red triangles) and WSO 3 (black squares). Right: 4th order DIRK schemes with WSO 1 (blue circles) and WSO 3 (red triangles), and a 3rd order EDIRK scheme with stage order 2 (black squares).
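The reference solution for this test is obtained with the classical explicit RK4 method and a very small time step. The following sketch shows such an integrator for (8); the function names are illustrative, and the demonstration run covers only a short time interval, since integrating to T = 10 with \(\varDelta t = 10^{-6}\) takes \(10^{7}\) steps.

```python
def vdp_rhs(x, y, mu):
    """Right-hand side of the Van der Pol system (8)."""
    return y, mu * (1 - x * x) * y - x


def rk4_step(x, y, mu, dt):
    """One step of the classical 4th order explicit Runge-Kutta method."""
    k1x, k1y = vdp_rhs(x, y, mu)
    k2x, k2y = vdp_rhs(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, mu)
    k3x, k3y = vdp_rhs(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, mu)
    k4x, k4y = vdp_rhs(x + dt * k3x, y + dt * k3y, mu)
    x_new = x + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
    y_new = y + dt / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
    return x_new, y_new


def integrate(mu, T, n_steps, x0=2.0, y0=0.0):
    dt = T / n_steps
    x, y = x0, y0
    for _ in range(n_steps):
        x, y = rk4_step(x, y, mu, dt)
    return x, y


# Short demonstration run with the step size of the reference computation (dt = 1e-6).
print(integrate(mu=500.0, T=0.01, n_steps=10_000))
```

For μ = 0 the system reduces to the harmonic oscillator with solution x(t) = 2 cos t, which provides a simple correctness check of the integrator.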

5 Conclusions and Outlook

This study demonstrates that it is possible to overcome order reduction (OR) for certain classes of problems in the context of DIRK schemes, even though these are limited to low stage order. A specific weak stage order (WSO) “eigenvector” criterion has been presented, analyzed, and applied to determine DIRK schemes with WSO up to 3. The numerical results confirm that the schemes avoid OR for linear problems and for some nonlinear problems in which the mechanism for order reduction is linear (i.e., boundary conditions). The key limitation found herein is that the eigenvector criterion cannot go beyond WSO 3 for DIRK schemes. Hence, a key question of future research is how high WSO is admitted by the general criterion in Definition 2. Another important future research task is to devise further DIRK schemes that are truly optimized in terms of truncation error coefficients or other criteria.