1 Introduction

There are many research topics [1, 2] in developing numerical methods for solving initial value problems (IVPs) described by

$$ \frac{d\phi}{dt} = f\bigl(t, \phi(t)\bigr),\quad t \in[t_{0}, t_{f}];\qquad \phi(t_{0}) = \phi_{0}, $$
(1)

where f has continuous and bounded partial derivatives up to the order required for the analysis of the numerical method under consideration. Long-time simulation of the solution, which is needed in many physical problems (for example, Hamiltonian systems such as the Kepler problem, the harmonic oscillator, and molecular dynamics), is one of the most important topics in IVPs [3–5]. Such long-time simulations sometimes demand a very careful step size selection to control the local truncation error. Most existing explicit single step algorithms for solving (1) may be described by

$$ \textstyle\begin{cases} \phi_{m+1} = F(\phi_{m}), \\ e_{m+1} = G(\phi_{m},\phi_{m+1}), \end{cases} $$
(2)

where F and G are functions derived from the numerical methods. Here, \(\phi_{m+1}\) and \(e_{m+1}\) denote the approximations of the solution and of the local truncation error \(E_{m+1}\), respectively, at time \(t_{m+1}\). For a pth order scheme, the estimated error \(e_{m+1}\) usually fits only the coefficient of the \((p+1)\)th order term in the expansion of \(E_{m+1}\) in the time step size h. Another important issue is to reduce the computational cost of a long-time simulation, for which an efficient control scheme for the time step size is important (for example, Radau5). There have been several approaches to these issues (for example, embedded Runge–Kutta formulae, adaptive time stepping, long-time error estimation, etc. [6–11]).

In most existing algorithms, the estimated error \(e_{m}\) obtained from the previous time step \([t_{m-1}, t_{m}]\) is used only for choosing an appropriate next time step size. Also, the solution \(\phi_{m+1}\) at time \(t_{m+1}\) is calculated with the solution \(\phi_{m}\) at the previous time \(t_{m}\) as its initial value. That is, \(\phi_{m}\) is assumed to be the exact initial condition for \(\phi_{m+1}\) despite the existence of the local truncation error \(e_{m}\), which leads to an accumulation of the error as time increases. In order to control the accumulated error, smaller integration steps or special step size controllers are sometimes required, especially for long-time simulations or stiff systems. Nevertheless, most existing methods cannot fully control the error to within a given tolerance, so that it is difficult to get reliable results at stringent tolerances (for example, see [12, 13]).

The subject of this paper is to develop a new integration scheme to control the accumulation error. As a remedy to control or minimize the accumulated error of \(e_{m}\), we will embed it in the algorithm of the calculation scheme for \(\phi_{m+1}\). Further, we want to propose an estimating scheme for \(e_{m+1}\) which is correlated with three pieces of information \(\phi_{m}\), \(\phi _{m+1}\), and \(e_{m}\). That is, the scheme we want to develop is an explicit single step algorithm, the so-called error embedded error correction method (EEECM), of the form

$$ \textstyle\begin{cases} \phi_{m+1} = F(\phi_{m}, e_{m}), \\ e_{m+1} = G(\phi_{m}, e_{m}, \phi_{m+1}). \end{cases} $$
(3)

To concretely describe the proposed algorithm, the classical 4th order Runge–Kutta (RK4) method and the well-known 7th order Runge–Kutta–Fehlberg (RKF7) method for calculating \(\phi_{m+1}\) and \(e_{m+1}\), respectively, will be used. Finally, we want to develop the EEECM having the accuracy order of 7 for the solution \(\tilde{\phi}_{m+1} =\phi_{m+1}+e_{m+1}\). In particular, we will develop the efficient estimating algorithm for \(e_{m+1}\), which fits the coefficients up to 7th order term in the expansion of the global error \(E_{m+1}=\phi(t_{m+1})-\phi_{m+1}\) about the time step size h.

An error correction method (ECM) is a widely used technique in numerical scientific computation. The deferred correction methods (DCMs) originally developed by Pereyra and Zadunaisky [14, 15] are representative ECMs for solving (1). There are also several extensions of the DCMs (for example, see [16–24]). These ECMs are based on the deferred equation of the form

$$ \frac{d\psi(t)}{dt} = f\bigl(t, \psi(t) + x(t)\bigr) - x'(t), $$
(4)

where x is a local approximation of the solution defined on each integration step \([t_{m},t_{m+1}]\). After solving (4), the solution \(\phi (t)\) of (1) can be obtained by using the identity

$$ \phi(t)=x(t)+\psi(t). $$
(5)

Two relations (4) and (5) enable us to develop the EEECM of the form (3) for solving (1). Practically, for the approximate solution \(\phi_{m+1}\), we use a local linear approximation x which uses the information of both the solution and its slope depending on the error \(e_{m}\) at time \(t_{m}\) and solve the deferred equation (4) with RK4. As mentioned above, we want to estimate the exact quantity of the error \(E_{m+1}\) up to the desired convergence order. To derive a formula for \(e_{m+1}\), another local approximation x is constructed by a local Hermite cubic interpolation polynomial having all the information of the calculated solutions and those slopes at both time \(t_{m}\) and \(t_{m+1}\). Based on the local approximation, we again solve the deferred equation (4) with the RKF7. As an appropriate step size controller, we exploit a standard step size controller to focus only on the EEECM for non-stiff problems. The constructed EEECM controls the error at each integration step, and it turns out that the proposed method possesses a good behavior of error bound in a long time simulation with a given tolerance. For an assessment of the effectiveness of the proposed algorithm, particularly its error bounds in a long time simulation, a simple harmonic oscillator problem with analytical solution and a hard error controlling problem are numerically solved. Finally, a two-body Kepler problem is also used to assess the efficiency of this algorithm. Throughout these numerical tests, it is shown that the proposed method is quite efficient compared to several existing methods.

This paper is organized as follows. In Sect. 2, we describe the methodology used to formulate and control the solution and error formulas based on ECM. In Sect. 3, we give a concrete convergence analysis for the developed EEECM. Several numerical results are presented in Sect. 4 to provide both numerical evidence for the theoretical analysis and a demonstration of the numerical effectiveness of EEECM. Finally, in Sect. 5, a summary of EEECM and some discussion of further work are given.

2 Derivation of algorithm

In this section, we present the algorithm of EEECM based on the deferred equations. Let us assume that the approximated solution \(\phi_{m}\) and the estimated error \(e_{m}\) for the solution \(\phi(t_{m})\) and the error \(E_{m}\), respectively, at time \(t_{m}\) are already calculated. Then, as a local approximation of the solution \(\phi(t)\) on the integration step \([t_{m},t_{m+1}]\), one may consider the modified Euler’s polygon \(y(t)\) defined by

$$ y(t):= \phi_{m} + (t-t_{m})f(t_{m}, \phi_{m} + e_{m}),\quad t \in[t_{m}, t_{m+1}]. $$
(6)

Let \(\psi(t)\) be the difference between \(\phi(t)\) and \(y(t)\) such that

$$ \psi(t):=\phi(t)-y(t),\quad t\in[t_{m},t_{m+1}]. $$
(7)

Differentiating both sides of (7) and combining the result with (1) and (6), one can see that the difference \(\psi(t)\) satisfies the following deferred differential equation:

$$ \textstyle\begin{cases} \psi'(t)= g_{1}(t,\psi(t)), & t\in(t_{m},t_{m+1}],\\ \psi(t_{m})=E_{m}, \end{cases} $$
(8)

where \(g_{1}\) is defined by

$$ g_{1}\bigl(t,\psi(t)\bigr):=f\bigl(t, \psi(t) + y(t) \bigr) - y'(t)=f\bigl(t,\psi (t)+y(t)\bigr)-f(t_{m}, \phi_{m} + e_{m}). $$
(9)

Observe that the initial condition \(\psi(t_{m})\) of (8) is given by the unknown actual error \(E_{m}\) at time \(t_{m}\), and hence problem (8) cannot be solved directly. Since \(e_{m}\) is assumed to be an estimated error of \(E_{m}=\psi(t_{m})\), instead of solving (8), it is natural to consider the following IVP:

$$ \textstyle\begin{cases} \theta'(t)=g_{1}(t,\theta(t)),& t\in(t_{m},t_{m+1}],\\ \theta(t_{m})=e_{m} \end{cases} $$
(10)

for an approximation of \(\psi(t_{m+1})\). One may check that applying RK4 to (10) leads to

$$ \begin{aligned} &\theta(t_{m+1})\approx e_{m}+\frac{h}{6} [-5v_{1}+2v_{2}+2v_{3}+v_{4} ], \\ &v_{1}=f(t_{m},{\tilde{\phi}}_{m}),\qquad v_{2}=f \biggl(t_{m}+\frac {h}{2},{\tilde{\phi}}_{m}+\frac{h}{2}v_{1} \biggr), \\ &v_{3}=f \biggl(t_{m}+\frac{h}{2},{\tilde{\phi}}_{m}+\frac{h}{2}v_{2} \biggr),\qquad v_{4}=f(t_{m}+h,{\tilde{\phi}}_{m}+hv_{3}), \end{aligned} $$
(11)

where \({\tilde{\phi}}_{m}:=\phi_{m}+e_{m}\). Combining approximation (11) with (6) and (7), one may get an approximation formula for \(\phi(t_{m+1})\) as follows.

$$ \phi_{m+1}:={\tilde{\phi}}_{m}+ \frac{h}{6} [v_{1}+2v_{2}+2v_{3}+v_{4} ], $$
(12)

where the intermediate values \(v_{i}\) are defined by (11). Note that the classical RK4 uses only the approximate value \(\phi_{m}\) at time \(t_{m}\) to calculate \(\phi_{m+1}\), whereas algorithm (12) uses the value \({\tilde{\phi}}_{m}:=\phi_{m}+e_{m}\) instead of \(\phi_{m}\), which is a remarkable difference compared to the RK4.
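The update (12) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `eeecm_solution_step` and its argument order are assumptions made here, and `f(t, y)` stands for the slope function of (1).

```python
def eeecm_solution_step(f, t, phi, e, h):
    """Solution update (12): classical RK4 started from phi + e instead of phi."""
    phi_tilde = phi + e                      # corrected starting value
    v1 = f(t, phi_tilde)
    v2 = f(t + h / 2, phi_tilde + h / 2 * v1)
    v3 = f(t + h / 2, phi_tilde + h / 2 * v2)
    v4 = f(t + h, phi_tilde + h * v3)
    return phi_tilde + h / 6 * (v1 + 2 * v2 + 2 * v3 + v4)
```

With `e = 0` the function reduces to one step of the classical RK4; a nonzero `e` only shifts the starting value, which is exactly the difference between (12) and RK4 noted above.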

Since the estimated error \(e_{m}\) at time \(t_{m}\) is embedded in algorithm (12), a recursive relation for the sequence \(\{e_{m}\}\) is needed to complete the algorithm. We derive this relation using another deferred equation together with an appropriate local approximation. Recall that after the calculation of (12), the approximate solutions and their slopes at both times \(t_{m}\) and \(t_{m+1}\) are available. Hence, as the local approximation, it is natural to use a local Hermite cubic interpolation

$$ x(t) =a_{0}+a_{1}(t-t_{m}) + a_{2}(t-t_{m})^{2} + a_{3}(t-t_{m})^{2}(t-t_{m+1}) $$
(13)

satisfying \(x(t_{m}) = {\tilde{\phi}}_{m}\), \(x'(t_{m})= f(t_{m},{\tilde{\phi}}_{m})\), \(x(t_{m+1})=\phi_{m+1}\), and \(x'(t_{m+1})=f(t_{m+1},\phi_{m+1})\). Then x is given explicitly by [25]

$$\begin{aligned} x(t) ={}& x(t_{m}) + x'(t_{m}) (t-t_{m}) + \frac{x(t_{m+1})-x(t_{m}) -x'(t_{m})h}{h^{2}}(t-t_{m})^{2} \\ & {}+ \frac{(x'(t_{m+1})+x'(t_{m}))h -2(x(t_{m+1})-x(t_{m}))}{h^{3}} (t -t_{m})^{2}(t-t_{m+1}). \end{aligned}$$
(14)
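The interpolant (14) can be evaluated directly from the four data values. The following Python sketch is illustrative only; the function name `hermite_cubic` and the argument names (`x0`, `dx0` for the value and slope at \(t_{m}\), `x1`, `dx1` for those at \(t_{m+1}\)) are assumptions introduced here.

```python
def hermite_cubic(t, tm, h, x0, dx0, x1, dx1):
    """Evaluate the Hermite cubic (14) on [tm, tm + h] at the point t."""
    s = t - tm
    a2 = (x1 - x0 - dx0 * h) / h**2                   # coefficient of (t - tm)^2
    a3 = ((dx1 + dx0) * h - 2 * (x1 - x0)) / h**3     # coefficient of (t - tm)^2 (t - tm+1)
    return x0 + dx0 * s + a2 * s**2 + a3 * s**2 * (s - h)
```

In the setting of (13), one would call it with `x0` equal to \({\tilde{\phi}}_{m}\), `dx0` equal to \(f(t_{m},{\tilde{\phi}}_{m})\), `x1` equal to \(\phi_{m+1}\), and `dx1` equal to \(f(t_{m+1},\phi_{m+1})\).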

Let \(\psi(t)\) be the difference between \(\phi(t)\) and \(x(t)\) such that

$$ \psi(t):=\phi(t)-x(t),\quad t\in[t_{m},t_{m+1}]. $$
(15)

As the derivation of (8), one can see that the difference \(\psi(t)\) defined by (15) satisfies the following deferred differential equation:

$$ \textstyle\begin{cases} \psi'(t)= g_{2}(t,\psi(t)), & t\in(t_{m},t_{m+1}],\\ \psi(t_{m})=E_{m}-e_{m}, \end{cases} $$
(16)

where \(g_{2}\) is defined by

$$ g_{2}\bigl(t,\psi(t)\bigr):=f\bigl(t, \psi(t) + x(t) \bigr) - x'(t). $$
(17)

Observe that the initial condition \(\psi(t_{m})\) of (16) contains the unknown value \(E_{m}\) and hence problem (16) cannot be solved directly. Since \(e_{m}\) is the estimated error of \(E_{m}\), if one assumes that it is well approximated, then the initial value \(\psi(t_{m})\) becomes quite small. Hence, instead of solving (16), it is natural to consider the following IVP:

$$ \textstyle\begin{cases} \theta'(t)=g_{2}(t,\theta(t)), & t\in(t_{m},t_{m+1}],\\ \theta(t_{m})=0 \end{cases} $$
(18)

for an approximation of \(\psi(t_{m+1})\). To solve (18), we consider the well-known RKF7 with Butcher array [26]

$$ \textstyle\begin{array}{c|c} {\mathbf {c}}&{\mathcal {A}}\\\hline & {\mathbf {b}}^{T} \end{array}\displaystyle , $$
(19)

where

$$\begin{aligned} &{\mathbf {c}}=[c_{1},c_{2}, \ldots,c_{11}]^{T}:= \biggl[0,\frac {2}{27}, \frac{1}{9},\frac{1}{6},\frac{5}{12},\frac{1}{2}, \frac {5}{6},\frac{1}{6},\frac{2}{3},\frac{1}{3},1 \biggr]^{T}, \\ &{\mathbf {b}}=[b_{1},b_{2},\ldots,b_{11}]^{T}:= \biggl[\frac {41}{840},0,0,0,0,\frac{34}{105},\frac{9}{35}, \frac{9}{35},\frac {9}{280},\frac{9}{280},\frac{41}{840} \biggr]^{T}, \\ &\mathcal{A}=(\alpha_{i,j}):= \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \frac{2}{27} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \frac{1}{36} & \frac{1}{12} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ \frac{1}{24} & 0 & \frac{1}{8} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \frac{5}{12} & 0 &-\frac{25}{16}&\frac{25}{16} & 0 & 0 & 0 & 0 & 0 & 0 \\ \frac{1}{20} & 0 & 0 &\frac{1}{4} &\frac{1}{5} & 0 & 0 & 0 & 0 & 0 \\ -\frac{25}{108} & 0 & 0 &\frac{125}{108} &-\frac{65}{27} &\frac {125}{54} & 0 & 0 & 0 & 0 \\ \frac{31}{300} & 0 & 0 &0 &\frac{61}{225} &-\frac{2}{9} &\frac {13}{900} & 0 & 0 & 0 \\ 2 & 0 & 0 &-\frac{53}{6} &\frac{704}{45} &-\frac{107}{9} &\frac {67}{90} & 3 & 0 & 0 \\ -\frac{91}{108} & 0 &0 &\frac{23}{108} &-\frac{976}{135} &\frac {311}{54} &-\frac{19}{60} &\frac{17}{6} &-\frac{1}{12} & 0 \\ \frac{2383}{4100} & 0 & 0 &-\frac{341}{164}&\frac{4496}{1025}&-\frac {301}{82}&\frac{2133}{4100}&\frac{45}{82}&\frac{45}{164}&\frac{18}{41} \end{bmatrix} . \end{aligned}$$
(20)

Since \(\theta(t_{m})=0\), by applying the RKF7 to problem (18) and using (17), \(\theta(t_{m+1})\) can be approximated as

$$ \begin{aligned} &\theta(t_{m+1})\approx h\sum _{i=2}^{11} b_{i}K_{i}, \\ &K_{i}=g_{2} \Biggl(t_{m}+c_{i} h,h \sum_{j=2}^{i-1}\alpha_{i,j}K_{j} \Biggr) \\ &\phantom{K_{i}}=f \Biggl(t_{m}+c_{i}h,h\sum _{j=2}^{i-1}\alpha_{i,j}K_{j}+x(t_{m}+c_{i} h) \Biggr)-x'(t_{m}+c_{i}h),\quad i=2,\ldots,11. \end{aligned} $$
(21)

From the definition of the Hermite interpolation x defined by (14), one may see that \(\psi(t_{m+1})=\phi(t_{m+1})-\phi_{m+1}:=E_{m+1}\). Also, we recall that system (18) is a perturbed system from IVP (16). Thus, one may take the approximation of \(\theta (t_{m+1})\) given in (21) as an estimated error \(e_{m+1}\) for the actual error \(E_{m+1}\). That is, we define

$$ e_{m+1}= h\sum_{i=2}^{11} b_{i}K_{i}, $$
(22)

where \(K_{i}\) are defined by (21).

It is easy to check that the coefficients (20) in Butcher array (19) have the following identities:

$$ \begin{aligned} &\sum_{j=1}^{i-1} \alpha_{i,j}=c_{i},\quad i=1,\ldots,11, \\ &\sum_{j=1}^{i-1}\alpha_{i,j}c_{j}= \frac{c_{i}^{2}}{2},\qquad \sum_{j=1}^{i-1} \alpha_{i,j}c_{j}^{2}=\frac{c_{i}^{3}}{3}, \quad i=3, \ldots,11, \\ &\sum_{j=1}^{11} b_{j}=1, \qquad\sum _{j=1}^{11}b_{j}c_{j}= \frac{1}{2},\qquad \sum_{j=1}^{11}b_{j}c_{j}^{2}= \frac{1}{3}. \end{aligned} $$
(23)
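The identities (23) can be confirmed in exact rational arithmetic. The sketch below transcribes the coefficients (20) with Python's `fractions` module; the names `C`, `B`, `A`, `row_moment`, `weight_moment`, and `check_identities` are chosen here for illustration, and indices are 0-based, so row `i` of the code is row `i+1` of (20).

```python
from fractions import Fraction as Fr

# Nodes c, weights b, and the strictly lower-triangular part of A from (20).
C = [Fr(0), Fr(2, 27), Fr(1, 9), Fr(1, 6), Fr(5, 12), Fr(1, 2),
     Fr(5, 6), Fr(1, 6), Fr(2, 3), Fr(1, 3), Fr(1)]
B = [Fr(41, 840), 0, 0, 0, 0, Fr(34, 105), Fr(9, 35), Fr(9, 35),
     Fr(9, 280), Fr(9, 280), Fr(41, 840)]
A = [
    [],
    [Fr(2, 27)],
    [Fr(1, 36), Fr(1, 12)],
    [Fr(1, 24), 0, Fr(1, 8)],
    [Fr(5, 12), 0, Fr(-25, 16), Fr(25, 16)],
    [Fr(1, 20), 0, 0, Fr(1, 4), Fr(1, 5)],
    [Fr(-25, 108), 0, 0, Fr(125, 108), Fr(-65, 27), Fr(125, 54)],
    [Fr(31, 300), 0, 0, 0, Fr(61, 225), Fr(-2, 9), Fr(13, 900)],
    [Fr(2), 0, 0, Fr(-53, 6), Fr(704, 45), Fr(-107, 9), Fr(67, 90), Fr(3)],
    [Fr(-91, 108), 0, 0, Fr(23, 108), Fr(-976, 135), Fr(311, 54),
     Fr(-19, 60), Fr(17, 6), Fr(-1, 12)],
    [Fr(2383, 4100), 0, 0, Fr(-341, 164), Fr(4496, 1025), Fr(-301, 82),
     Fr(2133, 4100), Fr(45, 82), Fr(45, 164), Fr(18, 41)],
]


def row_moment(i, k):
    """Return sum_j alpha_{i,j} * c_j**k for the (0-based) row i."""
    return sum(a * C[j] ** k for j, a in enumerate(A[i]))


def weight_moment(k):
    """Return sum_j b_j * c_j**k."""
    return sum(b * c ** k for b, c in zip(B, C))


def check_identities():
    """True if all identities in (23) hold exactly."""
    rows = all(row_moment(i, 0) == C[i] for i in range(11))
    lin = all(row_moment(i, 1) == C[i] ** 2 / 2 for i in range(2, 11))
    quad = all(row_moment(i, 2) == C[i] ** 3 / 3 for i in range(2, 11))
    weights = all(weight_moment(k) == Fr(1, k + 1) for k in range(3))
    return rows and lin and quad and weights
```

Exact fractions are used instead of floating point so that each identity is tested as an equality rather than up to rounding error.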

Using these identities, one may prove the following lemma.

Lemma 1

The algorithm for \(e_{m+1}\) defined by (22) can be simplified by

$$ e_{m+1}= \phi_{m}+e_{m}- \phi_{m+1}+h\sum_{i=1}^{11} b_{i}V_{i}, $$
(24)

where the intermediate values \(V_{i}\) are defined by

$$ \begin{aligned}& V_{1}:=v_{1},\qquad V_{2}:=f\bigl(t_{m}+c_{2} h,x(t_{m}+c_{2}h) \bigr), \\ &V_{i}:=f \Biggl(t_{m}+c_{i}h, \phi_{m}+e_{m}+h\sum_{j=1}^{i-1} \alpha _{i,j}V_{j} \Biggr),\quad i=3,\ldots,11, \end{aligned} $$
(25)

where \(v_{1}\) is defined by (11).

Proof

For the quantity \(K_{i}\) defined by (21), we let

$$\Gamma_{i}:=K_{i}+x'(t_{m}+c_{i}h),\quad i=2,3,\ldots,11. $$

Then algorithm (22) can be written as

$$ \ \begin{aligned}& e_{m+1}= \gamma+h\sum _{i=2}^{11} b_{i}\Gamma_{i}, \\ &\Gamma_{2}=f \bigl(t_{m}+c_{2}h,x(t_{m}+c_{2}h) \bigr), \\ &\Gamma_{i}=f \Biggl(t_{m}+c_{i}h,h\sum _{j=2}^{i-1}\alpha_{i,j}\Gamma _{j}+\beta_{i} \Biggr), \quad i=3,\ldots,11, \end{aligned} $$
(26)

where γ and \(\beta_{i}\) are defined by

$$ \begin{aligned} &\gamma:=-h\sum _{i=2}^{11} b_{i}x'(t_{m}+c_{i}h), \\ &\beta_{i}:=-h\sum_{j=2}^{i-1} \alpha_{i,j}x'(t_{m}+c_{j}h)+x(t_{m}+c_{i} h). \end{aligned} $$
(27)

For a simplification of γ and \(\beta_{i}\) defined in (27), we consider Taylor’s expansion of x about \(t=t_{m}\) given by

$$ x(t)=x(t_{m})+(t-t_{m})x'(t_{m})+ \frac{(t-t_{m})^{2}}{2}x''(t_{m}) + \frac{(t-t_{m})^{3}}{6}x^{(3)}(t_{m}). $$
(28)

By substituting (28) into the formula of γ given in (27) and combining the result with (23) and (28), one may check that

$$\begin{aligned} \gamma&=hb_{1}x'(t_{m})-hx'(t_{m})- \frac{h^{2}}{2}x''(t_{m})- \frac {h^{3}}{6}x^{(3)}(t_{m}) \\ &=hb_{1}x'(t_{m})+x(t_{m})-x(t_{m+1}) \end{aligned}$$
(29)

and

$$ \beta_{i}=x(t_{m})+h\alpha_{i,1}x'(t_{m}),\quad i=3,\ldots,11. $$
(30)

Hence, substituting (29) and (30) into (26) and considering the definition of \(V_{i}\) defined by (25), one can complete the proof. □

Remark 1

Algorithm (22) requires 16 evaluations of the Hermite interpolation and its derivatives. By Lemma 1, however, only one evaluation of the Hermite interpolation is required, which is a substantial saving.

For summarizing the algorithm we discussed, we consider the Butcher array of RK4 given by

$$ \textstyle\begin{array}{c|c} {\mathbf {n}}&{\mathcal {S}}\\\hline & {\mathbf {k}} \end{array}\displaystyle , $$
(31)

where

$$ \begin{aligned} &{\mathbf {n}}=[n_{1},n_{2},n_{3},n_{4}]:= \biggl[0,\frac{1}{2},\frac {1}{2},1 \biggr], \qquad {\mathbf {k}}=[k_{1},k_{2},k_{3},k_{4}]:= \biggl[ \frac{1}{6},\frac {1}{3},\frac{1}{3},\frac{1}{6} \biggr], \\ &\mathcal{S}=(s_{i,j}):= \begin{bmatrix} 0 & 0 & 0 & 0 \\ \frac{1}{2} & 0 & 0 & 0 \\ 0 & \frac{1}{2} & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} . \end{aligned} $$
(32)

Also, if we define

$$V_{0}:=f(t_{m+1},\phi_{m+1}), $$

then, from the definition of x given in (14), we have

$$\begin{aligned} x(t_{m}+c_{2}h)={}&\phi_{m}+e_{m}+( \phi_{m+1}-\phi_{m}-e_{m})c_{2}^{2}(3-2c_{2}) \\ &{}+c_{2}(1-c_{2})h\bigl((1-c_{2})V_{1}-c_{2}V_{0} \bigr). \end{aligned}$$

Thus, by combining it with (24) and (12), one can derive the algorithm EEECM given by

$$ \textstyle\begin{cases} \phi_{m+1}=\phi_{m}+e_{m}+h\sum_{i=1}^{4}k_{i}v_{i},\\ e_{m+1}=\phi_{m}+e_{m}-\phi_{m+1}+h\sum_{i=1}^{11}b_{i}V_{i}, \end{cases}\displaystyle m\geq0, $$
(33)

where the intermediate values \(v_{i}\) and \(V_{i}\) are defined by

$$\begin{aligned} &v_{1}=f(t_{m}, \phi_{m}+e_{m}),\qquad v_{i}=f (t_{m}+n_{i}h, \phi_{m}+e_{m}+h s_{i, i-1}v_{i-1} ),\quad i=2,3,4, \\ &V_{0}=f(t_{m+1},\phi_{m+1}), \qquad V_{1}=v_{1}, \\ &V_{2}=f \bigl(t_{m}+c_{2} h, \phi_{m}+e_{m}+c_{2}^{2}(3-2c_{2}) (\phi_{m+1}-\phi _{m}-e_{m}) \\ &\phantom{V_{2}=}{}+c_{2}(1-c_{2})h\bigl((1-c_{2})V_{1}-c_{2}V_{0} \bigr) \bigr), \\ &V_{i}=f \Biggl(t_{m}+c_{i}h, \phi_{m}+e_{m}+h\sum_{j=1}^{i-1} \alpha _{i,j}V_{j} \Biggr),\quad i=3,\ldots,11. \end{aligned}$$
(34)
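For concreteness, one step of (33) with the stages (34) might be coded as below. This is a sketch under stated assumptions: the name `eeecm_step` and the 0-based lists `C`, `B`, `A` are introduced here, the coefficients are transcribed from (20) and converted to floating point, and `f(t, y)` is the slope function of (1).

```python
from fractions import Fraction as Fr

# RKF7 coefficients from (20), converted to floats (0-based indexing).
C = [float(Fr(n, d)) for n, d in
     [(0, 1), (2, 27), (1, 9), (1, 6), (5, 12), (1, 2),
      (5, 6), (1, 6), (2, 3), (1, 3), (1, 1)]]
B = [float(Fr(n, d)) for n, d in
     [(41, 840), (0, 1), (0, 1), (0, 1), (0, 1), (34, 105),
      (9, 35), (9, 35), (9, 280), (9, 280), (41, 840)]]
A = [[float(Fr(n, d)) for n, d in row] for row in [
    [],
    [(2, 27)],
    [(1, 36), (1, 12)],
    [(1, 24), (0, 1), (1, 8)],
    [(5, 12), (0, 1), (-25, 16), (25, 16)],
    [(1, 20), (0, 1), (0, 1), (1, 4), (1, 5)],
    [(-25, 108), (0, 1), (0, 1), (125, 108), (-65, 27), (125, 54)],
    [(31, 300), (0, 1), (0, 1), (0, 1), (61, 225), (-2, 9), (13, 900)],
    [(2, 1), (0, 1), (0, 1), (-53, 6), (704, 45), (-107, 9), (67, 90), (3, 1)],
    [(-91, 108), (0, 1), (0, 1), (23, 108), (-976, 135), (311, 54),
     (-19, 60), (17, 6), (-1, 12)],
    [(2383, 4100), (0, 1), (0, 1), (-341, 164), (4496, 1025), (-301, 82),
     (2133, 4100), (45, 82), (45, 164), (18, 41)],
]]


def eeecm_step(f, t, phi, e, h):
    """One step of (33)-(34); returns (phi_next, e_next)."""
    pt = phi + e                                  # corrected value phi_m + e_m
    # RK4 stages v_1..v_4 and the solution update (first line of (33)).
    v1 = f(t, pt)
    v2 = f(t + h / 2, pt + h / 2 * v1)
    v3 = f(t + h / 2, pt + h / 2 * v2)
    v4 = f(t + h, pt + h * v3)
    phi_next = pt + h / 6 * (v1 + 2 * v2 + 2 * v3 + v4)
    # V_0 and V_2: the only stages that use the Hermite cubic.
    V0 = f(t + h, phi_next)
    c2 = C[1]
    x_c2 = (pt + c2**2 * (3 - 2 * c2) * (phi_next - pt)
            + c2 * (1 - c2) * h * ((1 - c2) * v1 - c2 * V0))
    V = [v1, f(t + c2 * h, x_c2)]
    # Remaining stages V_3..V_11 follow the RKF7 array.
    for i in range(2, 11):
        incr = sum(a * Vj for a, Vj in zip(A[i], V))
        V.append(f(t + C[i] * h, pt + h * incr))
    # Error estimate (second line of (33)).
    e_next = pt - phi_next + h * sum(b * Vi for b, Vi in zip(B, V))
    return phi_next, e_next
```

A time loop would simply feed the returned pair back in, for example `phi, e = eeecm_step(f, t, phi, e, h)` followed by `t += h`, starting from `e = 0`. The function evaluates `f` fifteen times per call: four for the \(v_{i}\), one for \(V_{0}\), and ten for \(V_{2},\ldots,V_{11}\).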

Also, if we let \(\tilde{\phi}_{m}:=\phi_{m}+e_{m}\), then one may get a better approximation \(\{\tilde{\phi}_{m}\}\) than the approximation \(\{\phi_{m}\}\), and it satisfies the following recurrence relation:

$$ \tilde{\phi}_{m+1}=\tilde{\phi}_{m}+h\sum _{i=1}^{11}b_{i}V_{i},\quad m\geq0, $$
(35)

where the intermediate values \(V_{i}\) are calculated by

$$ \begin{aligned}& v_{1}=f(t_{m}, \tilde{\phi}_{m}),\qquad v_{i}=f (t_{m}+n_{i}h, \tilde{\phi}_{m} +h s_{i, i-1}v_{i-1} ),\quad i=2,3,4, \\ &V_{0}=f \Biggl(t_{m+1},\tilde{\phi}_{m}+h\sum _{i=1}^{4}k_{i}v_{i} \Biggr),\qquad V_{1}=v_{1}, \\ &V_{2}=f \Biggl(t_{m}+c_{2} h,\tilde{ \phi}_{m}+c_{2}^{2}(3-2c_{2})h\sum _{i=1}^{4}k_{i}v_{i}+c_{2}(1-c_{2})h \bigl((1-c_{2})V_{1}-c_{2}V_{0}\bigr) \Biggr), \\ & V_{i}=f \Biggl(t_{m}+c_{i}h,\tilde{ \phi}_{m}+h\sum_{j=1}^{i-1}\alpha _{i,j}V_{j} \Biggr),\quad i=3,\ldots,11. \end{aligned} $$
(36)

Remark 2

The algorithm of EEECM (33) (or (35)) needs 15 function evaluations in each time step, which is two more than RKF78 requires. Unlike in RKF78, the estimated error sequence \(\{ e_{m}\}\) is not only used to control the time step size but is also embedded in the calculation of the solution \(\phi_{m}\). This is the reason why we call the proposed algorithm an error embedded error correction method. We also remark that scheme (33) (or (35)) is applicable to a system of ODEs of the form

$$\Phi'(t)=\mathfrak{F}\bigl(t,\Phi(t)\bigr),\quad t\in(t_{0},t_{f}];\qquad \Phi (t_{0})=\Phi_{0}, $$

where \(\Phi:= [\phi_{1}(t),\ldots,\phi_{d}(t) ]^{T}\) and \(\mathfrak{F}:= [f_{1}(t,\Phi(t)),\ldots,f_{d}(t,\Phi(t)) ]^{T}\).

Remark 3

(Geometric interpretation)

Figure 1 illustrates the geometric meaning of EEECM, which consists of two steps. The first step calculates the approximate solution \(\phi_{m+1} \) at \(t_{m+1}\) from the deferred equation constructed with the Euler polygon \(y(t)\), for which the two pieces of information \(\phi_{m}\) and \(e_{m}\) calculated at time \(t_{m}\) are used. To complete the algorithm, a scheme that embeds the sequence of estimated errors \(e_{m}\) into the algorithm itself is required. Therefore, in the second step, the local truncation error \(E_{m+1}\) is estimated with another deferred equation based on the higher order local approximation \(x(t)\), for which all three pieces of information \(\phi_{m}\), \(\phi_{m+1}\), and \(e_{m}\) are used.

Figure 1. Geometric meaning of the error embedded error correction methods

3 Convergence analysis

The aim of this section is to give a concrete convergence analysis for algorithm (35). For the simplicity of the analysis, we assume that IVP (1) is an autonomous problem. That is, we assume \(f(t,\phi(t)):=f(\phi(t))\). For a simplification, we introduce the operator \(D^{k}\) defined by

$$ D^{k}f(y):=f\frac{\partial}{\partial y} \bigl(D^{k-1}f(y) \bigr),\quad k\geq1, $$
(37)

where \(D^{0}f(y):=f(y)\). Let \(F(y,h;f)\) be a function of the form

$$ \begin{aligned}& F(y,h;f):=\sum _{i=1}^{11} b_{i}V_{i}=b_{1}V_{1}+ \sum_{i=6}^{11} b_{i}V_{i}, \\ &v_{1}=f(y),\qquad v_{i}=f (y+h s_{i, i-1}v_{i-1} ),\quad i=2,3,4, \\ &V_{0}=f \Biggl(y+h\sum_{i=1}^{4}k_{i}v_{i} \Biggr),\qquad V_{1}=v_{1}, \\ &V_{2}=f \Biggl(y+h \Biggl[c_{2}^{2}(3-2c_{2}) \sum_{i=1}^{4}k_{i}v_{i} +c_{2}(1-c_{2}) \bigl((1-c_{2})V_{1}-c_{2}V_{0} \bigr) \Biggr] \Biggr), \\ &V_{i}=f \Biggl(y+h\sum_{j=1}^{i-1} \alpha_{i,j}V_{j} \Biggr),\quad i=3,\ldots,11, \end{aligned} $$
(38)

where \(k_{i}\) and \((s_{i,j} )\) are defined in (32) and \(c_{i}\) and \((\alpha_{i,j} )\) are defined in (20). Note that, by Taylor’s expansion of \(f(y+h\nu)\) about y, one may rewrite \(V_{i}\) of (38) by

$$ V_{i}=\sum_{k=0}^{\infty}\frac{f^{(k)}(y)}{k!} h^{k} X_{i}^{k},\quad i=3,\ldots,11, $$
(39)

where

$$ X_{i}:=\sum_{j=1}^{i-1} \alpha_{i,j}V_{j},\quad i=3,\ldots,11. $$
(40)

The above two relations (39) and (40) give a simple expansion of \(X_{i}\) as follows.

Lemma 2

For the quantities \(X_{i}\) \((i\geq4)\) defined by (40), we have

$$\begin{aligned} X_{i}={}&\sum_{k=0}^{5} \frac{h^{k}}{k!} D^{k}f \bigl({\mathcal {A}} {\mathbf {c}}^{k} \bigr)_{i} + f'\sum_{k=4}^{5} \frac{h^{k}}{(k-1)!}D^{k-1}f \biggl({\mathcal {A}}^{2}{\mathbf {c}}^{k-1}-\frac{1}{k}{\mathcal {A}}\mathbf{c}^{k} \biggr)_{i} \\ &{}+\frac{h^{5}}{3!}D^{3}f \biggl(f''f \biggl({\mathcal {A}} \bigl(\mathbf{c}\cdot {\mathcal {A}}\mathbf{c}^{3} \bigr)-\frac{1}{4}{\mathcal {A}}\mathbf{c}^{5} \biggr)+ \bigl(f'\bigr)^{2} \biggl({\mathcal {A}}^{3} \mathbf{c}^{3}-\frac{1}{4}{\mathcal {A}}^{2}{\mathbf {c}}^{4} \biggr) \biggr)_{i}+\mathcal{O} \bigl(h^{6} \bigr), \end{aligned}$$
(41)

where all the functions on the right-hand side are evaluated at the value y and \((\mathbf{a} )_{i}\) denotes the ith component of a vector a. Here, \(\mathbf{c}^{0}:= [1, 1, \ldots, 1]^{T}\) and also a multiplication between two vectors \(\mathbf{a}:=[a_{1},\ldots,a_{11}]^{T}\) and \(\mathbf{b}:=[b_{1},\ldots,b_{11}]^{T}\) is defined by \(\mathbf{a}\cdot\mathbf{b}:= [a_{1}b_{1},\ldots,a_{11}b_{11} ]^{T} \) and \(\mathbf{a}^{k}:=\mathbf{a}\cdot\mathbf{a}^{k-1}\).

Proof

For the value \(X_{i}\) defined on (40), let us define a vector X by \(\mathbf{X}:= [X_{1},\ldots ,X_{11} ]^{T}\) with \(X_{1}=X_{2}=0\). Then, from the definition of the matrix \({\mathcal {A}}\) of (20), combining (40) with (39) and the identity \(\sum_{j=1}^{i-1}\alpha_{i,j}=c_{i}\) of (23) yields

$$\begin{aligned} X_{i}&=f\alpha_{i,1}+ \alpha_{i,2}V_{2}+f\sum_{j=3}^{i-1} \alpha _{i,j}+\sum_{k=1}^{5} \frac{f^{(k)}}{k!}h^{k}\sum_{j=3}^{i-1} \alpha _{i,j}X_{j}^{k}+\mathcal{O} \bigl(h^{6} \bigr) \\ &=f\mathbf{c}_{i}+\sum_{k=1}^{5} \frac{f^{(k)}}{k!}h^{k} \bigl({\mathcal {A}} {\mathbf {X}}^{k} \bigr)_{i}+\mathcal{O} \bigl(h^{6} \bigr),\quad i\geq4, \end{aligned}$$
(42)

where \(\alpha_{i,2}=0\) (\(i\geq4\)) is used in the above second equality and the power \(\mathbf{X}^{k}\) is obtained by the above vector multiplication. To obtain a series expansion of \(X_{i}\) in terms of h, we let

$$ \mathbf{X}:= \sum_{k=0}^{5} h^{k} \mathbf{a}_{k}+\mathcal{O} \bigl(h^{6} \bigr) $$
(43)

and substitute it into (42). Here, we may assume \((\mathbf{a}_{k})_{i}=0\ (i=1,2)\), and \((\mathbf{a}_{k})_{3}\) is determined by Taylor’s expansion of \(X_{3}\) in terms of h. Further, we expand the resulting equation in ascending powers of h. Then one may check that for \(i\geq4\),

$$\begin{aligned} X_{i}={}&fc_{i}+h f' ({\mathcal {A}}\mathbf{a}_{0} )_{i}+h^{2} \biggl(f'{\mathcal {A}}\mathbf{a}_{1}+\frac{f''}{2}{ \mathcal {A}} \mathbf{a}_{0}^{2} \biggr)_{i}+h^{3} \biggl(f'{\mathcal {A}}\mathbf{a}_{2}+f''{ \mathcal {A}} ({\mathbf {a}}_{0}\cdot\mathbf{a}_{1} ) +\frac{f^{(3)}}{3!}{\mathcal {A}}\mathbf{a}_{0}^{3} \biggr)_{i} \\ &{} +h^{4} \biggl(f'{\mathcal {A}} \mathbf{a}_{3}+\frac{f''}{2}{\mathcal {A}} \bigl( \mathbf{a}_{1}^{2}+2 \mathbf{a}_{0}\cdot \mathbf{a}_{2} \bigr)+\frac {f^{(3)}}{2}{\mathcal {A}} \bigl( \mathbf{a}_{0}^{2}\cdot\mathbf{a}_{1} \bigr) +\frac{f^{(4)}}{4!}{\mathcal {A}} \mathbf{a}_{0}^{4} \biggr)_{i} \\ &{} +h^{5} \biggl(f'{\mathcal {A}} \mathbf{a}_{4}+f''{\mathcal {A}} ( \mathbf{a}_{1}\cdot \mathbf{a}_{2}+\mathbf{a}_{0} \cdot\mathbf{a}_{3} ) \\ &{}+\frac{f^{(3)}}{2}{\mathcal {A}} \bigl(\mathbf{a}_{0}\cdot \mathbf{a}_{1}^{2}+{\mathbf {a}}_{0}^{2}\cdot \mathbf{a}_{2} \bigr)+\frac{f^{(4)}}{3!}{\mathcal {A}} \bigl({\mathbf {a}}_{0}^{3}\cdot\mathbf{a}_{1} \bigr)+ \frac{f^{(5)}}{5!}{\mathcal {A}} {\mathbf {a}}_{0}^{5} \biggr)_{i}+\mathcal{O} \bigl(h^{6} \bigr). \end{aligned}$$
(44)

Thus, by comparing the coefficients of two equations (43) and (44), one may have the following recurrence relations for \(\mathbf{a}_{i}\):

$$\begin{aligned} &(\mathbf{a}_{0} )_{i}=f (\mathbf{c} )_{i},\qquad (\mathbf{a}_{1} )_{i}=f' ({\mathcal {A}}\mathbf{a}_{0} )_{i},\qquad (\mathbf{a}_{2} )_{i}= \biggl(f'{\mathcal {A}}\mathbf{a}_{1}+\frac {f''}{2}{ \mathcal {A}} \mathbf{a}_{0}^{2} \biggr)_{i}, \\ &(\mathbf{a}_{3} )_{i}= \biggl(f'{\mathcal {A}} \mathbf{a}_{2}+f''{\mathcal {A}} ( \mathbf{a}_{0}\cdot\mathbf{a}_{1} ) + \frac{f^{(3)}}{3!}{ \mathcal {A}}\mathbf{a}_{0}^{3} \biggr)_{i}, \\ &(\mathbf{a}_{4} )_{i}= \biggl(f'{\mathcal {A}} \mathbf{a}_{3}+\frac {f''}{2}{\mathcal {A}} \bigl( \mathbf{a}_{1}^{2}+2 \mathbf{a}_{0}\cdot \mathbf{a}_{2} \bigr)+\frac{f^{(3)}}{2}{\mathcal {A}} \bigl( \mathbf{a}_{0}^{2}\cdot\mathbf{a}_{1} \bigr) + \frac{f^{(4)}}{4!}{\mathcal {A}} \mathbf{a}_{0}^{4} \biggr)_{i}, \\ &(\mathbf{a}_{5} )_{i}= \biggl(f'{\mathcal {A}} \mathbf{a}_{4}+f''{\mathcal {A}} ( \mathbf{a}_{1}\cdot\mathbf{a}_{2}+\mathbf{a}_{0} \cdot\mathbf{a}_{3} )+\frac {f^{(3)}}{2}{\mathcal {A}} \bigl( \mathbf{a}_{0}\cdot\mathbf{a}_{1}^{2}+{\mathbf {a}}_{0}^{2}\cdot\mathbf{a}_{2} \bigr) \\ &\phantom{(\mathbf{a}_{5} )_{i}=}{}+\frac{f^{(4)}}{3!}{\mathcal {A}} \bigl(\mathbf{a}_{0}^{3} \cdot\mathbf{a}_{1} \bigr)+\frac{f^{(5)}}{5!}{\mathcal {A}} \mathbf{a}_{0}^{5} \biggr)_{i}, \quad i \geq4. \end{aligned}$$
(45)

Finally, we solve the recurrence relations (45) with the aid of the relations in (23) and (37). Then one may get the required identity in (41). □

From equation (39) together with (41) in the above lemma, we have the following corollary.

Corollary 1

For the intermediate values \(V_{i}\) \((i\geq6)\) defined in (38), we have

$$\begin{aligned} V_{i}={}&\sum_{k=0}^{6} \frac{h^{k}}{k!} D^{k}f(y) \mathbf{c}^{k}_{i}+ \frac {h^{5}}{4!}f' D^{4}f(y) \biggl({\mathcal {A}} \mathbf{c}^{4}-\frac{{\mathbf {c}}^{5}}{5} \biggr)_{i} \\ &{}+h^{6} \Biggl[ \bigl(f'(y)\bigr)^{2}D^{4}f(y) \biggl({\mathcal {A}}^{2}\mathbf{c}^{4}- \frac{{\mathcal {A}}\mathbf{c}^{5}}{5} \biggr)_{i}+\frac{f'(y)}{5!}D^{5}f(y) \biggl({\mathcal {A}} \mathbf{c}^{5}-\frac{\mathbf{c}^{6}}{6} \biggr)_{i} \\ &{}+\frac{f''(y)f(y)}{4!}D^{4}f(y) \biggl(\mathbf{c}\cdot{\mathcal {A}} {\mathbf {c}}^{4}-\frac{\mathbf{c}^{6}}{5} \biggr)_{i} \Biggr]+\mathcal{O} \bigl(h^{7} \bigr). \end{aligned}$$
(46)

Proof

By directly substituting (41) into (39) and expanding the resulting equation in ascending powers of h with the aid of the identity \(({\mathcal {A}}\mathbf{c}^{3})_{i} = \frac{\mathbf{c}_{i}^{4}}{4}, i\geq6 \), one may get the required equation (46). □

Substituting expansion (46) into the sum of F defined by (38) leads to the following theorem.

Theorem 1

Let us assume that the slope function f is sufficiently smooth. Then the function F defined by (38) satisfies

$$ F(y,h;f)=\sum_{k=0}^{6} \frac{h^{k}}{(k+1)!} D^{k}f(y)+{\mathcal {O}} \bigl(h^{7} \bigr). $$
(47)

Proof

By substituting the expansion of \(V_{i}\) in (46) into the sum of F in (38) and simplifying the result, one may get

$$\begin{aligned} F(y,h; f)={}&f(y)\sum_{i=1}^{11}b_{i} + \sum_{k=1}^{6}\frac {D^{k}f(y)}{k!}h^{k} \Biggl(\sum_{i=6}^{11} b_{i}c_{i}^{k} \Biggr) \\ &{}+\frac{h^{5}}{4!}f'(y)D^{4}f(y)\sum _{i=6}^{11}b_{i} \biggl({\mathcal {A}} \mathbf{c}^{4} -\frac{1}{5}\mathbf{c}^{5} \biggr)_{i} \\ &{}+ h^{6} \Biggl[\bigl(f'(y)\bigr)^{2}D^{4}f(y) \sum_{i=6}^{11}b_{i} \biggl({ \mathcal {A}}^{2}\mathbf{c}^{4}-\frac{1}{5}{\mathcal {A}} \mathbf{c}^{5} \biggr)_{i} \\ &{}+\frac{f'(y)}{5!}D^{5}f(y)\sum_{i=6}^{11}b_{i} \biggl({\mathcal {A}} {\mathbf {c}}^{5} - \frac{1}{6}\mathbf{c}^{6} \biggr)_{i} \\ &{}+\frac{f''(y)f(y)}{4!}D^{4}f(y)\sum_{i=6}^{11}b_{i} \biggl(c_{i} \bigl({\mathcal {A}}\mathbf{c}^{4} \bigr)_{i}-\frac{1}{5}c_{i}^{6} \biggr) \Biggr] +\mathcal{O} \bigl(h^{7} \bigr). \end{aligned}$$
(48)

From the Butcher array (19) with the coefficients in (20), one may check that

$$ \begin{aligned} &\sum_{i=6}^{11}b_{i} \bigl({\mathcal {A}}\mathbf{c}^{4} \bigr)_{i}= \frac {1}{5}\sum_{i=6}^{11}b_{i}c_{i}^{5},\qquad \sum_{i=6}^{11}b_{i} \bigl({ \mathcal {A}}^{2}\mathbf{c}^{4} \bigr)_{i}= \frac{1}{5}\sum_{i=6}^{11}b_{i} \bigl({\mathcal {A}}\mathbf{c}^{5} \bigr)_{i}, \\ &\sum_{i=6}^{11}b_{i} \bigl({ \mathcal {A}}\mathbf{c}^{5} \bigr)_{i} =\frac {1}{6}\sum _{i=6}^{11} b_{i}c_{i}^{6},\qquad \sum_{i=6}^{11}b_{i}c_{i} \bigl({\mathcal {A}}\mathbf{c}^{4} \bigr)_{i} = \frac {1}{5}\sum_{i=6}^{11}b_{i}c_{i}^{6}, \\ &\sum_{i=6}^{11}b_{i}c_{i}^{k} = \frac{1}{k+1},\qquad \sum_{i=1}^{11}b_{i}=1. \end{aligned} $$
(49)

Combining the relations in (49) with equation (48) yields the required equation (47). □
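As a quick sanity check of (47), consider the linear slope function \(f(y)=\lambda y\) with a constant λ; this test function is an illustrative choice and is not part of the analysis above. From definition (37), \(D^{k}f(y)=\lambda^{k+1}y\) for all \(k\geq0\), so that (47) gives

$$ hF(y,h;f)=y\sum_{k=0}^{6}\frac{(\lambda h)^{k+1}}{(k+1)!}+\mathcal{O} \bigl(h^{8} \bigr)=y \bigl(e^{\lambda h}-1 \bigr)+\mathcal{O} \bigl(h^{8} \bigr). $$

Hence, for this f, the increment \(hF\) reproduces the exact increment \(\phi(t_{m+1})-\phi(t_{m})=\phi(t_{m}) (e^{\lambda h}-1 )\) through the term of order \(h^{7}\).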

For a concrete convergence analysis of scheme (35), similar to the methodology in [25], we now define the truncation error by

$$ T_{m}(\phi):= \phi(t_{m+1})- \phi(t_{m})-hF\bigl(\phi(t_{m}),h;f\bigr), \quad m\geq0, $$
(50)

and define \(\tau_{m}(\phi)\) implicitly by

$$ T_{m}(\phi)=h\tau_{m}(\phi). $$
(51)

Then two equations (50) and (51) give

$$ \phi(t_{m+1})=\phi(t_{m})+hF\bigl( \phi(t_{m}),h;f\bigr)+h\tau_{m}(\phi),\quad m\geq0. $$
(52)

Thus, subtracting (35) from (52) and using (38), we obtain

$$ \tilde{E}_{m+1}=\tilde{E}_{m}+h \bigl[F\bigl( \phi(t_{m}),h;f\bigr)-F(\tilde{\phi }_{m},h;f) \bigr]+h \tau_{m}(\phi), $$
(53)

in which \(\tilde{E}_{m}:=\phi(t_{m})-\tilde{\phi}_{m}\). For the simplicity of the convergence analysis, we now assume that the function F satisfies a Lipschitz condition

$$ \bigl\vert F(y,h;f)-F(z,h;f) \bigr\vert \leq L \vert y-z \vert $$
(54)

for all \(-\infty< y,z<\infty\) and all small \(h>0\). This condition can usually be obtained from the Lipschitz condition on f and its derivatives. Applying the Lipschitz condition (54) to (53) leads to

$$ \vert \tilde{E}_{m+1} \vert \leq (1+Lh ) \vert \tilde {E}_{m} \vert +h\tau(h),\quad m\geq0, $$
(55)

where \(\tau(h)\) is defined by

$$ \tau(h)=\max_{m\geq0} \bigl\vert \tau_{m}(\phi) \bigr\vert . $$
(56)

On the other hand, from Taylor’s expansion of \(\phi(t_{m+1})\) about \(t_{m}\) and two equations (47) and (50), one may get

$$ \tau_{m}(\phi)=\mathcal{O} \bigl(h^{7} \bigr),\quad m\geq0, $$
(57)

for a sufficiently smooth function f. Hence, from the above three relations (55), (56), and (57), one can get the following convergence theorem for algorithm (35).

Theorem 2

Assume that the present method (35) satisfies the Lipschitz condition (54) and the slope function f is sufficiently smooth. Then, for the IVP (1), algorithm (35) has the rate of convergence \(\mathcal {O} (h^{7} )\).
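The step from the recursion (55) to the stated rate can be sketched as follows, under the simplifying assumptions (made here only for illustration) of a uniform step size h, so that \(t_{m}=t_{0}+mh\), and a Lipschitz constant \(L>0\). Applying (55) repeatedly and using \(1+Lh\leq e^{Lh}\) gives

$$ \vert \tilde{E}_{m} \vert \leq(1+Lh)^{m} \vert \tilde{E}_{0} \vert +h\tau(h)\sum_{j=0}^{m-1}(1+Lh)^{j} \leq e^{L(t_{m}-t_{0})} \vert \tilde{E}_{0} \vert +\frac{e^{L(t_{m}-t_{0})}-1}{L}\tau(h). $$

If the initial error \(\tilde{E}_{0}\) vanishes, or is itself of order \(h^{7}\), then (57) yields \(\vert \tilde{E}_{m} \vert =\mathcal{O} (h^{7} )\) uniformly on \([t_{0}, t_{f}]\).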

Remark 4

Theorem 2 shows that the estimated error \(e_{m+1}\) in algorithm (33) exactly estimates the coefficients of the Taylor expansion in h of the error \(E_{m+1}:=\phi(t_{m+1})-\phi_{m+1}\) up to the 7th order term, whereas the embedded RKF78 exactly estimates the 8th order term only. Also, unlike the existing embedded schemes, the estimated error \(e_{m}\) is embedded in the EEECM algorithm itself by treating it as an initial value on each time interval. As a result, the proposed algorithm (33) is more efficient in long time simulations, as several numerical results show (see Sect. 4).

4 Numerical results

In this section, we present several numerical results and compare the efficiency of the proposed method with that of other existing methods such as BV78, RKF78, Radau5, and the Matlab built-in routines ode113 and ode45 [2, 26, 27]. For the time step control of the proposed method, we use a standard step size selection algorithm (for example, [1, 28]) given by

$$ h_{m+1}= \biggl(\frac{\mathit{tol}}{ \Vert e_{m} \Vert _{\infty}} \biggr)^{\frac{1}{5}}h_{m}, $$
(58)

where tol is a given tolerance and \(e_{m}\) is the estimated local truncation error at time \(t_{m}\) calculated by (33). Also, the initial time step size is chosen as \(h_{0}=\frac{1}{4}(\mathit{tol})^{\frac{1}{5}}\), since RK4 is used to approximate the solution. In each test problem, we calculate both errors \(E_{m} = \phi(t_{m})-\phi_{m}\) and \(\tilde{E}_{m} = \phi(t_{m})-\phi_{m}-e_{m}\), whose results are denoted by EEECM and EEECM(e), respectively.
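As an illustration, the step size rule (58) and the initial step size can be sketched in a few lines of Python. The function names and the guard against a vanishing error estimate are assumptions introduced here and are not part of the original algorithm.

```python
def next_step_size(h, err, tol):
    """Step size update of the form (58): h_new = (tol / ||err||_inf)^(1/5) * h."""
    err_norm = max(abs(e) for e in err)  # sup norm of the estimated local error
    if err_norm == 0.0:
        # Hypothetical safeguard, not part of (58): keep h when the estimate vanishes.
        return h
    return (tol / err_norm) ** (1.0 / 5.0) * h


def initial_step_size(tol):
    """Initial step h_0 = (1/4) * tol^(1/5), as stated in the text."""
    return 0.25 * tol ** (1.0 / 5.0)


tol = 1e-8
h0 = initial_step_size(tol)
# Illustrative estimate with sup norm 32 * tol: the factor is (1/32)^(1/5) = 1/2.
h1 = next_step_size(h0, [32e-8, -1e-9], tol)
print(h0, h1)
```

In this sketch, an error estimate larger than the tolerance shrinks the next step, and a smaller one enlarges it; practical controllers often add safety factors and bounds on the ratio, which are omitted here.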

4.1 Simple problems

In this subsection, we show the efficiency of EEECM with two simple IVPs. One is the well-known simple harmonic oscillator. The other is a problem for which global error control is known to be quite difficult [12]. Details of each problem are given in the corresponding example.

Example 1

Consider a harmonic oscillator described by

$$ \textstyle\begin{cases} {y'_{1}}(t)=-y_{2}(t), \\ {y'_{2}}(t)=y_{1}(t), \end{cases} $$
(59)

whose analytic solutions are given by \([y_{1}(t), y_{2}(t)]=[\cos(t), \sin(t)]\).

To validate the theoretical convergence analysis in Theorem 2, the problem is solved on the interval \([0, 500]\) with different step sizes, and the results are reported in Table 1. The first column shows the time step sizes, the second shows the errors measured in the sup norm at the final time, and the last gives the rates between the errors obtained with the previous and the current step sizes. The results show that the numerical convergence order is 7, which agrees with the theoretical convergence order.

Table 1 Convergence order of EEECM for solving a simple harmonic oscillator
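The way such rates are computed can be sketched as follows. Since the coefficients of EEECM are not reproduced in this section, the sketch uses the classical RK4 method as a stand-in integrator and a shorter interval \([0, 10]\) (both are assumptions made for illustration), so the observed rates are expected to be close to 4 rather than 7.

```python
import math


def rhs(t, y):
    """Right-hand side of (59): y1' = -y2, y2' = y1."""
    return [-y[1], y[0]]


def rk4_step(f, t, y, h):
    """One classical RK4 step (a stand-in integrator, not EEECM)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def final_error(n_steps, t_end=10.0):
    """Sup-norm error at t_end against the exact solution [cos t, sin t]."""
    h = t_end / n_steps
    y = [1.0, 0.0]
    for m in range(n_steps):
        y = rk4_step(rhs, m * h, y, h)
    return max(abs(y[0] - math.cos(t_end)), abs(y[1] - math.sin(t_end)))


def observed_orders(step_counts):
    """Rates log(e_prev / e_cur) / log(h_prev / h_cur) between successive runs."""
    errors = [final_error(n) for n in step_counts]
    return [math.log(errors[i - 1] / errors[i])
            / math.log(step_counts[i] / step_counts[i - 1])
            for i in range(1, len(errors))]


orders = observed_orders([100, 200, 400])
print(orders)  # values close to 4 are expected for RK4
```

Replacing `rk4_step` with an implementation of the scheme under study would give the corresponding rate column of a table such as Table 1.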

To demonstrate a long time simulation with EEECM, we solve the problem on the interval \([0, 10^{5}]\) and show how well the solutions and errors are calculated. In Fig. 2, we plot the absolute errors in a log scale calculated by various numerical schemes with two different tolerances (a) \(1\text{e--}6\) and (b) \(1\text{e--}8\). It can be seen that the errors of all existing methods grow exponentially, in the sense that they increase linearly over time in a log scale. On the other hand, the errors of EEECM stay within a uniform-like bound over the whole time interval under the given tolerances. Furthermore, the results of EEECM(e) are superior to those of the existing methods. These results may be useful in many other fields that require long-term simulations.

Figure 2 Comparison of the error with different tolerances (a) \(\mathit{tol}=1\text{e--}6\) (b) \(\mathit{tol}=1\text{e--}8\)

Finally, we calculate the time cost required to obtain a desired accuracy by varying the tolerance from \(\mathit{tol}=1\text{e--}5\) to \(\mathit{tol}=1\text{e--}10\). In Fig. 3, we plot the absolute errors at the final time (y-axis) corresponding to the given tolerances versus the required CPU time (x-axis). The numerical results show that the proposed scheme obtains the most accurate solution for each fixed CPU time. In particular, one can see that the proposed method achieves the required accuracies within the given tolerances, whereas all existing methods except for RKF78 fail to achieve this requirement. Also, EEECM(e) is comparable to RKF78 in terms of CPU time and accuracy for the given tolerances. We therefore conclude that, for this harmonic oscillator problem, the proposed method is the most efficient scheme in view of the above discussion.

Figure 3 Comparison of errors versus CPU-times for given tolerances from \(1\text{e--}5\) to \(1\text{e--}10\)

Example 2

In this example, we test a system for which the global error control task becomes more difficult as time goes on [12]. The system consists of four equations given by

$$ \textstyle\begin{cases} {y'_{1}}=2t{y_{2}}^{1/5}y_{4}, \\ {y'_{2}}=10t\exp(5(y_{3}-1) )y_{4}, \\ {y'_{3}}=2ty_{4}, \\ {y'_{4}}=-2t\log(y_{1}) \end{cases} $$
(60)

defined on the interval \([0, 20]\) with the initial condition \(y(0)=[1, 1, 1, 1]^{T}\). Its analytic solutions are given by

$$ \begin{aligned}& y_{1}(t)=\exp\bigl(\sin \bigl(t^{2}\bigr)\bigr), \qquad y_{2}(t)=\exp\bigl(5\sin \bigl(t^{2}\bigr)\bigr), \\ &y_{3}(t)=\sin\bigl(t^{2}\bigr)+1, \qquad y_{4}(t)=\cos \bigl(t^{2}\bigr). \end{aligned} $$
(61)
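As a consistency check, one can verify numerically that the analytic solution (61) satisfies system (60). The following Python sketch compares the chain-rule derivative of (61) with the right-hand side of (60) on a grid over \([0, 20]\); the function names and the grid are assumptions chosen for illustration.

```python
import math


def rhs(t, y):
    """Right-hand side of system (60)."""
    y1, y2, y3, y4 = y
    return [2 * t * y2 ** 0.2 * y4,
            10 * t * math.exp(5 * (y3 - 1)) * y4,
            2 * t * y4,
            -2 * t * math.log(y1)]


def exact(t):
    """Analytic solution (61)."""
    s = math.sin(t * t)
    return [math.exp(s), math.exp(5 * s), s + 1, math.cos(t * t)]


def exact_derivative(t):
    """Time derivative of (61), obtained by the chain rule."""
    s, c = math.sin(t * t), math.cos(t * t)
    return [2 * t * c * math.exp(s),
            10 * t * c * math.exp(5 * s),
            2 * t * c,
            -2 * t * s]


def max_relative_residual(times):
    """Largest scaled mismatch between d/dt of (61) and the right-hand side of (60)."""
    worst = 0.0
    for t in times:
        f = rhs(t, exact(t))
        d = exact_derivative(t)
        for fi, di in zip(f, d):
            worst = max(worst, abs(fi - di) / (1.0 + abs(di)))
    return worst


times = [20.0 * k / 200 for k in range(201)]
print(max_relative_residual(times))  # expected to be at the level of rounding error
```

Every component of (61) depends on time through \(t^{2}\), whose rate of change \(2t\) grows linearly, so the solution oscillates faster and faster toward the end of the interval.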

The derivatives of the system show that the oscillation frequencies grow rapidly as time goes on. This is the reason why the global error control task [12] is difficult. We solve the problem with a fixed tolerance \(\mathit{tol}=1\text{e--}8\) and plot the absolute error in a log scale in Fig. 4(a). One can see that the proposed method achieves the required accuracy within the given tolerance on the interval \([0, 20]\), whereas all other methods fail to meet the given tolerance and some results at the final time are significantly contaminated by the errors.

Figure 4 Comparison of (a) the error with a given tolerance \(\mathit{tol}=1\text{e--}8\) and (b) errors versus CPU-times for given tolerances from \(1\text{e--}5\) to \(1\text{e--}10\)

As in the first example, we also calculate the time cost required to obtain the desired accuracy by varying the tolerance from \(\mathit{tol}=1\text{e--}5\) to \(\mathit{tol}=1\text{e--}10\) and plot the numerical results in Fig. 4(b). In this example, the numerical results show that the proposed scheme obtains the most accurate solution for each fixed CPU time. In particular, one can see that the proposed method achieves the required accuracies within the given tolerances, whereas all existing methods fail to achieve this requirement. The absolute errors of the other methods at the final time attain only about half the order of the desired accuracy, although their required CPU time is small compared to that of our method. That is, one may claim that our method controls the global error well within the given tolerances for this complicated system.

4.2 Hamiltonian system

Formally, a Hamiltonian system is a dynamical system completely described by the scalar function H, the Hamiltonian. First, we solve a simple pendulum problem to show how well EEECM conserves the total energy H. Second, we test a two-body Kepler problem to confirm that the proposed method is well suited to Hamiltonian systems.

4.2.1 Pendulum problem

In this example, we solve the equations of motion of a simple gravity pendulum, whose swing depends only on its length and the local strength of gravity. The total energy of the pendulum is given by

$$ H(p, q)=\frac{1}{2}p^{2}-\cos(q), $$
(62)

whose components p and q satisfy

$$ \textstyle\begin{cases} p'(t)=-\sin(q), \\ q'(t)=p. \end{cases} $$
(63)

We solve the system on the interval \([0, 500]\) with the initial conditions \(p(0)=1\) and \(q(0)=\frac{\pi}{2}\) and the given tolerance \(1\text{e--}8\). We examine the conservation of the total energy H, measured by \(\vert H(p(0), q(0)) - H(p_{m}, q_{m}) \vert = \vert \frac{1}{2}- H(p_{m}, q_{m}) \vert \), where \(p_{m}\) and \(q_{m}\) are the approximate solutions at time \(t_{m}\). The numerical results are reported in Fig. 5 and show that only three methods, ode113, RKF78, and EEECM, keep H invariant within the given tolerance. In particular, the numerical result of EEECM(e) has an outstanding conservation property compared to the other numerical results. Hence, one may claim that the proposed method is superior to the other existing methods for this problem.
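The energy diagnostic used here can be sketched in Python as follows. The sketch writes the pendulum in Hamiltonian form for (62), that is, \(p'=-\partial H/\partial q=-\sin(q)\) and \(q'=\partial H/\partial p=p\), and uses classical RK4 with a fixed step on the shorter interval \([0, 50]\) as a stand-in integrator. The integrator, the step size, and the interval are assumptions for illustration and do not reproduce EEECM.

```python
import math


def hamiltonian(p, q):
    """Total energy (62): H(p, q) = p^2 / 2 - cos(q)."""
    return 0.5 * p * p - math.cos(q)


def rhs(state):
    """Hamilton's equations for (62): p' = -dH/dq = -sin(q), q' = dH/dp = p."""
    p, q = state
    return (-math.sin(q), p)


def rk4_step(state, h):
    """One classical RK4 step (a stand-in integrator, not EEECM)."""
    def shifted(k, c):
        return (state[0] + c * k[0], state[1] + c * k[1])
    k1 = rhs(state)
    k2 = rhs(shifted(k1, h / 2))
    k3 = rhs(shifted(k2, h / 2))
    k4 = rhs(shifted(k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def max_energy_drift(t_end=50.0, h=0.01):
    """Largest |H(p(0), q(0)) - H(p_m, q_m)| along the numerical trajectory."""
    state = (1.0, math.pi / 2)
    h0 = hamiltonian(*state)  # equals 1/2 for these initial conditions
    drift = 0.0
    for _ in range(int(round(t_end / h))):
        state = rk4_step(state, h)
        drift = max(drift, abs(h0 - hamiltonian(*state)))
    return h0, drift


h0, drift = max_energy_drift()
print(h0, drift)
```

Plotting the per-step value of the drift against time in a log scale gives a curve of the kind compared in Fig. 5.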

Figure 5 Comparison of invariance of the total energy H for solving the pendulum problem with tolerance \(1\text{e--}8\)

4.2.2 Kepler problem

In astronomy problems such as the Kepler problem, long-term simulation is indispensable. Hence, we solve a two-body Kepler problem in which two bodies, subject to Newton’s law of gravitation, revolve around their center of mass, placed at the origin, in elliptic orbits in the \((q_{1}, q_{2})\)-plane [29]. Assuming unit masses and a unit gravitational constant, the dynamics is described by the Hamiltonian function H given by

$$ H(p_{1}, p_{2}, q_{1}, q_{2})= \frac{1}{2}\bigl(p_{1}^{2} + p_{2}^{2} \bigr)-\frac{1}{\sqrt{q_{1}^{2}+q_{2}^{2}}} $$
(64)

together with the angular momentum L, which is another invariant of the system, described by

$$ L(p_{1}, p_{2}, q_{1}, q_{2})=q_{1}p_{2}-q_{2}p_{1}, $$
(65)

whose components \(p_{i}\), \(q_{i}\) (\(i=1,2\)) satisfy the following IVP:

$$ \textstyle\begin{cases} {p'_{1}}(t)=-q_{1} ({q_{1}}^{2}+{q_{2}}^{2} )^{(-3/2)}, \\ {p'_{2}}(t)=-q_{2}({q_{1}}^{2}+{q_{2}}^{2} )^{(-3/2)}, \\ {q'_{1}}(t)=p_{1}, \\ {q'_{2}}(t)=p_{2}. \end{cases} $$
(66)

We solve system (66) with the initial conditions \(p_{1}(0)=0\), \(p_{2}(0)=2\), \(q_{1}(0)=0.4\), \(q_{2}(0)=0\) on the interval \([0, 1000\pi]\) with a fixed tolerance \(1\text{e--}8\). It is well known that the true solution is periodic with period 2π [30]. As in the previous example, we examine the conservation of the total energy H (Fig. 6(a)) as well as the angular momentum L (Fig. 6(b)). From the two figures, one can see that the numerical results of EEECM(e) are the most accurate.
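The invariants and the period for these initial conditions can be checked with a short Python sketch. It evaluates (64) and (65) at the initial state, derives the period from Kepler's third law for unit masses and gravitational constant, and integrates (66) over one period with classical RK4 as a stand-in integrator. The integrator, the step count, and the function names are assumptions for illustration and do not reproduce EEECM.

```python
import math


def energy(s):
    """Hamiltonian (64) for the state s = (p1, p2, q1, q2)."""
    p1, p2, q1, q2 = s
    return 0.5 * (p1 * p1 + p2 * p2) - 1.0 / math.hypot(q1, q2)


def angular_momentum(s):
    """Angular momentum (65)."""
    p1, p2, q1, q2 = s
    return q1 * p2 - q2 * p1


def rhs(s):
    """Right-hand side of (66)."""
    p1, p2, q1, q2 = s
    r3 = (q1 * q1 + q2 * q2) ** 1.5
    return (-q1 / r3, -q2 / r3, p1, p2)


def rk4_step(s, h):
    """One classical RK4 step (a stand-in integrator, not EEECM)."""
    def shifted(k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = rhs(s)
    k2 = rhs(shifted(k1, h / 2))
    k3 = rhs(shifted(k2, h / 2))
    k4 = rhs(shifted(k3, h))
    return tuple(si + h / 6 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))


def one_period_check(n_steps=4000):
    """Integrate over one period; return the invariants, period, and return error."""
    s0 = (0.0, 2.0, 0.4, 0.0)
    h_energy, l_momentum = energy(s0), angular_momentum(s0)
    a = -1.0 / (2.0 * h_energy)        # semi-major axis of the bound orbit
    period = 2.0 * math.pi * a ** 1.5  # Kepler's third law in these units
    s, h = s0, period / n_steps
    for _ in range(n_steps):
        s = rk4_step(s, h)
    return_error = math.hypot(s[2] - s0[2], s[3] - s0[3])
    return h_energy, l_momentum, period, return_error


print(one_period_check())
```

For the given initial state, the energy is \(-\frac{1}{2}\) and the angular momentum is 0.8, so the semi-major axis is 1 and the period is 2π, in agreement with the statement above.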

Figure 6 The invariances of (a) total energy H and (b) angular momentum L for a two-body Kepler problem with tolerance \(1\text{e--}8\)

In Fig. 7, we examine the numerical periodicity of several methods by calculating the error between the starting point \((q_{1}(0),q_{2}(0))=(0.4,0)\) and the numerical solution at time \(2k\pi\) (\(k=1, \ldots,500\)), which is evaluated with the Matlab built-in function for cubic spline interpolation. Of the 500 calculated errors, only 16 points, one selected after every 30 points, are plotted in Fig. 7. The figure shows that the proposed method generates the most accurate results in the sense of periodicity. In summary, the proposed method gives the most efficient numerical results with respect to both conservation and periodicity.

Figure 7 Comparison of periodicity for a two-body Kepler problem with tolerance \(1\text{e--}8\)

5 Conclusion and further discussion

A new error control strategy for non-stiff problems is developed within the ECM framework. Unlike the traditional way of approximating solutions in an explicit single step method, we suggest a methodology that incorporates the estimated error at each integration step and enables us to control the bound of the local truncation error in a long time simulation. Several numerical results show that the proposed method obtains a uniform-like error bound, which is outstanding compared with existing numerical methods. It is also seen that, like symplectic methods, the proposed scheme preserves invariants such as the energy and angular momentum in Hamiltonian systems.

In order to fully explore the efficiency of EEECM, several extended issues are currently being pursued. One of them is to optimize the number of function evaluations to reduce the computational cost, as in the existing embedded algorithms. Another issue is to investigate strategies for selecting the time integration step size, since adaptive time stepping is necessary to find efficient solutions in a long time simulation. The proposed method is developed only for non-stiff problems, and we solved only simple Hamiltonian systems. Hence, another challenge is to extend the idea of the proposed method to stiff systems. Additionally, a generalization of the proposed idea will be applied to physical problems expressed by partial differential equations (PDEs). Results along these directions will be reported in the future.