1 Introduction and Motivation

Differential equations of fractional (i.e., non-integer) order [6] are an object of great current interest. They are useful tools for modeling various phenomena in science and engineering; see, e.g., [1, 2, 27, 28, 29]. These problems typically have the form

$$\begin{aligned} D_a^\alpha y(t) = f(t, y(t)), \qquad y(a) = {\tilde{y}}_0, \end{aligned}$$
(1.1)

where \(\alpha \in (0, 1)\) is the order of the differential operator. Note that \(\alpha > 1\) arises only rarely in applications and this case will not be discussed here. In Eq. (1.1), \(f:[a,b] \times {\mathbb {R}} \rightarrow {\mathbb {R}}\) is a given function and \({\tilde{y}}_0 \in {\mathbb {R}}\) describes the initial state of the modeled system at \(t = a\). The differential operator \(D_a^\alpha \) in Eq. (1.1) is the Caputo differential operator of order \(\alpha \) with starting point \(a \in {\mathbb {R}}\) as defined by [6, Definition 3.2], namely

$$\begin{aligned} D_a^\alpha y(t) = \frac{1}{\varGamma (1-\alpha )} \frac{\mathrm d}{\mathrm d t} \int _a^t (t-s)^{-\alpha } \left( y(s) - y(a) \right) \mathrm d s \end{aligned}$$
(1.2)

for \(t \in [a,b]\) and sufficiently smooth functions \(y: [a,b] \rightarrow { {\mathbb {R}}}\).
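
To get a feeling for this operator, a short worked example may help (a standard fact that follows directly from (1.2) by evaluating the resulting Beta integral): for a shifted power function with exponent \(\beta > 0\) and an arbitrary constant \(c \in {\mathbb {R}}\),

$$\begin{aligned} D_a^\alpha \left( c + (t-a)^\beta \right) = \frac{\varGamma (\beta + 1)}{\varGamma (\beta + 1 - \alpha )} (t-a)^{\beta - \alpha }, \end{aligned}$$

so, in particular, the Caputo derivative of any constant function vanishes.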

In Eqs. (1.1) and (1.2), the starting point a plays a special role as the start of the process that is being studied. A typical application from mechanics is an object made from viscoelastic material under external loads that is still in its virgin state for \(t < a\) and to which forces are only applied for \(t \ge a\). Here problem (1.1) is an initial value problem as \(y(a) = {\tilde{y}}_0\) is the state of the process at the initial time \(t = a\) and we are interested in finding y(t) for \(t \in [a,b]\) and some given time \(b > a\).

From the analytical viewpoint such initial value problems are well understood, see, e.g., [6, Chapters 6 and 7]. Many numerical methods have been proposed and investigated; cf., e.g., [23]. However, from the modeling perspective, these methods are often of limited use because they hinge on the exact state of the process at the initial time \(t=a\), which may be impossible to determine in actual applications. If one can only measure the value y(b) at some time \(b > a\) but not the value at a itself, this leads to

$$\begin{aligned} D_a^\alpha y(t) = f(t, y(t)), \qquad y(b) = y^*. \end{aligned}$$
(1.3)

Then the task is to solve (1.3) on the interval [a, c] where a is the starting time of the process and \(c \ge b\). This can be done in two steps:

  • Solve the problem on the interval [a, b]. Since b in Eq. (1.3) denotes the interval’s end point, this is a terminal value problem.

  • With the solution known on the entire interval [a, b], the value of y at the initial point a is known as \({\tilde{y}}_0 = y(a)\). Therefore we can replace the terminal condition \(y(b) = y^*\) in Eq. (1.3) by the initial condition \(y(a) = {\tilde{y}}_0\). This converts the original problem into the classical initial value problem (1.1) that can now be solved on the entire interval [a, c].

As indicated above, the second step of this process has a well understood structure and can be handled by standard methods. Therefore it does not require any special attention. Hence we focus only on the first step in this work.

2 Analytic Properties of Terminal Value Problems

Here we provide the basis for our numerical work with terminal value problems for fractional differential equations and recall some of their known analytical properties.

Conditions for well-posedness have been discussed and partially established in [5, 8]. A complete analysis is given in [3], with additional aspects presented in [9, 14]. For our work here, we shall specifically use the following result that is an immediate consequence of a statement by Cong and Tuan regarding initial value problems [3, Theorem 3.5]. It follows the classical set-up in assuming that the function f in Eq. (1.3) is continuous on \([a,b] \times {\mathbb {R}}\), maps into \({\mathbb {R}}\) and satisfies a Lipschitz condition with respect to the second variable.

Theorem 2.1

Let \(f: [a,b] \times {\mathbb {R}} \rightarrow {\mathbb {R}}\) be continuous and satisfy the Lipschitz condition with respect to the second variable

$$\begin{aligned} |f(t, x_1) - f(t, x_2)| \le L(t) | x_1 - x_2 | \end{aligned}$$
(2.1)

for all \(t \in [a,b]\) and all \(x_1, x_2 \in {\mathbb {R}}\) with some function \(L \in C[a,b]\). Then, for any terminal value \(y^* \in {\mathbb {R}}\), the terminal value problem (1.3) has a unique solution y in C[a, b].

Remark 2.1

When talking about initial value problems, it is common practice to discuss not only scalar but also multidimensional problems, i.e. to assume that the function f on the right-hand side of the differential equation maps from \([a,b] \times {\mathbb {R}}^d\) to \({\mathbb {R}}^d\) with some \(d \ge 1\). In the initial value problem setting, this generalization does not introduce any difficulties as far as the existence and the uniqueness of solutions are concerned. However, if terminal value problems are addressed as we are doing here, the situation is significantly different. To be precise, Cong and Tuan [3] showed that well-posedness of terminal value problems in general only holds for \(d = 1\); a counterexample given in [3, Section 6] demonstrates that multiple solutions can arise when \(d > 1\). Therefore we only consider the scalar case \(d=1\) here.

A classical technique for investigating analytical properties of initial value problems for differential equations is to rewrite the given problem in the form of an equivalent integral equation. This can be done in the fractional case in exactly the same way as in the classical case of first order problems (see, e.g., [6, Lemma 6.2]) and results in a Volterra integral equation. For terminal value problems of fractional differential equations, however, the situation changes significantly: when we rewrite our fractional derivative problem in the form of an integral equation, the resulting integral equation is of Fredholm type, not of Volterra type [6, Theorem 6.18]. For first order differential equations, Fredholm integral equations arise as well, but in connection with boundary value problems and not with initial value problems. Therefore we shall employ techniques that are based on principles used for boundary value problems of integer order differential equations. Specifically, shooting methods, see Keller [24], are the foundation of the numerical methods that we suggest for solving fractional terminal value problems.

To describe our method, we need further analytic prerequisites. An important concept is the one-parameter Mittag-Leffler function \(E_\alpha : {\mathbb {C}} \rightarrow {\mathbb {C}}\) defined for \(\mathop {\textrm{Re}} \alpha > 0\) as

$$\begin{aligned} E_\alpha (z) = \sum _{k=0}^\infty \frac{z^k}{\varGamma (\alpha k + 1)}, \end{aligned}$$

see [20]. For \(\alpha \) values that are relevant in our setting we exploit the following:

Lemma 2.1

For all \(\alpha \in (0,1)\), the function \(E_\alpha \) is analytic. Moreover, for these \(\alpha \) and all \(z \in {\mathbb {R}}\) we have \(E_\alpha (z) > 0\) and \(E_\alpha (z)\) is strictly increasing in \(z \in {\mathbb {R}}\).

Proof

The analyticity of \(E_\alpha \) follows from [20, Proposition 3.1]. The inequality \(E_\alpha (z) > 0\) is trivial for \(z \ge 0\) since Euler’s Gamma function satisfies \(\varGamma (w) > 0\) for all \(w > 0\). For \(z<0\) the inequality is a consequence of the properties discussed in [20, Subsection 3.7.2] and the strict monotonicity follows from the properties shown in [20, Subsection 3.7.2] as well. \(\square \)
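
For small or moderate arguments, \(E_\alpha \) can be evaluated directly from its defining series. The following minimal MATLAB sketch (our own illustration; the function name and the truncation index K are ours) suffices for quick experiments, but it converges slowly and suffers from cancellation for large or strongly negative arguments, for which the dedicated algorithm of [17] should be used instead:

```matlab
% Naive evaluation of the one-parameter Mittag-Leffler function E_alpha(z)
% by truncating its defining series after K terms. Only intended for
% moderate |z|; use the algorithm of [17] for serious computations.
function E = ml_series(alpha, z, K)
    if nargin < 3, K = 150; end
    k = 0:K;
    E = sum(z.^k ./ gamma(alpha * k + 1));
end
```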

To understand the nature of our numerical method, it is essential to be aware of further properties of initial value problems for fractional differential equations. Specifically, we consider the following question: Given two solutions of the same fractional differential equation on the same interval, but starting from two different initial values, what are the differences between these two solutions on this interval? Upper bounds for the differences are directly available by standard classical Gronwall type arguments. For our purposes, however, we also need lower bounds, about which much less is known. First we state our result for a linear differential equation. This is very simple but immediately gives us important insights.

Theorem 2.2

Let \(\ell \in C[a,b]\) be given, and let \(y_1\) and \(y_2\), respectively, be the solutions of the initial value problems

$$\begin{aligned} D_a^\alpha y_k(t) = \ell (t) y_k(t), \qquad y_k(a) = y_{0,k} \qquad (k = 1,2) \end{aligned}$$

with \(y_{0,1} > y_{0,2}\). Then, for all \(t \in [a,b]\),

$$\begin{aligned} ( y_{0,1} - y_{0,2} ) E_\alpha (\ell _*(t) (t-a)^\alpha ) \le y_1(t) - y_2(t) \le ( y_{0,1} - y_{0,2} ) E_\alpha (\ell ^*(t) (t-a)^\alpha ) \end{aligned}$$
(2.2)

where \(\ell _*(t) = \min _{s \in [a,t]} \ell (s)\) and \(\ell ^*(t) = \max _{s \in [a,t]} \ell (s)\).

Proof

The upper bound has been derived in Diethelm and Tuan [13, Theorem 5] and the lower bound is shown in Diethelm and Tuan [13, Theorem 4]. \(\square \)

In the general (nonlinear) case the result is more involved but the essential properties of the linear case remain intact.

Theorem 2.3

Assume that f satisfies the hypotheses of Theorem 2.1. Let \(y_1\) and \(y_2\), respectively, be the solutions of the initial value problems

$$\begin{aligned} D_a^\alpha y_k(t) = f(t, y_k(t)), \qquad y_k(a) = y_{0,k} \qquad (k = 1,2) \end{aligned}$$

where \(y_{0,1} > y_{0,2}\). Then, for all \(t \in [a,b]\) we have

$$\begin{aligned} ( y_{0,1} - y_{0,2} ) E_\alpha (\tilde{\ell }_*(t) (t-a)^\alpha ) \le y_1(t) - y_2(t) \le ( y_{0,1} - y_{0,2} ) E_\alpha (\tilde{\ell }^*(t) (t-a)^\alpha ) \end{aligned}$$
(2.3)

where

$$\begin{aligned} \tilde{\ell }_*(t) = \inf _{s \in [a,t], y \ne 0} \frac{f(s, y + y_1(s)) - f(s, y_1(s))}{y} < \infty \end{aligned}$$
(2.4a)

and

$$\begin{aligned} \tilde{\ell }^*(t) = \sup _{s \in [a,t], y \ne 0} \frac{f(s, y + y_1(s)) - f(s, y_1(s))}{y} < \infty . \end{aligned}$$
(2.4b)

Proof

This is the result of Diethelm and Tuan [13, Theorem 7]. \(\square \)

Theorem 2.3 can be applied to the linear case of Theorem 2.2. In this situation, the functions \(\ell _*\) and \(\ell ^*\) of Theorem 2.2 coincide with the functions \(\tilde{\ell }_*\) and \(\tilde{\ell }^*\), respectively, of Theorem 2.3.

We use the notation \(\beta \sim \gamma \) for expressions \(\beta \) and \(\gamma \) that depend on the same quantities to denote that there exist absolute constants \(C_1 > 0\) and \(C_2 > 0\) such that \(C_1 \beta \le \gamma \le C_2 \beta \) for all admissible values of the quantities that \(\beta \) and \(\gamma \) depend on. With this notation and the findings of Lemma 2.1, we can summarize the statements of Theorems 2.2 and 2.3 more compactly.

Corollary 2.1

Under the assumptions of Theorem 2.2 or Theorem 2.3 and for any \(y_{0,1}, y_{0,2} \in {\mathbb {R}}\) with \(y_{0,1} > y_{0,2}\) we have

$$\begin{aligned} c_* \left( y_{0,1} - y_{0,2} \right) \le y_1(b) - y_2(b) \le c^* \left( y_{0,1} - y_{0,2} \right) \end{aligned}$$
(2.5)

where

$$\begin{aligned} c_* = E_\alpha (\tilde{\ell }_*(b) (b-a)^\alpha ) \quad \text{ and } \quad c^* = E_\alpha (\tilde{\ell }^*(b) (b-a)^\alpha ) \end{aligned}$$
(2.6)

for the functions \(\tilde{\ell }_*\) and \(\tilde{\ell }^*\) of (2.4). Since \(c^* \ge c_* > 0\) by (2.6), we can rewrite Eq. (2.5) as

$$\begin{aligned} y_1(b) - y_2(b) \sim y_{0,1} - y_{0,2}. \end{aligned}$$
(2.7)

This observation is the foundation for our numerical method in Sect. 3.4.

For later reference, we note a few more facts:

Remark 2.2

In general, we cannot expect that the ratio

$$\begin{aligned} {\hat{c}}:= \frac{y_1(b) - y_2(b)}{ y_{0,1} - y_{0,2}}, \end{aligned}$$

i.e. the proportionality factor between the difference of the terminal values and the difference of the initial values of two solutions, is known exactly. However, from Eq. (2.5), we know that \({\hat{c}}\) is bounded above by \(c^*\) and below by \(c_*\) as given in (2.6). As long as no additional information is available to obtain a more precise approximate value for \({\hat{c}}\), one may use the mean of the upper and the lower bound, i.e. the approximation

$$\begin{aligned} {\hat{c}} \approx \frac{c_* + c^*}{2}. \end{aligned}$$
(2.8)

To compute this value from (2.6), we need to evaluate the quantities \(\tilde{\ell }_*(b)\) and \(\tilde{\ell }^*(b)\) as defined in Eq. (2.4). If an approximate solution \({\hat{y}}\) to the differential equation in question is known at least at some grid points \(a = t_0< t_1< t_2< \ldots < t_N = b\), one may select a step size \(H>0\) and an integer \(M > 0\) and approximate these two quantities by

$$\begin{aligned} \tilde{\ell }_*(b) \approx \min \Big \{ \frac{f(t_j, k H + {\hat{y}}(t_j)) - f(t_j,{\hat{y}}(t_j))}{k H}: \ j \in \{ 0, 1, 2, \ldots , N \}, \ k \in \{ \pm 1, \pm 2, \ldots , \pm M \} \Big \} \end{aligned}$$
(2.9a)

and

$$\begin{aligned} \tilde{\ell }^*(b) \approx \max \Big \{ \frac{f(t_j, k H + {\hat{y}}(t_j)) - f(t_j,{\hat{y}}(t_j))}{k H}: \ j \in \{ 0, 1, 2, \ldots , N \}, \ k \in \{ \pm 1, \pm 2, \ldots , \pm M \} \Big \}, \end{aligned}$$
(2.9b)

respectively. Then we obtain \({\hat{c}}\) from these approximate values instead of the exact values \(\tilde{\ell }_*(b)\) and \(\tilde{\ell }^*(b)\).
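
For illustration, the estimates (2.9) and the resulting value (2.8) could be computed along the following lines. This is only a sketch under our own naming conventions: t and yhat hold the grid points and the approximate solution values from a previously computed shot, f is the right-hand side as a function handle, H and M are the parameters from above, and ml(alpha, z) is a placeholder for any routine evaluating \(E_\alpha (z)\), e.g. an implementation of the algorithm from [17]:

```matlab
% Sketch of Remark 2.2: estimate ell_*(b) and ell^*(b) via (2.9),
% then c_* and c^* via (2.6) and the midpoint value chat via (2.8).
ratios = zeros(numel(t), 2*M);
for j = 1:numel(t)
    col = 0;
    for k = [-M:-1, 1:M]
        col = col + 1;
        ratios(j, col) = (f(t(j), k*H + yhat(j)) - f(t(j), yhat(j))) / (k*H);
    end
end
ell_low = min(ratios(:));                             % approximates ell_*(b)
ell_up  = max(ratios(:));                             % approximates ell^*(b)
c_low   = ml(alpha, ell_low * (t(end) - t(1))^alpha); % c_*  from (2.6)
c_up    = ml(alpha, ell_up  * (t(end) - t(1))^alpha); % c^*  from (2.6)
chat    = (c_low + c_up) / 2;                         % Eq. (2.8)
```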

Remark 2.3

Estimating the proportionality factor \({\hat{c}}\) as indicated in Remark 2.2 is quite useful when the differential equation in Eq. (1.3) is dissipative, i.e. when \((f(t, y_1) - f(t, y_2)) (y_1 - y_2) \le 0\) for all \(t \in [a,b]\) and all \(y_1, y_2 \in {\mathbb {R}}\). In this case, Eq. (2.4) implies that \(\tilde{\ell }_*(t) \le \tilde{\ell }^*(t) \le 0\) for all t, and due to the monotonicity of the Mittag-Leffler function \(E_\alpha \) (see Lemma 2.1), we obtain

$$\begin{aligned} 0 < c_* = E_\alpha (\tilde{\ell }_*(b) (b-a)^\alpha ) \le c^* = E_\alpha (\tilde{\ell }^*(b) (b-a)^\alpha ) \le E_\alpha (0) = 1. \end{aligned}$$

Therefore, the interval \([c_*, c^*]\) in which the correct value of \({\hat{c}}\) must lie is quite small. By choosing this interval’s midpoint as starting point we only make a small error.

If, on the other hand, the differential equation is not dissipative then \(c^*\) may be very much larger than \(c_*\), and the strategy described in Remark 2.2 may lead to an estimate for \({\hat{c}}\) that is very far away from the correct value.

In Sect. 5 we work through example problems for either case.

Remark 2.4

Following Remark 2.3 we suggest yet another method for approximating \({\hat{c}}\): Analyze the given differential equation and see whether the approach of Remark 2.2 is appropriate (i.e., whether this does not lead to an excessively large value for \({\hat{c}}\)). If Remark 2.2 does not yield a useful \({\hat{c}}\) value, choose a smaller one. More precisely:

  1. Approximately compute the values \(\tilde{\ell }_*(b)\) and \(\tilde{\ell }^*(b)\) as indicated in Remark 2.2.

  2. If \(\tilde{\ell }_*(b) \le \tilde{\ell }^*(b) \le 0\) then there is no danger of obtaining extremely large values for \(c_*\) or \(c^*\). Thus proceed as suggested in Remark 2.2.

  3. If \(\tilde{\ell }_*(b) \le 0 < \tilde{\ell }^*(b)\) then \(c_* \le 1\), but \(c^*\) may be very much larger than 1. To dampen the possible overestimation that \(c^*\) might induce, ignore the precise value of \(c^*\) and set \({\hat{c}} = 1\).

  4. If \(0 < \tilde{\ell }_*(b) \le \tilde{\ell }^*(b)\) then \(c^* \ge c_* > 1\). Again, to mitigate an overestimation, use the lower bound of the interval \([c_*, c^*]\) as an estimate for \({\hat{c}}\), i.e. set \({\hat{c}} = E_\alpha (\tilde{\ell }_*(b) (b-a)^\alpha )\) as suggested by the first relation in Eq. (2.6). A code sketch of this case distinction is given after this list.
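
A minimal MATLAB sketch of this case distinction (our own illustration, reusing the quantities ell_low, ell_up, c_low and c_up computed as in the sketch after Remark 2.2) could read:

```matlab
% Sketch of Remark 2.4: damped choice of chat depending on the signs of
% the approximations ell_low of ell_*(b) and ell_up of ell^*(b).
if ell_up <= 0                   % case 2: no danger of huge c_* or c^*
    chat = (c_low + c_up) / 2;   % midpoint rule of Remark 2.2, Eq. (2.8)
elseif ell_low <= 0              % case 3: c_* <= 1 but c^* may be huge
    chat = 1;
else                             % case 4: c^* >= c_* > 1
    chat = c_low;                % lower bound E_alpha(ell_*(b) (b-a)^alpha)
end
```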

3 Description of the Method

3.1 General Framework

As indicated in Sect. 2, the essential characteristics of problem (1.3) are identical to those of classical boundary value problems and our approach involves shooting methods [24] which are a well established technique for boundary value problems. The basic steps of general shooting methods are as follows:

  1. Set \(k=0\). Given problem (1.3), make an initial guess \({\tilde{y}}_0^{(0)}\) for the value y(a).

  2. Numerically compute the solution \({\tilde{y}}_k\) of the differential equation in (1.3) for the initial condition \({\tilde{y}}_k(a) = {\tilde{y}}_0^{(k)}\). Ignore the terminal condition in (1.3) in this process.

  3. Compare the computed value \({\tilde{y}}_k(b)\) with the desired terminal value \(y^*\) at the point b:

    (a) If \({\tilde{y}}_k(b)\) is sufficiently close to \(y^*\), accept \({\tilde{y}}_k\) as the numerical solution of the given terminal value problem (1.3) and stop.

    (b) Otherwise, construct a new, improved guess \({\tilde{y}}_0^{(k+1)}\) for the starting value y(a), increment k to \(k+1\), and go back to step 2.

These simple components of shooting methods will be specified more precisely in the subsequent subsections. The overarching concern here is to keep the chosen shooting algorithm’s computational complexity low. The computational cost of shooting algorithms is reflected in the number of operations required per iteration step, multiplied by the number of iterations needed to achieve satisfactory accuracy.
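
In code, this framework is a simple loop. The following schematic MATLAB sketch is only meant to fix ideas; fde_solve (any fractional IVP solver on \([a,b]\) with N steps, e.g. ABM or BDF2) and next_guess (the update rule developed in Sect. 3.4) are hypothetical placeholders, and the remaining variables denote the problem data and the accuracy threshold:

```matlab
% Schematic shooting loop for the terminal value problem (1.3).
y0hist = ystar;          % step 1: first guess for y(a), cf. Sect. 3.2
ybhist = [];
while true
    y = fde_solve(alpha, f, a, b, y0hist(end), N);     % step 2: solve the IVP
    ybhist(end+1) = y(end);
    if abs(ybhist(end) - ystar) <= tol                 % step 3(a): close enough?
        break                                          % accept current solution y
    end
    y0hist(end+1) = next_guess(y0hist, ybhist, ystar); % step 3(b): improved guess
end
```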

3.2 Selecting the Initial Guess \({\tilde{y}}_0^{(0)}\) for y(a)

Unless specific information about the given fractional ODE problem is available that suggests otherwise, we choose \({\tilde{y}}_0^{(0)} = y^*\) as our initial guess for y(a) required in step 1, i.e., we use the desired terminal value as a first guess for our initial value.

It is often assumed that a good choice of an initial guess at a leads to quick convergence with acceptable accuracy at b, while a poorly chosen initial guess might require more iterations, thus leading to a significantly higher overall computational cost. The examples in Sect. 5, however, indicate otherwise. In every test example that we have considered, satisfactory accuracy was achieved with very few iterations with our method, no matter how close the starting guess at a was to the exact solution.

3.3 Numerically Solving a Fractional ODE Initial Value Problem

Algorithms that compute the solution of an (artificially constructed) initial value problem in step 2 have been discussed by Ford et al. [14], showing that the fractional Adams-Bashforth-Moulton (ABM) method [10, 11] is a good choice for non-stiff ODEs. However, for stiff differential equations or when the interval [a, b] is very large, the stability properties of ABM may be insufficient [16] and one should use an implicit linear multistep method such as the fractional trapezoidal method [18] or a fractional backward differentiation formula [25, 26]. For our examples in Sect. 5, we present the results obtained with both the ABM method and the fractional BDF2 method for comparison. There we use a uniform discretization of the underlying interval [a, b] by choosing an integer \(N> 1\) and equally spaced grid points \(t_j = a + j h\) for the step size \(h = (b-a) / N\). This allows us to use FFT techniques and obtain a fast implementation, see [19, 21]. Using the FFT to compute the numerical solution on [a, b] takes \(O(N (\log N)^2)\) operations instead of the \(O(N^2)\) operations required by the standard implementation.

Solutions to fractional differential equations of the type considered here are almost never differentiable at their initial point a, see Diethelm [6, Theorem 6.26]. This adversely affects the convergence rate of many numerical methods such as the ABM method, see Diethelm et al. [11]. To improve convergence, one could replace the uniform mesh by a graded one, see Zhou and Stynes [30], or use the non-polynomial collocation scheme that was suggested, analyzed and tested in Ford et al. [15]. Such techniques lead to faster convergence, can reach the required accuracy with larger step sizes, and thus lower the overall computational effort. But these ideas cannot easily be combined with the FFT technique, and consequently the resulting numerical schemes become more costly overall. We shall not pursue these approaches any further.

3.4 Improved Subsequent Guesses for the Initial Values

The major contribution of our work is a new, efficient method for guessing initial values y(a) that hit \(y^* = y(b)\) more and more accurately. Traditional approaches [5, 7, 15] use classical bisection, which halves the size of the containment interval for the “correct” choice of y(a) in each step. Clearly, bisection is convergent, but it takes a large number of iterations to arrive in a sufficiently small neighbourhood of the exact solution if the containment interval is large, e.g., \(10^7\) units wide. Note that ten interval halving steps reduce the uncertainty in the initial value only by a factor of about \(10^3\) since \(2^{10} \approx 1000\), so it would take 40 guess iterations to reduce this uncertainty to a reasonable \(10^{-5}\). We suggest a different method that converges much faster. Section 5 compares the new approach with classical bisection based methods.

Like the classical bisection method, our approach also requires two initial guesses \({\tilde{y}}_0^{(0)}\) and \({\tilde{y}}_0^{(1)}\) for the initial value. For \({\tilde{y}}_0^{(0)}\) we always choose \({\tilde{y}}_0^{(0)}:= y^*\), the given terminal value. The next guess for a starting value and all subsequent guesses are chosen according to Theorems 2.2 and 2.3 and the fact that any two solution curves of a given fractional ODE with different initial values cannot cross each other. Hence, two solution curves for different starting values \(y_{0,1} > y_{0,2}\) can only spread out or bunch up over the time interval [a, b]. By Corollary 2.1, the ratio between the difference of two solution values \(y^*_1\) and \(y^*_2\) obtained for \(t = b\) and the difference of their starting values \(y_{0,1}\) and \(y_{0,2}\) at \(t = a\) indicates how to place the initial values until we find a starting value \(y_0^*\) that reaches the desired final value \(y^*\) within a chosen error bound.

As long as only one initial guess \({\tilde{y}}_0^{(0)}\) is available, i.e., when the next guess \({\tilde{y}}_0^{(1)}\) has not yet been computed, we assume—due to a lack of any information that might suggest otherwise—that the proportionality factor \({\hat{c}}\) between the terminal values (i.e. the function values of the solution at \(t=b\)) and the initial values (the corresponding values for \(t=a\)), see Remark 2.2, is given by Eq. (2.8). The values of \(c_*\) and \(c^*\) in this formula are then replaced by their approximations indicated in Eq. (2.9). According to Remark 2.2, our next guess for the initial value becomes

$$\begin{aligned} {\tilde{y}}_0^{(1)}:= {\tilde{y}}_0^{(0)} + \frac{ y^* - {\tilde{y}}_0(b) }{{\hat{c}}}. \end{aligned}$$
(3.1)

The new guess \( {\tilde{y}}_0^{(1)} \) is equal to the previous guess \({\tilde{y}}_0^{(0)}\) if and only if the latter has resulted in the exact solution, i.e., if and only if \({\tilde{y}}_0(b) = y^*\) and the problem has been solved.

Remark 3.1

Note that evaluating the formulas in (2.9) requires knowledge of an approximate solution to the given differential equation for some initial condition. At this stage, such information is already available because we have computed a ’solution’ using the first guess \({\tilde{y}}_0^{(0)}\) as initial value.

Remark 3.2

  • The approach described in Remark 2.2 to compute the value \({\hat{c}}\) requires the evaluation of the Mittag-Leffler function \(E_\alpha \), cf. Eq. (2.6). For this purpose, we suggest to use the algorithm developed in [17].

  • In case of a non-dissipative fractional differential equation, we have seen in Remark 2.3 that the approach of Remark 2.2 may lead to very poor approximations of \({\hat{c}}\), and its true value may be massively over-estimated. Therefore, for non-dissipative problems, one is likely to be better off with simply choosing an arbitrary, not excessively large positive number for \({\hat{c}}\), such as \({\hat{c}} = 1\), in Eq. (3.1).

  • If the user believes that evaluating the formulas in (2.9) is too expensive then one may again use \({\hat{c}} = 1\) in Eq. (3.1), even for dissipative equations. Then the guess of \({\tilde{y}}_0^{(1)}\) may be worse than the one obtained with \({\hat{c}}\) from (2.8), and the number of iterations until satisfactory accuracy is reached may increase slightly.

  • Remark 2.4 suggests another way to choose \({\hat{c}}\) as it tries to find a compromise between the two earlier suggestions.

In the numerical experiments of Sect. 5, we report the results of our new method for various choices of \({\hat{c}}\) such as the construct of Remark 2.2, the idea of Remark 2.4, or simply setting \({\hat{c}} = 1\). In practice the method of Remark 2.4 usually leads to the smallest number of iterations and quickest overall convergence.

Our strategy for constructing additional initial values \({\tilde{y}}_0^{(k)}\) for \(k = 2, 3, \ldots \) is based on Corollary 2.1 which states that, given two fractional initial value problems for the same fractional differential equation, but starting from different initial conditions, the difference in the terminal values of these two problems is approximately proportional to the difference in their initial values. Strictly speaking, this proportionality only holds in an asymptotic sense when the difference between subsequent initial values tends to zero. In practice, however, our proportional secting idea has worked exceedingly well for all types of fractional ODEs that we have tested, even if this assumption is not satisfied, as the results in Sect. 5 clearly show.

Next we need to specify the initial value guesses \({\tilde{y}}_0^{(k)}\) for \(k \ge 2\) from two earlier guesses. Our algorithm always analyzes the two most recent iteration results for y at b and compares their calculated approximations \({\tilde{y}}_{k-1}(b)\) and \({\tilde{y}}_{k-2}(b)\) with the desired value \(y^*\). How are these three values for y at b positioned relative to each other? For this we found it convenient to express the target value \(y^*\) as a generalized convex combination of the two \({\tilde{y}}_{\mu }(b)\) values for \(\mu \in \{k-2, k-1\}\) and write

$$\begin{aligned} y^* = \lambda _k {\tilde{y}}_{k-1}(b) + (1 - \lambda _k) {\tilde{y}}_{k-2}(b) \end{aligned}$$
(3.2)

for some \(\lambda _k \in {\mathbb {R}}\). Here \(\lambda _k\) can be immediately computed since all other quantities in (3.2) are known. Since any positioning of the three values relative to each other is possible, this concept may lead to \(\lambda _k < 0\) or \(\lambda _k > 1\) which would not be admissible in a classical convex combination, but this does not create any difficulties for our algorithm. From \(\lambda _k\) we compute the new guess for the next shooting start as

$$\begin{aligned} {\tilde{y}}_0^{(k)}:= \lambda _k {\tilde{y}}_0^{(k-1)}+ (1 - \lambda _k) {\tilde{y}}_0^{(k-2)}. \end{aligned}$$
(3.3)

The new starting guess \({\tilde{y}}_0^{(k)}\) is a generalized convex combination of the two preceding guesses that uses the same proportions as those in Eq. (3.2). Evidently, if the statement of Corollary 2.1 were an equality and no errors had occurred in the numerical solver for the initial value problem, then (3.3) would lead to a starting guess that hits the desired target value \(y^*\) exactly.
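
Explicitly, solving Eq. (3.2) for the coefficient gives

$$\begin{aligned} \lambda _k = \frac{y^* - {\tilde{y}}_{k-2}(b)}{{\tilde{y}}_{k-1}(b) - {\tilde{y}}_{k-2}(b)}; \end{aligned}$$

note that, for the exact solutions, the denominator is nonzero whenever the two preceding initial guesses differ, since \(c_* > 0\) in Corollary 2.1.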

With the value of \(\lambda _k\) in Eq. (3.3) as computed from Eq. (3.2), we obtain the next initial value guess as

$$\begin{aligned} {\tilde{y}}_0^{(k)} = {\tilde{y}}_0^{(k-1)} + \left( y^* - {\tilde{y}}_{k-1}(b) \right) \frac{{\tilde{y}}_0^{(k-1)} - {\tilde{y}}_0^{(k-2)}}{{\tilde{y}}_{k-1}(b) - {\tilde{y}}_{k-2}(b)}. \end{aligned}$$
(3.4)

Thus, the correction term that takes us from the previous initial value \({\tilde{y}}_0^{(k-1)}\) to the next starting guess \({\tilde{y}}_0^{(k)}\) is the error in the previous terminal value, \(y^* - {\tilde{y}}_{k-1}(b)\), multiplied by the proportionality factor given by the quotient of the difference of the two preceding initial values and the difference of the two preceding terminal values.

Note that formula (3.4) for \({\tilde{y}}_0^{(k)}\) is independent of the actual positions of \({\tilde{y}}_0^{(k-1)}\) and \({\tilde{y}}_0^{(k-2)}\) relative to each other or to \(y^*\). This formulation was chosen deliberately to avoid any case distinctions based on these relative positions when executing our proportional secting method. The reference point in Eq. (3.4) is always \(y^*\). The algorithm computes \({\tilde{y}}_0^{(k)}\) whose associated terminal value \({\tilde{y}}_k(b)\) is closer to \(y^*\) than at least one of those generated by the initial values \({\tilde{y}}_0^{(k-1)}\) or \({\tilde{y}}_0^{(k-2)}\). Once \({\tilde{y}}_0^{(k)}\) and the associated terminal value \({\tilde{y}}_k(b)\) have been computed, we drop the oldest data pair \({\tilde{y}}_0^{(k-2)}\) and \({\tilde{y}}_{k-2}(b)\), continue with the pair with indices k and \(k-1\), and iterate until \(|{\tilde{y}}_k(b) - y^*|\) has dropped below the required accuracy threshold.

Compared to classical bisection, our approach has two significant advantages:

  1. Before a classical bisection method can be started, the correct initial value y(a) of the solution y must be known to lie inside the search interval, i.e., two numbers \(\underline{y_0}\) and \(\overline{y_0}\) with \(y(a) \in [\underline{y_0}, \overline{y_0}]\) must have been computed for the actual solution y with \(y(b) = y^*\). Any first guess \(y_0^{(0)}\) provides one of the search interval bounds, but to find another one that lies on the opposite side of the unknown exact value of y(a), further iterations are necessary. The proportional secting method does not require any of this; to compute \({\tilde{y}}_0^{(k+1)}\) we do not need to know anything about the relative positions of y(a), \({\tilde{y}}_0^{(k)}\), and \({\tilde{y}}_0^{(k-1)}\).

  2. In classical bisection, one starts with the initial interval \([\underline{y_0}, \overline{y_0}]\) in which the exact solution’s value for y(a) is known to be located. In each iteration step, the size of this interval (and hence the accuracy with which one knows the correct initial value) is reduced by one half. While this method clearly converges, it is easy to see that its convergence is typically rather slow. When the interval \([\underline{y_0}, \overline{y_0}]\) is large, classical bisection often requires very many iterations for an acceptable accuracy in the \(10^{-6}\) or \(10^{-8}\) range. Our examples demonstrate that our proportional secting scheme reduces the size of the initial search interval much faster and thereby solves the shooting problem with fewer iterations.

Remark 3.3

Searching for the correct initial value amounts to solving a nonlinear equation. The proportional secting method solves this nonlinear equation by the secant method. Such an approach for handling fractional terminal value problems was briefly mentioned by Ford and Morgado [14, Section 3]. However, their focus was on the selection of IVP solvers and not on shooting strategies. The authors of [14] have neither stated any properties of this approach nor provided an analysis, nor given any reasons for using this method. The two main advantages of the secant method over the bisection approach seem to have gone unnoticed so far.

Remark 3.4

Our approach always replaces the older of the two previous initial values, viz. \({\tilde{y}}_0^{(k-2)}\), by the newly computed value \({\tilde{y}}_0^{(k)}\) and then proceeds to the next iteration with the pair \(({\tilde{y}}_0^{(k-1)}, {\tilde{y}}_0^{(k)})\). If \(y^* = y(b)\) lies inside the interval bounded by \({\tilde{y}}_{k-1}(b)\) and \({\tilde{y}}_{k-2}(b)\), we have obtained a guaranteed enclosure

$$\begin{aligned} y(a) \in \left[ \min \left\{ {\tilde{y}}^{(k-1)}_0, {\tilde{y}}^{(k-2)}_0 \right\} , \max \left\{ {\tilde{y}}^{(k-1)}_0, {\tilde{y}}^{(k-2)}_0 \right\} \right] \end{aligned}$$

of the solution’s exact initial value. Our algorithm does not guarantee such an enclosure in any iteration, and in general we do not know whether

$$\begin{aligned} y(a) \in \left[ \min \left\{ {\tilde{y}}^{(k)}_0, {\tilde{y}}^{(k-1)}_0 \right\} , \max \left\{ {\tilde{y}}^{(k)}_0, {\tilde{y}}^{(k-1)}_0 \right\} \right] . \end{aligned}$$

Since for most practical applications it is not relevant to have this information, we actually do not consider this a major drawback. We could modify proportional secting so that it retains the enclosure property once it has been obtained. Instead of always replacing the older of the two previous values, we might just replace the value that is on the same side of the exact solution as the new one, thus employing the regula falsi (false position method). However, using the regula falsi here comes with the added cost of potentially requiring significantly more iterations before reaching acceptable accuracy. Therefore we do not pursue this idea further.

Figure 1 visualizes the proportional secting iterations for the terminal value problem

$$\begin{aligned} D_0^\alpha y(t) = \frac{1}{t+1} \sin (t \cdot y(t)) \ \text { with } \ y(20) = y^* = 0.8360565 \end{aligned}$$

that we will discuss in more detail in Example 5.3 below. Our interval of interest is \([a,b] = [0, 20]\). We start with \({\tilde{y}}_0^{(0)} = y^* \approx 0.836\) at \(a = 0\), use the BDF2 solver for the initial value problem, and obtain the black approximate solution graph that arrives at \({\tilde{y}}_0(b) \approx 0.57\) at time \(b = 20\). For simplicity’s sake we follow Remark 3.2 and construct the next initial value \({\tilde{y}}_0^{(1)}\) based on formula (3.1) with \({\hat{c}} = 1\). Since \({\tilde{y}}_0(b) \approx 0.57 < 0.836\ldots = y^*\), this moves the initial value exactly \(y^* - {\tilde{y}}_0(b)\) units up to obtain \({\tilde{y}}^{(1)}_0 \approx 1.1\) and subsequently \({\tilde{y}}_1(b) \approx 0.89\) when traveling along the blue graph. The next solution graph from \({\tilde{y}}^{(2)}_0 \approx 1.05\) (shown in red) arrives about halfway between \({\tilde{y}}_1(b)\) and \(y^*\) at time b. The dotted graph finally reaches \(y^*\) at b in 8 iterations with a \(10^{-15}\) error. An absolute error of approximately \(10^{-10}\) at the terminal point b takes 7 iterations, and 6 iterations suffice to obtain a \(10^{-7}\) accuracy for this example. Thus, each extra iteration gives us 3 to 4 more accurate digits at time b.

Fig. 1 Visualization of the behavior of the algorithm when applied to Example 5.3. The dotted curve is the numerical solution after 8 iterations of our shooting method; it cannot be visually distinguished from the exact solution

3.5 Selection of the Step Size for the Numerical IVP Solver

To reduce the computational cost of an iterative shooting algorithm, Diethelm [7] has proposed to vary the step size over the iterations. When the iteration counter k of the “shots” is small, one is presumably still relatively far from the exact solution because the initial value is not sufficiently accurate, so it does not make sense to solve the initial value problem with high accuracy, and one can use relatively large steps in these early iterations. Our numerical experiments in Sect. 5 indicate that no such step size varying procedure, which would be difficult to implement, is necessary for any of the three variants of our proportional secting approach, because we always arrive at a very accurate solution in a small number (3 to 8 instead of 18 to 65) of iterations. Therefore we have decided to use a fixed step size and the same IVP solver in all shooting iterations.

3.6 Algorithmic Description of the Proportional Secting Scheme

We now write out the proportional secting method in a formal, pseudo-code-like manner in Algorithm 3.1 below.

[Algorithm 3.1: formal pseudo-code of the proportional secting shooting method]
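
The complete procedure can also be summarized in the following MATLAB sketch. It is only an illustration under our own naming conventions, not the full implementation provided in [12]; fde_solve is a hypothetical placeholder for any fractional IVP solver (e.g. ABM or BDF2 with N steps on \([a,b]\)), and chat is the factor from Eq. (3.1) chosen according to Remark 2.2, Remark 2.4, or simply \({\hat{c}} = 1\):

```matlab
% Sketch of the proportional secting shooting method (Algorithm 3.1).
% fde_solve(alpha, f, a, b, y0, N) is a placeholder for a fractional IVP
% solver returning the numerical solution on the uniform grid as a vector
% of length N+1; tol is the termination threshold for |y_k(b) - y*|.
function [y, y0new, iter] = prop_secting(alpha, f, a, b, ystar, N, tol, chat)
    y0old = ystar;                              % first guess (Sect. 3.2)
    y     = fde_solve(alpha, f, a, b, y0old, N);
    ybold = y(end);
    iter  = 1;
    if abs(ybold - ystar) <= tol, y0new = y0old; return; end
    y0new = y0old + (ystar - ybold) / chat;     % second guess, Eq. (3.1)
    while true
        y     = fde_solve(alpha, f, a, b, y0new, N);
        ybnew = y(end);
        iter  = iter + 1;
        if abs(ybnew - ystar) <= tol, break; end
        % proportional secting update, Eq. (3.4); assumes ybnew ~= ybold
        y0next = y0new + (ystar - ybnew) * (y0new - y0old) / (ybnew - ybold);
        y0old  = y0new;   ybold = ybnew;        % drop the oldest pair
        y0new  = y0next;
    end
end
```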

4 Analysis of the Algorithm

For a better understanding of the performance and behavior of our algorithm, we now give a short theoretical analysis.

4.1 Accuracy and Convergence

We begin with a look at error estimates. Besides rounding and truncation errors, the total error of our numerical scheme has two main components originating from two different sources. When solving the terminal value problem (1.3) in the k-th iteration of the shooting procedure, we solve a nearby problem starting from the guess \(y(a) = {\tilde{y}}_0^{(k)}\) and obtain \(y_k\) with \(y_k(b) \ne y(b) = y^*\) for the exact solution y. This gives one component \(e_1(k)\) of the total error. Additionally, there is a second error component associated with the k-th iterative solution \(y_k\), namely the computational error \(e_2(k, N)\) inherent in the numerical integration algorithm that we use. \(e_2(k, N)\) depends on the number N of grid points of the fractional IVP solver and on the k-th initial value problem. We shall assume that the grid points are uniformly spaced as \(t_j = a + j (b-a) / N\) for \(j = 0, 1, 2, \ldots , N\).

We first consider the limit case \(k \rightarrow \infty \), i.e. we assume that we have done so many shooting iterations that the exact terminal value \(y_\infty (b) = y(b) = y^*\) is reached numerically. We denote the solutions for this special terminal value problem by \(y_\infty \) for the exact solution and by \({\tilde{y}}_{\infty , N}\) for the numerical solution where, in contrast to the convention used in the remainder of this paper, the notation explicitly indicates the number N of grid points used in the IVP solver. Since k cannot be varied any more in the limit case, it remains to study the influence of the parameter N, i.e., the number of grid points of the IVP solver, on the error. The following theorem indicates that the convergence order of the underlying IVP solver is retained.

Theorem 4.1

Assume the hypotheses of Theorem 2.1 and that the IVP solver approximates the solution of the given fractional differential equation in Eq. (1.3) with an \(O(N^{-p})\) convergence order for some constant \(p > 0\) and for any initial condition. Then

$$\begin{aligned} \max _{j = 0, 1, 2, \ldots , N} | y(t_j) - {\tilde{y}}_{\infty , N}(t_j) | = O(N^{-p}). \end{aligned}$$
(4.1)

Remark 4.1

In Theorem 4.1, we require the initial value solver to solve the given differential equation with an \(O(N^{-p})\) error for every initial value, not just for the special initial value of the exact solution of the given terminal value problem. This requirement is due to a special property of fractional differential equations: The solutions of fractional differential equations tend to have only weak smoothness properties in general, but may behave much more smoothly for some exceptional initial values, see Diethelm [6, Section 6.4]. If the exact solution of the terminal value problem has such an exceptional initial value then the IVP solver may compute a rapidly converging numerical approximation when starting from this exact initial value, but the convergence with respect to N is generally much slower for all other, even nearby, initial values.

Proof

Our proof has two steps. First we prove that

$$\begin{aligned} | y(t_0) - {\tilde{y}}_{\infty , N}(t_0) | = O(N^{-p}) \end{aligned}$$
(4.2)

and then we use (4.2) to show Eq. (4.1).

In an indirect proof, we assume that (4.2) is not true. Then there exists a function \(\phi : {\mathbb {N}} \rightarrow {\mathbb {R}}\) and a strictly increasing sequence \((n_\mu )_{\mu =1}^\infty \) of positive integers such that \(\lim _{\mu \rightarrow \infty } \phi (n_\mu ) = \infty \) and

$$\begin{aligned} | y(t_0) - {\tilde{y}}_{\infty , n_\mu }(t_0) | \ge c n_\mu ^{-p} \phi (n_\mu ) \end{aligned}$$
(4.3)

for all sufficiently large \(\mu \) and some constant \(c > 0\). By definition of \({\tilde{y}}_{\infty , N}\), we have \({\tilde{y}}_{\infty , N}(t_N) = {\tilde{y}}_{\infty , N}(b) = y^*\) and thus

$$\begin{aligned} 0 = N^p \cdot | y^* - {\tilde{y}}_{\infty , N}(t_N) | = N^p \cdot | y^* - y_\infty (t_N) + y_\infty (t_N) - {\tilde{y}}_{\infty , N}(t_N) | \ge | z_{1,N} - z_{2,N} | \end{aligned}$$
(4.4)

where

$$\begin{aligned} z_{1,N} = N^p \cdot | y^* - y_\infty (t_N) | \quad \text{ and } \quad z_{2,N} = N^p \cdot | y_\infty (t_N) - {\tilde{y}}_{\infty , N}(t_N) |. \end{aligned}$$

Moreover, whenever N occurs in the sequence \((n_\mu )\), the following holds:

  • The quantity \(z_{1,N}\) is \( N^p\) times the absolute value of the difference at \(t = b\) between the exact solution of the given terminal value problem (1.3), i.e. the solution to the initial value problem for the associated differential equation with the initial value \(y(t_0)\), and the function \(y_\infty \), i.e. the solution to the initial value problem for the same differential equation but with the initial value \({\tilde{y}}_{\infty , N}(t_0)\). Thus, by Corollary 2.1 and Eq. (4.3), we obtain

    $$\begin{aligned} z_{1,N} \ge N^p \cdot | y(t_0) - {\tilde{y}}_{\infty , N}(t_0) | \cdot E_\alpha (\tilde{\ell }_*(b) (b-a)^\alpha ) \ge c N^p N^{-p} \phi (N) = c \phi (N) \rightarrow \infty . \end{aligned}$$
  • The quantity \(z_{2,N}\) is \(N^p\) times the absolute value of the error of the numerical solution to the initial value problem for \(y_\infty \) at the point \(t_N\). By assumption, the IVP solver converges at an \(O(N^{-p})\) rate, and therefore there exists some constant \(c' > 0\) such that \(0 \le z_{2,N} \le c' N^p N^{-p} = c'\) for all sufficiently large N in the sequence \((n_\mu )\).

Consequently, for sufficiently large N in the sequence \((n_\mu )\), we have \(z_{1,N} > z_{2,N}\), and hence the rightmost entry of Eq. (4.4) is strictly positive, giving the required contradiction. So Eq. (4.2) has been proved.

Now note that

$$\begin{aligned} \max _{j = 0, 1, 2, \ldots , N} | y(t_j) - {\tilde{y}}_{\infty , N}(t_j) | \le d_1(N) + d_2(N) \end{aligned}$$

where

$$\begin{aligned} d_1(N) = \sup _{t \in [a, b]} | y(t) - y_\infty (t) | \quad \text{ and } \quad d_2(N) = \max _{j = 0, 1, 2, \ldots , N} |y_\infty (t_j) - {\tilde{y}}_{\infty , N}(t_j)|. \end{aligned}$$

Moreover, \({\tilde{y}}_{\infty ,N}(t_0) = y_\infty (t_0)\) because the initial value problem solver that generates the approximation \({\tilde{y}}_{\infty , N}\) is exact at the initial point, so that

$$\begin{aligned} d_1(N) \le |y(t_0) - y_\infty (t_0)| \cdot E_\alpha (\tilde{\ell }^*(b) (b-a)^\alpha ) \le |y(t_0) - {\tilde{y}}_{\infty ,N}(t_0)| \cdot E_\alpha (\tilde{\ell }^*(b) (b-a)^\alpha ) = O(N^{-p}) \end{aligned}$$

by Corollary 2.1 and Eq. (4.2). Moreover, due to the convergence rate of the initial value problem solver,

$$\begin{aligned} d_2(N) = \max _{j = 0, 1, 2, \ldots , N} |y_\infty (t_j) - {\tilde{y}}_{\infty ,N}(t_j)| = O(N^{-p}) \end{aligned}$$

for sufficiently large N. \(\square \)

If only finitely many shooting steps are performed, similar error bounds hold.

Theorem 4.2

Let \({\tilde{y}}_k\) be the numerical solution of a given terminal value problem (1.3) obtained after \(k \ge 2\) steps of our shooting method. Then, under the assumptions of Theorem 4.1, we have

$$\begin{aligned} \max _{j = 0, 1, 2, \ldots , N} | y(t_j) - {\tilde{y}}_k(t_j) | \le e_1(k) + e_2(k, N) \end{aligned}$$

where

$$\begin{aligned} e_1(k) = \sup _{t \in [a, b]} | y(t) - y_k(t) | \le |y(a) - {\tilde{y}}_0^{(k)}| \cdot E_\alpha (\tilde{\ell }^*(b) (b-a)^\alpha ) \end{aligned}$$

depends only on k and

$$\begin{aligned} e_2(k, N) = \max _{j = 0, 1, 2, \ldots , N} |y_k(t_j) - {\tilde{y}}_k(t_j)| = O(N^{-p}) \end{aligned}$$

depends on k and the chosen number N of grid points.

Proof

This can be shown much in the same way as Theorem 4.1. \(\square \)

Numerical experiments in Sect. 5 will illustrate these estimates.

4.2 Stability and Robustness

The numerical stability and robustness of an algorithm are relevant issues when assessing its practical usefulness. For shooting algorithms we have to deal with two essential aspects in this context.

The fundamental idea here is to solve initial value problems that are close to, but not identical to, the initial value problem that is equivalent to the given terminal value problem. From Theorem 2.3 and Corollary 2.1, we can see the well-posedness of both the original terminal value problem and the associated equivalent initial value problem. Thus small changes in either of these problems, no matter whether they are due to the way in which the shooting method works, to rounding errors in the given data, to inherent errors of the initial value problem solver, or to anything else, do not lead to significant perturbations of the algorithm’s output.

Another key component is the initial value solving algorithm that is executed in every iteration of a shooting method. If an integrator with poor stability properties is used, or if the chosen step size is too large to guarantee stability, then the instability will be propagated into the shooting method and may render its output meaningless. Fortunately, the stability properties of many frequently used IVP solvers for fractional ODEs are well understood: for the Adams-Bashforth-Moulton method (ABM), see Garrappa [16]; for fractional linear multistep methods such as the fractional BDF2 or the fractional trapezoidal method, see Lubich [25]; for additional information, see Garrappa [18]. These well established IVP solvers are highly suitable for our purposes and will be used in the numerical examples of Sect. 5.

5 Numerical Results

Here we present numerical experiments with our new proportional secting scheme, in all three variants of choosing the factor \({\hat{c}}\) required in Eq. (3.1) for computing the second guess for the initial value before continuing with further proportional secting iterations. We compare our new method with the conventionally used shooting method based on bisection. Our algorithm has been implemented in MATLAB R2022a on a notebook with an Intel Core i7-8550U CPU clocked at 1.8 GHz running Windows 10 and in MATLAB R2022b on a MacBook Pro with a 2.4 GHz Quad-Core Intel Core i5 and 16 GB RAM.

In all cases, we have tested the shooting methods with two different solvers for the initial value problems, using the Adams-Bashforth-Moulton (ABM) scheme and the second order backward differentiation formula (BDF2) of Lubich [25]. The ABM method was implemented in a P(EC)\(^m\)E structure with four corrector iterations [4]. BDF2 is an implicit method and hence needs to solve a nonlinear equation at each time step to compute the corresponding approximate solution. For this we use Garrappa’s implementation [18]. We terminate its Newton iterations when two successive values differ by less than \(10^{-10}\). All shooting iterations are terminated when the approximate solution \({\tilde{y}}_k(b)\) at the endpoint b of [a, b] differs by at most \(\varepsilon \) from the desired value \(y(b) = y^*\); in our tests we used \(\varepsilon = 10^{-6}\), \(\varepsilon = 10^{-8}\), and \(\varepsilon = 10^{-10}\).

The tables below list the chosen initial value problem solver together with the corresponding step size, the maximal error over the interval of interest and the number of iterations that each of the shooting methods needed with the respective combination of IVP solver and step size to converge up to the required accuracy. In this context, we note that, since the shooting strategies—and hence the sequences of the chosen initial values—differ from each other, the approximate solutions computed by the four different approaches do not coincide exactly. Therefore, the respective maximal errors are also not precisely identical. However, at least for \(\varepsilon = 10^{-8}\) and \(\varepsilon = 10^{-10}\), the maximal errors agree with each other at least within the accuracy listed in the tables. For \(\varepsilon = 10^{-6}\), the error variations are somewhat larger but their values have a common order of magnitude. For this case the corresponding table columns list the errors for the worst of our four guessing approaches.

Example 5.1

Our first example is the terminal value problem

$$\begin{aligned} D_0^\alpha y(t)&= \frac{8! \cdot t^{8-\alpha } }{\varGamma (9-\alpha )} - \frac{3 \varGamma (5+\alpha /2) t^{4-\alpha /2}}{\varGamma (5-\alpha /2)} + \frac{9}{4} \varGamma (1+\alpha ) + \left( \frac{3}{2} t^{\alpha /2} - t^4 \right) ^3 - |y(t)|^{3/2}, \\ y(1)&= \frac{1}{4}, \end{aligned}$$

whose exact solution is

$$\begin{aligned} y(t) = t^8 - 3 t^{4+\alpha /2} + \frac{9}{4} t^\alpha . \end{aligned}$$

This is a standard example used for testing numerical methods in fractional calculus; cf., e.g., [10]. We report the results for the special case \(\alpha = 0.3\) in Tables 1, 2 and 3 below.
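
For reference, the problem data of Example 5.1 can be set up with a few anonymous functions (a sketch; the variable names are ours):

```matlab
% Example 5.1 with order al = 0.3: right-hand side f(t,y), exact
% solution yex(t), and terminal condition y(1) = 1/4.
al  = 0.3;
f   = @(t, y) factorial(8) .* t.^(8-al) ./ gamma(9-al) ...
      - 3 .* gamma(5+al/2) .* t.^(4-al/2) ./ gamma(5-al/2) ...
      + (9/4) .* gamma(1+al) + ((3/2) .* t.^(al/2) - t.^4).^3 - abs(y).^(3/2);
yex = @(t) t.^8 - 3 .* t.^(4+al/2) + (9/4) .* t.^al;
a = 0;  b = 1;  ystar = 1/4;     % terminal value problem on [0, 1]
```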

Table 1 Computational cost and accuracy obtained when solving Example 5.1 for \(\alpha = 0.3\) with different numerical methods and \(\varepsilon = 10^{-6}\)
Table 2 Computational cost and accuracy obtained when solving Example 5.1 for \(\alpha = 0.3\) with different numerical methods and \(\varepsilon = 10^{-8}\)
Table 3 Computational cost and accuracy obtained when solving Example 5.1 for \(\alpha = 0.3\) with different numerical methods and \(\varepsilon = 10^{-10}\)

The results in Tables 1, 2 and 3 show the following:

  • Even if a very small step size is used in the initial value solver, the best accuracy that we can achieve is limited by the parameter \(\varepsilon \) that governs the termination criterion of the shooting iterations. Generally, the total error is slightly larger than \(\varepsilon \) or determined by the chosen step size: for relatively small steps, the error component \(e_1(k)\) dominates the overall error estimate established in Theorem 4.2, whereas for larger steps \(e_2(k,N)\) is the dominant error contribution.

  • In Tables 2 and 3, where the accuracy requirement \(\varepsilon \) is much smaller than the error term \(e_2(k,N)\) from Theorem 4.2, the convergence rate of BDF2 is around \(O(h^2)\), which is exactly the rate for the corresponding initial value problems, see [25]. This confirms the behavior predicted by Theorem 4.1, where we dealt with the case \(\varepsilon = 0\). For the Adams method, the expected convergence rate is again \(O(h^2)\) for this example, see Lubich [4]. But the actual rate in the data is a little lower at \(O(h^{1.9})\), because the step size is still too large for the asymptotic behavior to have set in (the empirical rates are determined from the tabulated errors as recalled after this list).

  • Varying the strategy for selecting the next initial guess (i.e., switching between classical bisection and proportional secting) while keeping the IVP solver unchanged has no influence on the final result: changing the guessing strategy merely means solving the same nonlinear equation by a different method, which should not lead to significantly different results.

  • There is, however, a substantial difference in the number of iterations that the two guessing strategies require to obtain a result with the desired accuracy. Classical bisection needs many iterations, and this induces a rather high computational cost. The proportional secting method performs much better. It typically requires only between 8% and 24% of the iterations needed by classical bisection.

  • A simple run time comparison of the two algorithms reflects this speedup.

    • For example, for the case \(\varepsilon = 10^{-10}\) shown in Table 3, the Adams method (ABM) with stepsize 0.001 requires 1.53 s to converge when combined with classical bisection, but only 0.15 s in combination with the first variant of proportional secting with \({\hat{c}} = 1\) in Eq. (3.1), or 0.21 s with either of its two other variants as specified in Remarks 2.2 and 2.4, respectively. Here the simplest choice of \({\hat{c}} = 1\) is the fastest overall since calculating the quantities \(c^*\) and possibly also \(c_*\) of Eq. (2.6) for \({\hat{c}}\) is time consuming.

    • When using the BDF2 solver for the initial value problems with stepsize 0.0005, we measured run times of 3.38 s for the classical bisection method, 0.34 s for the first variant of the proportional secting algorithm, and 0.43 s for its second and third variants.

  • The fractional differential equation of Example 5.1 is not dissipative. For the proportionality factor \({\hat{c}}\) that is required for the second guess of the initial value when using Remark 2.2, the lower inclusion bound is \(c_* \approx 0.23\) and the upper bound is \(c^* \approx 2.27 \cdot 10^4\). This inclusion interval is rather large, and the strategy of Remark 2.2 gives us a rather inaccurate approximation of the optimal \({\hat{c}}\) for the second guess. Our new algorithm converges in a small number of iterations when run with the optimal value for \({\hat{c}}\) in the second starting guess, and with the alternative versions of Remark 2.4 or \({\hat{c}} = 1\) it needs the same small number of iterations. The proportional secting method is very forgiving of bad guesses.
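
Empirical convergence rates like the \(O(h^{1.9})\) mentioned above can be estimated from the maximal errors \(E(h_1)\) and \(E(h_2)\) observed for two step sizes \(h_1 > h_2\) in the standard way (a generic device, not specific to our method):

$$\begin{aligned} p \approx \frac{\log \big ( E(h_1) / E(h_2) \big )}{\log ( h_1 / h_2 )}. \end{aligned}$$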

Example 5.2

Our second example deals with the terminal value problem

$$\begin{aligned} D_0^\alpha y(t) = - \frac{3}{2} y(t) \quad \text {with} \quad y(7) = \frac{14}{5} E_\alpha \left( - \frac{3}{2} \cdot 7^\alpha \right) \approx 0.6476 \end{aligned}$$

with \(\alpha = 0.3\) and the Mittag-Leffler function \(E_\alpha \) of order \(\alpha \). The exact solution is

$$\begin{aligned} y(t) = \frac{14}{5} E_\alpha \left( - \frac{3}{2} t^\alpha \right) . \end{aligned}$$

The results are in Tables 4, 5 and 6. For this and the third example below we list the data for fewer step sizes than we did for Example 5.1 earlier.

Table 4 Computational cost and accuracy obtained when solving Example 5.2 for \(\alpha = 0.3\) with different numerical methods and \(\varepsilon = 10^{-6}\)
Table 5 Computational cost and accuracy obtained when solving Example 5.2 for \(\alpha = 0.3\) with different numerical methods and \(\varepsilon = 10^{-8}\)
Table 6 Computational cost and accuracy obtained when solving Example 5.2 for \(\alpha = 0.3\) with different numerical methods and \(\varepsilon = 10^{-10}\)

Note the following:

  • In Example 5.2, the exact solution satisfies \(y(b) = y(7) \approx 0.6476\) with \(y(a) = y(0) = 14/5 = 2.8\), so y(b) is a relatively poor initial guess for y(0). Nevertheless, all of our shooting start guessing rules perform very similarly to the way they did in Example 5.1.

  • The performance comparisons between bisection and proportional secting are essentially the same as those for Example 5.1. Proportional secting reaches the required accuracy with only between 8.3% and 15% of the iterations that the bisection method needs and the run times decrease correspondingly.

  • For both shooting strategies, the accuracy of the BDF2 second-order backward differentiation formula is much better than the accuracy of the ABM method.

  • This example uses a dissipative fractional differential equation that is linear, homogeneous and has the constant negative growth coefficient \(-3/2\). For fractional ODEs with constant coefficients, the proportionality factor \({\hat{c}}\) discussed in Remark 2.2 is simple to compute. In this example the lower bound \(c_*\) coincides with the upper bound \(c^*\), and both have the value \(0.23\ldots \), which we can use for \({\hat{c}}\) (see the short computation below). This leads to a slight reduction in the number of required iterations.
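
In more detail: for this equation the difference quotients in (2.4) are all equal to the constant coefficient \(-3/2\), so (2.6) gives

$$\begin{aligned} {\hat{c}} = c_* = c^* = E_\alpha \left( - \frac{3}{2} \cdot 7^\alpha \right) = \frac{y(7)}{y(0)} \approx \frac{0.6476}{2.8} \approx 0.231 \end{aligned}$$

for \(\alpha = 0.3\), in agreement with the value \(0.23\ldots \) stated above.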

Example 5.3

The third example is based on the initial value problem

$$\begin{aligned} D_0^\alpha y(t) = \frac{1}{t+1} \sin (t \cdot y(t)), \qquad y(0) = 1, \end{aligned}$$

with \(\alpha = 0.7\) on the interval \([a,b] = [0, 20]\).

The exact solution for this problem is unknown. Using a second order backward differentiation formula with 16,000,000 steps on \([a,b] = [0, 20]\), and thus a stepsize of \( 20/(16 \times 10^6) = 1.25 \times 10^{-6}\), we have computed the approximate solution shown in Fig. 1 as the curve highlighted by the black dots. We are confident that this is very close to the exact solution, whose terminal value is \(y(b) = y(20) \approx 0.8360565\). By replacing the given initial condition above by this terminal condition, we obtain a terminal value problem that we try to solve with the fractional ODE shooting methods of this paper. This problem appears more challenging than those of Examples 5.1 and 5.2 because here we work on a much larger interval and with a decaying oscillatory exact solution. The results are listed in Tables 7, 8 and 9. Since no precise information about the exact solution is available, the listed errors are computed with respect to the reference solution constructed above.

Table 7 Computational cost and accuracy obtained when solving Example 5.3 for \(\alpha = 0.7\) with different numerical methods and \(\varepsilon = 10^{-6}\)
Table 8 Computational cost and accuracy obtained when solving Example 5.3 for \(\alpha = 0.7\) with different numerical methods and \(\varepsilon = 10^{-8}\)
Table 9 Computational cost and accuracy obtained when solving Example 5.3 for \(\alpha = 0.7\) with different numerical methods and \(\varepsilon = 10^{-10}\)

These tables again exhibit a similar behavior as in Examples 5.1 and 5.2. The proportional secting method is substantially faster than the classical bisection method: it requires significantly fewer shooting iterations to converge to the required accuracy. The same holds true for the run time measurements. For \(\varepsilon = 10^{-8}\) (see Table 8) and a BDF2 solver with stepsize 0.02, the run time is 0.48 s for the classical bisection method, while proportional secting needs only 0.14 s for the variant with \({\hat{c}} = 1\) in Eq. (3.1), 0.18 s when \({\hat{c}}\) in Eq. (3.1) is chosen according to Remark 2.2, and 0.15 s when the idea of Remark 2.4 is used to compute \({\hat{c}}\).

There is a significant difference in Example 5.3 compared to Example 5.2: Using the strategy of Remark 2.2 to compute the second guess for the initial value still leads to convergence, but it is slightly slower than when simply using \({\hat{c}} = 1\) as suggested in Remark 3.2. This fractional differential equation is not dissipative, and its containment interval bounds given by (2.6) are \(c_* \approx 0.05\) and \(c^* \approx 5 \times 10^7\). Thus, the first containing interval for the correct proportionality factor is extremely large; the midpoint method of Remark 2.2 then starts from a very large error and continues with a relatively poor second approximate solution, so that more iterations are needed until convergence.

6 Conclusion

We have discussed shooting methods for the numerical solution of fractional terminal value problems with Caputo derivatives. Choosing the best numerical IVP solver was not our focus; this has already been discussed extensively in Refs. [14, 16, 18, 19]. Instead we have investigated and tested algorithms that select the initial values for each iteration in fractional ODE shooting procedures. Classical bisection is often recommended and most often used, but it converges rather slowly, requiring many iterations until the actual solution is approximated well. The newly proposed proportional secting method performs much better in this respect. It computes the second and all subsequent starting value guesses for the shooting procedure in a rather different way. Three variants, differing only in the guess of the second initial value, have been proposed, and their respective performances have been shown to differ only slightly in speed and accuracy. Our experimental findings are supported by the results of analytical investigations.

Remark 6.1

The Caputo differential operator listed in Eq. (1.2) is not the only fractional differential operator of order \(\alpha \) with starting point a for which one can try to formulate terminal value problems. Indeed, it is conceivable to use the so-called Hilfer fractional derivatives of order \(\alpha \) and type \(\mu \in [0,1]\) with starting point a [22, Definition 3.3] instead. The special case \(\mu = 1\) of this class of operators is, on a very large set of functions, equivalent to the Caputo operators that we have discussed here; for \(\mu = 0\) one obtains the Riemann-Liouville derivatives [6, Chapter 2]. In principle, one could use the approach that we have proposed here to handle such generalized terminal value problems. One would then have to replace the initial value problem solving subroutines for Caputo IVPs in Algorithm 3.1 by corresponding functions for the modified operators, which is a relatively straightforward matter. However, to theoretically justify the proportional secting idea in this generalized context, one would also need to generalize our Corollary 2.1 in a corresponding way. Such a result currently does not seem to be available.

7 Software

The MATLAB source codes of the algorithms described in this paper, including all required auxiliary functions, can be downloaded from a dedicated repository [12] on the Zenodo platform, thus allowing all readers to reproduce the results of our numerical experiments. Our functions were tested in MATLAB R2022a and MATLAB R2022b.