1 Introduction

We study a Cauchy type problem with Caputo fractional differentiation. To this end, let \(\alpha \in {\mathbb {R}}\), with \(n:=[\alpha ]+1\in {\mathbb {N}}\) for \(\alpha \not \in {\mathbb {N}}\), and \(n=\alpha \) for \(\alpha \in {\mathbb {N}}\). Further, let \(\alpha _j\in {\mathbb {R}}\) \((j=1, \ldots , \sigma ;\ \sigma \in {\mathbb {N}})\) be such that

$$\begin{aligned} 0=\alpha _0<\alpha _1< \cdots<\alpha _{\sigma }<\alpha . \end{aligned}$$

Our objective is to study the spectral element method for the nonlinear differential equation

$$\begin{aligned}&{}^cD_{a^+}^{\alpha }(u)(x)=f\left[ x,u(x),{}^cD_{a^+}^{\alpha _1}(u)(x), \ldots , {}^cD_{a^+}^{\alpha _\sigma }(u)(x)\right] , \quad x\in [a,b],\nonumber \\&u^{(\kappa )}(a)=b_{\kappa }, \quad b_{\kappa }\in {\mathbb {R}}, \,\,\, \kappa =0,1, \ldots , n-1, \end{aligned}$$
(1.1)

where \({}^cD_{a^+}^{\alpha }\) denotes the Caputo fractional differential operator defined via Riemann-Liouville fractional derivatives and the integral representation:

$$\begin{aligned} {}^cD_{a^+}^{\alpha }(u)(x):= & {} D_{a^+}^{\alpha }\left( u(t)- \sum _{\kappa =0}^{n-1}\frac{u^{(\kappa )}(a)}{\kappa !}(t-a)^{\kappa }\right) (x)\nonumber \\= & {} \frac{1}{\varGamma (n-\alpha )}\int _a^x \frac{u^{(n)}(t)dt}{(x-t)^{\alpha -n+1}} =:{\mathscr {I}}_{a^+}^{n-\alpha }D^n(u)(x), \, D:=\frac{d}{dx}. \nonumber \\ \end{aligned}$$
(1.2)

In this paper we study the one-dimensional case where, for simplicity, we let \([a, b]:=[0,1]=\varOmega \), and denote the right-hand side f of (1.1) by

$$\begin{aligned} F_{\alpha _0,\cdots ,\alpha _{\sigma }}[x,u]:= f\left[ x,u(x),{}^cD_{a^+}^{\alpha _1}(u)(x), \ldots , {}^cD_{a^+}^{\alpha _\sigma }(u)(x)\right] . \end{aligned}$$
(1.3)

For \(u\in AC^n(\varOmega )\) (the space of complex-valued functions having continuous derivatives up to order \(n-1\) in \(\varOmega \), with \(u^{(n-1)}\) absolutely continuous) the Caputo fractional derivative exists almost everywhere in \(\varOmega \); see, e.g., [22].
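For concreteness, definition (1.2) can be checked numerically. The following sketch (ours, not part of the original exposition; Python, with \(a=0\)) approximates \({}^cD_{0^+}^{1/2}(x^2)\) by quadrature and compares it with the closed form \(\varGamma (3)/\varGamma (5/2)\,x^{3/2}\); the substitution \(t=x-s^2\) is only a device to remove the endpoint singularity of the kernel:

```python
import math
import numpy as np

def caputo_half(u_prime, x, m=20000):
    # ^cD_{0+}^{1/2} u(x) = (1/Gamma(1/2)) * int_0^x u'(t) (x - t)^(-1/2) dt;
    # substituting t = x - s^2 removes the endpoint singularity:
    # integral = int_0^{sqrt(x)} 2 u'(x - s^2) ds
    s = np.linspace(0.0, math.sqrt(x), m)
    vals = 2.0 * u_prime(x - s**2)
    h = s[1] - s[0]
    integral = h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))  # trapezoid rule
    return integral / math.gamma(0.5)

x = 0.7
numeric = caputo_half(lambda t: 2.0 * t, x)        # u(x) = x^2, so u'(t) = 2t
exact = math.gamma(3) / math.gamma(2.5) * x**1.5   # known closed form
print(numeric, exact)
```

With the singularity removed, the transformed integrand is a polynomial in s, so the trapezoid rule converges rapidly and the two values agree to high accuracy.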

Below are some related previous studies of problem (1.1). Kilbas et al. [21] derived the existence of a unique \(L_1\) solution for the Cauchy type problem with the Riemann-Liouville fractional derivative:

$$\begin{aligned}&\left( D^{\alpha }_{a^+}u\right) (x)=f[x,u(x)],\, \alpha \in {\mathbb {C}}, \mathfrak {R}(\alpha )>0, \quad x>a,\nonumber \\&\left( D_{a^+}^{\alpha -k}u\right) \left( a^+\right) =b_k, \, b_k\in {\mathbb {C}}, \quad k=1, \ldots , -[-\alpha ], \end{aligned}$$
(1.4)

under the assumption that \(f(x,y)\in L_1[a,b]\) satisfies a Lipschitz condition in y. Their investigation is based on rendering (1.4) into the equivalent Volterra integral equation:

$$\begin{aligned} u(x)=\sum _{j=1}^{-[-\alpha ]}\frac{b_j}{\varGamma (\alpha -j+1)}(x-a)^{\alpha -j} +\frac{1}{\varGamma (\alpha )}\int _a^x\frac{f[t,u(t)]dt}{(x-t)^{1-\alpha }}, \end{aligned}$$
(1.5)

and the Banach fixed point theorem. This problem was first considered by Pitcher and Sewell [32], for \(\alpha \in (0,1)\) and a bounded Lipschitz function f(x, y). A system of such problems is studied by Bonilla et al. [6]. Existence of a unique solution for (1.4), with the Caputo fractional derivative, is given in [11, 13, 14]. In [14] a “shadowing-like” approach reduces, in both the linear and nonlinear cases, the non-rational fractional orders to rational ones and then follows a Grönwall type argument. Kilbas and Marzan [21] studied (1.4), via an integral equation similar to (1.5), with the Caputo fractional derivative:

$$\begin{aligned} \begin{aligned} u(x)=\sum _{\kappa =1}^{-[-\alpha ]-1}&\frac{b_\kappa }{\varGamma (\kappa +1)}(x-a)^{\kappa }+ \frac{1}{\varGamma (\alpha )}\int _a^x\frac{f[t,u(t)]dt}{(x-t)^{1-\alpha }}, \\ u^{(\kappa )}(0)&=b_{\kappa }, \quad \kappa =0, 1 ,\ldots , -[-\alpha ]-1. \end{aligned} \end{aligned}$$
(1.6)

Other approaches include the Laplace transform [17] and the Adomian decomposition method [9]. The system of Caputo fractional differential equations:

$$\begin{aligned} {}^cD_{a^+}^{\alpha }Y(x)=AY(x), \quad Y(0)=Y_0,\qquad \alpha \in [0, 1], \end{aligned}$$

was investigated in [8] and [14], which contain full wellposedness proofs. This approach was generalized to the nonlinear Cauchy type problem with the Riemann-Liouville fractional derivative of order \(\alpha \in {\mathbb {C}}\) (\(\mathfrak {R}(\alpha )>0\)) in [22]:

$$\begin{aligned} D_{a^+}^{\alpha }(u)(x)=f\left[ x,u(x),D_{a^+}^{\alpha _1}(u)(x), \cdots , D_{a^+}^{\alpha _\sigma }(u)(x)\right] , \quad ( x\in [a,b]), \end{aligned}$$
(1.7)

where \(0<\mathfrak {R}(\alpha _1)< \cdots<\mathfrak {R}(\alpha _{\sigma })<\mathfrak {R}(\alpha )\), \(\sigma \ge 2\), giving conditions for a unique \(L_1(a,b)\) solution of (1.7) with \(\alpha \in {\mathbb {C}}\). An explicit approach, via multivariate Mittag-Leffler functions, for the linear version is given in [26]. In [25] an operational method is introduced, where, using B-spline functions, the multi-order fractional differential equation:

$$\begin{aligned} F\left( u(x),{}^cD^{\beta _1}u(x),\ldots , {}^cD^{\beta _m}u(x)\right) =g(x),\quad \beta _i\in {\mathbb {R}}, \end{aligned}$$

is solved by combining the operational matrix of the Caputo fractional derivatives with the collocation method, without using the equivalent Volterra integral equation form.

Other related studies consider initial boundary value problems of sub-diffusion type and combine the spectral Galerkin method in time with finite element [18] or finite difference [42] schemes in space. They construct efficient schemes tackling singularities, which substantially raises the convergence rate.

A main strength of wavelets lies in their ability to remove or reduce singularities; see, e.g., [27]. Wavelet bases have been widely used for alternative representations of integral and differential operators [3, 28], leading to a more thorough study of the underlying equations. As for numerical studies, wavelet bases have significant advantages, e.g., in the study of adaptivity of discontinuous Galerkin schemes [10, 19, 38,39,40,41], the collocation method [29], etc. In this setting, Rehman et al. [34] approach (1.4), with the Caputo fractional derivative, using the Legendre wavelet method and convert the problem to a system of algebraic equations, hence reducing it to finding the unknown coefficients of the system. In another study, Patera [31] introduced the spectral method into finite element schemes, using higher-degree piecewise polynomial bases. This procedure may lead to a higher order of accuracy, where convergence is achieved either by raising the spectral order or by mesh refinement (the hp approach). Compared to standard finite elements, using an hp-mesh, faster convergence can be achieved with fewer degrees of freedom. The disadvantage of the hp approach is that it is not easily applicable to complex geometries.

Fractional calculus and fractional differential equations have applications in numerous fields of science and engineering, such as material science in mechanical engineering, anomalous diffusion, control and robotics, signal processing and system identification, friction modeling, wave propagation, turbulence, seepage in fractal media, etc. [4, 7, 20, 23, 30, 45]. There are several equivalent definitions of the fractional derivative, e.g., the Grünwald-Letnikov, Riemann-Liouville, and Caputo fractional derivatives [33].

In this paper we consider a multiwavelet approach rather than scalar wavelets. This allows higher vanishing moments without extending the supports of the involved functions. Thus a smooth function has negligible projections on most of the basis functions and, hence, can be locally approximated by lower-order polynomials [1]. The advantageous interpolating property of the scaling functions makes multiwavelets more suitable for solving differential equations: a property that helps to find the coefficients of the expansion of a solution. Our objective is to solve the generalized Cauchy type problem (1.1) with the Caputo fractional derivative. We represent the fractional integral operator in multiwavelet bases, which leads to a sparse representation of the solution operator on a finite interval. Doing so, we reduce the problem to the equivalent Volterra integral equation and then apply the multiwavelet spectral element method to obtain a faster discretization scheme. We investigate existence and uniqueness and derive convergence for the proposed scheme.

An outline of the remaining part of the paper is as follows. In Sect. 2 we prove the existence of a unique solution to our model problem. Section 3 is devoted to the properties of multiwavelets and related projections. In Sect. 4 we introduce the multiscale transformation and give a representation of fractional integrals in multiwavelet bases in sparse matrix form. In Sect. 5 we define the multiwavelet spectral element method and prove error estimates via a splitting technique, where we employ Chebyshev polynomials, cf. Richardson [35], to tackle the lack of continuous derivatives in the vicinity of the origin. Finally, in the concluding Sect. 6, we present numerical results justifying the robustness of the constructed scheme.

2 Wellposedness

We show the wellposedness of the solution of the differential equation (1.1) by verifying the existence of a unique solution for the equivalent nonlinear Volterra integral equation:

$$\begin{aligned} u(x)=\sum _{j=0}^{n-1}\frac{b_j}{j!}(x-a)^j+\frac{1}{\varGamma (\alpha )} \int _a^x\frac{F_{\alpha _0,\ldots ,\alpha _{\sigma }}[t,u] dt}{(x-t)^{1-\alpha }}. \end{aligned}$$
(2.1)

For ordinary differential equations, the equivalence of the Cauchy type problem of Riemann-Liouville fractional order and its corresponding nonlinear Volterra integral equation is proved in Kilbas et al. [22], where the existence of a unique solution for this problem is also proved. In this study, however, we consider the Caputo rather than the Riemann-Liouville fractional derivative. Verifying the equivalence between (1.1) and (2.1) is then similar to, though simpler than, the proof of Theorem 3.24 in [22]. Hence, we establish the existence of a unique solution for the Cauchy problem (1.1) in

$$\begin{aligned}&C_{\gamma }^{\alpha , \upsilon }(\varOmega ):=\{ u(x)\in {C}^{\upsilon }(\varOmega ): {}^cD_{a^+}^{\alpha }u\in C_{\gamma }(\varOmega )\},\nonumber \\&\quad \upsilon \in {\mathbb {N}}, \, \alpha >0, \, 0\le \gamma <1, \end{aligned}$$
(2.2)

where \(C_{\gamma }(\varOmega )\) is the space of functions f with \((x-a)^{\gamma }f(x)\in C(\varOmega )\).

To proceed, for a locally integrable function \(f\in {\mathbf {C}}_{n-\alpha }^{\alpha }(\varOmega )\), we define the fractional integral of order \(\alpha \in {\mathbb {R}}\):

$$\begin{aligned} {\mathscr {I}}^{\alpha }(f)(x): =\frac{1}{\varGamma (\alpha )}\int _0^x(x-t)^{\alpha -1}f(t)dt,\quad x>0, \end{aligned}$$
(2.3)

where

$$\begin{aligned}&{\mathbf {C}}_{n-\alpha }^{\alpha }(\varOmega )\\&\quad =\left\{ u(x)\in C_{n-\alpha }(\varOmega ): \left( D_{a^+}^{\alpha }u\right) (x)\in C_{n-\alpha }(\varOmega )\right\} ,\quad n-1<\alpha \le n, \,\,\, (n\in {\mathbb {N}}). \end{aligned}$$
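As a quick sanity check (our illustration, not part of the text), the identity \({\mathscr {I}}^{\alpha }(t^k)(x)=\frac{\varGamma (k+1)}{\varGamma (k+\alpha +1)}x^{k+\alpha }\) can be verified numerically from (2.3); the substitution \(u=(x-t)^{\alpha }\) in the sketch below is merely a device to remove the weak singularity of the kernel:

```python
import math

def frac_int_monomial(alpha, k, x, m=20000):
    # I^alpha(t^k)(x) = (1/Gamma(alpha)) * int_0^x (x - t)^(alpha - 1) t^k dt;
    # substituting u = (x - t)^alpha removes the weak singularity and gives
    # (1/(alpha * Gamma(alpha))) * int_0^{x^alpha} (x - u^(1/alpha))^k du
    U = x**alpha
    h = U / m
    total = 0.0
    for i in range(m + 1):
        u = i * h
        w = 0.5 if i in (0, m) else 1.0   # trapezoidal weights
        total += w * (x - u**(1.0 / alpha))**k
    return total * h / (alpha * math.gamma(alpha))

alpha, k, x = 0.5, 2, 0.8
exact = math.gamma(k + 1) / math.gamma(k + alpha + 1) * x**(k + alpha)
print(frac_int_monomial(alpha, k, x), exact)
```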

We shall use the following result from [22].

Lemma 2.1

For \(0\le \gamma <1, \, \alpha \ge \gamma \), the operator \( {\mathscr {I}}^{\alpha }\) in (2.3) is bounded in \(C_{\gamma }(\varOmega )\),

$$\begin{aligned} \Vert {\mathscr {I}}^{\alpha }f\Vert _{C_{\gamma }}\le d(\varOmega )^{\alpha } \frac{\varGamma (1-\gamma )}{\varGamma (1+\alpha -\gamma )}\Vert f\Vert _{C_{\gamma }}, \end{aligned}$$
(2.4)

where \(d(\varOmega ):=\vert {\varOmega }\vert \) is the size of \(\varOmega \).

Theorem 2.1

Assume that \(0\le \gamma < 1\), \(\gamma \le \alpha \), \(G\subset {\mathbb {C}}\) is an open set, and \(f:\varOmega \times G^{\sigma +1} \rightarrow {\mathbb {C}}\), with \(f[x,u,u_1, \ldots , u_{\sigma }]\in C_{\gamma }(\varOmega )\) for \(u, u_1, \ldots , u_{\sigma }\in G\), satisfies the Lipschitz condition

$$\begin{aligned}&|f[x,u,u_1, \ldots , u_{\sigma }]-f[x,v,v_1, \ldots , v_{\sigma }]|\le \varLambda _{\sigma }\sum _{j=0}^{\sigma }|u_j-v_j|,\nonumber \\&\quad \quad u_j, \, v_j\in G, \end{aligned}$$
(2.5)

where \(\varLambda _{\sigma }>0\) is independent of x. Set \(n_j:=[\alpha _j]+1\) and let \(u^{(\kappa _j)}(a)=b_{\kappa _j}\), \(\kappa _j=0,\ldots , n_j-1\), \(j=1, \ldots , \sigma \), be fixed numbers. If \(n-1<\alpha <n=[\alpha ]+1\), then (1.1) has a unique solution \(u\in C_{\gamma }^{\alpha , n-1}\).

Proof

To show the existence of a unique solution for the Cauchy problem (1.1), it suffices to prove the existence of a unique solution \(u(x)\in C^{n-1}(\varOmega )\), cf. (2.2), for the equivalent nonlinear Volterra integral equation (2.1). To this end, we pick a point \(x_1\in \varOmega \) (with \([a,x_1]\subset [a,b]\); such an \(x_1\) always exists) for which the following inequality holds:

$$\begin{aligned} \eta :=\varLambda _{\sigma }\sum _{\kappa =0}^{n-1}\sum _{j=0}^{\sigma } \frac{(x_1-a)^{(\alpha -\kappa -\alpha _j)}}{\varGamma (\alpha -\kappa -\alpha _j+1)}<1. \end{aligned}$$
(2.6)
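For illustration (our own toy parameters, not from the text), \(\eta \) in (2.6) can be computed explicitly, e.g., for \(a=0\), \(\alpha =1.8\) (so \(n=2\)), \(\sigma =1\), \(\alpha _1=0.5\) and \(\varLambda _{\sigma }=1\); since every exponent \(\alpha -\kappa -\alpha _j\) is positive in this case, \(\eta \rightarrow 0\) as \(x_1\rightarrow a\), so a contraction interval exists:

```python
import math

def eta(x1, alpha=1.8, alphas=(0.0, 0.5), Lam=1.0, a=0.0):
    # eta in (2.6): Lam * sum_{kappa=0}^{n-1} sum_j
    #   (x1 - a)^(alpha - kappa - alpha_j) / Gamma(alpha - kappa - alpha_j + 1)
    n = math.floor(alpha) + 1
    s = 0.0
    for kappa in range(n):
        for aj in alphas:
            e = alpha - kappa - aj
            s += (x1 - a)**e / math.gamma(e + 1)
    return Lam * s

# largest grid point with eta < 1 (eta is increasing in x1 here)
x1 = max(x / 1000 for x in range(1, 1000) if eta(x / 1000) < 1)
print(x1, eta(x1))
```

Note that if some exponent \(\alpha -\kappa -\alpha _j\) were zero, the corresponding term in \(\eta \) would equal \(\varLambda _{\sigma }\) for every \(x_1\), so the choice of parameters matters for the contraction argument.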

We rewrite Eq. (2.1), introducing the Volterra integral operator \({\mathscr {K}}\), in the form

$$\begin{aligned} ({\mathscr {K}}u)(x)=u_0(x)+\frac{1}{\varGamma (\alpha )}\int _a^x \frac{F_{\alpha _0,\ldots ,\alpha _{\sigma }}[t,u] dt}{(x-t)^{1-\alpha }}, \end{aligned}$$
(2.7)

where

$$\begin{aligned} u_0(x):=\sum _{j=0}^{n-1}\frac{b_j}{j!}(x-a)^j. \end{aligned}$$

To apply the Banach fixed point theorem (see Theorem 1.9 in [22]), we need to show that:

  1. if \(u(x)\in C^{n-1}[a,x_1]\), then \(({\mathscr {K}}u)(x)\in C^{n-1}[a,x_1]\);

  2. for any \(u_1, u_2 \in C^{n-1}[a,x_1]\) (satisfying the same initial data),

    $$\begin{aligned} \Vert {\mathscr {K}}u_1-{\mathscr {K}}u_2\Vert _{C^{n-1}[a,x_1]}\le \eta \Vert u_1-u_2\Vert _{C^{n-1}[a,x_1]}. \end{aligned}$$
    (2.8)

Here, the distance in the complex metric space \(C^{n-1}(\varOmega )\) is given by

$$\begin{aligned} d(u_1,u_2)=\Vert u_1-u_2\Vert _{C^{n-1}(\varOmega )}:= \sum _{\kappa =0}^{n-1}\left\| u_1^{(\kappa )}-u_2^{(\kappa )}\right\| _{C(\varOmega )}. \end{aligned}$$

Differentiating \(({\mathscr {K}}u)(x)\) \(\kappa \) times, \(\kappa =1,\ldots , n-1\), and using \((D^{\kappa }{{\mathscr {I}}}_{a^+}^{\alpha })(u)(x)= ({{\mathscr {I}}}_{a^+}^{\alpha -\kappa })(u)(x)\), which holds for \(\alpha >\kappa \) and any sufficiently regular \(u(x)\in C(\varOmega )\), we have, using (2.7),

$$\begin{aligned} ({\mathscr {K}}u)^{(\kappa )}(x)=u_0^{(\kappa )}(x)+ \frac{1}{\varGamma (\alpha -\kappa )} \int _a^x\frac{F_{\alpha _0,\ldots ,\alpha _{\sigma }}[t,u] dt}{(x-t)^{1-\alpha +\kappa }}. \end{aligned}$$

The first term on the right-hand side, \(u_0^{(\kappa )}(x)=\sum _{j=\kappa }^{n-1}\frac{b_j}{(j-\kappa )!}(x-a)^{j-\kappa }\), is continuous on \([a, x_1]\). To show continuity of the second term, we use Lemma 2.1 with \(\gamma =0\) and a new \(\alpha \), obtained by relabeling \(\alpha \rightarrow \alpha -\kappa \). Then the second term is also continuous on \([a, x_1]\) for every \(\kappa =0, 1, \ldots , n-1\), and

$$\begin{aligned}&\left\| \frac{1}{\varGamma (\alpha -\kappa )} \int _a^x\frac{F_{\alpha _0,\ldots ,\alpha _{\sigma }}[t,u] dt}{(x-t)^{1-\alpha +\kappa }}\right\| _{C[a,x_1]}\nonumber \\&\quad \le \frac{(x_1-a)^{(\alpha -\kappa )}}{\varGamma (\alpha -\kappa +1)} \Vert F_{\alpha _0,\ldots ,\alpha _{\sigma }}[x,u]\Vert _{C[a,x_1]}. \end{aligned}$$
(2.9)

Therefore, \(({\mathscr {K}}u)(x)\in C^{n-1}[a,x_1]\).

To prove (2.8), we use the Lipschitz condition (2.5), inequality (2.9), and successive estimates:

$$\begin{aligned}&\left\| {\mathscr {K}}u_1-{\mathscr {K}}u_2\right\| _{C^{n-1}\left[ a,x_1\right] }= \sum _{\kappa =0}^{n-1} \left\| \left( {\mathscr {K}}u_1\right) ^{(\kappa )}-\left( {\mathscr {K}}u_2\right) ^{(\kappa )}\right\| _{C[a,x_1]}\\&\quad =\sum _{\kappa =0}^{n-1} \left\| I_{a^+}^{\alpha -\kappa }\left( F_{\alpha _0,\ldots ,\alpha _{\sigma }}[x,u_1]\right) - I_{a^+}^{\alpha -\kappa }\left( F_{\alpha _0,\ldots ,\alpha _{\sigma }}[x,u_2]\right) \right\| _{C[a,x_1]}\\&\quad =\sum _{\kappa =0}^{n-1} \left\| I_{a^+}^{\alpha -\kappa }\left( F_{\alpha _0,\ldots ,\alpha _{\sigma }}[x,u_1]- F_{\alpha _0,\ldots ,\alpha _{\sigma }}[x,u_2]\right) \right\| _{C[a,x_1]}\\&\quad \le \varLambda _{\sigma }\sum _{\kappa =0}^{n-1} \left\| I_{a^+}^{\alpha -\kappa } \left( \sum _{j=0}^{\sigma }{}^cD_{a^+}^{\alpha _j}(u_1-u_2)\right) \right\| _{C[a,x_1]}\\&\quad \le \varLambda _{\sigma }\sum _{\kappa =0}^{n-1}\sum _{j=0}^{\sigma } \left\| I_{a^+}^{\alpha -\kappa -\alpha _j} \left( I_{a^+}^{\alpha _j}{}^cD_{a^+}^{\alpha _j}(u_1-u_2)\right) \right\| _{C[a,x_1]}\\&\quad =\varLambda _{\sigma }\sum _{\kappa =0}^{n-1}\sum _{j=0}^{\sigma } \left\| I_{a^+}^{\alpha -\kappa -\alpha _j} \left( (u_1-u_2)-\right. \right. \\&\quad \left. \left. \sum _{\kappa _j=0}^{n_j-1}\frac{(u_1-u_2)^{(\kappa _j)}(a)}{(\kappa _j)!}(x-a)^{\kappa _j} \right) \right\| _{C[a,x_1]}. \end{aligned}$$

Hence, the assumption \((u_1-u_2)^{(\kappa _j)}(a)=0\) yields

$$\begin{aligned} \Vert {\mathscr {K}}u_1-{\mathscr {K}}u_2\Vert _{C^{n-1}[a,x_1]}&\le \varLambda _{\sigma }\sum _{\kappa =0}^{n-1}\sum _{j=0}^{\sigma } \left\| I_{a^+}^{\alpha -\kappa -\alpha _j}(u_1-u_2)\right\| _{C[a,x_1]}\\&\le \varLambda _{\sigma }\sum _{\kappa =0}^{n-1} \sum _{j=0}^{\sigma } \frac{(x_1-a)^{(\alpha -\kappa -\alpha _j)}}{\varGamma (\alpha -\kappa -\alpha _j+1)} \Vert u_1-u_2\Vert _{C[a,x_1]} \\&\le \eta \Vert u_1-u_2\Vert _{C^{n-1}[a,x_1]}. \end{aligned}$$

By (2.6), \(0<\eta <1\); hence, by the Banach fixed point theorem, there exists a unique solution \({\hat{u}}\in C^{n-1}[a,x_1]\) of (1.1) in \(\varOmega _1:=[a, x_1]\). Following Theorem 1.19 in [22], this solution is the limit of a convergent sequence \({\mathscr {K}}^l({\hat{u}}_0)\) (\(l\rightarrow \infty \)), where \({\hat{u}}_0\) is any function in \(C^{n-1}(\varOmega _1)\). We can take \({\hat{u}}_0=u_0\) if at least one \(b_{\kappa }\ne 0\); putting \(u_l={\mathscr {K}}^l{\hat{u}}_0\) and using (2.7), we get the recursion

$$\begin{aligned} u_l(x)=u_0(x)+\frac{1}{\varGamma (\alpha )} \int _a^x\frac{F_{\alpha _0,\ldots ,\alpha _{\sigma }}[t,u_{l-1}] dt}{(x-t)^{1-\alpha }}. \end{aligned}$$
(2.10)

Finally, since \(\lim _{l\rightarrow \infty } \Vert {\mathscr {K}}^l{\hat{u}}_0-{\hat{u}}\Vert _{C^{n-1}[a,x_1]}=0\), we have

$$\begin{aligned} \lim _{l\rightarrow \infty }\Vert u_l-{\hat{u}}\Vert _{C^{n-1}[a,x_1]}=0. \end{aligned}$$
(2.11)

Note that the method of successive approximations is used here to obtain a unique solution to the nonlinear Volterra integral equation (2.1): first for \(x\in (a, x_1)\) and then, in order to establish uniqueness on \(\varOmega \), we choose \(x_2=x_1+h_1\) (\(x_2\in [x_1, b]\)) at the next step and set

$$\begin{aligned} u(x)=u_0(x)+\frac{1}{\varGamma (\alpha )}\left( \int _a^{x_1}+\int _{x_1}^{x_2}\right) \frac{F_{\alpha _0,\ldots ,\alpha _{\sigma }}[t,u] dt}{(x-t)^{1-\alpha }}. \end{aligned}$$
(2.12)

Due to the previous step, the first integral on the right-hand side of (2.12) is a known function. Hence, by the same argument as above, there exists a unique solution \({\hat{u}}(x)\in C^{n-1}[x_1,x_2]\) to (2.1) on \([x_1,x_2]\). Iterating this procedure, there exists a unique solution \(u(x)={\hat{u}}(x)\in C^{n-1}(\varOmega )\) to the nonlinear Volterra integral equation, and hence to the Cauchy problem (1.1).

It remains to show that this solution belongs to \(C_{\gamma }^{\alpha ,n-1}(\varOmega )\). By (1.1) and (2.5),

$$\begin{aligned} \left\| {}^cD_{a^+}^{\alpha }(u_l)(x){-} {}^cD_{a^+}^{\alpha } (u)(x)\right\| _{C_{\gamma }(\varOmega )}&= \left\| F_{\alpha _0,\ldots ,\alpha _{\sigma }} [x,u_{l}(x)]{-}F_{\alpha _0,\ldots ,\alpha _{\sigma }}[x,u(x)]\right\| _{C_{\gamma }(\varOmega )}\\&\le \varLambda _{\sigma }\sum _{j=1}^{\sigma }\Vert u_l-u\Vert _{C_{\gamma }(\varOmega )}. \end{aligned}$$

Using (2.11), we get

$$\begin{aligned} \lim _{l\rightarrow \infty }\left\| {}^cD_{a^+}^{\alpha }(u_l)(x)-{}^cD_{a^+}^{\alpha }(u)(x)\right\| _{C_{\gamma }(\varOmega )}=0. \end{aligned}$$
(2.13)

Thus \({}^cD_{a^+}^{\alpha }(u)(x)\in C_{\gamma }(\varOmega )\), and the proof is complete. \(\square \)
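As an illustration of the recursion (2.10) (our sketch, not part of the proof), consider the linear test problem \({}^cD_{0^+}^{\alpha }u=u\), \(u(0)=1\). Since \({\mathscr {I}}^{\alpha }(t^p)(x)=\frac{\varGamma (p+1)}{\varGamma (p+\alpha +1)}x^{p+\alpha }\), the Picard iterates are polynomials in \(x^{\alpha }\) and coincide with the partial sums of the Mittag-Leffler series \(E_{\alpha }(x^{\alpha })=\sum _k x^{k\alpha }/\varGamma (k\alpha +1)\):

```python
import math

def picard(alpha, L):
    # u_l = u_0 + I^alpha u_{l-1} for ^cD^alpha u = u, u(0) = 1.
    # Functions are dicts {power p: coefficient of x^p}; the termwise rule
    # I^alpha x^p = Gamma(p+1)/Gamma(p+alpha+1) x^(p+alpha) is exact.
    u = {0.0: 1.0}                       # u_0(x) = 1
    for _ in range(L):
        integ = {p + alpha: c * math.gamma(p + 1) / math.gamma(p + alpha + 1)
                 for p, c in u.items()}
        u = {0.0: 1.0}
        for p, c in integ.items():
            u[p] = u.get(p, 0.0) + c
    return u

alpha, L, x = 0.5, 25, 0.9
uL = sum(c * x**p for p, c in picard(alpha, L).items())
# partial sum of the Mittag-Leffler series E_alpha(x^alpha)
ml = sum(x**(k * alpha) / math.gamma(k * alpha + 1) for k in range(L + 1))
print(uL, ml)
```

The telescoping of the Gamma factors shows that the l-th iterate is exactly the l-th partial sum, which is what the code confirms numerically.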

3 Multiwavelets and projections

3.1 Alpert’s multiwavelets

To prepare for the approximation procedure we recall Alpert's multiwavelets, cf. [1,2,3], which can readily be turned into bases for \(L_2[0,1]\) using the Lagrange interpolation polynomials \(L_k(x)\) through the roots \(\{\tau _{k},\ k=0, 1, \ldots , r-1\}\) of the Legendre polynomial \(P_r\). The Interpolating Scaling Functions (ISFs) are then defined by

$$\begin{aligned} \phi ^{k}(x) := \left\{ \begin{array}{ll} \sqrt{\frac{2}{\omega _{k}}}L_k(2x-1), &{}\quad {x\in [0,1]},\\ 0, &{} \quad \text {otherwise}, \end{array} \right. \quad k=0, 1, \ldots , r-1, \end{aligned}$$
(3.1)

where \(\omega _k:=2/(r{P'}_r(\tau _{k})P_{r-1}(\tau _{k}))\), \(k=0, 1, \ldots , r-1\), are the Gauss-Legendre quadrature weights, and the functions \(\phi ^{k}\) form an orthonormal basis for the subspace

$$\begin{aligned} V_0^r={\overline{span}}\left\{ \phi ^k:k=0, 1, \cdots , r-1\right\} \subset L_2[0, 1], \end{aligned}$$

of piecewise polynomials of degree less than r on [0, 1], equipped with the \(L_2\) inner product.
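As a small check (ours; Python with NumPy), the definitions (3.1) can be implemented directly: since \(L_i(\tau _m)=\delta _{im}\) and the integrand of \(\langle \phi ^i,\phi ^k\rangle \) has degree at most \(2r-2\), the r-point Gauss-Legendre rule evaluates the inner products exactly and returns the identity matrix:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

r = 4
tau, w = leggauss(r)          # Gauss-Legendre nodes/weights on [-1, 1]

def phi(k, x):
    # phi^k(x) = sqrt(2/omega_k) * L_k(2x - 1) on [0, 1], where L_k is the
    # Lagrange polynomial through the Legendre roots tau (L_k(tau_i) = delta_ki)
    t = 2.0 * x - 1.0
    L = np.ones_like(t)
    for i in range(r):
        if i != k:
            L *= (t - tau[i]) / (tau[k] - tau[i])
    return np.sqrt(2.0 / w[k]) * L

# <phi^i, phi^k>_{L2[0,1]} = (1/2) * sum_m w_m phi^i(x_m) phi^k(x_m),
# with the quadrature points mapped from [-1, 1] to [0, 1]
xq = (tau + 1.0) / 2.0
G = np.array([[0.5 * np.sum(w * phi(i, xq) * phi(k, xq)) for k in range(r)]
              for i in range(r)])
print(np.round(G, 12))        # identity matrix: the phi^k are orthonormal
```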

Let now \(\varOmega :=[0,1]=\bigcup _{b\in {\mathscr {B}}_j}{I_{j,b}}\), where \(I_{j,b}:=[x_{j,b},x_{j,b+1}]\), \(x_{j,b}:=2^{-j}b\) and \({\mathscr {B}}_j:= \{0, 1, \cdots , 2^j-1\}\) for \(j\in {\mathbb {Z}}^+\cup \{0\}\). Further, define the dilation and translation operators:

$$\begin{aligned} {\mathscr {D}}_af(x)=\sqrt{a} \, f(ax), \quad \hbox {and}\quad {\mathscr {T}}_bf(x)=f(x-b), \quad \hbox {respectively}. \end{aligned}$$

As a property of the Multi-Resolution Analysis (MRA), we can introduce the nested subspaces:

$$\begin{aligned} V_j^r={\overline{span}}\left\{ \phi _{j,b}^k: ={\mathscr {D}}_{2^j}{\mathscr {T}}_b\phi ^k, \,\, \, b\in {\mathscr {B}}_j,\,\, \, k=0, 1, \cdots , r-1\right\} , \end{aligned}$$

with \(V_j^r\subset V_{j+1}^r\). Thus there exist complementary orthogonal subspaces \(W_j^r\) such that

$$\begin{aligned} V_{j+1}^r=V_j^r\oplus W_j^r, \qquad j\in {\mathbb {Z}}^+\cup \{0\}, \end{aligned}$$
(3.2)

where \(\oplus \) denotes the orthogonal sum, and a family of basis functions \(\psi _{j,b}^k\) generates \(W_j^r\):

$$\begin{aligned} W_j^r={\overline{span}}\left\{ \psi _{j,b}^k: ={\mathscr {D}}_{2^j}{\mathscr {T}}_b\psi ^k, \,\,\, b\in {\mathscr {B}}_j,\,\,\, k=0, 1, \cdots , r-1\right\} . \end{aligned}$$

The functions \(\psi ^k=\psi _{0,0}^k\) are called multiwavelets. Alpert's multiwavelets are constructed using the dual basis \( \psi ^{\tilde{k}}_{\tilde{j},\tilde{b}}\) with the bi-orthogonality condition:

$$\begin{aligned} \left\langle \psi ^k_{j,b}, \psi ^{\tilde{k}}_{\tilde{j},\tilde{b}}\right\rangle _{L^2(\varOmega )} =\delta _{j,\tilde{j}}\delta _{b,\tilde{b}}\delta _{k,\tilde{k}}. \end{aligned}$$
(3.3)

A corresponding duality for the scaling functions is defined by replacing \(\psi \) with \(\phi \) in (3.3).

Due to the fact that \(V_j^r\subset V_{j+1}^r\) and \(W_j^r\subset V_{j+1}^r\), the vector functions \(\varPhi _0^{r,0}:=[\phi ^0_{0,0}, \cdots , \phi ^{r-1}_{0,0}]^T\) and \(\varPsi _0^{r,0}:=[\psi ^0_{0,0}, \cdots , \psi ^{r-1}_{0,0}]^T\) satisfy matrix refinement equations:

$$\begin{aligned} \varPhi _0^{r,0}(x)&= \sum \limits _{b\in {\mathscr {B}}_1}H^b{\mathscr {D}}_2{\mathscr {T}}_b\varPhi _0^{r,0}(x) =\sum \limits _{b\in {\mathscr {B}}_1}H^b\varPhi _1^{r,b}(x), \end{aligned}$$
(3.4)
$$\begin{aligned} \varPsi _0^{r,0}(x)&= \sum \limits _{b\in {\mathscr {B}}_1}G^b{\mathscr {D}}_2{\mathscr {T}}_b\varPhi _0^{r,0}(x) =\sum \limits _{b\in {\mathscr {B}}_1}G^b\varPhi _1^{r,b}(x), \end{aligned}$$
(3.5)

where \(\varPhi _j^{r,b}:={\mathscr {D}}_{2^j}{\mathscr {T}}_b\varPhi _0^{r,0} =[\phi ^0_{j,b}, \ldots , \phi ^{r-1}_{j,b}]^T\), and \(H^b\), \(G^b\), \(b\in {\mathscr {B}}_1\), are \((r\times r)\) matrices whose elements are obtained from inner products, using the orthonormality of the wavelets and scaling functions. To identify the closed form of the multiwavelets from (3.5), there are \(2r^2\) unknown coefficients to be found. This is achieved using

  (i)

    The orthonormality condition, which yields 2r of the unknown coefficients of (3.5) via:

    $$\begin{aligned} \left\langle \psi ^i_{0,0}(x),\psi ^k_{0,0}(x)\right\rangle _{L^2(\varOmega )}=\delta _{ik}, \quad i, k=0, 1, \ldots , r-1. \end{aligned}$$
  (ii)

    The \(N_{\psi }^k=k+r-1\) vanishing moments, defined by

    $$\begin{aligned}&{\mathscr {N}}_p^k:=\int _{-\infty }^{\infty }x^p\psi ^k_{0,0}(x)dx =\left\{ \begin{array}{l l} 0,\quad &{} 0\le p<N_{\psi }^k \\ \ne 0,&{} p=N_{\psi }^k \end{array}\right. ,\nonumber \\&\ k=0, 1, \cdots , r-1, \end{aligned}$$
    (3.6)

    equivalent to \(\frac{d^p}{d\xi ^p}{\hat{\psi }}^k_{0,0}(0)=0\),   \(0\le p<N_{\psi }^k\), which gives the remaining \(2r(r-1)\) coefficients.

3.2 Projection onto \(V_J^r\)

For a fixed integer \(J\ge 0\), we define the orthonormal projection operator \({\mathscr {P}}_J^r: L_2(\varOmega )\rightarrow V_J^r\) by:

$$\begin{aligned} f\approx {\mathscr {P}}_J^r(f)= \sum _{b\in {\mathscr {B}}_J} \sum _{k=0}^{r-1}\left\langle f,\phi _{J,b}^k\right\rangle _{L^2(\varOmega )} \phi _{J,b}^k, \quad f\in L_2(\varOmega ). \end{aligned}$$
(3.7)

The coefficients \(f_{J,b}^k:=\left\langle f,\phi _{J,b}^k\right\rangle _{L^2(\varOmega )} =\int _{I_{J,b}}f(x)\phi _{J,b}^k(x)dx\) are approximated using the Gauss-Legendre quadrature rule:

$$\begin{aligned}&f_{J,b}^{k}\approx 2^{-J/2}\sqrt{\frac{\omega _k}{2}}f\left( 2^{-J}({\hat{\tau }}_k+b)\right) ,\quad b\in {\mathscr {B}}_J,\,\, \, k=0,...,r-1, \,\, \nonumber \\&\quad \hbox { and } {\hat{\tau }}_k:=(\tau _k +1)/2, \end{aligned}$$
(3.8)

where \(\tau _k\) are the roots of \(P_r\). Let now \(\varPhi _J^r:=[{\varPhi _{J}^{r,0}}^T, {\varPhi _J^{r,1}}^T, \ldots , {\varPhi _J^{r,b_{\max }}}^T]^T\), where \(b_{\max }:=\max {\mathscr {B}}_J=2^J-1\). Then, for fixed r and J, \(\varPhi _J^r\) is a vector function whose elements are the multiscaling functions (the same as the ISFs) \(\phi ^k\). Hence,

$$\begin{aligned} f\approx {\mathscr {P}}_J^r(f)={F_J}^T\mathbf{{\varPhi }}_J^r, \end{aligned}$$
(3.9)

where \(F_J\in {\mathbb {R}}^{N}\) is an \(N=r2^J\)-dimensional vector with entries \({[F_J]}_{br+k+1}=f_{J,b}^k\).

Lemma 3.1

(cf. Lemma 1.1 in [1]) Let \(f : [0, 1]\rightarrow {\mathbb {R}}\) be an r-times continuously differentiable function. Then \({\mathscr {P}}_J^r(f)\) approximates f with the \(L_2\)-error bound

$$\begin{aligned} \Vert {\mathscr {P}}_J^r(f)-f\Vert \le 2^{-Jr}\frac{2}{4^rr!}\sup _{x\in [0, 1]}\left| f^{(r)}(x)\right| . \end{aligned}$$
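The bound of Lemma 3.1 can be observed numerically. The sketch below (ours; it assumes the quadrature form (3.8), under which \({\mathscr {P}}_J^r f\) coincides on each \(I_{J,b}\) with the Lagrange interpolant of f at the mapped Gauss points) shows the sup-norm error for a smooth f dropping by roughly a factor \(2^{r}\) when J increases by one:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def proj_error(f, r, J, m=2000):
    # With (3.8), P_J^r f restricted to I_{J,b} is the Lagrange interpolant
    # of f at the Gauss points mapped into I_{J,b}; measure the sup-norm error.
    tau, _ = leggauss(r)
    err = 0.0
    for b in range(2**J):
        lo = b * 2.0**(-J)
        nodes = lo + 2.0**(-J) * (tau + 1.0) / 2.0
        xs = np.linspace(lo, lo + 2.0**(-J), m, endpoint=False)
        interp = np.zeros_like(xs)
        for k in range(r):               # Lagrange basis on this subinterval
            Lk = np.ones_like(xs)
            for i in range(r):
                if i != k:
                    Lk *= (xs - nodes[i]) / (nodes[k] - nodes[i])
            interp += f(nodes[k]) * Lk
        err = max(err, np.max(np.abs(interp - f(xs))))
    return err

f, r = lambda x: np.sin(2 * np.pi * x), 3
e2, e3 = proj_error(f, r, 2), proj_error(f, r, 3)
print(e2, e3, e2 / e3)   # ratio near 2**r = 8, consistent with 2^{-Jr}
```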

4 Multiscale transformation

Using (3.2), we write the multiscale decomposition as \(V_J^r=V_0^r\oplus (\oplus _{j=0}^{J-1}W_j^r)\). Thus, one can approximate any function \(f\in L_2(\varOmega )\) by the ISFs of the space \(V_0^r\) and the multiwavelets of the spaces \(W_j^r\), \(j=0, 1, \ldots , J-1\). To proceed, we introduce the multiscale operator \({\mathscr {M}}_J^r: L_2(\varOmega )\rightarrow V_J^r\) as

$$\begin{aligned} f\approx {\mathscr {M}}_J^r(f)=\left( {\mathscr {P}}_0^r+\sum _{j=0}^{J-1}{\mathscr {Q}}_j^r\right) (f), \end{aligned}$$
(4.1)

where \({\mathscr {P}}_0^r (f)\in V_0^r\) and \({\mathscr {Q}}_j^r\), \(j=0,\ldots , J-1\) are the orthonormal projection operators that map \(L_2(\varOmega )\) onto \(W_j^r\). Hence, \({\mathscr {M}}_J^r\) can be represented by

$$\begin{aligned} f\approx {\mathscr {M}}_J^r(f) =\sum _{k=0}^{r-1}f_{0,0}^k\phi _{0,0}^k+\sum _{j=0}^{J-1} \sum _{b\in {\mathscr {B}}_j}\sum _{k=0}^{r-1}\tilde{f}_{j,b}^{k}\psi _{j,b}^{k}, \end{aligned}$$
(4.2)

i.e., by a multiscale transformation, with

$$\begin{aligned} f_{0,0}^k:=\left\langle f,\phi _{0,0}^k\right\rangle _{L^2(\varOmega )}, \quad \hbox { and } \quad \tilde{f}_{j,b}^{k}:=\left\langle f,\psi _{j,b}^k\right\rangle _{L^2(\varOmega )}. \end{aligned}$$
(4.3)

Let now \(\varPsi _J^r:= [{\varPhi _{0}^{r,0}}^T, {\varPsi _0^{r,0}}^T, {\varPsi _1^{r,0}}^T, {\varPsi _1^{r,1}}^T, \ldots , {\varPsi _{J-1}^{r,b_{\max }}}^T]^T\), where \(\varPsi _{j}^{r,b}:={\mathscr {D}}_{2^j}{\mathscr {T}}_b\varPsi _0^{r,0}\). Note that here \(b_{\max }:=\max {\mathscr {B}}_{J-1}\). Thus, using the notation preceding (3.4)–(3.5), we may write

$$\begin{aligned} f\approx {\mathscr {M}}_J^r(f)={\tilde{F}_J}^T{{\varPsi }}_J^r, \end{aligned}$$
(4.4)

where \(\tilde{F}_J\) is an \(r2^J\)-dimensional vector with entries \(f_{0,0}^{k}\) and \(\tilde{f}_{j,b}^k\), for \(b\in {\mathscr {B}}_{j}\), \(j=0, \ldots , J-1\), and \(k=0, \ldots , r-1\).

The single-scale coefficients \(f_{0,0}^k\), \(k=0, 1, \ldots , r-1\), are computed using (3.8) and the interpolation property of the interpolating scaling functions. To evaluate the multiwavelet coefficients \(\tilde{f}_{j,b}^k\), however, this property is not available. Therefore, these integrals are computed numerically by introducing the \(N\times N\) wavelet transform matrix \(T_J\), obtained using the matrix refinement equations (3.4) and (3.5). The multiwavelets are then obtained as the result of \(T_J\) acting on the multiscaling functions:

$$\begin{aligned} {\varPsi }_J^r=T_J {\varPhi }_J^r. \end{aligned}$$
(4.5)

Below we briefly describe the construction of \(T_J\). In general, for ISFs, the refinement equation between neighboring scales is given by \(\varPhi _j^r=H_j\varPhi _{j+1}^r\), where \(H_j=I_{2^j}\otimes H\), \(H=[H^0\ H^1]\), and \(I_{2^j}\) is the identity matrix of order \(2^j\). Let now the vector function \(\varUpsilon _j^r:=[{\varPsi _j^{r,0}}^T, \ldots , {\varPsi _j^{r,2^j-1}}^T]^T\) satisfy \(\varUpsilon _j^r=G_j\varPhi _{j+1}^r\), \(j=0, 1, \ldots , J-1\), where \(G_j=I_{2^j}\otimes G\) and \(G=[G^0\ G^1]\). Then, we readily identify the wavelet transform matrix \(T_J\) as

$$\begin{aligned} T_J=\left[ \begin{array}{c} \frac{1}{2^{J}}\left( H_0\times H_1\times \ldots \times H_{J-1}\right) \\ \frac{1}{2^{J}}\left( G_0\times H_1\times \ldots \times H_{J-1}\right) \\ \frac{1}{2^{J-1}}\left( G_1\times H_2\times \ldots \times H_{J-1}\right) \\ \vdots \\ \frac{1}{2^{2}}(G_{J-2}\times H_{J-1})\\ \frac{1}{2}G_{J-1} \end{array} \right] . \end{aligned}$$
(4.6)

The elements of the matrices \(G^0\), \(G^1\), \(H^0\) and \(H^1\) are implicitly determined by (3.4) and (3.5). Further, one can find the reconstruction formula (see also [3]):

$$\begin{aligned} \varPhi _1^{r,b}=\bar{G^b}\varPsi _0^r+\bar{H^b}\varPhi _0^r, \quad b=0, 1. \end{aligned}$$
(4.7)

Relations (3.4), (3.5) and (4.7) yield algorithms for transition between different scales.
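The cascade structure of \(T_J\) can be illustrated in the simplest case \(r=1\) (Haar), where \(H^0=H^1=1/\sqrt{2}\), \(G^0=1/\sqrt{2}\), \(G^1=-1/\sqrt{2}\). The sketch below (ours) stacks the rows as in (4.6) but drops the \(1/2^{j}\) prefactors, which belong to a different normalization convention; with orthonormal filters the resulting \(T_J\) is orthogonal:

```python
import numpy as np

# Haar filters (r = 1): H = [H^0  H^1], G = [G^0  G^1]
H = np.array([[1.0, 1.0]]) / np.sqrt(2.0)
G = np.array([[1.0, -1.0]]) / np.sqrt(2.0)

def TJ(J):
    # Stack the cascade: rows H_0 H_1 ... H_{J-1}, then G_j H_{j+1} ... H_{J-1}
    # for j = 0, ..., J-1, with H_j = I_{2^j} (x) H and G_j = I_{2^j} (x) G
    def Hj(j): return np.kron(np.eye(2**j), H)
    def Gj(j): return np.kron(np.eye(2**j), G)
    def tail(j):                     # the product H_j H_{j+1} ... H_{J-1}
        M = np.eye(2**J)
        for i in range(J - 1, j - 1, -1):
            M = Hj(i) @ M
        return M
    rows = [tail(0)] + [Gj(j) @ tail(j + 1) for j in range(J)]
    return np.vstack(rows)

T = TJ(3)
print(np.allclose(T @ T.T, np.eye(8)))   # the transform is orthogonal
```

Since \(HG^T=0\) and \(HH^T=GG^T=1\) for the Haar filters, every block of \(T_J\,T_J^T\) off the diagonal vanishes, which is what the check confirms.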

4.1 Thresholding

By (3.6), each multiwavelet provides vanishing moments of order \(N_{\psi }^k=k+r-1\), \(k=0, 1, \ldots , r-1\). Alpert's multiwavelets are uniformly bounded in both \(L_{\infty }\) and \(L_1\), up to a constant:

$$\begin{aligned} \left\| \psi _{J,b}^k\right\| _{L_{\infty }(\varOmega )}\lesssim 1,\quad \left\| \psi _{J,b}^k\right\| _{L_{1}(\varOmega )}\lesssim 1. \end{aligned}$$
(4.8)

The vanishing moments and the normalizations (4.8) imply that the detail coefficients \(\tilde{f}_{J,b}^k\) become small where the underlying function is locally smooth; we have, cf. [19],

$$\begin{aligned} \left| \tilde{f}_{J,b}^k\right| =\left| \left\langle f,\psi _{J,b}^k\right\rangle \right| \le \inf _{P\in \prod _{N_{\psi }^k}}\left| \left\langle f-P,\psi _{J,b}^k\right\rangle \right| \lesssim 2^{-JN_{\psi }^k}\Vert f\Vert _{W^{1,N_{\psi }^k}(\varOmega )}, \end{aligned}$$
(4.9)

where \(\prod _{N_{\psi }^k}\) is the space of polynomials of degree less than \(N_{\psi }^k\) and \(W^{1,N_{\psi }^k}(\varOmega )\) is the corresponding Sobolev space of real-valued functions. Thus the detail coefficients decay at the rate \(2^{-JN_{\psi }^k}\), \(N_{\psi }^k=k+r-1\). Using higher vanishing moments and increasing the refinement level J, more detail coefficients may be discarded in smooth regions. This yields thresholding with the operator \({\mathscr {T}}_{D_\varepsilon }\):

$$\begin{aligned} {\mathscr {T}}_{D_\varepsilon }({\tilde{F}}_J)={\bar{F}}_J, \end{aligned}$$
(4.10)

where \(D_\varepsilon :=\{(j,b,k):|\tilde{f}_{j,b}^k| >\varepsilon \}\), and the elements of \({\bar{F}}_J\) are given by

$$\begin{aligned} \bar{f}_{j,b}^k:=\left\{ \begin{array}{ll} \tilde{f}_{j,b}^k,&{} (j,b,k)\in D_\varepsilon ,\\ 0,&{} \hbox {else}, \end{array}\right. \quad b\in {\mathscr {B}}_j,\,\, j=0, \ldots , J-1, \,\, k=0, \ldots , r-1.\nonumber \\ \end{aligned}$$
(4.11)

The thresholding operator \({\mathscr {T}}_{D_\varepsilon }\) acts on the detail coefficients, leaving the coarse-scale coefficients unaffected. The approximation error due to thresholding can be estimated similarly to that of classical wavelets, e.g., for the approximation operator \({\mathscr {A}}_{D_{\varepsilon }}:={{\mathscr {M}}_J^r}^{-1} {\mathscr {T}}_{D_{\varepsilon }}{\mathscr {M}}_J^r\).
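A minimal sketch of the hard-thresholding rule (4.10)-(4.11); the flat list of detail coefficients stands in for the full index set \((j,b,k)\), whose bookkeeping is simplified here:

```python
def threshold(coarse, details, eps):
    """Hard thresholding T_{D_eps}: the coarse-scale block is kept
    unchanged, and every detail coefficient with |f| <= eps is zeroed,
    cf. (4.10)-(4.11)."""
    return coarse, [d if abs(d) > eps else 0.0 for d in details]

coarse = [2.3, -1.1]
details = [0.5, -1e-4, 3e-6, -0.02, 8e-5]
c, d = threshold(coarse, details, eps=1e-3)
assert c == coarse                        # coarse block unaffected
assert d == [0.5, 0.0, 0.0, -0.02, 0.0]   # small details discarded
```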

Proposition 4.1

(Approximation error, cf. [19]) Let \(\varOmega \) be bounded and \(\varepsilon _j=\bar{a}^{j-J}\varepsilon \) with \(\bar{a}>1\). Then the approximation error with respect to the set of significant details \(D_{\varepsilon }\) is uniformly bounded in \(L^q(\varOmega )\), \(q\in [1, \infty ]\), i.e.,

$$\begin{aligned} \left\| {\mathscr {P}}_J^r f-{\mathscr {P}}_{J,D_{\varepsilon }}^r f\right\| _{L^q(\varOmega )} \le C_{thr}\varepsilon , \end{aligned}$$
(4.12)

for some constant \(C_{thr}>0\) independent of J and \(\varepsilon \). Here \({\mathscr {P}}_J^r f\) and \({\mathscr {P}}_{J,D_{\varepsilon }}^r f\) are the projections corresponding to the coefficients \(\tilde{F}_J\) and \({\mathscr {T}}_{D_\varepsilon }\tilde{F}_J\), respectively.

4.2 Representation of fractional integral in multiwavelet bases

Recall the fractional integrals of order \(\alpha \in {\mathbb {R}}\) defined in (2.3):

$$\begin{aligned} {\mathscr {I}}^{\alpha }(f)(x): =\frac{1}{\varGamma (\alpha )}\int _0^x(x-t)^{\alpha -1}f(t)dt,\quad f\in {\mathbf {C}}_{n-\alpha }^{\alpha }(\varOmega ). \end{aligned}$$
(4.13)

Although Alpert's multiwavelet bases are, by construction, discontinuous, they are locally integrable. Hence the fractional integral operator \({\mathscr {I}}^{\alpha }\) acting on the vector function \(\varPhi _J^r\) can be expressed through the projection operator \({\mathscr {P}}_J^r\) as

$$\begin{aligned} {\mathscr {P}}_J^r\left( {\mathscr {I}}^{\alpha }(\varPhi _J^r)(x)\right) =I_{\phi }^{\alpha }\varPhi _J^r(x). \end{aligned}$$
(4.14)

To find the entries of the matrix \(I_{\phi }^{\alpha }\), we use the following auxiliary result:

Lemma 4.1

(cf. [37]) The Lagrange polynomials \(L_k\) on the set of nodes \(\tau _k\in [0,1]\) are given by

$$\begin{aligned} L_k(x)=\sum _{l=0}^{r-1}\beta _{k,l}x^{r-1-l}, \quad k=0, \ldots , r-1, \end{aligned}$$
(4.15)

where \(\beta _{k,0}=1/\prod _{l'=0,l'\ne {k}} ^{r-1}(\tau _k-\tau _{l'})\) and

$$\begin{aligned} \beta _{k,l}=\frac{(-1)^l}{\prod _{l'=0,l'\ne {k}} ^{r-1}(\tau _k-\tau _{l'})} \sum _{k_l=k_{l-1}+1}^{r-1}\cdots \sum _{k_1=0}^{r-l-2}\prod _{i'=1}^{l}\tau _{k_{i'}}, \quad \begin{array}{l} l=1, \ldots , r-1, \\ k\ne k_1\ne \cdots \ne k_l.\\ \end{array} \end{aligned}$$
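As a cross-check of Lemma 4.1, the coefficient vector \((\beta _{k,0},\ldots ,\beta _{k,r-1})\) can be produced by expanding the product \(\prod _{l'\ne k}(x-\tau _{l'})/(\tau _k-\tau _{l'})\) directly, by repeated polynomial multiplication rather than through the closed combinatorial sums above; the interpolation property \(L_k(\tau _j)=\delta _{kj}\) is then verified numerically (the nodes below are arbitrary illustrative choices):

```python
import math

def lagrange_coeffs(tau, k):
    """Coefficients of L_k(x) in descending powers (the beta_{k,l} of
    (4.15)), built by expanding prod_{l'!=k} (x - tau_{l'}) rather
    than through the closed combinatorial sums of Lemma 4.1."""
    poly, denom = [1.0], 1.0
    for lp, t in enumerate(tau):
        if lp == k:
            continue
        denom *= tau[k] - t
        new = poly + [0.0]             # multiply by x ...
        for i, c in enumerate(poly):   # ... and subtract t * poly
            new[i + 1] -= t * c
        poly = new
    return [c / denom for c in poly]

def horner(coeffs, x):
    """Evaluate a polynomial given in descending powers."""
    v = 0.0
    for c in coeffs:
        v = v * x + c
    return v

tau = [0.1, 0.4, 0.7, 0.95]            # arbitrary nodes in [0, 1]
for k in range(len(tau)):
    beta = lagrange_coeffs(tau, k)
    # leading coefficient matches beta_{k,0} = 1/prod(tau_k - tau_l')
    lead = 1.0 / math.prod(tau[k] - t for l, t in enumerate(tau) if l != k)
    assert abs(beta[0] - lead) < 1e-12
    for j, t in enumerate(tau):
        assert abs(horner(beta, t) - (1.0 if j == k else 0.0)) < 1e-10
```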

Now let \(i=br+k+1\) and \(j=b'r+k'+1\) for \(k,k'=0,\ldots , r-1\) and \(b,b'\in {\mathscr {B}}_J\); then the coefficients \([I_{\phi }^{\alpha }]_{i,j}\) are given by

$$\begin{aligned}{}[I_{\phi }^{\alpha }]_{i,j}= & {} 2^{\frac{-J}{2}}\sqrt{\frac{\omega _{k'}}{2}} {\mathscr {I}}^{\alpha }\left( \phi _{J,b}^r\right) \left( 2^{-J}\left( {\hat{\tau }}_{k'}+b'\right) \right) \nonumber \\= & {} \frac{2^{\frac{-J}{2}}}{\varGamma (\alpha )}\sqrt{\frac{\omega _{k'}}{2}} \int _0^{2^{-J}\left( \hat{\tau _{k'}}+b'\right) } \left( 2^{-J}\left( {\hat{\tau }}_{k'}+b'\right) -t\right) ^{\alpha -1}\phi _{J,b}^r(t)dt. \end{aligned}$$
(4.16)

To compute the integral in (4.16), comparing the integration interval with the support of \(\phi _{J,b}^r\), we distinguish the following three cases:

  1. Case (1):

    \(b'<b\). Since the support of \(\phi _{J,b}^r\) is \(I_{J,b}\) and \({\hat{\tau }}_{k'}<1\), we have \(2^{-J}({\hat{\tau }}_{k'}+b')< 2^{-J}b\), and hence

    $$\begin{aligned}{}[I_{\phi }^{\alpha }]_{i,j}=0, \quad i,j=1, \cdots , N. \end{aligned}$$
    (4.17)
  2. Case (2):

    \(b'=b\). Here, by the change of variable \(x=2^Jt-b\) and with \(\lambda :={\hat{\tau }}_{k'}+b'-b\), (4.16) can be rewritten as

    $$\begin{aligned}{}[I_{\phi }^{\alpha }]_{i,j}=\frac{2^{-J\alpha }}{\varGamma (\alpha )} \sqrt{\frac{\omega _{k'}}{2}}\int _0^{\lambda }(\lambda -x)^{\alpha -1}\phi ^k(x)dx. \end{aligned}$$
    (4.18)

    Using the definition of the multi-scaling function \(\phi ^k\) in (3.1) and Lemma 4.1, we have

    $$\begin{aligned}{}[I_{\phi }^{\alpha }]_{i,j}=\frac{2^{-J\alpha }}{\varGamma (\alpha )}\sqrt{\frac{\omega _{k'}}{\omega _{k}}}\sum _{l=0}^{r-1}\beta _{k,l}\int _0^{\lambda }(\lambda -x)^{\alpha -1}(2x-1)^{r-1-l}dx. \end{aligned}$$
    (4.19)

    By a further change of variable: \(x=\lambda t\), we can rewrite (4.19) as

    $$\begin{aligned}{}[I_{\phi }^{\alpha }]_{i,j}=\frac{(2^{-J}\lambda )^{\alpha }}{\varGamma (\alpha )}\sqrt{\frac{\omega _{k'}}{\omega _{k}}} B(1,\alpha )\sum _{l=0}^{r-1} \beta _{k,l}(-1)^{r-1-l}\, {}_2{\mathscr {F}}_1(l+1-r,1;\alpha +1;2\lambda ), \end{aligned}$$
    (4.20)

    where B is the beta function and \({}_2{\mathscr {F}}_1\) is the hypergeometric function defined by the power series representation below (see [5]):

    $$\begin{aligned} {}_2{\mathscr {F}}_1(a,b;c;z)=\sum _{m=0}^{\infty } \frac{(a)_m(b)_m}{(c)_m}\frac{z^m}{m!}, \quad |z|<1. \end{aligned}$$

    Here \((\cdot )_m\) is the Pochhammer symbol. Since \(l+1-r\) is a non-positive integer, the series terminates after finitely many terms and \({}_2{\mathscr {F}}_1\) reduces to the polynomial

    $$\begin{aligned}&{}_2{\mathscr {F}}_1(l+1-r,1;\alpha +1;2\lambda )\nonumber \\&\quad =\sum _{m=0}^{r-l-1}(-1)^m\left( \begin{array}{c} r-1-l \\ m \\ \end{array} \right) \frac{(1)_m}{(\alpha +1)_m}(2\lambda )^m. \end{aligned}$$
    (4.21)
  3. Case (3):

    \(b'>b\). Then, \({\hat{\tau }}_{k'}+b'>b+1\), and the integral in (4.16) can be rewritten as

    $$\begin{aligned}{}[I_{\phi }^{\alpha }]_{i,j}=\frac{2^{\frac{-J}{2}}}{\varGamma (\alpha )} \sqrt{\frac{\omega _{k'}}{2}} \int _{2^{-J}b}^{2^{-J}(b+1)}\left( 2^{-J}\left( {\hat{\tau }}_{k'}+b'\right) -t\right) ^{\alpha -1}\phi _{J,b}^r(t)dt.\nonumber \\ \end{aligned}$$
    (4.22)

    Using the same argument as in Case (2), we represent (4.22) in the form (4.18). Once again, using the definition of the scaling function (3.1), Lemma 4.1 and the change of variable \(x=\lambda y\), we end up with

    $$\begin{aligned} \left[ I_{\phi }^{\alpha }\right] _{i,j}= \frac{\left( 2^{-J}\lambda \right) ^{\alpha }}{\varGamma (\alpha )} \sqrt{\frac{\omega _{k'}}{\omega _k}}\sum _{l=0}^{r-1}\beta _{k,l} \int _0^{1/\lambda }(1-y)^{\alpha -1}\left( 2\lambda y-1\right) ^{r-1-l}dy.\nonumber \\ \end{aligned}$$
    (4.23)

    Finally, applying the binomial expansion of \((2\lambda y-1)^{r-1-l}\):

    $$\begin{aligned} (2\lambda y-1)^{r-1-l}=\sum _{m=0}^{r-1-l}\left( \begin{array}{c} r-1-l \\ m \\ \end{array} \right) (y)^{r-1-l-m}(-1)^m, \end{aligned}$$

    and further simplification, we get

    $$\begin{aligned}{}[I_{\phi }^{\alpha }]_{i,j}= & {} \frac{\left( 2^{-J}\lambda \right) ^{\alpha }}{\varGamma (\alpha )} \sqrt{\frac{\omega _{k'}}{\omega _k}}\sum _{l=0}^{r-1}\beta _{k,l}\sum _{m=0}^{r-1-l}\left( \begin{array}{c} r-1-l \\ m \\ \end{array} \right) (2\lambda )^{r-1-l-m}(-1)^m\nonumber \\&\times \int _0^{1/\lambda }(1-y)^{\alpha -1}(y)^{r-1-l-m}dy. \end{aligned}$$
    (4.24)

    Here the last integral is an incomplete beta function, defined as

    $$\begin{aligned} {B}(x;a,b)=\int _0^x t^{a-1}(1-t)^{b-1}dt =\frac{x^a}{a}{}_2{\mathscr {F}}_1(a,1-b;a+1;x). \end{aligned}$$

    Hence, setting \(\sigma :=r-l-m\) (not to be confused with the \(\sigma \) in (1.1)), we can write

    $$\begin{aligned} \int _0^{1/\lambda }(1-y)^{\alpha -1}(y)^{\sigma -1}dy=\frac{(1/\lambda )^{\sigma }}{\sigma }{}_2{\mathscr {F}}_1(\sigma ,1-\alpha ;\sigma +1;1/\lambda ). \end{aligned}$$

In this way, the elements of the block upper triangular matrix \(I_{\phi }^{\alpha }\) are specified in all possible cases.
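The incomplete beta identity used in Case (3), and with it the terminating hypergeometric sum (4.21), can be checked numerically. The sketch below compares Simpson quadrature of \(B(x;a,b)\) with \((x^a/a)\,{}_2{\mathscr {F}}_1(a,1-b;a+1;x)\); the values of a, b and x are illustrative, with \(1-b\) a non-positive integer so that the series terminates:

```python
import math

def simpson(g, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] (n even)."""
    h = (hi - lo) / n
    s = g(lo) + g(hi) + sum((4 if i % 2 else 2) * g(lo + i * h)
                            for i in range(1, n))
    return s * h / 3.0

def poch(q, m):
    """Pochhammer symbol (q)_m."""
    p = 1.0
    for i in range(m):
        p *= q + i
    return p

def hyp2f1_terminating(a, b, c, z):
    """2F1(a, b; c; z) for a non-positive integer b: a finite sum."""
    return sum(poch(a, m) * poch(b, m) / (poch(c, m) * math.factorial(m))
               * z ** m for m in range(int(-b) + 1))

a, b, x = 2.0, 3.0, 0.4                 # 1 - b = -2: series terminates
lhs = simpson(lambda t: t ** (a - 1) * (1 - t) ** (b - 1), 0.0, x)
rhs = x ** a / a * hyp2f1_terminating(a, 1 - b, a + 1, x)
assert abs(lhs - rhs) < 1e-10
```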

5 Multiwavelets spectral element method

Using the construction of Alpert's multiwavelets, it is easy to verify that \(\phi _{j,b}^k,\, \psi _{j,b}^k\in C^{n-1}(I_{j,b})\) for \(j\in {\mathbb {N}}_0\), \(b\in {\mathscr {B}}_j\) and \(k=0, 1, \ldots , r-1\), since these basis functions are polynomials on each subinterval \(I_{j,b}\). To discretize the nonlinear Volterra integral equation (2.7), the approximate solution is represented by the multiscale operator \({\mathscr {M}}_J^r:L^2(\varOmega )\rightarrow V_{J}^r\), cf. (4.1)–(4.4); for notational reasons we replace \(f,\, {\tilde{f}}\) and \({\tilde{F}}\) by \(u,\, {\tilde{u}}\) and \({\tilde{U}}\), respectively, and get, in compact form,

$$\begin{aligned} u\approx {\mathscr {M}}_J^r(u)(x)=\tilde{U}_J^T\varPsi _J^r(x), \end{aligned}$$
(5.1)

where \(\tilde{U}_J\) is an \(N=r2^J\)-dimensional vector with entries \(u_{0,0}^{k}\) and \(\tilde{u}_{j,b}^k\) (\(0\le j\le J-1\)), \(b\in {\mathscr {B}}_j\), and \(0\le k\le r-1\). Similar expansions are also valid for \(u_0(x)\), as well as for \(F_{\alpha _0,\ldots ,\alpha _{\sigma }} [x,{\mathscr {M}}_J^r(u)(x)]\in C_{\gamma }(\varOmega )\):

$$\begin{aligned}&{\mathscr {M}}_J^r(u_0)(x)=\tilde{U}_0^T\varPsi _J^r(x), \end{aligned}$$
(5.2)
$$\begin{aligned}&{\mathscr {M}}_J^r({F}_{\alpha _0,\ldots ,\alpha _{\sigma }} [x,{\mathscr {M}}_J^r(u)(x)])(x)=\tilde{F}_J^T\varPsi _J^r(x), \end{aligned}$$
(5.3)

where \(\tilde{U}_0\) and \(\tilde{F}_J\) are N-dimensional vectors whose elements are computed using (3.8) and the wavelet transform matrix \(T_J\). Now let \(u_J^r:={\mathscr {M}}_J^r(u)\); then, using (1.3),

$$\begin{aligned}&{\mathscr {M}}_J^r(F_{\alpha _0,\ldots ,\alpha _{\sigma }} [x,u_J^r(x)])(x)=f[x,u_J^r(x),{}^cD_{a^+}^{\alpha _1} (u_J^r)(x), \ldots , {}^cD_{a^+}^{\alpha _\sigma }(u_J^r)(x)]\\&\quad =f[x,u_J^r(x),{\mathscr {I}}_{a^+}^{n_1-\alpha _1}D^{n_1}(u_J^r)(x), \ldots , {\mathscr {I}}_{a^+}^{n_{\sigma }-\alpha _{\sigma }}D^{n_{\sigma }} (u_J^r)(x)], \end{aligned}$$

and

$$\begin{aligned} {\mathscr {I}}_{a^+}^{n_j-\alpha _j}D^{n_j}(u_J^r)(x)&:= U^T{\mathscr {I}}_{a^+}^{n_j-\alpha _j}D^{n_j}(\varPsi _J^r)(x) =U^T{\mathscr {I}}_{a^+}^{n_j-\alpha _j}D^{n_j}_{\psi }(\varPsi _J^r)(x)\\&=U^TD^{n_j}_{\psi }I_{\psi }^{n_j-\alpha _j}\varPsi _J^r(x),\quad j=1, \ldots , \sigma , \end{aligned}$$

where \(D_{\psi }:=T_JD_{\phi }T_J^{T}\) is the matrix representing the differential operator D for Alpert's multiwavelets (\(D_{\phi }\) being the matrix representation of the differential operator for the scaling functions \(\varPhi _J^r\)), and \(I_{\psi }^{\alpha }:=T_JI_{\phi }^{\alpha }T_J^{T}\) (see [3]). Approximating (2.1) using (5.2)–(5.3), (4.14) and the wavelet transform matrix \(T_J\), the residual is written as

$$\begin{aligned} r_J(x)=\left( \tilde{U}^T-\tilde{U}_0^T-\tilde{F}_J^TI_{\psi }^{\alpha }\right) \varPsi _J^r(x). \end{aligned}$$
(5.4)

Minimizing (5.4) yields the unknown coefficients. The spectral method is implemented either as a Galerkin or as a collocation method. In the wavelet Galerkin method we impose the orthogonality relation \(\langle r_J,\varPsi _J^r \rangle =0\), i.e., the Fourier coefficients of \(r_J\) with respect to \(\varPsi _J^r\) vanish. This yields the nonlinear system

$$\begin{aligned} \tilde{U}^T-\tilde{U}_0^T-\tilde{F}_J^TI_{\psi }^{\alpha }=0, \end{aligned}$$
(5.5)

where the orthonormality of Alpert's multiwavelets has been used. Newton's method is then used to solve this nonlinear system.
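The Newton iteration for a system of the form (5.5) can be sketched as follows. The residual map below is a generic stand-in for \(\tilde{U}^T-\tilde{U}_0^T-\tilde{F}_J^TI_{\psi }^{\alpha }\) (whose assembly is problem specific), the Jacobian is approximated by forward differences, and the linear solve is a naive Gaussian elimination:

```python
def gauss_solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for c in range(col, n + 1):
                M[i][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def newton(residual, u, tol=1e-12, h=1e-7, itmax=50):
    """Newton iteration with a forward-difference Jacobian."""
    for _ in range(itmax):
        r = residual(u)
        if max(abs(v) for v in r) < tol:
            break
        n = len(u)
        jac = [[0.0] * n for _ in range(n)]
        for j in range(n):
            up = u[:]
            up[j] += h
            rp = residual(up)
            for i in range(n):
                jac[i][j] = (rp[i] - r[i]) / h
        du = gauss_solve(jac, [-v for v in r])
        u = [a + b for a, b in zip(u, du)]
    return u

# toy residual standing in for U - U_0 - F(U)^T I_psi^alpha = 0:
res = lambda u: [u[0] - 1.0 - 0.1 * u[1] ** 2,
                 u[1] - 0.5 - 0.1 * u[0] * u[1]]
u = newton(res, [0.0, 0.0])
assert max(abs(v) for v in res(u)) < 1e-9
```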

5.1 Error estimate

Some care is needed with the approximation Lemma 3.1: it is well known that one cannot expect the exact solution to have continuous derivatives in the vicinity of the origin. A remedy, cf. [35], reads as follows: let f(x) satisfy the condition

$$\begin{aligned} |f(x)-f(0)|\le A|x|^{\omega }, \quad \omega>0,\quad \hbox {for some real } \, A>0, \end{aligned}$$
(5.6)

and set \(L=\ln 2^J\). Then one can show that there exists \(A_1>0\) such that, for a sufficiently small \(x_L\in (0,1)\),

$$\begin{aligned} |f(x)-f(0)|_{x\in [0, x_L]}\le A_1e^{-\omega L}. \end{aligned}$$
(5.7)

Suppose that the Chebyshev interpolant \(C_n(y)\) provides an approximation for the function

$$\begin{aligned} F_L(y)=f\left( \varrho ^{-1}\left( L(y-1)/2\right) \right) , \quad y\in (-\infty , 1], \end{aligned}$$

where \(\varrho (x):=\ln {x}\), \(x\in (0,1]\), and \(x_L\), which depends on \(\varrho \), decreases exponentially to zero as \(L\rightarrow \infty \).

Now we consider \(C_L^n\), which interpolates f(x) on [0, 1], defined as

$$\begin{aligned} C_L^n=\left\{ \begin{array}{ll} f(0),&{}\quad x\in [0,x_L],\\ C_n(2\varrho (x)/L+1),&{}\quad x\in [x_L,1]. \end{array}\right. \end{aligned}$$
(5.8)

Under these assumptions, the approximation error is given by

$$\begin{aligned} \left\| f-C_L^n\right\| =\max \left\{ \left\| F_L-C_n\right\| _{y\in [-1,1]},\Vert f-f(0)\Vert _{x\in [0,x_L]}\right\} . \end{aligned}$$
(5.9)
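The construction (5.8)-(5.9) can be illustrated numerically. In the sketch below, \(f(x)=\sqrt{x}\) (which satisfies (5.6) with \(\omega =1/2\)) and the parameters J and n are illustrative choices; the interpolant \(C_n\) is realized in barycentric form on Chebyshev-Lobatto nodes:

```python
import math

def cheb_interp(vals, nodes):
    """Barycentric interpolant on Chebyshev-Lobatto nodes."""
    m = len(nodes) - 1
    w = [(0.5 if i in (0, m) else 1.0) * (-1) ** i for i in range(m + 1)]
    def C(y):
        num = den = 0.0
        for wi, yi, fi in zip(w, nodes, vals):
            if y == yi:
                return fi
            t = wi / (y - yi)
            num += t * fi
            den += t
        return num / den
    return C

J, n = 10, 24
L = math.log(2.0 ** J)                  # L = ln 2^J
xL = 2.0 ** (-J)                        # x_L = 2^{-J}
f = math.sqrt                           # satisfies (5.6) with omega = 1/2
nodes = [math.cos(i * math.pi / n) for i in range(n + 1)]
C_n = cheb_interp([f(math.exp(L * (y - 1) / 2)) for y in nodes], nodes)

def C_L(x):
    """The patched interpolant (5.8): f(0) on [0, x_L], C_n otherwise."""
    return f(0.0) if x <= xL else C_n(2 * math.log(x) / L + 1)

err = max(abs(f(x) - C_L(x)) for x in (k / 1000 for k in range(1001)))
assert err < 2.0 ** (-J / 2) + 1e-6     # consistent with delta = omega = 1/2
```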

Then, similarly to Theorem 3.2 in [35], we can derive the following estimates:

Theorem 5.1

Let f be a sufficiently regular function on every subinterval of [0, 1] not containing the origin, and let f satisfy (5.6). Then there exists a \(\delta >0\) such that, for \(L=\ln {2^J}\),

$$\begin{aligned} \Vert f-C_L^n\Vert \le C 2^{-J\delta }, \end{aligned}$$
(5.10)

and

$$\begin{aligned} \Vert {\mathscr {P}}_J^r(f)-f\Vert \le C 2^{-J\delta }. \end{aligned}$$
(5.11)

Proof

Using (5.9), (5.7) and the proof of Lemma 3.1 (see [1]), we have

$$\begin{aligned} \Vert f-C_L^n\Vert \le \max {\left\{ \frac{2^{1-rJ}M_r\sqrt{1-2^{-J}}}{4^rr!}, A_1e^{-\omega L}\right\} }, \end{aligned}$$

where \(M_r=\sup _{x\in [x_L,1]}|f^{(r)}(x)|\). Evidently, the choice \(x_L=2^{-J}\) (i.e., \(x_L=e^{-L}\)) yields

$$\begin{aligned} \Vert f-C_L^n\Vert&\le \max {\left\{ \frac{2^{1-rJ}M_r\sqrt{1-2^{-J}}}{4^rr!},A_1e^{-\omega \ln {2^J}}\right\} }\\&\le \max {\left\{ \frac{2M_r\sqrt{1-2^{-J}}}{4^rr!},A_1\right\} }2^{-J\delta },\quad \delta =\min {\{r,\omega \}},\quad \omega >0,\quad 1\le r\in {\mathbb {N}}. \end{aligned}$$

Then (5.10) follows with the constant \(C=\max {\Big \{\frac{2M_r\sqrt{1-2^{-J}}}{4^rr!},A_1\Big \}}\).

To estimate \(\Vert {\mathscr {P}}_J^rf-f\Vert \), we use the error estimate for interpolation at the Chebyshev nodes, together with (5.10), to get (5.11), viz.

$$\begin{aligned} \Vert {\mathscr {P}}_J^rf-f\Vert ^2&\le \sum _{b\in {\mathscr {B}}_J}\int _{I_{J,b}}\left( C_L^n(x)-f(x)\right) ^2dx\\&\le \sum _{b\in {\mathscr {B}}_J}\int _{I_{J,b}}\left( C2^{-J\delta }\right) ^2dx = (C2^{-J\delta })^2. \end{aligned}$$

\(\square \)

Lemma 5.1

(cf. Corollary 2.3 (a), [22]) Given \(\alpha \in {\mathbb {R}}^+\), let \(n=-[-\alpha ]\). If \(\alpha \not \in {\mathbb {N}}_0\), then the Caputo fractional derivative operator \({}^c{D}_{a^+}^{\alpha }\) is bounded from \(C^n[0,1]\) into C[0, 1] as follows:

$$\begin{aligned} \Vert {}^c{D}_{a^+}^{\alpha }u\Vert _C\le \frac{1}{\varGamma (n-\alpha )(n-\alpha +1)}\Vert u\Vert _{C^n}. \end{aligned}$$
(5.12)

Theorem 5.2

Under the assumptions of Theorem 2.1, and with u(x) and \(u_J(x)\) the exact and the multiwavelet Galerkin solutions of Eq. (2.1), respectively, we have the following.

If \(n-1<\alpha <n=[\alpha ]+1\) and

$$\begin{aligned} {\tilde{\eta }}:= \varLambda _{\sigma }\sum _{j=0}^{\sigma }\frac{1}{\varGamma (\alpha -\alpha _j+1)}<1, \end{aligned}$$

then, for any \(u(x)\in C(\varOmega )\), the following estimate holds true

$$\begin{aligned} \Vert u-u_J\Vert _{C(\varOmega )}\le \left\{ \begin{array}{ll} C_02^{-Jr},&{} \hbox {if }\,{F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u] \in C^r(\varOmega ), \quad r>0 },\\ C_12^{-J\delta }, &{} \hbox {if }\,{F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u] \in C^r\left( \varOmega \setminus \{0\}\right) , \quad r>0,\,\, \delta =\min \{r, \omega \}}. \end{array}\right. \end{aligned}$$

Here

$$\begin{aligned} C_0&=\left( 1-{\tilde{\eta }}\right) ^{-1}\frac{2}{4^rr!}\left( C_2\left\| F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u_J]\right\| _{C^{\mathfrak {m}}(\varOmega )} +\left\| u_0^{(r)}\right\| _{C(\varOmega )}\right) ,\\ C_1&=\left( 1-{\tilde{\eta }}\right) ^{-1}\left( C+\frac{2}{4^rr!}\left\| u_0^{(r)}\right\| \right) ,\quad \hbox {for some } C>0,\quad \hbox {and} \\ C_2&=1/\min \left\{ \varGamma (1+\alpha -r), \varGamma \left( \mathfrak {m}-r+\alpha \right) \left( \mathfrak {m}-r+\alpha +1\right) \right\} , \end{aligned}$$

with \(\mathfrak {m}=[r-\alpha ]+1\).

Proof

Let \(r>0\). If \(F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u]\in C^r(\varOmega )\), then by Lemma 3.1, with \(\gamma =0\), we have

$$\begin{aligned}&\left\| {\mathscr {M}}_J^r({\mathscr {I}}_{a^+}^{\alpha } (F_{\alpha _1,\cdots , \alpha _{\sigma }}[x,u])) - {\mathscr {I}}_{a^+}^{\alpha } F_{\alpha _1,\cdots ,\alpha _{\sigma }}[x,u]\right\| _{C(\varOmega )} \\&\quad \le \frac{2^{1-Jr}}{4^rr!}\max _{x\in \varOmega } \left| D^r{\mathscr {I}}_{a^+}^{\alpha }F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u]\right| .\\ \end{aligned}$$

There are two possible cases here:

  1. If \(r< n\), then we have

    $$\begin{aligned} {D}^{r}{\mathscr {I}}_{a^+}^{\alpha }={\mathscr {I}}_{a^+}^{\alpha -r}, \end{aligned}$$

    and it follows from (2.4) of Lemma 2.1, that

    $$\begin{aligned}&\left\| {\mathscr {M}}_J^r\left( {\mathscr {I}}_{a^+}^{\alpha } \left( F_{\alpha _1,\cdots , \alpha _{\sigma }}[x,u]\right) \right) - {\mathscr {I}}_{a^+}^{\alpha } F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u]\right\| _{C(\varOmega )} \nonumber \\&\quad \le \frac{2^{1-Jr}}{4^rr!}\left\| {\mathscr {I}}_{a^+}^{\alpha -r} F_{\alpha _1,\cdots ,\alpha _{\sigma }}[x,u]\right\| _{C(\varOmega )}\nonumber \\&\quad \le \frac{2^{1-Jr}}{4^rr!}\frac{1}{\varGamma (1+\alpha -r)} \Vert F_{\alpha _1,\cdots ,\alpha _{\sigma }}[x,u]\Vert _{C(\varOmega )}. \end{aligned}$$
    (5.13)
  2. If \(r\ge n\), then combining Lemma 2.21 of [22] and Theorem 3.14 in [15], we can write

    $$\begin{aligned} {D}^{r}{\mathscr {I}}_{a^+}^{\alpha }={}^c{D}_{a^+}^{r-\alpha }\ {}^c{D}_{a^+}^{\alpha }{\mathscr {I}}_{a^+}^{\alpha }= {}^c{D}_{a^+}^{r-\alpha }. \end{aligned}$$
    (5.14)

    Thus using Lemma 5.1, we have

    $$\begin{aligned}&\left\| {\mathscr {M}}_J^r\left( {\mathscr {I}}_{a^+}^{\alpha } \left( F_{\alpha _1,\ldots , \alpha _{\sigma }}[x,u]\right) \right) - {\mathscr {I}}_{a^+}^{\alpha } F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u]\right\| _{C(\varOmega )}\\&\quad \le \frac{2^{1-Jr}}{4^rr!}\left\| {}^c{D}_{a^+}^{r-\alpha } F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u]\right\| _{C(\varOmega )}\\&\quad \le \frac{2^{1-Jr}}{4^rr!}\frac{1}{\varGamma \left( \mathfrak {m}-r+\alpha \right) \left( \mathfrak {m}-r+\alpha +1\right) } \left\| F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u]\right\| _{C^\mathfrak {m}(\varOmega )}, \end{aligned}$$

    where \(\mathfrak {m}=[r-\alpha ]+1\).

These two cases can be summarized as

$$\begin{aligned}&\left\| {\mathscr {M}}_J^r\left( {\mathscr {I}}_{a^+}^{\alpha } \left( F_{\alpha _1,\cdots , \alpha _{\sigma }}[x,u]\right) \right) - {\mathscr {I}}_{a^+}^{\alpha } F_{\alpha _1,\cdots ,\alpha _{\sigma }}[x,u]\right\| _{C(\varOmega )}\nonumber \\&\quad \le \frac{2^{1-Jr}}{4^rr!} C_2 \left\| F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u]\right\| _{C^\mathfrak {m}(\varOmega )}, \end{aligned}$$
(5.15)

where \(C_2:=1/\min \{\varGamma (1+\alpha -r), \varGamma (\mathfrak {m}-r+\alpha )(\mathfrak {m}-r+\alpha +1)\}\).

Using (1.3), the hypothesis and proof of Theorem 2.1 (the Lipschitz condition (2.5)), and (2.9), we also have the following estimate:

$$\begin{aligned}&\left\| {\mathscr {I}}_{0^+}^{\alpha } F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u] - {\mathscr {I}}_{0^+}^{\alpha } \left( F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u_J]\right) \right\| _{C(\varOmega )}\nonumber \\&\quad \le \varLambda _{\sigma }\left\| {\mathscr {I}}_{0^+}^{\alpha } \left[ \sum _{j=0}^{\sigma }{}^cD_{0^+}^{\alpha _j}(u-u_J)\right] \right\| _{C(\varOmega )}\nonumber \\&\quad = \varLambda _{\sigma }\sum _{j=0}^{\sigma } \left\| {\mathscr {I}}_{0^+}^{\alpha -\alpha _j} \left[ {\mathscr {I}}_{0^+}^{\alpha _j}{}^cD_{0^+}^{\alpha _j}(u-u_J)\right] \right\| _{C(\varOmega )}\nonumber \\&\quad = \varLambda _{\sigma }\sum _{j=0}^{\sigma } \left\| {\mathscr {I}}_{0^+}^{\alpha -\alpha _j}\left[ (u-u_J)- \sum _{\kappa _j=0}^{n_j} \frac{(u-u_J)^{(\kappa _j)}(0)}{\kappa _j!}x^{\kappa _j}\right] \right\| _{C(\varOmega )}\nonumber \\&\quad \le \varLambda _{\sigma }\sum _{j=0}^{\sigma } \frac{1}{\varGamma (\alpha -\alpha _j+1)}\Vert u-u_J\Vert _{C(\varOmega )}. \end{aligned}$$
(5.16)

Now, we can write

$$\begin{aligned} u-u_J= & {} u_0-{\mathscr {M}}_J^r(u_0)+ {\mathscr {I}}_{a^+}^{\alpha } F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u]- {\mathscr {M}}_J^r\left( {\mathscr {I}}_{a^+}^{\alpha } \left( F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u_J]\right) \right) \nonumber \\&+{\mathscr {I}}_{a^+}^{\alpha } F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u_J]- {\mathscr {I}}_{a^+}^{\alpha }F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u_J]. \end{aligned}$$
(5.17)

Taking the \(C(\varOmega )\)-norm of (5.17), using (5.13)–(5.16) and the triangle inequality, we get

$$\begin{aligned} \left\| u-u_J\right\| _{C(\varOmega )}\le & {} \left\| u_0-{\mathscr {M}}_J^r(u_0)\right\| _{C(\varOmega )}\nonumber \\&+\left\| {\mathscr {I}}_{a^+}^{\alpha } F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u_J] -{\mathscr {M}}_J^r\left( {\mathscr {I}}_{a^+}^{\alpha } \left( F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u_J]\right) \right) \right\| _{C(\varOmega )}\nonumber \\&+\left\| {\mathscr {I}}_{a^+}^{\alpha } F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u]- {\mathscr {I}}_{a^+}^{\alpha } F_{\alpha _1,\cdots ,\alpha _{\sigma }}[x,u_J]\right\| _{C(\varOmega )}\nonumber \\\le & {} \frac{2^{1-Jr}}{4^rr!} \left( { C_2\left\| F_{\alpha _1,\cdots ,\alpha _{\sigma }}[x,u_J]\right\| _{C^{\mathfrak {m}}(\varOmega )} } +\left\| u_0^{(r)}\right\| _{C(\varOmega )}\right) \nonumber \\&+\varLambda _{\sigma }\sum _{j=0}^{\sigma } \frac{1}{\varGamma (\alpha -\alpha _j+1)} \left\| u-u_J\right\| _{C(\varOmega )}, \end{aligned}$$
(5.18)

and hence, since \({\tilde{\eta }}= \varLambda _{\sigma }\sum _{j=0}^{\sigma }\frac{1}{\varGamma (\alpha -\alpha _j+1)}<1\), we end up with

$$\begin{aligned} \Vert u-u_J\Vert _{C(\varOmega )}\le C_02^{-Jr}, \end{aligned}$$
(5.19)

where

$$\begin{aligned} C_0=\left( 1-{\tilde{\eta }}\right) ^{-1} \frac{2}{4^rr!} \left( C_2\left\| F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u_J]\right\| _{C^{\mathfrak {m}}(\varOmega )} +\left\| u_0^{(r)}\right\| _{C(\varOmega )}\right) . \end{aligned}$$

Similarly, if \(F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u]\) is smooth away from the origin, in the sense that \(F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u]\in C^r(\varOmega \setminus \{0\})\), then

$$\begin{aligned} \left\| {\mathscr {M}}_J^r\left( {\mathscr {I}}_{a^+}^{\alpha } \left( F_{\alpha _1,\ldots , \alpha _{\sigma }}[x,u]\right) \right) - {\mathscr {I}}_{a^+}^{\alpha } F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u]\right\| _{C(\varOmega )}\le C2^{-J\delta }, \end{aligned}$$
(5.20)

where \(C=\max {\Big \{\frac{2M_r\sqrt{1-2^{-J}}}{4^rr!},A_1\Big \}}\) with \(M_r=\max _{x\in [x_L,1]} |D^r{\mathscr {I}}_{a^+}^{\alpha }F_{\alpha _1,\cdots ,\alpha _{\sigma }}[x,u]|\).

Finally, by (5.16), (5.17) and (5.20), and the triangle inequality, one derives the following \(C(\varOmega )\)-norm estimate of (5.17):

$$\begin{aligned} (1-\tilde{\eta })\left\| u-u_J\right\| _{C(\varOmega )}\le \frac{2^{1-Jr}}{4^rr!}\left\| u_0^{(r)}\right\| _{C(\varOmega )}+C2^{-J\delta }. \end{aligned}$$
(5.21)

Hence, we can conclude that

$$\begin{aligned} \Vert u-u_J\Vert _{C(\varOmega )}\le C_12^{-J\delta }, \end{aligned}$$
(5.22)

with \(C_1=\left( 1-{\tilde{\eta }}\right) ^{-1}\left( C+\frac{2}{4^rr!}\left\| u_0^{(r)}\right\| \right) \). \(\square \)

6 Numerical examples

Here we present some canonical examples to illustrate the validity of the proposed scheme through numerical implementations for the following two configurations:

\(E_1.\):

Three linear Cauchy type initial value problems

$$\begin{aligned} {E_1} (a):\quad D_{0^+}^{\alpha }u(x)+u(x)&=0, \quad 0<\alpha <2, \,\, \, x\in \varOmega ,\\ u(0)=1, \quad u'(0)&=0. \end{aligned}$$
$$\begin{aligned} {E_1} (b):\quad u''(x)+D_{0^+}^{3/2}u(x)+u(x)&=f_0(x), \quad x\in \varOmega ,\\ u(0)=0,\quad u'(0)&=0. \end{aligned}$$
$$\begin{aligned} {E_1} (c):\quad D_{0^+}^{1/2}u(x)+\sqrt{\pi }u(x)&=\sqrt{\pi }, \quad x\in \varOmega ,\\ u(0)=1. \end{aligned}$$
\(E_2.\):

Three nonlinear Cauchy type initial value problems

$$\begin{aligned}&{E_2} (a):\quad D_{0^+}^{\alpha }u(x)-2D_{0^+}^{\alpha _1}u(x) (D_{0^+}^{\alpha _2}u(x))^2-D_{0^+}^{\alpha _3}u(x)=f_1(x), \quad x\in \varOmega ,\\&\quad \quad \quad \quad \quad \, u(0)=0, \quad u'(0)=0, \ldots , u^{(n-1)}(0)=0.\\&{E_2}( b):\quad D_{0^+}^{\alpha }u(x)+a_1(x)\sin (D_{0^+}^{\alpha _1}u(x)) -D_{0^+}^{\alpha _2}u(x)=f_2(x), \quad x\in \varOmega ,\\&\quad \quad \quad \quad \quad \, u(0)=0, \quad u'(0)=0, \ldots , u^{(n-1)}(0)=0.\\&{E_2}( c):\quad D_{0^+}^{3/2}u(x)+a_2(x)\sqrt{D_{0^+}^{1/2}u(x)} +u(x)=f_3(x), \quad x\in \varOmega ,\\&\quad \quad \quad \quad \quad \, u(0)=0, \quad u'(0)=0,\quad u''(0)=0. \end{aligned}$$

Here, if \(\alpha <1\), we suppress the datum \(u'(0)=0\); the exact solution of \(E_1(a)\) is given in [25] as

$$\begin{aligned} u(x)=\sum _{i=0}^{\infty }\frac{(-x^{\alpha })^{i}}{\varGamma (\alpha i+1)}. \end{aligned}$$
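This series is the Mittag-Leffler function \(E_{\alpha }(-x^{\alpha })\). On \(\varOmega \) a direct partial sum converges rapidly, since the terms decay factorially, and it recovers the classical limits \(e^{-x}\) (\(\alpha =1\)) and \(\cos x\) (\(\alpha =2\)):

```python
import math

def u_exact(x, alpha, terms=60):
    """Partial sum of the Mittag-Leffler series E_alpha(-x^alpha)."""
    return sum((-x ** alpha) ** i / math.gamma(alpha * i + 1)
               for i in range(terms))

x = 0.7
assert abs(u_exact(x, 1.0) - math.exp(-x)) < 1e-12   # E_1(-x) = e^{-x}
assert abs(u_exact(x, 2.0) - math.cos(x)) < 1e-12    # E_2(-x^2) = cos x
```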

Equation \(E_1(b)\) is the Bagley–Torvik equation [13] with exact solution \(u(x)=x^{7/3}\) and

$$\begin{aligned} f_0(x):=\frac{28}{9}x^{1/3}+\frac{112 \pi \sqrt{3}}{135\varGamma {(2/3)}\varGamma {(5/6)}}x^{5/6}+x^{7/3}. \end{aligned}$$

The equation \(E_1(c)\) demonstrates the ability of the proposed method to solve problems whose solutions do not have continuous derivatives near the origin. The exact solution of this problem is \(u(x)=1-e^{\pi x}\hbox {erfc}(\sqrt{\pi x})\).

The exact solutions for the configuration \(E_2\) (\(E_2(a),\, E_2(b)\) and \(E_2(c)\)) are all of the form \(u(x)=x^{\beta }\), and the source functions \(f_1(x)\) and \(f_2(x)\) are obtained via

$$\begin{aligned} f_1:=&\frac{\varGamma (\beta +1)x^{\beta -\alpha }}{\varGamma (\beta +1-\alpha )}\\&\quad -2\frac{\varGamma (\beta +1)x^{3\beta -\alpha _1-2\alpha _2}}{\varGamma (\beta +1-\alpha _1)}\left( \frac{\varGamma (\beta +1)}{\varGamma (\beta +1-\alpha _2)}\right) ^2-\frac{\varGamma (\beta +1)x^{\beta -\alpha _3}}{\varGamma (\beta +1-\alpha _3)},\\ f_2:=&\frac{\varGamma (\beta +1)x^{\beta -\alpha }}{\varGamma (\beta +1-\alpha )}-a_1(x)\sin \left( \frac{\varGamma (\beta +1)x^{\beta -\alpha _1}}{\varGamma (\beta +1-\alpha _1)}\right) -\frac{\varGamma (\beta +1)x^{\beta -\alpha _2}}{\varGamma (\beta +1-\alpha _2)}. \end{aligned}$$

The coefficient \(a_1(x)\) is a given smooth function.
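Both source terms rely on the formula \(D_{0^+}^{\alpha }x^{\beta }=\frac{\varGamma (\beta +1)}{\varGamma (\beta +1-\alpha )}x^{\beta -\alpha }\). For \(0<\alpha <1\) this can be checked by quadrature after the substitution \(t=x-s^{1/(1-\alpha )}\), which removes the endpoint singularity of the kernel; the parameter values below are illustrative:

```python
import math

def simpson(g, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] (n even)."""
    h = (hi - lo) / n
    s = g(lo) + g(hi) + sum((4 if i % 2 else 2) * g(lo + i * h)
                            for i in range(1, n))
    return s * h / 3.0

def caputo_power(x, alpha, beta):
    """Caputo D^alpha of u(t) = t^beta at x, for 0 < alpha < 1 and
    beta >= 1.  The substitution t = x - s^{1/(1-alpha)} removes the
    singularity of the kernel (x - t)^{-alpha}, leaving
    D^alpha u(x) = int_0^{x^{1-alpha}} u'(x - s^{1/(1-alpha)}) ds
                   / (Gamma(1-alpha) (1-alpha))."""
    p = 1.0 / (1.0 - alpha)
    # max(t, 0) guards against tiny negative t from rounding at s_max
    uprime = lambda t: beta * max(t, 0.0) ** (beta - 1)
    integral = simpson(lambda s: uprime(x - s ** p), 0.0, x ** (1.0 - alpha))
    return integral / (math.gamma(1.0 - alpha) * (1.0 - alpha))

x, alpha, beta = 0.8, 0.5, 3.0
exact = math.gamma(beta + 1) / math.gamma(beta + 1 - alpha) * x ** (beta - alpha)
assert abs(caputo_power(x, alpha, beta) - exact) < 1e-10
```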

As for the equation \(E_2(c)\), the exact solution is \(u(x)=x^3\), for which the corresponding function \(F_{\alpha _1,\ldots ,\alpha _{\sigma }}[x,u]\) in (1.3) does not have a continuous derivative at zero. In this problem \(a_2(x):=x^2\) and

$$\begin{aligned} f_3(x):=\frac{x}{5\sqrt{\pi }} \left( 4\sqrt{5}\,\pi ^{1/4}x\sqrt{x^{5/2}}+5x^{2}\sqrt{\pi }+40\sqrt{x} \right) . \end{aligned}$$

Using the multiwavelet spectral element method of Sect. 5, the equivalent Volterra integral equation for the configuration \(E_1\) is reduced to a system of linear algebraic equations in matrix form, \(AU=U_0\). According to (4.9), the entries of the coefficient matrix A decay at the rate \(2^{-JN_{\psi }}\). Thus, by increasing the refinement level J and the number of vanishing moments (equivalently, r), we expect the coefficients to tend to zero for sufficiently smooth underlying functions. Using a proper threshold, one may set most coefficients to zero and thus obtain a sparse coefficient matrix A. This process increases the computational speed, but a larger threshold may also increase the error. The error is controlled by a given tolerance \(\xi \):

$$\begin{aligned} \varepsilon 2^{-J/2}\Vert {\mathscr {P}}_J^ru\Vert _2=\xi . \end{aligned}$$
(6.1)

To preserve the desired accuracy, the optimal threshold value determined by (6.1) should be used. Let \(N_{\varepsilon }\) be the number of nonzero elements of the coefficient matrix after thresholding. Then the compression percentage (percentage of matrix sparsity) \(P_{\varepsilon }\) is defined by

$$\begin{aligned} P_{\varepsilon }=\frac{N^2-N_{\varepsilon }}{N^2}. \end{aligned}$$
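In implementation terms, (6.1) is solved for the threshold and the sparsity of the thresholded matrix is then measured; the function names and the toy matrix in this sketch are illustrative:

```python
def threshold_from_tolerance(xi, J, proj_norm):
    """Solve (6.1) for eps: eps = xi / (2^{-J/2} ||P_J^r u||_2)."""
    return xi / (2.0 ** (-J / 2.0) * proj_norm)

def compression_percentage(A, eps):
    """P_eps = (N^2 - N_eps)/N^2, where N_eps counts entries above eps."""
    N = len(A)
    n_eps = sum(1 for row in A for a in row if abs(a) > eps)
    return (N * N - n_eps) / (N * N)

A = [[1.0, 1e-6, 2e-7],
     [3e-8, 0.8, 1e-5],
     [5e-9, 4e-6, 0.9]]
assert compression_percentage(A, 1e-4) == 6 / 9   # three entries survive
```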

To check the efficiency and accuracy of the proposed method for the configuration \(E_1\), we illustrate the effects of the refinement level J and the multiplicity parameter r in Figs. 1, 5 and 6. We observe that increasing these parameters decreases the \(L_2\) errors, at the cost of higher CPU times. Since our goal is to decrease the computational cost for linear Cauchy type problems by thresholding, we plot the effect of varying threshold values on the coefficient matrix A and on the CPU time (see Figs. 2 and 3). In Fig. 3, the percentage of compression is also presented. One can see that increasing the threshold value decreases the number of nonzero elements of the coefficient matrix, which results in lower CPU time and a higher percentage of compression. Figure 4 shows the effect of thresholding on the \(L_2\) error for different values of r, J and \(\varepsilon \). Note that the error grows with increasing threshold value.

Table 1 shows the effect of thresholding on the \(L_2\) error for the configuration \(E_1(a)\) with \(\alpha =0.85\), \(r=5\) and \(J=4\). Table 2 shows the effect of thresholding, for the refinement parameter \(J\, (=2,\, 3)\) and the multiplicity \(r\, (=3,\, 2)\), on the \(L_2\) error for the configuration \(E_1(b)\) with different \(\varepsilon \)-values.

Fig. 1
figure 1

Effects of the refinement level J (left) and the multiplicity parameter r (right) on \(L_2\) error when \(\alpha =0.85\) for configuration \(E_1(a)\)

Fig. 2
figure 2

Effects of the thresholding with thresholds \(\varepsilon =10^{-5}\) (left) and \(\varepsilon =10^{-3}\) (right) on the coefficient matrix A when \(\alpha =0.85\), taking \(r=2\) and \(J=7\) for configuration \(E_1(a)\)

Fig. 3
figure 3

Effect of thresholding on CPU time (left) and percentage of compression (right) when \(\alpha =0.85\), taking \(r=2\) and \(J=7\) for configuration \(E_1(a)\)

Fig. 4
figure 4

Effect of thresholding on \(L_2\) error when \(\alpha =0.85\), taking \(r=5\), \(J=2\) (left) and \(r=2\), \(J=7\) (right) for configuration \(E_1(a)\)

Table 5 gives the \(L_2\) error for the configuration \(E_2(a)\) for different \(\alpha \) and \(\beta \), with \(J=2, 3\) and \(r=2, 3, 4, 5, 6\). We observe that, by increasing the refinement level J and the multiplicity parameter r, the \(L_2\) error decreases. As for the configuration \(E_2(b)\): choosing \(\alpha =2.5\), \(\alpha _1=1.5\), \(\alpha _2=0.5\), different values of \(\beta \), and various functions \(a_1(x)\), the effect of the parameters J and r can be seen in Fig. 7.

Table 1 Effect of thresholding on \(L_2\) error, the percentage of matrix sparsity and the CPU time when \(\alpha =0.85\), taking \(r=5\) and \(J=4\) for configuration \(E_1(a)\)
Fig. 5
figure 5

Effects of the refinement level J (left) and the multiplicity parameter r (right) on \(L_2\) error for configuration \(E_1(b)\)

We return to the critical case of non-differentiability in the vicinity of the origin, i.e., problems whose corresponding F-functions do not have a continuous derivative near the origin.

In the linear case, Tables 3 and 4 demonstrate the effect of thresholding on the \(L_2\) error and the percentage of matrix sparsity for \(E_1(c)\). As for the nonlinear example \(E_{2}(c)\), Fig. 8 further justifies the robustness of the error analysis: by increasing J the error decreases exponentially, and the defect near \(x=0\) is indeed removable.

To verify the accuracy of our approach we compare the implementation of \(E_1(a)\) with existing results in the literature, e.g. [12, 24, 36]. In Table 6, the absolute error of the proposed method at \(x=1\) is compared with those of the methods presented in [12, 24]. As seen, we obtain substantially better convergence with a relatively larger mesh size h (lines 5 and 6 of the table compared with lines 1 and 2 in [12], and lines 3 and 4 in [24]). In Table 7 a comparison is made with the results of the method in [36], for different values of both \(\alpha \) and x. Once again, the results confirm that our method, with lower spectral order (degree of the approximating polynomial), gives better accuracy than that of [36]. For \(\alpha =1\) and \(\alpha =2\), the exact solutions for \(E_1(a)\) are reported in [36] as \(u(x)=e^{-x}\) and \(u(x)=\cos (x)\), respectively. Our approximate solutions for different values of \(\alpha \), with \(r=3\) and \(J=3\), are plotted in Fig. 9 and show that the numerical solutions converge to the analytical ones as \(\alpha \) approaches an integer order. This confirms that the solution of the fractional differential equation approaches that of the integer-order differential equation, which is a further test justifying the merit of our approach.

All examples are carried out with the combined use of Maple and Matlab. Altogether, the results confirm the advantageous effects of thresholding discussed in this paper.

Table 2 Effects of thresholding, the refinement level J and the multiplicity r on \(L_2\) error for configuration \(E_1(b)\)
Fig. 6

Effects of the refinement level J (left) and the multiplicity parameter r (right) on \(L_2\) error for configuration \(E_1(c)\)

Table 3 Effect of thresholding on \(L_2\) error and the percentage of matrix sparsity taking \(r=3\) for \(E_1(c)\)
Table 4 Effect of threshold parameter on \(L_2\) error and the percentage of matrix sparsity taking \(r=5\) and \(J=4\) for \(E_1(c)\)
Table 5 \(L_2\) error when \(\alpha =2.5\), \(\alpha _1=1.5\) and \(\alpha _2=0.5\) for configuration \(E_2(a)\)
Fig. 7

Effects of the multiplicity parameter r (left) and the refinement level J (right) on \(L_2\) error for configuration \(E_2(b)\)

Fig. 8

Effects of the refinement level J (left) and the multiplicity parameter r (right) on \(L_2\) error for configuration \(E_2(c)\)

Table 6 Comparison of the absolute error reported in [12, 24] with that of the proposed method at \(x=1\) for \(E_1(a)\)
Table 7 Comparison of the absolute error reported in [36] with that of the proposed method for \(E_1(a)\)
Fig. 9

Approximate solution for different values of \(\alpha \) taking \(r=3\) and \(J=3\) for \(E_1(a)\)

7 Conclusion

We propose a reliable and efficient scheme, based on a multiwavelet spectral element method, for the numerical solution of a generalized Cauchy type problem with Caputo fractional derivative. Under certain assumptions, including Lipschitz continuity of the right-hand side, we prove existence of a unique solution for the problem. To this end, we first transform the equation into a Volterra integral equation and then reduce it to an algebraic system. Our scheme is based on representing the fractional integral operator in the multiwavelet bases as a sparse coefficient matrix. For the resulting equation in nonlinear form the numerical computations are more challenging, and we propose an adequate numerical scheme to deal with this case. In the linear case, selecting an appropriate threshold substantially reduces the number of non-zero coefficients in the system, thus yielding fast convergence at lower cost. More specifically, selecting higher values of the refinement level J and the multiplicity parameter r decreases the error correspondingly. Furthermore, increasing the threshold decreases the CPU time while increasing the error. A main obstacle in the error estimates is the lack of continuous derivatives of the solution near the origin. This, however, is tackled by a splitting technique that separates a local neighborhood of the origin, where an \(L_2\)-estimate based on Chebyshev polynomials is employed. This approach can be extended to two-dimensional domains without reentrant corners, using a 2D integral equation representation, and is the subject of a forthcoming study (see also [10]); it requires, however, considerable CPU time to implement. Goal-oriented numerical examples show the robustness of this approach. The scheme yields the desired convergence for an appropriate choice of threshold and is cost-effective, since the number of non-zero coefficients of the system can be substantially reduced.
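The thresholding step described above can be illustrated generically: entries of the coefficient matrix with magnitude below a threshold \(\varepsilon \) are set to zero, and the resulting sparsity is reported as a compression percentage. The following is a minimal sketch, not the paper's implementation; the test matrix with off-diagonal decay merely mimics the near-sparsity of the operator in multiwavelet bases:

```python
import numpy as np

def threshold(A, eps):
    """Zero all entries of A with magnitude below eps; return the
    thresholded matrix and the compression percentage (share of
    entries set to zero)."""
    A_eps = np.where(np.abs(A) < eps, 0.0, A)
    compression = 100.0 * np.mean(A_eps == 0.0)
    return A_eps, compression

# Hypothetical test matrix whose entries decay away from the diagonal
n = 64
i, j = np.indices((n, n))
A = 1.0 / (1.0 + np.abs(i - j)) ** 3
A_eps, pct = threshold(A, eps=1e-3)
```

With such decay, most entries fall below the threshold, so the algebraic system can be assembled and solved with sparse storage, which is the source of the CPU-time savings reported in Table 1 and Fig. 3; the trade-off is the slight growth of the \(L_2\) error seen in Fig. 4.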

Finally, we mention that the constructed wavelet transform matrix \(T_j\) plays a pivotal role in decreasing the number of nonzero coefficients and hence in reducing the computational load. More specifically, \(T_j\) converts the coefficients of expansions in the scaling-function bases into those in the multiwavelet bases. Here, the coefficients of the expansion with respect to the scaling functions are determined by interpolation, so no integration is performed. The matrix \(I_\phi ^\alpha \), which represents the fractional integral operator and, being sparse, reduces the computational load, has not been considered elsewhere.
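As a toy illustration of the role of such a transform matrix, consider the classical one-level orthonormal Haar transform (the \(r=1\) multiwavelet case; this is a generic sketch, not the paper's \(T_j\)). Applied to scaling coefficients obtained from point samples of a smooth function, it produces detail coefficients that are small and hence amenable to thresholding:

```python
import numpy as np

def haar_transform_matrix(n):
    """One-level orthonormal Haar transform: maps n scaling
    coefficients to n/2 averages followed by n/2 details."""
    assert n % 2 == 0
    T = np.zeros((n, n))
    s = 1.0 / np.sqrt(2.0)
    for k in range(n // 2):
        T[k, 2 * k] = T[k, 2 * k + 1] = s        # average (scaling) part
        T[n // 2 + k, 2 * k] = s                 # detail (wavelet) part
        T[n // 2 + k, 2 * k + 1] = -s
    return T

# Scaling coefficients by interpolation (point samples) of a smooth
# function: the detail half of the transformed vector is small.
n = 32
c = np.sin(np.pi * np.linspace(0.0, 1.0, n))
d = haar_transform_matrix(n) @ c
details = d[n // 2:]
```

Because the transform is orthonormal, thresholding the small details perturbs the \(L_2\) norm only by the size of the discarded coefficients, which is the mechanism behind the controlled error growth under compression.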