1 Introduction

It is surprising that, to the best of our knowledge, although there exists a huge amount of literature on the Monte Carlo method for integration, there are no available works on using the Monte Carlo method for differentiation.

This paper introduces the numerical evaluation of the Grünwald-Letnikov fractional derivatives by the Monte Carlo method.

It is well known and documented that the Grünwald-Letnikov fractional derivatives play an important role in various numerical methods and approximations of the Riemann-Liouville and Caputo fractional derivatives based on fractional-order differences; see [11] or [1, 2, 9, 10].

In this work, however, we develop a totally different approach. After recalling the general framework and the necessary notions, we introduce a stochastic interpretation of the Grünwald-Letnikov definition of fractional-order differentiation and demonstrate that the evaluation of a fractional-order derivative at a given point can be replaced by the study of a certain stochastic process.

Then we outline the basic scheme of the proposed Monte Carlo approach and provide the algorithm for computations. A close look at the separate stages of this algorithm reveals an unexpected link between our Monte Carlo method for fractional differentiation, on the one hand, and the known finite difference methods on non-equidistant or nested grids, on the other. In particular, in both cases the function values are evaluated at nodes that are denser near the current point and sparser towards the beginning of the considered interval.

Implementation in the form of a toolbox for MATLAB allowed experiments with the functions that are most frequently used in applications of fractional-order modeling, such as the Heaviside unit step function, the power function, the exponential function, trigonometric functions, and the Mittag-Leffler function. The provided examples demonstrate excellent agreement between the exact fractional-order derivatives of the considered functions and the numerical results produced by the proposed method.

In the concluding remarks we emphasize two important aspects. First, the proposed method can be enhanced using standard approaches for improving the classical Monte Carlo method, such as variance reduction, importance sampling, stratified sampling, control variates, or antithetic sampling. Second, the proposed method allows parallelization, which means that parallel algorithms can finally be used in applications of the fractional calculus.

2 Grünwald-Letnikov fractional derivatives

Let \(\alpha > 0\), and let

$$\begin{aligned} F_0 = \{ f \in L_1=L_1(\mathbb {R}): \exists g \in L_1 \text{ with } \hat{g}(\omega ) = (-i\omega )^\alpha \hat{f}(\omega ), \, \omega \in \mathbb {R}\}, \end{aligned}$$

where \(\hat{f}\) is the Fourier transform of f. For \(f \in F_0\) define \(f^{(\alpha )} = g\) if \(g \in L_1\) and \((-i\omega )^\alpha \hat{f}(\omega ) = \hat{g}(\omega )\) for \(\omega \in \mathbb {R}\). The function \(f^{(\alpha )}\), defined uniquely by the uniqueness of the Fourier transform, is the Riemann-Liouville fractional derivative of f.

Similarly, let us define

$$\begin{aligned} F_0^{+} = \{ f \in L_1(\mathbb {R}_{+}) : \exists g\in L_1(\mathbb {R}_{+}) \text{ with } \check{g}(s) = (-s)^\alpha \check{f}(s), \, \text{ Re }(s) \le 0 \}, \end{aligned}$$

where \(\check{f}(s) = \int _{0}^\infty e^{st} f(t) dt\) denotes the Laplace transform of f. For \(f \in F_0^{+}\) define \(f^{(\alpha )} = g\) if \(g \in L_1(\mathbb {R}_{+})\) and \((-s)^\alpha \check{f}(s) = \check{g}(s)\) for \(\text{ Re } (s) \le 0\).

In order to calculate the fractional-order derivative of f, one can use the Grünwald-Letnikov fractional derivative, which was introduced in [11] as a non-local operator on \(L_1(\mathbb {R})\) given by

$$\begin{aligned} D^\alpha f(t) = \lim _{h \rightarrow 0} A_h^{\alpha } f(t), \quad \alpha> 0, \quad h>0, \end{aligned}$$
(2.1)

where

$$\begin{aligned} A_h^{\alpha } f(t) = \frac{1}{h^\alpha } \sum _{k=0}^{\infty } \gamma (\alpha , k) f(t-kh), \end{aligned}$$
(2.2)

and

$$\begin{aligned} w_k = \gamma (\alpha , k) = (-1)^k \frac{\Gamma (\alpha + 1)}{k! \, \Gamma (\alpha - k + 1)}. \end{aligned}$$
(2.3)

When (2.1) is applied to a function \(f \in F_0^{+}\), we extend f to \(L_1(\mathbb {R})\) by setting \(f(t)=0\) for \(t < 0\). Hence the operator (2.1) can be regarded as an operator on \(L_1 (\mathbb {R}_{+})\); see [1, 2, 9, 11] for more details.

In particular, the operator (2.1) is well defined for the class F of bounded functions f such that f and its derivatives of order up to \(n>1+\alpha \) exist and are absolutely integrable; in this case the Fourier transform of the derivative is \((ik)^\alpha \hat{f} (k)\), see, e.g., [9, pp. 22–23].

Applying the Stirling approximation

$$\begin{aligned} \Gamma (x+1) \sim \sqrt{2 \pi x} x^x e^{-x} \quad \text{ as } \ x \rightarrow \infty , \end{aligned}$$

we have

$$\begin{aligned} w_k \sim \frac{-\alpha }{\Gamma (1-\alpha )} \frac{1}{k^{\alpha + 1}} \quad \text{ as } \ k \rightarrow \infty . \end{aligned}$$
(2.4)

The binomial series

$$\begin{aligned} (1-z)^\alpha = \sum _{k=0}^{\infty } w_k \, z^k \end{aligned}$$

converges for any complex z with \(|z| \le 1\) and any \(\alpha > 0\). Thus for \(z = 1\) we have \(\sum _{k=0}^{\infty } w_k = (1-1)^\alpha = 0\), and hence

$$\begin{aligned} w_0 = 1, \quad \sum _{k=1}^{\infty } w_k = -1. \end{aligned}$$

This has been noticed by Machado [8].

Denoting \(p_k=-w_k\) (\(k = 1, 2, \ldots \)), we have

$$\begin{aligned} \sum _{k=1}^{\infty } p_k = 1, \end{aligned}$$

where \(p_k = p_k(\alpha ) > 0\) for all \(0<\alpha <1\) (whereas for \(1<\alpha <2\) there exists k such that \(p_k(\alpha ) < 0\)). Note that [11]

$$\begin{aligned} p_k(\alpha ) = \left( 1 - \frac{\alpha +1}{k} \right) p_{k-1}(\alpha ), \quad k = 2, 3, \ldots , \text{ with } \, p_1(\alpha ) = \alpha . \end{aligned}$$
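For illustration, the weights are easily generated by this recurrence. The following minimal Python sketch (the toolbox described below is written in MATLAB, so this is purely illustrative) cross-checks the recurrence against the Gamma-function formula (2.3), and verifies numerically that the partial sums of \(p_k\) approach 1 and that the asymptotics (2.4) holds:

```python
import math

def weights_p(alpha, K):
    """Return [p_1, ..., p_K] via p_k = (1 - (alpha+1)/k) p_{k-1}, p_1 = alpha."""
    p = [alpha]
    for k in range(2, K + 1):
        p.append((1.0 - (alpha + 1.0) / k) * p[-1])
    return p

alpha, K = 0.5, 100_000
p = weights_p(alpha, K)

# Cross-check p_1 against the direct formula w_1 = -Gamma(alpha+1)/(1! Gamma(alpha)):
w1 = (-1) ** 1 * math.gamma(alpha + 1) / (math.factorial(1) * math.gamma(alpha))
print(p[0], -w1)                       # both equal alpha

print(sum(p))                          # partial sums approach 1 as K grows
print(p[-1] * K ** (1 + alpha) * math.gamma(1 - alpha) / alpha)  # -> 1 by (2.4)
```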

Using (2.4), we can obtain the following relationship:

$$\begin{aligned} \frac{\Delta ^\alpha f(t)}{\Delta t^\alpha } &= \frac{1}{(\Delta t)^\alpha } \left[ f(t) + \sum _{k=1}^{\infty } w_k f(t-k \Delta t) \right] \\ &= \frac{1}{(\Delta t)^\alpha } \sum _{k=1}^{\infty } p_k \left[ f(t) - f(t-k \Delta t) \right] \\ &\approx \sum _{k=1}^{\infty } \left[ f(t) - f(t-k \Delta t) \right] \frac{\alpha }{\Gamma (1-\alpha )} \frac{\Delta t}{(k \Delta t)^{1+\alpha }} \\ &\approx \int _{0}^{\infty } \left[ f(t) - f(t-y) \right] \frac{\alpha }{\Gamma (1-\alpha )} \frac{1}{y^{1+\alpha }}\, dy \end{aligned}$$

(using \(\sum _{k\ge 1} p_k = 1\) in the second step and the asymptotics (2.4) in the third), which motivates the general form of the fractional-order derivative as

$$\begin{aligned} ^{G}D^{\alpha } f(t) = \int _{0}^{\infty } \left[ f(t) - f(t-y) \right] \frac{\alpha }{\Gamma (1-\alpha )} \frac{1}{y^{1+\alpha }} \, dy. \end{aligned}$$

Integration by parts gives the Caputo form of the fractional derivative:

$$\begin{aligned} ^{C}D^{\alpha } f(t) = \frac{1}{\Gamma (1-\alpha )} \int _{0}^{\infty } \frac{d}{dt} f(t-y) \frac{dy}{y^\alpha }, \quad 0< \alpha < 1, \end{aligned}$$

which is just the regularized form of the Riemann–Liouville fractional derivative:

$$\begin{aligned} ^{RL}D^{\alpha } f(t) = \frac{1}{\Gamma (1-\alpha )} \frac{d}{dt} \int _{0}^{\infty } f(t-y) \frac{dy}{y^\alpha }, \quad 0< \alpha < 1. \end{aligned}$$
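In more detail (a short verification under the regularity assumptions above): since \(\frac{\alpha }{y^{1+\alpha }} = -\frac{d}{dy}\, y^{-\alpha }\), and since \(f(t)-f(t-y)\) vanishes at \(y=0\) while \(y^{-\alpha } \rightarrow 0\) as \(y \rightarrow \infty \), the boundary terms in the integration by parts vanish, and

$$\begin{aligned} ^{G}D^{\alpha } f(t) = \frac{1}{\Gamma (1-\alpha )} \int _{0}^{\infty } \frac{\partial }{\partial y} \left[ f(t) - f(t-y) \right] \frac{dy}{y^{\alpha }} = \frac{1}{\Gamma (1-\alpha )} \int _{0}^{\infty } f'(t-y) \frac{dy}{y^{\alpha }}, \end{aligned}$$

which is exactly the Caputo form \(^{C}D^{\alpha } f(t)\) above.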

Thus, the Grünwald-Letnikov fractional derivative can be considered as an approximation of the Caputo or the Riemann-Liouville fractional derivatives in the numerical analysis of fractional differential equations. In particular, if \(f \in F\) (or \(F^{+}\)), then the non-local operator \(A_h^\alpha f\) defined in (2.2) converges to \(^{RL}D^{\alpha } f\) or \(^{C}D^{\alpha } f\) in the \(L_1(\mathbb {R})\) (or \(L_1(\mathbb {R}_{+})\)) norm as \(h \rightarrow 0\); see [9, 11] for more details.

3 Monte Carlo approach to the Grünwald-Letnikov fractional derivatives

Let Y be a discrete random variable such that

$$\begin{aligned} \mathbb {P}\{ Y=k \} = p_k = p_k(\alpha ) = (-1)^{k+1} \frac{\Gamma (\alpha + 1)}{k! \, \Gamma (\alpha - k + 1)}, \quad k = 1, 2, \ldots \end{aligned}$$
(3.1)

Note that \(\mathbb {E}\,Y = \infty \) for \(0<\alpha <1\), since \(p_k \sim \frac{\alpha }{\Gamma (1-\alpha )} k^{-(1+\alpha )}\) by (2.4). (\(\mathbb {E}\) denotes the mathematical expectation.)

Given \(f \in F_0\) (or \(F_0^+\) or F), we define the stochastic process

$$\begin{aligned} \xi _h(t) = f(t - Yh). \end{aligned}$$
(3.2)

Then, if \(\mathbb {E} f(t - Yh) < \infty \), we have

$$\begin{aligned} \sum _{k=0}^\infty \gamma (\alpha , k) f(t - kh) = f(t) - \sum _{k=1}^\infty p_k(\alpha ) F_f(k) = f(t) - \mathbb {E} f(t-Yh), \end{aligned}$$
(3.3)

where

$$\begin{aligned} F_f(k) = f(t-kh), \quad k = 1, 2, \ldots \end{aligned}$$

We assume that for \(f \in F_0\) (or \(F_0^+\) or F)

$$\begin{aligned} \mathbb {E} f(t-Yh) < \infty \end{aligned}$$

for any fixed t and h.

Let \(Y_1, Y_2, \ldots , Y_n, \ldots \) be independent copies of the random variable Y; then by the strong law of large numbers

$$\begin{aligned} \frac{1}{N} \sum _{n=1}^{N} f(t - Y_n h) \, \longrightarrow \, \mathbb {E} f(t - Yh) \end{aligned}$$

with probability one for any fixed t and h, and hence

$$\begin{aligned} A_{N,h}^\alpha f(t) = \frac{1}{h^\alpha } \left[ f(t) - \frac{1}{N} \sum _{n=1}^{N} f(t - Y_n h) \right] \longrightarrow A_h^\alpha f(t), \quad 0< \alpha < 1, \quad N \rightarrow \infty , \end{aligned}$$

with probability one, where \(A_h^\alpha f\) is defined by (2.2).

Moreover, if \(f \in F_0\) (or \(F_0^+\) or F) and \(\mathrm {Var}\, f(t-Yh) < \infty \), then by the central limit theorem and Slutsky's lemma, as \(N \rightarrow \infty \) we have

$$\begin{aligned} B_N^\alpha = \frac{A_{N,h}^\alpha f(t) - A_h^\alpha f(t)}{\sqrt{v_N}} \longrightarrow ^{D} N (0,1), \end{aligned}$$

where \(\longrightarrow ^{D}\) means convergence in distribution and N(0, 1) is the standard normal law.

Here \(v_N\) is the sample variance

$$\begin{aligned} v_N = \frac{1}{N (N-1)h^{2\alpha }} \sum _{n=1}^{N} \left[ f(t - Y_n h) - \frac{1}{N} \sum _{m=1}^{N} f(t - Y_m h) \right] ^2. \end{aligned}$$

Thus, for a given \(\varepsilon > 0\) and large N,

$$\begin{aligned} \mathbb {P} \left\{ |B_N^\alpha | \le \Phi _{1-\frac{\varepsilon }{2}} \right\} \approx \mathbb {P} \left\{ |N(0,1)| \le \Phi _{1-\frac{\varepsilon }{2}} \right\} = 1-\varepsilon , \end{aligned}$$

where \(\Phi _{1-\frac{\varepsilon }{2}}\) is the \(\left( 1-\frac{\varepsilon }{2}\right) \)-quantile of the standard normal law; hence for large N we obtain the following asymptotic confidence interval:

$$\begin{aligned} \mathbb {P} \left\{ A_{N,h}^\alpha f(t) - \Phi _{1-\frac{\varepsilon }{2}} \sqrt{v_N} \le A_h^\alpha f(t) \le A_{N,h}^\alpha f(t) + \Phi _{1-\frac{\varepsilon }{2}} \sqrt{v_N} \right\} \approx 1-\varepsilon ; \end{aligned}$$

for example, if \(\varepsilon = 0.05\), \(\Phi _{1-\frac{\varepsilon }{2}} \approx 1.96\).
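To make the interval computation concrete, here is a minimal Python sketch (names and test data are illustrative, not part of the method): given N simulated values \(f(t - Y_n h)\), it forms \(A_{N,h}^\alpha f(t)\), the sample variance \(v_N\), and the asymptotic 95% confidence interval. The simulation of the \(Y_n\) themselves is described in the next section.

```python
import numpy as np

def estimate_with_ci(f_t, samples, h, alpha, quantile=1.96):
    """samples[n] = f(t - Y_n h); returns (A_{N,h}^alpha f(t), CI half-width)."""
    N = len(samples)
    mean = samples.mean()
    estimate = (f_t - mean) / h**alpha
    v_N = np.sum((samples - mean) ** 2) / (N * (N - 1) * h ** (2 * alpha))
    return estimate, quantile * np.sqrt(v_N)   # Phi_{0.975} ~ 1.96

# Synthetic illustration only (these are NOT values from a real trial):
rng = np.random.default_rng(0)
fake_samples = rng.uniform(0.4, 0.6, size=2000)
est, hw = estimate_with_ci(1.0, fake_samples, h=0.01, alpha=0.5)
print(f"{est:.4f} +/- {hw:.4f}")
```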

4 The basic scheme of the method

The above results can be used as the basis of a Monte Carlo method for the numerical approximation and computation of the Grünwald-Letnikov fractional derivatives.

Indeed, we can replace the samples \(Y_1\), \(Y_2\), ..., \(Y_N\) by their Monte Carlo simulations. For the simulation of the random variable Y with distribution (3.1), we introduce the values of the cumulative distribution function

$$\begin{aligned} F_j = \sum _{i=1}^{j} p_i, \end{aligned}$$

where \(p_k=p_k(\alpha )\) are defined in (3.1). Then

$$\begin{aligned} 0 = F_0< F_1< \ldots< F_j < \ldots , \text{ and } p_j = F_j - F_{j-1}. \end{aligned}$$

If U is a random variable uniformly distributed on [0, 1], then

$$\begin{aligned} \mathbb {P} (F_{j-1}<U<F_j) = p_j , \end{aligned}$$

and hence, to generate \(Y \in \{ 1, 2, \ldots \}\), we set

$$\begin{aligned} Y=k, \text{ if } \, F_{k-1} \le U < F_k. \end{aligned}$$
(4.1)
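A minimal Python sketch of this inversion (illustrative only; the actual toolbox is in MATLAB) accumulates \(F_j\) on the fly using the weight recurrence from Section 2, so no large table is needed. Since \(\mathbb {E}\,Y = \infty \), values of U very close to 1 can produce extremely large Y; the cap k_max below is an implementation choice of this sketch that guards against such rare excursions.

```python
import random

def sample_Y(alpha, k_max=10**6):
    """Sample Y with P{Y = k} = p_k(alpha), 0 < alpha < 1, by the inversion (4.1)."""
    u = random.random()
    k, p, F = 1, alpha, alpha              # p_1 = alpha, so F_1 = alpha
    while u >= F and k < k_max:            # stop once F_{k-1} <= u < F_k
        k += 1
        p *= 1.0 - (alpha + 1.0) / k       # p_k = (1 - (alpha+1)/k) p_{k-1}
        F += p
    return k

random.seed(1)
print([sample_Y(0.5) for _ in range(10)])  # heavy-tailed: occasional large values
```

Note that \(F_1 = p_1 = \alpha \), which is why the first division point in Fig. 1 below equals \(\alpha \).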

Each trial (draw) of the proposed Monte Carlo method includes the following steps.

  1. Evaluate \(p_i=p_i(\alpha )\) defined in (3.1).

  2. Evaluate \(F_j = \sum \limits _{i=1}^{j} p_i\).

  3. Generate N independent uniformly distributed random points, and compute the values \(Y_i\) using (4.1).

  4. Evaluate the expression

     $$\begin{aligned} A_{N,h}^\alpha f(t) = \frac{1}{h^\alpha } \left[ f(t) - \frac{1}{N} \sum _{k=1}^{N} f(t - Y_k h) \right] . \end{aligned}$$
     (4.2)

After repeating steps 1–4 K times (K trials), the mean of the obtained K values gives an approximation of the fractional derivative of order \(\alpha \), \(0<\alpha \le 1\), at the point t. A minimal end-to-end sketch is given below.
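The following Python sketch assembles steps 1–4 (illustrative; the actual toolbox is in MATLAB). The cap on Y is harmless here because f is extended by zero for negative arguments (Section 2), so any \(Y > t/h\) contributes \(f(t - Yh) = 0\).

```python
import math
import numpy as np

def sample_Y(alpha, rng, k_max):
    # Inversion (4.1) with on-the-fly cumulative sums, capped at k_max.
    u, k, p, F = rng.random(), 1, alpha, alpha
    while u >= F and k < k_max:
        k += 1
        p *= 1.0 - (alpha + 1.0) / k
        F += p
    return k

def gl_derivative_mc(f, t, alpha, h=0.01, N=2000, K=100, seed=0):
    """Mean over K trials of A_{N,h}^alpha f(t), cf. (4.2)."""
    rng = np.random.default_rng(seed)
    k_max = int(t / h) + 2          # beyond this, f(t - Y h) = 0 anyway
    trials = []
    for _ in range(K):
        Y = np.array([sample_Y(alpha, rng, k_max) for _ in range(N)])
        trials.append((f(t) - np.mean(f(t - Y * h))) / h**alpha)
    return float(np.mean(trials))

# Quick check on the Heaviside function (cf. Example 1 in Section 6):
# the exact value of D^{1/2} H(t) at t = 1 is 1/Gamma(1/2) = 1/sqrt(pi).
H = lambda t: np.where(t > 0, 1.0, 0.0)
print(gl_derivative_mc(H, t=1.0, alpha=0.5), 1.0 / math.sqrt(math.pi))
```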

5 A close look at the stages of implementation

The proposed algorithm has been implemented in MATLAB [13]; this allows useful visualizations and numerical experiments with the functions that are frequently used in the fractional calculus and in fractional-order differential equations.

Obviously, the approximation (4.2) involves the mean of the values of f evaluated at the points \(t_k = t - Y_k h\) (\(k = 1, \ldots , N\)), where t is the current point of interest. In other words, all necessary function values are taken with the same weight.

The values \(F_j\) divide the interval [0, 1] into subintervals of unequal length. In Fig. 1 these divisions of [0, 1] are shown for the orders \(\alpha =0.3\), \(\alpha =0.5\), and \(\alpha =0.7\). The first division point equals \(\alpha \) (since \(F_1 = p_1(\alpha ) = \alpha \)), and the density of the points \(F_j\) increases towards 1. Consequently, a uniform random number U falling close to 1 produces a large value of \(Y_k\) (a node far from the current point), while smaller values of U produce small values of \(Y_k\) (nodes close to the current point).

Fig. 1 Distribution of the values of \(F_j\) within [0, 1] for \(\alpha =0.3\), \(\alpha =0.5\), and \(\alpha =0.7\), for \(N=2000\) points of division

Fig. 2 Examples of random distribution of the nodes \(t_k\), at which the function values are computed in the interval \(0 \le t \le 5\), for \(\alpha =0.3\) and \(N=2000\) points of division

As a result, the points \(t_k\), at which the function f(t) is evaluated, are distributed over the interval [0, t] non-uniformly. Examples of the distributions of \(t_k\) over [0, t] in some trials (draws) are shown in Fig. 2, Fig. 3, and Fig. 4 for \(t=5\) and \(\alpha =0.3\), \(\alpha =0.5\), \(\alpha =0.7\), respectively. We observe that the density of \(t_k\) near the current point increases with increasing \(\alpha \).

Fig. 3 Examples of random distribution of the nodes \(t_k\), at which the function values are computed in the interval \(0 \le t \le 5\), for \(\alpha =0.5\) and \(N=2000\) points of division

Fig. 4 Examples of random distribution of the nodes \(t_k\), at which the function values are computed in the interval \(0 \le t \le 5\), for \(\alpha =0.7\) and \(N=2000\) points of division

Fig. 5 Derivative of order 0.2 of the Heaviside unit-step function H(t)

Fig. 6 Derivative of order 0.5 of the Heaviside unit-step function H(t)

The overall picture is reminiscent of some recent efforts in the finite difference methods based on non-equidistant grids or nested meshes [3,4,5,6,7, 14]. The common feature is the higher density of discretization nodes near the current point and the lower density far from it; see [4, Fig. 1] and [7, Fig. 2]. However, in the aforementioned papers the function values at the discretization nodes are taken with different weights, whereas in the proposed Monte Carlo approach all weights are the same.

Fig. 7 Derivative of order 0.5 of the power function \(y(t)=t^\nu \), \(\nu = 1.3\)

The dependence of the node density on \(\alpha \) suggests that the finite difference approaches using non-uniform grids or nested meshes might be further improved if this dependence on the order \(\alpha \) of fractional differentiation were taken into account.

6 Examples

The following examples demonstrate the use of the proposed Monte Carlo method for fractional-order differentiation. In all examples, the results of the computations are compared with the exact fractional-order derivatives of the functions considered. The Mittag-Leffler function [11]

$$\begin{aligned} E_{\alpha , \beta } (z) = \sum _{n=0}^{\infty } \frac{z^n}{\Gamma (\alpha n + \beta )}, \quad z \in \mathbb {C}, \quad \alpha , \beta > 0, \end{aligned}$$

that appears in some of the provided examples is computed using [12]. In all examples the considered interval is sufficiently large, namely \(t \in [0, 10]\). The exact fractional derivatives are plotted using solid lines; the results of the proposed Monte Carlo method are shown by bold points; the results of the K individual trials (draws) are shown by small vertically arranged points (in all examples, \(K=100\)); and the confidence intervals are shown by short horizontal lines above and below the bold points.
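For a quick self-check of the Mittag-Leffler values outside MATLAB, a truncated power series suffices for the moderate arguments appearing in these examples. The Python sketch below is illustrative only and is not a robust evaluation method for large |z|; for that, the algorithm of [12] should be used.

```python
import math

def mittag_leffler(z, alpha, beta, n_terms=40):
    """Truncated series E_{alpha,beta}(z) = sum_{n>=0} z^n / Gamma(alpha*n + beta)."""
    return sum(z ** n / math.gamma(alpha * n + beta) for n in range(n_terms))

print(mittag_leffler(-0.1, 1.0, 1.0), math.exp(-0.1))  # E_{1,1}(z) = e^z
print(mittag_leffler(-1.0, 2.0, 2.0), math.sin(1.0))   # sin t = t E_{2,2}(-t^2), t = 1
```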

Fig. 8 Derivative of order 0.5 of the function \(y(t) = e^{-\lambda t} - 1\), \(\lambda = 0.1\)

Fig. 9 Derivative of order 0.5 of the Mittag-Leffler function \(y(t) = E_{\alpha , 1} (-\lambda t^\alpha )\), \(\alpha =0.5\), \(\lambda = 0.1\)

Fig. 10 Derivative of order 0.7 of the function \(y(t) = E_{\alpha , 1} (-\lambda t^\alpha ) -1\), \(\alpha =0.5\), \(\lambda = 0.1\)

Fig. 11 Derivative of order 0.5 of the function \(y(t) = \sin (t)\)

Fig. 12 Derivative of order 0.5 of the function \(y(t) = \cos (t)\)

6.1 Example 1. The Heaviside function

$$\begin{aligned} y(t) = H(t), \qquad D^\alpha y(t) = \frac{t^{-\alpha }}{\Gamma (1 - \alpha )}, \quad t > 0. \end{aligned}$$

The result for \(\alpha = 0.2\) is shown in Fig. 5, and for \(\alpha = 0.5\) in Fig. 6. Since \(H(t) = 1\) for \(t>0\), these figures represent, in fact, the Grünwald-Letnikov (and Riemann-Liouville) fractional-order derivative of a constant. We see that in this case the sample variance of the values obtained in individual trials (draws) is very small.

6.2 Example 2. The power function

$$\begin{aligned} y(t) = t^\nu , \qquad D^\alpha y(t) = t^{\nu -\alpha } \frac{\Gamma (\nu + 1)}{\Gamma (\nu +1 - \alpha )}, \quad t > 0. \end{aligned}$$

The result for \(\nu = 1.3\) and \(\alpha = 0.5\) is shown in Fig. 7. In this case, the sample variance of the values obtained in individual trials (draws) increases, but the confidence intervals remain sufficiently small.

6.3 Example 3. The exponential function

$$\begin{aligned} y(t) = e^{-\lambda t} - 1, \qquad D^\alpha y(t) = t^{-\alpha } E_{1, 1-\alpha } (-\lambda t) - \frac{t^{-\alpha }}{\Gamma (1-\alpha )}, \quad t > 0. \end{aligned}$$

The result for \(\lambda = 0.1\) and \(\alpha = 0.5\) is shown in Fig. 8. The sample variance of the values obtained in individual trials (draws) increases, but the confidence intervals remain sufficiently small.

6.4 Example 4. The Mittag-Leffler function

$$\begin{aligned} (A) \qquad y(t) = E_{\alpha , 1} (-\lambda t^\alpha ), \qquad D^\alpha y(t) = t^{-\alpha } E_{\alpha , 1-\alpha } (-\lambda t^\alpha ), \quad t > 0. \end{aligned}$$

The result for \(\lambda = 0.1\) and \(\alpha = 0.5\) is shown in Fig. 9. We see that in this case the sample variance of the values obtained in individual trials (draws) is very small, and the same holds for the confidence intervals.

$$\begin{aligned} (B) \quad y(t) = E_{\alpha , 1} (-\lambda t^\alpha ) - 1, \quad D^\alpha y(t) = t^{-\alpha } E_{\alpha , 1-\alpha } (-\lambda t^\alpha ) - \frac{t^{-\alpha }}{\Gamma (1-\alpha )}, \,\, t > 0. \end{aligned}$$

The result for \(\lambda = 0.1\) and \(\alpha = 0.7\) is shown in Fig. 10. We see that in this case, although the sample variance of the values obtained in individual trials (draws) is relatively large, the confidence intervals are still sufficiently small.

6.5 Example 5. The trigonometric functions

Taking into account that \(\sin (t) = t E_{2,2} (-t^2)\) and \(\cos (t) = E_{2,1} (-t^2)\), we have:

$$\begin{aligned}&(A) \qquad y(t) = \sin (t), \qquad D^\alpha y(t) = t^{1-\alpha } E_{2, 2-\alpha } (-t^2), \quad t> 0,\\&(B) \qquad y(t) = \cos (t), \qquad D^\alpha y(t) = t^{-\alpha } E_{2, 1-\alpha } (-t^2), \quad t > 0. \end{aligned}$$

The results for (A) and (B) are shown in Fig. 11 and Fig. 12, respectively. We see that in these cases, too, the sample variance of the values obtained in individual trials (draws) is very small, and the same holds for the confidence intervals.

Overall, these examples show that the proposed Monte Carlo method for fractional-order differentiation works well for various kinds of functions that are important in applications of the fractional calculus and that exhibit various types of behavior.

7 Concluding remarks: a way to parallelization

In this work, a Monte Carlo method is proposed for the approximation and computation of fractional-order derivatives. It can be used for the evaluation of all three types of fractional-order derivatives that usually appear in applications: the Grünwald-Letnikov, the Riemann-Liouville, and also the Caputo fractional derivatives, when the latter are equivalent to the Riemann-Liouville derivatives [11, Section 3.1].

The proposed method is implemented in the form of a toolbox for MATLAB and illustrated on several examples. This opens a way to the development of a family of Monte Carlo methods for the fractional calculus, using standard enhancement techniques such as variance reduction, importance sampling, stratified sampling, control variates, or antithetic sampling.

By its nature, the proposed Monte Carlo method for fractional-order differentiation allows parallelization of computations on multi-core processors, GPUs, computer grids, and parallel computers, and therefore has high potential for applications.
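As a sketch of what such parallelization can look like (Python's process pool here, purely for illustration; the names below are not part of any toolbox), the K independent trials can be distributed across worker processes and averaged at the end:

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def one_trial(args):
    """One trial: steps 1-4 of Section 4, here for the Heaviside function."""
    alpha, t, h, N, seed = args
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(N):
        u, k, p, F = rng.random(), 1, alpha, alpha
        while u >= F and k * h <= t:           # cap: f(t - kh) = 0 beyond t/h
            k += 1
            p *= 1.0 - (alpha + 1.0) / k
            F += p
        total += 1.0 if t - k * h > 0 else 0.0  # f = Heaviside unit step
    return (1.0 - total / N) / h**alpha         # (4.2) with f(t) = 1

if __name__ == "__main__":
    K = 100
    jobs = [(0.5, 1.0, 0.01, 2000, seed) for seed in range(K)]
    with ProcessPoolExecutor() as pool:
        trials = list(pool.map(one_trial, jobs))
    print(np.mean(trials))   # ~ 1/sqrt(pi) = 0.5642 (cf. Example 1)
```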