1 Introduction

This paper is a continuation of our paper [8], where the Monte Carlo method for fractional differentiation was introduced and used for the approximation and evaluation of the Grünwald–Letnikov fractional derivative of order \(0<\alpha <1\). The goal of the present work is the extension of the Monte Carlo method for fractional-order differentiation to higher orders.

Let us recall that the Grünwald–Letnikov fractional derivative is a non-local operator on \(L_1({\mathbb {R}})\) defined as [10]

$$\begin{aligned} D^\alpha f(t) = \lim _{h \rightarrow 0} A_{h}^{\alpha } f(t), \quad \alpha> 0, \quad h>0, \end{aligned}$$
(1.1)

where

$$\begin{aligned} A_{h}^{\alpha } f(t) = \frac{1}{h^\alpha } \sum _{k=0}^{\infty } w_k f(t-kh), \end{aligned}$$
(1.2)
$$\begin{aligned} w_k = \gamma (\alpha ,k) = (-1)^k \frac{\varGamma (\alpha +1)}{k! \, \varGamma (\alpha - k + 1)} . \end{aligned}$$
(1.3)

If \(f(t)=0\) for \(t < 0\), the fractional-order difference (1.2) can be used for the numerical evaluation of the Grünwald–Letnikov fractional-order derivative, the Riemann–Liouville fractional derivative, and the Caputo fractional derivative whenever the latter is equivalent to the Riemann–Liouville one [10].
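For numerical work, the coefficients (1.3) are conveniently generated by the recurrence \(w_0 = 1\), \(w_k = w_{k-1}\bigl (1 - (\alpha +1)/k\bigr )\). A minimal MATLAB sketch of the direct (deterministic) evaluation of the truncated sum (1.2) for a function vanishing for \(t<0\) might look as follows; this is an illustration of ours, not the implementation of [13], and the function name is hypothetical.

% Sketch: direct evaluation of the truncated Grunwald-Letnikov difference (1.2).
% Assumes f(t) = 0 for t < 0, so the sum in (1.2) is effectively finite.
function d = gl_direct(f, t, alpha, h)
    K = floor(t / h);                     % f(t - kh) = 0 for k > t/h
    w = 1;                                % w_0 = 1
    d = f(t);                             % k = 0 term
    for k = 1:K
        w = w * (1 - (alpha + 1) / k);    % w_k = w_{k-1} (k - alpha - 1)/k
        d = d + w * f(t - k * h);
    end
    d = d / h^alpha;
end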

2 A bit of history: signed probabilities and fractions of a coin

Since for the coefficients \(w_k = \gamma (\alpha ,k)\), \(k = 1, 2, \ldots \), we have [8, 9],

$$\begin{aligned} \sum _{k=1}^{\infty }\left( -\gamma (\alpha ,k)\right) = 1, \end{aligned}$$
(2.1)

we can look at the coefficients \(w_k\) as probabilities. However, not all of the coefficients \(w_k = \gamma (\alpha ,k)\) are always positive, and in such situations we can consider them as signed probabilities, which can be positive or negative while still preserving the property (2.1).

The notion of negative probability goes back to Wigner's 1932 remark [16] that certain expressions he considered

“...cannot be really interpreted as the simultaneous probability for coordinates and momenta, as is clear from the fact, that it may take negative values. But of course this must not hinder the use of it in calculations as an auxiliary function which obeys many relations we would expect from such a probability.”

In 1942, Dirac [4], who used negative energies and negative probabilities in quantum mechanics, emphasized that negative energy states occur with negative probability and positive energy states occur with positive probability, and that it is possible to develop a theory that would allow its application “to a hypothetical world in which the initial probability of certain states is negative, but transition probabilities calculated for this hypothetical world are found to be always positive”; he concluded that

“Negative energies and probabilities should not be considered as nonsense. They are well-defined concepts mathematically, like a negative sum of money, since the equations which express the important properties of energies and probabilities can still be used when they are negative. Thus negative energies and probabilities should be considered simply as things which do not appear in experimental results.”

It appears that Dirac’s paper directly motivated Bartlett in 1945 to introduce signed probabilities [1], and one of the important consequences of allowing negative probabilities was that

“...a negative probability implies automatically a complementary probability greater than unity...”

This idea was also elaborated by Feynman [6] in 1987. Similarly to Bartlett, he mentioned that using negative probabilities leads to simultaneously allowing probabilities greater than one, but “probability greater than unity presents no problem different from that of negative probabilities”, because those are used in intermediate calculations and the final result is always expressed in positive numbers between 0 and 1.

This development of the idea of allowing probabilities to be both negative and positive naturally leads to the algebraic theory of probability, see [14]. As quoted above, negative probabilities can be used in intermediate computations, while the final result must be physical (that is, positive and measurable).

This is similar to using, say, negative resistors or capacitors in electrical circuits: there are no passive elements having negative resistance, but it is possible to create circuits exhibiting negative resistances, negative capacitances, and negative inductances. This was pointed out by Bode in his classical book [2, Chapter IX]. Such circuits are called negative-impedance converters ([5, 12]) and are based on operational amplifiers. For example, a negative resistance of −10 k\(\varOmega \) means that if such an element is connected in series with a classical 20 k\(\varOmega \) resistor, then the resistance of the resulting connection is 10 k\(\varOmega \).

A beautiful example of the interpretation of negative probabilities was developed by Székely [15], who introduced half-coins as random variables that take the values n with signed probabilities: positive for odd values of n and negative for even values. A fair coin is a random variable X that takes the values 0 and 1, each with probability 1/2. Its probability generating function is \({\mathbb {E}} \, z^X =(1+z)/2\). The addition of independent random variables corresponds to the multiplication of their probability generating functions. Therefore, the probability generating function of the sum of two fair coins is \(((1+z)/2)^2\), and it is natural to define a half-coin via its probability generating function

$$\begin{aligned} \left( \frac{1+z}{2}\right) ^\mu = \sum _{n=0}^{\infty } p_n z^n, \quad \mu = \frac{1}{2}, \end{aligned}$$

where for \(\mu = 1/2\) the coefficients \(p_n = 2^{-1/2}\left( {\begin{array}{c}1/2\\ n\end{array}}\right) \) have alternating signs for \(n \ge 1\). When “flipping” two half-coins, the sum of the outcomes is 0 or 1, each with probability 1/2, as in the case of flipping a normal coin.

Taking \(\mu = 1/3\) yields third-coins, and “flipping” three of them makes a normal coin; \(\mu = 1/4\) defines a quarter-coin, etc. Non-integer values of \(\mu \) produce \(\mu \)-coins, and to get a normal coin one has to “flip” \(1/\mu \) such \(\mu \)-coins, which, depending on \(\mu \), can be a finite or even an infinite number of such “fractions of coins”.
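As a quick numerical illustration of this construction (a sketch of ours, not taken from [15]): convolving the sequence \(p_n\) with itself must reproduce the coefficients of the fair coin \((1+z)/2\).

% Sketch: "flipping" two half-coins. The signed probabilities p_n are the
% Taylor coefficients of ((1+z)/2)^(1/2); convolving the sequence with itself
% squares the generating function and must give the coefficients of (1+z)/2.
K = 20;  n = 0:K;
p = 2^(-1/2) * gamma(3/2) ./ (gamma(n + 1) .* gamma(3/2 - n));  % 2^(-1/2)*binom(1/2,n)
q = conv(p, p);       % coefficients of (1+z)/2, exact up to index K
disp(q(1:4))          % approximately [0.5  0.5  0  0]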

Extending the Monte Carlo method introduced in our paper [8] for the case \(0<\alpha <1\) to orders higher than one leads to working with signed probability distributions, because in that case the coefficients \(w_k=\gamma (\alpha ,k)\) in the fractional-order difference (1.2) have different signs, and some of them take values outside the interval (0, 1), as mentioned by Bartlett and Feynman, while still satisfying the property (2.1).

3 Monte Carlo fractional differentiation using signed probabilities

Since we deal with signed probabilities, it is necessary to follow the ideas of Dirac, Bartlett, and Feynman, which means that in order to use the Monte Carlo method we have to transform the signed probabilities into positive ones. As a result, the expression used for Monte Carlo simulations will consist of two parts: one of them may contain terms with different signs and is independent of the random variables used in the Monte Carlo draws (trials), while the other contains only terms of the same (positive) sign and depends on those random variables. Below we first illustrate this for the cases \(1<\alpha <2\) and \(2<\alpha <3\), and then present a framework for the general case.

3.1 The case of order \(1<\alpha <2\)

When \(1<\alpha <2\), we have \(w_1=\gamma (\alpha ,1) = -\alpha < 0\), and \(w_k > 0\) for \(k=2, 3, \ldots \)

Let \(Y \in \{1, 2, \ldots \}\) be a discrete random variable such that

$$\begin{aligned} {\mathbb {P}} (Y=1)= & {} p_1 = p_1(\alpha ) = 2-\alpha \in (0,1) \nonumber \\ {\mathbb {P}} (Y=k)= & {} p_k = p_k(\alpha ) = w_k = (-1)^k \frac{\varGamma (\alpha +1)}{k! \, \varGamma (\alpha - k + 1)}, \,\, k = 2, 3, \ldots \quad \end{aligned}$$
(3.1)

and

$$\begin{aligned} \sum _{k=1}^{\infty } p_k = 1, \end{aligned}$$

since

$$\begin{aligned} \sum _{k=0}^{\infty } w_k = 0 = 1- \alpha +\sum _{k=2}^{\infty } w_k, \end{aligned}$$

or

$$\begin{aligned} -1= & {} - \alpha +\sum _{k=2}^{\infty } w_k, \\ 1= & {} 2-\alpha + \sum _{k=2}^{\infty } w_k. \end{aligned}$$

Note that \({\mathbb {E}} \, Y < \infty \), but \(\text{ Var } \, Y = \infty \).

We have the following:

$$\begin{aligned} \sum _{k=0}^{\infty } w_k \, f(t - kh)= & {} f(t) + \sum _{k=1}^{\infty } w_k \, f(t - kh) \nonumber \\= & {} f(t) -\alpha f(t-h) + \sum _{k=2}^{\infty } w_k \, f(t - kh) \nonumber \\= & {} f(t) + (-2 + p_1(\alpha )) f(t-h) + \sum _{k=2}^{\infty } w_k \, f(t - kh) \nonumber \\= & {} f(t) - 2f(t-h) + \sum _{k=1}^{\infty } p_k \, f(t - kh) \nonumber \\= & {} f(t) - 2f(t-h) + {\mathbb {E}} f(t-Yh), \end{aligned}$$
(3.2)

if the stochastic process \(\zeta _h(t) = f(t - Y h)\) is such that, for fixed f, t, and h, \({\mathbb {E}}\, f(t-Yh) < \infty \), where the random variable Y is defined by (3.1).

Let \(Y_1\), \(Y_2\), ..., \(Y_n\), ... be independent copies of the random variable Y; then by the strong law of large numbers

$$\begin{aligned} \frac{1}{N} \sum _{n=1}^{N} f(t - Y_n h) \rightarrow {\mathbb {E}} f(t- Y\,h), \quad N \rightarrow \infty \end{aligned}$$

with probability one for any fixed t and h, and hence, as \(N \rightarrow \infty \),

$$\begin{aligned} A_{N,h}^{\alpha } f(t) = \frac{1}{h^\alpha } \bigl [ f(t) - 2 f(t-h) + \frac{1}{N} \sum _{n=1}^{N} f(t - Y_n h) \bigr ] \end{aligned}$$
(3.3)

converges with probability one to \(A_h^\alpha f(t)\), \(\alpha \in (1,2)\), defined by (1.2).

Moreover, if for fixed t and h

$$\begin{aligned} \text{ Var } f(t-Yh) < \infty , \end{aligned}$$

then by the central limit theorem and the Slutsky lemma, as \(N \rightarrow \infty \), we have

$$\begin{aligned} B_N^\alpha = \bigl [ A_{N,h}^{\alpha } f(t) - A_{h}^{\alpha } f(t) \bigr ] / \sqrt{v_N} \rightarrow ^D N(0,1), \end{aligned}$$

where \(\rightarrow ^D\) denotes convergence in distribution, N(0, 1) is the standard normal law, and \(v_N\) is the sample variance of the random variables \(f(t-Y_n h)h^{-\alpha }\), \(n=1, 2, \ldots , N\). This allows us to build asymptotic confidence intervals (see details in [8]).

The above results can be used as the basis of the Monte Carlo method for numerical approximation and evaluation of the Grünwald–Letnikov fractional derivatives.

Indeed, we can replace the sample \(Y_1\), \(Y_2\), ..., \(Y_N\) by its Monte Carlo simulations.

For the simulation of the random variable Y with the distribution (3.1), we define

$$\begin{aligned} F_j = \sum _{i=1}^{j} p_i, \end{aligned}$$

where \(p_i = p_i(\alpha )\) are defined in (3.1).

Then

$$\begin{aligned} 0 = F_0< F_1< \ldots< F_j < \ldots , \quad \text{ with } \ \ p_i = F_i - F_{i-1}. \end{aligned}$$

If U is a random variable uniformly distributed on [0, 1], then

$$\begin{aligned} {\mathbb {P}} (F_{j-1}< U < F_j) = p_j, \end{aligned}$$

and to generate \(Y \in \{1, 2, \ldots \}\) we set \(Y=k\)  if  \(F_{k-1} \le U < F_k\).
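Putting the pieces together, the estimator (3.3) admits a straightforward implementation. The following MATLAB sketch is our illustration (not the code of [13]); the distribution (3.1) is truncated at a cutoff K of our choosing, and the function name is hypothetical.

% Sketch: Monte Carlo approximation (3.3) of A_h^alpha f(t) for 1 < alpha < 2.
% The distribution (3.1) is truncated at K terms and sampled by inverse CDF.
function a = mc_gl_1_2(f, t, alpha, h, N, K)
    p = zeros(1, K);
    p(1) = 2 - alpha;                      % p_1 = 2 - alpha
    w = -alpha;                            % w_1
    for k = 2:K
        w = w * (1 - (alpha + 1) / k);     % w_k = w_{k-1} (k - alpha - 1)/k
        p(k) = w;                          % p_k = w_k > 0 for k >= 2
    end
    F = cumsum(p);                         % F_j = p_1 + ... + p_j
    U = rand(N, 1);  S = 0;
    for nn = 1:N
        Y = find(U(nn) < F, 1);            % Y = k if F_{k-1} <= U < F_k
        if isempty(Y), Y = K; end          % safeguard against truncation
        S = S + f(t - Y * h);
    end
    a = (f(t) - 2 * f(t - h) + S / N) / h^alpha;
end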

3.2 The case of order \(2<\alpha <3\)

When \(2< \alpha <3\), we have

$$\begin{aligned} w_1= & {} -\alpha< 0; \quad w_2 = \frac{\alpha (\alpha - 1)}{2} > 0; \quad \\ w_3= & {} - \frac{\alpha (\alpha -1) (\alpha -2)}{6}< 0; \\&w_k < 0 \ \ \text{ for } \ \ k = 3, 4, \ldots \end{aligned}$$

Using the binomial series

$$\begin{aligned} (1-z)^\alpha = \sum _{k=0}^{\infty } w_k z^k, \quad |z|\le 1, \end{aligned}$$

we easily obtain

$$\begin{aligned} -1 = -\alpha + \frac{\alpha (\alpha -1)}{2} + \sum _{k=3}^{\infty } w_k, \end{aligned}$$

and denoting

$$\begin{aligned} q_1 = \alpha ; \quad q_2 = - \frac{\alpha (\alpha -1)}{2}; \quad q_k= -w_k, \ \, k\ge 3, \end{aligned}$$

we have

$$\begin{aligned} 1 = q_1 + q_2 + \sum _{k=3}^{\infty } q_k, \end{aligned}$$

and

$$\begin{aligned} q_1 + q_2 = \alpha - \frac{\alpha (\alpha -1)}{2} = \frac{3}{2} \alpha - \frac{\alpha ^2}{2} \in (0,1) \quad \text{ for } \ \ \alpha \in (2, 3). \end{aligned}$$

Thus, putting \(p_2 = q_1 + q_2\) and \(p_k = q_k = -w_k\) for \(k\ge 3\),

$$\begin{aligned} \sum _{k=2}^{\infty } p_k = 1, \end{aligned}$$

where \(0< p_k < 1\) for \(k = 2, 3, \ldots \)

Now, introducing the discrete random variable

$$\begin{aligned} Y \in \{ 2, 3, \ldots , n, \dots \} \end{aligned}$$

such that

$$\begin{aligned} {\mathbb {P}} \left( Y = k \right) = p_k, \quad k = 2, 3, \ldots , \end{aligned}$$

we have the following:

$$\begin{aligned} \sum _{k=0}^{\infty } w_k f(t-kh)= & {} f(t) -\alpha f(t-h) + \frac{\alpha (\alpha -1)}{2} f(t-2h) + \sum _{k=3}^{\infty } w_k f(t-kh) \nonumber \\= & {} f(t) - q_1 f(t-h) - q_2 f(t-2h) - \sum _{k=3}^{\infty } p_k f(t-kh) \nonumber \\= & {} f(t) - \alpha \left[ f(t-h) - f(t-2h) \right] - (q_1 + q_2) f(t-2h) \nonumber \\&- \sum _{k=3}^{\infty } p_k f(t-kh) \nonumber \\= & {} f(t) - \alpha \left[ f(t-h) - f(t-2h) \right] - \sum _{k=2}^{\infty } p_k f(t-kh) \nonumber \\= & {} f(t) - \alpha \left[ f(t-h) - f(t-2h) \right] - {\mathbb {E}} \, f(t-Yh). \end{aligned}$$
(3.4)

Let \(Y_1\), \(Y_2\), ..., \(Y_n\), ... be independent copies of the random variable Y; then for fixed t and h such that

$$\begin{aligned} {\mathbb {E}} \, f(t-Yh) < \infty \end{aligned}$$
(3.5)

by the strong law of large numbers we obtain

$$\begin{aligned} \frac{1}{N} \sum _{n=1}^{N} f(t-Y_n h) \longrightarrow {\mathbb {E}} \, f(t-Yh) \end{aligned}$$

with probability one as \(N \rightarrow \infty \).

Thus under assumption (3.5) we have the following convergence with probability one:

$$\begin{aligned} A_{N,h}^{\alpha } f(t)= & {} \frac{1}{h^\alpha } \left( f(t) - \alpha f(t-h) + \alpha f(t-2h ) - \frac{1}{N} \sum _{n=1}^{N} f(t-Y_n h) \right) \\&\longrightarrow A_{h}^{\alpha } f(t), \qquad N \rightarrow \infty , \end{aligned}$$

and this can be used for the evaluation of the Grünwald–Letnikov fractional derivative by the Monte Carlo method.

The simulation of \(Y_1\), \(Y_2\), ..., \(Y_n\), ... is similar to the previous cases of \(\alpha \in (0,1)\) and \(\alpha \in (1,2)\), with the only difference being that

$$\begin{aligned} F_j = \sum _{i=2}^{j} p_i, \qquad p_i = F_i-F_{i-1}, \end{aligned}$$

and if \(F_{k-1} \le U < F_k\), then \(Y = k\) (\(k = 2, 3, \ldots \); note that \(F_1 = 0\)), where U is a random variable uniformly distributed on [0, 1].
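A corresponding MATLAB sketch for \(\alpha \in (2,3)\) (again our illustration, with a hypothetical function name and a truncation cutoff K):

% Sketch: Monte Carlo approximation of A_h^alpha f(t) for 2 < alpha < 3,
% using p_2 = q_1 + q_2 and p_k = -w_k, k >= 3, sampled by inverse CDF.
function a = mc_gl_2_3(f, t, alpha, h, N, K)
    p = zeros(1, K);                       % p(k) = P(Y = k), k = 2, ..., K
    p(2) = 1.5 * alpha - alpha^2 / 2;      % p_2 = q_1 + q_2
    w = alpha * (alpha - 1) / 2;           % w_2
    for k = 3:K
        w = w * (1 - (alpha + 1) / k);     % w_k recurrence
        p(k) = -w;                         % p_k = -w_k > 0 for k >= 3
    end
    F = cumsum(p);                         % F_1 = 0, F_j = p_2 + ... + p_j
    U = rand(N, 1);  S = 0;
    for nn = 1:N
        Y = find(U(nn) < F, 1);            % Y = k if F_{k-1} <= U < F_k
        if isempty(Y), Y = K; end
        S = S + f(t - Y * h);
    end
    a = (f(t) - alpha * (f(t - h) - f(t - 2*h)) - S / N) / h^alpha;
end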

3.3 The general case for any \(\alpha >0\)

It turns out that the methodology presented above for \(\alpha \in (1,2)\) and \(\alpha \in (2,3)\) can be extended to the case of arbitrary \(\alpha > 0\) in terms of generalized (signed) probabilities.

Let us consider a generalized random variable \({\bar{Y}} \in \{1, 2, \ldots \}\) such that

$$\begin{aligned} \sum _{k=1}^{\infty } q_k = 1, \quad q_k \in {\mathbb {R}}, \end{aligned}$$

where we interpret the \(q_k\) as generalized probabilities \(\bar{{\mathbb {P}}} ({\bar{Y}}=k)\), which are allowed to be negative, but for which there exists an integer \(r \ge 1\) such that

$$\begin{aligned}&\quad q_1 + q_2 \in (0, 1), \\&\quad q_3 + q_4 \in (0, 1), \\&\quad \ldots \ldots \\&\quad q_{2r-1} + q_{2r} \in (0, 1) \\&\quad q_{2r+1} \in (0, 1), \quad q_{2r+2} \in (0, 1), \ldots \end{aligned}$$

Then we can write the following formal identity for a Borel function g(k), \(k = 1, 2, \ldots \):

$$\begin{aligned} \sum _{k=1}^{\infty } g(k) q_k= & {} q_1 ( g(1)-g(2) ) + g(2) (q_1+q_2) \\&+ \, q_3 (g(3)-g(4)) + g(4) (q_3 + q_4)\\&+ \ldots + q_{2j-1} ( g(2j-1) - g(2j) ) + g(2j) ( q_{2j-1} + q_{2j} ) \\&+ \ldots + q_{2r-1} ( g(2r-1)-g(2r) ) + g(2r) (q_{2r-1} + q_{2r}) \\&+ \, g(2r+1) q_{2r+1} + g(2r+2) q_{2r+2} + \ldots , \end{aligned}$$

which can be finally written as

$$\begin{aligned} \sum _{k=1}^{\infty } g(k) q_k= & {} \sum _{j=1}^{r} q_{2j-1} ( g(2j-1) - g(2j) ) \end{aligned}$$
(3.6)
$$\begin{aligned}&+ \sum _{j=1}^{r} g(2j) (q_{2j-1} + q_{2j} ) \end{aligned}$$
(3.7)
$$\begin{aligned}&+ \sum _{j=1}^{\infty } g(2r+j) q_{2r+j}. \end{aligned}$$
(3.8)

Thus, denoting

$$\begin{aligned} \pi _{2j}= & {} q_{2j-1} + q_{2j}, \quad j = 1, 2, \ldots , r,\\ \pi _{2r+j}= & {} q_{2r+j}, \quad j = 1, 2, \ldots , \end{aligned}$$

we define the generalized expectation

$$\begin{aligned} \bar{{\mathbb {E}}} \,g ({\bar{Y}})&{\mathop {=}\limits ^\mathrm{def}}&\sum _{k=1}^{\infty } g(k) q_k \end{aligned}$$
(3.9)
$$\begin{aligned}= & {} \sum _{j=1}^{r} q_{2j-1} ( g(2j-1) - g(2j) ) \end{aligned}$$
(3.10)
$$\begin{aligned}&+ \sum _{j=1}^{r} g(2j) \pi _{2j} + \sum _{j=1}^{\infty } g(2r+j) \pi _{2r+j} \end{aligned}$$
(3.11)
$$\begin{aligned}= & {} \sum _{j=1}^{r} q_{2j-1} ( g(2j-1) - g(2j) ) + {\mathbb {E}} \, {\bar{g}} (Y), \end{aligned}$$
(3.12)

where the ordinary discrete random variable Y is defined as

$$\begin{aligned}&Y \in {\bar{K}} = \{2, 4, \ldots , 2r, 2r+1, 2r+2, \ldots \},\\&{\mathbb {P}} (Y = k) = \pi _k, \quad k \in {\bar{K}}, \end{aligned}$$

and \({\bar{g}}(k)\), \(k \in {\bar{K}}\), is the restriction of the function g to \({\bar{K}}\).

If \(Y_1\), \(Y_2\), ..., \(Y_n\), ...are independent copies of the discrete random variable Y such that \({\mathbb {E}} \, {\bar{g}} (Y) < \infty \), then by the strong law of large numbers

$$\begin{aligned} \frac{1}{N} \sum _{n=1}^{N} {\bar{g}} (Y_n) \longrightarrow {\mathbb {E}} \, {\bar{g}} (Y), \quad N \rightarrow \infty \end{aligned}$$

with probability one, and hence for the generalized random variable \({\bar{Y}}\) with probability one

$$\begin{aligned} \sum _{j=1}^{r} q_{2j-1} (g(2j-1) - g(2j) ) + \frac{1}{N} \sum _{n=1}^{N} {\bar{g}} (Y_n) \longrightarrow \bar{{\mathbb {E}}} \, g({\bar{Y}}), \quad N \rightarrow \infty , \end{aligned}$$

where \(\bar{{\mathbb {E}}} \, g({\bar{Y}})\) is defined by (3.12).

Subsection 3.2 serves as an easy example of the presented general case: for \(\alpha \in (2,3)\) we can take \(r=1\), \(K = \{1, 2, \ldots , n, \ldots \}\) and \({\bar{K}} = \{2, 3, \ldots , n, \ldots \}\). The generalized (signed) probabilities in this case are

$$\begin{aligned}&\bar{{\mathbb {P}}} ( {\bar{Y}} = 1) = q_1 = \alpha > 0, \\&\bar{{\mathbb {P}}} ( {\bar{Y}} = 2) = q_2 = - \frac{\alpha (\alpha -1)}{2} < 0, \end{aligned}$$

but \(\pi _2 = q_1 + q_2 \in (0,1)\), as well as the remaining \(q_j = -w_j \in (0,1)\), \(j = 3, 4, \ldots \); here \(g(k) = f(t-kh)\), \(k = 1, 2, \ldots \), for a given f and fixed t and h.
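In the same spirit, the representation (3.12) can be turned into a generic estimator: the deterministic part is evaluated exactly, and \({\mathbb {E}} \, {\bar{g}} (Y)\) is estimated by Monte Carlo. A MATLAB sketch of ours (hypothetical helper name; the sequence \(q_k\) is truncated at a cutoff K with \(K \ge 2r+1\)):

% Sketch: estimate Ebar g(Ybar) = sum_{j=1}^r q_{2j-1} (g(2j-1) - g(2j)) + E gbar(Y)
% for a truncated vector q of signed probabilities and a pairing depth r.
function e = mc_signed_expectation(g, q, r, N)
    K  = numel(q);  jj = 1:r;
    det_part = sum(q(2*jj - 1) .* (arrayfun(g, 2*jj - 1) - arrayfun(g, 2*jj)));
    vals = [2*jj, (2*r + 1):K];                      % support of the ordinary Y
    pi_k = [q(2*jj - 1) + q(2*jj), q((2*r + 1):K)];  % positive probabilities pi_k
    F = cumsum(pi_k);  U = rand(N, 1);  S = 0;
    for nn = 1:N
        i = find(U(nn) < F, 1);  if isempty(i), i = numel(F); end
        S = S + g(vals(i));                          % draw of gbar(Y)
    end
    e = det_part + S / N;
end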

4 Monte Carlo method for fractional differentiation using the semi-group property of fractional differences

Another approach can be based on the semi-group property [3, 7] of the fractional-order finite difference operator defined in (1.2). In fact, in this way we can simply follow our paper [8], with the function f(t) replaced by its integer-order finite difference. Using the semi-group property of fractional-order differences, we can write

$$\begin{aligned} A_h^\alpha = A_h^{\alpha - n} A_h^n, \quad n< \alpha < n+1, \end{aligned}$$

where

$$\begin{aligned} A_h^n f(t) = \frac{1}{h^n} \sum _{j=0}^{n} (-1)^j \left( {\begin{array}{c}n\\ j\end{array}}\right) f(t - jh), \qquad \left( {\begin{array}{c}n\\ j\end{array}}\right) = \frac{n!}{j! \, (n-j)!}. \end{aligned}$$

Hence [8],

$$\begin{aligned} A_h^\alpha f(t)= & {} \frac{1}{h^{\alpha -n}} \sum _{k=0}^{\infty } \gamma (\alpha - n, k) A_h^n f(t-kh) \nonumber \\= & {} \frac{1}{h^{\alpha - n}} \left( \frac{1}{h^n} \sum _{j=0}^{n} (-1)^j \left( {\begin{array}{c}n\\ j\end{array}}\right) f(t - jh) \right. \nonumber \\&\left. - \, \frac{1}{h^n} \sum _{k=1}^{\infty } b_k(\alpha -n) \, \sum _{j=0}^{n} (-1)^j \left( {\begin{array}{c}n\\ j\end{array}}\right) f(t - (k+j)h) \right) , \end{aligned}$$
(4.1)

where, with \({\tilde{\alpha }} = \alpha -n \in (0,1)\),

$$\begin{aligned} b_k({\tilde{\alpha }})= & {} b_k (\alpha -n) = - \gamma ({\tilde{\alpha }}, k) \nonumber \\= & {} (-1)^{k+1} \frac{\varGamma ({\tilde{\alpha }} + 1)}{k! \, \varGamma ({\tilde{\alpha }}- k + 1)} > 0, \quad k = 1, 2, \ldots , \end{aligned}$$
(4.2)

and

$$\begin{aligned} \sum _{k=1}^{\infty } b_k = 1. \end{aligned}$$

Let \(Z \in \{ 1, 2, \ldots \}\) be a discrete random variable with

$$\begin{aligned} {\mathbb {P}} (Z=k) = b_k ({\tilde{\alpha }}), \quad k = 1, 2, \ldots \end{aligned}$$
(4.3)

Then from (4.1) we have

$$\begin{aligned} A_h^\alpha f(t)= & {} \frac{1}{h^{\alpha - n}} \left( \frac{1}{h^n} \sum _{j=0}^{n} (-1)^j \left( {\begin{array}{c}n\\ j\end{array}}\right) f(t - jh) \right. \nonumber \\&\left. - \, \frac{1}{h^n} {\mathbb {E}} \sum _{j=0}^{n} (-1)^j \left( {\begin{array}{c}n\\ j\end{array}}\right) f(t - (Z+j)h) \right) . \end{aligned}$$
(4.4)

If \(Z_1\), \(Z_2\), ..., \(Z_m\), ... are independent copies of the random variable Z, then, as \(N \rightarrow \infty \), by the strong law of large numbers

$$\begin{aligned}&\frac{1}{N} \sum _{m=1}^{N} \frac{1}{h^n} \sum _{j=0}^{n} (-1)^j \left( {\begin{array}{c}n\\ j\end{array}}\right) f(t - (Z_m+j)h) \\&\quad \longrightarrow {\mathbb {E}} \frac{1}{h^n} \sum _{j=0}^{n} (-1)^j \left( {\begin{array}{c}n\\ j\end{array}}\right) f(t - (Z+j)h) \end{aligned}$$

with probability one for \(\alpha \in (n, n+1)\), assuming that the last expectation exists, and hence with probability one as \(N \rightarrow \infty \) we have convergence to \(A_{h}^\alpha f(t)\) defined by (1.2):

$$\begin{aligned} A_{N,h}^\alpha f(t)= & {} \frac{1}{h^{\alpha - n}} \left( \frac{1}{h^n} \sum _{j=0}^{n} (-1)^j \left( {\begin{array}{c}n\\ j\end{array}}\right) f(t - jh) \right. \nonumber \\&\quad \left. - \, \frac{1}{N} \sum _{m=1}^{N} \frac{1}{h^n} \sum _{j=0}^{n} (-1)^j \left( {\begin{array}{c}n\\ j\end{array}}\right) f(t - (Z_m+j)h) \right) \nonumber \\&\quad \longrightarrow \, A_{h}^\alpha f(t). \end{aligned}$$

Moreover, if for fixed f, t, and h the following inequality holds

$$\begin{aligned} \text{ Var } \, \left[ h^{-n} \sum _{j=0}^{n} (-1)^j \left( {\begin{array}{c}n\\ j\end{array}}\right) f(t - (Z + j)h) \right] < \infty , \end{aligned}$$

then by the central limit theorem and the Slutsky lemma, as \(N \rightarrow \infty \), we have

$$\begin{aligned} B_N^\alpha = \bigl [ A_{N,h}^{\alpha } f(t) - A_{h}^{\alpha } f(t) \bigr ] / \sqrt{v_N} \rightarrow ^D N(0,1), \end{aligned}$$

where \(\rightarrow ^D\) denotes convergence in distribution, N(0, 1) is the standard normal law, and \(v_N\) is the sample variance of the random variables

$$\begin{aligned} \frac{1}{h^\alpha } \sum _{j=0}^{n} (-1)^j \left( {\begin{array}{c}n\\ j\end{array}}\right) f(t - (Z_m+j)h), \quad m=1, 2, \ldots , N. \end{aligned}$$

This allows us to build asymptotic confidence intervals (see details in [8]).

The above results can be used as the basis of the Monte Carlo method for numerical approximation and evaluation of the Grünwald–Letnikov fractional derivatives. Indeed, we can replace the sample \(Z_1\), \(Z_2\), ..., \(Z_N\) by its Monte Carlo simulations.

For the simulation of the random variable Z with the distribution (4.3), we define

$$\begin{aligned} F_j = \sum _{i=1}^{j} b_i, \end{aligned}$$

where \(b_i = b_i({\tilde{\alpha }})\) are defined in (4.2).

Then

$$\begin{aligned} 0 = F_0< F_1< \ldots< F_j < \ldots , \quad \text{ with } b_i = F_i - F_{i-1}. \end{aligned}$$

If U is a random variable uniformly distributed on [0, 1], then

$$\begin{aligned} {\mathbb {P}} (F_{j-1}< U < F_j) = b_j, \end{aligned}$$

and to generate \(Z \in \{1, 2, \ldots \}\), we set \(Z=k\)  if  \(F_{k-1} \le U < F_k\).
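A MATLAB sketch of the estimator \(A_{N,h}^\alpha f(t)\) above may then read as follows; as before, this is an illustration of ours with a hypothetical function name and a truncation cutoff K, not the code of [13].

% Sketch: Monte Carlo evaluation of A_h^alpha f(t) via the semi-group splitting
% A_h^alpha = A_h^(alpha - n) A_h^n, with n = floor(alpha).
function a = mc_gl_semigroup(f, t, alpha, h, N, K)
    n  = floor(alpha);  at = alpha - n;                       % tilde-alpha in (0,1)
    c  = (-1).^(0:n) .* arrayfun(@(j) nchoosek(n, j), 0:n);   % (-1)^j C(n,j)
    Dn = @(s) sum(c .* arrayfun(f, s - (0:n) * h)) / h^n;     % A_h^n f(s)
    b  = zeros(1, K);  b(1) = at;                             % b_1 = tilde-alpha
    for k = 2:K
        b(k) = b(k-1) * (1 - (at + 1) / k);                   % b_k > 0, sum b_k = 1
    end
    F = cumsum(b);  U = rand(N, 1);  S = 0;
    for m = 1:N
        Z = find(U(m) < F, 1);  if isempty(Z), Z = K; end
        S = S + Dn(t - Z * h);        % integer-order difference at shifted point
    end
    a = (Dn(t) - S / N) / h^(alpha - n);
end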

5 Examples

Both proposed methods have been implemented in MATLAB [13]. This allows their mutual comparison, as well as useful visualizations and numerical experiments with functions that are frequently used in applications of fractional calculus and fractional-order differential equations. The Mittag-Leffler function [10]

$$\begin{aligned} E_{\alpha , \beta } (z) = \sum _{n=0}^{\infty } \frac{z^n}{\varGamma (\alpha n + \beta )}, \quad z \in {\mathbb {C}}, \quad \alpha , \beta > 0, \end{aligned}$$

which appears in some of the provided examples, is computed using [11]. In all examples the considered interval is sufficiently large, namely \(t \in [0, 10]\).

The exact fractional derivatives are plotted using solid lines, the results of the proposed Monte Carlo method are shown by bold points, the results of K individual trials (draws) are shown by vertically oriented small points (in all examples, \(K=200\)), and the confidence intervals are shown by short horizontal lines above and below the bold points.

5.1 Example 1. The power function

$$\begin{aligned} y(t) = t^\nu , \qquad D^\alpha y(t) = \frac{\varGamma (\nu + 1)}{\varGamma (\nu +1 - \alpha )} t^{\nu -\alpha }, \quad t> 0, \quad \alpha > 0. \end{aligned}$$

The particular case of \(\nu =0\) is the Heaviside unit-step function, and its derivatives of orders \(\alpha =1.7\) and \(\alpha =2.6\) are shown in Fig. 1.

The derivatives of the power function for \(\nu =1.3\) and orders \(\alpha =1.7\) and \(\alpha =2.6\) are shown in Fig. 2.
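For instance, the case \(\nu = 1.3\), \(\alpha = 1.7\) can be checked against the exact formula, up to the discretization error of (1.2) and the sampling error, using the sketch from Section 3.1 (the helper name and the parameter values below are ours):

% Sketch: compare the Monte Carlo estimate of the derivative of order 1.7
% of y = t^1.3 with the exact value at a single point t.
nu = 1.3;  alpha = 1.7;  t = 5;  h = 0.01;  N = 1e5;  K = ceil(t / h);
f = @(s) max(s, 0).^nu;                          % f(t) = 0 for t < 0
exact  = gamma(nu + 1) / gamma(nu + 1 - alpha) * t^(nu - alpha);
approx = mc_gl_1_2(f, t, alpha, h, N, K);        % method of Section 3.1
fprintf('exact %.4f, Monte Carlo %.4f\n', exact, approx);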

Fig. 1 Derivative of orders \(\alpha =1.7\) (left) and \(\alpha =2.6\) (right) of the Heaviside function

Fig. 2 Derivative of orders \(\alpha =1.7\) (left) and \(\alpha =2.6\) (right) of the power function \(y(t)=t^{1.3}\)

5.2 Example 2. The exponential function

Since the difference of the exponential function and the first terms of its power series can be expressed in terms of the Mittag-Leffler function, we can easily obtain the explicit expression for the corresponding fractional derivative [10]:

$$\begin{aligned} y(t)= & {} e^{\lambda t} - 1 - \lambda t = {\lambda ^2 t^2} E_{1,3} (\lambda t), \\ D^\alpha y(t)= & {} \lambda ^2 t^{2-\alpha } E_{1, 3-\alpha } (\lambda t), \quad t > 0, \quad \alpha \in (0, 3). \end{aligned}$$
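Indeed, the first identity can be verified termwise from the definition of the Mittag-Leffler function:

$$\begin{aligned} \lambda ^2 t^2 E_{1,3} (\lambda t) = \lambda ^2 t^2 \sum _{n=0}^{\infty } \frac{(\lambda t)^n}{\varGamma (n + 3)} = \sum _{n=0}^{\infty } \frac{(\lambda t)^{n+2}}{(n+2)!} = e^{\lambda t} - 1 - \lambda t. \end{aligned}$$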

The derivatives of order \(\alpha =1.7\) of the function \(y(t) = e^{\lambda t} - 1 - \lambda t\) for \(\lambda =-1.4\) and \(\lambda =-0.4\) are shown in Fig. 3.

Fig. 3 Derivative of order \(\alpha =1.7\) of the function \(y(t) = e^{\lambda t} - 1 - \lambda t\) for \(\lambda =-1.4\) (left) and \(\lambda =-0.4\) (right)

5.3 Example 3. The sine function

Similarly to the previous example, the fractional-order derivatives of the sine function can be obtained using its representation in terms of the Mittag-Leffler function:

$$\begin{aligned} y(t)= & {} \sin (t) = t E_{2,2} (-t^2), \\ D^\alpha y(t)= & {} t^{1-\alpha } E_{2, 2-\alpha } (-t^2), \quad t > 0, \quad \alpha \in (0, 2). \end{aligned}$$

The results of evaluating the derivative of order 1.7 of the sine function using both methods are shown in Fig. 4 (method P is based on the signed-probabilities approach; method S is based on the semi-group property of fractional differences). The computed values of the fractional-order derivative are in mutual agreement and conform to the exact fractional-order derivative. The confidence intervals are also of similar size, while the variance is much smaller in the case of the method based on the semi-group property of fractional-order differences; however, its computational cost is much higher. This observation holds for all other examples.

Fig. 4 Derivative of order \(\alpha =1.7\) of \(\sin (t)\) using method P (left) and method S (right)

5.4 Example 4. The Mittag-Leffler function

The product of a power function and the Mittag-Leffler function appears frequently in solutions of fractional differential equations with Riemann–Liouville fractional derivatives, so it is also a suitable example:

$$\begin{aligned} y(t)= & {} t^{\beta -1} E_{\mu , \beta } (\lambda t^\mu ), \\ D^\alpha y(t)= & {} t^{\beta -\alpha -1} E_{\mu , \beta -\alpha } (\lambda t^\mu ), \quad t> 0, \quad \mu > 0, \quad \alpha \in (0, \beta ). \end{aligned}$$

The results of computations for this function are shown in Fig. 5 for \(\alpha =1.7\), \(\mu =1.5\), \(\beta =3.5\), and \(\lambda =-1.4\) and \(\lambda =-0.4\), respectively.

Fig. 5 Derivative of order \(\alpha =1.7\) of the Mittag-Leffler function \(t^{2.5}E_{1.5,3.5} (\lambda t^{1.5})\) for \(\lambda =-1\) (left) and \(\lambda =-0.4\) (right)

6 Conclusions

Our extension of the Monte Carlo approach to fractional differentiation of orders higher than one leads to working with signed probabilities, which are not necessarily positive. We have demonstrated how they can be handled and used for computations by the Monte Carlo method.

The results of computations by the method based on signed probabilities are compared with the results obtained by the method that uses the semi-group property of fractional-order differences. Both methods produce practically the same values of fractional derivatives, which agree with the values computed using the explicit formulas. The method based on the semi-group property is, by its construction, significantly less efficient, as it requires more computations; at the same time, it is characterized by a smaller variance of the outputs of trials at the points of evaluation. The method based on signed probabilities is, on the contrary, faster, but is characterized by a larger variance of the trials at the points of evaluation. In both cases the confidence intervals are sufficiently small for practical purposes. The presented method can be further enhanced using standard approaches for improving the classical Monte Carlo method, such as variance reduction, importance sampling, stratified sampling, control variates, or antithetic sampling.
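For example, antithetic sampling pairs each uniform draw U with 1 - U in the inverse-CDF step. A minimal sketch (with F, f, t, h, and N as in the sketch of Section 3.1):

% Sketch: antithetic variates in the inverse-CDF step of method P. Each
% uniform draw U is paired with 1 - U, which typically reduces the variance
% of the sample mean that enters (3.3).
U  = rand(ceil(N / 2), 1);
UU = [U; 1 - U];                                           % negatively correlated pairs
Y  = arrayfun(@(u) min([find(u < F, 1), numel(F)]), UU);   % inverse-CDF draws
S  = mean(arrayfun(@(y) f(t - y * h), Y));                 % replaces the sample mean in (3.3)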