1 Introduction

In this paper, we take advantage of the powerful theory of random difference equations to conduct a full probabilistic study of the random second order linear differential equation

$$ \textstyle\begin{cases} \ddot{X}(t)+A(t)\dot{X}(t)+B(t)X(t)=0, \quad t\in\mathbb {R}, \\ X(t_{0})=Y_{0}, \\ \dot{X}(t_{0})=Y_{1}. \end{cases} $$
(1)

The data coefficients \(A(t)\) and \(B(t)\) are stochastic processes and the initial conditions \(Y_{0}\) and \(Y_{1}\) are random variables on an underlying complete probability space \((\Omega,\mathcal{F},\mathbb {P})\). In this triple, Ω is the sample space, whose outcomes will be generically denoted by ω; \(\mathcal{F}\) is a σ-algebra of events; and \(\mathbb{P}\) is a probability measure. Naturally, the solution of (1), \(X(t)\), is a stochastic process as well. Strictly speaking, we should write \(A(t,\omega)\), \(B(t,\omega)\), \(Y_{0}(\omega)\) and \(Y_{1}(\omega)\), but to simplify the notation, and in accordance with the literature, we will hide the generic outcome ω and just write \(A(t)\), \(B(t)\), \(Y_{0}\) and \(Y_{1}\) instead.

The aim of this paper is, first, to specify the meaning of the random differential equation (1) via the \(\mathrm {L}^{p}(\Omega )\) random calculus or, more concretely, via the so-called mean square calculus, which corresponds to \(p=2\); second, to find a proper stochastic process solution to (1); and third, to compute its main statistical information (expectation and variance) under mild conditions.

Particular cases of the random initial value problem (1) have been studied in previous contributions using \(\mathrm {L}^{p}(\Omega)\) random calculus. For instance, important deterministic models from mathematical physics, such as the Airy, Hermite, Legendre, Laguerre and Bessel differential equations, have been randomized and rigorously studied in [3, 4, 7, 8, 10], respectively. In these contributions, approximate solution stochastic processes together with their main statistical moments (mean and variance) are constructed by taking advantage of the random mean square calculus. Since the deterministic Hermite, Legendre and Laguerre differential equations are well known to admit polynomial solutions, those contributions also introduce the concept of random polynomial solution in the stochastic framework. In [15], the authors propose a homotopy technique to solve some particular random differential equations belonging to the class given in (1). A very important case of problem (1) is when its coefficients are random variables rather than stochastic processes, i.e., \(A(t)=A\) and \(B(t)=B\), corresponding to the autonomous case. In [6] the authors construct approximations of the first and second probability density functions of the solution stochastic process using a complementary approach to mean square calculus. In [13, 14, 18–20, 33] significant advances are made for other random differential equations, dealing with the computation of the probability density function of the corresponding solution. Additional studies dealing with random differential equations via random mean square calculus include [12, 21–24, 26–28, 31, 36], for instance.

The structure of this paper is as follows. In Sect. 2 we review the notation and the theory of \(\mathrm {L}^{p}(\Omega)\) calculus necessary to understand the paper. In Sect. 3 we solve the random initial value problem (1) in a suitable sense, and we describe how to approximate the main statistical information of the solution process (mean and variance). In Sect. 4 we compare our findings with the existing literature, and we also present several numerical examples, including an illustrative modeling application with real data. Finally, in Sect. 5 conclusions are drawn.

2 Preliminaries

Let \((\Omega,\mathcal{F},\mathbb{P})\) be a complete probability space. In this paper we work with random variables \(X:\Omega \rightarrow\mathbb{R}\) that belong to the so-called Lebesgue spaces \(\mathrm {L}^{p}(\Omega)\). Recall that we say that \(X\in \mathrm {L}^{p}(\Omega)\), \(1\leq p<\infty\), if the norm

$$\Vert X \Vert _{\mathrm {L}^{p}(\Omega)}:= \biggl( \int_{\Omega} \vert X \vert ^{p} \,\mathrm {d}\mathbb {P} \biggr)^{\frac{1}{p}} $$

is finite. We say that \(X\in \mathrm {L}^{\infty}(\Omega)\) if the norm

$$\Vert X \Vert _{\mathrm {L}^{\infty}(\Omega)}:=\inf \bigl\{ \sup \bigl\{ \bigl\vert X( \omega) \bigr\vert : \omega\in \Omega\backslash N \bigr\} : \mathbb{P}(N)=0 \bigr\} $$

is finite (this norm is usually termed essential supremum). These spaces are Banach, and the particular case of \(\mathrm {L}^{2}(\Omega)\) is a Hilbert space. Sometimes, when we refer to convergence in \(\mathrm {L}^{2}(\Omega)\), we will say that the convergence holds in the mean square (m.s.) sense.

In general, the statistical information of the random variable X is given by a set of operators that provide information about the distribution of X. In this paper we will deal with the expectation, \(\mathbb{E}[X]=\int_{\Omega}X \,\mathrm {d}\mathbb{P}\), and with the variance, \(\mathbb{V}[X]=\mathbb{E}[(X-\mathbb{E}[X])^{2}]\). Mean square convergence is important because it carries over to the expectation and the variance: if \(\{X_{n}\}_{n=1}^{\infty}\) is a sequence of random variables that converges in \(\mathrm {L}^{2}(\Omega)\) to X (that is, it is m.s. convergent), then

$$ \lim_{n\rightarrow\infty} \mathbb{E}[X_{n}]= \mathbb{E}[X], \qquad \lim_{n\rightarrow\infty} \mathbb{V}[X_{n}]= \mathbb{V}[X] $$
(2)

(see Theorem 4.3.1 in [37]).

A very important inequality concerning the norm of a product of random variables is Hölder’s inequality: for any two random variables X and Y, we have \(\|XY\|_{\mathrm {L}^{r}(\Omega)}\leq\|X\|_{\mathrm {L}^{p}(\Omega )}\|Y\|_{\mathrm {L}^{q}(\Omega)}\), where \(1\leq r,p,q\leq\infty\) and \(1/r=1/p+1/q\). When \(r=1\) and \(p=q=2\), the inequality is known as the Cauchy–Schwarz inequality. Another probabilistic result that will be used throughout this paper is Jensen’s inequality: if f is a convex function on \(\mathbb{R}\) and the expectations \(\mathbb {E}[X]\) and \(\mathbb{E}[f(X)]\) both exist and are finite, then \(f(\mathbb{E}[X]) \leq\mathbb{E}[f(X)]\). In particular, taking \(f(z)=|z|\) gives \(|\mathbb{E}[X]| \leq\mathbb{E}[|X|]\).

In this paper we will also deal with stochastic processes \(X=\{ X(t,\omega): t\in I, \omega\in\Omega\}\), where \(I\subseteq \mathbb{R}\). To simplify notation, we will just write X or \(X(t)\) and leave the dependence on \(\omega\in\Omega\) implicit. For fixed \(\omega\in\Omega\), the stochastic process \(X(t)\) can be seen as a real mapping from \(I\subseteq\mathbb{R}\) to \(\mathbb{R}\), so any concept of real calculus, such as continuity, differentiability, etc., can be defined for the stochastic process.

However, sometimes it is more suitable to work with the \(\mathrm {L}^{p}(\Omega)\) random calculus. In the case \(p=2\), it is termed mean square calculus. For a full exposition of this topic, see [37, Ch. 4], [29, Ch. XI], or [30, Ch. 5]. In [38] the authors combine \(\mathrm {L}^{2}(\Omega)\) and \(\mathrm {L}^{4}(\Omega)\), corresponding to mean square and mean fourth random calculus, to solve random differential equations.

We say that the stochastic process X is in \(\mathrm {L}^{p}(\Omega)\) if the random variable \(X(t)\) belongs to \(\mathrm {L}^{p}(\Omega)\) for all \(t\in I\). For such processes, we say that X is differentiable in the \(\mathrm {L}^{p}(\Omega)\) sense at \(t_{0}\in I\) if there exists a random variable \(\dot{X}(t_{0})\) in \(\mathrm {L}^{p}(\Omega)\) such that

$$\lim_{h\rightarrow0} \biggl\Vert \frac{X(t_{0}+h)-X(t_{0})}{h}-\dot {X}(t_{0}) \biggr\Vert _{\mathrm {L}^{p}(\Omega)}=0. $$

The random variable \(\dot{X}(t_{0})\) is called the \(\mathrm {L}^{p}(\Omega)\) derivative of \(X(t)\) at \(t_{0}\). We say that the stochastic process X is differentiable on I in the \(\mathrm {L}^{p}(\Omega)\) sense if it is differentiable at every \(t_{0}\in I\).
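For instance (a simple illustration of our own), if \(Z\in \mathrm {L}^{p}(\Omega)\) and \(X(t)=\mathrm {e}^{t}Z\), then

$$\biggl\Vert \frac{X(t_{0}+h)-X(t_{0})}{h}-\mathrm {e}^{t_{0}}Z \biggr\Vert _{\mathrm {L}^{p}(\Omega)}=\mathrm {e}^{t_{0}} \biggl\vert \frac{\mathrm {e}^{h}-1}{h}-1 \biggr\vert \Vert Z \Vert _{\mathrm {L}^{p}(\Omega)}\xrightarrow[h \to0]{\,}0, $$

so \(\dot{X}(t_{0})=\mathrm {e}^{t_{0}}Z\) in the \(\mathrm {L}^{p}(\Omega)\) sense at every \(t_{0}\in\mathbb{R}\).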

The stochastic process X is analytic at \(t_{0}\) if \(X(t)=\sum_{n=0}^{\infty}X_{n} (t-t_{0})^{n}\) for every t in a neighbourhood of \(t_{0}\), where \(X_{0},X_{1},\ldots\) are random variables and the sum is in the topology of \(\mathrm {L}^{p}(\Omega)\).

Thus, when we deal with the random differential equation (1), we will understand the derivatives in an \(\mathrm {L}^{p}(\Omega)\) random sense. More concretely, as we will see, the correct setting will be \(\mathrm {L}^{2}(\Omega)\) and the use of the mean square calculus.

An important difference with respect to the deterministic scenario is that, when solving a random differential equation, one must also compute the main statistical functions associated with the solution stochastic process, such as the mean and the variance functions.

3 Results

Our main goal is to find the solution stochastic process to the random initial value problem (1). We will assume that the data stochastic processes \(A(t)\) and \(B(t)\) are analytic at \(t_{0}\), in the following sense:

$$A(t)=\sum_{n=0}^{\infty}A_{n} (t-t_{0})^{n},\qquad B(t)=\sum_{n=0}^{\infty}B_{n} (t-t_{0})^{n}, $$

for \(t\in(t_{0}-r,t_{0}+r)\), where \(r>0\) is fixed, and the sums are understood in the \(\mathrm {L}^{2}(\Omega)\) setting. We search for an analytic solution process \(X(t)\) of the form

$$X(t)=\sum_{n=0}^{\infty}X_{n} (t-t_{0})^{n}, $$

for \(t\in(t_{0}-r,t_{0}+r)\), where the sum is in \(\mathrm {L}^{2}(\Omega)\). This stochastic process will be a solution to the random problem (1) in the sense of \(\mathrm {L}^{2}(\Omega)\) (so, in particular twice differentiable in the mean square sense).

3.1 Auxiliary results concerning random power series

We need some auxiliary results to deal with random power series in the \(\mathrm {L}^{2}(\Omega)\) setting. First of all, we need a result allowing us to differentiate a random power series term by term in the \(\mathrm {L}^{p}(\Omega)\) sense (in this paper we only use the cases \(p=1\) and \(p=2\), but we give the proof for general \(p\geq1\) for the sake of completeness). The particular case \(p=2\) is a consequence of Theorem 3.1 in [11].

Theorem 3.1

(Differentiation of a random power series in the \(\mathrm {L}^{p}(\Omega)\) sense)

Let \(A(t)=\sum_{n=0}^{\infty}A_{n} (t-t_{0})^{n}\) be a random power series in the \(\mathrm {L}^{p}(\Omega)\) setting (\(p\geq1\)) for \(t\in(t_{0}-r,t_{0}+r)\), \(r>0\). Then the random power series \(\sum_{n=1}^{\infty}n A_{n} (t-t_{0})^{n-1}\) exists in \(\mathrm {L}^{p}(\Omega)\) for \(t\in(t_{0}-r,t_{0}+r)\); moreover, the \(\mathrm {L}^{p}(\Omega)\) derivative of \(A(t)\) is equal to it:

$$\dot{A}(t)=\sum_{n=1}^{\infty}n A_{n} (t-t_{0})^{n-1} $$

for all \(t\in(t_{0}-r,t_{0}+r)\).

Proof

Let us see first that the random power series \(\sum_{n=1}^{\infty}n A_{n} (t-t_{0})^{n-1}\) exists in \(\mathrm {L}^{p}(\Omega)\) for \(t\in(t_{0}-r,t_{0}+r)\). Given \(0<\rho<r\), we prove that

$$ \sum_{n=1}^{\infty}n \Vert A_{n} \Vert _{\mathrm {L}^{p}(\Omega)} \rho^{n-1}< \infty. $$
(3)

Fix \(s: 0<\rho<s<r\). Since the sum \(\sum_{n=0}^{\infty}A_{n} s^{n}\) exists in \(\mathrm {L}^{p}(\Omega)\), \(\lim_{n\rightarrow\infty} \|A_{n}\|_{\mathrm {L}^{p}(\Omega)}s^{n}=0\), so there exists \(K>0\) such that \(\|A_{n}\|_{\mathrm {L}^{p}(\Omega)}s^{n}\leq K\) for every \(n\geq0\). Then

$$n \Vert A_{n} \Vert _{\mathrm {L}^{p}(\Omega)}\rho^{n-1}\leq n K \frac{1}{s} \biggl(\frac{\rho}{s} \biggr)^{n-1}. $$

As \(\sum_{n=1}^{\infty}n (\rho/s)^{n-1}<\infty\), by the comparison test for series, we conclude that (3) holds.

Let us see now that the \(\mathrm {L}^{p}(\Omega)\) derivative of \(A(t)\) is equal to \(\sum_{n=1}^{\infty}n A_{n} (t-t_{0})^{n-1}\). By the definition of \(\mathrm {L}^{p}(\Omega)\) derivative, we have to check that

$$\lim_{h\rightarrow0} \Biggl\Vert \sum_{n=0}^{\infty}A_{n} \frac {(t+h-t_{0})^{n}-(t-t_{0})^{n}}{h}-\sum_{n=1}^{\infty}n A_{n}(t-t_{0})^{n-1} \Biggr\Vert _{\mathrm {L}^{p}(\Omega)}=0. $$

Fix s and h such that \(0<\rho<s<r\) and \(0<|h|<s-\rho\). Fix \(t\in (t_{0}-\rho,t_{0}+\rho)\). By the triangle inequality,

$$\begin{aligned} & \Biggl\Vert \sum_{n=0}^{\infty}A_{n} \frac {(t+h-t_{0})^{n}-(t-t_{0})^{n}}{h}-\sum_{n=1}^{\infty}n A_{n}(t-t_{0})^{n-1} \Biggr\Vert _{\mathrm {L}^{p}(\Omega)} \\ &\quad \leq \sum_{n=1}^{\infty} \Vert A_{n} \Vert _{\mathrm {L}^{p}(\Omega)} \biggl\vert \frac {(t+h-t_{0})^{n}-(t-t_{0})^{n}}{h}-n (t-t_{0})^{n-1} \biggr\vert . \end{aligned}$$
(4)

We know that

$$ \lim_{h\rightarrow0} \biggl\vert \frac{(t+h-t_{0})^{n}-(t-t_{0})^{n}}{h}-n (t-t_{0})^{n-1} \biggr\vert =0 $$
(5)

(by the definition of the pointwise derivative of \((t-t_{0})^{n}\)). On the other hand, using the identity \(a^{n}-b^{n}=(a-b)(\sum_{m=0}^{n-1} a^{n-1-m}b^{m})\), we perform the following estimates:

$$\begin{aligned} \biggl\vert \frac{(t+h-t_{0})^{n}-(t-t_{0})^{n}}{h} \biggr\vert ={} & \biggl\vert \frac {(t+h-t_{0})^{n}-(t-t_{0})^{n}}{(t+h-t_{0})-(t-t_{0})} \biggr\vert \\ = {}& \Biggl\vert \sum_{m=0}^{n-1} (t+h-t_{0})^{n-1-m}(t-t_{0})^{m} \Biggr\vert \\ \leq {}& \sum_{m=0}^{n-1} \vert t+h-t_{0} \vert ^{n-1-m} \vert t-t_{0} \vert ^{m} \\ \leq {}& \sum_{m=0}^{n-1} \bigl( \vert t-t_{0} \vert + \vert h \vert \bigr)^{n-1-m} \vert t-t_{0} \vert ^{m} \\ \leq {}& n \bigl( \vert t-t_{0} \vert + \vert h \vert \bigr)^{n-1}\leq ns^{n-1}, \end{aligned}$$

where we have used that \(|t-t_{0}|\leq|t-t_{0}|+|h|\), \(|t-t_{0}|<\rho\) and \(|h|< s-\rho\). Then

$$\begin{aligned} & \Vert A_{n} \Vert _{\mathrm {L}^{p}(\Omega)} \biggl\vert \frac {(t-t_{0}+h)^{n}-(t-t_{0})^{n}}{h}-n(t-t_{0})^{n-1} \biggr\vert \\ & \quad \leq \Vert A_{n} \Vert _{\mathrm {L}^{p}(\Omega)} \biggl( \biggl\vert \frac {(t-t_{0}+h)^{n}-(t-t_{0})^{n}}{h} \biggr\vert +n \vert t-t_{0} \vert ^{n-1} \biggr) \\ &\quad \leq \Vert A_{n} \Vert _{\mathrm {L}^{p}(\Omega)} \bigl(ns^{n-1}+n \rho^{n-1} \bigr)\leq \Vert A_{n} \Vert _{\mathrm {L}^{p}(\Omega)} \bigl(ns^{n-1}+ns^{n-1} \bigr)= 2 \Vert A_{n} \Vert _{\mathrm {L}^{p}(\Omega )}ns^{n-1}, \end{aligned}$$
(6)

where \(\sum_{n=1}^{\infty}\|A_{n}\|_{\mathrm {L}^{p}(\Omega)}ns^{n-1}<\infty\) by (3) (applied with s in the role of \(\rho\), since \(s<r\)). By the dominated convergence theorem for series, (5) and (6) allow us to conclude that (4) tends to 0 as \(h\rightarrow0\), as desired. □

As will become apparent later, we need a theorem to multiply random power series. In the deterministic setting, Mertens’ theorem allows multiplying two power series; its deterministic version is proved in Theorem 8.46 of [1]. We adapt the proof in [1] to the stochastic setting in Theorem 3.2. Notice that, according to Theorem 3.2, when we multiply two random power series we lose integrability: the product series converges in \(\mathrm {L}^{1}(\Omega)\) rather than in \(\mathrm {L}^{2}(\Omega)\). As we will see in the proof, this fact is a consequence of the Cauchy–Schwarz inequality (for two real numbers u and v, we always have \(|uv|=|u||v|\); however, for random variables U and V, we do not have \(\|UV\|_{\mathrm {L}^{2}(\Omega)}=\|U\|_{\mathrm {L}^{2}(\Omega)}\|V\|_{\mathrm {L}^{2}(\Omega)}\) in general, but \(\|UV\|_{\mathrm {L}^{1}(\Omega)}\leq\|U\|_{\mathrm {L}^{2}(\Omega)}\|V\|_{\mathrm {L}^{2}(\Omega)}\)).

Theorem 3.2

(Mertens’ theorem for random series in the mean square sense)

Let \(U=\sum_{n=0}^{\infty}U_{n}\) and \(V=\sum_{n=0}^{\infty}V_{n}\) be two random series that converge in \(\mathrm {L}^{2}(\Omega)\). Suppose that one of the series converges absolutely, say \(\sum_{n=0}^{\infty}\|V_{n}\|_{\mathrm {L}^{2}(\Omega)}<\infty\). Then

$$\Biggl(\sum_{n=0}^{\infty}U_{n} \Biggr) \Biggl(\sum_{n=0}^{\infty}V_{n} \Biggr)=\sum_{n=0}^{\infty}W_{n}, $$

where

$$W_{n}=\sum_{m=0}^{n} U_{n-m}V_{m} $$

and \(\sum_{n=0}^{\infty}W_{n}\) is understood in \(\mathrm {L}^{1}(\Omega)\). The series \(\sum_{n=0}^{\infty}W_{n}\) is known as the Cauchy product of the series \(\sum_{n=0}^{\infty}U_{n}\) and \(\sum_{n=0}^{\infty}V_{n}\).

Proof

Let us write the Nth partial sum of \(\sum_{n=0}^{\infty}W_{n}\) in an appropriate way:

$$\begin{aligned} \sum_{n=0}^{N} W_{n}= {}& \sum _{n=0}^{N}\sum_{m=0}^{n} U_{n-m} V_{m}=\sum_{m=0}^{N} V_{m}\sum_{n=m}^{N} U_{n-m}= \sum_{m=0}^{N} V_{m} \sum_{n=0}^{N-m} U_{n} \\ ={} & \sum_{m=0}^{N} V_{m} \Biggl(U-\sum_{n=N-m+1}^{\infty}U_{n} \Biggr)= U\sum_{m=0}^{N} V_{m}-\sum _{m=0}^{N} V_{m} \sum _{n=N-m+1}^{\infty}U_{n}. \end{aligned}$$

The first addend, \(U\sum_{m=0}^{N} V_{m}\), tends to UV in \(\mathrm {L}^{1}(\Omega)\) as \(N\rightarrow\infty\). In fact, observe that because of the Cauchy–Schwarz inequality, one gets

$$\begin{aligned} \Biggl\Vert U\sum_{m=0}^{N} V_{m} - UV \Biggr\Vert _{\mathrm {L}^{1}(\Omega)} &= \Biggl\Vert U \Biggl(\sum _{m=0}^{N} V_{m} - V \Biggr) \Biggr\Vert _{\mathrm {L}^{1}(\Omega)}\\ & \leq \Vert U \Vert _{\mathrm {L}^{2}(\Omega)} \Biggl\Vert \sum _{m=0}^{N} V_{m} - V \Biggr\Vert _{\mathrm {L}^{2}(\Omega)}\, \xrightarrow[N \to\infty]{\,}0, \end{aligned}$$

where we have used that \(\Vert U \Vert _{\mathrm {L}^{2}(\Omega)}<\infty \) (since \(U\in \mathrm {L}^{2}(\Omega)\) because it is the m.s. limit of the series \(\sum_{n=0}^{\infty}U_{n}\)) and that \(\Vert \sum_{m=0}^{N} V_{m} - V \Vert _{\mathrm {L}^{2}(\Omega)} \xrightarrow[N \to\infty]{\,}0\) (since by hypothesis \(\sum_{n=0}^{\infty}V_{n}\) converges to V in \(\mathrm {L}^{2}(\Omega)\)).

Thus, it only remains to prove that the second addend, \(\sum_{m=0}^{N} V_{m} \sum_{n=N-m+1}^{\infty}U_{n}\), goes to 0 in \(\mathrm {L}^{1}(\Omega)\) as \(N\rightarrow\infty\).

Since \(\lim_{N\rightarrow\infty} \sum_{n=N}^{\infty}U_{n}=0\) in \(\mathrm {L}^{2}(\Omega)\), there exists \(L>0\) such that

$$\Biggl\Vert \sum_{n=N}^{\infty}U_{n} \Biggr\Vert _{\mathrm {L}^{2}(\Omega)}\leq L, \quad \forall N\in\mathbb{N}. $$

Let \(K=\sum_{n=0}^{\infty}\|V_{n}\|_{\mathrm {L}^{2}(\Omega)}\). Fix \(\epsilon >0\). We can take \(N_{\epsilon}\) such that, for all \(N\geq N_{\epsilon}\),

$$\Biggl\Vert \sum_{n=N}^{\infty}U_{n} \Biggr\Vert _{\mathrm {L}^{2}(\Omega)}< \frac {\epsilon}{2K},\qquad \sum _{n=N+1}^{\infty} \Vert V_{n} \Vert _{\mathrm {L}^{2}(\Omega )}< \frac{\epsilon}{2L}. $$

Then, for \(N\geq2N_{\epsilon}\), by the triangle inequality and the Cauchy–Schwarz inequality,

$$\begin{aligned} \Biggl\Vert \sum_{m=0}^{N} V_{m} \sum_{n=N-m+1}^{\infty}U_{n} \Biggr\Vert _{\mathrm {L}^{1}(\Omega)}\leq {}& \sum _{m=0}^{N_{\epsilon}} \Vert V_{m} \Vert _{\mathrm {L}^{2}(\Omega)} \Biggl\Vert \sum_{n=N-m+1}^{\infty}U_{n} \Biggr\Vert _{\mathrm {L}^{2}(\Omega)} \\ &{}+ \sum_{m=N_{\epsilon}+1}^{N} \Vert V_{m} \Vert _{\mathrm {L}^{2}(\Omega)} \Biggl\Vert \sum_{n=N-m+1}^{\infty}U_{n} \Biggr\Vert _{\mathrm {L}^{2}(\Omega)} \\ \leq {}& \frac{\epsilon}{2K} \sum_{m=0}^{N_{\epsilon}} \Vert V_{m} \Vert _{\mathrm {L}^{2}(\Omega)}+L\sum _{m=N_{\epsilon}+1}^{N} \Vert V_{m} \Vert _{\mathrm {L}^{2}(\Omega)} \\ \leq {}& \frac{\epsilon}{2K} \sum_{m=0}^{\infty} \Vert V_{m} \Vert _{\mathrm {L}^{2}(\Omega)}+L\sum _{m=N_{\epsilon}+1}^{\infty} \Vert V_{m} \Vert _{\mathrm {L}^{2}(\Omega )} \\ \leq{} & \frac{\epsilon}{2K}K+L\frac{\epsilon}{2L}=\epsilon. \end{aligned}$$

This shows that \(\sum_{m=0}^{N} V_{m} \sum_{n=N-m+1}^{\infty}U_{n}\) tends to 0 in \(\mathrm {L}^{1}(\Omega)\) as \(N\rightarrow\infty\). □
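As a simple illustration of Theorem 3.2 (this example is ours and is not needed in the sequel), take \(U_{n}=2^{-n}U\) and \(V_{n}=3^{-n}V\) with \(U,V\in \mathrm {L}^{2}(\Omega)\). Both series converge absolutely in \(\mathrm {L}^{2}(\Omega)\), with \(\sum_{n=0}^{\infty}U_{n}=2U\) and \(\sum_{n=0}^{\infty}V_{n}=\frac{3}{2}V\), and the terms of the Cauchy product are

$$W_{n}=\sum_{m=0}^{n} 2^{-(n-m)}3^{-m}\,UV= \bigl(3\cdot2^{-n}-2\cdot3^{-n} \bigr)UV, $$

so \(\sum_{n=0}^{\infty}W_{n}=(6-3)UV=3UV= (\sum_{n=0}^{\infty}U_{n} ) (\sum_{n=0}^{\infty}V_{n} )\), the sum of the \(W_{n}\) being convergent in \(\mathrm {L}^{1}(\Omega)\), as guaranteed by the theorem (note that \(UV\in \mathrm {L}^{1}(\Omega)\) by the Cauchy–Schwarz inequality).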

3.2 Main result: constructing the solution stochastic process of the random non-autonomous second order linear differential equation

We present the main theorem of this paper. After stating and proving it, a deeper analysis of the hypotheses and consequences of the theorem will be performed.

Theorem 3.3

Let \(A(t)=\sum_{n=0}^{\infty}A_{n} (t-t_{0})^{n}\) and \(B(t)=\sum_{n=0}^{\infty}B_{n} (t-t_{0})^{n}\) be two random power series in the \(\mathrm {L}^{2}(\Omega)\) setting, for \(t\in(t_{0}-r,t_{0}+r)\), where \(r>0\) is finite and fixed. Assume that the initial conditions \(Y_{0}\) and \(Y_{1}\) belong to \(\mathrm {L}^{2}(\Omega)\). Suppose that there is a constant \(C_{r}>0\), possibly dependent on r, such that \(\|A_{n}\|_{\mathrm {L}^{\infty}(\Omega)}\leq C_{r}/r^{n}\) and \(\|B_{n}\|_{\mathrm {L}^{\infty}(\Omega)}\leq C_{r}/r^{n}\), \(n\geq0\). Then the stochastic process \(X(t)=\sum_{n=0}^{\infty}X_{n}(t-t_{0})^{n}\), \(t\in(t_{0}-r,t_{0}+r)\), where

$$\begin{aligned} &X_{0}=Y_{0},\qquad X_{1}=Y_{1}, \end{aligned}$$
(7)
$$\begin{aligned} &X_{n+2}=\frac{-1}{(n+2)(n+1)}\sum_{m=0}^{n} \bigl[(m+1)A_{n-m}X_{m+1}+B_{n-m}X_{m} \bigr], \quad n\geq0, \end{aligned}$$
(8)

is the unique analytic solution to the random initial value problem (1) in the mean square sense.

Proof

Suppose that \(X(t)=\sum_{n=0}^{\infty}X_{n}(t-t_{0})^{n}\) is a solution to (1) in the \(\mathrm {L}^{2}(\Omega)\) sense for \(t\in (t_{0}-r,t_{0}+r)\), \(r>0\). By Theorem 3.1 with \(p=2\), the mean square derivatives of \(X(t)\) are given by

$$\begin{aligned} &\dot{X}(t)=\sum_{n=1}^{\infty}n X_{n}(t-t_{0})^{n-1}=\sum _{n=0}^{\infty}(n+1)X_{n+1} (t-t_{0})^{n}, \\ &\ddot{X}(t)=\sum_{n=2}^{\infty}n(n-1)X_{n}(t-t_{0})^{n-2}=\sum _{n=0}^{\infty}(n+2) (n+1)X_{n+2}(t-t_{0})^{n}. \end{aligned}$$

By the random version of Mertens’ theorem (Theorem 3.2),

$$\begin{aligned} &A(t)\dot{X}(t)=\sum_{n=0}^{\infty}\Biggl(\sum _{m=0}^{n} A_{n-m}(m+1)X_{m+1} \Biggr) (t-t_{0})^{n}, \\ &B(t)X(t)=\sum_{n=0}^{\infty}\Biggl(\sum _{m=0}^{n} B_{n-m}X_{m} \Biggr) (t-t_{0})^{n}, \end{aligned}$$

where these two random series converge in \(\mathrm {L}^{1}(\Omega)\). From \(\ddot{X}(t)+A(t)\dot{X}(t)+B(t)X(t)=0\),

$$ \sum_{n=0}^{\infty}\Biggl[(n+2) (n+1)X_{n+2}+\sum_{m=0}^{n} \bigl(A_{n-m}(m+1)X_{m+1}+B_{n-m}X_{m} \bigr) \Biggr](t-t_{0})^{n}=0, $$
(9)

for \(t\in(t_{0}-r,t_{0}+r)\), where the random series is again understood in the \(\mathrm {L}^{1}(\Omega)\) sense. By Theorem 3.1 with \(p=1\), repeatedly differentiating (9) in the \(\mathrm {L}^{1}(\Omega)\) sense and evaluating at \(t=t_{0}\) yields

$$(n+2) (n+1)X_{n+2}+\sum_{m=0}^{n} \bigl(A_{n-m}(m+1)X_{m+1}+B_{n-m}X_{m} \bigr)=0. $$

Isolating \(X_{n+2}\) we obtain the recursive expression (8):

$$X_{n+2}=\frac{-1}{(n+2)(n+1)}\sum_{m=0}^{n} \bigl[(m+1)A_{n-m}X_{m+1}+B_{n-m}X_{m} \bigr]. $$

The initial conditions of the random initial value problem (1) give (7), and so \(X(t)\) is uniquely determined with probability 1.

It remains to check that the random series \(\sum_{n=0}^{\infty}X_{n}(t-t_{0})^{n}\) is convergent in \(\mathrm {L}^{2}(\Omega)\). For that purpose, we will make use of the \(\mathrm {L}^{\infty}(\Omega)\) bounds for \(A_{n}\) and \(B_{n}\) quoted in the hypotheses.

From the hypothesis \(Y_{0},Y_{1}\in \mathrm {L}^{2}(\Omega)\) and by induction on n in expression (8), we obtain that \(X_{n}\in \mathrm {L}^{2}(\Omega )\) for all \(n\geq0\). By the triangle inequality, the hypotheses and Hölder’s inequality with \(r=2\), \(p=\infty\) and \(q=2\) (in the notation of Sect. 2),

$$\begin{aligned} \Vert X_{n+2} \Vert _{\mathrm {L}^{2}(\Omega)}\leq{} & \frac{1}{(n+2)(n+1)}\sum _{m=0}^{n} \bigl[(m+1) \Vert A_{n-m}X_{m+1} \Vert _{\mathrm {L}^{2}(\Omega)}+ \Vert B_{n-m}X_{m} \Vert _{\mathrm {L}^{2}(\Omega)} \bigr] \\ \leq {}& \frac{1}{(n+2)(n+1)}\frac{C_{r}}{r^{n}}\sum_{m=0}^{n} r^{m} \bigl((m+1) \Vert X_{m+1} \Vert _{\mathrm {L}^{2}(\Omega)}+ \Vert X_{m} \Vert _{\mathrm {L}^{2}(\Omega)} \bigr). \end{aligned}$$
(10)

Define \(H_{0}:=\|Y_{0}\|_{\mathrm {L}^{2}(\Omega)}\), \(H_{1}:=\|Y_{1}\|_{\mathrm {L}^{2}(\Omega )}\) and

$$ H_{n+2}:=\frac{1}{(n+2)(n+1)}\frac{C_{r}}{r^{n}}\sum _{m=0}^{n} r^{m} \bigl((m+1)H_{m+1}+H_{m} \bigr). $$
(11)

By induction on n it is trivially seen that \(\|X_{n}\|_{\mathrm {L}^{2}(\Omega )}\leq H_{n}\) for \(n\geq0\). Thus, given \(0<\rho<r\), it is enough to see that \(\sum_{n=0}^{\infty}H_{n}\rho^{n}<\infty\). For that purpose, we rewrite (11) so that \(H_{n+2}\) is expressed as a function of \(H_{n+1}\) and \(H_{n}\) (second order recurrence equation):

$$\begin{aligned} H_{n+2}= {}& \frac{1}{(n+2)(n+1)}\frac{C_{r}}{r^{n}} \Biggl(\sum _{m=0}^{n-1} r^{m} \bigl((m+1)H_{m+1}+H_{m} \bigr)+r^{n} \bigl((n+1)H_{n+1}+H_{n} \bigr) \Biggr) \\ = {}& \frac{1}{(n+2)(n+1)}\frac{C_{r}}{r^{n}}\frac{(n+1)n}{C_{r}}r^{n-1} \Biggl(\underbrace{\frac{1}{(n+1)n}\frac{C_{r}}{r^{n-1}}\sum _{m=0}^{n-1} r^{m} \bigl((m+1)H_{m+1}+H_{m} \bigr)}_{=H_{n+1}} \Biggr) \\ &{}+ \frac{C_{r}}{n+2}H_{n+1}+\frac{C_{r}}{(n+2)(n+1)}H_{n} \\ ={} & \biggl(\frac{n}{(n+2)r}+\frac{C_{r}}{n+2} \biggr)H_{n+1}+ \frac {C_{r}}{(n+2)(n+1)}H_{n}. \end{aligned}$$
(12)

Fix \(s: 0<\rho<s<r\). We have

$$H_{n+2}s^{n+2}= \biggl(\frac{n s}{(n+2)r}+\frac{C_{r} s}{n+2} \biggr)H_{n+1}s^{n+1}+\frac{C_{r}s^{2}}{(n+2)(n+1)}H_{n}s^{n}. $$

Let \(M_{n}=\max_{0\leq m\leq n} H_{m} s^{m}\). We have

$$ H_{n+2}s^{n+2}\leq \biggl(\frac{n s}{(n+2)r}+ \frac{C_{r} s}{n+2}+\frac {C_{r}s^{2}}{(n+2)(n+1)} \biggr)M_{n+1}. $$
(13)

Since

$$\lim_{n\rightarrow\infty} \biggl(\frac{n s}{(n+2)r}+\frac{C_{r} s}{n+2}+ \frac{C_{r}s^{2}}{(n+2)(n+1)} \biggr)=\frac{s}{r}< 1, $$

it follows that \(M_{n+2}=M_{n+1}\) for all sufficiently large n; call the common value M. Hence \(H_{n} s^{n}\leq M\) for all large n, and therefore \(H_{n}\rho ^{n}\leq M(\rho/s)^{n}\). Since \(\sum_{n=0}^{\infty}(\rho/s)^{n}<\infty\), by comparison the series \(\sum_{n=0}^{\infty}H_{n}\rho^{n}\) converges, and we are done. □

3.3 Comments on the hypotheses of the theorem

The hypotheses concerning the \(\mathrm {L}^{\infty}(\Omega)\) growth of the coefficients \(A_{n}\) and \(B_{n}\), \(n\geq0\), may seem quite restrictive. However, they were necessary to bound the \(\mathrm {L}^{2}(\Omega)\) norms of the coefficients \(X_{0},X_{1},X_{2},\ldots\) recursively in (10), then to define the deterministic majorizing sequence \(H_{0},H_{1},H_{2},\ldots\) and finally to prove the bound \(\|X_{n}\|_{\mathrm {L}^{2}(\Omega)}\leq H_{n}\) by induction on \(n\geq0\). Without the hypotheses \(\|A_{n}\|_{\mathrm {L}^{\infty}(\Omega)}\leq C_{r}/r^{n}\) and \(\|B_{n}\|_{\mathrm {L}^{\infty}(\Omega)}\leq C_{r}/r^{n}\) of Theorem 3.3, this would not have been possible.

Moreover, these \(\mathrm {L}^{\infty}(\Omega)\) hypotheses are equivalent to a growth condition on the moments of the random variables \(A_{0},A_{1},\ldots \) and \(B_{0},B_{1},\ldots\) . The key fact is that, for a given random variable Z and a constant \(R>0\), we have \(\mathbb{E}[|Z|^{n}]\leq HR^{n}\) for all \(n\geq0\) and a certain \(H>0\) if and only if \(\|Z\|_{\mathrm {L}^{\infty}(\Omega)}\leq R\).

This key fact is a direct consequence of the following result: if Z is a random variable, then \(\lim_{n\rightarrow\infty} \|Z\|_{\mathrm {L}^{n}(\Omega)}= \|Z\|_{\mathrm {L}^{\infty}(\Omega)}\). For the sake of completeness, we show the proof. If \(\|Z\|_{\mathrm {L}^{\infty}(\Omega )}<\infty\), define \(\Omega_{\delta}=\{|Z|\geq\|Z\|_{\mathrm {L}^{\infty }(\Omega)}-\delta\}\) for \(\delta>0\) small. By the definition of \(\mathrm {L}^{\infty}(\Omega)\), \(\mathbb{P}(\Omega_{\delta})>0\). Now,

$$\Vert Z \Vert _{\mathrm {L}^{n}(\Omega)}= \biggl( \int_{\Omega} \vert Z \vert ^{n} \,\mathrm {d}\mathbb {P} \biggr)^{\frac{1}{n}}\geq \biggl( \int_{\Omega_{\delta}} \vert Z \vert ^{n} \,\mathrm {d}\mathbb{P} \biggr)^{\frac{1}{n}}\geq \bigl( \Vert Z \Vert _{\mathrm {L}^{\infty}(\Omega)}-\delta \bigr) \mathbb{P}(\Omega_{\delta})^{\frac{1}{n}}. $$

Since \(\mathbb{P}(\Omega_{\delta})>0\), we obtain

$$\liminf_{n\rightarrow\infty} \Vert Z \Vert _{\mathrm {L}^{n}(\Omega)} \geq \Vert Z \Vert _{\mathrm {L}^{\infty}(\Omega)}-\delta. $$

As \(\delta>0\) is arbitrary, \(\liminf_{n\rightarrow\infty} \|Z\| _{\mathrm {L}^{n}(\Omega)} \geq\|Z\|_{\mathrm {L}^{\infty}(\Omega)}\). To prove the reverse inequality, write

$$\Vert Z \Vert _{\mathrm {L}^{n}(\Omega)}= \biggl( \int_{\Omega} \vert Z \vert ^{n}\,\mathrm {d}\mathbb {P} \biggr)^{\frac{1}{n}}= \biggl( \int_{\Omega} \vert Z \vert ^{n-1} \vert Z \vert \,\mathrm {d}\mathbb {P} \biggr)^{\frac{1}{n}}\leq \Vert Z \Vert _{\mathrm {L}^{\infty}(\Omega)}^{\frac {n-1}{n}} \Vert Z \Vert _{\mathrm {L}^{1}(\Omega)}^{\frac{1}{n}}. $$

Since \(\|Z\|_{\mathrm {L}^{1}(\Omega)}\leq\|Z\|_{\mathrm {L}^{\infty}(\Omega )}<\infty\), it holds \(\lim_{n\rightarrow\infty}\|Z\|_{\mathrm {L}^{1}(\Omega)}^{\frac{1}{n}}=1\) (the trivial case \(Z\equiv0\) is obvious, we can assume \(Z\not\equiv0\)). Then

$$\limsup_{n\rightarrow\infty} \Vert Z \Vert _{\mathrm {L}^{n}(\Omega)}\leq \Vert Z \Vert _{\mathrm {L}^{\infty}(\Omega)}. $$

This shows the result when \(\|Z\|_{\mathrm {L}^{\infty}(\Omega)}<\infty\). If \(\|Z\|_{\mathrm {L}^{\infty}(\Omega)}=\infty\), then one has to define \(\Omega_{L}=\{|Z|\geq L\}\) for \(L>0\) large. Proceeding similarly and by the arbitrariness of L, one arrives at \(\liminf_{n\rightarrow\infty } \|Z\|_{\mathrm {L}^{n}(\Omega)}=\infty=\|Z\|_{\mathrm {L}^{\infty}(\Omega)}\).
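As a quick illustration of this limit (our own example), if Z is uniformly distributed on \([0,1]\), then

$$\Vert Z \Vert _{\mathrm {L}^{n}(\Omega)}= \biggl( \int_{0}^{1} z^{n} \,\mathrm {d}z \biggr)^{\frac{1}{n}}= \biggl(\frac{1}{n+1} \biggr)^{\frac{1}{n}}\xrightarrow[n \to\infty]{\,}1= \Vert Z \Vert _{\mathrm {L}^{\infty}(\Omega)}. $$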

Growth hypotheses of the form \(\mathbb{E}[|Z|^{n}]\leq HR^{n}\), for certain \(H>0\) and \(R>0\), are common in the literature to find stochastic analytic solutions to particular cases of (1). See, for example, Airy’s random differential equation in [7] and Hermite’s random differential equation in [3]. We have proved in this subsection that controlling the growth of the moments is equivalent to controlling the \(\mathrm {L}^{\infty}(\Omega)\) norm. Hence, Theorem 3.3 will allow us to generalize the results obtained in previous articles, for instance [3, 7]. See Example 4.1 and Example 4.2 for the generalization.

3.4 Relationship between the random pointwise and classical differential equations approaches

The hypotheses of Theorem 3.3, besides providing a stochastic solution to our problem (1), also give a pointwise classical solution to (1) under the additional assumption \(Y_{0},Y_{1}\in \mathrm {L}^{\infty}(\Omega)\). Indeed, by hypothesis we have \(r^{n} \|A_{n}\|_{\mathrm {L}^{\infty}(\Omega)}\leq C_{r}\) and \(r^{n}\|B_{n}\|_{\mathrm {L}^{\infty}(\Omega)}\leq C_{r}\). For fixed \(\delta>0\) small enough, one gets

$$\biggl(\frac{r}{1+\delta} \biggr)^{n} \Vert A_{n} \Vert _{\mathrm {L}^{\infty}(\Omega )}\leq\frac{C_{r}}{(1+\delta)^{n}}, \qquad\biggl(\frac{r}{1+\delta } \biggr)^{n} \Vert B_{n} \Vert _{\mathrm {L}^{\infty}(\Omega)}\leq \frac{C_{r}}{(1+\delta )^{n}}. $$

From these two inequalities,

$$\sum_{n=0}^{\infty} \Vert A_{n} \Vert _{\mathrm {L}^{\infty}(\Omega)}(t-t_{0})^{n}< \infty ,\qquad \sum _{n=0}^{\infty} \Vert B_{n} \Vert _{\mathrm {L}^{\infty}(\Omega )}(t-t_{0})^{n}< \infty $$

for \(t\in(t_{0}-r/(1+\delta),t_{0}+r/(1+\delta))\). Letting δ approach 0, we obtain the convergence of both series for every \(t\in(t_{0}-r,t_{0}+r)\). The solution X also satisfies \(\sum_{n=0}^{\infty}\|X_{n}\|_{\mathrm {L}^{\infty}(\Omega)}(t-t_{0})^{n}<\infty\) for \(t\in(t_{0}-r,t_{0}+r)\), because the estimate (10) can be repeated with the \(\mathrm {L}^{\infty}(\Omega)\) norm instead of the \(\mathrm {L}^{2}(\Omega)\) norm. Therefore, for almost every fixed \(\omega\in\Omega\), \(X(t)\) is a pointwise classical solution (that is, a solution from a deterministic point of view). This manner of studying random differential equations is referred to as the sample approach [37, Appendix I], [9].

3.5 Statistical information of the solution stochastic process: mean and variance

The expectation and variance of the stochastic process \(X(t)=\sum_{n=0}^{\infty}X_{n}(t-t_{0})^{n}\) given by (7)–(8) can be approximated. Indeed, one first obtains \(X_{n}\), \(n=0,1,\ldots,N\), as a function of \(Y_{0}\), \(Y_{1}\), \(A_{0},\ldots,A_{n-2}\) and \(B_{0},\ldots,B_{n-2}\) by recursion via (7)–(8). After this, we construct a truncation

$$ X_{N}(t)=\sum_{n=0}^{N} X_{n} (t-t_{0})^{n} $$
(14)

of the solution stochastic process \(X(t)\). Since \(X_{N}(t)\rightarrow X(t)\) in \(\mathrm {L}^{2}(\Omega)\) as \(N\rightarrow\infty\), according to the key property (2), we have

$$\lim_{N\rightarrow\infty} \mathbb{E} \bigl[X_{N}(t) \bigr]=\mathbb {E} \bigl[X(t) \bigr],\qquad \lim_{N\rightarrow\infty} \mathbb {V} \bigl[X_{N}(t) \bigr]=\mathbb{V} \bigl[X(t) \bigr]. $$

As an illustration of how one can proceed, we compute by hand the first few random coefficients \(X_{n}\):

$$\begin{aligned} &X_{2}=\frac{-1}{2}(A_{0}Y_{1}+B_{0}Y_{0}), \\ &X_{3}=\frac{-1}{6}(A_{1}Y_{1}+B_{1}Y_{0}+2A_{0}X_{2}+B_{0}Y_{1})= \frac {-1}{6} \bigl(A_{1}Y_{1}+B_{1}Y_{0}-A_{0}^{2}Y_{1}-A_{0}B_{0}Y_{0}+B_{0}Y_{1} \bigr), \\ &X_{4}= \frac{-1}{12}(A_{2}Y_{1}+B_{2}Y_{0}+2A_{1}X_{2}+B_{1}X_{1}+3A_{0}X_{3}+B_{0}X_{2}) \\ &\phantom{X_{4}}= \frac{-1}{12} \biggl(A_{2}Y_{1}+B_{2}Y_{0}-A_{1}A_{0}Y_{1}-A_{1}B_{0}Y_{0}+B_{1}Y_{1}- \frac{1}{2} A_{0}A_{1}Y_{1}- \frac{1}{2} A_{0}B_{1}Y_{0} \\ &\phantom{X_{4}=}{}+ \frac{1}{2} A_{0}^{3}Y_{1}+ \frac{1}{2} A_{0}^{2}B_{0}Y_{0}- \frac{1}{2} A_{0}B_{0}Y_{1}- \frac{1}{2} B_{0}A_{0}Y_{1}- \frac{1}{2} B_{0}^{2}Y_{0} \biggr). \end{aligned}$$

From these computations, we have truncation (14) for \(N=4\) and \(t_{0}=0\): \(X_{4}(t)=Y_{0}+Y_{1}t+X_{2}t^{2}+X_{3}t^{3}+X_{4}t^{4}\). We need to compute \(\mathbb {E}[X_{4}(t)]\) to approximate \(\mathbb{E}[X(t)]\). By linearity of the expectation, \(\mathbb{E}[X_{4}(t)]=\mathbb{E}[Y_{0}]+\mathbb {E}[Y_{1}]t+\mathbb{E}[X_{2}]t^{2}+\mathbb{E}[X_{3}]t^{3}+\mathbb{E}[X_{4}]t^{4}\). Assuming independence of \(Y_{0},Y_{1},A_{0},A_{1},\ldots,B_{0},B_{1},\ldots\) and applying a property from [16, p. 93], we are able to compute the expectation of the addends by hand:

$$\begin{aligned} &\mathbb{E}[X_{2}]=\frac{-1}{2} \bigl(\mathbb{E}[A_{0}] \mathbb{E}[Y_{1}]+\mathbb {E}[B_{0}]\mathbb{E}[Y_{0}] \bigr), \\ &\mathbb{E}[X_{3}]=\frac{-1}{6} \bigl(\mathbb{E}[A_{1}] \mathbb{E}[Y_{1}]+\mathbb {E}[B_{1}]\mathbb{E}[Y_{0}]- \mathbb{E} \bigl[A_{0}^{2} \bigr]\mathbb{E}[Y_{1}]\\ &\phantom{\mathbb{E}[X_{3}]=}{}- \mathbb {E}[A_{0}]\mathbb{E}[B_{0}]\mathbb{E}[Y_{0}]+ \mathbb{E}[B_{0}]\mathbb {E}[Y_{1}] \bigr), \\ &\mathbb{E}[X_{4}]= \frac{-1}{12} \biggl(\mathbb{E}[A_{2}] \mathbb {E}[Y_{1}]+\mathbb{E}[B_{2}]\mathbb{E}[Y_{0}]- \mathbb{E}[A_{1}]\mathbb {E}[A_{0}]\mathbb{E}[Y_{1}]- \mathbb{E}[A_{1}]\mathbb{E}[B_{0}]\mathbb {E}[Y_{0}] \\ &\phantom{\mathbb{E}[X_{4}]=}{}+ \mathbb{E}[B_{1}]\mathbb{E}[Y_{1}]-\frac{1}{2} \mathbb {E}[A_{0}]\mathbb{E}[A_{1}]\mathbb{E}[Y_{1}]- \frac{1}{2} \mathbb {E}[A_{0}]\mathbb{E}[B_{1}] \mathbb{E}[Y_{0}] \\ &\phantom{\mathbb{E}[X_{4}]=}{}+ \frac{1}{2} \mathbb{E} \bigl[A_{0}^{3} \bigr] \mathbb{E}[Y_{1}]+\frac{1}{2} \mathbb {E} \bigl[A_{0}^{2} \bigr]\mathbb{E}[B_{0}]\mathbb{E}[Y_{0}]-\frac{1}{2} \mathbb {E}[A_{0}]\mathbb{E}[B_{0}]\mathbb{E}[Y_{1}] \\ &\phantom{\mathbb{E}[X_{4}]=}{}- \frac{1}{2} \mathbb{E}[B_{0}]\mathbb{E}[A_{0}] \mathbb{E}[Y_{1}]-\frac{1}{2} \mathbb{E} \bigl[B_{0}^{2} \bigr]\mathbb{E}[Y_{0}] \biggr). \end{aligned}$$

For large values of n, a computer is needed to handle the large expressions for \(X_{n}\) and, as a consequence, \(\mathbb{E}[X_{N}(t)]\) and \(\mathbb{V}[X_{N}(t)]\) for large values of N. We show how to implement the formulas needed to compute the expectation and variance of the truncated series in the software Mathematica®. The recurrence relation (7)–(8) is defined as follows:

(* initial conditions (7): X[0] = Y0 (the pattern n_?NonPositive covers n = 0) and X[1] = Y1 *)
X[n_?NonPositive] := Y0;
X[1] = Y1;
(* recurrence (8), written here for the index n = k + 2, i.e. valid for n >= 2 *)
X[n_] := -1/(n*(n - 1))*Sum[(m + 1)*A[n - 2 - m]*X[m + 1]
     + B[n - 2 - m]*X[m], {m, 0, n - 2}];

Truncation (14) is implemented by writing

seriesX[t_, t0_, nMax_] := X[0] + Sum[X[n]*(t - t0)^n, {n, 1, nMax}];   (* truncation (14); nMax plays the role of N, which is a reserved symbol *)

Using the built-in Expectation function, in which one can specify the distributions of A[n], B[n], Y0 and Y1, both the expectation and the variance of (14) can be calculated by the computer.
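For instance, the following minimal usage sketch (our own illustration, assuming the definitions of X and seriesX above have already been evaluated) encodes data of Airy type, namely A[n] = 0 for all n and B[n] = 0 except for B[1]; the symbol b1, the truncation order \(N=6\) and the distributions, taken as independent, are merely illustrative choices:

Clear[A, B];
A[n_] := 0;                          (* A(t) = 0 *)
B[1] = b1; B[n_] := 0;               (* B(t) = b1*t, with b1 a random variable *)
rules = {b1 \[Distributed] BetaDistribution[2, 3],
    Y0 \[Distributed] NormalDistribution[1, 1],
    Y1 \[Distributed] NormalDistribution[2, 1]};
meanX = Expectation[seriesX[t, 0, 6], rules];              (* approximates E[X(t)] *)
varX = Expectation[seriesX[t, 0, 6]^2, rules] - meanX^2;   (* approximates V[X(t)] *)

Evaluating meanX and varX at concrete values of t yields numerical approximations of \(\mathbb{E}[X_{N}(t)]\) and \(\mathbb{V}[X_{N}(t)]\).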

There are other approaches to approximating the expectation of the solution to the random initial value problem (1). One of them is the so-called dishonest method [2], [17, p. 149], which assumes that \(A(t)\) and \(\dot{X}(t)\) are independent and that \(B(t)\) and \(X(t)\) are independent. Denoting \(\mu_{X}(t)=\mathbb{E}[X(t)]\), the idea is the following: since \(\mathbb {E}[\ddot{X}(t)]=\frac{\mathrm {d}^{2}}{\mathrm {d}t^{2}}(\mu_{X}(t))\) and \(\mathbb {E}[\dot{X}(t)]=\frac{\mathrm {d}}{\mathrm {d}t}(\mu_{X}(t))\), due to the commutation between the mean square limit and the expectation operator (see [37, Ch. 4]), the assumed independence leads to a deterministic initial value problem for \(\mu_{X}(t)\):

$$ \textstyle\begin{cases} \frac{\mathrm {d}^{2}}{\mathrm {d}t^{2}}(\mu_{X}(t))+\mathbb {E}[A(t)]\frac{\mathrm {d}}{\mathrm {d}t}(\mu_{X}(t))+\mathbb{E}[B(t)]\mu_{X}(t)=0,\quad t\in\mathbb{R}, \\ \mu_{X}(t_{0})=\mathbb{E}[Y_{0}], \\ \frac{\mathrm {d}}{\mathrm {d}t}(\mu_{X}(t_{0}))=\mathbb{E}[Y_{1}]. \end{cases} $$
(15)

In [2] and [17, p. 149] this method is used to handle the problem of computing the expectation of the solution stochastic process of certain random differential equations. In [3, 7] approximations of the expectation of the corresponding solution stochastic process obtained via specific methods have been compared with the ones calculated by the dishonest approach. In our context, the dishonest method will work in cases where \(\mathbb {C}\mathrm {ov}[A(t),\dot {X}(t)]\) and \(\mathbb {C}\mathrm {ov}[B(t),X(t)]\) are small, but in general there is no guarantee that this holds. In Example 4.1 and Example 4.2, we approximate \(\mathbb {C}\mathrm {ov}[A(t),\dot{X}(t)]\) and \(\mathbb {C}\mathrm {ov}[B(t),X(t)]\) in order to better understand the accuracy of the dishonest method. Nevertheless, the truncation method previously described yields reliable approximations of the expectation, and also of the variance, of the solution process.
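For reference, the deterministic problem (15) can be integrated numerically with a few lines of Mathematica. The sketch below is ours; the mean data are placeholders of Airy type, with \(\mathbb{E}[A(t)]=0\), \(\mathbb{E}[B(t)]=0.4t\), \(\mathbb{E}[Y_{0}]=1\) and \(\mathbb{E}[Y_{1}]=2\) (cf. Example 4.1), and the time interval is an arbitrary choice:

meanA[t_] := 0;           (* placeholder for E[A(t)] *)
meanB[t_] := 0.4 t;       (* placeholder for E[B(t)] *)
dishonestMean = NDSolve[{mu''[t] + meanA[t] mu'[t] + meanB[t] mu[t] == 0,
    mu[0] == 1, mu'[0] == 2}, mu, {t, 0, 2}];
Plot[Evaluate[mu[t] /. dishonestMean], {t, 0, 2}]   (* approximation of E[X(t)] under the independence assumption *)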

Another popular approach consists in using Monte Carlo simulations. One samples from the distributions of \(A(t)\), \(B(t)\), \(Y_{0}\) and \(Y_{1}\) to obtain, say, M realizations, with M large. That is, we have \(A(t,\omega_{1}),\ldots,A(t,\omega_{M})\), \(B(t,\omega_{1}),\ldots ,B(t,\omega_{M})\), \(Y_{0}(\omega_{1}),\ldots,Y_{0}(\omega_{M})\) and \(Y_{1}(\omega_{1}),\ldots,Y_{1}(\omega_{M})\) for M outcomes \(\omega _{1},\ldots,\omega_{M}\in\Omega\). Then we solve the M deterministic initial value problems

$$ \textstyle\begin{cases} \ddot{X}(t,\omega_{i})+A(t,\omega_{i})\dot{X}(t,\omega _{i})+B(t,\omega_{i})X(t,\omega_{i})=0, \quad t\in\mathbb{R}, \\ X(t_{0},\omega_{i})=Y_{0}(\omega_{i}), \\ \dot{X}(t_{0},\omega_{i})=Y_{1}(\omega _{i}), \end{cases} $$
(16)

so that we obtain M realizations of \(X(t)\): \(X(t,\omega_{1}),\ldots ,X(t,\omega_{M})\). The law of large numbers permits approximating \(\mathbb{E}[X(t)]\) and \(\mathbb{V}[X(t)]\) by computing the sample mean and sample variance of \(X(t,\omega_{1}),\ldots,X(t,\omega_{M})\).

Monte Carlo simulation, in contrast to the dishonest method, always gives consistent approximations, which become more accurate as M grows, although the Monte Carlo method possesses a slow convergence rate, namely \(\mathcal{O}(1/\sqrt{M})\) [39, p. 53]. Thereby, the statistical information computed by means of Monte Carlo simulation should agree with that obtained by our truncation method.
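To fix ideas, we include a minimal Monte Carlo sketch (ours) for an Airy-type instance of (16), with the distributions of Example 4.1 (independent case); the sample size, the time horizon and the evaluation point \(t=1\) are our own illustrative choices:

nSim = 10000;                                       (* number of realizations, M in the text *)
aS = RandomVariate[BetaDistribution[2, 3], nSim];
y0S = RandomVariate[NormalDistribution[1, 1], nSim];
y1S = RandomVariate[NormalDistribution[2, 1], nSim];
solveOne[a_, y0_, y1_] := First[x /. NDSolve[{x''[t] + a t x[t] == 0,
     x[0] == y0, x'[0] == y1}, x, {t, 0, 2}]];
valuesAtOne = MapThread[solveOne[#1, #2, #3][1.] &, {aS, y0S, y1S}];   (* realizations of X(1) *)
{Mean[valuesAtOne], Variance[valuesAtOne]}          (* sample approximations of E[X(1)] and V[X(1)] *)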

3.6 Obtaining error estimates for the approximation of the solution stochastic process, its mean and its variance

Given an error ϵ, we want to obtain \(N_{\epsilon}\) so that \(\| X_{N}(t)-X(t)\|_{\mathrm {L}^{2}(\Omega)}<\epsilon\) for all \(N\geq N_{\epsilon}\). Notice that, in such a case, by Jensen’s and the Cauchy–Schwarz inequalities, we would have

$$\bigl\vert \mathbb{E} \bigl[X_{N}(t) \bigr]-\mathbb{E} \bigl[X(t) \bigr] \bigr\vert = \bigl\vert \mathbb{E} \bigl[X_{N}(t)-X(t) \bigr] \bigr\vert \leq \mathbb{E} \bigl[ \bigl\vert X_{N}(t)-X(t) \bigr\vert \bigr]\leq \bigl\Vert X_{N}(t)-X(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega )}< \epsilon, $$

therefore we will be able to estimate the error when approximating the mean \(\mathbb{E}[X(t)]\) via \(\mathbb{E}[X_{N}(t)]\).

The method to estimate errors for \(\|X_{N}(t)-X(t)\|_{\mathrm {L}^{2}(\Omega)}\) is as follows. We use the notation from the proof of Theorem 3.3. If we denote \(\rho=|t-t_{0}|\) and take \(\rho< s< r\), we have

$$ \bigl\Vert X_{N}(t)-X(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega)}= \Biggl\Vert \sum_{n=N+1}^{\infty}X_{n} (t-t_{0})^{n} \Biggr\Vert _{\mathrm {L}^{2}(\Omega)} \leq\sum _{n=N+1}^{\infty} \Vert X_{n} \Vert _{\mathrm {L}^{2}(\Omega)} \rho^{n}\leq\sum_{n=N+1}^{\infty}H_{n} \rho^{n}. $$
(17)

To bound \(H_{n} s^{n}\), we base our reasoning on (13). Given \(M_{n}=\max_{0\leq m\leq n}H_{m}s^{m}\), we saw in the proof of Theorem 3.3 that \(M_{n}=M\) for sufficiently large n. In fact, if the inequality

$$ \frac{n s}{(n+2)r}+\frac{C_{r} s}{n+2}+\frac{C_{r}s^{2}}{(n+2)(n+1)}< 1, $$
(18)

holds for every \(n\geq n_{0}\), then \(M=M_{n_{0}+1}\), so M can be computed just by knowing r, \(C_{r}\), s, \(\|Y_{0}\|_{\mathrm {L}^{2}(\Omega)}\) and \(\|Y_{1}\|_{\mathrm {L}^{2}(\Omega)}\): from these values we can determine from which index (18) holds and compute \(H_{n}\) via the recursion (11) or (12), and thereby M. Notice that, if many of the random variables \(A_{n}\) and \(B_{n}\) are 0, then \(H_{n}\) will in general not be a tight bound for \(\|X_{n}\|_{\mathrm {L}^{2}(\Omega)}\).

Once M is known, recall from the end of the proof of Theorem 3.3 that \(H_{n}\rho^{n}\leq M(\rho/s)^{n}\). Continuing from (17), we obtain

$$\bigl\Vert X_{N}(t)-X(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega)}\leq M\sum _{n=N+1}^{\infty}\biggl(\frac{\rho}{s} \biggr)^{n}=M\frac{ (\frac{\rho}{s} )^{N+1}}{1-\frac{\rho}{s}}. $$

If we want \(\|X_{N}(t)-X(t)\|_{\mathrm {L}^{2}(\Omega)}\) to be smaller than a prefixed error \(\epsilon>0\), we impose

$$M\frac{ (\frac{\rho}{s} )^{N+1}}{1-\frac{\rho }{s}}< \epsilon. $$

This yields

$$ N_{\epsilon}= \biggl\lceil \frac{\log (\frac{ (1-\frac{\rho }{s} )\epsilon}{M} )}{\log (\frac{\rho}{s} )}-1 \biggr\rceil $$
(19)

(here, \(\lceil x\rceil\) denotes the least integer that is greater than or equal to x, commonly known as the ceiling of x).
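The following short Mathematica sketch (ours) implements this recipe: it builds the sequence \(H_{n}\) from (11), locates the first index from which (18) holds, computes M and finally \(N_{\epsilon}\) from (19). All the numerical inputs are placeholders to be replaced by the data of the problem at hand.

r = 2; Cr = 2; s = 1.5; rho = 1; eps = 10^-5;    (* placeholder parameters *)
Clear[H]; H[0] = 1; H[1] = 1;                    (* placeholders for ||Y0|| and ||Y1|| in L^2 *)
H[n_] := H[n] = 1/(n (n - 1))*Cr/r^(n - 2)*
    Sum[r^m ((m + 1) H[m + 1] + H[m]), {m, 0, n - 2}];                      (* recursion (11) *)
cond[n_] := n s/((n + 2) r) + Cr s/(n + 2) + Cr s^2/((n + 2) (n + 1)) < 1;  (* inequality (18) *)
n0 = NestWhile[# + 1 &, 0, ! cond[#] &];         (* first index satisfying (18) *)
MM = Max[Table[N[H[m] s^m], {m, 0, n0 + 1}]];    (* M = M_{n0+1} *)
Neps = Ceiling[Log[(1 - rho/s) eps/MM]/Log[rho/s] - 1]                      (* expression (19) *)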

In Example 4.1 and Example 4.4, we will apply these computations to find an \(N_{\epsilon}\) for which the approximation of \(\mathbb{E}[X(t)]\) via \(\mathbb{E}[X_{N}(t)]\) at the points \(t=1\) and \(t=0.25\), respectively, gives an error smaller than ϵ.

We develop a similar method to estimate the errors in the approximations of the variance. That is to say, given an error \(\epsilon>0\), we want to find \(N_{\epsilon}\) such that \(|\mathbb {V}[X_{N}(t)]-\mathbb{V}[X(t)]|<\epsilon\) for all \(N\geq N_{\epsilon}\). We start by bounding the difference \(|\mathbb{V}[X_{N}(t)]-\mathbb {V}[X(t)]|\) using the triangle, Jensen’s and Cauchy–Schwarz inequalities:

$$\begin{aligned} & \bigl\vert \mathbb{V} \bigl[X_{N}(t) \bigr]-\mathbb{V} \bigl[X(t) \bigr] \bigr\vert \\ &\quad= \bigl\vert \mathbb {E} \bigl[ \bigl(X_{N}(t) \bigr)^{2} \bigr]- \bigl(\mathbb{E} \bigl[X_{N}(t) \bigr] \bigr)^{2}-\mathbb{E} \bigl[ \bigl(X(t) \bigr)^{2} \bigr]+ \bigl( \mathbb {E} \bigl[X(t) \bigr] \bigr)^{2} \bigr\vert \\ &\quad \leq \mathbb{E} \bigl[ \bigl\vert \bigl(X_{N}(t) \bigr)^{2}- \bigl(X(t) \bigr)^{2} \bigr\vert \bigr]+ \bigl\vert \bigl(\mathbb {E} \bigl[X_{N}(t) \bigr] \bigr)^{2}- \bigl( \mathbb{E} \bigl[X(t) \bigr] \bigr)^{2} \bigr\vert \\ &\quad= \mathbb{E} \bigl[ \bigl\vert X_{N}(t)-X(t) \bigr\vert \bigl\vert X_{N}(t)+X(t) \bigr\vert \bigr]+ \bigl\vert \mathbb {E} \bigl[X_{N}(t) \bigr]-\mathbb{E} \bigl[X(t) \bigr] \bigr\vert \bigl\vert \mathbb{E} \bigl[X_{N}(t) \bigr]+\mathbb{E} \bigl[X(t) \bigr] \bigr\vert \\ &\quad \leq \bigl\Vert X_{N}(t)-X(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega)} \bigl\Vert X_{N}(t)+X(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega)}\\ &\qquad{}+ \bigl\vert \mathbb{E} \bigl[X_{N}(t) \bigr]-\mathbb{E} \bigl[X(t) \bigr] \bigr\vert \bigl( \bigl\vert \mathbb {E} \bigl[X_{N}(t) \bigr] \bigr\vert + \bigl\vert \mathbb{E} \bigl[X(t) \bigr] \bigr\vert \bigr) \\ &\quad \leq \bigl\Vert X_{N}(t)-X(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega)} \bigl( \bigl\Vert X_{N}(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega )}+ \bigl\Vert X(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega)} \bigr)\\ &\qquad{}+ \bigl\vert \mathbb{E} \bigl[X_{N}(t) \bigr]-\mathbb {E} \bigl[X(t) \bigr] \bigr\vert \bigl( \bigl\vert \mathbb{E} \bigl[X_{N}(t) \bigr] \bigr\vert + \bigl\vert \mathbb{E} \bigl[X(t) \bigr] \bigr\vert \bigr). \end{aligned}$$

Let \(\delta>0\) be a number to be determined later on so that the error becomes smaller than ϵ. By the results previously obtained in this subsection, we can choose \(N_{\delta}\) such that \(\|X_{N}(t)-X(t)\|_{\mathrm {L}^{2}(\Omega)}<\delta\) for all \(N\geq N_{\delta}\) (just applying (19) with \(\epsilon=\delta>0\)). Moreover, \(|\mathbb {E}[X_{N}(t)]-\mathbb{E}[X(t)]|\leq\|X_{N}(t)-X(t)\|_{\mathrm {L}^{2}(\Omega )}<\delta\), by Jensen’s and the Cauchy–Schwarz inequalities. Then

$$\begin{aligned} &\bigl\vert \mathbb{V} \bigl[X_{N}(t) \bigr]-\mathbb{V} \bigl[X(t) \bigr] \bigr\vert \\ &\quad \leq \delta \bigl( \bigl\Vert X_{N}(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega)}+ \bigl\Vert X(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega )} \bigr)+ \delta \bigl( \bigl\vert \mathbb{E} \bigl[X_{N}(t) \bigr] \bigr\vert + \bigl\vert \mathbb{E} \bigl[X(t) \bigr] \bigr\vert \bigr) \\ &\quad \leq \delta \bigl( \bigl\Vert X_{N}(t)-X(t)+X(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega)}+ \bigl\Vert X(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega)} \bigr) \\ &\qquad{} + \delta \bigl( \bigl\vert \mathbb{E} \bigl[X_{N}(t) \bigr]- \mathbb{E} \bigl[X(t) \bigr]+\mathbb {E} \bigl[X(t) \bigr] \bigr\vert + \bigl\vert \mathbb{E} \bigl[X(t) \bigr] \bigr\vert \bigr) \\ &\quad \leq \delta \bigl( \bigl\Vert X_{N}(t)-X(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega)}+ \bigl\Vert X(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega)}+ \bigl\Vert X(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega)} \bigr) \\ &\qquad{} + \delta \bigl( \bigl\vert \mathbb{E} \bigl[X_{N}(t) \bigr]- \mathbb{E} \bigl[X(t) \bigr] \bigr\vert + \bigl\vert \mathbb {E} \bigl[X(t) \bigr] \bigr\vert + \bigl\vert \mathbb{E} \bigl[X(t) \bigr] \bigr\vert \bigr) \\ & \quad\leq \delta \bigl(\delta+2 \bigl\Vert X(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega)} \bigr)+ \delta \bigl(\delta +2 \bigl\vert \mathbb{E} \bigl[X(t) \bigr] \bigr\vert \bigr). \end{aligned}$$

To bound \(\|X(t)\|_{\mathrm {L}^{2}(\Omega)}\), write

$$\begin{aligned} \bigl\Vert X(t) \bigr\Vert _{\mathrm {L}^{2}(\Omega)}={} & \Biggl\Vert \sum _{n=0}^{\infty}X_{n}(t-t_{0})^{n} \Biggr\Vert _{\mathrm {L}^{2}(\Omega)}\leq\sum_{n=0}^{\infty} \Vert X_{n} \Vert _{\mathrm {L}^{2}(\Omega)}\rho^{n} \\ \leq {}& \sum_{n=0}^{\infty}H_{n} \rho^{n}\leq M\sum_{n=0}^{\infty}\biggl(\frac{\rho}{s} \biggr)^{n}=M\frac{1}{1-\frac{\rho}{s}}=:\gamma, \end{aligned}$$

where \(\rho=|t-t_{0}|\) and s is any number satisfying \(\rho< s< r\). We saw before how to compute M, so γ is a computable bound for \(\|X(t)\|_{\mathrm {L}^{2}(\Omega)}\). On the other hand, to bound \(|\mathbb{E}[X(t)]|\) we have two options. One option consists in using Jensen’s and the Cauchy–Schwarz inequalities to derive \(|\mathbb {E}[X(t)]|\leq\|X(t)\|_{\mathrm {L}^{2}(\Omega)}\leq\gamma\), and we are done. The second option, which provides a tighter bound for \(|\mathbb {E}[X(t)]|\), consists in using the approximations of \(\mathbb{E}[X(t)]\) via \(\mathbb{E}[X_{N}(t)]\) and deducing from them an upper bound for \(|\mathbb{E}[X(t)]|\). For either of these two options, we denote the upper bound obtained for \(|\mathbb{E}[X(t)]|\) by \(\beta >0\). Thus, for \(N\geq N_{\delta}\),

$$\bigl\vert \mathbb{V} \bigl[X_{N}(t) \bigr]-\mathbb{V} \bigl[X(t) \bigr] \bigr\vert \leq\delta(2\gamma+\delta )+\delta(2\beta+\delta). $$

Now choose δ so that

$$\delta(2\gamma+\delta)+\delta(2\beta+\delta)\leq\epsilon. $$

From here, we take

$$ \delta=\frac{-(\gamma+\beta)+\sqrt{(\gamma+\beta)^{2}+2\epsilon}}{2}>0. $$
(20)

To sum up, given a prefixed error \(\epsilon>0\), the steps needed to guarantee that the approximations \(\mathbb{V}[X_{N}(t)]\) of the exact variance \(\mathbb{V}[X(t)]\) satisfy \(|\mathbb {V}[X_{N}(t)]-\mathbb{V}[X(t)]|\leq\epsilon\) are the following:

  1. Compute \(\gamma=M/(1-\rho/s)\);
  2. Compute \(\beta>0\), an upper bound of \(|\mathbb{E}[X(t)]|\);
  3. Obtain \(\delta>0\) from (20);
  4. Take \(N_{\delta}\) as in the approximation of the mean (expression (19), but with δ instead of ϵ).

To illustrate these ideas, in Example 4.1 and Example 4.4 we will apply these computations to find an \(N_{\epsilon}\) for which, given an a priori error \(\epsilon>0\), the approximation of \(\mathbb{V}[X(t)]\) by means of \(\mathbb{V}[X_{N}(t)]\) at the points \(t=1\) and \(t=0.25\), respectively, gives an error smaller than ϵ.
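In terms of the sketch given after (19) (again ours, with placeholder inputs), steps 1–4 above amount to a few additional lines:

MM = 25.; rho = 1; s = 1.5; eps = 10^-5;    (* placeholders: M, rho, s and the target error *)
gamma = MM/(1 - rho/s);                     (* step 1: computable bound for ||X(t)|| in L^2 *)
beta = gamma;                               (* step 2: crude choice; a tighter bound can be read off the approximations of the mean *)
delta = (-(gamma + beta) + Sqrt[(gamma + beta)^2 + 2 eps])/2;    (* step 3: expression (20) *)
Ndelta = Ceiling[Log[(1 - rho/s) delta/MM]/Log[rho/s] - 1]       (* step 4: (19) with delta in place of epsilon *)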

4 Examples

The main goal of this section is to approximate the expectation and variance of the solution process \(X(t)\) to particular random initial value problems (1). Our tools will be the ones described in Sect. 3.5, that is, computing \(\mathbb{E}[X_{N}(t)]\) and \(\mathbb{V}[X_{N}(t)]\) for truncation (14), the dishonest method and Monte Carlo simulation. We will compare the three approaches in order to show the potential of the truncation method. Our truncation method and Monte Carlo simulation should give similar approximations of the statistical moments (and coincide with the exact results in the limit).

In addition, we will take some particular problems (1) that have already been studied in the literature, such as Airy’s and Hermite’s random differential equations [3, 7]. As we will see, our findings generalize the results obtained in those papers (recall Sect. 3.3).

Example 4.1

(Airy’s random differential equation)

Airy’s random differential equation is the following:

$$ \textstyle\begin{cases} \ddot{X}(t)+AtX(t)=0, \quad t\in\mathbb{R}, \\ X(0)=Y_{0}, \\ \dot{X}(0)=Y_{1}, \end{cases} $$
(21)

where A, \(Y_{0}\) and \(Y_{1}\) are random variables.

In [7], the hypothesis used in order to obtain a mean square analytic solution \(X(t)\) is \(\mathbb{E}[|A|^{n}]\leq HR^{n}\), \(n\geq n_{0}\). Notice that this hypothesis is equivalent to \(\|A\|_{\mathrm {L}^{\infty}(\Omega)}\leq R\) by Sect. 3.3.

In the case \(\mathbb{E}[|A|^{n}]\leq HR^{n}\), \(n\geq n_{0}\) (that is, \(\|A\| _{\mathrm {L}^{\infty}(\Omega)}\leq R\)), we are under the hypotheses of Theorem 3.3. Indeed, in the notation of Theorem 3.3, \(A_{n}=0\) for all \(n\geq0\), \(B_{1}=A\) and \(B_{n}=0\) for every \(n\neq1\). For a fixed and finite \(r>0\) and \(t_{0}=0\), we have \(\|B_{1}\| _{\mathrm {L}^{\infty}(\Omega)}\leq C_{r}/r^{1}\), taking, for instance, \(C_{r}=r\|A\|_{\mathrm {L}^{\infty}(\Omega)}\). Then the stochastic process \(X(t)=\sum_{n=0}^{\infty}X_{n} t^{n}\), defined as in Theorem 3.3, is a mean square analytic solution to (21) in \((-r,r)\). As \(r>0\) is arbitrary, in fact \(X(t)=\sum_{n=0}^{\infty}X_{n} t^{n}\) is a mean square analytic solution to (21) in \(\mathbb{R}\).

Let us carry out a practical case. As in [7], consider \(A\sim \text{Beta}(2,3)\) and \(Y_{0}\), \(Y_{1}\) independent random variables such that \(Y_{0}\sim\text{Normal}(1,1)\) and \(Y_{1}\sim\text{Normal}(2,1)\). Tables 1 and 2 collect the simulations obtained in [7]. In Table 1 we show, for distinct values of t, \(\mathbb{E}[X_{N}(t)]\) for \(N=15\) and \(N=16\), the expectation of the solution stochastic process obtained via the dishonest method and also using Monte Carlo simulations with samples of size \(50{,}000\) and \(100{,}000\). In Table 2 we present, for distinct values of t, \(\mathbb{V}[X_{N}(t)]\) for \(N=15\) and \(N=16\) and the corresponding approximations computed via Monte Carlo sampling with \(50{,}000\) and \(100{,}000\) simulations.

Table 1 Approximation of the expectation of the solution stochastic process. Example 4.1, assuming independent random data
Table 2 Approximation of the variance of the solution stochastic process. Example 4.1, assuming independent random data

We observe that convergence has been achieved for small N. Compare with Monte Carlo simulation, in which many realizations are required in order to obtain good approximations. Nevertheless, it must be remarked that such a small truncation order N suffices because Airy’s random differential equation is not especially complex. For more complex data processes, as in Example 4.3, Example 4.4 and Example 4.5, a larger order of truncation N may be needed. This may imply a computational expense greater than or comparable to that of Monte Carlo simulation.

On the other hand, it is remarkable how well the dishonest method approximates the correct expectation, although the required independence between \(A(t)\), \(\dot{X}(t)\) and \(B(t)\), \(X(t)\) does not hold. The key point is that \(\mathbb {C}\mathrm {ov}[A(t),\dot{X}(t)]\) and \(\mathbb {C}\mathrm {ov}[B(t),X(t)]\) are small, as Table 3 shows, which justifies the accuracy of the dishonest method, especially for small t. Notice that, in this example, \(\mathbb {C}\mathrm {ov}[A(t),\dot{X}(t)]=0\), because \(A(t)\equiv 0\) is deterministic. The value of \(\mathbb {C}\mathrm {ov}[B(t),X(t)]\) is calculated by considering approximations \(X_{N}(t)\) of \(X(t)\) with \(N=16\), since for this order of truncation one gets good approximations of \(X(t)\); in other words, the approximations can be considered fairly exact.

Table 3 Approximation of \(\mathbb {C}\mathrm {ov}[A(t),\dot{X}(t)]\) and \(\mathbb {C}\mathrm {ov}[B(t),X(t)]\) via accurate truncations \(\dot{X}_{16}(t)\) and \(X_{16}(t)\), respectively. Example 4.1, assuming independent random data

We perform another example of Airy’s random differential equation (21), but this time the input random variables A, \(Y_{0}\) and \(Y_{1}\) will not be independent. Indeed, take a random vector \((A,Y_{0},Y_{1})\) that follows a multivariate Gaussian distribution, with mean vector and covariance matrix

$$\mu= \begin{pmatrix} 0.4 \\ 1 \\ 2 \end{pmatrix} ,\qquad \Sigma= \begin{pmatrix} 0.04 & 0.0001 & -0.05 \\ 0.0001 & 1 & 0.5 \\ -0.005 & 0.5 & 1 \end{pmatrix} , $$

respectively. In order for the hypotheses of Theorem 3.3 to be satisfied, we need to truncate A (because the normal distribution is unbounded). Since A follows a normal distribution with mean \(\mu _{A}=0.4\) and variance \(\sigma_{A}^{2}=0.04\), the interval \([\mu_{A}-3\sigma _{A},\mu_{A}+3\sigma_{A}]=[-0.2,1]\) contains 99.7% of the observations of A. Thus, the multivariate Gaussian distribution will be truncated to \([-0.2,1]\times\mathbb{R}\times\mathbb{R}\).

In Tables 4 and 5, we present the numerical experiments. We use truncation (14) with \(N=15\) and \(N=16\), the dishonest method and Monte Carlo simulation.

Table 4 Approximation of the expectation of the solution stochastic process. Example 4.1, assuming dependent random data
Table 5 Approximation of the variance of the solution stochastic process. Example 4.1, assuming dependent random data

Once again, convergence has been achieved quite quickly compared with Monte Carlo simulation. The results obtained are more accurate than those produced by the dishonest method and Monte Carlo simulation. Nonetheless, the accuracy of the dishonest method is remarkable again, particularly in the time interval \(t\in[0,1]\), although not as good as in the previous case (see Table 1). In Table 6, we show approximations of the covariances \(\mathbb {C}\mathrm {ov}[A(t),\dot{X}(t)]\) and \(\mathbb {C}\mathrm {ov}[B(t),X(t)]\). These covariances are small, especially for small t, which explains the good approximation of the expectation via the dishonest method.

Table 6 Approximation of \(\mathbb {C}\mathrm {ov}[A(t),\dot{X}(t)]\) and \(\mathbb {C}\mathrm {ov}[B(t),X(t)]\) via accurate truncations \(\dot{X}_{16}(t)\) and \(X_{16}(t)\), respectively. Example 4.1, assuming dependent random data

As an application of the error estimates studied in Sect. 3.6, we estimate for which index \(N_{\epsilon}\) the error obtained in the approximation of \(\mathbb{E}[X(t)]\) via \(\mathbb{E}[X_{N}(t)]\) is smaller than \(\epsilon=0.00001\). Take \(t=1\) and \(r=2\). Let, for instance, \(s=1.5\). In both cases in this example (assuming independent and dependent random data), we have \(C_{r}=r\|A\|_{\mathrm {L}^{\infty}(\Omega )}\). Since \(\|A\|_{\mathrm {L}^{\infty}(\Omega)}=1\) (recall that in the two cases considered throughout this example, the realizations \(A(\omega )\), \(\omega\in\Omega\), of the random variable A lie either in \([0,1]\) or in \([-0.2,1]\), hence bounded in absolute value by 1), we take \(C_{r}=r=2\). From these values, the least \(n_{0}\) such that (18) holds for all \(n\geq n_{0}\) is \(n_{0}=7\) (this value is obtained by plotting the left-hand side of (18) and looking at the point \(n_{0}\) from which the graph is less than 1). Then, from (13), \(M=M_{8}=\max_{0\leq m\leq8}H_{m} s^{m}=2024.49\). Finally, using (19), one gets \(N_{\epsilon}=10\). For \(N\geq N_{\epsilon}=10\), it holds \(|\mathbb{E}[X_{N}(t)]-\mathbb{E}[X(t)]|<0.00001\).

Now, given \(\epsilon=0.00001\), we obtain an \(N_{\epsilon}\) such that \(|\mathbb{V}[X_{N}(t)]-\mathbb{V}[X(t)]|<\epsilon\) at \(t=1\) for every \(N\geq N_{\epsilon}\). We use the ideas and notation from Sect. 3.6. We have \(t=1\), \(\rho=1\), \(r=2\) and \(s=1.5\). We saw that \(M=2024.49\). Then \(\gamma=M/(1-\rho/s)=6073.47\). Recall that we could choose β equal to γ or, for a tighter bound, use Tables 1 and 4. We see that \(|\mathbb {E}[X(t)]|\leq2.869=:\beta\). From these values, we obtain \(\delta =8.22638\cdot10^{-10}\). Finally, choose \(N_{\delta}\) so that \(\| X_{N}(1)-X(1)\|_{\mathrm {L}^{2}(\Omega)}<\delta\). Use formula (19) (with δ instead of ϵ) to get \(N_{\delta}=73\). Thus, for \(N\geq73\), the inequality \(|\mathbb{V}[X_{N}(t)]-\mathbb {V}[X(t)]|<0.00001\) is guaranteed.

Example 4.2

(Hermite’s random differential equation)

Hermite’s random differential equation is defined as follows:

$$ \textstyle\begin{cases} \ddot{X}(t)-2t\dot{X}(t)+AX(t)=0, \quad t\in\mathbb{R}, \\ X(0)=Y_{0}, \\ \dot{X}(0)=Y_{1}, \end{cases} $$
(22)

where A, \(Y_{0}\) and \(Y_{1}\) are random variables.

In [3], the moments of A are controlled as \(\mathbb {E}[|A|^{n}]\leq HR^{n}\), \(n\geq n_{0}\), to prove the existence of a mean square analytic solution to random initial value problem (22). As we saw in Sect. 3.3, this hypothesis reduces to \(\|A\|_{\mathrm {L}^{\infty}(\Omega)}\leq R\).

If \(\mathbb{E}[|A|^{n}]\leq HR^{n}\), \(n\geq n_{0}\) (that is, \(\|A\|_{\mathrm {L}^{\infty}(\Omega)}\leq R\)), we are under the assumptions of Theorem 3.3. Indeed, in the notation of Theorem 3.3, \(A_{1}=-2\), \(A_{n}=0\) for all \(n\neq1\), \(B_{0}=A\) and \(B_{n}=0\) for every \(n\neq0\). For fixed, finite \(r>0\) and \(t_{0}=0\), we have \(\|A_{1}\|_{\mathrm {L}^{\infty}(\Omega)}\leq C_{r}/r^{1}\) and \(\|B_{0}\|_{\mathrm {L}^{\infty}(\Omega)}\leq C_{r}/r^{0}=C_{r}\), where one may take, for example, \(C_{r}=\max\{2r,\|A\|_{\mathrm {L}^{\infty}(\Omega)}\}\). Then \(X(t)=\sum_{n=0}^{\infty}X_{n} t^{n}\) defined as in Theorem 3.3 is an analytic solution to (22) in \((-r,r)\). Since \(r>0\) is arbitrary, \(X(t)=\sum_{n=0}^{\infty}X_{n} t^{n}\) is a mean square analytic solution stochastic process to random initial value problem (22) in \(\mathbb{R}\).
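For intuition, formally substituting \(X(t)=\sum_{n=0}^{\infty}X_{n}t^{n}\) into (22) yields the Hermite-type recursion \(X_{n+2}=(2n-A)X_{n}/((n+2)(n+1))\), with \(X_{0}=Y_{0}\) and \(X_{1}=Y_{1}\), which coincides with the classical deterministic Hermite recursion. The following Python sketch simply evaluates the truncation \(X_{N}(t)\) for one realization of \((A,Y_{0},Y_{1})\); it is offered only as an illustration of what each Monte Carlo iteration in the tables below essentially requires, not as the moment-based computation we use for \(\mathbb{E}[X_{N}(t)]\) and \(\mathbb{V}[X_{N}(t)]\).

```python
def hermite_truncation(t, N, a, y0, y1):
    """Evaluate X_N(t) = sum_{n=0}^{N} X_n t^n for one realization (a, y0, y1),
    using the recursion X_{n+2} = (2n - a) X_n / ((n + 2)(n + 1))."""
    coeffs = [y0, y1]
    for n in range(N - 1):
        coeffs.append((2 * n - a) * coeffs[n] / ((n + 2) * (n + 1)))
    return sum(c * t ** n for n, c in enumerate(coeffs))

# One illustrative realization taken at the distribution means used below:
print(hermite_truncation(0.5, 16, a=5.0, y0=1.0, y1=2.0))
```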

As in [3], let \(A\sim\text{Normal}(\mu=5,\sigma^{2}=1)\), and let \(Y_{0}\), \(Y_{1}\) be independent random variables such that \(Y_{0}\sim\text{Normal}(1,1)\) and \(Y_{1}\sim\text{Normal}(2,1)\). Since the normal distribution is unbounded, A does not fulfill the hypotheses of Theorem 3.3, so we need to truncate it: in [3] it was truncated to the interval \([\mu-3\sigma,\mu+3\sigma]=[2,8]\), which contains approximately 99.7% of the observations of a Gaussian random variable. Tables 7 and 8 reproduce the results obtained in [3]. In Table 7 we show, for distinct values of t, \(\mathbb {E}[X_{N}(t)]\) for \(N=15\) and \(N=16\), the corresponding approximation obtained via the dishonest method, and Monte Carlo simulations with samples of size \(50{,}000\) and \(100{,}000\). In Table 8 we present, for distinct values of t, \(\mathbb{V}[X_{N}(t)]\) for \(N=15\) and \(N=16\) and Monte Carlo simulations with samples of size \(50{,}000\) and \(100{,}000\). As in the previous example, convergence has been achieved for small N. In Table 9, we show approximations of the covariances \(\mathbb {C}\mathrm {ov}[A(t),\dot{X}(t)]\) and \(\mathbb {C}\mathrm {ov}[B(t),X(t)]\) to better understand the accuracy of the dishonest method.

Table 7 Approximation of the expectation of the solution stochastic process. Example 4.2
Table 8 Approximation of the variance of the solution stochastic process. Example 4.2
Table 9 Approximation of \(\mathbb {C}\mathrm {ov}[A(t),\dot{X}(t)]\) and \(\mathbb {C}\mathrm {ov}[B(t),X(t)]\) via accurate truncations \(\dot{X}_{16}(t)\) and \(X_{16}(t)\), respectively. Example 4.2

Example 4.3

(Random linear differential equation with polynomial data processes)

Let us consider more complex data processes in our random differential equation (1). The data stochastic processes will be random polynomials. For example,

$$ \textstyle\begin{cases} \ddot{X}(t)+(A_{0}+A_{1}t)\dot{X}(t)+(B_{0}+B_{1}t)X(t)=0,\quad t\in\mathbb{R}, \\ X(0)=Y_{0}, \\ \dot{X}(0)=Y_{1}, \end{cases} $$
(23)

where \(A_{0}=4\), \(A_{1}\sim\text{Uniform}(0,1)\), \(B_{0}\sim\text{Gamma}(2,2)\), \(B_{1}\sim\text{Bernoulli}(0.35)\), \(Y_{0}=-1\) and \(Y_{1}\sim \text{Binomial}(2,0.29)\) are assumed to be independent. In order for the hypotheses of Theorem 3.3 to be satisfied, the gamma distribution will be truncated. For the gamma distribution with shape parameter 2 and rate parameter 2, it can be checked straightforwardly that the interval \([0,4]\) contains approximately 99.7% of the observations.
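This coverage claim can be verified numerically; for instance, a short SciPy check (a sketch; note that SciPy parametrizes the gamma distribution by scale \(=1/\text{rate}\)) returns approximately 0.997.

```python
from scipy import stats

# Gamma with shape 2 and rate 2; SciPy uses scale = 1 / rate.
gamma_2_2 = stats.gamma(a=2, scale=0.5)
print(gamma_2_2.cdf(4) - gamma_2_2.cdf(0))  # about 0.997
```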

By Theorem 3.3, the mean square solution of (23) can be written as a random power series \(X(t)=\sum_{n=0}^{\infty}X_{n} t^{n}\) that is mean square convergent for all \(t\in\mathbb{R}\).

In Tables 10 and 11, the numerical experiments for the expectation and variance are presented.

Table 10 Approximation of the expectation of the solution stochastic process. Example 4.3
Table 11 Approximation of the variance of the solution stochastic process. Example 4.3

Example 4.4

(Random linear differential equation with infinite series data processes)

In this example, the data stochastic processes in the random differential equation (1) are non-polynomial analytic stochastic processes:

$$ \textstyle\begin{cases} \ddot{X}(t)+A(t)\dot{X}(t)+B(t)X(t)=0, \quad t\in\mathbb {R}, \\ X(0)=Y_{0}, \\ \dot{X}(0)=Y_{1}, \end{cases} $$
(24)

where \(A_{n}\sim\text{Beta}(11,15)\) for \(n\geq0\), \(B_{n}=1/n^{2}\) for \(n\geq1\), and \(Y_{0}\sim\text{Poisson}(2)\) and \(Y_{1}\sim\text{Uniform}(0,1)\) are assumed to be independent. We have \(\mathbb {E}[A_{n}]=11/26\) and \(\mathbb{V}[A_{n}]=55/6084\), therefore \(\|A_{n}\| _{\mathrm {L}^{2}(\Omega)}=\sqrt{\mathbb{V}[A_{n}]+\mathbb {E}[A_{n}]^{2}}=0.43363\), \(n\geq0\). Then

$$\sum_{n=0}^{\infty} \Vert A_{n} \Vert _{\mathrm {L}^{2}(\Omega)}t^{n}=0.43363\sum_{n=0}^{\infty}t^{n}, $$

which is convergent for \(t\in(-1,1)\). On the other hand,

$$\sum_{n=0}^{\infty} \Vert B_{n} \Vert _{\mathrm {L}^{2}(\Omega)}t^{n}=\sum_{n=1}^{\infty}\frac{1}{n^{2}}t^{n}, $$

which is convergent for \(t\in(-1,1)\) as well. Therefore, the maximum r that we can take so that the random differential equation (24) makes sense is \(r=1\).

Since \(|A_{n}(\omega)| \leq1\) for all \(\omega\in\Omega\), \(|B_{n}| \leq 1\) and \(r=1\), we can take \(C_{r}=1\) in Theorem 3.3 and the hypotheses hold. By Theorem 3.3, the mean square solution of (24), \(X(t)\), is defined and is mean square analytic on \((-1,1)\).

In Tables 12 and 13 we present the numerical experiments. To apply the dishonest method, we need the following two computations:

$$\mathbb{E} \bigl[A(t) \bigr]=\frac{11}{26}\sum _{n=0}^{\infty}t^{n}=\frac {11}{26(1-t)},\quad t \in(-1,1) $$

and

$$\mathbb{E} \bigl[B(t) \bigr]=\sum_{n=1}^{\infty}\frac{t^{n}}{n^{2}}, \quad t\in(-1,1). $$
Table 12 Approximation of the expectation of the solution stochastic process. Example 4.4, assuming independent initial conditions
Table 13 Approximation of the variance of the solution stochastic process. Example 4.4, assuming independent initial conditions

To apply Monte Carlo simulation, we need realizations of the stochastic process \(A(t)\), that is, realizations of the random variables \(A_{0},A_{1},\ldots\) . As we cannot generate infinitely many realizations of a beta distribution on a computer, we approximate \(A(t,\omega)\approx \sum_{n=0}^{100} A_{n}(\omega)t^{n}\). Thus, from realizations of \(A_{0},\ldots,A_{100}\), we obtain an approximation of a realization of \(A(t)\).
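A minimal Python sketch of this approximation (the cut-off at 100 terms follows the text; a truncated partial sum for \(B(t)\) is included for completeness) could read as follows.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_A_realization(n_terms=101):
    """Sample A_0, ..., A_100 ~ Beta(11, 15) and return t -> A(t, omega),
    approximated by the partial sum of the first n_terms coefficients."""
    a = rng.beta(11, 15, size=n_terms)
    return lambda t: np.polynomial.polynomial.polyval(t, a)

def B_truncated(t, n_terms=101):
    """Deterministic B(t) = sum_{n >= 1} t^n / n^2, truncated after n_terms."""
    n = np.arange(1, n_terms)
    return np.sum(t ** n / n ** 2)

A_path = sample_A_realization()
print(A_path(0.5), B_truncated(0.5))
```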

By contrast, our approximations based on the truncation \(X_{N}(t)\), \(t\in(-1,1)\), do not require realizations of the data stochastic process \(A(t)\), which involves infinitely many random coefficients.

As observed in Tables 12 and 13, convergence has practically been achieved for \(N=17\).

As an application of the error estimates analyzed in Sect. 3.6, we estimate an index \(N_{\epsilon}\) such that the error of approximating \(\mathbb{E}[X(t)]\) by \(\mathbb{E}[X_{N}(t)]\) is smaller than \(\epsilon=0.00001\). Take \(t=0.25\) and, for instance, \(s=0.5\). We have \(r=1\) and \(C_{r}=1\), and from these values we find by trial and error the least \(n_{0}\) such that (18) holds for all \(n\geq n_{0}\), obtaining \(n_{0}=0\). From (13), \(M=M_{1}=\max\{ H_{0},H_{1} s\}=\max\{\|Y_{0}\|_{\mathrm {L}^{2}(\Omega)},\|Y_{1}\|_{\mathrm {L}^{2}(\Omega )}\cdot0.5\}=\max\{\sqrt{6},1/(2\sqrt{3})\}=\sqrt{6}\), whence \(N_{\epsilon}=18\) by using (19).

Now we bound the error made when approximating the variance. Given an error \(\epsilon=0.00001\), we obtain an \(N_{\epsilon}\) such that \(|\mathbb{V}[X_{N}(t)]-\mathbb{V}[X(t)]|<\epsilon\) at the point \(t=0.25\) for all \(N\geq N_{\epsilon}\). We use the ideas and notation from Sect. 3.6. We have \(t=0.25\), \(\rho=0.25\), \(r=1\) and \(s=0.5\). We computed \(M=\sqrt{6}\), whence \(\gamma=M/(1-\rho /s)=4.89898\). Recall that we could choose β equal to γ or, for a tighter bound, use Table 12, from which we see that \(|\mathbb{E}[X(0.25)]|\leq2.113=:\beta\). From these values, \(\delta=7.13065\cdot10^{-7}\). Finally, pick \(N_{\delta}\) such that \(\|X_{N}(0.25)-X(0.25)\|_{\mathrm {L}^{2}(\Omega)}<\delta\). Using expression (19) (with δ instead of ϵ), we get \(N_{\delta}=22\). Thus, \(|\mathbb{V}[X_{N}(0.25)]-\mathbb {V}[X(0.25)]|<0.00001\) for \(N\geq22\).

We now consider another case of random initial value problem (24), again with \(A_{n}\sim\text{Beta}(11,15)\) for \(n\geq 0\) and \(B_{n}=1/n^{2}\) for \(n\geq1\), but where the random vector \((Y_{0},Y_{1})\) follows a multinomial distribution with three trials and probabilities 0.29 and 0.15. The random variables/vectors \(A_{0},A_{1},\ldots\) and \((Y_{0},Y_{1})\) are independent, but \(Y_{0}\) and \(Y_{1}\) are obviously not independent of each other. Again, the solution stochastic process \(X(t)\) is defined on \((-1,1)\).

In Tables 14 and 15, we show the numerical experiments. Convergence has been practically achieved.

Table 14 Approximation of the expectation of the solution stochastic process. Example 4.4, assuming dependent initial conditions
Table 15 Approximation of the variance of the solution stochastic process. Example 4.4, assuming dependent initial conditions

Example 4.5

(An application of the truncation method and Monte Carlo simulation to modeling)

In order to see a real application of our theoretical development, let us fit data that describe fish weight growth over time via a random second order linear differential equation. In Fig. 1, we show the fish weight in lbs (vertical axis) per year (horizontal axis). The fish weight datum at the ith year will be denoted by \(w_{i}\) for \(1\leq i\leq33\).

Figure 1 Data on fish weights. The horizontal axis represents the years, from 1 to 33; the vertical axis represents the weights in lbs

These data were previously used in [5], where a randomized Bernoulli differential equation was employed, taking the Bertalanffy model [34, p. 331] as a reference.

Let W be the stochastic process that models the fish weight. The random variable \(W(t)\) models the fish weight at year t, \(1\leq t\leq 33\). Since \(W(t)\) is a positive random variable, we will work with \(X(t)=\log(W(t))\) instead. The observed data become \(\log(w_{1}),\ldots ,\log(w_{33})\). We use the random initial value problem

$$ \textstyle\begin{cases} \ddot{X}(t)+A_{0}\dot{X}(t)+(B_{0}+B_{1}t)X(t)=0, \quad t\in \mathbb{R}, \\ X(0)=Y_{0}, \\ \dot{X}(0)=Y_{1} \end{cases} $$
(25)

to model the logarithm of the fish weight growth. The stochastic processes \(A(t)=A_{0}\) and \(B(t)=B_{0}+B_{1} t\) have been chosen on the basis of numerical fitting trials and computational feasibility. Notice that (25) is a generalization of Airy’s random differential equation from Example 4.1.

Using the data drawn in Fig. 1, we would like to find the best random variables \(A_{0}\), \(B_{0}\), \(B_{1}\), \(Y_{0}\) and \(Y_{1}\) so that \(W(t)\) appropriately captures the uncertainty associated with the fish weight growth. Since we do not have an explicit solution process X, we will use truncation (14), \(X_{N}(t)\), to approximate it in the \(\mathrm {L}^{2}(\Omega)\) sense. Using the truncation with a large N, we will be able to assign a suitable distribution to \((A_{0},B_{0},B_{1},Y_{0},Y_{1})\).

There are two statistical approaches to dealing with this problem: the frequentist and the Bayesian techniques. Reference [25] provides an introduction to Bayesian statistics. We do not carry out a Bayesian approach, because the symbolic expression of \(X_{N}(t)\) is very large, which makes Bayesian estimation computationally impracticable. Instead, we use the ideas of the so-called inverse frequentist technique for parameter estimation exhibited in [5] and [35, Chap. 7]. So as not to interrupt the exposition, we explain our concrete frequentist approach in Remark 4.6 at the end of this example.

Without entering into the theoretical details that will be explained in Remark 4.6, we specify the steps to solve our modeling problem. Computational feasibility leads us to choose \(N=24\) as the truncation order. We give the random vector \((A_{0},B_{0},B_{1},Y_{0},Y_{1})\) a five-dimensional multivariate normal distribution (in the end, it will be truncated so that the hypotheses of Theorem 3.3 are fulfilled). The mean vector μ is determined as the solution of the deterministic minimization problem

$$\min_{a_{0}, b_{0}, b_{1}, y_{0}, y_{1}\in\mathbb{R}} \sum_{i=1}^{33} \bigl(\log(w_{i})-X_{24}(t_{i}|a_{0},b_{0},b_{1},y_{0},y_{1}) \bigr)^{2}, $$

where \(X_{24}(t|a_{0},b_{0},b_{1},y_{0},y_{1})\) corresponds to the value of \(X_{24}(t)\) substituting \(A_{0}\), \(B_{0}\), \(B_{1}\), \(Y_{0}\) and \(Y_{1}\) by the real numbers \(a_{0}\), \(b_{0}\), \(b_{1}\), \(y_{0}\) and \(y_{1}\). This minimization problem can be solved with the built-in function FindFit with the option Method -> NMinimize in Mathematica®. We obtain

$$\mu= \begin{pmatrix} 0.169695 \\ -0.0123653 \\ 0.000347771 \\ -2.09309 \\ 0.672599 \end{pmatrix} . $$

The covariance matrix Σ is estimated with \(\sigma^{2} (J^{T}J)^{-1}\), where

$$\sigma^{2}=\frac{\sum_{i=1}^{33} (\log(w_{i})-X_{24}(t_{i}|\mu ) )^{2}}{33-5}=0.00369358, $$

and J is the \(33\times5\) Jacobian matrix of \((X_{24}(t_{1}),\ldots,X_{24}(t_{33}))\) with respect to \((A_{0},B_{0},B_{1},Y_{0},Y_{1})\), evaluated at μ. We obtain

$${{ \Sigma= \begin{pmatrix} 0.000109461 & -0.000010458 & 1.44986\cdot10^{-7} & -0.000645456 & 0.000435313 \\ -0.000010458 & 2.47732\cdot10^{-6} & -7.94312\cdot10^{-8} & 0.0000461867 & -0.0000398354 \\ 1.44986\cdot 10^{-7} & -7.94312\cdot10^{-8} & 3.23082\cdot10^{-9} & -3.16049\cdot 10^{-7} & 5.53045\cdot10^{-7} \\ -0.000645456 & 0.0000461867 & -3.16049\cdot10^{-7} & 0.00624016 & -0.00312264 \\ 0.000435313 & -0.0000398354 & 5.53045\cdot10^{-7} & -0.00312264 & 0.00186506 \end{pmatrix} .}} $$
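For readers who prefer to reproduce this fit outside Mathematica, the following Python sketch outlines one possible implementation: it builds \(X_{24}(t|a_{0},b_{0},b_{1},y_{0},y_{1})\) from the recursion obtained by formally substituting the power series into (25), namely \((n+2)(n+1)X_{n+2}+a_{0}(n+1)X_{n+1}+b_{0}X_{n}+b_{1}X_{n-1}=0\), and then applies nonlinear least squares. The array log_w (the observed \(\log(w_{i})\)) and the initial guess are placeholders; the FindFit computation reported above is the one actually used in this example.

```python
import numpy as np
from scipy.optimize import least_squares

t_obs = np.arange(1, 34)      # years 1, ..., 33
log_w = np.zeros(33)          # placeholder for the observed log(w_i)

def x24(t, a0, b0, b1, y0, y1, N=24):
    """Truncated series X_N(t) of (25) for deterministic parameter values."""
    c = np.zeros(N + 1)
    c[0], c[1] = y0, y1
    for n in range(N - 1):
        prev = c[n - 1] if n >= 1 else 0.0   # the b1 term only enters for n >= 1
        c[n + 2] = -(a0 * (n + 1) * c[n + 1] + b0 * c[n] + b1 * prev) / ((n + 2) * (n + 1))
    return np.polynomial.polynomial.polyval(t, c)

def residuals(p):
    return log_w - x24(t_obs, *p)

fit = least_squares(residuals, x0=[0.1, 0.0, 0.0, -2.0, 0.5])  # arbitrary initial guess
mu_hat = fit.x
sigma2_hat = np.sum(residuals(mu_hat) ** 2) / (33 - 5)
J = fit.jac                                    # 33 x 5 Jacobian at the minimizer
Sigma_hat = sigma2_hat * np.linalg.inv(J.T @ J)
```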

Once we know the estimated distribution of the random vector \((A_{0},B_{0},B_{1},Y_{0},Y_{1})\), we know, at least theoretically, the distribution of \(X_{24}(t)\). However, the symbolic expression of \(X_{24}(t)\) is so complex that computing its exact expectation, or even good approximations of it, is practically infeasible. Due to the complexity of both the truncation expression and the distribution of \((A_{0},B_{0},B_{1},Y_{0},Y_{1})\), it is better to perform Monte Carlo simulation directly on (25) via simulations of \((A_{0},B_{0},B_{1},Y_{0},Y_{1})\), which follows a (truncated) multivariate Gaussian distribution with parameters \((\mu,\Sigma)\).

By means of Monte Carlo simulation with \(100{,}000\) iterations, we obtain samples of \(X(i)\), \(i=1,\ldots,33\). Applying the exponential function, we obtain samples of \(W(i)\), \(i=1,\ldots,33\). Hence, approximations of both \(\mathbb{E}[W(i)]\) and \(\mathbb{V}[W(i)]\) can be calculated. A confidence interval can be computed in two ways: either considering \([\mathbb{E}[W(t)]\pm2\sqrt{\mathbb{V}[W(t)]}]\) (this is based on how confidence intervals are constructed in a Gaussian setting) or obtaining a more accurate approximation from the quantiles of the sample produced by Monte Carlo simulation. In Figs. 2 and 3, the results are shown. As observed in both plots, the mean approximates the real data well. However, the confidence interval widens as we move away from 0. Intuitively, this may happen because a truncated random power series centered at \(t_{0}\) works better near \(t_{0}\). This issue may be mitigated by increasing the truncation order N (as far as the computational resources permit).
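The following Python sketch illustrates this Monte Carlo step under stated assumptions: it samples \((A_{0},B_{0},B_{1},Y_{0},Y_{1})\) from the multivariate Gaussian with the estimated μ and Σ reported above (the truncation required by Theorem 3.3 is omitted for brevity), integrates (25) numerically for each sample instead of evaluating the truncated series, and computes the mean and a quantile-based 95% interval of \(W(t)=\exp(X(t))\). A reduced number of simulations is used to keep the sketch fast.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Estimates reported above (mean vector mu and covariance matrix Sigma)
mu = np.array([0.169695, -0.0123653, 0.000347771, -2.09309, 0.672599])
Sigma = np.array([
    [ 1.09461e-4, -1.04580e-5,  1.44986e-7, -6.45456e-4,  4.35313e-4],
    [-1.04580e-5,  2.47732e-6, -7.94312e-8,  4.61867e-5, -3.98354e-5],
    [ 1.44986e-7, -7.94312e-8,  3.23082e-9, -3.16049e-7,  5.53045e-7],
    [-6.45456e-4,  4.61867e-5, -3.16049e-7,  6.24016e-3, -3.12264e-3],
    [ 4.35313e-4, -3.98354e-5,  5.53045e-7, -3.12264e-3,  1.86506e-3],
])

rng = np.random.default_rng(1)
t_obs = np.arange(1, 34)

def simulate_W(n_sims=2_000):
    params = rng.multivariate_normal(mu, Sigma, size=n_sims)
    paths = np.empty((n_sims, t_obs.size))
    for k, (a0, b0, b1, y0, y1) in enumerate(params):
        # X'' + a0 X' + (b0 + b1 t) X = 0 written as a first order system
        rhs = lambda t, u: [u[1], -a0 * u[1] - (b0 + b1 * t) * u[0]]
        sol = solve_ivp(rhs, (0.0, 33.0), [y0, y1], t_eval=t_obs, rtol=1e-8)
        paths[k] = np.exp(sol.y[0])            # W(t) = exp(X(t))
    return paths

W = simulate_W()
mean_W = W.mean(axis=0)
ci_low, ci_high = np.percentile(W, [2.5, 97.5], axis=0)
```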

Figure 2 Fit of the fish weight data. The blue points represent the real weights, the red points represent the estimated weights (the mean) and the green lines cover a 95% confidence interval constructed with the Gaussian rule \([\text{mean}\pm2\cdot\text{standard deviation}]\)

Figure 3 Fit of the fish weight data. The blue points represent the real weights, the red points represent the estimated weights (the mean) and the green lines cover a 95% confidence interval constructed by taking the quantiles in the Monte Carlo sampling

Remark 4.6

(Frequentist technique used in Example 4.5)

Let X be a random vector of size n to be modeled (in our case, \((\log(W(1)),\ldots,\log(W(33)))\)). We set a model of the form \(X=f(V)\), where V is a random vector with p components (in our case \((A_{0},B_{0},B_{1},Y_{0},Y_{1})\)) that follows a multinormal distribution with parameters \((\mu,\Sigma)\), and \(f:\mathbb{R}^{p}\rightarrow\mathbb{R}^{n}\) is a possibly non-linear function (in our case, \(n=33\) and \(f_{i}(a_{0},b_{0},b_{1},y_{0},y_{1})=X_{24}(i|a_{0},b_{0},b_{1},y_{0},y_{1})\), \(i=1,\ldots,33\), where \(X_{24}(t|a_{0},b_{0},b_{1},y_{0},y_{1})\) is the truncation that approximates the solution of the random differential equation). Let \(x=X(\omega)\) be a vector realization of X (in our example, the real data \(x=(\log(w_{1}),\ldots,\log(w_{33}))\)). From x, we want to estimate the best μ and Σ so that the model \(X=f(V)\) can be considered correct. Let \(\hat{v}\in\mathbb{R}^{p}\) be the minimizer of

$$\min_{v\in\mathbb{R}^{p}} \sum_{i=1}^{n} \bigl(x_{i}-f_{i}(v) \bigr)^{2}. $$

Using Taylor’s expansion,

$$X\approx f(\hat{v})+Jf(\hat{v}) (V-\hat{v}), $$

where J stands for the Jacobian. Then

$$Z:=X-f(\hat{v})+Jf(\hat{v})\hat{v}\approx\underbrace{Jf(\hat{v})}_{J}V. $$

We derive that \(Z\approx J\mu+E\), where E follows a multivariate normal distribution with parameters \((0,J\Sigma J^{T})\) (here T denotes the matrix transpose), i.e., \(E \sim\text{MN}(0,J\Sigma J^{T})\). Write \(J\Sigma J^{T}=P^{T}DP\), where P and D are an orthogonal and a diagonal matrix, respectively. Multiplying by P, we have \(\bar{Z}\approx\bar{J}\mu+\bar{E}\), where \(\bar{Z}=PZ\), \(\bar{J}=PJ\) and \(\bar{E}=PE\sim\text{MN}(0,D)\). Therefore, \(\bar{Z}\approx\bar{J}\mu+\bar{E}\) is a classical linear model (see [32, Chap. 7], [35, Chap. 7]) with normal and independent errors. In a linear model, one should assume homoscedasticity so that the estimation of the parameters makes sense. Thus, we impose \(D=\sigma^{2} I_{n}\), where \(\sigma^{2}\) is the variance of the errors in the linear model. As a consequence, \(\bar{Z}\approx\bar{J}\mu+\bar{E}\) is a classical linear model with homoscedasticity, and the estimators follow from the general theory: μ̂ is the minimizer of

$$\min_{\mu\in\mathbb{R}^{p}} \Vert \bar{z}-\bar{J}\mu \Vert _{2}^{2}=\min_{\mu \in\mathbb{R}^{p}} \Vert Pz-PJ\mu \Vert _{2}^{2}=\min_{\mu\in\mathbb{R}^{p}} \Vert z-J\mu \Vert _{2}^{2}, $$

where \(z=x-f(\hat{v})+J\hat{v}\) is the vector realization of Z, \(\bar{z}=Pz\) is the vector realization of \(\bar{Z}\), and \(\|\cdot\|_{2}\) is the Euclidean norm. Now,

$$\Vert z-J\mu \Vert _{2}^{2}= \bigl\Vert x- \bigl(f( \hat{v})+J(\mu-\hat{v}) \bigr) \bigr\Vert _{2}^{2}\approx \bigl\Vert x-f(\mu) \bigr\Vert _{2}^{2}, $$

so we can take \(\hat{\mu}=\hat{v}\), which justifies our choice for the mean in the example. On the other hand, by the general theory of linear models,

$$\hat{\sigma}^{2}=\frac{ \Vert \bar{z}-\bar{J}\hat{\mu} \Vert _{2}^{2}}{n-p}=\frac{ \Vert Pz-PJ\hat{\mu} \Vert _{2}^{2}}{n-p}= \frac{ \Vert z-J\hat{\mu } \Vert _{2}^{2}}{n-p}\approx\frac{ \Vert x-f(\hat{\mu}) \Vert _{2}^{2}}{n-p}. $$

Finally, from \(J\Sigma J^{T}=P^{T}DP=\sigma^{2}P^{T}P=\sigma^{2} I_{n}\), we derive \(\Sigma=\sigma^{2}(J^{T} J)^{-1}\) by multiplying both sides of the equality by \((J^{T} J)^{-1}J^{T}\) on the left and by \(J(J^{T} J)^{-1}\) on the right (we assume \(\mathrm{rank}(J)=p\) so that \((J^{T} J)^{-1}\) exists). Thus, we choose the estimator \(\hat{\Sigma}=\hat {\sigma}^{2}(J^{T} J)^{-1}\).

5 Conclusions

In this paper we have determined analytic stochastic processes that are solutions, in the mean square sense, to the random non-autonomous second order linear differential equation, taking advantage of the powerful theory of random difference equations. After reviewing the \(\mathrm {L}^{p}(\Omega)\) random calculus and results concerning random power series (differentiation of a random power series in the \(\mathrm {L}^{p}(\Omega)\) sense and Mertens’ theorem for random series in the mean square sense), we stated the main theorem of the paper, Theorem 3.3. This theorem gives assumptions on the coefficient stochastic processes and on the random initial conditions of a random non-autonomous second order linear differential equation under which there exists an analytic stochastic process that solves the problem in the mean square sense. This mean square approach permitted approximating the main statistical information of the solution stochastic process, namely the expectation and the variance. These approximations were compared with other methods previously used in the literature: the dishonest method and Monte Carlo simulation.

The numerical examples presented illustrate the potential of our results. They show that our findings allow for much more complex random non-autonomous second order linear differential equations than those from the existing literature. The ideas of this paper permit dealing with random non-autonomous second order linear differential equations in a general form, and the statistical information of the solution stochastic process can be computed up to any degree of accuracy. These achievements have been reached thanks to the powerful theory of difference equations.

Moreover, our truncation method provides a methodology to estimate the parameters of the multivariate normally distributed explanatory random vector when modeling real data via random non-autonomous second order linear differential equations. This procedure, together with Monte Carlo simulation, yields approximations that fit the real data well.

We mention that a future line of research could consist of extending our ideas to random non-autonomous linear differential equations of order higher than two, using appropriate results from the theory of random difference equations. The extension would require a more complex solution process than that given in Theorem 3.3, which, in turn, would increase the computational expense when accurately approximating its main statistical features.