1 Introduction

The computational approximation of stochastic partial differential equations (SPDEs) is often an expensive and demanding task. One usually has to combine numerical schemes for the temporal discretization of the interval \([0,T]\) with Galerkin finite element methods for the spatial discretization, as well as truncation methods for the infinite dimensional noise. Combining such schemes generates one sample path of the numerical solution. If we are interested in approximating expected values of functionals of the solution, we have to repeat this procedure many times in order to compute an accurate Monte Carlo approximation.

For instance, let \(X :[0,T] \times \Omega \rightarrow H\) be a Hilbert-space valued stochastic process, which denotes the solution to the given SPDE. Then our computational goal may be a good approximation of the real number

$$\begin{aligned} {\mathbb {E}}[\varphi (X(T))], \end{aligned}$$
(1)

where \(\varphi :H \rightarrow {\mathbb {R}}\) is a sufficiently smooth mapping.

Before the advent of the multilevel Monte Carlo (MLMC) algorithm [10, 16], it was common to focus purely on weakly convergent schemes for problem (1). These schemes guarantee a good approximation of the distribution of \(X\) and are then combined with a standard Monte Carlo estimator to compute an approximation of (1).

In [10] Giles pointed out that the computational complexity of problem (1) can be reduced drastically by the MLMC algorithm, which shifts most of the costly work of the Monte Carlo estimator to coarser time grids, while only relatively few samples need to be simulated at the smallest and hence most costly temporal step size. For this idea to work, however, one also needs to take the order of strong convergence into account: in addition to approximating the distribution, a strongly convergent scheme generates good pathwise approximations of the solution \(X\). For more details on strong and weak convergence we refer to [18].
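The two ingredients of MLMC can be illustrated by a minimal sketch: the telescoping sum \({\mathbb {E}}[P_L] = {\mathbb {E}}[P_0] + \sum _{\ell =1}^{L} {\mathbb {E}}[P_\ell - P_{\ell -1}]\) over discretization levels, and the coupling of fine and coarse paths by shared Brownian increments, whose variance is controlled by the order of strong convergence. The following Python sketch uses a hypothetical scalar test SODE (geometric Brownian motion) and the Euler–Maruyama method, not the schemes of this paper; all parameter choices are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def level_difference(level, n_samples, T=1.0, x0=1.0, mu=0.05, sigma=0.2):
    """Samples of P_l - P_{l-1} for the payoff P = X(T) of the hypothetical
    test SODE dX = mu*X dt + sigma*X dW, discretized by Euler-Maruyama with
    2**level steps on the fine grid.  The same Brownian increments drive the
    fine and the coarse path -- this coupling is what keeps the variance of
    the correction terms small."""
    n_fine = 2 ** level
    k = T / n_fine
    xf = np.full(n_samples, x0)          # fine path, step size k
    xc = np.full(n_samples, x0)          # coarse path, step size 2k
    for _ in range(n_fine // 2):
        dw1 = rng.normal(0.0, np.sqrt(k), n_samples)
        dw2 = rng.normal(0.0, np.sqrt(k), n_samples)
        xf += mu * xf * k + sigma * xf * dw1
        xf += mu * xf * k + sigma * xf * dw2
        xc += mu * xc * 2 * k + sigma * xc * (dw1 + dw2)
    return xf - xc

def mlmc_estimate(max_level, n0=200_000, T=1.0, x0=1.0, mu=0.05, sigma=0.2):
    """Telescoping sum E[P_L] = E[P_0] + sum_{l=1}^{L} E[P_l - P_{l-1}]."""
    # level 0: a single Euler step, estimated by plain Monte Carlo
    dw = rng.normal(0.0, np.sqrt(T), n0)
    est = np.mean(x0 + mu * x0 * T + sigma * x0 * dw)
    # correction levels: ever fewer samples on the ever finer grids
    for level in range(1, max_level + 1):
        est += np.mean(level_difference(level, n0 // 2 ** level))
    return est
```

Most of the samples are drawn on the cheap coarse levels, while the few samples on the finest grid carry only a small-variance correction, which is the source of the complexity reduction.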

Additionally, Giles showed in [11] that the use of higher order strongly convergent schemes, such as the Milstein method [26], further reduces the computational complexity, even though the order of weak convergence remains unchanged. While [10, 11, 16] are concerned purely with the finite dimensional SODE problem, similar results also hold for solutions to SPDEs [2, 4].

This observation has spurred the study of an infinite dimensional analogue of the Milstein scheme, and first results were achieved for a temporal semidiscretization of linear SPDEs in [23, 24]. Subsequently, the Milstein scheme was combined with Galerkin finite element methods and extended to more general types of driving noise in [1, 3], and it was applied to semilinear SPDEs in [14], though only with spectral Galerkin methods.

In this paper we apply the more general Milstein–Galerkin finite element methods to the class of semilinear SPDEs studied in [14]. Under mildly relaxed assumptions on the nonlinearities we obtain slightly sharper estimates of the strong convergence error. For this we embed the scheme into a more abstract framework and analyze the strong error in terms of bistability and consistency, a notion which originated in [31] and was first applied to SODEs in [5, 19]. A key role is played by the choice of the so-called Spijker norm (24) (see also [13, p. 438] and [29, 30]), which is used to measure the local truncation error and results in two-sided estimates of the error, as shown in Theorem 1.1 below.

In the definition of Milstein-type schemes there appear iterated stochastic integrals, the so-called Lévy areas. Unless the SPDE enjoys special properties such as commutative noise, as, for example, in [14], one has to rely on additional approximation methods in order to simulate these stochastic integrals. Such methods are typically expensive and limit the practical value of Milstein-type schemes; see, for instance, [17, Ch. 10.3].

However, in the recent publication [12], Giles and Szpruch propose an antithetic version of the MLMC algorithm for finite dimensional SODEs. With this modification they still benefit from the higher order of convergence of the Milstein scheme, but without the need to simulate the Lévy areas. In a forthcoming publication we will show that the results of [12] carry over to SPDEs by applying a slightly adapted version of the abstract framework which we develop in this paper.

In order to give a more detailed outline of the paper we first fix some notation. Let \([0,T]\) be a finite time interval and \((H,( \cdot , \cdot )_H, \Vert \cdot \Vert _H )\) and \((U, (\cdot , \cdot )_U, \Vert \cdot \Vert _U)\) be two separable real Hilbert spaces. We denote by \((\Omega , {\mathcal {F}}, {\mathbf {P}})\) a probability space endowed with a normal filtration \(({\mathcal {F}}_t)_{t \in [0,T]} \subset {\mathcal {F}}\) satisfying the usual conditions. Then, let \((W(t))_{t\in [0,T]}\) be a cylindrical \(Q\)-Wiener process in \(U\) with respect to \(( \mathcal {F}_t )_{ t \in [0,T] }\). Here, the given covariance operator \(Q :U \rightarrow U\) is assumed to be bounded, symmetric and positive semidefinite, but not necessarily of finite trace. For the definition of cylindrical \(Q\)-Wiener processes in \(U\) we refer to [28, Ch. 2.5].

Next, we introduce the semilinear SPDE, whose solution we want to approximate. Let \(X :[0,T] \times \Omega \rightarrow H\) denote the mild solution [8, Ch. 7] to the semilinear SPDE

$$\begin{aligned} \mathrm {d}X(t) + \big [ AX(t) + f(X(t)) \big ] \,\mathrm {d}t&= g(X(t)) \,\mathrm {d}W(t), \text { for } 0 \le t \le T,\nonumber \\ X(0)&= X_0. \end{aligned}$$
(2)

Here, \(-A :{\mathrm {dom}}(A) \subset H \rightarrow H\) is the generator of an analytic semigroup \((S(t))_{t \ge 0}\) on \(H\) and \(f\) and \(g\) denote nonlinear mappings which are Lipschitz continuous and smooth in an appropriate sense. In Sect. 2.1 we give a precise formulation of our conditions on \(A\), \(f\), \(g\) and \(X_0\), which are also sufficient for the existence and uniqueness of mild solutions \(X\) (see also Sect. 2.2).

By definition [8, Ch. 7] the mild solution satisfies

$$\begin{aligned} X(t) = S(t) X_0 - \int _0^t S(t-\sigma ) f(X(\sigma )) \,\mathrm {d}\sigma + \int _0^t S(t-\sigma ) g(X(\sigma )) \,\mathrm {d}W(\sigma ) \end{aligned}$$
(3)

\({\mathbf {P}}\text {-a.s.}\) for all \(0 \le t \le T\).

As our main example we have the following situation in mind: \(H\) is the space \(L_2 ({\mathcal {D}};{\mathbb {R}})\) of square integrable functions, where \(\mathcal {D} \subset {\mathbb {R}}^d\) is a bounded domain with smooth boundary \(\partial {\mathcal {D}}\) or a convex domain with polygonal boundary. Then, for example, let \(-A\) be the Laplacian with homogeneous Dirichlet boundary conditions. Much more extensive lists of examples are given in [14, 15] and [21, Ch. 2.3].

In order to introduce the Milstein–Galerkin finite element scheme, we denote by \(k \in (0,T]\) a given equidistant time step size with grid points \(t_n = nk\), \(n = 0,1,\ldots ,N_k\), where \(N_k \in {\mathbb {N}}\) is defined by \(N_k k \le T < (N_k + 1)k\). By the parameter \(h \in (0,1]\) we control the spatial discretization. Then the Milstein scheme for the spatio-temporal discretization of the SPDE (2) is given by the recursion

$$\begin{aligned} X_{k,h}(t_0)&= P_h X_0,\nonumber \\ X_{k,h}(t_n)&= X_{k,h}(t_{n-1}) - k \big [ A_h X_{k,h}(t_n) + P_h f(X_{k,h}(t_{n-1})) \big ]\nonumber \\&\quad +\,P_h g(X_{k,h}(t_{n-1})) \Delta _k W(t_{n}) \nonumber \\&\quad +\int _{t_{n-1}}^{t_n} P_h g'(X_{k,h}(t_{n-1}))\Big [ \int _{t_{n-1}}^{\sigma _1} g(X_{k,h}(t_{n-1})) \,\mathrm {d}W(\sigma _2) \Big ] \,\mathrm {d}W(\sigma _1) \end{aligned}$$
(4)

for \(n \in \{1,\ldots ,N_k\}\), where \(\Delta _k W(t_n) := W(t_n) - W(t_{n-1})\). Here, \(P_h\), \(h \in (0,1]\), denotes the orthogonal projector onto the Galerkin finite element space \(V_h \subset H\) and \(A_h\) is a discrete version of the generator \(A\). Together with some useful error estimates, the operators of the spatial approximation are explained in more detail in Sect. 2.4.
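The structure of one step of (4) can be sketched in a drastically simplified, hypothetical setting: a spectral Galerkin space spanned by eigenfunctions of \(A\) (so that \(A_h\) is diagonal), vanishing drift \(f = 0\), and a single scalar Brownian motion with \(g(x)\,\mathrm {d}W = x \,\mathrm {d}W\). For one-dimensional noise the iterated integral collapses to \(\tfrac{1}{2}\big ( (\Delta _k W)^2 - k \big )\), so no Lévy areas are needed. All concrete choices below (eigenvalues, dimensions, initial data) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Spectral Galerkin coordinates: A_h = diag(lambda_j) with the Dirichlet-
# Laplacian eigenvalues lambda_j = (j*pi)^2 on (0,1) -- an assumed toy setup.
m, T, N = 20, 1.0, 200
k = T / N
lam = (np.pi * np.arange(1, m + 1)) ** 2

def milstein_step(x, dw):
    """One step of (4) with f = 0 and a single scalar Brownian motion,
    g(x) dW = x dW.  For one-dimensional noise the iterated integral
    reduces to 0.5 * ((Delta W)^2 - k), and g'(x)g(x) = x, so the Milstein
    correction is 0.5 * x * (dw**2 - k).  The linear part is implicit."""
    rhs = x + x * dw + 0.5 * x * (dw ** 2 - k)
    return rhs / (1.0 + k * lam)   # solves (Id + k A_h) x_new = rhs

x = 1.0 / np.arange(1, m + 1) ** 2   # smooth initial data in spectral coords
for _ in range(N):
    x = milstein_step(x, rng.normal(0.0, np.sqrt(k)))
```

Because \(A_h\) is diagonal in this sketch, the implicit linear solve is an elementwise division; with a genuine finite element space it becomes a sparse linear system per step.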

In Sect. 3 we introduce a class of abstract numerical one-step schemes in Hilbert spaces and we develop our stability and consistency analysis within this framework. We end up with a set of sufficient conditions for the so-called bistability (see Definition 3.1) and a decomposition of the local truncation error.

In Sects. 4 and 5 we verify that the scheme (4) is indeed bistable and consistent (see Theorems 4.1 and 5.1). These two properties together yield our main result (compare with Theorem 3.4):

Theorem 1.1

Suppose the spatial discretization fulfills Assumptions 2.7 and 2.9, and that the initial condition \(X_0\) and the coefficients \(f\) and \(g\) of the SPDE (2) satisfy Assumptions 2.2–2.4 with \(p \in [2,\infty )\) and \(r \in [0,1)\). Then there exists a constant \(C\) such that

$$\begin{aligned} \frac{1}{C} \big \Vert \mathcal {R}_k [ X|_{\mathcal {T}_k}] \big \Vert _{-1,p} \le \max _{0 \le n \le N_k} \big \Vert X_{k,h}(t_n) - X(t_n) \big \Vert _{L_p(\Omega ;H)} \le C \big \Vert \mathcal {R}_k [ X|_{\mathcal {T}_k}] \big \Vert _{-1,p}, \end{aligned}$$
(5)

where \(\mathcal {R}_k\) is the residual operator associated to the scheme (4). In particular, from the estimate of the local truncation error it follows that

$$\begin{aligned} \max _{0 \le n \le N_k} \big \Vert X_{k,h}(t_n) - X(t_n) \big \Vert _{L_p(\Omega ;H)} \le C \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ) \end{aligned}$$
(6)

for all \(h \in (0,1]\) and \(k \in (0,T]\), where \(X_{k,h}\) denotes the grid function generated by the scheme (4) and \(X\) is the mild solution to (2).

At this point, for a better understanding of Theorem 1.1, let us explain the different objects appearing in its formulation. First, Assumptions 2.7 and 2.9 are concerned with the spatial discretization. In our main example above, they are usually satisfied for the standard piecewise linear finite element method.

Roughly speaking, the other assumptions determine the spatial regularity of the solution, which is measured by the parameter \(r \in [0,1)\) in terms of fractional powers of the operator \(A\). This parameter also controls the order of convergence.

Then, in (5) the residual operator \(\mathcal {R}_k\) of the numerical scheme (4) appears. It characterizes (4) in the sense that \(\mathcal {R}_k[Z] = 0\) if and only if the \(H\)-valued grid function \(Z\) coincides with \(X_{k,h}\). We therefore use the residual operator to quantify how much the exact solution \(X\) (restricted to the time grid) differs from the numerical solution \(X_{k,h}\). This residual is called the local truncation error and, measured in the stochastic Spijker norm (24), it can be used to estimate the strong error from above and from below (compare further with Sect. 3).

In [19] two-sided error estimates of the form (5) have been used to prove the maximal order of convergence of all Itô–Taylor schemes. This question, however, is not discussed in the present paper, and whether a similar result can be derived for the Milstein–Galerkin finite element scheme is a subject of future research.

Further, we note that the order of convergence in (6) is slightly sharper than in [14], where the corresponding result contains a small order reduction of the form \(1 + r - \epsilon \) for arbitrary \(\epsilon > 0\). As in [20] this order reduction is avoided by the application of Lemma 2.10, which contains sharp integral versions of estimates for the Galerkin finite element error operator.

In practice, the scheme (4) can seldom be implemented directly on a computer, since the space \(U\), and thus also the noise \(W\), is typically high- or even infinite-dimensional. In our final Sect. 6 we discuss the stability and consistency of a variant of (4), which incorporates a spectral approximation of the Wiener process. This approach has already been studied by several authors in the context of Milstein schemes for SPDEs, for instance in [1, 3, 14]. With Theorem 6.5 we obtain an extended version of Theorem 1.1, which also takes the noise approximation into account.

2 Preliminaries

2.1 Main assumptions

In this subsection we give a precise formulation of our assumptions on the SPDE (2). The first one is concerned with the linear operator.

Assumption 2.1

The linear operator \(A :{\mathrm {dom}}(A) \subset H \rightarrow H\) is densely defined, self-adjoint and positive definite with compact inverse.

As in [27, Ch. 2.5] it follows from Assumption 2.1 that the operator \(-A\) is the generator of an analytic semigroup \((S(t))_{t \ge 0}\) on \(H\). There also exist an increasing, real-valued sequence \((\lambda _i)_{i \in {\mathbb {N}}}\) with \(\lambda _i > 0\), \(i \in {\mathbb {N}}\), and \(\lim _{i \rightarrow \infty } \lambda _i = \infty \), and an orthonormal basis \((e_i)_{i \in {\mathbb {N}}}\) of \(H\) such that \(A e_i = \lambda _i e_i\) for every \(i \in {\mathbb {N}}\).

Further, we recall the definition of fractional powers of \(A\) from [21, Ch. B.2]. For any \(r \ge 0\) the operator \(A^{\frac{r}{2}} :{\mathrm {dom}}( A^{\frac{r}{2}} ) \subset H \rightarrow H\) is defined by

$$\begin{aligned} A^{\frac{r}{2}} x := \sum _{j = 1}^\infty \lambda _j^{\frac{r}{2}} (x, e_j) e_j \end{aligned}$$

for all

$$\begin{aligned} x \in {\mathrm {dom}}(A^{\frac{r}{2}}) = \Big \{ x \in H \, :\, \sum _{j = 1}^\infty \lambda _j^r (x, e_j)^2 < \infty \Big \}. \end{aligned}$$

Endowed with the inner product \((\cdot , \cdot )_r := (A^{\frac{r}{2}} \cdot , A^{\frac{r}{2}} \cdot )\) and norm \(\Vert \cdot \Vert _{r} := \Vert A^{\frac{r}{2}} \cdot \Vert \) the spaces \(\dot{H}^{r}:= {\mathrm {dom}}(A^{\frac{r}{2}})\) become separable Hilbert spaces.

In addition, we define the spaces \(\dot{H}^{-r}\) with negative exponents as the dual spaces of \(\dot{H}^{r}\), \(r > 0\). In this case it follows from [21, Th. B.8] that the elements of \(\dot{H}^{-r}\) can be characterized by

$$\begin{aligned} \dot{H}^{-r} = \Big \{ x = \sum _{j =1}^{\infty } x_j e_j \, :\, (x_j)_{j \in {\mathbb {N}}} \subset {\mathbb {R}},\; \text { with } \sum _{j = 1}^\infty \lambda _j^{-r} x_j^2 < \infty \Big \}, \end{aligned}$$

where the equality is understood in the sense of an isometric isomorphism, and the norm in \(\dot{H}^{-r}\) can be computed by \(\Vert x \Vert _{-r} = \Vert A^{-\frac{r}{2}} x \Vert \). Here, we set

$$\begin{aligned} A^{-\frac{r}{2}} x = \sum _{j = 1}^{\infty } \lambda _j^{-\frac{r}{2}} x_j e_j, \quad \text { for all } x = \sum _{j = 1}^\infty x_j e_j \in \dot{H}^{-r}. \end{aligned}$$
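For the main example below (the Dirichlet Laplacian on a bounded domain), these fractional norms can be evaluated directly from the eigenvalue expansion. The following sketch assumes the 1D Dirichlet Laplacian on \((0,1)\) with \(\lambda _j = (j\pi )^2\) (a concrete choice made only for illustration) and checks on truncated sums that a vector with coefficients \(x_j = 1/j\) lies in \(\dot{H}^r\) exactly for \(r < 1/2\), since \(\sum _j \lambda _j^r x_j^2 = \pi ^{2r} \sum _j j^{2r-2}\).

```python
import numpy as np

# Eigenvalues of the 1D Dirichlet Laplacian on (0,1) -- the model operator
# of the main example; this concrete choice is an assumption of the sketch.
J = 10_000
lam = (np.pi * np.arange(1, J + 1)) ** 2

def frac_norm(coeffs, r):
    """Truncated evaluation of ||x||_r = ||A^{r/2} x|| for x = sum_j coeffs_j e_j."""
    return np.sqrt(np.sum(lam ** r * coeffs ** 2))

# Coefficients x_j = 1/j: sum_j lambda_j^r x_j^2 = pi^{2r} sum_j j^{2r-2}
# is finite iff 2r - 2 < -1, i.e. r < 1/2; the truncated sums below
# stabilize for r = 0.4 but keep growing with J for r = 0.6.
xj = 1.0 / np.arange(1, J + 1)
norm_H = frac_norm(xj, 0.0)   # the plain H-norm, tends to pi/sqrt(6)
```

The larger \(r\), the stronger the weight on the high modes, which is why larger \(r\) (more spatial regularity) yields higher convergence orders in Theorem 1.1.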

For the formulation of the remaining assumptions let parameter values \(p \in [2,\infty )\) and \(r \in [0,1)\) be given.

Assumption 2.2

The random variable \(X_0 :\Omega \rightarrow \dot{H}^{1+r}\) is \({\mathcal {F}}_0/\mathcal {B}(\dot{H}^{1+r})\)-measurable. In addition, it holds

$$\begin{aligned} {\mathbb {E}}\big [ \Vert X_0 \Vert ^{2p}_{1+r} \big ] < \infty . \end{aligned}$$

The next assumption is concerned with the nonlinear mapping \(f :H \rightarrow \dot{H}^{-1+r}\) in (2).

Assumption 2.3

The mapping \(f :H \rightarrow \dot{H}^{-1+r}\) is continuously Fréchet differentiable. In addition, there exists a constant \(C_f\) such that \(\Vert f(0) \Vert _{-1+r} \le C_f\) and

$$\begin{aligned} \sup _{x \in H} \Vert f'(x) \Vert _{{\mathcal {L}}(H;\dot{H}^{-1+r})} \le C_f, \end{aligned}$$

as well as

$$\begin{aligned} \Vert f(x_1) - f(x_2) \Vert _{-1+r}&\le C_f \Vert x_1 - x_2 \Vert ,\nonumber \\ \Vert f'(x_1) - f'(x_2) \Vert _{{\mathcal {L}}(H,\dot{H}^{-1+r})}&\le C_f \Vert x_1 - x_2 \Vert , \end{aligned}$$
(7)

for all \(x_1, x_2 \in H\).

See [33, Example 5.1] for an example of a Nemytskii operator \(f\) which is an element of \(\mathcal {C}_{\mathrm {b}}^2(H,\dot{H}^{-1})\), that is, \(f\) is twice continuously Fréchet differentiable as a mapping from \(H\) to \(\dot{H}^{-1}\) with bounded derivatives. In particular, this operator satisfies Assumption 2.3 with \(r=0\).

The last assumption deals with the nonlinear mapping \(g\) in the stochastic integral part of (2). As in [8, 28] we denote \(U_0 := Q^{\frac{1}{2}}(U)\), which together with the inner product \((u_0,v_0)_{U_0} := ( Q^{-\frac{1}{2}} u_0, Q^{-\frac{1}{2}} v_0 )_U\) for \(u_0, v_0 \in U_0\) becomes a Hilbert space. Here \(Q^{-\frac{1}{2}}\) denotes the pseudoinverse [28, App. C] of \(Q^{\frac{1}{2}}\).

Then, by \({\mathcal {L}}_2(H_1,H_2) \subset {\mathcal {L}}(H_1,H_2)\) we denote the space of all Hilbert-Schmidt operators \(L :H_1 \rightarrow H_2\) between two separable Hilbert spaces \(H_1\) and \(H_2\). Together with the inner product

$$\begin{aligned} ( L_1, L_2)_{{\mathcal {L}}_2(H_1,H_2)} = \sum _{j = 1}^\infty \big ( L_1 \psi _j, L_2 \psi _j \big )_{H_2}, \end{aligned}$$

where \((\psi _j)_{j \in {\mathbb {N}}}\) is an arbitrary orthonormal basis of \(H_1\), the set \({\mathcal {L}}_2(H_1,H_2)\) becomes a Hilbert space. We recall the abbreviations \({\mathcal {L}}_2^0 := {\mathcal {L}}_2(U_0,H)\) and \({\mathcal {L}}_{2,r}^0 := {\mathcal {L}}_2(U_0, \dot{H}^r)\) from [20] and refer to [28, App. B] for a short review on Hilbert-Schmidt operators.

Assumption 2.4

Let the mapping \(g :H \rightarrow {\mathcal {L}}_2^0\) be continuously Fréchet differentiable. In addition, there exists a constant \(C_g\) such that \(\Vert g(0) \Vert _{{\mathcal {L}}_2^0} \le C_g\) and

$$\begin{aligned} \sup _{x \in H} \Vert g'(x) \Vert _{{\mathcal {L}}(H;{\mathcal {L}}_2^0)} \le C_g, \end{aligned}$$

as well as

$$\begin{aligned} \Vert g(x_1) - g(x_2) \Vert _{{\mathcal {L}}_2^0}&\le C_g \Vert x_1 - x_2\Vert ,\nonumber \\ \Vert g'(x_1) - g'(x_2) \Vert _{{\mathcal {L}}(H,{\mathcal {L}}_2^0)}&\le C_g \Vert x_1 - x_2\Vert , \nonumber \\ \Vert g'(x_1)g(x_1) - g'(x_2)g(x_2) \Vert _{{\mathcal {L}}_2(U_0,{\mathcal {L}}_2^0)}&\le C_g \Vert x_1 - x_2\Vert , \end{aligned}$$
(8)

for all \(x_1, x_2 \in H\).

Further, the mapping \(g :H \rightarrow {\mathcal {L}}_2^0\) satisfies \(g(x) \in {\mathcal {L}}_{2,r}^0\) and

$$\begin{aligned} \Vert g(x) \Vert _{{\mathcal {L}}_{2,r}^0} \le C_g \big (1 + \Vert x \Vert _r \big ) \end{aligned}$$
(9)

for all \(x \in \dot{H}^r\).

Remark 2.5

It is straightforward to generalize most of the results and techniques, which we develop in this paper, to the case when \(f\) and \(g\) are allowed to also depend on \(t\in [0,T]\) and \(\omega \in \Omega \). For example, this has been done for the linearly implicit Euler-Maruyama method in [21].

2.2 Existence, uniqueness and regularity of the mild solution

Under the assumptions of Sect. 2.1, there exists a unique (up to modification) mild solution \(X :[0,T] \times \Omega \rightarrow H\) to (2) of the form (3). A proof can be found, for instance, in [21, Ch. 2.4] (based on the methods of [15, Th. 1]).

Furthermore, for all \(s \in [0,r+1]\), where \(r \in [0,1)\) and \(p \in [2,\infty )\) are given by Assumptions 2.2–2.4, we have

$$\begin{aligned} \sup _{t \in [0,T]} {\mathbb {E}}\big [ \Vert X(t) \Vert ^{2p}_{s} \big ] < \infty \end{aligned}$$
(10)

and there exists a constant \(C\) depending on \(r\), \(s\), and \(p\) such that

$$\begin{aligned} \big ({\mathbb {E}}\big [ \Vert X(t_1) - X(t_2) \Vert _{s}^{2p} \big ] \big )^{\frac{1}{2p}} \le C |t_1 -t_2|^{\min (\frac{1}{2},\frac{r+1 - s}{2})} \end{aligned}$$
(11)

for all \(t_1,t_2 \in [0,T]\). These regularity results have been proved in [15, Th. 1] and [22].

2.3 A Burkholder–Davis–Gundy type inequality

Burkholder–Davis–Gundy-type inequalities are frequently used to estimate higher moments of stochastic integrals. The version in Proposition 2.6 is a special case of [8, Lem. 7.2].

Proposition 2.6

For any \(p \in [2,\infty )\), \(0 \le \tau _1 < \tau _2 \le T\), and for any predictable stochastic process \(\Psi :[0,T] \times \Omega \rightarrow {\mathcal {L}}_2^0\), which satisfies

$$\begin{aligned} \Big ( \int _{\tau _1}^{\tau _2} \big \Vert \Psi (\sigma ) \big \Vert ^2_{L_p(\Omega ;{\mathcal {L}}_2^0)} \,\mathrm {d}\sigma \Big )^{\frac{1}{2}} < \infty , \end{aligned}$$

we have

$$\begin{aligned} \Big \Vert \int _{\tau _1}^{\tau _2} \Psi (\sigma ) \,\mathrm {d}W(\sigma ) \Big \Vert _{L_p(\Omega ;H)}&\le C(p) \Bigg ( {\mathbb {E}}\Big [ \Big ( \int _{\tau _1}^{\tau _2} \big \Vert \Psi (\sigma ) \big \Vert ^2_{{\mathcal {L}}_2^0} \,\mathrm {d}\sigma \Big )^{\frac{p}{2}} \Big ] \Bigg )^{\frac{1}{p}} \\&\le C(p)\, \Bigg ( \int _{\tau _1}^{\tau _2} \big \Vert \Psi (\sigma ) \big \Vert ^2_{L_p(\Omega ;{\mathcal {L}}_2^0)} \,\mathrm {d}\sigma \Bigg )^{\frac{1}{2}}. \end{aligned}$$

Here the constant can be chosen to be

$$\begin{aligned} C(p) = \left( \frac{p}{2} ( p - 1) \right) ^{\frac{1}{2}} \left( \frac{p}{p - 1} \right) ^{(\frac{p}{2} - 1)}. \end{aligned}$$

Proof

Under the given assumptions on \(\Psi \) it follows that

$$\begin{aligned} \Bigg ( {\mathbb {E}}\Big [ \Big ( \int _{\tau _1}^{\tau _2} \big \Vert \Psi (\sigma ) \big \Vert ^2_{{\mathcal {L}}_2^0} \,\mathrm {d}\sigma \Big )^{\frac{p}{2}} \Big ] \Bigg )^{\frac{1}{p}}&= \Big \Vert \int _{\tau _1}^{\tau _2} \big \Vert \Psi (\sigma ) \big \Vert ^2_{{\mathcal {L}}_2^0} \,\mathrm {d}\sigma \Big \Vert _{L_{p/2}(\Omega ;{\mathbb {R}})}^{\frac{1}{2}}\\&\le \Bigg ( \int _{\tau _1}^{\tau _2} \big \Vert \Psi (\sigma ) \big \Vert ^2_{L_p(\Omega ;{\mathcal {L}}_2^0)} \,\mathrm {d}\sigma \Bigg )^{\frac{1}{2}} < \infty . \end{aligned}$$

Therefore, the stochastic integral is well-defined and [8, Lem. 7.2] yields

$$\begin{aligned} \Big \Vert \int _{\tau _1}^{\tau _2} \Psi (\sigma ) \,\mathrm {d}W(\sigma ) \Big \Vert _{L_p(\Omega ;H)}&\le C(p) \Bigg ( {\mathbb {E}}\Big [ \Big ( \int _{\tau _1}^{\tau _2} \big \Vert \Psi (\sigma ) \big \Vert ^2_{{\mathcal {L}}_2^0} \,\mathrm {d}\sigma \Big )^{\frac{p}{2}} \Big ] \Bigg )^{\frac{1}{p}}\\&\le C(p) \Bigg ( \int _{\tau _1}^{\tau _2} \big \Vert \Psi (\sigma ) \big \Vert ^2_{L_p(\Omega ;{\mathcal {L}}_2^0)} \,\mathrm {d}\sigma \Bigg )^{\frac{1}{2}}, \end{aligned}$$

which are the asserted inequalities.
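The explicit constant \(C(p)\) is elementary to evaluate; the following one-liner does so, and also confirms the sanity check that \(C(2) = 1\), recovering the Itô isometry case.

```python
def bdg_constant(p):
    """C(p) = (p(p-1)/2)^{1/2} * (p/(p-1))^{p/2 - 1} from Proposition 2.6."""
    return (p * (p - 1) / 2.0) ** 0.5 * (p / (p - 1.0)) ** (p / 2.0 - 1.0)
```

For example, \(C(2) = 1\) and \(C(4) = \sqrt{6}\cdot \frac{4}{3}\approx 3.27\); the constant grows with \(p\), as expected for higher-moment estimates.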

2.4 Galerkin finite element methods

In this subsection we recall the most important elements of Galerkin finite element methods. For a more detailed review we refer to [20, 21], which in turn are based on [32, Ch. 2,3 and 7].

Our starting point is a sequence \((V_h)_{h \in (0,1]}\) of finite dimensional subspaces of \(\dot{H}^1\). Here, the parameter \(h \in (0,1]\) controls the dimension of \(V_h\), which usually increases as \(h\) decreases. For smaller values of \(h\) we therefore expect to find a better approximation of an element from \(H\) within \(V_h\).

Then, for every \(h \in (0,1]\) the Ritz projector \(R_h :\dot{H}^{1} \rightarrow V_h\) is the orthogonal projector onto \(V_h\) with respect to the inner product \((\cdot ,\cdot )_1\) and given by

$$\begin{aligned} \big ( R_h x, y_h \big )_1 = \big ( x, y_h \big )_1 \quad \text {for all } x \in \dot{H}^{1}, \, y_h \in V_h. \end{aligned}$$

The following assumption ensures that the spaces \((V_h)_{h \in (0,1]}\) contain good approximations of all elements in \(\dot{H}^1\) and \(\dot{H}^2\), respectively. It is formulated in terms of the Ritz projector and closely related to the spatial approximation of the elliptic problem \(Au = f\), as noted in [21, Rem. 3.4]. Compare also with [32, (ii) on p. 31 and (2.25)].

Assumption 2.7

Let a sequence \((V_h)_{h \in (0,1]}\) of finite dimensional subspaces of \(\dot{H}^1\) be given such that there exists a constant \(C\) with

$$\begin{aligned} \big \Vert R_h x - x \big \Vert \le C h^s \Vert x \Vert _{s} \text { for all } x \in \dot{H}^s, \; s\in \{1,2\}, \; h \in (0,1]. \end{aligned}$$
(12)

Another important operator is the linear mapping \(A_h :V_h \rightarrow V_h\), which denotes a discrete version of \(A\). For a given \(x_h \in V_h\) we define \(A_h x_h \in V_h\) by the representation theorem through the relationship

$$\begin{aligned} (x_h, y_h)_{1} = (A_h x_h, y_h) \quad \text {for all } y_h \in V_h. \end{aligned}$$

It directly follows that \(A_h\) is self-adjoint and positive definite on \(V_h\).

Finally, we denote by \(P_h :\dot{H}^{-1} \rightarrow V_h\) the (generalized) orthogonal projector onto \(V_h\) with respect to the inner product in \(H\). As in [7] the projector \(P_h\) is defined by

$$\begin{aligned} (P_h x, y_h) = ( A^{-\frac{1}{2}} x, A^{\frac{1}{2}} y_h ) \quad \text {for all } x \in \dot{H}^{-1}, y_h \in V_h. \end{aligned}$$

After having introduced all operators for the spatial approximation we recall the following discrete negative norm estimate from [25, (3.7)]

$$\begin{aligned} \Vert A_h^{-\frac{1}{2}} P_h x \Vert&= \sup _{z_h \in V_h} \frac{\big |(A_h^{-\frac{1}{2}}P_h x,z_h) \big |}{\Vert z_h \Vert } = \sup _{z_h \in V_h} \frac{\big |( P_h x,A_h^{-\frac{1}{2}} z_h) \big |}{\Vert z_h \Vert } \nonumber \\&= \sup _{z_h' \in V_h} \frac{\big |\langle x,z_h' \rangle \big |}{\Vert A_h^{\frac{1}{2}} z_h' \Vert } \le \sup _{z_h' \in V_h} \frac{\Vert x \Vert _{-1} \Vert z_h' \Vert _{1}}{ \Vert A_h^{\frac{1}{2}} z_h' \Vert } = \Vert x \Vert _{-1} \end{aligned}$$
(13)

for all \(x\in \dot{H}^{-1}\).

The remainder of this subsection lists some error estimates for spatio-temporal Galerkin finite element approximations of the linear Cauchy problem

$$\begin{aligned} \frac{\mathrm {d}}{\,\mathrm {d}t} u(t) + A u(t) = 0,\quad t \in [0,T], \quad u(0) = x \in H. \end{aligned}$$
(14)

In terms of the semigroup \((S(t))_{t \in [0,T]}\) generated by \(-A\), the solution to (14) is given by \(u(t) = S(t) x\) for all \(t \in [0,T]\).

Let \(k \in (0,T]\) be a given equidistant time step size. We define \(N_k \in {\mathbb {N}}\) by \(k N_k \le T < k(N_k + 1)\) and denote the set of all temporal grid points by \(\mathcal {T}_k := \{ t_n \, : \, n = 0,1,\ldots ,N_k\,\}\) with \(t_n = kn\). Then, we combine the spatially discrete operators with a backward Euler scheme and obtain the spatio-temporal Galerkin finite element approximation \(u_{k,h} :\mathcal {T}_k \rightarrow V_h\) of (14), which is given by the recursion

$$\begin{aligned} u_{k,h}(t_0)&= P_h x,\nonumber \\ u_{k,h}(t_n) + k A_h u_{k,h}(t_n)&= u_{k,h}(t_{n-1}), \quad n = 1,\ldots ,N_k, \end{aligned}$$
(15)

for \(h \in (0,1]\) and \(k \in (0,T]\). Equivalently, we may write \(u_{k,h}(t_n) = \overline{S}_{k,h}^n P_h x\) with \(\overline{S}_{k,h} = ( \mathrm {Id}_H + k A_h)^{-1}\) for all \(n \in \{0,\ldots ,N_k\}\).
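To make the recursion (15) concrete, the following sketch replaces the Galerkin finite element operator \(A_h\) by a standard second-order finite difference matrix for the 1D Dirichlet Laplacian (an assumption made purely for illustration; any symmetric positive definite matrix plays the same role) and propagates the eigenfunction \(\sin (\pi s)\), whose exact decay \(e^{-\pi ^2 t}\) the backward Euler iteration reproduces up to \(O(k)\).

```python
import numpy as np

# A finite difference stand-in for the discrete operator A_h: the 1D
# Dirichlet Laplacian on (0,1) with mesh width h.
m = 99
h = 1.0 / (m + 1)
A_h = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h ** 2

T, N = 0.1, 100
k = T / N
s = np.linspace(h, 1.0 - h, m)        # interior grid points
u = np.sin(np.pi * s)                 # eigenfunction with exact decay e^{-pi^2 t}
M = np.eye(m) + k * A_h
for _ in range(N):
    u = np.linalg.solve(M, u)         # backward Euler step: (Id + k A_h) u_n = u_{n-1}
```

Since \(\sin (\pi s)\) is a discrete eigenvector, each step multiplies \(u\) by \((1 + k \lambda _h)^{-1}\), which is exactly the smoothing factor behind estimate (16).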

Similar to the analytic semigroup \((S(t))_{t \in [0,T]}\), the discrete operator \(\overline{S}_{k,h} :V_h \rightarrow V_h\) has the following smoothing property

$$\begin{aligned} \big \Vert A_h^{\rho } \overline{S}_{k,h}^{j} x_h \big \Vert = \big \Vert A_h^\rho ( \mathrm {Id}_H + k A_h )^{-j} x_h \big \Vert \le C t_{j}^{-\rho } \Vert x_h \Vert \end{aligned}$$
(16)

for all \(j \in \{1,\ldots ,N_k\}\), \(x_h \in V_h\), \(k \in (0,T]\) and \(h \in (0,1]\). Here the constant \(C = C(\rho )\) is independent of \(h,k\) and \(j\). For a proof of (16) we refer to [32, Lem. 7.3].

For the error analysis in Sect. 4 it will be convenient to introduce the continuous time error operator between (14) and (15)

$$\begin{aligned} F_{k,h}(t) := S_{k,h}(t) P_h - S(t), \quad t \in [0,T), \end{aligned}$$
(17)

where

$$\begin{aligned} S_{k,h}(t) := ( \mathrm {Id}_H + k A_h )^{-j}, \quad \text {if } t \in [t_{j-1}, t_j) \text { for } j\in \{1,2,\ldots ,N_k\} . \end{aligned}$$
(18)

The mapping \(t \mapsto S_{k,h}(t)\), and hence \(t \mapsto F_{k,h}(t)\), is right continuous with left limits. Simple consequences of (16) and (13) are the inequalities

$$\begin{aligned} \big \Vert S_{k,h}(t) P_h x \big \Vert \le C \big \Vert x \big \Vert \quad \text { for all } x \in H, \end{aligned}$$
(19)

and

$$\begin{aligned} \big \Vert S_{k,h}(t) P_h x \big \Vert = \big \Vert A_h^{\frac{1}{2}} ( \mathrm {Id}_H + k A_h )^{-j} A_h^{-\frac{1}{2}} P_h x \big \Vert \le C t_{j}^{-\frac{1}{2}} \big \Vert x \big \Vert _{-1} \le C t^{-\frac{1}{2}} \big \Vert x \big \Vert _{-1}, \end{aligned}$$
(20)

which hold for all \(x \in \dot{H}^{-1}\), \(h \in (0,1]\), \(k \in (0,T]\) and \(t > 0\) with \(t \in [t_{j-1}, t_j)\), \(j = 1,2,\ldots \). For both inequalities the constant \(C\) can be chosen to be independent of \(h \in (0,1]\) and \(k \in (0,T]\).

The next lemma provides several estimates for the error operator \(F_{k,h}\) with non-smooth initial data. Most of the results are well-known and are found in [32, Ch. 7]. The missing cases have been proved in [21, Lem. 3.12].

Lemma 2.8

Under Assumption 2.7 the following estimates hold true:

  1. (i)

    Let \(0 \le \nu \le \mu \le 2\). Then there exists a constant \(C\) such that

    $$\begin{aligned} \big \Vert F_{k,h}(t) x \big \Vert \le C \big ( h^{\mu } + k^{\frac{\mu }{2}} \big ) t^{-\frac{\mu - \nu }{2}} \big \Vert x \big \Vert _{\nu } \text { for all } x \in \dot{H}^\nu , \; t \in (0,T), \; h, k \in (0,1]. \end{aligned}$$
  2. (ii)

    Let \(0 \le \rho \le 1\). Then there exists a constant \(C\) such that

    $$\begin{aligned} \big \Vert F_{k,h}(t) x \big \Vert \le C t^{-\frac{\rho }{2}} \big \Vert x \big \Vert _{-\rho } \text { for all } x \in \dot{H}^{-\rho }, \; t \in (0,T), \; h, k \in (0,1]. \end{aligned}$$
  3. (iii)

    Let \(0 \le \rho \le 1\). Then there exists a constant \(C\) such that

    $$\begin{aligned} \big \Vert F_{k,h}(t) x \big \Vert \le C \big ( h^{2- \rho } + k^{\frac{2 - \rho }{2}} \big ) t^{-1} \big \Vert x \big \Vert _{-\rho } \text { for all } x \in \dot{H}^{-\rho }, \; t \in (0,T), \; h, k \in (0,1]. \end{aligned}$$

The next assumption is concerned with the stability of the orthogonal projector \(P_h\) with respect to the norm \(\Vert \cdot \Vert _{1}\). It only appears in the proof of Lemma 2.10 as shown in [21, Lem. 3.13].

Assumption 2.9

Let a family \((V_h)_{h \in (0,1]}\) of finite dimensional subspaces of \(\dot{H}^1\) be given such that there exists a constant \(C\) with

$$\begin{aligned} \Vert P_h x \Vert _{1} \le C \Vert x \Vert _{1} \quad \text { for all } x \in \dot{H}^1, \; h \in (0, 1]. \end{aligned}$$
(21)

The last lemma of this section is concerned with sharper integral versions of the error estimate in Lemma 2.8 (i) and (iii). A proof is given in [21, Lem. 3.13].

Lemma 2.10

Let \(0 \le \rho \le 1\). Under Assumption 2.7 the operator \(F_{k,h}\) satisfies the following estimates.

  1. (i)

    There exists a constant \(C\) such that

    $$\begin{aligned} \Big \Vert \int _{0}^{t} F_{k,h}(\sigma ) x \,\mathrm {d}\sigma \Big \Vert \le C \big ( h^{2 - \rho } + k^{\frac{2 - \rho }{2}} \big ) \big \Vert x \big \Vert _{-\rho } \end{aligned}$$

    for all \(x \in \dot{H}^{-\rho }\), \(t > 0\), and \(h, k \in (0,1]\).

  2. (ii)

    Under the additional Assumption 2.9 there exists a constant \(C\) such that

    $$\begin{aligned} \Big ( \int _{0}^{t} \big \Vert F_{k,h}(\sigma ) x \big \Vert ^2 \,\mathrm {d}\sigma \Big )^{\frac{1}{2}} \le C \big ( h^{1 + \rho } + k^{\frac{1 + \rho }{2}} \big ) \big \Vert x \big \Vert _{\rho } \end{aligned}$$

    for all \(x \in \dot{H}^{\rho }\), \(t > 0\), and \(h, k \in (0,1]\).

3 Stability and consistency of numerical one-step schemes

This section contains the somewhat more abstract framework of the convergence analysis. We generalize the notion of stability and consistency from [5, 19] to Hilbert spaces and we derive a set of sufficient conditions for the so-called bistability. Finally, a decomposition of the local truncation error gives a blueprint for the proof of consistency of the Milstein–Galerkin finite element scheme in Sect. 5.

3.1 Definition of the abstract one-step scheme

As above, let \(\mathcal {T}_k := \{t_n \, : \, n = 0,1,\ldots ,N_k \}\) be the set of temporal grid points for a given equidistant time step size \(k \in (0,T]\) and recall that \(N_k \in {\mathbb {N}}\) is given by \(N_k k \le T < (N_k + 1)k\).

The first important ingredient, which determines the numerical scheme, is a family of bounded linear operators \(S_k :H \rightarrow H\), \(k \in (0,T]\), which approximate the semigroup \(S(t)\), \(t \in [0,T]\), in a suitable sense.

Further, for the definition of the second ingredient, let us introduce the set \(\mathbb {T} \subset [0,T) \times (0,T]\), which is given by

$$\begin{aligned} \mathbb {T} := \{ (t,k) \in [0,T) \times (0,T] \; : \; t + k \le T \}. \end{aligned}$$

The so-called increment function is a mapping \(\Phi :H \times \mathbb {T} \times \Omega \rightarrow H\) with the property that for every \((t,k) \in \mathbb {T}\) the mapping \((x,\omega ) \mapsto \Phi (x,t,k)(\omega )\) is measurable with respect to \({\mathcal {B}}(H) \otimes {\mathcal {F}}_{t + k}/{\mathcal {B}}(H)\).

Then, for every \(k \in (0,T]\) the discrete time stochastic process \(X_k :\mathcal {T}_k \times \Omega \rightarrow H\) is given by the recursion

$$\begin{aligned} X_k(t_0)&:= \xi ,\nonumber \\ X_k(t_n)&:= S_k X_k(t_{n-1}) + \Phi \big ( X_k(t_{n-1}), t_{n-1}, k \big ) \end{aligned}$$
(22)

for every \(n \in \{1,\ldots ,N_k\}\), where \(\xi :\Omega \rightarrow H\) is an \({\mathcal {F}}_{t_0}/{\mathcal {B}}(H)\)-measurable random variable representing the initial value of the numerical scheme. It follows directly that \(X_k(t_n)\) is \({\mathcal {F}}_{t_n}/{\mathcal {B}}(H)\)-measurable for all \(n \in \{1,\ldots ,N_k\}\).
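For illustration only, the recursion (22) can be sketched in a few lines of Python. Here \(H = {\mathbb {R}}^d\), \(S_k\) is a matrix, and the increment function is a placeholder; all names and constants below are our own toy choices and are not part of the abstract scheme.

```python
import numpy as np

def one_step_scheme(S_k, Phi, xi, N_k, rng):
    """Generate X_k(t_0), ..., X_k(t_{N_k}) via the recursion (22).

    S_k : (d, d) matrix approximating the semigroup over one step
    Phi : increment function (x, n, rng) -> vector in R^d, mirroring
          Phi(X_k(t_{n-1}), t_{n-1}, k) in (22)
    xi  : initial value in R^d
    """
    X = [np.asarray(xi, dtype=float)]
    for n in range(1, N_k + 1):
        X.append(S_k @ X[-1] + Phi(X[-1], n - 1, rng))
    return np.array(X)

# Toy example: S_k = (Id + k A)^{-1} for a diagonal A and a vanishing
# increment function (f = g = 0), so that X_k(t_n) = (Id + k A)^{-n} xi.
d, k, N_k = 3, 0.01, 100
A = np.diag([1.0, 2.0, 3.0])
S_k = np.linalg.inv(np.eye(d) + k * A)
Phi = lambda x, n, rng: np.zeros_like(x)
X = one_step_scheme(S_k, Phi, np.ones(d), N_k, np.random.default_rng(0))
```

With the vanishing increment function the recursion reduces to the implicit Euler iteration, so the first component equals \((1 + k)^{-n}\) at step \(n\).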

In Sect. 4 we show how the Milstein–Galerkin finite element scheme fits into the framework of (22).

Having introduced the abstract numerical scheme, we recall the cornerstones of the stability and consistency concept for one-step methods from [5, 19]. First, let us introduce the family of linear spaces of adapted, \(p\)-integrable grid functions

$$\begin{aligned} \mathcal {G}_p(\mathcal {T}_k) \!:=\! \big \{ Z :\mathcal {T}_k \!\times \! \Omega \!\rightarrow \! H \; : \; Z(t_n) \in L_p(\Omega ,\mathcal {F}_{t_n},{\mathbf {P}};H) \!\text { for all }\! n \in \{0,1, \ldots ,N_k \} \big \} \end{aligned}$$

for all \(p \in [2,\infty )\) and \(k \in (0,T]\). The spaces \(\mathcal {G}_p(\mathcal {T}_k)\) are endowed with the two norms

$$\begin{aligned} \Vert Z \Vert _{0,p} := \max _{n \in \{0,\ldots ,N_k\}} \big \Vert Z(t_n) \big \Vert _{L_p(\Omega ;H)} \end{aligned}$$
(23)

and

$$\begin{aligned} \Vert Z \Vert _{-1,p} := \Vert Z(t_0) \Vert _{L_p(\Omega ;H)} + \max _{n \in \{1,\ldots ,N_k\}} \Big \Vert \sum _{j = 1}^n S_k^{n-j} Z(t_j) \Big \Vert _{L_p(\Omega ;H)} \end{aligned}$$
(24)

for all \(Z \in \mathcal {G}_p(\mathcal {T}_k)\). The norm \(\Vert \cdot \Vert _{-1,p}\) is called (stochastic) Spijker norm and is known to yield sharp, two-sided estimates of the discretization error, see for example [13, p. 438] as well as [5, 29, 30, 31].
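The two norms (23) and (24) can be sketched numerically as follows; the expectation is replaced by a sample mean over simulated paths (a simplification of ours), and the only nontrivial point is that the partial sums \(A_n = \sum _{j=1}^n S_k^{n-j} Z(t_j)\) obey the recursion \(A_n = S_k A_{n-1} + Z(t_n)\).

```python
import numpy as np

def norm_0p(Z, p):
    """Norm (23): max over grid points of the L_p(Omega; H) norm,
    with the expectation replaced by a sample mean (axis 0).
    Z has shape (samples, N_k + 1, d)."""
    Lp = np.mean(np.linalg.norm(Z, axis=2) ** p, axis=0) ** (1.0 / p)
    return Lp.max()

def norm_spijker(Z, S_k, p):
    """Stochastic Spijker norm (24), accumulating the partial sums
    A_n = sum_{j=1}^n S_k^{n-j} Z(t_j) via A_n = S_k A_{n-1} + Z(t_n)."""
    lp = lambda V: np.mean(np.linalg.norm(V, axis=1) ** p) ** (1.0 / p)
    acc = np.zeros_like(Z[:, 0, :])
    best = 0.0
    for n in range(1, Z.shape[1]):
        acc = acc @ S_k.T + Z[:, n, :]   # rows are samples, so v -> S_k v
        best = max(best, lp(acc))
    return lp(Z[:, 0, :]) + best

# Deterministic toy grid function with d = 1 and S_k = Id: the Spijker norm
# sees cancellation in the partial sums that the norm (23) does not.
Z = np.tile(np.array([1.0, 1.0, -1.0, 1.0]).reshape(1, 4, 1), (4, 1, 1))
S_k = np.eye(1)
```

For this toy grid function one obtains \(\Vert Z \Vert _{0,2} = 1\) while the partial sums \(1, 0, 1\) give \(\Vert Z \Vert _{-1,2} = 1 + 1 = 2\).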

Next, for \(p\in [2,\infty )\), let us define the family of nonlinear operators \(\mathcal {R}_k :\mathcal {G}_p(\mathcal {T}_k) \rightarrow \mathcal {G}_p(\mathcal {T}_k)\), which for \(k \in (0,T]\) are given by

$$\begin{aligned} \mathcal {R}_k[Z](t_0)&= Z(t_0) - \xi , \nonumber \\ \mathcal {R}_k[Z](t_n)&= Z(t_n) - S_k Z(t_{n-1}) - \Phi (Z(t_{n-1}),t_{n-1},k), \quad n \in \{1,\ldots ,N_k\}. \end{aligned}$$
(25)

Below we show that the operators \(\mathcal {R}_k\) are well-defined under Assumptions 3.5 and 3.7 for all \(k \in (0,T]\). Further, under these conditions it holds that \(\mathcal {R}_k[X_k]= 0 \in \mathcal {G}_p(\mathcal {T}_k)\) for all \(k \in (0,T]\), where \(X_k \in \mathcal {G}_p(\mathcal {T}_k)\) is the discrete time stochastic process generated by the numerical scheme (22). The mappings \(\mathcal {R}_k\) are therefore called residual operators associated to the numerical scheme (22).

The following definition contains our notion of stability.

Definition 3.1

Let \(p \in [2,\infty )\). The numerical scheme (22) is called bistable (with respect to the norms \(\Vert \cdot \Vert _{0,p}\) and \(\Vert \cdot \Vert _{-1,p}\)) if the residual operators \(\mathcal {R}_k :\mathcal {G}_p(\mathcal {T}_k) \rightarrow \mathcal {G}_p(\mathcal {T}_k)\) are well-defined and bijective for all \(k \in (0,T]\) and if there exists a constant \(C_{\mathrm {Stab}}\) independent of \(k \in (0,T]\) such that

$$\begin{aligned} \frac{1}{C_{\mathrm {Stab}}} \big \Vert \mathcal {R}_k[Y] - \mathcal {R}_k[Z] \big \Vert _{-1,p} \le \Vert Y - Z \Vert _{0,p} \le C_{\mathrm {Stab}} \big \Vert \mathcal {R}_k[Y] - \mathcal {R}_k[Z] \big \Vert _{-1,p} \end{aligned}$$
(26)

for all \(k \in (0,T]\) and \(Y, Z \in \mathcal {G}_p(\mathcal {T}_k)\).

Therefore, for a bistable numerical scheme, the distance between two arbitrary adapted grid functions can be estimated by the distance of their residuals measured with respect to the stochastic Spijker norm and vice versa. In Sect. 3.3 we show that Assumptions 3.5 to 3.7 are sufficient conditions for the bistability of the numerical scheme (22).

The counterpart of the notion of stability is the so-called consistency of the numerical scheme, which we define in the same way as in [5, 19]. For this we denote by \(Z|_{\mathcal {T}_k} \in \mathcal {G}_p(\mathcal {T}_k)\) the restriction of a \(p\)-fold integrable, adapted and continuous stochastic process \(Z :[0,T] \times \Omega \rightarrow H\) to the grid \(\mathcal {T}_k\), that is

$$\begin{aligned} Z|_{\mathcal {T}_k}(t_n) := Z(t_n), \quad n \in \{0,\ldots ,N_k\}. \end{aligned}$$

Definition 3.2

Let \(p \in [2, \infty )\). We say that the numerical scheme (22) is consistent of order \(\gamma > 0\) with respect to the SPDE (2) if there exists a constant \(C_\mathrm {Cons}\) independent of \(k \in (0,T]\) such that

$$\begin{aligned} \big \Vert \mathcal {R}_k [ X|_{\mathcal {T}_k} ] \big \Vert _{-1,p} \le C_{\mathrm {Cons}} k^{\gamma } \end{aligned}$$

for all \(k \in (0,T]\), where \(X\) is the mild solution to (2).

The term \(\Vert \mathcal {R}_k [ X|_{\mathcal {T}_k} ] \Vert _{-1,p}\) is called local truncation error or consistency error. Finally, we introduce the notion of strong convergence.

Definition 3.3

Let \(p \in [2, \infty )\). We say that the numerical scheme (22) is strongly convergent of order \(\gamma > 0\), if there exists a constant \(C\) independent of \(k \in (0,T]\) such that

$$\begin{aligned} \big \Vert X_k - X|_{\mathcal {T}_k} \big \Vert _{0,p} \le C k^{\gamma } \end{aligned}$$

for all \(k \in (0,T]\), where \(X_k \in \mathcal {G}_p(\mathcal {T}_k)\), \(k \in (0,T]\) are the grid functions generated by the numerical scheme (22) and \(X\) denotes the mild solution to (2).

Theorem 3.4

A bistable numerical scheme of the form (22) is strongly convergent of order \(\gamma > 0\) if and only if it is consistent of order \(\gamma > 0\). In particular, it holds

$$\begin{aligned} \frac{1}{C_{\mathrm {Stab}}} \big \Vert \mathcal {R}_k[X|_{\mathcal {T}_k} ] \big \Vert _{-1,p} \le \big \Vert X_k - X|_{\mathcal {T}_k} \big \Vert _{0,p} \le C_{\mathrm {Stab}} \big \Vert \mathcal {R}_k[X|_{\mathcal {T}_k} ] \big \Vert _{-1,p} \end{aligned}$$

for all \(k \in (0,T]\), where \(X_k \in \mathcal {G}_p(\mathcal {T}_k)\), \(k \in (0,T]\), denotes the family of grid functions generated by the numerical scheme (22) and \(X\) is the mild solution to (2).

Proof

First, let us recall that the residual operators \(\mathcal {R}_k :\mathcal {G}_p(\mathcal {T}_k) \rightarrow \mathcal {G}_p(\mathcal {T}_k)\) satisfy \(\mathcal {R}_k [ X_k ] = 0\) for every \(k \in (0,T]\). Thus, by the bistability of the numerical scheme we obtain

$$\begin{aligned} \frac{1}{C_{\mathrm {Stab}}} \big \Vert \mathcal {R}_k\big [X|_{\mathcal {T}_k} \big ] \big \Vert _{-1,p} \le \big \Vert X_k - X|_{\mathcal {T}_k} \big \Vert _{0,p} \le C_{\mathrm {Stab}} \big \Vert \mathcal {R}_k\big [X|_{\mathcal {T}_k} \big ] \big \Vert _{-1,p}. \end{aligned}$$

Consequently, the assertion follows directly from the definitions of consistency and strong convergence. \(\square \)

3.2 Assumptions on the numerical scheme

In this subsection we collect some assumptions on the abstract numerical scheme (22) that ensure its bistability, as we show in Sect. 3.3.

Assumption 3.5

(Initial value) Let \(p \in [2,\infty )\). The initial condition \(\xi :\Omega \rightarrow H\) is a \(p\)-fold integrable and \(\mathcal {F}_0/\mathcal {B}(H)\)-measurable random variable.

The next two assumptions are concerned with the family of linear operators \(S_k,\, k \in (0,T]\), and the increment function \(\Phi \).

Assumption 3.6

(Linear stability) For the family of bounded linear operators \(S_{k} :H \rightarrow H\), \(k \in (0,T]\), there exists a constant \(C_S\) independent of \(k \in (0,T]\) such that

$$\begin{aligned} \sup _{k \in (0,T]} \sup _{n \in \{1,\ldots ,N_k\}} \Vert S_k^n \Vert _{{\mathcal {L}}(H)} \le C_S. \end{aligned}$$
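For the rational approximation \(S_{k} = (\mathrm {Id} + kA)^{-1}\) used later in Sect. 4, this uniform power bound can be checked on a diagonal toy example (our own choice of eigenvalues): every eigenvalue of \(S_k\) lies in \((0,1]\), so \(\Vert S_k^n \Vert \le 1\) uniformly in \(n\) and \(k\), i.e. \(C_S = 1\) in this case.

```python
import numpy as np

# Implicit Euler resolvent S_k = (Id + k A)^{-1} for a positive definite
# diagonal A: the operator norm of S_k^n is max_j (1 + k*lambda_j)^{-n} <= 1.
lam = np.array([1.0, 10.0, 100.0])
bounds = []
for k in (1e-3, 1e-2, 1e-1, 1.0):
    S_k = np.diag(1.0 / (1.0 + k * lam))
    bounds.append(max(np.linalg.norm(np.linalg.matrix_power(S_k, n), 2)
                      for n in (1, 10, 100)))
```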

Assumption 3.7

(Nonlinear stability) Let \(p \in [2,\infty )\) be the same as in Assumption 3.5. For every \((t,k) \in \mathbb {T}\) the mapping \(\Phi (\cdot ,t,k) :H \times \Omega \rightarrow H\) is measurable with respect to \({\mathcal {B}}(H) \otimes {\mathcal {F}}_{t + k}/{\mathcal {B}}(H)\). Further, there exists a constant \(C_\Phi \) such that

$$\begin{aligned}&\displaystyle \Big \Vert \sum \limits _{j = m}^{n} S^{n-j}_k \Phi (0,t_{j-1},k) \Big \Vert _{L_p(\Omega ;H)} \le C_\Phi \big (t_n - t_{m-1} \big )^{\frac{1}{2}} \end{aligned}$$
(27)

for all \(k \in (0,T]\) and \(n,m \in \{1, \ldots , N_k\}\) with \(n \ge m\). In addition, it holds

$$\begin{aligned}&\Big \Vert \sum _{j = 1}^{n} S_k^{n-j} \big ( \Phi ( Y(t_{j-1}),t_{j-1}, k ) - \Phi (Z(t_{j-1}),t_{j-1},k) \big ) \Big \Vert _{L_p(\Omega ;H)}^2\nonumber \\&\quad \le C_\Phi ^2 k \sum _{j = 1}^{n} \big ( t_n - t_{j-1} \big )^{-\frac{1}{2}} \big \Vert Y(t_{j-1}) - Z(t_{j-1}) \big \Vert _{L_p(\Omega ;H)}^2 \end{aligned}$$
(28)

for all \(k \in (0,T]\), \(n \in \{1,\ldots ,N_k\}\) and all \(Y, Z \in \mathcal {G}_p(\mathcal {T}_k)\).

Let us remark that from Assumption 3.7 it follows directly that

$$\begin{aligned} \big \Vert \Phi ( Z(t_{n-1}) ,t_{n-1}, k)\big \Vert _{L_p(\Omega ;H)} \le C_\Phi k^{\frac{1}{4}} \big (T^{\frac{1}{4}} + \Vert Z(t_{n-1}) \Vert _{L_p(\Omega ;H)} \big ) \end{aligned}$$
(29)

for all \(k \in (0,T]\), \(n \in \{1,\ldots ,N_k\}\) and all \(Z \in \mathcal {G}_p(\mathcal {T}_k)\). Indeed, fix \(k \in (0,T]\) and \(n \in \{1,\ldots ,N_k\}\) and define \(\hat{Z} \in \mathcal {G}_p(\mathcal {T}_k)\) by

$$\begin{aligned} \hat{Z}(t_j) = {\left\{ \begin{array}{ll} Z(t_{n-1}),&{} \quad j = n-1,\\ 0,&{} \quad j \ne n-1. \end{array}\right. } \end{aligned}$$

Then, we get from (28)

$$\begin{aligned}&\big \Vert \Phi \big ( Z(t_{n-1}), t_{n-1}, k \big ) \big \Vert _{L_p(\Omega ;H)} \\&\quad = \Big \Vert \sum _{j = 1}^{n} \Big [ S^{n-j}_k \big ( \Phi \big (\hat{Z}(t_{j-1}),t_{j-1},k\big ) - \Phi \big (0, t_{j-1},k \big ) \big ) \Big ] + \Phi \big ( 0 , t_{n-1}, k\big ) \Big \Vert _{L_p(\Omega ;H)}\\&\quad \le C_\Phi k^{\frac{1}{4}} \big \Vert Z(t_{n-1}) \big \Vert _{L_p(\Omega ;H)} + \big \Vert \Phi \big ( 0, t_{n-1}, k \big ) \big \Vert _{L_p(\Omega ;H)}. \end{aligned}$$

Further, (27) applied with \(n = m\) yields

$$\begin{aligned} \big \Vert \Phi \big ( 0, t_{n-1}, k \big ) \big \Vert _{L_p(\Omega ;H)} \le C_\Phi T^{\frac{1}{4}} k^{\frac{1}{4}} \end{aligned}$$

which completes the proof of (29).

3.3 Bistability of the numerical scheme

In this subsection we demonstrate that Assumptions 3.5–3.7 are sufficient for the bistability of the numerical scheme.

Theorem 3.8

Let Assumptions 3.5 to 3.7 be satisfied with \(p \in [2,\infty )\). Then, the mappings \(\mathcal {R}_k:\mathcal {G}_p(\mathcal {T}_k) \rightarrow \mathcal {G}_p(\mathcal {T}_k)\) are well-defined and bijective for all \(k \in (0,T]\). Further, the numerical scheme (22) is bistable.

Proof

Let \(k \in (0,T]\) be arbitrary. We first prove that \(\mathcal {R}_k :\mathcal {G}_p(\mathcal {T}_k) \rightarrow \mathcal {G}_p(\mathcal {T}_k)\) is indeed well-defined. For all \(n \in \{0,\ldots ,N_k\}\) and \(Z \in \mathcal {G}_p(\mathcal {T}_k)\) the random variable \(\mathcal {R}_k[Z](t_n)\) is \(\mathcal {F}_{t_n}\)-measurable. In addition, by Assumptions 3.5 and 3.6 and (29) it follows that \(\mathcal {R}_k[Z](t_n)\) is also \(p\)-fold integrable for all \(n \in \{0,\ldots ,N_k\}\). Therefore, it holds \(\mathcal {R}_k[Z] \in \mathcal {G}_p(\mathcal {T}_k)\).

Let \(Y,Z \in \mathcal {G}_p(\mathcal {T}_k)\) be given with \(\mathcal {R}_k[Y] = \mathcal {R}_k[Z]\). Then, in particular, \(\mathcal {R}_k[Y](t_0) = \mathcal {R}_k[Z](t_0)\), from which we deduce \(Y(t_0) = Z(t_0)\). Further, if for some \(n \in \{0,\ldots ,N_k-1\}\) we have already shown that \(Y(t_j) = Z(t_j)\) for all \(j \in \{0,\ldots ,n\}\), then it follows by (25) that

$$\begin{aligned} 0 = \mathcal {R}_k[Y]( t_{n+1}) - \mathcal {R}_k[Z](t_{n+1}) = Y(t_{n+1}) - Z(t_{n+1}). \end{aligned}$$

Hence, \(Y(t_{n+1}) = Z(t_{n+1})\), which proves that \(\mathcal {R}_k\) is injective.

Further, for arbitrary \(V \in \mathcal {G}_p(\mathcal {T}_k)\) the grid function \(Z \in \mathcal {G}_p(\mathcal {T}_k)\) defined by

$$\begin{aligned} Z(t_0)&:= V(t_0) + \xi ,\nonumber \\ Z(t_n)&:= S_k^n Z(t_0) + \sum _{j = 1}^n S_k^{n-j} \big ( \Phi (Z(t_{j-1}), t_{j-1}, k) + V(t_{j}) \big ), \end{aligned}$$
(30)

for all \(n \in \{1,\ldots ,N_k\}\), satisfies \(\mathcal {R}_k[Z] = V\), as one directly verifies by an inductive argument. Consequently, \(\mathcal {R}_k\) is also surjective. In particular, for all \(Z \in \mathcal {G}_p(\mathcal {T}_k)\) we can equivalently rewrite the discrete variation of constants formula (30) as

$$\begin{aligned} Z(t_0)&= \mathcal {R}_k[Z](t_0) + \xi ,\nonumber \\ Z(t_n)&= S_k^n Z(t_0) + \sum _{j = 1}^n S_k^{n-j} \big ( \Phi (Z(t_{j-1}),t_{j-1}, k ) + \mathcal {R}_k[Z](t_j) \big ) \end{aligned}$$
(31)

for all \(n \in \{1,\ldots ,N_k\}\). Thus, from Assumption 3.6 and (28) we obtain

$$\begin{aligned}&\Vert Y(t_n) - Z(t_n) \Vert _{L_p(\Omega ;H)} \le \big \Vert S_k^n ( Y(t_0) - Z(t_0) ) \big \Vert _{L_p(\Omega ;H)} \\&\qquad +\, \Big \Vert \sum _{j = 1}^n S_k^{n-j} \big ( \Phi (Y(t_{j-1}), t_{j-1}, k) - \Phi (Z(t_{j-1}), t_{j-1}, k) \big ) \Big \Vert _{L_p(\Omega ;H)}\\&\qquad + \,\Big \Vert \sum _{j = 1}^n S_k^{n-j} \big ( \mathcal {R}_k[Y](t_j) - \mathcal {R}_k[Z](t_j) \big ) \Big \Vert _{L_p(\Omega ;H)}\\&\quad \le C_S \big \Vert Y(t_0) - Z(t_0) \big \Vert _{L_p(\Omega ;H)} + \Big \Vert \sum _{j = 1}^n S_k^{n-j} \big ( \mathcal {R}_k[Y](t_j) - \mathcal {R}_k[Z](t_j) \big ) \Big \Vert _{L_p(\Omega ;H)} \\&\qquad + \, C_\Phi \Big ( k \sum _{j = 1}^{n} \big ( t_n - t_{j-1} \big )^{-\frac{1}{2}} \big \Vert Y(t_{j-1}) - Z(t_{j-1}) \big \Vert _{L_p(\Omega ;H)}^2 \Big )^{\frac{1}{2}} \end{aligned}$$

for all \(n \in \{1,\ldots ,N_k\}\) and all \(Y, Z \in \mathcal {G}_p(\mathcal {T}_k)\). In addition, we have

$$\begin{aligned} \Vert Y(t_0) - Z(t_0) \Vert _{L_p(\Omega ;H)} = \Vert \mathcal {R}_k[Y](t_0) -\mathcal {R}_k[Z](t_0) \Vert _{L_p(\Omega ;H)}. \end{aligned}$$

Now, from the definition of the norm \(\Vert \cdot \Vert _{-1,p}\) in (24) it directly follows that

$$\begin{aligned}&\Vert Y(t_n) - Z(t_n) \Vert _{L_p(\Omega ;H)}^2 \le 2 (1 + C_S)^2 \big \Vert \mathcal {R}_k[Y] - \mathcal {R}_k[Z] \big \Vert _{-1,p}^2 \\&\qquad \qquad \qquad +\, 2 C_\Phi ^2 k \sum _{j = 1}^{n}\big ( t_n - t_{j-1} \big )^{-\frac{1}{2}} \big \Vert Y(t_{j-1}) - Z(t_{j-1}) \big \Vert _{L_p(\Omega ;H)}^2 \end{aligned}$$

and an application of the discrete Gronwall lemma (see Lemma 3.9) completes the proof of the inequality on the right-hand side of (26).

Similarly, by rearranging (31) and an application of (28) we obtain

$$\begin{aligned}&\Big \Vert \sum _{j = 1}^n S_k^{n-j} \big ( \mathcal {R}_k[Y](t_j) - \mathcal {R}_k[Z](t_j) \big ) \Big \Vert _{L_p(\Omega ;H)}\\&\quad \le \Vert Y(t_n) - Z(t_n) \Vert _{L_p(\Omega ;H)} + \Vert S_k^n(Y(t_0) - Z(t_0)) \Vert _{L_p(\Omega ;H)}\\&\qquad +\, \Big \Vert \sum _{j = 1}^n S_k^{n-j} \big ( \Phi (Y(t_{j-1}), t_{j-1}, k) - \Phi (Z(t_{j-1}), t_{j-1}, k) \big ) \Big \Vert _{L_p(\Omega ;H)}\\&\quad \le 2 \Vert Y - Z \Vert _{0,p} + C_\Phi \Big ( k \sum _{j = 1}^{n} \big ( t_n - t_{j-1} \big )^{-\frac{1}{2}} \big \Vert Y(t_{j-1}) - Z(t_{j-1}) \big \Vert _{L_p(\Omega ;H)}^2 \Big )^{\frac{1}{2}}\\&\quad \le \Big ( 2 + C_\Phi \Big (k \sum _{j = 1}^{n} \big ( t_n - t_{j-1} \big )^{-\frac{1}{2}} \Big )^{\frac{1}{2}} \Big ) \Vert Y - Z \Vert _{0,p} \end{aligned}$$

for all \(n \in \{1,\ldots ,N_k\}\) and \(Y, Z \in \mathcal {G}_p(\mathcal {T}_k)\). Since we have

$$\begin{aligned} k \sum _{j = 1}^n \big (t_n - t_{j-1}\big )^{-\frac{1}{2}} \le \int _{0}^{t_n} \sigma ^{-\frac{1}{2}} \,\mathrm {d}\sigma \le 2 t_n^{\frac{1}{2}} \le 2 T^{\frac{1}{2}}, \end{aligned}$$
(32)

we have also shown the validity of the inequality on the left-hand side of (26). \(\square \)
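The elementary Riemann sum bound (32) used at the end of this proof can also be checked numerically; the following sketch (grid sizes are our own choices) exploits that \(t_n - t_{j-1} = (n - j + 1)k\) for an equidistant grid.

```python
import numpy as np

def lhs_32(k, n):
    """Left-hand side of (32): k * sum_{j=1}^n (t_n - t_{j-1})^{-1/2} with
    t_j = j*k; since t_n - t_{j-1} = (n-j+1)*k this is sqrt(k)*sum_m m^{-1/2}."""
    j = np.arange(1, n + 1)
    return k * np.sum(((n - j + 1) * k) ** -0.5)

# The sum stays below the integral bound 2*sqrt(t_n) for every n.
k = 1e-3
ok = all(lhs_32(k, n) <= 2.0 * np.sqrt(n * k) for n in range(1, 501))
```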

A proof of the following version of Gronwall’s lemma is given in [9, Lemma 7.1].

Lemma 3.9

(Discrete Gronwall lemma) Let \(T > 0\), \(k \in (0,T]\), \(\eta \in (0,1]\) and a real-valued nonnegative sequence \(x_n\), \(n \in \{0,\ldots ,N_k\}\), be given. Assume that there exist constants \(C_1, C_2 \ge 0\) such that

$$\begin{aligned} x_n \le C_1 + C_2 k \sum _{j = 1}^{n} \big ( t_n - t_{j-1} \big )^{-1+\eta } x_{j-1} \quad \text { for all } n = 0, \ldots , N_k. \end{aligned}$$

Then, there exists a constant \(C = C(C_2, T, \eta )\), independent of \(k\), such that

$$\begin{aligned} x_n \le C C_1 \quad \text { for all } n = 0, \ldots , N_k. \end{aligned}$$
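A quick numerical sanity check of the lemma: iterating the recursion with equality (our toy choices \(T = 1\), \(\eta = 1/2\), \(C_1 = C_2 = 1\)) shows that \(\max _n x_n\) remains bounded uniformly as the step size \(k = T/N\) is refined.

```python
import numpy as np

def gronwall_iterate(C1, C2, k, N_k, eta):
    """Equality case of the recursion in Lemma 3.9:
    x_n = C1 + C2 * k * sum_{j=1}^n (t_n - t_{j-1})^{-1+eta} * x_{j-1}."""
    x = np.empty(N_k + 1)
    x[0] = C1
    for n in range(1, N_k + 1):
        j = np.arange(1, n + 1)
        w = ((n - j + 1) * k) ** (eta - 1.0)   # t_n - t_{j-1} = (n-j+1)*k
        x[n] = C1 + C2 * k * np.sum(w * x[:n])
    return x

# max_n x_n stays below a k-independent multiple of C1 as the grid is refined.
sups = [gronwall_iterate(1.0, 1.0, 1.0 / N, N, 0.5).max() for N in (50, 100, 200)]
```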

Having established bistability, we directly deduce the following norm estimate for the numerical scheme (22).

Corollary 3.10

For \(k \in (0,T]\) let \(X_k \in \mathcal {G}_p(\mathcal {T}_k)\) be the grid function generated by the numerical scheme (22). Under Assumptions 3.5 to 3.7 with \(p \in [2,\infty )\) it holds that

$$\begin{aligned} \big \Vert X_k \big \Vert _{0,p} \le C_{\mathrm {Stab}} \Bigg ( \big \Vert \xi \big \Vert _{L_p(\Omega ;H)} + C_\Phi T^{\frac{1}{2}} \Bigg ), \end{aligned}$$

for all \(k \in (0,T]\).

Proof

Under the given assumptions the numerical scheme (22) is bistable. Since \(\mathcal {R}_k[X_k] = 0 \in \mathcal {G}_p(\mathcal {T}_k)\) it holds

$$\begin{aligned} \big \Vert X_k \big \Vert _{0,p} = \big \Vert X_k - 0 \big \Vert _{0,p}&\le C_{\mathrm {Stab}} \big \Vert \mathcal {R}_k[X_k]-\mathcal {R}_k[0]\big \Vert _{-1,p} \\&= C_{\mathrm {Stab}} \big \Vert \mathcal {R}_k[0]\big \Vert _{-1,p}. \end{aligned}$$

Further, from (27) it follows that

$$\begin{aligned} \big \Vert \mathcal {R}_k[0]\big \Vert _{-1,p}&= \big \Vert \xi \big \Vert _{L_p(\Omega ;H)} + \max _{n \in \{1,\ldots ,N_k\}}\big \Vert \sum _{j = 1}^n S_k^{n-j} \Phi (0, t_{j-1}, k) \big \Vert _{L_p(\Omega ;H)} \\&\le \big \Vert \xi \big \Vert _{L_p(\Omega ;H)} + C_\Phi T^{\frac{1}{2}}, \end{aligned}$$

which completes the proof. \(\square \)

3.4 Consistency of the numerical scheme

In this section we derive a decomposition of the local truncation error \(\Vert \mathcal {R}_k [X|_{\mathcal {T}_k} ] \Vert _{-1,p}\), which turns out to be useful in the proof of consistency of the Milstein scheme. In Lemma 3.11 it is shown that the local truncation error is dominated by a sum of five terms.

The first term measures the distance between the initial conditions of the SPDE (2) and of the numerical scheme (22). The next three summands capture the error originating from replacing the analytic semigroup \(S(t)\), \(t \in [0,T]\), by the family of bounded linear operators \(S_k\). Finally, the last term deals with the error caused by the increment function \(\Phi \).

Lemma 3.11

Let \(X\) be the mild solution to (2). Then the local truncation error satisfies the estimate

$$\begin{aligned}&\big \Vert \mathcal {R}_k [X|_{\mathcal {T}_k} ] \big \Vert _{-1,p} \le \big \Vert X(t_0) - \xi \big \Vert _{L_p(\Omega ;H)} + \max _{n \in \{1,\ldots ,N_k\}} \big \Vert (S(t_n) - S_k^n) X_0 \big \Vert _{L_p(\Omega ;H)} \\&\quad +\,\max _{n \in \{1,\ldots ,N_k\}} \Big \Vert \sum _{j = 1}^n \int _{t_{j-1}}^{t_{j}} \big ( S(t_n - \sigma ) - S_k^{n - j + 1} \big ) f(X(\sigma )) \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)} \\&\quad +\,\max _{n \in \{1,\ldots ,N_k\}} \Big \Vert \sum _{j = 1}^n \int _{t_{j-1}}^{t_{j}} \big ( S(t_n - \sigma ) - S_k^{n - j + 1} \big ) g(X(\sigma )) \,\mathrm {d}W(\sigma )\Big \Vert _{L_p(\Omega ;H)} \\&\quad +\,\max _{n \in \{1,\ldots ,N_k\}} \Big \Vert \sum _{j = 1}^n S_k^{n-j} \Big ( - \int _{t_{j-1}}^{t_{j}} S_k f(X(\sigma )) \,\mathrm {d}\sigma + \int _{t_{j-1}}^{t_{j}} S_k g(X(\sigma )) \,\mathrm {d}W(\sigma ) \\&\qquad \qquad \qquad \qquad - \Phi (X(t_{j-1}),t_{j-1},k) \Big ) \Big \Vert _{L_p(\Omega ;H)} \end{aligned}$$

for all \(k \in (0,T]\).

Proof

The stochastic Spijker norm of \(\mathcal {R}_k [X|_{\mathcal {T}_k} ]\) is given by

$$\begin{aligned}&\big \Vert \mathcal {R}_k [X|_{\mathcal {T}_k} ] \big \Vert _{-1,p} \\&\quad = \big \Vert \mathcal {R}_k [X|_{\mathcal {T}_k} ](t_0) \big \Vert _{L_p(\Omega ;H)} + \max _{n \in \{1,\ldots ,N_k\}} \Big \Vert \sum _{j = 1}^n S_k^{n-j} \mathcal {R}_k [X|_{\mathcal {T}_k} ](t_j) \Big \Vert _{L_p(\Omega ;H)}\\&\quad = \big \Vert X(t_0) - \xi \big \Vert _{L_p(\Omega ;H)}\\&\qquad + \max _{n \in \{1,\ldots ,N_k\}} \Big \Vert \sum _{j = 1}^n S_k^{n-j} \big ( X(t_j) - S_k X(t_{j-1}) - \Phi (X(t_{j-1}),t_{j-1},k) \big ) \Big \Vert _{L_p(\Omega ;H)}. \end{aligned}$$

First, we insert the following relationship into the second term

$$\begin{aligned} X(t_j)&= S(t_j - t_{j-1}) X(t_{j-1}) - \int _{t_{j-1}}^{t_{j}} S(t_j - \sigma ) f(X(\sigma )) \,\mathrm {d}\sigma \\&\qquad + \int _{t_{j-1}}^{t_{j}} S(t_j - \sigma ) g(X(\sigma )) \,\mathrm {d}W(\sigma ), \quad {\mathbf {P}}\text {-a.s.}, \end{aligned}$$

which follows from (3). Hence, for every \(n \in \{1, \ldots , N_k\}\) we get

$$\begin{aligned}&\Big \Vert \sum _{j = 1}^n S_k^{n-j} \big ( X(t_j) - S_k X(t_{j-1}) - \Phi (X(t_{j-1}),t_{j-1},k) \big ) \Big \Vert _{L_p(\Omega ;H)}\\&\quad \le \Big \Vert \sum _{j = 1}^n S_k^{n-j} \Big ( \big ( S(k) - S_k \big ) X(t_{j-1}) - \int _{t_{j-1}}^{t_{j}} \big ( S(t_j - \sigma ) - S_k \big ) f(X(\sigma )) \,\mathrm {d}\sigma \\&\quad \quad \qquad + \int _{t_{j-1}}^{t_{j}} \big ( S(t_j - \sigma ) - S_k \big ) g(X(\sigma )) \,\mathrm {d}W(\sigma ) \Big ) \Big \Vert _{L_p(\Omega ;H)}\\&\quad \qquad +\, \Big \Vert \sum _{j = 1}^n S_k^{n-j} \Big ( - \int _{t_{j-1}}^{t_{j}} S_k f(X(\sigma )) \,\mathrm {d}\sigma + \int _{t_{j-1}}^{t_{j}} S_k g(X(\sigma )) \,\mathrm {d}W(\sigma ) \\&\quad \quad \qquad -\, \Phi (X(t_{j-1}),t_{j-1},k) \Big ) \Big \Vert _{L_p(\Omega ;H)}. \end{aligned}$$

The last summand is already in the desired form. Therefore, it remains to estimate the first summand

$$\begin{aligned} \Theta _n&:= \Big \Vert \sum _{j = 1}^n S_k^{n-j} \Big ( \big ( S(k) - S_k \big ) X(t_{j-1}) - \int _{t_{j-1}}^{t_{j}} \big ( S(t_j - \sigma ) - S_k \big ) f(X(\sigma )) \,\mathrm {d}\sigma \\&\qquad \qquad + \int _{t_{j-1}}^{t_{j}} \big ( S(t_j - \sigma ) - S_k \big ) g(X(\sigma )) \,\mathrm {d}W(\sigma ) \Big ) \Big \Vert _{L_p(\Omega ;H)}. \end{aligned}$$

For this, we again insert (3) and obtain

$$\begin{aligned} \Theta _n&\le \Big \Vert \sum _{j = 1}^n S_k^{n-j} \big ( S(k) - S_k \big ) S(t_{j-1}) X_0 \Big \Vert _{L_p(\Omega ;H)}\\&+ \,\Big \Vert \sum _{j = 1}^n S_k^{n-j} \Big ( \big ( S(k) - S_k \big ) \int _{0}^{t_{j-1}} S(t_{j-1} - \sigma ) f(X(\sigma )) \,\mathrm {d}\sigma \\&\quad +\, \int _{t_{j-1}}^{t_{j}} \big ( S(t_j - \sigma ) - S_k \big ) f(X(\sigma )) \,\mathrm {d}\sigma \Big ) \Big \Vert _{L_p(\Omega ;H)}\\&+\, \Big \Vert \sum _{j = 1}^n S_k^{n-j} \Big ( \big ( S(k) - S_k \big ) \int _{0}^{t_{j-1}} S(t_{j-1} - \sigma ) g(X(\sigma )) \,\mathrm {d}W(\sigma )\\&\quad +\, \int _{t_{j-1}}^{t_{j}} \big ( S(t_j \!-\! \sigma ) \!-\! S_k \big ) g(X(\sigma )) \,\mathrm {d}W(\sigma ) \Big ) \Big \Vert _{L_p(\Omega ;H)} \!=:\! \Theta _n^1 + \Theta _n^2 + \Theta _n^3. \end{aligned}$$

Next, we apply the fact that

$$\begin{aligned} \sum _{j = 1}^n S_k^{n-j} \big ( S(k) - S_k \big ) S(t_{j-1}) = S(t_n) - S_k^n \end{aligned}$$
(33)

for all \(n \in \{1,\ldots , N_k\}\). This yields for the term \(\Theta _n^1\) the estimate

$$\begin{aligned} \Theta _n^1 = \Big \Vert \sum _{j = 1}^n S_k^{n-j} \big ( S(k) - S_k \big ) S(t_{j-1}) X_0 \Big \Vert _{L_p(\Omega ;H)} = \big \Vert (S(t_n) - S_k^n) X_0 \big \Vert _{L_p(\Omega ;H)} \end{aligned}$$
(34)

for all \(n \in \{1,\ldots , N_k\}\). In addition, it holds

$$\begin{aligned} \Theta _n^2 = \Big \Vert \sum _{j = 1}^n \int _{t_{j-1}}^{t_{j}} \big ( S(t_n - \sigma ) - S_k^{n - j + 1} \big ) f(X(\sigma )) \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)}, \end{aligned}$$
(35)

as well as

$$\begin{aligned} \Theta _n^3 = \Big \Vert \sum _{j = 1}^n \int _{t_{j-1}}^{t_{j}} \big ( S(t_n - \sigma ) - S_k^{n - j + 1} \big ) g(X(\sigma )) \,\mathrm {d}W(\sigma )\Big \Vert _{L_p(\Omega ;H)} \end{aligned}$$
(36)

for all \(n \in \{1, \ldots , N_k\}\). Indeed, for a given \(\sigma \in (0,t_{N_k}]\) let \(\ell (\sigma ) \in {\mathbb {N}}\) be determined by \(t_{\ell (\sigma ) - 1} < \sigma \le t_{\ell (\sigma )}\). Then, by interchanging summation and integration we obtain

$$\begin{aligned}&\sum _{j = 1}^n S_k^{n-j} \big ( S(k) - S_k \big ) \int _{0}^{t_{j-1}} S(t_{j-1} - \sigma ) f(X(\sigma )) \,\mathrm {d}\sigma \\&\quad = \sum _{j = 1}^n \int _{0}^{t_{n-1}} \mathbb {I}_{[0,t_{j-1}]}(\sigma ) S_k^{n-j} \big ( S(k) - S_k \big ) S(t_{j-1} - \sigma ) f(X(\sigma )) \,\mathrm {d}\sigma \\&\quad = \int _{0}^{t_{n-1}} \sum _{j = \ell (\sigma ) +1}^n S_k^{n-j} \big ( S(k) - S_k \big ) S(t_{j-1} - t_{\ell (\sigma )}) S(t_{\ell (\sigma )} - \sigma ) f(X(\sigma )) \,\mathrm {d}\sigma \\&\quad = \int _{0}^{t_{n-1}} \big ( S(t_{n} - t_{\ell (\sigma )}) - S_k^{n-\ell (\sigma )} \big ) S(t_{\ell (\sigma )} - \sigma ) f(X(\sigma )) \,\mathrm {d}\sigma \\&\quad = \sum _{j = 1}^{n-1} \int _{t_{j-1}}^{t_{j}} \big ( S(t_{n} - t_{j}) - S_k^{n-j} \big ) S(t_{j} - \sigma ) f(X(\sigma )) \,\mathrm {d}\sigma , \end{aligned}$$

where we applied (33) in the third step. Therefore, it holds

$$\begin{aligned}&\sum _{j = 1}^n S_k^{n-j} \Big ( \big ( S(k) - S_k \big ) \int _{0}^{t_{j-1}} S(t_{j-1} - \sigma ) f(X(\sigma )) \,\mathrm {d}\sigma \\&\qquad \qquad + \int _{t_{j-1}}^{t_{j}} \big ( S(t_j - \sigma ) - S_k \big ) f(X(\sigma )) \,\mathrm {d}\sigma \Big )\\&\quad = \sum _{j = 1}^{n-1} \int _{t_{j-1}}^{t_{j}} \big ( S(t_{n} - t_{j}) - S_k^{n-j} \big ) S(t_{j} - \sigma ) f(X(\sigma )) \,\mathrm {d}\sigma \\&\qquad \qquad \qquad + \sum _{j = 1}^n S_k^{n-j} \int _{t_{j-1}}^{t_{j}} \big ( S(t_j - \sigma ) - S_k \big ) f(X(\sigma )) \,\mathrm {d}\sigma \\&\quad = \sum _{j = 1}^n \int _{t_{j-1}}^{t_{j}} \big ( S(t_{n} - \sigma ) - S_k^{n-j + 1} \big ) f(X(\sigma )) \,\mathrm {d}\sigma . \end{aligned}$$

This completes the proof of (35) and the same arguments also yield (36). \(\square \)

Remark 3.12

In the finite dimensional situation with \(H = {\mathbb {R}}^d\), \(U = {\mathbb {R}}^m\), \(d, m \in {\mathbb {N}}\) and \(A = 0\) the SPDE (2) becomes a stochastic ordinary differential equation (SODE). In this situation we have \(S(t) = \mathrm {Id}_H\) for all \(t \in [0,T]\). If one applies a numerical scheme with \(S_k = \mathrm {Id}_H\), then the error decomposition in Lemma 3.11 actually holds with equality and it coincides with the stochastic Spijker norm from [5]. Compare also with [19, (1.5)], where the application of the maximum occurs inside the expectation.

4 Bistability of the Milstein–Galerkin finite element scheme

In this section we embed the Milstein–Galerkin finite element scheme (4) into the abstract framework of Sect. 3. Then, we prove that Assumptions 3.5–3.7 are satisfied and we consequently conclude the bistability of the scheme.

For the embedding we first set \(\xi _h = P_h X_0\) and

$$\begin{aligned} S_{k,h} := \big ( \mathrm {Id}_H + k A_h \big )^{-1} P_h \in {\mathcal {L}}(H) \end{aligned}$$

for every \(h \in (0,1]\). Let us note that in contrast to Sect. 2.4 the operator \(S_{k,h}\) includes the orthogonal projector \(P_h\) and is therefore defined as an operator from \(H\) to \(H\).

Further, the increment function \(\Phi _h :H \times \mathbb {T} \times \Omega \rightarrow H\), \(h \in (0,1]\), is given by

$$\begin{aligned} \Phi _h(x,t,k)&= - k S_{k,h} f(x) + S_{k,h}g(x) \big (W(t+k) - W(t) \big )\nonumber \\&\qquad +\, S_{k,h} \int _{t}^{t + k} g'(x)\Bigg [ \int _{t}^{\sigma _1} g(x) \,\mathrm {d}W(\sigma _2) \Bigg ] \,\mathrm {d}W(\sigma _1) \end{aligned}$$
(37)

for all \((t,k) \in \mathbb {T}\) and \(x \in H\).
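In the special case of a scalar driving Wiener process the iterated stochastic integral in (37) reduces to the classical Milstein term \(\frac{1}{2}\big ((W(t+k)-W(t))^2 - k\big )\), so the increment function can be sketched as follows. The spectral Galerkin setting, the eigenvalues and all names below are our own toy choices, not part of the scheme.

```python
import numpy as np

def milstein_increment(x, k, dW, S_kh, f, g, dg):
    """Sketch of the increment function (37) for a scalar driving Wiener
    process: the iterated integral int_t^{t+k} int_t^{s} dW dW then equals
    (dW**2 - k) / 2, so no Levy areas are needed.

    S_kh : (d, d) matrix playing the role of (Id + k A_h)^{-1} P_h
    f, g : drift and diffusion coefficients R^d -> R^d
    dg   : x -> Jacobian of g at x, so that dg(x) @ g(x) is g'(x)[g(x)]
    """
    iterated = 0.5 * (dW ** 2 - k)
    return S_kh @ (-k * f(x) + g(x) * dW + (dg(x) @ g(x)) * iterated)

# Toy setting: three spectral modes, A_h = diag(lambda_j), g(x) = x, f = 0.
d, k = 3, 0.01
lam = np.array([1.0, 4.0, 9.0])
S_kh = np.diag(1.0 / (1.0 + k * lam))
f = lambda x: np.zeros_like(x)
g = lambda x: x
dg = lambda x: np.eye(d)
phi = milstein_increment(np.ones(d), k, 0.1, S_kh, f, g, dg)
```

Here the Wiener increment is fixed to \(0.1\) so that \(\mathrm {d}W^2 = k\), the iterated-integral term vanishes, and the increment reduces to \(S_{k,h}\, g(x)\, \mathrm {d}W\).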

Theorem 4.1

Under Assumptions 2.2 to 2.4 the Milstein–Galerkin finite element scheme (4) is bistable for every \(h \in (0,1]\). The stability constant \(C_{\mathrm {Stab}}\) can be chosen to be independent of \(h \in (0,1]\).

Proof

First, let \(h \in (0,1]\) be an arbitrary but fixed parameter value of the spatial discretization. By Theorem 3.8 it is sufficient to show that Assumptions 3.5–3.7 are satisfied.

Regarding Assumption 3.5, it follows directly from Assumption 2.2 that \(\xi _h = P_h X_0\) is \(p\)-fold integrable and \({\mathcal {F}}_0/\mathcal {B}(H)\)-measurable. Furthermore, it holds

$$\begin{aligned} \big \Vert \xi _h \big \Vert _{L_p(\Omega ;H)} \le \big \Vert X_0 \big \Vert _{L_p(\Omega ;H)}, \end{aligned}$$
(38)

that is, the norm of \(\xi _h\) is bounded independently of \(h \in (0,1]\) by the norm of \(X_0\).

The stability of the family of linear operators \(S_{k,h}\), \(k \in (0,T]\), follows from (16) with \(\rho = 0\), which yields

$$\begin{aligned} \big \Vert S_{k,h}^n x \big \Vert = \big \Vert \big ( ( \mathrm {Id}_H + kA_h)^{-1} P_h \big )^n x \big \Vert = \big \Vert ( \mathrm {Id}_H + kA_h)^{-n} P_h x \big \Vert \le C \Vert x \Vert \end{aligned}$$

for all \(x \in H\) and \(n \in \{1,\ldots ,N_k\}\). Consequently, Assumption 3.6 is satisfied with \(C_S = C\) and the constant is also independent of \(h \in (0,1]\) and \(k \in (0,T]\).

Hence, it remains to investigate whether Assumption 3.7 is also fulfilled. First, for every \(p \in [2,\infty )\) and for all \(m,n \in \{1,\ldots ,N_k\}\) with \(n \ge m\) it holds

$$\begin{aligned}&\Big \Vert \sum _{j = m}^n S_{k,h}^{n-j} \Phi _h(0,t_{j-1},k) \Big \Vert _{L_p(\Omega ;H)} \\&\quad \le \Big \Vert \sum _{j = m}^n S_{k,h}^{n-j+1} f(0) k \Big \Vert _{L_p(\Omega ;H)} \!+\! \Big \Vert \sum _{j = m}^n S_{k,h}^{n-j+1} g(0) \big (W(t_{j}) - W(t_{j-1})\big )\Big \Vert _{L_p(\Omega ;H)}\\&\qquad +\, \Big \Vert \sum _{j = m}^n S_{k,h}^{n-j+1} \int _{t_{j-1}}^{t_j} g'(0)\Bigg [ \int _{t_{j-1}}^{\sigma _1} g(0) \,\mathrm {d}W(\sigma _2) \Bigg ] \,\mathrm {d}W(\sigma _1) \Big \Vert _{L_p(\Omega ;H)}\\&\quad =: I_1 + I_2 + I_3. \end{aligned}$$

We deal with the three terms separately. Recalling (18) and (20), we estimate the deterministic term \(I_1\) by

$$\begin{aligned} I_1 = \Big \Vert \int _{t_{m-1}}^{t_n} S_{k,h}(t_n - \sigma ) P_h f(0) \,\mathrm {d}\sigma \Big \Vert&\le \int _{t_{m-1}}^{t_n} (t_n- \sigma )^{-\frac{1}{2}} \Vert f(0) \Vert _{-1} \,\mathrm {d}\sigma \nonumber \\&= 2 (t_n- t_{m-1})^{\frac{1}{2}} \Vert f(0)\Vert _{-1}. \end{aligned}$$
(39)

For the estimate of \(I_2\) we first write the sum as a stochastic integral by inserting (18), then we apply Proposition 2.6 and (19) and obtain

$$\begin{aligned} I_2&= \Big \Vert \int _{t_{m-1}}^{t_n} S_{k,h}(t_n - \sigma ) P_h g(0) \,\mathrm {d}W(\sigma ) \Big \Vert _{L_p(\Omega ;H)}\nonumber \\&\le C(p) \Bigg ( \int _{t_{m-1}}^{t_n} \big \Vert S_{k,h}(t_n - \sigma ) P_h g(0) \big \Vert _{{\mathcal {L}}_2^0}^2 \,\mathrm {d}\sigma \Bigg )^{\frac{1}{2}}\nonumber \\&\le C \Bigg ( \int _{t_{m-1}}^{t_n} \Vert g(0) \Vert _{{\mathcal {L}}_2^0}^2 \,\mathrm {d}\sigma \Bigg )^{\frac{1}{2}} = C (t_n- t_{m-1})^{\frac{1}{2}} \Vert g(0)\Vert _{{\mathcal {L}}_2^0}, \end{aligned}$$
(40)

where the constant \(C\) is again independent of \(h \in (0,1]\) and \(k \in (0,T]\).

Before we continue with the estimate of the third term \(I_3\), it is convenient to introduce the stochastic process \(\Gamma _Y :[0,T] \times \Omega \rightarrow H\), which for a given \(Y \in \mathcal {G}_p(\mathcal {T}_k)\) is defined by

$$\begin{aligned} \Gamma _Y(\sigma )&:= {\left\{ \begin{array}{ll} 0 \in H, &{} \text {for } \sigma = 0,\\ \int ^{\sigma }_{t_{j-1}} g(Y(t_{j-1})) \,\mathrm {d}W(\tau ), &{} \text {for } \sigma \in (t_{j-1}, t_j], \; j \in \{1,\ldots ,N_k\}. \end{array}\right. } \end{aligned}$$
(41)
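The structure of \(\Gamma _Y\) can be illustrated in a scalar toy setting (a purely illustrative sketch: \(H = {\mathbb {R}}\), \(W\) a scalar Brownian motion, so that \(g(Y(t_{j-1}))\) reduces to a real number; the function `sample_gamma` and its parameters are hypothetical and not part of the scheme). Since the integrand is constant on each subinterval, \(\Gamma _Y\) restarts from zero at every grid point, which is the reason for its size \(k^{\frac{1}{2}}\):

```python
import numpy as np

def sample_gamma(g_vals, k, n_sub=10, rng=None):
    """Sample the piecewise process Gamma on a grid of step size k.

    g_vals[j] plays the role of g(Y(t_{j-1})) on (t_{j-1}, t_j].  Because the
    integrand is constant on each subinterval, there
    Gamma(sigma) = g(Y(t_{j-1})) * (W(sigma) - W(t_{j-1})),
    i.e. Gamma restarts from zero at every grid point t_{j-1}.
    """
    rng = np.random.default_rng(rng)
    gammas = []
    for g_j in g_vals:
        # Brownian increments on a fine subgrid of one step of length k
        dW = rng.normal(0.0, np.sqrt(k / n_sub), size=n_sub)
        W_inc = np.cumsum(dW)          # W(sigma) - W(t_{j-1}) on the subgrid
        gammas.append(g_j * W_inc)     # Gamma restricted to (t_{j-1}, t_j]
    return np.array(gammas)
```

By the Itô isometry the value at the right endpoint of each step has variance \(g^2 k\), in line with the \(k^{\frac{1}{2}}\)-estimate that follows.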

Note that \(\Gamma _Y\) is left-continuous with existing right limits and therefore predictable. Further, it holds by Proposition 2.6

$$\begin{aligned} \sup _{\sigma \in [0,T]} \big \Vert \Gamma _Y(\sigma ) \big \Vert _{L_p(\Omega ;H)} \le C(p) k^{\frac{1}{2}} \max _{j \in \{1,\ldots ,N_k\}} \Vert g(Y(t_{j-1}))\Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)} \end{aligned}$$

for all \(p \in [2,\infty )\).
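For completeness, we sketch this bound: on each subinterval the integrand of \(\Gamma _Y\) is constant in time, so that Proposition 2.6, applied in the same form as in (40), gives for \(\sigma \in (t_{j-1},t_j]\)

$$\begin{aligned} \big \Vert \Gamma _Y(\sigma ) \big \Vert _{L_p(\Omega ;H)} \le C(p) \Big ( \int _{t_{j-1}}^{\sigma } \big \Vert g(Y(t_{j-1})) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}^2 \,\mathrm {d}\tau \Big )^{\frac{1}{2}} \le C(p) k^{\frac{1}{2}} \big \Vert g(Y(t_{j-1})) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}, \end{aligned}$$

since \(\sigma - t_{j-1} \le k\); taking the maximum over \(j\) yields the stated estimate.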

Together with the same arguments as above and Assumption 2.4, this yields for \(I_3\) that

$$\begin{aligned} I_3&= \Big \Vert \int _{t_{m-1}}^{t_n} S_{k,h}(t_n - \sigma ) P_h g'(0)\big [ \Gamma _0(\sigma ) \big ] \,\mathrm {d}W(\sigma ) \Big \Vert _{L_p(\Omega ;H)}\nonumber \\&\le C(p) \Bigg ( \int _{t_{m-1}}^{t_n} \big \Vert S_{k,h}(t_n - \sigma ) P_h g'(0)\big [ \Gamma _0(\sigma ) \big ] \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}^2 \,\mathrm {d}\sigma \Bigg )^{\frac{1}{2}}\nonumber \\&\le C \Bigg ( \int _{t_{m-1}}^{t_n} \big \Vert g'(0)\big [\Gamma _0(\sigma ) \big ] \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}^2 \,\mathrm {d}\sigma \Bigg )^{\frac{1}{2}}\nonumber \\&\le C\, (t_n- t_{m-1})^{\frac{1}{2}} k^{\frac{1}{2}} \Vert g'(0) \Vert _{{\mathcal {L}}(H;{\mathcal {L}}_2^0)} \Vert g(0)\Vert _{{\mathcal {L}}_2^0} \le C C_g^2 T^{\frac{1}{2}} (t_n- t_{m-1})^{\frac{1}{2}}. \end{aligned}$$
(42)

Hence, a combination of (39), (40) and (42) completes the proof of (27).

Next, we verify that \(\Phi _h\) also satisfies (28). For every \(p \in [2,\infty )\), for all \(Y, Z \in \mathcal {G}_p(\mathcal {T}_k)\) and \(n \in \{1,\ldots ,N_k\}\) it holds

$$\begin{aligned}&\Big \Vert \sum _{j = 1}^n S_{k,h}^{n-j} \big ( \Phi _h(Y(t_{j-1}),t_{j-1},k) - \Phi _h(Z(t_{j-1}),t_{j-1},k) \big ) \Big \Vert _{L_p(\Omega ;H)} \\&\quad \le \Big \Vert \sum _{j = 1}^n S_{k,h}^{n-j+1} \big ( f(Y(t_{j-1})) - f(Z(t_{j-1})) \big ) k \Big \Vert _{L_p(\Omega ;H)} \\&\quad \; + \,\Big \Vert \sum _{j = 1}^n S_{k,h}^{n-j+1} \big ( g(Y(t_{j-1})) - g(Z(t_{j-1})) \big ) \big (W(t_{j}) - W(t_{j-1})\big )\Big \Vert _{L_p(\Omega ;H)}\\&\quad \; + \,\Big \Vert \sum _{j = 1}^n S_{k,h}^{n-j+1} \int _{t_{j-1}}^{t_j} g'(Y(t_{j-1}))\big [ \Gamma _Y(\sigma ) \big ] \!-\! g'(Z(t_{j-1}))\big [ \Gamma _Z(\sigma ) \big ] \,\mathrm {d}W(\sigma ) \Big \Vert _{L_p(\Omega ;H)}\\&\quad =: I_4 + I_5 + I_6. \end{aligned}$$

Again, we bound the three terms separately. For \(I_4\) we apply (13) and (16) and obtain

$$\begin{aligned} I_4&\le k \sum _{j = 1}^n \big \Vert A_h^{\frac{1}{2}} S_{k,h}^{n-j+1} A_h^{-\frac{1}{2}} P_h \big ( f(Y(t_{j-1})) - f(Z(t_{j-1})) \big ) \big \Vert _{L_p(\Omega ;H)}\\&\le k \sum _{j = 1}^n \big (t_n - t_{j-1}\big )^{-\frac{1}{2}} \big \Vert f(Y(t_{j-1})) - f(Z(t_{j-1})) \big \Vert _{L_p(\Omega ;\dot{H}^{-1})}. \end{aligned}$$

Therefore, by an application of Assumption 2.3 and the Cauchy–Schwarz inequality we get

$$\begin{aligned} I_4^2&\le C_f^2 k^2 \Big (\sum _{j = 1}^n \big (t_n - t_{j-1}\big )^{-\frac{1}{4}} \big (t_n - t_{j-1}\big )^{-\frac{1}{4}} \big \Vert Y(t_{j-1}) - Z(t_{j-1}) \big \Vert _{L_p(\Omega ;H)} \Big )^2\\&\le C_f^2 k^2 \sum _{j = 1}^n \big (t_n - t_{j-1}\big )^{-\frac{1}{2}} \sum _{j = 1}^n \big (t_n - t_{j-1}\big )^{-\frac{1}{2}} \big \Vert Y(t_{j-1}) - Z(t_{j-1}) \big \Vert _{L_p(\Omega ;H)}^2. \end{aligned}$$

After applying (32) the estimate of \(I_4^2\) is in the desired form of (28), that is

$$\begin{aligned} I_4^2 \le 2 C_f^2 T^{\frac{1}{2}} k \sum _{j = 1}^n \big (t_n - t_{j-1}\big )^{-\frac{1}{2}} \big \Vert Y(t_{j-1}) - Z(t_{j-1}) \big \Vert _{L_p(\Omega ;H)}^2. \end{aligned}$$
(43)
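For the reader's convenience we recall the elementary estimate behind (32) (assuming (32) denotes this standard comparison of the singular sum with an integral): since \(t_n - t_{j-1} = t_{n-j+1}\),

$$\begin{aligned} k \sum _{j = 1}^n \big (t_n - t_{j-1}\big )^{-\frac{1}{2}} = k \sum _{i = 1}^n t_i^{-\frac{1}{2}} \le \int _0^{t_n} \sigma ^{-\frac{1}{2}} \,\mathrm {d}\sigma = 2 t_n^{\frac{1}{2}} \le 2 T^{\frac{1}{2}}, \end{aligned}$$

where the integral comparison uses \(\sigma ^{-\frac{1}{2}} \ge t_i^{-\frac{1}{2}}\) for \(\sigma \in (t_{i-1}, t_i]\). This is the source of the factor \(2 T^{\frac{1}{2}}\) in (43).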

In order to apply Proposition 2.6 to the remaining two terms \(I_5\) and \(I_6\), we again write the sum as an integral in each term. For this we define

$$\begin{aligned} g_Y(\sigma )&:= \mathbb {I}_{(t_{j-1},t_j]}(\sigma ) g(Y(t_{j-1})), \\ g'_Y(\sigma )&:= \mathbb {I}_{(t_{j-1},t_j]}(\sigma ) g'(Y(t_{j-1}))\big [ \Gamma _Y(\sigma ) \big ] \end{aligned}$$

for all \(Y \in \mathcal {G}_p(\mathcal {T}_k)\) and \(\sigma \in [0,T]\). Then \(I_5\) is estimated by applying Proposition 2.6 and (19), which give

$$\begin{aligned} I_5&= \Big \Vert \int _{0}^{t_n} S_{k,h}(t_n - \sigma ) P_h \big ( g_Y(\sigma ) - g_Z(\sigma ) \big ) \,\mathrm {d}W(\sigma ) \Big \Vert _{L_p(\Omega ;H)}\\&\le C(p) \Bigg ( \int _{0}^{t_n} \big \Vert S_{k,h}(t_n - \sigma ) P_h \big ( g_Y(\sigma ) - g_Z(\sigma )\big ) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}^2 \,\mathrm {d}\sigma \Bigg )^{\frac{1}{2}} \\&\le C \Bigg ( \int _{0}^{t_n} \big \Vert g_Y(\sigma ) - g_Z(\sigma ) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}^2 \,\mathrm {d}\sigma \Bigg )^{\frac{1}{2}}\\&= C \Bigg ( k \sum _{j = 1}^n \big \Vert g(Y(t_{j-1})) - g(Z(t_{j-1})) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}^2 \Bigg )^{\frac{1}{2}}. \end{aligned}$$

Then Assumption 2.4 yields

$$\begin{aligned} I_5^2&\le C^2 C_g k \sum _{j = 1}^n \big \Vert Y(t_{j-1}) - Z(t_{j-1}) \big \Vert _{L_p(\Omega ;H)}^2 \nonumber \\&\le C^2 T^{\frac{1}{2}} C_g k \sum _{j = 1}^n \big (t_n - t_{j-1}\big )^{-\frac{1}{2}} \big \Vert Y(t_{j-1}) - Z(t_{j-1}) \big \Vert _{L_p(\Omega ;H)}^2. \end{aligned}$$
(44)

It remains to prove a similar estimate for \(I_6\). As above, Proposition 2.6 and (19) yield

$$\begin{aligned} I_6&= \Big \Vert \int _{0}^{t_n} S_{k,h}(t_n - \sigma ) P_h \big ( g_Y'(\sigma ) - g_Z'(\sigma ) \big ) \,\mathrm {d}W(\sigma ) \Big \Vert _{L_p(\Omega ;H)}\\&\le C \Big ( \sum _{j = 1}^n \int _{t_{j-1}}^{t_j} \big \Vert g'(Y(t_{j-1}))[\Gamma _Y(\sigma )] - g'(Z(t_{j-1}))[\Gamma _Z(\sigma )] \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}^2 \,\mathrm {d}\sigma \Big )^{\frac{1}{2}}. \end{aligned}$$

Next, since

$$\begin{aligned} g'(Y(t_{j-1}))[\Gamma _Y(\sigma )] = \int _{t_{j-1}}^{\sigma } g'(Y(t_{j-1})) g(Y(t_{j-1})) \,\mathrm {d}W(\sigma _2) \in L_p(\Omega ;{\mathcal {L}}_2^0), \end{aligned}$$

we obtain by a further application of Proposition 2.6 and (8)

$$\begin{aligned} I_6^2&\le C k^2 \sum _{j = 1}^n \big \Vert g'(Y(t_{j-1}))g(Y(t_{j-1})) - g'(Z(t_{j-1}))g(Z(t_{j-1})) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2(U_0,{\mathcal {L}}_2^0))}^2\\&\le C T^{\frac{3}{2}} C_g k \sum _{j = 1}^n \big (t_n - t_{j-1}\big )^{-\frac{1}{2}} \big \Vert Y(t_{j-1}) - Z(t_{j-1}) \big \Vert _{L_p(\Omega ;H)}^2. \end{aligned}$$
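The second inequality relies on a Lipschitz bound for the composed term, which can be obtained by inserting a mixed term (a sketch, assuming \(g\) and \(g'\) are bounded and Lipschitz continuous in the sense of Assumption 2.4):

$$\begin{aligned}&g'(Y(t_{j-1}))g(Y(t_{j-1})) - g'(Z(t_{j-1}))g(Z(t_{j-1})) \\&\quad = g'(Y(t_{j-1})) \big ( g(Y(t_{j-1})) - g(Z(t_{j-1})) \big ) + \big ( g'(Y(t_{j-1})) - g'(Z(t_{j-1})) \big ) g(Z(t_{j-1})), \end{aligned}$$

and both summands are dominated by a constant times \(\Vert Y(t_{j-1}) - Z(t_{j-1}) \Vert _{L_p(\Omega ;H)}\) in the norm of \(L_p(\Omega ;{\mathcal {L}}_2(U_0,{\mathcal {L}}_2^0))\).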

Hence, together with (43) and (44) the proof of (28) is complete, where the constant \(C_\Phi \) can also be chosen to be independent of \(h \in (0,1]\).

Concerning the measurability of \(\Phi _h\), it is clear that for every \((x,t,k) \in H \times \mathbb {T}\) we have \(\Phi _h(x,t,k) \in L_p(\Omega ,{\mathcal {F}}_{t+k},{\mathbf {P}};H)\). In addition, the same arguments that were used in the analysis of the terms \(I_4\) to \(I_6\) yield the continuity of \(x \mapsto \Phi _h(x,t,k)\) as a mapping from \(H\) to \(L_p(\Omega ;H)\). From this we directly deduce the measurability of the mapping \((x, \omega ) \mapsto \Phi _h(x,t,k)(\omega )\) with respect to \(\mathcal {B}(H) \otimes {\mathcal {F}}_{t + k}/ \mathcal {B}(H)\).

Finally, after a short inspection of the proof of Theorem 3.8 we note the following: Since all constants can be chosen to be independent of \(h \in (0,1]\), there also exists a choice of the stability constant \(\mathrm {C}_{\mathrm {Stab}}\) for the Milstein–Galerkin finite element scheme (4), which is likewise independent of the parameter \(h \in (0,1]\). \(\square \)

5 Consistency of the Milstein scheme

The aim of this section is to investigate whether the Milstein scheme is consistent. Our result is summarized in the following theorem. Its proof is based on the decomposition of the local truncation error given in Lemma 3.11 and is split over a series of lemmas.

Theorem 5.1

Let Assumptions 2.7 and 2.9 be satisfied by the spatial discretization. If Assumptions 2.2 to 2.4 are fulfilled, then the local truncation error of the scheme (4) satisfies

$$\begin{aligned} \big \Vert \mathcal {R}_k \big [ X|_{\mathcal {T}_k} \big ] \big \Vert _{-1,p} \le C \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ) \end{aligned}$$

for all \(h \in (0,1]\) and \(k \in (0,T]\). In particular, if \(h\) and \(k\) are coupled by \(h := c k^{\frac{1}{2}}\) for a positive constant \(c \in {\mathbb {R}}\), then the Milstein scheme is consistent of order \(\frac{1+r}{2}\).
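The role of the coupling \(h := c k^{\frac{1}{2}}\) can be illustrated by a small sanity check. The following sketch does not simulate the scheme; it merely evaluates a hypothetical error model with exactly the decay of the bound in Theorem 5.1 and reads off the observed order in \(k\):

```python
import math

def error_model(k, r, c=1.0, C=1.0):
    # Hypothetical error model mimicking the bound C * (h^{1+r} + k^{(1+r)/2})
    # of Theorem 5.1 under the coupling h = c * k^{1/2}.
    h = c * math.sqrt(k)
    return C * (h ** (1 + r) + k ** ((1 + r) / 2))

def observed_order(r, c=1.0, k=1e-3):
    # Estimated convergence order obtained by halving the time step.
    e1 = error_model(k, r, c)
    e2 = error_model(k / 2, r, c)
    return math.log(e1 / e2, 2)

# Under the coupling both error contributions scale like k^{(1+r)/2},
# so the observed order equals (1+r)/2, e.g. 0.75 for r = 1/2.
```

For \(r = \frac{1}{2}\) this returns \(0.75\), matching the consistency order \(\frac{1+r}{2}\) stated in the theorem.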

Lemma 5.2

(Consistency of the initial condition) Let Assumption 2.2 be satisfied with \(r \in [0,1]\). Under Assumption 2.7 it holds

$$\begin{aligned} \Vert X(0) - \xi _h \Vert _{L_p(\Omega ;H)} \le C h^{1+r} \end{aligned}$$

for \(\xi _h = P_h X_0\) and for all \(h \in (0,1]\).

Proof

By the best approximation property of the orthogonal projector \(P_h :H \rightarrow V_h\) and by Assumption 2.7 it holds

$$\begin{aligned} \Vert X(0) \!-\! \xi _h \Vert _{L_p(\Omega ;H)}&= \Vert (\mathrm {Id}_H - P_h) X_0 \Vert _{L_p(\Omega ;H)} \!\le \! \Vert (\mathrm {Id}_H \!-\! R_h) X_0 \Vert _{L_p(\Omega ;H)} \le C h^{1+r} \end{aligned}$$

for all \(h \in (0,1]\). \(\square \)

The next three lemmas are concerned with the consistency of the family of linear operators \(S_{k,h}\), \(k \in (0,T]\), \(h \in (0,1]\).

Lemma 5.3

Let Assumption 2.2 be satisfied for some \(r \in [0,1]\). If the spatial discretization satisfies Assumption 2.7 it holds

$$\begin{aligned} \max _{n \in \{1,\ldots ,N_k\}} \big \Vert \big ( S(t_n) - S_{k,h}^n \big ) X_0 \big \Vert _{L_p(\Omega ;H)} \le C \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ) \big \Vert A^{\frac{1+r}{2}} X_0 \big \Vert _{L_p(\Omega ;H)} \end{aligned}$$

for all \(h \in (0,1]\) and \(k \in (0,T]\).

Proof

The term on the left-hand side of the inequality is the error of the fully discrete approximation scheme for the linear Cauchy problem \(u_t = Au\) with the initial condition being a random variable. By Assumption 2.2 we have that \(X_0(\omega ) \in \dot{H}^{1+r}\) for \({\mathbf {P}}\)-almost all \(\omega \in \Omega \). Thus, the error estimate from [32, Theorem 7.8] yields

$$\begin{aligned} \big \Vert \big ( S(t_n) - S_{k,h}^n \big ) X_0 \big \Vert _{L_p(\Omega ;H)} \le C \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ) \big \Vert A^{\frac{1+r}{2}} X_0 \big \Vert _{L_p(\Omega ;H)}, \end{aligned}$$

for all \(h \in (0,1]\), \(k \in (0,T]\), where the constant \(C\) is also independent of \(n \in \{1,\ldots ,N_k\}\). \(\square \)

Lemma 5.4

Let Assumptions 2.2 to 2.4 be satisfied for some \(r \in [0,1]\). If the spatial discretization satisfies Assumption 2.7 it holds

$$\begin{aligned} \max _{n \in \{1,\ldots ,N_k\}} \Big \Vert \sum _{j = 1}^n \int _{t_{j-1}}^{t_j} \big ( S(t_n - \sigma ) - S_{k,h}^{n-j+1} \big ) f(X(\sigma )) \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)} \le C \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ) \end{aligned}$$

for all \(h \in (0,1]\) and \(k \in (0,T]\).

Proof

First, by recalling (17) it is convenient to write

$$\begin{aligned} \sum _{j = 1}^n \int _{t_{j-1}}^{t_j} \big ( S(t_n - \sigma ) - S_{k,h}^{n-j+1} \big ) f(X(\sigma )) \,\mathrm {d}\sigma = \int _{0}^{t_n} F_{k,h}(t_n - \sigma ) f(X(\sigma )) \,\mathrm {d}\sigma \end{aligned}$$

for all \(n \in \{1,\ldots ,N_k\}\). Then, it follows

$$\begin{aligned}&\Big \Vert \sum _{j = 1}^n \int _{t_{j-1}}^{t_j} \big ( S(t_n - \sigma ) - S_{k,h}^{n-j+1} \big ) f(X(\sigma )) \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)} \\&\quad \le \Big \Vert \int _{0}^{t_n} F_{k,h}(t_n - \sigma ) \big (f(X(\sigma )) - f(X(t_n))\big ) \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)}\\&\qquad +\, \Big \Vert \int _{0}^{t_n} F_{k,h}(t_n - \sigma ) f(X(t_n)) \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)} =: J_n^1 + J_n^2 \end{aligned}$$

for all \(n \in \{1,\ldots ,N_k\}\). We estimate the two summands separately. For \(J_{n}^1\) we apply Lemma 2.8 (iii) with \(\rho = 1 - r\) and obtain

$$\begin{aligned} J^1_n&\le \int _{0}^{t_n} \big \Vert F_{k,h}(t_n - \sigma ) \big (f(X(\sigma )) - f(X(t_n))\big ) \big \Vert _{L_p(\Omega ;H)} \,\mathrm {d}\sigma \\&\le C \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ) \int _{0}^{t_n} (t_n - \sigma )^{-1} \big \Vert f(X(\sigma )) - f(X(t_n)) \big \Vert _{L_p(\Omega ;\dot{H}^{-1+r})} \,\mathrm {d}\sigma \\&\le C \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ) \int _{0}^{t_n} (t_n - \sigma )^{-1 + \frac{1}{2}} \,\mathrm {d}\sigma \le C T^{\frac{1}{2}} \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ), \end{aligned}$$

where we also applied (7) and (11).

The term \(J_n^2\) is estimated by an application of Lemma 2.10 (i) with \(\rho = 1 - r\), Assumption 2.3 and (10), which yield

$$\begin{aligned} J_n^2&\le C \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ) \big \Vert f(X(t_n)) \big \Vert _{L_p(\Omega ;\dot{H}^{-1 + r})}\\&\le C \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ) \Big ( 1 + \sup _{\sigma \in [0,T]} \Vert X(\sigma ) \Vert _{L_p(\Omega ;H)} \Big ), \end{aligned}$$

for all \(h \in (0,1]\), \(k \in (0,T]\) and \(n \in \{1, \ldots , N_k\}\). This completes the proof of Lemma 5.4. \(\square \)

Lemma 5.5

Let Assumptions 2.2 to 2.4 be satisfied for some \(r \in [0,1)\). If the spatial discretization satisfies Assumptions 2.7 and 2.9 it holds

$$\begin{aligned}&\max _{n \in \{1,\ldots ,N_k\}} \Big \Vert \sum _{j = 1}^n \int _{t_{j-1}}^{t_j} \big ( S(t_n - \sigma ) - S_{k,h}^{n-j+1} \big ) g(X(\sigma )) \,\mathrm {d}W(\sigma ) \Big \Vert _{L_p(\Omega ;H)}\\&\qquad \le C \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ) \end{aligned}$$

for all \(h \in (0,1]\) and \(k \in (0,T]\).

Proof

As in the proof of Lemma 5.4, by (17), we first rewrite the sum inside the norm as

$$\begin{aligned} \sum _{j = 1}^n \int _{t_{j-1}}^{t_j} \big ( S(t_n - \sigma ) - S_{k,h}^{n-j+1} \big ) g(X(\sigma )) \,\mathrm {d}W(\sigma ) = \int _{0}^{t_n} F_{k,h}(t_n - \sigma ) g(X(\sigma )) \,\mathrm {d}W(\sigma ) \end{aligned}$$

for all \(n \in \{1,\ldots ,N_k\}\). Then, it follows by Proposition 2.6

$$\begin{aligned}&\Big \Vert \sum _{j = 1}^n \int _{t_{j-1}}^{t_j} \big ( S(t_n - \sigma ) - S_{k,h}^{n-j+1} \big ) g(X(\sigma )) \,\mathrm {d}W(\sigma ) \Big \Vert _{L_p(\Omega ;H)} \\&\quad \le C(p) \Big ( {\mathbb {E}}\Big [ \Big ( \int _{0}^{t_n} \big \Vert F_{k,h}(t_n - \sigma ) g(X(\sigma )) \big \Vert ^2_{{\mathcal {L}}_2^0} \,\mathrm {d}\sigma \Big )^{\frac{p}{2}} \Big ] \Big )^{\frac{1}{p}} \\&\quad \le C(p) \Big ( {\mathbb {E}}\Big [ \Big ( \int _{0}^{t_n} \big \Vert F_{k,h}(t_n - \sigma ) \big ( g(X(\sigma )) - g(X(t_n)) \big ) \big \Vert ^2_{{\mathcal {L}}_2^0} \,\mathrm {d}\sigma \Big )^{\frac{p}{2}} \Big ] \Big )^{\frac{1}{p}} \\&\qquad +\,\, C(p)\Big ( {\mathbb {E}}\Big [ \Big ( \int _{0}^{t_n} \big \Vert F_{k,h}(t_n - \sigma ) g(X(t_n)) \big \Vert ^2_{{\mathcal {L}}_2^0} \,\mathrm {d}\sigma \Big )^{\frac{p}{2}} \Big ] \Big )^{\frac{1}{p}} =: C(p) \big ( J_n^3 + J_n^4 \big ) \end{aligned}$$

for all \(n \in \{1,\ldots ,N_k\}\). We estimate the two summands separately. For \(J_{n}^3\) we apply Lemma 2.8 (i) with \(\mu = 1 + r\) and \(\nu = 0\) and obtain by (8) and (11)

$$\begin{aligned} J^3_n&\le C \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ) \Bigg ( \int _{0}^{t_n} (t_n - \sigma )^{-1-r} \big \Vert g(X(\sigma )) - g(X(t_n)) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}^2 \,\mathrm {d}\sigma \Bigg )^{\frac{1}{2}} \\&\le C \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ) \Bigg ( \int _{0}^{t_n} (t_n - \sigma )^{- r} \,\mathrm {d}\sigma \Bigg )^{\frac{1}{2}} \le C T^{\frac{1-r}{2}} \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ), \end{aligned}$$

where we also applied the same technique as in the proof of the second inequality of Proposition 2.6.

For the estimate of \(J_n^4\) we first apply Lemma 2.10 (ii) with \(\rho = r\). Then (9) and (10) yield

$$\begin{aligned} J_n^4&\le C \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ) \big \Vert g(X(t_n)) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_{2,r}^0)}\\&\le C \big ( h^{1+r} + k^{\frac{1+r}{2}} \big ) \Big ( 1 + \sup _{\sigma \in [0,T]} \Vert X(\sigma ) \Vert _{L_p(\Omega ;\dot{H}^r)} \Big ), \end{aligned}$$

for all \(h \in (0,1]\), \(k \in (0,T]\) and \(n \in \{1, \ldots , N_k\}\). The proof is complete. \(\square \)

Remark 5.6

Let us stress that the case \(r=1\) is not included in Lemma 5.5. The reason lies in the estimate of the term \(J_n^3\), where a blow-up occurs for \(r = 1\). This problem can be avoided under stronger assumptions on \(g\), for example, the existence of a parameter value \(s \in (0,1]\) such that

$$\begin{aligned} \Vert g(x_1) - g(x_2) \Vert _{{\mathcal {L}}_{2,s}^0} \le C_g \Vert x_1 - x_2 \Vert _{s} \end{aligned}$$

for all \(x_1, x_2 \in \dot{H}^s\). This is often satisfied for linear \(g\) as shown in [1].

By Lemma 3.11 it therefore remains to investigate the order of convergence of the fifth and final term, which after inserting (37) is dominated by the following two summands

$$\begin{aligned}&\max _{n \in \{1,\ldots ,N_k\}} \Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j} \Big ( - \int _{t_{j-1}}^{t_j} S_k f(X(\sigma )) \,\mathrm {d}\sigma + \int _{t_{j-1}}^{t_j}S_k g(X(\sigma )) \,\mathrm {d}W(\sigma ) \nonumber \\&\qquad \qquad \qquad -\, \Phi _h(X(t_{j-1}),t_{j-1},k) \Big ) \Big \Vert _{L_p(\Omega ;H)}\nonumber \\&\quad \le \max _{n \in \{1,\ldots ,N_k\}} \Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} f(X(\sigma )) - f(X(t_{j-1})) \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)}\nonumber \\&\qquad + \max _{n \in \{1,\ldots ,N_k\}} \Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} \Big ( g(X(\sigma )) - g(X(t_{j-1}))\nonumber \\&\qquad \qquad -\, g'(X(t_{j-1}))\big [ \Gamma _{X|_{\mathcal {T}_k}}(\sigma ) \big ] \Big ) \,\mathrm {d}W(\sigma ) \Big \Vert _{L_p(\Omega ;H)}, \end{aligned}$$
(45)

where we recall from (41) that

$$\begin{aligned} \Gamma _{X|_{\mathcal {T}_k}}(\sigma )&:= {\left\{ \begin{array}{ll} 0 \in H, &{} \quad \text {for } \sigma = 0,\\ \int ^{\sigma }_{t_{j-1}} g(X(t_{j-1})) \,\mathrm {d}W(\tau ), &{} \quad \text {for } \sigma \in (t_{j-1}, t_j], \; j \in \{1,\ldots ,N_k\}. \end{array}\right. }\quad \end{aligned}$$
(46)

The remaining two lemmas in this section are concerned with the estimate of the two summands in (45).

Lemma 5.7

Let Assumptions 2.2–2.4 be satisfied for some \(r \in [0,1)\). If the spatial discretization satisfies Assumption 2.7 it holds

$$\begin{aligned} \max _{n \in \{1,\ldots ,N_k\}} \Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} f(X(\sigma )) - f(X(t_{j-1})) \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)} \le C k^{\frac{1+r}{2}} \end{aligned}$$

for all \(h \in (0,1]\) and \(k \in (0,T]\).

Proof

For every \(n \in \{1,\ldots ,N_k\}\) we first insert the conditional expectation in the following way

$$\begin{aligned}&\Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} f(X(\sigma )) - f(X(t_{j-1})) \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)} \nonumber \\&\quad \le \Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} f(X(\sigma )) - {\mathbb {E}}\big [ f(X(\sigma )) \big | {\mathcal {F}}_{t_{j-1}} \big ] \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)}\nonumber \\&\qquad +\, \Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} {\mathbb {E}}\big [ f(X(\sigma )) \big | {\mathcal {F}}_{t_{j-1}} \big ] - f(X(t_{j-1})) \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)}. \end{aligned}$$
(47)

Thus, for the summands of the first term it follows

$$\begin{aligned} {\mathbb {E}}\Big [ S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} f(X(\sigma )) - {\mathbb {E}}\big [ f(X(\sigma )) \big | {\mathcal {F}}_{t_{j-1}} \big ] \,\mathrm {d}\sigma \Big | {\mathcal {F}}_{t_{\ell }} \Big ] = 0 \in H \end{aligned}$$

for every \(j,\ell \in \{1, \ldots , n\}\) with \(\ell < j\). Consequently, by setting

$$\begin{aligned} M_i := \sum _{j = 1}^i S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} f(X(\sigma )) - {\mathbb {E}}\big [ f(X(\sigma )) \big | {\mathcal {F}}_{t_{j-1}} \big ] \,\mathrm {d}\sigma , \quad i \in \{0,1,\ldots ,n\}, \end{aligned}$$

we obtain a discrete-time martingale in \(L_p(\Omega ;H)\). Thus, Burkholder’s inequality [6, Th. 3.3] is applicable and, together with (16) and (13), yields

$$\begin{aligned}&\Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} f(X(\sigma )) - {\mathbb {E}}\big [ f(X(\sigma )) \big | {\mathcal {F}}_{t_{j-1}} \big ] \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)}\\&\; \le C \Bigg ( {\mathbb {E}}\Big [ \Big ( \sum _{j = 1}^n \Big \Vert S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} f(X(\sigma )) - {\mathbb {E}}\big [ f(X(\sigma )) \big | {\mathcal {F}}_{t_{j-1}} \big ] \,\mathrm {d}\sigma \Big \Vert ^2 \Big )^{\frac{p}{2}} \Big ] \Bigg )^{\frac{1}{p}} \\&\; \le C \Bigg ( \sum _{j = 1}^n \Big \Vert A_h^{\frac{1}{2}} S_{k,h}^{n-j +1 }\! \int _{t_{j-1}}^{t_j} A_h^{-\frac{1}{2}} P_h \big ( f(X(\sigma )) \!-\! {\mathbb {E}}\big [ f(X(\sigma )) \big | {\mathcal {F}}_{t_{j-1}} \big ] \big ) \,\mathrm {d}\sigma \Big \Vert ^2_{L_p(\Omega ;H)} \Bigg )^{\frac{1}{2}}\\&\; \le C \Bigg ( \sum _{j = 1}^n t_{n-j +1}^{-1} k \int _{t_{j-1}}^{t_j} \big \Vert f(X(\sigma )) - {\mathbb {E}}\big [ f(X(\sigma )) \big | {\mathcal {F}}_{t_{j-1}} \big ] \big \Vert _{L_p(\Omega ;\dot{H}^{-1})}^2 \,\mathrm {d}\sigma \Bigg )^{\frac{1}{2}}. \end{aligned}$$

In addition, since \(\Vert {\mathbb {E}}[ G | {\mathcal {F}}_{t_{j-1}}] \Vert _{L_p(\Omega ;\dot{H}^{-1})} \le \Vert G \Vert _{L_p(\Omega ;\dot{H}^{-1})}\) for all \(G \in L_p(\Omega ;\dot{H}^{-1})\) it follows for all \(\sigma \in [t_{j-1}, t_j]\)

$$\begin{aligned}&\big \Vert f(X(\sigma )) - {\mathbb {E}}\big [ f(X(\sigma )) \big | {\mathcal {F}}_{t_{j-1}} \big ] \big \Vert _{L_p(\Omega ;\dot{H}^{-1})} \\&\quad = \big \Vert f(X(\sigma )) - f(X(t_{j-1})) + {\mathbb {E}}\big [f(X(t_{j-1})) - f(X(\sigma )) \big | {\mathcal {F}}_{t_{j-1}} \big ] \big \Vert _{L_p(\Omega ;\dot{H}^{-1})}\\&\quad \le 2 \big \Vert f(X(\sigma )) - f(X(t_{j-1}))\big \Vert _{L_p(\Omega ;\dot{H}^{-1})} \le C | \sigma - t_{j-1}|^{\frac{1}{2}}, \end{aligned}$$

where we also used (7) and (11) in the last step. Therefore, in the same way as in (32) the estimate of the first summand in (47) is completed by

$$\begin{aligned}&\Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} f(X(\sigma )) - {\mathbb {E}}\big [ f(X(\sigma )) \big | {\mathcal {F}}_{t_{j-1}} \big ] \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)}\\&\quad \le C \Bigg ( \sum _{j = 1}^n t_{n-j +1}^{-1} k^3 \Bigg )^{\frac{1}{2}} \le C \Bigg ( k^3 \sum _{j = 1}^n t_{n-j +1}^{-r} t_{n-j +1}^{-1+r} \Bigg )^{\frac{1}{2}}\\&\quad \le C \Bigg ( k^3 \sum _{j = 1}^n t_{n-j +1}^{-r} k^{-1+r} \Bigg )^{\frac{1}{2}} \le C k^{\frac{1+r}{2}}. \end{aligned}$$
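In the last two steps we used that \(t_{n-j+1} \ge k\) implies \(t_{n-j+1}^{-1+r} \le k^{-1+r}\), and that the remaining singular sum is again controlled by an integral comparison as in (32); explicitly, for \(r \in [0,1)\),

$$\begin{aligned} k^3 \sum _{j = 1}^n t_{n-j+1}^{-r} k^{-1+r} = k^{1+r} \Big ( k \sum _{i = 1}^n t_i^{-r} \Big ) \le k^{1+r} \int _0^{t_n} \sigma ^{-r} \,\mathrm {d}\sigma = \frac{t_n^{1-r}}{1-r}\, k^{1+r} \le C k^{1+r}, \end{aligned}$$

and taking the square root gives the order \(k^{\frac{1+r}{2}}\).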

For the second summand in (47) we make use of the mean value theorem for Fréchet differentiable mappings, which reads

$$\begin{aligned} f(X(\tau _1)) = f(X(\tau _2)) + \displaystyle \int _{0}^{1} f'(X(\tau _2)+ s (X(\tau _1) - X(\tau _2))) \big [ X(\tau _1) - X(\tau _2) \big ] \,\mathrm {d}s \end{aligned}$$

for all \(\tau _1,\tau _2 \in [0,T]\). For convenience we introduce the shorthand notation

$$\begin{aligned} f'(\tau _1,\tau _2;s) := f'(X(\tau _2)+ s (X(\tau _1) - X(\tau _2))) \end{aligned}$$

for all \(\tau _1,\tau _2 \in [0,T]\) and \(s \in [0,1]\). Then, by inserting (3) we obtain the identity

$$\begin{aligned}&{\mathbb {E}}\big [ f(X(\sigma )) | {\mathcal {F}}_{t_{j-1}}\big ] - f(X(t_{j-1})) \\&\quad = {\mathbb {E}}\Big [\int _{0}^{1} f'(\sigma ,t_{j-1};s)\big [ \big (S(\sigma -t_{j-1}) - \mathrm {Id}_H\big ) X(t_{j-1}) \big ] \,\mathrm {d}s \Big | {\mathcal {F}}_{t_{j-1}} \Big ] \\&\qquad -\, {\mathbb {E}}\Big [ \int _{0}^{1} f'(\sigma ,t_{j-1};s) \Big [ \int _{t_{j-1}}^{\sigma } S(\sigma - \tau ) f(X(\tau )) \,\mathrm {d}\tau \Big ]\,\mathrm {d}s \Big | {\mathcal {F}}_{t_{j-1}} \Big ] \\&\qquad + \, {\mathbb {E}}\Big [ \int _{0}^{1} f'(\sigma ,t_{j-1};s) \Big [ \int _{t_{j-1}}^{\sigma } S(\sigma - \tau ) g(X(\tau )) \,\mathrm {d}W(\tau ) \Big ] \,\mathrm {d}s \Big | {\mathcal {F}}_{t_{j-1}} \Big ]\\&\quad =: \Theta _1 (\sigma ,t_{j-1}) + \Theta _2 (\sigma ,t_{j-1}) + \Theta _3 (\sigma ,t_{j-1}), \end{aligned}$$

which holds \({\mathbf {P}}\)-almost surely. Hence, the second summand in (47) satisfies

$$\begin{aligned}&\Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} {\mathbb {E}}\big [ f(X(\sigma )) | {\mathcal {F}}_{t_{j-1}}\big ] - f(X(t_{j-1})) \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)} \nonumber \\&\quad = \Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} \Theta _1 (\sigma ,t_{j-1}) + \Theta _2 (\sigma ,t_{j-1}) + \Theta _3 (\sigma ,t_{j-1}) \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)}\nonumber \\&\quad \le C \sum _{j = 1}^n t_{n-j+1}^{-\frac{1}{2}} \int _{t_{j-1}}^{t_j} \big \Vert \Theta _1 (\sigma ,t_{j-1}) + \Theta _2 (\sigma ,t_{j-1}) + \Theta _3 (\sigma ,t_{j-1})\big \Vert _{L_p(\Omega ;\dot{H}^{-1})} \,\mathrm {d}\sigma \nonumber \\ \end{aligned}$$
(48)

for every \(n \in \{1,\ldots ,N_k\}\), where we again applied (16) and (13) in the last step.

Below we show that

$$\begin{aligned} \big \Vert \Theta _i (\sigma ,t_{j-1})\big \Vert _{L_p(\Omega ;\dot{H}^{-1})} \le C | \sigma - t_{j-1} |^{\frac{1+r}{2}}, \quad \text {for } i \in \{1,2,3\}. \end{aligned}$$
(49)

Then this is used to complete the estimate of (48) by

$$\begin{aligned}&\displaystyle \Big \Vert \sum \limits _{j = 1}^{n} S_{k,h}^{n-j +1 } \int \nolimits _{t_{j-1}}^{t_j} {\mathbb {E}}\big [ f(X(\sigma )) | {\mathcal {F}}_{t_{j-1}}\big ] - f(X(t_{j-1})) \,\mathrm {d}\sigma \Big \Vert _{L_p(\Omega ;H)} \\&\displaystyle \quad \le C \sum \limits _{j = 1}^n t_{n-j+1}^{-\frac{1}{2}} \int \nolimits _{t_{j-1}}^{t_j} | \sigma - t_{j-1}|^{\frac{1+r}{2}} \,\mathrm {d}\sigma \le C k^{\frac{1+r}{2}}, \end{aligned}$$

where we again applied (32). Thus, the assertion is proved if we show (49).

For the estimation of \(\Theta _1\) we recall that \(\Vert {\mathbb {E}}[ G | {\mathcal {F}}_{t_{j-1}}] \Vert _{L_p(\Omega ;\dot{H}^{-1})} \le \Vert G \Vert _{L_p(\Omega ;\dot{H}^{-1})}\) for all \(G \in L_p(\Omega ;\dot{H}^{-1})\) and obtain

$$\begin{aligned}&\big \Vert \Theta _1 (\sigma ,t_{j-1})\big \Vert _{L_p(\Omega ;\dot{H}^{-1})}\\&\quad \le \Big \Vert \int _{0}^{1} f'(\sigma ,t_{j-1};s)\big [ \big (S(\sigma -t_{j-1}) - \mathrm {Id}_H\big ) X(t_{j-1}) \big ] \,\mathrm {d}s \Big \Vert _{L_p(\Omega ;\dot{H}^{-1})} \\&\quad \le \int _{0}^{1} \Bigg ( {\mathbb {E}}\Big [ \big \Vert f'(\sigma ,t_{j-1};s) \big \Vert _{{\mathcal {L}}(H,\dot{H}^{-1})}^p \big \Vert \big ( S(\sigma - t_{j-1}) - \mathrm {Id}_H \big ) X(t_{j-1}) \big \Vert ^p \Big ] \Bigg )^{\frac{1}{p}} \,\mathrm {d}s\\&\quad \le C \sup _{x \in H} \big \Vert f'(x) \big \Vert _{{\mathcal {L}}(H,\dot{H}^{-1})} \big \Vert \big ( S(\sigma - t_{j-1}) - \mathrm {Id}_H \big ) X(t_{j-1}) \big \Vert _{L_p(\Omega ;H)}. \end{aligned}$$

Further, from [27, Ch. 2.6, Th. 6.13] it follows

$$\begin{aligned} \big \Vert \big ( S(\sigma \!-\! t_{j-1}) \!-\! \mathrm {Id}_H \big ) X(t_{j-1}) \big \Vert _{L_p(\Omega ;H)}&\le C (\sigma - t_{j-1})^{\frac{1+r}{2}} \Vert X(t_{j-1}) \big \Vert _{L_p(\Omega ;\dot{H}^{1+r})}\\&\le C (\sigma \!-\! t_{j-1})^{\frac{1+r}{2}} \sup _{\sigma \in [0,T]} \Vert X(\sigma ) \Vert _{L_p(\Omega ;\dot{H}^{1+r})} \end{aligned}$$

for all \(\sigma \in [t_{j-1}, t_j]\). In the light of (10) this proves (49) with \(i = 1\).

By following the same steps, it holds for \(\Theta _2\)

$$\begin{aligned}&\big \Vert \Theta _2 (\sigma ,t_{j-1})\big \Vert _{L_p(\Omega ;\dot{H}^{-1})}\\&\quad \le C \sup _{x \in H} \big \Vert f'(x) \big \Vert _{{\mathcal {L}}(H,\dot{H}^{-1+r})} \Big \Vert \int _{t_{j-1}}^{\sigma } S(\sigma - \tau ) f(X(\tau )) \,\mathrm {d}\tau \Big \Vert _{L_p(\Omega ;H)}. \end{aligned}$$

Now, by applying the fact that

$$\begin{aligned} \Vert A^{\frac{1-r}{2}} S(\sigma - \tau ) \Vert _{{\mathcal {L}}(H)} \le C (\sigma - \tau )^{-\frac{1-r}{2}}, \quad \text {for all } t_{j-1} \le \tau < \sigma \le t_j, \end{aligned}$$

we get for every \(\sigma \in [t_{j-1},t_j]\)

$$\begin{aligned} \Big \Vert \int _{t_{j-1}}^{\sigma } S(\sigma \!-\! \tau ) f(X(\tau )) \,\mathrm {d}\tau \Big \Vert _{L_p(\Omega ;H)}\!&\le \! C \int _{t_{j-1}}^{\sigma } (\sigma \!-\! \tau )^{-\frac{1-r}{2}} \Vert f(X(\tau )) \Vert _{L_p(\Omega ;\dot{H}^{-1+r})} \,\mathrm {d}\tau \\ \!&\le \! C \Big ( 1 \!+\! \sup _{\sigma \in [0,T]} \big \Vert X(\sigma ) \big \Vert _{L_p(\Omega ;H)} \Big ) k^{\frac{1+r}{2}}. \end{aligned}$$

As for \(\Theta _1\) we therefore conclude

$$\begin{aligned} \big \Vert \Theta _2 (\sigma ,t_{j-1})\big \Vert _{L_p(\Omega ;\dot{H}^{-1})} \le C k^{\frac{1+r}{2}} \quad \text { for all } \sigma \in [t_{j-1},t_j]. \end{aligned}$$

For the estimate of \(\Theta _3\) we first apply the fact that

$$\begin{aligned} {\mathbb {E}}\Big [ \int _{0}^{1} f'(X(t_{j-1})) \Big [ \int _{t_{j-1}}^{\sigma } S(\sigma - \tau ) g(X(\tau )) \,\mathrm {d}W(\tau ) \Big ] \,\mathrm {d}s \Big | {\mathcal {F}}_{t_{j-1}} \Big ] = 0. \end{aligned}$$

From this we get

$$\begin{aligned}&\big \Vert \Theta _3 (\sigma ,t_{j-1})\big \Vert _{L_p(\Omega ;\dot{H}^{-1})}\\&\quad \le \displaystyle \int _{0}^{1} \Big \Vert \big ( f'(\sigma ,t_{j-1};s) \!-\! f'(X(t_{j-1})) \big ) \Big [ \int _{t_{j-1}}^{\sigma } S(\sigma \!-\! \tau ) g(X(\tau )) \,\mathrm {d}W(\tau ) \Big ]\Big \Vert _{L_p(\Omega ;\dot{H}^{-1})} \,\mathrm {d}s. \end{aligned}$$

Further, for every \(s \in [0,1]\) we derive by Hölder’s inequality

$$\begin{aligned}&\Big \Vert \big ( f'(\sigma ,t_{j-1};s) - f'(X(t_{j-1})) \big ) \Big [ \int _{t_{j-1}}^{\sigma } S(\sigma - \tau ) g(X(\tau )) \,\mathrm {d}W(\tau ) \Big ]\Big \Vert _{L_p(\Omega ;\dot{H}^{-1})}\\&\quad \le \Bigg ( {\mathbb {E}}\Big [ \big \Vert f'(\sigma ,t_{j-1};s) - f'(X(t_{j-1})) \big \Vert _{{\mathcal {L}}(H,\dot{H}^{-1})}^p \Big \Vert \int _{t_{j-1}}^{\sigma } S(\sigma - \tau ) g(X(\tau )) \,\mathrm {d}W(\tau ) \Big \Vert ^p \Big ] \Bigg )^{\frac{1}{p}}\\&\quad \le \big \Vert f'( X(t_{j-1}) + s (X(\sigma ) - X(t_{j-1}))) - f'(X(t_{j-1})) \big \Vert _{L_{2p}(\Omega ;{\mathcal {L}}(H,\dot{H}^{-1}))} \\&\qquad \times \, \Big \Vert \int _{t_{j-1}}^{\sigma } S(\sigma - \tau ) g(X(\tau )) \,\mathrm {d}W(\tau ) \Big \Vert _{L_{2p}(\Omega ;H)}. \end{aligned}$$

Now, we have by (7) and (11)

$$\begin{aligned}&\big \Vert f'( X(t_{j-1}) + s (X(\sigma ) - X(t_{j-1}))) - f'(X(t_{j-1})) \big \Vert _{L_{2p}(\Omega ;{\mathcal {L}}(H,\dot{H}^{-1}))}\\&\quad \le C_f \big \Vert X(\sigma ) - X(t_{j-1}) \big \Vert _{L_{2p}(\Omega ;H)} \le C (\sigma - t_{j-1})^{\frac{1}{2}} \end{aligned}$$

for all \(s \in [0,1]\) and \(\sigma \in [t_{j-1},t_j]\). In addition, by Proposition 2.6 it holds that

$$\begin{aligned}&\Big \Vert \int _{t_{j-1}}^{\sigma } S(\sigma - \tau ) g(X(\tau )) \,\mathrm {d}W(\tau ) \Big \Vert _{L_{2p}(\Omega ;H)}\\&\quad \le C \Big ( \int _{t_{j-1}}^{\sigma } \big \Vert S(\sigma - \tau ) g(X(\tau )) \big \Vert ^2_{L_{2p}(\Omega ;{\mathcal {L}}_2^0)} \,\mathrm {d}\tau \Big )^{\frac{1}{2}} \le C \Big ( 1 + \sup _{\tau \in [0,T]} \big \Vert X(\tau ) \big \Vert _{L_{2p}(\Omega ;H)} \Big ) k^{\frac{1}{2}} \end{aligned}$$

for all \(\sigma \in [t_{j-1},t_j]\). This completes the estimate of \(\Theta _3\) and, therefore, also the proof of the lemma. \(\square \)

The last building block in the proof of consistency is the following lemma.

Lemma 5.8

Let Assumptions 2.2–2.4 be satisfied for some \(r \in [0,1)\). If the spatial discretization satisfies Assumption 2.7 it holds

$$\begin{aligned}&\displaystyle \max _{n \in \{1,\ldots ,N_k\}} \Big \Vert \sum \limits _{j = 1}^{n} S_{k,h}^{n-j +1 } \int \nolimits _{t_{j-1}}^{t_j} \Big ( g(X(\sigma )) - g(X(t_{j-1}))\\&\qquad \qquad \qquad - g'(X(t_{j-1}))\Big [ \displaystyle \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W(\tau )\Big ] \Big ) \,\mathrm {d}W(\sigma ) \Big \Vert _{L_p(\Omega ;H)} \le C k^{\frac{1+r}{2}} \end{aligned}$$

for all \(h \in (0,1]\) and \(k \in (0,T]\).

Proof

The proof mainly applies the same techniques as used in the proof of Lemma 5.7. First let us fix an arbitrary \(n \in \{1,\ldots ,N_k\}\) and recall the notation \(\Gamma _X := \Gamma _{X|_{\mathcal {T}_k}}\) from (46). Then we note that

$$\begin{aligned}&{\mathbb {E}}\Big [ S_{k,h}^{n-j+1} \int _{t_{j-1}}^{t_j} g(X(\sigma )) - g(X(t_{j-1})) - g'(X(t_{j-1}))\big [ \Gamma _X(\sigma ) \big ] \,\mathrm {d}W(\sigma ) \Big | {\mathcal {F}}_{t_{j-1}} \Big ] = 0 \end{aligned}$$

for every \(j \in \{1,\ldots ,n\}\). Hence, the sum of these terms is a discrete time martingale in \(L_p(\Omega ;H)\). We therefore first apply Burkholder’s inequality [6, Th. 3.3] and then Proposition 2.6 and obtain

$$\begin{aligned}&\Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} g(X(\sigma )) - g(X(t_{j-1})) - g'(X(t_{j-1}))\big [ \Gamma _X(\sigma ) \big ] \,\mathrm {d}W(\sigma ) \Big \Vert _{L_p(\Omega ;H)}\\&\quad \le C \Bigg ( {\mathbb {E}}\Big [ \Bigg ( \sum _{j = 1}^n \Big \Vert S_{k,h}^{n-j+1} \int _{t_{j-1}}^{t_j} \Bigg ( g(X(\sigma )) - g(X(t_{j-1})) \\&\quad \qquad -\, g'(X(t_{j-1}))\big [ \Gamma _X(\sigma ) \big ] \Bigg ) \,\mathrm {d}W(\sigma ) \Big \Vert ^{2} \Bigg )^{\frac{p}{2}} \Big ] \Bigg )^{\frac{1}{p}} \\&\quad \le C \Big ( \sum _{j = 1}^n \Big \Vert \int _{t_{j-1}}^{t_j} \Big ( g(X(\sigma )) - g(X(t_{j-1})) \\&\quad \qquad -\, g'(X(t_{j-1}))\big [ \Gamma _X(\sigma ) \big ] \Big ) \,\mathrm {d}W(\sigma ) \Big \Vert ^{2}_{L_p(\Omega ;H)} \Big )^{\frac{1}{2}}\\&\quad \le C \Big ( \sum _{j = 1}^n \int _{t_{j-1}}^{t_j} \big \Vert g(X(\sigma )) - g(X(t_{j-1})) - g'(X(t_{j-1}))\big [ \Gamma _X(\sigma ) \big ] \big \Vert ^{2}_{L_p(\Omega ;{\mathcal {L}}_2^0)} \,\mathrm {d}\sigma \Big )^{\frac{1}{2}}. \end{aligned}$$

Consequently, if we show that there exists a constant \(C\) such that

$$\begin{aligned} \big \Vert g(X(\sigma )) - g(X(t_{j-1})) - g'(X(t_{j-1}))\big [ \Gamma _X(\sigma ) \big ] \big \Vert ^{2}_{L_p(\Omega ;{\mathcal {L}}_2^0)} \le C k^{1+r} \end{aligned}$$
(50)

for all \(\sigma \in [t_{j-1},t_j]\), the proof is complete. In order to prove (50) we again apply the mean value theorem for Fréchet differentiable mappings and obtain

$$\begin{aligned} g(X(\sigma )) - g(X(t_{j-1})) = \int _0^1 g'(\sigma ,t_{j-1},s) \big [ X(\sigma ) - X(t_{j-1}) \big ] \,\mathrm {d}s, \end{aligned}$$

where we denote

$$\begin{aligned} g'(\tau _1,\tau _2,s) := g'( X(\tau _2) + s (X(\tau _1) - X(\tau _2)))\quad \text { for all } \tau _1, \tau _2 \in [0,T],\; s \in [0,1]. \end{aligned}$$

After inserting (3) we get the estimate

$$\begin{aligned}&\big \Vert g(X(\sigma )) - g(X(t_{j-1})) - g'(X(t_{j-1}))\big [ \Gamma _X(\sigma ) \big ] \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}\\&\quad \le \int _0^1 \big \Vert g'(\sigma ,t_{j-1},s) \big [ \big ( S(\sigma - t_{j-1}) - \mathrm {Id}_H \big ) X(t_{j-1}) \big ] \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)} \,\mathrm {d}s\\&\qquad + \int _0^1 \Big \Vert g'(\sigma ,t_{j-1},s) \Big [ \int _{t_{j-1}}^{\sigma }S(\sigma - \tau ) f(X(\tau )) \,\mathrm {d}\tau \Big ] \Big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)} \,\mathrm {d}s\\&\qquad + \int _0^1 \Big \Vert g'(\sigma ,t_{j-1},s) \Big [ \int _{t_{j-1}}^{\sigma } S(\sigma - \tau ) g(X(\tau )) \,\mathrm {d}W(\tau ) - \Gamma _X(\sigma )\Big ] \Big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)} \,\mathrm {d}s\\&\qquad + \int _0^1 \Big \Vert \big ( g'(\sigma ,t_{j-1},s) - g'(X(t_{j-1})) \big ) \big [ \Gamma _X(\sigma ) \big ] \Big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)} \,\mathrm {d}s\\&\quad =: J_5 + \ldots + J_8. \end{aligned}$$

We consider the terms \(J_i\), \(i \in \{5,\ldots ,8\}\), one by one. The desired estimate of \(J_5\) is obtained in the same way as for the term \(\Theta _1\) in the proof of Lemma 5.7, namely

$$\begin{aligned} J_5&\le \int _0^1 \big ( {\mathbb {E}}\big [ \big \Vert g'(\sigma ,t_{j-1},s) \big \Vert _{{\mathcal {L}}(H,{\mathcal {L}}_2^0)}^p \big \Vert \big ( S(\sigma - t_{j-1}) - \mathrm {Id}_H \big ) X(t_{j-1}) \big \Vert ^p \big ] \big )^{\frac{1}{p}} \,\mathrm {d}s\\&\le C \sup _{x \in H} \big \Vert g'(x) \big \Vert _{{\mathcal {L}}(H,{\mathcal {L}}_2^0)} \big \Vert \big ( S(\sigma - t_{j-1}) - \mathrm {Id}_H \big ) X(t_{j-1}) \big \Vert _{L_p(\Omega ;H)}\\&\le C \sup _{x \in H} \big \Vert g'(x) \big \Vert _{{\mathcal {L}}(H,{\mathcal {L}}_2^0)} \sup _{\tau \in [0,T]} \Vert X(\tau ) \Vert _{L_p(\Omega ;\dot{H}^{1+r})} k^{\frac{1+r}{2}}. \end{aligned}$$

Likewise, the estimate of \(J_6\) is done by the exact same steps as for the term \(\Theta _2\) in the proof of Lemma 5.7. Thus, it holds

$$\begin{aligned} J_6 \le C \sup _{x \in H} \big \Vert g'(x) \big \Vert _{{\mathcal {L}}(H,{\mathcal {L}}_2^0)} \Big ( 1 + \sup _{\sigma \in [0,T]} \big \Vert X(\sigma ) \big \Vert _{L_p(\Omega ;H)} \Big ) k^{\frac{1+r}{2}}. \end{aligned}$$

As above, the term \(J_7\) is first estimated by

$$\begin{aligned} J_7&\le C \sup _{x \in H} \big \Vert g'(x) \big \Vert _{{\mathcal {L}}(H,{\mathcal {L}}_2^0)} \Big \Vert \displaystyle \int _{t_{j-1}}^{\sigma } S(\sigma - \tau ) g(X(\tau )) \,\mathrm {d}W(\tau ) - \Gamma _X(\sigma )\Big \Vert _{L_p(\Omega ;H)}. \end{aligned}$$

Then, after inserting the definition (41) of \(\Gamma _X\) and an application of Proposition 2.6 we arrive at

$$\begin{aligned}&\Big \Vert \int _{t_{j-1}}^{\sigma } S(\sigma - \tau ) g(X(\tau )) \,\mathrm {d}W(\tau ) - \Gamma _X(\sigma )\Big \Vert _{L_p(\Omega ;H)}\\&\quad \le C \Big ( \int _{t_{j-1}}^{\sigma } \big \Vert S(\sigma - \tau ) g(X(\tau )) - g(X(t_{j-1})) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}^2 \,\mathrm {d}\tau \Big )^{\frac{1}{2}}\\&\quad \le C \Big ( \int _{t_{j-1}}^{\sigma } \big \Vert \big ( S(\sigma - \tau ) - \mathrm {Id}_H \big ) g(X(\tau )) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}^2 \,\mathrm {d}\tau \Big )^{\frac{1}{2}}\\&\qquad +\, C \Big ( \int _{t_{j-1}}^{\sigma } \big \Vert g(X(\tau )) - g(X(t_{j-1})) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}^2 \,\mathrm {d}\tau \Big )^{\frac{1}{2}}. \end{aligned}$$

For the first summand, recall from (9) that \(g\) provides some additional spatial regularity. Together with [27, Ch. 2.6, Th. 6.13] this yields

$$\begin{aligned}&\Big ( \int _{t_{j-1}}^{\sigma } \big \Vert \big ( S(\sigma - \tau ) - \mathrm {Id}_H \big ) g(X(\tau )) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}^2 \,\mathrm {d}\tau \Big )^{\frac{1}{2}}\\&\quad \le C \Big ( \int _{t_{j-1}}^{\sigma } (\sigma - \tau )^{r} \big \Vert g( X(\tau ) ) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_{2,r}^0)}^2 \,\mathrm {d}\tau \Big )^{\frac{1}{2}} \\&\quad \le C \Big ( 1 + \sup _{\tau \in [0,T]} \big \Vert X(\tau ) \big \Vert _{L_p(\Omega ;\dot{H}^{r})} \Big ) k^{\frac{1+r}{2}} \end{aligned}$$

for all \(\sigma \in [t_{j-1}, t_j]\). A similar estimate follows for the second summand by (8) and (11), that is

$$\begin{aligned} \Bigg ( \int _{t_{j-1}}^{\sigma } \big \Vert g(X(\tau )) \!-\! g(X(t_{j-1})) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}^2 \,\mathrm {d}\tau \Bigg )^{\frac{1}{2}} \!\le \! C \Bigg ( \int _{t_{j-1}}^{\sigma } ( \tau - t_{j-1}) \,\mathrm {d}\tau \Bigg )^{\frac{1}{2}} \le C k^{\frac{1+r}{2}}. \end{aligned}$$

This shows the desired estimate for \(J_7\) and it remains to consider \(J_8\). The estimate of \(J_8\) is very similar to the estimate of \(\Theta _3\) in the proof of Lemma 5.7. After the application of Hölder’s inequality we arrive at

$$\begin{aligned} J_8&\le \displaystyle \int _{0}^{1} \big \Vert g'\big ( X(t_{j-1}) + s \big ( X(\sigma ) - X(t_{j-1}) \big ) \big ) - g'(X(t_{j-1})) \big \Vert _{L_{2p}(\Omega ;{\mathcal {L}}(H,{\mathcal {L}}_2^0))}\,\mathrm {d}s\\&\qquad \times \Big \Vert \displaystyle \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W(\tau ) \Big \Vert _{L_{2p}(\Omega ;H)}. \end{aligned}$$

Next, by (8) and (11) it follows

$$\begin{aligned} \big \Vert g'\big ( X(t_{j-1}) + s \big ( X(\sigma ) - X(t_{j-1}) \big ) \big ) - g'(X(t_{j-1})) \big \Vert _{L_{2p}(\Omega ;{\mathcal {L}}(H,{\mathcal {L}}_2^0))} \le C (\sigma - t_{j-1})^{\frac{1}{2}}, \end{aligned}$$

while Proposition 2.6 and Assumption 2.4 yield

$$\begin{aligned} \Big \Vert \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W(\tau ) \Big \Vert _{L_{2p}(\Omega ;H)}&\le C \Big ( \displaystyle \int _{t_{j-1}}^{\sigma } \big \Vert g(X(t_{j-1})) \big \Vert _{L_{2p}(\Omega ;{\mathcal {L}}_2^0)}^2 \,\mathrm {d}\tau \Big )^{\frac{1}{2}}\\&\le C \Big ( 1 + \sup _{\tau \in [0,T]} \Vert X(\tau ) \Vert _{L_{2p}(\Omega ;H)}\Big ) k^{\frac{1}{2}}, \end{aligned}$$

for all \(\sigma \in [t_{j-1},t_j]\). Therefore, there exists a constant such that

$$\begin{aligned} J_8 \le C k \end{aligned}$$

and the assertion of the lemma has been proved. \(\square \)

6 Noise approximation

The starting point of the spectral noise approximation is the covariance operator \(Q \in {\mathcal {L}}(U)\), which is symmetric and nonnegative. Since we do not assume that \(Q\) has finite trace, we need to approximate the stochastic integral with respect to a cylindrical \(Q\)-Wiener process \(W :[0,T] \times \Omega \rightarrow U\) (see [28, Ch. 2.5]).

Throughout this section we work under the assumption that there exists an orthonormal basis \((\varphi _j)_{j \in {\mathbb {N}}}\) of the separable Hilbert space \(U\) such that

$$\begin{aligned} Q \varphi _j = \mu _j \varphi _j, \quad \text { for all } j \in {\mathbb {N}}, \end{aligned}$$

where \(\mu _j \ge 0\), \(j \in {\mathbb {N}}\), denote the eigenvalues of \(Q\). First let us note that this assumption is not fulfilled by every symmetric and nonnegative operator \(Q \in {\mathcal {L}}(U)\). However, it always holds true for white noise, that is \(Q = \mathrm {Id}_U\), and, by the spectral theorem for compact, symmetric operators, whenever \(Q\) is of finite trace. Further, the family \((\sqrt{\mu _j} \varphi _j)_{j \in {\mathbb {N}}}\) is an orthonormal basis of the Cameron-Martin space \(U_0 = Q^{\frac{1}{2}}(U)\), which is endowed with the inner product \((u,v)_{U_0} := ( Q^{-\frac{1}{2}} u, Q^{-\frac{1}{2}} v )_{U}\) for all \(u,v \in U_0\) (see [28, Ch. 2.3]).

As demonstrated in [28, Rem. 2.5.1, Prop. 2.5.2], in order to define the stochastic integral with respect to a cylindrical Wiener process, one introduces a further Hilbert space \(U_1\) and a Hilbert–Schmidt embedding \(\mathcal {I} :U_0 \rightarrow U_1\), such that \(W\) becomes a standard Wiener process on the larger space \(U_1\) with covariance operator \(Q_1 := \mathcal {I}\mathcal {I}^{*}\) and Karhunen-Loève expansion

$$\begin{aligned} W(t) = \sum _{j = 1}^\infty \beta _j(t) \mathcal {I}(\sqrt{\mu _j} \varphi _j), \quad t \in [0,T], \end{aligned}$$
(51)

where \(\beta _j :[0,T] \times \Omega \rightarrow {\mathbb {R}}\), \(j \in {\mathbb {N}}\), is a family of independent, standard real-valued Brownian motions. Since \(\mathcal {I} :U_0 \rightarrow Q_1^{\frac{1}{2}}(U_1)\) is an isometry, the definition of the stochastic integral with respect to a cylindrical Wiener process is in fact independent of the choice of the space \(U_1\), see [28, Rem. 2.5.3]. Finally note that one can choose the Hilbert–Schmidt operator \(\mathcal {I}\) in such a way that \(( \mathcal {I}(\sqrt{\mu _j} \varphi _j))_{j\in {\mathbb {N}}}\) becomes an orthonormal basis of \(Q_1^{\frac{1}{2}}(U_1)\).

In order to approximate the Wiener process we follow in the footsteps of [1]. Let us denote by \(Q_J \in {\mathcal {L}}(U)\), \(J \in {\mathbb {N}}\), the operator given by

$$\begin{aligned} Q_J \varphi _j := {\left\{ \begin{array}{ll} \mu _j \varphi _j,&{} \quad \text { if } j \in \{1,\ldots ,J\},\\ 0,&{} \quad \text { else}. \end{array}\right. } \end{aligned}$$

As in [1] we further use the abbreviation \(Q_{cJ} := Q - Q_J\). Now, since \(Q_J\) is of finite rank, the \(Q_J\)-Wiener process \(W^J :[0,T] \times \Omega \rightarrow U\) defined by

$$\begin{aligned} W^J(t) = \sum _{j = 1}^J \sqrt{\mu _j} \beta _j(t) \varphi _j, \quad t \in [0,T], \end{aligned}$$
(52)

can be simulated on a computer, provided that the orthonormal basis \((\varphi _j)_{j \in {\mathbb {N}}}\) of \(U\) is explicitly known. Here \(\beta _j :[0,T] \times \Omega \rightarrow {\mathbb {R}}\), \(j \in {\mathbb {N}}\), are the same as in (51). Further, from [1] we recall the notation \(W^{cJ}(t) := W(t) - W^J(t)\) for all \(t \in [0,T]\).
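Since the truncated process (52) is a finite sum of independent scalar Brownian motions, its increments are straightforward to simulate. The following sketch assumes, purely for illustration, \(U = L^2(0,1)\) with \(\varphi _j(x) = \sqrt{2}\sin (j\pi x)\) and eigenvalues \(\mu _j = j^{-2}\); neither choice is prescribed by the text.

```python
import numpy as np

def truncated_wiener_increment(J, k, xgrid, rng):
    """One increment W^J(t+k) - W^J(t) of (52), evaluated on xgrid."""
    j = np.arange(1, J + 1)
    mu = j ** (-2.0)                              # assumed eigenvalue decay of Q
    dbeta = rng.normal(0.0, np.sqrt(k), size=J)   # increments beta_j(t+k) - beta_j(t)
    phi = np.sqrt(2.0) * np.sin(np.pi * np.outer(xgrid, j))  # phi_j on the grid
    return phi @ (np.sqrt(mu) * dbeta)            # sum_j sqrt(mu_j) dbeta_j phi_j
```

Any other explicitly known eigenbasis of \(Q\) can be substituted for the sine functions without changing the structure of the computation.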

Then, the Milstein–Galerkin finite element scheme with truncated noise is given by the recursion

$$\begin{aligned} X_{k,h,J}(t_0)&= P_h X_0,\nonumber \\ X_{k,h,J}(t_n)&= X_{k,h,J}(t_{n-1}) - k \big [ A_h X_{k,h,J}(t_n) + P_h f(X_{k,h,J}(t_{n-1})) \big ]\nonumber \\&+\, P_h g(X_{k,h,J}(t_{n-1})) \Delta _k W^{J^2}(t_{n})\nonumber \\&+\, \int _{t_{n-1}}^{t_n} P_h g'(X_{k,h,J}(t_{n-1}))\Big [\! \int _{t_{n-1}}^{\sigma _1}\! g(X_{k,h,J}(t_{n-1})) \!\,\mathrm {d}W^J(\sigma _2) \Big ] \,\mathrm {d}W^J(\sigma _1)\nonumber \\ \end{aligned}$$
(53)

for \(n \in \{1,\ldots ,N_k\}\) and all \(h \in (0,1]\), \(k \in (0,T]\) and \(J \in {\mathbb {N}}\). We stress that the Euler-Maruyama term incorporates \(J^2\) summands of the Wiener noise expansion (52), while the additional iterated integral term of the Milstein scheme only uses \(J\) summands. As discussed in [1], this leads to an optimal balance between computational cost and order of convergence. As already mentioned in the introduction, we hereby neglect the problem of how to simulate the iterated stochastic integrals.
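In the simplest possible setting the recursion (53) can be sketched very compactly. The fragment below assumes a spectral Galerkin discretization of a stochastic heat equation with \(f \equiv 0\), one-dimensional noise and \(g(X) = X\), so that \(A_h\) acts diagonally on the coefficients and the iterated integral reduces to \(\frac{1}{2}\big ((\Delta \beta )^2 - k\big )\); this is a simplified illustration under these assumptions, not the general scheme.

```python
import numpy as np

def milstein_step(x, k, lam, dbeta):
    """One step of (53) in this simplified diagonal setting.

    x: spectral coefficients of X at t_{n-1}; lam: eigenvalues of A_h;
    dbeta: scalar Brownian increment over the step of size k.
    """
    # explicit part: noise term g(x) * dbeta plus the Milstein correction,
    # where the iterated integral equals (dbeta**2 - k) / 2 for scalar noise
    rhs = x + x * dbeta + 0.5 * x * (dbeta ** 2 - k)
    # implicit linear part: solve (I + k A_h) x_new = rhs mode-wise
    return rhs / (1.0 + k * lam)
```

The implicit treatment of the linear part mirrors the term \(-k A_h X_{k,h,J}(t_n)\) on the right-hand side of (53).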

First we embed the scheme (53) into the abstract framework of Sect. 3. Compared to the scheme (4) the only difference appears in the definition of the increment function, which is now given by

$$\begin{aligned} \Phi _{h,J}(x,t,k)&= - k S_{k,h} f(x) + S_{k,h} g(x) \big (W^{J^2}(t+k) - W^{J^2}(t) \big )\nonumber \\&\qquad + S_{k,h} \int _{t}^{t + k} g'(x)\Bigg [ \int _{t}^{\sigma _1} g(x) \,\mathrm {d}W^J(\sigma _2) \Bigg ] \,\mathrm {d}W^J(\sigma _1), \end{aligned}$$
(54)

for \((t,k) \in \mathbb {T}\). Our first result is concerned with the stability of the Milstein scheme with truncated noise.

Theorem 6.1

Under Assumptions 2.2 to 2.4 the Milstein–Galerkin finite element scheme (53) is bistable for every \(h \in (0,1]\) and \(J \in {\mathbb {N}}\). The stability constant \(C_{\mathrm {Stab}}\) can be chosen to be independent of \(h \in (0,1]\) and \(J \in {\mathbb {N}}\).

Proof

We only need to verify that Assumption 3.7 is also satisfied by \(\Phi _{h,J}\). This is done by exactly the same steps as in the proof of Theorem 4.1. An important tool in that proof is Proposition 2.6, which we here have to apply to stochastic integrals with respect to \(W^J\). To this end, let \(\Psi :[0,T] \times \Omega \rightarrow {\mathcal {L}}_2^0\) be a predictable stochastic process satisfying the conditions of Proposition 2.6. Then it holds

$$\begin{aligned} \Big \Vert \int _{\tau _1}^{\tau _2} \Psi (\sigma ) \,\mathrm {d}W^J(\sigma ) \Big \Vert _{L_p(\Omega ;H)} \le C(p) \Bigg ( {\mathbb {E}}\Big [ \Big ( \int _{\tau _1}^{\tau _2} \big \Vert \Psi (\sigma ) Q_J^{\frac{1}{2}} \big \Vert _{{\mathcal {L}}_2(U,H)}^2 \,\mathrm {d}\sigma \Big )^{\frac{p}{2}} \Big ] \Bigg )^{\frac{1}{p}}, \end{aligned}$$

for all \(0 \le \tau _1 < \tau _2 \le T\) and all \(J \in {\mathbb {N}}\). Since we have

$$\begin{aligned} \big \Vert \Psi (\sigma ) Q_J^{\frac{1}{2}} \big \Vert _{{\mathcal {L}}_2(U,H)}^2 = \sum _{j = 1}^J \mu _j \big \Vert \Psi (\sigma ) \varphi _j \big \Vert ^2 \le \sum _{j = 1}^\infty \mu _j \big \Vert \Psi (\sigma ) \varphi _j \big \Vert ^2 = \big \Vert \Psi (\sigma ) \big \Vert _{{\mathcal {L}}_2^0}^2, \end{aligned}$$

the \(L_p\)-norm of the stochastic integral with respect to \(W^J\) is therefore bounded by

$$\begin{aligned} \Big \Vert \int _{\tau _1}^{\tau _2} \Psi (\sigma ) \,\mathrm {d}W^J(\sigma ) \Big \Vert _{L_p(\Omega ;H)} \le C(p) \Big ( {\mathbb {E}}\Big [ \Big ( \int _{\tau _1}^{\tau _2} \big \Vert \Psi (\sigma ) \big \Vert _{{\mathcal {L}}_2^0}^2 \,\mathrm {d}\sigma \Big )^{\frac{p}{2}} \Big ] \Big )^{\frac{1}{p}}. \end{aligned}$$

In particular, the constant is independent of \(J \in {\mathbb {N}}\). Keeping this in mind, all steps in the proof of Theorem 4.1 also remain valid for (53). \(\square \)
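The key observation in this proof, namely that replacing \(Q\) by its truncation can only decrease the Hilbert–Schmidt norm, can be illustrated in finite dimensions. A minimal numerical check, with a random matrix standing in for \(\Psi (\sigma )\) and assumed eigenvalues for \(Q\):

```python
import numpy as np

def hs_norm(Psi, mu):
    """Hilbert-Schmidt norm of Psi Q^{1/2} for a diagonal Q = diag(mu)."""
    return np.sqrt(np.sum((Psi * np.sqrt(mu)) ** 2))

rng = np.random.default_rng(1)
Psi = rng.normal(size=(20, 20))               # finite-dimensional stand-in for Psi(sigma)
mu = 1.0 / (1.0 + np.arange(20)) ** 2         # assumed eigenvalues of Q
mu_J = np.where(np.arange(20) < 5, mu, 0.0)   # eigenvalues of the truncation Q_J with J = 5
assert hs_norm(Psi, mu_J) <= hs_norm(Psi, mu)
```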

Having established this, it remains to investigate whether the scheme (53) is consistent. For this we introduce the following additional condition, which allows us to control the order of consistency with respect to the parameter \(J \in {\mathbb {N}}\).

Assumption 6.2

There exist constants \(C\) and \(\alpha > 0\) such that

$$\begin{aligned} \Bigg ( \sum _{j = 1}^\infty j^{\alpha } \mu _j \big \Vert g(x) \varphi _j\big \Vert ^2 \Bigg )^{\frac{1}{2}}&\le C \big ( 1 + \Vert x \Vert \big ),\end{aligned}$$
(55)
$$\begin{aligned} \Bigg ( \sum _{j = 1}^\infty j^{\alpha } \mu _j \big \Vert g'(x)[y] \varphi _j\big \Vert ^2 \Bigg )^{\frac{1}{2}}&\le C \Vert y \Vert \end{aligned}$$
(56)

for all \(x,y \in H\), where \((\varphi _j, \mu _j)_{j \in {\mathbb {N}}}\) are the eigenpairs of the covariance operator \(Q\).

Remark 6.3

(1) Note that (55) and (56) coincide if \(g :H \rightarrow {\mathcal {L}}_2^0\) is linear.

(2) Assumption 2.4 ensures that (55) and (56) are fulfilled with \(\alpha = 0\). Further, provided that \(Q\) is of finite trace and \(g :H \rightarrow {\mathcal {L}}_2^0\) satisfies

$$\begin{aligned} \Vert g (x) \Vert _{{\mathcal {L}}(U,H)}&\le C \big ( 1 + \Vert x \Vert \big ), \qquad \text { and } \Vert g' (x)[y] \Vert _{{\mathcal {L}}(U,H)} \le C \Vert y\Vert \end{aligned}$$

for all \(x,y\in H\), then (55) and (56) simplify to

$$\begin{aligned} \Big ( \sum _{j = 1}^\infty j^{\alpha } \mu _j \Big )^{\frac{1}{2}} < \infty . \end{aligned}$$
(57)

Hence, the order of convergence of the truncated noise only depends on the rate of decay of the eigenvalues of the covariance operator \(Q\).
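For an assumed polynomial decay of the eigenvalues, \(\mu _j = j^{-\beta }\), condition (57) amounts to \(\alpha < \beta - 1\). The following partial-sum sketch illustrates this (it is not a proof): with \(\beta = 3\) the partial sums stabilize for \(\alpha = 1.5\) but keep growing for \(\alpha = 2.5\).

```python
import numpy as np

def partial_sum(alpha, beta, n):
    """Partial sum of sum_j j^alpha * mu_j with assumed mu_j = j^{-beta}."""
    j = np.arange(1, n + 1, dtype=float)
    return (j ** (alpha - beta)).sum()
```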

Theorem 6.4

Let Assumptions 2.7 and 2.9 be fulfilled by the spatial discretization. If Assumptions 2.2 to 2.4 and Assumption 6.2 are satisfied with \(p \in [2,\infty )\), \(r \in [0,1)\) and \(\alpha > 0\), then there exists a constant \(C\) such that the following estimate holds true for the local truncation error of the Milstein scheme (53), namely

$$\begin{aligned} \big \Vert \mathcal {R}_k \big [ X|_{\mathcal {T}_k} \big ] \big \Vert _{-1,p} \le C \big ( h^{1+r} + k^{\frac{1+r}{2}} + J^{-\alpha } \big ) \end{aligned}$$

for all \(h \in (0,1]\), \(k \in (0,T]\) and \(J \in {\mathbb {N}}\). In particular, if \(h\), \(J\) and \(k\) are coupled by \(h := c_1 k^{\frac{1}{2}}\) and \(J := \lceil c_2 k^{-\frac{1+r}{2\alpha }}\rceil \) for some positive constants \(c_1,c_2 \in {\mathbb {R}}\), then the Milstein scheme is consistent of order \(\frac{1+r}{2}\).

Proof

The proof relies on slightly generalized techniques from [1, Lemmas 4.1 and 4.2]. First let us note that the results of Lemmas 5.2–5.7 remain valid for (53). Therefore, from the decomposition of the local truncation error in Lemma 3.11 and (45) it follows that we only need to show that the following analogue of Lemma 5.8 is valid: there exists a constant \(C\) such that for all \(n \in \{1,\ldots ,N_k\}\)

$$\begin{aligned}&\Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \Big ( \int _{t_{j-1}}^{t_j} g(X(\sigma )) \,\mathrm {d}W(\sigma ) - \int _{t_{j-1}}^{t_j} g(X(t_{j-1})) \,\mathrm {d}W^{J^2}(\sigma )\\&\qquad \qquad - \int _{t_{j-1}}^{t_j} g'(X(t_{j-1}))\Big [ \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W^J(\tau )\Big ] \,\mathrm {d}W^J(\sigma ) \Big ) \Big \Vert _{L_p(\Omega ;H)}\\&\quad \le C \big ( k^{\frac{1+r}{2}} + J^{-\alpha } \big ) \end{aligned}$$

for all \(h \in (0,1]\), \(k \in (0,T]\) and \(J \in {\mathbb {N}}\). We begin by fixing an arbitrary \(n \in \{1,\ldots ,N_k\}\) and insert several suitable terms with untruncated noise such that Lemma 5.8 is applicable. Hence, we obtain

$$\begin{aligned}&\Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \Big ( \int _{t_{j-1}}^{t_j} g(X(\sigma )) \,\mathrm {d}W(\sigma ) - \int _{t_{j-1}}^{t_j} g(X(t_{j-1})) \,\mathrm {d}W^{J^2}(\sigma )\nonumber \\&\qquad \qquad - \int _{t_{j-1}}^{t_j} g'(X(t_{j-1}))\Big [ \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W^J(\tau )\Big ] \,\mathrm {d}W^J(\sigma ) \Big ) \Big \Vert _{L_p(\Omega ;H)}\nonumber \\&\quad \le C k^{\frac{1+r}{2}} + \Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} g(X(t_{j-1})) \,\mathrm {d}W^{cJ^2}(\sigma ) \Big \Vert _{L_p(\Omega ;H)} \nonumber \\&\qquad +\, \Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \Big ( \int _{t_{j-1}}^{t_j} g'(X(t_{j-1}))\Big [ \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W(\tau )\Big ] \,\mathrm {d}W(\sigma )\nonumber \\&\qquad \qquad - \int _{t_{j-1}}^{t_j} g'(X(t_{j-1}))\Big [ \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W^J(\tau )\Big ] \,\mathrm {d}W^J(\sigma ) \Big ) \Big \Vert _{L_p(\Omega ;H)}. \end{aligned}$$
(58)

First let us note that by Assumption 6.2 it holds

$$\begin{aligned}&\big \Vert g(X(t_{j-1})) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2(Q_{cJ^2}^{\frac{1}{2}}(U),H))}\nonumber \\&= \Big ( {\mathbb {E}}\Big [ \Big ( \sum _{\ell = J^2+1}^\infty \mu _\ell \big \Vert g(X(t_{j-1})) \varphi _\ell \big \Vert ^2 \Big )^{\frac{p}{2}} \Big ] \Big )^{\frac{1}{p}}\nonumber \\&\le \Big ( {\mathbb {E}}\Big [ \Big ( \sum _{\ell = J^2+1}^\infty \frac{\ell ^\alpha }{J^{2\alpha }} \mu _\ell \big \Vert g(X(t_{j-1})) \varphi _\ell \big \Vert ^2 \Big )^{\frac{p}{2}} \Big ] \Big )^{\frac{1}{p}}\nonumber \\&\le C J^{-\alpha } \Big ( 1 + \sup _{t\in [0,T]} \big \Vert X(t) \big \Vert _{L_p(\Omega ;H)}\Big ). \end{aligned}$$
(59)
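The tail estimate in (59) only uses the elementary fact that the series tail beyond \(J^2\) is bounded by \(J^{-2\alpha }\) times the weighted series from (55). A small numerical sanity check, with assumed weights in place of \(\mu _\ell \Vert g(X(t_{j-1})) \varphi _\ell \Vert ^2\):

```python
import numpy as np

def tail_bound_holds(w, alpha, J):
    """Check tail(w beyond J^2) <= J^{-2 alpha} * weighted series, as in (59)."""
    l = np.arange(1, len(w) + 1, dtype=float)
    tail = w[l > J ** 2].sum()                    # series tail beyond index J^2
    return tail <= J ** (-2.0 * alpha) * (l ** alpha * w).sum()
```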

This together with Burkholder’s inequality [6, Th. 3.3], (19) and Proposition 2.6 applied to \(W^{cJ^2}\) yields for the second term

$$\begin{aligned}&\Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} g(X(t_{j-1})) \,\mathrm {d}W^{cJ^2}(\sigma ) \Big \Vert _{L_p(\Omega ;H)} \\&\quad \le C \Big ( {\mathbb {E}}\Big [ \Big ( \sum _{j = 1}^{n} \Big \Vert S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} g(X(t_{j-1})) \,\mathrm {d}W^{cJ^2}(\sigma ) \Big \Vert ^2 \Big )^{\frac{p}{2}} \Big ] \Big )^{\frac{1}{p}} \\&\quad \le C \Big ( \sum _{j = 1}^{n} \Big \Vert S_{k,h}^{n-j +1 } \int _{t_{j-1}}^{t_j} g(X(t_{j-1})) \,\mathrm {d}W^{cJ^2}(\sigma ) \Big \Vert ^2_{L_p(\Omega ;H)} \Big )^{\frac{1}{2}}\\&\quad \le C \Big ( \sum _{j = 1}^{n} \int _{t_{j-1}}^{t_j} \big \Vert g(X(t_{j-1})) \big \Vert ^2_{L_p(\Omega ;{\mathcal {L}}_2(Q_{cJ^2}^{\frac{1}{2}}(U),H))} \,\mathrm {d}\sigma \Big )^{\frac{1}{2}} \le C \sqrt{T} J^{-\alpha }. \end{aligned}$$

By the same arguments we get for the third summand in (58) that

$$\begin{aligned}&\Big \Vert \sum _{j = 1}^{n} S_{k,h}^{n-j +1 } \Big ( \int _{t_{j-1}}^{t_j} g'(X(t_{j-1}))\Big [ \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W(\tau )\Big ] \,\mathrm {d}W(\sigma )\\&\qquad \qquad - \int _{t_{j-1}}^{t_j} g'(X(t_{j-1}))\Big [ \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W^J(\tau )\Big ] \,\mathrm {d}W^J(\sigma ) \Big ) \Big \Vert _{L_p(\Omega ;H)}\\&\quad \le C \Big ( \sum _{j = 1}^{n} \Big \Vert \int _{t_{j-1}}^{t_j} g'(X(t_{j-1}))\Big [ \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W^{cJ}(\tau )\Big ] \,\mathrm {d}W(\sigma )\Big \Vert _{L_p(\Omega ;H)}^2\Big )^{\frac{1}{2}}\\&\qquad +\, C \Big ( \sum _{j = 1}^{n} \Big \Vert \int _{t_{j-1}}^{t_j} g'(X(t_{j-1}))\Big [ \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W(\tau )\Big ] \,\mathrm {d}W^{cJ}(\sigma ) \Big \Vert _{L_p(\Omega ;H)}^2 \Big )^{\frac{1}{2}}. \end{aligned}$$

Then, by two applications of Proposition 2.6, Assumption 2.4 and (59) it follows

$$\begin{aligned}&\Big ( \sum _{j = 1}^{n} \Big \Vert \int _{t_{j-1}}^{t_j} g'(X(t_{j-1}))\Big [ \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W^{cJ}(\tau )\Big ] \,\mathrm {d}W(\sigma )\Big \Vert _{L_p(\Omega ;H)}^2\Big )^{\frac{1}{2}}\\&\quad \le C \Big ( \sum _{j = 1}^{n} \int _{t_{j-1}}^{t_j} \Big \Vert g'(X(t_{j-1})) \Big [ \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W^{cJ}(\tau )\Big ] \Big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}^2 \,\mathrm {d}\sigma \Big )^{\frac{1}{2}}\\&\quad \le C \sup _{x \in H} \big \Vert g'(x) \big \Vert _{{\mathcal {L}}(H,{\mathcal {L}}_2^0)} \Big ( \sum _{j = 1}^{n} \int _{t_{j-1}}^{t_j} \int _{t_{j-1}}^{\sigma } \big \Vert g(X(t_{j-1})) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2(Q_{cJ}^{\frac{1}{2}}(U),H))}^2 \,\mathrm {d}\tau \,\mathrm {d}\sigma \Big )^{\frac{1}{2}}\\&\quad \le C C_g \sqrt{T} k^{\frac{1}{2}} J^{-\frac{\alpha }{2}} \le C \big ( k + J^{-\alpha } \big ), \end{aligned}$$

where we applied the inequality \(ab \le \frac{1}{2} (a^2+ b^2)\) in the last step. By using (56) we obtain the following estimate in the same way as in (59),

$$\begin{aligned}&\Big ( \sum _{j = 1}^{n} \Big \Vert \int _{t_{j-1}}^{t_j} g'(X(t_{j-1}))\Big [ \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W(\tau )\Big ] \,\mathrm {d}W^{cJ}(\sigma ) \Big \Vert _{L_p(\Omega ;H)}^2 \Big )^{\frac{1}{2}}\\&\quad \le C \Big ( \sum _{j = 1}^{n} \int _{t_{j-1}}^{t_j} \Big \Vert g'(X(t_{j-1}))\Big [ \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W(\tau )\Big ] \Big \Vert _{L_p(\Omega ;{\mathcal {L}}_2(Q_{cJ}^{\frac{1}{2}}(U),H))}^2 \,\mathrm {d}\sigma \Big )^{\frac{1}{2}} \\&\quad \le C \Big ( \sum _{j = 1}^{n} \int _{t_{j-1}}^{t_j} J^{-\alpha } \Big \Vert \int _{t_{j-1}}^{\sigma } g(X(t_{j-1})) \,\mathrm {d}W(\tau ) \Big \Vert _{L_p(\Omega ;H)}^2 \,\mathrm {d}\sigma \Big )^{\frac{1}{2}}\\&\quad \le C J^{-\frac{\alpha }{2}} \Big ( \sum _{j = 1}^{n} \int _{t_{j-1}}^{t_j} \big (\sigma - t_{j-1}\big ) \big \Vert g(X(t_{j-1})) \big \Vert _{L_p(\Omega ;{\mathcal {L}}_2^0)}^2 \,\mathrm {d}\sigma \Big )^{\frac{1}{2}}\\&\quad \le C \sqrt{T} C_g \Big ( 1 + \sup _{t \in [0,T]} \big \Vert X(t) \big \Vert _{L_p(\Omega ;H)} \Big ) k^{\frac{1}{2}} J^{-\frac{\alpha }{2}} \le C \big ( k + J^{-\alpha }\big ). \end{aligned}$$

This completes the proof. \(\square \)
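The parameter coupling stated in Theorem 6.4 can be sketched as follows; the exponents are taken from the theorem, while the constants \(c_1\), \(c_2\) and the concrete values in the example are illustrative.

```python
import math

def coupled_parameters(k, r, alpha, c1=1.0, c2=1.0):
    """Couple the mesh width h and truncation level J to the time step k."""
    h = c1 * math.sqrt(k)                                  # h = c1 * k^{1/2}
    J = math.ceil(c2 * k ** (-(1.0 + r) / (2.0 * alpha)))  # J = ceil(c2 * k^{-(1+r)/(2 alpha)})
    return h, J
```

With these choices all three error contributions \(h^{1+r}\), \(k^{\frac{1+r}{2}}\) and \(J^{-\alpha }\) are of the same order \(k^{\frac{1+r}{2}}\).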

Combining Theorems 6.1 and 6.4 with Theorem 3.4 immediately yields the following convergence result:

Theorem 6.5

Let Assumptions 2.7 and 2.9 be fulfilled by the spatial discretization. If Assumptions 2.2 to 2.4 and Assumption 6.2 are satisfied with \(p \in [2,\infty )\), \(r \in [0,1)\) and \(\alpha > 0\), then there exists a constant \(C\) such that

$$\begin{aligned} \max _{0 \le n \le N_k} \big \Vert X_{k,h,J}(t_n) - X(t_n) \big \Vert _{L_p(\Omega ;H)} \le C \big ( h^{1+r} + k^{\frac{1+r}{2}} + J^{-\alpha } \big ) \end{aligned}$$

for all \(h \in (0,1]\), \(k \in (0,T]\) and \(J \in {\mathbb {N}}\), where \(X_{k,h,J}\) denotes the grid function generated by the Milstein scheme with truncated noise (53) and \(X\) is the mild solution to (2).