1 Motivation

For the approximation of stochastic partial differential equations (SPDEs) with commutative noise, higher order schemes such as the Milstein schemes in [1, 2, 9, 14, 15, 17], their derivative-free versions [18, 38], and the Wagner-Platen type scheme [3] have been derived and implemented in recent years. For equations that need not possess commutative noise (see [4, 16, 23, 29, 30, 34] for examples with practical applications such as stochastic filtering), it was, however, an open question how to implement a higher order scheme due to the iterated stochastic integrals that are involved. Therefore, the numerical scheme of choice was so far some Euler scheme, for example, the exponential Euler or the linear implicit Euler scheme, see [7, 13, 22, 25, 26]. Recently, the authors presented two algorithms to approximate such iterated stochastic integrals, see [21]. In [36], the Milstein scheme proposed by Jentzen and Röckner [9] was analyzed for non-commutative equations in the case that it is combined with the algorithms proposed in [21]. As its main drawback, however, the Milstein scheme requires the evaluation of the derivative of an operator in each time step. As a consequence, its computational complexity grows quadratically with the dimension of the state space, whereas the computational complexity of the Euler scheme grows only linearly. In the present paper, we propose a derivative-free numerical scheme to efficiently approximate the mild solution of SPDEs which do not need to have commutative noise, that is, the commutativity condition

$$\begin{aligned} \big ( B'(v) (B(v) u) \big ) {\tilde{u}} = \big ( B'(v) (B(v) {\tilde{u}}) \big ) u \end{aligned}$$
(1)

for all \(v\in H_{\beta }\), \(u, {\tilde{u}} \in U_0\) need not be fulfilled. Our goal is to approximate the mild solution to SPDEs of type

$$\begin{aligned} \mathrm {d} X_t = \big ( AX_t+F(X_t)\big ) \, \mathrm {d}t + B(X_t) \, \mathrm {d}W_t, \quad t\in (0,T], \quad X_0 = \xi \end{aligned}$$
(2)

with a scheme that attains the same temporal order of convergence as the Milstein scheme, however without the need to evaluate any derivative, and with significantly reduced computational complexity, which is of the same order of magnitude as for the Euler scheme, i.e., which depends only linearly on the dimension of the state space. For details on the notation, we refer to Sect. 2.1. In general, the Milstein scheme proposed in [9] applied to (2), using the notation \(Y^{\text {MIL}}_m = Y_m^{\text {MIL};N,K,M}\), reads as \(Y^{\text {MIL}}_0 = P_N\xi \) and

$$\begin{aligned} Y^{\text {MIL}}_{m+1}&= P_N e^{Ah} \bigg (Y^{\text {MIL}}_m + hF(Y^{\text {MIL}}_m) + B(Y^{\text {MIL}}_m) \Delta W^{K,M}_m \nonumber \\&\quad + \int _{t_m}^{t_{m+1}} B'(Y^{\text {MIL}}_m) \left( \int _{t_m}^{s} B(Y^{\text {MIL}}_m) \, \mathrm {d}W^K_r\right) \, \mathrm {d}W^K_s \bigg ) \end{aligned}$$
(3)

for some \(K,M,N \in {\mathbb {N}}\), \(h=\frac{T}{M}\), and \(m\in \{0,\ldots ,M-1\}\). Numerical schemes that attain higher orders of convergence involve iterated stochastic integrals and, in the non-commutative setting, it is not possible to rewrite expressions such as

$$\begin{aligned} \int _t^{t+h} B'(X_t) \left( \int _t^s B(X_t) \, \mathrm {d}W_r^K \right) \, \mathrm {d}W_s^K \end{aligned}$$
(4)

for \(h>0\), \(t,t+h\in [0,T]\) and \(K\in {\mathbb {N}}\) in terms of increments of the approximated Q-Wiener process \((W_t^K)_{t\in [0,T]}\) as in the commutative case, see [9]. Therefore, methods such as the derivative-free Milstein type scheme presented in [20], which was developed based on this assumption, are not applicable for approximating the mild solution of these equations. In [21], we introduced two methods to approximate iterated stochastic integrals of the form

$$\begin{aligned} \int _t^{t+h} \Psi \left( \Phi \int _t^s \mathrm {d}W_r\right) \mathrm {d}W_s \end{aligned}$$
(5)

with \(t\ge 0\), \(h>0\), some operators \(\Psi \in L(H,L(U,H)_{U_0})\), \(\Phi \in L(U,H)_{U_0}\), and a Q-Wiener process \((W_t)_{t\in [0,T]}\) of trace class. Therewith, it is possible to implement the Milstein scheme (3) from [9]; we refer to [36] for details. However, the evaluation of the derivative in the Milstein scheme is costly. Precisely, the computational cost needed to evaluate this term is of order \({\mathcal {O}}(N^2 K)\) in each time step, see [20, 36]. This computational effort can be reduced by one order of magnitude if the derivative is replaced by some customized approximation; see also the detailed discussion of this issue in [20].

In this work, we design a derivative-free numerical scheme to approximate the mild solution of Eq. (2) which can be combined with any method to simulate the iterated stochastic integrals involved in the scheme, see also [19]. As the main result, a two-stage derivative-free Milstein type scheme is developed where the stages are constructed in such a way that, compared to the Milstein scheme (3) proposed in [9], the computational complexity is reduced by one order of magnitude. It is worth mentioning that the stages need to be chosen significantly differently from the ones designed for the commutative noise case in [20]. The construction of the stages for the derivative-free Milstein type scheme in case of non-commutative noise is a nontrivial task as a naive choice of, e.g., finite differences does not lead to the achieved reduction of computational cost. The paper is organized as follows: First, we introduce the setting in which we work and state results on the convergence of the proposed scheme, both with and without an approximation of the iterated integrals. The same theoretical order of convergence as for the Milstein scheme can be obtained. Moreover, we illustrate the advantages of such a higher order derivative-free scheme with a concrete example in Sect. 3. We combine the scheme with Algorithm 1 presented in [21], which is based on a truncated Fourier series expansion, and derive the effective order of convergence for this scheme, a concept that combines the theoretical order of convergence with the computational effort based on a cost model introduced in [20]. In terms of this effective order of convergence, the original Milstein scheme (3) is outperformed by the proposed derivative-free Milstein type scheme. Compared to the exponential Euler scheme, the proposed scheme attains a higher effective order of convergence for a large set of parameter values when combined with Algorithm 1 from [21]. In Sect. 4, we analyze the \(L^2\)-error and the computational cost for the derivative-free Milstein type scheme numerically. The presented simulations confirm a higher effective order of convergence compared to the original Milstein scheme and at least the same or even a higher effective order of convergence compared to the Euler scheme for the examples considered in Sect. 4. Finally, in Sects. 5 and 6, we give some concluding remarks and the proofs of the convergence results.

2 Approximation of solutions for SPDEs

In this section, we present a derivative-free Milstein type scheme for SPDE (2), which need not have commutative noise. Precisely, we introduce a scheme which can be coupled with an arbitrary method for the approximation of the involved iterated stochastic integrals. For example, when combined with the algorithms introduced in [21] for the simulation of twice-iterated integrals, the theoretical order of convergence of the original Milstein scheme can be maintained.

2.1 Framework

Throughout this work, we assume the framework presented in the following. Let \((H,\langle \cdot ,\cdot \rangle _H)\) and \((U,\langle \cdot ,\cdot \rangle _U)\) denote separable real Hilbert spaces and let \(T\in (0,\infty )\) be some fixed final time. Further, let the operator \(Q \in L(U)\) be non-negative, symmetric and of finite trace. Then, the subspace \(U_0 \subset U\) is defined as \(U_0 = Q^{\frac{1}{2}}U\). Moreover, we consider some complete probability space \((\Omega ,{\mathcal {F}},{\text {P}})\) and a U-valued Q-Wiener process \((W_t)_{t\in [0,T]}\) with respect to a filtration \(({\mathcal {F}}_t)_{t\in [0,T]}\) which fulfills the usual conditions. Denoting the eigenvalues of Q by \(\eta _j\) with corresponding eigenvectors \({\tilde{e}}_j\) for \(j\in {\mathcal {J}}\) and some countable index set \({\mathcal {J}}\), such that \(\{ {\tilde{e}}_j : j \in {\mathcal {J}}\}\) forms an orthonormal basis of U, we obtain the following series representation of the Q-Wiener process, see [28],

$$\begin{aligned} W_t = \sum _{\begin{array}{c} j\in {\mathcal {J}} \\ \eta _j \ne 0 \end{array}} \sqrt{\eta _j} \, {\tilde{e}}_j \, \beta _t^j, \quad t\in [0,T]. \end{aligned}$$
(6)

In this representation, the stochastic processes \((\beta _t^j)_{t\in [0,T]}\) denote independent real-valued Brownian motions for all \(j\in {\mathcal {J}}\) with \(\eta _j\ne 0\). Below, the following notation is used for different sets of linear operators. The space of bounded linear operators mapping from U to H that are restricted to the subspace \(U_0\) is denoted by \((L(U,H)_{U_0},\Vert \cdot \Vert _{L(U,H)})\) with \(L(U,H)_{U_0} := \{T :U_0 \rightarrow H \, | \, T\in L(U,H) \}\); by \(L_{HS}(U,H)\), we denote the set of Hilbert-Schmidt operators mapping from U to H; finally, we write \(L^{(2)}(U,H) = L(U,L(U,H))\) and \(L_{HS}^{(2)}(U,H)= L_{HS}(U,L_{HS}(U,H))\).
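For illustration, the truncated series in (6) is straightforward to simulate. The following Python sketch samples the first K coordinates of a Q-Wiener path on an equidistant grid; the eigenvalue decay \(\eta _j = j^{-2}\) and all grid parameters are illustrative assumptions and not quantities prescribed by this paper.

```python
import numpy as np

def sample_qwiener_path(K, M, T, eta, rng):
    """Sample the coordinates <W_{t_m}, e~_j>_U = sqrt(eta_j) * beta^j_{t_m}
    of a truncated Q-Wiener process, cf. (6), on the grid t_m = m*T/M.
    Returns an array of shape (M+1, K)."""
    h = T / M
    dbeta = rng.normal(0.0, np.sqrt(h), size=(M, K))  # Brownian increments
    beta = np.vstack([np.zeros(K), np.cumsum(dbeta, axis=0)])
    return np.sqrt(eta) * beta  # scale coordinate j by sqrt(eta_j)

rng = np.random.default_rng(0)
K, M, T = 50, 1000, 1.0
eta = 1.0 / np.arange(1, K + 1) ** 2  # assumed trace-class decay of Q
W_K = sample_qwiener_path(K, M, T, eta, rng)
```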

For the existence and uniqueness of a mild solution of SPDE (2) and the validity of the proofs of convergence in Sect. 6, we assume the following conditions.

(A1):

The linear operator \(A :{\mathcal {D}}(A)\subset H \rightarrow H\) is the generator of an analytic \(C_0\)-semigroup \(S(t) = e^{At}\) for all \(t\ge 0\). We denote the eigenvalues of \(-A\) by \(\lambda _i \in (0,\infty )\) and the corresponding eigenvectors by \(e_i\) for \(i \in {\mathcal {I}}\) and some countable index set \({\mathcal {I}}\), that is, \(-Ae_i =\lambda _i e_i\) for all \(i\in {\mathcal {I}}\). Furthermore, let \(\inf _{i\in {\mathcal {I}}}\lambda _i >0\) and let the eigenfunctions \(\{ e_i : i\in {\mathcal {I}}\}\) of \(-A\) form an orthonormal basis of H, see [35], and

$$\begin{aligned} Av = \sum _{i\in {\mathcal {I}}} -\lambda _i\langle v,e_i\rangle _H e_i \end{aligned}$$

for all \(v\in {\mathcal {D}}(A)\). We introduce the real Hilbert spaces \(H_r := {\mathcal {D}}((-A)^r)\) for \(r\in [0,\infty )\) with norm \(\Vert x\Vert _{H_r} =\Vert (-A)^rx\Vert _H\) for \(x\in H_r\).

(A2):

Let \(\beta \in [0,1)\) and assume that \(F :H_{\beta } \rightarrow H\) is twice continuously Fréchet differentiable with \(\sup _{v\in H_{\beta }} \Vert F'(v)\Vert _{L(H)} < \infty \) and \(\sup _{v\in H_{\beta }} \Vert F''(v)\Vert _{L^{(2)}(H_{\beta },H)}<\infty \).

(A3):

The operator \(B :H_{\beta } \rightarrow L(U,H)_{U_0}\) is assumed to be twice continuously Fréchet differentiable such that \(\sup _{v\in H_{\beta }} \Vert B'(v)\Vert _{L(H,L(U,H))} < \infty \) and \(\sup _{v\in H_{\beta }} \Vert B''(v)\Vert _{L^{(2)}(H,L(U,H))}<\infty \). Further, let \(B(H_{\delta }) \subset L(U,H_{\delta })\) for some \(\delta \in (0,\tfrac{1}{2})\) and assume that

$$\begin{aligned} \Vert B(u) \Vert _{L(U,H_{\delta })}&\le C ( 1 + \Vert u \Vert _{H_{\delta }} ) , \\ \Vert B'(v) P B(v) - B'(w) P B(w) \Vert _{L_{HS}^{(2)}(U_0,H)}&\le C \Vert v - w \Vert _{H} , \\ \Vert (-A)^{-\vartheta } B(v) Q^{-\alpha } \Vert _{L_{HS}(U_0,H)}&\le C (1 + \Vert v \Vert _{H_{\gamma }}) \end{aligned}$$

for some constant \(C>0\), all \(u \in H_{\delta }\), \(v, w \in H_{\gamma }\), where \(\gamma \in \left[ \max ( \beta , \delta ), \delta + \frac{1}{2} \right) \), \(\alpha \in (0,\infty )\), \(\vartheta \in \left( 0, \frac{1}{2} \right) \), \(\beta \in [0,\delta +\tfrac{1}{2})\), for any orthogonal projection operator \(P :H \rightarrow \text {span}\{e_i: i\in \tilde{{\mathcal {I}}}\}\subset H\) with finite index set \(\tilde{{\mathcal {I}}}\subset {\mathcal {I}}\) as well as for the case that P is the identity.

(A4):

The initial value \(\xi :\Omega \rightarrow H_{\gamma }\) is \({\mathcal {F}}_0\)-\({\mathcal {B}}(H_{\gamma })\)-measurable and it holds \(\mathrm {E}\big [\Vert \xi \Vert ^4_{H_{\gamma }}\big ] <\infty \).

(A5):

Assume that at least one of the following conditions is fulfilled:

(a):

\(Q^{\frac{1}{2}}\) is a trace class operator,

(b):

\(\Vert B''(v)(P B(u), P B(u))\Vert _{L^{(2)}(U,L(U,H))} \le C(1+\Vert v\Vert _{H} + \Vert u\Vert _{H})\) for all \(u,v \in H\), some \(C>0\) and any orthogonal projection operator \(P :H \rightarrow \text {span}\{e_i: i\in \tilde{{\mathcal {I}}}\}\subset H\) with finite index set \(\tilde{{\mathcal {I}}}\subset {\mathcal {I}}\).

In this work, we do not distinguish between the operator B and its extension \({\tilde{B}} :H \rightarrow L(U,H)_{U_0}\); the operator \({\tilde{B}}\) is globally Lipschitz continuous as \(H_{\beta }\subset H\) is dense. We proceed analogously with F. Note that assumptions (A1)–(A4) are the same as for the scheme for SPDEs with commutative noise introduced in [20] and similar to the conditions imposed in [9, 36] for the original Milstein scheme. However, the commutativity condition (1), which is essential in [9, 20], need not be fulfilled in our setting. On the other hand, assumption (A5) is required. Assumptions (A1)–(A4) ensure the existence of a unique mild solution \(X :[0,T] \times \Omega \rightarrow H_{\gamma }\) of SPDE (2), see [8, 9]. Moreover, it holds \(\sup _{t \in [0,T]} \mathrm {E}\big [ \Vert X_t \Vert _{H_{\gamma }}^4 + \Vert B(X_t) \Vert ^4 _{L_{HS}(U_0,H_{\delta })} \big ] < \infty \) and

$$\begin{aligned} \sup _{\begin{array}{c} s,t \in [0,T] \\ s \ne t \end{array}} \frac{\left( \mathrm {E}\big [ \Vert X_t-X_s\Vert _{H_r}^p \big ] \right) ^{\frac{1}{p}}}{|t-s|^{\min (\gamma -r, \frac{1}{2})}} <\infty \end{aligned}$$

for every \(r \in [0,\gamma ]\) and \(p\in [2,4]\), see [8].
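For orientation, a standard example satisfying (A1) (an illustration, not an additional assumption of this paper) is the Laplace operator with Dirichlet boundary conditions on \(H = L^2((0,1),{\mathbb {R}})\), for which

$$\begin{aligned} -Ae_i = \pi ^2 i^2 e_i, \quad e_i(x) = \sqrt{2} \sin (i\pi x), \quad i\in {\mathcal {I}} = {\mathbb {N}}, \end{aligned}$$

so that \(\inf _{i\in {\mathcal {I}}} \lambda _i = \pi ^2 > 0\) and the spaces \(H_r = {\mathcal {D}}((-A)^r)\) form the usual scale of fractional power domains.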

2.2 The derivative-free Milstein type scheme

In this section, we derive a numerical scheme to approximate the mild solution of SPDE (2). Since the Milstein scheme (3) is computationally expensive due to the derivative that has to be evaluated in each step, see also the discussion in [20, 36], the main goal is to develop a derivative-free Milstein type scheme with significantly reduced computational complexity and thus improved performance. In order to compare numerical methods in this work, we consider the so-called effective order of convergence first introduced in [33]. This number combines the theoretical order of convergence with the computational cost involved in the calculation of an approximation by a particular scheme. As in [20], the goal is to raise the effective order of convergence by means of a customized approximation for the term \(B'B\) that is free of derivatives and, in addition, computationally less expensive. Here, however, we cannot assume that the operator \(B'B\) fulfills a commutativity condition, that is, condition (1) is not required. Therefore, as the main challenge, a completely new and customized type of stages has to be developed for a derivative-free Milstein type scheme that does not assume the commutativity condition (1). It has to be pointed out that these stages need to be designed differently from the stages of schemes for SPDEs with commutative noise, like the one proposed in [20], and are much harder to construct. In Theorem 2.1, we state that for the proposed derivative-free Milstein type scheme, the theoretical order of convergence is the same as for the Milstein scheme given in [9]. At the same time, compared to the Milstein scheme in [9], the computational effort is significantly reduced for the derivative-free Milstein type scheme. That means that the effective order of convergence is a priori larger for the proposed derivative-free method. Moreover, compared to the Euler schemes like in [7, 13, 22], the computational cost is of the same magnitude while the order of convergence w.r.t. step size h is at least the same or even significantly higher. Thus, the scheme that we derive in the following is more efficient in terms of the effective order of convergence than the Euler type schemes for many parameter sets determined by the SPDE under consideration if we combine it, for example, with the algorithms for the simulation of the iterated stochastic integrals introduced in [21], see Table 2. Precisely, compared to the Euler schemes, the increase in the computational cost that results from the approximation of the iterated stochastic integrals can be neglected and we get, in many cases, a significantly higher effective order of convergence due to the higher theoretical order of convergence w.r.t. the time step that the derivative-free Milstein type scheme features.

First, the infinite-dimensional spaces have to be discretized. For the solution space H, we introduce the projection operator \(P_N :H \rightarrow H_N\) that maps H to the finite-dimensional subspace \(H_N := \text {span}\{e_i : i\in {\mathcal {I}}_N\}\) for some fixed \(N\in {\mathbb {N}}\) with some index set \({\mathcal {I}}_N \subset {\mathcal {I}}\) and \(|{\mathcal {I}}_N|=N\). We define this operator as

$$\begin{aligned} P_N x = \sum _{i\in {\mathcal {I}}_N} \langle x,e_i\rangle _H e_i, \quad x\in H. \end{aligned}$$

Analogously, we define the projection operator \(P_K :U \rightarrow U_K\) to approximate the Q-Wiener process for some fixed \(K\in {\mathbb {N}}\) by

$$\begin{aligned} W_t^K := P_K W_t = \sum _{j\in {\mathcal {J}}_K} \sqrt{\eta _j} {\tilde{e}}_j \beta _t^j, \quad t\in [0,T], \end{aligned}$$

with \(U_K := \text {span}\{{\tilde{e}}_j : j \in {\mathcal {J}}_K\}\) for some index set \({\mathcal {J}}_K \subset {\mathcal {J}}\), \(|{\mathcal {J}}_K| = K\), and \(\eta _j\ne 0\) for \(j\in {\mathcal {J}}_K\). In order to discretize the time interval, we work with an equidistant time step for ease of presentation. Let \(h = \frac{T}{M}\) for some \(M\in {\mathbb {N}}\) and define \(t_m = m\cdot h\) for \(m\in \{0,\ldots ,M\}\). The increments of the approximated Q-Wiener process are then denoted by

$$\begin{aligned} \Delta W^{K,M}_m := W_{t_{m+1}}^K - W_{t_m}^K = \sum _{j \in {\mathcal {J}}_K} \sqrt{\eta _j} \, \Delta \beta _m^j \, {\tilde{e}}_j \end{aligned}$$

where the increments of the real-valued Brownian motions are given by \(\Delta \beta _m^j = \beta _{t_{m+1}}^j-\beta _{t_m}^j\) for \(m \in \{0,\ldots , M-1\}\), \(j\in {\mathcal {J}}_K\).

The key idea for the construction of the derivative-free Milstein type scheme is similar to that in the commutative case, see [20], which in turn is based on work for the finite-dimensional setting in [31,32,33]. The derivative-free Milstein type scheme yields a discrete time process, which we denote by \((Y_m^{N,K,M})_{m \in \{0,\ldots , M\}}\), such that \(Y_m^{N,K,M}\) is \({\mathcal {F}}_{t_m}\)-\({\mathcal {B}}(H)\)-measurable for all \(m \in \{0,\ldots ,M\}\), \(M\in {\mathbb {N}}\). We define the derivative-free Milstein type (\(\text {DFM}\)) scheme by \(Y_{0}^{N,K,M} = P_N \xi \) and

$$\begin{aligned} Y_{m+1}^{N,K,M}&= P_N e^{Ah} \bigg ( Y_{m}^{N,K,M} + h F(Y_{m}^{N,K,M}) + B(Y_m^{N,K,M}) \Delta W_m^{K,M} \nonumber \\&\quad +\left. \sum _{j\in {\mathcal {J}}_K} \bigg ( B ( Y_m^{N,K,M} + \sum _{i\in {\mathcal {J}}_K} P_N B(Y_m^{N,K,M}) {\tilde{e}}_i \, I^Q_{(i,j),m}) \right. \nonumber \\&\quad - B(Y_m^{N,K,M}) \bigg ) {\tilde{e}}_j \bigg ) \end{aligned}$$
(7)

for \(m\in \{0,\ldots ,M-1\}\), \(N,M,K\in {\mathbb {N}}\). For \(i,j\in {\mathcal {J}}_K\) and \(m\in \{0,\ldots ,M-1\}\), the term \(I^Q_{(i,j),m} = I^Q_{(i,j),t_m,t_{m+1}}\) denotes the iterated stochastic Itô integral

$$\begin{aligned} I^Q_{(i,j),t_m,t_{m+1}} = \int _{t_m}^{t_{m+1}}\int _{t_m}^s \, \langle \mathrm {d}W_r, {\tilde{e}}_i \rangle _U \, \langle \mathrm {d}W_s, {\tilde{e}}_j \rangle _{U} . \end{aligned}$$
(8)

The term containing the operator \(B'B\) in the Milstein scheme (3) is approximated by customized stages in the derivative-free Milstein type scheme (7), similar to the stages used in Runge–Kutta schemes. These stages are designed in such a way that the overall computational cost for the \(\text {DFM}\) scheme is decreased by one order of magnitude compared to the original Milstein scheme, see also the discussion in Sect. 3. However, the stage values have to be chosen differently compared to the commutative noise case. Unlike in [20], we employ only two stages, and the iterated stochastic integrals are carefully placed within the stage. Note that the reduction of the cost for the newly developed derivative-free Milstein type scheme (7) results from a carefully tailored design of the derivative-free stages based on a technique first established in [31,32,33] for the finite-dimensional SDE setting. In the SDE setting, this trick makes the number of necessary stages independent of the dimension of the driving Wiener process, which reduces the computational complexity considerably. Applied to the SPDE setting, this technique reduces the computational complexity of the scheme even more substantially, such that a higher effective order of convergence is achieved. This improvement is analyzed in detail in Sect. 3.
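To make the structure of (7) explicit, the following Python sketch performs a single \(\text {DFM}\) step in coordinates. It assumes that A is diagonal in the basis \(\{e_i\}\) (so that \(P_N e^{Ah}\) acts componentwise), that the placeholder callables F_coeff and B_coeff return the Fourier coefficients \(\langle F(y), e_i\rangle _H\) and \(\langle B(y){\tilde{e}}_j, e_i\rangle _H\) of the problem at hand, and that the iterated integrals \(I^Q_{(i,j),m}\) have been precomputed, e.g., by one of the algorithms from [21].

```python
import numpy as np

def dfm_step(Y, h, lam, F_coeff, B_coeff, dW, I_Q):
    """One step of the derivative-free Milstein type scheme (7).

    Y       : (N,) Fourier coefficients <Y_m, e_i>_H of the current iterate
    lam     : (N,) eigenvalues lambda_i of -A (A assumed diagonal here)
    F_coeff : callable, y -> (N,) coefficients <F(y), e_i>_H
    B_coeff : callable, y -> (N, K) coefficients <B(y) e~_j, e_i>_H
    dW      : (K,) increments sqrt(eta_j) * Delta beta^j of the Q-Wiener process
    I_Q     : (K, K) iterated integrals I^Q_{(i,j),m}, e.g. from [21]
    """
    B_mat = B_coeff(Y)                     # one evaluation: O(N*K) functionals
    acc = Y + h * F_coeff(Y) + B_mat @ dW  # Euler part of (7)
    for j in range(I_Q.shape[1]):
        # stage: Y_m + sum_i P_N B(Y_m) e~_i I^Q_{(i,j),m}
        stage = Y + B_mat @ I_Q[:, j]
        # apply (B(stage) - B(Y_m)) to the single basis vector e~_j;
        # only column j is needed, i.e. N functionals per stage
        acc += B_coeff(stage)[:, j] - B_mat[:, j]
    return np.exp(-lam * h) * acc          # apply P_N exp(Ah)
```

That only column j of B_coeff(stage) enters reflects that the stage operator is applied to the single basis vector \({\tilde{e}}_j\); an implementation evaluating just these N functionals per stage realizes the \({\mathcal {O}}(NK)\) functional count derived in Sect. 3.1.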

At this point, we assume that the iterated stochastic integrals are given exactly in order to consider the error estimate independently of the approximation error for the iterated integrals; the method for their computation is interchangeable. Then, in a second step, we conclude from Theorem 2.2 below that this estimate remains valid if the approximation of the iterated stochastic integrals fulfills some specified conditions.

Theorem 2.1

(Convergence of \(\text {DFM}\) scheme) Assume that (A1)–(A4) and (A5) hold. Then, there exists a constant \(C_{Q,T} \in (0,\infty )\), independent of N, K and M, such that for \((Y_m^{N,K,M})_{0 \le m \le M}\), defined by the \(\text {DFM}\) scheme in (7), it holds

$$\begin{aligned} \max _{0 \le m \le M} \Big ( \mathrm {E}\Big [ \big \Vert X_{t_m} - Y_m^{N,K,M} \big \Vert _H^2 \Big ] \Big )^{\frac{1}{2}}&\le C_{Q,T} \left( \left( \inf _{i \in {\mathcal {I}} {\setminus } {\mathcal {I}}_N} \lambda _i \right) ^{-\gamma }\right. \nonumber \\&\quad +\left. \bigg ( \sup _{j \in {\mathcal {J}} {\setminus } {\mathcal {J}}_K} \eta _j \bigg )^{\alpha } + M^{-q_{\text {DFM}}}\right) \end{aligned}$$
(9)

for all \(N,K,M \in {\mathbb {N}}\) and with \(q_{\text {DFM}}= \min (2(\gamma -\beta ),\gamma )\). The parameters are determined by assumptions (A1)–(A4).

Proof

The proof of Theorem 2.1 is stated in Sect. 6. \(\square \)

This is the same estimate (apart from the constant) as for the Milstein scheme (3) proposed in [9] or the derivative-free Milstein type scheme for SPDEs with commutative noise in [20]. The computational effort, however, increases compared to the schemes for SPDEs with commutative noise as the iterated stochastic integrals have to be simulated. We discuss this issue below.

2.3 Approximation of iterated integrals

In Sect. 2.2, we implicitly assumed that the iterated stochastic integrals can be computed exactly. However, up to now there exists no algorithm for the exact simulation of the iterated stochastic integrals in a setting with non-commutative noise. Therefore, the iterated integrals have to be approximated appropriately. We prove the following general result.

Theorem 2.2

Let \({\bar{I}}^Q_{(i,j),m}\), \(i,j\in {\mathcal {J}}_K\), \(m \in \{0, \ldots , M-1\}\), denote some approximations of the iterated stochastic integrals in (8) and let \(({\bar{Y}}_m)_{m \in \{0,\ldots ,M\}}\) with \({\bar{Y}}_m = {\bar{Y}}_m^{N,K,M}\) denote the discrete time process obtained by the \(\text {DFM}\) scheme (7) if the integrals \(I^Q_{(i,j),m}\) are replaced by the approximations \({\bar{I}}^Q_{(i,j),m}\), \(i,j\in {\mathcal {J}}_K\), \(m \in \{0,\ldots ,M-1\}\). Assume that conditions (A1)–(A5) are fulfilled and that

$$\begin{aligned}&\bigg ( \mathrm {E}\bigg [ \bigg \Vert \int _{t_l}^{t_{l+1}} B'({\bar{Y}}_l)\left( \int _{t_l}^s P_N B({\bar{Y}}_l) \, \mathrm {d}W_r^K\right) \, \mathrm {d}W^K_s \nonumber \\&\qquad - \sum _{i,j\in {\mathcal {J}}_K} {\bar{I}}_{(i,j),l}^Q B'({\bar{Y}}_l) (P_N B({\bar{Y}}_l) {\tilde{e}}_i) {\tilde{e}}_j \bigg \Vert _H^2 \bigg ] \bigg )^{\frac{1}{2}} \le {\mathcal {E}}(M,K) \end{aligned}$$
(10)

for all \(l \in \{0,\ldots ,M-1\}\), \(K,M \in {\mathbb {N}}\) and some function \({\mathcal {E}} :{\mathbb {N}} \times {\mathbb {N}} \rightarrow {\mathbb {R}}_+\). Further, in case of assumption (A5a) assume that

$$\begin{aligned} \sum _{j\in {\mathcal {J}}} \bigg ( \mathrm {E}\bigg [ \bigg ( \sum _{i\in {\mathcal {J}}} \big ({\bar{I}}_{(i,j),t,t+h}^Q \big )^2 \bigg )^{2} \bigg ] \bigg )^{\frac{1}{4}} \le C_Q h \end{aligned}$$
(11)

and in case of assumption (A5b) assume that

$$\begin{aligned} \sum _{j \in {\mathcal {J}}} \bigg ( \mathrm {E} \bigg [ \bigg ( \sum _{i \in {\mathcal {J}}} \big ( {\bar{I}}_{(i,j),t,t+h}^Q \big )^2 \bigg )^q \bigg ] \bigg )^{\frac{1}{2}} \le C_Q h^q \end{aligned}$$
(12)

for \(q\in \{2,3\}\), some \(C_Q>0\), all \(h >0\) and \(t \in [0,T-h]\). Then, there exists a constant \(C_{Q,T} \in (0,\infty )\), independent of N, K and M, such that it holds

$$\begin{aligned} \max _{0 \le m \le M} \Big ( \mathrm {E}\Big [ \big \Vert X_{t_m} - {\bar{Y}}_m \big \Vert _H^2 \Big ] \Big )^{\frac{1}{2}}&\le C_{Q,T} \left( \left( \inf _{i \in {\mathcal {I}} {\setminus } {\mathcal {I}}_N} \lambda _i \right) ^{-\gamma } \right. \\&\quad + \left. \bigg ( \sup _{j \in {\mathcal {J}} {\setminus } {\mathcal {J}}_K} \eta _j \bigg )^{\alpha } + M^{-q_{\text {DFM}}} + M^{\frac{1}{2}} \, {\mathcal {E}}(M,K) \right) \end{aligned}$$

for all \(N,K,M \in {\mathbb {N}}\) and with \(q_{\text {DFM}}=\min (2(\gamma -\beta ),\gamma )\).

Proof

The proof of this theorem is stated in Sect. 6. \(\square \)

Note that Theorem 2.2 applies to the Milstein scheme (3) as well, see also [36]. Now, we want to illustrate this statement with two exemplary choices—Algorithm 1 and Algorithm 2 as introduced in [21]. First, we consider Algorithm 1 which is based on a series representation of the iterated stochastic integral. This representation is truncated after D summands for some \(D\in {\mathbb {N}}\), see [12, 21], which yields the approximation. The numerical scheme (7) is called \(\text {DFM-A1}\) if the iterated integrals are approximated by Algorithm 1—denoted as \({\bar{I}}^{Q}_{(i,j),m}={\bar{I}}^{Q,(D),(1)}_{(i,j),m}\). For this method, there exists some constant \(C_{Q,T}>0\) such that (10) is fulfilled with

$$\begin{aligned} {\mathcal {E}}(M,K) = {\mathcal {E}}^{(D),(1)}(M,K) = C_{Q,T} \frac{1}{M \, \sqrt{D}} \end{aligned}$$
(13)

for all \(D,K,M\in {\mathbb {N}}\), see [21, Corollary 1]. If we approximate the integrals with Algorithm 2 instead, we denote the scheme (7) by \(\text {DFM-A2}\) and the approximation \({\bar{I}}^{Q}_{(i,j),m}\) of \(I^Q_{(i,j),m}\) by \({\bar{I}}^{Q,(D),(2)}_{(i,j),m}\). Here, the series representation is not only truncated after D summands, but the remainder is also approximated by a multivariate normally distributed random vector; for details, we refer to [21, 24, 39]. For this algorithm, (10) holds with

$$\begin{aligned} {\mathcal {E}}(M,K) = {\mathcal {E}}^{(D),(2)}(M,K) = C_{Q,T} \frac{\min \left( K \sqrt{K-1}, (\min _{j\in {\mathcal {J}}_K} \eta _j)^{-1} \right) }{M \, D} \end{aligned}$$
(14)

for all \(D, K, M \in {\mathbb {N}}\) and some constant \(C_{Q,T}>0\), see [21, Corollary 2, Theorem 4]. This estimate shows that the error converges with a higher order in D compared to the estimate for Algorithm 1. Note that the error estimate also depends on the number K, which controls the accuracy of the approximation of the Q-Wiener process, and on the eigenvalues of the operator Q. For a proof of the error estimates (13) and (14), we refer to [21]. Moreover, conditions (11) and (12) are fulfilled for Algorithms 1 and 2, which can easily be seen from the definition of the algorithms in [21].
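For orientation only, the following Python sketch indicates how a truncated series simulation of the iterated integrals can look. It uses the classical Fourier/Lévy area expansion for finitely many scalar Brownian motions together with the scaling \(I^Q_{(i,j)} = \sqrt{\eta _i \eta _j} \, I_{(i,j)}\) implied by (6) and (8); it is a stand-in for Algorithm 1, whose precise formulation, constants and error analysis are given in [21].

```python
import numpy as np

def iterated_integrals_truncated(dbeta, h, eta, D, rng):
    """Approximate the iterated Ito integrals I^Q_{(i,j)} over one step of
    length h from the scalar Brownian increments dbeta (shape (K,)).
    Illustrative truncated Levy-area expansion; the precise Algorithm 1
    of [21] differs in its details. Returns a (K, K) array."""
    K = dbeta.shape[0]
    A = np.zeros((K, K))                  # Levy areas, series truncated at D
    for k in range(1, D + 1):
        X = rng.standard_normal(K)
        Y = rng.standard_normal(K) + np.sqrt(2.0 / h) * dbeta
        A += (np.outer(X, Y) - np.outer(Y, X)) / k
    A *= h / (2.0 * np.pi)
    # I_(i,j) = (dbeta_i * dbeta_j - h * delta_ij) / 2 + A_ij
    I = 0.5 * (np.outer(dbeta, dbeta) - h * np.eye(K)) + A
    return np.sqrt(np.outer(eta, eta)) * I  # Q-scaling sqrt(eta_i * eta_j)
```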

In order to determine which of the two algorithms yields the higher effective order of convergence, one has to analyze the computational costs that are involved, see also [21, 36] for a comparison. The goal is that the \(\text {DFM}\) scheme combined with Algorithm 1 or Algorithm 2 preserves the error estimate stated in Theorem 2.1. This requires a choice of \(D \ge D_{1} = \lceil M^{2\min (2(\gamma -\beta ),\gamma )-1} \rceil \) for Algorithm 1, whereas for Algorithm 2 we need \(D \ge D_2 = \lceil \min \big ( K \sqrt{K-1}, (\min _{j\in {\mathcal {J}}_K} \eta _j)^{-1} \big ) M^{\min (2(\gamma -\beta ),\gamma )-\frac{1}{2}} \rceil \). Alternatively, one can choose \(D \ge D_{1} = \lceil M^{-1} (\sup _{j \in {\mathcal {J}} {\setminus } {\mathcal {J}}_K}\eta _j )^{-2\alpha } \rceil \) for the first algorithm and \(D \ge D_{2} = \lceil M^{-\frac{1}{2}} \min \big ( K \sqrt{K-1}, (\min _{j\in {\mathcal {J}}_K} \eta _j)^{-1} \big ) ( \sup _{j \in {\mathcal {J}} {\setminus } {\mathcal {J}}_K}\eta _j )^{-\alpha } \rceil \) for the second algorithm. However, if all summands of the error estimate in Theorem 2.1 are optimally balanced, then \(( \sup _{j \in {\mathcal {J}} {\setminus } {\mathcal {J}}_K} \eta _j )^{\alpha } = {\mathcal {O}}( M^{-\min (2(\gamma -\beta ),\gamma )} )\), which results in the same orders of magnitude for the choices of \(D_1\) and \(D_2\), respectively. These considerations show that the computational effort for the two schemes \(\text {DFM-A1}\) and \(\text {DFM-A2}\) is determined by the parameters, which in turn are specified by the equation under consideration. Therefore, the choice of the optimal scheme depends on the SPDE that has to be solved. From now on, we assume that \(D\in {\mathbb {N}}\) is chosen such that the temporal order of convergence is not decreased, i.e., \(D=D_1\) for Algorithm 1 or \(D=D_2\) for Algorithm 2, respectively.
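The first pair of truncation levels \(D_1\) and \(D_2\) can be transcribed directly; the following small helper assumes the convention \(q = \min (2(\gamma -\beta ),\gamma )\) from above, with the K employed eigenvalues of Q passed in as eta_K.

```python
import math

def truncation_levels(M, K, gamma, beta, eta_K):
    """Truncation levels for Algorithms 1 and 2 that preserve the temporal
    order of Theorem 2.1; eta_K holds the K nonzero eigenvalues of Q used."""
    q = min(2.0 * (gamma - beta), gamma)
    D1 = math.ceil(M ** (2.0 * q - 1.0))
    D2 = math.ceil(min(K * math.sqrt(K - 1), 1.0 / min(eta_K))
                   * M ** (q - 0.5))
    return D1, D2
```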

Remark 2.1

Note that Algorithm 1 and Algorithm 2 proposed in [21] merely represent examples and that Theorem 2.2 is valid if the derivative-free Milstein type scheme \(\text {DFM}\) is combined with any approximation for the iterated stochastic integrals such that conditions (10) together with (11) or (12) are fulfilled.

3 The effective order of convergence: a comparison

In the following, we compare the performance of the derivative-free Milstein type (\(\text {DFM}\)) scheme to the performance of the original Milstein (\(\text {MIL}\)) scheme (3), the exponential Euler (\(\text {EXE}\)) scheme and the linear implicit Euler (\(\text {LIE}\)) scheme. For example, one can combine the \(\text {DFM}\) scheme and the \(\text {MIL}\) scheme with Algorithm 1 or Algorithm 2 in order to approximate the solution of SPDEs that need not fulfill the commutativity condition (1). However, the analysis can be done similarly for any other approximation method for the iterated stochastic integrals as specified in Theorem 2.2. In the following, we restrict our analysis to Algorithm 1 as an example. The \(\text {LIE}\) scheme is considered in [13, 37] and the \(\text {EXE}\) scheme is introduced in [22]; both are combined with a Galerkin approximation. The proof of the following proposition is detailed in [19] and the main idea can be found in [9].

Proposition 3.1

(Convergence of \(\text {EXE}\) scheme) Assume that (A1)–(A4) hold. Then, there exists a constant \(C_{Q,T} \in (0,\infty )\), independent of N, K and M, such that for the approximation process \((Y_m^{\text {EXE}})_{0 \le m \le M}\) with \(Y_m^{\text {EXE}}=Y_m^{\text {EXE}; N,K,M}\), defined by the \(\text {EXE}\) scheme, it holds

$$\begin{aligned}&\max _{0 \le m \le M} \Big ( \mathrm {E}\Big [ \big \Vert X_{t_m} - Y_m^{\text {EXE}} \big \Vert _H^2 \Big ] \Big )^{\frac{1}{2}} \nonumber \\&\quad \le C_{Q,T} \left( \left( \inf _{i \in {\mathcal {I}} {\setminus } {\mathcal {I}}_N} \lambda _i \right) ^{-\gamma } + \bigg ( \sup _{j \in {\mathcal {J}} {\setminus } {\mathcal {J}}_K} \eta _j \bigg )^{\alpha } + M^{-q_{\text {EXE}}} \right) \end{aligned}$$
(15)

for all \(N,K,M \in {\mathbb {N}}\) and with \(q_{\text {EXE}} = \min (\frac{1}{2},2(\gamma -\beta ),\gamma )\). The parameters are determined by assumptions (A1)–(A4).

Compared to the \(\text {DFM}\) scheme, the \(\text {EXE}\) scheme requires less restrictive assumptions: we need neither (A5) nor conditions on the second derivative of B, and the estimate for \(B'(v) P B(v)-B'(w) P B(w)\) can be omitted. For the \(\text {LIE}\) scheme, results similar to Proposition 3.1 can be obtained analogously. Below, q stands for the order of convergence w.r.t. the step size \(h=\frac{T}{M}\) with \(q_{\text {DFM}}= q_{\text {MIL}}= \min (2(\gamma -\beta ),\gamma ) \ge \min (\frac{1}{2},2(\gamma -\beta ),\gamma ) = q_{\text {EXE}}\). However, to compare the performance of the schemes, we have to take their computational cost into account in combination with their error estimates since, e.g., iterated stochastic integrals have to be simulated for the higher order schemes \(\text {DFM}\) and \(\text {MIL}\) only.

3.1 The cost model

In order to compare the efficiency of different approximation algorithms, one is usually interested in the dependence of the error on the computational cost. Therefore, we consider a theoretical cost model proposed in [20]. It is assumed that any standard arithmetic operation or evaluation of the sine, cosine or exponential function etc. produces unit cost 1. Further, the simulation of a realization of an N(0, 1)-distributed real-valued random variable is assumed to produce cost at least 1. However, the evaluation \(\phi (v)\) of a functional \(\phi \in V^*\) with \(V=H\) or \(V=U\) is usually more costly; we assume \({\text {cost}}(\phi ,v) = {\text {cost}}(\phi ) \equiv c\) for all \(v \in V\) and some \(c \ge 1\), where typically \(c \gg 1\). Such functionals are needed, e.g., for the calculation of Fourier coefficients \(\phi _i(v) = \langle v, {\hat{e}}_i \rangle _V\) of \(v \in V\) for some ONB \(\{{\hat{e}}_i\}_{i \in {\mathbb {N}}}\) of V. Let \(L(H,E)_{N} = \{ T\arrowvert _{H_N} : \ T \in L(H,E)\}\) for some vector space E and let \(L_{HS}(U,H)_{K,N} = \{ P_N T\arrowvert _{U_K} : \ T \in L_{HS}(U,H)\}\). As a result, we obtain for any \(v,y \in H_N\) and \(u \in U_K\) the following computational costs due to \(|{\mathcal {I}}_N|=N\) and \(|{\mathcal {J}}_K|=K\) [20]:

(i)

    One evaluation of the mapping \(P_N \circ F :H \rightarrow H_N\) with

    $$\begin{aligned} P_N F(y) = \sum _{i \in {\mathcal {I}}_N} \langle F(y), e_i \rangle _H \, e_i \end{aligned}$$

    is determined by the functionals \(\langle F(y), e_i \rangle _H\) for \(i \in {\mathcal {I}}_N\) which results in \({\text {cost}}(P_N F) = {\mathcal {O}}( N )\).

(ii)

    Evaluating \(P_N \circ B(\cdot )\arrowvert _{U_K} :H \rightarrow L_{HS}(U,H)_{K,N}\) with

    $$\begin{aligned} P_N B(y)u = \sum _{i \in {\mathcal {I}}_N} \sum _{j \in {\mathcal {J}}_K} \langle B(y) {\tilde{e}}_j, e_i \rangle _H \, \langle u, {\tilde{e}}_j \rangle _U \, e_i \end{aligned}$$

    needs the evaluation of the functionals \(\langle B(y) {\tilde{e}}_j, e_i \rangle _H\) for \(i \in {\mathcal {I}}_N\) and \(j \in {\mathcal {J}}_K\) which results in \({\text {cost}}(P_N \circ B(y)\arrowvert _{U_K}) = {\mathcal {O}}( N K )\).

(iii)

    For \(P_N \circ B'(\cdot )\arrowvert _{H_N,U_K} :H \rightarrow L(H,L_{HS}(U,H)_{K,N})_N\) with

    $$\begin{aligned} P_N \big ( (B'(y)v)u \big ) = \sum _{k,l \in {\mathcal {I}}_N} \sum _{j \in {\mathcal {J}}_K} \langle (B'(y) e_k) {\tilde{e}}_j, e_l \rangle _H \, \langle v, e_k \rangle _H \, \langle u, {\tilde{e}}_j \rangle _U \, e_l \end{aligned}$$

    the functionals \(\langle (B'(y) e_k) {\tilde{e}}_j, e_l \rangle _H\) have to be evaluated for all \(k,l \in {\mathcal {I}}_N\) and \(j \in {\mathcal {J}}_K\) and it follows that \({\text {cost}}(P_N \circ (B'(y)(\cdot ))(\cdot )\arrowvert _{H_N,U_K}) = {\mathcal {O}}( N^2 K )\).

Considering the computational cost for one time step of the Milstein scheme (3), one evaluation of \(P_N \circ F(\cdot )\), one of \(P_N \circ B(\cdot )\arrowvert _{U_K}\), and one evaluation of \(P_N \circ B'(\cdot )\arrowvert _{H_N,U_K}\) are needed. Then, the evaluated operators \(P_N \circ B(Y^{\text {MIL}}_m) \arrowvert _{U_K} \in L(U,H)_{K,N}\), \(P_N \circ B'(Y^{\text {MIL}}_m)\arrowvert _{H_N,U_K} \in L(H,L_{HS}(U,H)_{K,N})_N\), \(P_N \circ B'(Y^{\text {MIL}}_m)(v)\arrowvert _{U_K} \in L_{HS}(U,H)_{K,N}\) and \(P_N \circ e^{Ah} \in L(H,H)_{N,N}\) have to be applied to the corresponding elements of the Hilbert spaces. Here, it has to be pointed out that calculating the Fourier coefficients of \(P_N B(Y^{\text {MIL}}_m) {\tilde{e}}_j\) for some basis element \({\tilde{e}}_j \in U_K\) comes for free because they form the j-th column of the matrix representation \(P_N B(Y^{\text {MIL}}_m)\arrowvert _{U_K} = \big (b_{i,j}(Y^{\text {MIL}}_m) \big )_{i \in {\mathcal {I}}_N, j \in {\mathcal {J}}_K}\) with \(b_{i,j}(Y^{\text {MIL}}_m)=\langle B(Y^{\text {MIL}}_m) {\tilde{e}}_j, e_i \rangle _H\), which is already determined. The same applies to the operator \(P_N \circ B'(Y^{\text {MIL}}_m)(v)\arrowvert _{U_K} \in L_{HS}(U,H)_{K,N}\) if it is applied to some basis element \({\tilde{e}}_j \in U_K\). Thus, the computational cost for M time steps of the Milstein scheme is \({\text {cost}}(\text {MIL}(N,K,M)) = {\mathcal {O}}(N^2 K M)\) if the cost for the simulation of iterated stochastic integrals is not taken into account.

In contrast to the Milstein scheme, in each time step the proposed derivative-free Milstein type scheme \(\text {DFM}\) needs one evaluation of \(P_N \circ F(\cdot )\), one evaluation of \(P_N \circ B(\cdot )\arrowvert _{U_K}\), and the calculation of

$$\begin{aligned} \sum _{j \in {\mathcal {J}}_K} P_N B \bigg ( Y_m^{N,K,M} + \sum _{i\in {\mathcal {J}}_K} P_N B(Y_m^{N,K,M}) {\tilde{e}}_i \, I^Q_{(i,j),m} \bigg ) {\tilde{e}}_j . \end{aligned}$$
(16)

Observe that the calculation of each summand requires the computation of the functionals

$$\begin{aligned} \phi _k^j = \bigg \langle B \bigg ( Y_m^{N,K,M} + \sum _{i\in {\mathcal {J}}_K} P_N B(Y_m^{N,K,M}) {\tilde{e}}_i \, I^Q_{(i,j),m} \bigg ) {\tilde{e}}_j, e_k \bigg \rangle _H \end{aligned}$$

for \(k \in {\mathcal {I}}_N\) with \({\text {cost}}(\{\phi _k^j : k \in {\mathcal {I}}_N \})=c N\) for each \(j \in {\mathcal {J}}_K\) due to \(|{\mathcal {I}}_N|=N\). Therefore, the evaluation of (16) can be done with cost \({\mathcal {O}}( N K )\). Here, the crucial point is that although the argument of B depends on the index j, the resulting operator is then applied to the basis function \({\tilde{e}}_j\) only, which is responsible for the fundamental reduction of the computational complexity. In addition, the linear operators \(P_N \circ e^{Ah} \arrowvert _{H_N} \in L(H,H)_{N,N}\) and \(P_N \circ B(Y_m^{N,K,M})\arrowvert _{U_K} \in L(U,H)_{K,N}\) (note again that calculating, e.g., \(P_N B(Y_m^{N,K,M}) {\tilde{e}}_j\) for a basis element \({\tilde{e}}_j \in U_K\) comes for free) have to be applied to the corresponding elements of the Hilbert spaces. Thus, the total computational cost for M time steps of the \(\text {DFM}\) scheme is \({\text {cost}}(\text {DFM}(N,K,M))={\mathcal {O}}(N K M)\) if the cost for the simulation of iterated stochastic integrals is not taken into account.

Analogously to the derivative-free Milstein type schemes \(\text {DFM-A1}\) and \(\text {DFM-A2}\), we denote the Milstein scheme by \(\text {MIL-A1}\) and \(\text {MIL-A2}\) if it is combined with Algorithm 1 or Algorithm 2 proposed in [21] for the approximation of the iterated stochastic integrals, respectively. The dominating computational cost due to the necessary evaluations of real-valued functionals and the simulation of random numbers for each time step can be found in Table 1 for the linear implicit Euler scheme \(\text {LIE}\) as well as for the \(\text {EXE}\), \(\text {MIL-A1}\), \(\text {MIL-A2}\), \(\text {DFM-A1}\) and \(\text {DFM-A2}\) schemes. It has to be pointed out that, in contrast to finite-dimensional stochastic differential equations (SDEs), the computational effort of each numerical scheme depends not only on the number of time steps M, but also on the dimensions N and K of the subspaces \(H_N\) and \(U_K\), which have to increase in order to decrease the approximation error, compare Theorem 2.1 and Proposition 3.1. It turns out that different schemes can attain the same error estimates as in (9) and (15), yet with significantly different computational cost, see also the discussion in [20]. In order to compare the performance of different numerical schemes, one therefore has to compare the accuracy of each scheme versus the required computational cost instead of just comparing their error estimates w.r.t. N, K and M given in Theorem 2.1 and Proposition 3.1. For example, the Milstein scheme and the derivative-free Milstein type scheme both attain the same error estimate (9); however, the computational effort for \(\text {MIL}\) is \({\text {cost}}(\text {MIL}(N,K,M)) = {\mathcal {O}}(N^2 K M)\) whereas for the \(\text {DFM}\) scheme it is only \({\text {cost}}(\text {DFM}(N,K,M))={\mathcal {O}}(N K M)\) if the random numbers are assumed not to be the dominating cost. Therefore, in this case, the \(\text {DFM}\) scheme performs a priori with a higher order of convergence than the \(\text {MIL}\) scheme if errors versus costs are considered. Compared with the \(\text {LIE}\) scheme and the \(\text {EXE}\) scheme, the \(\text {DFM}\) scheme belongs to the same class \({\mathcal {O}}(N K M)\) of computational complexity, which is in some sense optimal for one-step approximations for SPDEs of type (2). Although the \(\text {LIE}\) scheme as well as the \(\text {EXE}\) scheme have worse error bounds, given in (15), compared to the one for the \(\text {DFM}\) and the \(\text {MIL}\) scheme in (9), it is not clear which scheme should be preferred because the computational cost for simulating the iterated stochastic integrals for the \(\text {DFM}\) and the \(\text {MIL}\) scheme has to be taken into account as well. Therefore, we derive the effective order of convergence for each scheme under consideration. This concept is also detailed in [20].

Table 1 Computational cost given by the number of evaluations of real-valued functionals and independent N(0, 1)-distributed random variables needed for each time step

3.2 Comparison of the effective orders of convergence

In order to compare the performance of different numerical schemes, we consider the so-called effective order of convergence which was proposed in [33] and also considered in [20]. In the following, we restrict our comparison to the schemes \(\text {DFM-A1}\) and \(\text {MIL-A1}\), both using Algorithm 1, as well as the \(\text {EXE}\) scheme in order to keep the analysis concise. Further, we assume that the simulation of a normally distributed real-valued random variable produces the same computational cost as the evaluation of a functional. For a detailed analysis and comparison of the effective order of convergence for the schemes \(\text {MIL-A1}\), \(\text {MIL-A2}\) and \(\text {EXE}\) we refer to [36]. Since the \(\text {LIE}\) scheme and the \(\text {EXE}\) scheme have the same order of convergence and similar computational cost, we restrict our analysis to the \(\text {EXE}\) scheme in the following because one can get exactly the same results for the \(\text {LIE}\) scheme. We want to point out that the focus of this article lies on the introduction and analysis of the derivative-free Milstein type scheme; a complete comparison taking into account further algorithms besides Algorithm 1 for the simulation of the iterated stochastic integrals would go beyond the scope of this article and may be the object of future research.

For each scheme under consideration and its approximation process \((Y_m^{N,K,M})_{m\in \{0,\ldots ,M\}}\), we have to minimize the error term

$$\begin{aligned} \sup _{m\in \{0,\ldots ,M\}}\big ( \mathrm {E}\big [ \Vert X_{t_m} - Y_m^{N,K,M} \Vert _H^2 \big ] \big )^{\frac{1}{2}} \end{aligned}$$

over all \(N,K,M \in {\mathbb {N}}\) under the constraint that the computational cost does not exceed some specified value \({\bar{c}}>0\). Note that if D is chosen as described in Sect. 2.3, then the computational cost of each scheme given in Table 1 depends on N, K and M only. In the following, we assume that \(\sup _{j\in {\mathcal {J}} {\setminus } {\mathcal {J}}_K }\eta _j = {\mathcal {O}}( K^{-\rho _Q})\) and \(( \inf _{i\in {\mathcal {I}}{\setminus } {\mathcal {I}}_N}\lambda _i )^{-1} = {\mathcal {O}}( N^{-\rho _A})\) for some \(\rho _A>0\) and \(\rho _Q>1\). Then, we obtain the following expression for all \(N,K,M \in {\mathbb {N}}\) and some \(C>0\), see also [20],

$$\begin{aligned} \text {err}(\text {SCHEME})&=\sup _{m\in \{0,\ldots ,M\}} \Big ( \mathrm {E}\Big [ \big \Vert X_{t_m} - Y_m^{N,K,M} \big \Vert _H^2 \Big ] \Big )^{\frac{1}{2}} \nonumber \\&\le C \big ( N^{-\gamma \rho _A}+K^{-\alpha \rho _Q}+M^{-q} \big ). \end{aligned}$$
(17)

Note that the parameter \(q>0\) is determined by the scheme that is considered. Given some computational cost \({\bar{c}}>0\), the goal is to minimize the error under the constraint that the computational cost is bounded by \({\bar{c}}\). Solving this optimization problem yields the effective order of convergence, denoted by EOC(SCHEME), which is then given by an expression of the form

$$\begin{aligned} \text {err}(\text {SCHEME}) ={\mathcal {O}}\big ({\bar{c}}^{\, \, -\text {EOC(SCHEME)}}\big ). \end{aligned}$$

Next, we analyze the effective order of convergence for the \(\text {DFM-A1}\) scheme and the \(\text {MIL-A1}\) scheme, which make use of Algorithm 1 for the approximation of the iterated stochastic integrals, and the \(\text {EXE}\) scheme. To begin with, let \(q := q_{\text {DFM}}= q_{\text {MIL}}= \min (2(\gamma -\beta ),\gamma )\) for the scheme \(\text {DFM-A1}\) and the scheme \(\text {MIL-A1}\), see [36, Thm. 1] and [9, Thm. 1], and let \(D= D_1 = {\mathcal {O}}( M^{2q-1} )\) for Algorithm 1 unless otherwise stated.

First, we consider the scheme \(\text {DFM-A1}\). The computational cost for the calculation of one trajectory with M time steps amounts to \({\bar{c}} = {\mathcal {O}}(MKN)+{\mathcal {O}}(KM^{2 q})\), see Table 1 and the discussion in Sect. 3.1. Solving the above mentioned optimization problem results in \(M={\mathcal {O}}({\bar{c}}^{\frac{\alpha \rho _Q \gamma \rho _A}{z}})\), \(K={\mathcal {O}}({\bar{c}}^{\frac{\gamma \rho _A q}{z}})\) and \(N={\mathcal {O}}({\bar{c}}^{\frac{\alpha \rho _Q q}{z}})\) for some \(z>0\) such that all summands in (17) are balanced. Then, one determines z from \({\bar{c}} = {\mathcal {O}}(MKN)\) or \({\bar{c}} = {\mathcal {O}}(KM^{2 q})\), depending on which of the two terms dominates the total computational cost. If \(2q-1 \le 0\) or if \(M^{2q-1} = {\mathcal {O}}(N)\), then \({\bar{c}} = {\mathcal {O}}(MKN)\) and we calculate that \(z= (\alpha \rho _Q + \gamma \rho _A)q + \alpha \rho _Q \gamma \rho _A\); this is the case if \(\alpha \rho _Q \gamma \rho _A (2q -1) \le \alpha \rho _Q q\) is fulfilled. On the other hand, if \(2q-1>0\) and if \(N={\mathcal {O}}(M^{2q-1})\), then \({\bar{c}} = {\mathcal {O}}(KM^{2 q})\) is the dominating cost and we get \(z=(1+2 \alpha \rho _Q) \gamma \rho _A q\), which is the case if \(\alpha \rho _Q \gamma \rho _A (2q-1) \ge \alpha \rho _Q q\).

Now, two cases have to be distinguished: If \(\gamma \rho _A (2q -1) \le q\) is fulfilled, then \({\bar{c}} = {\mathcal {O}}(MKN)\). We solve the optimization problem and obtain

$$\begin{aligned} M&= {\mathcal {O}}\left( {\bar{c}}^{\, \frac{\gamma \rho _A\alpha \rho _Q}{(\alpha \rho _Q+\gamma \rho _A) q + \alpha \rho _Q\gamma \rho _A}}\right) , \quad N = {\mathcal {O}}\left( {\bar{c}}^{\, \frac{\alpha \rho _Q q}{(\alpha \rho _Q+\gamma \rho _A) q +\alpha \rho _Q\gamma \rho _A}}\right) ,\nonumber \\ K&= {\mathcal {O}}\left( {\bar{c}}^{\, \frac{\gamma \rho _A q}{(\alpha \rho _Q+\gamma \rho _A) q +\alpha \rho _Q\gamma \rho _A}}\right) . \end{aligned}$$
(18)

Further, the effective order of convergence is given by

$$\begin{aligned} \text {err}(\text {DFM-A1}) = {\mathcal {O}}\left( {\bar{c}}^{\, \, -\frac{\gamma \rho _A\alpha \rho _Q q }{(\alpha \rho _Q+\gamma \rho _A) q +\alpha \rho _Q\gamma \rho _A}}\right) , \end{aligned}$$
(19)

which is the same result as for the derivative-free Milstein type scheme in the case of SPDEs with commutative noise, see the computations in [20]. On the other hand, if \(\gamma \rho _A (2 q -1) \ge q\) holds, then \({\bar{c}} = {\mathcal {O}}(KM^{2 q})\) and optimization yields

$$\begin{aligned} M = {\mathcal {O}}\left( {\bar{c}}^{\, \frac{\alpha \rho _Q}{(2\alpha \rho _Q+1) q}}\right) , \quad N = {\mathcal {O}}\left( {\bar{c}}^{\, \frac{\alpha \rho _Q}{(2\alpha \rho _Q+1)\gamma \rho _A}}\right) , \quad K = {\mathcal {O}}\left( {\bar{c}}^{\, \frac{1}{2\alpha \rho _Q+1}}\right) . \end{aligned}$$
(20)

In this case, we obtain the effective order of convergence from

$$\begin{aligned} \text {err}(\text {DFM-A1}) = {\mathcal {O}}\left( {\bar{c}}^{\, \, -\frac{\alpha \rho _Q}{2\alpha \rho _Q+1}}\right) . \end{aligned}$$
(21)

Next, we consider the Milstein scheme \(\text {MIL-A1}\). Here, the computational effort for the computation of one trajectory is \({\bar{c}} = {\mathcal {O}}(MKN^2)+{\mathcal {O}}(KM^{2 q})\), compare Table 1. Again, two cases have to be considered: If \(\gamma \rho _A(2q -1) \le 2 q\), then \({\bar{c}} = {\mathcal {O}}(MKN^2)\) and solving the optimization problem yields

$$\begin{aligned} M&= {\mathcal {O}}\left( {\bar{c}}^{\, \frac{\gamma \rho _A\alpha \rho _Q}{(2\alpha \rho _Q+\gamma \rho _A) q + \alpha \rho _Q\gamma \rho _A}}\right) , \quad N = {\mathcal {O}}\left( {\bar{c}}^{\, \frac{\alpha \rho _Q q}{(2\alpha \rho _Q+\gamma \rho _A) q +\alpha \rho _Q\gamma \rho _A}}\right) ,\nonumber \\ K&= {\mathcal {O}}\left( {\bar{c}}^{\, \frac{\gamma \rho _A q}{(2\alpha \rho _Q+\gamma \rho _A) q +\alpha \rho _Q\gamma \rho _A}}\right) . \end{aligned}$$
(22)

As a result of this, we obtain the effective order of convergence from

$$\begin{aligned} \text {err}(\text {MIL-A1}) = {\mathcal {O}}\left( {\bar{c}}^{\, \, -\frac{\gamma \rho _A\alpha \rho _Q q }{(2\alpha \rho _Q+\gamma \rho _A) q +\alpha \rho _Q\gamma \rho _A}}\right) , \end{aligned}$$
(23)

which is also the same effective order of convergence as for the Milstein scheme if it is applied to some SPDE with commutative noise, see also [20]. However, in the case of \(\gamma \rho _A(2q -1) \ge 2 q\) the computational effort for the \(\text {MIL-A1}\) scheme is \({\bar{c}} = {\mathcal {O}}(KM^{2 q})\) and we obtain the same choice for M, N and K as given in (20) and also the same effective order of convergence as given by (21), see also [36].

Finally, we consider the \(\text {EXE}\) scheme, for which the optimal choice of M, N and K is given by (18), however with \(q=q_{\text {EXE}}= \min (\frac{1}{2},2(\gamma -\beta ),\gamma )\). The effective order of convergence for the \(\text {EXE}\) scheme was computed in [20] and is given by

$$\begin{aligned} \text {err}(\text {EXE}) = {\mathcal {O}}\left( {\bar{c}}^{\, \, -\frac{\gamma \rho _A\alpha \rho _Q q_{\text {EXE}}}{(\alpha \rho _Q+\gamma \rho _A) q_{\text {EXE}}+\gamma \rho _A\alpha \rho _Q}} \right) . \end{aligned}$$
(24)

Here, we note that the same holds for the \(\text {LIE}\) scheme.
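The case distinctions above can be collected in a small routine that returns the effective orders of convergence (19), (21), (23) and (24) for a given parameter set. In the following sketch, a and g abbreviate \(\alpha \rho _Q\) and \(\gamma \rho _A\); the sample values at the bottom are assumptions for illustration only.

```python
def eoc_dfm_a1(a, g, q):
    """Effective order of DFM-A1; a = alpha*rho_Q, g = gamma*rho_A."""
    if g * (2 * q - 1) <= q:                      # cost O(MKN) dominates
        return g * a * q / ((a + g) * q + a * g)  # cf. (19)
    return a / (2 * a + 1)                        # cf. (21)

def eoc_mil_a1(a, g, q):
    """Effective order of MIL-A1."""
    if g * (2 * q - 1) <= 2 * q:                      # cost O(MKN^2) dominates
        return g * a * q / ((2 * a + g) * q + a * g)  # cf. (23)
    return a / (2 * a + 1)                            # cf. (21)

def eoc_exe(a, g, q_exe):
    """Effective order of EXE (and likewise LIE), cf. (24)."""
    return g * a * q_exe / ((a + g) * q_exe + g * a)

# illustrative parameter set (assumed values, not taken from the paper)
a, g, q = 1.0, 2.0, 1.0
print(eoc_dfm_a1(a, g, q), eoc_mil_a1(a, g, q), eoc_exe(a, g, min(0.5, q)))
```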

In order to determine the most efficient scheme for the approximation of the solution of an SPDE of type (2) that need not fulfill a commutativity condition for the noise, we have to compare the effective orders of convergence of the schemes \(\text {DFM-A1}\), \(\text {MIL-A1}\) and \(\text {EXE}\) according to the distinct parameter settings.

If \(\gamma \rho _A (2q-1) \le q\) and \(q \le \frac{1}{2}\), then it follows that \(q= q_{\text {EXE}}\). Thus, the \(\text {EXE}\) scheme and the \(\text {DFM-A1}\) scheme have the same effective order of convergence given in (19) and (24), whereas the \(\text {MIL-A1}\) scheme obviously has a lower effective order of convergence, given in (23).

If \(\gamma \rho _A (2q-1) \le q\) and \(q > \frac{1}{2}\), then it follows that \(q> q_{\text {EXE}}= \frac{1}{2}\). Here, the \(\text {DFM-A1}\) scheme obviously has a higher effective order of convergence than the \(\text {EXE}\) scheme. Further, comparing the effective orders of convergence of the \(\text {EXE}\) scheme and the \(\text {MIL-A1}\) scheme results in

$$\begin{aligned} \frac{q_{\text {MIL}}\gamma \rho _A \alpha \rho _Q}{(2\alpha \rho _Q+\gamma \rho _A) q_{\text {MIL}}+ \gamma \rho _A \alpha \rho _Q} \le \frac{\frac{1}{2}\gamma \rho _A\alpha \rho _Q}{(\alpha \rho _Q+\gamma \rho _A)\frac{1}{2} +\gamma \rho _A\alpha \rho _Q} . \end{aligned}$$

Hence, the \(\text {EXE}\) scheme has a higher effective order of convergence than the \(\text {MIL-A1}\) scheme, and thus the \(\text {DFM-A1}\) scheme attains the highest effective order of convergence in this case.

If \(q \le \gamma \rho _A (2q-1) \le 2q\), then it follows that \(q>q_{\text {EXE}}=\frac{1}{2}\). In this case, it holds for the effective orders of convergence of the \(\text {EXE}\), the \(\text {MIL-A1}\) and the \(\text {DFM-A1}\) scheme that

$$\begin{aligned} \frac{\frac{1}{2}\gamma \rho _A\alpha \rho _Q}{(\alpha \rho _Q+\gamma \rho _A)\frac{1}{2} +\gamma \rho _A\alpha \rho _Q} \le \frac{q_{\text {MIL}}\gamma \rho _A \alpha \rho _Q}{(2\alpha \rho _Q+\gamma \rho _A) q_{\text {MIL}}+ \gamma \rho _A \alpha \rho _Q} \le \frac{\alpha \rho _Q}{2\alpha \rho _Q+1} . \end{aligned}$$

Thus, the \(\text {DFM-A1}\) scheme is the one with the highest effective order of convergence in the present case.

If \(2q \le \gamma \rho _A (2q-1)\), it holds that \(q>q_{\text {EXE}}=\frac{1}{2}\). In this case, the \(\text {DFM-A1}\) and the \(\text {MIL-A1}\) scheme attain the same effective order of convergence given in (21). As a result, a comparison of the effective order of convergence of the \(\text {EXE}\) scheme with that of the \(\text {MIL-A1}\) and the \(\text {DFM-A1}\) scheme results in

$$\begin{aligned} \frac{\frac{1}{2}\gamma \rho _A\alpha \rho _Q}{(\alpha \rho _Q+\gamma \rho _A)\frac{1}{2} +\gamma \rho _A\alpha \rho _Q} \le \frac{\alpha \rho _Q}{2\alpha \rho _Q+1} . \end{aligned}$$

In this case, the \(\text {DFM-A1}\) scheme and the \(\text {MIL-A1}\) scheme have the same effective order of convergence, which is higher than that of the \(\text {EXE}\) scheme.

Table 2 For a given parameter set, the conditions in this table have to be checked in order to determine the optimal scheme among the schemes \(\text {DFM-A1}\), \(\text {MIL-A1}\) and \(\text {EXE}\)

The same holds true if the \(\text {EXE}\) scheme is replaced by the \(\text {LIE}\) scheme. We summarize the results of our comparison in Table 2 which shows that the \(\text {DFM-A1}\) scheme always attains the highest possible effective order of convergence. However, in the case of \(q_{\text {DFM}}= q_{\text {EXE}}\le \frac{1}{2}\), although the \(\text {EXE}\) scheme and the \(\text {DFM-A1}\) scheme have the same effective order of convergence, one may prefer the \(\text {EXE}\) scheme because it requires less computational effort compared to the \(\text {DFM-A1}\) scheme, see Table 1. On the other hand, in the case of \(2q \le \gamma \rho _A (2q-1)\), both the \(\text {DFM-A1}\) and the \(\text {MIL-A1}\) scheme have the same optimal effective order of convergence, which is higher than that of the \(\text {EXE}\) scheme. Here, one may prefer the \(\text {DFM-A1}\) scheme because it needs less computational effort compared to the \(\text {MIL-A1}\) scheme, see Table 1, and because it is derivative-free whereas one has to calculate the derivative of the operator B for the \(\text {MIL-A1}\) scheme.

Finally, it has to be pointed out that the maximal effective order of convergence that can be attained is always bounded by 1/2, independently of the given parameters, whenever Algorithm 1 is applied to simulate the iterated stochastic integrals.

For completeness, we note that assumption (A5) as well as parts of (A3) do not have to be fulfilled for the exponential Euler scheme. This means that there might be parameter sets that are valid for the \(\text {EXE}\) scheme but not for the \(\text {DFM}\) scheme, and in these situations the exponential Euler scheme would be the method of choice. Moreover, it is not clear whether the obtained upper error bounds are sharp and thus whether the effective order of convergence may be further improved.

3.3 The case of a finite-dimensional Q-Wiener process

If the Q-Wiener process W is finite-dimensional, i.e., if \(|\{ j \in {\mathcal {J}} : \eta _j \ne 0 \}| < \infty \), the error estimate only depends on M and N provided that we choose \(K=|\{ j \in {\mathcal {J}} : \eta _j \ne 0 \}|\). Then, we obtain new solutions for M and N by solving the optimization problem that minimizes the error under the constraint of a prescribed computational cost budget \({\bar{c}}\). To this end, we compare once more the \(\text {DFM-A1}\) scheme, the \(\text {MIL-A1}\) scheme and the \(\text {EXE}\) scheme. Now, the computational cost required to approximate one trajectory of the solution of SPDE (2) by the \(\text {DFM-A1}\) scheme becomes \({\bar{c}} = {\mathcal {O}}(MN)+{\mathcal {O}}(M^{2q})\), for the \(\text {MIL-A1}\) scheme we get \({\bar{c}}={\mathcal {O}}(MN^2)+{\mathcal {O}}(M^{2q})\) and for the \(\text {EXE}\) scheme it is \({\bar{c}}={\mathcal {O}}(MN)\).

If \(\gamma \rho _A (2q-1) \le q\), the computational cost for the \(\text {DFM-A1}\) scheme is \({\bar{c}} = {\mathcal {O}}(MN)\) and solving the optimization problem yields

$$\begin{aligned} M = {\mathcal {O}}\left( {\bar{c}}^{\, \frac{\gamma \rho _A}{\gamma \rho _A +q}} \right) , \quad N = {\mathcal {O}}\left( {\bar{c}}^{\, \frac{q}{\gamma \rho _A +q}} \right) . \end{aligned}$$
(25)

Then, the effective order of convergence is given by

$$\begin{aligned} \text {err}(\text {DFM-A1}) = {\mathcal {O}}\left( {\bar{c}}^{\, \, -\frac{\gamma \rho _A q}{\gamma \rho _A+q}}\right) . \end{aligned}$$
(26)

If \(\gamma \rho _A (2q-1) \ge q\), then \(q > \frac{1}{2}\) and \({\bar{c}} = {\mathcal {O}}(M^{2q})\) for the \(\text {DFM-A1}\) scheme. Here, optimization results in

$$\begin{aligned} M = {\mathcal {O}}\left( {\bar{c}}^{\, \frac{\gamma \rho _A}{2 \gamma \rho _A q}} \right) , \quad N = {\mathcal {O}}\left( {\bar{c}}^{\, \frac{q}{2\gamma \rho _A q}} \right) , \end{aligned}$$
(27)

and the effective order of convergence can be calculated as

$$\begin{aligned} \text {err}(\text {DFM-A1}) = {\mathcal {O}}\left( {\bar{c}}^{\, \, -\frac{1}{2}}\right) . \end{aligned}$$
(28)
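To make the optimization explicit, a minimal Python sketch (our own formulation; the constants hidden in the \({\mathcal {O}}(\cdot )\) notation are set to one) maps a prescribed cost budget \({\bar{c}}\) to the asymptotically optimal choices of M and N for the \(\text {DFM-A1}\) scheme according to (25) and (27).

```python
def dfm_discretization(c_bar, q, gamma_rho_A):
    """Asymptotically optimal (M, N) for the DFM-A1 scheme under a cost
    budget c_bar in the finite-dimensional noise case, cf. (25) and (27)."""
    if gamma_rho_A * (2 * q - 1) <= q:
        # cost dominated by the scheme itself, c_bar = O(MN), cf. (25)
        M = c_bar ** (gamma_rho_A / (gamma_rho_A + q))
        N = c_bar ** (q / (gamma_rho_A + q))
    else:
        # cost dominated by the iterated integrals, c_bar = O(M^{2q}); the
        # exponents in (27) simplify to 1/(2q) and 1/(2*gamma_rho_A)
        M = c_bar ** (1 / (2 * q))
        N = c_bar ** (1 / (2 * gamma_rho_A))
    return round(M), round(N)

# placeholder values q = 3/4 and gamma * rho_A = 3/2 for illustration
print(dfm_discretization(1e6, 0.75, 1.5))   # -> (10000, 100)
```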

Considering the \(\text {MIL-A1}\) scheme, again two cases have to be distinguished: If \(\gamma \rho _A (2q-1) \le 2q\), the \(\text {MIL-A1}\) scheme has computational cost \({\bar{c}} = {\mathcal {O}}(MN^2)\) and optimization yields

$$\begin{aligned} M = {\mathcal {O}}\left( {\bar{c}}^{\, \frac{\gamma \rho _A}{\gamma \rho _A +2q}} \right) , \quad N = {\mathcal {O}}\left( {\bar{c}}^{\, \frac{q}{\gamma \rho _A +2q}} \right) \end{aligned}$$
(29)

and the effective order of convergence is given by

$$\begin{aligned} \text {err}(\text {MIL-A1}) = {\mathcal {O}}\left( {\bar{c}}^{\, \, -\frac{\gamma \rho _A q}{\gamma \rho _A+2q}}\right) . \end{aligned}$$
(30)

If \(2q \le \gamma \rho _A (2q-1)\), it follows that \(q>\frac{1}{2}\) and that the \(\text {MIL-A1}\) scheme incurs the same computational cost \({\bar{c}}={\mathcal {O}}(M^{2q})\) as the \(\text {DFM-A1}\) scheme in the second case. Thus, we also get (27) for M and N and the same effective order of convergence as given in (28).

Clearly, for the \(\text {EXE}\) scheme it holds that \(q_{\text {EXE}}\le \tfrac{1}{2}\), and thus we obtain the same results as for the \(\text {DFM-A1}\) scheme, that is, (25) for M and N as well as (26) for the effective order of convergence with \(q=q_{\text {EXE}}\).

Table 3 In case of a finite-dimensional Q-Wiener process and \(K=|\{ j \in {\mathcal {J}} : \eta _j \ne 0 \}| < \infty \), the conditions in this table have to be checked in order to determine the optimal scheme among the schemes \(\text {DFM-A1}\), \(\text {MIL-A1}\) and \(\text {EXE}\)

Finally, comparing the effective orders of convergence for the schemes under consideration, we easily derive the results presented in Table 3. Again, the \(\text {DFM-A1}\) scheme performs at least as well as each of the other schemes. Clearly, in the case of \(q_{\text {DFM}}=q_{\text {MIL}}=q_{\text {EXE}}\le \frac{1}{2}\), one may prefer the \(\text {EXE}\) scheme or the \(\text {LIE}\) scheme, although they have the same effective order of convergence as the \(\text {DFM-A1}\) scheme, because they are easier to implement. However, in the case of \(2q \le \gamma \rho _A(2q -1)\), where the \(\text {DFM-A1}\) scheme and the \(\text {MIL-A1}\) scheme attain the same effective order of convergence, one may prefer the \(\text {DFM-A1}\) scheme because it needs less computational effort and because no derivative of the operator B is required. Again, the effective order of convergence is always bounded by 1/2, as in the infinite-dimensional noise case.

3.4 An illustrative example for the performance of the \(\text {DFM}\) scheme

In order to illuminate the improvement of the effective order of convergence for the \(\text {DFM-A1}\) scheme compared to the \(\text {MIL-A1}\) and the \(\text {EXE}\) scheme, we consider as an example the important case where \(A = \Delta \) is the Laplace operator. This covers the stochastic heat equation and many reaction-diffusion type equations. Thus, it holds \(\rho _A=2\), and we assume for simplicity that \(\beta =0\) and that \(\delta \in (0,\frac{1}{2})\) is chosen maximal. Then, for any F and B fulfilling the corresponding assumptions in Sect. 2.1, we get \(q_{\text {DFM}}= q_{\text {MIL}}= \gamma \) and \(q_{\text {EXE}}= \min (\gamma , \frac{1}{2})\). Therefore, we only need to analyze the effective order of convergence subject to the values of \(\gamma \in (0,1)\) and \(\alpha \rho _Q >0\). In Fig. 1, we plot the effective order of convergence (EOC) versus the parameter \(\gamma \) for the schemes \(\text {DFM-A1}\), \(\text {EXE}\) and \(\text {MIL-A1}\) when they are applied to such an SPDE, in case of an infinite-dimensional Q-Wiener process with \(\alpha \rho _Q=1\) on the left and in case of a finite-dimensional Q-Wiener process on the right.

Fig. 1 Effective order of convergence (EOC) vs \(\gamma \) for the schemes \(\text {DFM-A1}\), \(\text {EXE}\) and \(\text {MIL-A1}\) for an SPDE driven by an infinite-dimensional (left) and a finite-dimensional (right) Q-Wiener process

In case of an infinite-dimensional Q-Wiener process, the EOC for the \(\text {DFM-A1}\) scheme is given by (19) if \(\gamma < \frac{3}{4}\) and by (21) if \(\gamma \ge \frac{3}{4}\). Therefore, the maximal possible EOC for the \(\text {DFM-A1}\) scheme is \(p_{\text {DFM}} = \frac{\alpha \rho _Q}{2 \alpha \rho _Q +1}\), which is attained for any \(\gamma \ge \frac{3}{4}\). For the \(\text {EXE}\) scheme, the EOC is determined by (24) and the upper bound for the EOC is given by \(p_{\text {EXE}} = \frac{2 \alpha \rho _Q}{ \alpha \rho _Q + 2 +4 \alpha \rho _Q}\), which is approached by the \(\text {EXE}\) scheme as \(\gamma \rightarrow 1\). Moreover, the EOC for the \(\text {MIL-A1}\) scheme is given by (23) for all \(\gamma < 1\). Thus, an upper bound for the EOC of the \(\text {MIL-A1}\) scheme is given by \(p_{\text {MIL}} = p_{\text {DFM}} = \frac{\alpha \rho _Q}{2 \alpha \rho _Q + 1}\), which is approached as \(\gamma \rightarrow 1\). For \(\alpha \rho _Q = 1\), these results are presented in the left diagram of Fig. 1.

If the SPDE is driven by a finite-dimensional Q-Wiener process, then, following the discussion in Sect. 3.3, the EOC for the \(\text {DFM-A1}\) scheme is given by (26) if \(\gamma < \frac{3}{4}\) and equals \(\frac{1}{2}\) if \(\gamma \ge \frac{3}{4}\). For the \(\text {EXE}\) scheme, the EOC coincides with the one given in (26) if \(\gamma < \frac{1}{2}\) and then changes to the value \(\frac{\gamma }{2 \gamma + \frac{1}{2}}\) if \(\gamma \ge \frac{1}{2}\). An upper bound for the EOC of the \(\text {EXE}\) scheme is given by \(\frac{2}{5}\), which is approached as \(\gamma \rightarrow 1\). Finally, the EOC for the \(\text {MIL-A1}\) scheme is determined by (30) for any \(\gamma < 1\), and an upper bound for the EOC is given by \(\frac{1}{2}\), which is again approached as \(\gamma \rightarrow 1\). Note that in case of a finite-dimensional Q-Wiener process the EOC of all schemes under consideration does not depend on the term \(\alpha \rho _Q\); the results are presented in the right diagram of Fig. 1. The left diagram displays the case \(\alpha \rho _Q = 1\); the qualitative characteristics, however, remain the same for other values of \(\alpha \rho _Q>0\). One can even see that the left diagram continuously approaches the right diagram as the value of \(\alpha \rho _Q\) increases, and for the upper bounds it holds that \(p_{\text {DFM}} = p_{\text {MIL}} \rightarrow \frac{1}{2}\), whereas \(p_{\text {EXE}} \rightarrow \frac{2}{5}\).

Comparing the EOC of the different schemes, we can see that for \(\gamma \in (0,\frac{1}{2}]\) the schemes \(\text {EXE}\) and \(\text {DFM-A1}\) attain exactly the same EOC, which is always higher than the EOC of the \(\text {MIL-A1}\) scheme. For \(\gamma \in (\frac{1}{2}, \frac{3}{4}]\), the regime for the EOC of the \(\text {EXE}\) scheme changes due to \(q_{\text {EXE}}= \min (\frac{1}{2}, \gamma )\) in this setting. Here, the EOC of the \(\text {EXE}\) scheme is still higher than the EOC of the \(\text {MIL-A1}\) scheme. Moreover, the \(\text {DFM-A1}\) scheme has the highest EOC and thus outperforms the \(\text {EXE}\) as well as the \(\text {MIL-A1}\) scheme. In case of \(\gamma \in (\frac{3}{4},1)\), the regime for the EOC of the \(\text {DFM-A1}\) scheme changes as the cost for the computation of the iterated stochastic integrals now dominates the overall cost. Further, the \(\text {MIL-A1}\) scheme now has a higher EOC than the \(\text {EXE}\) scheme. However, the \(\text {DFM-A1}\) scheme still has the highest EOC of the three schemes under consideration. Note that the \(\text {DFM-A1}\) scheme clearly outperforms the \(\text {MIL-A1}\) scheme for all \(\gamma \in (0,1)\), which is due to the reduction of the computational cost for the \(\text {DFM}\) scheme compared to the \(\text {MIL}\) scheme irrespective of the computational cost for the iterated stochastic integrals. This reduction results from the carefully tailored design of the derivative-free stages of the \(\text {DFM}\) scheme. Summarizing, the newly proposed \(\text {DFM}\) scheme combined with Algorithm 1 for the computation of the iterated stochastic integrals always attains the highest possible EOC compared to the \(\text {EXE}\) scheme and the \(\text {MIL-A1}\) scheme for any \(\gamma \in (0,1)\). Moreover, the maximal possible EOC for the \(\text {EXE}\) scheme is \(\frac{2}{5}\), whereas the maximal possible EOC for the \(\text {DFM-A1}\) scheme is \(\frac{1}{2}\).
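The curves in Fig. 1 follow from the closed-form EOC expressions discussed above. The following Python sketch is our own reconstruction under the stated assumptions (\(\rho _A = 2\), \(q_{\text {DFM}}= q_{\text {MIL}}= \gamma \), \(q_{\text {EXE}}= \min (\gamma , \frac{1}{2})\) and \(\alpha \rho _Q = 1\) for the left panel); it reproduces the qualitative picture of Fig. 1, and all function names are ours.

```python
import numpy as np
import matplotlib.pyplot as plt

A_RHO = 1.0   # alpha * rho_Q (left panel); the right panel is independent of it

def eoc_inf(q, gamma, cost_factor):
    """EOC for infinite-dimensional noise, cf. (19)/(23)/(24); cost_factor
    is 1 for the EXE and DFM-A1 schemes and 2 for the MIL-A1 scheme."""
    gA = 2.0 * gamma   # gamma * rho_A with rho_A = 2
    return q * gA * A_RHO / ((cost_factor * A_RHO + gA) * q + gA * A_RHO)

def eoc_fin(q, gamma, cost_factor):
    """EOC for finite-dimensional noise, cf. (26)/(30)."""
    gA = 2.0 * gamma
    return gA * q / (gA + cost_factor * q)

gammas = np.linspace(0.01, 0.999, 500)
# the minimum with the respective cap implements the regime change at gamma = 3/4
curves = {
    "infinite-dimensional noise": {
        "DFM-A1": [min(eoc_inf(g, g, 1), A_RHO / (2 * A_RHO + 1)) for g in gammas],
        "EXE":    [eoc_inf(min(g, 0.5), g, 1) for g in gammas],
        "MIL-A1": [eoc_inf(g, g, 2) for g in gammas],
    },
    "finite-dimensional noise": {
        "DFM-A1": [min(eoc_fin(g, g, 1), 0.5) for g in gammas],
        "EXE":    [eoc_fin(min(g, 0.5), g, 1) for g in gammas],
        "MIL-A1": [eoc_fin(g, g, 2) for g in gammas],
    },
}

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, (title, schemes) in zip(axes, curves.items()):
    for label, eoc in schemes.items():
        ax.plot(gammas, eoc, label=label)
    ax.set_title(title)
    ax.set_xlabel(r"$\gamma$")
axes[0].set_ylabel("EOC")
axes[0].legend()
plt.show()
```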

4 Numerical analysis

In this section, we compare the \(\text {DFM-A1}\) scheme to the \(\text {MIL-A1}\) and the \(\text {EXE}\) scheme by numerical computations for several examples in order to illustrate the theoretical results presented above and summarized in Tables 2 and 3. In the following, we approximate the mild solution of SPDE (2), that is,

$$\begin{aligned} X_t = e^{At}\xi + \int _0^t e^{A(t-s)}F(X_s)\, \mathrm {d}s + \int _0^t e^{A(t-s)}B(X_s)\, \mathrm {d}W_s, \quad t\in (0,T]. \end{aligned}$$

For the numerical analysis, we consider the following setting. We fix \(H=U=L^2((0,1),{\mathbb {R}})\), set \(T=1\), and \({\mathcal {I}}={\mathcal {J}} ={\mathbb {N}}\). Let A be the Laplace operator with Dirichlet boundary conditions. To be precise, \(A=\frac{\Delta }{100}\) with eigenvalues \(\lambda _i = \frac{\pi ^2 i^2}{100}\) of \(-A\) and eigenvectors \(e_i= \sqrt{2} \sin (i\pi x)\) for \(i\in {\mathbb {N}}\), \(x\in (0,1)\) and on the boundary, we have \(X_t(0) = X_t(1) = 0\) for all \(t\in (0,T]\). The covariance operator Q is defined by the eigenvalues \(\eta _j = j^{-\rho _Q}\) for some \(\rho _Q>1\) which is given separately for each example below and \({\tilde{e}}_j = \sqrt{2} \sin (j\pi x)\) for \(j\in {\mathbb {N}}\), \(x\in (0,1)\). For the operator B, we present the general setting introduced for the numerical analysis in [20]. Define the functionals \(\mu _{ij}:H_{\beta }\rightarrow {\mathbb {R}}\), \(\phi _{ij}^k:H_{\beta }\rightarrow {\mathbb {R}}\) for \(i,k\in {\mathcal {I}}\), \(j\in {\mathcal {J}}\) such that \(\phi _{ij}^k\) is the Fréchet derivative of \(\mu _{ij}\) in direction \(e_k\) and let

$$\begin{aligned} B(y)u =\sum _{i\in {\mathcal {I}}}\sum _{j\in {\mathcal {J}}} \mu _{ij}(y)\langle u,{\tilde{e}}_j\rangle _U e_i, \end{aligned}$$

as well as

$$\begin{aligned} (B'(y)(B(y)v))u =\sum _{i,k\in {\mathcal {I}}}\sum _{j,r\in {\mathcal {J}}} \phi _{ij}^k(y)\mu _{kr}(y)\langle v,{\tilde{e}}_r\rangle _U \langle u,{\tilde{e}}_j\rangle _U e_i \end{aligned}$$

for \(y\in H_{\beta }\) and \(u,v\in U_0\), see [20, Sec. 5.3] for details.

We choose \(\mu _{ij}(y) = \frac{\langle y, e_j \rangle _H}{i^p+j^4}\) for all \(i \in {\mathcal {I}}\), \(j\in {\mathcal {J}}\), \(y\in H\) and some \(p>1\) that varies between the examples presented below, which leads to \(\phi _{ij}^k(y) = \left\{ \begin{array}{ll} 0, &{} k \ne j \\ \frac{1}{i^p+j^4}, &{} k = j \end{array}\right. \) for all \(i,k \in {\mathcal {I}}\), \(j\in {\mathcal {J}}\), \(y\in H\). This is the setting considered in [36]. Assumption (A1) is obviously fulfilled, and assumptions (A2) and (A4) are easily verified in the following examples. Next, we elaborate on assumption (A3). By the definition of the \(L(U,H_{\delta })\)-norm and the operator B, we obtain

$$\begin{aligned} \Vert B(y)\Vert _{L(U,H_{\delta })} =\sup _{\begin{array}{c} u\in U \\ \Vert u\Vert _U = 1 \end{array}}\bigg \Vert \sum _{i\in {\mathcal {I}}} \sum _{j\in {\mathcal {J}}} \lambda _i^{\delta } \mu _{ij}(y) \langle u,{\tilde{e}}_j \rangle _U e_i \bigg \Vert _H. \end{aligned}$$

In the next step, we employ Parseval's equality and the triangle inequality:

$$\begin{aligned} \Vert B(y)\Vert _{L(U,H_{\delta })}&=\sup _{\begin{array}{c} u\in U \\ \Vert u\Vert _U = 1 \end{array}} \bigg ( \sum _{i\in {\mathcal {I}}} \bigg | \sum _{j\in {\mathcal {J}}} \lambda _i^{\delta } \mu _{ij}(y) \langle u,{\tilde{e}}_j \rangle _U \bigg |^2 \bigg )^{\frac{1}{2}} \\&\le \sup _{\begin{array}{c} u\in U \\ \Vert u\Vert _U = 1 \end{array}} \bigg ( \sum _{i\in {\mathcal {I}}} \bigg ( \sum _{j\in {\mathcal {J}}} |\lambda _i^{\delta }| \cdot | \mu _{ij}(y) | \cdot | \langle u,{\tilde{e}}_j \rangle _U | \bigg )^2 \bigg )^{\frac{1}{2}}. \end{aligned}$$

It holds by Parseval’s equality that

$$\begin{aligned} \Vert y\Vert ^2_{H_{\delta }} = \Vert (-A)^{\delta }y \Vert ^2_H = \sum _{i\in {\mathcal {I}}} |\lambda _i^{\delta } \langle y,e_i \rangle _H|^2 \end{aligned}$$

and therewith

$$\begin{aligned} |\langle y, e_j \rangle _H |^2 = \lambda _j^{-2\delta }|\lambda _j^{\delta } \langle y, e_j \rangle _H |^2 \le \lambda _j^{-2\delta } \Vert y\Vert _{H_{\delta }}^2 \end{aligned}$$
(31)

for all \(j \in {\mathcal {J}}\). As \(|\langle u,{\tilde{e}}_j\rangle _U|^2 \le 1\) by Parseval, we obtain

$$\begin{aligned} \Vert B(y)\Vert _{L(U,H_{\delta })}&\le \bigg ( \sum _{i\in {\mathcal {I}}} \lambda _i^{2\delta } \bigg ( \sum _{j\in {\mathcal {J}}} |\mu _{ij}(y)| \bigg )^2 \bigg )^{\frac{1}{2}} \\&= \bigg ( \sum _{i\in {\mathcal {I}}} \frac{\pi ^{4\delta }i^{4\delta }}{100^{2\delta }} \bigg ( \sum _{j\in {\mathcal {J}}} \frac{| \langle y,e_j\rangle _H |}{i^p+j^4} \bigg )^2 \bigg )^{\frac{1}{2}} \\&\le \bigg ( \sum _{i\in {\mathcal {I}}} \frac{\pi ^{4\delta }i^{4\delta }}{100^{2\delta }} \bigg ( \sum _{j\in {\mathcal {J}}} \frac{\lambda _j^{-\delta } \Vert y\Vert _{H_{\delta }} }{i^p+j^4} \bigg )^2 \bigg )^{\frac{1}{2}}. \end{aligned}$$

Then, for some \(\varepsilon \in (0,2\delta )\), some \(C_1=C_1(\varepsilon ,\delta )>0\) and with \(r=\frac{4}{3-\varepsilon +2\delta }>1\), \(q=\frac{4}{1+\varepsilon -2\delta }>1\) such that \(\frac{1}{r}+\frac{1}{q}=1\), Young’s inequality gives the estimate

$$\begin{aligned} \Vert B(y)\Vert _{L(U,H_{\delta })}&\le \bigg ( \sum _{i\in {\mathcal {I}}} i^{4\delta } \bigg ( \sum _{j\in {\mathcal {J}}} \frac{j^{-2\delta }}{ r^{\frac{1}{r}} \, q^{\frac{1}{q}} \, i^{\frac{3-\varepsilon +2\delta }{4}p} \, j^{1+\varepsilon -2\delta }} \bigg )^2 \bigg )^{\frac{1}{2}} \Vert y\Vert _{H_{\delta }} \\&\le C_1 \left( \sum _{i\in {\mathcal {I}}} i^{4\delta -\frac{3-\varepsilon +2\delta }{2}p} \right) ^{\frac{1}{2}} \Vert y\Vert _{H_{\delta }}. \end{aligned}$$

If for \(\delta \in (0,\tfrac{1}{2})\) it holds that \(p > \tfrac{2+8 \delta }{3+2 \delta }\), then \(4\delta -\frac{3-\varepsilon +2\delta }{2}\,p < -1\) for sufficiently small \(\varepsilon >0\), so the series over i converges and it follows that \(\Vert B(y)\Vert _{L(U,H_{\delta })}\le C(1+\Vert y\Vert _{H_{\delta }})\) for all \(y\in H_{\delta }\).

Next, we compute the term

$$\begin{aligned} \Vert (-A)^{-\vartheta }B(y)Q^{-\alpha }\Vert _{L_{HS}(U_0,H)} = \bigg ( \sum _{j\in {\mathcal {J}}} \Vert (-A)^{-\vartheta } B(y) Q^{-\alpha +\frac{1}{2} }{\tilde{e}}_j \Vert _H^2 \bigg )^{\frac{1}{2}} \end{aligned}$$

for all \(y\in H_{\gamma }\). We rewrite the expression above to obtain

$$\begin{aligned} \Vert (-A)^{-\vartheta }B(y)Q^{-\alpha }\Vert _{L_{HS}(U_0,H)}&= \bigg ( \sum _{k\in {\mathcal {I}}} \sum _{j\in {\mathcal {J}}} |\langle (-A)^{-\vartheta } B(y) Q^{-\alpha +\frac{1}{2}} {\tilde{e}}_j , e_k \rangle _H |^2 \bigg )^{\frac{1}{2}} \\&= \bigg ( \sum _{i\in {\mathcal {I}}} \sum _{j\in {\mathcal {J}}} | \lambda _i^{-\vartheta } \langle B(y) Q^{-\alpha +\frac{1}{2}} {\tilde{e}}_j , e_i\rangle _H |^2 \bigg )^{\frac{1}{2}} \\&= \bigg ( \sum _{i\in {\mathcal {I}}} \sum _{j\in {\mathcal {J}}} \lambda _i^{-2\vartheta } | \langle B(y) \eta _j^{-\alpha +\frac{1}{2}} {\tilde{e}}_j , e_i\rangle _H |^2 \bigg )^{\frac{1}{2}}. \end{aligned}$$

Here, we employed the definitions of the operators A and Q. In the next step, we insert the definition of the operator B:

$$\begin{aligned} \Vert (-A)^{-\vartheta }B(y)Q^{-\alpha }\Vert _{L_{HS}(U_0,H)}&= \bigg ( \sum _{i\in {\mathcal {I}}} \sum _{j\in {\mathcal {J}}} \lambda _i^{-2\vartheta } \eta _j^{-2\alpha +1} |\mu _{ij}(y)|^2 \bigg )^{\frac{1}{2}} \\&= \bigg ( \sum _{i\in {\mathcal {I}}} \sum _{j\in {\mathcal {J}}} \frac{\pi ^{-4\vartheta } i^{-4\vartheta }}{100^{-2\vartheta }} j^{(2\alpha -1) \rho _Q} \frac{| \langle y, e_j \rangle _H |^2}{|i^p+j^4|^2} \bigg )^{\frac{1}{2}}. \end{aligned}$$

By Parseval’s equality and calculations as in (31), we obtain for some \(C_2>0\) that

$$\begin{aligned} \Vert (-A)^{-\vartheta } B(y) Q^{-\alpha } \Vert _{L_{HS}(U_0,H)} \le C_2 \left( \sum _{i\in {\mathcal {I}}} \sum _{j\in {\mathcal {J}}} i^{-4\vartheta } j^{(2\alpha -1) \rho _Q} j^{-4\gamma } \frac{\Vert y \Vert _{H_{\gamma }}^2}{|i^p+j^4|^2} \right) ^{\frac{1}{2}}. \end{aligned}$$

Then, for all \(\varepsilon \in (4\vartheta -1, 4\vartheta -1+2p)\) with \(r=\frac{2p}{1+\varepsilon -4\vartheta }>1\), \(q=\frac{2p}{2p-1-\varepsilon +4\vartheta }>1\), Young’s inequality yields that

$$\begin{aligned}&\Vert (-A)^{-\vartheta }B(y)Q^{-\alpha }\Vert _{L_{HS}(U_0,H)} \\&\quad \le C_2 \left( \sum _{i\in {\mathcal {I}}} \sum _{j\in {\mathcal {J}}} i^{-4\vartheta } j^{(2\alpha -1) \rho _Q - 4 \gamma } \frac{\Vert y \Vert _{H_{\gamma }}^2}{ \big ( r^{\frac{1}{r}} \, i^{\frac{1+\varepsilon -4\vartheta }{2}} \, q^{\frac{1}{q}} \, j^{\frac{4p-2-2\varepsilon +8\vartheta }{p}} \big )^2} \right) ^{\frac{1}{2}} \\&\quad \le C_3 \left( \sum _{i\in {\mathcal {I}}} \frac{1}{i^{1+\varepsilon }}\right) ^{\frac{1}{2}} \left( \sum _{j\in {\mathcal {J}}} \frac{1}{j^{(1-2\alpha )\rho _Q+4\gamma +8-\frac{4}{p} -\frac{4\varepsilon }{p}+\frac{16\vartheta }{p}}} \right) ^{\frac{1}{2}} \Vert y \Vert _{H_{\gamma }} \end{aligned}$$

with \(C_3=C_3(\varepsilon ,\vartheta ,p)>0\). Therefore, \(\Vert (-A)^{-\vartheta }B(y)Q^{-\alpha }\Vert _{L_{HS}(U_0,H)} \le C (1+\Vert y\Vert _{H_{\gamma }})\) holds for all \(y\in H_{\gamma }\) and some \(C>0\) if \(\alpha <\frac{7+\rho _Q+4\gamma }{2\rho _Q} + \frac{2 (\min (0, 4\vartheta -1) - {\hat{\varepsilon }})}{p \rho _Q}\) for some arbitrarily small \({\hat{\varepsilon }}>0\), \(p > \max \big ( \frac{1-4\vartheta }{2}, 1 \big )\) and \(\varepsilon \in (\max (0,4\vartheta -1),4\vartheta -1+2p)\). In the following examples, \(p > \max \big ( \tfrac{2+8\delta }{3+2\delta }, 1 \big )\) and \(\rho _Q\) are specified, and we select \(\gamma \) and \(\alpha \) to be maximal. We do not state the remaining conditions given in (A3) as they do not pose a restriction on the parameters; we note, however, that they are fulfilled as well. Finally, we examine the commutativity condition (1). On the one hand, it holds that

$$\begin{aligned} \sum _{k\in {\mathcal {I}}} \phi _{im}^k(y)\mu _{kn}(y)= \frac{1}{i^p+m^4} \frac{\langle y,e_n\rangle _H}{m^p+n^4} \end{aligned}$$

but on the other hand, it holds that

$$\begin{aligned} \sum _{k\in {\mathcal {I}}} \phi _{in}^k(y)\mu _{km}(y) = \frac{1}{i^p+n^4} \frac{\langle y,e_m\rangle _H}{n^p+m^4} \end{aligned}$$

for all \(y\in H\) and all \(i\in {\mathcal {I}}\), \(m,n \in {\mathcal {J}}\). Obviously, these two expressions differ for some choice of \(m, n \in {\mathcal {J}}\). Thus, the considered example does not fulfill the commutativity condition (1).
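The failure of (1) can also be observed numerically. Below is a minimal Python sketch (our own illustration; the truncation level, the exponent p and the test vector are placeholders) that evaluates both sums above for a truncated index set and shows that they differ.

```python
import numpy as np

p, L = 4.0, 20               # exponent in mu_ij and truncation level (placeholders)
rng = np.random.default_rng(1)
y = rng.standard_normal(L)   # coefficients <y, e_k>_H of a test vector y

def mu(i, j):
    # mu_ij(y) = <y, e_j>_H / (i^p + j^4), indices starting at 1
    return y[j - 1] / (i ** p + j ** 4)

def phi_mu_sum(i, m, n):
    # sum over k of phi_im^k(y) * mu_kn(y); phi_im^k vanishes unless k = m
    return mu(m, n) / (i ** p + m ** 4)

i, m, n = 1, 2, 3
lhs = phi_mu_sum(i, m, n)    # first expression above
rhs = phi_mu_sum(i, n, m)    # second expression, with m and n interchanged
print(lhs, rhs, abs(lhs - rhs) > 1e-12)   # the two values differ
```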

In the following examples, we have \(\rho _A=2\) and we choose the parameter p such that assumption (A3) is always fulfilled and such that different cases in Table 2 are addressed. All simulations are computed with an Intel Xeon E3-1245 v5 CPU at 3.50 GHz and 32 GB of memory using Matlab version R2021b. Further, the iterated stochastic integrals are approximated by Algorithm 1 from [21] and we make use of the implementation that can be found in the toolbox [10]. First, for each example the orders of convergence w.r.t. the time discretization \(q_{\text {DFM}}\), \(q_{\text {MIL}}\) and \(q_{\text {EXE}}\), which are given by Theorems 2.1, 2.2 and Proposition 3.1, are analyzed. Note that from the convergence result [36, Thm. 1], we observe that \(q_{\text {MIL}}=q_{\text {DFM}}\), see also [9, Thm. 1]. Then, the much more relevant effective order of convergence is analyzed, where the error is considered versus the computational effort. Here, the computational cost \({\bar{c}}\) is computed as \({\bar{c}}(\text {DFM-A1}) = MN+2MNK+MK(1+2M^{2q-1})\), \({\bar{c}}(\text {MIL-A1}) = MN+MNK+MN^2K+MK(1+2M^{2q-1})\) and \({\bar{c}}(\text {EXE}) = MN+MNK+MK\), see also Table 1. The effective order of convergence is a good indicator for the performance of numerical schemes in practice. The numerical results for the effective order of convergence are confirmed by a comparison of the corresponding averaged measured CPU times, which are based on a large number of simulations. However, one has to keep in mind that CPU time measurements may in general depend critically on, e.g., the concrete implementation of a scheme, which may lead to significantly differing results. Therefore, considering a theoretical cost model and the corresponding effective order of convergence is an attempt to obtain a more objective performance indicator. That is why we concentrate our analysis on the effective order of convergence.
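For reference, the cost model can be stated compactly in code. The following Python helper is our own formulation of the displayed counts and can be used to compare the schemes for concrete values of M, N, K and q.

```python
def cost(scheme, M, N, K, q):
    """Computational cost per trajectory according to the cost model above,
    counting functional evaluations and random numbers (cf. Table 1)."""
    if scheme == "EXE":
        return M * N + M * N * K + M * K
    if scheme == "DFM-A1":
        return M * N + 2 * M * N * K + M * K * (1 + 2 * M ** (2 * q - 1))
    if scheme == "MIL-A1":
        return (M * N + M * N * K + M * N ** 2 * K
                + M * K * (1 + 2 * M ** (2 * q - 1)))
    raise ValueError(f"unknown scheme: {scheme}")

# illustration: the M N^2 K term dominates the cost of the MIL-A1 scheme
for s in ("EXE", "DFM-A1", "MIL-A1"):
    print(s, cost(s, M=256, N=16, K=3, q=7 / 8))
```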

4.1 Example 1

In the first example, we set the parameters to \(p=\tfrac{4}{3}\), \(\rho _Q = 3\) and we choose \(F(y) = 1-y\), \(y\in H\). This allows for \(\beta \in [0,1)\) and we choose \(\beta =0\). Moreover, we set the initial value \(\xi (x)= X_0(x) =0\) for all \(x\in (0,1)\). From condition \(p > \max ( \tfrac{2+8\delta }{3+2\delta }, 1)\), it follows that \(\delta \in (0,\tfrac{3}{8})\). Therefore, we set \(\delta = \tfrac{3}{8}-\varepsilon _{\delta }\) for some arbitrarily small \(\varepsilon _{\delta }>0\). From these parameter values, we compute \(\gamma \in [\tfrac{3}{8}-\varepsilon _{\delta },\tfrac{7}{8}-\varepsilon _{\delta })\) and we thus choose \(\gamma = \tfrac{7}{8}-\varepsilon _{\delta } -\varepsilon _{\gamma }\) for some arbitrarily small \(\varepsilon _{\gamma }>0\). As a result of this, it follows that \(q= q_{\text {DFM}} = q_{\text {MIL}} = \frac{7}{8}-{\hat{\varepsilon }}\) with \({\hat{\varepsilon }} = \varepsilon _{\delta } + \varepsilon _{\gamma }>0\) arbitrarily small. Finally, since we can choose \(\vartheta \in (0,\frac{1}{2})\) arbitrarily for this example, we take \(\vartheta = \frac{1}{4}\) for simplicity. Then, from the condition \(\alpha <\frac{7+\rho _Q+4\gamma }{2\rho _Q}\), we directly get that \(\alpha \in (0,\tfrac{27}{12}-\tfrac{2}{3} {\hat{\varepsilon }})\) and we choose \(\alpha = \frac{27}{12}-\varepsilon _{\alpha }\) for some arbitrarily small \(\varepsilon _{\alpha }> \tfrac{2}{3} {\hat{\varepsilon }}>0\). Thus, assumption (A3) holds, as discussed above. Furthermore, condition (A5a) is fulfilled as \(\rho _Q >2\).

With these parameters, we can identify the superior scheme. For this example, it holds that \(q < \gamma \rho _A (2q-1) \le 2q\) for sufficiently small \({\hat{\varepsilon }}>0\). Thus, the \(\text {DFM-A1}\) scheme is optimal, i.e., it is the scheme with the highest effective order of convergence according to Table 2. In order to compare the \(\text {DFM-A1}\) scheme to the other schemes under consideration, we calculate the effective orders of convergence for each of the schemes. We expect the \(\text {DFM-A1}\) scheme to obtain the highest effective order of convergence in this setting with

$$\begin{aligned} \text {error}(\text {DFM-A1})= {\mathcal {O}}\left( {\bar{c}}^{-\frac{27-12\varepsilon _{\alpha }}{58 -24 \varepsilon _{\alpha }}} \right) \end{aligned}$$

given by (21), i.e., \(\text {EOC}(\text {DFM-A1}) \approx \tfrac{27}{58}\). Moreover, we fix some arbitrary \(N\in {\mathbb {N}}\) and compute the relation \(M = N^2\) and \(K= \big \lceil N^{\frac{\frac{7}{4} -2 {\hat{\varepsilon }}}{\frac{27}{4}-3 \varepsilon _{\alpha }}} \big \rceil \approx \big \lceil N^{\frac{7}{27}} \big \rceil \) as given in (20) for the implementation of the \(\text {DFM-A1}\) scheme.

Considering the scheme \(\text {MIL-A1}\), the effective order of convergence for this scheme is given by (23) with

$$\begin{aligned} \text {error}(\text {MIL-A1})= {\mathcal {O}}\left( {\bar{c}}^{ -\frac{\frac{189}{32} -\frac{27}{4} {\hat{\varepsilon }} -\frac{21}{8} \varepsilon _{\alpha } +3 {\hat{\varepsilon }} \varepsilon _{\alpha }}{\frac{115}{8} -6\varepsilon _{\alpha } -{\hat{\varepsilon }}}} \right) , \end{aligned}$$

i.e., \(\text {EOC}(\text {MIL-A1}) \approx \tfrac{189}{460}\). For this example, the relations between N, K and M for the \(\text {MIL-A1}\) scheme given in (22) are exactly the same as for the \(\text {DFM-A1}\) scheme.

On the other hand, for the \(\text {EXE}\) scheme we obtain from (18) for some arbitrarily fixed \(N \in {\mathbb {N}}\) the relation \(M= \big \lceil N^{\frac{7}{2} -4 {\hat{\varepsilon }}} \big \rceil \approx \big \lceil N^{\frac{7}{2}} \big \rceil \) and \(K= \big \lceil N^{\frac{\frac{7}{4} -2 {\hat{\varepsilon }}}{\frac{27}{4} -3 \varepsilon _{\alpha }}} \big \rceil \approx \big \lceil N^{\frac{7}{27}} \big \rceil \) as an optimal choice. The effective order of convergence for the \(\text {EXE}\) scheme is given as

$$\begin{aligned} \text {error}(\text {EXE}) = {\mathcal {O}}\left( {\bar{c}}^{-\frac{\frac{189}{32} -\frac{21}{8} \varepsilon _{\alpha } -\frac{27}{4} {\hat{\varepsilon }} + 3 {\hat{\varepsilon }} \varepsilon _{\alpha }}{ \frac{257}{16} -\frac{27}{4} \varepsilon _{\alpha } -\frac{29}{2} {\hat{\varepsilon }} +6 {\hat{\varepsilon }} \varepsilon _{\alpha }}} \right) \end{aligned}$$

as stated in (24), i.e., it holds \(\text {EOC}(\text {EXE}) \approx \tfrac{189}{514}\).
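These limiting values can be double-checked by exact rational arithmetic. The Python snippet below is our own verification: it evaluates the closed-form EOC expressions used above ((21), (23) and (24)) at \({\hat{\varepsilon }} = \varepsilon _{\alpha } = 0\) with the parameters of this example; the analogous check applies to Examples 2-4.

```python
from fractions import Fraction as F

q  = F(7, 8)          # q_DFM = q_MIL
gA = 2 * q            # gamma * rho_A with gamma = 7/8 and rho_A = 2
aQ = 3 * F(27, 12)    # alpha * rho_Q with alpha = 27/12 and rho_Q = 3

# DFM-A1 in the regime q < gA (2q-1) <= 2q, cf. (21)
print(aQ / (2 * aQ + 1))                                 # 27/58
# MIL-A1, cf. (23)
print(q * gA * aQ / ((2 * aQ + gA) * q + gA * aQ))       # 189/460
# EXE with q_EXE = 1/2, cf. (24)
h = F(1, 2)
print(h * gA * aQ / ((aQ + gA) * h + gA * aQ))           # 189/514
```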

Fig. 2 \(L^2\)-error at \(T=1\) against number of time steps (left) and against computational cost (right) for Example 1 computed from 500 paths for \(N \in \{ 2, 4, 8, 16, 32\}\) in \(\log \)–\(\log \) scale, respectively

For the numerical evaluation, we compare the approximations of the schemes \(\text {DFM-A1}\), \(\text {MIL-A1}\) and \(\text {EXE}\) to an approximation computed with the linear implicit Euler scheme with \(N = 2^6\), \(K= \lceil 2^{\frac{14}{9}}\rceil \) and \(M = \lceil 2^{\frac{35}{2}}\rceil \) that serves as the reference solution. We simulate 500 paths with each scheme and for each \(N \in \{ 2, 4, 8, 16, 32\}\) to compare the \(L^2\)-error at time \(T=1\) versus the number of time steps as well as versus the computational cost, see Fig. 2. Here, \(\log \)–\(\log \) scales are used such that the absolute value of the slope of the graph indicates the temporal order of convergence with respect to the step size (left figure) and the effective order of convergence with respect to the computational effort (right figure).

Considering the parameters used in this example, the temporal order of convergence is \(q_{\text {EXE}}=\frac{1}{2}\) for the \(\text {EXE}\) scheme and \(q_{\text {DFM}}= q_{\text {MIL}}=\frac{7}{8}-{\hat{\varepsilon }}\) for the \(\text {MIL-A1}\) and \(\text {DFM-A1}\) schemes for some arbitrarily small \({\hat{\varepsilon }}>0\). Thus, the \(\text {MIL-A1}\) and \(\text {DFM-A1}\) schemes possess a significantly higher order of convergence than the \(\text {EXE}\) scheme, which is confirmed by the left diagram in Fig. 2 and the corresponding results given in Table 4.

Table 4 Computational cost \({\bar{c}}\), \(L^2\)-error and corresponding standard deviation for Example 1 obtained from 500 paths
Table 5 Average measured CPU times for Example 1 that correspond to the simulation results in Table 4 obtained from 500 computed paths, respectively

Further, it holds that \(\text {EOC}(\text {EXE})< \text {EOC}(\text {MIL-A1}) < \text {EOC}(\text {DFM-A1})\), i.e., the \(\text {DFM-A1}\) scheme has a higher effective order of convergence than the other schemes. This is confirmed by the numerical simulation results presented in the right diagram in Fig. 2 and in Table 4, where \({\bar{c}}\) denotes the computational cost measured by counting the number of functional evaluations and normally distributed random numbers needed by each scheme. Additionally, we specify the corresponding average measured CPU times in Table 5, where the \(\text {DFM-A1}\) scheme takes the lowest CPU times as the accuracy increases. These results substantiate the cost model and the results for the effective order of convergence in Fig. 2. Consequently, for this example the \(\text {DFM-A1}\) scheme performs better than the \(\text {MIL-A1}\) and the \(\text {EXE}\) scheme.

4.2 Example 2

Here, we choose a smaller value \(p = \tfrac{44}{41}\) and the same covariance operator Q as in Example 1 with \(\rho _Q =3\). Thus, condition (A5a) is fulfilled. As in Example 1, we consider the function \(F(y) = 1-y\), \(y\in H\), and choose \(\beta =0\). Again, the initial value is chosen as \(\xi (x)= X_0(x) =0\) for all \(x\in (0,1)\). Further, we calculate the condition \(\delta \in (0, \tfrac{5}{24})\) and choose \(\delta = \tfrac{5}{24} -\varepsilon _{\delta }\) for some arbitrarily small \(\varepsilon _{\delta } >0\). Then, we get \(\gamma \in [ \tfrac{5}{24}, \tfrac{17}{24} -\varepsilon _{\delta } )\) and we set \(\gamma = \tfrac{17}{24} -\varepsilon _{\delta } - \varepsilon _{\gamma }\) for some arbitrarily small \(\varepsilon _{\gamma }>0\). This implies \(q= q_{\text {DFM}} = q_{\text {MIL}} = \tfrac{17}{24} - {\hat{\varepsilon }}\) with \({\hat{\varepsilon }} = \varepsilon _{\delta } + \varepsilon _{\gamma } >0\) arbitrarily small. Moreover, one can choose \(\vartheta \in (0, \tfrac{1}{2})\) arbitrarily, and here we choose \(\vartheta = \frac{1}{4}\) for simplicity. Then, we calculate that \(\alpha \in (0, \tfrac{77}{36} -\tfrac{2}{3} {\hat{\varepsilon }} )\) and therefore set \(\alpha = \tfrac{77}{36} - \varepsilon _{\alpha }\) with \(\varepsilon _{\alpha }> \tfrac{2}{3} {\hat{\varepsilon }} > 0\) arbitrarily small. Thus, assumption (A3) is fulfilled.

Checking the conditions in Table 2, we are in the case \(q> \tfrac{1}{2}\) and \(\gamma \rho _A(2q-1) \le q\) for sufficiently small \({\hat{\varepsilon }}>0\), in which the optimal effective order of convergence is obtained by the \(\text {DFM-A1}\) scheme. For the \(\text {DFM-A1}\) scheme, we get from (19) that

$$\begin{aligned} \text {error}(\text {DFM-A1}) = {\mathcal {O}}\left( {\bar{c}}^{-\frac{\frac{1309}{144} -\frac{17}{4} \varepsilon _{\alpha } -\frac{77}{6} {\hat{\varepsilon }}}{ \frac{62}{3} -9 \varepsilon _{\alpha } -2 {\hat{\varepsilon }}}} \right) , \end{aligned}$$

i.e., \(\text {EOC}(\text {DFM-A1}) \approx \tfrac{1309}{2976}\). The optimal choice of M and K given some \(N \in {\mathbb {N}}\) is then determined in (18), which results in \(M = N^2\) and \(K = \big \lceil N^{\frac{\frac{17}{12} -2 {\hat{\varepsilon }}}{ \frac{77}{12} -3 \varepsilon _{\alpha }}} \big \rceil \approx \big \lceil N^{\frac{17}{77}} \big \rceil \).

Considering the \(\text {MIL-A1}\) scheme, we obtain from (23) the effective order of convergence

$$\begin{aligned} \text {error}(\text {MIL-A1}) = {\mathcal {O}}\left( {\bar{c}}^{-\frac{\frac{1309}{144} -\frac{17}{4} \varepsilon _{\alpha } -\frac{77}{6} {\hat{\varepsilon }}}{ \frac{325}{12} -12 \varepsilon _{\alpha } -2 {\hat{\varepsilon }}}} \right) , \end{aligned}$$

i.e., it holds that \(\text {EOC}(\text {MIL-A1}) \approx \frac{1309}{3900}\). Given some \(N \in {\mathbb {N}}\), the optimal choice for M and K is given in (22) and yields the same results as for the \(\text {DFM-A1}\) scheme in this example.

For the Euler scheme, it holds that \(q_{\text {EXE}}=\frac{1}{2}\), which together with (18) yields \(M = \big \lceil N^{\frac{17}{6} +4 {\hat{\varepsilon }}} \big \rceil \approx \big \lceil N^{\frac{17}{6}} \big \rceil \) and \(K= \big \lceil N^{\frac{\frac{17}{12} -2 {\hat{\varepsilon }}}{\frac{77}{12} -3 \varepsilon _{\alpha }}} \big \rceil \approx \big \lceil N^{\frac{17}{77}} \big \rceil \). For the effective order of convergence, we obtain

$$\begin{aligned} \text {error}(\text {EXE}) ={\mathcal {O}}\left( {\bar{c}}^{-\frac{\frac{1309}{288} -\frac{17}{8} \varepsilon _{\alpha } - \frac{77}{12} {\hat{\varepsilon }}}{\frac{1873}{144} - \frac{23}{4} \varepsilon _{\alpha } - \frac{83}{6} {\hat{\varepsilon }} + 6 \varepsilon _{\alpha } {\hat{\varepsilon }}}} \right) , \end{aligned}$$

i.e., it holds \(\text {EOC}(\text {EXE}) \approx \tfrac{1309}{3746}\).

Fig. 3 \(L^2\)-error at \(T=1\) against number of time steps and computational cost, respectively, for Example 2 computed from 500 paths for \(N \in \{ 2, 4, 8, 16, 32\}\) in \(\log \)–\(\log \) scale

Now, the performance of the schemes \(\text {DFM-A1}\), \(\text {MIL-A1}\) and \(\text {EXE}\) is analyzed for this example by numerical simulations. To this end, a reference solution is computed by the linear implicit Euler scheme with \(N = 2^6\), \(K= \lceil 2^{\frac{102}{77}}\rceil \) and \(M = \lceil 2^{\frac{85}{6}}\rceil \). Again, for each \(N \in \{ 2, 4, 8, 16, 32\}\) the \(L^2\)-error is determined at time \(T=1\) based on 500 computed paths.

In the left \(\log \)–\(\log \) plot of Fig. 3, the \(L^2\)-errors versus the number of time steps are compared. Here, we can see that the corresponding theoretical temporal orders of convergence \(q_{\text {DFM}}= q_{\text {MIL}}= \frac{17}{24} - {\hat{\varepsilon }}\) for the \(\text {DFM-A1}\) and the \(\text {MIL-A1}\) scheme as well as \(q_{\text {EXE}}= \frac{1}{2}\) for the \(\text {EXE}\) scheme are validated by the numerical simulations. The respective numerical results are also listed in Table 6.

In the right diagram of Fig. 3, we analyze the effective orders of convergence for the schemes under consideration. For this example, we expect \(\text {EOC}(\text {MIL-A1})< \text {EOC}(\text {EXE}) < \text {EOC}(\text {DFM-A1})\) due to the cost model. Again, the \(\text {DFM-A1}\) scheme performs best with the highest effective order of convergence, whereas the original Milstein scheme \(\text {MIL-A1}\) has the lowest effective order of convergence, even below that of the \(\text {EXE}\) scheme; this is confirmed by the numerical results in the right diagram. The simulation results are also given in Table 6. In addition, the average measured CPU times corresponding to the simulation results in Table 6 are given in Table 7, where the \(\text {DFM-A1}\) scheme has the lowest computing times. These measurements underpin the results for the effective order of convergence in Fig. 3. Thus, for this example the \(\text {DFM-A1}\) scheme outperforms the \(\text {MIL-A1}\) and the \(\text {EXE}\) scheme.

4.3 Example 3

The following example was first considered in [36] for an analysis of the original Milstein scheme. Here, we choose the parameters \(p=4\) and \(\rho _Q=3\), and we consider \(F(y) = 1-y\) for \(y \in H\). Then, as in the previous examples, we can choose \(\beta \in [0,1)\) arbitrarily and therefore set \(\beta =0\). The initial condition is given by \(\xi (x) = X_0(x) = 0\) for all \(x \in (0,1)\). Then, the condition \(p> \max ( \tfrac{2+8\delta }{3+2\delta }, 1)\) is fulfilled for any \(\delta \in (0,\tfrac{1}{2})\) and we choose the maximal \(\delta = \tfrac{1}{2} -\varepsilon _{\delta }\) for some arbitrarily small \(\varepsilon _{\delta }>0\). We have \(\gamma \in [\tfrac{1}{2}-\varepsilon _{\delta }, 1-\varepsilon _{\delta })\), i.e., we can choose \(\gamma = 1 -\varepsilon _{\delta } -\varepsilon _{\gamma }\) for some arbitrarily small \(\varepsilon _{\gamma }>0\). Thus, for the temporal order of convergence we get \(q = q_{\text {DFM}}= q_{\text {MIL}}= 1 -{\hat{\varepsilon }}\) for some arbitrarily small \({\hat{\varepsilon }} = \varepsilon _{\delta } + \varepsilon _{\gamma } >0\). Note that this is the maximal possible temporal order for the \(\text {DFM}\) and the \(\text {MIL}\) scheme. In order to choose \(\alpha \) maximal, we take \(\vartheta =\tfrac{1}{4}\in (0, \tfrac{1}{2})\) for simplicity, such that \(\alpha <\frac{7+\rho _Q+4\gamma }{2\rho _Q}\) needs to be fulfilled. Thus, we can choose \(\alpha = \tfrac{7}{3}-\varepsilon _{\alpha }\) for some arbitrarily small \(\varepsilon _{\alpha }> \tfrac{2}{3} {\hat{\varepsilon }} >0\). As a result, assumption (A3) is fulfilled as well as condition (A5a).

Table 6 Computational cost \({\bar{c}}\), \(L^2\)-error and corresponding standard deviation for Example 2 obtained from 500 paths
Table 7 Average measured CPU times that correspond to the simulation results in Table 6 for Example 2 obtained from 500 computed paths, respectively

Analyzing the effective order of convergence, we see that \(\gamma \rho _A (2 q-1) \rightarrow 2 q\) as \({\hat{\varepsilon }} \rightarrow 0\). Therefore, both schemes \(\text {DFM-A1}\) and \(\text {MIL-A1}\) have the same optimal effective order of convergence for this example, see also Table 2. From (21) it follows that

$$\begin{aligned} \text {error}(\text {DFM-A1}) = \text {error}(\text {MIL-A1}) = {\mathcal {O}}\left( {\bar{c}}^{-\frac{7 -3 \varepsilon _{\alpha }}{15 -6 \varepsilon _{\alpha }}} \right) \end{aligned}$$

and we get \(\text {EOC}(\text {DFM-A1}) = \text {EOC}(\text {MIL-A1}) \approx \tfrac{7}{15}\). Then, for \(N\in {\mathbb {N}}\) we compute the relation \(M = N^2\) and \(K= \big \lceil N^{\frac{2 -2 {\hat{\varepsilon }}}{7 -3 \varepsilon _{\alpha }}} \big \rceil \approx \big \lceil N^{\frac{2}{7}} \big \rceil \) due to (20) for the implementation of the \(\text {DFM-A1}\) and the \(\text {MIL-A1}\) scheme.

Further, we obtain for the \(\text {EXE}\) scheme that \(q_{\text {EXE}}= \tfrac{1}{2}\) and the effective order of convergence can be calculated from (24) as

$$\begin{aligned} \text {error}(\text {EXE}) = {\mathcal {O}}\left( {\bar{c}}^{-\frac{7 -3 \varepsilon _{\alpha } -7 {\hat{\varepsilon }} + 3 {\hat{\varepsilon }} \varepsilon _{\alpha }}{ \frac{37}{2} -\frac{15}{2} \varepsilon _{\alpha } -15 {\hat{\varepsilon }} +6 {\hat{\varepsilon }} \varepsilon _{\alpha }}} \right) \end{aligned}$$

and thus we have \(\text {EOC}(\text {EXE}) \approx \tfrac{14}{37}\). Then, given some \(N \in {\mathbb {N}}\) and following (18), we get \(M = \big \lceil N^{4 - 4 {\hat{\varepsilon }}} \big \rceil \approx N^4\) and \(K = \big \lceil N^{\frac{2 -2 {\hat{\varepsilon }}}{7 -3 \varepsilon _{\alpha }}} \big \rceil \approx \big \lceil N^{\frac{2}{7}} \big \rceil \).

Fig. 4 \(L^2\)-error at \(T=1\) against number of time steps and computational cost, respectively, for Example 3 computed from 500 paths for \(N \in \{ 2,4,8,16,32\}\) in \(\log \)–\(\log \) scale

Table 8 Computational cost \({\bar{c}}\), \(L^2\)-error and corresponding standard deviation for Example 3 obtained from 500 paths
Table 9 Average measured CPU times for Example 3 that correspond to the simulation results in Table 8 obtained from 500 computed paths, respectively

Now, we study the performance of the schemes under consideration by using a reference solution computed by the linear implicit Euler scheme with \(N= 2^6\), \(K= \lceil 2^{\frac{12}{7}} \rceil \) and \(M = 2^{24}\). Then, for each \(N \in \{ 2,4,8,16,32 \}\) the \(L^2\)-error for each scheme at time \(T=1\) is considered based on 500 computed paths.

For this example, the maximal possible theoretical temporal order of convergence \(q_{\text {DFM}}= q_{\text {MIL}}= 1 -{\hat{\varepsilon }}\) for the \(\text {DFM-A1}\) and the \(\text {MIL-A1}\) scheme and \(q_{\text {EXE}}= \tfrac{1}{2}\) for the \(\text {EXE}\) scheme is achieved. This order is validated by the numerical simulation results presented in the left \(\log \)-\(\log \) plot of Fig. 4 and in Table 8.

For the effective order of convergence, we have \(\text {EOC}(\text {EXE}) < \text {EOC}(\text {MIL-A1}) = \text {EOC}(\text {DFM-A1})\) due to the cost model. These theoretical results are confirmed by the right diagram of Fig. 4, where the negative slope of each graph reveals the effective order of convergence of the corresponding scheme. The presented simulation results are also given in Table 8. As for the previous examples, we also report the average measured CPU times corresponding to the simulations performed by each scheme, see Table 9. Again, it can be seen that the \(\text {DFM-A1}\) scheme performs best and that, despite the approximation of iterated stochastic integrals, the \(\text {DFM-A1}\) and \(\text {MIL-A1}\) schemes perform significantly better than the \(\text {EXE}\) scheme. Again, the measured CPU times support the results for the effective order of convergence presented in Fig. 4.

4.4 Example 4

In contrast to the previous examples, we choose a different nonlinearity F for this example in order to obtain restrictions on the parameter \(\beta \). To this end, we consider the mapping \(F :H_{\beta } \rightarrow H\) given by

$$\begin{aligned} F(v) = \sum _{i \in {\mathcal {I}}} f_i(v) \, e_i \end{aligned}$$

for \(v \in H_{\beta }\) with some \(f_i :H_{\beta } \rightarrow {\mathbb {R}}\) for \(i \in {\mathcal {I}}\). In this example, we choose \(f_i(v) = i^{-s} \sin (i^r \langle v,e_i \rangle _H )\) for \(v \in H_{\beta }\), \(s> \tfrac{1}{2}\), \(r \le \min (s, 2 \beta + \tfrac{s}{2} )\) and \(i \in {\mathcal {I}}\). Then, we get

$$\begin{aligned} \Vert F(v) \Vert _H^2 = \sum _{i \in {\mathcal {I}}} |f_i(v) |^2 = \sum _{i \in {\mathcal {I}}} \frac{| \sin (i^r \langle v,e_i \rangle _H) |^2}{i^{2s}} < \infty . \end{aligned}$$

Further, F is twice continuously Fréchet differentiable and it holds

$$\begin{aligned} \sup _{v \in H_{\beta }} \Vert F'(v) \Vert _{L(H)}^2&= \sup _{v \in H_{\beta }} \sup _{\begin{array}{c} u \in H \\ \Vert u \Vert _H = 1 \end{array}} \sum _{i \in {\mathcal {I}}} \bigg | \sum _{k \in {\mathcal {I}}} \frac{\partial f_i}{\partial v_k}(v) \, \langle u,e_k \rangle _H \bigg |^2 \\&= \sup _{v \in H_{\beta }} \sup _{\begin{array}{c} u \in H \\ \Vert u \Vert _H = 1 \end{array}} \sum _{i \in {\mathcal {I}}} i^{2(r-s)} | \cos (i^r \langle v,e_i \rangle _H ) |^2 \, | \langle u,e_i \rangle _H |^2 \\&\le \sup _{\begin{array}{c} u \in H \\ \Vert u \Vert _H = 1 \end{array}} \sum _{i \in {\mathcal {I}}} | \langle u,e_i \rangle _H |^2 = 1, \end{aligned}$$

because \(r \le s\). Next, considering the second Fréchet derivative, we get

$$\begin{aligned} \sup _{v \in H_{\beta }} \Vert F''(v) \Vert _{L^{(2)}(H_{\beta },H)}^2&= \sup _{v \in H_{\beta }} \sup _{\begin{array}{c} u,w \in H_{\beta } \\ \Vert u \Vert _{H_{\beta }} = \Vert w \Vert _{H_{\beta }} = 1 \end{array}} \\&\quad \times \sum _{i \in {\mathcal {I}}}\bigg | \sum _{k,l \in {\mathcal {I}}} \frac{\partial ^2 f_i}{\partial v_k \partial v_l}(v) \, \langle u,e_k \rangle _H \, \langle w,e_l \rangle _H \bigg |^2 \\&= \sup _{v \in H_{\beta }} \sup _{\begin{array}{c} u,w \in H_{\beta } \\ \Vert u \Vert _{H_{\beta }} = \Vert w \Vert _{H_{\beta }} = 1 \end{array}}\\&\quad \times \sum _{i \in {\mathcal {I}}} i^{4r-2s} | \sin (i^r \langle v,e_i \rangle _H ) |^2 | \langle u,e_i \rangle _H |^2 | \langle w,e_i \rangle _H |^2 \\&\le \sup _{\begin{array}{c} u,w \in H_{\beta } \\ \Vert u \Vert _{H_{\beta }} = \Vert w \Vert _{H_{\beta }} = 1 \end{array}} \sum _{i \in {\mathcal {I}}} i^{4r-2s} | \langle u,e_i \rangle _H |^2 | \langle w,e_i \rangle _H |^2 \\&\le \frac{100^2}{\pi ^4} \sup _{\begin{array}{c} u \in H_{\beta } \\ \Vert u \Vert _{H_{\beta }} = 1 \end{array}} \sum _{i \in {\mathcal {I}}} i^{4r-2s-4 \beta } | \langle u,e_i \rangle _H |^2 \\&\le \frac{100^4}{\pi ^8} \sup _{\begin{array}{c} u \in H_{\beta } \\ \Vert u \Vert _{H_{\beta }} = 1 \end{array}} \Vert u \Vert _{H_{\beta }}^2 = \frac{100^4}{\pi ^8} < \infty , \end{aligned}$$

since \(\Vert z \Vert _{H_{\beta }}^2 = \tfrac{\pi ^4}{100^2} \sum _{i \in {\mathcal {I}}} i^{4 \beta } | \langle z,e_i \rangle _H |^2\) for any \(z \in H_{\beta }\) and because \(r \le \min (s, 2 \beta + \tfrac{s}{2} )\). Thus, assumption (A2) is fulfilled.
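As a quick numerical illustration of the bound on the first derivative, the following Python sketch (our own check; the truncation level and the test vectors are arbitrary, and r = s = 7/2 anticipates the choice made below) evaluates the diagonal representation of \(F'(v)\) from the computation above and confirms that its operator norm does not exceed 1.

```python
import numpy as np

r = s = 3.5            # r = s = 7/2, consistent with r <= s
L = 200                # truncation level (placeholder)
rng = np.random.default_rng(2)
i = np.arange(1, L + 1)

# F'(v) is diagonal in the basis (e_i) with entries i^{r-s} cos(i^r <v,e_i>_H)
for _ in range(1000):
    v = rng.standard_normal(L)              # coefficients <v, e_i>_H
    diag = i ** (r - s) * np.cos(i ** r * v)
    assert np.max(np.abs(diag)) <= 1.0      # operator norm of F'(v) at most 1
print("||F'(v)|| <= 1 verified on 1000 random test vectors")
```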

Again, we choose \(\rho _Q = 3\). Moreover, we select \(r=s=\tfrac{7}{2}\) in the definition of F and \(p=4\) in the definition of the operator B. As the initial value, we choose \(X_0 = \xi \in H\) with \(\langle \xi ,e_i \rangle _H = i^{-2}\) for \(i \in {\mathcal {I}}\). First, we calculate \(\beta \in [\tfrac{7}{8},1)\) from the condition \(r \le \min ( s, 2 \beta + \tfrac{s}{2} )\) and therefore choose the minimal possible value \(\beta = \tfrac{7}{8}\). Analogously to Example 1, we derive \(\delta , \vartheta \in (0,\tfrac{1}{2})\) and choose \(\delta =\tfrac{1}{2} -\varepsilon _{\delta }\) and \(\vartheta = \tfrac{1}{2} -\varepsilon _{\vartheta }\) for arbitrarily small \(\varepsilon _{\delta }, \varepsilon _{\vartheta } >0\). Then, we choose \(\gamma \in [ \frac{7}{8}, 1 -\varepsilon _{\delta })\) maximal, i.e., \(\gamma = 1 -\varepsilon _{\delta } -\varepsilon _{\gamma }\) for arbitrarily small \(\varepsilon _{\gamma }>0\). Let \({\hat{\varepsilon }} = \varepsilon _{\delta } + \varepsilon _{\gamma } >0\) be arbitrarily small. It follows that \(q=q_{\text {DFM}} = q_{\text {MIL}} = \tfrac{1}{4} -2 {\hat{\varepsilon }}\). Finally, we calculate that \(\alpha \in (0, \tfrac{7}{3} -\tfrac{2}{3} {\hat{\varepsilon }})\) and we set \(\alpha = \tfrac{7}{3} - \varepsilon _{\alpha }\) for some arbitrarily small \(\varepsilon _{\alpha }> \tfrac{2}{3} {\hat{\varepsilon }} >0\).

Since we have \(q \le \tfrac{1}{2}\), the optimal schemes are the \(\text {EXE}\) scheme and the \(\text {DFM-A1}\) scheme, both attaining the same effective order of convergence for this example, see Table 2. Taking into account all parameters, we get from (19) and (24) that

$$\begin{aligned} \text {error}(\text {EXE}/ \text {DFM-A1}) = {\mathcal {O}}\left( {\bar{c}}^{-\frac{\frac{7}{2} -33 {\hat{\varepsilon }} + \frac{83}{2} {\hat{\varepsilon }}^2 -12 {\hat{\varepsilon }}^3}{\frac{65}{4} -\frac{65}{2} {\hat{\varepsilon }} -\frac{27}{4} \varepsilon _{\alpha } + 12 \varepsilon _{\alpha } {\hat{\varepsilon }} + 4 {\hat{\varepsilon }}^2}} \right) , \end{aligned}$$

i.e., for the effective order of convergence it holds that \(\text {EOC}(\text {EXE}) = \text {EOC}(\text {DFM-A1})\approx \tfrac{14}{65}\). For some arbitrarily fixed \(N \in {\mathbb {N}}\), we obtain for the \(\text {EXE}\) scheme as well as for the \(\text {DFM-A1}\) scheme from (18) that \(M = \big \lceil N^{\frac{2 -2 {\hat{\varepsilon }}}{\frac{1}{4} -2 {\hat{\varepsilon }}}} \big \rceil \approx N^8\) and \(K = \big \lceil N^{\frac{2 -2 {\hat{\varepsilon }}}{7 -3 \varepsilon _{\alpha }}} \big \rceil \approx \lceil N^{\frac{2}{7}} \rceil \) as the optimal choice. In this case, the computation of the double integrals is not expensive as it holds that \(D \ge D_1 = M^{-\frac{1}{2}-\varepsilon }\) for some \(\varepsilon >0\) such that \(D=1\) can be fixed or it can even be neglected. On the other hand, for the \(\text {MIL-A1}\) scheme, we compute from (23) that

$$\begin{aligned} \text {error}(\text {MIL-A1}) = {\mathcal {O}}\left( {\bar{c}}^{-\frac{\frac{7}{2} -33 {\hat{\varepsilon }} + \frac{83}{2} {\hat{\varepsilon }}^2 -12 {\hat{\varepsilon }}^3}{18 -\frac{93}{2} {\hat{\varepsilon }} -\frac{15}{2} \varepsilon _{\alpha } + 18 \varepsilon _{\alpha } {\hat{\varepsilon }} + 4 {\hat{\varepsilon }}^2}} \right) , \end{aligned}$$

which gives the effective order of convergence \(\text {EOC}(\text {MIL-A1}) \approx \tfrac{7}{36}\). Moreover, the optimal choice for M and K given some \(N \in {\mathbb {N}}\) can be calculated from (22) to be exactly the same as for the \(\text {EXE}\) scheme and the \(\text {DFM-A1}\) scheme. For this example, we have \(\text {EOC}(\text {MIL-A1}) < \text {EOC}(\text {EXE}) = \text {EOC}(\text {DFM-A1})\). Further, the computational effort involved in computing a convergence plot is very high due to the relation \(M\approx N^8\). Therefore, we do not present a convergence plot for this setting.

The examples presented above confirm the theoretical analysis conducted in Sect. 3. The numerical experiments show that the derivative-free Milstein type scheme for equations with non-commutative noise defined in (7), in combination with Algorithm 1, always has at least the same, and in many cases an even higher, effective order of convergence than the exponential Euler scheme and the original Milstein scheme.

5 Conclusion

We proposed the derivative-free Milstein type scheme \(\text {DFM}\) for the approximation of the mild solution of SPDEs that need not fulfill a commutativity condition for the noise, and we proved an upper bound for the \(L^2\)-error. As the main novelty, the introduced \(\text {DFM}\) scheme is derivative-free and has computational cost \({\mathcal {O}}(N K M)\), which is of the same magnitude as for the Euler schemes \(\text {EXE}\) and \(\text {LIE}\). This is a significant reduction of the computational complexity compared to the original Milstein scheme \(\text {MIL}\), which is not derivative-free and has computational cost \({\mathcal {O}}(N^2 K M)\). In addition, the convergence of the \(\text {DFM}\) method is proved for its combination with any suitable simulation method for the iterated stochastic integrals. As an example, the effective order of convergence of the \(\text {DFM}\) scheme combined with Algorithm 1 in [21] for the simulation of the iterated stochastic integrals is analyzed in detail. For Algorithm 1, the effective order of convergence of the \(\text {DFM}\) scheme is at least that of the Euler schemes or the Milstein scheme \(\text {MIL}\), and it turns out to be even significantly higher for many parameter settings, depending on the specific SPDE to be approximated. Thus, in many cases the proposed \(\text {DFM}\) scheme outperforms the Euler schemes as well as the original Milstein scheme.

The maximal possible effective order of convergence that can be attained by the \(\text {DFM}\) scheme combined with Algorithm 1 is bounded by 1/2, which is in accordance with the upper bound for the order of strong convergence in the case of finite-dimensional SDEs if Algorithm 1 is applied, see also [5]. However, in contrast to the finite-dimensional SDE setting, for SPDEs the Euler schemes often attain some effective order of convergence less than 1/2. This gap in the order of convergence for the Euler schemes is the reason why the use of higher order approximation methods can be reasonable, in strong contrast to the finite-dimensional SDE setting. To the best of the authors' knowledge, this is the first attempt to give a rigorous analysis of the error versus the computational cost for higher order approximation methods applied to SPDEs without any commutativity condition, where the computational cost for the approximation of iterated stochastic integrals is incorporated within the framework of a cost model. It remains an open question whether the application of higher order numerical methods that incorporate further iterated stochastic integrals from the stochastic Taylor expansion may close the gap for the order of convergence to the upper bound of 1/2 if, e.g., naive approximations like Algorithm 1 are applied for the approximation of these iterated stochastic integrals. As a result, higher order approximation methods may be of strong interest, especially in the case of SPDEs. On the other hand, it may be possible to overcome the upper bound of 1/2 for the order of convergence if some more sophisticated algorithm for the simulation of the iterated stochastic integrals is combined with the \(\text {DFM}\) scheme, see, e.g., Algorithm 2 in [21] and the recently proposed algorithm in [24].

6 Proofs

Here, we give the proof of the convergence result for the derivative-free Milstein scheme (7) as stated in Theorem 2.1. Moreover, we prove the estimate given in Theorem 2.2, which additionally incorporates the approximation of the stochastic double integrals. In the following, we always denote \(Y_m = Y_m^{N,K,M}\) for simplicity and let \(I_{(i,j),l} = (\eta _i \eta _j)^{-\frac{1}{2}} I^Q_{(i,j),l}\). Attention should be paid to the fact that, for ease of notation, the constants in our proofs may differ from line to line even though their denomination does not change. For the proof of Theorem 2.1, we need the following estimate on the moments of the approximation process \((Y_m)_{m\in \{0,\ldots ,M\}}\). Note that, without loss of generality, we present the proofs with an equidistant time step \(h=h_m\) for all \(m\in \{0,\ldots ,M-1\}\).

Lemma 6.1

Assume that (A1)–(A4) and (A5) hold. Then, it holds that

$$\begin{aligned} \sup _{m\in \{0,\ldots ,M\}}\big (\mathrm {E}\big [\Vert Y_m\Vert _{H_{\delta }}^p\big ]\big )^{\frac{1}{p}} \le C_{p,Q,T,\delta } \left( 1+\big (\mathrm {E}\big [\Vert X_0\Vert _{H_{\delta }}^p\big ]\big )^{\frac{1}{p}}\right) \end{aligned}$$

for all \(p \in [2,\infty )\) in case of (A5a) and for \(p=2\) in case of (A5b) for some arbitrary \(N,K,M \in {\mathbb {N}}\) and some constant \(C_{p,Q,T,\delta } >0\) independent of N, K and M.

Proof of Lemma 6.1

We conduct the proof of this lemma by induction. Fix some \(N,K,M \in {\mathbb {N}}\) and let \(p\in [2,\infty )\). The statement obviously holds for \(m=0\). Now, for some \(m\in \{1,\ldots ,M\}\), assume that the statement is true for all \(Y_l\) with \(l \in \{0,\ldots ,m-1\}\).

We estimate \(\big (\mathrm {E}\big [\Vert Y_m\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}}\) by the triangle inequality and distinguish two cases.

Case 1: Assume that assumption (A5a) is fulfilled. We estimate the individual terms by a Burkholder–Davis–Gundy type inequality [6, Theorem 4.37] and a Taylor expansion of the difference approximation. Precisely, we use

$$\begin{aligned}&B \bigg ( Y_l + \sum _{i\in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q \bigg ) = B(Y_l)\nonumber \\&\quad + \int _0^1 B'(\xi (Y_l,j,u)) \bigg ( \sum _{i\in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q \bigg ) \, \mathrm {d}u \end{aligned}$$
(32)

for some \(\xi (Y_l,j,u)= Y_l+u\sum _{i\in {\mathcal {J}}_K}P_NB(Y_l) {\tilde{e}}_iI_{(i,j),l}^Q \in H_{\beta }\), \(l\in \{0,\ldots ,m-1\}\), \(j\in {\mathcal {J}}_K\), \(u\in [0,1]\). Therewith, we get

$$\begin{aligned}&\big (\mathrm {E}\big [\Vert Y_m\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}} \\&\quad \le C_p \left( \big (\mathrm {E}\big [\Vert X_0\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}}\right. \\&\qquad +\left( \sum _{l=0}^{m-1}\left( \mathrm {E}\left[ \left( \int _{t_l}^{t_{l+1}}\big \Vert (-A)^{\delta }e^{A(t_m-t_l)}F(Y_l)\big \Vert _{H}\, \mathrm {d}s\right) ^p\right] \right) ^{\frac{1}{p}}\right) ^2\\&\qquad + \int _{t_0}^{t_m} \left( \mathrm {E}\left[ \left\| \sum _{l=0}^{m-1}e^{A(t_m-t_l)}B(Y_l)\mathbbm {1}_{[t_l,t_{l+1})}(s)\right\| _{L_{HS}(U_0,H_{\delta })}^p \right] \right) ^{\frac{2}{p}} \, \mathrm {d}s\\&\qquad +\left( \mathrm {E}\left[ \left\| \sum _{l=0}^{m-1}(-A)^{\delta }e^{A(t_m-t_l)} \sum _{j\in {\mathcal {J}}_K} \int _0^1 B'(\xi (Y_l,j,u))\right. \right. \right. \\&\qquad \times \left. \left. \left( \left. \left. \sum _{i\in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q \right) {\tilde{e}}_j \, \mathrm {d}u \right\| _H^p \right] \right) ^{\frac{2}{p}} \right) . \end{aligned}$$

The estimates on the analytic semigroup, see Lemmas 6.3 and 6.13 in [27, Ch. 2], and assumptions (A2) and (A3) yield

$$\begin{aligned}&\big (\mathrm {E}\big [\Vert Y_m\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}}\\&\quad \le C_p \big ( \mathrm {E} \big [ \Vert X_0 \Vert _{H_{\delta }}^p \big ] \big )^{\frac{2}{p}} + C_{p,\delta } M \sum _{l=0}^{m-1}\Big (h^p(t_m-t_l)^{-\delta p}\Big )^{\frac{2}{p}} \big ( \mathrm {E}\big [\Vert F(Y_l)\Vert _{H}^p\big ]\big )^{\frac{2}{p}} \\&\qquad + C_p \sum _{l=0}^{m-1}\int _{t_l}^{t_{l+1}} \left( \mathrm {E} \left[ \left\| \sum _{k=0}^{m-1} e^{A(t_m-t_k)} B(Y_k) \mathbbm {1}_{[t_k,t_{k+1})}(s) \right\| _{L_{HS}(U_0,H_{\delta })}^p \right] \right) ^{\frac{2}{p}} \, \mathrm {d}s \\&\qquad + C_{p,\delta } M \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \left( \mathrm {E}\left[ \left\| \sum _{j\in {\mathcal {J}}_K} \int _0^1 B'(\xi (Y_l,j,u))\right. \right. \right. \\&\qquad \times \left. \left. \left. \left( \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q \right) {\tilde{e}}_j \, \mathrm {d}u \right\| _{H}^p \right] \right) ^{\frac{2}{p}} \\&\quad \le C_p \big (\mathrm {E}\big [\Vert X_0\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}} + C_{p,T,\delta } h \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \big (1+\big (\mathrm {E}\big [\Vert Y_l\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}}\big ) \\&\qquad + C_p \sum _{l=0}^{m-1}\Big (\mathrm {E}\big [ \Vert B(Y_l)\Vert _{L_{HS}(U_0,H_{\delta })}^p\big ]\Big )^{\frac{2}{p}} \int _{t_l}^{t_{l+1}} \big \Vert (-A)^{-\delta }\big \Vert _{L(H)}^2 \big \Vert (-A)^{\delta }e^{A(t_m-t_l)}\big \Vert _{L(H)}^2 \, \mathrm {d}s \\&\qquad + C_{p,\delta } M \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \\&\qquad \times \bigg ( \sum _{j\in {\mathcal {J}}_K} \Big ( \mathrm {E} \Big [ \Big ( \int _0^1 \big \Vert B'(\xi (Y_l,j,u)) \big \Vert _{L(H,L(U,H))} \, \mathrm {d}u \Big )^p \\&\qquad \times \Big \Vert P_N B(Y_l) \sum _{i \in {\mathcal {J}}_K} I_{(i,j),l}^Q{\tilde{e}}_i\Big \Vert _H^p\Big ]\Big )^{\frac{1}{p}}\bigg )^2 \\&\quad \le C_p \big (\mathrm {E}\big [\Vert X_0\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}} + h^{1-2\delta } C_{p,T,\delta } \sum _{l=0}^{m-1}(m-l)^{-2\delta } \big (1+\big (\mathrm {E}\big [\Vert Y_l\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}}\big ) \\&\qquad + C_{p,\delta } \sum _{l=0}^{m-1}h (t_m-t_l)^{-2\delta } \Big ( \mathrm {E}\big [\Vert B(Y_l) \Vert _{L_{HS}(U_0,H_{\delta })}^p\big ]\Big )^{\frac{2}{p}} \\&\qquad + C_{p,\delta } M h^{-2\delta } \sum _{l=0}^{m-1}(m-l)^{-2\delta }\\&\qquad \times \bigg ( \sum _{j\in {\mathcal {J}}_K}\Big (\mathrm {E}\big [\Vert B(Y_l)\Vert ^p_{L(U,H_{\delta })}\big ]\Big )^{\frac{1}{p}} \big (\mathrm {E}\big [ \big (\sum _{i\in {\mathcal {J}}_K} \big (I_{(i,j),l}^Q\big )^2\big )^{\frac{p}{2}}\big ]\big )^{\frac{1}{p}} \bigg )^2. \end{aligned}$$

This expression can further be simplified by the distributional properties of \(I^Q_{(i,j)}\), \(i,j\in {\mathcal {J}}_K\), see [11]. Therewith, we obtain

$$\begin{aligned}&\big ( \mathrm {E} \big [ \Vert Y_m \Vert _{H_{\delta }}^p \big ] \big )^{\frac{2}{p}} \\&\quad \le C_p \big (\mathrm {E}\big [\Vert X_0\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}} + C_{p,T,\delta } h^{1-2\delta } \sum _{l=0}^{m-1}(m-l)^{-2\delta } \big (1+\big (\mathrm {E}\big [\Vert Y_l\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}}\big )\\&\qquad + C_{p,Q,\delta } h \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \big (1+\big (\mathrm {E}\big [\Vert Y_l\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}}\big )\\&\qquad + C_{p,\delta } M h^{-2\delta } \sum _{l=0}^{m-1}(m-l)^{-2\delta } \\&\qquad \times \Big ( \big (1+\mathrm {E}\big [\Vert Y_l\Vert _{H_{\delta }}^p\big ]\big )^{\frac{1}{p}} \sum _{i,j\in {\mathcal {J}}_K} \big (\mathrm {E}\big [ |I_{(i,j),l} \sqrt{\eta _i}\sqrt{\eta _j} |^p \big ] \big )^{\frac{1}{p}}\Big )^2\\&\quad \le C_p \big (\mathrm {E}\big [\Vert X_0\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}} + C_{p,Q,T,\delta } h^{1-2\delta } \sum _{l=0}^{m-1}(m-l)^{-2\delta } \big (1+\big (\mathrm {E}\big [\Vert Y_l\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}}\big )\\&\qquad + C_{p,\delta } M h^{-2\delta } \sum _{l=0}^{m-1}(m-l)^{-2\delta } \Big ( \big (1+\mathrm {E}\big [\Vert Y_l\Vert _{H_{\delta }}^p\big ]\big )^{\frac{1}{p}} \sum _{i,j\in {\mathcal {J}}_K}\sqrt{\eta _i}\sqrt{\eta _j}\,h\Big )^2. \end{aligned}$$

Case 2: Assume \(p=2\) and that assumption (A5b) is fulfilled. Again, we estimate the individual terms by a Burkholder–Davis–Gundy type inequality [6, Theorem 4.37], but now using a first-order Taylor expansion of the difference approximation. Thus, we use

$$\begin{aligned}&B \bigg ( Y_l + \sum _{i\in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q \bigg ) \nonumber \\&\quad = B(Y_l) + B'(Y_l) \Bigg ( \sum _{i\in {\mathcal {J}}_K} P_N B(Y_l){\tilde{e}}_i I_{(i,j),l}^Q \Bigg ) \nonumber \\&\qquad + \int _0^1 \int _0^u B''(\xi (Y_l,j,r)) \Bigg ( \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q , \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q \Bigg ) \, \mathrm {d}r \, \mathrm {d}u \end{aligned}$$
(33)

for some \(\xi (Y_l,j,r)= Y_l + r \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q \in H_{\beta }\), \(l\in \{0,\ldots ,m-1\}\), \(j\in {\mathcal {J}}_K\), \(r\in [0,1]\).

With estimates on the analytic semigroup, see Lemmas 6.3 and 6.13 in [27, Ch. 2], we get that

$$\begin{aligned}&\big (\mathrm {E}\big [\Vert Y_m\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}} \\&\quad \le C_p \Bigg ( \big (\mathrm {E}\big [\Vert X_0\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}} +\bigg (\sum _{l=0}^{m-1}\Big ( \mathrm {E}\Big [\Big (\int _{t_l}^{t_{l+1}}\big \Vert (-A)^{\delta }e^{A(t_m-t_l)}F(Y_l)\big \Vert _{H}\, \mathrm {d}s\Big )^p\Big ]\Big )^{\frac{1}{p}}\bigg )^2\\&\qquad + \int _{t_0}^{t_m} \bigg (\mathrm {E}\bigg [\Big \Vert \sum _{l=0}^{m-1}e^{A(t_m-t_l)}B(Y_l)\mathbbm {1}_{[t_l,t_{l+1})}(s)\Big \Vert _{L_{HS}(U_0,H_{\delta })}^p \bigg ]\bigg )^{\frac{2}{p}} \, \mathrm {d}s \\&\qquad + \bigg ( \mathrm {E}\bigg [\Big \Vert \sum _{l=0}^{m-1}(-A)^{\delta }e^{A(t_m-t_l)} \sum _{j \in {\mathcal {J}}_K} \Big ( B'(Y_l) \Big ( \sum _{i\in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q \Big ) {\tilde{e}}_j \\&\qquad + \int _0^1 \int _0^u B''(\xi (Y_l,j,r)) \\&\qquad \times \Big ( \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q , \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q \Big ){\tilde{e}}_j \, \mathrm {d}r \, \mathrm {d}u \Big ) \Big \Vert _H^p\bigg ]\bigg )^{\frac{2}{p}} \Bigg ) \\&\quad \le C_p \big (\mathrm {E}\big [\Vert X_0\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}} + C_{p,\delta } M \sum _{l=0}^{m-1}\Big (h^p(t_m-t_l)^{-\delta p}\Big )^{\frac{2}{p}} \big ( \mathrm {E}\big [\Vert F(Y_l)\Vert _{H}^p\big ]\big )^{\frac{2}{p}} \\&\qquad + C_p \sum _{l=0}^{m-1}\int _{t_l}^{t_{l+1}} \bigg (\mathrm {E}\bigg [\Big \Vert \sum _{k=0}^{m-1} e^{A(t_m-t_k)} B(Y_k) \mathbbm {1}_{[t_k,t_{k+1})}(s)\Big \Vert _{L_{HS}(U_0,H_{\delta })}^p\bigg ]\bigg )^{\frac{2}{p}} \, \mathrm {d}s \\&\qquad + C_{p,\delta } M \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \bigg ( \mathrm {E}\bigg [\Big \Vert \sum _{i, j \in {\mathcal {J}}_K} I_{(i,j),l}^Q B'(Y_l) ( P_N B(Y_l) {\tilde{e}}_i ) {\tilde{e}}_j \Big \Vert _H^p\bigg ]\bigg )^{\frac{2}{p}} \\&\qquad + C_{p,\delta } M \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \bigg ( \mathrm {E}\bigg [\Big \Vert \sum _{j \in {\mathcal {J}}_K} \int _0^1 \int _0^u B''(\xi (Y_l,j,r)) \\&\qquad \times \Big ( P_N B(Y_l) \sum _{i \in {\mathcal {J}}_K} {\tilde{e}}_i I_{(i,j),l}^Q, P_N B(Y_l) \sum _{i \in {\mathcal {J}}_K} {\tilde{e}}_i I_{(i,j),l}^Q \Big ) {\tilde{e}}_j \, \mathrm {d}r \, \mathrm {d}u \Big ) \Big \Vert _H^p\bigg ]\bigg )^{\frac{2}{p}} . \end{aligned}$$

Making use of \(p=2\), assumptions (A2), (A3) and (A5b) yield

$$\begin{aligned}&\mathrm {E}\big [\Vert Y_m\Vert _{H_{\delta }}^2\big ] \\&\quad \le C \mathrm {E} \big [ \Vert X_0\Vert _{H_{\delta }}^2 \big ] + C_{T,\delta } \sum _{l=0}^{m-1}h (t_m-t_l)^{-2\delta } \big ( 1 + \mathrm {E} \big [ \Vert Y_l\Vert _{H_{\delta }}^2 \big ] \big ) \\&\qquad + C \sum _{l=0}^{m-1}\mathrm {E} \big [ \Vert B(Y_l)\Vert _{L_{HS}(U_0,H_{\delta })}^2 \big ] \int _{t_l}^{t_{l+1}} \big \Vert (-A)^{-\delta }\big \Vert _{L(H)}^2 \big \Vert (-A)^{\delta }e^{A(t_m-t_l)}\big \Vert _{L(H)}^2 \, \mathrm {d}s \\&\qquad + C_{\delta } \, M \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \sum _{i_1,i_2,j_1,j_2 \in {\mathcal {J}}_K} \mathrm {E}\bigg [ I_{(i_1,j_1),l}^Q I_{(i_2,j_2),l}^Q \\&\qquad \times \big \langle B'(Y_l) ( P_N B(Y_l) {\tilde{e}}_{i_1} ) {\tilde{e}}_{j_1} , B'(Y_l) ( P_N B(Y_l) {\tilde{e}}_{i_2} ) {\tilde{e}}_{j_2} \big \rangle _H \bigg ] \\&\qquad + C_{\delta } \, M \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \\&\qquad \bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( \int _0^1 \int _0^u \big \Vert B''(\xi (Y_l,j,r)) \big ( P_N B(Y_l), P_N B(Y_l) \big ) \big \Vert _{L^{(2)}(U,L(U,H))} \\&\qquad \times \Big \Vert \sum _{i \in {\mathcal {J}}_K} {\tilde{e}}_i I_{(i,j),l}^Q \Big \Vert _U^2 \Vert {\tilde{e}}_j \Vert _U \, \mathrm {d}r \, \mathrm {d}u \Big )^2 \bigg ] \bigg )^{\frac{1}{2}} \bigg )^{2} \\&\quad \le C \mathrm {E} \big [ \Vert X_0\Vert _{H_{\delta }}^2 \big ] + C_{T,\delta } \sum _{l=0}^{m-1}h (t_m-t_l)^{-2\delta } \big ( 1 + \mathrm {E} \big [ \Vert Y_l\Vert _{H_{\delta }}^2 \big ] \big ) \\&\qquad + C \sum _{l=0}^{m-1}\mathrm {E} \big [ \Vert B(Y_l)\Vert _{L_{HS}(U_0,H_{\delta })}^2 \big ] \big \Vert (-A)^{-\delta }\big \Vert _{L(H)}^2 \big \Vert (-A)^{\delta }e^{A(t_m-t_l)}\big \Vert _{L(H)}^2 \, h \\&\qquad + C_{\delta } \, M \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \sum _{i_1,i_2,j_1,j_2 \in {\mathcal {J}}_K} \mathrm {E}\big [ I_{(i_1,j_1),l}^Q I_{(i_2,j_2),l}^Q \big ] \\&\qquad \times \mathrm {E} \big [ \big \langle B'(Y_l) ( P_N B(Y_l) {\tilde{e}}_{i_1} ) {\tilde{e}}_{j_1} , B'(Y_l) ( P_N B(Y_l) {\tilde{e}}_{i_2} ) {\tilde{e}}_{j_2} \big \rangle _H \big ] \\&\qquad + C_{\delta } \, M \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( \int _0^1 \int _0^u \big (1 + \big \Vert \xi (Y_l,j,r) \big \Vert _H + \big \Vert Y_l \big \Vert _H \big ) \\&\qquad \times \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \, \mathrm {d}r \, \mathrm {d}u \Big )^2 \bigg ] \bigg )^{\frac{1}{2}} \bigg )^{2} . \end{aligned}$$

Due to \(\mathrm {E} \big [ I_{(i_1,j_1),l}^Q I_{(i_2,j_2),l}^Q \big ] = \tfrac{1}{2} \eta _{i_1}\eta _{j_1}h_l^2\) if \(i_1 = i_2\) and \(j_1=j_2\) and \(\mathrm {E} \big [ I_{(i_1,j_1),l}^Q I_{(i_2,j_2),l}^Q \big ] = 0\) otherwise, we get

$$\begin{aligned}&\mathrm {E}\big [\Vert Y_m\Vert _{H_{\delta }}^2\big ] \\&\quad \le C \mathrm {E} \big [ \Vert X_0\Vert _{H_{\delta }}^2 \big ] + C_{T,\delta } \sum _{l=0}^{m-1}h (t_m-t_l)^{-2\delta } \big ( 1 + \mathrm {E} \big [ \Vert Y_l\Vert _{H_{\delta }}^2 \big ] \big ) \\&\qquad + C \sum _{l=0}^{m-1}\mathrm {E} \big [ \Vert B(Y_l)\Vert _{L_{HS}(U_0,H_{\delta })}^2 \big ] \big \Vert (-A)^{-\delta }\big \Vert _{L(H)}^2 \big \Vert (-A)^{\delta }e^{A(t_m-t_l)}\big \Vert _{L(H)}^2 \, h \\&\qquad + C_{\delta } \, M \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \sum _{i,j \in {\mathcal {J}}_K} \eta _i \eta _j h^2 \, \mathrm {E}\big [ \Vert B'(Y_l) ( P_N B(Y_l) {\tilde{e}}_i ) {\tilde{e}}_j \Vert _H^2 \big ] \\&\qquad + C_{\delta } \, M \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \\&\qquad \times \bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( \int _0^1 \int _0^u \Big (1 + 2 \big \Vert Y_l \big \Vert _H + r \Big \Vert P_N B(Y_l) \sum _{i \in {\mathcal {J}}_K} {\tilde{e}}_i I_{(i,j),l}^Q \Big \Vert _H \Big ) \\&\qquad \times \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \, \mathrm {d}r \, \mathrm {d}u \Big )^2 \bigg ] \bigg )^{\frac{1}{2}} \bigg )^{2} \\&\quad \le C \mathrm {E} \big [ \Vert X_0\Vert _{H_{\delta }}^2 \big ] + C_{T,\delta } \sum _{l=0}^{m-1}h (t_m-t_l)^{-2\delta } \big ( 1 + \mathrm {E} \big [ \Vert Y_l\Vert _{H_{\delta }}^2 \big ] \big ) \\&\qquad + C \sum _{l=0}^{m-1}\mathrm {E} \big [ {\text {tr}}Q \, \Vert B(Y_l)\Vert _{L(U,H_{\delta })}^2 \big ] \big \Vert (-A)^{-\delta }\big \Vert _{L(H)}^2 \big \Vert (-A)^{\delta }e^{A(t_m-t_l)}\big \Vert _{L(H)}^2 \, h \\&\qquad + C_{Q,\delta } \, M \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } h^2 \, \mathrm {E}\big [ \big \Vert B'(Y_l) P_N B(Y_l) \big \Vert _{L(U,L(U,H))}^2 \big ] \\&\qquad + C_{\delta } \, M \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( 1 + \big \Vert Y_l \big \Vert _{H_{\delta }} + \big \Vert (-A)^{-\delta } \big \Vert _{L(H)} \big \Vert B(Y_l) \big \Vert _{L(U,H_{\delta })} \\&\qquad \times \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q\big )^2 \Big )^{\frac{1}{2}} \Big )^2 \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \Big )^2 \bigg ] \bigg )^{\frac{1}{2}} \bigg )^{2} \\&\quad \le C \mathrm {E} \big [ \Vert X_0\Vert _{H_{\delta }}^2 \big ] + C_{T,\delta } \sum _{l=0}^{m-1}h (t_m-t_l)^{-2\delta } \big ( 1 + \mathrm {E} \big [ \Vert Y_l\Vert _{H_{\delta }}^2 \big ] \big ) \\&\qquad + C_{Q,\delta } h \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \big (1 + \mathrm {E} \big [ \Vert Y_l\Vert _{H_{\delta }}^2 \big ] \big ) \\&\qquad + C_{Q,T,\delta } h \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \mathrm {E}\big [ \big \Vert B'(Y_l) \big \Vert _{L(H,L(U,H))}^2 \big \Vert (-A)^{-\delta } \big \Vert _{L(H)}^2 \big \Vert B(Y_l) \Vert _{L(U,H_{\delta })}^2 \big ] \\&\qquad + C_{\delta } \, M \sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \Big )^2 \\&\qquad + \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \Big )^3 \Big ) \big ( 1 + \big \Vert Y_l \big \Vert _{H_{\delta }}^2 \big ) \bigg ] \bigg )^{\frac{1}{2}} \bigg )^{2} \\&\quad \le C \mathrm {E} \big [ \Vert X_0\Vert _{H_{\delta }}^2 \big ] + C_{Q,T,\delta } h^{1-2\delta } \sum _{l=0}^{m-1}(m-l)^{-2\delta } \big ( 1 + \mathrm {E} \big [ \Vert Y_l\Vert _{H_{\delta }}^2 \big ] \big ) \\&\qquad + C_{\delta } \, M 
\sum _{l=0}^{m-1}(t_m-t_l)^{-2\delta } \bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \bigg ( \mathrm {E}\bigg [ \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j)}^Q(h_l) \big )^2 \Big )^2 \bigg ]\\&\qquad + \mathrm {E}\bigg [ \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j)}^Q(h_l) \big )^2 \Big )^3 \bigg ] \bigg )\big ( 1 + \mathrm {E}\big [ \big \Vert Y_l \big \Vert _{H_{\delta }}^2 \big ] \big ) \bigg )^{\frac{1}{2}} \bigg )^{2} \\&\quad \le C \mathrm {E} \big [ \Vert X_0\Vert _{H_{\delta }}^2 \big ] + C_{Q,T,\delta } h^{1-2\delta } \sum _{l=0}^{m-1}(m-l)^{-2\delta } \big ( 1 + \mathrm {E} \big [ \Vert Y_l\Vert _{H_{\delta }}^2 \big ] \big ) \\&\qquad + C_{Q,\delta } \, M \, h^{-2\delta } \sum _{l=0}^{m-1}(m-l)^{-2\delta } \\&\qquad \times \bigg ( {\text {tr}}Q \bigg ( ({\text {tr}}Q)^2 h^4 + ({\text {tr}}Q)^3 h^6 \bigg )^{\frac{1}{2}} \bigg )^{2} \big ( 1 + \mathrm {E}\big [ \big \Vert Y_l \big \Vert _{H_{\delta }}^2 \big ] \big ) \\&\quad \le C \mathrm {E} \big [ \Vert X_0\Vert _{H_{\delta }}^2 \big ] + C_{Q,T,\delta } h^{1-2\delta } \sum _{l=0}^{m-1}(m-l)^{-2\delta } \big ( 1 + \mathrm {E} \big [ \Vert Y_l\Vert _{H_{\delta }}^2 \big ] \big ) \\&\qquad + C_{Q,T,\delta } \, h^{1-2\delta } \sum _{l=0}^{m-1}(m-l)^{-2\delta } \big ( h^2 + h^4 \big ) \big ( 1 + \mathrm {E}\big [ \big \Vert Y_l \big \Vert _{H_{\delta }}^2 \big ] \big ). \end{aligned}$$

Now, we continue with the final estimates for case 1 and case 2 simultaneously, keeping in mind that case 2 is restricted to \(p=2\).

Interpreting the terms \(\sum _{l=0}^{m-1}(m-l)^{-2\delta }\) as lower Darboux sums, we estimate them as in the proof for the scheme for SPDEs with commutative noise in [20], see also [7]: for \(\delta \in (0,\frac{1}{2})\) and all \(m\in \{1,\ldots ,M\}\), \(M\in {\mathbb {N}}\), it holds that

$$\begin{aligned} \sum _{l=0}^{m-1}(m-l)^{-2\delta } = \sum _{l=1}^m l^{-2\delta } \le 1+\int _1^M r^{-2\delta } \, \mathrm {d}r \le \frac{M^{1-2\delta }}{1-2\delta }. \end{aligned}$$

This yields

$$\begin{aligned} \big (\mathrm {E}\big [\Vert Y_m\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}}&\le C_p \big ( \mathrm {E} \big [ \Vert X_0 \Vert _{H_{\delta }}^p \big ] \big )^{\frac{2}{p}} + C_{p,Q,T,\delta } \\&\quad + h^{1-2\delta } C_{p,Q,T,\delta } \sum _{l=0}^{m-1}(m-l)^{-2\delta } \big (\mathrm {E}\big [\Vert Y_l\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}} \end{aligned}$$

in a first step. Further, the discrete Gronwall Lemma implies the boundedness of the moments

$$\begin{aligned} \big (\mathrm {E}\big [\Vert Y_m\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}}&\le \Big (C_p \big ( \mathrm {E} \big [ \Vert X_0 \Vert _{H_{\delta }}^p \big ] \big )^{\frac{2}{p}} + C_{p,Q,T,\delta } \Big ) e^{C_{p,Q,T,\delta } \sum _{l=0}^{m-1}(m-l)^{-2\delta }h^{1-2\delta }} \\&\le C_{p,Q,T,\delta } \big (1+\big (\mathrm {E}\big [\Vert X_0\Vert _{H_{\delta }}^p\big ]\big )^{\frac{2}{p}}\big ) \end{aligned}$$

for all \(m\in \{1,\ldots ,M\}\), \(M\in {\mathbb {N}}\), for \(p\in [2,\infty )\) in case of (A5a) and for \(p=2\) in case of (A5b); here, we used that \(h^{1-2\delta }\sum _{l=0}^{m-1}(m-l)^{-2\delta } \le \frac{(hM)^{1-2\delta }}{1-2\delta } = \frac{T^{1-2\delta }}{1-2\delta }\) by the Darboux estimate above. \(\square \)

We now address the proof of Theorem 2.1 and show that the scheme converges with the specified order. The estimate in this step does not yet involve any approximation of the iterated stochastic integrals.

Proof of Theorem 2.1

First, we express the mild solution of (2) as

$$\begin{aligned} X_{t_m} = e^{At_m}X_0 + \sum _{l=0}^{m-1} \int _{t_l}^{t_{l+1}}e^{A(t_m-s)}F(X_s)\,\mathrm {d}s + \sum _{l=0}^{m-1} \int _{t_l}^{t_{l+1}}e^{A(t_m-s)}B(X_s)\,\mathrm {d}W_s \end{aligned}$$

for all \(m\in \{0,\ldots ,M\}\), \(M\in {\mathbb {N}}\) to align the components with the corresponding terms in the approximation below. We define the following auxiliary processes for \(m\in \{0,\ldots ,M\}\), \(M,N,K\in {\mathbb {N}}\)

$$\begin{aligned} {\hat{Y}}^{\text {MIL}}_{m}&= P_N \left( e^{At_m}X_0 + \sum _{l=0}^{m-1} \int _{t_l}^{t_{l+1}}e^{A(t_m-t_l)}F({\hat{Y}}^{\text {MIL}}_l)\,\mathrm {d}s\right. \\&\quad + \sum _{l=0}^{m-1} \int _{t_l}^{t_{l+1}}e^{A(t_m-t_l)}B({\hat{Y}}^{\text {MIL}}_l)\,\mathrm {d}W^K_s \\&\quad +\left. \sum _{l=0}^{m-1} \int _{t_l}^{t_{l+1}}e^{A(t_m-t_l)} B'({\hat{Y}}^{\text {MIL}}_l)\left( P_N\int _{t_l}^s B({\hat{Y}}^{\text {MIL}}_l)\,\mathrm {d}W_r^K\right) \,\mathrm {d}W_s^K\right) ,\\ {\bar{Y}}^{\text {MIL}}_{m}&= P_N\left( e^{At_m} X_0 + \sum _{l=0}^{m-1} \int _{t_l}^{t_{l+1}}e^{A(t_m-t_l)}F(Y_l)\,\mathrm {d}s \right. \\&\quad + \sum _{l=0}^{m-1} \int _{t_l}^{t_{l+1}}e^{A(t_m-t_l)}B(Y_l)\,\mathrm {d}W^K_s \\&\quad +\left. \sum _{l=0}^{m-1} \int _{t_l}^{t_{l+1}}e^{A(t_m-t_l)} B'(Y_l)\left( P_N\int _{t_l}^sB(Y_l)\,\mathrm {d}W_r^K\right) \,\mathrm {d}W_s^K\right) . \end{aligned}$$

The discrete process \((Y_m)_{m\in \{0,\ldots ,M\}}\) denotes the approximation obtained by the \(\text {DFM}\) scheme in (7). The auxiliary processes are introduced in order to split the approximation error such that we can employ some known prior estimates. We analyze the following terms separately

$$\begin{aligned} \Big (\mathrm {E}\big [\Vert X_{t_m}-Y_m\Vert _H^2\big ]\Big )^{\frac{1}{2}}&\le \Big (\mathrm {E} \big [ \Vert X_{t_m} - {\hat{Y}}^{\text {MIL}}_m \Vert _H^2 \big ] \Big )^{\frac{1}{2}} +\Big (\mathrm {E}\big [\Vert {\hat{Y}}^{\text {MIL}}_m - Y_m \Vert _H^2 \big ] \Big )^{\frac{1}{2}} \nonumber \\&\le \Big (\mathrm {E}\big [\Vert X_{t_m} - {\hat{Y}}^{\text {MIL}}_m \Vert _H^2 \big ] \Big )^{\frac{1}{2}} + \Big (\mathrm {E} \big [ \Vert {\hat{Y}}^{\text {MIL}}_m - {\bar{Y}}^{\text {MIL}}_{m} \Vert _H^2 \big ] \Big )^{\frac{1}{2}} \nonumber \\&\quad + \Big ( \mathrm {E} \big [ \Vert {\bar{Y}}^{\text {MIL}}_{m} - Y_m \Vert _H^2 \big ] \Big )^{\frac{1}{2}} \end{aligned}$$
(34)

for all \(m\in \{0,\ldots ,M\}\), \(M\in {\mathbb {N}}\). The first term is similar to the error that results from the approximation of (2) with the Milstein scheme by Jentzen and Röckner presented in [9]. A slight difference arises as we introduce the projection operator \(P_N\) in the definition of \({\hat{Y}}^{\text {MIL}}_{m}\), see the computations in [18, 20]. The main reasoning, however, is the same. Note that the commutativity condition is not needed in the error analysis in [9]; it is only employed there to facilitate the implementation. All conditions required in the proof in [9] are fulfilled due to assumptions (A1)–(A4). Therefore, the estimate

$$\begin{aligned} \sup _{m\in \{0,\ldots ,M\}} \Big ( \mathrm {E} \big [ \Vert X_{t_m} - {\hat{Y}}^{\text {MIL}}_m \Vert _H^2 \big ] \Big )^{\frac{1}{2}}&\le C_{Q,T} \left( \left( \inf _{i \in {\mathcal {I}} {\setminus } {\mathcal {I}}_N} \lambda _i \right) ^{-\gamma } \right. \nonumber \\&\quad +\left. \bigg ( \sup _{j \in {\mathcal {J}} {\setminus } {\mathcal {J}}_K} \eta _j \bigg )^{\alpha } + M^{-\min (2(\gamma -\beta ),\gamma )}\right) \end{aligned}$$
(35)

for arbitrary \(N,M,K\in {\mathbb {N}}\) is valid. For details, we refer to [9].
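For orientation (an illustration, not part of the assumptions): if, for instance, \(-A\) has eigenvalues \(\lambda _i \asymp i^2\), as for the one-dimensional Laplacian, and \(P_N\) projects onto the first \(N\) eigenfunctions, the first term in (35) is of order \(N^{-2\gamma }\); if the eigenvalues of \(Q\) decay like \(\eta _j \asymp j^{-\rho }\) for some \(\rho > 0\), the second term is of order \(K^{-\rho \alpha }\). Together with the third term, this shows how \(N\), \(K\), and \(M\) have to be coupled in order to balance the three error contributions.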

The error estimate of the second term in (34), \(\mathrm {E} \big [ \Vert {\hat{Y}}^{\text {MIL}}_m - {\bar{Y}}^{\text {MIL}}_m \Vert _H^2\big ]\), \(m\in \{0,\ldots ,M\}\), \(M\in {\mathbb {N}}\), can be obtained by the same means as in the proof of convergence of the Milstein scheme in [9] and mainly relies on the Lipschitz properties of the operators. We transfer this reasoning from [9, Section 6.3], which yields

$$\begin{aligned} \mathrm {E} \big [ \Vert {\hat{Y}}^{\text {MIL}}_m - {\bar{Y}}^{\text {MIL}}_m \Vert _H^2 \big ] \le C_Th \sum _{l=0}^{m-1} \mathrm {E} \big [ \Vert {\hat{Y}}^{\text {MIL}}_l - Y_l \Vert _H^2 \big ] \end{aligned}$$
(36)

for all \(m\in \{0,\ldots ,M\}\), \(M,N,K\in {\mathbb {N}}\).

Next, we analyze the third term in (34) which represents the error that results from the approximation of the derivative. We can show that the theoretical order of convergence that the Milstein scheme obtains is not reduced by this approximation.

We rewrite the third term in (34) as

$$\begin{aligned}&\mathrm {E}\big [\Vert {\bar{Y}}^{\text {MIL}}_{m}-Y_m\Vert _H^2\big ] \\&\quad = \mathrm {E} \bigg [ \Big \Vert P_N \Big ( \sum _{l=0}^{m-1}\sum _{i,j \in {\mathcal {J}}_K} \sqrt{\eta _j} \sqrt{\eta _i} e^{A(t_m-t_l)} B'(Y_l) ( P_N B(Y_l) {\tilde{e}}_i ) {\tilde{e}}_j I_{(i,j),l}\Big ) \\&\qquad - P_N\Big (\sum _{l=0}^{m-1}e^{A(t_m-t_l)}\sum _{j\in {\mathcal {J}}_K} \Big (B\Big (Y_l+\sum _{i\in {\mathcal {J}}_K}\sqrt{\eta _j}\sqrt{\eta _i}P_NB(Y_l) {\tilde{e}}_iI_{(i,j),l}\Big ){\tilde{e}}_j\\&\qquad -B(Y_l){\tilde{e}}_j\Big )\Big )\Big \Vert _H^2\bigg ]. \end{aligned}$$

We employ a first-order Taylor approximation for the second term, see (33), such that the first-order derivatives cancel. Moreover, the triangle inequality and assumption (A3) imply

$$\begin{aligned}&\mathrm {E}\big [\Vert {\bar{Y}}^{\text {MIL}}_{m}-Y_m\Vert _H^2\big ] \nonumber \\&\quad \le \mathrm {E}\bigg [\Big \Vert \sum _{l=0}^{m-1}e^{A(t_{m}-t_l)}\sum _{j\in {\mathcal {J}}_K} \int _0^1\int _0^uB''\Big (Y_l+r\sum _{i\in {\mathcal {J}}_K}\sqrt{\eta _j} \sqrt{\eta _i}P_NB(Y_l){\tilde{e}}_iI_{(i,j),l}\Big ) \nonumber \\&\qquad \Big (\sum _{i\in {\mathcal {J}}_K}\sqrt{\eta _j}\sqrt{\eta _i}P_NB(Y_l){\tilde{e}}_i I_{(i,j),l},\sum _{i\in {\mathcal {J}}_K}\sqrt{\eta _j} \sqrt{\eta _i}P_NB(Y_l){\tilde{e}}_iI_{(i,j),l}\Big ){\tilde{e}}_j \, \mathrm {d}r\,\mathrm {d}u \Big \Vert _H^2\bigg ] . \end{aligned}$$
(37)

Case 1: Assume that assumption (A5a) is fulfilled, i.e., Lemma 6.1 is valid for any \(p \ge 2\). Thus, it follows from (37) that

$$\begin{aligned}&\mathrm {E}\big [\Vert {\bar{Y}}^{\text {MIL}}_{m}-Y_m\Vert _H^2\big ] \\&\quad \le \mathrm {E}\bigg [\Big (\sum _{l=0}^{m-1}\sum _{j\in {\mathcal {J}}_K}\int _0^1\int _0^u\Big \Vert e^{A(t_{m}-t_l)} B''\Big (Y_l+r\sum _{i\in {\mathcal {J}}_K}\sqrt{\eta _j}\sqrt{\eta _i}P_NB(Y_l){\tilde{e}}_iI_{(i,j),l}\Big ) \\&\qquad \times \Big (\sum _{i\in {\mathcal {J}}_K}\sqrt{\eta _j}\sqrt{\eta _i}P_NB(Y_l){\tilde{e}}_iI_{(i,j),l}, \sum _{i\in {\mathcal {J}}_K}\sqrt{\eta _j}\sqrt{\eta _i}P_N B(Y_l){\tilde{e}}_iI_{(i,j),l}\Big ){\tilde{e}}_j\Big \Vert _H \,\mathrm {d}r\,\mathrm {d}u\Big )^2\bigg ] \\&\quad \le C_T \mathrm {E}\bigg [\Big (\sum _{l=0}^{m-1}\sum _{j\in {\mathcal {J}}_K}\int _0^1\int _0^u\Big \Vert B''\Big (Y_l+r\sum _{i\in {\mathcal {J}}_K}\sqrt{\eta _j}\sqrt{\eta _i}P_NB(Y_l) {\tilde{e}}_iI_{(i,j),l}\Big )\Big \Vert _{L^{(2)}(H,L(U,H))}\\&\qquad \times \Big \Vert \sum _{i\in {\mathcal {J}}_K} \sqrt{\eta _j}\sqrt{\eta _i}P_NB(Y_l){\tilde{e}}_iI_{(i,j),l}\Big \Vert _H^2\,\mathrm {d}r \,\mathrm {d}u\Big )^2\bigg ]\\&\quad \le C_T \mathrm {E}\bigg [\Big (\sum _{l=0}^{m-1}\sum _{j\in {\mathcal {J}}_K}\Big \Vert B(Y_l) \sum _{i\in {\mathcal {J}}_K}\sqrt{\eta _j}\sqrt{\eta _i}{\tilde{e}}_iI_{(i,j),l}\Big \Vert _H^2\Big )^2\bigg ]. \end{aligned}$$

Then, since condition (A5a) is valid, we obtain with Lemma 6.1 that

$$\begin{aligned}&\mathrm {E}\big [\Vert {\bar{Y}}^{\text {MIL}}_{m}-Y_m\Vert _H^2\big ] \\&\quad \le C_T \mathrm {E}\bigg [\bigg (\sum _{l=0}^{m-1}\sum _{j\in {\mathcal {J}}_K}\Vert B(Y_l)\Vert _{L(U,H)}^2 \bigg \Vert \sum _{i\in {\mathcal {J}}_K}\sqrt{\eta _j}\sqrt{\eta _i} {\tilde{e}}_iI_{(i,j),l}\bigg \Vert _U^2\bigg )^2\bigg ] \\&\quad \le C_T \mathrm {E}\bigg [\bigg (\sum _{l=0}^{m-1}\sum _{j\in {\mathcal {J}}_K}\Vert B(Y_l)\Vert _{L(U,H_{\delta })}^2 \\&\quad \quad \times \sum _{i_1,i_2\in {\mathcal {J}}_K} \eta _j\sqrt{\eta _{i_1}}\sqrt{\eta _{i_2}}I_{(i_1,j),l}I_{(i_2,j),l} \langle {\tilde{e}}_{i_1},{\tilde{e}}_{i_2}\rangle _U\bigg )^2\bigg ]\\&\quad = C_T \mathrm {E}\bigg [\bigg (\sum _{l=0}^{m-1}\sum _{j\in {\mathcal {J}}_K}\Vert B(Y_l)\Vert _{L(U,H_{\delta })}^2\sum _{i\in {\mathcal {J}}_K}\eta _j\eta _iI_{(i,j),l}^2\bigg )^2\bigg ]\\&\quad \le C_T \left( \sum _{l=0}^{m-1}\sum _{j\in {\mathcal {J}}_K} \sum _{i\in {\mathcal {J}}_K} \big ( \mathrm {E}[\Vert B(Y_l)\Vert _{L(U,H_{\delta })}^4]\big )^{\frac{1}{2}} \Big (\mathrm {E}\Big [\Big (\eta _j\eta _iI_{(i,j),l}^2\Big )^2\Big ]\Big )^{\frac{1}{2}}\right) ^2\\&\quad \le C_{Q,T,\delta } \left( \sum _{l=0}^{m-1}\sum _{j\in {\mathcal {J}}_K} \sum _{i\in {\mathcal {J}}_K}\eta _j\eta _i \big (\mathrm {E}\big [I_{(i,j),l}^4\big ]\big )^{\frac{1}{2}}\right) ^2. \end{aligned}$$

Finally, we get

$$\begin{aligned} \mathrm {E}\big [\Vert {\bar{Y}}^{\text {MIL}}_{m}-Y_m\Vert _H^2\big ] \le C_{Q,T,\delta } \Big (\sum _{l=0}^{m-1}({\text {tr}}Q)^2 h^2\Big )^2 \le C_{Q,T,\delta } h^2 ({\text {tr}}Q)^4 \le C_{Q,T,\delta } h^2 \end{aligned}$$
(38)

by the distributional properties of \(I_{(i,j),l}\), \(l\in \{0,\ldots ,m-1\}\), \(i,j\in {\mathcal {J}}_K\) for all \(m\in \{1,\ldots ,M\}\), \(M,K\in {\mathbb {N}}\), see [11].
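For the reader's convenience, we sketch the kind of computation behind these distributional properties (a direct calculation consistent with [11], stated for two independent scalar standard Wiener processes, i.e., for \(i \ne j\)): conditionally on the inner Wiener path \(\beta \), the normalized integral \(I_{(i,j),l}\) is centered Gaussian with variance \(\int _0^h \beta _s^2 \, \mathrm {d}s\), so that

$$\begin{aligned} \mathrm {E}\big [I_{(i,j),l}^4\big ] = 3\, \mathrm {E}\bigg [\Big (\int _0^h \beta _s^2 \, \mathrm {d}s\Big )^2\bigg ] = 3 \int _0^h \int _0^h \big (st + 2\min (s,t)^2\big ) \, \mathrm {d}s \, \mathrm {d}t = \tfrac{7}{4} h^4 \end{aligned}$$

and hence \(\big (\mathrm {E}\big [I_{(i,j),l}^4\big ]\big )^{\frac{1}{2}} \le 2h^2\), which is exactly the growth in \(h\) used for (38); for \(i=j\), one computes \(\mathrm {E}\big [I_{(i,i),l}^4\big ] = \tfrac{15}{4}h^4\) directly from \(I_{(i,i),l} = \tfrac{1}{2}\big ((\Delta \beta )^2 - h\big )\).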

Case 2: If assumption (A5b) is fulfilled, then Lemma 6.1 is valid for \(p = 2\) only. Therefore, a tailored argument is needed, and we get for \(p=2\) from (33) and (37) that

$$\begin{aligned}&\mathrm {E}\big [\Vert {\bar{Y}}^{\text {MIL}}_{m}-Y_m\Vert _H^2\big ] \\&\quad \le \mathrm {E} \bigg [ \Big \Vert \sum _{l=0}^{m-1}e^{A(t_{m}-t_l)} \sum _{j \in {\mathcal {J}}_K} \int _0^1 \int _0^u B''\Big ( Y_l + r \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I^Q_{(i,j),l} \Big ) \\&\qquad \times \Big ( \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I^Q_{(i,j),l}, \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I^Q_{(i,j),l} \Big ) {\tilde{e}}_j \, \mathrm {d}r\,\mathrm {d}u \Big \Vert _H^2\bigg ] \\&\quad \le M \sum _{l=0}^{m-1}\bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E} \bigg [ \Big \Vert e^{A(t_{m}-t_l)} \Big \Vert _{L(H)}^2 \Big \Vert \int _0^1 \int _0^u B''\Big ( Y_l + r \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I^Q_{(i,j),l} \Big ) \\&\qquad \times \Big ( \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I^Q_{(i,j),l}, \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I^Q_{(i,j),l} \Big ) {\tilde{e}}_j \, \mathrm {d}r\,\mathrm {d}u \Big \Vert _H^2\bigg ] \bigg )^{\frac{1}{2}} \bigg )^2 \\&\quad \le C_T \, M \sum _{l=0}^{m-1}\bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( \int _0^1 \int _0^u \big \Vert B''(\xi (Y_l,j,r)) \big ( P_N B(Y_l), P_N B(Y_l) \big ) \big \Vert _{L^{(2)}(U,L(U,H))} \\&\qquad \times \Big \Vert \sum _{i \in {\mathcal {J}}_K} {\tilde{e}}_i I_{(i,j),l}^Q \Big \Vert _U^2 \Vert {\tilde{e}}_j \Vert _U \, \mathrm {d}r \, \mathrm {d}u \Big )^2 \bigg ] \bigg )^{\frac{1}{2}} \bigg )^{2} \\&\quad \le C_{T,\delta } \, M \sum _{l=0}^{m-1}\bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( \int _0^1 \int _0^u \big (1 + \big \Vert \xi (Y_l,j,r) \big \Vert _H + \big \Vert Y_l\big \Vert _H \big ) \\&\qquad \times \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \, \mathrm {d}r \, \mathrm {d}u \Big )^2 \bigg ] \bigg )^{\frac{1}{2}} \bigg )^{2} . \end{aligned}$$

Due to \(\mathrm {E} \big [ I_{(i_1,j_1),l}^Q I_{(i_2,j_2),l}^Q \big ] = \tfrac{1}{2} \eta _{i_1} \eta _{j_1} h_l^2\) if \(i_1 = i_2\) and \(j_1=j_2\) and \(\mathrm {E} \big [ I_{(i_1,j_1),l}^Q I_{(i_2,j_2),l}^Q \big ] = 0\) otherwise, we get

$$\begin{aligned}&\mathrm {E}\big [\Vert {\bar{Y}}^{\text {MIL}}_{m}-Y_m\Vert _H^2\big ] \nonumber \\&\quad \le C_{T,\delta } \, M \sum _{l=0}^{m-1}\bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( \int _0^1 \int _0^u \Big (1 + 2 \big \Vert Y_l \big \Vert _H + r \Big \Vert P_N B(Y_l) \sum _{i \in {\mathcal {J}}_K} {\tilde{e}}_i I_{(i,j),l}^Q \Big \Vert _H \Big ) \nonumber \\&\qquad \times \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \, \mathrm {d}r \, \mathrm {d}u \Big )^2 \bigg ] \bigg )^{\frac{1}{2}} \bigg )^{2} \nonumber \\&\quad \le C_{T,\delta } \, M \sum _{l=0}^{m-1}\bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( 1 + \big \Vert Y_l \big \Vert _H + \big \Vert (-A)^{-\delta } \big \Vert _{L(H)} \big \Vert B(Y_l) \big \Vert _{L(U,H_{\delta })} \nonumber \\&\qquad \times \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \Big )^{\frac{1}{2}} \Big )^2 \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \Big )^2 \bigg ] \bigg )^{\frac{1}{2}} \bigg )^{2} \nonumber \\&\quad \le C_{T,\delta } \, M \sum _{l=0}^{m-1}\bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \Big )^2 \nonumber \\&\quad \quad + \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \Big )^3 \Big ) \big ( 1 + \big \Vert Y_l \big \Vert _H^2 \big ) \bigg ] \bigg )^{\frac{1}{2}} \bigg )^{2} \nonumber \\&\quad \le C_{T,\delta } \, M \sum _{l=0}^{m-1}\bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \bigg ( \mathrm {E}\bigg [ \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \Big )^2 \bigg ] \nonumber \\&\qquad + \mathrm {E}\bigg [ \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \Big )^3 \bigg ] \bigg ) \big ( 1 + \mathrm {E}\big [ \big \Vert Y_l \big \Vert _H^2 \big ] \big ) \bigg )^{\frac{1}{2}} \bigg )^{2} \nonumber \\&\quad \le C_{T,\delta } \, M \sum _{l=0}^{m-1}\bigg ( {\text {tr}}Q \bigg ( ({\text {tr}}Q)^2 h^4 + ({\text {tr}}Q)^3 h^6 \bigg )^{\frac{1}{2}} \bigg )^{2} \big ( 1 + \mathrm {E}\big [ \big \Vert Y_l \big \Vert _{H_{\delta }}^2 \big ] \big ) \nonumber \\&\quad \le C_{T,\delta } \, ({\text {tr}}Q)^4 h \sum _{l=0}^{m-1}\big ( h^2 + {\text {tr}}Q \, h^4 \big ) \big ( 1 + \mathrm {E}\big [ \big \Vert X_0 \big \Vert _{H_{\delta }}^2 \big ] \big ) \nonumber \\&\quad \le C_{Q,T,\delta } h^2 . \end{aligned}$$
(39)

Now, we proceed for both cases similarly. A combination of estimates (35), (36), and (38) or (39) for case 1 and case 2, respectively, with (34) and Gronwall’s Lemma implies

$$\begin{aligned} \mathrm {E} \big [ \Vert {\hat{Y}}^{\text {MIL}}_m - Y_m \Vert _H^2 \big ] \le C_T h \sum _{l=0}^{m-1} \mathrm {E} \big [ \Vert {\hat{Y}}^{\text {MIL}}_l - Y_l \Vert _H^2 \big ] + C_{Q,T,\delta } h^2 \le C_{Q,T,\delta } h^2. \end{aligned}$$
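Here, the discrete Gronwall Lemma is used in its classical form (stated for completeness): if \(a_m \le b + Lh \sum _{l=0}^{m-1} a_l\) for all \(m \in \{0,\ldots ,M\}\) with \(a_m, b, L \ge 0\), then \(a_m \le b\,(1+Lh)^m \le b\, e^{LT}\); we apply it with \(a_m = \mathrm {E} \big [ \Vert {\hat{Y}}^{\text {MIL}}_m - Y_m \Vert _H^2 \big ]\) and \(b = C_{Q,T,\delta } h^2\).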

This results in

$$\begin{aligned} \Big (\mathrm {E}\big [\Vert X_{t_m}-Y_m\Vert _H^2\big ]\Big )^{\frac{1}{2}}&\le C_{Q,T,\delta } \left( \left( \inf _{i \in {\mathcal {I}} {\setminus } {\mathcal {I}}_N} \lambda _i \right) ^{-\gamma } \right. \\&\quad +\left. \bigg ( \sup _{j \in {\mathcal {J}} {\setminus } {\mathcal {J}}_K} \eta _j \bigg )^{\alpha } + M^{-\min (2(\gamma -\beta ),\gamma )}\right) \end{aligned}$$

for the overall error. \(\square \)

In the last part of this section, we prove the estimate obtained in the case that the stochastic double integrals are approximated as well, that is, the estimate that additionally incorporates the error of the algorithm used to compute \({\bar{I}}^Q_{(i,j),l}\), \(i,j\in {\mathcal {J}}_K\), \(l \in \{0,\ldots ,M-1\}\).

Remark 6.1

Under the assumption that

$$\begin{aligned} \sum _{j\in {\mathcal {J}}_K} \bigg (\mathrm {E} \bigg [ \bigg ( \sum _{i\in {\mathcal {J}}_K} \big ({\bar{I}}^Q_{(i,j),t,t+h} \big )^2 \bigg )^{\frac{p}{2}} \bigg ]\bigg )^{\frac{1}{p}} \le C_Q h \end{aligned}$$

for \(p \in [2,\infty )\) in case of assumption (A5a) or

$$\begin{aligned} \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \bigg ( \sum _{i \in {\mathcal {J}}_K} \big ( {\bar{I}}^Q_{(i,j),t,t+h} \big )^2 \bigg )^q \bigg ] \bigg )^{\frac{1}{2}} \le C_Q h^q \end{aligned}$$

for \(q=2,3\) in case of assumption (A5b) for any \(h>0\) and \(t \in [0,T-h]\), a statement similar to Lemma 6.1 also holds for the process \(({\bar{Y}}_l)_{l\in \{0,\ldots ,M\}}\) which includes the approximation of the stochastic double integral, i.e., it holds

$$\begin{aligned} \sup _{m\in \{0,\ldots ,M\}}\big ( \mathrm {E}\big [\Vert {\bar{Y}}_m \Vert _{H_{\delta }}^p \big ] \big )^{\frac{1}{p}} \le C_{Q,T,\delta } \big (1 + \big (\mathrm {E} \big [ \Vert X_0\Vert _{H_{\delta }}^p \big ] \big )^{\frac{1}{p}} \big ) , \end{aligned}$$

however with the restriction \(p=2\) in case of (A5b).
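Note that the exact integrals \(I^Q_{(i,j),t,t+h}\) themselves satisfy both conditions; this follows from the moment computations displayed in the proofs of Lemma 6.1 and Theorem 2.1. The assumptions thus merely require that the algorithm under consideration does not produce approximations whose moments grow faster than those of the exact iterated stochastic integrals.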

Proof of Theorem 2.2

From the proof of Theorem 2.1 we get an estimate for \(\Big (\mathrm {E}\big [\Vert X_{t_m}-Y_m\Vert _H^2\big ]\Big )^{\frac{1}{2}}\). It remains to prove the expression for the error caused by the approximation of the iterated stochastic integrals, that is,

$$\begin{aligned} \Big ( \mathrm {E}\big [\Vert Y_m-{\bar{Y}}_m\Vert _H^2\big ] \Big )^{\frac{1}{2}} \le \Big ( \mathrm {E}\big [\Vert Y_m-Y_{m,{\bar{Y}}}\Vert _H^2\big ] \Big )^{\frac{1}{2}} + \Big ( \mathrm {E}\big [\Vert Y_{m,{\bar{Y}}}-{\bar{Y}}_m\Vert _H^2\big ] \Big )^{\frac{1}{2}} \end{aligned}$$
(40)

where

$$\begin{aligned} Y_{m,{\bar{Y}}}&= P_N\bigg ( e^{At_m}X_0+ \sum _{l=0}^{m-1} \int _{t_l}^{t_{l+1}}e^{A(t_m-t_l)}F({\bar{Y}}_l)\,\mathrm {d}s\\&\quad + \sum _{l=0}^{m-1} \int _{t_l}^{t_{l+1}}e^{A(t_m-t_l)} B({\bar{Y}}_l)\,\mathrm {d}W^K_s \\&\quad +\sum _{l=0}^{m-1}\sum _{j\in {\mathcal {J}}_K}e^{A(t_m-t_l)} \bigg (B\bigg ({\bar{Y}}_l+\sum _{i\in {\mathcal {J}}_K}P_NB({\bar{Y}}_l) {\tilde{e}}_iI^Q_{(i,j),l}\bigg ){\tilde{e}}_j-B({\bar{Y}}_l){\tilde{e}}_j\bigg )\bigg ). \end{aligned}$$

For the terms inside the two integrals, we employ first-order Taylor approximations of the difference operators as in (33), where \(\xi (Y_l,j,r) = Y_l + r \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I^Q_{(i,j),l}\) for all \(j\in {\mathcal {J}}_K\), \(l \in \{0,\ldots ,m-1\}\), and \(r \in [0,1]\); below, \({\bar{\xi }}({\bar{Y}}_l,j,r)\) is defined analogously with \({\bar{Y}}_l\) in place of \(Y_l\).

This yields

$$\begin{aligned}&\mathrm {E}\big [\Vert Y_m-Y_{m,{\bar{Y}}}\Vert _H^2\big ] \nonumber \\&\quad = \mathrm {E}\bigg [\Big \Vert P_N \Big ( \sum _{l=0}^{m-1} \int _{t_l}^{t_{l+1}}e^{A(t_m-t_l)}\big (F(Y_l)-F({\bar{Y}}_l)\big )\,\mathrm {d}s\nonumber \\&\qquad + \sum _{l=0}^{m-1} \int _{t_l}^{t_{l+1}}e^{A(t_m-t_l)}\big ( B(Y_l)-B({\bar{Y}}_l)\big )\,\mathrm {d}W^K_s \nonumber \\&\qquad +\sum _{l=0}^{m-1}\sum _{j\in {\mathcal {J}}_K} e^{A(t_m-t_l)} \Big ( B'(Y_l) \Big ( \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q \Big ) {\tilde{e}}_j \nonumber \\&\qquad - B'({\bar{Y}}_l) \Big ( \sum _{i\in {\mathcal {J}}_K} P_N B({\bar{Y}}_l) {\tilde{e}}_i I_{(i,j),l}^Q \Big ) {\tilde{e}}_j \Big ) + \sum _{l=0}^{m-1} \sum _{j \in {\mathcal {J}}_K} e^{A(t_m-t_l)} \nonumber \\&\qquad \times \Big (\int _0^1 \int _0^u B''(\xi (Y_l,j,r))\nonumber \\&\qquad \times \Big (\sum _{i\in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q, \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q \Big ) {\tilde{e}}_j \,\mathrm {d}r \, \mathrm {d}u \nonumber \\&\qquad - \int _0^1 \int _0^u B''({\bar{\xi }}({\bar{Y}}_l,j,r))\nonumber \\&\qquad \times \Big ( \sum _{i \in {\mathcal {J}}_K} P_N B({\bar{Y}}_l) {\tilde{e}}_i I_{(i,j),l}^Q, \sum _{i \in {\mathcal {J}}_K} P_N B({\bar{Y}}_l) {\tilde{e}}_i I_{(i,j),l}^Q \Big ) {\tilde{e}}_j \, \mathrm {d}r \, \mathrm {d}u \Big ) \Big ) \Big \Vert _H^2 \bigg ] \nonumber \\&\quad \le C_{Q,T,\beta ,\gamma } h \sum _{l=0}^{m-1} \mathrm {E} \Big [\big \Vert Y_l-{\bar{Y}}_l\big \Vert ^2_H\Big ] + \mathrm {E} \bigg [\Big \Vert \sum _{l=0}^{m-1}\sum _{j\in {\mathcal {J}}_K} e^{A(t_m-t_l)} \nonumber \\&\qquad \times \Big ( \int _0^1 \int _0^u B''(\xi (Y_l,j,r)) \nonumber \\&\qquad \times \Big (\sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q, \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q \Big ) {\tilde{e}}_j \, \mathrm {d}r \, \mathrm {d}u \Big ) \Big ) \nonumber \\&\qquad - \int _0^1 \int _0^u B''({\bar{\xi }}({\bar{Y}}_l,j,r))\nonumber \\&\qquad \times \Big ( \sum _{i \in {\mathcal {J}}_K} P_N B({\bar{Y}}_l) {\tilde{e}}_i I_{(i,j),l}^Q, \sum _{i \in {\mathcal {J}}_K} P_N B({\bar{Y}}_l) {\tilde{e}}_i I_{(i,j),l}^Q \Big ) {\tilde{e}}_j \, \mathrm {d}r \, \mathrm {d}u \Big ) \Big ) \Big \Vert _H^2 \bigg ] \nonumber \\&\quad \le C_{Q,T,\beta ,\gamma } h \sum _{l=0}^{m-1} \mathrm {E} \Big [ \big \Vert Y_l - {\bar{Y}}_l \big \Vert ^2_H \Big ] + C M \sum _{l=0}^{m-1} \bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E} \bigg [ \Big \Vert \int _0^1 \int _0^u \nonumber \\&\qquad B''(\xi (Y_l,j,r)) \big ( \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q, \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q \Big ) {\tilde{e}}_j \nonumber \\&\qquad - B''({\bar{\xi }}({\bar{Y}}_l,j,r)) \nonumber \\&\qquad \times \Big ( \sum _{i \in {\mathcal {J}}_K} P_N B({\bar{Y}}_l) {\tilde{e}}_i I_{(i,j),l}^Q, \sum _{i \in {\mathcal {J}}_K} P_N B({\bar{Y}}_l) {\tilde{e}}_i I_{(i,j),l}^Q \Big ) {\tilde{e}}_j \, \mathrm {d}r \, \mathrm {d}u \Big \Vert ^2_H \bigg ] \bigg )^{\frac{1}{2}} \bigg )^2 \end{aligned}$$
(41)

where in the second step the computations are the same as in [9, Section 6.3], see also (36). This estimate mainly employs the Lipschitz continuity of the involved operators.

Case 1: Assume that assumption (A5a) is fulfilled, i.e., Lemma 6.1 is valid for any \(p \ge 2\). Then, by the triangle inequality, the norm properties, and assumption (A3), the estimate (41) results in

$$\begin{aligned}&\mathrm {E}\big [\Vert Y_m-Y_{m,{\bar{Y}}}\Vert _H^2\big ] \nonumber \\&\quad \le C_{Q,T,\beta ,\gamma } h \sum _{l=0}^{m-1}\mathrm {E}\Big [\big \Vert Y_l-{\bar{Y}}_l\big \Vert ^2_H\Big ] \nonumber \\&\qquad + C M \sum _{l=0}^{m-1}\Big (\sum _{j\in {\mathcal {J}}_K} \Big (\mathrm {E}\Big [\big \Vert B''(\xi (Y_l,j)) \big \Vert ^2_{L^{(2)}(H,L(U,H))} \big \Vert \sum _{i\in {\mathcal {J}}_K} B(Y_l){\tilde{e}}_iI_{(i,j),l}^Q\big \Vert ^4_H\Big ]\Big )^{\frac{1}{2}}\Big )^2 \nonumber \\&\qquad + C M \sum _{l=0}^{m-1}\Big (\sum _{j\in {\mathcal {J}}_K} \Big ( \mathrm {E}\Big [\big \Vert B''({\bar{\xi }} ({\bar{Y}}_l,j))\big \Vert ^2_{L^{(2)}(H,L(U,H))}\big \Vert \sum _{i\in {\mathcal {J}}_K} B({\bar{Y}}_l){\tilde{e}}_i I_{(i,j),l}^Q\big \Vert ^4_H\Big ]\Big )^{\frac{1}{2}}\Big )^2 \nonumber \\&\quad \le C_{Q,T,\beta ,\gamma } h \sum _{l=0}^{m-1} \mathrm {E}\Big [\big \Vert Y_l-{\bar{Y}}_l\big \Vert ^2_H\Big ] \nonumber \\&\qquad + C M \sum _{l=0}^{m-1}\bigg (\sum _{j\in {\mathcal {J}}_K} \bigg (\mathrm {E}\bigg [ \Big ( \sum _{i_1,i_2\in {\mathcal {J}}_K} I_{(i_1,j),l}^QI_{(i_2 ,j),l}^Q \langle {\tilde{e}}_{i_1},{\tilde{e}}_{i_2}\rangle _U \Big )^2 \big \Vert B(Y_l)\big \Vert ^4_{L(U,H)}\bigg ]\bigg )^{\frac{1}{2}}\bigg )^2 \nonumber \\&\qquad + C M \sum _{l=0}^{m-1}\bigg (\sum _{j\in {\mathcal {J}}_K} \bigg (\mathrm {E}\bigg [ \Big ( \sum _{i_1,i_2\in {\mathcal {J}}_K} I_{(i_1,j),l}^Q I_{(i_2 ,j),l}^Q \langle {\tilde{e}}_{i_1},{\tilde{e}}_{i_2}\rangle _U \Big )^2 \big \Vert B({\bar{Y}}_l)\big \Vert ^4_{L(U,H)}\bigg ]\bigg )^{\frac{1}{2}}\bigg )^2 \nonumber \\&\quad \le C_{Q,T,\beta ,\gamma } h \sum _{l=0}^{m-1}\mathrm {E}\Big [\big \Vert Y_l-{\bar{Y}}_l\big \Vert ^2_H\Big ] \nonumber \\&\qquad + C M \sum _{l=0}^{m-1}\bigg (\sum _{j\in {\mathcal {J}}_K} \bigg (\mathrm {E}\bigg [\Big ( \sum _{i\in {\mathcal {J}}_K} \big (I_{(i,j),l}^Q\big )^2\Big )^2 \big \Vert B(Y_l)\big \Vert ^4_{L(U,H)}\bigg ]\bigg )^{\frac{1}{2}}\bigg )^2 \nonumber \\&\qquad + C M \sum _{l=0}^{m-1}\Big (\sum _{j\in {\mathcal {J}}_K} \bigg (\mathrm {E}\bigg [ \Big ( \sum _{i\in {\mathcal {J}}_K} \big (I_{(i,j),l}^Q\big )^2\Big )^2 \big \Vert B({\bar{Y}}_l)\big \Vert ^4_{L(U,H)}\bigg ]\bigg )^{\frac{1}{2}}\bigg )^2. \end{aligned}$$
(42)

This expression can further be simplified by the properties of \(I_{(i,j),l}^Q\) for \(l\in \{0,\ldots ,m-1\}\), \(m\in \{1,\ldots ,M\}\), \(i,j\in {\mathcal {J}}_K\), \(M,K\in {\mathbb {N}}\) and assumption (A3). Furthermore, (A5a), Lemma 6.1 and Remark 6.1 imply

$$\begin{aligned}&\mathrm {E}\big [\Vert Y_m-Y_{m,{\bar{Y}}}\Vert _H^2\big ] \\&\quad \le C_{Q,T,\beta ,\gamma } h \sum _{l=0}^{m-1}\mathrm {E}\Big [\big \Vert Y_l-{\bar{Y}}_l\big \Vert ^2_H\Big ]\\&\qquad + C_Q M \sum _{l=0}^{m-1}\Big (\Big (h^4\, \Big (\mathrm {E}\big [\Vert B(Y_l)\Vert ^4_{L(U,H)}\big ] + \mathrm {E}\big [\Vert B({\bar{Y}}_l)\Vert ^4_{L(U,H)}\big ]\Big )\Big )^{\frac{1}{2}}\Big )^2 \\&\quad \le C_{Q,T,\beta ,\gamma } h \sum _{l=0}^{m-1}\mathrm {E}\Big [\big \Vert Y_l-{\bar{Y}}_l\big \Vert ^2_H\Big ] + C_{Q,T,\beta ,\gamma } \sum _{l=0}^{m-1}h^3 \\&\quad \le C_{Q,T,\beta ,\gamma } h \sum _{l=0}^{m-1} \mathrm {E}\Big [\big \Vert Y_l-{\bar{Y}}_l\big \Vert ^2_H\Big ] + C_{Q,T,\beta ,\gamma } h^2. \end{aligned}$$

Case 2: If assumption (A5b) is fulfilled, then Lemma 6.1 is valid for \(p = 2\). By applying the triangle inequality, we get analogously to case 2 in the proof of Theorem 2.1 that

$$\begin{aligned}&\mathrm {E}\big [\Vert Y_m-Y_{m,{\bar{Y}}}\Vert _H^2\big ] \\&\quad \le C_{Q,T,\beta ,\gamma } \, h \sum _{l=0}^{m-1} \mathrm {E} \Big [ \big \Vert Y_l - {\bar{Y}}_l \big \Vert ^2_H \Big ] + C \, M \sum _{l=0}^{m-1} \bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E} \bigg [ \Big \Vert \int _0^1 \int _0^u \\&\qquad B''(\xi (Y_l,j,r)) \big ( \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q, \sum _{i \in {\mathcal {J}}_K} P_N B(Y_l) {\tilde{e}}_i I_{(i,j),l}^Q \Big ) {\tilde{e}}_j \, \mathrm {d}r \, \mathrm {d}u \Big \Vert ^2_H \bigg ] \bigg )^{\frac{1}{2}} \\&\qquad + \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E} \bigg [ \Big \Vert \int _0^1 \int _0^u B''({\bar{\xi }}({\bar{Y}}_l,j,r))\\&\qquad \times \Big ( \sum _{i \in {\mathcal {J}}_K} P_N B({\bar{Y}}_l) {\tilde{e}}_i I_{(i,j),l}^Q, \sum _{i \in {\mathcal {J}}_K} P_N B({\bar{Y}}_l) {\tilde{e}}_i I_{(i,j),l}^Q \Big ) {\tilde{e}}_j \, \mathrm {d}r \, \mathrm {d}u \Big \Vert ^2_H \bigg ] \bigg )^{\frac{1}{2}} \bigg )^2 \\&\quad \le C_{Q,T,\beta ,\gamma } \, h \sum _{l=0}^{m-1} \mathrm {E} \Big [ \big \Vert Y_l - {\bar{Y}}_l \big \Vert ^2_H \Big ] + C \, M \sum _{l=0}^{m-1}\\&\qquad \times \bigg ( \sum _{j \in {\mathcal {J}}_K}\bigg ( \mathrm {E}\bigg [ \Big ( \int _0^1 \int _0^u \big \Vert B''(\xi (Y_l,j,r)) \big ( P_N B(Y_l), P_N B(Y_l) \big ) \big \Vert _{L^{(2)}(U,L(U,H))} \\&\qquad \times \Big \Vert \sum _{i \in {\mathcal {J}}_K} {\tilde{e}}_i I_{(i,j),l}^Q \Big \Vert _U^2 \Vert {\tilde{e}}_j \Vert _U \, \mathrm {d}r \, \mathrm {d}u \Big )^2 \bigg ] \bigg )^{\frac{1}{2}} \\&\qquad + \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( \int _0^1 \int _0^u \big \Vert B''({\bar{\xi }}({\bar{Y}}_l,j,r)) \big ( P_N B({\bar{Y}}_l), P_N B({\bar{Y}}_l) \big ) \big \Vert _{L^{(2)}(U,L(U,H))} \\&\qquad \times \Big \Vert \sum _{i \in {\mathcal {J}}_K} {\tilde{e}}_i I_{(i,j),l}^Q \Big \Vert _U^2 \Vert {\tilde{e}}_j \Vert _U \, \mathrm {d}r \, \mathrm {d}u \Big )^2 \bigg ] \bigg )^{\frac{1}{2}} \bigg )^{2} \\&\quad \le C_{Q,T,\beta ,\gamma } \, h \sum _{l=0}^{m-1} \mathrm {E} \Big [ \big \Vert Y_l - {\bar{Y}}_l \big \Vert ^2_H \Big ] \\&\qquad + C \, M \sum _{l=0}^{m-1}\bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( \int _0^1 \int _0^u \big (1 + \big \Vert \xi (Y_l,j,r) \big \Vert _H + \big \Vert Y_l\big \Vert _H \big ) \\&\qquad \times \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \, \mathrm {d}r \, \mathrm {d}u \Big )^2 \bigg ] \bigg )^{\frac{1}{2}} \\&\qquad + \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( \int _0^1 \int _0^u \big (1 + \big \Vert {\bar{\xi }}({\bar{Y}}_l,j,r) \big \Vert _H + \big \Vert {\bar{Y}}_l \big \Vert _H \big ) \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \, \mathrm {d}r \, \mathrm {d}u \Big )^2 \bigg ] \bigg )^{\frac{1}{2}} \bigg )^{2} . \end{aligned}$$

Making use of the distributional characteristics of \(I_{(i,j),l}^Q\), we get

$$\begin{aligned}&\mathrm {E}\big [\Vert Y_m-Y_{m,{\bar{Y}}}\Vert _H^2\big ] \nonumber \\&\quad \le C_{Q,T,\beta ,\gamma } \, h \sum _{l=0}^{m-1} \mathrm {E} \Big [ \big \Vert Y_l - {\bar{Y}}_l \big \Vert ^2_H \Big ] \nonumber \\&\qquad + C \, M \sum _{l=0}^{m-1}\bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( \int _0^1 \int _0^u \Big (1 + 2 \big \Vert Y_l \big \Vert _H + r \Big \Vert P_N B(Y_l) \sum _{i \in {\mathcal {J}}_K} {\tilde{e}}_i I_{(i,j),l}^Q \Big \Vert _H \Big ) \nonumber \\&\qquad \times \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \, \mathrm {d}r \, \mathrm {d}u \Big )^2 \bigg ] \bigg )^{\frac{1}{2}} \nonumber \\&\qquad + \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( \int _0^1 \int _0^u \Big (1 + 2 \big \Vert {\bar{Y}}_l \big \Vert _H + r \Big \Vert P_N B({\bar{Y}}_l) \sum _{i \in {\mathcal {J}}_K} {\tilde{e}}_i I_{(i,j),l}^Q \Big \Vert _H \Big ) \nonumber \\&\qquad \times \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \, \mathrm {d}r \, \mathrm {d}u \Big )^2 \bigg ] \bigg )^{\frac{1}{2}} \bigg )^{2} \nonumber \\&\quad \le C_{Q,T,\beta ,\gamma } \, h \sum _{l=0}^{m-1} \mathrm {E} \Big [ \big \Vert Y_l - {\bar{Y}}_l \big \Vert ^2_H \Big ] + C_{Q,T,\delta } \, M \nonumber \\&\qquad \times \sum _{l=0}^{m-1}\bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \bigg ( \mathrm {E}\bigg [ \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \Big )^2 \bigg ] \nonumber \\&\qquad + \mathrm {E}\bigg [ \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \Big )^3 \Big ) \bigg ] \bigg ) \big ( 1 + \mathrm {E}\big [ \big \Vert Y_l \big \Vert _H^2 \big ] \big ) \bigg )^{\frac{1}{2}} \nonumber \\&\qquad + \sum _{j \in {\mathcal {J}}_K} \bigg ( \bigg ( \mathrm {E}\bigg [ \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \Big )^2 \bigg ] + \mathrm {E}\bigg [ \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \Big )^3 \Big ) \bigg ] \bigg )\nonumber \\&\qquad \times \big ( 1 + \mathrm {E}\big [ \big \Vert {\bar{Y}}_l \big \Vert _H^2 \big ] \big ) \bigg )^{\frac{1}{2}} \bigg )^{2} \nonumber \\&\quad \le C_{Q,T,\beta ,\gamma } \, h \sum _{l=0}^{m-1} \mathrm {E} \Big [ \big \Vert Y_l - {\bar{Y}}_l \big \Vert ^2_H \Big ] \nonumber \\&\qquad + C_{Q,T,\delta } \, M \sum _{l=0}^{m-1}\bigg ( {\text {tr}}Q \bigg ( ({\text {tr}}Q)^2 h^4 + ({\text {tr}}Q)^3 h^6 \bigg )^{\frac{1}{2}} \bigg )^{2} \nonumber \\&\qquad \big ( 1 + \mathrm {E}\big [ \big \Vert Y_l \big \Vert _{H_{\delta }}^2 \big ] + \mathrm {E}\big [ \big \Vert {\bar{Y}}_l \big \Vert _{H_{\delta }}^2 \big ] \big ) \nonumber \\&\quad \le C_{Q,T,\beta ,\gamma } \, h \sum _{l=0}^{m-1} \mathrm {E} \Big [ \big \Vert Y_l - {\bar{Y}}_l \big \Vert ^2_H \Big ] \nonumber \\&\qquad + C_{Q,T,\delta } \, ({\text {tr}}Q)^4 h \sum _{l=0}^{m-1}\big ( h^2 + {\text {tr}}Q \, h^4 \big ) \big ( 1 + \mathrm {E}\big [ \big \Vert X_0 \big \Vert _{H_{\delta }}^2 \big ] \big ) \nonumber \\&\quad \le C_{Q,T,\beta ,\gamma } \, h \sum _{l=0}^{m-1} \mathrm {E} \Big [ \big \Vert Y_l - {\bar{Y}}_l \big \Vert ^2_H \Big ] + C_{Q,T,\delta } h^2 . \end{aligned}$$
(43)

Summarizing, we have \(\mathrm {E}\big [\Vert Y_m-Y_{m,{\bar{Y}}}\Vert _H^2\big ] \le C_{Q,T,\beta ,\gamma } \, h \sum _{l=0}^{m-1} \mathrm {E} \Big [ \big \Vert Y_l - {\bar{Y}}_l\big \Vert ^2_H \Big ] + C_{Q,T,\delta } h^2\) for both cases.

Finally, we analyze the second term in (40). We employ essentially the same techniques as for the previous term. First, we replace the difference operators by a first-order Taylor expansion.

$$\begin{aligned}&\mathrm {E}\big [\Vert Y_{m,{\bar{Y}}}-{\bar{Y}}_m\Vert _H^2\big ] \\&\quad = \mathrm {E}\bigg [\Big \Vert P_N \sum _{l=0}^{m-1}\sum _{j\in {\mathcal {J}}_K}e^{A(t_m-t_l)} \Big (\Big (B\Big ({\bar{Y}}_l+\sum _{i\in {\mathcal {J}}_K}P_NB({\bar{Y}}_l) {\tilde{e}}_iI^Q_{(i,j),l}\Big ){\tilde{e}}_j-B({\bar{Y}}_l){\tilde{e}}_j\Big )\\&\qquad - \Big (B\Big ({\bar{Y}}_l+\sum _{i\in {\mathcal {J}}_K}P_NB({\bar{Y}}_l) {\tilde{e}}_i{\bar{I}}^Q_{(i,j),l}\Big ){\tilde{e}}_j -B({\bar{Y}}_l){\tilde{e}}_j\Big )\Big )\Big \Vert _H^2\bigg ]\\&\quad = \mathrm {E} \bigg [ \Big \Vert P_N \Big ( \sum _{l=0}^{m-1} \sum _{j\in {\mathcal {J}}_K} e^{A(t_m-t_l)} \Big ( B'({\bar{Y}}_l) \Big ( \sum _{i\in {\mathcal {J}}_K} P_N B({\bar{Y}}_l) {\tilde{e}}_i I_{(i,j),l}^Q \Big ) {\tilde{e}}_j \\&\qquad + \int _0^1 \int _0^u B''({\bar{\xi }}({\bar{Y}}_l,j,r)) \Big (\sum _{i\in {\mathcal {J}}_K} P_NB({\bar{Y}}_l) {\tilde{e}}_iI_{(i,j),l}^Q,\sum _{i\in {\mathcal {J}}_K} P_N B({\bar{Y}}_l){\tilde{e}}_iI_{(i,j),l}^Q \Big ){\tilde{e}}_j \,\mathrm {d}r\,\mathrm {d}u\\&\qquad - B'({\bar{Y}}_l) \Big ( \sum _{i\in {\mathcal {J}}_K} P_N B({\bar{Y}}_l) {\tilde{e}}_i {\bar{I}}_{(i,j),l}^Q \Big ) {\tilde{e}}_j \\&\qquad - \int _0^1 \int _0^u B''(\bar{{\bar{\xi }}}({\bar{Y}}_l,j,r)) \\&\qquad \times \Big (\sum _{i\in {\mathcal {J}}_K} P_N B({\bar{Y}}_l){\tilde{e}}_i{\bar{I}}_{(i,j),l}^Q,\sum _{i\in {\mathcal {J}}_K} P_N B({\bar{Y}}_l ) {\tilde{e}}_i {\bar{I}}_{(i,j),l}^Q\Big ){\tilde{e}}_j\,\mathrm {d}r \,\mathrm {d}u \Big ) \Big ) \Big \Vert _H^2 \bigg ]. \end{aligned}$$

As above, we obtain for the terms involving the second derivative

$$\begin{aligned}&\mathrm {E}\big [\Vert Y_{m,{\bar{Y}}}-{\bar{Y}}_m\Vert _H^2\big ] \nonumber \\&\quad \le C \mathrm {E}\bigg [\Big \Vert \sum _{l=0}^{m-1}e^{A(t_m-t_l)} \Big (\int _{t_l}^{t_{l+1}}B'({\bar{Y}}_l) \Big (\int _{t_l}^s P_NB({\bar{Y}}_l)\, \mathrm {d}W_r^K\Big )\, \mathrm {d}W^K_s \nonumber \\&\qquad - \sum _{i,j\in {\mathcal {J}}_K} {\bar{I}}_{(i,j),l}^Q B'({\bar{Y}}_l) (P_N B({\bar{Y}}_l) {\tilde{e}}_i ) {\tilde{e}}_j \Big ) \Big \Vert _H^2\bigg ] \nonumber \\&\qquad + C \mathrm {E} \bigg [ \Big \Vert \sum _{l=0}^{m-1} e^{A(t_m-t_l)} \sum _{j\in {\mathcal {J}}_K} \nonumber \\&\qquad \times \Big ( \int _0^1\int _0^u B''({\bar{\xi }}({\bar{Y}}_l,j,r))\nonumber \\&\qquad \times \Big (\sum _{i\in {\mathcal {J}}_K} P_NB({\bar{Y}}_l) {\tilde{e}}_iI_{(i,j),l}^Q,\sum _{i\in {\mathcal {J}}_K} P_NB({\bar{Y}}_l){\tilde{e}}_iI_{(i,j),l}^Q\Big ){\tilde{e}}_j \,\mathrm {d}r\,\mathrm {d}u \nonumber \\&\qquad - \int _0^1\int _0^u B''(\bar{{\bar{\xi }}}({\bar{Y}}_l,j,r))\nonumber \\&\qquad \times \Big (\sum _{i\in {\mathcal {J}}_K} P_NB({\bar{Y}}_l) {\tilde{e}}_i{\bar{I}}_{(i,j),l}^Q, \sum _{i\in {\mathcal {J}}_K} P_NB({\bar{Y}}_l){\tilde{e}}_i {\bar{I}}_{(i,j),l}^Q\Big ){\tilde{e}}_j\,\mathrm {d}r\,\mathrm {d}u \Big ) \Big \Vert _H^2\bigg ] \nonumber \\&\quad \le C \sum _{l=0}^{m-1}\mathrm {E}\bigg [\Big \Vert \int _{t_l}^{t_{l+1}} B'({\bar{Y}}_l) \Big ( \int _{t_l}^s P_N B({\bar{Y}}_l) \, \mathrm {d}W_r^K\Big )\, \mathrm {d}W^K_s \nonumber \\&\qquad - \sum _{i,j\in {\mathcal {J}}_K} {\bar{I}}_{(i,j),l}^Q B'({\bar{Y}}_l) (P_N B({\bar{Y}}_l) {\tilde{e}}_i ) {\tilde{e}}_j \Big \Vert _H^2 \bigg ] \nonumber \\&\qquad + C \bigg ( \sum _{l=0}^{m-1} \Big ( \mathrm {E} \bigg [ \Big \Vert e^{A(t_m-t_l)} \sum _{j\in {\mathcal {J}}_K} \int _0^1 \int _0^u B''({\bar{\xi }}({\bar{Y}}_l,j,r))\nonumber \\&\qquad \times \Big (\sum _{i\in {\mathcal {J}}_K} P_NB({\bar{Y}}_l) {\tilde{e}}_iI_{(i,j),l}^Q,\sum _{i\in {\mathcal {J}}_K} P_NB({\bar{Y}}_l){\tilde{e}}_iI_{(i,j),l}^Q\Big ){\tilde{e}}_j \,\mathrm {d}r\,\mathrm {d}u \Big \Vert _H^2\bigg ] \Big )^{\frac{1}{2}} \nonumber \\&\qquad +\sum _{l=0}^{m-1} \bigg ( \mathrm {E} \bigg [ \bigg \Vert e^{A(t_m-t_l)} \sum _{j\in {\mathcal {J}}_K} \int _0^1 \int _0^u B''(\bar{{\bar{\xi }}}({\bar{Y}}_l,j,r))\nonumber \\&\qquad \times \bigg (\sum _{i\in {\mathcal {J}}_K} P_N B({\bar{Y}}_l) {\tilde{e}}_i{\bar{I}}_{(i,j),l}^Q, \sum _{i\in {\mathcal {J}}_K} P_NB({\bar{Y}}_l){\tilde{e}}_i {\bar{I}}_{(i,j),l}^Q \bigg ) {\tilde{e}}_j \, \mathrm {d}r \, \mathrm {d}u \bigg \Vert _H^2\bigg ] \bigg )^{\frac{1}{2}} \bigg )^2 . \end{aligned}$$
(44)

The first term is the error that results from the approximation of the iterated stochastic integrals. The corresponding error estimate depends on the algorithm that is chosen for this approximation. Assumption (10) states that

$$\begin{aligned}&\bigg ( \mathrm {E}\bigg [\bigg \Vert \int _{t_l}^{t_{l+1}}B'({\bar{Y}}_l) \bigg (\int _{t_l}^s P_N B({\bar{Y}}_l) \, \mathrm {d}W_r^K\bigg ) \, \mathrm {d}W^K_s \\&\quad - \sum _{i,j\in {\mathcal {J}}_K} {\bar{I}}_{(i,j),l}^Q B'({\bar{Y}}_l) (P_N B({\bar{Y}}_l) {\tilde{e}}_i ) {\tilde{e}}_j \bigg \Vert _H^2\bigg ] \bigg )^{\frac{1}{2}} \le {\mathcal {E}}(M,K) \end{aligned}$$

for all \(l\in \{0,\ldots ,m-1\}\), \(m\in \{1,\ldots ,M\}\), \(h>0\) and \(M,K\in {\mathbb {N}}\).
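As an illustration (with constants and rates that depend on the concrete algorithm; see [21] for the precise statements), an approximation based on a \(D\)-fold subdivision of each time step, or on a truncated series with \(D\) terms, typically satisfies \({\mathcal {E}}(M,K) \le C_Q \, h \, D^{-\frac{1}{2}}\). The term \(M {\mathcal {E}}(M,K)^2\) appearing below then behaves like \(T h / D\), which shows why \(D\) has to grow with \(M\) in the cost model discussed in the previous section.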

Case 1: Assume that assumption (A5a) is fulfilled, i.e., Lemma 6.1 is valid for any \(p \ge 2\). Analogously to the calculations in (42), we get for (44)

$$\begin{aligned}&\mathrm {E}\big [\Vert Y_{m,{\bar{Y}}}-{\bar{Y}}_m\Vert _H^2\big ] \le C \sum _{l=0}^{m-1} {\mathcal {E}}(M,K)^2\\&\quad + C M \sum _{l=0}^{m-1}\bigg (\sum _{j\in {\mathcal {J}}_K} \bigg (\mathrm {E}\bigg [\bigg ( \sum _{i\in {\mathcal {J}}_K} \big (I_{(i,j),l}^Q\big )^2\bigg )^2 \big \Vert B({\bar{Y}}_l)\big \Vert ^4_{L(U,H)}\bigg ]\bigg )^{\frac{1}{2}}\bigg )^2 \\&\quad + C M \sum _{l=0}^{m-1}\bigg (\sum _{j\in {\mathcal {J}}_K} \bigg (\mathrm {E}\bigg [\bigg ( \sum _{i\in {\mathcal {J}}_K} \big ({\bar{I}}_{(i,j),l}^Q\big )^2\bigg )^2 \big \Vert B({\bar{Y}}_l)\big \Vert ^4_{L(U,H)}\bigg ]\bigg )^{\frac{1}{2}}\bigg )^2. \end{aligned}$$

By assumption (11) and the properties of \(I_{(i,j),l}^Q\) as well as Remark 6.1, we obtain

$$\begin{aligned}&\mathrm {E}\big [\Vert Y_{m,{\bar{Y}}}-{\bar{Y}}_m\Vert _H^2\big ] \le C \sum _{l=0}^{m-1} {\mathcal {E}}(M,K)^2 \\&\qquad +C M \sum _{l=0}^{m-1}\bigg (\sum _{j\in {\mathcal {J}}_K} \bigg (\mathrm {E}\bigg [ \bigg ( \sum _{i\in {\mathcal {J}}_K} \big (I_{(i,j),l}^Q\big )^2\bigg )^2 \bigg ] \mathrm {E}\Big [\big \Vert B({\bar{Y}}_l)\big \Vert ^4_{L(U,H)}\Big ]\bigg )^{\frac{1}{2}}\bigg )^2 \\&\qquad + C M \sum _{l=0}^{m-1} \bigg ( \sum _{j\in {\mathcal {J}}_K} \bigg ( \mathrm {E} \bigg [ \bigg ( \sum _{i\in {\mathcal {J}}_K} \big ({\bar{I}}_{(i,j),l}^Q\big )^2\bigg )^2 \bigg ] \mathrm {E}\Big [\big \Vert B({\bar{Y}}_l)\big \Vert ^4_{L(U,H)}\Big ]\bigg )^{\frac{1}{2}}\bigg )^2\\&\quad \le C \sum _{l=0}^{m-1} {\mathcal {E}}(M,K)^2 + C_Q M \sum _{l=0}^{m-1}\Big (h^2\Big ( \mathrm {E}\big [\Vert B({\bar{Y}}_l)\Vert ^4_{L(U,H)}\big ] \Big )^{\frac{1}{2}}\Big )^2 \nonumber \\&\quad \le C M {\mathcal {E}}(M,K)^2 + C_{Q,T,\delta } h^2, \end{aligned}$$

which completes the proof for case 1.

Case 2: If assumption (A5b) is fulfilled, then Lemma 6.1 is valid for \(p = 2\). Analogously to the computations in (43), we get with assumption (12) that

$$\begin{aligned}&\mathrm {E}\big [\Vert Y_{m,{\bar{Y}}}-{\bar{Y}}_m\Vert _H^2\big ] \le C \sum _{l=0}^{m-1} {\mathcal {E}}(M,K)^2\\&\qquad + C \, M \sum _{l=0}^{m-1}\bigg ( \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \bigg ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \bigg )^2 \bigg ] + \mathrm {E}\bigg [ \bigg ( \sum _{i \in {\mathcal {J}}_K} \big ( I_{(i,j),l}^Q \big )^2 \bigg )^3 \bigg ] \bigg )^{\frac{1}{2}} \\&\qquad + \sum _{j \in {\mathcal {J}}_K} \bigg ( \mathrm {E}\bigg [ \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( {\bar{I}}_{(i,j),l}^Q \big )^2 \Big )^2 \bigg ] \\&\qquad + \mathrm {E}\bigg [ \Big ( \sum _{i \in {\mathcal {J}}_K} \big ( {\bar{I}}_{(i,j),l}^Q \big )^2 \Big )^3 \bigg ] \bigg )^{\frac{1}{2}} \bigg )^{2} \Big ( 1 + \mathrm {E}\big [ \big \Vert {\bar{Y}}_l \big \Vert _H^2 \big ] \Big ) \\&\quad \le C \sum _{l=0}^{m-1} {\mathcal {E}}(M,K)^2 + C_{T} \, ({\text {tr}}Q)^4 h \sum _{l=0}^{m-1}\big ( h^2 + {\text {tr}}Q \, h^4 \big ) \big ( 1 + \mathrm {E}\big [ \big \Vert X_0 \big \Vert _{H_{\delta }}^2 \big ] \big ) \\&\quad \le C M {\mathcal {E}}(M,K)^2 + C_{Q,T,\delta } h^2. \end{aligned}$$

This proves the statement for case 2. \(\square \)