1 Introduction

We consider weak convergence analysis of tamed schemes for semi-linear stochastic differential equations (SDEs) of the form

$$\begin{aligned} {\textrm{d}}X_t= \left( A X_t+F(X_t) \right) {{\textrm{d}}}{t}+ \sum _{i=1} ^m \left( B_i X_t+g_i(X_t)\right) {{{\textrm{d}}}W}_t^i, \quad X_0=x \in \mathbb {R} ^d, \end{aligned}$$
(1.1)

with m, \(d\in \mathbb {N}\). Here \(W_t^i\) are iid Brownian motions on the probability space \((\Omega , \mathcal {F},\mathbb {P})\) with filtration \(({\mathcal {F} _ {t} }) _{t \in [0,T]}\). The nonlinearity \(F: \mathbb {R} ^d \rightarrow \mathbb {R} ^d\) only satisfies a one-sided Lipschitz condition, whereas the \(g_i: \mathbb {R} ^d \rightarrow \mathbb {R} ^d\) are globally Lipschitz.

We assume throughout that the matrices A, \(B_i \in \mathbb {R} ^{d \times d }\) satisfy the following zero commutator conditions

$$\begin{aligned} AB_i-B_iA=0, \quad B_jB_i-B_iB_j=0 \qquad \text {for} \quad i,j=1, \ldots , m. \end{aligned}$$
(1.2)

Condition (1.2) is used to exploit the exact solution of Geometric Brownian Motion (GBM) below. By introducing the notation \(W_t:=[W^1_t,\ldots ,W^m_t]^T\),

$$\begin{aligned} \mu (y):= A y+F(y), \quad \sigma _i(y):=B_i y+ g_i(y), \end{aligned}$$
(1.3)

and the matrix \(\sigma (x):=[\sigma _i(x)]\) (i.e. with columns formed by the \(\sigma _i\)), we can re-write (1.1) as

$$\begin{aligned} {\textrm{d}}X_t= \mu (X_t) {{\textrm{d}}}{t}+ \sigma (X_t) {{{\textrm{d}}}W}_t, \quad X_0=x \in \mathbb {R} ^d. \end{aligned}$$
(1.4)

The schemes we examine are in the class of exponential integrators. These methods have proved to be effective schemes for many SDEs and stochastic partial differential equations (SPDEs). Originally only linear (or linearized) drift terms were exploited, see for example [3, 18, 19, 24, 27]; more recently, however, several exponential integrators that also exploit the linear terms in the diffusion have emerged [10, 11, 17, 32]. We are particularly interested in dealing with a one-sided Lipschitz drift term F with superlinear growth. Hutzenthaler et al. [14] showed both strong and weak divergence of Euler’s method for the general SDE (1.1) with superlinearly growing coefficients F and/or \(g_i\). Following [14], many explicit variants of the Euler–Maruyama scheme that guarantee strong convergence to the exact solution of the SDE were derived, see for example [2, 15, 16, 22, 26, 30]. The best-known approach to deal with superlinearly growing coefficients is to employ “taming” to prevent the unbounded growth of numerical solutions. Although there has been much consideration of strong convergence of tamed schemes, see for example [15, 28, 29], there has been comparatively little consideration of weak convergence for SDEs [4, 5, 31, 33]. In [5], Bréhier investigated the weak error of the explicit tamed Euler scheme for SDEs with one-sided Lipschitz continuous drift and additive noise, to approximate averages with respect to the invariant distribution of the continuous-time process. Bossy et al. [4] proposed a semi-explicit exponential Euler scheme for a one-dimensional SDE with non-globally Lipschitz drift and diffusion behaving as \(x^{\alpha }\), with \(\alpha >1\), and proved its weak convergence. Due to the weak condition on the diffusion coefficient, their study covers regularity results for the solution of the Kolmogorov PDE commonly used in weak error analysis. In [31], Wang et al. formulated a general weak convergence theorem for one-step numerical approximations of SDEs with non-globally Lipschitz drift coefficients. They applied this to prove weak convergence of rate one for the tamed and backward Euler–Maruyama methods. We would like to point out that their analysis is not directly applicable to our GBM based schemes, as ours is not a classical one-step method but rather the composition of the GBM flow and a one-step flow. In the context of SPDEs, Cai et al. [7] constructed a numerical scheme based on a spectral Galerkin method in space and a tamed version of the exponential Euler method, and analysed its weak convergence. Below we impose conditions that are similar to those for the SPDE in [7]. We prove weak convergence for a class of exponential integrators where a form of taming is used for the one-sided Lipschitz drift term. The GBM methods exploit the exact solution of geometric Brownian motion, see [11, 12] where strong convergence of related methods was considered. Further, by taking \(A=B_i=0\) (or incorporating these terms into the nonlinearities) we simultaneously prove weak convergence for the standard exponential tamed scheme such as in [9]. Our proof is based on the Kolmogorov equation and one of the main difficulties is to take into account the stochasticity in the solution operator.

In our numerical experiments we compare different approaches to estimating the weak errors, all using multi-level Monte Carlo (MLMC) techniques as reviewed in [13]. For a linear diffusion term we observe that the exponential tamed method does not perform well for larger time step sizes and hence a time step size restriction is required for MLMC techniques (for example to estimate the weak errors). This is of particular interest as tamed methods were originally introduced, and their strong convergence examined, precisely to control nonlinearities in the context of MLMC-type simulations, see [15]. The GBM based method does not suffer in this way in our experiments for linear noise. For nonlinear diffusion both tamed methods require a step size restriction for convergence of the MLMC techniques.

The paper is organized as follows: in Sect. 2 we state our assumptions on the drift and diffusion, present the new numerical method and state our main results. In Sect. 3 we present numerical simulations illustrating the rate of convergence using the MLMC simulations and compare the different approaches. The proofs of the main results are then given in detail in Sects. 4 and 5.

2 Setting and main results

Throughout the paper we let \({ \big \langle } \cdot , \cdot { \big \rangle }\) denote the standard inner product in \(\mathbb {R}^d\) (so \({ \big \langle } y , z { \big \rangle }=y ^\intercal z\) for \(y,z \in \mathbb {R}^d\) ) and \(\left\Vert \cdot \right\Vert \) represent both the Euclidean norm for vectors as well as the induced matrix norm. A vector \(\beta =(\beta _1,\beta _2,\ldots ,\beta _d)\) with nonnegative integer components is a multiindex of order \(\vert \beta \vert =\sum _{i=1}^d \beta _i\). The partial derivative operator corresponding to the multiindex \(\beta \) is defined as

$$\begin{aligned} D^\beta h(x)=\frac{\partial ^{\vert \beta \vert } h(x)}{\partial _{x_1}^{\beta _1} \partial _{x_2}^{\beta _2}\ldots \partial _{x_d}^{\beta _d} } \end{aligned}$$

where \(h\in C^{\vert \beta \vert }(\mathbb {R}^d;\mathbb {R}^l)\). For a nonnegative integer j, we let \(\textbf{D}^jh(x)\) represent the jth order derivative operator applied to a function \(h\in C^j(\mathbb {R}^d;\mathbb {R}^l)\). When \(j=1\) we simply write the Jacobian as \(\textbf{D}h\).

Additionally, \(C^{k} _{b} (\mathbb {R}^d;\mathbb {R})\) denotes the set of k-times differentiable functions which are uniformly continuous and bounded together with their derivatives up to k-th order.

We define the sets

$$\begin{aligned} \mathbb {N}_{n}:=\{0,1,2,\ldots ,n\} \qquad \text {and} \qquad \mathbb {N}_{n}^+:=\{1,2,\ldots ,n\}. \end{aligned}$$

Before introducing our class of numerical methods we present three results from [8] on the existence and uniqueness, bounded moments and mean-square differentiability of the exact solution to (1.4).

2.1 Preliminary results for the SDE

For an SDE such as (1.4) with globally Lipschitz drift and diffusion coefficients many classical textbooks on stochastic analysis consider the Kolmogorov PDE. However, for non-globally Lipschitz drift coefficients there are far fewer results. One key work is Cerrai [8], which establishes the properties of the exact solution to an SDE with a one-sided Lipschitz drift coefficient.

Assumption 1

Let \(H\ge 0\) be given. Let the functions F and \(g_i \in C^h(\mathbb {R}^d;\mathbb {R}^d)\) for some \(h\ge H\) where \(i=1,2, \ldots , m\). Define the matrix g by the columns of \(g_i\) so that \(g=[g_i]\). Additionally, assume that

  1. (i)

    there exists \(r \ge 0\) such that for any \(j=0,1,\ldots ,h\)

    \(\sup _{y \in \mathbb {R}^d} \left\Vert D^\alpha F(y)\right\Vert (1+\left\Vert y\right\Vert ^{2r+1-j})^{-1} < \infty , \qquad \vert \alpha \vert =j\);

  2. (ii)

    there exists \(\rho \le r\) such that for any \(j=0,1,\ldots ,h\)

    \(\sup _{y \in \mathbb {R}^d} \left\Vert D^\alpha g_i(y)\right\Vert (1+\left\Vert y\right\Vert ^{\rho -j})^{-1} < \infty ,\qquad \vert \alpha \vert =j \);

  3. (iii)

    for all \(p>0\) there exists \(K=K(p) \in \mathbb {R}\) such that

    \( { \big \langle } y , \textbf{D} F(z) y { \big \rangle }+ p \left\Vert \textbf{D}g (z)y \right\Vert ^2 \le K \left\Vert y\right\Vert ^2, \quad \forall \ y,z \in \mathbb {R}^d.\)

Assumption 2

There exist constants \(a>0\) and \(r,\gamma ,c \ge 0\) such that for any \(y,z \in \mathbb {R}^d\)

$$\begin{aligned} { \big \langle } Az , z { \big \rangle } + { \big \langle } F(y+z)-F(y) , z { \big \rangle } \le -a \left\Vert z\right\Vert ^{2r+2} + c (\left\Vert y\right\Vert ^{\gamma } +1). \end{aligned}$$

In particular, under Assumption 1, Cerrai [8] proves the existence and uniqueness of a solution to the SDE (1.1).

Theorem 2.1

([8], Theorem 1.3.5) Suppose that Assumption 1 holds with \(H=3\). Then there exists a unique solution \(X_t\) for \(t \in [0,T]\) to the SDE (1.1) along with the following moment bound for \(p \ge 1\) and constant \(C=C(p,T)>0\)

$$\begin{aligned} \mathbb {E}\left[ \left\Vert X_t\right\Vert ^p \right] <C ( 1+ \left\Vert x\right\Vert ^p ). \end{aligned}$$
(2.1)

To get order one weak convergence we need to assume bounded moments of derivatives of the exact solution to the SDE (1.1) with respect to the initial condition. By the notation \(X_t ^x\), we emphasize that the initial condition is \(X_0=x\). We denote the derivative of the exact solution with respect to the initial condition by \(\textbf{D}_{x}X_t ^x\). The following regularity result is given in [8] (see also [31, 33]).

Theorem 2.2

[8, Theorem 1.3.6] Let Assumption 1 hold with \(H=3\), Assumption 2 hold and \(X_t^x\) be the solution to (1.1). Then \(X_t^x\) is h times mean-square differentiable and for \(i=1,\ldots , h\), \( p\ge 1\) and \(t \in [0,T]\)

$$\begin{aligned} \sup _{x \in \mathbb {R} ^d} \mathbb {E}\left[ \left\Vert \textbf{D}_x ^i X_t ^x \right\Vert ^p \right] < \infty . \end{aligned}$$

Assumption 3

Let the test function \(\phi : \mathbb {R}^d \rightarrow \mathbb {R}\) satisfy \(\phi \in C^{2} _{b}(\mathbb {R}^d)\).

Before continuing we define the quantity

$$\begin{aligned} \Psi (t,x):=\mathbb {E}\left[ \phi (X_t) \vert X_0=x \right] =\mathbb {E}\left[ \phi (X_t^x) \right] . \end{aligned}$$
(2.2)

Theorem 2.3

[8, Theorem 1.6.2] Let Assumption 1 with \(H=3\), Assumption 2 and Assumption 3 hold. Let \(X_t\) be the solution to (1.1). Then, \(\Psi (t,x)\) defined in (2.2) is the unique classical solution to the Kolmogorov PDE

$$\begin{aligned} \frac{\partial }{\partial t} \Psi (t,x)=\mathcal {L}\Psi (t,x) \end{aligned}$$
(2.3)

where, with \(\mu \) and \(\sigma _i\) defined in (1.3), \(\mathcal {L}\) is given by

$$\begin{aligned} \mathcal {L}\Psi (t, x):=\textbf{D} \Psi (t,x) \mu (x) +\frac{1}{2} \sum _{i=1}^ m \sigma _i(x)^\intercal \textbf{D}^2 \Psi (t,x) \sigma _i(x). \end{aligned}$$

Remark 2.1

Rather than imposing an extra hypothesis (Hypothesis 1.3) as in [8] on the diffusion coefficient, we assume in Assumption 3 that \(\phi \) is in \(C^2 _b (\mathbb {R}^d)\). Together with the mean-square differentiability of \(X_t^x\) given in Theorem 2.2, \(\Psi (t,x)\) then satisfies the smoothness and boundedness conditions required in the proof of Theorem 1.6.2 in [8].

2.2 Tamed GBM method and convergence results

Our class of exponential methods takes advantage of the linear terms in (1.1) by exploiting the stochastic operator

$$\begin{aligned} \mathbf {\Phi } _{t,t_0} =\exp \left( ( A-\frac{1}{2} \sum _{i=1} ^m B_i ^2)(t-t_0) + \sum _{i=1}^mB_i(W_t^i -W_{t_0}^i) \right) \end{aligned}$$
(2.4)

which, under the commutativity condition (1.2), is the solution to

$$\begin{aligned} {\textrm{d}}\mathbf {\Phi } _{t, t_0} = A \mathbf {\Phi } _{t,t_0} {{\textrm{d}}}{t}+ \sum _{i=1} ^m \ B_i \mathbf {\Phi } _{t,t_0} {{{\textrm{d}}}W}_t^i, \qquad \mathbf {\Phi } _{t_0, t_0} =I_d. \end{aligned}$$
(2.5)

Given \(N\in \mathbb {N}\) and final time T we set the time step size \(\Delta t=\frac{T}{N}\). This gives the uniform time partition \(0=t_0<t_1<t_2<\cdots <t_N=T\) with \(t_n=n\Delta t\). We denote increments \(\Delta W^i_n:=W^i_{t_{n+1}} -W^i_{t_n}\).
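The flow map (2.4) can be evaluated directly once the Brownian increments over a step are available. The following sketch is our own illustration (the function name gbm_flow and the use of SciPy's matrix exponential are our choices, not the paper's) and assumes the commutator condition (1.2):

```python
# Illustrative sketch (not from the paper): evaluate the GBM flow (2.4) over one
# step of size dt, given the increments dW[i] = W^i_{t_{n+1}} - W^i_{t_n}.
# The matrices A and B_i are assumed to satisfy the commutator condition (1.2).
import numpy as np
from scipy.linalg import expm

def gbm_flow(A, B_list, dt, dW):
    """Return Phi_{t_{n+1},t_n} = exp((A - 0.5*sum_i B_i^2)*dt + sum_i B_i*dW_i)."""
    M = (A - 0.5 * sum(B @ B for B in B_list)) * dt
    for B, dWi in zip(B_list, dW):
        M = M + B * dWi
    return expm(M)
```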

We propose and prove weak convergence of the tamed GBM method

$$\begin{aligned} Y_{n+1}^N=\mathbf {\Phi } _{t_{n+1},t_n} \left( Y_n^N+ \! \left( F^{tm} _{\Delta t}(Y_n^N)-\sum _{i=1}^mB_ig_i(Y_n^N)\right) \Delta t+ \! \sum _{i=1}^m g_i(Y_n^N) \Delta W^i_n\right) \nonumber \\ \end{aligned}$$
(2.6)

where \(F^{tm} _{\Delta t}\) is the taming term given by

$$\begin{aligned} F^{tm} _{\Delta t}(y):= \alpha (\Delta t,y)F(y). \end{aligned}$$
(2.7)

The taming function \(\alpha (\Delta t,y)\) is assumed to satisfy for all \(y\in \mathbb {R}^d\) and \(\Delta t>0\)

$$\begin{aligned} \left\Vert \alpha (\Delta t,y)F(y)\right\Vert \Delta t\le 1, \quad 0\le \alpha (\Delta t,y) \le 1,\quad \vert \alpha (\Delta t,y)-1 \vert \le C\Delta t, \end{aligned}$$
(2.8)

where \(C>0\) is a constant independent of \(\Delta t\). The typical form of taming (e.g. [15, 28]) is to take \(\alpha (\Delta t,y)=(1+\Delta t\Vert F(y)\Vert ^p)^{-1}\), so that with \(p=1\)

$$\begin{aligned} F^{tm} _{\Delta t}(y)=\frac{F(y)}{1+\Delta t\left\Vert F(y)\right\Vert }. \end{aligned}$$
(2.9)

Strong convergence of (2.6) with (2.9) and \(g_i\equiv 0\) was considered in [12] and the efficiency of the method was illustrated numerically. If we take \(\alpha (\Delta t,y) \equiv 1\), so that \(F^{tm} _{\Delta t}=F\) in (2.6), then we obtain one of the methods in [11] (proved to be strongly convergent with order 1/2 under global Lipschitz assumptions). Further, it was shown that the method is highly efficient for SDEs with dominant linear terms and, by a homotopy approach, is competitive when applied to highly non-linear forms of (1.1). Note that it is clear from (1.1) that by taking \(B_i=0\) or \(A=B_i=0\) (or incorporating these terms into the nonlinearities) we recover from (2.6) the exponential tamed and the standard tamed methods respectively.
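To make the scheme concrete, the following is a minimal sketch of one step of (2.6) with the taming choice (2.9). It is our own illustration (the names tamed_gbm_step, F, g_list and so on are not from the paper) and assumes (1.2), so that the flow (2.4) can be formed with a single matrix exponential.

```python
# Minimal sketch of one step Y_n -> Y_{n+1} of the tamed GBM method (2.6) with
# the taming function (2.9).  F and the g_i are user-supplied callables on R^d;
# A and the B_i are assumed to satisfy the commutator condition (1.2).
import numpy as np
from scipy.linalg import expm

def tamed_gbm_step(Y, F, g_list, A, B_list, dt, dW):
    Fy = F(Y)
    F_tamed = Fy / (1.0 + dt * np.linalg.norm(Fy))                  # taming (2.9)
    drift = (F_tamed - sum(B @ gi(Y) for B, gi in zip(B_list, g_list))) * dt
    noise = sum(gi(Y) * dWi for gi, dWi in zip(g_list, dW))
    M = (A - 0.5 * sum(B @ B for B in B_list)) * dt \
        + sum(B * dWi for B, dWi in zip(B_list, dW))
    return expm(M) @ (Y + drift + noise)                            # apply Phi_{t_{n+1},t_n}

# Example usage over one step of one path:
#   rng = np.random.default_rng(); dW = np.sqrt(dt) * rng.standard_normal(m)
```

Taking each \(B_i=0\) here reproduces the exponential tamed step, in line with the remark above.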

Before we state our main results we give an integral representation of the continuous version of the numerical method. Let us define the continuous extension \(\bar{Y}_t \) of the numerical solution for \(t\in [t_n,t_{n+1}]\) by

$$\begin{aligned} \bar{Y}_t =&\mathbf {\Phi } _{t,t_n} \bar{Y}_{t_n} + \mathbf {\Phi } _{t,t_n} \int _{t_n} ^ {t} \left( F^{tm} _{\Delta t}(\bar{Y}_{t_n} ) -\sum _{i=1}^mB_ig_i(\bar{Y}_{t_n} )\right) {{\textrm{d}}}{s}\nonumber \\&+ \mathbf {\Phi } _{t,t_n} \int _{t_n} ^ {t} \sum _{i=1} ^m g_i(\bar{Y}_{t_n} ){{{\textrm{d}}}W}^i_s. \end{aligned}$$
(2.10)

Then it is clear that \(\bar{Y}_{t_n} =Y_n^N\) for each \(n\in \mathbb {N}_{N}\).

Lemma 2.1

Let \( \bar{Y}_t\) be the interpolated continuous version in (2.10) of the numerical solution given in (2.6). Then the differential for this solution is given by

$$\begin{aligned} {\textrm{d}}\bar{Y}_t = \mu ^{tm} {{\textrm{d}}}{t}+ \sum _{i=1}^m \sigma _i^{tm} {{{\textrm{d}}}W}_t^i, \qquad t\in [t_n,t_{n+1}] \end{aligned}$$
(2.11)

where \(\mu ^{tm}=\mu ^{tm}(t,t_n)\) and \(\sigma _i^{tm}=\sigma _i^{tm}(t,t_n)\) and

$$\begin{aligned} \mu ^{tm}:= A \bar{Y}_t + \mathbf {\Phi } _{t,t_n} F^{tm} _{\Delta t}(\bar{Y}_{t_n} ), \quad \text {and} \quad \sigma _i^{tm}:= B_i \bar{Y}_t +\mathbf {\Phi } _{t,t_n } g_i(\bar{Y}_{t_n} ). \end{aligned}$$
(2.12)

Proof

By the definition of the inverse GBM (see for example [20]),

$$\begin{aligned} {\textrm{d}}\mathbf {\Phi } _{t,t_n} ^{-1}=\left( -A+\sum _{i=1}^mB_i^2\right) \mathbf {\Phi } _{t,t_n} ^{-1} {{\textrm{d}}}{t}- \sum _{i=1}^mB_i\mathbf {\Phi } _{t,t_n} ^{-1} {{{\textrm{d}}}W}_t^i. \end{aligned}$$

We seek the appropriate \(\mu ^{tm}\) and \(\sigma _i ^{tm}\). The product rule for the Itô differential gives

$$\begin{aligned} {\textrm{d}}\left( \mathbf {\Phi } _{t,t_n} ^{-1} \bar{Y}_t \right) =&\mathbf {\Phi } _{t,t_n} ^{-1}\Big ( (-A+\sum _{i=1} ^m B_i^2) \bar{Y}_t+\mu ^{tm}-\sum _{i=1} ^m B_i\sigma _i ^{tm}\Big ) {{\textrm{d}}}{t}\nonumber \\&+ \mathbf {\Phi } _{t,t_n} ^{-1}\Big (\sum _{i=1} ^m\left( \sigma _i ^{tm}-B_i \bar{Y}_t \right) \Big ) {{{\textrm{d}}}W}_t^i. \end{aligned}$$
(2.13)

On the other hand (2.10) can be written as

$$\begin{aligned} {\textrm{d}}\left( \mathbf {\Phi } _{t,t_n} ^{-1} \bar{Y}_t \right) = \left( F^{tm} _{\Delta t}(\bar{Y}_{t_n} ) -\sum _{i=1}^mB_ig_i(\bar{Y}_{t_n} )\right) {{\textrm{d}}}{t}+ \sum _{i=1} ^m g_i(\bar{Y}_{t_n} ){{{\textrm{d}}}W}^i_t. \end{aligned}$$
(2.14)

By comparison of (2.14) with (2.13), we find

$$\begin{aligned} \mathbf {\Phi } _{t,t_n} ^{-1}\left( (-A+\sum _{i=1} ^m B_i^2) \bar{Y}_t +\mu ^{tm}-\sum _{i=1} ^m B_i\sigma _i ^{tm}\right)&= F^{tm} _{\Delta t}( \bar{Y}_{t_n} ) -\sum _{i=1}^mB_ig_i( \bar{Y}_{t_n} ) \end{aligned}$$
(2.15)
$$\begin{aligned} \mathbf {\Phi } _{t,t_n} ^{-1} \left( \sigma _i ^{tm}-B_i \bar{Y}_t \right)&= g_i (\bar{Y}_{t_n} ), \quad i=1,2 \ldots , m. \end{aligned}$$
(2.16)

Solving the matrix equation (2.16) for \( \sigma _i^{tm}\), using the commutativity conditions (1.2), and substituting into equation (2.15) to determine \(\mu ^{tm}\) gives the desired result. \(\square \)

Our main result is weak convergence of order one of the numerical scheme (2.6) and to prove this we make use of bounded moments.

Assumption 4

There exists \(K>0\) such that for all \(i=1,\ldots , m\)

$$\begin{aligned} \left\Vert g_i(y)-g_i(z) \right\Vert \le K \left\Vert y-z\right\Vert , \quad \forall \ y, z \in \mathbb {R}^d. \end{aligned}$$

Remark 2.2

The global Lipschitz property given in Assumption 4 implies boundedness of \(\textbf{D}g\). Together with Assumption 1, by the mean value theorem, there exists \(K \in \mathbb {R}\) such that for all \(y,z \in \mathbb {R}^d\)

$$\begin{aligned} { \big \langle } y-z , F(y)-F(z) { \big \rangle }\le K \left\Vert y-z\right\Vert ^2. \end{aligned}$$
(2.17)

Theorem 2.4

Let Assumption 1 with \(H=1\) and Assumption 4 hold. Then, for \(Y_n^N\) given by (2.6) and for all \(p \in [1,\infty )\)

$$\begin{aligned} \sup _{N \in \mathbb {N}} \sup _{0\le n\le N} \mathbb {E}\left[ \left\Vert Y_n^N\right\Vert ^p \right] <\infty . \end{aligned}$$

The proof of this Theorem is given in Sect. 4 and follows the approach of [15]. However we need to control the stochastic operator in (2.4) and in contrast to [12] we also now need to take account of the nonlinear diffusion terms \(g_i\). The main novelty in our proof below is the interaction between these two terms.

Theorem 2.5

Let Assumption 1 with \(H=4\), Assumption 2 and Assumption 4 hold. Let \(X_T\) be the solution to (1.1). Let \(Y_N^N\) be given by (2.6) and let (2.8) hold. Then, for all \(\phi :\mathbb {R}^d\rightarrow \mathbb {R}\) with \( \phi \in C^{4} _{b} (\mathbb {R}^d)\) there is a constant \(C>0\), independent of \(\Delta t\), such that

$$\begin{aligned} \vert \mathbb {E}\left[ \phi (Y_N^N ) \right] -\mathbb {E}\left[ \phi (X_T ) \right] \vert \le C \Delta t. \end{aligned}$$

Although in Theorem 2.3 we have \(\phi \in C^2 _b (\mathbb {R}^d)\), to get first order weak convergence, we impose \(\phi \in C^4 _b (\mathbb {R}^d)\) in Theorem 2.5. We prove Theorem 2.5 in Sect. 5 using the Kolmogorov equation. Once again we need to take careful account of the stochastic operator \(\mathbf {\Phi } _{t,t_0} \) from (2.4) as well as dealing with the one-sided Lipschitz drift F. Before giving the proofs we present some numerical results.

3 Numerical results

We seek to estimate numerically the weak discretization error \( \vert \mathbb {E}{\left[ \phi (X_T)\right] }-\mathbb {E}{\left[ \phi (Y_N^N)\right] }\vert \), where \(Y_N^N\) is a numerical approximation to \(X_T\) with \(\Delta t= T/N\). To illustrate the rate of convergence we need to estimate the weak error for different values of \(\Delta t\). Our aim is to examine this in the absence of an analytic solution or where the numerical solution of the Kolmogorov equation is prohibitively expensive. We also wish to illustrate the weak convergence rate of order 1 that is proved in Theorem 2.5.

In practice we take a reference numerical solution so that \(X_T\approx {\mathbb {Y}}^{N_{R}} \) with \({N_{R}}\gg N\) (and \(\Delta t_R=T/N_R\)). We then estimate

$$\begin{aligned} \vert \mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_{R}})\right] }-\mathbb {E}{\left[ \phi (Y_N^N)\right] }\vert = \vert \mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_{R}})-\phi (Y_N^N)\right] } \vert . \end{aligned}$$

Note that \({\mathbb {Y}}^{N_{R}} \) may be computed by a different method to that for \(Y_N^N\). In [21], issues in computing weak errors using MLMC methods for SPDEs with multiplicative noise are discussed and upper and lower bounds on the simulation errors are obtained. However the authors did not consider the simultaneous computation of a reference solution. In [1] the MLMC method is examined in the case where the zero solution is asymptotically mean square stable, and an importance sampling technique is introduced. We observe similar stability issues below.

We briefly discuss four approaches to estimate the weak error using the multi-level Monte Carlo (MLMC) technique, see [13, 23]. We denote these methods Trad, MLMCL0, MLMC, MLMCSR and examine them numerically in our experiments. In the traditional method, denoted Trad, we estimate \(\mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_{R}})\right] }\) and \(\mathbb {E}{\left[ \phi (Y_N^N)\right] }\) independently by an MLMC method. Thus, for the reference solution we have

$$\begin{aligned} \mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_{R}})\right] } = \mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_0})\right] } + \sum _{\ell =1}^{R} \mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_{\ell }}) - \phi ({\mathbb {Y}}^{N_{\ell -1}})\right] } \end{aligned}$$
(3.1)

and for the approximation, with \(N_L:=N\)

$$\begin{aligned} \mathbb {E}{\left[ \phi (Y_N^N)\right] } = \mathbb {E}{\left[ \phi (Y_{N_0}^{N_0})\right] } + \sum _{\ell =1}^{L} \mathbb {E}{\left[ \phi (Y_{N_\ell }^{N_\ell }) - \phi (Y_{N_{\ell -1}}^{N_{\ell -1}})\right] }. \end{aligned}$$
(3.2)

An alternative is to exploit the difference of the telescoping sums from the MLMC approach for the reference \({\mathbb {Y}}^{N_{R}} \) (3.1) and the numerical approximation \(Y_N^N\) (3.2). Subtracting we get

$$\begin{aligned} \mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_{R}})\right] }&- \mathbb {E}{\left[ \phi (Y_N^N)\right] } = \mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_0})-\phi (Y^{N_0}_{N_0})\right] } \nonumber \\&+ \sum _{\ell =1}^{L} \mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_\ell }) - \phi ({\mathbb {Y}}^{N_{\ell -1}}) - [\phi (Y_{N_\ell }^{N_\ell }) - \phi (Y_{N_{\ell -1}}^{N_{\ell -1}})]\right] } \nonumber \\&+ \sum _{\ell =L+1}^{R} \mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_{\ell }}) - \phi ({\mathbb {Y}}^{N_{\ell -1}})\right] }. \end{aligned}$$
(3.3)

For the coarsest level \(\ell =0\) we have a choice of estimating \(\mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_0})\right] }-\mathbb {E}{\left[ \phi (Y_{N_0}^{N_0})\right] }\) or \(\mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_0})-\phi (Y_{N_0}^{N_0})\right] }\). If the reference \({\mathbb {Y}}^{N_0}\) is found with a different method to that of \(Y^{N_0}_{N_0}\), we can expect some variance reduction using the latter over the former (and so fewer samples required to approximate the expectation). Note that if we only compute for a single fixed L then we expect further variance reduction in the second term, i.e. in estimating

$$\begin{aligned} \mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_\ell }) - \phi ({\mathbb {Y}}^{N_{\ell -1}}) - [\phi (Y_{N_\ell }^{N_\ell }) - \phi (Y_{N_{\ell -1}}^{N_{\ell -1}})]\right] }. \end{aligned}$$

However, here we wish to compute for different values of L in order to illustrate the rate of convergence. Thus, rather than recomputing (3.3) for different L, we instead proceed as follows (to avoid recomputing the MLMC estimates).

For MLMCL0 we exploit the variance reduction on the coarsest level and so estimate \(\mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_0})-\phi (Y^{N_0}_{N_0})\right] }\) directly, together with estimates of the remaining telescoping terms in (3.3).

For MLMC, by contrast, we estimate the weak error by (3.3) and estimate \(\mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_0})\right] }\), \(\mathbb {E}{\left[ \phi (Y^{N_0}_{N_0})\right] }\) separately. Finally we see from (3.3) that if the same numerical scheme is used to estimate both the reference solution and \(Y_N^N\) then

$$\begin{aligned} \mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_{R}})\right] }- \mathbb {E}{\left[ \phi (Y_N^N)\right] }&= \sum _{\ell =L+1}^{R} \mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_{\ell }}) - \phi ({\mathbb {Y}}^{N_{\ell -1}})\right] }. \end{aligned}$$

This method does not allow for a more accurate (e.g. higher order) reference solution (as it uses a self-reference). We call this MLMCSR and estimate

$$\begin{aligned} \mathbb {E}{\left[ \phi ({\mathbb {Y}}^{N_{\ell }}) - \phi ({\mathbb {Y}}^{N_{\ell -1}})\right] }, \qquad \ell =1,2,\ldots ,R. \end{aligned}$$

To illustrate the rate one weak convergence results of Theorem 2.5 we consider the following cubic SDE in \(\mathbb {R}^d\)

$$\begin{aligned} {\textrm{d}}X = [AX + X-X^3] {{\textrm{d}}}{t}+ \beta _1 X {{{\textrm{d}}}W}+ \beta _2 \frac{X}{1+X^2} {{{\textrm{d}}}W}. \end{aligned}$$
(3.4)

This cubic equation is often used as a test equation as its solutions exhibit transitions between different phases. With \(d=1\) it is sometimes known as the Ginzburg–Landau equation [20]. For larger d the system of SDEs can be thought of as arising from the spatial discretization of a stochastic Allen–Cahn equation, see for example [6, 7, 11, 14, 24]. We first examine the linear diffusion with \(\beta _1=0.1\), \(\beta _2=0\) and then the non-linear diffusion with \(\beta _1=\beta _2=0.1\). We look at dimensions \(d=1,4,10,50\) and \(d=100\). For \(d=1\) we take \(A=-4\), \(X_0=x=0.5\). For \(d\ge 4\), A is the standard tridiagonal matrix from the finite difference approximation to the Laplacian: \( A=0.5 d^{-2}\text {diag}(1,-2,1) \), and as initial data we take \(X_0=x=0.5\exp (-10(y-0.5)^2)\) where \(y=[1/d,2/d,\ldots ,(d-1)/d]^T\). We solve to \(T=1\) and take \(\phi (y)=\Vert y\Vert ^2\). The standard exponential tamed method is used as the reference solution \({\mathbb {Y}}^{N_{R}} \) (except for MLMCSR when looking at our GBM tamed method) and we take (2.9) as our taming function. We perform 10 separate runs, each computing the full weak error convergence plot, and use these to estimate the standard deviation.
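For reference, one possible encoding of this test problem in the semilinear form (1.1) is sketched below. This is our reading of (3.4), not the paper's code: a single noise term with \(B_1=\beta _1 I\) and \(g_1(x)=\beta _2 x/(1+x^2)\), with the powers of X taken componentwise, and the function name cubic_problem is ours.

```python
# Sketch of the cubic test problem (3.4) in the form (1.1): F(x) = x - x^3
# componentwise, one noise term with B_1 = beta1*I and g_1(x) = beta2*x/(1+x^2),
# and A = -4 for d = 1 or the tridiagonal matrix 0.5*d^{-2}*diag(1,-2,1) for d >= 4.
import numpy as np

def cubic_problem(d, beta1=0.1, beta2=0.1):
    if d == 1:
        A = np.array([[-4.0]])
    else:
        A = 0.5 * d**-2 * (np.diag(np.ones(d - 1), -1)
                           - 2.0 * np.eye(d)
                           + np.diag(np.ones(d - 1), 1))
    B_list = [beta1 * np.eye(d)]
    F = lambda x: x - x**3
    g_list = [lambda x: beta2 * x / (1.0 + x**2)]
    phi = lambda x: float(x @ x)              # phi(y) = ||y||^2
    return A, B_list, F, g_list, phi
```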

Table 1 Average computational times in seconds (and standard deviation) required to compute the convergence plots (e.g. Fig. 1a or b or one curve in c or d)

First we examine the linear noise case with \(\beta _1=0.1\) (and \(\beta _2=0\)). Table 1 compares the efficiency of the different methods for estimating the weak error for dimensions \(d=1,4\) and 10. We restrict \(\Delta t<0.0075\) in these computations to ensure we see convergence of the exponential tamed method. We observe, as expected, that estimating the weak error directly by either MLMCL0 or MLMC is far more efficient than using the MLMC in the traditional form Trad. We also observe there is a slight advantage to using MLMCL0 compared to MLMC (due to some variance reduction on the coarsest level). Trad (GBM) refers to convergence for the GBM based method and Trad (Tamed) to the exponential tamed method; we observe a small overhead in the GBM method compared to Tamed. However, as we discuss below, there are advantages to the GBM approach. An alternative method would be to take a drift-implicit method and to solve a nonlinear system of equations at each step. The weak errors are similar to those for GBM. Using the standard fsolve (without providing the linearization) in MATLAB leads to computational times two orders of magnitude larger (e.g. 2396 s (8.6) for MLMCL0 and 2396 s (2.8) for MLMC), performed on the same cluster. For this reason we only show convergence plots for the explicit methods.

In Fig. 1 we show weak convergence for \(d=1\) and in all approaches observe the expected rate of convergence one. Furthermore in (c) and (d), where we have a direct comparison to the exponential tamed method, we see the GBM based method has a smaller error constant. We see in (d) that the two smallest step sizes (\(\Delta t=2^{-9}\) and \(\Delta t=2^{-9}\)) are too close to the reference time step to observe convergence.

Fig. 1 Comparison of different approaches to estimate the weak error for \(d=1\) with \(\beta _1=0.1\) and \(\beta _2=0\)

In Fig. 2 we show convergence for \(d=4\) and observe the predicted convergence of rate 1. In (a), (b) and (c) we restrict the largest step so that \(\Delta t<0.02\). For the exponential tamed method it was essential to impose this restriction on the maximum step size as, although solutions remain bounded, the error is otherwise too large to observe convergence. These large solutions also lead to large variances and hence an infeasibly large number of samples (see also the discussion in [1]). This is illustrated by comparing (c), where we restrict \(\Delta t<0.02\), and (d), where \(\Delta t<0.2\). The exponential tamed method only starts to converge for \(\Delta t\le 0.02\). For the larger step sizes in (d) we see the GBM based method performs well (unlike the exponential tamed method). The small step size restriction on the standard exponential tamed method as d increases makes it difficult to obtain a reference solution using this method with the MLMC approach.

Fig. 2 Comparison of 3 different approaches to estimate the weak error for \(d=4\) with \(\beta _1=0.1\) and \(\beta _2=0\). In (d) we see that the exponential tamed method is not well behaved for large time step size

In Fig. 3 we illustrate convergence using MLMCSR for the two methods with \(d=50\) and \(d=100\). We only examine MLMCSR for these larger values of d, due to the unreliable results of the exponential tamed method for larger \(\Delta t\) values (as discussed for Fig. 2). We see that for \(d=50\) we require \(\Delta t<2\times 10^{-4}\) and for \(d=100\) that \(\Delta t<10^{-4}\) to obtain an approximation using the standard exponential tamed method. However, we observe that the GBM method converges with the predicted order and there is no issue with large variances.

Fig. 3 Convergence for \(d=50\) and \(d=100\) with \(\beta _1=0.1\) and \(\beta _2=0\). For larger time step sizes the exponential tamed method has a large variance and we do not see convergence until smaller time step sizes

For the nonlinear diffusion, with \(\beta _1=\beta _2=0.1\) in (3.4), we illustrate convergence in Fig. 4 for (a) MLMCSR and (b) Trad. We have taken \(d=4\) and observe weak convergence of rate one. Both the methods MLMC and MLMCL0 also show rate one convergence, with the GBM based method having a smaller error constant. For this nonlinear noise and larger values of \(\Delta t\), although the taming ensures solutions remain bounded, the variance does not reduce; this is now the case for both methods.

Fig. 4 Weak convergence for \(d=4\) with \(\beta _1=\beta _2=0.1\). We observe rate one convergence. A time step size restriction is required

4 Proof of bounded moments

We let \( \Delta \textbf{W}_{k}:=(\Delta W_k^1,\Delta W_k^2,\ldots ,\Delta W_k^m)^T.\) For the drift we define the notation

$$\begin{aligned} \mathcal {N}(x):= F^{tm} _{\Delta t}(x) -\sum _{i=1}^mB_ig_i(x) \end{aligned}$$
(4.1)

where \(F^{tm} _{\Delta t}\) is defined in (2.7). We can use \(\mathcal {N}\) to re-write (2.6) as

$$\begin{aligned} Y_{n+1}^N=\mathbf {\Phi } _{t_{n+1},t_n} \left( Y_n^N+ \mathcal {N}(Y_n^N) \Delta t+ \sum _{i=1}^m g_i(Y_n^N) \Delta W^i_n \right) . \end{aligned}$$

We now prove the boundedness of the operator \(\mathbf {\Phi } _{t,t_0} \) in stochastic \(L^p\) spaces.

Lemma 4.1

Suppose \(p \ge 2\), \(0\le s<t\le T \) and let \(\mathbf {\Phi } _{t,s} \) be given in (2.4) with \(t_0=s\). Then,

  1. (i)

    For any \(\mathcal {F}_{s}\) measurable random variable v in \(L^p (\Omega ,\mathbb {R}^d )\)

    $$\begin{aligned} \left\Vert \mathbf {\Phi } _{t,s} v\right\Vert _ {L^p(\Omega ,\mathbb {R}^d)} \le \exp { \left( \left( \left\Vert A\right\Vert +\frac{p-1}{2}\sum _{i=1} ^m \left\Vert B_i\right\Vert ^2\right) (t-s)\right) } \left\Vert v\right\Vert _ {L^p(\Omega ,\mathbb {R}^d)} . \end{aligned}$$
  2. (ii)

    There are constants \(\kappa _{1,2}>0\), independent of \((t-s)\) such that

    $$\begin{aligned} \mathbb {E}\left[ \left\Vert \mathbf {\Phi } _{t,s} \right\Vert ^p \right] \le \kappa _1\exp (\kappa _2 (t-s)). \end{aligned}$$

Proof

For the proof of (i), see [12].

To show (ii) we recall that for a standard Gaussian variable \(z \sim N(0,1)\) and \(\alpha >0\)

$$\begin{aligned} \mathbb {E}\left[ \exp (\alpha \vert z \vert ) \right]&= \dfrac{1}{\sqrt{2 \pi }} \int _{-\infty } ^{\infty } \exp (\alpha \vert x \vert ) \exp (-x^2/2)\,dx \nonumber \\&= \exp (\alpha ^2/2) \left( 1+\text {erf} (\alpha /\sqrt{2})\right) . \end{aligned}$$
(4.2)
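As a quick sanity check (ours, not part of the argument), the identity (4.2) can be verified numerically:

```python
# Numerical verification (not part of the proof) of the identity (4.2):
# E[exp(alpha*|z|)] = exp(alpha^2/2) * (1 + erf(alpha/sqrt(2))) for z ~ N(0,1).
import numpy as np
from scipy.integrate import quad
from scipy.special import erf

alpha = 0.7
lhs, _ = quad(lambda x: np.exp(alpha * abs(x) - x**2 / 2) / np.sqrt(2 * np.pi),
              -np.inf, np.inf)
rhs = np.exp(alpha**2 / 2) * (1.0 + erf(alpha / np.sqrt(2)))
assert abs(lhs - rhs) < 1e-6
```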

From (2.4) we have that

$$\begin{aligned} \left\Vert \mathbf {\Phi } _{t,s} \right\Vert ^p \le \exp \left( \left\Vert p( A-\frac{1}{2} \sum _{i=1} ^m B_i ^2)(t-s) \right\Vert \right) \exp \left( p \sum _{i=1}^m \left\Vert B_i\right\Vert \vert W_t^i -W_{s}^i \vert \right) .\nonumber \\ \end{aligned}$$
(4.3)

By taking expectations of both sides in (4.3), considering the independence and distribution of the random variables \(\vert W_t^i -W_{s}^i \vert \), and using (4.2) for \(i=1,2,\ldots , m\), we have

$$\begin{aligned} \mathbb {E}\left[ \left\Vert \mathbf {\Phi } _{t,s} \right\Vert ^p \right]&\le \exp \left( \left\Vert p( A-\frac{1}{2} \sum _{i=1} ^m B_i ^2)(t-s) \right\Vert \right) \prod _{i=1}^m \mathbb {E}\left[ \exp \left( p \left\Vert B_i\right\Vert \vert W_t^i -W_{s}^i \vert \right) \right] \\&= \exp \left( \left\Vert p( A-\frac{1}{2} \sum _{i=1} ^m B_i ^2)(t-s) \right\Vert \right) \\&\qquad \times \prod _{i=1}^m \exp \left( \dfrac{1}{2}p^2 \left\Vert B_i\right\Vert ^2 (t-s)\right) \left( 1+\text {erf}\left(\frac{p \sqrt{t-s} \left\Vert B_i\right\Vert }{\sqrt{2} }\right)\right) . \end{aligned}$$

By the boundedness of the function \(\text {erf}\), the positive constants \(\kappa _1\) and \(\kappa _2\) are determined in terms of \(p, \left\Vert A\right\Vert \) and \(\left\Vert B_i\right\Vert \). \(\square \)

For the proof of Theorem 2.4, we adapt the approach given in [15]. In [12] a similar approach was taken with linear diffusion and so the additional element below is dealing with the nonlinear diffusion terms \(g_i\). We start by introducing appropriate sub events of \(\Omega \). We let \(\Omega _0^N:=\Omega \), then for \(n\in \mathbb {N}_N ^+\)

$$\begin{aligned} \Omega _n^N:=\left\{ \omega \in \Omega \vert \sup _{ k \in \mathbb {N}_{n-1} } D_k^N(\omega ) \le N^{1/4r},\sup _{ k \in \mathbb {N}_{n-1}} \left\Vert \Delta \textbf{W}_{k}\right\Vert \le 1 \right\} , \end{aligned}$$

where the parameter r is as defined in Assumption 1. The dominating stochastic process \(D_n^N\) is defined with \(D_0^N:=\left( \lambda + \left\Vert x\right\Vert \right) e^{\lambda }\) and for \(n \in \mathbb {N}_{N}^+\) by

$$\begin{aligned} D_n^N:=(\lambda + \left\Vert x \right\Vert ) \sup _{u \in \mathbb {N}_{n} } \prod _{k=u }^{n-1 }\left\Vert \mathbf {\Phi } _{t_{k+1},t_{k}} \right\Vert \exp \left( \lambda + \sup _{u \in \mathbb {N}_{n} } \sum _{k=u} ^{n-1} \left( \lambda \left\Vert \Delta \textbf{W}_k\right\Vert ^2 +\beta _k ^N \right) \right) \end{aligned}$$

where

$$\begin{aligned} \beta _k ^N:= \mathbbm {1}_{\left\Vert Y_k^N\right\Vert \ge 1} \frac{{{ \big \langle } Y_k^N , \sum _{i=1}^mg_i(Y_k^N)\Delta W^i_k { \big \rangle }} }{\left\Vert Y_k^N\right\Vert ^2} \quad \text {and} \quad \lambda :=\max \{\lambda _0,\lambda _1,\lambda _2,\lambda _3,\lambda _4\}. \end{aligned}$$

Here \(\lambda _0,\ldots ,\lambda _4\) are constants defined by

$$\begin{aligned} \lambda _0&:=\exp \left( {\sum _{i=1}^m \left\Vert B_i\right\Vert +T \left\Vert A-\frac{1}{2} \sum \limits _{i=1} ^m B_i ^2\right\Vert }\right) \times \bigg ( 1+2TK+T \left\Vert F(0)\right\Vert \\&\qquad +\sum _{i=1}^{m} \left\Vert B_i\right\Vert TK+T\left\Vert \sum _{i=1}^mB_ig_i(0)\right\Vert +mK+\sum _{i=1}^{m} \left\Vert g_i(0) \right\Vert \bigg ) \\ \lambda _1&:= m\sum _{i=1}^{m} \left( K +\left\Vert g_i(0)\right\Vert \right) ^2 \\ \lambda _2&:=\left( 2K+\left\Vert F(0)\right\Vert \right) ^4 \\ \lambda _3&:=\left( \sum _{i=1}^{m} \left\Vert B_i\right\Vert K+\left\Vert \sum _{i=1}^mB_ig_i(0)\right\Vert \right) ^4\\ \lambda _4&:=\left( 4T^2+2T \right) ^2, \end{aligned}$$

where \(K>0\) denotes a constant that arises from the constants in Assumptions 1 and 4 and in (2.17). The first result shows we can dominate the numerical solution on the set \(\Omega _n^N\).

Lemma 4.2

Let Assumption 1 hold with \(H=1\). Let \(Y_n^N\) be given by (2.6) and let (2.8) hold. For all \(n \in \mathbb {N}_{N}\) we have the pathwise inequality

$$\begin{aligned} 1_{\Omega _n^N} \left\Vert Y_n^N\right\Vert \le D_n^N. \end{aligned}$$

Proof

On \(\Omega _{n+1}^N\) by construction \(\left\Vert \Delta \textbf{W}_n\right\Vert \le 1\) for \(n\in \mathbb {N}_{N-1}\). Therefore, \(\left\Vert \mathbf {\Phi } _{t_{n+1},t_n} \right\Vert < \infty \) for \(n \in \mathbb {N}_{N-1}\). We prove the Lemma on two subsets of \(\Omega _{n+1}^N\)

$$\begin{aligned} S^{(1)}_{n+1}&:=\Omega _{n+1}^N\cap \lbrace \omega \in \Omega \vert \left\Vert Y_n^N (\omega )\right\Vert \le 1 \rbrace \\ S^{(2)}_{n+1}&:=\Omega _{n+1}^N\cap \lbrace \omega \in \Omega \vert 1 \le \left\Vert Y_n^N (\omega )\right\Vert \le N^{1/(4r)} \rbrace . \end{aligned}$$

First, on \(S^{(1)}_{n+1}\), we have from (2.6) and the triangle inequality that

$$\begin{aligned} \left\Vert Y_{n+1}^N\right\Vert \le \left\Vert \mathbf {\Phi } _{t_{n+1},t_n} \right\Vert \left( \left\Vert Y_n^N\right\Vert +\Delta t \left\Vert \mathcal {N}(Y_n^N)\right\Vert + \sum _{i=1}^{m} \left\Vert g_i(Y_n^N)\right\Vert \left\Vert \Delta \textbf{W}_n\right\Vert \right) . \end{aligned}$$

Since \(\left\Vert Y_n^N\right\Vert \le 1\), \(\left\Vert \Delta \textbf{W}_n\right\Vert \le 1\) on \(S^{(1)}_{n+1}\), and by the taming inequality \(\alpha (\Delta t,y)\le 1 \) from (2.8), we have that

$$\begin{aligned} \left\Vert Y_{n+1}^N\right\Vert \le \left\Vert \mathbf {\Phi } _{t_{n+1},t_n} \right\Vert \left( 1+\Delta t\left\Vert F(Y_n^N)\right\Vert +\Delta t\left\Vert \sum _{i=1}^mB_ig_i(Y_n^N)\right\Vert +\sum _{i=1}^{m} \left\Vert g_i(Y_n^N)\right\Vert \right) . \end{aligned}$$

Adding and subtracting F(0), \(\sum _{i=1}^mB_ig_i(0)\), as well as \(g_i(0)\) for \(i=1,\ldots , m\), and then applying the triangle inequality we get

$$\begin{aligned} \left\Vert Y_{n+1}^N\right\Vert \le&\left\Vert \mathbf {\Phi } _{t_{n+1},t_n} \right\Vert \Big ( 1+\Delta t\left\Vert F(Y_n^N)-F(0)\right\Vert +\Delta t\left\Vert F(0)\right\Vert \nonumber \\&+\Delta t\left\Vert \sum _{i=1}^mB_ig_i(Y_n^N)-\sum _{i=1}^mB_ig_i(0)\right\Vert +\Delta t\left\Vert \sum _{i=1}^mB_ig_i(0)\right\Vert \\&+\sum _{i=1}^{m} \left( \left\Vert g_i(Y_n^N)-g_i(0)\right\Vert +\left\Vert g_i(0)\right\Vert \right) \Big ). \end{aligned}$$

By the polynomial growth condition on \(\textbf{D} F\) given in Assumption 1, the global Lipschitz condition on \(g_i\) (Assumption 4), and using that \(\Delta t\le T\)

$$\begin{aligned} \left\Vert Y_{n+1}^N\right\Vert \le&\left\Vert \mathbf {\Phi } _{t_{n+1},t_n} \right\Vert \Big ( 1+T K \left( 1+\left\Vert Y_n^N\right\Vert ^{2r} \right) \left\Vert Y_n^N\right\Vert +T \left\Vert F(0)\right\Vert \\&+ T K \sum _{i=1}^m \left\Vert B_i\right\Vert \left\Vert Y_n^N\right\Vert + T \left\Vert \sum _{i=1}^mB_ig_i(0)\right\Vert +mK \left\Vert Y_n^N\right\Vert + \sum _{i=1}^{m} \left\Vert g_i(0)\right\Vert \Big ). \end{aligned}$$

On \(S^{(1)}_{n+1}\), since \(\left\Vert Y_n^N\right\Vert \le 1\) and \( \left\Vert \Delta \textbf{W}_n\right\Vert \le 1 \), we have

$$\begin{aligned} \left\Vert Y_{n+1}^N\right\Vert \le \lambda . \end{aligned}$$
(4.4)

To bound \(S^{(2)}_{n+1}\), we start from (2.6) by squaring the norm

$$\begin{aligned} \begin{aligned}&\left\Vert Y_{n+1}^N\right\Vert ^2 \le \left\Vert \mathbf {\Phi } _{t_{n+1},t_n} \right\Vert ^2 \left\Vert Y_n^N+\Delta t \mathcal {N}(Y_n^N)+\sum _{i=1}^mg_i(Y_n^N)\Delta W^i_n\right\Vert ^2 \\&\quad \le \left\Vert \mathbf {\Phi } _{t_{n+1},t_n} \right\Vert ^2 \left( \left\Vert Y_n^N\right\Vert ^2+\Delta t^2 \left\Vert \mathcal {N}(Y_n^N)\right\Vert ^2 + m\sum _{i=1}^{m} \left\Vert g_i(Y_n^N)\right\Vert ^2 \left\Vert \Delta \textbf{W}_n\right\Vert ^2 \right. \\&\qquad \left. \! + 2 \Delta t{ \big \langle } Y_n^N , \mathcal {N}(Y_n^N) { \big \rangle } +2 { \big \langle } Y_n^N , \sum _{i=1}^mg_i(Y_n^N)\Delta W^i_n { \big \rangle }\! + \!2 \Delta t{ \big \langle } \mathcal {N}(Y_n^N) , \sum _{i=1}^mg_i(Y_n^N)\Delta W^i_n { \big \rangle } \right) . \end{aligned} \end{aligned}$$

Applying the Cauchy–Schwarz and Arithmetic–Geometric inequalities

$$\begin{aligned} \left\Vert Y_{n+1}^N\right\Vert ^2 \le&\left\Vert \mathbf {\Phi } _{t_{n+1},t_n} \right\Vert ^2 \left( \left\Vert Y_n^N\right\Vert ^2+2\Delta t^2 \left\Vert \mathcal {N}(Y_n^N)\right\Vert ^2 +2 m\sum _{i=1}^{m} \left\Vert g_i(Y_n^N)\right\Vert ^2 \left\Vert \Delta \textbf{W}_n\right\Vert ^2 \right. \nonumber \\&\left. + 2 \Delta t{ \big \langle } Y_n^N , \mathcal {N}(Y_n^N) { \big \rangle } +2 { \big \langle } Y_n^N , \sum _{i=1}^mg_i(Y_n^N)\Delta W^i_n { \big \rangle } \right) . \end{aligned}$$
(4.5)

Let us consider \(\mathcal {N}(Y_n^N)\) (see (4.1)). Following the argument in [15, (40) in Proof of Lemma 3.1] we have

$$\begin{aligned} \left\Vert F(Y_n^N)\right\Vert ^2\le N \sqrt{\lambda } \left\Vert Y_n^N\right\Vert ^2. \end{aligned}$$

Further, by Assumption 4, the global Lipschitz property of the function \(g_i\)

$$\begin{aligned} \left\Vert \sum _{i=1}^mB_ig_i(Y_n^N)\right\Vert ^2\le & {} \left( \sum _{i=1}^m\left\Vert B_i\right\Vert K+\left\Vert \sum _{i=1}^mB_ig_i(0)\right\Vert \right) ^2 \left\Vert Y_n^N\right\Vert ^2 \nonumber \\\le & {} \sqrt{\lambda } \left\Vert Y_n^N\right\Vert ^2 \end{aligned}$$
(4.6)

and

$$\begin{aligned} \left\Vert g_i(Y_n^N)\right\Vert ^2\le \left( K + \left\Vert g_i(0)\right\Vert \right) ^2 \left\Vert Y_n^N\right\Vert ^2. \end{aligned}$$

By definition of \(\lambda \), we have

$$\begin{aligned} m\sum _{i=1}^m \left\Vert g_i(Y_n^N)\right\Vert ^2\le \lambda \left\Vert Y_n^N\right\Vert ^2. \end{aligned}$$

Therefore the linear growth of the second term on the RHS of (4.5)

$$\begin{aligned} \left\Vert \mathcal {N}(Y_n^N)\right\Vert ^2 \le 4 N \sqrt{\lambda } \left\Vert Y_n^N\right\Vert ^2 \end{aligned}$$

on \(S^{(2)}_{n+1}\) is obtained.

The one-sided Lipschitz condition on F (2.17) and the Cauchy–Schwarz inequality give that

$$\begin{aligned} { \big \langle } Y_n^N , F(Y_n^N) { \big \rangle }&\le { \big \langle } Y_n^N , F(Y_n^N)-F(0) { \big \rangle } +{ \big \langle } Y_n^N , F(0) { \big \rangle } \nonumber \\&\le \left( K + \left\Vert F (0)\right\Vert \right) \left\Vert Y_n^N\right\Vert ^2 \nonumber \\&\le \sqrt{ \lambda } \left\Vert Y_n^N\right\Vert ^2 \end{aligned}$$
(4.7)

where we have used that \(1\le \Vert Y_n^N\Vert \). Therefore, by the inequalities (4.6) and (4.7), the Cauchy–Schwarz inequality for the term \({ \big \langle } Y_n^N , \sum _{i=1}^m B_i g_i (Y_n^N) { \big \rangle }\) and \(\lambda >1\)

$$\begin{aligned} { \big \langle } Y_n^N , \mathcal {N}(Y_n^N) { \big \rangle }&= \alpha (\Delta t, Y_n^N) { \big \langle } Y_n^N , F (Y_n^N) { \big \rangle }- { \big \langle } Y_n^N , \sum _{i=1}^m B_i g_i (Y_n^N) { \big \rangle } \\&\le \left( \alpha (\Delta t, Y_n^N) +1\right) \sqrt{\lambda } \left\Vert Y_n^N\right\Vert ^2. \end{aligned}$$

As a result, since \(\Delta t =T/N\), \(\alpha (\Delta t,Y_n^N) \le 1 \) by (2.8), and \((4T^2+2T)\le \sqrt{\lambda }\), we see that on \(S^{(2)}_{n+1}\) (4.5) becomes

$$\begin{aligned} \left\Vert Y_{n+1}^N\right\Vert ^2&\le \left\Vert \mathbf {\Phi } _{t_{n+1},t_n} \right\Vert ^2 \left\Vert Y_n ^N\right\Vert ^2 \left( 1+\frac{8T^2}{N} \sqrt{\lambda } + 2\lambda \left\Vert \Delta \textbf{W}_n\right\Vert ^2 + \frac{4T}{N}\sqrt{\lambda } +2 \beta _n ^N \right) \nonumber \\&\le \left\Vert \mathbf {\Phi } _{t_{n+1},t_n} \right\Vert ^2 \left\Vert Y_n ^N\right\Vert ^2 \left( 1+ \frac{2\lambda }{N} + 2\lambda \left\Vert \Delta \textbf{W}_n\right\Vert ^2 +2\beta _n ^N \right) \nonumber \\&\le \left\Vert \mathbf {\Phi } _{t_{n+1},t_n} \right\Vert ^2 \left\Vert Y_n ^N\right\Vert ^2 \exp \left( \frac{2\lambda }{N} + 2\lambda \left\Vert \Delta \textbf{W}_n\right\Vert ^2 +2\beta _n ^N \right) . \end{aligned}$$
(4.8)

Now we carry out an induction argument on n. The case for \(n=0\) is obvious by initial condition on \(\Omega _0^N=\Omega \). Let \(l \in \mathbb {N}_{N-1}\) and assume \(\left\Vert Y_{n} ^N (\omega )\right\Vert \le D_n ^N(\omega )\) holds for all \(n \in \mathbb {N}_{l}\) where \(\omega \in \Omega _n ^N\). We now prove that

$$\begin{aligned} \left\Vert Y_{l+1} ^N(\omega )\right\Vert \le D_{l+1} ^N(\omega ) \quad \text {for all} \quad \omega \in \Omega _{l+1} ^N. \end{aligned}$$

For all \( \omega \in \Omega _{l+1} ^N\), we have by the induction hypothesis \(\left\Vert Y_{n} ^N (\omega )\right\Vert \le D_n ^N(\omega )\le N^{1/(4r)}\), \(n\in \mathbb {N}_l\) and \(\Omega _{l+1} ^N \subseteq \Omega _{n+1} ^N\). For any \( \omega \in \Omega _{l+1} ^N\), \(\omega \) belongs to \(S_{l+1} ^{(1)}\) or \(S_{l+1} ^{(2)}\). For the inductive argument we define a random variable

$$\begin{aligned} \tau _l ^N (\omega ):=\max \left( \lbrace -1 \rbrace \cup \big \lbrace n \in \mathbb {N}_{l-1} \ \big \vert \ \left\Vert Y_n^N (\omega )\right\Vert \le 1 \big \rbrace \right) \end{aligned}$$

as in [15]. This definition implies that \(1 \le \left\Vert Y_n^N (\omega )\right\Vert \le N^ {1/(4r)}\) for all \(n \in \lbrace \tau _{l+1} ^N (\omega )+1, \tau _{l+1} ^N (\omega )+2,\ldots , l \rbrace \). By the bound in (4.8)

$$\begin{aligned}&\left\Vert Y_{l+1} ^N (\omega )\right\Vert \le \left\Vert \mathbf {\Phi } _{t_{l+1},t_l} (\omega )\right\Vert \left\Vert Y_l ^N (\omega )\right\Vert \exp \left( \frac{\lambda }{N} + \lambda \left\Vert \Delta \textbf{W}_l(\omega )\right\Vert ^2 +\beta _l ^N(\omega ) \right) \\&\quad \le \left\Vert Y_{ \tau _{l+1} ^N (\omega )+1} ^N (\omega )\right\Vert \\&\qquad \times \prod _{n=\tau _{l+1} ^N (\omega )+1 } ^l \left( \left\Vert \mathbf {\Phi } _{t_{n+1},t_{n}} (\omega ) \right\Vert \right) \exp \left( \sum _{n=\tau _{l+1}+1} ^l \left( \frac{\lambda }{N} + \lambda \left\Vert \Delta \textbf{W}_n(\omega )\right\Vert ^2 +\beta _n ^N(\omega ) \right) \right) \\&\quad \le \left\Vert Y_{ \tau _{l+1} ^N (\omega )+1} ^N (\omega )\right\Vert \\&\qquad \times \sup _{u \in \mathbb {N}_{l+1} } \prod _{n=u }^l \left\Vert \mathbf {\Phi } _{t_{n+1},t_{n}} (\omega ) \right\Vert \exp \left( \lambda + \sup _{u \in \mathbb {N}_{l+1} } \sum _{n=u} ^l \left( \lambda \left\Vert \Delta \textbf{W}_n(\omega )\right\Vert ^2 +\beta _n ^N(\omega ) \right) \right) . \end{aligned}$$

By considering (4.4), the following completes the induction step and proof

$$\begin{aligned} \begin{aligned}&\left\Vert Y_{l+1} ^N (\omega )\right\Vert \le (\lambda + \left\Vert x\right\Vert ) \\&\qquad \times \sup _{u \in \mathbb {N}_{l+1} } \prod _{n=u }^l \left\Vert \mathbf {\Phi } _{t_{n+1},t_{n}} (\omega ) \right\Vert \exp \left( \lambda + \sup _{u \in \mathbb {N}_{l+1} } \sum _{n=u} ^l \left( \lambda \left\Vert \Delta \textbf{W}_n(\omega )\right\Vert ^2 +\beta _n ^N(\omega ) \right) \right) \\&\quad =D_{l+1} ^N (\omega ). \end{aligned} \end{aligned}$$

\(\square \)

Lemma 4.3

For all \(p \ge 1\), \( \sup _{N \in \mathbb {N} } \mathbb {E}\left[ \sup _{ n \in \mathbb {N}_{N} } \vert D_n^N \vert ^p \right] < \infty . \)

Proof

By two applications of Hölder's inequality, we have

$$\begin{aligned} \begin{aligned}&\mathop {\sup _{ N \in \mathbb {N}}}_{N \ge 8 \lambda p T } \left\Vert \sup _{n \in \mathbb {N}_{N}} D_n ^N\right\Vert _ {L^{p}(\Omega ,\mathbb {R})} \\&\quad \le e^{\lambda } \left( \mathop {\sup _{ N \in \mathbb {N}}}_{N \ge 8 \lambda p T } \left\Vert \exp \sum _{k=0} ^{N-1} \left( \lambda \left\Vert \Delta \textbf{W}_k \right\Vert ^2 \right) \right\Vert _ {L^{2p}(\Omega ,\mathbb {R})} \right) \\&\qquad \times \sup _{ N \in \mathbb {N}} \left\Vert (\lambda + \left\Vert x \right\Vert ) \sup _{n \in \mathbb {N}_{N} } \sup _{u \in \mathbb {N}_{n} } \prod _{k=u }^{n-1 }\left\Vert \mathbf {\Phi } _{t_{k+1},t_{k}} \right\Vert \sup _{n \in \mathbb {N}_{N} } \exp \left( \sup _{u \in \mathbb {N}_{n} }\sum _{k=u} ^{n-1} \ \beta _k ^N \right) \right\Vert _ {L^{2p}(\Omega ,\mathbb {R})} \\&\quad \le e^{\lambda } \left( \mathop {\sup _{ N \in \mathbb {N}}}_{N \ge 8 \lambda p T } \left\Vert \exp \sum _{k=0} ^{N-1} \left( \lambda \left\Vert \Delta \textbf{W}_k \right\Vert ^2 \right) \right\Vert _ {L^{2p}(\Omega ,\mathbb {R})} \right) \\&\qquad \times \left( \sup _{ N \in \mathbb {N}} \left\Vert \sup _{n \in \mathbb {N}_{N} } \exp \left( \sup _{u \in \mathbb {N}_{n} }\sum _{k=u} ^{n-1} \ \beta _k ^N \right) \right\Vert _ {L^{4p}(\Omega ,\mathbb {R})} \right) \\&\qquad \times \left( \sup _{ N \in \mathbb {N}} \left\Vert (\lambda + \left\Vert x \right\Vert ) \sup _{n \in \mathbb {N}_{N} } \sup _{u \in \mathbb {N}_{n} } \prod _{k=u }^{n-1 }\left\Vert \mathbf {\Phi } _{t_{k+1},t_{k}} \right\Vert \right\Vert _ {L^{4p}(\Omega ,\mathbb {R})} \right) . \end{aligned} \end{aligned}$$

Then by (4.3) and since \(\lambda \) and x are deterministic we get

$$\begin{aligned}&\mathop {\sup _{ N \in \mathbb {N}}}_{N \ge 8 \lambda p T } \left\Vert \sup _{n \in \mathbb {N}_{N}} D_n ^N\right\Vert _ {L^{p}(\Omega ,\mathbb {R})} \\&\quad \le e^{\lambda } \left( \lambda + \left\Vert x \right\Vert \right) \left( \mathop {\sup _{ N \in \mathbb {N}}}_{N \ge 8 \lambda p T } \left\Vert \exp \sum _{k=0} ^{N-1} \left( \lambda \left\Vert \Delta \textbf{W}_k \right\Vert ^2 \right) \right\Vert _ {L^{2p}(\Omega ,\mathbb {R})} \right) \\&\quad \quad \times \left( \sup _{ N \in \mathbb {N}} \left\Vert \sup _{n \in \mathbb {N}_{N} } \exp \left( \sup _{u \in \mathbb {N}_{n} }\sum _{k=u} ^{n-1} \ \beta _k ^N \right) \right\Vert _ {L^{4p}(\Omega ,\mathbb {R})} \right) \\&\quad \quad \times \left( \sup _{ N \in \mathbb {N}} \prod _{k=0 }^{N-1 } \left\Vert \exp \left( \left\Vert ( A-\frac{1}{2} \sum _{i=1} ^m B_i ^2)\Delta t\right\Vert \right) \prod _{i=1}^m \exp \left( \left\Vert B_i\right\Vert \vert \Delta W^i_k \vert \right) \right\Vert _ {L^{4p}(\Omega ,\mathbb {R})} \right) . \end{aligned}$$

The second and third terms are shown to be bounded in [15, Lemma 3.5]. On the other hand, the boundedness of the last term on the RHS is due to the \(\Delta t\) dependence of the upper bound of \(\mathbb {E}\left[ \exp (\left\Vert B_i\right\Vert \vert \Delta W^i_k \vert ) \right] \) as shown in the proof of Lemma 4.1 (ii). \(\square \)

Lemma 4.4

Let \(\bar{Y}_t \) be given by (2.10). For all \(p \ge 2\) and \(t \in (t_n,t_{n+1}]\), there exists \(K=K(p,T)>0\) such that

$$\begin{aligned} \left\Vert \bar{Y}_t \right\Vert _ {L^p(\Omega ,\mathbb {R}^d)} ^2 \le K \left( 1 + \left\Vert \bar{Y}_{t_n} \right\Vert _ {L^p(\Omega ,\mathbb {R}^d)} ^2\right) . \end{aligned}$$
(4.9)

Proof

Note that \(\bar{Y}_t \) is the solution to a linear Itô SDE

$$\begin{aligned} \bar{Y}_t =\bar{Y}_{t_n} + \int _{t_n} ^ {t} \left( A \bar{Y}_s + \mathbf {\Phi } _{s,t_n} F^{tm} _{\Delta t}(\bar{Y}_{t_n} ) \right) {{\textrm{d}}}{s}+ \sum _{i=1}^{m}\int _{t_n} ^ {t} \left( B_i \bar{Y}_s +\mathbf {\Phi } _{s,t_n } g_i(\bar{Y}_{t_n} ) \right) {{{\textrm{d}}}W}_s^i \end{aligned}$$

with initial condition \(\bar{Y}_{t_n} \) on the interval \([t_n,t]\) by Lemma 2.1. After adding and subtracting the terms \(\sum _{i=1}^{m} \int _{t_n}^t \mathbf {\Phi } _{s,t_n} g_i(0) {{{\textrm{d}}}W}_{s}^i\), we have by Jensen inequality for sums and integrals as well as the Burkholder–Davis–Gundy inequality (see [25])

$$\begin{aligned} \mathbb {E}\left[ \left\Vert \bar{Y}_t \right\Vert ^{p} \right] \le \,&6 ^{p-1} \left( \mathbb {E}\left[ \left\Vert \bar{Y}_{t_n} \right\Vert ^{p} \right] +\left\Vert A\right\Vert ^p (t-t_n)^{p-1} \int _{t_n}^t \mathbb {E}\left[ \left\Vert \bar{Y}_s \right\Vert ^{p} \right] {{\textrm{d}}}{s}\right. \\&\left. + (t-t_n)^{p-1} \int _{t_n}^t \mathbb {E}\left[ \left\Vert \mathbf {\Phi } _{s,t_n} \alpha (\Delta t,\bar{Y}_{t_n} )F(\bar{Y}_{t_n} ) \right\Vert ^{p} \right] {{\textrm{d}}}{s}\right. \\&\left. +(t-t_n)^{p/2-1} \sum _{i=1}^m \left\Vert B_i\right\Vert ^p \int _{t_n}^t \mathbb {E}\left[ \left\Vert \bar{Y}_s \right\Vert ^ {p} \right] {{\textrm{d}}}{s}\right. \\&\left. +(t-t_n)^{p/2-1} \sum _{i=1}^m \int _{t_n}^t \mathbb {E}\left[ \left\Vert \mathbf {\Phi } _{s,t_n} \left( g_i(\bar{Y}_{t_n} ) -g_i(0)\right) \right\Vert ^ {p} \right] {{\textrm{d}}}{s}\right) . \end{aligned}$$

Using Lemma 4.1, that \( \left\Vert \alpha (\Delta t,\bar{Y}_{t_k} )F(\bar{Y}_{t_k} ) \Delta t\right\Vert \le 1\) and the global Lipschitz property of \(g_i\), the desired inequality is achieved from an application of the continuous Gronwall Lemma. \(\square \)

We now use Lemmas 4.1–4.4 to prove the scheme has bounded moments.

4.1 Proof of Theorem 2.4: bounded moments

Proof

Consider the Itô equation for the continuous extension of the numerical solution given in Lemma 2.1 on \([0, t_n]\)

$$\begin{aligned} \bar{Y}_{t_n} =x+\sum _{k=0}^{n-1} \int _{t_k}^{t_{k+1}} \left( A\bar{Y}_s +\mathbf {\Phi } _{s,t_k} \alpha (\Delta t,\bar{Y}_{t_k} ) F(\bar{Y}_{t_k} ) \right) {{\textrm{d}}}{s}\\ +\sum _{k=0}^{n-1} \sum _{i=1}^m \int _{t_k}^{t_{k+1}} \left( B_i\bar{Y}_s +\mathbf {\Phi } _{s,t_k} g_i(\bar{Y}_{t_k} ) \right) {{{\textrm{d}}}W}_{s}^i. \end{aligned}$$

By adding and subtracting the terms \(\sum _{i=1}^m \sum _{k=0}^{n-1} \int _{t_k}^{t_{k+1}} \mathbf {\Phi } _{s,t_k} g_i(0) {{{\textrm{d}}}W}_{s}^i\) and using the triangle inequality, we have

$$\begin{aligned} \left\Vert \bar{Y}_{t_n} \right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} \le&\left\Vert x\right\Vert +\left\Vert A\right\Vert \sum _{k=0}^{n-1} \int _{t_k}^{t_{k+1}} \left\Vert \bar{Y}_s \right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} {{\textrm{d}}}{s}\\&+ \sum _{k=0}^{n-1} \int _{t_k}^{t_{k+1}} \left\Vert \mathbf {\Phi } _{s,t_k} \alpha (\Delta t,\bar{Y}_{t_k} )F(\bar{Y}_{t_k} ) \right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} {{\textrm{d}}}{s}\\&+\left\Vert \sum _{k=0}^{n-1} \int _{t_k}^{t_{k+1}} \sum _{i=1}^m B_i\bar{Y}_s {{{\textrm{d}}}W}_{s}^i\right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} \\&+\left\Vert \sum _{k=0}^{n-1} \int _{t_k}^{t_{k+1}} \mathbf {\Phi } _{s,t_k} \sum _{i=1}^mg_i(0) {{{\textrm{d}}}W}_{s}^i\right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} \\&+\left\Vert \sum _{k=0}^{n-1} \int _{t_k}^{t_{k+1}} \mathbf {\Phi } _{s,t_k} \sum _{i=1}^{m} \left( g_i(\bar{Y}_{t_k} ) -g_i(0) \right) {{{\textrm{d}}}W}_{s}^i \right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} . \end{aligned}$$

By Lemma 4.1, that \( \left\Vert \alpha (\Delta t,\bar{Y}_{t_k} )F(\bar{Y}_{t_k} ) \Delta t\right\Vert \le 1 \) in (2.8), along with the Burkholder–Davis–Gundy inequality we have

$$\begin{aligned} \begin{aligned}&\left\Vert \bar{Y}_{t_n} \right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} \le \left\Vert x\right\Vert +\left\Vert A\right\Vert \sum _{k=0}^{n-1} \int _{t_k}^{t_{k+1}} \left\Vert \bar{Y}_s \right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} {{\textrm{d}}}{s}+ K N \\&+p\left( \sum _{k=0}^{n-1} \int _{t_k}^{t_{k+1}} \sum _{i=1}^m \left\Vert B_i\bar{Y}_s \right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} ^2 {{\textrm{d}}}{s}\right) ^{1/2} \\&+p K\left( \sum _{i=1}^m \left\Vert g_i(0)\right\Vert ^2 T \right) ^{1/2}+K^2 \sqrt{m} p \left( \sum _{k=0}^{n-1} \int _{t_k}^{t_{k+1}} \left\Vert \bar{Y}_{t_k} \right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} ^2 {{\textrm{d}}}{s}\right) ^{1/2}. \end{aligned} \end{aligned}$$

By taking the square of both sides and applying the Jensen inequalities for sums and integrals

$$\begin{aligned} \begin{aligned} \left\Vert \bar{Y}_{t_n} \right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} ^2 \le&4 \left( \left\Vert x\right\Vert +p K \left( \sum _{i=1}^m \left\Vert g_i(0)\right\Vert ^2 T \right) ^{1/2} + K N \right) ^2 \\&+4 \left\Vert A\right\Vert ^2 T \sum _{k=0}^{n-1} \int _{t_k}^{t_{k+1}} \left\Vert \bar{Y}_s \right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} ^2 {{\textrm{d}}}{s}\\&+4 p^2 \sum _{k=0}^{n-1} \int _{t_k}^{t_{k+1}} \sum _{i=1}^m \left\Vert B_i\bar{Y}_s \right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} ^2 {{\textrm{d}}}{s}\\&+ 4 K^4 m p^2 \sum _{k=0}^{n-1} \int _{t_k}^{t_{k+1}} \left\Vert \bar{Y}_{t_k} \right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} ^2 {{\textrm{d}}}{s}. \end{aligned} \end{aligned}$$
(4.10)

Substituting (4.9) into (4.10) and rewriting the resulting integral inequality in discrete form, we have

$$\begin{aligned} \begin{aligned}&\left\Vert Y_n^N\right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} ^2 \le 4 \left( \left\Vert x\right\Vert +p K\left( \sum _{i=1}^m \left\Vert g_i(0)\right\Vert ^2 T \right) ^{1/2}+ K N \right) ^2 \\&\quad + 4 \left\Vert A\right\Vert ^2 T \sum _{k=0}^{n-1} K (1+\left\Vert Y_k ^N\right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} ^2) \frac{T}{N} \\&\quad +4 p^2 \sum _{k=0}^{n-1} \sum _{i=1}^m \left\Vert B_i\right\Vert ^2 K(1+\left\Vert Y_k ^N\right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} ^2) \frac{T}{N} + 4 K^4 m p^2 \sum _{k=0}^{n-1} \left\Vert Y_k ^N\right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} ^2 \frac{T}{N}. \end{aligned} \end{aligned}$$

By the discrete Gronwall inequality

$$\begin{aligned} \begin{aligned} \sup _{ n \in \mathbb {N}_{N} } \left\Vert Y_n^N\right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} \le C_{\left\Vert A\right\Vert ,\left\Vert B_i\right\Vert , T,p,K} \left( \left\Vert x\right\Vert +p K\left( \sum _{i=1}^m \left\Vert g_i(0)\right\Vert ^2 T \right) ^{1/2} \right. \\ \left. + KN +\left\Vert A\right\Vert T\sqrt{K}+p\sum _{i=1}^m \left\Vert B_i\right\Vert \sqrt{K T} \right) . \end{aligned} \end{aligned}$$
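
For the reader's convenience, we recall the standard form of the discrete Gronwall inequality used here: if non-negative quantities \(a_0,\ldots ,a_N\) (here \(a_n=\left\Vert Y_n^N\right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} ^2\)) satisfy, for constants \(\alpha , \beta \ge 0\),

$$\begin{aligned} a_n \le \alpha + \beta \, \frac{T}{N} \sum _{k=0}^{n-1} a_k, \qquad n \in \mathbb {N}_{N}, \end{aligned}$$

then \(a_n \le \alpha \exp \left( \beta n T/N \right) \le \alpha \, e^{\beta T}\); taking square roots gives the bound displayed above.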

Following the bootstrap argument presented in [15, Lemma 3.9] to deal with the term \(KN\) on the right-hand side, we get

$$\begin{aligned} \sup _{N \in \mathbb {N} }\sup _{n \in \mathbb {N}_{N} } \left\Vert 1_{(\Omega _n^N)^c}Y_n^N \right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} < \infty \end{aligned}$$
(4.11)

and

$$\begin{aligned} \mathop {\sup _{N \in \mathbb {N}}}_{N \ge 8\lambda p T }\sup _{n \in \mathbb {N}_{N} } \left\Vert 1_{\Omega _n^N} Y_n^N\right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} \le \mathop {\sup _{N \in \mathbb {N}}}_{N \ge 8\lambda p T }\sup _{n \in \mathbb {N}_{N} } \left\Vert D_n^N\right\Vert _ {L^{p}(\Omega ,\mathbb {R}^d)} . \end{aligned}$$
(4.12)

The boundedness of the term on the right-hand side of (4.12) is proved in Lemma 4.3. Hence the proof is complete. \(\square \)
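
Before turning to the weak error, we illustrate how the solution operator \(\mathbf {\Phi } _{s,t}\) used above can be realised. Under the commutativity condition (1.2), the solution of the linear SDE \({\textrm{d}}\mathbf {\Phi } _{s,t} v=A\mathbf {\Phi } _{s,t} v \,{{\textrm{d}}}{s}+\sum _{i=1}^m B_i\mathbf {\Phi } _{s,t} v \,{{{\textrm{d}}}W}_{s}^i\) (cf. (2.5)) is the GBM exponential \(\mathbf {\Phi } _{s,t}=\exp \big ( (A-\tfrac{1}{2}\sum _{i=1}^m B_i^2)(s-t)+\sum _{i=1}^m B_i (W^i_s-W^i_t)\big )\). The following minimal Python sketch samples \(\mathbf {\Phi } _{t+\Delta t,t}\) for a single step; the matrices and the function name are illustrative assumptions only.

```python
import numpy as np
from scipy.linalg import expm

def sample_gbm_flow(A, B_list, dt, rng):
    """Sample the GBM solution operator Phi_{t+dt,t} for commuting A, B_i.

    Under the commutativity condition (1.2), Phi has the closed form
    expm((A - 0.5*sum_i B_i^2)*dt + sum_i B_i*dW_i).
    """
    dW = rng.normal(0.0, np.sqrt(dt), size=len(B_list))
    M = (A - 0.5 * sum(B @ B for B in B_list)) * dt
    for B, dw in zip(B_list, dW):
        M = M + B * dw
    return expm(M)

# Toy usage: d = 2, m = 1, with commuting (diagonal) A and B_1.
rng = np.random.default_rng(0)
A = np.diag([-1.0, -0.5])
B1 = np.diag([0.3, 0.2])
Phi = sample_gbm_flow(A, [B1], dt=0.01, rng=rng)
```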

5 Proof of weak convergence

We now take C as a generic constant that is independent of the time step size \(\Delta t\).

Lemma 5.1

Let the Assumptions of Theorem 2.3 hold. For each \(T>0\), there exists \(C>0\) such that for \(0 \le t \le T\)

$$\begin{aligned} \vert \mathbb {E}\left[ \textbf{D} \Psi (t,x)\xi \right] \vert \le C \left\Vert \mathbb {E}\left[ \xi \right] \right\Vert , \qquad \text {for all } \xi \in L^p(\Omega , \mathbb {R}^d), \quad p\ge 1. \end{aligned}$$
(5.1)

Proof

By interchanging the derivative and expectation (see the discussion in [4, Sections 5.1 and 5.2]), the chain rule and the definition of \(\Psi \) in (2.2), we have

$$\begin{aligned} \vert \mathbb {E}\left[ \textbf{D} \Psi (t,x)\xi \right] \vert&=\vert \textbf{D} \Psi (t,x) \mathbb {E}\left[ \xi \right] \vert \\&=\vert \mathbb {E}\left[ \textbf{D} \phi (X_t ^x) \textbf{D}_x X_t ^x \right] \mathbb {E}\left[ \xi \right] \vert . \end{aligned}$$

Then, by the deterministic Cauchy–Schwarz and triangle inequalities,

$$\begin{aligned} \vert \mathbb {E}\left[ \textbf{D} \Psi (t,x)\xi \right] \vert&\le \left\Vert \mathbb {E}\left[ \textbf{D} \phi (X_t ^x) \textbf{D}_x X_t ^x \right] \right\Vert \left\Vert \mathbb {E}\left[ \xi \right] \right\Vert \nonumber \\&\le \mathbb {E}\left[ \left\Vert \textbf{D} \phi (X_t ^x) \textbf{D}_x X_t ^x \right\Vert \right] \left\Vert \mathbb {E}\left[ \xi \right] \right\Vert \nonumber \\&\le \mathbb {E}\left[ \left\Vert \textbf{D} \phi (X_t ^x) \right\Vert ^2 \right] ^{1/2} \mathbb {E}\left[ \left\Vert \textbf{D}_x X_t ^x \right\Vert ^2 \right] ^{1/2} \left\Vert \mathbb {E}\left[ \xi \right] \right\Vert \end{aligned}$$

where in the last step we used the Cauchy–Schwarz inequality with respect to the expectation. The fact that \(\phi \in C^2_b(\mathbb {R}^d)\) and Theorem 2.2 complete the proof. \(\square \)

Considering the definition of the function \(\Psi \) in (2.2), we define

$$\begin{aligned} u(t,\bar{Y}_t):=\Psi (T-t, \bar{Y}_t). \end{aligned}$$
(5.2)

We point out that inequality (5.1) implies, for u defined in (5.2), that

$$\begin{aligned} \vert \mathbb {E}\left[ \textbf{D} u(s,\bar{Y}_s )\xi \right] \vert \le C \left\Vert \mathbb {E}\left[ \xi \right] \right\Vert \end{aligned}$$
(5.3)

where \(\xi \) is an \(\mathbb {R}^d\)-valued random variable. We are now in a position to prove weak convergence.

5.1 Proof of Theorem 2.5: weak convergence

Proof

Recall the definition of \(\Psi (t,x)\) from (2.2). Applying the Itô formula to \(\Psi (t,\bar{Y}_t )\) on \([t_n,t_{n+1})\), with \(\bar{Y}_t \) given by (2.11), gives

$$\begin{aligned} {\textrm{d}}\Psi (t, \bar{Y}_t )=\left( \frac{\partial }{\partial t} +\mathcal {\hat{L}}_n(t) \right) \Psi (t,\bar{Y}_t ){{\textrm{d}}}{t}+\sum _{i=1}^ m \mathcal {\hat{L}}^i_n (t) \Psi (t,\bar{Y}_t ) {\textrm{d}}W^i_t \end{aligned}$$
(5.4)

where

$$\begin{aligned} \begin{aligned} \mathcal {\hat{L}}_n (t) \Psi (t,\bar{Y}_t )&:=\textbf{D} \Psi (t,\bar{Y}_t ) \mu ^{tm} ( t,t_n ) +\frac{1}{2} \sum _{i=1}^ m \sigma _i ^{tm} ( t,t_n ) ^ \intercal \textbf{D}^2 \Psi (t,\bar{Y}_t ) \sigma _i ^{tm} ( t,t_n ) \\ \mathcal {\hat{L}}^i_n (t)\Psi (t,\bar{Y}_t )&:=\textbf{D} \Psi (t,\bar{Y}_t ) \sigma _i ^{tm} ( t,t_n ) \end{aligned} \end{aligned}$$

and \(\mu ^{tm}\) and \(\sigma _i ^{tm}\) are defined in (2.12). Now we define the error function e by

$$\begin{aligned} e:=\vert \mathbb {E}\left[ \Psi (0, \bar{Y}_T) \right] -\Psi (T,\bar{Y}_0)\vert =\bigg \vert \mathbb {E}\left[ \phi (\bar{Y}_T) -\phi (X_T^x) \vert X_0=x \right] \bigg \vert . \end{aligned}$$

By the definition of u in (5.2) and a telescoping sum, we have

$$\begin{aligned} e=\vert \mathbb {E}\left[ u(T,\bar{Y}_T) \right] -u(0,\bar{Y}_0)\vert =\Big \vert \sum _{k=0}^{N-1} \mathbb {E}\left[ u(t_{k+1},\bar{Y}_{t_{k+1}})-u(t_{k},\bar{Y}_{t_k}) \right] \Big \vert . \end{aligned}$$

An application of Itô's formula to \(u(s,\bar{Y}_s )\), using (5.4) on each of the subintervals, together with the zero expectation of Itô integrals, gives

$$\begin{aligned} e=\Bigg \vert \sum _{k=0}^{N-1} \mathbb {E}\left[ \int _{s=t_k}^{s=t_{k+1}} \left( \frac{\partial }{\partial s} u(s, \bar{Y}_s) + \mathcal {\hat{L}}_k(s) u(s,\bar{Y}_s ) \right) {{\textrm{d}}}{s} \right] \Bigg \vert . \end{aligned}$$

On the other hand, \(u(s,\bar{Y}_s )=\Psi (T-s, \bar{Y}_s ) \) satisfies the Kolmogorov PDE (2.3)

$$\begin{aligned} \frac{\partial }{\partial s} u(s,\bar{Y}_s )=-\mathcal {L}u(s,\bar{Y}_s ). \end{aligned}$$

So we have

$$\begin{aligned} e=\bigg \vert \sum _{k=0}^{N-1} \mathbb {E}\left[ \int _{t_k}^{t_{k+1}} \left( \mathcal {\hat{L}}_k(s) u(s,\bar{Y}_s )-\mathcal {L}u(s,\bar{Y}_s ) \right) {{\textrm{d}}}{s} \right] \bigg \vert . \end{aligned}$$
(5.5)

Consider the integrand

$$\begin{aligned} \begin{aligned} \mathcal {\hat{L}}_k(s)&u(s,\bar{Y}_s ) -\mathcal {L}u(s,\bar{Y}_s ) = \textbf{D} u(s,\bar{Y}_s ) \left( \mu ^{tm} ( s,t_k ) -\mu (\bar{Y}_s )\right) \\&\quad +\frac{1}{2} \sum _{i=1}^ m \sigma _i ^{tm} ( s,t_k ) ^\intercal \textbf{D}^2 u(s,\bar{Y}_s ) \sigma _i ^{tm} ( s,t_k ) - \frac{1}{2} \sum _{i=1}^ m \sigma _i(\bar{Y}_s )^\intercal \textbf{D}^2 u(s,\bar{Y}_s ) \sigma _i(\bar{Y}_s ) \\&= \textbf{ D} u(s,\bar{Y}_s ) \left( \mathbf {\Phi } _{s,t_k } F^{tm} _{\Delta t}(\bar{Y}_{t_k} ) - F(\bar{Y}_s ) \right) \\ {}&+\sum _{i=1}^ m (B_i \bar{Y}_s )^\intercal \textbf{D}^2 u(s,\bar{Y}_s ) \left( \mathbf {\Phi } _{s,t_k } g_i (\bar{Y}_{t_k} ) - g_i(\bar{Y}_s )\right) \\&\quad +\frac{1}{2}\sum _{i=1}^ m \left( \mathbf {\Phi } _{s,t_k } g_i (\bar{Y}_{t_k} )\right) ^\intercal \textbf{D} ^2 u(s,\bar{Y}_s ) \left( \mathbf {\Phi } _{s,t_k } g_i (\bar{Y}_{t_k} )\right) \\&\quad -\frac{1}{2} \sum _{i=1}^ m g_i (\bar{Y}_s ) ^\intercal \textbf{D} ^2 u(s,\bar{Y}_s ) g_i (\bar{Y}_s ). \end{aligned} \end{aligned}$$

Adding and subtracting the terms \(\textbf{D} u(s,\bar{Y}_s ) F^{tm} _{\Delta t}(\bar{Y}_{t_k} ) \) and \(\textbf{D} u(s,\bar{Y}_s )F(\bar{Y}_{t_k} )\), and defining

$$\begin{aligned} \Theta _1^k(s,\textbf{x})&:= \sum _{i=1}^ m \left( B_i \bar{Y}_s \right) ^ \intercal \textbf{D}^2 u(s,\textbf{x}) \left( \mathbf {\Phi } _{s,t_k } g_i (\bar{Y}_{t_k} ) \right) \\ \Theta _2^k(s,\textbf{x})&:=\sum _{i=1}^ m \left( B_i \bar{Y}_s \right) ^ \intercal \textbf{D}^2 u(s,\textbf{x}) g_i(\textbf{x}) \\ \Theta _3^k(s,\textbf{x})&:= \frac{1}{2} \sum _{i=1}^ m \left( \mathbf {\Phi } _{s,t_k} g_i(\bar{Y}_{t_k} )\right) ^\intercal \textbf{D}^2 u(s,\textbf{x}) \left( \mathbf {\Phi } _{s,t_k} g_i(\bar{Y}_{t_k} )\right) \\ \Theta _4^k(s,\textbf{x})&:=\frac{1}{2} \sum _{i=1}^ m g_i(\textbf{x}) ^\intercal \textbf{D}^2 u(s,\textbf{x}) g_i(\textbf{x}) \end{aligned}$$

such that \(\Theta _1^k(t_k, \bar{Y}_{t_k} )=\Theta _2^k(t_k, \bar{Y}_{t_k} )\), \(\Theta _3^k(t_k, \bar{Y}_{t_k} )=\Theta _4^k(t_k, \bar{Y}_{t_k} )\), we have

$$\begin{aligned} \mathcal {\hat{L}}_k(s) u(s,\bar{Y}_s )-\mathcal {L}u(s,\bar{Y}_s )&:= T_1^k(s)+T_2^k(s)+T_3^k(s)+T_4^k(s)\\&\quad +T_5^k(s)+T_6^k(s)+T_7^k(s) \end{aligned}$$

where

$$\begin{aligned} T_1^k(s)&:=\textbf{D} u(s,\bar{Y}_s ) \left( \mathbf {\Phi } _{s,t_k} F^{tm} _{\Delta t}(\bar{Y}_{t_k} )- F^{tm} _{\Delta t}(\bar{Y}_{t_k} )\right)&T_4 ^k (s)&:= \Theta _1^k(s, \bar{Y}_s )-\Theta _1^k(t_k, \bar{Y}_{t_k} )\\ T_2 ^k(s)&:= \textbf{D} u(s,\bar{Y}_s ) \left( F^{tm} _{\Delta t}(\bar{Y}_{t_k} )- F(\bar{Y}_{t_k} ) \right)&T_5^k (s)&:= \Theta _2^k(t_k, \bar{Y}_{t_k} )-\Theta _2^k(s, \bar{Y}_s )\\ T_3 ^k(s)&:= \textbf{D} u(s,\bar{Y}_s ) \left( F(\bar{Y}_{t_k} )-F(\bar{Y}_s ) \right)&T_6 ^k (s)&:= \Theta _3^k(s, \bar{Y}_s )-\Theta _3^k(t_k, \bar{Y}_{t_k} ) \\&&T_7 ^k (s)&:= \Theta _4^k(t_k, \bar{Y}_{t_k} )- \Theta _4^k(s, \bar{Y}_s ). \end{aligned}$$

In terms of these functions, the error (5.5) is given by

$$\begin{aligned} e=\left| \sum _{k=0}^{N-1} \mathbb {E}\left[ \int _{t_k}^{t_{k+1}} \left( T_1^k(s)\!+\!T_2^k(s)\!+\!T_3^k(s)\!+\!T_4^k(s)\!+\!T_5^k(s)\!+\!T_6^k(s)\!+\!T_7^k(s) \right) {{\textrm{d}}}{s} \right] \right| . \end{aligned}$$

We now seek to bound each \(\left| \mathbb {E}\left[ T_i^k \right] \right| \) by a term of order \((s-t_k)\) or \(\Delta t\) on the subinterval \([t_k,t_{k+1}]\), for \(i=1,\ldots ,7\).
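
Bounds of this type are sufficient for a first order estimate, since summing them over the subintervals gives

$$\begin{aligned} \sum _{k=0}^{N-1} \int _{t_k}^{t_{k+1}} \left( (s-t_k)+\Delta t \right) {{\textrm{d}}}{s}= N\left( \frac{\Delta t ^2}{2}+\Delta t ^2 \right) =\frac{3T}{2}\, \Delta t , \end{aligned}$$

which is of order \(\Delta t\) because \(N \Delta t=T\).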

Starting with \(T_1^k\), the Itô equation for \(\mathbf {\Phi } \) in (2.5), applied to \(\mathcal {F} _ {t_k}\)-measurable, \(\mathbb {R}^d\)-valued random variables \(v\in L^2 (\Omega ,\mathbb {R}^d)\), gives

$$\begin{aligned} \mathbf {\Phi } _{s,t_k} v= v+ \int _{t_k} ^s A \mathbf {\Phi } _{r,t_k} v \,{{\textrm{d}}}{r}+ \int _{t_k} ^s \sum _{i=1}^m B_i\mathbf {\Phi } _{r,t_k} v \,{{{\textrm{d}}}W}_{r}^i. \end{aligned}$$

By the zero expectation of Itô integrals and Fubini’s Theorem,

$$\begin{aligned} \mathbb {E}\left[ \mathbf {\Phi } _{s,t_k} v-v \right] = \int _{t_k} ^s \mathbb {E}\left[ \mathbf {\Phi } _{r,t_k} v \right] dr. \end{aligned}$$

Taking \(v=F^{tm} _{\Delta t}(\bar{Y}_{t_k} )\) and using the boundedness of the operator \(\mathbf {\Phi } \) from Lemma 4.1, the bounded moments of the numerical solution from Theorem 2.4 and the inequality (5.3), we have

$$\begin{aligned} \left| \mathbb {E}\left[ T_1^k(s) \right] \right| \le C \left\Vert \int _{t_k} ^s \mathbb {E}\left[ \mathbf {\Phi } _{r,t_k} F^{tm} _{\Delta t}(\bar{Y}_{t_k} ) \right] dr\right\Vert \le C (s-t_k). \end{aligned}$$

For \(T_2^k\), by the definition of \(F^{tm} _{\Delta t}\) in (2.8) we have

$$\begin{aligned} \left| \mathbb {E}\left[ T_2^k(s) \right] \right| =\left| \mathbb {E}\left[ \textbf{D} u(s,\bar{Y}_s ) \left( -\Delta t\alpha (\Delta t,\bar{Y}_{t_k} )\left\Vert F(\bar{Y}_{t_k} )\right\Vert F(\bar{Y}_{t_k} ) \right) \right] \right| \le C \Delta t, \end{aligned}$$

where the inequality \( \alpha (\Delta t,\bar{Y}_{t_k} ) < 1\) from (2.8), the bounded moments of the numerical solution from Theorem 2.4 and the inequality (5.3) are used.
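
For orientation, these taming properties are immediate for the representative choice \(F^{tm} _{\Delta t}(y)=\alpha (\Delta t,y)F(y)\) with \(\alpha (\Delta t,y)=\left( 1+\Delta t \left\Vert F(y)\right\Vert \right) ^{-1}\) (the precise definition is the one fixed in (2.8)); under this assumption

$$\begin{aligned} F^{tm} _{\Delta t}(y)-F(y)=-\frac{\Delta t \left\Vert F(y)\right\Vert }{1+\Delta t \left\Vert F(y)\right\Vert }\,F(y) =-\Delta t\, \alpha (\Delta t,y)\left\Vert F(y)\right\Vert F(y), \qquad \left\Vert \alpha (\Delta t,y)F(y) \Delta t\right\Vert =\frac{\Delta t \left\Vert F(y)\right\Vert }{1+\Delta t \left\Vert F(y)\right\Vert }\le 1. \end{aligned}$$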

We now consider \(T_3^k\). Itô's formula for \(F_{j}(\bar{Y}_s )\) around \(\bar{Y}_{t_k} \), where \(F_{j}:\mathbb {R}^d \rightarrow \mathbb {R}\) denotes the \(j\)th component of the vector-valued function F for \(j=1,\ldots ,d\), gives

$$\begin{aligned} F_{j}(\bar{Y}_s )=F_{j}(\bar{Y}_{t_k} )+\int _{t_k} ^s \mathcal {\hat{L}}_k F_{j}(\bar{Y}_r ) dr +\sum _{i=1}^ m \int _{t_k} ^s \mathcal {\hat{L}}^i_k F_{j}(\bar{Y}_r ) dW^i _r \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} \mathcal {\hat{L}}_k F_{j}(\bar{Y}_r )&:=\textbf{D} F_{j}(\bar{Y}_r ) \mu ^{tm} ( r,t_k ) +\frac{1}{2} \sum _{i=1}^ m \sigma _i ^{tm} ( r,t_k ) ^ \intercal \textbf{D}^2 F_{j}(\bar{Y}_r ) \sigma _i ^{tm} ( r,t_k ) \\ \mathcal {\hat{L}}^i_k F_{j}(\bar{Y}_r )&:=\textbf{D} F_{j}(\bar{Y}_r ) \sigma _i ^{tm} ( r,t_k ) . \end{aligned} \end{aligned}$$

By zero expectation of the Itô integral and Jensen’s inequality for the integral and expectation, we have

$$\begin{aligned} \left| \mathbb {E}\left[ F_{j}(\bar{Y}_s )-F_{j}(\bar{Y}_{t_k} ) \right] \right| ^2\le (s-t_k)\int _{t_k} ^s \mathbb {E}\left[ \vert \mathcal {\hat{L}}_k F_{j}(\bar{Y}_r ) \vert ^2 \right] dr. \end{aligned}$$

By the polynomial growth assumption on the derivatives of F and the bounded moments of the numerical solution, the integrand on the right-hand side is uniformly bounded, so this term is of order \((s-t_k)^2\). Together with (5.3), we conclude that

$$\begin{aligned} \left| \mathbb {E}\left[ T_3^k(s) \right] \right| \le C\left\Vert \mathbb {E}\left[ F(\bar{Y}_s )-F(\bar{Y}_{t_k} ) \right] \right\Vert \le C (s-t_k). \end{aligned}$$

Now consider the term \(T_4^k\). By applying the Itô formula to the function \(\Theta _1^k(t,\bar{Y}_t )\) over the interval \([t_k,s]\) and taking expectation, we have

$$\begin{aligned} \mathbb {E}\left[ T_4^k(s) \right] = \mathbb {E}\left[ \Theta _1^k(s, \bar{Y}_s )-\Theta _1^k(t_k, \bar{Y}_{t_k} ) \right] =\mathbb {E}\left[ \int _{t_k} ^s \left( \left( \frac{\partial }{\partial r} + \mathcal {\hat{L}}_k\right) \Theta _1^k(r,\bar{Y}_r ) \right) dr \right] . \end{aligned}$$

The integrand is given by

$$\begin{aligned} \bigg (\frac{\partial }{\partial r} +&\mathcal {\hat{L}}_k\bigg )\Theta _1^k(r,\bar{Y}_r ) = \sum _{i=1}^ m \left( B_i \bar{Y}_r \right) ^ \intercal \frac{\partial }{\partial r} \left( \textbf{D}^2 u(r,\bar{Y}_r ) \right) \left( \mathbf {\Phi } _{r,t_k } g_i (\bar{Y}_{t_k} ) \right) \nonumber \\&+\sum _{i=1}^ m \left( B_i \bar{Y}_r \right) ^ \intercal \textbf{D}^2 u(r,\bar{Y}_r ) ( A-\frac{1}{2} \sum _{j=1} ^m B_j ^2) \mathbf {\Phi } _{r,t_k } g_i (\bar{Y}_{t_k} ) \nonumber \\&+\sum _{i=1}^ m \textbf{D} \Big [ \left( B_i \bar{Y}_r \right) ^ \intercal \textbf{D}^2 u(r,\bar{Y}_r ) \left( \mathbf {\Phi } _{r,t_k } g_i (\bar{Y}_{t_k} ) \right) \Big ] \mu ^{tm} ( r,t_k ) \nonumber \\&+\frac{1}{2} \sum _{i=1}^ m \sigma _i ^{tm} ( r,t_k ) ^ \intercal \textbf{D}^2 \Big [ \left( B_i \bar{Y}_r \right) ^ \intercal \textbf{D}^2 u(r,\bar{Y}_r ) \left( \mathbf {\Phi } _{r,t_k } g_i (\bar{Y}_{t_k} ) \right) \Big ] \sigma _i ^{tm} ( r,t_k ) . \end{aligned}$$
(5.6)

By the continuity of all the derivatives of u, the condition that \(\phi \in C^4_b(\mathbb {R}^d)\) and the bounded moments of the numerical method in Theorem 2.4, we have, similarly to the arguments in [33, Appendix B], that \(\mathbb {E}\left[ (\frac{\partial }{\partial r} + \mathcal {\hat{L}}_k)\Theta _1^k(r,\bar{Y}_r ) \right] \) is uniformly bounded on \([t_k,s]\). Therefore we conclude

$$\begin{aligned} \left| \mathbb {E}\left[ T_4^k (s) \right] \right| \le \int _{t_k} ^s \vert \mathbb {E}\left[ \left( \frac{\partial }{\partial r} + \mathcal {\hat{L}}_k\right) \Theta _1^k(r,\bar{Y}_r ) \right] \vert dr \le C (s-t_k). \end{aligned}$$

Similar arguments apply to \(T_5^k\), \(T_6^k\) and \(T_7^k\). Finally, by Fubini's theorem for integrals,

$$\begin{aligned} e&=\left| \sum _{k=0}^{N-1} \mathbb {E}\left[ \int _{t_k}^{t_{k+1}} \left( T_1^k(s)+T_2^k(s)+T_3^k(s)+T_4^k(s)+T_5^k(s)+T_6^k(s)+T_7^k(s)\right) {{\textrm{d}}}{s} \right] \right| \\&\le C \left( \Delta t+ \sum _{k=0}^{N-1} \int _{t_k}^{t_{k+1}} (s-t_k){{\textrm{d}}}{s}\right) \\&= C(T) \Delta t. \end{aligned}$$

This gives the desired order for the weak error of the scheme. \(\square \)
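
As a complement to the analysis, the following minimal Python sketch indicates how the weak rate can be checked empirically on a scalar test problem. It uses a tamed, GBM-based Euler step as a stand-in for the scheme analysed above; the drift F, diffusion g, test function \(\phi \) and all parameters are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Illustrative scalar test problem (assumed):
# dX = (a*X + F(X)) dt + (b*X + g(X)) dW, F one-sided Lipschitz, g Lipschitz.
a, b = -1.0, 0.5
F = lambda x: -x**3           # superlinearly growing, one-sided Lipschitz drift
g = lambda x: np.sin(x)       # globally Lipschitz diffusion perturbation
phi = lambda x: np.cos(x)     # smooth bounded test function
T, x0, M = 1.0, 1.0, 100_000  # horizon, initial value, Monte Carlo sample size

def weak_estimate(N, rng):
    """Monte Carlo estimate of E[phi(Y_T)] for a tamed, GBM-based step.

    Stand-in scheme: the linear part is propagated by the scalar GBM factor
    exp((a - b^2/2)*dt + b*dW) and the nonlinear drift is tamed as
    F/(1 + dt*|F|); this mimics, but is not identical to, the scheme above.
    """
    dt = T / N
    Y = np.full(M, x0)
    for _ in range(N):
        dW = rng.normal(0.0, np.sqrt(dt), size=M)
        gbm = np.exp((a - 0.5 * b**2) * dt + b * dW)
        drift = F(Y) / (1.0 + dt * np.abs(F(Y)))
        Y = gbm * (Y + drift * dt + g(Y) * dW)
    return phi(Y).mean()

rng = np.random.default_rng(1)
ref = weak_estimate(512, rng)                    # fine-step reference value
for N in (8, 16, 32, 64):
    print(N, abs(weak_estimate(N, rng) - ref))   # expect roughly O(T/N) decay
```

Note that for small \(N\) the Monte Carlo error (of order \(M^{-1/2}\)) eventually dominates the weak error, so the observed rate is only indicative.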