1 Introduction

We investigate the use of adaptive time-stepping strategies in the construction of a strongly convergent explicit Milstein-type numerical scheme for a d-dimensional stochastic differential equation (SDE) of Itô-type on the probability space \((\Omega , {\mathcal {F}}, {\mathbb {P}})\),

$$\begin{aligned} X(t)= X(0)+\int _{0}^{t}f(X(r))dr +\sum _{i=1}^{m}\int _{0}^{t}g_i(X(r))dW_i(r), \end{aligned}$$
(1.1)

for \(t\in [0,T]\), \(T\ge 0\), and \(m\in {\mathbb {N}}\), where \(W=[W_1,\dots , W_m]^T\) is an m-dimensional Wiener process, the drift coefficient \(f: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) and the diffusion coefficient \(g: {\mathbb {R}}^d\rightarrow {\mathbb {R}}^{d\times m}\) each satisfy a local Lipschitz condition along with a polynomial growth condition and, together, a monotone condition. Both are twice continuously differentiable; see Assumptions 2.1 and 2.2. Throughout, we take the initial vector \(X(0)=X_0\in {\mathbb {R}}^d\) to be deterministic.

It was pointed out in [34] that, as a consequence of the analysis in [15] and because the Euler–Maruyama and Milstein methods coincide in the additive noise case, an explicit Milstein scheme over a uniform mesh cannot converge in \({L}_p\) to solutions of (1.1). We propose here an adaptive variant of the explicit Milstein method that achieves strong \({L}_2\) convergence of order one to solutions of (1.1). As an immediate consequence, in the case of additive noise an adaptive Euler–Maruyama method also has \({L}_2\) convergence of order one. To prove our convergence result it is essential to introduce a new variant of the admissible class of time-stepping strategies introduced in [17, 18], which we call path-bounded strategies.

Several variants of the fixed-step Milstein method have been proposed; see for example the tamed Milstein [20, 34], projected and split-step backward Milstein [1], truncated Milstein [10], implicit Milstein methods [13, 35] and a recent tamed stochastic Runge–Kutta method of order one [8], all designed to converge strongly to solutions of SDEs with more general drift and diffusion coefficients, such as in (1.1). However, with few exceptions (see [1, 20]), explicit methods of this kind have only been analysed in the case where the diffusion coefficients \(g_i\) satisfy a commutativity condition. We do not impose a commutativity restriction and hence must consider the associated Lévy areas (see Lemma 2.2).

A review of methods that adapt the timestep in order to control local error may be found in the introduction to [17]; we cite here [2, 7, 16, 21, 28, 31] and remark that our purpose is instead to handle the nonlinear response of the discrete system; see also [5, 6] and the discussion in [17, 18]. A common feature of this kind of adaptivity is the use of both a minimum and a maximum time step, where the magnitude of the minimum step is controlled by a free parameter that requires some a priori knowledge on the part of the user. The approach of [5, 6] was recently extended to McKean–Vlasov equations in [30] and includes a Milstein approximation. In addition we note the fully adaptive Milstein method proposed in [14] for a scalar SDE with light constraints on the coefficients. There the authors stated that such a method was easy to implement but hard to analyse, and as a result considered a different, but related, method.

Our framework for adaptivity was introduced in [17] for an explicit Euler–Maruyama method, and has since been extended to SDE systems with monotone coefficients in [18] and to SPDE methods in [3]. These methods all use a backstop method when the chosen strategy attempts to select a stepsize below the minimum step. We demonstrate here, for a path-bounded strategy, that the probability of using the backstop can be made arbitrarily small by choosing an appropriately large \(\rho \), and an appropriately small \(h_{\max }\). This is consistent with observation, and with the intuitive notion that the use of the backstop should be rare in practice.

The structure of the article is as follows. Mathematical preliminaries are considered in Sect. 2, including precise specifications of the conditions imposed on f and each \(g_i\), and the characterisation of an explicit Milstein method on an arbitrary mesh. The construction of the adaptive time-stepping strategy is outlined in Sect. 3, where we formulate the adaptive Milstein scheme with backstop which will be the subject of our main theorem. Both main results, on strong \(L_2\) convergence and on the probability of using the backstop method, are stated in Sect. 4; we defer their proofs to Sect. 7, with the supporting lemmas gathered in Sect. 6. In Sect. 5 we compare the adaptive scheme numerically to fixed-step methods and illustrate both convergence and efficiency. The proof of Lemma 2.2 is in Appendix A.

2 Mathematical preliminaries

We consider the d-dimensional Itô-type SDE (1.1) and for the remainder of the article let \(({\mathcal {F}}_t)_{t\ge 0}\) be the natural filtration of W. For all \(x \in {\mathbb {R}}^d\) and all \(\phi \in \textrm{C}^2({\mathbb {R}}^d,{\mathbb {R}}^d)\), the Jacobian matrix of \(\phi (x)\) is denoted \({\textbf{D}}\phi (x)\in {\mathcal {L}}({\mathbb {R}}^d,{\mathbb {R}}^d)\); the second derivative of \(\phi (x)\) with respect to the vector x forms a 3-tensor and is denoted \({\textbf{D}}^2\phi (x)\in {\mathcal {L}}({\mathbb {R}}^{d\times d},{\mathbb {R}}^d)\); and \([x]^2:=x\otimes x\) stands for the outer product of x with itself. Furthermore, let \(\Vert \cdot \Vert \) denote the standard \(l^2\) norm in \({\mathbb {R}}^d\) and \(\Vert \cdot \Vert _{{\textbf{F}}(a\times b)}\) the Frobenius norm of a matrix in \({\mathbb {R}}^{a\times b}\); for simplicity we write \(\Vert \cdot \Vert _{{\textbf{F}}}\) for the Frobenius norm of a matrix in \({\mathbb {R}}^{d\times d}\). \(\Vert \cdot \Vert _{{\textbf{T}}_3}\) denotes the induced tensor norm (spectral norm) of a 3-tensor in \({\mathbb {R}}^{d\times d\times d}\), defined as \(\big \Vert \cdot \big \Vert _{{\textbf{T}}_3}:=\sup _{h_1,h_2\in {\mathbb {R}}^d, \Vert h_1\Vert ,\Vert h_2\Vert \le 1}\big \Vert \cdot (h_1\otimes h_2)\big \Vert \). For \(a,b\in {\mathbb {R}}\), \(a\vee b\) denotes \(\max \{a,b\}\) and \(a\wedge b\) denotes \(\min \{a,b\}\). We frequently make use of the elementary inequality

$$\begin{aligned} 2ab\le a^2+b^2,\quad a,b\in {\mathbb {R}}, \end{aligned}$$
(2.1)

and of the following two standard extensions of Jensen’s inequality (see [23, Corollary A.10]). For \(f\in L^1\), if \(p\ge 1\),

$$\begin{aligned} \Bigg | \int _{0}^t f(s)ds \Bigg |^p \le t^{p-1}\int _{0}^t |f(s)|^p ds,\quad t\ge 0. \end{aligned}$$
(2.2)

For \(a_i\in {\mathbb {R}}\) and \(p\ge 1\),

$$\begin{aligned} \Bigg | \sum _{i=1}^{n} a_i \Bigg |^p \le n^{p-1}\sum _{i=1}^{n} |a_i|^p, \quad n\in {\mathbb {N}}\backslash \{ 0\}. \end{aligned}$$
(2.3)

We now present our assumptions on f and \(g_i\) in (1.1).

Assumption 2.1

Let \(f\in \textrm{C}^2({\mathbb {R}}^d,{\mathbb {R}}^d)\) and \(g\in \textrm{C}^2({\mathbb {R}}^d,{\mathbb {R}}^{d\times m})\) with \(g_i(x)=[g_{1,i}(x),\dots ,g_{d,i}(x)]^T\in \textrm{C}^2({\mathbb {R}}^d,{\mathbb {R}}^{d})\). For each \(\varkappa \ge 1\) there exists \(L_{\varkappa }>0\) such that

$$\begin{aligned} \big \Vert f(x)-f(y)\big \Vert ^2+\big \Vert g(x)-g(y)\big \Vert ^2_{{\textbf{F}}(d\times m)}\le L_{\varkappa }\big \Vert x-y\big \Vert ^2, \end{aligned}$$
(2.4)

for \(x,y\in {\mathbb {R}}^d\) with \(\Vert x\Vert \vee \Vert y\Vert \le \varkappa \), and there exists \(c\ge 0\) such that for some \(\eta \ge 2\)

$$\begin{aligned} \big \langle x-y, f(x)-f(y)\big \rangle +\frac{\eta -1}{2} \big \Vert g(x)-g(y)\big \Vert ^2_{{\textbf{F}}(d\times m)} \le c\big \Vert x-y\big \Vert ^2. \end{aligned}$$
(2.5)

In addition, for some constants \(c_{3},c_{4},c_{5},c_{6},q_1,q_2\ge 0\) and all \(i=1,\dots ,m\), we have

$$\begin{aligned} \big \Vert {{\textbf {D}}}f(x)\big \Vert _{{{\textbf {F}}}}\le \,\,&c_3(1+\Vert x\Vert ^{q_1+1}), \qquad \quad \big \Vert {{\textbf {D}}}g_i(x)\big \Vert _{{{\textbf {F}}}} \le \,\, c_4(1+\Vert x\Vert ^{q_2+1}), \end{aligned}$$
(2.6)
$$\begin{aligned} \big \Vert f(x)\big \Vert \le \,\,&c_5(1+\Vert x\Vert ^{q_1+2}), \quad \quad \big \Vert g(x)\big \Vert _{{{\textbf {F}}}(d\times m)} \le \,\, c_6(1+\Vert x\Vert ^{q_2+2}). \end{aligned}$$
(2.7)

Furthermore, for some \(c_{1},c_{2}\ge 0\) and all \(i=1,\dots ,m\), we have

$$\begin{aligned} \big \Vert {\textbf{D}}^2f(x)\big \Vert _{{\textbf{T}}_3}\le \,\, c_1(1+\Vert x\Vert ^{q_1}), \quad \big \Vert {\textbf{D}}^2g_i(x)\big \Vert _{{\textbf{T}}_3}\le \,\, c_2(1+\Vert x\Vert ^{q_2}). \end{aligned}$$
(2.8)

Under (2.4) and (2.5), the SDE (1.1) has a unique strong solution on any interval \([0, T]\), \(T < \infty \), on the filtered probability space \((\Omega , {\mathcal {F}}, ({\mathcal {F}}_t )_{t \ge 0}, {\mathbb {P}})\); see [11, 25] and [33].

Assumption 2.2

Suppose that (2.5) in Assumption 2.1 holds with

$$\begin{aligned} \eta \ge 4q + 2q_2+10, \end{aligned}$$

where \(q:=q_1\vee q_2\), and \(q_1\), \(q_2\) are from (2.7) in Assumption 2.1.

We now give the following lemma on moments of the solution.

Lemma 2.1

[26, Lem. 4.2] Let f and g satisfy (2.4) and (2.5), and suppose that Assumption 2.2 holds. If g further satisfies (2.7), then there is a constant \(C_{\texttt {X}} >0\) such that the solution of (1.1) satisfies

$$\begin{aligned} {\mathbb {E}}\biggl [\sup _{s\in [0,T]}\Vert X(s)\Vert ^{\eta -2q_2-2}\biggr ] \le C_{\texttt {X}}. \end{aligned}$$
(2.9)

Next we present the fixed-step Milstein method (see [19, Sec. 10.3]) that is the basis of the adaptive method presented in this article.

Definition 2.1

(Milstein method) For \(n\in {\mathbb {N}}\), \(s\in [t_n, t_{n+1}]\) and given \(Y(t_n)\), the fixed-step Milstein scheme for (1.1), interpolated over the interval \([t_n,t_{n+1}]\), is given by

$$\begin{aligned} Y(s):= & {} Y(t_n)+f\big (Y(t_n)\big )|s-t_n|+\sum _{i=1}^{m}g_i\big (Y(t_n)\big )I_{i}^{t_n,s}\nonumber \\{} & {} +\sum _{i,j=1}^{m}{\textbf{D}}g_i\big (Y(t_n)\big )g_j\big (Y(t_n)\big )I_{j,i}^{t_n,s}, \end{aligned}$$
(2.10)

where following [1, 34], the stochastic integral and the iterated stochastic integral are defined as

$$\begin{aligned} I_{i}^{t_n,s}:=\int _{t_{n}}^{s}dW_i(r), \qquad I_{j,i}^{t_n,s}:=\int _{t_{n}}^{s} \int _{t_{n}}^{r}dW_j(p) dW_i(r). \end{aligned}$$
(2.11)

Expanding the last term in (2.10) we have that

$$\begin{aligned}&\sum _{i,j=1}^{m}{{\textbf {D}}}g_i\big (Y(t_n)\big )g_j\big (Y(t_n)\big )I_{j,i}^{t_n,s}\nonumber \\=\,\,&\frac{1}{2}\sum _{i=1}^{m}{{\textbf {D}}}g_i\big (Y(t_n)\big )g_i\big (Y(t_n)\big )\left( \left( I_{i}^{t_n,s}\right) ^2-|s-t_n|\right) \nonumber \\ {}&+\frac{1}{2}\sum _{{\begin{array}{c} i,j=1\\ i<j \end{array}}}^{m}\Big ({{\textbf {D}}}g_i\big (Y(t_n)\big )g_j\big (Y(t_n)\big )+{{\textbf {D}}}g_j\big (Y(t_n)\big )g_i\big (Y(t_n)\big )\Big )I_{i}^{t_n,s}I_{j}^{t_n,s}\nonumber \\ {}&+\sum _{{\begin{array}{c} i,j=1\\ i<j \end{array}}}^{m}\Big ({{\textbf {D}}}g_i\big (Y(t_n)\big )g_j\big (Y(t_n)\big )-{{\textbf {D}}}g_j\big (Y(t_n)\big )g_i\big (Y(t_n)\big )\Big )A_{ij}^{t_n,s}, \end{aligned}$$
(2.12)

where the term \(A_{ij}^{t_n,s}\) is the Lévy area (see for example [22, Eq. (1.2.2)]) defined by

$$\begin{aligned} A_{ij}^{t_n,s}:=\frac{1}{2}\left( I_{i,j}^{t_n,s}-I_{j,i}^{t_n,s}\right) {,} \end{aligned}$$
(2.13)

and we have used the relations \(I_{i,i}^{t_n,s} = \frac{1}{2}\big ( (I_{i}^{t_n,s})^2 - |s-t_n|\big )\) and \(I_{i,j}^{t_n,s} + I_{j,i}^{t_n,s} = I_{i}^{t_n,s} I_{j}^{t_n,s}\). As mentioned in the introduction, many authors assume the following commutativity condition: \({\textbf{D}}g_i(y)g_j(y)={\textbf{D}}g_j(y)g_i(y)\) for all \(i,j=1,\dots , m\) and \(y\in {\mathbb {R}}^d\). When this holds, the last term in (2.12) vanishes, avoiding the need for any analysis of \(A_{ij}^{t_n,s}\) defined in (2.13). We do not impose such a condition in this paper, and therefore make use of the following conditional moment bounds on the Lévy areas.

Lemma 2.2

(Lévy Area) For all \(i,j =1,\dots ,m\) and \(0\le t_n\le s<T\), consider the pair of Wiener processes \((W_i(r),W_j(r))^T\), \(r\in [t_n,s]\), and the Lévy area \(A_{ij}^{t_n,s}\) defined in (2.13). Then there exists a finite constant \(C_{\texttt {LA}}\), whose explicit form is given in (A.1), such that for \(k\ge 1\)

$$\begin{aligned} {\mathbb {E}}\left[ \big |A_{ij}^{t_n,s}\big |^k \bigg |{\mathcal {F}}_{t_n}\right] \le C^{}_{\texttt {LA}}\left( k\right) \,|s-t_n|^k, \quad a.s. \end{aligned}$$
(2.14)

For proof see Appendix A.

3 Adaptive time-stepping strategies

To deal with the extra terms that arise from Milstein over Euler–Maruyama type discretisations, we introduce a new class of time-stepping strategies in Definition 3.5. Let \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) be a sequence of strictly positive random timesteps with corresponding random times \(\{t_n:=\sum _{i=1}^{n}h_i\}_{n\in {\mathbb {N}}\backslash \{0\}}\), where \(t_0=0\).

Definition 3.1

Suppose that each member of \(\{t_n\}_{n\in {\mathbb {N}}\backslash \{0\}}\) is an \({\mathcal {F}}_t\)-stopping time: i.e. \(\{t_n\le t\}\in {\mathcal {F}}_t\) for all \(t\ge 0\), where \(({\mathcal {F}}_t)_{t\ge 0}\) is the natural filtration of W. If \(\tau \) is any \(({\mathcal {F}}_t)\)-stopping time, then (see [27, p. 14])

$$\begin{aligned} {\mathcal {F}}_{\tau }:=\big \{A\in {\mathcal {F}}:\,A\cap \{\tau \le t\}\in {\mathcal {F}}_t {,\, \text{ for } \text{ all } \,t\ge 0}\big \}. \end{aligned}$$
(3.1)

In particular this allows us to condition on \({\mathcal {F}}_{t_n}\) at any point on the random time-set \(\{t_n\}_{n\in {\mathbb {N}}}\).

Assumption 3.1

For the sequence of random timesteps \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\), there are constant values \(h_{\max }>h_{\min }>0\), \(\rho >1\) such that \(h_{\max }=\rho h_{\min }\), and

$$\begin{aligned} 0<h_{\min } \le h_{n+1} \le h_{\max }\le 1. \end{aligned}$$
(3.2)

In addition, we assume each \(h_{n+1}\) is \({\mathcal {F}}_{t_n}\)-measurable.

Definition 3.2

Let \(N^{(t)}\) be a random integer such that

$$\begin{aligned} N^{(t)}:=\max \big \{n\in {\mathbb {N}}\backslash \{ 0\}: t_{n-1}<{t} \big \}, \end{aligned}$$
(3.3)

and let \(N=N^{(T)}\) and \(t_N=T\), so that T is always the last point on the mesh. Note that \(N^{(t)}\) indicates the step number such that \(t\in \big [t_{N^{(t)}-1},\,t_{N^{(t)}}\big ]\). Furthermore, by Assumption 3.1, \(N^{(t)}\) only takes values in the finite set \(\{N^{(t)}_{\min },\dots ,N^{(t)}_{\max }\}\), where \(N^{(t)}_{\min }:=\lfloor t/h_{\max }\rfloor \) and \(N^{(t)}_{\max }:=\lceil t/h_{\min }\rceil \).

In Assumption 3.1, the lower bound \(h_{\min }\) given by (3.2) ensures that a simulation over the interval [0, T] can be completed in a finite number of time steps. In the event that at time \(t_n\) our strategy attempts to select a stepsize \(h_{n+1} \le h_{\min }\), we instead apply a single step of a backstop method (\(\varphi \) in Definition 3.3 below), a known method that satisfies a mean-square consistency requirement with deterministic step \(h_{n+1}=h_{\min }\) (see also discussion in Remarks 3.1 and 5.1).

First we recall the Milstein method expressed as a map. Over each step \([t_n,t_{n+1}]\) the Milstein map \(\theta :{\mathbb {R}}^d\times {\mathbb {R}} \times {\mathbb {R}}\rightarrow {\mathbb {R}}^d\) is defined as

$$\begin{aligned} \theta \big (x,t_n,s-t_n\big ):= & {} x+(s-t_n)f(x)\nonumber \\{} & {} \quad +\sum _{i=1}^{m}g_i(x)I_{i}^{t_n,s}+\sum _{i,j=1}^{m}{\textbf{D}}g_i(x)g_j(x) I_{j,i}^{t_n,s}. \end{aligned}$$
(3.4)
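
As an illustration, the map (3.4) can be written as a short routine once the increments \(I_{i}^{t_n,s}\) and iterated integrals \(I_{j,i}^{t_n,s}\) are supplied. The following Python sketch is ours and not part of the article; the function name milstein_map and the convention that I2[j, i] stores \(I_{j,i}^{t_n,s}\) are assumptions made for illustration.

```python
import numpy as np

def milstein_map(x, h, dW, I2, f, g, Dg):
    """One application of the Milstein map theta in (3.4); a minimal sketch.

    x  : current state, shape (d,)
    h  : step length s - t_n
    dW : Wiener increments I_i^{t_n,s}, shape (m,)
    I2 : iterated integrals, I2[j, i] approximating I_{j,i}^{t_n,s}, shape (m, m)
    f  : drift, f(x) -> shape (d,)
    g  : diffusion columns, g(x) -> shape (d, m)
    Dg : Jacobians of the columns, Dg(x) -> shape (m, d, d) with Dg(x)[i] = D g_i(x)
    """
    G = g(x)                        # (d, m)
    J = Dg(x)                       # (m, d, d)
    out = x + h * f(x) + G @ dW     # drift term and single stochastic integrals
    m = G.shape[1]
    for i in range(m):
        for j in range(m):
            out = out + (J[i] @ G[:, j]) * I2[j, i]   # D g_i(x) g_j(x) I_{j,i}
    return out
```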

Following [18, Def. 9], we now define an adaptive Milstein scheme combining the Milstein method and a backstop method.

Definition 3.3

(Adaptive Milstein Scheme) Let \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) satisfy Assumption 3.1. Using indicator functions to distinguish the backstop case when \(h_{n+1}=h_{\min }\) (and allowing for the possibility that the final step taken to time T is smaller than \(h_{\min }\), in which case the backstop is also used), we define the continuous form of an adaptive Milstein scheme associated with a particular time-stepping strategy \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) as

$$\begin{aligned} {\widetilde{Y}}(s):=\theta \left( {{\widetilde{Y}}}(t_n)\varvec{,}\,\, t_n\varvec{,}\,\,s-t_n\right) \cdot {\textbf{1}}_{\{h_{\min }<h_{n+1}\le h_{\max }\}}\nonumber \\ +\varphi \left( \widetilde{Y}(t_n)\varvec{,}\,\,t_n\varvec{,}\,\,{s-t_n}\right) \cdot {\textbf{1}}_{\{h_{n+1}{\le } h_{\min }\}}, \end{aligned}$$
(3.5)

for \(s\in [t_n,t_{n+1}]\) and \(n\in {\mathbb {N}}\), where \({\widetilde{Y}}(0)=X(0)\) and \(\theta \) is as given in (3.4). Thus the scheme is characterised by the sequence of tuples, \(\big \{\big (\widetilde{Y}(s)\big )_{s\in [t_n,t_{n+1}]},h_{n+1}\big \}_{n\in {\mathbb {N}}}\). The backstop map \(\varphi :{\mathbb {R}}^d\times {\mathbb {R}} \times {\mathbb {R}} \rightarrow {\mathbb {R}}^d\) in (3.5) satisfies for each \(n \in {\mathbb {N}}\)

$$\begin{aligned} {\mathbb {E}}\left[ \left\| X(s)- \varphi \left( \widetilde{Y}(t_n)\varvec{,}\,\,t_n\varvec{,}\,\, s-t_n\right) \right\| ^2 \bigg |{\mathcal {F}}_{t_n} \right] \le \left\| X(t_n) -\widetilde{Y}(t_n) \right\| ^2\nonumber \\ + C_{B_1} \int _{t_n}^{s} {\mathbb {E}}\left[ \left\| X(r)-\varphi \left( \widetilde{Y}(t_n)\varvec{,}\,\,t_n\varvec{,}\,\, {r-t_n}\right) \right\| ^2 \bigg |{\mathcal {F}}_{t_n}\right] dr+ C_{B_2} h_{\min }^3, \end{aligned}$$
(3.6)

a.s., for positive constants \(C_{B_1}\) and \(C_{B_2}\).

Throughout the article it is notationally convenient to make the following definition.

Definition 3.4

Let \({\widetilde{Y}}\) be as given in Definition 3.3 and define for each \(n\in {\mathbb {N}}\)

$$\begin{aligned} Y_{\theta }(s):=\theta \Big ({\widetilde{Y}}(t_n),t_n,s-t_n\Big ),\quad s\in [t_n,t_{n+1}]. \end{aligned}$$
(3.7)

Remark 3.1

The upper bound \(h_{\max }\) prevents step sizes from becoming too large and allows us to examine strong convergence of the adaptive Milstein method (3.5) to solutions of (1.1) as \(h_{\max }\rightarrow 0\) (and hence as \(h_{\min }\rightarrow 0\)). Note that \(\varphi \) satisfies (3.6) if the backstop method satisfies a mean-square consistency requirement. In practice, instead of testing (3.6), we choose a backstop method that is strongly convergent with rate 1.

Remark 3.2

For all \(i=1,2,\dots , m\), \(I_{i}^{t_n,t_{n+1}}\) in (2.11) is a Wiener increment taken over a random step of length \(h_{n+1}\), which itself may depend on \({{\widetilde{Y}}}(t_n)\); the increment is therefore not necessarily normally distributed. However, since \(h_{n+1}\) is \({\mathcal {F}}_{t_n}\)-measurable, \(I_{i}^{t_n,t_{n+1}}\) is \({\mathcal {F}}_{t_n}\)-conditionally normally distributed and, by the Optional Sampling Theorem (see for example [32]), for all \(p=0,1,2,\dots \)

$$\begin{aligned} {\mathbb {E}}\left[ I_{i}^{t_n,t_{n+1}} \bigg |{\mathcal {F}}_{t_n}\right]&=0,\quad a.s.; \end{aligned}$$
(3.8)
$$\begin{aligned} {\mathbb {E}}\left[ \left| I_{i}^{t_n,t_{n+1}}\right| ^2 \bigg |{\mathcal {F}}_{t_n}\right]&=h_{n+1},\quad a.s.; \end{aligned}$$
(3.9)
$$\begin{aligned} {\mathbb {E}}\left[ \left| I_{i}^{t_n,s}\right| ^{p} \bigg |{\mathcal {F}}_{t_n}\right]&=\varvec{\gamma }_{p}|s-t_n|^{\frac{p}{2}},\quad a.s.; \end{aligned}$$
(3.10)

where \(\varvec{\gamma }_{p}:=2^{p/2}\Gamma \left( (p+1)/2 \right) \pi ^{-1/2}\), and \(\Gamma \) is the Gamma function (see for example [29, p. 148]). In implementation, it is sufficient to replace the sequence of Wiener increments with i.i.d. \({\mathcal {N}} (0, 1)\) random variables scaled at each step by the \({\mathcal {F}}_{t_n}\)-measurable random variable \(\sqrt{h_{n+1}}\).
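
As an illustration of the implementation remark above, a minimal sketch (function name ours) of how the conditionally normal increments might be generated given the \({\mathcal {F}}_{t_n}\)-measurable step \(h_{n+1}\):

```python
import numpy as np

def wiener_increments(h, m, rng):
    """Wiener increments over one step of F_{t_n}-measurable length h (Remark 3.2):
    i.i.d. N(0, 1) draws scaled by sqrt(h), so they are conditionally N(0, h)."""
    return np.sqrt(h) * rng.standard_normal(m)

# example: rng = np.random.default_rng(0); dW = wiener_increments(2.0**-10, 3, rng)
```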

We now provide a specific example of a time-stepping strategy that we use in Sect. 5 and that satisfies the assumptions for our convergence proof in Theorem 4.1. Suppose that for each \(n=0,\dots , N-1\) and some fixed constant \(\kappa >0\), we choose constant values \(h_{\max }>h_{\min }>0\), \(\rho >1\) such that \(h_{\max }=\rho h_{\min }\) and

$$\begin{aligned} {h_{n+1}=h_{\min }\vee \left( h_{\max }\wedge \frac{h_{\max }}{\big \Vert \widetilde{Y}(t_n)\big \Vert ^{1/\kappa }} \right) .} \end{aligned}$$
(3.11)

Then (3.2) in Assumption 3.1 holds for (3.11). Notice also that, from (3.11), the following bound applies on the event \(\{h_{\min }< h_{n+1} \le h_{\max }\}\):

$$\begin{aligned} 0\le \big \Vert {{\widetilde{Y}}}(t_n)\big \Vert < \left( \frac{h_{\max }}{h_{\min }}\right) ^\kappa = \rho ^\kappa . \end{aligned}$$
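
For illustration, a minimal Python sketch of the step-size selection (3.11); the function name and the guard against a zero norm are our own additions:

```python
import numpy as np

def next_step(y, h_max, rho, kappa):
    """Step-size selection of (3.11): h_{n+1} = h_min v (h_max ^ h_max/||y||^{1/kappa}),
    with h_min = h_max / rho.  A return value equal to h_min signals that the
    backstop map phi should be used for the step."""
    h_min = h_max / rho
    norm = np.linalg.norm(y)
    if norm == 0.0:            # the selected step is h_max whenever the norm is small
        return h_max
    return max(h_min, min(h_max, h_max / norm ** (1.0 / kappa)))
```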

The strategy given by (3.11) is admissible in the sense given in [17, 18]. However, it also motivates the following class of time-stepping strategies to which our convergence analysis applies.

Definition 3.5

(Path-bounded time-stepping strategies) Let \(\big \{{{\widetilde{Y}}}(t_n),h_{n+1} \big \}_{n\in {\mathbb {N}}}\) be a numerical approximation for (1.1) given by (3.5), associated with a timestep sequence \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) satisfying Assumption 3.1. We say that \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) is a path-bounded time-stepping strategy for (3.5) if there exist real non-negative constants \(0\le Q<R\) (where R may be infinite if \(Q\ne 0\)) such that on the event \(\{h_{\min }< h_{n+1} \le h_{\max }\}\),

$$\begin{aligned} Q\le \big \Vert {{\widetilde{Y}}}(t_n)\big \Vert < R, \quad n=0,\dots , N-1. \end{aligned}$$
(3.12)

Note that throughout this paper we use a strategy where \(Q=0\) and \(R<\infty \). As we will see in Sect. 5.2, a careful choice of the parameter \(\kappa \) can be used to minimise invocations of the backstop method when \(\rho \) is fixed.

4 Main results

Our first main result shows strong convergence with order 1 of solutions of (3.5) to solutions of (1.1) when \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) is a path-bounded time-stepping strategy ensuring that (3.12) holds.

Theorem 4.1

(Strong Convergence) Let \((X(t))_{t\in [0,T]}\) be a solution of (1.1) with initial value \(X(0) = X_0{\in {\mathbb {R}}^d}\). Suppose that the conditions of Assumptions 2.1 and 2.2 hold. Let \(\big \{\big (\widetilde{Y}(s)\big )_{s\in [t_n,t_{n+1}]},h_{n+1}\big \}_{n\in {\mathbb {N}}}\) be the adaptive Milstein scheme given in Definition 3.3 with initial value for the first component \({{\widetilde{Y}}}_0 = X_0\) and path-bounded time-stepping strategy \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) satisfying the conditions of Definition 3.5 for some \(R<\infty \). Then there exists a constant \(C(R,\rho ,T) > 0\) such that

$$\begin{aligned} \max _{t\in [0,T]}\Big ({\mathbb {E}}\Big [\Vert X(t)-\widetilde{Y}(t)\Vert ^2\Big ]\Big )^{1/2} \le C(R,\rho ,T)\,h_{\max }. \end{aligned}$$
(4.1)

Furthermore,

$$\begin{aligned} \lim _{\rho \rightarrow \infty }C(R,\rho ,T)=\infty . \end{aligned}$$
(4.2)

The proof of Theorem 4.1, which is given in Sect. 7.2, accounts for the properties of the random sequences \(\{t_n\}_{n\in {\mathbb {N}}}\) and \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) and uses (3.12) to compensate for the non-Lipschitz drift and diffusion.

Our second main result shows that for the specific strategy given by (3.11) the probability of needing a backstop method can be made arbitrarily small by taking \(\rho \) sufficiently large for fixed \(\kappa \).

Theorem 4.2

(Probability of Backstop) Let all the conditions of Theorem 4.1 hold, and suppose that the path-bounded time-stepping strategy \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) also satisfies (3.11). Let \(C(R,\rho ,T)\) be the error constant in estimate (4.1) from the statement of Theorem 4.1.

For any fixed \(\kappa \ge 1\) there exists a constant \(C_{\text {prob}}=C_{\text {prob}}(T,R,h_{\max })\) such that, for \(h_{\max }\,\le \,1/C(R,\rho ,T)\),

$$\begin{aligned} {\mathbb {P}}\left[ {h_{n+1}=h_{\min }} \right] \le C_{\text {prob}}\,\,\rho ^{1-2\kappa }. \end{aligned}$$
(4.3)

Further, for an arbitrarily small tolerance \(\varepsilon \in (0,1)\), there exists \(\rho >0\) such that

$$\begin{aligned} {\mathbb {P}}\left[ {h_{n+1}=h_{\min }} \right] <\varepsilon ,\quad n\in {\mathbb {N}}. \end{aligned}$$

For proof see Sect. 7.3.

5 Numerical examples

Remark 5.1

We use the adaptive strategy in (3.11). We ensure that we reach the final time by taking \(h_{N}=T-t_{N-1}\) as our final step, and in a situation where this is smaller than \(h_{\min }\) we use the backstop method (this is compatible with the proofs below).

In the numerical experiments below, we set the adaptive Milstein scheme (AMil) as in (3.5) with (3.11) as the choice of \(h_{n+1}\). Projected Milstein (PMil) [1, Eq. (24)] serves as the backstop method of AMil and as the reference method for all models. We then compare strong convergence, via the root mean square (RMS) error, and efficiency, via CPU time, for AMil, PMil, the split-step backward Milstein method (SSBM) [1, Eq. (25)], the new variant of Milstein (TMil) in [20], and the tamed stochastic Runge–Kutta method of order 1.0 (TSRK1) [8, Eqs. (3.8), (3.9)]. For the non-adaptive schemes, to examine strong convergence, we take as the fixed step \(h_{\text {mean}}\) the average of all time steps over each path and each Monte Carlo realization \(m = 1,\dots , M\), so that

$$\begin{aligned} h_{\text {mean}}:=\frac{1}{M}\sum _{m=1}^{M}\frac{T}{N_{m}}, \end{aligned}$$

where \(N_{m}\) denotes the number of steps taken on the \(m^{th}\) sample path to reach T.
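
For illustration only, a minimal sketch (names ours) of how \(h_{\text {mean}}\) and the RMS error could be computed from stored Monte Carlo output:

```python
import numpy as np

def h_mean_and_rms(T, steps_per_path, errors_at_T):
    """Summary statistics used in the convergence plots; a minimal sketch.

    steps_per_path : N_m, the number of adaptive steps taken on each of the M paths
    errors_at_T    : ||X_ref(T) - Y_tilde(T)|| for each of the M Monte Carlo paths
    """
    N = np.asarray(steps_per_path, dtype=float)
    e = np.asarray(errors_at_T, dtype=float)
    h_mean = np.mean(T / N)          # fixed step handed to the non-adaptive schemes
    rms = np.sqrt(np.mean(e ** 2))   # root mean square error at the final time
    return h_mean, rms
```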

Fig. 1 Strong convergence and efficiency of model (5.1): (a) and (b) additive noise; (c) and (d) multiplicative noise. (e) Two paths of the timestep h for \(\rho =2,6\); (f) the estimated probability of using \(h_{\min }\) for the multiplicative noise model with \(M=100\) realizations

5.1 One-dimensional test equations with multiplicative and additive noise

In order to demonstrate strong convergence of order one for a scalar test equation with non-globally Lipschitz drift, consider

$$\begin{aligned} dX(t)=\big (X(t)-3X(t)^3\big )dt+G(X(t))dW(t), \quad t\in [0,1]. \end{aligned}$$
(5.1)

To illustrate both the multiplicative and additive noise cases, we estimate the RMS error by a Monte Carlo method using \(M=1000\) trajectories for \(h_{\max }=[2^{-14}, 2^{-12}, 2^{-10}, 2^{-8}, 2^{-6}]\), \(\rho =2^2\), \(\kappa =1\), and use as a reference solution PMil over a mesh with uniform step size \(h_{\text {ref}}=2^{-18}\).

For additive noise we set \(G(x)=\sigma \) in (5.1), and for multiplicative noise we set \(G(x)=\sigma (1-x^2)\), with \(\sigma =0.2\) and \(X(0)=11\) in both cases. Strong convergence of order one is displayed by all methods in Fig. 1, parts (a) and (c) for the additive and multiplicative cases respectively, with the efficiency displayed in parts (b) and (d).
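
To make the setup concrete, the following sketch generates a single AMil trajectory for (5.1) with multiplicative noise using strategy (3.11). It is illustrative only: in particular, the backstop below is a plain Milstein step of length \(h_{\min }\), not the PMil method [1, Eq. (24)] used as the backstop in our experiments.

```python
import numpy as np

def amil_path(x0=11.0, T=1.0, sigma=0.2, h_max=2.0**-10, rho=4.0, kappa=1.0, seed=0):
    """One trajectory of the adaptive Milstein scheme (3.5) applied to (5.1) with
    multiplicative noise G(x) = sigma*(1 - x^2).  Minimal sketch: the backstop here
    is an ordinary Milstein step of length h_min, NOT projected Milstein (PMil)."""
    f  = lambda x: x - 3.0 * x**3
    g  = lambda x: sigma * (1.0 - x**2)
    dg = lambda x: -2.0 * sigma * x                      # g'(x)

    def milstein_step(y, h, dW):
        # scalar Milstein step: for d = m = 1 the iterated integral is (dW^2 - h)/2
        return y + h * f(y) + g(y) * dW + 0.5 * g(y) * dg(y) * (dW**2 - h)

    backstop_step = milstein_step     # placeholder standing in for PMil

    rng = np.random.default_rng(seed)
    h_min = h_max / rho
    t, y = 0.0, x0
    while t < T:
        h_raw = min(h_max, h_max / max(abs(y), 1e-300) ** (1.0 / kappa))  # (3.11)
        use_backstop = h_raw <= h_min
        h = min(max(h_raw, h_min), T - t)        # never overshoot the final time T
        use_backstop = use_backstop or h < h_min  # short final step also uses backstop
        dW = np.sqrt(h) * rng.standard_normal()
        y = backstop_step(y, h, dW) if use_backstop else milstein_step(y, h, dW)
        t += h
    return y
```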

Finally, consider Theorem 4.2. We illustrate that the probability of our time-stepping strategy selecting \(h_{\min }\), and therefore triggering an application of the backstop method, can be made arbitrarily small at every step by an appropriate choice of \(\rho \) (with fixed \(\kappa =1\)). Consider (5.1) again with \(G(x)=\sigma (1-x^2)\), this time with \(X(0)=100\), \(\kappa =1\), \(T=1\), \(h_{\max }=2^{-20}\) and \(\rho =[2, 4, \dots , 16]\). In Fig. 1e, we plot two paths of h when \(\rho =2, 6\). Observe that when \(\rho =2\) the backstop is triggered only for approximately the first \(10^5\) steps, whereas once \(\rho \) is increased to 6 this is reduced to approximately the first \(2\times 10^4\) steps. Estimated probabilities of using \(h_{\min }\) are plotted on a log-log scale as a function of \(\rho \) in Fig. 1f (with \(M=100\) realizations). The estimated probability of using \(h_{\min }\) declines to zero as \(\rho \) increases. We observe a rate close to \(-1\), matching that in (4.3) with \(\kappa =1\).

5.2 One-dimensional model of telomere shortening

The following one-dimensional SDE model was given in [9, Eq. (A6)] for modelling the shortening over time of telomere length L in DNA replication:

$$\begin{aligned} dL(t) = -\big (c+aL(t)^2\big )dt + \sqrt{\frac{1}{3}aL(t)^3}dW(t). \end{aligned}$$
(5.2)

The parameter c determines the underlying decay rate of the length and a controls the intensity at which random breaks occur in the telomere; we take \((a,c)=(0.41\times 10^{-6},7.5)\) as in [9]. In this example we fix \(\rho =4\), instead adjusting the parameter \(\kappa \) in (3.11) to control use of the backstop method. Individual paths are shown in Fig. 2 where we take \(h_{\max }=2^{-18}\), and \(h=2^{-20}\) for the fixed step methods.

We set \(L(0)=1000\), noting from [9] that initial values could be as high as (say) \(L(0)=6000\) and remain physically realistic. The end of the interval of valid simulation is determined by the first time at which trajectories reach zero, and is therefore random. However this is not observed to occur in the timescale (25 days) we consider here.

By design, PMil projects the data onto a ball whose radius is determined in part by the growth of the drift term. We see in Fig. 2a that the PMil path is immediately reduced to approximately 200.

By contrast, the design of TMil scales both drift and diffusion terms by \(1/(1+h|L|^2)\) for this model. When \(h|L|^2\) is large this scaling can damp out changes from step to step, and in Fig. 2a we see that the TMil path is (spuriously) almost constant. The paths of the other methods, AMil, SSBM and TSRK1, are close together, as shown in Fig. 2a and in higher detail in (b).

Notice that we used \(\kappa =8\) in (3.11) for the AMil method to reduce the chance of requiring the backstop method PMil while keeping \(\rho =4\). We avoid setting \(\kappa =1\) in this case because \(L(0)=1000\): with \(\rho =4\), the bound following (3.11) gives \(\Vert \widetilde{Y}(t_n)\Vert <\rho ^{\kappa }=4\) on the non-backstop event when \(\kappa =1\), so the adaptive step \(h_{n+1}\) would too frequently require the backstop method, whereas \(\kappa =8\) gives \(\rho ^{\kappa }=4^8=65536>L(0)\).

Fig. 2 Single paths of the telomere length SDE (5.2) solved over 25 days; (b) shows a detailed plot of (a)

Fig. 3 Two-dimensional system (5.3): (a) and (b) show the strong convergence and efficiency for diagonal noise, (c) and (d) for commutative noise, and (e) and (f) for non-commutative noise. We choose \(a=3\), \(\sigma =0.2\) and \(b=1.5\)

5.3 Two-dimensional test systems

We now consider three (\(i=1,2,3\)) different SDEs:

$$\begin{aligned} dX(t)=F(X(t))dt+G_i(X(t))dW(t),\quad t\in [0,1],\quad X(0)=[7,9]^T, \end{aligned}$$
(5.3)

with \(W(t)=[W_1(t),W_2(t)]^T\), where \(W_1\) and \(W_2\) are independent scalar Wiener processes, \(X(t)=[X_1(t),X_2(t)]^T\), \(F(x)=[x_2-3x_1^3,x_1-3x_2^3]^T\), and

$$\begin{aligned} G_1(x)=\sigma \begin{pmatrix}x_1^2 &{}\quad 0\\ 0 &{}\quad x_2^2\end{pmatrix}, \, G_2(x)=\sigma \begin{pmatrix}x_2^2 &{}\quad x_2^2\\ x_1^2 &{}\quad x_1^2\end{pmatrix},\, G_3(x)=\sigma \begin{pmatrix}1.5x_1^2 &{}\quad x_2\\ x_2^2 &{}\quad 1.5x_1\end{pmatrix}. \end{aligned}$$

\(G_1\) is an example of diagonal noise, \(G_2\) commutative noise, and \(G_3\) non-commutative noise.
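
The commutativity condition of Sect. 2 can be checked pointwise. The following rough sketch (finite-difference Jacobians; names ours) confirms that \(G_2\) satisfies \({\textbf{D}}g_1(x)g_2(x)={\textbf{D}}g_2(x)g_1(x)\) at a sample point while \(G_3\) does not:

```python
import numpy as np

sigma = 0.2

def G2(x):   # commutative noise of (5.3): both columns equal sigma*[x2^2, x1^2]^T
    x1, x2 = x
    return sigma * np.array([[x2**2, x2**2],
                             [x1**2, x1**2]])

def G3(x):   # non-commutative noise of (5.3)
    x1, x2 = x
    return sigma * np.array([[1.5 * x1**2, x2],
                             [x2**2,       1.5 * x1]])

def jac_col(G, i, x, eps=1e-6):
    """Finite-difference Jacobian of the i-th column g_i at x (rough check only)."""
    d = len(x)
    J = np.zeros((d, d))
    for k in range(d):
        e = np.zeros(d); e[k] = eps
        J[:, k] = (G(x + e)[:, i] - G(x - e)[:, i]) / (2.0 * eps)
    return J

def commutes(G, x):
    """Check D g_1(x) g_2(x) == D g_2(x) g_1(x) at the point x."""
    M = G(x)
    return np.allclose(jac_col(G, 0, x) @ M[:, 1], jac_col(G, 1, x) @ M[:, 0], atol=1e-4)

x = np.array([3.0, 4.0])
print(commutes(G2, x), commutes(G3, x))   # True  False
```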

For \(G_1\) and \(G_2\) we use \(h_{\max }=[2^{-14}, 2^{-12}, 2^{-10}, 2^{-8}, 2^{-6}]\), \(h_{\text {ref}}=2^{-18}\), \(\rho =4\) and \(\kappa =1\). In Fig. 3a, c, we see order one strong convergence for all methods. Parts (b) and (d) show the efficiency of the adaptive method.

For \(i=3\), the non-commutative noise case, take \(h_{\max }=[2^{-8}, 2^{-7}, 2^{-6}, 2^{-5}, 2^{-4}]\), \( h_{\text {ref}}=2^{-11}\), \(\rho =2^2\) and \(X(0)=[3,4]^T\). To simulate the Lévy areas we follow the method in [12, Sec. 4.3], which is based on the Euler approximation of a system of SDEs. Again, we observe order one convergence for all methods in Fig. 3e and that AMil is the most efficient in (f). Note that, as TSRK1 is supported theoretically only for commutative noise, we do not consider it here.
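
For illustration, a minimal sketch of a sub-stepping (Euler-type) approximation of the iterated integrals and Lévy areas over a single step; we have not verified that this coincides with the method of [12, Sec. 4.3], and the function name and array conventions are ours:

```python
import numpy as np

def iterated_integrals_substep(h, m, K, rng):
    """Approximate the iterated integrals I_{j,i} and Lévy areas A_ij over one step
    of length h by sub-stepping (an Euler approximation of the defining SDEs).

    Returns (dW, I, A): dW are the Wiener increments over the step, I[j, i]
    approximates I_{j,i}^{t_n, t_n+h} and A[i, j] approximates A_ij^{t_n, t_n+h}."""
    dt = h / K
    dWs = np.sqrt(dt) * rng.standard_normal((K, m))        # fine-grid increments
    W = np.vstack([np.zeros(m), np.cumsum(dWs, axis=0)])   # W(tau_k) - W(t_n)
    I = np.zeros((m, m))
    for k in range(K):                                     # left-point (Ito) sums
        I += np.outer(W[k], dWs[k])                        # I[j, i] += W_j * dW_i
    A = 0.5 * (I - I.T)                                    # A_ij = (I_{i,j} - I_{j,i})/2
    return W[-1], I, A
```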

6 Preliminary lemmas

We present five lemmas necessary for the proofs of Theorem 4.1 and Theorem 4.2. Throughout this section we assume that f and g satisfy Assumption 2.1 and (except for Lemma 6.4) that we are on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\), so that (3.12) of Definition 3.5 holds. We use (2.6), (2.7) and (2.8) to define some bounded constant coefficients depending on \(R<\infty \). The constants in (6.1) are then used in the development of a one-step error bound for the adaptive part of the scheme.

$$\begin{aligned} \begin{aligned} \big \Vert f\big ({{\widetilde{Y}}}(t_n)\big )\big \Vert \le \,&c_5(1+R^{q_1+2})=:C_{f};\\ \big \Vert {\textbf{D}}f\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}}\le \,&c_3(1+R^{q_1+1})=:C_{Df}; \\ \big \Vert g_i\big ({{\widetilde{Y}}}(t_n)\big )\big \Vert \le \big \Vert g\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}(d\times m)}\le \,&c_6(1 + R^{q_2+2})=:C_{g_i}; \\ \big \Vert {\textbf{D}}g_i\big ({{\widetilde{Y}}}(t_n)\big )\big \Vert _{{\textbf{F}}} \le \,&c_4(1 + R^{q_2+1})=:C_{Dg_i}. \end{aligned} \end{aligned}$$
(6.1)

The following lemma provides a bound for the even conditional moments of the iterated stochastic integral in (2.11).

Lemma 6.1

(Iterated Stochastic Integral) Let \(\big \{\big (\widetilde{Y}(s)\big )_{s\in [t_n,t_{n+1}]},h_{n+1}\big \}_{n\in {\mathbb {N}}}\) be the adaptive Milstein scheme given in Definitions 3.3 and 3.5. Then there exists a constant \(C_{\texttt {ISI}}\) such that for \(k\ge 1\), \(n\in {\mathbb {N}}\) and \(s\in [t_n,t_{n+1}]\), on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\)

$$\begin{aligned} {\mathbb {E}}\Bigg [\Bigg \Vert \sum _{i,j=1}^{m}{\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )I_{j,i}^{t_n,s}\Bigg \Vert ^{2k}\Bigg |{\mathcal {F}}_{t_n}\Bigg ]\le C^{}_{\texttt {ISI}}\left( k,R\right) |s-t_n|^{2k}, \end{aligned}$$
(6.2)

where

$$\begin{aligned} C^{}_{\texttt {ISI}}\left( k,R\right) :=&\,3^{2k} m^{4k}C_{Dg_i}^{2k} C_{g_i}^{2k}\Big ( \varvec{\gamma }_{4k}+1+\varvec{\gamma }_{2k}^2+C^{}_{\texttt {LA}}\left( 2k\right) \Big ). \end{aligned}$$
(6.3)

Here, \(\varvec{\gamma }_{p}\) is from (3.10), \(C^{}_{\texttt {LA}}\left( 2k\right) \) is from Lemma 2.2 with explicit form given in (A.1), and the R dependence in \(C^{}_{\texttt {ISI}}\left( k,R\right) \) arises from (6.1).

Proof

First of all, for convenience we set

$$\begin{aligned} G_{\texttt {ISI}}(s):=\Bigg \Vert \sum _{i,j=1}^{m}{\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )I_{j,i}^{t_n,s}\Bigg \Vert ^{2k}. \end{aligned}$$

By (2.12) and (2.3), we have, for \(s\in [t_n, t_{n+1}]\) and \(n\in {\mathbb {N}}\),

$$\begin{aligned} G_{\texttt {ISI}}(s)\le & {} 3^{2k-1}\Bigg (\Bigg \Vert \frac{1}{2}\sum _{i=1}^{m}{\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_i\big ({{\widetilde{Y}}}(t_n)\big )\Big (\big (I_{i}^{t_n,s} \big )^2 -|s-t_n|\Big ) \Bigg \Vert ^{2k}\\{} & {} +\Bigg \Vert \frac{1}{2}\sum _{\begin{array}{c} i,j=1\\ i<j \end{array}}^{m}\Big ({\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )+{\textbf{D}}g_j\big (\widetilde{Y}(t_n)\big )g_i\big (\widetilde{Y}(t_n)\big )\Big )I_{i}^{t_n,s}I_{j}^{t_n,s}\Bigg \Vert ^{2k}\\{} & {} +\Bigg \Vert \sum _{\begin{array}{c} i,j=1\\ i<j \end{array}}^{m}\Big ({\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )-{\textbf{D}}g_j\big (\widetilde{Y}(t_n)\big )g_i\big ({{\widetilde{Y}}}(t_n)\big )\Big )A_{ij}^{t_n,s} \Bigg \Vert ^{2k}\Bigg ). \end{aligned}$$

Applying (2.3) again and by submultiplicativity of the Euclidean norm and the fact that the induced matrix 2-norm is bounded above by the Frobenius norm, for \(s\in [t_n, t_{n+1}]\) and \(n\in {\mathbb {N}}\), we get

$$\begin{aligned} G_{\texttt {ISI}}(s)\le & {} 3^{2k-1}\Bigg (\frac{ m^{2k-1}}{2^{2k}}\sum _{i=1}^{m}\big \Vert {\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}}^{2k}\big \Vert g_i\big (\widetilde{Y}(t_n)\big )\big \Vert ^{2k}\Big (\big (I_{i}^{t_n,s} \big )^2+|s-t_n| \Big )^{2k}\\{} & {} +\left( \frac{ m(m-1)}{2}\right) ^{2k-1}\sum _{\begin{array}{c} i,j=1\\ i<j \end{array}}^{m}\Big \Vert {\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )+{\textbf{D}}g_j\big (\widetilde{Y}(t_n)\big )g_i\big (\widetilde{Y}(t_n)\big )\Big \Vert ^{2k}\\{} & {} \times \bigg (\frac{1}{2^{2k}}\Big |I_{i}^{t_n,s}I_{j}^{t_n,s} \Big |^{2k}+\big |A_{ij}^{t_n,s} \big |^{2k}\bigg )\Bigg ). \end{aligned}$$

Applying conditional expectations on both sides, together with the pairwise conditional independence of \(I_{i}^{t_n,s}\) and \(I_{j}^{t_n,s}\) for \(i\ne j\), (2.6) and (6.1), we have for \(s\in [t_n, t_{n+1}]\) and \(n\in {\mathbb {N}}\)

$$\begin{aligned} {\mathbb {E}}\Big [G_{\texttt {ISI}}(s)\Big |{\mathcal {F}}_{t_n}\Big ]\le & {} 3^{2k}\Bigg ( m^{2k-1} C_{Dg_i}^{2k} C_{g_i}^{2k}\sum _{i=1}^{m}\left( {\mathbb {E}}\left[ \left| I_{i}^{t_n,s}\right| ^{4k}\bigg |{\mathcal {F}}_{t_n}\right] +|s-t_n|^{2k} \right) \\{} & {} +\left( \frac{ m(m-1)}{2}\right) ^{2k-1}C_{Dg_i}^{2k} C_{g_i}^{2k} \sum _{\begin{array}{c} i,j=1\\ i<j \end{array}}^{m} \bigg ( {\mathbb {E}}\left[ \left| I_{i}^{t_n,s}\right| ^{2k}\bigg |{\mathcal {F}}_{t_n}\right] {\mathbb {E}}\left[ \left| I_{j}^{t_n,s}\right| ^{2k}\bigg |{\mathcal {F}}_{t_n}\right] \\{} & {} +{\mathbb {E}}\left[ \left| A_{ij}^{t_n,s}\right| ^{2k}\bigg |{\mathcal {F}}_{t_n}\right] \bigg )\Bigg ). \end{aligned}$$

Using (3.9), (3.10) and (2.14) we have

$$\begin{aligned} {\mathbb {E}}\Big [G_{\texttt {ISI}}(s)\Big |{\mathcal {F}}_{t_n}\Big ]\le \,\,&C^{}_{\texttt {ISI}}\left( k,R\right) |s-t_n|^{2k}, \end{aligned}$$

where \(C^{}_{\texttt {ISI}}\left( k,R\right) \) is in (6.3). \(\square \)

The following lemma provides a bound on the conditional moments of the adaptive Milstein scheme in (3.5) over one step, in the case where the method applies the map \(\theta \).

Lemma 6.2

Consider \(\big \{\big (\widetilde{Y}(s)\big )_{s\in [t_n,t_{n+1}]},h_{n+1}\big \}_{n\in {\mathbb {N}}}\) from Definitions 3.3 and 3.5, and let \((Y_{\theta }(s))_{s\in (t_n,t_{n+1}]}\) be as defined in Definition 3.4. Then there exists a constant \(C_{ Y_{\theta }}> 0\) such that for \(k\ge 1\), \(n\in {\mathbb {N}}\) and \(s\in {(}t_n, t_{n+1}]\), on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\),

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert Y_{\theta }(s)\big \Vert ^k\Big |{\mathcal {F}}_{t_n}\Big ] \le C_{ Y_{\theta }}\big (k,R\big ), \end{aligned}$$
(6.4)

where

$$\begin{aligned} C_{ Y_{\theta }}\big (k,R\big ):=\,\,4^{k-1}\Big (R^k+C_f^k+m^k C_{g_i}^k\,\, \varvec{\gamma }_{k}+C^{}_{\texttt {ISI}}\left( 2k\right) ^{1/2}\Big ), \end{aligned}$$
(6.5)

with the constant \(C_{\texttt {ISI}}\) from Lemma 6.1.

Proof

By (3.7), (3.4) and (2.3), we have, for \(s\in {(}t_n, t_{n+1}]\) and \(n\in {\mathbb {N}}\),

$$\begin{aligned} \big \Vert Y_{\theta }(s)\big \Vert ^k=\,&\left\| \theta \left( \widetilde{Y}(t_n)\textbf{,}\,\, t_n\textbf{,}\,\, s-t_n\right) \right\| ^k \\ \le&\,\,4^{k-1}\Bigg (\big \Vert \widetilde{Y}(t_n)\big \Vert ^k+\left\| f\big (\widetilde{Y}(t_n)\big )\right\| ^k|s-t_n|^k+\left\| \sum _{i=1}^{m}g_i\big (\widetilde{Y}(t_n)\big )I_{i}^{t_n,s}\right\| ^{k}\\ {}&+\Bigg (\Bigg \Vert \sum _{i,j=1}^{m}{{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )I_{j,i}^{t_n,s}\Bigg \Vert ^{2k}\Bigg )^{1/2}\Bigg ). \end{aligned}$$

Applying (2.3), (3.12) and (6.1), for \(s\in {(}t_n, t_{n+1}]\) and \(n\in {\mathbb {N}}\), yields

$$\begin{aligned} \big \Vert Y_{\theta }(s)\big \Vert ^k \le&\,\,4^{k-1}\left( R^k+C_f^k |s-t_n|^k+m^{k-1} C_{g_i}^k \sum _{i=1}^{m}\left| I_{i}^{t_n,s}\right| ^{k} \right. \\ {}&\left. +\Bigg (\Bigg \Vert \sum _{i,j=1}^{m}{{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )I_{j,i}^{t_n,s}\Bigg \Vert ^{2k}\Bigg )^{1/2}\right) . \end{aligned}$$

Taking conditional expectations on both sides and applying Jensen’s inequality to the last term, we have, for \(s\in {(}t_n, t_{n+1}]\) and \(n\in {\mathbb {N}}\),

$$\begin{aligned} {\mathbb {E}}\Big [\Vert Y_{\theta }(s)\Vert ^k\Big |{\mathcal {F}}_{t_n}\Big ]\le & {} \, 4^{k-1}\left( R^k+C_f^k|s-t_n|^k+ m^{k-1} C_{g_i}^k\sum _{i=1}^{m}{\mathbb {E}}\left[ \left| I_{i}^{t_n,s}\right| ^{k}\bigg |{\mathcal {F}}_{t_n} \right] \right. \\{} & {} \left. +\left( {\mathbb {E}}\Bigg [\Bigg \Vert \sum _{i,j=1}^{m}{\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )I_{j,i}^{t_n,s}\Bigg \Vert ^{2k}\Bigg |{\mathcal {F}}_{t_n}\Bigg ]\right) ^{1/2}\right) . \end{aligned}$$

Using (3.9), (6.2) from Lemma 6.1 and the fact that \(|s-t_n|\le h_{\max }\le 1\) by (3.2), we have

$$\begin{aligned} {\mathbb {E}}\Big [\Vert Y_{\theta }(s)\Vert ^k\Big |{\mathcal {F}}_{t_n}\Big ] \le C_{Y_{\theta }}(k,R), \end{aligned}$$

where \(C_{Y_{\theta }}(k,R)\) is in (6.5). \(\square \)

The following lemma proves regularity in time of the adaptive Milstein scheme in (3.5) when applying the map \(\theta \).

Lemma 6.3

(Scheme Regularity) Consider \(\big \{\big (\widetilde{Y}(s)\big )_{s\in [t_n,t_{n+1}]},h_{n+1}\big \}_{n\in {\mathbb {N}}}\) in Definitions 3.3 and 3.5, and let \((Y_{\theta }(s))_{s\in {(}t_n,t_{n+1}]}\) be as defined in Definition 3.4. Then there exists a constant \(C_{\texttt {SR}}\) such that for \(k\ge 1\), \(n\in {\mathbb {N}}\) and \(s\in {(}t_n, t_{n+1}]\), on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\)

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert Y_{\theta }(s)- {{\widetilde{Y}}}(t_n) \big \Vert ^{2k} \Big |{\mathcal {F}}_{t_n}\Big ] \le C^{}_{\texttt {SR}}\left( k,R\right) |s-t_n|^k, \end{aligned}$$
(6.6)

where

$$\begin{aligned} C^{}_{\texttt {SR}}\left( k,R\right) := \,\, 3^{2k-1}\Big (C_f^{2k} +m^{2k} C_{g_i}^{2k}\,\, \varvec{\gamma }_{2k} +C^{}_{\texttt {ISI}}\left( 2k\right) \Big ), \end{aligned}$$
(6.7)

with the constant \(C_{\texttt {ISI}}\) from Lemma 6.1.

Proof

The method of proof is similar to the proof of Lemma 6.2. \(\square \)

Remark 6.1

Our analysis requires a certain number of finite moments for the SDE (1.1), and it is necessary to track exactly what those are in order to see that the conditions of Assumption 2.2 are not violated. To this end, we introduce a superscript notation for random variables appearing as conditional expectations at this point. The notation should be interpreted according to the following example: in (6.9) below the random variable \(C_{\texttt {PR}}^{\{2k(q+2)\}}\) requires \(2k(q+2)\) finite moments of the SDE (1.1) to have finite expectation.

The following lemma examines the regularity of solutions of the SDE (1.1).

Lemma 6.4

(Path Regularity) Let f, g also satisfy Assumption 2.2, and let \((X(s))_{s\in [t_n,t_{n+1}]}\) be a solution of (1.1). Then there exists an \({\mathcal {F}}_{t_n}\)-measurable random variable \({\overline{C}}_{\texttt {PR}}^{\{2k(q+2)\}}\) such that for \(k\ge 1\), \(n\in {\mathbb {N}}\) and \(s\in [t_n,t_{n+1}]\) a.s.

$$\begin{aligned} {\mathbb {E}}\Big [\Vert X(s)-X(t_n)\Vert ^{2k}\Big |{\mathcal {F}}_{t_n}\Big ]&\,\le \,\, \overline{C}^{\{2k(q+2)\}}_{\texttt {PR}}\,\,|s-t_n|^{k}, \end{aligned}$$
(6.8)

where \(q=q_1\vee q_2\) is as defined in Assumption 2.2, and where, a.s.,

$$\begin{aligned} \overline{C}^{\{2k(q+2)\}}_{\texttt {PR}}= & {} \,2^{4k-2}c_5^{2k}\left( 1+{\mathbb {E}}\left[ {\sup _{p\in [t_n,t_{n+1}]}}\Vert X(p)\Vert ^{2k(q_1+2)}\bigg |{\mathcal {F}}_{t_n}\right] \right) \nonumber \\{}{} & {} {} +2^{4k-2}(k(2k-1))^k c_6^{2k}\left( 1+{\mathbb {E}}\left[ {\sup _{p\in [t_n,t_{n+1}]}}\Vert X(p)\Vert ^{2k(q_2+2)}\bigg |{\mathcal {F}}_{t_n}\right] \right) ,\nonumber \\ \end{aligned}$$
(6.9)

where the expectation of \({\overline{C}}_\texttt {PR}^{\{2k(q+2)\}}\) is denoted \(C^{}_{\texttt {PR}}\left( k\right) \), given by

$$\begin{aligned} C^{}_{\texttt {PR}}\left( k\right) :={\mathbb {E}}\left[ \overline{C}^{\{2k(q+2)\}}_{\texttt {PR}}\right] \le 2^{4k-2}\left( 1+C_{\texttt {X}} \right) \big ( c_5^{2k}+(k(2k-1))^k c_6^{2k}\big ). \end{aligned}$$
(6.10)

Proof

The method of proof follows that of [25, Thm. 7.1]. The bound (6.10) follows from (2.9) and Assumption 2.2. \(\square \)

The following lemma provides a bound on the even conditional moments of the remainder term from a Taylor expansion of either the drift f or diffusion g, around \({\widetilde{Y}}(t_n)\).

Lemma 6.5

(Taylor Error) Consider \(\big \{\big (\widetilde{Y}(s)\big )_{s\in [t_n,t_{n+1}]},h_{n+1}\big \}_{n\in {\mathbb {N}}}\) from Definitions 3.3 and 3.5, and let \((Y_{\theta }(s))_{s\in [t_n,t_{n+1}]}\) be as defined in Definition 3.4. Let \(u\in \{f, g\}\) and set \(c_{{\textbf{D}}2}:=c_1\vee c_2\). Then there exists a constant \(C_{\texttt {TE}}\) such that for \(k\ge 1\), \(n\in {\mathbb {N}}\) and \(s\in [t_n, t_{n+1}]\), on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\),

$$\begin{aligned} {\mathbb {E}}\left[ \Big \Vert \int _{0}^{1}(1-\epsilon ){\textbf{D}}^2u\Big (\widetilde{Y}(t_n) -\epsilon \big ( Y_{\theta }(s)-{{\widetilde{Y}}}(t_n)\big )\Big ) d\epsilon \Big \Vert _{\mathbf {T_3}}^{2k}\bigg | {\mathcal {F}}_{t_n}\right] \le C_{\texttt {TE}}\big (k,R\big ), \end{aligned}$$
(6.11)

where \(C_{\texttt {TE}}\left( k,R \right) :=c_{{\textbf{D}}2}^{2k}\left( 1+ 3^{2kq+1}\left( R^{2kq}+ C_{Y_{\theta }}\left( k,R \right) \right) \right) \) and \(C_{Y_{\theta }}\big (k,R\big )\) is from Lemma 6.2.

Proof

Using (2.2), (2.3), (2.8), Lemma 6.2, (3.12), and the facts that \(c_{{\textbf{D}}2}=c_1 \vee c_2\) and \(q=q_1 \vee q_2\), we have

$$\begin{aligned}&{\mathbb {E}}\left[ \Big \Vert \int _{0}^{1}(1-\epsilon ){{\textbf {D}}}^2u\Big ({{\widetilde{Y}}}(t_n) -\epsilon \big (Y_{\theta }(s)-{{\widetilde{Y}}}(t_n)\big )\Big ) d\epsilon \Big \Vert _{\mathbf {T_3}}^{2k}\bigg | {\mathcal {F}}_{t_n}\right] \\ \le \,\,&{\mathbb {E}}\left[ \int _{0}^{1}(1-\epsilon )^{2k}\Big \Vert {{\textbf {D}}}^2u\Big ({{\widetilde{Y}}}(t_n) -\epsilon \big (Y_{\theta }(s)-{{\widetilde{Y}}}(t_n)\big )\Big )\Big \Vert _{\mathbf {T_3}}^{2k} d\epsilon \bigg | {\mathcal {F}}_{t_n}\right] \\ \le \,\,&c_{{{\textbf {D}}}2}^{2k}{\mathbb {E}}\left[ \int _{0}^{1}(1-\epsilon )^{2k}\Big (1+\big \Vert {{\widetilde{Y}}}(t_n)-\epsilon \cdot \big (Y_{\theta }(s)-{{\widetilde{Y}}}(t_n)\big )\big \Vert ^{2kq}\Big ) d\epsilon \bigg |{\mathcal {F}}_{t_n}\right] \\ \le \,\,&c_{{{\textbf {D}}}2}^{2k}{\mathbb {E}}\Big [1+3^{2kq-1}\big \Vert {{\widetilde{Y}}}(t_n)\big \Vert ^{2kq}\\ +&\int _{0}^{1}(1-\epsilon )^{2k}\epsilon ^{2kq}3^{2kq}\left( \Vert Y_{\theta }(s)\Vert ^{2kq}+\big \Vert {{\widetilde{Y}}}(t_n)\big \Vert ^{2kq}\right) d\epsilon \Bigg |{\mathcal {F}}_{t_n}\Bigg ]\\ \le \,\,&c_{{{\textbf {D}}}2}^{2k}\left( 1+ 3^{2kq+1}\big \Vert {{\widetilde{Y}}}(t_n)\big \Vert ^{2kq}+3^{2kq}{\mathbb {E}}\Big [\Vert Y_{\theta }(s)\Vert ^{2kq} \Big |{\mathcal {F}}_{t_n}\Big ]\right) \\ =\,\,&C^{}_{\texttt {TE}}\left( k,R\right) , \end{aligned}$$

where we have used that \((1-\epsilon )^{2k}\,\epsilon ^{2kq}\le 1\) for \(k,q\ge 1\) and \(\epsilon \in [0,1]\). \(\square \)

7 Proof of main theorems

In this section we prove Theorem 4.1, the strong convergence result, and Theorem 4.2, on the probability of using the backstop method and the role of \(\rho \).

7.1 Setting up the error function

Notice that \({{\widetilde{Y}}} (s)\), from the explicit adaptive Milstein scheme (3.5), is given by either the Milstein map \(\theta \) in (3.4) or the backstop map \(\varphi \) in (3.6), depending on the value of \(h_{n+1}\). Thus, we define the error by

$$\begin{aligned} {{\widetilde{E}}}(s):=\,X(s)-{{\widetilde{Y}}} (s) = E_{\theta }(s) + E_{\varphi }(s), \end{aligned}$$
(7.1)

for \(s\in [t_n, t_{n+1}]\) and \(n\in {\mathbb {N}}\). Here

$$\begin{aligned} E_{\varphi }(s):=\left( X(s)- \varphi \left( \widetilde{Y}(t_n)\varvec{,}\,\,t_n\varvec{,}\,\, {s-t_n}\right) \right) \, {\textbf{1}}_{\{h_{n+1}\le h_{\min }\}}, \end{aligned}$$
(7.2)

and \(Y_{\theta }(s)\) is as defined in Definition 3.4 and

$$\begin{aligned} E_{\theta }(s):= & {} \big (X(s) - Y_{\theta }(s)\big )\, {\textbf{1}}_{\{h_{\min }<h_{n+1}\le h_{\max }\}}\nonumber \\= & {} \,\left( {{\widetilde{E}}}(t_n)+\int _{t_n}^{s}\Delta f\big (X(r),{{\widetilde{Y}}}(t_n)\big )dr\right. \nonumber \\{} & {} \left. +\sum _{i=1}^{m}\int _{t_n}^{s}\Delta g_i\big (r,X(r),{{\widetilde{Y}}}(t_n)\big )dW_i(r)\right) \, {\textbf{1}}_{\{h_{\min }<h_{n+1}\le h_{\max }\}}, \end{aligned}$$
(7.3)

with

$$\begin{aligned} \Delta f\big (X(r),{\widetilde{Y}}(t_n)\big )&:=\,f(X(r))- f\big ({{\widetilde{Y}}}(t_n)\big ); \end{aligned}$$
(7.4)
$$\begin{aligned} \Delta g_i\big (r,X(r),\widetilde{Y}(t_n)\big )&:=\,g_i(X(r))-g_i\big (\widetilde{Y}(t_n)\big )-\sum _{j=1}^{m}{{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big ({{\widetilde{Y}}}(t_n)\big )I_{j}^{t_n,r}. \end{aligned}$$
(7.5)

To simplify the proofs of Theorem 4.1 and Theorem 4.2, we require two lemmas. First, we find a second-moment bound for \(\Delta g_i\) in (7.5) on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\) (so that (3.12) holds).

Lemma 7.1

Let g satisfy Assumption 2.1 and \(\Delta g_i\) be as in (7.5). Take \(s\in [t_n,t_{n+1}]\), let X(s) be a solution of (1.1), consider \(\big (\widetilde{Y}(s),h_{n+1}\big )\) from Definitions 3.3 and 3.5, and let \(Y_{\theta }(s)\) be as defined in Definition 3.4. In this case there exists a constant \(C_{G}\) such that, on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\),

$$\begin{aligned}{} & {} {\mathbb {E}}\left[ \left\| \Delta g_i\big (s,X(s),\widetilde{Y}(t_n)\big ) \right\| ^2\bigg |{\mathcal {F}}_{t_n}\right] \nonumber \\{} & {} \le 2{\mathbb {E}}\Big [\big \Vert g(X(s))-g\big (Y_{\theta }(s)\big )\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ] +C_{G}(R) |s-t_n|^2, \end{aligned}$$
(7.6)

where

$$\begin{aligned} C_{G}(R) :=\,8 C_{Dg_i}^2 \big (C_f^2+ C^{}_{\texttt {ISI}}\left( 1,R\right) \big ) +4C^{}_{\texttt {TE}}\left( 2,R\right) ^{1/2} C^{}_{\texttt {SR}}\left( 4,R\right) ^{1/2}, \end{aligned}$$
(7.7)

and \(C_{\texttt {ISI}}\), \(C_{\texttt {TE}}\) and \(C_{\texttt {SR}}\) are from Lemmas 6.1, 6.5 and 6.3, respectively.

Proof

Substituting (7.5) for \(\Delta g_i\) in the LHS of (7.6), adding and subtracting \(g_i\big ( Y_{\theta }(s)\big )\), and using (2.3) gives

$$\begin{aligned}{} & {} {\mathbb {E}}\left[ \left\| \Delta g_i\big (s,X(s),{{\widetilde{Y}}}(t_n)\big ) \right\| ^2\bigg |{\mathcal {F}}_{t_n}\right] \le 2{\mathbb {E}}\Big [\Big \Vert g_i(X(s))-g_i\big ( Y_{\theta }(s)\big )\Big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] \nonumber \\{} & {} +\underbrace{2{\mathbb {E}}\Bigg [\Bigg \Vert g_i\big ( Y_{\theta }(s)\big )-g_i\big (\widetilde{Y}(t_n)\big )-\sum _{j=1}^{m}{\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big ({{\widetilde{Y}}}(t_n)\big )I_{j}^{t_n,s} \Bigg \Vert ^2 \Bigg |{\mathcal {F}}_{t_n}\Bigg ]}_{=: G_{1}}. \end{aligned}$$
(7.8)

To analyse \(G_{1}\), we expand \(g_i \big ( Y_{\theta }(s)\big )\) using Taylor’s theorem (see for example [23, A.1]) around \({{\widetilde{Y}}}(t_n)\) to get

$$\begin{aligned}{} & {} g_i\big ( Y_{\theta }(s)\big )-g_i\big ({{\widetilde{Y}}}(t_n)\big ) =\, {\textbf{D}}g_i\big ({{\widetilde{Y}}}(t_n)\big )\big ( Y_{\theta }(s)-{{\widetilde{Y}}}(t_n)\big )\nonumber \\{} & {} +\int _{0}^{1}(1-\epsilon ){\textbf{D}}^2g_i\Big (\widetilde{Y}(t_n)-\epsilon \big ( Y_{\theta }(s) -\widetilde{Y}(t_n)\big )\Big )\Big [ Y_{\theta }(s)-\widetilde{Y}(t_n)\Big ]^2d\epsilon , \end{aligned}$$
(7.9)

where we recall from Sect. 2 that \([\cdot ]^2\) represents the outer product of a vector with itself. Substituting (7.9) into \(G_{1}\) in (7.8), taking out \({\textbf{D}}g_i\big ({{\widetilde{Y}}}(t_n)\big )\) as a common factor, and applying (2.3) gives

$$\begin{aligned} G_{1}\,\le&\,\,4 {\mathbb {E}}\Bigg [\Bigg \Vert {\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )\bigg ( Y_{\theta }(s)-\widetilde{Y}(t_n)-\sum _{j=1}^{m}g_j\big (\widetilde{Y}(t_n)\big )I_{j}^{t_n,s}\bigg )\Bigg \Vert ^2\Bigg |{\mathcal {F}}_{t_n}\Bigg ] \nonumber \\&\quad +4 {\mathbb {E}}\Bigg [\Bigg \Vert \int _{0}^{1}(1-\epsilon ){\textbf{D}}^2g_i\Big (\widetilde{Y}(t_n)-\epsilon \big ( Y_{\theta }(s)-\widetilde{Y}(t_n)\big )\Big )\nonumber \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \times \Big [ Y_{\theta }(s)-{{\widetilde{Y}}}(t_n)\Big ]^2d\epsilon \Bigg \Vert ^2 \Bigg |{\mathcal {F}}_{t_n}\Bigg ] \nonumber \\ =:\,\,&G_{1.1}+G_{1.2}. \end{aligned}$$
(7.10)

For \(G_{1.1}\) in (7.10), by submultiplicativity of the Euclidean norm and the fact that the induced matrix 2-norm is bounded above by the Frobenius norm; by (3.7), (6.1) and (6.2) in the statement of Lemma 6.1 with \(k=1\), we have

$$\begin{aligned} G_{1.1} \,&\le \,\, 8\Big \Vert {{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )\Big \Vert _{{{\textbf {F}}}}^2\Bigg (\Big \Vert f\big (\widetilde{Y}(t_n)\big )\Big \Vert ^2|s-t_n|^2\nonumber \\ {}&\,\,\quad +{\mathbb {E}}\Bigg [\Bigg \Vert \sum _{i,j=1}^{m}{{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )I_{j,i}^{t_n,s}\Bigg \Vert ^2\Bigg |{\mathcal {F}}_{t_n}\Bigg ]\Bigg )\nonumber \\ \,&\le \,\, 8 C_{Dg_i}^2\big (C_f^2+ C^{}_{\texttt {ISI}}\left( 1,R\right) \big )|s-t_n|^2. \end{aligned}$$
(7.11)

For \(G_{1.2}\) in (7.10), we apply (2.2), the Cauchy-Schwarz inequality, then using (6.11) in Lemma 6.5 with \(k=2\) and (6.6) in Lemma 6.3 with \(k=4\) we get

$$\begin{aligned} G_{1.2} \,&\le \,\, 4 \int _{0}^{1}\Big ({\mathbb {E}}\Big [\Big \Vert (1-\epsilon ){{\textbf {D}}}^2g_i\Big (\widetilde{Y}(t_n)-\epsilon \big ( Y_{\theta }(s)-\widetilde{Y}(t_n)\big )\Big )\Big \Vert _{\mathbf {T_3}}^4 \Big |{\mathcal {F}}_{t_n}\Big ]\Big )^{1/2}d\epsilon \nonumber \\ {}&\,\,\quad \times \Big ({\mathbb {E}}\Big [\Big \Vert Y_{\theta }(s)-{{\widetilde{Y}}}(t_n) \Big \Vert ^8 \Big |{\mathcal {F}}_{t_n}\Big ]\Big )^{1/2} \nonumber \\ {}&\le \,\, 4C^{}_{\texttt {TE}}\left( 2,R\right) ^{1/2} C^{}_{\texttt {SR}}\left( 4,R\right) ^{1/2} \,|s-t_n|^2. \end{aligned}$$
(7.12)

Substituting the bounds (7.11) and (7.12) back into (7.10) and bringing together the terms in (7.8), we have

$$\begin{aligned} {\mathbb {E}}\left[ \left\| \Delta g_i\big (s,X(s),{{\widetilde{Y}}}(t_n)\big ) \right\| ^2\bigg |{\mathcal {F}}_{t_n}\right]\le & {} 2{\mathbb {E}}\Big [\Big \Vert g_i(X(s))-g_i\big ( Y_{\theta }(s)\big )\Big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] \\{} & {} +C_{G }(R) |s-t_n|^2, \end{aligned}$$

where \(C_G(R) \) is given in (7.7). By bounding \(\Vert g_i\Vert ^2\) with \(\Vert g\Vert _{{\textbf{F}}(d\times m)}^2\), the statement of Lemma 7.1 follows. \(\square \)

The second lemma gives a conditional second-moment bound for \(E_{\theta }(s)\) as defined in (7.3), which is the first part of the one-step error in (7.1).

Lemma 7.2

Let f, g satisfy Assumptions 2.1 and 2.2. Let X(s) be a solution of (1.1) and \({{\widetilde{E}}}(s)\) be given by (7.1) with \(E_{\theta }(s)\) defined in (7.3), for \(s\in [t_n,t_{n+1}]\), \(n\in {\mathbb {N}}\). Then there exist a constant \(C_E\) and an \({\mathcal {F}}_{t_n}\)-measurable random variable \(\overline{C}^{\{4(q+2)\}}_{M}\) such that

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert E_{\theta }(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] \le \,\, \big \Vert \widetilde{E}(t_{n})\big \Vert ^2+C_E(R)\int _{t_n}^{t_{n+1}} {\mathbb {E}}\left[ \big \Vert E_{\theta }(r)\big \Vert ^2 \bigg |{\mathcal {F}}_{t_n}\right] dr\nonumber \\ +{\overline{C}}_M^{\{4(q+2)\}}(R)\,h_{n+1}^3, \quad a.s. \end{aligned}$$
(7.13)

where

$$\begin{aligned} C_E(R) :=\, 2K_1(R)+2c, \end{aligned}$$
(7.14)

with constant \(K_1\) as defined in (7.40). The \({\mathcal {F}}_{t_n}\)-measurable random variable \(\overline{C}^{\{4(q+2)\}}_M\) is given by

$$\begin{aligned} {\overline{C}}_M^{\{4(q+2)\}}(R)\,:=\,m^4 C^2_{Df} C^2_{g_i}+2\overline{K}_2^{\{4(q+2)\}}+mC_{G}(R) , \end{aligned}$$
(7.15)

with the \({\mathcal {F}}_{t_n}\)-measurable random variable \(\overline{K}^{\{4(q+2)\}}_2\) in (7.41) and the constant \(C_{G}\) from Lemma 7.1. Denote \({\mathbb {E}}\left[ {\overline{C}^{\{4(q+2)\}}_M(R)} \right] =:C_M(R)\), the finiteness of which is ensured in (7.44).

We recall that the superscript notation in (7.15) follows the convention introduced in the statement of Lemma 6.4 and indicates the number of finite moments required of the SDE solution.

Proof

Throughout the proof, we restrict attention to trajectories on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\), since by (7.3) \(E_\theta (s)\) is nonzero only on this event; otherwise (7.13) holds trivially. Applying the stopping-time variant of the Itô formula (see Mao & Yuan [27]) to (7.3), we have

$$\begin{aligned}{} & {} \big \Vert E_{\theta }(t_{n+1})\big \Vert ^2=\big \Vert \widetilde{E}(t_{n})\big \Vert ^2+2\int _{t_n}^{t_{n+1}}\underbrace{\Big \langle E_{\theta }(r),\Delta f\big (X(r),{{\widetilde{Y}}}(t_n)\big ) \Big \rangle }_{=:J_f} dr\\{} & {} +\sum _{i=1}^{m}\int _{t_n}^{t_{n+1}}{\Big \Vert \underbrace{\Delta g_i\big (r,X(r),{{\widetilde{Y}}}(t_n)\big )}_{=:J_{g_i}}\Big \Vert ^2} dr+2\sum _{i=1}^{m}\int _{t_n}^{t_{n+1}}\big \langle E_{\theta }(r),{J_{g_i}}\big \rangle dW_i(r). \end{aligned}$$

Take expectations on both sides conditional upon \({\mathcal {F}}_{t_n}\). Since \(\int _{t_n}^{t_{n+1}}\big |J_f\big | dr\) has finite expectation (by the boundedness of \(\widetilde{Y}(t_n)\) in (3.12) and the finiteness of the absolute moments of X(r), see (2.9)), using Fubini’s Theorem (see for example [4, Proposition 12.10]) and (3.8) we have

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert E_{\theta }(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] = \big \Vert {{\widetilde{E}}}(t_{n})\big \Vert ^2 +2\int _{t_n}^{t_{n+1}}{\mathbb {E}}\big [J_f\big |{\mathcal {F}}_{t_n}\big ]dr\nonumber \\ +\sum _{i=1}^{m}\int _{t_n}^{t_{n+1}}{\mathbb {E}}\big [{\Vert }J_{g_i}{\Vert ^2}\big |{\mathcal {F}}_{t_n}\big ]dr, \end{aligned}$$
(7.16)

By Lemma 7.1, we can bound \({\Vert }J_{g_i}{\Vert ^2}\) in (7.16) as

$$\begin{aligned} {\mathbb {E}}\big [{\Vert }J_{g_i}{\Vert ^2}\big |{\mathcal {F}}_{t_n}\big ]\le & {} \, 2{\mathbb {E}}\Big [\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]\nonumber \\{} & {} \quad + C_{G}(R) |r-t_n|^2. \end{aligned}$$
(7.17)

For \(J_f\), substituting for \(\Delta f\) using (7.4) and adding and subtracting \(f(Y_{\theta }(r))\), we have

$$\begin{aligned} J_f=\big \langle E_{\theta }(r),f(X(r))-f(Y_{\theta }(r))\big \rangle +\underbrace{\big \langle E_{\theta }(r),f(Y_{\theta }(r))-f\big (\widetilde{Y}(t_n)\big )\big \rangle }_{=:H}. \end{aligned}$$
(7.18)

Substituting (7.18) and (7.17) back into (7.16), we have

$$\begin{aligned}{} & {} {\mathbb {E}}\Big [\big \Vert E_{\theta }(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] \le \big \Vert \widetilde{E}(t_{n})\big \Vert ^2+mC_{G}(R) h_{n+1}^3 \nonumber \\{} & {} +2\int _{t_n}^{t_{n+1}}{\mathbb {E}}\big [J_{f,g}\big |{\mathcal {F}}_{t_n}\big ]dr+2\int _{t_n}^{t_{n+1}}{\mathbb {E}}\big [H\big |{\mathcal {F}}_{t_n}\big ]dr, \end{aligned}$$
(7.19)

where

$$\begin{aligned} J_{f,g}:=\big \langle E_{\theta }(r),f(X(r))-f(Y_{\theta }(r))\big \rangle +\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{{\textbf {F}}}(d\times m)}^2. \end{aligned}$$
(7.20)

For H in (7.18), in a similar way to (7.9), we expand \(f(Y_{\theta }(r))\) using Taylor’s theorem around \({{\widetilde{Y}}}(t_n)\) to obtain

$$\begin{aligned}{} & {} f(Y_{\theta }(r))-f\big (\widetilde{Y}(t_n)\big )={\textbf{D}}f\big (\widetilde{Y}(t_n)\big )\big (Y_{\theta }(r)-\widetilde{Y}(t_n)\big )\nonumber \\{} & {} +\int _{0}^{1}(1-\epsilon ){\textbf{D}}^2f\Big (\widetilde{Y}(t_n)-\epsilon \cdot \big (Y_{\theta }(r)-\widetilde{Y}(t_n)\big )\Big )\Big [Y_{\theta }(r)-\widetilde{Y}(t_n)\Big ]^2d\epsilon . \end{aligned}$$
(7.21)

Then we substitute for \(Y_{\theta }(r)\) in the first term on the RHS of (7.21) using (3.4), where we use the expanded form of the map as characterised in (2.12) with \(s=r\). Therefore, for the last term on the RHS of (7.19), we have

$$\begin{aligned} {\mathbb {E}}\big [H\big |{\mathcal {F}}_{t_n}\big ]\,\le \, H_1+H_2+H_3+H_4+H_5+H_6, \end{aligned}$$
(7.22)

where

$$\begin{aligned} H_1 := \,\,&{\mathbb {E}}\Big [\left\langle E_{\theta }(r)\textbf{,}\,\, {{\textbf {D}}}f\big ({{\widetilde{Y}}}(t_n)\big )|r-t_n|f\big (\widetilde{Y}(t_n)\big )\right\rangle \Big |{\mathcal {F}}_{t_n}\Big ]; \\ H_2 := \,\,&{\mathbb {E}}\bigg [\left\langle E_{\theta }(r)\textbf{,}\,\, \underbrace{\sum _{i=1}^{m}{{\textbf {D}}}f\big (\widetilde{Y}(t_n)\big )g_i\big ({{\widetilde{Y}}}(t_n)\big )I_{i}^{t_n,r}}_{=:H_{2R}} \right\rangle \bigg |{\mathcal {F}}_{t_n} \bigg ];\\ H_{3}:=\,\,&{\mathbb {E}}\Bigg [\Bigg \langle E_{\theta }(r),\,\, \frac{1}{2}\sum _{i=1}^{m}{{\textbf {D}}}f\big (\widetilde{Y}(t_n)\big ){{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )g_i\big ({{\widetilde{Y}}}(t_n)\big )\\ {}&\times \left( \left( I_{i}^{t_n,r}\right) ^2-|r-t_n|\right) \Bigg \rangle \Bigg |{\mathcal {F}}_{t_n} \Bigg ];\\ H_{4}\,:=\,\,&{\mathbb {E}}\Bigg [\Bigg \langle E_{\theta }(r),\,\,\frac{1}{2}\sum _{{\begin{array}{c} i,j=1\\ i<j \end{array}}}^{m}{{\textbf {D}}}f\big (\widetilde{Y}(t_n)\big )\Big ({{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big ({{\widetilde{Y}}}(t_n)\big )\\&\qquad +{{\textbf {D}}}g_j\big (\widetilde{Y}(t_n)\big )g_i\big ({{\widetilde{Y}}}(t_n)\big )\Big ) I_{i}^{t_n,r}I_{j}^{t_n,r} \Bigg \rangle \Bigg |{\mathcal {F}}_{t_n} \Bigg ];\\ H_{5}:=&\,\,{\mathbb {E}}\Bigg [ \Bigg \langle E_{\theta }(r),\,\, \sum _{{\begin{array}{c} i,j=1\\ i<j \end{array}}}^{m}{{\textbf {D}}}f\big (\widetilde{Y}(t_n)\big )\Big ({{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big ({{\widetilde{Y}}}(t_n)\big )\\ {}&\qquad \qquad \qquad \quad -{{\textbf {D}}}g_j\big (\widetilde{Y}(t_n)\big )g_i\big ({{\widetilde{Y}}}(t_n)\big )\Big )A_{ij}(t_n, r)\Bigg \rangle \Bigg |{\mathcal {F}}_{t_n} \Bigg ];\\ H_{6}:=\,&{\mathbb {E}}\bigg [\bigg <E_{\theta }(r),\,\, \int _{0}^{1}(1-\epsilon ){{\textbf {D}}}^2f\Big (\widetilde{Y}(t_n)-\epsilon \cdot \big (Y_{\theta }(r)-{{\widetilde{Y}}}(t_n)\big )\Big )\\ {}&\qquad \times \Big [Y_{\theta }(r)-\widetilde{Y}(t_n)\Big ]^2d\epsilon \bigg >\bigg |{\mathcal {F}}_{t_n} \bigg ]. \end{aligned}$$

We will now determine suitable upper bounds for each of \(H_1\), \(H_2\), \(H_3\), \(H_4\), \(H_5\), and \(H_6\) in turn. For \(H_1\) in (7.22), by the Cauchy-Schwarz inequality, (2.1), and (6.1), we have

$$\begin{aligned} H_1\le&\,\,{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert \, \big \Vert {\textbf{D}}f\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}}\,\big \Vert f\big (\widetilde{Y}(t_n)\big )\big \Vert \,|r-t_n|\,\Big |{\mathcal {F}}_{t_n}\Big ]\nonumber \\ \le&\,\,{\mathbb {E}}\left[ \frac{1}{2}\big \Vert {\textbf{D}}f\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}}^2\,\big \Vert f\big (\widetilde{Y}(t_n)\big )\big \Vert ^2\Vert E_{\theta }(r)\Vert ^2+\frac{1}{2}|r-t_n|^2\,\bigg |{\mathcal {F}}_{t_n}\right] \nonumber \\ \le&\,\,\frac{1}{2} C_{Df}^2 C_{f}^2\,\,{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]+\frac{1}{2}|r-t_n|^2. \end{aligned}$$
(7.23)

Next, for the analysis of \(H_2\) in (7.22), by (3.8) we first have

$$\begin{aligned} {\mathbb {E}}\big [H_{2R}\big |{\mathcal {F}}_{t_n} \big ]=\sum _{i=1}^{m}{\textbf{D}}f\big (\widetilde{Y}(t_n)\big )g_i\big (\widetilde{Y}(t_n)\big ){\mathbb {E}}\Big [I_{i}^{t_n,r}\Big |{\mathcal {F}}_{t_n} \Big ] =0. \end{aligned}$$
(7.24)

By (2.3), the Cauchy-Schwarz inequality, (6.1) and (3.9) we also have

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert H_{2R}\big \Vert ^2\Big |{\mathcal {F}}_{t_n} \Big ] \le \,\,&m\sum _{i=1}^{m}\big \Vert {\textbf{D}}f\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}}^2 \big \Vert g_i\big (\widetilde{Y}(t_n)\big )\big \Vert ^2 {\mathbb {E}}\left[ \left| I_{i}^{t_n,r} \right| ^2 \bigg |{\mathcal {F}}_{t_n} \right] \nonumber \\ \le \,\,&m^2 C_{Df}^2 C_{g_i}^2|r-t_n|. \end{aligned}$$
(7.25)

Then, for \(H_2\) in (7.22), we first expand \(E_{\theta }(r)\) using (7.3) to obtain

$$\begin{aligned} H_2=&\, {\mathbb {E}}\left[ \left\langle {{\widetilde{E}}}(t_n)\varvec{,}\,\, H_{2R} \right\rangle \bigg |{\mathcal {F}}_{t_n} \right] +{\mathbb {E}}\left[ \left\langle \int _{t_n}^{r}\Delta f\big (X(p),{{\widetilde{Y}}}(t_n)\big )dp\varvec{,}\,\, H_{2R}\right\rangle \bigg |{\mathcal {F}}_{t_n} \right] \nonumber \\ {}&\quad +{\mathbb {E}}\left[ \left\langle \sum _{i=1}^{m}\int _{t_n}^{r}\Delta g_i\big (p,X(p),{{\widetilde{Y}}}(t_n)\big )dW_{i}(p)\varvec{,}\,\, H_{2R} \right\rangle \bigg |{\mathcal {F}}_{t_n} \right] \nonumber \\ =:&\,\, H_{2.1}+H_{2.2}+H_{2.3}. \end{aligned}$$
(7.26)

For \(H_{2.1}\) in (7.26), by (7.24) we have

$$\begin{aligned} H_{2.1}=\big \langle {{\widetilde{E}}}(t_n)\varvec{,}\,\, {\mathbb {E}}\big [H_{2R}\big |{\mathcal {F}}_{t_n} \big ] \big \rangle =0. \end{aligned}$$
(7.27)

For \(H_{2.2}\) in (7.26), adding and subtracting \(f(X(t_n))\) in \(\Delta f\) from (7.4):

$$\begin{aligned} H_{2.2}&=\,\, {\mathbb {E}}\Bigg [\left\langle \int _{t_n}^{r} f(X(p))-f(X(t_n)) dp\varvec{,}\,\, H_{2R} \right\rangle \Bigg |{\mathcal {F}}_{t_n} \Bigg ]\nonumber \\ {}&\quad \,\,+{\mathbb {E}}\Bigg [\left\langle \int _{t_n}^{r} f(X(t_n))-f\big (\widetilde{Y}(t_n)\big ) dp\varvec{,}\,\, H_{2R}\right\rangle \Bigg |{\mathcal {F}}_{t_n} \Bigg ]\nonumber \\ {}&=:\,\, H_{2.21}+H_{2.22}. \end{aligned}$$
(7.28)

Similar to \(H_{2.1}\) in (7.27), we have \(H_{2.22} = 0\). For \(H_{2.21}\) in (7.28), using the Cauchy-Schwarz inequality and (7.25) we have

$$\begin{aligned} H_{2.21}\le&\,\, {\mathbb {E}}\left[ \left\| \int _{t_n}^{r} f(X(p))-f(X(t_n)) dp\right\| \big \Vert H_{2R}\big \Vert \bigg |{\mathcal {F}}_{t_n} \right] \nonumber \\ \le&\Bigg (|r-t_n|\int _{t_n}^r{\mathbb {E}}\left[ \left\| f(X(p))-f(X(t_n)) \right\| ^2 \bigg |{\mathcal {F}}_{t_n} \right] dp\,\,{\mathbb {E}}\Big [\big \Vert H_{2R}\big \Vert ^2\Big |{\mathcal {F}}_{t_n} \Big ]\Bigg )^{1/2}\nonumber \\ \le&\,\, m C_{Df} C_{g_i}|r-t_n|\left( \int _{t_n}^r{\mathbb {E}}\left[ \left\| f(X(p))-f(X(t_n)) \right\| ^2\bigg |{\mathcal {F}}_{t_n} \right] dp\right) ^{1/2}. \end{aligned}$$
(7.29)

By Taylor expansion of f(X(p)) around \(X(t_n)\) to first order, and using (2.6), the Cauchy-Schwarz inequality, Lemma 6.4 with \(k=2\) and (2.3):

$$\begin{aligned}&\,\,{\mathbb {E}}\left[ \big \Vert f(X(p))-f(X(t_n)) \big \Vert ^2\bigg |{\mathcal {F}}_{t_n} \right] \nonumber \\&=\,\,{\mathbb {E}}\left[ \left\| \int _{0}^{1}{\textbf{D}}f\big (X(t_n)-\epsilon \cdot (X(p)-X(t_n))\big )(X(p)-X(t_n))d\epsilon \right\| ^2\bigg |{\mathcal {F}}_{t_n} \right] \nonumber \\&\le \Big ({\mathbb {E}}\left[ \Vert X(p)-X(t_n)\Vert ^4\bigg |{\mathcal {F}}_{t_n} \right] \Big )^{1/2}\nonumber \\&\qquad \qquad \times \left( {\mathbb {E}}\left[ \left\| \int _{0}^{1}{\textbf{D}}f\big (X(t_n)-\epsilon \cdot (X(p)-X(t_n))\big )d\epsilon \right\| _{{\textbf{F}}}^4\bigg |{\mathcal {F}}_{t_n} \right] \right) ^{1/2}\nonumber \\&\le \,\,{\overline{C}}^{\{4(q+2)\}}_{H2.21}|p-t_n|, \end{aligned}$$
(7.30)

where

$$\begin{aligned} {\overline{C}}^{\{4(q+2)\}}_{H2.21}:= & {} \,\,\left( \overline{C}^{\{4(q+2)\}}_{\texttt {PR}}\right) ^{1/2}\nonumber \\{} & {} \times c_3^2\Big (1+3^{4q_1+4}{\mathbb {E}} \Big [\sup _{p\in [t_n,t_{n+1}]}\Vert X(p)\Vert ^{4q_1+4}\Big |{\mathcal {F}}_{t_n} \Big ]\Big )^{1/2}. \end{aligned}$$
(7.31)

Substituting (7.30) back into (7.29) and using \(H_{2.22}=0\), we have

$$\begin{aligned} H_{2.2} \le m C_{Df} C_{g_i} \left( \overline{C}^{\{4(q+2)\}}_{H2.21}\right) ^{1/2}|r-t_n|^2. \end{aligned}$$
(7.32)

For \(H_{2.3}\) in (7.26), using the Cauchy-Schwarz inequality, (2.3), (6.1), (3.9) and Itô's isometry we have

$$\begin{aligned} H_{2.3}\le&\left( {\mathbb {E}}\left[ \left\| \sum _{i=1}^{m}\int _{t_n}^{r}\Delta g_i\big (p,X(p),\widetilde{Y}(t_n)\big )dW_{i}(p)\right\| ^2\bigg |{\mathcal {F}}_{t_n} \right] \right) ^{1/2}\\&\qquad \qquad \quad \times \left( {\mathbb {E}}\left[ \left\| \sum _{i=1}^{m}{\textbf{D}}f\big (\widetilde{Y}(t_n)\big )g_i\big ({{\widetilde{Y}}}(t_n)\big )I_{i}^{t_n,r} \right\| ^2 \bigg |{\mathcal {F}}_{t_n} \right] \right) ^{1/2}\\ \le&\left( m\sum _{i=1}^{m}\int _{t_n}^{r}{\mathbb {E}}\left[ \left\| \Delta g_i\big (p,X(p),{{\widetilde{Y}}}(t_n)\big ) \right\| ^2\bigg |{\mathcal {F}}_{t_n} \right] dp\right) ^{1/2} \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \quad \times m C_{Df} C_{g_i} |r-t_n|^{1/2}. \end{aligned}$$

Then, by Lemma 7.1 we have

$$\begin{aligned}{} & {} H_{2.3} \le \bigg (2m^2\int _{t_n}^{r} {\mathbb {E}}\Big [\big \Vert g(X(p))-g(Y_{\theta }(p))\big \Vert _{{{\textbf {F}}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]dp \\{}{} & {} {}\qquad +C_{G}(R) |r-t_n|^3\bigg )^{1/2}m C_{Df} C_{g_i} |r-t_n|^{1/2}. \end{aligned}$$

Since the integrand \({\mathbb {E}}\Big [\big \Vert g(X(p))-g(Y_{\theta }(p))\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]\) is non-negative for all \(p\in [t_n,t_{n+1}]\), we can replace the upper limit of integration with \(t_{n+1}\). With \(\sqrt{a+b}\le \sqrt{a}+\sqrt{b}\), we have

$$\begin{aligned} H_{2.3}\le & {} \sqrt{2}m^2 C_{Df} C_{g_i}|r-t_n|^{1/2}\nonumber \\{} & {} \times \left( \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]dr\right) ^{1/2}\nonumber \\{} & {} +m C_{Df} C_{g_i}C_{G}(R)^{1/2} |r-t_n|^{2}. \end{aligned}$$
(7.33)

Notice that we changed the variable of integration from p back to r for consistency. Substituting (7.27), (7.32) and (7.33) back into (7.26), we have

$$\begin{aligned} H_{2}&\le \,\,m C_{Df} C_{g_i} \Bigg (\left( \overline{C}^{\{4(q+2)\}}_{H2.21}\right) ^{1/2}+C_{G}(R)^{1/2} \Bigg )|r-t_n|^2\nonumber \\&\quad +\sqrt{2}m^2 C_{Df} C_{g_i}|r-t_n|^{1/2}\nonumber \\&\quad \times \left( \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]dr\right) ^{1/2}. \end{aligned}$$
(7.34)

For \(H_{3}\) in (7.22), by the Cauchy-Schwarz inequality, the triangle inequality, (2.3), (2.1), (3.10), (2.6) and (6.1) we have

$$\begin{aligned} H_{3}&\le \,\,{\mathbb {E}}\Bigg [\frac{1}{4}\sum _{i=1}^{m}\Bigg ( \big \Vert {\textbf{D}}f\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}}^2\,\big \Vert {\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}}^2\,\big \Vert g_i\big (\widetilde{Y}(t_n)\big )\big \Vert ^2\Vert E_{\theta }(r)\Vert ^2 \nonumber \\&\quad + 2\left| I_{i}^{t_n,r}\right| ^4+ 2|r-t_n|^2\Bigg ) \Bigg |{\mathcal {F}}_{t_n} \Bigg ]\nonumber \\&\le \,\,\frac{m}{4} C_{Df}^2 C_{Dg_i}^2 C_{g_i}^2\,\,{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]+ \frac{(\varvec{\gamma }_{4}+1)m}{2}|r-t_n|^2. \end{aligned}$$
(7.35)

For \(H_{4}\) in (7.22), by the Cauchy-Schwarz inequality, conditional independence of the Itô integrals, (3.9), the triangle inequality, (2.1), Itô's isometry, (2.6), and (6.1), we have

$$\begin{aligned} H_{4}&\le \,\,{\mathbb {E}}\Bigg [\frac{1}{2}\sum _{\begin{array}{c} i,j=1\\ i<j \end{array}}^{m}\Vert E_{\theta }(r)\Vert \big \Vert {{\textbf {D}}}f\big (\widetilde{Y}(t_n)\big )\big \Vert _{{{\textbf {F}}}}\Big (\big \Vert {{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )\big \Vert _{{{\textbf {F}}}}\big \Vert g_j\big (\widetilde{Y}(t_n)\big )\big \Vert \nonumber \\ {}&\quad \,\, +\big \Vert {{\textbf {D}}}g_j\big (\widetilde{Y}(t_n)\big )\big \Vert _{{{\textbf {F}}}}\big \Vert g_i\big (\widetilde{Y}(t_n)\big )\big \Vert \Big ) \left| I_{i}^{t_n,r}\right| \,\left| I_{j}^{t_n,r}\right| \Bigg | {\mathcal {F}}_{t_n} \Bigg ]\nonumber \\ {}&\le \,\,\frac{1}{4}m(m-1) C_{Df}^2 C_{Dg_i}^2 C_{g_i}^2{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]+\frac{1}{8}m(m-1)|r-t_n|^2. \end{aligned}$$
(7.36)

For \(H_{5}\) in (7.22), by the Cauchy-Schwarz inequality, the triangle inequality, (2.1), (6.1), (2.6), and Lemma 2.2 with \(b=2\), we have

$$\begin{aligned} H_{5}&\le \,\,{\mathbb {E}}\Bigg [\frac{1}{2}\sum _{{\begin{array}{c} i,j=1\\ i<j \end{array}}}^{m}\Bigg (\big \Vert {{\textbf {D}}}f\big (\widetilde{Y}(t_n)\big )\Vert _{{{\textbf {F}}}}^2\Vert E_{\theta }(r)\Vert ^2\Big (\big \Vert {{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )\big \Vert _{{{\textbf {F}}}}^2\big \Vert g_j\big (\widetilde{Y}(t_n)\big )\big \Vert ^2\nonumber \\ {}&\quad \,\, +\big \Vert {{\textbf {D}}}g_j\big (\widetilde{Y}(t_n)\big )\big \Vert _{{{\textbf {F}}}}^2\big \Vert g_i\big (\widetilde{Y}(t_n)\big )\big \Vert ^2\Big )+\big |A_{ij}(t_n, r)\big |^2\Bigg )\Bigg |{\mathcal {F}}_{t_n} \Bigg ]\nonumber \\ {}&\le \,\,\frac{1}{2}m(m-1) C_{Df}^2 C_{Dg_i}^2 C_{g_i}^2{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]\nonumber \\ {}&\quad \,\, +\frac{1}{4}m(m-1)(C^{}_{\texttt {LA}}\left( 2\right) )^2|r-t_n|^2. \end{aligned}$$
(7.37)

For \(H_{6}\) in (7.22), by the Cauchy-Schwarz inequality, the triangle inequality, and (2.1), we have (noting that \(\Vert [\cdot ]^2 \Vert _{\textbf{F}}=\Vert \cdot \Vert ^2\))

$$\begin{aligned} H_{6}\le \,&{\mathbb {E}}\bigg [\Vert E_{\theta }(r)\Vert \big \Vert Y_{\theta }(r)-{{\widetilde{Y}}}(t_n)\big \Vert ^2\\&\qquad \qquad \times \left\| \int _{0}^{1}(1-\epsilon ){\textbf{D}}^2f\Big (\widetilde{Y}(t_n)-\epsilon \cdot \big (Y_{\theta }(r)-\widetilde{Y}(t_n)\big )\Big )d\epsilon \right\| _{{\textbf{T}}_3}\bigg |{\mathcal {F}}_{t_n} \bigg ]\\ \le \,&\frac{1}{2}{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]+ \frac{1}{2}\underbrace{\sqrt{{\mathbb {E}}\left[ \big \Vert Y_{\theta }(r)-{{\widetilde{Y}}}(t_n)\big \Vert ^8\bigg |{\mathcal {F}}_{t_n} \right] }}_{H_{6.1}}\\&\qquad \times \underbrace{\sqrt{{\mathbb {E}}\left[ \left\| \int _{0}^{1}(1-\epsilon ){\textbf{D}}^2f\Big (\widetilde{Y}(t_n)-\epsilon \cdot \big (Y_{\theta }(r)-\widetilde{Y}(t_n)\big )\Big )d\epsilon \right\| _{{\textbf{T}}_3}^4 \bigg |{\mathcal {F}}_{t_n} \right] }}_{H_{6.2}}. \end{aligned}$$

From (6.6) in Lemma 6.3 with \(k=4\), we have \(H_{6.1}\le C^{}_{\texttt {SR}}\left( 4,R\right) ^{1/2} |r-t_n|^2.\) From (6.11) in Lemma 6.5 with \(k=2\), we have \(H_{6.2} \le C^{}_{\texttt {TE}}\left( 2,R\right) ^{1/2}\). Therefore, \(H_{6}\) in (7.22) becomes

$$\begin{aligned} H_{6}\le \frac{1}{2}{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ] +\frac{1}{2} C^{}_{\texttt {SR}}\left( 4,R\right) ^{1/2}\, C^{}_{\texttt {TE}}\left( 2,R\right) ^{1/2}\,|r-t_n|^2. \end{aligned}$$
(7.38)

Substituting (7.23), (7.34), (7.35), (7.36), (7.37) and (7.38) back into (7.22) for H, we have

$$\begin{aligned} {\mathbb {E}}[H|{\mathcal {F}}_{t_n}]\le & {} K_1(R) {\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]+\overline{K}^{\{4(q+2)\}}_2(R)|r-t_n|^2\nonumber \\{} & {} +\sqrt{2}m^2 C_{Df} C_{g_i}|r-t_n|^{1/2}\nonumber \\{} & {} \times \left( \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]dr\right) ^{1/2}, \end{aligned}$$
(7.39)

where

$$\begin{aligned} K_1(R) :=\,\,&\frac{1}{2} + \frac{1}{2} C_{Df}^2 C_{f}^2+m(m-1) C_{Df}^2 C_{Dg_i}^2 C_{g_i}^2, \end{aligned}$$
(7.40)

and, collecting (and slightly enlarging) the coefficients of \(|r-t_n|^2\) in (7.23) and (7.34)-(7.38), with \({\overline{C}}^{\{4(q+2)\}}_{H2.21}\) from (7.31),

$$\begin{aligned}&{\overline{K}}^{\{4(q+2)\}}_2(R)\nonumber \\&:=\,\,\frac{1}{2}+m C_{Df} C_{g_i} \bigg (\left( {\overline{C}}^{\{4(q+2)\}}_{H2.21}\right) ^{1/2}+C_{G}(R)^{1/2}\bigg )+\frac{1}{2}(\varvec{\gamma }_{4}+1)m\nonumber \\ {}&\quad +\frac{1}{4}m(m-1)\big (1+\left( C^{}_{\texttt {LA}}\left( 2\right) \right) ^2\big )+\frac{1}{2} C^{}_{\texttt {SR}}\left( 4,R\right) ^{1/2} C^{}_{\texttt {TE}}\left( 2,R\right) ^{1/2}. \end{aligned}$$
(7.41)

Substituting \({\mathbb {E}}[H|{\mathcal {F}}_{t_n}]\) from (7.39) back into (7.19), we have

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert E_{\theta }(t_{n+1})&\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]\le \big \Vert \widetilde{E}(t_n)\big \Vert ^2+2K_1(R)\displaystyle \int _{t_n}^{t_{n+1}}{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]dr\nonumber \\{}&{} +mC_{G}(R) h_{n+1}^3 +{\overline{K}}^{\{4(q+2)\}}_2(R) h_{n+1}^3\nonumber \\{}&{} +2 \displaystyle \int _{t_n}^{t_{n+1}}{\mathbb {E}}\big [J_{f,g}\big |{\mathcal {F}}_{t_n}\big ]dr+\sqrt{2 }m^2 C_{Df} C_{g_i}h_{n+1}^{3/2}\nonumber \\{}&{} \times \displaystyle \left( \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{{\textbf {F}}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]dr\right) ^{1/2}. \end{aligned}$$
(7.42)

Using (2.1) on the last term on the RHS of (7.42), we have

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert E_{\theta }&(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]\le \,\, \big \Vert \widetilde{E}(t_n)\big \Vert ^2+2K_1(R)\displaystyle \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]dr\nonumber \\{}&{} +{\overline{C}}_M^{\{4(q+2)\}}(R)\,h_{n+1}^3\nonumber \\{}&{} +2 \displaystyle \int _{t_n}^{t_{n+1}}{\mathbb {E}}\bigg [J_{f,g}+\frac{1}{2}\Big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{{\textbf {F}}}(d\times m)}^2\bigg |{\mathcal {F}}_{t_n}\bigg ]dr, \end{aligned}$$
(7.43)
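
Here the last term on the RHS of (7.42) has been estimated using (2.1) as

$$\begin{aligned} \sqrt{2}m^2 C_{Df} C_{g_i}\,h_{n+1}^{3/2}\left( \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]dr\right) ^{1/2}\\ \le \,\, m^4 C^2_{Df} C^2_{g_i}\,h_{n+1}^{3}+\frac{1}{2}\int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]dr, \end{aligned}$$

which produces the first term of the \(h_{n+1}^3\) coefficient and the additional \(\frac{1}{2}\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2\) inside the final integral.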

In (7.43), \({\overline{C}}^{\{4(q+2)\}}_M\) is as defined in (7.15). Recall that \(J_{f,g}\) is given in (7.20), so that

$$\begin{aligned}{} & {} J_{f,g}+\frac{1}{2}\Big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2\\{} & {} =\Big \langle E_{\theta }(r),f(X(r))-f(Y_{\theta }(r))\Big \rangle +\frac{3}{2}\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2. \end{aligned}$$

By Assumption 2.2 we can apply the monotone condition (2.5):

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert E_{\theta }(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] \le&\,\, \big \Vert \widetilde{E}(t_n)\big \Vert ^2+C_E(R)\displaystyle \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]dr\\{}&{} +{\overline{C}}_M^{\{4(q+2)\}}(R)\,h_{n+1}^3, \end{aligned}$$

where \(C_E(R)\) is given in (7.14).

To obtain the final estimate on \(C_M(R)\) in the Lemma, we use the explicit forms of \({\overline{C}}_M^{\{4(q+2)\}}\), \(\overline{K}_2^{\{4(q+2)\}}\) and \({\overline{C}}_{H2.21}^{\{4(q+2)\}}\), given by (7.15), (7.41), and (7.31) respectively, together with (6.10) in the statement of Lemma 6.5, (2.9), and Assumption 2.2, to bound the expectation of \(\overline{C}_M^{\{4(q+2)\}}\) as follows,

$$\begin{aligned} C_M(R)\,:=\,\,&{\mathbb {E}}\Bigg [ {\overline{C}}_M^{\{4(q+2)\}}(R)\Bigg ]\nonumber \\ \le&\,m^4 C^2_{Df} C^2_{g_i}+ 2m C_{Df} C_{g_i} \Big ( c_3 C^{}_{\texttt {PR}}\left( 2\right) ^{1/4} \big (1+3^{q_1+1}C_{\texttt {X}} \big )+C_{G}(R)^{1/2} \Big )\nonumber \\&\quad +\frac{1}{2}(\varvec{\gamma }_{4}+1)m+\frac{1}{2}m(m-1)\big (1+(C^{}_{\texttt {LA}}\left( 2\right) )^2\big )\nonumber \\&\quad + C^{}_{\texttt {SR}}\left( 4,R\right) ^{1/2} C^{}_{\texttt {TE}}\left( 2,R\right) ^{1/2}+mC_{G}(R)+1. \end{aligned}$$
(7.44)

\(\square \)

7.2 Proof of Theorem 4.1 on strong convergence

Proof

First, by (7.1), the conditional second moment of the one-step error decomposes as

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] =\,\,{\mathbb {E}}\Big [\big \Vert E_{\theta }(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]+{\mathbb {E}}\Big [\big \Vert E_{\varphi }(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ], \end{aligned}$$
(7.45)

where, by (3.6) and (7.2), the one-step error of the backstop map satisfies

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert E_{\varphi }(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] \le&\,\, \big \Vert \widetilde{E}(t_n)\big \Vert ^2+C_{B_1}\displaystyle \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\Vert E_{\varphi }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]dr\nonumber \\ {}&+C_{B_2}\,h_{n+1}^3, \quad a.s. \end{aligned}$$
(7.46)

Therefore, substituting (7.13) and (7.46) into (7.45) and recalling (7.1), we have for any \(h_{n+1}\) that satisfies Assumption 3.1,

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]\le & {} \big \Vert \widetilde{E}(t_{n})\big \Vert ^2 + \Gamma _1(R) \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]dr\nonumber \\{} & {} +{\overline{\Gamma }}_2^{\{4(q+2)\}}\big (R \big ) h_{n+1} ^3,\quad a.s. \end{aligned}$$
(7.47)

where we define \(\Gamma _1\), \({\overline{\Gamma }}_2\) and, using (7.44), its expectation \(\Gamma _2\) as

$$\begin{aligned} \Gamma _1(R)&:=C_E(R)+C_{B_1};\nonumber \\ {\overline{\Gamma }}_2^{\{4(q+2)\}}\big (R\big )&:=\overline{C}_M^{\{4(q+2)\}}(R)+ C_{B_2};\nonumber \\ \Gamma _2(R)&:={\mathbb {E}}\bigg [{\overline{\Gamma }}_2^{\{4(q+2)\}}\big (R \big ) \bigg ] \le C_M(R) + C_{B_2}. \end{aligned}$$
(7.48)

For a fixed \(t>0\), let \(N^{(t)}\) be as in Definition 3.2. We multiply both sides of (7.47) by the indicator function \({\textbf{1}}_{\{N^{(t)}> n+1\}}\) and sum over the steps, excluding the last step \(N^{(t)}\), to obtain

$$\begin{aligned}{} & {} \sum _{n=0}^{N^{(t)}-2}{\mathbb {E}}\Big [\big \Vert \widetilde{E}(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] {{\textbf {1}}}_{\{N^{(t)}> n+1\}} \le \sum _{n=0}^{N^{(t)}-2}\big \Vert \widetilde{E}(t_{n})\big \Vert ^2{{\textbf {1}}}_{\{N^{(t)}> n+1\}} \nonumber \\{} & {} \qquad +\, \Gamma _1(R) \sum _{n=0}^{N^{(t)}-2}\int _{t_n}^{t_{n+1}}{\mathbb {E}}\Big [\big \Vert \widetilde{E}(r)\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]{{\textbf {1}}}_{\{N^{(t)}> n+1\}}dr\nonumber \\{} & {} \qquad + {\overline{\Gamma }}_2^{\{4(q+2)\}}(R)\sum _{n=0}^{N^{(t)}-2} h_{n+1}^3{{\textbf {1}}}_{\{N^{(t)}> n+1\}}. \end{aligned}$$
(7.49)

Since \(t\in \big [t_{N^{(t)}-1},t_{N^{(t)}}\big ]\), we use (7.47) to express the last step, noting that it holds when \(t_n,t_{n+1}\) are replaced by \(t_{N^{(t)}-1}\) and t respectively:

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t )\big \Vert ^2 \Big |{\mathcal {F}}_{t_{N^{(t)}-1}}\Big ] \le \,&\big \Vert \widetilde{E}(t_{N^{(t)}-1})\big \Vert ^2+\Gamma _1(R)\int _{t_{N^{(t)}-1}}^{t} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2 \Big |{\mathcal {F}}_{t_{N^{(t)}-1}}\Big ]dr\nonumber \\&\qquad \quad +{\overline{\Gamma }}_2^{\{4(q+2)\}}(R) \big |t-t_{N^{(t)}-1}\big |^3. \end{aligned}$$
(7.50)

Adding both sides of (7.49) and (7.50), and taking expectations:

$$\begin{aligned}&\left. \begin{array}{l} \qquad {\mathbb {E}}\Bigg [\displaystyle \sum _{n=0}^{N^{(t)}-2}\Big ({\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]-\big \Vert {{\widetilde{E}}}(t_{n})\big \Vert ^2\Big ){\textbf{1}}_{\{N^{(t)}> n+1\}} \\ \,\,\quad \quad \qquad \quad \qquad \qquad +{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t)\big \Vert ^2 \Big |{\mathcal {F}}_{t_{N^{(t)}-1}}\Big ]-\big \Vert {{\widetilde{E}}}(t_{N^{(t)}-1})\big \Vert ^2 \Bigg ] \\ \end{array}\right\} =:\text {LHS}\nonumber \\&\left. \begin{array}{l} \le \,\,\Gamma _1(R){\mathbb {E}}\Bigg [ \displaystyle \sum _{n=0}^{N^{(t)}-2}\int _{t_n}^{t_{n+1}}{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]{\textbf{1}}_{\{N^{(t)}> n+1\}}dr\\ \qquad \qquad \qquad \qquad \qquad \qquad +\displaystyle \int _{t_{N^{(t)}-1}}^{t} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2 \Big |{\mathcal {F}}_{t_{N^{(t)}-1}}\Big ]dr\Bigg ] \\ \end{array}\right\} =: \text {R}_1 \nonumber \\&\left. \begin{array}{l} \quad +{\mathbb {E}}\Bigg [{\overline{\Gamma }}_2^{\{4(q+2)\}}(R)\Bigg (\displaystyle \sum _{n=0}^{N^{(t)}-2} h_{n+1}^3{\textbf{1}}_{\{N^{(t)}> n+1\}}+\big |t-t_{N^{(t)}-1}\big |^3\Bigg )\Bigg ] \end{array}\right\} =: \text {R}_2 \end{aligned}$$
(7.51)

where we analyse (7.51) (\(\text {LHS}\le \text {R}_1+\text {R}_2\)) below. For the LHS in (7.51), \(N^{(t)}\) is a random integer taking values from \(N^{(t)}_{\min }\) to \(N^{(t)}_{\max }\), and \({\textbf{1}}_{\{N^{(t)}> n+1\}}\) is an \({\mathcal {F}}_{t_{n}}\)-measurable random variable. Therefore it is useful to decompose the range of n into three parts on each trajectory. First, when \(n<N^{(t)}-1\), then \(1_{\{N^{(t)}>n+1\}}=1_{\{N^{(t)}>n\}}=1\). Second, when \(n=N^{(t)}-1\), then \(1_{\{N^{(t)}>n+1\}}=0\) and \(1_{\{N^{(t)}>n\}}=1\). Finally, when \(n>N^{(t)}-1\), then \(1_{\{N^{(t)}>n+1\}}=1_{\{N^{(t)}>n\}}=0\). Hence we obtain a telescoping sum with the appropriate cancellation, which terminates at \({\mathbb {E}}\big [\Vert {\widetilde{E}}(t_{N^{(t)}-1})\Vert ^2\,1_{\{N^{(t)}>N^{(t)}-1\}}\big ]={\mathbb {E}}\big [\Vert {\widetilde{E}}(t_{N^{(t)}-1})\Vert ^2\big ]\). Applying this with the tower property of conditional expectation, and using the fact that \(\Vert {\widetilde{E}}(t_0)\Vert ^2=0\), we have

$$\begin{aligned} \text {LHS}=&\sum _{n=0}^{N_{\max }^{(t)}-2}{\mathbb {E}}\Big [\big \Vert \widetilde{E}(t_{n+1})\big \Vert ^2 {\textbf{1}}_{\{N^{(t)}> n+1\}} -\big \Vert \widetilde{E}(t_{n})\big \Vert ^2 {\textbf{1}}_{\{N^{(t)}> n+1\}}\Big ]\nonumber \\&\qquad \qquad \qquad \qquad \qquad +{\mathbb {E}}\Big [{\mathbb {E}}\Big [\big \Vert \widetilde{E}(t )\big \Vert ^2 \Big |{\mathcal {F}}_{t_{N^{(t)}-1}}\Big ]-\big \Vert \widetilde{E}(t_{N^{(t)}-1})\big \Vert ^2\Big ] \nonumber \\ =\,\,&{\mathbb {E}}\Big [\big \Vert \widetilde{E}(t_{N^{(t)}-1})\big \Vert ^2\Big ]-{\mathbb {E}}\Big [\big \Vert \widetilde{E}(t_{0})\big \Vert ^2\Big ]+ {\mathbb {E}}\big [\big \Vert {{\widetilde{E}}}(t )\big \Vert ^2 \big ]-{\mathbb {E}}\Big [\big \Vert \widetilde{E}(t_{N^{(t)}-1})\big \Vert ^2\Big ]\nonumber \\ =\,\,&{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t )\big \Vert ^2 \Big ]. \end{aligned}$$
(7.52)

Consider the term \(\text {R}_1\) on the RHS of (7.51). By Definition 3.2 we have \(n=N^{(r)}-1\) for each \(r\in [t_n,t_{n+1}]\). So we may rewrite \({\mathcal {F}}_{t_n}\) as \({\mathcal {F}}_{t_{N^{(r)}-1}}\), and the indicator function as \({\textbf{1}}_{\{N^{(t)}>N^{(r)}\}}\). Summing over all the steps results in a single integral from 0 to \(t_{N^{(t)}-1}\), so that

$$\begin{aligned} \text{ R}_1&=\,\,\Gamma _1(R){\mathbb {E}}\Bigg [ \int _{0}^{t_{N^{(t)}-1}}{\mathbb {E}}\Big [\big \Vert \widetilde{E}(r)\big \Vert ^2{{\textbf {1}}}_{\{N^{(t)}>N^{(r)}\}} \Big |{\mathcal {F}}_{t_{N^{(r)}-1}}\Big ]dr\nonumber \\ {}&\quad \,\, +\int _{t_{N^{(t)}-1}}^{t} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2 \Big |{\mathcal {F}}_{t_{N^{(t)}-1}}\Big ]dr\Bigg ]\nonumber \\ {}&\le \,\,\Gamma _1(R)\int _{0}^{t} {\mathbb {E}}\Big [\big \Vert \widetilde{E}(r)\big \Vert ^2 \Big ]dr. \end{aligned}$$
(7.53)

For \(\text {R}_2\) in (7.51), by (7.48), Definition 3.2 and \(\rho h_{\min }=h_{\max }\) we have

$$\begin{aligned} \text {R}_2\le \Gamma _2(R) N^{(t)}_{\max }h_{\max }^3 \le \Gamma _2(R)\left( \rho t+1\right) h_{\max }^2. \end{aligned}$$
(7.54)
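
For clarity we record the elementary step count behind (7.54): since every step of the scheme is at least the minimum step \(h_{\min }=h_{\max }/\rho \), the number of steps needed to reach t satisfies \(N^{(t)}_{\max }\le t/h_{\min }+1=\rho t/h_{\max }+1\), so that (taking, say, \(h_{\max }\le 1\))

$$\begin{aligned} N^{(t)}_{\max }\,h_{\max }^3\,\le \,\Big (\frac{\rho t}{h_{\max }}+1\Big )h_{\max }^3\,=\,\rho t\, h_{\max }^2+h_{\max }^3\,\le \,\left( \rho t+1\right) h_{\max }^2. \end{aligned}$$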

We see that \(4(q+2)\) is the minimum number of finite SDE moments required for a finite \(\text {R}_2\), and this is guaranteed by Assumption 2.2. Combining (7.52), (7.53) and (7.54) back into (7.51), for all \(t\in [0,T]\) we have

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t)\big \Vert ^2 \Big ]\,\,\le \,\, \Gamma _1(R)\int _{0}^{t} {\mathbb {E}}\Big [\big \Vert \widetilde{E}(r)\big \Vert ^2 \Big ]dr +\Gamma _2(R)\left( \rho t+1\right) h_{\max }^2. \end{aligned}$$

By Gronwall’s inequality (see [25, Thm. 8.1]), we have for all \(t\in [0,T]\)

$$\begin{aligned} \Big ({\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t)\big \Vert ^2 \Big ]\Big )^{\frac{1}{2}}\le C(R,\rho ,t)\,h_{\max }. \end{aligned}$$
(7.55)
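
Explicitly, since \(\Gamma _1(R)\) does not depend on t and \(\Gamma _2(R)\left( \rho t+1\right) h_{\max }^2\) is nondecreasing in t, the integral form of Gronwall's inequality, applied to the bound preceding (7.55), gives

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t)\big \Vert ^2 \Big ]\,\le \, \Gamma _2(R)\left( \rho t+1\right) h_{\max }^2\,\exp \big (\Gamma _1(R)\,t\big ), \end{aligned}$$

and (7.55) follows by taking square roots and using (7.48).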

Taking the maximum over t on both sides, the proof follows with

$$\begin{aligned} C(R,\rho ,t):=\sqrt{\big ( C_M(R)+ C_{B_2}\big )\left( \rho t+1\right) \exp \Big (t\big (C_E(R)+C_{B_1}\big )\Big )}. \end{aligned}$$

\(\square \)

7.3 Proof of Theorem 4.2 on the probability of using the backstop

Proof

By (3.11) and the Markov inequality we have

$$\begin{aligned} {\mathbb {P}}\big [h_{n+1}= h_{\min }\big ]={\mathbb {P}}\left[ \frac{h_{\max }}{\big \Vert \widetilde{Y}(t_n)\big \Vert ^{1/\kappa }}\le h_{\min }\right] \le \frac{{\mathbb {E}}\Big [\big \Vert \widetilde{Y}(t_n)\big \Vert ^2\Big ]}{\rho ^{2\kappa }}. \end{aligned}$$
(7.56)
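
Here we used that \(\rho h_{\min }=h_{\max }\), so that the event in (7.56) rewrites as \(\big \{\Vert \widetilde{Y}(t_n)\Vert \ge \rho ^{\kappa }\big \}\), and then the Markov inequality applied to \(\Vert \widetilde{Y}(t_n)\Vert ^2\) gives

$$\begin{aligned} {\mathbb {P}}\Big [\big \Vert \widetilde{Y}(t_n)\big \Vert \ge \rho ^{\kappa }\Big ]\,\le \,\frac{{\mathbb {E}}\Big [\big \Vert \widetilde{Y}(t_n)\big \Vert ^2\Big ]}{\rho ^{2\kappa }}. \end{aligned}$$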

By adding and subtracting \(X(t_n)\), together with the tower property of conditional expectation, (2.3), (7.1) and (2.9), we have

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert {{\widetilde{Y}}}(t_n)\big \Vert ^2\Big ] \le \,\,&2{\mathbb {E}}\Big [\big \Vert X(t_n)-{{\widetilde{Y}}}(t_n)\big \Vert ^2\Big ]+2{\mathbb {E}}\big [\Vert X(t_n)\Vert ^2\big ]\nonumber \\ \le \,\,&2{\mathbb {E}}\Big [{\mathbb {E}}\Big [\big \Vert X(t_n)-{{\widetilde{Y}}}(t_n)\big \Vert ^2\Big | {\mathcal {F}}_{t_{n-1}}\Big ]\Big ]+2{\mathbb {E}}\bigg [\sup _{t_n\in [0,T]}\Vert X(t_n)\Vert ^2\bigg ]\nonumber \\ \le \,\,&2{\mathbb {E}}\Big [{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t_n)\big \Vert ^2\Big | {\mathcal {F}}_{t_{n-1}}\Big ]\Big ]+2C_{\texttt {X}}. \end{aligned}$$
(7.57)

Next, we repeatedly substitute (7.47), for decreasing values of n, into the RHS of (7.57) until \(n=0\). Then, using the tower property, Definition 3.2, (3.11) and (7.48), we have

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert {{\widetilde{Y}}}(t_n)\big \Vert ^2\Big ] \le \,\,&2{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t_{n-1})\big \Vert ^2\Big ]+2\Gamma _1(R){\mathbb {E}}\Bigg [\int _{t_{n-1}}^{t_n}{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2\Big | {\mathcal {F}}_{t_{n-1}}\Big ]dr\Bigg ]\nonumber \\&\qquad +2{\mathbb {E}}\Bigg [{\overline{\Gamma }}_2^{\{4(q+2)\}}(R) h_{n} ^3\Bigg ]+2C_{\texttt {X}} \nonumber \\ \le \,\,&2{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t_{0})\big \Vert ^2\Big ]+2\Gamma _1(R){\mathbb {E}}\Bigg [\sum _{j=1}^{n}\int _{t_{j-1}}^{t_j}{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2\Big | {\mathcal {F}}_{t_{j-1}}\Big ]dr\Bigg ]\nonumber \\&\qquad +2N^{(T)}_{\max }\Gamma _2(R) h_{\max }^3+2C_{\texttt {X}} \nonumber \\ \le \,\,&2\Gamma _1(R){\mathbb {E}}\Bigg [\int _{0}^{t_n}{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2\Big | {\mathcal {F}}_{t_{N^{(r)}-1}}\Big ]dr\Bigg ]\nonumber \\&\qquad +2\left( \rho \,T+1 \right) \Gamma _2(R) h_{\max }^3+2C_{\texttt {X}}. \end{aligned}$$
(7.58)

Since the integrand \({\mathbb {E}}\Big [\big \Vert \widetilde{E}(r)\big \Vert ^2\Big | {\mathcal {F}}_{t_{N^{(r)}-1}}\Big ]\) in the second term on the RHS of (7.58) is almost surely non-negative for all \(r\in [0,T]\), we can replace the upper limit of integration with T. Using \({{\widetilde{E}}}(t_{0})=0\), (3.11), the tower property of conditional expectation, and (7.55) from Theorem 4.1, we have

$$\begin{aligned}&{\mathbb {E}}\Bigg [\int _{0}^{t_n}{\mathbb {E}}\Big [\big \Vert \widetilde{E}(r)\big \Vert ^2\Big | {\mathcal {F}}_{t_{N^{(r)}-1}}\Big ]dr\Bigg ] \nonumber \\ \le&\int _{0}^{T}{\mathbb {E}}\Big [\big \Vert \widetilde{E}(r)\big \Vert ^2\Big ]dr \le T\max _{r\in [0,T]}{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2\Big ] \le T\,C^2(R,\rho ,T)h_{\max }^2. \end{aligned}$$
(7.59)

Choosing \(h_{\max }\le 1/C(R,\rho ,T)\), we substitute (7.59) into (7.58) and then into (7.56) to obtain

$$\begin{aligned} {\mathbb {P}}\big [h_{n+1}\,=\,\, h_{\min }\big ] \,\le \,\, \frac{2\Big (\Gamma _1(R)+\left( T+1 \right) \Gamma _2(R) h_{\max }^2+C_{\texttt {X}} \Big )}{\rho ^{2\kappa -1}}=:\frac{C_{\text {prob}}}{\rho ^{2\kappa -1}}, \end{aligned}$$
(7.60)

and the rest of the proof follows. \(\square \)
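
To illustrate the event whose probability is bounded in (7.60), the following minimal Python sketch (ours, for illustration only, and not the scheme as specified in Section 3) implements a step-size rule of the form appearing in (3.11) and (7.56), namely \(h_{\max }/\Vert \widetilde{Y}(t_n)\Vert ^{1/\kappa }\) clamped to \([h_{\min },h_{\max }]\) with \(\rho =h_{\max }/h_{\min }\); the function and variable names are ours, and only the event \(\{h_{n+1}=h_{\min }\}\) that triggers the backstop is modelled, not the backstop map itself.

import numpy as np

def propose_step(y, h_max, rho, kappa):
    # Path-dependent proposal h_max / ||y||^(1/kappa), clamped to [h_min, h_max],
    # where h_min = h_max / rho is the minimum admissible step.
    h_min = h_max / rho
    norm_y = np.linalg.norm(y)
    h_raw = np.inf if norm_y == 0.0 else h_max / norm_y ** (1.0 / kappa)
    hit_min = h_raw <= h_min  # the event {h_{n+1} = h_min}, cf. (7.56)
    return min(h_max, max(h_min, h_raw)), hit_min

By (7.60), the frequency with which hit_min is True decays at least like \(\rho ^{-(2\kappa -1)}\) once \(h_{\max }\le 1/C(R,\rho ,T)\), which quantifies the sense in which the backstop is rarely invoked.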