1 Introduction

We investigate the use of adaptive time-stepping strategies in the construction of a strongly convergent explicit Milstein-type numerical scheme for a d-dimensional stochastic differential equation (SDE) of Itô-type on the probability space \((\Omega , {\mathcal {F}}, {\mathbb {P}})\),

$$\begin{aligned} X(t)= X(0)+\int _{0}^{t}f(X(r))dr +\sum _{i=1}^{m}\int _{0}^{t}g_i(X(r))dW_i(r), \end{aligned}$$
(1.1)

for \(t\in [0,T]\), \(T\ge 0\), and \(m\in {\mathbb {N}}\), where \(W=[W_1,\dots , W_m]^T\) is an m-dimensional Wiener process, the drift coefficient \(f: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) and the diffusion coefficient \(g: {\mathbb {R}}^d\rightarrow {\mathbb {R}}^{d\times m}\) each satisfy a local Lipschitz condition along with a polynomial growth condition and, together, a monotone condition. Both are twice continuously differentiable; see Assumptions 2.1 and 2.2. Throughout, we take the initial vector \(X(0)=X_0\in {\mathbb {R}}^d\) to be deterministic.

It was pointed out in [34] that, as a consequence of the analysis in [15] and because the Euler–Maruyama and Milstein methods coincide in the additive noise case, an explicit Milstein scheme over a uniform mesh cannot converge in \({L}_p\) to solutions of (1.1). We propose here an adaptive variant of the explicit Milstein method that achieves strong \({L}_2\) convergence of order one to solutions of (1.1). As an immediate consequence, in the case of additive noise an adaptive Euler–Maruyama method also has \({L}_2\) convergence of order one. To prove our convergence result it is essential to introduce a new variant of the admissible class of time-stepping strategies introduced in [17, 18], which we call path-bounded strategies.

Several variants of the fixed-step Milstein method have been proposed; see for example the tamed Milstein [20, 34], projected and split-step backward Milstein [1], truncated Milstein [10], implicit Milstein methods [13, 35] and a recent tamed stochastic Runge–Kutta method of order one [8], all designed to converge strongly to solutions of SDEs with more general drift and diffusion coefficients, such as in (1.1). However, with few exceptions (see [1, 20]), explicit methods of this kind have only been analysed in the case where the diffusion coefficients \(g_i\) satisfy a commutativity condition. We do not impose a commutativity restriction and hence must consider the associated Lévy areas (see Lemma 2.2).

A review of methods that adapt the timestep in order to control local error may be found in the introduction to [17]; we cite here [2, 7, 16, 21, 28, 31] and remark that our purpose is instead to handle the nonlinear response of the discrete system; see also [5, 6] and the discussion in [17, 18]. A common feature of this kind of adaptivity is the use of both a minimum and a maximum time step, where the magnitude of the minimum step is controlled by a free parameter that requires some a priori knowledge on the part of the user. The approach of [5, 6] was recently extended to McKean–Vlasov equations in [30] and includes a Milstein approximation. In addition we note the fully adaptive Milstein method proposed in [14] for a scalar SDE with light constraints on the coefficients. There the authors stated that such a method was easy to implement but hard to analyse, and as a result considered a different, but related, method.

Our framework for adaptivity was introduced in [17] for an explicit Euler–Maruyama method, and has since been extended to SDE systems with monotone coefficients in [18] and to SPDE methods in [3]. These methods all use a backstop method when the chosen strategy attempts to select a stepsize below the minimum step. We demonstrate here, for a path-bounded strategy, that the probability of using the backstop can be made arbitrarily small by choosing an appropriately large \(\rho \), and an appropriately small \(h_{\max }\). This is consistent with observation, and with the intuitive notion that the use of the backstop should be rare in practice.

The structure of the article is as follows. Mathematical preliminaries are considered in Sect. 2, including precise specifications of the conditions imposed on f and each \(g_i\), and the characterisation of an explicit Milstein method on an arbitrary mesh. The construction of the adaptive time-stepping strategy is outlined in Sect. 3, where we formulate the adaptive Milstein scheme with backstop which will be the subject of our main theorem. Both main results, on strong \(L_2\) convergence and on the probability of using the backstop method, are stated in Sect. 4; we defer their proofs to Sect. 7, with the supporting lemmas gathered in Sect. 6. In Sect. 5 we compare the adaptive scheme numerically to fixed-step methods and illustrate both convergence and efficiency. The proof of Lemma 2.2 is in Appendix A.

2 Mathematical preliminaries

We consider the d-dimensional Itô-type SDE (1.1) and for the remainder of the article let \(({\mathcal {F}}_t)_{t\ge 0}\) be the natural filtration of W. For all \(x \in {\mathbb {R}}^d\) and all \(\phi \in \textrm{C}^2({\mathbb {R}}^d,{\mathbb {R}}^d)\), the Jacobian matrix of \(\phi (x)\) is denoted \({\textbf{D}}\phi (x)\in {\mathcal {L}}({\mathbb {R}}^d,{\mathbb {R}}^d)\); the second derivative of \(\phi (x)\) with respect to the vector x forms a 3-tensor and is denoted \({\textbf{D}}^2\phi (x)\in {\mathcal {L}}({\mathbb {R}}^{d\times d},{\mathbb {R}}^d)\); and \([x]^2:=x\otimes x\) stands for the outer product of x with itself. Furthermore, let \(\Vert \cdot \Vert \) denote the standard \(l^2\) norm in \({\mathbb {R}}^d\) and \(\Vert \cdot \Vert _{{\textbf{F}}(a\times b)}\) the Frobenius norm of a matrix in \({\mathbb {R}}^{a\times b}\); for simplicity we write \(\Vert \cdot \Vert _{{\textbf{F}}}\) for the Frobenius norm of a matrix in \({\mathbb {R}}^{d\times d}\). \(\Vert \cdot \Vert _{{\textbf{T}}_3}\) denotes the induced tensor norm (spectral norm) of a 3-tensor in \({\mathbb {R}}^{d\times d\times d}\), defined as \(\big \Vert \cdot \big \Vert _{{\textbf{T}}_3}:=\sup _{h_1,h_2\in {\mathbb {R}}^d, \Vert h_1\Vert ,\Vert h_2\Vert \le 1}\big \Vert \cdot (h_1\otimes h_2)\big \Vert \). For \(a,b\in {\mathbb {R}}\), \(a\vee b\) denotes \(\max \{a,b\}\) and \(a\wedge b\) denotes \(\min \{a,b\}\). We frequently make use of the elementary inequality

$$\begin{aligned} 2ab\le a^2+b^2,\quad a,b\in {\mathbb {R}}, \end{aligned}$$
(2.1)

and of the following two standard extensions of Jensen’s inequality (see [23, Corollary A.10]). For \(f\in L^1\), if \(p\ge 1\),

$$\begin{aligned} \Bigg | \int _{0}^t f(s)ds \Bigg |^p \le t^{p-1}\int _{0}^t |f(s)|^p ds,\quad t\ge 0. \end{aligned}$$
(2.2)

For \(a_i\in {\mathbb {R}}\) and \(p\ge 1\),

$$\begin{aligned} \Bigg | \sum _{i=1}^{n} a_i \Bigg |^p \le n^{p-1}\sum _{i=1}^{n} |a_i|^p, \quad n\in {\mathbb {N}}\backslash \{ 0\}. \end{aligned}$$
(2.3)

We now present our assumptions on f and \(g_i\) in (1.1).

Assumption 2.1

Let \(f\in \textrm{C}^2({\mathbb {R}}^d,{\mathbb {R}}^d)\) and \(g\in \textrm{C}^2({\mathbb {R}}^d,{\mathbb {R}}^{d\times m})\) with \(g_i(x)=[g_{1,i}(x),\dots ,g_{d,i}(x)]^T\in \textrm{C}^2({\mathbb {R}}^d,{\mathbb {R}}^{d})\). For each \(\varkappa \ge 1\) there exists \(L_{\varkappa }>0\) such that

$$\begin{aligned} \big \Vert f(x)-f(y)\big \Vert ^2+\big \Vert g(x)-g(y)\big \Vert ^2_{{\textbf{F}}(d\times m)}\le L_{\varkappa }\big \Vert x-y\big \Vert ^2, \end{aligned}$$
(2.4)

for \(x,y\in {\mathbb {R}}^d\) with \(\Vert x\Vert \vee \Vert y\Vert \le \varkappa \), and there exists \(c\ge 0\) such that for some \(\eta \ge 2\)

$$\begin{aligned} \big \langle x-y, f(x)-f(y)\big \rangle +\frac{\eta -1}{2} \big \Vert g(x)-g(y)\big \Vert ^2_{{\textbf{F}}(d\times m)} \le c\big \Vert x-y\big \Vert ^2. \end{aligned}$$
(2.5)

In addition, for some constants \(c_{3},c_{4},c_{5},c_{6},q_1,q_2\ge 0\) and all \(i=1,\dots ,m\), we have

$$\begin{aligned} \big \Vert {{\textbf {D}}}f(x)\big \Vert _{{{\textbf {F}}}}\le \,\,&c_3(1+\Vert x\Vert ^{q_1+1}), \qquad \quad \big \Vert {{\textbf {D}}}g_i(x)\big \Vert _{{{\textbf {F}}}} \le \,\, c_4(1+\Vert x\Vert ^{q_2+1}), \end{aligned}$$
(2.6)
$$\begin{aligned} \big \Vert f(x)\big \Vert \le \,\,&c_5(1+\Vert x\Vert ^{q_1+2}), \quad \quad \big \Vert g(x)\big \Vert _{{{\textbf {F}}}(d\times m)} \le \,\, c_6(1+\Vert x\Vert ^{q_2+2}). \end{aligned}$$
(2.7)

Furthermore, for some \(c_{1},c_{2}\ge 0\) and all \(i=1,\dots ,m\), we have

$$\begin{aligned} \big \Vert {\textbf{D}}^2f(x)\big \Vert _{{\textbf{T}}_3}\le \,\, c_1(1+\Vert x\Vert ^{q_1}), \quad \big \Vert {\textbf{D}}^2g_i(x)\big \Vert _{{\textbf{T}}_3}\le \,\, c_2(1+\Vert x\Vert ^{q_2}). \end{aligned}$$
(2.8)

Under (2.4) and (2.5), the SDE (1.1) has a unique strong solution on any interval \([0, T]\), \(T < \infty \), on the filtered probability space \((\Omega , {\mathcal {F}}, ({\mathcal {F}}_t )_{t \ge 0}, {\mathbb {P}})\); see [11, 25] and [33].

Assumption 2.2

Suppose that (2.5) in Assumption 2.1 holds with

$$\begin{aligned} \eta \ge 4q + 2q_2+10, \end{aligned}$$

where \(q:=q_1\vee q_2\), and \(q_1\), \(q_2\) are from (2.7) in Assumption 2.1.

We now give the following lemma on moments of the solution.

Lemma 2.1

[26, Lem. 4.2] Let f and g satisfy (2.4) and (2.5), and suppose that Assumption 2.2 holds. If g further satisfies (2.7), then there is a constant \(C_{\texttt {X}} >0\) such that the solution of (1.1) satisfies

$$\begin{aligned} {\mathbb {E}}\biggl [\sup _{s\in [0,T]}\Vert X(s)\Vert ^{\eta -2q_2-2}\biggr ] \le C_{\texttt {X}}. \end{aligned}$$
(2.9)

Next we present the fixed-step Milstein method (see [19, Sec. 10.3]) that is the basis of the adaptive method presented in this article.

Definition 2.1

(Milstein method) For \(n\in {\mathbb {N}}\), \(s\in [t_n, t_{n+1}]\) and given \(Y(t_n)\), the fixed-step Milstein scheme for (1.1), interpolated over the interval \([t_n,t_{n+1}]\), is given by

$$\begin{aligned} Y(s):= & {} Y(t_n)+f\big (Y(t_n)\big )|s-t_n|+\sum _{i=1}^{m}g_i\big (Y(t_n)\big )I_{i}^{t_n,s}\nonumber \\{} & {} +\sum _{i,j=1}^{m}{\textbf{D}}g_i\big (Y(t_n)\big )g_j\big (Y(t_n)\big )I_{j,i}^{t_n,s}, \end{aligned}$$
(2.10)

where following [1, 34], the stochastic integral and the iterated stochastic integral are defined as

$$\begin{aligned} I_{i}^{t_n,s}:=\int _{t_{n}}^{s}dW_i(r), \qquad I_{j,i}^{t_n,s}:=\int _{t_{n}}^{s} \int _{t_{n}}^{r}dW_j(p) dW_i(r). \end{aligned}$$
(2.11)

Expanding the last term in (2.10) we have that

$$\begin{aligned}&\sum _{i,j=1}^{m}{{\textbf {D}}}g_i\big (Y(t_n)\big )g_j\big (Y(t_n)\big )I_{j,i}^{t_n,s}\nonumber \\=\,\,&\frac{1}{2}\sum _{i=1}^{m}{{\textbf {D}}}g_i\big (Y(t_n)\big )g_i\big (Y(t_n)\big )\left( \left( I_{i}^{t_n,s}\right) ^2-|s-t_n|\right) \nonumber \\ {}&+\frac{1}{2}\sum _{{\begin{array}{c} i,j=1\\ i<j \end{array}}}^{m}\Big ({{\textbf {D}}}g_i\big (Y(t_n)\big )g_j\big (Y(t_n)\big )+{{\textbf {D}}}g_j\big (Y(t_n)\big )g_i\big (Y(t_n)\big )\Big )I_{i}^{t_n,s}I_{j}^{t_n,s}\nonumber \\ {}&+\sum _{{\begin{array}{c} i,j=1\\ i<j \end{array}}}^{m}\Big ({{\textbf {D}}}g_i\big (Y(t_n)\big )g_j\big (Y(t_n)\big )-{{\textbf {D}}}g_j\big (Y(t_n)\big )g_i\big (Y(t_n)\big )\Big )A_{ij}^{t_n,s}, \end{aligned}$$
(2.12)

where the term \(A_{ij}^{t_n,s}\) is the Lévy area (see for example [22, Eq. (1.2.2)]) defined by

$$\begin{aligned} A_{ij}^{t_n,s}:=\frac{1}{2}\left( I_{i,j}^{t_n,s}-I_{j,i}^{t_n,s}\right) {,} \end{aligned}$$
(2.13)

and we have used the relations \(I_{i,i}^{t_n,s} = \frac{1}{2}\big ( (I_{i}^{t_n,s})^2 - |s-t_n|\big )\) and \(I_{i,j}^{t_n,s} + I_{j,i}^{t_n,s} = I_{i}^{t_n,s} I_{j}^{t_n,s}\). As mentioned in the introduction, many authors assume the following commutativity condition: \({\textbf{D}}g_i(y)g_j(y)={\textbf{D}}g_j(y)g_i(y)\) for all \(i,j=1,\dots , m\) and \(y\in {\mathbb {R}}^d\). When this holds, the last term in (2.12) vanishes, avoiding the need for any analysis of \(A_{ij}^{t_n,s}\) defined in (2.13). We do not impose such a condition in this paper, and therefore make use of the following conditional moment bounds on the Lévy areas.

Lemma 2.2

(Lévy Area) For all \(i,j =1,\dots ,m\) and \(0\le t_n\le s<T\), consider the pair of Wiener processes \((W_i(r),W_j(r))^T\), \(r\in [t_n,s]\), and the Lévy area \(A_{ij}^{t_n,s}\) defined in (2.13). Then there exists a finite constant \(C_{\texttt {LA}}\), whose explicit form is given in (A.1), such that for \(k\ge 1\)

$$\begin{aligned} {\mathbb {E}}\left[ \big |A_{ij}^{t_n,s}\big |^k \bigg |{\mathcal {F}}_{t_n}\right] \le C^{}_{\texttt {LA}}\left( k\right) \,|s-t_n|^k, \quad a.s. \end{aligned}$$
(2.14)

For proof see Appendix A.

3 Adaptive time-stepping strategies

To deal with the extra terms that arise from Milstein over Euler–Maruyama type discretisations, we introduce a new class of time-stepping strategies in Definition 3.5. Let \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) be a sequence of strictly positive random timesteps with corresponding random times \(\{t_n:=\sum _{i=1}^{n}h_i\}_{n\in {\mathbb {N}}\backslash \{0\}}\), where \(t_0=0\).

Definition 3.1

Suppose that each member of \(\{t_n\}_{n\in {\mathbb {N}}\backslash \{0\}}\) is an \({\mathcal {F}}_t\)-stopping time: i.e. \(\{t_n\le t\}\in {\mathcal {F}}_t\) for all \(t\ge 0\), where \(({\mathcal {F}}_t)_{t\ge 0}\) is the natural filtration of W. If \(\tau \) is any \(({\mathcal {F}}_t)\)-stopping time, then (see [27, p. 14])

$$\begin{aligned} {\mathcal {F}}_{\tau }:=\big \{A\in {\mathcal {F}}:\,A\cap \{\tau \le t\}\in {\mathcal {F}}_t {,\, \text{ for } \text{ all } \,t\ge 0}\big \}. \end{aligned}$$
(3.1)

In particular this allows us to condition on \({\mathcal {F}}_{t_n}\) at any point on the random time-set \(\{t_n\}_{n\in {\mathbb {N}}}\).

Assumption 3.1

For the sequence of random timesteps \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\), there are constant values \(h_{\max }>h_{\min }>0\), \(\rho >1\) such that \(h_{\max }=\rho h_{\min }\), and

$$\begin{aligned} 0<h_{\min } \le h_{n+1} \le h_{\max }\le 1. \end{aligned}$$
(3.2)

In addition, we assume each \(h_{n+1}\) is \({\mathcal {F}}_{t_n}\)-measurable.

Definition 3.2

Let \(N^{(t)}\) be a random integer such that

$$\begin{aligned} N^{(t)}:=\max \big \{n\in {\mathbb {N}}\backslash \{ 0\}: t_{n-1}<{t} \big \}, \end{aligned}$$
(3.3)

and let \(N=N^{(T)}\) and \(t_N=T\), so that T is always the last point on the mesh. Note that \(N^{(t)}\) indicates the step number such that \(t\in \big [t_{N^{(t)}-1},\,t_{N^{(t)}}\big ]\). Furthermore, by Assumption 3.1, \(N^{(t)}\) only takes values in the finite set \(\{N^{(t)}_{\min },\dots ,N^{(t)}_{\max }\}\), where \(N^{(t)}_{\min }:=\lfloor t/h_{\max }\rfloor \) and \(N^{(t)}_{\max }:=\lceil t/h_{\min }\rceil \).

In Assumption 3.1, the lower bound \(h_{\min }\) given by (3.2) ensures that a simulation over the interval [0, T] can be completed in a finite number of time steps. In the event that at time \(t_n\) our strategy attempts to select a stepsize \(h_{n+1} \le h_{\min }\), we instead apply a single step of a backstop method (\(\varphi \) in Definition 3.3 below), a known method that satisfies a mean-square consistency requirement with deterministic step \(h_{n+1}=h_{\min }\) (see also discussion in Remarks 3.1 and 5.1).

First we recall the Milstein method expressed as a map. Over each step \([t_n,t_{n+1}]\) the Milstein map \(\theta :{\mathbb {R}}^d\times {\mathbb {R}} \times {\mathbb {R}}\rightarrow {\mathbb {R}}^d\) is defined as

$$\begin{aligned} \theta \big (x,t_n,s-t_n\big ):= & {} x+(s-t_n)f(x)\nonumber \\{} & {} \quad +\sum _{i=1}^{m}g_i(x)I_{i}^{t_n,s}+\sum _{i,j=1}^{m}{\textbf{D}}g_i(x)g_j(x) I_{j,i}^{t_n,s}. \end{aligned}$$
(3.4)
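
As an illustration, the map (3.4) can be written as a short routine once the increments \(I_{i}^{t_n,s}\) and iterated integrals \(I_{j,i}^{t_n,s}\) are supplied. The following Python sketch is ours and not part of the article; the function name milstein_map and the convention that I2[j, i] stores \(I_{j,i}^{t_n,s}\) are assumptions made for illustration.

```python
import numpy as np

def milstein_map(x, h, dW, I2, f, g, Dg):
    """One application of the Milstein map theta in (3.4); a minimal sketch.

    x  : current state, shape (d,)
    h  : step length s - t_n
    dW : Wiener increments I_i^{t_n,s}, shape (m,)
    I2 : iterated integrals, I2[j, i] approximating I_{j,i}^{t_n,s}, shape (m, m)
    f  : drift, f(x) -> shape (d,)
    g  : diffusion columns, g(x) -> shape (d, m)
    Dg : Jacobians of the columns, Dg(x) -> shape (m, d, d) with Dg(x)[i] = D g_i(x)
    """
    G = g(x)                        # (d, m)
    J = Dg(x)                       # (m, d, d)
    out = x + h * f(x) + G @ dW     # drift term and single stochastic integrals
    m = G.shape[1]
    for i in range(m):
        for j in range(m):
            out = out + (J[i] @ G[:, j]) * I2[j, i]   # D g_i(x) g_j(x) I_{j,i}
    return out
```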

Following [18, Def. 9], we now define an adaptive Milstein scheme combining the Milstein method and a backstop method.

Definition 3.3

(Adaptive Milstein Scheme) Let \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) satisfy Assumption 3.1. Using indicator functions to distinguish the backstop case when \(h_{n+1}=h_{\min }\) (and allowing for the possibility that the final step taken to time T is smaller than \(h_{\min }\), in which case the backstop is also used), we define the continuous form of an adaptive Milstein scheme associated with a particular time-stepping strategy \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) as

$$\begin{aligned} {\widetilde{Y}}(s):=\theta \left( {{\widetilde{Y}}}(t_n)\varvec{,}\,\, t_n\varvec{,}\,\,s-t_n\right) \cdot {\textbf{1}}_{\{h_{\min }<h_{n+1}\le h_{\max }\}}\nonumber \\ +\varphi \left( \widetilde{Y}(t_n)\varvec{,}\,\,t_n\varvec{,}\,\,{s-t_n}\right) \cdot {\textbf{1}}_{\{h_{n+1}{\le } h_{\min }\}}, \end{aligned}$$
(3.5)

for \(s\in [t_n,t_{n+1}]\) and \(n\in {\mathbb {N}}\), where \({\widetilde{Y}}(0)=X(0)\) and \(\theta \) is as given in (3.4). Thus the scheme is characterised by the sequence of tuples, \(\big \{\big (\widetilde{Y}(s)\big )_{s\in [t_n,t_{n+1}]},h_{n+1}\big \}_{n\in {\mathbb {N}}}\). The backstop map \(\varphi :{\mathbb {R}}^d\times {\mathbb {R}} \times {\mathbb {R}} \rightarrow {\mathbb {R}}^d\) in (3.5) satisfies for each \(n \in {\mathbb {N}}\)

$$\begin{aligned} {\mathbb {E}}\left[ \left\| X(s)- \varphi \left( \widetilde{Y}(t_n)\varvec{,}\,\,t_n\varvec{,}\,\, s-t_n\right) \right\| ^2 \bigg |{\mathcal {F}}_{t_n} \right] \le \left\| X(t_n) -\widetilde{Y}(t_n) \right\| ^2\nonumber \\ + C_{B_1} \int _{t_n}^{s} {\mathbb {E}}\left[ \left\| X(r)-\varphi \left( \widetilde{Y}(t_n)\varvec{,}\,\,t_n\varvec{,}\,\, {r-t_n}\right) \right\| ^2 \bigg |{\mathcal {F}}_{t_n}\right] dr+ C_{B_2} h_{\min }^3, \end{aligned}$$
(3.6)

a.s., for positive constants \(C_{B_1}\) and \(C_{B_2}\).

Throughout the article it is notationally convenient to make the following definition.

Definition 3.4

Let \({\widetilde{Y}}\) be as given in Definition 3.3 and define for each \(n\in {\mathbb {N}}\)

$$\begin{aligned} Y_{\theta }(s):=\theta \Big ({\widetilde{Y}}(t_n),t_n,s-t_n\Big ),\quad s\in [t_n,t_{n+1}]. \end{aligned}$$
(3.7)

Remark 3.1

The upper bound \(h_{\max }\) prevents step sizes from becoming too large and allows us to examine strong convergence of the adaptive Milstein method (3.5) to solutions of (1.1) as \(h_{\max }\rightarrow 0\) (and hence as \(h_{\min }\rightarrow 0\)). Note that \(\varphi \) satisfies (3.6) if the backstop method satisfies a mean-square consistency requirement. In practice, instead of testing (3.6), we choose a backstop method that is strongly convergent with rate 1.

Remark 3.2

For all \(i=1,2,\dots , m\), \(I_{i}^{t_n,t_{n+1}}\) in (2.11) is a Wiener increment taken over a random step of length \(h_{n+1}\), which itself may depend on \({{\widetilde{Y}}}(t_n)\); the increment is therefore not necessarily normally distributed. However, since \(h_{n+1}\) is \({\mathcal {F}}_{t_n}\)-measurable, \(I_{i}^{t_n,t_{n+1}}\) is \({\mathcal {F}}_{t_n}\)-conditionally normally distributed and, by the Optional Sampling Theorem (see for example [32]), for all \(p=0,1,2,\dots \)

$$\begin{aligned} {\mathbb {E}}\left[ I_{i}^{t_n,t_{n+1}} \bigg |{\mathcal {F}}_{t_n}\right]&=0,\quad a.s.; \end{aligned}$$
(3.8)
$$\begin{aligned} {\mathbb {E}}\left[ \left| I_{i}^{t_n,t_{n+1}}\right| ^2 \bigg |{\mathcal {F}}_{t_n}\right]&=h_{n+1},\quad a.s.; \end{aligned}$$
(3.9)
$$\begin{aligned} {\mathbb {E}}\left[ \left| I_{i}^{t_n,s}\right| ^{p} \bigg |{\mathcal {F}}_{t_n}\right]&=\varvec{\gamma }_{p}|s-t_n|^{\frac{p}{2}},\quad a.s.; \end{aligned}$$
(3.10)

where \(\varvec{\gamma }_{p}:=2^{p/2}\Gamma \left( (p+1)/2 \right) \pi ^{-1/2}\), and \(\Gamma \) is the Gamma function (see for example [29, p. 148]). In implementation, it is sufficient to replace the sequence of Wiener increments with i.i.d. \({\mathcal {N}} (0, 1)\) random variables scaled at each step by the \({\mathcal {F}}_{t_n}\)-measurable random variable \(\sqrt{h_{n+1}}\).
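
As an illustration of the implementation remark above, a minimal sketch (function name ours) of how the conditionally normal increments might be generated given the \({\mathcal {F}}_{t_n}\)-measurable step \(h_{n+1}\):

```python
import numpy as np

def wiener_increments(h, m, rng):
    """Wiener increments over one step of F_{t_n}-measurable length h (Remark 3.2):
    i.i.d. N(0, 1) draws scaled by sqrt(h), so they are conditionally N(0, h)."""
    return np.sqrt(h) * rng.standard_normal(m)

# example: rng = np.random.default_rng(0); dW = wiener_increments(2.0**-10, 3, rng)
```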

We now provide a specific example of a time-stepping strategy that we use in Sect. 5 and that satisfies the assumptions for our convergence proof in Theorem 4.1. Suppose that for each \(n=0,\dots , N-1\) and some fixed constant \(\kappa >0\), we choose constant values \(h_{\max }>h_{\min }>0\), \(\rho >1\) such that \(h_{\max }=\rho h_{\min }\) and

$$\begin{aligned} {h_{n+1}=h_{\min }\vee \left( h_{\max }\wedge \frac{h_{\max }}{\big \Vert \widetilde{Y}(t_n)\big \Vert ^{1/\kappa }} \right) .} \end{aligned}$$
(3.11)

Then (3.2) in Assumption 3.1 holds for (3.11). Notice also that, from (3.11), the following bound applies on the event \(\{h_{\min }< h_{n+1} \le h_{\max }\}\):

$$\begin{aligned} 0\le \big \Vert {{\widetilde{Y}}}(t_n)\big \Vert < \left( \frac{h_{\max }}{h_{\min }}\right) ^\kappa = \rho ^\kappa . \end{aligned}$$
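
For illustration, a minimal Python sketch of the step-size selection (3.11); the function name and the guard against a zero norm are our own additions:

```python
import numpy as np

def next_step(y, h_max, rho, kappa):
    """Step-size selection of (3.11): h_{n+1} = h_min v (h_max ^ h_max/||y||^{1/kappa}),
    with h_min = h_max / rho.  A return value equal to h_min signals that the
    backstop map phi should be used for the step."""
    h_min = h_max / rho
    norm = np.linalg.norm(y)
    if norm == 0.0:            # the selected step is h_max whenever the norm is small
        return h_max
    return max(h_min, min(h_max, h_max / norm ** (1.0 / kappa)))
```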

The strategy given by (3.11) is admissible in the sense given in [17, 18]. However, it also motivates the following class of time-stepping strategies to which our convergence analysis applies.

Definition 3.5

(Path-bounded time-stepping strategies) Let \(\big \{{{\widetilde{Y}}}(t_n),h_{n+1} \big \}_{n\in {\mathbb {N}}}\) be a numerical approximation for (1.1) given by (3.5), associated with a timestep sequence \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) satisfying Assumption 3.1. We say that \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) is a path-bounded time-stepping strategy for (3.5) if there exist real non-negative constants \(0\le Q<R\) (where R may be infinite if \(Q\ne 0\)) such that on the event \(\{h_{\min }< h_{n+1} \le h_{\max }\}\),

$$\begin{aligned} Q\le \big \Vert {{\widetilde{Y}}}(t_n)\big \Vert < R, \quad n=0,\dots , N-1. \end{aligned}$$
(3.12)

Note that throughout this paper we use a strategy where \(Q=0\) and \(R<\infty \). As we will see in Sect. 5.2, a careful choice of the parameter \(\kappa \) can be used to minimise invocations of the backstop method when \(\rho \) is fixed.

4 Main results

Our first main result shows strong convergence with order 1 of solutions of (3.5) to solutions of (1.1) when \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) is a path-bounded time-stepping strategy ensuring that (3.12) holds.

Theorem 4.1

(Strong Convergence) Let \((X(t))_{t\in [0,T]}\) be a solution of (1.1) with initial value \(X(0) = X_0{\in {\mathbb {R}}^d}\). Suppose that the conditions of Assumptions 2.1 and 2.2 hold. Let \(\big \{\big (\widetilde{Y}(s)\big )_{s\in [t_n,t_{n+1}]},h_{n+1}\big \}_{n\in {\mathbb {N}}}\) be the adaptive Milstein scheme given in Definition 3.3 with initial value for the first component \({{\widetilde{Y}}}_0 = X_0\) and path-bounded time-stepping strategy \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) satisfying the conditions of Definition 3.5 for some \(R<\infty \). Then there exists a constant \(C(R,\rho ,T) > 0\) such that

$$\begin{aligned} \max _{t\in [0,T]}\Big ({\mathbb {E}}\Big [\Vert X(t)-\widetilde{Y}(t)\Vert ^2\Big ]\Big )^{1/2} \le C(R,\rho ,T)\,h_{\max }. \end{aligned}$$
(4.1)

Furthermore,

$$\begin{aligned} \lim _{\rho \rightarrow \infty }C(R,\rho ,T)=\infty . \end{aligned}$$
(4.2)

The proof of Theorem 4.1, which is given in Sect. 7.2, accounts for the properties of the random sequences \(\{t_n\}_{n\in {\mathbb {N}}}\) and \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) and uses (3.12) to compensate for the non-Lipschitz drift and diffusion.

Our second main result shows that for the specific strategy given by (3.11) the probability of needing a backstop method can be made arbitrarily small by taking \(\rho \) sufficiently large for fixed \(\kappa \).

Theorem 4.2

(Probability of Backstop) Let all the conditions of Theorem 4.1 hold, and suppose that the path-bounded time-stepping strategy \(\{h_{n+1}\}_{n\in {\mathbb {N}}}\) also satisfies (3.11). Let \(C(R,\rho ,T)\) be the error constant in estimate (4.1) from the statement of Theorem 4.1.

For any fixed \(\kappa \ge 1\) there exists a constant \(C_{\text {prob}}=C_{\text {prob}}(T,R,h_{\max })\) such that, for \(h_{\max }\,\le \,1/C(R,\rho ,T)\),

$$\begin{aligned} {\mathbb {P}}\left[ {h_{n+1}=h_{\min }} \right] \le C_{\text {prob}}\,\,\rho ^{1-2\kappa }. \end{aligned}$$
(4.3)

Further, for an arbitrarily small tolerance \(\varepsilon \in (0,1)\), there exists \(\rho >0\) such that

$$\begin{aligned} {\mathbb {P}}\left[ {h_{n+1}=h_{\min }} \right] <\varepsilon ,\quad n\in {\mathbb {N}}. \end{aligned}$$

For proof see Sect. 7.3.

5 Numerical examples

Remark 5.1

We use the adaptive strategy in (3.11). We ensure that we reach the final time by taking \(h_{N}=T-t_{N-1}\) as our final step, and in a situation where this is smaller than \(h_{\min }\) we use the backstop method (this is compatible with the proofs below).

In the numerical experiments below, we set the adaptive Milstein scheme (AMil) as in (3.5) with (3.11) as the choice of \(h_{n+1}\). Projected Milstein (PMil) [1, Eq. (24)] serves as the backstop method of AMil and as the reference method for all models. We then compare strong convergence, via the root mean square (RMS) error, and efficiency, via CPU time, for AMil, PMil, the split-step backward Milstein method (SSBM) [1, Eq. (25)], the new variant of Milstein (TMil) in [20], and the tamed stochastic Runge–Kutta method of order 1.0 (TSRK1) [8, Eqs. (3.8), (3.9)]. For the non-adaptive schemes, to examine strong convergence, we take as the fixed step \(h_{\text {mean}}\) the average of all time steps over each path and each Monte Carlo realization \(m = 1,\dots , M\), so that

$$\begin{aligned} h_{\text {mean}}:=\frac{1}{M}\sum _{m=1}^{M}\frac{T}{N_{m}}, \end{aligned}$$

where \(N_{m}\) denotes the number of steps taken on the \(m^{th}\) sample path to reach T.
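
For illustration only, a minimal sketch (names ours) of how \(h_{\text {mean}}\) and the RMS error could be computed from stored Monte Carlo output:

```python
import numpy as np

def h_mean_and_rms(T, steps_per_path, errors_at_T):
    """Summary statistics used in the convergence plots; a minimal sketch.

    steps_per_path : N_m, the number of adaptive steps taken on each of the M paths
    errors_at_T    : ||X_ref(T) - Y_tilde(T)|| for each of the M Monte Carlo paths
    """
    N = np.asarray(steps_per_path, dtype=float)
    e = np.asarray(errors_at_T, dtype=float)
    h_mean = np.mean(T / N)          # fixed step handed to the non-adaptive schemes
    rms = np.sqrt(np.mean(e ** 2))   # root mean square error at the final time
    return h_mean, rms
```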

Fig. 1 Strong convergence and efficiency of model (5.1): (a) and (b) additive noise; (c) and (d) multiplicative noise. (e) Two paths of the timestep h for \(\rho =2,6\); (f) the estimated probability of using \(h_{\min }\) for the multiplicative noise model with \(M=100\) realizations

5.1 One-dimensional test equations with multiplicative and additive noise

In order to demonstrate strong convergence of order one for a scalar test equation with non-globally Lipschitz drift, consider

$$\begin{aligned} dX(t)=\big (X(t)-3X(t)^3\big )dt+G(X(t))dW(t), \quad t\in [0,1]. \end{aligned}$$
(5.1)

To illustrate both the multiplicative and additive noise cases, we estimate the RMS error by a Monte Carlo method using \(M=1000\) trajectories for \(h_{\max }=[2^{-14}, 2^{-12}, 2^{-10}, 2^{-8}, 2^{-6}]\), \(\rho =2^2\), \(\kappa =1\), and use as a reference solution PMil over a mesh with uniform step size \(h_{\text {ref}}=2^{-18}\).

For additive noise we set \(G(x)=\sigma \) in (5.1), and for multiplicative noise we set \(G(x)=\sigma (1-x^2)\), with \(\sigma =0.2\) and \(X(0)=11\) in both cases. Strong convergence of order one is displayed by all methods in Fig. 1, parts (a) and (c) for the additive and multiplicative cases respectively, with the efficiency displayed in parts (b) and (d).
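
To make the setup concrete, the following sketch generates a single AMil trajectory for (5.1) with multiplicative noise using strategy (3.11). It is illustrative only: in particular, the backstop below is a plain Milstein step of length \(h_{\min }\), not the PMil method [1, Eq. (24)] used as the backstop in our experiments.

```python
import numpy as np

def amil_path(x0=11.0, T=1.0, sigma=0.2, h_max=2.0**-10, rho=4.0, kappa=1.0, seed=0):
    """One trajectory of the adaptive Milstein scheme (3.5) applied to (5.1) with
    multiplicative noise G(x) = sigma*(1 - x^2).  Minimal sketch: the backstop here
    is an ordinary Milstein step of length h_min, NOT projected Milstein (PMil)."""
    f  = lambda x: x - 3.0 * x**3
    g  = lambda x: sigma * (1.0 - x**2)
    dg = lambda x: -2.0 * sigma * x                      # g'(x)

    def milstein_step(y, h, dW):
        # scalar Milstein step: for d = m = 1 the iterated integral is (dW^2 - h)/2
        return y + h * f(y) + g(y) * dW + 0.5 * g(y) * dg(y) * (dW**2 - h)

    backstop_step = milstein_step     # placeholder standing in for PMil

    rng = np.random.default_rng(seed)
    h_min = h_max / rho
    t, y = 0.0, x0
    while t < T:
        h_raw = min(h_max, h_max / max(abs(y), 1e-300) ** (1.0 / kappa))  # (3.11)
        use_backstop = h_raw <= h_min
        h = min(max(h_raw, h_min), T - t)        # never overshoot the final time T
        use_backstop = use_backstop or h < h_min  # short final step also uses backstop
        dW = np.sqrt(h) * rng.standard_normal()
        y = backstop_step(y, h, dW) if use_backstop else milstein_step(y, h, dW)
        t += h
    return y
```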

Finally, consider Theorem 4.2. We illustrate that the probability of our time-stepping strategy selecting \(h_{\min }\), and therefore triggering an application of the backstop method, can be made arbitrarily small at every step by an appropriate choice of \(\rho \) (with fixed \(\kappa =1\)). Consider (5.1) again with \(G(x)=\sigma (1-x^2)\), this time with \(X(0)=100\), \(\kappa =1\), \(T=1\), \(h_{\max }=2^{-20}\) and \(\rho =[2, 4, \dots , 16]\). In Fig. 1e, we plot two paths of h when \(\rho =2, 6\). Observe that when \(\rho =2\) the backstop is triggered only for approximately the first \(10^5\) steps, whereas once \(\rho \) is increased to 6 this is reduced to approximately the first \(2\times 10^4\) steps. Estimated probabilities of using \(h_{\min }\) are plotted on a log-log scale as a function of \(\rho \) in Fig. 1f (with \(M=100\) realizations). The estimated probability of using \(h_{\min }\) declines to zero as \(\rho \) increases. We observe a rate close to \(-1\), matching that in (4.3) with \(\kappa =1\).

5.2 One-dimensional model of telomere shortening

The following one-dimensional SDE model was given in [9, Eq. (A6)] for modelling the shortening over time of telomere length L in DNA replication:

$$\begin{aligned} dL(t) = -\big (c+aL(t)^2\big )dt + \sqrt{\frac{1}{3}aL(t)^3}dW(t). \end{aligned}$$
(5.2)

The parameter c determines the underlying decay rate of the length and a controls the intensity at which random breaks occur in the telomere; we take \((a,c)=(0.41\times 10^{-6},7.5)\) as in [9]. In this example we fix \(\rho =4\), instead adjusting the parameter \(\kappa \) in (3.11) to control use of the backstop method. Individual paths are shown in Fig. 2 where we take \(h_{\max }=2^{-18}\), and \(h=2^{-20}\) for the fixed step methods.

We set \(L(0)=1000\), noting from [9] that initial values could be as high as (say) \(L(0)=6000\) and remain physically realistic. The end of the interval of valid simulation is determined by the first time at which trajectories reach zero, and is therefore random. However this is not observed to occur in the timescale (25 days) we consider here.

By design, PMil projects the data onto a ball whose radius is determined in part by the growth of the drift term. We see in Fig. 2a that the PMil path is immediately reduced to approximately 200.

By contrast, the design of TMil scales both drift and diffusion terms by \(1/(1+h|L|^2)\) for this model. When \(h|L|^2\) is large this scaling can damp out changes from step to step, and in Fig. 2a we see that the TMil path is (spuriously) almost constant. The paths of the other methods, AMil, SSBM and TSRK1, are close together, as shown in Fig. 2a and in higher detail in (b).

Notice that we used \(\kappa =8\) in (3.11) for the AMil method to reduce the chance of requiring the backstop method PMil while keeping \(\rho =4\). We avoid setting \(\kappa =1\) in this case because \(L(0)=1000\): with \(\rho =4\), the bound following (3.11) gives \(\Vert \widetilde{Y}(t_n)\Vert <\rho ^{\kappa }=4\) on the non-backstop event when \(\kappa =1\), so the adaptive step \(h_{n+1}\) would too frequently require the backstop method, whereas \(\kappa =8\) gives \(\rho ^{\kappa }=4^8=65536>L(0)\).

Fig. 2 Single paths of the telomere length SDE (5.2) solved over 25 days; (b) shows a detailed plot of (a)

Fig. 3 Two-dimensional system (5.3): (a) and (b) show the strong convergence and efficiency for diagonal noise, (c) and (d) for commutative noise, and (e) and (f) for non-commutative noise. We choose \(a=3\), \(\sigma =0.2\) and \(b=1.5\)

5.3 Two-dimensional test systems

We now consider three (\(i=1,2,3\)) different SDEs:

$$\begin{aligned} dX(t)=F(X(t))dt+G_i(X(t))dW(t),\quad t\in [0,1],\quad X(0)=[7,9]^T, \end{aligned}$$
(5.3)

with \(W(t)=[W_1(t),W_2(t)]^T\), where \(W_1\) and \(W_2\) are independent scalar Wiener processes, \(X(t)=[X_1(t),X_2(t)]^T\), \(F(x)=[x_2-3x_1^3,x_1-3x_2^3]^T\), and

$$\begin{aligned} G_1(x)=\sigma \begin{pmatrix}x_1^2 &{}\quad 0\\ 0 &{}\quad x_2^2\end{pmatrix}, \, G_2(x)=\sigma \begin{pmatrix}x_2^2 &{}\quad x_2^2\\ x_1^2 &{}\quad x_1^2\end{pmatrix},\, G_3(x)=\sigma \begin{pmatrix}1.5x_1^2 &{}\quad x_2\\ x_2^2 &{}\quad 1.5x_1\end{pmatrix}. \end{aligned}$$

\(G_1\) is an example of diagonal noise, \(G_2\) commutative noise, and \(G_3\) non-commutative noise.
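
The commutativity condition of Sect. 2 can be checked pointwise. The following rough sketch (finite-difference Jacobians; names ours) confirms that \(G_2\) satisfies \({\textbf{D}}g_1(x)g_2(x)={\textbf{D}}g_2(x)g_1(x)\) at a sample point while \(G_3\) does not:

```python
import numpy as np

sigma = 0.2

def G2(x):   # commutative noise of (5.3): both columns equal sigma*[x2^2, x1^2]^T
    x1, x2 = x
    return sigma * np.array([[x2**2, x2**2],
                             [x1**2, x1**2]])

def G3(x):   # non-commutative noise of (5.3)
    x1, x2 = x
    return sigma * np.array([[1.5 * x1**2, x2],
                             [x2**2,       1.5 * x1]])

def jac_col(G, i, x, eps=1e-6):
    """Finite-difference Jacobian of the i-th column g_i at x (rough check only)."""
    d = len(x)
    J = np.zeros((d, d))
    for k in range(d):
        e = np.zeros(d); e[k] = eps
        J[:, k] = (G(x + e)[:, i] - G(x - e)[:, i]) / (2.0 * eps)
    return J

def commutes(G, x):
    """Check D g_1(x) g_2(x) == D g_2(x) g_1(x) at the point x."""
    M = G(x)
    return np.allclose(jac_col(G, 0, x) @ M[:, 1], jac_col(G, 1, x) @ M[:, 0], atol=1e-4)

x = np.array([3.0, 4.0])
print(commutes(G2, x), commutes(G3, x))   # True  False
```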

For \(G_1\) and \(G_2\) we use \(h_{\max }=[2^{-14}, 2^{-12}, 2^{-10}, 2^{-8}, 2^{-6}]\), \(h_{\text {ref}}=2^{-18}\), \(\rho =4\) and \(\kappa =1\). In Fig. 3a, c, we see order one strong convergence for all methods. Parts (b) and (d) show the efficiency of the adaptive method.

For \(i=3\), the non-commutative noise case, take \(h_{\max }=[2^{-8}, 2^{-7}, 2^{-6}, 2^{-5}, 2^{-4}]\), \( h_{\text {ref}}=2^{-11}\), \(\rho =2^2\) and \(X(0)=[3,4]^T\). To simulate the Lévy areas we follow the method in [12, Sec. 4.3], which is based on the Euler approximation of a system of SDEs. Again, we observe order one convergence for all methods in Fig. 3e and that AMil is the most efficient in (f). Note that, as TSRK1 is supported theoretically only for commutative noise, we do not consider it here.
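
For illustration, a minimal sketch of a sub-stepping (Euler-type) approximation of the iterated integrals and Lévy areas over a single step; we have not verified that this coincides with the method of [12, Sec. 4.3], and the function name and array conventions are ours:

```python
import numpy as np

def iterated_integrals_substep(h, m, K, rng):
    """Approximate the iterated integrals I_{j,i} and Lévy areas A_ij over one step
    of length h by sub-stepping (an Euler approximation of the defining SDEs).

    Returns (dW, I, A): dW are the Wiener increments over the step, I[j, i]
    approximates I_{j,i}^{t_n, t_n+h} and A[i, j] approximates A_ij^{t_n, t_n+h}."""
    dt = h / K
    dWs = np.sqrt(dt) * rng.standard_normal((K, m))        # fine-grid increments
    W = np.vstack([np.zeros(m), np.cumsum(dWs, axis=0)])   # W(tau_k) - W(t_n)
    I = np.zeros((m, m))
    for k in range(K):                                     # left-point (Ito) sums
        I += np.outer(W[k], dWs[k])                        # I[j, i] += W_j * dW_i
    A = 0.5 * (I - I.T)                                    # A_ij = (I_{i,j} - I_{j,i})/2
    return W[-1], I, A
```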

6 Preliminary lemmas

We present five lemmas necessary for the proofs of Theorem 4.1 and Theorem 4.2. Throughout this section we assume that f and g satisfy Assumption 2.1 and (except for Lemma 6.4) that we are on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\), so that (3.12) of Definition 3.5 holds. We use (2.6), (2.7) and (2.8) to define some bounded constant coefficients depending on \(R<\infty \). The constants in (6.1) are then used in the development of a one-step error bound for the adaptive part of the scheme.

$$\begin{aligned} \begin{aligned} \big \Vert f\big ({{\widetilde{Y}}}(t_n)\big )\big \Vert \le \,&c_5(1+R^{q_1+2})=:C_{f};\\ \big \Vert {\textbf{D}}f\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}}\le \,&c_3(1+R^{q_1+1})=:C_{Df}; \\ \big \Vert g_i\big ({{\widetilde{Y}}}(t_n)\big )\big \Vert \le \big \Vert g\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}(d\times m)}\le \,&c_6(1 + R^{q_2+2})=:C_{g_i}; \\ \big \Vert {\textbf{D}}g_i\big ({{\widetilde{Y}}}(t_n)\big )\big \Vert _{{\textbf{F}}} \le \,&c_4(1 + R^{q_2+1})=:C_{Dg_i}. \end{aligned} \end{aligned}$$
(6.1)

The following lemma provides a bound for the even conditional moments of the iterated stochastic integral in (2.11).

Lemma 6.1

(Iterated Stochastic Integral) Let \(\big \{\big (\widetilde{Y}(s)\big )_{s\in [t_n,t_{n+1}]},h_{n+1}\big \}_{n\in {\mathbb {N}}}\) be the adaptive Milstein scheme given in Definitions 3.3 and 3.5. Then there exists a constant \(C_{\texttt {ISI}}\) such that for \(k\ge 1\), \(n\in {\mathbb {N}}\) and \(s\in [t_n,t_{n+1}]\), on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\)

$$\begin{aligned} {\mathbb {E}}\Bigg [\Bigg \Vert \sum _{i,j=1}^{m}{\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )I_{j,i}^{t_n,s}\Bigg \Vert ^{2k}\Bigg |{\mathcal {F}}_{t_n}\Bigg ]\le C^{}_{\texttt {ISI}}\left( k,R\right) |s-t_n|^{2k}, \end{aligned}$$
(6.2)

where

$$\begin{aligned} C^{}_{\texttt {ISI}}\left( k,R\right) :=&\,3^{2k} m^{4k}C_{Dg_i}^{2k} C_{g_i}^{2k}\Big ( \varvec{\gamma }_{4k}+1+\varvec{\gamma }_{2k}^2+C^{}_{\texttt {LA}}\left( 2k\right) \Big ). \end{aligned}$$
(6.3)

Here, \(\varvec{\gamma }_{p}\) is from (3.10), \(C^{}_{\texttt {LA}}\left( 2k\right) \) is from Lemma 2.2 with explicit form given in (A.1), and the R dependence in \(C^{}_{\texttt {ISI}}\left( k,R\right) \) arises from (6.1).

Proof

First of all, for convenience we set

$$\begin{aligned} G_{\texttt {ISI}}(s):=\Bigg \Vert \sum _{i,j=1}^{m}{\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )I_{j,i}^{t_n,s}\Bigg \Vert ^{2k}. \end{aligned}$$

By (2.12) and (2.3), we have, for \(s\in [t_n, t_{n+1}]\) and \(n\in {\mathbb {N}}\),

$$\begin{aligned} G_{\texttt {ISI}}(s)\le & {} 3^{2k-1}\Bigg (\Bigg \Vert \frac{1}{2}\sum _{i=1}^{m}{\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_i\big ({{\widetilde{Y}}}(t_n)\big )\Big (\big (I_{i}^{t_n,s} \big )^2 -|s-t_n|\Big ) \Bigg \Vert ^{2k}\\{} & {} +\Bigg \Vert \frac{1}{2}\sum _{\begin{array}{c} i,j=1\\ i<j \end{array}}^{m}\Big ({\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )+{\textbf{D}}g_j\big (\widetilde{Y}(t_n)\big )g_i\big (\widetilde{Y}(t_n)\big )\Big )I_{i}^{t_n,s}I_{j}^{t_n,s}\Bigg \Vert ^{2k}\\{} & {} +\Bigg \Vert \sum _{\begin{array}{c} i,j=1\\ i<j \end{array}}^{m}\Big ({\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )-{\textbf{D}}g_j\big (\widetilde{Y}(t_n)\big )g_i\big ({{\widetilde{Y}}}(t_n)\big )\Big )A_{ij}^{t_n,s} \Bigg \Vert ^{2k}\Bigg ). \end{aligned}$$

Applying (2.3) again and by submultiplicativity of the Euclidean norm and the fact that the induced matrix 2-norm is bounded above by the Frobenius norm, for \(s\in [t_n, t_{n+1}]\) and \(n\in {\mathbb {N}}\), we get

$$\begin{aligned} G_{\texttt {ISI}}(s)\le & {} 3^{2k-1}\Bigg (\frac{ m^{2k-1}}{2^{2k}}\sum _{i=1}^{m}\big \Vert {\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}}^{2k}\big \Vert g_i\big (\widetilde{Y}(t_n)\big )\big \Vert ^{2k}\Big (\big (I_{i}^{t_n,s} \big )^2+|s-t_n| \Big )^{2k}\\{} & {} +\left( \frac{ m(m-1)}{2}\right) ^{2k-1}\sum _{\begin{array}{c} i,j=1\\ i<j \end{array}}^{m}\Big \Vert {\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )+{\textbf{D}}g_j\big (\widetilde{Y}(t_n)\big )g_i\big (\widetilde{Y}(t_n)\big )\Big \Vert ^{2k}\\{} & {} \times \bigg (\frac{1}{2^{2k}}\Big |I_{i}^{t_n,s}I_{j}^{t_n,s} \Big |^{2k}+\big |A_{ij}^{t_n,s} \big |^{2k}\bigg )\Bigg ). \end{aligned}$$

Applying conditional expectations on both sides, together with the pairwise conditional independence of \(I_{i}^{t_n,s}\) and \(I_{j}^{t_n,s}\) for \(i\ne j\), (2.6) and (6.1), we have for \(s\in [t_n, t_{n+1}]\) and \(n\in {\mathbb {N}}\)

$$\begin{aligned} {\mathbb {E}}\Big [G_{\texttt {ISI}}(s)\Big |{\mathcal {F}}_{t_n}\Big ]\le & {} 3^{2k}\Bigg ( m^{2k-1} C_{Dg_i}^{2k} C_{g_i}^{2k}\sum _{i=1}^{m}\left( {\mathbb {E}}\left[ \left| I_{i}^{t_n,s}\right| ^{4k}\bigg |{\mathcal {F}}_{t_n}\right] +|s-t_n|^{2k} \right) \\{} & {} +\left( \frac{ m(m-1)}{2}\right) ^{2k-1}C_{Dg_i}^{2k} C_{g_i}^{2k} \sum _{\begin{array}{c} i,j=1\\ i<j \end{array}}^{m} \bigg ( {\mathbb {E}}\left[ \left| I_{i}^{t_n,s}\right| ^{2k}\bigg |{\mathcal {F}}_{t_n}\right] {\mathbb {E}}\left[ \left| I_{j}^{t_n,s}\right| ^{2k}\bigg |{\mathcal {F}}_{t_n}\right] \\{} & {} +{\mathbb {E}}\left[ \left| A_{ij}^{t_n,s}\right| ^{2k}\bigg |{\mathcal {F}}_{t_n}\right] \bigg )\Bigg ). \end{aligned}$$

Using (3.9), (3.10) and (2.14) we have

$$\begin{aligned} {\mathbb {E}}\Big [G_{\texttt {ISI}}(s)\Big |{\mathcal {F}}_{t_n}\Big ]\le \,\,&C^{}_{\texttt {ISI}}\left( k,R\right) |s-t_n|^{2k}, \end{aligned}$$

where \(C^{}_{\texttt {ISI}}\left( k,R\right) \) is in (6.3). \(\square \)

The following lemma provides a bound on the conditional moments of the adaptive Milstein scheme in (3.5) over one step, in the case where the method applies the map \(\theta \).

Lemma 6.2

Consider \(\big \{\big (\widetilde{Y}(s)\big )_{s\in [t_n,t_{n+1}]},h_{n+1}\big \}_{n\in {\mathbb {N}}}\) from Definitions 3.3 and 3.5, and let \((Y_{\theta }(s))_{s\in (t_n,t_{n+1}]}\) be as defined in Definition 3.4. Then there exists a constant \(C_{ Y_{\theta }}> 0\) such that for \(k\ge 1\), \(n\in {\mathbb {N}}\) and \(s\in {(}t_n, t_{n+1}]\), on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\),

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert Y_{\theta }(s)\big \Vert ^k\Big |{\mathcal {F}}_{t_n}\Big ] \le C_{ Y_{\theta }}\big (k,R\big ), \end{aligned}$$
(6.4)

where

$$\begin{aligned} C_{ Y_{\theta }}\big (k,R\big ):=\,\,4^{k-1}\Big (R^k+C_f^k+m^k C_{g_i}^k\,\, \varvec{\gamma }_{k}+C^{}_{\texttt {ISI}}\left( 2k\right) ^{1/2}\Big ), \end{aligned}$$
(6.5)

with the constant \(C_{\texttt {ISI}}\) from Lemma 6.1.

Proof

By (3.7), (3.4) and (2.3), we have, for \(s\in {(}t_n, t_{n+1}]\) and \(n\in {\mathbb {N}}\),

$$\begin{aligned} \big \Vert Y_{\theta }(s)\big \Vert ^k=\,&\left\| \theta \left( \widetilde{Y}(t_n)\textbf{,}\,\, t_n\textbf{,}\,\, s-t_n\right) \right\| ^k \\ \le&\,\,4^{k-1}\Bigg (\big \Vert \widetilde{Y}(t_n)\big \Vert ^k+\left\| f\big (\widetilde{Y}(t_n)\big )\right\| ^k|s-t_n|^k+\left\| \sum _{i=1}^{m}g_i\big (\widetilde{Y}(t_n)\big )I_{i}^{t_n,s}\right\| ^{k}\\ {}&+\Bigg (\Bigg \Vert \sum _{i,j=1}^{m}{{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )I_{j,i}^{t_n,s}\Bigg \Vert ^{2k}\Bigg )^{1/2}\Bigg ). \end{aligned}$$

Applying (2.3), (3.12) and (6.1), for \(s\in {(}t_n, t_{n+1}]\) and \(n\in {\mathbb {N}}\), yields

$$\begin{aligned} \big \Vert Y_{\theta }(s)\big \Vert ^k \le&\,\,4^{k-1}\left( R^k+C_f^k |s-t_n|^k+m^{k-1} C_{g_i}^k \sum _{i=1}^{m}\left| I_{i}^{t_n,s}\right| ^{k} \right. \\ {}&\left. +\Bigg (\Bigg \Vert \sum _{i,j=1}^{m}{{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )I_{j,i}^{t_n,s}\Bigg \Vert ^{2k}\Bigg )^{1/2}\right) . \end{aligned}$$

Taking conditional expectations on both sides and applying Jensen’s inequality to the last term, we have, for \(s\in {(}t_n, t_{n+1}]\) and \(n\in {\mathbb {N}}\),

$$\begin{aligned} {\mathbb {E}}\Big [\Vert Y_{\theta }(s)\Vert ^k\Big |{\mathcal {F}}_{t_n}\Big ]\le & {} \, 4^{k-1}\left( R^k+C_f^k|s-t_n|^k+ m^{k-1} C_{g_i}^k\sum _{i=1}^{m}{\mathbb {E}}\left[ \left| I_{i}^{t_n,s}\right| ^{k}\bigg |{\mathcal {F}}_{t_n} \right] \right. \\{} & {} \left. +\left( {\mathbb {E}}\Bigg [\Bigg \Vert \sum _{i,j=1}^{m}{\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )I_{j,i}^{t_n,s}\Bigg \Vert ^{2k}\Bigg |{\mathcal {F}}_{t_n}\Bigg ]\right) ^{1/2}\right) . \end{aligned}$$

Using (3.9), (6.2) from Lemma 6.1 and the fact that \(|s-t_n|\le h_{\max }\le 1\) by (3.2), we have

$$\begin{aligned} {\mathbb {E}}\Big [\Vert Y_{\theta }(s)\Vert ^k\Big |{\mathcal {F}}_{t_n}\Big ] \le C_{Y_{\theta }}(k,R), \end{aligned}$$

where \(C_{Y_{\theta }}(k,R)\) is in (6.5). \(\square \)

The following lemma proves regularity in time of the adaptive Milstein scheme in (3.5) when applying the map \(\theta \).

Lemma 6.3

(Scheme Regularity) Consider \(\big \{\big (\widetilde{Y}(s)\big )_{s\in [t_n,t_{n+1}]},h_{n+1}\big \}_{n\in {\mathbb {N}}}\) in Definitions 3.3 and 3.5, and let \((Y_{\theta }(s))_{s\in {(}t_n,t_{n+1}]}\) be as defined in Definition 3.4. Then there exists a constant \(C_{\texttt {SR}}\) such that for \(k\ge 1\), \(n\in {\mathbb {N}}\) and \(s\in {(}t_n, t_{n+1}]\), on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\)

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert Y_{\theta }(s)- {{\widetilde{Y}}}(t_n) \big \Vert ^{2k} \Big |{\mathcal {F}}_{t_n}\Big ] \le C^{}_{\texttt {SR}}\left( k,R\right) |s-t_n|^k, \end{aligned}$$
(6.6)

where

$$\begin{aligned} C^{}_{\texttt {SR}}\left( k,R\right) := \,\, 3^{2k-1}\Big (C_f^{2k} +m^{2k} C_{g_i}^{2k}\,\, \varvec{\gamma }_{2k} +C^{}_{\texttt {ISI}}\left( 2k\right) \Big ), \end{aligned}$$
(6.7)

with the constant \(C_{\texttt {ISI}}\) from Lemma 6.1.

Proof

The method of proof is similar to the proof of Lemma 6.2. \(\square \)

Remark 6.1

Our analysis requires a certain number of finite moments for the SDE (1.1), and it is necessary to track exactly what those are in order to see that the conditions of Assumption 2.2 are not violated. To this end, we introduce a superscript notation for random variables appearing as conditional expectations at this point. The notation should be interpreted according to the following example: in (6.9) below the random variable \(C_{\texttt {PR}}^{\{2k(q+2)\}}\) requires \(2k(q+2)\) finite moments of the SDE (1.1) to have finite expectation.

The following lemma examines the regularity of solutions of the SDE (1.1).

Lemma 6.4

(Path Regularity) Let f, g also satisfy Assumption 2.2, and let \((X(s))_{s\in [t_n,t_{n+1}]}\) be a solution of (1.1). Then there exists an \({\mathcal {F}}_{t_n}\)-measurable random variable \({\overline{C}}_{\texttt {PR}}^{\{2k(q+2)\}}\) such that for \(k\ge 1\), \(n\in {\mathbb {N}}\) and \(s\in [t_n,t_{n+1}]\) a.s.

$$\begin{aligned} {\mathbb {E}}\Big [\Vert X(s)-X(t_n)\Vert ^{2k}\Big |{\mathcal {F}}_{t_n}\Big ]&\,\le \,\, \overline{C}^{\{2k(q+2)\}}_{\texttt {PR}}\,\,|s-t_n|^{k}, \end{aligned}$$
(6.8)

where \(q=q_1\vee q_2\) is as defined in Assumption 2.2, and where, a.s.,

$$\begin{aligned} \overline{C}^{\{2k(q+2)\}}_{\texttt {PR}}= & {} \,2^{4k-2}c_5^{2k}\left( 1+{\mathbb {E}}\left[ {\sup _{p\in [t_n,t_{n+1}]}}\Vert X(p)\Vert ^{2k(q_1+2)}\bigg |{\mathcal {F}}_{t_n}\right] \right) \nonumber \\{}{} & {} {} +2^{4k-2}(k(2k-1))^k c_6^{2k}\left( 1+{\mathbb {E}}\left[ {\sup _{p\in [t_n,t_{n+1}]}}\Vert X(p)\Vert ^{2k(q_2+2)}\bigg |{\mathcal {F}}_{t_n}\right] \right) ,\nonumber \\ \end{aligned}$$
(6.9)

where the expectation of \({\overline{C}}_\texttt {PR}^{\{2k(q+2)\}}\) is denoted \(C^{}_{\texttt {PR}}\left( k\right) \), given by

$$\begin{aligned} C^{}_{\texttt {PR}}\left( k\right) :={\mathbb {E}}\left[ \overline{C}^{\{2k(q+2)\}}_{\texttt {PR}}\right] \le 2^{4k-2}\left( 1+C_{\texttt {X}} \right) \big ( c_5^{2k}+(k(2k-1))^k c_6^{2k}\big ). \end{aligned}$$
(6.10)

Proof

The method of proof follows that of [25, Thm. 7.1]. The bound (6.10) follows from (2.9) and Assumption 2.2. \(\square \)

The following lemma provides a bound on the even conditional moments of the remainder term from a Taylor expansion of either the drift f or diffusion g, around \({\widetilde{Y}}(t_n)\).

Lemma 6.5

(Taylor Error) Consider \(\big \{\big (\widetilde{Y}(s)\big )_{s\in [t_n,t_{n+1}]},h_{n+1}\big \}_{n\in {\mathbb {N}}}\) from Definitions 3.3 and 3.5, and let \((Y_{\theta }(s))_{s\in [t_n,t_{n+1}]}\) be as defined in Definition 3.4. Let \(u\in \{f, g\}\) and set \(c_{{\textbf{D}}2}:=c_1\vee c_2\). Then there exists a constant \(C_{\texttt {TE}}\) such that for \(k\ge 1\), \(n\in {\mathbb {N}}\) and \(s\in [t_n, t_{n+1}]\), on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\),

$$\begin{aligned} {\mathbb {E}}\left[ \Big \Vert \int _{0}^{1}(1-\epsilon ){\textbf{D}}^2u\Big (\widetilde{Y}(t_n) -\epsilon \big ( Y_{\theta }(s)-{{\widetilde{Y}}}(t_n)\big )\Big ) d\epsilon \Big \Vert _{\mathbf {T_3}}^{2k}\bigg | {\mathcal {F}}_{t_n}\right] \le C_{\texttt {TE}}\big (k,R\big ), \end{aligned}$$
(6.11)

where \(C_{\texttt {TE}}\left( k,R \right) :=c_{{\textbf{D}}2}^{2k}\left( 1+ 3^{2kq+1}\left( R^{2kq}+ C_{Y_{\theta }}\left( k,R \right) \right) \right) \) and \(C_{Y_{\theta }}\big (k,R\big )\) is from Lemma 6.2.

Proof

Using (2.2), (2.3), (2.8), Lemma 6.2, (3.12), and the facts that \(c_{{\textbf{D}}2}=c_1 \vee c_2\) and \(q=q_1 \vee q_2\), we have

$$\begin{aligned}&{\mathbb {E}}\left[ \Big \Vert \int _{0}^{1}(1-\epsilon ){{\textbf {D}}}^2u\Big ({{\widetilde{Y}}}(t_n) -\epsilon \big (Y_{\theta }(s)-{{\widetilde{Y}}}(t_n)\big )\Big ) d\epsilon \Big \Vert _{\mathbf {T_3}}^{2k}\bigg | {\mathcal {F}}_{t_n}\right] \\ \le \,\,&{\mathbb {E}}\left[ \int _{0}^{1}(1-\epsilon )^{2k}\Big \Vert {{\textbf {D}}}^2u\Big ({{\widetilde{Y}}}(t_n) -\epsilon \big (Y_{\theta }(s)-{{\widetilde{Y}}}(t_n)\big )\Big )\Big \Vert _{\mathbf {T_3}}^{2k} d\epsilon \bigg | {\mathcal {F}}_{t_n}\right] \\ \le \,\,&c_{{{\textbf {D}}}2}^{2k}{\mathbb {E}}\left[ \int _{0}^{1}(1-\epsilon )^{2k}\Big (1+\big \Vert {{\widetilde{Y}}}(t_n)-\epsilon \cdot \big (Y_{\theta }(s)-{{\widetilde{Y}}}(t_n)\big )\big \Vert ^{2kq}\Big ) d\epsilon \bigg |{\mathcal {F}}_{t_n}\right] \\ \le \,\,&c_{{{\textbf {D}}}2}^{2k}{\mathbb {E}}\Big [1+3^{2kq-1}\big \Vert {{\widetilde{Y}}}(t_n)\big \Vert ^{2kq}\\ +&\int _{0}^{1}(1-\epsilon )^{2k}\epsilon ^{2kq}3^{2kq}\left( \Vert Y_{\theta }(s)\Vert ^{2kq}+\big \Vert {{\widetilde{Y}}}(t_n)\big \Vert ^{2kq}\right) d\epsilon \Bigg |{\mathcal {F}}_{t_n}\Bigg ]\\ \le \,\,&c_{{{\textbf {D}}}2}^{2k}\left( 1+ 3^{2kq+1}\big \Vert {{\widetilde{Y}}}(t_n)\big \Vert ^{2kq}+3^{2kq}{\mathbb {E}}\Big [\Vert Y_{\theta }(s)\Vert ^{2kq} \Big |{\mathcal {F}}_{t_n}\Big ]\right) \\ =\,\,&C^{}_{\texttt {TE}}\left( k,R\right) , \end{aligned}$$

where we have used that \((1-\epsilon )^{2k}\,\epsilon ^{2kq}\le 1\) for \(k,q\ge 1\) and \(\epsilon \in [0,1]\). \(\square \)

7 Proof of main theorems

In this section we prove Theorem 4.1, the strong convergence result, and Theorem 4.2, on the probability of using the backstop method and the role of \(\rho \).

7.1 Setting up the error function

Notice that \({{\widetilde{Y}}} (s)\), from the explicit adaptive Milstein scheme (3.5), is given by either the Milstein map \(\theta \) in (3.4) or the backstop map \(\varphi \) in (3.6), depending on the value of \(h_{n+1}\). Thus, we define the error by

$$\begin{aligned} {{\widetilde{E}}}(s):=\,X(s)-{{\widetilde{Y}}} (s) = E_{\theta }(s) + E_{\varphi }(s), \end{aligned}$$
(7.1)

for \(s\in [t_n, t_{n+1}]\) and \(n\in {\mathbb {N}}\). Here

$$\begin{aligned} E_{\varphi }(s):=\left( X(s)- \varphi \left( \widetilde{Y}(t_n)\varvec{,}\,\,t_n\varvec{,}\,\, {s-t_n}\right) \right) \, {\textbf{1}}_{\{h_{n+1}\le h_{\min }\}}, \end{aligned}$$
(7.2)

and \(Y_{\theta }(s)\) is as defined in Definition 3.4 and

$$\begin{aligned} E_{\theta }(s):= & {} \big (X(s) - Y_{\theta }(s)\big )\, {\textbf{1}}_{\{h_{\min }<h_{n+1}\le h_{\max }\}}\nonumber \\= & {} \,\left( {{\widetilde{E}}}(t_n)+\int _{t_n}^{s}\Delta f\big (X(r),{{\widetilde{Y}}}(t_n)\big )dr\right. \nonumber \\{} & {} \left. +\sum _{i=1}^{m}\int _{t_n}^{s}\Delta g_i\big (r,X(r),{{\widetilde{Y}}}(t_n)\big )dW_i(r)\right) \, {\textbf{1}}_{\{h_{\min }<h_{n+1}\le h_{\max }\}}, \end{aligned}$$
(7.3)

with

$$\begin{aligned} \Delta f\big (X(r),{\widetilde{Y}}(t_n)\big )&:=\,f(X(r))- f\big ({{\widetilde{Y}}}(t_n)\big ); \end{aligned}$$
(7.4)
$$\begin{aligned} \Delta g_i\big (r,X(r),\widetilde{Y}(t_n)\big )&:=\,g_i(X(r))-g_i\big (\widetilde{Y}(t_n)\big )-\sum _{j=1}^{m}{{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big ({{\widetilde{Y}}}(t_n)\big )I_{j}^{t_n,r}. \end{aligned}$$
(7.5)

To simplify the proofs of Theorem 4.1 and Theorem 4.2, we require two lemmas. First, we find a second-moment bound for \(\Delta g_i\) in (7.5) on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\) (so that (3.12) holds).

Lemma 7.1

Let g satisfy Assumption 2.1 and \(\Delta g_i\) be as in (7.5). Take \(s\in [t_n,t_{n+1}]\), let X(s) be a solution of (1.1), consider \(\big (\widetilde{Y}(s),h_{n+1}\big )\) from Definitions 3.3 and 3.5, and let \(Y_{\theta }(s)\) be as defined in Definition 3.4. In this case there exists a constant \(C_{G}\) such that, on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\),

$$\begin{aligned}{} & {} {\mathbb {E}}\left[ \left\| \Delta g_i\big (s,X(s),\widetilde{Y}(t_n)\big ) \right\| ^2\bigg |{\mathcal {F}}_{t_n}\right] \nonumber \\{} & {} \le 2{\mathbb {E}}\Big [\big \Vert g(X(s))-g\big (Y_{\theta }(s)\big )\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ] +C_{G}(R) |s-t_n|^2, \end{aligned}$$
(7.6)

where

$$\begin{aligned} C_{G}(R) :=\,8 C_{Dg_i}^2 \big (C_f^2+ C^{}_{\texttt {ISI}}\left( 1,R\right) \big ) +4C^{}_{\texttt {TE}}\left( 2,R\right) ^{1/2} C^{}_{\texttt {SR}}\left( 4,R\right) ^{1/2}, \end{aligned}$$
(7.7)

and \(C_{\texttt {ISI}}\), \(C_{\texttt {TE}}\) and \(C_{\texttt {SR}}\) are from Lemmas 6.1, 6.5 and 6.3, respectively.

Proof

Substituting (7.5) for \(\Delta g_i\) in the LHS of (7.6), adding and subtracting \(g_i\big ( Y_{\theta }(s)\big )\), and using (2.3) gives

$$\begin{aligned}{} & {} {\mathbb {E}}\left[ \left\| \Delta g_i\big (s,X(s),{{\widetilde{Y}}}(t_n)\big ) \right\| ^2\bigg |{\mathcal {F}}_{t_n}\right] \le 2{\mathbb {E}}\Big [\Big \Vert g_i(X(s))-g_i\big ( Y_{\theta }(s)\big )\Big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] \nonumber \\{} & {} +\underbrace{2{\mathbb {E}}\Bigg [\Bigg \Vert g_i\big ( Y_{\theta }(s)\big )-g_i\big (\widetilde{Y}(t_n)\big )-\sum _{j=1}^{m}{\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big ({{\widetilde{Y}}}(t_n)\big )I_{j}^{t_n,s} \Bigg \Vert ^2 \Bigg |{\mathcal {F}}_{t_n}\Bigg ]}_{=: G_{1}}. \end{aligned}$$
(7.8)

To analyse \(G_{1}\), we expand \(g_i \big ( Y_{\theta }(s)\big )\) using Taylor’s theorem (see for example [23, A.1]) around \({{\widetilde{Y}}}(t_n)\) to get

$$\begin{aligned}{} & {} g_i\big ( Y_{\theta }(s)\big )-g_i\big ({{\widetilde{Y}}}(t_n)\big ) =\, {\textbf{D}}g_i\big ({{\widetilde{Y}}}(t_n)\big )\big ( Y_{\theta }(s)-{{\widetilde{Y}}}(t_n)\big )\nonumber \\{} & {} +\int _{0}^{1}(1-\epsilon ){\textbf{D}}^2g_i\Big (\widetilde{Y}(t_n)-\epsilon \big ( Y_{\theta }(s) -\widetilde{Y}(t_n)\big )\Big )\Big [ Y_{\theta }(s)-\widetilde{Y}(t_n)\Big ]^2d\epsilon , \end{aligned}$$
(7.9)

where we recall from Sect. 2 that \([\cdot ]^2\) represents the outer product of a vector with itself. Substituting (7.9) into \(G_{1}\) in (7.8), taking out \({\textbf{D}}g_i\big ({{\widetilde{Y}}}(t_n)\big )\) as a common factor, and applying (2.3) gives

$$\begin{aligned} G_{1}\,\le&\,\,4 {\mathbb {E}}\Bigg [\Bigg \Vert {\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )\bigg ( Y_{\theta }(s)-\widetilde{Y}(t_n)-\sum _{j=1}^{m}g_j\big (\widetilde{Y}(t_n)\big )I_{j}^{t_n,s}\bigg )\Bigg \Vert ^2\Bigg |{\mathcal {F}}_{t_n}\Bigg ] \nonumber \\&\quad +4 {\mathbb {E}}\Bigg [\Bigg \Vert \int _{0}^{1}(1-\epsilon ){\textbf{D}}^2g_i\Big (\widetilde{Y}(t_n)-\epsilon \big ( Y_{\theta }(s)-\widetilde{Y}(t_n)\big )\Big )\nonumber \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \times \Big [ Y_{\theta }(s)-{{\widetilde{Y}}}(t_n)\Big ]^2d\epsilon \Bigg \Vert ^2 \Bigg |{\mathcal {F}}_{t_n}\Bigg ] \nonumber \\ =:\,\,&G_{1.1}+G_{1.2}. \end{aligned}$$
(7.10)

For \(G_{1.1}\) in (7.10), by submultiplicativity of the Euclidean norm and the fact that the induced matrix 2-norm is bounded above by the Frobenius norm; by (3.7), (6.1) and (6.2) in the statement of Lemma 6.1 with \(k=1\), we have

$$\begin{aligned} G_{1.1} \,&\le \,\, 8\Big \Vert {{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )\Big \Vert _{{{\textbf {F}}}}^2\Bigg (\Big \Vert f\big (\widetilde{Y}(t_n)\big )\Big \Vert ^2|s-t_n|^2\nonumber \\ {}&\,\,\quad +{\mathbb {E}}\Bigg [\Bigg \Vert \sum _{i,j=1}^{m}{{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big (\widetilde{Y}(t_n)\big )I_{j,i}^{t_n,s}\Bigg \Vert ^2\Bigg |{\mathcal {F}}_{t_n}\Bigg ]\Bigg )\nonumber \\ \,&\le \,\, 8 C_{Dg_i}^2\big (C_f^2+ C^{}_{\texttt {ISI}}\left( 1,R\right) \big )|s-t_n|^2. \end{aligned}$$
(7.11)

For \(G_{1.2}\) in (7.10), we apply (2.2), the Cauchy-Schwarz inequality, then using (6.11) in Lemma 6.5 with \(k=2\) and (6.6) in Lemma 6.3 with \(k=4\) we get

$$\begin{aligned} G_{1.2} \,&\le \,\, 4 \int _{0}^{1}\Big ({\mathbb {E}}\Big [\Big \Vert (1-\epsilon ){{\textbf {D}}}^2g_i\Big (\widetilde{Y}(t_n)-\epsilon \big ( Y_{\theta }(s)-\widetilde{Y}(t_n)\big )\Big )\Big \Vert _{\mathbf {T_3}}^4 \Big |{\mathcal {F}}_{t_n}\Big ]\Big )^{1/2}d\epsilon \nonumber \\ {}&\,\,\quad \times \Big ({\mathbb {E}}\Big [\Big \Vert Y_{\theta }(s)-{{\widetilde{Y}}}(t_n) \Big \Vert ^8 \Big |{\mathcal {F}}_{t_n}\Big ]\Big )^{1/2} \nonumber \\ {}&\le \,\, 4C^{}_{\texttt {TE}}\left( 2,R\right) ^{1/2} C^{}_{\texttt {SR}}\left( 4,R\right) ^{1/2} \,|s-t_n|^2. \end{aligned}$$
(7.12)

Substituting the bounds (7.11) and (7.12) back into (7.10) and bringing together the terms in (7.8), we have

$$\begin{aligned} {\mathbb {E}}\left[ \left\| \Delta g_i\big (s,X(s),{{\widetilde{Y}}}(t_n)\big ) \right\| ^2\bigg |{\mathcal {F}}_{t_n}\right]\le & {} 2{\mathbb {E}}\Big [\Big \Vert g_i(X(s))-g_i\big ( Y_{\theta }(s)\big )\Big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] \\{} & {} +C_{G }(R) |s-t_n|^2, \end{aligned}$$

where \(C_G(R) \) is given in (7.7). By bounding \(\Vert g_i\Vert ^2\) with \(\Vert g\Vert _{{\textbf{F}}(d\times m)}^2\), the statement of Lemma 7.1 follows. \(\square \)

The second lemma gives a conditional second-moment bound for \(E_{\theta }(s)\) as defined in (7.3), which is the first part of the one-step error in (7.1).

Lemma 7.2

Let f, g satisfy Assumptions 2.1 and 2.2. Let X(s) be a solution of (1.1) and \({{\widetilde{E}}}(s)\) be given by (7.1) with \(E_{\theta }(s)\) defined in (7.3), for \(s\in [t_n,t_{n+1}]\), \(n\in {\mathbb {N}}\). Then there exist a constant \(C_E\) and an \({\mathcal {F}}_{t_n}\)-measurable random variable \(\overline{C}^{\{4(q+2)\}}_{M}\) such that

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert E_{\theta }(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] \le \,\, \big \Vert \widetilde{E}(t_{n})\big \Vert ^2+C_E(R)\int _{t_n}^{t_{n+1}} {\mathbb {E}}\left[ \big \Vert E_{\theta }(r)\big \Vert ^2 \bigg |{\mathcal {F}}_{t_n}\right] dr\nonumber \\ +{\overline{C}}_M^{\{4(q+2)\}}(R)\,h_{n+1}^3, \quad a.s. \end{aligned}$$
(7.13)

where

$$\begin{aligned} C_E(R) :=\, 2K_1(R)+2c, \end{aligned}$$
(7.14)

with constant \(K_1\) as defined in (7.40). The \({\mathcal {F}}_{t_n}\)-measurable random variable \(\overline{C}^{\{4(q+2)\}}_M\) is given by

$$\begin{aligned} {\overline{C}}_M^{\{4(q+2)\}}(R)\,:=\,m^4 C^2_{Df} C^2_{g_i}+2\overline{K}_2^{\{4(q+2)\}}+mC_{G}(R) , \end{aligned}$$
(7.15)

with the \({\mathcal {F}}_{t_n}\)-measurable random variable \(\overline{K}^{\{4(q+2)\}}_2\) in (7.41) and the constant \(C_{G}\) from Lemma 7.1. Denote \({\mathbb {E}}\left[ {\overline{C}^{\{4(q+2)\}}_M(R)} \right] =:C_M(R)\), the finiteness of which is ensured in (7.44).

We recall that the superscript notation in (7.15) follows the convention introduced in the statement of Lemma 6.4 and indicates the number of finite moments required of the SDE solution.

Proof

Throughout the proof, we restrict attention to trajectories on the event \(\{h_{\min }<h_{n+1}\le h_{\max }\}\), since by (7.3) \(E_\theta (s)\) is nonzero only on this event; otherwise (7.13) holds trivially. Applying the stopping-time variant of the Itô formula (see Mao & Yuan [27]) to (7.3), we have

$$\begin{aligned}{} & {} \big \Vert E_{\theta }(t_{n+1})\big \Vert ^2=\big \Vert \widetilde{E}(t_{n})\big \Vert ^2+2\int _{t_n}^{t_{n+1}}\underbrace{\Big \langle E_{\theta }(r),\Delta f\big (X(r),{{\widetilde{Y}}}(t_n)\big ) \Big \rangle }_{=:J_f} dr\\{} & {} +\sum _{i=1}^{m}\int _{t_n}^{t_{n+1}}{\Big \Vert \underbrace{\Delta g_i\big (r,X(r),{{\widetilde{Y}}}(t_n)\big )}_{=:J_{g_i}}\Big \Vert ^2} dr+2\sum _{i=1}^{m}\int _{t_n}^{t_{n+1}}\big \langle E_{\theta }(r),{J_{g_i}}\big \rangle dW_i(r). \end{aligned}$$

Take expectations on both sides conditional upon \({\mathcal {F}}_{t_n}\). Since \(\int _{t_n}^{t_{n+1}}\big |J_f\big | dr\) has finite expectation (by the boundedness of \(\widetilde{Y}(t_n)\) in (3.12) and the finiteness of the absolute moments of X(r), see (2.9)), using Fubini’s Theorem (see for example [4, Proposition 12.10]) and (3.8) we have

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert E_{\theta }(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] = \big \Vert {{\widetilde{E}}}(t_{n})\big \Vert ^2 +2\int _{t_n}^{t_{n+1}}{\mathbb {E}}\big [J_f\big |{\mathcal {F}}_{t_n}\big ]dr\nonumber \\ +\sum _{i=1}^{m}\int _{t_n}^{t_{n+1}}{\mathbb {E}}\big [{\Vert }J_{g_i}{\Vert ^2}\big |{\mathcal {F}}_{t_n}\big ]dr, \end{aligned}$$
(7.16)

By Lemma 7.1, we can bound \({\Vert }J_{g_i}{\Vert ^2}\) in (7.16) as

$$\begin{aligned} {\mathbb {E}}\big [{\Vert }J_{g_i}{\Vert ^2}\big |{\mathcal {F}}_{t_n}\big ]\le & {} \, 2{\mathbb {E}}\Big [\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]\nonumber \\{} & {} \quad + C_{G}(R) |r-t_n|^2. \end{aligned}$$
(7.17)

For \(J_f\), substituting for \(\Delta f\) using (7.4) and adding and subtracting \(f(Y_{\theta }(r))\), we have

$$\begin{aligned} J_f=\big \langle E_{\theta }(r),f(X(r))-f(Y_{\theta }(r))\big \rangle +\underbrace{\big \langle E_{\theta }(r),f(Y_{\theta }(r))-f\big (\widetilde{Y}(t_n)\big )\big \rangle }_{=:H}. \end{aligned}$$
(7.18)

Substituting (7.18) and (7.17) back into (7.16), we have

$$\begin{aligned}{} & {} {\mathbb {E}}\Big [\big \Vert E_{\theta }(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] \le \big \Vert \widetilde{E}(t_{n})\big \Vert ^2+mC_{G}(R) h_{n+1}^3 \nonumber \\{} & {} +2\int _{t_n}^{t_{n+1}}{\mathbb {E}}\big [J_{f,g}\big |{\mathcal {F}}_{t_n}\big ]dr+2\int _{t_n}^{t_{n+1}}{\mathbb {E}}\big [H\big |{\mathcal {F}}_{t_n}\big ]dr, \end{aligned}$$
(7.19)

where

$$\begin{aligned} J_{f,g}:=\big \langle E_{\theta }(r),f(X(r))-f(Y_{\theta }(r))\big \rangle +\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{{\textbf {F}}}(d\times m)}^2. \end{aligned}$$
(7.20)

For H in (7.18), in a similar way to (7.9), we expand \(f(Y_{\theta }(r))\) using Taylor’s theorem around \({{\widetilde{Y}}}(t_n)\) to obtain

$$\begin{aligned}{} & {} f(Y_{\theta }(r))-f\big (\widetilde{Y}(t_n)\big )={\textbf{D}}f\big (\widetilde{Y}(t_n)\big )\big (Y_{\theta }(r)-\widetilde{Y}(t_n)\big )\nonumber \\{} & {} +\int _{0}^{1}(1-\epsilon ){\textbf{D}}^2f\Big (\widetilde{Y}(t_n)-\epsilon \cdot \big (Y_{\theta }(r)-\widetilde{Y}(t_n)\big )\Big )\Big [Y_{\theta }(r)-\widetilde{Y}(t_n)\Big ]^2d\epsilon . \end{aligned}$$
(7.21)

Then we substitute for \(Y_{\theta }(r)\) in the first term on the RHS of (7.21) using (3.4), where we use the expanded form of the map as characterised in (2.12) with \(s=r\). Therefore, for the last term on the RHS of (7.19), we have

$$\begin{aligned} {\mathbb {E}}\big [H\big |{\mathcal {F}}_{t_n}\big ]\,\le \, H_1+H_2+H_3+H_4+H_5+H_6, \end{aligned}$$
(7.22)

where

$$\begin{aligned} H_1 := \,\,&{\mathbb {E}}\Big [\left\langle E_{\theta }(r)\textbf{,}\,\, {{\textbf {D}}}f\big ({{\widetilde{Y}}}(t_n)\big )|r-t_n|f\big (\widetilde{Y}(t_n)\big )\right\rangle \Big |{\mathcal {F}}_{t_n}\Big ]; \\ H_2 := \,\,&{\mathbb {E}}\bigg [\left\langle E_{\theta }(r)\textbf{,}\,\, \underbrace{\sum _{i=1}^{m}{{\textbf {D}}}f\big (\widetilde{Y}(t_n)\big )g_i\big ({{\widetilde{Y}}}(t_n)\big )I_{i}^{t_n,r}}_{=:H_{2R}} \right\rangle \bigg |{\mathcal {F}}_{t_n} \bigg ];\\ H_{3}:=\,\,&{\mathbb {E}}\Bigg [\Bigg \langle E_{\theta }(r),\,\, \frac{1}{2}\sum _{i=1}^{m}{{\textbf {D}}}f\big (\widetilde{Y}(t_n)\big ){{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )g_i\big ({{\widetilde{Y}}}(t_n)\big )\\ {}&\times \left( \left( I_{i}^{t_n,r}\right) ^2-|r-t_n|\right) \Bigg \rangle \Bigg |{\mathcal {F}}_{t_n} \Bigg ];\\ H_{4}\,:=\,\,&{\mathbb {E}}\Bigg [\Bigg \langle E_{\theta }(r),\,\,\frac{1}{2}\sum _{{\begin{array}{c} i,j=1\\ i<j \end{array}}}^{m}{{\textbf {D}}}f\big (\widetilde{Y}(t_n)\big )\Big ({{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big ({{\widetilde{Y}}}(t_n)\big )\\&\qquad +{{\textbf {D}}}g_j\big (\widetilde{Y}(t_n)\big )g_i\big ({{\widetilde{Y}}}(t_n)\big )\Big ) I_{i}^{t_n,r}I_{j}^{t_n,r} \Bigg \rangle \Bigg |{\mathcal {F}}_{t_n} \Bigg ];\\ H_{5}:=&\,\,{\mathbb {E}}\Bigg [ \Bigg \langle E_{\theta }(r),\,\, \sum _{{\begin{array}{c} i,j=1\\ i<j \end{array}}}^{m}{{\textbf {D}}}f\big (\widetilde{Y}(t_n)\big )\Big ({{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )g_j\big ({{\widetilde{Y}}}(t_n)\big )\\ {}&\qquad \qquad \qquad \quad -{{\textbf {D}}}g_j\big (\widetilde{Y}(t_n)\big )g_i\big ({{\widetilde{Y}}}(t_n)\big )\Big )A_{ij}(t_n, r)\Bigg \rangle \Bigg |{\mathcal {F}}_{t_n} \Bigg ];\\ H_{6}:=\,&{\mathbb {E}}\bigg [\bigg <E_{\theta }(r),\,\, \int _{0}^{1}(1-\epsilon ){{\textbf {D}}}^2f\Big (\widetilde{Y}(t_n)-\epsilon \cdot \big (Y_{\theta }(r)-{{\widetilde{Y}}}(t_n)\big )\Big )\\ {}&\qquad \times \Big [Y_{\theta }(r)-\widetilde{Y}(t_n)\Big ]^2d\epsilon \bigg >\bigg |{\mathcal {F}}_{t_n} \bigg ]. \end{aligned}$$

We will now determine suitable upper bounds for each of \(H_1\), \(H_2\), \(H_3\), \(H_4\), \(H_5\), and \(H_6\) in turn. For \(H_1\) in (7.22), by the Cauchy-Schwarz inequality, (2.1), and (6.1), we have

$$\begin{aligned} H_1\le&\,\,{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert \, \big \Vert {\textbf{D}}f\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}}\,\big \Vert f\big (\widetilde{Y}(t_n)\big )\big \Vert \,|r-t_n|\,\Big |{\mathcal {F}}_{t_n}\Big ]\nonumber \\ \le&\,\,{\mathbb {E}}\left[ \frac{1}{2}\big \Vert {\textbf{D}}f\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}}^2\,\big \Vert f\big (\widetilde{Y}(t_n)\big )\big \Vert ^2\Vert E_{\theta }(r)\Vert ^2+\frac{1}{2}|r-t_n|^2\,\bigg |{\mathcal {F}}_{t_n}\right] \nonumber \\ \le&\,\,\frac{1}{2} C_{Df}^2 C_{f}^2\,\,{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]+\frac{1}{2}|r-t_n|^2. \end{aligned}$$
(7.23)

Next, for the analysis of \(H_2\) in (7.22), by (3.8) we first have

$$\begin{aligned} {\mathbb {E}}\big [H_{2R}\big |{\mathcal {F}}_{t_n} \big ]=\sum _{i=1}^{m}{\textbf{D}}f\big (\widetilde{Y}(t_n)\big )g_i\big (\widetilde{Y}(t_n)\big ){\mathbb {E}}\Big [I_{i}^{t_n,r}\Big |{\mathcal {F}}_{t_n} \Big ] =0. \end{aligned}$$
(7.24)

By (2.3), the Cauchy-Schwarz inequality, (6.1) and (3.9) we also have

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert H_{2R}\big \Vert ^2\Big |{\mathcal {F}}_{t_n} \Big ] \le \,\,&m\sum _{i=1}^{m}\big \Vert {\textbf{D}}f\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}}^2 \big \Vert g_i\big (\widetilde{Y}(t_n)\big )\big \Vert ^2 {\mathbb {E}}\left[ \left| I_{i}^{t_n,r} \right| ^2 \bigg |{\mathcal {F}}_{t_n} \right] \nonumber \\ \le \,\,&m^2 C_{Df}^2 C_{g_i}^2|r-t_n|. \end{aligned}$$
(7.25)

Then, for \(H_2\) in (7.22), we first expand \(E_{\theta }(r)\) using (7.3) to obtain

$$\begin{aligned} H_2=&\, {\mathbb {E}}\left[ \left\langle {{\widetilde{E}}}(t_n)\varvec{,}\,\, H_{2R} \right\rangle \bigg |{\mathcal {F}}_{t_n} \right] +{\mathbb {E}}\left[ \left\langle \int _{t_n}^{r}\Delta f\big (X(p),{{\widetilde{Y}}}(t_n)\big )dp\varvec{,}\,\, H_{2R}\right\rangle \bigg |{\mathcal {F}}_{t_n} \right] \nonumber \\ {}&\quad +{\mathbb {E}}\left[ \left\langle \sum _{i=1}^{m}\int _{t_n}^{r}\Delta g_i\big (p,X(p),{{\widetilde{Y}}}(t_n)\big )dW_{i}(p)\varvec{,}\,\, H_{2R} \right\rangle \bigg |{\mathcal {F}}_{t_n} \right] \nonumber \\ =:&\,\, H_{2.1}+H_{2.2}+H_{2.3}. \end{aligned}$$
(7.26)

For \(H_{2.1}\) in (7.26), by (7.24) we have

$$\begin{aligned} H_{2.1}=\big \langle {{\widetilde{E}}}(t_n)\varvec{,}\,\, {\mathbb {E}}\big [H_{2R}\big |{\mathcal {F}}_{t_n} \big ] \big \rangle =0. \end{aligned}$$
(7.27)

For \(H_{2.2}\) in (7.26), adding and subtracting \(f(X(t_n))\) in \(\Delta f\) from (7.4):

$$\begin{aligned} H_{2.2}&=\,\, {\mathbb {E}}\Bigg [\left\langle \int _{t_n}^{r} f(X(p))-f(X(t_n)) dp\varvec{,}\,\, H_{2R} \right\rangle \Bigg |{\mathcal {F}}_{t_n} \Bigg ]\nonumber \\ {}&\quad \,\,+{\mathbb {E}}\Bigg [\left\langle \int _{t_n}^{r} f(X(t_n))-f\big (\widetilde{Y}(t_n)\big ) dp\varvec{,}\,\, H_{2R}\right\rangle \Bigg |{\mathcal {F}}_{t_n} \Bigg ]\nonumber \\ {}&=:\,\, H_{2.21}+H_{2.22}. \end{aligned}$$
(7.28)

Similar to \(H_{2.1}\) in (7.27), we have \(H_{2.22} = 0\). For \(H_{2.21}\) in (7.28), using the Cauchy-Schwarz inequality and (7.25) we have

$$\begin{aligned} H_{2.21}\le&\,\, {\mathbb {E}}\left[ \left\| \int _{t_n}^{r} f(X(p))-f(X(t_n)) dp\right\| \big \Vert H_{2R}\big \Vert \bigg |{\mathcal {F}}_{t_n} \right] \nonumber \\ \le&\Bigg (|r-t_n|\int _{t_n}^r{\mathbb {E}}\left[ \left\| f(X(p))-f(X(t_n)) \right\| ^2 \bigg |{\mathcal {F}}_{t_n} \right] dp\,\,{\mathbb {E}}\Big [\big \Vert H_{2R}\big \Vert ^2\Big |{\mathcal {F}}_{t_n} \Big ]\Bigg )^{1/2}\nonumber \\ \le&\,\, m C_{Df} C_{g_i}|r-t_n|\left( \int _{t_n}^r{\mathbb {E}}\left[ \left\| f(X(p))-f(X(t_n)) \right\| ^2\bigg |{\mathcal {F}}_{t_n} \right] dp\right) ^{1/2}. \end{aligned}$$
(7.29)

By Taylor expansion of f(X(p)) around \(X(t_n)\) to first order, and using (2.6), the Cauchy-Schwarz inequality, Lemma 6.4 with \(k=2\) and (2.3):

$$\begin{aligned}&\,\,{\mathbb {E}}\left[ \big \Vert f(X(p))-f(X(t_n)) \big \Vert ^2\bigg |{\mathcal {F}}_{t_n} \right] \nonumber \\&=\,\,{\mathbb {E}}\left[ \left\| \int _{0}^{1}{\textbf{D}}f\big (X(t_n)-\epsilon \cdot (X(p)-X(t_n))\big )(X(p)-X(t_n))d\epsilon \right\| ^2\bigg |{\mathcal {F}}_{t_n} \right] \nonumber \\&\le \Big ({\mathbb {E}}\left[ \Vert X(p)-X(t_n)\Vert ^4\bigg |{\mathcal {F}}_{t_n} \right] \Big )^{1/2}\nonumber \\&\qquad \qquad \times \left( {\mathbb {E}}\left[ \left\| \int _{0}^{1}{\textbf{D}}f\big (X(t_n)-\epsilon \cdot (X(p)-X(t_n))\big )d\epsilon \right\| _{{\textbf{F}}}^4\bigg |{\mathcal {F}}_{t_n} \right] \right) ^{1/2}\nonumber \\&\le \,\,{\overline{C}}^{\{4(q+2)\}}_{H2.21}|p-t_n|, \end{aligned}$$
(7.30)

where

$$\begin{aligned} {\overline{C}}^{\{4(q+2)\}}_{H2.21}:= & {} \,\,\left( \overline{C}^{\{4(q+2)\}}_{\texttt {PR}}\right) ^{1/2}\nonumber \\{} & {} \times c_3^2\Big (1+3^{4q_1+4}{\mathbb {E}} \Big [\sup _{p\in [t_n,t_{n+1}]}\Vert X(p)\Vert ^{4q_1+4}\Big |{\mathcal {F}}_{t_n} \Big ]\Big )^{1/2}. \end{aligned}$$
(7.31)

Substituting (7.30) back into (7.29) and using \(H_{2.22}=0\), we have

$$\begin{aligned} H_{2.2} \le m C_{Df} C_{g_i} \left( \overline{C}^{\{4(q+2)\}}_{H2.21}\right) ^{1/2}|r-t_n|^2. \end{aligned}$$
(7.32)

For \(H_{2.3}\) in (7.26), using the Cauchy-Schwarz inequality, (2.3), (6.1), (3.9) and Itô's isometry we have

$$\begin{aligned} H_{2.3}\le&\left( {\mathbb {E}}\left[ \left\| \sum _{i=1}^{m}\int _{t_n}^{r}\Delta g_i\big (p,X(p),\widetilde{Y}(t_n)\big )dW_{i}(p)\right\| ^2\bigg |{\mathcal {F}}_{t_n} \right] \right) ^{1/2}\\&\qquad \qquad \quad \times \left( {\mathbb {E}}\left[ \left\| \sum _{i=1}^{m}{\textbf{D}}f\big (\widetilde{Y}(t_n)\big )g_i\big ({{\widetilde{Y}}}(t_n)\big )I_{i}^{t_n,r} \right\| ^2 \bigg |{\mathcal {F}}_{t_n} \right] \right) ^{1/2}\\ \le&\left( m\sum _{i=1}^{m}\int _{t_n}^{r}{\mathbb {E}}\left[ \left\| \Delta g_i\big (p,X(p),{{\widetilde{Y}}}(t_n)\big ) \right\| ^2\bigg |{\mathcal {F}}_{t_n} \right] dp\right) ^{1/2} \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \quad \times m C_{Df} C_{g_i} |r-t_n|^{1/2}. \end{aligned}$$

Then, by Lemma 7.1 we have

$$\begin{aligned}{} & {} H_{2.3} \le \bigg (2m^2\int _{t_n}^{r} {\mathbb {E}}\Big [\big \Vert g(X(p))-g(Y_{\theta }(p))\big \Vert _{{{\textbf {F}}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]dp \\{}{} & {} {}\qquad +C_{G}(R) |r-t_n|^3\bigg )^{1/2}m C_{Df} C_{g_i} |r-t_n|^{1/2}. \end{aligned}$$

Since the integrand \({\mathbb {E}}\Big [\big \Vert g(X(p))-g(Y_{\theta }(p))\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]\) is non-negative for all \(p\in [t_n,t_{n+1}]\), we can replace the upper limit of integration with \(t_{n+1}\). With \(\sqrt{a+b}\le \sqrt{a}+\sqrt{b}\), we have

$$\begin{aligned} H_{2.3}\le & {} \sqrt{2}m^2 C_{Df} C_{g_i}|r-t_n|^{1/2}\nonumber \\{} & {} \times \left( \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]dr\right) ^{1/2}\nonumber \\{} & {} +m C_{Df} C_{g_i}C_{G}(R)^{1/2} |r-t_n|^{2}. \end{aligned}$$
(7.33)

Notice that we changed the variable of integration from p back to r for consistency. Substituting (7.27), (7.32) and (7.33) back into (7.26), we have

$$\begin{aligned} H_{2}&\le \,\,m C_{Df} C_{g_i} \Bigg (\left( \overline{C}^{\{4(q+2)\}}_{H2.21}\right) ^{1/2}+C_{G}(R)^{1/2} \Bigg )|r-t_n|^2\nonumber \\&\quad +\sqrt{2}m^2 C_{Df} C_{g_i}|r-t_n|^{1/2}\nonumber \\&\quad \times \left( \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]dr\right) ^{1/2}. \end{aligned}$$
(7.34)

For \(H_{3}\) in (7.22), by the Cauchy-Schwarz inequality, the triangle inequality, (2.3), (2.1), (3.10), (2.6) and (6.1) we have

$$\begin{aligned} H_{3}&\le \,\,{\mathbb {E}}\Bigg [\frac{1}{4}\sum _{i=1}^{m}\Bigg ( \big \Vert {\textbf{D}}f\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}}^2\,\big \Vert {\textbf{D}}g_i\big (\widetilde{Y}(t_n)\big )\big \Vert _{{\textbf{F}}}^2\,\big \Vert g_i\big (\widetilde{Y}(t_n)\big )\big \Vert ^2\Vert E_{\theta }(r)\Vert ^2 \nonumber \\&\quad + 2\left| I_{i}^{t_n,r}\right| ^4+ 2|r-t_n|^2\Bigg ) \Bigg |{\mathcal {F}}_{t_n} \Bigg ]\nonumber \\&\le \,\,\frac{m}{4} C_{Df}^2 C_{Dg_i}^2 C_{g_i}^2\,\,{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]+ \frac{(\varvec{\gamma }_{4}+1)m}{2}|r-t_n|^2. \end{aligned}$$
(7.35)

For \(H_{4}\) in (7.22), by the Cauchy-Schwarz inequality, conditional independence of the Itô integrals, (3.9), the triangle inequality, (2.1), Itô's isometry, (2.6), and (6.1), we have

$$\begin{aligned} H_{4}&\le \,\,{\mathbb {E}}\Bigg [\frac{1}{2}\sum _{\begin{array}{c} i,j=1\\ i<j \end{array}}^{m}\Vert E_{\theta }(r)\Vert \big \Vert {{\textbf {D}}}f\big (\widetilde{Y}(t_n)\big )\big \Vert _{{{\textbf {F}}}}\Big (\big \Vert {{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )\big \Vert _{{{\textbf {F}}}}\big \Vert g_j\big (\widetilde{Y}(t_n)\big )\big \Vert \nonumber \\ {}&\quad \,\, +\big \Vert {{\textbf {D}}}g_j\big (\widetilde{Y}(t_n)\big )\big \Vert _{{{\textbf {F}}}}\big \Vert g_i\big (\widetilde{Y}(t_n)\big )\big \Vert \Big ) \left| I_{i}^{t_n,r}\right| \,\left| I_{j}^{t_n,r}\right| \Bigg | {\mathcal {F}}_{t_n} \Bigg ]\nonumber \\ {}&\le \,\,\frac{1}{4}m(m-1) C_{Df}^2 C_{Dg_i}^2 C_{g_i}^2{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]+\frac{1}{8}m(m-1)|r-t_n|^2. \end{aligned}$$
(7.36)

For \(H_{5}\) in (7.22), by the Cauchy-Schwarz inequality, the triangle inequality, (2.1), (6.1), (2.6), and Lemma 2.2 with \(b=2\), we have

$$\begin{aligned} H_{5}&\le \,\,{\mathbb {E}}\Bigg [\frac{1}{2}\sum _{{\begin{array}{c} i,j=1\\ i<j \end{array}}}^{m}\Bigg (\big \Vert {{\textbf {D}}}f\big (\widetilde{Y}(t_n)\big )\Vert _{{{\textbf {F}}}}^2\Vert E_{\theta }(r)\Vert ^2\Big (\big \Vert {{\textbf {D}}}g_i\big (\widetilde{Y}(t_n)\big )\big \Vert _{{{\textbf {F}}}}^2\big \Vert g_j\big (\widetilde{Y}(t_n)\big )\big \Vert ^2\nonumber \\ {}&\quad \,\, +\big \Vert {{\textbf {D}}}g_j\big (\widetilde{Y}(t_n)\big )\big \Vert _{{{\textbf {F}}}}^2\big \Vert g_i\big (\widetilde{Y}(t_n)\big )\big \Vert ^2\Big )+\big |A_{ij}(t_n, r)\big |^2\Bigg )\Bigg |{\mathcal {F}}_{t_n} \Bigg ]\nonumber \\ {}&\le \,\,\frac{1}{2}m(m-1) C_{Df}^2 C_{Dg_i}^2 C_{g_i}^2{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]\nonumber \\ {}&\quad \,\, +\frac{1}{4}m(m-1)(C^{}_{\texttt {LA}}\left( 2\right) )^2|r-t_n|^2. \end{aligned}$$
(7.37)

For \(H_{6}\) in (7.22), by the Cauchy-Schwarz inequality, the triangle inequality, and (2.1), we have (noting that \(\Vert [\cdot ]^2 \Vert _{\textbf{F}}=\Vert \cdot \Vert ^2\))

$$\begin{aligned} H_{6}\le \,&{\mathbb {E}}\bigg [\Vert E_{\theta }(r)\Vert \big \Vert Y_{\theta }(r)-{{\widetilde{Y}}}(t_n)\big \Vert ^2\\&\qquad \qquad \times \left\| \int _{0}^{1}(1-\epsilon ){\textbf{D}}^2f\Big (\widetilde{Y}(t_n)-\epsilon \cdot \big (Y_{\theta }(r)-\widetilde{Y}(t_n)\big )\Big )d\epsilon \right\| _{{\textbf{T}}_3}\bigg |{\mathcal {F}}_{t_n} \bigg ]\\ \le \,&\frac{1}{2}{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]+ \frac{1}{2}\underbrace{\sqrt{{\mathbb {E}}\left[ \big \Vert Y_{\theta }(r)-{{\widetilde{Y}}}(t_n)\big \Vert ^8\bigg |{\mathcal {F}}_{t_n} \right] }}_{H_{6.1}}\\&\qquad \times \underbrace{\sqrt{{\mathbb {E}}\left[ \left\| \int _{0}^{1}(1-\epsilon ){\textbf{D}}^2f\Big (\widetilde{Y}(t_n)-\epsilon \cdot \big (Y_{\theta }(r)-\widetilde{Y}(t_n)\big )\Big )d\epsilon \right\| _{{\textbf{T}}_3}^4 \bigg |{\mathcal {F}}_{t_n} \right] }}_{H_{6.2}}. \end{aligned}$$

From (6.6) in Lemma 6.3 with \(k=4\), we have \(H_{6.1}\le C^{}_{\texttt {SR}}\left( 4,R\right) ^{1/2} |r-t_n|^2.\) From (6.11) in Lemma 6.5 with \(k=2\), we have \(H_{6.2} \le C^{}_{\texttt {TE}}\left( 2,R\right) ^{1/2}\). Therefore, \(H_{6}\) in (7.22) becomes

$$\begin{aligned} H_{6}\le \frac{1}{2}{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ] +\frac{1}{2} C^{}_{\texttt {SR}}\left( 4,R\right) ^{1/2}\, C^{}_{\texttt {TE}}\left( 2,R\right) ^{1/2}\,|r-t_n|^2. \end{aligned}$$
(7.38)

Substituting (7.23), (7.34), (7.35), (7.36), (7.37) and (7.38) back into (7.22) for H, we have

$$\begin{aligned} {\mathbb {E}}[H|{\mathcal {F}}_{t_n}]\le & {} K_1(R) {\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]+\overline{K}^{\{4(q+2)\}}_2(R)|r-t_n|^2\nonumber \\{} & {} +\sqrt{2}m^2 C_{Df} C_{g_i}|r-t_n|^{1/2}\nonumber \\{} & {} \times \left( \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]dr\right) ^{1/2}, \end{aligned}$$
(7.39)

where

$$\begin{aligned} K_1(R) :=\,\,&\frac{1}{2} + \frac{1}{2} C_{Df}^2 C_{f}^2+m(m-1) C_{Df}^2 C_{Dg_i}^2 C_{g_i}^2, \end{aligned}$$
(7.40)

and, collecting (and slightly enlarging) the coefficients of \(|r-t_n|^2\) in (7.23) and (7.34)-(7.38), with \({\overline{C}}^{\{4(q+2)\}}_{H2.21}\) from (7.31),

$$\begin{aligned}&{\overline{K}}^{\{4(q+2)\}}_2(R)\nonumber \\&:=\,\,\frac{1}{2}+m C_{Df} C_{g_i} \bigg (\left( {\overline{C}}^{\{4(q+2)\}}_{H2.21}\right) ^{1/2}+C_{G}(R)^{1/2}\bigg )+\frac{1}{2}(\varvec{\gamma }_{4}+1)m\nonumber \\ {}&\quad +\frac{1}{4}m(m-1)\big (1+\left( C^{}_{\texttt {LA}}\left( 2\right) \right) ^2\big )+\frac{1}{2} C^{}_{\texttt {SR}}\left( 4,R\right) ^{1/2} C^{}_{\texttt {TE}}\left( 2,R\right) ^{1/2}. \end{aligned}$$
(7.41)

Substituting \({\mathbb {E}}[H|{\mathcal {F}}_{t_n}]\) from (7.39) back into (7.19), we have

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert E_{\theta }(t_{n+1})&\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]\le \big \Vert \widetilde{E}(t_n)\big \Vert ^2+2K_1(R)\displaystyle \int _{t_n}^{t_{n+1}}{\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]dr\nonumber \\{}&{} +mC_{G}(R) h_{n+1}^3 +{\overline{K}}^{\{4(q+2)\}}_2(R) h_{n+1}^3\nonumber \\{}&{} +2 \displaystyle \int _{t_n}^{t_{n+1}}{\mathbb {E}}\big [J_{f,g}\big |{\mathcal {F}}_{t_n}\big ]dr+\sqrt{2 }m^2 C_{Df} C_{g_i}h_{n+1}^{3/2}\nonumber \\{}&{} \times \displaystyle \left( \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{{\textbf {F}}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]dr\right) ^{1/2}. \end{aligned}$$
(7.42)

Using (2.1) on the last term on the RHS of (7.42), we have

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert E_{\theta }&(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]\le \,\, \big \Vert \widetilde{E}(t_n)\big \Vert ^2+2K_1(R)\displaystyle \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]dr\nonumber \\{}&{} +{\overline{C}}_M^{\{4(q+2)\}}(R)\,h_{n+1}^3\nonumber \\{}&{} +2 \displaystyle \int _{t_n}^{t_{n+1}}{\mathbb {E}}\bigg [J_{f,g}+\frac{1}{2}\Big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{{\textbf {F}}}(d\times m)}^2\bigg |{\mathcal {F}}_{t_n}\bigg ]dr, \end{aligned}$$
(7.43)
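
Here the last term on the RHS of (7.42) has been estimated using (2.1) as

$$\begin{aligned} \sqrt{2}m^2 C_{Df} C_{g_i}\,h_{n+1}^{3/2}\left( \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]dr\right) ^{1/2}\\ \le \,\, m^4 C^2_{Df} C^2_{g_i}\,h_{n+1}^{3}+\frac{1}{2}\int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2 \Big |{\mathcal {F}}_{t_n}\Big ]dr, \end{aligned}$$

which produces the first term of the \(h_{n+1}^3\) coefficient and the additional \(\frac{1}{2}\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2\) inside the final integral.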

In (7.43), \({\overline{C}}^{\{4(q+2)\}}_M\) is as defined in (7.15). Recall that \(J_{f,g}\) is given in (7.20), so that

$$\begin{aligned}{} & {} J_{f,g}+\frac{1}{2}\Big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2\\{} & {} =\Big \langle E_{\theta }(r),f(X(r))-f(Y_{\theta }(r))\Big \rangle +\frac{3}{2}\big \Vert g(X(r))-g(Y_{\theta }(r))\big \Vert _{{\textbf{F}}(d\times m)}^2. \end{aligned}$$

By Assumption 2.2 we can apply the monotone condition (2.5):

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert E_{\theta }(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] \le&\,\, \big \Vert \widetilde{E}(t_n)\big \Vert ^2+C_E(R)\displaystyle \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\Vert E_{\theta }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]dr\\{}&{} +{\overline{C}}_M^{\{4(q+2)\}}(R)\,h_{n+1}^3, \end{aligned}$$

where \(C_E(R)\) is given in (7.14).

To obtain the final estimate on \(C_M(R)\) in the Lemma, we use the explicit forms of \({\overline{C}}_M^{\{4(q+2)\}}\), \(\overline{K}_2^{\{4(q+2)\}}\) and \({\overline{C}}_{H2.21}^{\{4(q+2)\}}\), given by (7.15), (7.41), and (7.31) respectively, together with (6.10) in the statement of Lemma 6.5, (2.9), and Assumption 2.2, to bound the expectation of \(\overline{C}_M^{\{4(q+2)\}}\) as follows,

$$\begin{aligned} C_M(R)\,:=\,\,&{\mathbb {E}}\Bigg [ {\overline{C}}_M^{\{4(q+2)\}}(R)\Bigg ]\nonumber \\ \le&\,m^4 C^2_{Df} C^2_{g_i}+ 2m C_{Df} C_{g_i} \Big ( c_3 C^{}_{\texttt {PR}}\left( 2\right) ^{1/4} \big (1+3^{q_1+1}C_{\texttt {X}} \big )+C_{G}(R)^{1/2} \Big )\nonumber \\&\quad +\frac{1}{2}(\varvec{\gamma }_{4}+1)m+\frac{1}{2}m(m-1)\big (1+(C^{}_{\texttt {LA}}\left( 2\right) )^2\big )\nonumber \\&\quad + C^{}_{\texttt {SR}}\left( 4,R\right) ^{1/2} C^{}_{\texttt {TE}}\left( 2,R\right) ^{1/2}+mC_{G}(R)+1. \end{aligned}$$
(7.44)

\(\square \)

7.2 Proof of Theorem 4.1 on strong convergence

Proof

First, by (7.1), the conditional second moment of the one-step error decomposes as

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] =\,\,{\mathbb {E}}\Big [\big \Vert E_{\theta }(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]+{\mathbb {E}}\Big [\big \Vert E_{\varphi }(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ], \end{aligned}$$
(7.45)

where, by (3.6) and (7.2), the one-step error of the backstop map satisfies

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert E_{\varphi }(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] \le&\,\, \big \Vert \widetilde{E}(t_n)\big \Vert ^2+C_{B_1}\displaystyle \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\Vert E_{\varphi }(r)\Vert ^2\Big |{\mathcal {F}}_{t_n}\Big ]dr\nonumber \\ {}&+C_{B_2}\,h_{n+1}^3, \quad a.s. \end{aligned}$$
(7.46)

Therefore, substituting (7.13) and (7.46) into (7.45) and recalling (7.1), we have for any \(h_{n+1}\) that satisfies Assumption 3.1,

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]\le & {} \big \Vert \widetilde{E}(t_{n})\big \Vert ^2 + \Gamma _1(R) \int _{t_n}^{t_{n+1}} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]dr\nonumber \\{} & {} +{\overline{\Gamma }}_2^{\{4(q+2)\}}\big (R \big ) h_{n+1} ^3,\quad a.s. \end{aligned}$$
(7.47)

where we define \(\Gamma _1\), \({\overline{\Gamma }}_2\) and, using (7.44), its expectation \(\Gamma _2\) as

$$\begin{aligned} \Gamma _1(R)&:=C_E(R)+C_{B_1};\nonumber \\ {\overline{\Gamma }}_2^{\{4(q+2)\}}\big (R\big )&:=\overline{C}_M^{\{4(q+2)\}}(R)+ C_{B_2};\nonumber \\ \Gamma _2(R)&:={\mathbb {E}}\bigg [{\overline{\Gamma }}_2^{\{4(q+2)\}}\big (R \big ) \bigg ] \le C_M(R) + C_{B_2}. \end{aligned}$$
(7.48)

For a fixed \(t>0\), let \(N^{(t)}\) be as in Definition 3.2. We multiply both sides of (7.47) by the indicator function \({\textbf{1}}_{\{N^{(t)}> n+1\}}\) and sum over the steps, excluding the last step \(N^{(t)}\), to obtain

$$\begin{aligned}{} & {} \sum _{n=0}^{N^{(t)}-2}{\mathbb {E}}\Big [\big \Vert \widetilde{E}(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ] {{\textbf {1}}}_{\{N^{(t)}> n+1\}} \le \sum _{n=0}^{N^{(t)}-2}\big \Vert \widetilde{E}(t_{n})\big \Vert ^2{{\textbf {1}}}_{\{N^{(t)}> n+1\}} \nonumber \\{} & {} \qquad +\, \Gamma _1(R) \sum _{n=0}^{N^{(t)}-2}\int _{t_n}^{t_{n+1}}{\mathbb {E}}\Big [\big \Vert \widetilde{E}(r)\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]{{\textbf {1}}}_{\{N^{(t)}> n+1\}}dr\nonumber \\{} & {} \qquad + {\overline{\Gamma }}_2^{\{4(q+2)\}}(R)\sum _{n=0}^{N^{(t)}-2} h_{n+1}^3{{\textbf {1}}}_{\{N^{(t)}> n+1\}}. \end{aligned}$$
(7.49)

Since \(t\in \big [t_{N^{(t)}-1},t_{N^{(t)}}\big ]\), we use (7.47) to express the last step, noting that it holds when \(t_n,t_{n+1}\) are replaced by \(t_{N^{(t)}-1}\) and t respectively:

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t )\big \Vert ^2 \Big |{\mathcal {F}}_{t_{N^{(t)}-1}}\Big ] \le \,&\big \Vert \widetilde{E}(t_{N^{(t)}-1})\big \Vert ^2+\Gamma _1(R)\int _{t_{N^{(t)}-1}}^{t} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2 \Big |{\mathcal {F}}_{t_{N^{(t)}-1}}\Big ]dr\nonumber \\&\qquad \quad +{\overline{\Gamma }}_2^{\{4(q+2)\}}(R) \big |t-t_{N^{(t)}-1}\big |^3. \end{aligned}$$
(7.50)

Adding both sides of (7.49) and (7.50), and taking expectations:

$$\begin{aligned}&\left. \begin{array}{l} \qquad {\mathbb {E}}\Bigg [\displaystyle \sum _{n=0}^{N^{(t)}-2}\Big ({\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t_{n+1})\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]-\big \Vert {{\widetilde{E}}}(t_{n})\big \Vert ^2\Big ){\textbf{1}}_{\{N^{(t)}> n+1\}} \\ \,\,\quad \quad \qquad \quad \qquad \qquad +{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t)\big \Vert ^2 \Big |{\mathcal {F}}_{t_{N^{(t)}-1}}\Big ]-\big \Vert {{\widetilde{E}}}(t_{N^{(t)}-1})\big \Vert ^2 \Bigg ] \\ \end{array}\right\} =:\text {LHS}\nonumber \\&\left. \begin{array}{l} \le \,\,\Gamma _1(R){\mathbb {E}}\Bigg [ \displaystyle \sum _{n=0}^{N^{(t)}-2}\int _{t_n}^{t_{n+1}}{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2 \Big |{\mathcal {F}}_{t_n}\Big ]{\textbf{1}}_{\{N^{(t)}> n+1\}}dr\\ \qquad \qquad \qquad \qquad \qquad \qquad +\displaystyle \int _{t_{N^{(t)}-1}}^{t} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2 \Big |{\mathcal {F}}_{t_{N^{(t)}-1}}\Big ]dr\Bigg ] \\ \end{array}\right\} =: \text {R}_1 \nonumber \\&\left. \begin{array}{l} \quad +{\mathbb {E}}\Bigg [{\overline{\Gamma }}_2^{\{4(q+2)\}}(R)\Bigg (\displaystyle \sum _{n=0}^{N^{(t)}-2} h_{n+1}^3{\textbf{1}}_{\{N^{(t)}> n+1\}}+\big |t-t_{N^{(t)}-1}\big |^3\Bigg )\Bigg ] \end{array}\right\} =: \text {R}_2 \end{aligned}$$
(7.51)

where we analyse (7.51) (\(\text {LHS}\le \text {R}_1+\text {R}_2\)) below. For the LHS in (7.51), \(N^{(t)}\) is a random integer taking values from \(N^{(t)}_{\min }\) to \(N^{(t)}_{\max }\), and \({\textbf{1}}_{\{N^{(t)}> n+1\}}\) is an \({\mathcal {F}}_{t_{n}}\)-measurable random variable. Therefore it is useful to decompose the range of n into three parts on each trajectory. First, when \(n<N^{(t)}-1\), then \(1_{\{N^{(t)}>n+1\}}=1_{\{N^{(t)}>n\}}=1\). Second, when \(n=N^{(t)}-1\), then \(1_{\{N^{(t)}>n+1\}}=0\) and \(1_{\{N^{(t)}>n\}}=1\). Finally, when \(n>N^{(t)}-1\), then \(1_{\{N^{(t)}>n+1\}}=1_{\{N^{(t)}>n\}}=0\). Hence we obtain a telescoping sum with the appropriate cancellation, which terminates at \({\mathbb {E}}\big [\Vert {\widetilde{E}}(t_{N^{(t)}-1})\Vert ^2\,1_{\{N^{(t)}>N^{(t)}-1\}}\big ]={\mathbb {E}}\big [\Vert {\widetilde{E}}(t_{N^{(t)}-1})\Vert ^2\big ]\). Applying this with the tower property of conditional expectation, and using the fact that \(\Vert {\widetilde{E}}(t_0)\Vert ^2=0\), we have

$$\begin{aligned} \text {LHS}=&\sum _{n=0}^{N_{\max }^{(t)}-2}{\mathbb {E}}\Big [\big \Vert \widetilde{E}(t_{n+1})\big \Vert ^2 {\textbf{1}}_{\{N^{(t)}> n+1\}} -\big \Vert \widetilde{E}(t_{n})\big \Vert ^2 {\textbf{1}}_{\{N^{(t)}> n+1\}}\Big ]\nonumber \\&\qquad \qquad \qquad \qquad \qquad +{\mathbb {E}}\Big [{\mathbb {E}}\Big [\big \Vert \widetilde{E}(t )\big \Vert ^2 \Big |{\mathcal {F}}_{t_{N^{(t)}-1}}\Big ]-\big \Vert \widetilde{E}(t_{N^{(t)}-1})\big \Vert ^2\Big ] \nonumber \\ =\,\,&{\mathbb {E}}\Big [\big \Vert \widetilde{E}(t_{N^{(t)}-1})\big \Vert ^2\Big ]-{\mathbb {E}}\Big [\big \Vert \widetilde{E}(t_{0})\big \Vert ^2\Big ]+ {\mathbb {E}}\big [\big \Vert {{\widetilde{E}}}(t )\big \Vert ^2 \big ]-{\mathbb {E}}\Big [\big \Vert \widetilde{E}(t_{N^{(t)}-1})\big \Vert ^2\Big ]\nonumber \\ =\,\,&{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t )\big \Vert ^2 \Big ]. \end{aligned}$$
(7.52)

Consider the term \(\text {R}_1\) on the RHS of (7.51). By Definition 3.2 we have \(n=N^{(r)}-1\) for each \(r\in [t_n,t_{n+1}]\). So we may rewrite \({\mathcal {F}}_{t_n}\) as \({\mathcal {F}}_{t_{N^{(r)}-1}}\), and the indicator function as \({\textbf{1}}_{\{N^{(t)}>N^{(r)}\}}\). Summing over all the steps results in a single integral from 0 to \(t_{N^{(t)}-1}\), so that

$$\begin{aligned} \text{ R}_1&=\,\,\Gamma _1(R){\mathbb {E}}\Bigg [ \int _{0}^{t_{N^{(t)}-1}}{\mathbb {E}}\Big [\big \Vert \widetilde{E}(r)\big \Vert ^2{{\textbf {1}}}_{\{N^{(t)}>N^{(r)}\}} \Big |{\mathcal {F}}_{t_{N^{(r)}-1}}\Big ]dr\nonumber \\ {}&\quad \,\, +\int _{t_{N^{(t)}-1}}^{t} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2 \Big |{\mathcal {F}}_{t_{N^{(t)}-1}}\Big ]dr\Bigg ]\nonumber \\ {}&\le \,\,\Gamma _1(R)\int _{0}^{t} {\mathbb {E}}\Big [\big \Vert \widetilde{E}(r)\big \Vert ^2 \Big ]dr. \end{aligned}$$
(7.53)

For \(\text {R}_2\) in (7.51), by (7.48), Definition 3.2 and \(\rho h_{\min }=h_{\max }\) we have

$$\begin{aligned} \text {R}_2\le \Gamma _2(R) N^{(t)}_{\max }h_{\max }^3 \le \Gamma _2(R)\left( \rho t+1\right) h_{\max }^2. \end{aligned}$$
(7.54)
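
For clarity we record the elementary step count behind (7.54): since every step of the scheme is at least the minimum step \(h_{\min }=h_{\max }/\rho \), the number of steps needed to reach t satisfies \(N^{(t)}_{\max }\le t/h_{\min }+1=\rho t/h_{\max }+1\), so that (taking, say, \(h_{\max }\le 1\))

$$\begin{aligned} N^{(t)}_{\max }\,h_{\max }^3\,\le \,\Big (\frac{\rho t}{h_{\max }}+1\Big )h_{\max }^3\,=\,\rho t\, h_{\max }^2+h_{\max }^3\,\le \,\left( \rho t+1\right) h_{\max }^2. \end{aligned}$$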

We see that \(4(q+2)\) is the minimum number of finite SDE moments required for a finite \(\text {R}_2\), and this is guaranteed by Assumption 2.2. Combining (7.52), (7.53) and (7.54) back into (7.51), for all \(t\in [0,T]\) we have

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t)\big \Vert ^2 \Big ]\,\,\le \,\, \Gamma _1(R)\int _{0}^{t} {\mathbb {E}}\Big [\big \Vert \widetilde{E}(r)\big \Vert ^2 \Big ]dr +\Gamma _2(R)\left( \rho t+1\right) h_{\max }^2. \end{aligned}$$

By Gronwall’s inequality (see [25, Thm. 8.1]), we have for all \(t\in [0,T]\)

$$\begin{aligned} \Big ({\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t)\big \Vert ^2 \Big ]\Big )^{\frac{1}{2}}\le C(R,\rho ,t)\,h_{\max }. \end{aligned}$$
(7.55)
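
Explicitly, since \(\Gamma _1(R)\) does not depend on t and \(\Gamma _2(R)\left( \rho t+1\right) h_{\max }^2\) is nondecreasing in t, the integral form of Gronwall's inequality, applied to the bound preceding (7.55), gives

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t)\big \Vert ^2 \Big ]\,\le \, \Gamma _2(R)\left( \rho t+1\right) h_{\max }^2\,\exp \big (\Gamma _1(R)\,t\big ), \end{aligned}$$

and (7.55) follows by taking square roots and using (7.48).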

Taking the maximum over t on both sides, the proof follows with

$$\begin{aligned} C(R,\rho ,t):=\sqrt{\big ( C_M(R)+ C_{B_2}\big )\left( \rho t+1\right) \exp \Big (t\big (C_E(R)+C_{B_1}\big )\Big )}. \end{aligned}$$

\(\square \)

7.3 Proof of Theorem 4.2 on the probability of using the backstop

Proof

By (3.11) and the Markov inequality we have

$$\begin{aligned} {\mathbb {P}}\big [h_{n+1}= h_{\min }\big ]={\mathbb {P}}\left[ \frac{h_{\max }}{\big \Vert \widetilde{Y}(t_n)\big \Vert ^{1/\kappa }}\le h_{\min }\right] \le \frac{{\mathbb {E}}\Big [\big \Vert \widetilde{Y}(t_n)\big \Vert ^2\Big ]}{\rho ^{2\kappa }}. \end{aligned}$$
(7.56)
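
Here we used that \(\rho h_{\min }=h_{\max }\), so that the event in (7.56) rewrites as \(\big \{\Vert \widetilde{Y}(t_n)\Vert \ge \rho ^{\kappa }\big \}\), and then the Markov inequality applied to \(\Vert \widetilde{Y}(t_n)\Vert ^2\) gives

$$\begin{aligned} {\mathbb {P}}\Big [\big \Vert \widetilde{Y}(t_n)\big \Vert \ge \rho ^{\kappa }\Big ]\,\le \,\frac{{\mathbb {E}}\Big [\big \Vert \widetilde{Y}(t_n)\big \Vert ^2\Big ]}{\rho ^{2\kappa }}. \end{aligned}$$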

By adding and subtracting \(X(t_n)\), together with the tower property of conditional expectation, (2.3), (7.1) and (2.9), we have

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert {{\widetilde{Y}}}(t_n)\big \Vert ^2\Big ] \le \,\,&2{\mathbb {E}}\Big [\big \Vert X(t_n)-{{\widetilde{Y}}}(t_n)\big \Vert ^2\Big ]+2{\mathbb {E}}\big [\Vert X(t_n)\Vert ^2\big ]\nonumber \\ \le \,\,&2{\mathbb {E}}\Big [{\mathbb {E}}\Big [\big \Vert X(t_n)-{{\widetilde{Y}}}(t_n)\big \Vert ^2\Big | {\mathcal {F}}_{t_{n-1}}\Big ]\Big ]+2{\mathbb {E}}\bigg [\sup _{t_n\in [0,T]}\Vert X(t_n)\Vert ^2\bigg ]\nonumber \\ \le \,\,&2{\mathbb {E}}\Big [{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t_n)\big \Vert ^2\Big | {\mathcal {F}}_{t_{n-1}}\Big ]\Big ]+2C_{\texttt {X}}. \end{aligned}$$
(7.57)

Next, we repeatedly substitute (7.47), for decreasing values of n, into the RHS of (7.57) until \(n=0\). Then, using the tower property, Definition 3.2, (3.11) and (7.48), we have

$$\begin{aligned} {\mathbb {E}}\Big [\big \Vert {{\widetilde{Y}}}(t_n)\big \Vert ^2\Big ] \le \,\,&2{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t_{n-1})\big \Vert ^2\Big ]+2\Gamma _1(R){\mathbb {E}}\Bigg [\int _{t_{n-1}}^{t_n}{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2\Big | {\mathcal {F}}_{t_{n-1}}\Big ]dr\Bigg ]\nonumber \\&\qquad +2{\mathbb {E}}\Bigg [{\overline{\Gamma }}_2^{\{4(q+2)\}}(R) h_{n} ^3\Bigg ]+2C_{\texttt {X}} \nonumber \\ \le \,\,&2{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(t_{0})\big \Vert ^2\Big ]+2\Gamma _1(R){\mathbb {E}}\Bigg [\sum _{j=1}^{n}\int _{t_{j-1}}^{t_j}{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2\Big | {\mathcal {F}}_{t_{j-1}}\Big ]dr\Bigg ]\nonumber \\&\qquad +2N^{(T)}_{\max }\Gamma _2(R) h_{\max }^3+2C_{\texttt {X}} \nonumber \\ \le \,\,&2\Gamma _1(R){\mathbb {E}}\Bigg [\int _{0}^{t_n}{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2\Big | {\mathcal {F}}_{t_{N^{(r)}-1}}\Big ]dr\Bigg ]\nonumber \\&\qquad +2\left( \rho \,T+1 \right) \Gamma _2(R) h_{\max }^3+2C_{\texttt {X}}. \end{aligned}$$
(7.58)

Since the integrand \({\mathbb {E}}\Big [\big \Vert \widetilde{E}(r)\big \Vert ^2\Big | {\mathcal {F}}_{t_{N^{(r)}-1}}\Big ]\) in the second term on the RHS of (7.58) is almost surely non-negative for all \(r\in [0,T]\), we can replace the upper limit of integration with T. Using \({{\widetilde{E}}}(t_{0})=0\), (3.11), the tower property of conditional expectation, and (7.55) from Theorem 4.1, we have

$$\begin{aligned}&{\mathbb {E}}\Bigg [\int _{0}^{t_n}{\mathbb {E}}\Big [\big \Vert \widetilde{E}(r)\big \Vert ^2\Big | {\mathcal {F}}_{t_{N^{(r)}-1}}\Big ]dr\Bigg ] \nonumber \\ \le&\int _{0}^{T}{\mathbb {E}}\Big [\big \Vert \widetilde{E}(r)\big \Vert ^2\Big ]dr \le T\max _{r\in [0,T]}{\mathbb {E}}\Big [\big \Vert {{\widetilde{E}}}(r)\big \Vert ^2\Big ] \le T\,C^2(R,\rho ,T)h_{\max }^2. \end{aligned}$$
(7.59)

Choosing \(h_{\max }\le 1/C(R,\rho ,T)\), we substitute (7.59) into (7.58) and then into (7.56) to obtain

$$\begin{aligned} {\mathbb {P}}\big [h_{n+1}\,=\,\, h_{\min }\big ] \,\le \,\, \frac{2\Big (\Gamma _1(R)+\left( T+1 \right) \Gamma _2(R) h_{\max }^2+C_{\texttt {X}} \Big )}{\rho ^{2\kappa -1}}=:\frac{C_{\text {prob}}}{\rho ^{2\kappa -1}}, \end{aligned}$$
(7.60)

and the rest of the proof follows. \(\square \)
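
To illustrate the event whose probability is bounded in (7.60), the following minimal Python sketch (ours, for illustration only, and not the scheme as specified in Section 3) implements a step-size rule of the form appearing in (3.11) and (7.56), namely \(h_{\max }/\Vert \widetilde{Y}(t_n)\Vert ^{1/\kappa }\) clamped to \([h_{\min },h_{\max }]\) with \(\rho =h_{\max }/h_{\min }\); the function and variable names are ours, and only the event \(\{h_{n+1}=h_{\min }\}\) that triggers the backstop is modelled, not the backstop map itself.

import numpy as np

def propose_step(y, h_max, rho, kappa):
    # Path-dependent proposal h_max / ||y||^(1/kappa), clamped to [h_min, h_max],
    # where h_min = h_max / rho is the minimum admissible step.
    h_min = h_max / rho
    norm_y = np.linalg.norm(y)
    h_raw = np.inf if norm_y == 0.0 else h_max / norm_y ** (1.0 / kappa)
    hit_min = h_raw <= h_min  # the event {h_{n+1} = h_min}, cf. (7.56)
    return min(h_max, max(h_min, h_raw)), hit_min

By (7.60), the frequency with which hit_min is True decays at least like \(\rho ^{-(2\kappa -1)}\) once \(h_{\max }\le 1/C(R,\rho ,T)\), which quantifies the sense in which the backstop is rarely invoked.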