In this section we first state the assumptions on the coefficients of SDE (1), under which the result of this paper is proven, then we study the occupation time of an Itô process close to a hypersurface, and finally we recall the transformation from [15], which is also essential for our proof.
Definitions and assumptions
We want to prove strong convergence of the Euler–Maruyama method for SDEs with discontinuous drift coefficient. Instead of the usual requirement of Lipschitz continuity we only assume that the drift is a piecewise Lipschitz function on \({\mathbb {R}}^d\).
Definition 2.1
([15, Definitions 3.1 and 3.2]) Let \(A\subseteq {\mathbb {R}}^d\).
1. For a continuous curve \(\gamma :[0,1]\longrightarrow {\mathbb {R}}^d\), let \(\ell (\gamma )\) denote its length,
$$\begin{aligned} \ell (\gamma )=\sup _{n,\,0\le t_0<t_1<\cdots <t_n\le 1}\sum _{k=1}^n \Vert \gamma (t_k)-\gamma (t_{k-1})\Vert \,. \end{aligned}$$
The intrinsic metric \(\rho \) on A is given by
$$\begin{aligned} \rho (x,y):=\inf \{\ell (\gamma ):\gamma :[0,1]\longrightarrow A \text { is a continuous curve satisfying } \gamma (0)=x,\ \gamma (1)=y\}\,, \end{aligned}$$
where \(\rho (x,y):=\infty \) if there is no continuous curve in A from x to y.
2. Let \(f:A\longrightarrow {\mathbb {R}}^m\) be a function. We say that f is intrinsic Lipschitz if it is Lipschitz w.r.t. the intrinsic metric on A, i.e. if there exists a constant L such that
$$\begin{aligned} \forall x,y\in A: \Vert f(x)-f(y)\Vert \le L \rho (x,y)\,. \end{aligned}$$
The prototypical examples of intrinsic Lipschitz functions are, as in the one-dimensional case, differentiable functions with bounded derivative.
Lemma 2.2
([15, Lemma 3.8]) Let \(A\subseteq {\mathbb {R}}^d\) be open and let \(f:A\longrightarrow {\mathbb {R}}^m\) be a differentiable function with \(\Vert f'\Vert <\infty \). Then f is intrinsic Lipschitz with Lipschitz constant \(\Vert f'\Vert \).
Definition 2.3
([15, Definition 3.4]) A function \(f{:}\,{\mathbb {R}}^d\longrightarrow {\mathbb {R}}^m\) is piecewise Lipschitz if there exists a hypersurface \(\Theta \) with finitely many connected components and with the property that the restriction \(f|_{{\mathbb {R}}^d\backslash \Theta }\) is intrinsic Lipschitz. We call \(\Theta \) an exceptional set for f, and we call
$$\begin{aligned} \sup _{x,y\in {\mathbb {R}}^d\backslash \Theta }\frac{\Vert f(x)-f(y)\Vert }{\rho (x,y)} \end{aligned}$$
the piecewise Lipschitz constant of f.
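As a simple illustration, which is not taken from [15], let \(\Theta =\{x\in {\mathbb {R}}^d:x_1=0\}\) and \(f(x)=1\) for \(x_1\ge 0\), \(f(x)=-1\) for \(x_1<0\). For \(x,y\in {\mathbb {R}}^d\backslash \Theta \) in the same half-space we have \(f(x)=f(y)\), while for x, y in different half-spaces every continuous curve from x to y meets \(\Theta \), so that \(\rho (x,y)=\infty \). Hence
$$\begin{aligned} \sup _{x,y\in {\mathbb {R}}^d\backslash \Theta }\frac{\Vert f(x)-f(y)\Vert }{\rho (x,y)}=0\,, \end{aligned}$$
i.e. the discontinuous function f is piecewise Lipschitz with exceptional set \(\Theta \) and piecewise Lipschitz constant 0.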
In this paper \(\Theta \) will be a fixed \(C^3\)-hypersurface, and we will only consider piecewise Lipschitz functions with exceptional set \(\Theta \). In the following, \(L_f\) denotes the piecewise Lipschitz constant of a function f, if f is piecewise Lipschitz, and it denotes the Lipschitz constant, if f is Lipschitz.
We define the distance \(d(x,\Theta )\) between a point x and the hypersurface \(\Theta \) by \(d(x,\Theta ):=\inf \{\Vert x-y\Vert :y \in \Theta \}\), and for every \(\varepsilon >0\) we define \(\Theta ^\varepsilon :=\{x\in {\mathbb {R}}^d: d(x,\Theta )<\varepsilon \}\).
Recall that, since \(\Theta \in C^3\), for every \(\xi \in \Theta \) there exists an open environment \(U\subseteq \Theta \) of \(\xi \) and a continuously differentiable function \(n{:}\,U\longrightarrow {\mathbb {R}}^d\) such that for every \(\zeta \in U\) the vector \(n(\zeta )\) has length 1 and is orthogonal to the tangent space of \(\Theta \) in \(\zeta \). On a given connected open subset of \(\Theta \) the local unit normal vector n is unique up to a factor \(\pm 1\).
We recall a definition from differential geometry.
Definition 2.4
Let \(\Theta \subseteq {\mathbb {R}}^d\) be any set.
1. An environment \(\Theta ^\varepsilon \) is said to have the unique closest point property if for every \(x\in {\mathbb {R}}^d\) with \(d(x,\Theta )<\varepsilon \) there is a unique \(p\in \Theta \) with \(d(x,\Theta )=\Vert x-p\Vert \). In that case we can define a mapping \(p{:}\,\Theta ^{\varepsilon }\longrightarrow \Theta \) assigning to each x the point \(p(x)\) in \(\Theta \) closest to x.
2. \(\Theta \) is said to be of positive reach if there exists \(\varepsilon >0\) such that \(\Theta ^\varepsilon \) has the unique closest point property. The reach of \(\Theta \) is the supremum over all such \(\varepsilon \) if such an \(\varepsilon \) exists, and 0 otherwise; two elementary examples are given after this definition.
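For illustration, and not taken from [15]: the sphere \(\Theta =\{x\in {\mathbb {R}}^d:\Vert x\Vert =r\}\) has reach r, since every \(x\ne 0\) has the unique closest point \(r x/\Vert x\Vert \) in \(\Theta \), whereas the centre 0, which lies in \(\Theta ^\varepsilon \) for every \(\varepsilon >r\), is equidistant from all points of \(\Theta \). In contrast, the set \(\Theta =\{(t,|t|):t\in {\mathbb {R}}\}\subseteq {\mathbb {R}}^2\) has reach 0: for every \(y>0\) the point \((0,y)\) has the two closest points \((y/2,y/2)\) and \((-y/2,y/2)\) in \(\Theta \), each at distance \(y/\sqrt{2}\).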
Now, we give assumptions which are sufficient for the results in [15] to hold and which we need to prove the main result here.
Assumption 2.1
We assume the following for the coefficients of (1):
1. \(\mu \) and \(\sigma \) are bounded;
2. the diffusion coefficient \(\sigma \) is Lipschitz;
3. the drift coefficient \(\mu \) is a piecewise Lipschitz function \({\mathbb {R}}^d\longrightarrow {\mathbb {R}}^d\). Its exceptional set \(\Theta \) is a \(C^3\)-hypersurface of positive reach;
4. non-parallelity condition: there exists a constant \(c_0>0\) such that \(\Vert \sigma (\xi )^\top n(\xi )\Vert \ge c_0\) for all \(\xi \in \Theta \);
5. the function \(\alpha {:} \Theta \longrightarrow {\mathbb {R}}^d\) defined by
$$\begin{aligned} \alpha (\xi ):=\lim _{h\rightarrow 0}\frac{\mu (\xi -h n(\xi ))-\mu (\xi +hn(\xi ))}{2 \Vert \sigma (\xi )^\top n(\xi )\Vert ^2} \end{aligned}$$
(3)
is \(C^3\) and all derivatives up to order three are bounded.
Theorem 2.5
([15, Theorem 3.21]) Let Assumption 2.1 hold. Then SDE (1) has a unique strong solution.
Remark on Assumption 2.1:
1. For existence and uniqueness of a solution to (1), [15, Theorem 3.21] requires, instead of Assumption 2.1.1, only boundedness in an \(\varepsilon \)-environment of \(\Theta \). However, for the proof of our convergence result we require global boundedness. Note that other results in the literature on numerical methods for SDEs with discontinuous drift also rely on boundedness of the coefficients, cf. [19,20,21].
2. Assumption 2.1.2 is a technical condition; the focus in this paper is on other types of irregularities in the coefficients. There are results in the literature where the authors deal with a non-globally Lipschitz diffusion coefficient, see, e.g., [5], but in contributions where only Hölder continuity is required for \(\sigma \), usually uniform non-degeneracy is assumed.
3. Assumption 2.1.3 is a geometric condition which we require in order to locally flatten \(\Theta \), i.e. to map \(\Theta \) to a hyperplane in a regular way. This is crucial in many places in [15] and here, in particular for the proof of Theorem 2.7 below. In addition, Assumption 2.1.3 implies that there exists a constant \(c_1\) such that \(\Vert n'(\xi )\Vert \le c_1\) for every \(\xi \in \Theta \) and every local unit normal vector n on \(\Theta \), see [15, Lemma 3.10].
4. Assumption 2.1.4 means that the diffusion coefficient must have a component orthogonal to \(\Theta \) in all \(\xi \in \Theta \). This condition is significantly weaker than uniform non-degeneracy, and it is essential: in [16] we give a counterexample for the case where the non-parallelity condition does not hold. Then, even existence of a solution is not guaranteed.
5. Assumption 2.1.5 is a technical condition, which is required for our transformation method to work. Boundedness of \(\alpha \) and \(\alpha '\) is needed for proving the local invertibility of our transform. Existence and boundedness of \(\alpha ''\) and \(\alpha '''\) is required for the multidimensional version of Itô’s formula to hold for the transform, see [15]. Moreover, it has been shown in [15, Proposition 3.13] that \(\alpha \) is a well-defined function on \(\Theta \), i.e. it does not depend on the choice of the normal vector n and, in particular, on its sign. A one-dimensional illustration of (3) follows this remark.
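For orientation we add an illustration of (3), which is ours and not taken from [15]. Let \(\Theta =\{x\in {\mathbb {R}}^d:x_1=0\}\) with the choice \(n\equiv e_1\), and let \(\mu ={\mathbf {1}}_{\{x_1<0\}}\mu _1+{\mathbf {1}}_{\{x_1>0\}}\mu _2\) with continuous \(\mu _1,\mu _2\). For \(\xi \in \Theta \) and \(h>0\) the numerator in (3) equals \(\mu _1(\xi -he_1)-\mu _2(\xi +he_1)\), so that
$$\begin{aligned} \alpha (\xi )=\frac{\mu _1(\xi )-\mu _2(\xi )}{2\Vert \sigma (\xi )^\top e_1\Vert ^2}\,, \end{aligned}$$
i.e. \(\alpha \) measures the jump of the drift across \(\Theta \), scaled by the squared normal component of the diffusion coefficient.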
Example 2.6
Suppose \(\Theta \) is the finite and disjoint union of orientable compact \(C^3\)-manifolds. Then \(\Theta \) is of positive reach by the lemma in [3], and each connected component of \(\Theta \) separates \({\mathbb {R}}^d\) into two open connected components by the Jordan–Brouwer separation theorem, see [17].
Thus \({\mathbb {R}}^d\backslash \Theta \) is the union of finitely many disjoint open connected subsets of \({\mathbb {R}}^d\); we can write \({\mathbb {R}}^d\backslash \Theta =A_1\cup \cdots \cup A_n\).
Suppose there exist bounded and Lipschitz \(C^3\)-functions \(\mu _1,\ldots ,\mu _n:{\mathbb {R}}^d\longrightarrow {\mathbb {R}}^d\) such that \(\mu =\sum _{k=1}^n {\mathbf {1}}_{A_k}\mu _k\), and suppose that \(\sigma :{\mathbb {R}}^d\longrightarrow {\mathbb {R}}^{d\times d}\) is bounded, Lipschitz, and \(C^3\) with \(\sigma (\xi )^\top n(\xi )\ne 0\) for every \(\xi \in \Theta \).
Then it is readily checked that \(\mu \) and \(\sigma \) satisfy Assumption 2.1; in particular, since \(\Theta \) is compact and \(\xi \mapsto \Vert \sigma (\xi )^\top n(\xi )\Vert \) is continuous and positive, the non-parallelity condition holds with \(c_0=\min _{\xi \in \Theta }\Vert \sigma (\xi )^\top n(\xi )\Vert >0\).
In Sect. 4 we present a number of concrete examples which satisfy Assumption 2.1 and we perform numerical tests on the associated SDEs.
Occupation time close to a hypersurface
In this section we study the occupation time of an Itô process close to a \(C^3\)-hypersurface. In the proof of our main theorem, the Euler–Maruyama approximation \(X^\delta \) in equation (2) will play the role of that Itô process.
Theorem 2.7
Let \(\Theta \) be a \(C^3\)-hypersurface of positive reach and let \({\varepsilon _0}>0\) be such that the closure of \(\Theta ^{\varepsilon _0}\) has the unique closest point property. Let further \(X=(X_t)_{t\ge 0}\) be an \({\mathbb {R}}^d\)-valued Itô process
$$\begin{aligned} X_t=X_0+\int _0^t A_s ds+\int _0^t B_s dW_s\,, \end{aligned}$$
with progressively measurable processes \(A=(A_t)_{t\ge 0}\), \(B=(B_t)_{t\ge 0}\), where \(A\) is \({\mathbb {R}}^d\)-valued and \(B\) is \({\mathbb {R}}^{d\times d}\)-valued. Let the coefficients \(A,B\) be such that
1. there exists a constant \(c_{AB}\) such that for almost all \(\omega \in \Omega \) it holds that
$$\begin{aligned} \forall t\in [0,T]: X_t(\omega )\in \Theta ^{\varepsilon _0} \Longrightarrow \max (\Vert A_t(\omega )\Vert ,\Vert B_t(\omega )\Vert )\le c_{AB}\,; \end{aligned}$$
2. there exists a constant \(c_0>0\) such that for almost all \(\omega \in \Omega \) it holds that
$$\begin{aligned} \forall t\in [0,T]: X_t(\omega )\in \Theta ^{\varepsilon _0} \Longrightarrow n (p(X_t(\omega )) )^\top B_t(\omega )B_t(\omega )^\top n (p(X_t(\omega ) ))\ge c_0\,. \end{aligned}$$
Then there exists a constant C such that for all \(0<\varepsilon <\varepsilon _0/2\),
$$\begin{aligned} \int _0^{T} {\mathbb P}\left( \{X_s \in \Theta ^\varepsilon \} \right) ds \le C\varepsilon \,. \end{aligned}$$
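As a simple sanity check, which is not needed in the sequel, let \(\Theta =\{x\in {\mathbb {R}}^d:x_1=0\}\) and let X be a standard d-dimensional Brownian motion, i.e. \(A\equiv 0\) and \(B\equiv I_d\); then the assumptions of the theorem hold with \(n\equiv e_1\) and \(c_0=1\). Since \(X_s\in \Theta ^\varepsilon \) if and only if \(|X^{(1)}_s|<\varepsilon \), and the density of \(X^{(1)}_s\) is bounded by \((2\pi s)^{-1/2}\) for \(s>0\), we get
$$\begin{aligned} \int _0^{T} {\mathbb P}\left( \{X_s \in \Theta ^\varepsilon \} \right) ds \le \int _0^T \frac{2\varepsilon }{\sqrt{2\pi s}}\, ds =2\sqrt{\frac{2T}{\pi }}\,\varepsilon \,, \end{aligned}$$
which is of the asserted form \(C\varepsilon \).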
For the proof we will construct a one-dimensional Itô process Y with the property that Y is close to 0, if and only if X is close to \(\Theta \). For the construction of Y we decompose the path of X into pieces close to \(\Theta \) and pieces farther away. These pieces are then mapped to \({\mathbb {R}}\) by using a signed distance of X from \(\Theta \) and pasted together in a continuous way.
A signed distance to \(\Theta \) is locally given by \(D(x):=n(p(x))^\top (x-p(x))\), where n is a local unit normal vector.
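For a concrete example, which is not needed for the proofs, take \(\Theta =\{x\in {\mathbb {R}}^d:\Vert x\Vert =r\}\) with \(0<\varepsilon _0<r\). For \(x\in \Theta ^{\varepsilon _0}\) we have \(p(x)=r x/\Vert x\Vert \) and, with the outward unit normal \(n(\xi )=\xi /r\),
$$\begin{aligned} D(x)=\frac{x^\top }{\Vert x\Vert }\left( x-\frac{r x}{\Vert x\Vert }\right) =\Vert x\Vert -r\,,\qquad D'(x)=\frac{x^\top }{\Vert x\Vert }=n(p(x))^\top \,, \end{aligned}$$
in accordance with Lemma 2.8 below.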
Lemma 2.8
For all \(x\in \Theta ^{\varepsilon _0}\) it holds that \(D'(x)=n(p(x))^\top \).
Proof
Fix \(x\in \Theta ^{\varepsilon _0}\backslash \Theta \) and consider the function h defined by \(h(b):=\Vert x-p(x+b)\Vert ^2\). By definition of the projection map p, h has a minimum at \(b=0\), so that \(h'(0)=0\). Hence from \(h'(b)=-2(x-p(x+b))^\top p'(x+b)\), we get \((x-p(x))^\top p'(x)=0\). This implies \(n(p(x))^\top p'(x)=0\), since \(x-p(x)\) is a nonzero scalar multiple of \(n(p(x))\).
Using that \(D(x)=a\Vert x-p(x)\Vert \) for some \(a\in \{-1,1\}\), we compute
$$\begin{aligned} D'(x)&=a\Vert x-p(x)\Vert ^{-1} (x-p(x))^\top ({{\text {id}}_{{\mathbb {R}}^d}}-p'(x)) =a\,n(p(x))^\top ({{\text {id}}_{{\mathbb {R}}^d}}-p'(x))\nonumber \\&=a\big (n(p(x))^\top -n(p(x))^\top p'(x)\big ) =an(p(x))^\top \,. \end{aligned}$$
(4)
For \(\psi \in {\mathbb {R}}\) with \(|\psi |\) small we get
$$\begin{aligned} D(x+\psi n(p(x)))&=n\Big (p\big (x+\psi n(p(x))\big )\Big )^\top \Big (x+\psi n(p(x))-p\big (x+\psi n(p(x))\big )\Big )\\&=n(p(x))^\top (x+\psi n(p(x))-p(x))=D(x)+\psi \,, \end{aligned}$$
so that the directional derivative of D at x in the direction \(n(p(x))\) equals 1. Together with (4) this yields \(a=1\) and hence \(D'(x)=n(p(x))^\top \). This also holds for \(x\in \Theta \) by the continuity of \(D'\). \(\square \)
The following lemma states that for any continuous curve \(\gamma \) in \(\Theta ^{\varepsilon _0}\) there is a continuous path of unit normal vectors, such that to every point of \(\gamma \) we can assign a signed distance in a continuous way.
Lemma 2.9
Let \(\gamma :[a,b]\longrightarrow \Theta ^{\varepsilon _0}\) be a continuous function. Then there exists \(m:[a,b]\longrightarrow {\mathbb {R}}^d\) such that
1. m is continuous;
2. \(\Vert m(t)\Vert =1\) for all \(t\in [a,b]\);
3. m(t) is orthogonal to \(\Theta \) in the point \(p(\gamma (t))\) for all \(t\in [a,b]\).
Proof
For \(\xi \in \Theta \) we denote the tangent space to \(\Theta \) in \(\xi \) by \({{\text {tang}}}_\xi \). Let
$$\begin{aligned} S:=\{a\le s\le b: \exists m:[a,s]\longrightarrow {\mathbb {R}}^d\text { continuous},\;\Vert m(t)\Vert =1,\; m(t)\bot {{\text {tang}}}_{p(\gamma (t))}\;\forall t\in [a,s]\}\,. \end{aligned}$$
The set \(S\) is nonempty and its elements are bounded by b. Let \(s_1:=\sup S\). There exists an open and connected subset \(U\subseteq \Theta \) such that \(p(\gamma (s_1))\in U\), and a unit normal vector \(n_1:U\longrightarrow {\mathbb {R}}^d\).
Since U is open and \(p\circ \gamma \) is continuous, there exists \(\eta >0\) such that \(p(\gamma ([s_1-\eta ,s_1]))\subseteq U\). By the definition of \(s_1\) there exists \(s\in (s_1-\eta ,s_1)\) and \(m:[a,s]\longrightarrow {\mathbb {R}}^d\) continuous, with \(\Vert m(t)\Vert =1\) and \(m(t)\bot {{\text {tang}}}_{p(\gamma (t))}\) for all \( t\in [a,s]\).
Since \(n_1\) is unique up to a factor \(\pm 1\), the mapping \(n_1\circ p \circ \gamma \) either coincides with m or \(-m\) on \((s_1-\eta ,s)\). Without loss of generality we may assume that the former is the case. Thus we can extend m continuously to \([a,s_1]\) by defining \(m(t):= n_1(p(\gamma (t)))\) for all \(t\in (s,s_1]\).
Now, if \(s_1\) were strictly smaller than b, then we could use the same mapping \(n_1\circ p \circ \gamma \) to extend m continuously beyond \(s_1\), contradicting the definition of \(s_1\). \(\square \)
We will need the following estimate on the local time of a one-dimensional Itô process.
Lemma 2.10
Let \(Y=(Y_t)_{t\ge 0}\) be an Itô process with bounded and progressively measurable coefficients \(\hat{A}=(\hat{A}_t)_{t\ge 0},\hat{B}=(\hat{B}_t)_{t\ge 0}\).
Then \(\sup _{y\in {\mathbb {R}}}{\mathbb {E}}(L_T^y(Y)) \le \left( 3 T^2\Vert \hat{A}\Vert _\infty ^2 +\frac{3}{2}T \Vert \hat{B}\Vert _\infty ^2\right) ^{1/2}\).
The claim is a special case of [19, Lemma 3.2]. We give a proof for the convenience of the reader.
Proof
From the Meyer–Tanaka formula [11, Section 3.7, Eq. (7.9)] we have
$$\begin{aligned} 2 L_T^y(Y)&=|Y_T-y|-|Y_0-y|-\int _0^T \left( {\mathbf {1}}_{\{Y_s>y\}}-{\mathbf {1}}_{\{Y_s<y\}}\right) dY_s\\&\le |Y_T-Y_0|+\left| \int _0^T \left( {\mathbf {1}}_{\{Y_s>y\}}-{\mathbf {1}}_{\{Y_s<y\}}\right) dY_s\right| \\&\le \left| \int _0^T \hat{A}_s ds\right| + \left| \int _0^T \hat{B}_s dW_s\right| +\left| \int _0^T \left( {\mathbf {1}}_{\{Y_s>y\}}-{\mathbf {1}}_{\{Y_s<y\}}\right) \hat{A}_sds\right| \\&\quad +\left| \int _0^T \left( {\mathbf {1}}_{\{Y_s>y\}}-{\mathbf {1}}_{\{Y_s<y\}}\right) \hat{B}_s dW_s\right| \\&\le 2\int _0^T |\hat{A}_s |ds+ \left| \int _0^T \hat{B}_s dW_s\right| +\left| \int _0^T \left( {\mathbf {1}}_{\{Y_s>y\}}-{\mathbf {1}}_{\{Y_s<y\}}\right) \hat{B}_s dW_s\right| \,. \end{aligned}$$
Using the inequality \((a+b+c)^2\le 3(a^2+b^2+c^2)\) we get
$$\begin{aligned} 4 L_T^y(Y)^2\le 12 \Vert \hat{A}\Vert _\infty ^2 T^2+3\left| \int _0^T \hat{B}_s dW_s\right| ^2 +3\left| \int _0^T \left( {\mathbf {1}}_{\{Y_s>y\}}-{\mathbf {1}}_{\{Y_s<y\}}\right) \hat{B}_s dW_s\right| ^2\,, \end{aligned}$$
and, using Itô’s \(L^2\)-isometry,
$$\begin{aligned} 4{\mathbb {E}}\left( L_T^y(Y)^2\right) \le 12 \Vert \hat{A}\Vert _\infty ^2 T^2+6\,{\mathbb {E}}\left( \int _0^T \hat{B}_s^2 ds\right) \le 12 \Vert \hat{A}\Vert _\infty ^2 T^2+6T \Vert \hat{B}\Vert _\infty ^2 \,. \end{aligned}$$
The claim now follows by applying the Cauchy–Schwarz inequality and taking the supremum over all \(y\in {\mathbb {R}}\). \(\square \)
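As a quick sanity check, which is not needed later, take \(Y=W\) a one-dimensional Brownian motion, i.e. \(\hat{A}\equiv 0\) and \(\hat{B}\equiv 1\). Taking expectations in the Meyer–Tanaka formula gives \({\mathbb {E}}(L_T^0(W))=\frac{1}{2}{\mathbb {E}}|W_T|=\sqrt{T/(2\pi )}\approx 0.40\sqrt{T}\), which lies below the bound \(\left( \frac{3}{2}T\right) ^{1/2}\approx 1.22\sqrt{T}\) from Lemma 2.10.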
We are ready to prove the result of this section.
Proof of Theorem 2.7
Let \(\varepsilon _1=\varepsilon _0/2\). Define a mapping \(\lambda : {\mathbb {R}}\longrightarrow {\mathbb {R}}\) by
$$\begin{aligned} \lambda (z)= {\left\{ \begin{array}{ll} z- \frac{2}{3 \varepsilon _1^2}z^3 +\frac{1}{5 \varepsilon _1^4}z^5 &{}\quad |z|\le \varepsilon _1\\ \frac{8 \varepsilon _1}{15} &{}\quad z> \varepsilon _1\\ -\frac{8 \varepsilon _1}{15} &{}\quad z< -\varepsilon _1\,. \end{array}\right. } \end{aligned}$$
Note that \(\lambda '(0)=1\) and \(\lambda '(\pm \varepsilon _1)=\lambda ''(\pm \varepsilon _1)=0\), so that \(\lambda \in C^2\).
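For completeness, a direct computation from the definition of \(\lambda \) gives, for \(|z|\le \varepsilon _1\),
$$\begin{aligned} \lambda '(z)=1-\frac{2}{\varepsilon _1^2}z^2+\frac{1}{\varepsilon _1^4}z^4 =\left( 1-\frac{z^2}{\varepsilon _1^2}\right) ^2\,,\qquad \lambda ''(z)=\frac{4z}{\varepsilon _1^4}\left( z^2-\varepsilon _1^2\right) \,, \end{aligned}$$
and \(\lambda (\pm \varepsilon _1)=\pm \frac{8\varepsilon _1}{15}\). In particular \(\lambda \) is non-decreasing and \(\lambda '(z)\ge \left( 1-\frac{1}{4}\right) ^2=\left( \frac{3}{4}\right) ^2\) for \(|z|\le \varepsilon _1/2\); this estimate is used towards the end of the proof.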
Next we decompose the path of X: let \(\tau _{0}:=\inf \{t\ge 0: X_t\in \Theta ^{\varepsilon _1}\}\). In particular we have \(\tau _0=0\), if \(X_0\in \Theta ^{\varepsilon _1}\). For \(k\in {\mathbb {N}}_0\), define
$$\begin{aligned} \kappa _{k+1}&:=\inf \{t\ge \tau _{k}: X_t\notin \Theta ^{2\varepsilon _1}\}\wedge T\,,\\ \tau _{k+1}&:=\inf \{t\ge \kappa _{k+1}: X_t\in \Theta ^{\varepsilon _1}\}\wedge T\,. \end{aligned}$$
By Lemma 2.9 there exist continuous \(m_k:[\tau _k,\kappa _{k+1}]\longrightarrow {\mathbb {R}}^d\), with \(\Vert m_k(t)\Vert =1\) and \(m_k(t)\bot {{\text {tang}}}_{p(X_t)}\) for all \(t\in [\tau _k,\kappa _{k+1}]\). Without loss of generality \(m_0\) can be chosen such that \(m_0(\tau _0)^\top (X_{\tau _0}-p(X_{\tau _0}) )\ge 0\). We construct a one-dimensional process Y as follows:
$$\begin{aligned} Y_{t}&= {\left\{ \begin{array}{ll} \lambda (m_0(\tau _0)^\top (X_{\tau _0}-p(X_{\tau _0}))) &{}\quad t\le \tau _0 \\ \lambda (m_k(t)^\top (X_{t}-p(X_{t}))) &{}\quad t\in [\tau _k,\kappa _{k+1}] \\ \lambda (m_k(\kappa _{k+1})^\top (X_{\kappa _{k+1}}-p(X_{\kappa _{k+1}}))) &{}\quad t\in [\kappa _{k+1},\tau _{k+1}]\,, \end{array}\right. } \end{aligned}$$
where without loss of generality the \(m_k\) are chosen such that
$$\begin{aligned} \lambda (m_{k+1}(\tau _{k+1})^\top (X_{\tau _{k+1}}-p(X_{\tau _{k+1}}))) =\lambda (m_k(\kappa _{k+1})^\top (X_{\kappa _{k+1}}-p(X_{\kappa _{k+1}})))\,. \end{aligned}$$
(5)
Note that by construction both sides of (5) can only take the values \(\pm \lambda (\varepsilon _1)\).
We have thus constructed a continuous \([\lambda (-\varepsilon _1),\lambda (\varepsilon _1)]\)-valued process Y with the property that the occupation time of Y in an environment of 0 is the same as the occupation time of X in an environment of \(\Theta \): for all \(0<\varepsilon <\varepsilon _1\) we have \(Y_t\in (-\lambda (\varepsilon ),\lambda (\varepsilon ))\) if and only if \(X_t\in \Theta ^\varepsilon \).
To show that Y is an Itô process, we want to use Itô’s formula. For this we observe that, depending on the proximity of X to \(\Theta \), Y is either constant or locally of the form \(Y_t=\lambda (n(p(X_t))^\top (X_t-p(X_t)))\) for a suitable choice of the unit normal vector. Denote \(D(x)=n(p(x))^\top (x-p(x) )\). The function \(D\) is locally a signed distance to \(\Theta \) and \(D \in C^2\); this can be seen by following the proof of [3, Theorem 1]. Hence, we may apply Itô’s formula to get
$$\begin{aligned} dY_t\,=\,&\lambda '(D(X_t))D'(X_t) A_t dt +\lambda '(D(X_t)) D'(X_t) B_t dW_t +\frac{1}{2}{\text {tr}}\left( B_t^\top \big (\lambda (D(\cdot ))\big )''(X_t)B_t\right) dt\,. \end{aligned}$$
By Lemma 2.8 we have \(D'(x)=n(p(x))^\top \), and hence
$$\begin{aligned} (\lambda (D(x)))''= (\lambda '(D(x))n(p(x))^\top )' =\lambda ''(D(x))n(p(x))n(p(x))^\top +\lambda '(D(x))n'(p(x))\,. \end{aligned}$$
Since \(\lambda '\) and \(\lambda ''\) are bounded by construction, \(\Vert n(p(x))n(p(x))^\top \Vert =1\), \(\Vert n'\Vert \) is bounded (cf. the remark on Assumption 2.1.3), and by Assumption 1 of the theorem, the coefficients of Y are uniformly bounded. Therefore \(dY_t=\hat{A}_t dt+ \hat{B}_t dW_t\), with bounded and progressively measurable \(\hat{A},\hat{B}\).
Let \(0<\varepsilon \le \varepsilon _1/2\); for \(\varepsilon _1/2<\varepsilon <\varepsilon _1\) the asserted bound holds trivially with any \(C\ge 2T/\varepsilon _1\), since the left-hand side is at most T. For all \(|z|\le \varepsilon \), we have \(\lambda '(z)\ge \left( \frac{3}{4}\right) ^2\). Thus by Assumption 2 of the theorem,
$$\begin{aligned} \left( \frac{3}{4}\right) ^4 c_0\int _0^t{\mathbf {1}}_{\left\{ X_s\in \Theta ^\varepsilon \right\} }ds&= \left( \frac{3}{4}\right) ^4 c_0\int _0^t{\mathbf {1}}_{\left\{ Y_s\in (-\lambda (\varepsilon ),\lambda (\varepsilon ))\right\} }ds\\&\le \int _0^t {\mathbf {1}}_{\left\{ Y_s\in (-\lambda (\varepsilon ),\lambda (\varepsilon ))\right\} } \lambda '\left( D(X_s)\right) ^2 n(p(X_s))^\top B_sB_s^\top n(p(X_s))ds\\&= \int _0^t{\mathbf {1}}_{\left\{ Y_s\in (-\lambda (\varepsilon ),\lambda (\varepsilon ))\right\} } d\left[ Y\right] _s\,. \end{aligned}$$
By the occupation time formula [11, Chapter 3, 7.1 Theorem] for one-dimensional continuous semimartingales, we get
$$\begin{aligned} \int _0^T{\mathbb P}\left( \{X_s\in \Theta ^\varepsilon \}\right) ds&\le \left( \frac{4}{3}\right) ^{4}\frac{1}{c_0}\, {\mathbb {E}}\left( \int _0^T{\mathbf {1}}_{\{Y_s\in (-\lambda (\varepsilon ),\lambda (\varepsilon ))\}}d\left[ Y\right] _s\right) \\&=2 \left( \frac{4}{3}\right) ^{4}\frac{1}{c_0}\, {\mathbb {E}}\left( \int _{\mathbb {R}}{\mathbf {1}}_{(-\lambda (\varepsilon ),\lambda (\varepsilon ))}(y)L_T^{y}\left( Y\right) dy\right) \\&\le \frac{4^5}{3^4c_0}\, \sup _{y\in {\mathbb {R}}}{\mathbb {E}}\left( L_T^{y}\left( Y\right) \right) \varepsilon \,, \end{aligned}$$
where in the last step we used that \(\lambda (\varepsilon )\le \varepsilon \). Since \(\hat{A}\) and \(\hat{B}\) are bounded, Lemma 2.10 shows that \(\sup _{y\in {\mathbb {R}}}{\mathbb {E}}(L_T^{y}(Y))\) is finite, which proves the claim. \(\square \)
The transformation
The proof of convergence is based on a transformation that removes the discontinuity from the drift and makes the drift Lipschitz while preserving the Lipschitz property of the diffusion coefficient. A suitable transform is presented in [15]. We recall it here.
Define \(G{:}{\mathbb {R}}^d\longrightarrow {\mathbb {R}}^d\),
$$\begin{aligned} G(x)={\left\{ \begin{array}{ll} x+{\tilde{\phi }}(x) \alpha (p(x))&{}\quad x\in \Theta ^{\varepsilon _0}\\ x &{}\quad x\in {\mathbb {R}}^d\backslash \Theta ^{\varepsilon _0}\,, \end{array}\right. } \end{aligned}$$
where \(\varepsilon _0>0\) is smaller than the reach of \(\Theta \), see Assumption 2.1.3, \(\alpha \) is the function defined in Assumption 2.1.5, and
$$\begin{aligned} {\tilde{\phi }}(x)=n(p(x))^\top (x-p(x)) \Vert x-p(x)\Vert \phi \left( \frac{\Vert x-p(x)\Vert }{c}\right) \,, \end{aligned}$$
with positive constant c and
$$\begin{aligned} \phi (u)= {\left\{ \begin{array}{ll} (1+u)^3(1-u)^3 &{}\quad |u|\le 1\\ 0 &{}\quad |u|> 1. \end{array}\right. } \end{aligned}$$
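A short calculation, included for completeness, confirms that \(\phi \in C^2({\mathbb {R}})\): writing \((1+u)^3(1-u)^3=(1-u^2)^3\), we get
$$\begin{aligned} \phi '(u)=-6u(1-u^2)^2\,,\qquad \phi ''(u)=-6(1-u^2)^2+24u^2(1-u^2)\qquad \text {for } |u|\le 1\,, \end{aligned}$$
so that \(\phi (\pm 1)=\phi '(\pm 1)=\phi ''(\pm 1)=0\), matching the zero extension outside \([-1,1]\).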
Note that \(G''\) is piecewise Lipschitz with exceptional set \(\Theta \).
If c is chosen sufficiently small (see [15, Lemma 3.18]), then G is invertible by [15, Theorem 3.14]. Furthermore, Itô’s formula holds for G and \(G^{-1}\) by [15, Theorem 3.19].
With this we can define a process \(Z=(Z_t)_{t\ge 0}\) by \(Z_t=G(X_t)\), which solves the SDE
$$\begin{aligned} dZ_t =\tilde{\mu }(Z_t) dt+\tilde{\sigma }(Z_t) dW_t\,, \end{aligned}$$
(6)
where
$$\begin{aligned} \tilde{\mu }(z)&=G'(G^{-1}(z))\mu (G^{-1}(z))+\frac{1}{2}{\text {tr}}\left( \sigma (G^{-1}(z))^\top G''(G^{-1}(z))\sigma (G^{-1}(z))\right) \,,\\ \tilde{\sigma }(z)&=G'(G^{-1}(z)) \sigma (G^{-1}(z)) \,. \end{aligned}$$
From [15, Theorem 3.20] we know that \(\tilde{\mu }\) and \(\tilde{\sigma }\) are Lipschitz, and hence the solution to (6) can be approximated with strong order 1/2 using the Euler–Maruyama scheme.
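To fix ideas we close this section with a minimal sketch, in Python, of the basic Euler–Maruyama recursion on an equidistant grid for an SDE with drift \(\mu \) and diffusion \(\sigma \); it is purely illustrative, not part of the analysis, and the function name and signature are our own choice.

import numpy as np

def euler_maruyama(mu, sigma, x0, T, n_steps, rng=None):
    """Euler-Maruyama sketch for dX_t = mu(X_t) dt + sigma(X_t) dW_t.

    mu:    callable mapping R^d -> R^d (drift)
    sigma: callable mapping R^d -> R^{d x d} (diffusion)
    x0:    initial value, array of shape (d,)
    Returns an array of shape (n_steps + 1, d) holding the approximations
    at the grid points k*T/n_steps, k = 0, ..., n_steps.
    """
    rng = np.random.default_rng() if rng is None else rng
    x0 = np.asarray(x0, dtype=float)
    dt = T / n_steps
    x = np.empty((n_steps + 1, x0.shape[0]))
    x[0] = x0
    for k in range(n_steps):
        # Brownian increment over [t_k, t_{k+1}]
        dw = rng.normal(0.0, np.sqrt(dt), size=x0.shape[0])
        x[k + 1] = x[k] + mu(x[k]) * dt + sigma(x[k]) @ dw
    return x

Applied with \(\tilde{\mu }\) and \(\tilde{\sigma }\) from (6), or directly with \(\mu \) and \(\sigma \) from (1), this produces the grid values of the corresponding Euler–Maruyama approximation.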