1 Introduction

The aim of this note is to describe the topological support of the law of the solution to a general stochastic differential equation (SDE) with jump noise. For diffusions, such a description is provided by the classical Stroock–Varadhan support theorem, see [1]. The natural question of an extension of this theorem to SDEs with jumps was studied intensively at the end of the 1990s and the early 2000s by H. Kunita, Y. Ishikawa, and T. Simon, see [2,3,4]. It appears that, in the jump noise setting, two cases should naturally be separated, called in [3] the ‘Type I’ and ‘Type II’ SDEs. In plain words, an SDE is of ‘Type I’ if the small jumps are absolutely integrable and of ‘Type II’ otherwise. A ‘Type I’ SDE admits a simple and intuitively clear description of the support; this case was studied completely in [3]. ‘Type II’ SDEs have been studied only under additional limitations, which can be understood as certain technical convexity and scaling-type assumptions on the Lévy measure of the noise. The latter ‘scaling’ assumption (see H2 in [3] or (10) below) is quite restrictive and requires the jump noise to be close, in a sense, to an \(\alpha \)-stable one. This limitation is of a technical nature and is caused by the method used in [3, 4] rather than by the problem itself; hence, a natural question arises of how to remove it in order to get a general support theorem for SDEs with jumps, free from any technical limitations. This question was solved in the one-dimensional setting in [5] and for canonical (Marcus) equations in [6]. For general multidimensional Itô equations with jumps, it apparently requires alternative methods and has remained open since the early 2000s.

In this note, we propose a method for proving a support theorem for general ‘Type II’ SDEs, based on a change of measure and free from any non-natural technical limitations. The description we obtain for the topological support of the law of the solution to an SDE with jumps is of considerable importance because of its natural applications, in particular, to the study of the ergodic properties of the solution, considered as a Markov process, and to the strong maximum principle for the generator of this process; see the discussion in Sect. 2.3.

The structure of the paper is as follows: In Sect. 2, the main statement is formulated and accompanied by a discussion and examples. The proof of the main statement is given in Sect. 3, and to improve readability the proof of the key lemma (Lemma 3.1) is postponed to a separate Sect. 4. For the same reason, the proof of a technical estimate (19) is given in Appendix A.

2 Main Result

2.1 Preliminaries

Let \(N(\textrm{d}u,\textrm{d}t)\) be a Poisson point measure (PPM) on \(\mathbb {R}^d\times [0, \infty )\) with the intensity measure \(\mu (\textrm{d}u)\textrm{d}t\), where \(\mu (\textrm{d}u)\) is a Lévy measure, i.e.,

$$\begin{aligned} \int _{\mathbb {R}^d}(|u|^2\wedge 1)\mu (\textrm{d}u)<\infty . \end{aligned}$$

We consider an SDE in \(\mathbb {R}^m\)

$$\begin{aligned} \textrm{d}X_t&=b(X_t)\, \textrm{d}t+\int _{|u|\le 1}c(X_{t-},u) \widetilde{N}(\textrm{d}u,\textrm{d}t)\nonumber \\&\quad + \int _{|u|>1} c(X_{t-},u) N(\textrm{d}u,\textrm{d}t), \quad X_0=x_0, \end{aligned}$$
(1)

where \(\widetilde{N}(\textrm{d}u,\textrm{d}t)=N(\textrm{d}u,\textrm{d}t)-\mu (\textrm{d}u)\textrm{d}t\) is the corresponding compensated PPM.

The coefficient \(c(x,u)\) is assumed to have the form

$$\begin{aligned} c(x,u)=\sigma (x)u+r(x,u), \end{aligned}$$
(2)

where \(r(x,u)\) is negligible, in a sense, w.r.t. |u| for small values of |u| (see below). In the case \(r(x,u)\equiv 0\) Eq. (1) transforms to

$$\begin{aligned} \textrm{d}X_t=b(X_t)\, \textrm{d}t+\sigma (X_{t-})\,\textrm{d}Z_t, \quad X_0=x_0, \end{aligned}$$
(3)

where the Lévy process Z is given by its Itô–Lévy decomposition

$$\begin{aligned} \textrm{d}Z_t=\int _{|u|\le 1}u \widetilde{N}(\textrm{d}u,\textrm{d}t)+ \int _{|u|>1} u N(\textrm{d}u,\textrm{d}t). \end{aligned}$$

We assume the following.

\({\textbf{H}}_1.\) The functions \(b:\mathbb {R}^m\rightarrow \mathbb {R}^m\) and \(\sigma :\mathbb {R}^m\rightarrow \mathbb {R}^{m\times d}\) are Lipschitz continuous.

\({\textbf{H}}_2.\) There exist constants \(C>0, \beta >0\) such that

$$\begin{aligned} |r(x,u)-r(y,u)|\le C|x-y||u|^\beta , \quad |r(x_0,u)|\le C|u|^\beta , \end{aligned}$$

and

$$\begin{aligned} \int _{|u|\le 1}|u|^\beta \mu (\textrm{d}u)<\infty . \end{aligned}$$
(4)

If \(\beta \le 1,\) then \({\textbf{H}}_2\) yields that the whole function \(c(x,u)\) is absolutely integrable w.r.t. \(\mu (\textrm{d}u)\) on the set \(\{|u|\le 1\}\); that is, Eq. (1) is of Type I in the terminology of [3]. This (comparatively simple) case has already been studied completely; hence, in what follows we consider only the case \(\beta >1\). Note that, in this case, \(|r(x,u)|\le C(|x-x_0|+1)|u|^\beta \ll |u|\) for small |u|, and the linear part \(\sigma (x)u\) is the principal one in the decomposition (2) for \(c(x,u)\).
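For later reference, observe that \({\textbf{H}}_1\), \({\textbf{H}}_2\) imply the elementary point-wise bound

$$\begin{aligned} |c(x,u)|\le |\sigma (x)||u|+C(|x-x_0|+1)|u|^\beta , \quad |u|\le 1. \end{aligned}$$

For \(\beta \le 1\), the right-hand side is dominated by \(\big (|\sigma (x)|+C(|x-x_0|+1)\big )|u|^\beta \) and is \(\mu \)-integrable on \(\{|u|\le 1\}\) by (4), which explains the Type I classification above; for \(\beta >1\), it is dominated by \(\big (|\sigma (x)|+C(|x-x_0|+1)\big )|u|\), which is the type of bound used in the proofs in Sects. 3 and 4 below.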

Recall that the Skorokhod space \(\mathbb {D}([0,T], \mathbb {R}^m)\) is the set of càdlàg functions on [0, T] with the metric

$$\begin{aligned} d(f,g)=\inf _{\lambda \in \Lambda _{0,T}}\left( \sup _{0\le s<t\le T}\left| \log \frac{\lambda _t-\lambda _s}{t-s}\right| +\sup _{t\in [0,T]}|f(\lambda _t)-g(t)|\right) , \end{aligned}$$

where \(T>0\) and \(\Lambda _{0,T}\) denotes the set of strictly increasing continuous functions \(\lambda :[0, T]\rightarrow [0, T]\) such that \(\lambda (0)=0, \lambda (T)=T\). It is known that \(\mathbb {D}([0,T], \mathbb {R}^m)\) endowed with \(d(\cdot , \cdot )\) is a Polish space, e.g., [7, Section 14].

Under the assumptions \({\textbf{H}}_1\), \({\textbf{H}}_2\), SDE (1) has a unique strong solution X, see [8, Theorem IV.9.1]. We fix a time horizon \(T>0\) and consider this solution on the time interval [0, T]. The law of this solution in the Skorokhod space \(\mathbb {D}([0,T], \mathbb {R}^m)\) will be denoted by \(\textrm{Law}_{x_0,T} (X)\). We aim to describe the support of this law; recall that the (topological) support of a measure \(\kappa \) on a metric space S with its Borel \(\sigma \)-algebra is the minimal closed subset F such that \(\kappa (S\setminus F)=0\); this set is denoted by \({{\,\textrm{supp}\,}}(\kappa )\). Alternatively, \(x\in {{\,\textrm{supp}\,}}(\kappa )\) if, and only if, for any open ball B centered at x one has \(\kappa (B)>0\).

2.2 The Main Statement

To describe the support of \(\textrm{Law}_{x_0,T} (X)\), we need several constructions. First, we define a kernel \(J(x,\textrm{d}y)\) on \(\mathbb {R}^m\) by

$$\begin{aligned} J(x, A)=\mu \big (\{u:x+c(x,u)\in A\}\big ), \quad A\in {\mathscr {B}}(\mathbb {R}^m), \end{aligned}$$

and define the set \({\mathscr {A}}\subset \mathbb {R}^m\times \mathbb {R}^m\) of ‘admissible jumps’ by

$$\begin{aligned} (x,y)\in {\mathscr {A}} \Longleftrightarrow y\in \textrm{supp}(J(x, \cdot )). \end{aligned}$$

Next, denote by L the set of \(\ell \in \mathbb {R}^d\) such that

$$\begin{aligned} \int _{|u|\le 1}|u\cdot \ell | \mu (\textrm{d}u)<\infty ; \end{aligned}$$

here and below we use the notation \(a\cdot b\) for the scalar product in \(\mathbb {R}^d\). It is easy to check that L is a vector subspace of \(\mathbb {R}^d\); in what follows, we call it the ‘integrability subspace’ for \(\mu \). Denote by \(u_L\) the orthogonal projection of u onto L; then

$$\begin{aligned} \int _{|u|\le 1}|u_L| \mu (\textrm{d}u)<\infty \end{aligned}$$

and under the assumptions \({\textbf{H}}_1\), \({\textbf{H}}_2\) the following function is well defined and is Lipschitz continuous:

$$\begin{aligned} \widetilde{b}(x)=b(x)-\int _{|u|\le 1}\sigma (x)u_L \mu (\textrm{d}u)-\int _{|u|\le 1}r(x,u)\mu (\textrm{d}u). \end{aligned}$$
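To see that the last two integrals in the definition of \(\widetilde{b}\) are indeed finite, fix an orthonormal basis \(e_1, \dots , e_j\) of L and note that

$$\begin{aligned} |u_L|=\Big |\sum _{i=1}^{j}(u\cdot e_i)e_i\Big |\le \sum _{i=1}^{j}|u\cdot e_i|, \end{aligned}$$

so that \(\int _{|u|\le 1}|u_L|\,\mu (\textrm{d}u)\le \sum _i\int _{|u|\le 1}|u\cdot e_i|\,\mu (\textrm{d}u)<\infty \) by the definition of L, while \(\int _{|u|\le 1}|r(x,u)|\,\mu (\textrm{d}u)<\infty \) by \({\textbf{H}}_2\) and (4). A similar triangle inequality, \(|u\cdot (a\ell _1+b\ell _2)|\le |a|\,|u\cdot \ell _1|+|b|\,|u\cdot \ell _2|\), shows that L is indeed a vector subspace. The Lipschitz continuity of \(\widetilde{b}\) then follows from \({\textbf{H}}_1\), \({\textbf{H}}_2\) and the finiteness of \(\int _{|u|\le 1}|u_L|\,\mu (\textrm{d}u)\) and \(\int _{|u|\le 1}|u|^\beta \,\mu (\textrm{d}u)\).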

Denote by \({\textbf{F}}^{\textrm{const}}_{0,T}\) and \({\textbf{F}}^{\textrm{step}}_{0,T}\) the following classes of functions \(f:[0,T]\rightarrow \mathbb {R}^d\):

  • any \(f\in {\textbf{F}}^{\textrm{const}}_{0,T}\) takes a constant value in the orthogonal complement \(L^\perp \) of the subspace L in \(\mathbb {R}^d\);

  • for any \(f\in {\textbf{F}}^{\textrm{step}}_{0,T}\) there exists a partition \([0,T]=[0, \tau _1)\cup [\tau _1,\tau _2)\cup \dots \cup [\tau _j,T]\) such that, on each interval of the partition, f takes a constant value in \(L^\perp .\)

Finally, denote by \({\textbf{S}}^{\textrm{const}}_{0,T,x_0}\) and \({\textbf{S}}^{\textrm{step}}_{0,T,x_0}\) the classes of functions satisfying the following piece-wise ordinary differential equations

$$\begin{aligned} \phi _\textrm{t}=x_0+\int _0^t\Big (\widetilde{b}(\phi _\textrm{s})+\sigma (\phi _\textrm{s})f_\textrm{s}\Big )\, \textrm{d}s+\sum _{t_\textrm{k}\le t}\triangle _{t_k}\phi , \end{aligned}$$
(5)

where \(\{t_k\}\subset (0,T)\) is an arbitrary finite set, f is an arbitrary function from the class \({\textbf{F}}^{\textrm{const}}_{0,T}\) or \({\textbf{F}}^{\textrm{step}}_{0,T},\) respectively, and the jumps of the function \(\phi \) at the time moments \(\{t_k\}\) satisfy

$$\begin{aligned} (\phi _{t_k-}, \phi _{t_k})\in {\mathscr {A}}\hbox { for any }k. \end{aligned}$$
(6)

Note that each such \(\phi \) can be obtained as follows. Let \(x_0\), \(\{t_k\}\), and f be given; we can assume \(t_1<t_2<\dots \). We first solve the Cauchy problem for the ODE

$$\begin{aligned} \phi '_t=\widetilde{b}(\phi _t)+\sigma (\phi _t)f_t \end{aligned}$$
(7)

on the interval \([0, t_1)\) with the initial condition \(\phi _0=x_0\). Then, we determine the value \(\phi _{t_1}\), which should obey the admissibility assumption (6); note that \(\phi _{t_1-}\) is already well defined. Then, we use \(\phi _{t_1}\) as the initial value for the (new) Cauchy problem on the time interval \([t_1, t_2)\), and so on.
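To illustrate this construction in a purely schematic special case, take \(m=d=1\), \(b\equiv 0\), \(\sigma \equiv 1\), \(r\equiv 0\), and let \(\mu \) be a Lévy measure whose integrability subspace is trivial, \(L=\{0\}\) (for instance, the Lévy measure of a symmetric \(\alpha \)-stable process with \(\alpha \in (1,2)\), in which case \({\textbf{H}}_2\) holds with any \(\beta \in (\alpha ,2]\)). Then \(\widetilde{b}\equiv 0\), any \(f\in {\textbf{F}}^{\textrm{step}}_{0,T}\) is just a piece-wise constant real function, and every \(\phi \in {\textbf{S}}^{\textrm{step}}_{0,T,x_0}\) is piece-wise linear between its jumps:

$$\begin{aligned} \phi _t=x_0+\int _0^t f_s\, \textrm{d}s+\sum _{t_k\le t}\triangle _{t_k}\phi , \qquad \triangle _{t_k}\phi \in {{\,\textrm{supp}\,}}(\mu ). \end{aligned}$$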

Denote by \(\overline{{\textbf{S}}^{\textrm{const}}_{0,T,x_0}}\) and \(\overline{{\textbf{S}}^{\textrm{step}}_{0,T,x_0}}\) the closures of the classes \({\textbf{S}}^{\textrm{const}}_{0,T,x_0}\) and \({\textbf{S}}^{\textrm{step}}_{0,T,x_0}\) in \(\mathbb {D}([0,T], \mathbb {R}^m)\).

Theorem 2.1

Assume \({\textbf{H}}_1\) and \({\textbf{H}}_2\) with \(\beta >1\). Then for any \(T>0\)

$$\begin{aligned} {{\,\textrm{supp}\,}}\Big (\textrm{Law}_{x_0,T} (X)\Big )=\overline{{\textbf{S}}^{\textrm{const}}_{0,T,x_0}}=\overline{{\textbf{S}}^{\textrm{step}}_{0,T,x_0}}. \end{aligned}$$

In plain words, the above description of the support can be explained as follows. The SDE (1) contains two parts, the ‘deterministic flow’ part which corresponds to the drift, and the ‘stochastic jump part’ which corresponds to the PPM. These two parts are still related through the compensator term, which is involved in the stochastic integral w.r.t. \(\widetilde{N}(\textrm{d}u, \textrm{d}t)\). In the simple case where \(L=\mathbb {R}^d\), and thus the SDE is of Type I, these parts can be completely separated, and (1) can be written in the form

$$\begin{aligned} \textrm{d}X_t=\widetilde{b}(X_t)\, \textrm{d}t+\int _{\mathbb {R}^d}c(X_{t-},u) N(\textrm{d}u,\textrm{d}t), \end{aligned}$$
(8)

where the effective drift coefficient is defined by \(\widetilde{b}(x)=b(x)-\int _{|u|\le 1}c(x,u)\, \mu (\textrm{d}u)\) (a short verification of this reduction is sketched after the list below). In this case, (5) does not contain the part with f because \(L^\perp =\{0\}\), and the description of the support of the law of the solution X is intuitively clear: the solution follows the deterministic flow defined by the effective drift, then makes a jump, admissible for the stochastic jump part, then again follows the deterministic flow, etc. This description was rigorously proved in [3, Theorem I]. The general (Type II) case is more sophisticated, since the compensator term cannot be separated from the stochastic integral. Theorem 2.1 actually tells us that the above description of the support remains essentially correct, with the following two important changes:

  • the effective drift involves only the parts of \(c(x,u)\) which are absolutely integrable, i.e., \(r(x,u)\) and \(\sigma (x) u_L\);

  • the ‘non-integrability subspace’ \(L^\perp \) induces an extra drift part, which may act in an arbitrary direction of the form \(\sigma (x) h, h\in L^\perp .\)
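To spell out the reduction leading to (8): in the Type I case, \(c(x,\cdot )\) is \(\mu \)-integrable on \(\{|u|\le 1\}\), so the compensated integral in (1) splits as

$$\begin{aligned} \int _{|u|\le 1}c(X_{t-},u)\, \widetilde{N}(\textrm{d}u,\textrm{d}t)=\int _{|u|\le 1}c(X_{t-},u)\, N(\textrm{d}u,\textrm{d}t)-\left( \int _{|u|\le 1}c(X_{t},u)\, \mu (\textrm{d}u)\right) \textrm{d}t, \end{aligned}$$

and the last term, combined with \(b(X_t)\, \textrm{d}t\), produces exactly the effective drift \(\widetilde{b}\) in (8).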

Such a description of the support was conjectured by T. Simon, see [3, Remark 3.3(c)], by analogy with a similar result proved for Lévy processes in [9]. Theorem 2.1 proves this conjecture in wide generality, i.e., without any specific assumptions on the SDE except the natural conditions \({\textbf{H}}_1\), \({\textbf{H}}_2\) which yield strong existence of the solution.

Remark 2.2

Instead of the global Lipschitz conditions in \({\textbf{H}}_1\), \({\textbf{H}}_2\), one can assume their local analogues combined with a condition which prevents the strong solution to SDE (1) from blowing up, e.g., the linear growth condition on the coefficients. Theorem 2.1 can be extended to such a setting by the usual localization technique.

Remark 2.3

Assumption \({\textbf{H}}_2\) with \(\beta >1\) contains a structural limitation: the infinite variation part of the SDE is linear in the jump variable u. Removing this limitation would require considering x-dependent non-integrability subspaces instead of \(\{\sigma (x)f, f\in L^\perp \}\) and piece-wise differential inclusions instead of (5); such an extension is a topic for further research. Still, the limitation mentioned above is not very restrictive in practice. For instance, if \(c(x,u)\) is \(C^2\)-smooth in u, then (the local version of) \({\textbf{H}}_2\) holds true with \(\beta =2\) by the Taylor formula; in this case, (4) holds true just because \(\mu \) is a Lévy measure.

Remark 2.4

Our description of admissible jumps differs from the one adopted in [3], which requires that

$$\begin{aligned} y=x+c(x,u), \quad u\in \textrm{supp}(\mu ). \end{aligned}$$

These two descriptions coincide for \(c(x,u)\) continuous in u; for discontinuous \(c(x,u)\), the latter one is no longer applicable. To see this, one can consider a simple example where \(m=d=1,\) \(\mu \) is a finite discrete measure which has full support in \(\mathbb {R}\), and \(c(x,u)=1_K(u)+2\cdot 1_{\mathbb {R}\setminus K}(u)\), where K is a countable dense subset of \(\mathbb {R}\) such that \(\mu (\mathbb {R}\setminus K)=0\).

2.3 Discussion: Applications and Examples

Support theorems arise naturally in the study of ergodic properties of the Markov process associated with the SDE; namely, they provide a natural tool for proving that the process is topologically irreducible, see [10, Proof of Proposition 5.3] or [11, Theorem 1.3 and Section 4.3]. This gives a natural field of application for Theorem 2.1. Namely, denote by \(S_T(x_0)\) the closure of the set of values \(\phi _T\), where \(\phi \) runs through the solutions to (5) with an arbitrary finite set \(\{t_k\}\subset (0,T)\), jumps of \(\phi \) satisfying (6), and an arbitrary function f from the class \({\textbf{F}}^{\textrm{step}}_{0,T}.\) By Theorem 2.1, the solution to (1) satisfies

$$\begin{aligned} {{\,\textrm{supp}\,}}\Big (\textrm{Law}\, (X_T)\Big )=S_T(x_0). \end{aligned}$$

Thus, in order to establish the topological irreducibility of the Markov process associated with (1), it is sufficient to show that

$$\begin{aligned} S_T(x)=\mathbb {R}^m\hbox { for all }x\in \mathbb {R}^m. \end{aligned}$$
(9)

Another application of support theorems dates back to the original paper by Stroock and Varadhan [1] and concerns the strong maximum principle for the generator \({\mathscr {L}}\) of the Markov process X, which states that any sub-harmonic function for \({\mathscr {L}}\) which reaches its maximum on a given set is constant on this set. We refer the interested reader to [8, Theorem IV.8.3] or [3, Remark 3.3(b)] for a detailed discussion. Here, we mention briefly the following simple corollary: if \(\bigcup _{T>0} S_T(x)\) is dense in \(\mathbb {R}^m\) for any \(x\in \mathbb {R}^m\), then the strong maximum principle for \({\mathscr {L}}\) holds true on the entire \(\mathbb {R}^m\). For this property to hold, it is clearly sufficient that (9) holds true for some \(T>0\).

Below, we give two simple sufficient conditions under which (9) holds true for any \(T>0\).

Example 2.5

(Jump noise satisfying the ‘cone condition’). Let there exist \(\theta \in (0,1)\) such that, for any \(\ell \in \mathbb {R}^d, |\ell |=1\) and \(\varepsilon >0\), the intersection of the cone \(\{u:u\cdot \ell \ge \theta |u|\}\) with the ball \(\{|u|< \varepsilon \}\) has positive \(\mu \)-measure. Assume also that \(\sigma (x)\) is point-wise non-degenerate, i.e., \(\textrm{rank}\, \sigma (x)=m\le d, x\in \mathbb {R}^m\). Then, \(S_T(x)=\mathbb {R}^m\) for all \(x\in \mathbb {R}^m, T>0\). To prove this assertion, one can take \(f\equiv 0\) and organize, for given \(x,y\in \mathbb {R}^m, \varepsilon >0\) and \(T>0\), sequences of (frequent) jump times \(\{t_k\}\) and (small) jump amplitudes \(u_k\), which force the solution to (5) with \(x_0=x\) to take a final value with \(|\phi _T-y|<\varepsilon \). This construction is essentially the same as in the proof of [11, Proposition 4.17]; thus, we omit the details here.

Example 2.6

(Jump noise of the ‘strong Type II’). Let there be no integrability directions for \(\mu (\textrm{d}u)\); that is, \(L=\{0\}\). Assume also that \(\sigma (x)\) is point-wise non-degenerate. Then, \(S_T(x)=\mathbb {R}^m\) for all \(x\in \mathbb {R}^m, T>0\). To prove this assertion, one can ignore the jump part in (5) and consider the solution to the ODE

$$\begin{aligned} \textrm{d}\phi _t=\widetilde{b}(\phi _t)\, \textrm{d}t+\sigma (\phi _t)\, f_t\textrm{d}t, \quad \phi _0=x. \end{aligned}$$

Since f can be an arbitrary piece-wise constant function \([0,T]\rightarrow \mathbb {R}^d\) and \(\sigma (x)\) is point-wise non-degenerate, by a proper choice of f the respective solution \(\phi \) can be made arbitrarily close to

$$\begin{aligned} \phi _t^{x,y,T}=x+\frac{t}{T}(y-x). \end{aligned}$$
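One possible explicit choice of the control (given here only as a sketch): since \(\textrm{rank}\, \sigma (x)=m\), the matrix \(\sigma (x)\sigma (x)^{T}\) is invertible, and taking

$$\begin{aligned} f_t=\sigma (\phi _t^{x,y,T})^{T}\Big (\sigma (\phi _t^{x,y,T})\sigma (\phi _t^{x,y,T})^{T}\Big )^{-1}\left( \frac{y-x}{T}-\widetilde{b}(\phi _t^{x,y,T})\right) \end{aligned}$$

makes the straight line \(\phi ^{x,y,T}\) the exact solution of the above equation. This f is continuous rather than piece-wise constant; approximating it by functions from \({\textbf{F}}^{\textrm{step}}_{0,T}\) and using a Gronwall-type continuity argument makes \(\phi _T\) arbitrarily close to y, hence \(S_T(x)=\mathbb {R}^m\).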

Let us give two more particular examples illustrating the range of applicability of Theorem 2.1 and of the sufficient conditions from Examples 2.5 and 2.6. For that purpose, we recall the auxiliary ‘scaling’ assumption imposed in [3, 4]: for some \(\alpha \in (0,2)\),

$$\begin{aligned} \int _{|u|\le \varepsilon }(u\cdot \ell )^2\mu (\textrm{d}u)\asymp \varepsilon ^{2-\alpha }, \quad |\ell |=1. \end{aligned}$$
(10)

The following example concerns cylindrical Lévy processes, which have been studied extensively in the last decades, e.g., [12,13,14].

Example 2.7

Let \(\mu (\textrm{d}u)\) be the Lévy measure of a Lévy process \(Z=( Z^1, \dots , Z^d)\) with independent components \(Z^i, i=1, \dots , d\). Let the components be symmetric \(\alpha _i\)-stable processes on \(\mathbb {R}\); then, the scaling assumption (10) fails unless all the stability indices \(\alpha _i, i=1, \dots , d\) are the same; that is, [3, Theorem II] is not applicable. If at least one \(\alpha _i>1\), then the noise is of Type II and [3, Theorem I] is not applicable, as well. On the other hand, Theorem 2.1 can be applied assuming the coefficients satisfy \({\textbf{H}}_1\), \({\textbf{H}}_2\); note that \(\beta \) can be taken to be any number \(>\max _i\alpha _i.\) The Lévy measure \(\mu (\textrm{d}u)\) is supported by the collection of the coordinate axes in \(\mathbb {R}^d\) and thus satisfies the ‘cone condition’ from Example 2.5 (e.g., with \(\theta =1/(2\sqrt{d})\)). Hence, if \(\sigma (x)\) is point-wise non-degenerate, then identity (9) holds true for any \(T>0\).
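In this example, the integrability subspace can be computed explicitly. Writing \(\mu _i\) for the Lévy measure of \(Z^i\) (so that \(\mu \) is concentrated on the union of the coordinate axes), we have, for \(\ell =(\ell _1, \dots , \ell _d)\),

$$\begin{aligned} \int _{|u|\le 1}|u\cdot \ell |\, \mu (\textrm{d}u)=\sum _{i=1}^{d}|\ell _i|\int _{|z|\le 1}|z|\, \mu _i(\textrm{d}z), \end{aligned}$$

and the i-th integral is finite if, and only if, \(\alpha _i<1\). Hence \(L=\textrm{span}\{e_i:\alpha _i<1\}\) and \(L^\perp =\textrm{span}\{e_i:\alpha _i\ge 1\}\), with \(\{e_i\}\) the standard basis. Similarly, \(\int _{|u|\le \varepsilon }(u\cdot e_i)^2\mu (\textrm{d}u)\asymp \varepsilon ^{2-\alpha _i}\), which shows once more that (10) cannot hold with a single \(\alpha \) unless all the \(\alpha _i\) coincide.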

Our second example shows that a Lévy measure can be of ‘strong Type II’ even if the (rather mild) ‘cone condition’ fails.

Example 2.8

Let the Lévy measure \(\mu (\textrm{d}u)\) on \(\mathbb {R}^2\) be the image of the symmetric \(\alpha \)-stable Lévy measure \(\frac{\textrm{d}z}{|z|^{1+\alpha }}\) on \(\mathbb {R}\) under the mapping

$$\begin{aligned} z\mapsto (z,|z|^\gamma \textrm{sgn}\, z) \end{aligned}$$

with some \(\gamma >1\). Then, the scaling condition (10) fails, and for \(\alpha >1\) (i.e., when the noise is of Type II) the previous results are not applicable. On the other hand, Theorem 2.1 applies. Note that the Lévy measure \(\mu (\textrm{d}u)\) is quite degenerate and the ‘cone condition’ fails. Nevertheless, for \(\gamma \le \alpha \) the noise is of strong Type II. Hence, if \(\sigma (x)\) is point-wise non-degenerate, then identity (9) holds true for any \(T>0\).
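The claim about the ‘strong Type II’ property can be verified directly. For \(\ell =(\ell _1,\ell _2)\ne 0\), the finiteness of \(\int _{|u|\le 1}|u\cdot \ell |\, \mu (\textrm{d}u)\) is equivalent to the finiteness, near \(z=0\), of

$$\begin{aligned} \int \big |\ell _1 z+\ell _2|z|^\gamma \, \textrm{sgn}\, z\big |\, \frac{\textrm{d}z}{|z|^{1+\alpha }}. \end{aligned}$$

If \(\ell _1\ne 0\), the integrand behaves near zero as \(|\ell _1||z|^{-\alpha }\) (recall \(\gamma >1\)), and the integral diverges since \(\alpha >1\); if \(\ell _1=0, \ell _2\ne 0\), it behaves as \(|\ell _2||z|^{\gamma -1-\alpha }\) and diverges precisely when \(\gamma \le \alpha \). Hence, for \(\alpha >1\), one has \(L=\{0\}\) if, and only if, \(\gamma \le \alpha \).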

3 Proof of Theorem 2.1

To prove the announced statement, it is sufficient to prove the following two inclusions:

$$\begin{aligned} {{\,\textrm{supp}\,}}\Big (\textrm{Law}_{x_0,T} (X)\Big )\subset \overline{{\textbf{S}}^{\textrm{const}}_{0,T,x_0}}, \quad \overline{{\textbf{S}}^{\textrm{step}}_{0,T,x_0}}\subset {{\,\textrm{supp}\,}}\Big (\textrm{Law}_{x_0,T}(X)\Big ). \end{aligned}$$
(11)

The proof of the first inclusion is simple and standard; the corresponding argument was explained in [3], and for the reader's convenience we outline it here. Consider a family of SDEs

$$\begin{aligned} \textrm{d}X_t^\eta =b_\eta (X_t^\eta )\, \textrm{d}t+\int _{\eta \le |u|\le 1}c(X_{t-}^\eta ,u) \widetilde{N}(\textrm{d}u,\textrm{d}t)+ \int _{|u|>1} c(X_{t-}^\eta ,u) N(\textrm{d}u,\textrm{d}t),\quad \end{aligned}$$
(12)

with \(X_0^\eta =x_0\), where

$$\begin{aligned} b_\eta (x)=b(x)-\int _{|u|<\eta }\sigma (x)u_L \mu (\textrm{d}u)-\int _{|u|<\eta }r(x,u)\mu (\textrm{d}u). \end{aligned}$$

It is easy to check that \(b_\eta \rightarrow b, \eta \rightarrow 0\) uniformly on compact subsets of \(\mathbb {R}^m\). Then, by the usual stochastic calculus technique one can show that

$$\begin{aligned} \sup _{t\in [0,T]}|X_t^\eta -X_t|\rightarrow 0, \quad \eta \rightarrow 0 \end{aligned}$$
(13)

in probability; see the proof of a similar statement in Lemma 4.2. On the other hand, Eq. (12) can be written in the form:

$$\begin{aligned} \textrm{d}X_t^\eta =\widetilde{b}(X_t^\eta )\, \textrm{d}t{-}\sigma (X_t^\eta )\upsilon _\eta \, \textrm{d}t+\int _{|u|\ge \eta }c(X_{t-}^\eta ,u) N(\textrm{d}u,\textrm{d}t), \end{aligned}$$

where

$$\begin{aligned} \upsilon _\eta =\int _{\eta \le |u|\le 1}(u-u_L)\, \mu (\textrm{d}u)\in L^\perp . \end{aligned}$$
(14)
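Indeed, on \(\{\eta \le |u|\le 1\}\) the measure \(\mu \) is finite, so the compensated stochastic integral in (12) equals \(\int _{\eta \le |u|\le 1}c(X_{t-}^\eta ,u) N(\textrm{d}u,\textrm{d}t)\) minus the drift term \(\big (\int _{\eta \le |u|\le 1}c(X_{t}^\eta ,u)\, \mu (\textrm{d}u)\big )\textrm{d}t\), and for any x

$$\begin{aligned} b_\eta (x)-\int _{\eta \le |u|\le 1}c(x,u)\, \mu (\textrm{d}u)&=b(x)-\int _{|u|\le 1}\sigma (x)u_L\, \mu (\textrm{d}u)-\int _{|u|\le 1}r(x,u)\, \mu (\textrm{d}u)\\&\quad -\int _{\eta \le |u|\le 1}\sigma (x)(u-u_L)\, \mu (\textrm{d}u)=\widetilde{b}(x)-\sigma (x)\upsilon _\eta . \end{aligned}$$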

The PPM N, restricted to \(\{|u|\ge \eta , t\in [0, T]\}\), a.s. has a finite set of atoms (‘jumps’). Then, the solution to the latter equation can be represented path-wise as a function of the form (5), where \(f\equiv -\upsilon _\eta \) belongs to \({\textbf{F}}^{\textrm{const}}_{0,T}\), \(\{t_k\}\) are equal to the time instants of the jumps, and \(\triangle _{t_k}\phi =c(\phi _{t_{k}-}, u_k)\), where \(\{u_k\}\) are equal to the amplitudes of the jumps for N. Because

$$\begin{aligned} \phi _{t_{k}}=\phi _{t_{k-}}+c(\phi _{t_{k-}},u_k), \end{aligned}$$

the pair \((\phi _{t_{k}-}, \phi _{t_{k}})\) is admissible for any k. Therefore, for any \(\eta \),

$$\begin{aligned} {{\,\textrm{supp}\,}}\Big (\textrm{Law}_{x_0,T} (X^\eta )\Big )\subset \overline{{\textbf{S}}^{\textrm{const}}_{0,T,x_0}}\Longleftrightarrow \mathbb {P}(X^\eta |_{[0,T]}\in \overline{{\textbf{S}}^{\textrm{const}}_{0,T,x_0}})=1. \end{aligned}$$

By (13), the laws of \(X^\eta \) in \(\mathbb {D}([0,T], \mathbb {R}^m)\) weakly converge to the law of X, which, since the set \(\overline{{\textbf{S}}^{\textrm{const}}_{0,T,x_0}}\) is closed, gives

$$\begin{aligned} \mathbb {P}(X|_{[0,T]}\in \overline{{\textbf{S}}^{\textrm{const}}_{0,T,x_0}})\ge \limsup _{\eta \rightarrow 0}\mathbb {P}(X^\eta |_{[0,T]}\in \overline{{\textbf{S}}^{\textrm{const}}_{0,T,x_0}})=1, \end{aligned}$$

which completes the proof of the first inclusion in (11).

The second inclusion is the main part of the theorem. In order to proceed with its proof, we re-write the original SDE (1) in the form

$$\begin{aligned} \begin{aligned} \textrm{d}X_t=\widetilde{b}_\eta (X_t)\, \textrm{d}t&{-}\sigma (X_t)\upsilon _\eta \, \textrm{d}t+\int _{|u|<\eta }c(X_{t-},u) \widetilde{N}(\textrm{d}u,\textrm{d}t)\\&+ \int _{|u|\ge \eta }c(X_{t-},u) N(\textrm{d}u,\textrm{d}t), \end{aligned} \end{aligned}$$
(15)

where \(\eta >0\) is a (small) parameter which is yet to be chosen and

$$\begin{aligned} \widetilde{b}_\eta (x)=b(x)-\int _{\eta \le |u|\le 1}\sigma (x)u_L \mu (\textrm{d}u)-\int _{\eta \le |u|\le 1}r(x,u)\mu (\textrm{d}u), \end{aligned}$$

recall that \(\upsilon _\eta \) is given by (14). Consider the SDE

$$\begin{aligned} \textrm{d}X_t^{\eta , \textrm{trunc}}=\widetilde{b}_\eta (X_t^{\eta , \textrm{trunc}})\, \textrm{d}t{-}\sigma (X_t^{\eta , \textrm{trunc}})\upsilon _\eta \, \textrm{d}t+\int _{|u|<\eta }c(X_{t-}^{\eta , \textrm{trunc}},u) \widetilde{N}(\textrm{d}u,\textrm{d}t),\quad \end{aligned}$$
(16)

which can be seen as a modification of (15) with ‘large jumps’ being truncated, i.e., with the PPM \(N(\textrm{d}u, \textrm{d}t)\) changed to its restriction to \(\{|u|<\eta \}\times [0,T]\). We will denote by \(X^{x, S,\eta , \textrm{trunc}}\) the solution of (16) with \(X_S=x\).

Let \(f\in {\textbf{F}}^{\textrm{step}}_{0,T}, x\in \mathbb {R}^m, S\in [0, T]\) be fixed and \(\phi ^{x, S, f}\) be the solution of the ODE

$$\begin{aligned} \phi _t=x+\int _S^t\Big ({\widetilde{b}}(\phi _s)+\sigma (\phi _s)f_s\Big )\, \textrm{d}s, \quad t\in [S, T] \end{aligned}$$
(17)

The following statement is the cornerstone of the entire proof. In what follows, we denote by B(xr) the open ball in \(\mathbb {R}^m\) with the center x and radius r.

Lemma 3.1

(The Key Lemma) Let \(f\in {\textbf{F}}^{\textrm{step}}_{0,T}\) be fixed. There exists \(\rho \in (0,1)\) such that, for any given \(x\in \mathbb {R}^m,\) and \(\gamma >0,\) there exists \(\eta ^{f,x, \gamma }>0\) such that

$$\begin{aligned} \begin{aligned} p^{\textrm{trunc}}(\eta , f, x, \gamma ) := \inf _{x'\in B(x,\rho \gamma ), 0 \le \ S \le \,Q\le \,T}\mathbb {P}\left( \sup _{t\in [S, Q]}|X_t^{x',S,\eta , \textrm{trunc}}-\phi _t^{x,S, f}|\le \gamma \right) >0 \end{aligned} \end{aligned}$$

for any \(\eta \in (0,\eta ^{f,x, \gamma }].\)

We postpone the proof of Lemma 3.1 to a separate Sect. 4; here, we explain the argument which provides the second inclusion in (11) once this key lemma is proved. Though quite standard, this argument is a bit cumbersome; hence, we divide the exposition into several steps.

Step 1: Choosing the number and approximate instants of ‘big jumps’. Fix \(\eta >0\) and decompose

$$\begin{aligned} N(\textrm{d}u, \textrm{d}t)=1_{|u|<\eta }N(\textrm{d}u, \textrm{d}t)+1_{|u|\ge \eta }N(\textrm{d}u, \textrm{d}t)=:N_\eta (\textrm{d}u, \textrm{d}t)+N^\eta (\textrm{d}u, \textrm{d}t). \end{aligned}$$

The PPM \(N^\eta (\textrm{d}u, \textrm{d}t)\) a.s. has a finite number of atoms with \(t\in [0, T]\); say, \(\{(\xi _j,\tau _j)\}_{j=1}^J.\) It is well known that the PPMs \(N_\eta (\textrm{d}u, \textrm{d}t)\) and \(N^\eta (\textrm{d}u, \textrm{d}t)\) are independent. In addition, the random variable

$$\begin{aligned} J=N(\{|u|\ge \eta \}\times [0,T]) \end{aligned}$$

has the Poisson distribution with intensity \(T\mu (|u|\ge \eta ),\) and, conditioned on the event \(\{J=K\},\) the random vectors \(\{\xi _j\}_{j=1}^K, \{\tau _j\}_{j=1}^K\) are independent. The corresponding (conditional) laws are the K-fold product of

$$\begin{aligned} \mathbb {P}^\eta (\textrm{d}u):=\frac{1}{\mu (\{v:|v|\ge \eta \})}\mu (\textrm{d}u) \end{aligned}$$

for \(\{\xi _j\}_{j=1}^K,\) and the uniform distribution on the simplex

$$\begin{aligned} \Delta _K(0, T):=\{(s_1, \dots , s_K): 0\le s_1\le \dots \le s_K\le T\} \end{aligned}$$

for \(\{\tau _j\}_{j=1}^K.\)

Next, let \(\phi \in {\textbf{S}}^{\textrm{step}}_{0,T,x_0}\) be fixed with the corresponding function \(f\in {\textbf{F}}^{\textrm{step}}_{0,T}\) and points \(\{t_k, k=1, \dots , K\}\) from Eq. (5) for \(\phi \). Denote \(x_k=\phi _{t_k-}, y_k=\phi _{t_k}\); we can and will assume that \(x_k\not =y_k\) because otherwise we can simply exclude the point \(t_k\) from Eq. (5). Denote also \(t_0=0, t_{K+1}=T\); then for any positive

$$\begin{aligned} \delta <\delta _{\{t_k\}}:=\frac{1}{2}\min _{k=0, \dots , K}|t_{k+1}-t_k| \end{aligned}$$

we have

$$\begin{aligned} p(\eta , \delta , \{t_k\}):=\mathbb {P}(J=K, |t_k-\tau _k|<\delta , k=1, \dots ,K)\\ =e^{-T\mu (|u|\ge \eta )}\Big (2\delta \mu (|u|\ge \eta )\Big )^K, \end{aligned}$$

which is positive.
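In brief, this expression is the product of \(\mathbb {P}(J=K)=e^{-T\mu (|u|\ge \eta )}\big (T\mu (|u|\ge \eta )\big )^K/K!\) and of the conditional probability that, out of K points distributed uniformly on [0, T], exactly one falls into each of the disjoint intervals \((t_k-\delta , t_k+\delta )\):

$$\begin{aligned} \mathbb {P}\big (|t_k-\tau _k|<\delta , k=1, \dots ,K \,\big |\, J=K\big )=K!\left( \frac{2\delta }{T}\right) ^{K}. \end{aligned}$$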

Step 2: Linking the instants of ‘big jumps’ for X with the discontinuities of \(\phi \). The process X follows the truncated SDE (16) between the ‘big jump’ instants \(\tau _j\), and at these instants satisfies

$$\begin{aligned} \triangle _{\tau _j} X=c(X_{\tau _j-}, \xi _{j}), \quad j=1, \dots , J. \end{aligned}$$

Denote by \(X^{s_1, \dots , s_K}\) the similar process with \(J=K\) and \(\tau _k=s_k\): it follows the truncated SDE (16) on each \([s_{k-1}, s_k)\), and satisfies

$$\begin{aligned} \triangle _{s_k} X^{s_1, \dots , s_K}=c(X_{s_k-}^{s_1, \dots , s_K}, \xi _{k}), \quad k=1, \dots , K; \end{aligned}$$

the random vector \(\{\xi _j\}_{j=1}^K\) has the law \((\mathbb {P}^\eta )^{\otimes K}\). Then

$$\begin{aligned} \mathbb {P}(X\in A)&\ge \mathbb {P}(X\in A, J=K)\nonumber \\&= \int _{\Delta _K(0, T)}\mathbb {P}(X^{s_1, \dots , s_K}\in A)\,\mathbb {P}(J=K, (\tau _1, \dots , \tau _K)\in ds_1\dots ds_K)\nonumber \\&\ge p(\eta , \delta , \{t_k\})\inf _{|t_k-s_k|<\delta , k=1, \dots ,K}\mathbb {P}(X^{s_1, \dots , s_K}\in A). \end{aligned}$$
(18)

We will apply this inequality with

$$\begin{aligned} A=\{x(\cdot ): d(x(\cdot ), \phi )\le \varepsilon \}, \end{aligned}$$

in which case \(\mathbb {P}(X^{s_1, \dots , s_K}\in A)\) can be bounded from below as follows. Denote \(s_0=0\), \(s_{K+1}=T\), and \(y_0=x_0\). Define a function \(\phi ^{s_1, \dots , s_K}\) as follows: on each of the intervals \([s_k, s_{k+1}), k=0, \dots , K\) it satisfies (17) with \(x=y_k, S=s_k\); it is also continuous at the end point \(T=s_{K+1}\). Recall that the initial function \(\phi _t\) follows a similar description with \(\{t_k\}\) instead of \(\{s_k\}\). Using this observation, it is easy to show that

$$\begin{aligned} \sup _{{|t_k-s_k|<\delta , k=1, \dots ,K}} d(\phi ^{s_1, \dots , s_K}, \phi )\rightarrow 0, \quad \delta \rightarrow 0; \end{aligned}$$
(19)

for the reader’s convenience, we prove this relation in Appendix A. Hence, we can fix \(\delta (\varepsilon )>0\) such that

$$\begin{aligned} \sup _{{|t_k-s_k|<\delta , k=1, \dots ,K}} d(\phi ^{s_1, \dots , s_K}, \phi )\le \frac{\varepsilon }{2}, \quad \delta \in (0, \delta (\varepsilon )), \end{aligned}$$

and thus for any \(\{s_k\}\) with \(|t_k-s_k|<\delta , k=1, \dots ,K\)

$$\begin{aligned} \mathbb {P}(d(X^{s_1, \dots , s_K}, \phi )\le \varepsilon )\ge \mathbb {P}\left( d(X^{s_1, \dots , s_K}, \phi ^{s_1, \dots , s_K})\le \frac{\varepsilon }{2}\right) . \end{aligned}$$
(20)

Step 3: Some preliminary estimates for the distance between \(X^{s_1, \dots , s_K}\) and \(\phi ^{s_1, \dots , s_K}\). In what follows, \(\rho \in (0,1)\) is the number given by Lemma 3.1. Denote

$$\begin{aligned} {\mathscr {F}}_t=\sigma (X^{s_1, \dots , s_K}_s, s\le t)=\sigma (N_\eta ([0,s]\times du), s\le t, \{\xi _k, s_k\le t\}), \end{aligned}$$

and

$$\begin{aligned} {\mathscr {G}}_t={\mathscr {F}}_{t-}=\sigma (N_\eta ([0,s]\times du), s\le t, \{\xi _k, s_k<t\}), \end{aligned}$$

note that \(\xi _k\) is independent of \({\mathscr {G}}_{s_k}\) for any k. The following holds:

  (I)

    For any \(k=0, \dots , K\) and \(\gamma >0\), the conditional probability w.r.t. \({\mathscr {F}}_{s_k}\) for the event

    $$\begin{aligned} \{|X^{s_1, \dots , s_K}_t-\phi _t^{s_1, \dots , s_K}|\le \gamma , t\in [s_k, s_{k+1})\} \end{aligned}$$

    equals

    $$\begin{aligned} \mathbb {P}\left( \sup _{t\in [s_k, s_{k+1})}|X_t^{x, s_k,\eta , \textrm{trunc}}-\phi _t^{y_k,s_k, f}|\le \gamma \right) \Big |_{x=X_{s_k}^{s_1, \dots , s_K}}, \end{aligned}$$

    and by Lemma 3.1 it is bounded from below by \(p^{\textrm{trunc}}(\eta , f, y_k, \gamma )\) on the set

    $$\begin{aligned} \left\{ |X^{s_1, \dots , s_K}_{s_k}-y_k|<{\rho \gamma }\right\} \in {\mathscr {F}}_{s_k}, \end{aligned}$$

    provided that \(\eta \le \eta ^{f,y_k,\gamma }\);

  (II)

    for any \(k=1, \dots , K\) and \(\gamma >0\), the conditional probability w.r.t. \({\mathscr {G}}_{s_k}\) for the event

    $$\begin{aligned} \left\{ |X^{s_1, \dots , s_K}_{s_k}-y_k|<{\rho \gamma }\right\} \end{aligned}$$

    equals

    $$\begin{aligned} \begin{aligned} \mathbb {P}^\eta&\left( |x+c(x,\xi )-y_k|<{\rho \gamma }\right) \Big |_{x=X^{s_1, \dots , s_K}_{s_k-}}\\&=\frac{1}{\mu (\{v:|v|\ge \eta \})}\mu \left( \left\{ u:|u|\ge \eta , |x+c(x,u)-y_k|<{\rho \gamma }\right\} \right) \Big |_{x=X^{s_1, \dots , s_K}_{s_k-}}. \end{aligned} \end{aligned}$$

Recall that each pair \((x_k, y_k)=(\phi _{t_k-}, \phi _{t_k})\) is admissible; hence, for any \(\gamma >0\) and k,

$$\begin{aligned} J(x_k,B(y_k, {\rho \gamma }))=\mu (\{u:x_k+c(x_k,u)\in B(y_k, {\rho \gamma })\})>0. \end{aligned}$$

Take

$$\begin{aligned} \gamma _*=\frac{1}{3}\min _{k}|x_k-y_k|, \end{aligned}$$

then by the assumptions \({\textbf{H}}_1\), \({\textbf{H}}_2\) (which yield \(|c(x_k, u)|\rightarrow 0\) as \(|u|\rightarrow 0\)) there exists \(\eta ^{*}>0\) such that, for all k,

$$\begin{aligned} |u|\le \eta ^*\Longrightarrow |c(x_k, u)|< |x_k-y_k|-{\gamma _*} \Longrightarrow x_k+c(x_k, u)\not \in B\left( y_k, {\gamma _*}\right) .\nonumber \\ \end{aligned}$$
(21)

Since \(\rho \gamma <\gamma _*\) for any \(\gamma \in (0, \gamma _*]\), (21) shows that the set \(\{u: x_k+c(x_k, u)\in B(y_k, \rho \gamma )\}\), which has positive \(\mu \)-measure by the admissibility of \((x_k,y_k)\), is contained in \(\{|u|>\eta ^*\}\); that is,

$$\begin{aligned} \mu (\{u:|u|>\eta ^*, x_k+c(x_k, u) \in B(y_k, {\rho \gamma })\})>0, \quad k=1, \dots , K. \end{aligned}$$

Moreover, because \(c(x,u)\) is continuous w.r.t. x we have by the usual weak continuity arguments that, for each \(\gamma \in (0, \gamma _*]\) and k, there exists \(\gamma '>0\) such that

$$\begin{aligned} \inf _{x\in B(x_k,2\gamma ')} \mu \left( \left\{ u:|u|>\eta ^*, x+c(x, u) \in B\left( y_k, {\rho \gamma }\right) \right\} \right) >0. \end{aligned}$$
(22)

Step 4: Specifying the parameters and completing the proof. Now, we can specify the free parameters \(\eta , \gamma , \delta \) in the above estimates and finalize the entire proof. Let us define iteratively parameters \(\gamma _{k}\) for \(k=K+1, \dots , 1\) as follows: Take

$$\begin{aligned} \gamma _{K+1}=\min \left( \frac{\varepsilon }{2},\gamma _*\right) , \end{aligned}$$

and let for \(k=K, \dots , 1\)

$$\begin{aligned} \gamma _{k}=\min \left( \gamma _{k+1}',\gamma _*\right) , \end{aligned}$$

where \(\gamma _{k+1}'\) is such that (22) holds true with this \(\gamma '\) and \(\gamma =\gamma _{k+1}\). We will use these parameters to estimate

$$\begin{aligned} \begin{aligned} \mathbb {P}&\left( \sup _{t\in [0, T]}|X^{s_1, \dots , s_K}_{t}-\phi ^{s_1, \dots , s_K}_{t}|\le \frac{\varepsilon }{2}\right) \\&\ge \mathbb {P}\left( \bigcap _{k=0}^K \Big \{\sup _{t\in [s_k, s_{k+1})}|X^{s_1, \dots , s_K}_{t}-\phi ^{s_1, \dots , s_K}_{t}|\le \gamma _{k+1}\Big \}\right) . \end{aligned} \end{aligned}$$
(23)

It follows from the calculations in Appendix A that \(\delta \in (0, \delta (\varepsilon ))\) can be taken small enough such that, for any \(s_1, \dots , s_K\) with \(|t_k-s_k|<\delta , k=1, \dots , K\),

$$\begin{aligned} |\phi _{s_k-}^{s_1, \dots , s_K}-x_k|<{\gamma _{k}}. \end{aligned}$$
(24)

We fix such \(\delta >0\) and the truncation level

$$\begin{aligned} \eta =\min \{\eta ^{*}, \eta ^{f,y_k,{\gamma _{k+1}}}, k=0, \dots , K \}. \end{aligned}$$

Denote

$$\begin{aligned} A_k=\left\{ \sup _{t\in [s_k, s_{k+1})}|X^{s_1, \dots , s_K}_t-\phi _t^{s_1, \dots , s_K}|\le \gamma _{k+1}\right\} , \quad k=0, \dots , {K} \end{aligned}$$

and

$$\begin{aligned} B_k=\{|X^{s_1, \dots , s_K}_{s_k}-y_k|<{\rho \gamma _{k+1}}\}, \quad k=1, \dots , K, \quad B_0=\Omega . \end{aligned}$$

Then \( A_k\in {\mathscr {G}}_{s_{k+1}}, B_k\in {\mathscr {F}}_{s_k}, k=1,\dots , K\), and we have the following:

  • By Lemma 3.1, for \(k=0, \dots , K\)

    $$\begin{aligned} \mathbb {P}(A_k|{\mathscr {F}}_{s_{k}})\ge p^{\textrm{trunc}}(\eta , f, y_k, \gamma _{k+1}) \hbox { a.s. on the set }B_k. \end{aligned}$$
  • By (24), for any \(k=1, \dots , K\) we have \(|X^{s_1, \dots , s_K}_{s_k-}-x_k|<{2\gamma _k \le 2\gamma _{k+1}'}\) on the set \(A_{k-1}\); see the definition of \(\gamma _{k}\) for the notation \(\gamma _{k+1}'\). Then

    $$\begin{aligned} \begin{aligned} \mathbb {P}(B_k|{\mathscr {G}}_{s_{k}})&\ge p^{\textrm{jump}}_k(\eta )\\&:=\frac{1}{\mu (\{v:|v|\ge \eta \})}\inf _{x\in B(x_k,{2\gamma _{k+1}'})}\mu \left( \left\{ u:|u|>\eta , x+c(x, u) \in B\left( y_k, {\rho \gamma _{k+1}}\right) \right\} \right) , \end{aligned} \end{aligned}$$

    and since \(\eta \le {\eta ^*}\) we have by (22)

    $$\begin{aligned} p^{\textrm{jump}}_k(\eta )>0, \quad k=1, \dots , K. \end{aligned}$$

Then, by (23) and the telescopic property of the conditional expectations we have for any \(s_1, \dots , s_K\) with \(|t_k-s_k|<\delta , k=1, \dots , K\)

$$\begin{aligned} \begin{aligned} \mathbb {P}&\left( \sup _{t\in [0, T]}|X^{s_1, \dots , s_K}_{t}-\phi ^{s_1, \dots , s_K}_{t}|\le \frac{\varepsilon }{2}\right) \ge \mathbb {P}(A_0\cap B_1\cap A_1\cap \dots \cap B_K\cap A_K)\\&\quad \ge p^{\textrm{trunc}}\left( \eta , f, { y_0, \gamma _1}\right) {\prod _{k=1}^{K}\Big (p^{\textrm{jump}}_k(\eta ) p^{\textrm{trunc}}(\eta , f, y_{k}, \gamma _{k+1}) \Big )}=:q(\eta , \delta )>0. \end{aligned} \end{aligned}$$

Then by (20)

$$\begin{aligned} \begin{aligned} \mathbb {P}(d(X^{s_1, \dots , s_K}, \phi )\le \varepsilon )&\ge \mathbb {P}\left( d(X^{s_1, \dots , s_K}, \phi ^{s_1, \dots , s_K})\le \frac{\varepsilon }{2}\right) \\&\ge \mathbb {P}\left( \sup _{t\in [0, T]}|X^{s_1, \dots , s_K}_{t} -\phi ^{s_1, \dots , s_K}_{t}|\le \frac{\varepsilon }{2}\right) \\&\ge q(\eta , \delta ) \end{aligned} \end{aligned}$$

and by (18)

$$\begin{aligned} \mathbb {P}(d(X, \phi )\le \varepsilon )\ge p(\eta , \delta , \{t_k\}) q(\eta , \delta ). \end{aligned}$$

In these estimates, the choice of \(\delta , \eta \) depends on \(\phi \in {\textbf{S}}^{\textrm{step}}_{0,T,x_0}\) and \(\varepsilon >0\) only; hence, the proof of (11) is complete.

4 Proof of the Key Lemma

We begin the proof of Lemma 3.1 with the following auxiliary result.

Lemma 4.1

For any \(w\in L^\perp \) and \(\eta >0\), there exist \(\zeta \in (0, \eta )\) and a function \(g:\mathbb {R}^d\rightarrow [-\frac{1}{2}, \frac{1}{2}]\) such that \( g(u)=0\hbox { whenever either }|u|\le \zeta \hbox { or }|u|\ge \eta \) and

$$\begin{aligned} \int _{\mathbb {R}^d}(u-u_L)g(u)\, \mu (\textrm{d}u)=w. \end{aligned}$$

Proof

For a given \(0<\zeta <\eta \), denote by \(G_\zeta ^\eta \) the set of all functions \(g:\mathbb {R}^d\rightarrow [-\frac{1}{2}, \frac{1}{2}]\) such that \( g(u)=0\hbox { whenever either }|u|\le \zeta \hbox { or }|u|\ge \eta . \) This set is convex and symmetric; hence,

$$\begin{aligned} V_\zeta ^\eta =\left\{ \int _{\mathbb {R}^d}(u-u_L)g(u)\, \mu (\textrm{d}u), g\in G_\zeta ^\eta \right\} \end{aligned}$$

is a symmetric convex subset of \(L^\perp \), and so is the set

$$\begin{aligned} V_0^\eta :=\bigcup _{\zeta \in (0, \eta )}V_\zeta ^\eta \end{aligned}$$

The statement of the lemma is equivalent to

$$\begin{aligned} V_0^\eta = L^\perp . \end{aligned}$$
(25)

Assuming (25) to fail, we have that \(V_0^\eta \) is a proper symmetric convex subset of \(L^\perp ,\) and thus, there exist \(\ell \in L^\perp \setminus \{0\}\) and \(c\ge 0\) such that

$$\begin{aligned} -c\le v\cdot \ell \le c, \quad v\in V_0^\eta ; \end{aligned}$$
(26)

recall that \(v\cdot \ell \) denotes the scalar product in \(\mathbb {R}^d\). Indeed, if there exists a point \(w\not \in V_0^\eta \) then by the separating hyperplane theorem (e.g., [15, Section 2.5]) there exists an affine hyperplane in \(L^\perp \) which separates non-intersecting convex sets \(V_0^\eta \) and \(\{w\}\), i.e., for some \(\ell \in L^\perp \setminus \{0\}\) and \(c\in {\mathbb {R}}\),

$$\begin{aligned} v\cdot \ell \le c, \quad v\in V_0^\eta , \quad w\cdot \ell \ge c. \end{aligned}$$

Since \(0\in V_0^\eta \), we get \(c\ge 0\), which proves the second inequality in (26); the first inequality then follows by the symmetry of \(V_0^\eta \).

It follows from (26) that, for every \(\zeta \in (0, \eta )\) and \(g\in G_\zeta ^\eta \),

$$\begin{aligned} \begin{aligned} \left| \int _{\mathbb {R}^d}\ell \cdot u\,g(u)\, \mu (\textrm{d}u)\right|&=\left| \int _{\mathbb {R}^d}\ell \cdot (u-u_L)g(u)\, \mu (\textrm{d}u)\right| \\&=\left| \ell \cdot \int _{\mathbb {R}^d}(u-u_L)g(u)\, \mu (\textrm{d}u)\right| \le c, \end{aligned} \end{aligned}$$

where in the first identity we have used that \(u_L\in L\) is orthogonal to \(\ell \in L^\perp \). Taking

$$\begin{aligned} g_\zeta ^\eta (u)=\frac{1}{2}\textrm{sign}\,(\ell \cdot u)1_{\zeta<|u|<\eta }, \end{aligned}$$

we get from the previous inequality that

$$\begin{aligned} \int _{\zeta<|u|<\eta }|\ell \cdot u|\, \mu (\textrm{d}u)\le 2c, \quad \zeta \in (0, \eta ), \end{aligned}$$

and passing to the limit as \(\zeta \rightarrow 0\) we obtain

$$\begin{aligned} \int _{|u|<\eta }|\ell \cdot u|\, \mu (\textrm{d}u)\le 2c<+\infty . \end{aligned}$$

This means that \(\ell \in L\), which contradicts the fact that \(\ell \in L^\perp \setminus \{0\}\). \(\square \)

By Lemma 4.1, for a fixed \(f\in {\textbf{F}}^{\textrm{step}}_{0,T}\) and \(\eta >0\) one can choose a function \(g_t^{f, \eta }(u)\) such that it is piece-wise constant in t, takes values in \([-\frac{1}{2}, \frac{1}{2}]\), satisfies

$$\begin{aligned} \int _{\mathbb {R}^d}(u-u_L)g_t^{f, \eta }(u)\, \mu (\textrm{d}u)=\upsilon _\eta {+}f_t, \quad t\in [0, T] \end{aligned}$$
(27)

with \(\upsilon _\eta \in L^\perp \) given by (14), and, for some \(\zeta ^{f, \eta }>0\), one has \(g_t^{f, \eta }(u)=0\) whenever either \(|u|\ge \eta \) or \(|u|\le \zeta ^{f, \eta }\).
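Explicitly, if f takes the constant values \(w_1, \dots , w_{j+1}\in L^\perp \) on the successive intervals of its partition, one applies Lemma 4.1 to each of the vectors \(\upsilon _\eta +w_i\in L^\perp \) (recall that \(\upsilon _\eta \in L^\perp \) by (14)), obtaining functions \(g^{(i)}\) and levels \(\zeta _i\in (0,\eta )\), and sets

$$\begin{aligned} g_t^{f, \eta }(u)=g^{(i)}(u)\quad \hbox {for } t \hbox { in the } i\hbox {-th interval of the partition}, \qquad \zeta ^{f, \eta }=\min _{i}\zeta _i. \end{aligned}$$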

We write the compensated PPM in (16) in the form \(N(\textrm{d}u, \textrm{d}t)-\mu (\textrm{d}u)\textrm{d}t\) and consider the same SDE with another PPM \(Q^{f, \eta }(\textrm{d}u, \textrm{d}t)\) which has the intensity measure \((1+g_t^{f, \eta }(u))\mu (\textrm{d}u)\textrm{d}t\):

$$\begin{aligned} \textrm{d}Y_t^\eta =\widetilde{b}_\eta (Y_t^\eta )\, \textrm{d}t{-}\sigma (Y_t^\eta )\upsilon _\eta \, \textrm{d}t+\int _{|u|<\eta }c(Y_{t-}^\eta ,u) (Q^{f, \eta }(\textrm{d}u,\textrm{d}t)-\mu (\textrm{d}u)\textrm{d}t). \end{aligned}$$
(28)

Similarly to the notation used in Lemma 3.1, we denote by \(Y_t^{x,S,\eta }, t\ge S\) the solution to (28) with \(Y^\eta _S=x\). Recall that the function \(\phi _t^{x,S, f}\) is defined as the solution to the ODE (17). The following lemma is based on quite standard stochastic calculus estimates.

Lemma 4.2

There exists \(\rho \in (0,1)\) such that, for arbitrary \(\gamma \in (0,1], x\in \mathbb {R}^m\),

$$\begin{aligned} \inf _{x'\in B(x, \gamma \rho ), 0\le S\le Q\le T}\mathbb {P}\left( \sup _{t\in [S, Q]}|Y_t^{x',S,\eta }-\phi _t^{x,S, f}|\le \gamma \right) \rightarrow 1, \quad \eta \rightarrow 0. \end{aligned}$$

Proof

For simplicity of notation, we take \(S=0, Q=T\) and omit the index S. We also consider the scalar case \(m=d=1\); in higher dimensions, similar estimates should be performed coordinate-wise. The stochastic integral part in Eq. (28) can be written as

$$\begin{aligned} \begin{aligned} \int _{|u|<\eta }&c(Y_{t-}^{{\eta }},u) \widetilde{Q}^{f, \eta }(\textrm{d}u,\textrm{d}t) {+}\int _{|u|<\eta }c(Y_{t-}^{{\eta }},u) g_t^{f, \eta }(u)\mu (\textrm{d}u)\textrm{d}t\\&{=\int _{|u|<\eta }c(Y_{t-}^{{\eta }},u) \widetilde{Q}^{f, \eta }(\textrm{d}u,\textrm{d}t)}\\&\qquad {+\int _{|u|<\eta }\Big (\sigma (Y_{t-}^{{\eta }})(u-u_L)+ \sigma (Y_{t-}^{{\eta }})u_L+ r(Y_{t-}^{{\eta }},u)\Big ) g_t^{f, \eta }(u)\mu (\textrm{d}u)\textrm{d}t,} \end{aligned} \end{aligned}$$

where \(\widetilde{Q}^{f, \eta }(\textrm{d}u,\textrm{d}t):=Q^{f, \eta }(\textrm{d}u,\textrm{d}t)-(1+g_t^{f, \eta }(u))\mu (\textrm{d}u)\textrm{d}t\) denotes the corresponding compensated PPM;

hence taking (27) into account we can write this equation in the form

$$\begin{aligned} \textrm{d}Y_t^\eta =\widetilde{b}_t^{f, \eta } (Y_t^\eta )\, \textrm{d}t+\sigma (Y_t^\eta )f_t\, \textrm{d}t+\int _{|u|<\eta }c(Y_{t-}^\eta ,u) \widetilde{Q}^{f, \eta }(\textrm{d}u,\textrm{d}t), \end{aligned}$$
(29)

where

$$\begin{aligned} \begin{aligned} \widetilde{b}_t^{f, \eta }(x)&= \widetilde{b}_\eta (x){+}\int _{|u|<\eta } \sigma (x)u_L g_t^{f, \eta }(u)\mu (\textrm{d}u){+}\int _{|u|<\eta } r(x,u) g_t^{f, \eta }(u)\mu (\textrm{d}u)\\&=\widetilde{b}(x)+\int _{|u|<\eta } \sigma (x)u_L (1{+}g_t^{f, \eta } (u))\mu (\textrm{d}u)\\&\qquad +\int _{|u|<\eta } r(x,u) (1{+}g_t^{f, \eta }(u))\mu (\textrm{d}u). \end{aligned} \end{aligned}$$

Since the functions \(u_L, |u|^\beta \) are integrable w.r.t. \(\mu (du)\) on \(\{|u|\le 1\}\),

$$\begin{aligned} |1{+}g_t^{f, \eta }(u)|\le \frac{3}{2}, \end{aligned}$$

and assumptions \({\textbf{H}}_1, {\textbf{H}}_2\) hold, we have that

$$\begin{aligned} \Delta _\eta ^K:=\sup _{x\in K, t\in [0, T]}|\widetilde{b}_t^{f, \eta }(x)-\widetilde{b}(x)|\rightarrow 0, \quad \eta \rightarrow 0 \end{aligned}$$

for any compact subset \(K\subset \mathbb {R}^m\). Denote \(\phi _t=\phi ^{x,f}_t\) and take

$$\begin{aligned} K=\textrm{closure}\Big (\{\phi _s, s\in [0, T]\} \Big ), \quad K'=\{y: \textrm{dist}(y, K)\le 1\}. \end{aligned}$$

By the assumptions \({\textbf{H}}_1, {\textbf{H}}_2\), there exists a constant \(C_{K'}\) such that

$$\begin{aligned} |c(y,u)|\le C_{K'}|u|, \quad |u|\le 1, \quad y\in K'. \end{aligned}$$

Denote

$$\begin{aligned} \tau _{K'}=\inf \{t: Y_t^{x',\eta }\not \in K'\} \end{aligned}$$

with the usual convention \(\inf \varnothing =T\). We have

$$\begin{aligned} \begin{aligned}&Y_t^{x',\eta }-\phi _t =x'-x+\int _0^t\Big (\widetilde{b}_s^{f, \eta }(\phi _s)-\widetilde{b}(\phi _s)\Big )\, ds +{\int _0^t\Big (\sigma (Y^{x',\eta }_s)-\sigma (\phi _s)\Big )f_s\, ds}\\&\quad +\int _0^t\Big (\widetilde{b}_s^{f, \eta }(Y^{x',\eta }_s)-\widetilde{b}_s^{f, \eta }(\phi _s)\Big )\, ds+\int _0^t\int _{|u|<\eta }c(Y_{s-}^{x',\eta },u) \widetilde{Q}^{f, \eta }(\textrm{d}u,\textrm{d}s). \end{aligned} \end{aligned}$$

By the Doob maximal inequality and the Itô isometry, for any \(\varepsilon >0\)

$$\begin{aligned} \begin{aligned} \mathbb {P}\left( \sup _{t\in [0, \tau _{K'}]}\left| \int _0^t\int _{|u|<\eta }c(Y_{s-}^\eta ,u) \widetilde{Q}^{f, \eta }(\textrm{d}u,\textrm{d}s)\right| \ge \varepsilon \right)&\le \frac{1}{\varepsilon ^2} \mathbb {E}\left( \int _0^{\tau _{K'}}\int _{|u|<\eta }c(Y_{s-}^\eta ,u) \widetilde{Q}^{f, \eta }(\textrm{d}u,\textrm{d}s)\right) ^2\\&=\frac{1}{\varepsilon ^2} \mathbb {E}\int _0^{\tau _{K'}}\int _{|u|<\eta }c(Y_{s-}^\eta ,u)^2 (1+g_s^{f, \eta }(u))\mu (\textrm{d}u)\textrm{d}s\\&\le \frac{3TC_{K'}^2}{2\varepsilon ^2}\int _{|u|<\eta }|u|^2\mu (\textrm{d}u)\rightarrow 0, \qquad \eta \rightarrow 0; \end{aligned} \end{aligned}$$

in the last inequality we have used that \(1+g_s^{f, \eta }(u)\le \frac{3}{2}\). It is easy to check that the functions \(\widetilde{b}_t^{f, \eta }\) are uniformly Lipschitz, i.e., there exists L such that, for any \(t\in [0, T], \eta \in (0,1]\)

$$\begin{aligned} |\widetilde{b}_t^{f, \eta }(x)-\widetilde{b}_t^{f, \eta }(y)|\le L|x-y|. \end{aligned}$$

In addition, \(\sigma \) is Lipschitz and a piece-wise constant function \(f\in {\textbf{F}}^{\textrm{step}}_{0,T}\) is bounded; hence, we can assume that L is large enough for

$$\begin{aligned} |\sigma (x)f_t-\sigma (y)f_t|\le L|x-y| \end{aligned}$$

to hold. Then on the set

$$\begin{aligned} A_{\varepsilon , \eta }:=\left\{ \sup _{t\in [0, \tau _{K'}]}\left| \int _0^t\int _{|u|<\eta }c(Y_{s-}^\eta ,u) \widetilde{Q}^{f, \eta }(\textrm{d}u,\textrm{d}s)\right| <\varepsilon \right\} \end{aligned}$$

we have

$$\begin{aligned} |Y_t^{x',\eta }-\phi _t|\le |x'-x|+T\Delta _\eta ^K+\varepsilon +{2}L\int _0^t|Y^{x',\eta }_s-\phi _s|\, ds, \quad t\in [0, \tau _{K'}], \end{aligned}$$

which by the Gronwall inequality yields

$$\begin{aligned} \sup _{t\in [0, \tau _{K'}]}|Y_t^{x',\eta }-\phi _t| \le \Big (|x'-x|+T\Delta _\eta ^K+\varepsilon \Big )e^{{2}LT}. \end{aligned}$$
(30)

Take

$$\begin{aligned} \rho =\frac{1}{2}e^{{-2}LT}, \end{aligned}$$

and \(\varepsilon , \eta \) small enough for

$$\begin{aligned} \Big (T\Delta _\eta ^K+\varepsilon \Big )e^{{2}LT}<{\frac{1}{2}}. \end{aligned}$$

Then, for any \(\gamma \in (0,1]\) and \(x'\) with \(|x'-x|<\rho \gamma \) we have on the set \(A_{\varepsilon , \eta }\)

$$\begin{aligned} \sup _{t\in [0, \tau _{K'}]}|Y_t^{x',\eta }-\phi _t|<1. \end{aligned}$$

Because trajectories of \(Y^{x',\eta }\) are right continuous, this yields that, on this set, \(\tau _{K'}=T\) and actually

$$\begin{aligned} \sup _{t\in [0,T]}|Y_t^{x',\eta }-\phi _t| \le \Big (|x'-x|+T\Delta _\eta ^K+\varepsilon \Big )e^{{2}LT}. \end{aligned}$$
(31)

Now, for a given \(\gamma \), we take \(\varepsilon <\frac{1}{4}\gamma e^{-{2}LT}\) and we get that for \(\eta \) small enough

$$\begin{aligned} \inf _{x'\in B(x, \rho \gamma )}\mathbb {P}(\sup _{t\in [0,T]}|Y_t^{x',\eta }-\phi _t|\le \gamma )\ge \mathbb {P}(A_{\varepsilon , \eta }), \end{aligned}$$

and because

$$\begin{aligned} \mathbb {P}(A_{\varepsilon , \eta })\rightarrow 1, \quad \eta \rightarrow 0 \end{aligned}$$

this completes the proof. \(\square \)

Now we are ready to complete the proof of the Key Lemma. Denote

$$\begin{aligned} \begin{aligned} {\mathscr {E}}^{f, \eta }&=\exp \left( -\int _{\mathbb {R}^d\times [0, T]}\log (1+g_t^{f, \eta }(u))\widetilde{Q}^{f, \eta }(\textrm{d}u, \textrm{d}t)\right. \\&\quad \left. +\int _{\mathbb {R}^d\times [0, T]}\Big (g_t^{f, \eta }(u)-\log (1+g_t^{f, \eta }(u))\Big )\, \mu (\textrm{d}u)\textrm{d}t\right) , \end{aligned} \end{aligned}$$

recall that by construction \(1+g_t^{f, \eta }(u)\in [\frac{1}{2}, \frac{3}{2}]\) and \(1+g_t^{f, \eta }(u)=1\) whenever either \(|u|\ge \eta \) or \(|u|\le \zeta ^{f, \eta }\). Hence, the stochastic integral in the exponent is well defined in the standard (quadratic) Itô sense. The second (deterministic) integral is even bounded; hence, it is easy to see that

$$\begin{aligned} P\left( {\mathscr {E}}^{f, \eta }<\frac{1}{N}\right) =P(\log {\mathscr {E}}^{f, \eta }<-\log N)\rightarrow 0, \quad N\rightarrow \infty \end{aligned}$$
(32)

for any \(\eta \in (0,1]\).

On the other hand, the classical result by Skorokhod [16] tells us that the laws of the PPMs \(N(\textrm{d}u, \textrm{d}t)\) and \(Q^{f, \eta }(\textrm{d}u, \textrm{d}t)\) are equivalent, and \({\mathscr {E}}^{f, \eta }\) equals the Radon–Nikodym derivative of \(\textrm{Law}(N)\) w.r.t. \(\textrm{Law}(Q^{f, \eta })\) evaluated at \(Q^{f, \eta }\). Recall that \(X^{x',S,\eta , \textrm{trunc}}\) and \(Y^{x',S,\eta }\) are defined as the strong solutions to the same SDE with the noises N and \(Q^{f, \eta }\), respectively. Hence, we can treat them as images of \(N, Q^{f, \eta }\) under one and the same measurable mapping, which yields the identity

$$\begin{aligned} \mathbb {P}\left( \sup _{t\in [S, Q]}|X_t^{x',S,\eta , \textrm{trunc}}-\phi _t^{x,S, f}|\le \gamma \right) =\mathbb {E}\,1_{\sup _{t\in [S, Q]}|Y_t^{x',S,\eta }-\phi _t^{x,S, f}|\le \gamma }\,{\mathscr {E}}^{f, \eta }. \end{aligned}$$

By Lemma 4.2, for a given \(x\in \mathbb {R}^m\), \(\gamma >0\), and \(f\in {\textbf{F}}^{\textrm{step}}_{0,T}\), there exists \(\eta ^{f,x, \gamma }>0\) such that

$$\begin{aligned} \inf _{x'\in B(x, \gamma \rho ), 0\le S\le Q\le T}\mathbb {P}\left( \sup _{t\in [S, Q]}|Y_t^{x',S,\eta }-\phi _t^{x,S, f}|\le \gamma \right) \ge \frac{2}{3} \end{aligned}$$
(33)

for any \(\eta \in (0, \eta ^{f,x, \gamma }]\). By (32), for any such \(\eta \) there exists \(N^{f, \eta }\) large enough for

$$\begin{aligned} P({\mathscr {E}}^{f, \eta }\ge \frac{1}{N^{f, \eta }})\ge \frac{2}{3}. \end{aligned}$$

Then, for any \(x'\in B(x, \gamma \rho ), 0\le S\le Q\le T\) we have

$$\begin{aligned} \mathbb {P}\left( \sup _{t\in [S, Q]}|Y_t^{x',S,\eta }-\phi _t^{x,S, f}|\le \gamma , {\mathscr {E}}^{f, \eta }\ge \frac{1}{N^{f, \eta }} \right) \ge \frac{1}{3} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \mathbb {P}\left( \sup _{t\in [S, Q]}|X_t^{x',S,\eta , \textrm{trunc}}-\phi _t^{x,S, f}|\le \gamma \right)&\\ {}&\ge \frac{1}{N^{f, \eta }}\mathbb {P}\left( \sup _{t\in [S, Q]}|Y_t^{x',S,\eta }-\phi _t^{x,S, f}|\le \gamma , {\mathscr {E}}^{f, \eta }\ge \frac{1}{N^{f, \eta }} \right) \\ {}&\ge \frac{1}{3N^{f, \eta }}>0, \end{aligned} \end{aligned}$$

which completes the proof. \(\square \)