1 Introduction

Stochastic volatility models (in the simplest one-dimensional case) are of the form

$$\begin{aligned} dS_{t}= & {} \nu _{1}(S_{t},V_{t})S_{t}\, dt+ V_{t}S_{t}\, d{\overline{W}}_{t}, \end{aligned}$$
(1)

where \({\overline{W}}\) is a Brownian motion, \(\nu _1\) is a suitable function and S describes the (discounted) price of an asset with volatility process V.

The present paper is about the long-term behaviour of S. In the Markovian case, V satisfies a stochastic differential equation (SDE),

$$\begin{aligned} dV_{t}= & {} \nu _{2}(V_{t})\, dt+\sigma (V_{t})\, dB_{t}, \end{aligned}$$
(2)

where B is another Brownian motion, possibly correlated with \({\overline{W}}\); \(\nu _{2},\sigma \) are suitable functions. In such diffusion models there is an arsenal of techniques from Markov process theory to show that the law of \(S_t\) tends to a limit as \(t\rightarrow \infty \), see e.g. [12, 25, 32, 33, 38, 41, 42], Chapter 20 of [31] and Subsection 7.1 of [22].

Recently, however, fractional stochastic volatility models have become popular (see [9, 15, 45]), where the process V is not Markovian. For instance,

$$\begin{aligned} V_t=\exp \left( J_t\right) ,\quad J_t:=\int _{-\infty }^t K(t-s)dB_s, \end{aligned}$$
(3)

with some (two-sided) Brownian motion B, and a suitable function \(K:{\mathbb {R}}_+\rightarrow {\mathbb {R}}\). In such a setting the question of stochastic stability becomes difficult: one cannot rely on the usual Markovian techniques, and there seem to be no results in the literature that would imply the convergence of the law of \(S_{t}\) as \(t\rightarrow \infty \) at this level of generality.
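To make the specification (3) concrete, here is a minimal simulation sketch (our own illustration, not taken from the cited literature): it discretizes the moving-average integral on a grid, truncates the integration over \((-\infty ,t]\) to a finite window and uses the square-integrable gamma kernel \(K(u)=u^{H-1/2}\mathrm {e}^{-\lambda u}\) as an assumed example of a suitable K.

```python
import numpy as np

def simulate_vol(T=1.0, n=1000, H=0.1, lam=1.0, window=5.0, seed=0):
    """Simulate V_t = exp(J_t), J_t = int_{-infty}^t K(t-s) dB_s, on a time grid.

    Illustrative assumptions (not from the paper): the square-integrable gamma
    kernel K(u) = u**(H - 1/2) * exp(-lam*u), and truncation of the stochastic
    integral to the window (t - window, t]."""
    rng = np.random.default_rng(seed)
    dt = T / n
    m = int(window / dt)                      # pre-history before time 0
    dB = rng.normal(0.0, np.sqrt(dt), m + n)  # two-sided Brownian increments
    s = (np.arange(m + n) - m) * dt           # left endpoints of the increments
    t_grid = dt * np.arange(1, n + 1)
    J = np.empty(n)
    for i, t in enumerate(t_grid):
        mask = s < t                          # increments contributing to J_t
        u = t - s[mask]                       # lags t - s, all positive
        J[i] = np.sum(u ** (H - 0.5) * np.exp(-lam * u) * dB[mask])
    return t_grid, np.exp(J)

t, V = simulate_vol()
print(f"V on [0,1]: min {V.min():.3f}, max {V.max():.3f}")
```

Smaller H produces rougher trajectories of V; the truncation window controls how much of the infinite past is retained.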

We now explain our motivations for studying such models. Asset price processes often show mean-reversion (for instance, commodities or commodity futures, see [6, 8]). Optimal investment problems for such assets were considered in [17], see also the study [13] on asymptotic arbitrage. Long-term investments may also be studied in the framework of ergodic, risk-sensitive or adaptive control (see e.g. [5, 11, 24, 36]). All these approaches require that the law of \(S_{t}\) should converge to a steady state as \(t\rightarrow \infty \). Long-term investment problems for fractional processes were treated in [18, 34], but these studies do not cover fractional stochastic volatility models.

The present paper proves that – under mean-reversion and smoothness conditions on the drift of S and integrability assumptions on \(V_{0}\), \(S_{0}\) – the stochastic system \((S_{t},V_{t})\) converges to an invariant probability, independent of the initialization \(S_{0}\). A multi-asset framework is treated and \(B,{\overline{W}}\) are allowed to have a stochastic correlation. Our arguments are based on a new coupling construction for (discrete-time) Markov processes in random environments.

In the extant literature on fractional volatility, asset dynamics is most often considered for purposes of derivative pricing; [4, 14, 20] are early examples. These papers thus work under the risk-neutral measure, which corresponds to taking \(\nu _{1}\equiv 0\) in (1). As we have in mind a different class of problems (portfolio optimization), we work under the physical probability, where \(\nu _{1}\) is non-zero.

In Sect. 2 we rigorously formulate our main results, Theorems 2.9 and 2.10. A novel (discrete-time) coupling method is introduced in Sect. 3. As a warm-up, it is first presented for (ordinary) Markov chains in Sect. 3.1. Sect. 3.2 develops the same ideas in the more involved setting of Markov chains in random environments. In Sect. 4 we prove the main results, combining advanced Malliavin calculus techniques with the discrete-time construction of Sect. 3.2.

2 Results

The scalar product in finite-dimensional Euclidean spaces is denoted by \(\langle \cdot ,\cdot \rangle \) and the corresponding norm by \(|\cdot |\), where the dimension of the space may vary. For a matrix A, \(A^{*}\) denotes its transpose and |A| its operator norm.

All the random objects in the present paper live on a fixed probability space \((\Omega ,{\mathscr {F}},P)\). For a Polish space \({\mathscr {Z}}\), its Borel sigma-algebra is denoted \({\mathscr {B}}({\mathscr {Z}})\). If \(Z:\Omega \rightarrow {\mathscr {Z}}\) is \({\mathscr {F}}/{\mathscr {B}}({\mathscr {Z}})\)-measurable (that is, if Z is a \({\mathscr {Z}}\)-valued random variable) then \({\mathscr {L}}(Z)\) denotes its law on \({\mathscr {B}}({\mathscr {Z}})\).

Fix \(d,m\in {\mathbb {N}}\setminus \{0\}\) with \(d\le m\). The number of assets is d and the dimension of the driving noise m. For every \(k\ge 1\), let \({\mathscr {W}}^k\) denote the set of continuous \({\mathbb {R}}^{k}\)-valued functions on \({\mathbb {R}}\), which is a Polish space under the metric

$$\begin{aligned} {\mathbf {d}}_{k}(f,g):=\sum _{i=-\infty }^{\infty }\frac{1}{2^{|i|}}\left[ {} 1\wedge \sup _{u\in [i,i+1]}|f(u)-g(u)|\right] ,\quad f,g\in {\mathscr {W}}^k. \end{aligned}$$

Let \(B_t\), \(t\in {\mathbb {R}}\) be a two-sided m-dimensional Brownian motion (i.e. \(B_t\), \(B_{-t}\), \(t\in {\mathbb {R}}_+\) are independent standard m-dimensional Brownian motions); let \({\mathscr {G}}_t\), \(t\in {\mathbb {R}}\) denote its completed natural filtration. Let \({\mathscr {V}}\) denote the set of \(d\times d\) non-singular matrices, \({\mathscr {R}}\) the set of \(d\times m\) matrices r satisfying \(rr^{*}<I\), where I is the d-dimensional identity matrix and \(A<B\) for symmetric, positive semidefinite \(d\times d\) matrices A, B means that \(B-A\) is positive definite. Similarly, \(A\le B\) means that \(B-A\) is positive semidefinite and \(\sqrt{A}\) denotes the usual (symmetric) square root of positive semidefinite matrices.

Let \(V_t\), \(t\in {\mathbb {R}}\) (resp. \(\rho _t\), \(t\in {\mathbb {R}}\)) be \({\mathscr {V}}\)-valued (resp. \({\mathscr {R}}\)-valued) processes with continuous trajectories.

Notice that \({\mathbf {B}}_{t}:=(B_{t}-B_{t+s})_{s\in {\mathbb {R}}}\) (resp. \({\mathbf {V}}_{t}:=(V_{t+s})_{s\in {\mathbb {R}}}\) and \({\mathbf {R}}_{t}:=(\rho _{t+s})_{s\in {\mathbb {R}}}\)) can be naturally regarded as a \({\mathscr {W}}^m\)-valued (resp. \({\mathscr {W}}^{d\times d}\)-valued and \({\mathscr {W}}^{d\times m}\)-valued) random process indexed by \(t\in {\mathbb {R}}\).

Assumption 2.1

There are measurable functions \(F_{1}:{\mathscr {W}}^{m}\rightarrow {\mathscr {W}}^{d\times d}\), \(F_{2}:{\mathscr {W}}^{m}\rightarrow {\mathscr {W}}^{d\times m}\) such that \({\mathbf {V}}_{t}=F_{1}({\mathbf {B}}_{t})\), \({\mathbf {R}}_{t}=F_{2}({\mathbf {B}}_{t})\). Furthermore, \((V_{t},\rho _{t})\), \(t\in {\mathbb {R}}\) is adapted to \({\mathscr {G}}_{t}\), \(t\in {\mathbb {R}}\).

In plain English, \((V_{t},\rho _{t})\) is a nonanticipative functional of the increments of the Brownian motion B up to t. A specification like (3) is a typical example. Under Assumption 2.1, \(({\mathbf {V}}_{t},{\mathbf {R}}_{t},{\mathbf {B}}_{t})\), \(t\in {\mathbb {R}}\) is a stationary process in the strong sense.

Let \(W_t\), \(t\in {\mathbb {R}}_+\) be another, d-dimensional standard Brownian motion with (completed) natural filtration \({\mathscr {F}}_t\), \(t\in {\mathbb {R}}_+\). Instead of prices, it is more convenient to work with log-prices. Hence we consider d financial assets whose log-price is given by the d-dimensional process \(L_t\), \(t\in {\mathbb {R}}_+\) which is the solution of the stochastic differential equation

$$\begin{aligned} dL_t=\zeta (L_t,V_{t})\, dt + V_t \rho _t \, dB_t+V_t \sqrt{I-\rho _t\rho _{t}^{*}}\, dW_t, \end{aligned}$$
(4)

where \(L_0\) is a random variable and \(\zeta :{\mathbb {R}}^{d}\times {\mathscr {V}}\rightarrow {\mathbb {R}}^{d}\) is a measurable function.

Assumption 2.2

Let \({\mathscr {G}}_{\infty }\) be independent of \({\mathscr {F}}_{\infty }\). Let \(L_{0}=l(R,{\mathbf {V}}_{0},{\mathbf {R}}_{0})\) for some measurable \(l:[0,1]\times {\mathscr {W}}^{d\times d}\times {\mathscr {W}}^{d\times m}\rightarrow {\mathbb {R}}^{d}\) and [0, 1]-uniformly distributed random variable R, which is assumed to be independent of \({\mathscr {G}}_{\infty }\vee {\mathscr {F}}_{\infty }\).

Remark 2.3

An arbitrary joint law for \((L_{0},{\mathbf {V}}_{0},{\mathbf {R}}_{0})\) can be realized with a suitable l, hence Assumption 2.2 is not restrictive at all. In fact, for practical applications one may assume \(L_{0}\) to be constant.

Assumption 2.2, together with Assumptions 2.4 and 2.7, stipulated below, guarantees a unique \(({\mathscr {F}}_{t}\vee {\mathscr {G}}_{t})_{t\in {\mathbb {R}}_{+}}\)-adapted solution to (4), by Theorem 7 on page 82 of [28].

Assumption 2.4

The function \(\zeta (x,v)\) is twice continuously differentiable in its first variable and \(\partial _x\zeta ,\partial _{xx}\zeta \) are bounded. Furthermore, there is \(K>0\) such that \(|\zeta (x,v_{1})-\zeta (x,v_{2})|\le K(1+|v_{1}|+|v_{2}|)|v_{1}-v_{2}|\) for all \(x\in {\mathbb {R}}^{d}\), \(v_1,v_2\in {\mathscr {V}}\) (polynomial Lipschitz condition in v).

The following mean-reversion (or dissipativity) condition is rather standard, also in a non-Markovian context, see e.g. [21].

Assumption 2.5

There exist \(\alpha ,\beta >0\), \(\xi \ge 2\) such that

$$\begin{aligned} \langle x,\zeta (x,v)\rangle \le -\alpha |x|^{2}+\beta (1+|v|^{\xi }), \ x\in {\mathbb {R}}^d,\ v\in {\mathscr {V}}. \end{aligned}$$

Example 2.6

We briefly comment on the meaning of Assumptions 2.4 and 2.5 in a simple case with one asset (\(d=1\)) whose price satisfies

$$\begin{aligned} dS_{t} = \nu _{1}(S_{t})S_{t}\, dt+ V_{t}S_{t}\, d{\overline{W}}_{t} \end{aligned}$$

with some \(S_{0}>0\), with an \(({\mathbb {R}}\setminus \{0\})\times (-1,1)\)-valued stationary process \((V_{t},\rho _{t})\) and Brownian motion \(d{\overline{W}}_{t}=\rho _t\, dB_t+\sqrt{1-\rho _t^2}\, dW_t\). Let the function \(\nu _{1}\) be such that \({\bar{\nu }}_{1}(x):=\nu _{1}(\exp (x))\) is twice continuously differentiable with \({{\bar{\nu }}}_{1}'\), \({{\bar{\nu }}}_{1}''\) bounded and satisfying

$$\begin{aligned} x{{\bar{\nu }}}_{1}(x)\le -{\bar{\alpha }} |x|^{2}+{\bar{\beta }},\quad x\in {\mathbb {R}} \end{aligned}$$

with some \({\bar{\alpha }},{\bar{\beta }}>0\). Then \(L_{t}:=\ln (S_{t})\) has dynamics

$$\begin{aligned} dL_{t}=\left[ {{\bar{\nu }}}_{1}(L_{t})-\frac{V_{t}^{2}}{2}\right] \, dt+ V_{t}\, d{\overline{W}}_{t} \end{aligned}$$

and \(\zeta (x,v):={{\bar{\nu }}}_{1}(x)-v^{2}/2\) satisfies Assumption 2.5 (with \(\xi =2\) and with suitable \(\alpha ,\beta \)). Assumption 2.4 also holds true. This example shows how the Lipschitz-continuity condition on v naturally arises in Assumption 2.4. It also shows that the most relevant case is where \(\xi =2\).
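For completeness, the drift of \(L_{t}\) is obtained from the Itô step (using \(d\langle S\rangle _{t}=V_{t}^{2}S_{t}^{2}\, dt\) and \(\nu _{1}(S_{t})={{\bar{\nu }}}_{1}(L_{t})\)):

$$\begin{aligned} dL_{t}=\frac{dS_{t}}{S_{t}}-\frac{d\langle S\rangle _{t}}{2S_{t}^{2}} =\nu _{1}(S_{t})\, dt+V_{t}\, d{\overline{W}}_{t}-\frac{V_{t}^{2}}{2}\, dt. \end{aligned}$$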

Finally, we stipulate moment conditions on the volatility process and on the initial condition.

Assumption 2.7

Let \(E[|V_{0}|^{\max \{\xi ,4\}}]<\infty \) hold for the \(\xi \) of Assumption 2.5.

Assumption 2.8

Let \(E[|L_{0}|^2]<\infty \) hold.

Our principal result is now presented.

Theorem 2.9

Let Assumptions 2.1, 2.2, 2.4, 2.5, 2.7 and 2.8 be in force. Then

$$\begin{aligned} {\mathscr {L}}(L_t,{\mathbf {V}}_{t},{\mathbf {R}}_{t})\rightarrow \mu _{\sharp },\ t\rightarrow \infty \end{aligned}$$
(5)

holds for some probability \(\mu _{\sharp }\) on \({\mathscr {B}}({\mathbb {R}}^{d}\times {\mathscr {W}}^{d\times d}\times {\mathscr {W}}^{d\times m})\), in the sense of weak convergence of probability measures. The probability \(\mu _{\sharp }\) does not depend on \(L_0\) and it is invariant in the following sense: if \({\mathscr {L}}(L_0,{\mathbf {V}}_{0},{\mathbf {R}}_{0})=\mu _{\sharp }\) then \({\mathscr {L}}(L_t,{\mathbf {V}}_{t},{\mathbf {R}}_{t})=\mu _{\sharp }\) for every \(t>0\).

In the following theorem, instead of Assumption 2.5 one assumes the weaker condition (6) below. This comes at the price of strengthening Assumptions 2.7 and 2.8 to (7) below.

Theorem 2.10

Let Assumptions 2.2 and 2.4 hold, let

$$\begin{aligned} \langle x,\zeta (x,v)\rangle \le -\alpha |x|^{1+\gamma }+\beta (1+|v|^{\xi }), \ x\in {\mathbb {R}}^d,\ v\in {\mathscr {V}} \end{aligned}$$
(6)

hold for some \(\alpha ,\beta >0\), \(\xi \ge 2\) and \(0<\gamma <1\). Let us assume

$$\begin{aligned} E\left[ \mathrm {e}^{\kappa _{0} |L_{0}|}\right]<\infty ,\quad {} E\left[ \mathrm {e}^{\kappa _{0} |V_{0}|^{\xi /\gamma }}\right] <\infty \end{aligned}$$
(7)

for some \(\kappa _{0}>0\). Then the conclusions of Theorem 2.9 hold.

3 Coupling constructions

Following the conventions of measure theory, the total variation norm of a finite signed measure \(\mu \) on \({\mathscr {B}}({\mathscr {Z}})\) is defined as

$$\begin{aligned} ||\mu ||_{TV}:=\sup _{\phi \in \Phi _{1}}\left| \int _{{\mathscr {Z}}}\phi (z)\mu (dz)\right| , \end{aligned}$$

where \(\Phi _{1}\) denotes the family of measurable functions \(\phi :{\mathscr {Z}}\rightarrow [-1,1]\). The underlying \({\mathscr {Z}}\) may vary but it will always be clear from the context. Note that for \({\mathscr {Z}}\)-valued random variables \(Z_{1}\), \(Z_{2}\) we always have

$$\begin{aligned} ||{\mathscr {L}}(Z_{1})-{\mathscr {L}}(Z_{2})||_{TV}\le 2P(Z_{1}\ne Z_{2}). \end{aligned}$$
(8)

3.1 Markov chains

First we work in the setting of discrete-time Markov chains with a general state space. Our main ideas are explained in this simple context before turning to Markov chains in random environments in the next subsection.

Proofs of stochastic stability for Markov chains are usually based on two ingredients, see e.g. [31]. First, it is checked (using Lyapunov functions) that the chain returns often enough to a fixed set C. Second, a minorization condition holds on C for the transition kernel, so that couplings occur with probabilities that can be estimated. Such C are called “small sets”.

When the state space is \({\mathbb {R}}^d\), it often happens that all compact sets are small. This is the case for both discretized and discretely sampled non-degenerate diffusions. The coupling method of the present subsection exploits this property, formulated in more abstract terms. Otherwise we rely on standard “coupling from the past” ideas, see e.g. [10, 37].

Although Theorem 3.4 below seems to be new, its statement is hardly surprising. Its proof, on the contrary, presents original ideas which become fruitful in the more general setting of the next subsection, where existing results do not apply. We construct couplings on a sequence of small sets and then exploit (assuming a certain form of tightness) that the chain stays in these sets with large enough probabilities. The crucial methodological contribution of this approach is that, instead of analysing return times to a set C (which have a complicated dependence structure due to the random environment), one can repeatedly use simple, one-step estimates.

Another approach based on one-step estimates was presented in [23], using a contraction argument in a suitable metric. When applying it in the presence of a random environment, however, the metric to be used becomes dependent on that environment, which limits the applicability of that method, see [16].

Let \({\mathscr {X}}\) be a Polish space. Let \(Q(\cdot ,\cdot )\) be a probabilistic kernel, i.e. \(Q(\cdot ,A)\) is measurable for each \(A\in {\mathscr {B}}({\mathscr {X}})\) and \(Q(x,\cdot )\) is a probability law for each \(x\in {\mathscr {X}}\). Let \(X_t\), \(t\in {\mathbb {N}}\) denote a Markov chain with transition kernel Q, started from some \(X_0\). We now define the set of initial laws starting from which the chain satisfies a tightness-like assumption. We assume in the sequel that we are given a non-decreasing sequence of sets \({\mathscr {X}}_n\in {\mathscr {B}}({\mathscr {X}})\), \(n\in {\mathbb {N}}\) with \({\mathscr {X}}_0\ne \emptyset \).

Definition 3.1

Let \({\mathscr {P}}_{b}\) denote the set of probabilities \(\mu \) on \({\mathscr {B}}({\mathscr {X}})\) such that if \(X_{0}\) has law \(\mu \) then

$$\begin{aligned} \lim _{n\rightarrow \infty }\sup _{t\in {\mathbb {N}}}P(X_{t}\notin {\mathscr {X}}_{n}) = 0. \end{aligned}$$

Notice that \({\mathscr {P}}_{b}\) might well be empty. We write \(X_{0}\in {\mathscr {P}}_{b}\) when we indeed mean \({\mathscr {L}}(X_{0})\in {\mathscr {P}}_{b}\). We stipulate next that minorization conditions should hold on each of the sets \({\mathscr {X}}_{n}\).

Assumption 3.2

There exists a sequence \(\alpha _{n}\in (0,1]\), \(n\in {\mathbb {N}}\) and a sequence of probability measures \(\nu _{n}\), \(n\in {\mathbb {N}}\) such that

$$\begin{aligned} Q(x,A)\ge \alpha _{n}\nu _n(A),\ A\in {\mathscr {B}}({\mathscr {X}}),\ x\in {\mathscr {X}}_n,\ n\in {\mathbb {N}}. \end{aligned}$$
(9)

We recall a representation result for kernels satisfying the minorization condition (9), in terms of random mappings that are constant on the respective \({\mathscr {X}}_{n}\) with probability at least \(\alpha _{n}\).

Lemma 3.3

Let Assumption 3.2 be in force. Let \({\mathbf {U}}\) be a uniform random variable on [0, 1]. For each \(n\in {\mathbb {N}}\), there exists a measurable mapping \(T^{n}(\cdot ,\cdot ):[0,1]\times {\mathscr {X}}\rightarrow {\mathscr {X}}\) satisfying

$$\begin{aligned} Q(x,A)=P(T^{n}({\mathbf {U}},x)\in A),\ x\in {\mathscr {X}},\ A\in {\mathscr {B}}({\mathscr {X}}), \end{aligned}$$

such that for all \(u\in [0,\alpha _{n}]\),

$$\begin{aligned} T^{n}(u,x_1)=T^{n}(u,x_2)\quad \text{ for } \text{ all }\quad x_1,x_2\in {\mathscr {X}}_n. \end{aligned}$$
(10)

Proof

Such a representation is well-known, see page 228 in [7]. \(\square \)
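To make Lemma 3.3 concrete, here is a minimal numerical sketch of such a representation (our own construction, not the one of [7]), for the Gaussian AR(1) kernel \(Q(x,\cdot )={\mathscr {N}}(\gamma x,1)\) of Example 3.7 below, with \({\mathscr {X}}_n=[-n,n]\), \(\nu _n=\frac{1}{2}Leb\vert _{[-1,1]}\) and the \(\alpha _n\) of (19): for \(u\le \alpha _n\) the output is a \(\nu _n\)-quantile, independent of \(x\in {\mathscr {X}}_n\), otherwise a quantile of the residual kernel, so that \(T^{n}({\mathbf {U}},x)\) still has law \(Q(x,\cdot )\).

```python
import math
from statistics import NormalDist

N01 = NormalDist()            # Q(x, .) = N(gamma_ * x, 1), cf. Example 3.7 below
gamma_ = 0.5                  # illustrative AR(1) coefficient, 0 < gamma_ < 1

def alpha(n):                 # minorization constant on X_n = [-n, n], cf. (19)
    return math.sqrt(2 / math.pi) * math.exp(-((gamma_ * n + 1) ** 2) / 2)

def residual_quantile(p, x, n):
    """Quantile of the residual kernel (Q(x,.) - alpha_n nu_n) / (1 - alpha_n),
    where nu_n = (1/2) Leb|[-1,1]; computed by bisection on its CDF."""
    a = alpha(n)
    def cdf(z):
        nu_cdf = min(max((z + 1) / 2, 0.0), 1.0)
        return (N01.cdf(z - gamma_ * x) - a * nu_cdf) / (1 - a)
    lo, hi = gamma_ * x - 12.0, gamma_ * x + 12.0
    for _ in range(80):                       # bisection: cdf is non-decreasing
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2

def T(u, x, n):
    """One-uniform representation: T(U, x) ~ Q(x, .) for U ~ Uniform[0, 1],
    and T(u, .) is constant on X_n whenever u <= alpha_n, as in (10)."""
    a = alpha(n)
    if abs(x) <= n:
        if u <= a:
            return 2 * (u / a) - 1            # nu_n-quantile: the same for all x in X_n
        return residual_quantile((u - a) / (1 - a), x, n)
    return gamma_ * x + N01.inv_cdf(u)        # outside X_n: plain quantile of Q(x, .)

n = 2
u = 0.5 * alpha(n)
print(T(u, -1.5, n), T(u, 1.5, n))            # equal: a coupling event has occurred
```

The minorization guarantees that the residual density is non-negative for \(x\in {\mathscr {X}}_{n}\), so the bisection above inverts a genuine distribution function.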

Theorem 3.4

Let Assumption 3.2 hold. Then there exists a probability \(\mu _{*}\) on \({\mathscr {B}}({\mathscr {X}})\) such that

$$\begin{aligned} ||{\mathscr {L}}(X_t)- \mu _{*}||_{TV}\rightarrow 0,\ t\rightarrow \infty \end{aligned}$$
(11)

holds for every \(X_{0}\in {\mathscr {P}}_{b}\).

Proof

Theorem 3.4 follows from Theorem 3.10 below (choosing \({\mathscr {Y}}\) a singleton). Nonetheless we provide a proof in the present, simple setting, in order to elucidate the main ideas.

Fix \(\varepsilon >0\) and choose \(n=n(\varepsilon )\) so large that

$$\begin{aligned} \sup _{t\in {\mathbb {N}}}P(X_{t}\notin {\mathscr {X}}_{n}) \le \varepsilon . \end{aligned}$$
(12)

We estimate coupling probabilities on \({\mathscr {X}}_{n}\), using independent copies of the random mappings constructed in Lemma 3.3 above.

Let \({\mathbf {U}}_{k}\), \(k\in -{\mathbb {N}}\) be an independent sequence of uniform random variables on [0, 1], independent of \(X_{0}\). Let \(T^{n}(\cdot ,\cdot )\) be the mapping constructed in Lemma 3.3. Define the process

$$\begin{aligned} {\tilde{X}}_{t}:=[T^{n}({\mathbf {U}}_{0},\cdot )\circ \cdots \circ T^{n}({\mathbf {U}}_{-t+1},\cdot )](X_{0}), \ t\in {\mathbb {N}} \end{aligned}$$

where we mean \({\tilde{X}}_{0}=X_{0}\). Notice that \({\mathscr {L}}({\tilde{X}}_{t})={\mathscr {L}}(X_{t})\) for each \(t\in {\mathbb {N}}\).

Fix integers \(1\le s<t\). For each \(j=0,\ldots ,s\), define the following disjoint events:

$$\begin{aligned} A^{s,t}_{j}:= & {} \left\{ [T^{n}({\mathbf {U}}_{-j},\cdot )\circ \cdots \circ T^{n}({\mathbf {U}}_{-t+1},\cdot )](X_{0}) = [T^{n}({\mathbf {U}}_{-j},\cdot )\circ \cdots \circ T^{n}({\mathbf {U}}_{-s+1},\cdot )](X_{0})\right\} ,\\ B^{s,t}_{j}:= & {} \left\{ [T^{n}({\mathbf {U}}_{-j},\cdot )\circ \cdots \circ T^{n}({\mathbf {U}}_{-t+1},\cdot )](X_{0}) \ne [T^{n}({\mathbf {U}}_{-j},\cdot )\circ \cdots \circ T^{n}({\mathbf {U}}_{-s+1},\cdot )](X_{0}),\right. \\ & {} \left. [T^{n}({\mathbf {U}}_{-j},\cdot )\circ \cdots \circ T^{n}({\mathbf {U}}_{-t+1},\cdot )](X_{0})\in {\mathscr {X}}_{n},\ [T^{n}({\mathbf {U}}_{-j},\cdot )\circ \cdots \circ T^{n}({\mathbf {U}}_{-s+1},\cdot )](X_{0})\in {\mathscr {X}}_{n}\right\} ,\\ C^{s,t}_{j}:= & {} \Omega \setminus (A^{s,t}_{j}\cup B^{s,t}_{j}), \end{aligned}$$

where we mean

$$\begin{aligned} A^{s,t}_{s}:= & {} \left\{ [T^{n}({\mathbf {U}}_{-s},\cdot )\circ \cdots \circ T^{n}({\mathbf {U}}_{-t+1},\cdot )](X_{0}) = X_{0}\right\} ,\\ B^{s,t}_{s}:= & {} \left\{ [T^{n}({\mathbf {U}}_{-s},\cdot )\circ \cdots \circ T^{n}({\mathbf {U}}_{-t+1},\cdot )](X_{0}) \ne X_{0},\ [T^{n}({\mathbf {U}}_{-s},\cdot )\right. \\&\left. \circ \cdots \circ T^{n}({\mathbf {U}}_{-t+1},\cdot )](X_{0})\in {\mathscr {X}}_{n},\ X_{0}\in {\mathscr {X}}_{n}\right\} . \end{aligned}$$

Define also \(p_{j}^{s,t}:=P(A^{s,t}_{j})\). We aim to show that, for s large, \(p^{s,t}_{0}\) is close to 1 for each \(t>s\), which means that \({\tilde{X}}_{t}\) very likely equals \({\tilde{X}}_{s}\). We estimate \(p_{j}^{s,t}\) by backward recursion. Notice that

$$\begin{aligned} P(C^{s,t}_{j})\le & {} P([T^{n}({\mathbf {U}}_{-j},\cdot )\circ \cdots \circ T^{n}({\mathbf {U}}_{-t+1},\cdot )](X_{0})\notin {\mathscr {X}}_{n}) \nonumber \\&+ P([T^{n}({\mathbf {U}}_{-j},\cdot )\circ \ldots \circ T^{n}({\mathbf {U}}_{-s+1},\cdot )] (X_{0})\notin {\mathscr {X}}_{n})\nonumber \\= & {} P(X_{t-j}\notin {\mathscr {X}}_{n})+P(X_{s-j}\notin {\mathscr {X}}_{n})\le 2\varepsilon , \end{aligned}$$
(13)

by (12). Define \({\mathscr {H}}_{j,t}:=\sigma (X_{0},{\mathbf {U}}_{-j},\ldots ,{\mathbf {U}}_{-t+1})\). On the event \(B_{j}^{s,t}\in {\mathscr {H}}_{j,t}\) we have

$$\begin{aligned} P\left( A_{j-1}^{s,t}\mid {\mathscr {H}}_{j,t}\right) \ge P\left( {\mathbf {U}}_{-j+1}\in [0,\alpha _{n}]\mid {\mathscr {H}}_{j,t}\right) ={} P({\mathbf {U}}_{-j+1}\in [0,\alpha _{n}])=\alpha _{n} \text{ a.s. } \end{aligned}$$

since \(T^{n}({\mathbf {U}}_{-j+1},\cdot )\) is a constant mapping on \({\mathscr {X}}_{n}\) when \({\mathbf {U}}_{-j+1}\in [0,\alpha _{n}]\), and \({\mathbf {U}}_{-j+1}\) is independent of \({\mathscr {H}}_{j,t}\). On the other hand, on the event \(A_{j}^{s,t}\in {\mathscr {H}}_{j,t}\) we have \(P\left( A_{j-1}^{s,t}\vert {\mathscr {H}}_{j,t}\right) =1\) a.s. for trivial reasons. Hence

$$\begin{aligned} p_{j-1}^{s,t}\ge p_{j}^{s,t}+\alpha _{n}P(B_{j}^{s,t})\ge p_{j}^{s,t}+\alpha _{n}(1-p_{j}^{s,t}-2\varepsilon ), \end{aligned}$$
(14)

using (13). We get by backward recursion using (14), starting from the trivial \(p_{s}^{s,t}\ge 0\), that

$$\begin{aligned} p_{0}^{s,t}\ge (1-2\varepsilon )\alpha _{n}\frac{1-(1-\alpha _{n})^{s}}{1- (1-\alpha _{n})}=(1-2\varepsilon ){} [1-(1-\alpha _{n})^{s}], \end{aligned}$$

remembering also the formula for the sum of a geometric series. It follows from (8) that for all integers \(1\le s <t\),

$$\begin{aligned} ||{\mathscr {L}}(X_{t})-{\mathscr {L}}(X_{s})||_{TV}\le 2P({\tilde{X}}_{t}\ne {\tilde{X}}_{s}) =2(1-p_{0}^{s,t})\le 4\varepsilon +2(1-\alpha _{n})^{s}, \end{aligned}$$
(15)

which is smaller than \(5\varepsilon \) for s large enough. As \(\varepsilon \) was arbitrary, the sequence \({\mathscr {L}}(X_{t})\), \(t\in {\mathbb {N}}\) is Cauchy in the total variation distance, hence it converges to some probability \(\mu _{*}\).

Let \(X_t\), \(X_t'\), \(t\in {\mathbb {N}}\) denote Markov chains with transition kernel Q, started from \(X_0, X_0'\in {\mathscr {P}}_{b}\), respectively. Then, using \({\mathbf {U}}_{k}\), \(k\in -{\mathbb {N}}\) independent of \(\sigma (X_{0},X_{0}')\), we get \(||{\mathscr {L}}(X_t)- {\mathscr {L}}(X_t')||_{TV}\rightarrow 0\) as \(t\rightarrow \infty \) analogously to the argument above. This shows that \(\mu _{*}\) is independent of the choice of \(X_{0}\in {\mathscr {P}}_{b}\). \(\square \)
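The mechanics of the backward composition can be checked numerically on a toy example (our own illustration, with a kernel satisfying a Doeblin-type condition: from any state x the chain jumps to 0 with probability p and to \(x+1\) otherwise, so one may take \(\nu _{n}=\delta _0\) and \(\alpha _{n}=p\) for every n):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3                            # Q(x, {0}) = p for every x: nu_n = delta_0, alpha_n = p

def T(u, x):                       # one-uniform representation of this toy kernel
    return 0 if u <= p else x + 1

def X_tilde(t, U, x0):             # [T(U_0, .) o ... o T(U_{-t+1}, .)](x0)
    x = x0
    for u in U[:t][::-1]:          # the innermost (oldest) uniform acts first
        x = T(u, x)
    return x

s, t, trials, x0 = 10, 50, 10_000, 5
mismatch = sum(
    X_tilde(t, U, x0) != X_tilde(s, U, x0)
    for U in (rng.random(t) for _ in range(trials))   # U[k] plays the role of U_{-k}
)
print(mismatch / trials, (1 - p) ** s)   # empirical P(X~_t != X~_s) vs the (1 - alpha_n)^s term of (15)
```

Since \(T(u,\cdot )\) is constant as soon as \(u\le p\), the two compositions coincide whenever one of the shared uniforms \({\mathbf {U}}_{0},\ldots ,{\mathbf {U}}_{-s+1}\) falls below p, which is why the empirical frequency stays below \((1-p)^{s}\).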

Remark 3.5

Assume \({\mathscr {X}}:={\mathbb {R}}^{d}\) and \({\mathscr {X}}_{n}:=\{x\in {\mathscr {X}}: |x|\le n\}\), \(n\in {\mathbb {N}}\). Let \(V(x):=g(|x|)\) for some non-decreasing \(g:{\mathbb {R}}_{+}\rightarrow {\mathbb {R}}_{+}\) with \(g(\infty )=\infty \). If the initial state \(X_{0}\) is such that

$$\begin{aligned} \sup _{k\in {\mathbb {N}}}E[V(X_{k})]<\infty , \end{aligned}$$
(16)

then \(X_{0}\in {\mathscr {P}}_{b}\), as seen from Markov’s inequality.

The argument proving Theorem 3.4 above in fact provides us with a convergence rate estimate, too. For each t, (17) below allows one to optimize over n and to choose the \(n=n(t)\) that gives the best estimate.

Corollary 3.6

Under Assumption 3.2, in the setting of Remark 3.5, for each \(n\in {\mathbb {N}}\) and \(t\in {\mathbb {N}}\),

$$\begin{aligned} ||{\mathscr {L}}(X_{t})-\mu _{*}||_{TV}\le 4\frac{\sup _{k\in {\mathbb {N}}}E[V(X_{k})]}{g(n)}+2(1-\alpha _{n})^{t}. \end{aligned}$$
(17)

Proof

This follows from (12), (15) and from Markov’s inequality. \(\square \)

We demonstrate the application of Corollary 3.6 and the resulting rate through a simple example.

Example 3.7

Consider a stable scalar AR(1) process, where \({\mathscr {X}}={\mathbb {R}}\) and the dynamics is

$$\begin{aligned} X_{t+1} = \gamma X_t + \varepsilon _{t+1}, \end{aligned}$$
(18)

where \(0<\gamma <1\), \(\varepsilon _t\) is an i.i.d. sequence of standard Gaussian variables, and \(X_0\) is a constant initialization.

In order to apply Corollary 3.6, we choose \(V(x)=g(|x|)=e^{\beta x^2}\) with \(\beta <\frac{1-\gamma ^2}{2}\). To confirm (16), iterating the dynamics (18) we see

$$\begin{aligned} X_t = \gamma ^t X_0 + \sum _{s=1}^t \gamma ^{t-s}\varepsilon _s \quad \sim \quad {\mathscr {N}}\left( \gamma ^t X_0, \frac{1-\gamma ^{2t}}{1-\gamma ^2}\right) . \end{aligned}$$

Consequently,

$$\begin{aligned} EV(X_t)&= \frac{1}{\sqrt{2\pi \frac{1-\gamma ^{2t}}{1-\gamma ^2}}}\int _{-\infty }^\infty e^{-\frac{1-\gamma ^{2}}{2(1-\gamma ^{2t})}(z-\gamma ^t X_0)^2}e^{\beta z^2} dz\\&\le \frac{1}{\sqrt{2\pi }} \int _{-\infty }^\infty e^{-\frac{1-\gamma ^{2}}{2}(z-\gamma ^t X_0)^2}e^{\beta z^2} dz < \infty , \end{aligned}$$

and this quantity is also bounded above uniformly in t by some \(c(\gamma ,\beta ,X_0)\) since \(|\gamma ^t X_0|\) decreases as \(t\rightarrow \infty \).

We also need Assumption 3.2, the minorization condition for a sequence of small sets. Let

$$\begin{aligned} {\mathscr {X}}_n=[-n,n], \qquad \nu =\frac{1}{2}Leb\vert _{[-1,1]}, \end{aligned}$$

for all n. In order to obtain \(\alpha _n\), we need to find the infimum of \(\frac{dQ(x,\cdot )}{d\nu (\cdot )}\) on the appropriate sets. Since both distributions are absolutely continuous, this boils down to comparing the densities, therefore

$$\begin{aligned} \alpha _n=\inf _{x\in [-n,n],z\in [-1,1]} \frac{Q(x,dz)}{ \frac{1}{2}dz } = \sqrt{\frac{2}{\pi }}e^{-\frac{(\gamma n + 1)^2}{2}}. \end{aligned}$$
(19)

Substituting the computed expressions into Corollary 3.6 provides

$$\begin{aligned} ||{\mathscr {L}}(X_{t})-\mu _{*}||_{TV}\le \frac{4c(\gamma ,\beta ,X_0)}{\exp (\beta n^2)} + 2\left( 1-\sqrt{\frac{2}{\pi }}e^{-\frac{(\gamma n + 1)^2}{2}}\right) ^t. \end{aligned}$$
(20)

It remains to choose n depending on t to get the best bound possible. Clearly there is a tradeoff: for small values of n the first term is large, while for large values of n the second term increases and can remain bounded away from 0.
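The tradeoff can also be probed numerically; a minimal sketch (illustrative values: \(\gamma =0.5\), \(\beta \) just below \(\frac{1-\gamma ^2}{2}\), and \(c(\gamma ,\beta ,X_0)\) set to 1, which only rescales the first term):

```python
import math

gamma_ = 0.5
beta = 0.9 * (1 - gamma_ ** 2) / 2          # must satisfy beta < (1 - gamma^2) / 2
c = 1.0                                     # stands in for c(gamma, beta, X_0)

def bound(t, n):                            # right-hand side of (20)
    a = math.sqrt(2 / math.pi) * math.exp(-((gamma_ * n + 1) ** 2) / 2)
    return 4 * c * math.exp(-beta * n ** 2) + 2 * (1 - a) ** t

for t in (10 ** 3, 10 ** 6, 10 ** 9):
    n_best = min(range(1, 60), key=lambda n: bound(t, n))
    print(t, n_best, bound(t, n_best), math.sqrt(2 * math.log(t)) / gamma_)
```

The grid-optimal n grows like \(\sqrt{\log t}\), in line with the heuristic choice made below.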

Let us present the heuristics to find a near-optimal n. The second term in (20) is approximately

$$\begin{aligned} 2\exp \left( -t \sqrt{\frac{2}{\pi }} \exp \left( -\frac{(\gamma n + 1)^2}{2}\right) \right) . \end{aligned}$$

We get the optimal bounds if the two terms agree (ignoring constants):

$$\begin{aligned} \exp (-\beta n^2)&= \exp \left( -t \sqrt{\frac{2}{\pi }} \exp \left( -\frac{(\gamma n + 1)^2}{2}\right) \right) ,\\ \log \beta + 2\log n&= \log t + \frac{1}{2} \log \frac{2}{\pi }- \frac{(\gamma n + 1)^2}{2}. \end{aligned}$$

It is easy to see that the value of \(\frac{\sqrt{2\log t}}{\gamma }\) for n “almost” satisfies the latter equality but it does not lead to the desired estimate (21) below. Still, inspired by this option we choose

$$\begin{aligned} n=\left\lceil \left( \frac{\sqrt{2}}{\gamma }-\eta \right) \sqrt{\log t}\right\rceil \end{aligned}$$

with some small \(\eta >0\). Using this choice in our bound (20) and noting

$$\begin{aligned} (\gamma n +1)^{2}\le \left( \gamma \left( \frac{\sqrt{2}}{\gamma }-\eta \right) \sqrt{\log t}+2\right) ^{2} \end{aligned}$$

we get

$$\begin{aligned} ||{\mathscr {L}}(X_{t})-\mu _{*}||_{TV}\le & {} \frac{4c(\gamma ,\beta ,X_0)}{\exp \left( \beta \left( \frac{\sqrt{2}}{\gamma } -\eta \right) ^2\log t\right) } \\&+ 2\left( 1-\sqrt{\frac{2}{\pi }}\exp \left[ -\left( \left( 1-\frac{\gamma \eta }{\sqrt{2}}\right) \sqrt{\log t} + \sqrt{2}\right) ^2\right] \right) ^t. \end{aligned}$$

In the exponent of the first term we could choose the coefficient of the logarithm arbitrarily close to \(\frac{1-\gamma ^2}{2} \left( \frac{\sqrt{2}}{\gamma }\right) ^2=\frac{1}{\gamma ^2}-1\). Although the second term looks daunting, observe that it is of the order \((1-t^{-1+\eta '})^t\) with some \(\eta '>0\), hence it decays faster than any power of t and is negligible compared to the first term.

Summing up, for a rate estimate we get that for any \(h>0\) there is some constant \(C_h>0\) such that

$$\begin{aligned} ||{\mathscr {L}}(X_{t})-\mu _{*}||_{TV}\le \frac{C_h}{t^{\frac{1}{\gamma ^2}-1-h}}. \end{aligned}$$
(21)

In the model (18), \(||{\mathscr {L}}(X_{t})-\mu _{*}||_{TV}\) decreases geometrically in t, so only a suboptimal rate can be achieved by our method. Nevertheless, the estimates leading to (21) are of great interest since they can serve as a basis for similar results for certain non-Markovian models, where power convergence rates are common, see e.g. [21]. One can thus treat models like (32) below (which are not covered by the current literature). Then, using technology from [16, 29], various mixing properties and laws of large numbers (with rate estimates) can be established for functionals of the process \(X_{t}\), \(t\in {\mathbb {N}}\). Central limit theorems can also be derived from mixing conditions, see [44]. These developments, however, are outside the scope of the present article.

3.2 Markov chains in random environments

We now extend Theorem 3.4 to Markov chains in random environments. These processes still evolve in \({\mathscr {X}}=\cup _{n\in {\mathbb {N}}}{\mathscr {X}}_{n}\) but their dynamics is influenced by another random process which we now introduce. Let \({\mathscr {Y}}\) be another Polish space and let \(Y_t\), \(t\in {\mathbb {Z}}\) be a (strict sense) stationary process in \({\mathscr {Y}}\). We assume that a non-decreasing sequence \({\mathscr {Y}}_{n}\in {\mathscr {B}}({\mathscr {Y}})\), \(n\in {\mathbb {N}}\) is given with \({\mathscr {Y}}_{0}\ne \emptyset \). Let \(Q:{\mathscr {X}}\times {\mathscr {Y}}\times {\mathscr {B}}({\mathscr {X}})\rightarrow [0,1]\) be a parametrized family of transition kernels, i.e. \(Q(\cdot ,\cdot ,A)\) is measurable for all \(A\in {\mathscr {B}}({\mathscr {X}})\) and \(Q(x,y,\cdot )\) is a probability for all \((x,y)\in {\mathscr {X}}\times {\mathscr {Y}}\). We say that the process \({X}_t\), \(t\in {\mathbb {N}}\) is a Markov chain in a random environment with transition kernel Q if it is an \({\mathscr {X}}\)-valued stochastic process such that

$$\begin{aligned} P({X}_{t+1}\in A\mid \sigma (Y_j,\ j\in {\mathbb {Z}};\ X_j,\ 0\le j\le t))=Q(X_t,Y_{t},A),\ t\in {\mathbb {N}}. \end{aligned}$$
(22)

Denote by \({\mathscr {M}}_{0}\) the set of probability laws on \({\mathscr {X}}\times {\mathscr {Y}}^{{\mathbb {Z}}}\) such that their second marginal equals the law of \((Y_{k})_{k\in {\mathbb {Z}}}\). Let \({\mathscr {M}}_{b}\) denote the set of those \(\mu \in {\mathscr {M}}_{0}\) for which the process \(X_{t}\), \(t\in {\mathbb {N}}\) started from \(X_{0}\) with \({\mathscr {L}}(X_{0},(Y_{k})_{k\in {\mathbb {Z}}})=\mu \) satisfies

$$\begin{aligned} \sup _{t\in {\mathbb {N}}}P(X_{t}\notin {\mathscr {X}}_{n})\rightarrow 0,\ n\rightarrow \infty . \end{aligned}$$
(23)

We write \(X_{0}\in {\mathscr {M}}_{b}\) in the sequel when we really mean \({\mathscr {L}}(X_{0},(Y_{k})_{k\in {\mathbb {Z}}})\in {\mathscr {M}}_{b}\).

Assumption 3.8

Let \(P(Y_{0}\notin {\mathscr {Y}}_{n})\rightarrow 0\) hold as \(n\rightarrow \infty \). There exists a sequence \(\alpha _{n}\in (0,1]\), \(n\in {\mathbb {N}}\) and a sequence of probability measures \(\nu _{n}\), \(n\in {\mathbb {N}}\) such that for all \(n\in {\mathbb {N}}\),

$$\begin{aligned} Q(x,y,A)\ge \alpha _{n}\nu _n(A),\ A\in {\mathscr {B}}({\mathscr {X}}),\ y\in {\mathscr {Y}}_{n},\ x\in {\mathscr {X}}_n. \end{aligned}$$

A parametric version of Lemma 3.3 comes next.

Lemma 3.9

Let Assumption 3.8 be in force. Let \({\mathbf {U}}\) be a uniform random variable on [0, 1]. For each \(n\in {\mathbb {N}}\), there exists a measurable mapping \(T^{n}(\cdot ,\cdot ,\cdot ):[0,1]\times {\mathscr {X}}\times {\mathscr {Y}}\rightarrow {\mathscr {X}}\) satisfying \(Q(x,y,A)=P(T^{n}({\mathbf {U}},x,y)\in A)\), \(x\in {\mathscr {X}}\), \(y\in {\mathscr {Y}}\), \(A\in {\mathscr {B}}({\mathscr {X}})\) such that for all \(u\in [0,\alpha _{n}]\),

$$\begin{aligned} T^{n}(u,x_1,y)=T^{n}(u,x_2,y)\quad \text{ for } \text{ all }\quad x_1,x_2\in {\mathscr {X}}_n,\ y\in {\mathscr {Y}}_{n}. \end{aligned}$$

Proof

This is a straightforward extension of the case with \({\mathscr {Y}}\) a singleton, that is, of Lemma 3.3 above. See Lemma 7.1 of [29]. \(\square \)

The following abstract result serves as the basis of Sect. 4 below. We do not know of any similar results in the literature. Existing papers have fairly restrictive assumptions: either Doeblin-like conditions (as in [26, 27, 39]) or strong contractivity hypotheses (as in [40]).

Theorem 3.10

Let Assumption 3.8 hold and let \({\mathscr {M}}_{b}\ne \emptyset \). Let \(X_t\), \(t\in {\mathbb {N}}\) denote a Markov chain in a random environment with transition kernel Q, started from some \(X_0\in {\mathscr {M}}_{b}\). Then there exists a probability \(\mu _{\sharp }\) on \({\mathscr {B}}({\mathscr {X}}\times {\mathscr {Y}}^{{\mathbb {Z}}})\) such that

$$\begin{aligned} ||{\mathscr {L}}(X_{t},(Y_{t+k})_{k\in {\mathbb {Z}}})-\mu _{\sharp }||_{TV}\rightarrow 0,\ t\rightarrow \infty .{} \end{aligned}$$
(24)

If \(X_{t}'\), \(t\in {\mathbb {N}}\) is another such Markov chain in random environment started from \(X_{0}'\in {\mathscr {M}}_{b}\) then

$$\begin{aligned} ||{\mathscr {L}}(X_{t},(Y_{t+k})_{k\in {\mathbb {Z}}})- {\mathscr {L}}(X'_{t},(Y_{t+k})_{k\in {\mathbb {Z}}})||_{TV}\rightarrow 0,\ t\rightarrow \infty . \end{aligned}$$
(25)

In particular, \(\mu _{\sharp }\) does not depend on the choice of \(X_{0}\in {\mathscr {M}}_{b}\). The probability \(\mu _{\sharp }\) is invariant in the following sense: if \(X_{0}\) is such that \({\mathscr {L}}(X_{0},(Y_{k})_{k\in {\mathbb {Z}}})=\mu _{\sharp }\) then \({\mathscr {L}}(X_{t},(Y_{t+k})_{k\in {\mathbb {Z}}})=\mu _{\sharp }\) for each \(t\in {\mathbb {N}}\).

Proof

The core idea of the proof is identical to that of Theorem 3.4, with the extra task of checking whether the process Y stays in \({\mathscr {Y}}_{n}\) for some suitable n. In order to prove invariance, however, here we need to construct \({\tilde{X}}_{\infty }\) such that \({\tilde{X}}_{t}\) (to be defined soon) converges to \({\tilde{X}}_{\infty }\) a.s. in a stationary way (along a suitable subsequence). This requires a more complicated setup.

There exists a measurable function \(g:{\mathscr {Y}}^{{\mathbb {Z}}}\times [0,1]\rightarrow {\mathscr {X}}\) and a uniform [0, 1]-valued random variable R, independent of \(\sigma (Y_{k},{k\in {\mathbb {Z}}})\), such that \({\mathscr {L}}(X_{0},(Y_{k})_{k\in {\mathbb {Z}}})= {\mathscr {L}}(g((Y_{k})_{k\in {\mathbb {Z}}},R),(Y_{k})_{k\in {\mathbb {Z}}})\). Let \({\mathbf {U}}_{k}\), \(k\in -{\mathbb {N}}\) be an independent family of uniform random variables on [0, 1], independent of \(\sigma (R,(Y_{k})_{k\in {\mathbb {Z}}})\). Let \(T^{n}(\cdot ,\cdot ,\cdot )\), \(n\in {\mathbb {N}}\) be the mappings constructed in Lemma 3.9.

For each integer \(m\ge 1\) choose \(n(m)\in {\mathbb {N}}\) so large that

$$\begin{aligned} P(Y_{0}\notin {\mathscr {Y}}_{n(m)})+\sup _{k\in {\mathbb {N}}}P(X_{k}\notin {\mathscr {X}}_{n(m)}) \le 1/2^{m}. \end{aligned}$$
(26)

Let \(N(m)\ge 1\) be so large that \((1-\alpha _{n(m)})^{N(m)}\le 1/2^{m}\). Define \(M_{0}:=0\), \(M_{m}:=\sum _{j=1}^{m}N(j)\). Define the following random mappings from \({\mathscr {X}}\rightarrow {\mathscr {X}}\), for each \(m\ge 1\):

$$\begin{aligned} {\tilde{T}}_{m}(\cdot ):=T^{n(m)}({\mathbf {U}}_{-M_{m-1}},\cdot ,Y_{-M_{m-1}-1})\circ \ldots \circ T^{n(m)} ({\mathbf {U}}_{-M_{m}+1},\cdot ,Y_{-M_{m}}) \end{aligned}$$

and

$$\begin{aligned} {\mathbf {T}}_{m}(\cdot ):={\tilde{T}}_{1}(\cdot )\circ \ldots \circ {\tilde{T}}_{m}(\cdot ). \end{aligned}$$

Let \({\mathbf {T}}_{0}\) be the identity mapping of \({\mathscr {X}}\).

Let \({\tilde{X}}_{0}:=g((Y_{k})_{k\in {\mathbb {Z}}},R)\) and for each \(m\in {\mathbb {N}}\) and each \(M_{m}+1\le t\le M_{m+1}\), define the process

$$\begin{aligned} {\tilde{X}}_{t}:= & {} {\mathbf {T}}_{m}(\cdot )\circ T^{n(m+1)}({\mathbf {U}}_{-M_{m}},\cdot ,Y_{-M_{m}-1})\\&\circ \ldots \circ T^{n(m+1)}({\mathbf {U}}_{-t+1},\cdot ,{} Y_{-t})(g((Y_{-t+k})_{k\in {\mathbb {Z}}},R)). \end{aligned}$$

Notice that \({\mathscr {L}}({\tilde{X}}_{t},(Y_{k})_{k\in {\mathbb {Z}}})={\mathscr {L}}(X_{t},(Y_{t+k})_{k\in {\mathbb {Z}}})\) by construction, for each \(t\in {\mathbb {N}}\).

Fix \(m\ge 2\) and let \(M_{m}+1\le t\le M_{m+1}\) be arbitrary. For each \(j=M_{m-1},\ldots ,M_{m}\) we define the following random variables:

$$\begin{aligned} V_{j,t}:= & {} [T^{n(m)}({\mathbf {U}}_{-j},\cdot ,Y_{-j-1})\circ \cdots \circ T^{n(m)}({\mathbf {U}}_{-M_{m}+1},\cdot ,Y_{-M_{m}})\\&\circ T^{n(m+1)}({\mathbf {U}}_{-M_{m}},\cdot ,Y_{-M_{m}-1})\circ \\&\cdots \circ T^{n(m+1)}({\mathbf {U}}_{-t+1},\cdot ,Y_{-t})](g((Y_{-t+k})_{k\in {\mathbb {Z}}},R)),\\ W_{j,t}:= & {} [T^{n(m)}({\mathbf {U}}_{-j},\cdot ,Y_{-j-1})\\&\circ \cdots \circ T^{n(m)}({\mathbf {U}}_{-M_{m}+1},\cdot ,Y_{-M_{m}})] (g((Y_{-M_{m}+k})_{k\in {\mathbb {Z}}},R)), \end{aligned}$$

with the understanding that

$$\begin{aligned} W_{M_{m},t}=g((Y_{-M_{m}+k})_{k\in {\mathbb {Z}}},R) \end{aligned}$$

and

$$\begin{aligned} V_{M_{m},t}:= & {} T^{n(m+1)}({\mathbf {U}}_{-M_{m}},\cdot ,Y_{-M_{m}-1})\\&\circ \cdots \circ T^{n(m+1)}({\mathbf {U}}_{-t+1},\cdot ,Y_{-t})(g((Y_{-t+k})_{k\in {\mathbb {Z}}},R)). \end{aligned}$$

Consider the corresponding disjoint events

$$\begin{aligned} A_{j,t}:= & {} \left\{ V_{j,t}=W_{j,t}\right\} ,\\ B_{j,t}:= & {} \left\{ V_{j,t}\ne W_{j,t}, V_{j,t}\in {\mathscr {X}}_{n(m)},\ W_{j,t}\in {\mathscr {X}}_{n(m)},\ Y_{-j}\in {\mathscr {Y}}_{n(m)} \right\} ,\\ C_{j,t}:= & {} \Omega \setminus (A_{j,t}\cup B_{j,t}). \end{aligned}$$

Define also \(p_{j,t}:=P(A_{j,t})\), \(j=M_{m-1},\ldots ,M_{m}\). Notice that

$$\begin{aligned} P(C_{j,t})\le & {} P(V_{j,t}\notin {\mathscr {X}}_{n(m)})+ P(W_{j,t}\notin {\mathscr {X}}_{n(m)}) +P(Y_{-j}\notin {\mathscr {Y}}_{n(m)})\nonumber \\= & {} P(X_{t-j}\notin {\mathscr {X}}_{n(m)})+P(X_{M_{m}-j}\notin {\mathscr {X}}_{n(m)}) +P(Y_{-j}\notin {\mathscr {Y}}_{n(m)})\nonumber \\\le & {} 1/2^{m-1}, \end{aligned}$$
(27)

by the stationarity of the process Y and by (26).

Define \({\mathscr {H}}_{j,t}:=\sigma ((Y_{k})_{k\in {\mathbb {Z}}},R,{\mathbf {U}}_{-j},\ldots ,{\mathbf {U}}_{-t+1})\). On \(B_{j,t}\in {\mathscr {H}}_{j,t}\) we have

$$\begin{aligned}&P\left( A_{j-1,t}\mid {\mathscr {H}}_{j,t}\right) \\&\quad \ge P\left( {\mathbf {U}}_{-j+1}\in [0,\alpha _{n(m)}]\mid {\mathscr {H}}_{j,t}\right) ={} P({\mathbf {U}}_{-j+1}\in [0,\alpha _{n(m)}])=\alpha _{n(m)} \text{ a.s. } \end{aligned}$$

since \(T^{n(m)}({\mathbf {U}}_{-j+1},\cdot ,y)\) is a constant mapping on \({\mathscr {X}}_{n(m)}\), for each \(y\in {\mathscr {Y}}_{n(m)}\) when \({\mathbf {U}}_{-j+1}\in [0,\alpha _{n(m)}]\), and \({\mathbf {U}}_{-j+1}\) is independent of \({\mathscr {H}}_{j,t}\). On the other hand, on \(A_{j,t}\in {\mathscr {H}}_{j,t}\) we have \(P\left( A_{j-1,t}\vert {\mathscr {H}}_{j,t}\right) =1\) a.s., trivially. Hence

$$\begin{aligned} p_{j-1,t}\ge p_{j,t}+\alpha _{n(m)}P(B_{j,t})\ge p_{j,t}+\alpha _{n(m)}(1-p_{j,t}-1/2^{m-1}), \end{aligned}$$
(28)

using (27), which leads (by backward induction starting from \(p_{M_{m},t}\ge 0\)) to

$$\begin{aligned} p_{M_{m-1},t}\ge (1-1/2^{m-1}){} [1-(1-\alpha _{n(m)})^{N(m)}], \end{aligned}$$

and eventually to

$$\begin{aligned} P({\tilde{X}}_{t}\ne {\tilde{X}}_{M_{m}})\le & {} P(V_{M_{m-1},t}\ne W_{M_{m-1},t})= 1-p_{M_{m-1},t}\le 1/2^{m-1}\nonumber \\&+ (1-\alpha _{n(m)})^{N(m)}\le 1/2^{m-2}, \end{aligned}$$
(29)

remembering the choice of N(m). These relations establish, in particular, that for the event

$$\begin{aligned} A_{m}:=\left\{ {\tilde{X}}_{M_{j}}={\tilde{X}}_{M_{m}} \text{ for } \text{ all } j\ge m\right\} , \end{aligned}$$

we have

$$\begin{aligned} P(\Omega \setminus A_{m})\le \sum _{j=m}^{\infty } \frac{1}{2^{j-2}}\le 1/2^{m-3}. \end{aligned}$$
(30)

We can thus define unambiguously \({\tilde{X}}_{\infty }:={\tilde{X}}_{M_{m}}\) on \(A_{m}\) and, doing this for all \(m\ge 2\), a random variable \({\tilde{X}}_{\infty }\) gets almost surely defined. Clearly, for all \(M_{m}+1\le t\le M_{m+1}\),

$$\begin{aligned} P({\tilde{X}}_{t}\ne {\tilde{X}}_{\infty })\le P({\tilde{X}}_{t}\ne {\tilde{X}}_{M_{m}})+P({\tilde{X}}_{M_{m}} \ne {\tilde{X}}_{\infty }) \le 1/2^{m-4}, \end{aligned}$$

by (29) and (30). Denoting by \(\mu _{\sharp }\) the law of \(({\tilde{X}}_{\infty },(Y_{k})_{k\in {\mathbb {Z}}})\),

$$\begin{aligned} ||{\mathscr {L}}(X_{t},(Y_{t+k})_{k\in {\mathbb {Z}}})-\mu _{\sharp }||_{TV}\le 2P({\tilde{X}}_{t}\ne {\tilde{X}}_{\infty })\rightarrow 0, \ t\rightarrow \infty .{} \end{aligned}$$

Now we turn to proving (25). In addition to \({\tilde{X}}_{t}\), let us also define \({\tilde{X}}_{t}'\) in the same manner with g replaced by \(g':{\mathscr {Y}}^{{\mathbb {Z}}}\times [0,1]\rightarrow {\mathscr {X}}\) such that \({\mathscr {L}}(X_{0}',(Y_{k})_{k\in {\mathbb {Z}}})= {\mathscr {L}}(g'((Y_{k})_{k\in {\mathbb {Z}}},R),(Y_{k})_{k\in {\mathbb {Z}}})\). We get by analogous arguments that

$$\begin{aligned}&||{\mathscr {L}}({X}_{t},(Y_{t+k})_{k\in {\mathbb {Z}}})-{\mathscr {L}}({X}_{t}',(Y_{t+k})_{k\in {\mathbb {Z}}})||_{TV}\\&\quad = ||{\mathscr {L}}({\tilde{X}}_{t},(Y_{k})_{k\in {\mathbb {Z}}})-{\mathscr {L}}({\tilde{X}}_{t}',(Y_{k})_{k\in {\mathbb {Z}}})||_{TV} \rightarrow 0,\ t\rightarrow \infty . \end{aligned}$$

To see invariance, fix \(\varepsilon >0\) and notice that for \(m=m(\varepsilon )\) large enough,

$$\begin{aligned} P({\tilde{X}}_{M_{m}}\ne {\tilde{X}}_{\infty })+P({\tilde{X}}_{M_{m}+1}\ne {\tilde{X}}_{\infty })\le \varepsilon . \end{aligned}$$
(31)

Let us take \({\mathbf {U}}^{*}\) uniform on [0, 1], independent of all the random objects that have appeared so far. We use the mapping \(T^{0}(\cdot ,\cdot ,\cdot )\) below but \(T^{n}(\cdot ,\cdot ,\cdot )\) for any n would do equally well. Notice that

$$\begin{aligned} {\mathscr {L}}(T^{0}({\mathbf {U}}^{*},{\tilde{X}}_{M_{m}},Y_{0}), (Y_{1+k})_{k\in {\mathbb {Z}}}) ={\mathscr {L}}({\tilde{X}}_{M_{m}+1},(Y_{k})_{k\in {\mathbb {Z}}}) \end{aligned}$$

and then from (31), necessarily,

$$\begin{aligned} ||{\mathscr {L}}(T^{0}({\mathbf {U}}^{*},{\tilde{X}}_{M_{m}},Y_{0}), (Y_{1+k})_{k\in {\mathbb {Z}}}) -\mu _{\sharp }||_{TV}\le 2\varepsilon . \end{aligned}$$

Now employing the limiting random variable representing \(\mu _{\sharp }\),

$$\begin{aligned}&||{\mathscr {L}}(T^{0}({\mathbf {U}}^{*},{\tilde{X}}_{M_{m}},Y_{0}),(Y_{1+k})_{k\in {\mathbb {Z}}}) -{\mathscr {L}}(T^{0}({\mathbf {U}}^{*},{\tilde{X}}_{\infty },Y_{0}),(Y_{1+k})_{k\in {\mathbb {Z}}}))||_{TV}\\&\quad \le 2P(T^{0}({\mathbf {U}}^{*},{\tilde{X}}_{M_{m}},Y_{0})\ne T^{0}({\mathbf {U}}^{*},{\tilde{X}}_{\infty },Y_{0}))\\&\quad \le 2P({\tilde{X}}_{M_{m}}\ne {\tilde{X}}_{\infty })\le 2\varepsilon . \end{aligned}$$

Thus we have

$$\begin{aligned} ||{\mathscr {L}}(T^{0}({\mathbf {U}}^{*},{\tilde{X}}_{\infty },Y_{0}),(Y_{1+k})_{k\in {\mathbb {Z}}})-\mu _{\sharp }||_{TV}\le 4\varepsilon \end{aligned}$$

and, as \(\varepsilon \) was arbitrary, \({\mathscr {L}}(T^{0}({\mathbf {U}}^{*},{\tilde{X}}_{\infty },Y_{0}),(Y_{1+k})_{k\in {\mathbb {Z}}})=\mu _{\sharp }\) follows. Clearly, this means that

$$\begin{aligned} {\mathscr {L}}(X_{0},(Y_{k})_{k\in {\mathbb {Z}}})=\mu _{\sharp } \text{ implies } {\mathscr {L}}(X_{1},(Y_{1+k})_{k\in {\mathbb {Z}}})=\mu _{\sharp } \end{aligned}$$

and the latter extends immediately to \({\mathscr {L}}(X_{t},(Y_{t+k})_{k\in {\mathbb {Z}}})=\mu _{\sharp }\) for all \(t\ge 2\), too. The proof is complete. \(\square \)

Before transitioning to the analysis of continuous-time processes, let us demonstrate the application of Theorem 3.10 on a benchmark model: the discrete-time counterpart of (1) with log-Gaussian \(V_{t}\). We take the simplest mean-reverting drift, but the same argument applies under more general dissipativity conditions.

Example 3.11

Consider the following model for financial time series. Let \(\eta _t,t\in {\mathbb {Z}}\) be independent standard Gaussian random variables and

$$\begin{aligned} Z_t = \sum _{k=0}^\infty a_k \eta _{t-k}, \end{aligned}$$

a causal moving average with constants \(a_k\), \(k\in {\mathbb {N}}\) satisfying \(\sum _{k} a_k^2<\infty \). Therefore \(Z_t\) is almost surely well defined and is a stationary Gaussian process. \(Z_t\) represents the log-volatility of an asset’s log-price \(X_t\), which in turn is defined as

$$\begin{aligned} X_{t+1} = \gamma X_t + \rho e^{Z_t} \eta _{t+1} + \sqrt{1-\rho ^2} e^{Z_t} \varepsilon _{t+1}, \end{aligned}$$
(32)

where \(\gamma \in (0,1), \rho \in (-1,1)\) and \(\varepsilon _k,k\in {\mathbb {N}}\) is an i.i.d. sequence of zero-mean random variables, also independent of \(\eta _t,t\in {\mathbb {Z}}\).

We assume that the \(\varepsilon _k\) have finite variance and a positive density f(x) such that for all \(n\in {\mathbb {N}}\), \(\inf _{x\in [-n,n]}f(x)=c(n)>0\). Additionally, we assume that the initial log-price \(X_0\) has finite variance and is independent of \(\eta _t,t\in {\mathbb {Z}}\) and \(\varepsilon _k,k\in {\mathbb {N}}\).
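To fix ideas, here is a minimal simulation sketch of (32) (our illustrative choices: standard Gaussian \(\varepsilon _k\), which satisfy the density condition, and the square-summable coefficients \(a_k=2^{-k}\), truncated at lag 50):

```python
import numpy as np

rng = np.random.default_rng(1)
gamma_, rho = 0.7, 0.3
L = 50
a = 0.5 ** np.arange(L)            # a_k = 2^{-k}: causal MA weights, sum a_k^2 < inf
T = 1000

eta = rng.standard_normal(T + L)   # eta[i] stands for eta_{i-L+1}: history included
eps = rng.standard_normal(T)       # illustrative choice of the i.i.d. eps_k
X = np.zeros(T + 1)                # constant initialization X_0 = 0

for t in range(T):
    Z_t = np.dot(a, eta[t : t + L][::-1])              # Z_t = sum_k a_k eta_{t-k}, truncated
    shock = rho * eta[t + L] + np.sqrt(1 - rho ** 2) * eps[t]
    X[t + 1] = gamma_ * X[t] + np.exp(Z_t) * shock     # the dynamics (32)

print(f"empirical second moment: {np.mean(X[200:] ** 2):.3f}")
```

The printed second moment can be compared with the uniform bound K computed at the end of this example.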

We claim that under these natural assumptions Theorem 3.10 is applicable to the model (32).

First of all, the random environment is defined as \(Y_{t}:=(Z_{t},\eta _{t+1})\). We choose

$$\begin{aligned} {\mathscr {X}}_n = \{x\in {\mathbb {R}}:|x|\le n\} \qquad {\mathscr {Y}}_n = \{(z,\eta )\in {\mathbb {R}}^2 :|z|,|\eta |\le n\}.\end{aligned}$$

We first verify Assumption 3.8: fix some \(n\in {\mathbb {N}}\) and set \(\nu _n=\frac{1}{2} Leb\vert _{[-1,1]}\). Since we are working with absolutely continuous distributions, we have to find a lower bound of the transition density to \([-1,1]\) from any departure point \(X_t\in {\mathscr {X}}_n, (Z_t,\eta _{t+1})\in {\mathscr {Y}}_{n}\).

Rearranging (32), we get

$$\begin{aligned} \varepsilon _{t+1} = \frac{X_{t+1}-\gamma X_t}{\sqrt{1-\rho ^2}e^{Z_t}} - \frac{\rho }{\sqrt{1-\rho ^2}}\eta _{t+1}. \end{aligned}$$

Requiring \(X_{t+1}\) to arrive in \([-1,1]\), and knowing that \(X_t,\eta _{t+1}\in [-n,n]\), \(e^{Z_t}\in [e^{-n},e^n]\), the values of \(\varepsilon _{t+1}\) that may be needed are restricted to some bounded interval \([-d(n),d(n)]\). Using the positivity condition on the density f(x) of \(\varepsilon _{t+1}\) we get a valid minorization with

$$\begin{aligned} \alpha _n = 2\inf _{x\in [-d(n),d(n)]}f(x) = 2c(d(n))>0. \end{aligned}$$
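For concreteness, d(n) can be chosen explicitly: requiring \(|X_{t+1}|\le 1\) and using \(|X_t|,|\eta _{t+1}|\le n\), \(e^{Z_t}\ge e^{-n}\) in the previous display,

$$\begin{aligned} |\varepsilon _{t+1}|\le \frac{|X_{t+1}|+\gamma |X_t|}{\sqrt{1-\rho ^2}e^{Z_t}}+\frac{|\rho |\, |\eta _{t+1}|}{\sqrt{1-\rho ^2}} \le \frac{(1+\gamma n)e^n+|\rho | n}{\sqrt{1-\rho ^2}}=:d(n), \end{aligned}$$

and any larger choice of d(n) would serve equally well.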

It remains to confirm that \(X_0\in {\mathscr {M}}_b\), i.e. that \(X_t\) leaves the sets \({\mathscr {X}}_n\) with uniformly small probability. By recursively using (32) we may express \(X_t\) as follows:

$$\begin{aligned} X_t = \sum _{s=1}^t \gamma ^{t-s} e^{Z_{s-1}}\left( \rho \eta _s + \sqrt{1-\rho ^2}\varepsilon _s\right) + \gamma ^t X_0. \end{aligned}$$
(33)

To bound \(X_t\), we compute \(E[X^2_t]\). Observe that, when squaring this sum and taking expectations, all cross-terms vanish, even the ones only involving Z and \(\eta \). Consequently,

$$\begin{aligned} E[X_t^2] = \sum _{s=1}^t \gamma ^{2t-2s} E\left[ e^{2Z_{s-1}}\right] (\rho ^2 E[\eta ^2_s] + (1-\rho ^2)E[\varepsilon ^2_s]) + \gamma ^{2t} E[X^2_0]. \end{aligned}$$

Regarding these terms, recall that \(Z_{t}\) is Gaussian, thus it has finite exponential moments, and that all appearing variables have finite variances. Moreover, due to the stationarity of all the components appearing, we have the time-independent bound

$$\begin{aligned} E[X_t^2] \le \frac{1}{1-\gamma ^2} E\left[ e^{2Z_{0}}\right] (\rho ^2 E[\eta ^2_1] + (1-\rho ^2)E[\varepsilon ^2_1]) + E[X^2_0] =: K <\infty . \end{aligned}$$

From here we can conveniently bound

$$\begin{aligned} \sup _{t\in {\mathbb {N}}}P(|X_t|>n)\le \frac{K}{n^2}, \end{aligned}$$

which indeed converges to 0 as \(n\rightarrow \infty \). This reasoning shows that \({\mathscr {L}}(X_0,(Z_k,\eta _{k+1})_{k\in {\mathbb {Z}}})\in {\mathscr {M}}_b\). We have verified the minorization Assumption 3.8 above, so Theorem 3.10 applies, ensuring convergence in total variation. The present example complements Example 3.4 of [16], where convergence in total variation was established under stronger assumptions (but with a rate estimate).

4 Proofs in continuous time

Until finishing the proof of Theorem 2.9, we assume that all the hypotheses of that theorem are in force. Let us first establish a simple continuity property.

Lemma 4.1

When \(s\rightarrow 0\), \(\sup _{t\in {\mathbb {R}}} \{E[{\mathbf {d}}_{d\times d}({\mathbf {V}}_{t+s},{\mathbf {V}}_{t})]+E[{\mathbf {d}}_{d\times m}({\mathbf {R}}_{t+s},{\mathbf {R}}_{t})]\}\rightarrow 0\) holds true.

Proof

By stationarity of V, \(\rho \), this amounts to proving \(E[{\mathbf {d}}_{d\times d}({\mathbf {V}}_{s},{\mathbf {V}}_{0})]+E[{\mathbf {d}}_{d\times m}({\mathbf {R}}_{s},{\mathbf {R}}_{0})]\rightarrow 0\). The process V has trajectories that are uniformly continuous on compacts hence \(\sup _{u\in [i,i+1]}|V_{u+s}-V_{u}|\rightarrow 0\) almost surely as \(s\rightarrow 0\), for each \(i\in {\mathbb {Z}}\). Then \(E[1\wedge \sup _{u\in [i,i+1]}|V_{u+s}-V_{u}|]\rightarrow 0\) for each i and, finally, \(E[{\mathbf {d}}_{d\times d}({\mathbf {V}}_{s},{\mathbf {V}}_{0})]\rightarrow 0\) by the definition of \({\mathbf {d}}\). We argue in the same manner for \({\mathbf {R}}\). \(\square \)

Now let us prove a moment estimate.

Lemma 4.2

We have \({\tilde{L}}:=\sup _{t\in {\mathbb {R}}_{+}}E[|L_{t}|^{2}]<\infty \).

Proof

Fix \(k\in {\mathbb {N}}\), for the moment. Define the stopping times \(\tau _l:=\inf \{ t>k:|L_t|>l\}\) for \(l\in {\mathbb {N}}\). Itô’s formula and Assumption 2.5 imply that, for all \(k\le t\le k+1\),

$$\begin{aligned} \mathrm {e}^{2\alpha (t\wedge \tau _{l}-k)}|L_{t\wedge \tau _{l}}|^{2}= & {} |L_{k}|^{2} + \int _{k}^{t\wedge \tau _{l}} 2\mathrm {e}^{2\alpha (s-k)}\langle L_{s},\zeta (L_{s},V_{s})\rangle \, ds\\&+ \int _{k}^{t\wedge \tau _{l}} 2\mathrm {e}^{2\alpha (s-k)} L_{s}^{*}V_{s}d{\overline{W}}_{s}\\&+ \int _{k}^{t\wedge \tau _{l}} \mathrm {e}^{2\alpha (s-k)}\mathrm {tr}(V_{s}^{*}V_{s})\, ds+ \int _{k}^{t\wedge \tau _{l}}2\alpha \mathrm {e}^{2\alpha (s-k)}|L_{s}|^{2}\, ds \\\le & {} |L_{k}|^{2}-\int _{k}^{t\wedge \tau _{l}}2\alpha \mathrm {e}^{2\alpha (s-k)}|L_{s}|^{2}\, ds\\&+ \int _{k}^{k+1}2\mathrm {e}^{2\alpha (s-k)}\beta (1+|V_{s}|^{\xi })\, ds \\&+ \int _{k}^{t\wedge \tau _{l}} 2\mathrm {e}^{2\alpha (s-k)} L_{s}^{*}V_{s}d{\overline{W}}_{s} + \mathrm {e}^{2\alpha }\int _{k}^{k+1}d|V_{s}|^{2}\, ds\\&+ \int _{k}^{t\wedge \tau _{l}}2\alpha \mathrm {e}^{2\alpha (s-k)}|L_{s}|^{2}\, ds\\\le & {} |L_{k}|^{2}+\int _{k}^{k+1}2\mathrm {e}^{2\alpha }\beta (1+|V_{s}|^{\xi })\, ds \\&+ \int _{k}^{t\wedge \tau _{l}} 2\mathrm {e}^{2\alpha (s-k)} L_{s}^{*}V_{s}d{\overline{W}}_{s} + \mathrm {e}^{2\alpha }\int _{k}^{k+1}d|V_{s}|^{2}\, ds, \end{aligned}$$

where \({\overline{W}}_{t}=\int _{0}^{t}\rho _{s}\,dB_{s}+\int _{0}^{t}\sqrt{I-\rho _{s}\rho ^{*}_{s}}\, dW_{s}\) is a d-dimensional standard Brownian motion. Taking expectations and noting the martingale property of the stochastic integral,

$$\begin{aligned} E[\mathrm {e}^{2\alpha (t\wedge \tau _{l}-k)}|L_{t\wedge \tau _{l}}|^{2}]\le E[|L_{k}|^{2}]+\int _{k}^{k+1}\mathrm {e}^{2\alpha }(2\beta +d)(1+E[|V_{s}|^{\xi }])\, ds. \end{aligned}$$

Noting stationarity of V and applying Fatou’s lemma,

$$\begin{aligned} E[|L_{t}|^{2}]\le \mathrm {e}^{-2\alpha (t-k)}E[|L_{k}|^{2}]+e^{2\alpha }(2\beta +d)(1+E[|V_{0}|^{\xi }]). \end{aligned}$$
(34)

Setting \(t=k+1\), Assumption 2.8 and an induction on k show that \(E[|L_{k}|^{2}]<\infty \) for all \(k\in {\mathbb {N}}\), in fact, \(\sup _{k}E[|L_{k}|^{2}]<\infty \). Finally, also \(\sup _{t\ge 0}E[|L_{t}|^{2}]<\infty \) follows from (34). \(\square \)

Let \({\mathbf {C}}_{k}\) denote the Banach space of continuous \({\mathbb {R}}^{k}\)-valued functions on [0, 1] equipped with the usual maximum norm \(||\cdot ||_{{\mathbf {C}}_{k}}\). The family of functions in \({\mathbf {C}}_{d\times d}\) whose values are non-singular is denoted by \({\mathbf {C}}^{+}\). We further define

$$\begin{aligned} {\mathbf {C}}^{1}:=\{{\mathbf {r}}\in {\mathbf {C}}_{d\times m}:\, {\mathbf {r}}_{t}{\mathbf {r}}_{t}^{*}\le I,\ t\in [0,1]\} \end{aligned}$$

as well as

$$\begin{aligned} {\mathbf {C}}^{1+}:=\{{\mathbf {r}}\in {\mathbf {C}}_{d\times m}:\, {\mathbf {r}}_{t}{\mathbf {r}}^{*}_{t}< I,\ t\in [0,1]\}. \end{aligned}$$

The auxiliary process to be defined in (35) below plays a key role in our arguments. The parameters \({\mathbf {v}},{\mathbf {r}}\) represent the “frozen” values of trajectories of the volatility and correlation processes, while \({\mathbf {z}}\) is a generic value of the stochastic integral of \(V\rho \) with respect to B.

For each \({\mathbf {v}}\in {\mathbf {C}}_{d\times d}\), \({\mathbf {z}}\in {\mathbf {C}}_{d}\), \({\mathbf {r}}\in {\mathbf {C}}^{1}\) and \(x\in {\mathbb {R}}^{d}\), let \({\tilde{X}}_t({\mathbf {v}},{\mathbf {z}},{\mathbf {r}},x)\), \(t\in [0,1]\) denote the unique \({\mathscr {F}}_t\)-adapted solution of the SDE

$$\begin{aligned} d{\tilde{X}}_t({\mathbf {v}},{\mathbf {z}},{\mathbf {r}},x)= & {} \zeta \left( {\tilde{X}}_t({\mathbf {v}},{\mathbf {z}},{\mathbf {r}},x) + {\mathbf {z}}_t,{\mathbf {v}}_{t}\right) \, dt\nonumber \\&+ {\mathbf {v}}_t\sqrt{I-{\mathbf {r}}_t{\mathbf {r}}_t^{*}}\, dW_t,\ {\tilde{X}}_0({\mathbf {v}},{\mathbf {z}},{\mathbf {r}},x)=x, \end{aligned}$$
(35)

which exists e.g. by Theorem 7 on page 82 of [28]. We shall use the shorthand notation \({\mathbf {q}}:=({\mathbf {v}},{\mathbf {z}},{\mathbf {r}},x)\) in the sequel. Introduce also the space \({\mathscr {Y}}:={\mathbf {C}}_{d\times d}\times {\mathbf {C}}_{d}\times {\mathbf {C}}^{1}\) where the random environment (to be defined in (37) below) evolves.
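To illustrate the role of the frozen parameters, here is a minimal Euler–Maruyama sketch of (35) in the scalar case \(d=m=1\) (our illustrative choices: \(\zeta (x,v)=-x-v^2/2\), i.e. Example 2.6 with \({{\bar{\nu }}}_1(x)=-x\), and explicit paths \({\mathbf {v}}\in {\mathbf {C}}^{+}\), \({\mathbf {r}}\in {\mathbf {C}}^{1+}\), \({\mathbf {z}}\)):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000
dt = 1.0 / N
t = np.linspace(0.0, 1.0, N + 1)

# Illustrative frozen parameters q = (v, z, r, x), with d = m = 1:
v = 1.0 + 0.5 * np.sin(2 * np.pi * t)      # v in C^+ (never vanishes)
r = 0.5 * np.cos(2 * np.pi * t)            # r in C^{1+} (|r| < 1 everywhere)
z = 0.1 * t                                # stand-in for a frozen path of the integral Z
zeta = lambda x, vv: -x - vv ** 2 / 2      # Example 2.6 with bar-nu_1(x) = -x

x = 0.0                                    # X~_0(q) = x
for k in range(N):                         # Euler-Maruyama step for (35)
    dW = rng.normal(0.0, np.sqrt(dt))
    x += zeta(x + z[k], v[k]) * dt + v[k] * np.sqrt(1 - r[k] ** 2) * dW
print(f"sample of X~_1(q): {x:.4f}")
```

Rerunning with the same \(({\mathbf {v}},{\mathbf {z}},{\mathbf {r}})\) and fresh driving noise W produces i.i.d. samples from the transition kernel whose smoothness is exploited below via Malliavin calculus.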

In line with the notations of the standard reference work [35], \({\mathbb {D}}^{k,p}\) denotes the p-Sobolev space of k times Malliavin differentiable functionals. The first and second Malliavin derivative of a functional F is denoted by DF, \(D^{2}F\) or \(D_{r}F\), \(D^{2}_{r_{1},r_{2}}F\) when we need to emphasize that these are random processes/fields indexed by \(r,r_{1},r_{2}\). The Skorokhod integral operator (the adjoint of D) is denoted by \(\delta \). The notation H refers to the Hilbert space of square-integrable \({\mathbb {R}}^{d}\)-valued functions on [0, 1].

For \(F=(F^{1},\ldots ,F^{d})\) with \(F^{i}\in {\mathbb {D}}^{1,2}\), \(i=1,\ldots ,d\) the corresponding Malliavin matrix \(\sigma (F)\) is defined as

$$\begin{aligned} \sigma (F)_{ij}=\sum _{l=1}^{d}\int _{0}^{1} D^{(l)}_{s}F^{i} D^{(l)}_{s}F^{j}\, ds, \end{aligned}$$

where \(D^{(l)}F^{i}\) denotes the lth coordinate of \(DF^{i}\). In the sequel, the notation \(L^p\) refers to the usual space of p-integrable real-valued random variables, for \(p\ge 1\). We define \(\gamma :=\sigma ^{-1}\) on the event where \(\sigma \) is invertible and 0 otherwise.

Lemma 4.3

For each \({\mathbf {q}}\in {\mathscr {Y}}\times {\mathbb {R}}^{d}\), we have \({\tilde{X}}_1({\mathbf {q}})\in \cap _{p\ge 1}{\mathbb {D}}^{2,p}\), \(D{\tilde{X}}_1({\mathbf {q}})\) and \(D^{2}{\tilde{X}}_1({\mathbf {q}})\) are bounded. Furthermore, if \({\mathbf {v}}\in {\mathbf {C}}^{+}\) and \({\mathbf {r}}\in {\mathbf {C}}^{1+}\) then \(\gamma \) is uniformly bounded, in particular, \(1/\mathrm {det}(\sigma ({\tilde{X}}_1({\mathbf {q}})))\in \cap _{p\ge 1}L^p\) holds.

Proof

The first statement follows from the proof of Theorem 2.2.2 of [35] which applies in the cases \(N=1,2\) by Assumption 2.4.

To see the last statement, recall from Theorem 2.2.1 of [35] that the matrix-valued process \((M_{t}(u))_{ij}:=D^{(j)}{\tilde{X}}^{i}_{t}({\mathbf {q}})\), \(t\in [u,1]\) satisfies the (random) ordinary differential equation

$$\begin{aligned} dM_{t}(u)=\partial _{x}\zeta \left( {\tilde{X}}_t({\mathbf {q}})+{\mathbf {z}}_{t},{\mathbf {v}}_{t}\right) M_{t}(u)\, dt,\ M_{u}(u)={\mathbf {v}}_u\sqrt{I-{\mathbf {r}}_{u}{\mathbf {r}}_{u}^{*}}, \end{aligned}$$
(36)

for each \(u\in [0,1]\). Assumption 2.4 gives us \(K'\) with \(|\partial _{x}\zeta |\le K'\). We see that

$$\begin{aligned} M_{1}(u)M_{1}^{*}(u)= & {} {\mathbf {v}}_u\sqrt{I-{\mathbf {r}}_{u}{\mathbf {r}}_{u}^{*}} \exp \left( {} \int _{u}^{1}\partial _{x}\zeta ({\tilde{X}}_s({\mathbf {q}})+{\mathbf {z}}_{s},{\mathbf {v}}_{s})\right. \\&\left. +\partial _{x}\zeta ^{*}({\tilde{X}}_s({\mathbf {q}})+{\mathbf {z}}_{s},{\mathbf {v}}_{s})\, ds \right) \sqrt{I-{\mathbf {r}}_{u}{\mathbf {r}}_{u}^{*}}{\mathbf {v}}_u^{*}\\\ge & {} \exp (-2K'){\mathbf {v}}_u(I-{\mathbf {r}}_{u}{\mathbf {r}}_{u}^{*}){\mathbf {v}}_u^{*}\\\ge & {} \epsilon I, \end{aligned}$$

for some \(\epsilon =\epsilon ({\mathbf {v}},{\mathbf {r}})>0\) since \({\mathbf {v}}\in {\mathbf {C}}^{+}\) and \({\mathbf {r}}\in {\mathbf {C}}^{1+}\). This implies our claim since

$$\begin{aligned} \sigma ({\tilde{X}}_1({\mathbf {q}}))=\int _{0}^{1} M_{1}(u)M_{1}(u)^{*}\, du. \end{aligned}$$

It follows also from (36) that \(D{\tilde{X}}_1({\mathbf {q}})\) is bounded, for each \({\mathbf {q}}\). Finally, Lemma 2.2.2 of [35] implies that, for each pair of indices \(i,l\), the second derivative \(DD^{(l)}{\tilde{X}}^{i}_{t}({\mathbf {q}})\) satisfies, for all indices j and all \(s\le u\le t\),

$$\begin{aligned} |D_{s}^{(j)}D_{u}^{(l)}{\tilde{X}}_{t}^{i}({\mathbf {q}})|= & {} \left| \int _{u}^{t}(D_{s}^{(j)}\partial _{x}\zeta ({\tilde{X}}_{r} ({\mathbf {q}})+{\mathbf {z}}_{r},{\mathbf {v}}_{r})) D_{u}^{(l)}{\tilde{X}}^{i}_{r}({\mathbf {q}})\right. \\&\left. +\partial _{x} \zeta ({\tilde{X}}_{r}({\mathbf {q}})+{\mathbf {z}}_{r},{\mathbf {v}}_{r}) D_{s}^{(j)}D_{u}^{(l)}{\tilde{X}}_{r}^{i}({\mathbf {q}})\, dr\right| \\\le & {} \int _{u}^{t} K_{1}+K_{2} |D_{s}^{(j)}D_{u}^{(l)}{\tilde{X}}_{r}^{i}({\mathbf {q}})|\, dr \end{aligned}$$

for constants \(K_{1},K_{2}\) since \(\partial _{x}\zeta \), \(\partial _{xx}\zeta \), \(D{\tilde{X}}^{i}_{t}({\mathbf {q}})\) are all bounded. Grönwall’s lemma now guarantees that \(D_{s}^{(j)}D_{u}^{(l)}{\tilde{X}}_{1}^{i}({\mathbf {q}})\) is also bounded. \(\square \)
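For orientation, in the scalar case \(d=1\) the linear equation (36) is solved explicitly by

$$\begin{aligned} M_{1}(u)={\mathbf {v}}_u\sqrt{1-{\mathbf {r}}_{u}^{2}}\, \exp \left( \int _{u}^{1}\partial _{x}\zeta ({\tilde{X}}_s({\mathbf {q}})+{\mathbf {z}}_{s},{\mathbf {v}}_{s})\, ds\right) , \end{aligned}$$

and the exponent is at least \(-K'\), which makes the lower bound \(M_{1}(u)M_{1}^{*}(u)\ge \mathrm {e}^{-2K'}{\mathbf {v}}_u(1-{\mathbf {r}}_{u}^{2}){\mathbf {v}}_u^{*}\) in the above display transparent.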

We now set up a discrete-time framework so that we can invoke the results of Sect. 3.2. Set \({\mathscr {X}}:={\mathbb {R}}^{d}\) and \({\mathscr {X}}_n:=\{x\in {\mathbb {R}}^{d}:\ |x|\le n\}\), \(n\in {\mathbb {N}}\). Define, for \(k\in {\mathbb {Z}}\), the \({\mathscr {Y}}\)-valued random variables

$$\begin{aligned} Y_k:=\left( (V_{k+t})_{t\in [0,1]},(Z_{k+t}-Z_{k})_{t\in [0,1]},(\rho _{k+t})_{t\in [0,1]}\right) , \end{aligned}$$
(37)

where we denote \(Z_u:=\int _0^u V_{s}\rho _s\, dB_s\), \(u\in {\mathbb {R}}_{+}\). Y is a stationary process by Assumption 2.1.
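For simulation purposes, the environment variables (37) are simply unit-length segments of the paths involved. A minimal sketch (scalar case, toy paths on \([0,K]\), all names hypothetical) of how they would be assembled from discretized trajectories:

```python
import numpy as np

# Sketch: assemble the environment variables Y_k of (37) from discretized
# trajectories of V and rho on [0, K]; the paths below are toy choices.
rng = np.random.default_rng(2)
n_per_unit, K = 500, 5
n = n_per_unit * K
dt = 1.0 / n_per_unit
V = np.exp(np.sqrt(dt) * rng.normal(size=n + 1).cumsum())  # toy volatility path
rho = 0.3 * np.ones(n + 1)                                 # toy correlation path
dB = rng.normal(scale=np.sqrt(dt), size=n)
Z = np.concatenate(([0.0], np.cumsum(V[:-1] * rho[:-1] * dB)))  # Z_u = int_0^u V rho dB

def Y(k):
    """Return (V_{k+t}, Z_{k+t} - Z_k, rho_{k+t}), t in [0,1], as arrays."""
    s = slice(k * n_per_unit, (k + 1) * n_per_unit + 1)
    return V[s], Z[s] - Z[k * n_per_unit], rho[s]

v_seg, z_seg, r_seg = Y(2)   # the environment variable Y_2
```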

By Prokhorov’s theorem, there exists an increasing sequence of compact sets \({\mathbf {D}}_n\subset {\mathbf {C}}_{d\times d}\times {\mathbf {C}}_{d}\times {\mathbf {C}}^{1}\), \(n\in {\mathbb {N}}\), such that \(P(Y_0\notin {\mathbf {D}}_n)\le 1/n\). As \(V\in {\mathscr {V}}\) and \(\rho \in {\mathscr {R}}\), \(P(Y_{0}\in {\mathbf {C}}^{+}\times {\mathbf {C}}_{d}\times {\mathbf {C}}^{1+})=1\) holds. Thus there is an increasing \({\mathbb {N}}\)-valued sequence \(l(n)\rightarrow \infty \), \(n\rightarrow \infty \), such that the sets

$$\begin{aligned} {\mathscr {Y}}_n:=\{({\mathbf {v}},{\mathbf {z}},{\mathbf {r}})\in {\mathbf {D}}_{n}:\ {\mathbf {r}}_{s}{\mathbf {r}}_{s}^{*}\le (1-1/l(n))I,\ {\mathbf {v}}_{s}{\mathbf {v}}^{*}_{s}\ge I/l(n),\ s\in [0,1] \} \end{aligned}$$

satisfy \(P(Y_{0}\notin {\mathscr {Y}}_{n})\le 2/n\), \(n\in {\mathbb {N}}\): it suffices to choose l(n) so large that \(P(Y_{0}\in {\mathbf {D}}_{n}\setminus {\mathscr {Y}}_{n})\le 1/n\), which is possible since the constraint sets increase to a set of full measure as \(l(n)\rightarrow \infty \). Being closed subsets of the respective \({\mathbf {D}}_{n}\), the \({\mathscr {Y}}_{n}\) are compact and satisfy \(P(Y_{0}\notin {\mathscr {Y}}_{n})\rightarrow 0\), \(n\rightarrow \infty \).

We define a metric on \({\mathscr {Q}}:={\mathscr {Y}}\times {\mathbb {R}}^{d}\) by setting, for \({\mathbf {q}}_{i}=({\mathbf {v}}^{i}, {\mathbf {z}}^{i},{\mathbf {r}}^{i},x^{i})\), \(i=1,2\),

$$\begin{aligned} \rho ({\mathbf {q}}_{1},{\mathbf {q}}_{2}):=||{\mathbf {v}}^{1}-{\mathbf {v}}^{2}||_{{\mathbf {C}}_{d\times d}} +||{\mathbf {z}}^{1}-{\mathbf {z}}^{2}||_{{\mathbf {C}}_{d}}+||{\mathbf {r}}^{1}-{\mathbf {r}}^{2}||_{{\mathbf {C}}_{d\times m}} +|x^{1}-x^{2}|. \end{aligned}$$

Continuity of \({\tilde{X}}_t({\mathbf {q}})\) and its Malliavin derivatives with respect to the parameter \({\mathbf {q}}\) is established next.

Lemma 4.4

For each \(n\in {\mathbb {N}}\) and \(p\ge 2\) there exists \(C(n,p)>0\) such that for all \({\mathbf {q}}^{1},{\mathbf {q}}^{2}\in {\mathscr {Y}}_{n}\times {\mathscr {X}}_{n}\) we have

$$\begin{aligned} E^{1/p}[\sup _{t\in [0,1]}|{\tilde{X}}_t({\mathbf {q}}^{1})-{\tilde{X}}_t({\mathbf {q}}^{2})|^{p}]\le & {} C(n,p){} \rho ({\mathbf {q}}^{1}, {\mathbf {q}}^{2}),\\ E^{1/p}\left[ ||D{\tilde{X}}_1({\mathbf {q}}^{1})- D{\tilde{X}}_1({\mathbf {q}}^{2})||_{H}^{p}\right]\le & {} C(n,p)\rho ({\mathbf {q}}^{1}, {\mathbf {q}}^{2}), \\ E^{1/p}\left[ ||D^2{\tilde{X}}_1({\mathbf {q}}^{1})-D^2{\tilde{X}}_1({\mathbf {q}}^{2})||_{H\otimes H}^{p}\right]\le & {} C(n,p)\rho ({\mathbf {q}}^{1}, {\mathbf {q}}^{2}). \end{aligned}$$

Proof

For \(i=1,2\), define the Picard iterates \(Z^{i}_{0}(t):=x^{i}\), \(t\in [0,1]\) and

$$\begin{aligned} Z^{i}_{l+1}(t):=x^{i}+\int _{0}^{t}\zeta \left( Z^{i}_{l}(s) +{\mathbf {z}}^{i}_s,{\mathbf {v}}^{i}_{s}\right) \, ds+ \int _{0}^{t}{\mathbf {v}}^{i}_s\sqrt{I-{\mathbf {r}}^{i}_s{\mathbf {r}}^{i*}_{s}}\, dW_s, \end{aligned}$$
(38)

for \(t\in [0,1]\) and \(l\in {\mathbb {N}}\). Let \(K_{0}\) denote a bound for \(|\partial _{x}\zeta |\) such that \(K_{0}\ge K\) where K is as in Assumption 2.4. Clearly,

$$\begin{aligned}&\sup _{u\in [0,t]}|Z^{1}_{l+1}(u)-Z^{2}_{l+1}(u)| \\&\quad \le |x^{1}-x^{2}| + K_{0}\int _{0}^{t}\sup _{u\in [0,s]}|Z^{1}_{l}(u)- Z^{2}_{l}(u)|\, ds+ K_{0}\int _{0}^{t}|{\mathbf {z}}^{1}_{s}-{\mathbf {z}}^{2}_{s}|\, ds\\&\qquad +K_{0}\int _{0}^{t}(1+|{\mathbf {v}}^{1}_{s}|+|{\mathbf {v}}^{2}_{s}|)|{\mathbf {v}}^{1}_{s}-{\mathbf {v}}^{2}_{s}|\, ds\\&\qquad + \sup _{s\in [0,t]}\left| \int _{0}^{s} {\mathbf {v}}^{1}_{u}\sqrt{I-{\mathbf {r}}^{1}_{u}{\mathbf {r}}^{1*}_{u}} -{\mathbf {v}}^{2}_{u}\sqrt{I-{\mathbf {r}}^{2}_{u}{\mathbf {r}}^{2*}_{u}}\, dW_{u}\right| . \end{aligned}$$

Note that, by the Cauchy–Schwarz inequality, there is \(C>0\) (one may take \(C=5\)) such that

$$\begin{aligned} (x_{1}+x_{2}+x_{3}+x_{4}+x_{5})^{2}\le C(x^{2}_{1}+x_{2}^{2}+x_{3}^{2}+x_{4}^{2}+x_{5}^{2}),\ x_{i}\in {\mathbb {R}},\ i=1,2,3,4,5. \end{aligned}$$

Taking squares and using Cauchy’s inequality (in the form \((\int _{0}^{t}f\, ds)^{2}\le t\int _{0}^{t}f^{2}\, ds\)) we arrive at

$$\begin{aligned}&\sup _{u\in [0,t]}|Z^{1}_{l+1}(u)-Z^{2}_{l+1}(u)|^{2} \\&\quad \le C|x^{1}-x^{2}|^{2}+CtK_{0}^{2}\int _{0}^{t} \sup _{u\in [0,s]}|Z^{1}_{l}(u)- Z^{2}_{l}(u)|^{2}+ |{\mathbf {z}}^{1}_{s}-{\mathbf {z}}^{2}_{s}|^{2}\\&\qquad + (1+|{\mathbf {v}}^{1}_{s}|+|{\mathbf {v}}^{2}_{s}|)^{2} |{\mathbf {v}}^{1}_{s}-{\mathbf {v}}^{2}_{s}|^{2}\, ds\\&\qquad + C\sup _{s\in [0,t]}\left| \int _{0}^{s} {\mathbf {v}}^{1}_{u}\sqrt{I-{\mathbf {r}}^{1}_{u}{\mathbf {r}}^{1*}_{u}} -{\mathbf {v}}^{2}_{u}\sqrt{I-{\mathbf {r}}^{2}_{u}{\mathbf {r}}^{2*}_{u}}\, dW_{u}\right| ^{2}. \end{aligned}$$

Taking expectations, applying Doob’s inequality and noting \(t\le 1\),

$$\begin{aligned}&E\sup _{u\in [0,t]}|Z^{1}_{l+1}(u)-Z^{2}_{l+1}(u)|^{2} \\&\quad \le C|x^{1}-x^{2}|^{2}+CK_{0}^{2}\int _{0}^{t} E\sup _{u\in [0,s]}|Z^{1}_{l}(u)- Z^{2}_{l}(u)|^{2}+|{\mathbf {z}}^{1}_{s}-{\mathbf {z}}^{2}_{s}|^{2}\\&\qquad +(1+|{\mathbf {v}}^{1}_{s}|+|{\mathbf {v}}^{2}_{s}|)^{2} |{\mathbf {v}}^{1}_{s}-{\mathbf {v}}^{2}_{s}|^{2}\, ds\\&\qquad + 4CE\int _{0}^{1} \left| {\mathbf {v}}^{1}_{u}\sqrt{I-{\mathbf {r}}^{1}_{u}{\mathbf {r}}^{1*}_{u}} -{\mathbf {v}}^{2}_{u}\sqrt{I-{\mathbf {r}}^{2}_{u}{\mathbf {r}}^{2*}_{u}}\right| ^{2}\, du\\&\quad \le C_{n}'\left[ \rho ^{2}({\mathbf {q}}^{1},{\mathbf {q}}^{2})+\int _{0}^{t}E\sup _{u\in [0,s]}|Z^{1}_{l}(u)- Z^{2}_{l}(u)|^{2}\, ds\right] \end{aligned}$$

for a suitable \(C_{n}'\), because \(z\rightarrow \sqrt{I-zz^{*}}\) is Lipschitz-continuous on the set \(\{z:\ zz^{*}\le (1-\epsilon )I\}\) for each \(\epsilon >0\). Since \(Z^{i}_{0}\) are constants, iterating this estimate in l (a Grönwall-type induction) yields a constant \(C_{n}''\), independent of l, such that

$$\begin{aligned} E\left[ \sup _{t\in [0,1]}|Z^{1}_{l+1}(t)-Z^{2}_{l+1}(t)|^{2}\right] \le C_{n}'' \rho ^{2}({\mathbf {q}}^{1}, {\mathbf {q}}^{2}). \end{aligned}$$

Since the Picard iterates converge to the solution (see e.g. Lemma 2.2.1 in [35]), we get

$$\begin{aligned} E^{1/2}\left[ \sup _{t\in [0,1]}|{\tilde{X}}_t({\mathbf {q}}^{1})-{\tilde{X}}_t({\mathbf {q}}^{2})|^{2}\right] \le \sqrt{C_{n}''}\rho ({\mathbf {q}}^{1}, {\mathbf {q}}^{2}). \end{aligned}$$

A similar argument works in \(L^{p}\) for \(p>2\), too. Now recall that \(D{\tilde{X}}({\mathbf {q}})\) and \(D^{2}{\tilde{X}}({\mathbf {q}})\) satisfy similar (even simpler) equations, see Theorem 2.2.1 of [35], so analogous arguments apply to them, proving the remaining two inequalities. \(\square \)
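The Picard scheme (38) can also be visualised numerically: on a fixed discretized Brownian path the stochastic term is the same for every iterate, and the successive differences shrink at the factorial rate implicit in the above induction. A minimal sketch (scalar case, with a hypothetical drift and placeholder frozen paths):

```python
import numpy as np

# Sketch of the Picard iteration (38) on a single fixed Brownian path
# (scalar case d = 1); zeta and the frozen paths v, z, r are placeholders.
def zeta(x, v):
    return -x + 0.1 * v          # hypothetical drift, Lipschitz in x

rng = np.random.default_rng(1)
n = 1000
dt = 1.0 / n
t = np.linspace(0.0, 1.0, n + 1)
v = 0.2 + 0.05 * np.sin(2 * np.pi * t)
r = 0.3 * np.ones(n + 1)
z = 0.01 * t
x0 = 1.0

dW = rng.normal(scale=np.sqrt(dt), size=n)
# the stochastic term is the same for every iterate: int_0^t v sqrt(1-r^2) dW
stoch = np.concatenate(([0.0], np.cumsum(v[:-1] * np.sqrt(1.0 - r[:-1] ** 2) * dW)))

Z = np.full(n + 1, x0)           # Z_0 is identically x
for l in range(6):
    drift = np.concatenate(([0.0], np.cumsum(zeta(Z[:-1] + z[:-1], v[:-1]) * dt)))
    Z_new = x0 + drift + stoch   # Z_{l+1} from (38), left-point quadrature
    print(l, np.max(np.abs(Z_new - Z)))   # successive differences shrink
    Z = Z_new
```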

We continue with some more technical material. In the following lemma, we rely on the powerful techniques presented in [3].

Lemma 4.5

The random variables \({\tilde{X}}_{1}({\mathbf {q}})\) have densities \(p_{{\mathbf {q}}}(u)\), \(u\in {\mathbb {R}}^{d}\), with respect to \(\mathrm {Leb}_{d}\), the d-dimensional Lebesgue measure, for each \({\mathbf {q}}\in {\mathscr {Y}}_{n}\times {\mathscr {X}}_{n}\), for each n. These densities have versions such that the mapping \((u,{\mathbf {q}})\rightarrow p_{{\mathbf {q}}}(u)\) is continuous on \({\mathbb {R}}^{d}\times {\mathscr {Y}}_{n}\times {\mathscr {X}}_{n}\), for each n.

Proof

Fix \(m\in {\mathbb {N}}\); let \({\mathbf {q}}_{n}\in {\mathscr {Y}}_{m}\times {\mathscr {X}}_{m}\) and \(u_{n}\in {\mathbb {R}}^{d}\), \(n\in {\mathbb {N}}\), be such that

$$\begin{aligned} (u_n,{\mathbf {q}}_n):=(u_n,{\mathbf {v}}_n,{\mathbf {z}}_n,{\mathbf {r}}_n,x_n)\rightarrow (u,{\mathbf {q}}):=(u,{\mathbf {v}},{\mathbf {z}},{\mathbf {r}},x) \end{aligned}$$

holds as \(n\rightarrow \infty \). Let \(\partial _i Q\), \(i=1,\ldots ,d\), denote the ith partial derivative of the Poisson kernel on \({\mathbb {R}}^{d}\), see page 14 of [3]. We rely on the Malliavin-Thalmaier formula for the density of functionals on the Wiener space [30], as presented in [3]. By Theorem 2.3.1 of [3], the representation

$$\begin{aligned} p_{{\mathbf {q}}}(u)=\sum _{i=1}^{d}\sum _{j=1}^{d}E\left[ \partial _i Q\left( {\tilde{X}}_{1}({\mathbf {q}})-u\right) {} \delta \left( \gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}))D{\tilde{X}}_{1}({\mathbf {q}})\right) \right] \end{aligned}$$
(39)

provides the density function of \({\tilde{X}}_{1}({\mathbf {q}})\) (with respect to the d-dimensional Lebesgue-measure).
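For orientation: up to a dimensional constant \(a_{d}\), the Poisson kernel satisfies, for \(d\ge 2\),

$$\begin{aligned} \partial _i Q(x)=a_{d}\frac{x_{i}}{|x|^{d}},\qquad |\partial _i Q(x)|\le a_{d}|x|^{1-d}, \end{aligned}$$

which is the source both of the \(L^{(d+1)/d}\)-type integrability bounds and of the Lipschitz-continuity of \(\partial _i Q\) away from the origin exploited below.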

Fix i, j. By Lemma 4.3, the sequence \({\tilde{X}}_{1}({\mathbf {q}}_{n})\) is bounded in \({\mathbb {D}}^{2,p}\) for all p and \(\gamma ({\tilde{X}}_{1}({\mathbf {q}}_{n}))\) is uniformly bounded, hence \(\partial _i Q\left( {\tilde{X}}_{1}({\mathbf {q}}_{n})-u\right) \) is bounded in \(L^{(d+1)/d}\) by (2.86) in Theorem 2.3.1 of [3]. By Corollary 2.2.12 in [3], the sequence \(\delta \left( \gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}_{n}))D{\tilde{X}}_{1}({\mathbf {q}}_{n})\right) \) is bounded in \(L^{p}\) for all \(p\ge 1\), in particular in \(L^{(d+1/2)(2d+2)/d}\). Applying Hölder’s inequality with the conjugate pair of exponents \((d+1)/(d+1/2)\) and \(2d+2\), we get that

$$\begin{aligned}&E\left[ \left| \partial _i Q\left( {\tilde{X}}_{1}({\mathbf {q}}_{n})-u\right) {} \delta \left( \gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}_{n}))D{\tilde{X}}_{1}({\mathbf {q}}_{n})\right) \right| ^{(d+1/2)/d} \right] \\&\quad \le E\left[ \left| \partial _i Q\left( {\tilde{X}}_{1}({\mathbf {q}}_{n})-u\right) \right| ^{(d+1)/d}\right] ^{(d+1/2)/(d+1)}\\&\qquad E\left[ \left| \delta \left( \gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}_{n})) D{\tilde{X}}_{1}({\mathbf {q}}_{n})\right) \right| ^{(d+1/2)(2d+2)/d}\right] ^{1/(2d+2)}, \end{aligned}$$

hence

$$\begin{aligned} \sup _{n}E\left[ \left| \partial _i Q\left( {\tilde{X}}_{1}({\mathbf {q}}_{n})-u\right) {} \delta \left( \gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}_{n}))D{\tilde{X}}_{1}({\mathbf {q}}_{n})\right) \right| ^{(d+1/2)/d} \right] <\infty . \end{aligned}$$
(40)

This means that the sequence \(\partial _i Q\left( {\tilde{X}}_{1}({\mathbf {q}}_{n})-u\right) {} \delta \left( \gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}_{n}))D{\tilde{X}}_{1}({\mathbf {q}}_{n})\right) \), \(n\in {\mathbb {N}}\), is uniformly integrable, hence it suffices to prove that

$$\begin{aligned}&\partial _i Q\left( {\tilde{X}}_{1}({\mathbf {q}}_{n})-u_{n}\right) {} \delta \left( \gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}_{n}))D{\tilde{X}}_{1}({\mathbf {q}}_{n})\right) \nonumber \\&\quad \rightarrow {} \partial _i Q\left( {\tilde{X}}_{1}({\mathbf {q}})-u\right) {} \delta \left( \gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}))D{\tilde{X}}_{1}({\mathbf {q}})\right) ,\ n\rightarrow \infty \end{aligned}$$
(41)

in probability. We remark that in (40) we can also take a supremum in \(u\in C\) for any compact \(C\subset {\mathbb {R}}^{d}\) and this also implies

$$\begin{aligned} \sup _{n}\sup _{u\in C}p_{{\mathbf {q}}_{n}}(u)<\infty . \end{aligned}$$
(42)

We start by treating the factor \(\partial _i Q\) in (41). Let \(\varepsilon >0\) be given. Notice that each \(\partial _i Q\) is Lipschitz-continuous outside the ball of radius \(\varepsilon \) centred at the origin, with Lipschitz constant, say, \(K_{\varepsilon }\). Let M be such that \(\sup _{n}|u_{n}|\le M\) and denote by \(A_{d}\) the volume of the unit ball in \({\mathbb {R}}^{d}\). We thus have, by the Markov inequality,

$$\begin{aligned}&P\left( |\partial _i Q\left( {\tilde{X}}_{1}({\mathbf {q}}_{n})-u_{n}\right) -\partial _i Q\left( {\tilde{X}}_{1}({\mathbf {q}})-u\right) |\ge \varepsilon {} \right) \\&\quad \le P\left( |{\tilde{X}}_{1}({\mathbf {q}}_{n})-u_{n}|\le \varepsilon \right) {} +P\left( |{\tilde{X}}_{1}({\mathbf {q}})-u|\le \varepsilon \right) \\&\qquad + K_{\varepsilon }E\left[ |{\tilde{X}}_{1}({\mathbf {q}}_{n})-u_{n}-{\tilde{X}}_{1}({\mathbf {q}})+u|\right] /\varepsilon \\&\quad \le A_{d}\varepsilon ^{d} \sup _{|v|\le M+\varepsilon }\sup _{k\in {\mathbb {N}}}p_{{\mathbf {q}}_{k}}(v)+ A_{d}\varepsilon ^{d} \sup _{|v|\le M+\varepsilon }p_{{\mathbf {q}}}(v)\\&\qquad + K_{\varepsilon } {E\left[ |{\tilde{X}}_{1}({\mathbf {q}}_{n})-u_{n} -{\tilde{X}}_{1}({\mathbf {q}})+u|\right] }/{\varepsilon }. \end{aligned}$$

Here the third term is smaller than \(\varepsilon \) for n large enough, by Lemma 4.4. Then all three terms can be made arbitrarily small by choosing \(\varepsilon \) small enough and n large enough, the suprema being finite by (42). This shows convergence in probability for the first factor in (41).

Now we turn to the second factor in (41). By Proposition 1.5.4 of [35], convergence of

$$\begin{aligned} \gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}_{n}))D{\tilde{X}}_{1}({\mathbf {q}}_{n})\rightarrow \gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}))D{\tilde{X}}_{1}({\mathbf {q}})\quad \text{ in } {\mathbb {D}}^{1,2} \end{aligned}$$
(43)

implies the \(L^{2}\)-convergence (hence also convergence in probability) of

$$\begin{aligned} \delta \left( \gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}_{n}))D{\tilde{X}}_{1}({\mathbf {q}}_{n})\right) \rightarrow {} \delta \left( \gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}))D{\tilde{X}}_{1}({\mathbf {q}})\right) .{} \end{aligned}$$

So it remains to establish (43).

Since \(\gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}_{n}))\) are uniformly bounded and \(D{\tilde{X}}_{1}({\mathbf {q}}_{n})\rightarrow D{\tilde{X}}_{1}({\mathbf {q}})\) in \(L^{2}([0,1]\times \Omega )\), we clearly have that \(\gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}_{n}))D{\tilde{X}}_{1}({\mathbf {q}}_{n})\rightarrow \gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}))D{\tilde{X}}_{1}({\mathbf {q}})\) in \(L^{2}([0,1]\times \Omega )\) by Lemma 4.4 and by the fact that matrix inversion is a continuous operation. Let us now have a closer look at the Malliavin derivative of \(\gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}_{n}))D{\tilde{X}}_{1}({\mathbf {q}}_{n})\).

First, let us recall that \(D^{2}{\tilde{X}}_{1}({\mathbf {q}}_{n})\rightarrow D^{2}{\tilde{X}}_{1}({\mathbf {q}})\) in \(L^{p}([0,1]^{2}\times \Omega )\) for all \(p\ge 1\), by Lemma 4.4. Also, \(\gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}_{n}))\rightarrow \gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}))\) in \(L^{p}\) and \(D{\tilde{X}}_{1}({\mathbf {q}}_{n})\rightarrow D{\tilde{X}}_{1}({\mathbf {q}})\) in \(L^{p}\). It remains to establish

$$\begin{aligned} D\gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}}_{n}))\rightarrow D\gamma ^{i,j}({\tilde{X}}_{1}({\mathbf {q}})) \end{aligned}$$
(44)

in \(L^{p}\).

Denote by G the operation of matrix inversion, defined on invertible \(d\times d\) matrices, and by \(G'\) its derivative. On the set of positive definite matrices bounded away from 0, \(G'\) is bounded. Now (44) follows from Lemma 4.4 and from our previous observations. As these arguments work for all i, j, we indeed get \(p_{{\mathbf {q}}_{n}}(u_{n})\rightarrow {} p_{{\mathbf {q}}}(u)\). \(\square \)

Proving the positivity of densities is an evergreen topic in Malliavin calculus. We rely on the deep study [2] in the next lemma.

Lemma 4.6

We have \(p_{{\mathbf {q}}}(u)>0\) for every \({\mathbf {q}}\), u.

Proof

Fix \({\mathbf {q}}\), u. We will choose \(0<\eta <1/2\) later. We will apply Theorem 3.3 of [2] with the choice \(y=u\), \(r=1\), \(T=1\), \(\delta =\eta \),

$$\begin{aligned} F= & {} {\tilde{X}}_{1}({\mathbf {q}}),\ F_{1-\eta }={\tilde{X}}_{1-\eta }({\mathbf {q}}),\\ G_{\eta }= & {} {\mathbf {v}}_{1-\eta }\sqrt{I-{\mathbf {r}}_{1-\eta }{\mathbf {r}}_{1-\eta }^{*}}(W_{1}-W_{1-\eta }),\\ R_{\eta }= & {} R^{1}_{\eta }+R^{2}_{\eta }= \int _{1-\eta }^{1}({\mathbf {v}}_{s}\sqrt{I-{\mathbf {r}}_{s}{\mathbf {r}}_{s}^{*}} -{\mathbf {v}}_{1-\eta }\sqrt{I-{\mathbf {r}}_{1-\eta }{\mathbf {r}}_{1-\eta }^{*}})\, dW_{s}\\&+ \int _{1-\eta }^{1} \zeta ({\tilde{X}}_{s}({\mathbf {q}})+{\mathbf {z}}_{s},{\mathbf {v}}_{s})\, ds. \end{aligned}$$

We now check the conditions of that theorem.

First, \(F_{1-\eta }\) is \({\mathscr {F}}_{1-\eta }\)-measurable. Second, as all coordinates of \({\tilde{X}}_{1}({\mathbf {q}})\) are in \({\mathbb {D}}^{2,\infty }\) by Lemma 4.3, so are those of \(R_{\eta }\). Third, since \(s\rightarrow ({\mathbf {r}}_{s},{\mathbf {v}}_{s})\) is continuous and \({\mathbf {r}}\in {\mathbf {C}}^{1+}\), \({\mathbf {v}}\in {\mathbf {C}}^{+}\), there is \(\varepsilon >0\) such that \({\mathbf {v}}_{s}\sqrt{I-{\mathbf {r}}_{s}{\mathbf {r}}_{s}^{*}}\ge \varepsilon I\) for all s. It follows that

$$\begin{aligned} C_{\eta }:=\int _{1-\eta }^{1} {\mathbf {v}}^{*}_{s}[I-{\mathbf {r}}_{s}{\mathbf {r}}_{s}^{*}]{\mathbf {v}}_{s}\, ds\ge \eta \varepsilon ^{2}I. \end{aligned}$$

Clearly, \(\mathrm {det}(C_{\eta })\ne 0\).

In order to apply Theorem 3.3 of [2], it remains to check that the event

$$\begin{aligned} {\tilde{\Gamma }}_{\eta ,1}= \{|F_{1-\eta }-y|\le 1/2\}\cap \{ ||C_{\eta }^{-1/2}R_{\eta }||_{\eta ,2,q}\le a\mathrm {e}^{-1}\} \end{aligned}$$

has positive probability for a suitable \(\eta \). Here q and a are explicit constants whose precise form can be found in [2]. For \(U\in {\mathbb {R}}^{d}\) with all coordinates in \({\mathbb {D}}^{2,q}\), the norm \(||U||_{\eta ,2,q}\) is defined as the random quantity

$$\begin{aligned}&\left( E_{T-\eta }[|U|^{q}]+E_{T-\eta }\left[ \left( \int _{T-\eta }^{T} |D_{s}(U)|^{2}\, ds\right) ^{q/2}\right] \right. \nonumber \\&\quad \qquad \left. + E_{T-\eta }\left[ \left( \int _{T-\eta }^{T}\int _{T-\eta }^{T} |D^{2}_{s_{1},s_{2}}(U)|^{2}\, ds_{1}\, ds_{2}\right) ^{q/2}\right] \right) ^{1/q}, \end{aligned}$$
(45)

with \(E_{T-\eta }\) denoting conditional expectation with respect to \({\mathscr {F}}_{T-\eta }\). As we have already seen,

$$\begin{aligned}C_{\eta }^{-1/2}\le \frac{1}{\varepsilon \sqrt{\eta }}I,\end{aligned}$$

so it suffices to show that

$$\begin{aligned} {\hat{\Gamma }}_{\eta ,1}:=\{|F_{1-\eta }-u|\le 1/2\}\cap \{ ||R_{\eta }||_{\eta ,2,q}\le a\mathrm {e}^{-1}{\varepsilon \sqrt{\eta }}\} \end{aligned}$$

has positive probability.

By an easy extension of the support theorem for diffusions, see [19], the process \({\tilde{X}}({\mathbf {q}})\) has full support on the space of continuous functions starting from x, so we clearly have that

$$\begin{aligned} P(A_{\eta })>0 \text{ for } A_{\eta }:=\{|F_{1-\eta }-u|\le 1/2\},{} \end{aligned}$$

for each \(0<\eta <1/2\). A standard argument (see e.g. Lemma 19 of [1]) shows that

$$\begin{aligned} \Vert R_{\eta }\Vert _{\eta ,2,q}\le (1+|F_{1-\eta }|)\eta \end{aligned}$$

almost surely. But then on \(A_{\eta }\) we have \(\Vert R_{\eta }\Vert _{\eta ,2,q}\le (1+|u|+1/2)\eta \). Clearly, this is smaller than \(a\mathrm {e}^{-1}{\varepsilon \sqrt{\eta }}\) for \(\eta \) small enough. We conclude that the set \({\hat{\Gamma }}_{\eta ,1}\) contains \(A_{\eta }\) for \(\eta \) small enough, consequently it has positive probability. Now Theorem 3.3 of [2] implies that \(p_{{\mathbf {q}}}(\cdot )\ge c\) Lebesgue-a.s. in a neighbourhood of u, for some \(c>0\). As \(p_{{\mathbf {q}}}(u)\) is continuous in u, \(p_{{\mathbf {q}}}(u)\ge c\), proving the lemma. \(\square \)

Corollary 4.7

There exist constants \({\tilde{c}}_n>0\), \(n\in {\mathbb {N}}\), such that for each \(A\in {\mathscr {B}}({\mathbb {R}}^{d})\) with \(A\subset [-1,1]^{d}\) and for all \(({\mathbf {v}},{\mathbf {z}},{\mathbf {r}})\in {\mathscr {Y}}_n\), \(x\in {\mathscr {X}}_n\),

$$\begin{aligned} P({\tilde{X}}_1({\mathbf {v}},{\mathbf {z}},{\mathbf {r}},x)\in A)\ge {\tilde{c}}_n\mathrm {Leb}_{d}(A). \end{aligned}$$

Proof

Compactness of \([-1,1]^{d}\times {\mathscr {Y}}_{n}\times {\mathscr {X}}_{n}\) together with Lemmas 4.5 and 4.6 implies that

$$\begin{aligned} \inf _{(u,{\mathbf {q}})\in [-1,1]^{d}\times {\mathscr {Y}}_n\times {\mathscr {X}}_n} p_{{\mathbf {q}}}(u)>0. \end{aligned}$$

\(\square \)

Define \({\hat{X}}_t({\mathbf {v}},{\mathbf {z}},{\mathbf {r}},x):={\tilde{X}}_t({\mathbf {v}},{\mathbf {z}},{\mathbf {r}},x) +{\mathbf {z}}_{t}\), \(t\in [0,1]\). This process satisfies the integral equation

$$\begin{aligned} {\hat{X}}_t({\mathbf {v}},{\mathbf {z}},{\mathbf {r}},x)=x+ \int _{0}^{t}\zeta ({\hat{X}}_s,{\mathbf {v}}_{s})\, ds+{\mathbf {z}}_{t}+ \int _{0}^{t}{\mathbf {v}}_{s}\sqrt{I-{\mathbf {r}}_{s}{\mathbf {r}}_{s}^{*}}\, dW_{s},\ t\in [0,1] \end{aligned}$$

hence it serves as the “parametric version” of (4).

Corollary 4.8

There exist constants \({\hat{c}}_n>0\), \(n\in {\mathbb {N}}\), such that for each \(A\in {\mathscr {B}}({\mathbb {R}}^{d})\) with \(A\subset [-1,1]^{d}\) and for all \(({\mathbf {v}},{\mathbf {z}},{\mathbf {r}})\in {\mathscr {Y}}_n\), \(x\in {\mathscr {X}}_n\),

$$\begin{aligned} P({\hat{X}}_1({\mathbf {v}},{\mathbf {z}},{\mathbf {r}},x)\in A)\ge {\hat{c}}_n\mathrm {Leb}_{d}(A). \end{aligned}$$

Proof

Note that \(({\mathbf {v}},{\mathbf {z}},{\mathbf {r}})\rightarrow ||{\mathbf {z}}||_{{\mathbf {C}}_{d}}\) is bounded on each \({\mathscr {Y}}_{n}\). Hence there is \(N\ge n\) such that whenever \(x\in {\mathscr {X}}_{n}\) one has \(x+{\mathbf {z}}_{1}\in {\mathscr {X}}_{N}\) for all \(({\mathbf {v}},{\mathbf {z}},{\mathbf {r}})\in {\mathscr {Y}}_{n}\). Now Corollary 4.7 readily implies the statement. \(\square \)

Lemma 4.9

There exists a measurable mapping \(\Xi :\Omega \times \cup _{n}{\mathscr {Y}}_{n}\times {\mathbb {R}}^{d}\rightarrow {} {\mathbf {C}}_{d}\) satisfying, for all \({\mathbf {q}}\in \cup _{n}{\mathscr {Y}}_{n}\times {\mathbb {R}}^{d}\), the equation

$$\begin{aligned} {\Xi }_{t}({\mathbf {q}})=x+\int _{0}^{t}\zeta (\Xi _{s}({\mathbf {q}}),{\mathbf {v}}_{s})\, ds+ {\mathbf {z}}_{t}+\int _{0}^{t}{\mathbf {v}}_{s}\sqrt{I-{\mathbf {r}}_{s}{\mathbf {r}}_{s}^{*}}\, dW_{s}, \ t\in [0,1]. \end{aligned}$$

For almost all \(\omega \), \(\Xi (\omega ,\cdot ,\cdot )\) is continuous. Furthermore, \(\Xi _{t}(\cdot ,Y_{k},L_{k})\), \(t\in [0,1]\) is a version of \(L_{k+t}\), \(t\in [0,1]\). From now on we always take this version of L.

Proof

Let us take an increasing sequence of sets \(B_{n}\subset {\mathscr {Y}}_{n}\times {\mathscr {X}}_{n}\), \(n\in {\mathbb {N}}\), which are countable and dense in \({\mathscr {Y}}_{n}\times {\mathscr {X}}_{n}\). By Lemma 4.4, there is a common P-null set \(N\in {\mathscr {F}}\) such that for \(\omega \in \Omega \setminus N\) the mapping \({\mathbf {q}}\rightarrow ({\hat{X}}_{u}({\mathbf {q}})(\omega ))_{u\in [0,1]}\in {\mathbf {C}}_{d}\) is uniformly continuous on \(B_{n}\) for each n, hence it has a continuous extension to \({\mathscr {Y}}_{n}\times {\mathscr {X}}_{n}\) which coincides with the respective extensions on \({\mathscr {Y}}_{l}\times {\mathscr {X}}_{l}\) for \(l\le n\). We thus eventually get a function \(\Xi :(\Omega \setminus N)\times \cup _{n}{\mathscr {Y}}_{n}\times {\mathbb {R}}^{d}\rightarrow {} {\mathbf {C}}_{d}\) that is measurable in its first variable and jointly continuous in its second and third, hence jointly measurable in all three variables. (We set \(\Xi :=0\) on N.)

For any \({\mathscr {G}}_{\infty }\vee {\mathscr {F}}_{k}\)-measurable step function \({\mathbf {Q}}:\Omega \rightarrow \cup _{n}{\mathscr {Y}}_{n}\times {\mathbb {R}}^{d}\) with \({\mathbf {Q}}=({\mathbf {V}},{\mathbf {Z}},{\mathbf {R}},X)\) it clearly holds that

$$\begin{aligned} {\Xi }_{t}({\mathbf {Q}})=X+\int _{0}^{t} \zeta (\Xi _{s}({\mathbf {Q}}),{\mathbf {V}}_{s})\, ds+{\mathbf {Z}}_{t}+\int _{0}^{t} {\mathbf {V}}_{s}\sqrt{I-{\mathbf {R}}_{s}{\mathbf {R}}_{s}^{*}}\, dW_{s}, \ t\in [0,1], \end{aligned}$$

and this extends, by continuity, to all \(\cup _{n}{\mathscr {Y}}_{n}\times {\mathbb {R}}^{d}\)-valued \({\mathscr {G}}_{\infty }\vee {\mathscr {F}}_{k}\)-measurable random variables \({\mathbf {Q}}\). In particular, it holds for \({\mathbf {Q}}:=(Y_{k},L_{k})\), which proves the second statement. \(\square \)

Let us define the parametrized kernel Q as follows: for each \((x,y)\in {\mathbb {R}}^{d}\times \cup _{n}{\mathscr {Y}}_{n}\) and for all continuous and bounded \(\phi :{\mathbb {R}}^{d}\rightarrow {\mathbb {R}}\) we let

$$\begin{aligned} \int _{{\mathbb {R}}^{d}}\phi (z)\, Q(x,y,dz):=E[\phi (\Xi _{1}(y,x))]. \end{aligned}$$

This clearly defines a probability for each (x, y), and for a fixed \(\phi \) it is measurable in (x, y) by Lemma 4.9. Now we can recursively generate

$$\begin{aligned} X_0:=L_{0}, \quad X_{t+1}:={\Xi }_{1}(Y_{t},X_{t}),\ t\in {\mathbb {N}}, \end{aligned}$$

and see that X is a Markov chain in a random environment with kernel Q which satisfies \(X_{t}=L_{t}\), \(t\in {\mathbb {N}}\). Notice that (23) holds by Lemma 4.2 above.

Let \(\mu ,\nu \) be probabilities on \({\mathscr {B}}({\mathbb {R}}^{d}\times {\mathscr {W}}^{d\times d} \times {\mathscr {W}}^{d\times m})\). Let \({\mathscr {C}}(\mu ,\nu )\) denote the set of probabilities \(\pi \) on \({\mathscr {B}}(({\mathbb {R}}^{d}\times {\mathscr {W}}^{d\times d} \times {\mathscr {W}}^{d\times m})^{2})\) whose respective marginals are \(\mu \) and \(\nu \). Define

$$\begin{aligned}&{\mathbf {w}}(\mu ,\nu )\\&\quad :=\inf _{\pi \in {\mathscr {C}}(\mu ,\nu )} \int _{({\mathbb {R}}^{d}\times {\mathscr {W}}^{d\times d} \times {\mathscr {W}}^{d\times m})^{2}} \left( \left[ 1 \wedge |x_{1}-x_{2}|\right] +{\mathbf {d}}_{d\times d}(v_{1},v_{2})\right. \\&\qquad \left. +{\mathbf {d}}_{d\times m}(w_{1},w_{2})\right) \pi (dx_{1},dv_{1},dw_{1},dx_{2},dv_{2},dw_{2}). \end{aligned}$$

This bounded Wasserstein distance metrizes weak convergence of probabilities on \({\mathscr {B}}({\mathbb {R}}^{d}\times {\mathscr {W}}^{d\times d} \times {\mathscr {W}}^{d\times m})\) and satisfies \({\mathbf {w}}(\mu ,\nu )\le C||\mu -\nu ||_{TV}\) for some \(C>0\), see Theorem 6.15 of [43].
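The latter inequality can be seen directly: the integrand above vanishes when the two coordinates coincide and is bounded by some \(C>0\) (the metrics \({\mathbf {d}}_{d\times d}\), \({\mathbf {d}}_{d\times m}\) being bounded), so evaluating it along a maximal coupling \(\pi ^{*}\) of \(\mu ,\nu \) gives, up to the normalization of the total variation norm,

$$\begin{aligned} {\mathbf {w}}(\mu ,\nu )\le C\, \pi ^{*}\left( \omega _{1}\ne \omega _{2}\right) =C||\mu -\nu ||_{TV}, \end{aligned}$$

where \(\omega _{1},\omega _{2}\) denote the two coordinates on the product space.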

Proof of Theorem 2.9

The letter C refers to various constants in this proof. Invoking Theorem 3.10, we can establish the existence of \(\mu _{\sharp }\) such that

$$\begin{aligned} {\mathscr {L}}(L_l,{\mathbf {V}}_{l},{\mathbf {R}}_{l})\rightarrow \mu _{\sharp },\ l\rightarrow \infty ,\ l\in {\mathbb {N}} \end{aligned}$$

holds in \(||\cdot ||_{TV}\). Working on a finer time grid, we similarly obtain that, for each \(k\in {\mathbb {N}}\), the sequence of laws \({\mathscr {L}}(L_{l/2^{k}},{\mathbf {V}}_{l/2^{k}},{\mathbf {R}}_{l/2^{k}})\), \(l\in {\mathbb {N}}\), converges in \(||\cdot ||_{TV}\) as \(l\rightarrow \infty \), and all these limits necessarily equal \(\mu _{\sharp }\).

Assumption 2.4 implies Lipschitz-continuity of \(\zeta \) in its first variable and local Lipschitz-continuity, with linearly growing Lipschitz constant, in its second variable. In particular, \(|\zeta (x,v)|\le C(1+|x|+|v|^{2})\), hence for \(0<h\le 1\),

$$\begin{aligned}&E[|L_{t+h}-L_t|^2]\nonumber \\&\quad \le 3 E\left[ \left( \int _t^{t+h} \zeta \left( L_s,V_{s}\right) \, ds\right) ^2\right] + 3E\left[ \left( \int _t^{t+h} V_s\rho _{s}\, dB_s\right) ^2\right] \nonumber \\&\qquad + 3E\left[ \left( \int _t^{t+h} \sqrt{I-\rho _{s}\rho _{s}^{*}}V_s\, dW_s\right) ^2\right] \nonumber \\&\quad \le \int _t^{t+h} C\left[ E[|L_s|^2]\right. \nonumber \\&\qquad \left. +E[|V_0|^4]+1\right] \, ds +3\int _t^{t+h}E[|V_0|^2]\, ds+3\int _t^{t+h}E[|V_0|^2]\, ds\nonumber \\&\quad \le hC[{\tilde{L}} +E[|V_0|^4]+1+E[|V_0|^2]]\le Ch, \end{aligned}$$
(46)

by Assumption 2.4 and Lemma 4.2. It is only at this point that we need \(E[|V_{0}|^{4}]<\infty \).

For each \(t\in {\mathbb {R}}_+\) and \(k\in {\mathbb {N}}\), let l(k, t) denote the integer satisfying \(l(k,t)/2^k\le t<[l(k,t)+1]/2^k\). Notice that, for k fixed, \(l(k,t)\rightarrow \infty \) as \(t\rightarrow \infty \). We estimate, using (46),

$$\begin{aligned}&{\mathbf {w}}({\mathscr {L}}(L_t,{\mathbf {V}}_{t},{\mathbf {R}}_{t}),\mu _{\sharp })\\&\quad \le {\mathbf {w}}({\mathscr {L}}(L_t,{\mathbf {V}}_{t},{\mathbf {R}}_{t}), {\mathscr {L}}(L_{l(k,t)/2^k},{\mathbf {V}}_{l(k,t)/2^k},{\mathbf {R}}_{l(k,t)/2^k}))\\&\qquad + {\mathbf {w}}({\mathscr {L}}(L_{l(k,t)/2^k},{\mathbf {V}}_{l(k,t)/2^k}, {\mathbf {R}}_{l(k,t)/2^k}),\mu _{\sharp })\\&\quad \le E|L_t-L_{l(k,t)/2^k}| + E[{\mathbf {d}}_{d\times d}({\mathbf {V}}_{t},{\mathbf {V}}_{l(k,t)/2^k})]\\&\qquad + E[{\mathbf {d}}_{d\times m}({\mathbf {R}}_{t},{\mathbf {R}}_{l(k,t)/2^k})] + C||{\mathscr {L}}(L_{l(k,t)/2^k},{\mathbf {V}}_{l(k,t)/2^k},{\mathbf {R}}_{l(k,t)/2^k})-\mu _{\sharp }||_{TV}\\&\quad \le \sqrt{{C}/{2^{k}}}+ \sup _{t\in {\mathbb {R}}_{+}}\{E[{\mathbf {d}}_{d\times d}({\mathbf {V}}_{t},{\mathbf {V}}_{l(k,t)/2^k})]+ E[{\mathbf {d}}_{d\times m}({\mathbf {R}}_{t},{\mathbf {R}}_{l(k,t)/2^k})]\} \\&\qquad + C||{\mathscr {L}}(L_{l(k,t)/2^k},{\mathbf {V}}_{l(k,t)/2^k},{\mathbf {R}}_{l(k,t)/2^k})-\mu _{\sharp }||_{TV}. \end{aligned}$$

Noting Lemma 4.1 and Theorem 3.10, the latter expression can be made arbitrarily small by first choosing k large enough and then choosing t large enough.

Now we turn to proving stationarity. Theorem 3.10 implies that, if \({\mathscr {L}}(L_{0},{\mathbf {V}}_{0},{\mathbf {R}}_{0})=\mu _{\sharp }\) then

$$\begin{aligned} {\mathscr {L}}(L_{t},{\mathbf {V}}_{t},{\mathbf {R}}_{t})=\mu _{\sharp } \end{aligned}$$
(47)

holds for all dyadic rationals \(t\ge 0\). For arbitrary \(t\ge 0\), take dyadic rationals \(t_{n}\rightarrow t\), \(n\rightarrow \infty \), and estimate

$$\begin{aligned}&{\mathbf {w}}({\mathscr {L}}(L_t,{\mathbf {V}}_{t},{\mathbf {R}}_{t}), {\mathscr {L}}(L_{t_{n}},{\mathbf {V}}_{t_{n}},{\mathbf {R}}_{t_{n}}))\\&\quad \le E|L_t-L_{t_{n}}| + E[{\mathbf {d}}_{d\times d}({\mathbf {V}}_{t},{\mathbf {V}}_{t_{n}})]+ E[{\mathbf {d}}_{d\times m}({\mathbf {R}}_{t},{\mathbf {R}}_{t_{n}})], \end{aligned}$$

which tends to 0 as \(n\rightarrow \infty \), by Lemma 4.1 and by (46). Hence (47) holds for all \(t\ge 0\). \(\square \)

Proof of Theorem 2.10

Notice that, in the proof of Theorem 2.9, we used Assumption 2.5 only in Lemma 4.2. Under our current assumptions, we verify

$$\begin{aligned} \sup _{t\ge 0}E[e^{\kappa |L_{t}|}]<\infty \end{aligned}$$

for some \(\kappa >0\). This trivially entails

$$\begin{aligned} {\tilde{L}}:=\sup _{t\ge 0}E[|L_{t}|^{2}]<\infty , \end{aligned}$$

and the rest of the proof follows verbatim that of Theorem 2.9.

We use the Lyapunov function \(g(x):=\exp \left( \kappa \sqrt{1+|x|^{2}}\right) \), \(x\in {\mathbb {R}}^{d}\), where \(0<\kappa \le \kappa _{0}\) will be chosen later. Note that

$$\begin{aligned} \partial _{i}g(x)=\exp \left( \kappa \sqrt{1+|x|^{2}}\right) \frac{\kappa x^{i}}{\sqrt{1+|x|^{2}}},\ i=1,\ldots ,d,{} \end{aligned}$$

and \(|\partial _{ij}g(x)|\le C_{0}\kappa g(x)\) for all x and all \(1\le i,j\le d\), with some constant \(C_{0}>0\).
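Indeed, writing \(g(x)=\mathrm {e}^{\kappa \varphi (x)}\) with \(\varphi (x)=\sqrt{1+|x|^{2}}\), a direct computation gives

$$\begin{aligned} \partial _{ij}g(x)=g(x)\left( \frac{\kappa ^{2}x^{i}x^{j}}{1+|x|^{2}} +\frac{\kappa \delta _{ij}}{\sqrt{1+|x|^{2}}} -\frac{\kappa x^{i}x^{j}}{(1+|x|^{2})^{3/2}}\right) , \end{aligned}$$

and the three terms in the bracket are at most \(\kappa ^{2}\le \kappa _{0}\kappa \), \(\kappa \) and \(\kappa \) in absolute value, respectively, so \(C_{0}=\kappa _{0}+2\) works.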

Fix \(k\in {\mathbb {N}}\). Define the stopping times \(\tau _l:=\inf \{ t>k:|L_t|>l\}\) for \(l\in {\mathbb {N}}\). Apply Itô’s lemma to obtain

$$\begin{aligned} \mathrm {e}^{\alpha (t\wedge \tau _{l}-k)}\mathrm {e}^{\kappa \sqrt{1+|L_{t\wedge \tau _{l}}|^{2}}}\le & {} \mathrm {e}^{\kappa \sqrt{1+|L_{k}|^{2}}}+\int _{k}^{t\wedge \tau _{l}}\mathrm {e}^{\alpha (s-k)} \kappa \frac{\mathrm {e}^{\kappa \sqrt{1+|L_{s}|^{2}}}}{\sqrt{1+|L_{s}|^{2}}} \langle L_{s},\zeta (L_{s},V_{s})\rangle \, ds\\&+ \int _{k}^{t\wedge \tau _{l}}\mathrm {e}^{\alpha (s-k)} \kappa \frac{\mathrm {e}^{\kappa \sqrt{1+|L_{s}|^{2}}}}{\sqrt{1+|L_{s}|^{2}}} L_{s}^{*}V_{s}\, d{\overline{W}}_{s}\\&+\int _{k}^{t\wedge \tau _{l}}C_{1}\kappa \mathrm {e}^{\alpha (s-k)}\mathrm {e}^{\kappa \sqrt{1+|L_{s}|^{2}}}|V_{s}|^{2}\, ds\\&+ \int _{k}^{t\wedge \tau _{l}}\alpha \mathrm {e}^{\alpha (s-k)} \mathrm {e}^{\kappa \sqrt{1+|L_{s}|^{2}}}\, ds,\ t\ge k, \end{aligned}$$

for some \(C_{1}>0\). Taking expectations, using the martingale property of stochastic integrals and (6), we arrive at

$$\begin{aligned}&E\left[ e^{\alpha (t\wedge \tau _{l}-k)}\mathrm {e}^{\kappa \sqrt{1+|L_{t\wedge \tau _{l}}|^{2}}}\right] \\&\quad \le E\left[ \mathrm {e}^{\kappa \sqrt{1+|L_{k}|^{2}}}\right] \\&\qquad +E\left[ \int _{k}^{t\wedge \tau _{l}} \kappa \mathrm {e}^{\alpha (s-k)}\frac{\mathrm {e}^{\kappa \sqrt{1+|L_{s}|^{2}}}}{\sqrt{1+|L_{s}|^{2}}} (-\alpha |L_{s}|^{1+\gamma }+\beta (1+|V_{s}|^{\xi }))\, ds\right] \\&\qquad + E\left[ \int _{k}^{t\wedge \tau _{l}}C_{1}\kappa \mathrm {e}^{\alpha (s-k)} \mathrm {e}^{\kappa \sqrt{1+|L_{s}|^{2}}}|V_{s}|^{2}\, ds\right] \\&\qquad + E\left[ \int _{k}^{t\wedge \tau _{l}}\alpha \mathrm {e}^{\alpha (s-k)} \mathrm {e}^{\kappa \sqrt{1+|L_{s}|^{2}}}\, ds\right] . \end{aligned}$$

Set \(C_{2}:=C_{1}+\beta \). Let us notice that, on the event

$$\begin{aligned} A:=\left\{ |L_{s}|\ge \max \left\{ 1, \left( \frac{2\sqrt{2}}{\kappa }\right) ^{1/\gamma }+ \left( \frac{2\sqrt{2} C_{2}}{\alpha }\right) ^{1/\gamma }(1+|V_{s}|^{\xi })^{1/\gamma }\right\} \right\} \end{aligned}$$

we have

$$\begin{aligned} \frac{-\alpha \kappa }{\sqrt{1+|L_{s}|^{2}}} |L_{s}|^{1+\gamma }+C_{2}\kappa (1+|V_{s}|^{\xi })+\alpha \le 0. \end{aligned}$$

On the complement of A,

$$\begin{aligned}&\exp \left( \kappa \sqrt{1+|L_{s}|^{2}}\right) \\&\quad \le \exp \left( {} \kappa +\kappa \left( 1+\left( \frac{2\sqrt{2}}{\kappa }\right) ^{1/\gamma }+ \left( \frac{2\sqrt{2} C_{2}}{\alpha }\right) ^{1/\gamma }(1+|V_{s}|^{\xi })^{1/\gamma }\right) \right) . \end{aligned}$$

Letting \(l\rightarrow \infty \) and applying Fatou’s lemma, our estimate takes the form

$$\begin{aligned}&E\left[ e^{\alpha (t-k)}\mathrm {e}^{\kappa \sqrt{1+|L_{t}|^{2}}}\right] \le E\left[ \mathrm {e}^{\kappa \sqrt{1+|L_{k}|^{2}}}\right] \\&\qquad + C_{3}(\kappa )\int _{k}^{k+1}E\left[ (1+|V_{s}|^{\xi }) \exp \left( \kappa \left( \frac{2\sqrt{2} C_{2}}{\alpha }\right) ^{1/\gamma }(1+|V_{s}|^{\xi })^{1/\gamma }\right) {} \right] \, ds\\&\quad \le E\left[ \mathrm {e}^{\kappa \sqrt{1+|L_{k}|^{2}}}\right] + C_{4}(\kappa )E\left[ (1+|V_{0}|^{\xi }) \exp \left( \kappa \left( \frac{2\sqrt{2} C_{2}}{\alpha }\right) ^{1/\gamma }2^{{1}/{\gamma }}|V_{0}|^{\xi /\gamma }\right) \right] \end{aligned}$$

for all \(k\le t\le k+1\), with constants \(C_{3}(\kappa ),C_{4}(\kappa )\). Choosing \(\kappa \) such that \(\left( \frac{2\sqrt{2} C_{2}}{\alpha }\right) ^{1/\gamma }2^{{1}/{\gamma }}\kappa <\kappa _{0}\), the second expectation is finite, by (6). Now we can easily conclude, as in Lemma 4.2 above. \(\square \)