1 Introduction

Markov chains in random environments (recursive chains in the terminology of [3]) were systematically studied on countable state spaces in [4, 5, 17]. However, papers on the ergodic properties of such processes on a general state space are scarce and require rather strong, Doeblin-type conditions, see [13, 14, 18]. An exception is [19], where the system dynamics is assumed to be contracting instead. This, too, is a rather restrictive assumption, and only weak convergence of the laws can be established.

In this paper, we deal with Markov chains in random environments that satisfy refinements of the usual hypotheses for the geometric ergodicity of Markov chains: minorization on “small sets”, see Chapter 5 of [16], and Foster–Lyapunov-type “drift” conditions, see Chapter 15 of [16].

Assuming that a suitably defined maximal process of the random environment satisfies a tail estimate, we manage to establish stochastic stability. We use certain ideas of [12] to obtain convergence to a limiting distribution in total variation norm with estimates on the convergence rate, see Sect. 2 for the statements of our results. We also present a method to prove ergodic theorems, exploiting ideas of [2, 11]. An important technical ingredient is the notion of L-mixing, see Sect. 5.

As examples, we present difference equations modulated by Gaussian processes in Sect. 3. These can be regarded as discretizations of diffusions in random environments which arise, for instance, in stochastic volatility models of mathematical finance, see [6, 9]. These examples demonstrate the power of our approach. Proofs appear in Sects. 4, 6 and 8. Certain ramifications are explored in Sect. 7.

2 Main Results

Let \({\mathcal {Y}}\) be a Polish space with its Borel sigma-field \({\mathfrak {A}}\), and let \(Y_t\), \(t\in {\mathbb {Z}}\) be a (strongly) stationary \({\mathcal {Y}}\)-valued process on some probability space \((\Omega ,{\mathcal {F}},P)\). Expectation of a real-valued random variable X with respect to P will be denoted by E[X] in the sequel. For \(1\le p<\infty \), we write \(L^p\) to denote the Banach space of (a.s. equivalence classes of) \({\mathbb {R}}\)-valued random variables X with \(E[|X|^p]<\infty \), equipped with the usual norm.

We fix another Polish space \({\mathcal {X}}\) with its Borel sigma-field \({\mathfrak {B}}\) and denote by \({\mathcal {P}}({\mathcal {X}})\) the set of probability measures on \({\mathfrak {B}}\). Let \(Q:{\mathcal {Y}}\times {\mathcal {X}}\times {\mathfrak {B}}\rightarrow [0,1]\) be a family of transition kernels parametrized by \(y\in {\mathcal {Y}}\), i.e. for all \(A\in {\mathfrak {B}}\), \(Q(\cdot ,\cdot ,A)\) is \({\mathfrak {A}}\otimes {\mathfrak {B}}\)-measurable and for all \(y\in {\mathcal {Y}}\), \(x\in {\mathcal {X}}\), \(A\rightarrow Q(y,x,A)\) is a probability on \({\mathfrak {B}}\). Let \({X}_t\), \(t\in {\mathbb {N}}\) be an \({\mathcal {X}}\)-valued stochastic process such that

$$\begin{aligned} P({X}_{t+1}\in A\vert {\mathcal {F}}_t)=Q(Y_{t},X_t,A)\ P\text{-a.s. },\ t\ge 0, \end{aligned}$$
(1)

where the filtration is defined by

$$\begin{aligned} {\mathcal {F}}_t:=\sigma (Y_j,\ j\in {\mathbb {Z}};\ X_j,\ 0\le j\le t),\ t\ge 0. \end{aligned}$$

The process Y will represent the random environment whose state \(Y_t\) at time t determines the transition law \(Q(Y_t,\cdot ,\cdot )\) of the process X at the given instant t. Thus, X is a Markov chain in a random environment. Our purpose is to study the ergodic properties of X.

Remark 2.1

Obviously, the law of \({X}_t\), \(t\in {\mathbb {N}}\) (and also its joint law with \(Y_t\), \(t\in {\mathbb {Z}}\)) is uniquely determined by (1). For every given Q and \(X_{0}\), there exists a process X satisfying (1) (after possibly enlarging the probability space). See, for example, page 228 of [1] for a similar construction. We will establish a more precise result in Lemma 6.1, under additional assumptions.

We will now introduce a number of assumptions of various kinds that will figure in the statements of the main results: Theorems 2.12, 2.16, 2.17 and 2.18.

The following assumption closely resembles the well-known drift conditions for geometrically ergodic Markov chains, see, for example, Chapter 15 of [16]. In our case, however, they are relaxed by also having dependence on the state of the random environment.

Assumption 2.2

(Drift condition) Let \(V:{\mathcal {X}}\rightarrow {\mathbb {R}}_{+}\) be a measurable function. Let \(A_n\in {\mathfrak {A}}\), \(n\in {\mathbb {N}}\) be a non-decreasing sequence of subsets such that \(A_0\ne \emptyset \) and \({\mathcal {Y}}=\cup _{n\in {\mathbb {N}}}A_n\). Define the \({\mathbb {N}}\)-valued function

$$\begin{aligned} \Vert y \Vert :=\min \{n:\, y\in A_n\},\ y\in {\mathcal {Y}}. \end{aligned}$$

We assume that there are a non-increasing function \(\lambda :{\mathbb {N}}\rightarrow (0,1]\) and a non-decreasing function \(K:{\mathbb {N}}\rightarrow (0,\infty )\) such that, for all \(x\in {\mathcal {X}}\) and \(y\in {\mathcal {Y}}\),

$$\begin{aligned} \int _{{\mathcal {X}}} V(z)\, Q(y,x,dz)\le (1-\lambda (\Vert y\Vert ))V(x)+ K(\Vert y\Vert ). \end{aligned}$$
(2)

Furthermore, we may and will assume \(K(\cdot )\ge 1\).

We provide some intuition about Assumption 2.2: we expect that the stochastic process X behaves in an increasingly arbitrary way as the random environment Y becomes more and more “extreme” (i.e. \(\Vert Y\Vert \) grows), so the drift condition (2) becomes less and less stringent (i.e. \(\lambda (\Vert Y\Vert )\) decreases and \(K(\Vert Y\Vert )\) increases).

Example 2.3

A typical case is where \({\mathcal {Y}}\) is a subset of a Banach space \({\mathbb {B}}\) with norm \(\Vert \cdot \Vert _{{\mathbb {B}}}\); \({\mathfrak {A}}\) its Borel field; \(A_n:=\{y\in {\mathcal {Y}}:\, \Vert y\Vert _{{\mathbb {B}}}\le n\}\), \(n\in {\mathbb {N}}\). In this setting

$$\begin{aligned} \Vert y\Vert =\lceil \Vert y\Vert _{{\mathbb {B}}}\rceil ,\ y\in {\mathcal {Y}}, \end{aligned}$$

where \(\lceil \cdot \rceil \) stands for the ceiling function. In the examples of the present paper, we will always have \({\mathbb {B}}={\mathbb {R}}^d\) with some \(d\ge 1\) and \(|\cdot |=\Vert \cdot \Vert _{{\mathbb {B}}}\) will denote the respective Euclidean norm. Note, however, that in general \(\Vert \cdot \Vert \) is not necessarily related to any geometric structure.

Remark 2.4

It would be desirable to relax Assumption 2.2 by allowing \(\lambda \) to vary in \((-\infty ,1)\) as long as “in the average” it is contractive. (There are multiple options for the precise formulation of this property.) Such a result has been worked out in [15].

The next assumption stipulates the existence of a whole family of suitable “small sets” C(R(n)) that fit well the sets \(A_n\) appearing in Assumption 2.2.

Assumption 2.5

(Minorization condition) For \(R\ge 0\), set \(C(R):=\{x\in {\mathcal {X}}:\ V(x)\le R\}\). Let \(\lambda (\cdot )\), \(K(\cdot )\) be as in Assumption 2.2. Define \(R(n):=4K(n)/\lambda (n)\). There is a non-increasing function \(\alpha :{\mathbb {N}}\rightarrow (0,1]\) and for each \(n\in {\mathbb {N}}\), there exists a probability measure \(\nu _n\) on \({\mathfrak {B}}\) such that, for all \(y\in A_n\), \(x\in C(R(n))\) and \(A\in {\mathfrak {B}}\),

$$\begin{aligned} Q(y,x,A)\ge \alpha (n)\nu _{n}(A). \end{aligned}$$
(3)

In other words, if the state y of the random environment is in \(A_n\), we work on the set \(C(4K(n)/\lambda (n))\) on which we are able to benefit from a “coupling effect” of strength \(\alpha (n)\).

For a fixed V as in Assumption 2.2, let us define a family of metrics on

$$\begin{aligned}{\mathcal {P}}_V({\mathcal {X}}):=\left\{ \mu \in {\mathcal {P}}({\mathcal {X}}):\, \int _{{\mathcal {X}}} V(x)\, \mu (dx) <\infty \right\} \end{aligned}$$

by setting

$$\begin{aligned}\rho _{\beta }(\nu _1,\nu _2):=\int _{{\mathcal {X}}} [1+\beta V(x)]\vert \nu _1-\nu _2\vert (dx),\ \nu _1,\nu _2\in {\mathcal {P}}_V({\mathcal {X}}), \end{aligned}$$

for each \(0\le \beta \le 1\). Here, \(\vert \nu _1-\nu _2\vert \) is the total variation of the signed measure \(\nu _1-\nu _2\). Note that \(\rho _0\) is just the total variation distance (and it can be defined for all \(\nu _1,\nu _2\in {\mathcal {P}}({\mathcal {X}})\)), while \(\rho _1\) is the \((1+V)\)-weighted total variation distance.

Definition 2.6

For a measurable \(f:{\mathcal {X}}\rightarrow {\mathbb {R}}_{+}\), we define \(\Phi (f)\) to be the set of measurable \(\phi :{\mathcal {X}}\rightarrow {\mathbb {R}}\) such that \(|\phi (z)|\le C(1+f(z))\), \(z\in {\mathcal {X}}\) holds for some constant \(C=C(\phi )\). Hence, \(\Phi (1)\) denotes the set of bounded, measurable functions on \({\mathcal {X}}\).

Let \(L:{\mathcal {X}}\times {\mathfrak {B}}\rightarrow [0,1]\) be a transition kernel. For each \(\mu \in {\mathcal {P}}({\mathcal {X}})\), we define the probability

$$\begin{aligned}{}[L\mu ](A):=\int _{{\mathcal {X}}} L(x,A)\, \mu (dx),\ A\in {\mathfrak {B}}. \end{aligned}$$
(4)

Consistently with these definitions, \(Q(Y_n)\mu \) will refer to the action of the kernel \(Q(Y_n,\cdot ,\cdot )\) on \(\mu \). Note, however, that \(Q(Y_n)\mu \) is a random probability measure. For a bounded measurable function \(\phi :{\mathcal {X}}\rightarrow {\mathbb {R}}\), we set

$$\begin{aligned} L\phi (x):=\int _{{\mathcal {X}}}\phi (z)\, L(x,dz),\ x\in {\mathcal {X}}. \end{aligned}$$

The latter definition makes sense for any nonnegative measurable \(\phi \), too.
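
The two actions defined above are linked in the usual way: by Fubini's theorem, for every \(\mu \in {\mathcal {P}}({\mathcal {X}})\) and every bounded (or nonnegative) measurable \(\phi \),

$$\begin{aligned}\int _{{\mathcal {X}}}\phi (z)\, [L\mu ](dz)=\int _{{\mathcal {X}}}\int _{{\mathcal {X}}}\phi (z)\, L(x,dz)\, \mu (dx)=\int _{{\mathcal {X}}}L\phi (x)\, \mu (dx), \end{aligned}$$

a relation that is useful to keep in mind in the sequel.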

The following assumption is an integrability condition about the initial values \(X_0\) and \(X_1\) of the process X.

Assumption 2.7

(Moment condition on the initial values)

$$\begin{aligned} E[V^{2}(X_{0})+V^{2}(X_{1})]<\infty . \end{aligned}$$
(5)

We now present a hypothesis controlling the maxima of \(\Vert Y\Vert \) over finite time intervals (i.e. the “degree of extremity” of the random environment).

Assumption 2.8

(Condition on the maximal process of the random environment) There exist a non-decreasing function \(g:{\mathbb {N}}\rightarrow {\mathbb {N}}\) and a non-increasing function \(\ell :{\mathbb {N}}\rightarrow [0,1]\) such that

$$\begin{aligned} P\left( \max _{1\le i\le t}\Vert Y_i\Vert \ge g(t)\right) \le \ell (t),\ t\ge 1. \end{aligned}$$
(6)

Remark 2.9

It is clear that, for a given process Y, several choices of the pair of functions \(g,\ell \) are possible. Each choice leads to different estimates; which one is better depends on Y and X, and no general rule can be given a priori.

Remark 2.10

In the setting of Example 2.3, let Y be a Gaussian process in \({\mathcal {Y}}={\mathbb {R}}^d\). Assumption 2.8 holds, for instance, with \(g(t)\sim \sqrt{t}\) and \(\ell (t)\) exponentially decreasing, see Sect. 3 for more details.

Remark 2.11

One can derive estimates like (6) also for rather general processes Y. For instance, let \(Y_t\), \(t\in {\mathbb {Z}}\) be \({\mathbb {R}}^d\)-valued strongly stationary such that \(E|Y_0|^p<\infty \) for all \(p\ge 1\). Then for each \(q\ge 1\) set \(p=2q\) and estimate

$$\begin{aligned}{} & {} E^{1/q}\left[ \max _{1\le i\le t}|Y_i|^q\right] \le E^{1/2q}\left[ \max _{1\le i\le t}|Y_i|^{2q}\right] \\\le & {} E^{1/2q}\left[ \sum _{i=1}^t|Y_i|^{2q}\right] \le C(q) t^{\frac{1}{2q}}, \end{aligned}$$

with constant \(C(q)=E^{1/2q}[|Y_0|^{2q}]\). The Markov inequality implies that

$$\begin{aligned} P\left( \max _{1\le i\le t}|Y_i|\ge t\right) \le \frac{C^q(q) t^{1/2}}{t^q}\le \frac{C^q(q)}{t^{q-1/2}}. \end{aligned}$$
(7)

Actually, for arbitrarily small \(\chi >0\) and arbitrarily large \(r\ge 1\), we can set \(q=\frac{r}{\chi }+\frac{1}{2}\) in (7) and then Assumption 2.8 holds with

$$\begin{aligned} g(t)=\lceil t^{\chi }\rceil \text{ and } \ell (t)=\frac{C^{q}(q)}{t^{r}},\ t\ge 1, \end{aligned}$$

i.e. for arbitrary polynomially growing \(g(\cdot )\) and polynomially decreasing \(\ell (\cdot )\). This shows that our main results below have a wide spectrum of applicability well beyond the case of Gaussian Y, see also Example 2.20.

We now define a number of quantities that will appear in various convergence rate estimates below. For each \(t\in {\mathbb {N}}\), set

$$\begin{aligned}\zeta (t):= & {} \min \{\alpha (t),\lambda (t)\},\\ r_1(t):= & {} \sum _{k=t}^{\infty } \frac{K(g(k))}{\alpha (g(k))}e^{-k\zeta (g(k))/4},\\ r_2(t):= & {} \sum _{k=t}^{\infty } \frac{K(g(k+1))}{\alpha (g(k+1))\zeta (g(k+1))}\sqrt{\ell (k)},\\ r_3(t):= & {} \sum _{k=t}^{\infty } e^{-k\zeta (g(k))/4},\\ r_4(t):= & {} \sum _{k=t}^{\infty } \ell (k),\\ \pi (t):= & {} \frac{|\ln (\lambda (g(t)))|}{\alpha (g(t))\lambda (g(t))}. \end{aligned}$$

Now comes the first main result of the present paper: assuming our conditions on drift, minorization, initial values and control of the maxima, \(\textrm{Law}(X_{t})\) will tend to a limiting law as \(t\rightarrow \infty \), provided that \(r_1(0)\) and \(r_2(0)\) are finite.

Theorem 2.12

Define \(\mu _{t}:=\textrm{Law}(X_{t})\), \(t\in {\mathbb {N}}\). Let Assumptions 2.2, 2.5, 2.7 and 2.8 be in force. Assume

$$\begin{aligned} r_1(0)+r_2(0)<\infty . \end{aligned}$$
(8)

Then, there is a probability \(\mu _*\) on \({\mathcal {X}}\) such that \(\mu _t\rightarrow \mu _*\) in \((1+V)\)-weighted total variation as \(t\rightarrow \infty \). More precisely,

$$\begin{aligned}\rho _1(\mu _t,\mu _*)\le C[r_1(t)+r_2(t)],\ t\in {\mathbb {N}}, \end{aligned}$$

for some constant \(C>0\). The limit \(\mu _{*}\) does not depend on \(X_{0}\).

Remark 2.13

When \(\lambda ,K,\alpha \) are constant and do not depend on Y, we retrieve the familiar exponential convergence rate in Theorem 2.12. Indeed, in this case we may suppose that \(A_n=A_0={\mathcal {Y}}\) for all n and hence we may choose \(g(t)=1\) and \(\ell (t)=0\), for all t.
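
To spell out the computation behind this remark: with these choices, \(g\equiv 1\) and \(\ell \equiv 0\) give \(r_2(t)=r_4(t)=0\), while \(K(g(k))=K\), \(\alpha (g(k))=\alpha \) and \(\zeta (g(k))=\zeta :=\min \{\alpha ,\lambda \}\) for all k. Summing the geometric series,

$$\begin{aligned}r_1(t)=\frac{K}{\alpha }\sum _{k=t}^{\infty }e^{-k\zeta /4}=\frac{K}{\alpha }\, \frac{e^{-t\zeta /4}}{1-e^{-\zeta /4}},\qquad r_3(t)=\frac{e^{-t\zeta /4}}{1-e^{-\zeta /4}}, \end{aligned}$$

so the bounds of Theorems 2.12 and 2.16 indeed decay geometrically in t.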

Theorem 2.16 is a variant of Theorem 2.12: by relaxing the assumptions, it provides convergence in a weaker sense.

Assumption 2.14

(Weaker moment condition on the initial values)

$$\begin{aligned} E[V(X_{0})+V(X_{1})]<\infty . \end{aligned}$$
(9)

Remark 2.15

As one of our referees pointed out, a simple sufficient condition for (9) can be formulated in terms of \(X_{0},Y_{0}\) only. Let Assumption 2.2 be in force. Then, \(E[V(X_{1})]\le E[V(X_{0})]+E[K(||Y_{0}||)]\); hence, a sufficient condition for (9) is \(E[V(X_{0})+K(\Vert Y_{0}\Vert )]<\infty \).
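
To make the first inequality explicit: by (1), the tower property and (2),

$$\begin{aligned}E[V(X_{1})]=E\left[ \int _{{\mathcal {X}}}V(z)\, Q(Y_{0},X_{0},dz)\right] \le E[(1-\lambda (\Vert Y_{0}\Vert ))V(X_{0})]+E[K(\Vert Y_{0}\Vert )]\le E[V(X_{0})]+E[K(\Vert Y_{0}\Vert )]. \end{aligned}$$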

Similarly, let

$$\begin{aligned} \int _{{\mathcal {X}}} V^2(z)\, Q(y,x,dz)\le (1-\lambda (\Vert y\Vert ))^2V^{2}(x)+ K^2(\Vert y\Vert ) \end{aligned}$$

hold instead of (2). This implies (2) and also \(E[V^2(X_1)]\le E[V^2(X_0)]+ E[K^2(||Y_0||)]\); thus, a sufficient condition for (5) in terms of \(X_0,Y_0\) only is \(E[V^2(X_0)+ K^2(||Y_0||)]<\infty \) in this case.

Theorem 2.16

Recall that \(\mu _{t}=\textrm{Law}(X_{t})\), \(t\in {\mathbb {N}}\). Let Assumptions 2.2, 2.5, 2.8 and 2.14 be in force. Assume

$$\begin{aligned} r_3(0)+r_4(0)<\infty . \end{aligned}$$
(10)

Then, there is a probability \(\mu _*\) on \({\mathcal {X}}\) such that \(\mu _t\rightarrow \mu _*\) in total variation as \(t\rightarrow \infty \). More precisely,

$$\begin{aligned} \rho _0(\mu _t,\mu _*)\le C[r_3(t)+r_4(t)],\ t\in {\mathbb {N}}, \end{aligned}$$
(11)

for some constant \(C>0\). The limit \(\mu _{*}\) does not depend on \(X_{0}\).

Clearly, Assumption 2.7 implies Assumption 2.14 and (8) implies (10). Next, ergodic theorems corresponding to Theorems 2.12 and 2.16 are stated.

Theorem 2.17

Let Assumptions 2.2, 2.5, 2.7 and 2.8 be in force, but with \(R(n):=8K(n)/\lambda (n)\), \(n\in {\mathbb {N}}\) in Assumption 2.5. Let Y be an ergodic process. Let \(\phi \in \Phi (V^{\delta })\) for some \(0< \delta \le 1/2\). Assume

$$\begin{aligned} r_1(0)+r_2(0)<\infty {} \end{aligned}$$

and

$$\begin{aligned} \left( \frac{K(g(t))}{\lambda (g(t))}\right) ^{2\delta }\frac{\pi (t)}{t}\rightarrow 0,\ t\rightarrow \infty . \end{aligned}$$
(12)

Then,

$$\begin{aligned} \frac{\phi (X_1)+\ldots +\phi (X_t)}{t}\rightarrow \int _{{\mathcal {X}}} \phi (z)\mu _*(dz),\ t\rightarrow \infty \end{aligned}$$
(13)

holds in \(L^{p}\) for each \(p<1/\delta \). (Here, \(\mu _*\) is the same as in Theorem 2.12.)

We can weaken our assumptions for bounded \(\phi \).

Theorem 2.18

Let Assumptions 2.2, 2.5, 2.8 and 2.14 be in force, but with \(R(n):=8K(n)/\lambda (n)\), \(n\in {\mathbb {N}}\) in Assumption 2.5. Let Y be an ergodic process. Assume

$$\begin{aligned} r_3(0)+r_4(0)<\infty \end{aligned}$$

and

$$\begin{aligned} \frac{\pi (t)}{t}\rightarrow 0,\ t\rightarrow \infty . \end{aligned}$$
(14)

Then, for each \(\phi \in \Phi (1)\) the convergence (13) holds in \(L^{p}\) for all \(p\ge 1\).

Remark 2.19

In Theorems 2.17 and 2.18, we require a slight strengthening of Assumption 2.5 by imposing (3) with \(R(n)=8K(n)/\lambda (n)\) instead of \(R(n)=4K(n)/\lambda (n)\).

Condition (14) is closely related to the condition \(r_3(0)<\infty \), but neither of the two implies the other. Indeed, fix \(g(t):=t\). Choose \(\lambda (t):=1/2\) and \(\alpha (t):=\sqrt{\ln (t)}/{t}\), \(t\ge 4\). Then, \(\pi (t)/t\rightarrow 0\) but \(r_3(0)=\infty \). Conversely, let \(\alpha (t):=1/2\) and \(\lambda (t):=\frac{8\ln (t)}{t}\). Then, \(r_3(0)<\infty \) but \(\pi (t)/t\) tends to a positive constant.
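
For the reader's convenience, here is a sketch of the calculations behind these two examples. In the first case, \(\zeta (g(k))=\min \{\alpha (k),\lambda (k)\}=\sqrt{\ln (k)}/k\) for large k, so

$$\begin{aligned}\frac{\pi (t)}{t}=\frac{2\ln (2)}{\sqrt{\ln (t)}}\rightarrow 0,\qquad e^{-k\zeta (g(k))/4}=e^{-\sqrt{\ln (k)}/4}\ge k^{-1/2} \text{ for } k\ge 2, \end{aligned}$$

hence the terms of \(r_3\) are not summable and \(r_3(0)=\infty \). In the second case, \(\zeta (g(k))=8\ln (k)/k\) for large k, so \(e^{-k\zeta (g(k))/4}=k^{-2}\) is summable and \(r_3(0)<\infty \), while

$$\begin{aligned}\frac{\pi (t)}{t}=\frac{\ln (t)-\ln (8\ln (t))}{4\ln (t)}\rightarrow \frac{1}{4}. \end{aligned}$$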

Example 2.20

Let Y be strongly stationary \({\mathbb {R}}^d\)-valued with \(E|Y_0|^p<\infty \) for all \(p\ge 1\). Let Assumptions 2.2 and 2.5 hold with \(K(\cdot )\) having at most polynomial growth (i.e. \(K(n)\le C n^{b}\) with some \(C,b>0\)) and \(\alpha (\cdot )\), \(\lambda (\cdot )\) having at most polynomial decay (i.e. \(\alpha (n)\ge c n^{-b}\) with some \(c>0\), similarly for \(\lambda \)). Let Assumption 2.7 hold. Then, Remark 2.11 shows (choosing \(\chi \) small and r large) that Theorems 2.12 and 2.17 apply.

Remark 2.21

One of our referees pointed out that more general versions of Assumptions 2.2 and 2.5 could be considered. Let \({\tilde{\lambda }},{\tilde{\alpha }},{\tilde{K}}:{\mathcal {Y}}\rightarrow (0,\infty )\) be measurable functions with \({\tilde{K}}\ge 1\) and \({\tilde{\lambda }},{\tilde{\alpha }}\le 1\). Instead of (2), one could assume

$$\begin{aligned} \int _{{\mathcal {X}}} V(z)\, Q(y,x,dz)\le (1-{\tilde{\lambda }}(y))V(x)+ {\tilde{K}}(y),\ y\in {\mathcal {Y}}. \end{aligned}$$
(15)

Instead of (3), one could assume that, for all \(y\in {\mathcal {Y}}\), \(x\in C({\tilde{R}}(y))\) and \(A\in {\mathfrak {B}}\),

$$\begin{aligned} Q(y,x,A)\ge {\tilde{\alpha }}(y)\nu (y,A),{} \end{aligned}$$
(16)

with a transition kernel \(\nu :{\mathcal {Y}}\times {\mathfrak {B}}\rightarrow [0,1]\). Here, \({\tilde{R}}(y):=4{\tilde{K}}(y)/{\tilde{\lambda }}(y)\).

When it comes to (6), however, it is not totally clear how to formulate it in this more general setting. We only sketch one possibility here. Let \(\lambda _{t},\alpha _{t}>0\), \(t\in {\mathbb {N}}\) be non-increasing sequences, \(K_{t}>0\), \(t\in {\mathbb {N}}\) a non-decreasing sequence. Define

$$\begin{aligned} \ell (t):=P\left( \Omega \setminus \left\{ {\tilde{\lambda }}(Y_{j})\ge \lambda _{t-1},{\tilde{\alpha }}(Y_{j})\ge \alpha _{t-1}, {\tilde{K}}(Y_{j})\le K_{t-1}, j=1,\ldots ,t\right\} \right) ,\ t\ge 1.\nonumber \\ \end{aligned}$$
(17)

Note that (6) is equivalent to

$$\begin{aligned} P\left( \Omega \setminus \left\{ Y_{1},\ldots ,Y_{t}\in A_{g(t)-1}\right\} \right) \le \ell (t){}, \end{aligned}$$

hence (17) looks reasonable. Redefine

$$\begin{aligned}\zeta (t):= & {} \min \{\alpha _{t},\lambda _{t}\},\\ r_1(t):= & {} \sum _{k=t}^{\infty } \frac{K_{k}}{\alpha _{k}}e^{-k\zeta (k)/4},\\ r_2(t):= & {} \sum _{k=t}^{\infty } \frac{K_{k+1}}{\alpha _{k+1}\zeta (k+1)}\sqrt{\ell (k)}. \end{aligned}$$

Let (15), (16) and (5) be in force. Let us assume \(r_{1}(0)+r_{2}(0)<\infty \). Then, the conclusion of Theorem 2.12 remains true, with essentially the same proof.

3 Difference Equations in Gaussian Environments

In this section, we present examples of processes X that satisfy a difference equation, modulated by the process Y. We do not aim at a high degree of generality but prefer to illustrate the power of the results in Sect. 2 in some easily tractable cases. We stress that, as far as we know, none of these results follow from the existing literature.

We fix \({\mathcal {Y}}={\mathbb {R}}^d\) for some d and \({\mathcal {X}}={\mathbb {R}}\). We also fix a \({\mathcal {Y}}\)-valued zero-mean Gaussian stationary process \(Y_t\), \(t\in {\mathbb {Z}}\). We set \(\Vert y\Vert =\lceil |y|\rceil \), \(y\in {\mathcal {Y}}\) as in Example 2.3. We will exclusively use \(V(x)=|x|\), \(x\in {\mathbb {R}}\) in the present section.

Remark 3.1

Let \(\xi _t\), \(t\in {\mathbb {Z}}\) be a zero-mean \({\mathbb {R}}\)-valued stationary Gaussian process with unit variance. Clearly,

$$\begin{aligned}{} & {} P(\max _{1\le i\le t}|\xi _{i}|\ge t^{b})\le e^{-\frac{1}{4}t^{2b}}E\max _{1\le i\le t}e^{\frac{1}{4}|\xi _{i}|^{2}}\\\le & {} e^{-\frac{1}{4}t^{2b}}\sum _{i=1}^{t}Ee^{\frac{1}{4}|\xi _{i}|^{2}}\le Cte^{-{\bar{c}}_{1}t^{2b}}\le {} C e^{-{\bar{c}}_{2}t^{2b}} \end{aligned}$$

for suitable \(C,{\bar{c}}_{1},{\bar{c}}_{2}\).
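
The second inequality above leads to a finite bound because, for a standard Gaussian \(\xi _{i}\),

$$\begin{aligned}E\left[ e^{\frac{1}{4}|\xi _{i}|^{2}}\right] =\int _{{\mathbb {R}}}\frac{1}{\sqrt{2\pi }}e^{\frac{x^{2}}{4}-\frac{x^{2}}{2}}\, dx=\int _{{\mathbb {R}}}\frac{1}{\sqrt{2\pi }}e^{-\frac{x^{2}}{4}}\, dx=\sqrt{2}, \end{aligned}$$

so that \(\sum _{i=1}^{t}Ee^{\frac{1}{4}|\xi _{i}|^{2}}=\sqrt{2}\, t\).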

Applying these observations with \(b=1/2\) to every coordinate of Y, it follows that Assumption 2.8 holds for the process Y with the choice \(g(k)=\lceil c_1 \sqrt{k}\rceil \), \(\ell (k)=\exp (-c_2 k)\) for some \(c_1,c_2>0\) and thus \(r_4(n)\) decreases at a geometric rate as \(n\rightarrow \infty \).

More generally, choosing arbitrary \(b>0\), Assumption 2.8 holds for Y with the choice \(g(k)=\lceil c_1 k^{b} \rceil \), \(\ell (k)=\exp (-c_2 k^{2b})\).

We assume throughout this section that \(\varepsilon _t\), \(t\in {\mathbb {N}}\) is an \({\mathbb {R}}\)-valued i.i.d. sequence, independent of \(Y_t\), \(t\in {\mathbb {Z}}\); \(E|\varepsilon _0|^2<\infty \) and the law of \(\varepsilon _0\) has an everywhere positive density f with respect to the Lebesgue measure, which is even and non-increasing on \([0,\infty )\). All these hypotheses could clearly be weakened/modified, and we just try to stay as simple as possible.

Example 3.2

First we investigate the effect of the “contraction coefficient” \(\lambda \) in (2). Let \(d:=1\). Let \(0<\underline{\sigma }\le {\overline{\sigma }}\) be constants and \(\sigma :{\mathbb {R}}\times {\mathbb {R}}\rightarrow [\underline{\sigma },{\overline{\sigma }}]\) a measurable function. Let furthermore \(\Delta :{\mathbb {R}}\rightarrow (0,1]\) be even and non-increasing on \([0,\infty )\), for which we will develop conditions along the way. We stipulate that the tail of f is not too thin: it is at least as thick as that of a Gaussian variable, that is,

$$\begin{aligned} f(x)\ge e^{-sx^2}/s,\ x\ge 0, \end{aligned}$$
(18)

for some \(s>0\). We assume that the dynamics of X is given by

$$\begin{aligned}X_0:=0,\ X_{t+1}:=(1-\Delta (Y_t))X_t+\sigma (Y_t,X_t)\varepsilon _{t+1},\ t\in {\mathbb {N}}. \end{aligned}$$

We will find \(K(\cdot ),\lambda (\cdot ),\alpha (\cdot )\) such that Assumptions 2.2 and 2.5 hold and give an estimate for the rate \(r_3(n)\) appearing in (11). (Note that we already have estimates for the rate \(r_4(n)\) from Remark 3.1.)

The density of \(X_1\) conditional on \(X_0=x\), \(Y_0=y\) (w.r.t. the Lebesgue measure) is easily seen to be

$$\begin{aligned}h_{x,y}(z):=f\left( \frac{z-(1-\Delta (y))x}{\sigma (y,x)}\right) \frac{1}{\sigma (y,x)},\ z\in {\mathbb {R}}. \end{aligned}$$

Let \(\eta >0\) be arbitrary for the moment. We can estimate

$$\begin{aligned}\inf _{x,z\in [-\eta ,\eta ]}h_{x,y}(z)\ge f\left( \frac{2\eta }{\underline{\sigma }}\right) \frac{1}{{\overline{\sigma }}}=:m(\eta ). \end{aligned}$$

Define the probability measures

$$\begin{aligned} {\tilde{\nu }}_{\eta }(A):=\frac{1}{2\eta }\textrm{Leb}(A\cap [-\eta ,\eta ]),\ A\in {\mathfrak {B}}. \end{aligned}$$

It follows that

$$\begin{aligned}Q(y,x,A)\ge 2\eta m(\eta ) {\tilde{\nu }}_{\eta }(A),\ A\in {\mathfrak {B}}, \end{aligned}$$

for all \(x\in [-\eta ,\eta ]\), \(y\in {\mathbb {R}}\). Notice that

$$\begin{aligned} Q(y)V(x)\le (1-\Delta (y))V(x) + {\overline{\sigma }}E|\varepsilon _0|\le (1-\Delta (y))V(x)+K, \end{aligned}$$

where \(K:=\max \{{\overline{\sigma }}E|\varepsilon _0|,1\}\). Then, Assumption 2.2 holds with \(A_n=\{y\in {\mathbb {R}}:\, |y|\le n\}\), \(\lambda (n)=\Delta (n)\) and \(K(n)=K\), \(n\ge 1\). (Here and in the sequel we use the index set \({\mathbb {N}}\setminus \{0\}\) instead of \({\mathbb {N}}\) for convenience.)

Let us now specify \(\eta \) by setting \(\eta :={\tilde{R}}(y):=4K/\Delta (y)\), \(y\in {\mathcal {Y}}\) and \(R(n)={\tilde{R}}(n)\), \(n\in {\mathbb {N}}\). We note that \({\tilde{R}}(y)\) is defined for every \(y\in {\mathcal {Y}}\), while R(n) is defined for every \(n\in {\mathbb {N}}\), and this is why we keep different notations for these two functions here and also in the subsequent examples. We can conclude using the tail bound (18) that

$$\begin{aligned}Q(y,x,A)\ge \frac{8Km({\tilde{R}}(y))}{\Delta (y)}{\tilde{\nu }}_{{\tilde{R}}(y)}(A)\ge \frac{e^{-c_3 {\tilde{R}}^2(y)}}{c_{3}\Delta (y)}{\tilde{\nu }}_{{\tilde{R}}(y)}(A), \end{aligned}$$

for all \(A\in {\mathfrak {B}}\), \(y\in {\mathcal {Y}}\), \(|x|\le R(\lceil |y|\rceil )\) with some \(c_3>0\), so (3) in Assumption 2.5 holds with

$$\begin{aligned}\alpha (n):=\frac{e^{-c_3 {R}^2(n)}}{c_{3}\Delta (0)},\ n\ge 1, \end{aligned}$$

and \(\nu _n={\tilde{\nu }}_{R(n)}\). Now, let the function \(\Delta \) be such that \(\Delta (y):= 1\) for \(0\le y<3\) and \(\Delta (y)\ge 1/(\ln (y))^{\delta }\) with some \(\delta >0\), for all \(y\ge 3\). We obtain from the previous estimates and from Remark 3.1 with \(g(k)=\lceil c_1 \sqrt{k}\rceil \) that

$$\begin{aligned}\zeta (g(k))\ge e^{-c_4\ln ^{2\delta }(k)}/c_{4}, \end{aligned}$$

with some \(c_4>0\). If \(\delta <1/2\), then this leads to estimates on the terms of \(r_3(n)\) which guarantee \(r_3(0)<\infty \).

If instead of (18) we assume

$$\begin{aligned}f(x)\ge e^{-sx}/s,\ x\ge 0, \end{aligned}$$

then \(r_3(0)<\infty \) follows whenever \(\delta <1\). This shows nicely the interplay between the feasible fatness of the tail of f and the strength of the mean-reversion \(\Delta (\cdot )\).
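
To see this (a sketch, with \(c_3',c_4'>0\) denoting suitable constants): the same reasoning as above now gives \(\alpha (n)\ge e^{-c_3' R(n)}/c_3'\), and since \(R(n)=4K/\Delta (n)\le 4K\ln ^{\delta }(n)\) for \(n\ge 3\),

$$\begin{aligned}\zeta (g(k))\ge e^{-c_4'\ln ^{\delta }(k)}/c_4'. \end{aligned}$$

For \(\delta <1\) we have \(\ln ^{\delta }(k)=o(\ln (k))\), so \(k\zeta (g(k))\) eventually dominates \(k^{1-\epsilon '}/c_4'\) for every \(\epsilon '>0\), and the terms \(e^{-k\zeta (g(k))/4}\) of \(r_3\) are summable.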

Example 3.3

We recall that f is assumed even, positive and non-increasing on \([0,\infty )\). Again, let \(d:=1\), \(X_0:=0\) and

$$\begin{aligned}X_{t+1}:=(1-\Delta ) X_t +\sigma (Y_t,X_t)\varepsilon _{t+1},\ t\in {\mathbb {N}}, \end{aligned}$$

where \(\sigma :{\mathbb {R}}\times {\mathbb {R}}\rightarrow (0,\infty )\) is a measurable function and \(0< \Delta \le 1\) is a constant. We furthermore assume that

$$\begin{aligned}c_5 G(y)\le \sigma (y,x)\le c_6 G(y),\ x\in {\mathbb {R}}, \end{aligned}$$

with some even function \(G:{\mathbb {R}}\rightarrow (0,\infty )\) that is non-decreasing on \([0,\infty )\) and with constants \(c_5,c_6>0\). We clearly have (2) with \(\lambda (n)=\Delta \), \(n\in {\mathbb {N}}\) (i.e. \(\lambda (\cdot )\) is constant) and \(A_n=\{y\in {\mathbb {R}}:\ |y|\le n\}\), \(K(n):={\tilde{K}}(n)\), \(n\in {\mathbb {N}}\) where \({\tilde{K}}(y)=\max \{1, c_6 G(y)E\vert \varepsilon _0\vert \}\), \(y\in {\mathbb {R}}\). Taking \({\tilde{R}}(y)= 4{\tilde{K}}(y)/\Delta \), \(y\in {\mathbb {R}}\), estimates as in Example 3.2 lead to

$$\begin{aligned}Q(y,x,A)\ge 2{\tilde{R}}(y)f\left( \frac{2{\tilde{R}}(y)}{c_5 G(y)}\right) \frac{1}{c_6 G(y)}{\tilde{\nu }}_{{\tilde{R}}(y)}(A)\ge c_7 {\tilde{\nu }}_{{\tilde{R}}(y)}(A), \end{aligned}$$

for all \(A\in {\mathfrak {B}}\) with some fixed constant \(c_7>0\), where \({\tilde{\nu }}_{{\tilde{R}}(y)}(\cdot )\) is the normalized Lebesgue measure restricted to \(C({\tilde{R}}(y))\), as in Example 3.2, so setting \(R(n)={\tilde{R}}(n)\), \(n\in {\mathbb {N}}\), we can choose \(\nu _n={\tilde{\nu }}_{R(n)}\) and \(\alpha (\cdot )\) a positive constant.

Assume, for example, \(G(y)\le C[1+|y|^q]\), \(y\ge 0\) with some \(C,q>0\); this guarantees \(E[V^{2}(X_{1})]=E[X_{1}^{2}]<\infty \), i.e. Assumption 2.7 holds. Choose \(g(k)=\lceil c_1\sqrt{k}\rceil \), \(\ell (k)=\exp (-c_2 k)\), as discussed in Remark 3.1. Then, Theorems 2.12 and 2.17 apply.

Example 3.4

We now investigate a discrete-time model for financial time series, inspired by the “fractional stochastic volatility model” of [6, 9].

Let \(w_t\), \(t\in {\mathbb {Z}}\) and \(\varepsilon _t\), \(t\in {\mathbb {N}}\) be two sequences of i.i.d. random variables such that the two sequences are also independent. Assume that \(w_t\) is Gaussian. We define the (causal) infinite moving average process

$$\begin{aligned} \xi _t:=\sum _{j=0}^{\infty } a_jw_{t-j},\ t\in {\mathbb {Z}}. \end{aligned}$$

This series is almost surely convergent whenever \(\sum _{j=0}^{\infty } a_j^2<\infty \). We take \(d=2\) here, and the random environment will be the \({\mathcal {Y}}={\mathbb {R}}^2\)-valued process \(Y_t=(w_t,\xi _t)\), \(t\in {\mathbb {Z}}\).

We imagine that \(\xi _t\) describes the log-volatility of an asset in a financial market. It is reasonable to assume that \(\xi \) is a Gaussian linear process (see [9] where the related continuous-time models are discussed in detail).

Let us now consider the \({\mathbb {R}}\)-valued process X which will describe the increment of the log-price of the given asset. Assume that \(X_0:=0\),

$$\begin{aligned} X_{t+1}=(1-\Delta ) X_t+\rho e^{\xi _t}w_t +\sqrt{1-\rho ^2}e^{\xi _t}\varepsilon _{t+1},\ t\in {\mathbb {N}}, \end{aligned}$$
(19)

with some \(-1<\rho <1\), \(0<\Delta \le 1\). The log-price is thus jointly driven by the noise sequences \(\varepsilon _t\), \(w_t\). The parameter \(\Delta \) is responsible for the autocorrelation of X. (\(\Delta \) is typically close to 1.) The parameter \(\rho \) controls the correlation of the price and its volatility. This is found to be nonzero (actually, negative) in empirical studies, see [7], and hence, it is important to include \(w_t\), \(t\in {\mathbb {Z}}\) both in the dynamics of X and in that of Y. We take \(A_n=\{y=(w,\xi )\in {\mathbb {R}}^2:\ |y|\le n\}\), \(n\in {\mathbb {N}}\).

Notice that

$$\begin{aligned} |X_{1}|\le (1-\Delta ) |X_0| + [|w_0|+|\varepsilon _{1}|]e^{\xi _0} \end{aligned}$$

hence,

$$\begin{aligned} E[V(X_{1})\vert X_0=x,\ Y_0=(w,\xi )] \le (1-\Delta ) V(x)+c_{8}e^{\xi }(1+|w|) \end{aligned}$$

for all \(x\in {\mathbb {R}}\), with some \(c_{8}>0\), i.e. Assumption 2.2 holds with \(\lambda (n)=\lambda :=\Delta \) and \(K(n)=c_8 e^n(1+n)\).

We now turn our attention to Assumption 2.5. Denote the density of the law of \(X_1\) conditional on \(X_0=x\), \(Y_0=(w,\xi )\) with respect to the Lebesgue measure by \(h_{x,w,\xi }(z)\), \(z\in {\mathbb {R}}\). Let us fix \(\eta >0\) for the moment. For \(x,z\in [-\eta ,\eta ]\), we clearly have

$$\begin{aligned} h_{x,w,\xi }(z)\ge f\left( \frac{2\eta +e^{\xi }|w|}{e^{\xi }\sqrt{1-\rho ^2}}\right) \frac{1}{e^{\xi }\sqrt{1-\rho ^2}}. \end{aligned}$$
(20)

We assume from now on that f, the density of \(\varepsilon _0\), satisfies

$$\begin{aligned}f(x)= s/(1+x)^{\chi },\ x\ge 0 \end{aligned}$$

with some \(s>0\), \(\chi >3\); this is reasonable as \(X_t\) has fat tails according to empirical studies, see [7]. At the same time, \(E[\varepsilon _{0}^{2}]<\infty \) and Assumption 2.7 are also satisfied for such a choice of f.

Define \({\tilde{K}}(y):=e^{\xi }(1+|w|)\) and \({\tilde{R}}(y):=4{\tilde{K}}(y)/\lambda \), for \(y=(w,\xi )\in {\mathbb {R}}^2\). Specify \(\eta :={\tilde{R}}(y)\) and use (20) to obtain, as in Example 3.2,

$$\begin{aligned} Q(y,x,A)\ge \frac{c_{9}}{(1+|w|)^{\chi }}\frac{1}{e^{\xi }}2{\tilde{R}}(y){\tilde{\nu }}_{{\tilde{R}}(y)}(A)\ge \frac{c_{10}}{(1+|w|)^{\chi -1}} {\tilde{\nu }}_{{\tilde{R}}(y)}(A), \end{aligned}$$

with fixed constants \(c_{9},c_{10}>0\), where \({\tilde{\nu }}_{{\tilde{R}}(y)}\) is the normalized Lebesgue measure restricted to \([-{\tilde{R}}(y), {\tilde{R}}(y)]\). Set \(R(n)={\tilde{R}}((n,n))\), \(n\ge 1\). Then, Assumption 2.5 holds with

$$\begin{aligned} \alpha (n)=\frac{c_{10}}{(1+n)^{\chi -1}},\ n\ge 1 \end{aligned}$$

and \(\nu _{n}={\tilde{\nu }}_{R(n)}\). Recalling the end of Remark 3.1, and choosing \(b>0\) small enough, we can conclude that Theorems 2.12 and 2.17 apply to this stochastic volatility model.

More generally, instead of (19), we may consider

$$\begin{aligned} X_{t+1}-X_t=k(X_t)+\rho e^{\xi _t}w_t +\sqrt{1-\rho ^2}e^{\xi _t}\varepsilon _{t+1} \end{aligned}$$

with some dissipative measurable function \(k:{\mathbb {R}}\rightarrow {\mathbb {R}}\), i.e. we assume \(xk(x)\le -Ax^2+B\) for all \(x\in {\mathbb {R}}\) with some \(A,B>0\). Following the same steps, the applicability of Theorems 2.12 and 2.17 can be verified.

We stress that only a small fraction of relevant examples has been presented above, favouring simplicity. The results of Sect. 2 clearly apply in much greater generality.

4 Proofs of Stochastic Stability

Consider the \({\mathfrak {Y}}:={\mathcal {Y}}^{{\mathbb {Z}}}\)-valued random variable \({\textbf{Y}}:=(Y_{t})_{t\in {\mathbb {Z}}}\). By the measure decomposition theorem (see III.72 of [8]), there is a transition kernel \({\tilde{\mu }}_{0}:{\mathfrak {Y}}\times {\mathfrak {B}}\rightarrow {} [0,1]\) such that

$$\begin{aligned} \mu _{0}(A)=P(X_{0}\in A)=\int _{{\mathfrak {Y}}}{\tilde{\mu }}_{0}({\textbf{y}},A)\textrm{Law}({\textbf{Y}})(d{\textbf{y}}),\ A\in {\mathfrak {B}}. \end{aligned}$$
(21)

For each \({\textbf{y}}\in {\mathfrak {Y}}\), we will denote by \({\tilde{\mu }}_{0}({\textbf{y}})\) the probability \(A\rightarrow {\tilde{\mu }}_{0}({\textbf{y}},A)\), \(A\in {\mathfrak {B}}\) in the sequel.

Clearly, Assumption 2.7 is equivalent to

$$\begin{aligned} E\left[ \int _{{\mathcal {X}}} V^{2}(z)[{\tilde{\mu }}_0({\textbf{Y}})+[Q(Y_0){\tilde{\mu }}_0({\textbf{Y}})]](dz)\right] <\infty , \end{aligned}$$
(22)

and Assumption 2.14 is equivalent to

$$\begin{aligned} E\left[ \int _{{\mathcal {X}}} V(z)[{\tilde{\mu }}_0({\textbf{Y}})+[Q(Y_0){\tilde{\mu }}_0({\textbf{Y}})]](dz)\right] <\infty . \end{aligned}$$
(23)

We first recall a result which will be crucial in the arguments below.

Lemma 4.1

Let \(L:{\mathcal {X}}\times {\mathfrak {B}}\rightarrow [0,1]\) be a transition kernel such that

$$\begin{aligned}LV(x)\le \gamma V(x)+K,\ x\in {\mathcal {X}}, \end{aligned}$$

for some \(0\le \gamma <1\), \(K>0\). Let \(C:=\{x\in {\mathcal {X}}:\, V(x)\le R \}\) for some \(R>2K/(1-\gamma )\). Let us assume that there is a probability \(\nu \) on \({\mathfrak {B}}\) such that

$$\begin{aligned}\inf _{x\in C} L(x,A)\ge \alpha \nu (A),\ A\in {\mathfrak {B}}, \end{aligned}$$

for some \(\alpha >0\). Then for each \(\alpha _0\in (0,\alpha )\) and for \(\gamma _0:=\gamma + 2K/R\),

$$\begin{aligned}\rho _{\beta }(L\mu _1,L\mu _2)\le \max \left\{ 1-(\alpha -\alpha _0),\frac{2+R\beta \gamma _0}{2+R\beta }\right\} \rho _{\beta }(\mu _1,\mu _2),\ \mu _1,\mu _2\in {\mathcal {P}}_V, \end{aligned}$$

holds for \(\beta =\alpha _0/K\).

Proof

See Theorem 3.1 in [12]. \(\square \)

Next comes an easy corollary.

Lemma 4.2

Let \(L:{\mathcal {X}}\times {\mathfrak {B}}\rightarrow [0,1]\) be a transition kernel such that

$$\begin{aligned} LV(x)\le (1-\lambda ) V(x)+K,\ x\in {\mathcal {X}}, \end{aligned}$$
(24)

for some \(0<\lambda \le 1\), \(K>0\). Let \(C:=\{x\in {\mathcal {X}}:\, V(x)\le R \}\) with \(R:=4K/\lambda \). Assume that there is a probability \(\nu \) on \({\mathfrak {B}}\) such that

$$\begin{aligned} \inf _{x\in C} L(x,A)\ge \alpha \nu (A),\ A\in {\mathfrak {B}}, \end{aligned}$$
(25)

for some \(\alpha >0\). Then,

$$\begin{aligned} \rho _{\beta }(L\mu _1,L\mu _2)\le \left( 1-\frac{\min (\alpha ,\lambda )}{4}\right) \rho _{\beta }(\mu _1,\mu _2),\quad {} \mu _1,\mu _2\in {\mathcal {P}}_V, \end{aligned}$$
(26)

holds for \(\beta =\frac{\alpha }{2K}\).

Proof

Choose \(\gamma :=1-\lambda \), and let \(\alpha _0:=\alpha /2\). Note that \(1-(\alpha -\alpha _0)= 1-\alpha /2\) and \(R\beta =2\alpha /\lambda \) holds for \(\beta =\frac{\alpha }{2K}\). Also, \(\gamma _{0}=1-\lambda /2\). Applying Lemma 4.1, we estimate

$$\begin{aligned}\rho _{\beta }(L\mu _1,L\mu _2)\le & {} \max \left\{ 1-(\alpha -\alpha _0),\frac{2+R\beta \gamma _0}{2+R\beta }\right\} \rho _{\beta }(\mu _1,\mu _2)\\= & {} \max \left\{ 1-\alpha /2,1-\frac{\alpha \lambda }{2(\alpha +\lambda )}\right\} \rho _{\beta }(\mu _1,\mu _2). \end{aligned}$$

Here,

$$\begin{aligned} \frac{\alpha \lambda }{2(\alpha +\lambda )}\ge \frac{\min (\alpha ,\lambda )\max (\alpha ,\lambda )}{4\max (\alpha ,\lambda )}\ge {\min (\alpha ,\lambda )}/{4} \end{aligned}$$
(27)

and we get the statement since \(\alpha /2\ge \frac{\min (\alpha ,\lambda )}{4}\). \(\square \)

We introduce some important notation now. If \(({\textbf{y}},A)\rightarrow L({\textbf{y}},A)\), \({\textbf{y}}\in {\mathfrak {Y}}\), \(A\in {\mathfrak {B}}\) is a (not necessarily transition) kernel and Z is a \({\mathfrak {Y}}\)-valued random variable, then we define a measure \({\mathcal {E}}[L(Z)](\cdot )\) on \({\mathfrak {B}}\) via

$$\begin{aligned} {\mathcal {E}}[L(Z)](A):=E[L(Z,A)],\ A\in {\mathfrak {B}}. \end{aligned}$$
(28)

We will use the following trivial inequalities in the sequel:

$$\begin{aligned} \rho _0(\cdot )\le 2,\quad \rho _0(\cdot )\le \rho _{\beta }(\cdot )\le \rho _1(\cdot )\le \frac{1}{\beta }\rho _{\beta }(\cdot ),\ 0<\beta \le 1. \end{aligned}$$
(29)

Proof of Theorem 2.12

For later use, we define the \({\mathfrak {Y}}\)-valued random variables \(\hat{{\textbf{Y}}}_{n}:=(Y_{n+j})_{j\in {\mathbb {Z}}}\), for each \(n\in {\mathbb {Z}}\). Note that \({\textbf{Y}}=\hat{{\textbf{Y}}}_{0}\). Fix \({\textbf{y}}:=(y_j)_{j\in {\mathbb {Z}}}\in {\mathfrak {Y}}\) for the moment. Set \(\hat{{\textbf{y}}}_{n}:=(y_{n+j})_{j\in {\mathbb {Z}}}\), for each \(n\in {\mathbb {Z}}\). Again, \({\textbf{y}}=\hat{{\textbf{y}}}_{0}\). Define

$$\begin{aligned} \mu _0({\textbf{y}}):={\tilde{\mu }}_{0}({\textbf{y}}),\ \mu _n({\textbf{y}}):=Q(y_0)Q(y_{-1})\ldots Q(y_{-n+1}) {\tilde{\mu }}_{0}(\hat{{\textbf{y}}}_{-n+1}),\ n\ge 1. \end{aligned}$$
(30)

Here, Q(y) is the operator acting on probabilities which is described in (4) but with the kernel \(Q(y,x,A)\) instead of \(L(x,A)\). Fix \(n\ge 1\) and denote \(\bar{y}_n:=\max _{-n+1\le j\le 0}\Vert y_j\Vert \). Since

$$\begin{aligned} y_{j}\in A_{\bar{y}_{n}},\ -n+1\le j\le 0, \end{aligned}$$
(31)

Assumptions 2.2 and 2.5 imply that (24) and (25) hold for \(L=Q(y_j)\), \(j=-n+1,\ldots ,0\) with \(K=K(\bar{y}_n)\), \(\lambda =\lambda (\bar{y}_n)\) and \(\alpha =\alpha (\bar{y}_n)\). An n-fold application of Lemma 4.2 implies that, for \(\beta =\alpha (\bar{y}_n)/2K(\bar{y}_n)\),

$$\begin{aligned}\rho _{\beta }(\mu _n({\textbf{y}}),\mu _{n+1}({\textbf{y}}))\le (1-\zeta (\bar{y}_n)/4)^{n} \rho _{\beta }({\tilde{\mu }}_0(\hat{{\textbf{y}}}_{-n+1}),Q(y_{-n}){\tilde{\mu }}_{0} (\hat{{\textbf{y}}}_{-n})). \end{aligned}$$

By (29),

$$\begin{aligned} \rho _{1}(\mu _n({\textbf{y}}),\mu _{n+1}({\textbf{y}})) \le \frac{2K(\bar{y}_n)}{\alpha (\bar{y}_n)} (1-\zeta (\bar{y}_n)/4)^n \rho _1({\tilde{\mu }}_0(\hat{{\textbf{y}}}_{-n+1}),Q(y_{-n}){\tilde{\mu }}_{0} (\hat{{\textbf{y}}}_{-n})).\nonumber \\ \end{aligned}$$
(32)

We thus arrive at

$$\begin{aligned} E[\rho _1(\mu _n({\textbf{Y}}),\mu _{n+1}({\textbf{Y}}))]\le & {} 2E \left[ \frac{K(M_n)}{\alpha (M_n)} (1-\zeta (M_n)/4)^{n}\rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-n+1}),\right. \nonumber \\{} & {} \left. Q(Y_{-n}){\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-n}))\right] , \end{aligned}$$
(33)

using the notation \(M_n:=\max _{-n+1\le i\le 0}\Vert Y_i\Vert \). We now estimate the expectation on the right-hand side of (33) separately on the events \(\{M_n\ge g(n)\}\) and \(\{M_n< g(n)\}\). Note first that, for each \(m\ge n\),

$$\begin{aligned}{} & {} E\left[ \frac{K(M_m)}{\alpha (M_m)}\left( 1-\frac{\zeta (M_m)}{4}\right) ^m \rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m+1}), Q(Y_{-m}){\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m}))1_{\{M_m\ge g(m)\}}\right] \nonumber \\\le & {} \sum _{k=m}^{\infty } \frac{K(g(k+1))}{\alpha (g(k+1))}\left( 1-\frac{\zeta (g(k+1))}{4}\right) ^m \nonumber \\{} & {} E\left[ \rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m+1}), Q(Y_{-m}){\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m})) 1_{\{g(k+1)>M_m\ge g(k)\}}\right] \nonumber \\\le & {} \sum _{k=m}^{\infty } \frac{K(g(k+1))}{\alpha (g(k+1))}\left( 1-\frac{\zeta (g(k+1))}{4}\right) ^m \nonumber \\{} & {} E\left[ \rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m+1}), Q(Y_{-m}){\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m}))1_{\{M_k\ge g(k)\}}\right] , \end{aligned}$$
(34)

since \(M_{k}\ge M_{m}\). Hence, applying \(1-x\le e^{-x}\), \(x\ge 0\) and (34),

$$\begin{aligned}{} & {} \sum _{m=n}^{\infty } E[\rho _1(\mu _m({\textbf{Y}}),\mu _{m+1}({\textbf{Y}}))]\nonumber \\\le & {} 2\sum _{m=n}^{\infty } \frac{K(g(m))}{\alpha (g(m))}e^{-\frac{m}{4}\zeta (g(m))}E\left[ \rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m+1}), Q(Y_{-m}){\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m})) 1_{\{M_m< g(m)\}}\right] \nonumber \\{} & {} \quad + 2\sum _{m=n}^{\infty }\sum _{k=m}^{\infty } \frac{K(g(k+1))}{\alpha (g(k+1))}\left( 1-\frac{\zeta (g(k+1))}{4}\right) ^m \nonumber \\{} & {} \quad \qquad E\left[ \rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m+1}), Q(Y_{-m}){\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m}))1_{\{M_k\ge g(k)\}}\right] \nonumber \\\le & {} 2\sum _{m=n}^{\infty } \frac{K(g(m))}{\alpha (g(m))}e^{-\frac{m}{4}\zeta (g(m))}E\left[ \rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m+1}), Q(Y_{-m}){\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m}))\right] \nonumber \\{} & {} \quad + 8\sum _{k=n}^{\infty } \frac{K(g(k+1))}{\alpha (g(k+1))\zeta (g(k+1))}e^{-n\zeta (g(k+1))}\nonumber \\{} & {} \qquad \quad E^{1/2}\left[ \rho _1^{2}({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m+1}), Q(Y_{-m}){\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m}))\right] P^{1/2}(M_k\ge g(k))\nonumber \\\le & {} 2E\left[ \rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{1}), Q(Y_{0}){\tilde{\mu }}_0({\textbf{Y}}))\right] \sum _{m=n}^{\infty } \frac{K(g(m))}{\alpha (g(m))}e^{-\frac{m}{4}\zeta (g(m))}\nonumber \\{} & {} \quad + 8E^{1/2}\left[ \rho _1^{2}({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{1}), Q(Y_{0}){\tilde{\mu }}_0({\textbf{Y}}))\right] \sum _{k=n}^{\infty } \frac{K(g(k+1))}{\alpha (g(k+1))\zeta (g(k+1))}\sqrt{\ell (k)},\nonumber \\ \end{aligned}$$
(35)

where we have used the closed-form expression for the sum of geometric series and Cauchy–Schwarz in the second inequality; Assumption 2.8 and the fact that the law of \(\rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{1}), Q(Y_{0}){\tilde{\mu }}_0({\textbf{Y}}))\) equals that of \(\rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m+1}), Q(Y_{-m}){\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m}))\), for each m, in the third inequality. Recall that

$$\begin{aligned}{} & {} E[\rho _1^{2}({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{1}), Q(Y_{0}){\tilde{\mu }}_0({\textbf{Y}}))]\\{} & {} \quad \le 2E\left[ \int _{{\mathcal {X}}} (1+V(z))^{2} [{\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{1})+ Q(Y_{0}){\tilde{\mu }}_0({\textbf{Y}})](dz) \right] \\{} & {} \quad = 2E\left[ \int _{{\mathcal {X}}} (1+V(z))^{2} [{\tilde{\mu }}_0({\textbf{Y}})+ Q(Y_{0}){\tilde{\mu }}_0({\textbf{Y}})](dz) \right] <\infty \end{aligned}$$

by (22). A fortiori, \(E[\rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{1}), Q(Y_{0}){\tilde{\mu }}_0({\textbf{Y}}))]<\infty \), too. Now it follows from (35) and \(r_1(0)+r_2(0)<\infty \) that

$$\begin{aligned} \sum _{n=1}^{\infty }E[\rho _1(\mu _n({\textbf{Y}}),\mu _{n+1}({\textbf{Y}}))]<\infty . \end{aligned}$$
(36)

Consequently, for a.e. \(\omega \in \Omega \), the sequence \(\mu _n({\textbf{Y}}(\omega ))\), \(n\in {\mathbb {N}}\) is Cauchy and hence convergent for the metric \(\rho _{1}\). Its limit is denoted by \(\mu _{\sharp }(\omega )\).

For later use, we remark that \(\omega \rightarrow \int _{{\mathcal {X}}}\phi (z)\mu _{\sharp }(\omega )(dz)\) is \(\sigma ({\textbf{Y}})\)-measurable for every \(\phi \in \Phi (V)\). Hence, there is a measurable \(\Psi _{\phi }:{\mathfrak {Y}}\rightarrow {\mathbb {R}}\) such that

$$\begin{aligned} \Psi _{\phi }({\textbf{Y}})=\int _{{\mathcal {X}}}\phi (z)\mu _{\sharp }(dz) \text{ a.s. } \end{aligned}$$
(37)

In the sequel, we will need the definition (28) for the kernel \(({\textbf{y}},A)\rightarrow \mu _n({\textbf{y}})(A)\), \({\textbf{y}}\in {\mathfrak {Y}}\), \(A\in {\mathfrak {B}}\) and for similar kernels. Notice that, for any measurable function \(w:{\mathcal {X}}\rightarrow {\mathbb {R}}_+\),

$$\begin{aligned} \int _{{\mathcal {X}}} w(z)\, \left| {\mathcal {E}}[\mu _n({\textbf{Y}})]-{\mathcal {E}}[\mu _{n+1}({\textbf{Y}})]\right| (dz) \le \int _{{\mathcal {X}}} w(z)\, {\mathcal {E}}\left[ \left| \mu _n({\textbf{Y}})-\mu _{n+1}({\textbf{Y}})\right| \right] (dz). \nonumber \\ \end{aligned}$$
(38)

This is trivial for indicators and then follows for all measurable w in a standard way. By similar arguments, we also have

$$\begin{aligned} \int _{{\mathcal {X}}}w(z) {\mathcal {E}}\left[ \left| \mu _n({\textbf{Y}})-\mu _{n+1}({\textbf{Y}})\right| \right] (dz)= E\left[ \int _{{\mathcal {X}}}w(z) \left| \mu _n({\textbf{Y}})-\mu _{n+1}({\textbf{Y}})\right| (dz)\right] . \nonumber \\ \end{aligned}$$
(39)

Notice that \(\mu _{n}=\textrm{Law}(X_{n})={\mathcal {E}}[\mu _n(\hat{{\textbf{Y}}}_{-n})]={\mathcal {E}}[\mu _n({\textbf{Y}})]\). We infer from (38) and (39) that

$$\begin{aligned}\rho _1(\mu _n,\mu _{n+1})= & {} \int _{{\mathcal {X}}}(1+V(z)) \left| {\mathcal {E}}[\mu _n({\textbf{Y}})]-{\mathcal {E}}[\mu _{n+1}({\textbf{Y}})]\right| (dz) \\\le & {} \int _{{\mathcal {X}}}(1+V(z)) {\mathcal {E}}\left[ \left| \mu _n({\textbf{Y}})-\mu _{n+1}({\textbf{Y}})\right| \right] (dz) \\= & {} E\left[ \int _{{\mathcal {X}}}(1+V(z)) \left| \mu _n({\textbf{Y}})-\mu _{n+1}({\textbf{Y}})\right| (dz)\right] \\= & {} E[\rho _1(\mu _n({\textbf{Y}}),\mu _{n+1}({\textbf{Y}}))]. \end{aligned}$$

Then, it follows from (36) that

$$\begin{aligned}\sum _{n=1}^{\infty }\rho _1(\mu _n,\mu _{n+1})<\infty , \end{aligned}$$

so \(\mu _n\), \(n\ge 0\) is a Cauchy sequence for the complete metric \(\rho _1\). Hence, it converges to some probability \(\mu _*\) as \(n\rightarrow \infty \). The claimed convergence rate also follows by the above estimates.

To show uniqueness, let \(X_{0}'\) be another initial condition satisfying Assumption 2.7, with the corresponding \({\tilde{\mu }}_{0}'({\textbf{y}})\), see (21). Defining, just like in (30),

$$\begin{aligned} \mu _0'({\textbf{y}}):={\tilde{\mu }}_{0}'({\textbf{y}}),\ \mu _n'({\textbf{y}}):= Q(y_0)Q(y_{-1})\ldots Q(y_{-n+1}){\tilde{\mu }}_{0}'(\hat{{\textbf{y}}}_{-n+1}),\ n\ge 1, \end{aligned}$$

the estimates (34) and (35) show that

$$\begin{aligned}{} & {} \rho _{1}({\mathcal {E}}[\mu _{n}'({\textbf{Y}})],{\mathcal {E}}[\mu _{n}({\textbf{Y}})]) \le {} E[\rho _{1}(\mu _{n}'({\textbf{Y}}),\mu _{n}({\textbf{Y}}))]\nonumber \\{} & {} \quad \le 2E\left[ \rho _1({\tilde{\mu }}_0({\textbf{Y}}), {\tilde{\mu }}_0'({\textbf{Y}}))\right] \frac{K(g(n))}{\alpha (g(n))}e^{-\frac{n}{4}\zeta (g(n))}\nonumber \\{} & {} \qquad + 8E^{1/2}\left[ \rho _1^{2}({\tilde{\mu }}_0({\textbf{Y}}), {\tilde{\mu }}_0'({\textbf{Y}}))\right] \sum _{k=n}^{\infty } \frac{K(g(k+1))}{\alpha (g(k+1))\zeta (g(k+1))}\sqrt{\ell (k)} \end{aligned}$$
(40)

which tends to 0 when \(n\rightarrow \infty \) since, as before, \(E\left[ \rho _1^{2}({\tilde{\mu }}_0({\textbf{Y}}), {\tilde{\mu }}_0'({\textbf{Y}}))\right] <\infty \), by Assumption 2.7. \(\square \)

Remark 4.3

The proof of Theorem 2.12 also implies convergence for the “quenched” process: there is a set \({\mathfrak {Y}}'\subset {\mathfrak {Y}}\) with \(\textrm{Law}({\textbf{Y}})({\mathfrak {Y}}')=1\) such that, for all \({\textbf{y}}\in {\mathfrak {Y}}'\), the sequence \(\mu _{n}({\textbf{y}})\) converges in \(\rho _{1}\) to a limiting probability \(\mu _{\natural }({\textbf{y}})\) as \(n\rightarrow \infty \).

Remark 4.4

Define the probability \({\bar{\mu }}(A):=E[\mu _{\sharp }(A)]\), \(A\in {\mathfrak {B}}\). It is clear that, for every \(\phi \in \Phi (1)\),

$$\begin{aligned}{} & {} \int _{{\mathcal {X}}}\phi (z)\mu _{*}(dz) = \lim _{n\rightarrow \infty }\int _{{\mathcal {X}}}\phi (z)\mu _{n}(dz) =\lim _{n\rightarrow \infty }\int _{{\mathcal {X}}}\phi (z){\mathcal {E}}[\mu _{n}({\textbf{Y}})](dz)\\= & {} \lim _{n\rightarrow \infty }E\left[ \int _{{\mathcal {X}}} \phi (z) \mu _n({\textbf{Y}})(dz)\right] = E\left[ \int _{{\mathcal {X}}}\phi (z)\mu _{\sharp }(dz)\right] = \int _{{\mathcal {X}}}\phi (z){\bar{\mu }}(dz), \end{aligned}$$

hence \({\bar{\mu }}=\mu _{*}\) and, in view of the above remark, \(\mu _{*}(A)=E[\mu _{\natural }({\textbf{Y}})(A)]\), \(A\in {\mathfrak {B}}\).

Proof of Theorem 2.16

The estimates in the proof of Theorem 2.12 imply

$$\begin{aligned}\rho _{0}(\mu _n({\textbf{y}}),\mu _{n+1}({\textbf{y}}))\le (1-\zeta (\bar{y}_n)/4)^{n} \rho _{1}({\tilde{\mu }}_0(\hat{{\textbf{y}}}_{-n+1}),Q(y_{-n}){\tilde{\mu }}_0(\hat{{\textbf{y}}}_{-n})). \end{aligned}$$

By (29), this leads to

$$\begin{aligned}\rho _0(\mu _n,\mu _{n+1})\le & {} E[\rho _0(\mu _n({\textbf{Y}}),\mu _{n+1}({\textbf{Y}}))] \le E[\rho _0(\mu _n({\textbf{Y}}),\mu _{n+1}({\textbf{Y}}))1_{\{M_{n}<g(n)\}}] \\{} & {} + 2E[1_{\{M_{n}\ge g(n)\}}]\\\le & {} (1-\zeta (g(n))/4)^n E[\rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-n+1}), Q(Y_{-n}){\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-n}))1_{\{M_n<g(n)\}}]\\{} & {} +2P\left( M_n\ge g(n)\right) \\\le & {} (1-\zeta (g(n))/4)^n E[\rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{1}), Q(Y_{0}){\tilde{\mu }}_0({\textbf{Y}}))]+2P\left( M_n\ge g(n)\right) \\\le & {} C[e^{-n\zeta (g(n))/4}+{\ell (n)}], \end{aligned}$$

for some \(C>0\), using Assumption 2.8 and (23). The result now follows as in the proof of Theorem 2.12.

Remark 4.5

The convergence rates obtained by our method heavily depend on the choice of the functions g and \(\ell \), for which there are multiple options. Hence, no optimality can be claimed at this level of generality. The approach, however, works in many concrete cases where available methods do not.

5 L-Mixing Processes

Let \({\mathcal {G}}_t\), \(t\in {\mathbb {N}}\) be an increasing sequence of sigma-algebras (i.e. a discrete-time filtration), and let \({\mathcal {G}}^+_t\), \(t\in {\mathbb {N}}\) be a decreasing sequence of sigma-algebras such that, for each \(t\in {\mathbb {N}}\), \({\mathcal {G}}_t\) is independent of \({\mathcal {G}}^+_t\).

Let \(W_t\), \(t\in {\mathbb {N}}\) be a real-valued stochastic process. For each \(r\ge 1\), introduce

$$\begin{aligned} M_r(W):=\sup _{t\in {\mathbb {N}}} E^{1/r}[|W_t|^r]. \end{aligned}$$

For each process W such that \(M_1(W)<\infty \) we also define, for each \(r\ge 1\), the quantities

$$\begin{aligned} \gamma _r(W,\tau ):=\sup _{t\ge \tau }E^{1/r}[|W_t-E[W_t|{\mathcal {G}}_{t-\tau }^+]|^r],\ \tau \in {\mathbb {N}},\quad \Gamma _r(W):=\sum _{\tau =0}^{\infty } \gamma _r(W,\tau ). \end{aligned}$$

For some \(r\ge 1\), the process W is called L-mixing of order r with respect to \(({\mathcal {G}}_t,{\mathcal {G}}^+_t)\), \(t\in {\mathbb {N}}\) if it is adapted to \(({\mathcal {G}}_t)_{t\in {\mathbb {N}}}\) and \(M_r(W)<\infty \), \(\Gamma _r(W)<\infty \). We say that W is L-mixing if it is L-mixing of order r for all \(r\ge 1\). This notion of mixing was introduced in [10].
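
As a minimal illustration of the definition, consider an i.i.d. sequence \(W_t\), \(t\in {\mathbb {N}}\) with \(E[|W_0|^r]<\infty \), \({\mathcal {G}}_t:=\sigma (W_0,\ldots ,W_t)\) and \({\mathcal {G}}_t^+:=\sigma (W_s,\ s\ge t+1)\). For \(\tau \ge 1\), \(W_t\) is \({\mathcal {G}}^+_{t-\tau }\)-measurable, hence \(\gamma _r(W,\tau )=0\); for \(\tau =0\), independence gives \(E[W_t\vert {\mathcal {G}}^+_t]=E[W_t]\), so

$$\begin{aligned}\Gamma _r(W)=\gamma _r(W,0)=\sup _{t\in {\mathbb {N}}}E^{1/r}[|W_t-E[W_t]|^r]\le 2M_r(W)<\infty , \end{aligned}$$

and W is L-mixing of order r.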

Remark 5.1

It is easy to check that if \(W_t\), \(t\in {\mathbb {N}}\) is L-mixing of order r, then also the process \({\tilde{W}}_t:=W_t-EW_t\), \(t\in {\mathbb {N}}\) is L-mixing of order r; moreover, \(\Gamma _r({\tilde{W}})=\Gamma _r(W)\) and \(M_r({\tilde{W}})\le 2M_r(W)\).

The next lemma is useful when checking the L-mixing property for a given process.

Lemma 5.2

Let \({\mathcal {G}}\subset {\mathcal {F}}\) be a sigma-algebra, X, Y random variables with \(E[|X|^r]+E[|Y|^r]<\infty \) with some \(r\ge 1\). If Y is \({\mathcal {G}}\)-measurable, then

$$\begin{aligned} E^{1/r}[|X-E[X\vert {\mathcal {G}}]|^r]\le 2E^{1/r}[|X-Y|^r]. \end{aligned}$$

Proof

See Lemma 2.1 of [10]. \(\square \)

Lemma 5.3

For an L-mixing process W of order \(r\ge 2\) satisfying \(E[W_t]=0\), \(t\in {\mathbb {N}}\),

$$\begin{aligned} E^{1/r}\left[ \left| \sum _{i=1}^N W_i\right| ^r\right] \le C_r N^{1/2} M_r^{1/2}(W)\Gamma _r^{1/2}(W), \end{aligned}$$

holds for each \(N\ge 1\) with a constant \(C_r\) that does not depend either on N or on W.

Proof

This follows from Theorem 1.1 of [10]. \(\square \)

L-mixing is, in many cases, easier to show than other, better-known mixing concepts such as \(\alpha -\), \(\beta -\) or \(\phi \)-mixing. There seems to be no implication between L-mixing and these latter conditions. For further information and related results, see [10].

6 Proofs of Ergodicity I

Throughout this section, let the assumptions of Theorem 2.17 be in force: Y is an ergodic process; Assumptions 2.2 and 2.7 hold; Assumption 2.5 holds with \(R(n):=8K(n)/\lambda (n)\), \(n\in {\mathbb {N}}\); and we have \(r_1(0)+r_2(0)<\infty \) and

$$\begin{aligned}\left( \frac{K(g(N))}{\lambda (g(N))}\right) ^{2\delta } \frac{\pi (N)}{N}\rightarrow 0,\ N\rightarrow \infty . \end{aligned}$$

In Sect. 4, we profited from contraction estimates for the metric \(\rho _{\beta }\). These required, essentially, that, given \(X_{t}\) and \(X_{t}'\), a convenient coupling of \(X_{t+1}\) and \(X_{t+1}'\) is realized; see, for example, (40). The exact nature of that coupling is hidden in Lemma 4.1. In the current section, we construct couplings for the whole process \(X_{t}\) which allow us to show suitable mixing properties.

We now present a construction that is crucial for proving Theorem 2.17. The random mappings \(T_t\) in the lemma below serve to provide the coupling effects that are needed for establishing the L-mixing property (see Sect. 5) for an auxiliary process (Z below) which will, in turn, lead to Theorem 2.17. Such a representation with random mappings was used in [2, 11]. In our setting, however, there is also dependence on \(y\in {\mathcal {Y}}\).

For \(R\ge 0\), denote by \({\mathfrak {C}}(R)\) the set of \({\mathcal {X}}\rightarrow {\mathcal {X}}\) mappings that are constant on \(C(R)=\{x\in {\mathcal {X}}:\, V(x)\le R\}\).

Lemma 6.1

There exists a sequence of measurable functions \(T_t:{\mathcal {Y}}\times {\mathcal {X}}\times {\Omega } \rightarrow {\mathcal {X}}\), \(t\ge 1\) such that

$$\begin{aligned} P(T_t(y,x,\cdot )\in A)=Q(y,x,A), \end{aligned}$$
(41)

for all \(t\ge 1\), \(y\in {\mathcal {Y}}\), \(x\in {\mathcal {X}}\), \(A\in {\mathfrak {B}}\). There exist independent sigma-algebras \({\mathcal {L}}_t\), \(t\ge 1\) such that the random variables \(T_t(y,x,\cdot ),\, x\in {\mathcal {X}},\, y\in {\mathcal {Y}}\) are \({\mathcal {L}}_{t}\)-measurable. There are events \(J_t(y)\in {\mathcal {L}}_{t}\), for all \(t\ge 1\), \(y\in {\mathcal {Y}}\) such that

$$\begin{aligned} J_t(y)\subset \{\omega \in \Omega :\, T_t(y,\cdot ,\omega )\in {\mathfrak {C}}(R(\Vert y\Vert ))\}\quad \text{ and }\quad P(J_t(y))\ge \alpha (\Vert y\Vert ). \end{aligned}$$
(42)

Proof

Let \(U_t\), \(t\ge 1\) be an independent sequence of uniform random variables on [0, 1]. Let \(\varepsilon _t\), \(t\ge 1\) be another such sequence, independent of \((U_t)_{t\ge 1}\). By enlarging the probability space, if necessary, we can always construct such random variables and we may even assume that \((U_t,\varepsilon _t)\), \(t\ge 1\) are independent of \((X_0,(Y_t)_{t\in {\mathbb {Z}}})\). Let \({\mathcal {L}}_{t}:=\sigma (U_{t},\varepsilon _{t})\).

We assume that \({\mathcal {X}}\) is uncountable, the case of countable \({\mathcal {X}}\) being analogous, but simpler. As \({\mathcal {X}}\) is Borel isomorphic to \({\mathbb {R}}\), see page 159 of [8], we may and will assume that, actually, \({\mathcal {X}}={\mathbb {R}}\). (We omit the details.)

The main idea in the arguments below is to separate the “independent component” \(\alpha (n)\nu _n(\cdot )\) from the rest of the kernel \(Q(y,x,\cdot )-\alpha (n)\nu _n(\cdot )\) for \(y\in A_n\) and \(x\in C(R(n))\). This independent component will ensure the existence of the constant mappings in (42).

Recall the sets \(A_n\), \(n\in {\mathbb {N}}\) from Assumption 2.2. Let \(B_n:=A_n\setminus A_{n-1}\), \(n\in {\mathbb {N}}\), with the convention \(A_{-1}:=\emptyset \). For each \(n\in {\mathbb {N}}\), \(y\in B_n\), let \(j_n(y,r):=\nu _{n}((-\infty ,r])\), \(r\in {\mathbb {R}}\) (the cumulative distribution function of \(\nu _n\)) and define its (\({\mathfrak {A}}\otimes {\mathcal {B}}({\mathbb {R}})\)-measurable) pseudoinverse by \(j^-_n(y,z):=\inf \{r\in {\mathbb {Q}}:\, j_n(y,r)\ge z\}\), \(z\in (0,1)\). Here, \({\mathcal {B}}({\mathbb {R}})\) refers to the Borel field of \({\mathbb {R}}\). Similarly, for \(y\in B_n\) and \(x\in C(R(n))\), let

$$\begin{aligned} q(y,x,r):=\frac{Q(y,x,(-\infty ,r])-\alpha (n)j_n(y,r)}{1-\alpha (n)},\ r\in {\mathbb {R}}, \end{aligned}$$

the cumulative distribution function of the normalization of \(Q(y,x,\cdot )-\alpha (n)\nu _n(\cdot )\). For \(x\notin C(R(n))\), set simply

$$\begin{aligned} q(y,x,r):=Q(y,x,(-\infty ,r]),\ r\in {\mathbb {R}}. \end{aligned}$$

For each \(x\in {\mathcal {X}}\), define

$$\begin{aligned} q^-(y,x,z):=\inf \{r\in {\mathbb {Q}}:\, q(y,x,r)\ge z\},\ z\in (0,1). \end{aligned}$$

Define, for \(n\in {\mathbb {N}}\), \(y\in B_n\),

$$\begin{aligned} T_t(y,x,\omega )&:= q^-(y,x,\varepsilon _t), \text{ if } U_t(\omega )>\alpha (n) \text{ or } U_t(\omega )\le \alpha (n) \text{ but } x\notin C(R(n)),\\ T_t(y,x,\omega )&:= j_n^-(y,\varepsilon _t), \text{ if } U_t(\omega )\le \alpha (n) \text{ and } x\in C(R(n)). \end{aligned}$$

Notice that \(T_t(y,\cdot ,\omega )\in {\mathfrak {C}}({R(n)})\) whenever \(U_t(\omega )\le \alpha (n)\); this implies (42) with \(J_t(y):=\{\omega \in \Omega :\, U_t(\omega )\le \alpha (\Vert y\Vert )\}\). It is easy to check (41), too. \(\square \)
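To make the splitting above concrete, the following is a minimal numerical sketch (not part of the paper's argument) for a hypothetical toy kernel on the real line: \(Q(y,x,\cdot )\) is the normal law with mean \(\tanh (y)x/4\) and unit variance, the small set is \(\{|x|\le 2\}\), \(\alpha \) is the mass of the common part of these normal laws, and \(\nu \) is its normalization. All names and constants in the code are illustrative assumptions.

```python
import math
import numpy as np

# Minimal sketch of the kernel splitting in the proof of Lemma 6.1 for a toy
# kernel Q(y, x, .) = N(tanh(y) * x / 4, 1).  The small set, ALPHA and nu are
# illustrative choices valid for this toy family only.

rng = np.random.default_rng(1)
B = 0.5                                       # on the small set the mean stays in [-B, B]

def std_cdf(r):                               # standard normal CDF
    return 0.5 * (1.0 + math.erf(r / math.sqrt(2.0)))

ALPHA = 2.0 * std_cdf(-B)                     # mass of the common part of the normal laws

def Q_cdf(y, x, r):                           # CDF of the toy kernel Q(y, x, .)
    return std_cdf(r - math.tanh(y) * x / 4.0)

def in_C(x):                                  # plays the role of the small set C(R(n))
    return abs(x) <= 2.0

def nu_cdf(r):                                # CDF of nu: normalized infimum of the densities
    if r <= 0.0:
        return std_cdf(r - B) / ALPHA
    return 0.5 + (std_cdf(r + B) - std_cdf(B)) / ALPHA

def residual_cdf(y, x, r):                    # CDF of (Q(y, x, .) - ALPHA * nu) / (1 - ALPHA), cf. q
    return (Q_cdf(y, x, r) - ALPHA * nu_cdf(r)) / (1.0 - ALPHA)

def pseudo_inverse(cdf, z, lo=-30.0, hi=30.0):
    """Generalized inverse inf{r : cdf(r) >= z} by bisection (cf. j_n^- and q^-)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if cdf(mid) >= z else (mid, hi)
    return hi

def T(y, x, U, eps):
    """Random mapping T_t(y, x): constant (regeneration) branch on {U <= ALPHA} for x in C."""
    if U <= ALPHA and in_C(x):
        return pseudo_inverse(nu_cdf, eps)                      # constant in x
    if in_C(x):
        return pseudo_inverse(lambda r: residual_cdf(y, x, r), eps)
    return pseudo_inverse(lambda r: Q_cdf(y, x, r), eps)

# Sanity check of (41): for (U, eps) uniform, T(y, x, U, eps) has law Q(y, x, .).
y, x = 0.7, 1.0
sample = np.array([T(y, x, rng.uniform(), rng.uniform()) for _ in range(5000)])
print(sample.mean(), math.tanh(y) * x / 4.0)                    # empirical vs. exact mean
```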

Remark 6.2

Note that, in the above construction, \((U_n,\varepsilon _n)_{n\in {\mathbb {N}}}\) was taken to be independent of \((X_0,(Y_t)_{t\in {\mathbb {Z}}})\). This will be important later, in the proof of Theorem 2.17.

From now on, we drop the dependence of the mappings \(T_t\) on \(\omega \in \Omega \) from the notation and simply write \(T_t(y,x)\). We continue our preparations for the proof of Theorem 2.17. Let

$$\begin{aligned} {\mathcal {G}}_{0}:=\{\emptyset ,\Omega \},\quad {\mathcal {G}}_t:=\sigma (\varepsilon _i,U_i,\ 1\le i\le t),\ t\ge 1 \end{aligned}$$
(43)

and

$$\begin{aligned} {\mathcal {G}}^+_t:=\sigma (\varepsilon _i,U_i,\ i\ge t+1),\ t\in {\mathbb {N}}. \end{aligned}$$
(44)

Take an arbitrary element \({\tilde{x}}\in {\mathcal {X}}\); it will remain fixed throughout this section.

Our approach to the ergodic theorem for X does not rely on the Markovian structure; rather, it proceeds by establishing a convenient mixing property. The ensuing arguments will lead to Theorem 2.17 via the L-mixing property of certain auxiliary Markov chains. It turns out that L-mixing is particularly well adapted to Markov chains, even time-inhomogeneous ones, and for us this is the crucial point. See also Sect. 7 about these issues.

The main ideas of the arguments below go back to [2] and [11]. In [11], Doeblin chains were treated. We need to extend those arguments substantially in the present, more complicated setting.

Let us fix \({\textbf{y}}=(y_j)_{j\in {\mathbb {Z}}}\in {\mathfrak {Y}}\) till further notice such that, for some \(H\in {\mathbb {N}}\), \(\Vert y_j\Vert \le H\) holds for all \(j\in {\mathbb {Z}}\). Define

$$\begin{aligned} Z_0:=X_0,\quad Z_{t+1}:=T_{t+1}({y}_t,Z_t),\ t\in {\mathbb {N}}. \end{aligned}$$
(45)

Clearly, the process Z depends heavily on the choice of \({\textbf{y}}\); for a while, however, we do not signal this dependence, for notational simplicity. Fix also \(m\in {\mathbb {N}}\) until further notice and define

$$\begin{aligned} {\tilde{Z}}_m:={\tilde{x}}, \quad {\tilde{Z}}_{t+1}:=T_{t+1}({y}_t,{\tilde{Z}}_t),\ t\ge m. \end{aligned}$$
(46)

Notice that \({\tilde{Z}}_t\), \(t\ge m\) are \({\mathcal {G}}^+_m\)-measurable.

Our purpose will be to prove that, with high probability, \(Z_{m+\tau }={\tilde{Z}}_{m+\tau }\) for \(\tau \) large enough. In other words, a coupling between the processes Z and \({\tilde{Z}}\) is realized. Fix \(\epsilon =\epsilon (H)>0\), to be specified later. Let \(\tau \ge 1\) be an arbitrary integer. Denote \(\vartheta :=\lceil \, \epsilon \tau \, \rceil \). Recall that \(R(H)=8K(H)/\lambda (H)\). Define \(D:=C(R(H)/2)=\{x\in {\mathcal {X}}:\, V(x)\le R(H)/2\}\) and \({\overline{D}}:=\{(x_1,x_2)\in {\mathcal {X}}^2:\, V(x_1)+V(x_2)\le R(H)\}\). Denote \({\overline{Z}}_t:=(Z_t,{\tilde{Z}}_t)\), \(t\ge m\).
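Before turning to the estimates, here is a small simulation sketch of the coupling behind (45)–(46), under purely illustrative dynamics: the mapping used below is not the one constructed in Lemma 6.1, it merely has the two features the proofs exploit, namely a contraction towards a small region (playing the role of D) and a regeneration branch that is constant in x. The environment path, the constants and the initial states are all assumptions.

```python
import numpy as np

# Illustrative coupling of Z and Z~ driven by the same (U_t, eps_t); not the
# construction of Lemma 6.1, only a toy with the same structure.

rng = np.random.default_rng(2)
ALPHA, SMALL = 0.3, 1.0                       # regeneration probability; small region {|x| <= SMALL}

def T(y, x, U, eps):
    if U <= ALPHA and abs(x) <= SMALL:
        return eps                            # constant in x on the small region
    return 0.5 * x + 0.1 * y + eps            # otherwise: a contracting random move

m, n = 5, 60                                  # restart time of Z~ and time horizon
y_path = np.sin(np.arange(n))                 # a frozen, bounded environment path
U, eps = rng.uniform(size=n), rng.normal(size=n)

Z = np.empty(n)
Z_tilde = np.full(n, np.nan)
Z[0], Z_tilde[m] = 10.0, 0.0                  # Z starts from X_0, Z~ restarts from x~ at time m
for t in range(n - 1):
    Z[t + 1] = T(y_path[t], Z[t], U[t + 1], eps[t + 1])
    if t >= m:
        Z_tilde[t + 1] = T(y_path[t], Z_tilde[t], U[t + 1], eps[t + 1])

# Once both copies visit the small region and the regeneration branch fires,
# they merge and stay equal forever after (the same U, eps drive both).
hits = np.flatnonzero(Z == Z_tilde)
print("coupling time:", hits[0] if hits.size else "no coupling within the horizon")
```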

Lemma 6.3

We have \(\sup _{k\in {\mathbb {N}}}E[V(Z_k)]\le E[V(X_0)]+K(H)/\lambda (H)<\infty \). Furthermore, \(\sup _{k\ge m}E[V({\tilde{Z}}_k)]\le V({\tilde{x}})+K(H)/\lambda (H)\).

Proof

Assumption 2.2 implies that, for \(k\ge 1\),

$$\begin{aligned} E[V(Z_k)]\le (1-\lambda (H))E[V(Z_{k-1})]+K(H). \end{aligned}$$

Assumption 2.7 implies that \(E[V(X_0)]=E[V(Z_0)]<\infty \) so, for every \(k\in {\mathbb {N}}\),

$$\begin{aligned}E[V(Z_k)]\le E[V(X_0)]+\sum _{l=0}^{\infty } K(H)(1-\lambda (H))^l=E[V(X_0)]+\frac{K(H)}{\lambda (H)}. \end{aligned}$$

Similarly,

$$\begin{aligned} E[V({\tilde{Z}}_k)]\le V({\tilde{x}})+\sum _{l=0}^{\infty } K(H)(1-\lambda (H))^l= V({\tilde{x}})+\frac{K(H)}{\lambda (H)}. \end{aligned}$$

\(\square \)

The counterpart of the above lemma for X instead of Z is the following.

Lemma 6.4

$$\begin{aligned} \sup _{n\in {\mathbb {N}}}E[V(X_n)]<\infty . \end{aligned}$$

Proof

Note that \(E[V(X_0)]<\infty \) by Assumption 2.7. So, for each \(n\ge 1\),

$$\begin{aligned} E[V(X_n)]&\le \int _{{\mathcal {X}}}(1+V(z))\mu _n(dz) \\&\le \int _{{\mathcal {X}}}(1+V(z))|\mu _n-\mu _0|(dz) + \int _{{\mathcal {X}}}(1+V(z))\mu _0(dz) \\&= \rho _1(\mu _n,\mu _0)+ E[V(X_0)]+1 \\&\le \rho _{1}(\mu _{n},\mu _{*})+\rho _{1}(\mu _{0},\mu _{*})+ E[V(X_0)]+1 \\&\le 2C[r_{1}(0)+r_{2}(0)]+E[V(X_0)]+1, \end{aligned}$$

by Theorem 2.12. \(\square \)

We note for later use that if \(z\in {\mathcal {X}}\setminus D\), then for all \(y\in A_H\),

$$\begin{aligned}{}[Q(y)(K(H)+V)](z)&\le (1-\lambda (H))V(z)+2K(H)\\&\le (1-\lambda (H)/2)V(z). \end{aligned}$$
(47)

Recall Definitions (43) and (44). Define the \(({\mathcal {G}}_t)_{t\in {\mathbb {N}}}\)-stopping times

$$\begin{aligned} \sigma _0:=m,\ \sigma _{n+1}:=\min \{i>\sigma _n:\ {\overline{Z}}_i\in {\overline{D}}\}. \end{aligned}$$

The results below serve to control the number of returns to \({\overline{D}}\) and the probability of coupling between the processes Z and \({\tilde{Z}}\). Our estimation strategy in the proof of Theorem 2.17 will be the following. We will control \(P({\tilde{Z}}_{\tau +m}\ne Z_{\tau +m})\) for large \(\tau \): either there were only a few returns of the process \({\overline{Z}}\) to \({\overline{D}}\) (which happens with small probability), or there were many returns but coupling did not occur (which also has small probability). First, let us present a lemma controlling the number of returns to \({\overline{D}}\).

Lemma 6.5

There is \(\bar{C}>0\) such that

$$\begin{aligned} \sup _{n\ge 1} E\left[ \exp (\varrho (H)(\sigma _{n+1}-\sigma _n))\big \vert {\mathcal {G}}_{\sigma _n}\right] \le \frac{\bar{C}}{\lambda ^2(H)}, \end{aligned}$$

and

$$\begin{aligned} E[\exp (\varrho (H)(\sigma _1-\sigma _0))]\le \frac{\bar{C}}{\lambda ^2(H)} \end{aligned}$$

where

$$\begin{aligned} \varrho (H):=\ln (1+\lambda (H)/2). \end{aligned}$$
(48)

In particular, \(\sigma _n<\infty \) a.s. for each \(n\in {\mathbb {N}}\). Furthermore, \(\bar{C}\) does not depend on \({\textbf{y}}\), m or H.

Proof

We can estimate, for \(k\ge 1\) and \(n\ge 1\),

$$\begin{aligned} P(\sigma _{n+1}-\sigma _n> k\vert {\mathcal {G}}_{\sigma _n})&= P({\overline{Z}}_{\sigma _n+k}\notin {\overline{D}},\ldots , {\overline{Z}}_{\sigma _n+1} \notin {\overline{D}}\vert {\mathcal {G}}_{\sigma _n}) \\&\le E\left[ \left( \frac{V(Z_{\sigma _n+k})+V({\tilde{Z}}_{\sigma _n+k})}{R(H)}\right) 1_{\{{\overline{Z}}_{\sigma _n+k-1}\notin {\overline{D}}\}}\cdots 1_{\{{\overline{Z}}_{\sigma _n+1}\notin {\overline{D}}\}}\Big \vert {\mathcal {G}}_{\sigma _n}\right] \\&= E\left[ E\left[ \left( \frac{V(Z_{\sigma _n+k})+V({\tilde{Z}}_{\sigma _n+k})}{R(H)}\right) 1_{\{{\overline{Z}}_{\sigma _n+k-1}\notin {\overline{D}}\}} \Big \vert {\mathcal {G}}_{\sigma _n+k-1}\right] 1_{\{{\overline{Z}}_{\sigma _n+k-2}\notin {\overline{D}}\}}\cdots 1_{\{{\overline{Z}}_{\sigma _n+1}\notin {\overline{D}}\}}\Big \vert {\mathcal {G}}_{\sigma _n}\right] . \end{aligned}$$

Notice that \(\{{\overline{Z}}_{\sigma _n+k-1}\notin {\overline{D}}\}\subset \{ Z_{\sigma _{n}+k-1}\notin D\} \cup \{ {\tilde{Z}}_{\sigma _{n}+k-1}\notin D\}=:E_{1}\cup E_{2}\). Assumption 2.2 and the observation (47) imply that

$$\begin{aligned}&E\left[ \left( \frac{V(Z_{\sigma _n+k})+V({\tilde{Z}}_{\sigma _n+k})}{R(H)}\right) 1_{\{{\overline{Z}}_{\sigma _n+k-1}\notin {\overline{D}}\}}\Big \vert {\mathcal {G}}_{\sigma _n+k-1}\right] \\&\quad \le 1_{E_{1}}\left[ \frac{1}{R(H)}[(1-\lambda (H)/2) V(Z_{\sigma _n+k-1})-K(H)] + \frac{1}{R(H)}[(1-\lambda (H))V({\tilde{Z}}_{\sigma _n+k-1})+K(H)]\right] \\&\qquad + 1_{E_{2}\setminus E_{1}}\left[ \frac{1}{R(H)}[(1-\lambda (H)) V(Z_{\sigma _n+k-1})-K(H)] + \frac{1}{R(H)}[(1-\lambda (H)/2)V({\tilde{Z}}_{\sigma _n+k-1})+K(H)]\right] \\&\quad \le \frac{1-\lambda (H)/2}{R(H)} [V(Z_{\sigma _n+k-1})+V({\tilde{Z}}_{\sigma _n+k-1})]. \end{aligned}$$

This argument can clearly be iterated and leads to

$$\begin{aligned} P(\sigma _{n+1}-\sigma _n> k\vert {\mathcal {G}}_{\sigma _n})&\le \frac{(1-\lambda (H)/2)^{k-1}}{R(H)} E\left[ V(Z_{\sigma _n+1})+V({\tilde{Z}}_{\sigma _n+1})\Big \vert {\mathcal {G}}_{\sigma _n}\right] \\&\le \frac{(1-\lambda (H)/2)^{k-1}}{R(H)} \left[ (1-\lambda (H))\left[ V(Z_{\sigma _n})+V({\tilde{Z}}_{\sigma _n})\right] +2K(H)\right] \\&\le (1-\lambda (H)/2)^{k}, \end{aligned}$$

by Assumption 2.2, since \({\overline{Z}}_{\sigma _n}\in {\overline{D}}\). In the case \(n=0\), \(k\ge 1\), we arrive at

$$\begin{aligned} P(\sigma _{1}-\sigma _0> k)&\le E\left[ (1-\lambda (H))(V(Z_m)+V({\tilde{x}}))+2K(H)\right] \frac{(1-\lambda (H)/2)^{k-1}}{R(H)} \\&\le \left( E[V(X_0)]+\frac{1}{8}+V({\tilde{x}})+\frac{\lambda (H)}{4}\right) \left( 1-\frac{\lambda (H)}{2}\right) ^{k-1} \end{aligned}$$

instead, in a similar way, by Lemma 6.3.

Now, we turn from probabilities to expectations. Using \(e^{\varrho (H)}\le 2\), we can estimate, for \(n\ge 1\),

$$\begin{aligned} E\left[ \exp \{\varrho (H)(\sigma _{n+1}-\sigma _n)\}\big \vert {\mathcal {G}}_{\sigma _n}\right]&\le \sum _{k=0}^{\infty } e^{\varrho (H)(k+1)}\left( 1-\frac{\lambda (H)}{2}\right) ^{k} \\&\le 2\sum _{k=0}^{\infty } \left( 1-\frac{\lambda ^2(H)}{4}\right) ^{k} = \frac{8}{\lambda ^2(H)}. \end{aligned}$$

When \(n=0\), we obtain

$$\begin{aligned} E\left[ \exp \{\varrho (H)(\sigma _{1}-\sigma _0)\}\right]&\le \left( E[V(X_0)]+\frac{1}{8}+V({\tilde{x}})+\frac{\lambda (H)}{4}\right) \left[ e^{\varrho (H)}+ \sum _{k=1}^{\infty } e^{\varrho (H)(k+1)}\left( 1-\frac{\lambda (H)}{2}\right) ^{k-1}\right] \\&\le \frac{\bar{C}}{\lambda ^2(H)}, \end{aligned}$$

for some \(\bar{C}\). We may and will assume \(\bar{C}\ge 8\). The statement follows. \(\square \)

The quantity \(\epsilon >0\) has been arbitrary thus far. Now, let us make the choice

$$\begin{aligned} \epsilon :=\epsilon (H)=\frac{\varrho (H)}{4(\ln (\bar{C})-2\ln (\lambda (H)))}. \end{aligned}$$
(49)

Recall that \(\tau \ge 1\) has also been arbitrary and that \(\vartheta =\lceil \epsilon \tau \rceil \).

Corollary 6.6

If \(\tau \ge 1/\epsilon (H)\), then \(P(\sigma _{\vartheta }>m+\tau )\le \exp (-\varrho (H)\tau /2)\).

Proof

Lemma 6.5 and the tower rule for conditional expectations easily imply

$$\begin{aligned} E[\exp (\varrho (H)\sigma _{\vartheta })]\le \left( \frac{\bar{C}}{\lambda ^2(H)}\right) ^{\vartheta }e^{\varrho (H)m}. \end{aligned}$$

Hence, by the Markov inequality,

$$\begin{aligned} P(\sigma _{\vartheta }>m+\tau )\le \left( \frac{\bar{C}}{\lambda ^2(H)}\right) ^{\vartheta }\exp (-\varrho (H)\tau ). \end{aligned}$$

The statement now follows by direct calculations. Indeed, this choice of \(\epsilon (H)\) and \(\tau \ge 1/\epsilon (H)\) implies

$$\begin{aligned} (\ln (\bar{C})-2\ln (\lambda (H)))[\epsilon (H)\tau +1]\le \frac{\tau }{2}\ln (1+\lambda (H)/2), \end{aligned}$$

which guarantees

$$\begin{aligned} (\ln (\bar{C})-2\ln (\lambda (H)))\lceil \epsilon (H)\tau \rceil -\tau \ln (1+\lambda (H)/2)\le -\frac{\tau }{2}\ln (1+\lambda (H)/2). \end{aligned}$$

\(\square \)

The next lemma controls the probability of coupling between Z and \({\tilde{Z}}\).

Lemma 6.7

For all \(\tau \ge 1\),

$$\begin{aligned} P(Z_{m+\tau }\ne {\tilde{Z}}_{m+\tau },\ \sigma _{\vartheta }\le m+\tau )\le (1-\alpha (H))^{\vartheta -1}\le e^{-(\vartheta -1)\alpha (H)}. \end{aligned}$$

Proof

For typographical reasons, we will write \(\sigma (n)\) instead of \(\sigma _n\) in this proof. Notice that if \(\omega \in \Omega \) is such that \(\sigma (k)(\omega )<m+\tau \) and \(T_{\sigma (k)(\omega )+1}(y_{\sigma (k)(\omega )}, \cdot ,\omega )\in {\mathfrak {C}}(R(H))\), then \(Z_{\sigma (k)(\omega )+1}(\omega )={\tilde{Z}}_{\sigma (k)(\omega ) +1}(\omega )\) by (45) and (46), and hence also \(Z_{m+\tau }(\omega )={\tilde{Z}}_{m+\tau }(\omega )\). Recall the proof of Lemma 6.1 and estimate

$$\begin{aligned}&P(Z_{m+\tau }\ne {\tilde{Z}}_{m+\tau },\ \sigma ({\vartheta })\le m+\tau ) \\&\quad \le P(U_{\sigma (1)+1}>\alpha (H),\ldots , U_{\sigma ({\vartheta -1})+1}>\alpha (H)) \\&\quad = E[E[1_{\{U_{\sigma ({\vartheta -1})+1}>\alpha (H)\}}\vert {\mathcal {G}}_{\sigma ({\vartheta -1})}] 1_{\{U_{\sigma (1)+1}>\alpha (H)\}}\cdots 1_{\{U_{\sigma (\vartheta -2)+1}>\alpha (H)\}}]. \end{aligned}$$

As easily seen,

$$\begin{aligned}E[1_{\{U_{\sigma ({\vartheta -1})+1}>\alpha (H)\}} \vert {\mathcal {G}}_{\sigma ({\vartheta -1})}] = (1-\alpha (H)). \end{aligned}$$

Iterating the above argument, we arrive at the statement of this lemma using \(1-x\le e^{-x}\), \(x\ge 0\). \(\square \)

Lemma 6.8

Let \(\phi \in \Phi (V^{\delta })\) for some \(0<\delta \le 1/2\). Then, the process \(\phi (Z_t)\), \(t\in {\mathbb {N}}\) is L-mixing of order p with respect to \(({\mathcal {G}}_t,{\mathcal {G}}^+_t)\), \(t\in {\mathbb {N}}\), for all \(1\le p<1/\delta \). Furthermore, \(\Gamma _{p}(\phi (Z))\), \(M_{p}(\phi (Z))\) (see Sect. 5 for the definitions of these quantities) have upper bounds that do not depend on \({\textbf{y}}\), only on H.

In the sequel, we will use, without further notice, the following elementary inequalities for \(x,y\ge 0\):

$$\begin{aligned} (x+y)^r\le 2^{r-1}(x^r+y^r) \text{ if } r\ge 1;\ (x+y)^r\le x^r+y^r \text{ if } 0<r<1. \end{aligned}$$

Proof of Lemma 6.8

Clearly,

$$\begin{aligned}M_{1/\delta }(\phi (Z))\le {\tilde{C}}\left[ 1+\left( E[V(X_0)]+\frac{K(H)}{\lambda (H)}\right) ^{\delta }\right] , \end{aligned}$$

with some constant \({\tilde{C}}>0\), by Lemma 6.3. Also,

$$\begin{aligned} M_p(\phi (Z))\le M_{1/\delta }(\phi (Z)), \end{aligned}$$

for all \(1\le p<1/\delta \).

Now, we turn to establishing a bound for \(\Gamma _p(\phi (Z))\). Since \({\tilde{Z}}_m\) is deterministic, \({\tilde{Z}}_{m+\tau }\) is \({\mathcal {G}}_m^+\)-measurable for \(\tau \ge 0\). Lemma 5.2 implies that, for \(\tau \ge 1\),

$$\begin{aligned} E^{1/p}[|\phi (Z_{m+\tau })-E[\phi (Z_{m+\tau })\vert {\mathcal {G}}_{m}^+]|^p]&\le 2E^{1/p}[|\phi (Z_{m+\tau })-\phi ({\tilde{Z}}_{m+\tau })|^p] \\&\le 2E^{1/p}[(|\phi (Z_{m+\tau })|+|\phi ({\tilde{Z}}_{m+\tau })|)^p 1_{\{Z_{m+\tau }\ne {\tilde{Z}}_{m+\tau }\}}] \\&\le 2 E^{\delta }[(|\phi (Z_{m+\tau })|+|\phi ({\tilde{Z}}_{m+\tau })|)^{1/\delta }]P^{\frac{1-p\delta }{p}}(Z_{m+\tau }\ne {\tilde{Z}}_{m+\tau }), \end{aligned}$$
(50)

using Hölder’s inequality with the exponents \(1/(p\delta )\) and \(1/(1-p\delta )\). By Lemma 6.3,

$$\begin{aligned} E^{\delta }[(|\phi (Z_{m+\tau })|+|\phi ({\tilde{Z}}_{m+\tau })|)^{1/\delta }]&\le {\tilde{C}}\left[ 1+\left( E[V(X_0)]+\frac{K(H)}{\lambda (H)}\right) ^{\delta }\right] + {\tilde{C}}\left[ 1+\left( V({\tilde{x}})+\frac{K(H)}{\lambda (H)}\right) ^{\delta }\right] \\&\le \check{C}\left[ \frac{K(H)}{\lambda (H)}\right] ^{\delta }, \end{aligned}$$
(51)

for some suitable \(\check{C}>0\). Here, we have used \(K\ge 1\). Since

$$\begin{aligned} P(Z_{m+\tau }\ne {\tilde{Z}}_{m+\tau })\le P(Z_{m+\tau }\ne {\tilde{Z}}_{m+\tau },\ \sigma _{\vartheta }\le m+\tau ) + P(\sigma _{\vartheta }> m+\tau ), \end{aligned}$$

we obtain from (51), Lemma 6.7 and Corollary 6.6 that, for \(\tau \ge 1/\epsilon (H)\),

$$\begin{aligned} \gamma _p(\phi (Z),\tau )&\le 2\check{C}\left( \frac{K(H)}{\lambda (H)}\right) ^{\delta }P^{\frac{1-p\delta }{p}}(Z_{m+\tau }\ne {\tilde{Z}}_{m+\tau }) \\&\le 2\check{C}\left( \frac{K(H)}{\lambda (H)}\right) ^{\delta } \left[ \exp \left( -\alpha (H)[\epsilon (H)\tau -1]\right) + \exp \left( -\frac{\varrho (H)\tau }{2}\right) \right] ^{(1-p\delta )/p}, \end{aligned}$$
(52)

noting that the estimates of Lemma 6.7 and Corollary 6.6 do not depend on the choice of m. For each integer

$$\begin{aligned} 0\le \tau <1/\epsilon (H), \end{aligned}$$

we will apply the trivial estimate

$$\begin{aligned} \gamma _p(\phi (Z),\tau )\le 2M_p(\phi (Z))\le 2M_{1/\delta }(\phi (Z))\le 2\check{C}\left[ \frac{K(H)}{\lambda (H)}\right] ^{\delta }, \end{aligned}$$
(53)

recall (51). Keep in mind the definition \(\vartheta =\lceil \epsilon \tau \rceil \). We can then write, using the formula for the sum of a geometric series,

$$\begin{aligned} \Gamma _p(\phi (Z))&\le 2\check{C}\left( \frac{K(H)}{\lambda (H)}\right) ^{\delta } \left[ \frac{1}{\epsilon (H)} + \sum _{\tau \ge 1/\epsilon (H)} \left[ \exp \left( -\alpha (H)[\epsilon (H)\tau -1]\right) + \exp \left( -\frac{\varrho (H)\tau }{2}\right) \right] ^{(1-p\delta )/p}\right] \\&\le c'\left[ \frac{1}{\epsilon (H)}+\frac{\exp \left( {\alpha (H)}(1-p\delta )/p\right) }{1-\exp \left( -\alpha (H)\epsilon (H)(1-p\delta )/p\right) }+ \frac{1}{1-\exp \left( -\frac{\varrho (H)(1-p\delta )}{2p}\right) }\right] \left( \frac{K(H)}{\lambda (H)}\right) ^{\delta } \end{aligned}$$
(54)

with some constant \(c'\). Using elementary properties of the functions \(x\rightarrow 1/(1-e^{-x})\) and \(x\rightarrow \ln (1+x)\) and the definitions (49) and (48), \(\Gamma _p(\phi (Z))\) can be estimated from above by

$$\begin{aligned} c''\left[ \frac{1}{\alpha (H)\epsilon (H)}+ \frac{1}{\lambda (H)}\right] \left( \frac{K(H)}{\lambda (H)}\right) ^{\delta }&\le c'''\frac{|\ln (\lambda (H))|}{\alpha (H)\lambda (H)}\left( \frac{K(H)}{\lambda (H)}\right) ^{\delta } \end{aligned}$$
(55)

with some \(c'',c'''>0\). The L-mixing property of order p follows. (Note, however, that \(c'''\) depends on p and \(\delta \), as well as on \(V({\tilde{x}})\) and \(E[V(X_0)]\).) \(\square \)

Proof of Theorem 2.17

Recall the definitions (45) and (46). Now, we start signalling the dependence of Z on \({\textbf{y}}\) and hence write \(Z_t^{{\textbf{y}}}\), \(t\in {\mathbb {N}}\). Note that the law of \(Z_t^{{\textbf{Y}}}\), \(t\in {\mathbb {N}}\) equals that of \(X_t\), \(t\in {\mathbb {N}}\), by construction of Z and by Remark 6.2.

For \(t\in {\mathbb {N}}\) and \({\textbf{y}}\in {\mathfrak {Y}}\), define \(\psi _{t}({\textbf{y}}):=\int _{{\mathcal {X}}}\phi (x)\mu _{t}({\textbf{y}})(dx)\); recall the definition of \(\mu _{t}({\textbf{y}})\) from (30). Notice that \(\psi _{t}(\hat{{\textbf{y}}}_{t})=E[\phi (Z^{{\textbf{y}}}_{t})]\). Define \(W_t({\textbf{y}}):=\phi (Z^{{\textbf{y}}}_t)-\psi _{t}(\hat{{\textbf{y}}}_{t})\). Clearly, \(W_{t}({\textbf{y}})\) is a zero-mean process.

Fix \(p\ge 2\) such that \(\delta p<1\). Fix \(N\in {\mathbb {N}}\) for the moment. In the particular case where \({\textbf{y}}\) satisfies \(\Vert y_j\Vert \le H:=g(N)\), \(j\in {\mathbb {N}}\), the process \(W_t({\textbf{y}})\), \(t\in {\mathbb {N}}\) is L-mixing by Lemma 6.8 and Remark 5.1. Hence, Lemma 5.3 implies

$$\begin{aligned} E^{1/p}\left[ \left| \frac{W_1({\textbf{y}})+\ldots +W_N({\textbf{y}})}{N}\right| ^p\right]&\le \frac{C_p M_p^{1/2}(W({\textbf{y}}))\Gamma _p^{1/2}(W({\textbf{y}}))}{N^{1/2}}\\&\le \frac{C_p M_{1/\delta }^{1/2}(W({\textbf{y}}))\Gamma _p^{1/2}(W({\textbf{y}}))}{N^{1/2}}\\&\le \frac{2C_p \sqrt{\check{C}}\, [K(g(N))/\lambda (g(N))]^{\delta /2}\, \sqrt{c'''}\,[K(g(N))/\lambda (g(N))]^{\delta /2}\, \pi ^{1/2}(N)}{N^{1/2}}, \end{aligned}$$

by (51) and (55); recall also Remark 5.1. Fix \({\tilde{y}}\in A_0\) and define

$$\begin{aligned} {\tilde{Y}}_j:={Y}_j, \text{ if } {Y}_j\in A_{g(N)},\ {\tilde{Y}}_j:={\tilde{y}}, \text{ if } {Y}_j\notin A_{g(N)}. \end{aligned}$$

Let \(\tilde{{\textbf{Y}}}=({\tilde{Y}}_{j})_{j\in {\mathbb {Z}}}\in {\mathfrak {Y}}\). Note that, by \(\phi \in \Phi (V^{\delta })\),

$$\begin{aligned} E^{\delta }[|W_j({\textbf{Y}})|^{1/\delta }]\le 2{\tilde{C}}(1+E^{\delta }[V(X_j)]),\ j\ge 1. \end{aligned}$$

Estimate, using Hölder’s inequality with exponents \(1/(\delta p)\), \(1/(1-\delta p)\),

$$\begin{aligned} E^{1/p}\left[ \left| \frac{W_1({\textbf{Y}})+\ldots +W_N({\textbf{Y}})}{N}\right| ^p\right]&\le E^{1/p}\left[ \left| \frac{W_1(\tilde{{\textbf{Y}}})+\ldots +W_N(\tilde{{\textbf{Y}}})}{N}\right| ^p\right] \\&\quad + M_{1/\delta }(W({\textbf{Y}}))P^{\frac{1-p\delta }{p}}(({\tilde{Y}}_1,\ldots ,{\tilde{Y}}_N)\ne (Y_1,\ldots ,Y_N)) \\&\le \frac{C'[K(g(N))/\lambda (g(N))]^{\delta } \pi ^{1/2}(N)}{N^{1/2}}+C'\left( 1+\sup _{n\in {\mathbb {N}}} E[V(X_n)]\right) ^{\delta }\ell ^{\frac{1-p\delta }{p}}(N) \\&\le \frac{C''[K(g(N))/\lambda (g(N))]^{\delta } \pi ^{1/2}(N)}{N^{1/2}}+C''\ell ^{\frac{1-p\delta }{p}}(N), \end{aligned}$$
(56)

with some constants \(C',C''>0\), by Lemma 6.4. Here, we have also used the fact that if \(({\tilde{Y}}_1,\ldots ,{\tilde{Y}}_N)=(Y_1,\ldots ,Y_N)\), then also \(W_{j}({\textbf{Y}})=W_{j}(\tilde{{\textbf{Y}}})\), \(j=1,\ldots ,N\). The quantity (56) tends to 0 by our hypotheses (12) and \(r_{2}(0)<\infty \).

Recall the notation \(\hat{{\textbf{Y}}}_{n}:=(Y_{j+n})_{j\in {\mathbb {Z}}}\) and the functional \(\Psi _{\phi }\) from (37). Now, we can estimate

$$\begin{aligned} \left| \int _{{\mathcal {X}}} \phi (z)\mu _*(dz)-\frac{\sum _{j=1}^{N}\phi (Z_j^{{\textbf{Y}}})}{N}\right|&\le \left| \int _{{\mathcal {X}}} \phi (z)\mu _*(dz)-\frac{\sum _{j=1}^{N}\Psi _{\phi }(\hat{{\textbf{Y}}}_{j})}{N}\right| \\&\quad + \left| \frac{\sum _{j=1}^{N}\Psi _{\phi }(\hat{{\textbf{Y}}}_{j})}{N}-\frac{\sum _{j=1}^{N}\psi _{j}(\hat{{\textbf{Y}}}_{j})}{N}\right| \\&\quad + \left| \frac{\sum _{j=1}^{N}\psi _{j}(\hat{{\textbf{Y}}}_{j})}{N}-\frac{\sum _{j=1}^{N}\phi (Z_j^{{\textbf{Y}}})}{N}\right| . \end{aligned}$$
(57)

Birkhoff’s theorem and the ergodicity of the process Y imply that

$$\begin{aligned}\frac{\sum _{j=1}^{N}\Psi _{\phi }(\hat{{\textbf{Y}}}_{j})}{N}\rightarrow \int _{{\mathcal {X}}}\phi (z)\mu _{*}(dz),\ N\rightarrow \infty , \end{aligned}$$

almost surely, hence also in probability, noting Remark 4.4.

By stationarity of the process \(\hat{{\textbf{Y}}}_{k}\), \(k\in {\mathbb {Z}}\), we have that

$$\begin{aligned} E\left[ \min \left( |\psi _{j}(\hat{{\textbf{Y}}}_{j})-\Psi _{\phi }(\hat{{\textbf{Y}}}_{j})|,1\right) \right] = E\left[ \min \left( |\psi _{j}({\textbf{Y}})-\Psi _{\phi }({\textbf{Y}})|,1\right) \right] . \end{aligned}$$

The proof of Theorem 2.12 (see the discussion after (36)) and Remark 4.4 show that \(\psi _{j}({\textbf{Y}})\rightarrow \Psi _{\phi }({\textbf{Y}})\) almost surely as \(j\rightarrow \infty \), so the right-hand side above tends to 0 by bounded convergence. It follows that \(\psi _{j}(\hat{{\textbf{Y}}}_{j})-\Psi _{\phi }(\hat{{\textbf{Y}}}_{j})\) tends to 0 in probability and, by Cesàro averaging, so does the second term on the right-hand side of (57).

The third term on the right-hand side of (57) equals

$$\begin{aligned}\left| \frac{W_1({\textbf{Y}})+\ldots +W_N({\textbf{Y}})}{N}\right| . \end{aligned}$$

We claim that it converges to 0 in probability. Notice that \(\ell (N)\rightarrow 0\) as \(N\rightarrow \infty \) by the hypothesis \(r_{2}(0)<\infty \). It was a hypothesis of Theorem 2.17 that \(\pi (N)(K(N)/\lambda (N))^{2\delta }/N\) tends to 0; hence (56) implies that the term in question tends to 0 in \(L^{p}\) and, a fortiori, in probability.

To sum up,

$$\begin{aligned} \left| \int _{{\mathcal {X}}} \phi (z)\mu _*(dz)-\frac{\sum _{j=1}^{N}\phi (X_j)}{N}\right| \rightarrow 0 \end{aligned}$$

in probability, recalling that the laws of \(Z^{{\textbf{Y}}}_{n}\), \(n\in {\mathbb {N}}\) and \(X_{n}\), \(n\in {\mathbb {N}}\) coincide.

To show convergence in \(L^{p}\), it suffices to check the uniform integrability of the family of random variables \(V^{\delta p}(X_{n})\), \(n\in {\mathbb {N}}\) since \(\phi \in \Phi (V^{\delta })\). This follows from \(p<1/\delta \) and from Lemma 6.4. The theorem has been shown for \(p\ge 2\) but this implies the result for \(1\le p<2\), too. \(\square \)
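As an illustration of the statement just proved (and not part of the proof), the following sketch simulates a toy chain in a random environment and compares the time averages of a bounded test function started from two different initial states. The environment, the kernel and \(\phi \) are all assumptions made for this sketch.

```python
import numpy as np

# Illustration only: time averages for a toy Markov chain in a random
# environment.  Y is a stationary AR(1) environment, the chain follows
# X_{t+1} = tanh(Y_t) * X_t / 2 + xi_{t+1}, and phi(x) = min(|x|, 3).
# In line with Theorem 2.17, the averages from two different starting
# points and independent noise sequences nearly agree for large N.

rng = np.random.default_rng(3)
N = 100_000

Y = np.empty(N)
Y[0] = rng.normal()                               # stationary start: Var(Y_t) = 1
for t in range(N - 1):
    Y[t + 1] = 0.8 * Y[t] + 0.6 * rng.normal()

def time_average(x0, xi):
    x, acc = x0, 0.0
    for t in range(N):
        acc += min(abs(x), 3.0)                   # phi(x) = min(|x|, 3)
        x = np.tanh(Y[t]) * x / 2.0 + xi[t]
    return acc / N

avg1 = time_average(0.0, rng.normal(size=N))
avg2 = time_average(50.0, rng.normal(size=N))
print(f"time averages from x0 = 0 and x0 = 50: {avg1:.3f} vs {avg2:.3f}")
```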

Remark 6.9

In the proof of Theorem 2.17, we could find estimates for the \(L^{p}\) convergence rate for the third term in (57) (see (56)). We would be able to do likewise for the second term, using the estimates in our arguments. However, there is, a priori, no rate estimate for

$$\begin{aligned} h_{N}:=\left| \frac{\sum _{j=1}^{N}\Psi _{\phi }(\hat{{\textbf{Y}}}_{j})}{N}- \int _{{\mathcal {X}}}\phi (z){} \mu _{*}(dz) \right| , \end{aligned}$$

as this depends on the mixing properties of Y. Under suitably strong assumptions on the process Y, however, this term could also be estimated. In the ideal case, \(E^{1/p}[h_{N}^{p}]\) is of the order \(1/\sqrt{N}\).

7 Ramifications

Let \(X_t\), \(t\in {\mathbb {N}}\) be a \({\mathcal {X}}\)-valued time-inhomogeneous Markov chain. Denote by \(Q_{t}(x,A)\) its transition kernel at time \(t\ge 1\). We assume that there exist \(\lambda ,K>0\) such that, for all \(t\ge 1\),

$$\begin{aligned}{}[Q_{t}V](x)\le (1-\lambda )V(x)+K,\ x\in {\mathcal {X}}, \end{aligned}$$

for some measurable function \(V:{\mathcal {X}}\rightarrow {\mathbb {R}}_{+}\). Furthermore, for some \(\alpha >0\) and for all \(t\ge 1\),

$$\begin{aligned} \inf _{x\in C} Q_{t}(x,A)\ge \alpha \nu (A),\ A\in {\mathfrak {B}} \end{aligned}$$

for some probability \(\nu \) and for

$$\begin{aligned} C:=\{x\in {\mathcal {X}}:\, V(x)\le 4K/\lambda \}. \end{aligned}$$
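As an aside, the two conditions above can be checked explicitly for simple families. The sketch below does this for a hypothetical family \(Q_t(x,\cdot )={\mathcal {N}}(a_tx,1)\) with \(\sup _t|a_t|\le a<1\) and \(V(x)=1+x^2\); the family, the Lyapunov function and all constants are illustrative assumptions, not objects from the paper.

```python
import math

# Drift and minorization constants for the hypothetical family
# Q_t(x, .) = N(a_t * x, 1), sup_t |a_t| <= a < 1, with V(x) = 1 + x^2.

a = 0.9

# Drift: [Q_t V](x) = a_t^2 * x^2 + 2 <= (1 - lam) * V(x) + K  with
lam = 1.0 - a**2
K = 2.0

# Small set C = {V <= 4K / lam} = {|x| <= c}:
c = math.sqrt(4.0 * K / lam - 1.0)

# On C the means a_t * x lie in [-b, b]; the normal densities then share the
# common part min_{|m| <= b} phi(r - m) = phi(|r| + b), which yields the
# minorization constant alpha (and, after normalization, the measure nu):
b = a * c
alpha = 2.0 * (0.5 * (1.0 + math.erf(-b / math.sqrt(2.0))))

print(f"lambda = {lam:.3f}, K = {K:.0f}, C = [-{c:.2f}, {c:.2f}], alpha = {alpha:.2e}")
```

Note how small \(\alpha \) comes out even in this simple example (about \(10^{-8}\) for \(a=0.9\)); the bound on \(\Gamma _p\) in Theorem 7.1 below degrades accordingly, as it involves \(1/\alpha \).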

A simplified form of the argument of Lemma 6.1 provides independent random mappings \(T_{t}:{\mathcal {X}}\times \Omega \rightarrow {\mathcal {X}}\) such that \(P(T_t(x,\cdot )\in A)=Q_{t}(x,A)\) for all \(t\ge 1\), \(x\in {\mathcal {X}}\), \(A\in {\mathfrak {B}}\). Note that the \(T_{t}\) are only independent this time, not identically distributed. Define \(Z_0:=X_0\), \(Z_{t+1}:=T_{t+1}(Z_{t})\), \(t\in {\mathbb {N}}\), where, as before, we dropped the dependence of \(T_{t+1}\) on \(\omega \in \Omega \) from the notation.

Fix \(m\in {\mathbb {N}}\) and define \({\tilde{Z}}_m:={\tilde{x}}\), \({\tilde{Z}}_{t+1}:=T_{t+1}({\tilde{Z}}_t)\), \(t\ge m\), for some fixed \({\tilde{x}}\in {\mathcal {X}}\). Repeating the arguments of Sect. 6, we get the following result.

Theorem 7.1

For suitable constants \(c_{1},c_{2}>0\),

$$\begin{aligned} P(Z_{m+\tau }\ne {\tilde{Z}}_{m+\tau })\le c_{1}\exp \left( -c_{2}\frac{\alpha \lambda }{|\ln (\lambda )|}\tau \right) ,\ \tau \ge 1. \end{aligned}$$

Furthermore, for \(0< \delta \le 1/2\) and for any \(\phi \in \Phi (V^{\delta })\), \(\phi (X_{t})\), \(t\in {\mathbb {N}}\) is L-mixing of order p for each \(1\le p<1/\delta \) and the following estimates hold:

$$\begin{aligned} M_{p}(\phi (X))=O((K/\lambda )^{\delta }) \text{ and } \Gamma _{p}(\phi (X))=O\left( \left( \frac{K}{\lambda }\right) ^{\delta } \frac{|\ln (\lambda )|}{\alpha \lambda }\right) . \end{aligned}$$

\(\square \)

Although this result is a very particular case of our framework, it is new and of considerable interest: it establishes a useful mixing property for functionals of a wide class of (even inhomogeneous) Markov processes.

8 Proofs of Ergodicity II

Proof of Theorem 2.18

This closely follows the proof of Theorem 2.17, and we only point out the differences. Denote by S an upper bound for \(|\phi |\). Take an arbitrary \(p\ge 2\). We may use Hölder's inequality with exponents 1 and \(\infty \) in the estimates (50). This leads to

$$\begin{aligned} \Gamma _p(\phi (Z))\le c'''\frac{|\ln (\lambda (H))|}{\alpha (H)\lambda (H)}=c'''\pi (H), \end{aligned}$$

using the argument of (55). Then, the proof of convergence in probability can be completed as above. Note that instead of

$$\begin{aligned} M_{1/\delta }(W({\textbf{Y}}))P^{\frac{1-p\delta }{p}}((\tilde{Y_1},\ldots ,{\tilde{Y}}_N)\ne (Y_1,\ldots ,Y_N)) \end{aligned}$$

we may write

$$\begin{aligned} SP((\tilde{Y_1},\ldots ,{\tilde{Y}}_N)\ne (Y_1,\ldots ,Y_N))\le S\ell (N) \end{aligned}$$

in (56) and that

$$\begin{aligned} E^{1/p}\left[ \left| \frac{W_{1}({\textbf{y}})+\ldots +W_{N}({\textbf{y}})}{N}\right| ^{p}\right] \le {} \frac{C_{p}S^{1/2}c''' \pi (N)}{N^{1/2}} \end{aligned}$$

in this case. As \(\phi \) is bounded, \(L^{p}\) convergence for all \(p\ge 1\) also follows. \(\square \)