Abstract
We study the ergodic behaviour of a discrete-time process X which is a Markov chain in a stationary random environment. The laws of \(X_t\) are shown to converge to a limiting law in (weighted) total variation distance as \(t\rightarrow \infty \). Convergence speed is estimated, and an ergodic theorem is established for functionals of X. Our hypotheses on X combine the standard “drift” and “small set” conditions for geometrically ergodic Markov chains with conditions on the growth rate of a certain “maximal process” of the random environment. We are able to cover a wide range of models that have heretofore been intractable. In particular, our results are pertinent to difference equations modulated by a stationary (Gaussian) process. Such equations arise in applications such as discretized stochastic volatility models of mathematical finance.
1 Introduction
Markov chains in random environments (recursive chains in the terminology of [3]) were systematically studied on countable state spaces in [4, 5, 17]. However, papers on the ergodic properties of such processes on a general state space are scarce and require rather strong, Doeblin-type conditions, see [13, 14, 18]. An exception is [19], where the system dynamics is assumed to be contracting instead. This, too, is a rather restrictive assumption, and only weak convergence of the laws can be established.
In this paper, we deal with Markov chains in random environments that satisfy refinements of the usual hypotheses for the geometric ergodicity of Markov chains: minorization on “small sets”, see Chapter 5 of [16], and Foster–Lyapunov-type “drift” conditions, see Chapter 15 of [16].
Assuming that a suitably defined maximal process of the random environment satisfies a tail estimate, we manage to establish stochastic stability. We use certain ideas of [12] to obtain convergence to a limiting distribution in total variation norm with estimates on the convergence rate, see Sect. 2 for the statements of our results. We also present a method to prove ergodic theorems, exploiting ideas of [2, 11]. An important technical ingredient is the notion of L-mixing, see Sect. 5.
As examples, we present difference equations modulated by Gaussian processes in Sect. 3. These can be regarded as discretizations of diffusions in random environments which arise, for instance, in stochastic volatility models of mathematical finance, see [6, 9]. These examples demonstrate the power of our approach. Proofs appear in Sects. 4, 6 and 8. Certain ramifications are explored in Sect. 7.
2 Main Results
Let \({\mathcal {Y}}\) be a Polish space with its Borel sigma-field \({\mathfrak {A}}\), and let \(Y_t\), \(t\in {\mathbb {Z}}\) be a (strongly) stationary \({\mathcal {Y}}\)-valued process on some probability space \((\Omega ,{\mathcal {F}},P)\). Expectation of a real-valued random variable X with respect to P will be denoted by E[X] in the sequel. For \(1\le p<\infty \), we write \(L^p\) to denote the Banach space of (a.s. equivalence classes of) \({\mathbb {R}}\)-valued random variables X with \(E[|X|^p]<\infty \), equipped with the usual norm.
We fix another Polish space \({\mathcal {X}}\) with its Borel sigma-field \({\mathfrak {B}}\) and denote by \({\mathcal {P}}({\mathcal {X}})\) the set of probability measures on \({\mathfrak {B}}\). Let \(Q:{\mathcal {Y}}\times {\mathcal {X}}\times {\mathfrak {B}}\rightarrow [0,1]\) be a family of transition kernels parametrized by \(y\in {\mathcal {Y}}\), i.e. for all \(A\in {\mathfrak {B}}\), \(Q(\cdot ,\cdot ,A)\) is \({\mathfrak {A}}\otimes {\mathfrak {B}}\)-measurable and for all \(y\in {\mathcal {Y}}\), \(x\in {\mathcal {X}}\), \(A\rightarrow Q(y,x,A)\) is a probability on \({\mathfrak {B}}\). Let \({X}_t\), \(t\in {\mathbb {N}}\) be an \({\mathcal {X}}\)-valued stochastic process such that
where the filtration is defined by
The process Y will represent the random environment whose state \(Y_t\) at time t determines the transition law \(Q(Y_t,\cdot ,\cdot )\) of the process X at the given instant t. Thus, X is a Markov chain in a random environment. Our purpose is to study the ergodic properties of X.
Remark 2.1
Obviously, the law of \({X}_t\), \(t\in {\mathbb {N}}\) (and also its joint law with \(Y_t\), \(t\in {\mathbb {Z}}\)) is uniquely determined by (1). For every given Q and \(X_{0}\), there exists a process X satisfying (1) (after possibly enlarging the probability space). See, for example, page 228 of [1] for a similar construction. We will establish a more precise result in Lemma 6.1, under additional assumptions.
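To make the setting concrete, the recursion \(X_{t+1}\sim Q(Y_t,X_t,\cdot )\) can be sketched in a minimal simulation. The AR(1) environment and the specific kernel Q below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(T=1000):
    # Illustrative choices (assumptions, not from the paper): Y is a
    # stationary Gaussian AR(1) process, and Q(y, x, .) is the law of a
    # mean-reverting linear step whose contraction strength depends on
    # the current environment state y.
    x, y = 0.0, 0.0
    xs = []
    for _ in range(T):
        # the environment evolves autonomously (stationary AR(1))
        y = 0.9 * y + np.sqrt(1 - 0.9**2) * rng.standard_normal()
        # X_{t+1} ~ Q(Y_t, X_t, .): contraction 1 - lambda(|y|) weakens
        # as the environment becomes more "extreme" (|y| large)
        lam = 1.0 / (1.0 + abs(y))
        x = (1.0 - lam) * x + rng.standard_normal()
        xs.append(x)
    return np.array(xs)

path = simulate()
```

The sampled path is one realization of the pair dynamics; the theorems below concern the convergence of \(\textrm{Law}(X_t)\) for such systems.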
We will now introduce a number of assumptions of various kinds that will figure in the statements of the main results: Theorems 2.12, 2.16, 2.17 and 2.18.
The following assumption closely resembles the well-known drift conditions for geometrically ergodic Markov chains, see, for example, Chapter 15 of [16]. In our case, however, they are relaxed by also having dependence on the state of the random environment.
Assumption 2.2
(Drift condition) Let \(V:{\mathcal {X}}\rightarrow {\mathbb {R}}_{+}\) be a measurable function. Let \(A_n\in {\mathfrak {A}}\), \(n\in {\mathbb {N}}\) be a non-decreasing sequence of subsets such that \(A_0\ne \emptyset \) and \({\mathcal {Y}}=\cup _{n\in {\mathbb {N}}}A_n\). Define the \({\mathbb {N}}\)-valued function
We assume that there are a non-increasing function \(\lambda :{\mathbb {N}}\rightarrow (0,1]\) and a non-decreasing function \(K:{\mathbb {N}}\rightarrow (0,\infty )\) such that, for all \(x\in {\mathcal {X}}\) and \(y\in {\mathcal {Y}}\),
Furthermore, we may and will assume \(K(\cdot )\ge 1\).
We provide some intuition about Assumption 2.2: we expect the stochastic process X to behave in an increasingly arbitrary way as the random environment becomes more and more “extreme” (i.e. \(\Vert Y_t\Vert \) grows), so the drift condition (2) becomes less and less stringent (i.e. \(\lambda (\Vert Y_t\Vert )\) decreases and \(K(\Vert Y_t\Vert )\) increases).
Example 2.3
A typical case is where \({\mathcal {Y}}\) is a subset of a Banach space \({\mathbb {B}}\) with norm \(\Vert \cdot \Vert _{{\mathbb {B}}}\); \({\mathfrak {A}}\) its Borel field; \(A_n:=\{y\in {\mathcal {Y}}:\, \Vert y\Vert _{{\mathbb {B}}}\le n\}\), \(n\in {\mathbb {N}}\). In this setting
where \(\lceil \cdot \rceil \) stands for the ceiling function. In the examples of the present paper, we will always have \({\mathbb {B}}={\mathbb {R}}^d\) with some \(d\ge 1\) and \(|\cdot |=\Vert \cdot \Vert _{{\mathbb {B}}}\) will denote the respective Euclidean norm. Note, however, that in general \(\Vert \cdot \Vert \) is not necessarily related to any geometric structure.
Remark 2.4
It would be desirable to relax Assumption 2.2 by allowing \(\lambda \) to vary in \((-\infty ,1)\) as long as “in the average” it is contractive. (There are multiple options for the precise formulation of this property.) Such a result has been worked out in [15].
The next assumption stipulates the existence of a whole family of suitable “small sets” C(R(n)) that are well adapted to the sets \(A_n\) appearing in Assumption 2.2.
Assumption 2.5
(Minorization condition) For \(R\ge 0\), set \(C(R):=\{x\in {\mathcal {X}}:\ V(x)\le R\}\). Let \(\lambda (\cdot )\), \(K(\cdot )\) be as in Assumption 2.2. Define \(R(n):=4K(n)/\lambda (n)\). There is a non-increasing function \(\alpha :{\mathbb {N}}\rightarrow (0,1]\) and for each \(n\in {\mathbb {N}}\), there exists a probability measure \(\nu _n\) on \({\mathfrak {B}}\) such that, for all \(y\in A_n\), \(x\in C(R(n))\) and \(A\in {\mathfrak {B}}\),
In other words, if the state y of the random environment is in \(A_n\), we work on the set \(C(4K(n)/\lambda (n))\) on which we are able to benefit from a “coupling effect” of strength \(\alpha (n)\).
For a fixed V as in Assumption 2.2, let us define a family of metrics on
by setting
for each \(0\le \beta \le 1\). Here, \(\vert \nu _1-\nu _2\vert \) is the total variation of the signed measure \(\nu _1-\nu _2\). Note that \(\rho _0\) is just the total variation distance (and it can be defined for all \(\nu _1,\nu _2\in {\mathcal {P}}({\mathcal {X}})\)), while \(\rho _1\) is the \((1+V)\)-weighted total variation distance.
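On a finite state space, \(\rho _{\beta }\) can be computed directly. The sketch below assumes the convention \(\rho _{\beta }(\nu _1,\nu _2)=\int _{{\mathcal {X}}}(1+\beta V)\,d\vert \nu _1-\nu _2\vert \), which is consistent with the unweighted (\(\beta =0\)) and \((1+V)\)-weighted (\(\beta =1\)) cases mentioned above:

```python
import numpy as np

def rho_beta(p1, p2, V, beta):
    """(1 + beta*V)-weighted total variation between two probability
    vectors p1, p2 on a finite state space with Lyapunov values V.
    A finite-state sketch of the metric rho_beta defined above."""
    p1, p2, V = map(np.asarray, (p1, p2, V))
    return float(np.sum((1.0 + beta * V) * np.abs(p1 - p2)))

p1 = [0.5, 0.3, 0.2]
p2 = [0.4, 0.4, 0.2]
V  = [0.0, 1.0, 4.0]
tv = rho_beta(p1, p2, V, beta=0.0)   # plain total variation: 0.2
w  = rho_beta(p1, p2, V, beta=1.0)   # (1+V)-weighted version: 0.3
```

States carrying large V contribute more to \(\rho _1\), which is why convergence in \(\rho _1\) also controls expectations of unbounded functionals in \(\Phi (V)\).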
Definition 2.6
For a measurable \(f:{\mathcal {X}}\rightarrow {\mathbb {R}}_{+}\), we define \(\Phi (f)\) to be the set of measurable \(\phi :{\mathcal {X}}\rightarrow {\mathbb {R}}\) such that \(|\phi (z)|\le C(1+f(z))\), \(z\in {\mathcal {X}}\) holds for some constant \(C=C(\phi )\). Hence, \(\Phi (1)\) denotes the set of bounded, measurable functions on \({\mathcal {X}}\).
Let \(L:{\mathcal {X}}\times {\mathfrak {B}}\rightarrow [0,1]\) be a transition kernel. For each \(\mu \in {\mathcal {P}}({\mathcal {X}})\), we define the probability
Consistently with these definitions, \(Q(Y_n)\mu \) will refer to the action of the kernel \(Q(Y_n,\cdot ,\cdot )\) on \(\mu \). Note, however, that \(Q(Y_n)\mu \) is a random probability measure. For a bounded measurable function \(\phi :{\mathcal {X}}\rightarrow {\mathbb {R}}\), we set
The latter definition makes sense for any nonnegative measurable \(\phi \), too.
The following assumption is an integrability condition about the initial values \(X_0\) and \(X_1\) of the process X.
Assumption 2.7
(Moment condition on the initial values)
We now present a hypothesis controlling the maxima of \(\Vert Y\Vert \) over finite time intervals (i.e. the “degree of extremity” of the random environment).
Assumption 2.8
(Condition on the maximal process of the random environment) There exist a non-decreasing function \(g:{\mathbb {N}}\rightarrow {\mathbb {N}}\) and a non-increasing function \(\ell :{\mathbb {N}}\rightarrow [0,1]\) such that
Remark 2.9
It is clear that for a given process Y, several choices for the pair of functions \(g,\ell \) are possible. Each of these leads to different estimates; which choice is better depends on Y and X, and no general rule can be given a priori.
Remark 2.10
In the setting of Example 2.3, let Y be a Gaussian process in \({\mathcal {Y}}={\mathbb {R}}^d\). Assumption 2.8 holds, for instance, with \(g(t)\sim \sqrt{t}\) and \(\ell (t)\) exponentially decreasing, see Sect. 3 for more details.
Remark 2.11
One can derive estimates like (6) also for rather general processes Y. For instance, let \(Y_t\), \(t\in {\mathbb {Z}}\) be \({\mathbb {R}}^d\)-valued strongly stationary such that \(E|Y_0|^p<\infty \) for all \(p\ge 1\). Then for each \(q\ge 1\) set \(p=2q\) and estimate
with constant \(C(q)=E^{1/2q}[|Y_0|^{2q}]\). The Markov inequality implies that
Actually, for arbitrarily small \(\chi >0\) and arbitrarily large \(r\ge 1\), we can set \(q=\frac{r}{\chi }+\frac{1}{2}\) in (7) and then Assumption 2.8 holds with
i.e. for arbitrary polynomially growing \(g(\cdot )\) and polynomially decreasing \(\ell (\cdot )\). This shows that our main results below have a wide spectrum of applicability well beyond the case of Gaussian Y, see also Example 2.20.
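The moment-based estimate of Remark 2.11 can be sanity-checked numerically. The sketch below uses an i.i.d. Laplace sequence, an illustrative choice of stationary Y with all moments finite (here \(E|Y_0|^{4}=24\)), and compares the empirical exceedance probability of the maximal process with the union-bound/Markov estimate:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of the Remark 2.11 estimate for an i.i.d. (hence
# trivially stationary) Laplace sequence over the window {-t, ..., t}.
t, q = 50, 2
g_t = 10.0                                    # threshold g(t), an assumption
samples = rng.laplace(size=(20000, 2 * t + 1))
emp = np.mean(np.max(np.abs(samples), axis=1) >= g_t)

# union bound + Markov: P(max |Y_i| >= g(t)) <= (2t+1) E|Y_0|^{2q} / g(t)^{2q}
m2q = np.mean(np.abs(rng.laplace(size=10**6)) ** (2 * q))  # estimates E|Y_0|^4
bound = (2 * t + 1) * m2q / g_t ** (2 * q)
```

The empirical probability is well below the polynomial bound, as expected; for light-tailed Y the bound is far from sharp, which is why the Gaussian case admits the much stronger estimates of Remark 3.1.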
We now define a number of quantities that will appear in various convergence rate estimates below. For each \(t\in {\mathbb {N}}\), set
Now comes the first main result of the present paper: assuming our conditions on drift, minorization, initial values and control of the maxima, \(\textrm{Law}(X_{t})\) will tend to a limiting law as \(t\rightarrow \infty \), provided that \(r_1(0)\) and \(r_2(0)\) are finite.
Theorem 2.12
Define \(\mu _{t}:=\textrm{Law}(X_{t})\), \(t\in {\mathbb {N}}\). Let Assumptions 2.2, 2.5, 2.7 and 2.8 be in force. Assume
Then, there is a probability \(\mu _*\) on \({\mathcal {X}}\) such that \(\mu _t\rightarrow \mu _*\) in \((1+V)\)-weighted total variation as \(t\rightarrow \infty \). More precisely,
for some constant \(C>0\). The limit \(\mu _{*}\) does not depend on \(X_{0}\).
Remark 2.13
When \(\lambda ,K,\alpha \) are constant and do not depend on Y, we retrieve the familiar exponential convergence rate in Theorem 2.12. Indeed, in this case we may suppose that \(A_n=A_0={\mathcal {Y}}\) for all n and hence we may choose \(g(t)=1\) and \(\ell (t)=0\), for all t.
Theorem 2.16 is a variant of Theorem 2.12: by relaxing the assumptions, it provides convergence in a weaker sense.
Assumption 2.14
(Weaker moment condition on the initial values)
Remark 2.15
As one of our referees pointed out, a simple sufficient condition for (9) can be formulated in terms of \(X_{0},Y_{0}\) only. Let Assumption 2.2 be in force. Then, \(E[V(X_{1})]\le E[V(X_{0})]+E[K(\Vert Y_{0}\Vert )]\); hence, a sufficient condition for (9) is \(E[V(X_{0})+K(\Vert Y_{0}\Vert )]<\infty \).
Similarly, let
hold instead of (2). This implies (2) and also \(E[V^2(X_1)]\le E[V^2(X_0)]+ E[K^2(\Vert Y_0\Vert )]\); thus, a sufficient condition for (5) in terms of \(X_0,Y_0\) only is \(E[V^2(X_0)+ K^2(\Vert Y_0\Vert )]<\infty \) in this case.
Theorem 2.16
Recall that \(\mu _{t}=\textrm{Law}(X_{t})\), \(t\in {\mathbb {N}}\). Let Assumptions 2.2, 2.5, 2.8 and 2.14 be in force. Assume
Then, there is a probability \(\mu _*\) on \({\mathcal {X}}\) such that \(\mu _t\rightarrow \mu _*\) in total variation as \(t\rightarrow \infty \). More precisely,
for some constant \(C>0\). The limit \(\mu _{*}\) does not depend on \(X_{0}\).
Clearly, Assumption 2.7 implies Assumption 2.14 and (8) implies (10). Next, ergodic theorems corresponding to Theorems 2.12 and 2.16 are stated.
Theorem 2.17
Let Assumptions 2.2, 2.5, 2.7 and 2.8 be in force, but with \(R(n):=8K(n)/\lambda (n)\), \(n\in {\mathbb {N}}\) in Assumption 2.5. Let Y be an ergodic process. Let \(\phi \in \Phi (V^{\delta })\) for some \(0< \delta \le 1/2\). Assume
and
Then,
holds in \(L^{p}\) for each \(p<1/\delta \). (Here, \(\mu _*\) is the same as in Theorem 2.12.)
We can weaken our assumptions for bounded \(\phi \).
Theorem 2.18
Let Assumptions 2.2, 2.5, 2.8 and 2.14 be in force, but with \(R(n):=8K(n)/\lambda (n)\), \(n\in {\mathbb {N}}\) in Assumption 2.5. Let Y be an ergodic process. Assume
and
Then, for each \(\phi \in \Phi (1)\) the convergence (13) holds in \(L^{p}\) for all \(p\ge 1\).
Remark 2.19
In Theorems 2.17 and 2.18, we require a slight strengthening of Assumption 2.5 by imposing (3) with \(R(n)=8K(n)/\lambda (n)\) instead of \(R(n)=4K(n)/\lambda (n)\).
Condition (14) is closely related to the condition \(r_3(0)<\infty \), but neither implies the other. Indeed, fix \(g(t):=t\). Choose \(\lambda (t):=1/2\) and \(\alpha (t):=\sqrt{\ln (t)}/{t}\), \(t\ge 4\). Then, \(\pi (t)/t\rightarrow 0\) but \(r_3(0)=\infty \). Conversely, let \(\alpha (t):=1/2\) and \(\lambda (t):=\frac{8\ln (t)}{t}\). Then, \(r_3(0)<\infty \) but \(\pi (t)/t\) tends to a positive constant.
Example 2.20
Let Y be strongly stationary \({\mathbb {R}}^d\)-valued with \(E|Y_0|^p<\infty \) for all \(p\ge 1\). Let Assumptions 2.2 and 2.5 hold with \(K(\cdot )\) having at most polynomial growth (i.e. \(K(n)\le C n^{b}\) with some \(C,b>0\)) and \(\alpha (\cdot )\), \(\lambda (\cdot )\) having at most polynomial decay (i.e. \(\alpha (n)\ge c n^{-b}\) with some \(c>0\), similarly for \(\lambda \)). Let Assumption 2.7 hold. Then, Remark 2.11 shows (choosing \(\chi \) small and r large) that Theorems 2.12 and 2.17 apply.
Remark 2.21
One of our referees pointed out that more general versions of Assumptions 2.2 and 2.5 could be considered. Let \({\tilde{\lambda }},{\tilde{\alpha }},{\tilde{K}}:{\mathcal {Y}}\rightarrow (0,\infty )\) be measurable functions with \({\tilde{K}}\ge 1\) and \({\tilde{\lambda }},{\tilde{\alpha }}\le 1\). Instead of (2), one could assume
Instead of (3), one could assume that, for all \(y\in {\mathcal {Y}}\), \(x\in C({\tilde{R}}(y))\) and \(A\in {\mathfrak {B}}\),
with a transition kernel \(\nu :{\mathcal {Y}}\times {\mathfrak {B}}\rightarrow [0,1]\). Here, \({\tilde{R}}(y):=4{\tilde{K}}(y)/{\tilde{\lambda }}(y)\).
When it comes to (6), however, it is not totally clear how to formulate it in this more general setting. We only sketch one possibility here. Let \(\lambda _{t},\alpha _{t}>0\), \(t\in {\mathbb {N}}\) be non-increasing sequences, \(K_{t}>0\), \(t\in {\mathbb {N}}\) a non-decreasing sequence. Define
Note that (6) is equivalent to
hence (17) looks reasonable. Redefine
Let (15), (16) and (5) be in force. Let us assume \(r_{1}(0)+r_{2}(0)<\infty \). Then, the conclusion of Theorem 2.12 remains true, with essentially the same proof.
3 Difference Equations in Gaussian Environments
In this section, we present examples of processes X that satisfy a difference equation, modulated by the process Y. We do not aim at a high degree of generality but prefer to illustrate the power of the results in Sect. 2 in some easily tractable cases. We stress that, as far as we know, none of these results follow from the existing literature.
We fix \({\mathcal {Y}}={\mathbb {R}}^d\) for some d and \({\mathcal {X}}={\mathbb {R}}\). We also fix a \({\mathcal {Y}}\)-valued zero-mean Gaussian stationary process \(Y_t\), \(t\in {\mathbb {Z}}\). We set \(\Vert y\Vert =\lceil |y|\rceil \), \(y\in {\mathcal {Y}}\) as in Example 2.3. We will exclusively use \(V(x)=|x|\), \(x\in {\mathbb {R}}\) in the present section.
Remark 3.1
Let \(\xi _t\), \(t\in {\mathbb {Z}}\) be a zero-mean \({\mathbb {R}}\)-valued stationary Gaussian process with unit variance. Clearly,
for suitable \(C,{\bar{c}}_{1},{\bar{c}}_{2}\).
Applying these observations with \(b=1\) to every coordinate of Y, it follows that Assumption 2.8 holds for the process Y with the choice \(g(k)=\lceil c_1 \sqrt{k}\rceil \), \(\ell (k)=\exp (-c_2 k)\) for some \(c_1,c_2>0\) and thus \(r_4(n)\) decreases at a geometric rate as \(n\rightarrow \infty \).
More generally, choosing arbitrary \(b>0\), Assumption 2.8 holds for Y with the choice \(g(k)=\lceil c_1 k^{b} \rceil \), \(\ell (k)=\exp (-c_2 k^{2b})\).
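The Gaussian maximal-process bound of Remark 3.1 can be illustrated numerically: with \(g(k)=\lceil c_1\sqrt{k}\rceil \), exceedance is already extremely rare for moderate k, since the maximum of n Gaussians grows only like \(\sqrt{2\ln n}\). The i.i.d. standard Gaussian case below is an illustrative special case:

```python
import numpy as np

rng = np.random.default_rng(2)

# Empirical exceedance probabilities P(max_{|i|<=k} |Y_i| >= g(k)) for
# i.i.d. standard Gaussians, with g(k) = ceil(c1 * sqrt(k)), c1 = 3.
c1 = 3.0
exceed_probs = []
for k in (10, 50, 200):
    n = 2 * k + 1                          # window {-k, ..., k}
    g_k = np.ceil(c1 * np.sqrt(k))         # threshold g(k)
    samples = rng.standard_normal((5000, n))
    exceed_probs.append(np.mean(np.max(np.abs(samples), axis=1) >= g_k))
```

All three empirical probabilities are essentially zero, consistent with the exponentially decreasing \(\ell (k)=\exp (-c_2 k)\) asserted in Remark 3.1.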
We assume throughout this section that \(\varepsilon _t\), \(t\in {\mathbb {N}}\) is an \({\mathbb {R}}\)-valued i.i.d. sequence, independent of \(Y_t\), \(t\in {\mathbb {Z}}\); \(E|\varepsilon _0|^2<\infty \) and the law of \(\varepsilon _0\) has an everywhere positive density f with respect to the Lebesgue measure, which is even and non-increasing on \([0,\infty )\). All these hypotheses could clearly be weakened or modified; we merely aim to keep the presentation as simple as possible.
Example 3.2
First we investigate the effect of the “contraction coefficient” \(\lambda \) in (2). Let \(d:=1\). Let \(0<\underline{\sigma }\le {\overline{\sigma }}\) be constants and \(\sigma :{\mathbb {R}}\times {\mathbb {R}}\rightarrow [\underline{\sigma },{\overline{\sigma }}]\) a measurable function. Let furthermore \(\Delta :{\mathbb {R}}\rightarrow (0,1]\) be even and non-increasing on \([0,\infty )\), on which we will impose conditions along the way. We stipulate that the tail of f is not too thin: it is at least as thick as that of a Gaussian variable, that is,
for some \(s>0\). We assume that the dynamics of X is given by
We will find \(K(\cdot ),\lambda (\cdot ),\alpha (\cdot )\) such that Assumptions 2.2 and 2.5 hold and give an estimate for the rate \(r_3(n)\) appearing in (11). (Note that we already have estimates for the rate \(r_4(n)\) from Remark 3.1.)
The density of \(X_1\) conditional on \(X_0=x\), \(Y_0=y\) (w.r.t. the Lebesgue measure) is easily seen to be
Let \(\eta >0\) be arbitrary for the moment. We can estimate
Define the probability measures
It follows that
for all \(x\in [-\eta ,\eta ]\), \(y\in {\mathbb {R}}\). Notice that
where \(K:=\max \{{\overline{\sigma }}E|\varepsilon _0|,1\}\). Then, Assumption 2.2 holds with \(A_n=\{y\in {\mathbb {R}}:\, |y|\le n\}\), \(\lambda (n)=\Delta (n)\) and \(K(n)=K\), \(n\ge 1\). (Here and in the sequel we use the index set \({\mathbb {N}}\setminus \{0\}\) instead of \({\mathbb {N}}\) for convenience.)
Let us now specify \(\eta \) by setting \(\eta :={\tilde{R}}(y):=4K/\Delta (y)\), \(y\in {\mathcal {Y}}\) and \(R(n)={\tilde{R}}(n)\), \(n\in {\mathbb {N}}\). We note that \({\tilde{R}}(y)\) is defined for every \(y\in {\mathcal {Y}}\), while R(n) is defined for every \(n\in {\mathbb {N}}\), and this is why we keep different notations for these two functions here and also in the subsequent examples. We can conclude using the tail bound (18) that
for all \(A\in {\mathfrak {B}}\), \(y\in {\mathcal {Y}}\), \(|x|\le R(\lceil |y|\rceil )\) with some \(c_3>0\), so (3) in Assumption 2.5 holds with
and \(\nu _n={\tilde{\nu }}_{R(n)}\). Now, let the function \(\Delta \) be such that \(\Delta (y):= 1\) for \(0\le y<3\) and \(\Delta (y)\ge 1/(\ln (y))^{\delta }\) with some \(\delta >0\), for all \(y\ge 3\). We obtain from the previous estimates and from Remark 3.1 with \(g(k)=\lceil c_1 \sqrt{k}\rceil \) that
with some \(c_4>0\). If \(\delta <1/2\), then this leads to estimates on the terms of \(r_3(n)\) which guarantee \(r_3(0)<\infty \).
If instead of (18) we assume
then \(r_3(0)<\infty \) follows whenever \(\delta <1\). This nicely illustrates the interplay between the admissible fatness of the tail of f and the strength of the mean reversion \(\Delta (\cdot )\).
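A numerical sketch of Example 3.2 follows. Since the displayed dynamics is omitted above, the exact recursion used here is an assumption, chosen to be consistent with the drift estimate \(\lambda (n)=\Delta (n)\), \(K(n)=K\): a linear recursion with environment-dependent contraction \(1-\Delta (Y_t)\) and bounded noise scale.

```python
import numpy as np

rng = np.random.default_rng(3)

def Delta(y):
    # mean-reversion coefficient as in Example 3.2: Delta(y) = 1 for
    # |y| < 3 and logarithmically decaying beyond, with delta = 0.4 < 1/2
    y = abs(y)
    return 1.0 if y < 3 else 1.0 / np.log(y) ** 0.4

# Assumed illustrative dynamics (the paper's display is omitted above):
# X_{t+1} = (1 - Delta(Y_t)) X_t + sigma * eps_{t+1},
# with Y a stationary Gaussian AR(1) environment.
T, sigma = 20000, 1.0
x, y = 0.0, 0.0
xs = np.empty(T)
for t in range(T):
    y = 0.95 * y + np.sqrt(1 - 0.95**2) * rng.standard_normal()
    x = (1.0 - Delta(y)) * x + sigma * rng.standard_normal()
    xs[t] = x
```

Even though the contraction becomes arbitrarily weak in extreme environments, the simulated path remains stochastically bounded, in line with the stability asserted by Theorem 2.12.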
Example 3.3
We recall that f is assumed even, positive and non-increasing on \([0,\infty )\). Again, let \(d:=1\), \(X_0:=0\) and
where \(\sigma :{\mathbb {R}}\times {\mathbb {R}}\rightarrow (0,\infty )\) is a measurable function and \(0< \Delta \le 1\) is a constant. We furthermore assume that
with some even function \(G:{\mathbb {R}}\rightarrow (0,\infty )\) that is non-decreasing on \([0,\infty )\) and with constants \(c_5,c_6>0\). We clearly have (2) with \(\lambda (n)=\Delta \), \(n\in {\mathbb {N}}\) (i.e. \(\lambda (\cdot )\) is constant) and \(A_n=\{y\in {\mathbb {R}}:\ |y|\le n\}\), \(K(n):={\tilde{K}}(n)\), \(n\in {\mathbb {N}}\) where \({\tilde{K}}(y)=\max \{1, c_6 G(y)E\vert \varepsilon _0\vert \}\), \(y\in {\mathbb {R}}\). Taking \({\tilde{R}}(y)= 4{\tilde{K}}(y)/\Delta \), \(y\in {\mathbb {R}}\), estimates as in Example 3.2 lead to
for all \(A\in {\mathfrak {B}}\) with some fixed constant \(c_7>0\), where \({\tilde{\nu }}_{{\tilde{R}}(y)}(\cdot )\) is the normalized Lebesgue measure restricted to \(C({\tilde{R}}(y))\), as in Example 3.2, so setting \(R(n)={\tilde{R}}(n)\), \(n\in {\mathbb {N}}\), we can choose \(\nu _n={\tilde{\nu }}_{R(n)}\) and \(\alpha (\cdot )\) a positive constant.
Assume, for example, \(G(y)\le C[1+|y|^q]\), \(y\ge 0\) with some \(C,q>0\); this guarantees \(E[V^{2}(X_{1})]=E[X_{1}^{2}]<\infty \), i.e. Assumption 2.7 holds. Choose \(g(k)=\lceil c_1\sqrt{k}\rceil \), \(\ell (k)=\exp (-c_2 k)\), as discussed in Remark 3.1. Then, Theorems 2.12 and 2.17 apply.
Example 3.4
We now investigate a discrete-time model for financial time series, inspired by the “fractional stochastic volatility model” of [6, 9].
Let \(w_t\), \(t\in {\mathbb {Z}}\) and \(\varepsilon _t\), \(t\in {\mathbb {N}}\) be two sequences of i.i.d. random variables such that the two sequences are also independent. Assume that \(w_t\) is Gaussian. We define the (causal) infinite moving average process
This series is almost surely convergent whenever \(\sum _{j=0}^{\infty } a_j^2<\infty \). We take \(d=2\) here, and the random environment will be the \({\mathcal {Y}}={\mathbb {R}}^2\)-valued process \(Y_t=(w_t,\xi _t)\), \(t\in {\mathbb {Z}}\).
We imagine that \(\xi _t\) describes the log-volatility of an asset in a financial market. It is reasonable to assume that \(\xi \) is a Gaussian linear process (see [9] where the related continuous-time models are discussed in detail).
Let us now consider the \({\mathbb {R}}\)-valued process X which will describe the increment of the log-price of the given asset. Assume that \(X_0:=0\),
with some \(-1<\rho <1\), \(0<\Delta \le 1\). The log-price is thus jointly driven by the noise sequences \(\varepsilon _t\), \(w_t\). The parameter \(\Delta \) is responsible for the autocorrelation of X. (\(\Delta \) is typically close to 1.) The parameter \(\rho \) controls the correlation of the price and its volatility. This is found to be nonzero (actually, negative) in empirical studies, see [7], and hence, it is important to include \(w_t\), \(t\in {\mathbb {Z}}\) both in the dynamics of X and in that of Y. We take \(A_n=\{y=(w,\xi )\in {\mathbb {R}}^2:\ |y|\le n\}\), \(n\in {\mathbb {N}}\).
Notice that
hence,
for all \(x\in {\mathbb {R}}\), with some \(c_{8}>0\), i.e. Assumption 2.2 holds with \(\lambda (n)=\lambda :=\Delta \) and \(K(n)=c_8 e^n(1+n)\).
We now turn our attention to Assumption 2.5. Denote the density of the law of \(X_1\) conditional on \(X_0=x\), \(Y_0=(w,\xi )\) with respect to the Lebesgue measure by \(h_{x,w,\xi }(z)\), \(z\in {\mathbb {R}}\). Let us fix \(\eta >0\) for the moment. For \(x,z\in [-\eta ,\eta ]\), we clearly have
We assume from now on that f, the density of \(\varepsilon _0\), satisfies
with some \(s>0\), \(\chi >3\); this is reasonable, as \(X_t\) has fat tails according to empirical studies, see [7]. At the same time, \(E[\varepsilon _{0}^{2}]<\infty \) and Assumption 2.7 are also satisfied for such a choice of f.
Define \({\tilde{K}}(y):=e^{\xi }(1+|w|)\) and \({\tilde{R}}(y):=4{\tilde{K}}(y)/\lambda \), for \(y=(w,\xi )\in {\mathbb {R}}^2\). Specify \(\eta :={\tilde{R}}(y)\) and use (20) to obtain, as in Example 3.2,
with fixed constants \(c_{9},c_{10}>0\), where \({\tilde{\nu }}_{{\tilde{R}}(y)}\) is the normalized Lebesgue measure restricted to \([-{\tilde{R}}(y), {\tilde{R}}(y)]\). Set \(R(n)={\tilde{R}}((n,n))\), \(n\ge 1\). Then, Assumption 2.5 holds with
and \(\nu _{n}={\tilde{\nu }}_{R(n)}\). Recalling the end of Remark 3.1, and choosing \(b>0\) small enough, we can conclude that Theorems 2.12 and 2.17 apply to this stochastic volatility model.
More generally, instead of (19), we may consider
with some dissipative measurable function \(k:{\mathbb {R}}\rightarrow {\mathbb {R}}\), i.e. we assume \(xk(x)\le -Ax^2+B\) for all \(x\in {\mathbb {R}}\) with some \(A,B>0\). Following the same steps, the applicability of Theorems 2.12 and 2.17 can be verified.
We stress that only a small fraction of relevant examples has been presented above, favouring simplicity. The results of Sect. 2 clearly apply in much greater generality.
4 Proofs of Stochastic Stability
Consider the \({\mathfrak {Y}}:={\mathcal {Y}}^{{\mathbb {Z}}}\)-valued random variable \({\textbf{Y}}:=(Y_{t})_{t\in {\mathbb {Z}}}\). By the measure decomposition theorem (see III.72 of [8]), there is a transition kernel \({\tilde{\mu }}_{0}:{\mathfrak {Y}}\times {\mathfrak {B}}\rightarrow [0,1]\) such that
For each \({\textbf{y}}\in {\mathfrak {Y}}\), we will denote by \({\tilde{\mu }}_{0}({\textbf{y}})\) the probability \(A\rightarrow {\tilde{\mu }}_{0}({\textbf{y}},A)\), \(A\in {\mathfrak {B}}\) in the sequel.
Clearly, Assumption 2.7 is equivalent to
and Assumption 2.14 is equivalent to
We first recall a result which will be crucial in the arguments below.
Lemma 4.1
Let \(L:{\mathcal {X}}\times {\mathfrak {B}}\rightarrow [0,1]\) be a transition kernel such that
for some \(0\le \gamma <1\), \(K>0\). Let \(C:=\{x\in {\mathcal {X}}:\, V(x)\le R \}\) for some \(R>2K/(1-\gamma )\). Let us assume that there is a probability \(\nu \) on \({\mathfrak {B}}\) such that
for some \(\alpha >0\). Then for each \(\alpha _0\in (0,\alpha )\) and for \(\gamma _0:=\gamma + 2K/R\),
holds for \(\beta =\alpha _0/K\).
Proof
See Theorem 3.1 in [12]. \(\square \)
Next comes an easy corollary.
Lemma 4.2
Let \(L:{\mathcal {X}}\times {\mathfrak {B}}\rightarrow [0,1]\) be a transition kernel such that
for some \(0<\lambda \le 1\), \(K>0\). Let \(C:=\{x\in {\mathcal {X}}:\, V(x)\le R \}\) with \(R:=4K/\lambda \). Assume that there is a probability \(\nu \) on \({\mathfrak {B}}\) such that
for some \(\alpha >0\). Then,
holds for \(\beta =\frac{\alpha }{2K}\).
Proof
Choose \(\gamma :=1-\lambda \), and let \(\alpha _0:=\alpha /2\). Note that \(1-(\alpha -\alpha _0)= 1-\alpha /2\) and \(R\beta =2\alpha /\lambda \) holds for \(\beta =\frac{\alpha }{2K}\). Also, \(\gamma _{0}=1-\lambda /2\). Applying Lemma 4.1, we estimate
Here,
and we get the statement since \(\alpha /2\ge \frac{\min (\alpha ,\lambda )}{4}\). \(\square \)
We introduce some important notation now. If \(({\textbf{y}},A)\rightarrow L({\textbf{y}},A)\), \({\textbf{y}}\in {\mathfrak {Y}}\), \(A\in {\mathfrak {B}}\) is a (not necessarily transition) kernel and Z is a \({\mathfrak {Y}}\)-valued random variable, then we define a measure \({\mathcal {E}}[L(Z)](\cdot )\) on \({\mathfrak {B}}\) via
We will use the following trivial inequalities in the sequel:
Proof of Theorem 2.12
For later use, we define the \({\mathfrak {Y}}\)-valued random variables \(\hat{{\textbf{Y}}}_{n}:=(Y_{n+j})_{j\in {\mathbb {Z}}}\), for each \(n\in {\mathbb {Z}}\). Note that \({\textbf{Y}}=\hat{{\textbf{Y}}}_{0}\). Fix \({\textbf{y}}:=(y_j)_{j\in {\mathbb {Z}}}\in {\mathfrak {Y}}\) for the moment. Set \(\hat{{\textbf{y}}}_{n}:=(y_{n+j})_{j\in {\mathbb {Z}}}\), for each \(n\in {\mathbb {Z}}\). Again, \({\textbf{y}}=\hat{{\textbf{y}}}_{0}\). Define
Here, Q(y) is the operator acting on probabilities which is described in (4) but, instead of L(x, A), with the kernel Q(y, x, A). Fix \(n\ge 1\) and denote \(\bar{y}_n:=\max _{-n+1\le j\le 0}\Vert y_j\Vert \). Since
Assumptions 2.2 and 2.5 imply that (24) and (25) hold for \(L=Q(y_j)\), \(j=-n+1,\ldots ,0\) with \(K=K(\bar{y}_n)\), \(\lambda =\lambda (\bar{y}_n)\) and \(\alpha =\alpha (\bar{y}_n)\). An n-fold application of Lemma 4.2 implies that, for \(\beta =\alpha (\bar{y}_n)/2K(\bar{y}_n)\),
By (29),
We thus arrive at
using the notation \(M_n:=\max _{-n+1\le i\le 0}\Vert Y_i\Vert \). We now estimate the expectation on the right-hand side of (33) separately on the events \(\{M_n\ge g(n)\}\) and \(\{M_n< g(n)\}\). Note first that, for each \(m\ge n\),
since \(M_{k}\ge M_{m}\). Hence, applying \(1-x\le e^{-x}\), \(x\ge 0\) and (34),
where we have used the closed-form expression for the sum of geometric series and Cauchy–Schwarz in the second inequality; Assumption 2.8 and the fact that the law of \(\rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{1}), Q(Y_{0}){\tilde{\mu }}_0({\textbf{Y}}))\) equals that of \(\rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m+1}), Q(Y_{-m}){\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{-m}))\), for each m, in the third inequality. Recall that
by (22). A fortiori, \(E[\rho _1({\tilde{\mu }}_0(\hat{{\textbf{Y}}}_{1}), Q(Y_{0}){\tilde{\mu }}_0({\textbf{Y}}))]<\infty \), too. Now it follows from (35) and \(r_1(0)+r_2(0)<\infty \) that
Consequently, for a.e. \(\omega \in \Omega \), the sequence \(\mu _n({\textbf{Y}}(\omega ))\), \(n\in {\mathbb {N}}\) is Cauchy and hence convergent for the metric \(\rho _{1}\). Its limit is denoted by \(\mu _{\sharp }(\omega )\).
For later use, we remark that \(\omega \rightarrow \int _{{\mathcal {X}}}\phi (z)\mu _{\sharp }(\omega )(dz)\) is \(\sigma ({\textbf{Y}})\)-measurable for every \(\phi \in \Phi (V)\). Hence, there is a measurable \(\Psi _{\phi }:{\mathfrak {Y}}\rightarrow {\mathbb {R}}\) such that
In the sequel, we will need the definition (28) for the kernel \(({\textbf{y}},A)\rightarrow \mu _n({\textbf{y}})(A)\), \({\textbf{y}}\in {\mathfrak {Y}}\), \(A\in {\mathfrak {B}}\) and for similar kernels. Notice that, for any measurable function \(w:{\mathcal {X}}\rightarrow {\mathbb {R}}_+\),
This is trivial for indicators and then follows for all measurable w in a standard way. By similar arguments, we also have
Notice that \(\mu _{n}=\textrm{Law}(X_{n})={\mathcal {E}}[\mu _n(\hat{{\textbf{Y}}}_{-n})]={\mathcal {E}}[\mu _n({\textbf{Y}})]\). We infer from (38) and (39) that
Then, it follows from (36) that
so \(\mu _n\), \(n\ge 0\) is a Cauchy sequence for the complete metric \(\rho _1\). Hence, it converges to some probability \(\mu _*\) as \(n\rightarrow \infty \). The claimed convergence rate also follows by the above estimates.
To show uniqueness, let \(X_{0}'\) be another initial condition satisfying Assumption 2.7, with the corresponding \({\tilde{\mu }}_{0}'({\textbf{y}})\), see (21). Defining, just like in (30),
the estimates (34) and (35) show that
which tends to 0 when \(n\rightarrow \infty \) since, as before, \(E\left[ \rho _1^{2}({\tilde{\mu }}_0({\textbf{Y}}), {\tilde{\mu }}_0'({\textbf{Y}}))\right] <\infty \), by Assumption 2.7. \(\square \)
Remark 4.3
The proof of Theorem 2.12 also implies convergence for the “quenched” process: there is a set \({\mathfrak {Y}}'\subset {\mathfrak {Y}}\) with \(\textrm{Law}({\textbf{Y}})({\mathfrak {Y}}')=1\) such that, for all \({\textbf{y}}\in {\mathfrak {Y}}'\), the sequence \(\mu _{n}({\textbf{y}})\) converges in \(\rho _{1}\) to a limiting probability \(\mu _{\natural }({\textbf{y}})\) as \(n\rightarrow \infty \).
Remark 4.4
Define the probability \({\bar{\mu }}(A):=E[\mu _{\sharp }(A)]\), \(A\in {\mathfrak {B}}\). It is clear that, for every \(\phi \in \Phi (1)\),
hence \({\bar{\mu }}=\mu _{*}\) and \(\mu _{*}(A)=E[\mu _{\natural }({\textbf{Y}})(A)]\), \(A\in {\mathfrak {B}}\), see the above remark.
Proof of Theorem 2.16
Estimates of Theorem 2.12 imply
By (29), this leads to
for some \(C>0\), using Assumption 2.8 and (23). The result now follows as in the proof of Theorem 2.12.
Remark 4.5
The convergence rates obtained by our method depend heavily on the choice of the functions g and \(\ell \), for which there are multiple options. Hence, no optimality can be claimed at this level of generality. The approach, however, works in many concrete cases where available methods do not.
5 L-Mixing Processes
Let \({\mathcal {G}}_t\), \(t\in {\mathbb {N}}\) be an increasing sequence of sigma-algebras (i.e. a discrete-time filtration), and let \({\mathcal {G}}^+_t\), \(t\in {\mathbb {N}}\) be a decreasing sequence of sigma-algebras such that, for each \(t\in {\mathbb {N}}\), \({\mathcal {G}}_t\) is independent of \({\mathcal {G}}^+_t\).
Let \(W_t\), \(t\in {\mathbb {N}}\) be a real-valued stochastic process. For each \(r\ge 1\), introduce
For each process W such that \(M_1(W)<\infty \) we also define, for each \(r\ge 1\), the quantities
For some \(r\ge 1\), the process W is called L-mixing of order r with respect to \(({\mathcal {G}}_t,{\mathcal {G}}^+_t)\), \(t\in {\mathbb {N}}\) if it is adapted to \(({\mathcal {G}}_t)_{t\in {\mathbb {N}}}\) and \(M_r(W)<\infty \), \(\Gamma _r(W)<\infty \). We say that W is L-mixing if it is L-mixing of order r for all \(r\ge 1\). This notion of mixing was introduced in [10].
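For the reader's convenience, we recall the quantities involved as we remember them from [10] (a sketch of the standard formulation; the paper's exact normalization may differ slightly):

```latex
M_r(W) := \sup_{t\in\mathbb{N}} E^{1/r}\left[|W_t|^r\right],
\qquad
\gamma_r(\tau,W) := \sup_{t\ge\tau} E^{1/r}\left[\left|W_t - E\left[W_t\,\middle|\,\mathcal{G}^+_{t-\tau}\right]\right|^r\right],\ \tau\ge 1,
\qquad
\Gamma_r(W) := \sum_{\tau=1}^{\infty}\gamma_r(\tau,W).
```

Since \({\mathcal {G}}_{t-\tau }\) is independent of \({\mathcal {G}}^+_{t-\tau }\), a small \(\gamma _r(\tau ,W)\) expresses that \(W_t\) is well approximated by a functional of the information after time \(t-\tau \), i.e. that the influence of the remote past decays.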
Remark 5.1
It is easy to check that if \(W_t\), \(t\in {\mathbb {N}}\) is L-mixing of order r, then also the process \({\tilde{W}}_t:=W_t-EW_t\), \(t\in {\mathbb {N}}\) is L-mixing of order r; moreover, \(\Gamma _r({\tilde{W}})=\Gamma _r(W)\) and \(M_r({\tilde{W}})\le 2M_r(W)\).
The next lemma is useful when checking the L-mixing property for a given process.
Lemma 5.2
Let \({\mathcal {G}}\subset {\mathcal {F}}\) be a sigma-algebra, X, Y random variables with \(E[|X|^r]+E[|Y|^r]<\infty \) with some \(r\ge 1\). If Y is \({\mathcal {G}}\)-measurable, then
Proof
See Lemma 2.1 of [10]. \(\square \)
Lemma 5.3
For an L-mixing process W of order \(r\ge 2\) satisfying \(E[W_t]=0\), \(t\in {\mathbb {N}}\),
holds for each \(N\ge 1\) with a constant \(C_r\) that does not depend either on N or on W.
Proof
This follows from Theorem 1.1 of [10]. \(\square \)
L-mixing is, in many cases, easier to show than other, better-known mixing concepts such as \(\alpha \)-, \(\beta \)- or \(\phi \)-mixing. There seems to be no implication between L-mixing and these latter conditions. For further information and related results, see [10].
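As a quick numerical illustration of Lemma 5.3 (a toy sketch, not from the paper: the AR(1) model, all constants and the Monte Carlo sizes are our own choices), an AR(1) process with Gaussian innovations is a standard example of an L-mixing process, and the \(L^2\)-norm of its centered partial sums indeed grows like \(\sqrt{N}\):

```python
import math
import random

def ar1_paths(a=0.5, n_paths=2000, n_steps=400, seed=0):
    """Simulate zero-mean AR(1) paths W_{t+1} = a W_t + xi_{t+1}, xi i.i.d. N(0,1).
    For |a| < 1 this is a standard example of an L-mixing process."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        w, path = 0.0, []
        for _ in range(n_steps):
            w = a * w + rng.gauss(0.0, 1.0)
            path.append(w)
        paths.append(path)
    return paths

def partial_sum_l2(paths, n):
    """Monte Carlo estimate of E^{1/2}[(W_1 + ... + W_n)^2]."""
    sums = [sum(p[:n]) for p in paths]
    return math.sqrt(sum(s * s for s in sums) / len(sums))

paths = ar1_paths()
# Lemma 5.3 predicts E^{1/2}[S_N^2] <= C sqrt(N): the normalized quantity
# below should stabilize (for a = 0.5 it approaches sd(xi)/(1 - a) = 2).
r100 = partial_sum_l2(paths, 100) / math.sqrt(100)
r400 = partial_sum_l2(paths, 400) / math.sqrt(400)
print(r100, r400)
```

The point is that, despite the serial dependence, the partial sums behave as for i.i.d. summands up to a constant, which is exactly what the moment bound of Lemma 5.3 provides.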
6 Proofs of Ergodicity I
Throughout this section, let the assumptions of Theorem 2.17 be in force: Y is an ergodic process; Assumptions 2.2 and 2.7 hold; Assumption 2.5 holds with \(R(n):=8K(n)/\lambda (n)\), \(n\in {\mathbb {N}}\); and we have \(r_1(0)+r_2(0)<\infty \) and
In Sect. 4, we profited from contraction estimates for the metric \(\rho _{\beta }\). These required, essentially, that given \(X_{t}\), \(X_{t}'\) a convenient coupling for \(X_{t+1}\), \(X_{t+1}'\) is realized, see, for example, (40). The exact nature of that coupling is hidden in Lemma 4.1. In the current section, we construct couplings for the whole process \(X_{t}\) which allow us to show suitable mixing properties.
We now present a construction that is crucial for proving Theorem 2.17. The random mappings \(T_t\) in the lemma below serve to provide the coupling effects that are needed for establishing the L-mixing property (see Sect. 5) for an auxiliary process (Z below) which will, in turn, lead to Theorem 2.17. Such a representation with random mappings was used in [2, 11]. In our setting, however, there is also dependence on \(y\in {\mathcal {Y}}\).
For \(R\ge 0\), denote by \({\mathfrak {C}}(R)\) the set of \({\mathcal {X}}\rightarrow {\mathcal {X}}\) mappings that are constant on \(C(R)=\{x\in {\mathcal {X}}:\, V(x)\le R\}\).
Lemma 6.1
There exists a sequence of measurable functions \(T_t:{\mathcal {Y}}\times {\mathcal {X}}\times {\Omega } \rightarrow {\mathcal {X}}\), \(t\ge 1\) such that
for all \(t\ge 1\), \(y\in {\mathcal {Y}}\), \(x\in {\mathcal {X}}\), \(A\in {\mathfrak {B}}\). There exist independent sigma-algebras \({\mathcal {L}}_t\), \(t\ge 1\) such that the random variables \(T_t(y,x,\cdot ),\, x\in {\mathcal {X}},\, y\in {\mathcal {Y}}\) are \({\mathcal {L}}_{t}\)-measurable. There are events \(J_t(y)\in {\mathcal {L}}_{t}\), for all \(t\ge 1\), \(y\in {\mathcal {Y}}\) such that
Proof
Let \(U_t\), \(t\ge 1\) be an independent sequence of uniform random variables on [0, 1]. Let \(\varepsilon _t\), \(t\ge 1\) be another such sequence, independent of \((U_t)_{t\ge 1}\). By enlarging the probability space, if necessary, we can always construct such random variables and we may even assume that \((U_t,\varepsilon _t)\), \(t\ge 1\) are independent of \((X_0,(Y_t)_{t\in {\mathbb {Z}}})\). Let \({\mathcal {L}}_{t}:=\sigma (U_{t},\varepsilon _{t})\).
We assume that \({\mathcal {X}}\) is uncountable, the case of countable \({\mathcal {X}}\) being analogous, but simpler. As \({\mathcal {X}}\) is Borel isomorphic to \({\mathbb {R}}\), see page 159 of [8], we may and will assume that, actually, \({\mathcal {X}}={\mathbb {R}}\). (We omit the details.)
The main idea in the arguments below is to separate the “independent component” \(\alpha (n)\nu _n(\cdot )\) from the rest of the kernel \(Q(y,x,\cdot )-\alpha (n)\nu _n(\cdot )\) for \(y\in A_n\) and \(x\in C(R(n))\). This independent component will ensure the existence of the constant mappings in (42).
Recall the sets \(A_n\), \(n\in {\mathbb {N}}\) from Assumption 2.2. Let \(B_n:=A_n\setminus A_{n-1}\), \(n\in {\mathbb {N}}\), with the convention \(A_{-1}:=\emptyset \). For each \(n\in {\mathbb {N}}\), \(y\in B_n\), let \(j_n(y,r):=\nu _{n}((-\infty ,r])\), \(r\in {\mathbb {R}}\) (the cumulative distribution function of \(\nu _n\)) and define its (\({\mathfrak {A}}\otimes {\mathcal {B}}({\mathbb {R}})\)-measurable) pseudoinverse by \(j^-_n(y,z):=\inf \{r\in {\mathbb {Q}}:\, j_n(y,r)\ge z\}\), \(z\in (0,1)\). Here, \({\mathcal {B}}({\mathbb {R}})\) refers to the Borel field of \({\mathbb {R}}\). Similarly, for \(y\in B_n\) and \(x\in C(R(n))\), let
the cumulative distribution function of the normalization of \(Q(y,x,\cdot )-\alpha (n)\nu _n(\cdot )\). For \(x\notin C(R(n))\), set simply
For each \(x\in {\mathcal {X}}\), define
Define, for \(n\in {\mathbb {N}}\), \(y\in B_n\),
Notice that \(T_t(y,\cdot ,\omega )\in {\mathfrak {C}}({R(n)})\) whenever \(U_t(\omega )\le \alpha (n)\); this implies (42) with \(J_t(y):=\{\omega \in \Omega :\, U_t(\omega )\le \alpha (\Vert y\Vert )\}\). It is easy to check (41), too. \(\square \)
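The construction in the proof can be sketched in code. In the following toy example (entirely our own: a one-dimensional kernel \(Q(x,\cdot )=\alpha \nu +(1-\alpha )\,\textrm{Uniform}(x,x+1)\) with \(\nu =\textrm{Uniform}(0,1)\), and no dependence on y), the mapping returns the pseudoinverse of the CDF of \(\nu \) evaluated at \(\varepsilon _t\) whenever \(U_t\le \alpha \), so on that event it is constant in x, exactly as required for (42):

```python
import random

ALPHA = 0.3  # minorization constant of the toy kernel (our choice)

def T(x, u, eps):
    """One random mapping of the Lemma 6.1 type for the toy kernel
    Q(x, .) = ALPHA * Uniform(0,1) + (1 - ALPHA) * Uniform(x, x+1);
    u plays the role of U_t, eps that of epsilon_t."""
    if u <= ALPHA:
        # "independent component": draw from nu = Uniform(0,1) via the
        # pseudoinverse of its CDF (the identity), ignoring x entirely
        return eps
    # residual component: (Q(x,.) - ALPHA*nu)/(1 - ALPHA) = Uniform(x, x+1)
    return x + eps

# On the event {u <= ALPHA} the mapping is constant in x: a coupling opportunity.
assert T(0.0, 0.2, 0.77) == T(9.0, 0.2, 0.77)

# Sanity check of the law: E[T(x, U, eps)] = ALPHA*0.5 + (1-ALPHA)*(x + 0.5).
rng = random.Random(1)
x0 = 2.0
mc_mean = sum(T(x0, rng.random(), rng.random()) for _ in range(20000)) / 20000
print(mc_mean)  # should be close to 0.3*0.5 + 0.7*2.5 = 1.9
```

Two copies of the chain driven by the same \((U_t,\varepsilon _t)\) merge at the first t with \(U_t\le \alpha \) and stay together afterwards, which is the mechanism exploited throughout this section.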
Remark 6.2
Note that, in the above construction, \((U_n,\varepsilon _n)_{n\in {\mathbb {N}}}\) was taken to be independent of \((X_0,(Y_t)_{t\in {\mathbb {Z}}})\). This will be important later, in the proof of Theorem 2.17.
We drop dependence of the mappings \(T_t\) on \(\omega \in \Omega \) in the notation from now on and will simply write \(T_t(y,x)\). We continue our preparations for the proof of Theorem 2.17. Let
and
Take an arbitrary element \({\tilde{x}}\in {\mathcal {X}}\); it will remain fixed throughout this section.
Our approach to the ergodic theorem for X does not rely on the Markovian structure; rather, it proceeds through establishing a convenient mixing property. The ensuing arguments will lead to Theorem 2.17 via the L-mixing property of certain auxiliary Markov chains. It turns out that L-mixing is particularly well adapted to Markov chains, even when they are time-inhomogeneous. (And for us this is the crucial point.) See also Sect. 7 about these issues.
The main ideas of the arguments below go back to [2] and [11]. In [11], Doeblin chains were treated. We need to extend those arguments substantially in the present, more complicated setting.
Let us fix \({\textbf{y}}=(y_j)_{j\in {\mathbb {Z}}}\in {\mathfrak {Y}}\) till further notice such that, for some \(H\in {\mathbb {N}}\), \(\Vert y_j\Vert \le H\) holds for all \(j\in {\mathbb {Z}}\). Define
Clearly, the process Z depends heavily on the choice of \({\textbf{y}}\); for a while, however, we suppress this dependence for notational simplicity. Fix also \(m\in {\mathbb {N}}\) till further notice. Define also
Notice that \({\tilde{Z}}_t\), \(t\ge m\) are \({\mathcal {G}}^+_m\)-measurable.
Our purpose will be to prove that, with large probability, \(Z_{m+\tau }={\tilde{Z}}_{m+\tau }\) for \(\tau \) large enough; in other words, that a coupling between the processes Z and \({\tilde{Z}}\) is realized. Fix \(\epsilon =\epsilon (H)>0\), to be specified later. Let \(\tau \ge 1\) be an arbitrary integer. Denote \(\vartheta :=\lceil \, \epsilon \tau \, \rceil \). Recall that \(R(H)=8K(H)/\lambda (H)\). Define \(D:=C(R(H)/2)=\{x\in {\mathcal {X}}:\, V(x)\le R(H)/2\}\) and \({\overline{D}}:=\{(x_1,x_2)\in {\mathcal {X}}^2:\, V(x_1)+V(x_2)\le R(H)\}\). Denote \({\overline{Z}}_t:=(Z_t,{\tilde{Z}}_t)\), \(t\ge m\).
Lemma 6.3
We have \(\sup _{k\in {\mathbb {N}}}E[V(Z_k)]\le E[V(X_0)]+K(H)/\lambda (H)<\infty \). Furthermore, \(\sup _{k\ge m}E[V({\tilde{Z}}_k)]\le V({\tilde{x}})+K(H)/\lambda (H)\).
Proof
Assumption 2.2 implies that, for \(k\ge 1\),
Assumption 2.7 implies that \(E[V(X_0)]=E[V(Z_0)]<\infty \) so, for every \(k\in {\mathbb {N}}\),
Similarly,
\(\square \)
The counterpart of the above lemma for X instead of Z is the following.
Lemma 6.4
Proof
Note that \(E[V(X_0)]<\infty \) by Assumption 2.7. So, for each \(n\ge 1\),
by Theorem 2.12. \(\square \)
We note for later use that if \(z\in {\mathcal {X}}\setminus D\), then for all \(y\in A_H\),
Recall Definitions (43) and (44). Define the \(({\mathcal {G}}_t)_{t\in {\mathbb {N}}}\)-stopping times
The results below serve to control the number of returns to \({\overline{D}}\) and the probability of coupling between the processes Z and \({\tilde{Z}}\). Our estimation strategy in the proof of Theorem 2.17 will be the following. We will control \(P({\tilde{Z}}_{\tau +m}\ne Z_{\tau +m})\) for large \(\tau \): either there were only a few returns of the process \({\overline{Z}}\) to \({\overline{D}}\) (which happens with small probability), or there were many returns but coupling did not occur (which also has small probability). First, let us present a lemma controlling the number of returns to \({\overline{D}}\).
Lemma 6.5
There is \(\bar{C}>0\) such that
and
where
In particular, \(\sigma _n<\infty \) a.s. for each \(n\in {\mathbb {N}}\). Furthermore, \(\bar{C}\) does not depend on \({\textbf{y}}\), m or H.
Proof
We can estimate, for \(k\ge 1\) and \(n\ge 1\),
Notice that \(\{{\overline{Z}}_{\sigma _n+k-1}\notin {\overline{D}}\}\subset \{ Z_{\sigma _{n}+k-1}\notin D\} \cup \{ {\tilde{Z}}_{\sigma _{n}+k-1}\notin D\}=:E_{1}\cup E_{2}\). Assumption 2.2 and the observation (47) imply that
This argument can clearly be iterated and leads to
by Assumption 2.2, since \({\overline{Z}}_{\sigma _n}\in {\overline{D}}\). In the case \(n=0\), \(k\ge 1\), we arrive at
instead, in a similar way, by Lemma 6.3.
Now, we turn from probabilities to expectations. Using \(e^{\varrho (H)}\le 2\), we can estimate, for \(n\ge 1\),
When \(n=0\), we obtain
for some \(\bar{C}\). We may and will assume \(\bar{C}\ge 8\). The statement follows. \(\square \)
The quantity \(\epsilon >0\) has been arbitrary thus far. Now, let us make the choice
Recall that \(\tau \ge 1\) has also been arbitrary. Recall that \(\vartheta =\lceil \epsilon \tau \rceil \).
Corollary 6.6
If \(\tau \ge 1/\epsilon (H)\), then \(P(\sigma _{\vartheta }>m+\tau )\le \exp (-\varrho (H)\tau /2)\).
Proof
Lemma 6.5 and the tower rule for conditional expectations easily imply
Hence, by the Markov inequality,
The statement now follows by direct calculations. Indeed, this choice of \(\epsilon (H)\) and \(\tau \ge 1/\epsilon (H)\) implies
which guarantees
\(\square \)
The next lemma controls the probability of coupling between Z and \({\tilde{Z}}\).
Lemma 6.7
For all \(\tau \ge 1\),
Proof
For typographical reasons, we will write \(\sigma (n)\) instead of \(\sigma _n\) in this proof. Notice that if \(\omega \in \Omega \) is such that \(\sigma (k)(\omega )<m+\tau \) and \(T_{\sigma (k)(\omega )+1}(y_{\sigma (k)(\omega )+1}, \cdot ,\omega )\in {\mathfrak {C}}(R(H))\), then \(Z_{\sigma (k)(\omega )+1}(\omega )={\tilde{Z}}_{\sigma (k)(\omega ) +1}(\omega )\) and hence also \(Z_{m+\tau }(\omega )={\tilde{Z}}_{m+\tau }(\omega )\). Recall the proof of Lemma 6.1 and estimate
As is easily seen,
Iterating the above argument, we arrive at the statement of this lemma using \(1-x\le e^{-x}\), \(x\ge 0\). \(\square \)
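In summary (a hedged reconstruction of the displayed bound, in the notation of this section): each of the first \(\vartheta \) visits to \({\overline{D}}\) offers an independent coupling opportunity of probability at least \(\alpha (H)\), so

```latex
P\left(Z_{m+\tau}\ne \tilde{Z}_{m+\tau},\ \sigma_{\vartheta}<m+\tau\right)
\le \bigl(1-\alpha(H)\bigr)^{\vartheta}
\le \exp\bigl(-\alpha(H)\,\vartheta\bigr),
```

the last step being the inequality \(1-x\le e^{-x}\) invoked above.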
Lemma 6.8
Let \(\phi \in \Phi (V^{\delta })\) for some \(0<\delta \le 1/2\). Then, the process \(\phi (Z_t)\), \(t\in {\mathbb {N}}\) is L-mixing of order p with respect to \(({\mathcal {G}}_t,{\mathcal {G}}^+_t)\), \(t\in {\mathbb {N}}\), for all \(1\le p<1/\delta \). Furthermore, \(\Gamma _{p}(\phi (Z))\), \(M_{p}(\phi (Z))\) (see Sect. 5 for the definitions of these quantities) have upper bounds that do not depend on \({\textbf{y}}\), only on H.
In the sequel, we will use, without further notice, the following elementary inequalities for \(x,y\ge 0\):
Proof of Lemma 6.8
Clearly,
with some constant \({\tilde{C}}>0\), by Lemma 6.3. Also,
for all \(1\le p<1/\delta \).
Now, we turn to establishing a bound for \(\Gamma _p(\phi (Z))\). Since \({\tilde{Z}}_m\) is deterministic, \({\tilde{Z}}_{m+\tau }\) is \({\mathcal {G}}_m^+\)-measurable for \(\tau \ge 0\). Lemma 5.2 implies that, for \(\tau \ge 1\),
using Hölder’s inequality with the exponents \(1/(p\delta )\) and \(1/(1-p\delta )\). By Lemma 6.3,
for some suitable \(\check{C}>0\). Here, we have used \(K\ge 1\). Since
we obtain from (51), Lemma 6.7 and Corollary 6.6 that for \(\tau \ge 1/\epsilon (H)\),
noting that the estimates of Lemma 6.7 and Corollary 6.6 do not depend on the choice of m. For each integer
we will apply the trivial estimate
recall (51). Keep in mind the definition \(\vartheta =\lceil \epsilon \tau \rceil \). We can then write, using the formula for the sum of a geometric series,
with some constant \(c'\). Using elementary properties of the functions \(x\rightarrow 1/(1-e^{-x})\) and \(x\rightarrow \ln (1+x)\) and the definitions (49) and (48), \(\Gamma _p(\phi (Z))\) can be estimated from above by
with some \(c'',c'''>0\). The L-mixing property of order p follows. (Note, however, that \(c'''\) depends on p, \(\delta \) as well as on \(V({\tilde{x}})\) and \(E[V(X_0)]\).) \(\square \)
Proof of Theorem 2.17
Recall the definitions (45) and (46). Now, we start signalling the dependence of Z on \({\textbf{y}}\) and hence write \(Z_t^{{\textbf{y}}}\), \(t\in {\mathbb {N}}\). Note that the law of \(Z_t^{{\textbf{Y}}}\), \(t\in {\mathbb {N}}\) equals that of \(X_t\), \(t\in {\mathbb {N}}\), by construction of Z and by Remark 6.2.
For \(t\in {\mathbb {N}}\) and \({\textbf{y}}\in {\mathfrak {Y}}\), define \(\psi _{t}({\textbf{y}}):=\int _{{\mathcal {X}}}\phi (x)\mu _{t}({\textbf{y}})(dx)\), recall the definition of \(\mu _{t}({\textbf{y}})\) from (30). Notice that \(\psi _{t}(\hat{{\textbf{y}}}_{t})=E[\phi (Z^{{\textbf{y}}}_{t})]\). Define \(W_t({\textbf{y}}):=\phi (Z^{{\textbf{y}}}_t)-\psi _{t}(\hat{{\textbf{y}}}_{t})\). Clearly, \(W_{t}({\textbf{y}})\) is a zero-mean process.
Fix \(p\ge 2\) such that \(\delta p<1\). Fix \(N\in {\mathbb {N}}\) for the moment. In the particular case where \({\textbf{y}}\) satisfies \(\Vert y_j\Vert \le H:=g(N)\), \(j\in {\mathbb {N}}\), the process \(W_t({\textbf{y}})\), \(t\in {\mathbb {N}}\) is L-mixing by Lemma 6.8 and Remark 5.1. Hence, Lemma 5.3 implies
by (51) and (55); recall also Remark 5.1. Fix \({\tilde{y}}\in A_0\) and define
Let \(\tilde{{\textbf{Y}}}=({\tilde{Y}}_{j})_{j\in {\mathbb {Z}}}\in {\mathfrak {Y}}\). Note that, by \(\phi \in \Phi (V^{\delta })\),
Estimate, using Hölder’s inequality with exponents \(1/(\delta p)\), \(1/(1-\delta p)\),
with some constants \(C',C''>0\), by Lemma 6.4. Here, we have also used the fact that if \(({\tilde{Y}}_1,\ldots ,{\tilde{Y}}_N)=(Y_1,\ldots ,Y_N)\), then also \(W_{j}({\textbf{Y}})=W_{j}(\tilde{{\textbf{Y}}})\), \(j=1,\ldots ,N\). The quantity (56) tends to 0 by our hypotheses (12) and \(r_{2}(0)<\infty \).
Recall the notation \(\hat{{\textbf{Y}}}_{n}:=(Y_{j+n})_{j\in {\mathbb {Z}}}\) and the functional \(\Psi _{\phi }\) from (37). Now, we can estimate
Birkhoff’s theorem and the ergodicity of the process Y imply that
almost surely, hence also in probability, noting Remark 4.4.
By stationarity of the process \(\hat{{\textbf{Y}}}_{k}\), \(k\in {\mathbb {Z}}\), we have that
The proof of Theorem 2.12 (see the discussion after (36)) and Remark 4.4 show that \(\psi _{j}({\textbf{Y}})\rightarrow \Psi _{\phi }({\textbf{Y}})\) almost surely as \(j\rightarrow \infty \). It follows that \(\psi _{j}(\hat{{\textbf{Y}}}_{j})-\Psi _{\phi }(\hat{{\textbf{Y}}}_{j})\) tends to 0 in probability, and so does the second term on the right-hand side of (57).
The third term on the right-hand side of (57) equals
We claim that it converges to 0 in probability. Notice that \(\ell (N)\rightarrow 0\), \(N\rightarrow \infty \) by the hypothesis \(r_{2}(0)<\infty \). It was a hypothesis of Theorem 2.17 that \(\pi (N)(K(N)/\lambda (N))^{2\delta }/N\) tends to 0; hence (56) implies that the term in question tends to 0 in \(L^{p}\) and, a fortiori, in probability.
To sum up,
in probability, recalling that the laws of \(Z^{{\textbf{Y}}}_{n}\), \(n\in {\mathbb {N}}\) and \(X_{n}\), \(n\in {\mathbb {N}}\) coincide.
To show convergence in \(L^{p}\), it suffices to check the uniform integrability of the family of random variables \(V^{\delta p}(X_{n})\), \(n\in {\mathbb {N}}\) since \(\phi \in \Phi (V^{\delta })\). This follows from \(p<1/\delta \) and from Lemma 6.4. The theorem has been shown for \(p\ge 2\) but this implies the result for \(1\le p<2\), too. \(\square \)
Remark 6.9
In the proof of Theorem 2.17, we could find estimates for the \(L^{p}\) convergence rate for the third term in (57) (see (56)). We would be able to do likewise for the second term, using the estimates in our arguments. However, there is, a priori, no rate estimate for
as this depends on the mixing properties of Y. Under suitably strong assumptions on the process Y, however, this term could also be estimated. In the ideal case, \(E^{1/p}[h_{N}^{p}]\) is of the order \(1/\sqrt{N}\).
7 Ramifications
Let \(X_t\), \(t\in {\mathbb {N}}\) be an \({\mathcal {X}}\)-valued time-inhomogeneous Markov chain. Denote by \(Q_{t}(x,A)\) the transition kernel at time \(t\ge 1\). We assume that there exist \(\lambda ,K>0\) such that, for all \(t\ge 1\),
for some measurable function \(V:{\mathcal {X}}\rightarrow {\mathbb {R}}_{+}\). Furthermore, for some \(\alpha >0\) and for all \(t\ge 1\),
for some probability \(\nu \) and for
A simplified form of the argument of Lemma 6.1 gives us independent (but, this time, not identically distributed) random mappings \(T_{t}:{\mathcal {X}}\times \Omega \rightarrow {\mathcal {X}}\) such that \(P(T_t(x,\cdot )\in A)=Q_{t}(x,A)\) for all \(t\ge 1\), \(x\in {\mathcal {X}}\), \(A\in {\mathfrak {B}}\). Define \(Z_0:=X_0\), \(Z_{t+1}:=T_{t+1}(Z_{t})\), \(t\in {\mathbb {N}}\), where we dropped, as before, the dependence of \(T_{t+1}\) on \(\omega \in \Omega \) in the notation.
Fix \(m\in {\mathbb {N}}\) and define \({\tilde{Z}}_m:={\tilde{x}}\), \({\tilde{Z}}_{t+1}:=T_{t+1}({\tilde{Z}}_t)\), \(t\ge m\), for some fixed \({\tilde{x}}\in {\mathcal {X}}\). Repeating the arguments of Sect. 6, we get the following result.
Theorem 7.1
For suitable constants \(c_{1},c_{2}>0\),
Furthermore, for \(0< \delta \le 1/2\) and for any \(\phi \in \Phi (V^{\delta })\), \(\phi (X_{t})\), \(t\in {\mathbb {N}}\) is L-mixing of order p for each \(1\le p<1/\delta \) and the following estimates hold:
\(\square \)
Although this result forms a very particular case of our framework, it is new and of considerable interest: it establishes a useful mixing property for functionals of a wide class of (even inhomogeneous) Markov processes.
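The coupling mechanism behind Theorem 7.1 can be sketched as follows (a toy illustration with kernels entirely of our own choosing, not the paper's; for simplicity, both copies of the chain are started at time 0): independent, non-identically distributed random maps sharing a common minorization component \(\alpha \nu \) make two copies of the chain coalesce after a geometric number of steps, and they stay together forever after.

```python
import random

ALPHA = 0.25  # common minorization constant (our choice)

def make_map(t, rng):
    """One random mapping T_t for a toy inhomogeneous kernel: with probability
    ALPHA draw from nu = Uniform(0,1) independently of the state; otherwise
    apply a time-dependent update (so the maps are not identically distributed)."""
    u, eps = rng.random(), rng.random()
    def T(x):
        if u <= ALPHA:
            return eps                     # independent component: ignores x
        return 0.5 * x + eps / (t + 1.0)   # time-dependent residual part
    return T

rng = random.Random(42)
z, z_tilde = 3.0, -2.0   # two copies: X_0 = 3 and the fixed point x~ = -2
meet_time = None
for t in range(1, 201):
    T = make_map(t, rng)
    z, z_tilde = T(z), T(z_tilde)          # both driven by the SAME map
    if meet_time is None and z == z_tilde:
        meet_time = t
print(meet_time, z == z_tilde)
```

Since the event \(\{U_t\le \alpha \}\) occurs with probability \(\alpha \) independently at each step, the coupling time has a geometric tail, which is the source of the exponential bounds in Theorem 7.1.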
8 Proofs of Ergodicity II
Proof of Theorem 2.18
This follows the proof of Theorem 2.17 very closely, so we only point out the differences. Denote by S an upper bound for \(|\phi |\). Take an arbitrary \(p\ge 2\). We may use the Hölder inequality with exponents 1 and \(\infty \) in the estimates (50). This leads to
using the argument of (55). Then, the proof of convergence in probability can be completed as above. Note that instead of
we may write
in (56) and that
in this case. As \(\phi \) is bounded, \(L^{p}\) convergence for all \(p\ge 1\) also follows. \(\square \)
References
Bhattacharya, R.N., Waymire, E.C.: Stochastic Processes with Applications. Wiley, New York (1990)
Bhattacharya, R., Waymire, E.C.: An approach to the existence of unique invariant probabilities for Markov processes. In: Limit Theorems in Probability and Statistics, János Bolyai Math. Soc., I (Balatonlelle 1999), pp. 181–200 (2002)
Borovkov, A.A.: Ergodicity and Stability of Stochastic Processes. Wiley, New York (1998)
Cogburn, R.: The ergodic theory of Markov chains in random environments. Z. Wahrsch. Verw. Gebiete 66, 109–128 (1984)
Cogburn, R.: On direct convergence and periodicity for transition probabilities of Markov chains in random environments. Ann. Probab. 18, 642–654 (1990)
Comte, F., Renault, É.: Long memory in continuous-time stochastic volatility models. Math. Finance 8, 291–323 (1998)
Cont, R.: Empirical properties of asset returns: stylized facts and statistical issues. Quant. Finance 1, 223–236 (2001)
Dellacherie, C., Meyer, P.-A.: Probability and Potential. North-Holland, Amsterdam (1979)
Gatheral, J., Jaisson, T., Rosenbaum, M.: Volatility is rough. Quant. Finance 18, 933–949 (2018)
Gerencsér, L.: On a class of mixing processes. Stochastics 26, 165–191 (1989)
Gerencsér, L., Molnár-Sáska, G., Michaletzky, Gy., Tusnády, G., Vágó, Zs.: New methods for the statistical analysis of Hidden Markov models. In: Proceedings of the 41st IEEE Conference on Decision and Control, 2002, Las Vegas, USA, pp. 2272–2277, IEEE Press, New York (2002)
Hairer, M., Mattingly, J.: Yet another look at Harris’ ergodic theorem for Markov chains. In: Seminar on Stochastic Analysis, Random Fields and Applications VI (eds. R. Dalang, M. Dozzi and F. Russo), Springer, Basel (2011)
Kifer, Y.: Perron–Frobenius theorem, large deviations, and random perturbations in random environments. Math. Z. 222, 677–698 (1996)
Kifer, Y.: Limit theorems for random transformations and processes in random environments. Trans. Am. Math. Soc. 350, 1481–1518 (1998)
Lovas, A., Rásonyi, M.: Markov chains in random environment with applications in queueing theory and machine learning. Stochastic Process. Appl. 137, 294–326 (2021)
Meyn, S.P., Tweedie, R.L.: Markov Chains and Stochastic Stability. Springer (1993)
Orey, S.: Markov chains with stochastically stationary transition probabilities. Ann. Probab. 19, 907–928 (1991)
Seppäläinen, T.: Large deviations for Markov chains with random transitions. Ann. Probab. 22, 713–748 (1994)
Stenflo, Ö.: Markov chains in random environments and random iterated function systems. Trans. Am. Math. Soc. 353, 3547–3562 (2001)
Acknowledgements
Both authors enjoyed the support of the NKFIH (National Research, Development and Innovation Office, Hungary) grant KH 126505. The first author was also supported by the NKFIH grant PD 121107 and the second author by the “Lendület” grant LP 2015-6 of the Hungarian Academy of Sciences and by the NKFIH grant K 143529. We thank Attila Lovas for pointing out two mistakes and for suggesting improvements. We also thank three anonymous referees for helpful comments.
Funding
Open access funding provided by ELKH Alfréd Rényi Institute of Mathematics.
Declarations
The authors have no relevant financial or non-financial interests to disclose. Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Gerencsér, B., Rásonyi, M. On the Ergodicity of Certain Markov Chains in Random Environments. J Theor Probab 36, 2093–2125 (2023). https://doi.org/10.1007/s10959-023-01256-7