1 Introduction

In the present paper we are concerned with the problem of proving the law of large numbers (LLN) for random dynamical systems.

The question of establishing the LLN for an additive functional of a Markov process is one of the most fundamental in probability theory, and there is a rich literature on the subject; see e.g. [21] and the references therein. However, most existing results assume that the process under consideration is stationary and that its equilibrium state is stable in some sense, usually in the \(L^2\) or total variation norm. In contrast, our stability condition is formulated in a metric weaker than the total variation distance.

The law of large numbers we study in this paper has also been considered in many works. Our results are based on a version of the law of large numbers due to Shirikyan (see [23, 24]). Recently, Komorowski et al. [15] obtained the weak law of large numbers for the passive tracer model in a compressible environment, and Walczuk proved the LLN in the non-stationary case for Markov processes whose transfer operator has a spectral gap in the Wasserstein metric [30].

Random dynamical systems [6, 8, 10] cover some very important and widely studied cases, namely dynamical systems generated by learning systems [1, 11, 13, 19], Poisson driven stochastic differential equations [7, 18, 25, 27], iterated function systems with an infinite family of transformations [17, 28, 29], random evolutions [4, 22] and irreducible Markov systems [31], used in the computer modelling of various stochastic processes.

A large class of applications of such models, both in physics and biology, is worth mentioning here: shot noise, photoconductive detectors, the growth of the size of structured populations, the motion of relativistic particles, both fermions and bosons (see [3, 14, 16]), and the generalized stochastic process introduced in the recent model of gene expression by Lipniacki et al. [20]; see also [2, 5, 9].

A number of results establish the existence of a unique, asymptotically stable invariant measure for Markov processes generated by random dynamical systems whose state space need not be locally compact. We consider random dynamical systems with randomly chosen jumps acting on a given Polish space \((Y,\varrho )\).

The aim of this paper is to study stochastic processes whose paths follow deterministic dynamics between random jump times, at which they change their position randomly. Hence, we analyse stochastic processes in which randomness appears at times \(\tau _0< \tau _1<\tau _2<\ldots \). We assume that a point \( x_0 \in Y \) moves according to one of the dynamical systems \( T_i : {\mathbb {R}}_+ \times Y \rightarrow Y \) from some set \(\{ T_1, \ldots , T_N \}\). The motion of the process is governed by the equation \( X(t) = T_i (t, x_0) \) until the first jump time \(\tau _1\). Then we choose a transformation \(q_{\theta } : Y \rightarrow Y\) from a family \(\{q_{\theta }\, : \, {\theta }\in \Theta = \{1, \ldots , K\}\} \) and define \(x_1 = q_{\theta }(T_i (\tau _1, x_0))\). The process restarts from this new point \(x_1\) and continues as before. This gives the stochastic process \(\{X(t)\}_{t \ge 0}\) with jump times \(\{\tau _1, \tau _2, \ldots \}\) and post-jump positions \(\{x_1, x_2, \ldots \}\). The frequency with which the dynamical systems \(T_i\) are chosen is described by a matrix of probabilities \({[p_{ij}]}_{i,j=1}^N \), \(p_{ij} : Y \rightarrow [0, 1]\). The maps \(q_{\theta }\) are randomly chosen with place-dependent distribution. Given a Lipschitz function \(\psi :Y \rightarrow \mathbb {R}\) we define

$$\begin{aligned} S_n(\psi )= \psi (x_0)+\dots +\psi (x_n). \end{aligned}$$

Our aim is to find conditions under which \(S_n(\psi )\) satisfies the law of large numbers. Our results are based on an exponential convergence theorem due to Ślȩczka and Kapica (see [12]) and a version of the law of large numbers due to Shirikyan (see [23, 24]).

2 Notation and Basic Definitions

Let \((X,d)\) be a Polish space, i.e. a complete and separable metric space, and denote by \(\mathcal {B}_X\) the \(\sigma \)-algebra of Borel subsets of X. By \(B_b(X)\) we denote the space of bounded Borel-measurable functions equipped with the supremum norm; \(C_b(X)\) stands for the subspace of bounded continuous functions. Let \(\mathcal {M}_{fin}(X)\) and \(\mathcal {M}_1(X)\) be the sets of Borel measures \(\mu \) on X such that \(\mu (X)<\infty \) for \(\mu \in \mathcal {M}_{fin}(X)\) and \(\mu (X)=1\) for \(\mu \in \mathcal {M}_1(X)\). The elements of \(\mathcal {M}_1(X)\) are called probability measures; the elements of \(\mathcal {M}_{fin}(X)\) with \(\mu (X)\le 1\) are called subprobability measures. By \(supp\, \mu \) we denote the support of the measure \(\mu \). We also define

$$\begin{aligned} {\mathcal M}_1^L (X)=\left\{ \mu \in {\mathcal M}_1(X):\,\int _X L(x)\mu (d x)<\infty \right\} \end{aligned}$$

where \(L:X\rightarrow [0,\infty )\) is an arbitrary Borel measurable function and

$$\begin{aligned} {\mathcal M}_1^1 (X)=\left\{ \mu \in {\mathcal M}_1(X):\,\int _X d(\bar{x} ,x)\mu (d x)<\infty \right\} , \end{aligned}$$

where \(\bar{x}\in X\) is fixed. By the triangle inequality this family is independent of the choice of \(\bar{x}\).

The space \(\mathcal {M}_1(X)\) is equipped with the Fortet-Mourier metric:

$$\begin{aligned} \Vert \mu _1-\mu _2\Vert _{FM}=\sup \left\{ |\int _X f(x)(\mu _1-\mu _2)(dx)|:\, f\in \mathcal {F}\right\} , \end{aligned}$$

where

$$\begin{aligned} \mathcal {F}=\{f\in C_b(X):\, |f(x)-f(y)|\le d(x,y) \quad \text {and}\quad |f(x)|\le 1\quad \text {for}\quad x,y\in X\}. \end{aligned}$$
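For intuition, the supremum defining \(\Vert \cdot \Vert _{FM}\) can be approximated by brute force. The sketch below is purely illustrative (an assumed two-point metric space \(\{0,1\}\) with \(d(0,1)=D\), not an object from the paper); it grid-searches over the values \((f(0),f(1))\) subject to \(|f|\le 1\) and the Lipschitz constraint:

```python
# Brute-force sketch of the Fortet-Mourier distance on an assumed
# two-point metric space {0, 1} with d(0, 1) = D: the supremum over the
# class F is approximated by a grid search over the values (f(0), f(1))
# subject to |f| <= 1 and the Lipschitz constraint |f(0) - f(1)| <= D.
def fm_distance_two_points(mu1, mu2, D, grid=201):
    """mu1, mu2: mass each measure puts on the point 0."""
    best = 0.0
    vals = [-1.0 + 2.0 * k / (grid - 1) for k in range(grid)]
    for f0 in vals:
        for f1 in vals:
            if abs(f0 - f1) <= D:          # f is 1-Lipschitz w.r.t. d
                integral = f0 * (mu1 - mu2) + f1 * ((1 - mu1) - (1 - mu2))
                best = max(best, abs(integral))
    return best
```

For \(D\le 2\) the supremum equals \(D\,|\mu _1(\{0\})-\mu _2(\{0\})|\), while for large D the bound \(|f|\le 1\) caps it at \(2\,|\mu _1(\{0\})-\mu _2(\{0\})|\).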

Let \(P:B_b(X)\rightarrow B_b(X)\) be a Markov operator, i.e. a linear operator satisfying \(P\mathbf{1}_X=\mathbf{1}_X\) and \(Pf(x)\ge 0\) if \(f\ge 0\). Denote by \(P^{*}\) the dual operator, i.e. the operator \(P^{*}:\mathcal {M}_{fin}(X)\rightarrow \mathcal {M}_{fin}(X)\) defined as follows:

$$\begin{aligned} P^{*}\mu (A):=\int _X P \mathbf{1}_A(x)\mu (dx)\qquad \text {for}\qquad A\in \mathcal {B}_X. \end{aligned}$$
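On a finite state space the duality between P and \(P^*\) reduces to matrix algebra. The following sketch (an assumed two-state kernel, purely for illustration) checks the identity \(\int _X Pf\,d\mu =\int _X f\,d(P^*\mu )\):

```python
# Illustrative two-state example (assumed, not from the paper): the rows
# of K are the transition probabilities P_x; P acts on functions and the
# dual P* acts on measures, and integrating Pf against mu must agree with
# integrating f against P*mu.
K = [[0.9, 0.1],
     [0.4, 0.6]]                       # K[x][y] = P_x({y})

def P(f):                              # (Pf)(x) = sum_y K[x][y] f(y)
    return [sum(K[x][y] * f[y] for y in range(2)) for x in range(2)]

def P_star(mu):                        # (P*mu)({y}) = sum_x mu({x}) K[x][y]
    return [sum(mu[x] * K[x][y] for x in range(2)) for y in range(2)]

f = [1.0, 3.0]
mu = [0.5, 0.5]
lhs = sum(P(f)[x] * mu[x] for x in range(2))       # integral of Pf d(mu)
rhs = sum(f[y] * P_star(mu)[y] for y in range(2))  # integral of f d(P*mu)
```

For this kernel the vector \((0.8, 0.2)\) is a fixed point of `P_star`, i.e. an invariant probability measure.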

We say that a measure \(\mu _*\in \mathcal {M}_1(X)\) is invariant for P if

$$\begin{aligned} \int _X Pf(x)\mu _*(dx)=\int _X f(x)\mu _*(dx)\qquad \text {for every}\qquad f\in B_b(X) \end{aligned}$$

or, alternatively, we have \(P^* \mu _*=\mu _*\). An invariant measure \(\mu _*\) is attractive if

$$\begin{aligned} \lim \limits _{n\rightarrow \infty }\int _X P^nf(x)\, \mu (dx)=\int _X f(x)\, \mu _*(dx)\quad \text {for}\quad f\in C_b(X),\, \mu \in \mathcal {M}_1(X). \end{aligned}$$

By \(\{\mathbf {P}_x:\, x\in X\}\) we denote a transition probability function for P, i.e. a family of measures \(\mathbf {P}_x\in \mathcal {M}_1(X)\) for \(x\in X\), such that the map \(x\mapsto \mathbf {P}_x(A)\) is measurable for every \(A\in \mathcal {B}_X\) and

$$\begin{aligned} Pf(x)=\int _X f(y) \mathbf {P}_x(dy)\qquad \text {for}\qquad x\in X\quad \text {and}\quad f\in B_b(X) \end{aligned}$$

or equivalently \(P^*\mu (A)=\int _X \mathbf {P}_x(A)\mu (dx)\) for \(A\in \mathcal {B}_X\) and \(\mu \in \mathcal {M}_{fin}(X)\). We say that a vector \((p_1, \ldots ,p_N)\), where \(p_i :Y \rightarrow [0, 1]\), is a probability vector if

$$\begin{aligned} \sum _{i=1}^N p_i(x) = 1 \quad \text {for}\quad x\in Y. \end{aligned}$$

Analogously, a matrix \([p_{ij}]_{i,j}\), where \( p_{ij} :Y \rightarrow [0, 1]\) for \(i, j \in \{1, \ldots , N\}\), is a probability matrix if

$$\begin{aligned} \sum _{j=1}^N p_{ij}(x) = 1 \quad \text {for}\quad x\in Y\quad \text {and} \quad i \in \{1, \ldots , N \}. \end{aligned}$$

Definition 1

A coupling for \(\{\mathbf {P}_x: x\in X\}\) is a family \(\{\mathbf {B}_{x,y}:\, x,y\in X\}\) of probability measures on \(X\times X\) such that for every \(B\in \mathcal {B}_{X^2}\) the map \(X^2\ni (x,y)\mapsto \mathbf {B}_{x,y}(B)\) is measurable and

$$\begin{aligned} \mathbf {B}_{x,y}(A\times X)=\mathbf {P}_x(A),\qquad \mathbf {B}_{x,y}(X\times A)=\mathbf {P}_y(A) \end{aligned}$$

for every \(x,y\in X\) and \(A\in \mathcal {B}_X\).

In the following we assume that there exists a subcoupling for \(\{\mathbf {P}_x: x\in X\}\), i.e. a family \(\{\mathbf {Q}_{x,y}:\,x,y\in X\}\) of subprobability measures on \(X^2\) such that the map \((x,y)\mapsto \mathbf {Q}_{x,y}(B)\) is measurable for every Borel \(B\subset X^2\) and

$$\begin{aligned} \mathbf {Q}_{x,y}(A\times X)\le \mathbf {P}_x(A)\qquad \text {and}\qquad \mathbf {Q}_{x,y}(X\times A)\le \mathbf {P}_y(A) \end{aligned}$$

for every \(x,y\in X\) and Borel \(A\subset X\).

Measures \(\{\mathbf {Q}_{x,y}:x,y\in X\}\) allow us to construct a coupling for \(\{\mathbf {P}_x:x\in X\}\). Define on \(X^2\) the family of measures \(\{\mathbf {R}_{x,y}:x,y\in X\}\) which on rectangles \(A\times B\) are given by

$$\begin{aligned} \mathbf {R}_{x,y}(A\times B)=\frac{1}{1-\mathbf {Q}_{x,y}(X^2)}(\mathbf {P}_x(A)-\mathbf {Q}_{x,y}(A\times X)) (\mathbf {P}_y(B)-\mathbf {Q}_{x,y}(X\times B)), \end{aligned}$$

when \(\mathbf {Q}_{x,y}(X^2)<1\) and \(\mathbf {R}_{x,y}(A\times B)=0\) otherwise. A simple computation shows that the family \(\{\mathbf {B}_{x,y}:\, x,y\in X\}\) of measures on \(X^2\) defined by

$$\begin{aligned} \mathbf {B}_{x,y}=\mathbf {Q}_{x,y}+\mathbf {R}_{x,y}\quad \text {for}\quad x,y\in X \end{aligned}$$
(1)

is a coupling for \(\{\mathbf {P}_x:\, x\in X\}\).
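On a finite state space the construction (1) can be checked directly. The sketch below is illustrative (an assumed two-point space with a diagonal subcoupling, not from the paper); it builds R by the displayed formula and verifies that \(\mathbf {B}=\mathbf {Q}+\mathbf {R}\) has marginals \(\mathbf {P}_x\) and \(\mathbf {P}_y\):

```python
# Finite-state sketch of the construction B = Q + R (illustrative,
# assumed two-point space): Q is a subcoupling given by the diagonal
# overlap min{P_x({a}), P_y({a})}; R redistributes the leftover mass as a
# normalized product of the unmatched marginals.
Px = [0.7, 0.3]                        # P_x on the two-point space {0, 1}
Py = [0.2, 0.8]                        # P_y on the same space

Q = [[min(Px[0], Py[0]), 0.0],
     [0.0, min(Px[1], Py[1])]]         # subcoupling supported on the diagonal

mass = sum(Q[a][b] for a in range(2) for b in range(2))   # Q(X^2)
R = [[0.0, 0.0], [0.0, 0.0]]
if mass < 1.0:
    for a in range(2):
        for b in range(2):
            left = Px[a] - sum(Q[a][c] for c in range(2))    # P_x(A) - Q(A x X)
            right = Py[b] - sum(Q[c][b] for c in range(2))   # P_y(B) - Q(X x B)
            R[a][b] = left * right / (1.0 - mass)

B = [[Q[a][b] + R[a][b] for b in range(2)] for a in range(2)]
marg_x = [sum(B[a][b] for b in range(2)) for a in range(2)]  # should equal Px
marg_y = [sum(B[a][b] for a in range(2)) for b in range(2)]  # should equal Py
```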

The following theorem due to M. Ślȩczka and R. Kapica (see [12]) will be used in the proof of Theorem 3 in Sect. 4.

Theorem 1

Assume that a Markov operator P and transition probabilities \(\{\mathbf {Q}_{x,y}:\,x,y\in X\}\) satisfy

  • A0 P is a Feller operator, i.e. \(P(C_b(X))\subset C_b (X)\).

  • A1 There exists a Lyapunov function for P, i.e. a continuous function \(L: X\rightarrow [0, \infty )\) such that L is bounded on bounded sets, \(\lim _{x\rightarrow \infty }L(x)=+\infty \) and for some \(\lambda \in (0,1), \, c>0\)

    $$\begin{aligned} PL (x)\le \lambda L(x) +c \qquad \text {for} \qquad x\in X. \end{aligned}$$
  • A2 There exist \(F\subset X^2\) and \(\alpha \in (0,1)\) such that \(supp \,{\mathbf Q}_{x,y}\subset F\) and

    $$\begin{aligned} \int _{ X^2} d (u,v){\mathbf Q}_{x,y}(d u,d v) \le \alpha d(x,y)\qquad \text {for} \qquad (x,y)\in F. \end{aligned}$$
    (2)
  • A3 There exist \(\delta >0,\, l>0\) and \(\nu \in (0,1]\) such that

    $$\begin{aligned} 1- {\mathbf Q}_{x,y}(X^2) \le l d(x,y)^{\nu } \end{aligned}$$

    and

    $$\begin{aligned} {\mathbf Q}_{x,y} (\{(u,v)\in X^2:\, d(u,v)<\alpha d(x,y)\} )\ge \delta \end{aligned}$$

    for \((x,y)\in F\).

  • A4 There exist \(\beta \in (0,1)\), \({\tilde{C}}>0\) and \(R>0\) such that for

    $$\begin{aligned} \kappa (\,(x_n,y_n)_{n\in {\mathbb N}_0}\,)=\inf \{n\in {\mathbb N}_0 :\, (x_n,y_n)\in F\quad \text {and}\quad L(x_n)+L(y_n)<R\} \end{aligned}$$

    we have

    $$\begin{aligned} {\mathbb E}_{x,y} \beta ^{-\kappa }\le {\tilde{C}}\qquad \text {whenever} \qquad L(x)+L(y)<\frac{4c}{1-\lambda }, \end{aligned}$$

    where \(\mathbb {E}_{x,y}\) denotes the expectation with respect to the chain starting from \((x,y)\) and with transition function \(\{\mathbf {B}_{x,y}:\, x,y\in X\}\). Then the operator P possesses a unique invariant measure \(\mu _{*}\in \mathcal {M}_1^L (X)\), which is attractive in \(\mathcal {M}_1(X)\). Moreover, there exist \(q\in (0,1)\) and \(C>0\) such that

    $$\begin{aligned} \Vert P^{* n}\mu -\mu _{*}\Vert _{FM}\le q^n C(1+\int _X L(x)\mu (dx) ) \end{aligned}$$
    (3)

    for \(\mu \in \mathcal {M}_1^L (X)\) and \(n\in \mathbb {N}\).

We will also need a version of the strong law of large numbers due to Shirikyan [23, 24]. It was originally formulated for Markov chains on a Hilbert space; however, an analysis of the proof shows that it remains true for Polish spaces.

Theorem 2

Let \((\Omega ,\mathcal {F},\mathbb {P})\) be a probability space and let X be a Polish space. Suppose that for a family of Markov chains \(((X_n^x)_{n\ge 0},\mathbb {P}_x)_{x\in X}\) on X with Markov operator \(P:B_b(X)\rightarrow B_b(X)\) there exist a unique invariant measure \(\mu _*\in \mathcal {M}_1(X)\), a continuous function \(v:X\rightarrow \mathbb {R}_+\) and a sequence \((\eta _n)_{n\in \mathbb {N}}\) of positive numbers such that \(\eta _n\rightarrow 0\) as \(n\rightarrow \infty \) and

$$\begin{aligned} ||P^{*n}\delta _x-\mu _*||_{FM}\le \eta _n v(x)\quad \text {for}\quad x\in X. \end{aligned}$$

If

$$\begin{aligned} C=\sum _{n=0}^{\infty }\eta _n <\infty \end{aligned}$$

and there exists a continuous function \(h:X\rightarrow \mathbb {R}_+\) such that

$$\begin{aligned} \mathbb {E}_x(v(X_n^x))\le h(x)\quad \text {for}\quad x\in X,n\ge 0, \end{aligned}$$

where \(\mathbb {E}_x\) is the expectation with respect to \(\mathbb {P}_x\), then for any \(x\in X\) and any bounded Lipschitz function \(f:X\rightarrow \mathbb {R}\) we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{k=0}^{n-1} f(X_k^x)=\int _X f(y)\, \mu _*(dy) \end{aligned}$$

\(\mathbb {P}_x\) almost surely.
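Theorem 2 can be illustrated numerically on a toy chain (an assumption for illustration, not a system from the paper): for \(X_{n+1}=X_n/2+B_n\) with i.i.d. Bernoulli(1/2) noise, \(P^{*n}\delta _x\) converges exponentially fast in the Fortet-Mourier metric, so the summability hypothesis holds, and Birkhoff averages should approach the stationary mean:

```python
# Illustrative simulation (assumed toy chain): X_{n+1} = X_n / 2 + B_n
# with i.i.d. Bernoulli(1/2) noise B_n.  The stationary mean m solves
# m = m/2 + 1/2, i.e. m = 1, so the time average of f(x) = x should be
# close to 1 for a long trajectory.
import random

random.seed(0)
x, total, n = 5.0, 0.0, 20000
for _ in range(n):
    total += x
    x = x / 2.0 + random.randint(0, 1)   # one step of the chain
time_average = total / n                 # (1/n) * sum of f(X_k) with f(x) = x
```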

3 Random Dynamical Systems

Let \((Y, \varrho )\) be a Polish space, \(\mathbb {R}_+=[0,+\infty )\) and \(I = \{1, \dots ,N\}\), \( \Theta = \{1, \ldots , K\}\), where N and K are given positive integers.

We are given a family of continuous functions \(q_{\theta } : Y \rightarrow Y \), \({\theta } \in \Theta \), and a finite sequence of semidynamical systems \(T_{i}:\mathbb {R}_+\times Y\rightarrow Y\), \(i \in I\), i.e.

$$\begin{aligned} T_i(s+t,x)=T_i(s,T_i(t,x)), \quad T_i(0, x) = x \quad \text {for }\quad s,t \in \mathbb {R}_+, \,\,i\in I \,\,\,\text {and}\,\,\, x\in Y, \end{aligned}$$

and the transformations \(T_{i}:\mathbb {R}_+\times Y\rightarrow Y\), \(i \in I\), are continuous.
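As a quick numerical illustration (an assumed toy flow, not one from the paper), the semiflow identities above can be checked for \(T(t,x)=xe^{-t}\) on \(Y=\mathbb {R}\):

```python
# Numerical check of the semiflow identities T(s+t, x) = T(s, T(t, x))
# and T(0, x) = x for the assumed toy system T(t, x) = x * exp(-t).
import math

def T(t, x):
    return x * math.exp(-t)

s, t, x = 0.3, 1.2, 4.0
lhs = T(s + t, x)     # T(s + t, x)
rhs = T(s, T(t, x))   # T(s, T(t, x))
ident = T(0.0, x)     # T(0, x) should return x
```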

Let \(p_i :Y \rightarrow [0,1]\), \(i \in I\), and \(\tilde{p}_{\theta } :Y \rightarrow [0,1]\), \({\theta } \in \Theta \), be probability vectors and let \([p_{ij}]_{i, j \in I}\), \( p_{ij}:Y\rightarrow [0, 1]\), be a probability matrix. In the sequel we denote the system by \((T, q, p)\).

Finally, let \((\Omega , \Sigma , \mathbb {P} )\) be a probability space and \(\{\tau _n\}_{n\ge 0}\) be an increasing sequence of random variables \(\tau _n :\Omega \rightarrow \mathbb {R}_+\) with \(\tau _0 =0\) and such that the increments \(\Delta \tau _n=\tau _n-\tau _{n-1}\), \(n \in \mathbb {N} \), are independent and have the same density \(g(t)=\lambda e^{-\lambda t}\), \( t \ge 0 \).

The intuitive description of the random dynamical system corresponding to the system \((T, q, p)\) is the following.

For an initial point \(x_0 \in Y \) we randomly select a transformation \(T_{i_0}\) from the set \(\{T_1 , \ldots , T_N \}\) in such a way that the probability of choosing \(T_{i_0}\) is equal to \(p_{i_0}(x_0)\), and we define

$$\begin{aligned} X(t) = T_{i_0}(t, x_0) \quad \text {for}\quad 0\le t < \tau _1. \end{aligned}$$

Next, at the random moment \(\tau _1\), at the point \(T_{i_0}(\tau _1, x_0)\) we choose a jump \(q_{\theta }\) from the set \(\{q_1, \ldots ,q_K\}\) with probability \(\tilde{p}_{\theta }(T_{i_0}(\tau _1, x_0 ))\) and we define

$$\begin{aligned} x_1 = q_{\theta } (T_{i_0} (\tau _1, x_0)). \end{aligned}$$

Finally, given \(x_n\), \(n\ge 1 \), we choose \( T_{i_n} \) in such a way that the probability of choosing \( T_{i_n} \) is equal to \(p_{i_{n-1}i_n}(x_n)\) and we define

$$\begin{aligned} X(t) = T_{i_n} (t - \tau _n, x_n )\quad \text {for}\quad \tau _n < t <\tau _{n+1}. \end{aligned}$$

At the point \( T_{i_n}(\Delta \tau _{n+1}, x_n ) \) we choose \(q_{{\theta }_n}\) with probability \(\tilde{p}_{{\theta }_n}(T_{i_n}(\Delta \tau _{n+1}, x_n))\). Then we define

$$\begin{aligned} x_{n+1} = q_{{\theta }_n}(T_{i_n} (\Delta \tau _{n+1}, x_n )). \end{aligned}$$

We obtain a piecewise-deterministic trajectory of \(\{X(t)\}_{t \ge 0}\) with jump times \(\{\tau _1, \tau _2, \ldots \}\) and post-jump positions \(\{x_1, x_2, \ldots \}\).

The above considerations may be reformulated as follows. Let \(\{\xi _n\}_{n \ge 0}\) and \(\{\gamma _n\}_{n \ge 1}\) be sequences of random variables, \(\xi _n :\Omega \rightarrow I\) and \( \gamma _n :\Omega \rightarrow \Theta \), such that

$$\begin{aligned}&\mathbb {P} (\xi _0 = i | x_0 = x ) = p_i (x),\nonumber \\&\mathbb {P} (\xi _n = k | x_n = x \quad \text {and} \quad \xi _{n-1} = i ) = p_{ik}(x),\nonumber \\&\mathbb {P} (\gamma _n = {\theta } | T_{\xi _{n-1}} (\Delta \tau _n , x_{n-1}) = y ) = \tilde{p}_{\theta } (y). \end{aligned}$$
(4)

Assume that \(\{\xi _n\}_{n \ge 0}\) and \(\{\gamma _n\}_{n \ge 1} \) are independent of \(\{\tau _n\}_{n \ge 0}\) and that for every \(n \in \mathbb {N}\) the variables \(\gamma _1, \ldots ,\gamma _{n-1}\), \( \xi _1, \ldots ,\xi _{n-1}\) are independent.

Given an initial random variable \(\xi _0\) the sequence of the random variables \(\{x_n\}_{n\ge 0}\), \(x_n : \Omega \rightarrow Y \), is given by

$$\begin{aligned} x_n =q_{\gamma _n} \big (T_{\xi _{n-1}}(\Delta \tau _n, x_{n-1})\big ) \quad \text {for}\quad n=1,2, \dots \end{aligned}$$
(5)

and the stochastic process \(\{X(t)\}_{t \ge 0}\), \(X(t) : \Omega \rightarrow Y\), is given by

$$\begin{aligned} X(t) = T_{\xi _{n-1}}(t - \tau _{n-1}, x_{n-1}) \quad \text {for} \quad \tau _{n-1} \le t < \tau _n,\quad n = 1,2, \ldots \end{aligned}$$
(6)
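The recursion (5) can be sketched in a short simulation. Everything concrete below (the flows, jump maps, uniform probabilities and parameter values) is an assumed toy example, not a system from the paper:

```python
# Hedged simulation sketch of the post-jump chain (5) for an assumed toy
# system: Y = R, two flows T_1(t,x) = x e^{-t} and
# T_2(t,x) = x e^{-t} + 1 - e^{-t} (relaxation toward 0 and toward 1),
# two jumps q_1(x) = x/2 and q_2(x) = x/2 + 1, uniform probabilities,
# and Exp(lambda) waiting-time increments.
import math
import random

def T(i, t, x):
    return x * math.exp(-t) + (0.0 if i == 0 else 1.0 - math.exp(-t))

def q(theta, x):
    return x / 2.0 if theta == 0 else x / 2.0 + 1.0

def simulate_chain(n, lam=1.0, x0=0.0, seed=1):
    rng = random.Random(seed)
    xs, x = [x0], x0
    for _ in range(n):
        dt = rng.expovariate(lam)     # increment Delta tau_n ~ Exp(lambda)
        i = rng.randrange(2)          # flow index xi_{n-1} (uniform p_ij)
        theta = rng.randrange(2)      # jump index gamma_n (uniform p~_theta)
        x = q(theta, T(i, dt, x))     # recursion (5)
        xs.append(x)
    return xs

chain = simulate_chain(500)           # post-jump positions x_0, ..., x_500
```

For this toy system every post-jump position stays in [0, 2], since both flows and both jump maps preserve that interval.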

It is easy to see that \(\{X(t)\}_{t \ge 0}\) and \(\{x_n\}_{n \ge 0}\) are not Markov processes. In order to use the theory of Markov operators we must redefine the processes \(\{X(t)\}_{t \ge 0}\) and \(\{x_n\}_{n \ge 0}\) in such a way that the redefined processes become Markov.

To this end, consider the space \(Y\times I \) endowed with the metric d given by

$$\begin{aligned} d \big ((x, i), (y, j)\big )=\varrho (x,y) + \varrho _d(i, j)\quad \text {for}\quad x, y\in Y, \,\,i, j\in I, \end{aligned}$$
(7)

where \(\varrho _d\) is the discrete metric in I. Now define the process \(\{\xi (t) \}_{t \ge 0}\), \(\xi (t): \Omega \rightarrow I \), by

$$\begin{aligned} \xi (t) = \xi _{n-1} \quad \text {for} \quad \tau _{n-1} \le t <\tau _{n},\quad n=1, 2, \ldots \end{aligned}$$

Then the stochastic process \( \{(X(t), \xi (t))\}_{t \ge 0}\), \((X(t), \xi (t)) : \Omega \rightarrow Y \times I \) has the required Markov property.

We will study the Markov process \(\{(x_n, \xi _n) \}_{n\ge 0}\) of post-jump positions, \((x_n, \xi _n) : \Omega \rightarrow Y \times I \).

Define the Markov operator \(P:B_b(Y\times I)\rightarrow B_b(Y\times I)\) by

$$\begin{aligned} Pf(x, i) = \sum _{j\in I} \sum _{{\theta } \in \Theta } \int _0^{+\infty } \lambda e^{-\lambda t} f\big (q_{\theta }\big ( T_j(t, x )\big ),j\big )p_{ij}(x)\tilde{p}_{\theta }\big (T_j (t, x)\big ) \,dt. \end{aligned}$$
(8)

Now consider the sequence of distributions

$$\begin{aligned} \overline{\mu }_n(A) = \mathbb {P} \big ((x_n, \xi _n) \in A \big ) \quad \text {for} \quad A \in \mathcal {B} (Y\times I), \, n \ge 0. \end{aligned}$$

It is easy to see that

$$\begin{aligned} \overline{\mu }_{n+1} = P^* \overline{\mu }_n \quad \text {for } \quad n \ge 0, \end{aligned}$$

where \(P^*:\mathcal {M}_1(Y\times I)\rightarrow \mathcal {M}_1(Y\times I)\) is the dual operator

$$\begin{aligned} P^*\mu (A) = \sum _{j\in I} \sum _{{\theta } \in \Theta } \int _{Y\times I} \int _0^{+\infty } \lambda e^{-\lambda t} 1_A\big (q_{\theta }\big ( T_j(t,x)\big ),j \big ) p_{ij}(x)\tilde{p}_{\theta }\big (T_j (t, x)\big ) \, dt\, \mu (dx, di). \end{aligned}$$
(9)

In order to obtain the existence of an exponentially attractive invariant measure and the strong law of large numbers, we make the following assumptions on the system \((T, q, p)\).

The transformations \(T_i : \mathbb {R}_+ \times Y \rightarrow Y\), \(i\in I\) and \(q_\theta :Y \rightarrow Y \), \(\theta \in \Theta \), are continuous and there exists \(x_* \in Y\) such that

$$\begin{aligned} \int _{\mathbb {R}_+}e^{-\lambda t} \varrho (q_{\theta } (T_j(t,x_*)) , q_{\theta }(x_*))\ dt < \infty \quad \text {for} \quad j \in I, \quad {\theta } \in \Theta . \end{aligned}$$
(10)

For the system \((T, q, p)\) there are three constants \(L\ge 1\), \(\alpha \in \mathbb {R} \) and \(L_q > 0\) such that

$$\begin{aligned} \sum _{j\in I} p_{ij}(y)\varrho (T_j(t,x) ,T_j(t,y)) \le Le^{ \alpha t}\varrho (x,y) \quad \text {for}\quad x,y \in Y, \,\, i \in I, \,\, t \ge 0 \end{aligned}$$
(11)

and

$$\begin{aligned} \sum _{{\theta } \in \Theta } \tilde{p}_{\theta }(x)\varrho (q_{\theta }(x),q_{\theta }(y)) \le L_q \varrho (x,y) \quad \text {for} \quad x,y \in Y. \end{aligned}$$
(12)

We also assume that the functions \(\tilde{p}_{\theta }\), \({\theta } \in \Theta \), and \(p_{ij}\), \(i,j \in I\), satisfy the following conditions

$$\begin{aligned}&\sum _{j\in I} |p_{ij}(x) - p_{ij}(y)| \le L_p\varrho (x,y) \quad \text {for}\quad x,y \in Y, \,\, i \in I,\nonumber \\&\sum _{{\theta }\in \Theta } |\tilde{p}_{{\theta }}(x) - \tilde{p}_{{\theta }}(y)| \le L_{\tilde{p}}\varrho (x,y) \quad \text {for}\quad x,y \in Y, \end{aligned}$$
(13)

where \(L_p, L_{\tilde{p}} > 0\).

For \(x, y \in Y,\, t\ge 0\) we define

$$\begin{aligned}&I_{T}(t, x, y) = \{ j \in I: \varrho (T_j(t, x) , T_j(t, y)) \le Le^{\alpha t}\varrho (x , y)\}\nonumber \\&I_{q}(x, y) = \{ {\theta } \in \Theta : \varrho (q_{\theta }(x) , q_{\theta }(y)) \le L_q\varrho (x , y)\} \end{aligned}$$
(14)

Assume that there are \(p_0 > 0\), \(q_0 > 0\) such that for every \(i_1, i_2 \in I\), \( x, y \in Y\) and \( t \ge 0\) we have

$$\begin{aligned}&\sum \limits _{j \in I_{T}(t, x, y)} p_{i_1j}(x)p_{i_2j}(y) > p_0,\nonumber \\&\sum \limits _{{\theta } \in I_{q}(x, y)}\tilde{p}_{{\theta }}(x)\tilde{p}_{{\theta }}(y) > q_0. \end{aligned}$$
(15)

Remark 1

Condition (15) is satisfied if there exist \(i_0 \in I\) and \({\theta }_0 \in \Theta \) such that

$$\begin{aligned}&\varrho (T_{i_0}(t,x) , T_{i_0}(t,y)) \le Le^{ \alpha t}\varrho (x,y) \quad \text {for}\quad x,y \in Y, \,\, t \ge 0,\nonumber \\&\varrho (q_{{\theta }_0}(x),q_{{\theta }_0}(y)) \le L_q \varrho (x,y) \quad \text {for} \quad x,y \in Y, \end{aligned}$$
(16)

and

$$\begin{aligned}&\inf _{i\in I} \inf _{x \in Y} p_{i i_0}(x) > 0,\nonumber \\&\inf _{x \in Y} \tilde{p}_{{\theta }_0}(x) > 0. \end{aligned}$$
(17)

4 The Main Theorem

Theorem 3

Assume that the system \((T, q, p)\) satisfies conditions (10)–(15). If

$$\begin{aligned} LL_q + \frac{\alpha }{\lambda } < 1, \end{aligned}$$
(18)

then

  (i)

    there exists a unique invariant measure \(\mu _*\in \mathcal {M}_1^1(Y\times I)\) for the process \((x_n,\xi _n)_{n\ge 0}\), which is attractive in \(\mathcal {M}_1(Y\times I)\).

  (ii)

    there exist \(q\in (0,1)\) and \(C>0\) such that for \(\mu \in \mathcal {M}_1^1(Y\times I)\) and \(n\in \mathbb {N}\)

    $$\begin{aligned} ||P^{*n}\mu -\mu _* ||_{FM}\le q^n C\left( 1+\int _Y \varrho (x,x_*) \,\mu (dx)\right) , \end{aligned}$$

    where \(x_*\) is given by (10),

  (iii)

    the strong law of large numbers holds for the process \((x_n,\xi _n)_{n\ge 0}\) starting from \((x_0,\xi _0 )\in Y\times I\), i.e. for every bounded Lipschitz function \(f:Y\times I\rightarrow \mathbb {R}\) and every \(x_0\in Y\) and \(\xi _0\in I\) we have

    $$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{k=0}^{n-1} f(x_k,\xi _k)=\int _{Y\times I} f(x,\xi )\, \mu _*(dx,d\xi ) \end{aligned}$$

    \(\mathbb {P}_{x_0,\xi _0}\) almost surely.

Proof of Theorem 3

We are going to verify the assumptions of Theorem 1. Set \(X=Y\times I\), \(F=X\times X\) and define

$$\begin{aligned}&{\mathbf Q}_{(x_1,i_1)(x_2,i_2)}(A)=\\&\sum _{j\in I}\sum _{{\theta }\in \Theta }\int _0^{+\infty }\lambda e^{-\lambda t}\Big \{ p_{i_1 j}(x_1) \tilde{p}_{\theta }\big ( T_j(t,x_1)\big )\wedge p_{i_2 j}(x_2)\tilde{p}_{\theta }\big (T_j(t,x_2)\big )\Big \} \\&\quad \times 1_A\big (\big ( q_{\theta }\big (T_j(t,x_1)\big ),j),(q_{\theta }\big (T_j(t,x_2)\big ),j\big )\big )\, dt \end{aligned}$$

for \(A\subset X\times X\), where \(a\wedge b\) stands for the minimum of a and b.

  • A0. The continuity of functions \(p_{ij}, \tilde{p}_{\theta }, q_{\theta }\) implies that the operator P defined in (8) is a Feller operator.

  • A1. Define \(L(x,i)=\varrho (x,x_*)\) for \((x,i)\in X\). Applying P to L via (8) and then using (10), (11) and (12) we obtain

    $$\begin{aligned} PL(x,i)\le aL(x,i)+b, \end{aligned}$$
    (19)

    where

    $$\begin{aligned}&a=\frac{\lambda L L_q}{\lambda -\alpha }, \nonumber \\&b=\sum _{j\in I}\sum _{{\theta }\in \Theta }\int _0^{+\infty }\lambda e^{-\lambda t}\varrho (q_{\theta }\big (T_j(t,x_*)\big ),q_{\theta }(x_*))\, dt +\sum _{{\theta }\in \Theta }\varrho (q_{\theta }(x_*),x_*), \end{aligned}$$
    (20)

    so L is a Lyapunov function for P.

  • A2. Observe that by (7), (11) and (12) we have for \((x_1,i_1),(x_2,i_2)\in X\)

    $$\begin{aligned}&\int _{X^2} d(u,v)\,{\mathbf Q}_{(x_1,i_1)(x_2,i_2)}(du,dv)=\\&\sum _{j\in I}\sum _{{\theta }\in \Theta } \int _0^{+\infty }\lambda e^{-\lambda t}\Big \{ p_{i_1 j}(x_1) \tilde{p}_{\theta }\big ( T_j(t,x_1)\big )\wedge p_{i_2 j}(x_2)\tilde{p}_{\theta }\big (T_j(t,x_2)\big )\Big \}\times \\&\quad \times \varrho (q_{\theta }\big (T_j(t,x_1)\big ),q_{\theta }\big (T_j (t,x_2)\big ))\,dt \\&\le \sum _{j\in I}\sum _{{\theta }\in \Theta } \int _0^{+\infty }\lambda e^{-\lambda t} p_{i_1 j}(x_1) \tilde{p}_{\theta }\big ( T_j(t,x_1)\big ) \varrho \Big (q_{\theta }\big (T_j(t,x_1)\big ),q_{\theta }\big (T_j (t,x_2)\big )\Big )\,dt \\&\le \beta \varrho (x_1,x_2) \le \beta \,d\big ((x_1,i_1),(x_2,i_2)\big ) \end{aligned}$$

    with \(\beta =\frac{\lambda L L_q}{\lambda -\alpha }<1\) by (18).

  • A3. From (13) and (11) it follows that

    $$\begin{aligned}&1-\sum _{j\in I}\sum _{{\theta }\in \Theta } \Big \{ p_{i_1 j}(x_1) \tilde{p}_{\theta }\big ( T_j(t,x_1)\big )\wedge p_{i_2 j}(x_2)\tilde{p}_{\theta }\big (T_j(t,x_2)\big )\Big \} \\&\le \sum _{j\in I}\sum _{{\theta }\in \Theta } | p_{i_1 j}(x_1) \tilde{p}_{\theta }\big ( T_j(t,x_1)\big ) - p_{i_2 j}(x_2)\tilde{p}_{\theta }\big (T_j(t,x_2)\big )| \\&\le \sum _{j\in I}\sum _{{\theta }\in \Theta } p_{i_1 j}(x_1)|\tilde{p}_{\theta }\big (T_j(t,x_1)\big )-\tilde{p}_{\theta }\big (T_j(t,x_2)\big )| \\&\quad + \sum _{j\in I}\sum _{{\theta }\in \Theta } \tilde{p}_{\theta }\big (T_j(t,x_2)\big )|p_{i_1 j}(x_1)-p_{i_2 j}(x_2)| \\&\le L L_{\tilde{p}} e^{\alpha t}\varrho (x_1,x_2) + L_p \varrho (x_1,x_2) +2N \varrho _d(i_1,i_2) \end{aligned}$$

    and consequently

    $$\begin{aligned} 1-{\mathbf Q}_{(x_1,i_1)(x_2,i_2)}(X^2)\le \Big (L_p+\frac{\lambda L L_{\tilde{p}}}{\lambda -\alpha }\Big )\varrho (x_1,x_2)+2N\varrho _d(i_1,i_2). \end{aligned}$$

Fix \(x_1,x_2\in Y\) and \(i_1,i_2\in I\). Define \(B=\{\big ( (u,j),(v,j)\big ):\, \varrho (u,v)<\beta \varrho (x_1,x_2), j\in I\}\). If \(\alpha \ge 0\) then there exists \(t_*>0\) such that \(L L_q e^{\alpha t}<\beta \) for \(t<t_*\). Set \(A=(0,t_*)\). If \(\alpha <0\) then there exists \(t_*>0\) such that \(L L_q e^{\alpha t}<\beta \) for \(t>t_*\). Set \(A=(t_*,\infty )\). In both cases define \(r=\int _A \lambda e^{-\lambda t}\, dt\). For all \(x,y\in Y\), \(t\in A\), \(j\in I_{T}(t,x,y)\) and \({\theta }\in I_q\big (T_j(t,x),T_j(t,y)\big )\) we have

$$\begin{aligned} \big ( (q_{\theta }(T_j(t,x) ),j ), (q_{\theta } (T_j(t,y) ),j )\big )\in B. \end{aligned}$$
(21)

From (15) and (21) we obtain

$$\begin{aligned}&{\mathbf Q}_{(x_1,i_1)(x_2,i_2)} (B) \\&\ge \int _A \lambda e^{-\lambda t}\sum _{j\in I_{T}(t,x_1,x_2)}\sum _{{\theta }\in I_q (T_j(t,x_1),T_j(t,x_2) )} \Big \{ p_{i_1 j}(x_1)\tilde{p}_{\theta }\big ( T_j(t,x_1)\big ) \\&\quad \wedge p_{i_2 j}(x_2)\tilde{p}_{\theta }\big (T_j(t,x_2)\big )\Big \} \\&\quad \times 1_B\big (\big ( q_{\theta }\big (T_j(t,x_1)\big ),j\big ) ,\big (q_{\theta }\big (T_j(t,x_2)\big ),j\big ) \big )\, dt \\&= \int _A \lambda e^{-\lambda t}\sum _{j\in I_{T}(t,x_1,x_2)}\sum _{{\theta }\in I_q (T_j(t,x_1),T_j(t,x_2) )} \Big \{ p_{i_1 j}(x_1)\tilde{p}_{\theta }\big ( T_j(t,x_1)\big ) \\&\quad \wedge p_{i_2 j}(x_2)\tilde{p}_{\theta }\big (T_j(t,x_2)\big )\Big \} \, dt\\&\ge \int _A \lambda e^{-\lambda t}\sum _{j\in I_{T}(t,x_1,x_2)}\sum _{{\theta }\in I_q (T_j(t,x_1),T_j(t,x_2) )} \Big \{ p_{i_1 j}(x_1) p_{i_2 j}(x_2)\tilde{p}_{\theta }\big ( T_j(t,x_1)\big ) \\&\quad \times \tilde{p}_{\theta }\big (T_j(t,x_2)\big )\Big \}\, dt \\&>p_0 q_0 r>0, \end{aligned}$$

so A3 is satisfied. Since \(F=X\times X\), assumption A4 is trivially satisfied. From Theorem 1 we obtain (i) and (ii). Set \(v(x,i)=C(\varrho (x,x_*)+1)\) and \(h(x,i)=C(\varrho (x,x_*)+1+\frac{b}{1-a})\) for \(x\in Y\), \(i\in I\), with a, b as in (20). Iterating (19) we obtain

$$\begin{aligned} \mathbb {E}_{x_0,\xi _0}(v(x_n,\xi _n))\le h(x_0,\xi _0)\quad \text {for}\quad x_0\in Y,\,\xi _0\in I. \end{aligned}$$

Application of Theorem 2 ends the proof. \(\square \)

The next result, describing the asymptotic behavior of the process \((x_n)_{n\ge 0}\) on Y, is an obvious consequence of Theorem 3. Let \({\tilde{\mu }}_0\) be the distribution of the initial random vector \(x_0\) and \({\tilde{\mu }}_n\) the distribution of \(x_n\), i.e.

$$\begin{aligned} {\tilde{\mu }}_n(A)=\mathbb {P}(x_n\in A)\quad \text {for}\quad A\in \mathcal {B}_Y, n\ge 1. \end{aligned}$$

Theorem 4

Under the hypotheses of Theorem 3 the following statements hold:

  (i)

    there exists a measure \({\tilde{\mu }}_*\in \mathcal {M}_1^1(Y)\) such that for any \({\tilde{\mu }}_0\) the sequence \(({\tilde{\mu }}_n)_{n\ge 0}\) converges weakly to \({\tilde{\mu }}_*\). Moreover, if

    $$\begin{aligned} \mathbb {P}(x_0\in A)={\tilde{\mu }}_*(A)\quad \text {for}\quad A\in \mathcal {B}_Y \end{aligned}$$

    then \({\tilde{\mu }}_n(A)={\tilde{\mu }}_*(A)\) for \(A\in \mathcal {B}_Y\) and \(n\ge 1\).

  (ii)

    there exist \(q\in (0,1)\) and \(C>0\) such that

    $$\begin{aligned} ||{\tilde{\mu }}_n - {\tilde{\mu }}_* ||_{FM}\le q^n C(1+\int _Y \varrho (x,x_*)\,{\tilde{\mu }}_0(dx)) \end{aligned}$$

    for any initial distribution \({\tilde{\mu }}_0\in \mathcal {M}_1^1(Y)\) and \(n\ge 1\).

  (iii)

    for any starting point \(x_0\in Y\), \(\xi _0\in I\) and any bounded Lipschitz function f on Y

    $$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{k=0}^{n-1} f(x_k)=\int _Y f(x)\,{\tilde{\mu }}_*(dx) \end{aligned}$$

    \(\mathbb {P}_{x_0,\xi _0}\) almost surely.

The examples below show that our model generalizes some important and widely studied objects, namely dynamical systems generated by iterated function systems [1, 11, 13, 19] and Poisson driven stochastic differential equations [7, 18, 25, 27].

Example 1

Iterated Function Systems.

Let \((Y, \Vert \cdot \Vert )\) be a separable Banach space. An iterated function system (IFS) consists of a sequence of continuous transformations

$$\begin{aligned} q_{\theta } : Y \rightarrow Y , \quad \theta = 1, \ldots , K \end{aligned}$$

and a probability vector

$$\begin{aligned} \tilde{p}_{\theta }: Y \rightarrow [0, 1] , \quad \theta = 1, \ldots , K . \end{aligned}$$

Such a system is briefly denoted by \((q, \tilde{p} )_{K} = (q_1, \ldots ,q_K , \tilde{p}_1,\ldots , \tilde{p}_K )\). The action of an IFS can be roughly described as follows. We choose an initial point \(x_0\) and we randomly select from the set \( \Theta = \{1, \ldots , K\}\) an integer \(\theta _0\) in such a way that the probability of choosing \(\theta _0\) is \(\tilde{p}_{\theta _0}(x_0)\). If a number \(\theta _0\) is drawn, we define \(x_1 = q_{\theta _0}(x_0)\). Having \(x_1\) we select \(\theta _1\) in such a way that the probability of choosing \(\theta _1\) is \(\tilde{ p}_{\theta _1}(x_1)\). Now we define \(x_2 = q_{\theta _1}(x_1)\) and so on.

An IFS is a particular example of a random dynamical system with randomly chosen jumps. Indeed, take \(I =\{1\}\) and \(T_1(t, x) = x \) for \(x \in Y \), \( t \in \mathbb {R}_+\), and assume that \(p_1(x) = 1 \) and \(p_{11}(x) = 1 \) for \(x \in Y\). Then we obtain the IFS \((q, \tilde{ p} )_{K}\).

Denoting by \(\tilde{\mu }_n\), \(n \in \mathbb {N} \), the distribution of \(x_n\), i.e. \(\,\tilde{\mu }_n (A) = \mathbb {P}(x_n \in A)\) for \( A \in \mathcal {B}(Y)\), we define \(\widetilde{P}^*\) as the transition operator such that \(\tilde{\mu }_{n+1} = \widetilde{P}^*\tilde{\mu }_n\) for \(n \in \mathbb {N} \). The transition operator corresponding to the iterated function system \((q, \tilde{p})_K\) is given by

$$\begin{aligned} \widetilde{P}^*\mu (A) = \sum _{\theta \in \Theta } \int _Y 1_A \big (q_{\theta }(x)\big )\tilde{p}_{\theta }(x) \,\mu (dx)\quad \text {for}\quad A \in \mathcal {B}(Y),\,\,\mu \in \mathcal {M}_1(Y). \end{aligned}$$
(22)
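For a point mass \(\mu = \delta _{x_0}\), formula (22) reduces to \(\widetilde{P}^*\delta _{x_0}(A) = \sum _{\theta \in \Theta } 1_A\big (q_{\theta }(x_0)\big )\tilde{p}_{\theta }(x_0)\), which can be checked against a direct simulation of one step of the chain. The sketch below does this for an illustrative two-map system on \(\mathbb {R}\); the maps, weights and set \(A\) are assumptions, not taken from the paper.

```python
import random

# One-step Monte Carlo check of (22) with mu = delta_{x0}:
# P*mu(A) = sum_theta 1_A(q_theta(x0)) * p_theta(x0).
q = [lambda x: 0.5 * x + 1.0, lambda x: 0.5 * x - 1.0]   # illustrative maps
p = [lambda x: 0.3, lambda x: 0.7]                       # constant weights
x0 = 0.0
A = (0.0, 2.0)                                           # A = [0, 2)

# Exact value from formula (22).
exact = sum(pt(x0) for qt, pt in zip(q, p) if A[0] <= qt(x0) < A[1])

# Empirical frequency of {x_1 in A} over many simulated one-step moves.
rng = random.Random(1)
n = 100_000
hits = sum(
    1 for _ in range(n)
    if A[0] <= q[rng.choices([0, 1], weights=[pt(x0) for pt in p])[0]](x0) < A[1]
)
estimate = hits / n
```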

We assume (13) and (15) and take \(\alpha =0\) in (11). If

$$\begin{aligned} \sum _{{\theta } \in \Theta } \tilde{p}_{\theta }(x)\varrho (q_{\theta }(x),q_{\theta }(y)) \le L_q \varrho (x,y) \quad \text {for} \quad x,y \in Y \end{aligned}$$

holds with \(L_q<1\), then from Theorem 4 we obtain the existence of an invariant measure \(\mu _*\in \mathcal {M}_1^1(Y\times I)\) for the process \((x_n,\xi _n)_{n\ge 0}\), which is attractive in \(\mathcal {M}_1(Y\times I)\), exponentially attractive in \(\mathcal {M}_1^1(Y\times I)\) and for which the strong law of large numbers holds (cf. [26]).
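For affine maps \(q_{\theta }(x) = a_{\theta } x + b_{\theta }\) on \(\mathbb {R}\) with constant weights, the left-hand side of the displayed condition equals \(\big (\sum _{\theta } \tilde{p}_{\theta } |a_{\theta }|\big )|x-y|\), so \(L_q = \sum _{\theta } \tilde{p}_{\theta } |a_{\theta }|\). A small numerical spot-check of this (all coefficients are illustrative):

```python
# Average-contraction constant of an affine IFS on R: for
# q_theta(x) = a_theta * x + b_theta with constant weights p_theta,
# L_q = sum_theta p_theta * |a_theta|.  All numbers are illustrative.
a = [0.5, 0.5]
b = [1.0, -1.0]
p = [0.3, 0.7]

L_q = sum(pt * abs(at) for pt, at in zip(p, a))   # here 0.5 < 1

# Spot-check sum_theta p_theta |q_theta(x) - q_theta(y)| <= L_q |x - y|.
for x, y in [(0.0, 1.0), (-2.0, 3.0), (5.0, 5.5)]:
    lhs = sum(pt * abs((at * x + bt) - (at * y + bt))
              for at, bt, pt in zip(a, b, p))
    assert lhs <= L_q * abs(x - y) + 1e-12
```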

Remark 2

The convergence in Theorem 4 is weak and one cannot expect the strong one, i.e. convergence in the total variation norm. Indeed, let \(Y = \mathbb {R} \), \(q_1(x) = x\) and \(q_2(x) = 0\) for \( x \in \mathbb {R} \). For every probability vector \((p_1, p_2 )\) with \(p_1 < 1 \) the condition \(L_q<1\) from the example above is satisfied. Thus for every \(\mu \in \mathcal {M}_1(Y)\) the sequence \(\{\widetilde{P}^{*n}\mu \}_{n \ge 1}\) given by (22) converges weakly to \( \mu _0 = \delta _0\). Obviously, the strong convergence does not hold.
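For constant \((p_1, p_2)\) the chain in this example keeps its starting value until \(\theta = 2\) is first drawn and then sits at \(0\) forever, so \(\mathbb {P}(x_n = x_0) = p_1^n \rightarrow 0\). A short simulation sketch of this behaviour (the chosen values of \(p_1\), \(x_0\) and the chain length are illustrative):

```python
import random

# The Remark's IFS: q_1(x) = x, q_2(x) = 0 with constant (p_1, p_2).
# Each chain keeps its starting value until theta = 2 is first drawn,
# then stays at 0, so P(x_n = x_0) = p_1**n and the law of x_n tends
# weakly to delta_0.  All numeric choices are illustrative.
rng = random.Random(0)
p1, x0, n, M = 0.5, 1.0, 20, 10_000

at_zero = 0
for _ in range(M):
    x = x0
    for _ in range(n):
        x = x if rng.random() < p1 else 0.0
    at_zero += (x == 0.0)
frac_at_zero = at_zero / M   # close to 1 - p1**n
```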

Example 2

Poisson driven stochastic differential equation.

Consider a stochastic differential equation

$$\begin{aligned} dX(t) = a(X(t),\xi (t)) dt + b(X(t))dp(t) \qquad \text {for} \quad t \ge 0 \end{aligned}$$

with the initial condition

$$\begin{aligned} X(0) = x_0, \end{aligned}$$

where \(a:Y\times I \rightarrow Y\) and \(b:Y\rightarrow Y\) are Lipschitz functions, \((Y, \Vert \cdot \Vert )\) is a separable Banach space, \(\{p(t)\}_{t \ge 0}\) is a Poisson process with intensity \(\lambda \), and \(\{\xi (t)\}_{t \ge 0}\), \( \xi (t) : \Omega \rightarrow I \), is a stochastic process describing random switching at the random moments \(\tau _n\). Consider a sequence of random variables \(\{x_n\}_{n \ge 0} \), \( x_n : \Omega \rightarrow Y \), such that

$$\begin{aligned}&x_n = q(T_{\xi (\tau _{n-1})} (\tau _n - \tau _{n-1}, x_{n-1})) ,\quad q(x) = x + b(x)\\&\mathbb {P} \{\xi (0) = k | x_0 = x \} = p_k (x),\\&{\mathbb {P}} \{\xi (\tau _n) = s | x_n = y, \,\, \xi (\tau _{n-1}) = i \} = p_{is} (y), \quad \text {for}\quad n = 1,\ldots \\&\text {and}\\&\xi (t) = \xi (\tau _{n-1}) \qquad \text {for} \quad \tau _{n-1} \le t < \tau _{n},\quad n=1, 2, \ldots \end{aligned}$$

This is a particular example of continuous random dynamical systems where \(q_{\theta }(x) = q(x)\), \(\theta \in \{1, \ldots , K\}\), and for every \(i \in I\), \( T_i(t,x)= v_i(t) \) are the solutions of the unperturbed Cauchy problems

$$\begin{aligned} v'_i (t) = a(v_i (t), i) \qquad \text {and}\quad v_i (0) = x,\qquad x \in Y. \end{aligned}$$
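The chain \((x_n, \xi (\tau _n))\) can be simulated directly once the flows are available in closed form. The sketch below uses purely illustrative assumptions: \(I = \{1, 2\}\), linear drifts \(a(x, i) = -c_i x\) (so \(T_i(t, x) = e^{-c_i t} x\)), \(b(x) = -0.1\,x\) so that \(q(x) = 0.9\,x\), waiting times \(\tau _n - \tau _{n-1}\) distributed \(\mathrm {Exp}(\lambda )\), and a constant switching matrix \(p_{is}\). None of these choices come from the paper.

```python
import math
import random

LAM = 1.0                              # Poisson intensity lambda
C = {1: 0.5, 2: 1.5}                   # drift rates: a(x, i) = -C[i] * x
P_SWITCH = {1: {1: 0.7, 2: 0.3},       # constant switching matrix p_is
            2: {1: 0.4, 2: 0.6}}

def step(x, xi, rng):
    """One step: flow for an Exp(LAM) time, apply q, then switch regime."""
    dt = rng.expovariate(LAM)                  # tau_n - tau_{n-1}
    x_flow = math.exp(-C[xi] * dt) * x         # T_xi(dt, x)
    x_new = 0.9 * x_flow                       # q(x) = x + b(x), b(x) = -0.1 x
    row = P_SWITCH[xi]
    xi_new = rng.choices(list(row), weights=list(row.values()))[0]
    return x_new, xi_new

rng = random.Random(0)
x, xi = 1.0, 1
path = [(x, xi)]
for _ in range(50):
    x, xi = step(x, xi, rng)
    path.append((x, xi))
```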

It is easy to check that \(\mu _n = P^{*n} \mu \), where \(\mu _n\) denotes the distribution of \((x_n, \xi (\tau _n))\) and \(P^*\) is the transition operator corresponding to the above stochastic equation, given by

$$\begin{aligned} P^*\mu (A) = \sum _{j \in I}\int _{ Y \times I} \int _{\mathbb {R}_+} \lambda e^{-\lambda t} 1_A(q(T_j (t,x) ), j)p_{ij}(x)dt d\mu (x, i) \end{aligned}$$

for \( A \in \mathcal {B}(Y\times I)\) and \( \mu \in \mathcal {M}_1\).

Assume that there exist positive constants \(L_q \), L and \(\alpha \) such that

$$\begin{aligned} \Vert q(x) - q(y)\Vert \le L_q \Vert x - y\Vert \end{aligned}$$

and

$$\begin{aligned} \Vert T_{i_0} (t,x) - T_{i_0}(t, y)\Vert \le L e^{\alpha t} \Vert x - y\Vert \end{aligned}$$

for all \(x, y \in Y\), \(t \ge 0\) and some \(i_0 \in I \) such that \(\inf _{i\in I} \inf _{x \in Y} p_{i i_0}(x) > 0\). If

$$\begin{aligned} LL_q + \frac{\alpha }{\lambda }< 1 \end{aligned}$$

then there exists a unique invariant measure \(\mu _*\in \mathcal {M}_1^1(Y\times I)\) for the process \((x_n,\xi _n)_{n\ge 0}\), which is attractive in \(\mathcal {M}_1(Y\times I)\), exponentially attractive in \(\mathcal {M}_1^1(Y\times I)\) and the strong law of large numbers holds for the process \((x_n,\xi _n)_{n\ge 0}\) starting from \((x_0,\xi _0 )\in Y\times I\), i.e. for every bounded Lipschitz function \(f:Y\times I\rightarrow \mathbb {R}\) and every \(x_0\in Y\) and \(\xi _0\in I\) we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{k=0}^{n-1} f(x_k,\xi _k)=\int _{Y\times I} f(x,\xi )\, \mu _*(dx,d\xi ) \end{aligned}$$

\(\mathbb {P}_{x_0,\xi _0}\) almost surely.
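The law of large numbers above can be illustrated numerically: time averages of a bounded Lipschitz observable computed along trajectories started from different initial conditions should approach the same limit. The sketch below uses an illustrative instance of the model with flows \(T_i(t,x) = e^{-c_i t}x\) (so \(L = 1\) and any \(\alpha > 0\) works), jump map \(q(x) = 0.9\,x\) (so \(L_q = 0.9\) and \(LL_q + \alpha /\lambda < 1\) for small \(\alpha \)), \(\mathrm {Exp}(\lambda )\) waiting times and a constant switching matrix; none of these choices come from the paper.

```python
import math
import random

LAM = 1.0
C = {1: 0.5, 2: 1.5}
P_SWITCH = {1: {1: 0.7, 2: 0.3}, 2: {1: 0.4, 2: 0.6}}

def f(x, xi):
    """A bounded Lipschitz observable on Y x I."""
    return math.tanh(x) + (1.0 if xi == 1 else -1.0)

def time_average(x0, xi0, n, seed):
    """Compute (1/n) * sum_{k<n} f(x_k, xi_k) along one trajectory."""
    rng = random.Random(seed)
    x, xi, total = x0, xi0, 0.0
    for _ in range(n):
        total += f(x, xi)
        dt = rng.expovariate(LAM)               # Exp(lambda) waiting time
        x = 0.9 * math.exp(-C[xi] * dt) * x     # q(T_xi(dt, x))
        row = P_SWITCH[xi]
        xi = rng.choices(list(row), weights=list(row.values()))[0]
    return total / n

avg1 = time_average(5.0, 1, 200_000, seed=1)
avg2 = time_average(-3.0, 2, 200_000, seed=2)
# The two averages nearly coincide, consistent with the LLN.
```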