1 Introduction

Let \(\{X_n,n \ge 0\}\) be an arbitrary information source taking values in a finite alphabet set on probability space \((\Omega ,{{{\mathcal {F}}}},\mathbf {P})\). Set \(f_n(\omega )=-\frac{1}{n} \ln \mathbf {P}(X_0,X_1,\ldots ,X_n)\); \(f_n(\omega )\) is called the relative entropy density of \(\{X_0,X_1,\ldots , X_n\}\) in information theory. The convergence of \(f_n(\omega )\) to a constant in some sense (\(L_1\) convergence, convergence in probability, a.e. convergence) is known as the entropy ergodic theorem, the asymptotic equipartition property (AEP), or the Shannon–McMillan–Breiman theorem, and is the fundamental theorem of information theory. Shannon [22] first proved the entropy ergodic theorem for convergence in probability for stationary ergodic information sources with finite alphabet. The entropy ergodic theorem in \(L_1\) and a.e. convergence, respectively, for stationary ergodic information sources was established by McMillan [20] and Breiman [6]. Chung [8] considered the case of countable alphabet and Billingsley [5] extended the result to stationary nonergodic sources. Gray and Kieffer [14] extended it to asymptotically stationary measure process. The entropy ergodic theorem for general stochastic processes can be found, for example, in Barron [2] and Algoet and Cover [1]. Yang and Liu [19, 30] obtained the entropy ergodic theorem of nonhomogeneous Markov information sources in finite state space. Yang and Liu [32] studied the entropy ergodic theorem for mth-order nonhomogeneous Markov information sources.

Let \(S_n=\sum _{k=1}^{n}X_k\), \(S_0=0\) and for any \(m,n \in \mathbb {N^{+}}\), define

$$\begin{aligned} T_{m,n}=S_{m+n}-S_{m}=\sum _{k=m+1}^{m+n}X_k, \end{aligned}$$

\(\frac{T_{m,n}}{n}\) is regarded as the moving average or delayed sum in probability theory. Much research has been done on moving averages. Shepp [23] studied the limiting values of the averages \([S_{n+f(n)}-S_n]/f(n)\) for i.i.d. random variables. Gaposhkin [12] established the law of large numbers for moving averages of independent random variables. Lanzinger [17] studied an almost sure limit theorem for moving averages of random variables between the strong law of large numbers and the Erdős–Rényi law. Lai [16] gave a review of limit theorems for moving averages and described some recent developments motivated by applications to signal detection and change point problems. Recently, Wang and Yang [27] considered the entropy ergodic theorem of the moving average form and obtained the generalized entropy ergodic theorem for nonhomogeneous Markov chains.
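As a small illustration of the definition of \(T_{m,n}\), the following Python sketch computes the moving average \(T_{m,n}/n\) from prefix sums. The function name `delayed_average` and the sample data are hypothetical and serve only to make the indexing explicit.

```python
from itertools import accumulate

def delayed_average(xs, m, n):
    """Return T_{m,n}/n = (S_{m+n} - S_m)/n for xs = [X_1, X_2, ...]."""
    # prefix[k] = S_k = X_1 + ... + X_k, with S_0 = 0
    prefix = [0] + list(accumulate(xs))
    return (prefix[m + n] - prefix[m]) / n

xs = [1, 0, 1, 1, 0, 1, 0, 0]      # hypothetical sample X_1, ..., X_8
print(delayed_average(xs, 2, 4))   # (X_3 + X_4 + X_5 + X_6)/4 = 0.75
```

Taking m = 0 recovers the ordinary average \(S_n/n\).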

Tree-indexed stochastic processes have been an active research topic in the study of stochastic structures in recent years. They generally include tree-indexed random walks, random trees (such as Galton–Watson trees), tree-indexed Markov chains, etc. There is a large body of work on probability limit theorems for tree-indexed stochastic processes, which we briefly list as follows. Chen [7] studied the average properties of random walks on Galton–Watson trees. Telcs and Wormald [26] studied the strong recurrence of tree indexed random walks determined by the resistance properties of spherically symmetric graphs. Dembo et al. [10] extended the notions of shift-invariance and specific relative entropy, as typically understood for Markov fields on deterministic graphs such as \({\mathbb {Z}}^d\), to Markov fields on random trees, and also developed single-generation empirical measure large deviation principles for a more general class of random trees. Le Gall [18] considered Galton–Watson trees associated with a critical offspring distribution and conditioned to have exactly n vertices, and proved that these conditioned spatial trees converge as \(n\rightarrow \infty \), modulo an appropriate rescaling, towards the conditioned Brownian tree under suitable assumptions on the offspring distribution and the spatial displacements. Guyon [13] studied the law of large numbers and central limit theorems for the bifurcating Markov chains indexed by a binary tree, and applied these results to detect cellular aging in Escherichia coli, using the data of Stewart et al. and a bifurcating autoregressive model. Yamamoto [34] established a large deviation theorem for the number of branches of each order in a random binary tree, where the rate function associated with a large deviation was given by asymptotic forms of the rate function.

A significant line of progress on tree-indexed Markov chains concerns their entropy ergodic theorem. Benjamini and Peres [3] gave the definition of tree-indexed Markov chains and studied the recurrence and ray-recurrence for them. Berger and Ye [4] studied the existence of entropy rate for some stationary random fields on a homogeneous tree. Ye and Berger [35, 36], by using Pemantle’s [21] result and a combinational approach, obtained the entropy ergodic theorem in probability for a PPG-invariant and ergodic random field on a homogeneous tree. Yang and Liu [30] established the strong law of large numbers for the frequency of state occurrence on Markov chains indexed by a homogeneous tree (in fact, it is a special case of tree-indexed Markov chains and PPG-invariant random fields). Yang [31, 33] obtained the strong law of large numbers and the entropy ergodic theorem for tree-indexed Markov chains. Huang and Yang [15] studied the strong law of large numbers and entropy ergodic theorem for Markov chains indexed by a uniformly bounded tree. Shi and Yang [25] studied the entropy ergodic theorem for mth-order nonhomogeneous Markov chains indexed by a tree. Recently, Dang et al. [9] defined a discrete form of nonhomogeneous bifurcating Markov chains indexed by a binary tree and discussed equivalent properties for them; they also studied the strong law of large numbers and the entropy ergodic theorem for these Markov chains with finite state space. Shi et al. [24] studied the strong law of large numbers and entropy ergodic theorem for Markov chains indexed by a Cayley tree in a Markovian environment with countable state space.

Inspired by Dang et al. [9] and Wang and Yang [27], and infused with some new ideas, in this paper we study the generalized entropy ergodic theorem for nonhomogeneous bifurcating Markov chains indexed by a binary tree. Firstly, we prove a strong limit theorem for moving averages of bivariate functions of such chains. Secondly, we prove the strong law of large numbers for the frequencies of occurrence of states in moving average form and the generalized entropy ergodic theorem. As corollaries, we generalize some known results. The innovation of this paper lies in generalizing the entropy ergodic theorem to the moving average form. As the classical Doob martingale convergence theorem cannot be employed, the core technique in this paper is to construct a class of random variables with a parameter and mean value one, and to use the Borel–Cantelli lemma to prove the a.e. convergence of certain random variables.

The rest of this paper is organized as follows. Section 2 gives some preliminaries, reviewing concepts and properties of Markov chains indexed by a tree and of the entropy density. The main results of this article, i.e. the strong law of large numbers for the frequencies of occurrence of states and the generalized entropy ergodic theorem for finite nonhomogeneous bifurcating Markov chains indexed by a binary tree, are stated in Sect. 3. Finally, the proofs of the main results of Sect. 3 are provided in Sect. 4.

2 Preliminaries

A tree is a graph T which is connected and contains no circuits. Given any two vertices \(\alpha \ne \beta \in T\), let \(\overline{\alpha \beta }\) be the unique path connecting \(\alpha \) and \(\beta \). Define the distance \(d(\alpha ,\beta )\) to be the number of edges contained in the path \(\overline{\alpha \beta }\). Select a vertex as the root (denoted by o). For any two vertices \(\sigma \) and t of tree T, we write \(\sigma \le t\) if \(\sigma \) is on the unique path from the root o to t. We denote by \(\sigma \wedge t\) the vertex farthest from o satisfying \(\sigma \wedge t\le t\) and \(\sigma \wedge t\le \sigma \). The set of all vertices at distance n from the root o is called the n-th level of T. We denote by \(L_n\) the set of all vertices on level n (\(L_0 = \{o\}\)). We denote by \(L_m^n\) the set of all vertices on the mth to nth levels of T, and in particular by \(T^{(n)}\) the set of all vertices on level 0 (the root o) to level n. Let T be any tree and \(t\in T\backslash \{o\}\). If a vertex of this tree is on the unique path from the root o to t and is the nearest to t, we call it the predecessor of t and denote it by \(1_t\); we also call t a successor of \(1_t\). If the root of a tree has N neighboring vertices and all other vertices have \(N +1\) neighboring vertices, we call this type of tree a Cayley tree and denote it by \(T_{C,N}\). That is, every vertex t of the Cayley tree \(T_{C,N}\) has N successors on the next level. In this paper, we mainly investigate the binary tree \(T_{C,2}\), on which each vertex has two successors on the next level. For simplicity, we denote \(T_{C,2}\) by \(T_2\) (see Fig. 1). For any vertex t of the binary tree \(T_2\), we denote by \(t^1\) and \(t^2\) the two successors of t, and call them the first successor and the second successor of t respectively.
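The tree notation above can be made concrete with a short Python sketch. It assumes a heap-style labeling of \(T_2\) that is not part of the paper: the root o is labeled 1, and the successors \(t^1,t^2\) of vertex t are labeled 2t and 2t+1. All function names are hypothetical.

```python
def level(t):
    """Level of vertex t, i.e. its distance d(o, t) from the root o = 1."""
    return t.bit_length() - 1

def predecessor(t):
    """The predecessor 1_t of a non-root vertex t."""
    return t // 2

def successors(t):
    """The first and second successors (t^1, t^2) of t."""
    return 2 * t, 2 * t + 1

def levels_block(m, n):
    """All vertices of L_m^n (levels m through n); T^(n) is levels_block(0, n)."""
    return list(range(2 ** m, 2 ** (n + 1)))

def meet(s, t):
    """sigma ∧ t: the vertex farthest from o lying on both root paths."""
    while s != t:
        # the larger label is never on a shallower level, so move it up
        if s > t:
            s //= 2
        else:
            t //= 2
    return s

print(len(levels_block(0, 3)))  # |T^(3)| = 2^4 - 1 = 15
print(len(levels_block(2, 4)))  # |L_2^4| = 2^5 - 2^2 = 28
print(meet(4, 5), meet(4, 7))   # 2 1
```

Under this labeling \(|L_n|=2^n\) and \(|L_m^n| = 2^{n+1}-2^m\), so \(|T^{(n)}| = 2^{n+1}-1\).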

Let \((\Omega ,{{{\mathcal {F}}}},{\mathbf {P}})\) be a probability space, T be any tree, and \(\{X_{t},t \in T\}\) be a tree-indexed stochastic process defined on \((\Omega ,{{{\mathcal {F}}}},\mathbf {P})\). Let A be a subgraph of T and \(X^{A} = \{X_{t},t \in A\}\). We denote by |A| the number of vertices of A, and by \(x^{A}\) a realization of \(X^{A}\). Dang et al. [9] defined the discrete form of nonhomogeneous bifurcating Markov chains indexed by a binary tree. First we review the definition of this process.

Fig. 1 Binary tree \(T_{C,2}\)

Definition 2.1

(Dang et al. [9]) Let \(T_{2}\) be a binary tree, G a countable state space, \(\{X_{t},t\in T_{2}\}\) be a collection of G-valued random variables defined on probability space \((\Omega ,\mathcal{F},\mathbf {P})\). Let

$$\begin{aligned} p=\{p(x),x\in G\} \end{aligned}$$
(1)

be a distribution on G, and

$$\begin{aligned} P_{t}=(P_{t}(y_{1},y_{2}|x)),\quad x,y_{1},y_{2}\in G,\quad t\in T_{2} \end{aligned}$$
(2)

be a collection of stochastic matrices (that is \(P_{t}(y_{1},y_{2}|x)\ge 0,\forall y_{1},y_{2},x \in G\), and \(\sum _{(y_{1},y_{2})\in G^{2}} P_{t}(y_{1},y_{2}|x)=1,\forall x\in G)\) on \(G\times G^{2}\). If \(\forall n\ge 1\),

$$\begin{aligned} {\mathbf {P}}(X^{L_{n}} = x^{L_{n}} | X^{T^{(n-1)}}=x^{T^{(n-1)}}) = \prod _{t\in L_{n-1}} P_{t}(x_{t^{1}},x_{t^{2}} | x_{t}), \end{aligned}$$
(3)

and

$$\begin{aligned} {\mathbf {P}}(X_{o} = x) = p(x),\quad \forall x\in G, \end{aligned}$$
(4)

\(\{X_{t},t\in T_{2}\} \) will be called G-valued nonhomogeneous bifurcating Markov chains indexed by a binary tree \(T_{2}\) with the initial distribution (1) and stochastic matrices (2). If \(\forall t \in T_{2},P_{t}=P\), where \(P=\big \{P(y_{1},y_{2}|x),x,y_{1},y_{2}\in G\big \}\) is a stochastic matrix on \(G\times G^{2}\), \(\{X_{t},t\in T_{2}\} \) will be called G-valued homogeneous bifurcating Markov chains indexed by a binary tree.
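Definition 2.1 can be read as a sampling recipe: draw \(X_o\) from p, then draw each pair \((X_{t^1},X_{t^2})\) jointly from \(P_t(\cdot ,\cdot |X_t)\). The following Python sketch illustrates this under stated assumptions: \(G=\{0,1\}\), heap labeling of vertices (root 1, successors 2t and 2t+1), and a hypothetical family `kernel` chosen only for illustration.

```python
import random

def kernel(t, x):
    """Hypothetical P_t(., . | x) on G = {0, 1}; returns {(y1, y2): probability}."""
    shift = 0.1 / t  # vertex-dependent perturbation that vanishes as t grows
    return {
        (x, x): 0.5 - shift,
        (x, 1 - x): 0.2,
        (1 - x, x): 0.2,
        (1 - x, 1 - x): 0.1 + shift,
    }

def sample_chain(n, p, P, rng):
    """Sample X^{T^(n)} as a dict vertex -> state, following (3) and (4)."""
    states = list(range(len(p)))
    X = {1: rng.choices(states, weights=p)[0]}      # root state drawn from p
    for t in range(1, 2 ** n):                      # vertices on levels 0..n-1
        table = P(t, X[t])
        pairs = list(table)
        # the two successors are drawn jointly, not independently
        y1, y2 = rng.choices(pairs, weights=[table[pr] for pr in pairs])[0]
        X[2 * t], X[2 * t + 1] = y1, y2
    return X

X = sample_chain(3, [0.5, 0.5], kernel, random.Random(0))
print(len(X))  # 15 vertices in T^(3)
```

The example kernel is deliberately not of product form, so the two successors of a vertex are correlated given their predecessor.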

Dang et al. [9] presented the following equivalent properties for nonhomogeneous bifurcating Markov chains indexed by a binary tree.

Property 2.1

(Dang et al. [9]) Let \(T_{2}\) be a binary tree, G a countable state space, and \(\{X_{t},t\in T_{2}\}\) be a collection of G-valued random variables defined on probability space \((\Omega ,\mathcal{F},\mathbf {P})\), then the three propositions below are equivalent:

  1. (i)

    \(\{X_{t},t\in T_{2}\}\) is a G-valued nonhomogeneous bifurcating Markov chain indexed by a binary tree \(T_{2}\) with the initial distribution (1) and stochastic matrices (2) defined by Definition 2.1;

  2. (ii)

    For \(\forall n\ge 1\) and \(\forall x^{T^{(n)}}\in G^{T^{(n)}}\), we have

    $$\begin{aligned} {\mathbf {P}}(X^{T^{(n)}} = x^{T^{(n)}}) = p(x_{o})\prod _{ t\in T^{(n-1)}}P_{t}(x_{t^{1}},x_{t^{2}}|x_{t}); \end{aligned}$$
    (5)
  3. (iii)

    For \(\forall n\ge 1\) and \(t,t_{1},t_{2},\ldots ,t_{n} \in T_{2}\), satisfying \(t_{i} \wedge t^{1} \le t,t_{i} \wedge t^{2} \le t,1\le i \le n\), we have

    $$\begin{aligned}&{\mathbf {P}}(X_{t^{1}}=y_{1},X_{t^{2}}=y_{2} | X_{t} = x,X_{t_{1}} = x_{t_{1}},\ldots ,X_{t_{n}}= x_{t_{n}}) \nonumber \\&\quad = P_{t}(y_{1},y_{2} | x)= {\mathbf {P}}(X_{t^{1}} = y_{1},X_{t^{2}}= y_{2} | X_{t} = x), \ \ \forall x,y_{1},y_{2} \in G, \end{aligned}$$
    (6)

    and

    $$\begin{aligned} {\mathbf {P}}(X_{o} = x) = p(x),\ \ \forall x\in G. \end{aligned}$$

Remark 2.1

It is a consequence of Kolmogorov extension theorem that there exists a collection of G-valued random variables \(\{X_{t},t\in T_{2}\}\) on some probability space such that (5) holds.
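As a sanity check on the product formula (5), the following Python sketch evaluates its right-hand side and verifies by brute-force enumeration that it defines a probability distribution on \(G^{T^{(n)}}\) for a small n. It assumes \(G=\{0,1\}\), heap labeling of vertices (root 1, successors 2t and 2t+1), and a hypothetical family `kernel`; none of these choices come from the paper.

```python
from itertools import product
import math

def kernel(t, x):
    """Hypothetical P_t(., . | x) on G = {0, 1}; returns {(y1, y2): probability}."""
    shift = 0.1 / t
    return {(x, x): 0.5 - shift, (x, 1 - x): 0.2,
            (1 - x, x): 0.2, (1 - x, 1 - x): 0.1 + shift}

def joint_prob(x, n, p, P):
    """Right-hand side of (5) for a realization x (dict vertex -> state) on T^(n)."""
    prob = p[x[1]]
    for t in range(1, 2 ** n):                      # t ranges over T^(n-1)
        prob *= P(t, x[t])[(x[2 * t], x[2 * t + 1])]
    return prob

n = 2
size = 2 ** (n + 1) - 1                             # |T^(2)| = 7
p = [0.3, 0.7]
total = sum(joint_prob(dict(zip(range(1, size + 1), cfg)), n, p, kernel)
            for cfg in product((0, 1), repeat=size))
print(math.isclose(total, 1.0))  # True
```

Since each \(P_t(\cdot ,\cdot |x)\) sums to one, summing the product over the deepest level first collapses it level by level, which is the consistency that the Kolmogorov extension theorem requires.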

Remark 2.2

By (5), we can easily obtain that for \(\forall m,n \ge 1,n \ge m\) and \(\forall x^{L^{n}_{m}} \in G^{L^{n}_{m}}\),

$$\begin{aligned} {\mathbf {P}}(X^{L^{n}_{m}}= x^{L^{n}_{m}}) = {\mathbf {P}}(X^{L_{m}} = x^{L_{m}})\prod _{t\in L^{n-1}_{m}} P_{t}(x_{t^{1}},x_{t^{2}} | x_{t}). \end{aligned}$$
(7)

Remark 2.3

Let \(\{X_{t},t\in T_{2}\}\) be a G-valued nonhomogeneous bifurcating Markov chain indexed by a binary tree \(T_{2}\) with the stochastic matrices (2) defined by Definition 2.1. From the second equality of (6), we have that for any \(t\in T_{2}\),

$$\begin{aligned} {\mathbf {P}}(X_{t^{1}} = y_{1},X_{t^{2}} = y_{2} | X_{t} = x) = P_{t}(y_{1},y_{2} | x). \end{aligned}$$

Below we will recall the definition of tree indexed nonhomogeneous Markov chains.

Definition 2.2

(Dong et al. [11]) Let T be a locally finite infinite tree, G a countable state space, \(\{X_{t},t\in T\}\) be a collection of G-valued random variables defined on probability space \((\Omega ,{{\mathcal {F}}},\mathbf {P})\). Let

$$\begin{aligned} p = \{p(x),x\in G\} \end{aligned}$$
(8)

be a distribution on G, and

$$\begin{aligned} Q_{t} = (Q_{t}(y | x)),\ \ x,y\in G,\ \ t\in T\backslash \{o\} \end{aligned}$$
(9)

be a collection of transition matrices on \(G^{2}\). If \(\forall n\ge 1\), and \(t, t_{1}, t_{2},\ldots ,t_{n} \in T\), satisfying \(t_{i}\wedge t \le 1_{t},1 \le i \le n\), we have

$$\begin{aligned} {\mathbf {P}}(X_{t}&= y | X_{1_{t}} = x,X_{t_{1}} = x_{t_{1}},\ldots ,X_{t_{n}} = x_{t_{n}})\nonumber \\&{=\mathbf {P}}(X_{t} = y | X_{1_{t}} = x) = Q_{t}(y | x),\ \ \ \forall x,y \in G, \end{aligned}$$
(10)

and

$$\begin{aligned} {\mathbf {P}}(X_{o} = x) = p(x),\ \ \ \forall x\in G, \end{aligned}$$
(11)

\(\{X_{t},t \in T\}\) will be called G-valued nonhomogeneous Markov chains indexed by tree T with the initial distribution (8) and transition matrices (9), or called tree indexed nonhomogeneous Markov chains with state space G.

The above definition is the natural generalization of the definition of homogeneous Markov chains indexed by tree T (see Benjamini and Peres [3]). Similar to the equivalent property of nonhomogeneous bifurcating Markov chains indexed by a binary tree, by Property 2.1, we can immediately obtain the equivalent property of nonhomogeneous Markov chains indexed by a tree.

Property 2.2

(Dang et al. [9]) Let T be a locally finite infinite tree, G a countable state space, and \(\{X_{t},t\in T\}\) be a collection of G-valued random variables defined on probability space \((\Omega ,{{\mathcal {F}}},\mathbf {P})\). Then \(\{X_{t},t\in T\}\) is a tree indexed nonhomogeneous Markov chain taking values in G defined by Definition 2.2 if and only if \(\forall n \ge 1\) and \(\forall x^{T^{(n)}}\in G^{T^{(n)}}\),

$$\begin{aligned} {\mathbf {P}}(X^{T^{(n)}} = x^{T^{(n)}}) = p(x_{o})\prod _{ t\in {T^{(n)}\backslash \{o\}}} Q_{t}(x_{t}|x_{1_{t}}). \end{aligned}$$
(12)

Remark 2.4

From Property 2.2, we know that \(\{X_{t},t\in T_{2}\}\) is a tree indexed nonhomogeneous Markov chain if and only if, \(\forall n\ge 1\) and \(\forall x^{T^{(n)}}\in G^{T^{(n)}}\),

$$\begin{aligned} {\mathbf {P}}(X^{T^{(n)}} = x^{T^{(n)}}) = p(x_{o})\prod _ {t\in T^{(n-1)}} Q_{t^{1}}(x_{t^{1}}|x_{t})Q_{t^{2}}(x_{t^{2}}|x_{t}). \end{aligned}$$
(13)

Thus a nonhomogeneous bifurcating Markov chain indexed by a binary tree is a nonhomogeneous Markov chain indexed by a binary tree if and only if, for \(\forall t\in T_{2}\) and \(\forall x,y_{1},y_{2} \in G\),

$$\begin{aligned} P_{t}(y_{1},y_{2}|x) = Q_{t^{1}}(y_{1}|x)Q_{t^{2}}(y_{2}|x), \end{aligned}$$
(14)

that is \(\forall t\in T_{2}\),

$$\begin{aligned} \mathbf {P}(X_{t^{1}} = y_{1},X_{t^{2}} = y_{2}|X_{t} = x) = \mathbf {P}(X_{t^{1}} = y_{1}|X_{t} = x)\mathbf {P}(X_{t^{2}} = y_{2}|X_{t} = x). \end{aligned}$$

The above equality means that a nonhomogeneous bifurcating Markov chain indexed by a binary tree is a nonhomogeneous Markov chain indexed by a binary tree if and only if, for any \(t\in T_{2}\), the states of the two successors of t are conditionally independent given the state of t.
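Condition (14) can be checked numerically for a given \(P_t(\cdot ,\cdot |x)\): if the factorization holds, the factors must be the two marginals, so it suffices to compare the joint table with the product of its marginals. The Python sketch below does this; the function name `is_product_form` and the two example tables are hypothetical.

```python
def is_product_form(P_x, states, tol=1e-12):
    """Check (14) for one fixed x, where P_x maps (y1, y2) to P_t(y1, y2 | x)."""
    # marginal laws of the first and second successor given X_t = x
    m1 = {y: sum(P_x.get((y, z), 0.0) for z in states) for y in states}
    m2 = {z: sum(P_x.get((y, z), 0.0) for y in states) for z in states}
    return all(abs(P_x.get((y, z), 0.0) - m1[y] * m2[z]) <= tol
               for y in states for z in states)

independent = {(0, 0): 0.42, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.12}
correlated = {(0, 0): 0.5, (1, 1): 0.5}

print(is_product_form(independent, (0, 1)))  # True: marginals (0.7, 0.3) and (0.6, 0.4)
print(is_product_form(correlated, (0, 1)))   # False: successors always agree
```

A chain passes this check for every t and x exactly when it is also a tree-indexed Markov chain in the sense of Definition 2.2.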

Let T be a tree, \(\{X_{t},t\in T\}\) be a stochastic process indexed by tree T taking values in countable state space G. Denote \(P(x^{L^{n}_{m}}) = {\mathbf {P}}(X^{L^{n}_{m}} = x^{L^{n}_{m}})\). Let \(\{a_{n},n\ge 0\}\) and \(\{\phi (n),n\ge 0\}\) be two sequences of nonnegative integers such that \(\lim _{n\rightarrow \infty }\phi (n) = \infty \). Define

$$\begin{aligned} f_{a_{n},\phi (n)}(\omega ) = - \frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}} |}\ln P(X^{L^{a_{n}+\phi (n)}_{a_{n}}} ), \end{aligned}$$
(15)

\(f_{a_{n},\phi (n)}(\omega )\) will be called the generalized entropy density of \(X^{L^{a_{n}+\phi (n)}_{a_{n}}}\). Particularly, if \(a_{n} \equiv 0\) and \(\phi (n) = n\), \(f_{a_{n},\phi (n)}(\omega )\) will become the classical entropy density of \(X^{T^{(n)}}\) defined as follows

$$\begin{aligned} f_{n}(\omega )\doteq f_{0,n}(\omega ) = - \frac{ 1}{|T^{(n)}|}\ln P(X^{T^{(n)}}). \end{aligned}$$
(16)

Obviously, if \(\{X_{t},t\in T_{2}\}\) is a nonhomogeneous bifurcating Markov chain indexed by a binary tree defined by Definition 2.1, it follows from (7) that

$$\begin{aligned} f_{a_{n},\phi (n)}(\omega ) = - \frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\Big [\ln P(X^{L_{a_{n}}})+ \sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}} }\ln P_{t}(X_{t^{1}},X_{t^{2}}|X_{t})\Big ], \end{aligned}$$
(17)

and

$$\begin{aligned} f_{n}(\omega ) = - \frac{ 1 }{\left| T^{(n)}\right| }\Big [\ln P(X_{o}) +\sum _{t\in T^{(n-1)}}\ln P_{t}(X_{t^{1}},X_{t^{2}}|X_{t})\Big ]. \end{aligned}$$
(18)
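Formula (18) can be evaluated directly for a given realization. The following Python sketch does so under the same illustrative assumptions as before: \(G=\{0,1\}\), heap labeling of vertices (root 1, successors 2t and 2t+1), and a hypothetical family `kernel`. It covers only the classical case (18); the generalized density (17) additionally needs the level marginal \(P(X^{L_{a_n}})\), which is not computed here.

```python
import math

def kernel(t, x):
    """Hypothetical P_t(., . | x) on G = {0, 1}; returns {(y1, y2): probability}."""
    shift = 0.1 / t
    return {(x, x): 0.5 - shift, (x, 1 - x): 0.2,
            (1 - x, x): 0.2, (1 - x, 1 - x): 0.1 + shift}

def entropy_density(x, n, p, P):
    """f_n of (18) for a realization x (dict vertex -> state) on T^(n)."""
    log_prob = math.log(p[x[1]])                    # ln P(X_o)
    for t in range(1, 2 ** n):                      # t ranges over T^(n-1)
        log_prob += math.log(P(t, x[t])[(x[2 * t], x[2 * t + 1])])
    return -log_prob / (2 ** (n + 1) - 1)           # divide by |T^(n)|

x = {1: 0, 2: 0, 3: 1}                              # a realization on T^(1)
print(entropy_density(x, 1, [0.5, 0.5], kernel))    # -(ln 0.5 + ln 0.2)/3
```

For this realization the value is \(-\ln (0.1)/3\), since the root contributes \(\ln 0.5\) and the single transition contributes \(\ln 0.2\).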

Property 2.3

(Yang and Yang [28]) Let \(T_{2}\) be a binary tree, \(G = \{0,1,\ldots ,b - 1\}\) a finite state space and \(\{X_{t},t\in T_{2}\}\) a tree-indexed stochastic process taking values in G. Let \(f_{a_{n},\phi (n)}(\omega )\) be defined by (15). Then \(f_{a_{n},\phi (n)}(\omega )\) are uniformly integrable.

3 Main Results

Let \(G=\{0,1,\ldots ,b-1\}\) be a finite state space, and let \(\{X_{t},t \in T_{2}\}\) be a G-valued nonhomogeneous bifurcating Markov chain indexed by a binary tree as defined above. For \(k\in G\), let \(S_{k}(L^{a_{n}+\phi (n)}_{a_{n}})\) be the number of occurrences of k among the random variables \(\{X_{t},t \in L_{a_{n}}^{a_n+\phi (n)}\}\), let \(S_{k}(L_{a_{n}})\) be the number of occurrences of k among \(\{X_{t},t \in L_{a_{n}}\}\), and let \(S^{i}_{k}(L^{a_{n}+\phi (n)-1}_{a_{n}})\) be the number of occurrences of k among \(\{X_{t^{i}},t \in L^{a_{n}+\phi (n)-1}_{a_{n}}\}\), \(i = 1,2\). That is,

$$\begin{aligned}&S_{k}(L^{a_{n}+\phi (n)}_{a_{n}}) =|\{t\in L^{a_{n}+\phi (n)}_{a_{n}}: X_{t} = k\}|; \\&S_{k}(L_{a_{n}}) = |\{t\in L_{a_{n}} : X_{t} = k\}|; \end{aligned}$$

and

$$\begin{aligned} S^{i}_{k}(L^{a_{n}+\phi (n)-1}_{a_{n}}) = |\{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}: X_{t^{i}}=k\}|, \quad i = 1,2. \end{aligned}$$

It follows that

$$\begin{aligned}&S_{k}(L^{a_{n}+\phi (n)}_{a_{n}}) =\sum _{t\in L^{a_{n}+\phi (n)}_{a_{n}}}I_{k}(X_{t}); \end{aligned}$$
(19)
$$\begin{aligned}&S_{k}(L_{a_{n}}) = \sum _{t\in L_{a_{n}}}I_{k}(X_{t}); \end{aligned}$$
(20)
$$\begin{aligned}&S^{1}_{k}(L^{a_{n}+\phi (n)-1}_{a_{n}})=\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}I_{k}(X_{t^{1}}); \end{aligned}$$
(21)
$$\begin{aligned}&S^{2}_{k}(L^{a_{n}+\phi (n)-1}_{a_{n}})=\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}I_{k}(X_{t^{2}}); \end{aligned}$$
(22)

and

$$\begin{aligned} S_{k}(L^{a_{n}+\phi (n)}_{a_{n}}) = S^{1}_{k}(L^{a_{n}+\phi (n)-1}_{a_{n}})+S^{2}_{k}(L^{a_{n}+\phi (n)-1}_{a_{n}})+S_{k}(L_{a_{n}}), \end{aligned}$$
(23)

where

$$\begin{aligned} I_{k}(i)=\left\{ \begin{array}{cc} 1,\ \ &{} i=k,\\ 0,\ \ &{} i\ne k. \end{array} \right. \end{aligned}$$

In this section, we will establish the strong law of large numbers for the frequencies of occurrence of states and the generalized entropy ergodic theorem for finite nonhomogeneous bifurcating Markov chains indexed by a binary tree. Firstly, we give the strong law of large numbers for the frequencies of occurrence of states for these chains with finite state space.

Theorem 3.1

Let \(G = \{0,1,\ldots ,b - 1\}\) be a finite state space, let \(\{X_{t},t\in T_{2}\}\) be a G-valued nonhomogeneous bifurcating Markov chain indexed by a binary tree \(T_{2}\) with stochastic matrices \(\{P_{t},t\in T_{2}\}\) defined by Definition 2.1, and let \(S_{k}(L^{a_{n}+\phi (n)}_{a_{n}})\) be defined by (19). Let \(P = (P(y_{1},y_{2}|x)),x,y_{1},y_{2}\in G\) be another stochastic matrix, and let \(P_{1}(y_{1}|x) = \sum _{y_{2}\in G}P(y_{1},y_{2}|x)\), \(P_{2}(y_{2}|x) = \sum _{y_{1}\in G}P(y_{1},y_{2}|x)\), \(P_{1}= (P_{1}(y|x))\), \(P_{2}= (P_{2}(y|x))\). Let \(Q = \frac{1}{2}(P_{1} + P_{2})\), and assume that the transition matrix Q is ergodic. Let \(\{a_{n},n\ge 0\}\) and \(\{\phi (n),n\ge 0\}\) be two nonnegative integer sequences such that for any positive integers m and n,

$$\begin{aligned} \phi (m + n)-\phi (n)\ge m. \end{aligned}$$
(24)

If \(\forall x,y_{1},y_{2} \in G\),

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{|T^{(n)}|}\sum _{t\in T^{(n-1)}}|P_{t}(y_{1},y_{2}|x)-P(y_{1},y_{2}|x)|=0, \end{aligned}$$
(25)

then

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{S_{k}(L^{a_{n}+\phi (n)}_{a_{n}})}{|L^{a_{n}+\phi (n)}_{a_{n}}|}= \pi (k) \quad \mathrm{a.e.} \quad \forall k \in G, \end{aligned}$$
(26)

where \(\pi =\{\pi (0),\pi (1),\ldots ,\pi (b-1)\}\) is the unique stationary distribution determined by the transition matrix Q.

The proof of the above theorem will be given in Sect. 4.
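The limit \(\pi \) in (26) depends only on the averaged marginal matrix \(Q=\frac{1}{2}(P_1+P_2)\). The Python sketch below builds Q from a hypothetical limiting matrix P on \(G=\{0,1\}\) and approximates \(\pi \) by power iteration, which converges when Q is ergodic. The data and function names are illustrative assumptions.

```python
def averaged_marginal(P, b):
    """Q = (P_1 + P_2)/2, where P[x] maps (y1, y2) to P(y1, y2 | x)."""
    Q = [[0.0] * b for _ in range(b)]
    for x in range(b):
        for (y1, y2), w in P[x].items():
            Q[x][y1] += 0.5 * w      # contribution to the marginal P_1(y1 | x)
            Q[x][y2] += 0.5 * w      # contribution to the marginal P_2(y2 | x)
    return Q

def stationary(Q, iters=200):
    """Approximate the stationary distribution of Q by iterating pi -> pi Q."""
    b = len(Q)
    pi = [1.0 / b] * b
    for _ in range(iters):
        pi = [sum(pi[x] * Q[x][y] for x in range(b)) for y in range(b)]
    return pi

# hypothetical limiting matrix P on G = {0, 1}
P = [
    {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1},   # x = 0
    {(1, 1): 0.3, (1, 0): 0.3, (0, 1): 0.2, (0, 0): 0.2},   # x = 1
]
Q = averaged_marginal(P, 2)
pi = stationary(Q)
print(Q)    # approximately [[0.7, 0.3], [0.45, 0.55]]
print(pi)   # approximately [0.6, 0.4]
```

For this example \(\pi =(0.6,0.4)\) can be confirmed by hand from \(\pi Q=\pi \), so under the hypotheses of Theorem 3.1 the frequency of state 0 over the level blocks would tend to 0.6.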

In the following, we will study the generalized entropy ergodic theorem for nonhomogeneous bifurcating Markov chains indexed by a binary tree with finite state space \(G=\{0,1,\ldots ,b-1\}\).

Theorem 3.2

Under the conditions of Theorem 3.1, let \(f_{a_{n},\phi (n)}(\omega )\) be as defined in (17) and let \(\{a_{n},n\ge 0\}\) be a bounded sequence of nonnegative integers. Then

$$\begin{aligned} \lim _{n\rightarrow \infty }f_{a_{n},\phi (n)}(\omega )=-\frac{1}{2}\sum ^{b-1}_{l=0}\pi (l)\sum ^{b-1}_{k_{1}=0}\sum ^{b-1}_{k_{2}=0}P(k_{1},k_{2}|l)\ln P(k_{1},k_{2}|l) \quad \mathrm{a.e.} \end{aligned}$$
(27)

The proof of the above theorem will be presented in Sect. 4.
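The right-hand side of (27) is a finite sum and can be evaluated directly. The Python sketch below computes it for a hypothetical limiting matrix P on \(G=\{0,1\}\) with \(\pi =(0.6,0.4)\), which is the stationary distribution of the corresponding \(Q=\frac{1}{2}(P_1+P_2)\) with rows (0.7, 0.3) and (0.45, 0.55). The function name `entropy_limit` is illustrative.

```python
import math

def entropy_limit(P, pi):
    """Right-hand side of (27): -(1/2) sum_l pi(l) sum_{k1,k2} P ln P."""
    total = 0.0
    for l, row in enumerate(P):
        for w in row.values():
            if w > 0:                 # convention 0 ln 0 = 0
                total += pi[l] * w * math.log(w)
    return -0.5 * total

# hypothetical limiting matrix P; P[l] maps (k1, k2) to P(k1, k2 | l)
P = [
    {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1},
    {(1, 1): 0.3, (1, 0): 0.3, (0, 1): 0.2, (0, 0): 0.2},
]
print(entropy_limit(P, [0.6, 0.4]))

# sanity check: if both successors are uniform on G x G, the limit is ln 2
uniform = [{(a, c): 0.25 for a in (0, 1) for c in (0, 1)} for _ in (0, 1)]
print(math.isclose(entropy_limit(uniform, [0.5, 0.5]), math.log(2)))  # True
```

The factor \(\frac{1}{2}\) reflects the normalization by the number of vertices in the block, which is asymptotically twice the number of transitions \(t\mapsto (t^1,t^2)\) counted in (17).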

Remark 3.1

Letting \(a_{n}\equiv 0,\phi (n) = n\) in Theorem 3.2, we immediately get the entropy ergodic theorem for nonhomogeneous bifurcating Markov chains indexed by a binary tree with finite state space G (see Dang et al. [9]).

Remark 3.2

From Property 2.3, we know that \(f_{a_{n},\phi (n)}(\omega )\) are uniformly integrable. Thus (27) also holds with \(L_{1}\) convergence.

We denote by \( g_{a_{n},\phi (n)}(\omega )\) the generalized entropy density of nonhomogeneous Markov chains indexed by a tree with the initial distribution (8) and transition matrices (9). From (12), it is easy to see that

$$\begin{aligned} g_{a_{n},\phi (n)}(\omega )=- \frac{1}{|L^{a_{n}+\phi (n)}_{a_n}|}\Big [\ln P(X^{L_{a_{n}}}) +\sum _{t\in L^{a_{n}+\phi (n)}_{a_{n}+1}}\ln Q_{t}(X_{t}|X_{1_{t}})\Big ]. \end{aligned}$$
(28)

By Theorem 3.2, we can establish the generalized entropy ergodic theorem for nonhomogeneous Markov chains indexed by a binary tree.

Corollary 3.1

Let \(T_{2}\) be a binary tree, \(G =\{0,1,2,\ldots ,b-1\}\) be a finite state space, \(\{X_{t},t\in T_{2}\}\) be a G-valued nonhomogeneous Markov chain indexed by \(T_{2}\) with the transition matrices (9) defined by Definition 2.2. Let \(Q=(Q(k|l)),k,l\in G\) be another transition matrix, and assume that Q is ergodic. Let \(\{a_{n},n\ge 0\}\) be a bounded sequence of nonnegative integers and \(\{\phi (n),n \ge 0\}\) be a sequence of nonnegative integers such that for any positive integers m and n,

$$\begin{aligned} \phi (m + n)- \phi (n)\ge m. \end{aligned}$$

If

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{|T^{(n)}|}\sum _{t\in T^{(n)}\backslash \{o\}}|Q_{t}(k|l)-Q(k|l)| = 0,\quad \forall k,l\in G, \end{aligned}$$
(29)

then

$$\begin{aligned} \lim _{n\rightarrow \infty }g_{a_{n},\phi (n)}(\omega )= - \sum ^{b-1}_{l=0}\sum ^{b-1}_{k=0}\pi (l)Q(k|l)\ln Q(k|l) \quad \mathrm{a.e.},\quad \end{aligned}$$
(30)

where \(\pi = \{\pi (0),\ldots ,\pi (b -1)\}\) is the unique stationary distribution determined by the transition matrix Q.

The proof of the above corollary will be given in Sect. 4.

Remark 3.3

Taking \(a_{n} \equiv 0,\phi (n) = n\) in Corollary 3.1, we obtain the entropy ergodic theorem for nonhomogeneous Markov chains indexed by a Cayley tree \(T_{C,2}\) with finite state space G. The result is a special case of Dong, Yang and Bai [11] for \(N=2\).

If each vertex of the tree has only one successor, nonhomogeneous Markov chains indexed by a tree degenerate into ordinary nonhomogeneous Markov chains. Similarly, we denote by \(h_{a_{n},\phi (n)}(\omega ) \) the generalized entropy density of a nonhomogeneous Markov chain with the initial distribution \(\big \{\mu _{0}(0),\ldots ,\mu _{0}(b-1)\big \}\) and transition matrices \(P_{n} = (p_{n}(i,j)),\ \ i,j \in G\). It easily follows that

$$\begin{aligned} h_{a_{n},\phi (n)}(\omega ) =- \frac{1}{\phi (n)}\bigg \{\log \mu _{a_{n}}(X_{a_{n}}) + \sum ^{a_{n}+\phi (n)}_{k=a_{n}+1 }\log p_{k}(X_{k-1},X_{k})\bigg \}, \end{aligned}$$
(31)

where \(\mu _{a_{n}}(x)\) is the distribution of \(X_{a_{n}}\). Thus we can get the generalized entropy ergodic theorem for nonhomogeneous Markov chains.

Corollary 3.2

Suppose \(\{X_{n},n\ge 0\}\) is a nonhomogeneous Markov chain taking values in a finite state space \(G = \{0,1,\ldots ,b-1\}\) with the initial distribution \(\big \{\mu _{0}(0),\ldots ,\mu _{0}(b-1)\big \}\) and the transition matrices \(\big \{P_{n} = (p_{n}(i,j)),\ \ i,j \in G,n = 1,2,\ldots \big \}\), where \(p_{n}(i,j) = {\mathbf {P}}(X_{n} = j | X_{n-1} = i)\). Let \(\{a_{n},n\ge 0\}\) be a bounded sequence of nonnegative integers and \(\{\phi (n),n \ge 0\}\) be a sequence of nonnegative integers such that for any positive integers m and n,

$$\begin{aligned} \phi (m + n)- \phi (n)\ge m. \end{aligned}$$

Let \(P = (p(i,j))\) be another transition matrix, and assume that P is irreducible. If

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\sum ^{n}_{k=1}|p_{k}(i,j)-p(i,j)|=0, \end{aligned}$$
(32)

then

$$\begin{aligned} \lim _{n\rightarrow \infty }h_{a_{n},\phi (n)}(\omega ) =- \sum ^{b-1}_{i=0}\sum ^{b-1}_{j=0}\pi (i)p(i,j)\log p(i,j) \quad \mathrm{a.e.}, \end{aligned}$$
(33)

where \(\pi \) is the unique stationary distribution determined by the transition matrix P.

Proof

The corollary is a special case of Corollary 3.1, where \(T_2\) is the set of nonnegative integers \({\mathbb {N}}\). \(\square \)

Remark 3.4

Note that

$$\begin{aligned}&\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}|p_k(i,j)-p(i,j)|\\&\quad \le (1+\frac{a_n}{\phi (n)})\frac{1}{a_n+\phi (n)}\sum _{k=1}^{a_n+\phi (n)}|p_k(i,j)-p(i,j)|, \end{aligned}$$

and \(\{a_n\}\) is bounded, by (32), we have that

$$\begin{aligned} \lim \limits _{n\rightarrow \infty }\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}|p_k(i,j)-p(i,j)|=0. \end{aligned}$$

Thus, we can immediately obtain the results of Wang and Yang [27] on the generalized entropy ergodic theorem for delayed sums of nonhomogeneous Markov chains.

Remark 3.5

If \(a_{n} \equiv 0,\phi (n) = n\) in Corollary 3.2, we can get the entropy ergodic theorem of nonhomogeneous Markov chains (see Yang, [29]).

4 The Proofs

Before providing the proofs of the main results in Sect. 3, we begin with some lemmas.

Lemma 4.1

Let \(T_{2}\) be a binary tree, and G be a countable state space. Let \(\{X_{t},t\in T_{2}\}\) be a G-valued nonhomogeneous bifurcating Markov chain indexed by the binary tree \(T_2\) defined by Definition 2.1, and let \(\{g_{t}(x,y_{1},y_{2}),t \in T_{2}\}\) be a collection of functions defined on \(G^{3}\). Suppose that \(\exists \alpha > 0\), s.t. \(E[e^{\alpha |g_{t}(X_{t},X_{t^{1}},X_{t^{2}})|}] < \infty ,\forall t\in T_{2}\). Let \(\{a_{n},n \ge 0\}\) and \(\{\phi (n),n\ge 0\} \) be two sequences of nonnegative integers such that \(\phi (n)\) tends to infinity as \(n \rightarrow \infty \). Assume that for \(\forall \varepsilon > 0\),

$$\begin{aligned} \sum ^{\infty }_{n=1} \exp (-|L^{a_{n}+\phi (n)}_{a_{n}}|\varepsilon ) < \infty . \end{aligned}$$
(34)

Let

$$\begin{aligned} H_{a_{n},\phi (n)}(\omega ) = \sum _{ t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}g_{t}(X_{t},X_{t^{1}},X_{t^{2}}), \end{aligned}$$
(35)

and

$$\begin{aligned} G_{a_{n},\phi (n)}(\omega ) = \sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}E\big [g_{t}(X_{t},X_{t^{1}},X_{t^{2}})|X_{t}\big ]. \end{aligned}$$
(36)

Let \(\alpha > 0\), and set

$$\begin{aligned} D(\alpha )= & {} \Bigg \{\omega : \limsup _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)}_{a_{n}+1} }E\big [g^{2}_{t}(X_{t},X_{t^{1}},X_{t^{2}})e^{\alpha |g_{t}(X_{t},X_{t^{1}},X_{t^{2}})|}|X_{t}\big ]\nonumber \\= & {} M(\alpha ;\omega )< \infty \Bigg \}. \end{aligned}$$
(37)

Then we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{H_{a_{n},\phi (n)}(\omega )-G_{a_{n},\phi (n)}(\omega )}{|L^{a_{n}+\phi (n)} _{a_{n}}|}= 0 \quad \mathrm{a.e.} \quad \omega \in D(\alpha ). \end{aligned}$$
(38)

Remark 4.1

It is easy to see that if \(\{g_{t}(x,y_{1},y_{2}),t \in T_{2}\}\) is a collection of uniformly bounded functions, then for any \(\alpha > 0, D(\alpha ) = \Omega \), thus we can get

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{H_{a_{n},\phi (n)}(\omega )-G_{a_{n},\phi (n)}(\omega )}{|L^{a_{n}+\phi (n)}_{a_{n}}|}= 0 \quad \mathrm{a.e.}. \end{aligned}$$

Remark 4.2

Let \(a_{n} = 0\) and \(\phi (n) = [\log _{2}n^{\alpha }](\alpha > 0)\). Since \(T_{2}\) is a binary tree, we have

$$\begin{aligned} |L^{a_{n}+\phi (n)}_{a_{n}}| = 2^{[\log _{2}n^{\alpha }]+1}-1 \ge 2^{\log _{2}n^{\alpha }-1+1}-1 = n^{\alpha }-1, \end{aligned}$$

where \([\cdot ]\) is the usual greatest integer function. In this case (34) holds.
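The bound in Remark 4.2 and the summability condition (34) can be explored numerically. The following Python sketch, with hypothetical helper names, computes \(|L^{a_n+\phi (n)}_{a_n}|\) on the binary tree, checks the inequality \(|L^{\phi (n)}_{0}|\ge n^{\alpha }-1\) for a range of n, and shows that partial sums of (34) stabilize. It is an illustration for particular parameter values, not a proof.

```python
import math

def block_size(a, phi):
    """|L_a^{a+phi}| on the binary tree: 2^a + ... + 2^(a+phi)."""
    return 2 ** (a + phi + 1) - 2 ** a

def phi(n, alpha):
    """phi(n) = [log_2 n^alpha], the integer part of alpha * log_2 n."""
    return math.floor(alpha * math.log2(n))

def partial_sum(N, alpha, eps):
    """Partial sum of the series in (34) with a_n = 0."""
    return sum(math.exp(-block_size(0, phi(n, alpha)) * eps)
               for n in range(1, N + 1))

# the inequality of Remark 4.2 for alpha = 2 and n = 1, ..., 499
print(all(block_size(0, phi(n, 2)) >= n ** 2 - 1 for n in range(1, 500)))  # True

# partial sums for alpha = 1, eps = 0.1 barely change beyond n = 1000
print(partial_sum(1000, 1, 0.1), partial_sum(2000, 1, 0.1))
```

The terms with n > 1000 are below \(e^{-100}\) in this example, which is why the two partial sums agree to machine precision.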

Proof

Let \(\lambda \) be a nonzero real number. For fixed n, define

$$\begin{aligned} t_{a_{n},m}(\lambda ,\omega ) =\frac{e^{\lambda \sum _{t\in L^{a_{n}+m-1} _{a_{n}}}g_{t}(X_{t},X_{t^{1}},X_{t^{2}})}}{\prod \limits _{t \in L^{a_{n}+m-1}_{a_{n}}}E\big [e^{\lambda g_{t}(X_{t},X_{t^{1}},X_{t^{2}})}|X_{t}\big ]},\ \ \ m = 1,2,\ldots ,\phi (n). \end{aligned}$$
(39)

Note that

$$\begin{aligned}&E\Big [e^{\lambda \sum _{t\in L_{a_{n}+\phi (n)-1}}g_{t}(X_{t},X_{t^{1}},X_{t^{2}})}|X^{T^{(a_{n}+\phi (n)-1)}}\Big ]\nonumber \\&\quad = \sum _{t \in L_{a_{n}+\phi (n)-1},(x_{t^{1}},x_{t^{2}})\in G^{2}}e^{\lambda \sum _{t\in L_{a_{n}+\phi (n)-1}} g_{t}(X_{t},x_{t^{1}},x_{t^{2}})}\nonumber \\&\quad \quad \cdot \mathbf {P}(X^{L_{a_{n}+\phi (n)}}=x^{L_{a_{n}+\phi (n)}}|X^{T^{(a_{n}+\phi (n)-1)}})\nonumber \\&\quad = \sum _{t\in L_{a_{n}+\phi (n)-1},(x_{t^{1}},x_{t^{2}})\in G^{2}}e^{\lambda \sum _{t\in L_{a_{n}+\phi (n)-1}}g_{t}(X_{t},x_{t^{1}},x_{t^{2}})}\cdot \prod _{t\in L_{a_{n}+\phi (n)-1}}P_{t}(x_{t^{1}},x_{t^{2}}|X_{t})\nonumber \\&\quad = \prod _{t\in L_{a_{n}+\phi (n)-1}}\sum _{(x_{t^{1}},x_{t^{2}})\in G^{2}}e^{\lambda g_{t}(X_{t},x_{t^{1}},x_{t^{2}})}P_{t}(x_{t^{1}},x_{t^{2}}|X_{t})\nonumber \\&\quad = \prod _{t\in L_{a_{n}+\phi (n)-1}}E\Big [e^{\lambda g_{t}(X_{t},X_{t^{1}},X_{t^{2}})}|X_t\Big ]. \end{aligned}$$
(40)
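The factorization in (40) rests on the conditional independence of the children pairs given the previous generation. A minimal Python sketch of this step, under assumptions made only for illustration (a small alphabet, three parent nodes, randomly drawn conditional laws and bounded test functions, and the hypothetical function name `factorization_sides`), compares the brute-force conditional expectation with the product of the per-node expectations.

```python
import itertools
import math
import random

def factorization_sides(b=2, num_parents=3, lam=0.7, seed=0):
    """Return (lhs, rhs) for randomly drawn kernels and test functions.

    lhs: E[exp(lam * sum_t g_t) | previous generation], obtained by brute-force
         enumeration of all configurations of the children pairs;
    rhs: product over the parents t of E[exp(lam * g_t) | X_t].
    """
    rng = random.Random(seed)
    pairs = list(itertools.product(range(b), repeat=2))

    kernels, gs = [], []
    for _ in range(num_parents):
        weights = [rng.random() + 0.1 for _ in pairs]
        total = sum(weights)
        # Conditional law of the two children given the (fixed) parent state.
        kernels.append({p: w / total for p, w in zip(pairs, weights)})
        # Arbitrary bounded values g_t(X_t, y1, y2) for this parent.
        gs.append({p: rng.uniform(-1.0, 1.0) for p in pairs})

    lhs = 0.0
    for config in itertools.product(pairs, repeat=num_parents):
        # Children pairs of different parents are conditionally independent.
        prob = math.prod(kernels[t][pair] for t, pair in enumerate(config))
        exponent = sum(gs[t][pair] for t, pair in enumerate(config))
        lhs += math.exp(lam * exponent) * prob

    rhs = math.prod(
        sum(math.exp(lam * gs[t][p]) * kernels[t][p] for p in pairs)
        for t in range(num_parents)
    )
    return lhs, rhs
```

The two returned values agree up to floating-point error, which is the finite analogue of interchanging the sum and the product in (40).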

It is easy to see that \(E [t_{a_{n},1}(\lambda ,\omega )] = 1\). Hence by (40),

$$\begin{aligned}&E[t_{a_{n},\phi (n)}(\lambda ,\omega )]\nonumber \\&\quad = E\left[ E[t_{a_{n},\phi (n)}(\lambda ,\omega )|X^{T^{(a_{n}+\phi (n)-1)}}]\right] \nonumber \\&\quad = E\Bigg [E\Big [t_{a_{n},\phi (n)-1}(\lambda ,\omega )\frac{e^{\lambda \sum _{t\in L_{a_{n}+\phi (n)-1}}g_{t}(X_{t},X_{t^{1}},X_{t^{2}})}}{\prod \limits _{t\in L_{a_{n}+\phi (n)-1}}E[e^{\lambda g_{t}(X_{t},X_{t^{1}},X_{t^{2}})}|X_{t}]}|X^{T^{(a_{n}+\phi (n)-1)}}\Big ]\Bigg ]\nonumber \\&\quad = E\Bigg [t_{a_{n},\phi (n)-1}(\lambda ,\omega )\cdot \frac{E\Big [e ^{\lambda \sum _{t\in L_{a_{n}+\phi (n)-1}}g_{t}(X_{t},X_{t^{1}},X_{t^{2}})} \mid X^{T^{(a_{n}+\phi (n)-1)}}\Big ]}{\prod \limits _{t\in L_{a_{n}+\phi (n)-1}}E[e^{\lambda g_{t}(X_{t},X_{t^{1}},X_{t^{2}})}|X_{t}]}\Bigg ]\nonumber \\&\quad = E[t_{a_{n},\phi (n)-1}(\lambda ,\omega )] =\cdots = E [t_{a_{n},1}(\lambda ,\omega )] = 1. \end{aligned}$$
(41)

By Markov's inequality, (34) and (41), for any \(\varepsilon > 0\), we have

$$\begin{aligned}&\sum ^{\infty }_{n=1}P\Big [\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|} \ln t_{a_{n},\phi (n)}(\lambda ,\omega ) \ge \varepsilon \Big ] \nonumber \\&\quad = \sum ^{\infty }_{n=1} P\left[ t_{a_{n},\phi (n)}(\lambda ,\omega ) \ge \exp (|L^{a_{n}+\phi (n)}_{a_{n}}|\cdot \varepsilon )\right] \nonumber \\&\quad \le \sum ^{\infty }_{n=1} \exp (-|L^{a_{n}+\phi (n)}_{a_{n}}|\cdot \varepsilon ) <\infty . \end{aligned}$$
(42)

By the Borel–Cantelli lemma and the arbitrariness of \(\varepsilon \), we have

$$\begin{aligned} \limsup _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\ln t_{a_{n},\phi (n)}(\lambda ,\omega ) \le 0 \quad \mathrm{a.e.}. \end{aligned}$$
(43)

Noticing that

$$\begin{aligned} \frac{\ln t_{a_{n},\phi (n)}(\lambda ,\omega ) }{|L^{a_{n}+\phi (n)}_{a_{n}}|}&= \frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}| }\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\Big \{\lambda g_{t}(X_{t},X_{t^{1}},X_{t^{2}})\nonumber \\&\quad -\ln E[e^{\lambda g_{t}(X_{t},X_{t^{1}},X_{t^{2}})}|X_{t}]\Big \}. \end{aligned}$$
(44)

By (43) and (44), we have

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|} \sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\Big \{\lambda g_{t}(X_{t},X_{t^{1}},X_{t^{2}})\nonumber \\&\quad -\, \ln E[e^{\lambda g_{t}(X_{t},X_{t^{1}},X_{t^{2}})}|X_{t}]\Big \}\le 0\ \ \ \mathrm{a.e.}. \end{aligned}$$
(45)

Let \(0 < \lambda \le \alpha \). Dividing both sides of (45) by \(\lambda \), we have

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|} \sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\Big \{g_{t}(X_{t},X_{t^{1}},X_{t^{2}})\nonumber \\&\quad -\,\frac{\ln E[e^{\lambda g_{t}(X_{t},X_{t^{1}},X_{t^{2}})}|X_{t}]}{\lambda } \Big \}\le 0\quad \mathrm{a.e.}. \end{aligned}$$
(46)

By (37), (46), and inequalities \(\ln x \le x-1 (x>0)\) and \(0\le e^{x}-1-x \le \frac{1}{2}x^{2}e^{|x|}\), we get

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\Big \{g_{t}(X_{t},X_{t^{1}},X_{t^{2}})-E[g_{t}(X_{t},X_{t^{1}},X_{t^{2}})|X_{t}]\Big \}\nonumber \\&\quad \le \limsup _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}} \Big \{\frac{\ln E[e^{\lambda g_{t}(X_{t},X_{t^{1}},X_{t^{2}})}|X_t]}{\lambda }\nonumber \\&\quad \quad -\,E[g_{t}(X_{t},X_{t^{1}},X_{t^{2}})|X_t]\Big \}\nonumber \\&\quad \le \limsup _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\Bigg \{\frac{E[e^{\lambda g_{t}(X_{t},X_{t^{1}},X_{t^{2}})}|X_{t}]-1 }{\lambda }\nonumber \\&\quad \quad -\, \frac{E[\lambda g_{t}(X_{t},X_{t^{1}},X_{t^{2}})|X_{t}] }{\lambda }\Bigg \}\nonumber \\&\quad = \limsup _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\nonumber \\&\quad \quad \frac{E\Big [e^{\lambda g_{t}(X_{t},X_{t^{1}},X_{t^{2}})} -1-\lambda g_{t}(X_{t},X_{t^{1}},X_{t^{2}})|X_t\Big ]}{\lambda }\nonumber \\&\quad \le \frac{\lambda }{2}\limsup _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}E\Big [g^{2}_{t}(X_{t},X_{t^{1}},X_{t^{2}})e^{|\lambda |\cdot |g_{t}(X_{t},X_{t^{1}},X_{t^{2}})|}|X_{t}\Big ]\nonumber \\&\quad \le \frac{\lambda }{2}\limsup _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}E\Big [g^{2}_{t}(X_{t},X_{t^{1}},X_{t^{2}})e^{|\alpha |\cdot |g_{t}(X_{t},X_{t^{1}},X_{t^{2}})|}|X_{t}\Big ]\nonumber \\&\quad = \frac{\lambda }{2}M(\alpha ;\omega ) \quad a.e. \ \ \ \omega \in D(\alpha ). \end{aligned}$$
(47)
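The chain of estimates in (47) uses the two elementary inequalities \(\ln x \le x-1\) for \(x>0\) and \(0\le e^{x}-1-x \le \frac{1}{2}x^{2}e^{|x|}\). A short Python sketch checks them on a finite grid; the grid and the function names are illustrative assumptions, and a grid check is of course no substitute for the analytic proof.

```python
import math

def log_gap(x):
    """(x - 1) - ln(x), which should be nonnegative for x > 0."""
    return (x - 1.0) - math.log(x)

def exp_remainder(x):
    """e^x - 1 - x, the remainder of exp after its first-order expansion at 0."""
    return math.expm1(x) - x

def exp_remainder_bound(x):
    """The upper bound (1/2) x^2 e^{|x|} used in (47)."""
    return 0.5 * x * x * math.exp(abs(x))

# Illustrative grid on [-5, 5] with step 0.01.
grid = [k / 100.0 for k in range(-500, 501)]
assert all(log_gap(x) >= 0.0 for x in grid if x > 0)
assert all(0.0 <= exp_remainder(x) <= exp_remainder_bound(x) + 1e-15 for x in grid)
```

In (47) the second inequality is applied with \(x=\lambda g_{t}(X_{t},X_{t^{1}},X_{t^{2}})\), which is why the factor \(\lambda ^{2}/\lambda =\lambda \) appears in front of the conditional expectation.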

Letting \(\lambda \rightarrow 0^{+}\) in (47) we have

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\Big \{g_{t}(X_{t},X_{t^{1}},X_{t^{2}})-E[g_{t}(X_{t},X_{t^{1}},X_{t^{2}})|X_{t}]\Big \}\nonumber \\&\quad \le 0 \ \ \mathrm{a.e.} \ \ \omega \in D(\alpha ). \end{aligned}$$
(48)

Taking \(-\alpha \le \lambda < 0\), we similarly get

$$\begin{aligned}&\liminf _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\Big \{g_{t}(X_{t},X_{t^{1}},X_{t^{2}})-E[g_{t}(X_{t},X_{t^{1}}, X_{t^{2}})|X_{t}]\Big \}\nonumber \\&\quad \ge 0 \ \ \mathrm{a.e.}\ \ \omega \in D(\alpha ). \end{aligned}$$
(49)

Combining (48) and (49), we obtain (38) directly. \(\square \)

Lemma 4.2

Let \(T_{2}\) be a binary tree, and let \(\{a_{n},n \ge 0\}\) and \(\{\phi (n),n \ge 0\}\) be defined as in Lemma 4.1. Let \(\{a_{t},t \in T_{2}\}\) be a collection of real numbers, and let a be a real number. If

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{|T^{(n)}|} \sum _{t\in T^{(n-1)}}|a_{t} -a| = 0, \end{aligned}$$
(50)

then

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}|a_{t} -a|= 0. \end{aligned}$$
(51)

Proof

Noticing that

$$\begin{aligned} \frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}|a_{t}-a|\le \frac{|T^{(a_{n}+\phi (n))}|}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\frac{1}{|T^{(a_{n}+\phi (n))}|}\sum _{t\in T^{(a_{n}+\phi (n)-1)}}|a_{t}-a|.\nonumber \\ \end{aligned}$$
(52)

Since \(T_{2}\) is a binary tree, we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{|T^{(a_{n}+\phi (n))}|}{|L^{a_{n}+\phi (n)}_{a_{n}}|}=\lim _{n\rightarrow \infty }\frac{2^{a_{n}+\phi (n)+1} -1}{2^{a_{n}}(2^{\phi (n)+1} -1)} = 1. \end{aligned}$$
(53)

Equation (51) immediately follows from (50), (52) and (53). \(\square \)
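The limit (53) can be made concrete by counting nodes directly. In the Python sketch below, the function names are hypothetical and the block length m plays the role of \(\phi (n)\); the node counts are computed by summing level sizes rather than by the closed forms, so the closed forms in (53) are checked as well.

```python
def tree_size(n):
    """|T^{(n)}|: number of nodes on levels 0..n of a binary tree."""
    return sum(2 ** level for level in range(n + 1))

def level_block_size(a, m):
    """|L_a^{a+m}|: number of nodes on levels a..a+m of a binary tree."""
    return sum(2 ** level for level in range(a, a + m + 1))

def ratio(a, m):
    """|T^{(a+m)}| / |L_a^{a+m}|, the quantity whose limit is taken in (53)."""
    return tree_size(a + m) / level_block_size(a, m)

# Closed forms used in (53).
assert tree_size(7) == 2 ** 8 - 1
assert level_block_size(3, 4) == 2 ** 3 * (2 ** 5 - 1)
```

A direct computation gives \(\mathrm{ratio}-1=\frac{2^{a}-1}{2^{a}(2^{m+1}-1)}<\frac{1}{2^{m+1}-1}\), so the convergence to 1 depends only on the block length tending to infinity and not on the starting level.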

Now, we present the proof of Theorem 3.1 as follows.

Proof of Theorem 3.1

It is easy to see from (24) that \(\lim \limits _{n\rightarrow \infty }\phi (n) = \infty \) and (34) is satisfied. By (25) and Lemma 4.2, we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}|P_{t}(y_{1},y_{2}|x)-P(y_{1},y_{2}|x)|=0. \end{aligned}$$
(54)

Let \(g_{t}(x,y_{1},y_{2}) = I_{k}(y_{1})\) in Lemma 4.1. Obviously, \(\{g_{t}(x,y_{1},y_{2}),t \in T_{2}\}\) is uniformly bounded. In this case,

$$\begin{aligned} H_{a_{n},\phi (n)}(\omega ) = \sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}I_{k}(X_{t^{1}}) = S^1_k(L^{a_{n}+\phi (n)-1}_{a_{n}} ), \end{aligned}$$
(55)

and

$$\begin{aligned} G_{a_{n},\phi (n)}(\omega )= & {} \sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}E[g_{t}(X_{t},X_{t^{1}},X_{t^{2}})|X_{t}]\nonumber \\= & {} \sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\sum _{(x_{t^{1}},x_{t^{2}})\in G^{2}}g_{t}(X_{t},x_{t^{1}},x_{t^{2}})\cdot P_{t}(x_{t^{1}},x_{t^{2}}|X_{t})\nonumber \\= & {} \sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\sum _{(x_{t^{1}},x_{t^{2}})\in G^{2}}I_{k}(x_{t^{1}})\cdot P_{t}(x_{t^{1}},x_{t^{2}}|X_{t}). \end{aligned}$$
(56)

From Lemma 4.1, we have

$$\begin{aligned}&\lim _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\Big \{S^{1}_{k}(L^{a_{n}+\phi (n)-1}_{a_{n}})\nonumber \\&\quad -\,\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\sum _{(x_{t^{1}},x_{t^{2}})\in G^{2}}I_{k}(x_{t^{1}})P_{t}(x_{t^{1}},x_{t^{2}}|X_{t})\Big \}=0 \ \ \mathrm{a.e.}. \end{aligned}$$
(57)

From (54), it can be easily verified that

$$\begin{aligned}&\lim _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\Big \{ \sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\sum _{(x_{t^{1}},x_{t^{2}})\in G^{2}}I_{k}(x_{t^{1}})P_{t}(x_{t^{1}},x_{t^{2}}|X_{t})\nonumber \\&\quad -\,\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\sum _{(x_{t^{1}},x_{t^{2}})\in G^{2}}I_{k}(x_{t^{1}})P(x_{t^{1}},x_{t^{2}}|X_{t})\Big \}=0.\nonumber \\ \end{aligned}$$
(58)

Since \(\sum _{x_{t^{2}}\in G}P(x_{t^{1}},x_{t^{2}}|X_{t}) = P_{1}(x_{t^{1}}|X_{t})\), we have

$$\begin{aligned}&\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\sum _{(x_{t^{1}},x_{t^{2}})\in G^{2}}I_{k}(x_{t^{1}})P(x_{t^{1}},x_{t^{2}}|X_{t})\nonumber \\&\quad = \sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}P_{1}(k|X_{t})\nonumber \\&\quad = \sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\sum ^{b-1}_{l=0}I_{l}(X_{t})P_{1}(k|l)\nonumber \\&\quad = \sum ^{b-1 }_{l=0}P_{1}(k|l)S_{l}(L^{a_{n}+\phi (n)-1}_{a_{n}}). \end{aligned}$$
(59)

By (57)–(59), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\Big \{S^{1}_{k}(L^{a_{n}+\phi (n)-1}_{a_{n}})- \sum ^{b-1 }_{l=0}P_{1}(k|l)S_{l}(L^{a_{n}+\phi (n)-1}_{a_{n}})\Big \}=0 \quad \mathrm{a.e.}. \end{aligned}$$
(60)

Let \(g_{t}(x,y_{1},y_{2}) = I_{k}(y_{2})\) in Lemma 4.1, similarly, we obtain that

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\Big \{S^{2}_{k}(L^{a_{n}+\phi (n)-1}_{a_{n}})- \sum ^{b-1}_{l=0}P_{2}(k|l)S_{l}(L^{a_{n}+\phi (n)-1}_{a_{n}})\Big \}=0 \quad \mathrm{a.e.}. \end{aligned}$$
(61)

Adding (60) and (61), and noticing that

$$\begin{aligned} 0 \le \lim _{n\rightarrow \infty }\frac{S_{k}(L_{a_{n}})}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\le \lim _{n\rightarrow \infty }\frac{|L_{a_{n}}|}{|L^{a_{n}+\phi (n)}_{a_{n}}|}= \lim _{n\rightarrow \infty }\frac{2^{a_{n}}}{2^{a_{n}}(2^{\phi (n)+1}-1)}=0, \end{aligned}$$

\(\lim \limits _{n\rightarrow \infty }\frac{|L^{a_{n}+\phi (n)}_{a_{n}}|}{|L^{a_{n}+\phi (n)-1}_{a_{n}}|}=2\), and \(Q = \frac{1}{2}(P_{1} + P_{2})\), by (23) we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\Bigg \{\frac{S_{k}(L^{a_{n}+\phi (n)}_{a_{n}})}{|L^{a_{n}+\phi (n)}_{a_{n}}|}- \sum ^{b-1}_{l=0}Q(k|l)\frac{S_{l}(L^{a_{n}+\phi (n)-1}_{a_{n}})}{|L^{a_{n}+\phi (n)-1}_{a_{n}}|}\Bigg \}=0 \quad a.e. \end{aligned}$$
(62)

Letting \(\phi '(n) = \phi (n)-1\), it is easy to see that \(\{\phi '(n),n \ge 0\}\) also satisfies (34). Using the same argument as that used to derive (62), we can prove that

$$\begin{aligned} \lim _{n\rightarrow \infty }\Bigg \{\frac{S_{k}(L^{a_{n}+\phi (n)-1}_{a_{n}})}{|L^{a_{n}+\phi (n)-1}_{a_{n}}|}- \sum ^{b-1}_{l=0}Q(k|l)\frac{S_{l}(L^{a_{n}+\phi (n)-2}_{a_{n}})}{|L^{a_{n}+\phi (n)-2}_{a_{n}}|}\Bigg \}=0 \quad \mathrm{a.e.}. \end{aligned}$$
(63)

Multiplying the k-th equality of (63) by Q(j|k), adding them together and using (62), we have

$$\begin{aligned} 0= & {} \lim _{n\rightarrow \infty }\left[ \sum ^{b-1}_{k=0}\frac{S_{k}(L^{a_{n}+\phi (n)-1}_{a_{n}})}{|L^{a_{n}+\phi (n)-1}_{a_{n}}|}Q(j|k)- \sum ^{b-1}_{k=0}\sum ^{b-1}_{l=0}\frac{S_{l}(L^{a_{n}+\phi (n)-2}_{a_{n}})}{|L^{a_{n}+\phi (n)-2}_{a_{n}}|}Q(k|l)Q(j|k)\right] \nonumber \\= & {} \lim _{n\rightarrow \infty }\Bigg \{\left[ \sum ^{b-1}_{k=0}\frac{S_{k}(L^{a_{n}+\phi (n)-1}_{a_{n}})}{|L^{a_{n}+\phi (n)-1}_{a_{n}}|}Q(j|k)- \frac{S_{j}(L^{a_{n}+\phi (n)}_{a_{n}})}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\right] \nonumber \\&+\left[ \frac{S_{j}(L^{a_{n}+\phi (n)}_{a_{n}})}{|L^{a_{n}+\phi (n)}_{a_{n}}|}- \sum ^{b-1}_{k=0}\sum ^{b-1}_{l=0}\frac{S_{l}(L^{a_{n}+\phi (n)-2}_{a_{n}})}{|L^{a_{n}+\phi (n)-2}_{a_{n}}|}Q(k|l)Q(j|k)\right] \Bigg \}\nonumber \\= & {} \lim _{n\rightarrow \infty }\left[ \frac{S_{j}(L^{a_{n}+\phi (n)}_{a_{n}})}{|L^{a_{n}+\phi (n)}_{a_{n}}|}- \sum ^{b-1}_{l=0}\frac{S_{l}(L^{a_{n}+\phi (n)-2}_{a_{n}})}{|L^{a_{n}+\phi (n)-2}_{a_{n}}|}Q^{(2)}(j|l)\right] \quad \mathrm{a.e.}. \end{aligned}$$
(64)

where \(Q^{(N)}(j|l)\) is the N-step transition probability determined by Q. By induction, we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\left[ \frac{S_{j}(L^{a_n+\phi (n)}_{a_n})}{|L^{a_{n}+\phi (n)}_{a_{n}}|}- \sum ^{b-1}_{l=0}\frac{S_{l}(L^{a_{n}+\phi (n)-N}_{a_{n}})}{|L^{a_{n}+\phi (n)-N}_{a_{n}}|}Q^{(N)}(j|l)\right] =0 \quad \mathrm{a.e.}. \end{aligned}$$
(65)

Noticing that

$$\begin{aligned} \frac{1}{|L^{a_{n}+\phi (n)-N}_{a_{n}}|}\sum ^{b-1}_{l=0}S_{l}(L^{a_{n}+\phi (n)-N}_{a_{n}})=1, \end{aligned}$$
(66)

and

$$\begin{aligned} \lim _{N\rightarrow \infty }Q^{(N)}(j|l) = \pi (j),\ \ \ j\in G. \end{aligned}$$
(67)

Equation (26) follows from (65), (66) and (67). This completes the proof of Theorem 3.1.

\(\square \)
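The conclusion of this proof, that block frequencies are governed by the stationary distribution of \(Q=\frac{1}{2}(P_{1}+P_{2})\) via (65)–(67), can be illustrated by simulation in the simplest homogeneous case. The Python sketch below is a sketch under stated assumptions: the two children are drawn independently given the parent from the marginal kernels `P1` and `P2` (a special case of a joint kernel), the matrices, depth and seed are arbitrary illustrative choices, and the function names are hypothetical.

```python
import random

def stationary(Q, steps=200):
    """Approximate pi by iterating a row vector through Q, where Q[l][k] = Q(k|l)."""
    b = len(Q)
    pi = [1.0 / b] * b
    for _ in range(steps):
        pi = [sum(pi[l] * Q[l][k] for l in range(b)) for k in range(b)]
    return pi

def simulate_levels(P1, P2, depth, seed=0):
    """Simulate a homogeneous bifurcating chain; levels[n] lists the states on level n."""
    rng = random.Random(seed)
    states = range(len(P1))
    levels = [[0]]  # the root is fixed at state 0 (an arbitrary choice)
    for _ in range(depth):
        nxt = []
        for x in levels[-1]:
            # First and second child, drawn independently given the parent x.
            nxt.append(rng.choices(states, weights=P1[x])[0])
            nxt.append(rng.choices(states, weights=P2[x])[0])
        levels.append(nxt)
    return levels

def block_frequency(levels, a, m, k):
    """Frequency of state k among the nodes on levels a, ..., a+m."""
    block = [x for level in levels[a:a + m + 1] for x in level]
    return block.count(k) / len(block)

# Illustrative kernels; Q is the average of the two marginal kernels.
P1 = [[0.9, 0.1], [0.5, 0.5]]
P2 = [[0.7, 0.3], [0.3, 0.7]]
Q = [[(p + q) / 2 for p, q in zip(r1, r2)] for r1, r2 in zip(P1, P2)]
pi = stationary(Q)
levels = simulate_levels(P1, P2, depth=15, seed=0)
print(pi[0], block_frequency(levels, a=5, m=10, k=0))
```

For these illustrative matrices the stationary distribution of Q is \((2/3,1/3)\), and the simulated frequency of state 0 over levels 5 to 15 should be close to \(2/3\), up to sampling fluctuation.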

Before presenting the proof of Theorem 3.2, we cite a lemma which will be used.

Lemma 4.3

(Dong et al. [11]) Let \(T_{2}\) be a binary tree, \(\varphi (x)\) be a bounded function defined on interval \(\bigtriangleup \), and \(\varphi \) be continuous at \(x = b(b\in \bigtriangleup )\). Let \(\{b_{t},t \in T_{2}\}\) be a collection of real numbers. If

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{|T^{(n)}|}\sum _{t\in T^{(n-1)}}|b_{t}-b|=0, \end{aligned}$$
(68)

then

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{|T^{(n)}|}\sum _{t\in T^{(n-1)}}|\varphi (b_{t})-\varphi (b)|=0. \end{aligned}$$
(69)

Proof of Theorem 3.2

Since \(\{a_{n},n\ge 0\}\) is bounded, there exists \(M\ge 0\) such that \(|a_n|\le M\) for all \(n\ge 0\). Moreover,

$$\begin{aligned} E[e^{|\ln P(X^{L_{a_{n}}})|}]=\sum _{x^{L_{a_{n}}}}e^{-\ln P(X^{L_{a_{n}}}=x^{L_{a_{n}}})}P(X^{L_{a_{n}}}=x^{L_{a_{n}}})\le b^{|L_{a_{n}}|}. \end{aligned}$$
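The bound above comes from the fact that \(e^{-\ln p}\,p=1\) for every outcome of positive probability, so the expectation simply counts the support of the distribution. A minimal Python sketch, with a hypothetical function name and arbitrary example distributions, makes this explicit.

```python
import math

def exp_abs_log_moment(probs):
    """E[exp(|ln P(X)|)] for a finite distribution given as a list of probabilities.

    Outcomes of probability zero do not contribute to the expectation.
    """
    return sum(math.exp(-math.log(p)) * p for p in probs if p > 0)

# Each outcome of positive probability contributes exactly 1.
example = [0.5, 0.25, 0.125, 0.125, 0.0]
support_size = sum(1 for p in example if p > 0)
assert abs(exp_abs_log_moment(example) - support_size) < 1e-12
```

For \(X^{L_{a_{n}}}\) the support has at most \(b^{|L_{a_{n}}|}\) points, which gives the stated inequality.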

It is easy to see from (24) that \(\{\phi (n),n\ge 0\}\) satisfies (34). By Markov inequality and (34), we have for every \(\varepsilon > 0\),

$$\begin{aligned} \sum ^{\infty }_{n=1}P\left[ \frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\big |\ln P(X^{L_{a_{n}}})\big |\ge \varepsilon \right] \le b^{2^{M}}\sum ^{\infty }_{n=1}\exp \{- \varepsilon |L^{a_{n}+\phi (n)}_{a_{n}}|\} <\infty . \end{aligned}$$
(70)

By the Borel–Cantelli lemma and the arbitrariness of \(\varepsilon \), we get

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\ln P(X^{L_{a_{n}}})=0\quad \mathrm{a.e.}. \end{aligned}$$
(71)

Let \(\varphi (x) = x \ln x\ (\varphi (0) = 0)\). It is easy to see that \(\varphi (x)\) is a continuous function on the interval [0, 1]. By Lemmas 4.2, 4.3 and (25), we have, for all \(k_{1},k_{2},l\in G\),

$$\begin{aligned}&\lim _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\left| P_{t}(k_{1},k_{2}|l)\ln P_{t}(k_{1},k_{2}|l)\right. \nonumber \\&\quad \left. -P(k_{1},k_{2}|l)\ln P(k_{1},k_{2}|l)\right| =0. \end{aligned}$$
(72)

Let \(g_{t}(x,y_{1},y_{2}) = \ln P_{t}(y_{1},y_{2}|x)\) for all \(t\in T_2\) in Lemma 4.1. By (35) and (36), we have

$$\begin{aligned}&H_{a_{n},\phi (n)}(\omega ) =\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\ln P_{t}(X_{t^{1}},X_{t^{2}}|X_{t}),\end{aligned}$$
(73)
$$\begin{aligned}&G_{a_{n},\phi (n)}(\omega ) = \sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\sum _{(x_{t^{1}},x_{t^{2}})\in G^{2}} P_{t}(x_{t^{1}},x_{t^{2}}|X_{t})\ln P_{t}(x_{t^{1}},x_{t^{2}}|X_{t}). \end{aligned}$$
(74)

Let \(\alpha =\frac{1}{2}\). For any \(t\in T_{2}\), we have

$$\begin{aligned}&E\left[ g^{2}_{t}(X_{t},X_{t^{1}},X_{t^{2}})e^{\alpha |g_{t}(X_{t},X_{t^{1}},X_{t^{2}})|}|X_{t}\right] \\&\quad = \sum _{(x_{t^{1}},x_{t^{2}})\in G^{2}}\ln ^{2}P_{t}(x_{t^{1}},x_{t^{2}}|X_{t})\cdot e^{- \frac{1}{2}\ln P_{t}(x_{t^{1}},x_{t^{2}}|X_{t})}P_{t}(x_{t^{1}},x_{t^{2}}|X_{t})\\&\quad = \sum _{(x_{t^{1}},x_{t^{2}})\in G^{2}}\ln ^{2}P_{t}(x_{t^{1}},x_{t^{2}}|X_{t})[P_{t}(x_{t^{1}},x_{t^{2}}|X_{t})]^\frac{1}{2}\\&\quad \le 16b^{2}e^{-2}. \end{aligned}$$

Moreover, for all \(t\in T_{2}\),

$$\begin{aligned} E[e^{\frac{1}{2}|g_{t}(X_{t},X_{t^{1}},X_{t^{2}})|}|X_{t}] < \infty . \end{aligned}$$
(75)

Thus

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}E[g^{2}_{t}(X_{t},X_{t^{1}},X_{t^{2}})\cdot e^{\frac{1}{2}|g_{t}(X_{t},X_{t^{1}},X_{t^{2}})|}|X_t]\nonumber \\&\quad \le 16b^{2}e^{-2}. \end{aligned}$$
(76)
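The constant \(16b^{2}e^{-2}\) comes from maximizing \(h(p)=\ln ^{2}p\cdot p^{1/2}\) on (0, 1]: the derivative vanishes only at \(\ln p=-4\), where \(h(e^{-4})=16e^{-2}\), and the sum over \(G^{2}\) has \(b^{2}\) terms. The Python sketch below checks this maximum on a finite grid; the grid resolution and names are illustrative assumptions.

```python
import math

def h(p):
    """h(p) = (ln p)^2 * sqrt(p) on (0, 1]."""
    return math.log(p) ** 2 * math.sqrt(p)

p_star = math.exp(-4.0)        # critical point: h'(p) = 0 iff ln p = -4
h_max = 16.0 * math.exp(-2.0)  # claimed maximum value of h on (0, 1]

# Illustrative grid on (0, 1] with step 1e-5.
grid = [k / 100000.0 for k in range(1, 100001)]
assert abs(h(p_star) - h_max) < 1e-12
assert all(h(p) <= h_max + 1e-12 for p in grid)
```

Each of the \(b^{2}\) summands in the preceding display is therefore at most \(16e^{-2}\), uniformly in t and in the value of \(X_{t}\).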

By (73)–(76) and Lemma 4.1, we have

$$\begin{aligned}&\lim _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\Bigg \{\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\ln P_{t}(X_{t^{1}},X_{t^{2}}|X_{t})\nonumber \\&\quad - \,\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\sum _{(x_{t^{1}},x_{t^{2}})\in G^{2}}P_{t}(x_{t^{1}},x_{t^{2}}|X_{t})\cdot \ln P_{t}(x_{t^{1}},x_{t^{2}}|X_{t})\Bigg \} = 0 \quad \mathrm{a.e.}.\qquad \end{aligned}$$
(77)

Now, we have

$$\begin{aligned}&\Bigg |\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\sum _{(x_{t^{1}},x_{t^{2}})\in G^{2}}P_{t}(x_{t^{1}},x_{t^{2}}|X_{t})\cdot \ln P_{t}(x_{t^{1}},x_{t^{2}}|X_{t})\nonumber \\&\quad -\,\frac{1}{2}\sum ^{b-1}_{l=0}\pi (l)\sum ^{b-1}_{k_{1}=0}\sum ^{b-1}_{k_{2}=0}P(k_{1},k_{2}|l)\ln P(k_{1},k_{2}|l) \Bigg | \nonumber \\&\quad \le \Bigg |\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\sum ^{b-1}_{l=0}\sum ^{b-1}_{k_{1}=0}\sum ^{b-1}_{k_{2}=0}I_{l}(X_{t})P_{t}(k_{1},k_{2}|l)\cdot \ln P_{t}(k_{1},k_{2}|l)\nonumber \\&\qquad - \,\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\sum ^{b-1}_{l=0}\sum ^{b-1}_{k_{1}=0}\sum ^{b-1}_{k_{2}=0}I_{l}(X_{t})P(k_{1},k_{2}|l)\cdot \ln P(k_{1},k_{2}|l)\Bigg |\nonumber \\&\qquad + \Bigg |\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\sum ^{b-1}_{l=0}\sum ^{b-1}_{k_{1}=0}\sum ^{b-1}_{k_{2}=0}I_{l}(X_{t})P(k_{1},k_{2}|l)\cdot \ln P(k_{1},k_{2}|l)\nonumber \\&\qquad -\,\frac{1}{2}\sum ^{b-1}_{l=0}\pi (l)\sum ^{b-1}_{k_{1}=0}\sum ^{b-1}_{k_{2}=0}P(k_{1},k_{2}|l)\ln P(k_{1},k_{2}|l)\Bigg |\nonumber \\&\quad \le \sum ^{b-1}_{l=0}\sum ^{b-1}_{k_{1}=0}\sum ^{b-1}_{k_{2}=0}\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\big |P_{t}(k_{1},k_{2}|l)\cdot \ln P_{t}(k_{1},k_{2}|l) \nonumber \\&\qquad -\,P(k_{1},k_{2}|l)\cdot \ln P(k_{1},k_{2}|l)\big |+\sum ^{b-1}_{l=0}\sum ^{b-1}_{k_{1}=0}\sum ^{b-1}_{k_{2}=0}\big |P(k_{1},k_{2}|l)\cdot \ln P(k_{1},k_{2}|l)\big |\nonumber \\&\quad \quad \cdot \left| \frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}I_{l}(X_{t})-\frac{1}{2}\pi (l)\right| \nonumber \\&\quad \le \sum ^{b-1}_{l=0}\sum ^{b-1}_{k_{1}=0}\sum ^{b-1}_{k_{2}=0}\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\big |P_{t}(k_{1},k_{2}|l)\cdot \ln P_{t}(k_{1},k_{2}|l)\nonumber \\&\qquad -\,P(k_{1},k_{2}|l)\cdot \ln P(k_{1},k_{2}|l)\big | + \sum ^{b-1}_{l=0}\sum ^{b-1}_{k_{1}=0}\sum ^{b-1}_{k_{2}=0}\big |P(k_{1},k_{2}|l)\cdot \ln P(k_{1},k_{2}|l)\big |\nonumber \\&\quad \quad \cdot \left| \frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}S_{l}(L^{a_{n}+\phi (n)-1}_{a_{n}})- \frac{1}{2}\pi (l)\right| \quad \mathrm{a.e.}. \end{aligned}$$
(78)

By Theorem 3.1, (72), (77) and (78), and noticing that \( \lim \limits _{n\rightarrow \infty }\frac{|L^{a_{n}+\phi (n)-1}_{a_{n}}|}{|L^{a_{n}+\phi (n)}_{a_{n}}|}=\frac{1}{2}\), we have

$$\begin{aligned}&\lim _{n\rightarrow \infty }\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\ln P_{t}(X_{t^{1}},X_{t^{2}}|X_{t})\nonumber \\&\quad =\frac{1}{2}\sum ^{b-1}_{l=0}\pi (l)\sum ^{b-1}_{k_{1}=0}\sum ^{b-1}_{k_{2}=0}P(k_{1},k_{2}|l)\ln P(k_{1},k_{2}|l)\quad \mathrm{a.e.}. \end{aligned}$$
(79)

Equation (27) follows from (17), (71) and (79), which completes the proof of Theorem 3.2. \(\square \)

Proof of Corollary 3.1

For all \(t\in T_{2}\) and all \(x,y_{1},y_{2}\in G\), let \(P_{t}(y_{1},y_{2}|x) = Q_{t^{1}}(y_{1}|x)Q_{t^{2}}(y_{2}|x)\). From Remark 2.4 we know that the nonhomogeneous Markov chain indexed by a binary tree given in this corollary is a nonhomogeneous bifurcating Markov chain indexed by a binary tree with the stochastic matrices \(\{P_{t}= (P_{t}(y_{1},y_{2}|x)),t \in T_{2}\}\), and

$$\begin{aligned}&g_{a_{n},\phi (n)}(\omega ) =-\frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\Big [\ln P(X^{L_{a_{n}}})+ \sum _{t\in L^{a_{n}+\phi (n)}_{a_{n}+1}}\ln Q_{t}(X_{t}|X_{1_{t}})\Big ]\nonumber \\&\quad = - \frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|} \Big [\ln P(X^{L_{a_{n}}})+ \sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\ln Q_{t^{1}}(X_{t^{1}}|X_{t}) \nonumber \\&\quad \quad +\, \sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\ln Q_{t^{2}}(X_{t^{2}}|X_{t})\Big ]\nonumber \\&\quad = - \frac{1}{|L^{a_{n}+\phi (n)}_{a_{n}}|}\Big [\ln P(X^{L_{a_{n}}}) +\sum _{t\in L^{a_{n}+\phi (n)-1}_{a_{n}}}\ln P_{t}(X_{t^{1}},X_{t^{2}}|X_{t})\Big ]\nonumber \\&\quad = f_{a_{n},\phi (n)}(\omega ). \end{aligned}$$
(80)

Let \(P(y_{1},y_{2}|x) = Q(y_{1}|x)Q(y_{2}|x)\). It is easy to see that \(P_{1} = Q,P_{2}= Q, \frac{1}{2}(P_{1}+P_{2}) = Q\), and Q is ergodic. Since

$$\begin{aligned}&|P_{t}(k_{1},k_{2}|l)-P(k_{1},k_{2}|l)|\nonumber \\&\quad = |Q_{t^{1}}(k_{1}|l)Q_{t^{2}}(k_{2}|l)-Q(k_{1}|l)Q(k_{2}|l)|\nonumber \\&\quad \le |Q_{t^{1}}(k_{1}|l)Q_{t^{2}}(k_{2}|l) - Q(k_{1}|l)Q_{t^{2}}(k_{2}|l)|+|Q(k_{1}|l)Q_{t^{2}}(k_{2}|l)\nonumber \\&\qquad -Q(k_{1}|l)Q(k_{2}|l)|\nonumber \\&\quad \le |Q_{t^{1}}(k_{1}|l)-Q(k_{1}|l)|+|Q_{t^{2}}(k_{2}|l)-Q(k_{2}|l)|, \end{aligned}$$
(81)

and by (29), for \(i = 1,2,\)

$$\begin{aligned}&\lim _{n\rightarrow \infty }\frac{1}{|T^{(n)}|}\sum _{t\in T^{(n-1)}\backslash \{o\}}|Q_{t^{i}}(k_{1}|l)-Q(k_{1}|l)|\nonumber \\&\quad \le \lim _{n\rightarrow \infty }\frac{1}{|T^{(n)}|}\sum _{t\in T^{(n)}\backslash \{o\}}|Q_{t}(k_{1}|l)-Q(k_{1}|l)| = 0, \end{aligned}$$
(82)

it follows from (81) and (82) that (25) holds. By Theorem 3.2 and (80),

$$\begin{aligned}&\lim _{n\rightarrow \infty }g_{a_{n},\phi (n)}(\omega )\nonumber \\&\quad = \lim _{n\rightarrow \infty }f_{a_{n},\phi (n)}(\omega )\nonumber \\&\quad = - \frac{1}{2}\sum ^{b-1}_{l=0}\pi (l)\sum ^{b-1}_{k_{1}=0}\sum ^{b-1}_{k_{2}=0}P(k_{1},k_{2}|l)\ln P(k_{1},k_{2}|l)\nonumber \\&\quad = - \frac{1}{2}\sum ^{b-1}_{l=0} \pi (l)\sum ^{b-1}_{k_{1}=0}\sum ^{b-1}_{k_{2}=0}Q(k_{1}|l)Q(k_{2}|l)\cdot \big [\ln Q(k_{1}|l) + \ln Q(k_{2}|l)\big ]\nonumber \\&\quad = - \sum ^{b-1}_{l=0}\sum ^{b-1}_{k=0}\pi (l)Q(k|l)\ln Q(k|l)\quad \mathrm{a.e.}. \end{aligned}$$
(83)

Thus, (30) holds. \(\square \)
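The last two equalities in (83) reduce the entropy rate of the product kernel \(P(k_{1},k_{2}|l)=Q(k_{1}|l)Q(k_{2}|l)\) to the usual Markov entropy rate of Q. A small Python sketch checks this reduction numerically for one ergodic matrix; the matrix, the power-iteration routine and the function names are illustrative assumptions.

```python
import math

def stationary(Q, steps=500):
    """Approximate pi by iterating a row vector through Q, where Q[l][k] = Q(k|l)."""
    b = len(Q)
    pi = [1.0 / b] * b
    for _ in range(steps):
        pi = [sum(pi[l] * Q[l][k] for l in range(b)) for k in range(b)]
    return pi

def rate_product_form(Q, pi):
    """-(1/2) sum_l pi(l) sum_{k1,k2} P(k1,k2|l) ln P(k1,k2|l) with P = Q x Q."""
    b = len(Q)
    total = 0.0
    for l in range(b):
        for k1 in range(b):
            for k2 in range(b):
                p = Q[l][k1] * Q[l][k2]
                if p > 0:
                    total += pi[l] * p * math.log(p)
    return -0.5 * total

def rate_markov_form(Q, pi):
    """-sum_l sum_k pi(l) Q(k|l) ln Q(k|l), the final expression in (83)."""
    b = len(Q)
    return -sum(
        pi[l] * Q[l][k] * math.log(Q[l][k])
        for l in range(b)
        for k in range(b)
        if Q[l][k] > 0
    )

Q = [[0.8, 0.2], [0.4, 0.6]]  # illustrative ergodic transition matrix
pi = stationary(Q)
assert abs(rate_product_form(Q, pi) - rate_markov_form(Q, pi)) < 1e-12
```

The agreement reflects the fact that each row of Q sums to 1, so the double sum over \((k_{1},k_{2})\) splits into two identical single sums and the factor \(\frac{1}{2}\) cancels.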