1 Introduction

We call a connected graph T a tree if it is infinite and locally finite, has a distinguished node o called the root, and contains no loops or cycles. In this work, we require every node to have degree at least 2. Let σ, τ be nodes of T. Write \(\tau< \sigma\) if τ lies on the unique path connecting o to σ, and \(\vert\sigma\vert\) for the number of edges on this path. For any two nodes σ, τ, denote by \(\sigma\wedge\tau\) the node farthest from o satisfying \(\sigma\wedge\tau < \sigma\) and \(\sigma\wedge\tau< \tau\).

Some useful notations are as follows: \(|A|\) denotes the number of elements in a set A; \(L_{n}\) is the set of nodes at level n of the tree T, with \(L_{0}=\{o\}\); \(T^{(n)}\) is the set of nodes in levels 0 through n of T; \(T^{(n)}\setminus\{o\}\) is \(T^{(n)}\) excluding the root o; \({1}_{t}\) is the first predecessor of t and \({2}_{t}\) the second predecessor of t. See Figure 1 for an example.

Figure 1

\(\pmb{T^{(5)}}\) of an infinite tree.
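For readers who wish to experiment with these notations, a finite truncation \(T^{(n)}\) can be stored as a parent array; a minimal sketch (the tree and all names are hypothetical):

```python
# A finite truncation of a rooted tree stored as a parent array:
# parent[v] is the first predecessor 1_v of node v; the root o is node 0.
parent = [None, 0, 0, 1, 1, 2]        # a small illustrative tree

def level(v):
    """|v|: the number of edges on the path from the root o to v."""
    n = 0
    while parent[v] is not None:
        v = parent[v]
        n += 1
    return n

def pred(v, k):
    """The k-th predecessor of v (pred(v, 1) = 1_v, pred(v, 2) = 2_v, ...)."""
    for _ in range(k):
        v = parent[v]
    return v

def T(n):
    """T^(n): the nodes in levels 0 through n."""
    return [v for v in range(len(parent)) if level(v) <= n]

def L(n):
    """L_n: the nodes at level n, so L(0) is the root alone."""
    return [v for v in range(len(parent)) if level(v) == n]
```

For instance, `L(0)` returns `[0]` and `T(1)` returns the root together with its children.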

The study of tree-indexed processes began at the end of the 20th century. Since Benjamini and Peres [1] introduced the notion of tree-indexed Markov chains in 1994, much literature (see [2–9]) has studied strong limit properties for Markov chains indexed by an infinite tree with uniformly bounded degree. Meanwhile, many authors (see [10–12]) have tried to give the limit properties of Markov chains indexed by a class of non-uniformly bounded-degree trees.

This work, motivated by Peng [13], mainly considers a kind of non-uniformly bounded-degree tree and studies some strong limit properties, including the strong law of large numbers and the AEP with a.e. convergence, for nonhomogeneous Markov chains indexed by a controlled tree, which permits some of the nodes to have asymptotically infinite degree. The outcomes generalize some well-known results. The technical route used in this paper is similar to that in [13], and some of the related notations are the same as in [13].

Definition 1

(see [14])

Let T be a tree, S a state space, and \(\{X_{\sigma},\sigma\in T\}\) a collection of S-valued random variables defined on the probability space \((\Omega,\mathcal{F},P)\). Let

$$ p=\bigl\{ p(x),x\in S\bigr\} $$
(1)

be a distribution on S, and

$$ \bigl(P_{t}(y|x)\bigr), \quad {x,y\in S}, t\in T, $$
(2)

be stochastic matrices on \(S^{2}\). If, for any vertex \(t\in T\),

$$\begin{aligned}& \mathrm{P}(X_{t}=y|X_{1_{t} }=x \mbox{ and } X_{s} \mbox{ for } t \wedge s < 1_{t}) \\& \quad =\mathrm{P}(X_{t}=y|X_{1_{t} }=x)=P_{t}(y|x), \quad x, y\in S \end{aligned}$$
(3)

and

$$\mathrm{P}(X_{o}=x)=p(x),\quad x\in S, $$

then \(\{X_{t},t\in T\}\) will be called S-valued nonhomogeneous Markov chains indexed by a tree with the initial distribution (1) and transition matrices (2), or tree-indexed nonhomogeneous Markov chains for short. If the transition matrices \((P_{t}(y|x))\) do not depend on t, i.e., for all \(t\in T\),

$$\bigl(P_{t}(y|x)\bigr)=\bigl(P(y|x)\bigr), \quad {x,y\in S}, $$

\(\{X_{t},t\in T\}\) will be called S-valued homogeneous Markov chains indexed by the tree T.
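Definition 1 can be mirrored by a simple sampling routine: the root state is drawn from the initial distribution, and each remaining node's state is drawn from the transition matrix row of its first predecessor's state. A minimal sketch, using a single matrix P at every node (the homogeneous case); the tree and all probabilities are illustrative:

```python
import random

# Sampling a tree-indexed Markov chain as in Definition 1: X_o ~ p, and
# each other node t gets X_t ~ P(. | X_{1_t}).  Nodes are indexed so that
# parents precede their children.
random.seed(0)
parent = [None, 0, 0, 1, 1, 2, 2]     # a small illustrative tree
p = [0.5, 0.5]                        # initial distribution on S = {0, 1}
P = [[0.9, 0.1],                      # P[x][y] plays the role of P_t(y|x),
     [0.2, 0.8]]                      # taken the same for every t here

def sample_chain():
    X = {0: random.choices([0, 1], weights=p)[0]}      # root: X_o ~ p
    for t in range(1, len(parent)):
        X[t] = random.choices([0, 1], weights=P[X[parent[t]]])[0]
    return X

X = sample_chain()
```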

We fix an integer \(N \geq0\), set \(d^{0}(t):=1\), and denote by

$$ d^{N}(t):=\bigl\vert \{{\sigma\in T:{N}_{\sigma}=t}\}\bigr\vert , \quad N\geq1, $$
(4)

the number of t’s Nth descendants. We assume that, for any integer \(N\geq0\), there are a constant \(\delta >0\) and positive integers \(M_{k}\), \(k=0, 1, 2,\ldots\) , such that

$$ \frac{|\{t\in T^{(n)}: d^{N}(t)>M_{N} \} | }{|T^{(n+N)}|} \leq\frac {1}{(1+\delta)^{ d^{N}_{n}}} $$
(5)

uniformly holds for all \(n\geq0\), where \(d^{N}_{n}= {\max_{t\in T^{(n)} }}\{d^{N}(t)\}\).
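The quantity \(d^{N}(t)\) in (4) can be computed by walking each node's N-th predecessor; a minimal sketch on a small irregular tree (the parent array and all numbers are illustrative):

```python
# d^N(t) = number of N-th descendants of t, computed by checking, for each
# node s, whether its N-th predecessor equals t.
parent = [None, 0, 0, 1, 1, 1, 2]     # node 1 has 3 children, node 2 has 1

def nth_pred(v, N):
    """The N-th predecessor of v, or None if v is too close to the root."""
    for _ in range(N):
        if parent[v] is None:
            return None
        v = parent[v]
    return v

def d(N, t):
    """d^N(t) on the stored finite tree; d^0(t) := 1 by convention."""
    if N == 0:
        return 1
    return sum(1 for s in range(len(parent)) if nth_pred(s, N) == t)
```

On this tree, for example, `d(1, 1)` is 3 while `d(1, 2)` is 1, so the degrees are non-uniform.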

Definition 2

We call T a controlled tree if it is a non-uniformly bounded-degree tree for which assumption (5) holds.

From assumption (5) we see that some of the nodes of a controlled tree may have asymptotically infinite degree. The following three remarks indicate that controlled tree models include some well-known models such as Cayley trees (and hence homogeneous trees) and uniformly bounded-degree trees.

Remark 1

A Cayley tree \(T_{C,m}\), in which each vertex has m descendants, satisfies condition (5) above. Indeed, in such a tree, \(d^{N}_{n}=m^{N}\), hence \(|\{t\in T^{(n)}: d^{N}(t)> m^{N}\}|=0\).
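This remark can be checked numerically; a small sketch building \(T_{C,m}\) with \(m=2\) to depth 4 (breadth-first, all names hypothetical) and verifying that each non-leaf vertex has exactly m descendants:

```python
# Build the Cayley tree T_{C,2} truncated at depth 4 as a parent array,
# in breadth-first order, and expose a child counter to verify d^1(t) = m.
m, depth = 2, 4
parent = [None]
frontier = [0]
for _ in range(depth):
    nxt = []
    for v in frontier:
        for _ in range(m):
            parent.append(v)
            nxt.append(len(parent) - 1)
    frontier = nxt

def num_children(v):
    """d^1(v) on the stored truncation."""
    return sum(1 for s in range(len(parent)) if parent[s] == v)
```

The truncation has \(|T^{(4)}|=2^{5}-1=31\) nodes, and every vertex in levels 0 through 3 has exactly two children, so the exceptional set in (5) is empty.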

Remark 2

If we consider any uniformly bounded-degree tree, then there is some \(a>0\) such that \(d^{N}_{n}\leq a^{N}\) and \(|\{t\in T^{(n)}: d^{N}(t)> a^{N} \}|=0\), which indicates that uniformly bounded-degree trees satisfy assumption (5).

Remark 3

In this paper, condition (5) covers the case treated in Peng [13]. The assumption (5) in [13], which we denote by (5a), is

$$ \max\bigl\{ {d^{N}(t)}:{t\in{T^{(n)}}}\bigr\} \leq O\biggl(\ln{ \frac {|T^{(n+N)}|}{|T^{(n)}|}}\biggr), $$
(5a)

where

$$O(n)=\biggl\{ c_{n}: 0< \limsup_{n\rightarrow\infty} \frac{ c_{n}}{n}\leq c, c \mbox{ is a constant}\biggr\} . $$

In fact, (5) is equivalent to

$$ d^{N}_{n} \leq\log_{1+\delta} \frac{|T^{(n+N)}|}{|\{t\in T^{(n)}: d^{N}(t)>M_{N} \} | }. $$
(6)

Meanwhile, (5a) is equivalent to

$$ { d^{N}_{n}}\leq O\biggl(\ln{\frac{|T^{(n+N)}|}{|T^{(n)}|}} \biggr). $$
(7)

Obviously,

$$ \bigl\vert \bigl\{ t\in T^{(n)}: d^{N}(t)>M_{N} \bigr\} \bigr\vert \leq\bigl\vert T^{(n)}\bigr\vert . $$
(8)

Then, for all \(\delta>0\), combining (6), (7), and (8), we arrive at

$$ O\biggl(\ln{\frac{|T^{(n+N)}|}{|T^{(n)}|}}\biggr) \leq\log_{1+\delta} \frac {|T^{(n+N)}|}{|\{t\in T^{(n)}: d^{N}(t)>M_{N} \} | }. $$
(9)

Hence, by (9) we know that trees satisfying (5a) are special cases of the controlled tree model.

The above three remarks indicate that the tree models introduced in this work extend those of [7, 15] and [13]. Unless otherwise stated, the trees referred to in the following are all infinite, locally finite trees satisfying assumption (5).

Now we give some useful notations. Let \(\delta_{k}(\cdot)\) be the indicator function, i.e.,

$$\delta_{k}(x)= \left \{ \textstyle\begin{array}{l@{\quad}l} 1, & \mbox{if }k=x, \\ 0, & \mbox{otherwise}. \end{array}\displaystyle \right . $$

For a given integer \(N\geq0\), write

$$ S_{k}^{N}\bigl(T^{(n)}\bigr):=\sum _{t\in T^{(n-N)}}\delta_{k}(X_{t})d^{N}(t). $$
(10)

By (10) (summing over \(k\in S\) counts each node in levels N through n exactly once, with the convention \(|T^{(-1)}|:=0\)), we have

$$ \sum_{k\in S}S_{k}^{N}\bigl(T^{(n)}\bigr)=\bigl\vert T^{(n)}\bigr\vert -\bigl\vert T^{(N-1)}\bigr\vert . $$
(11)
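As a numerical sanity check, summing (10) over \(k\in S\) counts each node in levels N through n exactly once, since every such node has a unique N-th predecessor in \(T^{(n-N)}\); a minimal sketch (the tree and the labels \(X_t\) are purely illustrative):

```python
import random

# Compute S_k^N(T^(n)) = sum over t in T^(n-N) of delta_k(X_t) d^N(t),
# counted over the N-th descendants themselves, and check the total count.
random.seed(1)
parent = [None, 0, 0, 1, 1, 2, 3, 3]          # levels: 0; 1,1; 2,2,2; 3,3

def level(v):
    n = 0
    while parent[v] is not None:
        v, n = parent[v], n + 1
    return n

def nth_pred(v, N):
    for _ in range(N):
        if parent[v] is None:
            return None
        v = parent[v]
    return v

X = {v: random.choice([0, 1]) for v in range(len(parent))}   # arbitrary labels

def S(k, N, n):
    total = 0
    for s in range(len(parent)):
        t = nth_pred(s, N)                     # s contributes to d^N(t)
        if t is not None and level(t) <= n - N and X[t] == k:
            total += 1
    return total
```

On this 8-node tree, summing `S(k, 1, 3)` over k gives 7 (all nodes except the root), i.e. the number of nodes in levels 1 through 3.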

Denote

$$ H_{n}(\omega)=\sum_{t\in T^{(n)}\setminus\{o\} }g_{t}(X_{{1}_{t}},X_{t}) $$
(12)

and

$$ G_{n}(\omega)=\sum_{t\in T^{(n)}\setminus\{o\} }E \bigl[g_{t}(X_{{1}_{t}},X_{t})|X_{{1}_{t}}\bigr]. $$
(13)

2 Some strong limit theorems

In this section, we mainly consider a controlled tree, i.e., a non-uniformly bounded-degree tree satisfying assumption (5). Theorems 1 and 2 give two kinds of strong limit theorems for nonhomogeneous Markov chains. Theorem 3 establishes the strong law of large numbers and the Shannon-McMillan theorem for nonhomogeneous Markov chains indexed by a controlled tree.

Theorem 1

Let T be a controlled tree as in Definition  2. Let \(\{X_{t}, t\in T\}\) be S-valued nonhomogeneous Markov chains indexed by this tree with the initial distribution (1) and transition matrices (2). Let \(S_{k}^{N}(T^{(n)})\) be defined as before. For all \(k\in S\) and given \(N\geq0\), we have

$$ \lim_{n\rightarrow\infty}\frac{1}{|T^{(n+N)}|}\biggl\{ S_{k}^{N} \bigl(T^{(n)}\bigr)- \sum_{t\in T^{(n)}\setminus\{o\}}d^{N}(t)P_{t}(k|X_{{1}_{t}}) \biggr\} =0 \quad \textit{a.e.} $$
(14)

Proof

Let \(g_{t}(x,y)=d^{N}(t)\delta_{k}(y)\); then we have

$$\begin{aligned} \begin{aligned}[b] G_{n}(\omega)&=\sum_{t\in T^{(n)}\setminus\{o \}}E \bigl[g_{t}(X_{{1}_{t}},X_{t})|X_{{1}_{t}}\bigr]= \sum_{t\in T^{(n)}\setminus\{o \}}d^{N}(t)\sum _{x_{t}\in S}\delta _{k}(x_{t})P_{t}(x_{t}|X_{{1}_{t}}) \\ &=\sum_{t\in T^{(n)}\setminus\{o\} }d^{N}(t)P_{t}(k|X_{{1}_{t}}) \end{aligned} \end{aligned}$$
(15)

and

$$ H_{n}(\omega)=\sum_{t\in T^{(n)}\setminus\{o \}}g_{t}(X_{{1}_{t}},X_{t})= \sum_{t\in T^{(n)}\setminus\{o\}}d^{N}(t)\delta _{k}(X_{t})= S_{k}^{N} \bigl(T^{(n)}\bigr)-\delta_{k}(X_{o})d^{N}(o). $$
(16)

According to Lemma 1 of Huang and Yang [15], the nonnegative martingale \(t_{n}(\lambda,\omega)\) constructed there satisfies

$$ \lim_{n}t_{n}(\lambda,\omega)=t(\lambda,\omega)< \infty \quad \mbox{a.e.} $$
(17)

Obviously,

$$\liminf_{n\rightarrow\infty}{\bigl\vert T^{(n+N)}\bigr\vert }= \infty, $$

then by (17)

$$ \limsup_{n\rightarrow\infty}\frac{\ln t_{n}(\lambda,\omega )}{|T^{(n+N)}|}\leq0 \quad \mbox{a.e.} $$
(18)

By (16) and (18), we get

$$ \limsup_{n\rightarrow\infty}\frac{1}{|T^{(n+N)}|}\biggl\{ \lambda H_{n}(\omega)- \sum_{t\in T^{(n)}\setminus\{o\}}\ln\bigl[E \bigl[e^{\lambda g_{t}(X_{{1}_{t}},X_{t})}|X_{{1}_{t}}\bigr]\bigr]\biggr\} \leq0 \quad \mbox{a.e.} $$
(19)

Let \(\lambda>0\). Dividing both sides of (19) by λ, we have

$$ \limsup_{n\rightarrow\infty}\frac{1}{|T^{(n+N)}|}\biggl\{ H_{n}( \omega)- \sum_{t\in T^{(n)}\setminus\{o\}}\frac{\ln[E[e^{\lambda g_{t}(X_{{1}_{t}},X_{t})}|X_{{1}_{t}}]]}{\lambda}\biggr\} \leq0 \quad \mbox{a.e.} $$
(20)

By (5), (18), (20), and the inequalities \(\ln x \leq x-1\) (\(x>0\)), \(0\leq e^{x}-1-x\leq2^{-1}x^{2}e^{|x|}\), we have

$$\begin{aligned}& \limsup_{n\rightarrow\infty}\frac{1}{|T^{(n+N)}|}\biggl[ H_{n}( \omega)-\sum_{t\in T^{(n)}\setminus\{o\}}E\bigl[ g_{t}(X_{{1}_{t}},X_{t})|X_{{1}_{t}} \bigr]\biggr] \\& \quad \leq \limsup_{n\rightarrow\infty}\frac{1}{|T^{(n+N)}|}\sum _{t\in T^{(n)}\setminus\{o\}}\biggl\{ \frac{\ln[E[e^{\lambda g_{t}(X_{{1}_{t}},X_{t})}|X_{{1}_{t}}]]}{\lambda}-E\bigl[ g_{t}(X_{{1}_{t}},X_{t})|X_{{1}_{t}} \bigr]\biggr\} \\& \quad \leq \limsup_{n\rightarrow\infty}\frac{1}{|T^{(n+N)}|}\sum _{t\in T^{(n)}\setminus\{o\}}\biggl\{ \frac{E[e^{\lambda g_{t}(X_{{1}_{t}},X_{t})}|X_{{1}_{t}}]-1}{\lambda}-E\bigl[ g_{t}(X_{{1}_{t}},X_{t})|X_{{1}_{t}} \bigr]\biggr\} \\& \quad \leq \frac{\lambda}{2}\limsup_{n\rightarrow\infty}\frac{1}{|T^{(n+N)}|} \sum_{t\in T^{(n)}\setminus\{o\}}E\bigl[g_{t}^{2}(X_{{1}_{t}},X_{t})e^{\lambda| g_{t}(X_{{1}_{t}},X_{t})|}|X_{{1}_{t}} \bigr] \\& \quad = \frac{\lambda}{2}\limsup_{n\rightarrow\infty}\frac{1}{ |T^{(n+N)}|}\sum _{t\in T^{(n)}\setminus\{o\}}E\bigl[\bigl(d^{N}(t) \delta_{k}(X_{t})\bigr)^{2}e^{{\lambda }|d^{N}(t)\delta_{k}(X_{t})|}|X_{{1}_{t}} \bigr] \\& \quad \leq \frac{\lambda}{2} \limsup_{n\rightarrow\infty}\frac{1}{ |T^{(n+N)}|} \sum_{t\in T^{(n)}\setminus\{o\}}\bigl[\bigl(d^{N}(t) \bigr)^{2}e^{{\lambda }d^{N}(t)}P_{t}(k|X_{{1}_{t}})\bigr] \\& \quad \leq \frac{\lambda}{2} \limsup_{n\rightarrow\infty}\frac{1}{ |T^{(n+N)}|} \sum_{t\in T^{(n)}\setminus\{o\}}\bigl[\bigl(d^{N}(t) \bigr)^{2}e^{{\lambda}d^{N}(t)}\bigr]. \end{aligned}$$

Splitting \(T^{(n)}\setminus\{o\}\) into the two parts \(\{t: d^{N}(t)>M_{N}\}\) and \(\{t: d^{N}(t)\leq M_{N}\}\), we have

$$\begin{aligned}& \frac{\lambda}{2} \limsup_{n\rightarrow\infty} \frac{1}{ |T^{(n+N)}|}\sum_{t\in T^{(n)}\setminus\{o\}}\bigl[ \bigl(d^{N}(t)\bigr)^{2}e^{{\lambda }d^{N}(t)}\bigr] \\& \quad = \frac{\lambda}{2} \limsup_{n\rightarrow\infty}\frac{1}{ |T^{(n+N)}|} \biggl(\sum_{\{t: d^{N}(t)>M_{N} \}}\bigl[\bigl(d^{N}(t) \bigr)^{2}e^{{\lambda }d^{N}(t)}\bigr]+\sum_{\{t: d^{N}(t)\leq M_{N} \}} \bigl[\bigl(d^{N}(t)\bigr)^{2}e^{{\lambda }d^{N}(t)}\bigr] \biggr) \\& \quad \leq \frac{\lambda}{2} \limsup_{n\rightarrow\infty}\frac{|\{t\in T^{(n)}\setminus\{o\}: d^{N}(t)>M_{N}\}| }{ |T^{(n+N)}|} \bigl(d^{N}_{n}\bigr)^{2}e^{{\lambda}d^{N}_{n}} + \frac{\lambda}{2} {M_{N}}^{2}e^{{\lambda}M_{N}} \\& \quad \leq \frac{\lambda}{2} \limsup_{n\rightarrow\infty}\frac {M_{N}(d^{N}_{n})^{2}e^{{\lambda}d^{N}_{n}}}{(1+\delta)^{ d^{N}_{n}}} + \frac{\lambda }{2} {M_{N}}^{2}e^{{\lambda}M_{N}}. \end{aligned}$$
(21)

Restricting \(0<\lambda<\frac{1}{2}\ln(1+\delta)\) in (21), we obtain

$$\begin{aligned}& \limsup_{n\rightarrow\infty}\frac{1}{|T^{(n+N)}|}\biggl[ H_{n}(\omega)-\sum_{t\in T^{(n)}\setminus\{o\}}E\bigl[ g_{t}(X_{{1}_{t}},X_{t})|X_{{1}_{t}}\bigr]\biggr] \\& \quad \leq \frac{\lambda}{2} \limsup_{n\rightarrow\infty}\frac {M_{N}(d^{N}_{n})^{2}}{(1+\delta)^{ \frac{1}{2}d^{N}_{n}}} + \frac{\lambda}{2} {M_{N}}^{2}e^{{\lambda}M_{N}}. \end{aligned}$$
(22)

Notice that, for \(\delta>0\),

$$ \limsup_{n\rightarrow\infty}\frac{M_{N}(d^{N}_{n})^{2}}{(1+\delta)^{ \frac {1}{2}d^{N}_{n}}}< \infty, $$
(23)

whether or not \(d^{N}_{n}\) tends to infinity as \(n\rightarrow\infty\). By (15), (16), and (23), letting \(\lambda\rightarrow0^{+}\) in (22), we have

$$ \limsup_{n\rightarrow\infty}\frac{1}{|T^{(n+N)}|}\biggl\{ S_{k}^{N} \bigl(T^{(n)}\bigr)- \sum_{t\in T^{(n)}\setminus\{o\}}d^{N}(t)P_{t}(k|X_{{1}_{t}}) \biggr\} \leq0 \quad \mbox{a.e.} $$
(24)

Similarly, taking \(\lambda<0\) and letting \(\lambda \rightarrow0^{-}\), we get

$$ \liminf_{n\rightarrow\infty}\frac{1}{|T^{(n+N)}|}\biggl\{ S_{k}^{N} \bigl(T^{(n)}\bigr)- \sum_{t\in T^{(n)}\setminus\{o\}}d^{N}(t)P_{t}(k|X_{{1}_{t}}) \biggr\} \geq0 \quad \mbox{a.e.} $$
(25)

Combining (24) and (25), we obtain (14) directly. □

Theorem 2

Assume that the conditions of Theorem  1 hold. If, for any \(x, y\in S\),

$$ \lim_{|t|\rightarrow\infty}P_{t}{(y|x)}=P{(y|x)}>0, $$
(26)

then

$$ \lim_{n\rightarrow\infty}\biggl\{ \frac{S_{k}^{N}(T^{(n)})}{|T^{(n+N)}|}- \frac{\sum_{l\in S}S_{l}^{N+1}(T^{(n-1)})}{|T^{(n+N)}|}P(k|l) \biggr\} =0 \quad \textit{a.e.} $$
(27)

Proof

By Theorem 1, we have

$$\begin{aligned}& \biggl\vert \frac{S_{k}^{N}(T^{(n)})}{|T^{(n+N)}|}- \frac{\sum_{l\in S}S_{l}^{N+1}(T^{(n-1)})}{|T^{(n+N)}|}P(k|l)\biggr\vert \\& \quad = \biggl\vert \frac{1}{|T^{(n+N)}|}\biggl\{ S_{k}^{N} \bigl(T^{(n)}\bigr)- \sum_{t\in T^{(n)}\setminus\{o\}}d^{N}(t)P_{t}(k|X_{{1}_{t}}) \\& \qquad {}+ \sum_{t\in T^{(n)}\setminus\{o\}}d^{N}(t)\bigl[P_{t}(k|X_{{1}_{t}})-P(k|X_{{1}_{t}}) \bigr]\biggr\} \biggr\vert \\& \quad \le \frac{1}{|T^{(n+N)}|}\biggl\{ \biggl\vert S_{k}^{N} \bigl(T^{(n)}\bigr)- \sum_{t\in T^{(n)}\setminus\{o\}}d^{N}(t)P_{t}(k|X_{{1}_{t}}) \biggr\vert \\& \qquad {}+\biggl\vert \sum_{t\in T^{(n)}\setminus\{o\}}d^{N}(t) \bigl[P_{t}(k|X_{{1}_{t}})-P(k|X_{{1}_{t}})\bigr]\biggr\vert \biggr\} . \end{aligned}$$

Combining (14) and (26), (27) follows. □

Write

$$P\bigl(x^{T^{(n)}}\bigr)=P\bigl(X^{T^{(n)}}=x^{T^{(n)}}\bigr). $$

Let

$$f_{n}(\omega)=-\frac{1}{|T^{(n)}|}\ln P\bigl(X^{T^{(n)}}\bigr), $$

Then \(f_{n}(\omega)\) will be called the entropy density of \(X^{T^{(n)}}\). If \((X_{t})_{t\in T}\) is defined as in Definition 1, then by (3) we have

$$ f_{n}(\omega)=-\frac{1}{|T^{(n)}|}\biggl[\ln p(X_{o})+ \sum_{t\in T^{(n)} \setminus\{o\}}\ln P_{t}(X_{t}|X_{{1}_{t}}) \biggr]. $$
(28)

The convergence of \(f_{n}(\omega)\) to a constant in some sense (\(L_{1}\) convergence, convergence in probability, or a.e. convergence) is called the Shannon-McMillan theorem, the entropy theorem, or the asymptotic equipartition property (AEP) in information theory. Next, we establish the a.e. convergence of the strong law of large numbers and of the AEP for a class of tree-indexed nonhomogeneous Markov chains.

Theorem 3

Let \(k\in S \), and let \(P=(P(y|x))_{x,y\in S}\) be an ergodic stochastic matrix. Denote the unique stationary distribution of P by π. Let \((X_{t})_{t\in T}\) be a T-indexed nonhomogeneous Markov chain with state space S. If (26) holds, then, for a given integer \(N\geq0\),

$$ \lim_{n\rightarrow\infty}\frac{S^{N}_{k}(T^{(n)})}{|T^{(n+N)}|}=\pi(k) \quad \textit{a.e.} $$
(29)

Let \(f_{n}(\omega)\) be defined by (28); then

$$ \lim_{n\rightarrow\infty}f_{n}(\omega)=-\sum _{l\in S}\sum_{k\in S}\pi (l)P(k|l)\ln P(k|l) \quad \textit{a.e.} $$
(30)

Proof

The proof of (29) is similar to that of Huang and Yang ([15], Theorem 2 and Corollary 3), so we omit it. Now we focus on the proof of (30). Letting \(g_{t}(x,y)=-\ln P_{t}(y|x)\) in (12) and (13), by (28) we get

$$ \lim_{n\rightarrow \infty}f_{n}(\omega)=\lim_{n\rightarrow \infty} \frac{H_{n}(\omega)}{|T^{(n)}|}, $$
(31)

and, by (29) and (26),

$$\begin{aligned} \frac{G_{n}(\omega)}{|T^{(n)}|} =&-\frac{1}{|T^{(n)}|}\sum_{t\in T^{(n)}\setminus\{o\}}E \bigl[\ln P_{t}(X_{t}|X_{{1}_{t}})|X_{{1}_{t}} \bigr] \\ =& -\frac{1}{|T^{(n)}|}\sum_{t\in T^{(n)}\setminus\{o\}}\sum _{l\in S}\sum_{k\in S} E\bigl[ \delta_{l}(X_{{1}_{t}})\delta_{k}(X_{t})\ln P_{t}(k|l)|X_{{1}_{t}}\bigr] \\ =& -\sum_{k,l\in S}\sum_{t\in T^{(n)}\setminus\{o\}} \frac{\delta_{l}(X_{{1}_{t}})P_{t}(k|l)\ln P_{t}(k|l)}{|T^{(n)}|} \\ =& -\sum_{k,l\in S}\sum_{t\in T^{(n)}\setminus\{o\}} \frac{\delta_{l}(X_{{1}_{t}})P(k|l)\ln P(k|l)}{|T^{(n)}|} \\ &{}-\sum_{k,l\in S}\sum _{t\in T^{(n)}\setminus\{o\}}\frac{\delta_{l}(X_{{1}_{t}})[P_{t}(k|l)\ln P_{t}(k|l)-P(k|l)\ln P(k|l)]}{|T^{(n)}|} \\ =& -\sum_{k,l\in S}\frac{S^{1}_{l}(T^{(n-1)})P(k|l)\ln P(k|l)}{|T^{(n)}|} \\ &{}-\sum _{k,l\in S}\sum_{t\in T^{(n)}\setminus\{o\}}\frac{\delta_{l}(X_{{1}_{t}})[P_{t}(k|l)\ln P_{t}(k|l)-P(k|l)\ln P(k|l)]}{|T^{(n)}|} \\ \rightarrow& -\sum_{l\in S}\sum _{k\in S}\pi(l)P(k|l)\ln P(k|l)\quad \mbox{a.e. as } n \rightarrow \infty, \end{aligned}$$

which, combined with (31), implies (30). The proof is completed. □
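The conclusions (29) and (30) of Theorem 3 can be illustrated numerically on a binary Cayley tree with a homogeneous ergodic matrix; a simulation sketch (the matrix P, the depth, and all parameters are illustrative assumptions, and the root is drawn from π for simplicity):

```python
import math
import random

# Simulate a homogeneous chain on the Cayley tree T_{C,2}: the empirical
# state frequencies should approach pi(k), and the entropy density f_n
# should approach -sum_{l,k} pi(l) P(k|l) ln P(k|l).
random.seed(3)
P = [[0.7, 0.3],                      # P[l][k] plays the role of P(k|l)
     [0.4, 0.6]]
pi = [4 / 7, 3 / 7]                   # stationary distribution: pi P = pi
depth = 14

counts = [0, 0]
log_prob = 0.0
root = random.choices((0, 1), weights=pi)[0]   # root drawn from pi
counts[root] += 1
log_prob += math.log(pi[root])
frontier = [root]
for _ in range(depth):                # sample level by level
    nxt = []
    for x in frontier:
        for _ in range(2):            # two children per vertex
            y = random.choices((0, 1), weights=P[x])[0]
            counts[y] += 1
            log_prob += math.log(P[x][y])
            nxt.append(y)
    frontier = nxt

n_nodes = sum(counts)                 # |T^(depth)| = 2^(depth+1) - 1
freq = [c / n_nodes for c in counts]  # empirical frequencies, compare with pi
f_n = -log_prob / n_nodes             # entropy density as in (28)
h = -sum(pi[l] * P[l][k] * math.log(P[l][k])
         for l in (0, 1) for k in (0, 1))      # the limit in (30)
```

With this seed and depth, `freq` is close to π and `f_n` is close to the entropy rate `h`, in line with the theorem.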