
1 Introduction

Markov chains are probabilistic models that can be used to describe the uncertain dynamics of a large variety of stochastic processes. One of the key results within the field is the point-wise ergodic theorem. It establishes a relation between the long-term time average of a real-valued function and its limit expectation, which is guaranteed to exist if the Markov chain is ergodic. For this reason, limit expectations and limit distributions have become central objects of interest. Of course, if one is interested in the long-term behaviour of time averages, one could also study the expected values of these averages directly. This is not often done though, because the limit of these expected time averages coincides with the aforementioned limit expectations, which can straightforwardly be obtained by solving a linear eigenproblem [10].

We here consider a generalisation of Markov chains, called imprecise Markov chains [2, 4, 9], for which the considerations above are not necessarily true. Imprecise Markov chains are sets of traditional (“precise”) probabilistic models, where the Markov property (history independence) and time-homogeneity apply to the collection of precise models as a whole, but not necessarily to the individual models themselves. Imprecise Markov chains therefore allow one to incorporate model uncertainty about the numerical values of the transition probabilities that make up a Markov chain, but also, and more importantly, about structural assumptions such as time-homogeneity and the Markov property. For such an imprecise Markov chain, one is then typically interested in obtaining tight upper and lower bounds on inferences for the individual constituting models. The operators that represent these upper and lower bounds are respectively called upper and lower expectations.

Just like traditional Markov chains can have a limit expectation, an imprecise Markov chain can have limit upper and lower expectations. There are necessary and sufficient conditions for their existence [8] as well as an imprecise variant of the point-wise ergodic theorem [2]. An important difference with traditional Markov chains, however, is that upper and lower bounds on expectations of time averages—we will call these upper and lower expected time averages—may not converge to limit upper and lower expectations. Nevertheless, because they give conservative bounds [11, Lemma 57], and because they are fairly easy to compute, limit upper and lower expectations are often used as descriptors of the long-term behaviour of imprecise Markov chains, even if one is actually interested in time averages. This comes at a cost though: as we illustrate in Sect. 4, the two inferences can differ greatly, with limit expectations providing far too conservative bounds.

Unfortunately, apart from some experiments in [11], little is known about the long-term behaviour of upper and lower expected time averages in imprecise Markov chains. The aim of this paper is to remedy this situation. Our main result is an accessibility condition that is necessary and sufficient for upper and lower expected time averages to converge to a limit value that does not depend on the process’ initial state; see Sect. 7. Remarkably, this condition is considerably weaker than the ones required for limit lower and upper expectations to exist.

Technical proofs are relegated to the appendix of an extended online version [12]. This is particularly true for the results in Sect. 7, where the main text provides an informal argument that aims to provide intuition.

2 Markov Chains

We consider an infinite sequence \(X_{0} X_{1} X_{2} \cdots \) of uncertain states, where each state \(X_k\) at time \(k\) takes values in some finite set \(\mathscr {X}{}\), called the state space. Such a sequence \(X_{0} X_{1} X_{2} \cdots \) will be called a (discrete-time) stochastic process. For any \(k,\ell \in \mathbb {N}_{0}{}\) such that \(k \le \ell \), we use \(X_{k:\ell }\) to denote the finite subsequence \(X_{k} \cdots X_{\ell }\) of states that takes values in \(\mathscr {X}{}^{\ell -k+1}\). Moreover, for any \(k,\ell \in \mathbb {N}_{0}{}\) such that \(k \le \ell \) and any \(x_{k:\ell } \in \mathscr {X}{}^{\ell -k+1}\), we use \(X_{k:\ell } = x_{k:\ell }\) to denote the event that \(X_{k} = x_k, \ldots , X_{\ell } = x_\ell \). The uncertain dynamics of a stochastic process are then typically described by probabilities of the form \(\mathrm {P}(X_{k+1} = x_{k+1} \vert X_{0:k} = x_{0:k})\), for any \(k \in \mathbb {N}_{0}{}\) and any \(x_{0:k+1} \in \mathscr {X}{}^{k+2}\). They represent beliefs about which state the process will be in at time \(k+1\), given that we know that it was in the states \(x_{0} \cdots x_{k}\) at time instances 0 through k. Additionally, our beliefs about the value of the initial state \(X_0\) can be represented by probabilities \(\mathrm {P}(X_0=x_0)\) for all \(x_{0} \in \mathscr {X}{}\). The local probability assessments \(\mathrm {P}(X_{k+1} = x_{k+1} \vert X_{0:k} = x_{0:k})\) and \(\mathrm {P}(X_0=x_0)\) can now be combined into a global probability model \(\mathrm {P}\) that describes the dynamics of the process on a more general level. This can be done in various ways, the most common being a measure-theoretic approach in which countable additivity plays a central role. For our purposes however, we will only require finite additivity. Regardless, once such a global probability model \(\mathrm {P}\) is available, it can be used to define expectations and make inferences about the uncertain behaviour of the process.

For any set A, let us write \(\mathscr {L}{}(A)\) to denote the set of all real-valued functions on A. Throughout, for any \(a \in A\), we use \(\mathbb {I}_{a}\) to denote the indicator of a: the function in \(\mathscr {L}{}(A)\) that takes the value 1 in a and 0 otherwise. We will only be concerned with (upper and lower) expectations of finitary functions: functions that depend on the state of the process at a finite number of time instances. So if f is finitary, we can write \(f = g(X_{0:k})\) for some \(k \in \mathbb {N}_{0}{}\) and some \(g \in \mathscr {L}{}(\mathscr {X}{}^{k+1})\). Note that finitary functions are bounded; this follows from their real-valuedness and the fact that \(\mathscr {X}{}\) is finite. The expectation of a finitary function \(f(X_{0:k})\) conditional on some event \(X_{0:\ell } = x_{0:\ell }\) simply reduces to a finite weighted sum:

$$\begin{aligned} \mathrm {E}_{\mathrm {P}}(f(X_{0:k}) \vert X_{0:\ell } = x_{0:\ell }) = \sum _{x_{\ell +1:k} \in \mathscr {X}{}^{k-\ell }} f(x_{0:k})\prod _{i=\ell }^{k-1} \mathrm {P}(X_{i+1} = x_{i+1} \vert X_{0:i} = x_{0:i}). \end{aligned}$$
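This finite weighted sum can be evaluated directly. The following minimal sketch does so for a hypothetical two-state chain with history-independent transition probabilities; all numerical values are illustrative assumptions, not taken from the text.

```python
import itertools

# Hypothetical transition probabilities P(X_{i+1} = y | X_i = x) on {0, 1},
# assumed history-independent here purely to keep the sketch small.
T = [[0.9, 0.1],
     [0.4, 0.6]]

def cond_expectation(f, x_prefix, k):
    """E_P(f(X_{0:k}) | X_{0:l} = x_prefix), with l = len(x_prefix) - 1."""
    l = len(x_prefix) - 1
    total = 0.0
    # sum over all completions x_{l+1:k} of the observed prefix
    for tail in itertools.product(range(2), repeat=k - l):
        path = tuple(x_prefix) + tail
        prob = 1.0
        for i in range(l, k):
            prob *= T[path[i]][path[i + 1]]
        total += f(path) * prob
    return total

# expected time average of the indicator of state 1 over X_0, X_1, X_2, given X_0 = 0
avg = cond_expectation(lambda p: sum(x == 1 for x in p) / len(p), (0,), 2)
```

The nested loop mirrors the formula exactly: one factor per transition between time \(\ell\) and time k, summed over all unobserved state sequences.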

A particularly interesting case arises when studying stochastic processes that are described by a probability model \(\mathrm {P}\) that satisfies

$$\begin{aligned} \mathrm {P}(X_{k+1} = y \, \vert \, X_{0:k} = x_{0:k}) = \mathrm {P}(X_{k+1} = y \, \vert \, X_{k} = x_{k}), \end{aligned}$$

for all \(k \in \mathbb {N}_{0}{}\), all \(y \in \mathscr {X}{}\) and all \(x_{0:k} \in \mathscr {X}{}^{k+1}\). This property, known as the Markov property, states that, given the present state of the process, its future behaviour does not depend on its history. A process of this type is called a Markov chain. We moreover call it (time) homogeneous if additionally \(\mathrm {P}(X_{k+1} = y \, \vert \, X_{k} = x) = \mathrm {P}(X_{1} = y \, \vert \, X_{0} = x)\), for all \(k \in \mathbb {N}_{0}{}\) and all \(x,y \in \mathscr {X}{}\). Hence, together with the assessments \(\mathrm {P}(X_0 = x_0)\), the dynamics of a homogeneous Markov chain are fully characterised by the probabilities \(\mathrm {P}(X_{1} = y \, \vert \, X_{0} = x)\). These probabilities are typically gathered in a transition matrix T: a row-stochastic \(\vert \mathscr {X}{} \vert \times \vert \mathscr {X}{} \vert \) matrix defined by \(T(x,y) := \mathrm {P}(X_{1} = y \, \vert \, X_{0} = x)\) for all \(x,y \in \mathscr {X}{}\). This matrix representation T is particularly convenient because it can be regarded as a linear operator from \(\mathscr {L}{}(\mathscr {X}{})\) to \(\mathscr {L}{}(\mathscr {X}{})\), defined for any \(k \in \mathbb {N}_{0}{}\), any \(f \in \mathscr {L}{}(\mathscr {X}{})\) and any \(x \in \mathscr {X}{}\) by

$$\begin{aligned} T f(x) := \sum _{y \in \mathscr {X}{}} T(x,y) f(y) = \mathrm {E}_{\mathrm {P}}(f(X_{k+1}) \, \vert \, X_{k} = x ). \end{aligned}$$
More generally, we have that \(\mathrm {E}_{\mathrm {P}}(f(X_{k+\ell }) \, \vert \, X_{k} = x ) = T^{\ell } f(x)\) for all \(k \in \mathbb {N}_{0}{}\), all \(\ell \in \mathbb {N}_{0}{}\) and all \(x \in \mathscr {X}{}\). Then, under some well-known accessibility conditions [8, Proposition 3], the expectation \(T^{\ell } f(x)\) converges for increasing \(\ell \) towards a constant \(\mathrm {E}_{\infty }(f)\), independently of the initial state x. If this is the case for all \(f \in \mathscr {L}{}(\mathscr {X}{})\), the homogeneous Markov chain has a steady-state distribution, represented by the limit expectation \(\mathrm {E}_{\infty }\), and we call the Markov chain ergodic. The expectation \(\mathrm {E}_{\infty }\) is in particular also useful if we are interested in the limit behaviour of expected time averages. Indeed, for any \(f \in \mathscr {L}{}(\mathscr {X}{})\) and any \(k, \ell \in \mathbb {N}_{0}{}\), let \(\overline{f}_{k}(X_{\ell :\ell +k}) := \tfrac{1}{k+1} \sum _{i=0}^{k} f(X_{\ell +i})\) be the time average of some function \(f \in \mathscr {L}{}(\mathscr {X}{})\) evaluated at the time instances \(\ell \) through \(k+\ell \). Then, according to [11, Theorem 38], the limit of the expected average \(\lim _{k \rightarrow +\infty } \mathrm {E}_{\mathrm {P}}(\overline{f}_{k}(X_{0:k}))\) coincides with the limit expectation \(\mathrm {E}_{\infty }(f)\). One of the aims of this paper is to explore to which extent this remains true for imprecise Markov chains.
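The convergence of the iterates \(T^{\ell } f\) to a constant vector can be observed numerically. A minimal sketch, again assuming a hypothetical two-state transition matrix (illustrative numbers only):

```python
# Hypothetical ergodic two-state transition matrix (rows sum to 1).
T = [[0.9, 0.1],
     [0.4, 0.6]]

def apply_T(f):
    """One step of the linear operator: (T f)(x) = sum_y T(x, y) f(y)."""
    return [sum(T[x][y] * f[y] for y in range(len(f))) for x in range(len(f))]

f = [0.0, 1.0]          # indicator of state 1
for _ in range(200):
    f = apply_T(f)
# both components of f now agree: this common value is the limit
# expectation E_inf(I_1), i.e. the stationary probability of state 1
```

For this particular matrix the stationary distribution is (0.8, 0.2), so both components converge to 0.2.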

3 Imprecise Markov Chains

If the basic probabilities \(\mathrm {P}(X_{k+1} \vert X_{0:k} = x_{0:k})\) that describe a stochastic process are imprecise, in the sense that we only have partial information about them, then we can still model the process’ dynamics by considering a set \(\mathscr {T}\!_{x_{0:k}}\) of such probabilities, for all \(k \in \mathbb {N}_{0}{}\) and all \(x_{0:k} \in \mathscr {X}{}^{k+1}\). This set \(\mathscr {T}\!_{x_{0:k}}\) is then interpreted as the set of all probability mass functions \(\mathrm {P}(X_{k+1} \vert X_{0:k} = x_{0:k})\) that we deem “plausible”. We here consider the special case where the sets \(\mathscr {T}\!_{x_{0:k}}\) satisfy a Markov property, meaning that \(\mathscr {T}\!_{x_{0:k}} = \mathscr {T}\!_{x_{k}}\) for all \(k \in \mathbb {N}_{0}{}\) and all \(x_{0:k} \in \mathscr {X}{}^{k+1}\). Similar to the precise case, the sets \(\mathscr {T}\!_{x}\), for all \(x \in \mathscr {X}{}\), can be gathered into a single object: the set \(\mathscr {T}\!_{}\) of all row stochastic \(\vert \mathscr {X}{} \vert \times \vert \mathscr {X}{} \vert \) matrices T such that, for all \(x \in \mathscr {X}{}\), the probability mass function \(T(x, \cdot )\) is an element of \(\mathscr {T}\!_{x}\). A set \(\mathscr {T}\!_{}\) of transition matrices defined in this way is called separately specified [9]. For any such set \(\mathscr {T}\!_{}\), the corresponding imprecise Markov chain under epistemic irrelevance \(\mathscr {P}^{\, \mathrm {ei}}_{\mathscr {T}\!_{}}\) [3] is the set of all (precise) probability models \(\mathrm {P}\) such that \(\smash {\mathrm {P}(X_{k+1} \vert X_{0:k} = x_{0:k}) \in \mathscr {T}\!_{x_k}}\) for all \(k \in \mathbb {N}_{0}{}\) and all \(\smash {x_{0:k} \in \mathscr {X}{}^{k+1}}\). The values of the probabilities \(\mathrm {P}(X_0 = x_0)\) will be of no importance to us, because we will focus solely on (upper and lower) expectations conditional on the value of the initial state \(X_0\).

Clearly, an imprecise Markov chain \(\mathscr {P}^{\, \mathrm {ei}}_{\mathscr {T}\!_{}}\) also contains non-homogeneous, and even non-Markovian, processes. In this case, the Markov property thus applies not to the individual probability assessments, but to the sets \(\mathscr {T}\!_{x_{0:k}}\). The model \(\mathscr {P}^{\, \mathrm {ei}}_{\mathscr {T}\!_{}}\) is therefore a generalisation of a traditional Markov chain that allows for model uncertainty about, on the one hand, the mass functions \(\mathrm {P}(X_{k+1} \vert X_{0:k} = x_{0:k})\) and, on the other hand, structural assumptions such as the Markov property and time-homogeneity. However, there are also types of imprecise Markov chains that do impose some of these properties. For a given set \(\mathscr {T}\!_{}\), the imprecise Markov chain under complete independence \(\smash {\mathscr {P}^{\, \mathrm {ci}}_{\mathscr {T}\!_{}}}\) is the subset of \(\smash {\mathscr {P}^{\, \mathrm {ei}}_{\mathscr {T}\!_{}}}\) that contains all, possibly non-homogeneous, Markov chains in \(\smash {\mathscr {P}^{\, \mathrm {ei}}_{\mathscr {T}\!_{}}}\) [11]. The imprecise Markov chain under repetition independence \(\smash {\mathscr {P}^{\, \mathrm {ri}}_{\mathscr {T}\!_{}}}\) is the subset of \(\smash {\mathscr {P}^{\, \mathrm {ei}}_{\mathscr {T}\!_{}}}\) containing all homogeneous Markov chains [11]. Henceforth, we let \(\mathscr {T}\!_{}\) be some fixed, arbitrary set of transition matrices that is separately specified.

Now, for any probability model \(\mathrm {P}\) in the imprecise Markov chain \(\mathscr {P}^{\, \mathrm {ei}}_{\mathscr {T}\!_{}}\), we can again consider the corresponding expectation operator \(\mathrm {E}_{\mathrm {P}}\). The upper and lower expectation are then respectively defined as the tightest upper and lower bound on this expectation:

$$\begin{aligned} \overline{\mathrm {E}}^{\, \mathrm {ei}}_{\mathscr {T}\!_{} \,}(f \vert A) := \sup _{\mathrm {P} \in \mathscr {P}^{\, \mathrm {ei}}_{\mathscr {T}\!_{}}} \mathrm {E}_{\mathrm {P}}(f \vert A) \quad \text { and } \quad \underline{\mathrm {E}}^{\, \mathrm {ei}}_{\mathscr {T}\!_{} \,}(f \vert A) := \inf _{\mathrm {P} \in \mathscr {P}^{\, \mathrm {ei}}_{\mathscr {T}\!_{}}} \mathrm {E}_{\mathrm {P}}(f \vert A), \end{aligned}$$
for any finitary function f and any event A of the form \(X_{0:k} = x_{0:k}\). The operators \(\smash {\overline{\mathrm {E}}^{\, \mathrm {ei}}_{\mathscr {T}\!_{}}}\) and \(\smash {\underline{\mathrm {E}}^{\, \mathrm {ei}}_{\mathscr {T}\!_{}}}\) are related by conjugacy, meaning that \(\smash {\underline{\mathrm {E}}^{\, \mathrm {ei}}_{\mathscr {T}\!_{} \,}(\cdot \vert \cdot ) = - \overline{\mathrm {E}}^{\, \mathrm {ei}}_{\mathscr {T}\!_{} \,}(- \cdot \vert \cdot )}\), which allows us to focus on only one of them; upper expectations in our case. The lower expectation \(\smash {\underline{\mathrm {E}}^{\, \mathrm {ei}}_{\mathscr {T}\!_{} \,}(f \vert A)}\) of a finitary function f can then simply be obtained by considering the upper expectation \(\smash {- \overline{\mathrm {E}}^{\, \mathrm {ei}}_{\mathscr {T}\!_{} \,}(- f \vert A)}\).

In a similar way, we can define the upper expectations \(\smash {\overline{\mathrm {E}}^{\, \mathrm {ci}}_{\mathscr {T}\!_{}}}\) and \(\smash {\overline{\mathrm {E}}^{\, \mathrm {ri}}_{\mathscr {T}\!_{}}}\) and the lower expectations \(\smash {\underline{\mathrm {E}}^{\mathrm {ci}}_{\mathscr {T}\!_{}}}\) and \(\smash {\underline{\mathrm {E}}^{\mathrm {ri}}_{\mathscr {T}\!_{}}}\) as the tightest upper and lower bounds on the expectations corresponding to the models in \(\mathscr {P}^{\, \mathrm {ci}}_{\mathscr {T}\!_{}}\) and \(\mathscr {P}^{\, \mathrm {ri}}_{\mathscr {T}\!_{}}\), respectively. Since \(\mathscr {P}^{\, \mathrm {ri}}_{\mathscr {T}\!_{}} \subseteq \mathscr {P}^{\, \mathrm {ci}}_{\mathscr {T}\!_{}} \subseteq \mathscr {P}^{\, \mathrm {ei}}_{\mathscr {T}\!_{}}\), we have that \(\smash {\overline{\mathrm {E}}^{\, \mathrm {ri}}_{\mathscr {T}\!_{}} \, (f \vert A) \le \overline{\mathrm {E}}^{\, \mathrm {ci}}_{\mathscr {T}\!_{}} \, (f \vert A) \le \overline{\mathrm {E}}^{\, \mathrm {ei}}_{\mathscr {T}\!_{}} \, (f \vert A)}\) for any finitary function f and any event A of the form \(X_{0:k} = x_{0:k}\).

As we have mentioned before, imprecise Markov chains generalise traditional Markov chains by incorporating different types of model uncertainty. The corresponding upper (and lower) expectations then allow us to make inferences that are robust with respect to this model uncertainty. For a more detailed discussion on the motivation for and interpretation behind these and other types of so-called imprecise probability models, we refer to [1, 5, 14].

Within the context of imprecise Markov chains, we will be specifically concerned with two types of inferences: the upper and lower expectation of a function at a single time instant, and the upper and lower expectation of the time average of a function. For imprecise Markov chains under epistemic irrelevance and under complete independence, both of these inferences coincide [11, Theorem 51 & Theorem 52]. For any \(f \in \mathscr {L}{}(\mathscr {X}{})\) and any \(x \in \mathscr {X}{}\), we will denote them by

$$\begin{aligned} \overline{\mathrm {E}}_{k}(f \vert x) = \,&\overline{\mathrm {E}}^{\, \mathrm {ei}}_{\mathscr {T}\!_{}}(f(X_{k}) \vert X_0 = x) = \overline{\mathrm {E}}^{\, \mathrm {ci}}_{\mathscr {T}\!_{}}(f(X_{k}) \vert X_0 = x) \\ \text { and } \ \overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x) = \,&\overline{\mathrm {E}}^{\, \mathrm {ei}}_{\mathscr {T}\!_{}}(\overline{f}_{k}(X_{0:k}) \vert X_0 = x) = \overline{\mathrm {E}}^{\, \mathrm {ci}}_{\mathscr {T}\!_{}}(\overline{f}_{k}(X_{0:k}) \vert X_0 = x), \end{aligned}$$

respectively, where the dependency on \(\mathscr {T}\!_{}\) is implicit. The corresponding lower expectations can be obtained through conjugacy: \(\underline{\mathrm {E}}_{k}(f \vert x) = - \overline{\mathrm {E}}_{k}( -f \vert x)\) and \(\underline{\mathrm {E}}_{\mathrm {av},k}(f \vert x) = - \overline{\mathrm {E}}_{\mathrm {av},k}( -f \vert x)\) for all \(f \in \mathscr {L}{}(\mathscr {X}{})\) and all \(x \in \mathscr {X}{}\). In the remainder, we will omit imprecise Markov chains under repetition independence from the discussion. Generally speaking, this type of imprecise Markov chain is less studied within the field of imprecise probability because of its limited capacity to incorporate model uncertainty. Indeed, it is simply a set of time-homogeneous precise Markov chains and therefore only allows for model uncertainty about the numerical values of the transition probabilities. Moreover, as far as we know, a characterisation for the ergodicity of such Markov chains—a central topic in this paper—is currently lacking. We therefore believe that this subject demands a separate discussion, which we defer to future work.

4 Transition Operators, Ergodicity and Weak Ergodicity

Inferences of the form \(\overline{\mathrm {E}}_{k}(f \vert x)\) were among the first to be thoroughly studied in imprecise Markov chains. Their study was fundamentally based on the observation that \(\overline{\mathrm {E}}_{k}(f \vert x)\) can be elegantly rewritten as the k-th iteration of the map \(\overline{T}{} :\mathscr {L}{}(\mathscr {X}{}) \rightarrow \mathscr {L}{}(\mathscr {X}{})\) defined by

$$\begin{aligned} \overline{T}{} h(x) := \sup _{T \in \mathscr {T}\!_{}} [T h](x) = \sup _{T \in \mathscr {T}\!_{}} \sum _{y \in \mathscr {X}{}} T(x,y) h(y), \end{aligned}$$

for all \(x \in \mathscr {X}{}\) and all \(h \in \mathscr {L}{}(\mathscr {X}{})\). Concretely, \(\overline{\mathrm {E}}_{k}(f \vert x) = [\overline{T}{}^k f](x)\) for all \(x \in \mathscr {X}{}\) and all \(k \in \mathbb {N}_{0}{}\) [4, Theorem 3.1]. The map \(\overline{T}{}\) therefore plays a role similar to that of the transition matrix T in traditional Markov chains, which is why it is called the upper transition operator corresponding to the set \(\mathscr {T}\!_{}\).
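For a separately specified set with finitely many candidate rows per state, the supremum in the definition of \(\overline{T}{}\) is a per-row maximum. A minimal sketch, with hypothetical row sets chosen purely for illustration:

```python
# Hypothetical separately specified set: each state has finitely many
# candidate probability mass functions for its row of the transition matrix.
rows = {
    0: [[0.9, 0.1], [0.7, 0.3]],   # candidate rows T(0, .)
    1: [[0.4, 0.6], [0.5, 0.5]],   # candidate rows T(1, .)
}

def upper_T(h):
    """[Tbar h](x) = max over p in the row set of x of sum_y p(y) h(y)."""
    return [max(sum(p[y] * h[y] for y in range(len(h))) for p in rows[x])
            for x in sorted(rows)]

def upper_E_k(f, k):
    """Upper expectation of f(X_k) given the initial state: k-th iterate of Tbar."""
    h = list(f)
    for _ in range(k):
        h = upper_T(h)
    return h
```

Note that the maximising row may differ per state and per function h, which is exactly why \(\overline{T}{}\) is non-linear even though each candidate matrix acts linearly.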

In an analogous way, inferences of the form \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x)\) can be obtained as the k-th iteration of the map \(\smash {\overline{T}{}_{ f}^{} :\mathscr {L}{}(\mathscr {X}{}) \rightarrow \mathscr {L}{}(\mathscr {X}{})}\) defined by \(\overline{T}{}_{ f}^{} h := f + \overline{T}{} h\) for all \(h \in \mathscr {L}{}(\mathscr {X}{})\). In particular, if we let \(\tilde{m}_{f,0} := f\) and

$$\begin{aligned} \tilde{m}_{f,k} := \overline{T}{}_{ f}^{} \, \tilde{m}_{f,k-1} \quad \text {for all } k \in \mathbb {N}{}, \end{aligned}$$
(1)

then it follows from [11, Lemma 41] that \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x) = \tfrac{1}{k+1} \tilde{m}_{f,k}(x)\) for all \(x \in \mathscr {X}{}\) and all \(k \in \mathbb {N}_{0}{}\). Applying Eq. (1) repeatedly, we find that for all \(x \in \mathscr {X}{}\):

$$\begin{aligned} \overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x) = \tfrac{1}{k+1}\tilde{m}_{f,k}(x) =\tfrac{1}{k+1} [\overline{T}{}_{ f}^{k} \tilde{m}_{f,0}](x) = \tfrac{1}{k+1} [\overline{T}{}_{ f}^{k+1}(0)](x). \end{aligned}$$
(2)

The same formula can also be obtained as a special case of the results in [13].
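Equation (2) translates directly into an iterative scheme: apply \(\overline{T}{}_{ f}^{}\) to the zero function \(k+1\) times and divide by \(k+1\). A sketch, reusing the same hypothetical row sets as before:

```python
# Hypothetical separately specified set (illustrative numbers).
rows = {
    0: [[0.9, 0.1], [0.7, 0.3]],
    1: [[0.4, 0.6], [0.5, 0.5]],
}

def upper_T(h):
    return [max(sum(p[y] * h[y] for y in range(len(h))) for p in rows[x])
            for x in sorted(rows)]

def upper_avg(f, k):
    """Upper expected time average of f over X_0 .. X_k, per initial state,
    computed as (1/(k+1)) Tbar_f^{k+1}(0) with Tbar_f h = f + Tbar h."""
    g = [0.0] * len(f)
    for _ in range(k + 1):
        tg = upper_T(g)
        g = [f[x] + tg[x] for x in range(len(f))]
    return [v / (k + 1) for v in g]
```

For k = 1 and f the indicator of state 1, this yields (f(x) + upper expectation of f one step ahead) / 2, as one would expect from the definition of the time average.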

These expressions for \(\overline{\mathrm {E}}_{k}(f \vert x)\) and \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x)\) in terms of the respective operators \(\overline{T}{}\) and \(\overline{T}{}_{ f}^{}\) are particularly useful when we aim to characterise the limit behaviour of these inferences. As will be elaborated on in the next section, there are conditions on \(\overline{T}{}\) that are necessary and sufficient for \(\overline{\mathrm {E}}_{k}(f \vert x)\) to converge to a limit value that does not depend on the process’ initial state \(x \in \mathscr {X}{}\). If this is the case for all \(f \in \mathscr {L}{}(\mathscr {X}{})\), the imprecise Markov chain is called ergodic and we then denote the constant limit value by \(\overline{\mathrm {E}}_{\infty }(f)\). Similarly, we call an imprecise Markov chain weakly ergodic if, for all \(f \in \mathscr {L}{}(\mathscr {X}{})\), \(\lim _{k \rightarrow +\infty } \overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x)\) exists and does not depend on the initial state x. For a weakly ergodic imprecise Markov chain, we denote the common limit value by \(\overline{\mathrm {E}}_{\mathrm {av},\infty }(f)\). In contrast with standard ergodicity, weak ergodicity and, more generally, the limit behaviour of \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x)\) are almost entirely unexplored. The aim of this paper is to remedy this situation. The main contribution will be a necessary and sufficient condition for an imprecise Markov chain to be weakly ergodic. As we will see, this condition is weaker than those needed for standard ergodicity, hence our choice of terminology. The following example shows that this difference already becomes apparent in the precise case.

Example 1

Let \(\mathscr {X}{} = \{a,b\}\), consider any function \(\smash {f = \big [{\begin{matrix} f_a \\ f_b \end{matrix}}\big ] \in \mathscr {L}{}(\mathscr {X}{})}\) and assume that \(\mathscr {T}\!_{}\) consists of a single matrix \(T = \big [{\begin{matrix} 0 &{} 1 \\ 1 &{} 0 \end{matrix}}\big ]\). Clearly, \(\overline{T}{}\) is not ergodic because \(\smash {\overline{T}{}^{(2\ell + 1)} f = T^{(2\ell + 1)} f = \big [{\begin{matrix} 0 &{} 1 \\ 1 &{} 0 \end{matrix}}\big ]f = \big [{\begin{matrix} f_b \\ f_a \end{matrix}}\big ]} \) and \(\smash {\overline{T}{}^{(2\ell )} f = \big [{\begin{matrix} 1 &{} 0 \\ 0 &{} 1 \end{matrix}}\big ]f = \big [{\begin{matrix} f_a \\ f_b \end{matrix}}\big ]}\) for all \(\ell \in \mathbb {N}_{0}{}\). \(\overline{T}{}\) is weakly ergodic though, because

$$\begin{aligned} \overline{T}{}_{ f}^{(2\ell )}(0) = \ell \big [{\begin{matrix} f_a + f_b \\ f_a + f_b \end{matrix}}\big ] \, \text { and } \, \overline{T}{}_{ f}^{(2\ell +1)}(0) = f + \overline{T}{} \, \overline{T}{}_{ f}^{(2\ell )}(0) = f + \ell \big [{\begin{matrix} f_a + f_b \\ f_a + f_b \end{matrix}}\big ], \end{aligned}$$

for all \(\ell \in \mathbb {N}_{0}{}\), which implies that \(\overline{\mathrm {E}}_{\mathrm {av},\infty }(f) = \lim _{k \rightarrow +\infty } \tfrac{1}{k+1} \overline{T}{}_{ f}^{k+1}(0) = \tfrac{1}{2}(f_a + f_b)\) exists. \(\Diamond \)
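The computations in Example 1 are easy to reproduce numerically. A sketch with f chosen as [2, 5] for concreteness (this particular f is our assumption; the example holds for any f):

```python
# Example 1: the set consists of the single permutation matrix T below.
f = [2.0, 5.0]
T = [[0, 1], [1, 0]]

def T_f(h):
    """[Tbar_f h](x) = f(x) + [T h](x)."""
    return [f[x] + sum(T[x][y] * h[y] for y in range(2)) for x in range(2)]

h = [0.0, 0.0]
k = 999
for _ in range(k + 1):                  # k+1 applications, as in Eq. (2)
    h = T_f(h)
avg = [v / (k + 1) for v in h]          # upper expected time average at time k
# both components equal (f_a + f_b)/2 = 3.5, even though T^k f oscillates
```
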

Notably, even if an imprecise Markov chain is ergodic (and hence also weakly ergodic) and therefore both \(\overline{\mathrm {E}}_{\infty }(f)\) and \(\overline{\mathrm {E}}_{\mathrm {av},\infty }(f)\) exist, these inferences will not necessarily coincide. This was first observed in an experimental setting [11, Section 7.6], but the differences that were observed there were marginal. The following example shows that these differences can in fact be very substantial.

Example 2

Let \(\mathscr {X}{} = \{a,b\}\), let \(\mathscr {T}\!_{a}\) be the set of all probability mass functions on \(\mathscr {X}{}\) and let \(\mathscr {T}\!_{b} := \{p\}\) for the probability mass function \(p = (p_a , p_b) = (1,0)\) that puts all mass in a. Then, for any \(f = \big [{\begin{matrix} f_a \\ f_b \end{matrix}}\big ] \in \mathscr {L}{}(\mathscr {X}{})\), we have that

$$\begin{aligned} \overline{T}{}f (x) = \begin{aligned} {\left\{ \begin{array}{ll} \max f &{}\text { if } x=a; \\ f_a &{}\text { if } x=b, \end{array}\right. } \end{aligned} \quad \text { and } \quad \overline{T}{}^{ \, 2} f (x) = \begin{aligned} {\left\{ \begin{array}{ll} \max \overline{T}{} f = \max f &{}\text { if } x=a; \\ \overline{T}{}f (a) = \max f &{}\text { if } x=b. \end{array}\right. } \end{aligned} \end{aligned}$$

It follows that \(\overline{T}{}^k f = \max f\) for all \(k \ge 2\), so the limit upper expectation \(\overline{\mathrm {E}}_{\infty }(f)\) exists and is equal to \(\max f\) for all \(f \in \mathscr {L}{}(\mathscr {X}{})\). In particular, we have that \(\overline{\mathrm {E}}_{\infty }(\mathbb {I}_{b}) = 1\). On the other hand, we find that \(\smash {\overline{T}{}_{\mathbb {I}_{b}}^{(2\ell )}(0) = \ell }\) and \(\smash {\overline{T}{}_{\mathbb {I}_{b}}^{(2\ell + 1)}(0)} = \smash {\mathbb {I}_{b} + \overline{T}{} \, \overline{T}{}_{\mathbb {I}_{b}}^{(2\ell )}(0)} = \smash {\big [{\begin{matrix} \ell \\ \ell +1 \end{matrix}}\big ]}\) for all \(\ell \in \mathbb {N}_{0}{}\). This implies that the upper expectation \(\overline{\mathrm {E}}_{\mathrm {av},\infty }(\mathbb {I}_{b}) = \lim _{k \rightarrow +\infty } \tfrac{1}{k+1} \overline{T}{}_{\mathbb {I}_{b}}^{k+1}(0)\) exists and is equal to 1/2. This value differs significantly from the limit upper expectation \(\overline{\mathrm {E}}_{\infty }(\mathbb {I}_{b}) = 1\).

In fact, this result could have been expected simply by taking a closer look at the dynamics that correspond to \(\mathscr {T}\!_{}\). Indeed, it follows directly from \(\mathscr {T}\!_{}\) that, if the system is in state b at some instant, then it will surely be in a at the next time instant. Hence, the system can only reside in state b for maximally half of the time, resulting in an upper expected average that converges to 1/2. These underlying dynamics have little effect on the limit upper expectation \(\overline{\mathrm {E}}_{\infty }(\mathbb {I}_{b})\) though, because it is only concerned with the upper expectation of \(\mathbb {I}_{b}\) evaluated at a single time instant. \(\Diamond \)
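Example 2 can likewise be checked numerically. Here the upper transition operator takes the closed form \(\overline{T}{}h(a) = \max h\) and \(\overline{T}{}h(b) = h_a\), which the sketch below encodes directly:

```python
# Example 2: Tbar h(a) = max h (T_a unconstrained), Tbar h(b) = h(a).
def upper_T(h):
    return [max(h), h[0]]

# single-instant inference: Tbar^k I_b is constantly 1 for k >= 2
h = [0.0, 1.0]                          # indicator of b
for _ in range(2):
    h = upper_T(h)
single = h                              # [1.0, 1.0]

# time-average inference via Eq. (2): (1/(k+1)) Tbar_f^{k+1}(0) with f = I_b
f = [0.0, 1.0]
g = [0.0, 0.0]
k = 999
for _ in range(k + 1):
    tg = upper_T(g)
    g = [f[x] + tg[x] for x in range(2)]
avg = [v / (k + 1) for v in g]          # [0.5, 0.5]
```

This reproduces the gap discussed above: the single-instant upper expectation is 1, while the upper expected time average settles at 1/2 because the process must leave b after every visit.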

Although we have used sets \(\mathscr {T}\!_{}\) of transition matrices to define imprecise Markov chains, it should at this point be clear that, if we are interested in the inferences \(\overline{\mathrm {E}}_{k}(f \vert x)\) and \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x)\) and their limit values, then it suffices to specify \(\overline{T}{}\). In fact, we will henceforth forget about \(\mathscr {T}\!_{}\) and will assume that \(\overline{T}{}\) is a coherent upper transition operator on \(\mathscr {L}{}(\mathscr {X}{})\), meaning that it is an operator from \(\mathscr {L}{}(\mathscr {X}{})\) to \(\mathscr {L}{}(\mathscr {X}{})\) that satisfies

  1. C1.

    \(\min h \le \overline{T}{} h \le \max h\)                      [boundedness];

  2. C2.

    \(\overline{T}{} (h+g) \le \overline{T}{}h + \overline{T}{}g\)                     [sub-additivity];

  3. C3.

    \(\overline{T}{}(\lambda h) = \lambda \overline{T}{}h\)              [non-negative homogeneity],

for all \(h,g \in \mathscr {L}{}(\mathscr {X})\) and all real \(\lambda \ge 0\) [5, 14, 15], and we will regard \(\overline{\mathrm {E}}_{k}(f \vert x)\) and \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x)\) as objects that correspond to \(\overline{T}{}\). Our results and proofs will never rely on the fact that \(\overline{T}{}\) is derived from a set \(\mathscr {T}\!_{}\) of transition matrices, but will only make use of C1–C3 and the following two properties that are implied by them [14, Section 2.6.1]:

  1. C4.

    \(\overline{T}{}(\mu + h) = \mu + \overline{T}{}h\)                      [constant additivity];

  2. C5.

    if \(h \le g\) then \(\overline{T}{} h \le \overline{T}{} g\)                          [monotonicity],

for all \(h,g \in \mathscr {L}{}(\mathscr {X})\) and all real \(\mu \). This can be done without loss of generality because an upper transition operator \(\smash {\overline{T}{}}\) that is defined as an upper envelope of a set \(\mathscr {T}\!_{}\) of transition matrices—as we did in Sect. 4—is always coherent [14, Theorem 2.6.3]. Since properties such as ergodicity and weak ergodicity can be completely characterised in terms of \(\overline{T}{}\), we will henceforth simply say that \(\overline{T}{}\) itself is (weakly) ergodic, instead of saying that the corresponding imprecise Markov chain is.

5 Accessibility Relations and Topical Maps

To characterise ergodicity and weak ergodicity, we will make use of some well-known graph-theoretic concepts, suitably adapted to the imprecise Markov chain setting; we recall the following from [4] and [8]. The upper accessibility graph \(\mathscr {G}(\overline{T}{})\) corresponding to \(\overline{T}{}\) is defined as the graph with vertices \(x_1, \cdots , x_n \in \mathscr {X}{}\), where \(n := \vert \mathscr {X}{} \vert \), with an edge from \(x_i\) to \(x_j\) if \(\overline{T}{}\mathbb {I}_{x_j}(x_i) > 0\). For any two vertices \(x_i\) and \(x_j\), we say that \(x_j\) is accessible from \(x_i\), denoted by \(x_i \rightarrow x_j\), if \(x_i = x_j\) or if there is a directed path from \(x_i\) to \(x_j\), which means that there is a sequence \(x_i = x'_0, x'_1 , \cdots , x'_m = x_j\) of vertices, with \(m \in \mathbb {N}{}\), such that there is an edge from \(x'_{\ell -1}\) to \(x'_{\ell }\) for all \(\ell \in \{1,\cdots ,m\}\). We say that two vertices \(x_i\) and \(x_j\) communicate and write \(x_i \leftrightarrow x_j\) if both \(x_i \rightarrow x_j\) and \(x_j \rightarrow x_i\). The relation \(\leftrightarrow \) is an equivalence relation (reflexive, symmetric and transitive) and the equivalence classes are called communication classes. We call the graph \(\mathscr {G}(\overline{T}{})\) strongly connected if any two vertices \(x_i\) and \(x_j\) in \(\mathscr {G}(\overline{T}{})\) communicate, or equivalently, if \(\mathscr {X}{}\) itself is a communication class. Furthermore, we say that \(\overline{T}{}\) (or \(\mathscr {G}(\overline{T}{})\)) has a top class \(\mathcal {R}\) if

$$\begin{aligned} \mathcal {R} := \{ x \in \mathscr {X}{} :(\forall y \in \mathscr {X}{}) \ y \rightarrow x \} \ne \emptyset . \end{aligned}$$
So, if \(\overline{T}{}\) has a top class \(\mathcal {R}\), then \(\mathcal {R}\) is accessible from any vertex in the graph \(\mathscr {G}(\overline{T}{})\). As a fairly immediate consequence, it follows that \(\mathcal {R}\) is a communication class that is maximal or undominated, meaning that \(x \not \rightarrow y\) for all \(x \in \mathcal {R}\) and all \(y \in \mathcal {R}^c\). In fact, it is the only such maximal communication class.

Having a top class is necessary for \(\overline{T}{}\) to be ergodic, but it is not sufficient. Sufficiency additionally requires that the top class \(\mathcal {R}\) satisfies [8, Proposition 3]:

  1. E1.

    \((\forall x \in \mathcal {R})(\exists k^*\in \mathbb {N}{})(\forall k \ge k^*) \ \min \overline{T}{}^k \mathbb {I}_{x} > 0\)                      [Regularity];

  2. E2.

    \((\forall x \in \mathcal {R}^c)(\exists k \in \mathbb {N}{}) \ \overline{T}{}^k \mathbb {I}_{\mathcal {R}^c}(x) < 1\)                                        [Absorbing].

We will say that \(\overline{T}{}\) is top class regular (TCR) if it has a top class that is regular, and analogously for top class absorbing (TCA). Top class regularity represents aperiodic behaviour: it demands that there is some time instant \(k^*\in \mathbb {N}{}\) such that, for any \(k \ge k^*\), all elements of the top class \(\mathcal {R}\) are accessible from each other in k steps. In the case of traditional Markov chains, top class regularity is a necessary and sufficient condition for ergodicity [4, 10]. In the imprecise case, however, we need the additional condition of being top class absorbing, which ensures that the top class will eventually be reached: it requires that, if the process starts from any state \(x \in \mathcal {R}^c\), the lower probability that it will ever transition to \(\mathcal {R}\) is strictly positive. We refer to [4] for more details. From a practical point of view, an important feature of both accessibility conditions is that they are easy to verify [8].
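The accessibility analysis itself is a plain graph computation. A sketch for the operator of Example 2: build the edges from \(\overline{T}{}\mathbb {I}_{x_j}(x_i) > 0\), take the reflexive-transitive closure, and read off the top class as the set of states accessible from every state:

```python
# States: 0 stands for a, 1 for b (the operator of Example 2).
states = [0, 1]

def upper_T(h):
    return [max(h), h[0]]

def indicator(x):
    return [1.0 if y == x else 0.0 for y in states]

# edge i -> j iff Tbar I_j (i) > 0
edges = {i: {j for j in states if upper_T(indicator(j))[i] > 0} for i in states}

def accessible(i):
    """All j with i -> j (reflexive-transitive closure via depth-first search)."""
    seen, stack = {i}, [i]
    while stack:
        for j in edges[stack.pop()]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

acc = {i: accessible(i) for i in states}
top_class = {j for j in states if all(j in acc[i] for i in states)}
# here the graph is strongly connected, so the top class is all of {0, 1}
```
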

The characterisation of ergodicity using (TCR) and (TCA) was strongly inspired by the observation that upper transition operators are part of a specific collection of order-preserving maps, called topical maps. These are maps \(F :\mathbb {R}{}^n \rightarrow \mathbb {R}{}^n\) that satisfy

  1. T1. \(F(\mu + h) = \mu + Fh\)   [constant additivity];

  2. T2. if \(h \le g\) then \(F(h) \le F(g)\)   [monotonicity],

for all \(h,g \in \mathbb {R}{}^n\) and all \(\mu \in \mathbb {R}{}\). To show this, we identify \(\mathscr {L}{}(\mathscr {X}{})\) with the finite-dimensional linear space \(\mathbb {R}{}^n\), with \(n = \vert \mathscr {X}{} \vert \); this is possible because the two spaces are isomorphic. That every coherent upper transition operator is topical now follows trivially from C4 and C5. What is perhaps less obvious, but can be derived in an equally trivial way, is that the operator \(\overline{T}{}_{ f}^{}\) is also topical. This allows us to apply results for topical maps to \(\smash {\overline{T}{}_{ f}^{}}\) in order to find necessary and sufficient conditions for weak ergodicity.
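The properties T1 and T2 are easy to verify numerically for a concrete operator. The sketch below constructs a coherent upper transition operator as the upper envelope of two stochastic matrices (a hypothetical credal set of our own choosing) and spot-checks constant additivity and monotonicity of the associated map \(\overline{T}{}_{ f}^{}\):

```python
import numpy as np

# Hypothetical coherent upper transition operator: the upper envelope
# of a finite set of stochastic matrices (a small credal set).
Ts = [np.array([[0.6, 0.4], [0.3, 0.7]]),
      np.array([[0.8, 0.2], [0.5, 0.5]])]

def T_upper(h):
    # componentwise maximum over the compatible transition models
    return np.max([T @ h for T in Ts], axis=0)

f = np.array([1.0, 0.0])

def T_f(h):
    # the map h -> f + T_upper(h) used for expected time averages
    return f + T_upper(h)

# Spot-check T1 (constant additivity) and T2 (monotonicity):
h = np.array([0.3, -1.2])
g = h + np.array([0.5, 0.1])                     # g >= h
assert np.allclose(T_f(h + 2.0), T_f(h) + 2.0)   # T1
assert np.all(T_f(h) <= T_f(g))                  # T2
```

T1 holds because the rows of each stochastic matrix sum to one, and T2 because their entries are non-negative; a finite spot-check like this is of course no proof, only a sanity check of the claim.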

6 A Sufficient Condition for Weak Ergodicity

As a first step, we aim to find sufficient conditions for the existence of \(\overline{\mathrm {E}}_{\mathrm {av},\infty }(f)\). To that end, recall from Sect. 4 that if \(\overline{\mathrm {E}}_{\mathrm {av},\infty }(f)\) exists, it is equal to the limit \(\smash {\lim _{k \rightarrow +\infty } \overline{T}{}_{ f}^{k} (0) / k}\). Then, since \(\overline{T}{}_{ f}^{}\) is topical, the following lemma implies that it is also equal to \(\smash {\lim _{k \rightarrow +\infty } \overline{T}{}_{ f}^{k} h / k}\) for any \(h \in \mathscr {L}{}(\mathscr {X}{})\).

Lemma 1

[7, Lemma 3.1]. Consider any topical map \(F :\mathbb {R}{}^n \rightarrow \mathbb {R}{}^n\). If the limit \(\lim _{k \rightarrow +\infty } F^k h / k\) exists for some \(h \in \mathbb {R}{}^n\), then the limit exists for all \(h \in \mathbb {R}{}^n\) and they are all equal.

Hence, if \(\smash {\lim _{k \rightarrow +\infty } \overline{T}{}_{ f}^{k} h / k}\) converges to a constant vector \(\mu \) for some \(h \in \mathscr {L}{}(\mathscr {X}{})\), then \(\overline{\mathrm {E}}_{\mathrm {av},\infty }(f)\) exists and is equal to \(\mu \). This condition is clearly satisfied if the map \(\smash {\overline{T}{}_{ f}^{}}\) has an (additive) eigenvector \(h \in \mathscr {L}{}(\mathscr {X}{})\), meaning that \(\smash {\overline{T}{}_{ f}^{} h = h + \mu }\) for some \(\mu \in \mathbb {R}{}\), and hence, by constant additivity, \(\smash {\overline{T}{}_{ f}^{k} h = h + k \mu }\) for all \(k \in \mathbb {N}_{0}{}\). In that case, we have that \(\overline{\mathrm {E}}_{\mathrm {av},\infty }(f) = \mu \), where \(\mu \) is called the eigenvalue corresponding to h.
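The limit \(\smash {\lim _{k \rightarrow +\infty } \overline{T}{}_{ f}^{k} (0) / k}\) can be approximated by plain iteration. A minimal sketch, using a hypothetical upper envelope of two stochastic matrices as the model (our own example, not from the text):

```python
import numpy as np

# Hypothetical model: upper envelope of two stochastic matrices.
Ts = [np.array([[0.6, 0.4], [0.3, 0.7]]),
      np.array([[0.8, 0.2], [0.5, 0.5]])]

def T_f(h, f):
    return f + np.max([T @ h for T in Ts], axis=0)

def upper_expected_time_average(f, k=2000):
    """Approximate E_av,inf(f) as T_f^k(0) / k by plain iteration."""
    h = np.zeros_like(f, dtype=float)
    for _ in range(k):
        h = T_f(h, f)
    return h / k

mu = upper_expected_time_average(np.array([1.0, 0.0]))
# the accessibility graph of this model is strongly connected, so the
# iterates settle on a constant vector (up to an O(1/k) error)
```

Since all upper transition probabilities of this model are strictly positive, its graph is strongly connected and an eigenvector exists, so both components of `mu` agree in the limit.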

To find conditions that guarantee the existence of an eigenvector of \(\overline{T}{}_{ f}^{}\), we will make use of results from [6] and [7]. There, accessibility graphs are defined in a slightly different way: for any topical map \(F :\mathbb {R}{}^n \rightarrow \mathbb {R}{}^n\), they let \(\mathscr {G}'(F)\) be the graph with vertices \(v_{1} , \cdots , v_{n}\) and an edge from \(v_i\) to \(v_j\) if \(\smash {\lim _{\alpha \rightarrow +\infty } [F(\alpha \mathbb {I}_{v_j})](v_i) = +\infty }\). For such a graph \(\mathscr {G}'(F)\), the accessibility relation \(\cdot \rightarrow \cdot \) and the corresponding notions (e.g. ‘strongly connected’, ‘top class’, ...) are then defined as in Sect. 5. In particular, this can be done for the topical maps \(\overline{T}{}\) and \(\overline{T}{}_{ f}^{}\), where we identify the vertices \(v_{1} , \cdots , v_{n}\) in \(\mathscr {G}'(\overline{T}{})\) and \(\smash {\mathscr {G}'(\overline{T}{}_{ f}^{})}\) with the states \(x_{1} , \cdots , x_{n}\) in \(\mathscr {X}{}\). The following results show that the resulting graphs coincide with the one defined in Sect. 5.

Lemma 2

For any two vertices x and y in \(\mathscr {G}'(\overline{T}{})\), there is an edge from x to y in \(\mathscr {G}'(\overline{T}{})\) if and only if there is an edge from x to y in \(\mathscr {G}(\overline{T}{})\).

Proof

Consider any two vertices x and y in the graph \(\mathscr {G}'(\overline{T}{})\). Then there is an edge from x to y if and only if \(\lim _{\alpha \rightarrow +\infty } [\overline{T}{}(\alpha \mathbb {I}_{y})](x) = +\infty \). By non-negative homogeneity [C3], this is equivalent to the condition that \(\lim _{\alpha \rightarrow +\infty } \alpha [\overline{T}{}\mathbb {I}_{y}](x) = +\infty \). Since moreover \(0 \le \overline{T}{}\mathbb {I}_{y} \le 1\) by C1, this condition reduces to \(\overline{T}{}\mathbb {I}_{y}(x) > 0\), which is exactly the condition for an edge from x to y in \(\mathscr {G}(\overline{T}{})\). \(\square \)

Corollary 1

The graphs \(\mathscr {G}'(\overline{T}{}_{ f}^{})\), \(\mathscr {G}'(\overline{T}{})\) and \(\mathscr {G}(\overline{T}{})\) are identical.

Proof

Lemma 2 implies that \(\mathscr {G}'(\overline{T}{})\) and \(\mathscr {G}(\overline{T}{})\) are identical. Moreover, that \(\mathscr {G}'(\overline{T}{}_{ f}^{})\) is equal to \(\mathscr {G}'(\overline{T}{})\), follows straightforwardly from the definition of \(\overline{T}{}_{ f}^{}\).    \(\square \)

In principle, we could use this result to directly obtain the desired condition for the existence of an eigenvector from [6, Theorem 2]. However, [6, Theorem 2] is given in a multiplicative framework and would need to be reformulated in an additive framework in order to be applicable to the map \(\overline{T}{}_{ f}^{}\); see [6, Section 2.1]. This can be achieved with a bijective transformation, but we prefer to not do so because it would require too much extra terminology and notation. Instead, we will derive an additive variant of [6, Theorem 2] directly from [6, Theorem 9] and [6, Theorem 10].

The first result establishes that the existence of an eigenvector is equivalent to the boundedness of trajectories with respect to the Hilbert semi-norm \(\left\Vert \cdot \right\Vert _{\mathrm {H}}\), defined by \(\left\Vert h\right\Vert _{\mathrm {H}} :=\max h - \min h\) for all \(h \in \mathbb {R}{}^n\).

Theorem 1

[6, Theorem 9]. Let \(F :\mathbb {R}{}^n \rightarrow \mathbb {R}{}^n\) be a topical map. Then F has an eigenvector in \(\mathbb {R}{}^n\) if and only if \(\left\{ \left\Vert F^k h\right\Vert _{\mathrm {H}} :k \in \mathbb {N}{} \right\} \) is bounded for some (and hence all) \(h \in \mathbb {R}{}^n\).

That the boundedness of a single trajectory indeed implies the boundedness of all trajectories follows from the non-expansiveness of a topical map with respect to the Hilbert semi-norm [6]. The second result that we need uses the notion of a super-eigenspace, defined for any topical map F and any \(\mu \in \mathbb {R}{}\) as the set \(S^\mu (F) :=\{h \in \mathbb {R}{}^n :Fh \le h + \mu \}\).

Theorem 2

[6, Theorem 10]. Let \(F :\mathbb {R}{}^n \rightarrow \mathbb {R}{}^n\) be a topical map such that the associated graph \(\mathscr {G}'(F)\) is strongly connected. Then all of the super-eigenspaces are bounded in the Hilbert semi-norm.

Together, these theorems imply that any topical map \(F :\mathbb {R}{}^n \rightarrow \mathbb {R}{}^n\) for which the graph \(\mathscr {G}'(F)\) is strongly connected has an eigenvector. The connection between the two is provided by the fact that trajectories cannot leave a super-eigenspace. The following result formalises this.

Theorem 3

Let \(F :\mathbb {R}{}^n \rightarrow \mathbb {R}{}^n\) be a topical map such that the associated graph \(\mathscr {G}'(F)\) is strongly connected. Then F has an eigenvector in \(\mathbb {R}{}^n\).

Proof

Consider any \(h\in \mathbb {R}{}^n\) and any \(\mu \in \mathbb {R}\) such that \(\max (Fh - h) \le \mu \). Then \(Fh\le h+\mu \), so \(h\in S^\mu (F)\). Now notice that \(F(Fh) \le F(h +\mu ) = Fh +\mu \) because of T1 and T2, which implies that also \(Fh \in S^\mu (F)\). In the same way, we can also deduce that \(F^{2} h \in S^\mu (F)\) and, by repeating this argument, that the whole trajectory corresponding to h remains in \(S^\mu (F)\). This trajectory is bounded because of Theorem 2, which by Theorem 1 guarantees the existence of an eigenvector.    \(\square \)
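The argument in this proof is easy to replay numerically. In the sketch below (with a hypothetical envelope model of our own choosing as the topical map), a trajectory started in \(S^\mu (F)\) is verified to remain there, and its Hilbert semi-norms stay bounded, in line with Theorems 1 and 2:

```python
import numpy as np

# Hypothetical topical map: T_f(h) = f + upper envelope of two
# stochastic matrices applied to h.
Ts = [np.array([[0.6, 0.4], [0.3, 0.7]]),
      np.array([[0.8, 0.2], [0.5, 0.5]])]
f = np.array([1.0, 0.0])

def T_f(h):
    return f + np.max([T @ h for T in Ts], axis=0)

def hilbert_seminorm(h):
    return np.max(h) - np.min(h)

h = np.array([2.0, -3.0])
mu = np.max(T_f(h) - h)        # chosen so that h lies in S^mu(T_f)
norms = []
for _ in range(200):
    # the trajectory never leaves the super-eigenspace S^mu(T_f)
    assert np.all(T_f(h) <= h + mu + 1e-9)
    h = T_f(h)
    norms.append(hilbert_seminorm(h))
# the Hilbert semi-norms along the trajectory remain bounded
```

The invariance check mirrors the induction step \(F(Fh) \le F(h + \mu ) = Fh + \mu \) in the proof, and the bounded semi-norms are what Theorem 1 converts into the existence of an eigenvector.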

In particular, if \(\mathscr {G}'(\overline{T}{}_{ f}^{})\) is strongly connected then \(\overline{T}{}_{ f}^{}\) has an eigenvector, which in turn implies the existence of \(\smash {\overline{\mathrm {E}}_{\mathrm {av},\infty }(f)}\) as explained earlier. If we combine this observation with Corollary 1, we obtain the following result.

Proposition 1

An upper transition operator \(\overline{T}{}\) is weakly ergodic if the associated graph \(\mathscr {G}(\overline{T}{})\) is strongly connected.

Proof

Suppose that \(\mathscr {G}(\overline{T}{})\) is strongly connected. Then, by Corollary 1, \(\mathscr {G}'(\overline{T}{}_{ f}^{})\) is also strongly connected. Hence, since \(\smash {\overline{T}{}_{ f}^{}}\) is a topical map, Theorem 3 guarantees the existence of an eigenvector of \(\smash {\overline{T}{}_{ f}^{}}\). As explained in the beginning of this section, this implies by Lemma 1 that \(\smash {\overline{\mathrm {E}}_{\mathrm {av},\infty }(f)}\) exists, so we indeed find that \(\overline{T}{}\) is weakly ergodic.    \(\square \)

In the remainder of this paper, we will use the fact that \(\overline{T}{}\) is coherent—so not just topical—to strengthen this result. In particular, we will show that the condition of being strongly connected can be replaced by a weaker one: being top class absorbing. It will moreover turn out that this property is not only sufficient, but also necessary for weak ergodicity.

7 Necessary and Sufficient Condition for Weak Ergodicity

In order to gain some intuition about how to obtain a more general sufficient condition for weak ergodicity, consider the case where \(\overline{T}{}\) has a top class \(\mathcal {R}\) and the process’ initial state x is in \(\mathcal {R}\). Since \(\mathcal {R}\) is a maximal communication class, the process surely remains in \(\mathcal {R}\) and hence, it is to be expected that the time average of f will not be affected by the dynamics of the process outside \(\mathcal {R}\). Moreover, the communication class \(\mathcal {R}\) is a strongly connected component, so one would expect that, due to Proposition 1, the upper expected time average \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x)\) converges to a constant that does not depend on the state \(x \in \mathcal {R}\). Our intuition is formalised by the following proposition. Its proof, as well as those of the other statements in this section, can be found in the appendix of [12].

Proposition 2

For any maximal communication class \(\mathcal {S}\) and any \(x\in \mathcal {S}\), the upper expectation \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x)\) is equal to \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \mathbb {I}_{\mathcal {S}} \vert x)\) and converges to a limit value. This limit value is furthermore the same for all \(x \in \mathcal {S}\).

As a next step, we want to extend the domain of convergence of \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x)\) to all states \(x \in \mathscr {X}{}\). To do so, we will impose the additional property of being top class absorbing (TCA), which, as explained in Sect. 5, demands that there is a strictly positive (lower) probability to reach the top class \(\mathcal {R}\) in a finite time period. Once in \(\mathcal {R}\), the process can never escape \(\mathcal {R}\) though. One would therefore expect that as time progresses—as more of these finite time periods go by—this lower probability increases, implying that the process will eventually be in \(\mathcal {R}\) with practical certainty. Furthermore, if the process transitions from \(x \in \mathcal {R}^c\) to a state \(y \in \mathcal {R}\), then Proposition 2 guarantees that \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert y)\) converges to a limit and that this limit value does not depend on the state y. Finally, since the average is taken over a growing time interval, the initial finite number of time steps that it took for the process to transition from x to y will not influence the time average of f in the limit. This leads us to suspect that \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x)\) converges to the same limit as \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert y)\). Since this argument applies to any \(x\in \mathcal {R}^c\), we are led to believe that \(\overline{T}{}\) is weakly ergodic. The following result confirms this.

Proposition 3

Any \(\overline{T}{}\) that satisfies (TCA) is weakly ergodic.

Conversely, suppose that \(\overline{T}{}\) does not satisfy (TCA). Then there are two possibilities: either there is no top class or there is a top class but it is not absorbing. If there is no top class, then it can be easily deduced that there are at least two maximal communication classes \(\mathcal {S}_1\) and \(\mathcal {S}_2\). As discussed earlier, the process cannot escape the classes \(\mathcal {S}_1\) and \(\mathcal {S}_2\) once it has reached them. So if it starts in one of these communication classes, the process’ dynamics outside this class are irrelevant for the behaviour of the resulting time average. In particular, if we let f be the function that takes the constant value \(c_1\) in \(\mathcal {S}_1\) and \(c_2\) in \(\mathcal {S}_2\), with \(c_1 \not = c_2\), then we would expect that \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x) = c_1\) and \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert y) = c_2\) for all \(k \in \mathbb {N}_{0}{}\), any \(x \in \mathcal {S}_1\) and any \(y \in \mathcal {S}_2\). In fact, this can easily be formalised by means of Proposition 2. Hence, \(\overline{\mathrm {E}}_{\mathrm {av},\infty }(f \vert x)=c_1\ne c_2=\overline{\mathrm {E}}_{\mathrm {av},\infty }(f \vert y)\), so the upper transition operator \(\overline{T}{}\) cannot be weakly ergodic.
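This construction can be verified numerically. The sketch below uses a degenerate (precise) model with two absorbing states as the two maximal communication classes; the constants \(c_1 = 3\) and \(c_2 = 7\) are arbitrary illustrative choices:

```python
import numpy as np

# Two disjoint communication classes S1 = {0} and S2 = {1}: there is
# no top class, so the model cannot be weakly ergodic.  A degenerate
# (precise) transition model suffices for the illustration.
T = np.array([[1.0, 0.0],
              [0.0, 1.0]])
c1, c2 = 3.0, 7.0
f = np.array([c1, c2])      # f takes value c1 on S1 and c2 on S2

k = 1000
h = np.zeros(2)
for _ in range(k):
    h = f + T @ h           # value iteration for expected time averages
avg = h / k                 # expected time average given the initial state
# avg[0] == c1 and avg[1] == c2: the limit depends on the initial state
```

Since the process never leaves its initial class, the expected time average from state 0 is exactly \(c_1\) and from state 1 exactly \(c_2\), so the two limits differ.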

Proposition 4

Any weakly ergodic \(\overline{T}{}\) has a top class.

Finally, suppose that there is a top class \(\mathcal {R}\), but that it is not absorbing. This implies that there is an \(x \in \mathcal {R}^c\) and a compatible precise model such that the process is guaranteed to remain in \(\mathcal {R}^c\) given that it started in x. If we now let \(f = \mathbb {I}_{\mathcal {R}^c}\), then conditional on the fact that \(X_0 = x\), the expected time average of f corresponding to this precise model is equal to 1. Furthermore, since \(f\le 1\), no other process can yield a higher expected time average. The upper expected time average \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x)\) is therefore equal to 1 for all \(k \in \mathbb {N}_{0}{}\). However, using Proposition 2, we can also show that \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert y) = 0\) for any \(y \in \mathcal {R}\) and all \(k \in \mathbb {N}_{0}{}\). Hence, \(\overline{\mathrm {E}}_{\mathrm {av},\infty }(f \vert x)=1\ne 0=\overline{\mathrm {E}}_{\mathrm {av},\infty }(f \vert y)\), which precludes \(\overline{T}{}\) from being weakly ergodic.

Proposition 5

Any weakly ergodic \(\overline{T}{}\) that has a top class satisfies (TCA).

Together with Propositions 3 and 4, this allows us to conclude that (TCA) is a necessary and sufficient condition for weak ergodicity.

Theorem 4

\(\overline{T}{}\) is weakly ergodic if and only if it is top class absorbing.

8 Conclusion

The most important conclusion of our study of upper and lower expected time averages is its final result: that being top class absorbing is necessary and sufficient for weak ergodicity, a property that guarantees that upper and lower expected time averages converge to a limit value that does not depend on the process’ initial state. In comparison with standard ergodicity, which guarantees the existence of a limit upper and lower expectation, weak ergodicity thus requires less stringent conditions to be satisfied. We illustrated this difference in Example 1, where we considered an (imprecise) Markov chain that satisfies (TCA) but not (TCR).

Apart from the fact that their existence is guaranteed under weaker conditions, the inferences \(\overline{\mathrm {E}}_{\mathrm {av},\infty }(f)\) are also able to provide us with more information about how time averages might behave, compared to limit expectations. To see why, recall Example 2, where the inference \(\overline{\mathrm {E}}_{\mathrm {av},\infty }(\mathbb {I}_{b}) = 1/2\) significantly differed from \(\overline{\mathrm {E}}_{\infty }(\mathbb {I}_{b}) = 1\). Clearly, the former was more representative of the limit behaviour of the time average of \(\mathbb {I}_{b}\). As a consequence of [11, Lemma 57], a similar statement holds for general functions. In particular, it implies that \(\overline{\mathrm {E}}_{\mathrm {av},\infty }(f) \le \overline{\mathrm {E}}_{\infty }(f)\) for any function \(f \in \mathscr {L}{}(\mathscr {X}{})\). Since both inferences are upper bounds, \(\overline{\mathrm {E}}_{\mathrm {av},\infty }(f)\) is therefore at least as informative as \(\overline{\mathrm {E}}_{\infty }(f)\).

In summary then, when it comes to characterising long-term time averages, there are two advantages that (limits of) upper and lower expected time averages have over conventional limit upper and lower expectations: they exist under weaker conditions and they are at least as (and sometimes much more) informative.

That said, there is also one important feature that limit upper and lower expectations have, but that is currently still lacking for upper and lower expected time averages: an (imprecise) point-wise ergodic theorem [2, Theorem 32]. For the limit upper and lower expectations of an ergodic imprecise Markov chain, this result states that

$$\begin{aligned} \smash {\underline{\mathrm {E}}_{\infty }(f)} \le \liminf _{k \rightarrow +\infty } \overline{f}_k (X_{0:k}) \le \limsup _{k \rightarrow +\infty } \overline{f}_k (X_{0:k}) \le \smash {\overline{\mathrm {E}}_{\infty }(f)}, \end{aligned}$$

with lower probability one. In order for limit upper and lower expected time averages to be the undisputed quantities of interest when studying long-term time averages, a similar result would need to be obtained for weak ergodicity, where the role of \(\overline{\mathrm {E}}_{\infty }(f)\) and \(\underline{\mathrm {E}}_{\infty }(f)\) is taken over by \(\overline{\mathrm {E}}_{\mathrm {av},\infty }(f)\) and \(\underline{\mathrm {E}}_{\mathrm {av},\infty }(f)\), respectively. If such a result were to hold, it would provide us with (strictly almost sure) bounds on the limit values attained by time averages that are not only more informative than the current ones, but also guaranteed to exist under weaker conditions. Whether such a result indeed holds is an open problem that we would like to address in future work.

A second line of future research that we would like to pursue consists in studying the convergence of \(\overline{\mathrm {E}}_{\mathrm {av},k}(f \vert x)\) in general, without imposing that the limit value should not depend on x. We suspect that this kind of convergence will require no conditions at all.