4.1 Introduction

Although imprecision and robustness in discrete-time Markov chains were already studied in the 1990s [6,7,8], more significant progress [2, 3, 5, 11] could be made after the graphical structure of imprecise probability trees underlying them was uncovered in 2008 [4]. Research has now moved firmly into the continuous-time domain, for which [1, 9] are good starting points.

In this paper, I give a concise and elementary overview of a number of basic ideas and results in discrete-time imprecise Markov chains, with an emphasis on their graphical representation. We begin with the basics of precise and imprecise probability models in Sects. 4.2 and 4.3. When such models are used in a dynamical context, precise and imprecise probability trees arise naturally; these trees, and the use of the fundamental Law of Iterated Expectations for making inferences in them, are the subjects of Sects. 4.4 and 4.5. Imprecise Markov chains correspond to special imprecise probability trees; they and their basic inferences are discussed in Sect. 4.6, followed by a number of examples in Sect. 4.7. These examples hint at stationary distributions and ergodicity, notions that are briefly discussed in Sect. 4.8. Section 4.9 concludes the paper. Throughout, I have included a number of simple exercises to illustrate the arguments in the main text.

4.2 Precise Probability Models

Assume we are uncertain about the value that a variable \(X\) assumes in some finite set of possible values \(\mathcal {X}\). This is usually modelled by a probability mass function \(m\) on \(\mathcal {X}\), satisfying \((\forall x\in \mathcal {X})m(x)\ge 0\) and \(\sum _{x\in \mathcal {X}}m(x)=1\).

With \(m\) we can associate an expectation operator \(E_m\), defined as follows:

$$\begin{aligned} E_m(f):=\sum _{x\in \mathcal {X}}m(x)f(x)\text { where }f:\mathcal {X}\rightarrow \mathbb {R}. \end{aligned}$$

If \(A\subseteq \mathcal {X}\) is an event, then its probability is given by \(P_m(A)=\sum _{x\in A}m(x)=E_m(I_A)\), where \(I_A:\mathcal {X}\rightarrow \mathbb {R}\) is the indicator of \(A\) and assumes the value \(1\) on \(A\) and \(0\) elsewhere. This tells us that there are two equivalent mathematical languages for dealing with uncertainty: the language of probabilities and the language of expectations, and that we can go freely from one to the other.
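These two equivalent languages are easy to mirror in code. Below is a minimal sketch, with an illustrative mass function and map, of computing \(E_m(f)\) and \(P_m(A)=E_m(I_A)\):

```python
# A minimal sketch of the two equivalent languages: the expectation of a
# map f, and the probability of an event A via its indicator I_A.
# The state space {'a','b','c'} and all numbers are illustrative.

def expectation(m, f):
    """E_m(f) = sum over x of m(x) * f(x)."""
    return sum(m[x] * f[x] for x in m)

def probability(m, A):
    """P_m(A) = E_m(I_A), with I_A the indicator of A."""
    indicator = {x: 1.0 if x in A else 0.0 for x in m}
    return expectation(m, indicator)

m = {'a': 0.25, 'b': 0.25, 'c': 0.5}
f = {'a': 0.0, 'b': 2.0, 'c': 4.0}
print(expectation(m, f))           # 0.25*0 + 0.25*2 + 0.5*4 = 2.5
print(probability(m, {'b', 'c'}))  # 0.25 + 0.5 = 0.75
```

The two functions illustrate that probabilities are just expectations of indicator maps, so either can serve as the primitive notion.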

All possible (precise) probability models are gathered in the simplex \(\Sigma _\mathcal {X}\) of all mass functions on \(\mathcal {X}\): \(\Sigma _\mathcal {X}:={\left\{ m\in \mathbb {R}^\mathcal {X}:(\forall x\in \mathcal {X})m(x)\ge 0\text { and }\sum _{x\in \mathcal {X}}m(x)=1\right\} }\). Any probability model for uncertainty about \(X\) is a point in that simplex, which indicates that mass functions have a geometrical interpretation. This is illustrated below for the case \(\mathcal {X}=\{a,b,c\}\) and the uniform mass function \(m_{\mathrm {u}}\).

figure a

Expectation also has a geometrical interpretation: specifying a value \(E(f)\) for the expectation of a map \(f:\mathcal {X}\rightarrow \mathbb {R}\), namely, \(\sum _{x\in \mathcal {X}}m(x)f(x)=E(f)\), imposes a linear constraint on the possible values for \(m\) in \(\Sigma _\mathcal {X}\). It corresponds to intersecting the simplex \(\Sigma _\mathcal {X}\) with a hyperplane, whose direction depends on \(f\). This is also illustrated in the picture above; in this particular case two assessments turn out to completely determine a unique mass function.

4.3 Imprecise Probability Models

We now turn to a generalisation of precise probability models, which we will call imprecise. To allow for more realistic and flexible assessments, we can envisage imposing linear inequality—rather than equality—constraints on the \(m\) in \(\Sigma _\mathcal {X}\):

$$\begin{aligned} \underline{E}\phantom {E}(f)\le \sum _{x\in \mathcal {X}}m(x)f(x) \text { or } \sum _{x\in \mathcal {X}}m(x)f(x)\le \overline{E}\phantom {E}(f). \end{aligned}$$

This corresponds to intersecting \(\Sigma _\mathcal {X}\) with affine half-spaces:

figure b

Any number of such assessments leads to a credal set \(\mathcal {M}\), which is our first type of imprecise probability model.

Definition 4.1

A credal set \(\mathcal {M}\) is a convex closed subset of \(\Sigma _\mathcal {X}\).

Below, we show some more examples of such credal sets in the special case \(\mathcal {X}=\{a,b,c\}\). The credal set on the left corresponds to the assessment: ‘\(b\) is at least as likely as \(c\)’; the one in the middle is a convex mixture of the uniform mass function with the entire simplex; and the one on the right represents a statement in classical propositional logic: ‘\(X=a\) or \(X=c\)’. This illustrates that the language of credal sets encompasses both precise probabilities and classical propositional logic.

figure c

Lower and upper expectations are our second type of imprecise probability model. To see how they come about, consider the credal set in the figure below on the right.

figure d

We can ask what we know about the probability of \(c\), or the expectation of \({I_{\{c\}}}\), given this credal set: it is only known to belong to the closed interval \([\nicefrac {1}{4},\nicefrac {4}{7}]\). This can be generalised from events to arbitrary elements of the set \(\mathcal {L}(\mathcal {X})=\mathbb {R}^\mathcal {X}\) of all real-valued maps \(f\) on \(\mathcal {X}\): As \(m\) ranges over the credal set \(\mathcal {M}\), \(E_m(f)\) will similarly range over a closed interval that is completely determined by its lower and upper bounds.

This leads to the definition of the following two real functionals on \(\mathcal {L}(\mathcal {X})\):

$$\begin{aligned} \begin{aligned} \underline{E}\phantom {E}_\mathcal {M}(f)&=\min {\left\{ E_m(f):m\in \mathcal {M}\right\} } \text { lower expectation}\\ \overline{E}\phantom {E}_\mathcal {M}(f)&=\max {\left\{ E_m(f):m\in \mathcal {M}\right\} } \text { upper expectation} \end{aligned} \text { for all }f:\mathcal {X}\rightarrow \mathbb {R}. \end{aligned}$$

Observe that these lower and upper expectations are mathematically equivalent models, because

$$\begin{aligned} \overline{E}\phantom {E}_\mathcal {M}(f)=-\underline{E}\phantom {E}_\mathcal {M}(-f)\text { for all }f\in \mathcal {L}(\mathcal {X}). \end{aligned}$$

In what follows, we will focus on upper expectations.
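Since \(E_m(f)\) is linear in \(m\), its minimum and maximum over a closed convex credal set are attained at extreme points. The following sketch computes the lower and upper expectation of the indicator of \(\{c\}\) for a made-up credal set on \(\{a,b,c\}\) given by three extreme mass functions:

```python
# Lower and upper expectation of f over a credal set given by a finite
# list of extreme points. Because E_m(f) is linear in m, the extrema
# over a closed convex set are attained at extreme points.
# The vertices below are illustrative, not from the paper's figures.

def lower_upper_expectation(vertices, f):
    values = [sum(m[x] * f[x] for x in m) for m in vertices]
    return min(values), max(values)

vertices = [
    {'a': 0.5,  'b': 0.25, 'c': 0.25},
    {'a': 0.25, 'b': 0.5,  'c': 0.25},
    {'a': 0.25, 'b': 0.25, 'c': 0.5},
]
f = {'a': 0.0, 'b': 0.0, 'c': 1.0}  # indicator of {c}
lo, hi = lower_upper_expectation(vertices, f)
print(lo, hi)  # 0.25 0.5: the probability of c lies in [0.25, 0.5]
```

For credal sets described by linear constraints rather than vertices, the same bounds can be obtained by linear programming.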

Exercise 4.1

What is the upper expectation \(\overline{E}\phantom {E}_\mathcal {M}\) when \(\mathcal {M}=\Sigma _\mathcal {X}\)?

figure e

This shows that we can go from the language of probabilities—and the use of \(\mathcal {M}\)—to the language of expectations—and the use of \(\overline{E}\phantom {E}_\mathcal {M}\). To see that we can also go the other way, we need the following definition:

Definition 4.2

We call a real functional \(\overline{E}\phantom {E}\) on \(\mathcal {L}(\mathcal {X})\) an upper expectation if it satisfies the following properties: for all \(f\) and \(g\) in \(\mathcal {L}(\mathcal {X})\) and all real \(\lambda \ge 0\):

  1. \(\overline{E}\phantom {E}(f)\le \max f\) [boundedness];

  2. \(\overline{E}\phantom {E}(f+g)\le \overline{E}\phantom {E}(f)+\overline{E}\phantom {E}(g)\) [sub-additivity];

  3. \(\overline{E}\phantom {E}(\lambda f)=\lambda \overline{E}\phantom {E}(f)\) [non-negative homogeneity].

Upper expectations are also called coherent upper previsions [10, 12]. They constitute a model that is mathematically equivalent to credal sets, in very much the same way as expectations are mathematically equivalent to probability mass functions:

Theorem 4.1

A real functional \(\overline{E}\phantom {E}\) is an upper expectation if and only if it is the upper envelope of some credal set \(\mathcal {M}\).

Proof

Use \(\mathcal {M}={\left\{ m\in \Sigma _\mathcal {X}:(\forall f\in \mathcal {L}(\mathcal {X}))(E_m(f)\le \overline{E}\phantom {E}(f))\right\} }\). \(\square \)

Exercise 4.2

Consider any linear prevision \(E_m\) and any \(\epsilon \in [0,1]\). Verify that the so-called linear-vacuous mixture:

figure f

is an upper expectation.

Solution: \(E_m\) and \(\max \) are upper expectations by Theorem 4.1, because they are upper envelopes of the respective credal sets \(\{m\}\) and \(\Sigma _\mathcal {X}\)—see Exercise 4.1. Now verify that being an upper expectation is preserved by taking convex mixtures. The corresponding credal set \((1-\epsilon )\{m\}+\epsilon \Sigma _\mathcal {X}:={\left\{ (1-\epsilon )m+\epsilon q:q\in \Sigma _\mathcal {X}\right\} }\) is indicated in blue in the figure above. \(\lozenge \)
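The linear-vacuous mixture of this exercise is easy to probe numerically. The sketch below, with illustrative \(m\) and \(\epsilon \), checks the three properties of Definition 4.2 on randomly generated maps:

```python
import random

# Linear-vacuous mixture: E_bar(f) = (1-eps)*E_m(f) + eps*max(f).
# We check Definition 4.2's three axioms on random maps f and g.
# The mass function m and the value of eps are illustrative.

def upper(f, m, eps):
    lin = sum(m[x] * f[x] for x in m)
    return (1 - eps) * lin + eps * max(f.values())

random.seed(0)
m = {'0': 0.3, '1': 0.7}
eps = 0.2
for _ in range(1000):
    f = {x: random.uniform(-5, 5) for x in m}
    g = {x: random.uniform(-5, 5) for x in m}
    lam = random.uniform(0, 3)
    # 1. boundedness
    assert upper(f, m, eps) <= max(f.values()) + 1e-9
    # 2. sub-additivity
    fg = {x: f[x] + g[x] for x in m}
    assert upper(fg, m, eps) <= upper(f, m, eps) + upper(g, m, eps) + 1e-9
    # 3. non-negative homogeneity
    lf = {x: lam * f[x] for x in m}
    assert abs(upper(lf, m, eps) - lam * upper(f, m, eps)) <= 1e-9
print("all three axioms hold on the sampled maps")
```

Such a random check is of course no proof, but it is a quick sanity test of the convex-mixture argument in the solution.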

Exercise 4.3

All upper expectations on a binary space \(\mathcal {X}=\{0,1\}\) are such linear-vacuous mixtures, and the corresponding credal set can be depicted as

figure g

Let \(p:=m(1)\) and \(q:=1-p=m(0)\). What is the relation between \([\underline{p},\overline{p}]\) and \(p,\epsilon \)?

Solution: \(\underline{p}=\underline{E}\phantom {E}({I_{\{1\}}})=(1-\epsilon )p=p-\epsilon p\) and \(\overline{p}=\overline{E}\phantom {E}({I_{\{1\}}})=(1-\epsilon )p+\epsilon =p+\epsilon q\). Hence, \(\overline{p}-\underline{p}=\epsilon \). \(\lozenge \)

4.4 Discrete-Time Uncertain Processes

We now apply these ideas in a more dynamic context: the study of processes. We consider an uncertain process, which is a collection of uncertain variables \(X_1\), \(X_2\), ..., \(X_n\), ... assuming values in some finite set of states \(\mathcal {X}\). This can be represented graphically by a standard event tree with nodes (also called situations) \(s=(x_1,x_2,\dots ,x_n)\) for \(x_k\in \mathcal {X}\) and \(n\ge 0\). This is depicted below on the left for the special case that \(\mathcal {X}=\{0,1\}\), where we have limited ourselves to three variables \(X_1\), \(X_2\), and \(X_3\); but the idea should be clear. Observe that we use the symbol \(\square \) for the initial situation, or root node, of the event tree.

figure h

The event tree becomes a probability tree as soon as we attach to each node \(s=(x_1,x_2,\dots ,x_n)\) a local probability mass function \(m_s\) on \(\mathcal {X}\) with associated expectation operator \(E_{m_s}\), expressing the uncertainty about the next variable \(X_{n+1}\) after observing the earlier variables \(X_1=x_1\), ..., \(X_n=x_n\). This is depicted above on the right for the special case that \(\mathcal {X}=\{0,1\}\).

We now consider a very general inference problem in such a probability tree. Consider any function \(g:\mathcal {X}^n\rightarrow \mathbb {R}\) of the first \(n\) variables: \(g=g(X_1,X_2,\dots ,X_n)\). We want to calculate its expectation \(E(g\vert s)\) in the situation \(s=(x_1,\dots ,x_k)\), that is, after having observed the first \(k\) variables. Interestingly, this can be done efficiently using the following theorem, which is a reformulation of the Law of Total Probability:

Theorem 4.2

(Law of Iterated Expectations) If we know \(E(g\vert s,x)\) for all \(x\in \mathcal {X}\), then we can calculate \(E(g\vert s)\) by backwards recursion using the local model \(m_s\):

$$\begin{aligned} E(g\vert s) =\underset{\text {local}}{\underbrace{E_{m_s}}}(E(g\vert s,\cdot )) =\sum _{x\in \mathcal {X}}m_s(x)E(g\vert s,x). \end{aligned}$$

This shows that expectations can be calculated recursively using a very basic step, illustrated below for the case \(\mathcal {X}=\{0,1\}\):

figure i

Hence, all expectations \(E(g\vert x_1,\dots ,x_k)\) in the tree can be calculated from the local models \(m_s\) as follows:

  1. start in the final cut \(\mathcal {X}^n\) and let \(E(g\vert x_1,x_2,\dots ,x_n)=g(x_1,x_2,\dots ,x_n)\);

  2. do backwards recursion using the Law of Iterated Expectations:

     $$\begin{aligned} E(g\vert x_1,\dots ,x_k) =\underset{\text {local}}{\underbrace{E_{m_{(x_1,\dots ,x_k)}}}}(E(g\vert x_1,\dots ,x_k,\cdot )) \end{aligned}$$

  3. go on until you get to the root node \(\square \), where we can identify \(E(g\vert \square )=E(g)\).
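This backwards-recursion scheme takes only a few lines of code. The sketch below, with an illustrative coin-flipping tree, computes \(E(g)=E(g\vert \square )\) for a map \(g\) of the first \(n\) variables:

```python
# Backwards recursion with the Law of Iterated Expectations in a precise
# probability tree. Situations s are tuples of observed states; local(s)
# returns the local mass function m_s. A sketch: the local models, the
# depth n and the map g below are illustrative.

def expectation(g, local, states, n, s=()):
    """E(g | s) for g a function of (X_1, ..., X_n)."""
    if len(s) == n:                       # final cut: E(g | x_1..x_n) = g
        return g(s)
    m = local(s)                          # local model m_s
    return sum(m[x] * expectation(g, local, states, n, s + (x,))
               for x in states)

# Example: number of heads in n independent flips with P(heads) = p,
# so the local model is the same in every situation.
p, n = 0.4, 5
local = lambda s: {1: p, 0: 1 - p}
g = lambda s: sum(s)                      # number of heads so far
print(expectation(g, local, (0, 1), n))   # ≈ 2.0 (= n*p)
```

The recursion visits every node of the tree, so this naive version is exponential in \(n\); the Markov structure of Sect. 4.6 is what later reduces the cost to linear.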

Exercise 4.4

Consider flipping a coin twice independently, with probability \(p\) for heads—outcome \(1\)—and \(q=1-p\) for tails—outcome \(0\). The corresponding probability tree for this experiment is given below on the left, with the corresponding number of heads shown in red in the nodes. What is the expected number of heads?

figure j

Solution: Above on the right, we apply the Law of Iterated Expectations recursively, from leaves to root; the solution is the expectation \(2p\) attached to the root. \(\lozenge \)

Exercise 4.5

Extend the ideas in the solution to Exercise 4.4 to calculate the expected number of heads when the coin is flipped \(n\) times independently.

Solution: We apply the Law of Iterated Expectations recursively, from leaves to root. Below on the left, we consider starting from the leaves of the tree at depth \(n\); applying the Law reduces to adding \(p\) to the number of heads in each of their parent nodes at depth \(n-1\). On the right, we apply the Law to these nodes at depth \(n-1\), which reduces to adding \(2p\) to the number of heads in each of their parent nodes at depth \(n-2\).

figure k

Going on in this way, we see that the solution is the expectation \(np\) attached to the root at depth \(0\). \(\lozenge \)

Exercise 4.6

We now flip the same coin time and time again, independently, until we reach heads for the first time. Calculate the expected number of coin flips.

Solution: Below is the (unbounded) probability tree associated with this experiment.

figure l

Call the unknown expectation \(\alpha \). We apply the Law of Iterated Expectations to the situations at depth \(1\). In the situation \(1\), the expected number of coin flips is \(1\), the actual number of flips there. In the situation \(0\), we see a copy of the original tree extending to the right, but since we have already flipped the coin once here, the expected number of coin flips in this situation is \(\alpha +1\). In the parent node, the expected number of coin flips \(\alpha \) is therefore also given by \(p\cdot 1+q\cdot (\alpha +1)=1+q\alpha \), whence \(\alpha =\nicefrac {1}{p}\). \(\lozenge \)
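The fixed-point equation \(\alpha =1+q\alpha \) can also be solved numerically by simply iterating the backward-recursion step; the value of \(p\) below is illustrative:

```python
# Exercise 4.6 as a fixed point: iterate alpha <- p*1 + q*(alpha + 1),
# the Law of Iterated Expectations applied at the root. The iteration
# converges geometrically to 1/p. The value of p is illustrative.
p = 0.25
q = 1 - p
alpha = 0.0
for _ in range(500):
    alpha = p * 1 + q * (alpha + 1)
print(alpha)  # ≈ 4.0 (= 1/p)
```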

4.5 Imprecise Probability Trees

Until now, we have assumed that we have sufficient information in order to specify, in each node \(s\), a local probability mass function \(m_s\) on the set \(\mathcal {X}\) of possible values for the next state.

figure m

We now let go of this major restrictive assumption by allowing for more general uncertainty models. We will consider credal sets as our more general local uncertainty models: closed convex subsets \(\mathcal {M}_s\) of \(\Sigma _\mathcal {X}\). See the figure below for a special case when \(\mathcal {X}=\{0,1\}\).

Definition 4.3

An imprecise probability tree is an event tree where in each node \(s\) the local uncertainty model is a credal set \(\mathcal {M}_s\), or equivalently, its associated upper expectation \(\overline{E}\phantom {E}_s\), with \(\overline{E}\phantom {E}_s(f):=\max {\left\{ E_m(f):m\in \mathcal {M}_s\right\} }\) for all \(f\in \mathcal {L}(\mathcal {X})\).

An imprecise probability tree can be interpreted as an infinity of compatible precise probability trees: choose in each node \(s\) a probability mass function \(m_s\) from the set \(\mathcal {M}_s\).

figure n

For each real map \(g=g(X_1,\dots ,X_n)\), each node \(s=(x_1,\dots ,x_k)\), and each such compatible precise probability tree, we can calculate the expectation \(E(g\vert x_1,\dots ,x_k)\) using the backwards recursion method described before. By varying over all compatible precise probability trees, we get a closed real interval \([\underline{E}\phantom {E}(g\vert x_1,\dots ,x_k),\overline{E}\phantom {E}(g\vert x_1,\dots ,x_k)]\), completely characterised by the lower and upper expectations \(\underline{E}\phantom {E}(g\vert x_1,\dots ,x_k)\) and \(\overline{E}\phantom {E}(g\vert x_1,\dots ,x_k)\). The complexity of calculating these bounds in this way is clearly exponential in the number of time steps \(n\). But there is a more efficient method to calculate them:

Theorem 4.3

(Law of Iterated Upper Expectations [4, 5]) If we know \(\overline{E}\phantom {E}(g\vert s,x)\) for all \(x\in \mathcal {X}\), then we can calculate \(\overline{E}\phantom {E}(g\vert s)\) by backwards recursion using the local model \(\overline{E}\phantom {E}_s\):

$$\begin{aligned} \overline{E}\phantom {E}(g\vert s) =\underset{\text {local}}{\underbrace{\overline{E}\phantom {E}_s}}(\overline{E}\phantom {E}(g\vert s,\cdot )) =\max _{m_s\in \mathcal {M}_s}\sum _{x\in \mathcal {X}}m_s(x)\,\overline{E}\phantom {E}(g\vert s,x). \end{aligned}$$

This shows that expectations can be calculated recursively using a very basic step, illustrated below for the case \(\mathcal {X}=\{0,1\}\):

figure o

The method for, and the complexity of, calculating the \(\overline{E}\phantom {E}(g\vert s)\), as a function of \(n\), is therefore essentially the same as in the precise case!
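The recursion is the same as before, except that each local step now maximises over the extreme points of the local credal set. A sketch, for an illustrative interval model \([\underline{p},\overline{p}]\) for heads:

```python
# Backwards recursion with the Law of Iterated Upper Expectations.
# Each local credal set is given by a finite list of extreme mass
# functions; the inner maximum of a linear functional is attained at
# one of them. The interval [p_lo, p_hi] for heads is illustrative.

def upper_expectation(g, local_vertices, states, n, s=()):
    if len(s) == n:                       # final cut
        return g(s)
    children = {x: upper_expectation(g, local_vertices, states, n, s + (x,))
                for x in states}
    return max(sum(m[x] * children[x] for x in states)
               for m in local_vertices(s))

p_lo, p_hi, n = 0.3, 0.6, 4
vertices = lambda s: [{1: p_lo, 0: 1 - p_lo}, {1: p_hi, 0: 1 - p_hi}]
g = lambda s: sum(s)                      # number of heads
print(upper_expectation(g, vertices, (0, 1), n))  # ≈ 2.4 (= n * p_hi)
```

Since the number of heads increases with every head, the maximising choice is \(\overline{p}\) in each node, reproducing the \(n\overline{p}\) of Exercise 4.7.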

Exercise 4.7

Extend the ideas in the solution to Exercise 4.5 to calculate the upper expected number of heads when the coin is flipped \(n\) times independently, but where now we have an imprecise probability model for a coin flip, with a probability interval \([\underline{p},\overline{p}]\) for heads, and a corresponding interval \([\underline{q},\overline{q}]=[1-\overline{p},1-\underline{p}]\) for tails.

Solution: We apply the Law of Iterated Upper Expectations recursively, from leaves to root. Below on the left, we consider starting from the leaves of the tree at depth \(n\); applying the Law reduces to adding \(\overline{p}\) to the number of heads in each of their parent nodes at depth \(n-1\).

figure p

On the right, we apply the Law to these nodes at depth \(n-1\), which reduces to adding \(2\overline{p}\) to the number of heads in each of their parent nodes at depth \(n-2\). Going on in this way, we see that the solution is the expectation \(n\overline{p}\) attached to the root at depth \(0\). A similar result holds for the lower expectation. \(\lozenge \)

Exercise 4.8

We now flip the same coin with the imprecise probability model time and time again, independently, until we reach heads for the first time. Calculate the upper expected number of coin flips.

Solution: Below is the (unbounded) probability tree associated with this experiment.

figure q

Call the unknown upper expectation \(\alpha \). We apply the Law of Iterated Upper Expectations to the situations at depth \(1\). In the situation \(1\), the upper expected number of coin flips is \(1\), the actual number of flips there. In the situation \(0\), we see a copy of the original tree extending to the right, but since we have already flipped the coin once here, the upper expected number of coin flips in this situation is \(\alpha +1\). In the parent node, the upper expected number of coin flips \(\alpha \) is therefore also given by \(1+\overline{E}\phantom {E}(\alpha {I_{\{0\}}})=1+\alpha \overline{q}\), whence \(\alpha =\nicefrac {1}{\underline{p}}\). A similar result holds for the lower expectation. \(\lozenge \)
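As before, the fixed-point equation \(\alpha =1+\alpha \overline{q}\) can be iterated numerically; the probability interval for heads below is illustrative:

```python
# Exercise 4.8 as a fixed point: iterate alpha <- 1 + q_bar * alpha
# (valid for alpha >= 0, where the tails vertex q_bar = 1 - p_lo is
# the maximising choice). The interval for heads is illustrative.
p_lo = 0.25
q_bar = 1 - p_lo            # upper probability of tails
alpha = 0.0
for _ in range(500):
    alpha = 1 + q_bar * alpha
print(alpha)  # ≈ 4.0 (= 1/p_lo)
```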

The attentive reader will have observed that in all these simple exercises, we can also obtain the ‘imprecise’ result from the ‘precise’ one by optimising over the single parameter \(p\). We have to warn against too much optimism: in more involved examples, this will no longer be the case.

4.6 Imprecise Markov Chains

We now look at a special instance of a probability tree, corresponding to a stationary (precise) Markov chain. This happens when the precise local models \(m_{(x_1,\dots ,x_n)}\) only depend on the last observed state \(x_n\)—this is the Markov Condition—and also do not depend explicitly on the time step \(n\):

$$\begin{aligned} m_{(x_1,\dots ,x_n)}=q(\cdot \vert x_n) \end{aligned}$$

for some family of transition mass functions \(q(\cdot \vert x)\), \(x\in \mathcal {X}\).

Definition 4.4

The uncertain process is a stationary precise Markov chain when all \(\mathcal {M}_s\) are singletons \(\{m_s\}\) and \(\mathcal {M}_{(x_1,\dots ,x_n)}=\{q(\cdot \vert x_n)\}\), for some family of transition mass functions \(q(\cdot \vert x)\), \(x\in \mathcal {X}\).

For each \(x\in \mathcal {X}\), the transition mass function \(q(\cdot \vert x)\) corresponds to an expectation operator, given by \(E(f\vert x)=\sum _{z\in \mathcal {X}}q(z\vert x)f(z)\) for all \(f\in \mathcal {L}(\mathcal {X})\).

Definition 4.5

Consider the linear transformation \(\mathrm {T}\) of \(\mathcal {L}(\mathcal {X})\), called transition operator: \(\mathrm {T}:\mathcal {L}(\mathcal {X})\rightarrow \mathcal {L}(\mathcal {X}):f\mapsto \mathrm {T}f\), where \(\mathrm {T}f\) is the real map defined by:

$$\begin{aligned} \mathrm {T}f(x) :=E(f\vert x) =\sum _{z\in \mathcal {X}}q(z\vert x)f(z) \text { for all }x\in \mathcal {X}. \end{aligned}$$

In the parlance of linear algebra, or functional analysis, \(\mathrm {T}\) is the dual of the linear transformation with Markov matrix \(M\) with elements \(M_{xy}:=q(y\vert x)\).

Up to now, we have mainly been concerned with conditional expectations of the type \(E(\cdot \vert s)\). We will now look at particular unconditional expectations, where \(s=\square \). For any \(n\ge 0\), we define the expectation for the (single) state \(X_n\) at time \(n\) by

$$\begin{aligned} E_n(f)=E(f(X_n))=E(f(X_n)\vert \square ) \text { for all }f:\mathcal {X}\rightarrow \mathbb {R}\end{aligned}$$

and we denote the corresponding mass function by \(m_n\). Applying the Law of Iterated Expectations in Theorem 4.2 now yields, with also \(E_1=E_{m_\square }\) and \(m_1=m_\square \):

$$\begin{aligned} E_n(f)=E_1(\mathrm {T}^{n-1}f), \text { and dually, } m_n^\top =m_1^\top M^{n-1}, \end{aligned}$$

so the complexity of calculating \(E_n(f)\) is linear in the number of time steps \(n\).
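This linear-complexity scheme takes only a few lines of code. The sketch below uses the two-state switching chain of Exercise 4.9 below, with an illustrative value of \(p\):

```python
# E_n(f) = E_1(T^{n-1} f): apply the transition operator n-1 times,
# then the initial expectation E_1. Linear in n. The chain is the
# two-state example of Exercise 4.9 (fair first flip, bias switching
# with the previous outcome); the value of p is illustrative.

def T(f, q):
    """Transition operator: (Tf)(x) = sum over z of q(z|x) f(z)."""
    return {x: sum(q[x][z] * f[z] for z in f) for x in q}

def E_n(f, m1, q, n):
    for _ in range(n - 1):
        f = T(f, q)
    return sum(m1[x] * f[x] for x in f)

p = 0.7
q = {1: {1: p, 0: 1 - p}, 0: {1: 1 - p, 0: p}}   # q[x][z] = q(z|x)
m1 = {1: 0.5, 0: 0.5}                             # fair first flip
f = {1: 1.0, 0: 0.0}                              # indicator of heads
print(E_n(f, m1, q, 3))  # ≈ 0.5: single-state expectations look fair
```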

Exercise 4.9

Consider the stochastic process where we first flip a fair coin. From then on, on heads, we select a biased coin with probability \(p\) for heads for the next coin flip, and on tails, a biased coin with probability \(q=1-p\) for heads, and keep on flipping one of the two biased coins, selected on the basis of the outcome of the previous coin flip. This produces a Markov chain. Find \(\mathrm {T}f\), \(\mathrm {T}^2f\), and \(E_1(f)\), \(E_2(f)\) and \(E_3(f)\) for \(f\in \mathcal {L}(\{0,1\})\).

Solution: Clearly, \(E_1(f)=\nicefrac {1}{2}f(1)+\nicefrac {1}{2}f(0)\), \(\mathrm {T}f(0)=E(f\vert 0)=qf(1)+pf(0)\) and \(\mathrm {T}f(1)=E(f\vert 1)=pf(1)+qf(0)\), whence

$$\begin{aligned} E_2(f)=E_1(\mathrm {T}f) =\frac{p+q}{2}f(1)+\frac{p+q}{2}f(0) =\frac{1}{2}f(1)+\frac{1}{2}f(0). \end{aligned}$$

Similarly,

$$\begin{aligned} \mathrm {T}^2f(0)&=q\mathrm {T}f(1)+p\mathrm {T}f(0) =q[pf(1)+qf(0)]+p[qf(1)+pf(0)]\\&=(p^2+q^2)f(0)+2pqf(1)\\ \mathrm {T}^2f(1)&=(p^2+q^2)f(1)+2pqf(0), \end{aligned}$$

whence

$$\begin{aligned} E_3(f)=E_1(\mathrm {T}^2f) =\frac{p^2+q^2+2pq}{2}f(1)+\frac{2pq+p^2+q^2}{2}f(0) =\frac{1}{2}f(1)+\frac{1}{2}f(0), \end{aligned}$$

and so on. We see that at the level of expectations of single state variables, the process cannot be distinguished from flipping a fair coin. \(\lozenge \)

The generalisation from precise to imprecise Markov chains goes as follows:

Definition 4.6

The uncertain process is a stationary imprecise Markov chain when the Markov Condition is satisfied with stationarity: \(\mathcal {M}_{(x_1,\dots ,x_n)}=\mathcal {Q}(\cdot \vert x_n)\) for some family of transition credal sets \(\mathcal {Q}(\cdot \vert x)\), \(x\in \mathcal {X}\).

figure r

An imprecise Markov chain can be seen as an infinity of (precise) probability trees: choose a precise mass function from \(\mathcal {M}_s\) in each situation \(s\). It should be clear that not all of these satisfy the Markov property or stationarity. This implies that solving the optimisation problem in order to find the tight upper bounds \(\overline{E}\phantom {E}(g\vert s)\), as discussed in Sect. 4.5, is not (necessarily) simply an optimisation over a parametrised collection of stationary (or even non-stationary) Markov chains, although it does turn out to be that simple in a number of special cases.

For each \(x\in \mathcal {X}\), the local transition model \(\mathcal {Q}(\cdot \vert x)\) corresponds to an upper expectation operator \(\overline{E}\phantom {E}(\cdot \vert x)\), with \(\overline{E}\phantom {E}(f\vert x)=\max {\left\{ E_p(f):p\in \mathcal {Q}(\cdot \vert x)\right\} }\) for all \(f\in \mathcal {L}(\mathcal {X})\). This leads to the following definition, which generalises the definition of transition operators for precise Markov chains:

Definition 4.7

Consider the non-linear transformation \(\overline{\mathrm {T}}\phantom {\mathrm {T}}\) of \(\mathcal {L}(\mathcal {X})\), called the upper transition operator: \(\overline{\mathrm {T}}\phantom {\mathrm {T}}:\mathcal {L}(\mathcal {X})\rightarrow \mathcal {L}(\mathcal {X}):f\mapsto \overline{\mathrm {T}}\phantom {\mathrm {T}}f\) where the real map \(\overline{\mathrm {T}}\phantom {\mathrm {T}}f\) is defined by \(\overline{\mathrm {T}}\phantom {\mathrm {T}}f(x):=\overline{E}\phantom {E}(f\vert x)=\max {\left\{ E_p(f):p\in \mathcal {Q}(\cdot \vert x)\right\} }\) for all \(x\in \mathcal {X}\).

For any \(n\ge 0\), we define the upper expectation for the (single) state \(X_n\) at time \(n\) by

$$\begin{aligned} \overline{E}\phantom {E}_n(f)=\overline{E}\phantom {E}(f(X_n))=\overline{E}\phantom {E}(f(X_n)\vert \square ) \text { for all }f:\mathcal {X}\rightarrow \mathbb {R}. \end{aligned}$$

Then the Law of Iterated Upper Expectations of Theorem 4.3 yields, with also \(\overline{E}\phantom {E}_1=\overline{E}\phantom {E}_{\mathcal {M}_\square }\):

$$\begin{aligned} \overline{E}\phantom {E}_n(f)=\overline{E}\phantom {E}_1(\overline{\mathrm {T}}\phantom {\mathrm {T}}^{n-1}f)\text { for all }n\ge 1 \text { and all }f\in \mathcal {L}(\mathcal {X}), \end{aligned}$$

so the complexity of calculating \(\overline{E}\phantom {E}_n(f)\) is still linear in the number of time steps \(n\).
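The upper-expectation analogue of the previous sketch is equally short: iterate \(\overline{\mathrm {T}}\phantom {\mathrm {T}}\) on \(f\), taking in each state the maximum over the extreme points of the local transition credal set, and finish with \(\overline{E}\phantom {E}_1\). The interval transition model below is illustrative:

```python
# Upper analogue of E_n(f) = E_1(T^{n-1} f): iterate the upper
# transition operator Tbar, then apply the initial upper expectation.
# Still linear in n. Each credal set is given by a finite list of
# extreme points; all numbers below are illustrative.

def T_bar(f, vertices):
    """(Tbar f)(x) = max over m in Q(.|x) of sum over z of m(z) f(z)."""
    return {x: max(sum(m[z] * f[z] for z in f) for m in vertices[x])
            for x in vertices}

def upper_E_n(f, E1_vertices, vertices, n):
    for _ in range(n - 1):
        f = T_bar(f, vertices)
    return max(sum(m[x] * f[x] for x in f) for m in E1_vertices)

# Two-state chain with interval transition model [0.3, 0.6] for heads,
# the same in both states.
vertices = {x: [{1: 0.3, 0: 0.7}, {1: 0.6, 0: 0.4}] for x in (0, 1)}
E1_vertices = [{1: 0.5, 0: 0.5}]
f = {1: 1.0, 0: 0.0}                      # indicator of heads
print(upper_E_n(f, E1_vertices, vertices, 10))  # ≈ 0.6
```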

4.7 Examples

Consider a two-element state space \(\mathcal {X}=\{1,0\}\), with upper expectation \(\overline{E}\phantom {E}_1=\overline{E}\phantom {E}_{\mathcal {M}_\square }\) for the first variable, and for each \((x_1,\dots ,x_n)\in \{1,0\}^n\), with \(0<\epsilon \le 1\), \(\mathcal {M}_{(x_1,\dots ,x_n)}=\mathcal {M}_{x_n}=(1-\epsilon )\{q(\cdot \vert x_n)\}+\epsilon \Sigma _{\{1,0\}}\); equivalently, the upper transition operator is \(\overline{\mathrm {T}}\phantom {\mathrm {T}}=(1-\epsilon )\mathrm {T}+\epsilon \max \). In other words, each transition credal set \(\mathcal {Q}(\cdot \vert x)\) is a linear-vacuous mixture (see Exercise 4.2, also for the notations used) centred on the transition mass function \(q(\cdot \vert x)\), where the mixture coefficient \(\epsilon \) is the same in each state \(x\).

It is a matter of simple and direct verification that for \(n\ge 1\) and \(f\in \mathcal {L}(\mathcal {X})\): \(\overline{\mathrm {T}}\phantom {\mathrm {T}}^nf=(1-\epsilon )^n\mathrm {T}^nf+\epsilon \sum _{k=0}^{n-1}(1-\epsilon )^k\max \mathrm {T}^kf\), and therefore, using the Law of Iterated Upper Expectations, \(\overline{E}\phantom {E}_{n+1}(f)=\overline{E}\phantom {E}_1(\overline{\mathrm {T}}\phantom {\mathrm {T}}^nf)=(1-\epsilon )^n\overline{E}\phantom {E}_1(\mathrm {T}^nf)+\epsilon \sum _{k=0}^{n-1}(1-\epsilon )^k\max \mathrm {T}^kf\). If we now let \(n\rightarrow \infty \), it is not too hard to see that the limit exists and is independent of the initial upper expectation \(\overline{E}\phantom {E}_1\):

$$\begin{aligned} \lim _{n\rightarrow \infty }\overline{E}\phantom {E}_{n}(f) =\epsilon \sum _{k=0}^{\infty }(1-\epsilon )^k\max \mathrm {T}^kf \text { for all }f\in \mathcal {L}(\mathcal {X}). \end{aligned}$$

We consider two special cases:

  1. Contaminated random walk: when \(\mathrm {T}f(1)=\mathrm {T}f(0)=\nicefrac {1}{2}[f(1)+f(0)]\), the underlying precise Markov chain behaves like flipping a fair coin. We then find that \(\overline{E}\phantom {E}_\infty (f)=(1-\epsilon )\nicefrac {1}{2}[f(1)+f(0)]+\epsilon \max f\) for all \(f\in \mathcal {L}(\mathcal {X})\).

  2. Contaminated cycle: when \(\mathrm {T}f(1)=f(0)\) and \(\mathrm {T}f(0)=f(1)\), the underlying precise Markov chain behaves like a deterministic cycle between the states \(0\) and \(1\). We then find that \(\overline{E}\phantom {E}_\infty (f)=\max f\) for all \(f\in \mathcal {L}(\mathcal {X})\).

The probability intervals for \(1\) corresponding to these two limit models are given by

figure s
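Both limit models can be reproduced numerically by iterating \(\overline{\mathrm {T}}\phantom {\mathrm {T}}=(1-\epsilon )\mathrm {T}+\epsilon \max \) until (numerical) convergence; the values of \(\epsilon \) and \(f\) below are illustrative:

```python
# The two special cases: iterate Tbar = (1-eps)*T + eps*max on f and
# watch the iterates approach a constant map, whose value is the limit
# E_bar_inf(f), independently of E_bar_1. eps and f are illustrative.

def limit_value(f, T, eps, steps=300):
    for _ in range(steps):
        Tf = T(f)
        mx = max(f.values())
        f = {x: (1 - eps) * Tf[x] + eps * mx for x in f}
    return f[0]  # the iterates become (near-)constant maps

eps = 0.1
f = {1: 1.0, 0: 0.0}                                    # indicator of 1
T_walk = lambda g: {x: 0.5 * (g[1] + g[0]) for x in g}  # fair coin
T_cycle = lambda g: {1: g[0], 0: g[1]}                  # cycle 0 <-> 1

print(limit_value(dict(f), T_walk, eps))   # ≈ 0.55 = (1-eps)/2 + eps
print(limit_value(dict(f), T_cycle, eps))  # ≈ 1.0  = max f
```

With \(f\) the indicator of \(1\), the two printed values are precisely the upper probabilities of \(1\) in the two limit models.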

As another example, we consider \(\mathcal {X}=\{a,b,c\}\) and the transition models depicted below, which are imprecise models not very far from a simple cycle:

figure t

Below, we depict the time evolution of the \(\overline{E}\phantom {E}_n\) (as credal sets) for three cases (red, yellow and blue). We see that, here too, regardless of the initial model \(\overline{E}\phantom {E}_1\), the upper expectation \(\overline{E}\phantom {E}_n\) seems to converge to the same limit model.

figure u

4.8 A Non-linear Perron–Frobenius Theorem, and Ergodicity

The convergence behaviour in the previous examples can also be observed in general imprecise Markov chains under fairly weak conditions. The following theorems can be derived from the more general discussions and results in [3, 5].

Theorem 4.4

Consider a stationary imprecise Markov chain with finite state set \(\mathcal {X}\) and upper transition operator \(\overline{\mathrm {T}}\phantom {\mathrm {T}}\). Suppose that \(\overline{\mathrm {T}}\phantom {\mathrm {T}}\) is regular, meaning that there is some \(n>0\) such that \(\min \overline{\mathrm {T}}\phantom {\mathrm {T}}^n{I_{\{x\}}}>0\) for all \(x\in \mathcal {X}\). Then for every initial upper expectation \(\overline{E}\phantom {E}_1\), the upper expectation \(\overline{E}\phantom {E}_n=\overline{E}\phantom {E}_1\circ \overline{\mathrm {T}}\phantom {\mathrm {T}}^{n-1}\) for the state at time \(n\) converges point-wise to the same stationary upper expectation \(\overline{E}\phantom {E}_\infty \): \(\lim _{n\rightarrow \infty }\overline{E}\phantom {E}_n(h)=\lim _{n\rightarrow \infty }\overline{E}\phantom {E}_1(\overline{\mathrm {T}}\phantom {\mathrm {T}}^{n-1}h):=\overline{E}\phantom {E}_\infty (h)\) for all \(h\) in \(\mathcal {L}(\mathcal {X})\). The limit upper expectation \(\overline{E}\phantom {E}_\infty \) is the only \(\overline{\mathrm {T}}\phantom {\mathrm {T}}\)-invariant upper expectation on \(\mathcal {L}(\mathcal {X})\), meaning that \(\overline{E}\phantom {E}_\infty =\overline{E}\phantom {E}_\infty \circ \overline{\mathrm {T}}\phantom {\mathrm {T}}\).
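A numerical illustration of this theorem, for the contaminated cycle of Sect. 4.7 with illustrative numbers: we check regularity at \(n=1\) and watch the iterates \(\overline{\mathrm {T}}\phantom {\mathrm {T}}^nh\) approach the constant map with value \(\overline{E}\phantom {E}_\infty (h)=\max h\):

```python
# A sketch checking Theorem 4.4 on the contaminated cycle of Sect. 4.7;
# eps and the map h are illustrative.

def T_bar(f, vertices):
    """(Tbar f)(x) = max over the extreme points of Q(.|x)."""
    return {x: max(sum(m[z] * f[z] for z in f) for m in vertices[x])
            for x in vertices}

eps = 0.2
# Extreme points of Q(.|x) = (1-eps){q(.|x)} + eps*Simplex, with
# q(.|1) = delta_0 and q(.|0) = delta_1 (the deterministic cycle).
vertices = {
    1: [{1: 0.0, 0: 1.0}, {1: eps, 0: 1.0 - eps}],
    0: [{1: 1.0, 0: 0.0}, {1: 1.0 - eps, 0: eps}],
}

# Regularity already at n = 1: min Tbar I_{x} > 0 for every x.
for x in (0, 1):
    ind = {z: 1.0 if z == x else 0.0 for z in (0, 1)}
    assert min(T_bar(ind, vertices).values()) > 0

h = {1: 3.0, 0: -1.0}
for _ in range(200):
    h = T_bar(h, vertices)
print(h[0], h[1])  # both ≈ 3.0 (= max h): the invariant upper expectation
```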

In that case we also have an interesting ergodicity result. For a detailed description of the notion of ‘almost surely’, we refer to [3], but it roughly means ‘with upper probability one’.

Theorem 4.5

Consider a stationary imprecise Markov chain with finite state set \(\mathcal {X}\) and upper transition operator \(\overline{\mathrm {T}}\phantom {\mathrm {T}}\). Suppose that \(\overline{\mathrm {T}}\phantom {\mathrm {T}}\) is regular with stationary upper expectation \(\overline{E}\phantom {E}_\infty \). Then, almost surely, for all \(h\) in \(\mathcal {L}(\mathcal {X})\):

$$\begin{aligned} \underline{E}\phantom {E}_\infty (h)\le \liminf _{n\rightarrow \infty }\frac{1}{n}\sum _{k=1}^nh(X_k) \le \limsup _{n\rightarrow \infty }\frac{1}{n}\sum _{k=1}^nh(X_k)\le \overline{E}\phantom {E}_\infty (h). \end{aligned}$$

4.9 Conclusion

The discussion in this paper lays bare a few interesting but quite basic aspects of inference in imprecise probability trees and Markov chains in discrete time. A more general and deeper treatment of these matters can be found in [3,4,5]. For recent work on imprecise Markov chains in continuous time, I refer the interested reader to [1, 9].