Imprecise Discrete-Time Markov Chains

I present a short and easy introduction to a number of basic definitions and important results from the theory of imprecise Markov chains in discrete time, with a finite state space. The approach is intuitive and graphical.


Precise Probability Models
Assume we are uncertain about the value that a variable X assumes in some finite set of possible values X. This is usually modelled by a probability mass function m on X, satisfying m(x) ≥ 0 for all x ∈ X and ∑_{x∈X} m(x) = 1.
With m we can associate an expectation operator E_m as follows: E_m(f) := ∑_{x∈X} m(x) f(x) for every real-valued map f on X. If A ⊆ X is an event, then its probability is given by P_m(A) = ∑_{x∈A} m(x) = E_m(I_A), where I_A : X → ℝ is the indicator of A, which assumes the value 1 on A and 0 elsewhere. This tells us that there are two equivalent mathematical languages for dealing with uncertainty: the language of probabilities and the language of expectations, and that we can go freely from one to the other.
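These two languages are easy to make concrete. The following sketch (with made-up numbers) computes an expectation as a weighted sum and recovers the probability of an event as the expectation of its indicator:

```python
def expectation(m, f):
    """E_m(f) = sum over x of m(x) * f(x)."""
    return sum(m[x] * f[x] for x in m)

m = {'a': 0.5, 'b': 0.25, 'c': 0.25}             # a mass function on X = {a, b, c}
A = {'a', 'c'}                                   # an event
I_A = {x: 1.0 if x in A else 0.0 for x in m}     # its indicator

# P_m(A) = m(a) + m(c) = 0.75, obtained as the expectation of I_A
P_A = expectation(m, I_A)
```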
All possible (precise) probability models are gathered in the simplex Σ_X of all mass functions on X: Σ_X := {m ∈ ℝ^X : m(x) ≥ 0 for all x ∈ X and ∑_{x∈X} m(x) = 1}. Any probability model for uncertainty about X is a point in this simplex, which indicates that mass functions have a geometrical interpretation. This is illustrated below for the case X = {a, b, c} and the uniform mass function m_u.
Expectation also has a geometrical interpretation: specifying a value E(f) for the expectation of a map f : X → ℝ, namely requiring that ∑_{x∈X} m(x) f(x) = E(f), imposes a linear constraint on the possible values for m in the simplex. It corresponds to intersecting the simplex with a hyperplane, whose direction depends on f. This is also illustrated in the picture above; in this particular case, two assessments turn out to completely determine a unique mass function.
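The remark that two assessments can pin down a unique mass function can be made concrete: each assessment E(f) = e is one linear equation in m, and together with the normalisation constraint we get as many equations as unknowns. A minimal sketch, with invented assessment values and with indicators chosen as the maps f so the system solves by substitution:

```python
# Two assessments on X = {a, b, c}; the values 0.2 and 0.7 are invented.
e1 = 0.2   # assessment E(I_{a}) = 0.2, i.e. m(a) = 0.2
e2 = 0.7   # assessment E(I_{a,b}) = 0.7, i.e. m(a) + m(b) = 0.7

# together with m(a) + m(b) + m(c) = 1, the unique solution is:
m = {'a': e1, 'b': e2 - e1, 'c': 1.0 - e2}

# a valid point of the simplex: non-negative and normalised
assert all(v >= 0 for v in m.values()) and abs(sum(m.values()) - 1) < 1e-12
```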

Imprecise Probability Models
We now turn to a generalisation of precise probability models, which we will call imprecise. To allow for more realistic and flexible assessments, we can envisage imposing linear inequality constraints, rather than equality constraints, on the mass functions m in the simplex. This corresponds to intersecting the simplex with affine half-spaces. Any number of such assessments leads to a credal set M, a closed convex subset of the simplex, which is our first type of imprecise probability model.
Below, we show some more examples of such credal sets in the special case X = {a, b, c}. The credal set on the left corresponds to the assessment: 'b is at least as likely as c'; the one in the middle is a convex mixture of the uniform mass function with the entire simplex; and the one on the right represents a statement in classical propositional logic: 'X = a or X = c'. This illustrates that the language of credal sets encompasses both precise probabilities and classical propositional logic.
Lower and upper expectations are our second type of imprecise probability model. To see how they come about, consider the credal set in the figure below on the right.
We can ask what we know about the probability of c, or the expectation of I_{c}, given this credal set: it is only known to belong to the closed interval [1/4, 4/7]. This can be generalised from events to arbitrary elements of the set L(X) = ℝ^X of all real-valued maps f on X: as m ranges over the credal set M, E_m(f) will similarly range over a closed interval that is completely determined by its lower and upper bounds.
This leads to the definition of the following two real functionals on L(X): the lower expectation E̲_M(f) := min{E_m(f) : m ∈ M} and the upper expectation Ē_M(f) := max{E_m(f) : m ∈ M}, for all f ∈ L(X). Observe that these lower and upper expectations are mathematically equivalent models, because of the conjugacy relation E̲_M(f) = −Ē_M(−f). We will in what follows focus on upper expectations.
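Since E_m(f) is linear in m, the minimum and maximum over a closed convex credal set are attained in extreme points. A minimal sketch, for a credal set given by an invented finite set of extreme points, which also checks the conjugacy relation:

```python
def E(m, f):
    """Precise expectation: E_m(f) = sum over x of m(x) * f(x)."""
    return sum(m[x] * f[x] for x in m)

def lower_upper(vertices, f):
    """Lower and upper expectation: min/max of E_m(f) over the extreme points."""
    values = [E(m, f) for m in vertices]
    return min(values), max(values)

# an invented credal set on X = {a, b, c}, given by three extreme points
M = [{'a': 0.5, 'b': 0.5, 'c': 0.0},
     {'a': 0.5, 'b': 0.0, 'c': 0.5},
     {'a': 0.0, 'b': 0.5, 'c': 0.5}]

I_c = {'a': 0.0, 'b': 0.0, 'c': 1.0}    # indicator of the event {c}
lo, up = lower_upper(M, I_c)            # probability interval for {c}

# conjugacy: the upper expectation of f is minus the lower expectation of -f
neg = {x: -v for x, v in I_c.items()}
assert up == -lower_upper(M, neg)[0]
```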

Exercise 4.1 What is the upper expectation
This shows that we can go from the language of probabilities, and the use of M, to the language of expectations, and the use of Ē_M. To see that we can also go the other way, we need the following definition.

Definition 4.2 We call a real functional Ē on L(X) an upper expectation if it satisfies the following properties, for all f and g in L(X) and all real λ ≥ 0:
1. Ē(f) ≤ max f (boundedness);
2. Ē(f + g) ≤ Ē(f) + Ē(g) (subadditivity);
3. Ē(λf) = λĒ(f) (non-negative homogeneity).

Upper expectations are also called coherent upper previsions [10, 12]. They constitute a model that is mathematically equivalent to credal sets, in very much the same way as expectations are mathematically equivalent to probability mass functions.

Discrete-Time Uncertain Processes
We now apply these ideas in a more dynamic context: the study of processes. We consider an uncertain process, which is a collection of uncertain variables X_1, X_2, …, X_n, … assuming values in some finite set of states X. This can be represented graphically by a standard event tree with nodes (also called situations) s = (x_1, x_2, …, x_n), for x_k ∈ X and n ≥ 0. This is depicted below on the left for the special case that X = {0, 1}, where we have limited ourselves to three variables X_1, X_2, and X_3; but the idea should be clear. Observe that we use the symbol □ for the initial situation, or root node, of the event tree.
The event tree becomes a probability tree as soon as we attach to each node s = (x_1, x_2, …, x_n) a local probability mass function m_s on X, with associated expectation operator E_{m_s}, expressing the uncertainty about the next variable X_{n+1} after observing the earlier variables X_1 = x_1, …, X_n = x_n. This is depicted above on the right for the special case that X = {0, 1}. We now consider a very general inference problem in such a probability tree. Consider any function g : X^n → ℝ of the first n variables: g = g(X_1, X_2, …, X_n). We want to calculate its expectation E(g|s) in the situation s = (x_1, …, x_k), that is, after having observed the first k variables. Interestingly, this can be done efficiently using the following theorem, which is a reformulation of the Law of Total Probability:

Theorem 4.2 (Law of Iterated Expectations) If we know E(g|s, x) for all x ∈ X, then we can calculate E(g|s) by backwards recursion using the local model m_s: E(g|s) = E_{m_s}(E(g|s, ·)) = ∑_{x∈X} m_s(x) E(g|s, x).
This shows that expectations can be calculated recursively using a very basic step, illustrated below for the case X = {0, 1}. Hence, all expectations E(g|x_1, …, x_k) in the tree can be calculated from the local models m_s as follows:
1. start in the final cut X^n and let E(g|x_1, …, x_n) := g(x_1, …, x_n);
2. do backwards recursion using the Law of Iterated Expectations: E(g|x_1, …, x_k) = ∑_{x∈X} m_{(x_1,…,x_k)}(x) E(g|x_1, …, x_k, x);
3. go on until you get to the root node □, where we can identify E(g|□) = E(g).
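The backwards recursion is straightforward to implement. A minimal sketch, assuming for simplicity that every node carries the same local model (all names and numbers are illustrative):

```python
def expectation_root(g, local, n, states, s=()):
    """E(g | s), computed by the Law of Iterated Expectations."""
    if len(s) == n:          # final cut: E(g | x_1, ..., x_n) = g(x_1, ..., x_n)
        return g(s)
    m = local(s)             # local mass function m_s on the states
    return sum(m[x] * expectation_root(g, local, n, states, s + (x,))
               for x in states)

# example: expected number of heads in 3 independent flips, P(heads) = p,
# which the backwards recursion should evaluate to 3 * p
p = 0.3
m = {1: p, 0: 1 - p}        # the same local coin model in every node
E = expectation_root(lambda s: sum(s), lambda s: m, 3, (0, 1))
```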

Exercise 4.4
Consider flipping a coin twice independently, with probability p for heads (outcome 1) and q = 1 − p for tails (outcome 0). The corresponding probability tree for this experiment is given below on the left, with the corresponding number of heads shown in red in the nodes. What is the expected number of heads?
Solution: Above on the right, we apply the Law of Iterated Expectations recursively, from leaves to root; the solution is the expectation 2p attached to the root. ♦

Exercise 4.5 Extend the ideas in the solution to Exercise 4.4 to calculate the expected number of heads when the coin is flipped n times independently.

Solution: We apply the Law of Iterated Expectations recursively, from leaves to root. Below on the left, we consider starting from the leaves of the tree at depth n; applying the Law reduces to adding p to the number of heads in each of their parent nodes at depth n − 1. On the right, we apply the Law to these nodes at depth n − 1, which reduces to adding 2p to the number of heads in each of their parent nodes at depth n − 2.
Going on in this way, we see that the solution is the expectation np attached to the root at depth 0. ♦

Exercise 4.6 We now flip the same coin time and time again, independently, until we reach heads for the first time. Calculate the expected number of coin flips.

Solution: Below is the (unbounded) probability tree associated with this experiment.
Call the unknown expectation α. We apply the Law of Iterated Expectations to the situations at depth 1. In the situation 1, the expected number of flips is 1, the actual number of flips there. In the situation 0, we see a copy of the original tree extending to the right, but since we have already flipped the coin once here, the expected number of flips in this situation is α + 1. In the parent node, the expected number of flips α is therefore also given by p · 1 + q · (α + 1) = 1 + qα, whence α = 1/p. ♦
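The fixed-point argument can be checked numerically: α = 1/p solves α = 1 + qα, and a truncated version of the series ∑_k k q^{k−1} p for the expected number of flips approaches the same value (the choice p = 0.25 is arbitrary):

```python
p = 0.25
q = 1 - p
alpha = 1 / p
# alpha solves its own fixed-point equation alpha = 1 + q * alpha
assert abs((1 + q * alpha) - alpha) < 1e-12

# E[number of flips] = sum over k of k * P(first heads at flip k)
approx = sum(k * q ** (k - 1) * p for k in range(1, 2000))
```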

Imprecise Probability Trees
Until now, we have assumed that we have sufficient information in order to specify, in each node s, a local probability mass function m s on the set X of possible values for the next state.
We now let go of this major restrictive assumption by allowing for more general uncertainty models. We will consider credal sets as our more general local uncertainty models: closed convex subsets M_s of the simplex of mass functions on X. See the figure below for a special case when X = {0, 1}. An imprecise probability tree can be interpreted as an infinity of compatible precise probability trees: choose in each node s a probability mass function m_s from the set M_s.
For each real map g = g(X_1, …, X_n), each node s = (x_1, …, x_k), and each such compatible precise probability tree, we can calculate the expectation E(g|x_1, …, x_k) using the backwards recursion method described before. By varying over the compatible precise probability trees, we get a closed real interval, completely characterised by the lower and upper expectations E̲(g|x_1, …, x_k) and Ē(g|x_1, …, x_k).
The complexity of calculating these bounds in this way is clearly exponential in the number of time steps n. But there is a more efficient method: the Law of Iterated Expectations extends to upper expectations, with the very basic backwards step Ē(g|s) = max_{m∈M_s} ∑_{x∈X} m(x) Ē(g|s, x), illustrated below for the case X = {0, 1}. The method for, and the complexity of, calculating the Ē(g|s), as a function of n, is therefore essentially the same as in the precise case!
As an illustration, we revisit the coin-flipping exercises, assuming now that the probability of heads is only known to lie in a closed interval with lower bound p̲ and upper bound p̄, so that the upper probability of tails is q̄ = 1 − p̲. For the upper expected number of heads in n flips, applying the Law of Iterated Upper Expectations to the leaves at depth n reduces to adding p̄ to the number of heads in each of their parent nodes at depth n − 1; applying it to these nodes at depth n − 1 reduces to adding 2p̄ to the number of heads in each of their parent nodes at depth n − 2. Going on in this way, we see that the solution is the upper expectation np̄ attached to the root at depth 0. A similar result holds for the lower expectation. ♦

For the number of flips until the first heads, call the unknown upper expectation ᾱ. We apply the Law of Iterated Upper Expectations to the situations at depth 1. In the situation 1, the upper expected number of flips is 1, the actual number of flips there. In the situation 0, we see a copy of the original tree extending to the right, but since we have already flipped the coin once here, the upper expected number of flips in this situation is ᾱ + 1. In the parent node, the upper expected number of flips ᾱ is therefore also given by 1 + Ē(ᾱ I_{0}) = 1 + ᾱq̄, whence ᾱ = 1/p̲. A similar result holds for the lower expectation. ♦

The attentive reader will have observed that in all these simple exercises, we can also obtain the 'imprecise' result from the 'precise' one by optimising over the single parameter p. We have to warn against too much optimism: in more involved examples, this will no longer be the case.
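The imprecise backwards recursion differs from the precise one only in that each step maximises over the local credal set. A minimal sketch, for local models given by an invented interval of coin biases, whose two extreme points suffice for the maximisation:

```python
def upper_expectation(g, vertices, n, states, s=()):
    """Upper E(g | s): backwards recursion, maximising over local extreme points."""
    if len(s) == n:          # final cut: the value of g itself
        return g(s)
    children = {x: upper_expectation(g, vertices, n, states, s + (x,))
                for x in states}
    return max(sum(m[x] * children[x] for x in states) for m in vertices)

# invented local credal set: P(heads) only known to lie in [0.2, 0.6]
p_lo, p_hi = 0.2, 0.6
vertices = [{1: p_lo, 0: 1 - p_lo},
            {1: p_hi, 0: 1 - p_hi}]

# upper expected number of heads in 3 flips; since the gamble is increasing
# in the number of heads, the bound is attained at p = p_hi, giving 3 * p_hi
up = upper_expectation(lambda s: sum(s), vertices, 3, (0, 1))
```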

Imprecise Markov Chains
We now look at a special instance of a probability tree, corresponding to a stationary (precise) Markov chain. This happens when the precise local models m_{(x_1,…,x_n)} only depend on the last observed state x_n (this is the Markov Condition), and also do not depend explicitly on the time step n: m_{(x_1,…,x_n)} = q(·|x_n) for some family of transition mass functions q(·|x), x ∈ X. For each x ∈ X, the transition mass function q(·|x) corresponds to an expectation operator, given by E(f|x) = ∑_{z∈X} q(z|x) f(z) for all f ∈ L(X).

Definition 4.5 Consider the linear transformation T of L(X), called the transition operator, defined by (T f)(x) := E(f|x) = ∑_{z∈X} q(z|x) f(z) for all f ∈ L(X) and all x ∈ X.

In the parlance of linear algebra, or functional analysis, T is the dual of the linear transformation with Markov matrix M, with elements M_{xy} := q(y|x).
Up to now, we have mainly been concerned with conditional expectations of the type E(·|s). We will now look at particular unconditional expectations, where s = □. For any n ≥ 1, we define the expectation for the (single) state X_n at time n by E_n(f) := E(f(X_n)|□), which can be calculated as E_n(f) = E_1(T^{n−1} f), where E_1 is the expectation operator associated with the marginal mass function for X_1.
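In matrix terms, applying T once is one matrix-vector product, so calculating E_n(f) costs n − 1 such products. A minimal sketch with an invented two-state Markov matrix:

```python
M = [[0.9, 0.1],            # Markov matrix: row x holds q(.|x)
     [0.4, 0.6]]
m1 = [0.5, 0.5]             # marginal mass function for X_1

def T(g):
    """(T g)(x) = sum over z of q(z|x) * g(z): one backwards step."""
    return [sum(M[x][z] * g[z] for z in range(2)) for x in range(2)]

def E_n(f, n):
    """E_n(f) = E_1(T^(n-1) f)."""
    g = list(f)
    for _ in range(n - 1):
        g = T(g)
    return sum(m1[x] * g[x] for x in range(2))

f = [1.0, 0.0]              # indicator of state 0, so E_n(f) = P(X_n = 0)
```

For this particular matrix, the chain has stationary mass function (0.8, 0.2), so P(X_n = 0) converges to 0.8 as n grows.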

Exercise 4.9
Consider the stochastic process where we first flip a fair coin. From then on, on heads, we select a biased coin with probability p for heads for the next coin flip, and on tails, a biased coin with probability q = 1 − p for heads, and keep on flipping one of the two biased coins, selected on the basis of the outcome of the previous coin flip. This produces a Markov chain, with transition mass functions given by q(1|1) = p and q(1|0) = q, and a uniform marginal mass function for X_1. For any f ∈ L(X), (T f)(1) = pf(1) + qf(0) and (T f)(0) = qf(1) + pf(0), so E_2(f) = E_1(T f) = ½[(T f)(1) + (T f)(0)] = ½[f(1) + f(0)] = E_1(f). Similarly, E_3(f) = E_1(T² f) = E_1(T f) = E_1(f), and so on. We see that at the level of expectations of single state variables, the process cannot be distinguished from flipping a fair coin. ♦

The generalisation from precise to imprecise Markov chains goes as follows. The uncertain process is a stationary imprecise Markov chain when the Markov Condition is satisfied with stationarity: M_{(x_1,…,x_n)} = Q(·|x_n) for some family of transition credal sets Q(·|x), x ∈ X. An imprecise Markov chain can be seen as an infinity of (precise) probability trees: choose a precise mass function from M_s in each situation s. It should be clear that not all of these satisfy the Markov Condition or stationarity. This implies that solving the optimisation problem in order to find the tight upper bounds Ē(g|s), as discussed in Sect. 4.5, is not necessarily simply an optimisation over a parametrised collection of stationary (or even non-stationary) Markov chains, although it can turn out to be that simple in a number of special cases.
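The conclusion of Exercise 4.9 above is easy to check numerically: propagating the mass function for X_n one Markov step at a time leaves the probability of heads at 1/2 (the choice p = 0.7 is arbitrary):

```python
p = 0.7
m = {1: 0.5, 0: 0.5}                       # fair coin for X_1
for _ in range(10):                        # one Markov step per iteration
    m = {1: m[1] * p + m[0] * (1 - p),     # q(1|1) = p, q(1|0) = 1 - p
         0: m[1] * (1 - p) + m[0] * p}
# P(X_n = 1) stays 1/2, so the chain looks like a fair coin marginally
```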
For each x ∈ X, the local transition model Q(·|x) corresponds to an upper expectation operator Ē(·|x), given by Ē(f|x) = max{∑_{z∈X} m(z) f(z) : m ∈ Q(·|x)} for all f ∈ L(X). This leads to the following definition, which generalises the definition of transition operators for precise Markov chains: the upper transition operator T̄ of L(X), defined by (T̄ f)(x) := Ē(f|x) for all f ∈ L(X) and all x ∈ X. For any n ≥ 1, we define the upper expectation for the (single) state X_n at time n by Ē_n(f) := Ē(f(X_n)|□).

Examples
Consider a two-element state space X = {1, 0}, with an upper expectation Ē_1 = Ē_M for the first variable, and for each x ∈ X a local transition credal set Q(·|x), with corresponding upper transition operator T̄. It is a matter of simple and direct verification that, for n ≥ 1 and f ∈ L(X), Ē_n(f) = Ē_1(T̄^{n−1} f). If we now let n → ∞, it is not too hard to see that the limit Ē_∞(f) := lim_{n→∞} Ē_n(f) exists and is independent of the initial upper expectation Ē_1. We consider two special cases. In the first, the underlying precise Markov chain is actually like flipping a fair coin, and we then find that Ē_∞(f) = ½[f(0) + f(1)] for all f ∈ L(X). In the second, the underlying precise Markov chain is actually a deterministic cycle between the states 0 and 1, and we then find that Ē_∞(f) = max f for all f ∈ L(X).
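The convergence to a limit that does not depend on Ē_1 can be observed numerically. A minimal sketch for a two-state imprecise Markov chain with invented transition probability intervals; since the local models are intervals, the maximisation in each step only needs their endpoints:

```python
def upper_E(interval, f):
    """Upper expectation of f over all mass functions with P(1) in [lo, hi]."""
    lo, hi = interval
    return max(p * f[1] + (1 - p) * f[0] for p in (lo, hi))

Q = {1: (0.6, 0.8),    # P(next = 1 | now = 1) only known to lie in [0.6, 0.8]
     0: (0.1, 0.3)}    # P(next = 1 | now = 0) only known to lie in [0.1, 0.3]

def T_bar(f):
    """Upper transition operator: (T_bar f)(x) = upper expectation of f given x."""
    return {x: upper_E(Q[x], f) for x in (0, 1)}

g = {0: 0.0, 1: 1.0}   # indicator of state 1, so E_n(g) is the upper P(X_n = 1)
for _ in range(100):   # E_n(g) = E_1(T_bar^(n-1) g): iterate the operator
    g = T_bar(g)

# two very different initial models give (numerically) the same limit value
E_from_state0 = upper_E((0.0, 0.0), g)   # initial model: X_1 is surely 0
E_from_state1 = upper_E((1.0, 1.0), g)   # initial model: X_1 is surely 1
```

For these particular intervals, both values approach 0.6, the stationary probability of state 1 in the vertex chain that maximises it.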
The probability intervals for the state 1 corresponding to these two limit models are [1/2, 1/2] and [0, 1], respectively. As another example, we consider X = {a, b, c} and the transition models depicted below, which are imprecise models not very far from a simple cycle. Below, we depict the time evolution of the Ē_n (as credal sets) for three cases (red, yellow and blue). We see that, here too, regardless of the initial distribution Ē_1, the distribution Ē_n seems to converge to the same distribution.

A Non-linear Perron-Frobenius Theorem, and Ergodicity
The convergence behaviour in the previous examples can also be observed in general imprecise Markov chains under fairly weak conditions. The following theorems can be derived from the more general discussions and results in [3,5]. In that case we also have an interesting ergodicity result. For a detailed description of the notion of 'almost surely', we refer to [3], but it roughly means 'with upper probability one'.

Conclusion
The discussion in this paper lays bare a few interesting but quite basic aspects of inference in imprecise probability trees and Markov chains in discrete time. A more general and deeper treatment of these matters can be found in [3][4][5]. For recent work on imprecise Markov chains in continuous time, I refer the interested reader to [1,9].
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.