For a finite set S and vector \(x \in \mathbb {R}^{|S|}\), let \({x[s]} \in \mathbb {R}\) denote the entry of x that corresponds to \(s \in S\). Let \(S' \subseteq S\) and \(a \in \mathbb {R}\). We write \({x[S']} = a\) to denote that \({x[s]} = a\) for all \(s \in S'\). Given \(x,y \in \mathbb {R}^{|S|}\), \(x \le y\) holds iff \({x[s]} \le {y[s]}\) holds for all \(s \in S\). For a function \(f :\mathbb {R}^{|S|} \rightarrow \mathbb {R}^{|S|}\) and \(k \ge 0\) we write \(f^k\) for the function obtained by applying f k times, i.e., \(f^0(x) = x\) and \(f^k(x) = f(f^{k-1}(x))\) if \(k>0\).
2.1 Probabilistic Models and Measures
We briefly present probabilistic models and their properties. More details can be found in, e.g., [15].
Definition 1
(Probabilistic Models). A Markov Decision Process (MDP) is a tuple \(\mathcal {M}= (S, Act , \mathbf {P}, {s_{I}}, \rho )\), where
-
S is a finite set of states, \( Act \) is a finite set of actions, \({s_{I}}\) is the initial state,
-
\(\mathbf {P}:S \times Act \times S \rightarrow [0,1]\) is a transition probability function satisfying \(\sum _{s' \in S} \mathbf {P}(s, \alpha , s') \in \{0,1\} \) for all \(s \in S, \alpha \in Act \), and
-
\(\rho :S \times Act \rightarrow \mathbb {R}\) is a reward function.
\(\mathcal {M}\) is a Markov Chain (MC) if \(| Act | = 1\).
Example 1
Figure 1 shows an example MC and an example MDP.
We often simplify notation for MCs by omitting the (unique) action. For an MDP \(\mathcal {M}= (S, Act , \mathbf {P}, {s_{I}}, \rho )\), the set of enabled actions of state \(s \in S\) is given by \( Act (s) = \{ \alpha \in Act \mid \sum _{s' \in S} \mathbf {P}(s, \alpha , s') = 1 \}\). We assume that \( Act (s) \ne \emptyset \) for each \(s \in S\). Intuitively, upon performing action \(\alpha \) at state s, the reward \(\rho (s,\alpha )\) is collected and we move to \(s' \in S\) with probability \(\mathbf {P}(s, \alpha , s')\). Notice that rewards can be positive or negative.
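For illustration, the following Python sketch encodes an MDP in the sense of Definition 1 as plain dictionaries and computes the enabled actions \( Act (s)\). The toy model and all identifiers (`P`, `rho`, `enabled_actions`) are our own illustration; in particular, this is not the MDP from Fig. 1.

```python
# A minimal sketch of the MDP representation from Definition 1 (illustrative data).
# P[(s, a)][s'] holds the transition probability P(s, a, s');
# rho[(s, a)] holds the reward collected when taking action a in state s.

S = ["s0", "s1"]
Act = ["alpha", "beta"]
s_I = "s0"

P = {
    ("s0", "alpha"): {"s0": 0.5, "s1": 0.5},
    ("s0", "beta"):  {"s1": 1.0},
    ("s1", "alpha"): {"s1": 1.0},  # s1 is absorbing
}
rho = {("s0", "alpha"): 1.0, ("s0", "beta"): 0.0, ("s1", "alpha"): 0.0}

def enabled_actions(s):
    """Act(s): actions whose outgoing probabilities sum up to one."""
    return [a for a in Act
            if abs(sum(P.get((s, a), {}).values()) - 1.0) < 1e-9]

# We assume Act(s) is non-empty for every state.
assert all(enabled_actions(s) for s in S)
```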
A state \(s \in S\) is called absorbing if \(\mathbf {P}(s,\alpha ,s) = 1\) for every \(\alpha \in Act (s)\). A path of \(\mathcal {M}\) is an infinite alternating sequence \(\pi = s_0 \alpha _0 s_1 \alpha _1 \dots \) where \(s_i \in S\), \(\alpha _i \in Act (s_i)\), and \(\mathbf {P}(s_i, \alpha _i, s_{i+1}) > 0\) for all \(i\ge 0\). The set of paths of \(\mathcal {M}\) is denoted by \( Paths ^{\mathcal {M}}\). The set of paths that start at \(s \in S\) is given by \( Paths ^{\mathcal {M},s}\). A finite path \(\hat{\pi }= s_0 \alpha _0 \dots \alpha _{n-1} s_n\) is a finite prefix of a path ending with \( last (\hat{\pi }) = s_n \in S\). \(|\hat{\pi }| = n\) is the length of \(\hat{\pi }\), \( Paths _{ fin }^{\mathcal {M}}\) is the set of finite paths of \(\mathcal {M}\), and \( Paths _{ fin }^{\mathcal {M},s}\) is the set of finite paths that start at state \(s \in S\). We consider LTL-like notations for sets of paths. For \(k \in \mathbb {N}\cup \{\infty \}\) and \(G, H \subseteq S\) let
$$ H {{\mathrm{\mathcal {U}}}}^{\le k} G = \{ s_0 \alpha _0 s_1 \dots \in Paths ^{\mathcal {M},{s_{I}}} \mid s_0, \dots , s_{j-1} \in H \text {, } s_j \in G \text { for some } j \le k\} $$
denote the set of paths that, starting from the initial state \({s_{I}}\), only visit states in H until after at most k steps a state in G is reached. Sets \(H {{\mathrm{\mathcal {U}}}}^{> k} G\) and \(H {{\mathrm{\mathcal {U}}}}^{=k} G\) are defined similarly. We use the shorthands \({\lozenge }^{\le k} G := S {{\mathrm{\mathcal {U}}}}^{\le k} G\), \({\lozenge }G := {\lozenge }^{\le \infty } G\), and \({\square }^{\le k} G := Paths ^{\mathcal {M}, {s_{I}}} \setminus {\lozenge }^{\le k} (S \setminus G)\).
A (deterministic) scheduler for \(\mathcal {M}\) is a function \(\sigma : Paths _{ fin }^{\mathcal {M}} \rightarrow Act \) such that \(\sigma (\hat{\pi }) \in Act ( last (\hat{\pi }))\) for all \(\hat{\pi }\in Paths _{ fin }^{\mathcal {M}}\). The set of (deterministic) schedulers for \(\mathcal {M}\) is \(\mathfrak {S}^{\mathcal {M}}\). \(\sigma \in \mathfrak {S}^{\mathcal {M}}\) is called positional if \(\sigma (\hat{\pi })\) only depends on the last state of \(\hat{\pi }\), i.e., for all \(\hat{\pi }, \hat{\pi }' \in Paths _{ fin }^{\mathcal {M}}\) we have \( last (\hat{\pi }) = last (\hat{\pi }')\) implies \(\sigma (\hat{\pi }) = \sigma (\hat{\pi }')\). For MDP \(\mathcal {M}\) and scheduler \(\sigma \in \mathfrak {S}^{\mathcal {M}}\) the probability measure over finite paths is given by \({\mathrm {Pr}}_ fin ^{\mathcal {M}, \sigma } : Paths _{ fin }^{\mathcal {M}, {s_{I}}} \rightarrow [0,1]\) with \( {\mathrm {Pr}}_ fin ^{\mathcal {M}, \sigma } (s_0 \dots s_n) = \prod _{i=0}^{n-1} \mathbf {P}(s_i, \sigma (s_0\dots s_i), s_{i+1}). \) The probability measure \({\mathrm {Pr}}^{\mathcal {M}, \sigma }\) over measurable sets of infinite paths is obtained via a standard cylinder set construction [15].
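As an illustration of the finite-path measure, the following sketch evaluates \({\mathrm {Pr}}_ fin ^{\mathcal {M}, \sigma }\) for a positional scheduler, for which \(\sigma (s_0 \dots s_i)\) only depends on the last state \(s_i\). The model data and identifiers are hypothetical.

```python
# Sketch: probability of a finite path under a positional scheduler sigma,
# given as a map from states to chosen actions (illustrative data).

P = {
    ("s0", "alpha"): {"s0": 0.5, "s1": 0.5},
    ("s1", "alpha"): {"s1": 1.0},
}
sigma = {"s0": "alpha", "s1": "alpha"}  # a positional scheduler

def prob_of_finite_path(states):
    """Pr_fin(s_0 ... s_n) = prod_i P(s_i, sigma(s_i), s_{i+1})."""
    p = 1.0
    for s, s_next in zip(states, states[1:]):
        p *= P[(s, sigma[s])].get(s_next, 0.0)
    return p

print(prob_of_finite_path(["s0", "s0", "s1"]))  # 0.5 * 0.5 = 0.25
```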
Definition 2
(Reachability Probability). The reachability probability of MDP \(\mathcal {M}= (S, Act , \mathbf {P}, {s_{I}}, \rho )\), \(G \subseteq S\), and \(\sigma \in \mathfrak {S}^{\mathcal {M}}\) is given by \({\mathrm {Pr}}^{\mathcal {M}, \sigma }({\lozenge }G)\).
For \(k \in \mathbb {N}\cup \{\infty \}\), the function \({\blacklozenge }^{\le k} G :{\lozenge }G \rightarrow \mathbb {R}\) yields the k-bounded reachability reward of a path \(\pi = s_0 \alpha _0 s_1 \dots \in {\lozenge }G\). We set \({\blacklozenge }^{\le k} G(\pi ) = \sum _{i = 0}^{j-1} \rho (s_i, \alpha _i)\), where \(j = \min (\{i\ge 0 \mid s_i \in G \} \cup \{k\})\). We write \({\blacklozenge }G\) instead of \({\blacklozenge }^{ \le \infty } G\).
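The k-bounded reachability reward of a path can be computed directly from this definition once a sufficiently long prefix is available; the following sketch (with hypothetical identifiers) does exactly that.

```python
# Sketch: k-bounded reachability reward of a path prefix s_0 a_0 s_1 a_1 ...,
# given as separate lists of states and actions (illustrative data). The prefix
# is assumed to be long enough to contain the first G-state or the first k steps.

def bounded_reach_reward(states, actions, rho, G, k):
    """Sum of rho(s_i, a_i) for i < j, where j is the first index with s_j in G,
    capped at k."""
    hits = [i for i, s in enumerate(states) if s in G]
    j = min(hits + [k]) if hits else k
    return sum(rho[(states[i], actions[i])] for i in range(j))

rho = {("s0", "alpha"): 2.0, ("s1", "alpha"): 1.0, ("s2", "alpha"): 0.0}
print(bounded_reach_reward(["s0", "s1", "s2"], ["alpha", "alpha", "alpha"],
                           rho, G={"s2"}, k=10))  # 2.0 + 1.0 = 3.0
```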
Definition 3
(Expected Reward). The expected (reachability) reward of MDP \(\mathcal {M}= (S, Act , \mathbf {P}, {s_{I}}, \rho )\), \(G \subseteq S\), and \(\sigma \in \mathfrak {S}^{\mathcal {M}}\) with \({\mathrm {Pr}}^{\mathcal {M}, \sigma }({\lozenge }G) = 1\) is given by the expectation \({\mathbb {E}}^{\mathcal {M}, \sigma }({\blacklozenge }G) = \int _{\pi \in {\lozenge }G} {\blacklozenge }G (\pi ) \,\mathrm {d}{\mathrm {Pr}}^{\mathcal {M}, \sigma }(\pi )\).
We write \({\mathrm {Pr}}^{\mathcal {M}, \sigma }_s\) and \({\mathbb {E}}^{\mathcal {M}, \sigma }_s\) for the probability measure and expectation obtained by changing the initial state of \(\mathcal {M}\) to \(s \in S\). If \(\mathcal {M}\) is a Markov chain, there is only a single scheduler. In this case we may omit the superscript \(\sigma \) from \({\mathrm {Pr}}^{\mathcal {M}, \sigma }\) and \({\mathbb {E}}^{\mathcal {M}, \sigma }\). We also omit the superscript \(\mathcal {M}\) if it is clear from the context. The maximal reachability probability of \(\mathcal {M}\) and G is given by \({\mathrm {Pr}}^{\mathrm {max}}({\lozenge }G) = \max _{\sigma \in \mathfrak {S}^{\mathcal {M}}} {\mathrm {Pr}}^{\sigma }({\lozenge }G)\). There is a positional scheduler that attains this maximum [16]. The same holds for minimal reachability probabilities and maximal or minimal expected rewards.
Example 2
Consider the MDP \(\mathcal {M}\) from Fig. 1(b). We are interested in the maximal probability to reach state \(s_4\) given by \(\mathrm {Pr}^{\mathrm {max}}({\lozenge }\{s_4 \})\). Since \(s_4\) is not reachable from \(s_3\) we have \(\mathrm {Pr}^{\mathrm {max}}_{s_3}({\lozenge }\{s_4\}) = 0\). Intuitively, choosing action \(\beta \) at state \(s_0\) makes reaching \(s_3\) more likely, which should be avoided in order to maximize the probability to reach \(s_4\). We therefore assume a scheduler \(\sigma \) that always chooses action \(\alpha \) at state \(s_0\). Starting from the initial state \(s_0\), we then eventually take the transition from \(s_2\) to \(s_3\) or the transition from \(s_2\) to \(s_4\) with probability one. The resulting probability to reach \(s_4\) is given by \(\mathrm {Pr}^{\mathrm {max}}({\lozenge }\{s_4 \}) = \mathrm {Pr}^{\sigma }({\lozenge }\{s_4 \}) = 0.3/ (0.1 + 0.3) = 0.75\).
2.2 Probabilistic Model Checking via Interval Iteration
In the following we present approaches to compute reachability probabilities and expected rewards. We consider approximative computations; exact computations are addressed in, e.g., [17, 18]. For the sake of clarity, we focus on reachability probabilities and sketch how the techniques can be lifted to expected rewards.
Reachability Probabilities. We fix an MDP \(\mathcal {M}= (S, Act , \mathbf {P}, {s_{I}}, \rho )\), a set of goal states \(G \subseteq S\), and a precision parameter \({\varepsilon }> 0\).
Problem 1
Compute an \({\varepsilon }\)-approximation of the maximal reachability probability \({\mathrm {Pr}^{\mathrm {max}}({\lozenge }G)}\), i.e., compute a value \({r}\in [0,1]\) with \(|{r}- {\mathrm {Pr}^{\mathrm {max}}({\lozenge }G)} | < {\varepsilon }\).
We briefly sketch how to compute such a value \({r}\) via interval iteration [12, 13, 19]. The computation for minimal reachability probabilities is analogous.
W.l.o.g. it is assumed that the states in G are absorbing. Using graph algorithms, we compute \({S}_{0} = \{ s \in S \mid {\mathrm {Pr}}^\mathrm {max}_s({\lozenge }G) =0 \}\) and partition the state space of \(\mathcal {M}\) into
$$ S = G \,\cup \, {S}_{0} \,\cup \, {S}_{?} $$
with \({S}_{?} = S \setminus (G \cup {S}_{0})\). If \({s_{I}}\in {S}_{0}\) or \({s_{I}}\in G\), the probability \({\mathrm {Pr}}^\mathrm {max}({\lozenge }G)\) is 0 or 1, respectively. From now on we assume \({s_{I}}\in {S}_{?}\).
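One standard way to realize this graph-based step is a backward reachability analysis: \({\mathrm {Pr}}^\mathrm {max}_s({\lozenge }G) = 0\) holds exactly for those states from which no state in G is reachable via positive-probability transitions. The following sketch, on an illustrative toy model, computes \({S}_{0}\) this way.

```python
# Sketch: S_0 as the complement of the states that can reach G in the
# underlying graph, via a backward breadth-first search (illustrative data).
from collections import deque

P = {
    ("s0", "alpha"): {"s0": 0.5, "s1": 0.5},
    ("s0", "beta"):  {"s2": 1.0},
    ("s1", "alpha"): {"s1": 1.0},
    ("s2", "alpha"): {"s2": 1.0},
}
S = {"s0", "s1", "s2"}
G = {"s1"}

# predecessors[t] = states with some action reaching t with positive probability
predecessors = {s: set() for s in S}
for (s, _a), succ in P.items():
    for t, p in succ.items():
        if p > 0:
            predecessors[t].add(s)

can_reach_G = set(G)
queue = deque(G)
while queue:
    s = queue.popleft()
    for t in predecessors[s]:
        if t not in can_reach_G:
            can_reach_G.add(t)
            queue.append(t)

S0 = S - can_reach_G
print(S0)  # {'s2'}: the goal is unreachable from s2
```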
We say that \(\mathcal {M}\) is contracting with respect to \(S' \subseteq S\) if \({\mathrm {Pr}}_s^{\sigma }({\lozenge }S') = 1\) for all \(s \in S\) and all \(\sigma \in \mathfrak {S}^{\mathcal {M}}\). We assume that \(\mathcal {M}\) is contracting with respect to \(G \cup {S}_{0}\). Otherwise, we apply a transformation on the so-called end components of \(\mathcal {M}\), yielding a contracting MDP \(\mathcal {M}'\) with the same maximal reachability probability as \(\mathcal {M}\). Roughly, this transformation replaces each end component of \(\mathcal {M}\) with a single state whose enabled actions coincide with the actions that previously led outside of the end component. This step is detailed in [13, 19].
We have \({x^*[s]} = {\mathrm {Pr}}^\mathrm {max}_s({\lozenge }G)\) for all \(s \in S\), where \(x^*\) is the unique fixpoint of the function \(f :\mathbb {R}^{|S|} \rightarrow \mathbb {R}^{|S|}\) with \({f(x)[{S}_{0}]} = 0\), \({f(x)[G]} = 1\), and
$$\begin{aligned} {f(x)[s]} = \max _{\alpha \in Act (s)} \sum _{s' \in S} \mathbf {P}(s, \alpha , s') \cdot {x[s']} \end{aligned}$$
for \(s \in {S}_{?}\). Hence, computing \({\mathrm {Pr}^{\mathrm {max}}({\lozenge }G)}\) reduces to finding the fixpoint of f.
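For concreteness, the following sketch is a direct transcription of the operator f for a small hypothetical model (again not the MDP from Fig. 1).

```python
# Sketch of the operator f: f(x)[S0] = 0, f(x)[G] = 1, and a Bellman maximum
# over the enabled actions for the remaining states (illustrative data).

P = {
    ("s0", "a"): {"s0": 0.5, "s1": 0.25, "s2": 0.25},
    ("s1", "a"): {"s1": 1.0},  # absorbing goal state
    ("s2", "a"): {"s2": 1.0},  # absorbing state with Pr^max = 0
}
S, G, S0 = {"s0", "s1", "s2"}, {"s1"}, {"s2"}

def apply_f(x):
    y = {s: 0.0 for s in S0}
    y.update({s: 1.0 for s in G})
    for s in S - G - S0:
        y[s] = max(sum(p * x[t] for t, p in succ.items())
                   for (s_, _a), succ in P.items() if s_ == s)
    return y

x = {s: (1.0 if s in G else 0.0) for s in S}
print(apply_f(x))  # one application of f to the starting vector
```

For this toy model, the fixpoint satisfies \({x^*[s_0]} = 0.5 \cdot {x^*[s_0]} + 0.25\), i.e., \({x^*[s_0]} = 0.5\).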
A popular technique for this purpose is the value iteration algorithm [1]. Given a starting vector \(x \in \mathbb {R}^{|S|}\) with \({x[{S}_{0}]} = 0\) and \({x[G]} = 1\), standard value iteration computes \(f^k(x)\) for increasing k until \(\max _{s \in {S}} |{f^k(x)[s]} - {f^{k-1}(x)[s]}| < \varepsilon \) holds for a predefined precision \(\varepsilon > 0\). As pointed out in, e.g., [13], there is no guarantee on the preciseness of the result \({r}= {f^k(x)[{s_{I}}]}\), i.e., standard value iteration does not give any evidence on the error \(|{r}- {\mathrm {Pr}^{\mathrm {max}}({\lozenge }G)}|\). The intuitive reason is that value iteration only approximates the fixpoint \(x^*\) from one side, yielding no indication on the distance between the current result and \(x^*\).
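Under the assumptions of the previous sketch, standard value iteration with the termination criterion described above could be written as follows; as discussed, the criterion only bounds the change between consecutive iterations, not the actual error.

```python
# Sketch of standard value iteration with the (unsound) stopping criterion
# max_s |f^k(x)[s] - f^{k-1}(x)[s]| < eps (illustrative data).

P = {
    ("s0", "a"): {"s0": 0.5, "s1": 0.25, "s2": 0.25},
    ("s1", "a"): {"s1": 1.0},
    ("s2", "a"): {"s2": 1.0},
}
S, G, S0 = {"s0", "s1", "s2"}, {"s1"}, {"s2"}

def apply_f(x):
    y = {s: (1.0 if s in G else 0.0) for s in G | S0}
    for s in S - G - S0:
        y[s] = max(sum(p * x[t] for t, p in succ.items())
                   for (s_, _a), succ in P.items() if s_ == s)
    return y

def value_iteration(eps=1e-6):
    x = {s: (1.0 if s in G else 0.0) for s in S}
    while True:
        y = apply_f(x)
        if max(abs(y[s] - x[s]) for s in S) < eps:
            # Bounds only the change between iterations, hence gives no
            # guarantee on the distance to the fixpoint x* in general.
            return y
        x = y

print(value_iteration()["s0"])  # approaches the fixpoint value 0.5 from below
```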
Example 3
Consider the MDP \(\mathcal {M}\) from Fig. 1(b). We invoked standard value iteration in PRISM [7] and Storm [8] to compute the reachability probability \(\mathrm {Pr}^{\mathrm {max}}({\lozenge }\{s_4 \})\). Recall from Example 2 that the correct solution is 0.75. With (absolute) precision \({\varepsilon }= 10^{-6}\) both model checkers returned 0.7248. The user can improve the result by choosing a smaller precision parameter, e.g., \({\varepsilon }= 10^{-8}\), which yields 0.7497. However, there is still no guarantee on the preciseness of a given result.
The interval iteration algorithm [12, 13, 19] addresses the impreciseness of value iteration. The idea is to approach the fixpoint \(x^*\) from below and from above. The first step is to find starting vectors \(x_\ell , x_u \in \mathbb {R}^{|S|}\) satisfying \({x_\ell [{S}_{0}]} = {x_u[{S}_{0}]} = 0\), \({x_\ell [G]} = {x_u[G]} = 1\), and \(x_\ell \le x^* \le x_u\). As the entries of \(x^*\) are probabilities, it is always valid to set \({x_\ell [{S}_{?}]} = 0\) and \({x_u[{S}_{?}]} = 1\). We have \(f^k(x_\ell ) \le x^* \le f^k(x_u)\) for any \(k \ge 0\). Interval iteration computes \(f^k(x_\ell )\) and \(f^k(x_u)\) for increasing k until \(\max _{s \in S} |{f^k(x_\ell )[s]} - {f^{k}(x_u)[s]}| < 2 \varepsilon \). For the result \({r}= \nicefrac {1}{2} \cdot ({f^k(x_\ell )[{s_{I}}]} + {f^k(x_u)[{s_{I}}]}) \) we obtain that \(|{r}- {\mathrm {Pr}^{\mathrm {max}}({\lozenge }G)}| < \varepsilon \), i.e., we get a sound approximation of the maximal reachability probability.
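A minimal sketch of interval iteration along these lines, for the same illustrative toy model as above, could look as follows.

```python
# Sketch of interval iteration: iterate f on a lower and an upper starting
# vector and stop once both approximations are close enough (illustrative data).

P = {
    ("s0", "a"): {"s0": 0.5, "s1": 0.25, "s2": 0.25},
    ("s1", "a"): {"s1": 1.0},
    ("s2", "a"): {"s2": 1.0},
}
S, G, S0, s_I = {"s0", "s1", "s2"}, {"s1"}, {"s2"}, "s0"

def apply_f(x):
    y = {s: (1.0 if s in G else 0.0) for s in G | S0}
    for s in S - G - S0:
        y[s] = max(sum(p * x[t] for t, p in succ.items())
                   for (s_, _a), succ in P.items() if s_ == s)
    return y

def interval_iteration(eps=1e-6):
    # x_low <= x* <= x_up: since the entries of x* are probabilities,
    # 0 and 1 are always valid starting values for the states in S_?.
    x_low = {s: (1.0 if s in G else 0.0) for s in S}
    x_up = {s: (0.0 if s in S0 else 1.0) for s in S}
    while max(abs(x_up[s] - x_low[s]) for s in S) >= 2 * eps:
        x_low, x_up = apply_f(x_low), apply_f(x_up)
    return 0.5 * (x_low[s_I] + x_up[s_I])

print(interval_iteration())  # eps-close to Pr^max(<>G) = 0.5 for the toy model
```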
Example 4
We invoked interval iteration in PRISM and Storm to compute the reachability probability \(\mathrm {Pr}^{\mathrm {max}}({\lozenge }\{s_4 \})\) for the MDP \(\mathcal {M}\) from Fig. 1(b). Both implementations correctly yield an \({\varepsilon }\)-approximation of \(\mathrm {Pr}^{\mathrm {max}}({\lozenge }\{s_4 \})\), where we considered \({\varepsilon }= 10^{-6}\). However, both PRISM and Storm required roughly 300,000 iterations for convergence.
Expected Rewards. Whereas [13, 19] only consider reachability probabilities, [12] extends interval iteration to compute expected rewards. Let \(\mathcal {M}\) be an MDP and G be a set of absorbing states such that \(\mathcal {M}\) is contracting with respect to G.
Problem 2
Compute an \({\varepsilon }\)-approximation of the maximal expected reachability reward \({{\mathbb {E}^{\mathrm {max}}}({\blacklozenge }G)}\), i.e., compute a value \({r}\in \mathbb {R}\) with \(|{r}- {{\mathbb {E}^{\mathrm {max}}}({\blacklozenge }G)} | < {\varepsilon }\).
We have \({x^*[s]} = {\mathbb {E}}_s^\mathrm {max}({\blacklozenge }G)\) for all \(s \in S\), where \(x^*\) is the unique fixpoint of \(g :\mathbb {R}^{|S|} \rightarrow \mathbb {R}^{|S|}\) with
$$ {g(x)[G]} = 0 \ \text { and } \ {g(x)[s]} = \max _{\alpha \in Act (s)} \rho (s,\alpha ) + \sum _{s' \in S} \mathbf {P}(s, \alpha , s') \cdot {x[s']} $$
for \(s \notin G\). As for reachability probabilities, interval iteration can be applied to approximate this fixpoint. The crux lies in finding appropriate starting vectors \(x_\ell , x_u \in \mathbb {R}^{|S|}\) guaranteeing \(x_\ell \le x^* \le x_u\). To this end, [12] describes graph-based algorithms that give an upper bound on the expected number of times each individual state \(s \in S \setminus G\) is visited. This in turn yields bounds on the expected amount of reward collected at the various states.
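The following sketch illustrates the operator g and interval iteration for expected rewards on a hypothetical model. How to obtain sound starting vectors is precisely the graph-based step of [12] and is not shown here; valid bounds for the toy model are simply supplied by hand.

```python
# Sketch: interval iteration for expected rewards, given the operator g and
# externally supplied bounding vectors x_low <= x* <= x_up (illustrative data;
# computing sound bounds automatically is the graph-based step of [12]).

P = {("s0", "a"): {"s0": 0.5, "s1": 0.5}, ("s1", "a"): {"s1": 1.0}}
rho = {("s0", "a"): 1.0, ("s1", "a"): 0.0}
S, G, s_I = {"s0", "s1"}, {"s1"}, "s0"

def apply_g(x):
    y = {s: 0.0 for s in G}
    for s in S - G:
        y[s] = max(rho[(s_, a)] + sum(p * x[t] for t, p in succ.items())
                   for (s_, a), succ in P.items() if s_ == s)
    return y

def interval_iteration_rewards(x_low, x_up, eps=1e-6):
    while max(abs(x_up[s] - x_low[s]) for s in S) >= 2 * eps:
        x_low, x_up = apply_g(x_low), apply_g(x_up)
    return 0.5 * (x_low[s_I] + x_up[s_I])

# For this model x*[s0] = 2, so 0 and 4 are valid lower/upper starting values
# for the non-goal state (supplied by hand rather than computed).
x_low = {"s0": 0.0, "s1": 0.0}
x_up = {"s0": 4.0, "s1": 0.0}
print(interval_iteration_rewards(x_low, x_up))  # eps-close to 2.0
```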