As mentioned before, we use the notion of permissive schedulers, where not all nondeterminism needs to be resolved. A permissive scheduler may select a set of actions at each state, so that several actions, or several probability distributions over actions, are left open at that state. In this sense, permissive schedulers can be seen as sets of schedulers. Here, we discuss properties and efficient representations that are needed later on. Analogously to schedulers, we consider only memoryless notions.
Definition 4
(Permissive Scheduler). A permissive scheduler of MDP \(\mathcal {M}{}=(S{},s_{ I }{}, Act ,\mathcal {P}{})\) is a function \(\theta :S\rightarrow 2^{ Distr ( Act )}\setminus \{\emptyset \}\) such that \(\forall s\in S.\,\forall \mu \in \theta (s).\, supp (\mu )\subseteq Act (s)\). The set of all permissive schedulers for \(\mathcal {M}\) is \( PSched ^\mathcal {M}\).
Intuitively, at each state there is not only one but several distributions over actions available. Deterministic permissive schedulers are functions of the form \({S\rightarrow 2^ Act }\), i.e., there are different choices of action left open. We use the following notation to connect permissive schedulers to (non-permissive) schedulers.
Definition 5
(Compliance). A scheduler \(\sigma \) for the MDP \(\mathcal {M}\) is compliant with a permissive scheduler \(\theta \), written \(\sigma \in \theta \), iff for all \(s\in S\) it holds that \(\sigma (s)\in \theta (s)\).
A permissive scheduler \(\theta _{\mathcal {S}}\) for \(\mathcal {M}\) is induced by a set of schedulers \(\mathcal {S}\subseteq Sched ^\mathcal {M}\), iff for each state \(s\in S\) and each distribution \(\mu \in \theta _{\mathcal {S}}(s)\) there is a scheduler \(\sigma \in \mathcal {S}\) with \(\sigma (s)=\mu \).
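Compliance and induced permissive schedulers can be sketched in a few lines of Python. The encoding is an assumption made for illustration only: a memoryless scheduler is a dict from state names to distributions, a distribution is a frozenset of (action, probability) pairs so that it can be stored in a set, and `dist`, `is_compliant`, and `induced_permissive` are hypothetical helper names. The function `induced_permissive` builds the largest permissive scheduler induced by the given set, i.e., it collects every distribution used by some scheduler.

```python
def dist(**probs):
    """Hypothetical helper: a hashable distribution over actions."""
    assert abs(sum(probs.values()) - 1.0) < 1e-9
    return frozenset(probs.items())


def is_compliant(sigma, theta):
    """sigma is compliant with theta iff sigma(s) is in theta(s) for every state s."""
    return all(sigma[s] in theta[s] for s in theta)


def induced_permissive(schedulers):
    """Collect, per state, every distribution used by some scheduler in the set."""
    theta = {}
    for sigma in schedulers:
        for s, mu in sigma.items():
            theta.setdefault(s, set()).add(mu)
    return theta


# Two illustrative schedulers over made-up states and actions.
first = {"s0": dist(a=1.0), "s1": dist(c=0.5, d=0.5)}
second = {"s0": dist(b=1.0), "s1": dist(d=1.0)}
theta = induced_permissive([first, second])

# A scheduler that combines choices of both is compliant with theta,
# although it is not a member of the inducing set.
mixed = {"s0": dist(a=1.0), "s1": dist(d=1.0)}
print(is_compliant(first, theta), is_compliant(mixed, theta))
```

The last line illustrates that the induced permissive scheduler may admit more schedulers than the set it was built from.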
We are interested in sets of schedulers that admit our safety specification.
Definition 6
(Safe and Maximal Permissive Scheduler). A permissive scheduler \(\theta \in PSched ^\mathcal {M}\) for the MDP \(\mathcal {M}\) is safe for a reachability property \(\varphi =\mathbb {P}_{\le \lambda }(\lozenge T)\) iff for all \(\sigma \in \theta \) it holds that \(\sigma \models \varphi \), denoted by \(\theta \models \varphi \). The permissive scheduler \(\theta \) is called maximal, if there exists no scheduler \(\sigma \in Sched ^\mathcal {M}\) with \(\sigma \not \in \theta \) and \(\sigma \models \varphi \).
A safe permissive scheduler contains only schedulers that admit the safety specification, while a maximal permissive scheduler contains all such schedulers (and possibly unsafe ones as well). Note that even for a set of safe schedulers, the induced permissive scheduler might be unsafe: conflicting choices might arise, i.e., choosing a certain action (or distribution) at one state might rule out certain memoryless choices at other states. This is illustrated by the following example.
Example 1
Consider the MDP \(\mathcal {M}\) depicted in Fig. 1, where the only nondeterministic choices occur at states \(s_0\) and \(s_1\). Assume a reachability property \(\varphi = \mathbb {P}_{\le 0.3}(\lozenge \{ s_2 \})\). This property is violated by the deterministic scheduler \(\sigma _1 = \{s_0 \mapsto a, s_1 \mapsto c\}\), as \(s_2\) is reached with probability 0.36, exceeding the threshold 0.3. This is the only unsafe scheduler; removing either action a or c from \(\mathcal {M}\) leads to a safe MDP, i.e., the remaining deterministic schedulers \(\sigma _2 = \{s_0 \mapsto b, s_1 \mapsto c\}\), \(\sigma _3 = \{s_0 \mapsto a, s_1 \mapsto d\}\), and \(\sigma _4 = \{s_0 \mapsto b, s_1 \mapsto d\}\) are all safe. However, consider the induced permissive scheduler \(\theta _{\sigma _2,\sigma _3,\sigma _4}\in PSched ^\mathcal {M}\) with \(\theta _{\sigma _2,\sigma _3,\sigma _4}(s_0) = \{a, b\}\) and \(\theta _{\sigma _2,\sigma _3,\sigma _4}(s_1) = \{c, d\}\), where in fact all nondeterministic choices are left open. Unfortunately, the unsafe scheduler \(\sigma _1\) is compliant with \(\theta _{\sigma _2,\sigma _3,\sigma _4}\), therefore \(\theta _{\sigma _2,\sigma _3,\sigma _4}\) is unsafe.
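The effect described in this example can be reproduced with a small Python sketch. The transition probabilities below are hypothetical and are not taken from Fig. 1; they are merely chosen so that the scheduler picking a and c reaches \(s_2\) with probability 0.36 while the other three schedulers stay within the threshold 0.3. The state \(s_3\), the helper names, and the recursive computation (which only works because this toy model is acyclic) are likewise assumptions.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical transition probabilities (NOT taken from Fig. 1), chosen so that
# only the scheduler picking a and c exceeds the threshold 0.3.
P = {
    ("s0", "a"): {"s1": F(3, 5), "s3": F(2, 5)},
    ("s0", "b"): {"s1": F(1, 2), "s3": F(1, 2)},
    ("s1", "c"): {"s2": F(3, 5), "s3": F(2, 5)},
    ("s1", "d"): {"s2": F(1, 2), "s3": F(1, 2)},
}
TARGET, THRESHOLD = "s2", F(3, 10)


def reach(state, sigma):
    """Probability of reaching TARGET from state under sigma (acyclic models only)."""
    if state == TARGET:
        return F(1)
    if state not in sigma:  # absorbing non-target state
        return F(0)
    return sum(p * reach(t, sigma) for t, p in P[(state, sigma[state])].items())


def is_safe(sigma):
    return reach("s0", sigma) <= THRESHOLD


safe_schedulers = [
    {"s0": "b", "s1": "c"},
    {"s0": "a", "s1": "d"},
    {"s0": "b", "s1": "d"},
]

# Induced permissive scheduler: per state, the actions used by some safe scheduler.
theta = {s: {sigma[s] for sigma in safe_schedulers} for s in ("s0", "s1")}

# Enumerate all deterministic schedulers that are compliant with theta.
states = sorted(theta)
compliant = [dict(zip(states, choice))
             for choice in product(*(sorted(theta[s]) for s in states))]
unsafe = [sigma for sigma in compliant if not is_safe(sigma)]
print(unsafe)
```

Every scheduler in `safe_schedulers` passes `is_safe`, yet `unsafe` is not empty: the permissive scheduler built state by state also admits the combination of a and c.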
Example 1 shows that in order to form a safe permissive scheduler it is not sufficient to just consider the set of safe schedulers. Rather, one needs to ensure that the very same safe scheduler is used in every state. Theoretically, this can be achieved by adding finite memory to the scheduler in order to avoid conflicting actions.
A succinct representation of the maximal permissive scheduler can be gained by enumerating all minimal sets of conflicting action choices (now only considering deterministic schedulers), and excluding them from all possible schedulers. We investigate the worst case size of such a set. Assume without loss of generality that for all \(s\in S\) the sets \( Act (s)\) are pairwise disjoint.
Definition 7
(Conflict Set). \(C \subseteq Act \) is a conflict set for MDP \(\mathcal {M}\) and property \(\varphi \) iff there exists no deterministic scheduler \(\sigma \in Sched ^{\mathcal {M}}\) such that \((\forall a\in C.\,\exists s\in S.\,\sigma (s)=a)\) and \(\sigma \models \varphi \), i.e., every deterministic scheduler that chooses all actions in \(C\) violates \(\varphi \). The set of all conflict sets for \(\mathcal {M}\) and \(\varphi \) is denoted by \( Conf ^\mathcal {M}_\varphi \). \(C\in Conf ^\mathcal {M}_\varphi \) is a minimal conflict set iff \(\forall C'\subsetneq C.\,C'\not \in Conf ^\mathcal {M}_\varphi \).
Lemma 1
The size of the set of all minimal conflict sets for \(\mathcal {M}\) and \(\varphi \) potentially grows exponentially in the number of states of \(\mathcal {M}\).
Proof Sketch. Let \(\mathcal {M}_{n} = (S, s_{ I }, Act , \mathcal {P})\) be given by \(S = \{ s_0, \ldots , s_{n}, \bot \}\), \(s_{ I }= s_0\), \(Act = \{ a_{0}, \ldots , a_{n-1}, b_{0}, \ldots , b_{n-1}, c, d \}\) and
$$\begin{aligned} \mathcal {P}(s, \alpha )(t) = {\left\{ \begin{array}{ll} 0.5 &{} \text {if } i< n, \alpha = a_{i}, s = s_{i}, t = s_{i+1} \\ 0.5 &{} \text {if } i< n, \alpha = a_{i}, s = s_{i}, t = \bot \\ 1 &{} \text {if } i < n, \alpha = b_{i}, s = s_{i}, t = s_{i+1} \\ 1 &{} \text {if } \alpha = c, s = s_{n}, t = s_{n} \\ 1 &{} \text {if } \alpha = d, s = \bot , t = \bot \\ 0 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
Figure 2 shows the instance \(\mathcal {M}_{4}\) where several copies of the \(\bot \)-states have been drawn and the d self-loops have been omitted for ease of presentation.
Consider the property \(\varphi = \mathbb {P}_{\le \lambda }(\lozenge \{s_n\})\) with \(\lambda = 0.5^{\frac{n}{2} + 1}\) for even \(n\). Choosing any combination of \(\frac{n}{2}\) of the \(b_{i}\) actions yields a minimal conflict set: \(s_n\) is then reached with probability at least \(0.5^{\frac{n}{2}} > \lambda \), whereas with only \(\frac{n}{2}-1\) of the \(b_{i}\) actions fixed, choosing \(a_{i}\) at all remaining states reaches \(s_n\) with probability exactly \(\lambda \). Hence, there are at least
$$\begin{aligned} \left( {\begin{array}{c}n\\ \frac{n}{2}\end{array}}\right) \overset{n:=2m}{=} \frac{(2m)!}{m!\cdot m!} = \underbrace{\frac{(m+1)}{1} \cdots \frac{2m}{m}}_{m \text { factors } \ge \; 2} \ge 2^m \overset{m:=\frac{n}{2}}{=} 2^{\frac{n}{2}} \in \varOmega \left( \left( \sqrt{2}\right) ^{n}\right) \end{aligned}$$
minimal conflict sets. \(\Box \)
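For small even \(n\), the count in the proof sketch can be checked by brute force. The following Python sketch rests on several assumptions: a conflict set is read as a set of actions such that every deterministic scheduler choosing all of them violates \(\varphi \); only candidate sets with at most one action per state are enumerated; and the actions c and d are ignored because every scheduler chooses them. A scheduler for \(\mathcal {M}_{n}\) is encoded as a tuple over `"a"`/`"b"`, and a candidate set as a tuple over `None`/`"a"`/`"b"`; all function names are hypothetical.

```python
from itertools import product
from math import comb


def reach_probability(scheduler):
    """Each a_i moves on with probability 0.5, each b_i with probability 1."""
    return 0.5 ** scheduler.count("a")


def is_conflict(partial, lam):
    """True iff no scheduler that extends the partial choice satisfies the bound."""
    n = len(partial)
    for scheduler in product("ab", repeat=n):
        extends = all(p is None or p == c for p, c in zip(partial, scheduler))
        if extends and reach_probability(scheduler) <= lam:
            return False
    return True


def minimal_conflict_sets(n):
    lam = 0.5 ** (n // 2 + 1)
    result = []
    for partial in product((None, "a", "b"), repeat=n):
        if not is_conflict(partial, lam):
            continue
        # Conflict sets are closed under supersets here, so it suffices to
        # check that dropping any single action destroys the conflict.
        fixed = [i for i, p in enumerate(partial) if p is not None]
        smaller = [partial[:i] + (None,) + partial[i + 1:] for i in fixed]
        if not any(is_conflict(q, lam) for q in smaller):
            result.append(partial)
    return result


for n in (2, 4):
    found = minimal_conflict_sets(n)
    print(n, len(found), comb(n, n // 2))
```

Under these assumptions the enumeration finds exactly the sets consisting of \(\frac{n}{2}\) of the \(b_{i}\) actions, which matches the binomial coefficient used in the bound.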
This strongly indicates that an exact representation of the maximal permissive scheduler is not feasible. For algorithmic purposes, we strive for a more compact representation. It seems natural to investigate the possibilities of using MDPs as representation of permissive schedulers. Therefore, analogously to induced MCs for schedulers (cf. Definition 3), we define induced MDPs for permissive schedulers. For a permissive scheduler \(\theta \in PSched ^\mathcal {M}\), we will uniquely identify the nondeterministic choices of probability distributions \(\mu \in \theta (s)\) at each state \(s\in S\) of the MDP by new actions \(a_{s,\mu }\).
Definition 8
(Induced MDP). For an MDP \(\mathcal {M}{}=(S{},s_{ I }{}, Act ,\mathcal {P}{})\) and permissive scheduler \(\theta \) for \(\mathcal {M}\), the MDP induced by \(\mathcal {M}\) and \(\theta \) is \(\mathcal {M}^\theta =(S,s_{ I }, Act ^\theta ,\mathcal {P}^\theta )\) with \( Act ^\theta =\{a_{s,\mu }\mid s\in S,\mu \in \theta (s)\}\) and:
$$\begin{aligned} \mathcal {P}^\theta (s,a_{s,\mu })(s')=\sum _{a\in Act (s)}\mu (a)\cdot \mathcal {P}(s,a)(s')\quad \text{ for } s,s'\in S \text{ and } a_{s,\mu }\in Act ^\theta ~. \end{aligned}$$
Intuitively, we nondeterministically choose between the distributions over actions induced by the permissive scheduler \(\theta \). Note that if the permissive scheduler contains only one distribution for each state, i.e., the permissive scheduler is in fact just a scheduler, the actions can be discarded, which yields an induced MC as in Definition 3, making this definition backward compatible.
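The construction of the induced transition function can be sketched as follows. This is a minimal illustration under an assumed encoding: `P` maps a (state, action) pair to a dict of successor probabilities, `theta` maps a state to a list of distributions (dicts from actions to probabilities), and the fresh action \(a_{s,\mu }\) is represented by the pair `(s, k)`, where `k` is the position of \(\mu \) in `theta[s]`. The example MDP is made up.

```python
def induced_mdp(P, theta):
    """Return the transition function of the MDP induced by theta.

    The key (s, k) stands for the new action a_{s,mu}, where mu = theta[s][k].
    """
    induced = {}
    for s, distributions in theta.items():
        for k, mu in enumerate(distributions):
            row = {}
            for a, weight in mu.items():
                # Weight the successor distribution of action a by mu(a).
                for t, p in P[(s, a)].items():
                    row[t] = row.get(t, 0.0) + weight * p
            induced[(s, k)] = row
    return induced


# Made-up MDP: at s0 one may move to s1 (action a) or to s2 (action b).
P = {
    ("s0", "a"): {"s1": 1.0},
    ("s0", "b"): {"s2": 1.0},
    ("s1", "c"): {"s1": 1.0},
    ("s2", "c"): {"s2": 1.0},
}
theta = {
    "s0": [{"a": 0.5, "b": 0.5}, {"a": 1.0}],
    "s1": [{"c": 1.0}],
    "s2": [{"c": 1.0}],
}
P_theta = induced_mdp(P, theta)
print(P_theta[("s0", 0)], P_theta[("s0", 1)])
```

In the induced MDP, state `s0` has two actions: one behaves like the uniform mixture of a and b, the other like a alone.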
Remark 2
Each deterministic scheduler \(\sigma \in Sched ^{\mathcal {M}^\theta }\) for the induced MDP \(\mathcal {M}^\theta \) induces a (randomized) scheduler for the original MDP \(\mathcal {M}\). In particular, \(\sigma \) induces a scheduler \(\sigma '\in \theta \) for \(\mathcal {M}\) which is compliant with the permissive scheduler \(\theta \): For all \(s\in S\) there exists an action \(a_{s,\mu }\in Act ^\theta \) such that \(\sigma (s)=a_{s,\mu }\). The randomized scheduler \(\sigma '\) is then given by \(\sigma '(s)=\mu \) and it holds that
$$\begin{aligned} \sum _{a \in Act (s)}\sigma '(s)(a)\cdot \mathcal {P}(s,a)(s')=\mathcal {P}^\theta (s,a_{s,\mu })(s')\ . \end{aligned}$$
Remark 3
A deterministic permissive scheduler \(\theta _{\text {det}}\in PSched ^\mathcal {M}\) for the MDP \(\mathcal {M}\) simply restricts the nondeterministic choices of the original MDP to the ones that are chosen with probability one by \(\theta _{\text {det}}\). The transition probability function \(\mathcal {P}^{\theta _{\text {det}}}\) of the induced MDP \(\mathcal {M}^{\theta _{\text {det}}}\) can be written as
$$\begin{aligned} \mathcal {P}^{\theta _{\text {det}}}(s,a_{s,\mu })(s')=\mathcal {P}(s,a)(s')\quad \text{ for } \text{ all } s,s'\in S \text{ and } a_{s,\mu }\in Act ^{\theta _{\text {det}}} \text{ with } \mu (a)=1~. \end{aligned}$$
The induced MDP \(\mathcal {M}^{\theta _{\text {det}}}\) can be seen as a sub-MDP \(\mathcal {M}^{ sub }=(S,s_{ I }, Act ,\mathcal {P}^{ sub })\) of \(\mathcal {M}\) obtained by omitting all actions that are not chosen. Hence, for all \(s,s'\in S\):
$$\begin{aligned} \mathcal {P}^{ sub }(s,a)(s')= {\left\{ \begin{array}{ll} \mathcal {P}(s,a)(s') &{} \text { if }\exists \mu \in \theta _{\text {det}}(s).\,\mu (a)=1 \\ 0 &{} \text { otherwise .} \end{array}\right. } \end{aligned}$$
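For deterministic permissive schedulers, the sub-MDP view amounts to filtering the transition function. The sketch below assumes the same hypothetical encoding as a dict keyed by (state, action) pairs and represents a deterministic permissive scheduler as a dict from states to sets of allowed actions; omitted entries play the role of the zero case in the formula above. The probabilities and the self-loop action `e` are invented for illustration.

```python
def sub_mdp(P, theta_det):
    """Keep only the transitions of actions allowed by theta_det."""
    return {
        (s, a): dict(successors)
        for (s, a), successors in P.items()
        if a in theta_det.get(s, set())
    }


# Made-up MDP with two nondeterministic states and two absorbing states.
P = {
    ("s0", "a"): {"s1": 0.6, "s3": 0.4},
    ("s0", "b"): {"s1": 0.5, "s3": 0.5},
    ("s1", "c"): {"s2": 0.6, "s3": 0.4},
    ("s1", "d"): {"s2": 0.5, "s3": 0.5},
    ("s2", "e"): {"s2": 1.0},
    ("s3", "e"): {"s3": 1.0},
}
# Both actions stay available at s0, but only d is allowed at s1.
theta_det = {"s0": {"a", "b"}, "s1": {"d"}, "s2": {"e"}, "s3": {"e"}}

P_sub = sub_mdp(P, theta_det)
print(sorted(P_sub))
```

The only entry removed is the one for action c at `s1`; all remaining transition probabilities are unchanged.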
Example 2
Recall Example 1 with \(\varphi = \mathbb {P}_{\le 0.3}(\lozenge \{ s_2 \})\). The MDP induced by the permissive scheduler \(\theta _{\sigma _2,\sigma _3,\sigma _4}\) from Example 1 is the same as \(\mathcal {M}\), as all available choices of actions are included. Note that we use the simplified notation from Remark 3. In contrast, consider the safe (but not maximal) permissive scheduler \(\theta _{\textit{safe}}\) formed by \(\{s_0 \mapsto a, s_1 \mapsto d\}\) and \(\{s_0 \mapsto b, s_1 \mapsto d\}\). The induced MDP is the sub-MDP \(\mathcal {M}^{\theta _{\textit{safe}}}\) of \(\mathcal {M}\) depicted in Fig. 3. This sub-MDP has no scheduler \(\sigma \) with \(\sigma \not \models \varphi \).