Abstract
Markov decision processes are a ubiquitous formalism for modelling systems with nondeterministic and probabilistic behavior. Verification of these models is subject to the famous state space explosion problem. We alleviate this problem by exploiting a hierarchical structure with repetitive parts. This structure not only occurs naturally in robotics, but also in probabilistic programs describing, e.g., network protocols. Such programs often repeatedly call a subroutine with similar behavior. In this paper, we focus on a local case, in which the subroutines have a limited effect on the overall system state. The key ideas to accelerate analysis of such programs are (1) to treat the behavior of the subroutine as uncertain and only remove this uncertainty by a detailed analysis if needed, and (2) to abstract similar subroutines into a parametric template, and then analyse this template. These two ideas are embedded into an abstraction-refinement loop that analyses hierarchical MDPs. A prototypical implementation shows the efficacy of the approach.
1 Introduction
Markov Decision Processes (MDPs) are the model for sequential decision making under probabilistic uncertainty, and as such are central in modelling randomized algorithms and distributed systems with lossy channels, and as the underlying formalism in reinforcement learning. A key question in the verification of MDPs is: What is the maximal probability that some error state is reached? In this question, one accounts for the probabilistic nature as well as the inherent (potentially adversarial) nondeterminism of the system. Various state-of-the-art probabilistic model checkers, such as Storm [20], Prism [27] and Modest [17], implement a variety of methods that automatically compute such maximal probabilities. Most widespread are variations of value iteration that iteratively apply a transition function to converge towards the requested probability.
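As an illustration of the value-iteration scheme mentioned above, the following Python sketch approximates maximal reachability probabilities on a tiny MDP. The dictionary encoding, the function name `max_reachability` and the three-state toy model are assumptions made for this example; they are not taken from Storm, Prism or Modest.

```python
def max_reachability(transitions, targets, eps=1e-10, max_iters=100_000):
    """Approximate the maximal probability to reach `targets` from each state.

    transitions[s][a] is a list of (successor, probability) pairs;
    a state with no actions is treated as absorbing.
    """
    values = {s: 1.0 if s in targets else 0.0 for s in transitions}
    for _ in range(max_iters):
        change = 0.0
        for s, actions in transitions.items():
            if s in targets or not actions:
                continue
            # Bellman update: best action w.r.t. the current value estimates.
            best = max(
                sum(prob * values[succ] for succ, prob in branches)
                for branches in actions.values()
            )
            change = max(change, abs(best - values[s]))
            values[s] = best
        if change < eps:  # simple stopping rule; not a sound error bound in general
            break
    return values


# Hypothetical three-state model: "risky" reaches the error state more often.
toy = {
    "init": {"safe": [("ok", 0.9), ("err", 0.1)],
             "risky": [("ok", 0.5), ("err", 0.5)]},
    "ok": {},
    "err": {},
}
worst_case = max_reachability(toy, targets={"err"})
```

In this toy model the adversarial choice at `init` is `risky`, so the maximal probability of reaching `err` is 0.5.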
Hierarchical Structure. Despite various successes, the state space explosion remains a significant challenge to the model-based analysis of MDPs. To overcome this challenge, some approaches exploit symmetries or the parallel composition of a system. Other approaches exploit that typically not all paths through a system are equally likely and thus aim to find the essential or critical subsystem. While we exploit related ideas (a detailed comparison is given in the related work, cf. Sect. 7), our approach is fundamentally different and instead exploits a hierarchical decomposition that is natural in many system models. This decomposition is captured naturally by probabilistic programs (over discrete bounded variables) with non-nested subroutines, where some subroutines are called repeatedly with similar arguments. Figure 1 shows an example on which we demonstrate our approach in Sect. 2. More generally, we are interested in systems with an overall task that is achieved by a suitable combination of a limited number of subtasks. Such a setting occurs naturally, e.g., (i) in robotics, when multiple rooms on a floor need to be inspected, or (ii) in routing, when multiple packets need to be routed sequentially. The underlying problem structure is also exploited in hierarchical planning [5, 19, 30], where the goal is to find a good but not necessarily optimal policy (and induced value). We combine insights from hierarchical planning with an abstraction-refinement perspective and then construct an anytime algorithm with strict guarantees on the result.
Local Model-Based Analysis. An adequate operational model for the model-based analysis of hierarchical systems is given by a hierarchical MDP, whose state space can be partitioned into subMDPs. Abstractly, one can represent a hierarchical MDP by the collection of subMDPs and a macro-level MDP [19] where the probabilities of outgoing transitions at a state are described by a corresponding subMDP, cf. Sect. 3.2. In this paper, we focus on hierarchical MDPs where the policies that are optimal in (only) a subMDP are optimal (partial) policies in the hierarchical MDP. More intuitively, we can solve the subMDPs individually, i.e., the solution (w.r.t. the fixed measure) for the subMDP is part of the globally optimal solution. While this assumption is restrictive, it is satisfied in various interesting settings. The assumption allows us to analyse subMDPs out of context, i.e., we can first analyse the subMDPs and then construct the correct macroMDP by extracting transition probabilities and rewards from the subMDP analysis. This approach already reduces the peak memory consumption and allows for additional speed-ups if the same subMDP occurs multiple times.
Epistemic Uncertainty During Computation. The key insight to accelerate the outlined approach further is to avoid analysing all subMDPs precisely, while still providing sound guarantees on the obtained results. To this end, observe that even before analysing the subMDPs we can analyse an uncertain variant of the macro-level MDP where we do not yet know the associated transition probabilities and rewards but instead only know intervals. We may then do two things: First, we can identify the subMDPs which are most critical, i.e., where replacing the interval by a concrete value yields the most benefit. Second, and more importantly, we can analyse a set of subMDPs and refine the associated uncertainties, i.e., tighten the associated intervals. To support the analysis of sets of subMDPs, we observe that these subMDPs are often slight variations of each other. In this paper, we represent them as parameterised instances of particular templates that we define using parametric MDPs (pMDPs). The resulting intervals can be used to create an (interval-valued version of the) macro-level MDP. Analysing this MDP gives bounds on the expected reward in the hierarchical MDP, and the bounds can be refined by analysing the subMDPs more precisely.
Contributions. In a nutshell, we explicitly allow for uncertainty during the solving process to speed up the analysis of hierarchical MDPs. Concretely, we contribute a scalable approach to solve hierarchical MDPs with many different subMDPs, in particular when these subMDPs are similar, but not the same. The approach resembles an abstraction-refinement loop where we abstract the hierarchical MDP in two layers and then refine the analysis of the lower layer to get a refined representation of the complete MDP. In every step, we can provide absolute error bounds. Our approach interprets the different subMDPs as a form of uncertainty. The efficient analysis originates from progress made in the analysis of uncertain (or parametric) MDPs, and brings that progress to a novel setting. The empirical evaluation with a prototype called levelup shows the efficacy of the approach.
2 Overview
We clarify the approach and its applicability with a motivating example that drastically abstracts a token passing process where the channel quality varies [12].
Setting. Consider the protocol in Fig. 1a which sends a token N times via a channel. That channel successfully transmits packets with probability p, where p varies over time. The subroutine takes t amount of time, depending on p. Specifically, in the model, we alternate between accumulating the required time and updating the channel quality for N token transmissions and then return the accumulated time. We aim to compute the expected return value. For the subroutine, we assume that sending a token is repeated until an acknowledgement is received, which is abstractly modelled in Fig. 1b and corresponds to the small Markov chain in Fig. 2a. First, the file must successfully be sent (\(s_0 \rightarrow s_1\)), then we start sending acknowledgements. The process terminates (\(s_1 \rightarrow s_2\)) once an acknowledgement is received. The complete protocol from Fig. 1 including the subroutine is reflected by the large Markov chain in Fig. 2b that repeats the small Markov chain (with different probabilities). This model may be analysed with standard tools, but for large N (and larger subroutines), the state space explosion must be alleviated.
MacroMDPs and Enumeration. We thus suggest abstracting the hierarchical model into the macro-level MDP in Fig. 3a. Here, every state corresponds to an invocation of the subprocess. The reward at a state corresponds to the expected reward for the complete subprocess. Thus, naively, one may construct the macroMDP, analyse all (reachable) subMDPs independently, annotate the macroMDP states with the appropriate rewards, and finally analyse the macroMDP to obtain a result of \(\approx 12.3\). This approach avoids representing the complete hMDP in memory, but it is still restricted to analysing systems with a limited number of subMDPs.
Our Approach. We improve scalability by constructing a parameterized macroMDP. Reconsider the rewards for Fig. 3a. The values can be read off the graph in Fig. 3d, where for each value of p (x-axis) we compute the corresponding expected reward \(\mathbb {E}\) (y-axis) by analysing the subMDP in Fig. 2a. Intuitively, in our abstraction, we annotate the rewards with lower and upper bounds rather than exact values. To this end, we compute bounds on the rewards by selecting an interval for the values, \(p \in [8/25, 25/32]\), as shown in Fig. 3e. Conceptually, this means that we analyse a set of subMDPs at once, namely all subMDPs with \(p \in [8/25, 25/32]\). Annotating the corresponding expected rewards, in this case \([64/25, 25/4]\), then yields the macroMDP in Fig. 3b. Analysis of this MDP yields that the overall expected time is in [7.68, 18.75]. We refine these bounds by analysing subsets of the subMDPs. We may split the values for p into two sets \([8/25, 2/5]\) and \([1/2, 25/32]\). Then, we obtain two corresponding intervals on the expected time in the subMDP as shown in Fig. 3f. Model checking the associated macroMDP in Fig. 3c bounds the expected time by [10.12, 14.25]. Technically, we realize this reasoning using parameter lifting [33].
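The interval arithmetic behind these numbers can be replayed with a few lines of Python. The sketch rests on assumptions that are inferred from the quoted values rather than stated in the text: the expected time of one subroutine call is taken to be \(2/p\) (two geometric phases with one time unit per attempt), the macroMDP is read as a sequence of three invocations, and after the split one invocation is assumed to fall into \([8/25, 2/5]\) and two into \([1/2, 25/32]\). All function and variable names are hypothetical.

```python
from fractions import Fraction


def expected_time(p):
    # Assumed closed form: sending and acknowledging each need 1/p attempts on average.
    return 2 / p


def reward_interval(p_low, p_high):
    # 2/p is decreasing in p, so the extreme rewards sit at the interval ends.
    return expected_time(p_high), expected_time(p_low)


coarse = reward_interval(Fraction(8, 25), Fraction(25, 32))
low_quality = reward_interval(Fraction(8, 25), Fraction(2, 5))
high_quality = reward_interval(Fraction(1, 2), Fraction(25, 32))

# Assumption: three sequential invocations, so the total is a sum of rewards.
coarse_total = (3 * coarse[0], 3 * coarse[1])
# Assumption: one invocation in the low-quality region, two in the high-quality one.
refined_total = (
    low_quality[0] + 2 * high_quality[0],
    low_quality[1] + 2 * high_quality[1],
)

print([float(x) for x in coarse_total])   # bounds before splitting the region
print([float(x) for x in refined_total])  # bounds after splitting the region
```

Under these assumptions the sketch reproduces the reward interval \([64/25, 25/4]\) and the overall bounds [7.68, 18.75] and [10.12, 14.25] quoted above. The actual tool obtains such bounds via parameter lifting and not via a closed form.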
Supported Extensions. For conciseness, this example is necessarily simple. Our approach allows nondeterminism, i.e., action choices, in the macroMDP and in the subMDPs. The subMDPs may have multiple outgoing transitions, but this must be combined with a restricted type of nondeterminism in the subMDP: If multiple outgoing transitions are present, the macroMDP has transition probabilities that depend on the subMDPs. We present a useful extension for reachability probabilities, see the discussion at the bottom of Sect. 3.3.
More Examples. The key ingredient of models where the approach excels is a repetitive task whose characteristics depend on some global state. Two variations are the expected energy consumption of a robot with slowly degrading components that can, e.g., be improved by maintenance, and job scheduling with a periodically changing distribution of tasks (e.g., day vs. night).
3 Formal Problem Statement
We formalize MDPs and hierarchical MDPs (hMDPs) to pose the problem statement, then identify a subclass of hMDPs which we call local-policy hMDPs and restrict our problem to computing optimal expected rewards in local-policy hMDPs. Furthermore, we introduce parametric MDPs as they are key to the abstraction-refinement procedure later in the paper.
3.1 Background
Definition 1 (Parametric MDP)
A parametric MDP (pMDP) is a tuple \(\mathcal {M} = \langle S_\mathcal {M}, A_\mathcal {M}, \iota _\mathcal {M}, \vec {x}, P_\mathcal {M}, r_\mathcal {M}, T_\mathcal {M}\rangle \) where \(S_\mathcal {M}\) is a finite set of states, \(A_\mathcal {M}\) is a finite set of actions, \(\iota _\mathcal {M} \in S_\mathcal {M}\) is the initial state, \(\vec {x} = \langle x_0, \ldots , x_n \rangle \) is a vector of parameters, \(P_\mathcal {M}:S_\mathcal {M} \times A_\mathcal {M} \times S_\mathcal {M} \rightarrow \mathbb {Q}[\vec {x}]\) are the transition probabilities, \(r_\mathcal {M}:S_\mathcal {M} \rightarrow \mathbb {Q}[\vec {x}]\) are the state rewards, and \(T_\mathcal {M} \subseteq S_\mathcal {M}\) is a set of target states.
We drop the subscripts whenever possible. MDPs are parametric if \(\vec {x} \ne \langle \rangle \) and parameter-free otherwise. We omit parameters for parameter-free MDPs. We recap some standard notions on pMDPs (and MDPs):
For a (parameter) valuation \(u \in \mathbb {R}^{\vec {x}}\), the instantiation \(\mathcal {M}[u]\) globally substitutes \(P_\mathcal {M}(s,a,s')\) with \(P_\mathcal {M}(s,a,s')(u)\) and \(r_\mathcal {M}(s)\) with \(r_\mathcal {M}(s)(u)\). A valuation u is well-defined if \(\mathcal {M}[u]\) constitutes an MDP, i.e., if \(\sum _{s'} P_\mathcal {M}(s,\alpha ,s')(u) \in \{0, 1\}\) and \(r_\mathcal {M}(s)(u) \ge 0\) for each \(s \in S\), \(\alpha \in A\). We denote the set of all well-defined valuations with \(U_\mathcal {M}\). The set \(\textsf {Act}(s)\) denotes the enabled actions at state s, \(\textsf {Act}(s) = \{ \alpha \mid \sum _{s'} P_\mathcal {M}(s,\alpha ,s') \ne 0 \}\). If \(|\textsf {Act}(s)| = 1\) for every \(s \in S\), then the (parametric) MDP is a (parametric) Markov chain (MC). A path \(\pi \) is an (in)finite sequence of states \(s_0 \mathop {\rightarrow }\limits ^{\alpha _0} s_1\ldots \), with \(s_i \in S\), \(\alpha _i \in \textsf {Act}(s_i)\), \(P(s_i, \alpha _i,s_{i+1}) \ne 0\). For finite \(\pi \), \(\textsf {last}(\pi )\) denotes the last state of \(\pi \). We use \([s \rightarrow \lozenge T]\) to denote the set of finite paths that start in s and visit T only at the end. The reward \(r(\pi )\) along a finite path \(\pi \) is the sum of the state rewards, \(r(\pi ) :=\sum r(s_i)\).
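The notions of instantiation and well-definedness can be made concrete with a small Python sketch. It assumes a hypothetical encoding in which every polynomial is a Python callable over a valuation dictionary, and it uses a two-phase chain with a single parameter p as the example. The extra check that individual probabilities are non-negative is an assumption implied by "constitutes an MDP" rather than spelled out in the definition.

```python
from fractions import Fraction


def instantiate(parametric, valuation):
    """Evaluate every parametric entry (a callable) at the given valuation."""
    return {key: poly(valuation) for key, poly in parametric.items()}


def is_well_defined(states, actions, P, r, valuation):
    probs = instantiate(P, valuation)
    rewards = instantiate(r, valuation)
    for s in states:
        if rewards[s] < 0:
            return False
        for a in actions:
            row = [probs.get((s, a, t), 0) for t in states]
            # Each row must be a distribution (sum 1) or disabled (sum 0).
            if any(q < 0 for q in row) or sum(row) not in (0, 1):
                return False
    return True


# Hypothetical pMDP: retry with probability 1 - p in s0 and in s1, stop in s2.
states, actions = ["s0", "s1", "s2"], ["a"]
P = {
    ("s0", "a", "s1"): lambda u: u["p"],
    ("s0", "a", "s0"): lambda u: 1 - u["p"],
    ("s1", "a", "s2"): lambda u: u["p"],
    ("s1", "a", "s1"): lambda u: 1 - u["p"],
}
r = {"s0": lambda u: 1, "s1": lambda u: 1, "s2": lambda u: 0}

ok = is_well_defined(states, actions, P, r, {"p": Fraction(1, 2)})
bad = is_well_defined(states, actions, P, r, {"p": Fraction(3, 2)})
```

Exact rationals avoid rounding issues in the row-sum test; with floating-point numbers a tolerance would be needed.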
Specifications. We consider indefinite-horizon expected rewards, i.e., the expected accumulated reward until reaching the target states. We refer to [3, 32] for a formal treatment and only introduce notation. The unique probability measure \(Pr\) for sets of paths in a parameter-free Markov chain \(\mathcal {M}\) reaching states in T can be defined using the usual cylinder set construction. We define \(Pr_\mathcal {M}(s \rightarrow \lozenge T)\) as the probability to reach a state in T, \(\int _{\pi \in [s \rightarrow \lozenge T]} Pr(\pi ) d\pi \). We then define the expected reward until hitting T, \(\textsf {ER}_{\mathcal {M}}(s \rightarrow \lozenge T) = \int _{\pi \in [s \rightarrow \lozenge T]} Pr(\pi ) \cdot r(\pi ) d\pi \). In both definitions, if s is the initial state, we simply write \(\ldots (\lozenge T)\). For technical conciseness, we make the standard assumption that target states are reached with probability 1, which ensures that the integral exists and is finite. (Arbitrary) reachability probabilities can nevertheless be modelled using rewards.
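For a finite parameter-free Markov chain in which the targets are reached with probability 1, the expected reward can be obtained by solving the linear equation system \(x_s = r(s) + \sum _{s'} P(s,s') \cdot x_{s'}\) for non-target states, with \(x_s = 0\) for target states. The following Python sketch does so with exact rational arithmetic. The encoding, the helper `token_chain` and the convention that no reward is collected in the target state are assumptions for illustration.

```python
from fractions import Fraction


def expected_reward(P, r, targets):
    """Solve x_s = r(s) + sum_t P[s][t] * x_t on non-target states (x = 0 on targets)."""
    unknowns = [s for s in P if s not in targets]
    index = {s: k for k, s in enumerate(unknowns)}
    n = len(unknowns)
    # Augmented matrix of (I - P) x = r, restricted to the non-target states.
    A = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for s in unknowns:
        i = index[s]
        A[i][i] += 1
        A[i][n] = Fraction(r[s])
        for t, prob in P[s].items():
            if t in index:
                A[i][index[t]] -= Fraction(prob)
    # Gauss-Jordan elimination; a pivot exists if targets are reached almost surely.
    for col in range(n):
        pivot = next(row for row in range(col, n) if A[row][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for row in range(n):
            if row != col and A[row][col] != 0:
                factor = A[row][col]
                A[row] = [v - factor * w for v, w in zip(A[row], A[col])]
    return {s: A[index[s]][n] for s in unknowns}


def token_chain(p):
    # Hypothetical two-phase chain: each phase succeeds with probability p.
    P = {"s0": {"s0": 1 - p, "s1": p}, "s1": {"s1": 1 - p, "s2": p}, "s2": {}}
    r = {"s0": 1, "s1": 1, "s2": 0}
    return P, r, {"s2"}


value = expected_reward(*token_chain(Fraction(1, 2)))["s0"]
```

For this chain the solution in the initial state equals \(2/p\), so `value` is 4 for \(p = 1/2\).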
Policies. In pMDPs, we resolve nondeterminism with policies. In this paper, it suffices to consider memoryless policies \(\sigma :S \rightarrow A\). The set of such policies is denoted \(\varSigma (\mathcal {M})\). We omit \(\mathcal {M}\) if it is clear from the context. It is helpful to also consider partial policies \(\hat{\sigma }:S \nrightarrow A\). For a pMDP \(\mathcal {M}\) and a (partial) policy \(\hat{\sigma }\), the induced dynamics are described by the induced pMDP \(\mathcal {M}[\hat{\sigma }]\), defined as \(\langle S_\mathcal {M}, A_\mathcal {M}, \iota _\mathcal {M}, \vec {x}, P, r_\mathcal {M}, T_\mathcal {M}\rangle \), where the transition probabilities are given as
\[P(s,\alpha ,s') = {\left\{ \begin{array}{ll} P_\mathcal {M}(s,\alpha ,s') &amp; \text {if } \hat{\sigma }(s) \text { is undefined or } \hat{\sigma }(s) = \alpha , \\ 0 &amp; \text {otherwise.} \end{array}\right. }\]
If \(\sigma \) is total (not partial), then \(\mathcal {M}[\sigma ]\) is an MC. We define the maximal expected reward \(\textsf {ER}_{\mathcal {M}}^\text {max}(\lozenge T) = \max _{\sigma \in \varSigma } \textsf {ER}_{\mathcal {M}[\sigma ]}(\lozenge T)\), and say that a policy \(\sigma \) is optimal if \(\textsf {ER}_{\mathcal {M}}^\text {max}(\lozenge T) = \textsf {ER}_{\mathcal {M}[\sigma ]}(\lozenge T)\).
Regions and Parametric Model Checking. A set of valuations R is called a (rectangular) region if \(R = \{ u \mid u^{-}\le u \le u^{+}\}\) for adequate bounds \(u^{-}, u^{+}\in \mathbb {R}^{\vec {x}}\) and using pointwise inequalities, i.e., R is a Cartesian product of intervals of parameter values. We denote this region also with \([[u^{-},u^{+}]]\). For regions, we may compute a lower bound on \(\min _{u \in R} \textsf {ER}_{\mathcal {M}[u]}^\textsf {max}(\lozenge T)\) and an upper bound on \(\max _{u \in R} \textsf {ER}_{\mathcal {M}[u]}^\textsf {max}(\lozenge T)\) via parameter lifting [33, 36].
3.2 Hierarchical MDPs
We concentrate on solving hierarchical MDPs (hMDPs). We assume that hMDPs are parameter-free and that their topology has some additional known structure.
Definition 2 (Hierarchical MDPs)
An MDP \(\mathcal {M}\) with a partitioning of its states \(S_\mathcal {M}= \bigcup \mathbf {S}_i\) is a hierarchical MDP if for all i,

there exists a unique \(s^i_\iota \in \mathbf {S}_i\) such that \(s^i_\iota = \iota _\mathcal {M}\text { or } \textsf {pred}_\mathcal {M}(s^i_\iota ) \not \subseteq \mathbf {S}_i\), and

\(\text {for all } s \in \mathbf {S}_i \setminus \{ s^i_\iota \}\), it holds that \(s \ne \iota _\mathcal {M}\text { and } \textsf {pred}_\mathcal {M}(s) \subseteq \mathbf {S}_i.\)
The state \(s^i_\iota \) is called the entry state, which we denote \(\textsf {entry}_i\). States with \(\textsf {succ}_{\mathcal {M}}(s) \cap \mathbf {S}_i = \emptyset \) are called exit states. The set \(\textsf {succ}(i) :=\textsf {succ}_{\mathcal {M}}(\mathbf {S}_i) \setminus \mathbf {S}_i\) contains the successor states of the partition i. Let \(Y = \max _i |\textsf {succ}(i)|\). By adding auxiliary states, we can assume that \(|\textsf {succ}(i)| = Y\) for all i. We call partitions with \(|\mathbf {S}_i| = 1\) trivial. We use \(\mathbb {I}:=\{ i \mid |\mathbf {S}_i| > 1 \}\) to denote the indices of the nontrivial partitions. We remark that every MDP can be considered as an hMDP with only trivial partitions.
The naive solution to this problem is to ignore the hierarchical structure and solve the MDP monolithically. In this paper, we contribute methods that actively exploit the structure of hierarchical MDPs with \(|\mathbb {I}| \gg 1\). We will make an additional assumption on the structure of the hierarchical MDP.
3.3 Optimal Local Subpolicies and Beyond
Intuitively, we want to ensure that the optimal policy within the partitions can be computed locally, i.e., on the partition alone without taking into account the complete MDP. To this end, each partition within the MDP can be considered as an individual MDP. In particular, each \(\mathbf {S}_i\) induces a subMDP as follows:
Definition 3 (subMDP)
Given a hierarchical MDP \(\mathcal {M}\) and partition \({\textbf {S}}_i\), the corresponding subMDP is an MDP \(\mathcal {M}_i :=\langle S_i :={\textbf {S}}_i \cup \textsf {succ}_{\mathcal {M}}({\textbf {S}}_i) \cup \{ \bot \}, A_\mathcal {M}\cup \{ \alpha _\bot \}, \iota :=\textsf {entry}_{i}, P_i, r_i, G_i \rangle \) with \(P_i\) defined by
\(r_i\) is defined as \(r_i(s) = r_\mathcal {M}(s)\) if \(s\in {\textbf {S}}_i\), \(r_i(s) = 0\) otherwise, and \(G_i :=\{ \bot _i \}\).
Thus, for every partition of the hierarchical MDP, the corresponding subMDP contains additionally the successor states, and a unique bottom state that is a target state and simplifies our construction later.
Likewise, we can (de)compose memoryless policies for the hierarchical MDP as a union of policies on the individual subMDPs. We do this only for nontrivial partitions. Let \(\sigma _i :S_i \rightarrow A\) denote memoryless policies for \(\mathcal {M}_i\) and \(\sigma '_i\) the restriction of \(\sigma _i\) to \({\textbf {S}}_i\), then \(\left( \bigsqcup _{\mathbb {I}} \sigma _{i } \right) :S \nrightarrow A\) is the unique partial policy, defined exactly on the nontrivial partitions, such that
\[\left( \bigsqcup _{\mathbb {I}} \sigma _{i } \right) (s) = \sigma '_i(s) \quad \text {for all } i \in \mathbb {I} \text { and } s \in {\textbf {S}}_i.\]
Intuitively, we want that the union of locally optimal policies, a partial policy, can be completed to a total policy that is optimal.
Definition 4 (Optimal local subpolicies)
Let \(\mathcal {M}\) be a hierarchical MDP with target states T, and let \(\sigma _i \in \varSigma (\mathcal {M}_i)\) be optimal policies for all \(i \in \mathbb {I}\). The hierarchical MDP has optimal local subpolicies if for \(\hat{\sigma }= \bigsqcup _\mathbb {I}\sigma _i\) it holds that \(\textsf {ER}_{\mathcal {M}[\hat{\sigma }]}^\textsf {max}(\lozenge T) = \textsf {ER}_{\mathcal {M}}^\textsf {max}(\lozenge T)\).
That is, if we collect (locally) optimal policies \(\sigma _i\) and apply them to \(\mathcal {M}\), we obtain the MDP \(\mathcal {M}[\left( \bigsqcup _{\mathbb {I}} \sigma _{i } \right) ]\). In that MDP, we can pick an optimal policy, and together with \(\left( \bigsqcup _{\mathbb {I}} \sigma _{i } \right) \) this constitutes an optimal and total policy for \(\mathcal {M}\).
Roughly, the idea now becomes that rather than solving one large MDP with \(|S|\) states, we solve \(|\mathbb {I}|\) MDPs with \(|S|/|\mathbb {I}|\) states and one MDP with \(|\mathbb {I}|\) states (assuming equally sized and only nontrivial partitions).
The assumption is restrictive, but not unreasonable: A subroutine may not have any nondeterminism, or a finished task will have no influence on any future task. The following proposition, while obvious, formalizes that:
Proposition 1 (Sufficient criterion)
Let \(\mathcal {M}\) be a hierarchical MDP. The MDP has optimal local subpolicies, if for each \(i \in \mathbb {I}\) either

there is a single successor for the partition, i.e., \(|\textsf {succ}_{\mathcal {M}}({\textbf {S}}_i) \setminus {\textbf {S}}_i|=1\), or

there are no choices, i.e., \(|\textsf {Act}(s)| = 1\) for all \(s \in {\textbf {S}}_i\).
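The sufficient criterion is purely syntactic and can be checked without solving any subMDP. The Python sketch below assumes a parameter-free MDP with non-negative probabilities, encoded as a dictionary from (state, action, successor) triples to probabilities; the encoding and the two example models are hypothetical.

```python
def enabled_actions(P, state):
    return {a for (s, a, _t), prob in P.items() if s == state and prob != 0}


def outside_successors(P, block):
    reached = {t for (s, _a, t), prob in P.items() if s in block and prob != 0}
    return reached - block


def satisfies_sufficient_criterion(P, nontrivial_blocks):
    """Check, per block, for a single outside successor or the absence of choices."""
    return all(
        len(outside_successors(P, block)) == 1
        or all(len(enabled_actions(P, s)) == 1 for s in block)
        for block in nontrivial_blocks
    )


block = {"b0", "b1"}

# Two actions in b0, but the block can only be left towards c.
single_exit = {
    ("b0", "left", "b1"): 1.0,
    ("b0", "right", "b1"): 1.0,
    ("b1", "done", "c"): 1.0,
}
# Two actions in b0 and two possible successors c and d.
two_exits = {
    ("b0", "left", "b1"): 1.0,
    ("b0", "right", "b1"): 1.0,
    ("b1", "done", "c"): 0.5,
    ("b1", "done", "d"): 0.5,
}

first = satisfies_sufficient_criterion(single_exit, [block])
second = satisfies_sufficient_criterion(two_exits, [block])
```

The second model fails the check; since the criterion is only sufficient, this does not by itself rule out optimal local subpolicies.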
Beyond Optimal Local Subpolicies. The efficiency of our approach is partly due to the assumption in Definition 4. We observe that adapting this definition allows for a spectrum of specific yet useful cases. In particular, say that our system describes a protocol in which we must optimize the probability to satisfy N tasks that all may fail; the subMDPs will then have two successor states. Often, it is then easy to see (and model) that a locally optimal policy will aim to satisfy each task and that thus, the locally optimal policy optimizes the probability to reach the corresponding successor state. Then, by adapting the target states in Definition 3 to be the successor state where the task is successful, the notion of an optimal policy, and thus of an optimal local subpolicy, changes. These changes are minimal and everything that follows below is easily adapted to this setting, as demonstrated by the prototypical implementation.
4 Solving hMDPs with Abstraction-Refinement
In this section, we consider hMDPs with optimal local subpolicies. We stepwise develop a sketch of an anytime algorithm that provides lower and upper bounds on the expected reward in such an hMDP. In Sect. 4.1, we introduce an alternative representation of our problem that formalizes the idea of individually computing subMDPs. We then formalize the ideas that allow us to construct an anytime algorithm in Sect. 4.2. In Sect. 4.3, we integrate the abstract requirements for analysing sets of subMDPs into the algorithm, and finally, in Sect. 4.4 we introduce a method that realises this using pMDPs.
4.1 The MacroMDP Formulation
We adapt macroMDPs [5] which summarize the subMDPs by single states.
Definition 5 (MacroMDP)
Let \(\mathcal {M}\) be an hMDP with n nontrivial partitions \({\textbf {S}}_i\) and \(S_\mathcal {M}\) partitioned as \(S_\mathcal {M}= \bigcup {\textbf {S}}_i \cup S'\). The macroMDP is defined as \(\mu (\mathcal {M}) :=\langle S' \cup \{ \textsf {entry}_i \mid 1 \le i \le n \}, A_\mathcal {M}, \iota _\mathcal {M}, \emptyset , P, r, T_\mathcal {M}\rangle \) with P and r given by
where \(\mathcal {M}_i\) is the corresponding subMDP (see Definition 3) and \(\sigma _i\) is an arbitrary but fixed optimal policy, i.e., a policy such that \(\textsf {ER}_{\mathcal {M}_i[\sigma _i]}(\lozenge G_i) = \textsf {ER}_{\mathcal {M}_i}^\textsf {max}(\lozenge G_i)\).
Intuitively, we replace the transitions within \({\textbf {S}}_i\) by a ‘big-step semantics’ that aggregates the transitions within \({\textbf {S}}_i\) into single transitions such that the probability to reach any successor matches the probability to do so within \({\textbf {S}}_i\) under a specific (optimal) policy. Likewise, the expected reward matches the expected reward collected in \({\textbf {S}}_i\).
Remark 1
To define a unique macroMDP, we can take the lexicographically smallest policy \(\sigma _i\) among the optimal policies. Furthermore, we observe that for the cases covered by Proposition 1, it is not necessary to compute \(\sigma _i\) at all: Either there is a single successor, implying \(\textsf {Pr}_{\mathcal {M}_i[\sigma _i]}(\lozenge \{s'\}) = 1\) for any \(\sigma _i\), or \(|\varSigma (\mathcal {M}_i)|=1\).
The following theorem formalises that, given the assumptions, taking the big-step semantics is adequate when optimizing for an expected reward.
Theorem 1
Let \(\mathcal {M}\) be an hMDP with optimal local subpolicies and let \(\mu (\mathcal {M})\) be the corresponding macroMDP. Then: \(\textsf {ER}_{\mu (\mathcal {M})}^\textsf {max}(\lozenge T) = \textsf {ER}_{\mathcal {M}}^\textsf {max}(\lozenge T)\).
The important ingredient is the optimal local subpolicies, which ensure that we aggregate behavior within the partitions by behavior that agrees with a (globally) optimal policy. We give a proof in the appendix.
Naive Algorithm. Algorithmically, we first compute \(\textsf {ER}_{\mathcal {M}_i}^\textsf {max}(\lozenge G_i)\) and the associated policy \(\sigma _i\), then compute the reachability probabilities on the induced Markov chain. We collect these results in a vector \(\textsf {res}_i\), which is helpful to construct the macroMDP. To clarify further constructions in this paper, we make \(\textsf {res}_i\) explicit. Recall that \(|\textsf {succ}_{\mathcal {M}}({\textbf {S}}_i)| = Y\) for all i.
Definition 6 (Results for subMDP)
Let \(\mathcal {M}_i\) be the subMDP for the partition \({\textbf {S}}_i\) of an hMDP \(\mathcal {M}\). Let \(\textsf {succ}_{\mathcal {M}}({\textbf {S}}_i)\) be ordered. We define \(\textsf {res}_i \in \mathbb {R}^{Y+1}\) s.t.
where \(\sigma _i\) is an arbitrary but fixed policy such that \(\textsf {ER}_{\mathcal {M}_i[\sigma _i]}(\lozenge G_i) = \textsf {ER}_{\mathcal {M}_i}^\textsf {max}(\lozenge G_i)\).
This allows us to reformulate the macroMDP, in particular, the following two identities do hold:
The identities make explicit that the macroMDP can be constructed by precomputing the necessary result vectors.
This rather naive algorithm already limits memory consumption and may exploit similarities between subMDPs during the analysis, e.g., based on the structure discussed in Sect. 4.4. It performs well if the number \(|\mathbb {I}|\) of subMDPs is sufficiently small. We are interested in methods that allow for larger \(|\mathbb {I}|\) or larger subMDPs. In particular, we want to avoid analysing every subMDP individually.
4.2 The Uncertain MacroMDP Formulation
Uncertainty Before Computation. We start by introducing a method that provides bounds on the expected rewards after individually analysing only a subset of the subMDPs. Before computing the individual probabilities in \(\mathcal {M}_i\), we are uncertain about the probabilities and rewards in the MDP \(\mu (\mathcal {M})\). Under this uncertainty, we may not be able to compute \(\textsf {ER}_{\mu (\mathcal {M})}^\textsf {max}(\lozenge T)\) precisely. However, we may solve the problem statement by bounding the expected reward. Thus, the goal is to compute values \(\textsf {lb}, \textsf {ub}\) s.t.
\[\textsf {lb} \le \textsf {ER}_{\mathcal {M}}^\textsf {max}(\lozenge T) \le \textsf {ub}. \qquad (2)\]
Uncertain MacroMDPs. We capture the a priori uncertainty about the subMDP results in an uncertain macroMDP, a particularly shaped parametric MDP.
Definition 7 (Uncertain macroMDP)
Let \(\mathcal {M}\) be an hMDP with n nontrivial partitions \({\textbf {S}}_i\) and \(S_\mathcal {M}\) partitioned as \(S_\mathcal {M}= \bigcup {\textbf {S}}_i \cup S'\). The uncertain macroMDP is defined as \(\nu (\mathcal {M}) :=\langle S' \cup \{ \textsf {entry}_i \mid 1 \le i \le n \}, A_\mathcal {M}, \iota _\mathcal {M}, \vec {x}, P, r, T_\mathcal {M}\rangle \) with parameters \(\vec {x} :=\{ p_{i,j}, q_i \mid 1 \le i \le n, 1 \le j \le Y \}\) where \(Y = |\textsf {succ}_{\mathcal {M}}({\textbf {S}}_i)|\), and with P and r given by
Remark 2
Whenever \(\mathcal {M}_i\) and \(\mathcal {M}_{i'}\) are isomorphic, we may reduce the parameters and replace each occurrence of \(p_{i',j}\) with \(p_{i,j}\) and each occurrence of \(q_{i'}\) with \(q_i\).
The uncertain macroMDP can be instantiated to coincide with the macroMDP by setting the parameters accordingly.
Theorem 2
Let \(\mathcal {M}\) be an hMDP, \(\mu (\mathcal {M})\) the associated unique macroMDP, and \(\nu (\mathcal {M})\) the associated uncertain macroMDP with parameters \(p_{i,j}\) and \(q_i\). Let \(u^*\) be a parameter valuation with \(u^*(p_{i,j})= \textsf {res}_i(j)\) and \(u^*(q_i)= \textsf {res}_i(Y)\) for all i, j. Then:
\[\textsf {ER}_{\nu (\mathcal {M})[u^*]}^\textsf {max}(\lozenge T) = \textsf {ER}_{\mu (\mathcal {M})}^\textsf {max}(\lozenge T).\]
Proof sketch. The constructions of the uncertain macroMDP and the macroMDP differ only in the assignment of probabilities and rewards. We set \(u^*\) here as in the characterisation in (1) and thus the equality follows. \(\square \)
Computing Bounds. Assume for now that we can derive some (trivial) sound bounds on the results vector for any subMDP \(\mathcal {M}_i\).
Definition 8 (Sound bounds on results)
For \(\mathcal {M}_i\), the vectors \(\textsf {lbres}_i\) and \(\textsf {ubres}_i\) are sound bounds if the following pointwise inequality holds:
\[\textsf {lbres}_i \le \textsf {res}_i \le \textsf {ubres}_i.\]
These bounds on properties in the subMDP correspond to bounds on the parameters of the uncertain macro-level MDP \(\nu (\mathcal {M})\). Let us formalize this idea.
Definition 9 (Suitable parameter region)
Let \(u^*\) be as in Theorem 2. The bounds \(u^{-}, u^{+}\) are suitable if \(u^{-}\le u^* \le u^{+}\). For suitable \(u^{-}, u^{+}\), the region \([[u^{-}, u^{+}]]\) is called suitable.
Using this notion, sound bounds \(\textsf {lbres}_i\) and \(\textsf {ubres}_i\) thus yield suitable bounds \(u^{-}(x), u^{+}(x)\) for all \(x \in \bigcup _j \{ p_{i,j} \} \cup \{ q_i \}\). Combined, the sound bounds for every i yield a suitable region. Formally:
Lemma 1
Given sound bounds \(\textsf {lbres}_i, \textsf {ubres}_i\) for each i, there exists a trivial mapping \(\textsf {Reg}\) s.t. \(\textsf {Reg}(\textsf {lbres}_1, \ldots \textsf {lbres}_n, \textsf {ubres}_1, \ldots \textsf {ubres}_n)\) is a suitable region.
With the suitable region we can apply verification on the parametric MDP.
Lemma 2
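Let R be a suitable region. Then:
\[\min _{u \in R} \textsf {ER}_{\nu (\mathcal {M})[u]}^\textsf {max}(\lozenge T) \le \textsf {ER}_{\mathcal {M}}^\textsf {max}(\lozenge T) \le \max _{u \in R} \textsf {ER}_{\nu (\mathcal {M})[u]}^\textsf {max}(\lozenge T).\]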
Proof sketch. We observe that the inequalities follow from the fact that \(u^* \in R\) with \(u^*\) as in Theorem 2. By that theorem, \(\textsf {ER}_{\nu (\mathcal {M})[u^*]}^\textsf {max}(\lozenge T) = \textsf {ER}_{\mu (\mathcal {M})}^\textsf {max}(\lozenge T)\). The statement then follows from Theorem 1. \(\square \)
From the bounds that we can compute using a suitable region, we then set \(\textsf {lb}\) and \(\textsf {ub}\) for Eq. (2):
Computationally, we may use parameter lifting [33] to find these values.
Refinement Loop. The complete anytime algorithm is summarized in Fig. 4. We start with an hMDP \(\mathcal {M}\) and extract the uncertain macroMDP \(\nu (\mathcal {M})\) and the subMDPs \(\{\mathcal {M}_i\}\). Furthermore, we compute (trivial) sound bounds \(\textsf {lbres}_i \le \textsf {res}_i \le \textsf {ubres}_i\). This leads to a suitable region \([[u^{-}, u^{+}]] = \textsf {Reg}(\textsf {lbres}_1, \textsf {ubres}_1, \ldots )\). Then, we may at any time compute the bounds \(\textsf {lb}, \textsf {ub}\) on the expected reward in the hMDP \(\mathcal {M}\) by analysing \(\nu (\mathcal {M})\) on the region \([[u^{-}, u^{+}]]\). To tighten these bounds, we must first refine the suitable region. To this end, we analyse individual subMDPs \(\mathcal {M}_i\) and compute \(\textsf {res}_i\) and thus \(u^*(x)\) for \(x \in \bigcup _j \{ p_{i,j} \} \cup \{ q_i \}\). This refines the suitable bounds such that \(u^{-}(x) = u^*(x) = u^{+}(x)\) for \(x \in \bigcup _j \{ p_{i,j} \} \cup \{ q_i \}\). We call this refinement individual refinement. The new region is suitable and Theorem 2 ensures correctness of the refinement. As we only have finitely many subMDPs, we obtain \(\textsf {lb}= \textsf {ub}\) after finitely many steps.
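The shape of this loop can be illustrated on a toy instance in Python. The sketch makes strong simplifying assumptions that are not part of the algorithm itself: the macroMDP is a plain sequence of subroutine calls without choices, so that bounds on the total are sums of bounds on the rewards (a stand-in for parameter lifting); each subMDP is the two-phase chain with expected reward \(2/p\); and the three channel qualities are hypothetical values, not the ones behind Fig. 3a.

```python
from fractions import Fraction


def analyse_subroutine(p):
    # Stand-in for an exact analysis of one subMDP (assumed closed form 2/p).
    return 2 / p


def macro_bounds(intervals):
    # Stand-in for parameter lifting on a sequential macroMDP without choices:
    # the expected total is the sum of the per-invocation rewards.
    return sum(lo for lo, _ in intervals), sum(hi for _, hi in intervals)


def refinement_loop(channel_qualities, initial_interval):
    """Yield-style history of (lb, ub) after each individual refinement."""
    intervals = [initial_interval] * len(channel_qualities)
    history = [macro_bounds(intervals)]
    # Naive order; a heuristic could instead pick the most critical subMDP first.
    for i, p in enumerate(channel_qualities):
        exact = analyse_subroutine(p)
        intervals[i] = (exact, exact)  # lower and upper bound now coincide
        history.append(macro_bounds(intervals))
    return history


qualities = [Fraction(8, 25), Fraction(1, 2), Fraction(25, 32)]  # hypothetical
history = refinement_loop(qualities, (Fraction(64, 25), Fraction(25, 4)))
for lb, ub in history:
    print(float(lb), float(ub))
```

Every entry of `history` is a valid pair of bounds, which is what makes the procedure an anytime algorithm: it can be stopped after any iteration, and the bounds coincide once all subMDPs have been analysed.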
4.3 Set-Based SubMDP Analysis
Next, we aim to provide an alternative refinement procedure that analyses a set of subMDPs at once, i.e., that refines the suitable bounds for a set of parameters at once. We denote the set of goal states for all subMDPs as G^{Footnote 5}.
Adequate Abstractions. We aim to compute sound bounds on the results for a set of subMDPs such that the bounds are sound for every individual subMDP in this set. We generalize Definition 8 as follows: The (lower and upper) bounds \(\textsf {lbres}_{I}, \textsf {ubres}_{I}\) are sound, if they are sound (lower and upper) bounds for every \(\textsf {res}_i\), \(i \in I\).
Lemma 3
Let \(\textsf {lbres}_I\) satisfy the following inequations using \(0 \le j < Y\):
Then, \(\textsf {lbres}_I\) is a sound lower bound.
Proof sketch. We must show \(\textsf {lbres}_I \le \textsf {res}_i\) for each \(i \in I\). By definition for each \(1 \le j \le Y\), \(\textsf {lbres}_I(j) \le \min _{i' \in I} \textsf {res}_{i'}(j)\) and trivially \(\min _{i' \in I} \textsf {res}_{i'}(j) \le \textsf {res}_i(j)\). \(\square \)
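As a small illustration of this argument, the sketch below computes the tightest bounds that are sound for a whole set, namely the componentwise minimum and maximum of the result vectors, and checks soundness for every individual subMDP. The result vectors are assumed to be given as plain lists (hypothetical data); in the actual approach they are not available up front, which is why bounds are needed in the first place.

```python
def set_bounds(results):
    """Componentwise min and max over the result vectors res_i, i in I.

    results maps i to a list [res_i(0), ..., res_i(Y-1)] of equal length.
    """
    vectors = list(results.values())
    lbres = [min(column) for column in zip(*vectors)]
    ubres = [max(column) for column in zip(*vectors)]
    return lbres, ubres


def is_sound(lbres, ubres, results):
    """Check lbres(j) <= res_i(j) <= ubres(j) for all i and all j."""
    return all(
        lo <= value <= hi
        for res in results.values()
        for lo, value, hi in zip(lbres, res, ubres)
    )
```

Any lower bound below the componentwise minimum remains sound but is weaker; any lower bound above it is violated by at least one subMDP in the set.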
We omit the analogous statement for \(\textsf {ubres}\)^{Footnote 6}. In Sect. 4.4, we discuss a particular approach to obtain these bounds, i.e., the right-hand sides of the inequalities in Eq. (5). Here, we update the algorithm sketch to handle this alternative refinement.
Remark 3
We cannot compute the optimal policy \(\sigma _i\) for the subMDP \(\mathcal {M}_i\) in this setting. Thus, we must compute probability bounds for all policies, which may make these bounds weak. Some optimizations are possible, as some actions can in fact be excluded. More importantly, however, for cases within Proposition 1 the policy \(\sigma _i\) is irrelevant.
Updated Algorithm. We update the loop from Fig. 4: Rather than refining using a single i, we refine using a set I. Instead of \(\textsf {res}_i\), we use Lemma 3 to compute sound bounds \(\textsf {lbres}_I, \textsf {ubres}_I\) and call this set-based refinement. We may set \(\textsf {lbres}_i = \textsf {lbres}_I\) (and likewise \(\textsf {ubres}_i = \textsf {ubres}_I\)) for each \(i \in I\). Then, we can compute a new suitable region via Lemma 1. With the suitable region, we can still utilise Eq. (4) to compute an approximation \([\textsf {lb}, \textsf {ub}]\). However, for completeness we must ensure that if \(|I|=1\), the upper and lower bounds coincide, i.e., \(\textsf {lbres}_{\{i\}} = \textsf {ubres}_{\{ i\}}\) for every i. That can be ensured by using individual subMDP refinement when \(|I|=1\).
We now first discuss the set-based analysis of multiple subMDPs \(\mathcal {M}_i\). We clarify the realization of the loop box in Sect. 5.
4.4 Templates for Set-Based subMDP Analysis
We present an instance of set-based subMDP analysis where the subMDPs can be described as instantiations of a parametric MDP.
Parametric Templates. We observe that the subMDPs are often similar, e.g., they describe sending a file over a channel or exploring a room under different conditions. We capture this similarity as follows: Let \(\{ \mathcal {T}_1, \ldots , \mathcal {T}_m \}\) define a set of parametric MDPs, where we call each pMDP a template. In particular, for a hierarchical MDP \(\mathcal {M}\) with partitioning \({\textbf {S}}_1, \ldots , {\textbf {S}}_n\) and corresponding subMDPs \(\mathcal {M}_1,\ldots , \mathcal {M}_n\), a subMDP \(\mathcal {M}_i\) is an instantiation of template \(\mathcal {T}_j\) and parameter instantiation v^{Footnote 7}, if \(\mathcal {M}_i = \mathcal {T}_{j}[v]\). For a concise description, this paper considers hMDPs over a single template \(\mathcal {T}\) and, for any \(I \subseteq \mathbb {I}\), we denote by \(V_I :=\{ v_i \mid i \in I \}\) the finite (multi)set of parameter instantiations for the pMDP \(\mathcal {T}\) such that \(\mathcal {T}[v_i] = \mathcal {M}_i\).
Abstractions from Templates. In terms of the templates, Lemma 3 requires us to bound the expected rewards \(\textsf {ER}_{\mathcal {T}[v]}^\textsf {max}(\lozenge G)\) for all \(v\in V_I\). We realize this by defining the smallest region \(\textsf {toRegion}(V_I) \supseteq V_I\). For this region, we obtain expected rewards by computing the minimum maximal reward in \(\textsf {toRegion}(V_I)\). That is:
We handle the probabilities equally while taking into account the quantification over the policies. Following Lemma 3, these bounds are sound. Upper bounds are handled analogously. Computationally, we again use parameter lifting [33] to find these bounds. We can easily refine: Whenever we split I (or equally, \(V_I\)), we can compute (potentially) smaller regions \(\textsf {toRegion}(V_I)\).
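To make the role of \(\textsf {toRegion}\) concrete, the following sketch computes the smallest axis-aligned box around a set of valuations and bounds a toy template over that box. The toy template is our assumption and not one of the benchmarks: a channel that retries until a transmission succeeds with probability p, so the expected number of attempts is 1/p. It has no nondeterminism, so the quantification over policies plays no role here. Instead of parameter lifting, the sketch exploits that 1/p is monotone in p, which suffices for this toy case only.

```python
def to_region(valuations):
    """Smallest axis-aligned box containing all valuations.

    valuations is a non-empty list of dicts mapping parameter names to values;
    all dicts are assumed to share the same keys.
    """
    params = sorted(valuations[0])
    return {
        p: (min(v[p] for v in valuations), max(v[p] for v in valuations))
        for p in params
    }


def expected_attempts(p):
    """Toy template: retry until success with probability p."""
    return 1.0 / p


def template_bounds(region):
    """Lower and upper bound on the expected attempts over the region.

    1/p decreases in p, so the extremes are attained at the interval ends.
    """
    lo, hi = region["p"]
    return expected_attempts(hi), expected_attempts(lo)
```

Splitting the set of valuations yields boxes that are no larger than the original one, so the bounds computed for the parts can only become tighter.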
In Fig. 5, we depict our method. In contrast to Fig. 4, we pass the template \(\mathcal {T}\) rather than the individual subMDPs. Furthermore, we now compute initial sound bounds via the analysis of the template (i.e., of \(V_I\)) and must pass the mapping from I to \(V_I\) to clarify the shape of the subMDPs.
5 Implementing the Abstraction-Refinement Loop
Algorithm 1 outlines a basic implementation of the idea sketched in Fig. 5. We detail this implementation and then discuss an essential improvement.
We construct \(\nu (\mathcal {M})\), \(\mathcal {T}\), and the (implicit) mapping \(V :\mathbb {I}\rightarrow V_\mathbb {I}\) that maps subMDPs to instantiations of \(\mathcal {T}\) from a suitable high-level representation. We initialize a priority queue with triples that represent sets of template instantiations. Initially, the queue holds the set I such that \(V_I :=\{ v_i :=V(i) \mid i \in I\}\) contains all valuations v such that \(\mathcal {T}[v]\) is a subMDP of \(\mathcal {M}\), together with bounds reflecting \(\textsf {lbres}_I\) and \(\textsf {ubres}_I\) as well as weights for the computation of the priority (see below). Initially, we assume that \(\textsf {lb}=0\) and \(\textsf {ub}= \infty \), and we count the number of iterations in \(\#\text {iter}\). \(\text {Res}\) is a map for storing result vectors. The algorithm now refines \(\textsf {lb}\) and \(\textsf {ub}\) until the gap between \(\textsf {lb}\) and \(\textsf {ub}\) is sufficiently small.
The main loop iteratively refines \(\textsf {lb},\textsf {ub}\) by first refining \(\textsf {lbres}_I\) and \(\textsf {ubres}_I\), i.e., by splitting I and model checking \(\mathcal {T}\) w.r.t. successively smaller regions \(\textsf {toRegion}(V_I)\) (l. 5-11): To this end, we take an entry R from the queue. If \(R.I =\{ i \}\) is a singleton, we compute \(\textsf {lbres}_{R.I} = \textsf {res}_{i} = \textsf {ubres}_{R.I}\) and store this result. Otherwise, we apply model checking to the pMDP \(\mathcal {T}\) w.r.t. the region representation of R.I. We then split R.I into (here) two subsets. For splitting I, we use the geometric interpretation of \(\textsf {toRegion}(V_I)\) as a subset of \(\mathbb {R}^{\vec {y}}\), which we split along one of the axes into two equally large subsets. Every k (we use \(k=8\)) iterations, we analyse the macroMDP (l. 12-15). We extract the proper bounds \(\textsf {lbres}_i, \textsf {ubres}_i\) from Res[i] if possible, and otherwise from Q using \(R.\text {bounds}\) for R such that \(i \in R.I\). Then, via \(\textsf {Reg}(\textsf {lbres}_1, \textsf {ubres}_1,\ldots )\) from Lemma 1, we compute a suitable region \(R'\). We analyse the uncertain macroMDP to obtain \(\textsf {lb}\) and \(\textsf {ub}\) in accordance with Eq. (4).
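The splitting step can be sketched as follows. The text leaves open which axis is chosen and how the halves are formed; as an assumption, this sketch picks the axis along which the bounding box is widest and cuts the box at its geometric midpoint. The function name and the representation of instantiations as `(index, point)` pairs are hypothetical.

```python
def split_along_widest_axis(instantiations):
    """Split a set of template instantiations into two parts.

    instantiations is a list of (index, point) pairs with at least two entries,
    where point is a tuple of parameter values. Assumption: we cut the bounding
    box in the middle of its widest axis.
    """
    dims = range(len(instantiations[0][1]))
    lows = [min(point[d] for _, point in instantiations) for d in dims]
    highs = [max(point[d] for _, point in instantiations) for d in dims]
    axis = max(dims, key=lambda d: highs[d] - lows[d])
    if highs[axis] == lows[axis]:
        # All points coincide (a multiset): split the list itself in half.
        half = len(instantiations) // 2
        return instantiations[:half], instantiations[half:]
    middle = (lows[axis] + highs[axis]) / 2
    left = [(i, point) for i, point in instantiations if point[axis] <= middle]
    right = [(i, point) for i, point in instantiations if point[axis] > middle]
    return left, right
```

Both parts are non-empty whenever at least two instantiations are given, so repeated splitting eventually produces singletons, which are then handled by individual refinement.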
Finally, we discuss the priority function. If we naively assume a priori that each subMDP contributes an equal amount to the overall minimal expected reward in the hMDP (all weights are one), then the priority function \(R.\text {bounds} \cdot \sum _{v \in I} \text {R.weights}(v)\) computes priorities that correlate with how much computing \(\textsf {res}_i\) for all \(i \in I\) would reduce the gap between \(\textsf {lb}\) and \(\textsf {ub}\).
Termination and Correctness Argument. Algorithm 1 terminates: We split in such a way that \(\max _{I \in Q} |I|\) monotonically decreases. Thus, eventually Q is empty and \(\text {Res}\) contains results for all subMDPs. Then, \(R'\) is a point region and checking \(\nu (\mathcal {M})\) with this point region ensures that \(\textsf {lb}= \textsf {ub}\). Correctness follows as \(R'\) is always suitable, see Eq. (4).
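A minimal sketch of such a queue, using Python's `heapq`, is given here. We assume that \(R.\text {bounds}\) enters the priority as the width of the current interval, i.e., the upper bound minus the lower bound, reduced to a single number; this reading, the class name `RegionQueue` and the entry layout are our assumptions.

```python
import heapq
from itertools import count


def priority(lower, upper, weights):
    """Width of the current bounds times the total weight of the set."""
    return (upper - lower) * sum(weights)


class RegionQueue:
    """Max-priority queue over sets of template instantiations."""

    def __init__(self):
        self._heap = []
        self._tie_breaker = count()  # keeps entries comparable on equal priority

    def push(self, indices, lower, upper, weights):
        # heapq is a min-heap, so we store the negated priority.
        key = -priority(lower, upper, weights)
        heapq.heappush(
            self._heap, (key, next(self._tie_breaker), indices, lower, upper)
        )

    def pop(self):
        _, _, indices, lower, upper = heapq.heappop(self._heap)
        return indices, lower, upper

    def __len__(self):
        return len(self._heap)
```

With unit weights, a large set with loose bounds is refined before a small set with equally loose bounds, and before any set whose bounds are already tight.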
Computing Expected Visits. Based on our empirical evaluation, we added one crucial improvement: While the algorithm above assumes that all subMDPs (or states in the macroMDP) are equally important, that assumption is generally inadequate. Roughly, only states reached by the optimal policy contribute at all (provided the bounds are tight enough that we can identify these states). The reachable states are weighted by the expected number of visits of these states. We compute an approximation of this expected number of visits by taking the currently optimizing policy (a byproduct of l. 13) and the center of \(R'\); this results in an MC for which we can compute the expected number of visits by a standard equation system [32]. Additionally, we update the weights for the regions in the queue based on these new results. We remark that this also makes the priority function more useful.
Interleaving Individual Refinement. Furthermore, subMDPs for which the expected number of visits is large^{Footnote 8} are analysed individually (and the corresponding points are removed from the region in the queue). This optimization reduces the need to split the corresponding regions until we obtain tight bounds.
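The standard equation system for expected visits can be sketched in a few lines. For the transient states of an MC with transition matrix P (restricted to those states) and initial distribution \(\iota \), the expected number of visits x satisfies \(x = \iota + P^{\top } x\). The solver below is a plain Gaussian elimination written for self-containedness; it is an illustration and not the implementation used in the prototype.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def expected_visits(p, init):
    """Expected number of visits to each transient state of a Markov chain.

    p[j][i] is the probability to move from transient state j to transient
    state i; the missing probability mass leads to absorbing states.
    Solves (Id - P^T) x = init.
    """
    n = len(init)
    a = [
        [(1.0 if i == j else 0.0) - p[j][i] for j in range(n)]
        for i in range(n)
    ]
    return solve_linear(a, init)
```

For example, if state 0 moves to state 1 with probability one and state 1 loops with probability 0.5, state 1 is visited twice in expectation, so a subMDP attached to state 1 would receive twice the weight of one attached to state 0.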
6 Experiments
Implementation. We implemented levelup^{Footnote 9}, a prototype on top of the Python bindings for Storm [20]. levelup analyses hierarchical MDPs by taking two MDPs, each provided as a probabilistic program description in the PRISM format: One MDP that encodes the (uncertain) macroMDP and one that describes the parametric template for the subMDPs. The parameter instance of the subMDP can be deduced as a function of the high-level variable assignment of the macroMDP states. For technical reasons, the prototype currently provides support for subMDPs with one or two successor states, which is arguably the setting in which we expect our prototype to perform best. For subMDPs with a single successor state, the uncertain macroMDP may be represented as a (parameter-free) MDP with interval-valued rewards. For two successors, we include support of the extension of Sect. 3.3 where the successor aims to optimize reaching a fixed successor state.
Setup. We investigate the scalability and the quality of the approximation over time. To this end, we run our prototype on a MacBook 2020 M1 with an 8 GB RAM limit. We compare the enumerative baseline from Sect. 4.1 with Algorithm 1. Both exploit the hierarchical nature of the MDP. We qualitatively compare to standard model checking on the flat MDP, see below. We use a collection of benchmarks reflecting networks, job schedulers and robots.
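For the single-successor case, the idea of an MDP with interval-valued rewards can be sketched without any model checker. The code below does not use the Storm bindings; it is a self-contained value iteration over a hypothetical dictionary encoding, and it assumes that the target is reached with probability one under every policy. Since the maximal expected reward is monotone in the rewards, evaluating the MDP once with all lower and once with all upper interval ends yields \(\textsf {lb}\) and \(\textsf {ub}\).

```python
def max_expected_reward(mdp, target, reward_of, max_iter=10_000, tol=1e-12):
    """Value iteration for the maximal expected total reward until target.

    mdp maps each non-target state to a list of actions; an action is a pair
    ((lo, hi), [(prob, successor), ...]) of a reward interval and a
    distribution. reward_of selects a concrete reward from an interval.
    Assumption: the target is reached almost surely under every policy.
    """
    values = {state: 0.0 for state in mdp}
    values[target] = 0.0
    for _ in range(max_iter):
        delta = 0.0
        for state, actions in mdp.items():
            best = max(
                reward_of(interval) + sum(p * values[t] for p, t in dist)
                for interval, dist in actions
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta <= tol:
            break
    return values


def interval_bounds(mdp, target, initial):
    """Bounds lb <= ub on the maximal expected reward from the initial state."""
    lower = max_expected_reward(mdp, target, lambda iv: iv[0])[initial]
    upper = max_expected_reward(mdp, target, lambda iv: iv[1])[initial]
    return lower, upper
```

Note that the maximizing action may differ between the two evaluations, which is one reason why tight bounds are needed before the relevant subMDPs can be identified.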
Results. We consider instances that we summarize in Table 1. In particular, we give the benchmark name and instance for reference, the approximate number of states in the hierarchical MDP (computed from the macroMDP and the subMDPs), the number of nontrivial partitions, and the number of states and actions in the (uncertain) macroMDP and subMDPs, respectively. Then, we give the time \(t_\text {init}\) in seconds to set up the data structures from the high-level representation. We highlight that a flat representation of all our benchmarks has at least \(10^7\) states, often more. As a reference, we present the performance of the enumerative baseline from Sect. 4.1. The performance of this approach is positive as it enables the verification of huge MDPs. A TO indicates \({>}1200\) s. To scale to either larger subMDPs or more subMDPs, we use the abstraction-refinement loop. To reflect the anytime nature, we list three run times for terminating when \(\eta \cdot \textsf {ub}\le \textsf {lb}\) with \(\eta \in \{ 0.5, 0.9, 0.95 \}\), respectively. The largest time faster than the enumerative baseline is highlighted (further to the right is better for the abstraction-refinement). For \(\eta =0.95\), we give details: The number of iterations (iter), the number of individual refinements based on the improvement from Sect. 5, and the fraction of time spent on model checking the uncertain macroMDPs \(\%_\text {um}\), the set refinements \(\%_\text {sr}\), and the individual refinements \(\%_\text {ir}\), respectively.
Discussion. Before we discuss details of the results, let us clarify that exploiting the hierarchical structure is essential. MDPs with \({\approx }10^8\) states are at the limit of what fits in around 8 GB of memory^{Footnote 10}. Symbolic methods based on MTBDDs easily scale beyond these sizes, but since the subMDPs are all slightly different, the models we consider lack the necessary symmetry that makes MTBDDs compact. Thus, support for hierarchical MDPs is a necessary step forward.
Regarding the abstraction-refinement: While a larger study may be necessary, we can start with two standard observations: The abstraction-refinement loop is significantly faster for \(\eta \le 0.9\). As \(\eta \rightarrow 1\), coarse abstractions are insufficient. Furthermore, the efficiency of the abstraction-refinement heavily depends on the particular structure. That being said, the approach outperforms the enumerative approach, especially for \(\eta = 0.9\), by up to more than an order of magnitude. This happens even if \(\mathbb {I}\) is rather small, or if, e.g., \(\mathcal {T}\) is small. We furthermore observe that for large \(\mathbb {I}\), the bookkeeping in Python becomes a bottleneck. We think these observations are promising: we left many options for further optimizations and tweaking towards particular examples on the table. However, for models where most time is spent on model checking the macro-level MDP, the approach is less suitable. We furthermore conjecture that tailored algorithms may exploit some of these dimensions, e.g., when the macroMDP or the subMDPs are in fact MCs or perhaps acyclic, depending on the number of parameters and their influence [36], or based on the relative weight of the uncertain rewards compared to rewards in the macroMDP.
7 Related Work
In the model-free reinforcement learning (RL) setting, hierarchical models are popular. An excellent, recent survey is given in [29]. Our work generalizes the solution techniques on hierarchical MDPs that assume that these subMDPs are the same. In RL, this assumption is treated liberally, and the methods provide only weak error bounds. In contrast, our model-based approach provides error bounds in every step, and the error disappears in finitely many steps.
Hierarchical abstractions are used to analyse large MDPs in [5]. There, the goal is to find a policy that almost optimizes the reward. Rather than pre-imposing a hierarchy, the algorithm aims to find a hierarchy and define the goal states of the subMDP such that the model admits local policies. Instead, our solution can find the optimal policy and in particular gives strict error bounds, at the cost of requiring a high-level model that induces the hierarchy. A symbolic approach for continuous MDPs, where the transition probabilities are the result of an associated LP, has recently been discussed in [24]. A hierarchical SCC-decomposition [1] aims to accelerate the process of solving a (given, monolithic) Markov chain. The computation of reward-bounded properties [18] generalizes topological value iteration, and its notion of episodes mildly resembles a hierarchical approach, but no uncertainty is assumed or used in the approach. The probabilistic model checker PAT [35] analyses a hierarchical probabilistic timed automaton given as a process algebra. The hierarchy is not exploited in the solving process.
While symbolic approaches, often based on decision diagrams, exploit the transition system by compressing the data structures, abstractions aim to yield smaller systems that provide an approximation of the sought-for values. Abstraction-refinement without an imposed hierarchy is explored in [16, 21, 25]: Refinement amounts to considering a better approximation of the state space. In contrast, we impose the hierarchy, the abstraction amounts to an imprecise analysis of this fixed state space, and we refine by analysing the state space more precisely (by means of analysing subMDPs at a greater level of detail). Contract-based abstractions (in probabilistic systems) are used to decompose the analysis of systems given by parallel running subsystems [14, 28, 38]. Partial exploration and bounded model checking approaches focus on the most critical paths, i.e., the paths where most of the probability mass lies [7, 23, 26], but these approaches generally do not exploit the hierarchical and repetitive structure. The observation that many parts of the system are not critical allows us to weigh the potential benefit of refining the intervals in various parts of the macroMDP.
Parametric MDPs are commonly used to model and analyse the effects of uncertainty in the precise transitions [15, 23, 31]. The methods presented in [13, 22] exploit a repetitive structure in parametric MCs to accelerate the construction of closed-form solutions and are not applicable to MDPs. Parametric models have been used to support the design of systems [2, 8] or their adaptation [6, 9], to find policies for partially observable systems [11], to analyse Bayesian networks [34], and to speed up the analysis of, e.g., software product lines [10, 37]. On top of technical differences, none of these approaches uses a hierarchical decomposition of an MDP or uses the results of the analysis in the analysis of a larger MDP.
8 Conclusion
This paper presents a first verification approach that exploits a specific hierarchical structure natural in many models to accelerate analysing the underlying MDP. An essential ingredient is to separate the two levels in the hierarchy. Then, when analysing the (top-level) macroMDP, we may consider subMDPs that have not yet been analysed as epistemic uncertainty. Analysis techniques for uncertain (more precisely: parametric) MDPs then enable an online approximation loop that incrementally removes uncertainty in a targeted fashion by analysing more and more subMDPs (more) precisely. Three clear directions for future work are to (i) consider an approach where one lifts the restrictions to locally-optimal policies, (ii) investigate the applicability to a richer set of temporal properties, and (iii) allow automatic detection of partitions in, e.g., the Prism language.
Notes
1. Due to the additive nature of expected rewards, we can annotate the state with the expected reward even though it may differ over the different paths to an exit of \({\textbf {S}}_i\).
2.
3.
4. For efficiency, one must implement extraction without first computing an explicit representation of \(\mathcal {M}\).
5. Formally, we label the goal states and use G to denote those states.
6. where min becomes max and the inequalities flip.
7. We use v instead of u to avoid confusion with the instantiations for pMDP \(\nu (\mathcal {M})\).
8. In our implementation, we define this as subMDPs where the expected number of visits is in the top \(1 + 1/16 \cdot \#\text {iter}\) percent, but not more than 150 at a time.
9. The source code and executables, the benchmarks, logfiles and utilities are all available in an archived Docker container: https://doi.org/10.5281/zenodo.6524787.
10. Assuming 128 bytes per state, i.e., 8 doubles and 16 (32-bit) ints, as used in Storm.
References
1. Ábrahám, E., Jansen, N., Wimmer, R., Katoen, J.-P., Becker, B.: DTMC model checking by SCC reduction. In: QEST, pp. 37–46. IEEE CS (2010)
2. Andriushchenko, R., Češka, M., Junges, S., Katoen, J.-P.: Inductive synthesis for probabilistic programs reaches new horizons. In: TACAS 2021. LNCS, vol. 12651, pp. 191–209. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72016-2_11
3. Baier, C., Katoen, J.-P.: Principles of Model Checking. MIT Press, Cambridge (2008)
4. Baier, C., Klein, J., Leuschner, L., Parker, D., Wunderlich, S.: Ensuring the reliability of your model checker: interval iteration for Markov decision processes. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 160–180. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_8
5. Barry, J.L., Kaelbling, L.P., Lozano-Pérez, T.: DetH*: approximate hierarchical solution of large Markov decision processes. In: IJCAI, pp. 1928–1935. IJCAI/AAAI (2011)
6. Bartocci, E., Grosu, R., Katsaros, P., Ramakrishnan, C.R., Smolka, S.A.: Model repair for probabilistic systems. In: Abdulla, P.A., Leino, K.R.M. (eds.) TACAS 2011. LNCS, vol. 6605, pp. 326–340. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19835-9_30
7. Brázdil, T., et al.: Verification of Markov decision processes using learning algorithms. In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 98–114. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11936-6_8
8. Calinescu, R., Ceska, M., Gerasimou, S., Kwiatkowska, M., Paoletti, N.: Efficient synthesis of robust models for stochastic systems. J. Syst. Softw. 143, 140–158 (2018)
9. Chen, T., Hahn, E.M., Han, T., Kwiatkowska, M.Z., Qu, H., Zhang, L.: Model repair for Markov decision processes. In: TASE, pp. 85–92. IEEE CS (2013)
10. Chrszon, P., Dubslaff, C., Klüppelholz, S., Baier, C.: ProFeat: feature-oriented engineering for family-based probabilistic model checking. Formal Aspects Comput. 30(1), 45–75 (2018)
11. Cubuktepe, M., Jansen, N., Junges, S., Marandi, A., Suilen, M., Topcu, U.: Robust finite-state controllers for uncertain POMDPs. In: AAAI, pp. 11792–11800. AAAI Press (2021)
12. Dombrowski, C., Junges, S., Katoen, J.-P., Gross, J.: Model-checking assisted protocol design for ultra-reliable low-latency wireless networks. In: SRDS, pp. 307–316. IEEE CS (2016)
13. Fang, X., Calinescu, R., Gerasimou, S., Alhwikem, F.: Fast parametric model checking through model fragmentation. In: ICSE, pp. 835–846. IEEE (2021)
14. Feng, L., Han, T., Kwiatkowska, M., Parker, D.: Learning-based compositional verification for synchronous probabilistic systems. In: Bultan, T., Hsiung, P.-A. (eds.) ATVA 2011. LNCS, vol. 6996, pp. 511–521. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24372-1_40
15. Hahn, E.M., Hermanns, H., Wachter, B., Zhang, L.: PARAM: a model checker for parametric Markov models. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS, vol. 6174, pp. 660–664. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14295-6_56
16. Hahn, E.M., Hermanns, H., Wachter, B., Zhang, L.: PASS: abstraction refinement for infinite probabilistic models. In: Esparza, J., Majumdar, R. (eds.) TACAS 2010. LNCS, vol. 6015, pp. 353–357. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12002-2_30
17. Hartmanns, A., Hermanns, H.: The modest toolset: an integrated environment for quantitative modelling and verification. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 593–598. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_51
18. Hartmanns, A., Junges, S., Katoen, J.-P., Quatmann, T.: Multi-cost bounded tradeoff analysis in MDP. J. Autom. Reason. 64(7), 1483–1522 (2020)
19. Hauskrecht, M., Meuleau, N., Kaelbling, L.P., Dean, T.L., Boutilier, C.: Hierarchical solution of Markov decision processes using macro-actions. In: UAI, pp. 220–229. Morgan Kaufmann (1998)
20. Hensel, C., Junges, S., Katoen, J.-P., Quatmann, T., Volk, M.: The probabilistic model checker storm. CoRR, abs/2002.07080 (2020)
21. Hermanns, H., Wachter, B., Zhang, L.: Probabilistic CEGAR. In: Gupta, A., Malik, S. (eds.) CAV 2008. LNCS, vol. 5123, pp. 162–175. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-70545-1_16
22. Jansen, N., et al.: Accelerating parametric probabilistic verification. In: Norman, G., Sanders, W. (eds.) QEST 2014. LNCS, vol. 8657, pp. 404–420. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10696-0_31
23. Jansen, N., Dehnert, C., Kaminski, B.L., Katoen, J.-P., Westhofen, L.: Bounded model checking for probabilistic programs. In: Artho, C., Legay, A., Peled, D. (eds.) ATVA 2016. LNCS, vol. 9938, pp. 68–85. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46520-3_5
24. Jeong, J., Jaggi, P., Sanner, S.: Symbolic dynamic programming for continuous state MDPs with linear program transitions. In: IJCAI, pp. 4083–4089. ijcai.org (2021)
25. Kattenbelt, M., Kwiatkowska, M.Z., Norman, G., Parker, D.: A game-based abstraction-refinement framework for Markov decision processes. Formal Methods Syst. Des. 36(3), 246–280 (2010)
26. Kretínský, J., Meggendorfer, T.: Of cores: a partial-exploration framework for Markov decision processes. Log. Methods Comput. Sci. 16(4) (2020)
27. Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_47
28. Kwiatkowska, M., Norman, G., Parker, D., Qu, H.: Assume-guarantee verification for probabilistic systems. In: Esparza, J., Majumdar, R. (eds.) TACAS 2010. LNCS, vol. 6015, pp. 23–37. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12002-2_3
29. Pateria, S., Subagdja, B., Tan, A.-H., Quek, C.: Hierarchical reinforcement learning: a comprehensive survey. ACM Comput. Surv. 54(5), 109:1–109:35 (2021)
30. Precup, D., Sutton, R.S.: Multi-time models for temporally abstract planning. In: NIPS, pp. 1050–1056. The MIT Press (1997)
31. Puggelli, A., Li, W., Sangiovanni-Vincentelli, A.L., Seshia, S.A.: Polynomial-time verification of PCTL properties of MDPs with convex uncertainties. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 527–542. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_35
32. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Hoboken (1995)
33. Quatmann, T., Dehnert, C., Jansen, N., Junges, S., Katoen, J.-P.: Parameter synthesis for Markov models: faster than ever. In: Artho, C., Legay, A., Peled, D. (eds.) ATVA 2016. LNCS, vol. 9938, pp. 50–67. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46520-3_4
34. Salmani, B., Katoen, J.-P.: Fine-tuning the odds in Bayesian networks. In: Vejnarová, J., Wilson, N. (eds.) ECSQARU 2021. LNCS (LNAI), vol. 12897, pp. 268–283. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86772-0_20
35. Song, S., Sun, J., Liu, Y., Dong, J.S.: A model checker for hierarchical probabilistic real-time systems. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 705–711. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31424-7_53
36. Spel, J., Junges, S., Katoen, J.-P.: Finding provably optimal Markov chains. In: TACAS 2021. LNCS, vol. 12651, pp. 173–190. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72016-2_10
37. ter Beek, M.H., Legay, A.: Quantitative variability modelling and analysis. Int. J. Softw. Tools Technol. Transfer 21(6), 607–612 (2019). https://doi.org/10.1007/s10009-019-00535-1
38. Xu, D.N., Gössler, G., Girault, A.: Probabilistic contracts for component-based design. In: Bouajjani, A., Chin, W.-N. (eds.) ATVA 2010. LNCS, vol. 6252, pp. 325–340. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15643-4_24
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
About this paper
Cite this paper
Junges, S., Spaan, M.T.J. (2022). Abstraction-Refinement for Hierarchical Probabilistic Models. In: Shoham, S., Vizel, Y. (eds) Computer Aided Verification. CAV 2022. Lecture Notes in Computer Science, vol 13371. Springer, Cham. https://doi.org/10.1007/978-3-031-13185-1_6
DOI: https://doi.org/10.1007/978-3-031-13185-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13184-4
Online ISBN: 978-3-031-13185-1
eBook Packages: Computer Science, Computer Science (R0)