Abstract
We provide an efficient algorithm for multiobjective modelchecking problems on Markov decision processes (MDPs) with multiple cost structures. The key problem at hand is to check whether there exists a scheduler for a given MDP such that all objectives over cost vectors are fulfilled. Reachability and expected cost objectives are covered and can be mixed. Empirical evaluation shows the algorithm’s scalability. We discuss the need for output beyond Pareto curves and exploit the available information from the algorithm to support decision makers.
This work is supported by the 3TU project “Big Software on the Run”, CDZ project CAP, and DFG RTG 2236 “UnRAVeL”.
You have full access to this open access chapter, Download conference paper PDF
Similar content being viewed by others
1 Introduction
Markov decision processes [41] (MDPs) with rewards or costs are a popular model to describe planning problems under uncertainty. Planning algorithms aim to find strategies which perform well (or even optimally) for a given objective. These algorithms typically assume that a goal is reached eventually [41, 45]. This however is unrealistic in many scenarios, e.g. due to insufficient resources or the possibility of failing actions. Furthermore, these policies often admit single runs which perform far below the user’s expectation, which is unsuitable in many scenarios with high stakes. Examples range from deliveries reaching an airport after the plane’s departure to more serious scenarios in e.g. wildfire management [1]. In particular, many scenarios call for minimising the probability to run out of resources before reaching the goal: while it is beneficial for a plane to reach its destination with low expected fuel consumption, it is essential to reach its destination with the fixed available amount of fuel.
Policies that optimise solely for the probability to reach a goal are mostly very expensive. Even in the presence of just a single cost structure, decision makers have to trade the success probability against the costs. This makes many planning problems inherently multiobjective [12, 17]. In particular, safety properties cannot be averaged out by good performance [21]. Planning scenarios in various application areas [44] have different resource constraints. Typical examples are energy consumption and time [11], or optimal expected revenue and time [38] in robotics, and monetary cost and available capacity in logistics [17].
Illustrative Example. Consider a simplified (discretised) version of the Mars rover task scheduling problem [11]. The task is to plan a variety of experiments for a day on Mars. The experiments vary in their success probability, time, energy consumption and their scientific value upon success. The time, energy consumption, and scientific value are uncertain and modelled by probability distributions, cf. Fig. 1(a). The objective is to achieve a minimum of daily scientific progress while limiting the risk of running out of time or out of energy. As the rover is expected to work for a longer period, we prefer a high expected scientific value.
Contributions and approach. This paper focuses on multiobjective costbounded reachability queries on MDPs, a natural setting for the aforementioned planning problems. The input is an MDP with multiple cost structures (e.g. energy, utility or time) and multiple objectives of the form “maximise/minimise the probability to reach a state in \(G_i\) such that the cumulative cost for the ith cost structure is below/above a threshold \(b_i\)”. This multiobjective variant of costbounded reachability is PSPACEhard [43]. The focus of this paper is on the practical side: we aim at finding a practically efficient algorithm to obtain (an approximation of) the Paretooptimal points. To accomplish this, we adapt and generalise recent approaches for the singleobjective case [27, 34] towards the multiobjective setting. The basic idea of [27, 34] is to implicitly unfold the MDP along cost epochs, and exploit the regularities of the epochMDPs. Prism [37] and the Modest Toolset [29] have been updated with such methods for the singleobjective case and significantly outperform the explicit unfolding approach of [2, 40]. This paper presents an algorithm that lifts this principle to multiple cost objectives and determines approximation errors when using value iteration. Extensions towards quantiles and expected costs are considered too. Evaluation using a prototypical implementation in Storm [20] shows promising results. In addition, we equip our algorithm with means to visualise (inspired by the recent techniques in [39]) the tradeoffs between various objectives that go beyond Pareto curves; we believe that this is key to obtain better insights into multiobjective decision making. An example is given in Fig. 1(b): it depicts the probability to satisfy an objective based on the remaining energy (yaxis) and time (xaxis).
Related work. The analysis of singleobjective (costbounded) reachability in MDPs is an active area of research in both AI and formal method communities, and referred to in, e.g., [18, 35, 48]. Various model checking approaches for single objectives exist. In [32], the topology of the unfolded MDP is exploited to speed up the value iteration. In [27], three different model checking approaches are explored and compared. A survey for heuristic approaches is given in [45]. A Qlearning based approach is described in [13]. An extension of this problem in the partially observable setting was considered in [14], and for probabilistic timed automata in [27]. The method from [4] computes optimal expected values under e.g. the condition that the goal is reached, and is thus applicable in settings where a goal is not necessarily reached. A similar problem is considered in [46]. For multiobjective analysis, the model checking community typically focuses on probabilities and expected costs as in the seminal works [15, 22]. Implementations are typically based on a value iteration approach in [24], and have been extended to stochastic games [16], Markov automata [42], and interval MDPs [28]. Other considered cases include e.g. multiobjective meanpayoff objectives [8], objectives over instantaneous costs [10], and parity objectives [7]. Multiobjective problems for MDPs with an unknown costfunction are considered in [33]. Surveys on multiobjective decision making in AI and machine learning can be found in [44] and [47], respectively.
2 Preliminaries
We write \(2^{S} \) for the powerset of S. The ith component of a tuple \(\varvec{t} = \langle v_1, \dots , v_n \rangle \) is . A (discrete) probability distribution over a set \(\varOmega \) is a function \(\mu \in \varOmega \rightarrow [0, 1]\) such that is countable and \(\sum _{\omega \in \mathrm {support}({\mu })}{\mu (\omega )} = 1\). \(\mathrm {Dist}({\varOmega }) \) is the set of all probability distributions over \(\varOmega \). \(\mathcal {D}(s) \) is the Dirac distribution for s, defined by \(\mathcal {D}(s) (s) = 1\).
Definition 1
A Markov decision process (MDP) with m cost structures is a triple \(M =\langle S, T, s_ init \rangle \) where \(S \) is a finite set of states, \(T \in S \rightarrow 2^{\mathrm {Dist}({\mathbb {N} ^{m} \times S})} \) is the transition function, and \(s_ init \in S \) is the initial state. For all \(s \in S \), we require that \(T (s)\) is finite and nonempty.
We write for \(\exists \,\mu \in T (s)\) and call it a transition. We write if additionally \(\langle \varvec{c},s' \rangle \in \mathrm {support}({\mu }) \). \(\langle \varvec{c},s' \rangle \) is a branch with cost vector \(\varvec{c}\). If \(T \) is clear from the context, we just write . Graphically, transitions are lines to a node from which branches labelled with their probability and costs lead to successor states. We may omit the node and probability for transitions into Dirac distributions.
Example 1
Figure 2 shows an MDP \(M_ ex \). From the initial state \(s_0\), the choice of going towards \(s_1\) or \(s_2\) is nondeterministic. Either way, the probability to return to \(s_0\) is 0.5, otherwise we move to \(s_1\) (or \(s_2\)). \(M_ ex \) has two cost structures: Failing to move to \(s_1\) has a cost of 1 for the first, and 2 for the second structure. Moving to \(s_2\) yields cost 2 for the first and no cost for the second structure.
In the remainder of this paper, we fix a given MDP \(M = \langle S, T, s_ init \rangle \). Its semantics is captured by the notion of paths. A path in M represents the infinite concrete resolution of both nondeterministic and probabilistic choices: \(\pi = s_0\, \mu _0\, \varvec{c}_0\, s_1\, \mu _1\, \varvec{c}_1 \dots \) where \(s_i \in S \), , and \(\langle \varvec{c}_i, s_{i+1} \rangle \in \mathrm {support}({\mu _i}) \) for all \(i \in \mathbb {N} \). A finite path \(\pi _\mathrm {fin} = s_0\, \mu _0\, \varvec{c}_0\, s_1\, \mu _1\, \varvec{c}_1\, s_2 \dots \mu _{n1}\, \varvec{c}_{n1}\, s_n\) is a finite prefix of a path with . Let . \(\mathrm {Paths}_\mathrm {fin}(M)\) (\(\mathrm {Paths}(M)\)) are the set of all (in)finite finite paths starting in \(s_ init \). A scheduler (adversary, policy or strategy) resolves nondeterministic choices:
Definition 2
\(\mathfrak {S} \in \mathrm {Paths}_\mathrm {fin}(M) \rightarrow \mathrm {Dist}({\mathrm {Dist}({\mathbb {N} ^m \times S})}) \) is a scheduler for M if . The set of all schedulers of M is \(\mathrm {Sched} (M)\). \(\mathfrak {S}\) is deterministic if \(\mathrm {support}({\mathfrak {S} (\pi )})  = 1\) for all finite paths \(\pi \).
Via the standard cylinder set construction [25], a scheduler \(\mathfrak {S} \) induces a probability measure \(\mathcal {P}^{\mathfrak {S}}_{M} \) on measurable sets of paths starting from \(s_ init \). We define the extremal values \(\mathcal {P}^{\mathrm {max}}_{M} (\varPi ) = \sup _{\mathfrak {S} \in \mathrm {Sched} (M)}\mathcal {P}^{\mathfrak {S}}_{M} (\varPi )\) and \(\mathcal {P}^{\mathrm {min}}_{M} (\varPi ) = \inf _{\mathfrak {S} \in \mathrm {Sched} (M)}\mathcal {P}^{\mathfrak {S}}_{M} (\varPi )\) for measurable \(\varPi \subseteq \mathrm {Paths}(M)\). For clarity, we focus on probabilities in this paper, but note that expected accumulated costs can be defined analogously [25] and our methods apply to them with only minor changes.
CostBounded Reachability. We are interested in the probabilities of sets of paths that reach certain goal states within multiple cost bounds:
Definition 3
A cost bound is given by \(\langle \mathrm {C}_{j} \rangle _{\sim b}\, G\) where \(j \in \{1, \dots , m\}\) identifies a cost structure, \({\sim } \in \{<,\le ,>,\ge \}\), \(b \in \mathbb {N} \) is a bound value, and \(G \subseteq S\) is a set of goal states. A costbounded reachability formula is a conjunction \(\bigwedge _{i=1}^{\!n \in \mathbb {N}} (\langle \mathrm {C}_{j_i} \rangle _{\sim _i b_i}\, G_i)\) of cost bounds. It characterises the measurable set of paths \(\varPi \) where, for every i, every \(\pi \in \varPi \) has a prefix \(\pi _\mathrm {fin}^i\) with \(\mathrm {last}(\pi _\mathrm {fin}^i) \in G_i\) and \(\mathrm {cost}_{j_i}(\pi _\mathrm {fin}^i) \sim _{i} b_i\).
A (singleobjective) multicost bounded reachability query asks for \(\mathcal {P}^{ opt }_{M} (e)\) where \( opt \in \{\, \mathrm {max}, \mathrm {min} \,\}\) and e is a costbounded reachability formula. Unbounded and stepbounded reachability are special cases of costbounded reachability. A singleobjective query may contain multiple bounds, but asks for a single scheduler that optimises the probability of satisfying them all.
We also consider multiobjective tradeoffs, i.e. sets of singleobjective queries written as \(\varPhi = multi \big ({\mathcal {P}^{ opt_1 }_{M} ({e}_1), \dots , \mathcal {P}^{ opt_{\ell } }_{M} ({e}_{\ell })}\big ) \). We call the \(e_k\) objectives. For tradeoffs, we are interested in the Pareto curve \( Pareto (M, \varPhi )\) which consists of all achievable probability vectors \(\varvec{p}_{\mathfrak {S}} = \langle \mathcal {P}^{\mathfrak {S}}_{M} ({e}_1), \dots , \mathcal {P}^{\mathfrak {S}}_{M} ({e}_\ell ) \rangle \) for \(\mathfrak {S} \in \mathrm {Sched} (M)\) that are not dominated by another achievable vector \(\varvec{p}_{\mathfrak {S} '}\). More precisely, \(\varvec{p}_{\mathfrak {S}} \in Pareto (M,\varPhi )\) iff for all \(\mathfrak {S} ' \in \mathrm {Sched} (M)\) either \(\varvec{p}_{\mathfrak {S}} = \varvec{p}_{\mathfrak {S} '} \) or for some \(i\in \{1, \dots , \ell \}\) we have \(( opt_i = \mathrm {max} \wedge {\varvec{p}_{\mathfrak {S}}}[{i}] > {\varvec{p}_{\mathfrak {S} '}}[{i}]) \vee ( opt_i = \mathrm {min} \wedge {\varvec{p}_{\mathfrak {S}}}[{i}] < {\varvec{p}_{\mathfrak {S} '}}[{i}])\).
Example 2
We consider \(\varPhi = multi \big ({\mathcal {P}^{\mathrm {max}}_{M_ ex } (\langle \mathrm {C}_{1} \rangle _{\le 1}\, \{ s_1\}), \mathcal {P}^{\mathrm {max}}_{M_ ex } (\langle \mathrm {C}_{2} \rangle _{\le 3}\, \{ s_2 \})}\big ) \) for \(M_ ex \) of Fig. 2. Let \(\mathfrak {S} _j\) be the scheduler that tries to move to \(s_1\) for at most j attempts and afterwards moves to \(s_2\). The induced probability vectors \(\varvec{p}_{\mathfrak {S} _1} = \langle 0.5,1 \rangle \) and \(\varvec{p}_{\mathfrak {S} _2} = \langle 0.75, 0.75 \rangle \) both lie on the Pareto curve since no \(\mathfrak {S} \in \mathrm {Sched} (M_ ex )\) induces (strictly) larger probabilities \(\varvec{p}_{\mathfrak {S}}\). By also considering schedulers that randomise between the choices of \(\mathfrak {S} _1\) and \(\mathfrak {S} _2\) we obtain \( Pareto (M_ ex , \varPhi ) = \{ w \cdot \varvec{p}_{\mathfrak {S} _1} + (1{}w) \cdot \varvec{p}_{\mathfrak {S} _2} \mid w \in [0,1] \}\).
For clarity of presentation, we restrict to tradeoffs \(\varPhi \) where every cost structure occurs exactly once, i.e., the number \({m}\) of cost structures of M matches the number of cost bounds occurring in \(\varPhi \). Furthermore, we require that none of the sets of goal states contains the initial state. Both assumptions are w.l.o.g. by copying cost structures as needed and adding a new initial state with zerocost transition to the old initial state.
3 Multidimensional Sequential Value Iteration
We present a practically efficient approach to compute (an approximation of) the Pareto curve for MDP M with \({m}\) cost structures and tradeoff \(\varPhi \). We merge the ideas of [24] to approximate a Pareto curve for an (unbounded) multiobjective tradeoff with those of [27, 34] to efficiently compute (singleobjective) costbounded reachability probabilities. For clarity of presentation we start with the upperbounded maximum case and assume a tradeoff of the form \(\varPhi = multi \big ({\mathcal {P}^{\mathrm {max}}_{M} ({e}_1), \dots , \mathcal {P}^{\mathrm {max}}_{M} ({e}_{\ell })}\big ) \) with \({e}_k = \bigwedge _{i=n_{k1}}^{\!n_k1}(\langle \mathrm {C}_{{i}} \rangle _{\le b_{i}}\, G_{i})\) and \(0=n_0< n_1< \dots < n_{{\ell }} = {m}\). Other variants are discussed in Sect. 3.3.
Cost epochs and goal satisfaction. Central to our approach is the concept of cost epochs. Consider the path \(\pi = (s_0\langle 2,0 \rangle s_2\langle 0,0 \rangle s_0\langle 1,2 \rangle )^{\omega }\) through \(M_ ex \) of Fig. 2. We plot the accumulated cost in both dimensions along this path in Fig. 3(a). Starting from \(\langle 0,0 \rangle \), the first transition yields cost 2 for the first cost structure: we jump to coordinate \(\langle 2,0 \rangle \). The next transition, back to \(s_0\), has no cost, so we stay at \(\langle 2,0 \rangle \). Finally, the failed attempt to move to \(s_1\) incurs costs \(\langle 1, 2 \rangle \). Consequently, for an infinite path, infinitely many points in this grid may be reached. However, a tradeoff specifies bound values for the costs, e.g., for \(\varPhi _ ex = multi \big ({\mathcal {P}^{\mathrm {max}}_{M_ ex } (\langle \mathrm {C}_{1} \rangle _{\le 4}\, \{ s_1\}), \mathcal {P}^{\mathrm {max}}_{M_ ex } (\langle \mathrm {C}_{2} \rangle _{\le 3}\, \{ s_2 \})}\big ) \) we get bound values 4 and 3. Once the bound value for a bound is reached, accumulating further costs in this dimension does not impact the satisfaction of its formula. It thus suffices to keep track, for each bound, of the remaining costs before reaching the bound value. This leads to a finite grid as depicted in Fig. 3(b). We refer to each of its coordinates as a cost epoch:
Definition 4
An \({m}\)dimensional cost epoch is a tuple in . For \({\varvec{e}}\in \varvec{E}_{{m}}\), \(\varvec{c} \in \mathbb {N} ^{m}\), the successor epoch is if that value is nonnegative and \(\bot \) otherwise.
If the entry for a bound is \(\bot \), it cannot be satisfied any more: too much costs have already been incurred. To check whether an objective \({e}_k = \bigwedge _{i=n_{k1}}^{\!n_k1}(\langle \mathrm {C}_{{i}} \rangle _{\le b_{i}}\, G_{i})\) is satisfied, we memorise whether each individual bound already holds. This is also used to ensure that satisfying a bound more than once has no effect.
Definition 5
A goal satisfaction represents the cost structure indices i for which bound \(\langle \mathrm {C}_{i} \rangle _{\le b_i}\,G_i\) already holds, i.e. \(G_i\) was reached before the bound value \(b_i\). For \({\varvec{g}}\in {\varvec{G}_{{m}}}\), \({\varvec{e}}\in \varvec{E}_{{m}}\) and \(s\in S \), let \({ succ ({\varvec{g}}, s, {\varvec{e}})} \in {\varvec{G}_{{m}}}\) define the update upon reaching s: \({{ succ ({\varvec{g}}, s, {\varvec{e}})}}[{i}] = 1\) if \(s \in G_i \wedge {{\varvec{e}}}[{i}] \ne \bot \) and \({{ succ ({\varvec{g}}, s, {\varvec{e}})}}[{i}] = {{\varvec{g}}}[{i}] \) otherwise.
3.1 The Unfolding Approach
\( Pareto (M, \varPhi )\) can be computed by reducing \(\varPhi \) to a multiobjective unbounded reachability problem on the unfolded MDP. Its states are the Cartesian product of the original MDP’s states, the epochs, and the goal satisfactions:
Definition 6
The unfolding for M as in Definition 1 and upperbounded maximum tradeoff \(\varPhi \) is the MDP with no cost structures, and the unfolding of probability distribution \(\mu \) defined by \( unf (\mu )(\langle \langle s', {\varvec{e}}', {\varvec{g}}' \rangle \rangle ) = \mu (\langle \varvec{c}, s' \rangle )\) if \({\varvec{e}}' = { succ ({\varvec{e}}, \varvec{c})} \wedge {\varvec{g}}' = { succ ({\varvec{g}}, s', {\varvec{e}}')}\) and 0 otherwise.
Costs are now encoded in the state space, so it suffices to consider the unbounded tradeoff \(\varPhi ' = multi \big ({\mathcal {P}^{\mathrm {max}}_{{M_{ unf }}} (e'_1), \dots , \mathcal {P}^{\mathrm {max}}_{{M_{ unf }}} (e'_{\ell })}\big ) \) with \({e}'_k = \langle \cdot \rangle _{\ge 0}\,G'_k\) and \(G'_k = \{ \langle s, {\varvec{e}}, {\varvec{g}} \rangle \mid \bigwedge _{i=n_{k1}}^{\!n_k1} {{\varvec{g}}}[{i}] = 1 \}\).
Lemma 1
There is a bijection \(f :\mathrm {Sched} (M) \rightarrow \mathrm {Sched} ({M_{ unf }})\) with \(\mathcal {P}^{\mathfrak {S}}_{M} (e_k) = \mathcal {P}^{f(\mathfrak {S})}_{{M_{ unf }}} (e'_k)\) for all \(\mathfrak {S} \in \mathrm {Sched} (M)\) and \(k \in \{\,1, \dots , {\ell }\,\}\). Consequently, we have that \( Pareto (M, \varPhi ) = Pareto ({M_{ unf }}, \varPhi ')\).
\( Pareto ({M_{ unf }}, \varPhi ')\) can be computed with existing multiobjective model checking algorithms for unbounded reachability. We build on the one of [24]. It iteratively chooses weight vectors \(\varvec{w} = \langle w_1, \dots , w_{\ell } \rangle \in [0,1]^{\ell }\setminus \{ \mathbf {0} \}\) and computes points
The Pareto curve \(\varvec{P}\) is convex, \(\varvec{p}_{\varvec{w}} \in \varvec{P}\) for all \(\varvec{w}\), and \(\varvec{q} \in \varvec{P}\) implies \(\varvec{q} \cdot \varvec{w} \le \varvec{p}_{\varvec{w}} \cdot \varvec{w}\). These observations allow us to approximate the Pareto curve with arbitrary precision; see [24] for details. [24] characterises \(\varvec{p}_{\varvec{w}}\) via weighted expected costs: \({M_{ unf }}\) is equipped with \({\ell }\) cost structures used to calculate the probability of each of the \({\ell }\) objectives. This is achieved by setting the value of the kth cost structure on each branch to 1 iff the objective \({e}_k'\) is satisfied in the target state of the branch but was not satisfied in the transition’s source state. On a path \(\pi \) through the resulting model \({M^{+}_{ unf }}\), we collect exactly one cost w.r.t. cost structure k iff \(\pi \) satisfies objective \({e}_k\).
Definition 7
For \(\mathfrak {S} \in \mathrm {Sched} ({M^{+}_{ unf }})\) and \(\varvec{w} \in [0,1]^{\ell }\), the weighted expected cost is \(\mathcal {E}^{\mathfrak {S}}_{{M^{+}_{ unf }}} (\varvec{w}) = \sum _{k=1}^{\ell }{\varvec{w}}[{k}] \cdot \int _{\pi \in \mathrm {Paths}(M)} \mathrm {cost}_{k}(\pi ) {\mathop {}\!\mathrm {d}}\mathcal {P}^{\mathfrak {S}}_{{M^{+}_{ unf }}} (\pi )\), i.e. the expected value of the weighted sum of the costs accumulated on paths in \({M^{+}_{ unf }}\).
The following characterisation of \(\varvec{p}_{\varvec{w}}\) is equivalent to Eq. 1:
and \(\varvec{1}_k \in \{0,1\}^{\ell }\) is the weight vector defined by \({\varvec{1}_k}[{j}] = 1\) iff \(j=k\). Standard MDP model checking algorithms [41] can be applied to compute an optimal (deterministic and memoryless) scheduler \(\mathfrak {S} \) and the induced costs \(\mathcal {E}^{\mathfrak {S}}_{{M^{+}_{ unf }}} (\varvec{1}_k)\).
3.2 An Epoch Model Approach Without Unfolding
The unfolding approach does not scale well: If the original MDP has n states, the unfolding will have on the order of \(n \cdot \prod _{i=1}^m (b_i + 2)\) states. This makes it infeasible for larger bound values \(b_i\) over multiple bounds. The bottleneck lies in computing the points \(\varvec{p}_{\varvec{w}}\) as in Eqs. 1 and 2. We now show how to do so efficiently, i.e. given a weight vector \(\varvec{w} = \langle w_1, \dots , w_{\ell } \rangle \in [0,1]^{\ell }\setminus \{ \mathbf {0} \}\), compute
without unfolding. The characterisations of \(\varvec{p}_{\varvec{w}}\) given in Eqs. 1 and 3 are equivalent due to Lemma 1.
The efficient analysis of singleobjective queries with a single bound \(\varPhi _1 = \mathcal {P}^{\mathrm {max}}_{M} (\langle \mathrm {C}_{} \rangle _{\le b}\, G)\) has recently been addressed in e.g. [27, 34]. The key observation is that the unfolding \({M_{ unf }}\) can be decomposed into \(b + 2\) epoch model MDPs \(M^{b}, \dots , M^{0}, M^{\bot }\) corresponding to the cost epochs. The epoch models are copies of M with only slight adaptations. Reachability probabilities in copies corresponding to epoch i only depend on the copies \(\{\, M^{j}~~j \le i \vee j = \bot \,\}\). It is thus possible to analyse \(M^{\bot }, \dots , M^{b}\) sequentially instead of considering all copies at once. In particular, it is not necessary to construct the full unfolding.
We lift this idea to multiobjective tradeoffs. The singleobjective case is notably simpler in that reaching a goal state for the first time or exceeding the cost bound immediately suffices to determine whether the one property is satisfied. In particular, while \(M^{\bot }\) is just one sink state in the singleobjective case, its structure is more involved here.
We first formalise the notion of epoch models for multiple bounds. The aim is to build an MDP for each epoch \({\varvec{e}}\in \varvec{E}_{{m}}\) that can be analysed via standard model checking techniques using the weighted expected cost encoding of objective probabilities. The state space of an epoch model consists of up to one copy of each original state for each goal satisfaction vector \({\varvec{g}}\in {\varvec{G}_{{m}}}\). Additional sink states \(\langle s_\bot , {\varvec{g}} \rangle \) encode the target for a jump to any other cost epoch \({\varvec{e}}' \ne {\varvec{e}}\). We consider \({\ell }\) cost structures to encode the objective probabilities. Let function \({ satObj _\varPhi }:{\varvec{G}_{{m}}} \times {\varvec{G}_{{m}}} \rightarrow \{0,1\}^{\ell }\) assign value 1 in entry k iff a reachability property \({e}_k\) is satisfied according to the second goal vector but was not satisfied in the first. For the transitions’ branches, we distinguish two cases: (1) If the successor epoch \( {\varvec{e}}' = { succ ({\varvec{e}},\varvec{c})}\) with respect to the original cost \(\varvec{c} \in \mathbb {N} ^{m}\) is the same as the current epoch \({\varvec{e}}\), we jump to the successor state as before, and update the goal satisfaction. We collect the new costs for the objectives if updating the goal satisfaction newly satisfies an objective as given by \({ satObj _\varPhi }\) (2). If the successor epoch \({\varvec{e}}' = { succ ({\varvec{e}},\varvec{c})}\) is different from the current epoch \({\varvec{e}}\), the probability is rerouted to the sink state with the corresponding goal state satisfaction vector. The collected costs contains the part of the goal satisfaction as in (1), but also the results obtained by analysing the reached epoch \({\varvec{e}}'\), given by a function f.
Definition 8
The epoch model of MDP M as in Definition 1 for \({\varvec{e}}\in \varvec{E}_{{m}}\) and a function \(f :{\varvec{G}_{{m}}} \times \mathrm {Dist}({\mathbb {N} ^m \times S}) \rightarrow [0,1]^{\ell }\) is the MDP \(M^{\varvec{e}}_f =\langle S ^{\varvec{e}}, T ^{\varvec{e}}_f, \langle s_ init , \varvec{0} \rangle \rangle \) with \({\ell }\) cost structures, , \(T ^{\varvec{e}}_f(\langle s_\bot , {\varvec{g}} \rangle ) = \{\,\mathcal {D}(\langle \varvec{0}, \langle s_\bot , {\varvec{g}} \rangle \rangle ) \,\}\), and for every \({\tilde{s}}= \langle s, {\varvec{g}} \rangle \in S ^{\varvec{e}}\) and \(\mu \in T (s)\), there is some \(\nu \in T ^{\varvec{e}}_f({\tilde{s}})\) defined by:

1.
\(\nu (\langle { satObj _\varPhi }({\varvec{g}}, {\varvec{g}}'), \langle s', {\varvec{g}}' \rangle \rangle ) = \mu (\varvec{c}, s')\) if \({ succ ({\varvec{e}}, \varvec{c})} = {\varvec{e}}\wedge {\varvec{g}}' = { succ ({\varvec{g}}, s', {\varvec{e}})}\), and

2.
\(\nu (\langle f({\varvec{g}}, \mu ) + { satObj _\varPhi }({\varvec{g}}, {\varvec{g}}'), \langle s_\bot , {\varvec{g}}' \rangle \rangle ) = \sum _{\varvec{c} \in \varvec{C}} \sum _{s' \in S '_{\varvec{c}}} \mu (\varvec{c}, s')\) where \(\varvec{C} = \{ \varvec{c} \mid { succ ({\varvec{e}}, \varvec{c})} \ne {\varvec{e}}\}\) and \(S '_{\varvec{c}} = \{ s' \mid { succ ({\varvec{g}}, s', { succ ({\varvec{e}}, \varvec{c})})} = {\varvec{g}}' \} \).
Figure 4 shows an epoch model \(M^{\varvec{e}}_f\) of the MDP \(M_ ex \) in Fig. 2 with respect to tradeoff \(\varPhi \) as in Example 2 and any epoch \({\varvec{e}}\in \varvec{E}_{2}\) with \({{\varvec{e}}}[{1}] \ne \bot \) and \({{\varvec{e}}}[{2}] \ne \bot \).
Remark 1
The structure of \(M^{\varvec{e}}_f\) differs only slightly between epochs. In particular consider epochs \({\varvec{e}}, {\varvec{e}}'\) with \({{\varvec{e}}}[{i}] = \bot \) iff \({{\varvec{e}}'}[{i}] = \bot \). To construct epoch model \(M^{{\varvec{e}}'}_f \) from \( M^{{\varvec{e}}}_f\), only transitions to the bottom states \(\langle s_\bot , {\varvec{g}} \rangle \) need to be adapted.
To analyse an epoch model \(M^{\varvec{e}}_f\), any successor epoch \({\varvec{e}}'\) of \({\varvec{e}}\) needs to be analysed before. Since costs are nonnegative, we can ensure this by analysing the epochs in a specific order. In the single dimensional case the order is uniquely given by \(\bot ,0,1, \dots , b\). For multiple cost bounds any linearisation of the partial order \({\preceq } \subseteq \varvec{E}_{{m}} \times \varvec{E}_{{m}}\) with \({\varvec{e}}' \preceq {\varvec{e}}\) iff \({{\varvec{e}}'}[{i}] \le {{\varvec{e}}}[{i}] \vee {{\varvec{e}}'}[{i}] = \bot \) for all i can be considered. We call such a linearisation a proper epoch sequence.
We compute the points \(\varvec{p}_{\varvec{w}}\) by analysing the different epoch models (i.e. the coordinates of Fig. 3(b)) sequentially. The main procedure is outlined in Algorithm 1. The costs of the model for the current epoch are computed in lines 28. These costs comprise the results from previously analysed epochs \({\varvec{e}}'\). In lines 912, the current epoch model \(M^{\varvec{e}}_f\) is built and analysed: We compute weighted expected costs on \(M^{\varvec{e}}_f\) where \({\mathcal {E}^{\mathfrak {S}}_{M^{\varvec{e}}_f} (\varvec{w})}[{s}] \) denotes the expected costs for \(M^{\varvec{e}}_f\) when changing the initial state to s. In line 10 a (deterministic and memoryless) scheduler \(\mathfrak {S} \) that induces the maximal weighted expected costs (i.e. \({\mathcal {E}^{\mathfrak {S}}_{M^{\varvec{e}}_f} (\varvec{w})}[{s}] = \max _{\mathfrak {S} '} {\mathcal {E}^{\mathfrak {S} '}_{M^{\varvec{e}}_f} (\varvec{w})}[{s}] \) for all states s) is computed. In line 12 we then compute the expected costs induced by \(\mathfrak {S} \) for the individual objectives.
Theorem 1
The output of Algorithm 1 satisfies Eq. 3.
Proof
(sketch). Let \({\varvec{e}}\) be the currently analysed epoch. Since \({\mathbb {E}}\) is assumed to be a proper epoch sequence, we already processed any reachable successor epoch \({\varvec{e}}'\) of \({\varvec{e}}\), i.e., line 7 is only executed for epochs \({\varvec{e}}'\) for which \(x^{{\varvec{e}}'}\) has already been computed. One can show that the values \({{{{{x^{{\varvec{e}}}}}[{\langle s, {\varvec{g}} \rangle }]}}[{k}]}\) computed by the algorithm coincide with the probability to satisfy \({e}_k'\) from state \(\langle s, {\varvec{e}}, {\varvec{g}} \rangle \) in the unfolding \({M_{ unf }}\) under a scheduler \(\mathfrak {S} \) that maximises the weighted sum.
Error propagation. So far, we assumed that (weighted) expected costs \(\mathcal {E}^{\mathfrak {S}}_{M} (\varvec{w})\) are computed exactly. Practical implementations, however, are often based on numerical methods that only approximate the correct solution. In fact, methods based on value iteration—the defacto standard in MDP model checking—do not give any guarantee on the accuracy of the obtained result [26]. We therefore consider interval iteration [5, 9] which for a predefined precision \(\varepsilon > 0\) guarantees that the obtained result \(x_s\) is \(\varepsilon \)precise, i.e. we have \(x_s  {\mathcal {E}^{\mathfrak {S}}_{M} (\varvec{w})}[{s}]  \le \varepsilon \).
For the singlecost bounded variant of Algorithm 1, [27] discusses that in order to compute \(\mathcal {P}^{\mathrm {max}}_{M} (\langle \mathrm {C}_{} \rangle _{\le b}\, G)\) with precision \(\varepsilon \), each epoch model needs to be analysed with precision \(\frac{\varepsilon }{b+1}\). We generalise this result to multidimensional tradeoffs. Assume the results of previously analysed epochs (given by f) are \(\varepsilon \)precise and that \(M^{\varvec{e}}_f\) is analysed with precision \(\delta \). As in the singledimensional case, the total error for \(M^{\varvec{e}}_f\) can accumulate to \(\delta + \varepsilon \). Since a path through the MDP M can visit at most \(\sum _{i=1}^{m}(b_i + 1)\) cost epochs whose analysis introduces error \(\delta \), the overall error can be upper bounded by \(\delta \cdot \sum _{i=1}^{m}(b_i + 1)\).
Theorem 2
If the values \({{{{{x^{{\varvec{e}}}}}[{{\tilde{s}}}]}}[{k}]}\) at line 12 of Algorithm 1 are computed with precision \(\varepsilon / \sum _{i=1}^{{m}} (b_i +1)\) for some \(\varepsilon > 0\), the output \(\varvec{p}_{\varvec{w}}'\) of the algorithm satisfies \( \varvec{p}_{\varvec{w}}  \varvec{p}_{\varvec{w}}' \cdot \varvec{w} \le \varepsilon \) where \(\varvec{p}_{\varvec{w}}\) is as in Eq. 3.
Remark 2
Alternatively, epochs can be analysed with the desired overall precision \(\varepsilon \) by lifting the results from topological interval iteration [5]. However, that requires to store the obtained bounds for the results of already analysed epochs.
3.3 Extensions
Minimising objectives. Objectives \(\mathcal {P}^{\mathrm {min}}_{M} ({e}_k)\) can be handled by extending the function \({ satObj _\varPhi }\) in Definition 8 such that it assigns cost \(1\) to branches that lead to the satisfaction of \({e}_k\). To obtain the desired probabilities we then maximise negative costs and multiply the result by \(1\) afterwards. As interval iteration supports mixtures of positive and negative costs [5], arbitrary combinations of minimising and maximising objectives can be considered^{Footnote 1}.
Beyond upper bounds. Our approach also supports bounds of the form \(\langle \mathrm {C}_{j} \rangle _{\sim b}\, G\) for \({\sim } \in \{<, \le , >, \ge \}\), i.e., we allow combinations of lower and upper costbounds. For strict upper bounds \(<\!b\) and nonstrict lower bounds \({\ge }\;b\) we consider \(\le b+1\) and \(> b1\) instead. For bound \(\langle \mathrm {C}_{i} \rangle _{> b_i}\, G_i\) we adapt the update of goal satisfactions such that \({{ succ ({\varvec{g}}, s, {\varvec{e}})}}[{i}] = 1\) if either \({{\varvec{g}}}[{i}] = 1\) or \(s\in G_i \wedge {{\varvec{e}}}[{i}] = \bot \). Similarly, we support multiboundedsinglegoal queries of the form \(\langle \mathrm {C}_{(j_1, \dots , j_n)} \rangle _{(\sim _{1} b_{1}, \dots , \sim _{n} b_{n})}\, G\) which characterises the paths \(\pi \) with a single prefix \(\pi _\mathrm {fin}\) satisfying \(\mathrm {last}(\pi _\mathrm {fin}) \in G\) and all cost bounds, i.e., \( \mathrm {cost}_{j_i}(\pi _\mathrm {fin}) \sim _i b_i\).
Example 3
The formula \(e = \langle \mathrm {C}_{(1,1)} \rangle _{(\le 1, \ge 1)}\, G\) expresses the paths that reach G while collecting exactly one cost w.r.t. the first cost structure. This formula is not equivalent to \(e' = \langle \mathrm {C}_{1} \rangle _{\le 1}\, G \wedge \langle \mathrm {C}_{1} \rangle _{\ge 1}\, G\) since, e.g., for \(G= \{\,s_0\,\}\) the path \(\pi = s_0 \langle 2 \rangle s_0\) satisfies \(e'\) but not e.
Expected cost objectives. We can consider costbounded expected cost objectives \(\mathcal {E}^{ opt }_{M} (\mathrm {R}_{j_1}, \langle \mathrm {C}_{j_2} \rangle _{\le b}\,\!)\) with \( opt \in \{\, \mathrm {max}, \mathrm {min}\,\}\) which refer to the expected cost accumulated for cost structure \(j_1\) within a given cost bound \(\langle \mathrm {C}_{j_2} \rangle _{\le b}\,\). Similar to costbounded reachability queries, we compute costbounded expected costs via computing (weighted) expected costs within epoch models.
Quantiles. A (multidimensional) quantile has the form \( Qu (\mathcal {P}^{ opt }_{M} (e) \sim p)\) for \( opt \in \{\,\mathrm {min}, \mathrm {max}\,\}\), \({\sim } \in \{<, \le , >, \ge \}\), \(e=\bigwedge _{i=1}^{\!n \in \mathbb {N}} (\langle \mathrm {C}_{j_i} \rangle _{\sim _i b_i}\, G_i)\) and a fixed probability threshold \(p \in [0,1]\). The quantile asks for the set of bound values \(\mathcal {B}\) that satisfy the probability threshold, i.e., \(\mathcal {B} = \{\langle b_1 \dots , b_n \rangle \mid \mathcal {P}^{ opt }_{M} (e) \sim p \}\). The computation of quantiles for singlecost bounded reachability has been discussed in [3, 34], where multiple cost bounds are supported via unfolding. Unfolding requires to fix bound values \(b_2, \dots , b_n\) a priori, and one can only ask for all \(b_1\) that satisfy the property. Our approach provides the basis for lifting the ideas of [3, 34] to multibounded queries. Roughly, one extends the epoch sequence \({\mathbb {E}}\) in Algorithm 1 dynamically until the epochs in which the bounded reachability probability passes the threshold p are explored. Additional steps such as detecting the case where \(\mathcal {B} = \emptyset \) are left for future work.
4 Visualisations
The results of a multiobjective model checking analysis are typically presented as a single (approximation of a) Pareto curve. For more than two objectives, the performance of the Paretooptimal scheduler can be displayed in a bar chart as in Fig. 4, where the colours reflect different objectives and the groups different schedulers. The aim is to visualise the tradeoffs between the different objectives such that the user can make an informed decision about the system design or pick a scheduler for implementation. However, Pareto set visualisations alone may not provide sufficient information, about, e.g., which objectives are aligned or conflicting (see e.g. [39] for a discussion in the nonprobabilistic case). Cost bounds furthermore add an extra dimension for each cost structure. Consider the Mars rover MDP \(M_r\) and tradeoff \( multi \big ({ obj _{100}, obj _{140}}\big ) \) with
where \( B \) is the set of states where the rover has safely returned to its base. We ask for the tradeoff between performing experiments of scientific value at least 100 before returning to base within 175 time units and maximum energy consumption of 100 units (\( obj _{100}\)) vs. achieving the same with scientific value at least 140 (\( obj _{140}\)). The Pareto curve (Fig. 5(a)) shows the tradeoff between achieving \( obj _{100}\) and \( obj _{140}\). However, for each Paretooptimal scheduler, our method has implicitly computed the probabilities of the two objectives for all reachable epochs as well, i.e. for all bounds on the three quantities below the ones required in the tradeoff. We visualise this information for deep insights into the behaviour of each scheduler, its robustness w.r.t. the bounds, and its preferences for certain objectives depending on the remaining budget for each quantity.
We use plots as shown in Fig. 6. They can be generated in no extra runtime or memory since all required data is already computed implicitly. We restrict to twodimensional plots since they are easier to grasp than complex threedimensional visualisations. In each plot, we can thus show the relationship between three different quantities: one on the xaxis (x), one on the yaxis (y), and one encoded as the colour of the points (z, where we use blue for high values, red for low values, black for probability zero, and white for unreachable epochs). Yet our example tradeoff already contains five quantities: the probability for \( obj _{100}\), the probability for \( obj _{140}\), the available time and energy to be spent, and the remaining scientific value to be accumulated. We thus need to project out some quantities. We do this by showing at every \(\langle \textit{x}, \textit{y} \rangle \) coordinate the maximum or minimum value of the z quantity when ranging over all reachable values of the hidden costs at this coordinate. That is, we show a best or worstcase situation, depending on the semantics of the respective quantities.
Out of the 30 possible combinations of quantities for our example, we showcase three to illustrate the added value of the obtained information. First, in Fig. 6(a), we plot the probabilities of the two objectives vs. the minimum scientific value that still needs to be accumulated for two different Paretooptimal schedulers (left: \(\mathfrak {S} _1\), right: \(\mathfrak {S} _2\)). White areas indicate that no epoch for the particular combination of probabilities is reachable from the tradeoff’s bounds. These two and all other Paretooptimal schedulers are white above the diagonal, which means that \( obj _{100}\) implies \( obj _{140}\), i.e. the objectives are aligned. For the left scheduler, we further see that all blueish areas are associated to lower probabilities for both objectives. Since blue indicates higher values, this scheduler achieves only low probabilities when it still needs to make the rover accumulate a high amount of value. However, it overall achieves higher probabilities for \( obj _{140}\) at medium value requirements, whereas the right scheduler is “safer” and focuses on satisfying \( obj _{100}\). The erratic spikes on the left occur because some probabilities are only reached after very unlikely paths.
In Fig. 6(b), we show for \(\mathfrak {S} _1\) the probability to achieve \( obj _{100}\) depending on the remaining energy to be spent vs. the remaining scientific value to be accumulated. We see a white vertical line for every odd xvalue; this is because, over all branches in the model, the gcd of all value costs is 2. The left plot shows the minimum probabilities over the hidden costs, i.e. we see the probability for the worstcase remaining time; the right plot shows the bestcase scenario. Not surprisingly, when time is low, only a lot of energy makes it possible to reach the objective with nonzero probability.
Finally, Fig. 6(c) shows the probability for \( obj _{140}\) depending on available time and energy for \(\mathfrak {S} _2\). We plot the minimum probability over the hidden scientific value requirement, i.e. a worstcase view. The plot shows that time is of little use in case of low remaining energy, but it helps significantly when there is sufficient energy, too. In Fig. 6(d), we depict for the same scheduler the minimum remaining scientific value (z) under which a certain probability for \( obj _{100}\) can be achieved (y), given a certain remaining time budget (x). The upper left corner shows that a high probability in little time is only achievable if we need to collect little more value; the value requirement gradually relaxes as we aim for lower probabilities or have more time.
5 Experiments
Implementation. We implemented the presented approach into Storm [20] v1.2, and available via [19]. The implementation computes extremal probabilities for singleobjective multicost bounded queries, as well as Pareto curves for the multiobjective case. We consider the sparse engine of Storm, i.e., explicit data structures such as sparse matrices. For singlecost bounded properties, this has already been addressed in [34]. For the computation of expected cost (Lines 10 to 12 of Algorithm 1) we employ interval iteration with finite precision floats as well as policy iteration with infinite precision rationals. The expected costs (lines 10 to 12 of Algorithm 1) are computed either numerically (via interval iteration over finite precision floats) or exactly (via policy iteration over infinite precision rationals). To reduce the memory consumption, the analysis result of an epoch model \(M^{\varvec{e}}_f\) is erased as soon as possible.
Setup & reproduction. We evaluate the approach on wide range of case studies, available in the artefact [30]. The models are given in Prism’s [37] guarded command language. For each case study we consider single and multiobjective queries that yield nontrivial results, i.e., probabilities strictly between zero and one. We compare the naive unfolding approach (UNF) as in Sect. 3.1 with the sequential approach (SEQ) as in Sect. 3.2. The unfolding of the model is applied on the Prism language level, by considering a parallel composition with cost counting structures. On the unfolded model we apply the algorithms for unbounded reachability as available in Storm. We considered precision \({\eta }= 10^{4}\) for the Pareto curve approximation and precision \({\varepsilon }= 10^{6}\) for interval iteration. We increased the precision for single epoch models as in Theorem 2.
We ran our experiments on a single core (2 GHz) of a HP BL685C G7 system with 192 GB of memory. We stopped each experiment after a time limit of 2 hours. For experiments that completed within the time limit, we observed a memory consumption of up to 36 GB for UNF and up to 5 GB for SEQ.
A binary equivalent to the binary we used for the experiments is available in the artefact [30]. The binary has been tested in the artefact evaluation VM [31]. For other configurations, Storm should be recompiled using the sources [19].
Details on reproduction of the tables, as well as details on how to analyse multicost bounded properties using Storm in general can be found in the readme, enclosed in the artefact.
Experimental Results. Tables 1 and 2 show results for single and multiobjective queries, respectively. The first columns yield the number of states and transitions of the original MDP, then for the query, the number of bounds \({m}\), the number of different cost structures r, and the number of reachable cost epochs (reflecting the magnitude of the bound values). \(S _ unf \) denotes the number of reachable states in the unfolding. For multiobjective queries, we additionally give the number of objectives and the number of analysed weight vectors \(\varvec{w}\). The remaining columns depict the runtimes of the different approaches in seconds. For UNF, we considered both the sparse (sp) and symbolic (dd) engine of Storm. The symbolic engine neither supports multiobjective model checking nor exact policy iteration.
On the majority of benchmarks, SEQ performs better than UNF. Typically, SEQ is less sensitive to increases in the magnitude of the cost bounds, as illustrated in Fig. 7. For three benchmark and query instances, we plot the runtime of both approaches against different numbers \(\mathbb {E}\) of reachable epochs. While for small cost bounds, UNF is sometimes even faster compared to SEQ, SEQ scales better with increasing \(\mathbb {E}\). It is not surprising that SEQ scales better, ultimately, the increased state space and the accompanying memory consumption in UNF is a bottleneck. The most important reason that UNF performs better for some (smaller) cost bounds is the induced overhead of checking the full epoch. In particular, the epoch contains (often many) states that are not reachable from the initial state (in the unfolding).
6 Conclusion
Many realworld planning problems consider several limited resources and contain tradeoffs. This paper present a practically efficient approach to analyse these problems. It has been implemented in the Storm model checker and shows significant performance benefits. The algorithm implicitly computes a large amount of information that is hidden in the standard plots of Pareto curves shown to visualise the results of a multiobjective analysis. We have developed a new set of visualisations that exploit all the available data to provide new and clear insights to decision makers even for problems with many objectives and cost dimensions.
Data Availability Statement. The datasets analysed during the current study, and the binary used for the analysis, are available in the figshare repository [30]. Source code matching the binary is available in [19].
Notes
 1.
This supersedes a restriction of the algorithm of [24].
References
The International Probabilistic Planning Competition, http://www.icapsconference.org/index.php/Main/Competitions
Andova, S., Hermanns, H., Katoen, J.P.: Discretetime rewards modelchecked. In: Larsen, K.G., Niebert, P. (eds.) FORMATS 2003. LNCS, vol. 2791, pp. 88–104. Springer, Heidelberg (2004). https://doi.org/10.1007/9783540409038_8
Baier, C., Daum, M., Dubslaff, C., Klein, J., Klüppelholz, S.: Energyutility quantiles. In: Badger, J.M., Rozier, K.Y. (eds.) NFM 2014. LNCS, vol. 8430, pp. 285–299. Springer, Cham (2014). https://doi.org/10.1007/9783319062006_24
Baier, C., Klein, J., Klüppelholz, S., Wunderlich, S.: Maximizing the conditional expected reward for reaching the goal. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10206, pp. 269–285. Springer, Heidelberg (2017). https://doi.org/10.1007/9783662545805_16
Baier, C., Klein, J., Leuschner, L., Parker, D., Wunderlich, S.: Ensuring the reliability of your model checker: interval iteration for Markov decision processes. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017, Part I. LNCS, vol. 10426, pp. 160–180. Springer, Cham (2017). https://doi.org/10.1007/9783319633879_8
Barrett, L., Narayanan, S.: Learning all optimal policies with multiple criteria. In: ICML. AICPS, vol. 307, pp. 41–47. ACM (2008)
Berthon, R., Randour, M., Raskin, J.F.: Threshold constraints with guarantees for parity objectives in Markov decision processes. In: ICALP. LIPIcs, vol. 80, pp. 121:1–121:15. Schloss Dagstuhl  LeibnizZentrum fuer Informatik (2017)
Brázdil, T., Brozek, V., Chatterjee, K., Forejt, V., Kucera, A.: Two views on multiple meanpayoff objectives in Markov decision processes. LMCS 10(1) (2014)
Brázdil, T., Chatterjee, K., Chmelík, M., Forejt, V., Křetínský, J., Kwiatkowska, M., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. In: Cassez, F., Raskin, J.F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 98–114. Springer, Cham (2014). https://doi.org/10.1007/9783319119366_8
Brázdil, T., Chatterjee, K., Forejt, V., Kucera, A.: Trading performance for stability in Markov decision processes. J. Comput. Syst. Sci. 84, 144–170 (2017)
Bresina, J.L., Jónsson, A.K., Morris, P.H., Rajan, K.: Activity planning for the mars exploration rovers. In: ICAPS, pp. 40–49. AAAI (2005)
Bryce, D., Cushing, W., Kambhampati, S.: Probabilistic planning is multiobjective. Technical report, Arizona State Univ., CSE (2007)
Cao, Z., Guo, H., Zhang, J., Oliehoek, F.A., Fastenrath, U.: Maximizing the probability of arriving on time: a practical qlearning method. In: AAAI, pp. 4481–4487. AAAI Press (2017)
Chatterjee, K., Chmelik, M., Gupta, R., Kanodia, A.: Optimal cost almostsure reachability in POMDPs. Artif. Intell. 234, 26–48 (2016)
Chatterjee, K., Majumdar, R., Henzinger, T.A.: Markov decision processes with multiple objectives. In: Durand, B., Thomas, W. (eds.) STACS 2006. LNCS, vol. 3884, pp. 325–336. Springer, Heidelberg (2006). https://doi.org/10.1007/11672142_26
Chen, T., Forejt, V., Kwiatkowska, M., Simaitis, A., Wiltsche, C.: On stochastic games with multiple objectives. In: Chatterjee, K., Sgall, J. (eds.) MFCS 2013. LNCS, vol. 8087, pp. 266–277. Springer, Heidelberg (2013). https://doi.org/10.1007/9783642403132_25
Cheng, L., Subrahmanian, E., Westerberg, A.W.: Multiobjective decision processes under uncertainty: applications, problem formulations, and solution strategies. Ind. Eng. Chem. Res. 44(8), 2405–2415 (2005)
Christman, A., Cassamano, J.: Maximizing the probability of arriving on time. In: Dudin, A., De Turck, K. (eds.) ASMTA 2013. LNCS, vol. 7984, pp. 142–157. Springer, Heidelberg (2013). https://doi.org/10.1007/9783642394089_11
Dehnert, C., Junges, S., Katoen, J.P., Quatmann, T., Volk, M.: Storm source files. zenodo (2018), https://doi.org/10.5281/zenodo.1181896
Dehnert, C., Junges, S., Katoen, J.P., Volk, M.: A Storm is coming: a modern probabilistic model checker. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017, Part II. LNCS, vol. 10427, pp. 592–600. Springer, Cham (2017). https://doi.org/10.1007/9783319633909_31
Eastwood, R., Alexander, R., Kelly, T.: Safe multiobjective planning with a posteriori preferences. In: HASE, pp. 78–85. IEEE Computer Society (2016)
Etessami, K., Kwiatkowska, M., Vardi, M.Y., Yannakakis, M.: Multiobjective model checking of Markov decision processes. LMCS 4(4) (2008)
Feng, L., Wiltsche, C., Humphrey, L., Topcu, U.: Controller synthesis for autonomous systems interacting with human operators. In: ICCPS, pp. 70–79. ACM (2015)
Forejt, V., Kwiatkowska, M., Parker, D.: Pareto curves for probabilistic model checking. In: Chakraborty, S., Mukund, M. (eds.) ATVA 2012. LNCS, pp. 317–332. Springer, Heidelberg (2012). https://doi.org/10.1007/9783642333866_25
Forejt, V., Kwiatkowska, M., Norman, G., Parker, D.: Automated verification techniques for probabilistic systems. In: Bernardo, M., Issarny, V. (eds.) SFM 2011. LNCS, vol. 6659, pp. 53–113. Springer, Heidelberg (2011). https://doi.org/10.1007/9783642214554_3
Haddad, S., Monmege, B.: Reachability in MDPs: refining convergence of value iteration. In: Ouaknine, J., Potapov, I., Worrell, J. (eds.) RP 2014. LNCS, vol. 8762, pp. 125–137. Springer, Cham (2014). https://doi.org/10.1007/9783319114392_10
Hahn, E.M., Hartmanns, A.: A comparison of time and rewardbounded probabilistic model checking techniques. In: Fränzle, M., Kapur, D., Zhan, N. (eds.) SETTA 2016. LNCS, vol. 9984, pp. 85–100. Springer, Cham (2016). https://doi.org/10.1007/9783319476773_6
Hahn, E.M., Hashemi, V., Hermanns, H., Lahijanian, M., Turrini, A.: Multiobjective robust strategy synthesis for interval Markov decision processes. In: Bertrand, N., Bortolussi, L. (eds.) QEST 2017. LNCS, vol. 10503, pp. 207–223. Springer, Cham (2017). https://doi.org/10.1007/9783319663357_13
Hartmanns, A., Hermanns, H.: The Modest Toolset: an integrated environment for quantitative modelling and verification. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 593–598. Springer, Heidelberg (2014). https://doi.org/10.1007/9783642548628_51
Hartmanns, A., Junges, S., Katoen, J.P., Quatmann, T.: Evaluated artefact for this paper. figshare (2018), https://doi.org/10.6084/m9.figshare.5907349.v1
Hartmanns, A., Wendler, P.: Artefact vm. figshare (2018), https://doi.org/10.6084/m9.figshare.5896615
Hou, P., Yeoh, W., Varakantham, P.: Revisiting risksensitive MDPs: new algorithms and results. In: ICAPS. AAAI (2014)
Junges, S., Jansen, N., Dehnert, C., Topcu, U., Katoen, J.P.: Safetyconstrained reinforcement learning for MDPs. In: Chechik, M., Raskin, J.F. (eds.) TACAS 2016. LNCS, vol. 9636, pp. 130–146. Springer, Heidelberg (2016). https://doi.org/10.1007/9783662496749_8
Klein, J., Baier, C., Chrszon, P., Daum, M., Dubslaff, C., Klüppelholz, S., Märcker, S., Müller, D.: Advances in probabilistic model checking with PRISM: variable reordering, quantiles and weak deterministic Büchi automata. In: STTT, pp. 1–16 (2017)
Kolobov, A., Mausam, Weld, D.S.: A theory of goaloriented MDPs with dead ends. In: UAI, pp. 438–447. AUAI Press (2012)
Kwiatkowska, M., Norman, G., Parker, D.: The PRISM benchmark suite. In: QEST, pp. 203–204. IEEE CS Press (2012)
Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic realtime systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/9783642221101_47
Lacerda, B., Parker, D., Hawes, N.: Multiobjective policy generation for mobile robots under probabilistic timebounded guarantees. In: ICAPS, pp. 504–512. AAAI Press (2017)
Lankaites Pinheiro, R., LandaSilva, D., Atkin, J.: A technique based on tradeoff maps to visualise and analyse relationships between objectives in optimisation problems. J. MultiCriteria Decis. Anal. 24(1–2), 37–56 (2017)
Laroussinie, F., Sproston, J.: Model checking durational probabilistic systems. In: Sassone, V. (ed.) FoSSaCS 2005. LNCS, vol. 3441, pp. 140–154. Springer, Heidelberg (2005). https://doi.org/10.1007/9783540319825_9
Puterman, M.L.: Markov Decision Processes. Wiley, New York (1994)
Quatmann, T., Junges, S., Katoen, J.P.: Markov automata with multiple objectives. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017, Part I. LNCS, vol. 10426, pp. 140–159. Springer, Cham (2017). https://doi.org/10.1007/9783319633879_7
Randour, M., Raskin, J.F., Sankur, O.: Percentile queries in multidimensional Markov decision processes. FMSD 50(2–3), 207–248 (2017)
Roijers, D.M., Vamplew, P., Whiteson, S., Dazeley, R.: A survey of multiobjective sequential decisionmaking. J. Artif. Intell. Res. 48, 67–113 (2013)
Steinmetz, M., Hoffmann, J., Buffet, O.: Goal probability analysis in probabilistic planning: exploring and enhancing the state of the art. J. Artif. Intell. Res. 57, 229–271 (2016)
TeichteilKönigsbuch, F.: Stochastic safest and shortest path problems. In: AAAI. AAAI Press (2012)
Vamplew, P., Dazeley, R., Berry, A., Issabekov, R., Dekker, E.: Empirical evaluation methods for multiobjective reinforcement learning algorithms. Mach. Learn. 84(1–2), 51–80 (2011)
Yu, S.X., Lin, Y., Yan, P.: Optimization models for the first arrival target distribution function in discrete time. J. Math. Anal. Appl. 225(1), 193–223 (1998)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2018 The Author(s)
About this paper
Cite this paper
Hartmanns, A., Junges, S., Katoen, JP., Quatmann, T. (2018). Multicost Bounded Reachability in MDP. In: Beyer, D., Huisman, M. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2018. Lecture Notes in Computer Science(), vol 10806. Springer, Cham. https://doi.org/10.1007/9783319899633_19
Download citation
DOI: https://doi.org/10.1007/9783319899633_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 9783319899626
Online ISBN: 9783319899633
eBook Packages: Computer ScienceComputer Science (R0)