Multi-cost Bounded Reachability in MDP

Hartmanns, Arnd; Junges, Sebastian; Katoen, Joost-Pieter; Quatmann, Tim

doi:10.1007/978-3-319-89963-3_19

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10806))

Included in the following conference series:

International Conference on Tools and Algorithms for the Construction and Analysis of Systems

7293 Accesses
16 Citations

Abstract

We provide an efficient algorithm for multi-objective model-checking problems on Markov decision processes (MDPs) with multiple cost structures. The key problem at hand is to check whether there exists a scheduler for a given MDP such that all objectives over cost vectors are fulfilled. Reachability and expected cost objectives are covered and can be mixed. Empirical evaluation shows the algorithm’s scalability. We discuss the need for output beyond Pareto curves and exploit the available information from the algorithm to support decision makers.

This work is supported by the 3TU project “Big Software on the Run”, CDZ project CAP, and DFG RTG 2236 “UnRAVeL”.

You have full access to this open access chapter, Download conference paper PDF

Multi-cost Bounded Tradeoff Analysis in MDP

Article Open access 28 July 2020

The Complexity of Graph-Based Reductions for Reachability in Markov Decision Processes

Maximal Cost-Bounded Reachability Probability on Continuous-Time Markov Decision Processes

1 Introduction

Markov decision processes [41] (MDPs) with rewards or costs are a popular model to describe planning problems under uncertainty. Planning algorithms aim to find strategies which perform well (or even optimally) for a given objective. These algorithms typically assume that a goal is reached eventually [41, 45]. This however is unrealistic in many scenarios, e.g. due to insufficient resources or the possibility of failing actions. Furthermore, these policies often admit single runs which perform far below the user’s expectation, which is unsuitable in many scenarios with high stakes. Examples range from deliveries reaching an airport after the plane’s departure to more serious scenarios in e.g. wildfire management [1]. In particular, many scenarios call for minimising the probability to run out of resources before reaching the goal: while it is beneficial for a plane to reach its destination with low expected fuel consumption, it is essential to reach its destination with the fixed available amount of fuel.

Policies that optimise solely for the probability to reach a goal are mostly very expensive. Even in the presence of just a single cost structure, decision makers have to trade the success probability against the costs. This makes many planning problems inherently multi-objective [12, 17]. In particular, safety properties cannot be averaged out by good performance [21]. Planning scenarios in various application areas [44] have different resource constraints. Typical examples are energy consumption and time [11], or optimal expected revenue and time [38] in robotics, and monetary cost and available capacity in logistics [17].

Illustrative Example. Consider a simplified (discretised) version of the Mars rover task scheduling problem [11]. The task is to plan a variety of experiments for a day on Mars. The experiments vary in their success probability, time, energy consumption and their scientific value upon success. The time, energy consumption, and scientific value are uncertain and modelled by probability distributions, cf. Fig. 1(a). The objective is to achieve a minimum of daily scientific progress while limiting the risk of running out of time or out of energy. As the rover is expected to work for a longer period, we prefer a high expected scientific value.

Contributions and approach. This paper focuses on multi-objective cost-bounded reachability queries on MDPs, a natural setting for the aforementioned planning problems. The input is an MDP with multiple cost structures (e.g. energy, utility or time) and multiple objectives of the form “maximise/minimise the probability to reach a state in $G_i$ such that the cumulative cost for the i-th cost structure is below/above a threshold $b_i$”. This multi-objective variant of cost-bounded reachability is PSPACE-hard [43]. The focus of this paper is on the practical side: we aim at finding a practically efficient algorithm to obtain (an approximation of) the Pareto-optimal points. To accomplish this, we adapt and generalise recent approaches for the single-objective case [27, 34] towards the multi-objective setting. The basic idea of [27, 34] is to implicitly unfold the MDP along cost epochs, and exploit the regularities of the epoch-MDPs. Prism [37] and the Modest Toolset [29] have been updated with such methods for the single-objective case and significantly outperform the explicit unfolding approach of [2, 40]. This paper presents an algorithm that lifts this principle to multiple cost objectives and determines approximation errors when using value iteration. Extensions towards quantiles and expected costs are considered too. Evaluation using a prototypical implementation in Storm [20] shows promising results. In addition, we equip our algorithm with means to visualise (inspired by the recent techniques in [39]) the trade-offs between various objectives that go beyond Pareto curves; we believe that this is key to obtain better insights into multi-objective decision making. An example is given in Fig. 1(b): it depicts the probability to satisfy an objective based on the remaining energy (y-axis) and time (x-axis).

Related work. The analysis of single-objective (cost-bounded) reachability in MDPs is an active area of research in both AI and formal method communities, and referred to in, e.g., [18, 35, 48]. Various model checking approaches for single objectives exist. In [32], the topology of the unfolded MDP is exploited to speed up the value iteration. In [27], three different model checking approaches are explored and compared. A survey for heuristic approaches is given in [45]. A Q-learning based approach is described in [13]. An extension of this problem in the partially observable setting was considered in [14], and for probabilistic timed automata in [27]. The method from [4] computes optimal expected values under e.g. the condition that the goal is reached, and is thus applicable in settings where a goal is not necessarily reached. A similar problem is considered in [46]. For multi-objective analysis, the model checking community typically focuses on probabilities and expected costs as in the seminal works [15, 22]. Implementations are typically based on a value iteration approach in [24], and have been extended to stochastic games [16], Markov automata [42], and interval MDPs [28]. Other considered cases include e.g. multi-objective mean-payoff objectives [8], objectives over instantaneous costs [10], and parity objectives [7]. Multi-objective problems for MDPs with an unknown cost-function are considered in [33]. Surveys on multi-objective decision making in AI and machine learning can be found in [44] and [47], respectively.

2 Preliminaries

We write $2^{S} $ for the powerset of S. The i-th component of a tuple $\varvec{t} = \langle v_1, \dots , v_n \rangle $ is . A (discrete) probability distribution over a set $\varOmega $ is a function $\mu \in \varOmega \rightarrow [0, 1]$ such that is countable and $\sum _{\omega \in \mathrm {support}({\mu })}{\mu (\omega )} = 1$. $\mathrm {Dist}({\varOmega }) $ is the set of all probability distributions over $\varOmega $. $\mathcal {D}(s) $ is the Dirac distribution for s, defined by $\mathcal {D}(s) (s) = 1$.

Definition 1

A Markov decision process (MDP) with m cost structures is a triple $M =\langle S, T, s_ init \rangle $ where $S $ is a finite set of states, $T \in S \rightarrow 2^{\mathrm {Dist}({\mathbb {N} ^{m} \times S})} $ is the transition function, and $s_ init \in S $ is the initial state. For all $s \in S $, we require that $T (s)$ is finite and non-empty.

We write for $\exists \,\mu \in T (s)$ and call it a transition. We write if additionally $\langle \varvec{c},s' \rangle \in \mathrm {support}({\mu }) $. $\langle \varvec{c},s' \rangle $ is a branch with cost vector $\varvec{c}$. If $T $ is clear from the context, we just write . Graphically, transitions are lines to a node from which branches labelled with their probability and costs lead to successor states. We may omit the node and probability for transitions into Dirac distributions.

Example 1

Figure 2 shows an MDP $M_ ex $. From the initial state $s_0$, the choice of going towards $s_1$ or $s_2$ is nondeterministic. Either way, the probability to return to $s_0$ is 0.5, otherwise we move to $s_1$ (or $s_2$). $M_ ex $ has two cost structures: Failing to move to $s_1$ has a cost of 1 for the first, and 2 for the second structure. Moving to $s_2$ yields cost 2 for the first and no cost for the second structure.

In the remainder of this paper, we fix a given MDP $M = \langle S, T, s_ init \rangle $. Its semantics is captured by the notion of paths. A path in M represents the infinite concrete resolution of both nondeterministic and probabilistic choices: $\pi = s_0\, \mu _0\, \varvec{c}_0\, s_1\, \mu _1\, \varvec{c}_1 \dots $ where $s_i \in S $, , and $\langle \varvec{c}_i, s_{i+1} \rangle \in \mathrm {support}({\mu _i}) $ for all $i \in \mathbb {N} $. A finite path $\pi _\mathrm {fin} = s_0\, \mu _0\, \varvec{c}_0\, s_1\, \mu _1\, \varvec{c}_1\, s_2 \dots \mu _{n-1}\, \varvec{c}_{n-1}\, s_n$ is a finite prefix of a path with . Let . $\mathrm {Paths}_\mathrm {fin}(M)$ ($\mathrm {Paths}(M)$) are the set of all (in)finite finite paths starting in $s_ init $. A scheduler (adversary, policy or strategy) resolves nondeterministic choices:

Definition 2

$\mathfrak {S} \in \mathrm {Paths}_\mathrm {fin}(M) \rightarrow \mathrm {Dist}({\mathrm {Dist}({\mathbb {N} ^m \times S})}) $ is a scheduler for M if . The set of all schedulers of M is $\mathrm {Sched} (M)$. $\mathfrak {S}$ is deterministic if $|\mathrm {support}({\mathfrak {S} (\pi )}) | = 1$ for all finite paths $\pi $.

Via the standard cylinder set construction [25], a scheduler $\mathfrak {S} $ induces a probability measure $\mathcal {P}^{\mathfrak {S}}_{M} $ on measurable sets of paths starting from $s_ init $. We define the extremal values $\mathcal {P}^{\mathrm {max}}_{M} (\varPi ) = \sup _{\mathfrak {S} \in \mathrm {Sched} (M)}\mathcal {P}^{\mathfrak {S}}_{M} (\varPi )$ and $\mathcal {P}^{\mathrm {min}}_{M} (\varPi ) = \inf _{\mathfrak {S} \in \mathrm {Sched} (M)}\mathcal {P}^{\mathfrak {S}}_{M} (\varPi )$ for measurable $\varPi \subseteq \mathrm {Paths}(M)$. For clarity, we focus on probabilities in this paper, but note that expected accumulated costs can be defined analogously [25] and our methods apply to them with only minor changes.

Cost-Bounded Reachability. We are interested in the probabilities of sets of paths that reach certain goal states within multiple cost bounds:

Definition 3

A cost bound is given by $\langle \mathrm {C}_{j} \rangle _{\sim b}\, G$ where $j \in \{1, \dots , m\}$ identifies a cost structure, ${\sim } \in \{<,\le ,>,\ge \}$, $b \in \mathbb {N} $ is a bound value, and $G \subseteq S$ is a set of goal states. A cost-bounded reachability formula is a conjunction $\bigwedge _{i=1}^{\!n \in \mathbb {N}} (\langle \mathrm {C}_{j_i} \rangle _{\sim _i b_i}\, G_i)$ of cost bounds. It characterises the measurable set of paths $\varPi $ where, for every i, every $\pi \in \varPi $ has a prefix $\pi _\mathrm {fin}^i$ with $\mathrm {last}(\pi _\mathrm {fin}^i) \in G_i$ and $\mathrm {cost}_{j_i}(\pi _\mathrm {fin}^i) \sim _{i} b_i$.

A (single-objective) multi-cost bounded reachability query asks for $\mathcal {P}^{ opt }_{M} (e)$ where $ opt \in \{\, \mathrm {max}, \mathrm {min} \,\}$ and e is a cost-bounded reachability formula. Unbounded and step-bounded reachability are special cases of cost-bounded reachability. A single-objective query may contain multiple bounds, but asks for a single scheduler that optimises the probability of satisfying them all.

We also consider multi-objective tradeoffs, i.e. sets of single-objective queries written as $\varPhi = multi \big ({\mathcal {P}^{ opt_1 }_{M} ({e}_1), \dots , \mathcal {P}^{ opt_{\ell } }_{M} ({e}_{\ell })}\big ) $. We call the $e_k$ objectives. For tradeoffs, we are interested in the Pareto curve $ Pareto (M, \varPhi )$ which consists of all achievable probability vectors $\varvec{p}_{\mathfrak {S}} = \langle \mathcal {P}^{\mathfrak {S}}_{M} ({e}_1), \dots , \mathcal {P}^{\mathfrak {S}}_{M} ({e}_\ell ) \rangle $ for $\mathfrak {S} \in \mathrm {Sched} (M)$ that are not dominated by another achievable vector $\varvec{p}_{\mathfrak {S} '}$. More precisely, $\varvec{p}_{\mathfrak {S}} \in Pareto (M,\varPhi )$ iff for all $\mathfrak {S} ' \in \mathrm {Sched} (M)$ either $\varvec{p}_{\mathfrak {S}} = \varvec{p}_{\mathfrak {S} '} $ or for some $i\in \{1, \dots , \ell \}$ we have $( opt_i = \mathrm {max} \wedge {\varvec{p}_{\mathfrak {S}}}[{i}] > {\varvec{p}_{\mathfrak {S} '}}[{i}]) \vee ( opt_i = \mathrm {min} \wedge {\varvec{p}_{\mathfrak {S}}}[{i}] < {\varvec{p}_{\mathfrak {S} '}}[{i}])$.

Example 2

We consider $\varPhi = multi \big ({\mathcal {P}^{\mathrm {max}}_{M_ ex } (\langle \mathrm {C}_{1} \rangle _{\le 1}\, \{ s_1\}), \mathcal {P}^{\mathrm {max}}_{M_ ex } (\langle \mathrm {C}_{2} \rangle _{\le 3}\, \{ s_2 \})}\big ) $ for $M_ ex $ of Fig. 2. Let $\mathfrak {S} _j$ be the scheduler that tries to move to $s_1$ for at most j attempts and afterwards moves to $s_2$. The induced probability vectors $\varvec{p}_{\mathfrak {S} _1} = \langle 0.5,1 \rangle $ and $\varvec{p}_{\mathfrak {S} _2} = \langle 0.75, 0.75 \rangle $ both lie on the Pareto curve since no $\mathfrak {S} \in \mathrm {Sched} (M_ ex )$ induces (strictly) larger probabilities $\varvec{p}_{\mathfrak {S}}$. By also considering schedulers that randomise between the choices of $\mathfrak {S} _1$ and $\mathfrak {S} _2$ we obtain $ Pareto (M_ ex , \varPhi ) = \{ w \cdot \varvec{p}_{\mathfrak {S} _1} + (1{-}w) \cdot \varvec{p}_{\mathfrak {S} _2} \mid w \in [0,1] \}$.

For clarity of presentation, we restrict to tradeoffs $\varPhi $ where every cost structure occurs exactly once, i.e., the number ${m}$ of cost structures of M matches the number of cost bounds occurring in $\varPhi $. Furthermore, we require that none of the sets of goal states contains the initial state. Both assumptions are w.l.o.g. by copying cost structures as needed and adding a new initial state with zero-cost transition to the old initial state.

3 Multi-dimensional Sequential Value Iteration

We present a practically efficient approach to compute (an approximation of) the Pareto curve for MDP M with ${m}$ cost structures and tradeoff $\varPhi $. We merge the ideas of [24] to approximate a Pareto curve for an (unbounded) multi-objective tradeoff with those of [27, 34] to efficiently compute (single-objective) cost-bounded reachability probabilities. For clarity of presentation we start with the upper-bounded maximum case and assume a tradeoff of the form $\varPhi = multi \big ({\mathcal {P}^{\mathrm {max}}_{M} ({e}_1), \dots , \mathcal {P}^{\mathrm {max}}_{M} ({e}_{\ell })}\big ) $ with ${e}_k = \bigwedge _{i=n_{k-1}}^{\!n_k-1}(\langle \mathrm {C}_{{i}} \rangle _{\le b_{i}}\, G_{i})$ and $0=n_0< n_1< \dots < n_{{\ell }} = {m}$. Other variants are discussed in Sect. 3.3.

Cost epochs and goal satisfaction. Central to our approach is the concept of cost epochs. Consider the path $\pi = (s_0\langle 2,0 \rangle s_2\langle 0,0 \rangle s_0\langle 1,2 \rangle )^{\omega }$ through $M_ ex $ of Fig. 2. We plot the accumulated cost in both dimensions along this path in Fig. 3(a). Starting from $\langle 0,0 \rangle $, the first transition yields cost 2 for the first cost structure: we jump to coordinate $\langle 2,0 \rangle $. The next transition, back to $s_0$, has no cost, so we stay at $\langle 2,0 \rangle $. Finally, the failed attempt to move to $s_1$ incurs costs $\langle 1, 2 \rangle $. Consequently, for an infinite path, infinitely many points in this grid may be reached. However, a tradeoff specifies bound values for the costs, e.g., for $\varPhi _ ex = multi \big ({\mathcal {P}^{\mathrm {max}}_{M_ ex } (\langle \mathrm {C}_{1} \rangle _{\le 4}\, \{ s_1\}), \mathcal {P}^{\mathrm {max}}_{M_ ex } (\langle \mathrm {C}_{2} \rangle _{\le 3}\, \{ s_2 \})}\big ) $ we get bound values 4 and 3. Once the bound value for a bound is reached, accumulating further costs in this dimension does not impact the satisfaction of its formula. It thus suffices to keep track, for each bound, of the remaining costs before reaching the bound value. This leads to a finite grid as depicted in Fig. 3(b). We refer to each of its coordinates as a cost epoch:

Definition 4

An ${m}$-dimensional cost epoch is a tuple in . For ${\varvec{e}}\in \varvec{E}_{{m}}$, $\varvec{c} \in \mathbb {N} ^{m}$, the successor epoch is if that value is non-negative and $\bot $ otherwise.

If the entry for a bound is $\bot $, it cannot be satisfied any more: too much costs have already been incurred. To check whether an objective ${e}_k = \bigwedge _{i=n_{k-1}}^{\!n_k-1}(\langle \mathrm {C}_{{i}} \rangle _{\le b_{i}}\, G_{i})$ is satisfied, we memorise whether each individual bound already holds. This is also used to ensure that satisfying a bound more than once has no effect.

Definition 5

A goal satisfaction represents the cost structure indices i for which bound $\langle \mathrm {C}_{i} \rangle _{\le b_i}\,G_i$ already holds, i.e. $G_i$ was reached before the bound value $b_i$. For ${\varvec{g}}\in {\varvec{G}_{{m}}}$, ${\varvec{e}}\in \varvec{E}_{{m}}$ and $s\in S $, let ${ succ ({\varvec{g}}, s, {\varvec{e}})} \in {\varvec{G}_{{m}}}$ define the update upon reaching s: ${{ succ ({\varvec{g}}, s, {\varvec{e}})}}[{i}] = 1$ if $s \in G_i \wedge {{\varvec{e}}}[{i}] \ne \bot $ and ${{ succ ({\varvec{g}}, s, {\varvec{e}})}}[{i}] = {{\varvec{g}}}[{i}] $ otherwise.

3.1 The Unfolding Approach

$ Pareto (M, \varPhi )$ can be computed by reducing $\varPhi $ to a multi-objective unbounded reachability problem on the unfolded MDP. Its states are the Cartesian product of the original MDP’s states, the epochs, and the goal satisfactions:

Definition 6

The unfolding for M as in Definition 1 and upper-bounded maximum tradeoff $\varPhi $ is the MDP with no cost structures, and the unfolding of probability distribution $\mu $ defined by $ unf (\mu )(\langle \langle s', {\varvec{e}}', {\varvec{g}}' \rangle \rangle ) = \mu (\langle \varvec{c}, s' \rangle )$ if ${\varvec{e}}' = { succ ({\varvec{e}}, \varvec{c})} \wedge {\varvec{g}}' = { succ ({\varvec{g}}, s', {\varvec{e}}')}$ and 0 otherwise.

Costs are now encoded in the state space, so it suffices to consider the unbounded tradeoff $\varPhi ' = multi \big ({\mathcal {P}^{\mathrm {max}}_{{M_{ unf }}} (e'_1), \dots , \mathcal {P}^{\mathrm {max}}_{{M_{ unf }}} (e'_{\ell })}\big ) $ with ${e}'_k = \langle \cdot \rangle _{\ge 0}\,G'_k$ and $G'_k = \{ \langle s, {\varvec{e}}, {\varvec{g}} \rangle \mid \bigwedge _{i=n_{k-1}}^{\!n_k-1} {{\varvec{g}}}[{i}] = 1 \}$.

Lemma 1

There is a bijection $f :\mathrm {Sched} (M) \rightarrow \mathrm {Sched} ({M_{ unf }})$ with $\mathcal {P}^{\mathfrak {S}}_{M} (e_k) = \mathcal {P}^{f(\mathfrak {S})}_{{M_{ unf }}} (e'_k)$ for all $\mathfrak {S} \in \mathrm {Sched} (M)$ and $k \in \{\,1, \dots , {\ell }\,\}$. Consequently, we have that $ Pareto (M, \varPhi ) = Pareto ({M_{ unf }}, \varPhi ')$.

$ Pareto ({M_{ unf }}, \varPhi ')$ can be computed with existing multi-objective model checking algorithms for unbounded reachability. We build on the one of [24]. It iteratively chooses weight vectors $\varvec{w} = \langle w_1, \dots , w_{\ell } \rangle \in [0,1]^{\ell }\setminus \{ \mathbf {0} \}$ and computes points

$$\begin{aligned} \varvec{p}_{\varvec{w}} = \langle \mathcal {P}^{\mathfrak {S}}_{{M_{ unf }}} \!(e'_1), \ldots , \mathcal {P}^{\mathfrak {S}}_{{M_{ unf }}} \!(e'_{\ell }) \rangle ~ \text {with} ~ \mathfrak {S} \in {\text{ arg } \text{ max }}_{\mathfrak {S} '}\!\!\left( \sum \nolimits _{k=1}^{{\ell }} w_k \cdot \mathcal {P}^{\mathfrak {S} '}_{{M_{ unf }}} (e'_k) \right) \!\text {.} \end{aligned}$$

(1)

The Pareto curve $\varvec{P}$ is convex, $\varvec{p}_{\varvec{w}} \in \varvec{P}$ for all $\varvec{w}$, and $\varvec{q} \in \varvec{P}$ implies $\varvec{q} \cdot \varvec{w} \le \varvec{p}_{\varvec{w}} \cdot \varvec{w}$. These observations allow us to approximate the Pareto curve with arbitrary precision; see [24] for details. [24] characterises $\varvec{p}_{\varvec{w}}$ via weighted expected costs: ${M_{ unf }}$ is equipped with ${\ell }$ cost structures used to calculate the probability of each of the ${\ell }$ objectives. This is achieved by setting the value of the k-th cost structure on each branch to 1 iff the objective ${e}_k'$ is satisfied in the target state of the branch but was not satisfied in the transition’s source state. On a path $\pi $ through the resulting model ${M^{+}_{ unf }}$, we collect exactly one cost w.r.t. cost structure k iff $\pi $ satisfies objective ${e}_k$.

Definition 7

For $\mathfrak {S} \in \mathrm {Sched} ({M^{+}_{ unf }})$ and $\varvec{w} \in [0,1]^{\ell }$, the weighted expected cost is $\mathcal {E}^{\mathfrak {S}}_{{M^{+}_{ unf }}} (\varvec{w}) = \sum _{k=1}^{\ell }{\varvec{w}}[{k}] \cdot \int _{\pi \in \mathrm {Paths}(M)} \mathrm {cost}_{k}(\pi ) {\mathop {}\!\mathrm {d}}\mathcal {P}^{\mathfrak {S}}_{{M^{+}_{ unf }}} (\pi )$, i.e. the expected value of the weighted sum of the costs accumulated on paths in ${M^{+}_{ unf }}$.

The following characterisation of $\varvec{p}_{\varvec{w}}$ is equivalent to Eq. 1:

$$\begin{aligned} \varvec{p}_{\varvec{w}} = \langle \mathcal {E}^{\mathfrak {S}}_{{M^{+}_{ unf }}} (\varvec{1}_1), \dots , \mathcal {E}^{\mathfrak {S}}_{{M^{+}_{ unf }}} (\varvec{1}_{\ell }) \rangle \quad \text {where} \quad \mathfrak {S} \in {\text{ arg } \text{ max }}_{\mathfrak {S} '} \mathcal {E}^{\mathfrak {S} '}_{{M^{+}_{ unf }}} (\varvec{w}) \end{aligned}$$

(2)

and $\varvec{1}_k \in \{0,1\}^{\ell }$ is the weight vector defined by ${\varvec{1}_k}[{j}] = 1$ iff $j=k$. Standard MDP model checking algorithms [41] can be applied to compute an optimal (deterministic and memoryless) scheduler $\mathfrak {S} $ and the induced costs $\mathcal {E}^{\mathfrak {S}}_{{M^{+}_{ unf }}} (\varvec{1}_k)$.

3.2 An Epoch Model Approach Without Unfolding

The unfolding approach does not scale well: If the original MDP has n states, the unfolding will have on the order of $n \cdot \prod _{i=1}^m (b_i + 2)$ states. This makes it infeasible for larger bound values $b_i$ over multiple bounds. The bottleneck lies in computing the points $\varvec{p}_{\varvec{w}}$ as in Eqs. 1 and 2. We now show how to do so efficiently, i.e. given a weight vector $\varvec{w} = \langle w_1, \dots , w_{\ell } \rangle \in [0,1]^{\ell }\setminus \{ \mathbf {0} \}$, compute

$$\begin{aligned} \varvec{p}_{\varvec{w}} = \langle \mathcal {P}^{\mathfrak {S}}_{M} (e_1), \dots , \mathcal {P}^{\mathfrak {S}}_{M} (e_{\ell }) \rangle ~ \text {with} ~ \mathfrak {S} \in {\text{ arg } \text{ max }}_{\mathfrak {S} '} \!\!\left( \sum \nolimits _{k=1}^{{\ell }} w_i \cdot \mathcal {P}^{\mathfrak {S} '}_{M} (\langle \cdot \rangle _{\ge 0}\,e_k) \right) \end{aligned}$$

(3)

without unfolding. The characterisations of $\varvec{p}_{\varvec{w}}$ given in Eqs. 1 and 3 are equivalent due to Lemma 1.

The efficient analysis of single-objective queries with a single bound $\varPhi _1 = \mathcal {P}^{\mathrm {max}}_{M} (\langle \mathrm {C}_{} \rangle _{\le b}\, G)$ has recently been addressed in e.g. [27, 34]. The key observation is that the unfolding ${M_{ unf }}$ can be decomposed into $b + 2$ epoch model MDPs $M^{b}, \dots , M^{0}, M^{\bot }$ corresponding to the cost epochs. The epoch models are copies of M with only slight adaptations. Reachability probabilities in copies corresponding to epoch i only depend on the copies $\{\, M^{j}~|~j \le i \vee j = \bot \,\}$. It is thus possible to analyse $M^{\bot }, \dots , M^{b}$ sequentially instead of considering all copies at once. In particular, it is not necessary to construct the full unfolding.

We lift this idea to multi-objective tradeoffs. The single-objective case is notably simpler in that reaching a goal state for the first time or exceeding the cost bound immediately suffices to determine whether the one property is satisfied. In particular, while $M^{\bot }$ is just one sink state in the single-objective case, its structure is more involved here.

We first formalise the notion of epoch models for multiple bounds. The aim is to build an MDP for each epoch ${\varvec{e}}\in \varvec{E}_{{m}}$ that can be analysed via standard model checking techniques using the weighted expected cost encoding of objective probabilities. The state space of an epoch model consists of up to one copy of each original state for each goal satisfaction vector ${\varvec{g}}\in {\varvec{G}_{{m}}}$. Additional sink states $\langle s_\bot , {\varvec{g}} \rangle $ encode the target for a jump to any other cost epoch ${\varvec{e}}' \ne {\varvec{e}}$. We consider ${\ell }$ cost structures to encode the objective probabilities. Let function ${ satObj _\varPhi }:{\varvec{G}_{{m}}} \times {\varvec{G}_{{m}}} \rightarrow \{0,1\}^{\ell }$ assign value 1 in entry k iff a reachability property ${e}_k$ is satisfied according to the second goal vector but was not satisfied in the first. For the transitions’ branches, we distinguish two cases: (1) If the successor epoch $ {\varvec{e}}' = { succ ({\varvec{e}},\varvec{c})}$ with respect to the original cost $\varvec{c} \in \mathbb {N} ^{m}$ is the same as the current epoch ${\varvec{e}}$, we jump to the successor state as before, and update the goal satisfaction. We collect the new costs for the objectives if updating the goal satisfaction newly satisfies an objective as given by ${ satObj _\varPhi }$ (2). If the successor epoch ${\varvec{e}}' = { succ ({\varvec{e}},\varvec{c})}$ is different from the current epoch ${\varvec{e}}$, the probability is rerouted to the sink state with the corresponding goal state satisfaction vector. The collected costs contains the part of the goal satisfaction as in (1), but also the results obtained by analysing the reached epoch ${\varvec{e}}'$, given by a function f.

Definition 8

The epoch model of MDP M as in Definition 1 for ${\varvec{e}}\in \varvec{E}_{{m}}$ and a function $f :{\varvec{G}_{{m}}} \times \mathrm {Dist}({\mathbb {N} ^m \times S}) \rightarrow [0,1]^{\ell }$ is the MDP $M^{\varvec{e}}_f =\langle S ^{\varvec{e}}, T ^{\varvec{e}}_f, \langle s_ init , \varvec{0} \rangle \rangle $ with ${\ell }$ cost structures, , $T ^{\varvec{e}}_f(\langle s_\bot , {\varvec{g}} \rangle ) = \{\,\mathcal {D}(\langle \varvec{0}, \langle s_\bot , {\varvec{g}} \rangle \rangle ) \,\}$, and for every ${\tilde{s}}= \langle s, {\varvec{g}} \rangle \in S ^{\varvec{e}}$ and $\mu \in T (s)$, there is some $\nu \in T ^{\varvec{e}}_f({\tilde{s}})$ defined by:

1.
$\nu (\langle { satObj _\varPhi }({\varvec{g}}, {\varvec{g}}'), \langle s', {\varvec{g}}' \rangle \rangle ) = \mu (\varvec{c}, s')$ if ${ succ ({\varvec{e}}, \varvec{c})} = {\varvec{e}}\wedge {\varvec{g}}' = { succ ({\varvec{g}}, s', {\varvec{e}})}$, and
2.
$\nu (\langle f({\varvec{g}}, \mu ) + { satObj _\varPhi }({\varvec{g}}, {\varvec{g}}'), \langle s_\bot , {\varvec{g}}' \rangle \rangle ) = \sum _{\varvec{c} \in \varvec{C}} \sum _{s' \in S '_{\varvec{c}}} \mu (\varvec{c}, s')$ where $\varvec{C} = \{ \varvec{c} \mid { succ ({\varvec{e}}, \varvec{c})} \ne {\varvec{e}}\}$ and $S '_{\varvec{c}} = \{ s' \mid { succ ({\varvec{g}}, s', { succ ({\varvec{e}}, \varvec{c})})} = {\varvec{g}}' \} $.

Figure 4 shows an epoch model $M^{\varvec{e}}_f$ of the MDP $M_ ex $ in Fig. 2 with respect to tradeoff $\varPhi $ as in Example 2 and any epoch ${\varvec{e}}\in \varvec{E}_{2}$ with ${{\varvec{e}}}[{1}] \ne \bot $ and ${{\varvec{e}}}[{2}] \ne \bot $.

Remark 1

The structure of $M^{\varvec{e}}_f$ differs only slightly between epochs. In particular consider epochs ${\varvec{e}}, {\varvec{e}}'$ with ${{\varvec{e}}}[{i}] = \bot $ iff ${{\varvec{e}}'}[{i}] = \bot $. To construct epoch model $M^{{\varvec{e}}'}_f $ from $ M^{{\varvec{e}}}_f$, only transitions to the bottom states $\langle s_\bot , {\varvec{g}} \rangle $ need to be adapted.

To analyse an epoch model $M^{\varvec{e}}_f$, any successor epoch ${\varvec{e}}'$ of ${\varvec{e}}$ needs to be analysed before. Since costs are non-negative, we can ensure this by analysing the epochs in a specific order. In the single dimensional case the order is uniquely given by $\bot ,0,1, \dots , b$. For multiple cost bounds any linearisation of the partial order ${\preceq } \subseteq \varvec{E}_{{m}} \times \varvec{E}_{{m}}$ with ${\varvec{e}}' \preceq {\varvec{e}}$ iff ${{\varvec{e}}'}[{i}] \le {{\varvec{e}}}[{i}] \vee {{\varvec{e}}'}[{i}] = \bot $ for all i can be considered. We call such a linearisation a proper epoch sequence.

We compute the points $\varvec{p}_{\varvec{w}}$ by analysing the different epoch models (i.e. the coordinates of Fig. 3(b)) sequentially. The main procedure is outlined in Algorithm 1. The costs of the model for the current epoch are computed in lines 2-8. These costs comprise the results from previously analysed epochs ${\varvec{e}}'$. In lines 9-12, the current epoch model $M^{\varvec{e}}_f$ is built and analysed: We compute weighted expected costs on $M^{\varvec{e}}_f$ where ${\mathcal {E}^{\mathfrak {S}}_{M^{\varvec{e}}_f} (\varvec{w})}[{s}] $ denotes the expected costs for $M^{\varvec{e}}_f$ when changing the initial state to s. In line 10 a (deterministic and memoryless) scheduler $\mathfrak {S} $ that induces the maximal weighted expected costs (i.e. ${\mathcal {E}^{\mathfrak {S}}_{M^{\varvec{e}}_f} (\varvec{w})}[{s}] = \max _{\mathfrak {S} '} {\mathcal {E}^{\mathfrak {S} '}_{M^{\varvec{e}}_f} (\varvec{w})}[{s}] $ for all states s) is computed. In line 12 we then compute the expected costs induced by $\mathfrak {S} $ for the individual objectives.

Theorem 1

The output of Algorithm 1 satisfies Eq. 3.

Proof

(sketch). Let ${\varvec{e}}$ be the currently analysed epoch. Since ${\mathbb {E}}$ is assumed to be a proper epoch sequence, we already processed any reachable successor epoch ${\varvec{e}}'$ of ${\varvec{e}}$, i.e., line 7 is only executed for epochs ${\varvec{e}}'$ for which $x^{{\varvec{e}}'}$ has already been computed. One can show that the values ${{{{{x^{{\varvec{e}}}}}[{\langle s, {\varvec{g}} \rangle }]}}[{k}]}$ computed by the algorithm coincide with the probability to satisfy ${e}_k'$ from state $\langle s, {\varvec{e}}, {\varvec{g}} \rangle $ in the unfolding ${M_{ unf }}$ under a scheduler $\mathfrak {S} $ that maximises the weighted sum.

Error propagation. So far, we assumed that (weighted) expected costs $\mathcal {E}^{\mathfrak {S}}_{M} (\varvec{w})$ are computed exactly. Practical implementations, however, are often based on numerical methods that only approximate the correct solution. In fact, methods based on value iteration—the de-facto standard in MDP model checking—do not give any guarantee on the accuracy of the obtained result [26]. We therefore consider interval iteration [5, 9] which for a predefined precision $\varepsilon > 0$ guarantees that the obtained result $x_s$ is $\varepsilon $-precise, i.e. we have $|x_s - {\mathcal {E}^{\mathfrak {S}}_{M} (\varvec{w})}[{s}] | \le \varepsilon $.

For the single-cost bounded variant of Algorithm 1, [27] discusses that in order to compute $\mathcal {P}^{\mathrm {max}}_{M} (\langle \mathrm {C}_{} \rangle _{\le b}\, G)$ with precision $\varepsilon $, each epoch model needs to be analysed with precision $\frac{\varepsilon }{b+1}$. We generalise this result to multi-dimensional tradeoffs. Assume the results of previously analysed epochs (given by f) are $\varepsilon $-precise and that $M^{\varvec{e}}_f$ is analysed with precision $\delta $. As in the single-dimensional case, the total error for $M^{\varvec{e}}_f$ can accumulate to $\delta + \varepsilon $. Since a path through the MDP M can visit at most $\sum _{i=1}^{m}(b_i + 1)$ cost epochs whose analysis introduces error $\delta $, the overall error can be upper bounded by $\delta \cdot \sum _{i=1}^{m}(b_i + 1)$.

Theorem 2

If the values ${{{{{x^{{\varvec{e}}}}}[{{\tilde{s}}}]}}[{k}]}$ at line 12 of Algorithm 1 are computed with precision $\varepsilon / \sum _{i=1}^{{m}} (b_i +1)$ for some $\varepsilon > 0$, the output $\varvec{p}_{\varvec{w}}'$ of the algorithm satisfies $| \varvec{p}_{\varvec{w}} - \varvec{p}_{\varvec{w}}'| \cdot \varvec{w} \le \varepsilon $ where $\varvec{p}_{\varvec{w}}$ is as in Eq. 3.

Remark 2

Alternatively, epochs can be analysed with the desired overall precision $\varepsilon $ by lifting the results from topological interval iteration [5]. However, that requires to store the obtained bounds for the results of already analysed epochs.

3.3 Extensions

Minimising objectives. Objectives $\mathcal {P}^{\mathrm {min}}_{M} ({e}_k)$ can be handled by extending the function ${ satObj _\varPhi }$ in Definition 8 such that it assigns cost $-1$ to branches that lead to the satisfaction of ${e}_k$. To obtain the desired probabilities we then maximise negative costs and multiply the result by $-1$ afterwards. As interval iteration supports mixtures of positive and negative costs [5], arbitrary combinations of minimising and maximising objectives can be considered^{Footnote 1}.

Beyond upper bounds. Our approach also supports bounds of the form $\langle \mathrm {C}_{j} \rangle _{\sim b}\, G$ for ${\sim } \in \{<, \le , >, \ge \}$, i.e., we allow combinations of lower and upper cost-bounds. For strict upper bounds $<\!b$ and non-strict lower bounds ${\ge }\;b$ we consider $\le b+1$ and $> b-1$ instead. For bound $\langle \mathrm {C}_{i} \rangle _{> b_i}\, G_i$ we adapt the update of goal satisfactions such that ${{ succ ({\varvec{g}}, s, {\varvec{e}})}}[{i}] = 1$ if either ${{\varvec{g}}}[{i}] = 1$ or $s\in G_i \wedge {{\varvec{e}}}[{i}] = \bot $. Similarly, we support multi-bounded-single-goal queries of the form $\langle \mathrm {C}_{(j_1, \dots , j_n)} \rangle _{(\sim _{1} b_{1}, \dots , \sim _{n} b_{n})}\, G$ which characterises the paths $\pi $ with a single prefix $\pi _\mathrm {fin}$ satisfying $\mathrm {last}(\pi _\mathrm {fin}) \in G$ and all cost bounds, i.e., $ \mathrm {cost}_{j_i}(\pi _\mathrm {fin}) \sim _i b_i$.

Example 3

The formula $e = \langle \mathrm {C}_{(1,1)} \rangle _{(\le 1, \ge 1)}\, G$ expresses the paths that reach G while collecting exactly one cost w.r.t. the first cost structure. This formula is not equivalent to $e' = \langle \mathrm {C}_{1} \rangle _{\le 1}\, G \wedge \langle \mathrm {C}_{1} \rangle _{\ge 1}\, G$ since, e.g., for $G= \{\,s_0\,\}$ the path $\pi = s_0 \langle 2 \rangle s_0$ satisfies $e'$ but not e.

Expected cost objectives. We can consider cost-bounded expected cost objectives $\mathcal {E}^{ opt }_{M} (\mathrm {R}_{j_1}, \langle \mathrm {C}_{j_2} \rangle _{\le b}\,\!)$ with $ opt \in \{\, \mathrm {max}, \mathrm {min}\,\}$ which refer to the expected cost accumulated for cost structure $j_1$ within a given cost bound $\langle \mathrm {C}_{j_2} \rangle _{\le b}\,$. Similar to cost-bounded reachability queries, we compute cost-bounded expected costs via computing (weighted) expected costs within epoch models.

Quantiles. A (multi-dimensional) quantile has the form $ Qu (\mathcal {P}^{ opt }_{M} (e) \sim p)$ for $ opt \in \{\,\mathrm {min}, \mathrm {max}\,\}$, ${\sim } \in \{<, \le , >, \ge \}$, $e=\bigwedge _{i=1}^{\!n \in \mathbb {N}} (\langle \mathrm {C}_{j_i} \rangle _{\sim _i b_i}\, G_i)$ and a fixed probability threshold $p \in [0,1]$. The quantile asks for the set of bound values $\mathcal {B}$ that satisfy the probability threshold, i.e., $\mathcal {B} = \{\langle b_1 \dots , b_n \rangle \mid \mathcal {P}^{ opt }_{M} (e) \sim p \}$. The computation of quantiles for single-cost bounded reachability has been discussed in [3, 34], where multiple cost bounds are supported via unfolding. Unfolding requires to fix bound values $b_2, \dots , b_n$ a priori, and one can only ask for all $b_1$ that satisfy the property. Our approach provides the basis for lifting the ideas of [3, 34] to multi-bounded queries. Roughly, one extends the epoch sequence ${\mathbb {E}}$ in Algorithm 1 dynamically until the epochs in which the bounded reachability probability passes the threshold p are explored. Additional steps such as detecting the case where $\mathcal {B} = \emptyset $ are left for future work.

4 Visualisations

The results of a multi-objective model checking analysis are typically presented as a single (approximation of a) Pareto curve. For more than two objectives, the performance of the Pareto-optimal scheduler can be displayed in a bar chart as in Fig. 4, where the colours reflect different objectives and the groups different schedulers. The aim is to visualise the tradeoffs between the different objectives such that the user can make an informed decision about the system design or pick a scheduler for implementation. However, Pareto set visualisations alone may not provide sufficient information, about, e.g., which objectives are aligned or conflicting (see e.g. [39] for a discussion in the non-probabilistic case). Cost bounds furthermore add an extra dimension for each cost structure. Consider the Mars rover MDP $M_r$ and tradeoff $ multi \big ({ obj _{100}, obj _{140}}\big ) $ with

$$ obj _v = \mathcal {P}^{\mathrm {max}}_{M_r} (\langle \mathrm {C}_{ time } \rangle _{\le 175}\, B \wedge \langle \mathrm {C}_{ energy } \rangle _{\le 100}\, B \wedge \langle \mathrm {C}_{ value } \rangle _{\ge v}\, B ) $$

where $ B $ is the set of states where the rover has safely returned to its base. We ask for the tradeoff between performing experiments of scientific value at least 100 before returning to base within 175 time units and maximum energy consumption of 100 units ($ obj _{100}$) vs. achieving the same with scientific value at least 140 ($ obj _{140}$). The Pareto curve (Fig. 5(a)) shows the tradeoff between achieving $ obj _{100}$ and $ obj _{140}$. However, for each Pareto-optimal scheduler, our method has implicitly computed the probabilities of the two objectives for all reachable epochs as well, i.e. for all bounds on the three quantities below the ones required in the tradeoff. We visualise this information for deep insights into the behaviour of each scheduler, its robustness w.r.t. the bounds, and its preferences for certain objectives depending on the remaining budget for each quantity.

We use plots as shown in Fig. 6. They can be generated in no extra runtime or memory since all required data is already computed implicitly. We restrict to two-dimensional plots since they are easier to grasp than complex three-dimensional visualisations. In each plot, we can thus show the relationship between three different quantities: one on the x-axis (x), one on the y-axis (y), and one encoded as the colour of the points (z, where we use blue for high values, red for low values, black for probability zero, and white for unreachable epochs). Yet our example tradeoff already contains five quantities: the probability for $ obj _{100}$, the probability for $ obj _{140}$, the available time and energy to be spent, and the remaining scientific value to be accumulated. We thus need to project out some quantities. We do this by showing at every $\langle \textit{x}, \textit{y} \rangle $ coordinate the maximum or minimum value of the z quantity when ranging over all reachable values of the hidden costs at this coordinate. That is, we show a best- or worst-case situation, depending on the semantics of the respective quantities.

Out of the 30 possible combinations of quantities for our example, we showcase three to illustrate the added value of the obtained information. First, in Fig. 6(a), we plot the probabilities of the two objectives vs. the minimum scientific value that still needs to be accumulated for two different Pareto-optimal schedulers (left: $\mathfrak {S} _1$, right: $\mathfrak {S} _2$). White areas indicate that no epoch for the particular combination of probabilities is reachable from the tradeoff’s bounds. These two and all other Pareto-optimal schedulers are white above the diagonal, which means that $ obj _{100}$ implies $ obj _{140}$, i.e. the objectives are aligned. For the left scheduler, we further see that all blue-ish areas are associated to lower probabilities for both objectives. Since blue indicates higher values, this scheduler achieves only low probabilities when it still needs to make the rover accumulate a high amount of value. However, it overall achieves higher probabilities for $ obj _{140}$ at medium value requirements, whereas the right scheduler is “safer” and focuses on satisfying $ obj _{100}$. The erratic spikes on the left occur because some probabilities are only reached after very unlikely paths.

Table 1. Runtime comparison for multi-cost single-objective queries

Full size table

Table 2. Runtime comparison for multi-cost multi-objective queries

Full size table

In Fig. 6(b), we show for $\mathfrak {S} _1$ the probability to achieve $ obj _{100}$ depending on the remaining energy to be spent vs. the remaining scientific value to be accumulated. We see a white vertical line for every odd x-value; this is because, over all branches in the model, the gcd of all value costs is 2. The left plot shows the minimum probabilities over the hidden costs, i.e. we see the probability for the worst-case remaining time; the right plot shows the best-case scenario. Not surprisingly, when time is low, only a lot of energy makes it possible to reach the objective with non-zero probability.

Finally, Fig. 6(c) shows the probability for $ obj _{140}$ depending on available time and energy for $\mathfrak {S} _2$. We plot the minimum probability over the hidden scientific value requirement, i.e. a worst-case view. The plot shows that time is of little use in case of low remaining energy, but it helps significantly when there is sufficient energy, too. In Fig. 6(d), we depict for the same scheduler the minimum remaining scientific value (z) under which a certain probability for $ obj _{100}$ can be achieved (y), given a certain remaining time budget (x). The upper left corner shows that a high probability in little time is only achievable if we need to collect little more value; the value requirement gradually relaxes as we aim for lower probabilities or have more time.

5 Experiments

Implementation. We implemented the presented approach into Storm [20] v1.2, and available via [19]. The implementation computes extremal probabilities for single-objective multi-cost bounded queries, as well as Pareto curves for the multi-objective case. We consider the sparse engine of Storm, i.e., explicit data structures such as sparse matrices. For single-cost bounded properties, this has already been addressed in [34]. For the computation of expected cost (Lines 10 to 12 of Algorithm 1) we employ interval iteration with finite precision floats as well as policy iteration with infinite precision rationals. The expected costs (lines 10 to 12 of Algorithm 1) are computed either numerically (via interval iteration over finite precision floats) or exactly (via policy iteration over infinite precision rationals). To reduce the memory consumption, the analysis result of an epoch model $M^{\varvec{e}}_f$ is erased as soon as possible.

Set-up & reproduction. We evaluate the approach on wide range of case studies, available in the artefact [30]. The models are given in Prism’s [37] guarded command language. For each case study we consider single- and multi-objective queries that yield non-trivial results, i.e., probabilities strictly between zero and one. We compare the naive unfolding approach (UNF) as in Sect. 3.1 with the sequential approach (SEQ) as in Sect. 3.2. The unfolding of the model is applied on the Prism language level, by considering a parallel composition with cost counting structures. On the unfolded model we apply the algorithms for unbounded reachability as available in Storm. We considered precision ${\eta }= 10^{-4}$ for the Pareto curve approximation and precision ${\varepsilon }= 10^{-6}$ for interval iteration. We increased the precision for single epoch models as in Theorem 2.

We ran our experiments on a single core (2 GHz) of a HP BL685C G7 system with 192 GB of memory. We stopped each experiment after a time limit of 2 hours. For experiments that completed within the time limit, we observed a memory consumption of up to 36 GB for UNF and up to 5 GB for SEQ.

A binary equivalent to the binary we used for the experiments is available in the artefact [30]. The binary has been tested in the artefact evaluation VM [31]. For other configurations, Storm should be recompiled using the sources [19].

Details on reproduction of the tables, as well as details on how to analyse multi-cost bounded properties using Storm in general can be found in the readme, enclosed in the artefact.

Experimental Results. Tables 1 and 2 show results for single- and multi-objective queries, respectively. The first columns yield the number of states and transitions of the original MDP, then for the query, the number of bounds ${m}$, the number of different cost structures r, and the number of reachable cost epochs (reflecting the magnitude of the bound values). $|S _ unf |$ denotes the number of reachable states in the unfolding. For multi-objective queries, we additionally give the number of objectives and the number of analysed weight vectors $\varvec{w}$. The remaining columns depict the runtimes of the different approaches in seconds. For UNF, we considered both the sparse (sp) and symbolic (dd) engine of Storm. The symbolic engine neither supports multi-objective model checking nor exact policy iteration.

On the majority of benchmarks, SEQ performs better than UNF. Typically, SEQ is less sensitive to increases in the magnitude of the cost bounds, as illustrated in Fig. 7. For three benchmark and query instances, we plot the runtime of both approaches against different numbers $|\mathbb {E}|$ of reachable epochs. While for small cost bounds, UNF is sometimes even faster compared to SEQ, SEQ scales better with increasing $|\mathbb {E}|$. It is not surprising that SEQ scales better, ultimately, the increased state space and the accompanying memory consumption in UNF is a bottleneck. The most important reason that UNF performs better for some (smaller) cost bounds is the induced overhead of checking the full epoch. In particular, the epoch contains (often many) states that are not reachable from the initial state (in the unfolding).

6 Conclusion

Many real-world planning problems consider several limited resources and contain tradeoffs. This paper present a practically efficient approach to analyse these problems. It has been implemented in the Storm model checker and shows significant performance benefits. The algorithm implicitly computes a large amount of information that is hidden in the standard plots of Pareto curves shown to visualise the results of a multi-objective analysis. We have developed a new set of visualisations that exploit all the available data to provide new and clear insights to decision makers even for problems with many objectives and cost dimensions.

Data Availability Statement. The datasets analysed during the current study, and the binary used for the analysis, are available in the figshare repository [30]. Source code matching the binary is available in [19].

Notes

1.
This supersedes a restriction of the algorithm of [24].

References

The International Probabilistic Planning Competition, http://www.icaps-conference.org/index.php/Main/Competitions
Andova, S., Hermanns, H., Katoen, J.-P.: Discrete-time rewards model-checked. In: Larsen, K.G., Niebert, P. (eds.) FORMATS 2003. LNCS, vol. 2791, pp. 88–104. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-40903-8_8
Chapter Google Scholar
Baier, C., Daum, M., Dubslaff, C., Klein, J., Klüppelholz, S.: Energy-utility quantiles. In: Badger, J.M., Rozier, K.Y. (eds.) NFM 2014. LNCS, vol. 8430, pp. 285–299. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06200-6_24
Chapter Google Scholar
Baier, C., Klein, J., Klüppelholz, S., Wunderlich, S.: Maximizing the conditional expected reward for reaching the goal. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10206, pp. 269–285. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54580-5_16
Chapter Google Scholar
Baier, C., Klein, J., Leuschner, L., Parker, D., Wunderlich, S.: Ensuring the reliability of your model checker: interval iteration for Markov decision processes. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017, Part I. LNCS, vol. 10426, pp. 160–180. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_8
Chapter Google Scholar
Barrett, L., Narayanan, S.: Learning all optimal policies with multiple criteria. In: ICML. AICPS, vol. 307, pp. 41–47. ACM (2008)
Google Scholar
Berthon, R., Randour, M., Raskin, J.F.: Threshold constraints with guarantees for parity objectives in Markov decision processes. In: ICALP. LIPIcs, vol. 80, pp. 121:1–121:15. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2017)
Google Scholar
Brázdil, T., Brozek, V., Chatterjee, K., Forejt, V., Kucera, A.: Two views on multiple mean-payoff objectives in Markov decision processes. LMCS 10(1) (2014)
Google Scholar
Brázdil, T., Chatterjee, K., Chmelík, M., Forejt, V., Křetínský, J., Kwiatkowska, M., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 98–114. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11936-6_8
Chapter MATH Google Scholar
Brázdil, T., Chatterjee, K., Forejt, V., Kucera, A.: Trading performance for stability in Markov decision processes. J. Comput. Syst. Sci. 84, 144–170 (2017)
Article MathSciNet Google Scholar
Bresina, J.L., Jónsson, A.K., Morris, P.H., Rajan, K.: Activity planning for the mars exploration rovers. In: ICAPS, pp. 40–49. AAAI (2005)
Google Scholar
Bryce, D., Cushing, W., Kambhampati, S.: Probabilistic planning is multi-objective. Technical report, Arizona State Univ., CSE (2007)
Google Scholar
Cao, Z., Guo, H., Zhang, J., Oliehoek, F.A., Fastenrath, U.: Maximizing the probability of arriving on time: a practical q-learning method. In: AAAI, pp. 4481–4487. AAAI Press (2017)
Google Scholar
Chatterjee, K., Chmelik, M., Gupta, R., Kanodia, A.: Optimal cost almost-sure reachability in POMDPs. Artif. Intell. 234, 26–48 (2016)
Article MathSciNet Google Scholar
Chatterjee, K., Majumdar, R., Henzinger, T.A.: Markov decision processes with multiple objectives. In: Durand, B., Thomas, W. (eds.) STACS 2006. LNCS, vol. 3884, pp. 325–336. Springer, Heidelberg (2006). https://doi.org/10.1007/11672142_26
Chapter Google Scholar
Chen, T., Forejt, V., Kwiatkowska, M., Simaitis, A., Wiltsche, C.: On stochastic games with multiple objectives. In: Chatterjee, K., Sgall, J. (eds.) MFCS 2013. LNCS, vol. 8087, pp. 266–277. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40313-2_25
Chapter Google Scholar
Cheng, L., Subrahmanian, E., Westerberg, A.W.: Multiobjective decision processes under uncertainty: applications, problem formulations, and solution strategies. Ind. Eng. Chem. Res. 44(8), 2405–2415 (2005)
Article Google Scholar
Christman, A., Cassamano, J.: Maximizing the probability of arriving on time. In: Dudin, A., De Turck, K. (eds.) ASMTA 2013. LNCS, vol. 7984, pp. 142–157. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39408-9_11
Chapter Google Scholar
Dehnert, C., Junges, S., Katoen, J.P., Quatmann, T., Volk, M.: Storm source files. zenodo (2018), https://doi.org/10.5281/zenodo.1181896
Dehnert, C., Junges, S., Katoen, J.-P., Volk, M.: A Storm is coming: a modern probabilistic model checker. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017, Part II. LNCS, vol. 10427, pp. 592–600. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63390-9_31
Chapter Google Scholar
Eastwood, R., Alexander, R., Kelly, T.: Safe multi-objective planning with a posteriori preferences. In: HASE, pp. 78–85. IEEE Computer Society (2016)
Google Scholar
Etessami, K., Kwiatkowska, M., Vardi, M.Y., Yannakakis, M.: Multi-objective model checking of Markov decision processes. LMCS 4(4) (2008)
Google Scholar
Feng, L., Wiltsche, C., Humphrey, L., Topcu, U.: Controller synthesis for autonomous systems interacting with human operators. In: ICCPS, pp. 70–79. ACM (2015)
Google Scholar
Forejt, V., Kwiatkowska, M., Parker, D.: Pareto curves for probabilistic model checking. In: Chakraborty, S., Mukund, M. (eds.) ATVA 2012. LNCS, pp. 317–332. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33386-6_25
Chapter MATH Google Scholar
Forejt, V., Kwiatkowska, M., Norman, G., Parker, D.: Automated verification techniques for probabilistic systems. In: Bernardo, M., Issarny, V. (eds.) SFM 2011. LNCS, vol. 6659, pp. 53–113. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21455-4_3
Chapter Google Scholar
Haddad, S., Monmege, B.: Reachability in MDPs: refining convergence of value iteration. In: Ouaknine, J., Potapov, I., Worrell, J. (eds.) RP 2014. LNCS, vol. 8762, pp. 125–137. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11439-2_10
Chapter MATH Google Scholar
Hahn, E.M., Hartmanns, A.: A comparison of time- and reward-bounded probabilistic model checking techniques. In: Fränzle, M., Kapur, D., Zhan, N. (eds.) SETTA 2016. LNCS, vol. 9984, pp. 85–100. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47677-3_6
Chapter Google Scholar
Hahn, E.M., Hashemi, V., Hermanns, H., Lahijanian, M., Turrini, A.: Multi-objective robust strategy synthesis for interval Markov decision processes. In: Bertrand, N., Bortolussi, L. (eds.) QEST 2017. LNCS, vol. 10503, pp. 207–223. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66335-7_13
Chapter Google Scholar
Hartmanns, A., Hermanns, H.: The Modest Toolset: an integrated environment for quantitative modelling and verification. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 593–598. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_51
Chapter Google Scholar
Hartmanns, A., Junges, S., Katoen, J.P., Quatmann, T.: Evaluated artefact for this paper. figshare (2018), https://doi.org/10.6084/m9.figshare.5907349.v1
Hartmanns, A., Wendler, P.: Artefact vm. figshare (2018), https://doi.org/10.6084/m9.figshare.5896615
Hou, P., Yeoh, W., Varakantham, P.: Revisiting risk-sensitive MDPs: new algorithms and results. In: ICAPS. AAAI (2014)
Google Scholar
Junges, S., Jansen, N., Dehnert, C., Topcu, U., Katoen, J.-P.: Safety-constrained reinforcement learning for MDPs. In: Chechik, M., Raskin, J.-F. (eds.) TACAS 2016. LNCS, vol. 9636, pp. 130–146. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49674-9_8
Chapter Google Scholar
Klein, J., Baier, C., Chrszon, P., Daum, M., Dubslaff, C., Klüppelholz, S., Märcker, S., Müller, D.: Advances in probabilistic model checking with PRISM: variable reordering, quantiles and weak deterministic Büchi automata. In: STTT, pp. 1–16 (2017)
Article Google Scholar
Kolobov, A., Mausam, Weld, D.S.: A theory of goal-oriented MDPs with dead ends. In: UAI, pp. 438–447. AUAI Press (2012)
Google Scholar
Kwiatkowska, M., Norman, G., Parker, D.: The PRISM benchmark suite. In: QEST, pp. 203–204. IEEE CS Press (2012)
Google Scholar
Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_47
Chapter Google Scholar
Lacerda, B., Parker, D., Hawes, N.: Multi-objective policy generation for mobile robots under probabilistic time-bounded guarantees. In: ICAPS, pp. 504–512. AAAI Press (2017)
Google Scholar
Lankaites Pinheiro, R., Landa-Silva, D., Atkin, J.: A technique based on trade-off maps to visualise and analyse relationships between objectives in optimisation problems. J. Multi-Criteria Decis. Anal. 24(1–2), 37–56 (2017)
Article Google Scholar
Laroussinie, F., Sproston, J.: Model checking durational probabilistic systems. In: Sassone, V. (ed.) FoSSaCS 2005. LNCS, vol. 3441, pp. 140–154. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31982-5_9
Chapter Google Scholar
Puterman, M.L.: Markov Decision Processes. Wiley, New York (1994)
Book Google Scholar
Quatmann, T., Junges, S., Katoen, J.-P.: Markov automata with multiple objectives. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017, Part I. LNCS, vol. 10426, pp. 140–159. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_7
Chapter Google Scholar
Randour, M., Raskin, J.F., Sankur, O.: Percentile queries in multi-dimensional Markov decision processes. FMSD 50(2–3), 207–248 (2017)
MATH Google Scholar
Roijers, D.M., Vamplew, P., Whiteson, S., Dazeley, R.: A survey of multi-objective sequential decision-making. J. Artif. Intell. Res. 48, 67–113 (2013)
MathSciNet MATH Google Scholar
Steinmetz, M., Hoffmann, J., Buffet, O.: Goal probability analysis in probabilistic planning: exploring and enhancing the state of the art. J. Artif. Intell. Res. 57, 229–271 (2016)
MathSciNet MATH Google Scholar
Teichteil-Königsbuch, F.: Stochastic safest and shortest path problems. In: AAAI. AAAI Press (2012)
Google Scholar
Vamplew, P., Dazeley, R., Berry, A., Issabekov, R., Dekker, E.: Empirical evaluation methods for multiobjective reinforcement learning algorithms. Mach. Learn. 84(1–2), 51–80 (2011)
Article MathSciNet Google Scholar
Yu, S.X., Lin, Y., Yan, P.: Optimization models for the first arrival target distribution function in discrete time. J. Math. Anal. Appl. 225(1), 193–223 (1998)
Article MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

University of Twente, Enschede, The Netherlands
Arnd Hartmanns & Joost-Pieter Katoen
RWTH Aachen University, Aachen, Germany
Sebastian Junges, Joost-Pieter Katoen & Tim Quatmann

Authors

Arnd Hartmanns
View author publications
You can also search for this author in PubMed Google Scholar
Sebastian Junges
View author publications
You can also search for this author in PubMed Google Scholar
Joost-Pieter Katoen
View author publications
You can also search for this author in PubMed Google Scholar
Tim Quatmann
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Tim Quatmann .

Editor information

Editors and Affiliations

Ludwig-Maximilians-Universität München, Munich, Germany
Dirk Beyer
University of Twente, Enschede, The Netherlands
Marieke Huisman

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Reprints and permissions

Copyright information

About this paper

Cite this paper

Hartmanns, A., Junges, S., Katoen, JP., Quatmann, T. (2018). Multi-cost Bounded Reachability in MDP. In: Beyer, D., Huisman, M. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2018. Lecture Notes in Computer Science(), vol 10806. Springer, Cham. https://doi.org/10.1007/978-3-319-89963-3_19

Download citation

DOI: https://doi.org/10.1007/978-3-319-89963-3_19
Published: 14 April 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-89962-6
Online ISBN: 978-3-319-89963-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics