Multi-cost Bounded Tradeoff Analysis in MDP

We provide a memory-efficient algorithm for multi-objective model checking problems on Markov decision processes (MDPs) with multiple cost structures. The key problem at hand is to check whether there exists a scheduler for a given MDP such that all objectives over cost vectors are fulfilled. We cover multi-objective reachability and expected cost objectives, and combinations thereof. We further transfer approaches for computing quantiles over single cost bounds to the multi-cost case and highlight the ensuing challenges. An empirical evaluation shows the scalability of our new approach both in terms of memory consumption and runtime. We discuss the need for more detailed visual presentations of results beyond Pareto curves and present a first visualisation approach that exploits all the available information from the algorithm to support decision makers.

Fig. 1 Science on Mars: planning under several resource constraints

Traditional analyses of MDPs often assume that a goal is reached eventually (with probability one) and optimise the expected reward or cost to reach that goal [46,53]. This assumption, however, is unrealistic in many scenarios, e.g. due to insufficient resources or the possibility of attempted actions failing. Furthermore, the resulting optimal schedulers often admit single runs that perform far below the user's expectation. Such deviations from the expected value are unacceptable in many scenarios with high stakes. Examples range from deliveries reaching an airport after the plane's departure to more serious scenarios in e.g. wildfire management [56]. In particular, many practical scenarios call for minimising the probability of running out of resources before reaching the goal: while it is beneficial for a plane to reach its destination with low expected fuel consumption, it is essential that it reach its destination with the fixed available amount of fuel. Conversely, schedulers that optimise solely for the probability to reach a goal tend to be very expensive. Even in the presence of just a single cost structure, decision makers thus have to trade the success probability against the costs. This tradeoff makes many planning problems inherently multi-objective [12,17]. In particular, safety properties cannot be averaged out by good performance [22]. Planning scenarios in various application areas [51] feature different resource constraints. Typical examples are energy consumption and time [11], optimal expected revenue and time [42] in robotics, and monetary cost and available capacity in logistics [17].
Example 1 Consider a simplified (discretised) version of the Mars rover task scheduling problem [11]. We want to plan a variety of experiments for a day on Mars. The experiments vary in their success probability, time, energy consumption, and scientific value upon success. The time, energy consumption, and scientific value are uncertain and modelled by probability distributions, cf. Figure 1a. Each day, the rover can perform multiple experiments until it runs out of time or out of energy. The objective is to achieve a minimum of daily scientific progress while keeping the risk of exceeding the time or energy limits low. As the rover is expected to work for a longer period, we prefer a high expected scientific value.

This article focuses on (i) multi-objective multi-cost bounded reachability queries as well as (ii) multi-cost quantiles on MDPs. We take as input an MDP with multiple cost structures (e.g. energy, utility, and time).
The bounded reachability problem is specified as multiple objectives of the form "maximise/minimise the probability to reach a state in G_i such that the cumulative cost for the i-th cost structure is below/above a cost limit b_i". This multi-objective variant of cost-bounded reachability in MDPs is PSPACE-hard [49]. The focus of this article is on the practical side: we aim at finding a practically efficient algorithm to obtain (an approximation of) the Pareto-optimal points. To accomplish this, we adapt and generalise recent approaches for the single-objective case [28,37] towards the multi-objective setting. The basic idea of [28,37] is to implicitly unfold the MDP along cost epochs and exploit the regularities of the epoch MDP. Prism [39] and the Modest Toolset [31] have been updated with such methods for the single-objective case and significantly outperform the traditional explicit unfolding approach of [1,44]. This article presents an algorithm that lifts this principle to multiple cost objectives and determines approximation errors when using value iteration. We also sketch extensions to expected accumulated cost objectives.
The problem of computing quantiles [2,37,52,57] is essentially the inversion of the bounded reachability problem: a quantile query has the form "what are the cumulative cost limits b i such that the maximum/minimum probability to reach a state in G i with the accumulated cost for the i-th cost structure being below/above b i is less/greater than a fixed probability threshold p." Such an inversion is natural: Instead of asking how likely it is to arrive at the planned destination with the pre-specified amount of fuel, we now ask how much fuel to take such that we arrive at the planned destination in 99.9% of the cases. A key difference to multi-cost bounded reachability as described earlier is that we do not know a priori how far to unfold the MDP. The main algorithm for quantiles iteratively extends the unfolding, reusing the ideas developed for an efficient implementation for multi-cost bounded reachability. The algorithm thereby explores a frontier of the set of cost limits for which the probability threshold holds. To ensure that the representation of the frontier is finite, already in the single-bounded case, some preprocessing is necessary, see e.g. [2]. We generalise these preprocessing steps to the multi-bounded case, and show that this is not always straightforward.
Our new approach has been implemented in the probabilistic model checker Storm [21]. We evaluate its performance, compared to the traditional unfolding approach, on a number of MDP examples as well as on discrete-time Markov chains as a special case of MDPs. We find that the new approach provides not only the expected memory advantage, but is usually faster, too, especially for high cost limits.
In addition, we equip our algorithm with means to visualise (inspired by the recent techniques in [43]) the tradeoffs between various objectives that go beyond Pareto curves. We believe that this is key to obtain better insights into multi-objective decision making. An example is given in Fig. 1b: it depicts the probability (indicated by the colours) to satisfy an objective based on the remaining energy (y-axis) and time (x-axis). Our visualisations provide a way to inspect all of the data that our algorithm implicitly computes anyway.
The key challenge here is to reduce the dimensionality of the available data to make the available information easy to grasp without obscuring important dependencies. As such, our visualisations are a first proposal, and come with a call to visualisation experts for improved methods.

Related Work

The analysis of single-objective (cost-bounded) reachability in MDPs is an active area of research in both the AI and the formal methods communities; see e.g. [3,18,38,59]. Various model checking approaches for single objectives exist. In [35], the topology of the unfolded MDP is exploited to speed up the value iteration. In [28], three different model checking approaches are explored and compared. A survey of heuristic approaches is given in [53]. A Q-learning-based approach is described in [13]. An extension of this problem to the partially observable setting was considered in [14], and to probabilistic timed automata in [28]. Quantile queries with a single cost bound have been studied in [57]. Multiple cost bounds where all but one cost limit is fixed a priori have been considered in [2]: the idea is to explicitly unfold the model with respect to the given cost bounds, effectively transforming the query into a single-dimensional one. [37] presents a symbolic implementation of these approaches. The method of [4] computes optimal expected values under e.g. the condition that the goal is reached, and is thus applicable in settings where a goal is not necessarily reached. A similar problem is considered in [55]. For multi-objective analysis, the model checking community typically focuses on probabilities and expected costs, as in the seminal works [15,23]. Implementations are typically based on a value iteration approach as in [25], and have been extended to stochastic games [16], Markov automata [47], and interval MDPs [30]. Other considered cases include e.g. multi-objective mean-payoff objectives [8], objectives over instantaneous costs [10], and parity objectives [7]. Multi-objective problems for MDPs with an unknown cost function are considered in [36]. Surveys on multi-objective decision making in AI and machine learning can be found in [51] and [58], respectively.

This article is an extended version of a previous conference paper [32]. We provide more details on the core algorithms, extended proofs, an expanded explanation of our visualisations, and additional models in the experimental evaluation. We added Sect. 5, which presents methods for computing multi-cost quantiles, for which we also provide an experimental evaluation in Sect. 7.

Structure of the Paper

After the preliminaries (Sect. 2), we first recap the existing unfolding technique that the new approach conceptually builds upon (Sect. 3). Then, we present (in Sect. 4) our approach to computing the Pareto curve under multiple cost bounds. We use similar techniques to compute multi-cost quantiles (outlined in Sect. 5). Finally, we show proposals for visualisations of the available data (Sect. 6), and empirically evaluate the proposed algorithms based on their implementation in Storm (Sect. 7).

Preliminaries
We first introduce notation used throughout this article, then define the model of Markov decision processes, its semantics, and the multi-objective cost-bounded properties that we are interested in.

Markov Decision Processes
Markov decision processes (MDPs) combine nondeterministic choices, capturing e.g. user input, scheduler decisions, or unknown and possibly adversarial environments, with probabilistic behaviour as in discrete-time Markov chains. They are the fundamental model for decision-making under uncertainty. We use MDPs in which the branches of transitions are annotated with (multiple) integer costs (also called rewards), allowing properties to observe quantities such as the passage of discrete time, energy usage, or monetary costs of decision outcomes. We write s →_T μ for μ ∈ T(s) and call it a transition. We write s →_T^c s' if additionally ⟨c, s'⟩ ∈ support(μ), and call ⟨c, s'⟩ a branch with cost vector c. If T is clear from the context, we just write → in place of →_T. Graphically, we represent transitions by lines to a node from which branches, labelled with their probability and costs, lead to successor states. We may omit the node and probability for transitions into Dirac distributions. Figure 2a shows an example MDP M_ex. From the initial state s_0, the choice of going towards s_1 or s_2 is nondeterministic. Either way, the probability to stay in s_0 is 0.5; otherwise we move to s_1 (or s_2). M_ex has two cost structures: failing to move to s_1 has a cost of 1 for the first and 2 for the second structure, while moving to s_2 yields cost 2 for the first and no cost for the second structure.
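As a data structure, such a multi-cost MDP can be represented directly. The following sketch reconstructs M_ex from the prose above; the cost of the successful move to s_1 and of the failed move towards s_2 are not stated explicitly in the text and are assumptions here:

```python
# Rough reconstruction of M_ex (Fig. 2a is not reproduced here).
# state -> action -> list of branches (probability, cost vector, successor)
M_ex = {
    "s0": {
        "to_s1": [(0.5, (1, 2), "s0"),   # failed attempt: cost 1 resp. 2
                  (0.5, (1, 0), "s1")],  # success (cost assumed: <1, 0>)
        "to_s2": [(0.5, (0, 0), "s0"),   # stay in s0 (cost assumed: <0, 0>)
                  (0.5, (2, 0), "s2")],  # move to s2: cost 2 resp. 0
    },
    # Dirac transitions back to s0, drawn without node/probability in Fig. 2a
    "s1": {"back": [(1.0, (0, 0), "s0")]},
    "s2": {"back": [(1.0, (0, 0), "s0")]},
}
```

Each action's branches form a probability distribution over cost-vector/successor pairs, matching the formal notion of transitions into Dist(ℕ^m × S).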

Example 2
Using MDPs directly to build complex models is cumbersome. In practice, high-level formalisms like Prism's [39] guarded command language or the high-level modelling language Modest [29] are used to specify MDPs. Aside from a parallel composition operator, they extend MDPs with variables over finite domains that can be used in expressions to e.g. enable or disable transitions. Their semantics is an MDP whose states are the valuations of the variables. This allows to compactly describe very large MDPs.

Paths and Schedulers
For the remainder of this article, we fix an MDP M = ⟨S, T, s_init⟩. Its semantics is captured by the notion of paths. A path in M represents the infinite concrete resolution of both nondeterministic and probabilistic choices. An end component is a subset of the states and transitions such that it is possible (by choosing only transitions in the subset) to remain within the subset of states forever (with probability 1).

Definition 3
An end component (EC) of M is given by a non-empty S' ⊆ S and T' : S' → 2^Dist(ℕ^m × S) such that
– for all s ∈ S', T'(s) ⊆ T(s) and s →_{T'}^c s' implies s' ∈ S', and
– for all s, s' ∈ S' there is a finite path in M from s to s' using only transitions in T'.
A scheduler (or adversary, policy, or strategy) resolves the nondeterministic choices.

Definition 4 A function S : Paths_fin → Dist(Dist(ℕ^m × S)) with support(S(π_fin)) ⊆ T(last(π_fin)) for all finite paths π_fin is a scheduler for M.
The set of all schedulers of M is Sched(M). We call a scheduler S ∈ Sched(M) deterministic if |support(S(π_fin))| = 1 for all π_fin, and memoryless if last(π_fin) = last(π'_fin) implies S(π_fin) = S(π'_fin) for all finite paths π_fin and π'_fin. For simplicity, we also write deterministic memoryless schedulers as functions S : S → Dist(ℕ^m × S).
Via the standard cylinder set construction, a scheduler S induces a probability measure P^S_M on measurable sets of paths starting from s_init. More details can be found in e.g. [26] for the case of deterministic schedulers and [46, Section 2.1.6] for the general case. For a measurable set of paths Π, we define the extremal values P^max_M(Π) = sup_{S ∈ Sched(M)} P^S_M(Π) and P^min_M(Π) = inf_{S ∈ Sched(M)} P^S_M(Π). For clarity, we focus on probabilities in this article, but note that expected accumulated costs can be defined analogously (see e.g. [26]) and our methods apply to them with only minor changes.

Cost-Bounded Reachability
Recall that the branches of an MDP are annotated with tuples of costs. In our notation we use C_j to refer to the j-th cost structure, i.e. the costs obtained by taking the j-th component of each tuple. We are interested in the probabilities of sets of paths that reach certain goal states while respecting a conjunction of multiple cost bounds.

Definition 5
A cost bound is given by C_j ∼b G, where C_j with j ∈ {1, …, m} identifies a cost structure, ∼ ∈ {<, ≤, >, ≥}, b ∈ ℕ is a cost limit, and G ⊆ S is a set of goal states. A cost-bounded reachability formula is a conjunction ⋀_{i=1}^{n} (C_{j_i} ∼_i b_i G_i), n ∈ ℕ, of cost bounds. It characterises the measurable set of paths Π where every π ∈ Π has, for every i, a prefix π_fin^i with last(π_fin^i) ∈ G_i and cost_{j_i}(π_fin^i) ∼_i b_i.
We call a cost-bounded reachability formula ϕ = ⋀_{i=1}^{n} (C_{j_i} ∼_i b_i G_i) single-cost bounded if n = 1 and multi-cost bounded in the general case. A (single-objective) multi-cost bounded reachability query asks for the maximal (minimal) probability to satisfy a conjunction of cost bounds, i.e. for P^opt_M(ϕ) where opt ∈ {max, min} and ϕ is a cost-bounded reachability formula. Unbounded and step-bounded reachability are special cases of cost-bounded reachability. A single-objective query may contain multiple bounds, but asks for a single scheduler that optimises the probability of satisfying them all.

Example 3
The single-objective multi-cost bounded query P^max_{M_ex}(C_1 ≤1 {s_1} ∧ C_2 ≤2 {s_2}) for the MDP M_ex of Fig. 2a asks for the maximal probability to reach s_1 with at most cost 1 for the first cost structure and s_2 with at most cost 2 for the second cost structure. This probability is 0.5, attained e.g. by the scheduler that tries to move to s_1 once and to s_2 afterwards.
Given multiple objectives (i.e. multiple reachability queries) at once, a scheduler that optimises for one objective might be suboptimal for the other objectives. We thus consider multi-objective tradeoffs (or simply tradeoffs), i.e. sets of single-objective queries, written as Φ = multi(P^{opt_1}_M(ϕ_1), …, P^{opt_ℓ}_M(ϕ_ℓ)). The Pareto curve Pareto(M, Φ) consists of the achievable probability vectors p^S = ⟨P^S_M(ϕ_1), …, P^S_M(ϕ_ℓ)⟩ that are not dominated by any other achievable vector.

Example 4

We consider the tradeoff Φ_ex = multi(P^max_{M_ex}(C_1 ≤4 {s_1}), P^max_{M_ex}(C_2 ≤3 {s_2})) for the MDP M_ex of Fig. 2a. Let S_j be the scheduler that tries to move to s_1 for at most j attempts and afterwards almost surely moves to s_2. The induced probability vectors p^{S_1} = ⟨0.5, 1⟩ and p^{S_2} = ⟨0.75, 0.75⟩ both lie on the Pareto curve, since no S ∈ Sched(M_ex) induces (strictly) larger probabilities p^S. By also considering schedulers that randomise between the choices of S_1 and S_2, we obtain Pareto(M_ex, Φ_ex) = {w · p^{S_1} + (1−w) · p^{S_2} | w ∈ [0, 1]}. Graphically, the Pareto curve corresponds to the line between p^{S_1} and p^{S_2}, as shown in Fig. 2b.
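The probability vectors induced by the schedulers S_j can be derived in closed form, assuming a failed attempt towards s_1 costs ⟨1, 2⟩ and a successful one ⟨1, 0⟩ (the latter is not stated explicitly in the text and is an assumption); a sketch:

```python
def p_vector(j):
    # Scheduler S_j: attempt the move to s1 at most j times, then switch to
    # moving towards s2 almost surely.
    # Objective 1 (reach s1 with cost1 <= 4): the k-th attempt succeeding
    # accumulates cost1 = k, so success must occur within min(j, 4) attempts.
    p1 = 1 - 0.5 ** min(j, 4)
    # Objective 2 (reach s2 with cost2 <= 3): each failed attempt adds 2 to
    # cost2, so at most one failure may occur before switching to s2.
    p2 = 1.0 if j == 1 else 0.75  # P(0 failures) + P(exactly 1 failure)
    return (p1, p2)
```

Evaluating `p_vector(1)` and `p_vector(2)` reproduces the vectors ⟨0.5, 1⟩ and ⟨0.75, 0.75⟩ stated above.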
For clarity of presentation in the following sections, and unless otherwise noted, we restrict to tradeoffs Φ where every cost structure occurs exactly once, i.e. the number m of cost structures of M matches the number of cost bounds occurring in Φ. Furthermore, we require that none of the sets of goal states contains the initial state. Both assumptions are without loss of generality, since any formula can be made to satisfy this restriction by copying cost structures as needed and adding a new initial state with a zero-cost transition to the old initial state. We will also introduce all ideas for the upper-bounded maximum case first, assuming a tradeoff Φ = multi(P^max_M(ϕ_1), …, P^max_M(ϕ_ℓ)) with cost-bounded reachability formulas ϕ_k (cf. Definition 5). We discuss other bound types in Sect. 4.4, including combinations of lower and upper bounds.

The Unfolding Approach Revisited
The classic approach to compute cost-bounded properties is to unfold the accumulated cost into the state space [1]. Our new approach is more memory-efficient than unfolding, but fundamentally rests on the same notions and arguments for correctness. We thus present the unfolding approach in this section first.

Epochs and Goal Satisfaction
The central concept in our approach is that of cost epochs. The idea is that, to compute a Pareto curve, we analyse all reachable cost epochs one by one, in a specific order. Let us start with the most naïve way to track costs: for each path, we can plot the accumulated cost in all dimensions along this path in a cost grid. A coordinate ⟨c_1, …, c_m⟩ in the cost grid reflects that the amount of collected cost in dimension i is c_i.

Example 5

Consider a path π through M_ex as illustrated in Fig. 3a. Starting from ⟨0, 0⟩, the first transition yields cost 2 for the first cost structure: we jump to coordinate ⟨2, 0⟩. The next transition, back to s_0, has no cost, so we stay at ⟨2, 0⟩. The failed attempt to move to s_1 incurs costs ⟨1, 2⟩, jumping to coordinate ⟨3, 2⟩. This series of updates repeats ad infinitum.
For an infinite path, infinitely many points in the cost grid may be reached. These points are therefore not a suitable notion to use in model checking. However, a tradeoff specifies limits for the costs; e.g. for Φ_ex of Example 4 we get cost limits 4 and 3. Once the limit for a cost is reached, accumulating further costs in this dimension does not impact the satisfaction of the corresponding formula. It thus suffices to keep track of the remaining costs before reaching the cost limit of each bound. This leads to a finite grid of cost epochs.

Example 6
Reconsider path π of Example 5 and Φ_ex as above. We illustrate the finite epoch grid in Fig. 3b. We start in cost epoch ⟨4, 3⟩. The first transition incurs cost ⟨2, 0⟩, so the remaining cost before reaching the bounds is ⟨2, 3⟩. These updates continue analogously to Example 5. From ⟨1, 1⟩, taking cost ⟨2, 0⟩ means that we violate the bound in the first dimension. We indicate this violation with ⊥ and move to ⟨⊥, 1⟩. Later, taking cost ⟨1, 2⟩ does not change that we already violated the first bound: the first entry remains ⊥, but as we now also violate the second bound, we move to ⟨⊥, ⊥⟩. We then remain in this cost epoch forever.
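The epoch update just illustrated can be written as a small function; `None` stands in for the bottom symbol ⊥:

```python
BOT = None  # represents ⊥: the cost limit of this dimension is exceeded

def succ_epoch(e, c):
    """Update a cost epoch e (remaining budgets per bound) after
    collecting the cost vector c along a branch."""
    def upd(ei, ci):
        if ei is BOT or ei - ci < 0:
            return BOT  # limit was already, or is now, exceeded
        return ei - ci
    return tuple(upd(ei, ci) for ei, ci in zip(e, c))
```

The assertions below replay the steps of Example 6 on the epoch grid.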
Recall that we consider upper bounds. Consequently, if the entry for a bound is ⊥, the bound cannot be satisfied any more: too much cost has already been incurred. To check whether an objective ϕ_k = ⋀_{i=n_{k−1}}^{n_k − 1} (C_i ≤b_i G_i) is satisfied, we need to memorise whether each individual bound already holds, that is, whether we have reached a state in G_i before exceeding the cost limit.

Definition 7
A goal satisfaction g ∈ G_m := {0, 1}^m represents the cost structure indices i for which the bound C_i ≤b_i G_i already holds, i.e. G_i was reached before exceeding the cost limit b_i. For g ∈ G_m, e ∈ E_m and s ∈ S, let succ(g, s, e) ∈ G_m define the update upon reaching s with current epoch e: succ(g, s, e)[i] = 1 if g[i] = 1 or (s ∈ G_i and e[i] ≠ ⊥), and succ(g, s, e)[i] = 0 otherwise.

Example 7
Reconsider path π of Example 5 and Φ_ex as previously. As s_0 ∉ G_1 ∪ G_2, we start with g = ⟨0, 0⟩. We visit s_1 and update g to ⟨1, 0⟩, since s_1 ∈ G_1 and s_1 ∉ G_2. We then visit s_0 and g remains ⟨1, 0⟩. After that, upon visiting s_2, g is updated to ⟨1, 1⟩.
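The goal satisfaction update of Definition 7 can be sketched analogously, again using `None` for ⊥; the epoch values passed in the usage example are illustrative:

```python
def succ_goal(g, s, e, goal_sets):
    # Definition 7 (sketch): bound i counts as satisfied once a state in
    # G_i is reached while its cost limit is not yet exceeded (entry != ⊥).
    return tuple(
        1 if gi == 1 or (s in goal_sets[i] and e[i] is not None) else 0
        for i, gi in enumerate(g)
    )
```

The assertions replay the updates of Example 7 with G_1 = {s_1} and G_2 = {s_2}.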

The Unfolding Approach
We can now compute Pareto(M, Φ) by reducing to a multi-objective unbounded reachability problem on the unfolded MDP M_unf. The states of M_unf are the Cartesian product of the original MDP's states, the epochs, and the goal satisfactions, thereby effectively generalising the construction from [1].

Definition 8
The unfolding of an MDP M as in Definition 1 with respect to an upper-bounded maximum tradeoff Φ is the MDP M_unf whose states are the triples ⟨s, e, g⟩ ∈ S × E_m × G_m, with initial state ⟨s_init, ⟨b_1, …, b_m⟩, ⟨0, …, 0⟩⟩. Transitions and probabilities are thus as before, but if a branch is equipped with costs, we update the cost epoch entry in the state; likewise, if a state is a goal state, we update the corresponding goal satisfaction entry. As costs are now encoded in the state space, it suffices to consider an unbounded tradeoff on M_unf.

Example 8

Reconsider M_ex of Fig. 2a and Φ_ex as previously. Figure 4 contains a fragment of the unfolding.
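A breadth-first construction along the lines of Definition 8 might look as follows; the helper functions mirror the epoch and goal satisfaction updates described above, and the MDP encoding (state → action → branches) is an assumption of this sketch:

```python
from collections import deque

def unfold(mdp, s_init, limits, goal_sets):
    """Build the unfolded transition relation over triples (s, e, g)."""
    BOT = None
    def succ_e(e, c):  # epoch update, BOT = cost limit exceeded
        return tuple(BOT if ei is BOT or ei - ci < 0 else ei - ci
                     for ei, ci in zip(e, c))
    def succ_g(g, s, e):  # goal satisfaction update (Definition 7)
        return tuple(1 if gi or (s in goal_sets[i] and e[i] is not BOT) else 0
                     for i, gi in enumerate(g))
    init = (s_init, tuple(limits), (0,) * len(limits))
    states, queue, trans = {init}, deque([init]), {}
    while queue:
        s, e, g = queue.popleft()
        trans[(s, e, g)] = {}
        for act, branches in mdp[s].items():
            dist = []
            for p, c, s2 in branches:
                e2 = succ_e(e, c)
                t = (s2, e2, succ_g(g, s2, e2))
                dist.append((p, t))
                if t not in states:
                    states.add(t)
                    queue.append(t)
            trans[(s, e, g)][act] = dist
    return trans
```

The blow-up discussed in Sect. 4 is visible here: the reachable state count grows with the product of the cost limits.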

Multi-objective Model Checking on the Unfolding
Pareto(M_unf, Φ) can be computed with existing multi-objective model checking algorithms for unbounded reachability. We build on the approach of [25]: we iteratively choose weight vectors w = ⟨w_1, …, w_ℓ⟩ ∈ [0, 1]^ℓ \ {0} and compute points p_w that maximise the weighted sum w · p over the achievable points p. The Pareto curve Pareto(M_unf, Φ) is convex, has finitely many vertices, and contains the point p_w for each weight vector w. Moreover, q · w > p_w · w implies q ∉ Pareto(M_unf, Φ). These observations enable us to approximate the Pareto curve with arbitrary precision by enumerating its vertices p_w in a smart order. At any point, the algorithm maintains two convex areas which under- and overapproximate the area under the Pareto curve. Further details are given in [25], including a method to compute a bound on the error at any stage.
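The sandwich idea can be sketched for two maximising objectives as follows. Here `best_point` stands in for the model checking call that returns an achievable point maximising w · p; the refinement loop is a simplified illustration of the idea behind [25], not the exact algorithm, and assumes the two extreme points bracket the curve:

```python
def approximate_pareto(best_point, eps=1e-6):
    """Enumerate Pareto-curve vertices by refining between known points."""
    p1 = best_point((1.0, 0.0))  # optimise objective 1 alone
    p2 = best_point((0.0, 1.0))  # optimise objective 2 alone
    vertices = {p1, p2}
    segments = [(p1, p2)] if p1 != p2 else []
    while segments:
        a, b = segments.pop()
        # outward normal of the segment from a to b
        w = (a[1] - b[1], b[0] - a[0])
        if w[0] < 0 or w[1] < 0:
            w = (-w[0], -w[1])
        n = (w[0] ** 2 + w[1] ** 2) ** 0.5
        w = (w[0] / n, w[1] / n)
        p = best_point(w)
        # distance of the new point beyond the current segment along w
        gap = (p[0] - a[0]) * w[0] + (p[1] - a[1]) * w[1]
        if gap > eps:  # a new vertex lies beyond the segment: refine
            vertices.add(p)
            segments += [(a, p), (p, b)]
    return vertices
```

On the two achievable vertices of Example 4, the loop terminates immediately, since the weight vector perpendicular to their connecting segment yields no point beyond it.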
To reduce the computation of p_w to standard MDP model checking, [25] characterises p_w via weighted expected costs: we construct M⁺_unf from M_unf. States and transitions are as in M_unf, but M⁺_unf is additionally equipped with ℓ cost structures used to calculate the probability of each of the objectives. This is achieved by setting the value of the k-th cost structure on a branch to 1 if and only if the objective ϕ_k is satisfied in the target state of the branch but was not yet satisfied in the transition's source state. On a path π through M⁺_unf, we thus collect exactly cost 1 for cost structure k if and only if π satisfies objective ϕ_k.
The weighted expected cost is the expected value of the weighted sum of the costs accumulated on paths in M⁺_unf; in the definition, we consider a Lebesgue integral instead of a sum since Paths(M) is generally uncountable. The maximal weighted expected cost for M⁺_unf and w is denoted E^max_{M⁺_unf}(w). There is always a deterministic, memoryless scheduler S that attains the maximal expected cost [46].
This characterisation of p_w via maximal weighted expected costs in M⁺_unf is equivalent to Eq. 1. Standard MDP model checking algorithms [46] can be applied to compute an optimal (deterministic and memoryless) scheduler S and the induced expected costs E^S_{M⁺_unf} for the individual objectives, which together yield the point p_w.

Multi-cost Multi-objective Sequential Value Iteration
An unfolding-based approach as discussed in Sect. 3.2 does not scale well in terms of memory consumption: if the original MDP has n states, then the unfolding has on the order of n · ∏_{i=1}^{m} (b_i + 2) states. This blow-up makes an a priori unfolding infeasible for larger cost limits b_i over multiple bounds. The bottleneck lies in computing the points p_w as in Eqs. 1 and 2. In this section, we show how to compute these probability vectors efficiently, i.e. given a weight vector w = ⟨w_1, …, w_ℓ⟩ ∈ [0, 1]^ℓ \ {0}, compute p_w without creating the unfolding. The two characterisations of p_w given in Eqs. 1 and 3 are equivalent due to Lemma 1.
The efficient analysis of single-objective queries Φ_1 = P^max_M(C ≤b G) with a single bound has recently been addressed [28,37]. The key idea is based on dynamic programming: the unfolding is decomposed into b + 1 copies of the original MDP, one per cost epoch, which are analysed sequentially instead of considering all copies at once. In particular, it is not necessary to construct the complete unfolding.
We lift this idea to multi-objective tradeoffs with multiple cost bounds: we aim to build an MDP for each epoch e ∈ E_m that can be analysed via standard model checking techniques using the weighted expected cost encoding of objective probabilities. Notably, in the single-bound, single-objective case it is easy to determine whether the one property is satisfied: either reaching a goal state for the first time or exceeding the cost bound immediately determines satisfaction. Thus, while the sink part M_⊥ is just one sink state in the single cost bound case, its structure is more involved in the presence of multiple objectives and multiple cost bounds.

An Epoch Model Approach without Unfolding
We first formalise epoch models for multiple bounds. As noted, the overall epoch structure is the same as in the unfolding approach.
and for every ŝ = ⟨s, g⟩ ∈ S_e and μ ∈ T(s), there is a corresponding ν ∈ T_e^f(ŝ). In contrast to Definition 1, the MDP M_e^f may contain cost vectors that consist of non-natural numbers, as reflected by the image of f. The two items in the definition reflect the two cases described before. For item 2, the sum satObj(g, g') + f(g', μ) reflects the two cases where an objective is satisfied in the current step (upon taking a branch that leaves the epoch) or only afterwards. In particular, our algorithm constructs f from the results already computed for the successor epochs.

Proof
We apply the characterisation of (weighted) expected rewards as the smallest solution of a Bellman equation system [25,46]. For M⁺_unf, assume variables x[⟨s, ê, g⟩] ∈ ℝ≥0 for every ⟨s, ê, g⟩ ∈ S_unf. The smallest solution of this equation system coincides with the values computed on the epoch models, which yields the claim.

To analyse an epoch model M_e^f, any successor epoch e' of e needs to have been analysed before. Since costs are non-negative, we can ensure this by analysing the epochs in a specific order: in the case of a single cost bound, this order is uniquely given by ⊥, 0, 1, …, b. For multiple cost bounds, any proper epoch sequence can be considered; this notion coincides with a topological sort of the graph in Fig. 6. To improve performance, we group epoch models with a common MDP structure.
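One valid proper epoch sequence orders the epochs by the sum of their entries (encoding ⊥ as −1), since every successor epoch is component-wise smaller and hence has a strictly smaller sum; a sketch:

```python
from itertools import product

BOT = -1  # encodes ⊥; any natural epoch entry dominates it

def proper_epoch_sequence(limits):
    """All epochs over {⊥, 0, ..., b_i}, ordered so that every epoch
    appears after all epochs reachable from it by accumulating cost."""
    grid = product(*[range(BOT, b + 1) for b in limits])
    # successor epochs are component-wise <=, hence have smaller sum
    return sorted(grid, key=lambda e: (sum(e), e))
```

For a single bound b, this degenerates to the unique order ⊥, 0, 1, …, b mentioned above.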

Algorithm 1: Sequential multi-cost bounded analysis
Example 11 For the epoch models depicted in Fig. 5, any enumeration of the epochs that lists every epoch after all of its successor epochs is a possible proper epoch sequence.

We compute the points p_w by analysing the different epoch models (i.e. the coordinates of Fig. 3b) sequentially, using a dynamic programming-based approach. The main procedure is outlined in Algorithm 1.
After an optimal scheduler S for the weighted expected costs of the current epoch model has been computed, we compute in line 16 the expected costs induced by S for the individual objectives. Forejt et al. [25] describe how this computation can be implemented with a value iteration-based procedure. Alternatively, we can apply policy iteration or linear programming [46] for this purpose.
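To illustrate the core idea of Algorithm 1 without reproducing it exactly, the following sketch performs the sequential epoch analysis for a single-objective, single-cost query P^max_M(C ≤b G); the MDP encoding (state → action → branches of (probability, cost, successor)) is an assumption of this sketch:

```python
def max_cost_bounded_reach(mdp, s_init, goal, b):
    """x[e][s] = max probability to reach `goal` from s with budget e left,
    computed epoch by epoch in the proper sequence BOT, 0, 1, ..., b."""
    BOT = -1
    x = {BOT: {s: 0.0 for s in mdp}}  # budget exhausted: goal missed
    for e in range(0, b + 1):
        v = {s: 1.0 if s in goal else 0.0 for s in mdp}
        for _ in range(10_000):  # value iteration within the epoch model
            delta, new = 0.0, {}
            for s in mdp:
                if s in goal:
                    new[s] = 1.0
                    continue
                best = 0.0
                for branches in mdp[s].values():
                    val = 0.0
                    for p, c, s2 in branches:
                        if c == 0:
                            val += p * v[s2]          # stay in this epoch
                        elif e - c < 0:
                            val += p * x[BOT][s2]     # budget exhausted
                        else:
                            val += p * x[e - c][s2]   # solved earlier epoch
                    best = max(best, val)
                new[s] = best
                delta = max(delta, abs(best - v[s]))
            v = new
            if delta < 1e-12:
                break
        x[e] = v
    return x[b][s_init]
```

Each epoch model only refers to values of previously solved epochs (branches with positive cost) or to its own current iterate (zero-cost branches), so no unfolding is ever materialised.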

Theorem 1
The output of Algorithm 1 satisfies Eq. 3.
Proof We prove, for each epoch e, that the values computed for the epoch model M_e^f coincide with the corresponding values in the unfolding; the claim then follows for the initial epoch.

Runtime and Memory Requirements
In the following, we discuss the complexity of our approach relative to the size of a binary encoding of the cost limits b_1, …, b_m occurring in a tradeoff Φ. Algorithm 1 computes expected weighted costs for |E| many epoch models M_e^f. Each of these computations can be done in polynomial time (in the size of M_e^f) via a linear programming encoding [46]. With |E| ≤ ∏_{i=1}^{m} (b_i + 2), we conclude that the runtime of Algorithm 1 is exponential in a binary encoding of the cost limits. For the unfolding approach, weighted expected costs have to be computed for a single MDP whose size is, again, exponential in a binary encoding of the cost limits. Although we observe similar theoretical runtime complexities for both approaches, experiments with topological value iteration [5,19] and single cost bounds [2,28] have shown the practical benefits of analysing several small sub-models instead of one large MDP. We make similar observations with our approach in Sect. 7.
Algorithm 1 stores a solution vector x_e[⟨s, g⟩] ∈ ℝ for each e ∈ E, s ∈ S, and g ∈ G_m, i.e. a solution vector is stored for every state of the unfolding. However, memory consumption can be optimised by erasing solutions x_e[⟨s, g⟩] as soon as the value is no longer accessed by any of the remaining epoch models (for example, once all predecessor epochs of e have been considered). If m = 1 (i.e. there is only a single cost bound), such an optimisation yields an algorithm that runs in polynomial space. In the general case (m > 1), the memory requirements remain exponential in the size of a binary encoding of the cost limits. However, our experiments in Sect. 7 indicate substantial memory savings in practice.

Error Propagation
As presented above, the algorithm assumes that (weighted) expected costs E^S_M(w) are computed exactly. Practical implementations, however, are often based on numerical methods that only approximate the correct solution. The de-facto standard in MDP model checking for this purpose is value iteration. Methods based on value iteration do not provide any guarantee on the accuracy of the obtained result [27] for the properties considered here. Recently, interval iteration [5,27] and similar techniques [9,34,48] have been suggested to provide error bounds. These methods guarantee that the obtained result x is ε-precise for any predefined precision ε > 0, i.e. upon termination we obtain |x[s] − E^S_M(w)[s]| ≤ ε for all states s. We describe how to adapt our approach for multi-objective multi-cost bounded reachability to work with an ε-precise method for computing the expected costs.

General Models
Results from topological interval iteration [5] indicate that individual epochs can be analysed with precision ε to guarantee this same precision for the overall result. The downside is that such an adaptation requires storing the obtained bounds for all previously analysed epochs. Therefore, we instead extend the following result from [28], stated for the single-cost bounded case. The bound is easily deduced: assume the results of previously analysed epochs (given by f) are η-precise and that M_e^f is analysed with precision δ. The total error for M_e^f can then accumulate to at most δ + η. As we analyse b + 1 (non-trivial) epoch models, the error accumulates to at most (b + 1) · δ. Setting δ to ε/(b + 1) guarantees the desired bound ε. We generalise this result to multi-cost bounded tradeoffs.
Theorem 2 If each epoch model is analysed with precision δ = ε / ∏_{i=1}^{m} (b_i + 1) for some ε > 0, the output p'_w of the algorithm satisfies |p'_w − p_w| · w ≤ ε, where p_w is as in Eq. 3.
Proof As in the single-cost bounded case, the total error for M_e^f can accumulate to δ + η, where η is the (maximal) error bound on f. The error bound on f is in turn determined recursively, as δ times the maximal number of epochs visited along paths starting from the successor epochs. Since a path through the MDP M visits at most ∏_{i=1}^{m} (b_i + 1) non-trivial cost epochs, each contributing error δ, the overall error is upper-bounded by δ · ∏_{i=1}^{m} (b_i + 1).
While an approach based on Theorem 2 thus requires the analysis of epoch models with tighter error bounds than the bounds induced by [5], and therefore potentially increases the per-epoch analysis time, it still allows us to be significantly more memory-efficient.
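The per-epoch precision required by Theorem 2 is straightforward to compute; a sketch:

```python
from math import prod

def per_epoch_precision(eps, limits):
    # Theorem 2 (sketch): a path visits at most prod(b_i + 1) non-trivial
    # epochs, so analysing each epoch model with this precision bounds the
    # accumulated error of the overall result by eps.
    return eps / prod(b + 1 for b in limits)
```

For the cost limits 4 and 3 of the running example and an overall precision of 0.01, each epoch model must be solved with precision 0.01/20.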

Acyclic Epoch Models
The error bound in Theorem 2 is pessimistic, as it does not assume any structure in the epoch models. However, very often the individual epoch models are in fact acyclic, in particular for cost epochs e ∈ ℕ^m, i.e. with e[i] ≠ ⊥ for all i. Intuitively, costs usually represent quantities like time or energy usage, for which the possibility to perform infinitely many interesting steps without accumulating cost would be considered a modelling error. In the timed case, for example, such a model would allow Zeno behaviour, which is generally considered unrealistic and undesirable. When epoch models are acyclic, interval iteration [5,27] converges to the exact result in a finite number of iterations. In this case, the tightening of the precision according to Theorem 2 usually has no effect on runtime. The epoch models for cost epochs e ∈ ℕ^m are acyclic for almost all models that we experiment with in Sect. 7.

Different Bound Types
Minimising Objectives Objectives P^min_M(ϕ_k) can be handled by adapting the function satObj in Definition 10 such that it assigns cost −1 to branches that lead to the satisfaction of ϕ_k. To obtain the desired probabilities, we then maximise negative costs and multiply the result by −1 afterwards. As interval iteration supports mixtures of positive and negative costs [5], arbitrary combinations of minimising and maximising objectives can be considered.¹
Beyond Upper Bounds Our approach also supports bounds of the form C^j_{∼b} G for ∼ ∈ {<, ≤, >, ≥}, and we allow combinations of lower and upper cost bounds. Strict upper bounds < b can be reformulated to non-strict upper bounds ≤ b − 1. Likewise, we reformulate non-strict lower bounds ≥ b to strict lower bounds > b − 1, and only consider strict lower bounds in the following.
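The bound reformulation described above is mechanical and can be sketched as a tiny normaliser; the tuple encoding of a bound is our own illustration:

```python
def normalise_bound(op, b):
    """Rewrite a strict upper bound '< b' to the non-strict '<= b-1' and a
    non-strict lower bound '>= b' to the strict '> b-1', as described in
    the text; '<=' and '>' are already in normal form."""
    if op == '<':
        return ('<=', b - 1)
    if op == '>=':
        return ('>', b - 1)
    return (op, b)
```

For example, `normalise_bound('<', 5)` yields `('<=', 4)` and `normalise_bound('>=', 3)` yields `('>', 2)`.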
For a bound C^i_{>b_i} G_i, we adapt the update of goal satisfactions (Definition 7) accordingly. Moreover, we support multi-bounded single-goal queries like C^{(j_1,...,j_n)}_{(∼_1 b_1,...,∼_n b_n)} G, which characterise the paths π with a prefix π_fin satisfying last(π_fin) ∈ G and all cost bounds simultaneously, i.e. cost_{j_i}(π_fin) ∼_i b_i for all i. Let us clarify the meaning of simultaneously with an example.

Example 12
The formula ϕ = C^{(1,1)}_{(≤1,≥1)} G characterises the paths that reach G while having collected exactly cost 1 with respect to the first cost structure. This formula is not equivalent to ϕ′ = C^1_{≤1} G ∧ C^1_{≥1} G. Consider the trivial MDP in Fig. 7 with G = {s_0}. The MDP (and the trivial scheduler) satisfies ϕ′ but not ϕ: initially, the left-hand side of ϕ′ is (already) satisfied, and after one more step along the unique path, also the right-hand side is satisfied, thereby satisfying the conjunction. However, at no point is exactly cost 1 collected, hence ϕ is never satisfied.
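Example 12 can be replayed numerically. We assume, hypothetically (Fig. 7 is not reproduced here), a single goal state s_0 with a self-loop collecting cost 2, so a length-k prefix has accumulated cost 2k:

```python
costs = [2 * k for k in range(11)]  # accumulated cost after k loop steps

# phi: some single prefix ends in G with cost meeting BOTH bounds at once
phi = any(c <= 1 and c >= 1 for c in costs)
# phi': each conjunct may be witnessed by a DIFFERENT prefix
phi_conj = any(c <= 1 for c in costs) and any(c >= 1 for c in costs)
```

Here `phi` is False (no prefix collects exactly cost 1) while `phi_conj` is True, matching the discussion above.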

Expected Cost Objectives
The algorithm supports cost-bounded expected cost objectives E^opt_M(R^{j_1}, C^{j_2}_{≤b}) with opt ∈ {max, min}, which refer to the expected cost accumulated for cost structure j_1 within a given cost bound C^{j_2}_{≤b}. The computation is analogous to that for cost-bounded reachability queries: we treat them by computing (weighted) expected costs within epoch models. They can therefore be used in multi-objective queries, potentially in combination with cost-bounded reachability objectives.

Multi-cost Quantiles
The queries presented in previous sections assume that cost limits are fixed a priori and ask for the induced probabilities. We now study the opposite question: What are the cost limits required to satisfy a given probability threshold? This question thus asks for computing quantiles as considered in [2,37,52,57], and we lift it to multiple cost bounds. In particular, we present an efficient implementation of an algorithm to answer questions like:
- How much time and energy is required to fulfil a task with at least probability 0.8?
- How many product types can be manufactured without failure with probability 0.99?
- How much energy is needed to complete how many jobs with probability 0.9?
In this section, we introduce multi-cost bounded quantile queries to formalise these questions. We then first sketch our approach to solve them, and after that provide a more extensive treatment of quantiles with only upper cost bounds and of quantiles with only lower cost bounds. Finally, we address more complex forms of quantiles in Sect. 5.5.

Quantiles in Multiple Dimensions
Definition 12 An m-dimensional quantile query for an MDP M and m ∈ N is given by Q = Qu(P^opt_M(ϕ_?) ∼ p), where ϕ_? arises from a cost-bounded reachability formula by leaving the m cost limits open. The solution of a quantile query is a set of cost limits that satisfy the probability threshold.

Definition 13
The set of satisfying cost limits for an m-dimensional quantile query Q = Qu(P^opt_M(ϕ_?) ∼ p) is given by Sat(Q) = { b ∈ (N ∪ {∞})^m | P^opt_M(ϕ_b) ∼ p }, where ϕ_b arises from ϕ_? by inserting cost limits b.

Example 13
Consider the MDP M_Qu given in Fig. 8a and the quantile query Q_ex. The (upper-right, brighter) green area in Fig. 8b indicates the set of satisfying cost limits for Q_ex. Concretely, the set describes a form of closure of a set of points on the frontier. We discuss why this is the satisfying set. First, consider cost limits ⟨1, y⟩ for arbitrary y. The point indicates a limit of 1 in the first dimension. In particular, the leftmost action is then never helpful in satisfying the objective as it takes cost 2 in the first dimension. Thus, we have to take the right action. When taking this action, we may return to s_0 at most once before violating the cost limit. Thus, the probability to reach the target is 0.1 + 0.9 · 0.1 < 0.5, and these cost limits violate the query. Now, consider cost limits ⟨6, 0⟩. Using similar reasoning as above, only the right action is relevant. We may take the self-loop at most 6 times, which yields a probability to reach the target within the cost limit of ∑_{i=0}^{6} 0.1 · 0.9^i > 0.5, and thus these cost limits satisfy the query. Finally, consider cost limits ⟨2, 4⟩. Now, the left action helps: we can take the left action at most 4 times. If we still have not reached the target, we have ⟨2, 0⟩ cost remaining, which can be spent trying the right action up to 2 times. The probability of reaching the target under this scheduler is again ∑_{i=0}^{6} 0.1 · 0.9^i > 0.5.
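The probabilities used in Example 13 are plain geometric sums and can be checked directly:

```python
# limits <1, y>: at most one return to s_0 before the budget is exhausted
p_tight = 0.1 + 0.9 * 0.1                        # ≈ 0.19 < 0.5
# limits <6, 0> (and <2, 4>): up to 6 attempts via the right action
p_loose = sum(0.1 * 0.9 ** i for i in range(7))  # = 1 - 0.9**7 ≈ 0.52 > 0.5
```

The closed form 1 − 0.9^7 follows because each attempt succeeds with probability 0.1 and the attempts are independent.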

For the remainder, let us fix an m-dimensional quantile query Q = Qu(P^opt_M(ϕ_?) ∼ p).
In Example 13, the infinite set of satisfying cost limits is concisely described as the closure of the finitely many points generating its "frontier". We lift this type of representation to general quantile queries.

Definition 14
The closure of a set B ⊆ (N ∪ {∞})^m with respect to a quantile query Q is given by cl(B) = { b′ ∈ (N ∪ {∞})^m | ∃ b ∈ B : b ⊑ b′ }, where ⊑ is a componentwise order on cost limits determined by the bound types of Q. Indeed, we can always characterise the set of satisfying cost limits by some B ⊆ Sat(Q) with cl(B) = cl(Sat(Q)).

This contradicts our assumption that G_1 is a generator of B.
We also refer to gen(Sat(Q)) as the generator of quantile query Q. A generator is called natural if it only contains points in N^m. The following example shows that quantile queries can have (non-)natural and (in-)finite generators.
Example 14 The queries yield the satisfying cost limits as shown in Fig. 9b. Q_ex does not have a finite generator.
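To make generator and closure concrete, the following sketch assumes the setting of Example 13 (upper cost bounds under maximisation, so satisfying limits stay satisfying when increased); the componentwise order used here is an assumption matching that case, since the order in Definition 14 depends on the query's bound types:

```python
def leq(a, b):
    """Componentwise order on cost-limit vectors."""
    return all(x <= y for x, y in zip(a, b))

def generator(sat_points):
    """The generator of an upward-closed set: its componentwise-minimal points."""
    return {p for p in sat_points
            if not any(q != p and leq(q, p) for q in sat_points)}

def in_closure(b, gen_points):
    """b lies in the closure iff it dominates some generator point."""
    return any(leq(g, b) for g in gen_points)

sat = {(6, 0), (2, 4), (7, 1), (3, 5)}
gen_pts = generator(sat)  # the dominated points (7,1) and (3,5) drop out
```

Here `gen_pts` is `{(6, 0), (2, 4)}`, and e.g. `(6, 3)` lies in its closure while `(1, 3)` does not.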

Computing Finite Natural Generators
We now present an algorithm to compute the set of satisfying cost limits Sat(Q) for quantile queries Q where both Q and its complementary query Q̄ have a finite natural generator. In the subsequent subsections, we present suitable preprocessing steps to lift this algorithm to more general quantiles.
Our procedure is outlined as Algorithm 2.
Lemma 6 Upon termination of Algorithm 2, B+ contains only satisfying and B− only non-satisfying cost limits.
Proof As in the proof of Theorem 1, it can be shown that in lines 11–25 the algorithm computes P^opt_M(ϕ_e), where ϕ_e arises from ϕ_? by inserting the cost limits from e. The algorithm checks whether this probability meets the threshold given in Q and inserts e into either B+ or B−, accordingly. Hence, B+ (B−) only contains (non-)satisfying cost limits.

Lemma 7 If Algorithm 2 terminates, it returns a generator of Q.
Proof We first show that upon termination, N^m ⊆ cl(B+) ∪ cl(B−). When the algorithm terminates, there is a b ∈ N beyond which no further candidate epochs need to be considered; every b′ ∈ N^m is then related, via the order ⊑ from Definition 14 and its transitivity, to some point of B+ or B−, and thus lies in cl(B+) ∪ cl(B−). Next, we show that upon termination, cl(B+) = cl(Sat(Q)); Lemma 7 then follows, as sets with the same closure have the same unique generator. The inclusion cl(B+) ⊆ cl(Sat(Q)) follows already from Lemma 6. For the other direction, consider some b ∈ cl(Sat(Q)).

Lemma 8 Algorithm 2 terminates if Q and Q̄ have finite natural generators.
Proof Eventually, all points of the finite set gen(Sat(Q)) ⊆ N^m are considered in line 6 and inserted into B+, yielding cl(B+) = cl(Sat(Q)). Similarly, cl(B−) = cl(Sat(Q̄)) holds eventually. Hence, all b ∈ N^m are contained in cl(B+) ∪ cl(B−), which leads to termination of the procedure.

Theorem 3 Algorithm 2 yields a generator of Q if Q and Q̄ have finite natural generators.
If the algorithm terminates, it analyses at most (b_max)^m epoch models, where b_max is the largest considered cost limit. Finite natural generators are only required for Lemma 8, i.e. to guarantee termination of the algorithm. Intuitively, Algorithm 2 lacks a mechanism to analyse the resulting reachability probability when one or more cost limits approach infinity. For the single-cost case, [2,57] suggest preprocessing steps to check whether Sat(Q) = ∅ and Sat(Q̄) = ∅ hold before invoking their variant of sequential value iteration. These preprocessing steps ensure that sequential value iteration always finds a b ∈ N with gen(Sat(Q)) = {b}. We next lift these ideas to multi-dimensional quantiles.
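A much-simplified rendering of the search idea behind Algorithm 2: candidates are explored in increasing order, points already covered by the closure of B+ are skipped, and a black-box satisfaction check stands in for the per-epoch analysis. The monotone oracle and the explicit bound `b_max` below are toy assumptions; unlike Algorithm 2, this sketch omits B− and relies on `b_max` for termination:

```python
from itertools import product

def quantile_generator(sat, b_max, m):
    """Return the minimal satisfying cost limits of a monotone query
    (satisfying limits stay satisfying when raised), explored up to b_max."""
    b_plus = []
    for cand in sorted(product(range(b_max + 1), repeat=m), key=sum):
        if any(all(g[i] <= cand[i] for i in range(m)) for g in b_plus):
            continue  # already in the closure of B+, no analysis needed
        if sat(cand):
            b_plus.append(cand)
    return b_plus

# toy oracle: "satisfied" iff the weighted budget 2*b1 + b2 reaches 6
gen_pts = quantile_generator(lambda b: 2 * b[0] + b[1] >= 6, b_max=6, m=2)
```

The closure check mirrors the role of cl(B+): once a point is dominated, the corresponding epoch model need not be analysed again.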

Upper Cost-Bounded Quantiles
Here, we consider upper cost-bounded quantile queries, i.e. queries in which all cost bounds are upper bounds. For the single-cost case, the preprocessing checks the unbounded reachability probability [2,57]. Example 14 shows that the complementary query might have a non-natural generator. To guarantee termination of Algorithm 2, it suffices to initialise the set B− with the points from B−_init = gen(Sat(Q̄)) \ N^m. Intuitively, this reflects a situation in which the cost-bounded reachability probability is always below the threshold p, for any amount of cost collected towards the cost limits given by I. The procedure then calls Algorithm 2 on M with B+ = ∅ and B− initialised as above, and returns gen(B+) (Algorithm 3: quantile computation with upper cost bounds).
Here, upper cost bounds with arbitrarily high cost limits are replaced by unbounded reachability, following ideas from the single-dimensional case [2,57]. This observation provides the basis of our algorithm, outlined as Algorithm 3. In the case of single-cost queries, it analyses the unbounded reachability probability to check whether there is some cost limit that satisfies the probability threshold (lines 2–4). For quantiles with m ≥ 2 cost dimensions, we find the points b ∈ gen(Sat(Q̄)) with b[k] = ∞ by checking all m of the (m − 1)-dimensional quantile queries Q_k in which the k-th cost bound is replaced by an unbounded reachability constraint (lines 6–8). These quantiles with partially known cost limits (cf. Remark 1) can be solved by calling Algorithm 3 recursively. We cache the results of the recursive calls, i.e. each of the occurring quantile queries is only processed once. Since each of the m cost bounds can either be replaced by an unbounded reachability constraint or not, this results in roughly 2^m different calls to Algorithm 2. However, the recursive calls consider simpler quantiles with only m′ < m dimensions.

Theorem 4 Algorithm 3 yields a generator of the upper cost-bounded quantile query Q.
Proof We show that B− = B−_init holds before calling Algorithm 2 in line 9. For m = 1, this is ensured in lines 2–4. For m ≥ 2, observe that a point b with b[k] = ∞ is inserted into B− if and only if ⟨b_1, . . . , b_{k−1}, b_{k+1}, . . . , b_m⟩ ∈ gen_k(Sat(Q̄_k)). The statement then follows from Lemma 9.

Lower Cost-Bounded Quantiles
We now consider lower cost-bounded quantile queries, i.e. queries in which all cost bounds are lower bounds.

Remark 2
We restrict ourselves to maximising schedulers. In Sect. 5.5, we discuss issues with lower cost-bounded quantiles under minimising schedulers, i.e. quantile queries with P^min_M(ϕ). The following example shows that such a query might have a finite non-natural generator.

Example 15 Consider the MDP M_Qu given in Fig. 10a and the quantile query Q_ex.
The (lower-left, brighter) green area in Fig. 10b indicates the set of satisfying cost limits for Q_ex. Similar to quantiles with upper cost bounds, we ensure termination of Algorithm 2 by initialising B+ with B+_init = gen(Sat(Q)) \ N^m.
Lemma 10 Algorithm 2 terminates and returns a generator of a lower cost-bounded quantile query Q if initially B+ = gen(Sat(Q)) \ N^m.
Each point b ∈ B+_init is characterised by a non-empty set I ⊆ {1, . . . , m} with b[i] = ∞ if and only if i ∈ I. For the single-cost case, the computation of lim_{b→∞} P^max_M(C^j_{≥b} G) has been addressed in [2]: the approach relies on the notion of end components (Definition 3). For a finite path π_fin = s_0 μ_0 c_0 s_1 . . . μ_{n−1} c_{n−1} s_n, we consider the costs accumulated in end components, denoted cost^EC_j(π_fin). The procedure then calls Algorithm 2 on M with B− = ∅ and B+ initialised as above, and returns gen(B+) (Algorithm 4: quantile computation with lower cost bounds).
We write C^{EC,j}_{>b} G to characterise the set of paths Π where every π ∈ Π has a prefix π_fin with last(π_fin) ∈ G and cost^EC_j(π_fin) > b. If we reach an EC in which costs can be collected, we can accumulate arbitrarily much additional cost by staying in that EC. We thus have lim_{b→∞} P^max_M(C^j_{≥b} G) = P^max_M(C^{EC,j}_{>0} G) [2]. We lift this observation to multiple lower cost bounds: any visit of a cost-collecting EC can be extended to accumulate more cost without violating other (lower) cost bounds, so the same replacement remains sound. Our procedure for lower cost-bounded quantiles is shown in Algorithm 4. Similar to Algorithm 3, it adds points b with b[i] = ∞ for some i to B+_init. In order to compute probabilities for arbitrarily high cost limits, it replaces the corresponding cost bound by C^{EC,j_k}_{>0} G_k.
Theorem 5 Algorithm 4 yields a generator of the lower cost-bounded quantile query Q.

Intricate Quantile Queries
Above, we have considered only a subset of the possible quantile queries; many further combinations are possible, but they do not easily fit into the framework provided above. We illustrate this mismatch with some cases.

Lower Cost-Bounded Quantiles Under Minimising Schedulers
To lift the results from the previous section to quantile queries that consider P min M (ϕ), we need to compute probabilities when one or more cost limits approach infinity. For the single-cost case, [2] proposes a transformation that instead computes the maximal probability to reach an end component in which either no goal state is visited or no cost is accumulated. However, in the multi-cost case, such a transformation does not preserve the other cost bounds.

Mixtures of Lower and Upper Cost Bounds
Quantile queries that consider mixtures of lower and upper cost bounds might have infinite generators, as shown in Example 14. In this case, a finite representation of the satisfying cost limits cannot be achieved with our explicit construction of the generator. A procedure to check the subset of such queries that still yields finite generators is left for future work.
Quantiles over Multi-objective Tradeoffs We considered quantile queries with a single probability operator P^opt_M(ϕ). An extension inspired by multi-objective queries considers quantiles² over several conflicting objectives. Such a query asks for the cost limits for which there is a scheduler that satisfies all probability thresholds. This introduces two sources of tradeoffs: a tradeoff between different resolutions of nondeterminism and a tradeoff between different cost limits. Handling these requires analysing the tradeoffs at each epoch model, and propagating the results through the epochs requires great care; it is outside the scope of this paper.

Visualisations
The aim of visualising the results of a multi-objective model checking analysis is to present the tradeoffs between the different objectives such that the user can make an informed decision about the system design or pick a scheduler for implementation. However, the standard Pareto set visualisations alone may not provide sufficient information about, e.g., which objectives are aligned or conflicting (see e.g. [43] for a discussion in the non-probabilistic case). Cost bounds furthermore add an extra dimension for each cost structure. In particular, for each Pareto-optimal scheduler, our method has implicitly computed the probabilities of all objectives for all reachable epochs as well, i.e. for all bounds on all quantities below the bounds required in the tradeoff. In this section, we first show the standard Pareto curve visualisation, which provides the user with an easy-to-understand but very high-level overview of the solution space of a multi-objective query. We then propose a way to visualise the behaviour of individual Pareto-optimal schedulers w.r.t. the probabilities of the individual objectives and the bound values in two-dimensional heatmap plots. These plots provide deep insights into the behaviour of each scheduler, its robustness w.r.t. the bounds, and its preferences for certain objectives depending on the remaining budget for each quantity. Yet due to the need to reduce the dimensionality of the available information, they can be difficult to understand at first glance. We offer them both as a first straightforward attempt at visualising all the available data as well as an urgent call to visualisation experts to develop more perspicuous ways to present this wealth of data to users.

Pareto Curves
The results of a multi-objective model checking analysis are typically presented as a single (approximation of a) Pareto curve. As a running example for this section, consider the Mars rover MDP M_r and the tradeoff multi(obj_100, obj_140), where B is the set of states where the rover has safely returned to its base. That is, we ask for the tradeoff between performing experiments of scientific value at least 100 before returning to base within 175 time units and a maximum energy consumption of 100 units (obj_100) versus achieving the same with scientific value at least 140 (obj_140). The corresponding Pareto curve is shown in Fig. 11a. Every point ⟨x, y⟩ in the green area on the bottom left corresponds to a scheduler under which the probability to satisfy obj_100 is x and the probability to satisfy obj_140 is y. The thick blue line is the frontier of Pareto-optimal schedulers: for any scheduler on this line, there is no other scheduler that achieves strictly higher probabilities for both objectives (cf. Sect. 2.4). Overall, this Pareto curve clearly shows that there is a tradeoff between achieving obj_100 and obj_140; more risky behaviour is necessary to increase the chance of satisfying obj_140, thereby decreasing the chance of satisfying the "easier" objective obj_100. For more than two objectives, the performance of a set of concrete Pareto-optimal schedulers can be displayed in a bar chart as in Fig. 11b, where the colours reflect different objectives and the groups different schedulers.
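For two objectives, the Pareto frontier drawn in Fig. 11a is simply the set of non-dominated achievable points. A minimal sketch of this filtering step follows; the probability values are invented for illustration:

```python
def pareto_front(points):
    """Keep the points that are not weakly dominated in both coordinates
    by a different point (so no other point is at least as good in both
    objectives)."""
    return [p for p in points
            if not any(q != p and q[0] >= p[0] and q[1] >= p[1]
                       for q in points)]

achievable = [(0.9, 0.2), (0.7, 0.5), (0.5, 0.6), (0.6, 0.4), (0.3, 0.3)]
front = pareto_front(achievable)  # (0.6, 0.4) and (0.3, 0.3) are dominated
```

The surviving points correspond to the schedulers on the thick blue frontier line; the dominated points lie strictly inside the achievable region.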

Visualising Bounded Multi-Objective Schedulers
Pareto curves and bar charts as presented above reduce schedulers to the probabilities of reaching each of the objectives. However, in a cost-bounded setting, users may arguably not only be interested in the probability for the exact cost limit, but also in the behaviour of the scheduler for lower limits: in essence, the probability distribution over the limits. As our method implicitly computes the probabilities of the objectives for all reachable epochs, this information is to a large extent available "for free", without extra computational effort (limited only by which epochs are reachable).
We visualise this information via plots for individual Pareto-optimal schedulers as shown in Fig. 12. We restrict to two-dimensional plots since they are easier to grasp than complex three-dimensional visualisations. In each plot, we can thus show the relationship between three different quantities: one on the x-axis, one on the y-axis, and one encoded as the colour of the points (z, where we use blue for high values, red for low values, black for probability zero, and white for unreachable epochs). Our example tradeoff multi(obj_100, obj_140) however already contains five quantities: the probability for obj_100, the probability for obj_140, the available time and energy to be spent, and the remaining scientific value to be accumulated. We thus need to project out some quantities. We do this by showing at every ⟨x, y⟩ coordinate the maximum or minimum value of the z quantity when ranging over all reachable values of the hidden costs at this coordinate. That is, we show a best- or worst-case situation, depending on the semantics of the respective quantities. Out of the 30 possible combinations of quantities for multi(obj_100, obj_140), we showcase four in Fig. 12 to illustrate the added value of the obtained information.
Comparing Schedulers on Value Required to Achieve Objectives First, in Fig. 12a, we plot for the two Pareto-optimal schedulers S_1 and S_2 (cf. Fig. 11a) the probabilities of the two objectives on the x- and y-axes versus the scientific value that still needs to be accumulated in the z (colour) dimension. White areas thus indicate that no epoch for the particular combination of probabilities is reachable from the tradeoff's cost limits of 175 for time, 100 for energy, and 140 as the higher limit for scientific value. In principle, a point ⟨x, y, z⟩ in these two plots reads as follows: there is an epoch in which scientific value 140 − z has already been achieved, the probability to reach obj_100 (i.e. achieve scientific value 100) is x, and the probability to reach obj_140 is y. However, two cost dimensions, the remaining budget for time and the remaining budget for energy, are not shown and need to be projected out. The two plots in fact show in the z dimension the maximum scientific value that still needs to be accumulated over all reachable time and energy budgets for the respective probability values. To be precise, a point ⟨x, y, z⟩ thus actually needs to be read as follows: among all epochs in which the probability to reach obj_100 is x and the probability to reach obj_140 is y, the maximum difference between 140 and the already accumulated scientific value is z. Since a higher amount of scientific value to be accumulated is easier to reach (less value needs to be accumulated), these plots actually show a "best-case" scenario: the point where enough scientific value has been accumulated to achieve a certain combination of probabilities. The plots for the minimum values are almost the same in this case, though.
We see that the plots for S_1 and S_2 are white above the diagonal, as are in fact those of all other Pareto-optimal schedulers, which means that obj_100 implies obj_140, i.e. the objectives are aligned. For S_1, we further see that all blue-ish areas are associated with low probabilities for both objectives: this scheduler achieves only low probabilities when it still needs to make the rover accumulate a high amount of value. However, it overall achieves higher probabilities for obj_140 at medium value requirements, whereas S_2 is "safer" and focuses on satisfying obj_100. The erratic spikes for S_1 correspond to combinations of probabilities for the objectives that are reached only via unlikely paths.
Figure 12b again contrasts schedulers S_1 and S_2, but this time in terms of the probability for obj_100 (z colour) only, depending on the remaining time (x-axis) and energy budget (y-axis). We plot the minimum probability over the hidden scientific value requirement, i.e. a worst-case view. The plots show that time is of little use in case of low remaining energy but helps significantly when there is sufficient energy. The comparison of the two schedulers mainly confirms, but also explains, their positions in Fig. 11a: S_1 achieves overall lower probabilities for obj_100 than S_2. If we were to compare each of these plots with the corresponding plot where the z colour is the probability for obj_140 (not shown here), we would see that the visual difference between the plots for S_1 is small whereas the two plots for S_2 are noticeably different, again confirming the Pareto curve plot.
Comparing Best and Worst Case Projections In Fig. 12c, we show for S_1 the probability to achieve obj_100 (z colour) depending on the remaining scientific value to be accumulated (x-axis) and the remaining energy budget (y-axis). There is a white vertical line for every odd x-value: over all branches in the model, the gcd of all scientific value costs is 2. The remaining time has to be projected out.
The left plot shows the minimum probabilities over the hidden costs, i.e. we see the probability for the worst-case remaining time; the right plot shows the best-case scenario. Not surprisingly, we see that when time is low, only a lot of energy makes it possible to reach the objective with non-zero probability. We also observe that without significant time restrictions (i.e. in the best case), only very limited energy is required to obtain positive probabilities (especially, of course, if only little more scientific value needs to be accumulated, i.e. on the left side of the right-hand plot).
Comparing One Scheduler Over Two Objectives Finally, in Fig. 12d, we depict for scheduler S_2 the minimum remaining scientific value required (z colour) such that a certain probability for obj_100 (left) or obj_140 (right) can be achieved (y-axis), given a certain remaining time budget (x-axis). Note that this is a worst-case view: minimum remaining means maximum accumulated scientific value, i.e. we show the value needed to achieve a probability for the worst choice in hidden costs (i.e. in energy budget). The upper left corner for obj_100 shows that a high probability in little time is only achievable if we need to collect little more value (red points); the value requirement gradually relaxes as we aim for lower probabilities or have more time. We see white areas in the rightmost parts of both plots; for obj_100, they are in the lower probability ranges, while for obj_140, they are in the higher probability ranges. This reflects the tradeoff made by S_2 to strongly favour obj_100 over obj_140: with a high time budget, it has many choices available (on which experiments to perform), and it makes them such that it achieves obj_100 with a high probability, at the cost of obj_140. With a high time budget, the probabilities are thus independent of the value and energy budget.

Experiments
We implemented the presented approaches into Storm [21], available at [20]. For multi-cost bounded queries, the implementation computes extremal probabilities for the single-objective and Pareto curves for the multi-objective case. In addition, Storm computes generators for two-dimensional quantile queries with only upper or only lower cost bounds.
We evaluate the approaches on a wide range of Markov chain and MDP case studies. Our benchmark selection includes DTMC models (Crowds and Nand) and MDP models (FireWire and Wlan) from the Prism benchmark suite [40]. Moreover, we consider MDP models that have been studied in the context of multi-objective model checking (Service and UAV) and in the context of cost-bounded reachability (JobSched and Resources). Finally, we consider the MDP model of the Rover from Sect. 1. The models are given in Prism's [39] guarded command language. Except for Crowds, all epoch models for cost epochs e ∈ N^m are acyclic.
We ran our experiments on a single core (2 GHz) of a HP BL685C G7 system with 192 GB of memory. We stopped each experiment after a time limit of 2 h.
Details on replicating the tables, as well as details on how to analyse multi-cost bounded properties using Storm in general, are enclosed in the artifact [33].

Multi-cost Bounded Reachability Queries
Implementation Details We use the sparse engine of Storm, i.e. explicit data structures such as sparse matrices. The expected costs (lines 14 to 16 of Algorithm 1) are computed either numerically (via interval iteration over finite-precision floats) or exactly (via policy iteration [26] over infinite-precision rationals³). To reduce memory consumption, the analysis result of an epoch model M^e_f is erased once its predecessor epochs have been processed.
Set-Up We compare the naive unfolding approach (UNF) as in Sect. 3 with the sequential approach (SEQ) as in Sect. 4. Globally, we considered precision η = 10^−4 for the Pareto curve approximation and precision ε = 10^−6 for interval iteration.
- For UNF, the unfolding of the model is applied at the Prism language level, by considering a parallel composition with cost-counting structures. For Crowds, Nand, and Wlan, these unfolded models are part of the Prism benchmark suite [40]. On the unfolding, we apply the algorithms for unbounded reachability as available in Storm.
- For SEQ, we increased the precision for single epoch models as in Theorem 2.
For all MDP case studies we consider single-and multi-objective cost-bounded reachability queries that yield non-trivial results, i.e. probabilities strictly between zero and one. For DTMCs, we only consider single-objective queries: there are no tradeoffs between schedulers since there is no nondeterminism.
Results Tables 1 and 2 show results for single- and multi-objective queries, respectively. The first columns give the number of states and transitions of the original MDP, then, for the query, the number of bounds m, the number of different cost structures r, and the number of reachable cost epochs |E| (reflecting the magnitude of the bound values). |S_unf| denotes the number of reachable states in the unfolding. For multi-objective queries, we additionally give the number of objectives and the number of analysed weight vectors w. The remaining columns depict the runtimes of the different approaches in seconds. For UNF, we considered both the sparse (sp) and the symbolic (dd) engine of Storm. The symbolic engine supports neither multi-objective model checking nor exact policy iteration. For experiments that completed within the time limit, we observed a memory consumption of up to 110 GB for UNF and up to 8 GB for SEQ.
Evaluation On the majority of benchmarks, SEQ performs better than UNF. Typically, SEQ is less sensitive to increases in the magnitude of the cost bounds, as illustrated in Fig. 13. For three benchmark and query instances, we plot the runtime of both approaches against different numbers |E| of reachable epochs. While for small cost bounds UNF is sometimes faster than SEQ, SEQ scales better with increasing |E|. It is not surprising that SEQ scales better: ultimately, the increased state space size and the accompanying memory consumption of UNF are the bottleneck. The most important reason that UNF performs better for some (smaller) cost bounds is the overhead induced by checking full epoch models: in particular, an epoch model often contains many states that are not reachable from the initial state (in the unfolding).

Multi-dimensional Quantiles
Implementation Details Storm computes generators for 2-dimensional queries with either upper or lower cost bounds as presented in Sects. 5.3 and 5.4. We also allow for additional cost bounds with fixed cost limits as in Remark 1. Similar to multi-cost bounded queries, we consider Storm's sparse engine and compute expected costs (line 25 of Algorithm 2) via interval iteration over finite-precision floats or via policy iteration over infinite-precision rationals.
Set-Up For interval iteration, we used precision ε = 10^−6. The probability threshold for all quantile queries was set to 0.95. We considered the benchmarks from Table 1 for which supported queries with a non-trivial generator (i.e. generators that contain a non-zero cost limit for each cost bound) could be assembled. For FireWire, Resources, and Rover, we consider the same cost-bounded reachability formulas as in Table 1.
Results Table 3 shows our results for multi-dimensional quantile queries. The columns depict the number of states and transitions for each model, the dimension of the quantile query m, the number of additional cost bounds with fixed cost limits n, the number of analysed cost epochs |E|, the number of points in the computed generator |gen|, and the runtimes for interval iteration and policy iteration, respectively. Experiments that finished within the time limit required at most 7 GB of memory.
Evaluation Our experiments indicate the practicability of our approach. Naturally, the runtime largely depends on the epochs analysed. Comparing with single-objective cost-bounded reachability, we can see that the overhead of not knowing a priori which epochs to check is significant, but manageable (a rough estimate is a factor of 2). The generators for the considered quantile queries only contain a small number of points, allowing for a concise representation of the set of satisfying cost limits. This small number raises hopes that a good heuristic for selecting candidates might be able to approximate such a generator quite fast.

Conclusion
Many real-world planning problems consider several limited resources and contain tradeoffs. This article presented a practically efficient approach to analyse these problems. It has been implemented in the Storm model checker and shows significant performance benefits. The extension to quantiles enables the user to tackle the planning and optimisation problem from an orthogonal angle. Our new algorithm implicitly computes a large amount of information that is hidden in the standard plots of Pareto curves shown to visualise the results of a multi-objective analysis. We have developed a new set of visualisations that exploit all the available data to provide new insights to decision makers even for problems with many objectives and cost dimensions; yet we also call for experts to improve on these visualisations, as we believe that an intuitive presentation of the vast amount of result data needs to accompany an efficient algorithm like ours to exploit its full usage potential.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.