
1 Introduction

MDP Model Checking and Value Iteration. Markov decision processes (MDPs) are the standard model for sequential decision making in stochastic settings. A standard question in the verification of MDPs is: what is the maximal probability that an error state is reached? MDP model checking is an active topic in the formal verification community. Value iteration (VI) [44] is an iterative and approximate method whose performance in MDP model checking is well-established [11, 29, 30]. Several extensions with soundness have been proposed; in addition to under-approximations, they also provide over-approximations with a desired precision [4, 24, 30, 43, 46], so that an approximate answer comes with an error bound. These sound algorithms are implemented in mature model checkers such as Prism [37], Modest [27], and Storm [32].

Compositional Model Checking. Even with these state-of-the-art algorithms, it is a challenge to model check large MDPs efficiently with high precision. Experiments show that MDPs with more than \(10^8\) states are too large for these algorithms [35, 53, 54]—they simply do not fit in memory. However, such large MDPs often arise as models of complicated stochastic systems, e.g. in the domains of networking and robotics. Furthermore, even small models may be numerically challenging to solve due to their structure [4, 24, 29].

Compositional model checking is a promising approach to tackling this scalability challenge. Given a compositional structure of a target system, compositional model checking executes a divide-and-conquer algorithm that avoids loading the entire state space at once, often solving the above memory problem. Moreover, reusing the model checking results for components can lead to speed-ups by orders of magnitude. Although finding a suitable compositional structure for a given “monolithic” MDP is still an open problem, many systems come with such a compositional structure a priori. For example, such compositional structures are often assumed in robotics and referred to as hierarchical models [5, 23, 31, 35, 40, 48, 51].

Fig. 1. Open MDPs \(\mathcal {A}\) and \(\mathcal {B}\).

Recently, string diagrams of MDPs were introduced for compositional model checking [53, 54]; the current paper adopts this formalism. There, MDPs are extended with (open) entrances and exits (Fig. 1), and they are composed by sequential composition and sum \(\oplus \). See Fig. 2, where the right-hand sides are simple juxtapositions of graphs (wires get connected in sequential composition). This makes the formalism focus on sequential (as opposed to parallel) composition. This restriction eases the design of compositional algorithms; yet the formalism is rich enough to capture the compositional structures of many system models.

Current Work: Compositional Value Iteration. In this paper, we present a compositional value iteration (CVI) algorithm that solves reachability probabilities of string diagrams of MDPs, operating in a divide-and-conquer manner along compositional structures. Our approximate VI algorithm comes with soundness—it produces error bounds—and exploits compositionality for efficiency.

Specifically, for soundness, we lift the recent paradigm of optimistic value iteration (OVI) [30] to the current compositional setting. We use it both for local (component-level) model checking and—in one of the two global VI stopping criteria that we present—for providing a global over-approximation.

For efficiency, firstly, we adopt a top-down compositional approach where each component is model-checked repeatedly, each time on a different weight \(\textbf{w}\), in a by-need manner. Secondly, in order to suppress repetitive computation on similar weights, we introduce a novel technique of Pareto caching that allows “approximate reuse” of model checking results. This closely relates to multi-objective probabilistic model checking [17, 20, 45], without the explicit goal of building Pareto curves. Our Pareto caching also leads to another (sound) global VI stopping criterion that is based on the approximate bottom-up approach [54].

Our algorithm is approximate (unlike the exact one in [53]), and top-down (unlike the bottom-up approximate one in [54]). Experimental evaluation demonstrates its performance thanks to the combination of these two features.

Contributions and Organization.  We start with an overview (Sect. 2) that presents graphical intuitions. After formalizing the problem setting in Sect. 3, we move on to describe our technical contributions:

  • compositional value iteration for string diagrams of MDPs where VI is run in a top-down and thus by-need manner (Sect. 4.2),

  • the Pareto caching technique for reusing results for components (Sect. 5.2),

  • two global stopping criteria that ensure soundness (Sect. 6).

We evaluate and discuss our approach through experiments (Sect. 7), review related work (Sect. 8), and conclude this paper (Sect. 9).

Fig. 2. Sequential composition and sum \(\mathcal {A}\oplus \mathcal {B}\) of open MDPs. The framework is bidirectional (edges can be left- and right-ward); thus loops can arise under sequential composition.

Notations. For a natural number m, we write [m] for \(\{1, \dots , m\}\). For a set X, we write \(\mathcal {D}(X)\) for the set of distributions on X. For sets X, Y, we write \(X\uplus Y\) for their disjoint union and \(f:X\rightharpoonup Y\) for a partial function f from X to Y.

2 Overview

This section illustrates our take on CVI with so-called Pareto caches using graphical intuitions. We describe MDPs as string diagrams over so-called open MDPs (oMDPs) [53]. Open MDPs, such as \(\mathcal {A}, \mathcal {B}\) in Fig. 1, extend MDPs with open ends (entrances and exits). We use two operations, sequential composition and sum \(\oplus \); see Fig. 2. That figure also illustrates the bidirectional nature of the formalism: arrows can point left and right; thus acyclic MDPs can create cycles when composed. String diagrams come from category theory (see [53]) and are used in many fields of computer science [8, 9, 22, 52].

2.1 Approximate Bottom-Up Model Checking

The first compositional model checking algorithm for string diagrams of MDPs, given in [53], is exact. Subsequently, an approximate compositional model checking algorithm was proposed in [54]. It is the basis of our algorithm, and we review it here. Consider, for illustration, the sequential composition in Fig. 3, where the exit \(o_{3}\) is the target. The algorithm from [54] proceeds in the following bottom-up manner.

Fig. 3.

First Step: Model Checking Each Component. Firstly, model checking is conducted for component oMDPs \(\mathcal {A}\) and \(\mathcal {B}\) separately, which amounts to identifying an optimal scheduler for each. At this point, however, it is unclear what constitutes an optimal scheduler:

Example 1

In the oMDP \(\mathcal {A}\) in Fig. 3, suppose the reachability probabilities \(\bigl (\,\textrm{RPr}^{\sigma _{1}}(i_1\rightarrow o_1),\,\textrm{RPr}^{\sigma _{1}}(i_1\rightarrow o_2)\bigr )\) are (0.2, 0.7) under a scheduler \(\sigma _{1}\), and (0.6, 0.2) under another scheduler \(\sigma _{2}\). One cannot tell which scheduler (\(\sigma _{1}\) or \(\sigma _{2}\)) is better for the global objective (i.e. reaching \(o_{3}\) in the composite) since \(\mathcal {B}\) is a black box.

Concretely, the context of \(\mathcal {A}\) is unknown. Therefore, we have to compute all candidate optimal schedulers instead of a single one. This set is given, for each component \(\mathcal {C}\) and each entrance i of \(\mathcal {C}\), by

$$\begin{aligned} \bigl \{\,\text {schedulers }\sigma \,\big |\, \bigl (\textrm{RPr}^{\sigma }(i\rightarrow o)\bigr )_{o: \mathcal {C}\text { 's exit}} \text { is Pareto optimal}\,\bigr \}. \end{aligned}$$
(1)
Fig. 4. Pareto-optimal points.

Here, Pareto optimality is the usual notion from multi-objective model checking (e.g. [17, 41]): it means that there is no scheduler \(\sigma '\) that dominates \(\sigma \), in the sense that \(\textrm{RPr}^{\sigma }(i\rightarrow o) \le \textrm{RPr}^{\sigma '}(i\rightarrow o)\) holds for each o and \(<\) holds for some o. The two points from the example are plotted in Fig. 4.
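This dominance check is mechanical; the following minimal sketch (our own illustration, using the two points of Ex. 1) spells it out:

```python
# Dominance between points in [0, 1]^O, as used for Pareto optimality:
# p' dominates p iff p'(o) >= p(o) for every exit o, with > for some o.
def dominates(p_prime, p):
    return all(a >= b for a, b in zip(p_prime, p)) and any(
        a > b for a, b in zip(p_prime, p))

# The two points from Ex. 1: neither dominates the other, so both
# schedulers remain candidates for Pareto optimality.
p1, p2 = (0.2, 0.7), (0.6, 0.2)
```

Since neither point dominates the other, both \(\sigma _{1}\) and \(\sigma _{2}\) must be kept in the set (1).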

Fig. 5. Approximations \((L_{i_1}, U_{i_1})\).

The Pareto curve—the set of points \(\bigl (\textrm{RPr}^{\sigma }(i\rightarrow o)\bigr )_{o}\) for the Pareto-optimal schedulers \(\sigma \) in (1)—will look like the dashed blue line in Fig. 4. The solid blue line is realizable by convex combinations of the schedulers \(\sigma _1\) and \(\sigma _2\); it always lies below the Pareto curve.

The algorithm in [54] computes guaranteed under- and over-approximations (L, U) of the Pareto-optimal points (1) for every open MDP. See Fig. 5; the green area indicates the under-approximation, and the red area is the complement of the over-approximation, so that all Pareto-optimal points are guaranteed to lie in the gap (white) between them. These approximations are obtained by repeated application of (optimistic) value iteration on the open MDPs, a standard approach for verifying MDPs based on [20, 47]. We formalize these notions in Sect. 5.1.

Second Step: Combination along Sequential Composition. The second (inductive) step of the bottom-up algorithm in [54] is to combine the results of the first step—approximations as in Fig. 5 and the corresponding (near-)optimal schedulers (1) for each component \(\mathcal {C}\)—along the operations in a string diagram.

Here we describe this second step through the example in Fig. 3. It computes reachability probabilities

$$\begin{aligned} \textrm{RPr}^{\sigma ,\tau }(i_{1}\rightarrow o_{3}) = {}&\textrm{RPr}^{\sigma }(i_{1}\rightarrow o_{1}) \cdot \textrm{RPr}^{\tau }(i_{2}\rightarrow o_{3})\\ &+ \textrm{RPr}^{\sigma }(i_{1}\rightarrow o_{2}) \cdot \textrm{RPr}^{\tau }(i_{3}\rightarrow o_{3}) \end{aligned}$$
(2)

for each combination of Pareto-optimal schedulers \(\sigma \) (for \(\mathcal {A}\)) and \(\tau \) (for \(\mathcal {B}\)), in order to find which combinations of \(\sigma ,\tau \) are Pareto optimal for the composite.

The equality (2)—called the decomposition equality in [53]—enables compositional reasoning on Pareto-optimal points and on their approximations: Pareto-optimal schedulers for the composite can be computed from those for \(\mathcal {A}\) and \(\mathcal {B}\). This compositional reasoning can be exploited for performance. In particular, when the same component \(\mathcal {A}\) occurs multiple times in a string diagram \(\mathbb {D}\), the model checking result of \(\mathcal {A}\) can be reused multiple times.
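To make the second step concrete, here is a small sketch that evaluates the decomposition equality (2) for every scheduler combination; the values for \(\mathcal {A}\) are those of Ex. 1, while the values for \(\mathcal {B}\)'s schedulers are hypothetical numbers of our own:

```python
# Combining component results along sequential composition via the
# decomposition equality (2). Since o3 is the only global exit, Pareto
# optimality for the composite reduces to maximizing a single value.
scheds_A = {"sigma1": (0.2, 0.7), "sigma2": (0.6, 0.2)}  # (i1->o1, i1->o2)
scheds_B = {"tau1": (0.8, 0.3), "tau2": (0.4, 0.9)}      # (i2->o3, i3->o3)

def compose(p_a, p_b):
    # Equality (2): RPr(i1 -> o3) of the composite.
    return p_a[0] * p_b[0] + p_a[1] * p_b[1]

values = {(sa, sb): compose(pa, pb)
          for sa, pa in scheds_A.items()
          for sb, pb in scheds_B.items()}
best = max(values, key=values.get)
```

With these illustrative numbers, the best combination pairs \(\sigma _{1}\) with the second scheduler of \(\mathcal {B}\), which is exactly the kind of context dependence that Ex. 1 points out.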

2.2 Key Idea I: From Bottom-Up to Top-Down

The bottom-up approach computes the Pareto curves independently of the context of each open MDP. Our first key idea is to move from bottom-up to top-down, a direction followed by other compositional techniques too; see Sect. 8.

Fig. 6.

For illustration, consider the sequential composition in Fig. 6, which concretizes the component \(\mathcal {B}\) of Fig. 3. For this \(\mathcal {B}\), it follows that \(\textrm{RPr}(i_{2}\rightarrow o_{3})=0.8\) and \(\textrm{RPr}(i_{3}\rightarrow o_{3})=0.3\). Therefore the equality (2) boils down to

$$\begin{aligned} \textrm{RPr}^{\sigma }(i_{1}\rightarrow o_{3}) = 0.8\cdot \textrm{RPr}^{\sigma }(i_{1}\rightarrow o_{1}) + 0.3\cdot \textrm{RPr}^{\sigma }(i_{1}\rightarrow o_{2}). \end{aligned}$$
(3)

The equation (3) is a significant simplification compared to (2):

  • in (2), since the weight \(\bigl (\textrm{RPr}^{\tau }(i_{2}\rightarrow o_{3}),\textrm{RPr}^{\tau }(i_{3}\rightarrow o_{3})\bigr )\) is unknown, we must compute multidimensional Pareto curves as in Figs. 4 and 5;

  • in (3), since the weight is known to be (0.8, 0.3), we can solve the equation using standard single-objective model checking.

Exploiting this simplification is our first key idea. We introduce a systematic procedure for deriving weights (such as (0.8, 0.3) above) that uses the context of an oMDP, i.e., it goes top-down along the string diagram. The procedure works for bi-directional sequential composition (and thus for loops, cf. Fig. 2), not only for the uni-directional one in Fig. 6. In the procedure, we first examine the context of a component \(\mathcal {C}\), approximate a weight \(\textbf{w}\) for \(\mathcal {C}\), and then compute maximum weighted reachability probabilities in \(\mathcal {C}\). We formalize the approach in Sect. 4.2.
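The reduction to a single-objective problem can be sketched in a few lines: once the context supplies a weight such as (0.8, 0.3), standard value iteration on the component suffices. The tiny oMDP below is a hypothetical stand-in for \(\mathcal {A}\), not a model from the paper:

```python
# Single-objective VI on a component once the exit weight w is known.
# State "i1" has two actions; exits are sinks whose value is the weight.
P = {  # state -> action -> list of (successor, probability)
    "i1": {"a": [("o1", 0.5), ("o2", 0.5)],
           "b": [("o1", 0.9), ("i1", 0.1)]},
}
EXIT_WEIGHT = {"o1": 0.8, "o2": 0.3}  # the weight w suggested by the context

def weighted_vi(P, exit_weight, iters=100):
    f = {s: 0.0 for s in P}
    f.update(exit_weight)  # exits keep their weight throughout
    for _ in range(iters):
        for s in P:  # Bellman update, Gauss-Seidel style
            f[s] = max(sum(p * f[t] for t, p in succ)
                       for succ in P[s].values())
    return f

f = weighted_vi(P, EXIT_WEIGHT)  # f["i1"] approaches 0.8 (via action "b")
```

No Pareto curve is drawn here: the known weight turns the multi-objective component analysis into one maximum weighted reachability query.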

The potential performance advantage over the bottom-up algorithm in [54] is apparent from Fig. 6. Specifically, the bottom-up algorithm draws a complete picture of the Pareto-optimal points (such as Fig. 5) once and for all, but a large part of this complete picture may never be used. In contrast, the top-down algorithm draws the picture in a by-need manner, for a weight \(\textbf{w}\) only when that weight is suggested by the context.

Fig. 7. Top-down approximation.

The top-down approximation of Pareto-optimal points is illustrated in Fig. 7. Here a weight \(\textbf{w}\) is the normal vector of the blue lines; the figure shows a situation after considering two weights.

Fig. 8. An example string diagram.

2.3 Key Idea II: Pareto Caching

Our second key idea (Pareto caching) arises when we combine the previous idea (top-down compositionality) with the key advantage of the bottom-up approach [54], namely exploiting duplicates. For motivation, consider the string diagram in Fig. 8, where we designate the multiple occurrences of \(\mathcal {A}\) by \(\mathcal {A}_{1}, \mathcal {A}_{2}, \mathcal {A}_{3}\), from left to right, for distinction.

Let us run the top-down algorithm. The component \(\mathcal {E}\) suggests the weight (0.8, 0.3) for the two exits of \(\mathcal {A}_{3}\), and \(\mathcal {D}\) suggests the weight (0.2, 0.7) for the exits of \(\mathcal {A}_{2}\). Since \(\mathcal {A}_{2}\) and \(\mathcal {A}_{3}\) are identical, the weighted optimization results for these two weights can be combined, leading to a picture like Fig. 7.

Now, in Fig. 8, we go on to the component \(\mathcal {B}\). It suggests the weight (0.75, 0.3).

  • In the bottom-up approach [54], performance advantages are brought by exploiting duplicates, that is, by reusing the model checking result of a component \(\mathcal {C}\) for its multiple occurrences.

  • Therefore, also here, we wish to use the previous analysis results for \(\mathcal {A}\)—for the weights (0.8, 0.3) and (0.2, 0.7)—for the weight (0.75, 0.3).

  • Intuitively, (0.75, 0.3) seems close enough to (0.8, 0.3), suggesting that we can use the previously obtained result for (0.8, 0.3).

But this raises the following questions: what does it mean for two weights to be “close enough”? Is (0.75, 0.3) really closer to (0.8, 0.3) than to (0.2, 0.7)? Can we bound the errors—much like in Sect. 2.1—that arise from this “approximate reuse”?

Fig. 9. Pareto caching.

In Sect. 5.2, we use the existing theory of Pareto curves in multi-objective model checking [17, 20, 45] to answer these questions. Intuitively, the previous analysis result (red and green regions) is queried on a new weight \(\textbf{w}\) (the normal vector of the blue lines), as illustrated in Fig. 9. We call this answering of weighted reachability queries based on the Pareto curve Pareto caching. The technique can avoid many invocations of VI for computing the weighted reachability for \(\textbf{w}\). The distance between the under- and over-approximations computed this way can be large; if so (a “cache miss”), we run VI again for the weight \(\textbf{w}\).
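For intuition, a cache query can be sketched as follows for a component with two exits. We assume (our own data layout, not the paper's) that each cache entry stores a queried weight \(w_i\), a point \(p_i\) achieved by some scheduler, and a sound upper bound \(u_i\) on the optimum of \(w_i\cdot x\); any achieved point then bounds a new query from below, and the halfspaces \(w_i\cdot x\le u_i\) bound it from above:

```python
# A 2D Pareto-cache query: lower bound from cached achieved points,
# upper bound by maximizing w . x over the polytope cut out by the cached
# halfspaces w_i . x <= u_i inside the box [0, 1]^2 (vertex enumeration).
from itertools import combinations

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def cache_query(cache, w):
    lower = max(dot(w, p) for (_, p, _) in cache)
    cons = [(wi, ui) for (wi, _, ui) in cache]
    cons += [((1, 0), 1.0), ((0, 1), 1.0), ((-1, 0), 0.0), ((0, -1), 0.0)]
    verts = []
    for (a, ua), (b, ub) in combinations(cons, 2):
        det = a[0] * b[1] - a[1] * b[0]
        if abs(det) < 1e-12:
            continue  # parallel constraints
        x = ((ua * b[1] - ub * a[1]) / det, (ub * a[0] - ua * b[0]) / det)
        if all(dot(c, x) <= uc + 1e-9 for c, uc in cons):
            verts.append(x)
    upper = max(dot(w, v) for v in verts)
    return lower, upper  # a cache hit iff upper - lower is small enough

# Illustrative cache entries, then the new query weight (0.75, 0.3)
# from the discussion above (all numbers are ours).
CACHE = [((0.8, 0.3), (0.6, 0.2), 0.55), ((0.2, 0.7), (0.2, 0.7), 0.54)]
lower, upper = cache_query(CACHE, (0.75, 0.3))
```

If the gap between the two bounds exceeds the required precision, this counts as a cache miss and a fresh VI run for \(\textbf{w}\) is triggered.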

2.4 Global Stopping Criteria (GSCs)

On top of these two key ideas, we provide two global stopping criteria (GSCs) in Sect. 6: one is based on ideas from OVI [30], and the other is a symbiosis of the Pareto caches with the bottom-up approach. Although ensuring the termination of our algorithm in finitely many steps with our GSCs remains future work, we show that our GSCs are sound, that is, their output satisfies a given precision upon termination.

3 Formal Problem Statement

We recall (weighted) reachability in Markov decision processes (MDPs) and formalize string diagrams as their compositional representation. Together, these form the formal basis for the problem statement already introduced above.

3.1 Markov Decision Process (MDP)

Definition 3.1

(MDP). An MDP \(\mathcal {M} = (S, A, P)\) is a tuple with a finite set S of states, a finite set A of actions, and a probabilistic transition function \(P:S\times A \rightharpoonup \mathcal {D}(S)\) (which is a partial function, cf. notations in Sect. 1).

A (finite) path (on \(\mathcal {M}\)) is a finite sequence \(\pi = \pi _1 \pi _2 \cdots \pi _n\) of states. We write \(\textsf{FPath}_{\mathcal {M}}\) for the set of finite paths on \(\mathcal {M}\). A memoryless scheduler \(\sigma \) is a function \(\sigma :S\rightarrow \mathcal {D}(A)\); in this paper, memoryless schedulers suffice [20, 44]. We say \(\sigma \) is deterministic memoryless (DM) if for each \(s\in S\), \(\sigma (s)\) is Dirac. We also write \(\sigma :S\rightarrow A\) for a DM scheduler \(\sigma \). The set of all memoryless schedulers on \(\mathcal {M}\) is \(\varSigma ^{\mathcal {M}}\), and the set of all DM schedulers on \(\mathcal {M}\) is \(\varSigma _\text {d}^{\mathcal {M}}\).

For a memoryless scheduler \(\sigma \) and a target state \(t\in S\), the reachability probability \(\textrm{RPr}^{\mathcal {M}, \sigma ,t}(s)\) from a state s is given by \(\textrm{RPr}^{\mathcal {M}, \sigma ,t}(s) = \sum _{\pi \in \textsf{FPath}_{\mathcal {M}}(t)} \textrm{Pr}^{\mathcal {M}}_{\sigma ,s}(\pi )\), where (i) the set \(\textsf{FPath}_{\mathcal {M}}(t)\subseteq \textsf{FPath}_{\mathcal {M}}\) consists of those paths \(\pi = \pi _1\cdots \pi _n\) that reach t exactly at their last state, i.e. \(\pi _n = t\) and \(\pi _j\ne t\) for \(j<n\), and (ii) the probability \(\textrm{Pr}^{\mathcal {M}}_{\sigma ,s}(\pi )\) is defined by \(\textrm{Pr}^{\mathcal {M}}_{\sigma ,s}(\pi ) = \prod _{j=1}^{n-1}\sum _{a} \sigma (\pi _j)(a)\cdot P(\pi _j, a)(\pi _{j+1})\) (the sum ranging over those actions a for which \(P(\pi _j, a)\) is defined) if \(\pi _1 = s\), and \(\textrm{Pr}^{\mathcal {M}}_{\sigma ,s}(\pi ) = 0\) otherwise.
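As a small illustration (with a hypothetical two-state MDP of our own), the path probability is the product, over consecutive states, of the scheduler-weighted transition probabilities:

```python
# Path probability Pr_{sigma, s}(pi) under a memoryless randomized
# scheduler: product over steps of sum_a sigma(state)(a) * P(state, a)(next).
P = {("s", "a"): {"s": 0.5, "t": 0.5},   # P is partial: only these
     ("s", "b"): {"t": 1.0}}             # state-action pairs are defined
sigma = {"s": {"a": 0.5, "b": 0.5}}      # a memoryless scheduler

def path_prob(path, s0):
    if path[0] != s0:
        return 0.0  # the path does not start in s0
    prob = 1.0
    for cur, nxt in zip(path, path[1:]):
        prob *= sum(w * P.get((cur, a), {}).get(nxt, 0.0)
                    for a, w in sigma.get(cur, {}).items())
    return prob
```

For example, the path s s t gets probability \((0.5\cdot 0.5)\cdot (0.5\cdot 0.5 + 0.5\cdot 1.0) = 0.1875\).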

Towards our compositional approach for a reachability objective, we must generalize the objective to a weighted reachability probability objective: we want to compute the weighted sum—with respect to a certain weight vector \(\textbf{w}\)—over reachability probabilities to multiple target states. The standard reachability probability problem is a special case of this weighted reachability problem using a suitable unit vector \(\textbf{e}\) as the weight \(\textbf{w}\).

Definition 3.2

(weighted reachability probability). Let \(\mathcal {M}\) be an MDP, and T be a set of target states. A weight \(\textbf{w}\) on T is a vector \(\textbf{w}\in [0, 1]^{T}\).

Let s be a state, and \(\sigma \) be a scheduler. The weighted reachability probability \(\textrm{WRPr}^{\mathcal {M},\sigma , T}(\textbf{w}, s)\in [0, 1]\) from s to T over \(\sigma \) with respect to a weight \(\textbf{w}\) is defined naturally by a weighted sum, that is, \(\textrm{WRPr}^{\mathcal {M},\sigma , T}(\textbf{w}, s) = \sum _{t\in T}\textbf{w}(t)\cdot \textrm{RPr}^{\mathcal {M},\sigma ,t}(s)\). We write \(\textrm{WRPr}^{\mathcal {M}, T}_{\textrm{max}}(\textbf{w}, s)\) for the maximum weighted reachability probability \( \sup _{\sigma } \textrm{WRPr}^{\mathcal {M},\sigma , T}(\textbf{w}, s) \). (The supremum is realizable; see e.g. [28].)

3.2 String Diagram of MDPs

Definition 3.3

(oMDP). An open MDP (oMDP) \(\mathcal {A} = (M, \textsf{IO})\) is a pair consisting of an MDP M and open ends \(\textsf{IO}=(I_{\textbf{r}}, I_{\textbf{l}}, O_{\textbf{r}}, O_{\textbf{l}})\), where \(I_{\textbf{r}}, I_{\textbf{l}}, O_{\textbf{r}}, O_{\textbf{l}}\subseteq S\) are pairwise disjoint and each of them is totally ordered. The states in \(I = I_{\textbf{r}}\uplus I_{\textbf{l}}\) are the entrances, and the states in \(O = O_{\textbf{r}}\uplus O_{\textbf{l}}\) are the exits, respectively. We often use superscripts to designate the oMDP \(\mathcal {A}\) in question, such as \(I^{\mathcal {A}}\) and \(O^{\mathcal {A}}\).

We write \(\textrm{arity}(\mathcal {A}):(m_{\textbf{r}}, m_{\textbf{l}})\rightarrow (n_{\textbf{r}}, n_{\textbf{l}})\) for the arities of \(\mathcal {A}\), where \(m_{\textbf{r}} = |I_{\textbf{r}}|\), \(m_{\textbf{l}} = |O_{\textbf{l}}|\), \(n_{\textbf{r}} = |O_{\textbf{r}}|\), and \(n_{\textbf{l}} = |I_{\textbf{l}}|\). We assume that every exit s is a sink state, that is, P(s, a) is undefined for any \(a\in A\). We can naturally lift the definitions of schedulers and weighted reachability probabilities from MDPs to oMDPs; we will be particularly interested in the following instances: 1) the weighted reachability probability from a chosen entrance i to the set \(O^{\mathcal {A}}\) of all exits; and 2) the maximum weighted reachability probability from i to \(O^{\mathcal {A}}\) weighted by \(\textbf{w}\).

We define string diagrams of MDPs [53] syntactically, as syntactic trees whose leaves are oMDPs and non-leaf nodes are algebraic operations. The latter are syntactic operations and they are yet to be interpreted.

Definition 3.4

(string diagram of MDPs). A string diagram \(\mathbb {D}\) of MDPs is a term adhering to the grammar \(\mathbb {D}\,{::}{=}\,\textsf{c}_{\mathcal {A}} \,\mid \, \mathbb {D}\fatsemi \mathbb {D}\,\mid \,\mathbb {D}\oplus \mathbb {D}\), where \(\textsf{c}_{\mathcal {A}}\) is a constant designating an oMDP \(\mathcal {A}\), and \(\fatsemi \) and \(\oplus \) denote sequential composition and sum, respectively.

The above syntactic operations are interpreted by the semantic operations below. The following definitions explicate the graphical intuition in Fig. 2.

Definition 3.5

(sequential composition \(\fatsemi \)). Let \(\mathcal {A}\), \(\mathcal {B}\) be oMDPs, \(\textrm{arity}(\mathcal {A}) = (m_{\textbf{r}}, m_{\textbf{l}}) \rightarrow (l_{\textbf{r}}, l_{\textbf{l}})\), and \(\textrm{arity}(\mathcal {B}) = (l_{\textbf{r}}, l_{\textbf{l}}) \rightarrow (n_{\textbf{r}}, n_{\textbf{l}})\). Their sequential composition \(\mathcal {A}\fatsemi \mathcal {B}\) is the oMDP \((M, \textsf{IO}')\) where \(\textsf{IO}' = (I_{\textbf{r}}^{\mathcal {A}}, I_{\textbf{l}}^{\mathcal {B}}, O_{\textbf{r}}^{\mathcal {B}}, O_{\textbf{l}}^{\mathcal {A}})\), and P is

$$\begin{aligned} P(s, a, s') := {\left\{ \begin{array}{ll} P^{\mathcal {D}}(s, a, s') &{}\text {if } \mathcal {D} \in \{\mathcal {A}, \mathcal {B}\},\ s, s'\in S^{\mathcal {D}},\ a\in A^{\mathcal {D}},\\ P^{\mathcal {A}}(s, a, o_{\textbf{r}, i}^{\mathcal {A}}) &{}\text {if } s\in S^{\mathcal {A}},\ a\in A^{\mathcal {A}},\ s' = i_{\textbf{r}, i}^{\mathcal {B}} \text { for some } 1 \le i \le l_{\textbf{r}},\\ P^{\mathcal {B}}(s, a, o_{\textbf{l}, i}^{\mathcal {B}}) &{}\text {if } s\in S^{\mathcal {B}},\ a\in A^{\mathcal {B}},\ s' = i_{\textbf{l}, i}^{\mathcal {A}} \text { for some } 1 \le i \le l_{\textbf{l}},\\ 0 &{}\text {otherwise.} \end{array}\right. } \end{aligned}$$

Definition 3.6

(sum \(\oplus \)). Let \(\mathcal {A}, \mathcal {B}\) be oMDPs. Their sum \(\mathcal {A}\oplus \mathcal {B}\) is the oMDP \((M, \textsf{IO}')\) where \(\textsf{IO}' = (I_{\textbf{r}}^{\mathcal {A}}\uplus I_{\textbf{r}}^{\mathcal {B}}, I_{\textbf{l}}^{\mathcal {A}}\uplus I_{\textbf{l}}^{\mathcal {B}}, O_{\textbf{r}}^{\mathcal {A}}\uplus O_{\textbf{r}}^{\mathcal {B}}, O_{\textbf{l}}^{\mathcal {A}}\uplus O_{\textbf{l}}^{\mathcal {B}})\), \(M = (S^{\mathcal {A}} \uplus S^{\mathcal {B}}, A^{\mathcal {A}} \uplus A^{\mathcal {B}}, P)\), and P is given by \(P(s, a, s') := P^{\mathcal {D}}(s, a, s')\) if \(\mathcal {D} \in \{\mathcal {A}, \mathcal {B}\}\), \(s\in S^{\mathcal {D}}\), \(a\in A^{\mathcal {D}}\), and \(s'\in S^{\mathcal {D}}\), and \(P(s, a, s') := 0\) otherwise.

Definition 3.7

(operational semantics \(\llbracket \mathbb {D} \rrbracket \)). Let \(\mathbb {D}\) be a string diagram of MDPs. The operational semantics \(\llbracket \mathbb {D} \rrbracket \) is the oMDP which is inductively defined by Defs 3.5 and 3.6, with the base case \(\llbracket \textsf{c}_{\mathcal {A}} \rrbracket =\mathcal {A}\). Here we assume that every string diagram \(\mathbb {D}\) has matching arities so that compositions are well-defined. We call \(I^{\llbracket \mathbb {D} \rrbracket }\) and \(O^{\llbracket \mathbb {D} \rrbracket }\) global entrances and global exits of \(\mathbb {D}\), respectively.

Fig. 10. \(\llbracket \mathbb {D} \rrbracket \) in Ex. 3.10.

For describing the occurrences of oMDPs and their duplicates in a string diagram \(\mathbb {D}\), we formally define nominal components \(\mathop {\textrm{nCP}}(\mathbb {D})\) and components \(\mathop {\textrm{CP}}(\mathbb {D})\). The latter is used for graph-theoretic operations in our compositional VI (CVI) (Algorithm 1), while the former is used for Pareto caching (Sect. 5.2). Examples are provided later in Ex. 3.10.

Definition 3.8

(\(\mathop {\textrm{nCP}}(\mathbb {D})\), \(\mathop {\textrm{CP}}(\mathbb {D})\)). The set \(\mathop {\textrm{nCP}}(\mathbb {D})\) of nominal components is the set of constants occurring in \(\mathbb {D}\) (as a term). The set \(\mathop {\textrm{CP}}(\mathbb {D})\) of components is inductively defined by \(\mathop {\textrm{CP}}(\textsf{c}_{\mathcal {A}}) := \{\mathcal {A}\}\), and \(\mathop {\textrm{CP}}(\mathbb {D}_{1}\star \mathbb {D}_{2}) := \mathop {\textrm{CP}}(\mathbb {D}_{1})\uplus \mathop {\textrm{CP}}(\mathbb {D}_{2})\) for \(\star \in \{\fatsemi , \oplus \}\) (sequential composition and sum); here we count multiplicities, unlike \(\mathop {\textrm{nCP}}(\mathbb {D})\).

We introduce local open ends of string diagrams, in contrast to global open ends defined in Def. 3.7.

Definition 3.9

(\(I_{\textrm{lc}}(\mathbb {D})\), \(O_{\textrm{lc}}(\mathbb {D})\) (local)). The sets \(I_{\textrm{lc}}(\mathbb {D})\) and \(O_{\textrm{lc}}(\mathbb {D})\) of local entrances and exits of \(\mathbb {D}\) are given by \(I_{\textrm{lc}}(\mathbb {D}) := \biguplus _{\mathcal {A}\in \mathop {\textrm{CP}}(\mathbb {D})} I^{\mathcal {A}}\) and \(O_{\textrm{lc}}(\mathbb {D}) := \biguplus _{\mathcal {A}\in \mathop {\textrm{CP}}(\mathbb {D})} O^{\mathcal {A}}\), respectively. Clearly we have \(I^{\llbracket \mathbb {D} \rrbracket }\subseteq I_{\textrm{lc}}(\mathbb {D})\), \(O^{\llbracket \mathbb {D} \rrbracket }\subseteq O_{\textrm{lc}}(\mathbb {D})\).

Example 3.10

Let \(\mathbb {D}\) be a string diagram over \(\mathcal {A}\) and \(\mathcal {B}\) from Fig. 1 in which \(\mathcal {A}\) occurs twice. The oMDP \(\llbracket \mathbb {D} \rrbracket \) is shown in Fig. 10. Then \(\mathop {\textrm{nCP}}(\mathbb {D})=\{\textsf{c}_{\mathcal {A}},\textsf{c}_{\mathcal {B}} \}\), while \(\mathop {\textrm{CP}}(\mathbb {D})=\{\mathcal {A}_{1}, \mathcal {A}_{2}, \mathcal {B}\}\) with subscripts added for distinction. We have \(I^{\llbracket \mathbb {D} \rrbracket }=\{i^{\mathcal {A}_{1}}_{1}\}\) and \(O^{\llbracket \mathbb {D} \rrbracket }=\{o^{\mathcal {A}_{1}}_{1}, o^{\mathcal {B}}_{2}\}\), and \(I_{\textrm{lc}}(\mathbb {D})= \{ i^{\mathcal {A}_{1}}_{1}, i^{\mathcal {A}_{1}}_{2}, i^{\mathcal {A}_{2}}_{1}, i^{\mathcal {A}_{2}}_{2}, i^{\mathcal {B}}_{1} \} \) and \(O_{\textrm{lc}}(\mathbb {D})= \{ o^{\mathcal {A}_{1}}_{1}, o^{\mathcal {A}_{1}}_{2}, o^{\mathcal {A}_{2}}_{1}, o^{\mathcal {A}_{2}}_{2}, o^{\mathcal {B}}_{1}, o^{\mathcal {B}}_{2} \} \). Note also that \(O_{\textrm{lc}}(\mathbb {D})\) does not suppress the exits removed in sequential composition, such as \(\{ o^{\mathcal {A}_{1}}_{2}, o^{\mathcal {A}_{2}}_{1}, o^{\mathcal {A}_{2}}_{2}, o^{\mathcal {B}}_{1} \}\).


We remark that as a straightforward extension, we can also extract a scheduler that achieves the under-approximation.

4 VI in a Compositional Setting

We recap value iteration (VI) [3, 44] and its extension to optimistic value iteration (OVI) [30] before presenting our compositional VI (CVI).

4.1 Value Iteration (VI) and Optimistic Value Iteration (OVI)

VI relies on the characterization of maximum reachability probabilities as a least fixed point (lfp): specifically, the lfp \(\mu \varPhi _{\mathcal {M},T}\) of the Bellman operator \(\varPhi _{\mathcal {M},T}\), an operator on the set \([0, 1]^S\) that intuitively returns the \((t+1)\)-step reachability probabilities given the t-step reachability probabilities. A formal treatment can be found in [55, Appendix B]. Then the Kleene sequence \(\bot \le \varPhi _{\mathcal {M},T}(\bot )\le \varPhi _{\mathcal {M},T}^{2}(\bot )\le \cdots \) gives a monotonically increasing sequence that converges to the lfp \(\mu \varPhi _{\mathcal {M},T}\), where \(\bot \) is the least element. This also applies to weighted reachability probabilities.

While VI gives guaranteed under-approximations, it does not say how close the current approximation is to the solution \(\mu \varPhi _{\mathcal {M},T}\). The capability of providing guaranteed over-approximations as well is called soundness in VI, and many techniques come with soundness [24, 30, 43, 46]. Soundness is useful for stopping criteria: one can fix an error bound \(\eta \in [0, 1]\), and VI can terminate when the distance between the under- and over-approximations is at most \(\eta \).

Among sound VI techniques, in this paper we focus on optimistic VI (OVI) due to its proven performance record [11, 29]. We use OVI in many places, specifically for 1) stopping criteria for local VIs in Sect. 4.2, 2) caching heuristics in Sect. 5.2, and 3) a stopping criterion for global (compositional) VI in Sect. 6.

The main steps of OVI proceed as follows: 1) a VI iteration produces an under-approximation l for every state; 2) we heuristically pick an over-approximation candidate u, for example \(u := l + \varepsilon \) for some small \(\varepsilon > 0\); and 3) we verify the candidate u by checking whether \(\varPhi _{\mathcal {M},T}(u)\le u\). If this holds, then by the Park induction principle [42], u is guaranteed to over-approximate the lfp \(\mu \varPhi _{\mathcal {M},T}\). If it does not, we refine l and u and try again. See [30] for details.
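The three steps can be sketched compactly. The following is our own minimal rendition of the OVI loop (with a fixed candidate guess \(u = l + \varepsilon \); the real algorithm [30] refines the guess heuristically), on a toy MDP encoded as state \(\rightarrow \) action \(\rightarrow \) successor distribution:

```python
# Optimistic VI sketch: run VI phases on the lower bound l, guess an
# optimistic candidate u, and verify it by Park induction Phi(u) <= u.
def phi(P, targets, f):
    g = dict(f)
    for s, acts in P.items():  # Bellman operator on non-target states
        g[s] = max(sum(p * f[t] for t, p in dist.items())
                   for dist in acts.values())
    for t in targets:
        g[t] = 1.0
    return g

def ovi(P, targets, states, eps=1e-6):
    l = {s: 0.0 for s in states}
    for t in targets:
        l[t] = 1.0
    while True:
        for _ in range(50):                    # a VI phase improving l
            l = phi(P, targets, l)
        u = {s: min(1.0, v + eps) for s, v in l.items()}  # optimistic guess
        pu = phi(P, targets, u)
        if all(pu[s] <= u[s] + 1e-12 for s in states):    # Park induction
            return l, u  # sound: l <= lfp <= u pointwise
        # otherwise: keep iterating (in general, eps may need adjusting)

# Toy MDP: from "s", one action reaches the target "t" or a sink "dead".
P = {"s": {"a": {"t": 0.5, "dead": 0.5}}}
l, u = ovi(P, {"t"}, {"s", "t", "dead"})
```

On this toy model the first candidate already passes the Park induction check, yielding sound bounds around the true value 0.5 for state s.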

4.2 Going Top-Down in Compositional Value Iteration

We move on to formalize the story of Sect. 2.2. Algorithm 1 is a prototype of our proposed algorithm, where compositional VI is run in a top-down manner. It will be combined with Pareto caching (Sect. 5.2) and the stopping criteria introduced in Sect. 6. A high-level view of Algorithm 1 is the iteration of the following operations: 1) running local VI in each component oMDP, and 2) propagating its result along sequential composition, from an entrance of a succeeding component, to the corresponding exit of a preceding component. See Fig. 11 for illustration. The algorithm maintains two main constructs: functions \(g:I_{\textrm{lc}}(\mathbb {D})\rightarrow [0, 1]\) and \(h:O_{\textrm{lc}}(\mathbb {D})\rightarrow [0, 1]\) that assign values to local entrances and exits, respectively. They are analogues of the value function \(f:S\rightarrow [0, 1]\) in (standard) VI (Sect. 4.1); g and h get iteratively increased as the algorithm proceeds.

Algorithm 1.
Fig. 11. An overview of Algorithm 1. In the MDP \(\llbracket \mathbb {D} \rrbracket \), the exit \(o^{\mathcal {A}}_{1}\) and the entrance \(i^{\mathcal {B}}_{1}\) are merged in sequential composition (Def. 3.5); here they are distinguished, much like in Def. 3.9. Numbers in red are the values of h; those in blue are the values of g.

Lines 4–12 are the main VI loop, where we combine local VI (over each component \(\mathcal {A}\)) and propagation along sequential composition. The algorithm \(\texttt {LocalVI}\) takes the target oMDP \(\mathcal {A}\) and its “local weight” as arguments; the latter is the restriction \(h|_{O^{\mathcal {A}}}:O^{\mathcal {A}}\rightarrow [0, 1]\) of the function \(h:O_{\textrm{lc}}(\mathbb {D})\rightarrow [0, 1]\). Any VI algorithm will do for \(\texttt {LocalVI}\); we use OVI as announced in Sect. 4.1. The result of local VI is a function \(g_{\mathcal {A}}:I^{\mathcal {A}}\rightarrow [0, 1]\) for values over entrances of \(\mathcal {A}\). These get patched up to form \(g:I_{\textrm{lc}}(\mathbb {D})\rightarrow [0, 1]\) in line 11. The function \(\coprod _{\mathcal {A}\in \mathop {\textrm{CP}}(\mathbb {D})}g_{\mathcal {A}}\) is defined by obvious case-distinction: it returns \(g_{\mathcal {A}}(i)\) for a local entrance \(i\in I^{\mathcal {A}}\). Recall from Def. 3.9 that \(I_{\textrm{lc}}(\mathbb {D})=\biguplus _{\mathcal {A}\in \mathop {\textrm{CP}}(\mathbb {D})}I^{\mathcal {A}}\). In line 12, the values at entrances are propagated to the connected exits.

On \(\texttt {PropagateSeqComp}\) in line 12: its graphical intuition is in Fig. 11c; here are some details. We first note that the set \(O_{\textrm{lc}}(\mathbb {D})\) of local exits is partitioned into 1) global exits (i.e. those in \(O^{\llbracket \mathbb {D} \rrbracket }\)) and 2) local exits that are removed by sequential composition. Indeed, by examining Defs 3.5 and 3.6, we see that sequential composition is the only operation that removes local exits, and the local exits that are not removed eventually become global exits. It is also clear (Def. 3.5) that each local exit o removed in sequential composition has a corresponding local entrance \(i_o\). Using these, we define the updated function \(h:O_{\textrm{lc}}(\mathbb {D})\rightarrow [0, 1]\) as follows: \(h(o)=w_{o}\) if o is a global exit (much like line 2); \(h(o)=g(i_o)\) otherwise.
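The interplay of \(\texttt {LocalVI}\) and \(\texttt {PropagateSeqComp}\) can be sketched on a uni-directional chain of two components. The data layout and the plain-VI stand-in for \(\texttt {LocalVI}\) below are simplifications of ours; Algorithm 1 uses OVI and also handles bidirectional wires:

```python
# A simplified CVI loop: h assigns values to local exits, g to local
# entrances. Each round runs local VI per component with h restricted to
# its exits as the weight, then propagates entrance values backwards to
# the connected exits.
COMPONENTS = {
    "A": {"P": {"i1": {"a": [("o1", 0.5), ("o2", 0.5)],
                       "b": [("o1", 0.9), ("i1", 0.1)]}},
          "entrances": ["i1"], "exits": ["o1", "o2"]},
    "B": {"P": {"i2": {"a": [("o3", 0.8), ("o4", 0.2)]},
                "i3": {"a": [("o3", 0.3), ("o4", 0.7)]}},
          "entrances": ["i2", "i3"], "exits": ["o3", "o4"]},
}
CONNECT = {"o1": "i2", "o2": "i3"}       # sequential-composition wiring
W = {"o3": 1.0, "o4": 0.0}               # weight on the global exits

def local_vi(comp, h, iters=200):        # stands in for LocalVI (plain VI)
    f = {s: 0.0 for s in comp["P"]}
    f.update({o: h[o] for o in comp["exits"]})
    for _ in range(iters):
        for s in comp["P"]:
            f[s] = max(sum(p * f[t] for t, p in succ)
                       for succ in comp["P"][s].values())
    return {i: f[i] for i in comp["entrances"]}

h = {o: W.get(o, 0.0) for c in COMPONENTS.values() for o in c["exits"]}
g = {}
for _ in range(20):                      # global CVI rounds
    for comp in COMPONENTS.values():     # local VI on every component
        g.update(local_vi(comp, h))
    for o, i in CONNECT.items():         # PropagateSeqComp
        h[o] = g[i]
```

In this toy chain, the values g(i2) = 0.8 and g(i3) = 0.3 are propagated to h(o1) and h(o2), after which the local VI on the first component converges to g(i1) = 0.8.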

Theorem 4.1

Algorithm 1 satisfies the following properties:

  1. 1.

    (Guaranteed under-approximation) For the output f of Algorithm 1, we have \(f(i)\le \textrm{WRPr}^{\llbracket \mathbb {D} \rrbracket }_{\textrm{max}}(\textbf{w}, i)\) for each \(i\in I^{\llbracket \mathbb {D} \rrbracket }\).

  2.

    (Convergence) Assume that \(\texttt {GlobalStoppingCriterion}\) is \({\textbf {false}}\). Algorithm 1 converges to the optimal value, that is, f converges to \( \textrm{WRPr}^{\llbracket \mathbb {D} \rrbracket }_{\textrm{max}}(\textbf{w}, I^{\llbracket \mathbb {D} \rrbracket })\).    \(\square \)

The correctness of the under-approximation of Algorithm 1 follows easily from that of (non-compositional, asynchronous) VI. The convergence relies on the fact that line 6 of Algorithm 1 iterates over all components.

5 Pareto Caching in Compositional VI

In our formulation of Algorithm 1, there is no explicit notion of Pareto curves. However, in line 6, we do (implicitly) compute under-approximations of points on the Pareto curves. Here we first recap approximate Pareto curves; we then show how we conduct Pareto caching, the key idea sketched in Sect. 2.3.

5.1 Approximating Pareto Curves

We formalize the Pareto curves illustrated in Sect. 2. For details, see [17, 20, 41, 45]. Model checking oMDPs is a multi-objective problem that concerns the trade-offs between the reachability probabilities to the individual exits.

Definition 5.1

(Pareto curve for an oMDP [54]). Let \(\mathcal {A}\) be an oMDP, and i be a (chosen) entrance. Let \(\textbf{p},\textbf{p}' \in [0, 1]^{O^{\mathcal {A}}}\). The relation \(\preceq \) between them is defined by \(\textbf{p}\preceq \textbf{p}'\) if \(\textbf{p}(o)\le \textbf{p}'(o)\) for each \(o\in O^{\mathcal {A}}\). When \(\textbf{p}\prec \textbf{p}'\) (i.e. \(\textbf{p}\preceq \textbf{p}'\) and \(\textbf{p}\ne \textbf{p}'\)), we say \(\textbf{p}'\) dominates \(\textbf{p}\). Let \(\sigma \) be a scheduler for \(\mathcal {A}\). We define the point realized by \(\sigma \), denoted by \(\textbf{p}^{\sigma }_{i}\), by \(\textbf{p}^{\sigma }_{i}(o) = \textrm{RPr}^{\sigma }(i\rightarrow o)\), the reachability probability from i to o under \(\sigma \).

The set \(\textsf{Ach}^{\sigma }_{i}\) of points achievable by \(\sigma \) is \(\{\textbf{p}\in [0, 1]^{O^{\mathcal {A}}}\mid \textbf{p}\preceq \textbf{p}^{\sigma }_{i}\}\). The set of achievable points is given by \(\textsf{Ach}_{i} = \bigcup _{\sigma }\textsf{Ach}^{\sigma }_{i}\). The Pareto curve \(\textsf{Pareto}_{i}\subseteq [0, 1]^{O^{\mathcal {A}}}\) is the set of maximal elements in \(\textsf{Ach}_{i}\) wrt. \(\preceq \). We say a scheduler \(\sigma \) is Pareto-optimal if \( \textbf{p}^{\sigma }_{i} \in \textsf{Pareto}_{i} \).
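As a small illustration of Def. 5.1, the Pareto curve of a finite set of realized points consists of the points not dominated by any other; the two-exit points below are made up for the sketch.

```python
def leq(p, q):
    # the componentwise order p ⪯ q from Def. 5.1
    return all(a <= b for a, b in zip(p, q))

def pareto_front(points):
    # keep exactly the points that no other point dominates
    return [p for p in points
            if not any(leq(p, q) and p != q for q in points)]

# illustrative points realized by schedulers of a two-exit oMDP
pts = [(0.9, 0.1), (0.5, 0.5), (0.4, 0.4), (0.1, 0.8)]
print(pareto_front(pts))   # [(0.9, 0.1), (0.5, 0.5), (0.1, 0.8)]
```

Here (0.4, 0.4) is dropped because (0.5, 0.5) dominates it; the remaining points are pairwise incomparable.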

The set \(\textsf{Ach}_{i}\) is convex, downward closed, and finitely generated by DM schedulers; it follows that, for our target problem, Pareto-optimal DM schedulers suffice. This is illustrated in Fig. 9, where the weight \(\textbf{w}\) is the normal vector of the blue lines, and the maximum is achieved by a generating point of \(\textsf{Ach}_{i}\).

The last observations are formally stated as follows.

Proposition 5.2

([17, 20, 45]). For any entrance \(i\in I\), the set \(\textsf{Ach}_{i}\) of achievable points is finitely generated by DM schedulers, that is, \( \textsf{Ach}_{i} = \textsf{DwConvCl}(\textsf{Ach}^{\varSigma _\text {d}^{\mathcal {A}}}_{i}) \). Here, \(\textsf{DwConvCl}(X)\) denotes the downward and convex closed set generated by \(X\subseteq \mathbb R^n\), and \(\textsf{Ach}^{\varSigma _\text {d}^{\mathcal {A}}}_{i}\) is given by \(\{\textbf{p}^{\sigma }_{i}\mid \sigma \in \varSigma _\text {d}^{\mathcal {A}}\}\), where \(\varSigma _\text {d}^{\mathcal {A}}\) is the set of DM schedulers.

Proposition 5.3

([17, 20, 45]). Given a weight \(\textbf{w}\in [0, 1]^{O^{\mathcal {A}}}\) and an entrance i, there is a scheduler \(\sigma \) such that \( \textrm{WRPr}^{\mathcal {A},\sigma }(\textbf{w}, i) = \textrm{WRPr}^{\mathcal {A}}_{\textrm{max}}(\textbf{w}, i)\). Moreover, this \(\sigma \) can be chosen to be DM and Pareto-optimal.

We now formulate sound approximations of Pareto curves, which are the foundation of our Pareto caching (and of a global stopping criterion in Sect. 6).

Definition 5.4

(sound approximation [54]). Let i be an entrance. An under-approximation \(L_i\) of the Pareto curve \(\textsf{Pareto}_{i}\) is a downward closed subset \(L_i\subseteq \textsf{Ach}_{i}\); an over-approximation is a downward closed superset \(U_i\supseteq \textsf{Ach}_{i}\). A pair \((L_i,U_i)\) is called a sound approximation of the Pareto curve \(\textsf{Pareto}_{i}\). In this paper, we focus on \(L_i\) and \(U_i\) that are finitely generated, i.e. the convex and downward closures of some finite generators \(L_i^{\textrm{g}}, U_i^{\textrm{g}}\subseteq [0, 1]^{O^{\mathcal {A}}}\), respectively. A sound approximation of an oMDP \(\mathcal {A}\) is a pair (L, U), where \(L=(L_{i})_{i\in I^{\mathcal {A}}}\), \(U=(U_{i})_{i\in I^{\mathcal {A}}}\), and \((L_i, U_i)\) is a sound approximation for each entrance i.

5.2 Pareto Caching

We go on to formalize our second key idea, Pareto caching, outlined in Sect. 2.3.

In Def. 5.5, an index \(\textsf{c}_{\mathcal {A}}\in \mathop {\textrm{nCP}}(\mathbb {D})\) is a nominal component that ignores multiplicities, since we want to reuse results for different occurrences of \(\mathcal {A}\).

Definition 5.5

(Pareto cache). Let \(\mathbb {D}\) be a string diagram of MDPs. A Pareto cache \(\textbf{C}\) is an indexed family \(\big ((L^{\mathcal {A}}, U^{\mathcal {A}})\big )_{\textsf{c}_{\mathcal {A}}\in \mathop {\textrm{nCP}}(\mathbb {D})}\), where \((L^{\mathcal {A}}, U^{\mathcal {A}})\) is a sound approximation (Def. 5.4) for each nominal component \(\textsf{c}_{\mathcal {A}}\).

As announced in Sect. 2.3, a Pareto cache \(\textbf{C}\) (its component \((L^{\mathcal {A}}, U^{\mathcal {A}})\), to be precise) gets queried on a weight \(\textbf{w}\in [0, 1]^{O^{\mathcal {A}}}\). What to return is not trivial, however, since the specific weight \(\textbf{w}\) may not have been used before in constructing \(\textbf{C}\). The query is answered as depicted in Fig. 9, by finding an extremal point where \(L^{\mathcal {A}}\) intersects a plane whose normal vector is \(\textbf{w}\).

Definition 5.6

(cache read). Assume the above setting, and let i be an entrance of interest. The cache read \((L^{\mathcal {A}}_{i}(\textbf{w}), U^{\mathcal {A}}_{i}(\textbf{w}))\in [0, 1]^{2}\) on \(\textbf{w}\) at i is defined by \(L^{\mathcal {A}}_{i}(\textbf{w}) = \sup _{\textbf{p}\in L^{\mathcal {A}}_{i}}\textbf{w}\cdot \textbf{p}\) and \(U^{\mathcal {A}}_{i}(\textbf{w}) = \sup _{\textbf{p}\in U^{\mathcal {A}}_{i}}\textbf{w}\cdot \textbf{p}\).

Recall from Sect. 5.1 that we can assume \(L_{i}\) and \(U_{i}\) are finitely generated as convex and downward closures. It follows [20, 45] that each supremum above is realized by some generating point, much like in Prop. 5.3, easing computation.
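A cache read thus reduces to finitely many dot products over the generators. A minimal sketch, with made-up two-exit generators:

```python
def cache_read(L_gen, U_gen, w):
    # each supremum over a finitely generated L_i / U_i is attained at a
    # generating point, so a query is a max over dot products
    score = lambda gens: max(sum(a * b for a, b in zip(w, p)) for p in gens)
    return score(L_gen), score(U_gen)

L_gen = [(0.5, 0.3), (0.2, 0.6)]       # achievable points generating L_i
U_gen = [(0.55, 0.35), (0.25, 0.65)]   # generators of the over-approximation U_i
lo, hi = cache_read(L_gen, U_gen, (1.0, 1.0))
print(lo, hi)   # approximately 0.8 0.9
```

The pair (lo, hi) brackets the optimal weighted reachability for this weight, which is exactly what Algorithm 2 compares against the bound \(\eta\).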


We complement Algorithm 1 by Algorithm 2, which introduces our Pareto caching. Specifically, for the weight \(h|_{O^{\mathcal {A}}}\) in question, we first compute the error \(\max _{i\in I^{\mathcal {A}}}U^{\mathcal {A}}_i(h|_{O^{\mathcal {A}}}) - L^{\mathcal {A}}_i(h|_{O^{\mathcal {A}}})\) of the Pareto cache \(\textbf{C}= \big ((L^{\mathcal {A}}, U^{\mathcal {A}})\big )_{\textsf{c}_{\mathcal {A}}}\) with respect to this weight. The error can be greater than a prescribed bound \(\eta \)—we call this a cache miss—in which case we run OVI locally for \(\mathcal {A}\) (line 9). When the error is no greater than \(\eta \)—we call this a cache hit—we use the cache read (Def. 5.6), sparing OVI on a component \(\mathcal {A}\in \mathop {\textrm{CP}}(\mathbb {D})\). In the case of a cache miss, the result \((l,\sigma )\) of local OVI (line 9) is also used to update the Pareto cache \(\textbf{C}\) (line 10); see below.
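The hit/miss logic can be sketched as follows; `Cache`, `run_ovi`, and the concrete values are illustrative stand-ins, not Storm's API.

```python
class Cache:
    # toy Pareto cache mapping (component, weight) to (lower, upper)
    def __init__(self):
        self.store = {}
    def read(self, A, w):
        return self.store.get((A, w), (0.0, 1.0))   # trivial sound bounds
    def update(self, A, w, l, sigma):
        self.store[(A, w)] = (l, l + 1e-6)          # tightened after OVI

def run_ovi(A, w):
    return 0.42, "sigma0"   # stand-in OVI result: (value, scheduler)

def local_vi_with_cache(A, w, cache, eta, run_ovi):
    lo, hi = cache.read(A, w)
    if hi - lo <= eta:           # cache hit: reuse the read, skip OVI
        return lo
    l, sigma = run_ovi(A, w)     # cache miss: run OVI locally (line 9)
    cache.update(A, w, l, sigma) # update the cache with the result (line 10)
    return l

cache, eta = Cache(), 1e-5
v1 = local_vi_with_cache("A", (1.0,), cache, eta, run_ovi)  # miss -> OVI
v2 = local_vi_with_cache("A", (1.0,), cache, eta, run_ovi)  # hit -> cached
print(v1, v2)   # 0.42 0.42
```

The second call is answered from the cache alone, which is precisely the saving that repeated occurrences of a component make possible.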

Using a Pareto cache may prevent the execution of local VI on every component, which can jeopardize the convergence of Algorithm 1; see Thm. 4.1. A simple remedy is to eventually disregard the Pareto cache.

Updating the Cache. Pareto caches get incrementally updated using the results of weighted-reachability computations for different weights \(\textbf{w}\). We build upon the data structures in [20, 45]. Notable is the asymmetry between the under- and over-approximations \((L_{i},U_{i})\): we obtain 1) a point in \(L_{i}\) and 2) a plane that bounds \(U_{i}\).

We update the cache after running OVI on a weight \(\textbf{w}\in [0, 1]^{O^{\mathcal {A}}}\), which approximately computes the optimal weighted reachability to exits \(o\in O^{\mathcal {A}}\). That is, it returns \(l,u\in [0, 1]\) such that

$$\begin{aligned} l\;\le \; \textstyle \sup _{\sigma } \bigl (\,\textbf{w}\cdot \bigl (\, \textrm{RPr}^{\sigma }(i\rightarrow o) \,\bigr )_{o\in O^{\mathcal {A}}}\,\bigr ) \;\le \; u. \end{aligned}$$
(4)

Here i is any entrance and \(\textrm{RPr}^{\sigma }(i\rightarrow o)\) is the probability \(\textrm{RPr}^{\mathcal {A},\sigma ,\{o\}}(i)\) in Sect. 3.1.

What are the “graphical” roles of l and u in the Pareto curve? The role of u is easier: it follows from (4) that any achievable reachability vector \( \bigl ( \textrm{RPr}^{\sigma }(i\rightarrow o) \bigr )_{o} \) resides under the plane \(\{\textbf{p}\mid \textbf{w}\cdot \textbf{p}= u\}\). This plane thus bounds an over-approximation \(U_{i}\). Exploiting l takes some computation. By (4), the existence of a good scheduler \(\sigma \) is guaranteed, but this alone does not carry any graphical information, e.g. in Fig. 9. We have to go constructive: we extract a near-optimal DM scheduler \(\sigma _{0}\) (we can do so in VI) and use this fixed \(\sigma _{0}\) to compute \( \bigl ( \textrm{RPr}^{\sigma _{0}}(i\rightarrow o) \bigr )_{o} \). This way we can plot an achievable point—a corner point in Fig. 9—in \(L_i\).
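The asymmetric update can be sketched as follows; `reach_under` stands in for model checking the Markov chain induced by the fixed scheduler, and all names and numbers are illustrative.

```python
def update_cache(U_halfspaces, L_points, w, u, sigma0, reach_under, exits):
    # upper bound u contributes a bounding halfspace {p | w . p <= u} to U_i
    U_halfspaces.append((w, u))
    # lower bound: re-evaluate the extracted near-optimal DM scheduler
    # sigma0 per exit to obtain a concrete achievable point for L_i
    point = tuple(reach_under(sigma0, o) for o in exits)
    L_points.append(point)

exits = ("o1", "o2")
U_hs, L_pts = [], []
update_cache(U_hs, L_pts, (0.5, 0.5), 0.45, "sigma0",
             lambda s, o: {"o1": 0.5, "o2": 0.3}[o], exits)
print(U_hs, L_pts)   # [((0.5, 0.5), 0.45)] [(0.5, 0.3)]
```

Note the asymmetry: the over-approximation grows by intersecting halfspaces, while the under-approximation grows by adding corner points, matching the description above.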

6 Global Stopping Criteria (GSC)

We present the last missing piece, namely global stopping criteria (GSC for short, line 4 of Algorithm 1). A GSC has to ensure that the computed under-approximation f is \(\epsilon \)-close to the exact reachability probability. We provide two criteria, called optimistic and bottom-up.

Optimistic GSC (Opt-GSC). The challenge in adapting the idea of OVI (see Sect. 4.1) to CVI is to define a suitable Bellman operator for CVI; once such an operator is defined, the idea of OVI applies immediately. For simplicity, we assume that CVI solves each local component exactly (line 6 in Algorithm 1) without Pareto caching; this can be done, for example, by policy iteration [29]. Then, CVI (without Pareto caching and a global stopping criterion) on \(\mathbb {D}\) coincides with (non-compositional) VI on a suitable shortcut MDP [54] of \(\mathbb {D}\). Intuitively, a shortcut MDP summarizes a Pareto-optimal scheduler by a single action from a local entrance to an exit; see [55, Appendix C] for the definition. Thus, we can regard the standard Bellman operator on the shortcut MDP as the Bellman operator for CVI, and define Opt-GSC as standard OVI based on this characterization. CVI with Opt-GSC (and Pareto caching) actually uses local under-approximations (not exact solutions) for obtaining a global under-approximation (lines 7 and 9 in Algorithm 2), where the desired soundness property still holds. See [55, Appendix C] for more details.
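The OVI-style check underlying Opt-GSC can be sketched as follows, with a toy one-state operator standing in for the shortcut-MDP Bellman operator.

```python
def optimistic_check(f, eps_guess, bellman):
    # OVI recipe: guess an upper candidate u = f + eps_guess and verify
    # that the Bellman operator maps u below itself; if so, u is an
    # inductive (hence sound) upper bound on the fixed point
    u = {s: min(1.0, v + eps_guess) for s, v in f.items()}
    bu = bellman(u)
    return all(bu[s] <= u[s] for s in u)

# toy one-state "shortcut MDP" operator with fixed point 0.5
bellman = lambda v: {"s": 0.5 * v["s"] + 0.25}
f = {"s": 0.49}                              # current under-approximation
print(optimistic_check(f, 0.02, bellman))    # True: 0.51 is verified
print(optimistic_check(f, 0.0, bellman))     # False: 0.49 is no upper bound
```

When the check succeeds, the gap between f and the verified candidate witnesses \(\epsilon\)-closeness, which is exactly what the GSC must certify.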

Bottom-up GSC (BU-GSC). We obtain another global stopping criterion by composing the Pareto caches computed in Algorithm 2 for each component \(\mathcal {A}\), in the bottom-up manner of [54] (outlined in Sect. 2.1). Specifically, 1) Algorithm 2 produces an over-approximation \(U^{\mathcal {A}}\) of the Pareto curve of each component \(\mathcal {A}\); 2) we combine \((U^{\mathcal {A}})_{\mathcal {A}}\) along sequential composition and sum \(\oplus \) to derive an over-approximation U of the global Pareto curve; and 3) this U is queried on the weight \(\textbf{w}\) in question (i.e. the input of CVI) to obtain an over-approximation u of the weighted reachability probability. BU-GSC checks whether this over-approximation u is close enough to the under-approximation l derived from g in Algorithm 1.

Correctness. CVI (Algorithm 1 with Pareto caching under either GSC) is sound. The proof is in [55, Appendix C].

Theorem 6.1

(\(\epsilon \)-soundness of CVI). Given a string diagram \(\mathbb {D}\), a weight \(\textbf{w}\), and \(\epsilon \in [0, 1]\), if CVI terminates, then the output f satisfies

$$\begin{aligned} f(i)\le \textrm{WRPr}^{\llbracket \mathbb {D} \rrbracket }_{\textrm{max}}(\textbf{w}, i) \le f(i)+\epsilon , \end{aligned}$$

for each \(i\in I^{\llbracket \mathbb {D} \rrbracket }\).

Our algorithm currently comes with no termination guarantee; this is future work. Termination of VI (with soundness) is a tricky problem: most known termination proofs exploit the uniqueness of a fixed point of the Bellman operator, which must be algorithmically enforced e.g. by eliminating end components [10, 24]. In the current compositional setting, end components can arise by composing components, so detecting them is much more challenging.

7 Empirical Evaluation

In this section, we compare the scalability of our approaches both among each other and in comparison with some existing baselines. We discuss the setup, give the results, and then give our interpretation of them.

Approaches. We examine our three main algorithms: Opt-GSC with either exact caching (\(\textbf{OCVI}^{e}\)) or Pareto caching, and BU-GSC with Pareto caching (Symb). BU-GSC needs a Pareto cache, so we cannot run it with an exact cache. We compare our approaches against two baselines: a monolithic algorithm (Mono) building the complete MDP \(\llbracket \mathbb {D} \rrbracket \), and the bottom-up approach (BU) as explained in [54]. We additionally use two virtual approaches that employ a perfect oracle to select the fastest of the specified algorithms: baselines is the best of the two baselines, while novel is the best of the three new algorithms. All algorithms are built on top of the probabilistic model checker Storm [32], which is primarily used for model building, (O)VI on component MDPs, and operating on Pareto curves.

Setup. We run all benchmarks on a single core of an AMD Ryzen TRP 5965WX, with a 900 s time-out and a 16 GB memory limit. We use all (scalable) benchmark instances from [54]. While these benchmarks are synthetic, they reflect typical structures found in network protocols and high-level planning domains. We require an overall precision of \(10^{-4}\), run the Pareto cache with an acceptance precision of \(10^{-5}\), and solve the LPs in the upper-bound queries for the Pareto cache with an exact LP solver and a tolerance of \(10^{-4}\). The components are reverse topologically ordered, i.e., we always first analyse component MDPs towards the end of a given MDP \(\llbracket \mathbb {D} \rrbracket \). To solve the component MDPs inside the VI, we use OVI for the lower bounds and precise policy iteration for the upper bounds. We use algorithms and data structures already present in Storm for maintaining Pareto curves [45], which use exact rational arithmetic for numerical stability. Although our implementation supports exact arithmetic throughout the code, in practice this incurs a significant performance penalty, running up to 100 times slower. For algorithms not related to maintaining the Pareto cache, we opted for 64-bit floating-point arithmetic, which is standard in probabilistic model checking [11]. Using floating-point arithmetic can produce unsound results [26]; we attempt to prevent unsound results in our benchmarks. First, we check with our setup that our results are very close (error \(<10^{-5}\)) to the exact solutions (when they could be computed). Second, we check that all results, obtained with different methods, are close to each other. We evaluate the stopping criteria after ten iterations. These choices can be adapted using our prototypical implementation; we discuss some of them at the end of the discussion below.

Fig. 12. Benchmark scatter plots, time in seconds, OoR = Out of Resources.

Results. We provide pairwise comparisons of the runtimes on all benchmarks using the scatter plots in Fig. 12; notice the log-log scale. For some of the benchmark instances, we provide detailed information in Tables 1 and 2, respectively. In Table 1, we give the identifier for the string diagram and the component MDPs, as well as the number of states in \(\llbracket \mathbb {D} \rrbracket \). Then, for each of the five algorithms, we provide the runtime t; for each algorithm maintaining Pareto points, we give the number of Pareto points stored |P|; and for the three novel VI-based algorithms, we give the time spent attempting to prove convergence (\(t_s\)). In Table 2, we focus on our three novel algorithms and the performance of the caches. We again provide identifiers for the models, and then, for each algorithm, the total time spent, the time spent on inserting and retrieving items from the cache, as well as the fraction of cache hits H and the number of total queries Q. Thus, the number of cache hits is given by \(H\cdot Q\). The full tables and more figures are given in [55, Appendix A].

Table 1. Performance for different algorithms. See Results for explanations.
Table 2. Cache access times for CVI algorithms. See Results for explanations.

Discussion. We make some observations. We notice that the CVI algorithms collectively solve more benchmarks within the time-out and speed up most benchmarks; see Fig. 12(top-l). We refer to the benchmark results in Table 1.

\(\textbf{OCVI}^{e}\) Mostly Outperforms Mono, Fig. 12(top-c). The monolithic VI as typical in Storm requires a complete model, which can be prohibitively large. However, even for medium-sized models such as Chains100-RmB, VI can run into time-outs due to slow convergence. CVI with the exact cache (and even with no cache) converges quickly, highlighting that the grouping of states helps VI to converge. On the other hand, a model such as Birooms100-RmS highlights that the harder convergence check can incur a significant overhead.

Symb Mostly Outperforms BU, Fig. 12(top-r). For many models, the top-down approach as motivated in Sect. 4.2 indeed ensures that we avoid the undirected exploration of the Pareto curves. However, if the VI repeatedly asks for weights that are not relevant for the optimal scheduler, the termination checks fail and this yields a significant overhead.

\(\textbf{OCVI}^{e}\) and Symb Both Provide Clear Added Value, Fig. 12(bot-l). Both approaches can solve benchmarks within ten seconds that the other approach does not solve within the time-out. Both approaches can save significantly on the number of iterations necessary. Symb suffers from the overhead of the Pareto cache (see below), whereas \(\textbf{OCVI}^{e}\) requires near-optimal values in all leaves, regardless of whether these leaves are important for reaching the global target. Therefore, Symb may profit from ideas from asynchronous VI and from adaptive schemes for deciding when to run the termination check.

Pareto Cache Has a Significant Overhead, Fig. 12(bot-c/r) and Table 2. We observe that the Pareto cache consistently yields an overhead: in particular, \(\textbf{OCVI}^{e}\) often outperforms its Pareto-caching counterpart. The Pareto cache is, however, essential for Symb. The overhead has three different sources. (1) More iterations: Birooms10-RmB illustrates how \(\textbf{OCVI}^{e}\) requires only 14% of the iterations of its Pareto-caching counterpart. Even with a 66% cache hit rate, this means an overhead in the number of component MDPs analysed. The main reason is that reusing approximations can delay convergence. (2) Cache retrieval: To obtain an upper bound, we must optimize over Pareto curves that contain tens of halfspaces, which is numerically not very stable. Therefore, Pareto curves in Storm are represented exactly. Solving the resulting linear program is often as slow as actually solving the component MDP, especially for small MDPs. (3) Cache insertion: Inserting lower bounds requires model checking as many Markov chains as there are exits in the open MDPs. These times are pure overhead if the lower bound is never retrieved, and can be substantial for large open MDPs.

Opportunities for Heuristics and Hyperparameters. We extensively studied variations of the presented algorithms. For example, a much higher tolerance in the Pareto cache can significantly speed up computation at the cost of not terminating on many benchmark instances; one could also investigate a per-query strategy for retrieving and/or inserting cache results.

Interpretation of Results. Mono works well on models that fit into memory and exhibit little sharing of open MDPs. BU works well when the Pareto curves of the open MDPs can be accurately approximated with few Pareto points, which, in practice, excludes open MDPs with more than 3 exits. CVI without caching and termination criteria resembles a basic kind of topological VI on the monolithic MDP. CVI can thus improve upon topological VI either via the cache or via the alternative stopping criteria. Based on the experiments, we conjecture that

  • the cache is efficient when a single reachability query is expensive (such as in the Room10 model) and the cache hit rate is high.

  • the symbiotic termination criterion (Symb) works well when some exits are not relevant for the global target, such as the Chains3500 model, in which going backwards is not productive.

  • the compositional OVI stopping criterion (Opt-GSC, with either cache) works well when the likelihood of reaching all individual open MDPs is high, as can be seen in the ChainsLoop500-Dice4 model.

8 Related Work

We group our related work into variations of value iteration, compositional verification of MDPs, and multi-objective verification.

Value Iteration. Value iteration, the standard analysis method for MDPs [29], is widely studied. In the undiscounted, indefinite-horizon case we study, value iteration requires an exponential number of iterations in theory, but in practice it typically converges much earlier. This motivates the search for sound termination criteria. Optimistic value iteration [30] is now widely adopted as the default approach [11, 29]. To accelerate VI, various asynchronous variations have been suggested that avoid operating on the complete state space in every iteration. In particular, topological VI [2, 15] and (uni-directional) sequential VI [25, 28, 36] aim to exploit an acyclic structure similar to the one found in uni-directional MDPs.

Sequentially Composed MDPs. The exploitation of compositional structure in MDPs is widely studied. In particular, the sequential composition in our paper is closely related to hierarchical compositions that capture how tasks are often composed of repetitive subtasks [5, 6, 23, 31, 35, 48, 50, 51]. While we study a fully model-based approach, Jothimurugan et al. [33] provide a compositional reinforcement learning method whose sub-goals are induced by specifications. Neary et al. [40] update the learning goals based on the analysis of the component MDPs, but do not consider the possibility of reaching multiple exits. The widespread options framework and variations such as abstract VI [34] aggregate policies into additional actions [14, 49] to speed up the convergence of value iteration; they are often applied in model-free approaches. In the context of OVI, we must converge everywhere, and the bottom-up stopping criterion is not easily lifted to a model-free setting.

Further Related Work. As a different type of compositional reasoning, assume-guarantee reasoning [7, 12, 16, 18, 19, 38, 39] is a central topic, and a compositional probabilistic framework [38] with the parallel composition \(\parallel \) is also based on Pareto curves. Extending string diagrams of MDPs with the parallel composition \(\parallel \) is challenging, but an interesting direction for future work. We also mention VI algorithms over Pareto curves for solving multi-objective simple stochastic games [1, 13]. Due to the multi-objectivity, they maintain a set of points for each state during the iteration; CVI solves single-objective oMDPs determined by weights, and thus maintains a single value for each state.

9 Conclusion

This paper investigates the verification of compositional MDPs, with a particular focus on approximating the behavior of component MDPs via a Pareto cache and on sound stopping criteria for value iteration. The empirical evaluation not only demonstrates the efficacy of the novel algorithms, but also shows the potential for further improvements using asynchronous value iteration, more efficient Pareto cache manipulation, and more powerful compositional stopping criteria.