# Optimal Time-Bounded Reachability Analysis for Concurrent Systems

## Abstract

Efficient optimal scheduling for concurrent systems on a finite horizon is a challenging task up to date: Not only does time have a continuous domain, but in addition there are exponentially many possible decisions to choose from at every time point.

In this paper we present a solution to the problem of optimal time-bounded reachability for Markov automata, one of the most general formalisms for modelling concurrent systems. Our algorithm is based on the discretisation of the time horizon. In contrast to most existing algorithms for similar problems, the discretisation step is not fixed. We attempt to discretise only in those time points when the optimal scheduler *in fact* changes its decision. Our empirical evaluation demonstrates that the algorithm improves on existing solutions up to several orders of magnitude.

## 1 Introduction

Modern technologies grow and complexify rapidly, making it hard to ensure their dependability and reliability. Formal approaches to describing these systems include (generalised) stochastic Petri nets [Mol82, MCB84, MBC+98, Bal07], stochastic activity networks [MMS85], dynamic fault trees [BCS10] and others. The semantics of these modelling languages is often defined in terms of *continuous time Markov chains* (CTMCs). CTMCs can model the behaviour of seemingly independent processes evolving in memoryless continuous time (according to exponential distributions).

Modelling a system as a CTMC, however, strips it of any notion of *choice*, e. g., which of a number of requests to process first, or how to optimally balance the load over multiple servers of a cluster. Making sure that the system is safe for all possible choices of this kind is an important issue when assessing its reliability. *Non-determinism* allows the modeller to capture these choices. Modelling systems with non-determinism is possible in formalisms such as *interactive Markov chains* [Her02], or *Markov automata* (MA) [EHKZ13]. The latter are one of the most general models for concurrent systems available and can serve as a semantics for generalised stochastic Petri nets and dynamic fault trees.

A similar formalism, *continuous time Markov decision processes* (CTMDPs) [Ber00, Put94], has seen wide-spread use in control theory and operations research. In fact, MA and CTMDPs are closely related: They both can model exponential Markovian transitions and non-determinism. However, MA are *compositional*, while CTMDPs are not: In general it is not possible to model a system as a CTMDP by modelling each of its sub-components as smaller CTMDP and then combining them. This is why modelling large systems with many communicating sub-components as a CTMDP is cumbersome and error-prone. In fact, most modern model checkers, such as Storm [DJKV17], Modest [HH14] and PRISM [KNP11], do not offer any support for CTMDPs.

*optimal time-bounded reachability probability*, i. e. the maximal (or minimal) probability of a system to reach a set of goal states (e. g. unsafe states) within a given time bound. Due to the presence of non-determinism this value depends on which decisions are taken at which time points. Since the optimal strategy is time dependent there are continuously many different strategies. Classically, one deals with continuity by discretising the values, as is the case in most algorithms for CTMDPs and MA [Neu10, FRSZ16, HH15, BS11]: The time horizon is discretised into finitely many intervals, and the value within each interval is approximated by e. g. polynomial or exponential functions.

Discretisation is closely related to the scheduler that is optimal for a specific MA. As an example, consider Fig. 1: The plot shows the probabilities of reaching a goal state for a certain time bound, by choosing options 1 and 2. If less than 0.9 seconds remain, option 1 has a higher probability of reaching the goal set, while option 2 is preferable as long as more than 0.9 seconds are left. In this example it is enough to discretise the time horizon with roughly 2 intervals: [0, 0.9] and (0.9, 1.5]. The algorithms known to date however use from 200 to \(2\cdot 10^6\) intervals, which is far too many. The solution that we present in this paper discretises the time horizon in only *three* intervals for this example.

*Our contribution* consists in an algorithm that computes time bounded reachability probabilities for Markov automata. The algorithm discretises the time horizon by intervals of variable length, making them smaller near those time points where the optimal scheduler switches from one decision to another. We give a characterisation of these time points, as well as tight sufficient conditions for no such time point to exist within an interval. We present an empirical evaluation of the performance of the algorithm and compare it to other algorithms available for Markov automata. The algorithm does perform well in the comparison, improving in some cases by several orders of magnitude, but does not strictly outperform available solutions.

## 2 Preliminaries

Given a finite set *S*, a *probability distribution* over *S* is a function \(\mu :S\rightarrow [0,1]\), s. t. \(\sum _{s\in S}\mu (s) = 1\). We denote the set of all probability distributions over *S* by \(\mathrm {Dist}(S)\). The sets of rational, real and natural numbers are denoted with \(\mathbb {Q}\), \(\mathbb {R}\) and \(\mathbb {N}\) resp., \(X_{\unrhd 0}=\{x \in X ~|~ x \unrhd 0\}\), for \(X\in \{\mathbb {Q},\mathbb {R}\}, \unrhd \in \{ >, \geqslant \}\), \(\mathbb {N}_{\geqslant 0} = \mathbb {N}\cup \{0\}\).

### Definition 1

A *Markov automaton (MA)*^{1} is a tuple \(\mathcal {M}=(S,\text{ Act },{\overset{}{\mathbf {P}}}, {\overset{}{\mathrm {Q}}}, G)\) where *S* is a finite set of states partitioned into *probabilistic* (\( PS \)) and *Markovian* (\( MS \)), \(G\subseteq S\) is a set of *goal states*, \(\text{ Act }\) is a finite set of actions, \({\overset{}{\mathbf {P}}}: PS \times \text{ Act }\rightarrow \mathrm {Dist}(S)\) is the probabilistic transition matrix, \({\overset{}{\mathrm {Q}}}: MS \times S \rightarrow \mathbb {Q}\) is the Markovian transition matrix, s. t. \(\overset{}{\mathrm {Q}}(s,s')\geqslant 0\) for \(s\ne s'\), \(\overset{}{\mathrm {Q}}(s,s) = -\sum _{s' \ne s}\overset{}{\mathrm {Q}}(s,s')\).

Figure 2 shows an example MA. Grey and white colours denote Markovian and probabilistic states correspondingly. Transitions labelled as \(\alpha \) or \(\beta \) are actions of state \(s_1\). Dashed transitions associated with an action represent the distribution assigned to the action. Purely solid transitions are Markovian.

*Notation and further definitions:* For a Markovian state \(s \in MS \) and \(s' \ne s\), we call \(\overset{}{\mathrm {Q}}(s,s')\) the *transition rate* from *s* to \(s'\). The *exit rate* of a Markovian state *s* is Open image in new window . \({\mathrm {\mathbf {E}}_{\small {\text {max}}}}\) denotes the maximal exit rate among all the Markovian states of \(\mathcal {M}\). For a probabilistic state \(s \in PS \), \(\text{ Act }(s)=\{\alpha \in \text{ Act }\,|\,\exists \mu \in \mathrm {Dist}(S): \mathbf {P}(s,\alpha ) = \mu \}\) denotes the set of actions that are *enabled* in *s*. \(\mathbb {P}[s,\alpha ,\cdot ]\in \mathrm {Dist}(S)\) is defined by Open image in new window , where \(\mathbf {P}(s',\alpha ) = \mu \). We impose the usual *non-zenoness* [GHH+14] restriction on MA. This disallows e. g., probabilistic states with no outgoing transitions, or with only self-loop transitions.

A *(timed) path* in \(\mathcal {M}\) is a finite or infinite sequence \(\rho = s_0 \overset{\alpha _0, t_0}{\longrightarrow } s_1 \overset{\alpha _1, t_1}{\longrightarrow } \cdots \overset{\alpha _k, t_k}{\longrightarrow } s_{k+1} \overset{\alpha _{k+1}, t_{k+1}}{\longrightarrow } \cdots ,\) where \(\alpha _i \in \text{ Act }(s_i)\) for \(s_i \in PS \), and \(\alpha _i = \bot \) for \(s_i \in MS \). For a finite path \(\rho = s_0 \overset{\alpha _0, t_0}{\longrightarrow } s_1 \overset{\alpha _1, t_1}{\longrightarrow } \cdots \overset{\alpha _{k-1}, t_{k-1}}{\longrightarrow } s_{k}\) we define \(\rho \mathord \downarrow = s_{k}\). The set of all finite (infinite) paths of \(\mathcal {M}\) is denoted by \( Paths ^*\) (\( Paths \)).

Time passes continuously in Markovian states. The system leaves the state after the amount of time that is governed by an exponential distribution, i. e. the probability of leaving \(s\in MS \) within \(t\ge 0\) time units is given by \(1 - \mathrm {e}^{-\mathrm {E}(s)\cdot t}\), after which the next state \(s'\) is chosen with probability \(\overset{}{\mathrm {Q}}(s,s')/\mathrm {E}(s)\).

Probabilistic transitions happen instantaneously. Whenever the system is in a probabilistic state *s* and an action \(\alpha \in \text{ Act }(s)\) is chosen, the successor \(s'\) is selected according to the distribution \(\mathbb {P}[s,\alpha ,\cdot ]\) and the system moves from *s* to \(s'\) right away. Thus, the residence time in probabilistic states is always 0.

### 2.1 Time-Bounded Reachability

In this work we are interested in the probability to reach a certain set of states of a Markov automaton within a given time bound. However, due to the presence of multiple actions in probabilistic states the behaviour of a Markov automaton is not a stochastic process and thus no probability measure can be defined. This issue is resolved by introducing the notion of a scheduler.

A *general scheduler (or strategy)* \(\pi : Paths ^*\rightarrow \mathrm {Dist}(\text{ Act })\) is a measurable function, s. t. \(\forall \rho \in Paths ^*\) if \(\rho \mathord \downarrow \in PS \) then \(\pi (\rho ) \in \mathrm {Dist}(\text{ Act }(\rho \mathord \downarrow ))\). General schedulers provide a distribution over enabled actions of a probabilistic state given that the path \(\rho \) has been observed from the beginning of the system evolution. We call *stationary* such a general scheduler \(\pi \) that can be represented as \(\pi : PS \rightarrow \text{ Act }\), i. e. it is non-randomised and depends only on the current state. The set of all general (stationary) schedulers is denoted by \(\varPi _{\mathtt {gen}}\) (\(\varPi _{\mathtt {stat}}\) resp.).

Given a general scheduler \(\pi \), the behaviour of a Markov automaton is a fully defined stochastic process. For the definition of the probability measure Open image in new window on Markov automata we refer to [Hat17].

*(time-bounded) reachability probability*(or

*value*) for a scheduler \(\pi \) and state

*s*in \(\mathcal {M}\) is defined as follows:where \(\Diamond ^{\leqslant T}_{s} G= \{s \overset{\alpha _0,t_0}{\longrightarrow } s_1 \overset{\alpha _1,t_1}{\longrightarrow } s_2 \ldots \mid \exists i: s_i \in G\wedge \sum _{j=0}^{i-1} t_j \le T\}\) is the set of paths starting from

*s*and reaching \(G\) before

*T*.

*optimal (time-bounded) reachability probability*(or

*value*) of state

*s*in \(\mathcal {M}\) is defined as follows:We denote by Open image in new window ( Open image in new window ) the vector of values Open image in new window ( Open image in new window ) for all \(s \in S\). A general scheduler that achieves optimum for Open image in new window is called

*optimal*, and the one that achieves value \(\mathbf {v}\), s. t. Open image in new window , is \(\varepsilon \)

*-optimal*.

*Optimal Schedulers.* For the time-bounded reachability problem it is known [RS13] that there exists an optimal scheduler \(\pi \) of the form \(\pi : PS \times \mathbb {R}_{\geqslant 0}\rightarrow \text{ Act }\). This scheduler does not need to know the full history of the system, but only the current probabilistic state it is in and the total time left until time bound. It is deterministic, i. e. *not randomised*, and additionally, this scheduler is *piecewise constant*, meaning that there exists a finite partition \(\mathcal {I}({\pi })\) of the time interval [0, *T*] into intervals \(I_0=[t_0, t_1], I_1=(t_1, t_2], \cdots , I_{k-1}=(t_{k-1}, t_k]\), such that \(t_0=0, t_k=T\) and the value of the scheduler remains constant throughout each interval of the partition, i. e. \(\forall I \in \mathcal {I}({\pi }), \forall t_1,t_2 \in I, \forall s \in PS {}: \pi (s,t_1) = \pi (s,t_2)\). The value of \(\pi \) on an interval \(I\in \mathcal {I}({\pi })\) and \(s\in PS {}\) is denoted by \(\pi (s, I)\), i. e. \(\pi (s, I) = \pi (s,t)\) for any \(t\in I\).

As an example, consider the MA in Fig. 2 and time bound \(T=1\). Here the optimal scheduler for state \(s_1\) chooses the reliable but slow action \(\beta \) if there is enough time, i. e. if at least 0.62 time is left. Otherwise the optimal scheduler switches to a more risky, but faster path via action \(\alpha \).

In the literature this subclass of schedulers is sometimes referred to as *total-time positional deterministic, piecewise constant schedulers*. From now on we call a scheduler from this subclass simply a *scheduler (or strategy)* and denote the set of such schedulers with \(\varPi \). An important notion of schedulers is the *switching point*, the point of time separating two intervals of constant decisions:

### Definition 2

For a scheduler \(\pi \) and \(s\in PS {}\) we call \(\tau \in \mathbb {R}_{\geqslant 0}\) a *switching point*, iff \(\exists I_1, I_2\in \mathcal {I}({\pi })\), s. t. \(\tau =\sup {I_1}\) and \(\tau = \inf {I_2}\) and \(\exists s \in PS {}: \pi (s, I_1) \ne \pi (s, I_2)\).

Whether the switching points can be computed exactly or not is an open problem. In fact, the theorem of Lindemann-Weierstrass suggests that switching points are non-algebraic numbers, what hints at a negative answer.

## 3 Related Work

In this section we briefly review the algorithms designed to approximate time bounded reachability probabilities. We only discuss the algorithms that guarantee to compute \(\varepsilon \)-close approximation of the reachability value.

The majority of the algorithms [Neu10, BS11, FRSZ16, SSM18, BHHK15] are available for continuous time Markov decision processes (CTMDPs) [Ber00]. Two of those, [Neu10] and [BHHK15], are also applicable to MA. We compare to them in our empirical evaluation in Sect. 5. All the algorithms utilise such known techniques as discretisation, uniformisation, or a combination thereof. The drawback of most of the algorithms is that they do not adapt to a specific instance of a problem. Namely, given a model \(\mathcal {M}\) to analyse, they perform as many computations as is needed for \(\widehat{\mathcal {M}}\), which is the worst-case model in a subclass of models that share certain parameters with \(\mathcal {M}\), such as \({\mathrm {\mathbf {E}}_{\small {\text {max}}}}\), for example. Experimental evaluation performed in [BHHK15] shows that such approaches are not promising, because most of the time the algorithms perform too many unnecessary computations. This is not the case for [BS11] and [BHHK15]. The latter performs the analysis via uniformisation and schedulers that cannot observe time. The former, designed for CTMDPs, performs discretisation of the time horizon with intervals of variable length, however is not applicable to MA. Just like in [BS11], our approach is to adapt the discretisation of the time horizon to a specific instance of the problem.

## 4 Our Solution

In this section we present a novel approach to approximating optimal time-bounded reachability and the optimal scheduler for an arbitrary Markov automaton. Throughout the section we work with an MA \(\mathcal {M}=(S,{\text {Act}},{\mathbf {P}},{\overset{}{\mathrm {Q}}},G)\), time bound \(T\in \mathbb {Q}_{\geqslant 0}\) and precision \(\varepsilon \in \mathbb {Q}_{>0}\). To simplify the presentation we concentrate on supremum reachability probability.

Given a scheduler, computation (or approximation) of the reachability probability is relatively easy:

### Lemma 1

Let \(0=\tau _0< \tau _{1}<\ldots<\tau _{k-1}<\tau _k = T\), where \(\tau _i\) are the switching points of \(\pi \) for \(i=1..k-1\). The solution of the system of Equations (1)–(2) can be obtained separately on each of the intervals \((\tau _{i-1}, \tau _i], \forall i=1..k\), where the value of the scheduler remains constant for all states. Given the solution Open image in new window on interval \((\tau _{i-1}, \tau _{i}]\), we derive the solution for \((\tau _{i}, \tau _{i+1}]\) by using the values Open image in new window as boundary conditions. Later in Sect. 4.1 we will show that the approximation of the solution for each interval \((\tau _{i-1}, \tau _{i}]\) can be achieved via a combination of known techniques, such as *uniformisation* (for the Markovian states) and *untimed reachability analysis* (for probabilistic states).

Thus, given an optimal scheduler, Lemma 1 can be used to compute or approximate the optimal reachability value. Finding an optimal scheduler is therefore *the* challenge for optimal time-bounded reachability analysis. Our solution is based on approximating the optimal reachability value up to an arbitrary \(\varepsilon > 0\) by discretising the time horizon with intervals of variable length. On each interval the value of our \(\varepsilon \)-optimal scheduler remains constant. The discretisation we use attempts to reflect the partition \(\mathcal {I}({\pi })\) of a minimal^{2} optimal scheduler \(\pi \), i. e. it mimics intervals on which \(\pi \) has constant value.

Our solution is presented in Algorithm 1. It computes an \(\varepsilon \)-optimal scheduler \(\pi _\mathtt {opt}\) and approximates the system of Equations (1)–(2) for \(\pi _\mathtt {opt}\). The algorithm iterates over intervals of constant decisions of an \(\varepsilon \)-optimal strategy. At each iteration it computes: (i) a stationary scheduler \(\pi \) that is close to be optimal on the current interval (line 7), (ii) length \(\delta \) of the interval, on which \(\pi \) introduces acceptable error (line 8) and (iii) the reachability values for time \(t+\delta \) (line 9). The following sections discuss the steps of the algorithm in more detail.

### Theorem 1

Algorithm 1 approximates the value of an arbitrary Markov automaton for time bound \(T \in \mathbb {Q}_{\geqslant 0}\) up to a given \(\varepsilon \in \mathbb {Q}_{> 0}\).

### 4.1 Computing the Reachability Value

In this section we discuss steps 4 and 9, that require computation of the reachability probability according to the system of Equations (1)–(2). Our approach is based on the approximation of the solution. The presence of two types of states, probabilistic and Markovian, demands separate treatment of those. Informally, we will combine two techniques: time-bounded reachability analysis on continuous time Markov chains^{3} for Markovian states and time-unbounded reachability analysis on discrete time Markov chains^{4} for probabilistic states. Parameters \(w\) and \(\varepsilon _i\) of Algorithm 1 control the error allowed by the approximation. Here \(\varepsilon _i\) bounds the error for the very first instance of time-unbounded reachability in line 4. While \(w\) defines the fraction of the error that can be used by the approximations in subsequent iterations ( Open image in new window and Open image in new window ).

*s*by performing any number of probabilistic transitions and no Markovian transitions. This system of linear equations can be either solved exactly, e. g. via Gaussian elimination, or approximated (numerical methods). If Open image in new window is under-approximated we denote it by Open image in new window , where \(\epsilon \) is the approximation error. For \(A\subseteq S\) we define Open image in new window , Open image in new window .

For time bound \(0, s\in PS \) the value Open image in new window is the optimal probability to reach any goal state via only probabilistic transitions. We denote it by Open image in new window (step 4). It is a well-known problem on *discrete time Markov decision processes* [Put94] and can be computed or approximated by policy iteration, linear programming [Put94] or interval value iteration [HM14, QK18, BKL+17]. If the value is approximated up to \(\epsilon \), we denote it by Open image in new window .

The reachability analysis on Markovian states is solved with the well-known *uniformisation* approach [Jen53]. Informally, Markovian states will be implicitly *uniformised*: The exit rate for each Markovian state will be equal \({\mathrm {\mathbf {E}}_{\small {\text {max}}}}\) (by adding a self-loop transition), but this will not affect the reachability value.

*k*Markovian transitions. Let \(\mathbf {x}\in [0,1]^{|S|}\) be a vector of values for each state. For \(k \in \mathbb {N}_{\geqslant 0}, \pi \in \varPi _{\mathtt {stat}}\) we define Open image in new window if \(s \in G\) and otherwise:The value Open image in new window is the weighted sum over all states \(s'\) of the value \(\mathbf {x}_{s'}\) and the probability to reach \(s'\) starting from

*s*within

*k*Markovian transitions. Therefore the counter

*k*decreases only when a Markovian state performs a transition and is not affected by probabilistic transitions. If values Open image in new window are approximated up to precision \(\epsilon \), i. e. Open image in new window is used for probabilistic states instead of Open image in new window in (4), we use the notation Open image in new window .

We denote with Open image in new window the probability mass function of the Poisson distribution with parameter \(\lambda \). For a \(\tau \in \mathbb {R}_{\geqslant 0}\) and Open image in new window , Open image in new window is some natural number satisfying Open image in new window , e. g. Open image in new window [BHHK15], where *e* is the Euler’s number.

We are now in position to describe a way to compute Open image in new window at line 9 of Algorithm 1. Let Open image in new window be a vector of values computed by the previous iteration of Algorithm 1 for time *t*. Let Open image in new window be the solution of the system of Equation (1) for time point \(t+\delta \), a stationary scheduler \(\pi : PS \rightarrow \text{ Act }\) and where Open image in new window is used instead of Open image in new window as the boundary condition^{5}. The following Lemma shows that Open image in new window can be efficiently approximated up to Open image in new window :

### Lemma 2

### 4.2 Choosing a Strategy

The strategy for the next interval is computed in Step 7 and implicitly in Step 4. The latter has been discussed in Sect. 4.1. We proceed to Step 7.

*continuous time Markov decision processes*[Mil68], we prove that derivatives of function Open image in new window at time \(\tau =t\) help finding the strategy \(\pi \) that remains optimal for interval \((t, t+\delta ]\), for some \(\delta >0\). This is rooted in the Taylor expansion of function Open image in new window via the values of Open image in new window . We define sets

*t*for a scheduler \(\pi \).

### Lemma 3

If \(\pi \in \mathcal {F}_{|S|+1}\) then \(\exists \delta > 0\) such that \(\pi \) is optimal on \((t, t+\delta ]\).

Thus in order to compute a stationary strategy that is optimal on time-interval \((t, t+\delta ]\), for some \(\delta >0\), one needs to compute at most \(|S|+1\) derivatives of Open image in new window at time *t*. Procedure \(\textsc {FindStrategy}\) does exactly that. It computes sets \(\mathcal {F}_i\) until for some \(j\in 0..(|S|+1)\) there is only 1 strategy left, i. e. \(|\mathcal {F}_j|=1\). Otherwise it outputs any strategy in \(\mathcal {F}_{|S|+1}\). Similarly to Sect. 4.1, the scheduler that maximises the values Open image in new window can be approximated. This question and other optimisations are discussed in detail in Sect. 4.4.

### 4.3 Finding Switching Points

*t*,

*T*]. Recall the function Open image in new window , defined in Sect. 4.1 (Lemma 2) as the solution of the system of Equation (1) with the boundary condition Open image in new window , for a stationary scheduler \(\pi \). For a probabilistic state

*s*the following holds:Let \(s \in PS , \pi \in \varPi _{\mathtt {stat}}, \alpha \in \text{ Act }(s)\). Consider the following function:This function denotes the reachability value for time bound \(t+\delta \) and a scheduler that is different from \(\pi \). Namely, this is such a scheduler, that all states follow strategy \(\pi \), except for state

*s*, that selects action \(\alpha \) for the very first transition, and afterwards selects action \(\pi (s)\). Between two switching points the strategy \(\pi \) is optimal and therefore the value of Open image in new window is not greater than Open image in new window for all \(s\in PS {}, \alpha \in \text{ Act }(s)\). If for some \(\delta \in [0, T-t], s \in PS , \alpha \in \text{ Act }(s)\) it holds that Open image in new window , then action \(\alpha \) is better for

*s*then \(\pi (s)\), and therefore \(\pi (s)\) is not optimal for

*s*at \(t+\delta \). We show that the next switching point after time point

*t*is such a value \(t+\delta ,\delta \in (0, T-t]\), thatProcedure \(\textsc {FindStep}\) approximates switching points iteratively. It splits the time interval [0,

*T*] into subintervals \([t_1,t_2], \ldots , [t_{n-1}, t_{n}]\) and at each iteration

*k*checks whether (7) holds for some \(\delta \in [t_{k}, t_{k+1}]\). The latter is performed by procedure \(\textsc {CheckInterval}\). If \(\forall \delta \in [t_k, t_{k+1}]\) (7) does not hold, \(\textsc {FindStep}\) repeats by increasing

*k*. Otherwise, it outputs the largest \(\delta \in [t_{k}, t_{k+1}]\) for which (7) does not hold (line 11). This is done by binary search up to distance \(\delta _{\min }\). Later in this section we will show that establishing that (7) does not hold for all \(\delta \in [t_k,t_{k+1}]\) can be efficiently performed by considering only 2 time points of the interval \([t_k,t_{k+1}]\) and a subset of state-action pairs.

*Selecting* \(t_k\). This step is a heuristic. The correctness of our algorithm does not depend on the choices of \(t_k\), but its runtime is supposed to benefit from it: Obviously, the runtime of \(\textsc {FindStrategy}\) is best given an oracle that produces time points \(t_k\) which are exactly the switching points of the optimal strategy. Any other heuristic is just a guess.

At every iteration *k* we choose such a time point \(t_k\) that the MA is very likely to perform at most *k* Markovian transitions within time \(t_k\). “Very likely” here means with probability Open image in new window . For \(k \in \mathbb {N}\) we define Open image in new window as follows: Open image in new window , and for \(k > 1\): Open image in new window .

*Searching for switching points within*\([t_{k}, t_{k+1}]\). In order to check whether Open image in new window for

*all*\(\delta \in [t_{k}, t_{k+1}]\) we only have to check whether the maximum of function Open image in new window is at most 0 on this interval for all \(s\in PS ,\alpha \in \text{ Act }(s)\). In order to achieve this we work on the approximation of \(\mathbf { diff }(s, \alpha , t+\delta )\) derived from Lemma 2, thus establishing a sufficient condition for the scheduler to remain optimal:Here Open image in new window ( Open image in new window ) denotes an under-approximation of the value Open image in new window ( Open image in new window resp.) up to Open image in new window , defined in Lemma 2. And analogously for Open image in new window . Simple rewriting leads to the following:where Open image in new window . In order to find the supremum of the right-hand side of (9) over all \(\delta \in [a, b]\) we search for extremum of each Open image in new window , separately as a function of \(\delta \). Simple derivative analysis shows that the extremum of these functions is achieved at \(\delta =i/{\mathrm {\mathbf {E}}_{\small {\text {max}}}}\). Truncation of the time interval by \((\left\lfloor t_{k} \cdot {\mathrm {\mathbf {E}}_{\small {\text {max}}}}\right\rfloor +1) /{\mathrm {\mathbf {E}}_{\small {\text {max}}}}\) (step 4, Algorithm 2) ensures that for all \(i=0..k\) the extremum of \(y_i(\delta )\) is attained at either \(\delta =t_{k}\) or \(\delta =t_{k+1}\).

### Lemma 4

*k*. \(\forall \delta \in [t_{k}, t_{k+1}], s \in PS , \alpha \in \text{ Act }\):where

\(\textsc {CheckInterval}\) returns false iff for all \(s \in PS , \alpha \in \text{ Act }\) the right-hand side of (10) is less or equal to 0. Since Lemma 4 over-approximates \(\mathbf { diff }(s, \alpha , t+\delta )\) false positives are inevitable. Namely, it is possible that procedure \(\textsc {CheckInterval}\) suggests that there exists a switching point within \([t_k, t_{k+1}]\), while in reality there is none. This however does not affect correctness of the algorithm and only its running time.

*Finding Maximal Transitions.* Here we show that there exists a subset of states, such that, if the optimal strategy for these states does not change on an interval, then the optimal strategy for *all* states does not change on this interval.

In the following we call a pair \((s,\alpha ) \in PS \times \text{ Act }\) a *transition*. For transitions \((s, \alpha ), (s', \alpha ') \in PS \times \text{ Act }\) we write \((s,\alpha ) \preceq _{k} (s', \alpha ')\) iff \(C_{\pi , \varepsilon _{\text {N}}}(s,\alpha ) \leqslant C_{\pi , \varepsilon _{\text {N}}}(s',\alpha ')\) and \(\forall i=0..k: B^i_{\pi , \varepsilon _{\text {N}}}(s,\alpha ) \leqslant B^i_{\pi , \varepsilon _{\text {N}}}(s',\alpha ')\). We say that a transition \((s,\alpha )\) is *maximal* if there exists no other transition \((s',\alpha ')\) that satisfies the following: \((s,\alpha )\preceq _{k} (s',\alpha ')\) and at least one of the following conditions hold: \(C_{\pi , \varepsilon _{\text {N}}}(s,\alpha ) < C_{\pi , \varepsilon _{\text {N}}}(s',\alpha ')\) or \(\exists i=0..k: B^i_{\pi , \varepsilon _{\text {N}}}(s,\alpha ) < B^i_{\pi , \varepsilon _{\text {N}}}(s',\alpha ')\). The set of all maximal transitions is denoted with \(\mathcal {T}_\mathrm {max}(k)\).

We prove that if inequality (10) holds for all transitions from \(\mathcal {T}_\mathrm {max}(k)\), then it holds for all transitions. Thus only transitions from \(\mathcal {T}_\mathrm {max}(k)\) have to be checked by procedure \(\textsc {CheckInterval}\). In our implementation we only compute \(\mathcal {T}_\mathrm {max}(k)\) before the call to \(\textsc {CheckInterval}\) at line 11 of Algorithm 2, and use the set \(A= PS \times \text{ Act }\) within the while-loop.

### 4.4 Optimisation for Large Models

Here we discuss a number of implementation improvements developers should consider when applying our algorithm to large case studies:

*Switching points.* It may happen that the optimal strategy switches very often on a time interval, while the effect of these frequent switches is negligible. The difference may be so small that the \(\varepsilon \)-optimal strategy actually stays stationary on this interval. In addition, floating-point computations may lead to imprecise results: Values that are 0 in theory might be represented by non-zero float-point numbers, making it seem as if the optimal strategy changed its decision, when in fact it did not. To counteract these issues we can modify \(\textsc {CheckInterval}\) such that it outputs false even if the right-hand side of (10) is positive, as long as it is sufficiently small. The following lemma proves that the error introduced by not switching the decision is acceptable:

### Lemma 5

Let \(\delta = t_{k+1}-t_{k}\), \(\varepsilon '=\varepsilon -\varepsilon _i, \epsilon \in (0, \varepsilon '\cdot \delta /T)\) and \(N(\delta ,\epsilon )=({\mathrm {\mathbf {E}}_{\small {\text {max}}}}\delta )^2/2.0/\epsilon \). If \(\forall s \in PS , \alpha \in \text{ Act }, \tau \in [t_{k}, t_{k+1}]\) the right-hand side of (10) is not greater than \((\varepsilon '\delta /T-\epsilon )/N(\delta , \epsilon )\), then \(\pi \) is \(\varepsilon '\delta /T\)-optimal in \([t_{k}, t_{k+1}]\).

*Optimal strategy.* In some cases computation of the optimal strategy in the way it was described in Sect. 4.2 is computationally expensive, or is not possible at all. For example, if some values \(|\mathbf {d}^{(i)}_{\pi }(s)|\) are larger than the maximal floating point number that a computer can store, or if the computation of \(|S|+1\) derivatives is already too prohibitive for models of large state space, or if the values Open image in new window can only be approximated and not computed precisely. With the help of Lemma 5 and minor modifications to Algorithm 1, the correctness and convergence of Algorithm 1 can be preserved even when the strategy computed by \(\textsc {FindStrategy}\) is not guaranteed to be optimal.

## 5 Empirical Evaluation

We implemented our algorithm as a part of IMCA [GHKN12]. Experiments were conducted as single-thread processes on an Intel Core i7-4790 with 32 GB of RAM. We compare the algorithm presented in this paper with [Neu10] and [BHHK15]. Both are available in IMCA. We use the following abbreviations to refer to the algorithms: FixStep for [Neu10], \(\texttt {Unif}^{+}\) for [BHHK15] and SwitchStep for Algorithm 1. The value of the parameter \(w\) in Algorithm 1 is set to 0.1, \(\varepsilon _i=0\). We keep the default values of all other algorithms.

The evaluation is performed on a set of published benchmarks:

**dpm**-**j**-**k:** A model of a *dynamic power management system* [QWP99], representing the internals of a Fujitsu disk drive. The model contains a queue, service requester, service provider and a power manager. The requester generates tasks of *j* types differing in energy requirements, that are stored in the queue of size *k*. The power manager selects the processing mode for the service provider. A state is a goal state if the queue of at least one task type is full.

**qs****-j****-k** and **ps****-j****-k:** Models of a *queuing system* [HH12] and a *polling system* [GHH+13] where incoming requests of *j* types are buffered in two queues of size *k* each, until they are processed by the server. We consider the state with both queues being full to form the goal state set.

The discretisation step used in some of the experiments shown in Fig. 3.

\(\delta _{\texttt {F}}\) | \(\min \delta _{\texttt {S}}\) | \(\text {avg}\,\delta _{\texttt {S}}\) | \(\max \delta _{\texttt {S}}\) | | precision | |
---|---|---|---|---|---|---|

dpm-5-2 | \(3.7\cdot 10^{-6}\) | \(3.65\cdot 10^{-5}\) | 0.27 | 3.97 | 15 | 0.001 |

qs-2-3 | \(1.04\cdot 10^{-6}\) | \(1.04\cdot 10^{-6}\) | 0.017 | 7.56 | 15 | 0.001 |

ps-2-6 | \(3.54\cdot 10^{-6}\) | 0.0003 | 6 | 17.4 | 18 | 0.001 |

### 5.1 Results

SwitchStep **vs ** FixStep. Figure 3 compares runtimes of SwitchStep and FixStep. For these experiments precision is set to \(10^{-3}\) and the state space size ranges from \(10^2\) to \(10^5\).

Parameters of the experiments shown in Fig. 4.

| | | | \({\mathrm {\mathbf {E}}_{\small {\text {max}}}}\) | | |
---|---|---|---|---|

dpm-[4..7]-2 | 2061 - 158,208 | 4 - 7 | 4.6 - 9.1 | 15 |

dpm-3-[2..20] | 412 - 115,108 | 3 | 3.3 | 100 |

qs-1-[2..7] | 124 - 3,614 | 4 - 14 | 11.3 - 35.3 | 6 |

qs-[1..4]-2 | 124 - 16,924 | 4 - 8 | 11.3 | 6 |

ps-[1..8]-2 | 47 - 156,315 | 3 - 8 | 3.6 - 257.6 | 18 |

ps-2-[1-7] | 65 - 743,969 | 2 - 4 | 4.8 - 5.6 | 18 |

**vs**\(\texttt {Unif}^{+}\). In order to compare SwitchStep with \(\texttt {Unif}^{+}\) we have to restrict ourselves to a subclass of Markov automata in which probabilistic and Markovian states alternate, and probabilistic states have only 1 successor for each action. This is due to the fact that \(\texttt {Unif}^{+}\) is available in IMCA only for this subclass of models.

Figure 4 shows the comparison of running times of SwitchStep and \(\texttt {Unif}^{+}\). For the plot on the left we varied those model parameters that affect state space size, number of non-deterministic actions and maximal exit rate. In the plot on the right the model parameters are fixed, but precision and time bounds used for the experiments are differing. Table 2 shows the parameters of the models used in these experiments. We observe that there are cases in which SwitchStep performs remarkably better than \(\texttt {Unif}^{+}\), and cases of the opposite. Consider the experiments in Fig. 4, right. They show that \(\texttt {Unif}^{+}\) may be highly sensitive to variations of time bounds and precision, while SwitchStep is more robust in this respect. This is due to the fact that the scheduler computed by \(\texttt {Unif}^{+}\) does not have means to observe time precisely and can only guess it. This may be good enough, which is the case on the \(\texttt {ps}\) benchmark. However if it is not, then better precision will require many more computations. Additionally \(\texttt {Unif}^{+}\) does not use discretisation. This means that the increase of the time bound from \(T_1\) to \(T_2\) may significantly increase the overall running time, even if no new switching points appear on the interval \([T_1, T_2]\). SwitchStep does not suffer from these issues due to the fact that it considers schedulers that observe the time precisely and uses the discretisation. Large time intervals that introduce no switching points will likely be handled within one iteration.

In general, SwitchStep performs at its best when there are not too many switching points, which is what is observed in most published case studies.

*Conclusions:* We conclude that SwitchStep does not replace all existing algorithms for time bounded reachability. However it does improve the state of the art in many cases and thus occupies its own niche among available solutions.

## Footnotes

- 1.
Strictly speaking, this is the definition of a

*closed*Markov automaton in which no state has two actions with the same label. This is however not a restriction since the analysis of*general*Markov automata is always performed only after the composition under the urgency assumption is performed. Additional renaming of the actions does not affect the properties considered in this work. - 2.
In the size of \(\mathcal {I}({\pi })\).

- 3.
Markov automata without probabilistic states.

- 4.
Markov automata without Markovian states and such that \(\forall s \in PS : |\text{ Act }(s)| = 1\).

- 5.
Open image in new window may differ from Open image in new window since its boundary condition Open image in new window is an approximation of the boundary condition Open image in new window , used by Open image in new window .

## References

- [Bal07]Balbo, G.: Introduction to generalized stochastic Petri nets. In: Bernardo, M., Hillston, J. (eds.) SFM 2007. LNCS, vol. 4486, pp. 83–131. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-72522-0_3CrossRefGoogle Scholar
- [BCS10]Boudali, H., Crouzen, P., Stoelinga, M.: A rigorous, compositional, and extensible framework for dynamic fault tree analysis. IEEE Trans. Dependable Sec. Comput.
**7**(2), 128–143 (2010). https://doi.org/10.1109/TDSC.2009.45CrossRefGoogle Scholar - [Ber00]Bertsekas, D.P.: Dynamic Programming and Optimal Control, 2nd edn. Athena Scientific, Belmont (2000)Google Scholar
- [BHHK15]Butkova, Y., Hatefi, H., Hermanns, H., Krčál, J.: Optimal continuous time Markov decisions. In: Finkbeiner, B., Pu, G., Zhang, L. (eds.) ATVA 2015. LNCS, vol. 9364, pp. 166–182. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24953-7_12CrossRefGoogle Scholar
- [BKL+17]Baier, C., Klein, J., Leuschner, L., Parker, D., Wunderlich, S.: Ensuring the reliability of your model checker: interval iteration for Markov decision processes. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 160–180. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_8CrossRefGoogle Scholar
- [BS11]Buchholz, P., Schulz, I.: Numerical analysis of continuous time Markov decision processes over finite horizons. Comput. OR
**38**(3), 651–659 (2011). https://doi.org/10.1016/j.cor.2010.08.011MathSciNetCrossRefzbMATHGoogle Scholar - [DJKV17]Dehnert, C., Junges, S., Katoen, J., Volk, M.: A storm is coming: a modern probabilistic model checker. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10427, pp. 592–600. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63390-9_31CrossRefGoogle Scholar
- [EHKZ13]Eisentraut, C., Hermanns, H., Katoen, J., Zhang, L.: A semantics for every GSPN. In: Colom, J.-M., Desel, J. (eds.) PETRI NETS 2013. LNCS, vol. 7927, pp. 90–109. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38697-8_6CrossRefGoogle Scholar
- [FRSZ16]Fearnley, J., Rabe, M.N., Schewe, S., Zhang, L.: Efficient approximation of optimal control for continuous-time Markov games. Inf. Comput.
**247**, 106–129 (2016). https://doi.org/10.1016/j.ic.2015.12.002MathSciNetCrossRefzbMATHGoogle Scholar - [GHH+13]Guck, D., Hatefi, H., Hermanns, H., Katoen, J., Timmer, M.: Modelling, reduction and analysis of Markov automata. In: Joshi, K., Siegle, M., Stoelinga, M., D’Argenio, P.R. (eds.) QEST 2013. LNCS, vol. 8054, pp. 55–71. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40196-1_5CrossRefGoogle Scholar
- [GHH+14]Guck, D., Hatefi, H., Hermanns, H., Katoen, J., Timmer, M.: Analysis of timed and long-run objectives for Markov automata. Log. Methods Comput. Sci.
**10**(3) (2014). https://doi.org/10.2168/LMCS-10(3:17)2014 - [GHKN12]Guck, D., Han, T., Katoen, J.-P., Neuhäußer, M.R.: Quantitative timed analysis of interactive Markov chains. In: Goodloe, A.E., Person, S. (eds.) NFM 2012. LNCS, vol. 7226, pp. 8–23. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28891-3_4CrossRefGoogle Scholar
- [Hat17]Hatefi-Ardakani, H.: Finite horizon analysis of Markov automata. Ph.D. thesis, Saarland University, Germany (2017). http://scidok.sulb.uni-saarland.de/volltexte/2017/6743/
- [Her02]Hermanns, H.: Interactive Markov Chains: The Quest for Quantified Quality. LNCS, vol. 2428. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45804-2CrossRefzbMATHGoogle Scholar
- [HH12]Hatefi, H., Hermanns, H.: Model checking algorithms for Markov automata. ECEASST
**53**(2012). http://journal.ub.tu-berlin.de/eceasst/article/view/783 - [HH14]Hartmanns, A., Hermanns, H.: The modest toolset: an integrated environment for quantitative modelling and verification. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 593–598. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_51CrossRefGoogle Scholar
- [HH15]Hatefi, H., Hermanns, H.: Improving time bounded reachability computations in interactive Markov chains. Sci. Comput. Program.
**112**, 58–74 (2015). https://doi.org/10.1016/j.scico.2015.05.003CrossRefzbMATHGoogle Scholar - [HM14]Haddad, S., Monmege, B.: Reachability in MDPs: refining convergence of value iteration. In: Ouaknine, J., Potapov, I., Worrell, J. (eds.) RP 2014. LNCS, vol. 8762, pp. 125–137. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11439-2_10CrossRefzbMATHGoogle Scholar
- [Jen53]Jensen, A.: Markoff chains as an aid in the study of markoff processes. Scand. Actuarial J.
**1953**(sup1), 87–91 (1953). https://doi.org/10.1080/03461238.1953.10419459MathSciNetCrossRefzbMATHGoogle Scholar - [KNP11]Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_47CrossRefGoogle Scholar
- [MBC+98]Marsan, M.A., Balbo, G., Conte, G., Donatelli, S., Franceschinis, G.: Modelling with generalized stochastic Petri nets. SIGMETRICS Perform. Eval. Rev.
**26**(2), 2 (1998). https://doi.org/10.1145/288197.581193CrossRefzbMATHGoogle Scholar - [MCB84]Marsan, M.A., Conte, G., Balbo, G.: A class of generalized stochastic Petri nets for the performance evaluation of multiprocessor systems. ACM Trans. Comput. Syst.
**2**(2), 93–122 (1984). https://doi.org/10.1145/190.191CrossRefGoogle Scholar - [Mil68]Miller, B.: Finite state continuous time Markov decision processes with a finite planning horizon. SIAM J. Control
**6**(2), 266–280 (1968). https://doi.org/10.1137/0306020MathSciNetCrossRefzbMATHGoogle Scholar - [MMS85]Meyer, J.F., Movaghar, A., Sanders, W.H.: Stochastic activity networks: structure, behavior, and application. In: International Workshop on Timed Petri Nets, Torino, pp. 106–115. IEEE Computer Society (1985)Google Scholar
- [Mol82]Molloy, M.K.: Performance analysis using stochastic Petri nets. IEEE Trans. Comput.
**C–31**(9), 913–917 (1982)CrossRefGoogle Scholar - [Neu10]Neuhäußer, M.R.: Model checking nondeterministic and randomly timed systems. Ph.D. thesis, RWTH Aachen University (2010). http://darwin.bth.rwth-aachen.de/opus3/volltexte/2010/3136/
- [Put94]Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st edn. Wiley, Hoboken (1994)CrossRefGoogle Scholar
- [QK18]Quatmann, T., Katoen, J.-P.: Sound value iteration. In: Chockler, H., Weissenbacher, G. (eds.) CAV 2018. LNCS, vol. 10981, pp. 643–661. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96145-3_37CrossRefGoogle Scholar
- [QWP99]Qiu, Q. Wu, Q., Pedram, M.: Stochastic modeling of a power-managed system: construction and optimization. In: ISLPED, 1999, pp. 194–199. ACM (1999). https://doi.org/10.1145/313817.313923
- [RS13]Rabe, M.N., Schewe, S.: Optimal time-abstract schedulers for CTMDPs and continuous-time Markov games. Theor. Comput. Sci.
**467**, 53–67 (2013). https://doi.org/10.1016/j.tcs.2012.10.001MathSciNetCrossRefzbMATHGoogle Scholar - [SSM18]Salamati, M., Soudjani, S., Majumdar, R.: Approximate time bounded reachability for CTMCs and CTMDPs: a Lyapunov approach. In: McIver, A., Horvath, A. (eds.) QEST 2018. LNCS, vol. 11024, pp. 389–406. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99154-2_24CrossRefGoogle Scholar

## Copyright information

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.