
1 Introduction

Modern technologies grow rapidly in size and complexity, making it hard to ensure their dependability and reliability. Formal approaches to describing these systems include (generalised) stochastic Petri nets [Mol82, MCB84, MBC+98, Bal07], stochastic activity networks [MMS85], dynamic fault trees [BCS10] and others. The semantics of these modelling languages is often defined in terms of continuous time Markov chains (CTMCs). CTMCs can model the behaviour of seemingly independent processes evolving in memoryless continuous time (according to exponential distributions).

Modelling a system as a CTMC, however, strips it of any notion of choice, e. g., which of a number of requests to process first, or how to optimally balance the load over multiple servers of a cluster. Making sure that the system is safe for all possible choices of this kind is an important issue when assessing its reliability. Non-determinism allows the modeller to capture these choices. Modelling systems with non-determinism is possible in formalisms such as interactive Markov chains [Her02], or Markov automata (MA) [EHKZ13]. The latter are one of the most general models for concurrent systems available and can serve as a semantics for generalised stochastic Petri nets and dynamic fault trees.

A similar formalism, continuous time Markov decision processes (CTMDPs) [Ber00, Put94], has seen widespread use in control theory and operations research. In fact, MA and CTMDPs are closely related: They both can model exponential Markovian transitions and non-determinism. However, MA are compositional, while CTMDPs are not: In general it is not possible to model a system as a CTMDP by modelling each of its sub-components as a smaller CTMDP and then combining them. This is why modelling large systems with many communicating sub-components as a CTMDP is cumbersome and error-prone. In fact, most modern model checkers, such as Storm [DJKV17], Modest [HH14] and PRISM [KNP11], do not offer any support for CTMDPs.

In the analysis of MA and CTMDPs, one of the most challenging problems is the approximation of the optimal time-bounded reachability probability, i. e. the maximal (or minimal) probability of a system to reach a set of goal states (e. g. unsafe states) within a given time bound. Due to the presence of non-determinism this value depends on which decisions are taken at which time points. Since the optimal strategy is time-dependent, there are uncountably many different strategies. Classically, one deals with this continuity by discretising the values, as is the case in most algorithms for CTMDPs and MA [Neu10, FRSZ16, HH15, BS11]: The time horizon is discretised into finitely many intervals, and the value within each interval is approximated by, e. g., polynomial or exponential functions.

Fig. 1. Reachability probability for different decisions

Discretisation is closely related to the scheduler that is optimal for a specific MA. As an example, consider Fig. 1: The plot shows the probabilities of reaching a goal state within a certain time bound when choosing option 1 or option 2. If less than 0.9 seconds remain, option 1 has a higher probability of reaching the goal set, while option 2 is preferable as long as more than 0.9 seconds are left. In this example it is enough to discretise the time horizon into roughly two intervals: [0, 0.9] and (0.9, 1.5]. The algorithms known to date, however, use from 200 to \(2\cdot 10^6\) intervals, which is far too many. The solution that we present in this paper discretises the time horizon into only three intervals for this example.
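The crossing of two such reachability curves can be illustrated numerically. The following sketch uses two hypothetical curves (a single fast exponential transition versus a slower two-phase route), not the actual curves of Fig. 1, and locates the time point where the preferable option changes by bisection:

```python
import math

# Hypothetical reachability curves as functions of the remaining time
# (illustrative stand-ins, NOT the model behind Fig. 1).
def option1(t):
    # single fast transition with rate 3
    return 1.0 - math.exp(-3.0 * t)

def option2(t):
    # two-phase route, each phase with rate 4 (Erlang-2 completion probability)
    return 1.0 - math.exp(-4.0 * t) * (1.0 + 4.0 * t)

def switching_point(lo, hi, tol=1e-9):
    """Bisect for the time point where the preferable option changes,
    assuming a single sign change of option2 - option1 within [lo, hi]."""
    diff = lambda t: option2(t) - option1(t)
    assert diff(lo) < 0 < diff(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if diff(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With these rates, option 1 is preferable when little time remains, option 2 when much time remains, and the switch happens at a single time point found by the bisection.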

Our contribution is an algorithm that computes time-bounded reachability probabilities for Markov automata. The algorithm discretises the time horizon by intervals of variable length, making them smaller near those time points where the optimal scheduler switches from one decision to another. We give a characterisation of these time points, as well as tight sufficient conditions for no such time point to exist within an interval. We present an empirical evaluation of the performance of the algorithm and compare it to other algorithms available for Markov automata. The algorithm performs well in the comparison, improving in some cases by several orders of magnitude, but does not strictly outperform available solutions.

2 Preliminaries

Given a finite set S, a probability distribution over S is a function \(\mu :S\rightarrow [0,1]\), s. t. \(\sum _{s\in S}\mu (s) = 1\). We denote the set of all probability distributions over S by \(\mathrm {Dist}(S)\). The sets of rational, real and natural numbers are denoted with \(\mathbb {Q}\), \(\mathbb {R}\) and \(\mathbb {N}\) resp., \(X_{\unrhd 0}=\{x \in X ~|~ x \unrhd 0\}\), for \(X\in \{\mathbb {Q},\mathbb {R}\}, \unrhd \in \{ >, \geqslant \}\), \(\mathbb {N}_{\geqslant 0} = \mathbb {N}\cup \{0\}\).

Definition 1

A Markov automaton (MA)Footnote 1 is a tuple \(\mathcal {M}=(S,\text{ Act },{\overset{}{\mathbf {P}}}, {\overset{}{\mathrm {Q}}}, G)\) where S is a finite set of states partitioned into probabilistic (\( PS \)) and Markovian (\( MS \)), \(G\subseteq S\) is a set of goal states, \(\text{ Act }\) is a finite set of actions, \({\overset{}{\mathbf {P}}}: PS \times \text{ Act }\rightarrow \mathrm {Dist}(S)\) is the probabilistic transition matrix, \({\overset{}{\mathrm {Q}}}: MS \times S \rightarrow \mathbb {Q}\) is the Markovian transition matrix, s. t. \(\overset{}{\mathrm {Q}}(s,s')\geqslant 0\) for \(s\ne s'\), \(\overset{}{\mathrm {Q}}(s,s) = -\sum _{s' \ne s}\overset{}{\mathrm {Q}}(s,s')\).

Fig. 2. An example MA.

Figure 2 shows an example MA. Grey and white colours denote Markovian and probabilistic states respectively. Transitions labelled \(\alpha \) or \(\beta \) are actions of state \(s_1\). Dashed transitions associated with an action represent the distribution assigned to the action. Purely solid transitions are Markovian.

Notation and further definitions: For a Markovian state \(s \in MS \) and \(s' \ne s\), we call \(\overset{}{\mathrm {Q}}(s,s')\) the transition rate from s to \(s'\). The exit rate of a Markovian state s is \(\mathrm {E}(s) = \sum _{s' \ne s}\overset{}{\mathrm {Q}}(s,s') = -\overset{}{\mathrm {Q}}(s,s)\). \({\mathrm {\mathbf {E}}_{\small {\text {max}}}}\) denotes the maximal exit rate among all the Markovian states of \(\mathcal {M}\). For a probabilistic state \(s \in PS \), \(\text{ Act }(s)=\{\alpha \in \text{ Act }\,|\,\exists \mu \in \mathrm {Dist}(S): \mathbf {P}(s,\alpha ) = \mu \}\) denotes the set of actions that are enabled in s. \(\mathbb {P}[s,\alpha ,\cdot ]\in \mathrm {Dist}(S)\) is defined by \(\mathbb {P}[s,\alpha ,s'] = \mu (s')\), where \(\mathbf {P}(s,\alpha ) = \mu \). We impose the usual non-zenoness [GHH+14] restriction on MA. This disallows, e. g., probabilistic states with no outgoing transitions, or with only self-loop transitions.

A (timed) path in \(\mathcal {M}\) is a finite or infinite sequence \(\rho = s_0 \overset{\alpha _0, t_0}{\longrightarrow } s_1 \overset{\alpha _1, t_1}{\longrightarrow } \cdots \overset{\alpha _k, t_k}{\longrightarrow } s_{k+1} \overset{\alpha _{k+1}, t_{k+1}}{\longrightarrow } \cdots ,\) where \(\alpha _i \in \text{ Act }(s_i)\) for \(s_i \in PS \), and \(\alpha _i = \bot \) for \(s_i \in MS \). For a finite path \(\rho = s_0 \overset{\alpha _0, t_0}{\longrightarrow } s_1 \overset{\alpha _1, t_1}{\longrightarrow } \cdots \overset{\alpha _{k-1}, t_{k-1}}{\longrightarrow } s_{k}\) we define \(\rho \mathord \downarrow = s_{k}\). The set of all finite (infinite) paths of \(\mathcal {M}\) is denoted by \( Paths ^*\) (\( Paths \)).

Time passes continuously in Markovian states. The system leaves the state after the amount of time that is governed by an exponential distribution, i. e. the probability of leaving \(s\in MS \) within \(t\ge 0\) time units is given by \(1 - \mathrm {e}^{-\mathrm {E}(s)\cdot t}\), after which the next state \(s'\) is chosen with probability \(\overset{}{\mathrm {Q}}(s,s')/\mathrm {E}(s)\).
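The race in a Markovian state can be sketched as a small simulation (a minimal sketch; the state names and rates below are made up for illustration):

```python
import math
import random

def leave_probability(exit_rate, t):
    """P(leaving a Markovian state with exit rate E(s) within t time units)."""
    return 1.0 - math.exp(-exit_rate * t)

def sample_markovian_step(rates, rng):
    """One step from a Markovian state.  rates maps each successor s' != s to
    Q(s, s'); the sojourn time is exponential with the exit rate E(s), and the
    successor is chosen with probability Q(s, s') / E(s)."""
    exit_rate = sum(rates.values())
    sojourn = rng.expovariate(exit_rate)
    u, acc = rng.random() * exit_rate, 0.0
    for succ, rate in rates.items():
        acc += rate
        if u <= acc:
            return sojourn, succ
    return sojourn, succ  # guard against floating-point round-off
```

For example, a state with rates 1 and 3 towards two successors has exit rate 4, mean sojourn time 1/4, and picks the faster successor with probability 3/4.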

Probabilistic transitions happen instantaneously. Whenever the system is in a probabilistic state s and an action \(\alpha \in \text{ Act }(s)\) is chosen, the successor \(s'\) is selected according to the distribution \(\mathbb {P}[s,\alpha ,\cdot ]\) and the system moves from s to \(s'\) right away. Thus, the residence time in probabilistic states is always 0.

2.1 Time-Bounded Reachability

In this work we are interested in the probability to reach a certain set of states of a Markov automaton within a given time bound. However, due to the presence of multiple actions in probabilistic states, the behaviour of a Markov automaton is not a stochastic process and thus no probability measure can be defined. This issue is resolved by introducing the notion of a scheduler.

A general scheduler (or strategy) \(\pi : Paths ^*\rightarrow \mathrm {Dist}(\text{ Act })\) is a measurable function, s. t. \(\forall \rho \in Paths ^*\) if \(\rho \mathord \downarrow \in PS \) then \(\pi (\rho ) \in \mathrm {Dist}(\text{ Act }(\rho \mathord \downarrow ))\). General schedulers provide a distribution over enabled actions of a probabilistic state, given that the path \(\rho \) has been observed from the beginning of the system evolution. We call a general scheduler \(\pi \) stationary if it can be represented as \(\pi : PS \rightarrow \text{ Act }\), i. e. it is non-randomised and depends only on the current state. The set of all general (stationary) schedulers is denoted by \(\varPi _{\mathtt {gen}}\) (\(\varPi _{\mathtt {stat}}\) resp.).

Given a general scheduler \(\pi \), the behaviour of a Markov automaton is a fully defined stochastic process. For the definition of the probability measure on Markov automata we refer to [Hat17].

Let \(s \in S\), \(T \in \mathbb {Q}_{\geqslant 0}\) be a time bound and \(\pi \in \varPi _{\mathtt {gen}}\) be a general scheduler. The (time-bounded) reachability probability (or value) for a scheduler \(\pi \) and state s in \(\mathcal {M}\) is defined as follows:

where \(\Diamond ^{\leqslant T}_{s} G= \{s \overset{\alpha _0,t_0}{\longrightarrow } s_1 \overset{\alpha _1,t_1}{\longrightarrow } s_2 \ldots \mid \exists i: s_i \in G\wedge \sum _{j=0}^{i-1} t_j \le T\}\) is the set of paths starting from s and reaching \(G\) before T.

For \(\mathop {\mathrm {opt}}\in \{\sup , \inf \}\), the optimal (time-bounded) reachability probability (or value) of state s in \(\mathcal {M}\) is defined as follows:

We denote by () the vector of values () for all \(s \in S\). A general scheduler that achieves optimum for is called optimal, and the one that achieves value \(\mathbf {v}\), s. t. , is \(\varepsilon \)-optimal.

Optimal Schedulers. For the time-bounded reachability problem it is known [RS13] that there exists an optimal scheduler \(\pi \) of the form \(\pi : PS \times \mathbb {R}_{\geqslant 0}\rightarrow \text{ Act }\). This scheduler does not need to know the full history of the system, but only the current probabilistic state it is in and the total time left until the time bound. It is deterministic, i. e. not randomised, and additionally, this scheduler is piecewise constant, meaning that there exists a finite partition \(\mathcal {I}({\pi })\) of the time interval [0, T] into intervals \(I_0=[t_0, t_1], I_1=(t_1, t_2], \cdots , I_{k-1}=(t_{k-1}, t_k]\), such that \(t_0=0, t_k=T\) and the value of the scheduler remains constant throughout each interval of the partition, i. e. \(\forall I \in \mathcal {I}({\pi }), \forall t_1,t_2 \in I, \forall s \in PS {}: \pi (s,t_1) = \pi (s,t_2)\). The value of \(\pi \) on an interval \(I\in \mathcal {I}({\pi })\) and \(s\in PS {}\) is denoted by \(\pi (s, I)\), i. e. \(\pi (s, I) = \pi (s,t)\) for any \(t\in I\).
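A piecewise-constant scheduler of this form admits a direct representation; a minimal sketch (the state and action names in the usage below are hypothetical):

```python
import bisect

class PiecewiseScheduler:
    """Total-time positional, piecewise-constant scheduler: a partition
    0 = t0 < t1 < ... < tk = T into intervals [t0, t1], (t1, t2], ...,
    (t_{k-1}, tk], with one decision map per interval."""

    def __init__(self, cut_points, decisions):
        # cut_points: [t1, ..., tk]; decisions[i]: state -> action on interval i
        assert len(cut_points) == len(decisions)
        self.cuts = cut_points
        self.decisions = decisions

    def __call__(self, state, t):
        # index of the interval containing t: position of the first cut >= t
        i = min(bisect.bisect_left(self.cuts, t), len(self.decisions) - 1)
        return self.decisions[i][state]
```

`bisect_left` maps a cut point itself to the interval it closes, matching the half-open intervals \((t_{i}, t_{i+1}]\) above.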

As an example, consider the MA in Fig. 2 and time bound \(T=1\). Here the optimal scheduler for state \(s_1\) chooses the reliable but slow action \(\beta \) if there is enough time, i. e. if at least 0.62 time units are left. Otherwise the optimal scheduler switches to a more risky, but faster path via action \(\alpha \).

In the literature this subclass of schedulers is sometimes referred to as total-time positional deterministic, piecewise constant schedulers. From now on we call a scheduler from this subclass simply a scheduler (or strategy) and denote the set of such schedulers with \(\varPi \). An important notion of schedulers is the switching point, the point of time separating two intervals of constant decisions:

Definition 2

For a scheduler \(\pi \) and \(s\in PS {}\) we call \(\tau \in \mathbb {R}_{\geqslant 0}\) a switching point, iff \(\exists I_1, I_2\in \mathcal {I}({\pi })\), s. t. \(\tau =\sup {I_1}\) and \(\tau = \inf {I_2}\) and \(\exists s \in PS {}: \pi (s, I_1) \ne \pi (s, I_2)\).

Whether the switching points can be computed exactly or not is an open problem. In fact, the Lindemann–Weierstrass theorem suggests that switching points are non-algebraic numbers, which hints at a negative answer.

3 Related Work

In this section we briefly review the algorithms designed to approximate time-bounded reachability probabilities. We only discuss the algorithms that guarantee to compute an \(\varepsilon \)-close approximation of the reachability value.

The majority of the algorithms [Neu10, BS11, FRSZ16, SSM18, BHHK15] are available for continuous time Markov decision processes (CTMDPs) [Ber00]. Two of those, [Neu10] and [BHHK15], are also applicable to MA. We compare to them in our empirical evaluation in Sect. 5. All the algorithms rely on known techniques such as discretisation, uniformisation, or a combination thereof. The drawback of most of the algorithms is that they do not adapt to a specific instance of a problem. Namely, given a model \(\mathcal {M}\) to analyse, they perform as many computations as is needed for \(\widehat{\mathcal {M}}\), which is the worst-case model in a subclass of models that share certain parameters with \(\mathcal {M}\), such as \({\mathrm {\mathbf {E}}_{\small {\text {max}}}}\), for example. Experimental evaluation performed in [BHHK15] shows that such approaches are not promising, because most of the time the algorithms perform too many unnecessary computations. This is not the case for [BS11] and [BHHK15]. The latter performs the analysis via uniformisation and schedulers that cannot observe time. The former, designed for CTMDPs, performs discretisation of the time horizon with intervals of variable length, but is not applicable to MA. Just like in [BS11], our approach is to adapt the discretisation of the time horizon to a specific instance of the problem.

4 Our Solution

In this section we present a novel approach to approximating optimal time-bounded reachability and the optimal scheduler for an arbitrary Markov automaton. Throughout the section we work with an MA \(\mathcal {M}=(S,{\text {Act}},{\mathbf {P}},{\overset{}{\mathrm {Q}}},G)\), time bound \(T\in \mathbb {Q}_{\geqslant 0}\) and precision \(\varepsilon \in \mathbb {Q}_{>0}\). To simplify the presentation we concentrate on supremum reachability probability.

Given a scheduler, computation (or approximation) of the reachability probability is relatively easy:

Lemma 1

For a scheduler \(\pi \in \varPi \) and a state \(s \in S\), the function is the solution to the following system of equations:


Let \(0=\tau _0< \tau _{1}<\ldots<\tau _{k-1}<\tau _k = T\), where \(\tau _i\) are the switching points of \(\pi \) for \(i=1..k-1\). The solution of the system of Equations (1)–(2) can be obtained separately on each of the intervals \((\tau _{i-1}, \tau _i], \forall i=1..k\), where the value of the scheduler remains constant for all states. Given the solution on interval \((\tau _{i-1}, \tau _{i}]\), we derive the solution for \((\tau _{i}, \tau _{i+1}]\) by using the values as boundary conditions. Later in Sect. 4.1 we will show that the approximation of the solution for each interval \((\tau _{i-1}, \tau _{i}]\) can be achieved via a combination of known techniques, such as uniformisation (for the Markovian states) and untimed reachability analysis (for probabilistic states).

Thus, given an optimal scheduler, Lemma 1 can be used to compute or approximate the optimal reachability value. Finding an optimal scheduler is therefore the challenge for optimal time-bounded reachability analysis. Our solution is based on approximating the optimal reachability value up to an arbitrary \(\varepsilon > 0\) by discretising the time horizon with intervals of variable length. On each interval the value of our \(\varepsilon \)-optimal scheduler remains constant. The discretisation we use attempts to reflect the partition \(\mathcal {I}({\pi })\) of a minimalFootnote 2 optimal scheduler \(\pi \), i. e. it mimics intervals on which \(\pi \) has constant value.

Our solution is presented in Algorithm 1. It computes an \(\varepsilon \)-optimal scheduler \(\pi _\mathtt {opt}\) and approximates the system of Equations (1)–(2) for \(\pi _\mathtt {opt}\). The algorithm iterates over intervals of constant decisions of an \(\varepsilon \)-optimal strategy. At each iteration it computes: (i) a stationary scheduler \(\pi \) that is close to optimal on the current interval (line 7), (ii) the length \(\delta \) of the interval on which \(\pi \) introduces an acceptable error (line 8) and (iii) the reachability values for time \(t+\delta \) (line 9). The following sections discuss the steps of the algorithm in more detail.
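The overall control flow of the iteration can be sketched as the following loop. This is a structural sketch only: the three callbacks are hypothetical stand-ins for the paper's procedures in lines 7–9, not their actual implementations.

```python
def switch_step(T, find_strategy, find_step, advance_values, v0):
    """Iterate over intervals of constant decisions of an eps-optimal strategy:
    (i) pick a near-optimal stationary scheduler, (ii) determine how long it may
    be followed, (iii) advance the reachability values to t + delta."""
    t, values, schedule = 0.0, v0, []
    while t < T:
        pi = find_strategy(values, t)                # step (i),   cf. line 7
        delta = find_step(pi, values, t, T)          # step (ii),  cf. line 8
        values = advance_values(pi, values, delta)   # step (iii), cf. line 9
        schedule.append((t, t + delta, pi))
        t += delta
    return values, schedule
```

With trivial stub callbacks this simply sweeps the horizon in fixed steps; the point of the paper's procedures is to make each `delta` as large as the current strategy allows.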

Theorem 1

Algorithm 1 approximates the value of an arbitrary Markov automaton for time bound \(T \in \mathbb {Q}_{\geqslant 0}\) up to a given \(\varepsilon \in \mathbb {Q}_{> 0}\).


4.1 Computing the Reachability Value

In this section we discuss steps 4 and 9, which require computation of the reachability probability according to the system of Equations (1)–(2). Our approach is based on the approximation of the solution. The presence of two types of states, probabilistic and Markovian, demands separate treatment of those. Informally, we combine two techniques: time-bounded reachability analysis on continuous time Markov chainsFootnote 3 for Markovian states and time-unbounded reachability analysis on discrete time Markov chainsFootnote 4 for probabilistic states. Parameters \(w\) and \(\varepsilon _i\) of Algorithm 1 control the error allowed by the approximation: \(\varepsilon _i\) bounds the error for the very first instance of time-unbounded reachability in line 4, while \(w\) defines the fraction of the error that can be used by the approximations in subsequent iterations.

We start with time-unbounded reachability analysis for probabilistic states. Let \(\pi \in \varPi _{\mathtt {stat}}, s,s' \in S\). We define


This value denotes the probability to reach state \(s'\) starting from state s by performing any number of probabilistic transitions and no Markovian transitions. This system of linear equations can either be solved exactly, e. g. via Gaussian elimination, or approximated by numerical methods. If is under-approximated we denote it by , where \(\epsilon \) is the approximation error. For \(A\subseteq S\) we define , .
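For a fixed stationary scheduler this amounts to unbounded reachability in a discrete-time Markov chain. A self-contained sketch of the exact route via Gaussian elimination (assuming goal states and absorbing fail states are identified up front; the matrix below is a made-up example):

```python
def unbounded_reach(P, goal):
    """Probabilities of reaching `goal` via probabilistic transitions only:
    solve x_s = sum_s' P[s][s'] * x_s' with x_g = 1 for goal states by Gaussian
    elimination.  Absorbing non-goal states are treated as fail states with
    value 0.  P is a row-stochastic matrix given as a list of lists."""
    n = len(P)
    # Build the system (I - P) x = b, with goal/fail rows pinned.
    A = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for s in range(n):
        if s in goal or P[s][s] == 1.0:
            A[s] = [1.0 if j == s else 0.0 for j in range(n)]
            b[s] = 1.0 if s in goal else 0.0
    # Plain Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x
```

In practice one would of course delegate the solve to a linear-algebra library; the point is only that the quantity is the solution of a linear system.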

For time bound 0 and \(s\in PS \), the value is the optimal probability to reach any goal state via only probabilistic transitions. We denote it by (step 4). This is a well-known problem on discrete time Markov decision processes [Put94] and can be computed or approximated by policy iteration, linear programming [Put94] or interval value iteration [HM14, QK18, BKL+17]. If the value is approximated up to \(\epsilon \), we denote it by .

The reachability analysis on Markovian states is solved with the well-known uniformisation approach [Jen53]. Informally, Markovian states will be implicitly uniformised: The exit rate of each Markovian state will be made equal to \({\mathrm {\mathbf {E}}_{\small {\text {max}}}}\) (by adding a self-loop transition), but this will not affect the reachability value.
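For a plain CTMC (no probabilistic states) the uniformisation idea can be sketched directly: make the goal absorbing, uniformise the generator with the maximal exit rate, and sum Poisson-weighted powers of the resulting DTMC. This is a generic sketch of the classical technique, not the paper's implicit uniformisation inside Algorithm 1.

```python
import math

def uniformised_reach(Q, goal, t, n_terms):
    """Time-bounded reachability in a CTMC via uniformisation [Jen53] (sketch).
    Q: generator matrix as a list of lists; goal states are made absorbing;
    the Poisson sum is truncated after n_terms terms (chosen from the desired
    precision).  Returns one value per state."""
    n = len(Q)
    Q = [row[:] if s not in goal else [0.0] * n for s, row in enumerate(Q)]
    rate = max(-Q[s][s] for s in range(n)) or 1.0   # uniformisation rate
    # Uniformised DTMC: P = I + Q / rate (slower states gain self-loops).
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    x = [1.0 if s in goal else 0.0 for s in range(n)]
    out, pois = [0.0] * n, math.exp(-rate * t)
    for k in range(n_terms + 1):
        out = [o + pois * v for o, v in zip(out, x)]
        x = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        pois *= rate * t / (k + 1)                  # psi(k+1) from psi(k)
    return out
```

For a two-state chain with a single rate-2 transition into the goal, the result matches the closed form \(1 - \mathrm{e}^{-2t}\).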

We will first define the discrete probability to reach the target vector within k Markovian transitions. Let \(\mathbf {x}\in [0,1]^{|S|}\) be a vector of values for each state. For \(k \in \mathbb {N}_{\geqslant 0}, \pi \in \varPi _{\mathtt {stat}}\) we define if \(s \in G\) and otherwise:


The value is the weighted sum over all states \(s'\) of the value \(\mathbf {x}_{s'}\) and the probability to reach \(s'\) starting from s within k Markovian transitions. Therefore the counter k decreases only when a Markovian state performs a transition and is not affected by probabilistic transitions. If values are approximated up to precision \(\epsilon \), i. e. is used for probabilistic states instead of in (4), we use the notation .

We denote with the probability mass function of the Poisson distribution with parameter \(\lambda \). For a \(\tau \in \mathbb {R}_{\geqslant 0}\) and , is some natural number satisfying , e. g. [BHHK15], where e is Euler’s number.
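Such a truncation point can also be found by accumulating the Poisson mass directly; a minimal sketch (a simple alternative to the closed-form bound of [BHHK15], suitable for moderate \(\lambda \) where \(\mathrm{e}^{-\lambda }\) does not underflow):

```python
import math

def truncation_point(lam, eps):
    """Smallest N with P(Poisson(lam) > N) <= eps, found by summing the
    probability mass function term by term."""
    pmf = math.exp(-lam)
    cdf, n = pmf, 0
    while 1.0 - cdf > eps:
        n += 1
        pmf *= lam / n          # psi(n) from psi(n-1)
        cdf += pmf
    return n
```

The truncation point grows as the required precision shrinks, so the allowed error budget directly controls the number of matrix-vector products in the uniformisation sum.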

We are now in a position to describe a way to compute at line 9 of Algorithm 1. Let be a vector of values computed by the previous iteration of Algorithm 1 for time t. Let be the solution of the system of Equation (1) for time point \(t+\delta \), a stationary scheduler \(\pi : PS \rightarrow \text{ Act }\) and where is used instead of as the boundary conditionFootnote 5. The following lemma shows that can be efficiently approximated up to :

Lemma 2

Let and \(\delta \in [0, T-t]\). Then , where:


4.2 Choosing a Strategy

The strategy for the next interval is computed in Step 7 and implicitly in Step 4. The latter has been discussed in Sect. 4.1. We proceed to Step 7.

Here we search for a strategy that remains constant for all time points within the interval \((t, t+\delta ]\), for some \(\delta >0\), and introduces only an acceptable error. Analogously to results for continuous time Markov decision processes [Mil68], we prove that the derivatives of the function at time \(\tau =t\) help find the strategy \(\pi \) that remains optimal on the interval \((t, t+\delta ]\), for some \(\delta >0\). This is rooted in the Taylor expansion of the function via the values of . We define sets

$$\begin{aligned} \mathcal {F}_0&= \{\pi \in \varPi _{\mathtt {stat}}\mid \forall s \in PS : \pi =\text {arg max}{}_{\pi ' \in \varPi _{\mathtt {stat}}} \mathbf {d}^{(0)}_{\pi '}(s)\} \\ \mathcal {F}_i&= \{\pi \in \mathcal {F}_{i-1} \mid \forall s \in PS : \pi =\text {arg max}{}_{\pi ' \in \mathcal {F}_{i-1}} (-1)^{i-1} \mathbf {d}^{(i)}_{\pi '}(s)\}, i\geqslant 1, \end{aligned}$$

where for \(\pi \in \varPi _{\mathtt {stat}}\), \(s\in G: \mathbf {d}^{(0)}_{\pi }(s)=1\), for , for and for \(i\geqslant 1\):

The value \(\mathbf {d}^{(i)}_{\pi }(s)\) is the \(i^{\text {th}}\) derivative of at time t for a scheduler \(\pi \).

Lemma 3

If \(\pi \in \mathcal {F}_{|S|+1}\) then \(\exists \delta > 0\) such that \(\pi \) is optimal on \((t, t+\delta ]\).

Thus, in order to compute a stationary strategy that is optimal on the time interval \((t, t+\delta ]\), for some \(\delta >0\), one needs to compute at most \(|S|+1\) derivatives of at time t. Procedure \(\textsc {FindStrategy}\) does exactly that. It computes the sets \(\mathcal {F}_i\) until for some \(j\in 0..(|S|+1)\) there is only one strategy left, i. e. \(|\mathcal {F}_j|=1\). Otherwise it outputs any strategy in \(\mathcal {F}_{|S|+1}\). Similarly to Sect. 4.1, the scheduler that maximises the values can be approximated. This question and other optimisations are discussed in detail in Sect. 4.4.
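The successive filtering by sign-adjusted derivatives can be sketched as follows. Here `derivative(s, a, i)` is a hypothetical stand-in for the actual computation of \(\mathbf {d}^{(i)}\), and `max_order` plays the role of \(|S|+1\):

```python
def find_strategy(actions_per_state, derivative, max_order):
    """Sketch of FindStrategy: starting from all enabled actions, keep for each
    state only the actions maximising the sign-adjusted i-th derivative
    (d^(0) for i = 0, (-1)^(i-1) d^(i) for i >= 1), mirroring the sets
    F_0, F_1, ...  Stops once one action per state remains, or after
    max_order derivative levels."""
    candidates = {s: list(acts) for s, acts in actions_per_state.items()}
    for i in range(max_order + 1):
        if all(len(acts) == 1 for acts in candidates.values()):
            break
        sign = 1.0 if i == 0 else (-1.0) ** (i - 1)
        for s, acts in candidates.items():
            scored = [(sign * derivative(s, a, i), a) for a in acts]
            best = max(v for v, _ in scored)
            candidates[s] = [a for v, a in scored if v == best]
    return {s: acts[0] for s, acts in candidates.items()}
```

If two actions tie in the value itself (order 0), the first derivative breaks the tie, and so on.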

4.3 Finding Switching Points

Given that a strategy \(\pi \) is computed by \(\textsc {FindStrategy}\), we need to know for how long this strategy can be followed before the action has to change for at least one of the states. We consider the behaviour of the system in the time interval [tT]. Recall the function , defined in Sect. 4.1 (Lemma 2) as the solution of the system of Equation (1) with the boundary condition , for a stationary scheduler \(\pi \). For a probabilistic state s the following holds:


Let \(s \in PS , \pi \in \varPi _{\mathtt {stat}}, \alpha \in \text{ Act }(s)\). Consider the following function:

This function denotes the reachability value for time bound \(t+\delta \) and a scheduler that differs from \(\pi \). Namely, it is the scheduler under which all states follow strategy \(\pi \), except for state s, which selects action \(\alpha \) for the very first transition and afterwards selects action \(\pi (s)\). Between two switching points the strategy \(\pi \) is optimal and therefore the value of is not greater than for all \(s\in PS {}, \alpha \in \text{ Act }(s)\). If for some \(\delta \in [0, T-t], s \in PS , \alpha \in \text{ Act }(s)\) it holds that , then action \(\alpha \) is better for s than \(\pi (s)\), and therefore \(\pi (s)\) is not optimal for s at \(t+\delta \). We show that the next switching point after time point t is such a value \(t+\delta ,\delta \in (0, T-t]\), that


Procedure \(\textsc {FindStep}\) approximates switching points iteratively. It splits the time interval [0, T] into subintervals \([t_1,t_2], \ldots , [t_{n-1}, t_{n}]\) and at each iteration k checks whether (7) holds for some \(\delta \in [t_{k}, t_{k+1}]\). The latter is performed by procedure \(\textsc {CheckInterval}\). If (7) holds for no \(\delta \in [t_k, t_{k+1}]\), \(\textsc {FindStep}\) repeats with increased k. Otherwise, it outputs the largest \(\delta \in [t_{k}, t_{k+1}]\) for which (7) does not hold (line 11). This is done by binary search up to distance \(\delta _{\min }\). Later in this section we will show that establishing that (7) does not hold for all \(\delta \in [t_k,t_{k+1}]\) can be performed efficiently by considering only two time points of the interval \([t_k,t_{k+1}]\) and a subset of state-action pairs.


Selecting \(t_k\). This step is a heuristic. The correctness of our algorithm does not depend on the choice of the \(t_k\), but its runtime is supposed to benefit from it: Obviously, the runtime of \(\textsc {FindStrategy}\) would be best given an oracle that produces time points \(t_k\) which are exactly the switching points of the optimal strategy. Any other heuristic is just a guess.

At every iteration k we choose a time point \(t_k\) such that the MA is very likely to perform at most k Markovian transitions within time \(t_k\). “Very likely” here means with probability . For \(k \in \mathbb {N}\) we define as follows: , and for \(k > 1\): .

Searching for switching points within \([t_{k}, t_{k+1}]\). In order to check whether for all \(\delta \in [t_{k}, t_{k+1}]\) we only have to check whether the maximum of function is at most 0 on this interval for all \(s\in PS ,\alpha \in \text{ Act }(s)\). In order to achieve this we work on the approximation of \(\mathbf { diff }(s, \alpha , t+\delta )\) derived from Lemma 2, thus establishing a sufficient condition for the scheduler to remain optimal:


Here () denotes an under-approximation of the value ( resp.) up to , as defined in Lemma 2, and analogously for . Simple rewriting leads to the following:


where . In order to find the supremum of the right-hand side of (9) over all \(\delta \in [a, b]\) we search for the extremum of each \(y_i\) separately, as a function of \(\delta \). Simple derivative analysis shows that the extremum of these functions is achieved at \(\delta =i/{\mathrm {\mathbf {E}}_{\small {\text {max}}}}\). Truncation of the time interval by \((\left\lfloor t_{k} \cdot {\mathrm {\mathbf {E}}_{\small {\text {max}}}}\right\rfloor +1) /{\mathrm {\mathbf {E}}_{\small {\text {max}}}}\) (step 4, Algorithm 2) ensures that for all \(i=0..k\) the extremum of \(y_i(\delta )\) is attained at either \(\delta =t_{k}\) or \(\delta =t_{k+1}\).
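The location of the extremum can be checked numerically on the \(\delta \)-dependent Poisson factor \(\mathrm{e}^{-{\mathrm {\mathbf {E}}_{\text {max}}}\delta }({\mathrm {\mathbf {E}}_{\text {max}}}\delta )^i/i!\) of each term (constants omitted; a sketch consistent with the derivative analysis above, not the full expression (9)):

```python
import math

def poisson_weight(rate, delta, i):
    """psi_{rate*delta}(i) = e^(-rate*delta) (rate*delta)^i / i!, the
    delta-dependent factor of the i-th term (constant coefficients omitted)."""
    lam = rate * delta
    return math.exp(-lam) * lam ** i / math.factorial(i)
```

Evaluating on a grid around \(\delta = i/{\mathrm {\mathbf {E}}_{\text {max}}}\) confirms the maximum sits there, while for \(i = 0\) the factor is monotonically decreasing, so its extremum on an interval lies at the left endpoint.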

Lemma 4

Let \([t_k, t_{k+1}]\) be the interval considered by \(\textsc {CheckInterval}\) at iteration k. \(\forall \delta \in [t_{k}, t_{k+1}], s \in PS , \alpha \in \text{ Act }\):



$$\begin{aligned} \delta (s, \alpha , i)&= {\left\{ \begin{array}{ll} t_{k} &{} \text {if } B^i_{\pi , \varepsilon _{\text {N}}}(s,\alpha ) \geqslant 0 \text { and } i/{\mathrm {\mathbf {E}}_{\small {\text {max}}}}\leqslant t_{k} \\ &{} \text { or } B^i_{\pi , \varepsilon _{\text {N}}}(s,\alpha ) \leqslant 0 \text { and } i/{\mathrm {\mathbf {E}}_{\small {\text {max}}}}> t_{k} \\ t_{k+1} &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

\(\textsc {CheckInterval}\) returns false iff for all \(s \in PS , \alpha \in \text{ Act }\) the right-hand side of (10) is less than or equal to 0. Since Lemma 4 over-approximates \(\mathbf { diff }(s, \alpha , t+\delta )\), false positives are inevitable. Namely, it is possible that procedure \(\textsc {CheckInterval}\) suggests that there exists a switching point within \([t_k, t_{k+1}]\), while in reality there is none. This, however, does not affect the correctness of the algorithm, only its running time.

Finding Maximal Transitions. Here we show that there exists a subset of states, such that, if the optimal strategy for these states does not change on an interval, then the optimal strategy for all states does not change on this interval.

In the following we call a pair \((s,\alpha ) \in PS \times \text{ Act }\) a transition. For transitions \((s, \alpha ), (s', \alpha ') \in PS \times \text{ Act }\) we write \((s,\alpha ) \preceq _{k} (s', \alpha ')\) iff \(C_{\pi , \varepsilon _{\text {N}}}(s,\alpha ) \leqslant C_{\pi , \varepsilon _{\text {N}}}(s',\alpha ')\) and \(\forall i=0..k: B^i_{\pi , \varepsilon _{\text {N}}}(s,\alpha ) \leqslant B^i_{\pi , \varepsilon _{\text {N}}}(s',\alpha ')\). We say that a transition \((s,\alpha )\) is maximal if there exists no other transition \((s',\alpha ')\) that satisfies the following: \((s,\alpha )\preceq _{k} (s',\alpha ')\) and at least one of the following conditions hold: \(C_{\pi , \varepsilon _{\text {N}}}(s,\alpha ) < C_{\pi , \varepsilon _{\text {N}}}(s',\alpha ')\) or \(\exists i=0..k: B^i_{\pi , \varepsilon _{\text {N}}}(s,\alpha ) < B^i_{\pi , \varepsilon _{\text {N}}}(s',\alpha ')\). The set of all maximal transitions is denoted with \(\mathcal {T}_\mathrm {max}(k)\).

We prove that if inequality (10) holds for all transitions from \(\mathcal {T}_\mathrm {max}(k)\), then it holds for all transitions. Thus only transitions from \(\mathcal {T}_\mathrm {max}(k)\) have to be checked by procedure \(\textsc {CheckInterval}\). In our implementation we only compute \(\mathcal {T}_\mathrm {max}(k)\) before the call to \(\textsc {CheckInterval}\) at line 11 of Algorithm 2, and use the set \(A= PS \times \text{ Act }\) within the while-loop.

4.4 Optimisation for Large Models

Here we discuss a number of implementation improvements developers should consider when applying our algorithm to large case studies:

Switching points. It may happen that the optimal strategy switches very often on a time interval, while the effect of these frequent switches is negligible. The difference may be so small that the \(\varepsilon \)-optimal strategy actually stays stationary on this interval. In addition, floating-point computations may lead to imprecise results: Values that are 0 in theory might be represented by non-zero floating-point numbers, making it seem as if the optimal strategy changed its decision, when in fact it did not. To counteract these issues we can modify \(\textsc {CheckInterval}\) such that it outputs false even if the right-hand side of (10) is positive, as long as it is sufficiently small. The following lemma proves that the error introduced by not switching the decision is acceptable:

Lemma 5

Let \(\delta = t_{k+1}-t_{k}\), \(\varepsilon '=\varepsilon -\varepsilon _i, \epsilon \in (0, \varepsilon '\cdot \delta /T)\) and \(N(\delta ,\epsilon )=({\mathrm {\mathbf {E}}_{\small {\text {max}}}}\delta )^2/(2\epsilon )\). If \(\forall s \in PS , \alpha \in \text{ Act }, \tau \in [t_{k}, t_{k+1}]\) the right-hand side of (10) is not greater than \((\varepsilon '\delta /T-\epsilon )/N(\delta , \epsilon )\), then \(\pi \) is \(\varepsilon '\delta /T\)-optimal in \([t_{k}, t_{k+1}]\).

Optimal strategy. In some cases the computation of the optimal strategy in the way described in Sect. 4.2 is computationally expensive, or not possible at all. For example, if some values \(|\mathbf {d}^{(i)}_{\pi }(s)|\) are larger than the maximal floating-point number that a computer can store, or if the computation of \(|S|+1\) derivatives is already prohibitive for models with large state spaces, or if the values can only be approximated and not computed precisely. With the help of Lemma 5 and minor modifications to Algorithm 1, the correctness and convergence of Algorithm 1 can be preserved even when the strategy computed by \(\textsc {FindStrategy}\) is not guaranteed to be optimal.

5 Empirical Evaluation

We implemented our algorithm as part of IMCA [GHKN12]. Experiments were conducted as single-thread processes on an Intel Core i7-4790 with 32 GB of RAM. We compare the algorithm presented in this paper with [Neu10] and [BHHK15]; both are available in IMCA. We use the following abbreviations to refer to the algorithms: FixStep for [Neu10], \(\texttt {Unif}^{+}\) for [BHHK15] and SwitchStep for Algorithm 1. The parameter \(w\) of Algorithm 1 is set to 0.1 and \(\varepsilon _i=0\); we keep the default values of all other algorithms.

The evaluation is performed on a set of published benchmarks:

dpm-j-k: A model of a dynamic power management system [QWP99], representing the internals of a Fujitsu disk drive. The model contains a queue, a service requester, a service provider and a power manager. The requester generates tasks of j types, differing in their energy requirements, which are stored in the queue of size k. The power manager selects the processing mode for the service provider. A state is a goal state if the queue of at least one task type is full.

qs-j-k and ps-j-k: Models of a queuing system [HH12] and a polling system [GHH+13] in which incoming requests of j types are buffered in two queues of size k each until they are processed by the server. The goal state set consists of the states in which both queues are full.

The memory required by all three algorithms is polynomial in the size of the model; for the evaluation we therefore concentrate on runtime only. We set the time limit for the experiments to 15 minutes; timeouts are marked in the plots. Runtimes are given in seconds, and all plots use log-log axes.

Table 1. The discretisation step used in some of the experiments shown in Fig. 3.
Fig. 3. Running time comparison of FixStep and SwitchStep.

5.1 Results

SwitchStep vs FixStep. Figure 3 compares runtimes of SwitchStep and FixStep. For these experiments precision is set to \(10^{-3}\) and the state space size ranges from \(10^2\) to \(10^5\).

This plot represents the general trend observed in many experiments: FixStep does not scale well with the size of the problem (state space, precision, time bound). For larger benchmarks it usually required more than 15 minutes. This is likely because the discretisation step used by FixStep is very small, which means that the algorithm performs many iterations. Table 1 reports the sizes of the discretisation steps of both FixStep and SwitchStep on a few benchmarks. The column \(\delta _{\texttt {F}}\) shows the length of the discretisation step of \(\texttt {FixStep} \); as mentioned in Sect. 3, this step is fixed for the selected values of time bound and precision. Columns \(\min \delta _{\texttt {S}}\), \(\text {avg}\,\delta _{\texttt {S}}\) and \(\max \delta _{\texttt {S}}\) show the minimal, average and maximal steps used by SwitchStep, respectively. The average step used by SwitchStep is several orders of magnitude larger than that of FixStep, and therefore SwitchStep performs far fewer iterations. Even though each iteration takes longer, the significant decrease in the number of iterations leads to a much smaller total runtime.
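The effect of the step size on the iteration count is simple arithmetic: covering a time bound \(T\) with step \(\delta \) takes roughly \(T/\delta \) iterations. A small illustration (the concrete step sizes below are made up and only mimic the orders of magnitude reported in Table 1):

```python
def iterations(T, step):
    """Approximate number of discretisation steps needed to cover
    the time horizon T with a (fixed or average) step size."""
    return round(T / step)

T = 5.0
print(iterations(T, 1e-5))  # fixed step, FixStep-like:       500000
print(iterations(T, 1e-2))  # average step, SwitchStep-like:  500
```

A three-orders-of-magnitude difference in the step size thus translates directly into a three-orders-of-magnitude difference in the number of iterations.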

Table 2. Parameters of the experiments shown in Fig. 4.

SwitchStep vs \(\texttt {Unif}^{+}\). In order to compare SwitchStep with \(\texttt {Unif}^{+}\) we have to restrict ourselves to the subclass of Markov automata in which probabilistic and Markovian states alternate and probabilistic states have only one successor per action, because \(\texttt {Unif}^{+}\) is available in IMCA only for this subclass of models.

Fig. 4. Running times of algorithms SwitchStep and \(\texttt {Unif}^{+}\).

Figure 4 compares the running times of SwitchStep and \(\texttt {Unif}^{+}\). For the plot on the left we varied those model parameters that affect the state space size, the number of non-deterministic actions and the maximal exit rate. In the plot on the right the model parameters are fixed, but the precision and time bounds vary across experiments. Table 2 shows the parameters of the models used in these experiments. We observe that there are cases in which SwitchStep performs remarkably better than \(\texttt {Unif}^{+}\), and cases in which the opposite holds. Consider the experiments in Fig. 4, right: they show that \(\texttt {Unif}^{+}\) may be highly sensitive to variations of time bounds and precision, while SwitchStep is more robust in this respect. The reason is that the scheduler computed by \(\texttt {Unif}^{+}\) has no means to observe time precisely and can only guess it. This may be good enough, as is the case on the \(\texttt {ps}\) benchmark; if it is not, higher precision requires many more computations. Additionally, \(\texttt {Unif}^{+}\) does not use discretisation, which means that increasing the time bound from \(T_1\) to \(T_2\) may significantly increase the overall running time even if no new switching points appear on the interval \([T_1, T_2]\). SwitchStep does not suffer from these issues, because it considers schedulers that observe time precisely and it uses discretisation: large time intervals that introduce no switching points will likely be handled within one iteration.

In general, SwitchStep performs at its best when there are not too many switching points, which is what is observed in most published case studies.

Conclusions: We conclude that SwitchStep does not replace all existing algorithms for time-bounded reachability. However, it does improve the state of the art in many cases and thus occupies its own niche among the available solutions.