1 Introduction

Markov models are widely used in many areas of science and engineering in order to evaluate the probability of certain events of interest. Quantitative analysis of time-bounded properties of Markov models typically proceeds through numerical analysis, via solution of equations yielding the probability of the system residing in a given state at a given time, or via simulation-based exploration of its execution paths. For continuous-time Markov chains (CTMCs), a commonly employed method is uniformisation (also known as Jensen’s method), which is based on the discretisation of the original CTMC and on the numerical computation of transient probabilities (that is, probability distributions over time). This can be combined with graph-theoretic techniques for probabilistic model checking against temporal logic properties [4].

There are many situations where highly accurate probability estimates are necessary, for example for reliability analysis in safety-critical systems or for predictive modelling in scientific experiments, but this is difficult to achieve in practice because of the state-space explosion problem. Imprecise values are known to lead to lack of robustness, in the sense that the satisfaction of temporal logic formulae can be affected by small changes to the formula bound or to the probability distribution of the model. Simulation-based analysis does not suffer from state-space explosion and additionally allows dynamic adaptation of the sampling procedure, as e.g. in importance sampling, to the current values of the transient probability distribution. However, this analysis provides only weak precision guarantees in the form of confidence intervals. In order to enable the handling of larger state spaces, two types of techniques have been introduced: state aggregation and state-space truncation. State aggregation techniques build a reduced state space using lumping [6] or bisimulation quotient [21], and have been proposed both in exact [21] and approximate form [10], with the latter deemed more robust than the exact ones [11]. State-space truncation methods, e.g. fast adaptive uniformisation (FAU) [9, 23], on the other hand, only consider the states whose probability mass is not negligible, neglecting states where the probability is below a given threshold and computing the total probability lost. Unfortunately, though these methods allow the user to specify a desired precision, none provides explicit and general error bounds that can be used to quantify the accuracy of the numerical computation: more precisely, these truncation methods provide a lower bound on the probability distribution over time, and adding the total probability lost to this lower bound yields a (rather conservative) point-wise upper bound, from which a bound on the point-wise approximation error follows.

Key Contributions. We propose a novel adaptive aggregation method for Markov chains that allows us to control its approximation error based on explicitly derived error bounds. The method can be combined with numerical techniques such as uniformisation [9, 23], typically employed in quantitative verification of Markov chains. The method works over a finite time interval by clustering the state space of a Markov chain sequentially in time, where the quality of the current aggregation is quantified by a number of metrics. These metrics, in conjunction with user-specified precision requirements, drive the process by automatically and adaptively reclustering its state space depending on the explicit error bounds. In contrast to related simulation-based approaches in the literature [13, 31] that employ the current probability distribution of the aggregated model to selectively cluster the regions of the state space containing negligible probability mass, our novel use of the derived error bounds allows far greater accuracy and flexibility as it accounts also for the past history of the probability mass within specific clusters.

To the best of our knowledge, despite recent attempts [10, 11] the development and use of explicit bounds on the error associated with a clustering procedure is new for the simulation and analysis of Markov chains. The versatility of the method is further enhanced by employing a variety of different metrics to assess the approximation quality. More specifically, we use the following to control the error: (1) the probability distributions in time (namely, the point-wise difference between concrete and abstract distributions), (2) the time-wise likelihood of events (\(L_1\) norm and total variation distance), as well as (3) the probability of satisfying a temporal logic specification.

We implement our method in conjunction with uniformisation for the computation of probability distributions of the process in time, as well as time-bounded probabilities (a key procedure for probabilistic model checking against temporal logic specifications), and evaluate it on two case studies of chemical reaction networks. Compared to fast adaptive uniformisation as implemented in PRISM [9], currently the best performing technique in this setting, we demonstrate that our method yields a marked improvement in numerical precision without degrading its performance.

Related Work. (Bio-)chemical reaction networks can be naturally analysed using discrete stochastic models. Since the discrete state space of these models can be large or even infinite, a number of numerical approaches have been proposed to alleviate the associated state-space explosion problem. For biochemical models with large populations of chemical components, fluid (mean-field) approximation techniques can be applied [5] and extended to approximate higher-order moments [12]: these deterministic approximations lead to a set of ordinary differential equations. In [16], a hybrid method is proposed that captures the dynamics using a discrete stochastic description in the case of small populations and a moment-based deterministic description for large populations. An alternative approach assumes that the transient probabilities can be compactly approximated based on quasi product forms [3]. None of the mentioned methods provides explicit bounds on the accuracy of the approximation.

A widely studied model reduction method for Markov models is state aggregation based on lumping [6] or (bi-)simulation equivalence [4], with the latter notion in its exact [21] or approximate [10] form. In particular, approximate notions of equivalence have led to new abstraction/refinement techniques for the numerical verification of Markov models over finite [11] as well as uncountably-infinite state spaces [1, 2, 26]. Related to these techniques, [27] presents an algorithm to approximate probability distributions of a Markov process forward in time, which serves as an inspiration for our adaptive scheme. From the perspective of simulations, adaptive aggregations are discussed in [13] but no precision error is given: our work differs by developing an adaptive aggregation scheme, where a formal error analysis steers the adaptation.

An alternative method to deal with large/infinite state spaces is truncation, where a lower bound on the transient probability distribution of the concrete model is computed, and the total probability mass that is lost due to this truncation is quantified. Such methods include finite state projections [24], sliding window abstractions [18], or fast adaptive uniformisation (FAU) [9, 23]. Apart from truncating the state space by neglecting states with insignificant probability mass, FAU dynamically adapts the uniformisation rate, thus significantly reducing the number of uniformisation steps [30]. The efficiency of the truncation techniques depends on the distribution of the significant part of the probability mass over the states, and may result in poor accuracy if this mass is spread out over a large number of states, or whenever the selected window of states does not align with a property of interest.

Summarising, whilst a number of methods have been devised to study or to simulate complex biochemical models, in most cases a rigorous error analysis is missing [13, 22, 31], or the error analysis cannot be effectively used to obtain accurate bounds on the probability distribution or on the likelihood of events of interest [17].

Structure of this Article. Section 2 introduces the sequential aggregation approach to approximate the transient probability distribution (that is, the distribution over time) of discrete-time Markov chains, and quantifies bounds on the introduced error according to three different metrics. Section 3 applies the aggregation method for temporal logic verification of Markov chains. In Sect. 4, we implement adaptive aggregation for continuous-time Markov chain models of chemical reaction networks, in conjunction with known techniques such as uniformisation and threshold truncation. Finally, Sect. 5 discusses experimental results.

2 Computation of the Transient Probability Distribution

We first work with discrete-time labelled Markov chains (LMCs), and in Sect. 4 we show how to apply the obtained results to (labelled) continuous-time Markov chains. Formally, an LMC is defined as a triple \((S, P, L)\), where

  • \(S = \{s_1, \ldots , s_n\}\) is the finite state space of size n;

  • \(P: S \times S \rightarrow [0,1]\) is the transition probability matrix, which is such that \( \forall j \in S: \sum _{i=1}^n P_{ji} = \sum _{i=1}^n P(j,i) = 1\);

  • \(L: S \rightarrow 2^\varSigma \) is a labelling function, where \(\varSigma \) is a finite alphabet built from a set of atomic propositions.

Whenever clear from the context, we refer to the model simply as \((S, P)\). The model is initialised via distribution \(\pi _0: S \rightarrow [0,1], \sum _{s\in S} \pi _0 (s) = 1\), and its transient probability distribution at time step \(k \ge 0\) is

$$\begin{aligned} \pi _{k+1} (s) = \sum _{s' \in S} \pi _k(s') P (s',s), \end{aligned}$$
(1)

or more concisely as \(\pi _{k+1} = \pi _k P\) (where the \(\pi _k\)’s are row vectors). We are interested in providing a compact representation and an efficient computation of the vectors \(\pi _k\).
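As a concrete illustration, the recursion in (1) amounts to a sequence of vector-matrix products; the following minimal Python sketch (the three-state chain is an arbitrary toy example, not one of the case-study models) computes \(\pi_N\):

```python
import numpy as np

# Toy 3-state chain (illustrative only); each row of P sums to 1.
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.7, 0.2],
              [0.0, 0.3, 0.7]])
pi = np.array([1.0, 0.0, 0.0])   # initial distribution pi_0

N = 50
for _ in range(N):
    pi = pi @ P                  # pi_{k+1} = pi_k P, as in Eq. (1)

print(pi)                        # transient distribution pi_N
```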

Sequential Aggregations of the Markov Chain. Consider the finite time interval of interest \([0,1,\ldots , N]\). Divide this interval into a given number q of sub-intervals, namely select \(N_1, N_2, \ldots , N_q: \sum _{i=1}^{q} N_i = N\), and consider the evolution of the model within the corresponding l-th interval \([\sum _{i=0}^{l-1} N_i,\sum _{i=0}^{l} N_i]\), for \(l=1, \ldots , q\), where we have set \(N_0 = 0\).

We assume that a specific state-space aggregation is given, for each of the q sub-intervals of time. Later, in Sect. 4, we show how such aggregations can be obtained adaptively, based on a number of measures (such as the current value of the aggregated transient probability distribution, or the accrued aggregation error in time). In particular, at the l-th step (where \(l = 1, \ldots , q\)), the state space is partitioned (clustered) as \({S = \cup _{i=1}^{m_l} S_i^l}\) (where the number of clusters \(m_l\) is selected so that \(m_l \ll n\)); denote the abstract (aggregated) state space simply as \(S^l\) and its elements (the abstract states) by \(\phi _i, i=1,\ldots , m_l\). Introduce abstraction and refinement maps as \(\alpha ^l: S \rightarrow S^l\) and \(A^l: S^l \rightarrow 2^S\), respectively – the former maps concrete states to abstract ones, whereas the latter relates abstract states to the concrete partitioning sets. For any pair of indices \(i,j = 1, \ldots , m_l\), define the abstract transition matrix as

$$\begin{aligned} P^l (\phi _j,\phi _i) \doteq \frac{1}{\mid A^l(\phi _i) \mid } \sum _{s\in A^l(\phi _i)} \sum _{s' \in A^l(\phi _j)} P(s',s). \end{aligned}$$

The intuition behind the aggregated matrix \(P^l\) is that it captures the average incoming probability from cluster \(S_j^l\) into cluster \(S_i^l\). The shape of this matrix is justified by the structure of the update equation in (1). Given the aggregated Markov chain, we shall work, for all \(s \in S^l\), with the following recursions:

$$\begin{aligned} \pi _{k+1}^l (s) = \sum _{s' \in S^l} \pi _k^l(s') P^l (s',s). \end{aligned}$$
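A minimal sketch of how the aggregated matrix \(P^l\) and the aggregated update can be realised, assuming each cluster \(S_i^l\) is represented by an array of concrete state indices (an illustration of the definitions above, not the prototype implementation):

```python
import numpy as np

def aggregate_matrix(P, clusters):
    """Abstract transition matrix P^l: for each pair (j, i), the average
    incoming probability from cluster S_j^l into cluster S_i^l."""
    m = len(clusters)
    Pl = np.zeros((m, m))
    for i, Ai in enumerate(clusters):
        for j, Aj in enumerate(clusters):
            # sum of P(s', s) over s' in A^l(phi_j) and s in A^l(phi_i),
            # averaged over the cardinality of A^l(phi_i)
            Pl[j, i] = P[np.ix_(Aj, Ai)].sum() / len(Ai)
    return Pl

def abstract_step(pi_abs, Pl):
    """One aggregated update: pi^l_{k+1} = pi^l_k P^l."""
    return pi_abs @ Pl

# Example partition of a 5-state chain into m_l = 2 clusters:
# clusters = [np.array([0, 1, 2]), np.array([3, 4])]
```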

The smaller, aggregated model \((S^l,P^l)\) serves as basis for an approximate computation of the transient probability in time: we now calculate an explicit upper bound on the approximation error. In order to quantify this error, we define a function \({\epsilon ^l: [1,\ldots ,m_l]^2} \rightarrow [0,1]\), as follows:

$$\begin{aligned} \epsilon ^l (j,i) \doteq \max _{s \in S_i^l} \left| \frac{\mid S_i^l \mid }{\mid S_j^l \mid } P(S_j^l,s) - P^l(\phi _j,\phi _i) \right| . \end{aligned}$$
(2)

Intuitively, this quantity accounts for the difference between the average incoming probability between a pair \((j,i)\) of partitioning sets, and the worst-case (rescaled) point-wise incoming probability between those two sets, where \(P(S_j^l,s) \doteq \sum _{s' \in S_j^l} P(s',s)\). Introduce the terms \(\epsilon ^l (j) := \sum _{i=1}^{m_l} \epsilon ^l (j,i)\).
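The quantities \(\epsilon^l(j,i)\) and \(\epsilon^l(j)\) can be computed directly from \(P\), \(P^l\) and the partition; a sketch under the same cluster representation as above:

```python
import numpy as np

def epsilon(P, Pl, clusters):
    """Returns the matrix eps[j, i] = epsilon^l(j, i) of Eq. (2) and the
    row sums epsilon^l(j) = sum_i epsilon^l(j, i)."""
    m = len(clusters)
    eps = np.zeros((m, m))
    for i, Ai in enumerate(clusters):
        for j, Aj in enumerate(clusters):
            # P(S_j^l, s) = sum over s' in S_j^l of P(s', s), for each s in S_i^l
            incoming = P[np.ix_(Aj, Ai)].sum(axis=0)
            scale = len(Ai) / len(Aj)
            eps[j, i] = np.abs(scale * incoming - Pl[j, i]).max()
    return eps, eps.sum(axis=1)
```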

Finally, define, for all \(s \in S\), \(\tilde{\pi }_{k}^l (s) = \pi _k^l (\alpha ^l(s))/\mid A^l(\alpha ^l(s))\mid \) as a (normalised) piecewise constant approximation of the abstract functions \(\pi _k^l\). Functions \(\tilde{\pi }_{k}^l\), being defined over the concrete state space S, will be employed for comparison with the original distribution functions \(\pi _k\). Specifically, for the initial interval \([N_0,N_1]\) (with \(l=1\)), approximate the initial distribution \(\pi _0\) by \(\pi _0^1\) as: \(\forall s \in S^1, \pi _0^1(s) = \sum _{s'\in A^1(s)} \pi _0(s')\). Similarly, we have that \(\forall s \in S, \tilde{\pi }_0^1 (s) = \pi _0^1(\alpha ^1(s))/\mid A^1(\alpha ^1(s))\mid \).
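The initial aggregation and the normalised piecewise-constant refinement \(\tilde{\pi}\) can be sketched as follows (again assuming clusters are given as index arrays; this is purely illustrative):

```python
import numpy as np

def aggregate_distribution(pi, clusters):
    """pi^l(phi_i) = sum of the concrete distribution over cluster i."""
    return np.array([pi[Ai].sum() for Ai in clusters])

def lift(pi_abs, clusters, n):
    """Normalised piecewise-constant refinement tilde-pi: each cluster's
    aggregated mass is spread uniformly over its concrete states."""
    pi_tilde = np.zeros(n)
    for i, Ai in enumerate(clusters):
        pi_tilde[Ai] = pi_abs[i] / len(Ai)
    return pi_tilde
```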

Remark 1

Exact and approximate probabilistic bisimulations [10, 21] build a quotient or a cover of the state space of the original model based on matching or approximating the “outgoing probability” from concrete states – for example, exact probabilistic bisimulation compares, for state pairs \((s_1, s_2)\) within a partition, the “outgoing” probabilities \(P(s_1,B)\) and \(P(s_2,B)\) over partitions B. On the other hand, in (2) we approximate the “incoming probability”, as motivated by the approximation of the recursions in (1).    \(\Box \)

Explicit Error Bounds for the Quality of the Sequential Aggregations. Let us consider the aggregated model \((S^1,P^1)\) (for \(l=1\)) and, given the aggregated vector \(\pi _0^1\), the time-wise updates \(\pi _{k+1}^1 = \pi _k^1 P^1, k = N_0, \ldots , N_1-1\). Introduce the interpolated vectors \(\tilde{\pi }_{k+1}^1 (s), s\in S\), defined as \(\tilde{\pi }_{k+1}^1 (s) = \pi _{k+1}^1 (\alpha ^1(s)){/} \mid A^1 (\alpha ^1 (s)) \mid \). We are interested in a bound on the point-wise error defined over the concrete state space, namely \(\forall s \in S, k = N_0,\ldots , N_1\), \(\left| \pi _{k} (s) - \tilde{\pi }_{k}^1 (s) \right| \), or equivalently a bound for \(\left| \pi _{k} (s) - \frac{\pi _{k}^1 (\alpha ^1(s))}{\mid A^1 (\alpha ^1 (s)) \mid } \right| \). Such a point-wise bound directly allows for expressing a global error for the infinity norm of the difference between the two distribution vectors, namely

$$\begin{aligned} \left\| \pi _{k} - \tilde{\pi }_{k}^1 \right\| _{\infty } = \max _{s \in S} \left| \pi _{k} (s) - \tilde{\pi }_{k}^1 (s) \right| . \end{aligned}$$

Beyond the first aggregation \((l=1)\), the next statement explicitly characterises such a bound over the entire sequence of q re-aggregations and the time interval \([0, 1, \ldots , N]\).

Proposition 1

Consider a sequential q-step aggregation strategy, characterised by times \(N_l: \sum _{l=1}^{q} N_l = N\), partitions \(S = \cup _{i=1}^{m_l} S_i^l\), and matrices \(P^l\). We obtain

$$\begin{aligned}&\quad \left| \pi _{N} (s) - \tilde{\pi }_{N}^q (s) \right| \le c(s)^{N} \left| \pi _{0} (s) - \tilde{\pi }_{0}^1 (s) \right| \\&\qquad \quad + \sum _{l=1}^{q} c(s)^{N-\sum _{i=0}^{l} N_i} \left\{ \frac{1}{\mid A^l(\alpha ^l(s)) \mid }\sum _{j=1}^{m_l} \epsilon ^l(j, \alpha ^l(s)) \sum _{k=0}^{N_l-1} \pi _{\sum _{i=0}^{l-1}N_{i}+k}^l (j) + \gamma _{l-1}^{l} (s) \right\} , \end{aligned}$$

where we have set \(c(s) = P(S,s)\), and \(\gamma _{l-1}^{l}(s) = \left| \tilde{\pi }_{\sum _{i=0}^{l-1} N_i}^{l-1} (s) - \tilde{\pi }_{\sum _{i=0}^{l-1} N_i}^{l} (s) \right| \) for \({l=1,\ldots ,q}\), with \(\gamma _{0}^{1} (s) = 0\), \(\forall s \in S\).
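For a single aggregation interval (\(q = 1\), so \(N_1 = N\) and \(\gamma_0^1 = 0\)), the bound of Proposition 1 can be evaluated directly. The following sketch reuses the helpers aggregate_matrix, epsilon, aggregate_distribution and lift from the sketches above and is purely illustrative:

```python
import numpy as np

def pointwise_bound_q1(P, Pl, eps, clusters, pi0, N):
    """Point-wise bound of Proposition 1 for q = 1: for every concrete s,
    c(s)^N |pi_0(s) - tilde-pi_0(s)|
      + (1/|A(alpha(s))|) * sum_j eps(j, alpha(s)) * sum_{k<N} pi_k^1(j)."""
    n = P.shape[0]
    c = P.sum(axis=0)                      # c(s) = P(S, s): total incoming probability
    alpha = np.empty(n, dtype=int)         # abstraction map alpha^1
    for i, Ai in enumerate(clusters):
        alpha[Ai] = i
    sizes = np.array([len(Ai) for Ai in clusters])

    pi0_abs = aggregate_distribution(pi0, clusters)
    init_err = np.abs(pi0 - lift(pi0_abs, clusters, n))

    hist = np.zeros(len(clusters))         # sum_{k=0}^{N-1} pi_k^1(j)
    pi_abs = pi0_abs.copy()
    for _ in range(N):
        hist += pi_abs
        pi_abs = pi_abs @ Pl

    accr = eps.T @ hist                    # accr[i] = sum_j eps[j, i] * hist[j]
    return c**N * init_err + accr[alpha] / sizes[alpha]
```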

Remark 2

A few comments on the structure of the error bounds are in order. The overall error is composed of two main contributions, one depending on the error accrued within single aggregation steps, and the other (\(\gamma _{l-1}^{l}(s)\)) depending on the q re-aggregations (that is, an update from the current partition to the next).

The first term of the first contribution further depends on the point-wise error in the distributions initialised at each aggregation, namely, \(\left| \pi _{\sum _{i=0}^{l} N_i} (s) -\right. \) \(\left. \tilde{\pi }_{\sum _{i=0}^{l} N_i}^l (s) \right| \): this quantity, discounted by the \(N_l\)-th power of the factor c(s) (accounting for contractive or expansive dynamics), builds up recursively to yield the global (over the q aggregation steps) quantity \(c(s)^{N} \left| \pi _{0} (s) - \tilde{\pi }_{0}^1 (s) \right| \). The second term of the first contribution, on the other hand, accounts for the error due to the approximation of the transition probability matrix (terms \(\epsilon ^l\)), averaged over the accrued running distribution functions (factors \(\pi ^l\)).

The intuition on factor c(s) is the following: if the model is “contractive” (in a certain probabilistic sense) towards a point s, the factor c(s) is likely to be greater than one; on the other hand, if the distribution in time is “dispersed,” then it is likely that \(c(s)<1\) over a large subset of the state space. The quantity \(c(s) = P(S,s)\) might be decreased if we work on a subset of S: this might happen with a discrete-time chain obtained from a corresponding continuous-time model via FAU [9, 23], or through the interaction of the factor \(c(s), s\in S\), with atomic propositions defined specifically over subsets of the state space S.    \(\Box \)

Corollary 1

Consider the same setup as in Proposition 1. A bound for the quantity \(\left\| \pi _{N} - \tilde{\pi }_{N}^q \right\| _{\infty }\) can be obtained from that in Proposition 1 by straightforward adaptation and setting \(c = \max _{s \in S} c(s)\), and \(\gamma _{l-1}^{l} = \max _{s \in S} \gamma _{l-1}^{l}(s), l=1,\ldots ,q\).

In addition to point-wise errors, we seek a bound for the following global error,

$$\begin{aligned} \left\| \pi _{k} - \tilde{\pi }_{k}^1 \right\| _1 = \sum _{s\in S} \left| \pi _{k} (s) - \tilde{\pi }_{k}^1 (s) \right| , \quad \forall k = 0, \ldots , N_1, \end{aligned}$$

and its further extension to successive aggregations and time steps \(k = N_1+1, \ldots , N\). This \(L_1\)-norm measure is related to the “total variation distance” over events in the \(\sigma \)-algebra \(2^S\) at each time step k. This measure is commonly used in related literature [8, 29], and refers to differences in probability of events defined over sets in S at a specific point in time k. The corresponding error bound is explicitly quantified as follows.

Proposition 2

Consider a q-step sequential aggregation strategy characterised by the times \(N_l: \sum _{l=1}^{q} N_l = N\), partitions \(S = \cup _{i=1}^{m_l} S_i^l\), and matrices \(P^l\). We obtain

$$\begin{aligned} \left\| \pi _{N} - \tilde{\pi }_{N}^q \right\| _1 \le \left\| \pi _{0} - \tilde{\pi }_{0}^1 \right\| _1 + \sum _{l=1}^{q} \left\{ \sum _{j=1}^{m_l} \epsilon ^l(j) \sum _{k=0}^{N_l-1} \pi _{\sum _{i=0}^{l-1}N_{i}+k}^l (j) + {\varGamma }_{l-1}^{l} \right\} , \end{aligned}$$

where for \(l=1,\ldots ,q\), \( {\varGamma }_{l-1}^{l} = \left\| \tilde{\pi }_{\sum _{i=0}^{l-1}N_i}^{l-1} - \tilde{\pi }_{\sum _{i=0}^{l-1}N_i}^{l} \right\| _1, \) and where we have set \({\varGamma }_{0}^{1} = 0\).

3 Aggregations for Model Checking of Time-Bounded Specifications

In Sect. 2, we have introduced a sequential aggregation procedure to approximate the computation of the transient probability distribution of a Markov chain. The derived bounds allow for a comparison of aggregated and concrete models either point-wise, or according to a global measure of the differences in the probability of events over the state space, at a specific point in time. We now show how to employ the aggregation method for quantitative verification against probabilistic temporal logics such as PCTL. We focus on a bounded variant of the probabilistic safety (invariance) property, which corresponds to time-bounded invariance for continuous-time Markov chains.

Consider the LMC \((S, P, L)\). We focus on properties expressed over the atomic propositions AP, namely the set of finite strings over the labels \(2^{AP}\), and on how to approximately compute the likelihood associated to such strings. In particular, consider a step-bounded safety formula [4], namely \(\mathbb P_{=?} \left( G^{\le N} {\varPhi }\right) \), where \(N \in \mathbb N\), \({{\varPhi }\in 2^\varSigma }\), and Sat\(({\varPhi }) \subseteq S\). This specification expresses the likelihood that a trajectory, initialised according to a distribution (say, \(\pi _0\)) over the state space S, resides within the set Sat\(({\varPhi })\) over the time interval \([0, 1, \ldots , N]\). The specification of interest can be characterised as follows: for any \(s \in S, k = 0,1, \ldots , N-1\),

$$\begin{aligned} V_0 (s) = 1_{{\varPhi }}(s) \pi _0 (s), \qquad V_{k+1} (s) = 1_{{\varPhi }}(s) \sum _{s' \in S} V_k(s') P (s',s), \end{aligned}$$

so that \(\mathbb P_{=?} \left( G^{\le N} {\varPhi }\right) = \sum _{s \in S} V_N (s)\). It is well known that the computed quantity depends on the choice of the initial distribution \(\pi _0\) (which can in particular be a point mass for a distinguished initial state). As should be clear from the recursion above (use of indicator functions \(1_{{\varPhi }}\)), it is sufficient to restrict the recursive updates to within the set of points labelled by \({\varPhi }\).
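The recursion for the bounded invariance probability amounts to a masked vector-matrix iteration; a minimal sketch over the concrete model, where phi_mask marks the states in Sat\(({\varPhi })\):

```python
import numpy as np

def bounded_invariance(P, pi0, phi_mask, N):
    """P_{=?}(G^{<=N} Phi) via the masked recursion above; phi_mask is a
    boolean vector that is True exactly on Sat(Phi)."""
    V = phi_mask * pi0                 # V_0(s) = 1_Phi(s) * pi_0(s)
    for _ in range(N):
        V = phi_mask * (V @ P)         # V_{k+1}(s) = 1_Phi(s) * sum_{s'} V_k(s') P(s', s)
    return V.sum()
```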

As before, consider the global finite interval \([0,1,\ldots , N]\), and divide it via intervals of duration \(N_1, N_2, \ldots , N_q: \sum _{i=1}^{q} N_i = N\). Specifically, for the initial interval \([N_0,N_1]\) (corresponding to index \(l=1\)), partition set \({\varPhi }\) as \({\varPhi }= \cup _{i=1}^{m_1} {\varPhi }_i^1\) – notice that the partition does not cross the boundaries of the set \({\varPhi }\). Thus \(S^1 = {\varPhi }^1 \cup \{a_1\} = \{1, \ldots , m_1, a_1\}\), where \(a_1\) is associated with the complement set \(S{\setminus }{\varPhi }\). Introduce abstraction and refinement maps as \(\alpha ^1: S \rightarrow S^1\) and \(A^1: S^1 \rightarrow 2^S\), the abstract transition matrix \(P^1\), and function \(\epsilon ^1: [1,\ldots ,m_1]^2 \rightarrow [0,1]\) as

$$\begin{aligned} \epsilon ^1 (j,i) \doteq \max _{s \in {\varPhi }_i^1} \left| \frac{\mid {\varPhi }_i^1 \mid }{\mid {\varPhi }_j^1 \mid } P({\varPhi }_j^1,s) - P^1(\phi _j,\phi _i) \right| . \end{aligned}$$

Further, approximate \(\pi _0\) as: \(\forall s\in S^1, \pi _0^1(s) = \sum _{s'\in A^1(s)} \pi _0(s')\). Introduce, \(\forall s \in S^1\), cost functions \(V_k^1\) via the following recursions:

$$\begin{aligned} V_0^1 (s) = 1_{{\varPhi }^1}(s) \pi _0^1 (s), \qquad V_{k+1}^1 (s) = 1_{{\varPhi }^1}(s) \sum _{s' \in S^1} V_k^1(s') P^1 (s',s), \end{aligned}$$

and, \(\forall s \in S\), \(\tilde{V}_{k}^1 (s) = V_k^1(\alpha ^1(s))/|A^1(\alpha ^1(s))|\), as a (normalised) piecewise constant approximation of the abstract functions \(V_k^1\), and in particular initialised as \(\tilde{\pi }_0^1 (s) = \pi _0^1(\alpha ^1(s))/|A^1(\alpha ^1(s))|\). We shall derive explicit bounds on the computation of the error:

$$\begin{aligned} \left| \sum _{s \in {\varPhi }} V_{N_1} (s) - \sum _{s \in {\varPhi }} \tilde{V}_{N_1}^1 (s) \right| = \left| \sum _{i=1}^{m_1} \sum _{s \in {\varPhi }_i} \left( V_{N_1} (s) - \tilde{V}_{N_1}^1 (s) \right) \right| , \end{aligned}$$

and extend them over successive aggregation and time steps \(k = N_1+1, \ldots , N\). Notice that, in this instance, we are comparing two scalars, comprising the likelihoods associated with the specification of interest computed over the concrete and abstract models, respectively. More precisely, in general we have:

$$\begin{aligned} \left| \sum _{s \in {\varPhi }_i} V_{N_1} (s) - \sum _{s \in {\varPhi }_i} \tilde{V}_{N_1}^1 (s) \right| \! = \! \left| \sum _{s \in {\varPhi }_i} V_{N_1} (s) - \sum _{s \in {\varPhi }_i} \frac{V_{N_1}^1 (\alpha ^1(s))}{|A^1(\alpha ^1(s))|} \right| \! = \! \left| \sum _{s \in {\varPhi }_i} V_{N_1} (s) - V_{N_1}^1 (i) \right| . \end{aligned}$$

Proposition 3

Consider a q-step sequential aggregation strategy characterised by corresponding times \(N_l: \sum _{l=1}^{q} N_l = N\), partitions \({\varPhi }= \cup _{i=1}^{m_l} {\varPhi }_i^l\), and matrices \(P^l\). We obtain:

$$\begin{aligned} \left| \sum _{s \in {\varPhi }} V_{N} (s) - \sum _{s \in {\varPhi }} \tilde{V}_{N}^q (s) \right| \le \sum _{l=1}^{q} \sum _{i=1}^{m_l} \epsilon ^l (i) \sum _{k=0}^{N_l-1} V_{\sum _{i=0}^{l-1}N_i +k}^l (i). \end{aligned}$$

Remark 3

We give some intuition regarding the structure of the bounds. The quantity depends on a summation over q aggregation steps. It expresses the accrual of the error incurred over the outgoing probability from the i-th partition (term \(\epsilon ^l (i) \)), averaged over the history of the cost function over that partition. Note the symmetry between the shape of the bound and the recursive definition of the quantities of interest.    \(\Box \)

4 Quantitative Analysis of Chemical Reaction Networks

A chemical reaction network describes a biochemical system containing M chemical species participating in a number of chemical reactions. The state of a model of the system at time \(t \in \mathbb R^+\) is the vector \(\mathbf {X}(t) = ( X_1(t), X_2(t), \ldots , X_{M}(t) )\), where \(X_i\) denotes the number of molecules of the i-th species [15]. Whenever a single reaction occurs, the state changes based on the stoichiometry of the corresponding reaction. We use S to denote the set of (discrete) states. Further, for \(s \in S\), \(\pi _t(s)\) denotes the probability \(\mathbb P (\mathbf {X}(t) = s)\). Assuming fixed volume and temperature, the model can be interpreted as a continuous-time Markov chain (CTMC) \(C = (S, R)\), where the rate matrix entry \(R(s,s')\) gives the rate of a transition from state s to state \(s'\), and \(\pi _0\) specifies the initial distribution over S. The time evolution of the model is governed by the Chemical Master Equation (CME) [15], namely \(\frac{d}{dt}\pi _t = \pi _t \cdot Q\), where Q is the infinitesimal generator matrix, defined as \(Q(s,s') = R(s,s')\) if \(s \ne s'\), and as \(-\sum _{s'' \ne s}{R(s,s'')} \) otherwise. The exact solution of the CME is in general intractable, which has led to a number of possible numerical approximations [25]. We employ uniformisation [30], which in many cases outperforms other methods and also provides an arbitrary, user-defined approximation precision.

Uniformisation is based on a time-discretisation of the CTMC. The distribution \(\pi _t\) is obtained as a sum (over index i) of terms giving the probability that i discrete reaction steps occur up to time t: this probability is the Poisson term \(\gamma _{i,\lambda \cdot t} = e^{-\lambda \cdot t} \cdot \frac{\left( \lambda \cdot t\right) ^i}{i!}\), where the delay between discrete steps is exponentially distributed with rate \(\lambda \). More formally, \(\pi _t = \sum ^{\infty }_{i=0}{\gamma _{i,\lambda \cdot t} \cdot \pi _0 \cdot \tilde{Q}^i} \approx \sum ^{N}_{i=0}{\gamma _{i,\lambda \cdot t} \cdot \pi _0 \cdot \tilde{Q}^i}\), where \(\tilde{Q}\) is the uniformised transition matrix, whose off-diagonal entries are \(\frac{R(s,s')}{\lambda }\), and where the uniformisation constant \(\lambda \) is equal to the maximal exit rate \(\max _{s \in S} \sum _{s'' \ne s}{R(s,s'')} \). Although the sum is in general infinite, for a given precision an upper bound N can be estimated using techniques in [14], which also allow for efficient computation of the Poisson probabilities \(\gamma _{i,\lambda \cdot t}\).
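A minimal sketch of standard uniformisation is given below. It assumes the rate matrix R has a zero diagonal, and truncates the Poisson series once the accumulated mass exceeds \(1-\texttt{tol}\); a careful implementation would instead rely on the techniques of [14], since \(e^{-\lambda t}\) underflows for large \(\lambda t\):

```python
import numpy as np

def uniformise(R, pi0, t, tol=1e-10, max_steps=100000):
    """Sketch of standard uniformisation for a finite CTMC.
    R: rate matrix with zero diagonal; pi0: initial distribution (floats).
    Returns the (sub-stochastic) approximation of pi_t and the truncated
    Poisson mass, which bounds the truncation error."""
    exit_rates = R.sum(axis=1)
    lam = exit_rates.max()                        # uniformisation constant
    Qu = R / lam                                  # off-diagonal entries R(s,s')/lambda
    np.fill_diagonal(Qu, 1.0 - exit_rates / lam)  # row-stochastic uniformised matrix

    pi = np.zeros_like(pi0)
    term = pi0.copy()                             # pi_0 * Qu^i, built iteratively
    weight = np.exp(-lam * t)                     # gamma_{0, lam*t}
    acc, i = weight, 0
    pi += weight * term
    while acc < 1.0 - tol and i < max_steps:
        i += 1
        term = term @ Qu
        weight *= lam * t / i                     # gamma_{i, lam*t} from gamma_{i-1, lam*t}
        pi += weight * term
        acc += weight
    return pi, 1.0 - acc
```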

For complex models with very large or possibly infinite state spaces, the above numerical approximations are computationally infeasible, and are typically combined with (dynamical) state-space truncation methods, such as finite state projection [24], sliding window abstraction [18], or fast adaptive uniformisation [9, 23] (FAU). The key idea of these truncation methods is to restrict the analysis of the model to a subset of states containing significant probability mass. One can easily compute the probability lost at each uniformisation step and thus obtain the total probability lost by truncation. As such, these truncation methods provide a lower bound on the quantities \(\pi _t\), and the quantified probability lost can be used to derive a (rather conservative) upper bound on the approximation error: adding the probability lost to the lower bound gives an upper bound on the point-wise probabilities, from which a bound on the point-wise error follows. Moreover, a (pessimistic) bound on the \(L_1\)-norm over a general subset of the state space is obtained by multiplying the probability lost by the number of states in the concrete subset.

Adaptive Aggregation for CTMC Models of Chemical Reaction Networks. The aggregation methods in the previous sections can be directly applied to uniformised CTMCs, such as those arising from chemical reaction networks. We now discuss how the aggregation unfolds sequentially in time and how the derived error bounds can be used for the aggregation method in this setting.

Recall from Eq. (2) that the derivation of the error bounds for the aggregation procedure requires a finite state space: for infinite-state CTMCs, the aggregation method can be combined with state-space truncation (alongside time uniformisation), in order to accelerate computations in cases where the set of significant states is still too large. On the other hand, for finite-state CTMC models, adaptive aggregations can be regarded as an orthogonal strategy to truncation, and can be directly applied in conjunction with time uniformisation. In order to compare the precision and reduction capability of our method to that of FAU, we thus assume that the population of each species is bounded, which ensures fairness of experimental evaluation.

The key ingredient of the proposed aggregation method is a partitioning strategy that controls and adapts the clustering of the state space over the given finite time interval. Algorithm 1 summarises the scheme for transient probability calculation (the adaptive aggregation for an invariant property as in Sect. 3 unfolds similarly). The procedure starts with a given partition \(S^1\) of the state space S (obtained by the procedure \(\mathsf {initAggregation}\) on line 2). It dynamically (and automatically) updates the current partition when needed, thus providing new abstract state spaces \(S^l\) over the l-th time interval \([\sum _{i=0}^{l-1} N_i, \sum _{i=0}^{l} N_i]\), where \(l=2, \ldots , q\) and \(q \ll N\). The update of the current l-th clustering is performed after \(N_l\) steps, that is, whenever the error accrual exceeds a threshold ensuring the user-defined precision \(\theta \) (function \(\mathsf {checkAggregation}\) on line 6). At the same time, the aggregation strategy aims to minimise the average size of the abstract state space, defined as \(avg = \sum _{l=1}^{q}{N_l \cdot |S^l|} / N\). We consider two adaptive strategies, one time-local and the other history-dependent, both of which are driven by the shape of the derived error bounds – in particular, the history-dependent strategy exactly employs the calculated error bounds. Both strategies are parametrised by thresholds, which ensure the required overall precision \(\theta \) and account for the size of the concrete state space as well as the number of uniformisation steps N.

[Algorithm 1: adaptive aggregation scheme for the computation of the transient probability distribution.]
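Since the pseudocode of Algorithm 1 is not reproduced here, the following Python-style sketch captures its control flow for the transient-probability case; \(\mathsf{initAggregation}\), \(\mathsf{checkAggregation}\) and \(\mathsf{Recluster}\) are the procedures named in the text (their internals, in particular the threshold rules and the clustering of neighbouring states, are left abstract), and the remaining helpers are those from the sketches in Sect. 2:

```python
import numpy as np

def adaptive_transient(P, pi0, N, theta):
    """Control-flow sketch of the adaptive aggregation scheme (Algorithm 1);
    initAggregation, checkAggregation and Recluster are left abstract."""
    clusters = initAggregation(P, pi0, theta)        # initial partition S^1
    Pl = aggregate_matrix(P, clusters)
    _, eps_row = epsilon(P, Pl, clusters)
    pi_abs = aggregate_distribution(pi0, clusters)
    AccumErrors = np.zeros(len(clusters))            # per-cluster error accrual

    for k in range(N):
        AccumErrors += eps_row * pi_abs              # error terms from the bounds of Sect. 2
        if checkAggregation(AccumErrors, k, theta):  # accrual exceeds the threshold
            pi_full = lift(pi_abs, clusters, P.shape[0])
            clusters = Recluster(AccumErrors, eps_row, pi_abs, clusters)
            Pl = aggregate_matrix(P, clusters)
            _, eps_row = epsilon(P, Pl, clusters)
            pi_abs = aggregate_distribution(pi_full, clusters)
            AccumErrors = np.zeros(len(clusters))
        pi_abs = pi_abs @ Pl                         # one aggregated (uniformisation) step
    return lift(pi_abs, clusters, P.shape[0])
```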

The history-dependent strategy is based on the available history contributing to the shape of the derived errors: for the l-th aggregation step and the given i-th cluster of the current partition, it tracks the sum of the errors accumulated in the interval \([\sum _{i=0}^{l-1} N_i, \sum _{i=0}^{l-1} N_i + k]\) for \( k=1,\ldots , N_l\), according to the explicit bounds derived in Sect. 2 (line 4 of Algorithm 1). At each step k, the obtained value (averaged over the k steps) reflects the (averaged) error accrual for each cluster (array AccumErrors) and is used to drive the partitioning procedure.

The function \(\mathsf {checkAggregation}\) determines (using AccumErrors) if the current clustering meets the desired threshold, or if a refinement is desirable: during re-clustering, a locally coarser abstraction may also be suggested, by merging clusters. The function \(\mathsf {Recluster}\) provides the new clustering based on the error bounds, which are functions of AccumErrors, of the local contributions \(\epsilon ^l\), and of the (history of the) distribution \(\pi _k^l\) (or of the cost \(V_k^l\) in the case of safety verification). In contrast to the adaptive method presented in [13], which is based exclusively on local heuristics, our strategy closely reflects the shape of the derived, history-dependent error bounds. Note that the aggregation strategy applied to chemical reaction networks aligns well with the known structure of the underlying CTMCs. In particular, the state-space clustering exploits the spatial locality of the distribution of transitions in the M-dimensional space [13, 31] (M is the number of chemical species), which usually leads to relatively uniform probability mass over adjacent states and thus to strategies that cluster neighbouring states.

A simpler re-clustering strategy (denoted in the experiments as local) employs at each uniformisation step k only the product of the local error \(\epsilon ^l\) with the probability distribution \(\pi _k^l\) (or with the cost function \(V_k^l\)). In other words, a local re-clustering is performed if the local error depending on \(\epsilon ^l \pi _k^l\) (respectively, on \(\epsilon ^l V_k^l\)) is above a given threshold. This intuitive scheme is similar to the local heuristic employed in [13].

We will show that the history-based strategy is more flexible with respect to the required precision and aggregation size. Our experiments confirm that, while based on error bounds that over-estimate the actual empirical error incurred in the aggregation, the history-based strategy tends to outperform the more intuitive and simpler local strategy with respect to key performance metrics affecting the practical use of the adaptive aggregation. This shows that the computed errors not only serve as a means to certify the accuracy of the approximation, but can also be used to effectively drive the aggregation procedure. In particular, the metrics we are interested in are: (1) the value of avg, representing the state-space reduction; (2) the accuracy of the empirical results of the abstract model; (3) the total number of re-clusterings; and (4) the actual value of the error bounds (compared to the empirical errors).

The number of re-clusterings (denoted by q) is crucial for the performance of the overall scheme, since each re-clustering requires \(\mathcal {O}(|S|+|P|)\) steps, which is comparable to performing a few uniformisation steps on the concrete model. As such, the number of re-clusterings should be significantly smaller than the total number of uniformisation steps. Therefore, in our experiments we use thresholds that favour fewer re-clusterings over coarser abstractions. Finally, note that the adaptive aggregation scheme can be combined with adaptive uniformisation [9, 23, 30], which updates the uniformisation constant \(\lambda \) over different time intervals and thus decreases the overall number of uniformisation steps N, as well as with dynamic state-space truncation.

Illustrative Example. We resort to a two-dimensional discrete Lotka-Volterra “predator-prey” model [15] to illustrate the history-dependent aggregation strategy. The maximal population of each species is bounded by 2000, thus the concrete model has 4M states. The initial population is set to 200 predators and 400 prey.

Figure 1 displays the outcome of the adaptive procedure (top row) at three distinct time steps and (bottom row) the current probability distribution of the concrete model. For ease of visualisation, the top plots display, for each point of the concrete model, the size of its corresponding cluster, where we have limited the maximal size to 100 states. Note the close correspondence between the error bounds and the computed empirical errors, and the limited number of re-clusterings needed (about one every 200 uniformisation steps). Observe that the single-state clusters (red colour in the plots) tend to collect where the current probability distribution peaks. The figure also illustrates a memory effect due to the history-dependent error bounds employed by the aggregation.

Fig. 1. Transient analysis of the Lotka-Volterra model using history-based adaptive aggregation.

5 Experimental Evaluation on Two Case Studies

We have developed a prototype implementation of the adaptive aggregation for the quantitative analysis of chemical reaction networks modelled in PRISM [20]. We have evaluated the scheme on two case studies in comparison with FAU [9] as implemented in the explicit engine of PRISM. In order to ensure comparability between the two schemes, which employ different data structures, rather than measuring execution time we assess performance using implementation-independent measures, specifically the metrics (1)–(4) introduced in Sect. 4 (model reduction, empirical accuracy, number of re-clusterings, and formal error bounds). For the same reason, we have not incorporated heuristics such as varying the maximal cluster size, optimally selecting error thresholds, or using advanced clustering methods, which could be employed to further optimise the adaptive scheme.

We run all experiments on a MacBook Air with a 1.8 GHz Intel Core i5 and 4 GB of 1600 MHz RAM. As expected, for comparable state-space reductions (value avg), FAU can be faster than our prototype, but within the same order of magnitude; the difference is due to the overhead of clustering and to adaptive uniformisation not being fully integrated into our implementation.

Recall that FAU eliminates states with incoming probability lower than a defined threshold, and as such leads to an under-approximation of the concrete probability distribution with no tailored error bounds: all we can say is that, point-wise, the concrete transient probability distribution resides between this under-approximation and a value obtained by adding the total probability lost, and similarly for the invariance likelihood.

Two-Component Signalling Pathway. [7] has analysed the robustness of the output signal of an input-output signal response mechanism introduced in [28]. It is a two-component signalling pathway including the histidine kinase H, the response regulator R, and their phosphorylated forms (Hp and Rp). In order to ensure a feasible analysis, [7] has limited the state space by bounding the total populations over the intervals \(25\le H+Hp \le 35\) and \(25\le R+Rp \le 35\) (dimensionless quantities). Since this truncation has a significant impact on the distribution of variable Rp (representing the output signal), in this work we consider less conservative (but computationally more expensive) bounds and employ the adaptive aggregation scheme, which allows for a reduction in the size of the model while quantifying the precision of the approximation by means of the derived error bounds.

Fig. 2. Statistics for the invariant property. Population bounds [5,55]: 1.2M states (fewer than in Fig. 3, due to the property of interest), 16489 uniformisation steps. The satisfaction probability of the property for the concrete model is equal to 2.15E-17.

We first evaluate the adaptive aggregation scheme over the verification of an invariant property with an associated small likelihood: in this scenario dynamic truncation techniques such as FAU provide insufficient approximation precision. We compute the probability that the population of Rp stays below the level 15 for \(t = 5\) s (a relevant time window due to the fast-scale phosphorylation). The results for the new, less restrictive population bounds [5, 55] are reported in Fig. 2. We present empirical satisfaction probabilities (“Empirical”) and their formal bounds (“Bound”) computed using Proposition 3 for the adaptive aggregation scheme, and lower bounds and probability lost for the FAU algorithm. For both schemes we report the obtained state-space size avg. We can observe a clear relationship between the state-space reduction and the precision of the analysis. For adaptive aggregations, the parametrisation of each strategy is denoted by an index (1, 2, 3) representing the thresholds affecting the precision. Note that the parametrisation of the history-based aggregation, in contrast to the local strategy, allows us to obtain the user-defined precision (in this experiment, for the history-dependent strategy, index 1 corresponds to restricting the bounds to 5E-11, index 2 to 5E-12, and index 3 to 5E-14), since the aggregation employs exactly the derived error bounds. The results also demonstrate that the history-based strategy significantly outperforms the local strategy in all four key performance metrics.

Since the invariant property is associated with a small probability, we require accurate error bounds. The data in Fig. 2 shows that, with upper bounds that are at least five orders of magnitude tighter than those from FAU, the adaptive aggregation method provides more than a twenty-fold reduction with respect to the size of the concrete model, and about a three-fold improvement over the compression obtained via FAU. The results also demonstrate that different parametrisations of the aggregation strategy allow us to control the bounds, and via the bounds also to improve the empirical results (which confirms the usefulness of the derived bounds). However, decreasing the truncation threshold of FAU only improves the lower bounds (from 0.0 to 2.15E-17), whereas the probability lost is not considerably improved (it is even slightly worse for the very small thresholds, probably due to rounding errors). Notice that, whilst the global errors remain much more conservative, FAU provides better state-space reduction when a bound around 2.15E-17 (which is very close to the true probability) is required of the adaptive scheme.

Fig. 3. Statistics for the \(L_1\) norm of the error. Population bounds [25,35]: 116K states, 6924 uniformisation steps. Population bounds [5,55]: 2.5M states, 16489 uniformisation steps.

Fig. 4. Statistics for the \(L_1\) norm of the error computed over a set, characterised by the population of at least one species being equal to 0. Population bounds [25,35]: 116K states, 6924 uniformisation steps; the set has 14K states and the probability mass within the set at time \(t=5\) is equal to 1.31E-8. Population bounds [5,55]: 2.5M states, 16489 uniformisation steps; the set has 307K states and the probability mass within the set at time \(t=5\) is equal to 1.36E-8.

Next, we employ this example to compare the computation of the \(L_1\) norm of the probability distribution at time \(t=5\) s. The table in Fig. 3 depicts the results for the \(L_1\) norm over the whole state space, whereas the table in Fig. 4 depicts the results for the \(L_1\) norm over a certain subset of interest. The formal bounds for the adaptive scheme (column “Bound” in Fig. 3) have been computed using Proposition 2, whilst the corresponding bounds for Fig. 4 (middle part) have been obtained as the sum of the point-wise errors, defined in Proposition 1, over the subset of interest. The upper part of the tables corresponds to the population bounds [25, 35] (as in [7]), whereas the lower part to the less restrictive bounds [5, 55]. Compared to the local strategy, the history-based aggregation again provides better performance, namely it requires significantly (up to ten times) fewer re-clusterings (“Re-clust.”): we thus present the results only for the history-dependent strategy. We ensure the comparability of the two outcomes by empirically selecting the threshold for FAU so as to obtain a truncated model of size (avg) similar to that resulting from our technique. Note that, in the case of the \(L_1\) norm over the whole state space, the probability lost reported by FAU provides a safe bound on the \(L_1\) norm and is equal to the empirical error between the concrete and truncated probability distributions. However, in the case of the \(L_1\) norm over a general subset of the state space, the probability lost has to be multiplied by the cardinality of the subset to obtain the correct formal bounds. Such bounds are reported in Fig. 4 (right part) as “Bound”, whereas the empirical error between the distributions over the subset is depicted as “Empirical”.

Summarising Figs. 3 and 4, when requiring a tight bound for the smaller state space (population [25, 35]), neither approach leads to more than a two-fold reduction in the size of the space. This suggests a limit on the possible state-space reduction resulting from the model dynamics. However, for the larger model (population [5, 55]), up to a seven-fold reduction can be obtained using adaptive aggregation. We can see that FAU outperforms the adaptive aggregation scheme in the case of the \(L_1\) norm of the error over the whole state space (where it leads to a nineteen-fold reduction) but, in contrast to our approach, is not able to provide useful bounds for the \(L_1\) norm over a general subset (especially for the larger model).

Prokaryotic Gene Expression. The second case study deals with a more complex model for prokaryotic gene expression. The chemical reaction model has been introduced in [19] and includes 12 species and 11 reactions. We bound the maximal population of particular species (left column in Fig. 5) to obtain a finite and tractable state space. We focus our experiments exclusively on the history-dependent aggregation scheme.

Fig. 5. Statistics for the \(L_1\) norm of the error restricted to a set of interest, a strict subset of the state space. Maximal population 10: 1.2M states, 33162 uniformisation steps; the set has 516K states and the probability mass within the set at time \(t=1000\) is equal to 5.84E-3. Maximal population 20: 4.4M states, 53988 uniformisation steps; the set has 1.8M states and the probability mass within the set at time \(t=1000\) is equal to 2.21E-2.

In contrast with the previous case study that focused on events with very small likelihood, we now discuss results for events with non-negligible likelihood. Figure 5 reports basic statistics on the computation of the \(L_1\) norm over a certain subset of the state space at time \(t=1000\) s. Providing useful error bounds on the \(L_1\) norm (computed from the point-wise errors in Proposition 1), the adaptive aggregation leads to almost a ten-fold state space reduction for the smaller model (1.2M vs 127K) and a fifteen-fold reduction for the larger model (4.4M vs 287K). Due to the large cardinality of the subset of interest, FAU fails to provide any informative formal bounds. Note that in this case study the adaptive aggregation scheme also provides better empirical bounds than FAU.

Finally, we have evaluated both approaches on an invariant property (the population of a species stays below the level 10, for 1000 s) with a significant satisfaction probability (more than \(15\,\%\) and \(20\,\%\) on the small and large model, respectively). We observe that this choice is favourable to FAU, since for invariant properties with high likelihood the state space truncated via FAU is aligned with the property of interest, and thus the lost probability mass is slightly smaller than the error introduced by the state-space aggregations. In this scenario FAU yields better reductions than the adaptive aggregation scheme (especially for the larger model), while providing similar error bounds, since it is able to successfully identify the relevant part of the state space. This scenario, advantageous to FAU, contrasts with that discussed in Fig. 2, as well as with the general case where, for an arbitrary model, it is not known how the probability mass is distributed in relation to the states satisfying the property of interest.

6 Conclusions

We have proposed a novel adaptive aggregation algorithm for approximating the probability of an event in a Markov chain with rigorous precision guarantees. Our approach provides error bounds that are in general orders of magnitude more accurate compared to those from fast adaptive uniformisation, and significantly decreases the size of models without performance degradation. This has allowed us to efficiently analyse larger and more complex models. Future work will include effective combinations of the adaptive aggregation with robustness analysis and parameter synthesis. We also plan to apply our approach to the verification and performance analysis of complex safety-critical computer systems, where precision guarantees play a key role.