1 Introduction

Markov processes over general (uncountable) state spaces appear in many areas of engineering, such as power and transportation networks, biological processes, robotics, and manufacturing systems. The importance of this class of stochastic processes in applications has motivated a significant research effort into their foundations, analysis, and verification.

We study the problem of algorithmically verifying finite-horizon probabilistic invariance for Markov processes, that is, computing the probability that a stochastic process remains within a given set for a given finite time horizon. For finite-state stochastic processes, there is a mature theory of model checking discrete-time Markov chains [7], and a number of probabilistic model checking tools [18, 22] compute explicit solutions to related verification problems. On the other hand, stochastic processes taking values over uncountable state spaces do not in general admit explicit solutions, and related verification problems are undecidable even for simple dynamics [2]. A number of studies have therefore explored abstraction techniques that reduce the given stochastic process (over a general state space) to a finite-state process, while preserving properties in a quantitative sense [2, 10]. The abstracted model allows the application of standard model checking techniques (and software tools) over finite-state models. The work in [2] has further shown that an explicit error can be attached to the abstraction: this error is computed purely based on continuity properties of the concrete Markov process. As such, properties proved on the finite-state abstraction can be used to reason about properties of the original process. The overall approach has been customized under various assumptions on the model [9, 11] and has been extended to linear temporal specifications [3, 30]. A software tool has also been developed to automate the abstraction procedure [14] and to couple it with standard probabilistic model checkers [18, 22].

In previous work, the structure of the underlying Markov process (namely, the interdependence among its variables) has not been actively reflected in the abstraction algorithms, and the finite-state Markov chain has always been represented explicitly, which can become quite expensive in terms of memory requirements. In many applications, the dynamics of the Markov process, which are fully characterized by a conditional stochastic kernel, exhibit specific structural properties. More precisely, the dynamics of any state variable may depend only on a limited number of other state variables, and the process noise driving each state variable can be assumed to be independent. Examples of such structured systems are models of power grids and sensor–actuator networks as large-scale interconnected networks [29], and mass-spring-damper systems [5, 6] with a given non-dense topology.

In this work we present an abstraction and model checking algorithm for discrete-time stochastic dynamical systems over general (uncountable) state spaces. The procedure constructs a finite-state Markov abstraction of the process, but differs from previous work in that it is based on a dimension-dependent partitioning of the state space. Additionally, we perform a precise dimension-dependent analysis of the error introduced by the abstraction, and our error bounds can be exponentially smaller than the earlier bounds obtained in [2]. Furthermore, we represent the abstraction as a dynamic Bayesian network (DBN) [19], instead of explicitly representing it via a probabilistic transition matrix. The Bayesian network representation exploits independence assumptions in the model to potentially provide polynomially sized representations (in the number of dimensions) for the Markov chain abstraction, whereas the explicit transition matrix would be exponential in the number of dimensions. We show how factor graphs and the sum-product algorithm, developed for belief propagation in Bayesian networks, can be used to model check probabilistic invariance properties without constructing the transition matrix. Overall, our approach leads to significant reduction in computational and memory resources for model checking structured Markov processes, and provides tighter error bounds.

The material is organized in seven sections. Section 2 defines discrete-time Markov processes and the probabilistic invariance problem. Section 3 presents a new algorithm for abstracting a process to a DBN, together with the quantification of the abstraction error. We discuss efficient model checking of the constructed DBN in Sect. 4. The performance of the DBN abstraction approach is compared with the state-of-the-art abstraction procedure in Sect. 5. We apply the overall abstraction algorithm to a case study in Sect. 6. Section 7 outlines current directions of investigation.

2 Markov processes and probabilistic invariance

2.1 Discrete-time Markov processes

We write \(\mathbb {N}\) for the non-negative integers \(\mathbb {N} :=\{0,1,2,\ldots \}\) and \(\mathbb {N}_n\) for positive integers not greater than n, \(\mathbb {N}_n := \{1,2,\ldots ,n\}\). We use bold typeset for vectors and normal typeset for one-dimensional quantities.

We consider discrete-time stochastic dynamical systems defined over a general state space \(\mathcal {S}\). For a sequence of independent and identically distributed (iid) random variables \(\{\varvec{\zeta }(t),\,\,t\in \mathbb {N}\}\) taking values in \(\mathbb {R}^n\), and a measurable map \(\varvec{f}:\mathcal {S}\times \mathbb {R}^n\rightarrow \mathcal {S}\), the dynamical system is characterized as

$$\begin{aligned} \varvec{s}(t+1) = \varvec{f}(\varvec{s}(t),\varvec{\zeta }(t)),\quad \forall t\in \mathbb {N},\quad \varvec{s}(0) = \varvec{s}_0\in \mathcal {S}. \end{aligned}$$
(1)

The stochastic dynamical system (1) can be seen as a discrete-time Markov process \(\mathscr {M}_{\mathfrak {s}}\) characterized by the tuple \((\mathcal {S},\mathcal {B}, T_{\mathfrak {s}})\): \(\mathcal {S}\) is the continuous state space, which we assume to be endowed with a metric and to be separable; \(\mathcal {B}\) is the Borel \(\sigma \)-algebra associated to \(\mathcal {S}\), which is the smallest \(\sigma \)-algebra containing all open subsets of \(\mathcal {S}\); and \(T_{\mathfrak {s}}:\mathcal {S}\times \mathcal {B}\rightarrow [0,1]\) is a stochastic kernel, so that \(T_{\mathfrak {s}}(\cdot ,B)\) is a non-negative measurable function for any set \(B\in \mathcal {B}\), and \(T_{\mathfrak {s}}(\varvec{s},\cdot )\) is a probability measure on \((\mathcal {S},\mathcal {B})\) for any \(\varvec{s}\in \mathcal {S}\). The stochastic kernel \(T_{\mathfrak {s}}(\varvec{s},\cdot )\) of dynamical system (1) is computed as

$$\begin{aligned} T_{\mathfrak {s}}(\varvec{s},B) = T_{\zeta }\left( \varvec{\zeta }\in \mathbb {R}^n\,:\,\varvec{f}(\varvec{s},\varvec{\zeta })\in B\right) , \end{aligned}$$

where \(T_{\zeta }\) is the distribution of the r.v. \(\varvec{\zeta }(0)\) (in fact, of any \(\varvec{\zeta }(t)\) since these are iid random variables). In other words, the map \(\varvec{f}\) and the distribution of the r.v. \(\{\varvec{\zeta }(t)\}\) uniquely define the stochastic kernel of the process.
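As a minimal illustration of how the map \(\varvec{f}\) and the noise distribution determine the kernel, the following sketch estimates \(T_{\mathfrak {s}}(\varvec{s},B)\) by Monte Carlo sampling of \(\varvec{\zeta }\); the scalar dynamics and the set B are hypothetical placeholders, not taken from the text.

```python
import numpy as np

def kernel_prob(f, sample_noise, s, indicator_B, n_samples=100_000, seed=0):
    """Monte Carlo estimate of T_s(s, B) = T_zeta({zeta : f(s, zeta) in B})."""
    rng = np.random.default_rng(seed)
    zeta = sample_noise(rng, n_samples)
    return float(np.mean(indicator_B(f(s, zeta))))

# Assumed toy dynamics: s(t+1) = 0.5*s(t) + zeta(t), zeta ~ N(0, 1), B = [-1, 1].
f = lambda s, zeta: 0.5 * s + zeta
noise = lambda rng, n: rng.standard_normal(n)
p = kernel_prob(f, noise, 0.0, lambda x: np.abs(x) <= 1.0)
# From s = 0 the successor is N(0, 1), so p ≈ P(|N(0,1)| <= 1) ≈ 0.683
```

The sampling viewpoint matches the definition: the kernel is the pushforward of the noise distribution through \(\varvec{f}(\varvec{s},\cdot )\).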

Trajectories (also called traces or paths) of \(\mathscr {M}_{\mathfrak {s}}\) are sequences \((\varvec{s}(0),\varvec{s}(1),\varvec{s}(2),\ldots )\) which belong to the set \(\varOmega = \mathcal {S}^{\mathbb {N}}\). The product \(\sigma \)-algebra on \(\varOmega \) is denoted by \(\mathcal {F}\). Given the initial state \(\varvec{s}(0) = \varvec{s}_0\in \mathcal {S}\) of \(\mathscr {M}_{\mathfrak {s}}\), the stochastic kernel \(T_{\mathfrak {s}}\) induces a unique probability measure \(\mathcal {P}\) on \((\varOmega ,\mathcal {F})\) that satisfies the Markov property: namely, for any measurable set \(B\in \mathcal {B}\) and any \(t \in \mathbb {N}\)

$$\begin{aligned} \mathcal {P} \left( \varvec{s}(t+1)\in \ B| \varvec{s}(0),\varvec{s}(1),\ldots ,\varvec{s}(t)\right) = \mathcal {P} \left( \varvec{s}(t+1)\in \ B| \varvec{s}(t)\right) = T_{\mathfrak {s}}(\varvec{s}(t),B). \end{aligned}$$

We assume that the stochastic kernel \(T_{\mathfrak {s}}\) admits a density function \(t_{\mathfrak {s}}:\mathcal {S}\times \mathcal {S}\rightarrow \mathbb {R}_{\ge 0}\), such that \(T_{\mathfrak {s}}(\varvec{s},B) = \int _B t_{\mathfrak {s}}(\bar{\varvec{s}}|\varvec{s})d\bar{\varvec{s}}\).

Let us expand the dynamical Eq. (1) explicitly over its states \(\varvec{s} = [s_1,\ldots ,s_n]^T\), map components \(\varvec{f} = [f_1,\ldots ,f_n]^T\), and uncertainty terms \(\varvec{\zeta }= [\zeta _1,\ldots ,\zeta _n]^T\), as follows:

$$\begin{aligned} \begin{array}{l} s_1(t+1) = f_1(s_1(t),s_2(t),\ldots ,s_n(t),\zeta _1(t)),\\ s_2(t+1) = f_2(s_1(t),s_2(t),\ldots ,s_n(t),\zeta _2(t)),\\ \quad \vdots \\ s_n(t+1) = f_n(s_1(t),s_2(t),\ldots ,s_n(t),\zeta _n(t)). \end{array} \end{aligned}$$
(2)

In this article we are interested in exploiting the knowledge of the structure of the dynamics in (2), in order to scale up formal verification algorithms based on abstractions [2, 10, 11]. We focus our attention on continuous (unbounded and uncountable) Euclidean spaces \(\mathcal {S} = \mathbb {R}^n\), and further assume that for any \(t\in \mathbb {N}\), \(\zeta _k(t)\) are independent for all \(k\in \mathbb {N}_n\). This latter assumption is widely used in the theory of dynamical systems, and allows for the following multiplicative structure on the conditional density function of the process:

$$\begin{aligned} t_{\mathfrak {s}}(\bar{\varvec{s}}|\varvec{s}) = t_1(\bar{s}_1|\varvec{s})t_2(\bar{s}_2|\varvec{s})\ldots t_n(\bar{s}_n|\varvec{s}), \end{aligned}$$
(3)

where the function \(t_k:\mathbb {R}^n\times \mathbb {R}\rightarrow \mathbb {R}_{\ge 0}\) solely depends on the map \(f_k\) and the distribution of \(\zeta _k\). The following example is adapted from [12] to demonstrate the computation of the function \(t_k\) based on some regularity assumptions on the function \(f_k\).

Example 1

Consider the \(k{\text {th}}\) equation of the system in (2),

$$\begin{aligned} s_k(t+1) = f_k(\varvec{s}(t),\zeta _k(t)),\quad \varvec{s}(\cdot )\in \mathbb {R}^n,\,\,\zeta _k(\cdot )\in \mathbb {R}, \end{aligned}$$

where \(\zeta _k(\cdot )\) are iid with known distribution \(t_{\zeta _k}(\cdot )\). Suppose that the vector field \(f_k:\mathbb {R}^n\times \mathbb {R}\rightarrow \mathbb {R}\) is continuously differentiable and that \(\frac{\partial f_k}{\partial \zeta _k}\) is invertible. Then the implicit function theorem guarantees the existence and uniqueness of a function \(g_k:\mathbb {R}\times \mathbb {R}^n\rightarrow \mathbb {R}\) such that \(\zeta _k(t) = g_k(s_k(t+1),\varvec{s}(t))\). The conditional density function \(t_k\) in this case is [27]:

$$\begin{aligned} t_{k}(\bar{s}_k|\varvec{s}) = \left| \det \left[ \frac{\partial g_k}{\partial \bar{s}_k}(\bar{s}_k,\varvec{s})\right] \right| t_{\zeta _k}(g_k(\bar{s}_k,\varvec{s})). \end{aligned}$$

As a special case the invertibility of \(\frac{\partial f_k}{\partial \zeta _k}\) is guaranteed for systems with additive process noise, namely \(f_k(\varvec{s},\zeta _k) = f_{kd}(\varvec{s})+\zeta _k\), which results in \(t_{k}(\bar{s}_k|\varvec{s}) = t_{\zeta _k}(\bar{s}_k-f_{kd}(\varvec{s}))\). This fact is used in the subsequent examples and in Sect. 5. \(\square \)
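For the additive-noise case just described, the conditional density can be evaluated directly as \(t_{k}(\bar{s}_k|\varvec{s}) = t_{\zeta _k}(\bar{s}_k-f_{kd}(\varvec{s}))\); in the sketch below the drift \(f_{kd}\) and the noise variance are assumed placeholder values.

```python
import math

def gaussian_pdf(x, sigma):
    """Density of N(0, sigma^2) evaluated at x."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def t_k(s_bar_k, s, f_kd, sigma_k):
    """Conditional density t_k(s_bar_k | s) = t_{zeta_k}(s_bar_k - f_kd(s))
    for additive noise s_k(t+1) = f_kd(s(t)) + zeta_k(t), zeta_k ~ N(0, sigma_k^2)."""
    return gaussian_pdf(s_bar_k - f_kd(s), sigma_k)

# Hypothetical drift f_kd(s) = 0.8*s_1 + 0.1*s_2 and unit noise variance:
f_kd = lambda s: 0.8 * s[0] + 0.1 * s[1]
val = t_k(0.9, [1.0, 1.0], f_kd, 1.0)   # residual is 0, so val is the Gaussian mode
```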

Remark 1

The results of this article are presented under the structural assumption that \(\zeta _k(\cdot )\) are independent over \(k\in \mathbb {N}_n\). These results can be generalized to a broader class of processes by allowing inter-dependencies between the entries of the process noise: one forms subsets of the entries of \(\varvec{\zeta }(\cdot )\) such that any two entries from different subsets are independent, whereas entries within the same subset may be dependent. This assumption induces a multiplicative structure on \(t_{\mathfrak {s}}(\bar{\varvec{s}}|\varvec{s})\) across the different subsets, similar to (3). As will be discussed in Sect. 3, our abstraction approach requires partitioning the state space projected over these independent subsets; algorithmically, the more subsets there are, the more efficient our abstraction procedure.\(\square \)

The following two examples provide instances of stochastic dynamical systems (2) and justify the structural assumption raised in (3).

Fig. 1 An n-body mass-spring-damper system

Example 2

Figure 1 shows a system of n masses connected by springs and dampers. For \(i\in \mathbb {N}_n\), block i has mass \(m_i\), the \(i{\text {th}}\) spring has stiffness \(k_i\), and the \(i{\text {th}}\) damper has damping coefficient \(b_i\). The first mass is connected to a fixed wall by the left-most spring/damper connection. All other masses are connected to the previous mass with a spring and a damper. A force \(\zeta _i\) is applied to each mass, modeling the effect of a disturbance or of process noise. The state of the overall system comprises the positions and velocities of the blocks. It can be shown that the dynamics in discrete time take the form \(\varvec{s}(t+1) = \varPhi \varvec{s}(t)+\varvec{\zeta }(t)\), where \(\varvec{s}(t)\in \mathbb {R}^{2n}\) with \(s_{2i-1}(t),s_{2i}(t)\) indicating the velocity and position of mass i. The state transition matrix \(\varPhi = [\varPhi _{ij}]_{i,j}\in \mathbb {R}^{2n\times 2n}\) is a band matrix with lower and upper bandwidth 3 and 2, respectively (\(\varPhi _{ij} = 0\) for \(j<i-3\) and for \(j>i+2\)). \(\square \)
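The banded structure of \(\varPhi \) can be mimicked numerically; the entries below are random placeholders rather than the actual discretized spring-damper physics, and only the sparsity pattern (lower bandwidth 3, upper bandwidth 2) matches the example.

```python
import numpy as np

def band_matrix(n2, lower=3, upper=2, seed=0):
    """Random n2 x n2 band matrix with Phi_ij = 0 for j < i-lower or j > i+upper,
    standing in for the discretized mass-spring-damper transition matrix."""
    rng = np.random.default_rng(seed)
    Phi = rng.normal(size=(n2, n2))
    i, j = np.indices((n2, n2))
    Phi[(j < i - lower) | (j > i + upper)] = 0.0
    return Phi

Phi = band_matrix(8)           # n = 4 masses -> 8 states (velocity, position pairs)
s = np.zeros(8)
zeta = np.full(8, 0.1)         # placeholder noise sample
s_next = Phi @ s + zeta        # one step of s(t+1) = Phi s(t) + zeta(t)
```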

Example 3

A second example of structured dynamical systems is a discrete-time large-scale interconnected system. Consider an interconnected system of \(N_{\mathfrak d}\) heterogeneous linear time-invariant (LTI) subsystems described by the following stochastic difference equations:

$$\begin{aligned} \varvec{s}_i(t+1) = \varPhi _i \varvec{s}_i(t) + \sum _{j\in N_i} G_{ij}\varvec{s}_j(t) + B_i \varvec{u}_i(t)+\varvec{\zeta }_i(t), \end{aligned}$$

where \(i\in \mathbb {N}_{N_{\mathfrak d}}\) denotes the \(i^{\text {th}}\) subsystem and \(\varvec{s}_i\in \mathbb {R}^{n\times 1}, \varvec{u}_i\in \mathbb {R}^{p\times 1}, \varvec{\zeta }_i\in \mathbb {R}^{m\times 1}\) are the state, the input, and the process noise of subsystem i. The term \(\sum _{j\in N_i} G_{ij}\varvec{s}_j(t)\) represents the physical interconnection between the subsystems where \(N_i\), \(|N_i|\ll N_{\mathfrak d}\), is the set of subsystems to which system i is physically connected. The described interconnected system can be found in many application areas including smart power grids, traffic systems, and sensor-actuator networks [16]. \(\square \)

2.2 Probabilistic invariance

We focus on verifying probabilistic invariance, which plays a central role in verifying properties of a system expressed as PCTL formulae or as linear temporal specifications [3, 7, 28, 30].

Definition 1

(Probabilistic invariance) Consider a bounded Borel set \(A\in \mathcal {B}\), representing a set of safe states. The finite-horizon probabilistic invariance problem asks to compute the probability that a trajectory of \(\mathscr {M}_{\mathfrak {s}}\) associated with an initial condition \(\varvec{s}_0\) remains within the set A during the finite time horizon N:

$$\begin{aligned} p_N(\varvec{s}_0,A) = \mathcal {P}\{\varvec{s}(t)\in A\text { for all } t=0,1,2,\ldots ,N| \varvec{s}(0) =\varvec{s}_0\}. \end{aligned}$$

This quantity allows us to extend the result to a general probability distribution \(\pi :\mathcal {B}\rightarrow [0,1]\) for the initial state \(\varvec{s}(0)\) of the system as

$$\begin{aligned} \mathcal {P}\{\varvec{s}(t)\in A\text { for all } t=0,1,2,\ldots ,N\} = \int _{\mathcal {S}} p_N(\varvec{s}_0,A)\pi (d\varvec{s}_0). \end{aligned}$$
(4)

The solution of the probabilistic invariance problem can be characterized via the value functions \(V_k:\mathcal {S}\rightarrow [0,1]\), \(k=0,1,2,\ldots ,N\), defined by the following Bellman backward recursion [2]:

$$\begin{aligned} V_k(\varvec{s}) = \varvec{1}_A(\varvec{s}) \int _A V_{k+1}(\bar{\varvec{s}})t_{\mathfrak {s}}(\bar{\varvec{s}}|\varvec{s})d\bar{\varvec{s}}\,\,\text { for }\,\,k=0,1,2,\ldots ,N-1. \end{aligned}$$
(5)

This recursion is initialized with \(V_N(\varvec{s}) = \varvec{1}_A(\varvec{s})\), where \(\varvec{1}_A(\varvec{s})\) is the indicator function which is 1 if \(\varvec{s}\in A\) and 0 otherwise, and results in the solution \(p_N(\varvec{s}_0,A) = V_0(\varvec{s}_0)\).

Equation (5) characterizes the finite-horizon probabilistic invariance quantity as the solution of a dynamic programming problem. However, since its explicit solution is in general not available, the actual computation of the quantity \(p_N(\varvec{s}_0,A)\) requires N numerical integrations at each state in the set A. This is usually performed with techniques based on state-space discretization [8].
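As an illustration of the state-space discretization just mentioned, the following minimal sketch approximates the Bellman recursion (5) for a hypothetical one-dimensional system with additive Gaussian noise, using a uniform grid and rectangle-rule integration; the dynamics, noise level, and safe set are assumed for illustration only.

```python
import numpy as np
from math import sqrt, pi

def invariance_prob(f_d, sigma, A_lo, A_hi, N, n_grid=400):
    """Approximate p_N(s0, A) via recursion (5) for 1-D dynamics
    s(t+1) = f_d(s(t)) + zeta(t), zeta ~ N(0, sigma^2), A = [A_lo, A_hi]."""
    grid = np.linspace(A_lo, A_hi, n_grid)
    h = grid[1] - grid[0]
    # K[x, y] ~ t_s(y | x) * h  (rectangle rule over the safe set)
    diff = grid[None, :] - f_d(grid)[:, None]
    K = np.exp(-diff**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi)) * h
    V = np.ones(n_grid)                 # V_N = 1_A on the grid
    for _ in range(N):
        V = K @ V                       # V_k(s) = int_A V_{k+1}(y) t_s(y|s) dy
    return grid, V

grid, V = invariance_prob(lambda s: 0.5 * s, 0.3, -1.0, 1.0, N=3)
p_at_0 = V[np.abs(grid).argmin()]       # approximate p_3(0, [-1, 1])
```

This is exactly the kind of gridded computation whose cost grows exponentially with the dimension n, which motivates the factored DBN abstraction of Sect. 3.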

3 Formal abstractions as dynamic Bayesian networks

3.1 Dynamic Bayesian networks

A Bayesian network (BN) is a tuple \(\mathfrak {B} = (\mathcal {V},\mathcal {E},\mathcal {T})\). The pair \((\mathcal {V},\mathcal {E})\) is a directed acyclic graph (DAG) representing the structure of the network. The nodes in \(\mathcal {V}\) are (discrete or continuous) random variables and the arcs in \(\mathcal {E}\) represent the dependence relationships among the random variables. The set \(\mathcal {T}\) contains conditional probability distributions (CPD) in the form of tables or density functions for discrete and continuous random variables, respectively. In a BN, knowledge is represented in two ways: qualitatively, as dependencies between variables by means of the DAG; and quantitatively, as conditional probability distributions attached to the dependence relationships. Each random variable \(X_i\in \mathcal {V}\) is associated with a conditional probability distribution \(\mathbb {P}(X_i|Pa(X_i))\), where Pa(Y) represents the parent set of the variable \(Y\in \mathcal {V}\): \(Pa(Y) = \{X\in \mathcal {V}|(X,Y)\in \mathcal {E}\}\). A BN is called two-layered if the set of nodes \(\mathcal {V}\) can be partitioned into two sets \(\mathcal {V}_1,\mathcal {V}_2\) with the same cardinality such that only the nodes in the second layer \(\mathcal {V}_2\) have an associated CPD.

A dynamic Bayesian network (DBN) [19, 25] is a way to extend Bayesian networks to model probability distributions over collections of random variables \(X(0),X(1),X(2),\ldots \) indexed by time t. A DBN is defined to be a pair \((\mathfrak {B}_0,\mathfrak {B}_{\rightarrow })\), where \(\mathfrak {B}_0\) is a BN which defines the distribution of X(0), and \(\mathfrak {B}_{\rightarrow }\) is a two-layered BN that defines the transition probability distribution for \((X(t+1)|X(t))\).

3.2 DBNs as representations of Markov processes

We now show that any discrete-time Markov process \(\mathscr {M}_{\mathfrak {s}}\) over \(\mathbb {R}^n\) can be represented as a DBN \((\mathfrak {B}_0, \mathfrak {B}_{\rightarrow })\) over n continuous random variables. The advantage of the reformulation is that it makes the dependencies between random variables explicit.

The BN \(\mathfrak {B}_0\) is trivial for a given initial state of the Markov process \(\varvec{s}(0) = \varvec{s}_0\). The DAG of \(\mathfrak {B}_0\) has the set of nodes \(\{X_1,X_2,\ldots ,X_n\}\) without any arc. The Dirac delta distribution located at the initial state of the process is assigned to each node of \(\mathfrak {B}_0\). The DAG for the two-layered BN \(\mathfrak {B}_{\rightarrow } = (\mathcal {V},\mathcal {E},\mathcal {T})\) comprises a set of nodes \(\mathcal {V} = \mathcal {V}_1\cup \mathcal {V}_2\), with \(\mathcal {V}_1 = \{X_1,X_2,\ldots ,X_n\}\) and \(\mathcal {V}_2 = \{\bar{X}_1,\bar{X}_2,\ldots ,\bar{X}_n\}\). Each arc in \(\mathcal {E}\) connects a node in \(\mathcal {V}_1\) to a node in \(\mathcal {V}_2\); \((X_i,\bar{X}_j)\in \mathcal {E}\) if and only if \(t_j(\bar{s}_j|\varvec{s})\) is not a constant function of \(s_i\). The set \(\mathcal {T}\) assigns a CPD to each node \(\bar{X}_j\) according to the density function \(t_j(\bar{s}_j|\varvec{s})\).
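The edge set \(\mathcal {E}\) can be read off from the dependency pattern of the densities \(t_j\); the sketch below builds it from a hand-specified dependency list. The pattern shown matches the chain structure of Eq. (6) for \(n=4\), with variables indexed from 0.

```python
def dbn_edges(depends):
    """Edges of the two-layered BN: (X_i, Xbar_j) is in E iff t_j(.|s) actually
    varies with s_i.  `depends[j]` is the set of state indices on which the
    j-th conditional density depends."""
    return {(i, j) for j, parents in enumerate(depends) for i in parents}

# Dependency pattern of the chain system (6) for n = 4 (0-based indices):
# sbar_1|s_1,  sbar_2|s_1,s_2,  sbar_3|s_2,s_3,  sbar_4|s_3,s_4
depends = [{0}, {0, 1}, {1, 2}, {2, 3}]
E = dbn_edges(depends)   # 7 arcs instead of the 16 of a fully coupled system
```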

Example 4

Consider the following stochastic linear dynamical system:

$$\begin{aligned} \begin{array}{l} s_1(t+1) = a_{11} s_1(t) + \zeta _1(t)\\ s_2(t+1) = a_{21} s_1(t) + a_{22} s_2(t) + \zeta _2(t)\\ s_3(t+1) = a_{32} s_2(t) + a_{33} s_3(t) + \zeta _3(t)\\ \vdots \\ s_n(t+1) = a_{n(n-1)} s_{n-1}(t) + a_{nn} s_n(t) + \zeta _n(t), \end{array} \end{aligned}$$
(6)

with initial state \(\varvec{s}(0) = \varvec{s}_0 = [s_{01},s_{02},\ldots ,s_{0n}]^T\), where \(\zeta _i(\cdot ),\,i\in \mathbb {N}_n\) are independent Gaussian r.v. \(\mathcal {N}(0,\sigma _i^2)\), which clearly satisfies the independence assumption on the process noise raised in Sect. 2.1. The conditional density function of the system takes the following form:

$$\begin{aligned} t_{\mathfrak {s}}(\bar{\varvec{s}}|\varvec{s}) = t_1(\bar{s}_1|s_1)t_2(\bar{s}_2|s_1,s_2)t_3(\bar{s}_3|s_2,s_3)\ldots t_n(\bar{s}_n|s_{n-1},s_n). \end{aligned}$$

The DAG of the two-layered BN \(\mathfrak {B}_{\rightarrow }\) associated with this system is sketched in Fig. 2 for \(n = 4\). The BN \(\mathfrak {B}_0\) has an empty graph on the set of nodes \(\{X_1,\ldots ,X_n\}\) with the associated Dirac delta density functions located at \(s_{0i}\), \(\delta _d(s_i(0)-s_{0i})\).

Note that model (6) is in the form

$$\begin{aligned} \varvec{s}(t+1) = \varPhi \varvec{s}(t)+\varvec{\zeta }(t)\quad t\in \mathbb {N}, \end{aligned}$$
(7)

for a lower bidiagonal matrix \(\varPhi = [a_{ij}]_{i,j}\) and independent Gaussian r.v. \(\varvec{\zeta }(t)\sim \mathcal {N}(0,\varSigma )\) with the diagonal covariance matrix \(\varSigma = diag([\sigma _1^2,\sigma _2^2,\ldots ,\sigma _n^2])\). For a linear dynamical system of the form (7) with a non-diagonal covariance matrix \(\varSigma \), a linear transformation can be employed to change the coordinates and obtain a stochastic linear system with a diagonal covariance matrix, satisfying the independence assumption on the process noise raised in Sect. 2.1. \(\square \)
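The coordinate change mentioned above can be sketched with a Cholesky factorization \(\varSigma = LL^T\): in the coordinates \(\varvec{x} = L^{-1}\varvec{s}\) the noise covariance becomes the identity. The matrices below are placeholder values, not taken from the examples.

```python
import numpy as np

def decorrelate(Phi, Sigma):
    """Change of coordinates x = inv(L) s with Sigma = L L^T (Cholesky), giving
    x(t+1) = (inv(L) Phi L) x(t) + w(t) with w ~ N(0, I)."""
    L = np.linalg.cholesky(Sigma)
    L_inv = np.linalg.inv(L)
    return L_inv @ Phi @ L, L

Phi = np.array([[0.5, 0.0], [0.2, 0.4]])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])   # non-diagonal noise covariance
Phi_x, L = decorrelate(Phi, Sigma)
cov_x = np.linalg.inv(L) @ Sigma @ np.linalg.inv(L).T   # transformed covariance
```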

Fig. 2 Two-layered BN \(\mathfrak {B}_{\rightarrow }\) associated with the stochastic linear dynamical system in (7) for \(n=4\)

3.3 Finite abstraction of Markov processes as discrete DBNs

Let \(A\in \mathcal {B}\) be a bounded Borel set of safe states. We abstract the structured Markov process \(\mathscr {M}_{\mathfrak {s}}\), represented in the previous section as a DBN with continuous variables, into a DBN with discrete random variables. Our abstraction is relative to the set A. Algorithm 1 provides the steps of the abstraction procedure. It consists of discretizing each dimension into a finite number of bins.

The first step of Algorithm 1 is to project the safe set A over different dimensions, \(D_i \doteq \varPi _i(A)\), where the projection operators \(\varPi _i:\mathbb {R}^n\rightarrow \mathbb {R},\,i\in \mathbb {N}_n,\) are defined as \(\varPi _i(\varvec{s}) = s_i\) for any \(\varvec{s} = [s_1,\ldots ,s_n]^T\in \mathbb {R}^n\). In step 2 of the Algorithm, set \(D_i\) is partitioned as \(\{D_{ij}\}_{j=1}^{n_i}\) (for any \(i\in \mathbb {N}_n\), \(D_{ij}\)’s are arbitrary but non-empty, non-intersecting, and \(D_i= \cup _{j=1}^{n_i} D_{ij}\)). In the next step, representative points \(z_{ij} \in D_{ij}\) are also chosen arbitrarily. The subsequent results are independent of the choice of these representative points, but a natural option for interval partition sets \(D_{ij}\) is their center point. Then the DAG \((\mathcal {V},\mathcal {E})\) of the DBN \(\mathfrak {B}_{\rightarrow }\) is constructed with \(\mathcal {V} = \{X_i,\bar{X}_i,i\in \mathbb {N}_n\}\) and \(\mathcal {E}\) as per Sect. 3.2. Step 5 of the algorithm constructs the support of the random variables in \(\mathcal {V}\). For any \(i\in \mathbb {N}_n\), the support of \(X_i,\bar{X}_i\) will be \(\varOmega _i \doteq Z_i \cup \{\phi _i\}\) with the set \(Z_i \doteq \{z_{i1},\ldots ,z_{in_i}\}\) containing the representative points selected in step 3 and the dummy state \(\phi _i\) representing the complement of the set \(D_i\). Finally, step 6 computes the discrete CPDs \(T_i(\bar{X}_i|Pa(\bar{X}_i))\), reflecting the dependencies among the variables.

Each row of the CPD \(T_i\) includes values of the conditioning random variables \(Pa(\bar{X}_i)\), the value of the r.v. \(\bar{X}_i\), and the associated probability. This probability is written as \(T_i(\bar{X}_i = z|v(Pa(\bar{X}_i)))\) in step 6 of the algorithm. In other words, the function \(v(\cdot )\) acts on (possibly a set of) random variables and provides their instantiation. The term \(v(Pa(\bar{X}_i))\) in the conditioning argument of \(t_i\) means that the function \(t_i(\bar{s}_i|\cdot )\) is evaluated at the instantiated values of \(Pa(\bar{X}_i)\).

For any \(i\in \mathbb {N}_n\), \(\varXi _i: Z_i \rightarrow 2^{D_i}\) represents a set-valued map that associates to any point \(z_{ij}\in Z_i\) the corresponding partition set \(D_{ij} \subset D_i\) (this is known as the “refinement map”). Furthermore, the abstraction map \(\xi _i: D_i \rightarrow Z_i\) associates to any point \(s_i \in D_i\) the corresponding discrete state in \(Z_i\). Additionally, notice that the absorbing states \(\phi = \{\phi _1,\ldots ,\phi _n\}\) are added to the definition of BN \(\mathfrak {B}_{\rightarrow }\) so that the conditional probabilities \(T_i(\bar{X}_i|Pa(\bar{X}_i))\) marginalize to one.

The construction of the DBN with discrete r.v. in Algorithm 1 is closely related to the Markov chain abstraction method in [2, 10]. The main difference lies in partitioning each dimension separately instead of partitioning the whole state space at once. Absorbing states are also assigned to each dimension separately instead of having only one for the entire unsafe set. Moreover, Algorithm 1 stores the transition probabilities efficiently as a BN.

Algorithm 1 (Abstraction of \(\mathscr {M}_{\mathfrak {s}}\) as a DBN \((\mathfrak {B}_0,\mathfrak {B}_{\rightarrow })\) with discrete random variables)

1. Project the safe set A over each dimension: \(D_i \doteq \varPi _i(A)\), for \(i\in \mathbb {N}_n\).
2. Partition each \(D_i\) into non-empty, non-intersecting sets \(\{D_{ij}\}_{j=1}^{n_i}\), with \(D_i = \cup _{j=1}^{n_i}D_{ij}\).
3. Select representative points \(z_{ij}\in D_{ij}\).
4. Construct the DAG \((\mathcal {V},\mathcal {E})\) of \(\mathfrak {B}_{\rightarrow }\), with \(\mathcal {V} = \{X_i,\bar{X}_i,\,i\in \mathbb {N}_n\}\) and \(\mathcal {E}\) as per Sect. 3.2.
5. Define the support of \(X_i,\bar{X}_i\) as \(\varOmega _i \doteq Z_i\cup \{\phi _i\}\), where \(Z_i \doteq \{z_{i1},\ldots ,z_{in_i}\}\) and the dummy state \(\phi _i\) represents the complement of \(D_i\).
6. Compute the discrete CPDs \(T_i(\bar{X}_i = z_{ij}|v(Pa(\bar{X}_i))) = \int _{D_{ij}}t_i(\bar{s}_i|v(Pa(\bar{X}_i)))d\bar{s}_i\), assigning the remaining probability mass to the dummy state \(\phi _i\).
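For a one-dimensional marginal with additive Gaussian noise, one row of a CPD \(T_i\) can be computed from differences of the Gaussian CDF over the partition sets; the drift, noise level, and partition below are assumed for illustration only.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def cpd_row(f_id, sigma, parent_vals, edges):
    """One row of the CPD T_i: probabilities of landing in each partition set
    [edges[j], edges[j+1]) plus the absorbing state phi_i, for dynamics
    s_i(t+1) = f_id(s(t)) + zeta_i(t), zeta_i ~ N(0, sigma^2)."""
    mu = f_id(parent_vals)
    cdf = np.array([norm_cdf((e - mu) / sigma) for e in edges])
    probs = np.diff(cdf)              # mass of each partition set D_ij
    phi = 1.0 - probs.sum()           # remaining mass -> dummy state phi_i
    return np.append(probs, phi)

# Assumed scalar example: D_i = [-1, 1] split into 4 bins, drift f_id(s) = 0.5*s_1.
edges = np.linspace(-1.0, 1.0, 5)
row = cpd_row(lambda s: 0.5 * s[0], 0.5, [0.2], edges)
```

Each row marginalizes to one by construction, as required of the CPDs in \(\mathcal {T}\).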

3.4 Probabilistic invariance for the abstract DBN

We extend the use of \(\mathbb {P}\) to denote the probability measure on the set of events defined over a DBN with discrete r.v. \(\varvec{z} = (X_1,X_2,\ldots ,X_n)\). Given a discrete set \(Z_{\mathfrak a}\subset \prod _i\varOmega _i\), the probabilistic invariance problem asks to evaluate the probability \(p_N(\varvec{z}_0, Z_{\mathfrak a})\) that a finite execution associated with the initial condition \(\varvec{z}(0) = \varvec{z}_0\) remains within the set \(Z_{\mathfrak a}\) during the finite time horizon \(t=0,1,2,\ldots ,N\). Formally,

$$\begin{aligned} p_N(\varvec{z}_0,Z_{\mathfrak a}) = \mathbb {P}(\varvec{z}(t)\in Z_{\mathfrak a}, \text { for all } t=0,1,2,\ldots ,N|\varvec{z}(0) = \varvec{z}_0). \end{aligned}$$

This probability can be computed by a discrete analogue of the Bellman backward recursion (see [4] for details).

Theorem 1

Consider value functions \(V_k^d:\prod _i\varOmega _i\rightarrow [0,1]\), \(k=0,1,2,\ldots ,N\), computed by the backward recursion

$$\begin{aligned} V_k^d(\varvec{z}) = \varvec{1}_{Z_{\mathfrak a}}(\varvec{z}) \sum _{\bar{\varvec{z}}\in \prod _i\varOmega _i} V_{k+1}^d(\bar{\varvec{z}})\mathbb {P}(\bar{\varvec{z}}|\varvec{z}) \quad k=0,1,2,\ldots ,N-1, \end{aligned}$$
(8)

and initialized with \(V_N^d(\varvec{z}) = \varvec{1}_{Z_{\mathfrak a}}(\varvec{z})\). Then the solution of the invariance problem is characterized as \(p_N(\varvec{z}_0,Z_{\mathfrak a}) = V_0^d(\varvec{z}_0)\).

The discrete transition probabilities \(\mathbb {P}(\bar{\varvec{z}}|\varvec{z})\) in Eq. (8) are computed by taking the product of the CPDs in \(\mathcal {T}\). More specifically, for any \(\varvec{z},\bar{\varvec{z}}\in \prod _i\varOmega _i\) of the form \(\varvec{z} = (z_1,z_2,\ldots ,z_n),\bar{\varvec{z}} = (\bar{z}_1,\bar{z}_2,\ldots ,\bar{z}_n)\) we have

$$\begin{aligned} \mathbb {P}(\bar{\varvec{z}}|\varvec{z}) = \prod _iT_i(\bar{X}_i = \bar{z}_i|Pa(\bar{X}_i) = \varvec{z}). \end{aligned}$$
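As an illustration, the recursion of Theorem 1 together with the factorized transition probabilities can be sketched as follows. The two-dimensional abstraction and its CPDs are hypothetical toy values, and the product state space is enumerated explicitly here; Sect. 4 discusses how the sum-product algorithm avoids this enumeration.

```python
from itertools import product
from math import prod

def discrete_invariance(omegas, T, safe, N):
    """Backward recursion (8): V_k(z) = 1_{Za}(z) * sum_zbar V_{k+1}(zbar) P(zbar|z),
    with P(zbar|z) factored as prod_i T[i](zbar_i, z)."""
    states = list(product(*omegas))
    V = {z: 1.0 if z in safe else 0.0 for z in states}       # V_N = 1_{Za}
    for _ in range(N):
        V = {z: (sum(V[zb] * prod(T[i](zb[i], z) for i in range(len(omegas)))
                     for zb in states)
                 if z in safe else 0.0)
             for z in states}
    return V

# Hypothetical 2-D abstraction: per-dimension states {0, 1} plus an absorbing
# dummy state 'phi'; stay w.p. 0.8, switch w.p. 0.1, exit to phi w.p. 0.1.
def make_T(i):
    def Ti(zbar_i, z):
        if z[i] == 'phi':                     # phi is absorbing
            return 1.0 if zbar_i == 'phi' else 0.0
        if zbar_i == 'phi':
            return 0.1
        return 0.8 if zbar_i == z[i] else 0.1
    return Ti

omegas = [(0, 1, 'phi')] * 2
safe = set(product((0, 1), repeat=2))
V = discrete_invariance(omegas, [make_T(0), make_T(1)], safe, N=2)
p = V[(0, 0)]   # per step each dimension stays safe w.p. 0.9, so p = (0.81)^2
```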

Our algorithm for probabilistic invariance computes \(p_N(\varvec{z}_0, Z_{\mathfrak a})\) to approximate \(p_N(\varvec{s}_0, A)\), for suitable choices of \(\varvec{z}_0\) and \(Z_{\mathfrak a}\) depending on \(\varvec{s}_0\) and A. The natural choice for the initial state is \(\varvec{z}_0 =(z_1(0),\ldots ,z_n(0))\) with \(z_i(0) = \xi _i(\varPi _i(\varvec{s}_0))\). For A, the n-fold Cartesian product of the collection of the partition sets \(\{D_{ij}\},\,i\in \mathbb {N}_n\) generates a cover of A as

$$\begin{aligned} A&\subset \bigcup \{D_{1j}\}_{j=1}^{n_1}\times \{D_{2j}\}_{j=1}^{n_2}\times \ldots \times \{D_{nj}\}_{j=1}^{n_n}\\&= \bigcup _{\varvec{j}}\left\{ D_{\varvec{j}}| \varvec{j} = (j_1,j_2,\ldots ,j_n), D_{\varvec{j}} \doteq D_{1j_1}\times D_{2j_2}\times \ldots \times D_{nj_n}\right\} . \end{aligned}$$

We define the safe set \(Z_{\mathfrak a}\) of the DBN as

$$\begin{aligned} Z_{\mathfrak a} = \bigcup _{\varvec{j}}\left\{ (z_{1j_1},z_{2j_2},\ldots ,z_{nj_n}),\text { such that } A\cap D_{\varvec{j}}\ne \emptyset \text { for }\varvec{j} = (j_1,j_2,\ldots ,j_n)\right\} , \end{aligned}$$
(9)

which is a discrete representation of the continuous set \(\bar{A}\subset \mathbb {R}^n\)

$$\begin{aligned} \bar{A} = \bigcup _{\varvec{j}}\left\{ D_{\varvec{j}},\text { such that } \varvec{j} = (j_1,j_2,\ldots ,j_n), A\cap D_{\varvec{j}}\ne \emptyset \right\} . \end{aligned}$$
(10)

For instance, \(\bar{A}\) can be a finite union of hypercubes in \(\mathbb {R}^n\) if the partition sets \(D_{ij}\) are intervals. It is clear that the set \(\bar{A}\) is in general different from A.

There are thus two sources of error: first, the replacement of A with \(\bar{A}\); and second, the abstraction of the dynamics, i.e., the gap between the discrete outcome obtained by Theorem 1 and the continuous solution that results from (5). In the next section we provide a quantitative bound on the two sources of error.

3.5 Quantification of the error due to abstraction

Let us explicitly write the Bellman recursion (5) of the safety problem over the set \(\bar{A}\):

$$\begin{aligned} W_N(\varvec{s}) = \varvec{1}_{\bar{A}}(\varvec{s}),\quad W_k(\varvec{s}) = \varvec{1}_{\bar{A}}(\varvec{s})\int _{\bar{A}}W_{k+1}(\varvec{\bar{s}})t_{\mathfrak {s}}(\varvec{\bar{s}}|\varvec{s})d\varvec{\bar{s}},\quad k=0,1,2,\ldots ,N-1, \end{aligned}$$
(11)

which results in \(p_N(\varvec{s}_0,\bar{A}) = W_0(\varvec{s}_0)\). Theorem 2 characterizes the error due to replacing the safe set A by \(\bar{A}\).

Theorem 2

The solutions of the probabilistic invariance problem with the time horizon N and the two safe sets \(A,\bar{A}\) satisfy the inequality

$$\begin{aligned} |p_N(\varvec{s}_0,A)-p_N(\varvec{s}_0,\bar{A})|\le MN\mathcal {L}(A\varDelta \bar{A}), \quad \forall \varvec{s}_0\in A\cap \bar{A}, \end{aligned}$$

where \(M \doteq \sup \left\{ t_{\mathfrak {s}}(\varvec{\bar{s}}|\varvec{s})\,\big |\,\varvec{s}\in A\cap \bar{A},\,\varvec{\bar{s}}\in A\varDelta \bar{A}\right\} \), \(\mathcal {L}(B)\) denotes the Lebesgue measure of any set \(B\in \mathcal {B}\), and \(A\varDelta \bar{A} \doteq (A\backslash \bar{A})\cup (\bar{A}\backslash A)\) is the symmetric difference of the two sets \(A,\bar{A}\).

Proof

Recall the recursive equations for the probabilistic safety problem over sets A and \(\bar{A}\) as in (5) and (11), respectively. Solutions of the safety problems are \(p_N(\varvec{s}_0, A) = V_0(\varvec{s}_0)\) and \(p_N(\varvec{s}_0,\bar{A}) = W_0(\varvec{s}_0)\). We prove inductively that the inequality \(|V_{k}(\varvec{s})- W_{k}(\varvec{s})|\le M(N-k)\mathcal {L}(\bar{A}\varDelta A)\) holds for all \(\varvec{s}\in A\cap \bar{A}\). This inequality is true for \(k = N\) since \(V_{N}(\varvec{s}) = W_{N}(\varvec{s}) = 1\) for \(\varvec{s}\in A\cap \bar{A}\). For any \(k=0,1,2,\ldots ,N-1\) and any state \(\varvec{s}\in A\cap \bar{A}\) we have

$$\begin{aligned} |V_{k}(\varvec{s})- W_{k}(\varvec{s})|&\le \int _{A\cap \bar{A}}|V_{k+1}(\varvec{\bar{s}})- W_{k+1}(\varvec{\bar{s}})|t_{\mathfrak {s}}(\varvec{\bar{s}}|\varvec{s})d\varvec{\bar{s}}\\&\quad + \int _{A\backslash \bar{A}}V_{k+1}(\varvec{\bar{s}})t_{\mathfrak {s}}(\varvec{\bar{s}}|\varvec{s})d\varvec{\bar{s}} + \int _{\bar{A}\backslash A}W_{k+1}(\varvec{\bar{s}})t_{\mathfrak {s}}(\varvec{\bar{s}}|\varvec{s})d\varvec{\bar{s}}\\&\le M(N-k-1)\mathcal {L}(\bar{A}\varDelta A) + M\mathcal {L}(\bar{A}\backslash A)+M\mathcal {L}(A\backslash \bar{A})\\&= M(N-k)\mathcal {L}(\bar{A}\varDelta A). \end{aligned}$$

The inequality for \(k = 0\) proves the upper bound \(MN\mathcal {L}(\bar{A}\varDelta A)\) on \(|p_N(\varvec{s}_0, A)-p_N(\varvec{s}_0,\bar{A})|\).

\(\square \)
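The bound of Theorem 2 can be spot-checked numerically. The sketch below (a hypothetical scalar example, not a construct of this paper) runs the Bellman recursions (5) and (11) on a fine grid for the assumed dynamics \(s(t+1)=0.5\,s(t)+\zeta(t)\) with Gaussian noise, using \(A=[-1,1]\) and \(\bar{A}=[-1.05,1.05]\), and compares the resulting gap with \(MN\mathcal{L}(A\varDelta\bar{A})\).

```python
import numpy as np

def invariance_prob(lo, hi, a=0.5, sigma=0.5, N=3, n_grid=400):
    """Grid-based Bellman recursion for p_N(s0, [lo, hi]),
    with hypothetical dynamics s(t+1) = a*s(t) + N(0, sigma^2)."""
    s = np.linspace(lo, hi, n_grid)
    ds = s[1] - s[0]
    # transition density t(sbar | s) on the grid: rows index s, columns index sbar
    T = np.exp(-(s[None, :] - a * s[:, None]) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    V = np.ones(n_grid)                       # V_N = 1 on the safe set
    for _ in range(N):                        # backward recursion
        V = (T * V[None, :]).sum(axis=1) * ds
    return s, V

sigma, N = 0.5, 3
sA, VA = invariance_prob(-1.0, 1.0, sigma=sigma, N=N)
sB, VB = invariance_prob(-1.05, 1.05, sigma=sigma, N=N)
pA = VA[np.argmin(np.abs(sA))]                # p_N(s0, A)    at s0 ~ 0
pB = VB[np.argmin(np.abs(sB))]                # p_N(s0, Abar) at s0 ~ 0
M = 1.0 / (sigma * np.sqrt(2 * np.pi))        # sup of the Gaussian density
bound = M * N * 0.1                           # L(A Delta Abar) = 2 * 0.05
print(abs(pA - pB), bound)
```

The observed gap stays well below the bound, which is conservative by construction.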

The second contribution to the error is related to the discretization performed by Algorithm 1, and is quantified by imposing regularity conditions on the dynamics of the process. The following Lipschitz continuity assumption restricts the generality of the density functions \(t_j\) characterizing the dynamics of the model \(\mathscr {M}_{\mathfrak {s}}\).

Assumption 1

Assume that the density functions \(t_j(\bar{s}_j|\cdot )\) are Lipschitz continuous, with finite Lipschitz constants \(d_{ij}\ge 0\):

$$\begin{aligned} |t_j(\bar{s}_j|\varvec{s})-t_j(\bar{s}_j|\varvec{s'})|\le d_{ij}| s_i-s_i'|, \end{aligned}$$
(12)

with \(\varvec{s} = [s_1,\ldots ,s_{i-1},s_i,s_{i+1},\ldots ,s_n]\) and \(\varvec{s'} = [s_1,\ldots ,s_{i-1},s_i',s_{i+1},\ldots ,s_n]\), for all \(s_k,s_k',\bar{s}_k\in D_k\), \(k\in \mathbb {N}_n\), and for all \(i,j\in \mathbb {N}_n\).

Note that Assumption 1 holds with \(d_{ij} = 0\) if and only if \((X_i,\bar{X}_j)\notin \mathcal {E}\) in the DAG of the BN \(\mathfrak {B}_{\rightarrow }\). Assumption 1 enables us to assign non-zero weights \(w_{ij} = d_{ij}\mathcal {L}(D_j)\) to the arcs \((X_i,\bar{X}_j)\in \mathcal {E}\), for all \(i,j\in \mathbb {N}_n\), of the graph. We define the out-weight of the node \(X_i\) by \(\mathcal {O}_i = \sum _{j=1}^{n}w_{ij}\) and the in-weight of the node \(\bar{X}_j\) by \(\mathcal {I}_j = \sum _{i=1}^{n}w_{ij}\).
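The weights and the in-/out-weights are straightforward to compute once the constants \(d_{ij}\) and the measures \(\mathcal{L}(D_j)\) are available. A minimal sketch with hypothetical numbers (the matrix `d` and lengths `L` below are illustrative):

```python
import numpy as np

# Hypothetical Lipschitz constants d[i, j] and domain lengths L(D_j) for n = 3;
# d[i, j] = 0 encodes a missing arc (X_i, Xbar_j) in the DAG of the BN.
d = np.array([[0.4, 0.0, 0.0],
              [0.7, 0.3, 0.0],
              [0.0, 0.5, 0.2]])
L = np.array([2.0, 2.0, 1.0])       # Lebesgue measures L(D_j)

w = d * L[None, :]                  # arc weights w_ij = d_ij * L(D_j)
out_weight = w.sum(axis=1)          # O_i = sum_j w_ij
in_weight = w.sum(axis=0)           # I_j = sum_i w_ij
kappa = in_weight.sum()             # Lipschitz constant of Lemma 1 below
print(out_weight, in_weight, kappa)
```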

Remark 2

The Lipschitz constants \(d_{ij}\) in (12) can be obtained as upper bounds on the absolute value of the partial derivative (with respect to \(s_i\)) of the density function \(t_{j}(\bar{s}_j|\varvec{s})\). Each constant can be computed by symbolic or numeric differentiation, followed by a single local optimization over the partition of interest. Off-the-shelf software such as MATLAB can easily automate these computations. The software tool FAUST \(^{\mathsf 2}\) [14] already employs the computation of Lipschitz constants to perform Markov chain abstraction of stochastic systems. Note that Assumption 1 is a mild restriction on the class of stochastic systems under study and is not limited to linear dynamics. For instance, any non-linear system with additive noise \(s_j(t+1) = f_j(\varvec{s}(t))+\zeta (t)\), in which both \(f_j(\cdot )\) and the density function of \(\zeta (\cdot )\) are Lipschitz continuous, satisfies Assumption 1. \(\square \)
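As a sketch of the procedure outlined in the remark, the following code numerically bounds \(|\partial t/\partial s|\) for an assumed scalar Gaussian kernel \(t(\bar{s}|s)=\mathcal{N}(\bar{s};as,\sigma^2)\) and compares it with the closed-form constant \(|a|/(\sigma^2\sqrt{2\pi e})\) used later in Sect. 5; the grid and parameters are illustrative.

```python
import numpy as np

a, sigma = 0.8, 0.5                 # hypothetical scalar dynamics sbar = a*s + noise

def t(sbar, s):
    """Gaussian transition density N(sbar; a*s, sigma^2)."""
    return np.exp(-(sbar - a * s) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# numerically bound |d t / d s| over a grid of (s, sbar) pairs
grid = np.linspace(-2.0, 2.0, 801)
S, Sbar = np.meshgrid(grid, grid)
h = 1e-5
deriv = (t(Sbar, S + h) - t(Sbar, S - h)) / (2 * h)   # central difference
d_numeric = np.abs(deriv).max()

d_closed = abs(a) / (sigma ** 2 * np.sqrt(2 * np.pi * np.e))  # closed form (cf. Sect. 5)
print(d_numeric, d_closed)
```

The grid maximum approaches the closed-form bound from below, as expected of a discretized supremum.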

Remark 3

The above assumption additionally implies Lipschitz continuity of the conditional density functions \(t_j(\bar{s}_j|\varvec{s})\) with respect to the whole state vector. Since trivially \(|s_i - s'_i| \le \Vert \varvec{s} - \varvec{s}'\Vert \) for all \(i \in \mathbb {N}_n\), we obtain

$$\begin{aligned} |t_j(\bar{s}_j|\varvec{s})-t_j(\bar{s}_j|\varvec{s}')|\le \mathcal {H}_j \Vert \varvec{s}-\varvec{s}'\Vert \quad \forall \varvec{s},\varvec{s}'\in \bar{A},\bar{s}_j\in D_j, \end{aligned}$$

where \(\mathcal {H}_j = \sum _{i=1}^{n}d_{ij}\). The density function \(t_{\mathfrak {s}}(\varvec{\bar{s}}|\varvec{s})\) is also Lipschitz continuous if the density functions \(t_j(\bar{s}_j|\varvec{s})\) are bounded, but the boundedness assumption is not necessary for the results of this paper to hold. \(\square \)

Assumption 1 enables us to establish Lipschitz continuity of the value functions \(W_k\) in (11). This continuity property is essential in proving an upper bound on the discretization error of Algorithm 1, which we shall present in Corollary 1.

Lemma 1

Consider the value functions \(W_k(\cdot )\), \(k=0,1,2,\ldots ,N\), employed in the Bellman recursion (11) of the safety problem over the set \(\bar{A}\). Under Assumption 1, these value functions are Lipschitz continuous:

$$\begin{aligned} |W_k(\varvec{s})-W_k(\varvec{s}')|\le \kappa \Vert \varvec{s}-\varvec{s}'\Vert ,\quad \forall \varvec{s},\varvec{s}'\in \bar{A}, \end{aligned}$$

for all \(k=0,1,2,\ldots ,N\) with the constant \(\kappa = \sum _{j=1}^{n}\mathcal {I}_j\), where \(\mathcal {I}_j\) is the in-weight of the node \(\bar{X}_j\) in the DAG of the BN \(\mathfrak {B}_{\rightarrow }\).

Proof

The inequality holds for \(k=N\) since \(W_N(\varvec{s})=W_N(\varvec{s}') =1\) for any \(\varvec{s},\varvec{s}'\in \bar{A}\). For \(k=0,1,2,\ldots ,N-1\) and any \(\varvec{s},\varvec{s}'\in \bar{A}\) we have

$$\begin{aligned} |W_k(\varvec{s})-W_k(\varvec{s}')|&\le \int _{\bar{A}} W_{k+1}(\bar{\varvec{s}})|t_{\mathfrak {s}}(\bar{\varvec{s}}|\varvec{s})-t_{\mathfrak {s}}(\bar{\varvec{s}}|\varvec{s}')|d\bar{\varvec{s}}\\&\le \int _{\bar{A}}|t_{\mathfrak {s}}(\bar{\varvec{s}}|\varvec{s})-t_{\mathfrak {s}}(\bar{\varvec{s}}|\varvec{s}')|d\bar{\varvec{s}} \end{aligned}$$

Next, we exploit the multiplicative structure of the density function in the integrand on the right-hand side via a telescoping sum, to obtain:

$$\begin{aligned} |W_k(\varvec{s})&-W_k(\varvec{s}')|\le \int _{\bar{A}}\left| \prod _{i=1}^{n}t_i(\bar{s}_i|\varvec{s})-\prod _{i=1}^{n}t_i(\bar{s}_i|\varvec{s}')\right| d\bar{\varvec{s}}\nonumber \\&= \int _{\bar{A}}\left| \sum _{j=1}^{n}\left[ \prod _{i=1}^{j-1}t_i(\bar{s}_i|\varvec{s}')\prod _{i=j}^{n}t_i(\bar{s}_i|\varvec{s}) -\prod _{i=1}^{j}t_i(\bar{s}_i|\varvec{s}')\prod _{i=j+1}^{n}t_i(\bar{s}_i|\varvec{s})\right] \right| d\bar{\varvec{s}}\nonumber \\&\le \sum _{j=1}^{n}\int _{\bar{A}}\left[ \prod _{i=1}^{j-1}t_i(\bar{s}_i|\varvec{s}')\prod _{i=j+1}^{n}t_i(\bar{s}_i|\varvec{s}) \left| t_j(\bar{s}_j|\varvec{s})-t_j(\bar{s}_j|\varvec{s}')\right| \right] d\bar{\varvec{s}}\nonumber \\&\le \sum _{j=1}^{n}\int _{D_j}\left| t_j(\bar{s}_j|\varvec{s})-t_j(\bar{s}_j|\varvec{s}')\right| d\bar{s}_j\nonumber \\&\le \sum _{j=1}^{n}\mathcal {H}_j\Vert \varvec{s}-\varvec{s}'\Vert \mathcal {L}(D_j) =\Vert \varvec{s}-\varvec{s}'\Vert \sum _{j=1}^{n}\mathcal {H}_j\mathcal {L}(D_j) = \Vert \varvec{s}-\varvec{s}'\Vert \sum _{j=1}^{n}\mathcal {I}_j. \end{aligned}$$
(13)

\(\square \)

Corollary 1

The following inequality holds under Assumption 1:

$$\begin{aligned} |p_N(\varvec{s}_0, A)-p_N(\varvec{z}_0, Z_{\mathfrak a})|\le MN\mathcal {L}(A\varDelta \bar{A}) + N\kappa \delta \quad \forall \varvec{s}_0\in A, \end{aligned}$$

where \(p_N(\varvec{z}_0, Z_{\mathfrak a})\) is the invariance probability for the DBN obtained by Algorithm 1. The initial state of the DBN is \(\varvec{z}_0 =(z_1(0),\ldots ,z_n(0))\) with \(z_i(0) = \xi _i(\varPi _i(\varvec{s}_0))\). The set \(Z_{\mathfrak a}\) and the constant M are defined in (9) and Theorem 2, respectively. The diameter of the partition of Algorithm 1 is defined and used as

$$\begin{aligned} \delta = \sup \{\Vert \varvec{s}-\varvec{s}'\Vert \,:\, \varvec{s},\varvec{s}'\in D_{\varvec{j}},\,\, D_{\varvec{j}}\subset \bar{A}\}. \end{aligned}$$

Proof

Construction of the set \(\bar{A}\) in (10) implies that \(A\subseteq \bar{A}\). We use the triangle inequality and the bound established in Theorem 2 to get, for all \(\varvec{s}_0\in A\),

$$\begin{aligned} |p_N(\varvec{s}_0, A)-p_N(\varvec{z}_0, Z_{\mathfrak a})|&\le |p_N(\varvec{s}_0, A)-p_N(\varvec{s}_0,\bar{A})| + |p_N(\varvec{s}_0, \bar{A})-p_N(\varvec{z}_0, Z_{\mathfrak a})|\\&\le MN\mathcal {L}(A\varDelta \bar{A}) + |W_0(\varvec{s}_0) - V_0^d(\varvec{z}_0)|, \end{aligned}$$

where \(V_0^d\) and \(W_0\) are defined in (8) and (11), respectively. The DBN constructed in Sect. 3.3 is in fact a finite-state Markov chain with a specific structure. Combining this observation with the Lipschitz continuity of \(W_0\), proved in Lemma 1, enables us to utilize the bound provided in [2]: the error caused by the state-space discretization is upper-bounded by the product of three terms, namely the Lipschitz constant \(\kappa \) of the value functions, the horizon N of the invariance specification, and the diameter \(\delta \) of the partition selected for the set \(\bar{A}\). Then we have \(|W_0(\varvec{s}_0) - V_0^d(\varvec{z}_0)|\le N\kappa \delta \), which completes the proof. \(\square \)

The second error term in Corollary 1 is a linear function of the partition diameter \(\delta \), which depends on all partition sets along different dimensions. We are interested in proving a dimension-dependent error bound in order to parallelize the whole abstraction procedure along different dimensions. The next theorem gives this dimension-dependent error bound.

Theorem 3

The following inequality holds under Assumption 1:

$$\begin{aligned} |p_N(\varvec{s}_0, A)-p_N(\varvec{z}_0, Z_{\mathfrak a})|\le MN\mathcal {L}(A\varDelta \bar{A}) + N\sum _{i=1}^{n} \mathcal {O}_i\delta _i \quad \forall \varvec{s}_0\in A, \end{aligned}$$
(14)

with the constants defined in Corollary 1. \(\mathcal {O}_i\) is the out-weight of the node \(X_i\) in the DAG of the BN \(\mathfrak {B}_{\rightarrow }\). The quantity \(\delta _i\) is the maximum diameter of the partition sets along the i-th dimension:

$$\begin{aligned} \delta _i = \sup \{|s_i-s_i'| \,:\, s_i,s_i'\in D_{ij},\,\, j\in \mathbb {N}_{n_i}\}. \end{aligned}$$

Proof

The proof follows the same lines as those of Lemma 1. We refine the inequality (13) to obtain an upper bound for \(|W_k(\varvec{s}) -W_k(\varvec{s}')|\) localized to partition sets. Namely, for any \(\varvec{s},\varvec{s}'\in D_{\varvec{j}}\),

$$\begin{aligned}&|W_k(\varvec{s}) -W_k(\varvec{s}')| \le \sum _{j=1}^{n}\int _{D_j}\left| t_j(\bar{s}_j|\varvec{s})-t_j(\bar{s}_j|\varvec{s}')\right| d\bar{s}_j\\&\quad \le \sum _{j=1}^{n}\int _{D_j}\sum _{i=1}^nd_{ij}|s_i-s'_i|d\bar{s}_j \le \sum _{i,j=1}^{n}d_{ij}\delta _i\mathcal {L}(D_j) = \sum _{i=1}^{n}\mathcal {O}_i\delta _i. \end{aligned}$$

Next, we utilize the results of [10], which give an upper bound on the partitioning error based on the above local computation. This implies \(|W_0(\varvec{s}_0) - V_0^d(\varvec{z}_0)|\le N\sum _{i=1}^{n}\mathcal {O}_i\delta _i\). The rest of the proof is exactly the same as that of Corollary 1. \(\square \)

For a given error threshold \(\epsilon \), we can select the set \(\bar{A}\) and consequently the diameters \(\delta _i\) such that \(MN\mathcal {L}(A\varDelta \bar{A}) + N\sum _{i=1}^{n} \mathcal {O}_i\delta _i\le \epsilon \). Therefore, the generation of the abstract DBN, namely the selection of the partition sets \(\{D_{ij},\,j\in \mathbb {N}_{n_i}\}\) (according to the diameters \(\delta _i\)) and the computation of the CPDs, can be implemented in parallel across dimensions. For a given \(\epsilon \) and set \(\bar{A}\), the cardinality of the state space \(\varOmega _i\), \(i\in \mathbb {N}_n\), of the discrete random variable \(X_i\), and thus the size of the CPD \(T_i\), grow linearly as a function of the horizon N of the specification.
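One simple way to meet a given threshold \(\epsilon\), sketched below with hypothetical numbers, is to fix \(\bar{A}\) first and then split the residual budget \(\epsilon - MN\mathcal{L}(A\varDelta\bar{A})\) evenly over the n dimensions, which yields \(\delta_i\) inversely proportional to the out-weight \(\mathcal{O}_i\); this is one possible allocation, not the one prescribed by the paper.

```python
import numpy as np

def allocate_diameters(eps, N, M, L_sym_diff, O):
    """Choose delta_i so that M*N*L_sym_diff + N*sum_i O_i*delta_i <= eps (Theorem 3);
    the residual budget is split evenly over the n dimensions."""
    budget = eps - M * N * L_sym_diff          # error budget left for discretization
    assert budget > 0, "enlarge eps or shrink L(A Delta Abar)"
    n = len(O)
    return np.array([budget / (n * N * O_i) for O_i in O])

# hypothetical numbers
eps, N, M, L_sym = 0.2, 10, 0.5, 0.01
O = np.array([0.8, 2.0, 1.2])                  # out-weights O_i
delta = allocate_diameters(eps, N, M, L_sym, O)
total = M * N * L_sym + N * (O * delta).sum()  # total error bound (14)
print(delta, total)
```

Since each \(\delta_i\) depends only on \(\mathcal{O}_i\), the subsequent gridding of each dimension can proceed independently.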

Notice that the constant M defined in Theorem 2 and used in (14) depends on \(\bar{A}\) but can be replaced by

$$\begin{aligned} M_C = \sup \left\{ t_{\mathfrak {s}}(\varvec{\bar{s}}|\varvec{s})\big |\varvec{s},\varvec{\bar{s}}\in C\right\} , \end{aligned}$$
(15)

where C is any set that contains \(A\varDelta \bar{A}\). In order to tune the error in (14), one method is to select the set C as a box containing the safe set A, compute the constant \(M_C\) as in (15), and then choose \(\bar{A}\) such that \(A\subseteq \bar{A}\subseteq C\) with a suitable \(\mathcal {L}(\bar{A}\varDelta A)\). Subsequently, the partition diameters \(\delta _i\) are selected for this set \(\bar{A}\) to guarantee the error threshold \(\epsilon \).

4 Efficient model checking of the finite-state DBN

Existing numerical methods for model checking DBNs with discrete random variables transform the DBN into an explicit matrix representation [17, 23, 26], which defeats the purpose of a compact representation. Instead, we show that the multiplicative structure of the transition probability matrix can be incorporated into the computation, which makes the construction of \(\mathbb {P}(\bar{\varvec{z}}|\varvec{z})\) dispensable. For this purpose we employ factor graphs and the sum-product algorithm [21], originally developed for marginalizing functions and applied to belief propagation in Bayesian networks. Suppose that a global function is given as a product of local functions, and that each local function depends on a subset of the variables of the global map. In its most general form, the sum-product algorithm acts on factor graphs in order to marginalize the global function, i.e., to sum it over a subset of its variables, exploiting its product structure [21]. In our problem, we restrict the summation domain of the Bellman recursion (8) to \(\prod _i Z_i\), because the value functions are simply equal to zero in the complement of this set. The summand in (8) has the multiplicative structure

$$\begin{aligned} g(\varvec{z},\bar{\varvec{z}}) \doteq \varvec{1}_{Z_{\mathfrak a}}(\varvec{z})V_{k+1}^d(\bar{\varvec{z}})\prod _i T_i(\bar{X}_i = \bar{z}_i|Pa(\bar{X}_i) = \varvec{z}), \,\,\,V_k^d(\varvec{z}) =\sum _{\bar{\varvec{z}}\in \prod _i Z_i}g(\varvec{z},\bar{\varvec{z}}). \end{aligned}$$
(16)

The function \(g(\varvec{z},\bar{\varvec{z}})\) depends on the variables \(\{z_i,\bar{z}_i,\,i\in \mathbb {N}_n\}\). The factor graph of \(g(\varvec{z},\bar{\varvec{z}})\) has 2n variable nodes, one for each variable, and \((n+2)\) function nodes, one for each of the local functions \(\varvec{1}_{Z_{\mathfrak a}},V_{k+1}^d,T_i\). An arc connects a variable node to a function node if and only if the variable is an argument of the local function. The factor graph of Example 4 for \(n = 4\) is presented in Fig. 3. Factor graphs of general functions \(g(\varvec{z},\bar{\varvec{z}})\) in (16) are similar to that in Fig. 3: the only part that needs to be modified is the set of arcs connecting the variable nodes \(\{z_i,\,i\in \mathbb {N}_n\}\) and the function nodes \(\{T_i,\,i\in \mathbb {N}_n\}\). This part of the graph can be obtained from the DAG of \(\mathfrak {B}_{\rightarrow }\) of the DBN.
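The computational point of (16), namely that \(\mathbb{P}(\bar{\varvec{z}}|\varvec{z})\) never needs to be built, can be illustrated on a small hypothetical two-variable DBN with CPDs \(T_1(\bar{z}_1|z_1)\) and \(T_2(\bar{z}_2|z_1,z_2)\): summing out \(\bar{z}_2\) and then \(\bar{z}_1\) factor by factor gives the same value function as the explicit \(O(K^4)\) transition tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4                                          # bins per dimension (hypothetical)
# CPDs of a two-variable DBN: T1(zbar1 | z1) and T2(zbar2 | z1, z2)
T1 = rng.random((K, K)); T1 /= T1.sum(axis=1, keepdims=True)
T2 = rng.random((K, K, K)); T2 /= T2.sum(axis=2, keepdims=True)
V_next = rng.random((K, K))                    # V_{k+1}^d on the abstract states
safe = np.ones((K, K))                         # indicator of Z_a (all-safe here)

# explicit construction of P(zbar | z): O(K^4) memory
P = T1[:, None, :, None] * T2[:, :, None, :]   # indices (z1, z2, zbar1, zbar2)
V_explicit = safe * np.einsum('abcd,cd->ab', P, V_next)

# factored computation: sum out zbar2, then zbar1, never forming P
tmp = np.einsum('abd,cd->abc', T2, V_next)             # sum over zbar2
V_factored = safe * np.einsum('ac,abc->ab', T1, tmp)   # sum over zbar1

print(np.abs(V_explicit - V_factored).max())
```

The factored route keeps the largest intermediate array at \(O(K^3)\) instead of \(O(K^4)\), mirroring the memory savings discussed below.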

Fig. 3
figure 3

Factor graph of the linear stochastic system (7) for \(n=4\)

Fig. 4
figure 4

Spanning tree of the linear stochastic system in (7) for \(n=4\) and two orderings \((\bar{z}_4,\bar{z}_3,\bar{z}_2,\bar{z}_1)\) (top plot) and \((\bar{z}_1,\bar{z}_2,\bar{z}_3,\bar{z}_4)\) (bottom plot)

The factor graph of a function \(g(\varvec{z},\bar{\varvec{z}})\) contains loops for \(n\ge 2\) and must be transformed into a spanning tree using the clustering and stretching transformations [21]. For this purpose, the order of clustering the function nodes \(\{T_i,i\in \mathbb {N}_n\}\) and that of stretching the variable nodes \(\{z_i,\, i\in \mathbb {N}_n\}\) need to be chosen. Figure 4 presents the spanning trees of the stochastic system in (7) for two such orderings. The variable nodes at the bottom of each spanning tree specify the order of the summation, whereas the function nodes, read from left to right, indicate the order of multiplication of the local functions. The remaining variable nodes show the arguments of the intermediate functions, which reflect the memory required to store such functions. The computational complexity of the solution carried out on the spanning tree clearly depends on this ordering.

Algorithm 2 presents a greedy procedure that operates on the factor graph and provides an ordering of the variables and of the functions to reduce the overall memory usage. This algorithm iteratively combines the function nodes and selects the next variable node over which the summation is carried out. Step 1 initializes the algorithm by distinguishing three sets of nodes: \(\mathcal {U}_1 = \{z_1,z_2,\ldots ,z_n\}\) and \(\mathcal {U}_2 = \{\bar{z}_1,\bar{z}_2,\ldots ,\bar{z}_n\}\) contain the variable nodes, and \(\mathcal {U}_3 = \{T_1,T_2,\ldots ,T_n\}\) contains the function nodes. The sequences \(e_{\mathfrak f}\) and \(\kappa _{\mathfrak f}\) are initially empty and will contain the functions and variables over which the products and sums of the sum-product algorithm are performed. These sequences are built progressively during the while loop of the algorithm.

In each iteration of the while loop we compute the sets of nodes from \(\mathcal {U}_1\) and \(\mathcal {U}_2\) connected to the elements of \(\mathcal {U}_3\), through the functions \(Pa_{\mathfrak f}\) and \(Ch_{\mathfrak f}\), respectively. Steps 4–6 modify the graph by combining the nodes in \(\mathcal {U}_3\) that are connected to the same set of nodes in \(\mathcal {U}_1\), since these function nodes have the same conditioning variables and their memory usage is exactly the same. Step 7 selects the next function and variable nodes for performing the product and sum in the sum-product algorithm, such that the required memory is minimal among the possible selections. Finally, step 8 updates the sets after this selection.

Note that Algorithm 2 is applied to the factor graph of the system, which has only \((3n+2)\) nodes. In contrast, the memory usage of DBN model checking is a polynomial function of the number of partition sets, which is in general much larger than \((3n+2)\) for practical accuracies. Thus the overhead of Algorithm 2 is definitely worthwhile when viewed from the perspective of the attained memory savings. Since Algorithm 2 computes the ordering progressively, its outcome depends on the structure of the factor graph and is sub-optimal in general. The output of this algorithm applied to the factor graph of Example 4 consists of the orderings \(\kappa _{\mathfrak f} = (\bar{z}_4,\bar{z}_3,\bar{z}_2,\bar{z}_1)\) and \(e_{\mathfrak f} = (T_4,T_3,T_2,T_1)\), starting from the outermost sum, which corresponds to the spanning tree at the top of Fig. 4.
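In the same spirit as Algorithm 2 (but not reproducing it), a greedy elimination ordering can be sketched as follows: at each step, sum out the \(\bar{z}\)-variable whose merged factor has the fewest arguments, which keeps the intermediate functions small. The chain-structured factor sets below are illustrative.

```python
def greedy_order(factors, elim_vars):
    """Greedily order elim_vars: repeatedly sum out the variable whose merged
    factor has the fewest arguments (a proxy for intermediate memory)."""
    factors = [set(f) for f in factors]
    order = []
    remaining = set(elim_vars)
    while remaining:
        def cost(v):                # size of the factor created by eliminating v
            return len(set().union(*[f for f in factors if v in f]))
        v = min(remaining, key=cost)
        merged = set().union(*[f for f in factors if v in f])
        factors = [f for f in factors if v not in f] + [merged - {v}]
        order.append(v)
        remaining.discard(v)
    return order

# chain-structured example: T_i depends on (z_{i-1}, z_i, zbar_i);
# the last factor plays the role of V_{k+1}
factors = [{'z1', 'zbar1'}, {'z1', 'z2', 'zbar2'},
           {'z2', 'z3', 'zbar3'}, {'zbar1', 'zbar2', 'zbar3'}]
order = greedy_order(factors, ['zbar1', 'zbar2', 'zbar3'])
print(order)
```

Like Algorithm 2, this greedy choice is made progressively and is therefore sub-optimal in general.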

figure b

5 Comparison with the state of the art

In this section we compare our approach with the state-of-the-art abstraction procedure presented in [2] (referred to as \(\mathrm {AKLP}\) in the following), which does not exploit the structure of the dynamics. The \(\mathrm {AKLP}\) algorithm approximates the concrete model with a finite-state Markov chain by uniformly gridding the safe set. As in our work, the error bound of the \(\mathrm {AKLP}\) procedure depends on the global Lipschitz constant of the density function of the model; however, it does not exploit the structure of the dynamics as proposed in this work. We compare the two procedures on (1) error bounds and (2) computational resources.

Consider the stochastic linear dynamical model in (7), where \(\varPhi = [a_{ij}]_{i,j}\) is an arbitrary matrix. The Lipschitz constants \(d_{ij}\) in Assumption 1 can be computed as \(d_{ij} = |a_{ji}|/(\sigma _j^2\sqrt{2\pi e})\), where e is Euler's number. From Theorem 3, we get the following error bound:

$$\begin{aligned} e_{\mathrm {DBN}} \doteq MN\mathcal {L}(A\varDelta \bar{A})+\frac{N}{\sqrt{2\pi e}}\sum _{i,j=1}^n\frac{|a_{ji}|}{\sigma _j^2}\mathcal {L}(D_j)\delta _i. \end{aligned}$$

On the other hand, the error bound for \(\mathrm {AKLP}\) is

$$\begin{aligned} e_{\mathrm {AKLP}} = MN\mathcal {L}(A\varDelta \bar{A}) +\frac{Ne^{-1/2}}{(\sqrt{2\pi })^n\sigma _1\sigma _2\cdots \sigma _n}\Vert \varSigma ^{-1/2}\varPhi \Vert _2\delta \mathcal {L}(A). \end{aligned}$$
Table 1 Comparison of the \(\mathrm {AKLP}\) and the DBN-based algorithms, over the stochastic linear dynamical model (7)

In order to meaningfully compare the two error bounds, select the set \(A = [-\alpha ,\alpha ]^n\) and \(\sigma _i = \sigma \), \(i\in \mathbb {N}_n\), and consider hypercubes as partition sets. The two error terms then become

$$\begin{aligned} e_{\mathrm {DBN}} = \varsigma n\eta \left( \frac{\Vert \varPhi \Vert _1}{n\sqrt{n}}\right) , \quad e_{\mathrm {AKLP}} = \varsigma \eta ^n\Vert \varPhi \Vert _2, \quad \eta = \frac{2\alpha }{\sigma \sqrt{2\pi }}, \quad \varsigma = \frac{N\delta }{\sigma \sqrt{e}}, \end{aligned}$$

where \(\Vert \varPhi \Vert _1\) and \(\Vert \varPhi \Vert _2\) are the entry-wise one-norm and the induced two-norm of the matrix \(\varPhi \), respectively. The error \(e_{\mathrm {AKLP}}\) depends exponentially on the dimension n through \(\eta ^n\), whereas we have reduced this term to a linear one \((n\eta )\) in our proposed new approach, resulting in the error \(e_{\mathrm {DBN}}\). Note that \(\eta \le 1\) means that the standard deviation of the process noise is larger than the size of the selected safe set: in this case the value functions (which characterize the probabilistic invariance problem) uniformly converge to zero with rate \(\eta ^n\); clearly the case \(\eta >1\) is more interesting. On the other hand, for any matrix \(\varPhi \) we have \(\frac{\Vert \varPhi \Vert _1}{n\sqrt{n}}\le \Vert \varPhi \Vert _2\). This second term indicates how sparsity is reflected in the error computation. Denote by r the degree of connectivity of the DAG of \(\mathfrak {B}_{\rightarrow }\) for this linear system, i.e., the maximum number of non-zero elements in the rows of the matrix \(\varPhi \). We adapt the following inequalities from [20] for the norms of the matrix \(\varPhi \) (refer to the Appendix for a formal proof):

$$\begin{aligned} \Vert \varPhi \Vert _2\le \sqrt{nr}\max _{i,j}|a_{ij}|, \qquad \frac{\Vert \varPhi \Vert _1}{n\sqrt{n}}\le \frac{r}{\sqrt{n}}\max _{i,j}|a_{ij}|, \end{aligned}$$
(17)

which show that, for a fixed dimension n, sparse dynamics result in better error bounds in the new approach than fully connected dynamics.
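The inequalities in (17), together with \(\Vert \varPhi \Vert _1/(n\sqrt{n})\le \Vert \varPhi \Vert _2\), can be spot-checked numerically on a random matrix with at most r non-zero entries per row (an illustrative check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 6, 2                                    # dimension and connectivity degree
Phi = np.zeros((n, n))
for i in range(n):                             # at most r non-zeros per row
    cols = rng.choice(n, size=r, replace=False)
    Phi[i, cols] = rng.uniform(-1.0, 1.0, size=r)

amax = np.abs(Phi).max()
two_norm = np.linalg.norm(Phi, 2)              # induced two-norm
one_norm = np.abs(Phi).sum()                   # entry-wise one-norm
print(two_norm, np.sqrt(n * r) * amax, one_norm / (n * np.sqrt(n)), r * amax / np.sqrt(n))
```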

In order to compare computational resources, consider the numerical values \(N = 10\), \(\alpha = 1\), \(\sigma = 0.2\), and the error threshold \(\epsilon = 0.2\) for the lower bidiagonal matrix \(\varPhi \) with all the non-zero entries set to one. Table 1 compares the number of required partition sets (or bins) per dimension, the number of marginals, and the required number of (addition and multiplication) operations for the verification step, for models of different dimensions (number of continuous variables n). The numerical values in Table 1 confirm that, for a given upper bound \(\epsilon \) on the error, the number of bins per dimension and the required marginals grow exponentially in the dimension for \(\mathrm {AKLP}\) and polynomially for our DBN-based approach. For instance, to ensure the error is at most \(\epsilon \) for the model of dimension \(n=4\), the cardinality of the partition of each dimension for the uniform gridding and for the structured approach is \(2.9\times 10^5\) and \(8.5\times 10^3\), respectively. Then, \(\mathrm {AKLP}\) requires storing \(4.8\times 10^{43}\) entries (which is infeasible!), whereas the DBN approach requires \(1.8\times 10^{12}\) entries (\(\sim 8\)GB). The numbers of operations required for the computation of the safety probability are \(1.1\times 10^{45}\) and \(3.5\times 10^{21}\), respectively. This shows a substantial reduction in memory usage and computational effort: with given memory and computational resources, the DBN-based approach, compared with \(\mathrm {AKLP}\), promises to handle systems with a dimension that is at least twice as large.

Statistical model checking (SMC) [24] is an alternative approach to analyse general probabilistic systems against temporal specifications. Our approach is distinct from SMC in the type of guarantees we provide on the numerical outcomes: our approach provides absolute guarantees for the satisfaction of the safety specification, as in Corollary 1, whereas SMC provides probabilistic guarantees (i.e., with a given confidence). Moreover, we can compute safety probabilities for any initial state of the process belonging to a continuous domain, whereas the SMC approach can handle only a finite set of initial states, with a computational complexity that is linear in the cardinality of this set. Therefore SMC by itself cannot handle continuous domains of initial states as we do in this article. To address this, one option would be to partition the set of initial states, verify the process for representative points of the partition sets, and then perform a sensitivity analysis to judge the satisfaction of the specification for non-evaluated initial states. Such a sensitivity analysis can be seen as a special case of our approach (i.e., using the Lipschitz continuity of the density function to prove that the property is a smooth function of the initial state). Finally, our approach can also be extended to models with non-determinism, a feature notoriously difficult for existing SMC algorithms and tools.

6 Numerical case study

In this section we present a model for a metabolic network [1] based on a stochastic process, and compute the invariance probability over the model. A metabolic reaction network consists of a set of \(\mathfrak c\) metabolites and a related set of \(\mathfrak b\) fluxes between the metabolite pools. The concentrations of the metabolites are represented by a vector \(\varvec{c}\in (\mathbb {R}_{\ge 0})^{\mathfrak c}\), and the set of fluxes is denoted by a vector \(\varvec{v}\in \mathbb {R}^{\mathfrak b}\). The metabolic network considered in this section is adapted from [1] and displayed in Fig. 5.

The material fluxes in \(\varvec{v}\) depend on enzymatic reaction mechanisms (for instance, Michaelis-Menten kinetics), substrate concentrations and allosteric effectors (vector \(\varvec{c}\)), and parameters of the mechanisms (\(\varvec{\alpha }\), encompassing for instance affinity constants, maximal conversion rates, etc.). The rates of change of the concentrations in \(\varvec{c}\) are described by balancing the in- and out-fluxes for each metabolite pool. These balances can be expressed via a stoichiometric matrix \(N_{\mathfrak r}\in \mathbb {Z}^{\mathfrak c\times \mathfrak b}\), which relates the number of balanced metabolites to the reactions present in the network, and via the flux functions in the vector \(\varvec{v}\). These fluxes depend on the metabolite concentrations \(\varvec{c}\), as well as on the physico-chemical parameters \(\varvec{\alpha }\) (e.g. kinetic parameters), and on additional parameters \(\varvec{\beta }\) encompassing operational variables (e.g. substrate feed to the reactor, dilution rates, and other experimental settings), as follows:

$$\begin{aligned} d\varvec{c} = N_{\mathfrak r} \varvec{v}(\varvec{c},\varvec{\alpha },\varvec{\beta })dt+d\varvec{w}_t, \end{aligned}$$

where \(\{\varvec{w}_t,t\in \mathbb {R}_{\ge 0}\}\) is a Wiener process that additively captures uncertainties in the parameters and in the unmodeled dynamics of the metabolic network. A discrete-time dynamical model can be obtained by time sampling via the well-known Euler-Maruyama scheme, which yields

$$\begin{aligned} \varvec{c}(t+1) = \varvec{c}(t) + N_{\mathfrak r} \varvec{v}(\varvec{c}(t),\varvec{\alpha },\varvec{\beta })\tau + \varvec{\zeta }(t), \end{aligned}$$
(18)

where \(\tau \) is the sample time and \(\{\varvec{\zeta }(t),\,\,t\in \mathbb {N}\}\) is an iid Gaussian random sequence.
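As a minimal illustration of the sampled dynamics (18), the sketch below simulates one trajectory of a toy two-metabolite network over the horizon \(N=8\); the stoichiometric matrix and kinetics are hypothetical placeholders, not those of the network in Fig. 5.

```python
import numpy as np

tau, sigma = 0.05, 0.2                         # sample time and noise level
N_r = np.array([[1.0, -1.0],
                [0.0,  1.0]])                  # toy stoichiometric matrix

def v(c):
    """Toy flux vector: constant uptake plus a Michaelis-Menten-like conversion."""
    v_upt = 0.8
    return np.array([v_upt, c[0] / (0.5 + abs(c[0]))])

def step(c, rng):                              # one Euler-Maruyama step as in (18)
    zeta = rng.normal(0.0, sigma, size=c.shape)
    return c + N_r @ v(c) * tau + zeta

rng = np.random.default_rng(2)
c = np.zeros(2)
traj = [c]
for _ in range(8):                             # horizon N = 8
    c = step(c, rng)
    traj.append(c)
print(traj[-1])
```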

Fig. 5
figure 5

Metabolic network considered for case study presented in Sect. 6. The network presents two extracellular metabolites (\(A_{ex}\) and \(E_{ex}\)), and five intracellular ones (A to E). Arrows are labeled with metabolic fluxes, affecting the metabolites concentrations dynamics as per (18)

The state vector denoting the concentrations of the metabolites in (18) is \(\varvec{c} = [c_A,c_B,c_C,c_D,c_E]^T\in \mathbb {R}^5\). The stoichiometric matrix \(N_{\mathfrak r}\) can be written as

$$\begin{aligned} N_{\mathfrak r} = \left[ \begin{array}{ccccccc} 1 & -1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & -1 & 0 & 0 & -1 & 0\\ 0 & 0 & 1 & -1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & -1 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 1 & -1 \end{array} \right] , \end{aligned}$$

and the fluxes vector \(\varvec{v} = [v_{upt},v_1,v_2,\,v_3,v_4,v_5,v_6]^T\), where \(v_{upt}\) is assumed to be a constant input flux. The structure and parameters of the kinetic equations are reported in Tables 2 and 3, respectively.

Table 2 Kinetic equations in the metabolic network of the case study in Sect. 6

The one-step conditional density function of the network is a multivariate Gaussian \(t_{\mathfrak {s}}(\bar{\varvec{c}}|\varvec{c})\sim \mathcal {N}(\varvec{m}(\varvec{c}),\varSigma )\), with a mean \(\varvec{m}(\varvec{c})\) that depends on the state vector \(\varvec{c}\) as follows:

$$\begin{aligned} \varvec{m}(\varvec{c}) = \left[ \begin{array}{l} m_1(c_A,c_B)\\ m_2(c_A,c_B,c_E)\\ m_3(c_B,c_C,c_E)\\ m_4(c_C,c_D)\\ m_5(c_B,c_D,c_E) \end{array} \right] = \left[ \begin{array}{l} c_A + \tau \left[ v_{upt}-v_1(c_A,c_B)\right] \\ c_B + \tau \left[ v_1(c_A,c_B) -v_2(c_B,c_E) -v_5(c_B,c_E)\right] \\ c_C + \tau \left[ v_2(c_B,c_E) - v_3(c_C)\right] \\ c_D + \tau \left[ v_3(c_C)-v_4(c_D)\right] \\ c_E + \tau \left[ v_4(c_D) + v_5(c_B,c_E) - v_6(c_E)\right] \\ \end{array} \right] , \end{aligned}$$

and with the covariance matrix \(\varSigma \) of the noise \(\{\varvec{\zeta }(t)\}\). The two-layered BN \(\mathfrak {B}_{\rightarrow }\) associated with the metabolic network (18) is presented in Fig. 6.

Table 3 Parameter values used for the metabolic network of the case study in Sect. 6 (all parameters \(v_{max}\) have unit \([\mu mol/(gCDW.s)]\), \(K_{mA}\) and \(K_{mB}\) in the kinetic equation of \(v_2\) respectively have units \([(\mu mol/gCDW)^{hA}]\) and \([(\mu mol/gCDW)^{hB}]\), all other \(K_{mA},K_{mP}\) have the metabolite concentration unit \([\mu mol/gCDW]\))
Fig. 6
figure 6

Two-layered BN \(\mathfrak {B}_{\rightarrow }\) associated with the metabolic network of Fig. 5, with dynamics in (18)

Fig. 7
figure 7

Solution of the probabilistic invariance problem for the case study of Sect. 6, as a function of initial states of the process. Each plot represents the solution as a function of two initial metabolite concentrations (with units \([\mu mol/gCDW]\)), where the other three initial concentrations have been taken to be equal to zero

We assume the noise terms affecting the reaction equations in (18) are independent [15], which makes the covariance matrix diagonal, \(\varSigma = diag([\sigma _1^2,\sigma _2^2,\ldots ,\sigma _5^2])\). We use Lemma 2 in the appendix to compute the weights \(w_{ij}\) associated with the DAG of \(\mathfrak {B}_{\rightarrow }\). These weights can be written as \(w_{ij} = 2h_{ij}/(\sigma _i\sqrt{2\pi })\), where \(h_{ij}\) is the Lipschitz constant of the mean \(m_i(\varvec{c})\) with respect to the \(j^{\text {th}}\) element of \(\varvec{c}\). We consider the safe set \(A = [0,1]^5\), the time step \(\tau = 0.05\), the time horizon \(N = 8\), the input flux \(v_{upt} = 0.8\), and the standard deviations \(\sigma _i = 0.2\). The numbers of bins per dimension \([55,13,8,8,9]\times 10^2\) are required to guarantee the error threshold \(\epsilon = 0.2\). The solution of the invariance problem is presented in Fig. 7. Each plot represents the solution as a function of two initial state variables, with the other three initial states set to zero. These simulation results indicate that the concentrations of all metabolites remain within the interval [0, 1] over the time horizon \(N=8\) with high probability for initial concentrations close to zero. The probability decreases for initial concentrations close to one: the noise term in Eq. (18) pushes the concentrations outside of the interval with higher probability when the initial concentrations are close to the upper limit of the interval.

7 Conclusions and future directions

While we have focused on probabilistic invariance, our abstraction approach can be extended to more general properties expressed within the bounded-horizon fragment of PCTL [28] or to bounded-horizon linear temporal properties [3, 30], since the model checking problem for these logics reduces to the computation of value functions similar to the Bellman recursion scheme. Our focus in this paper has been the foundations of DBN-based abstraction for general Markov processes: factored representations, error bounds, and algorithms. We are currently implementing these algorithms in the FAUST \(^{\mathsf 2}\) tool [14], and scaling the algorithms using dimension-dependent adaptive gridding [10] as well as implementations of the sum-product algorithm on top of data structures such as algebraic decision diagrams (as in probabilistic model checkers [22]).