1 Introduction

Often in formal verification one is interested in approximations of concrete models. Models are often built from experimental data that are themselves approximate, and taking approximations can reduce the size and complexity of the state space. Markov models in particular can be defined either syntactically, as a transition structure (with states and matrices), or semantically, as a random process whose trajectories satisfy the Markov property. Each representation gives rise to its own notion of approximation [1]: “the transition matrices have similar numbers and/or structure” vs “the trajectories have similar probability distributions”, respectively. While the syntactic representation is used for computations and model checking with concrete numbers, one is often interested in results in terms of the semantics, e.g. “what is the probability of reaching a failure state within 100 steps”. This gives practical value to studying how approximations in terms of transition matrices translate into approximations in terms of traces of the random process.

In this paper we build on the notion of \(\varepsilon \)-approximate probabilistic bisimulation, introduced in [13] as a natural extension of exact probabilistic bisimulation [12]. There, the notion of \(\varepsilon \)-approximate probabilistic bisimulation (or just \(\varepsilon \)-bisimulation) is defined in terms of the transition structure, and, for a given \(\varepsilon \), the maximal \(\varepsilon \)-bisimulation relation can be computed for finite-state systems with n states in \(\mathcal O(n^7)\) time [13].

It is, on the other hand, of interest to explore what \(\varepsilon \)-bisimulation means in terms of trajectories. While \(\varepsilon \)-bisimulation does have characterizations (on countable state spaces) in terms of logics and games [12], the logic involved is branching in nature and does not directly relate to the trajectories of the model, as it leaks information about the state space (similarly to the difference between CTL and LTL), as illustrated in Fig. 1.

Fig. 1.

Branching vs Linear Time Behaviour. In the Labelled Markov Chain below (cf. Sect. 2 for the LMC model), the states \(s_1, s_2\) both emit the traces \( \langle \{a\}, \{a\}, \{b\} \rangle \) and \( \langle \{a\}, \{a\}, \{c\} \rangle \) with probability 0.5 each, and hence \(s_1, s_2\) have the same linear time behaviour. However, \(s_1, s_2\) have different branching behaviour, since only \(s_1\) satisfies the PCTL formula \( \mathrm {P}_{=1}\left[ \; \mathrm {X} \; \mathrm {P}_{=0.5}\left[ \; \mathrm {X} \; b \; \right] \; \right] \), and conversely only \(s_2\) satisfies \( \mathrm {P}_{=0.5}\left[ \; \mathrm {X} \; \mathrm {P}_{=1}\left[ \; \mathrm {X} \; b \; \right] \; \right] \).

In this paper, we investigate what \(\varepsilon \)-approximate probabilistic bisimulation means in terms of trajectories. We will prove that, for Labelled Markov Chains (over potentially uncountable state spaces), \(\varepsilon \)-bisimulation between two states places the tight upper bound of \(1-(1-\varepsilon )^k\) (which is \(\le k\varepsilon \)) on the total variation [18] between the distributions of length-\((k+1)\) traces starting from those states, for all \(k \in \mathbb N\). We will formulate these bounds by introducing the notion of f(k)-trace equivalence. As such, we extend the well-known result that bisimulation implies trace equivalence in non-probabilistic systems to the context of approximate and probabilistic models (the exact probabilistic case having been considered in [7]).

One direct consequence of our result is a method to efficiently bound the total variation of length-k traces from two finite-state LMCs (or from two states of an LMC), since the aforementioned result in [13] can be used to decide \(\varepsilon \)-bisimulation between two states, or to compute the maximal \(\varepsilon \)-bisimulation relation, in polynomial time. We will also apply our results to the quantitative verification of continuous-state Markov models [2, 3, 16], improving on the class of properties that can currently be approximated and on the corresponding approximation errors.

Related Work. The literature on approximations of (finite-state) Markov models can be divided into two main branches: one focusing on one-step similarity, the other dealing with trace distances. One-step similarity can be studied via the notion of probabilistic bisimulation, introduced in the context of finite-state models by [20] and related to lumpability in [23]. [13] discusses a notion of approximate bisimulation, related to the quasi-lumpability conditions in [6, 22]. From the perspective of process algebra, [17] studies operators on probabilistic transition systems that preserve the approximate bisimulation distance. The work in [13, 21] is seminal in introducing notions of (exact) probabilistic simulation, much extended in subsequent literature.

On the other hand, there are a few papers studying the total variation distance over traces. [9] presents an algorithm for approximating the total variation of infinite traces of labelled Markov systems, and proves that deciding whether it exceeds a given threshold is NP-hard. [8] shows that the undiscounted bisimilarity pseudometric is a (non-tight) upper bound on the total variation of infinite traces (like \(\varepsilon \)-bisimulation, the bisimilarity pseudometric is defined on the syntax of the model, and there are efficient algorithms for computing it [8, 10, 24]). Our contribution focuses on finite rather than infinite traces: the resulting bound on the total variation of finite traces is much less conservative, and moreover allows manipulating models under specific error bounds on length-k traces, as we will show in Sect. 5.

[14] studies notions based on the total variation of finite and infinite traces: employing a different notion of \(\varepsilon \)-bisimulation than ours, it proves error bounds on trace distances, which however depend on additional properties of the structure of the transition kernel (as shown in Sect. 7, the error bound on reachability probabilities can go to 1 in two steps, for any \(\varepsilon > 0\)). Finite abstractions of continuous-state Markov models can be synthesised via notions that are variations of the \(\varepsilon \)-bisimulation in this work [1, 3, 16]. Tangential to our work, [11] shows that the total variation of finite traces can be statistically estimated via repeated observations. [26] investigates ways of compressing Hidden Markov Models by searching for a smaller model that minimises the total variation of the length-k traces of the two models.

Structure of this article. In Sect. 2, we introduce the reference model (labelled Markov chains – LMC – over general state spaces) and provide a definition of \(\varepsilon \)-bisimulation for LMCs. In Sect. 3, we introduce the notion of approximate probabilistic trace equivalence (and the derived notion of probabilistic trace distance), and discuss how it relates to bounded linear time properties, and to the notion of distinguishability. In Sect. 4, we present the main result: we will derive a tight upper bound on the probabilistic trace distance between \(\varepsilon \)-bisimilar states. In Sect. 5, we show how these results can be used to approximately model check continuous state systems, and Sect. 6 discusses a case study. In Sect. 7, we discuss an alternative notion of approximate probabilistic bisimulation that appears in literature and show that it cannot be used to effectively bound probabilistic trace distance.

In this work we offer proof sketches for some theorems and omit the proofs of other results: the complete proofs, as well as the details on the implementation of the Case Study, can be found in the Appendix of [4].

2 Preliminaries

2.1 Labelled Markov Chains

We will work with discrete-time Labelled Markov Chains (LMCs) over general state spaces. Known definitions of countable- or finite-state LMCs represent special instances of the general models we introduce next.

Definition 1

(LMC syntax). A Labelled Markov Chain (LMC) is a structure \( \mathscr {M}= {\left( S, \varSigma , \tau , L\right) }\) where:

  • \(S\) is a (potentially uncountable) set of states.

  • \(\varSigma \subseteq \mathcal P(S)\) is a \(\sigma \)-algebra over \(S\) representing the set of measurable subsets of \(S\).

  • \(\tau : S\times \varSigma \rightarrow [0,1]\) is a transition kernel. That is, for all \(s\in S\), \(\tau (s,\cdot )\) is a probability measure on the measure space \((S, \varSigma )\), and for all \(A \in \varSigma \) we require \(\tau (\cdot ,A)\) to be \(\varSigma \)-measurable.

  • \(L: S\rightarrow \mathcal O\) labels each state \(s \in S\) with a subset of atomic propositions from \(\mathrm {AP}\), where \(\mathcal O = 2^\mathrm {AP}\). L is required to be \(\varSigma \)-measurable, and we will assume \(\mathrm {AP}\) to be finite.

\(L(s)\) captures all the observable information at state \(s \in S\): this drives our notion relating pairs of states, and we characterise properties over the codomain of this function.

Definition 2

(LMC semantics). Let \(\mathscr {M}= {\left( S, \varSigma , \tau , L\right) }\) be an LMC. Given an initial distribution \(p_0\) over \(S\), the state of \(\mathscr {M}\) at time \(t \in \mathbb N\) is a random variable \(\mathscr {M}^{p_0}_{t}\) over \(S\), such that \(\mathbb P \big [ \mathscr {M}^{p_0}_{0} \in A_0 \big ] = p_0(A_0)\) and

$$\begin{aligned} \mathbb P \big [ \mathscr {M}^{p_0}_{0} \in A_0, \cdots , \mathscr {M}^{p_0}_{k} \in A_k \big ] = \int _{y_0 \in A_0} \cdots \int _{y_{k-1} \in A_{k-1}} \tau (y_{k-1}, A_k) \, \tau (y_{k-2},\mathrm {d}y_{k-1}) \cdots \tau (y_0,\mathrm {d}y_1) \, p_0(\mathrm {d}y_0) \end{aligned}$$

for all \(k\in \mathbb N \backslash \{0\}\) and \(A_0, \cdots , A_k \in \varSigma \), where of course \(\tau (y_{k-1}, A_k) = \int _{y_{k} \in A_{k}} \tau (y_{k-1},\mathrm {d}y_{k})\).
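For a finite state space the kernel \(\tau \) reduces to a row-stochastic matrix, and the distribution of \(\mathscr {M}^{p_0}_{t}\) is obtained by pushing \(p_0\) through the kernel t times. The following Python sketch illustrates this special case; the transition matrix, labels and state indices are illustrative choices, not part of the formal development.

```python
import numpy as np

# A minimal finite LMC: states indexed 0..n-1, a row-stochastic matrix tau,
# and a labelling L mapping each state to a frozenset of atomic propositions.
# The concrete numbers below are illustrative only.
tau = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])
labels = [frozenset(), frozenset({"a"}), frozenset({"b"})]

def state_distribution(p0, t):
    """Distribution of M^{p0}_t: push the initial distribution through tau t times."""
    p = np.asarray(p0, dtype=float)
    for _ in range(t):
        p = p @ tau          # one application of the transition kernel
    return p

# Example: start deterministically in state 0 and look two steps ahead.
print(state_distribution([1.0, 0.0, 0.0], 2))   # -> [0.25 0.5  0.25]
```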

Models in related work. A body of related literature works with labelled MDPs, which are more general models allowing a non-deterministic choice \(u \in \mathcal U\) (for some finite \(\mathcal U\)) of the transition kernel \(\tau _u\) at each step. This choice is made by a “policy” that probabilistically selects u based on past observations of the process. Whilst we will ignore non-determinism and work with LMCs for simplicity, our results can be adapted to labelled MDPs by quantifying over all policies, or over all choices u for properties like \(\varepsilon \)-bisimulation, in order to remove the non-determinism. The seminal work on bisimulation and \(\varepsilon \)-bisimulation dealt with models known as LMPs [12]. LMPs allow for non-determinism (like labelled MDPs) but their states are unlabelled and at each step they have a probability of halting. For the study of bisimulation in this work, LMPs can be considered as a simplification of labelled MDPs to the case \(\mathcal O = \{\emptyset ,\{\mathrm {halted}\}\}\).

2.2 Exact and Approximate Probabilistic Bisimulations

The notion of approximate probabilistic bisimulation (in this work just \(\varepsilon \)-bisimulation) is a structural notion of closeness, based on the stronger notion of exact probabilistic bisimulation [12]. We discuss both next. Considering a binary relation R over a set X, we say that a subset \(\tilde{S} \subseteq X\) is R-closed if \(\tilde{S}\) contains its own image under R, that is, if \(R(\tilde{S}) \subseteq \tilde{S}\), where \(R(\tilde{S}) = \{ y \in X \mid \exists x \in \tilde{S}.\; x \, R \, y \}\).

Definition 3

(Exact probabilistic bisimulation). Let \(\mathscr {M}= {\left( S, \varSigma , \tau , L\right) }\) be an LMC. An equivalence relation \(R \subseteq S\times S\) over the state space is an exact probabilistic bisimulation relation if, for all \((s_1, s_2) \in R\), we have \(L(s_1) = L(s_2)\) and, for every R-closed set \(T \in \varSigma \), \(\tau (s_1,T) = \tau (s_2,T)\).

A pair of states \(s_1, s_2 \in S\) are said to be (exactly probabilistically) bisimilar if there exists an exact probabilistic bisimulation relation R such that \(s_1 R\, s_2\).

Note that since R is an equivalence relation, R-closed sets are exactly the unions of whole equivalence classes.

Next, we adapt the notion of \(\varepsilon \)-bisimulation (as discussed in [13] for LMPs over countable state spaces) to LMCs over general spaces.

Definition 4

( \(\varepsilon \) -bisimulation). Let \(\mathscr {M}= {\left( S, \varSigma , \tau , L\right) }\) be an LMC. For \(\varepsilon \in [0,1]\), a symmetric binary relation \(R_\varepsilon \subseteq S\times S\) over the state space is an \(\varepsilon \)-approximate probabilistic bisimulation relation (or just \(\varepsilon \)-bisimulation relation) if

$$\begin{aligned}&\forall T \in \varSigma , \quad \textit{we have } R_\varepsilon (T) \in \varSigma , \end{aligned}$$
(1)
$$\begin{aligned}&\forall (s_1, s_2) \in R_\varepsilon , \quad \textit{we have } L(s_1) = L(s_2), \end{aligned}$$
(2)
$$\begin{aligned}&\forall (s_1, s_2) \in R_\varepsilon , \; \forall T \in \varSigma , \quad \textit{we have } \tau (s_2,R_\varepsilon (T)) \ge \tau (s_1,T) - \varepsilon . \end{aligned}$$
(3)

Two states \(s_1, s_2 \in S\) are said to be \(\varepsilon \)-bisimilar if there exists an \(\varepsilon \)-bisimulation relation \(R_\varepsilon \) such that \(s_1 R_\varepsilon s_2\).

The condition in (3) can be understood intuitively as follows: for any move that \(s_1\) can make (say, into a set T), \(s_2\) can match it, with at least the same likelihood up to an \(\varepsilon \) tolerance, over the corresponding set \(R_\varepsilon (T)\). Notice that (1) is not a necessary requirement for countable-state models, but for uncountable-state models it is needed to ensure that \(R_\varepsilon (T)\) is measurable, so that \(\tau (s_2,R_\varepsilon (T))\) in (3) is well defined.
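For a finite LMC, the conditions of Definition 4 can be checked directly by brute force: condition (1) is vacuous (every subset is measurable), and conditions (2) and (3) can be verified by enumerating all subsets T of the state space. The sketch below assumes the matrix/label representation used in the earlier sketch and is exponential in \(|S|\); it is meant only to make Definition 4 concrete, not to replace the polynomial-time algorithm of [13].

```python
from itertools import chain, combinations

def powerset(xs):
    """All subsets of xs, as tuples."""
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def is_eps_bisimulation(R, tau, labels, eps):
    """Brute-force check of Definition 4 for a finite LMC.
    `tau` is a row-stochastic matrix, `labels[i]` the label of state i,
    and `R` a set of state pairs (symmetry is enforced below)."""
    n = len(labels)
    R = set(R) | {(t, s) for (s, t) in R}              # make R symmetric
    image = lambda T: {t for (s, t) in R if s in T}    # R(T), the image of T under R
    for (s1, s2) in R:
        if labels[s1] != labels[s2]:                   # condition (2)
            return False
        for T in map(set, powerset(range(n))):         # condition (3), over all subsets T
            if sum(tau[s2][t] for t in image(T)) < sum(tau[s1][t] for t in T) - eps:
                return False
    return True
```

Note that this checks a given candidate relation; deciding \(\varepsilon \)-bisimilarity of two states requires searching for a suitable relation, which is what the polynomial-time algorithm of [13] does.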

[13] showed that on countable state spaces, 0-approximate probabilistic bisimulation corresponds to exact probabilistic bisimulation. On uncountable state spaces, not every exact probabilistic bisimulation relation is a 0-bisimulation relation, because of the additional measurability requirement, but we still have that 0-bisimulation implies exact probabilistic bisimulation.

Theorem 1

Let \(\mathscr {M}= {\left( S, \varSigma , \tau , L\right) }\) be an LMC, and let \(s_1,s_2 \in S\). If \(s_1,s_2\) are 0-bisimilar, then they are (exactly probabilistically) bisimilar.

Although above \(s_1, s_2\) are required to belong to the state space of a given LMC, the notions of exact- and \(\varepsilon \)-bisimulation can be extended to hold over pairs of LMCs by combining their state spaces, as follows.

Definition 5

( \(\varepsilon \) -bisimulation of pairs of LMCs). Consider two LMCs \(\mathscr {M}_1 = \left( S_1, \varSigma _1, \tau _1, L_1 \right) \) and \(\mathscr {M}_2 = \left( S_2, \varSigma _2, \tau _2, L_2 \right) \), and assume without loss of generality (by relabelling if necessary) that their state spaces \(S_1, S_2\) are disjoint. The direct sum \(\mathscr {M}_1 \oplus \mathscr {M}_2\) of \(\mathscr {M}_1\) and \(\mathscr {M}_2\) is the LMC formed by combining the state spaces of \(\mathscr {M}_1\) and \(\mathscr {M}_2\). Formally, \(\mathscr {M}_1 \oplus \mathscr {M}_2 = \left( S_1 \uplus S_2, \sigma \left( \varSigma _1 \cup \varSigma _2 \right) , \tau _1 \oplus \tau _2, L_1 \uplus L_2 \right) \), where:

  • \(S_1 \uplus S_2\) is the disjoint union of \(S_1\) and \(S_2\);

  • \(\sigma \left( \varSigma _1 \cup \varSigma _2 \right) \) is the smallest \(\sigma \)-algebra containing both \(\varSigma _1\) and \(\varSigma _2\), i.e. the collection of all sets \(A_1 \uplus A_2\) with \(A_1 \in \varSigma _1\) and \(A_2 \in \varSigma _2\);

  • \(\tau _1 \oplus \tau _2\) is the kernel acting as the original kernels on the respective components, i.e. \(\left( \tau _1 \oplus \tau _2 \right) (s, A_1 \uplus A_2) = \tau _i(s, A_i)\) whenever \(s \in S_i\);

  • \(L_1 \uplus L_2\) is the labelling function that agrees with \(L_1\) on \(S_1\) and with \(L_2\) on \(S_2\).

Let \(s_1 \in S_1\), \(s_2 \in S_2\). We say that \(s_1, s_2\) are \(\varepsilon \)-bisimilar iff \(s_1, s_2\) are \(\varepsilon \)-bisimilar as states in the direct sum LMC \(\mathscr {M}_1 \oplus \mathscr {M}_2\).

Other Notions of \(\varepsilon \) -Bisimulation in Literature. There is an alternative, more direct, extension of exact probabilistic bisimulation in literature [1, 3, 14], which simply requires \(| \tau (s_1,\tilde{T}) - \tau (s_2,\tilde{T}) | \le \varepsilon \) instead of the conditions in Definition 4. However, this requirement alone is too weak to guarantee properties that we later discuss (cf. Sect. 7).

3 Approximate Probabilistic Trace Equivalence for LMCs

In this section we introduce the concept of approximate probabilistic trace equivalence (or just f(k)-trace equivalence) to represent closeness of observable linear time behaviour. Being based on the likelihood of traces of a given LMC, this notion depends on its operational semantics (cf. Definition 2), rather than on the structure of its transition kernel (as in the case of approximate bisimulation). The notion can alternatively be thought of as inducing a distance between trace distributions, as elaborated below.

Definition 6

(Trace likelihood). Let \(\mathscr {M}= {\left( S, \varSigma , \tau , L\right) }\) be an LMC, \(s_0 \in S\), and \(k \in \mathbb N\). Let \(\mathrm {TRACE}\) denote a set of traces, each of length \(k+1\) and taking values over \(2^{\mathrm {AP}}\) at every time step, so that \(\mathrm {TRACE} \subseteq \mathcal O^{k+1}\). Denote by \(P_{k}\!\left( s_0,\mathrm {TRACE}\right) \) the probability that the LMC \(\mathscr {M}\), given an initial state \(s_0\), generates any of the runs \(\langle \alpha _0, \cdots , \alpha _k \rangle \in \mathrm {TRACE}\), namely

$$\begin{aligned}&P_{k}\!\left( s_0,\mathrm {TRACE}\right) = \sum _{\begin{array}{c} \langle \alpha _0, \cdots , \alpha _k \rangle \\ \in \mathrm {TRACE} \end{array}} \mathbb P \big [ \mathscr {M}^{s_0}_{0} \in L^{-1}(\{\alpha _{0}\}), \cdots , \mathscr {M}^{s_0}_{k} \in L^{-1}(\{\alpha _{k}\}) \big ], \end{aligned}$$

where \(\mathscr {M}^{s_0}_{t}\) is the state of \(\mathscr {M}\) at step t, with a degenerate initial distribution \(p_0\) that is concentrated on point \(s_0\) (cf. Definition 2).

Intuitively, we consider traces of length \(k+1\) (rather than of length k) because a length-\((k+1)\) trace is produced by one initial state and exactly k transitions. Notice that the set of sequences of states generating \(\mathrm {TRACE}\) is measurable, being defined via the measurable map L over a finite set of traces.

Definition 7

(Total variation [15]). Let \((Z,\mathcal G)\) be a measure space, where \(\mathcal G\) is a \(\sigma \)-algebra over Z, and let \(\mu _1, \mu _2\) be probability measures over \((Z,\mathcal G)\). The total variation between \(\mu _1, \mu _2\) is \(d_\mathrm {TV}\left( \mu _1, \mu _2 \right) = \sup _{A \in \mathcal G} \left| \mu _1(A) - \mu _2(A) \right| \).

Definition 8

( f(k)-trace equivalence). Let \({\mathscr {M}= {\left( S, \varSigma , \tau , L\right) }}\) be an LMC. For a non-decreasing function \(f: \mathbb N \rightarrow [0,1]\), we say that states \(s_1,s_2 \in S\) are f(k)-approximate probabilistic trace equivalent if, for all \(k \in \mathbb N\),

$$\begin{aligned} d_\mathrm {TV}\big (P_{k}\!\left( s_1,\cdot \right) ,P_{k}\!\left( s_2,\cdot \right) \big ) \le f(k), \end{aligned}$$

or, equivalently, if for every set of traces \(\mathrm {TRACE} \subseteq \mathcal O^{k+1}\),

$$\begin{aligned} \left| P_{k}\!\left( s_1,\mathrm {TRACE}\right) - P_{k}\!\left( s_2,\mathrm {TRACE}\right) \right| \le f (k). \end{aligned}$$

The monotonicity condition on f reflects the behaviour of the total variation distance, which is defined over a product output space and necessarily accumulates over time: longer traces refine shorter ones, so the distance is non-decreasing in k. The notion of f(k)-trace equivalence can be used to relate states from two different LMCs, much in the same way as \(\varepsilon \)-bisimulation.

One can introduce the notion of probabilistic trace distance between pairs of states \(s_1,s_2\) as

$$ \min \{f(k) \ge 0 \mid s_1 \text { is } f(k)\text {-trace equivalent to }s_2\} = d_\mathrm {TV}\big (P_{k}\!\left( s_1,\cdot \right) ,P_{k}\!\left( s_2,\cdot \right) \big ). $$

Notice that the RHS is clearly a pseudometric. We discuss the development of tight bounds on the probabilistic trace distance in Sect. 4.
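For finite LMCs and bounded k, the probabilistic trace distance can be computed exactly (if naively) by enumerating length-\((k+1)\) traces, since on a finite output space the total variation equals half the \(\ell _1\) distance between the two trace distributions. The following sketch reuses the matrix/label representation from the earlier examples and is purely illustrative.

```python
from itertools import product

def trace_distribution(tau, labels, s0, k):
    """Map each length-(k+1) label trace to its probability when starting from s0."""
    n = len(labels)
    dist = {}
    # Enumerate state paths s0, y1, ..., yk and accumulate their probabilities.
    for path in product(range(n), repeat=k):
        p, prev = 1.0, s0
        for y in path:
            p *= tau[prev][y]
            prev = y
        trace = tuple([labels[s0]] + [labels[y] for y in path])
        dist[trace] = dist.get(trace, 0.0) + p
    return dist

def trace_distance(tau, labels, s1, s2, k):
    """Total variation between the length-(k+1) trace distributions from s1 and s2."""
    d1 = trace_distribution(tau, labels, s1, k)
    d2 = trace_distribution(tau, labels, s2, k)
    traces = set(d1) | set(d2)
    return 0.5 * sum(abs(d1.get(t, 0.0) - d2.get(t, 0.0)) for t in traces)
```

In the setting of Theorem 3 below, an optimal external observer would then guess the initial state correctly with probability 0.5 + 0.5 * trace_distance(tau, labels, s1, s2, k).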

3.1 Interpretation and Application of \(\varepsilon \)-Trace Equivalence

The notion of \(\varepsilon \)-trace equivalence subsumes closeness of finite-time traces, and can be interpreted in two different ways. Firstly, \(\varepsilon \)-trace equivalence leads to closeness of satisfaction probabilities over bounded-horizon linear time properties, e.g. bounded LTL formulae, as follows.

Theorem 2

Let \(\mathscr {M}= {\left( S, \varSigma , \tau , L\right) }\) be an LMC, and let \(s_1,s_2 \in S\) be f(k)-trace equivalent. Let \(\psi \) be any bounded LTL property over a k-step time horizon, defined within the LTL fragment \( \phi = \phi _1 \mathrm U^{\le t} \phi _2 \mid a \mid \phi _1 \wedge \phi _2 \mid \phi _1 \vee \phi _2 \mid \lnot \phi _1 \) for \(t \le k\). Then,

$$\begin{aligned} \Bigm | \mathbb P \left[ s_1\,\models \,\psi \right] - \mathbb P \left[ s_2\,\models \,\psi \right] \Bigm |\; \le f (k), \end{aligned}$$

where \(\mathbb P \left[ s\,\models \,\psi \right] \) is the probability that, starting from state s, the LMC satisfies property \(\psi \).

Proof

Formula \(\psi \) is satisfied by a specific set of length-\((k+1)\) traces, so the claim follows by applying the definition of f(k)-trace equivalence to that set.    \(\square \)

Alternatively, via its connection to the notion of total variation, the probabilistic trace distance leads to the notion of distinguishability of the underlying LMC, namely the ability of an agent external to the model to tell which of two candidate states (or models) has generated an observed trace.

Theorem 3

Let \(s_1, s_2\) be two states of an LMC. Suppose one of them is selected by a secret fair coin toss. An external agent guesses which one has been selected by observing a trace of length \(k+1\) emitted from the unknown state. Then, an optimal agent guesses correctly with probability

$$\begin{aligned} \frac{1}{2} + \frac{1}{2} f(k), \end{aligned}$$

with \(f(k) = d_\mathrm {TV}\big (P_{k}\!\left( s_1,\cdot \right) ,P_{k}\!\left( s_2,\cdot \right) \big )\) being the probabilistic trace distance.

4 \(\varepsilon \)-Probabilistic Bisimulation Induces Approximate Probabilistic Trace Equivalence

In this section we present the main result: we show that \(\varepsilon \)-bisimulation induces a tight upper bound on the probabilistic trace distance, quantifiable as \((1-(1-\varepsilon )^k)\). This translates into a guarantee on all the properties implied by \(\varepsilon \)-trace equivalence, such as closeness of satisfaction probabilities for bounded linear time properties. In addition, since for finite-state LMPs the maximal \(\varepsilon \)-bisimulation relation can be computed in \(\mathcal O(\left| S\right| ^7)\) time [13], this result allows us to establish an upper bound on the probabilistic trace distance with the same time complexity.

Fig. 2.

LMC for the proof of Theorem 4.

Theorem 4

( \(\varepsilon \) -bisimulation implies \((1-(1-\varepsilon )^k)\) -trace equivalence). Let \(\mathscr {M}= {\left( S, \varSigma , \tau , L\right) }\) be an LMC. If \(s_1, s_2 \in S\) are \(\varepsilon \)-bisimilar, then \(s_1, s_2\) are \((1-(1-\varepsilon )^k)\)-trace equivalent.

Proof

(Sketch). The full proof, developed for LMCs over uncountable state spaces, can be found in Appendix C of [4]. Here we offer a proof sketch, employing the finite-state LMP in Fig. 2 as an illustrative example (where for simplicity we have omitted the labels of the internal states, which can be taken to be \(\emptyset \in \mathcal O\)).

The maximal (i.e. coarsest) \(\varepsilon \)-bisimulation relation \(R_{\varepsilon }\) consists of the pairs of states lying together within the sets

$$\begin{aligned} \{t_1, t_2, u_1 \}, \{t_2, t_3, u_2 \}, \{s_1, s_2 \}, \{v\},\{w\},\{z\}. \end{aligned}$$

We would like to prove that these \(\varepsilon \)-bisimilar states are also \((1-(1-\varepsilon )^k)\)-trace equivalent. In the full proof, we show this by induction on the length of the trace, for all \(\varepsilon \)-bisimilar states simultaneously. In this proof sketch, we illustrate the induction step by showing how to bound

$$\begin{aligned} \left| P_{3}\!\left( s_1,\lozenge ^{\le 3} a\right) - P_{3}\!\left( s_2,\lozenge ^{\le 3} a\right) \right| , \end{aligned}$$

where \(\lozenge ^{\le k} a\) is the set of traces of length \(k+1\) that reach a state labelled with a (which in this case is just state v). The idea is to match each of the outgoing transitions from \(s_1\) with an outgoing transition from \(s_2\) that leads to an \(\varepsilon \)-bisimilar state. Specifically, we explicitly write

(4)

and respectively

(5)

We then match off the terms in the expansions for \(P_{3}\!\left( s_1,\lozenge ^{\le 3} a\right) \) and \(P_{3}\!\left( s_2,\lozenge ^{\le 3} a\right) \), one term at a time. We use the induction hypothesis to argue that the probabilities in the matched terms are \((1-(1-\varepsilon )^k)\)-close to each other (here \(k=2\)), since they concern \(\varepsilon \)-bisimilar states. That is,

$$\begin{aligned}&\left| P_{2}\!\left( t_1,\lozenge ^{\le 2} a\right) - P_{2}\!\left( u_1,\lozenge ^{\le 2} a\right) \right| \le 1-(1-\varepsilon )^k \\&\left| P_{2}\!\left( t_2,\lozenge ^{\le 2} a\right) - P_{2}\!\left( u_1,\lozenge ^{\le 2} a\right) \right| \le 1-(1-\varepsilon )^k \\&\cdots \end{aligned}$$

The total amount of difference between the matching coefficients is no more than \(\varepsilon \). It can be shown (Lemma 1 in Appendix C of [4]) that these conditions guarantee the required bound on \(\left| P_{3}\!\left( s_1,\lozenge ^{\le 3} a\right) - P_{3}\!\left( s_2,\lozenge ^{\le 3} a\right) \right| \).

The main difficulty is choosing a suitable decomposition of \(P_{3}\!\left( s_1,\lozenge ^{\le 3} a\right) \) and \(P_{3}\!\left( s_2,\lozenge ^{\le 3} a\right) \). This is non-trivial since in (4) the probability of transitioning into \(t_2\) had to be split across two matched terms. However, we can tackle this issue using an extension of Hall’s Matching Theorem [5] (cf. Appendix C of [4]). It is then relatively straightforward to adapt this proof to LMCs over uncountable state spaces by approximating the relevant integrals with simple functions.    \(\square \)

We now show that the expression for the induced bound on the probabilistic trace distance proved in Theorem 4, namely \((1-(1-\varepsilon )^k)\), is tight, in the sense that for any k and \(\varepsilon \) the bound can be attained by some pair of \(\varepsilon \)-bisimilar states in some LMC. In other words, it is not possible to provide generally valid bounds on the induced approximation level for traces that are smaller than the expression above.

Theorem 5

For any \(\varepsilon \ge 0\), there exists an LMC \(\mathscr {M}= {\left( S, \varSigma , \tau , L\right) }\) and states \(s_1, s_2 \in S\) such that \(s_1, s_2\) are \(\varepsilon \)-bisimilar, and for all \(k \in \mathbb N\) there exists a set \(\mathrm {TRACE}\) of length-\((k+1)\) traces s.t. \(\left| P_{k}\!\left( s_1,\mathrm {TRACE}\right) - P_{k}\!\left( s_2,\mathrm {TRACE}\right) \right| = 1-(1-\varepsilon )^k\).

Proof

Select \(\varepsilon \ge 0\) and consider the following LMC:

[figure a: diagram of the LMC omitted]

Here \(s_1, s_2\) are \(\varepsilon \)-bisimilar, and for all \(k \in \mathbb N\), \(P_{k}\!\left( s_1,\lozenge ^{\le k} a\right) = 0\), whereas \(P_{k}\!\left( s_2,\lozenge ^{\le k} a\right) = \sum ^k_{i=1} {\left( 1-\varepsilon \right) ^{i-1} \varepsilon } = 1-(1-\varepsilon )^k\).    \(\square \)
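For completeness, the geometric sum in the last step evaluates to the claimed bound (for \(\varepsilon > 0\); the case \(\varepsilon = 0\) gives 0 on both sides):

$$\begin{aligned} \sum ^k_{i=1} {\left( 1-\varepsilon \right) ^{i-1} \varepsilon } = \varepsilon \cdot \frac{1 - (1-\varepsilon )^k}{1 - (1-\varepsilon )} = 1-(1-\varepsilon )^k. \end{aligned}$$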

The result in Theorem 4 can be viewed as an extension of the known fact that bisimulation implies trace equivalence in non-probabilistic transition systems. As in the non-probabilistic case, the converse of Theorem 4 does not hold.

Theorem 6

\((1-(1-\varepsilon )^k)\)-trace equivalence does not imply \(\varepsilon \)-bisimulation.

Proof

In Fig. 1, states \(s_1, s_2\) are not \(\varepsilon \)-bisimilar for any , yet their probabilistic trace distance is equal to 0.    \(\square \)

This example shows that \(\varepsilon \)-bisimulation cannot be used to effectively estimate the probabilistic trace distance between individual states. In particular, while the \((1-(1-\varepsilon )^k)\)-bound on probabilistic trace distance discussed above is tight as a uniform bound, it is not tight for individual pairs of states.

5 Application to Model Checking of Continuous-State LMCs

Suppose we are given an LMC \(\mathscr {M}^\mathcal C= {\left( S^\mathcal C, \varSigma ^\mathcal C, \tau ^\mathcal C, L^\mathcal C\right) }\), which we shall refer to as the “concrete” model, possibly over a continuous state space. We are interested in calculating its probability of satisfying a given LTL formula, starting from certain initial states. One approach is to construct a finite-state LMC \(\mathscr {M}^\mathcal A= {\left( S^\mathcal A, \varSigma ^\mathcal A, \tau ^\mathcal A, L^\mathcal A\right) }\) (the “abstract” model) that can be related to \(\mathscr {M}^\mathcal C\) (in a way made precise shortly). Probabilistic model checking can then be run over \(\mathscr {M}^\mathcal A\) using standard tools for finite-state models, such as PRISM [19], and since \(\mathscr {M}^\mathcal A\) is related to \(\mathscr {M}^\mathcal C\), this leads to approximate outcomes that are valid for \(\mathscr {M}^\mathcal C\). This approach has been studied in several papers [2, 3, 16]; the method for constructing \(\mathscr {M}^\mathcal A\) from \(\mathscr {M}^\mathcal C\) is to impose smoothness assumptions on the kernel \(\tau ^\mathcal C\) of \(\mathscr {M}^\mathcal C\) and to partition the state space \(S^\mathcal C\), thus obtaining \(S^\mathcal A\) and \(\tau ^\mathcal A\) (the \(\sigma \)-algebra and labels being directly inherited).

In this section we will demonstrate the application of our results. We will employ \(\varepsilon \)-bisimulation to relate \(\mathscr {M}^\mathcal A\) and \(\mathscr {M}^\mathcal C\), and use our results to bound their trace distance. This method produces tighter error bounds, for a broader class of properties, than are currently established in literature. The first step is to establish simpler conditions that guarantee \(\varepsilon \)-bisimulation between \(\mathscr {M}^\mathcal C\) and \(\mathscr {M}^\mathcal A\).

Theorem 7

Let \(\varepsilon \in \left[ 0,1\right] \), and suppose there exists a finite measurable partition \(\mathcal Q_\varepsilon = \{ P_1, \cdots , P_N \}\) of \(S^\mathcal C\) such that for all \(P \in \mathcal Q_\varepsilon \) and \(s_1, s_2 \in P\), we have that \(L^\mathcal C(s_1) = L^\mathcal C(s_2)\) and

$$\begin{aligned} \max _{J \subseteq \{1, \cdots , N\}} \left| \tau ^\mathcal C\left( s_1, \bigcup _{j \in J} P_j \right) - \tau ^\mathcal C\left( s_2, \bigcup _{j \in J} P_j \right) \right| \le \varepsilon . \end{aligned}$$

Assume wlog \(P_i \ne \emptyset \), and for each \(i \in \{ 1, \cdots , N \}\) choose a representative point \(s^\mathcal C_i \in P_i\). Consider the abstract model to be \(\mathscr {M}^\mathcal A= {\left( S^\mathcal A, \varSigma ^\mathcal A, \tau ^\mathcal A, L^\mathcal A\right) }\) formed by merging each \(P_i\) into \(s^\mathcal C_i\). Formally,

  • \(S^\mathcal A= \{ s^\mathcal A_1, \cdots , s^\mathcal A_N \}\).

  • \(\varSigma ^\mathcal A= \mathcal P \left( S^\mathcal A\right) \).

  • \(\tau ^\mathcal A\) is such that \(\tau ^\mathcal A(s^\mathcal A_i, \{ s^\mathcal A_j \}) = \tau ^\mathcal C( s^\mathcal C_i, P_j )\).

  • \(L^\mathcal A(s^\mathcal A_i) = L^\mathcal C(s^\mathcal C_i)\).

Then, for all \(i \in \{ 1, \cdots , N \}\) and \(s^\mathcal C\in P_i\), we have that \(s^\mathcal C\) is \(\varepsilon \)-bisimilar to \(s^\mathcal A_i\), and hence \((1-(1-\varepsilon )^k)\)-trace equivalent to it.
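A minimal computational sketch of the construction in Theorem 7, assuming the concrete kernel is available as a function tau_C(s, P) returning the probability of moving from state s into the set P (the function and variable names are illustrative, not part of the theorem):

```python
import numpy as np

def build_abstract_lmc(partition, representatives, tau_C, L_C):
    """Build (tau_A, L_A) as in Theorem 7: `partition` is a list of the sets P_1..P_N,
    `representatives[i]` is a chosen point of partition[i], `tau_C(s, P)` gives the
    concrete transition probability from state s into set P, and `L_C` labels states."""
    N = len(partition)
    tau_A = np.zeros((N, N))
    for i, s_rep in enumerate(representatives):
        for j, P_j in enumerate(partition):
            tau_A[i, j] = tau_C(s_rep, P_j)   # tau^A(s^A_i, {s^A_j}) = tau^C(s^C_i, P_j)
    L_A = [L_C(s_rep) for s_rep in representatives]
    return tau_A, L_A
```

Every concrete state in \(P_i\) is then \(\varepsilon \)-bisimilar to the abstract state \(s^\mathcal A_i\), so the trace-distance bound of Theorem 4 transfers directly between \(\mathscr {M}^\mathcal C\) and \(\mathscr {M}^\mathcal A\).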

In practical terms the partition \(\mathcal Q_\varepsilon \) can be straightforwardly constructed in many cases. As shown in Theorem 8, the approach in [2, 3, 25] generates a partition of \(S^\mathcal C\) satisfying the conditions of Theorem 7. Thus, \(\varepsilon \)-bisimulation can be seen as the underlying reason for the closeness of probabilities of events.

Theorem 8

Consider an LMC \(\mathscr {M}^\mathcal C= {\left( S^\mathcal C, \varSigma ^\mathcal C, \tau ^\mathcal C, L^\mathcal C\right) }\) where \(S^\mathcal C\) is a Borel subset of \(\mathbb R^d\). Suppose that \(\tau ^\mathcal C(s,T)\) is of the form \(\int _{t \in T} f(s,t) \mathrm {d}t\), so that for each state \(s \in S^\mathcal C\), \(f(s,\cdot ) : S^\mathcal C\rightarrow \mathbb R^+_0\) is the probability density of the next state. Suppose further that \(f(\cdot ,t)\) is uniformly K-Lipschitz continuous for all \(t \in S^\mathcal C\). That is, for some \(K \in \mathbb R\), for all \(s_1, s_2, t \in S^\mathcal C\),

$$\begin{aligned} \left| f(s_1,t) - f(s_2,t) \right| \le K \cdot \Vert s_1 - s_2 \Vert . \end{aligned}$$

For \(A \in \varSigma ^\mathcal C\) (so \(A \subseteq \mathbb R^d\)), let \(\lambda (A)\) be the volume of A and \(\delta (A)\) be the diameter of A. For any \(\varepsilon \in \left[ 0,1\right] \) and finite \(\lambda (S^\mathcal C)\), suppose that the partition \(\mathcal Q = \{P_1, \cdots , P_N\}\) of \(S^\mathcal C\) is such that

$$\begin{aligned} \max _{j \in \{1, \cdots , N \}} \delta (P_j) \le \frac{2 \varepsilon }{K\lambda (S^\mathcal C)}. \end{aligned}$$

Then, we have that \(\mathcal Q\) satisfies the conditions of Theorem 7 and can be used to construct the abstract model \(\mathscr {M}^\mathcal A\).
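As a rough sketch of how the diameter condition translates into a grid size, assume (purely for illustration) that \(S^\mathcal C\) is a d-dimensional hypercube partitioned into a uniform grid; the required number of cells per dimension can then be computed as follows.

```python
import math

def uniform_grid_cells(K, volume, eps, d, side_length):
    """Cells per dimension for a uniform grid over a hypercube of the given side
    length, so that each cell's diameter is at most 2*eps/(K*volume)."""
    max_diam = 2.0 * eps / (K * volume)
    cell_side = max_diam / math.sqrt(d)     # a d-cube of side c has diameter c*sqrt(d)
    return math.ceil(side_length / cell_side)

# Example: K = 2, unit-volume domain [0,1]^2, eps = 0.01 -> cells per dimension.
print(uniform_grid_cells(K=2.0, volume=1.0, eps=0.01, d=2, side_length=1.0))
```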

There are a number of adaptations that could be made to this result. [16] improves a related approach by varying the size of each partition element according to the local Lipschitz constant, rather than enforcing a globally uniform K. Similarly to this paper, [3] also discusses the relation of approximate probabilistic bisimulation to the problem of generating the abstract model, but a strictly weaker definition of approximate probabilistic bisimulation is employed (cf. Sect. 7). Finally, note that using the algorithms in [13] we can compute \(\varepsilon \)-bisimulation relations on \(\mathscr {M}^\mathcal A\): this allows \(\mathscr {M}^\mathcal A\) to be further compressed (at the cost of an additional \(\varepsilon _2\) approximation), by merging states that are \(\varepsilon _2\)-bisimilar to each other.

6 Case Study

Concrete Model. Consider the concrete model \(\mathscr {M}^\mathcal C= {\left( S^\mathcal C, \varSigma ^\mathcal C, \tau ^\mathcal C, L^\mathcal C\right) }\), describing the weather forecast for a resort. Here \(S^\mathcal C= \{0,1\} \times \left[ 0,1\right) \), \(\varSigma ^\mathcal C= \mathcal B(S^\mathcal C)\), and the state at time t is \((R_t, H_t)\), where

  • \(R_t \in \{0,1\}\) is a random variable representing whether it rains on day t,

  • \(H_t \in \left[ 0,1\right) \) is a random variable representing the humidity after day t.

Raining on day t causes it to become more likely to rain on day \(t+1\), but it also tends to reduce the humidity, which causes it to become gradually less likely to rain in the future. The meteorological variations are encompassed by \(\tau ^\mathcal C\), which is such that the model evolves according to

where \(\mathrm B(p)\) is the Bernoulli distribution with probability p of producing 1, and \(\mathrm U \left[ a,b \right) \) is the uniform distribution over the real interval \(\left[ a,b \right) \). Finally, the states of the model are labelled according to whether it rains on that day, namely

$$\begin{aligned} L^\mathcal C((r,h)) = {\left\{ \begin{array}{ll} \{\mathrm {RAIN}\} &{} \text {if}\ r = 1 \\ \emptyset &{} \text {if}\ r = 0. \end{array}\right. } \end{aligned}$$

Given \(\mathscr {M}^\mathcal C\) we are interested in computing the likelihood of events expressing meteorological predictions, given knowledge of present weather conditions.

Synthesis of the Abstract Model. Notice that \(\mathscr {M}^\mathcal C\) does not directly satisfy the smoothness assumptions of Theorem 8, in view of the discrete/continuous structure of its state space and the discontinuous probability density resulting from the uniform distribution. Nonetheless, we can still construct \(\mathscr {M}^\mathcal A\) by taking a sensible partition of \(S^\mathcal C\) and proving that it satisfies the conditions of Theorem 7. Let , where .

Theorem 9

For any \(\varepsilon \in \left[ 0,1\right] \), by taking , we have that \(\mathcal Q_\varepsilon \) satisfies the conditions of Theorem 7.

Therefore, we may construct the abstract model using Theorem 7. We choose the abstract state \((r^\mathcal A, h^\mathcal A) \in S^\mathcal A:= \{0,1\} \times \{0, \cdots , N-1 \}\) to correspond to the partition \(P_{r^\mathcal A, h^\mathcal A} \in \mathcal Q_\varepsilon \), and within each partition we select the concrete state with the lowest \(H_t\)-coordinate to be the representative state. This produces the abstract model \(\mathscr {M}^\mathcal A= {\left( S^\mathcal A, \varSigma ^\mathcal A, \tau ^\mathcal A, L^\mathcal A\right) }\), where \(\varSigma ^\mathcal A= \mathcal P(S^\mathcal A)\), \(L^\mathcal A((r,h)) = L^\mathcal C((r,h))\), and

where

  • \(p_{\mathrm {R}_1 \mid \mathrm {R_0}}(h_0) = \frac{1}{4} + \frac{3}{4} \frac{h_0}{N}\), and \(p_{\mathrm {R}_1 \mid \lnot \mathrm {R_0}}(h_0) = \frac{3}{4} \frac{h_0}{N}\),

  • ,

  • \(p_{H\mid \mathrm {R}} (h_0,h_1) = \frac{2}{N+h_0} \cdotp \max \left( \min \left( \frac{N+h_0}{2} - h_1, 1 \right) , 0 \right) \).

Computation of Approximate Satisfaction Probabilities. Suppose that at the end of day 0, we have \(R_0 = 0, H_0 = 0.5\), and a travel agent wants to know the risk of there being 2 consecutive days of rain over the next three days.

This probability can be computed algorithmically according to \(\mathscr {M}^\mathcal A\), and for \(N=1000\) it equals 0.365437 (see Appendix G of [4]). Since the point \((0, 500) \in S^\mathcal A\) is 0.001-bisimilar to \((0,0.5) \in S^\mathcal C\), this means that according to \(\mathscr {M}^\mathcal C\) with initial state \((0,0.5) \in S^\mathcal C\), the probability of there being two consecutive days of rain over the next three days is \(0.365437 \pm 0.003\).
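The computation over \(\mathscr {M}^\mathcal A\) amounts to summing, over all length-3 abstract paths, the probabilities of those paths that exhibit two consecutive rainy days. The sketch below illustrates this by naive path enumeration; the abstract kernel is supplied as a user-defined function step (a stand-in, since the kernel \(\tau ^\mathcal A\) is only partially reproduced above), and the sketch is not meant to scale to \(N = 1000\).

```python
def prob_two_consecutive_rain(step, s0, horizon=3):
    """Probability, in a finite abstract LMC, of two consecutive rainy days among
    days 1..horizon, starting from abstract state s0 on day 0.
    `step(s)` must return a dict {successor: probability}; a state is rainy
    when its first component equals 1 (as in the labelling L^A above)."""
    frontier = {(s0,): 1.0}                       # map: path of states -> probability
    for _ in range(horizon):
        nxt = {}
        for path, p in frontier.items():
            for succ, q in step(path[-1]).items():
                nxt[path + (succ,)] = nxt.get(path + (succ,), 0.0) + p * q
        frontier = nxt
    total = 0.0
    for path, p in frontier.items():
        rain = [s[0] == 1 for s in path[1:]]      # rain indicators for days 1..horizon
        if any(rain[i] and rain[i + 1] for i in range(len(rain) - 1)):
            total += p
    return total
```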

Analytical Validation of the Result. In this setup it is possible to evaluate the exact result for \(\mathscr {M}^\mathcal C\) analytically:

$$\begin{aligned} \mathbb P \left[ R_1 = 1, R_2 \!=\! 1 \right]&= \mathbb P \left[ R_1 = 1 \right] \int _0^1 f_{H_1 \mid R_1 = 1}(h_1) \cdotp \mathbb P \left[ R_2 = 1 \mid H_1=h_1, R_1=1 \right] \mathrm {d}h_1, \end{aligned}$$

which amounts to 0.199219, and similarly \(\mathbb P \left[ R_1 = 0, R_2 = 1, R_3 = 1 \right] = 0.166626\). Since these two events are disjoint and together cover the event of interest, this yields

$$\begin{aligned} \mathbb P \left[ \text {two consecutive days of rain over the next three days} \right] = 0.199219 + 0.166626 = 0.365845, \end{aligned}$$

which is within the error bounds guaranteed by \(\varepsilon \)-trace equivalence, as expected.

7 Other Notions of \(\varepsilon \)-Bisimulation

The following condition appears in literature [1, 3, 14] as the definition of approximate probabilistic bisimulation. For simplicity, let us restrict our attention to finite state spaces.

Definition 9

(Alternative notion of approximate probabilistic bisimulation, adapted from [3]). Let \(\mathscr {M}= {\left( S, \varSigma , \tau , L\right) }\) be an LMC, where \(S\) is finite and \(\varSigma = \mathcal P(S)\). For \(\varepsilon \in [0,1]\), a binary relation \(R_\varepsilon \) on \(S\) satisfies Definition 9 if, for all \((s_1, s_2) \in R_\varepsilon \), we have \(L(s_1) = L(s_2)\) and, for every \(R_\varepsilon \)-closed set \(\tilde{T} \subseteq S\), \(\left| \tau (s_1,\tilde{T}) - \tau (s_2,\tilde{T}) \right| \le \varepsilon \).

This is different from our notion of approximate probabilistic bisimulation because \(\tilde{T}\) ranges over \(R_\varepsilon \)-closed sets rather than all (measurable) sets. This definition is closer to exact probabilistic bisimulation (cf. Definition 3), but it is too weak to effectively bound probabilistic trace distance.
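To make the contrast with Definition 4 explicit, a brute-force check of Definition 9 quantifies only over \(R_\varepsilon \)-closed sets rather than over all subsets. The variant below reuses the powerset helper and the matrix/label representation from the sketch in Sect. 2.2, and is again purely illustrative.

```python
def is_alt_eps_bisimulation(R, tau, labels, eps):
    """Brute-force check of the condition in Definition 9 (reusing `powerset`
    from the Sect. 2.2 sketch): the one-step difference in probability is
    bounded only over R-closed sets, rather than over all subsets."""
    n = len(labels)
    image = lambda T: {t for (s, t) in R if s in T}          # image of T under R
    r_closed = [T for T in map(set, powerset(range(n))) if image(T) <= T]
    for (s1, s2) in R:
        if labels[s1] != labels[s2]:
            return False
        for T in r_closed:
            if abs(sum(tau[s1][t] for t in T) - sum(tau[s2][t] for t in T)) > eps:
                return False
    return True
```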

Theorem 10

For any \(\varepsilon > 0\), there exists an LMC \(\mathscr {M}= {\left( S, \varSigma , \tau , L\right) }\), a binary relation \(R_\varepsilon \) on \(S\), and a pair of states \((s_1,s_2) \in R_\varepsilon \), such that \(R_\varepsilon \) satisfies the conditions in Definition 9, but the 2-step reachability probabilities from \(s_1,s_2\) differ by 1 for some destination states.

Proof

For \(\varepsilon > 0\), let \(N\in \mathbb {Z}^+\), \(1/N \le \varepsilon \). Consider the following LMC. Let .

The only \(R_\varepsilon \)-closed sets are \(\{s_1, s_2\}\), \(\{t_0, \dots , t_N\}\), \(\{u_1\}\), \(\{u_2\}\) and unions of these sets, and so \(R_\varepsilon \) satisfies Definition 9.

We have \(s_1 R_\varepsilon \, s_2\), and yet \(P_{2}\!\left( s_1,\lozenge ^{\le 2} a\right) = 1\) but \(P_{2}\!\left( s_2,\lozenge ^{\le 2} a\right) = 0\), where \(\lozenge ^{\le 2} a\) is the set of length 3 traces that reach a state labelled with a.

[figure b: diagram of the LMC omitted]

As shown in [14], however, there is still some relationship between the probabilities of specific traces, which hinges on additional details of the structure of the transition kernel.

8 Conclusions and Extensions

In this paper we have developed a theory of f(k)-trace equivalence. We derived the minimal f(k) such that \(\varepsilon \)-bisimulation implies f(k)-trace equivalence, thus extending the well-known result for the exact, non-probabilistic case. By linking error bounds on the total variation of length-k traces to a notion of approximation based on the underlying transition kernel, we provided a means of computing upper bounds on the total variation and of synthesising abstract models with arbitrarily small total variation from a given concrete model.

It is of interest to extend our results to allow the states of the LMC to be labelled with bounded real-valued rewards, and then to limit the difference in expected reward between approximately bisimilar states.