Deciding Probabilistic Bisimilarity Distance One for Labelled Markov Chains
Abstract
Probabilistic bisimilarity is an equivalence relation that captures which states of a labelled Markov chain behave the same. Since this behavioural equivalence only identifies states that transition to states that behave exactly the same with exactly the same probability, this notion of equivalence is not robust. Probabilistic bisimilarity distances provide a quantitative generalization of probabilistic bisimilarity. The distance of states captures the similarity of their behaviour. The smaller the distance, the more alike the states behave. In particular, states are probabilistic bisimilar if and only if their distance is zero. This quantitative notion is robust in that small changes in the transition probabilities result in small changes in the distances.
During the last decade, several algorithms have been proposed to approximate and compute the probabilistic bisimilarity distances. The main result of this paper is an algorithm that decides distance one in \(O(n^2 + m^2)\), where n is the number of states and m is the number of transitions of the labelled Markov chain. The algorithm is the key new ingredient of our algorithm to compute the distances. The state of the art algorithm can compute distances for labelled Markov chains with up to 150 states. For one such labelled Markov chain, that algorithm takes more than 49 h. In contrast, our new algorithm only takes 13 ms. Furthermore, our algorithm can compute distances for labelled Markov chains with more than 10,000 states in less than 50 min.
Keywords
Labelled Markov chain · Probabilistic bisimilarity · Probabilistic bisimilarity distance
1 Introduction
A behavioural equivalence captures which states of a model give rise to the same behaviour. Bisimilarity, due to Milner [22] and Park [25], is one of the best known behavioural equivalences. Verifying that an implementation satisfies a specification boils down to checking that the model of the implementation gives rise to the same behaviour as the model of the specification, that is, the models are behavioural equivalent (see [1, Chap. 3]).
Probabilistic bisimilarity, due to Larsen and Skou [21], is a key behavioural equivalence for labelled Markov chains. As shown by Katoen et al. [16], minimizing a labelled Markov chain by identifying those states that are probabilistic bisimilar speeds up model checking. Probabilistic bisimilarity only identifies those states that behave exactly the same with exactly the same probability. If, for example, we replace the fair coin in the above example with a biased one, then none of the states labelled with zero in the original model with the fair coin are behaviourally equivalent to any of the states labelled with zero in the model with the biased coin. Behavioural equivalences like probabilistic bisimilarity rely on the transition probabilities and, as a result, are sensitive to minor changes of those probabilities. That is, such behavioural equivalences are not robust, as first observed by Giacalone et al. [12].
The probabilistic bisimilarity distances that we study in this paper were first defined by Desharnais et al. in [11]. Each pair of states of a labelled Markov chain is assigned a distance, a real number in the unit interval [0, 1]. This distance captures the similarity of the behaviour of the states. The smaller the distance, the more alike the states behave. In particular, states have distance zero if and only if they are probabilistic bisimilar. This provides a quantitative generalization of probabilistic bisimilarity that is robust in that small changes in the transition probabilities give rise to small changes in the distances. For example, we can model a biased die by using a biased coin instead of a fair coin in the above example. Let us assume that the probability of heads of the biased coin, that is, of going to the left, is \(\frac{51}{100}\). A state labelled zero in the model of the fair die has a nontrivial distance, that is, a distance greater than zero and smaller than one, to the corresponding state in the model of the biased die. For example, the initial states have distance about 0.036. We refer the reader to [7] for a more detailed discussion of a similar example.
As we already mentioned earlier, behavioural equivalences can be used to verify that an implementation satisfies a specification. Similarly, the distances can be used to check how similar an implementation is to a specification. We also mentioned that probabilistic bisimilarity can be used to speed up model checking. The distances can be used in a similar way, by identifying those states that behave almost the same, that is, have a small distance (see [3, 23, 26]).
Instead of computing the set of state pairs that have distance one, we compute the complement, that is, the set of state pairs with distance smaller than one. Obviously, the set of state pairs with distance zero is included in this set. First, we decide distance zero. As we mentioned earlier, distance zero coincides with probabilistic bisimilarity. The first decision procedure for probabilistic bisimilarity was provided by Baier [4]. More efficient decision procedures were subsequently proposed by Derisavi et al. [10] and also by Valmari and Franceschinis [30]. The latter two both run in \(O(m \log n)\), where n and m are the number of states and transitions of the labelled Markov chain. Subsequently, we use a traversal of a directed graph derived from the labelled Markov chain. This traversal takes \(O(n^2 + m^2)\).
Chen et al. [8] presented an algorithm to compute the distances by means of Khachiyan’s ellipsoid method [17]. Though the algorithm is polynomial time, in practice it is not as efficient as the policy iteration algorithms (see the examples in [28, Sect. 8]). The state of the art algorithm to compute the probabilistic bisimilarity distances consists of two components: \(D_0\) and SPI. To compare this algorithm with our new algorithm consisting of the components \(D_0\), \(D_1\) and SPI, we implemented all the components in Java and ran both implementations on several labelled Markov chains. These labelled Markov chains model randomized algorithms and probabilistic protocols that are part of the distribution of probabilistic model checkers such as PRISM [20]. Whereas the original state of the art algorithm can handle labelled Markov chains with up to 150 states, our new algorithm can handle more than 10,000 states. Furthermore, for one such labelled Markov chain with 150 states, the original algorithm takes more than 49 h, whereas our new algorithm takes only 13 ms. Also, the new algorithm consisting of the components \(D_0\), \(D_1\), Q and SPPI to compute only small distances, along with the new algorithm consisting of the components \(D_0\), \(D_1\) and DI to approximate the distances, results in even lower execution times for a number of the labelled Markov chains.
In this paper, we present

- a polynomial decision procedure for distance one,

- an algorithm to compute the probabilistic bisimilarity distances,

- an algorithm to compute those probabilistic bisimilarity distances smaller than some given \(\varepsilon > 0\), and

- an approximation algorithm to compute the probabilistic bisimilarity distances up to some given accuracy \(\alpha > 0\).
Furthermore, by means of experiments we have shown that these three new algorithms are very effective, improving significantly on the state of the art.
2 Labelled Markov Chains and Probabilistic Bisimilarity Distances
We start by reviewing the model of interest, labelled Markov chains, its most well known behavioural equivalence, probabilistic bisimilarity due to Larsen and Skou [21], and the probabilistic bisimilarity pseudometric due to Desharnais et al. [11]. We denote the set of rational probability distributions on a set S by \(\mathrm {Distr}(S)\). For \(\mu \in \mathrm {Distr}(S)\), its support is defined by \(\mathrm {support}(\mu ) = \{\, s \in S \mid \mu (s) > 0 \,\}\). Instead of \(S \times S\), we often write \(S^2\).
Definition 1
A labelled Markov chain is a tuple \(\langle S, L, \tau , \ell \rangle \) consisting of

- a nonempty finite set S of states,

- a nonempty finite set L of labels,

- a transition function \(\tau : S \rightarrow \mathrm {Distr}(S)\), and

- a labelling function \(\ell : S \rightarrow L\).
For the remainder of this section, we fix such a labelled Markov chain \(\langle S, L, \tau , \ell \rangle \).
Definition 2
For \(\mu \), \(\nu \in \mathrm {Distr}(S)\), the set \(\varOmega (\mu , \nu )\) of couplings of \(\mu \) and \(\nu \) consists of those \(\omega \in \mathrm {Distr}(S^2)\) such that \(\sum _{v \in S} \omega (u, v) = \mu (u)\) for all \(u \in S\) and \(\sum _{u \in S} \omega (u, v) = \nu (v)\) for all \(v \in S\).
Note that \(\omega \in \varOmega (\mu , \nu )\) is a joint probability distribution with marginals \(\mu \) and \(\nu \). The following proposition will be used to prove Proposition 5.
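As a small, self-contained illustration (the dictionary representation of distributions is ours, not the paper's), the product distribution \(\omega (u, v) = \mu (u) \cdot \nu (v)\) is always a coupling of \(\mu \) and \(\nu \): its marginals are \(\mu \) and \(\nu \), and its support is exactly \(\mathrm {support}(\mu ) \times \mathrm {support}(\nu )\).

```python
from fractions import Fraction

def product_coupling(mu, nu):
    """Build the product coupling of two distributions, given as dicts
    mapping states in the support to their probabilities."""
    return {(u, v): p * q for u, p in mu.items() for v, q in nu.items()}

def marginals(omega):
    """Compute the left and right marginals of a joint distribution."""
    left, right = {}, {}
    for (u, v), p in omega.items():
        left[u] = left.get(u, Fraction(0)) + p
        right[v] = right.get(v, Fraction(0)) + p
    return left, right

mu = {'a': Fraction(1, 2), 'b': Fraction(1, 2)}
nu = {'a': Fraction(1, 3), 'c': Fraction(2, 3)}
omega = product_coupling(mu, nu)
left, right = marginals(omega)
assert left == mu and right == nu          # omega is a coupling
assert set(omega) == {(u, v) for u in mu for v in nu}  # full support
```
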
Proposition 1
For all \(\mu \), \(\nu \in \mathrm {Distr}(S)\), there exists \(\omega \in \varOmega (\mu , \nu )\) such that \(\mathrm {support}(\omega ) = \mathrm {support}(\mu ) \times \mathrm {support}(\nu )\).
Definition 3
An equivalence relation \(R \subseteq S^2\) is a probabilistic bisimulation if for all \((s, t) \in R\), \(\ell (s) = \ell (t)\) and there exists \(\omega \in \varOmega (\tau (s), \tau (t))\) such that \(\mathrm {support}(\omega ) \subseteq R\). Probabilistic bisimilarity, denoted \(\sim \), is the largest probabilistic bisimulation.
The probabilistic bisimilarity pseudometric of Desharnais et al. [11] maps each pair of states of a labelled Markov chain to a distance, an element of the unit interval [0, 1]. Hence, the pseudometric is a function from \(S^2\) to [0, 1], that is, an element of \([0, 1]^{S^2}\). As we will discuss below, it can be defined as a fixed point of the following function.
Definition 4
The function \(\varDelta : [0, 1]^{S^2} \rightarrow [0, 1]^{S^2}\) is defined by
$$\begin{aligned} \varDelta (d)(s, t) = \left\{ \begin{array}{ll} 1 &{} \text{ if } \ell (s) \not = \ell (t)\\ \displaystyle \min _{\omega \in \varOmega (\tau (s), \tau (t))} \sum _{u, v \in S} \omega (u, v) \, d(u, v) &{} \text{ otherwise. } \end{array} \right. \end{aligned}$$
Since a concave function on a convex polytope attains its minimum (see [18, p. 260]), the above minimum exists. We will use this fact in Proposition 4, one of the key technical results in this paper. We endow the set \([0, 1]^{S^2}\) of functions from \(S^2\) to [0, 1] with the following partial order: \(d \sqsubseteq e\) if \(d(s, t) \le e(s, t)\) for all s, \(t \in S\). The set \([0, 1]^{S^2}\) together with the order \(\sqsubseteq \) forms a complete lattice (see [9, Chap. 2]). The function \(\varDelta \) is monotone (see [6, Sect. 3]). According to the Knaster-Tarski fixed point theorem [29, Theorem 1], a monotone function on a complete lattice has a least fixed point. Hence, \(\varDelta \) has a least fixed point, which we denote by \(\varvec{\mu }(\varDelta )\). This fixed point assigns to each pair of states their probabilistic bisimilarity distance.
Theorem 1
\(D_0 = \{\, (s, t) \in S^2 \mid s \sim t \,\}\).
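To make Theorem 1 concrete, distance zero can be decided by partition refinement. The sketch below is a naive version for illustration, not the \(O(m \log n)\) algorithm of Derisavi et al. [10] used later in the paper; the representation of a labelled Markov chain by `label` and `tau` dictionaries is our own, and the five-state example chain is hypothetical.

```python
from fractions import Fraction

def distance_zero_pairs(states, label, tau):
    """Decide distance zero (probabilistic bisimilarity) by naive
    partition refinement: start from the partition induced by the
    labels and repeatedly split states that assign different
    probability mass to some current block, until stable."""
    block = {s: (label[s],) for s in states}
    while True:
        def sig(s):
            # probability mass that s sends into each current block
            mass = {}
            for t, p in tau[s].items():
                mass[block[t]] = mass.get(block[t], Fraction(0)) + p
            return (block[s], tuple(sorted(mass.items())))
        new = {s: sig(s) for s in states}
        if len(set(new.values())) == len(set(block.values())):
            # no block was split: the partition is stable
            return {(s, t) for s in states for t in states if new[s] == new[t]}
        block = new

# hypothetical chain: u and v are bisimilar, x mixes differently
half, third = Fraction(1, 2), Fraction(1, 3)
states = ['u', 'v', 'x', 'w1', 'w2']
label = {'u': 'a', 'v': 'a', 'x': 'a', 'w1': 'b', 'w2': 'c'}
tau = {'u': {'w1': half, 'w2': half},
       'v': {'w1': half, 'w2': half},
       'x': {'w1': third, 'w2': 1 - third},
       'w1': {'w1': Fraction(1)},
       'w2': {'w2': Fraction(1)}}
pairs = distance_zero_pairs(states, label, tau)
assert ('u', 'v') in pairs and ('u', 'x') not in pairs
```
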
3 Distance One
We concluded the previous section with the characterization of \(D_0\) as the set of state pairs that are probabilistic bisimilar. In this section we present a characterization of \(D_1\) as a fixed point of the function introduced in Definition 5.
Let us consider the case that the probabilistic bisimilarity distance of states s and t is one, that is, \(\varvec{\mu }(\varDelta )(s, t) =1\). Then \(\varDelta (\varvec{\mu }(\varDelta ))(s, t) =1\). From the definition of \(\varDelta \), we can conclude that either \(\ell (s) \not = \ell (t)\), or for all couplings \(\omega \in \varOmega (\tau (s), \tau (t))\) we have \(\mathrm {support}(\omega ) \subseteq D_1\).
Definition 5
The function \(\varGamma : 2^{S^2} \rightarrow 2^{S^2}\) is defined by
$$ \varGamma (X) = \{\, (s, t) \in S^2 \mid \ell (s) \not = \ell (t) \vee (s \not \sim t \wedge \forall \omega \in \varOmega (\tau (s), \tau (t)) : \mathrm {support}(\omega ) \subseteq X) \,\}. $$
Proposition 2
The function \(\varGamma \) is monotone.
Since the set \(2^{S^2}\) of subsets of \(S^2\) endowed with the order \(\subseteq \) is a complete lattice (see [9, Example 2.6(2)]) and the function \(\varGamma \) is monotone, we can conclude from the Knaster-Tarski fixed point theorem that \(\varGamma \) has a greatest fixed point, which we denote by \(\varvec{\nu }(\varGamma )\). Next, we show that \(D_1\) is a fixed point of \(\varGamma \).
Proposition 3
\(D_1 = \varGamma (D_1)\).
Since we have already seen that \(D_1\) is a fixed point of \(\varGamma \), we have that \(D_1 \subseteq \varvec{\nu }(\varGamma )\). To conclude that \(D_1\) is the greatest fixed point of \(\varGamma \), it remains to show that \(\varvec{\nu }(\varGamma ) \subseteq D_1\), which is equivalent to the following.
Proposition 4
\(\varvec{\nu }(\varGamma ) \setminus D_1 = \emptyset \).
Proof
Assume that there exists \((s, t) \in M\) such that \(\mathrm {support}(\omega _{s, t}) \cap D_1 \not = \emptyset \). Let
$$ p = \sum _{(u, v) \in D_1} \omega _{s, t}(u, v). $$
By (3), we have that \(\mathrm {support}(\omega _{s, t}) \subseteq \varvec{\nu }(\varGamma )\). Since \(\mathrm {support}(\omega _{s, t}) \cap D_1~\not =~\emptyset \) by assumption, we can conclude that \(p > 0\). Again using the fact that \(\mathrm {support}(\omega _{s, t}) \subseteq \varvec{\nu }(\varGamma )\), we have that
$$ p = \sum _{(u, v) \in \varvec{\nu }(\varGamma ) \cap D_1} \omega _{s, t}(u, v). $$
Furthermore,
$$\begin{aligned} \sum _{(u, v) \in \varvec{\nu }(\varGamma ) \setminus D_1} \omega _{s, t}(u, v) = 1 - p. \end{aligned}$$
(5)
Moreover,
$$\begin{aligned} m= & {} \varvec{\mu }(\varDelta )(s, t)\\= & {} \varDelta (\varvec{\mu }(\varDelta ))(s, t)\\= & {} \min _{\omega \in \varOmega (\tau (s), \tau (t))} \sum _{u, v \in S} \omega (u, v) \, \varvec{\mu }(\varDelta )(u,v)\\= & {} \sum _{u, v \in S} \omega _{s, t}(u, v) \, \varvec{\mu }(\varDelta )(u, v) \quad [(4)] \\= & {} \sum _{(u, v) \in \varvec{\nu }(\varGamma )} \omega _{s, t}(u, v) \, \varvec{\mu }(\varDelta )(u, v) \quad [(3)]\\= & {} \sum _{(u, v) \in \varvec{\nu }(\varGamma ) \cap D_1} \omega _{s, t}(u, v) \, \varvec{\mu }(\varDelta )(u, v) + \sum _{(u, v) \in \varvec{\nu }(\varGamma ) \setminus D_1} \omega _{s, t}(u, v) \, \varvec{\mu }(\varDelta )(u, v) \\= & {} p + \sum _{(u, v) \in \varvec{\nu }(\varGamma ) \setminus D_1} \omega _{s, t}(u, v) \, \varvec{\mu }(\varDelta )(u, v)\\\ge & {} p + (1 - p) \, m. \end{aligned}$$
The last step follows from (5) and the fact that \(\varvec{\mu }(\varDelta )(u, v) \ge m\) for all \((u, v) \in \varvec{\nu }(\varGamma ) \setminus D_1\). From the facts that \(p > 0\) and \(m \ge p + (1 - p) \, m\) we can conclude that \(m \ge 1\). This contradicts (1).

Otherwise, \(\mathrm {support}(\omega _{s, t}) \cap D_1 = \emptyset \) for all \((s, t) \in M\). Below, we show that under this assumption M is a probabilistic bisimulation. From the fact that M is a probabilistic bisimulation, we can conclude from Theorem 1 that \(\varvec{\mu }(\varDelta )(s, t) = 0\) for all \((s, t) \in M\). Hence, since \(M \not = \emptyset \) we have that \(M \cap S_0^2 \not = \emptyset \), which contradicts (2).
Next, we prove that M is a probabilistic bisimulation. Let \((s, t) \in M\). Since \(M \subseteq \varvec{\nu }(\varGamma ) \setminus D_1\) by (1), we have that \((s, t) \not \in D_1\) and, hence, \(\varvec{\mu }(\varDelta )(s, t) < 1\). From the definition of \(\varDelta \), we can conclude that \(\ell (s) = \ell (t)\). Since
$$\begin{aligned} m= & {} \varvec{\mu }(\varDelta )(s, t)\\= & {} \sum _{(u, v) \in \varvec{\nu }(\varGamma ) \setminus D_1} \omega _{s, t}(u, v) \, \varvec{\mu }(\varDelta )(u, v) \quad \text {[as above]} \end{aligned}$$
and \(\varvec{\mu }(\varDelta )(u, v) \ge m\) for all \((u, v) \in \varvec{\nu }(\varGamma ) \setminus D_1\), we can conclude that \(\varvec{\mu }(\varDelta )(u, v) = m\) for all \((u, v) \in \mathrm {support}(\omega _{s,t})\). Hence, \(\mathrm {support}(\omega _{s,t}) \subseteq M\). Therefore, M is a probabilistic bisimulation. \(\square \)
Theorem 2
\(D_1 = \varvec{\nu }(\varGamma )\).
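Theorem 2 suggests a direct, if naive, way to compute \(D_1\) on small chains: iterate from \(S^2\) downward until a fixed point is reached. The sketch below is ours, not the paper's algorithm. A pair survives a round if the labels differ, or if it is not a distance-zero pair and every coupling's support stays inside the current set; since every coupling's support is contained in \(\mathrm {support}(\tau (s)) \times \mathrm {support}(\tau (t))\), and the product coupling attains it, the quantification over couplings reduces to a check on successor pairs. The set `zero` of distance-zero pairs is assumed precomputed.

```python
def distance_one_pairs(states, label, tau, zero):
    """Greatest-fixed-point iteration for D_1 (naive sketch).
    `zero` is the set of distance-zero pairs, assumed precomputed."""
    x = {(s, t) for s in states for t in states}
    while True:
        def survives(s, t):
            if label[s] != label[t]:
                return True   # different labels: distance one
            if (s, t) in zero:
                return False  # probabilistic bisimilar: distance zero
            # every coupling stays in x iff all successor pairs are in x
            return all((u, v) in x for u in tau[s] for v in tau[t])
        nxt = {(s, t) for (s, t) in x if survives(s, t)}
        if nxt == x:
            return x
        x = nxt

# hypothetical three-state chain: a loops, b moves to c, c loops
states = ['a', 'b', 'c']
label = {'a': 'x', 'b': 'x', 'c': 'y'}
tau = {'a': {'a': 1.0}, 'b': {'c': 1.0}, 'c': {'c': 1.0}}
zero = {(s, s) for s in states}  # no two distinct states are bisimilar
d1 = distance_one_pairs(states, label, tau, zero)
assert ('a', 'b') in d1 and ('a', 'a') not in d1
```

Here a and b carry the same label, but b's only successor pair (a, c) has different labels, so (a, b) correctly ends up in \(D_1\).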
We have shown that \(D_1\) can be characterized as the greatest fixed point of \(\varGamma \). Next, we will show that \(D_1\) can be decided in polynomial time.
Theorem 3
Distance one can be decided in \(O(n^2 + m^2)\).
Proof
As we will show in Theorem 5, distance smaller than one can be decided in \(O(n^2 + m^2)\). Hence, distance one can be decided in \(O(n^2 + m^2)\) as well. \(\square \)
4 Distance Smaller Than One
To compute the set of state pairs which have distance one, we can first compute the set of state pairs which have distance less than one. The latter set we denote by \(D_{<1}\). We can then obtain \(D_1\) by taking the complement of \(D_{<1}\). As we will discuss below, \(D_{<1}\) can be characterized as the least fixed point of the following function.
Definition 6
The next theorem follows from Theorem 2.
Theorem 4
Next, we show that the computation of \(D_{<1}\) can be formulated as a reachability problem on a directed graph induced by the labelled Markov chain. Thus, we can use standard search algorithms, for example, breadth-first search, on the induced graph. We first present the graph induced by the labelled Markov chain.
Definition 7
We are left to show that in the graph G defined above, a vertex (s, t) is reachable from some vertex in \(S^2_0\) if and only if the state pair (s, t) in the labelled Markov chain has distance less than one.
As we have discussed earlier, if a state pair (s, t) has distance one, either s and t have different labels, or for all couplings \(\omega \in \varOmega (\tau (s), \tau (t))\) we have that \(\mathrm {support}(\omega ) \subseteq D_1\). To avoid the universal quantification over couplings, we will use Proposition 1 in the proof of the following proposition.
Proposition 5
Theorem 5
Distance smaller than one can be decided in \(O(n^2 + m^2)\).
Proof
Distance smaller than one can be decided as follows.
 1.
Decide distance zero.
 2.
Breadth-first search of G, with the queue initially containing the pairs of states that have distance zero.
By Theorem 4 and Proposition 5, we have that s and t have distance smaller than one if and only if (s, t) is reachable in the directed graph G from some (u, v) such that u and v have distance zero. These reachable state pairs can be computed using breadth-first search, with the queue initially containing \(S^2_0\).
Distance zero, that is, probabilistic bisimilarity, can be decided in \(O(m \log n)\) as shown by Derisavi et al. in [10]. The directed graph G has \(n^2\) vertices and \(m^2\) edges. Hence, breadth-first search takes \(O(n^2 + m^2)\). \(\square \)
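The proof above can be turned into a short program. The sketch below is our own illustration of the reachability procedure, not the paper's Java implementation; it uses a dictionary representation of the chain, edges connect (u, v) to (s, t) whenever s and t carry the same label and can transition to u and v respectively, and the set `zero` of distance-zero pairs is assumed precomputed by a probabilistic-bisimilarity decision procedure.

```python
from collections import deque

def distance_less_than_one(states, label, tau, zero):
    """BFS sketch of Theorem 5: (s, t) has distance < 1 iff it is
    reachable in the induced graph from a distance-zero pair."""
    pred = {s: [] for s in states}  # states that can move to a given state
    for s in states:
        for u in tau[s]:
            pred[u].append(s)
    visited = set(zero)
    queue = deque(zero)
    while queue:
        u, v = queue.popleft()
        # every pair (s, t) with an edge from (u, v)
        for s in pred[u]:
            for t in pred[v]:
                if label[s] == label[t] and (s, t) not in visited:
                    visited.add((s, t))
                    queue.append((s, t))
    return visited  # D_1 is the complement of this set in S x S

# hypothetical chain: s0 and t0 flip differently biased coins
states = ['s0', 't0', 's1', 's2']
label = {'s0': 'x', 't0': 'x', 's1': 'y', 's2': 'z'}
tau = {'s0': {'s1': 0.5, 's2': 0.5},
       't0': {'s1': 1 / 3, 's2': 2 / 3},
       's1': {'s1': 1.0},
       's2': {'s2': 1.0}}
zero = {(s, s) for s in states}
lt_one = distance_less_than_one(states, label, tau, zero)
assert ('s0', 't0') in lt_one        # nontrivial distance, but < 1
assert ('s1', 's2') not in lt_one    # different labels: distance one
```
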
5 Number of Nontrivial Distances
As we have already discussed earlier, distance zero captures that states behave exactly the same, that is, they are probabilistic bisimilar, and distance one indicates that states behave very differently. The remaining distances, that is, those greater than zero and smaller than one, we call nontrivial. Being able to determine quickly the number of nontrivial distances of a labelled Markov chain allows us to decide whether computing all these nontrivial distances (using some policy iteration algorithm) is feasible.
To determine the number of nontrivial distances of a labelled Markov chain, we use the following algorithm.
 1.
Decide distance zero.
 2.
Decide distance one.
As first proved by Baier [4], distance zero, that is, probabilistic bisimilarity, can be decided in polynomial time. As we proved in Theorem 3, distance one can be decided in polynomial time as well. Hence, we can compute the number of nontrivial distances in polynomial time.
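Given \(|D_0|\) and \(|D_1|\), the number of nontrivial distances is then simple arithmetic: the \(n^2\) state pairs minus the distance-zero pairs and the distance-one pairs. The snippet below (ours, for illustration) checks this against the figures reported for Herman's algorithm in the table below.

```python
def nontrivial_count(n, d0, d1):
    """All n*n state pairs minus the distance-zero and distance-one pairs."""
    return n * n - d0 - d1

# consistency check against the Herman figures (N = 3, 5, 7, 9)
assert nontrivial_count(8, 38, 14) == 12
assert nontrivial_count(32, 304, 440) == 280
assert nontrivial_count(128, 2160, 3192) == 11032
assert nontrivial_count(512, 13648, 17784) == 230712
```
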
To decide distance zero, we implemented the algorithm to decide probabilistic bisimilarity due to Derisavi et al. [10] in Java. We also implemented our algorithm to decide distance one, described in the proofs of Theorems 3 and 5.
We applied our implementation to labelled Markov chains that model randomized algorithms and probabilistic protocols. These labelled Markov chains have been obtained from the verification tool PRISM [20]. We compute the number of nontrivial distances for two models: the randomized self-stabilising algorithm due to Herman [14] and the bounded retransmission protocol by Helmink et al. [13].
N | S | \(D_0 + D_1\) | Nontrivial | \(D_0\) | \(D_1\) | \(S_1^2\)
--- | --- | --- | --- | --- | --- | ---
3 | 8 | 1.00 ms | 12 | 38 | 14 | 14
5 | 32 | 6.06 ms | 280 | 304 | 440 | 440
7 | 128 | 0.77 s | 11,032 | 2,160 | 3,192 | 3,192
9 | 512 | 378.42 s | 230,712 | 13,648 | 17,784 | 17,784
N | M | S | \(D_0 + D_1\) | \(D_0\) | \(D_1\) | \(S_1^2\)
--- | --- | --- | --- | --- | --- | ---
16 | 2 | 677 | 3.0 s | 456,977 | 1,352 | 1,352
16 | 3 | 886 | 8.6 s | 783,226 | 1,770 | 1,770
16 | 4 | 1,095 | 17.5 s | 1,196,837 | 2,188 | 2,188
16 | 5 | 1,304 | 22.8 s | 1,697,810 | 2,606 | 2,606
32 | 2 | 1,349 | 24.7 s | 1,817,105 | 2,696 | 2,696
32 | 3 | 1,766 | 69.7 s | 3,115,226 | 3,530 | 3,530
32 | 4 | 2,183 | 141.0 s | 4,761,125 | 4,364 | 4,364
32 | 5 | 2,600 | 208.6 s | 6,754,802 | 5,198 | 5,198
64 | 2 | 2,693 | 235.2 s | 7,246,865 | 5,384 | 5,384
64 | 3 | 3,526 | 616.4 s | 12,425,626 | 7,050 | 7,050
6 All Distances
To compute all distances of a labelled Markov chain, we augment the existing state of the art algorithm, which is based on algorithms due to Derisavi et al. [10] (step 1) and Bacci et al. [2] (step 3), by incorporating our decision procedure (step 2) as follows.
 1.
Decide distance zero.
 2.
Decide distance one.
 3.
Simple policy iteration.
Given that we not only decide distance zero, but also distance one, before running simple policy iteration, the correctness of the simple policy iteration algorithm in the augmented setting needs an adjusted proof.
As we already discussed in the previous section, steps 1 and 2 take polynomial time. However, step 3 takes at least exponential time in the worst case, as we have shown in [27]. Hence, the overall algorithm is exponential time.
N | K | S | \(D_0\) + SPI | \(D_0\) + \(D_1\) + SPI | Speedup | \(D_0\) | \(D_1\) | \(S_1^2\)
--- | --- | --- | --- | --- | --- | --- | --- | ---
3 | 2 | 26 | 4 s | 1 ms | 4,281 | 122 | 554 | 50
3 | 4 | 147 | 49 h | 13 ms | 13,800,000 | 7,419 | 14,190 | 292
3 | 6 | 459 | – | 214 ms | – | 88,671 | 122,010 | 916
3 | 8 | 1,059 | – | 3 s | – | 508,851 | 612,630 | 2,116
4 | 2 | 61 | 812 s | 3 ms | 305,000 | 459 | 3,262 | 120
4 | 4 | 812 | – | 388 ms | – | 145,780 | 513,564 | 1,622
4 | 6 | 3,962 | – | 82 s | – | 4,350,292 | 11,347,152 | 7,922
4 | 8 | 12,400 | – | 2,971 s | – | 46,198,188 | 107,561,812 | 24,798
5 | 2 | 141 | – | 6 ms | – | 2,399 | 17,482 | 280
5 | 4 | 4,244 | – | 33 s | – | 3,318,662 | 14,692,874 | 8,486
6 | 2 | 335 | – | 25 ms | – | 14,327 | 97,898 | 668
The simple policy iteration algorithm can only handle a limited number of states. For the labelled Markov chain with 26 states (\(N = 3\) and \(K = 2\)) the simple policy iteration algorithm takes four seconds, while our new algorithm takes one millisecond. The speedup is more than 4,000 times. For the labelled Markov chain with 61 states (\(N = 4\) and \(K = 2\)), the simple policy iteration algorithm runs in 812 s, while our new algorithm takes three milliseconds. The speedup of the new algorithm is more than 300,000 times. The biggest system the simple policy iteration algorithm can handle is the one with 147 states (\(N = 3\) and \(K = 4\)) and it takes more than 49 h. In contrast, our new algorithm terminates within 13 ms. That makes the new algorithm seven orders of magnitude faster than the state of the art algorithm. The table also shows that the new algorithm can handle systems with at least 12,400 states.
In the second example, we model two dice, one using a fair coin and the other using a biased coin. The goal is to compute the probabilistic bisimilarity distance between these two dice. An implementation of the die algorithm is part of PRISM. The resulting labelled Markov chain has 20 states.
S | \(D_0\) + SPI | \(D_0\) + \(D_1\) + SPI | Speedup | Nontrivial | \(D_0\) | \(D_1\) | \(S_1^2\)
--- | --- | --- | --- | --- | --- | --- | ---
20 | 5.55 s | 0.12 s | 46.25 | 30 | 20 | 350 | 198
7 Small Distances
As we have discussed in Sect. 5, for systems in which the number of nontrivial distances is so large that computing all of them is infeasible, we have to find alternative ways. In practice, we often only need to identify the state pairs with small distances, so we can cut down the number of nontrivial distances to compute by restricting ourselves to the small ones.
To compute the nontrivial distances smaller than a positive number, \(\varepsilon \), we use the following algorithm.
 1.
Decide distance zero.
 2.
Decide distance one.
 3.
Compute the query set
$$ Q = \{\, (s, t) \in S^2 \setminus (D_0 \cup D_1) \mid \varDelta (d)(s, t) \le \varepsilon \,\} $$
where
$$\begin{aligned} d(s, t) = \left\{ \begin{array}{ll} 1 &{} \text{ if } (s, t) \in D_1\\ 0 &{} \text{ otherwise. } \end{array} \right. \end{aligned}$$
 4.
Simple partial policy iteration for Q.
The first two steps remain the same. In step 3, we compute a query set Q that contains all state pairs with distances no greater than \(\varepsilon \), as shown in Proposition 6. In step 4, we use this set as the query set to run the simple partial policy iteration algorithm by Bacci et al. [2].
Proposition 6
Let d be the distance function defined in step 3. For all \((s, t) \in S^2 \setminus (D_0 \cup D_1) \), if \(\varvec{\mu }(\varDelta )(s, t) \le \varepsilon \), then \(\varDelta (d)(s, t) \le \varepsilon \).
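A two-line argument (our reconstruction of the proof idea, relying only on the monotonicity of \(\varDelta \) noted in Sect. 2) shows why this holds. Since \(d(s, t) = 1\) exactly when \(\varvec{\mu }(\varDelta )(s, t) = 1\), and \(d(s, t) = 0\) otherwise, we have \(d \sqsubseteq \varvec{\mu }(\varDelta )\), hence

```latex
\varDelta(d) \;\sqsubseteq\; \varDelta(\varvec{\mu}(\varDelta)) \;=\; \varvec{\mu}(\varDelta),
```

so \(\varDelta (d)(s, t) \le \varvec{\mu }(\varDelta )(s, t) \le \varepsilon \) for every state pair with distance at most \(\varepsilon \).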
Given that we not only decide distance zero, but also distance one, before running simple partial policy iteration, the correctness of the simple partial policy iteration algorithm in the augmented setting needs an adjusted proof.
As we have seen before, steps 1 and 2 take polynomial time. In step 3, computing \(\varDelta (d)\) corresponds to solving a minimum cost network flow problem. Such a problem can be solved in polynomial time using, for example, Orlin’s network simplex algorithm [24]. As we have shown in [28], step 4 takes at least exponential time in the worst case. Therefore, the overall algorithm is exponential time.
\(\varepsilon \) | \(D_0 + D_1 + Q\) + SPPI | Q | Total | Nontrivial
--- | --- | --- | --- | ---
0.1 | 57 min | 96 | 1,002 | 2,300
0.01 | 41 min | 84 | 842 | 2,300
8 Approximation Algorithm
We propose another way to deal with a large number of nontrivial distances: approximating the distances rather than computing their exact values. To approximate the distances such that the approximate values differ from the exact ones by at most \(\alpha \), a positive number, we use the following algorithm.
 1.
Decide distance zero.
 2.
Decide distance one.
 3.
Distance iteration.
Again, the first two steps remain the same. Step 3 is the new approximation algorithm, called distance iteration (DI). In this step, we define two distance functions, a lower bound l and an upper bound u. We repeatedly apply \(\varDelta \) to these two functions until the nontrivial distances in the two functions differ by less than the threshold \(\alpha \). For each state pair, we end up with an interval of size at most \(\alpha \) in which its distance lies. To prove the algorithm correct, we modify the function \(\varDelta \) defining the probabilistic bisimilarity distances slightly, as follows.
Definition 8
Some properties of \(\varDelta _0\), which are key to the correctness proof of the above algorithm, are collected in the following theorem.
Theorem 6
 (a)
The function \(\varDelta _0\) is monotone.
 (b)
The function \(\varDelta _0\) is nonexpansive.
 (c)
\(\varvec{\mu }(\varDelta _0) = \varvec{\mu }(\varDelta )\).
 (d)
\(\varvec{\mu }(\varDelta _0) = \varvec{\nu }(\varDelta _0)\).
 (e)
\(\varvec{\mu }(\varDelta _0) = \sup _{m \in \mathbb {N}} \varDelta _0^m(d_0)\), where \(d_0(s, t) = 0\) for all \(s, t \in S\).
 (f)
\(\varvec{\nu }(\varDelta _0) = \inf _{n \in \mathbb {N}} \varDelta _0^n(d_1)\), where \(d_1(s, t) = 1\) for all \(s, t \in S\).
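The shape of step 3 can be sketched generically. The code below is ours, for illustration only: the map `delta` stands for \(\varDelta _0\) and is passed in as a parameter, since evaluating it on an actual labelled Markov chain requires solving a transportation problem per state pair; the toy `delta` used in the example is a hypothetical monotone contraction, not derived from any chain.

```python
def distance_iteration(delta, pairs, alpha):
    """Iterate a lower bound (from the all-zero function) and an upper
    bound (from the all-one function) until they differ by at most
    alpha on every pair; by Theorem 6(e) and (f) both converge to the
    distances, so each exact distance lies in the returned interval."""
    low = {p: 0.0 for p in pairs}
    up = {p: 1.0 for p in pairs}
    while max(up[p] - low[p] for p in pairs) > alpha:
        low, up = delta(low), delta(up)
    return low, up

# toy stand-in for Delta_0: a monotone contraction with fixed point 0.25
def toy(d):
    return {p: 0.5 * v + 0.125 for p, v in d.items()}

low, up = distance_iteration(toy, [('s', 't')], 0.01)
assert low[('s', 't')] <= 0.25 <= up[('s', 't')]
```
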
Let us use the randomized quicksort example introduced in Sect. 7 and the randomized self-stabilising algorithm due to Herman [14] introduced in Sect. 5 as examples. Recall that for the randomized self-stabilising algorithm, when \(N = 7\), the number of nontrivial distances is 11,032, which we are not able to handle using the simple policy iteration algorithm. We apply the approximation algorithm to this model and to the randomized quicksort example with 82 states, and present the results below. The accuracy \(\alpha \) is set to 0.01.
Model | S | Nontrivial | \(D_0 + D_1 +\) DI
--- | --- | --- | ---
Randomized quicksort | 82 | 2,300 | 14 min
Randomized self-stabilising algorithm | 128 | 11,032 | 54 h
9 Conclusion
In this paper, we have presented a decision procedure for probabilistic bisimilarity distance one. This decision procedure provides the basis for three new algorithms to compute and approximate the probabilistic bisimilarity distances of a labelled Markov chain. The first algorithm decides distance zero, then decides distance one, and finally uses simple policy iteration to compute the remaining distances. As shown experimentally, this new algorithm significantly improves on the state of the art algorithm that only decides distance zero and then uses simple policy iteration. The second algorithm computes all probabilistic bisimilarity distances that are smaller than some given upper bound, by deciding distance zero, deciding distance one, computing a query set, and running simple partial policy iteration for that query set. This second algorithm can handle labelled Markov chains that have considerably more nontrivial distances than our first algorithm. The third algorithm approximates the probabilistic bisimilarity distances up to a given accuracy, by deciding distance zero, deciding distance one and running distance iteration. This third algorithm too can handle labelled Markov chains that have considerably more nontrivial distances than our first algorithm. Whereas we know that the first two algorithms take at least exponential time in the worst case, the worst-case running time of the third algorithm has not yet been determined. Moreover, if we are only interested in the probabilistic bisimilarity distances for a few state pairs, by precomputing distance zero and distance one we can exclude the state pairs with trivial distances. We can add the remaining state pairs to a query set and run simple partial policy iteration to get the distances. Alternatively, we can modify the distance iteration algorithm to approximate the distances for the predefined state pairs. The details of these new algorithms will be studied in the future.
Acknowledgements
The authors would like to thank Daniela Petrisan, Eric Ruppert and Dana Scott for discussions related to this research. The authors are also grateful to the referees for their constructive feedback.
References
 1.Aceto, L., Ingolfsdottir, A., Larsen, K., Srba, J.: Reactive Systems: Modelling, Specification and Verification. Cambridge University Press, Cambridge (2003)zbMATHGoogle Scholar
 2.Bacci, G., Bacci, G., Larsen, K.G., Mardare, R.: Onthefly exact computation of bisimilarity distances. In: Piterman, N., Smolka, S.A. (eds.) TACAS 2013. LNCS, vol. 7795, pp. 1–15. Springer, Heidelberg (2013). https://doi.org/10.1007/9783642367427_1CrossRefzbMATHGoogle Scholar
 3.Bacci, G., Bacci, G., Larsen, K.G., Mardare, R.: On the metricbased approximate minimization of Markov chains. In: Chatzigiannakis, I., Indyk, P., Kuhn, F., Muscholl, A. (eds.) Proceedings of the 44th International Colloquium on Automata, Languages, and Programming, Warsaw, Poland, July 2017. Leibniz International Proceedings in Informatics, vol. 80, pp. 104:1–104:14. Schloss Dagstuhl  LeibnizZentrum für Informatik (2017)Google Scholar
 4.Baier, C.: Polynomial time algorithms for testing probabilistic bisimulation and simulation. In: Alur, R., Henzinger, T.A. (eds.) CAV 1996. LNCS, vol. 1102, pp. 50–61. Springer, Heidelberg (1996). https://doi.org/10.1007/3540614745_57CrossRefGoogle Scholar
 5.Bellman, R.: A Markovian decision process. J. Math. Mech. 6(5), 679–684 (1957)MathSciNetzbMATHGoogle Scholar
 6.van Breugel, F.: On behavioural pseudometrics and closure ordinals. Inf. Process. Lett. 112(18), 715–718 (2012)MathSciNetCrossRefGoogle Scholar
 7.van Breugel, F.: Probabilistic bisimilarity distances. ACM SIGLOG News 4(4), 33–51 (2017)Google Scholar
 8.Chen, D., van Breugel, F., Worrell, J.: On the complexity of computing probabilistic bisimilarity. In: Birkedal, L. (ed.) FoSSaCS 2012. LNCS, vol. 7213, pp. 437–451. Springer, Heidelberg (2012). https://doi.org/10.1007/9783642287299_29CrossRefzbMATHGoogle Scholar
 9.Davey, B., Priestley, H.: Introduction to Lattices and Order. Cambridge University Press, Cambridge (2002)CrossRefGoogle Scholar
 10.Derisavi, S., Hermanns, H., Sanders, W.: Optimal statespace lumping in Markov chains. In. Process. Lett. 87(6), 309–315 (2003)MathSciNetCrossRefGoogle Scholar
 11.Desharnais, J., Gupta, V., Jagadeesan, R., Panangaden, P.: Metrics for labeled Markov systems. In: Baeten, J.C.M., Mauw, S. (eds.) CONCUR 1999. LNCS, vol. 1664, pp. 258–273. Springer, Heidelberg (1999). https://doi.org/10.1007/3540483209_19CrossRefGoogle Scholar
 12.Giacalone, A., Jou, C.C., Smolka, S.: Algebraic reasoning for probabilistic concurrent systems. In: Proceedings of the IFIP WG 2.2/2.3 Working Conference on Programming Concepts and Methods, Sea of Gallilee, Israel, April 1990, pp. 443–458. NorthHolland (1990)Google Scholar
13. Helmink, L., Sellink, M.P.A., Vaandrager, F.W.: Proof-checking a data link protocol. In: Barendregt, H., Nipkow, T. (eds.) TYPES 1993. LNCS, vol. 806, pp. 127–165. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-58085-9_75
14. Herman, T.: Probabilistic self-stabilization. Inf. Process. Lett. 35(2), 63–67 (1990)
15. Itai, A., Rodeh, M.: Symmetry breaking in distributed networks. Inf. Comput. 88(1), 60–87 (1990)
16. Katoen, J.P., Kemna, T., Zapreev, I., Jansen, D.N.: Bisimulation minimisation mostly speeds up probabilistic model checking. In: Grumberg, O., Huth, M. (eds.) TACAS 2007. LNCS, vol. 4424, pp. 87–101. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71209-1_9
17. Khachiyan, L.: A polynomial algorithm in linear programming. Sov. Math. Dokl. 20(1), 191–194 (1979)
18. Klee, V., Witzgall, C.: Facets and vertices of transportation polytopes. In: Dantzig, G., Veinott, A. (eds.) Proceedings of the 5th Summer Seminar on the Mathematics of the Decision Sciences, Stanford, CA, USA, July/August 1967. Lectures in Applied Mathematics, vol. 11, pp. 257–282. AMS (1967)
19. Knuth, D., Yao, A.: The complexity of nonuniform random number generation. In: Traub, J. (ed.) Proceedings of a Symposium on New Directions and Recent Results in Algorithms and Complexity, Pittsburgh, PA, USA, April 1976, pp. 375–428. Academic Press (1976)
20. Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_47
21. Larsen, K., Skou, A.: Bisimulation through probabilistic testing. In: Proceedings of the 16th Annual ACM Symposium on Principles of Programming Languages, Austin, TX, USA, January 1989, pp. 344–352. ACM (1989)
22. Milner, R. (ed.): A Calculus of Communicating Systems. LNCS, vol. 92. Springer, Heidelberg (1980). https://doi.org/10.1007/3-540-10235-3
23. Murthy, A., et al.: Approximate bisimulations for sodium channel dynamics. In: Gilbert, D., Heiner, M. (eds.) CMSB 2012. LNCS, pp. 267–287. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33636-2_16
24. Orlin, J.: A polynomial time primal network simplex algorithm for minimum cost flows. Math. Program. 78(2), 109–129 (1997)
25. Park, D.: Concurrency and automata on infinite sequences. In: Deussen, P. (ed.) GI-TCS 1981. LNCS, vol. 104, pp. 167–183. Springer, Heidelberg (1981). https://doi.org/10.1007/BFb0017309
26. Sen, P., Deshpande, A., Getoor, L.: Bisimulation-based approximate lifted inference. In: Bilmes, J., Ng, A. (eds.) Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, pp. 496–505. AUAI Press (2009)
27. Tang, Q., van Breugel, F.: Computing probabilistic bisimilarity distances via policy iteration. In: Desharnais, J., Jagadeesan, R. (eds.) Proceedings of the 27th International Conference on Concurrency Theory, Quebec City, QC, Canada, August 2016. Leibniz International Proceedings in Informatics, vol. 59, pp. 22:1–22:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)
28. Tang, Q., van Breugel, F.: Algorithms to compute probabilistic bisimilarity distances for labelled Markov chains. In: Meyer, R., Nestmann, U. (eds.) Proceedings of the 28th International Conference on Concurrency Theory, Berlin, Germany, September 2017. Leibniz International Proceedings in Informatics, vol. 85, pp. 27:1–27:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017)
29. Tarski, A.: A lattice-theoretic fixed point theorem and its applications. Pac. J. Math. 5(2), 285–309 (1955)
30. Valmari, A., Franceschinis, G.: Simple O(m log n) time Markov chain lumping. In: Esparza, J., Majumdar, R. (eds.) TACAS 2010. LNCS, vol. 6015, pp. 38–52. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12002-2_4
31. Zhang, X., van Breugel, F.: Model checking randomized algorithms with Java PathFinder. In: Proceedings of the 7th International Conference on the Quantitative Evaluation of Systems, Williamsburg, VA, USA, September 2010, pp. 157–158. IEEE (2010)
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.