Abstract
We introduce parallel symbolic algorithms for bisimulation minimisation, to combat the combinatorial state space explosion along three different paths. Bisimulation minimisation reduces a transition system to the smallest system with equivalent behaviour. We consider strong and branching bisimilarity for interactive Markov chains, which combine labelled transition systems and continuous-time Markov chains. Large state spaces can be represented concisely by symbolic techniques, based on binary decision diagrams. We present specialised BDD operations to compute the maximal bisimulation using signature-based partition refinement. We also study the symbolic representation of the quotient system and suggest an encoding based on representative states, rather than block numbers. Our implementation extends the parallel, shared-memory BDD library Sylvan, to obtain a significant speedup on multi-core machines. We propose the usage of partial signatures and of disjunctively partitioned transition relations, to increase the parallelisation opportunities. Our new parallel data structure for block assignments also increases scalability. We provide SigrefMC, a versatile tool that can be customised for bisimulation minimisation in various contexts. In particular, it supports models generated by the high-performance model checker LTSmin, providing access to specifications in multiple formalisms, including process algebra. The extensive experimental evaluation is based on various benchmarks from the literature. We demonstrate a speedup of up to 95\(\times \) for computing the maximal bisimulation on one processor. In addition, we find parallel speedups on a 48-core machine of another 17\(\times \) for partition refinement and 24\(\times \) for quotient computation. Our new encoding of the reduced state space leads to smaller BDD representations, with up to a 5162-fold reduction.
1 Introduction
One of the main challenges for model checking is that the space and time requirements of model checking algorithms increase exponentially in the size of the models. This paper combines state space reduction, symbolic representation, and parallel computation, to alleviate the state space explosion.
As input models, we consider interactive Markov chains (IMC). These provide a compositional framework to study functionality, performance, and dependability of reactive systems. IMCs inherit nondeterministic choice and communication from labelled transition systems, and probabilistic timed (Markovian) transitions from continuous-time Markov chains.
A state space reduction computes the smallest “equivalent” model. We consider strong bisimilarity, which preserves all behaviour, and branching bisimilarity, which abstracts from internal behaviour (represented by \(\tau \)-steps) and only preserves the observable behaviour. Note that branching bisimulation preserves the branching structure of an LTS, thus preserving all properties expressible in CTL*-X [14]. These notions correspond to strong and branching lumping for IMCs.
The reduced state space consists of (representatives of) the equivalence classes in the largest bisimulation, which is typically computed using partition refinement. Starting with the initial partition, in which all states are equivalent, the current partition is refined until the states in any equivalence class can no longer be distinguished. Blom et al. [5] introduced a signature-based method, which defines the equivalence classes according to the characterising signature of a state.
Another important technique to handle large state spaces is symbolic representation. Sets of states are represented by characteristic functions, which are efficiently stored in binary decision diagrams (BDDs). In the literature, symbolic methods have been applied to bisimulation minimisation in several ways. Bouali and De Simone [8] refine the equivalence relation \(R\subseteq S\times S\), by iteratively removing all “bad” pairs from R, i.e., pairs of states that are no longer equivalent. For strong bisimulation, Mumme and Ciardo [32] apply saturation-based methods to compute R. Wimmer et al. [40, 41] use signatures to refine the partition, represented by the assignment to equivalence classes \(P:S\rightarrow C\). Symbolic bisimulation based on signatures has also been applied to Markov chains by Derisavi [16] and Wimmer et al. [38, 39].
The symbolic representation of the reduced state space tends to be much larger than that of the original model. One particular application of symbolic bisimulation minimisation is as a bridge between symbolic models and explicit-state analysis algorithms. Symbolic models can have very large state spaces that are efficiently encoded using BDDs. The minimised model often has a sufficiently small number of states, so it can be further analysed efficiently using explicit-state algorithms.
Symbolic techniques mainly reduce the memory requirements of model checking. To speed up the computation, developing scalable parallel algorithms is the way forward, since it takes advantage of multi-core computer systems. In [17, 18, 20], we implemented the multi-core BDD package Sylvan, providing parallel BDD operations for symbolic model checking.
Parallelisation has been applied to explicit-state bisimulation minimisation before. Blom et al. [4, 5] introduced distributed signature-based bisimulation reduction. Also, [29] proposed a concurrent algorithm for bisimulation minimisation which combines signatures with the approach by Paige and Tarjan [33]. Recently, Wijs [37] implemented highly parallel strong and branching bisimilarity checking on GPGPUs. As far as we are aware, no earlier work combines symbolic bisimulation minimisation and parallelism. This paper is an extended version of [21]. There, we demonstrated that specialised BDD operations for signature refinement provide a major speedup of the sequential algorithm, and scale across multiple processors.
We extend [21] by four new results. First, we investigate how to compute the reduced state space, i.e., the quotient of the original system with respect to the maximal bisimulation obtained by signature refinement. Traditionally, the quotient is computed by a sequence of standard BDD operations. Similar to computing the partition, we find that quotient computation benefits from specialised BDD operations. Second, we study the representation of the quotient. Traditionally, its states are encoded by using the assigned block number as state identifier. We improve the encoding by choosing one representative state from each block. This considerably reduces the size of the resulting BDD representation. Third, we refine our algorithm. Instead of using a monolithic transition relation, we now support a disjunctive partitioning of the transition relation. This appears to be more efficient than a monolithic transition relation and provides further parallelisation opportunities when computing the maximal bisimulation. Finally, we link the tool SigrefMC presented in [21] to LTSmin, by supporting the partitioned transition systems generated by the symbolic backend of the LTSmin toolset [6, 28, 31]. Since LTSmin supports various input languages, including the specification language mCRL2 [13] for process algebra, this allows us to carry out a considerably larger set of experiments, generated from various specification languages.
Outline This paper presents the following contributions. We recapitulate the notion of partition refinement with partial signatures in Sect. 3. Section 4 discusses how we extended Sylvan to parallelise signature-based partition refinement. In particular, we develop two specialised BDD algorithms. The refine algorithm refines a partition according to a signature, but maximally reuses the block number assignment of the previous partition (Sect. 4.3). This algorithm improves the operation cache usage for the computation of the signatures of stable blocks and enables partition refinement with partial signatures. The inert algorithm removes all transitions that are not inert (Sect. 4.4). This algorithm avoids an expensive intermediate result reported in the literature [41]. We discuss the new quotient computation in Sect. 5. Specialised BDD algorithms significantly speed up the quotient computation for the interactive transition relation (Sect. 5.1) and for the Markovian transition relation (Sect. 5.2). The new encoding of the quotient space is explained in Sect. 5.3. Section 6 presents the implementation of these algorithms as a versatile tool that can be customised for bisimulation minimisation in various contexts, including support for transition systems generated by the model checking toolset LTSmin (Sect. 6.1). Section 7 discusses experimental data based on benchmarks from the literature. For partition refinement, we demonstrate a speedup of up to 95\(\times \) sequentially. In addition, we find parallel speedups of up to 17\(\times \) due to parallelisation with 48 cores. For quotient computation, we find a speedup of 2–10\(\times \) by using specialised operations, and we find significantly smaller BDDs (up to 5162\(\times \) smaller) when using a representative state rather than the block number to encode the new transition system.
2 Preliminaries
We recall the basic definitions of partitions, of labelled transition systems, of continuous-time Markov chains, of interactive Markov chains, and of various bisimulations as in [5, 26, 40, 41, 42].
2.1 Partitions
Definition 1
Given a set S, a partition \(\pi \) of S is a subset \(\pi \subseteq 2^S\) such that \(\bigcup _{C\in \pi }C=S\) and \(\forall C,C'\in \pi :(C=C'\vee C\cap C'=\emptyset )\).
The elements of \(\pi \) are called equivalence classes or blocks. If \(\pi '\) and \(\pi \) are two partitions, then \(\pi '\) is a refinement of \(\pi \), written \(\pi '\sqsubseteq \pi \), if each block of \(\pi '\) is contained in a block of \(\pi \). Each equivalence relation \(\equiv \) is associated with a partition \(\pi =S/\!\equiv \). In this paper, we use \(\pi \) and \(\equiv \) interchangeably.
2.2 Transition systems
Definition 2
A labelled transition system (LTS) is a tuple \((S,\textsf {Act},T)\), consisting of a set of states S, a set of labels \(\textsf {Act}\), which may contain the non-observable action \(\tau \), and transitions \(T\subseteq S\times \textsf {Act}\times S\).
We write \(s\overset{a}{\rightarrow }t\) for \((s,a,t)\!\in T\) and \(s\!\mathop {\nrightarrow }\limits ^{\tau }\) when s has no outgoing \(\tau \)-transitions. We use \(\overset{a*}{\rightarrow }\) to denote the transitive reflexive closure of \(\overset{a}{\rightarrow }\). Given an equivalence relation \(\equiv \), we write \(\overset{a}{\rightarrow }_{\equiv }\) for \(\overset{a}{\rightarrow }\!\cap \!\equiv \), i.e., transitions between equivalent states, called inert transitions. We use \(\overset{a*}{\rightarrow }_{\equiv }\) for the transitive reflexive closure of \(\overset{a}{\rightarrow }_{\equiv }\).
Definition 3
A continuous-time Markov chain (CTMC) is a tuple \((S,\mathbf {R})\), consisting of a set of states S and Markovian transitions \(\mathbf {R}:S\rightarrow S\rightarrow \mathbb {R}_{\ge 0}\).
We write \(s\overset{\lambda }{\Rightarrow }t\) for \(\mathbf {R}(s)(t)=\lambda \). The interpretation of \(s\overset{\lambda }{\Rightarrow }t\) is that the CTMC can switch from s to t within d time units with probability \(1-e^{-\lambda \cdot d}\). For a state s, we denote with \(\mathbf R (s)(C)=\sum _{s'\in C} \mathbf R (s)(s')\) the cumulative rate to reach a set of states \(C\subseteq S\) from state s in one transition.
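As a small illustration of the two quantities just defined, the following sketch computes the cumulative rate \(\mathbf R (s)(C)\) and the switching probability \(1-e^{-\lambda \cdot d}\) for an explicit CTMC. The dictionary encoding and the names `R_ctmc`, `cumulative_rate`, and `switch_probability` are assumptions made for this example only.

```python
import math
from fractions import Fraction

# Hypothetical CTMC: R_ctmc[s][t] is the rate of the Markovian transition s => t.
R_ctmc = {
    "s0": {"s1": Fraction(2), "s2": Fraction(3)},
    "s1": {"s0": Fraction(1)},
    "s2": {},
}

def cumulative_rate(R, s, C):
    """R(s)(C): the sum of the rates from s to all states in the set C."""
    return sum((rate for t, rate in R[s].items() if t in C), Fraction(0))

def switch_probability(rate, d):
    """Probability 1 - e^(-rate * d) that a transition with this rate fires within d time units."""
    return 1.0 - math.exp(-float(rate) * d)
```

Exact rationals are used for the rates so that sums over a block do not depend on rounding.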
Definition 4
An interactive Markov chain (IMC) is a tuple \((S,\textsf {Act},T,\mathbf {R})\), consisting of a set of states S, a set of labels \(\textsf {Act}\) that may contain the non-observable action \(\tau \), transitions \(T\subseteq S\times \textsf {Act}\times S\), and Markovian transitions \(\mathbf {R}:S\rightarrow S\rightarrow \mathbb {R}_{\ge 0}\).
An IMC basically combines the features of an LTS and a CTMC [25, 26]. One feature of IMCs is the maximal progress assumption. Internal interactive transitions, i.e., \(\tau \)-transitions, can be assumed to take place immediately, while the probability that a Markovian transition executes immediately is zero. Therefore, we may remove all Markovian transitions from states that have outgoing \(\tau \)-transitions: \(s\mathop {\rightarrow }\limits ^{\tau }\) implies \(\mathbf R (s)(S)=0\). We call IMCs to which this operation has been applied maximal-progress-cut (mp-cut) IMCs. In the rest of this paper, we implicitly assume that IMCs are mp-cut.
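The maximal progress cut can be sketched on an explicit IMC as follows. This is only an illustration under assumed encodings: transitions are triples `(s, a, t)`, the internal action is the string `"tau"`, and `mp_cut` is a hypothetical helper, not a function of SigrefMC.

```python
from fractions import Fraction

def mp_cut(T, R):
    """Return a copy of R in which every state with an outgoing tau-transition
    has no Markovian transitions (maximal progress assumption)."""
    tau_sources = {s for (s, a, _t) in T if a == "tau"}
    return {s: ({} if s in tau_sources else dict(succ)) for s, succ in R.items()}

# Hypothetical IMC: s0 has a tau-step, so its Markovian transition is cut.
T_example = [("s0", "tau", "s1"), ("s1", "a", "s2")]
R_example = {"s0": {"s2": Fraction(3)}, "s1": {"s2": Fraction(1, 2)}, "s2": {}}
R_cut = mp_cut(T_example, R_example)
```

After the cut, \(\mathbf R (s)(S)=0\) holds for every state s with an outgoing \(\tau \)-transition, while the input relation is left unmodified.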
2.3 Bisimulation
We recall strong and branching bisimulation. All discussed bisimulations are equivalence relations on the states of a transition system. Two states are bisimilar if and only if there is a bisimulation that relates them. So the maximal bisimulation relates two states if and only if they are bisimilar. For LTSs, we define strong and branching bisimulation as follows [41]:
Definition 5
A strong bisimulation on an LTS is an equivalence relation \(\equiv _{S}\) such that for all states \(s,t,s'\) with \(s\equiv _{S}t\) and \(s\overset{a}{\rightarrow }s'\), there is a state \(t'\) with \(t\overset{a}{\rightarrow }t'\) and \(s'\equiv _{S}t'\).
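To make the definition concrete, the following sketch checks whether a given partition of an explicit LTS is a strong bisimulation. The encoding (a list of triples and a dictionary `blocks` from states to block identifiers) and the function name are assumptions of this example.

```python
def is_strong_bisimulation(T, blocks):
    """Check the transfer condition of strong bisimulation for the equivalence
    induced by `blocks` (a dict mapping each state to a block identifier)."""
    moves = {s: set() for s in blocks}
    for s, a, t in T:
        moves[s].add((a, blocks[t]))   # s can do action a into the block of t
    for s in blocks:
        for t in blocks:
            # Every step of s must be matched by an equally labelled step of t
            # into the same block; checking all ordered pairs covers both directions.
            if blocks[s] == blocks[t] and not moves[s] <= moves[t]:
                return False
    return True

# Hypothetical LTS: p and r both do an a-step to q.
T_lts = [("p", "a", "q"), ("r", "a", "q")]
```

Here the partition that groups p with r is a strong bisimulation, whereas the partition that puts all three states in one block is not, since q cannot match the a-step of p.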
Definition 6
A branching bisimulation on an LTS is an equivalence relation \(\equiv _{B}\) such that for all states \(s,t,s'\) with \(s\equiv _{B}t\) and \(s\overset{a}{\rightarrow }s'\), either

\(a=\tau \) and \(s'\equiv _{B}t\), or

there are states \(t',t''\) with \(t\overset{\tau *}{\rightarrow }t'\overset{a}{\rightarrow }t''\) and \(t\equiv _{B}t'\) and \(s'\equiv _{B}t''\).
For CTMCs, we define strong bisimulation as follows [16, 38]:
Definition 7
A strong bisimulation on a CTMC is an equivalence relation \(\equiv _{S}\) such that for all \((s,t)\in \;\equiv _{S}\) and for all classes \(C\in S/\!\equiv _{S}\), \(\mathbf R (s)(C)=\mathbf R (t)(C)\).
For mp-cut IMCs, we define strong and branching bisimulation as follows [26, 42]:
Definition 8
A strong bisimulation on an mp-cut IMC is an equivalence relation \(\equiv _{S}\) such that for all \((s,t)\in \;\equiv _{S}\) and for all classes \(C\in S/\!\equiv _{S}\),

\(s\overset{a}{\rightarrow }s'\) for some \(s'\in C\) implies \(t\overset{a}{\rightarrow }t'\) for some \(t'\in C\)

\(\mathbf R (s)(C)=\mathbf R (t)(C)\)
Definition 9
A branching bisimulation on an mp-cut IMC is an equivalence relation \(\equiv _{B}\) such that for all \((s,t)\in \;\equiv _{B}\) and for all classes \(C\in S/\!\equiv _{B}\),

\(s\overset{a}{\rightarrow }s'\) for some \(s'\in C\) implies

\(a=\tau \) and \((s,s')\in \,\equiv _{B}\), or

there are states \(t',t''\in S\) with \(t \overset{\tau *}{\rightarrow } t'\overset{a}{\rightarrow }t''\) and \((t,t')\in \,\equiv _{B}\) and \(t''\in C\).


\(\mathbf R (s)(C)>0\) implies

\(\mathbf R (s)(C)=\mathbf R (t')(C)\) for some \(t'\in S\) such that \(t \overset{\tau *}{\rightarrow } t'\!\mathop {\nrightarrow }\limits ^{\tau }\) and \((t,t')\in \,\equiv _{B}\).


\(s\!\mathop {\nrightarrow }\limits ^{\tau }\) implies \(t\overset{\tau *}{\rightarrow } t'\!\mathop {\nrightarrow }\limits ^{\tau }\) for some \(t'\)
As we compare our work to [41, 42], we consider divergence-sensitive branching bisimulation for IMCs, which distinguishes deadlock states (without successors) from states that only have self-looping transitions.
3 Signature-based bisimulation minimisation
Blom and Orzan [5] introduced a signature-based approach to compute the maximal bisimulation of an LTS, which was further developed into a symbolic method by Wimmer et al. [41]. Each state is characterised by a signature, which is the same for all equivalent states in a bisimulation. These signatures are used to refine a partition of the state space until a fixed point is reached, which is the maximal bisimulation.
In the literature, multiple signatures are sometimes used that together fully characterise states, for example based on the state labels, based on the rates of continuous-time transitions, and based on the enabled interactive transitions. We consider these multiple signatures as elements of a single signature that fully characterises each state.
Definition 10
A signature \(\sigma (\pi )(s)\) is a tuple of functions \(f_i(\pi )(s)\), that together characterise each state s with respect to a partition \(\pi \). Two signatures \(\sigma (\pi )(s)\) and \(\sigma (\pi )(t)\) are equivalent, if and only if for all \(f_i\), \(f_i(\pi )(s)=f_i(\pi )(t)\).
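The signatures of the five bisimulations from Sect. 2.3 are known from the literature. First, we define for all actions \(a\in \textsf {Act}\) and equivalence classes \(C\in \pi \), writing \(\overset{\tau *}{\rightarrow }_{\pi }\) for a sequence of zero or more \(\tau \)-steps that are inert with respect to \(\pi \):

\(\mathbf T ({\pi })(s)=\{(a,C)\mid \exists s'\in C:s\overset{a}{\rightarrow }s'\}\)

\(\mathbf B ({\pi })(s)=\{(a,C)\mid \exists s'\in C:s\overset{\tau *}{\rightarrow }_{\pi }\overset{a}{\rightarrow }s'\wedge \lnot (a=\tau \wedge s\in C)\}\)

\(\mathbf R ^s({\pi })(s)=C\mapsto \mathbf R (s)(C)\)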

\(\mathbf R ^b({\pi })(s)=C\mapsto \max \{\mathbf R ^s({\pi })(s')(C)\mid s\overset{\tau *}{\rightarrow }_{\pi }s'\!\mathop {\nrightarrow }\limits ^{\tau }\}\)
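The five bisimulations are associated with the following signatures:

Strong bisimulation for an LTS: \((\mathbf T )\)

Branching bisimulation for an LTS: \((\mathbf B )\)

Strong bisimulation for a CTMC: \((\mathbf R ^s)\)

Strong bisimulation for an mp-cut IMC: \((\mathbf T ,\mathbf R ^s)\)

Branching bisimulation for an mp-cut IMC: \((\mathbf B ,\mathbf R ^b,s\overset{\tau *}{\rightarrow }\!\mathop {\nrightarrow }\limits ^{\tau })\)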
Functions \(\mathbf T \) and \(\mathbf B \) assign to each state s all pairs of actions a and equivalence classes \(C\in \pi \), such that state s can reach C by an action a either directly (\(\mathbf T \)) or via any number of inert \(\tau \)-steps (\(\mathbf B \)). Furthermore, inert \(\tau \)-steps are removed from \(\mathbf B \). \(\mathbf R ^s\) equals \(\mathbf R \) but with the domain restricted to the equivalence classes \(C\in \pi \) and represents the cumulative rate with which each state s can go to states in C. \(\mathbf R ^b\) equals \(\mathbf R ^s\) for states \(s\!\mathop {\nrightarrow }\limits ^{\tau }\) and takes the highest “reachable rate” for states with inert \(\tau \)-transitions. In branching bisimulation for mp-cut IMCs, the “highest reachable rate” is by definition the rate that all states \({s\!\overset{\tau }{\nrightarrow }}\) in C have. The element \(s\overset{\tau *}{\rightarrow }\!\mathop {\nrightarrow }\limits ^{\tau }\) distinguishes time convergent states from time divergent states [42] and is independent of the partition.
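The description of \(\mathbf B \) can be illustrated on an explicit LTS. The sketch below follows the prose above: it collects all moves reachable through inert \(\tau \)-steps and then drops the inert \(\tau \)-move itself. The encoding and the name `signature_B` are assumptions of this example, and the explicit-state loop merely stands in for the symbolic computation.

```python
def signature_B(T, block_of):
    """Branching signature: pairs (a, block) reachable by one a-step after any
    number of inert tau-steps, with inert tau-moves removed."""
    inert = {}    # inert tau-successors: tau-steps that stay inside the block
    direct = {}   # direct moves (a, block of target) of each state
    for s, a, t in T:
        direct.setdefault(s, set()).add((a, block_of[t]))
        if a == "tau" and block_of[s] == block_of[t]:
            inert.setdefault(s, set()).add(t)
    sig = {}
    for s in block_of:
        # Reflexive transitive closure of the inert tau-steps from s.
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in inert.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        moves = set()
        for u in seen:
            moves |= direct.get(u, set())
        moves.discard(("tau", block_of[s]))   # remove the inert tau-move
        sig[s] = frozenset(moves)
    return sig

# Hypothetical LTS: s reaches the a-step of t through an inert tau-step.
T_branching = [("s", "tau", "t"), ("t", "a", "u")]
sig_B = signature_B(T_branching, {"s": 0, "t": 0, "u": 1})
```

In this example s and t obtain the same signature, so a refinement step would keep them in one block.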
For the bisimulations of Definitions 5–9, we state:
Lemma 1
A partition \(\pi \) is a bisimulation, iff for all s and t that are equivalent in \(\pi \), \(\sigma (\pi )(s)=\sigma (\pi )(t)\).
For the above definitions, it is fairly straightforward to prove that they are equivalent to the classical definitions of bisimulation. See [5, 41] for the bisimulations on LTSs and [42] for the bisimulations on IMCs.
3.1 Signature-based partition refinement
As discussed above, signatures can consist of multiple elements. We first define partition refinement using the full signature. We then define partition refinement with partial signatures, i.e., using the elements of the signature, and discuss advantages of this approach.
Definition 11
(Partition refinement with full signatures)
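For a given signature \(\sigma \), we define the series of partition refinements:
\({{\mathrm{sigref}}}(\pi ,\sigma ):=\{\{t\in S\mid \sigma (\pi )(s)=\sigma (\pi )(t)\}\mid s\in S\}\)
\(\pi ^0:=\{S\}\)
\(\pi ^{n+1}:={{\mathrm{sigref}}}(\pi ^n,\sigma )\)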
The algorithm iteratively refines the initial coarsest partition \(\{S\}\) according to the signatures of the states, until some fixed point \(\pi ^{n+1}=\pi ^{n}\) is obtained. For monotone signatures (defined below), this fixed point is the maximal bisimulation.
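The iteration can be sketched for explicit state spaces as follows, using the signature \(\mathbf T \) for strong bisimulation on an LTS. This is a minimal explicit-state illustration, not the symbolic algorithm of this paper; the names `signature_T`, `sigref`, and `maximal_strong_bisimulation` are chosen for the example.

```python
def signature_T(T, block_of):
    """T(pi)(s): the set of pairs (a, block) that s can reach in one step."""
    sig = {s: set() for s in block_of}
    for s, a, t in T:
        sig[s].add((a, block_of[t]))
    return {s: frozenset(moves) for s, moves in sig.items()}

def sigref(states, T, block_of):
    """One refinement step: states with equal signatures share a block."""
    sig = signature_T(T, block_of)
    numbering = {}
    return {s: numbering.setdefault(sig[s], len(numbering)) for s in states}

def maximal_strong_bisimulation(states, T):
    block_of = {s: 0 for s in states}          # initial partition {S}
    while True:
        refined = sigref(states, T, block_of)
        # The chain of partitions is decreasing, so an unchanged number of
        # blocks means that the fixed point has been reached.
        if len(set(refined.values())) == len(set(block_of.values())):
            return refined
        block_of = refined

# Hypothetical LTS with four states.
states_lts = ["p", "q", "r", "u"]
T_sigref = [("p", "a", "q"), ("r", "a", "q"), ("q", "b", "u")]
result = maximal_strong_bisimulation(states_lts, T_sigref)
```

For this system the fixed point has three blocks, with p and r in the same block.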
Definition 12
A signature is monotone if for all \(\pi ,\pi '\) with \(\pi \sqsubseteq \pi '\), \(\sigma (\pi )(s)=\sigma (\pi )(t)\) implies \(\sigma (\pi ')(s)=\sigma (\pi ')(t)\).
For all monotone signatures, the \({{\mathrm{sigref}}}\) operator is monotone: \(\pi \sqsubseteq \pi '\) implies \({{\mathrm{sigref}}}(\pi ,\sigma )\sqsubseteq {{\mathrm{sigref}}}(\pi ',\sigma )\). Hence, following Kleene’s fixed point theorem, the procedure above reaches the greatest fixed point.
In Definition 11, the full signature is computed in every iteration. We propose to apply partition refinement using parts of the signature. By definition, \(\sigma (\pi )(s)=\sigma (\pi )(t)\) if and only if for all parts \(f_i(\pi )(s)=f_i(\pi )(t)\).
Definition 13
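(Partition refinement with partial signatures)
\({{\mathrm{sigref}}}(\pi ,f_i):=\{\{t\in S\mid f_i(\pi )(s)=f_i(\pi )(t)\wedge s\equiv _{\pi }t\}\mid s\in S\}\)
\(\pi ^0:=\{S\}\)
\(\pi ^{n+1}:={{\mathrm{sigref}}}(\pi ^n,f_i)\) for a selected \(f_i\in \sigma \)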
We always select some \(f_i\) that refines the partition \(\pi \). A fixed point is reached only when no \(f_i\) refines the partition further: \(\forall f_i\in \sigma :{{\mathrm{sigref}}}(\pi ^n, f_i)=\pi ^n\). The extra clause \(s \equiv _{\pi } t\) ensures that every application of \({{\mathrm{sigref}}}\) refines the partition.
Theorem 1
If all parts \(f_i\) are monotone, Definition 13 yields the greatest fixed point.
Proof
The procedure terminates since the chain is decreasing (\(\pi ^{n+1}\sqsubseteq \pi ^n\)), due to the added clause \(s \equiv _{\pi } t\). We reach some fixed point \(\pi ^n\), since \({{\mathrm{sigref}}}(\pi ^n, \sigma )=\pi ^n\) is implied by \(\forall f_i\in \sigma :{{\mathrm{sigref}}}(\pi ^n, f_i)=\pi ^n\). Finally, to prove that we get the greatest fixed point, assume there exists another fixed point \(\xi ={{\mathrm{sigref}}}(\xi ,\sigma )\). Then, also \(\xi ={{\mathrm{sigref}}}(\xi ,f_i)\) for all i. We prove that \(\xi \sqsubseteq \pi ^n\) by induction on n. Initially, \(\xi \sqsubseteq \{S\}=\pi ^0\). Assume \(\xi \sqsubseteq \pi ^n\), then for the selected i, \(\xi ={{\mathrm{sigref}}}(\xi ,f_i)\sqsubseteq {{\mathrm{sigref}}}(\pi ^n,f_i)=\pi ^{n+1}\), using monotonicity of \(f_i\).
There are several advantages to this approach due to its flexibility. First, for any \(f_i\) that is independent of the partition, we need to refine with respect to that \(f_i\) only once. Furthermore, refinements can be applied according to different strategies. For instance, for the strong bisimulation of an mp-cut IMC, one could refine w.r.t. \(\mathbf T \) until there is no more refinement, then w.r.t. \(\mathbf R ^s\) until there is no more refinement, then repeat until neither \(\mathbf T \) nor \(\mathbf R ^s\) refines the partition. Finally, computing the full signature is the most memory-intensive operation in symbolic signature-based partition refinement. If the partial signatures are smaller than the full signature, then larger models can be minimised.
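The strategy mentioned for strong bisimulation of an mp-cut IMC can be sketched for explicit state spaces as follows. Each refinement step keys the new block on the pair of old block and partial signature, which plays the role of the extra clause \(s \equiv _{\pi } t\). All function names and the example model are assumptions of this sketch.

```python
from fractions import Fraction

def part_T(T, R, block_of):
    """Partial signature T: pairs (a, block) reachable in one interactive step."""
    sig = {s: set() for s in block_of}
    for s, a, t in T:
        sig[s].add((a, block_of[t]))
    return {s: frozenset(v) for s, v in sig.items()}

def part_Rs(T, R, block_of):
    """Partial signature R^s: cumulative rate into each block."""
    sig = {}
    for s in block_of:
        rates = {}
        for t, rate in R.get(s, {}).items():
            rates[block_of[t]] = rates.get(block_of[t], Fraction(0)) + rate
        sig[s] = frozenset(rates.items())
    return sig

def sigref_partial(states, block_of, values):
    """Split blocks by one partial signature; states stay together only if they
    were already equivalent and agree on this part."""
    numbering = {}
    return {s: numbering.setdefault((block_of[s], values[s]), len(numbering))
            for s in states}

def refine_with_parts(states, T, R, parts):
    block_of = {s: 0 for s in states}
    changed = True
    while changed:                      # repeat until no part refines further
        changed = False
        for part in parts:              # refine w.r.t. one part until stable
            while True:
                new = sigref_partial(states, block_of, part(T, R, block_of))
                if len(set(new.values())) == len(set(block_of.values())):
                    break
                block_of, changed = new, True
    return block_of

# Hypothetical mp-cut IMC: s2 and s3 both move to the block of s0 and s1 with rate 1/2.
states_imc = ["s0", "s1", "s2", "s3"]
T_imc = [("s0", "a", "s2"), ("s1", "a", "s3")]
R_imc = {"s2": {"s0": Fraction(1, 2)},
         "s3": {"s0": Fraction(1, 3), "s1": Fraction(1, 6)}}
blocks_imc = refine_with_parts(states_imc, T_imc, R_imc, [part_T, part_Rs])
```

Because each step can only split existing blocks, comparing the number of blocks suffices to detect that a part no longer refines the partition.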
4 Symbolic signature refinement
This section describes the parallel decision diagram library Sylvan, followed by the (MT)BDDs and (MT)BDD operations required for signature-based partition refinement. We describe how we encode partitions and signatures for signature-based partition refinement. We present a new parallelised refine function that maximally reuses block numbers from the old partition. Finally, we present a new BDD algorithm that computes inert transitions, i.e., restricts a transition relation such that states s and \(s'\) are in the same block.
4.1 Decision diagram algorithms in Sylvan
In symbolic model checking [11], sets of states and transitions are represented by their characteristic function, rather than stored individually. With states described by N Boolean variables, a set \(S\subseteq \mathbb {B}^N\) can be represented by its characteristic function \(f:\mathbb {B}^N\rightarrow \mathbb {B}\), where \(S=\{s\mid f(s)\}\). Binary decision diagrams (BDDs) are a concise and canonical representation of Boolean functions [10].
An (ordered) BDD is a directed acyclic graph with leaves 0 and 1. Each internal node has a variable label \(x_i\) and two outgoing edges labelled 0 and 1. Variables are encountered along each path according to a fixed variable ordering. Duplicate nodes and nodes with two identical outgoing edges are forbidden. It is well known that for a fixed variable ordering, every Boolean function is represented by a unique BDD.
In addition to BDDs with leaves 0 and 1, multi-terminal binary decision diagrams (MTBDDs) have been proposed [2, 12] with leaves other than 0 and 1, representing functions from the Boolean space \(\mathbb {B}^N\) onto any set. For example, MTBDDs can have leaves representing integers (encoding \(\mathbb {B}^N\rightarrow \mathbb {N}\)), floating-point numbers (encoding \(\mathbb {B}^N\rightarrow \mathbb {R}\)), and rational numbers (encoding \(\mathbb {B}^N\rightarrow \mathbb {Q}\)). Partial functions are supported using a leaf \(\bot \).
Sylvan [17, 18, 20] implements parallelised operations on decision diagrams using parallel data structures and work-stealing. Work-stealing [7, 19] is a load balancing method for task-based parallelism. Recursive operations, such as most BDD operations, implicitly form a tree of tasks. Independent subtasks are stored in queues and idle processors steal tasks from the queues of busy processors.
See Algorithm 1 for a generic example of a BDD operation. This algorithm takes two inputs, the BDDs x and y, to which a binary operation \(\textsf {F}\) is applied. Most decision diagram operations first check if the operation can be applied immediately to x and y (line 2). This is typically the case when x and y are leaves. Often there are also other trivial cases that can be checked first. We then consult the operation cache (line 4) to see if this (sub)operation has been computed earlier. The operation cache is required to reduce the time complexity of BDD operations from exponential to polynomial in the size of the BDDs. Sylvan uses a single shared unique table for all BDD nodes and a single shared operation cache for all operations.
Often, the parameters of an operation can be normalised in some ways to increase the cache efficiency. For example, \(a\wedge b\) and \(b\wedge a\) are the same operation. In that case, normalisation rules can rewrite the parameters to some standard form in order to increase cache utilisation, at line 3. A well-known example is the if-then-else algorithm, which rewrites using rewrite rules called “standard triples” as described in [9].
If x and y are not leaves and the operation is not trivial or in the cache, we use topVar (line 5) to determine the first variable of the root nodes of x and y. If x and y have a different variable in their root node, topVar returns the first one in the variable ordering. We then compute the recursive application of F to the cofactors of x and y with respect to variable v at lines 7–8. We write \(x_{v=i}\) to denote the cofactor of x where variable v takes value i. Since x and y are ordered according to the same fixed variable ordering, we can easily obtain \(x_{v=i}\). If the root node of x is on the variable v, then \(x_{v=i}\) is obtained by following the low (\(i=0\)) or high (\(i=1\)) edge of x. Otherwise, \(x_{v=i}\) equals x. After computing the suboperations, we compute the result by either reusing an existing or creating a new BDD node (line 9).
Operations on decision diagrams are typically recursively defined on the structure of the inputs. To parallelise the operation in Algorithm 1, the two independent suboperations at lines 7–8 are executed in parallel using work-stealing. To obtain high performance in a multi-core environment, the data structures for the BDD node table and the operation cache must be highly scalable. Sylvan implements several non-blocking data structures to enable good speedups [17, 20].
To compute symbolic signature-based partition refinement, several basic operations must be supported by the BDD package (see also [41]). Sylvan implements basic operations such as \(\wedge \) and if-then-else, and existential quantification \(\exists \). Negation \(\lnot \) is performed in constant time using complement edges. To compute relational products of transition systems, there are operations relnext (to compute successors) and relprev (to compute predecessors and to concatenate relations), which combine the relational product with variable renaming. Similar operations are also implemented for MTBDDs. Sylvan is designed to support custom BDD algorithms. We present several new algorithms below.
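The structure of such a recursive operation can be sketched in a few lines. The code below is a sequential toy model that follows the description above (terminal case, operation cache, top variable, two cofactor calls, node creation through a unique table). It is not Sylvan's implementation or API: nodes are plain tuples, there are no complement edges, and all names are chosen for this example.

```python
import operator

unique_table = {}   # hash-consing: one shared copy of every node
op_cache = {}       # operation cache: (op, x, y) -> result

def make_node(var, low, high):
    """Return the canonical node (var, low, high), skipping redundant tests."""
    if low == high:
        return low
    key = (var, low, high)
    return unique_table.setdefault(key, key)

def is_leaf(x):
    return isinstance(x, bool)

def cofactor(x, var, value):
    """x restricted to var=value; x is unchanged if its root tests a later variable."""
    if is_leaf(x) or x[0] != var:
        return x
    return x[2] if value else x[1]

def apply(op, x, y):
    if is_leaf(x) and is_leaf(y):            # terminal case
        return op(x, y)
    key = (op, x, y)
    if key in op_cache:                      # consult the operation cache
        return op_cache[key]
    v = min(n[0] for n in (x, y) if not is_leaf(n))   # top variable
    # The two recursive calls are independent; a parallel library could run
    # them as separate tasks.
    low = apply(op, cofactor(x, v, 0), cofactor(y, v, 0))
    high = apply(op, cofactor(x, v, 1), cofactor(y, v, 1))
    result = make_node(v, low, high)
    op_cache[key] = result
    return result

def var(i):
    """BDD of the Boolean variable x_i."""
    return make_node(i, False, True)

x0, x1 = var(0), var(1)
conj = apply(operator.and_, x0, x1)
```

Because nodes are canonical, equality of functions reduces to equality of nodes, which is why the absorption law \(x_0\vee (x_0\wedge x_1)=x_0\) can be checked by a simple comparison in this model.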
4.2 Encoding of signature refinement
We implement symbolic signature refinement similar to [41]. However, we do not refine the partition with respect to a single block, but with respect to all blocks simultaneously. We use a binary encoding with variables s for the current state, \(s'\) for the next state, a for the action labels, and b for the blocks. We order BDD variables a and b after s and \(s'\), since this is required to efficiently replace signatures (on a and b) by new block numbers b (see below). Variables s and \(s'\) are interleaved, which is a common heuristic for transition systems.
In [21], we ordered a before b. However, we expect that in general ordering b before a is better for the following reason. If we have a before b, then when computing the signatures and the quotient (Sect. 5), it is guaranteed that all BDD nodes on a variables have to be recreated, whereas they may be reused if a variables are last in the ordering.
To perform symbolic bisimulation, we represent a number of sets by their characteristic functions. See also Fig. 1.

A set of states is represented by a BDD \(\mathcal {S}(s)\);

Transitions are represented by a BDD \(\mathcal {T}(s,s',a)\);

Markovian transitions are represented by an MTBDD \(\mathcal {R}(s,s')\), with leaves containing rational numbers (\(\mathbb {Q}\)) that represent the transition rates;

Signatures \(\mathbf T \) and \(\mathbf B \) are represented by a BDD \(\sigma _T(s,b,a)\);

Signatures \(\mathbf R ^s\) and \(\mathbf R ^b\) are represented by an MTBDD \(\sigma _R(s,b)\), with leaves containing rational numbers (\(\mathbb {Q}\)) that represent the rates in the signature.
We represent Markovian transitions using rational numbers, since they offer better precision than floating-point numbers. The manipulation of floating-point numbers typically introduces tiny rounding errors, resulting in different results of similar computations. This significantly affects bisimulation reduction, often resulting in finer partitions than the maximal bisimulation [38], which is unacceptable.
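The effect of rounding on signature comparison can be seen in a two-line experiment. The values are chosen for illustration only: two states whose cumulative rates into a block are 0.1 + 0.2 and 0.3 would be separated when floating-point leaves are compared, but not with rational leaves.

```python
from fractions import Fraction

# Cumulative rates of two hypothetical states into the same block.
float_a = 0.1 + 0.2
float_b = 0.3
exact_a = Fraction(1, 10) + Fraction(2, 10)
exact_b = Fraction(3, 10)

floats_agree = (float_a == float_b)   # rounding makes the two sums differ
exact_agree = (exact_a == exact_b)    # rational arithmetic is exact
```

With floating-point rates the two states would receive different signatures and be split, which yields a partition finer than the maximal bisimulation.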
In the literature, three methods have been proposed to represent the partition \(\pi \).

1.
As an equivalence relation, using a BDD \(\mathcal {E}(s,s')=1\) iff \(s\equiv _{\pi } s'\) [8, 32].

2.
As a partition, by assigning each block a unique number, encoded with variables b, using a BDD \(\mathcal {P}(s,b)=1\) iff \(s\in C_b\) [16, 41, 42].

3.
Using \(k={\lceil }\log _2 n{\rceil }\) BDDs \(\mathcal {P}_{0},\cdots ,\mathcal {P}_{k-1}\) such that \(\mathcal {P}_i(s)=1\) iff \(s\in C_b\) and the ith bit of b is 1. This requires significant time to restore blocks for the refinement procedure, but can require less memory [15].
We choose to use method 2, since in practice the BDD of \(\mathcal {P}(s,b)\) is smaller than the BDD of \(\mathcal {E}(s,s')\). Using \(\mathcal {P}(s,b)\) also has the advantage of straightforward signature computation. The logarithmic representation is incompatible with our approach, since we refine all blocks simultaneously. Their approach involves restoring individual blocks to the \(\mathcal {P}(s,b)\) representation, performing a refinement step, and compacting the result to the logarithmic representation. Restoring all blocks simply computes the full \(\mathcal {P}(s,b)\).
In the implementation of signature refinement, we actually encode \(\mathcal {P}\) using \(s'\) variables instead of s variables, i.e., encoding from target states to block numbers. This is advantageous for signature computation, as the signatures \(\sigma _T\) and \(\sigma _R\) can then be computed as follows:

\(\sigma _T(s,b,a) \, := \, \exists s':\mathcal {T}(s,s',a) \wedge \mathcal {P}(s',b)\)

\(\sigma _R(s,b) \, := \, \exists _\texttt {sum}\, s':\mathcal {R}(s,s') \wedge \mathcal {P}(s',b)\)
4.3 The refine algorithm
We present a new BDD algorithm to refine partitions according to a signature, which maximally preserves previously assigned block numbers.
Partition refinement consists of two steps: computing the signatures and computing the next partition. Given the signatures \(\sigma _T\) and/or \(\sigma _R\) for the current partition \(\pi \), the new partition can be computed as follows.
Since the chosen variable ordering has variables \(s,s'\) before a, b, each path in \(\sigma \) ends in an (MT)BDD representing the signature for the states encoded by that path. For \(\sigma _T\), every path that assigns values to s ends in a BDD on a, b. For \(\sigma _R\), every path that assigns values to s ends in an MTBDD on b with rational leaves.
Wimmer et al. [41] present a BDD operation refine that “replaces” these sub-(MT)BDDs by the BDD representing a unique block number for each distinct signature. The result is the BDD of the next partition. They use a global counter and a hash table to associate each signature with a unique block number. This algorithm has the disadvantage that block number assignments are unstable. There is no guarantee that a stable block has the same block number in the next iteration. This has implications for the computation of the new signatures. When the block number of a stable block changes, cached results of signature computation in earlier iterations cannot be reused.
We modify the refine algorithm to use the current partition to reuse the previous block number of each state. This also allows refining a partition with respect to only a part of the signature, as described in Sect. 3. The modification is applied such that it can be parallelised in Sylvan. See Algorithm 2.
The algorithm has two input parameters: \(\sigma \) which encodes the (partial) signature for the current partition and \(\mathcal {P}\) which encodes the current partition. The algorithm uses a global counter \(\textsf {iter}\), which is the current iteration. This is necessary since the cached results of the previous iteration cannot be reused. It also uses and updates an array blocks, which contains the signature of each block in the new partition. This array is cleared between iterations of partition refinement.
The implementation is similar to other BDD operations, with an operation cache (lines 2 and 18) and a recursion step for variables in s (lines 3–8). The two recursive operations are executed in parallel. refine simultaneously descends in \(\sigma \) and \(\mathcal {P}\) (lines 6–7), matching the valuation of \(s_i\) in \(\sigma \) and \(s'_i\) in \(\mathcal {P}\). Block assignment happens at lines 11–17. We rely on the well-known atomic operation compare_and_swap (cas), which atomically compares and modifies a value in memory. This is necessary for parallel correctness. We use cas to claim the previous block number for the signature (line 12). If the block number is already claimed for a different signature, then the current block is being split and we call search_or_insert to assign a new block number.
Different implementations of search_or_insert are possible. We implemented a parallel hash table that uses a global counter for the next block number when inserting a new pair \((\sigma , B)\), similar to [41]. We also implemented an alternative that integrates the blocks array with a skip list, i.e., a probabilistic multi-level ordered linked list [35]. This implementation performed better in our experiments, but we omit the implementation details due to space constraints.
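The block-number reuse described above can be sketched sequentially on explicit states. This is a simplified illustration under stated assumptions, not Algorithm 2: the dictionary `owner` plays the role of the blocks array, the `setdefault` call stands in for the compare_and_swap claim, and the dictionary `assigned` stands in for search_or_insert. The function name `refine_reuse` and the choice to key new blocks on the pair (old block, signature) are assumptions made for this sketch.

```python
def refine_reuse(signatures, partition, next_block):
    """One refinement step that keeps old block numbers where possible.

    signatures: state -> hashable signature w.r.t. the current partition
    partition:  state -> current block number
    next_block: first unused block number
    Returns (new_partition, next_block).
    """
    owner = {}     # block number -> signature that claimed it
    assigned = {}  # (old block, signature) -> number of the split-off block
    new_partition = {}
    for s in sorted(partition):
        old, sig = partition[s], signatures[s]
        # Claim the previous block number if it is unclaimed, or if it was
        # claimed for the same signature (a cas in the parallel setting).
        if owner.setdefault(old, sig) == sig:
            new_partition[s] = old
        else:
            # The block is being split: find or create a number for this part.
            key = (old, sig)
            if key not in assigned:
                assigned[key] = next_block
                next_block += 1
            new_partition[s] = assigned[key]
    return new_partition, next_block


# Block 0 splits on signatures "x" and "y"; block 1 is stable.
old_partition = {0: 0, 1: 0, 2: 0, 3: 1}
sigs = {0: "x", 1: "y", 2: "x", 3: "z"}
new_partition, counter = refine_reuse(sigs, old_partition, next_block=2)
```

Here the stable block keeps number 1 and the first part of the split block keeps number 0, so only the states that moved to the new block 2 invalidate cached signature results.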
4.4 Computing inert transitions
To compute the set of inert \(\tau \)-transitions for branching bisimulation, or more generally, to compute any inert transition relation \(\rightarrow \!\cap \!\equiv \) with \(\pi =S/\!\equiv \) with blocks b, the expression \(\mathcal {T}(s,s') \wedge \exists b:\mathcal {P}(s,b) \wedge \mathcal {P}(s',b)\) must be evaluated. Wimmer et al. [41] report that the intermediate BDD of \(\exists b:\mathcal {P}(s,b) \wedge \mathcal {P}(s',b)\), obtained by first computing \(\mathcal {P}(s,b)\) using variable renaming from \(\mathcal {P}(s',b)\) and then \(\exists b:\mathcal {P}(s,b) \wedge \mathcal {P}(s',b)\) using and_exists, is very large. This is no surprise, since this intermediate result is precisely the BDD \(\mathcal {E}(s,s')\), which we avoid by representing the partition using \(\mathcal {P}(s',b)\).
The solution in [41] was to avoid computing \(\mathcal {E}\) by computing the signatures and the refinement only with respect to one block at a time, which also enables several optimisations in [40].
We present an alternative solution, which computes \(\rightarrow \!\cap \!\equiv \) directly using a custom BDD algorithm. The inert algorithm takes parameters \(\mathcal {T}(s,s')\) (\(\mathcal {T}\) may contain other variables ordered after \(s,s'\)) and two copies of \(\mathcal {P}(s',b)\): \(\mathcal {P}^s\) and \(\mathcal {P}^{s'}\). The algorithm matches \(\mathcal {T}\) and \(\mathcal {P}^s\) on valuations of variables s, and \(\mathcal {T}\) and \(\mathcal {P}^{s'}\) on valuations of variables \(s'\). See Algorithm 3, and also Fig. 2 for a schematic overview. When in the recursive call all valuations to s and \(s'\) have been matched, with \(S_s,S_{s'}\subseteq S\) the sets of states represented by these valuations, \(\mathcal {T}\) is the set of actions that label the transitions between states in \(S_s\) and \(S_{s'}\), \(\mathcal {P}^s\) is the block that contains all \(S_s\), and \(\mathcal {P}^{s'}\) is the block that contains all \(S_{s'}\). Then, if \(\mathcal {P}^s\ne \mathcal {P}^{s'}\), the transitions are not inert and inert returns False, removing the transition from \(\mathcal {T}\). Otherwise, \(\mathcal {T}\) (which may still contain other variables ordered after \(s,s'\), such as action labels) is returned.
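On explicit states, the effect of the inert operation reduces to a block comparison per transition. The sketch below is only meant to show what is computed, not how the BDD recursion of Algorithm 3 works; the function name `inert_transitions` and the optional `label` parameter are hypothetical.

```python
def inert_transitions(transitions, partition, label=None):
    """Keep the transitions whose source and target lie in the same block.

    If `label` is given, only transitions with that action are considered,
    e.g. label="tau" for the inert tau-transitions of branching bisimulation.
    """
    return {
        (s, a, t)
        for (s, a, t) in transitions
        if partition[s] == partition[t] and (label is None or a == label)
    }


# Illustrative data (assumed): states 0 and 1 share a block, state 2 does not.
partition = {0: 0, 1: 0, 2: 1}
transitions = {(0, "tau", 1), (1, "tau", 2), (0, "a", 1)}
inert_all = inert_transitions(transitions, partition)
inert_tau = inert_transitions(transitions, partition, label="tau")
```

No relation over pairs of states is built at any point, which mirrors the reason the symbolic algorithm never materialises \(\mathcal {E}(s,s')\).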
5 Quotient computation
Computing the partition of the maximal bisimulation is only the first part of the minimisation process. We must also apply the partition to the original system, such that the blocks of the partition become the states of the new transition system. A straightforward conversion procedure encodes the new states using the block numbers assigned during partition refinement.
Just like partition refinement, the quotient can be computed with a sequence of standard BDD operations. We describe how the Sigref tool by Wimmer et al. [41] implements this computation. Furthermore, we develop specialised algorithms that significantly speed up quotient computation for the interactive transition relation (Sect. 5.1) and for the Markovian transition relation (Sect. 5.2). Finally, we investigate a different encoding that does not use the assigned block numbers for the new system, but picks an arbitrary state from each block as a representative (Sect. 5.3).
5.1 Computing the new interactive transition relation
For LTSs and IMCs, the new interactive transition relation is computed using the original transition relation and the partition. We first describe how this relation is computed using standard BDD operations in the Sigref tool [41]. We then present a new algorithm that performs all steps in one operation.
The Sigref tool implements two methods to compute the new interactive transition relation. The first consists of the following steps:

1.
Merge target states to the new encoding (in b).
$$\begin{aligned} \mathcal {T}(s,b,a) \, := \, \exists s':\mathcal {T}(s,s',a) \wedge \mathcal {P}(s',b) \end{aligned}$$ 
2.
Rename b variables to \(s'\) variables.
$$\begin{aligned} \mathcal {T}(s,s',a) \, := \, \mathcal {T}(s,b,a)[b\leftarrow s'] \end{aligned}$$ 
3.
Merge source states to the new encoding (in b).
$$\begin{aligned} \mathcal {T}(s',b,a) \, := \exists s:\mathcal {T}(s,s',a) \wedge \mathcal {P}'(s,b) \end{aligned}$$ 
4.
Rename b variables to s variables.
$$\begin{aligned} \mathcal {T}(s,s',a) \, := \, \mathcal {T}(s',b,a)[b\leftarrow s] \end{aligned}$$ 
5.
Remove \(\tau \)-loops (only for branching bisimulation).
$$\begin{aligned} \mathcal {T}(s,s',a) \, := \mathcal {T}(s,s',a)\wedge \lnot (s=s'\wedge a=\tau ) \end{aligned}$$
Encoding and merging states (steps 1 and 3) are carried out using the BDD operation and_exists on the transition relation and the partition, where the existential quantification causes the transitions to states in the same block and from states in the same block to be combined like a set union. It is straightforward to see that the result is correct, as long as \(\tau \)-loops are removed for branching bisimulation. For strong bisimulation, all states in a block have the same transitions, so existential quantification has no effect. For branching bisimulation, all states in a block can reach transitions via inert \(\tau \)-steps, so combining the transitions with existential quantification is necessary to compute the correct result.
Step 1 requires the partition defined on \(s'\) and b variables, whereas step 3 requires the partition defined on s and b variables, in order to perform and_exists. Therefore, one additional rename operation is required to obtain a duplicate of the partition defined on the other variables. The algorithm to compute the quotient is then as follows:
Steps 1–5 coincide with lines 2–7 in the above algorithm. The BDD for \(s=s'\wedge a=\tau \) (line 7) is trivial and can be computed just before line 7.
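The five steps can be mirrored on an explicit-state system, where and_exists becomes a set comprehension that maps states to block numbers and renaming becomes a no-op. This is a sketch under those simplifying assumptions; `quotient_standard` is a hypothetical name and the intermediate sets correspond to the intermediate BDDs of the symbolic version.

```python
def quotient_standard(transitions, partition, branching=False, tau="tau"):
    """Quotient of an explicit transition set, one step at a time."""
    # Steps 1-2: merge target states into their blocks.
    merged_targets = {(s, a, partition[t]) for (s, a, t) in transitions}
    # Steps 3-4: merge source states into their blocks.
    merged = {(partition[s], a, c) for (s, a, c) in merged_targets}
    # Step 5: remove tau-loops (only for branching bisimulation).
    if branching:
        merged = {(b, a, c) for (b, a, c) in merged
                  if not (b == c and a == tau)}
    return merged


# Illustrative data (assumed): block 0 = {0, 1}, block 1 = {2}.
partition = {0: 0, 1: 0, 2: 1}
transitions = {(0, "tau", 1), (0, "a", 2), (1, "a", 2), (2, "b", 0)}
strong = quotient_standard(transitions, partition)
branching = quotient_standard(transitions, partition, branching=True)
```

The inert transition from state 0 to state 1 becomes a \(\tau \) self-loop on block 0, which is kept for strong bisimulation and dropped for branching bisimulation.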
The Sigref tool also implements a more optimised version, by introducing \(b'\) variables that are interleaved with the b variables, similar to how s and \(s'\) variables are interleaved.

1.
Merge target states to the new encoding (in \(b'\)).
$$\begin{aligned} \mathcal {T}(s,b',a) \, := \, \exists s':\mathcal {T}(s,s',a) \wedge \mathcal {P}'(s',b') \end{aligned}$$ 
2.
Merge source states to the new encoding (in b).
$$\begin{aligned} \mathcal {T}(b,b',a) \, := \exists s:\mathcal {T}(s,b',a) \wedge \mathcal {P}''(s,b) \end{aligned}$$ 
3.
Rename b and \(b'\) variables to s and \(s'\) variables.
$$\begin{aligned} \mathcal {T}(s,s',a) \, := \mathcal {T}(b,b',a)[b\leftarrow s,b'\leftarrow s'] \end{aligned}$$ 
4.
Remove \(\tau \)-loops (only for branching bisimulation).
$$\begin{aligned} \mathcal {T}(s,s',a) \, := \mathcal {T}(s,s',a)\wedge \lnot (s=s'\wedge a=\tau ) \end{aligned}$$
Since we use \(s'\) and b variables for \(\mathcal {P}\), two rename operations would be required to compute \(\mathcal {P}'(s',b')\) and \(\mathcal {P}''(s,b)\). Instead, we perform this version as follows:

1.
Merge target states to the new encoding (in b).
$$\begin{aligned} \mathcal {T}(s,b,a) := \exists s':\mathcal {T}(s,s',a) \wedge \mathcal {P}(s',b) \end{aligned}$$ 
2.
Rename s and b variables to \(s'\) and \(b'\) variables.
$$\begin{aligned} \mathcal {T}(s',b',a) \, := \, \mathcal {T}(s,b,a)[s\leftarrow s',b\leftarrow b'] \end{aligned}$$ 
3.
Merge source states to the new encoding (in b).
$$\begin{aligned} \mathcal {T}(b,b',a) \, := \exists s':\mathcal {T}(s',b',a) \wedge \mathcal {P}(s',b) \end{aligned}$$ 
4.
Rename b and \(b'\) variables to s and \(s'\) variables.
$$\begin{aligned} \mathcal {T}(s,s',a) \, := \mathcal {T}(b,b',a)[b\leftarrow s,b'\leftarrow s'] \end{aligned}$$ 
5.
Remove \(\tau \)-loops (only for branching bisimulation).
$$\begin{aligned} \mathcal {T}(s,s',a) \, := \, \mathcal {T}(s,s',a)\wedge \lnot (s=s'\wedge a=\tau ) \end{aligned}$$
This procedure avoids creating a copy of \(\mathcal {P}\) by renaming. The implementation is then as follows:
These algorithms still compute intermediate results that could be avoided by combining several steps into one operation. For example, every rename operation essentially creates a duplicate of the original BDD, when most BDD nodes are affected by the renaming. Using a custom operation can mitigate this. Similar to the inert algorithm discussed in Sect. 4.4, we implement the algorithm quotient that combines all steps of the above two algorithms. See Fig. 3 and Algorithm 4. Note the similarities with Fig. 2 and Algorithm 3.
Like the inert operation, we evaluate and match the transition relation with two copies of the partition (lines 1–12) and obtain the source block, the target block, and the set of actions at lines 14–15. If we perform branching bisimulation and the source and target blocks are identical, we remove the \(\tau \) transition from the obtained set of actions (line 14). As the two BDDs for the blocks are simple cubes that encode exactly one block by assigning a value to each b variable, and \(\mathcal {T}\) is the set of actions A, it is straightforward to compute the BDD representing the triple \((s,s',A)\) using the recursive function makecube (line 15), which we included for completeness in Algorithm 4 at lines 18–29. Then, we combine all tuples computed at line 15 with or (lines 8 and 12), which has the same effect as existential quantification in the original algorithm.
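The idea of fusing all steps into a single traversal can be shown on explicit states as a single loop that never builds an intermediate relation. This is an analogy to Algorithm 4 rather than a rendering of it: the set insertion plays the role of combining tuples with or, and the name `quotient_fused` is hypothetical.

```python
def quotient_fused(transitions, partition, branching=False, tau="tau"):
    """Map every transition to its block-level triple in one pass."""
    result = set()
    for s, a, t in transitions:
        source_block, target_block = partition[s], partition[t]
        # For branching bisimulation, drop tau-transitions inside a block.
        if branching and a == tau and source_block == target_block:
            continue
        # Duplicates collapse in the set, like the 'or' of equal tuples.
        result.add((source_block, a, target_block))
    return result


# Illustrative data (assumed): block 0 = {0, 1}, block 1 = {2}.
partition = {0: 0, 1: 0, 2: 1}
transitions = {(0, "tau", 1), (0, "a", 2), (1, "a", 2), (2, "b", 0)}
fused = quotient_fused(transitions, partition, branching=True)
```

The result is the same set of block-level transitions that a step-by-step computation would produce, without storing the relation after each step.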
5.2 Computing the new Markovian transition relation
For CTMCs and IMCs, the new Markovian transition relation must be computed. We first describe how this relation is computed using standard BDD operations in the Sigref tool [41]. We then present a new algorithm that combines several steps of the computation.
The Sigref tool uses the following method to compute the new Markovian transition relation:

1.
Merge target states to the new encoding (in b).
$$\begin{aligned} \mathcal {R}(s,b) \, := \, \exists _\texttt {sum} s':\mathcal {R}(s,s') \wedge \mathcal {P}(s',b) \end{aligned}$$ 
2.
Rename b variables to \(s'\) variables.
$$\begin{aligned} \mathcal {R}(s,s') \, := \mathcal {R}(s,b)[b\leftarrow s'] \end{aligned}$$ 
3.
Merge source states to the new encoding (in b).
$$\begin{aligned} \mathcal {R}(s',b) \, := \exists _\texttt {max} s:\mathcal {R}(s,s') \wedge \mathcal {P}'(s,b) \end{aligned}$$ 
4.
Rename b variables to s variables.
$$\begin{aligned} \mathcal {R}(s,s') \, := \mathcal {R}(s',b)[b\leftarrow s] \end{aligned}$$
First, the target states are converted to the new encoding using and_exists_sum, as transition rates to different states in the same block are added to obtain \(\mathcal {R}(s,b)\). The variables b are renamed to \(s'\) to obtain \(\mathcal {R}(s,s')\). The source states are then converted to the new encoding using and_exists_max, as we take the maximum, as discussed in Sect. 3, to obtain \(\mathcal {R}(s',b)\). Finally, the variables b are renamed to s to obtain the result \(\mathcal {R}(s,s')\).
The algorithm to compute the quotient is then as follows:
We also implemented a custom quotient operation for the Markovian transition relation. However, not all steps can be combined as for the interactive transition relation, since adding rates from states to blocks must be done before the source states are merged. Thus, we can only combine steps 2–4. The quotient operation for the Markovian transition relation is similar to the implementation of and_exists_max in Sylvan, modified to perform the rename operations on the fly. We omit it due to space limitations.
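The order of the two aggregations matters, which an explicit-state sketch makes visible: rates are first summed per source state and target block, and only then is the maximum taken over the source states of a block. The code below is an illustration with exact rationals from the Python standard library; `quotient_markov` is a hypothetical name and not the Sylvan operation.

```python
from collections import defaultdict
from fractions import Fraction


def quotient_markov(rates, partition):
    """rates: (s, t) -> rate. Returns (source block, target block) -> rate."""
    # Step 1: sum the rates from each state into each target block.
    to_block = defaultdict(Fraction)
    for (s, t), r in rates.items():
        to_block[(s, partition[t])] += r
    # Steps 2-4: merge source states, taking the maximum per block pair.
    result = {}
    for (s, c), r in to_block.items():
        key = (partition[s], c)
        result[key] = max(result.get(key, Fraction(0)), r)
    return result


# Illustrative data (assumed): block 0 = {0, 1}, block 1 = {2, 3}.
partition = {0: 0, 1: 0, 2: 1, 3: 1}
rates = {(0, 2): Fraction(1, 2), (0, 3): Fraction(1, 2), (1, 2): Fraction(1)}
quotient_rates = quotient_markov(rates, partition)
```

Both states of block 0 have a cumulative rate of 1 into block 1, so the maximum simply selects that common value. Merging the sources before summing would instead add the rates of different source states.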
5.3 Alternative encoding for new states
The standard encoding of the states in the new transition system uses the block numbers assigned during partition refinement. This can have a significant disadvantage. Symbolic models are powerful as they can represent large state spaces efficiently by exploiting structural properties of the transition system, like symmetries and independent variables. Such properties are lost when using the block numbers of the partition.
We propose an alternative encoding “pick-one-state” that picks one state from each block to represent all states in the block. Each path in \(\mathcal {P}\) to the sub-BDD that represents a block (on b variables) encodes states in that block, such that state variables encountered along the path are True if the high edge was followed and False if the low edge was followed. We use this information to compute exactly one state (encoded using b variables, with missing state variables set to False) that represents the block and store this state in an array. Since we are simply interested in obtaining one state that represents each block, we only need to visit each node in the BDD \(\mathcal {P}\) once, so we use the operation cache to denote whether we have visited the node. See Algorithm 5. This algorithm pick fills an array picked with a single state for each block, obtained from the path as described above using a helper function pick_one_state.
After obtaining a single state for each block, we can use an algorithm similar to refine (Sect. 4.3) to replace each block in \(\mathcal {P}\) by the selected state (encoded using b variables). Then, the same algorithms as in Sects. 5.1 and 5.2 compute the new transition system using the proposed encoding.
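A small explicit-state sketch shows what the encoding amounts to. States are bit tuples, and the representative of a block is taken to be its smallest state; that choice is an assumption of this sketch (Algorithm 5 takes the state found along a BDD path). The names `pick_representatives` and `reencode` are hypothetical.

```python
def pick_representatives(partition):
    """Return block number -> one state of that block (the smallest one)."""
    picked = {}
    for state in sorted(partition):
        picked.setdefault(partition[state], state)
    return picked


def reencode(partition, picked):
    """Replace each block number by the representative state of the block."""
    return {state: picked[block] for state, block in partition.items()}


# Illustrative data (assumed): two state bits, blocks determined by bit 0.
partition = {(0, 0): 0, (0, 1): 0, (1, 0): 1, (1, 1): 1}
picked = pick_representatives(partition)
encoded = reencode(partition, picked)
```

In this example the representatives (0, 0) and (1, 0) still expose the structure of the first state bit, whereas the block numbers 0 and 1 carry no relation to the original state variables.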
6 Tool support
We implemented multi-core symbolic signature-based bisimulation minimisation in a tool called SigrefMC. The tool supports LTSs, CTMCs, and IMCs delivered in two input formats, the XML format used by the original Sigref tool and the BDD format that the tool LTSmin [28] generates for various model checking languages. SigrefMC supports both the floating-point and the rational representation of rates in continuous-time transitions.
One of the design goals of this tool is to encourage researchers to extend it for their own file formats and notions of bisimulation, and to integrate it in other toolsets. Therefore, SigrefMC is freely available online (see Footnote 1) and licensed with the permissive Apache 2.0 license. Documentation is available and instructions for extending the tool for different input/output formats and types of bisimulation are included.
6.1 Support for LTSmin
SigrefMC supports models generated by the model checking toolset LTSmin. LTSmin provides a language-independent Partitioned Next-State Interface (Pins), which connects various input languages to model checking algorithms [6, 28, 31]. In Pins, the states of a system are represented by vectors of N integer values. Furthermore, transitions are distinguished in K disjunctive “transition groups”, i.e., each transition in the system belongs to one of these transition groups. The transition relation of each transition group usually only depends on a subset of the entire state vector called the “short vector”, further distinguished by the variables that are “read” and the variables that are “written” [31]. This enables the efficient encoding of transitions that only affect some integers of the state vector. Exploiting this information lets the Pins interface work in a quasi-symbolic way, as a single pair of short vectors can represent many transition relations on the full state vector.
Initially, LTSmin does not have knowledge of the transitions in each transition group, and only the initial state is known. The transition system is explored by learning new transitions via the Pins interface, which are then added to the transition relation. Various input languages connect to LTSmin via the Pins interface by implementing a next-state function, which produces all target states (as write vectors) reachable from a given source state (as read vector). Using the LTSmin toolset, we can convert process algebra specifications in the language mCRL2 [13] to the BDD file format that SigrefMC supports. We can then minimise the obtained LTS using the techniques described in this paper and obtain the result, either as a symbolic LTS or as a simple explicit-state enumeration of transitions between states.
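The way one pair of short vectors stands for many full transitions can be made concrete with a few lines of code. This is a sketch of the idea only and not the Pins API of LTSmin: the function `apply_group` and its parameters `read_idx`, `write_idx` and `short_transitions` are hypothetical names.

```python
def apply_group(state, read_idx, write_idx, short_transitions):
    """Successors of a full state vector under one transition group.

    state:             tuple of N integers
    read_idx:          positions of the state vector the group reads
    write_idx:         positions of the state vector the group writes
    short_transitions: set of (read vector, write vector) pairs
    """
    read_vec = tuple(state[i] for i in read_idx)
    successors = set()
    for r, w in short_transitions:
        if r == read_vec:
            nxt = list(state)
            for i, value in zip(write_idx, w):
                nxt[i] = value  # positions outside write_idx are unchanged
            successors.add(tuple(nxt))
    return successors


# One short transition on position 0 (assumed data): 0 -> 1.
group = {((0,), (1,))}
succ_a = apply_group((0, 5, 7), read_idx=(0,), write_idx=(0,), short_transitions=group)
succ_b = apply_group((0, 9, 9), read_idx=(0,), write_idx=(0,), short_transitions=group)
```

The single stored pair yields a transition for every full state whose first component is 0, regardless of the remaining components.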
7 Experimental evaluation
This section reports on the experimental evaluation of the techniques proposed in this paper. We study the improvements to signature refinement in Sect. 7.1, the improvements to quotient computation in Sect. 7.2, the effect of ordering block variables after or before action variables in Sect. 7.3, and finally the performance of the presented tool SigrefMC on process algebra benchmarks produced with LTSmin in Sect. 7.4. We also refer to the full experimental data that are available online (see Footnote 2) and can be reproduced.
When comparing SigrefMC to other tools, we restrict ourselves to the symbolic bisimulation minimisation tool Sigref by Wimmer et al., as [41] already compares Sigref to other explicit-state and symbolic bisimulation minimisation tools.
7.1 Signature refinement
7.1.1 Design
To study the improvements to signature refinement that we present in this paper, we compared our results (using the skip list variant of refine) to Sigref 1.5 [40] for LTS and IMC models, and to a version of Sigref used in [38] for CTMC models. For the CTMC models, we used Sigref with rational numbers provided by the GMP library and SigrefMC with rational number support by Sylvan. For the IMC models, version 1.5 of Sigref does not support the GMP library and the version used in [38] does not support IMCs. We used SigrefMC with floating points for a fairer comparison, but the tools give a slightly different number of blocks, due to the use of floating points.
We restrict ourselves to the models presented in [38, 41] and an IMC model that is part of the distribution of Sigref. These models have been generated from PRISM benchmarks using a custom version of the PRISM toolset [30]. We refer to the literature for a description of these models.
We perform experiments on the three tools using a 48-core machine, containing 4 AMD Opteron™ 6168 processors with 12 cores each. We measure the runtimes for the partition refinement algorithm (excluding file I/O) using Sigref, SigrefMC with only 1 worker, and SigrefMC with 48 workers.
Apart from the new refine and inert algorithms presented in the current paper, there are several other differences. The first is that the original Sigref uses the CUDD implementation of BDDs, while SigrefMC uses Sylvan, along with some extra BDD algorithms that avoid explicitly computing variable renaming of some BDDs. The second is that Sigref has several optimisations [40] that are not available in SigrefMC.
7.1.2 Results
See Table 1 for the results of these experiments. These results were obtained by repeating each benchmark at least 15 times and taking the average. The timeout was set to 3600 s. The column “States” shows the number of states before bisimulation minimisation and “Blocks” the number of equivalence classes after bisimulation minimisation. We show the wall clock time using Sigref (\(T_w\)), using SigrefMC with 1 worker (\(T_1\)) and using SigrefMC with 48 workers (\(T_{48}\)). We compute the sequential speedup \(T_w/T_1\), the parallel speedup \(T_1/T_{48}\), and the total speedup \(T_w/T_{48}\).
Note that we obtained these results using the variable ordering \(s,s'< a < b\); the other experiments are computed using the variable ordering \(s,s'< b < a\), as discussed below and in Sect. 4.2.
Due to space constraints, we do not include all results, but restrict ourselves to larger models. We refer to the full experimental data that is available online. In the full set of results, excluding executions that take less than 1 s, SigrefMC is always faster sequentially and always benefits from parallelism.
The results show a clear advantage for larger models. One interesting result is for the p2p75 model. This model is ideal for symbolic bisimulation with a large number of states (\(2^{35}\)) and very few blocks after minimisation (336). For this model, our tool is 95\(\times \) faster sequentially and has a parallel speedup of 8\(\times \), resulting in a total speedup of 767\(\times \). The best parallel speedup of 17\(\times \) was obtained for the kanban05 model.
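The three speedups in Table 1 are linked by a simple identity, which we spell out as a reading aid. With the rounded factors quoted for p2p75, the product is 760 rather than 767; we assume the difference stems from rounding the individual factors, as the total is computed from the measured times.

$$\begin{aligned} \frac{T_w}{T_{48}} \, = \, \frac{T_w}{T_1}\cdot \frac{T_1}{T_{48}}, \qquad 95 \times 8 = 760 \approx 767 \end{aligned}$$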
In almost all experiments, the signature computation dominates with 70–99% of the execution time sequentially. We observe that the refinement step sometimes benefits more from parallelism than signature computation, with speedups up to 29.9\(\times \). We also find that reusing block numbers for stable blocks causes a major reduction in computation time towards the end of the procedure. The kanban LTS models and the larger polling CTMC models are an excellent case study to demonstrate this. See Fig. 4. There is a clear correlation between the number of new blocks per iteration and the time per iteration for SigrefMC, while the time per iteration for Sigref seems to correlate with the number of blocks.
7.2 Quotient computation
7.2.1 Design
To study the different methods for quotient computation, we implemented the methods described in Sects. 5.1, 5.2 and 5.3:

blocks: block encoding using standard operations

block: block encoding using specialised operations

pick: pick-one-state encoding, specialised operations
We computed the partition in SigrefMC using rational numbers for the Markovian transitions and with the variable ordering \(s,s'< b < a\) for the interactive transitions. We used the same 48-core machine as for the experiments in Sect. 7.1. We measure the time for quotient computation with 1 worker and with 48 workers. Our experimental setup performed all benchmarks in random order and repeated the experiments ad infinitum. When we halted the script, every benchmark had been performed at least 12 times. The timeout was set to 1200 s, including time to compute the partition.
7.2.2 Results
See Table 2 for the results of these experiments. The results show that the block implementation is faster than the blocks implementation, except for the ftwc02 model. For CTMC models, using specialised operations results in a speedup of 2–\(3{\times }\). For LTS models, using specialised operations results in a speedup of 5–\(9{\times }\). The pick-one-state encoding shows mixed results for computation time, as it can be slower or faster than block encoding. Furthermore, we obtain a parallel speedup of up to \(20.5{\times }\) for the block encoding and \(21.9{\times }\) with the pick-one-state encoding, with 48 workers.
See Table 3 for the sizes of the computed transition relations using block encoding and using pick-one-state encoding, in number of BDD nodes. In many cases, pick-one-state encoding is superior, with up to \(5162{\times }\) smaller BDDs for the polling models. For the p2p models, block encoding is superior, likely due to the small number of blocks after bisimulation minimisation.
7.3 Variable ordering
7.3.1 Design
As discussed in Sect. 4.2, we can choose to order block variables b before or after action variables a in the variable ordering of the BDDs. To compare the ordering \(s,s'< a < b\) and \(s,s'< b < a\), we compare signature refinement and quotient computation for the kanban LTS models.
We expect that in general ordering b before a is the best choice. If we have a variables before b variables, then it is guaranteed that all BDD nodes on a variables are recreated when we compute signatures for partition refinement and when we compute the quotient, whereas they may be reused if a variables are last in the ordering.
7.3.2 Results
See Table 4 for the results of this experiment. All data points are computed with at least 5 runs. We computed the quotient using the pick-one-state algorithm. We see that in most cases the ordering with b before a is superior. We observe a stronger effect for partition refinement than for quotient computation. The surprising exception is quotient computation of the kanban04 model with strong bisimulation, where the ordering with a before b is slightly better, although the total time still favours ordering b before a.
7.4 Process algebra experiments
7.4.1 Design
As described in Sect. 6.1, we extended SigrefMC with support for BDDs produced by the model checking toolset LTSmin from process algebra models specified in the mCRL2 specification language.
We first took a number of communication protocols from the mCRL2 example directory, in particular the bounded retransmission protocol (BRP) and the Sliding Window Protocol (SWP). We made them parametric in the number of data elements, number of retries, window size, etc. We also include a number of distributed algorithms. We ported the probabilistic leader election protocols [3], based on Dolev–Klawe–Rodeh and Franklin, from \(\mu \)CRL to mCRL2. We also included Hesselink’s hardware register [27]. Finally, we also included an industrial case study: Workload Management System of the computation grid at the Large Hadron Collider LHC (CERN), specified in [36].
This leads to the following specifications:

SWP_m_n: the Sliding Window Protocol [1] on m data items, with window size n. This specifies a one-directional version of the sliding window protocol. n subsequent data items can be sent and acknowledged in arbitrary order. This requires sequence numbers modulo 2n. Its external behaviour is equivalent to a 2n-place buffer.

BRP_m_\(\ell \)_n: the bounded retransmission protocol [24] on m data items, sending a list of length \(\ell \) and with n retries. This protocol extends the alternating bit protocol (ABP), but gives up after n retries. The status of the transmission is returned to both the sender and the receiver. The external behaviour is somewhat complicated, since the sender cannot distinguish whether the last data element or the last acknowledgement got lost.

DKR_n: randomised variant [3] of Dolev–Klawe–Rodeh’s [22] Leader Election Protocol on a unidirectional ring with n anonymous partners. Several rounds may be needed when partners choose the same identity. The protocol is based on hop counters and on an alternating bit to distinguish subsequent rounds. The external behaviour is equivalent to a single leader action.

Franklin_n_m: randomised variant [3] of Franklin’s Leader Election Protocol [23], but now on a bidirectional ring with n partners, using \(m\le n\) different identities. The external behaviour is again equivalent to a single leader action.

Hesselink_n: Hesselink’s handshake register [27], constructed from four safe registers and four Boolean atomic registers, modelled in mCRL2 by Groote, and used for experimentation in [34].

WMS: this models the Workload Management System of the DIRAC (Distributed Infrastructure with Remote Agent Control) for the Large Hadron Collider experiments at CERN, as described in [36].
We used the following toolchain to generate input files for SigrefMC:

1.
mcrl22lps -Dfvn from the mCRL2 toolset to generate LPS files from the specifications

2.
lps2lts-sym --vset=lddmc from the LTSmin toolset to generate the transition systems in LDD format from the LPS files

3.
ldd2bdd from the LTSmin toolset to convert the transition systems from LDDs to BDDs
To evaluate SigrefMC on these models, we performed the same experiments as in Sect. 7.2.
We measure the time for partition refinement and quotient computation with 1 worker and with 48 workers. Our experimental setup performed all benchmarks in random order and repeated the experiments ad infinitum. When we halted the script, every benchmark had been performed at least 6 times. The timeout was set to 1200 s for the entire program, i.e., partition refinement and quotient computation.
7.4.2 Results
The results are summarised in Tables 5 and 6. We do not include all results to conserve space; all results from the experiments are available online.
It is interesting to see that both strong and branching bisimulation result in huge reductions. We see a clear benefit from parallel processing, with speedups of up to 24.7\(\times \) for signature refinement and up to 24.5\(\times \) for quotient computation (block encoding).
The pick-one-state encoding does not work as well here, probably because the number of blocks is low and the state vectors are relatively long. For a few models, the pick-one-state encoding works relatively well; these are models that have a high number of blocks.
8 Conclusions
Originally, we intended to investigate parallelism in symbolic bisimulation minimisation. To our surprise, we obtained a much higher sequential speedup using specialised BDD operations, as demonstrated by the results in Table 1 and Fig. 4. The specialised BDD operations offer a clear advantage sequentially and the integration with Sylvan results in decent parallel speedups. Our best result had a total speedup of 767\(\times \). By also using specialised BDD operations for quotient computation, we demonstrated performance improvements of 2–10\(\times \) over using standard BDD operations.
The success of this approach suggests that for applications that involve decision diagrams, specialised operations that combine sequential steps can be a good method to obtain performance improvements of several orders of magnitude. Similarly, the additional performance improvement gained by the parallel framework from Sylvan is relatively low-hanging fruit to improve the performance of symbolic algorithms with decision diagrams.
The pick-one-state encoding that we proposed in this paper is promising, especially for transition systems that are still relatively large after bisimulation minimisation. The implementation discussed here just picked an arbitrary state; we expect that better heuristics may be developed in the future.
A limitation of this study is that we only measured the performance on the benchmarks that were used in [38, 40] and on several benchmarks from the mCRL2 distribution.
References
Badban, B., Fokkink, W., Groote, J.F., Pang, J., van de Pol, J.: Verification of a sliding window protocol in \(\mu \)CRL and PVS. Formal Asp. Comput. 17(3), 342–388 (2005)
Bahar, R.I., Frohm, E.A., Gaona, C.M., Hachtel, G.D., Macii, E., Pardo, A., Somenzi, F.: Algebraic decision diagrams and their applications. ICCAD 1993, 188–191 (1993)
Bakhshi, R., Fokkink, W., Pang, J., van de Pol, J.: Leader election in anonymous rings: Franklin goes probabilistic. In: Ausiello, G., Karhumäki, J., Mauri, G., Ong, C.L. (eds.) TCS’08, IFIP, vol. 273, pp. 57–72. Springer, Berlin (2008)
Blom, S., Haverkort, B.R., Kuntz, M., van de Pol, J.: Distributed Markovian bisimulation reduction aimed at CSL model checking. ENTCS 220(2), 35–50 (2008)
Blom, S., Orzan, S.: Distributed branching bisimulation reduction of state spaces. ENTCS 89(1), 99–113 (2003)
Blom, S., van de Pol, J., Weber, M.: LTSmin: distributed and symbolic reachability. In: CAV, LNCS, vol. 6174, pp. 354–359. Springer (2010)
Blumofe, R.D.: Scheduling multithreaded computations by work stealing. In: FOCS, pp. 356–368. IEEE Computer Society (1994)
Bouali, A., de Simone, R.: Symbolic bisimulation minimisation. In: Computer Aided Verification, 4th International Workshop, LNCS, vol. 663, pp. 96–108. Springer (1992)
Brace, K.S., Rudell, R.L., Bryant, R.E.: Efficient implementation of a BDD package. In: DAC, pp. 40–45 (1990)
Bryant, R.E.: Graph-based algorithms for Boolean function manipulation. IEEE Trans. Comput. C-35(8), 677–691 (1986)
Burch, J., Clarke, E., Long, D., McMillan, K., Dill, D.: Symbolic model checking for sequential circuit verification. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 13(4), 401–424 (1994)
Clarke, E.M., McMillan, K.L., Zhao, X., Fujita, M., Yang, J.: Spectral transforms for large Boolean functions with applications to technology mapping. In: DAC, pp. 54–60 (1993)
Cranen, S., Groote, J.F., Keiren, J.J.A., Stappers, F.P.M., de Vink, E.P., Wesselink, W., Willemse, T.A.C.: An overview of the mCRL2 toolset and its recent advances. In: TACAS, LNCS, vol. 7795, pp. 199–213. Springer (2013)
De Nicola, R., Vaandrager, F.W.: Three logics for branching bisimulation. J. ACM 42(2), 458–487 (1995)
Derisavi, S.: A symbolic algorithm for optimal Markov chain lumping. TACAS 2007, 139–154 (2007)
Derisavi, S.: Signature-based symbolic algorithm for optimal Markov chain lumping. In: QEST 2007, pp. 141–150. IEEE Computer Society (2007)
van Dijk, T.: Sylvan: multicore decision diagrams. Ph.D. thesis, University of Twente (2016)
van Dijk, T., Laarman, A., van de Pol, J.: Multicore BDD operations for symbolic reachability. ENTCS 296, 127–143 (2013)
van Dijk, T., van de Pol, J.: Lace: non-blocking split deque for work-stealing. In: MuCoCoS, LNCS, vol. 8806, pp. 206–217. Springer (2014)
van Dijk, T., van de Pol, J.: Sylvan: multicore decision diagrams. In: TACAS, LNCS, vol. 9035, pp. 677–691. Springer (2015)
van Dijk, T., van de Pol, J.: Multicore symbolic bisimulation minimisation. In: TACAS, LNCS, vol. 9636, pp. 332–348. Springer (2016)
Dolev, D., Klawe, M.M., Rodeh, M.: An \(O(n \log n)\) unidirectional distributed algorithm for extrema finding in a circle. J. Algorithms 3(3), 245–260 (1982)
Franklin, W.R.: On an improved algorithm for decentralized extrema finding in circular configurations of processors. Commun. ACM 25(5), 336–337 (1982)
Groote, J.F., van de Pol, J.: A bounded retransmission protocol for large data packets. In: Wirsing, M., Nivat, M. (eds.) AMAST’96, LNCS 1101, pp. 536–550. Springer, Berlin (1996)
Hermanns, H.: Interactive Markov Chains: The Quest for Quantified Quality, Lecture Notes in Computer Science, vol. 2428. Springer, Berlin (2002)
Hermanns, H., Katoen, J.: The how and why of interactive Markov chains. In: FMCO’09, LNCS 6286, pp. 311–337. Springer (2009)
Hesselink, W.H.: Invariants for the construction of a handshake register. Inf. Process. Lett. 68(4), 173–177 (1998)
Kant, G., Laarman, A., Meijer, J., van de Pol, J., Blom, S., van Dijk, T.: LTSmin: high-performance language-independent model checking. In: TACAS 2015, LNCS, vol. 9035, pp. 692–707. Springer (2015)
Kulakowski, K.: Concurrent bisimulation algorithm. CoRR. arXiv:1311.7635 (2013)
Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: CAV, LNCS, vol. 6806, pp. 585–591. Springer (2011)
Meijer, J., Kant, G., Blom, S., van de Pol, J.: Read, write and copy dependencies for symbolic model checking. In: Yahav, E. (ed.) HVC, LNCS, vol. 8855, pp. 204–219. Springer, Berlin (2014)
Mumme, M., Ciardo, G.: An efficient fully symbolic bisimulation algorithm for nondeterministic systems. Int. J. Found. Comput. Sci. 24(2), 263–282 (2013)
Paige, R., Tarjan, R.E.: Three partition refinement algorithms. SIAM J. Comput. 16(6), 973–989 (1987)
van de Pol, J., Timmer, M.: State space reduction of linear processes using control flow reconstruction. In: Liu, Z., Ravn, A.P. (eds.) ATVA’09, LNCS, vol. 5799, pp. 54–68. Springer, Berlin (2009)
Pugh, W.: Skip lists: a probabilistic alternative to balanced trees. Commun. ACM 33(6), 668–676 (1990)
Remenska, D., Willemse, T.A.C., Verstoep, K., Fokkink, W., Templon, J., Bal, H.E.: Using model checking to analyze the system behavior of the LHC production grid. In: CCGrid’12, pp. 335–343. IEEE Computer Society (2012)
Wijs, A.: GPU accelerated strong and branching bisimilarity checking. TACAS 2015, 368–383 (2015)
Wimmer, R., Becker, B.: Correctness issues of symbolic bisimulation computation for Markov chains. In: MMB&DFT, LNCS, vol. 5987, pp. 287–301. Springer (2010)
Wimmer, R., Derisavi, S., Hermanns, H.: Symbolic partition refinement with automatic balancing of time and space. Perform. Eval. 67(9), 816–836 (2010)
Wimmer, R., Herbstritt, M., Becker, B.: Optimization techniques for BDD-based bisimulation computation. In: 17th GLSVLSI, pp. 405–410. ACM (2007)
Wimmer, R., Herbstritt, M., Hermanns, H., Strampp, K., Becker, B.: Sigref—a symbolic bisimulation tool box. In: ATVA, LNCS, vol. 4218, pp. 477–492. Springer (2006)
Wimmer, R., Hermanns, H., Herbstritt, M., Becker, B.: Towards symbolic stochastic aggregation. Technical Report, SFB/TR 14 AVACS (2007)
Acknowledgements
Open access funding provided by Johannes Kepler University Linz.
Additional information
Work funded by the NWO Grant 612.001.101 (MaDriD) and by FWF, NFN Grant S11408-N23 (RiSE).
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
van Dijk, T., van de Pol, J. Multicore symbolic bisimulation minimisation. Int J Softw Tools Technol Transfer 20, 157–177 (2018). https://doi.org/10.1007/s100090170468z