Multicore symbolic bisimulation minimisation
Abstract
We introduce parallel symbolic algorithms for bisimulation minimisation, to combat the combinatorial state space explosion along three different paths. Bisimulation minimisation reduces a transition system to the smallest system with equivalent behaviour. We consider strong and branching bisimilarity for interactive Markov chains, which combine labelled transition systems and continuous-time Markov chains. Large state spaces can be represented concisely by symbolic techniques, based on binary decision diagrams. We present specialised BDD operations to compute the maximal bisimulation using signature-based partition refinement. We also study the symbolic representation of the quotient system and suggest an encoding based on representative states, rather than block numbers. Our implementation extends the parallel, shared-memory BDD library Sylvan, to obtain a significant speedup on multicore machines. We propose the usage of partial signatures and of disjunctively partitioned transition relations, to increase the parallelisation opportunities. Our new parallel data structure for block assignments also increases scalability. We provide SigrefMC, a versatile tool that can be customised for bisimulation minimisation in various contexts. In particular, it supports models generated by the high-performance model checker LTSmin, providing access to specifications in multiple formalisms, including process algebra. The extensive experimental evaluation is based on various benchmarks from the literature. We demonstrate a speedup of up to 95\(\times \) for computing the maximal bisimulation on one processor. In addition, we find parallel speedups on a 48-core machine of another 17\(\times \) for partition refinement and 24\(\times \) for quotient computation. Our new encoding of the reduced state space leads to smaller BDD representations, with up to a 5162-fold reduction.
Keywords
Bisimulation minimisation · Interactive Markov chains · Binary decision diagrams · Parallel algorithms
1 Introduction
One of the main challenges for model checking is that the space and time requirements of model checking algorithms increase exponentially in the size of the models. This paper combines state space reduction, symbolic representation, and parallel computation, to alleviate the state space explosion.
As input models, we consider interactive Markov chains (IMC). These provide a compositional framework to study functionality, performance, and dependability of reactive systems. IMCs inherit nondeterministic choice and communication from labelled transition systems, and probabilistic timed (Markovian) transitions from continuous-time Markov chains.
A state space reduction computes the smallest “equivalent” model. We consider strong bisimilarity, which preserves all behaviour, and branching bisimilarity, which abstracts from internal behaviour (represented by \(\tau \)-steps) and only preserves the observable behaviour. Note that branching bisimulation preserves the branching structure of an LTS, thus preserving all properties expressible in CTL*-X [14]. These notions correspond to strong and branching lumping for IMCs.
The reduced state space consists of (representatives of) the equivalence classes in the largest bisimulation, which is typically computed using partition refinement. Starting with the initial partition, in which all states are equivalent, the current partition is refined until the states in any equivalence class can no longer be distinguished. Blom et al. [5] introduced a signature-based method, which defines the equivalence classes according to the characterising signature of a state.
Another important technique to handle large state spaces is symbolic representation. Sets of states are represented by characteristic functions, which are efficiently stored in binary decision diagrams (BDDs). In the literature, symbolic methods have been applied to bisimulation minimisation in several ways. Bouali and De Simone [8] refine the equivalence relation \(R\subseteq S\times S\), by iteratively removing all “bad” pairs from R, i.e., pairs of states that are no longer equivalent. For strong bisimulation, Mumme and Ciardo [32] apply saturationbased methods to compute R. Wimmer et al. [40, 41] use signatures to refine the partition, represented by the assignment to equivalence classes \(P:S\rightarrow C\). Symbolic bisimulation based on signatures has also been applied to Markov chains by Derisavi [16] and Wimmer et al. [38, 39].
The symbolic representation of the reduced state space tends to be much larger than that of the original model. One particular application of symbolic bisimulation minimisation is as a bridge between symbolic models and explicit-state analysis algorithms. Symbolic models can have very large state spaces that are efficiently encoded using BDDs. The minimised model often has a sufficiently small number of states, so that it can be further analysed efficiently using explicit-state algorithms.
Symbolic techniques mainly reduce the memory requirements of model checking. To speed up the computation, developing scalable parallel algorithms that take advantage of multicore computer systems is the way forward. In [17, 18, 20], we implemented the multicore BDD package Sylvan, which provides parallel BDD operations for symbolic model checking.
Parallelisation had been applied to explicit-state bisimulation minimisation before. Blom et al. [4, 5] introduced distributed signature-based bisimulation reduction. In addition, [29] proposed a concurrent algorithm for bisimulation minimisation which combines signatures with the approach by Paige and Tarjan [33]. Recently, Wijs [37] implemented highly parallel strong and branching bisimilarity checking on GPGPUs. As far as we are aware, no earlier work combines symbolic bisimulation minimisation and parallelism. This paper is an extended version of [21]. There, we demonstrated that specialised BDD operations for signature refinement provide a major speedup of the sequential algorithm, and scale across multiple processors.
We extend [21] by four new results. First, we investigate how to compute the reduced state space, i.e., the quotient of the original system with respect to the maximal bisimulation obtained by signature refinement. Traditionally, the quotient is computed by a sequence of standard BDD operations. Similar to computing the partition, we find that quotient computation benefits from specialised BDD operations. Second, we study the representation of the quotient. Traditionally, its states are encoded by using the assigned block number as state identifier. We improve the encoding by choosing one representative state from each block. This considerably reduces the size of the resulting BDD representation. Third, we refine our algorithm. Instead of using a monolithic transition relation, we now support a disjunctive partitioning of the transition relation. This appears to be more efficient than a monolithic transition relation and provides further parallelisation opportunities when computing the maximal bisimulation. Finally, we link the tool SigrefMC presented in [21] to LTSmin, by supporting the partitioned transition systems generated by the symbolic backend of the LTSmin toolset [6, 28, 31]. Since LTSmin supports various input languages, including the specification language mCRL2 [13] for process algebra, this allows us to carry out a considerably larger set of experiments, generated from various specification languages.
Outline This paper presents the following contributions. We recapitulate the notion of partition refinement with partial signatures in Sect. 3. Section 4 discusses how we extended Sylvan to parallelise signature-based partition refinement. In particular, we develop three specialised BDD algorithms: the refine algorithm refines a partition according to a signature, but maximally reuses the block number assignment of the previous partition (Sect. 4.3). This algorithm improves the operation cache usage for the computation of the signatures of stable blocks and enables partition refinement with partial signatures. The inert algorithm removes all transitions that are not inert (Sect. 4.4). This algorithm avoids an expensive intermediate result reported in the literature [41]. We discuss the new quotient computation in Sect. 5. Specialised BDD algorithms significantly speed up the quotient computation for the interactive transition relation (Sect. 5.1) and for the Markovian transition relation (Sect. 5.2). The new encoding of the quotient space is explained in Sect. 5.3. Section 6 presents the implementation of these algorithms as a versatile tool that can be customised for bisimulation minimisation in various contexts, including support for transition systems generated by the model checking toolset LTSmin (Sect. 6.1). Section 7 discusses experimental data based on benchmarks from the literature. For partition refinement, we demonstrate a speedup of up to 95\(\times \) sequentially. In addition, we find parallel speedups of up to 17\(\times \) due to parallelisation with 48 cores. For quotient computation, we find a speedup of 2–10\(\times \) by using specialised operations, and we find significantly smaller BDDs (up to 5162\(\times \) smaller) when using a representative state rather than the block number to encode the new transition system.
2 Preliminaries
We recall the basic definitions of partitions, of labelled transition systems, of continuous-time Markov chains, of interactive Markov chains, and of various bisimulations as in [5, 26, 40, 41, 42].
2.1 Partitions
Definition 1
A partition of a set S is a set \(\pi \subseteq 2^S\) of non-empty, pairwise disjoint subsets of S whose union is S.
The elements of \(\pi \) are called equivalence classes or blocks. If \(\pi '\) and \(\pi \) are two partitions, then \(\pi '\) is a refinement of \(\pi \), written \(\pi '\sqsubseteq \pi \), if each block of \(\pi '\) is contained in a block of \(\pi \). Each equivalence relation \(\equiv \) is associated with a partition \(\pi =S/\!\equiv \). In this paper, we use \(\pi \) and \(\equiv \) interchangeably.
2.2 Transition systems
Definition 2
A labelled transition system (LTS) is a tuple \((S,\textsf {Act},T)\), consisting of a set of states S, a set of labels \(\textsf {Act}\), which may contain the non-observable action \(\tau \), and transitions \(T\subseteq S\times \textsf {Act}\times S\).
We write \(s\overset{a}{\rightarrow }t\) for \((s,a,t)\in T\) and \(s\!\mathop {\nrightarrow }\limits ^{\tau }\) when s has no outgoing \(\tau \)-transitions. We use \(\overset{a*}{\rightarrow }\) to denote the transitive reflexive closure of \(\overset{a}{\rightarrow }\). Given an equivalence relation \(\equiv \), we write \(\overset{a}{\rightarrow }_{\equiv }\) for \(\overset{a}{\rightarrow }\!\cap \!\equiv \), i.e., transitions between equivalent states, called inert transitions. We use \(\overset{\tau *}{\rightarrow }_{\equiv }\) for the transitive reflexive closure of \(\overset{\tau }{\rightarrow }_{\equiv }\).
Definition 3
A continuous-time Markov chain (CTMC) is a tuple \((S,\mathbf {R})\), consisting of a set of states S and Markovian transitions \(\mathbf {R}:S\rightarrow S\rightarrow \mathbb {R}_{\ge 0}\).
We write \(s\overset{\lambda }{\Rightarrow }t\) for \(\mathbf {R}(s)(t)=\lambda \). The interpretation of \(s\overset{\lambda }{\Rightarrow }t\) is that the CTMC can switch from s to t within d time units with probability \(1-\hbox {e}^{-\lambda \cdot d}\). For a state s, we denote by \(\mathbf R (s)(C)=\sum _{s'\in C} \mathbf R (s)(s')\) the cumulative rate to reach a set of states \(C\subseteq S\) from state s in one transition.
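To make the cumulative rate concrete, here is a small explicit-state sketch in Python; the rate map `R` and the state names are hypothetical examples, not from the paper:

```python
import math

# Hypothetical explicit-state CTMC: R maps each state to its outgoing rates.
R = {"s0": {"s1": 2.0, "s2": 3.0}, "s1": {}, "s2": {}}

def cumulative_rate(R, s, C):
    """R(s)(C): the total rate from state s into the set of states C."""
    return sum(rate for t, rate in R.get(s, {}).items() if t in C)

def prob_within(lam, d):
    """Probability that a transition with rate lam fires within d time units."""
    return 1.0 - math.exp(-lam * d)
```

The symbolic algorithm later computes exactly these per-block sums, but on MTBDD leaves rather than on explicit dictionaries.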
Definition 4
An interactive Markov chain (IMC) is a tuple (\(S,\textsf {Act},T,\mathbf {R})\), consisting of a set of states S, a set of labels \(\textsf {Act}\) that may contain the non-observable action \(\tau \), transitions \(T\subseteq S\times \textsf {Act}\times S\), and Markovian transitions \(\mathbf {R}:S\rightarrow S\rightarrow \mathbb {R}_{\ge 0}\).
An IMC basically combines the features of an LTS and a CTMC [25, 26]. One feature of IMCs is the maximal progress assumption. Internal interactive transitions, i.e., \(\tau \)-transitions, can be assumed to take place immediately, while the probability that a Markovian transition executes immediately is zero. Therefore, we may remove all Markovian transitions from states that have outgoing \(\tau \)-transitions: \(s\mathop {\rightarrow }\limits ^{\tau }\) implies \(\mathbf R (s)(S)=0\). We call IMCs to which this operation has been applied maximal-progress-cut (mpcut) IMCs. In the rest of this paper, we implicitly assume that IMCs are mpcut.
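The maximal-progress cut can be sketched with explicit sets; `T` holds interactive transitions as triples and `R` holds Markovian rates, both hypothetical example encodings:

```python
def maximal_progress_cut(T, R, tau="tau"):
    """Remove all Markovian transitions from states with an outgoing tau,
    so that s --tau--> implies R(s)(S) = 0."""
    has_tau = {s for (s, a, t) in T if a == tau}
    return {s: ({} if s in has_tau else dict(succ)) for s, succ in R.items()}
```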
2.3 Bisimulation
We recall strong and branching bisimulation. All discussed bisimulations are equivalence relations on the states of a transition system. Two states are bisimilar if and only if there is a bisimulation that relates them. So the maximal bisimulation relates two states if and only if they are bisimilar. For LTSs, we define strong and branching bisimulation as follows [41]:
Definition 5
A strong bisimulation on an LTS is an equivalence relation \(\equiv _{S}\) such that for all states \(s,t,s'\) with \(s\equiv _{S}t\) and \(s\overset{a}{\rightarrow }s'\), there is a state \(t'\) with \(t\overset{a}{\rightarrow }t'\) and \(s'\equiv _{S}t'\).
Definition 6
A branching bisimulation on an LTS is an equivalence relation \(\equiv _{B}\) such that for all states \(s,t,s'\) with \(s\equiv _{B}t\) and \(s\overset{a}{\rightarrow }s'\), either
- \(a=\tau \) and \(s'\equiv _{B}t\), or
- there are states \(t',t''\) with \(t\overset{\tau *}{\rightarrow }t'\overset{a}{\rightarrow }t''\) and \(t\equiv _{B}t'\) and \(s'\equiv _{B}t''\).
For CTMCs, we define strong bisimulation as follows [16, 38]:
Definition 7
A strong bisimulation on a CTMC is an equivalence relation \(\equiv _{S}\) such that for all \((s,t)\in \;\equiv _{S}\) and for all classes \(C\in S/\!\equiv _{S}\), \(\mathbf R (s)(C)=\mathbf R (t)(C)\).
For mpcut IMCs, we define strong and branching bisimulation as follows [26, 42]:
Definition 8
A strong bisimulation on an mpcut IMC is an equivalence relation \(\equiv _{S}\) such that for all \((s,t)\in \;\equiv _{S}\) and for all classes \(C\in S/\!\equiv _{S}\):
- \(s\overset{a}{\rightarrow }s'\) for some \(s'\in C\) implies \(t\overset{a}{\rightarrow }t'\) for some \(t'\in C\), and
- \(\mathbf R (s)(C)=\mathbf R (t)(C)\).
Definition 9
A branching bisimulation on an mpcut IMC is an equivalence relation \(\equiv _{B}\) such that for all \((s,t)\in \;\equiv _{B}\) and for all classes \(C\in S/\!\equiv _{B}\):
1. \(s\overset{a}{\rightarrow }s'\) for some \(s'\in C\) implies
   - \(a=\tau \) and \((s,s')\in \,\equiv _{B}\), or
   - there are states \(t',t''\in S\) with \(t \overset{\tau *}{\rightarrow } t'\overset{a}{\rightarrow }t''\) and \((t,t')\in \,\equiv _{B}\) and \(t''\in C\).
2. \(\mathbf R (s)(C)>0\) implies \(\mathbf R (s)(C)=\mathbf R (t')(C)\) for some \(t'\in S\) such that \(t \overset{\tau *}{\rightarrow } t'\!\mathop {\nrightarrow }\limits ^{\tau }\) and \((t,t')\in \,\equiv _{B}\).
3. \(s\!\mathop {\nrightarrow }\limits ^{\tau }\) implies \(t\overset{\tau *}{\rightarrow } t'\!\mathop {\nrightarrow }\limits ^{\tau }\) for some \(t'\).
As we compare our work to [41, 42], we consider divergence-sensitive branching bisimulation for IMCs, which distinguishes deadlock states (without successors) from states that only have self-looping transitions.
3 Signaturebased bisimulation minimisation
Blom and Orzan [5] introduced a signature-based approach to compute the maximal bisimulation of an LTS, which was further developed into a symbolic method by Wimmer et al. [41]. Each state is characterised by a signature, which is the same for all equivalent states in a bisimulation. These signatures are used to refine a partition of the state space until a fixed point is reached, which is the maximal bisimulation.
In the literature, multiple signatures are sometimes used that together fully characterise states, for example based on the state labels, based on the rates of continuoustime transitions, and based on the enabled interactive transitions. We consider these multiple signatures as elements of a single signature that fully characterises each state.
Definition 10
A signature \(\sigma (\pi )(s)\) is a tuple of functions \(f_i(\pi )(s)\) that together characterise each state s with respect to a partition \(\pi \). Two signatures \(\sigma (\pi )(s)\) and \(\sigma (\pi )(t)\) are equivalent, if and only if for all \(f_i\), \(f_i(\pi )(s)=f_i(\pi )(t)\).
The signatures for the bisimulations of Definitions 5–9 are built from the following elements:
- \(\mathbf T ({\pi })(s)=\{(a,C)\mid \exists s'\in C:s\overset{a}{\rightarrow }s'\}\)
- \(\mathbf B ({\pi })(s)=\{(a,C)\mid \exists s'\in C:s\overset{\tau *}{\rightarrow }_{\equiv }\overset{a}{\rightarrow }s'\wedge \lnot (a=\tau \wedge s\equiv s')\}\)
- \(\mathbf R ^s({\pi })(s)=C\mapsto \mathbf R (s)(C)\)
- \(\mathbf R ^b({\pi })(s)=C\mapsto \max \{\mathbf R ^s({\pi })(s')(C)\mid s\overset{\tau *}{\rightarrow }_{\equiv }s'\}\)
Functions \(\mathbf T \) and \(\mathbf B \) assign to each state s all pairs of actions a and equivalence classes \(C\in \pi \), such that state s can reach C by an action a either directly (\(\mathbf T \)) or via any number of inert \(\tau \)-steps (\(\mathbf B \)). Furthermore, inert \(\tau \)-steps are removed from \(\mathbf B \). \(\mathbf R ^s\) equals \(\mathbf R \) but with the domain restricted to the equivalence classes \(C\in \pi \) and represents the cumulative rate with which each state s can go to states in C. \(\mathbf R ^b\) equals \(\mathbf R ^s\) for states \(s\!\mathop {\nrightarrow }\limits ^{\tau }\) and takes the highest “reachable rate” for states with inert \(\tau \)-transitions. In branching bisimulation for mpcut IMCs, the “highest reachable rate” is by definition the rate that all states \({s\!\overset{\tau }{\nrightarrow }}\) in C have. A final signature element distinguishes time-convergent states from time-divergent states [42] and is independent of the partition.
For the bisimulations of Definitions 5–9, we state:
Lemma 1
A partition \(\pi \) is a bisimulation iff for all s and t that are equivalent in \(\pi \), \(\sigma (\pi )(s)=\sigma (\pi )(t)\).
For the above definitions, it is fairly straightforward to prove that they are equivalent to the classical definitions of bisimulation. See [5, 41] for the bisimulations on LTSs and [42] for the bisimulations on IMCs.
3.1 Signaturebased partition refinement
As discussed above, signatures can consist of multiple elements. We first define partition refinement using the full signature. We then define partition refinement with partial signatures, i.e., using the elements of the signature, and discuss advantages of this approach.
Definition 11
\({{\mathrm{sigref}}}(\pi ,\sigma ) \, := \, \{(s,t)\in S\times S\mid \sigma (\pi )(s)=\sigma (\pi )(t)\}\), with \(\pi ^0=\{S\}\) and \(\pi ^{n+1}={{\mathrm{sigref}}}(\pi ^n,\sigma )\).
The algorithm iteratively refines the initial coarsest partition \(\{S\}\) according to the signatures of the states, until some fixed point \(\pi ^{n+1}=\pi ^{n}\) is obtained. For monotone signatures (defined below), this fixed point is the maximal bisimulation.
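As an explicit-state illustration of this iteration for the strong LTS signature \(\mathbf T \) (the actual algorithm operates on BDDs; the states and transitions below are hypothetical):

```python
def strong_signature(T, partition, s):
    """T(pi)(s): the (action, block) pairs reachable from s in one step."""
    block_of = {st: i for i, blk in enumerate(partition) for st in blk}
    return frozenset((a, block_of[t]) for (src, a, t) in T if src == s)

def sigref(states, T, partition):
    """One refinement step: group states by equal signatures."""
    groups = {}
    for s in states:
        groups.setdefault(strong_signature(T, partition, s), set()).add(s)
    return sorted(groups.values(), key=sorted)

def maximal_bisimulation(states, T):
    """Iterate from the coarsest partition {S} until a fixed point."""
    partition = [set(states)]
    while True:
        nxt = sigref(states, T, partition)
        if len(nxt) == len(partition):
            return partition
        partition = nxt
```

Since the signature is monotone, the chain of partitions is decreasing, so comparing block counts suffices to detect the fixed point in this sketch.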
Definition 12
A signature is monotone if for all \(\pi ,\pi '\) with \(\pi \sqsubseteq \pi '\), \(\sigma (\pi )(s)=\sigma (\pi )(t)\) implies \(\sigma (\pi ')(s)=\sigma (\pi ')(t)\).
For all monotone signatures, the \({{\mathrm{sigref}}}\) operator is monotone: \(\pi \sqsubseteq \pi '\) implies \({{\mathrm{sigref}}}(\pi ,\sigma )\sqsubseteq {{\mathrm{sigref}}}(\pi ',\sigma )\). Hence, following Kleene’s fixed point theorem, the procedure above reaches the greatest fixed point.
In Definition 11, the full signature is computed in every iteration. We propose to apply partition refinement using parts of the signature. By definition, \(\sigma (\pi )(s)=\sigma (\pi )(t)\) if and only if for all parts \(f_i(\pi )(s)=f_i(\pi )(t)\).
Definition 13
\({{\mathrm{sigref}}}(\pi , f_i) \, := \, \{(s,t)\in S\times S\mid f_i(\pi )(s)=f_i(\pi )(t) \wedge s \equiv _{\pi } t\}\), with \(\pi ^0=\{S\}\) and \(\pi ^{n+1}={{\mathrm{sigref}}}(\pi ^n, f_i)\) for some selected part \(f_i\) of the signature.
We always select some \(f_i\) that refines the partition \(\pi \). A fixed point is reached only when no \(f_i\) refines the partition further: \(\forall f_i\in \sigma :{{\mathrm{sigref}}}(\pi ^n, f_i)=\pi ^n\). The extra clause \(s \equiv _{\pi } t\) ensures that every application of \({{\mathrm{sigref}}}\) refines the partition.
Theorem 1
If all parts \(f_i\) are monotone, Definition 13 yields the greatest fixed point.
Proof
The procedure terminates since the chain is decreasing (\(\pi ^{n+1}\sqsubseteq \pi ^n\)), due to the added clause \(s \equiv _{\pi } t\). We reach some fixed point \(\pi ^n\), since \({{\mathrm{sigref}}}(\pi ^n, \sigma )=\pi ^n\) is implied by \(\forall f_i\in \sigma :{{\mathrm{sigref}}}(\pi ^n, f_i)=\pi ^n\). Finally, to prove that we get the greatest fixed point, assume there exists another fixed point \(\xi ={{\mathrm{sigref}}}(\xi ,\sigma )\). Then, also \(\xi ={{\mathrm{sigref}}}(\xi ,f_i)\) for all i. We prove that \(\xi \sqsubseteq \pi ^n\) by induction on n. Initially, \(\xi \sqsubseteq \{S\}=\pi ^0\). Assume \(\xi \sqsubseteq \pi ^n\), then for the selected i, \(\xi ={{\mathrm{sigref}}}(\xi ,f_i)\sqsubseteq {{\mathrm{sigref}}}(\pi ^n,f_i)=\pi ^{n+1}\), using monotonicity of \(f_i\).
There are several advantages to this approach due to its flexibility. First, for any \(f_i\) that is independent of the partition, we need to refine with respect to that \(f_i\) only once. Furthermore, refinements can be applied according to different strategies. For instance, for the strong bisimulation of an mpcut IMC, one could refine w.r.t. \(\mathbf T \) until there is no more refinement, then w.r.t. \(\mathbf R ^s\) until there is no more refinement, then repeat until neither \(\mathbf T \) nor \(\mathbf R ^s\) refines the partition. Finally, computing the full signature is the most memory-intensive operation in symbolic signature-based partition refinement. If the partial signatures are smaller than the full signature, then larger models can be minimised.
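Such a strategy with partial signatures can be sketched explicitly: each part \(f_i\) is a callable, and grouping inside existing blocks enforces the \(s \equiv _{\pi } t\) clause of Definition 13. The concrete parts used in the test are hypothetical examples:

```python
def refine_partial(states, fs, partition):
    """Repeatedly refine with any part f_i until no part refines further.
    Grouping inside each existing block enforces the s =_pi t clause."""
    changed = True
    while changed:
        changed = False
        for f in fs:
            nxt = []
            for block in partition:
                groups = {}
                for s in sorted(block):
                    groups.setdefault(f(partition, s), set()).add(s)
                nxt.extend(groups.values())
            if len(nxt) > len(partition):
                partition, changed = nxt, True
    return partition
```

A partition-independent part (like a state labelling) contributes its refinement in a single pass, after which only the partition-dependent parts keep iterating.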
4 Symbolic signature refinement
This section describes the parallel decision diagram library Sylvan, followed by the (MT)BDDs and (MT)BDD operations required for signature-based partition refinement. We describe how we encode partitions and signatures for signature-based partition refinement. We present a new parallelised refine function that maximally reuses block numbers from the old partition. Finally, we present a new BDD algorithm that computes inert transitions, i.e., restricts a transition relation such that states s and \(s'\) are in the same block.
4.1 Decision diagram algorithms in Sylvan
In symbolic model checking [11], sets of states and transitions are represented by their characteristic function, rather than stored individually. With states described by N Boolean variables, a set \(S\subseteq \mathbb {B}^N\) can be represented by its characteristic function \(f:\mathbb {B}^N\rightarrow \mathbb {B}\), where \(S=\{s\mid f(s)\}\). Binary decision diagrams (BDDs) are a concise and canonical representation of Boolean functions [10].
An (ordered) BDD is a directed acyclic graph with leaves 0 and 1. Each internal node has a variable label \(x_i\) and two outgoing edges labelled 0 and 1. Variables are encountered along each path according to a fixed variable ordering. Duplicate nodes and nodes with two identical outgoing edges are forbidden. It is well known that for a fixed variable ordering, every Boolean function is represented by a unique BDD.
In addition to BDDs with leaves 0 and 1, multiterminal binary decision diagrams have been proposed [2, 12] with leaves other than 0 and 1, representing functions from the Boolean space \(\mathbb {B}^N\) onto any set. For example, MTBDDs can have leaves representing integers (encoding \(\mathbb {B}^N\rightarrow \mathbb {N}\)), floatingpoint numbers (encoding \(\mathbb {B}^N\rightarrow \mathbb {R}\)), and rational numbers (encoding \(\mathbb {B}^N\rightarrow \mathbb {Q}\)). Partial functions are supported using a leaf \(\bot \).
See Algorithm 1 for a generic example of a BDD operation. This algorithm takes two inputs, the BDDs x and y, to which a binary operation \(\textsf {F}\) is applied. Most decision diagram operations first check if the operation can be applied immediately to x and y (line 2). This is typically the case when x and y are leaves. Often there are also other trivial cases that can be checked first. We then consult the operation cache (line 4) to see if this (sub)operation has been computed earlier. The operation cache is required to reduce the time complexity of BDD operations from exponential to polynomial in the size of the BDDs. Sylvan uses a single shared unique table for all BDD nodes and a single shared operation cache for all operations.
Often, the parameters of an operation can be normalised in some ways to increase the cache efficiency. For example, \(a\wedge b\) and \(b\wedge a\) are the same operation. In that case, normalisation rules can rewrite the parameters to some standard form in order to increase cache utilisation, at line 3. A well-known example is the if-then-else algorithm, which rewrites using rewrite rules called “standard triples” as described in [9].
If x and y are not leaves and the operation is not trivial or in the cache, we use topVar (line 5) to determine the first variable of the root nodes of x and y. If x and y have a different variable in their root node, topVar returns the first one in the variable ordering. We then compute the recursive application of F to the cofactors of x and y with respect to variable v at lines 7–8. We write \(x_{v=i}\) to denote the cofactor of x where variable v takes value i. Since x and y are ordered according to the same fixed variable ordering, we can easily obtain \(x_{v=i}\). If the root node of x is on the variable v, then \(x_{v=i}\) is obtained by following the low (\(i=0\)) or high (\(i=1\)) edge of x. Otherwise, \(x_{v=i}\) equals x. After computing the suboperations, we compute the result by either reusing an existing or creating a new BDD node (line 9).
Operations on decision diagrams are typically recursively defined on the structure of the inputs. To parallelise the operation in Algorithm 1, the two independent suboperations at lines 7–8 are executed in parallel using workstealing. To obtain high performance in a multicore environment, the data structures for the BDD node table and the operation cache must be highly scalable. Sylvan implements several nonblocking data structures to enable good speedups [17, 20].
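The structure of Algorithm 1 can be sketched as a minimal sequential BDD package in Python: a unique table for hash-consing nodes and an operation cache, mirroring lines 2–9. This is only an illustrative sketch; Sylvan additionally runs the two recursive calls in parallel via work-stealing and uses lock-free tables:

```python
# Node = (var, low, high); the leaves are the Python booleans True/False.
class BDD:
    def __init__(self):
        self.unique = {}   # hash-consing table: (var, lo, hi) -> node
        self.cache = {}    # operation cache: (op, x, y) -> result

    def node(self, var, lo, hi):
        if lo is hi:                      # suppress redundant tests
            return lo
        return self.unique.setdefault((var, lo, hi), (var, lo, hi))

    def var(self, i):
        return self.node(i, False, True)

    @staticmethod
    def cofactors(x, v):
        if isinstance(x, tuple) and x[0] == v:
            return x[1], x[2]
        return x, x                       # x does not depend on v

    def apply(self, op, x, y):
        if isinstance(x, bool) and isinstance(y, bool):
            return op(x, y)               # terminal case (line 2)
        key = (op, x, y)
        if key in self.cache:             # consult the operation cache
            return self.cache[key]
        # topVar: the first variable in the ordering among both roots
        v = min(z[0] for z in (x, y) if isinstance(z, tuple))
        x0, x1 = self.cofactors(x, v)
        y0, y1 = self.cofactors(y, v)
        low = self.apply(op, x0, y0)      # in Sylvan, these two calls
        high = self.apply(op, x1, y1)     # run in parallel (work-stealing)
        res = self.node(v, low, high)
        self.cache[key] = res
        return res
```

Because nodes are hash-consed, structurally equal results are the same object, which is what makes the cache and the canonicity argument work.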
4.2 Encoding of signature refinement
We implement symbolic signature refinement similarly to [41]. However, we do not refine the partition with respect to a single block, but with respect to all blocks simultaneously. We use a binary encoding with variables s for the current state, \(s'\) for the next state, a for the action labels, and b for the blocks. We order BDD variables a and b after s and \(s'\), since this is required to efficiently replace signatures (on a and b) by new block numbers b (see below). Variables s and \(s'\) are interleaved, which is a common heuristic for transition systems.
In [21], we ordered a before b. However, we expect that in general ordering b before a is better for the following reason. If we have a before b, then when computing the signatures and the quotient (Sect. 5), it is guaranteed that all BDD nodes on a variables have to be recreated, whereas they may be reused if a variables are last in the ordering.

The transition systems and signatures are encoded as follows:
- A set of states is represented by a BDD \(\mathcal {S}(s)\);
- Transitions are represented by a BDD \(\mathcal {T}(s,s',a)\);
- Markovian transitions are represented by an MTBDD \(\mathcal {R}(s,s')\), with leaves containing rational numbers (\(\mathbb {Q}\)) that represent the transition rates;
- Signatures \(\mathbf T \) and \(\mathbf B \) are represented by a BDD \(\sigma _T(s,b,a)\);
- Signatures \(\mathbf R ^s\) and \(\mathbf R ^b\) are represented by an MTBDD \(\sigma _R(s,b)\), with leaves containing rational numbers (\(\mathbb {Q}\)) that represent the rates in the signature.
A partition can be represented in several ways:
1. As an equivalence relation, using a BDD \(\mathcal {E}(s,s')=1\) iff \(s\equiv _{\pi } s'\) [8, 32].
2. As a partition, by assigning each block a unique number, encoded with variables b, using a BDD \(\mathcal {P}(s,b)=1\) iff \(s\in C_b\) [16, 41, 42].
3. Using \(k={\lceil }\log _2 n{\rceil }\) BDDs \(\mathcal {P}_{0},\dots ,\mathcal {P}_{k-1}\) such that \(\mathcal {P}_i(s)=1\) iff \(s\in C_b\) and the ith bit of b is 1. This representation requires significant time to restore blocks for the refinement procedure, but can require less memory [15].

The signatures are computed with the following (MT)BDD operations:
- \(\sigma _T(s,b,a) \, := \, \exists s':\mathcal {T}(s,s',a) \wedge \mathcal {P}(s',b)\)
- \(\sigma _R(s,b) \, := \, \exists _\texttt {sum}\, s':\mathcal {R}(s,s') \wedge \mathcal {P}(s',b)\)
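With explicit relations, the two signature computations correspond to the following sketch; `T`, `R`, and `P` are hypothetical example encodings (transition triples, a rate map, and a state-to-block map):

```python
def sigma_T(T, P):
    """sigma_T(s, b, a): there exists s' with T(s, s', a) and P(s', b)."""
    return {(s, P[t], a) for (s, t, a) in T}

def sigma_R(R, P):
    """sigma_R(s, b): the rates R(s)(s') summed per target block b
    (the role of the abstracting exists_sum operation)."""
    sig = {}
    for s, succ in R.items():
        for t, rate in succ.items():
            sig[(s, P[t])] = sig.get((s, P[t]), 0.0) + rate
    return sig
```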
4.3 The refine algorithm
We present a new BDD algorithm to refine partitions according to a signature, which maximally preserves previously assigned block numbers.
Partition refinement consists of two steps: computing the signatures and computing the next partition. Given the signatures \(\sigma _T\) and/or \(\sigma _R\) for the current partition \(\pi \), the new partition can be computed as follows.
Since the chosen variable ordering has variables \(s,s'\) before a and b, each path in \(\sigma \) ends in a (MT)BDD representing the signature for the states encoded by that path. For \(\sigma _T\), every path that assigns values to s ends in a BDD on a, b. For \(\sigma _R\), every path that assigns values to s ends in an MTBDD on b with rational leaves.
We modify the refine algorithm to use the current partition to reuse the previous block number of each state. This also allows refining a partition with respect to only a part of the signature, as described in Sect. 3. The modification is applied such that it can be parallelised in Sylvan. See Algorithm 2.
The algorithm has two input parameters: \(\sigma \) which encodes the (partial) signature for the current partition and \(\mathcal {P}\) which encodes the current partition. The algorithm uses a global counter \(\textsf {iter}\), which is the current iteration. This is necessary since the cached results of the previous iteration cannot be reused. It also uses and updates an array blocks, which contains the signature of each block in the new partition. This array is cleared between iterations of partition refinement.
The implementation is similar to other BDD operations, with an operation cache (lines 2 and 18) and a recursion step for variables in s (lines 3–8). The two recursive operations are executed in parallel. refine simultaneously descends in \(\sigma \) and \(\mathcal {P}\) (lines 6–7), matching the valuation of \(s_i\) in \(\sigma \) and \(s'_i\) in \(\mathcal {P}\). Block assignment happens at lines 11–17. We rely on the wellknown atomic operation compare_and_swap (cas), which atomically compares and modifies a value in memory. This is necessary for parallel correctness. We use cas to claim the previous block number for the signature (line 12). If the block number is already claimed for a different signature, then the current block is being split and we call search_or_insert to assign a new block number.
Different implementations of search_or_insert are possible. We implemented a parallel hash table that uses a global counter for the next block number when inserting a new pair \((\sigma , B)\), similar to [41]. We also implemented an alternative implementation that integrates the blocks array with a skip list, a probabilistic multilevel ordered linked list [35]. This implementation performed better in our experiments, but we omit the implementation details due to space constraints.
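The block-reuse idea in refine can be sketched in explicit-state form: the first signature to claim a block's old number keeps it (the `claimed` map plays the role of the cas on the blocks array), and any other signature appearing in that block gets a fresh number. The data in the test is a hypothetical example:

```python
def refine(signatures, prev_block):
    """States whose block's claimed signature matches keep their old block
    number; states in a block that is being split get fresh numbers.
    signatures: dict state -> hashable signature
    prev_block: dict state -> old block number"""
    claimed = {}     # old block -> signature that claimed it (cf. cas)
    table = {}       # (signature, old block) -> freshly assigned number
    fresh = max(prev_block.values()) + 1
    new_block = {}
    for s in sorted(signatures):
        sig, old = signatures[s], prev_block[s]
        if claimed.setdefault(old, sig) == sig:
            new_block[s] = old            # reuse the previous block number
        else:                             # the block is being split
            if (sig, old) not in table:
                table[(sig, old)] = fresh
                fresh += 1
            new_block[s] = table[(sig, old)]
    return new_block
```

Reusing old block numbers for stable blocks is what lets cached signature computations from the previous iteration remain valid.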
4.4 Computing inert transitions
To compute the set of inert \(\tau \)-transitions \(\overset{\tau }{\rightarrow }_{\equiv }\) for branching bisimulation, or more generally, to compute any inert transition relation \(\rightarrow \!\cap \!\equiv \) with \(\pi =S/\!\equiv \) with blocks b, the expression \(\mathcal {T}(s,s') \wedge \exists b:\mathcal {P}(s,b) \wedge \mathcal {P}(s',b)\) must be evaluated. Wimmer et al. [41] write that the intermediate BDD of \(\exists b:\mathcal {P}(s,b) \wedge \mathcal {P}(s',b)\), obtained by first computing \(\mathcal {P}(s,b)\) using variable renaming from \(\mathcal {P}(s',b)\) and then \(\exists b:\mathcal {P}(s,b) \wedge \mathcal {P}(s',b)\) using and_exists, is very large. This is no surprise, since this intermediate result is indeed the BDD \(\mathcal {E}(s,s')\), which we were avoiding by representing the partition using \(\mathcal {P}(s',b)\).
We present an alternative solution, which computes \(\rightarrow \!\cap \!\equiv \) directly using a custom BDD algorithm. The inert algorithm takes parameters \(\mathcal {T}(s,s')\) (\(\mathcal {T}\) may contain other variables ordered after \(s,s'\)) and two copies of \(\mathcal {P}(s',b)\): \(\mathcal {P}^s\) and \(\mathcal {P}^{s'}\). The algorithm matches \(\mathcal {T}\) and \(\mathcal {P}^s\) on valuations of variables s, and \(\mathcal {T}\) and \(\mathcal {P}^{s'}\) on valuations of variables \(s'\). See Algorithm 3, and also Fig. 2 for a schematic overview. When in the recursive call all valuations to s and \(s'\) have been matched, with \(S_s,S_{s'}\subseteq S\) the sets of states represented by these valuations, \(\mathcal {T}\) is the set of actions that label the transitions between states in \(S_s\) and \(S_{s'}\), \(\mathcal {P}^s\) is the block that contains all \(S_s\), and \(\mathcal {P}^{s'}\) is the block that contains all \(S_{s'}\). Then, if \(\mathcal {P}^s\ne \mathcal {P}^{s'}\), the transitions are not inert and inert returns False, removing the transition from \(\mathcal {T}\). Otherwise, \(\mathcal {T}\) (which may still contain other variables ordered after \(s,s'\), such as action labels) is returned.
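Stripped of the BDD machinery, the effect of inert is a simple filter that never materialises the relation \(\mathcal {E}(s,s')\); in this explicit sketch `block` is a hypothetical state-to-block map:

```python
def inert(T, block):
    """Keep only the transitions whose endpoints lie in the same block."""
    return {(s, a, t) for (s, a, t) in T if block[s] == block[t]}
```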
5 Quotient computation
Computing the partition of the maximal bisimulation is only the first part of the minimisation process. We must also apply the partition to the original system, such that the blocks of the partition become the states of the new transition system. A straightforward conversion procedure encodes the new states using the block numbers assigned during partition refinement.
Just like partition refinement, the quotient can be computed with a sequence of standard BDD operations. We describe how the Sigref tool by Wimmer et al. [41] implements this computation. Furthermore, we develop specialised algorithms which significantly speed up quotient computation for the interactive transition relation (Sect. 5.1) and for the Markovian transition relation (Sect. 5.2). Finally, we investigate a different encoding that does not use the assigned block numbers for the new system, but picks an arbitrary state from each block as a representative (Sect. 5.3).
5.1 Computing the new interactive transition relation
For LTSs and IMCs, the new interactive transition relation is computed using the original transition relation and the partition. We first describe how this relation is computed using standard BDD operations in the Sigref tool [41]. We then present a new algorithm that performs all steps in one operation.
1. Merge target states to the new encoding (in b):
$$\begin{aligned} \mathcal {T}(s,b,a) \, := \, \exists s':\mathcal {T}(s,s',a) \wedge \mathcal {P}(s',b) \end{aligned}$$
2. Rename b variables to \(s'\) variables:
$$\begin{aligned} \mathcal {T}(s,s',a) \, := \, \mathcal {T}(s,b,a)[b\leftarrow s'] \end{aligned}$$
3. Merge source states to the new encoding (in b):
$$\begin{aligned} \mathcal {T}(s',b,a) \, := \, \exists s:\mathcal {T}(s,s',a) \wedge \mathcal {P}'(s,b) \end{aligned}$$
4. Rename b variables to s variables:
$$\begin{aligned} \mathcal {T}(s,s',a) \, := \, \mathcal {T}(s',b,a)[b\leftarrow s] \end{aligned}$$
5. Remove \(\tau\)-loops (only for branching bisimulation):
$$\begin{aligned} \mathcal {T}(s,s',a) \, := \, \mathcal {T}(s,s',a)\wedge \lnot (s=s'\wedge a=\tau ) \end{aligned}$$
Steps 1–5 coincide with lines 2–7 in the above algorithm. The BDD for \(s=s'\wedge a=\tau \) (line 7) is trivial and can be computed just before line 7.
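An explicit-state analogue of steps 1–5 (a sketch; the symbolic version operates on BDDs) simply maps both endpoints of every transition to their block numbers and finally removes \(\tau\)-loops:

```python
def quotient_lts(transitions, block_of, branching=False):
    """Explicit-state sketch of steps 1-5: replace target states (steps 1-2)
    and source states (steps 3-4) by their blocks; for branching bisimulation,
    remove tau self-loops (step 5)."""
    q = {(block_of[s], a, block_of[t]) for (s, a, t) in transitions}
    if branching:
        q = {(b1, a, b2) for (b1, a, b2) in q
             if not (b1 == b2 and a == 'tau')}
    return q

trans = {(0, 'a', 1), (1, 'tau', 2), (2, 'tau', 1)}
blocks = {0: 0, 1: 1, 2: 1}
# strong: {(0,'a',1), (1,'tau',1)}; branching additionally drops the tau-loop
```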
1. Merge target states to the new encoding (in \(b'\)):
$$\begin{aligned} \mathcal {T}(s,b',a) \, := \, \exists s':\mathcal {T}(s,s',a) \wedge \mathcal {P}'(s',b') \end{aligned}$$
2. Merge source states to the new encoding (in b):
$$\begin{aligned} \mathcal {T}(b,b',a) \, := \, \exists s:\mathcal {T}(s,b',a) \wedge \mathcal {P}''(s,b) \end{aligned}$$
3. Rename b and \(b'\) variables to s and \(s'\) variables:
$$\begin{aligned} \mathcal {T}(s,s',a) \, := \, \mathcal {T}(b,b',a)[b\leftarrow s,b'\leftarrow s'] \end{aligned}$$
4. Remove \(\tau\)-loops (only for branching bisimulation):
$$\begin{aligned} \mathcal {T}(s,s',a) \, := \, \mathcal {T}(s,s',a)\wedge \lnot (s=s'\wedge a=\tau ) \end{aligned}$$
1. Merge target states to the new encoding (in b):
$$\begin{aligned} \mathcal {T}(s,b,a) \, := \, \exists s':\mathcal {T}(s,s',a) \wedge \mathcal {P}(s',b) \end{aligned}$$
2. Rename s and b variables to \(s'\) and \(b'\) variables:
$$\begin{aligned} \mathcal {T}(s',b',a) \, := \, \mathcal {T}(s,b,a)[s\leftarrow s',b\leftarrow b'] \end{aligned}$$
3. Merge source states to the new encoding (in b):
$$\begin{aligned} \mathcal {T}(b,b',a) \, := \, \exists s':\mathcal {T}(s',b',a) \wedge \mathcal {P}(s',b) \end{aligned}$$
4. Rename b and \(b'\) variables to s and \(s'\) variables:
$$\begin{aligned} \mathcal {T}(s,s',a) \, := \, \mathcal {T}(b,b',a)[b\leftarrow s,b'\leftarrow s'] \end{aligned}$$
5. Remove \(\tau\)-loops (only for branching bisimulation):
$$\begin{aligned} \mathcal {T}(s,s',a) \, := \, \mathcal {T}(s,s',a)\wedge \lnot (s=s'\wedge a=\tau ) \end{aligned}$$
These algorithms still compute intermediate results that could be avoided by combining several steps into one operation. For example, every rename operation essentially creates a duplicate of the original BDD, when most BDD nodes are affected by the renaming. Using a custom operation can mitigate this. Similar to the inert algorithm discussed in Sect. 4.4, we implement the algorithm quotient that combines all steps of the above two algorithms. See Fig. 3 and Algorithm 4. Note the similarities with Fig. 2 and Algorithm 3.
Like the inert operation, we evaluate and match the transition relation with two copies of the partition (lines 1–12) and obtain the source block, the target block, and the set of actions at lines 14–15. If we perform branching bisimulation and the source and target blocks are identical, we remove the \(\tau\)-transition from the obtained set of actions (line 14). As the two BDDs for the blocks are simple cubes that encode exactly one block by assigning a value to each b variable, and \(\mathcal {T}\) is the set of actions A, it is straightforward to compute the BDD representing the triple \((s,s',A)\) using the recursive function makecube (line 15), which is included for completeness in Algorithm 4 at lines 18–29. Then, we combine all tuples computed at line 15 with or (lines 8 and 12), which has the same effect as existential quantification in the original algorithm.
5.2 Computing the new Markovian transition relation
For CTMCs and IMCs, the new Markovian transition relation must be computed. We first describe how this relation is computed using standard BDD operations in the Sigref tool [41]. We then present a new algorithm that combines several steps of the computation.
1. Merge target states to the new encoding (in b):
$$\begin{aligned} \mathcal {R}(s,b) \, := \, \exists _\texttt {sum}\, s':\mathcal {R}(s,s') \wedge \mathcal {P}(s',b) \end{aligned}$$
2. Rename b variables to \(s'\) variables:
$$\begin{aligned} \mathcal {R}(s,s') \, := \, \mathcal {R}(s,b)[b\leftarrow s'] \end{aligned}$$
3. Merge source states to the new encoding (in b):
$$\begin{aligned} \mathcal {R}(s',b) \, := \, \exists _\texttt {max}\, s:\mathcal {R}(s,s') \wedge \mathcal {P}'(s,b) \end{aligned}$$
4. Rename b variables to s variables:
$$\begin{aligned} \mathcal {R}(s,s') \, := \, \mathcal {R}(s',b)[b\leftarrow s] \end{aligned}$$
We also implemented a custom quotient operation for the Markovian transition relation. However, not all steps can be combined as with the interactive transition relation, since the rates from states to target blocks must be summed before the source states are merged. Thus, we can only combine steps 2–4. The quotient operation for the Markovian transition relation is similar to the implementation of and_exists_max in Sylvan, modified to perform the rename operations on the fly; we omit it due to space limitations.
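On an explicit representation, the two quantifications correspond to a sum over target states per block followed by a max over source states per block (a sketch with hypothetical names; the symbolic version works on BDD/MTBDD nodes):

```python
from collections import defaultdict

def quotient_ctmc(rates, block_of):
    """Explicit-state sketch of the Markovian quotient: first sum the rates
    into each target block (exists_sum over s'), then merge source states of
    each block by taking the maximum cumulative rate (exists_max over s)."""
    to_block = defaultdict(float)            # (source state, target block) -> rate
    for (s, t), r in rates.items():
        to_block[(s, block_of[t])] += r      # step 1: sum rates per target block
    q = {}
    for (s, tb), r in to_block.items():      # step 3: max over sources per block
        sb = block_of[s]
        q[(sb, tb)] = max(q.get((sb, tb), 0.0), r)
    return q

rates = {(0, 1): 1.0, (0, 2): 2.0, (3, 1): 3.0}
blocks = {0: 0, 3: 0, 1: 1, 2: 1}
# states 0 and 3 (block 0) both reach block 1 with cumulative rate 3.0
```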
5.3 Alternative encoding for new states
The standard encoding of the states in the new transition system uses the block numbers assigned during partition refinement. This can have a significant disadvantage. Symbolic models are powerful as they can represent large state spaces efficiently by exploiting structural properties of the transition system, like symmetries and independent variables. Such properties are lost when using the block numbers of the partition.
We propose an alternative encoding, "pick-one-state", that picks one state from each block to represent all states in the block. Each path in \(\mathcal {P}\) to the sub-BDD that represents a block (on b variables) encodes states in that block, such that state variables encountered along the path are True if the high edge was followed and False if the low edge was followed. We use this information to compute exactly one state (encoded using b variables, with missing state variables set to False) that represents the block and store this state in an array. Since we are simply interested in obtaining one state that represents each block, we only need to visit each node in the BDD \(\mathcal {P}\) once, so we use the operation cache to record whether a node has been visited. See Algorithm 5. This algorithm pick fills an array picked with a single state for each block, obtained from the path as described above using a helper function pick_one_state.
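On an explicit partition, picking representatives amounts to choosing one state per block (a sketch; Algorithm 5 instead extracts one satisfying path per block sub-BDD, visiting each BDD node once):

```python
def pick(block_of):
    """Fill `picked` with exactly one representative state per block.
    Iterating in a fixed order makes the choice deterministic, mirroring
    the fixed path choice in the symbolic algorithm."""
    picked = {}
    for state, block in sorted(block_of.items()):
        picked.setdefault(block, state)   # keep the first state seen per block
    return picked

# block 0 = {0, 1}, block 1 = {2, 3}: representatives are states 0 and 2
```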
6 Tool support
We implemented multicore symbolic signature-based bisimulation minimisation in a tool called SigrefMC. The tool supports LTSs, CTMCs, and IMCs delivered in two input formats: the XML format used by the original Sigref tool, and the BDD format that the tool LTSmin [28] generates for various model checking languages. SigrefMC supports both the floating-point and the rational representation of rates in continuous-time transitions.
One of the design goals of this tool is to encourage researchers to extend it for their own file formats and notions of bisimulation, and to integrate it in other toolsets. Therefore, SigrefMC is freely available online^{1} and licensed under the permissive Apache 2.0 license. Documentation is available, and instructions for extending the tool for different input/output formats and types of bisimulation are included.
6.1 Support for LTSmin
SigrefMC supports models generated by the model checking toolset LTSmin. LTSmin provides a language-independent Partitioned Next-State Interface (Pins), which connects various input languages to model checking algorithms [6, 28, 31]. In Pins, the states of a system are represented by vectors of N integer values. Furthermore, transitions are partitioned into K disjunctive "transition groups", i.e., each transition in the system belongs to exactly one of these transition groups. The transition relation of each transition group usually depends only on a subset of the entire state vector, called the "short vector", further distinguished by the variables that are "read" and the variables that are "written" [31]. This enables the efficient encoding of transitions that affect only some integers of the state vector. Exploiting this information lets the Pins interface work in a quasi-symbolic way, as a single pair of short vectors can represent many transitions on the full state vector.
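For illustration, a read/write short-vector pair can be expanded over a full state vector as follows (a sketch, not the actual Pins API; the index lists and values are hypothetical):

```python
def apply_short(state, read_idx, read_vals, write_idx, write_vals):
    """Fire a Pins-style transition: it is enabled iff the projection of
    `state` onto `read_idx` equals `read_vals`; firing copies `write_vals`
    into the positions `write_idx`, leaving all other slots untouched.
    One such pair thus represents a transition for every assignment of
    the untouched slots."""
    if [state[i] for i in read_idx] != list(read_vals):
        return None  # transition not enabled in this state
    successor = list(state)
    for i, v in zip(write_idx, write_vals):
        successor[i] = v
    return successor

# reads slot 0 (must be 1), writes 5 into slot 2; slot 1 is untouched
```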
Initially, LTSmin has no knowledge of the transitions in each transition group; only the initial state is known. The transition system is explored by learning new transitions via the Pins interface, which are then added to the transition relation. Various input languages connect to LTSmin via the Pins interface by implementing a next-state function, which produces all target states (as write vectors) reachable from a given source state (as read vector). Using the LTSmin toolset, we can convert process algebra specifications in the language mCRL2 [13] to the BDD file format that SigrefMC supports. We can then minimise the obtained LTS using the techniques described in this paper and obtain the result, either as a symbolic LTS or as a simple explicit-state enumeration of transitions between states.
Computation time in seconds for partition refinement on the benchmarks, comparing Sigref with SigrefMC
Model  States  Blocks  Time  Speedups
\(T_{w}\)  \(T_{1}\)  \(T_{48}\)  Seq.  Par.  Total
LTS models (strong)  
kanban03  1,024,240  85,356  92.16  10.09  0.88  9.14\(\times \)  11.52\(\times \)  105.29\(\times \) 
kanban04  16,020,316  778,485  1410.66  148.15  11.37  9.52\(\times \)  13.03\(\times \)  124.06\(\times \) 
kanban05  16,772,032  5,033,631  –  1284.86  73.57  –  17.47\(\times \)  – 
kanban06  264,515,056  25,293,849  –  –  2584.23  –  –  – 
LTS models (branching)  
kanban04  16,020,316  2785  8.47  0.52  0.24  16.39\(\times \)  2.11\(\times \)  34.60\(\times \) 
kanban05  16,772,032  7366  34.11  1.48  0.43  22.98\(\times \)  3.47\(\times \)  79.81\(\times \) 
kanban06  264,515,056  17,010  118.19  3.87  0.83  30.55\(\times \)  4.65\(\times \)  142.20\(\times \) 
kanban07  268,430,272  35,456  387.16  8.83  1.66  43.86\(\times \)  5.31\(\times \)  232.71\(\times \) 
kanban08  4,224,876,912  68,217  1091.67  17.91  2.98  60.96\(\times \)  6.02\(\times \)  366.72\(\times \) 
kanban09  4,293,193,072  123,070  3186.48  34.23  5.51  93.10\(\times \)  6.21\(\times \)  578.59\(\times \) 
CTMC models  
cycling4  431,101  282,943  220.23  26.72  2.60  8.24\(\times \)  10.29\(\times \)  84.84\(\times \) 
cycling5  2,326,666  1,424,914  1249.23  170.28  19.42  7.34\(\times \)  8.77\(\times \)  64.34\(\times \) 
fgf  80,616  38,639  71.62  8.86  0.88  8.08\(\times \)  10.04\(\times \)  81.20\(\times \) 
p2p56  \(2^{30}\)  336  750.29  26.96  2.99  27.83\(\times \)  9.03\(\times \)  251.24\(\times \) 
p2p65  \(2^{30}\)  266  248.17  9.49  1.21  26.15\(\times \)  7.82\(\times \)  204.47\(\times \) 
p2p75  \(2^{35}\)  336  2280.76  24.01  2.97  94.99\(\times \)  8.08\(\times \)  767.12\(\times \) 
polling16  1,572,864  98,304  792.82  118.50  10.18  6.69\(\times \)  11.64\(\times \)  77.85\(\times \) 
polling17  3,342,336  196,608  1739.01  303.65  22.58  5.73\(\times \)  13.45\(\times \)  77.03\(\times \) 
polling18  7,077,888  393,216  –  705.22  49.81  –  14.16\(\times \)  – 
robot020  31,160  30,780  28.15  3.21  0.60  8.78\(\times \)  5.36\(\times \)  47.04\(\times \) 
robot025  61,200  60,600  78.48  6.78  0.95  11.58\(\times \)  7.11\(\times \)  82.39\(\times \) 
robot030  106,140  105,270  174.30  12.26  1.47  14.21\(\times \)  8.33\(\times \)  118.44\(\times \) 
IMC models (strong)  
ftwc01  2048  1133  1.26  1.14  0.2  1.11\(\times \)  5.76\(\times \)  6.38\(\times \) 
ftwc02  32,768  16,797  154.55  102.07  15.85  1.51\(\times \)  6.44\(\times \)  9.75\(\times \) 
IMC models (branching)  
ftwc01  2048  430  1.12  0.77  0.13  1.45\(\times \)  6.07\(\times \)  8.83\(\times \) 
ftwc02  32,768  3886  152.90  50.39  4.89  3.03\(\times \)  10.30\(\times \)  31.26\(\times \) 
7 Experimental evaluation
This section reports on the experimental evaluation of the techniques proposed in this paper. We study the improvements to signature refinement in Sect. 7.1, the improvements to quotient computation in Sect. 7.2, the effect of ordering block variables after or before action variables in Sect. 7.3, and finally the performance of the presented tool SigrefMC on process algebra benchmarks produced with LTSmin in Sect. 7.4. We also refer to the full experimental data that are available online^{2} and can be reproduced.
When comparing SigrefMC to other tools, we restrict ourselves to the symbolic bisimulation minimisation tool Sigref by Wimmer et al., as [41] already compares Sigref to other explicitstate and symbolic bisimulation minimisation tools.
7.1 Signature refinement
7.1.1 Design
To study the improvements to signature refinement presented in this paper, we compared our results (using the skip list variant of refine) to Sigref 1.5 [40] for LTS and IMC models, and to a version of Sigref used in [38] for CTMC models. For the CTMC models, we used Sigref with rational numbers provided by the GMP library and SigrefMC with the rational number support of Sylvan. For the IMC models, version 1.5 of Sigref does not support the GMP library and the version used in [38] does not support IMCs; we therefore used SigrefMC with floating points for a fairer comparison, although the tools then report slightly different numbers of blocks due to floating-point rounding.
We restrict ourselves to the models presented in [38, 41] and an IMC model that is part of the distribution of Sigref. These models have been generated from PRISM benchmarks using a custom version of the PRISM toolset [30]. We refer to the literature for a description of these models.
We perform experiments with the three tools on a 48-core machine, containing 4 AMD Opteron^{TM} 6168 processors with 12 cores each. We measure the runtimes of the partition refinement algorithm (excluding file I/O) using Sigref, SigrefMC with only 1 worker, and SigrefMC with 48 workers.
Apart from the new refine and inert algorithms presented in the current paper, there are several other differences between the tools. The first is that the original Sigref uses the CUDD implementation of BDDs, while SigrefMC uses Sylvan, along with some extra BDD algorithms that avoid explicitly renaming variables in some BDDs. The second is that Sigref has several optimisations [40] that are not available in SigrefMC.
7.1.2 Results
See Table 1 for the results of these experiments. These results were obtained by repeating each benchmark at least 15 times and taking the average. The timeout was set to 3600 s. The column “States” shows the number of states before bisimulation minimisation and “Blocks” the number of equivalence classes after bisimulation minimisation. We show the wall clock time using Sigref (\(T_w\)), using SigrefMC with 1 worker (\(T_1\)) and using SigrefMC with 48 workers (\(T_{48}\)). We compute the sequential speedup \(T_w/T_1\), the parallel speedup \(T_1/T_{48}\), and the total speedup \(T_w/T_{48}\).
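The three speedup columns are simple ratios of the measured wall clock times; note that ratios recomputed from the rounded table entries may differ slightly from the printed speedups, which were derived from the unrounded measurements:

```python
def speedups(t_w, t_1, t_48):
    """Sequential (T_w/T_1), parallel (T_1/T_48), and total (T_w/T_48)
    speedup from wall clock times; total = sequential * parallel."""
    return t_w / t_1, t_1 / t_48, t_w / t_48

# kanban03 (strong), Table 1: T_w = 92.16, T_1 = 10.09, T_48 = 0.88
seq, par, total = speedups(92.16, 10.09, 0.88)
```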
Note that we obtained these results using the variable ordering \(s,s'< a < b\); the other experiments are computed using the variable ordering \(s,s'< b < a\), as discussed below and in Sect. 4.2.
Due to space constraints, we do not include all results, but restrict ourselves to larger models. We refer to the full experimental data that is available online. In the full set of results, excluding executions that take less than 1 s, SigrefMC is always faster sequentially and always benefits from parallelism.
The results show a clear advantage for larger models. One interesting result is for the p2p75 model. This model is ideal for symbolic bisimulation with a large number of states (\(2^{35}\)) and very few blocks after minimisation (336). For this model, our tool is 95\(\times \) faster sequentially and has a parallel speedup of 8\(\times \), resulting in a total speedup of 767\(\times \). The best parallel speedup of 17\(\times \) was obtained for the kanban05 model.
Computation time in seconds for different implementations of quotient computation
Model  blocks  block  pick
\(T_1\)  \(T_{48}\)  Sp.  \(T_1\)  \(T_{48}\)  Sp.  \(T_1\)  \(T_{48}\)  Sp.
LTS model (strong)  
kanban03  24.64  1.5  16.42\(\times \)  9.48  0.48  19.85\(\times \)  6.72  0.35  19.08\(\times \) 
kanban04  370.16  21.25  17.42\(\times \)  129.19  7.84  16.47\(\times \)  106.22  5.38  19.73\(\times \) 
kanban05  –  175.92  –  1114.06  55.26  20.16\(\times \)  740.53  33.80  21.91\(\times \) 
LTS model (branching)  
kanban04  1.08  0.12  8.91\(\times \)  0.20  0.03  6.67\(\times \)  0.16  0.04  3.65\(\times \) 
kanban05  3.48  0.33  10.71\(\times \)  0.68  0.09  7.60\(\times \)  0.51  0.10  5.05\(\times \) 
kanban06  11.44  1.10  10.38\(\times \)  1.90  0.27  6.95\(\times \)  1.42  0.30  4.78\(\times \) 
kanban07  29.94  3.02  9.93\(\times \)  5.38  0.77  7.00\(\times \)  3.17  0.64  4.93\(\times \) 
kanban08  110.47  8.34  13.24\(\times \)  11.52  1.52  7.56\(\times \)  7.01  1.29  5.44\(\times \) 
kanban09  200.44  18.77  10.68\(\times \)  27.05  3.83  7.06\(\times \)  14.21  2.74  5.19\(\times \) 
CTMC model  
cycling4  170.2  9.51  17.91\(\times \)  40.22  3.05  13.21\(\times \)  59.51  3.32  17.90\(\times \) 
cycling5  1039.17  55.52  18.72\(\times \)  231.25  14.01  16.50\(\times \)  294.15  13.48  21.83\(\times \) 
fgf  17.77  1.64  10.83\(\times \)  6.12  0.61  9.99\(\times \)  7.42  0.73  10.20\(\times \) 
kanban3  19.32  1.5  12.87\(\times \)  6.4  0.58  11.07\(\times \)  7.04  0.49  14.26\(\times \) 
kanban4  285.52  14.72  19.40\(\times \)  81.57  4.67  17.48\(\times \)  104.65  5.08  20.60\(\times \) 
p2p56  22.1  2.34  9.45\(\times \)  9.66  1.12  8.63\(\times \)  10.25  1.41  7.29\(\times \) 
p2p65  7.45  0.91  8.17\(\times \)  3.41  0.45  7.64\(\times \)  3.67  0.55  6.71\(\times \) 
p2p75  17.55  2.02  8.71\(\times \)  8.84  1.05  8.39\(\times \)  9.26  1.19  7.79\(\times \) 
polling16  176.47  8.74  20.20\(\times \)  95.33  4.83  19.76\(\times \)  66.25  4.49  14.75\(\times \) 
polling17  416.17  20.65  20.16\(\times \)  223.11  11.51  19.39\(\times \)  161.74  10.02  16.14\(\times \) 
polling18  1063.13  53.38  19.92\(\times \)  542.02  26.43  20.51\(\times \)  359.49  21.68  16.58\(\times \) 
robot020  3.47  0.27  12.68\(\times \)  1.72  0.16  10.83\(\times \)  1.55  0.12  12.57\(\times \) 
robot025  6.97  0.54  13.00\(\times \)  3.39  0.32  10.66\(\times \)  2.91  0.25  11.83\(\times \) 
robot030  12.36  1.03  12.04\(\times \)  5.84  0.53  10.98\(\times \)  4.81  0.41  11.78\(\times \) 
IMC model (strong)  
ftwc01  1.62  0.16  10.06\(\times \)  1.69  0.14  12.22\(\times \)  0.96  0.08  11.98\(\times \) 
ftwc02  208.89  20.78  10.05\(\times \)  370.16  36.65  10.10\(\times \)  301.88  15.34  19.68\(\times \) 
IMC model (branching)  
ftwc01  0.36  0.05  6.99\(\times \)  0.3  0.03  9.00\(\times \)  0.19  0.03  6.83\(\times \) 
ftwc02  17.13  1.72  9.98\(\times \)  15.73  1.45  10.86\(\times \)  5.24  0.49  10.77\(\times \) 
7.2 Quotient computation
7.2.1 Design
We compare three implementations of quotient computation:
– blocks: block encoding using standard operations
– block: block encoding using specialised operations
– pick: pick-one-state encoding using specialised operations
Number of BDD nodes for the transition relation after quotient computation, for the block number encoding and the pick-one-state encoding
Model  block  pick  Factor
LTS (strong)  
kanban03  710,359  6137  115.75\(\times \) 
kanban04  6,553,843  14,599  448.92\(\times \) 
kanban05  43,901,839  27,600  1590.65\(\times \) 
LTS (branching)  
kanban04  17,510  1081  16.20\(\times \) 
kanban05  47,920  1259  38.06\(\times \) 
kanban06  110,069  1944  56.62\(\times \) 
kanban07  233,902  1999  117.01\(\times \) 
kanban08  442,890  2838  156.06\(\times \) 
kanban09  800,649  3388  236.32\(\times \) 
IMC (strong)  
ftwc01  47,859  660  72.51\(\times \) 
ftwc02  5,669,528  1208  4693.32\(\times \) 
IMC (branching)  
ftwc01  2137  285  7.50\(\times \) 
ftwc02  49,093  413  118.87\(\times \) 
ftwc03  1,236,052  541  2284.75\(\times \) 
CTMC  
cycling4  1,869,641  185,824  10.06\(\times \) 
cycling5  8,960,365  430,936  20.79\(\times \) 
fgf  422,954  38,452  11.00\(\times \) 
kanban3  354,774  2473  143.46\(\times \) 
kanban4  3,032,327  4899  618.97\(\times \) 
p2p56  1513  2635  0.57\(\times \) 
p2p65  1039  2151  0.48\(\times \) 
p2p75  1428  3057  0.47\(\times \) 
polling16  715,145  494  1447.66\(\times \) 
polling17  1,442,013  529  2725.92\(\times \) 
polling18  2,901,462  562  5162.74\(\times \) 
robot020  148,385  3790  39.15\(\times \) 
robot025  260,514  4785  54.44\(\times \) 
robot030  411,624  5512  74.68\(\times \) 
7.2.2 Results
See Table 2 for the results of these experiments. The results show that the block implementation is faster than the blocks implementation, except for the ftwc02 model. For CTMC models, using specialised operations results in a speedup of 2–\(3{\times }\); for LTS models, in a speedup of 5–\(9{\times }\). The pick-one-state encoding shows mixed results for computation time, as it can be slower or faster than the block encoding. Furthermore, with 48 workers we obtain a parallel speedup of up to \(20.5{\times }\) for the block encoding and \(21.9{\times }\) for the pick-one-state encoding.
See Table 3 for the sizes of the computed transition relations using the block encoding and the pick-one-state encoding, in number of BDD nodes. In many cases, the pick-one-state encoding is superior, with up to \(5162{\times }\) smaller BDDs for the polling models. For the p2p models, the block encoding is superior, likely due to the small number of blocks after bisimulation minimisation.
7.3 Variable ordering
7.3.1 Design
As discussed in Sect. 4.2, we can choose to order block variables b before or after action variables a in the variable ordering of the BDDs. To compare the ordering \(s,s'< a < b\) and \(s,s'< b < a\), we compare signature refinement and quotient computation for the kanban LTS models.
Computation time in seconds on the LTS benchmarks, with the variable orders \(s,s'< a < b\) and \(s,s'< b < a\), for both partition refinement and quotient computation, with 1 worker and 48 workers
Model (bisimulation)  Partition, 1 worker  Partition, 48 workers  Quotient, 1 worker  Quotient, 48 workers
\(a < b\)  \(b < a\)  \(a < b\)  \(b < a\)  \(a < b\)  \(b < a\)  \(a < b\)  \(b < a\)
kanban03 (strong)  8.69  6.86  1.10  1.01  6.83  6.72  0.36  0.35 
kanban04 (strong)  127.54  102.11  13.86  11.66  98.12  106.22  4.25  5.38 
kanban05 (strong)  1211.20  1076.17  99.63  95.09  854.62  740.53  34.17  33.80 
kanban04 (branching)  0.40  0.38  0.22  0.23  0.16  0.16  0.04  0.04 
kanban05 (branching)  1.12  1.05  0.43  0.39  0.51  0.51  0.11  0.10 
kanban06 (branching)  2.88  2.65  0.92  0.89  1.42  1.42  0.30  0.30 
kanban07 (branching)  6.46  5.95  2.06  2.21  3.18  3.17  0.65  0.64 
kanban08 (branching)  13.09  11.95  4.27  3.60  7.04  7.01  1.33  1.29 
kanban09 (branching)  24.37  22.24  7.28  6.99  14.47  14.21  3.01  2.74 
7.3.2 Results
See Table 4 for the results of this experiment. All data points are averages over at least 5 runs. We computed the quotient using the pick-one-state algorithm. In most cases the ordering with b before a is superior, and we observe a stronger effect for partition refinement than for quotient computation. The surprising exception is quotient computation for the kanban04 model with strong bisimulation, where the ordering with a before b is slightly better, although the total time still favours ordering b before a.
7.4 Process algebra experiments
7.4.1 Design
As described in Sect. 6.1, we extended SigrefMC with support for BDDs produced by the model checking toolset LTSmin from process algebra models specified in the mCRL2 specification language.
We first took a number of communication protocols from the mCRL2 example directory, in particular the bounded retransmission protocol (BRP) and the Sliding Window Protocol (SWP). We made them parametric in the number of data elements, number of retries, window size, etc. We also include a number of distributed algorithms: we ported the probabilistic leader election protocols [3], based on Dolev–Klawe–Rodeh and Franklin, from \(\mu \)CRL to mCRL2, and we included Hesselink's hardware register [27]. Finally, we include an industrial case study: the Workload Management System of the computational grid at the Large Hadron Collider (LHC) at CERN, specified in [36].
– SWP_m_n: the Sliding Window Protocol [1] on m data items, with window size n. This specifies a one-directional version of the sliding window protocol. n subsequent data items can be sent and acknowledged in arbitrary order. This requires sequence numbers modulo 2n. Its external behaviour is equivalent to a 2n-place buffer.
– BRP_m_\(\ell \)_n: the bounded retransmission protocol [24] on m data items, sending a list of length \(\ell \) and with n retries. This protocol extends the ABP, but gives up after n retries. The status of the transmission is returned to both the sender and the receiver. The external behaviour is somewhat complicated, since the sender cannot distinguish whether the last data element or the last acknowledgement got lost.
– DKR_n: randomised variant [3] of Dolev–Klawe–Rodeh's [22] Leader Election Protocol on a unidirectional ring with n anonymous partners. Several rounds may be needed when partners choose the same identity. The protocol is based on hop counters and on an alternating bit to distinguish subsequent rounds. The external behaviour is equivalent to a single leader action.
– Franklin_n_m: randomised variant [3] of Franklin's Leader Election Protocol [23], but now on a bidirectional ring with n partners, using \(m\le n\) different identities. The external behaviour is again equivalent to a single leader action.
– Hesselink_n: Hesselink's handshake register [27], constructed from four safe registers and four Boolean atomic registers, modelled in mCRL2 by Groote, and used for experimentation in [34].
– WMS: the Workload Management System of DIRAC (Distributed Infrastructure with Remote Agent Control) for the Large Hadron Collider experiments at CERN, as described in [36].
The input files for SigrefMC were generated using the following toolchain:
1. mcrl22lps -Dfvn from the mCRL2 toolset, to generate LPS files from the specifications;
2. lps2lts-sym --vset=lddmc from the LTSmin toolset, to generate the transition systems in LDD format from the LPS files;
3. ldd2bdd from the LTSmin toolset, to convert the transition systems from LDDs to BDDs.
We measure the time for partition refinement and quotient computation with 1 worker and with 48 workers. Our experimental setup performed all benchmarks in random order and repeated the experiments continuously; when we halted the script, every benchmark had been performed at least six times. The timeout was set to 1200 s for the entire program, i.e., partition refinement plus quotient computation.
7.4.2 Results
The results are summarised in Tables 5 and 6. We do not include all results to conserve space; all results from the experiments are available online.
It is interesting to see that both strong and branching bisimulation result in huge reductions. We see a clear benefit from parallel processing, with speedups of up to 24.7\(\times \) for signature refinement and up to 24.5\(\times \) for quotient computation (block encoding).
Results for the process algebra benchmarks generated with LTSmin
Model  States  Blocks  Signature refinement  Quotient (blocks)  Quotient (block)
\(T_1\)  \(T_{48}\)  Sp.  \(T_1\)  \(T_{48}\)  Sp.  \(T_1\)  \(T_{48}\)  Sp.
LTS model (strong)  
brp244  11,182  2976  3.92  0.35  11.23\(\times \)  1.32  0.43  3.08\(\times \)  0.57  0.04  13.26\(\times \) 
brp344  40,592  10,326  13.50  0.92  14.75\(\times \)  5.30  0.63  8.48\(\times \)  2.45  0.14  17.53\(\times \) 
brp444  109,422  27,106  38.91  2.23  17.43\(\times \)  18.03  1.49  12.10\(\times \)  9.84  0.52  18.93\(\times \) 
dkr3  11,455  208  7.64  0.54  14.05\(\times \)  3.64  0.38  9.57\(\times \)  1.31  0.09  14.50\(\times \) 
dkr4  909,593  3429  –  115.28  –  –  25.86  –  –  4.99  – 
franklin32  11,805  702  7.15  0.47  15.21\(\times \)  4.71  0.42  11.12\(\times \)  0.99  0.07  13.55\(\times \) 
franklin33  41,401  883  24  1.24  19.40\(\times \)  16.25  1.05  15.52\(\times \)  3.13  0.19  16.46\(\times \) 
franklin42  272,241  10,706  330.56  14.67  22.53\(\times \)  204.68  9.65  21.21\(\times \)  28.04  1.43  19.63\(\times \) 
franklin43  5,269,441  17,738  –  441.56  –  –  115.02  –  –  13.87  – 
hesselink2  540,736  1018  3.49  0.34  10.30\(\times \)  1.96  0.30  6.60\(\times \)  0.43  0.07  5.94\(\times \) 
hesselink3  13,834,800  2835  17.70  1.42  12.50\(\times \)  16.16  1.57  10.27\(\times \)  2.33  0.35  6.58\(\times \) 
hesselink4  142,081,536  6036  51.41  3.56  14.44\(\times \)  66.71  5.37  12.43\(\times \)  7.01  1.21  5.78\(\times \) 
hesselink5  883,738,000  11,005  179.85  12.61  14.26\(\times \)  313.42  25.40  12.34\(\times \)  22.32  3.64  6.14\(\times \) 
swp24  2,589,056  69,555  267.46  11.33  23.60\(\times \)  258.66  13.40  19.30\(\times \)  30.78  1.39  22.21\(\times \) 
swp32  52,380  4710  4.12  0.25  16.45\(\times \)  4.98  0.39  12.71\(\times \)  0.73  0.05  14.57\(\times \) 
swp33  1,652,724  65,025  142.60  6.13  23.26\(\times \)  188.10  9.60  19.60\(\times \)  24.89  1.11  22.39\(\times \) 
swp42  140,352  11,553  9.77  0.54  18.02\(\times \)  13.18  0.98  13.40\(\times \)  1.96  0.12  16.10\(\times \) 
swp43  7,429,632  –  630.73  25.92  24.34\(\times \)  –  47.05  –  111.69  4.55  24.56\(\times \) 
WMS  155,034,776  1  0.12  0.02  4.91\(\times \)  0.11  0.20  0.56\(\times \)  0.10  0.13  0.79\(\times \) 
LTS model (branching)  
brp244  11,182  98  3.63  0.36  10.11\(\times \)  0.28  0.10  2.67\(\times \)  0.18  0.02  7.71\(\times \) 
brp344  40,592  328  13.78  0.98  14.08\(\times \)  0.28  0.10  2.67\(\times \)  0.18  0.02  7.71\(\times \) 
brp444  109,422  858  39.71  2.16  18.38\(\times \)  4.04  0.48  8.47\(\times \)  4.52  0.23  19.64\(\times \) 
dkr3  11,455  2  4.46  0.33  13.39\(\times \)  0.94  0.38  2.47\(\times \)  0.63  0.05  11.79\(\times \) 
dkr4  909,593  2  349.24  15.31  22.81\(\times \)  45.30  10.60  4.27\(\times \)  25.73  1.37  18.82\(\times \) 
franklin32  11,805  2  3.62  0.29  12.58\(\times \)  0.53  0.35  1.50\(\times \)  0.28  0.04  6.64\(\times \) 
franklin33  41,401  2  11.94  0.66  17.96\(\times \)  1.80  0.47  3.88\(\times \)  0.95  0.07  13.55\(\times \) 
franklin42  272,241  2  50.97  2.40  21.28\(\times \)  4.76  1.76  2.71\(\times \)  2.19  0.18  12.28\(\times \) 
franklin43  5,269,441  2  807.72  32.69  24.71\(\times \)  67.70  22.37  3.03\(\times \)  31.94  1.56  20.52\(\times \) 
hesselink2  540,736  72  7.64  0.79  9.71\(\times \)  0.73  0.15  4.80\(\times \)  0.19  0.03  6.33\(\times \) 
hesselink3  13,834,800  189  37.10  2.76  13.46\(\times \)  5.88  0.66  8.86\(\times \)  0.94  0.13  7.36\(\times \) 
hesselink4  142,081,536  384  114.37  7.98  14.33\(\times \)  26.66  2.05  12.97\(\times \)  2.79  0.38  7.44\(\times \) 
hesselink5  883,738,000  675  351.69  23.93  14.70\(\times \)  102.95  7.38  13.95\(\times \)  8.33  1.11  7.49\(\times \) 
swp24  2,589,056  511  116.16  5.08  22.88\(\times \)  20.58  1.33  15.53\(\times \)  2.32  0.13  18.09\(\times \) 
swp32  52,380  121  4.41  0.31  14.07\(\times \)  0.67  0.09  7.76\(\times \)  0.11  0.01  11.00\(\times \) 
swp33  1,652,724  1093  135.99  6.21  21.88\(\times \)  18.35  1.16  15.84\(\times \)  2.34  0.12  19.51\(\times \) 
swp42  140,352  341  8.13  0.46  17.85\(\times \)  1.96  0.34  5.74\(\times \)  0.28  0.03  11.13\(\times \) 
swp43  7,429,632  5461  420.09  17.13  24.52\(\times \)  99.64  5.42  18.38\(\times \)  10.59  0.49  21.68\(\times \) 
WMS  155,034,776  1  0.36  0.22  1.66\(\times \)  0.11  0.22  0.51\(\times \)  0.10  0.11  0.93\(\times \) 
Results for the process algebra benchmarks generated with LTSmin
Model  States  Blocks  quotient (pick): \(T_1\)  \(T_{48}\)  Sp.  Number of nodes: block  pick  Factor  
LTS model (strong)  
brp244  11,182  2976  0.68  0.05  13.60\(\times \)  9383  10,390  0.9\(\times \) 
brp344  40,592  10,326  2.85  0.16  17.42\(\times \)  22,981  21,935  1.05\(\times \) 
brp444  109,422  27,106  10.94  0.58  18.69\(\times \)  43,414  48,777  0.89\(\times \) 
dkr3  11,455  208  1.48  0.10  14.85\(\times \)  1192  31,412  0.04\(\times \) 
dkr4  909,593  3429  –  5.44  –  –  –  – 
franklin32  11,805  702  1.27  0.10  13.32\(\times \)  3706  65,776  0.06\(\times \) 
franklin33  41,401  883  3.80  0.23  16.27\(\times \)  4840  101,813  0.05\(\times \) 
franklin42  272,241  10,706  38.27  1.86  20.52\(\times \)  58,428  799,831  0.07\(\times \) 
franklin43  5,269,441  17,738  –  15.79  –  –  –  – 
hesselink2  540,736  1018  0.60  0.11  5.76\(\times \)  5927  8368  0.71\(\times \) 
hesselink3  13,834,800  2835  3.13  0.54  5.77\(\times \)  12,575  16,965  0.74\(\times \) 
hesselink4  142,081,536  6036  9.32  1.81  5.15\(\times \)  20,648  25,722  0.8\(\times \) 
hesselink5  883,738,000  11,005  28.71  4.92  5.83\(\times \)  32,335  43,141  0.75\(\times \) 
swp24  2,589,056  69,555  50.90  2.47  20.62\(\times \)  485,607  154,904  3.13\(\times \) 
swp32  52,380  4710  1.20  0.09  13.09\(\times \)  40,718  23,401  1.74\(\times \) 
swp33  1,652,724  65,025  –  –  –  435,339  –  – 
swp42  140,352  11,553  3.08  0.23  13.22\(\times \)  93,494  40,475  2.31\(\times \) 
swp43  7,429,632  264,708  164.38  6.69  24.56\(\times \)  1,474,564  404,756  3.64\(\times \) 
WMS  155,034,776  1  0.11  0.10  1.05\(\times \)  7  265  0.03\(\times \) 
LTS model (branching)  
brp244  11,182  98  0.20  0.02  10\(\times \)  804  4514  0.18\(\times \) 
brp344  40,592  328  1.00  0.06  16.16\(\times \)  2136  12,221  0.17\(\times \) 
brp444  109,422  858  4.67  0.24  19.33\(\times \)  4383  31,192  0.14\(\times \) 
dkr3  11,455  2  0.68  0.06  11.81\(\times \)  5  163  0.03\(\times \) 
dkr4  909,593  2  26.40  1.43  18.5\(\times \)  5  251  0.02\(\times \) 
franklin32  11,805  2  0.29  0.03  8.8\(\times \)  5  139  0.04\(\times \) 
franklin33  41,401  2  1.00  0.07  14.26\(\times \)  5  175  0.03\(\times \) 
franklin42  272,241  2  2.27  0.18  12.7\(\times \)  5  203  0.02\(\times \) 
franklin43  5,269,441  2  33.04  1.59  20.8\(\times \)  5  267  0.02\(\times \) 
hesselink2  540,736  72  0.22  0.03  7.33\(\times \)  653  3700  0.18\(\times \) 
hesselink3  13,834,800  189  1.10  0.14  7.62\(\times \)  1516  9191  0.16\(\times \) 
hesselink4  142,081,536  384  3.36  0.42  7.94\(\times \)  2329  14,253  0.16\(\times \) 
hesselink5  883,738,000  675  9.94  1.22  8.14\(\times \)  3749  24,943  0.15\(\times \) 
swp24  2,589,056  511  2.80  0.14  19.99\(\times \)  1821  4722  0.39\(\times \) 
swp32  52,380  121  0.13  0.01  13\(\times \)  555  2620  0.21\(\times \) 
swp33  1,652,724  1093  2.75  0.14  20.3\(\times \)  3994  10,461  0.38\(\times \) 
swp42  140,352  341  0.33  0.02  14\(\times \)  1588  4952  0.32\(\times \) 
swp43  7,429,632  5461  12.28  0.56  21.87\(\times \)  15,050  26,941  0.56\(\times \) 
WMS  155,034,776  1  0.11  0.11  0.97\(\times \)  3  89  0.03\(\times \) 
8 Conclusions
Originally, we intended to investigate parallelism in symbolic bisimulation minimisation. To our surprise, we obtained a much higher sequential speedup using specialised BDD operations, as demonstrated by the results in Table 1 and Fig. 4. The specialised BDD operations offer a clear advantage sequentially, and their integration with Sylvan results in decent parallel speedups. Our best result was a total speedup of 767\(\times \). By also using specialised BDD operations for quotient computation, we demonstrated performance improvements of 2–10\(\times \) over standard BDD operations.
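The principle behind signature-based partition refinement is independent of its symbolic realisation: repeatedly compute a signature for every state and split blocks until the partition stabilises. The following is a minimal explicit-state sketch of that principle; the actual implementation in SigrefMC operates on BDD-encoded transition relations with specialised operations, and all names below are illustrative only.

```python
def refine(states, transitions, initial_blocks):
    """Explicit-state sketch of signature-based partition refinement.

    states: iterable of states
    transitions: dict mapping a state to a set of (action, successor) pairs
    initial_blocks: dict mapping each state to its initial block number
    """
    block = dict(initial_blocks)
    while True:
        # The signature of a state is the set of (action, target block)
        # pairs of its outgoing transitions.
        sig = {
            s: frozenset((a, block[t]) for (a, t) in transitions.get(s, ()))
            for s in states
        }
        # Assign a fresh block number to each distinct (old block, signature)
        # pair; keying on the old block ensures the new partition refines it.
        numbering = {}
        new_block = {}
        for s in states:
            key = (block[s], sig[s])
            if key not in numbering:
                numbering[key] = len(numbering)
            new_block[s] = numbering[key]
        # Stable when no block was split in this iteration.
        if len(numbering) == len(set(block.values())):
            return new_block
        block = new_block
```

For strong bisimilarity this sketch iterates until the maximal bisimulation is reached; the symbolic version computes the same fixed point, but represents the signatures and the block assignment as decision diagrams.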
The success of this approach suggests that, for applications involving decision diagrams, specialised operations that combine sequential steps are a good method to obtain performance improvements of several orders of magnitude. Similarly, the additional performance improvement gained from the parallel framework of Sylvan is relatively low-hanging fruit for improving the performance of symbolic algorithms based on decision diagrams.
The pick-one-state encoding that we proposed in this paper is promising, especially for transition systems that are still relatively large after bisimulation minimisation. The implementation discussed here simply picks an arbitrary state; we expect that better heuristics can be developed in the future.
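The idea of the pick-one-state encoding can be illustrated in explicit-state terms: instead of introducing fresh block numbers, each block is represented by one arbitrarily picked member state, and quotient transitions connect these representatives. The sketch below shows only this encoding step; the paper's implementation performs it symbolically on BDDs, and the names here are illustrative.

```python
def quotient_pick(states, transitions, block):
    """Explicit-state sketch of the pick-one-state quotient encoding.

    states: iterable of states
    transitions: dict mapping a state to a set of (action, successor) pairs
    block: dict mapping each state to its block in the final partition
    """
    # Pick the first state encountered in each block as its representative.
    rep = {}
    for s in states:
        rep.setdefault(block[s], s)
    to_rep = {s: rep[block[s]] for s in states}
    # Quotient transitions are lifted to the representatives.
    qtrans = {to_rep[s]: set() for s in states}
    for s in states:
        for (a, t) in transitions.get(s, ()):
            qtrans[to_rep[s]].add((a, to_rep[t]))
    return set(rep.values()), qtrans
```

Because representatives are states of the original system, the quotient reuses the original state encoding, which is what allows the reduced system's BDD representation to stay close to (and often much smaller than) the block-number encoding.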
A limitation of this study is that we only measured the performance on the benchmarks that were used in [38, 40] and on several benchmarks from the mCRL2 distribution.
Acknowledgements
Open access funding provided by Johannes Kepler University Linz.
References
 1. Badban, B., Fokkink, W., Groote, J.F., Pang, J., van de Pol, J.: Verification of a sliding window protocol in \(\mu \)CRL and PVS. Formal Asp. Comput. 17(3), 342–388 (2005)
 2. Bahar, R.I., Frohm, E.A., Gaona, C.M., Hachtel, G.D., Macii, E., Pardo, A., Somenzi, F.: Algebraic decision diagrams and their applications. In: ICCAD 1993, pp. 188–191 (1993)
 3. Bakhshi, R., Fokkink, W., Pang, J., van de Pol, J.: Leader election in anonymous rings: Franklin goes probabilistic. In: Ausiello, G., Karhumäki, J., Mauri, G., Ong, C.L. (eds.) TCS'08, IFIP, vol. 273, pp. 57–72. Springer, Berlin (2008)
 4. Blom, S., Haverkort, B.R., Kuntz, M., van de Pol, J.: Distributed Markovian bisimulation reduction aimed at CSL model checking. ENTCS 220(2), 35–50 (2008)
 5. Blom, S., Orzan, S.: Distributed branching bisimulation reduction of state spaces. ENTCS 89(1), 99–113 (2003)
 6. Blom, S., van de Pol, J., Weber, M.: LTSmin: distributed and symbolic reachability. In: CAV, LNCS, vol. 6174, pp. 354–359. Springer (2010)
 7. Blumofe, R.D.: Scheduling multithreaded computations by work stealing. In: FOCS, pp. 356–368. IEEE Computer Society (1994)
 8. Bouali, A., de Simone, R.: Symbolic bisimulation minimisation. In: Computer Aided Verification, 4th International Workshop, LNCS, vol. 663, pp. 96–108. Springer (1992)
 9. Brace, K.S., Rudell, R.L., Bryant, R.E.: Efficient implementation of a BDD package. In: DAC, pp. 40–45 (1990)
 10. Bryant, R.E.: Graph-based algorithms for Boolean function manipulation. IEEE Trans. Comput. C-35(8), 677–691 (1986)
 11. Burch, J., Clarke, E., Long, D., McMillan, K., Dill, D.: Symbolic model checking for sequential circuit verification. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 13(4), 401–424 (1994)
 12. Clarke, E.M., McMillan, K.L., Zhao, X., Fujita, M., Yang, J.: Spectral transforms for large Boolean functions with applications to technology mapping. In: DAC, pp. 54–60 (1993)
 13. Cranen, S., Groote, J.F., Keiren, J.J.A., Stappers, F.P.M., de Vink, E.P., Wesselink, W., Willemse, T.A.C.: An overview of the mCRL2 toolset and its recent advances. In: TACAS, LNCS, vol. 7795, pp. 199–213. Springer (2013)
 14. De Nicola, R., Vaandrager, F.W.: Three logics for branching bisimulation. J. ACM 42(2), 458–487 (1995)
 15. Derisavi, S.: A symbolic algorithm for optimal Markov chain lumping. In: TACAS 2007, pp. 139–154 (2007)
 16. Derisavi, S.: Signature-based symbolic algorithm for optimal Markov chain lumping. In: QEST 2007, pp. 141–150. IEEE Computer Society (2007)
 17. van Dijk, T.: Sylvan: multi-core decision diagrams. Ph.D. thesis, University of Twente (2016)
 18. van Dijk, T., Laarman, A., van de Pol, J.: Multi-core BDD operations for symbolic reachability. ENTCS 296, 127–143 (2013)
 19. van Dijk, T., van de Pol, J.: Lace: non-blocking split deque for work-stealing. In: MuCoCoS, LNCS, vol. 8806, pp. 206–217. Springer (2014)
 20. van Dijk, T., van de Pol, J.: Sylvan: multi-core decision diagrams. In: TACAS, LNCS, vol. 9035, pp. 677–691. Springer (2015)
 21. van Dijk, T., van de Pol, J.: Multi-core symbolic bisimulation minimisation. In: TACAS, LNCS, vol. 9636, pp. 332–348. Springer (2016)
 22. Dolev, D., Klawe, M.M., Rodeh, M.: An \(O(n \log n)\) unidirectional distributed algorithm for extrema finding in a circle. J. Algorithms 3(3), 245–260 (1982)
 23. Franklin, W.R.: On an improved algorithm for decentralized extrema finding in circular configurations of processors. Commun. ACM 25(5), 336–337 (1982)
 24. Groote, J.F., van de Pol, J.: A bounded retransmission protocol for large data packets. In: Wirsing, M., Nivat, M. (eds.) AMAST'96, LNCS, vol. 1101, pp. 536–550. Springer, Berlin (1996)
 25. Hermanns, H.: Interactive Markov Chains: The Quest for Quantified Quality, LNCS, vol. 2428. Springer, Berlin (2002)
 26. Hermanns, H., Katoen, J.: The how and why of interactive Markov chains. In: FMCO'09, LNCS, vol. 6286, pp. 311–337. Springer (2009)
 27. Hesselink, W.H.: Invariants for the construction of a handshake register. Inf. Process. Lett. 68(4), 173–177 (1998)
 28. Kant, G., Laarman, A., Meijer, J., van de Pol, J., Blom, S., van Dijk, T.: LTSmin: high-performance language-independent model checking. In: TACAS 2015, LNCS, vol. 9035, pp. 692–707. Springer (2015)
 29. Kulakowski, K.: Concurrent bisimulation algorithm. CoRR arXiv:1311.7635 (2013)
 30. Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: CAV, LNCS, vol. 6806, pp. 585–591. Springer (2011)
 31. Meijer, J., Kant, G., Blom, S., van de Pol, J.: Read, write and copy dependencies for symbolic model checking. In: Yahav, E. (ed.) HVC, LNCS, vol. 8855, pp. 204–219. Springer, Berlin (2014)
 32. Mumme, M., Ciardo, G.: An efficient fully symbolic bisimulation algorithm for nondeterministic systems. Int. J. Found. Comput. Sci. 24(2), 263–282 (2013)
 33. Paige, R., Tarjan, R.E.: Three partition refinement algorithms. SIAM J. Comput. 16(6), 973–989 (1987)
 34. van de Pol, J., Timmer, M.: State space reduction of linear processes using control flow reconstruction. In: Liu, Z., Ravn, A.P. (eds.) ATVA'09, LNCS, vol. 5799, pp. 54–68. Springer, Berlin (2009)
 35. Pugh, W.: Skip lists: a probabilistic alternative to balanced trees. Commun. ACM 33(6), 668–676 (1990)
 36. Remenska, D., Willemse, T.A.C., Verstoep, K., Fokkink, W., Templon, J., Bal, H.E.: Using model checking to analyze the system behavior of the LHC production grid. In: CCGrid'12, pp. 335–343. IEEE Computer Society (2012)
 37. Wijs, A.: GPU accelerated strong and branching bisimilarity checking. In: TACAS 2015, pp. 368–383 (2015)
 38. Wimmer, R., Becker, B.: Correctness issues of symbolic bisimulation computation for Markov chains. In: MMB&DFT, LNCS, vol. 5987, pp. 287–301. Springer (2010)
 39. Wimmer, R., Derisavi, S., Hermanns, H.: Symbolic partition refinement with automatic balancing of time and space. Perform. Eval. 67(9), 816–836 (2010)
 40. Wimmer, R., Herbstritt, M., Becker, B.: Optimization techniques for BDD-based bisimulation computation. In: 17th GLSVLSI, pp. 405–410. ACM (2007)
 41. Wimmer, R., Herbstritt, M., Hermanns, H., Strampp, K., Becker, B.: Sigref—a symbolic bisimulation tool box. In: ATVA, LNCS, vol. 4218, pp. 477–492. Springer (2006)
 42. Wimmer, R., Hermanns, H., Herbstritt, M., Becker, B.: Towards symbolic stochastic aggregation. Technical Report, SFB/TR 14 AVACS (2007)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.