An O(m log n) algorithm for branching bisimilarity on labelled transition systems

Branching bisimilarity is a behavioural equivalence relation on labelled transition systems (LTSs) that takes internal actions into account. It has the traditional advantage that algorithms for branching bisimilarity are more efficient than ones for other weak behavioural equivalences, especially weak bisimilarity. With m the number of transitions and n the number of states, the classic O(mn) algorithm was recently replaced by an O(m(log |Act| + log n)) algorithm [9], which is unfortunately rather complex. This paper combines its ideas with the ideas from Valmari [20], resulting in a simpler O(m log n) algorithm. Benchmarks show that in practice this algorithm is also faster and often far more memory efficient than its predecessors, making it the best option for branching bisimulation minimisation and preprocessing for calculating other weak equivalences on LTSs.


Introduction
Branching bisimilarity [8] is an alternative to weak bisimilarity [17]. Both equivalences allow the reduction of labelled transition systems (LTSs) containing transitions labelled with internal actions, also known as silent, hidden or τ-actions.
One of the distinct advantages of branching bisimilarity is that, from the outset, an efficient algorithm has been available [10], which can be used to calculate whether two states are equivalent and to calculate a quotient LTS. It has complexity O(mn) with m the number of transitions and n the number of states. It is more efficient than classic algorithms for weak bisimilarity, which use transitive closure (for instance, [16] runs in O(n^2 m log n + m n^2.376), where n^2.376 is the time for computing the transitive closure), and algorithms for weak simulation equivalence (strong simulation equivalence can be computed in O(mn) [12], and for weak simulation equivalence first the transitive closure needs to be computed). The algorithm is also far more efficient than algorithms for trace-based equivalence notions, such as (weak) trace equivalence or weak failure equivalence [16].
Branching bisimilarity also enjoys the nice mathematical property that there exists a canonical quotient with a minimal number of states and transitions (contrary to, for instance, trace-based equivalences). Additionally, as branching bisimilarity is coarser than virtually any other behavioural equivalence taking internal actions into account [7], it is ideal for preprocessing. In order to calculate a desired equivalence, one can first reduce the behaviour modulo branching bisimilarity, before applying a dedicated algorithm on the often substantially reduced transition system. In the mCRL2 toolset [5] this is common practice.
In [9,11] an algorithm to calculate stuttering equivalence on Kripke structures with complexity O(m log n) was proposed. Stuttering equivalence essentially differs from branching bisimilarity in the fact that transitions do not have labels and as such all transitions can be viewed as internal. In these papers it was shown that branching bisimilarity can be calculated by translating LTSs to Kripke structures, encoding the labels of transitions into labelled states following [6,19]. This led to an O(m(log |Act| + log n)) or O(m log m) algorithm for branching bisimilarity.
Besides the time complexity, the algorithm in [9,11] has two disadvantages. First, the translation to Kripke structures introduces a new state and a new transition per action label and target state of a transition, which increases the memory required to calculate branching bisimilarity. This made it far less memory efficient than the classical algorithm of [10], and this was perceived as a substantial practical hindrance. For instance, when reducing systems consisting of tens of millions of states, such as [2], memory consumption is the bottleneck. Second, the algorithm in [9,11] is very complex. To illustrate the complexity, implementing it took approximately half a person-year.
Contributions. We present an algorithm for branching bisimilarity that runs directly on LTSs in O(m log n) time and that is simpler than the algorithm of [9,11]. To achieve this we use an idea from Valmari and Lehtinen [20,21] for strong bisimilarity. The standard Paige-Tarjan algorithm [18], which has O(m log n) time complexity for strong bisimilarity on Kripke structures, registers work done in a separate partition of states. Valmari [20] observed that this leads to complexity O(m log m) on LTSs and proposed to use a partition of transitions, whose elements he (and we) call bunches, to register work done. This reduces the time complexity on LTSs to O(m log n).
Using this idea we design our more straightforward algorithm for branching bisimilarity on LTSs. Essentially, this makes the maintenance of action labels particularly straightforward and allows us to simplify the handling of new, so-called, bottom states [10]. It also leads to a novel main invariant, which we formulate as Invariant 1. It allows us to prove the correctness of the algorithm in a far more straightforward way than before.
We have proven the correctness and complexity of the algorithm in detail [14] and demonstrate that it outperforms all preceding algorithms both in time and space when the LTSs are sizeable. This is illustrated with more than 30 example LTSs. This shows that the new algorithm pushes the state-of-the-art in comparing and minimising the behaviour of LTSs w.r.t. weak equivalences, either directly (branching bisimilarity) or in the form of a preprocessing step (for other weak equivalences).
Despite the fact that this new algorithm is more straightforward than the previous O(m(log |Act| + log n)) algorithm [9], the implementation of the algorithm is still not easy. To guard against implementation errors, we extensively applied random testing, comparing the output with that of other algorithms. The algorithms and their source code are freely available in the mCRL2 toolset [5].
Overview of the article. In Section 2 we provide the definition of LTSs and branching bisimilarity. In Section 3 we provide the core algorithm with high-level data structures, correctness and complexity. The subsequent section presents the procedure for splitting blocks, which can be presented as an independent pair of coroutines. Section 5 presents some benchmarks. Proofs and implementation details are omitted in this paper, and can be found in [14].

Branching bisimilarity
In this section we define labelled transition systems and branching bisimilarity.
Definition 1 (Labelled transition system). A labelled transition system (LTS) is a triple A = (S, Act, −→), where: 1. S is a finite set of states. The number of states is denoted by n. 2. Act is a finite set of actions including the internal action τ. 3. −→ ⊆ S × Act × S is a transition relation. The number of transitions is necessarily finite and denoted by m.
It is common to write t −a→ t′ for (t, a, t′) ∈ −→. With slight abuse of notation we write t −a→ t′ ∈ T instead of (t, a, t′) ∈ T for T ⊆ −→. We also write t −a→ Z for the set of transitions {t −a→ t′ | t′ ∈ Z}, and Z −a→ Z′ for the set {t −a→ t′ | t ∈ Z, t′ ∈ Z′}. We call all actions except τ the visible actions. If t −a→ t′, we say that from t, the state t′, the action a, and the transition t −a→ t′ are reachable.
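The definitions above can be captured in a short executable sketch. The representation below (the names `LTS`, `TAU` and `succ` are ours, not from the paper) stores an LTS as a triple and implements the derived transition set written t −a→ Z in the text:

```python
from typing import NamedTuple

TAU = "tau"  # the internal action

class LTS(NamedTuple):
    states: frozenset        # S, with n = len(states)
    actions: frozenset       # Act, including TAU
    transitions: frozenset   # subset of S x Act x S, with m = len(transitions)

def succ(lts, t, a, Z):
    """The set of transitions written t -a-> Z in the text:
    all transitions from t with label a ending in Z."""
    return {(s, b, t2) for (s, b, t2) in lts.transitions
            if s == t and b == a and t2 in Z}

# a two-transition example LTS
example = LTS(frozenset({0, 1, 2}),
              frozenset({TAU, "a"}),
              frozenset({(0, TAU, 1), (1, "a", 2)}))
```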
Definition 2 (Branching bisimilarity). Let A = (S, Act, −→) be an LTS. We call a relation R ⊆ S × S a branching bisimulation relation iff it is symmetric and for all s, t ∈ S such that s R t and all transitions s −a→ s′ we have: 1. a = τ and s′ R t, or 2. there is a sequence t −τ→ · · · −τ→ t′ −a→ t″ such that s R t′ and s′ R t″. Two states s and t are branching bisimilar, denoted by s ↔b t, iff there is a branching bisimulation relation R such that s R t.
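Definition 2 can be turned into a naive executable reference: start from the full relation S × S and repeatedly remove pairs that violate the transfer conditions. This greatest-fixpoint sketch (all names are ours) is nowhere near the paper's O(m log n) algorithm, but it is useful as a small test oracle:

```python
from itertools import product

def branching_bisimilar(states, transitions, tau="tau"):
    """Greatest-fixpoint reference for Definition 2: delete pairs violating
    the transfer condition until stable. Roughly O(n^2 * m) per round;
    a test oracle, not the paper's algorithm."""
    # tau_succ[s]: states reachable from s via zero or more tau-steps
    tau_succ = {s: {s} for s in states}
    changed = True
    while changed:
        changed = False
        for (s, a, t) in transitions:
            if a != tau:
                continue
            for u in states:
                if s in tau_succ[u] and t not in tau_succ[u]:
                    tau_succ[u].add(t)
                    changed = True

    def ok(s, t, R):
        # every transition s -a-> s2 must be matched by t (clause 1 or 2)
        for (u, a, s2) in transitions:
            if u != s:
                continue
            if a == tau and (s2, t) in R:
                continue  # clause 1: inert tau-step
            # clause 2: t -tau->* t1 -a-> t2 with s R t1 and s2 R t2
            if not any((s, t1) in R and (s2, t2) in R
                       for t1 in tau_succ[t]
                       for (v, b, t2) in transitions
                       if v == t1 and b == a):
                return False
        return True

    R = set(product(states, states))
    while True:  # remove violating pairs until a fixpoint is reached
        R2 = {(s, t) for (s, t) in R if ok(s, t, R) and ok(t, s, R)}
        if R2 == R:
            return R
        R = R2
```

For instance, with s0 −τ→ s1 −a→ s2 and t0 −a→ t2, the τ-step is inert, so s0, s1 and t0 are all branching bisimilar.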
Note that branching bisimilarity is an equivalence relation. Given an equivalence relation R, a transition s −a→ t is called inert iff a = τ and s R t. If t −τ→ t1 −τ→ · · · −τ→ tn such that t R ti for 1 ≤ i ≤ n, we say that the state tn, the action a, and the transition tn −a→ t′ are inertly reachable from t.
The equivalence classes of branching bisimilarity partition the set of states.
We will often use that a partition Π induces an equivalence relation in the following way: s ≡ Π t iff there is some B ∈ Π containing both s and t.

The algorithm
In this section we present the core algorithm. In the next section we deal with the actual splitting of blocks in the partition. We start off with an abstract description of this core part.

High-level description of the algorithm
The algorithm is a partition refinement algorithm. It iteratively refines two partitions Π s and Π t . Partition Π s is a partition of states in S that is coarser than branching bisimilarity. We refer to the elements of Π s as blocks, typically denoted using B. Partition Π t partitions the non-inert transitions of − →, where inertness is interpreted with respect to ≡ Πs . We refer to the elements of Π t as bunches, typically denoted using T .
The partition of transitions Π t records the current knowledge about transitions. Transitions are in different bunches iff the algorithm has established that they cannot simulate each other (i.e., they cannot serve as s a − → s and t a − → t in Definition 2).
The partition of states Π s records the current knowledge about branching bisimilarity. Two states are in different blocks iff the algorithm has found a proof that they are not branching bisimilar (this is formalised in Invariant 3). This implies that Π s must be such that states with outgoing transitions in different combinations of bunches are in different blocks (Invariant 1).
Before performing partition refinement, the LTS is preprocessed to contract τ-strongly connected components (SCCs) into a single state without a τ-loop. This step is valid as all states in a τ-SCC are branching bisimilar. Consequently, every block has bottom states, i.e., states without outgoing inert τ-transitions [10].
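This preprocessing step can be sketched as follows: find the SCCs of the τ-subgraph (here with an iterative Tarjan), map every state to a representative, and drop τ-transitions that become self-loops. This is our own illustrative version, not the toolset's implementation:

```python
def contract_tau_sccs(states, transitions, tau="tau"):
    """Contract tau-SCCs: all states in a tau-SCC are branching bisimilar,
    so each SCC is replaced by one representative without a tau-loop."""
    tau_adj = {s: [] for s in states}
    for (s, a, t) in transitions:
        if a == tau:
            tau_adj[s].append(t)
    index, low, comp = {}, {}, {}
    stack, on_stack, counter = [], set(), [0]

    def strongconnect(v):  # iterative Tarjan on the tau-subgraph
        work = [(v, 0)]
        while work:
            v, i = work.pop()
            if i == 0:
                index[v] = low[v] = counter[0]
                counter[0] += 1
                stack.append(v)
                on_stack.add(v)
            recurse = False
            for j in range(i, len(tau_adj[v])):
                w = tau_adj[v][j]
                if w not in index:
                    work.append((v, j + 1))  # resume v at edge j+1 later
                    work.append((w, 0))
                    recurse = True
                    break
                elif w in on_stack:
                    low[v] = min(low[v], index[w])
            if not recurse:
                if low[v] == index[v]:       # v is the root of an SCC
                    while True:
                        w = stack.pop()
                        on_stack.discard(w)
                        comp[w] = v          # v represents the SCC
                        if w == v:
                            break
                if work:                     # propagate low-link to parent
                    u = work[-1][0]
                    low[u] = min(low[u], low[v])

    for s in states:
        if s not in index:
            strongconnect(s)
    new_states = {comp[s] for s in states}
    new_transitions = {(comp[s], a, comp[t]) for (s, a, t) in transitions
                       if not (a == tau and comp[s] == comp[t])}
    return new_states, new_transitions
```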
The core invariant of the algorithm says that if one state in a block can inertly reach a transition in a bunch, all states in that block can inertly reach a transition in this bunch. This can be formulated in terms of bottom states:

Invariant 1 (Bunches). Πs is stable under Πt, i.e., if a bunch T ∈ Πt contains a transition with its source state in a block B ∈ Πs, then every bottom state in block B has a transition in bunch T.
The initial partitions Πs and Πt are the coarsest partitions that satisfy Invariant 1. Πt starts with a single bunch consisting of all non-inert transitions. Then, in Πs we need to separate states with some transition in this bunch from those without. We define Bvis to be the set of states from which a visible transition is inertly reachable, and Binvis to be the other states. Then Πs = {Bvis, Binvis} \ {∅}.
Transitions in a bunch may have different labels or go to different blocks. In that case, the bunch can be split, as these transitions cannot simulate each other. If we manage to achieve the situation where all transitions in a bunch have the same label and go to the same target block, the obtained partition turns out to be a branching bisimulation. Therefore, we want to split each bunch into so-called action-block-slices defined below. We also immediately define some other sets derived from Πt and Πs as we require them in our further exposition. So, for a bunch T ∈ Πt, blocks B, B′ ∈ Πs and action a ∈ Act we have:
- The action-block-slices, i.e., the transitions in T with label a ending in B′: T_{−a→B′} = {s −a→ s′ ∈ T | s′ ∈ B′}.
- The block-bunch-slices, i.e., the transitions in T starting in B: T_{B→} = {s −a→ s′ ∈ T | s ∈ B}.
- The bottom states of B, i.e., the states without outgoing inert transitions: Bottom(B) = {s ∈ B | s has no outgoing inert transition}.
The block-bunch-slices and action-block-slices are explicitly maintained as auxiliary data structures in the algorithm in order to meet the required performance bounds. If the partitions Πs or Πt are adapted, all the derived sets above also change accordingly.
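The derived sets can be illustrated in a few lines; here a block partition is represented by a map from states to block identifiers, and a bunch is a set of transition triples (the names are ours):

```python
def action_block_slices(bunch, state_block):
    """Group a bunch T into action-block-slices: the transitions in T
    with the same label a ending in the same block B'."""
    slices = {}
    for (s, a, t) in bunch:
        slices.setdefault((a, state_block[t]), set()).add((s, a, t))
    return slices

def block_bunch_slices(bunch, state_block):
    """Group a bunch T into block-bunch-slices: the transitions in T
    starting in the same block B."""
    slices = {}
    for (s, a, t) in bunch:
        slices.setdefault(state_block[s], set()).add((s, a, t))
    return slices

def bottom_states(block, transitions, state_block, tau="tau"):
    """Bottom states of a block: states without outgoing inert transitions."""
    non_bottom = {s for (s, a, t) in transitions
                  if a == tau and s in block
                  and state_block[s] == state_block[t]}
    return block - non_bottom
```

The real algorithm maintains these slices incrementally in refinable partitions instead of recomputing them, which is essential for the O(m log n) bound.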
A bunch can be trivial, which means that it only contains one action-block-slice, or it can contain multiple action-block-slices. In the latter case one action-block-slice is split off to become a bunch by itself. However, this may invalidate Invariant 1. Some states in a block may only have transitions in the new bunch while other states have only transitions in the old bunch. Therefore, blocks have to be split to satisfy Invariant 1. Splitting blocks can cause bunches to become non-trivial because action-block-slices fall apart.
This splitting is repeated until all bunches are trivial, and as already stated above, the obtained partition Π s is the required branching bisimulation. As the transition system is finite this process of repeated splitting terminates.
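To contrast, the same final partition can be computed by a naive executable baseline without the bunch machinery. The sketch below refines blocks using branching-bisimulation signatures, in the spirit of the O(n^2 m) signature-based algorithm (BO) discussed in Section 5; it illustrates what the repeated splitting converges to, not how the O(m log n) algorithm gets there:

```python
def branching_partition(states, transitions, tau="tau"):
    """Signature-refinement baseline: refine blocks until each state's
    branching signature -- the non-inert (action, target-block) pairs it
    can inertly reach -- is constant per block."""
    block = {s: 0 for s in states}
    while True:
        sig = {s: frozenset() for s in states}
        changed = True
        while changed:  # propagate signatures backwards over inert tau-steps
            changed = False
            for (s, a, t) in transitions:
                if a == tau and block[s] == block[t]:
                    extra = sig[t]             # inert: inherit the signature
                else:
                    extra = {(a, block[t])}    # non-inert transition
                if not extra <= sig[s]:
                    sig[s] = sig[s] | extra
                    changed = True
        ids = {}
        new_block = {s: ids.setdefault((block[s], sig[s]), len(ids))
                     for s in states}
        if len(ids) == len(set(block.values())):
            return block                       # no block was split: stable
        block = new_block
```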

Abstract algorithm
We first present an abstract version of the algorithm in Algorithm 1. Its behaviour is as follows. As long as there are non-trivial bunches, i.e., bunches containing multiple action-block-slices, these bunches need to be split such that they ultimately become trivial. In every iteration of the outer loop, a small action-block-slice T_{−a→B′} is split off from a non-trivial bunch T to become a bunch of its own. This may invalidate Invariant 1. To restore this stability we investigate all block-bunch-slices in one of the new bunches, namely T_{−a→B′}. Blocks that do not have transitions in these block-bunch-slices are stable with respect to both bunches. To keep track of the blocks that still need to be split, we partition the block-bunch-slices T_{B→} into stable and unstable block-bunch-slices. A block-bunch-slice is stable if we have ensured that it is not a splitter for any block. Otherwise it is deemed unstable, and it needs to be checked whether it is stable, or whether the block B must be split. The first inner loop (Lines 1.5-1.7) inserts all unstable block-bunch-slices into the splitter list. Block-bunch-slices of the shape T_{B −a→ B′} in the splitter list are labelled primary, and other list entries are labelled secondary.
In the second loop (Lines 1.8-1.18), one splitter T_{B→} from the splitter list is taken at a time and its source block is split into R (the part that can inertly reach T_{B→}) and U (the part that cannot inertly reach T_{B→}) to re-establish stability.
If T_{B→} was a primary splitter of the form T_{B −a→ B′}, then we know that U must be stable under T_{U→} \ T_{U −a→ B′}: every bottom state in B has a transition in the former block-bunch-slice T_{B→}, and as the states in U have no transition in T_{B −a→ B′}, every bottom state in U must have a transition in T_{B→} \ T_{B −a→ B′}. Therefore, at Line 1.11, block-bunch-slice T_{U→} \ T_{U −a→ B′} can be removed from the splitter list. This is the three-way split from [18].
Some inert transitions may have become non-inert, namely the τ-transitions that go from R to U. There cannot be τ-transitions from U to R. The new non-inert transitions were not yet part of a bunch in Πt. So, a new bunch R −τ→ U is formed for them. All transitions in this new bunch leave R and thus R is the only block that may not be stable under this new bunch. To avoid superfluous work, we split off the unstable part N, i.e. the part that can inertly reach a transition in R −τ→ U and contains all new bottom states, at Line 1.14; the rest is R′. The original bottom states of R become the bottom states of R′. There can be transitions N −τ→ R′ that also become non-inert, and we add these to the new bunch R −τ→ U. As observed in [10], blocks containing new bottom states can become unstable under any bunch. So, stability of N (but not of R′) must be re-established, and all block-bunch-slices leaving N are put on the splitter list at Line 1.15.

Correctness
The validity of the algorithm follows from a number of major invariants. The main invariant, Invariant 1, is valid at Line 1.2. Additionally, the algorithm satisfies the following three invariants.

Invariant 2 (Bunches). Non-inert transitions that start in the same block, have the same label, and end in the same block reside in the same bunch.

Invariant 3 (Blocks). Branching bisimilar states are in the same block, i.e., s ↔b t implies s ≡Πs t.

Invariant 4 (No inert loops). There is no inert loop in a block, i.e., there is no sequence of inert transitions s1 −τ→ s2 −τ→ · · · −τ→ sk with k > 1 and s1 = sk.

Invariant 2 indicates that two non-inert transitions that (1) start in the same block, (2) have the same label, and (3) end in the same block, always reside in the same bunch. Invariant 3 says that branching bisimilar states never end up in separate blocks. Invariant 4 ensures that all τ-paths in each block are finite. As a consequence every block has at least one bottom state, and from every state a bottom state can be inertly reached.
The invariants given above allow us to prove that the algorithm works correctly. When the algorithm terminates (and this always happens, see Section 3.5), branching bisimilar states are perfectly grouped in blocks.

Theorem 1. From Invariants 1, 3 and 4, it follows that after the algorithm terminates, two states are in the same block iff they are branching bisimilar.

Because of the space restrictions here, the proofs are omitted. The interested reader is referred to [14] for the details.

In-depth description of the algorithm
To show that the algorithm has the desired O(m log n) time complexity, we now give a more detailed description of the algorithm. The pseudocode of the detailed algorithm is given in Algorithm 2. This algorithm serves two purposes. First of all, it clarifies how the data structures are used, and refines many of the steps in the high-level algorithm. Additionally, time budgets for parts of the algorithm are printed in grey at the right-hand side of the pseudocode. We use these time budgets in Section 3.5 to analyse the overall complexity of the algorithm.

We focus on the most important details in the algorithm. At Lines 2.6-2.7, a small action-block-slice T_{−a→B′} is moved into its own bunch, and T is reduced to T \ T_{−a→B′}. All blocks that have transitions in the two new bunches are added to the splitter list in Lines 2.8-2.13. This loop also marks some transitions (in the time complexity annotations we write Marked(T_{B→}) for the marked transitions of block-bunch-slice T_{B→}). The function of this marking is similar to that of the counters in [18]: it serves to determine quickly whether a bottom state has a transition in a secondary splitter T_{B→} \ T_{B −a→ B′} (or slices that are the result of splitting this slice). In general, a bottom state has transitions in some splitter block-bunch-slice if and only if it has marked transitions in this slice. There is one exception: after splitting under a primary splitter T_{B→}, bottom states in U are not marked. But as they always have a transition in T_{U→} \ T_{U −a→ B′}, U is already stable in this case (see Line 2.19).

The second loop is refined to Lines 2.14-2.30. In every iteration one splitter T_{B→} from the splitter list is considered, and its source block is first split into R and U.
Formally, the routine split(B, T) delivers the pair R, U defined by:

R = {s ∈ B | a transition in T is inertly reachable from s},   U = B \ R.   (1)

We detail its algorithm and discuss its correctness in Section 4. In Lines 2.21-2.28, the situation is handled in which some inert transitions have become non-inert. We mark one of the outgoing transitions of every new bottom state such that we can find the bottom states with a transition in T_{N→} in time proportional to the number of such new bottom states.
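Ignoring all performance concerns, the pair R, U is simply a backward reachability computation over inert transitions. The following naive sketch (our own, not the coroutine-based split of Section 4) makes the specification concrete:

```python
def split(B, T, transitions, tau="tau"):
    """Naive version of split(B, T): R is the set of states in B from which
    a transition in T is inertly reachable, and U = B \\ R. The paper computes
    this with two lockstep coroutines; this is a plain backward search."""
    R = {s for (s, a, t) in T if s in B}        # sources of T inside B
    inert_pred = {}
    for (s, a, t) in transitions:
        if a == tau and s in B and t in B:      # inert w.r.t. the current block
            inert_pred.setdefault(t, set()).add(s)
    work = list(R)
    while work:                                 # backward closure over inert steps
        t = work.pop()
        for s in inert_pred.get(t, ()):
            if s not in R:
                R.add(s)
                work.append(s)
    return R, B - R
```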
We illustrate the algorithm in the following example. Note that this also illustrates some of the details of the split subroutine, which is discussed in detail in Section 4.

Example 1. Consider the situation in Figure 1a. Observe that block B is stable w.r.t. the bunches T and T′. We have split off a small bunch T_{−a→B′} from T, and as a consequence, B needs to be restabilised. The bunches put on the splitter list initially are T_{−a→B′} and T \ T_{−a→B′}. When putting these bunches on the splitter list, all transitions in T_{B −a→ B′} are marked, see the m's in Figure 1b. Also, for states that have transitions both in T_{−a→B′} and in T \ T_{−a→B′}, one transition in the latter bunch is marked, also indicated by m's in Figure 1b.
We now first split B w.r.t. the primary splitter T_{−a→B′} into R, the states that can inertly reach T_{−a→B′}, and U, the states that cannot. In Figure 1b, the states known to be destined for R and those known to be destined for U are indicated by the corresponding markers. Initially, all states with a marked outgoing transition are destined for R; the remaining bottom state of B is destined for U. The split subroutine proceeds to extend the sets R and U in a backwards fashion using two coroutines, marking a state destined for R if one of its successors is already in R, and marking a state destined for U if all its successors are in U. Here, the state in U does not have any incoming inert transitions, so its coroutine immediately terminates and all other states belong to R. Block B is split into subblocks R and U, as shown in Figure 1c.

Block U is stable w.r.t. both T_{−a→B′} and T \ T_{−a→B′}. We still need to split R w.r.t. T \ T_{−a→B′}, into R1 and U1, say. For this, we use the marked transitions in T \ T_{−a→B′} as a starting point to compute all bottom states that can reach a transition in T \ T_{−a→B′}. This guarantees that the time we use is proportional to the size of T_{−a→B′}. Initially, there is one state destined for R1 and one state destined for U1, both marked in Figure 1c. We now perform the two coroutines in split simultaneously. Figure 1d shows the situation after both coroutines have considered one transition: the U1-coroutine (which calculates the states that cannot inertly reach T \ T_{−a→B′}) has initialised the counter untested of one state to 2 on Line 3.9 of Algorithm 3, because two of its outgoing inert transitions have not yet been considered. The R1-coroutine (which calculates the states that can inertly reach T \ T_{−a→B′}) has checked the unmarked transition in the splitter T_{R→} \ T_{R −a→ B′}.
As the latter coroutine has finished visiting unmarked transitions in the splitter, the U1-coroutine no longer needs to run the slow test loop at Lines 3.13-3.17 of the left column of Algorithm 3. In Figure 1e the situation is shown after two more steps in the coroutines; each has visited two extra transitions. There are two extra states destined for R1, and one state destined for U1 with 0 remaining inert transitions, for which we know immediately that it has no transition in T \ T_{−a→B′}; both are marked in the figure. Now, the R1-coroutine is terminated, since it contains more than ½|R| states, and the remaining incoming transitions of states in U1 are visited. This will not further extend U1. The result of splitting is shown in Figure 1f.

Some inert transitions become non-inert, so a new bunch with transitions R1 −τ→ U1 is created, and all these transitions are marked m. We next have to split R1 with respect to this new bunch into the set of states N1 that can inertly reach a transition in the new bunch, and the set R1′ that cannot inertly reach this bunch. In this case, all states in R1 have a marked outgoing transition, hence N1 = R1, and R1′ = ∅. The coroutine that calculates the set of states that cannot inertly reach a transition in the bunch will immediately terminate because there are no transitions to be considered.
Observe that R1 (= N1) has a new bottom state, marked 'nb'. This means that stability of R1 with respect to any bunch is no longer guaranteed and needs to be re-established. We therefore consider all bunches in which R1 has an outgoing transition. We add T_{R1 −a→ B′}, T_{R1→} \ T_{R1 −a→ B′} and T′_{R1→} to the splitter list as secondary splitters, and mark one outgoing transition from each bottom state in each of these bunches using m. This situation is shown in Figure 1g.
In this case, R1 is stable w.r.t. T_{R1 −a→ B′} and T_{R1→} \ T_{R1 −a→ B′}, i.e., all states in R1 can inertly reach a transition in both bunches. In both cases this is observed immediately after initialisation in split, since the set of states that cannot inertly reach a transition in these bunches is initially empty, and the corresponding coroutine terminates immediately. Therefore, consider splitting R1 with respect to T′_{R1→}. This leads to R2, the set of states that can inertly reach a transition in T′, and U2, the set of states that cannot inertly reach a transition in T′. Note that there are no marked transitions in T′_{R1→}, so initially all bottom states of R1 are destined for U2 (marked in Figure 1h), and there are no states destined for R2. Then we start splitting R1. In the R2-coroutine, we first add the states with an unmarked transition in T′_{R1→} to R2 at Line 3.4r (i.e., in the right column of Algorithm 3), and then all predecessors of the new bottom state need to be considered. When split terminates, there will be no additional states in U2, and the remaining states end up in R2.
The situation after splitting R1 into R2 and U2 is shown in Figure 1i. One of the inert transitions (marked m) becomes non-inert. Furthermore, R2 contains a new bottom state: the state with a transition in T′. As each block must have a bottom state, a non-bottom state had to become a bottom state.
We need to continue stabilising R2 w.r.t. bunch R2 −τ→ U2, which does not lead to a new split, and we need to restabilise R2 w.r.t. all bunches in which it has an outgoing transition. This also does not lead to new splits, so the situation in Figure 1i after removing the markings is the final result of splitting.

Time complexity
Throughout this section, let n be the number of states and m the number of transitions in the LTS. To simplify the complexity notations we assume that n ≤ m + 1. This is not a significant restriction, since it is satisfied by any LTS in which every non-initial state has an incoming transition. We also write in(s) and out(s) for the sets of incoming and outgoing transitions of state s.
We use the principle "Process the smaller half" [13]: when a set is split into two parts, we spend time proportional to the size of the smaller subset. This leads to a logarithmic number of operations assigned to each element. We apply this principle twice, once to new bunches and once to new subblocks. Additionally, we spend some time on new bottom states. This is formulated in the following theorem.

Theorem 2. Algorithm 2 runs in O(m log n) time.

The initialisation in Lines 2.1-2.5 can be performed in O(m) time, where the assumption n ≤ m + 1 is used. Furthermore, we assume that we can access action labels fast enough to bucket sort the transitions in time O(m), which is for instance the case if the action labels are consecutively numbered.
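The accounting behind "Process the smaller half" can be demonstrated directly: if only the smaller part of every split is relabelled, each element is relabelled at most log2(n) times, because its set at least halves whenever it is moved. The toy function below (our own illustration) counts the moves for repeated halving:

```python
def demo_smaller_half(n):
    """Repeatedly split a set of n elements into halves, paying one 'move'
    per element of the smaller part only. Each element moves at most
    log2(n) times, so the total is O(n log n) rather than O(n^2)."""
    sets, moves = [list(range(n))], 0
    while any(len(s) > 1 for s in sets):
        nxt = []
        for s in sets:
            if len(s) == 1:
                nxt.append(s)
                continue
            mid = len(s) // 2
            small, large = s[:mid], s[mid:]
            moves += len(small)   # only the smaller part is relabelled
            nxt.extend([small, large])
        sets = nxt
    return moves
```

For n = 8 this yields 4 + 4 + 4 = 12 moves, well within the n·log2(n) = 24 bound.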
To meet the indicated time budgets, our implementation uses a number of data structures. States are stored in a refinable partition [21], grouped per block, in such a way that we can visit bottom states without spending time on non-bottom states. Transitions are stored in four linked refinable partitions, grouped per source state, per target state, per bunch, and per block-bunch-slice, in such a way that we can visit marked transitions without spending time on unmarked transitions of the block. How these data structures are instrumental for the complexity can be found in [14].
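A refinable partition can be sketched as follows: the elements of each block occupy a contiguous segment of one array, a marked sub-segment grows by O(1) swaps, and splitting off the marked part is cheap. This is a simplified illustration of the idea of [21]; all field names are ours, not the implementation's:

```python
class RefinablePartition:
    """Sketch of a refinable partition: each block is a contiguous array
    segment [start, end), with [start, mid) holding the marked elements."""

    def __init__(self, elements):
        self.elems = list(elements)
        self.pos = {e: i for i, e in enumerate(self.elems)}
        self.blocks = {0: [0, 0, len(self.elems)]}   # id -> [start, mid, end)
        self.block_of = {e: 0 for e in self.elems}

    def mark(self, e):
        """Swap e into the marked region of its block in O(1)."""
        b = self.block_of[e]
        start, mid, end = self.blocks[b]
        i, j = self.pos[e], mid
        if i < mid:
            return                                   # already marked
        self.elems[i], self.elems[j] = self.elems[j], self.elems[i]
        self.pos[self.elems[i]] = i
        self.pos[self.elems[j]] = j
        self.blocks[b][1] = mid + 1

    def split_marked(self, b, new_b):
        """Split block b: its marked part becomes the new block new_b,
        in O(1) plus relabelling the (ideally smaller) new block."""
        start, mid, end = self.blocks[b]
        self.blocks[new_b] = [start, start, mid]
        self.blocks[b] = [mid, mid, end]
        for i in range(start, mid):
            self.block_of[self.elems[i]] = new_b
        return [self.elems[i] for i in range(start, mid)]
```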

Splitting blocks
The function split(B, T ), presented in Algorithm 3, refines block B into subblocks R and U , where R contains those states in B that can inertly reach a transition in T , and U contains the states that cannot, as formally specified in Equation (1).
These two sets are computed by two coroutines executing in lockstep: the two coroutines start the same number of loop iterations, so that the overhead is at most proportional to the faster of the two and all work done in both coroutines can be attributed to the smaller of the two subblocks R and U .
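The lockstep execution can be imitated with Python generators: each coroutine yields after a unit of work, the driver alternates between them, and the first to finish with at most half of B determines the smaller subblock. This sketch mirrors the structure, not the exact bookkeeping, of Algorithm 3; the names are ours and it assumes no duplicate inert transitions:

```python
def split_lockstep(B, inert_succ, inert_pred, has_T):
    """R-coroutine: backward search from states with a T-transition.
    U-coroutine: per-state 'untested' counters of inert successors not yet
    known to be in U. Both are advanced alternately, one step at a time."""
    def r_routine(R):
        work = list(R)
        while work:
            t = work.pop()
            for s in inert_pred.get(t, ()):
                if s not in R:
                    R.add(s)
                    work.append(s)
                    yield                        # one unit of work done

    def u_routine(U):
        # bottom states without a T-transition are certainly in U
        work = [s for s in B if not inert_succ.get(s) and not has_T(s)]
        U.update(work)
        untested = {}
        while work:
            t = work.pop()
            for s in inert_pred.get(t, ()):
                if s in U:
                    continue
                untested.setdefault(s, len(inert_succ[s]))
                untested[s] -= 1                 # one more successor in U
                if untested[s] == 0 and not has_T(s):
                    U.add(s)
                    work.append(s)
                yield                            # one unit of work done

    R = {s for s in B if has_T(s)}
    U = set()
    r, u = r_routine(R), u_routine(U)
    r_done = u_done = False
    while True:
        if not r_done:
            try:
                next(r)
            except StopIteration:
                r_done = True
        if r_done and 2 * len(R) <= len(B):
            return R, B - R                      # R is the smaller subblock
        if not u_done:
            try:
                next(u)
            except StopIteration:
                u_done = True
        if u_done and 2 * len(U) <= len(B):
            return B - U, U                      # U is the smaller subblock
        if r_done and u_done:
            return R, B - R
```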
To identify the states in U, observe that a state is in U if all its inert successors are in U and it does not have a transition in T_{B→}. To compute U, we let a counter untested[t] for every non-bottom state t record the number of outgoing inert transitions to states that are not yet known to be in U. If untested[t] = 0, all inert successors of t are guaranteed to be in U, so, provided t does not have a transition in T_{B→}, one can also add t to U. To take care of the possibility that all inert transitions of t have been visited before all sources of unmarked transitions in T_{B→} are added to R, we check all non-inert transitions of t to determine whether they are not in T_{B→} at Lines 3.13-3.17.
The coroutine that finishes first, provided that its number of states does not exceed ½|B|, has completely computed the smaller subblock resulting from the refinement, and the other coroutine can be aborted. As soon as the number of states of a coroutine is known to exceed ½|B|, it is aborted, and the other coroutine can continue to identify the smaller subblock. In detail, the runtime of R, U := split(B, T) is proportional to the size of the smaller of the two subblocks together with its incident transitions. This complexity is inferred as follows: as we execute the coroutines in lockstep, it suffices to show that the runtime bound for the smaller subblock is satisfied.

Experimental evaluation
The new algorithm (JGKW20) has been implemented in the mCRL2 toolset [5] and is available in its 201908.0 release. This toolset also contains implementations of various other algorithms, such as the O(mn) algorithm by Groote and Vaandrager (GV) [10] and the O(m(log |Act| + log n)) algorithm of [9] (GJKW17). In addition, it offers a sequential implementation of the partition-refinement algorithm using state signatures by Blom and Orzan (BO) [3], which has time complexity O(n 2 m). For each state, BO maintains a signature describing which blocks the state can reach directly via its outgoing transitions.
In this section, we report on the experiments we have conducted to compare GV, BO, GJKW17 and JGKW20 when applied to practical examples. In the experiments the given LTSs are minimised w.r.t. branching bisimilarity. The set of benchmarks consists of all LTSs offered by the VLTS benchmark set with at least 60,000 transitions. Their names end in the numbers n/1000 and m/1000, thus indicating their size [15]. All experiments have been conducted on individual nodes of the DAS-5 cluster [1]. Each of these nodes was running CentOS Linux 7.4, had an Intel Xeon E5-2698-v3 2.3GHz CPU, and was equipped with 256 GB RAM. Development version 201808.0.c59cfd413f of mCRL2 was used for the experiments.

Table 1 presents the obtained results. Benchmarks are ordered by their number of transitions. On each benchmark, we have applied each algorithm ten times, and report the mean runtime and memory use of these ten runs, rounded to significant digits (estimated using [4] for the standard deviation). A trailing decimal dot indicates that the unit digit is significant. If this dot is missing, there is one insignificant zero. For all presented data the estimated standard deviation is less than 20% of the mean; otherwise we print '-' in Table 1.
A table entry marked with one symbol indicates that the measurement is significantly better than the corresponding measurements for the other three algorithms, and an entry marked with the other symbol indicates that it is significantly worse. Here, the results are considered significant if, given a hundred tables such as Table 1, one table of running time (resp. memory) is expected to contain spuriously significant results.
Concerning the runtimes, clearly, GV and BO perform significantly worse than the other two algorithms, and JGKW20 in many cases performs significantly better than the others. In particular, JGKW20 is about 40% faster than GJKW17, the fastest older algorithm. Concerning memory use, in the majority of cases GJKW17 uses more memory than the others, while sometimes BO is the most memory-hungry. JGKW20 is much more competitive, in many cases even outperforming every other algorithm.
The results show that when applied to practical cases, JGKW20 is generally the fastest algorithm, and even when other algorithms have similar runtimes, it almost always uses the least memory. This combination makes JGKW20 currently the best option for branching bisimulation minimisation of LTSs.
Data Availability Statement and Acknowledgement. The datasets generated and analysed during the current study are available in the figshare repository: https://doi.org/10.6084/m9.figshare.11876688.v1. This work is partly done during a visit of the first author at Eindhoven University of Technology, and a visit of the second author at the Institute of Software, Chinese Academy of Sciences. The first author is supported by the National Natural Science Foundation of China, Grant No. 61761136011.