A simpler O(m log n) algorithm for branching bisimilarity on labelled transition systems

Branching bisimilarity is a behavioural equivalence relation on labelled transition systems that takes internal actions into account. It has the traditional advantage that algorithms for branching bisimilarity are more efficient than all algorithms for other weak behavioural equivalences, especially weak bisimilarity. With m the number of transitions and n the number of states, the classic O(mn) algorithm has recently been replaced by an O(m log n) algorithm, which is unfortunately rather complex. This paper combines the ideas from Groote et al. [9] with the ideas from Valmari [19]. This results in a simpler O(m log n) algorithm. Benchmarks show that this new algorithm is faster and more memory efficient than all its predecessors.


Introduction
Branching bisimilarity [8] is an alternative to weak bisimilarity [16]. Both equivalences allow the reduction of labelled transition systems containing transitions labelled with internal actions, which are also referred to as silent, hidden or τ-actions.
One of the distinct advantages of branching bisimilarity is that, from the outset, an efficient algorithm has been available [10]. This algorithm can be used to calculate whether two states in a labelled transition system are equivalent, and to calculate a quotient transition system. The algorithm had complexity O(mn), with m the number of transitions and n the number of states. It is more efficient than classic algorithms for weak bisimilarity, which use transitive closure (for instance, [15] runs in $O(n^2 m \log n + m n^{2.376})$, where $n^{2.376}$ is the time for computing the transitive closure), and than algorithms for weak simulation (the strong simulation relation can be computed in O(mn) [12], and for weak simulation the transitive closure of the transition relation needs to be computed first). The algorithm is also far more efficient than algorithms for trace-based equivalence notions, such as (weak) trace equivalence or weak failure equivalence, as these are generally PSPACE-complete on finite-state labelled transition systems [15].
Branching bisimilarity is interesting in several other respects. Not only is it a useful notion to compare the behaviour of labelled transition systems directly, as it exactly respects the branching structure of behaviour, it also enjoys a number of nice mathematical properties, such as the existence of a canonical quotient with a minimal number of states and transitions modulo branching bisimilarity (contrary to, for instance, trace-based equivalences). Additionally, as branching bisimilarity is finer than virtually any other conceivable behavioural equivalence taking internal actions into account [7], it is ideal for preprocessing. In order to calculate a desired equivalence, one can first reduce the behaviour modulo branching bisimilarity, before applying a dedicated algorithm to the often substantially reduced transition system. In the mCRL2 toolset [5] this is common practice.
In [9,11] an algorithm to calculate stuttering equivalence on Kripke structures with complexity O(m log n) was proposed. Stuttering equivalence essentially differs from branching bisimilarity in that transitions do not have labels, so all transitions can be viewed as internal. These papers showed that branching bisimilarity can be calculated by translating labelled transition systems to Kripke structures, encoding the labels of transitions into labelled states following [6,18]. This led to an O(m log n) algorithm for branching bisimilarity. Unfortunately, the algorithm in [9,11] has two disadvantages. First, the translation to Kripke structures introduces a new state and a new transition per action label and target state of a transition, and as such increases the memory required to calculate branching bisimilarity substantially, depending on the structure of the transition system. This made it far less memory efficient than the classical algorithm of [10], which was perceived as a substantial practical hindrance. For instance, when reducing systems consisting of tens of millions of states, such as [2], memory consumption is the bottleneck of the algorithm from [9,11]. Second, the algorithm in [9,11] is very complex. To realise the targeted O(m log n) complexity, several subtle situations that can occur while running the algorithm were handled using dedicated subalgorithmic steps. To illustrate the complexity, implementing the algorithm of [9,11] took approximately half a man-year.
Contributions. We present an algorithm for branching bisimilarity that runs directly on labelled transition systems in O(m log n) time and that is simpler than the algorithm of [9,11].
To achieve this we use an idea from Valmari and Lehtinen [19,20], which they apply in the context of strong bisimulation. They observed that the standard Paige-Tarjan algorithm [17], which has O(m log n) time complexity for strong bisimilarity on Kripke structures, has a slightly higher time complexity, namely O(m log m), when applied to labelled transition systems. Valmari [19] solves this in a sophisticated way by introducing a partition of transitions, whose elements he (and we) call bunches.
Using this idea we design our more straightforward algorithm for branching bisimilarity on labelled transition systems. Essentially, this makes the maintenance of action labels particularly straightforward and allows us to simplify the reassessment of stability when new bottom states appear. It also leads to a novel main invariant, which we formulate as Invariant 3.2 and which allows us to prove the correctness of the algorithm in a far more straightforward way than before.
To simplify the complexity notations we assume that n ≤ m + 1. This is not a significant restriction, since it is satisfied by any labelled transition system in which every non-initial state has an incoming transition, which can easily be achieved by preprocessing the graph to remove unreachable states. Furthermore, we assume that we can access action labels fast enough to bucket sort the transitions in time O(m), which is for instance the case if the action labels are consecutively numbered.
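As a small illustration of the bucket-sort assumption, the following Python sketch sorts transitions by label with a counting sort. It assumes, purely for illustration, that transitions are (source, label, target) triples and that labels are the consecutive integers 0 to num_labels − 1; the function name bucket_sort_by_label is ours. The running time is O(m + num_labels), which is O(m) whenever the number of labels is at most proportional to m.

```python
def bucket_sort_by_label(transitions, num_labels):
    """Stable bucket (counting) sort of (src, label, tgt) triples by label.

    Assumes labels are the integers 0 .. num_labels - 1, so the sort takes
    O(m + num_labels) time for m transitions.
    """
    counts = [0] * num_labels
    for _, a, _ in transitions:
        counts[a] += 1
    # Prefix sums: next free position of each label's bucket in the output.
    next_pos = [0] * num_labels
    total = 0
    for a in range(num_labels):
        next_pos[a] = total
        total += counts[a]
    result = [None] * len(transitions)
    for tr in transitions:
        result[next_pos[tr[1]]] = tr
        next_pos[tr[1]] += 1
    return result
```

The sort is stable, so transitions with the same label keep their relative order, which is convenient if they are later grouped further by source or target block.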
We provide a detailed proof of correctness of the algorithm and show, using benchmarks, that this new algorithm outperforms all preceding algorithms both in time and space when the labelled transition systems become sizeable. This is illustrated with more than 30 example LTSs.
Although this new algorithm is more straightforward than the previous O(m log n) algorithm [9], implementing it is still not easy. To guard against implementation errors, we extensively applied random testing, comparing the output with that of other algorithms. The algorithms and their source code are freely available in the mCRL2 toolset [5].
Historical overview. For those new to the area of algorithms for bisimulation equivalences on labelled transition systems, it might be useful to review the major concepts that have been developed in this field over the years. Following Kanellakis and Smolka [14], efficient algorithms use partition refinement. The states of the transition system are partitioned into blocks, such that equivalent states are in the same block. These blocks are refined until nonequivalent states are in different blocks. The original algorithm [14] calculated strong bisimilarity and had time complexity O(mn). The main idea is to find a splitter, i.e., a block that shows that some states are not equivalent, and then move these states to separate blocks.
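The partition-refinement idea can be illustrated by a deliberately naive Python sketch for strong bisimilarity. It is not the splitter-based algorithm of [14]: it refines all blocks at once by a per-state signature, namely the set of (label, target block) pairs, until no block changes. The function name and the representation (states 0 to n − 1, transitions as (source, label, target) triples) are assumptions for illustration. Each round costs O(m + n) expected time (using hashing), and since every round except the last increases the number of blocks, there are at most n rounds.

```python
def strong_bisim_partition(n, transitions):
    """Naive signature-based partition refinement for strong bisimilarity.

    transitions: iterable of (src, label, tgt) with states 0..n-1.
    Returns a list mapping each state to a block number.
    """
    transitions = list(transitions)
    block = [0] * n  # start with a single block containing all states
    num_blocks = 1
    while True:
        # Signature of a state: the (label, target block) pairs it can do.
        sig = [set() for _ in range(n)]
        for s, a, t in transitions:
            sig[s].add((a, block[t]))
        keys = {}
        new_block = []
        for s in range(n):
            # Including the old block guarantees that blocks are only refined.
            key = (block[s], frozenset(sig[s]))
            if key not in keys:
                keys[key] = len(keys)
            new_block.append(keys[key])
        if len(keys) == num_blocks:
            return new_block  # no block was split: the partition is stable
        block, num_blocks = new_block, len(keys)
```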
Subsequently, the seminal article of Paige and Tarjan [17] presented an efficient algorithm for strong bisimulation minimisation of Kripke structures. Its main data structure consists of two partitions, a fine one into blocks and a coarse one into (what is called in [9]) constellations of blocks. The fine partition stores the current knowledge about inequivalence of states, and the coarse partition stores the current knowledge on which blocks cannot act as splitters. When the two partitions coincide, no more splitters exist, so the blocks in the fine partition are the bisimulation equivalence classes. Paige and Tarjan's algorithm repeatedly splits a constellation with multiple blocks into two parts and splits the fine partition if the new constellations actually lead to some splits. Their ingenious data structures ensure Hopcroft's "Process the smaller half" principle [13], guaranteeing that the work done requires time proportional to the smaller of the two resulting splitters. This leads to a time complexity in O(m log n) on Kripke structures.
The next key insight was by Valmari, who introduced the idea of using a partition of transitions into bunches [19] to store the knowledge about non-splitters. This resulted in an O(m log n) algorithm for strong bisimilarity on labelled transition systems, even though m may be larger than $n^2$ when there are enough transition labels.
For branching bisimilarity, Groote and Vaandrager [10] presented an algorithm with worst-case time complexity in O(mn). They established that, in order to determine whether a block can be split, it is only necessary to look at its bottom states, i.e., states that have no outgoing internal transition to a state in the same block. In addition to what the algorithm of [14] does, the algorithm of Groote and Vaandrager [10] had to determine which states can reach certain bottom states via internal transitions in the block in order to split that block. They also observed that stability is not preserved under splitting when new bottom states emerge. Therefore, the algorithm had to reassess the stability of existing blocks.
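The notion of a bottom state is easy to compute for a given partition. The Python sketch below assumes a list block_of mapping each state to its block number and uses the string "tau" as a stand-in for τ (both our choices); a state is a bottom state iff it has no τ-transition to a state in its own block. If τ-cycles have been removed, every non-empty block has at least one bottom state, so every block appears in the result.

```python
def bottom_states(block_of, transitions, tau="tau"):
    """Map each block to its bottom states: states with no tau-transition
    to a state in the same block (i.e., no inert outgoing transition)."""
    has_inert = set()
    for s, a, t in transitions:
        if a == tau and block_of[s] == block_of[t]:
            has_inert.add(s)
    bottoms = {}
    for s, b in enumerate(block_of):
        if s not in has_inert:
            bottoms.setdefault(b, set()).add(s)
    return bottoms
```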
More than 25 years later, [9,11] managed to merge the ideas of Paige and Tarjan with those of Groote and Vaandrager, finding an algorithm for stuttering equivalence that has time complexity O(m log n) on Kripke structures, as well as time complexity O(m log n) on LTSs using a translation of LTSs to Kripke structures. The first essential difficulty that had to be overcome was that calculating the reachability of states through internal transitions must be done in time proportional to the smaller block that is split off, following the "Process the smaller half" principle. This was done by two coroutines that are executed in parallel to identify those states that can reach certain bottom states via internal transitions, and simultaneously identify the states from which those bottom states are not reachable. As soon as the smaller set of states was found, the other coroutine was terminated. The other essential contribution of [9,11] is that reassessing stability of the partition can be done in time proportional to log n times the number of (incoming and outgoing) transitions of new bottom states. As each state becomes a bottom state at most once, this fits into the time bound. This was done in a rather delicate post-processing stage that was executed whenever a new bottom state was found. We present an algorithm without such post-processing.
Overview of the article. In Section 2 we provide the definitions of labelled transition systems and branching bisimilarity. In Section 3 we provide the core algorithm with high-level data structures, correctness and complexity. The next section presents the procedure for splitting blocks, which can be formulated as a pair of independent coroutines. Section 5 provides the details of the algorithm, especially how the high-level data structures can be represented efficiently. Section 6 presents some benchmarks.

Branching bisimilarity
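In this section we define labelled transition systems and branching bisimilarity.
Definition 2.1 (Labelled transition system). A labelled transition system (LTS) is a triple $A = (S, Act, \to)$ where
1. S is a finite set of states. The number of states is denoted by n.
2. Act is a finite set of actions including the internal action τ.
3. $\to \subseteq S \times Act \times S$ is a transition relation. The number of transitions is denoted by m.
We write $s \xrightarrow{a} s'$ for $(s, a, s') \in \to$.
Definition 2.2 (Branching bisimilarity). Let $A = (S, Act, \to)$ be a labelled transition system. We call a relation $R \subseteq S \times S$ a branching bisimulation relation iff it is symmetric and for all $s, t \in S$ such that $s \mathrel{R} t$ and all transitions $s \xrightarrow{a} s'$ we have:
1. $a = \tau$ and $s' \mathrel{R} t$, or
2. there is a sequence $t \xrightarrow{\tau} \cdots \xrightarrow{\tau} t'' \xrightarrow{a} t'$ of zero or more τ-transitions followed by an a-transition, such that $s \mathrel{R} t''$ and $s' \mathrel{R} t'$.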
Two states s and t are branching bisimilar, denoted by $s \leftrightarrow_b t$, iff there is a branching bisimulation relation R such that $s \mathrel{R} t$.
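For very small examples, Definition 2.2 can be checked by brute force. The Python sketch below is only an illustration under our own assumptions: relations are sets of pairs, transitions are (source, label, target) triples, and the string "tau" stands for τ. For every pair s R t and every transition $s \xrightarrow{a} s'$, it accepts if a = τ and s′ R t, or if some state t″ reachable from t by zero or more τ-steps satisfies s R t″ and has a transition $t'' \xrightarrow{a} t'$ with s′ R t′.

```python
def tau_closure(state, succ_tau):
    """States reachable from `state` via zero or more tau-transitions."""
    seen, stack = {state}, [state]
    while stack:
        u = stack.pop()
        for v in succ_tau.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_branching_bisimulation(R, transitions, tau="tau"):
    """Brute-force check of Definition 2.2 for a finite relation R."""
    if any((t, s) not in R for s, t in R):
        return False  # R must be symmetric
    succ_tau, out = {}, {}
    for s, a, t in transitions:
        out.setdefault(s, []).append((a, t))
        if a == tau:
            succ_tau.setdefault(s, []).append(t)
    for s, t in R:
        reach = tau_closure(t, succ_tau)
        for a, s1 in out.get(s, ()):
            if a == tau and (s1, t) in R:
                continue  # case 1: t need not move
            # Case 2: t reaches some t2 related to s that can do an a-step
            # to a state t1 related to s1.
            if not any((s, t2) in R and a2 == a and (s1, t1) in R
                       for t2 in reach for a2, t1 in out.get(t2, ())):
                return False
    return True
```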
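Note that branching bisimilarity is an equivalence relation. Given an equivalence relation R, a transition $s \xrightarrow{\tau} s'$ is called R-inert iff $s \mathrel{R} s'$. If $t \xrightarrow{\tau} t_1 \xrightarrow{\tau} \cdots \xrightarrow{\tau} t_n \xrightarrow{a} t'$ (with n ≥ 0 and $t_0 = t$) such that $t \mathrel{R} t_i$ for all $1 \le i \le n$, we say that the state $t_n$, the action a, and the transition $t_n \xrightarrow{a} t'$ are inertly reachable from t.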
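Inert reachability can be computed by a forward search that follows only τ-transitions staying within the current equivalence class. In the following Python sketch the equivalence R is represented by a block number per state (block_of), as it will be in the algorithm, where inertness is taken with respect to the current partition; the function name and the "tau" label are our assumptions. It returns the inertly reachable states, which include t itself, together with all of their outgoing transitions.

```python
def inertly_reachable(t, block_of, transitions, tau="tau"):
    """Return (states, transitions) inertly reachable from t, where a
    tau-transition is inert iff its source and target share a block."""
    out = {}
    for tr in transitions:
        out.setdefault(tr[0], []).append(tr)
    states, stack = {t}, [t]
    while stack:
        u = stack.pop()
        for (_, a, v) in out.get(u, ()):
            # Only follow tau-steps that stay in the block of t.
            if a == tau and block_of[v] == block_of[t] and v not in states:
                states.add(v)
                stack.append(v)
    reachable_trans = {tr for u in states for tr in out.get(u, ())}
    return states, reachable_trans
```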

The algorithm
We now present our algorithm to calculate branching bisimilarity at an abstract level and assign time budgets, i.e., indications of how much time each step is allowed to take. The details of the implementation, which are essential to fit the time budgets, are given in Section 5. We first describe the basic data structures and subsequently the algorithm, its correctness and complexity. The algorithm depends on a block splitting procedure, which is explained in Section 4. In this and the following sections, we use the labelled transition system $A = (S, Act, \to)$.

The essential data types
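The algorithm relies heavily on partitions of sets, in particular of sets of states and sets of transitions. A partition of a set V is a set of non-empty, pairwise disjoint subsets of V whose union is V.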
Note that a partition Π induces an equivalence relation in the following way: $s \equiv_\Pi t$ iff there is some $B \in \Pi$ containing both s and t.
The algorithm uses two main partitions. Partition $\Pi_s$ is a partition of the states in S that is coarser than branching bisimilarity. We refer to the elements of $\Pi_s$ as blocks, for which we typically use the letter B. Partition $\Pi_t$ partitions the non-inert transitions of $\to$, where inertness of τ-transitions is interpreted with respect to $\equiv_{\Pi_s}$. We refer to the elements of $\Pi_t$ as bunches, for which we use the letter T.
In the algorithm, the partition of blocks $\Pi_s$ records the current knowledge about branching bisimilarity: two states are in different blocks iff the algorithm has found a proof that they are not branching bisimilar (see Invariant 3.6). The partition of transitions $\Pi_t$ records the current knowledge about splitters. For the transitions in each bunch, we initially assume that they can pairwise simulate each other (i.e., they can serve as the transitions $s \xrightarrow{a} s'$ and $t'' \xrightarrow{a} t'$ in Definition 2.2), until proven otherwise. The algorithm maintains the invariant (formalised in Invariant 3.2) that, whenever a state in a block has a transition in a bunch, every state in that block can inertly reach a transition in the same bunch, such that the condition of Definition 2.2 is satisfied. Whenever this invariant is satisfied for a particular block and bunch, we say that the block is stable with respect to that bunch.
However, Definition 2.2 imposes some constraints on the transition $t'' \xrightarrow{a} t'$: first, it needs to have the same action label as $s \xrightarrow{a} s'$, and second, s′ and t′ need to be in the same target block. If these constraints are violated, our initial assumption about the bunch was incorrect, and we try to correct this by splitting the bunch into parts that do satisfy the constraints. We do this by splitting off an action-block-slice, i.e., a subset of the transitions in a bunch with the same action label and the same target block, and placing it in a new bunch. The new bunch is called the primary splitter, and the remainder of the bunch from which it was split off is called the secondary splitter. After such a split, the blocks in $\Pi_s$ need to be split with respect to both splitters to re-establish the invariant.
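From the two partitions $\Pi_s$ and $\Pi_t$, we infer the following derived sets that are used in the algorithm. For bunches $T \in \Pi_t$, actions $a \in Act$ and blocks $B, B' \in \Pi_s$, we have:
• The block-bunch-slices, i.e., the transitions in T that start in B: $T_{B\to} = \{s \xrightarrow{a} s' \in T \mid s \in B\}$.
• The action-block-slices, i.e., the transitions in T that have label a and end in B′: $T_{\xrightarrow{a}B'} = \{s \xrightarrow{a} s' \in T \mid s' \in B'\}$.
• A block-bunch-slice intersected with an action-block-slice: $T_{B\xrightarrow{a}B'} = T_{B\to} \cap T_{\xrightarrow{a}B'}$.
• The bottom states of a block, i.e., the states without an outgoing inert transition: $\{s \in B \mid \text{there is no } s' \in B \text{ with } s \xrightarrow{\tau} s'\}$.
• The states in a block with a transition in a bunch: $\{s \in B \mid \text{there is some } s \xrightarrow{b} s' \in T\}$.
• The blocks splittable by an action-block-slice, i.e., the blocks with transitions both in $T_{\xrightarrow{a}B'}$ and in $T \setminus T_{\xrightarrow{a}B'}$: $\mathit{splittableBlocks}(T_{\xrightarrow{a}B'}) = \{B \in \Pi_s \mid \emptyset \neq T_{B\xrightarrow{a}B'} \neq T_{B\to}\}$.
• The number of action-block-slices contained in a bunch: $|\{T_{\xrightarrow{a}B'} \mid a \in Act, B' \in \Pi_s, T_{\xrightarrow{a}B'} \neq \emptyset\}|$.
• The outgoing transitions of a block: $\{s \xrightarrow{a} s' \mid s \in B\}$.
• The incoming transitions of a block: $\{s \xrightarrow{a} s' \mid s' \in B\}$.
The first two of these sets (block-bunch-slices and action-block-slices) are maintained as auxiliary data structures in the algorithm in order to meet the required performance bounds. If the partitions $\Pi_s$ or $\Pi_t$ are adapted, these derived sets also change. For instance, if a block in $\Pi_s$ is replaced by two other blocks (this happens at Lines 1.17 and 1.25 of Algorithm 1), the corresponding block-bunch-slices and action-block-slices are split as well. For the sake of brevity, we omit the updating of these derived sets in the high-level description of the algorithm. We describe how these sets are maintained in Section 5.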
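The two maintained derived sets can be pictured as groupings of the non-inert transitions. The Python sketch below builds them with dictionaries in a single pass. It is a simplification under our own assumptions (bunch_of maps each non-inert transition to a bunch identifier, block_of maps states to blocks) and does not attempt the in-place data structures of Section 5, which are needed to update the slices efficiently when blocks or bunches are split. It also counts the action-block-slices per bunch, which identifies the non-trivial bunches.

```python
from collections import defaultdict


def build_slices(transitions, block_of, bunch_of):
    """Group non-inert transitions into block-bunch-slices T_{B->}, keyed
    by (bunch, source block), and action-block-slices T_{-a->B'}, keyed by
    (bunch, label, target block).

    bunch_of maps a transition (src, label, tgt) to a bunch id; transitions
    without an entry are inert and belong to no bunch.
    """
    block_bunch = defaultdict(list)
    action_block = defaultdict(list)
    for tr in transitions:
        s, a, t = tr
        bunch = bunch_of.get(tr)
        if bunch is None:
            continue
        block_bunch[(bunch, block_of[s])].append(tr)
        action_block[(bunch, a, block_of[t])].append(tr)
    return block_bunch, action_block


def slices_per_bunch(action_block):
    """Number of action-block-slices per bunch (non-trivial if > 1)."""
    counts = defaultdict(int)
    for bunch, _, _ in action_block:
        counts[bunch] += 1
    return dict(counts)
```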
When $\Pi_t$ is changed, the invariant needs to be re-established by splitting blocks. To keep track of the blocks that still need to be split, we partition the block-bunch-slices $T_{B\to}$ into stable and unstable block-bunch-slices. A block-bunch-slice is stable if we have ensured that it is not a splitter for any block in $\Pi_s$. Otherwise it is deemed unstable, and it needs to be checked whether it is stable or whether block B must be split. If a block-bunch-slice is unstable, it is stored in the splitter list, either as a primary or as a secondary splitter. Moreover, if a block-bunch-slice is unstable, we divide its transitions into marked and unmarked transitions. These markings are used to determine which bottom states in a block have a transition in a particular bunch, and are essential for efficient splitting of blocks. For an unstable block-bunch-slice $T_{B\to}$ we write its marked transitions as $\mathit{Marked}(T_{B\to})$. Note that, even when a block-bunch-slice $T_{B\to}$ resides on the splitter list, the block B may be split due to another splitter. In such a case, the block-bunch-slice is split accordingly, and the splitter list is implicitly adapted.

Overview of the algorithm
Before performing the partition refinement, the LTS is preprocessed to contract every τ-strongly connected component (τ-SCC) into a single state without a τ-loop. This step is valid since all states on a τ-SCC are branching bisimilar, and it ensures that all τ-paths in the LTS are finite.
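The τ-SCC contraction is a standard strongly-connected-component computation restricted to τ-transitions. The Python sketch below uses Kosaraju's algorithm with explicit stacks (our choice; any linear-time SCC algorithm would do), numbers the components, and maps every transition to its component endpoints, dropping the τ-self-loops that contraction would otherwise create. The representation (states 0 to n − 1, (source, label, target) triples) and the "tau" label are assumptions for illustration.

```python
def contract_tau_sccs(n, transitions, tau="tau"):
    """Return (comp, quotient): comp[s] is the tau-SCC number of state s,
    quotient the contracted transitions without tau-self-loops."""
    succ = [[] for _ in range(n)]
    pred = [[] for _ in range(n)]
    for s, a, t in transitions:
        if a == tau:
            succ[s].append(t)
            pred[t].append(s)
    # Pass 1: iterative DFS over tau-edges, recording finishing order.
    order, seen = [], [False] * n
    for root in range(n):
        if seen[root]:
            continue
        seen[root] = True
        stack = [(root, iter(succ[root]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(succ[v])))
                    break
            else:  # all successors of u explored
                stack.pop()
                order.append(u)
    # Pass 2: DFS on the reversed tau-graph in reverse finishing order.
    comp = [-1] * n
    c = 0
    for root in reversed(order):
        if comp[root] != -1:
            continue
        comp[root] = c
        stack = [root]
        while stack:
            u = stack.pop()
            for v in pred[u]:
                if comp[v] == -1:
                    comp[v] = c
                    stack.append(v)
        c += 1
    quotient = {(comp[s], a, comp[t]) for s, a, t in transitions
                if not (a == tau and comp[s] == comp[t])}
    return comp, quotient
```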
The algorithm itself is a partition refinement algorithm. It iteratively refines the partitions $\Pi_s$ and $\Pi_t$. The main objective of the algorithm is to guarantee the following: if, for some block B′, a block B has a transition $s \xrightarrow{a} s'$ with $s \in B$ and $s' \in B'$ in an action-block-slice $T_{\xrightarrow{a}B'}$, then every bottom state in B has a transition in the same action-block-slice $T_{\xrightarrow{a}B'}$. Since there are no infinite τ-paths in the LTS, every state in B is guaranteed to inertly reach a bottom state, hence every state in B can inertly reach a transition in $T_{\xrightarrow{a}B'}$. Therefore, once this objective has been reached, every block is a branching bisimulation equivalence class.
To achieve this objective, the algorithm maintains the following weaker invariant, which keeps track of bunches instead of action-block-slices: if a block B has a transition $s \xrightarrow{a} s'$ in a bunch T, with $s \in B$ and s′ in some block B′, then every bottom state in B has a transition in T. Observe that, once every action-block-slice is in its own bunch and this is reflected in the blocks, the main objective of the algorithm has been reached.
Hence, the main invariant of our algorithm is the following.
Invariant 3.2 (Bunches). $\Pi_s$ is stable under $\Pi_t$, i.e., if a bunch $T \in \Pi_t$ contains a transition with its source state in a block B of $\Pi_s$, then every bottom state in block B has a transition in bunch T (in fact, in block-bunch-slice $T_{B\to}$).
Now, if a bunch contains multiple action-block-slices (we call such a bunch non-trivial), then to get closer to our main objective we have to refine $\Pi_t$ by splitting off an action-block-slice. At the same time, to preserve the main invariant, when we split a bunch, we need to reflect this change in the blocks. Therefore, the blocks that had an inertly reachable transition in the original bunch are split into subblocks that can inertly reach either the new bunch, the remainder of the original bunch, or both. This idea is fleshed out in Algorithm 1. We first illustrate one step of the algorithm using the example in Figure 1.
Example 3.3. We start with a part of a labelled transition system in Figure 1a. We have three states, $s_0$, $s_1$ and $s_2$, with an inert transition $s_1 \xrightarrow{\tau} s_0$, and a number of transitions in the bunch T. Note that T is non-trivial: it contains two action-block-slices, one with label a and one with label b. To get closer to the situation where every bunch consists of exactly one action-block-slice, the algorithm splits off a small action-block-slice from bunch T and puts it into its own bunch T′. This situation is shown in Figure 1b.
To ensure that the invariant is maintained, we now need to split block B with respect to T′ and T. We first split with respect to T′. This splits block B into two blocks: R, which contains the states that can inertly reach a transition in T′, and U, containing those states from which no transition in T′ is inertly reachable. This puts $s_0$ and $s_1$ in R, and $s_2$ in U. This is shown in Figure 1c.
Block U is stable with respect to T since, by the invariant that held before splitting the bunch, all states that cannot inertly reach T′ must be able to inertly reach T. Block R, however, is not yet stable with respect to T. Therefore, R must be split into the set of states from which no transition in T is inertly reachable, denoted U′, and the set of states from which a transition in T is inertly reachable, denoted R′. This is shown in Figure 1d. Note that in this last split, the τ-transition becomes non-inert, and a new bunch $T_\tau$ containing this transition is introduced. The algorithm needs to stabilise with respect to this new bunch, and it needs to ensure that R′, the block containing a new bottom state, is stabilised with respect to all other bunches it can inertly reach. In this particular example, R′ happens to be stable with respect to all bunches, so the required stabilisation has no effect.
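Invariant 3.2 can be checked directly on small configurations in the spirit of Example 3.3. The Python sketch below assumes, for illustration only, that the bottom states of each block are given as a dictionary, that block_of maps states to blocks, and that bunch_of maps each non-inert transition to a bunch identifier; all names are ours. For every pair (B, T) such that some state of B has a transition in T, it checks that all bottom states of B are among those states.

```python
def is_stable(block_of, bottoms, bunch_of, transitions):
    """Check Invariant 3.2: whenever bunch T contains a transition leaving
    block B, every bottom state of B has a transition in T.

    bottoms maps each block to its set of bottom states; bunch_of maps
    non-inert transitions to bunch ids (inert transitions are absent).
    """
    sources = {}  # (block, bunch) -> states with a transition in that bunch
    for tr in transitions:
        bunch = bunch_of.get(tr)
        if bunch is not None:
            sources.setdefault((block_of[tr[0]], bunch), set()).add(tr[0])
    return all(bottoms[B] <= states for (B, _), states in sources.items())
```

Moving one transition into a bunch of its own can make this check fail, which is exactly the situation in which the algorithm has to split blocks.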

In-depth description of the algorithm
The pseudocode of the algorithm is given in Algorithm 1. The algorithm works as follows, starting with the initialisation. First, at Line 1.1, all τ-SCCs are contracted into a single state each (without τ-loop). All states in a τ-SCC are branching bisimilar (as they can all reach the same states, possibly after a few inert transitions). This preprocessing ensures that there are no τ-cycles in the LTS (Invariant 3.7 below) and that from every non-bottom state a bottom state can be reached via inert transitions (Lemma 3.8 below).
Second, at Lines 1.2-1.5, we create the initial partitions $\Pi_s$ and $\Pi_t$ as follows. Let $B_{vis}$ be the set of states from which a visible transition is inertly reachable, and let $B_{invis}$ be the other states (i.e., the states from which only a deadlock state can be inertly reached). Then $\Pi_s = \{B_{vis}, B_{invis}\} \setminus \{\emptyset\}$. Initially, $\Pi_t$ contains one bunch consisting of all non-inert transitions. All the block-bunch-slices induced by $\Pi_s$ and $\Pi_t$ are initially stable, i.e., for all block-bunch-slices $T_{B\to}$, every bottom state of B, which must be $B_{vis}$, has a transition in $T_{B\to}$ (this is Invariant 3.2 above).
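The initial partition can be computed with one backward search. In this Python sketch (the function name, the representation and the "tau" label are assumptions), the states with a visible outgoing transition are collected first, and every state that reaches one of them by a sequence of τ-transitions is added to $B_{vis}$; before any refinement we simply take reachability via arbitrary τ-transitions. Block number 0 stands for $B_{vis}$ and 1 for $B_{invis}$ (an empty block is simply never used), and the single initial bunch consists of the transitions that are not τ-transitions inside one block.

```python
def initial_partition(n, transitions, tau="tau"):
    """Return (block_of, bunch): block 0 is B_vis, block 1 is B_invis, and
    bunch is the list of initially non-inert transitions."""
    pred_tau = [[] for _ in range(n)]
    vis = [False] * n
    stack = []
    for s, a, t in transitions:
        if a == tau:
            pred_tau[t].append(s)
        elif not vis[s]:
            vis[s] = True  # s has a visible outgoing transition
            stack.append(s)
    while stack:  # backward search along tau-transitions
        u = stack.pop()
        for v in pred_tau[u]:
            if not vis[v]:
                vis[v] = True
                stack.append(v)
    block_of = [0 if vis[s] else 1 for s in range(n)]
    bunch = [tr for tr in transitions
             if not (tr[1] == tau and block_of[tr[0]] == block_of[tr[2]])]
    return block_of, bunch
```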
If there are non-trivial bunches, these bunches need to be split such that they ultimately become trivial. The outer loop of the algorithm (Lines 1.6-1.31) takes a non-trivial bunch T from $\Pi_t$, and from it moves a small action-block-slice $T_{\xrightarrow{a}B'}$ (containing at most half of the transitions in the bunch) into its own bunch in $\Pi_t$ (Line 1.8). Hence, bunch T is reduced to $T \setminus T_{\xrightarrow{a}B'}$. The two new bunches $T_{\xrightarrow{a}B'}$ and $T \setminus T_{\xrightarrow{a}B'}$ can cause instability, violating Invariant 3.2. This means that there can be blocks with transitions in one new bunch where not all bottom states have transitions in that bunch, because some bottom states only have transitions in the other new bunch.
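Finding an action-block-slice with at most half of the transitions of a bunch does not require inspecting all slices. One simple observation (ours, not necessarily how the algorithm proceeds) is that of any two distinct slices A and C of the same bunch T, the smaller one satisfies min(|A|, |C|) ≤ (|A| + |C|)/2 ≤ |T|/2. The Python sketch below applies this to the first two slices of a non-trivial bunch, represented as a list of lists of transitions (a hypothetical representation).

```python
def pick_small_slice(slices):
    """slices: the action-block-slices (lists of transitions) of one bunch.
    Returns the index of a slice holding at most half of the bunch's
    transitions, looking only at the first two slices."""
    if len(slices) < 2:
        raise ValueError("only non-trivial bunches are split")
    # min(|A|, |C|) <= (|A| + |C|) / 2 <= |bunch| / 2
    return 0 if len(slices[0]) <= len(slices[1]) else 1
```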

Algorithm 1 Abstract algorithm for branching bisimulation partitioning
1.1: Find τ-SCCs and contract each of them to a single state
…
for all $B \in \mathit{splittableBlocks}(T_{\xrightarrow{a}B'})$ do
…
for all $T_{B\to}$ in the splitter list (in order) do
…
1.17: … the splitter list, and mark all its transitions
…
1.25: Add all $T_{N\to}$ to the splitter list and label them secondary
…
1.28: For each bottom state, mark one of its outgoing transitions in every $T_{N\to}$ where it has one
end for
1.31: end while

For such blocks, stability needs to be restored by splitting them. The set $\mathit{splittableBlocks}(T_{\xrightarrow{a}B'})$ contains all blocks that have transitions in both new bunches $T_{\xrightarrow{a}B'}$ and $T \setminus T_{\xrightarrow{a}B'}$; these blocks must be investigated. All other blocks are stable with respect to the new bunches.
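The set splittableBlocks can be determined in time proportional to the size of the split-off slice, provided the size of every block-bunch-slice of the original bunch is known: a block is splittable iff it has transitions in $T_{\xrightarrow{a}B'}$ but these are not all of its transitions in T. The Python sketch below assumes, as hypothetical inputs, the list of transitions in the slice, a dictionary with the sizes $|T_{B\to}|$ in the original bunch, and the block of each state.

```python
from collections import Counter


def splittable_blocks(small_slice, block_bunch_size, block_of):
    """Blocks with transitions both in the split-off slice T_{-a->B'} and
    in the rest of the original bunch T.

    small_slice: transitions (src, label, tgt) of T_{-a->B'}.
    block_bunch_size: block -> |T_{B->}|, measured in the original bunch T.
    The work is proportional to len(small_slice).
    """
    in_slice = Counter(block_of[s] for s, _, _ in small_slice)
    # B is splittable iff T_{B-a->B'} is non-empty but not all of T_{B->}.
    return {B for B, k in in_slice.items() if k < block_bunch_size[B]}
```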
Earlier algorithms would now investigate all blocks to re-establish stability. Instead, we investigate all block-bunch-slices in the smaller of the two new bunches. All blocks that do not have transitions in these block-bunch-slices are stable with respect to both bunches. The first inner loop (Lines 1.9-1.13) serves to insert all these block-bunch-slices into the splitter list. Block-bunch-slices of the shape $T_{B\xrightarrow{a}B'}$ in the splitter list are labelled primary, and all other block-bunch-slices in this list are labelled secondary.
In the first loop (Lines 1.9-1.13) some transitions are marked. The function of this marking is similar to that of the counters in [17]: it serves to determine quickly whether a bottom state has a transition in a secondary splitter $T_{B\to} \setminus T_{B\xrightarrow{a}B'}$ (or in slices that are the result of splitting this slice). Invariant 3.11 below states that a bottom state has transitions in some splitter block-bunch-slice if and only if it has marked transitions in this slice. The marked transitions are stored separately in a block-bunch-slice and can therefore be visited without spending time on the unmarked transitions of the block. This is essential to obtain the time complexity of the algorithm, as we are allowed to perform one unit of work per transition in $T_{\xrightarrow{a}B'}$ (the smaller of the two new bunches), and since at most one transition in $T_{B\to} \setminus T_{B\xrightarrow{a}B'}$ is marked per state with a transition in $T_{B\xrightarrow{a}B'}$, i.e., $|\mathit{Marked}(T_{B\to} \setminus T_{B\xrightarrow{a}B'})| \le |T_{B\xrightarrow{a}B'}|$, we do not mark too many transitions. In the second loop (Lines 1.14-1.30), one splitter $T_{B\to}$ at a time is taken from the splitter list, and its source block is split into R (the part that can inertly reach some transition in $T_{B\to}$) and U (the part that cannot inertly reach $T_{B\to}$) to re-establish stability. Formally, the routine split(B, T) delivers the pair R, U defined by
$R = \{s \in B \mid \text{some transition in } T \text{ is inertly reachable from } s\}$ and $U = B \setminus R$.
We will detail its algorithm and argue for its correctness in Section 4.
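A straightforward reference version of split(B, T) follows the description of R and U given above: a backward search over inert τ-transitions, starting from the states of B that have a transition in T. This Python sketch (the name split_block and the representation are our assumptions) takes time proportional to the size of the whole block, so it does not respect the "Process the smaller half" principle; that is what the coroutine-based procedure of Section 4 is for.

```python
def split_block(block, transitions, splitter, tau="tau"):
    """Reference (non-coroutine) version of split(B, T): R holds the states
    of B that can inertly reach a transition of the splitter, U the rest."""
    block = set(block)
    pred_inert = {}
    for s, a, t in transitions:
        if a == tau and s in block and t in block:  # inert w.r.t. B
            pred_inert.setdefault(t, []).append(s)
    R = {s for s, _, _ in splitter if s in block}
    stack = list(R)
    while stack:
        u = stack.pop()
        for v in pred_inert.get(u, ()):
            if v not in R:
                R.add(v)
                stack.append(v)
    return R, block - R
```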
If $T_{B\to}$ was a primary splitter of the form $T_{B\xrightarrow{a}B'}$, added to the splitter list at Line 1.10, then we know that U must be stable under $T_{U\to} \setminus T_{U\xrightarrow{a}B'}$: every bottom state in B has a transition in the block-bunch-slice $T_{B\to}$ of the original bunch T, and as the states in U have no transition in $T_{B\xrightarrow{a}B'}$, every bottom state in U must have a transition in $T_{B\to} \setminus T_{B\xrightarrow{a}B'}$. Therefore, at Line 1.19, block-bunch-slice $T_{U\to} \setminus T_{U\xrightarrow{a}B'}$ can be removed from the splitter list without investigating it. This is crucial for the complexity, and it is essentially the translation of the three-way split from [17].
Some invisible transitions may have become non-inert, namely the τ-transitions that go from R to U. There cannot be τ-transitions from U to R, as otherwise $T_{B\to}$ could be inertly reached from the source state of such a transition, so that state should have been added to R instead of U. The new non-inert transitions were not yet part of a bunch in $\Pi_t$. So, a new bunch $R \xrightarrow{\tau} U$ is formed containing these transitions. All transitions in this new bunch leave block R, and therefore R is the only block that may not be stable under this new bunch. To re-establish stability we split R into blocks N and R′ (Line 1.23). Observe that there can be transitions $N \xrightarrow{\tau} R'$ that also become non-inert, and we add those to the new bunch $R \xrightarrow{\tau} U$. In N, i.e., the set of states that can inertly reach some transition in $R \xrightarrow{\tau} U$, some states are new bottom states, while R′ contains all original bottom states of R. In accordance with the observations in [10], blocks containing new bottom states can become unstable under any block-bunch-slice. Therefore, stability under all those block-bunch-slices must be re-established, and all the block-bunch-slices leaving block N are put on the splitter list. For every new bottom state, we mark one of its outgoing transitions in each of these block-bunch-slices in which it has one, such that we can find the bottom states with a transition in $T_{N\to}$ in time proportional to the number of such new bottom states. This procedure is repeated until all action-block-slices coincide with the bunches. In the next section we prove that the resulting partition $\Pi_s$ exactly coincides with branching bisimilarity.
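The bookkeeping after a split can be illustrated as follows. Given a split of a block into R and U, the Python sketch below (our own naming and representation) collects the τ-transitions from R to U, which form the new bunch, and determines which states of R have lost all their inert transitions and have therefore become new bottom states. It also asserts the property argued above that no τ-transition goes from U to R. It only covers the split of B into R and U, not the subsequent split of R into N and R′.

```python
def after_split(R, U, transitions, tau="tau"):
    """Given a split of block B into R and U, return the tau-transitions
    that became non-inert (all from R to U) and the new bottom states of R."""
    R, U = set(R), set(U)
    new_bunch, still_inert = [], set()
    for s, a, t in transitions:
        if a != tau:
            continue
        assert not (s in U and t in R), "split never leaves a tau-step U -> R"
        if s in R and t in U:
            new_bunch.append((s, a, t))
        elif s in R and t in R:
            still_inert.add(s)
    # States that had an inert step into U but keep none inside R.
    lost_inert = {s for s, _, _ in new_bunch}
    return new_bunch, lost_inert - still_inert
```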
We illustrate the algorithm in the following example.Note that the example also illustrates some of the details of the split subroutine, which is discussed in detail in Section 4.
Example 3.4. Consider the situation in Figure 2a. Observe that block B is stable w.r.t. the bunches T and T′. We have split off a small bunch $T_{\xrightarrow{a}B'}$ from T, and as a consequence, B needs to be restabilised. The bunches put on the splitter list initially are $T_{\xrightarrow{a}B'}$ and $T \setminus T_{\xrightarrow{a}B'}$. When putting these bunches on the splitter list, all transitions in $T_{B\xrightarrow{a}B'}$ are marked, see the m's in Figure 2b. Also, for states that have transitions both in $T_{\xrightarrow{a}B'}$ and in $T \setminus T_{\xrightarrow{a}B'}$, one such transition in the latter bunch is marked, see the m's in Figure 2b.
We now first split B w.r.t. the primary splitter $T_{\xrightarrow{a}B'}$ into R, the states that can inertly reach $T_{\xrightarrow{a}B'}$, and U, the states that cannot. In Figure 2b, the states known to be destined for R and those known to be destined for U are indicated by two different symbols. Initially, all states with a marked outgoing transition are destined for R, and the remaining bottom state of B is destined for U. The split subroutine proceeds to extend sets R and U in a backwards fashion using two coroutines, marking a state as destined for R if one of its inert successors is already in R, and marking a state as destined for U if all its inert successors are in U and it has no transition in the splitter. In this case, the state in U does not have any incoming inert transitions, so its coroutine immediately terminates and all other states belong to R. Block B is split into R and U. The resulting block U is stable w.r.t. both $T_{\xrightarrow{a}B'}$ and $T \setminus T_{\xrightarrow{a}B'}$. The resulting sets R and U are shown in Figure 2c.
We still need to split R w.r.t. $T \setminus T_{\xrightarrow{a}B'}$, into $R_1$ and $U_1$, say. For this, we use the marked transitions in $T \setminus T_{\xrightarrow{a}B'}$ as a starting point to compute all bottom states that can reach a transition in $T \setminus T_{\xrightarrow{a}B'}$. This guarantees that the time we use is proportional to the size of $T_{\xrightarrow{a}B'}$. Initially, there is one state destined for $R_1$ and one state destined for $U_1$, both indicated in Figure 2c. We now perform both coroutines in split simultaneously. Figure 2d shows the situation after both coroutines have considered one transition: the $U_1$-coroutine (the coroutine that calculates the states that cannot inertly reach a transition in $T \setminus T_{\xrightarrow{a}B'}$) has initialised the counter untested of one state to 2 on Line 2.10 of Algorithm 2, because two of its outgoing inert transitions have not yet been considered. The $R_1$-coroutine (the coroutine that calculates the states that can inertly reach a transition in $T \setminus T_{\xrightarrow{a}B'}$) has checked the unmarked transition in the splitter $T_{R\to} \setminus T_{R\xrightarrow{a}B'}$. As the latter coroutine has finished visiting unmarked transitions in the splitter, the $U_1$-coroutine no longer needs to run the slow test loop at Lines 2.14-2.18 of the left-most column of Algorithm 2. Figure 2e shows the situation after two more steps in the coroutines, in which each has visited two extra transitions. There are two extra states destined for $R_1$, and there is one state destined for $U_1$ with 0 remaining inert transitions, for which we know immediately that it has no transition in $T \setminus T_{\xrightarrow{a}B'}$. Now, the $R_1$-coroutine is terminated, since it contains more than $\frac{1}{2}|R|$ states, and the remaining incoming transitions of states in $U_1$ are visited. This will not further extend $U_1$, since all states are already destined either for $R_1$ or for $U_1$.
The result of splitting is shown in Figure 2f. Note that some inert transitions become non-inert, so a new bunch with the transitions $R_1 \xrightarrow{\tau} U_1$ is created, and all those transitions are marked m. We next have to split $R_1$ with respect to this new bunch into the set $N_1$ of states that can inertly reach a transition in the new bunch, and the set $R_1'$ of states that cannot. In this case, all states in $R_1$ have a marked outgoing transition, hence $N_1 = R_1$ and $R_1' = \emptyset$. The coroutine that calculates the set of states that cannot inertly reach a transition in the bunch terminates immediately because there are no transitions to be considered.
Observe that $R_1$ (= $N_1$) has a new bottom state. This means that stability of $R_1$ with respect to any bunch is no longer guaranteed, and it needs to be re-established. We therefore consider all bunches in which $R_1$ has an outgoing transition. We add the block-bunch-slices of $R_1$ in these bunches, among them the slice $T'_{R_1\rightarrow}$ of a further bunch $T'$, to the splitter list as secondary splitters, and in each of those slices we mark, using m, one outgoing transition of each bottom state that has a transition in it. This situation is shown in Figure 2g.
In this case, $R_1$ is stable w.r.t. $T_{R_1\xrightarrow{a}B'}$ and $T_{R_1\rightarrow} \setminus T_{R_1\xrightarrow{a}B'}$, i.e., all states in $R_1$ can inertly reach a transition in both bunches. In both cases this is observed immediately after initialisation in split, since the set of states that cannot inertly reach a transition in these bunches is initially empty, and the corresponding coroutine terminates immediately.
Next, consider splitting $R_1$ with respect to $T'_{R_1\rightarrow}$, the block-bunch-slice of $R_1$ in a further bunch $T'$. This leads to $R_2$, the set of states that can inertly reach a transition in $T'$, and $U_2$, the set of states that cannot. Note that there are no marked transitions in $T'_{R_1\rightarrow}$, so initially all bottom states of $R_1$ are destined for $U_2$ (marked in Figure 2h), and there are no states destined for $R_2$. Then we start splitting $R_1$. In the $R_2$-coroutine, we first add the states with an unmarked transition in $T'_{R_1\rightarrow}$ to $R_2$ at Line 2.5r (i.e., in the right column of Algorithm 2), and then all predecessors of the new bottom state need to be considered. When split terminates, there are no additional states in $U_2$, and the remaining states end up in $R_2$.
The situation after splitting $R_1$ into $R_2$ and $U_2$ is shown in Figure 2i. One of the inert transitions becomes non-inert; it is marked m. Furthermore, $R_2$ contains a new bottom state: the state in $R_2$ with a transition in $T'$, marked nb. Note that, as each block necessarily has a bottom state (see Lemma 3.8), a non-bottom state had to become a bottom state in this case.
We need to continue stabilising $R_2$ w.r.t. the bunch $R_2 \xrightarrow{\tau} U_2$, which does not lead to a new split, and we need to restabilise $R_2$ w.r.t. all bunches in which it has an outgoing transition. This does not lead to new splits either, so the situation in Figure 2i, after removing the marking of the transitions in $R_2 \xrightarrow{\tau} U_2$, is the final result of splitting.

Correctness
The validity of the algorithm follows from a number of major invariants.
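The main invariant is Invariant 3.2, which was stated before. The following invariant indicates that two non-inert transitions starting in the same block, having the same label and ending in the same block, always reside in the same bunch.
Invariant 3.5. For all non-inert transitions $s \xrightarrow{a} s'$ and $t \xrightarrow{a} t'$, if s and t are in the same block of $\Pi_s$ and s' and t' are in the same block of $\Pi_s$, then both transitions are in the same bunch of $\Pi_t$.
Proof. Initially, $\Pi_t$ contains one bunch with all non-inert transitions. Therefore, the invariant is valid.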
There are two places in the algorithm where the validity of this invariant can be jeopardised: at Line 1.8, $\Pi_t$ is refined, and at Lines 1.22 and 1.26, a new bunch is created and extended.
We first look at Line 1.8. We replace bunch T with the two bunches $T_{\xrightarrow{a}B'}$ and $T \setminus T_{\xrightarrow{a}B'}$. As all the transitions in $T_{\xrightarrow{a}B'}$ have label a and go to block B', the invariant is maintained for $T_{\xrightarrow{a}B'}$. All transitions in $T \setminus T_{\xrightarrow{a}B'}$ have a label different from a or go to a block different from B', so the invariant is maintained for these transitions as well.
At Lines 1.22 and 1.26, all new non-inert transitions are put in a new bunch. As these are the only τ-transitions between blocks R and U, the invariant remains valid under the creation of this bunch.
The following invariant says that states that are branching bisimilar never end up in separate blocks.
Invariant 3.6 (Preservation of branching bisimilarity). For all states $s, t \in S$, if $s \leftrightarrow_b t$, then there is some block $B \in \Pi_s$ such that $s, t \in B$.
Proof. Initially, this invariant is valid, which we prove by contraposition. Consider two states s, t in different blocks. Then $s \in B_{\mathit{vis}}$ and $t \in B_{\mathit{invis}}$, or vice versa. But then s and t cannot be branching bisimilar, because from s a visible action is inertly reachable, whereas that is not the case from t (or vice versa).
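The preservation of this invariant can be seen as follows. There are two places where blocks in $\Pi_s$ are split, namely at Lines 1.17 and 1.25. We first concentrate on the splitting at Line 1.17. In this case a block B is split into R and U, where R, U is the result of an invocation of $\mathit{split}(B, T_{B\rightarrow})$. Assume that the invariant is invalidated. This means that there are states $s \in R$ and $t \in U$ with $s \leftrightarrow_b t$. As $s \in R$, by Equation (1) on page 9 we know that s can inertly reach a transition in $T_{B\rightarrow}$. As $s \leftrightarrow_b t$, the invariant held before the split, and $T_{B\rightarrow}$ is part of a bunch satisfying Invariant 3.5, t can also inertly reach a transition in $T_{B\rightarrow}$. Hence, by Equation (1) on page 9 it follows that $t \in R$, contradicting the assumption that the invariant could be invalidated.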
At Line 1.25, splitting takes place with regard to a splitter $R \xrightarrow{\tau} U$. As this is a bunch satisfying Invariant 3.5, the reasoning is completely analogous to the previous case.
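The initial blocks $B_{\mathit{vis}}$ and $B_{\mathit{invis}}$ used in this proof can be computed by a backward search along τ-transitions. The following Python sketch illustrates this in O(n + m) time. It assumes states numbered 0 to n-1, transitions given as (source, label, target) triples, and the hypothetical label "tau" for τ; it is an illustration, not the mCRL2 implementation.

```python
from collections import defaultdict, deque

TAU = "tau"  # assumed encoding of the internal action


def initial_partition(n, transitions):
    """Return (B_vis, B_invis) for states 0..n-1.

    B_vis contains the states that can reach a visible (non-tau) transition
    via tau-transitions only; B_invis contains all other states.
    """
    tau_pred = defaultdict(list)          # target -> sources of tau-steps
    in_vis = [False] * n
    queue = deque()
    for s, a, t in transitions:
        if a == TAU:
            tau_pred[t].append(s)
        elif not in_vis[s]:               # s has a visible transition itself
            in_vis[s] = True
            queue.append(s)
    while queue:                          # backward search along tau-steps
        t = queue.popleft()
        for s in tau_pred[t]:
            if not in_vis[s]:
                in_vis[s] = True
                queue.append(s)
    b_vis = {s for s in range(n) if in_vis[s]}
    return b_vis, set(range(n)) - b_vis


# Example: 0 -tau-> 1 -a-> 2 -tau-> 3
print(initial_partition(4, [(0, TAU, 1), (1, "a", 2), (2, TAU, 3)]))
# ({0, 1}, {2, 3})
```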
The following invariant says that there are no τ-loops in any block. In fact, a stronger property holds: once τ-loops have been removed during the initialisation of the algorithm, there are no τ-loops at all.

Invariant 3.7 (No inert loops).
There is no inert loop in a block, i.e., for every sequence $s_1 \xrightarrow{\tau} s_2 \xrightarrow{\tau} \cdots \xrightarrow{\tau} s_k$ with $k > 1$ and $s_i \in B$ for all i, we have $s_1 \neq s_k$.
Proof. Initially, this invariant holds because every strongly connected component of states connected via τ-transitions is contracted into a single state. The only operation that influences this invariant is splitting a block (Lines 1.17 and 1.25), and splitting blocks cannot introduce new loops.
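The contraction of τ-cycles that establishes Invariant 3.7 can be sketched with Tarjan's strongly connected components algorithm restricted to τ-transitions. In the sketch below, the function name, the "tau" label and the triple representation are assumptions. τ-transitions inside one component are dropped, since after merging they would become τ-self-loops. The recursive formulation is only meant for small examples.

```python
import sys
from collections import defaultdict

TAU = "tau"  # assumed encoding of the internal action


def contract_tau_sccs(n, transitions):
    """Merge every strongly connected component of the tau-graph into one
    state. Returns (number of new states, new transitions, component map)."""
    succ = defaultdict(list)
    for s, a, t in transitions:
        if a == TAU:
            succ[s].append(t)
    sys.setrecursionlimit(max(sys.getrecursionlimit(), 2 * n + 100))
    index, low, on_stack = {}, {}, set()
    stack, comp = [], [None] * n
    counter, ncomp = [0], [0]

    def strongconnect(v):                 # recursive Tarjan
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of a component
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = ncomp[0]
                if w == v:
                    break
            ncomp[0] += 1

    for v in range(n):
        if v not in index:
            strongconnect(v)
    # Tau-transitions inside one component would become tau-self-loops.
    new_trans = {(comp[s], a, comp[t]) for s, a, t in transitions
                 if not (a == TAU and comp[s] == comp[t])}
    return ncomp[0], new_trans, comp


print(contract_tau_sccs(3, [(0, TAU, 1), (1, TAU, 0), (1, "a", 2)]))
# (2, {(0, 'a', 1)}, [0, 0, 1])
```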
As a consequence of the previous invariant, every block has at least one bottom state, and from every state in a block a bottom state can be inertly reached.
Lemma 3.8. Invariant 3.7 implies that for all partitions $\Pi_s$ of S and all blocks B in $\Pi_s$, we have:
1. $\mathit{Bottom}(B) \neq \emptyset$.
2. For every state s ∈ B, there is a path of inert transitions leading to a bottom state in B.
Proof. Let $\Pi_s$ be an arbitrary partition of S, and let $B \in \Pi_s$ be a block with $B \neq \emptyset$.
1. Towards a contradiction, assume that $\mathit{Bottom}(B) = \emptyset$. Then every state s in B has an outgoing transition $s \xrightarrow{\tau} s'$ for some $s' \in B$. Since B is finite, following such transitions must eventually revisit a state, so there is a τ-cycle in B. This contradicts Invariant 3.7.

2. Let $s \in B$ be arbitrary, and assume that s does not have a path of inert transitions leading to a bottom state in B. Then every state that is inertly reachable from s is a non-bottom state of B, so, as in the first part, there must be a τ-cycle in B, which contradicts Invariant 3.7.
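Both parts of Lemma 3.8 can be illustrated in a few lines of Python. The sketch below uses hypothetical helper names and the same "tau" encoding assumption; it computes $\mathit{Bottom}(B)$ and follows inert transitions to a bottom state. The length check mirrors the proof: a path with more than |B| states must revisit a state and hence contain an inert loop.

```python
TAU = "tau"  # assumed encoding of the internal action


def inert_successors(block, transitions):
    """Map each state of `block` to its inert successors (tau, inside B)."""
    succ = {s: [] for s in block}
    for s, a, t in transitions:
        if a == TAU and s in succ and t in succ:
            succ[s].append(t)
    return succ


def bottom_states(block, transitions):
    """Bottom(B): the states of B without an outgoing inert transition."""
    succ = inert_successors(block, transitions)
    return {s for s, ts in succ.items() if not ts}


def inert_path_to_bottom(s, block, transitions):
    """Follow inert transitions from s until a bottom state is reached
    (Lemma 3.8, part 2); fail if the block contains an inert loop."""
    succ = inert_successors(block, transitions)
    path = [s]
    while succ[path[-1]]:
        path.append(succ[path[-1]][0])    # any inert successor will do
        if len(path) > len(succ):         # some state was visited twice
            raise ValueError("inert loop in block")
    return path


trans = [(0, TAU, 1), (1, TAU, 2), (1, TAU, 5), (2, "a", 3)]
print(bottom_states({0, 1, 2}, trans))            # {2}
print(inert_path_to_bottom(0, {0, 1, 2}, trans))  # [0, 1, 2]
```

The τ-transition from state 1 to state 5 is not inert here, since state 5 lies outside the block.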
Invariant 3.9 is a technical invariant required to prove the main Invariant 3.2. It holds for the second inner for loop and says that Invariant 3.2 holds, except for the block-bunch-slices on the splitter list, for which the invariant has to be re-established by splitting blocks.
Invariant 3.9 (Inner loop at Lines 1.14-1.30). If a non-empty block-bunch-slice $T_{B\rightarrow}$ is not in the splitter list, then every bottom state in its source block B has a transition in $T_{B\rightarrow}$.
Proof. First we show that the invariant holds when arriving at the second for loop at Line 1.14. Consider a non-empty block-bunch-slice $T'_{\hat B\rightarrow}$ that is not in the splitter list. If $T'_{\hat B\rightarrow}$ is not a subset of T at Line 1.6, it follows from Invariant 3.2 that all $t \in \mathit{Bottom}(\hat B)$ have a transition in $T'$, so the invariant holds. Now assume $T'_{\hat B\rightarrow}$ is a subset of T. Note that $T' \neq T$. Then one of the following cases applies:
• $T' = T_{\xrightarrow{a}B'}$. As $T_{\hat B\xrightarrow{a}B'}$ is not empty, there is a transition $s \xrightarrow{a} s' \in T_{\hat B\xrightarrow{a}B'} \subseteq T$. Towards a contradiction, suppose there is some state $t \in \mathit{Bottom}(\hat B)$ that does not have an outgoing transition in $T_{\xrightarrow{a}B'}$. Then t must have a transition in $T \setminus T_{\xrightarrow{a}B'}$ by Invariant 3.2, thus $\hat B \in \mathit{splittableBlocks}(T_{\xrightarrow{a}B'})$ and hence $T_{\hat B\xrightarrow{a}B'}$ is on the splitter list. This contradicts the assumption that $T'_{\hat B\rightarrow}$ is not on the splitter list.
• $T' = T \setminus T_{\xrightarrow{a}B'}$. It cannot happen that $\hat B \in \mathit{splittableBlocks}(T_{\xrightarrow{a}B'})$, since then $T'_{\hat B\rightarrow}$ is on the splitter list. Therefore, $\hat B \notin \mathit{splittableBlocks}(T_{\xrightarrow{a}B'})$. Since $T_{\hat B\xrightarrow{a}B'} = \emptyset$, we have $T'_{\hat B\rightarrow} = T_{\hat B\rightarrow}$, and it immediately follows from Invariant 3.2 that all $t \in \mathit{Bottom}(\hat B)$ have a transition in $T'_{\hat B\rightarrow}$.
We now show that the loop starting at Line 1.14 maintains this invariant. Let B be the block that is split in an iteration of this loop, using the splitter $T_{B\rightarrow}$. Concretely, we consider some block-bunch-slice $T'_{C\rightarrow}$ that is not empty and does not occur in the splitter list at Line 1.30. We make a case distinction on the block C.
• Let us first assume that C is not the result of splitting a block at Lines 1.17 or 1.25, so $C \cap B = \emptyset$. This means that $T'_{C\rightarrow}$ was not split during the last iteration of the for loop, and it is not a subset of the bunch with new non-inert transitions created at Lines 1.22 and 1.26. Hence, the invariant remains valid for $T'_{C\rightarrow}$ during this last iteration.
• Assume $C \subseteq R$ and the condition at Line 1.21 is not valid, i.e., $R \xrightarrow{\tau} U = \emptyset$. This means that R is not split further at Line 1.25, i.e., $C = R$. We treat the cases where $T'_{R\rightarrow}$ is or is not a subset of $T_{B\rightarrow}$ separately.
- If $T'_{R\rightarrow} \subseteq T_{B\rightarrow}$, then, due to the splitting at Line 1.17, all states in R can inertly reach a transition in $T_{B\rightarrow}$. In particular, all bottom states of R have a transition in $T_{B\rightarrow}$, and as these transitions start in R, they lie in $T'_{R\rightarrow}$. Hence, the invariant is valid for $T'_{R\rightarrow}$.
- If $T'_{R\rightarrow} \cap T_{B\rightarrow} = \emptyset$, then $T'_{R\rightarrow}$ is the result of splitting a block-bunch-slice $T'_{B\rightarrow}$ that was not on the splitter list, as splitting a block-bunch-slice that is already on this list keeps the resulting block-bunch-slices on the list. This means that all bottom states in B have transitions in $T'_{B\rightarrow}$ by Invariant 3.2, and by construction, those bottom states in R have transitions in $T'_{R\rightarrow}$. As $R \xrightarrow{\tau} U = \emptyset$, there are no additional bottom states in $\mathit{Bottom}(R) \setminus \mathit{Bottom}(B)$, and the invariant is established for $T'_{R\rightarrow}$.
• Assume $C \subseteq R$ and the condition at Line 1.21 is valid. This means that C is equal to either N or R'.
- Assume C equals N. Then the block-bunch-slice has the shape $T'_{N\rightarrow}$ (Line 1.27) and is added to the splitter list, contradicting our assumption that it is not on the splitter list at Line 1.30.
- Assume C equals R'. In this case the block-bunch-slice $T'_{R'\rightarrow}$ cannot be a subset of the new bunch created at Lines 1.22 and 1.26, as the new bunch contains no transitions that start in R'. If the original block-bunch-slice $T'_{B\rightarrow}$ that existed at the beginning of the second for loop and satisfies $T'_{R'\rightarrow} \subseteq T'_{B\rightarrow}$ was in the splitter list, then $T'_{R'\rightarrow}$ would also be on the list, a contradiction. Therefore, $T'_{B\rightarrow}$ was not in the splitter list. As $T'_{B\rightarrow}$ was not empty, all bottom states in B have transitions in $T'_{B\rightarrow}$. As all new bottom states are moved to N, all bottom states in R' are also bottom states in B, and it follows that all bottom states in R' have transitions in $T'_{R'\rightarrow}$. Hence, the invariant holds.
• Finally, we consider the situation where C = U. There are three cases to consider.
- In this last case, we investigate the remaining situations. So either $T'_{U\rightarrow} \neq T_{U\rightarrow}$, or $T'_{U\rightarrow} \neq T_{U\rightarrow} \setminus T_{U\xrightarrow{a}B'}$, or $T_{B\rightarrow}$ is not primary. If the original block-bunch-slice $T'_{B\rightarrow}$ that existed at the beginning of the second for loop and satisfies $T'_{U\rightarrow} \subseteq T'_{B\rightarrow}$ was in the splitter list, then $T'_{U\rightarrow}$ would also be on the list, a contradiction. Therefore, $T'_{B\rightarrow}$ was not in the splitter list. As $T'_{U\rightarrow}$ is not empty, $T'_{B\rightarrow}$ is not empty either, so all bottom states in B have a transition in $T'_{B\rightarrow}$. All bottom states of U are bottom states of B. (Otherwise there would be a transition $s \xrightarrow{\tau} s'$ from a state $s \in U$ to a state $s' \in R$; but then s would be part of R.) Hence, all bottom states in U have a transition in $T'_{U\rightarrow}$, showing that the invariant holds.

Invariant 3.2 is the main invariant of the algorithm. It is valid at Line 1.6. It is an adaptation to branching bisimulation of a similar invariant in [19]. It says that the partition is always stable w.r.t. the bunches. Stability refers to the presence of a transition in a bunch, and hence does not relate to actions or target blocks. We finally prove its invariance; the proof relies on Invariant 3.9.
Proof of Invariant 3.2. We show that $\Pi_s$ is stable under $\Pi_t$, i.e., if a bunch $T \in \Pi_t$ contains a transition with its source state in a block B of $\Pi_s$, then every bottom state in block B has a transition in bunch T (in fact, in block-bunch-slice $T_{B\rightarrow}$).
Initially, this invariant is valid: the initial blocks in $\Pi_s$ are $B_{\mathit{vis}}$ and $B_{\mathit{invis}}$, and there is only one bunch. From the states in $B_{\mathit{invis}}$ no visible transition is reachable, which means that all transitions from states in $B_{\mathit{invis}}$ are inert. Therefore, they do not occur in the initial bunch, and Invariant 3.2 holds trivially for $B_{\mathit{invis}}$. Each bottom state in $B_{\mathit{vis}}$ has an outgoing visible transition, which must occur in the initial bunch. This also makes the invariant valid in a trivial way, albeit for a different reason.
The invariant may be invalidated when $\Pi_t$ is split at Line 1.8. At the end of the second for loop (Line 1.30), emptiness of the splitter list and Invariant 3.9 imply that all block-bunch-slices are stable. This implies the invariant as follows. Consider a bunch $T \in \Pi_t$, and assume there is a transition $s \xrightarrow{a} s' \in T$ with $s \in B$. The transition $s \xrightarrow{a} s'$ occurs in the block-bunch-slice $T_{B\rightarrow}$. As $T_{B\rightarrow}$ is stable, every state $t \in \mathit{Bottom}(B)$ has an outgoing transition in $T_{B\rightarrow}$, and therefore the invariant holds at Line 1.30.

The invariants given above allow us to prove that the algorithm works correctly. When the algorithm terminates (and this always happens, see Section 3.5), branching bisimilar states are perfectly grouped in blocks: two states are branching bisimilar if and only if they are in the same block of $\Pi_s$.
Proof. By Invariant 3.6 it follows that ${\leftrightarrow_b} \subseteq {\equiv_{\Pi_s}}$. Hence, we only need to show that ${\equiv_{\Pi_s}} \subseteq {\leftrightarrow_b}$. We do this by showing that $\equiv_{\Pi_s}$ is a branching bisimulation relation.
Consider states $s, t \in S$ such that $s, t \in B$ for some block $B \in \Pi_s$. Assume $s \xrightarrow{a} s'$.
• If $a = \tau$ and $s' \in B$, i.e., the transition is inert, then $s' \equiv_{\Pi_s} t$, and we have fulfilled the proof obligation for branching bisimulation in this case.
• If the transition $s \xrightarrow{a} s'$ is not inert, it is part of some bunch $T \in \Pi_t$. By Invariant 3…

We provide the following invariant as a precondition for splitting blocks in Section 4. It says that whenever $\mathit{split}(B, T_{B\rightarrow})$ is called, the bottom states in block B with transitions in bunch T can be found by only looking at the marked transitions. Checking whether a state has a marked outgoing transition can be done in constant time, which is essential for the algorithm.

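Invariant 3.11 (Marked transitions). Whenever $\mathit{split}(B, T_{B\rightarrow})$ is called, every bottom state of B that has a transition in $T_{B\rightarrow}$ also has a marked transition in $T_{B\rightarrow}$.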
Proof. Whenever $\mathit{split}(B, T_{B\rightarrow})$ is called, the block-bunch-slice $T_{B\rightarrow}$ is in the splitter list. There are four cases in which a block-bunch-slice is inserted into the splitter list:
• $T_{B\rightarrow} = T_{B\xrightarrow{a}B'}$ is a primary splitter of B. Then all transitions in $T_{B\xrightarrow{a}B'}$ are marked at Line 1.11, so the invariant obviously holds.
• $T_{B\rightarrow}$ is a secondary splitter of B that has been added at Line 1.10. Note that for every block, we first split under its primary splitter $T_{B\xrightarrow{a}B'}$ and then apply what remains of the secondary splitter only to R (or only to R' in case $R \xrightarrow{\tau} U$ is not empty, but for brevity, we only mention R below). So the call is $\mathit{split}(R, T_{R\rightarrow} \setminus T_{R\xrightarrow{a}B'})$. As every bottom state of R has a transition in $T_{B\rightarrow}$, Line 1.12 ensures that every bottom state in R which already was a bottom state in B and which has a transition in $T_{R\rightarrow} \setminus T_{R\xrightarrow{a}B'}$ also has a marked transition in it. So the invariant holds.
• $T_{B\rightarrow} = T'_{N\rightarrow}$ is a splitter of N added at Line 1.27. In that case, Line 1.28 ensures that every bottom state with a transition in $T'_{N\rightarrow}$ has a marked transition in it. Hence, also in this last case, the invariant holds.
Additionally, it can happen that a splitter itself is split.Then, the two new subslices keep their markings.If new bottom states result from the split, they are handled in a similar way to the last case above: in all relevant slices, each new bottom state with an outgoing transition in that slice also has a marked outgoing transition in that slice.
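For testing an implementation on small examples, Invariants 3.2 and 3.11 can be checked directly by brute force. In the following Python sketch, partitions are lists of state sets, bunches and block-bunch-slices are sets of (source, label, target) triples, every state is assumed to occur in the partition, and all names are illustrative assumptions. It is a debugging aid, not part of the algorithm.

```python
TAU = "tau"  # assumed encoding of the internal action


def _inert_sources(partition, transitions):
    """States with a tau-transition that stays inside their own block."""
    where = {s: i for i, blk in enumerate(partition) for s in blk}
    return {s for s, a, t in transitions if a == TAU and where[s] == where[t]}


def is_stable(partition, bunches, transitions):
    """Invariant 3.2: if bunch T has a transition leaving block B, then every
    bottom state of B has a transition in T."""
    inert_src = _inert_sources(partition, transitions)
    for blk in partition:
        bottoms = set(blk) - inert_src
        for bunch in bunches:
            sources = {s for s, _, _ in bunch if s in blk}
            if sources and not bottoms <= sources:
                return False
    return True


def marking_ok(partition, block, slice_, marked, transitions):
    """Invariant 3.11: each bottom state of `block` with a transition in the
    block-bunch-slice `slice_` has a transition in `marked` (subset of it)."""
    bottoms = set(block) - _inert_sources(partition, transitions)
    with_slice = {s for s, _, _ in slice_ if s in bottoms}
    with_mark = {s for s, _, _ in marked}
    return with_slice <= with_mark


P = [{0, 1}, {2}, {3}]
trans = [(0, TAU, 1), (1, "a", 2), (0, "b", 3)]
print(is_stable(P, [{(1, "a", 2), (0, "b", 3)}], trans))    # True
print(is_stable(P, [{(1, "a", 2)}, {(0, "b", 3)}], trans))  # False
```

In the second call, state 1 is the only bottom state of block {0, 1} and has no transition in the bunch containing only the b-transition, so that bunch would have to split the block.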

Complexity
We show that the algorithm runs in time O(m log n), using the time budgets printed in grey at the right-hand side of the pseudocode, which indicate how much time each piece of code is allowed to spend. In Section 5 it is explained how the data structures meet these time budgets. The initialisation (Lines 1.1-1.6) can be performed in O(m) time, where the assumption n ≤ m + 1 is used for the calculation of the sets of states. The calculation of the time complexity of the while loop is split into three separate parts. The first part regards splitting bunches, putting block-bunch-slices on the splitter list and marking transitions, which is attributed to the transitions in a new small bunch. The second part deals with splitting blocks, which is attributed to the transitions of the smaller subblock. The third part handles the calculations that are required when states become new bottom states.
When splitting bunches, we apply the "Process the smaller half" principle to new bunches. Every transition is an element of a new bunch $T_{\xrightarrow{a}B'}$ at Line 1.10 at most $\log_2 n^2 + 1$ times, because the first time $T_{\xrightarrow{a}B'}$ is investigated, it has at most $n^2$ elements, and every subsequent time a new bunch containing the same transition is investigated, it has at most half the size of the previous bunch. Whenever a transition is an element of a new bunch $T_{\xrightarrow{a}B'}$, it is processed in constant time. This is indicated with the time budget $O(|T_{\xrightarrow{a}B'}|)$ for Lines 1.7-1.14. Also, processing block-bunch-slices of the form $T_{B\xrightarrow{a}B'}$ and $T_{B\rightarrow} \setminus T_{B\xrightarrow{a}B'}$ (as added to the splitter list at Line 1.10) requires time in $O(|\mathit{Marked}(T_{B\rightarrow})|)$ at Lines 1.15-1.22, which is at most proportional to the number of transitions in $T_{B\xrightarrow{a}B'}$. Summing over all transitions gives runtime $O(m \log n)$ attributed to new bunches.
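The halving argument needs a new bunch $T_{\xrightarrow{a}B'}$ of at most half the size of the bunch it is split from. One way to obtain it, assumed here, uses the array layout described later for the transitions, where the transitions of a bunch are stored contiguously per action-block-slice: the first and the last slice are disjoint, so the smaller of the two contains at most half the transitions. The Python sketch below uses hypothetical names and illustrates only the halving argument, not the constant-time slice manipulation of the real data structure.

```python
from itertools import groupby


def split_off_small_slice(bunch, block_of):
    """Split a bunch into a small action-block-slice and the rest.

    `bunch` is a list of (s, a, t) transitions stored contiguously per
    action-block-slice, i.e. per pair (a, block_of[t]).
    """
    groups = [(key, list(g)) for key, g in
              groupby(bunch, key=lambda tr: (tr[1], block_of[tr[2]]))]
    if len({key for key, _ in groups}) != len(groups):
        raise ValueError("bunch is not grouped per action-block-slice")
    if len(groups) < 2:
        raise ValueError("trivial bunch: nothing to split")
    slices = [g for _, g in groups]
    if len(slices[0]) <= len(slices[-1]):
        small, rest = slices[0], [tr for g in slices[1:] for tr in g]
    else:
        small, rest = slices[-1], [tr for g in slices[:-1] for tr in g]
    # The first and last slices are disjoint, so the smaller one is at
    # most half of the bunch ("process the smaller half").
    assert 2 * len(small) <= len(bunch)
    return small, rest


block_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}
bunch = [(0, "a", 2), (1, "a", 3), (0, "b", 2), (1, "a", 4)]
print(split_off_small_slice(bunch, block_of))
# ([(1, 'a', 4)], [(0, 'a', 2), (1, 'a', 3), (0, 'b', 2)])
```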
In the next section we explain in detail how long split can take. In short, its runtime depends on the smaller of the two resulting subblocks, so we apply "Process the smaller half" to that block. Every state can be part of such a smaller subblock at most $\log_2 n$ times, because the first time it is part of a subblock of at most n/2 states, and every subsequent time the same state becomes part of another smaller subblock, the latter has at most half the size of the previous subblock. Whenever a state is in the smaller subblock of split, we are allowed to attribute time proportional to the number of incoming and outgoing transitions of that state. More precisely, provided each source state of T is in B, the time for calculating split(B, T) is attributed to the states of U or R, whichever is smaller, together with their incoming and outgoing transitions. Summing over all states gives rise to a cumulative time complexity of $O(m \log n)$.
Finally, some work is attributed to new bottom states. Every non-bottom state can become a bottom state at most once during the whole execution. When this happens, we attribute time proportional to its outgoing transitions to it. This is indicated with several time budgets $O(|\mathit{Bottom}(N)_{\rightarrow}|)$, i.e., proportional to the outgoing transitions of the bottom states of N, at Lines 1.15-1.28. At Line 1.27 we need to include not only the current new bottom states but also the future ones, because there may be block-bunch-slices that only have transitions from non-bottom states. When N is split under such a block-bunch-slice, at least one of these non-bottom states becomes a bottom state, but we cannot yet say which one(s). Also, for block-bunch-slices of the form $T_{N\rightarrow}$ marked at Line 1.28, the time $O(|\mathit{Marked}(T_{B\rightarrow})|)$ spent at Lines 1.15-1.22 is attributed to the new bottom states in N. Summing over all states gives runtime O(m) attributed to new bottom states.
Adding up these three time budgets shows that the grand total of all work is O(m log n).
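The three budgets can be summed as follows. Here $\mathit{in}(s)$ and $\mathit{out}(s)$ are notation introduced only for this summary, denoting the numbers of incoming and outgoing transitions of state s; a constant amount of work per state is assumed in addition, and the last step uses $n \le m + 1$ from the initialisation.

```latex
\begin{aligned}
\text{new bunches:}\quad & \sum_{\text{transitions}} O(\log_2 n^2 + 1) = O(m \log n),\\
\text{smaller subblocks:}\quad & \sum_{s \in S} \log_2 n \cdot O\big(1 + \mathit{in}(s) + \mathit{out}(s)\big) = O\big((n + 2m)\log n\big),\\
\text{new bottom states:}\quad & \sum_{s \in S} O\big(1 + \mathit{out}(s)\big) = O(n + m),\\
\text{total:}\quad & O\big((n + m)\log n\big) = O(m \log n) \quad \text{since } n \le m + 1.
\end{aligned}
```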

Splitting blocks
The function split(B, T) refines the block B into two subblocks, R and U, where R contains those states in B that can inertly reach a transition in T, and U contains the states that cannot, as formally specified in Equation (1). These two sets are computed by two coroutines that alternate, so that the work of the two is balanced and all work done in both coroutines can be attributed to the smaller of the two blocks R and U. Algorithm 2 presents those coroutines.
As a precondition, guaranteed by Invariant 3.11, the function requires that the marking of transitions in $T_{B\rightarrow}$ is such that bottom states of B that have an outgoing transition in $T_{B\rightarrow}$ also have a marked outgoing transition in $T_{B\rightarrow}$. Formally, for every $s \in \mathit{Bottom}(B)$: if s has an outgoing transition in $T_{B\rightarrow}$, then s has an outgoing transition in $\mathit{Marked}(T_{B\rightarrow})$.

The initial sets are computed as follows. Initially, all states that are the source of a marked transition in T are put in R. All bottom states that are not initially in R are put in U. Observe that these initial sets can be computed in $O(|\mathit{Marked}(T)|)$ time by grouping bottom states with marked transitions.
The sets are extended as follows in the coroutines. For R, first all states that are the source of an unmarked transition in T and that are not yet in R are added to R. Using backward reachability along inert transitions, R is then extended until either R is stable (no states can be added) or R contains more than half the states in B.
To identify the states in U, observe that a state t is in U if either it is a bottom state that does not have a transition in T, or it has no transition in T and all its inert successors are in U, so that it is forced to inertly reach such a bottom state. To compute U, we keep a counter untested[t] for every state t, which records the number of outgoing inert transitions to states that are not yet known to be in U. If untested[t] = 0, all inert successors of t are guaranteed to be in U, so, provided t does not have a transition in $T_{B\rightarrow}$, t is also added to U. To take care of the possibility that all inert transitions of t have been visited before all states that are the source of a transition in $T_{B\rightarrow}$ have been added to R, we need to check all non-inert transitions of t at Lines 2.14-2.18, i.e., in the left column of Algorithm 2, to determine whether none of them is in $T_{B\rightarrow}$. Note that checking all successors of such a state is balanced with marking the states in R. In the next section, we explain how to initialise the array untested in constant time.
The coroutine that finishes first, provided that its number of states does not exceed $\frac{1}{2}|B|$, has completely computed the smaller subblock resulting from the refinement, and the other coroutine can be aborted. As soon as the number of states of a coroutine is known to exceed $\frac{1}{2}|B|$, it is aborted, and the other coroutine continues to identify the smaller subblock.
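The interplay of the two coroutines can be sketched in Python, using generators as coroutines that are resumed in lockstep. The function and parameter names, the "tau" label and the use of plain Python sets instead of the refinable-partition structures of the next section are assumptions. The sketch builds its predecessor lists from all transitions up front, so it does not meet the stated time bound; it only illustrates the control flow: marked sources seed R, the untested counters drive U, the slow test runs until R has seen all sources of T, and a coroutine is aborted once it exceeds $\frac{1}{2}|B|$ states.

```python
TAU = "tau"  # assumed encoding of the internal action


def split(block, transitions, splitter, marked):
    """Return (R, U) for the set `block`: R = states that can inertly reach
    a transition in `splitter` (set of transitions leaving `block`),
    U = the rest. `marked` must satisfy the precondition of Invariant 3.11;
    the block is assumed to contain no inert loops."""
    inert_pred = {s: [] for s in block}
    n_inert_succ = {s: 0 for s in block}
    out = {s: [] for s in block}
    for s, a, t in transitions:
        if s in block:
            out[s].append((s, a, t))
            if a == TAU and t in block:
                inert_pred[t].append(s)
                n_inert_succ[s] += 1
    half = len(block) / 2
    R = {s for s, _, _ in marked}
    U = {s for s in block if n_inert_succ[s] == 0 and s not in R}
    sources_done = [False]  # True once every source of T is known to be in R

    def r_coroutine():
        if len(R) > half:
            return False
        queue = list(R)
        for s, _, _ in splitter - marked:   # sources of unmarked transitions
            if s not in R:
                R.add(s)
                queue.append(s)
                if len(R) > half:
                    return False
            yield
        sources_done[0] = True
        while queue:                        # backward along inert transitions
            for s in inert_pred[queue.pop()]:
                if s not in R:
                    R.add(s)
                    queue.append(s)
                    if len(R) > half:
                        return False
                yield
        return True

    def u_coroutine():
        if len(U) > half:
            return False
        untested, queue = {}, list(U)
        while queue:
            for s in inert_pred[queue.pop()]:
                yield
                if s in R or s in U:
                    continue
                untested[s] = untested.get(s, n_inert_succ[s]) - 1
                if untested[s] > 0:
                    continue
                if not sources_done[0]:     # slow test: scan outgoing steps
                    has_t = False
                    for tr in out[s]:
                        yield
                        if tr in splitter:
                            has_t = True
                            break
                    if has_t:
                        continue
                U.add(s)
                queue.append(s)
                if len(U) > half:
                    return False
        return True

    running = {"R": r_coroutine(), "U": u_coroutine()}
    while running:
        for name in list(running):          # lockstep: one step each
            try:
                next(running[name])
            except StopIteration as stop:
                del running[name]
                if stop.value:              # finished with <= |B|/2 states
                    if name == "R":
                        return set(R), block - R
                    return block - U, set(U)
    raise AssertionError("unreachable: one coroutine always finishes")
```

For example, with block {0, 1, 2, 3}, inert transitions 0→1, 1→2, 0→3 and the splitter consisting of the marked a-transition of state 2, the R-coroutine is aborted after exceeding two states, and the U-coroutine completes with U = {3}.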
The runtime complexity of R, U := split(B, T) can be inferred as follows. First, note that we execute the coroutines in lockstep. This means that running them incurs an overhead that is at most proportional to the faster of the two. As soon as one subblock becomes too large, its coroutine is aborted to reduce the overhead. Therefore, it is enough to show that the runtime bound for the smaller subblock is satisfied. Note that if both subblocks have the same size, $|R| = |U| = \frac{1}{2}|B|$ …

… we separate bottom from non-bottom states, and states known to be destined for R from other states. Note that initially, R contains exactly the states with marked outgoing transitions. This is illustrated in the first two lines of Figure 4. This makes it possible to visit all states in a block B in time O(|B|), to find its bottom states that are not in R in constant time (cf. Line 2.3 of Algorithm 2), and to visit these bottom states in time $O(|\mathit{Bottom}(B) \setminus R|)$.

Example 5.1. Figure 3a shows an LTS and its current partition. The corresponding refinable partition data structure is shown in Figure 3b. The history of the splitting is as follows. We start with a block consisting of all states. From this, the states that cannot inertly reach a visible transition are split off as $B_1$. Subsequently, $B_2$, $B_3$ and $B_4$ are split off from the remaining states, resulting in the partition shown in the figure. Note that Figure 3b also illustrates that in blocks $B_0$ and $B_4$ the non-bottom states are grouped together and appear after the bottom states.
When a block is split, we need to update the data structure.This can be done in time proportional to the smaller subblock at Lines 1.17 and 1.25.Figure 4 illustrates how U and R are located in the slice of B. In the third line in the figure, each state with untested [t] = 0 is stored in a specific slice of non-bottom states.To initialise untested [t] to "undefined" at Line 2.5 of Algorithm 2, it suffices to set this slice to the empty slice.After split has finished, the bottom states of R and the non-bottom states of U exchange places; note that this can be done in time proportional to the smaller of the two subblocks.In the example, a part of the non-bottom states of U can even stay where they are.At the end, new bottom states are searched and added to the bottom states of R.
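The refinable partition idea [20] behind Figures 3 and 4 can be sketched as follows. In this simplified Python version (class and method names are hypothetical), each block occupies a contiguous slice of one array, marked elements are swapped to the front of their block's slice, and splitting off the marked part costs time proportional to the number of marked elements. The data structure described here additionally separates bottom from non-bottom states within each block, which the sketch omits.

```python
class RefinablePartition:
    """Blocks are contiguous slices [begin, end) of one array of elements;
    the marked elements of a block occupy [begin, marked_end)."""

    def __init__(self, n):
        self.elems = list(range(n))   # elements grouped per block
        self.pos = list(range(n))     # position of each element in elems
        self.block = [0] * n          # block id of each element
        self.begin, self.end = [0], [n]
        self.marked_end = [0]

    def members(self, b):
        return self.elems[self.begin[b]:self.end[b]]

    def mark(self, e):
        """Move e into the marked front part of its block in O(1)."""
        b = self.block[e]
        i, j = self.pos[e], self.marked_end[b]
        if i < j:                     # already marked
            return
        f = self.elems[j]             # first unmarked element of the block
        self.elems[i], self.elems[j] = f, e
        self.pos[f], self.pos[e] = i, j
        self.marked_end[b] = j + 1

    def split_marked(self, b):
        """Make the marked elements of block b a new block in time
        O(#marked); return its id, or None if no proper split arises."""
        m = self.marked_end[b]
        if m == self.begin[b]:
            return None
        if m == self.end[b]:          # everything marked: nothing to split
            self.marked_end[b] = self.begin[b]
            return None
        new = len(self.begin)
        self.begin.append(self.begin[b])
        self.end.append(m)
        self.marked_end.append(self.begin[b])
        for k in range(self.begin[b], m):
            self.block[self.elems[k]] = new
        self.begin[b] = self.marked_end[b] = m
        return new


p = RefinablePartition(5)
p.mark(3)
p.mark(1)
new = p.split_marked(0)
print(new, sorted(p.members(new)), sorted(p.members(0)))  # 1 [1, 3] [0, 2, 4]
```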
Transitions are stored in four linked refinable partitions [20].While not all four are essential for the concepts of the algorithm, we need them to ensure the time complexity bounds.In Figure 5 we illustrate each of the refinable partitions for the transitions corresponding to the example in Figure 3.
• Transitions are stored in an array grouped per bunch, i.e., transitions in the same bunch are adjacent to each other, see Figure 5a.Then, each bunch can be described as a slice in the array.Within a bunch, transitions are grouped further per action-block-slice.As a consequence, when a small action-block-slice needs to be split off a bunch, one can easily select either the first or the last action-block-slice in the bunch and split it off in constant time.
When a block is split, we need to split its action-block-slices, which can be done in time proportional to the incoming transitions of the smaller subblock at Lines 1.17 and 1.25.This operation fits into the time budget.
• Transitions are stored in an array grouped per block-bunch-slice, see Figure 5b. Within each slice, the marked transitions are separated from the unmarked ones when the block-bunch-slice is a splitter.
When a bunch is split, we need to split its block-bunch-slices, which can be done in time proportional to the smaller new bunch at Line 1.8.When a block is split, we need to split its block-bunch-slices, which can be done in time proportional to the outgoing transitions of the smaller subblock at Lines 1.17 and 1.25.Both operations fit into the allowed time budget.
We need this partition to mark transitions quickly, namely in constant time per transition, to visit all marked transitions at Line 2.2, and to visit all other transitions at Line 2.5r.
• Transitions are stored grouped per source state, see Figure 5c.Within each slice, transitions are further grouped into non-inert and inert transitions, and the non-inert transitions are grouped per bunch.
When a bunch is split, we need to regroup the transitions in that bunch as well.This is done in time proportional to the smaller new bunch at Line 1.8.
We need this partition to visit all outgoing transitions of a state at Line 2.14. We also use this partition to decide whether a state with a transition in $T_{\xrightarrow{a}B'}$ also has a transition in $T_{B\rightarrow} \setminus T_{\xrightarrow{a}B'}$: while regrouping the transitions in $T_{B\rightarrow}$ leaving from the same source state, we can recognise whether all transitions move to $T_{\xrightarrow{a}B'}$ or some remain in $T_{B\rightarrow} \setminus T_{\xrightarrow{a}B'}$.
• Transitions are stored grouped per target state, see Figure 5d.Within each slice, transitions are further grouped into non-inert and inert transitions.This partition hardly ever changes.We need this partition to visit all incoming (inert) transitions of a state at Line 2.7.
When a transition becomes non-inert, we have to change all four partitions: create a new bunch, create a new block-bunch-slice, and move the transition from the inert to the non-inert ones in the last two partitions. We do this by running over all outgoing (formerly) inert transitions of R or all incoming (formerly) inert transitions of U, depending on which subblock is smaller. This requires either time $O(|R_{\rightarrow}|)$ or $O(|U_{\leftarrow}|)$, respectively, which fits into the time budget at Lines 1.17 and 1.25.
In our implementation, the four transition partitions are linked together via pointers. When the source and target state and (pointers to) the relevant slices mentioned above are stored only once, we need nine pointers or size_t integers per transition. Block-bunch-slices are also stored in lists. We store, per block, a list of its stable block-bunch-slices, and additionally one global list containing all unstable block-bunch-slices, called the splitter list. When a block is split, its list of stable block-bunch-slices needs to be distributed over the two blocks. This does not add to the time complexity of splitting the block-bunch-slices themselves. New stable block-bunch-slices are inserted into the list of the new block. New unstable block-bunch-slices are inserted into the splitter list.
We obviously need the unstable block-bunch-slices at Line 1.14, and we need the stable block-bunch-slices of block N at Line 1.27. We store a stability flag with each block-bunch-slice to decide whether a split-off new block-bunch-slice should go into the stable or the unstable list. When executing Line 1.27, we now need to clear the stability flag of every stable block-bunch-slice of N. As every block-bunch-slice of N either already contains a transition from a new bottom state, or will contain a transition from a new bottom state after it has been used as a splitter, we assign the runtime needed to clear this flag to the present and future new bottom states of N.
Care needs to be taken that $T_{U\rightarrow} \setminus T_{U\xrightarrow{a}B'}$ can be found at Line 1.19. We ensure this as follows. At Line 1.10, the primary splitter $T_{B\xrightarrow{a}B'}$ and the secondary splitter $T_{B\rightarrow} \setminus T_{B\xrightarrow{a}B'}$ are added to the splitter list in this order. After $T_{B\xrightarrow{a}B'}$ has been removed from the list (Line 1.16), $T_{B\rightarrow} \setminus T_{B\xrightarrow{a}B'}$ is the first element of the remaining list. At Line 1.17, this is split into $T_{R\rightarrow} \setminus T_{R\xrightarrow{a}B'}$ and $T_{U\rightarrow} \setminus T_{U\xrightarrow{a}B'}$, in an order that depends on which subblock is the smaller. So either the first or the second element of the splitter list is the required slice at Line 1.19.

Several small optimisations
We mention a few additional optimisations that our implementation uses, which are not essential for the complexity, but speed up the implementation.
In cases when we mark all transitions in a block-bunch-slice (Lines 1.11 and 1.22), we instead add their source states to R immediately.
In Line 1.27, we actually know that N is stable under $R \xrightarrow{\tau} U$ because that was the splitter applied last, so we do not make the block-bunch-slice of N in this bunch unstable. … and stability is preserved if no more new bottom states are found at Line 1.23, so we do not make $T_{N\rightarrow}$ unstable.

Benchmarks
The new algorithm (JGKW19) has been implemented in the mCRL2 toolset [5], and is available in its 201908.0 release. This toolset also contains implementations of various other algorithms, such as the algorithm by Groote and Vaandrager (GV) [10] and the GJKW algorithm of [9], which we refer to as GJKW17. In addition, it offers an implementation of the partition-refinement algorithm using state signatures by Blom and Orzan (BO) [3]. For each state, a signature is maintained describing which blocks the state can reach directly via its outgoing transitions. Although its time complexity is $O(mn^2)$, it is known to outperform GV in some cases. In this section, we report on the experiments we have conducted to compare GV, BO, GJKW17 and JGKW19 when applied to practical examples. All experiments involve the branching bisimulation minimisation of a given LTS, which GJKW17 first transforms into a Kripke structure. Note that for an LTS of n states and m transitions, this transformation results in a Kripke structure consisting of n + m states and 2m transitions in the worst case.
The set of benchmarks consists of all LTSs offered by the VLTS benchmark set² with at least 60,000 transitions, plus three cases that have been derived from models distributed with the mCRL2 toolset. These models are:
1. lift6-final: this model is based on an elevator model, extended to six elevators;
2. dining 14: this is the dining philosophers model with 14 philosophers;
3. 1394-fin3: this model is an altered version of the 1394-fin model, extended to three processes and two data elements.
Table 1 presents the structural characteristics for each benchmark: the number of states (n), the number of transitions (m), the number of τ-transitions ($m_\tau$), the number of actions (|Act|), and the number of states and transitions after branching bisimulation reduction (min. n and min. m, respectively).
All experiments have been conducted on individual nodes of the DAS-5 cluster [1]. Each of these nodes was running CentOS Linux 7.4, had an Intel Xeon E5-2698-v3 2.3 GHz CPU, and was equipped with 256 GB RAM. The experiments were performed using development version 201808.0.c59cfd413f of mCRL2³. Table 2 presents the obtained results. On each benchmark, we have applied each of the four algorithms ten times, and report the mean runtime (in seconds or minutes) and memory use (in MB or GB) of those ten runs. In the table, only the significant digits are listed; these are identified by first estimating the standard deviation from the ten results. Given results $x_0, \ldots, x_9$, the standard deviation is estimated as
$\mathit{std} = \sqrt{\bigl((\sum_{0\le i\le 9} x_i^2) - (\sum_{0\le i\le 9} x_i)^2/10\bigr)/8.5}$ [4].
For all presented data, the estimated standard deviation is less than 20% of the mean. Cases in which this is not true are indicated by '-' in Table 2.
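The estimate and the 20% reporting criterion can be written out directly. A minimal Python sketch; the function names are hypothetical, and the divisor 8.5 (rather than the textbook 9) is taken from the formula above.

```python
import math


def estimated_std(xs):
    """Estimate the standard deviation of ten runtime or memory measurements.

    Implements std = sqrt(((sum x_i^2) - (sum x_i)^2 / 10) / 8.5)."""
    if len(xs) != 10:
        raise ValueError("the formula is stated for exactly ten results")
    sum_sq = sum(x * x for x in xs)
    total = sum(xs)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, sum_sq - total * total / 10) / 8.5)


def is_reliable(xs):
    """True if the estimated std is below 20% of the mean; otherwise Table 2 shows '-'."""
    mean = sum(xs) / len(xs)
    return estimated_std(xs) < 0.2 * mean
```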
Concerning the significant digits, a decimal dot indicates that the unit digit is significant. If this dot is missing, there is one insignificant zero. The estimated standard deviation is used to identify the significant digits. For example, '3.6 s' has a standard deviation in [0.01, 0.1) s, '540. MB' has a standard deviation in [0.1, 1) MB, and '100 s' has a standard deviation in [1, 10) s.
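The digit convention can be captured in a few lines: in each of the three examples above, the last printed digit sits one decade above the leading decade of the standard deviation. A minimal Python sketch with a hypothetical function name. It covers only those three cases and deliberately rejects larger deviations rather than guessing how they are printed.

```python
import math


def format_significant(mean, std):
    """Render `mean` with only its significant digits, following the Table 2 convention.

    std in [0.01, 0.1) -> '3.6', std in [0.1, 1) -> '540.', std in [1, 10) -> '100'."""
    if std <= 0:
        raise ValueError("std must be positive")
    # Exponent of the last significant digit: one decade above std's leading digit.
    last_place = math.floor(math.log10(std)) + 1
    if last_place < 0:
        return f"{mean:.{-last_place}f}"      # e.g. '3.6'
    if last_place == 0:
        return f"{round(mean)}."              # trailing dot: unit digit is significant
    if last_place == 1:
        return str(round(mean / 10) * 10)     # no dot: one insignificant trailing zero
    raise ValueError("case not covered by this sketch")
```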
One marker symbol after a table entry indicates that the measurement is significantly better than the corresponding measurements for the other three algorithms, and another indicates that it is significantly worse. Here, the results are considered significant if, given a hundred tables such as Table 2, one table of running times (resp. memory use) is expected to contain spuriously significant results. Concerning the runtimes, GV and BO clearly perform significantly worse than the other two algorithms, and JGKW19 in many cases performs significantly better than the others. In particular, although GJKW17 has the same time complexity, JGKW19 often still outperforms it. Concerning memory use, in the majority of cases GJKW17 uses more memory than the others, while sometimes BO is the most memory-hungry. JGKW19 is much more competitive, in many cases even outperforming every other algorithm.
Overall, the results demonstrate that when applied to practical cases, JGKW19 is generally the most efficient algorithm time-wise, and when other algorithms have similar runtimes, it is almost always the most efficient memory-wise.This combination makes JGKW19, the algorithm presented in this paper, currently the best option for branching bisimulation minimisation of LTSs.
However, if we consider the analysis of the complexity of the algorithm for divergence-blind stuttering bisimilarity a bit more closely, we can observe that the actual complexity is O(m log b), where b is the number of states in the largest block in the initial partition.
For the Kripke structure that is the result of translating the LTS, let $S_a = \{s \in S_A \mid L(s) = \{a\}\}$ and observe that $|S_a| \le |S|$ for all $a \in Act$. The initial partition in the divergence-blind stuttering bisimilarity algorithm is such that s and t are in the same block iff $L(s) = L(t)$. Therefore, the largest block in the initial partition of a translated Kripke structure has at most $|S| = n$ states. Hence, $b \le n$. Since the translated Kripke structure has at most 2m transitions, the complexity of deciding branching bisimilarity via reduction to Kripke structures and using divergence-blind stuttering bisimilarity is thus $O(2m \log n) = O(m \log n)$.
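The bound on b can be checked on a small translation. The sketch below is an illustration under stated assumptions, not the paper's exact construction. Each visible transition $s\xrightarrow{a}t$ is routed through one shared intermediate state (a, t) labelled {a}, τ-transitions stay direct edges, and original states carry the empty label. This respects the n + m states and 2m transitions bounds quoted for the benchmarks, and every $S_a$ has at most n members. All names are hypothetical.

```python
from collections import Counter

TAU = "tau"


def lts_to_kripke(states, transitions):
    """Hypothetical LTS-to-Kripke translation, used here only for counting.

    transitions: list of (s, a, t). Returns (label, edges), where label maps each
    Kripke state to a frozenset of actions and edges is a set of state pairs."""
    label = {s: frozenset() for s in states}   # assumption: original states get label {}
    edges = set()
    for (s, a, t) in transitions:
        if a == TAU:
            edges.add((s, t))                   # assumption: tau-transitions stay direct
        else:
            mid = (a, t)                        # one shared intermediate state per (a, t)
            label[mid] = frozenset({a})
            edges.add((s, mid))
            edges.add((mid, t))
    return label, edges


def largest_initial_block(label):
    """Size b of the largest block when states are grouped by their label L(s)."""
    return max(Counter(label.values()).values())
```

For instance, with states 0..3 and the transitions $0\xrightarrow{a}1$, $1\xrightarrow{\tau}2$, $2\xrightarrow{a}1$, $2\xrightarrow{b}3$, $3\xrightarrow{a}1$, the four original states form the largest initial block, so b = n = 4.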
Figure 1: One step in the branching bisimulation algorithm. The panels show, in order: part of a labelled transition system with three states, s0, s1, s2, divided into three blocks $B$, $B'$ and $B''$, and a single non-trivial bunch $T$; the situation after moving a small action-block-slice $T_{\xrightarrow{a}B'}$ to its own bunch $T'$; the situation after splitting $B$ with respect to $T$; and the situation after splitting $R$ with respect to $T$.

1.10: Add first $T_{B\xrightarrow{a}B'}$ and then $T_{B\to}\setminus T_{B\xrightarrow{a}B'}$ to the splitter list. Label $T_{B\xrightarrow{a}B'}$ primary and $T_{B\to}\setminus T_{B\xrightarrow{a}B'}$ secondary.
1.11: Mark all transitions in $T_{B\xrightarrow{a}B'}$.
1.12: For every state with both marked outgoing transitions and an outgoing transition in $T_{B\to}\setminus T_{B\xrightarrow{a}B'}$, mark one such transition.
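Lines 1.11 and 1.12 can be sketched on explicit transition lists. This minimal Python version is only an illustration: the encoding of transitions as `(source, action, target)` triples and the function name are hypothetical, and which secondary transition is picked for a state is arbitrary (here the first one encountered).

```python
def mark_for_split(primary_slice, secondary_slice):
    """Sketch of Lines 1.11-1.12 for a block B being split.

    primary_slice stands for T_{B-a->B'}, secondary_slice for T_{B->} minus
    T_{B-a->B'}; both are lists of (source, action, target) triples."""
    # Line 1.11: mark all transitions of the primary splitter.
    marked = set(primary_slice)
    has_marked = {src for (src, _, _) in primary_slice}
    # Line 1.12: each state with a marked outgoing transition that also has an
    # outgoing transition in the secondary splitter gets exactly one of those marked.
    chosen = {}
    for tr in secondary_slice:
        src = tr[0]
        if src in has_marked and src not in chosen:
            chosen[src] = tr
    marked.update(chosen.values())
    return marked
```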

Figure 2: Illustration of splitting off a small block from $T$ and stabilising block $B$ with respect to the new bunches $T_{\xrightarrow{a}B'}$ and $T\setminus T_{\xrightarrow{a}B'}$, as explained in Example 3.4.

Invariant 3.5 (Bunches are not unnecessarily split). For any pair of non-inert transitions $s\xrightarrow{a}s'$ and $t\xrightarrow{a}t'$, if $s, t \in B$ and $s', t' \in B'$, then $s\xrightarrow{a}s' \in T$ and $t\xrightarrow{a}t' \in T$ for some bunch $T \in \Pi_t$.
empty, because it contains all transitions in $T_{B\to}$ reachable from states in $U$, from which, by construction, transitions in $T_{B\to}$ cannot be reached. As $T_{U\to}$ is empty, the invariant holds trivially.
- Suppose the slice under consideration is $T_{U\to}\setminus T_{U\xrightarrow{a}B'}$ and the primary splitter is $T_{B\xrightarrow{a}B'}$. We know that $B$ is stable w.r.t. $T_{B\to}$ by Invariant 3.2. That means that every bottom state of $B$ has a transition in $T_{B\to}$. Bottom states in $U$ have no transitions in $T_{\xrightarrow{a}B'}$. Hence, they all have transitions in $T\setminus T_{\xrightarrow{a}B'}$. If we restrict this set of transitions to those starting in $U$, we see that all bottom states in $U$ also have a transition in $T_{U\to}\setminus T_{U\xrightarrow{a}B'}$, and the invariant holds.
7 and Lemma 3.8 there is a path of inert transitions $t\xrightarrow{\tau}\cdots\xrightarrow{\tau}t''$ with $t'' \in Bottom(B)$, and by Invariant 3.2 there is a transition $t''\xrightarrow{b}t''' \in T$. As the algorithm terminated, the condition at Line 1.6 is false, which means that $\#aB(T) = 1$. In other words, $T$ is equal to some action-block-slice. This must be $T_{\xrightarrow{a}B'}$, as $s\xrightarrow{a}s' \in T$ (where $s' \in B'$). Hence, $b = a$ and $t''' \in B'$, and the proof obligation for branching bisimulation has been fulfilled. Concluding, $\equiv_{\Pi_s}$ is a branching bisimulation, and therefore ${\equiv_{\Pi_s}} = {\leftrightarrow_b}$.

• $T_{B\to} = R\xrightarrow{\tau}U$ at Line 1.23. All transitions of $R\xrightarrow{\tau}U$ are marked at Line 1.22, so the invariant holds.
An example LTS and partition. Unlabelled transitions are assumed to be τ-transitions.

Figure 3: Snapshot of an LTS with its partitions, and the corresponding refinable partition data structure.

Figure 5: Four refinable partition instances for the transitions of the example in Figure 3.
The number of transitions is necessarily finite and denoted by m. We also write $t\xrightarrow{a}t'$ for $(t, a, t') \in {\to}$, and $t\xrightarrow{a}t' \in T$ instead of $(t, a, t') \in T$ for $T \subseteq {\to}$. Using a slight abuse of notation, we write $T\xrightarrow{a}T'$ for the set $\{t\xrightarrow{a}t' \mid t \in T \text{ and } t' \in T'\}$. We refer to all actions except τ as the visible actions. The transitions labelled with τ are the invisible or hidden transitions. If $t\xrightarrow{a}t'$, we say that from t, the state t', the action a, and the transition $t\xrightarrow{a}t'$ are reachable.
Definition 2.2 (Branching bisimilarity). Let A = (S, Act, →
Internal structure of a block during and shortly after a split. In this example, the U-subblock is smaller, so it will become the new block.
The forest of transitions per block-bunch-slice.

Table 1: Structural characteristics of the benchmark LTSs.

Table 2: Running time and memory use results for GV, BO, GJKW17 and JGKW19. Marked entries are significantly better (resp. worse) than all three other algorithms.