Tight Lower and Upper Bounds for the Complexity of Canonical Colour Refinement

An assignment of colours to the vertices of a graph is stable if any two vertices of the same colour have identically coloured neighbourhoods. The goal of colour refinement is to find a stable colouring that uses a minimum number of colours. This is a widely used subroutine for graph isomorphism testing algorithms, since any automorphism needs to be colour preserving. We give an $O((m+n)\log n)$ algorithm for finding a canonical version of such a stable colouring, on graphs with $n$ vertices and $m$ edges. We show that no faster algorithm is possible, under some modest assumptions about the type of algorithm, which captures all known colour refinement algorithms.


Introduction
Colour refinement (also known as naive vertex classification) is a very simple, yet extremely useful algorithmic routine for graph isomorphism testing. It classifies the vertices by iteratively refining a colouring of the vertices as follows. Initially, all vertices have the same colour. Then in each step of the iteration, two vertices that currently have the same colour get different colours if for some colour c they have a different number of neighbours of colour c. The process stops if no further refinement is achieved, resulting in a stable colouring of the graph. To use colour refinement as an isomorphism test, we can run it on the disjoint union of two graphs. Any isomorphism needs to map vertices to vertices of the same colour. So, if the stable colouring differs on the two graphs, that is, if for some colour c, the graphs have a different number of vertices of colour c, then we know they are nonisomorphic, and we say that colour refinement distinguishes the two graphs. Babai, Erdős, and Selkow [2] showed that colour refinement distinguishes almost all graphs (in the G(n, 1/2) model). In fact, they proved the stronger statement that the stable colouring is discrete on almost all graphs, that is, every vertex gets its own colour. On the other hand, colour refinement fails to distinguish any two regular graphs with the same number of vertices and the same degree, such as a 6-cycle and the disjoint union of two triangles.
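As an illustration, the following is a minimal Python sketch of the naive procedure just described, used as an isomorphism test on the disjoint union of two graphs. It assumes undirected graphs given as adjacency lists over vertices 0, …, n−1, and the function names are hypothetical. Each round recolours every vertex by its old colour together with the sorted list of its neighbours' colours, so it is much slower than the O((n+m) log n) algorithms discussed below.

```python
from collections import Counter


def colour_refinement(adj):
    """Naive colour refinement on an undirected graph (adjacency lists).

    Starts from the unit colouring; each round gives every vertex the pair
    (old colour, sorted neighbour colours) and renames these pairs to
    0, 1, 2, ... in sorted order. Stops when the number of colours is stable.
    """
    n = len(adj)
    colour = [0] * n
    while True:
        sigs = [(colour[v], tuple(sorted(colour[w] for w in adj[v])))
                for v in range(n)]
        index = {sig: i for i, sig in enumerate(sorted(set(sigs)))}
        new = [index[sig] for sig in sigs]
        # the new partition refines the old one, so equal counts mean equal partitions
        if len(set(new)) == len(set(colour)):
            return new
        colour = new


def disjoint_union(adj1, adj2):
    offset = len(adj1)
    return [list(nb) for nb in adj1] + [[w + offset for w in nb] for nb in adj2]


def distinguishes(adj1, adj2):
    """True if colour refinement shows that the two graphs are non-isomorphic."""
    col = colour_refinement(disjoint_union(adj1, adj2))
    n1 = len(adj1)
    return Counter(col[:n1]) != Counter(col[n1:])
```

For example, `distinguishes` returns False for the 6-cycle and two disjoint triangles, as stated above, but True for a path and a star on four vertices.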
Colour refinement is not only useful as a simple isomorphism test in itself, but also as a subroutine for more sophisticated algorithms, both in theory and practice. For example, Babai and Luks's [1,3] $2^{O(\sqrt{n \log n})}$-algorithm (this is still the best known worst-case running time for isomorphism testing) uses colour refinement as a subroutine, and most practical graph isomorphism tools (for example, [19,10,17,24]), starting with McKay's "Nauty" [19,20], are based on the individualisation refinement paradigm (see also [21]). The basic idea of these algorithms is to recursively compute a canonical labelling of a given graph, which may already have an initial colouring of its vertices, as follows. We run colour refinement starting from the initial colouring until a stable colouring is reached. If the stable colouring is discrete, then this already gives us a canonical labelling (provided the colours assigned by colour refinement are canonical, see below).
If not, we pick some colour c with more than one vertex. Then for each vertex v of colour c, we modify the stable colouring by assigning a fresh colour to v (that is, we "individualise" v) and recursively call the algorithm on the resulting vertex-coloured graph. Then for each v we get a canonically labelled version of our graph, and we return the lexicographically smallest among these. (More precisely, each canonical labelling of a graph yields a canonical string encoding, and we compare these strings lexicographically.) To turn this simple procedure into a practically useful algorithm, various heuristics are applied to prune the search tree. They exploit automorphisms of the graph found during the search. However, crucial for any implementation of such an algorithm is a very efficient colour refinement procedure, because colour refinement is called at every node of the search tree.
Colour refinement can be implemented to run in time O((n+m) log n), where n is the number of vertices and m the number of edges of the input graph. To our knowledge, this was first proved by Cardon and Crochemore [8]. Later Paige and Tarjan [22, p. 982] sketched a simpler algorithm. Both algorithms are based on the partitioning techniques introduced by Hopcroft [14] for minimising finite automata. However, an issue that is completely neglected in the literature is that, at least for individualisation refinement, we need a version of colour refinement that produces a canonical colouring. That is, if f is an isomorphism from a graph G to a graph H, then for all vertices v of G, v and f(v) should get the same colour in the respective stable colourings of G and H. However, neither of the algorithms analysed in the literature seems to produce canonical colourings. We present an implementation of colour refinement that computes a canonical stable colouring in time O((n + m) log n). Ignoring the canonical part, our algorithmic techniques are similar to known results: like [22] and various other papers, we use Hopcroft's strategy of 'ignoring the largest new cell' after splitting a cell [14]. Our data structures have some similarities to those described by Junttila and Kaski [17]. Nevertheless, since [17] contains no complexity analysis, and [22] omits various (nontrivial) implementation details, it seems that the current paper gives the first detailed description of an O((m+n) log n) algorithm that uses this strategy. On a high level, our algorithm is also quite similar to McKay's canonical colour refinement algorithm [19, Alg. 2.5], but with a few key differences which enable an O((n + m) log n) implementation. McKay [19] gave an $O(n^2 \log n)$ implementation using adjacency matrices, which was the fastest previously known algorithm for this problem. Our algorithm is described and analysed in Section 3.
In Section 3.4, we discuss extensions: we show how the algorithm can be applied to directed, undirected and edge-coloured graphs, and how the complexity bound in fact applies to an entire branch of an individualisation refinement algorithm. Now the question arises whether colour refinement can be implemented in linear time. After various attempts, we started to believe that it cannot. Of course, with currently known techniques one cannot expect to disprove the existence of a linear time algorithm for the standard (RAM) computation model, or for similar general computation models. Instead, we prove a tight lower bound for a restricted, but very broad class of algorithms. In this sense, our result is comparable to the lower bound for comparison-based sorting algorithms. In fact, our class of partition-refinement based algorithms captures all known colour refinement algorithms, and indeed every reasonable algorithmic strategy we could think of. We use the following assumptions. (See Sections 2 and 4 for precise definitions.) Colour refinement algorithms start with a unit partition (which has one cell V(G)), and iteratively refine this until a stable colouring is obtained. This is done using refining operations: choose a union of current partition cells as refining set R, and choose another (possibly overlapping) union of partition cells S. Cells in S are split up if their neighbourhoods in R provide a reason for this. (That is, two vertices in a cell in S remain in the same cell only if they have the same number of neighbours in every cell in R.)
This operation requires considering all edges between R and S, so the number of such edges is a very reasonable and modest lower bound for the complexity of such a refining step; we call this the cost of the operation.We note that a naive algorithm might choose R = S = V (G) in every iteration.This then requires time Ω(mn) on graphs that require a linear number of refining operations, such as paths.Therefore, all fast algorithms are based on choosing R and S smartly (and on implementing refining steps efficiently).
For our main lower bound result, we construct a class of instances such that any possible sequence of refining operations that yields the stable partition has total cost at least Ω((m + n) log n). It is surprising that a tight lower bound can be obtained in this model. Indeed, cost upper bounds in this model would not necessarily yield corresponding algorithms, since firstly we allow the sets R and S to be chosen nondeterministically, and secondly, it is not even clear how to refine S using R in time proportional to the number of edges between these classes. However, as we prove a lower bound, this only makes our result stronger. An alternative formulation of our lower bound result is to model the class of nondeterministic partition-refinement based algorithms as a "proof system" and then prove lower bounds on the length of derivations (see the first author's PhD thesis [4] for details). We formulate the lower bound result for undirected graphs and non-canonical colour refinement, so that it also holds for digraphs and canonical colour refinement. These results are presented in Section 4. Our construction also yields corresponding lower bounds for the problems of computing the bisimilarity relation on a transition system and for computing the equivalence classes induced by the 2-variable fragment of first-order logic $L^2$ on a structure (see Section 4.4).

Preliminaries
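For an undirected (simple) graph $G$, $N(v)$ denotes the set of neighbours of $v \in V(G)$, and $d(v) = |N(v)|$ its degree. For a digraph, $N^+(v)$ and $N^-(v)$ denote the out- and in-neighbourhoods, and $d^+(v) = |N^+(v)|$ and $d^-(v) = |N^-(v)|$ the out- and in-degrees. A partition of a set $V$ is a set $\pi = \{S_1, \ldots, S_k\}$ of pairwise disjoint nonempty subsets of $V$ whose union is $V$. The sets $S_i$ are called cells of $\pi$. The order of $\pi$ is the number of cells $|\pi|$. A partition $\pi$ is discrete if every cell has size 1, and unit if it has exactly one cell. Given a partition $\pi$ of $V$, and two elements $u, v \in V$, we write $u \approx_\pi v$ if and only if there exists a cell $S \in \pi$ with $u, v \in S$. We say that a set $S \subseteq V$ is $\pi$-closed if it is a union of cells of $\pi$. A partition $\pi$ of $V(G)$ is stable if for all cells $S, R \in \pi$ and all $u, v \in S$, it holds that $|N(u) \cap R| = |N(v) \cap R|$. For readability, all further definitions and propositions in this section are formulated for (undirected) graphs, but the corresponding statements also hold for digraphs (replace degrees/neighbourhoods by out-degrees/out-neighbourhoods). One can see that if $\pi$ is stable and $u \approx_\pi v$, then $d(u) = d(v)$, which we will use throughout.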
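The stability condition can be checked directly. The sketch below is hypothetical (not part of the paper), assumes adjacency lists over vertices 0, …, n−1 and a partition given as a list of sets, and compares, for each cell, the number of neighbours each of its vertices has in every cell. Since these counts add up to the degree, it also makes the observation about equal degrees in a stable partition evident.

```python
from collections import Counter


def is_stable(adj, partition):
    """Check stability of a partition of an undirected graph.

    adj: adjacency lists over vertices 0..n-1; partition: list of sets (cells).
    Stable means: any two vertices in a common cell have the same number of
    neighbours in every cell.
    """
    cell_of = {v: i for i, cell in enumerate(partition) for v in cell}
    for cell in partition:
        # profile of v: how many neighbours v has in each cell
        profiles = {frozenset(Counter(cell_of[w] for w in adj[v]).items())
                    for v in cell}
        if len(profiles) > 1:
            return False
    return True
```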
A partition $\rho$ of $V$ refines a partition $\pi$ of $V$ if for every $u, v \in V$, $u \approx_\rho v$ implies $u \approx_\pi v$. (In other words: every cell of $\pi$ is $\rho$-closed.) If $\rho$ refines $\pi$, we write $\pi \preceq \rho$. If in addition $\rho \neq \pi$, then we also write $\pi \prec \rho$. Note that $\preceq$ is a partial order on all partitions of $V$.
Definition 1 Let $G$ be a graph, and let $\pi$ and $\pi'$ be partitions of $V(G)$. For vertex sets $R, S \subseteq V(G)$ that are $\pi$-closed, we say that $\pi'$ is obtained from $\pi$ by a refining operation $(R, S)$ if for every $S' \in \pi$ with $S' \cap S = \emptyset$, it holds that $S' \in \pi'$, and for all $u, v \in S$, it holds that $u \approx_{\pi'} v$ if and only if $u \approx_\pi v$ and $|N(u) \cap R'| = |N(v) \cap R'|$ for every $R' \in \pi$ with $R' \subseteq R$.
Note that if $\pi'$ is obtained from $\pi$ by a refining operation $(R, S)$, then $\pi \preceq \pi'$. We say that the operation $(R, S)$ is effective if $\pi \prec \pi'$. In this case, at least one cell $C \in \pi$ is split, which means that $C \notin \pi'$. Note that an effective refining operation exists for $\pi$ if and only if $\pi$ is unstable. In addition, the next proposition says that if the goal is to obtain a (coarsest) stable partition, then applying any refining operation is safe.
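Proposition 2 Let $G$ be a graph, let $\rho$ be a stable partition of $V(G)$, and let $\pi$ be a partition of $V(G)$ with $\pi \preceq \rho$. If $\pi'$ is obtained from $\pi$ by a refining operation $(R, S)$, then $\pi \preceq \pi' \preceq \rho$.
Proof: $\pi \preceq \pi'$ follows immediately from the definitions. Now consider $u, v$ with $u \approx_\rho v$, and thus $u \approx_\pi v$. If $u, v \notin S$, then the cell of $\pi$ containing both is also a cell of $\pi'$. Otherwise $u, v \in S$, and for any $R' \in \pi$ with $R' \subseteq R$ it holds that $|N(u) \cap R'| = |N(v) \cap R'|$. This holds because $R'$ is a union of cells of $\rho$ (as $\pi \preceq \rho$), and since $\rho$ is stable, $u$ and $v$ have the same number of neighbours in each of these cells. Therefore, $u \approx_{\pi'} v$.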
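A direct, unoptimised reading of Definition 1 in Python may help; this is a sketch with a hypothetical function name, assuming adjacency lists over vertices 0, …, n−1 and a partition given as a list of frozensets. It also returns the cost of the operation, which the introduction defines as the number of edges between R and S; the sketch counts each such edge once.

```python
from collections import defaultdict


def refining_operation(adj, partition, R, S):
    """Apply the refining operation (R, S) to a partition (a sketch).

    adj: adjacency lists; partition: list of frozensets; R and S: sets that
    are unions of cells. Returns the new partition and the cost, i.e. the
    number of edges with one end in R and the other end in S.
    """
    R_cells = [cell for cell in partition if cell <= R]
    new_partition = []
    edges = set()
    for cell in partition:
        if not cell <= S:  # S is a union of cells, so this cell is disjoint from S
            new_partition.append(cell)
            continue
        groups = defaultdict(set)
        for u in cell:
            edges.update(frozenset((u, w)) for w in adj[u] if w in R)
            # number of neighbours of u in each cell contained in R
            key = tuple(sum(1 for w in adj[u] if w in Rc) for Rc in R_cells)
            groups[key].add(u)
        new_partition.extend(frozenset(g) for g in groups.values())
    return new_partition, len(edges)
```

Applying (V, V) to the unit partition of a path on four vertices separates the two ends from the two inner vertices at cost 3, one unit per edge.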
A partition π is a coarsest partition for a property P if π satisfies P , and there is no partition ρ with ρ ≺ π that also satisfies property P .
Proposition 3 Let G = (V, E) be a graph.For every partition π of V , there is a unique coarsest stable partition ρ that refines π.
Proof: For any partition $\pi$, the discrete partition refines $\pi$ and is stable, so there exists a stable partition that refines $\pi$. Because $\preceq$ is a partial order on the finite set of partitions of $V$, there exists then at least one coarsest stable partition that refines $\pi$. Now suppose there exists a partition $\pi$ for which there exist at least two distinct coarsest stable partitions $\rho_1$ and $\rho_2$ that refine $\pi$. Choose such a partition $\pi$ so that $|\pi|$ is maximum. Clearly, $\pi$ is not stable (otherwise $\rho_1 = \pi = \rho_2$). So there exists at least one effective refining operation $(R, S)$ that can be applied to $\pi$. For the resulting partition $\pi'$, $|\pi'| > |\pi|$ holds. By Proposition 2, both $\rho_1$ and $\rho_2$ refine $\pi'$ as well, and they are coarsest stable partitions refining $\pi'$, since every partition that refines $\pi'$ also refines $\pi$. But since $|\pi'| > |\pi|$, this contradicts the choice of $\pi$.

Canonical Colouring Methods
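A $k$-colouring of a (di)graph $G$ is a function $\alpha: V(G) \to \{1, \ldots, k\}$. For $i \in \{1, \ldots, k\}$, we denote the colour class of colour $i$ by $C^\alpha_i = \{v \in V(G) : \alpha(v) = i\}$. If the colouring is clear from the context, we also omit the superscript. For any colouring $\alpha$ of a (di)graph $G$, the set $\{C^\alpha_1, \ldots, C^\alpha_k\} \setminus \{\emptyset\}$ is a partition of $V(G)$, which we will denote by $\pi_\alpha$. We will call $\alpha$ unit or stable if $\pi_\alpha$ is unit or stable, respectively.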
Given two (di)graphs $G$ and $G'$, with respective colourings $\alpha$ and $\alpha'$, an isomorphism $h: V(G) \to V(G')$ is called colour preserving (for $\alpha$ and $\alpha'$) if $\alpha(v) = \alpha'(h(v))$ for all $v \in V(G)$. A colouring method is a method for obtaining a colouring $\beta$ of a (di)graph $G$, given an initial $k$-colouring $\alpha$. (This method can be an algorithm, or simply a definition. Often, the initial colouring $\alpha$ is chosen to be the unit colouring.) A colouring method (or algorithm) is called canonical if for any two isomorphic (di)graphs $G$ and $G'$ with initial colourings $\alpha$ resp. $\alpha'$ and isomorphism $h: V(G) \to V(G')$, the following holds: if $h$ is colour preserving for $\alpha$ and $\alpha'$, then $h$ is colour preserving for the resulting colourings $\beta$ and $\beta'$. The resulting colouring $\beta$ itself is also called a canonical colouring of $G$, starting from $\alpha$. If $\alpha$ is the unit colouring, $\beta$ is simply called a canonical colouring of $G$.
For instance, for simple undirected graphs $G$, the degree function $d$, which assigns the colour $d(v)$ to every vertex $v \in V(G)$, yields a canonical colouring of $G$, because every isomorphism maps vertices to vertices of the same degree. (In other words: degrees are isomorphism invariant.) Obviously, a canonical colouring method is useful for deducing information about possible isomorphisms between two graphs, especially when the resulting partition $\pi_\beta$ refines the initial partition $\pi_\alpha$. For details on isomorphism testing algorithms based on this idea, we refer to [19,21].
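The canonicity condition can be tested on concrete instances. In the following sketch (hypothetical helper names, vertices 0, …, n−1, an isomorphism given as a list h with h[v] the image of v), the degree colouring passes the check, while colouring each vertex by its own index, which depends on how the vertices happen to be numbered, fails it. A single passing instance does not prove canonicity, of course; it only illustrates the definition.

```python
def relabel(adj, h):
    """Adjacency lists of the isomorphic copy in which vertex v becomes h[v]."""
    new = [None] * len(adj)
    for v, nbrs in enumerate(adj):
        new[h[v]] = sorted(h[w] for w in nbrs)
    return new


def degree_colouring(adj):
    # canonical: isomorphisms preserve degrees
    return [len(nbrs) for nbrs in adj]


def index_colouring(adj):
    # not canonical: depends on the numbering of the vertices
    return list(range(len(adj)))


def colour_preserving(h, beta, beta2):
    """True if beta(v) == beta2(h(v)) for every vertex v."""
    return all(beta[v] == beta2[h[v]] for v in range(len(beta)))
```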
In this section we give a fast canonical algorithm that, for any (di)graph $G$ and colouring $\alpha$ of $G$, yields a colouring $\beta$ of $V(G)$ such that $\pi_\beta$ is the coarsest stable partition that refines $\pi_\alpha$. For ease of presentation, we require that the initial colouring $\alpha$ is a surjective $\ell$-colouring for some value $\ell$ (so every colour in $\{1, \ldots, \ell\}$ occurs at least once). The resulting colouring $\beta$ will then again be a surjective $k$-colouring for some value $k$. In particular, if we choose $\alpha$ to be the unit colouring, then $\beta$ is a canonical colouring of $G$ such that $\pi_\beta$ is the unique coarsest stable partition of $G$. To obtain the most general result, we formulate the algorithm for digraphs. Variants and extensions are discussed in Section 3.4.

High-level Description and Correctness Proofs
In Algorithm 1, we give a high-level description of our canonical colour refinement algorithm.This is not yet the fast implementation, and in fact, because we do not yet specify which data structures are used to represent the various mathematical objects (sets and functions), no sharp complexity bound can be concluded from it.In the next section, we give a detailed implementation of this algorithm, describe the data structures in detail, and prove the desired complexity bound.Here, we first focus on proving correctness of the algorithm.
In our algorithms, the scope of for loops, while loops and if-then-else statements is indicated by the indentation of blocks; because of space considerations we omit 'end for', 'end while' and 'end if' statements.
The input to Algorithm 1 is a digraph $G = (V, E)$, with $V = \{1, \ldots, n\}$. For every vertex $v \in V$, the sets of out-neighbours $N^+(v)$ and in-neighbours $N^-(v)$ are given. (Alternatively, these can be computed in linear time from the edge list.) In addition, an $\ell$-colouring $\alpha$ of $G$ and a set $S \subseteq \{1, \ldots, \ell\}$ are given. The set $S$ should be a sufficient refining colour set for $\alpha$, which is a set that satisfies the following property: for any colour class $C^\alpha_i$ and two vertices $u, v \in C^\alpha_i$, if there exists a colour class $C^\alpha_j$ with $|N^+(u) \cap C^\alpha_j| \neq |N^+(v) \cap C^\alpha_j|$, then there exists a colour $j' \in S$ with $|N^+(u) \cap C^\alpha_{j'}| \neq |N^+(v) \cap C^\alpha_{j'}|$. Note that $S = \{1, \ldots, \ell\}$ trivially forms a sufficient refining colour set for any $\ell$-colouring, but that smarter choices of $S$ may give a faster algorithm (which will be necessary in Section 3.4).
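The definition of a sufficient refining colour set can be checked by brute force. The sketch below is quadratic in n and only meant to illustrate the definition; it assumes vertices 0, …, n−1, out-neighbour lists, and a colouring given as a list, and the name `is_sufficient` is hypothetical.

```python
from collections import Counter


def is_sufficient(out_nbrs, alpha, S):
    """Check whether S is a sufficient refining colour set for the colouring alpha.

    Whenever two vertices of equal colour have different numbers of
    out-neighbours in some colour class, some colour in S must already
    witness a difference.
    """
    # prof[v][c] = number of out-neighbours of v with colour c
    prof = [Counter(alpha[w] for w in nb) for nb in out_nbrs]
    n = len(alpha)
    for u in range(n):
        for v in range(u + 1, n):
            if alpha[u] == alpha[v] and prof[u] != prof[v]:
                if all(prof[u][j] == prof[v][j] for j in S):
                    return False
    return True
```

For the path 0-1-2 with colouring (1, 1, 2), the two vertices of colour 1 differ only in their number of neighbours of colour 2, so {2} is sufficient but {1} is not.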
Throughout, the algorithm maintains an (ordered) partition $(C_1, \ldots, C_k)$ of $V(G)$, starting with the partition $(C^\alpha_1, \ldots, C^\alpha_\ell)$ (Lines 1-3). We also view this partition as a colouring, so the sets $C_i$ will be called colour classes, and indices $i \in \{1, \ldots, k\}$ will be called colours. In the main while-loop (Line 5), this partition is iteratively refined using refining operations of the form $(R, V)$, where $R = C_r$ for some $r \in \{1, \ldots, k\}$. We will show that when the algorithm terminates, no effective refining operations are possible on the resulting partition. So the resulting partition is the unique coarsest stable partition of $G$ that refines $\pi_\alpha$ (Propositions 2, 3). The next colour $r$ that is used as refining colour is chosen using a stack (sequence) $S_{\mathrm{refine}}$ (Line 6), which contains all colours that still need to be considered. For a given refining colour class $C_r$ and any $v \in V$, call $d^+_r(v) := |N^+(v) \cap C_r|$ the colour degree of $v$ (with respect to colour $r$). Then every colour $s \in \{1, \ldots, k\}$ will be split up according to colour degrees (in the for-loop of Line 10). We only consider colours that actually split up, in increasing order. When splitting up colour class $C_s$, the new colours will be $s$ and $k+1, \ldots, k+d-1$, where $d$ is the number of different colour degrees that occur in $C_s$. These new colours are assigned to the vertices in $C_s$ according to increasing colour degrees (Lines 11-20).
It remains to explain how newly introduced colours are added to the stack $S_{\mathrm{refine}}$. Initially, $S_{\mathrm{refine}}$ contains all colours in $S$, in increasing order (Line 4), and whenever new colours are introduced during the splitting of a colour class $C_s$, these are pushed onto the stack $S_{\mathrm{refine}}$ in increasing order (Lines 21-27). There are however exceptions: for instance, if we have already used the vertex set $C_s$ as refining colour class before, and this set is split up into $d$ new colours, then it is not necessary to use all of these new colours as refining colours later; one colour $b$ may be omitted from $S_{\mathrm{refine}}$ (Line 27). To obtain a good complexity, we choose $b$ such that the size of the corresponding colour class is maximised, in order to minimise the sizes of the refining colour sets used later during the computation. (This is Hopcroft's trick [14], which was also used by, e.g., [22].) Informally, this algorithm is canonical since at every point, both the (colourings given by the) ordered partition $(C_1, \ldots, C_k)$ and stack $S_{\mathrm{refine}}$ remain canonical; new colours that we assign to vertices, and the order in which colours are considered in the various loops of the algorithm, are completely determined by isomorphism-invariant values such as colour degrees and colour numbers. The order in which vertices of $G$ or neighbour lists are given in the input is irrelevant. A formal proof is given in Lemma 6 below. We first prove that Algorithm 1 returns the unique coarsest stable partition, which requires the following invariant.
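Since Algorithm 1 itself is not reproduced in this text, the following Python sketch follows only the prose description above; it is not the authors' pseudocode and makes no attempt at the O((n+m) log n) bound. Assumptions: vertices are 0, …, n−1 (rather than 1, …, n), the digraph is given by out-neighbour lists, colours are integers 1, …, ℓ, and when several colour degrees occur equally often, the omitted colour belongs to the smallest such degree (some canonical tie-break is needed; this particular one is our choice). The function name is hypothetical.

```python
from collections import Counter, defaultdict


def canonical_colour_refinement(out_nbrs, alpha, S=None):
    """High-level sketch of stack-based canonical colour refinement.

    out_nbrs[v]: out-neighbours of vertex v (vertices are 0..n-1 here).
    alpha[v]:    initial colour of v, a surjective colouring onto 1..ell.
    S:           sufficient refining colour set (default: all of 1..ell).
    Returns beta, a list with beta[v] in 1..k.
    """
    n, ell = len(out_nbrs), max(alpha)
    in_nbrs = [[] for _ in range(n)]
    for v in range(n):
        for w in out_nbrs[v]:
            in_nbrs[w].append(v)
    colour = list(alpha)
    classes = {c: set() for c in range(1, ell + 1)}
    for v in range(n):
        classes[colour[v]].add(v)
    k = ell
    stack = sorted(S if S is not None else classes)  # pushed in increasing order
    on_stack = set(stack)
    while stack:
        r = stack.pop()
        on_stack.discard(r)
        # colour degree |N^+(w) ∩ C_r|, counted via the in-neighbours of C_r
        cdeg = Counter(w for v in classes[r] for w in in_nbrs[v])
        adj = defaultdict(list)  # colour -> its vertices with positive colour degree
        for w in cdeg:
            adj[colour[w]].append(w)
        for s in sorted(adj):  # candidate colours in increasing order
            num = Counter(cdeg[w] for w in adj[s])
            zeros = len(classes[s]) - len(adj[s])
            if zeros:
                num[0] = zeros
            if len(num) < 2:
                continue  # all vertices of C_s have the same colour degree
            degrees = sorted(num)
            f = {degrees[0]: s}  # the smallest colour degree keeps colour s
            for d in degrees[1:]:
                k += 1
                f[d] = k
                classes[k] = set()
            for w in adj[s]:  # vertices of colour degree 0 never move
                if f[cdeg[w]] != s:
                    classes[s].remove(w)
                    classes[f[cdeg[w]]].add(w)
                    colour[w] = f[cdeg[w]]
            if s in on_stack:
                push = [f[d] for d in degrees[1:]]
            else:
                # Hopcroft's trick: skip one largest new class
                # (ties broken towards the smallest degree: our assumption)
                b = max(degrees, key=lambda d: (num[d], -d))
                push = sorted(f[d] for d in degrees if d != b)
            for c in push:
                stack.append(c)
                on_stack.add(c)
    return colour
```

For the undirected path on five vertices (encoded as a symmetric digraph) and the unit colouring, the sketch returns the colours 1, 3, 2, 3, 1: the two ends, the two vertices next to them, and the middle vertex form the three classes.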
Proposition 4 At the end of every iteration of the for-loop in Line 10 of Algorithm 1, $\{C_1, \ldots, C_k\}$ is a partition of $V(G)$ into nonempty sets, and the set of colours in $S_{\mathrm{refine}}$ is a sufficient refining colour set for the corresponding $k$-colouring of $G$.
Proof: Since new colours correspond to colour degrees that actually occur (Lines 11-16), every new colour class will be nonempty.Lines 19 and 20 show that every vertex of G remains part of exactly one colour class.So the algorithm maintains a partition of V (G).
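By definition, the set of colours in $S_{\mathrm{refine}}$ is a sufficient refining colour set before the first iteration. We prove that this invariant is maintained during any iteration of the for-loop, where colour class $C_s$ for $s \in \{1, \ldots, k\}$ is split up (by colour $r$) into the new colour classes $C_{\sigma_1}, \ldots, C_{\sigma_p}$. Denote $S = C_s$, as it is at the start of the iteration (so $S = C_{\sigma_1} \cup \cdots \cup C_{\sigma_p}$). Because the new colour classes form a partition of the old colour class $S$, for every $z \in V(G)$ it holds that
$$|N^+(z) \cap S| = \sum_{i=1}^{p} |N^+(z) \cap C_{\sigma_i}|. \qquad (1)$$
Consider two vertices $u, v \in V(G)$ that are in the same colour class after the refining operation, and therefore also before the refining operation, and suppose that some colour class distinguishes them, that is, contains a different number of out-neighbours of $u$ and of $v$. We may assume that this colour class is one of $C_{\sigma_1}, \ldots, C_{\sigma_p}$: all other colour classes existed before the operation, so by the invariant some colour in $S_{\mathrm{refine}}$ distinguished $u$ and $v$ before the operation; if this colour is not $s$, it still does so and remains in $S_{\mathrm{refine}}$, and if it is $s$, then some $C_{\sigma_i}$ distinguishes $u$ and $v$ (using (1)). So if $s \in S_{\mathrm{refine}}$, then the invariant is maintained after splitting up the colour, since every new colour is added to $S_{\mathrm{refine}}$ (Lines 22-23), and $s$ remains in $S_{\mathrm{refine}}$. So now assume $s \notin S_{\mathrm{refine}}$. Then every colour in $\{\sigma_1, \ldots, \sigma_p\}$ is added to $S_{\mathrm{refine}}$, except for $f(b)$ (Line 27). Then we need to consider the case that $C_{f(b)}$ is the only one of the new colour classes that distinguishes $u$ and $v$. In this case, $|N^+(u) \cap S| \neq |N^+(v) \cap S|$ (using (1)). Since $s \notin S_{\mathrm{refine}}$, and the invariant held before the refining operation, there exists another colour $j' \in S_{\mathrm{refine}}$ such that $|N^+(u) \cap C_{j'}| \neq |N^+(v) \cap C_{j'}|$. Since this colour remains in $S_{\mathrm{refine}}$, the invariant is also maintained in this case.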
Using the above proposition, we can prove that Algorithm 1 computes a coarsest stable colouring, provided that S is a sufficient refining colour set.Recall that this condition is certainly satisfied when choosing S = {1, . . ., ℓ}.
Lemma 5 Let $G$ be a digraph, $\alpha$ be a surjective $\ell$-colouring of $G$, and let $S \subseteq \{1, \ldots, \ell\}$ be a sufficient refining colour set for $\alpha$. Then Algorithm 1 computes a surjective $k$-colouring $\beta$ of $G$ such that $\pi_\beta$ is the coarsest stable partition that refines $\pi_\alpha$.
Proof: Let $\omega$ be the coarsest stable partition of $V(G)$ that refines $\pi_\alpha$. The partition $\pi_\beta$ given by the algorithm is refined by $\omega$ because it is obtained from $\pi_\alpha$ using refining operations (Proposition 2). The stack $S_{\mathrm{refine}}$ is empty when the algorithm terminates, so the empty set is a sufficient refining colour set at this point (Proposition 4), and therefore $\pi_\beta$ is stable. It follows that $\pi_\beta$ is equal to $\omega$ (Proposition 3). At any point, the sets $C_i$ for $i \in \{1, \ldots, k\}$ are nonempty (Proposition 4), so the resulting $k$-colouring $\beta$ is surjective.
Lemma 6 Algorithm 1 is a canonical colouring algorithm.
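Proof: Let $G$ and $G'$ be digraphs with surjective $\ell$-colourings $\alpha$ and $\alpha'$, respectively, and let $S \subseteq \{1, \ldots, \ell\}$. Let $C^{G,i}_j$ (resp. $C^{G',i}_j$) denote the set $C_j$ as it is at the start of the $i$-th iteration of the while-loop in Line 5, when running Algorithm 1 with input $G, \alpha, S$ (resp. $G', \alpha', S$). Let $S^{G,i}_{\mathrm{refine}}$ (resp. $S^{G',i}_{\mathrm{refine}}$) denote the stack $S_{\mathrm{refine}}$ as it is at the start of iteration $i$ of the while-loop in Line 5, when running Algorithm 1 with input $G, \alpha, S$ (resp. $G', \alpha', S$).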
To show that Algorithm 1 is canonical, we prove by induction over $i$ that for every isomorphism $h: V(G) \to V(G')$ that is colour preserving for $\alpha$ and $\alpha'$, the following properties are maintained: $S^{G,i}_{\mathrm{refine}} = S^{G',i}_{\mathrm{refine}}$, and for all colours $c$ and all $v \in V(G)$, $v \in C^{G,i}_c$ if and only if $h(v) \in C^{G',i}_c$. For $i = 1$, the claim follows immediately from how $S_{\mathrm{refine}}$ is initialised (Line 4), and how the sets $C_c$ are initialised (Line 2). We now consider the places in the algorithm where these sets and stacks are modified. In Line 6, the last element of both $S^{G,i}_{\mathrm{refine}}$ and $S^{G',i}_{\mathrm{refine}}$ is removed, so these sequences stay the same. Furthermore, it follows that the same colour $r$ is used as refining colour for both $G$ and $G'$ in this iteration. The induction assumption shows that $h$ is a colour preserving isomorphism for the colourings given by the sets $C^{G,i}_c$ and $C^{G',i}_c$, so $|N^+(v) \cap C^{G,i}_r| = |N^+(h(v)) \cap C^{G',i}_r|$ for every $v \in V(G)$; that is, $v$ and $h(v)$ have the same colour degree. So for every $c$ and every $d$, $C^{G,i}_c$ and $C^{G',i}_c$ contain the same number of vertices with colour degree $d$. Hence the set Colors_split is the same for both $G$ and $G'$, and for each colour $c \in$ Colors_split, the values maxcdeg and numcdeg(j) (for every $j$) are the same. Therefore, in every iteration of the for-loop in Line 10, the sets $D$, $I$ will be the same for both $G$ and $G'$. The choice of the bijection $f$ in Line 16 is unique because of the monotonicity; hence $f$ will be the same for $G$ and $G'$ as well. It follows that when vertices receive their new colours in Lines 19 and 20, a vertex $v \in V(G)$ receives colour $c$ if and only if $h(v)$ receives colour $c$. Hence $h$ remains colour preserving for the new partition. From the previous observations it also follows that in Line 25, $b$ is chosen to be the same value for both $G$ and $G'$. Therefore, in Lines 23 and 27, the stack $S_{\mathrm{refine}}$ is modified in the same way for both $G$ and $G'$ (note that in both cases, the colours are added in increasing order). This shows that the claimed properties are maintained in one iteration of the while-loop in Line 5, so by induction, $h$ is also a colour preserving isomorphism for the final colouring $\beta$ that is returned in Line 32.

Implementation and Complexity Bound
We now describe a fast implementation of Algorithm 1. The main idea of the complexity proof is the following: one iteration (of the main while-loop; Line 5 of Algorithm 1) consists of popping a refining colour $r$ from the stack $S_{\mathrm{refine}}$, and applying the refining operation $(R, V)$, with $R = C_r$. Below we give implementation details and prove the following lemma:
Lemma 7 Algorithm 1 can be implemented such that one iteration, in which a refining operation $(R, V)$ is applied, takes time $O(|R| + D^-(R) + k \log k)$, where $D^-(R) = \sum_{v \in R} d^-(v)$ and $k$ is the number of new colours that are introduced in this iteration. This implementation requires an initialisation step with complexity $O(n)$.
Using the above lemma, we can prove the desired complexity bound. (The main idea is again based on Hopcroft's idea [14].)
Lemma 8 Algorithm 1 can be implemented such that it terminates in time $O((n + m) \log n)$.
Proof: For $v \in V(G)$, let $R_1, \ldots, R_q$ denote the refining colour classes $C_r$ with $v \in C_r$ that are considered throughout the computation, in chronological order. Then we observe that for all $i \in \{1, \ldots, q-1\}$, $|R_{i+1}| \le \frac{1}{2}|R_i|$. This holds because whenever a set $S = C_s$ is split up into $C_{\sigma_1}, \ldots, C_{\sigma_p}$, where $s$ has been considered earlier as a refining colour (so it is not in $S_{\mathrm{refine}}$ anymore), then for all new colours $\sigma_i$ that are added to the stack $S_{\mathrm{refine}}$, $|C_{\sigma_i}| \le \frac{1}{2}|S|$ holds, since the largest colour class is not added to $S_{\mathrm{refine}}$. Note that if a colour class $C_{\sigma_i}$ is subsequently split up before $\sigma_i$ is considered as refining colour, the bound of course also holds. Since $|R_1| \le n$ and $|R_q| \ge 1$, it follows that every $v \in V(G)$ appears at most $\log_2 n + 1$ times in a refining colour class. Then we can write
$$\sum_R \bigl(|R| + D^-(R)\bigr) = \sum_{v \in V(G)} \bigl(1 + d^-(v)\bigr) \cdot |\{R : v \in R\}| \le (n + m)(\log_2 n + 1),$$
where the first summation is over all refining colour classes $R = C_r$ considered during the computation. In addition, the total number of new colours that is introduced is at most $n$, since every colour class, after it is introduced, remains nonempty throughout the computation. So we may write
$$\sum_i k_i \log k_i \le \log n \sum_i k_i \le n \log n,$$
where $k_i$ denotes the number of colours introduced during iteration $i$. Combining these bounds with Lemma 7 shows that the total complexity of the algorithm can be bounded by $O(n + (n + m)\log n + n \log n) = O((n + m) \log n)$.
Combining Lemmas 5, 6 and 8 (using $S = \{1, \ldots, \ell\}$), we obtain our main theorem:
Theorem 9 For any digraph $G$ on $n$ vertices and $m$ edges, with surjective $\ell$-colouring $\alpha$, in time $O((n + m) \log n)$ a canonical surjective $k$-colouring $\beta$ of $G$ can be computed such that $\pi_\beta$ is the coarsest stable partition that refines $\pi_\alpha$.
Implementation Details It remains to prove Lemma 7. In Algorithm 2 and its subroutine Algorithm 3, the detailed, fast implementation of Algorithm 1 is given. The colour classes $C_i$ are represented by doubly linked lists C[i], indexed by $i \in \{1, \ldots, n\}$. (C and A are arrays containing (pointers to) doubly linked lists and lists, respectively, indexed by colour numbers 1, …, n.) For all lists L, we keep track of their length, which we denote by |L|.
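One way to realise the doubly linked lists C[i] is to thread them through per-vertex prev/next arrays, so that moving a vertex to another colour class takes constant time. The Python class below is a hypothetical sketch of this idea, not the code of Algorithm 2; it uses dictionaries keyed by colour in place of the arrays indexed by 1, …, n, and stores the length of every list as described above.

```python
class ColourClasses:
    """Colour classes as doubly linked lists threaded through vertex arrays.

    prev[v] / nxt[v] link v to the other vertices of its class, head[c] is the
    first vertex of class c and size[c] its length, so a vertex can be moved
    to another class in O(1) time.
    """

    def __init__(self, colour):
        n = len(colour)
        self.colour = list(colour)
        self.prev = [None] * n
        self.nxt = [None] * n
        self.head = {}
        self.size = {}
        for v in range(n):
            self._push_front(v, colour[v])

    def _push_front(self, v, c):
        first = self.head.get(c)
        self.prev[v], self.nxt[v] = None, first
        if first is not None:
            self.prev[first] = v
        self.head[c] = v
        self.size[c] = self.size.get(c, 0) + 1
        self.colour[v] = c

    def move(self, v, c):
        """Unlink v from its current class and prepend it to class c."""
        old = self.colour[v]
        p, q = self.prev[v], self.nxt[v]
        if p is None:
            self.head[old] = q
        else:
            self.nxt[p] = q
        if q is not None:
            self.prev[q] = p
        self.size[old] -= 1
        self._push_front(v, c)

    def members(self, c):
        v = self.head.get(c)
        while v is not None:
            yield v
            v = self.nxt[v]
```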
The first challenge is how to compute the colour degrees $d^+_r(v)$ efficiently for every $v \in V(G)$ (Lines 7 and 8 of Algorithm 1), with respect to the refining colour $r$ and corresponding colour class $R$. For this we use an array cdeg[v] of integers, indexed by v ∈ {1, …, n}. We use the following invariant: at the beginning of every iteration, cdeg[v] = 0 for all v. Then we can compute these colour degrees by looping over all in-neighbours w of all vertices v ∈ R, and increasing cdeg[w]. At the same time, we compute the maximum colour degree for every colour c, using an array maxcdeg (an array of integers indexed by c ∈ {1, …, n}), we compute a list Colors_adj of colours i that contain at least one vertex w ∈ C_i with cdeg[w] ≥ 1, and for every such colour i, we compute a list A[i] of all vertices w ∈ C_i with cdeg[w] ≥ 1. None of these lists contain duplicates. See Lines 47-54 of Algorithm 2. This implementation is correct because we also maintain the following invariant: at the beginning of every iteration, maxcdeg[c] = 0 and A[c] is an empty list for every c, Colors_adj is an empty list, and flags are maintained for colours to keep track of membership in Colors_adj. To maintain this invariant, we reset all of these data structures again at the end of every iteration (Lines 68-73). Note that it suffices to reset cdeg[v] only for vertices v that occur in some list A[c] (Lines 69-70).
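The counting step and the reset trick can be sketched as follows. This is a hypothetical Python class, not the code of Algorithm 2: dictionaries stand in for the arrays maxcdeg and A, the keys of A (in first-visit order) play the role of Colors_adj, so no separate membership flags are needed, and the input R is the refining colour class as a list of vertices 0, …, n−1.

```python
class ColourDegrees:
    """Sketch of the cdeg / A / maxcdeg bookkeeping for one refining class R."""

    def __init__(self, n):
        self.cdeg = [0] * n  # invariant: all zero between iterations

    def count(self, R, in_nbrs, colour):
        A = {}        # colour c -> vertices w of colour c with cdeg[w] >= 1
        maxcdeg = {}  # colour c -> maximum colour degree within C_c
        for v in R:
            for w in in_nbrs[v]:  # w has the out-neighbour v in R
                if self.cdeg[w] == 0:
                    # first visit of w, so A never contains duplicates
                    A.setdefault(colour[w], []).append(w)
                self.cdeg[w] += 1
                c = colour[w]
                maxcdeg[c] = max(maxcdeg.get(c, 0), self.cdeg[w])
        return A, maxcdeg  # list(A) corresponds to Colors_adj

    def reset(self, A):
        # only vertices listed in some A[c] can have a nonzero entry
        for vertices in A.values():
            for w in vertices:
                self.cdeg[w] = 0
```

Resetting only the touched entries keeps the cost of an iteration proportional to the number of edges into R rather than to n.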
Next, we address how we can consider all colours that split up in one iteration, in canonical (increasing) order (see Lines 9, 10 of Algorithm 1 and Lines 55-66). To this end, we compute a new list Colors_split, which represents the subset of Colors_adj containing all colours that actually split up. This is necessary since this list needs to be sorted, in order to consider the colours in canonical order (in the for-loop in Line 66). By ensuring that all colours in Colors_split split up, we have that |Colors_split| ≤ k (where k is the number of colours introduced in this iteration), and therefore we can afford to sort this list. This can be done using any list sorting algorithm of complexity O(k log k), such as merge sort. To compute which colours split up, we compute for every colour c ∈ Colors_adj the maximum colour degree maxcdeg[c] […] often in S (Lines 81-83), which corresponds to the new colour that is possibly not added to $S_{\mathrm{refine}}$. Using numcdeg, we can also easily construct an array f, indexed by d ∈ {0, …, maxcdeg[s]}, which represents the mapping from colour degrees that occur in S to newly introduced colours, or to the current colour s (Lines 85-93). Finally, we can move all vertices v ∈ A[s] to the colour class C[f[cdeg[v]]], where f[cdeg[v]] is the new colour that corresponds to the colour degree of v (Lines 94-98). Note that looping over A[s] suffices, because if there are vertices in C[s] with colour degree 0, then these keep the same colour, and thus do not need to be addressed. This fact is essential since the number of such vertices may be too large to consider, for our desired complexity bound. In conclusion, Algorithms 2 and 3 are indeed implementations of Algorithm 1. We now prove Lemma 7 by analysing the complexity.
Proof of Lemma 7: The given implementation uses a number of arrays of length n, either containing integers (cdeg, colour, maxcdeg, numcdeg, f ), or containing (pointers to) lists/doubly linked lists (C, A).All of these arrays can be initialised in time O(n).In general, the initialisation steps (Lines 33-44) take time O(n) (for Line 43, use bucket sort).
We first consider the complexity of the subroutine SplitUpColour(s), given in Algorithm 3. We prove that it

Extensions, Generalisations and Variants
Stack vs. queue In our algorithm, we use a stack to select the next colour that should be used for the next refining operation, whereas previous similar algorithms use a queue [19,22]. Firstly, we remark that if we replace the stack by a queue, it can easily be checked that all of the claims proved in the previous sections still hold. So the best choice is determined by other concerns, which we now briefly discuss. Using a queue gives the nice property that during the algorithm execution, all of the following 'standard' partitions will be generated: given an initial partition $\pi = \pi_0$ of the vertices $V$ of a graph $G$, for every $i \ge 0$ one can define $\pi_{i+1}$ to be the partition obtained from $\pi_i$ using the refining operation $(V, V)$. The coarsest stable partition of $G$ that refines $\pi$ is then the first partition $\pi_i$ with $\pi_i = \pi_{i+1}$. This characterisation is sometimes used as an alternative definition of coarsest stable partitions. One can verify that when using a queue, for every $i$ a colouring $\alpha$ with $\pi_\alpha = \pi_i$ will be generated during the execution of the algorithm.
When using a stack, the behaviour of the algorithm seems somewhat less predictable. Nevertheless, this yields a 'depth-first' type of strategy that tends to produce very small colour classes much more quickly, which seems to be an advantage. In our own (limited) computational studies, we observed that using a stack was never worse than using a queue, and in some cases significantly better. Furthermore, we had an earlier lower bound example construction that required time Ω((n + m) log n) for a queue-based algorithm, but could be solved in time O(n + m) using a stack-based algorithm. For these reasons, we would recommend using a stack.
The Complexity of Iterative Refinement Consider Algorithm 4. This algorithm takes as input a digraph $G$ on $n$ vertices, and returns a discrete colouring $\beta$ of $G$, or more precisely: a surjective $n$-colouring of $G$. This colouring is not canonical, since in Line 103, an arbitrary vertex is chosen to be individualised, that is, to receive a unique colour. So by itself this algorithm is not very interesting (there are easier ways to obtain an arbitrary discrete colouring of $G$). However, it corresponds to one recursion branch of various state-of-the-art canonical labelling algorithms, based on the algorithm introduced by McKay [19]. We now briefly sketch how one should modify this algorithm (into a recursive algorithm) to obtain such a canonical labelling algorithm: in Line 103, instead choose a colour class $C^\alpha_i$ of the current colouring $\alpha$ with $|C^\alpha_i| \ge 2$.
We branch on this colour class as follows: for every $v \in C^\alpha_i$, continue with a separate branch of the algorithm in which $v$ is individualised (Line 105) and a new stable colouring is computed (Line 106). Continuing recursively in this way, one obtains a number of discrete colourings of $G$, one for every leaf of the recursion tree. A canonical discrete colouring of $G$ can be obtained by choosing one of these colourings that maximises some value. For instance, consider the adjacency matrix of $G$ with rows and columns ordered according to the colour numbers, and view it as a binary number in the straightforward way. This is the basic algorithm; by keeping track of automorphisms of the graph, there are various ways to speed it up by pruning the recursion tree. For more details, we refer to [19,10,17,24,20].
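The "maximise the adjacency matrix read as a binary number" criterion can be sketched as follows. As an assumption made purely for illustration, the leaves of the recursion tree are replaced by all $n!$ orderings of the vertices (brute force, only feasible for tiny graphs); a real canonical labelling algorithm only compares the discrete colourings found at the leaves of the refinement-based search tree. The function names are hypothetical.

```python
from itertools import permutations


def adjacency_code(arcs, order):
    """Adjacency matrix with rows/columns ordered by `order` (order[i] is the
    vertex that received colour i), read row by row as one binary number."""
    arcset = set(arcs)
    code = 0
    for u in order:
        for v in order:
            code = 2 * code + (1 if (u, v) in arcset else 0)
    return code


def brute_force_canonical_code(n, arcs):
    """Maximum code over ALL discrete colourings (every permutation of 0..n-1).

    Brute-force stand-in for the leaves of the search tree; isomorphic
    digraphs get the same value.
    """
    return max(adjacency_code(arcs, order) for order in permutations(range(n)))
```

Two labelled copies of the same graph receive the same maximum code, while non-isomorphic graphs receive different codes, which is exactly what makes the maximising colouring canonical.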

Algorithm 4 Iterative Colour Refinement
The algorithm sketched above for obtaining a canonical discrete colouring $\beta$ of a digraph $G$ does not terminate in polynomial time for all graphs $G$. (If it did, this would yield a polynomial time isomorphism test: for two digraphs $G$ and $G'$, compute canonical discrete $n$-colourings $\beta$ and $\beta'$, respectively. Since $\beta$ and $\beta'$ are discrete $n$-colourings, they define a unique colour-preserving bijection $h: V(G) \to V(G')$. Since $\beta$ and $\beta'$ are canonical, $G$ and $G'$ are isomorphic if and only if $h$ is an isomorphism.) Examples are known where such an algorithm considers an exponential number of branches [21]. Nevertheless, a single branch of this algorithm (as shown in Algorithm 4) terminates quickly: from [19] it follows that Algorithm 4 has an implementation that runs in time $O(n^2 \log n)$.
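The parenthetical isomorphism-test argument can be sketched in a few lines. Here the two discrete colourings are simply given as dictionaries from vertices to colours; that they are canonical is an assumption of the sketch, not something the code checks. The function names are hypothetical.

```python
def bijection_from_discrete(beta, beta_prime):
    """beta, beta_prime: discrete colourings (dict vertex -> colour, all colours
    distinct). Return the unique colour-preserving bijection h: V(G) -> V(G')."""
    vertex_of_colour = {c: w for w, c in beta_prime.items()}
    return {v: vertex_of_colour[c] for v, c in beta.items()}


def is_isomorphism(h, arcs_g, arcs_h):
    """h is a bijection, so it is an isomorphism iff it maps the arc set of G
    exactly onto the arc set of G'."""
    return {(h[u], h[v]) for u, v in arcs_g} == set(arcs_h)
```

With canonical colourings, a single call of `is_isomorphism` decides isomorphism; with arbitrary discrete colourings, a negative answer says nothing.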
Using our results, we can show that it has an $O((n+m)\log n)$ implementation.

Proof: The main part of the computation occurs in Lines 99 and 106, where we compute a surjective canonical $k$-colouring $\beta$ such that $\pi_\beta$ refines $\pi_\alpha$, for a given surjective $\ell$-colouring $\alpha$ (in Line 99, the unit colouring is chosen for $\alpha$). For this we use the fast implementation of Algorithm 1, given in Section 3.3. To obtain the desired complexity, we make the following simple changes compared to Algorithm 1: we do not initialise the sets $C_i$ and the stack $S_{\text{refine}}$ every time we call the algorithm (Lines 1-4), and we do not explicitly compute the new colouring $\beta$ (Lines 29-32). In addition, we do not actually copy the colouring $\beta$ (Line 101 of Algorithm 4). Instead, we initialise these sets once, keep working with the same sets $\{C_1, \dots, C_k\}$ throughout the different iterations of the while-loop in Algorithm 4, and only compute the corresponding colouring $\beta$ at the very end of the algorithm. Whenever we individualise a vertex $v$ by assigning it a new colour (Line 105 of Algorithm 4), we move $v$ from its previous colour class $C_i$ to the new colour class $C_{\ell+1}$. In addition, we update the stack $S_{\text{refine}}$, which is currently empty, to contain the single colour $\ell+1$. (Both can be done in constant time.)

We now argue that for computing the next stable colouring $\beta$ (Line 106), it is sufficient that $S_{\text{refine}}$ contains only the colour $\ell+1$. Denote by $\alpha_1$ the stable $\ell$-colouring before this step (with $\alpha_1(v) = i$), and by $\alpha_2$ the new $(\ell+1)$-colouring (with $\alpha_2(v) = \ell+1$). Consider the colour classes [...]. We conclude that $\{\ell+1\}$ is a sufficient refining colour set for $\alpha_2$, so Algorithm 1 will compute the desired stable colouring $\beta$ when $S_{\text{refine}}$ is initialised like this (Lemma 5).
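A minimal sketch of one branch of Algorithm 4 under the convention just described: after a vertex $v$ is individualised, only its new colour is put on the stack of refining colours. The `refine` routine is a simplified stand-in for Algorithm 1 (it pushes every new part of a split class rather than skipping the largest one), and the rule for picking $v$ (lowest colour with at least two vertices, then lowest vertex) is an arbitrary hypothetical choice, so the result is a discrete but not canonical colouring.

```python
def refine(in_nbrs, colour, classes, stack):
    """Refine the colouring in place; `stack` holds the refining colours.
    Colours are always 0 .. len(classes) - 1."""
    on_stack = set(stack)
    next_colour = len(classes)
    while stack:
        r = stack.pop()
        on_stack.discard(r)
        count = {}  # count[v] = number of out-neighbours of v in class r
        for w in classes[r]:
            for v in in_nbrs[w]:
                count[v] = count.get(v, 0) + 1
        touched = {}
        for v in count:
            touched.setdefault(colour[v], []).append(v)
        for c, vs in touched.items():
            groups = {}
            for v in vs:
                groups.setdefault(count[v], []).append(v)
            rest = [v for v in classes[c] if v not in count]
            if rest:
                groups[0] = rest
            if len(groups) == 1:
                continue
            keys = sorted(groups)
            classes[c] = set(groups[keys[0]])
            for key in keys[1:]:
                classes[next_colour] = set(groups[key])
                for v in groups[key]:
                    colour[v] = next_colour
                stack.append(next_colour)
                on_stack.add(next_colour)
                next_colour += 1
            if c not in on_stack:
                stack.append(c)
                on_stack.add(c)


def individualise_and_refine(out_nbrs):
    """One branch of iterative refinement: returns a discrete colouring."""
    n = len(out_nbrs)
    in_nbrs = [[] for _ in range(n)]
    for v, nbrs in enumerate(out_nbrs):
        for w in nbrs:
            in_nbrs[w].append(v)
    colour = [0] * n
    classes = {0: set(range(n))}
    refine(in_nbrs, colour, classes, [0])  # start from the unit colouring
    while len(classes) < n:
        # Hypothetical choice rule: lowest colour with >= 2 vertices, lowest vertex.
        c = min(k for k, cell in classes.items() if len(cell) >= 2)
        v = min(classes[c])
        new = len(classes)
        classes[c].discard(v)
        classes[new] = {v}
        colour[v] = new
        refine(in_nbrs, colour, classes, [new])  # only the new colour is pushed
    return colour
```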
We can now use the same argument as in the proof of Theorem 9 to show that the total complexity of all calls to Algorithm 1 (without the initialisation steps, as described above) is bounded by $O((n+m)\log n)$. Indeed, for every vertex $v \in V(G)$, if $R^v_1, \dots, R^v_q$ denote the refining colour classes $C_r$ with $v \in C_r$ that are considered throughout the entire computation, in chronological order, then again for all $i \in \{1, \dots, q-1\}$ it holds that $|R^v_{i+1}| \le \frac{1}{2}|R^v_i|$. If $v$ is the vertex that is individualised in Line 105 (of Algorithm 4), then this holds because the next refining colour class that contains $v$ has size one, whereas the previous colour class that contained $v$ had size at least two (because $v$ was chosen with a nonunique colour in Line 103). In all other cases, the argument given in the proof of Theorem 9 applies. Following that proof, this shows that the total complexity of all refining done in Algorithm 4 is $O((n+m)\log n)$.
It remains to bound the complexity of the other steps of Algorithm 4. As described above, the various sets are initialised only once, and the final colouring $\beta$ is computed only once, so this only adds a term $O(n)$ to the complexity. In addition, all steps in the while-loop (Line 100) other than the stable colouring computation in Line 106 can be done in constant time, since we do not actually copy the colouring (Line 101). For the selection of the vertex $v$ in Line 103, this claim is not entirely obvious, but during the computation one can maintain a doubly linked list that contains the colours of all colour classes of size at least two. This list can be updated in constant time whenever vertices are recoloured (so it does not change the total asymptotic complexity), and it can be used to select a vertex in Line 103 in constant time. The while-loop in Line 100 terminates after at most $n$ iterations. In total, this shows that Algorithm 4 has an implementation with complexity $O((n+m)\log n)$.

We remark that in practice, one might wish to use smarter methods to select the vertex $v$ to be individualised (Line 103), or more generally, to select the nontrivial colour class on which the recursive canonical labelling algorithm should branch. For instance, one can always branch on the smallest nontrivial colour class, or on the largest colour class. In that case, an efficient heap-based priority queue implementation (see e.g. [12]) can be used instead of a doubly linked list to keep track of the sizes of colour classes, to attain the above complexity.
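The doubly linked list of nontrivial colours mentioned in the proof can be sketched as below. The class name and methods are hypothetical; `prev`/`next` are stored in Python dictionaries keyed by colour, which gives expected constant time per operation, whereas an array indexed by colour would give worst-case constant time.

```python
class NonTrivialColours:
    """Doubly linked list of the colours whose class has size >= 2.

    add, remove and pick all take constant (expected) time.
    None marks the ends of the list.
    """

    def __init__(self):
        self.head = None
        self.prev = {}
        self.next = {}

    def __contains__(self, c):
        return c in self.next

    def add(self, c):
        if c in self:
            return
        self.prev[c] = None
        self.next[c] = self.head
        if self.head is not None:
            self.prev[self.head] = c
        self.head = c

    def remove(self, c):
        if c not in self:
            return
        p, q = self.prev.pop(c), self.next.pop(c)
        if p is None:
            self.head = q
        else:
            self.next[p] = q
        if q is not None:
            self.prev[q] = p

    def pick(self):
        """Some colour with a class of size >= 2 (None if the colouring is discrete)."""
        return self.head


def update(colours, c, size):
    """Call whenever class c changes size during refinement or individualisation."""
    if size >= 2:
        colours.add(c)
    else:
        colours.remove(c)
```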
Alternative Stability Criteria

We formulated our results only for digraphs, with stability defined only in terms of out-neighbours. We now summarise how our results should be modified to accommodate alternative stability criteria.
Theorem 11 For any undirected graph G on n vertices with m edges, in time O((n + m) log n) a canonical coarsest stable colouring can be computed.
Proof: For an undirected graph $G$, denote by $G^*$ the digraph with $V(G^*) = V(G)$ obtained by replacing every undirected edge by two directed edges, one in each direction. Observe that a colouring $\alpha$ is stable for $G$ if and only if it is stable for $G^*$, so we can use the fast implementation of Algorithm 1 on input $G^*$ to compute a coarsest stable colouring of $G$; since $G^*$ has $2m$ edges, this takes time $O((n+m)\log n)$. Next, observe that a bijection $h: V(G) \to V(H)$ is an isomorphism from $G$ to $H$ if and only if it is a (digraph) isomorphism from $G^*$ to $H^*$. It follows that the computed colouring is a canonical coarsest stable colouring.
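The reduction from $G$ to $G^*$ and the observation that stability is preserved can be checked on small examples with the following sketch. It assumes simple undirected graphs (no loops) given as edge lists on vertices $0, \dots, n-1$; the function names are hypothetical.

```python
def to_digraph(n, edges):
    """G*: replace every undirected edge {u, v} by the arcs (u, v) and (v, u)."""
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
        out[v].append(u)
    return out


def is_stable(out_nbrs, colour):
    """Vertices of the same colour must have the same number of
    out-neighbours in every colour class."""
    profile = {}
    for v, nbrs in enumerate(out_nbrs):
        counts = {}
        for w in nbrs:
            counts[colour[w]] = counts.get(colour[w], 0) + 1
        if profile.setdefault(colour[v], counts) != counts:
            return False
    return True
```

In $G^*$ the out-neighbours of $v$ are exactly its neighbours in $G$, so `is_stable(to_digraph(n, edges), colour)` is the undirected stability test for $G$.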
For a positive integer $p$, we define a $p$-edge coloured digraph $G$ to be a tuple $(V, E, c)$ where $(V, E)$ is a digraph that may have parallel edges and/or loops, and $c: E \to \{1, \dots, p\}$ is an edge colouring of $G$. For $e \in E$, we write $e = (u, v)$ to denote that $e$ is an edge from $u$ to $v$. For $j \in \{1, \dots, p\}$, $v \in V$ and $C \subseteq V$, let $d^+_j(v, C) = |\{e = (v, w) \in E \mid w \in C, c(e) = j\}|$ (the number of edges of colour $j$ leaving $v$ with head in $C$). A (vertex) $\ell$-colouring $\alpha$ of $G$ is called edge-colour stable if for all $u, v \in V$ with $\alpha(u) = \alpha(v)$, all $j \in \{1, \dots, p\}$ and all $i \in \{1, \dots, \ell\}$, it holds that $d^+_j(u, C^\alpha_i) = d^+_j(v, C^\alpha_i)$. [...] for all $j \in \{1, \dots, p\}$ and $u, v \in V$ (possibly the same), it holds that the number of edges of colour $j$ from $u$ to $v$ equals the number of edges of colour $j$ from $h(u)$ to $h(v)$. Using this notion of isomorphism, canonical colouring methods and canonical colourings for edge coloured digraphs are defined as before. We will now show that a colouring $\beta$ of $G$ is edge-colour stable if and only if there exists a stable colouring $\beta'$ for $G'$ that refines $\alpha$ and coincides with $\beta$ on $V$.
Consider an edge-colour stable colouring $\beta$ of $G$. We extend it to a colouring $\beta'$ of $G'$ as follows: to each new vertex $v_e$ that corresponds to an edge $e = (u, v)$, assign the tuple $(c(e), \beta(v))$. Extend $\beta$ by assigning new colours to the new vertices according to the lexicographic order of these tuples. (So two new vertices receive the same colour if and only if they are assigned the same tuple, and a new vertex and an original vertex never receive the same colour.) The resulting colouring $\beta'$ of $G'$ clearly refines $\alpha$ and is stable for $G'$: for every vertex colour $i$ used by $\beta$, vertex $u \in V$ and edge colour $j \in \{1, \dots, p\}$, the number $d^+_j(u, C^\beta_i)$ (with respect to $G$) equals the number of out-neighbours of $u$ in $G'$ that have the colour corresponding to the tuple $(j, i)$. For the new vertices of $G'$, the stability criterion follows easily.
For the other direction, consider a stable colouring $\beta'$ of $G'$ that refines $\alpha$, and define $\beta$ to be the restriction of $\beta'$ to $V$. We argue that $\beta$ is edge-colour stable for $G$. For two new vertices $v_e$ and $v_f$ of $G'$, with respective out-neighbours $x$ and $y$, we have that $\beta'(v_e) = \beta'(v_f)$ implies $\beta'(x) = \beta'(y)$ and $\alpha(v_e) = \alpha(v_f)$, so $c(e) = c(f)$. This can be used to conclude that for any two vertices $u, v \in V$, colour $i$ and edge colour $j$, if $\beta(u) = \beta(v)$, then $d^+_j(u, C^\beta_i) = d^+_j(v, C^\beta_i)$.

It follows that a coarsest edge-colour stable colouring $\beta$ of $G$ corresponds to a coarsest stable colouring $\beta'$ of $G'$ that refines $\alpha$. Since we can compute such a colouring $\beta'$ in a canonical way, we can compute such a colouring $\beta$ in a canonical way (Theorem [...]).

We remark that for any class of edge-coloured digraphs where the number of edges is polynomially bounded in the number of vertices (so they satisfy $m \in O(n^d)$ for a constant $d$), we have $\log(n+m) \in O(\log n^d) = O(\log n)$. So for such a graph class, the above result shows that a canonical coarsest edge-colour stable colouring can again be computed in time $O((n+m)\log n)$.
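The definition of $d^+_j(v, C)$ and of edge-colour stability translates directly into a small checker, useful for testing examples by hand. Edges are given as (tail, head, colour) triples so that parallel edges and loops are allowed; the function names are hypothetical and the quadratic loops are chosen for clarity, not speed.

```python
def d_plus(arcs, u, cls, j):
    """d^+_j(u, C): number of edges of colour j leaving u with head in C.
    arcs: list of (tail, head, colour) triples."""
    return sum(1 for (t, h, c) in arcs if t == u and h in cls and c == j)


def is_edge_colour_stable(n, arcs, colour, p):
    """Check the definition literally for vertices 0..n-1 and edge colours 1..p."""
    classes = {}
    for v in range(n):
        classes.setdefault(colour[v], set()).add(v)
    for u in range(n):
        for v in range(u + 1, n):
            if colour[u] != colour[v]:
                continue
            for j in range(1, p + 1):
                for cls in classes.values():
                    if d_plus(arcs, u, cls, j) != d_plus(arcs, v, cls, j):
                        return False
    return True
```

For example, two vertices with an edge into the same vertex cannot share a colour if those two edges have different edge colours.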
The above theorem can be used for various stronger isomorphism tests. We now give details for one of these. For digraphs, we defined stability considering only out-neighbourhoods. Nevertheless, an isomorphism $h$ between two digraphs not only maps the out-neighbourhood of a vertex $v$ bijectively to the out-neighbourhood of $h(v)$, but does the same with the in-neighbourhoods. So for the purpose of digraph isomorphism testing, the following stronger stability criterion is more useful: a $k$-colouring $\alpha$ of a digraph $G$ is bi-stable if for every pair of vertices $u, v \in V(G)$ with $\alpha(u) = \alpha(v)$ and every colour $c \in \{1, \dots, k\}$, both $|N^+(u) \cap C^\alpha_c| = |N^+(v) \cap C^\alpha_c|$ and $|N^-(u) \cap C^\alpha_c| = |N^-(v) \cap C^\alpha_c|$ hold, where $N^-(v)$ denotes the set of in-neighbours of $v$.

Theorem 13 For any digraph $G$ on $n$ vertices with $m$ edges, in time $O((n+m)\log n)$ a canonical coarsest bi-stable colouring can be computed.

Proof: Define a 2-edge coloured digraph $G' = (V, E, c)$ on the same vertex set as $G$ as follows: for every edge $(u, v) \in E(G)$, add an edge $e = (u, v)$ to $E$ with $c(e) = 1$, and an edge $f = (v, u)$ to $E$ with $c(f) = 2$. (Note that this may introduce parallel edges.) Observe that a colouring $\alpha: V \to \{1, \dots, k\}$ is edge-colour stable for $G'$ if and only if it is bi-stable for $G$, and that a canonical colouring method for $G'$ is a canonical colouring method for $G$. So Theorem 12 can be applied. We use that $G'$ has $2m \in O(n^2)$ edges, which yields the complexity bound $O((n+m)\log n)$.
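The claimed equivalence between bi-stability of $G$ and edge-colour stability of the 2-edge coloured digraph $G'$ can be tested exhaustively on a small digraph. The sketch below builds $G'$ as described in the proof and compares both criteria for every colouring of three vertices with at most three colours; all names are hypothetical.

```python
def to_two_edge_coloured(arcs):
    """G': for each arc (u, v) of G, an edge (u, v) of colour 1 and an edge (v, u) of colour 2."""
    result = []
    for u, v in arcs:
        result.append((u, v, 1))
        result.append((v, u, 2))
    return result


def is_bistable(n, arcs, colour):
    """Same colour => same numbers of out- AND in-neighbours in every colour class."""
    def profile(v):
        out_c, in_c = {}, {}
        for a, b in arcs:
            if a == v:
                out_c[colour[b]] = out_c.get(colour[b], 0) + 1
            if b == v:
                in_c[colour[a]] = in_c.get(colour[a], 0) + 1
        return out_c, in_c

    first = {}  # profile of the first vertex seen with each colour
    return all(first.setdefault(colour[v], profile(v)) == profile(v) for v in range(n))


def is_edge_colour_stable(n, coloured_arcs, colour):
    """Same colour => same number of colour-j edges into every colour class, for all j."""
    def profile(v):
        counts = {}
        for a, b, j in coloured_arcs:
            if a == v:
                counts[(j, colour[b])] = counts.get((j, colour[b]), 0) + 1
        return counts

    first = {}
    return all(first.setdefault(colour[v], profile(v)) == profile(v) for v in range(n))
```

Colour-1 edges of $G'$ leaving $v$ correspond to out-neighbours of $v$ in $G$, and colour-2 edges leaving $v$ correspond to in-neighbours, which is why the two checks agree.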

Complexity Lower Bound
We shall prove our lower bound for undirected graphs; this makes it as general as possible. The cost of a refining operation $(R, S)$ in a graph $G$ is $\mathrm{cost}(R, S) := \sum_{v \in S} |N(v) \cap R|$. This is basically the number of edges between $R$ and $S$, except that edges with both ends in $R \cap S$ are counted twice. For a partition $\pi$ that admits a refining operation $(R, S)$, denote by $\pi(R, S)$ the partition that results from this operation.
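The cost of a refining operation can be computed as in the sketch below. It assumes the formula $\mathrm{cost}(R, S) = \sum_{v \in S} |N(v) \cap R|$, which matches the description "edges between $R$ and $S$, with edges inside $R \cap S$ counted twice"; the graph is given as a list of neighbour sets, and the function name is hypothetical.

```python
def cost(adj, R, S):
    """sum over v in S of |N(v) ∩ R|.

    An edge {u, v} with u in R and v in S is counted once from v; if both ends
    lie in R ∩ S it is counted from both ends, i.e. twice.
    adj: list of neighbour sets of an undirected graph.
    """
    return sum(len(adj[v] & R) for v in S)
```

For a triangle, $\mathrm{cost}(V, V) = 6$ (each of the three edges counted twice), while $\mathrm{cost}(\{0\}, \{1, 2\}) = 2$.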
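Definition 14 Let $G = (V, E)$ be a graph, and $\pi$ be a partition of $V$.
- If $\pi$ is stable, then $\mathrm{cost}(\pi) := 0$.
- Otherwise, $\mathrm{cost}(\pi) := \min_{R,S} \big(\mathrm{cost}(\pi(R, S)) + \mathrm{cost}(R, S)\big)$, where the minimum is taken over all effective refining operations $(R, S)$ that can be applied to $\pi$.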
Note that this is well-defined; if π is unstable, then there exists at least one effective elementary refining operation (R, S), and for any such operation, |π(R, S)| > |π|.We can now formulate the main result of this section.
Note that this theorem implies a complexity lower bound for all partition-refinement-based algorithms for colour refinement, as discussed in the introduction. We will first prove some basic observations related to the above definitions, then give the construction of the graph, and finally prove Theorem 15.
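For very small graphs, $\mathrm{cost}(\pi)$ from Definition 14 can be computed by exhaustive search, which helps to build intuition for the lower bound. The sketch below only tries elementary refining operations (both $R$ and $S$ are cells), which Proposition 19 shows is sufficient, and splits the cell $S$ by the number of neighbours in $R$; it takes exponential time and is purely illustrative. Names are hypothetical.

```python
from functools import lru_cache


def min_cost(adj, partition):
    """Brute-force cost(pi) (Definition 14) for a tiny undirected graph.
    adj: list of neighbour sets; partition: iterable of vertex sets."""

    def apply(pi, R, S):
        # Elementary refining operation (R, S): split cell S by |N(v) ∩ R|.
        parts = {}
        for v in S:
            parts.setdefault(len(adj[v] & R), set()).add(v)
        return (pi - {S}) | {frozenset(p) for p in parts.values()}

    @lru_cache(maxsize=None)
    def best(pi):
        options = []
        for R in pi:
            for S in pi:
                new = apply(pi, R, S)
                if new != pi:  # only effective operations count
                    options.append(best(new) + sum(len(adj[v] & R) for v in S))
        return min(options) if options else 0  # no effective operation: pi is stable

    return best(frozenset(frozenset(c) for c in partition))
```

On the path with four vertices, the only effective operation on the unit partition is $(V, V)$ with cost $\sum_v \deg(v) = 6$, and it already produces the stable partition $\{\{0,3\},\{1,2\}\}$.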

Basic Observations
We start with two basic properties of stable partitions.The first proposition follows easily from the definitions.
Proposition 16 Let $G = (V, E)$ be a graph, and $\pi$ be a stable partition of $V$. For any $\pi$-closed subset $S \subseteq V$, $\pi[S]$ is a stable partition for $G[S]$.
Proposition 17 Let $G = (V, E)$ be a graph, and $\pi$ be a stable partition of $V$. For any $\pi$-closed set $S$ and vertices $u, v \in V$: if the distance from $u$ to $S$ is different from the distance from $v$ to $S$, then $u \not\approx_\pi v$.
Proof: Denote the distance from a vertex $x$ to $S$ by $\mathrm{dist}(x, S)$. W.l.o.g. we may assume that $\mathrm{dist}(u, S) < \mathrm{dist}(v, S)$, so in particular $\mathrm{dist}(u, S)$ is finite. We prove the statement by induction over $\mathrm{dist}(u, S)$.
For a partition $\pi$ of $V$, denote by $\pi^\infty$ the coarsest stable partition of $V$ that refines $\pi$.
Proof: Let (R, S) be a refining operation that can be applied to π, which yields π ′ .Then it can be observed that the operation (R, S) can also be applied to ρ, and that for the resulting partition ρ ′ , it holds again that π An induction proof based on this observation shows that a minimum cost sequence of refining operations that refines π to π ∞ can also be applied to ρ, to yield the stable partition π ∞ , at the same cost.Therefore, cost(π) ≥ cost(ρ).
A refining operation (R, S) on π is elementary if both R ∈ π and S ∈ π.The next proposition shows that adding the word 'elementary' in Definition 14 yields an equivalent definition.
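Proposition 19 Let $\pi$ be an unstable partition of $V(G)$. Then $\mathrm{cost}(\pi) = \min_{R,S} \big(\mathrm{cost}(\pi(R, S)) + \mathrm{cost}(R, S)\big)$, where the minimum is taken over all effective elementary refining operations $(R, S)$ that can be applied to $\pi$.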
Proof: Let $(R, S)$ be a nonelementary refining operation for $\pi$, and let $\rho_1$ be the result of applying $(R, S)$ to $\pi$. We shall prove that there is a sequence of elementary refining operations of total cost at most $\mathrm{cost}(R, S)$ that, when applied to $\pi$, yields a partition $\rho_2$ that refines $\rho_1$. The claim then follows by Proposition 18. Suppose that $R$ consists of the cells $R_1, \dots, R_q$ and $S$ consists of the cells $S_1, \dots, S_p$. We apply the elementary refining operations $(R_i, S_j)$ for all $i \in \{1, \dots, q\}$, $j \in \{1, \dots, p\}$ in an arbitrary order, and let $\rho_2$ be the resulting partition. Since the cells are pairwise disjoint, the cost of these elementary refinements is $\sum_{i=1}^{q} \sum_{j=1}^{p} \mathrm{cost}(R_i, S_j) = \mathrm{cost}(R, S)$. It is easy to see that $\rho_2$ refines $\rho_1$. Indeed, if $u, v \in S$ belong to the same class of $\rho_2$, then they belong to the same class $S_j$, and for all classes $R_i$ they have the same number of neighbours in $R_i$. Hence they have the same number of neighbours in $R = \bigcup_i R_i$, which means that they belong to the same class of $\rho_1$.
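The decomposition used in this proof can be checked on a small example: apply one nonelementary operation $(R, S)$, then apply all elementary operations $(R_i, S_j)$, and compare the results and costs. The sketch assumes $\mathrm{cost}(R, S) = \sum_{v \in S} |N(v) \cap R|$ and lets an operation split every current cell contained in $S$ by the number of neighbours in $R$; all names and the example path on five vertices are illustrative choices.

```python
def apply_op(pi, adj, R, S):
    """Refining operation (R, S): split every current cell contained in S by
    the number of neighbours in R (R and S are unions of cells)."""
    result = set()
    for cell in pi:
        if cell <= S:
            parts = {}
            for v in cell:
                parts.setdefault(len(adj[v] & R), set()).add(v)
            result |= {frozenset(p) for p in parts.values()}
        else:
            result.add(cell)
    return frozenset(result)


def cost(adj, R, S):
    return sum(len(adj[v] & R) for v in S)


def refines(finer, coarser):
    return all(any(a <= b for b in coarser) for a in finer)


def elementary_decomposition(pi, adj, R_cells, S_cells):
    """Apply (R_i, S_j) for all cells R_i of R and S_j of S; return result and total cost."""
    rho, total = pi, 0
    for Ri in R_cells:
        for Sj in S_cells:
            rho = apply_op(rho, adj, Ri, Sj)
            total += cost(adj, Ri, Sj)
    return rho, total
```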

Construction of the Graph
For $k \in \mathbb{N}$, denote $B_k = \{0, \dots, 2^k - 1\}$. For $\ell \in \{0, \dots, k\}$ and $q \in \{0, \dots, 2^\ell - 1\}$, the subset $B^\ell_q = \{q 2^{k-\ell}, \dots, (q+1) 2^{k-\ell} - 1\}$ is called the $q$-th binary block of level $\ell$. Analogously, for any set of vertices with indices in $B_k$, such as $X = \{x_i \mid i \in B_k\}$, we also consider binary blocks $X^\ell_q = \{x_i \mid i \in B^\ell_q\}$. For such a set $X$, a partition $\pi$ of $X$ into binary blocks is a partition where every $S \in \pi$ is a binary block. A key fact about binary blocks that we will often use is that for any $\ell$ and $q$, $B^\ell_q = B^{\ell+1}_{2q} \cup B^{\ell+1}_{2q+1}$.

For every integer $k \ge 2$, we will construct a graph $G_k$. (An example for $k = 3$ is given in Figure 1.) At its core, this graph consists of the vertex sets $X = \{x_i \mid i \in B_k\}$, $\mathcal{X} = \{x^j_i \mid i \in B_k, j \in \{1, \dots, k\}\}$, $Y = \{y_i \mid i \in B_k\}$ and $\mathcal{Y} = \{y^j_i \mid i \in B_k, j \in \{1, \dots, k\}\}$. Every $x_i$ is adjacent to $x^j_i$ for all $j \in \{1, \dots, k\}$, and every $y_i$ is adjacent to all $y^j_i$. Furthermore, for all $i, j_1, j_2$ there is an edge between $x^{j_1}_i$ and $y^{j_2}_i$. (For $\mathcal{X}$, binary blocks are subsets of the form $\mathcal{X}^\ell_q := \{x^j_i \mid i \in B^\ell_q, j \in \{1, \dots, k\}\}$, and for $\mathcal{Y}$ the definition is analogous.) We add gadgets to the graph to ensure that any sequence of refining operations behaves as follows. After the first step, which distinguishes vertices according to their degrees, $X$ and $Y$ are cells of the resulting partition. Next, $X$ splits up into two binary blocks $X^1_0$ and $X^1_1$ of equal size. This causes $\mathcal{X}$ to split up accordingly into $\mathcal{X}^1_0$ and $\mathcal{X}^1_1$. One of these cells will be used to halve $\mathcal{Y}$ in the same way. This refining operation $(R, S)$ is expensive because $[R, S]$ contains half of the edges between $\mathcal{X}$ and $\mathcal{Y}$. Next, $Y$ can be split up into $Y^1_0$ and $Y^1_1$. Once this happens, there is a gadget AND$_1$ that causes the two cells $X^1_0, X^1_1$ to split up into the four cells $X^2_q$, for $q = 0, \dots, 3$.
Again, this causes cells in $\mathcal{X}$, $\mathcal{Y}$ and $Y$ to split up in the same way, and to achieve this, half of the edges between $\mathcal{X}$ and $\mathcal{Y}$ have to be considered. The next gadget AND$_2$ ensures that if both cells of $Y$ are split, then the four cells of $X$ can be halved again, and so on. In general, we design a gadget AND$_\ell$ of level $\ell$ that ensures that if $Y$ is partitioned into $2^\ell$ binary blocks of equal size, then $X$ can be partitioned into $2^{\ell+1}$ binary blocks of equal size. By halving all the cells of $X$ and $Y$ $k = \Theta(\log n)$ times (with $n = |V(G_k)|$), this refinement process ends up with a discrete colouring of these vertices. Since every iteration uses half of the edges between $\mathcal{X}$ and $\mathcal{Y}$ (of which there are $\Theta(m)$), we get the cost lower bound of $\Omega(m \log n)$.

We now define these gadgets in more detail. For every integer $\ell \ge 1$, we define a gadget AND$_\ell$, which consists of a graph $G$ together with two out-terminals $a_0, a_1$, and an ordered sequence of [...] the graph $G$ is identical to the construction of Cai, Fürer and Immerman [7]. (See Figure 2; the out-terminals $a_0, a_1$ and in-terminals $b_0, \dots, b_3$ are indicated.) For $\ell \ge 3$, AND$_\ell$ is obtained by taking one copy $G^*$ of an AND$_2$-gadget and two copies $G'$ and $G''$ of an AND$_{\ell-1}$-gadget, and adding four edges to connect the two pairs of in-terminals of $G^*$ with the pairs of out-terminals of $G'$ and $G''$, respectively. As out-terminals of the resulting gadget we choose the out-terminals of $G^*$. The in-terminal sequence is obtained by concatenating the sequences of in-terminals of $G'$ and $G''$. (See Figure 3 for an example of AND$_3$.) For any AND$_\ell$-gadget $G$ with in-terminals $b_0, \dots, b_{2^\ell-1}$, the in-terminal pairs are the pairs $b_{2p}$ and $b_{2p+1}$, for all $p \in \{0, \dots, 2^{\ell-1} - 1\}$.
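The binary blocks $B^\ell_q$ and the key identity $B^\ell_q = B^{\ell+1}_{2q} \cup B^{\ell+1}_{2q+1}$ can be made concrete with a one-line helper; `binary_block` is a hypothetical name.

```python
def binary_block(k, level, q):
    """B^level_q = {q * 2^(k-level), ..., (q + 1) * 2^(k-level) - 1}, a subset of B_k."""
    size = 2 ** (k - level)
    return set(range(q * size, (q + 1) * size))
```

Each block of level $\ell$ is split into exactly two blocks of level $\ell+1$, so after $k$ halvings every block is a singleton.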
The graph $G_k$ is now constructed as follows. Start with the vertex sets $X$, $\mathcal{X}$, $Y$ and $\mathcal{Y}$, and the edges between them, as defined above. For every $\ell \in \{1, \dots, k-1\}$, we add a copy $G$ of an AND$_\ell$-gadget to the graph. Denote the out- and in-terminals of $G$ by $a_0, a_1$ and $b_0, \dots, b_{2^\ell-1}$, respectively.
- For $i = 0, 1$ and all relevant $q$: we add edges from $a_i$ to every vertex in $X^{\ell+1}_{2q+i}$.
- For every $i$, we add edges from $b_i$ to every vertex in $Y^\ell_i$.

Finally, we add a starting gadget to the graph, consisting of three vertices $v_0, v_1, v_2$, the edge $v_1 v_2$, and edges [...]

Proof: An easy induction proof shows that the AND$_\ell$-gadget has $O(2^\ell)$ vertices and edges. So all AND$_\ell$-gadgets together, for $\ell \in \{1, \dots, k-1\}$, have $O(2^k)$ vertices and edges. Therefore, the bounds on the total number of vertices and edges of $G_k$ are dominated by the number of vertices and edges in $G_k[\mathcal{X} \cup \mathcal{Y}]$, which are $k 2^{k+1}$ and $k^2 2^k$, respectively.
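The counts $k 2^{k+1}$ and $k^2 2^k$ for $G_k[\mathcal{X} \cup \mathcal{Y}]$ can be confirmed by building the core of $G_k$ without gadgets. The vertex labels used here are hypothetical: `("x", i)` for $x_i$, `("xs", i, j)` for $x^j_i$, and analogously `("y", i)`, `("ys", i, j)`.

```python
def core_graph(k):
    """Edges of the core of G_k (no gadgets), as pairs of hypothetical labels."""
    edges = set()
    for i in range(2 ** k):
        for j in range(1, k + 1):
            edges.add((("x", i), ("xs", i, j)))  # x_i - x^j_i
            edges.add((("y", i), ("ys", i, j)))  # y_i - y^j_i
        for j1 in range(1, k + 1):
            for j2 in range(1, k + 1):
                edges.add((("xs", i, j1), ("ys", i, j2)))  # x^{j1}_i - y^{j2}_i
    return edges


def calligraphic_counts(k):
    """Number of vertices and edges of the subgraph induced by the x^j_i and y^j_i."""
    edges = core_graph(k)
    inner = {v for e in edges for v in e if v[0] in ("xs", "ys")}
    inner_edges = [e for e in edges if e[0][0] == "xs" and e[1][0] == "ys"]
    return len(inner), len(inner_edges)
```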
We now state and prove the key property of AND$_\ell$-gadgets. This requires the following definitions. For a graph $G = (V, E)$, [...] If $\psi$ is a partition of a subset $S \subseteq V$, then for short we say that a partition $\rho$ of $V$ refines $\psi$ if it refines $\psi \cup \{V \setminus S\}$. We say that $\rho$ agrees with $\psi$ if $\rho[S] = \psi$. (So if $V \setminus S \neq \emptyset$, one can choose $\rho$ such that it agrees with $\psi$ but does not refine $\psi$.) For two graphs $G$ and $H$, we denote by $G \uplus H$ the graph obtained by taking the disjoint union of $G$ and $H$. We say that a partition $\pi$ of [...] This is often used for the case where $V_1 = N(u)$ and $V_2 = N(v)$ for two vertices $u$ and $v$, to conclude that if $\pi$ is stable, then $u \not\approx_\pi v$. If $V_1 = \{x\}$ and $V_2 = \{y\}$, then we also say that $\pi$ distinguishes $x$ from $y$.
Lemma 21 Let $G$ be an AND$_\ell$-gadget with in-terminals $B = \{b_0, \dots, b_{2^\ell-1}\}$ and out-terminals $a_0, a_1$. Let $\psi$ be a partition of $B$ into binary blocks, and let $\rho$ be the coarsest stable partition of $V(G)$ that refines $\psi$. Then $\rho$ agrees with $\psi$. Furthermore, $\rho$ distinguishes $a_0$ from $a_1$ if and only if $\psi$ distinguishes all in-terminal pairs.

Proof: We prove the statement by induction over $\ell$. For $\ell = 1$, the statement is trivial. Now suppose $\ell = 2$. We only consider partitions of $\{b_0, \dots, b_3\}$ into binary blocks. Because of the automorphisms of this gadget, it suffices to consider the following four partitions for $\psi$. For each of them, a corresponding partition $\rho$ is given; it can be verified that $\rho$ is the coarsest stable partition of $V(\mathrm{AND}_\ell)$ that refines $\psi$. (The nonterminal vertices are labelled $c_0, \dots, c_3$, as shown in Figure 2.) We see that in all four cases, $\rho$ agrees with $\psi$ on $B$. Furthermore, $\rho$ distinguishes the out-terminals if and only if $\psi$ distinguishes all in-terminal pairs (which is only the case for the last $\psi$).

Now suppose $\ell \ge 3$. Recall that an AND$_\ell$-gadget $H$ is obtained by taking two copies $G'$ and $G''$ of an AND$_{\ell-1}$-gadget and, informally, putting a copy $G^*$ of an AND$_2$-gadget on top of those. Any partition $\psi$ of the in-terminal set $B$ of $H$ into binary blocks corresponds to partitions $\psi'$ and $\psi''$ of the in-terminal sets $B'$ and $B''$ of $G'$ and $G''$, respectively, again into binary blocks. So by induction, we have coarsest stable partitions $\rho'$ and $\rho''$ of $V(G')$ and $V(G'')$ that refine $\psi'$ and $\psi''$ and agree with them on $B'$ and $B''$, respectively. Together, this yields a partition $\pi$ of $V(G') \cup V(G'')$, which is stable for $G' \uplus G''$, refines $\psi$, and agrees with $\psi$ on $B$.
(To be precise: if $\psi$ is not the unit partition, then we can simply take $\pi = \rho' \cup \rho''$, because $\psi$ is a partition into binary blocks and thus distinguishes every single in-terminal of $G'$ from every single in-terminal of $G''$. Otherwise, every set in $\pi$ should be the union of the two corresponding sets in $\rho'$ and $\rho''$.) Then $\pi$ gives a partition of the out-terminals of $G'$ and $G''$, which yields a matching partition $\psi^*$ of the in-terminals $B^*$ of $G^*$, again into binary blocks. Applying the induction hypothesis to $G^*$, we obtain a coarsest stable partition $\rho^*$ of $V(G^*)$ that refines and agrees with $\psi^*$. Combining $\pi$ and $\rho^*$ yields a stable partition $\rho$ of the vertices $V(H)$ of the entire gadget.
Applying the induction hypothesis to $G'$ and $G''$ shows that at least one in-terminal pair of $G^*$ is not distinguished by $\psi^*$ if and only if at least one in-terminal pair of $G'$ or $G''$ is not distinguished by $\psi'$ or $\psi''$, respectively. Applying the induction hypothesis to $G^*$ then shows that $\rho$ does not distinguish the out-terminals of $H$ if $\psi$ does not distinguish at least one in-terminal pair of $H$. This then also holds for the coarsest stable partition of $V(H)$ that refines $\psi$.
Finally, let $\psi$ be a partition into binary blocks of the in-terminals $B$ of $H$ that distinguishes every pair, and let $\rho$ be a coarsest stable partition that refines $\psi$. We prove that $\rho$ also distinguishes $a_0$ from $a_1$. By definition, $\rho$ distinguishes any vertex in $B$ from any vertex not in $B$. We conclude that for any two vertices $u, v \in V(H)$, if they have different distances to $B$, then $u \not\approx_\rho v$ (Proposition 17). So by Proposition 16, $\rho$ induces stable partitions $\rho^*$ and $\pi$ for $G^*$ and $G' \uplus G''$, respectively. The graphs $G'$ and $G''$ are components of $G' \uplus G''$, so we conclude that $\rho$ induces stable partitions $\rho'$ and $\rho''$ for $G'$ and $G''$, respectively. By induction, it follows that $\rho'$ and $\rho''$ distinguish the out-terminals of $G'$ and $G''$, respectively. (If this holds for the coarsest stable partition that refines $\psi'$, then it holds for any stable partition that refines $\psi'$, and similarly for $\psi''$.) Then $\psi^* := \rho[B^*]$ (where $B^*$ again denotes the in-terminal set of $G^*$) distinguishes all in-terminal pairs of $G^*$. So by induction, $\rho$ distinguishes $a_0$ from $a_1$.
The following Corollary follows immediately from Lemma 21.
Corollary 22 Let $\pi$ be a stable partition for an AND-gadget $G$ such that $\psi = \pi[B]$ is a partition of the in-terminals $B$ into binary blocks, and such that $B$ is $\pi$-closed. If $\pi$ does not distinguish the out-terminals, then at least one in-terminal pair is not distinguished.
Proof: [...] Since $\pi$ is stable, it refines the coarsest stable partition $\rho$ of $V(G)$ that refines $\psi$. Now apply Lemma 21.

Cost Lower Bound Proof
Intuitively, at level $\ell$ of the refinement process, the current partition contains all blocks $\mathcal{X}^{\ell+1}_q$ of level $\ell+1$ and, for all $0 \le q < 2^\ell$, either $\mathcal{Y}^\ell_q$ or the two blocks $\mathcal{Y}^{\ell+1}_{2q}$ and $\mathcal{Y}^{\ell+1}_{2q+1}$. In this situation one can split up the blocks $\mathcal{Y}^\ell_q$ into blocks $\mathcal{Y}^{\ell+1}_{2q}$ and $\mathcal{Y}^{\ell+1}_{2q+1}$ using either refining operation $(\mathcal{X}^{\ell+1}_{2q}, \mathcal{Y}^\ell_q)$ or $(\mathcal{X}^{\ell+1}_{2q+1}, \mathcal{Y}^\ell_q)$. These operations both have cost $2^{k-(\ell+1)} k^2$, and refining all $2^\ell$ cells $\mathcal{Y}^\ell_q$ in this way costs $2^\ell \cdot 2^{k-(\ell+1)} k^2 = 2^{k-1} k^2$. Once $\mathcal{Y}$ is partitioned into binary blocks of level $\ell+1$, we can partition $\mathcal{X}$ into blocks of level $\ell+2$ (using the AND$_{\ell+1}$-gadget), and proceed in the same way. Since there are $k$ such refinement levels, we can lower bound the total cost of refining the graph by $k \cdot 2^{k-1} k^2 = 2^{k-1} k^3 = \Omega(m \log n)$ (since $m = \Theta(k^2 2^k)$ and $\log n = \Theta(k)$), and are done. What remains to show is that applying the refining operations in this specific way is the only way to obtain a stable partition. To formalise this, we introduce a number of partitions of $V(G_k)$ that are stable with respect to the (spanning) subgraph $G'_k = (V(G_k), E(G_k) \setminus [\mathcal{X}, \mathcal{Y}])$ and that partition $\mathcal{X}$ and $\mathcal{Y}$ into binary blocks. (For disjoint vertex sets $S, T$, we denote $[S, T] = \{uv \in E(G) \mid u \in S, v \in T\}$.) So on $G_k$, these partitions can only be refined using operations $(R, S)$ where $R$ is a binary block of $\mathcal{X}$ and $S$ is a binary block of $\mathcal{Y}$.
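The arithmetic behind the $2^{k-1}k^3$ bound can be checked directly on the core of $G_k$. The sketch below computes, for every level $\ell$, the total cost of the operations $(\mathcal{X}^{\ell+1}_{2q}, \mathcal{Y}^\ell_q)$ for all $q < 2^\ell$, using only the $\mathcal{X}$-$\mathcal{Y}$ edges ($x^{j_1}_i$ and $y^{j_2}_{i'}$ are adjacent exactly when $i = i'$) and the cost $\sum_{v \in S} |N(v) \cap R|$. Vertex labels and function names are hypothetical.

```python
def calligraphic_block(k, level, q, side):
    """side is "xs" or "ys": the vertices side^j_i with i in B^level_q and j in 1..k."""
    size = 2 ** (k - level)
    return {(side, i, j) for i in range(q * size, (q + 1) * size) for j in range(1, k + 1)}


def level_cost(k, level):
    """Total cost of halving every block Y^level_q with (X^{level+1}_{2q}, Y^level_q).

    In the core graph, x^{j1}_i and y^{j2}_{i'} are adjacent iff i == i', so
    sum over v in S of |N(v) ∩ R| counts the pairs (u in R, v in S) with equal index i.
    """
    total = 0
    for q in range(2 ** level):
        R = calligraphic_block(k, level + 1, 2 * q, "xs")
        S = calligraphic_block(k, level, q, "ys")
        total += sum(1 for (_, i, _) in S for (_, i2, _) in R if i == i2)
    return total
```

Every level costs $2^{k-1}k^2$, half of the $k^2 2^k$ edges between $\mathcal{X}$ and $\mathcal{Y}$, and the $k$ levels together cost $2^{k-1}k^3$.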
Definition 23 For any $\ell \in \{0, \dots, k-1\}$ and nonempty set $Q \subseteq B_\ell$, we denote by $\tau_{Q,\ell}$ the partition of $X \cup Y$ that contains the cells
- $X^{\ell+1}_q$ for all $q \in B_{\ell+1}$,
- $Y^\ell_q$ for all $q \in Q$, and
- both $Y^{\ell+1}_{2q}$ and $Y^{\ell+1}_{2q+1}$ for all $q \in B_\ell \setminus Q$.
[...] We now show that for every $\ell$ and $Q$, there is also a stable partition of $G'_k$ that partitions $X$ and $Y$ as prescribed by the above definition. In particular, this holds for $\pi_{Q,\ell}$.
Proof: We construct a partition $\rho$ of $V(G_k) = V(G'_k)$ that is stable on $G'_k$ and agrees with $\tau_{Q,\ell}$. So we start with $\rho = \tau_{Q,\ell}$. For every cell $X^{\ell+1}_q$ in $\tau_{Q,\ell}$, we add the cell $X^{\ell+1}_q$ to $\rho$. For every cell $Y^m_q$ in $\tau_{Q,\ell}$ ($\ell \le m \le \ell+1$), we add the cell $Y^m_q$ to $\rho$. Then we add the cells $\{v_0\}$, $\{v_1\}$ and $\{v_2\}$. For every AND$_p$-gadget $G$ of $G_k$ (with in-terminals adjacent to $Y$ and out-terminals adjacent to $X$), we define a partition $\psi$ of the in-terminals $B$ as follows: for [...]) if and only if $\ell \ge p$ holds, or both $\ell = p-1$ and $q \in Q$ hold. Now we extend $\rho$ by adding all cells of the coarsest stable partition of the AND$_p$-gadget $G$ that refines $\psi$. By Lemma 21, this partition distinguishes the out-terminals of $G$ if and only if $\ell \ge p$ (since $Q$ is nonempty). Extending $\rho$ in this way for every AND-gadget yields the final partition $\rho$ of $V(G_k)$. By definition, $\rho$ agrees with $\tau_{Q,\ell}$. From the construction, the stability condition is easily verified for almost all cells of $\rho$. Only cells $\{a_0, a_1\} \in \rho$ consisting of out-terminals of AND$_p$-gadgets need to be considered in more detail. As noted before, such cells only occur when $p \ge \ell+1$. Then for every integer $q$ we have $X^{p+1}_{2q} \cup X^{p+1}_{2q+1} = X^p_q \subseteq X^{\ell+1}_{q'} \in \rho$ (for some value $q'$). Since $a_0$ is adjacent to every $X^{p+1}_{2q}$ and $a_1$ is adjacent to every $X^{p+1}_{2q+1}$, it follows that $N(a_0)$ and $N(a_1)$ are not distinguished by $\rho$. Therefore, $\rho$ is stable for $G'_k$. Then the coarsest stable partition $\pi_{Q,\ell}$ that refines $\tau_{Q,\ell}$ also agrees with $\tau_{Q,\ell}$.
Since $\pi_{Q,\ell}$ is stable on $G'_k$, any effective refining operation (with respect to $G_k$) must involve the edges between $\mathcal{X}$ and $\mathcal{Y}$. Since $\pi_{Q,\ell}$ partitions $\mathcal{X}$ and $\mathcal{Y}$ as prescribed by $\tau_{Q,\ell}$, we conclude that any effective elementary refining operation has the form described in the following corollary. Recall that a refining operation $(R, S)$ for a partition $\pi$ is elementary if both $R$ and $S$ are classes of $\pi$, and that by Proposition 19 it suffices to consider elementary refining operations.
Corollary 25 Let $(R, S)$ be an effective elementary refining operation on $\pi_{Q,\ell}$. Then for some $q \in Q$, $R = \mathcal{X}^{\ell+1}_{2q}$ or $R = \mathcal{X}^{\ell+1}_{2q+1}$, and $S = \mathcal{Y}^\ell_q$. The cost of this operation is $k^2 2^{k-(\ell+1)}$.

This motivates the following definition: for $q \in Q$, we denote by $r_q(\pi_{Q,\ell})$ the partition of $V(G_k)$ that results from the above refining operation. (Both choices of $R$ yield the same result.)

Lemma 26 For every $\ell \in \{0, \dots, k-1\}$, nonempty $Q \subseteq B_\ell$ and $q \in Q$: $r_q(\pi_{Q,\ell})$ [...] $\pi_{B_{\ell+1},\ell+1}$, and [...]

Proof: Choose $Q'$ and $\ell'$ satisfying one of the conditions (i.e. $Q' = B_{\ell+1}$ and [...] also a stable partition that refines $\tau_{Q,\ell}$). If we now obtain a partition $\rho$ from $\pi_{Q,\ell}$ by splitting up one cell such that the only vertex pairs $u, v$ with $u \approx_{\pi_{Q,\ell}} v$ but $u \not\approx_\rho v$ are vertex pairs with $u \not\approx_{\pi_{Q',\ell'}} v$, then clearly the relation $\rho$ [...] $\pi_{Q',\ell'}$ still holds. This is exactly how $r_q(\pi_{Q,\ell})$ is obtained.
Proof: First, we note that by considering the various vertex degrees and using Proposition 17, one can verify that $\omega$ refines $\{X, \mathcal{X}, Y, \mathcal{Y}, \{v_0\}, \{v_1\}, \{v_2\}, V_G\}$, where $V_G$ denotes the set of all vertices in AND-gadgets. In particular, $V_G$ is $\omega$-closed, so $\omega$ induces a stable partition on $G[V_G]$ (Proposition 16), and therefore it does so on every AND-gadget of $G_k$ (these are the components of $G[V_G]$). Note that for any two different AND-gadgets $H_1$ and $H_2$ of $G_k$, there exists an integer $d$ such that $H_1$ contains a vertex at distance exactly $d$ from the $\omega$-closed set $X \cup Y$, but $H_2$ does not. This observation can be combined with Proposition 17 to show that if $u$ and $v$ are part of different AND-gadgets, then $u \not\approx_\omega v$. Subsequently, this yields that for any AND-gadget of $G_k$ with out-terminals $a_0, a_1$, the set $\{a_0, a_1\}$ is $\omega$-closed, and the set $B$ of in-terminals of this gadget is $\omega$-closed.
are bisimilar if the processes starting in these states look the same. Formally, a transition system is a vertex-labelled directed graph. Let $S = (V, E, \lambda)$, where $(V, E)$ is a directed graph and $\lambda$ is a function that assigns a set of properties to each state $v \in V$. A bisimulation on $S$ is a relation $\sim$ on $V$ satisfying the following three properties for all $v, w \in V$ such that $v \sim w$:
(i) $\lambda(v) = \lambda(w)$;
(ii) for every $v' \in N^+(v)$ there is a $w' \in N^+(w)$ such that $v' \sim w'$;
(iii) for every $w' \in N^+(w)$ there is a $v' \in N^+(v)$ such that $v' \sim w'$.
Not every bisimulation is an equivalence relation, but the reflexive symmetric transitive closure of a bisimulation is still a bisimulation. For convenience, in the following we assume that all bisimulations are equivalence relations. This is justified by the fact that the partition refinement algorithms (see below) that are commonly used to compute bisimulations, and that we study here, represent the relations using partitions of the vertex set, and hence implicitly assume that the relations they represent are equivalence relations.
It is not hard to see that on each transition system $S$ there is a unique coarsest bisimulation, which we call the bisimilarity relation on $S$. The bisimilarity relation can be defined by letting $v$ be bisimilar to $w$ if there is a bisimulation $\sim$ such that $v \sim w$; it is then straightforward to verify that bisimilarity is a bisimulation and that all other bisimulations refine it. We remark that the bisimilarity relation on a transition system is precisely what Paige and Tarjan [22] call the coarsest relational partition of the initial partition given by the labelling. Thus the problem of computing the bisimilarity relation of a given transition system is equivalent to the problem of computing the coarsest relational partition considered in [22].
Note the similarity between a bisimulation and a stable colouring of a vertex-coloured digraph, which we may view as a transition system with a labelling $\lambda$ that maps each vertex to its colour. Condition (i) just says that a bisimulation refines the original colouring, as a stable colouring is supposed to do as well. Conditions (ii) and (iii), which are equivalent under the assumption that a bisimulation is an equivalence relation and hence symmetric, say that if two vertices $v, w$ are in the same class $C$, then for every class $D$, either both $v$ and $w$ have an out-neighbour in $D$ or neither of them has. Thus instead of refining by the degree in $D$, we just refine by the Boolean value "degree at least 1". This immediately implies that the coarsest stable colouring of $S$ refines the coarsest bisimulation, that is, the bisimilarity relation, on $S$.
It should be clear from these considerations that the bisimilarity relation on a transition system S with n vertices and m edges can be computed in time O((n + m) log n) by a slight modification of the partitioning algorithm for computing the coarsest stable colouring (assuming, of course, that the labels can be computed and compared in constant time) [22].
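The relationship between the two notions can be seen in a small experiment. The Python sketch below uses a hypothetical helper `coarsest_refinement`, not taken from the paper, that performs naive rounds of signature refinement (about O(n · (n + m)) time up to hashing, not the O((n + m) log n) partitioning algorithm). The only difference between the two modes is whether a vertex's signature records how many out-neighbours it has in each class or merely whether it has one.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def coarsest_refinement(
    vertices: List[Hashable],
    edges: List[Tuple[Hashable, Hashable]],
    colour: Dict[Hashable, Hashable],
    counting: bool,
) -> Dict[Hashable, int]:
    """Naive fixpoint refinement of the initial colouring.

    counting=True  refines by the multiset of out-neighbour colours (stable colouring);
    counting=False refines by the set of out-neighbour colours (bisimilarity).
    """
    out: Dict[Hashable, List[Hashable]] = {v: [] for v in vertices}
    for u, w in edges:
        out[u].append(w)
    cls: Dict[Hashable, Hashable] = dict(colour)
    num_classes = len(set(cls.values()))
    while True:
        ids: Dict[tuple, int] = {}
        new: Dict[Hashable, int] = {}
        for v in vertices:
            nbr = [cls[w] for w in out[v]]
            summary = frozenset(Counter(nbr).items()) if counting else frozenset(nbr)
            # The old colour is part of the signature, so each round refines the last one.
            new[v] = ids.setdefault((cls[v], summary), len(ids))
        if len(ids) == num_classes:  # no class was split: the colouring is stable
            return new
        cls, num_classes = new, len(ids)


# r1 has one out-neighbour, r2 has two; all targets are sinks.
V = ["r1", "r2", "s1", "s2", "s3"]
E = [("r1", "s1"), ("r2", "s2"), ("r2", "s3")]
alpha = {v: 0 for v in V}
stable = coarsest_refinement(V, E, alpha, counting=True)
bisim = coarsest_refinement(V, E, alpha, counting=False)
print(stable["r1"] == stable["r2"], bisim["r1"] == bisim["r2"])  # False True
```

Counting separates r1 from r2 while the Boolean criterion does not, so in this example the coarsest stable colouring is strictly finer than bisimilarity.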
As for the coarsest stable colouring, we may ask whether the bisimilarity relation can be computed in linear time. It turns out that our lower bound for colour refinement implies a lower bound for bisimilarity. Again, we consider the class of partition refinement algorithms. Like the partition refinement algorithms for colour refinement, partition refinement algorithms for bisimilarity maintain a partition of the vertex set of the given transition system, and they iteratively refine it using refining operations until a bisimulation is reached. In each refining operation, such an algorithm chooses a union of current partition cells as refining set R, and chooses another (possibly overlapping) union of partition cells S. Cells in S are split up according to whether their vertices have out-neighbours in the cells contained in R. That is, two vertices v, w currently in the same cell in S remain in the same cell after the refinement step if and only if for all cells C of the partition with C ⊆ R, it holds that
$$N^+(v) \cap C \neq \emptyset \iff N^+(w) \cap C \neq \emptyset.$$
Recall that $N^+(v)$ denotes the set of out-neighbours of a vertex v in a directed graph (or transition system). The cost bcost(R, S) of such a refinement operation (R, S) is the number of edges from S to R. Again, the sum of the costs of all refinement operations is a reasonable lower bound for the running time of a partition refinement algorithm. The cost bcost(α) of a partition α of the vertex set is then defined as the minimum cost of a sequence of refinement operations that transforms α into the coarsest bisimulation refining it (see Definition 14).
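One refinement operation and its cost can be written out directly. In this Python sketch (the names `refinement_operation`, `cell_of` and `out` are hypothetical), R and S are passed as sets of cell identifiers, and the vertices of each cell in S are grouped by the set of R-cells in which they have an out-neighbour, which is exactly the Boolean criterion above. For simplicity it scans all out-edges of S, which can cost more than bcost(R, S); an implementation that respects the cost would scan the in-edges of R instead. It also assumes a simple digraph (no parallel edges), so that the counter equals bcost(R, S).

```python
from typing import Dict, Hashable, List, Set, Tuple


def refinement_operation(
    cell_of: Dict[Hashable, int],
    out: Dict[Hashable, List[Hashable]],
    r_cells: Set[int],
    s_cells: Set[int],
) -> Tuple[Dict[Hashable, int], int]:
    """Apply one refinement operation (R, S); return (new partition, bcost(R, S)).

    R and S are given as sets of cell ids of the current partition cell_of.
    """
    new_cell = dict(cell_of)
    next_id = max(cell_of.values()) + 1
    cost = 0
    groups: Dict[Tuple[int, frozenset], List[Hashable]] = {}
    for v, c in cell_of.items():
        if c not in s_cells:
            continue
        hit = set()  # cells C contained in R with N+(v) ∩ C nonempty
        for w in out[v]:
            if cell_of[w] in r_cells:
                cost += 1  # each edge from S to R is counted once
                hit.add(cell_of[w])
        groups.setdefault((c, frozenset(hit)), []).append(v)
    seen: Set[int] = set()
    for (c, _), members in groups.items():
        if c in seen:  # a further part of an already-seen cell gets a fresh id
            for v in members:
                new_cell[v] = next_id
            next_id += 1
        else:  # the first part of each cell keeps its old id
            seen.add(c)
    return new_cell, cost


out = {"a": ["x"], "b": ["y"], "x": [], "y": []}
cells = {"a": 0, "b": 0, "x": 1, "y": 2}
new, cost = refinement_operation(cells, out, r_cells={1}, s_cells={0})
print(new, cost)  # {'a': 0, 'b': 3, 'x': 1, 'y': 2} 1
```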
Theorem 28 For every integer k ≥ 2, there is a transition system $S_k$ with $n \in O(2^k k)$ vertices, $m \in O(2^k k^2)$ edges and a constant labelling function, such that $\mathrm{bcost}(\alpha) \in \Omega((m + n)\log n)$, where α is the unit partition of $V(S_k)$.
Proof (sketch). The proof is essentially the same as the proof of Theorem 15. The transition system $S_k$ is a directed version of the graph $G_k$. Figure 4 illustrates the direction of the edges. All vertices get the same label.
It is not hard to show that the bisimilarity classes of $S_k$ are exactly the colour classes of $G_k$ in the coarsest stable colouring, and that the refinement steps behave essentially the same on $G_k$ and $S_k$. Thus the lower-bound proof carries over.

□
Equivalence in 2-Variable Logic
It is a well-known fact (due to Immerman and Lander [16]) that colour refinement assigns the same colour to two vertices of a graph if and only if the vertices satisfy the same formulas of the logic $C^2$, two-variable first-order logic with counting. Two-variable first-order logic $L^2$ is the fragment of first-order logic consisting of all formulas built with just two variables. For example, the following $L^2$-formula φ(x) in the language of directed graphs says that from vertex x one can reach a sink (a vertex of out-degree 0) in four steps:
$$\varphi(x) := \exists y\,\big(E(x,y) \wedge \exists x\,\big(E(y,x) \wedge \exists y\,\big(E(x,y) \wedge \exists x\,\big(E(y,x) \wedge \forall y\,\neg E(x,y)\big)\big)\big)\big).$$
Fig. 4. The transition system $S_3$ corresponding to the graph $G_3$ of Figure 1.
Two-variable first-order logic with counting $C^2$ is the extension of $L^2$ by counting quantifiers of the form $\exists^{\geq i} x$, for all $i \geq 1$. For example, the following $C^2$-formula ψ(x) in the language of directed graphs says that from vertex x one can reach a vertex of out-degree at least 10 in four steps:
$$\psi(x) := \exists y\,\big(E(x,y) \wedge \exists x\,\big(E(y,x) \wedge \exists y\,\big(E(x,y) \wedge \exists x\,\big(E(y,x) \wedge \exists^{\geq 10} y\, E(x,y)\big)\big)\big)\big).$$
This formula is not equivalent to any formula of $L^2$. Two-variable logics, and more generally finite-variable logics, have been studied extensively in finite model theory (see, for example, [11,15,18,13]).
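The vertex sets defined by φ and ψ can be computed by evaluating the formulas from the inside out. This Python sketch (helper names such as `pre` and `four_steps_back` are hypothetical) assumes the graph is given as a dictionary of out-neighbour lists that contains every vertex as a key. It illustrates the semantics of the two formulas and is not an algorithm from the paper. Each quantifier block only needs the set computed for the block nested inside it, which is why the two variables x and y can be reused alternately.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, List[Hashable]]  # vertex -> list of out-neighbours


def pre(out: Graph, target: Set[Hashable]) -> Set[Hashable]:
    """Vertices x satisfying ∃y(E(x, y) ∧ target(y))."""
    return {x for x, succ in out.items() if any(y in target for y in succ)}


def four_steps_back(out: Graph, base: Set[Hashable]) -> Set[Hashable]:
    # Alternating the roles of x and y, as the formulas do, lets the same two
    # variables be reused: each quantifier block only needs the previous set.
    current = set(base)
    for _ in range(4):
        current = pre(out, current)
    return current


def phi(out: Graph) -> Set[Hashable]:
    sinks = {x for x, succ in out.items() if not succ}  # ∀y ¬E(x, y)
    return four_steps_back(out, sinks)


def psi(out: Graph, threshold: int = 10) -> Set[Hashable]:
    # ∃^{≥ threshold} y E(x, y): at least `threshold` distinct out-neighbours
    big = {x for x, succ in out.items() if len(set(succ)) >= threshold}
    return four_steps_back(out, big)


path = {0: [1], 1: [2], 2: [3], 3: [4], 4: []}
print(phi(path))  # {0}
```

Only the base set differs between φ and ψ; the counting quantifier in ψ is what $L^2$ cannot express.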
We call two vertices of a graph $L^2$-equivalent ($C^2$-equivalent) if they satisfy the same formulas of the logic $L^2$ ($C^2$, respectively). Now Immerman and Lander's theorem states that for all graphs G (possibly coloured and/or directed) and all vertices v, w ∈ V(G), the vertices v and w have the same colour in the coarsest bi-stable colouring of G if and only if they are $C^2$-equivalent. (Recall that bi-stable was defined in Section 3.4.) In particular, this implies that the $C^2$-equivalence classes of a graph can be computed in time O((n + m) log n), but, by our lower bound, not faster by a partition-refinement algorithm.
On plain undirected graphs, the logic $L^2$ is extremely weak. However, on coloured and/or directed graphs, the logic is quite interesting. The $L^2$-equivalence relation refines the bisimilarity relation. It is well known that the $L^2$-equivalence relation can be computed in time O((n + m) log n) by a variant of the colour refinement algorithm. Our lower bounds can be extended to show that it cannot be computed faster by a partition-refinement algorithm.

An Open Problem
The key idea of the O((n + m) log n) partitioning algorithms is Hopcroft's idea of processing the smaller half.Hopcroft originally proposed this idea for the minimisation of deterministic finite automata.The algorithm proceeds by identifying equivalent states and then collapsing each equivalence class to a single new state.The partitioning problem (computing classes of equivalent states) is actually just the bisimilarity problem for finite automata, which may be viewed as edge-labelled transition systems.
However, for DFA-minimisation we only need to compute the bisimilarity relation for deterministic finite automata, that is, transition systems where each state has exactly one outgoing edge of each edge label.The systems in our lower bound proof are highly nondeterministic.Thus our lower bounds do not apply.
It remains a very interesting open problem whether similar lower bounds can be proved for DFA-minimisation, or whether DFA-minimisation is possible in linear time. Paige, Tarjan, and Bonic [23] proved that this is possible for DFAs with a single-letter alphabet. To the best of our knowledge, the only known result towards a lower bound is a family of examples due to Berstel and Carton [6] (also see [9,5]) showing that the O(n log n) bound for Hopcroft's original algorithm is tight.
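For comparison, Hopcroft's smaller-half idea in its original DFA setting can be sketched as follows. This Python version assumes a complete DFA given as a hypothetical transition dictionary `delta[(state, letter)]`; the function name `hopcroft_partition` is illustrative. It keeps the smaller-half rule but scans all blocks for every splitter, so it does not achieve the O(n log n) bound, which needs the usual linked-list bookkeeping. Determinism of the input is exactly what places this setting outside the scope of the lower bounds above.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

State = Hashable


def hopcroft_partition(
    states: Iterable[State],
    alphabet: Iterable[str],
    delta: Dict[Tuple[State, str], State],
    final: Iterable[State],
) -> Set[FrozenSet[State]]:
    """Classes of equivalent states of a complete DFA, using Hopcroft's smaller-half rule."""
    alphabet = list(alphabet)
    final_set = frozenset(final)
    partition = {blk for blk in (final_set, frozenset(states) - final_set) if blk}
    work = set(partition)  # splitters still to be processed
    inv: Dict[Tuple[str, State], Set[State]] = {}
    for (q, a), r in delta.items():
        inv.setdefault((a, r), set()).add(q)
    while work:
        splitter = work.pop()
        for a in alphabet:
            # X = states whose a-successor lies in the splitter
            x = set().union(*(inv.get((a, r), set()) for r in splitter))
            for y in list(partition):
                inter, diff = y & x, y - x
                if not inter or not diff:
                    continue
                partition.remove(y)
                partition.update((inter, diff))
                if y in work:
                    # y was still waiting: both halves must be processed
                    work.remove(y)
                    work.update((inter, diff))
                else:
                    # "process the smaller half": one half suffices
                    work.add(inter if len(inter) <= len(diff) else diff)
    return partition


delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 1, (2, "a"): 2, (2, "b"): 2}
print(sorted(sorted(b) for b in hopcroft_partition([0, 1, 2], "ab", delta, {1, 2})))
# [[0], [1, 2]]
```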

Algorithm 3
Subroutine SplitUpColour(s)
75: maxcdeg := maxcdeg[s]
76: For i ∈ [1, . . ., maxcdeg]:
77:     numcdeg[i] := 0
78: numcdeg[0] := |C[s]| − |A[s]|
79: For v ∈ A[s]:
80:     numcdeg[cdeg[v]] := numcdeg[cdeg[v]] + 1
81: b := 0
82: For i ∈ [1, . . ., maxcdeg]:
83:     If numcdeg[i] > numcdeg[b] then b := i
84: If s ∈ S_refine then instack := 1 else instack := 0 // maintain flag for this test
85: For i ∈ [0, . . ., maxcdeg]:
86:     If numcdeg[i] ≥ 1 then
87:         If i = mincdeg[s] then
88:             f[i] := s
89:             If instack = 0 and b = i then push(S_refine, f[i])
90:         else
91:             …

… terminates in time $O(D^+_R(S))$, where $R = C[r]$ denotes the refining colour class, $S = C[s]$ denotes the class to be split up, and $D^+_R(S) = \sum_{v \in S} |N^+(v) \cap R|$. Every (non-loop) line takes constant time. For the list deletion (Line 96), this requires a proper implementation of doubly linked lists. The test in Line 84 whether $s \in S_{\mathrm{refine}}$ can be done in constant time by maintaining a 0/1 flag for every colour, which indicates whether the colour is in $S_{\mathrm{refine}}$. Since colours are added to and deleted from the stack $S_{\mathrm{refine}}$ one by one, maintaining these flags is no problem. All for-loops in Algorithm 3 are repeated either maxcdeg[s] times or |A[s]| times. Both values are bounded by $D^+_R(S)$. So the total complexity of one call to the subroutine can be bounded by $O(D^+_R(S))$.

Now consider the complexity of one iteration of the while-loop of Algorithm 2.
The first two (nested) for-loops (Lines 47–54) take time $O(|R| + D^-(R))$. This holds because in total, $D^-(R)$ choices of w are considered, and the operations for every such choice take constant time. The test in Line 51 can be implemented in constant time using a 0/1 flag that keeps track of whether a colour appears in $\mathit{Colors}_{\mathrm{adj}}$. Since elements are added to and deleted from $\mathit{Colors}_{\mathrm{adj}}$ one by one (Lines 52, 73), maintaining these flags is again no problem. Since $|\mathit{Colors}_{\mathrm{adj}}| \leq D^-(R)$, the complexity of the for-loops in Lines 55 and 63 can be bounded by $O(D^-(R))$. Sorting $\mathit{Colors}_{\mathrm{split}}$ takes time $O(k \log k)$ when using, e.g., merge sort, since $|\mathit{Colors}_{\mathrm{split}}| \leq k$ (every colour in $\mathit{Colors}_{\mathrm{split}}$ will split up and thus introduce at least one new colour). One call to the subroutine SplitUpColour(s) takes time $O(D^+_R(S))$, with $S = C[s]$, as shown above. Since $\sum_{s \in \mathit{Colors}_{\mathrm{split}}} D^+_R(C[s]) \leq D^-(R)$, the complexity of the for-loop in Line 66 can be bounded by $O(D^-(R))$. The complexity of the last for-loop (Line 68) can also be bounded by $O(D^-(R))$. Note in particular that in total, at most $D^-(R)$ choices of v are considered in Line 70. This shows that the complexity of one iteration of the while-loop can be bounded by $O(|R| + D^-(R) + k \log k)$.
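The splitting step can be mirrored in a few lines of Python. This is a simplified sketch, not a transcription of Algorithm 3: the function `split_up_colour` and its arguments are hypothetical, it builds the colour map for the whole class rather than touching only A[s] (so it does not meet the $O(D^+_R(S))$ bound), and its push rule is the standard smaller-half rule (push every new colour if s is already on the stack, otherwise every part except one largest part), which is assumed here to be what Lines 84–91 achieve, since part of the listing is missing.

```python
from typing import Dict, Hashable, List, Tuple


def split_up_colour(
    s: int,
    members: List[Hashable],
    cdeg: Dict[Hashable, int],
    next_colour: int,
    in_stack: bool,
) -> Tuple[Dict[Hashable, int], int, List[int]]:
    """Split colour class s by the number of neighbours in the refining class.

    cdeg lists only the vertices with at least one such neighbour (the role of A[s]).
    Returns (new colour of each member, next unused colour, colours to push).
    """
    maxcdeg = max(cdeg.values(), default=0)
    numcdeg = [0] * (maxcdeg + 1)
    numcdeg[0] = len(members) - len(cdeg)
    for d in cdeg.values():
        numcdeg[d] += 1
    b = max(range(maxcdeg + 1), key=lambda i: numcdeg[i])  # index of a largest part
    present = [i for i in range(maxcdeg + 1) if numcdeg[i] >= 1]
    f: Dict[int, int] = {}
    to_push: List[int] = []
    for i in present:
        if i == present[0]:  # the part with minimum cdeg keeps colour s
            f[i] = s
        else:
            f[i] = next_colour
            next_colour += 1
        # Smaller-half rule: if s is already on the stack, every new colour must be pushed;
        # otherwise every part except one largest part b is pushed.
        if (in_stack and f[i] != s) or (not in_stack and i != b):
            to_push.append(f[i])
    colour_of = {v: f[cdeg.get(v, 0)] for v in members}
    return colour_of, next_colour, to_push


colours, nxt, push = split_up_colour(7, [1, 2, 3, 4, 5], {1: 1, 2: 1, 3: 1, 4: 2}, 10, in_stack=False)
print(colours, nxt, push)  # {1: 10, 2: 10, 3: 10, 4: 11, 5: 7} 12 [7, 11]
```

Skipping one largest part is what makes every vertex lie in a pushed class only O(log n) times, the source of the logarithmic factor in the running time.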

Theorem 10
Algorithm 4 can be implemented such that it terminates in time O((n + m) log n), where n = |V (G)| and m = |E(G)| for the input graph G.

Theorem 12
Let G = (V, E, c) be an edge-coloured digraph with n = |V| and m = |E|. In time O((n + m) log(n + m)), a canonical coarsest edge-colour stable colouring can be computed for G.
Proof: In time O(n + m), we can construct the following digraph G′ from G, with vertex colouring α: start with the vertex set V (the original vertices), and for every edge e = (u, v) ∈ E, add a vertex $v_e$ (the new vertices) and two edges $(u, v_e)$ and $(v_e, v)$. Assign colour $\alpha(v_e) = c(e)$ to the new vertices, and colour α(v) = 0 to the original vertices v ∈ V.
… 9). It remains to consider the complexity. The graph G′ and colouring α can be constructed from G in time O(n + m). G′ has n + m vertices and 2m edges, so β′ can be computed in time O((n + 3m) log(n + m)) = O((n + m) log(n + m)) by Theorem 9.
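The construction of G′ from the proof is straightforward to code. In this Python sketch the function name `subdivide_edges` and the representation of vertices as 0, …, n − 1 are assumptions, and the new vertex $v_e$ for the j-th edge is n + j. It also assumes edge colours are positive integers so that they cannot clash with the colour 0 given to the original vertices. The output has n + m vertices and 2m edges, matching the count used in the complexity bound.

```python
from typing import List, Tuple


def subdivide_edges(
    n: int, coloured_edges: List[Tuple[int, int, int]]
) -> Tuple[int, List[Tuple[int, int]], List[int]]:
    """Build G' from an edge-coloured digraph on vertices 0..n-1.

    Edge j = (u, v, c) becomes a new vertex n + j with colour c and edges (u, n+j), (n+j, v).
    Original vertices get colour 0, so edge colours are assumed to be positive.
    """
    alpha = [0] * n
    new_edges: List[Tuple[int, int]] = []
    for j, (u, v, c) in enumerate(coloured_edges):
        if c < 1:
            raise ValueError("edge colours are assumed to be >= 1")
        ve = n + j  # the vertex v_e representing edge e
        alpha.append(c)
        new_edges.extend([(u, ve), (ve, v)])
    return n + len(coloured_edges), new_edges, alpha


num_vertices, edges, alpha = subdivide_edges(2, [(0, 1, 5), (1, 0, 7)])
print(num_vertices, edges, alpha)
# 4 [(0, 2), (2, 1), (1, 3), (3, 0)] [0, 0, 5, 7]
```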

Fig. 3.
Fig. 2. $\mathrm{AND}_2$
See Figure 1 for an example of this construction. (In the figure, we have expanded the terminals of $\mathrm{AND}_2$ into edges, for readability. This does not affect the behaviour of the graph.)
Proposition 20 $G_k$ has $O(2^k k)$ vertices and $O(2^k k^2)$ edges.
this yields a partition of B into binary blocks, and that this distinguishes an in-terminal pair $b_{2q}, b_{2q+1}$ (which are adjacent to $Y^p_{2q}$ and $Y^p_{2q+1}$, respectively, with union $Y^{p-1}_q$