Abstract
An assignment of colours to the vertices of a graph is stable if any two vertices of the same colour have identically coloured neighbourhoods. The goal of colour refinement is to find a stable colouring that uses a minimum number of colours. This is a widely used subroutine for graph isomorphism testing algorithms, since any automorphism needs to be colour preserving. We give an O((m + n)log n) algorithm for finding a canonical version of such a stable colouring, on graphs with n vertices and m edges. We show that no faster algorithm is possible, under some modest assumptions about the type of algorithm, which captures all known colour refinement algorithms.
1 Introduction
Colour refinement (also known as naive vertex classification) is a very simple, yet extremely useful algorithmic routine for graph isomorphism testing. It classifies the vertices by iteratively refining a colouring of the vertices as follows. Initially, all vertices have the same colour. Then in each step of the iteration, two vertices that currently have the same colour get different colours if for some colour c they have a different number of neighbours of colour c. The process stops if no further refinement is achieved, resulting in a stable colouring of the graph. To use colour refinement as an isomorphism test, we can run it on the disjoint union of two graphs. Any isomorphism needs to map vertices to vertices of the same colour. So, if the stable colouring differs on the two graphs, that is, if for some colour c, the graphs have a different number of vertices of colour c, then we know they are nonisomorphic, and we say that colour refinement distinguishes the two graphs. Babai, Erdős, and Selkow [2] showed that colour refinement distinguishes almost all graphs (in the G(n,1/2) model). In fact, they proved the stronger statement that the stable colouring is discrete on almost all graphs, that is, every vertex gets its own colour. On the other hand, colour refinement fails to distinguish any two regular graphs with the same number of vertices and the same degree, such as a 6-cycle and the disjoint union of two triangles.
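The iteration described above can be sketched in a few lines of Python. This is only an illustrative, naive version (it is not the fast algorithm of this paper); the function names `colour_refinement` and `distinguishes` and the dictionary-based graph encoding are assumptions made for the example.

```python
from collections import Counter


def colour_refinement(adj):
    """Naive colour refinement on an undirected graph.

    adj maps each vertex to a list of its neighbours.
    Returns a dict vertex -> colour (small integers) of a stable colouring.
    """
    colour = {v: 0 for v in adj}  # initially all vertices share one colour
    while True:
        # Signature: old colour plus the number of neighbours of each colour.
        sig = {
            v: (colour[v], tuple(sorted(Counter(colour[w] for w in adj[v]).items())))
            for v in adj
        }
        # Rename signatures to integers, in sorted order of the signatures.
        names = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new_colour = {v: names[sig[v]] for v in adj}
        if len(names) == len(set(colour.values())):
            return new_colour  # no colour class was split: stable
        colour = new_colour


def distinguishes(adj_g, adj_h):
    """Refine the disjoint union and compare the colour histograms."""
    union = {("g", v): [("g", w) for w in ws] for v, ws in adj_g.items()}
    union.update({("h", v): [("h", w) for w in ws] for v, ws in adj_h.items()})
    colour = colour_refinement(union)
    hist_g = Counter(c for (side, _), c in colour.items() if side == "g")
    hist_h = Counter(c for (side, _), c in colour.items() if side == "h")
    return hist_g != hist_h


six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path3 = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
```

With these inputs, `distinguishes(six_cycle, two_triangles)` is false (both graphs are 2-regular on six vertices), whereas `distinguishes(path3, triangle)` is true.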
Colour refinement is not only useful as a simple isomorphism test in itself, but also as a subroutine for more sophisticated algorithms, both in theory and practice. For example, Babai and Luks’s [1, 3] \(O(2^{\sqrt {n\log n}})\) algorithm (this is still the best known worst-case running time for isomorphism testing) uses colour refinement as a subroutine, and most practical graph isomorphism tools (for example, [11, 18, 20, 22]), starting with McKay’s “Nauty” [20, 21], are based on the individualisation refinement paradigm (see also [23]). The basic idea of the latter algorithms is to recursively compute a canonical labelling of a given graph, which may already have an initial colouring of its vertices, as follows. We run colour refinement starting from the initial colouring until a stable colouring is reached. If the stable colouring is discrete, then this already gives us a canonical labelling (provided the colours assigned by colour refinement are canonical, see below). If not, we pick some colour c with more than one vertex. Then for each vertex v of colour c, we modify the stable colouring by assigning a fresh colour to v (that is, we “individualise” v) and recursively call the algorithm on the resulting vertex-coloured graph. Then for each v we get a canonically labelled version of our graph, and we return the lexicographically smallest among these. (More precisely, each canonical labelling of a graph yields a canonical string encoding, and we compare these strings lexicographically.) To turn this simple procedure into a practically useful algorithm, various heuristics are applied to prune the search tree. They exploit automorphisms of the graph found during the search. However, crucial for any implementation of such an algorithm is a very efficient colour refinement procedure, because colour refinement is called at every node of the search tree.
Colour refinement can be implemented to run in time O((n + m)log n), where n is the number of vertices and m the number of edges of the input graph. To our knowledge, this was first proved by Cardon and Crochemore [9]. Later Paige and Tarjan [24, p.982] sketched a simpler algorithm. Both algorithms are based on the partitioning techniques introduced by Hopcroft [15] for minimising finite automata. However, an issue that is neglected in the literature is that, at least for individualisation refinement, we need a version of colour refinement that produces a canonical colouring. That is, if f is an isomorphism from a graph G to a graph H, then for all vertices v of G, v and f(v) should get the same colour in the respective stable colourings of G and H. However, none of the aforementioned algorithms seem to produce canonical colourings. We present an implementation of colour refinement that computes a canonical stable colouring in time O((n + m)log n). Ignoring the canonical part, our algorithmic techniques are similar to known results: like [24] and various other papers, we use Hopcroft’s strategy of ‘ignoring the largest new cell’, after splitting a cell [15]. Our data structures have some similarities to those described by Junttila and Kaski [18]. Nevertheless, since [18] contains no complexity analysis, and [17, 24] omit various (nontrivial) implementation details, it seems that the current paper gives the first detailed description of an O((m + n)log n) algorithm that uses this strategy. On a high level, our algorithm is also quite similar to McKay’s canonical colour refinement algorithm [20, Alg. 2.5], but with a few key differences which enable an O((n + m)log n) implementation. McKay [20] gave an O(n ^{2} log n) implementation using adjacency matrices. Our algorithm is described and analysed in Section 3. 
In Section 3.4, we discuss extensions: We show how the algorithm can be applied to directed, undirected and edge coloured graphs, and how the complexity bound in fact applies to an entire branch of an individualisation refinement algorithm.
Now the question arises whether colour refinement can be implemented in linear time. After various attempts, we started to believe that it cannot. Of course with currently known techniques one cannot expect to disprove the existence of a linear time algorithm for the standard (RAM) computation model, or for similar general computation models. Instead, we prove a tight lower bound for a restricted, but very broad class of algorithms. In this sense, our result is comparable to the lower bounds for comparison based sorting algorithms. Actually, our class of partition-refinement based algorithms captures all known colour refinement algorithms, and in fact every reasonable algorithmic strategy we could think of. We use the following assumptions. (See Sections 2 and 4 for precise definitions.) Colour refinement algorithms start with a unit partition (which has one cell V(G)), and iteratively refine this until a stable partition, or colouring, is obtained. This is done using refining operations: choose a union of current partition cells as refining set R, and choose another (possibly overlapping) union of partition cells S. Cells in S are split up if their neighbourhoods in R provide a reason for this. (That is, two vertices in a cell in S remain in the same cell only if they have the same number of neighbours in every cell in R.) This operation requires considering all edges between R and S, so the number of such edges is a very reasonable and modest lower bound for the complexity of such a refining step; we call this the cost of the operation. We note that a naive algorithm might choose R = S = V(G) in every iteration. This then requires time Ω(mn) on graphs that require a linear number of refining operations, such as paths. Therefore, all fast algorithms are based on choosing R and S smartly (and on implementing refining steps efficiently).
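To make the cost of the naive strategy concrete, the following sketch counts the effective refining operations when R = S = V(G) is chosen in every round, and charges each round the m edges it has to inspect. The function names and the exact charging scheme (m per round) are assumptions for illustration only.

```python
def naive_refinement_cost(adj):
    """Refine with R = S = V(G) in every round.

    Returns (number of effective rounds, total cost), where each round
    is charged m, the number of edges between R and S.
    """
    m = sum(len(ws) for ws in adj.values()) // 2
    colour = {v: 0 for v in adj}
    rounds = 0
    while True:
        # Old colour plus the sorted multiset of neighbour colours.
        sig = {v: (colour[v], tuple(sorted(colour[w] for w in adj[v]))) for v in adj}
        names = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        if len(names) == len(set(colour.values())):
            return rounds, rounds * m  # stable: the last round split nothing
        colour = {v: names[sig[v]] for v in adj}
        rounds += 1


def path(n):
    """Path on vertices 0, ..., n-1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
```

On a path with 8 vertices this gives 3 effective rounds at total cost 21, and on 16 vertices 7 rounds: each round only separates one further layer of vertices from the middle, so the number of rounds grows linearly in n.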
For our main lower bound result, we construct a class of instances such that any possible sequence of refining operations that yields the stable partition has total cost at least Ω((m + n)log n). Note that it is surprising that a tight lower bound can be obtained in this model. Indeed, cost upper bounds in this model would not necessarily yield corresponding algorithms, since firstly we allow the sets R and S to be chosen nondeterministically, and secondly, it is not even clear how to refine S using R in time proportional to the number of edges between these classes. However, as we prove a lower bound, this makes our result only stronger. An alternative formulation of our lower bound result is to model the class of nondeterministic partition-refinement based algorithms as a “proof system” and then prove lower bounds on the length of derivations (see the first author’s PhD thesis [4] for details). We formulate the lower bound result for undirected graphs and noncanonical colour refinement, so that it also holds for digraphs, and canonical colour refinement. These results are presented in Section 4. Our construction also yields corresponding lower bounds for the problems of computing the bisimilarity relation on a transition system and for computing the equivalence classes induced by the 2-variable fragment of first-order logic L^{2} on a structure (see Section 4.4).
Colour refinement has a natural “higher dimensional” generalisation known as the Weisfeiler-Leman algorithm. In the k-dimensional version, which we call k-WL in the following, we colour k-tuples of vertices instead of single vertices. For a detailed description of the algorithm and its history, we refer the reader to [8]. Using similar tricks as for colour refinement, one can implement k-WL to run in time O(n ^{k} log n) [17]. The question arises if this is optimal. At the time of writing this paper we did not know a better than linear lower bound for k-WL. However, recently the first author of this paper jointly with Nordström [5] proved an n ^{Ω(k/ log k)} lower bound for the number of refinement rounds k-WL requires. This implies an n ^{Ω(k/ log k)} lower bound in the partition refinement model considered here. But of course there remains a gap between this lower bound and the O(n ^{k} log n) upper bound.
2 Preliminaries
For an undirected (simple) graph G, N(v) denotes the set of neighbours of v ∈ V(G), and d(v) = |N(v)| its degree. For a digraph, N ^{+}(v) and N ^{−}(v) denote the out- and in-neighbourhoods, and d ^{+}(v) = |N ^{+}(v)| resp. d ^{−}(v) = |N ^{−}(v)| the out- and in-degree, respectively. A partition π of a set V is a set {S _{1},…,S _{ k }} of pairwise disjoint nonempty subsets of V, such that \(\cup _{i=1}^{k} S_{i}=V\). The sets S _{ i } are called cells of π. The order of π is the number |π| of cells. A partition π is discrete if every cell has size 1, and unit if it has exactly one cell. Given a partition π of V, and two elements u, v ∈ V, we write u ≈ _{ π } v if and only if there exists a cell S ∈ π with u, v ∈ S. We say that a set V ^{′} ⊆ V is π-closed if it is the union of a number of cells of π. In other words, if u ≈ _{ π } v and u ∈ V ^{′} then v ∈ V ^{′}. For any subset V ^{′} ⊆ V, π induces a partition π[V ^{′}] on V ^{′}, which is defined by \(u\approx _{\pi [V^{\prime }]} v\) if and only if u ≈ _{ π } v, for all u, v ∈ V ^{′}.
Let G = (V, E) be a graph. A partition π of V is stable for G if for every pair of vertices u, v ∈ V with u ≈ _{ π } v and R ∈ π, it holds that |N(u) ∩ R| = |N(v) ∩ R|. If G is a digraph, then |N ^{+}(u) ∩ R| = |N ^{+}(v) ∩ R| should hold. For readability, all further definitions and propositions in this section are formulated for (undirected) graphs, but the corresponding statements also hold for digraphs (replace degrees/neighbourhoods by out-degrees/out-neighbourhoods). One can see that if π is stable and d(u) ≠ d(v), then u ≉ _{ π } v, which we will use throughout.
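As a small illustration of the definition, stability of a partition can be checked directly by comparing, for every pair of cells, the number of neighbours that the vertices of one cell have in the other. The following is a minimal sketch for undirected graphs; the name `is_stable` and the set-based encoding are assumptions of this example.

```python
def is_stable(adj, partition):
    """Check whether `partition` (a list of disjoint vertex sets) is stable.

    adj maps each vertex to the set of its neighbours (undirected graph).
    """
    for cell in partition:
        for r in partition:
            # Numbers of neighbours in r that occur among the vertices of cell.
            counts = {len(adj[v] & r) for v in cell}
            if len(counts) > 1:
                return False
    return True


# Path 0 - 1 - 2 - 3.
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
```

For this path, `[{0, 3}, {1, 2}]` is stable, while the unit partition `[{0, 1, 2, 3}]` is not, because the end vertices and the inner vertices have different degrees.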
A partition ρ of V refines a partition π of V if for every u, v ∈ V, u ≈ _{ ρ } v implies u ≈ _{ π } v. (In other words: every cell of π is ρ-closed.) If ρ refines π, we write π ≼ ρ. If in addition ρ ≠ π, then we also write π≺ρ. Note that ≼ is a partial order on all partitions of V.
Definition 1
Let G be a graph, and let π and π ^{′} be partitions of V(G). For vertex sets R, S ⊆ V(G) that are π-closed, we say that π ^{′} is obtained from π by a refining operation (R, S) if
• for every S ^{′} ∈ π with S ^{′} ∩ S = ∅, it holds that S ^{′} ∈ π ^{′}, and
• for every u, v ∈ S: \(u\approx _{\pi ^{\prime }} v\) if and only if u ≈ _{ π } v and for all R ^{′} ∈ π with R ^{′} ⊆ R, |N(u) ∩ R ^{′}| = |N(v) ∩ R ^{′}| holds.
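A refining operation (R, S) as in Definition 1 can be sketched as follows for undirected graphs: cells outside S are kept, and every cell inside S is grouped by the vector of neighbour counts in the cells of π contained in R. The function name `refine` and the frozenset-based representation are assumptions of this sketch, and no attempt is made to be efficient.

```python
def refine(adj, partition, R, S):
    """Apply the refining operation (R, S) to `partition`.

    adj maps each vertex to the set of its neighbours.
    partition is a list of disjoint frozensets covering the vertex set.
    R and S are vertex sets, each assumed to be a union of cells.
    Returns the new partition as a list of frozensets.
    """
    r_cells = [c for c in partition if c <= R]
    result = []
    for cell in partition:
        if not cell <= S:
            result.append(cell)  # cells disjoint from S are kept unchanged
            continue
        groups = {}
        for v in cell:
            # Number of neighbours of v in every cell contained in R.
            key = tuple(len(adj[v] & rc) for rc in r_cells)
            groups.setdefault(key, set()).add(v)
        result.extend(frozenset(g) for g in groups.values())
    return result


path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
V = frozenset(path4)
once = refine(path4, [V], V, V)
twice = refine(path4, once, V, V)
```

Here the first operation is effective (it splits the unit partition into the end vertices and the inner vertices), while the second one returns the same partition and is therefore not effective.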
Note that if π ^{′} is obtained from π by a refining operation (R, S), then π ≼ π ^{′}. We say that the operation (R, S) is effective if π≺π ^{′}. In this case, at least one cell C ∈ π is split, which means that C∉π ^{′}. Note that an effective refining operation exists for π if and only if π is unstable. In addition, the next proposition says that if the goal is to obtain a (coarsest) stable partition, then applying any refining operation is safe.
Proposition 2
Let π ^{′} be obtained from π by a refining operation (R,S). If ρ is a stable partition with π≼ρ, then π≼π ^{′} ≼ρ.
Proof 1
π ≼ π ^{′} follows immediately from the definitions. Now consider u, v with u ≈ _{ ρ } v, and thus u ≈ _{ π } v. Then for any R ^{′} ∈ π, \(d_{R^{\prime }}(u)=d_{R^{\prime }}(v)\), where \(d_{R^{\prime }}(u)=|N(u)\cap R^{\prime }|\) denotes the number of neighbours of u in R ^{′}. This holds because R ^{′} is a union of sets in ρ, and for all these sets this property holds since ρ is stable. Therefore, \(u\approx _{\pi ^{\prime }} v\). □
A partition π is a coarsest partition for a property P if π satisfies P, and there is no partition ρ with ρ≺π that also satisfies property P.
Proposition 3
Let G=(V,E) be a graph. For every partition π of V, there is a unique coarsest stable partition ρ that refines π.
Proof 2
For any partition π, the discrete partition refines π and is stable, so there exists a stable partition that refines π. Because ≼ is a partial order, there exists then at least one coarsest stable partition that refines π. Now suppose there exists a partition π for which there exist at least two distinct coarsest stable partitions ρ _{1} and ρ _{2} that refine π. Choose such a partition π so that |π| is maximum. Clearly, π is not stable (otherwise ρ _{1} = π = ρ _{2}). So there exists at least one effective refining operation (R, S) that can be applied to π. For the resulting partition π ^{′}, |π ^{′}| > |π| holds. By Proposition 2, both ρ _{1} and ρ _{2} refine π ^{′} as well. But since |π ^{′}| > |π|, this contradicts the choice of π. □
3 A Fast Canonical Colour Refinement Algorithm
3.1 Canonical Colouring Methods
A colouring of a (di)graph G is a function \(\alpha :V(G)\to \mathbb {Z}\). (Note that adjacent vertices may receive the same colour.) It is a k-colouring if for every v ∈ V(G), α(v)∈{1,…,k}. Given a colouring α of G and \(i\in \mathbb {Z}\), we denote \(C^{\alpha }_{i}=\{v\in V(G) \mid \alpha (v)=i\}\). The set \(C^{\alpha }_{i}\) is called colour class i. If the colouring is clear from the context, we also omit the superscript. For any colouring α of a (di)graph G, the set \(\{C^{\alpha }_{i} \mid i\in \mathbb {Z},\ C^{\alpha }_{i}\not =\emptyset \}\) is a partition of V(G), which we will denote by π _{ α }. We will call α unit or stable if π _{ α } is unit or stable, respectively.
Given two (di)graphs G and G ^{′}, with respective colourings α and α ^{′}, an isomorphism h : V(G) → V(G ^{′}) is colour preserving for α and α ^{′} if for all v ∈ V(G), α(v) = α ^{′}(h(v)) holds. A colouring method is a method for obtaining a colouring β of a (di)graph G, given an initial k-colouring α. (This method can be an algorithm, or simply a definition. Often, the initial colouring α is chosen to be the unit colouring.) A colouring method (or algorithm) is called canonical if for any two isomorphic (di)graphs G and G ^{′} with initial colourings α resp. α ^{′} and isomorphism h : V(G)→V(G ^{′}), the following holds: if h is colour preserving for α and α ^{′}, then h is colour preserving for the resulting colourings β and β ^{′}. The resulting colouring β itself is also called a canonical colouring of G, starting from α. If α is the unit colouring, β is simply called a canonical colouring of G.
For instance, for simple undirected graphs G, the degree function d, which assigns the colour d(v) = |N(v)| to each v ∈ V(G), yields a canonical colouring of G, because every isomorphism maps vertices to vertices of the same degree. (In other words: degrees are isomorphism invariant.) Obviously, a canonical colouring method is useful for deducing information about possible isomorphisms between two graphs, especially when the resulting partition π _{ β } refines the initial partition π _{ α }. For details on isomorphism testing algorithms based on this idea, we refer to [20, 23].
In this section we give a fast canonical algorithm that for any (di)graph G and colouring α of G, yields a colouring β of V(G) such that π _{ β } is the coarsest stable partition that refines π _{ α }. For ease of presentation, we require that the initial colouring α is a surjective ℓ-colouring for some value ℓ (so every colour in {1,…,ℓ} occurs at least once). The resulting colouring β will then again be a surjective k-colouring for some value k. In particular, if we choose α to be the unit colouring, then β is a canonical colouring of G such that π _{ β } is the unique coarsest stable partition of G. To obtain the most general result, we formulate the algorithm for digraphs. Variants and extensions are discussed in Section 3.4.
3.2 High-level Description and Correctness Proofs
In Algorithm 1, we give a high-level description of our canonical colour refinement algorithm. This is not yet the fast implementation, and in fact, because we do not yet specify which data structures are used to represent the various mathematical objects (sets and functions), no sharp complexity bound can be concluded from it. In the next section, we give a detailed implementation of this algorithm, describe the data structures in detail, and prove the desired complexity bound. Here, we first focus on proving correctness of the algorithm.
In our algorithms, the scope of for-loops, while-loops and if-then-else statements is indicated by the indentation of blocks; because of space considerations we omit ‘end for’, ‘end while’ and ‘end if’ statements.
The input to Algorithm 1 is a digraph G = (V, E), with V = {1,…,n}. For every vertex v ∈ V, the sets of out-neighbours N ^{+}(v) and in-neighbours N ^{−}(v) are given. (Alternatively, these can be computed in linear time from the edge list.) In addition, an ℓ-colouring α of G and a set S⊆{1,…,ℓ} are given. The set S should be a sufficient refining colour set for α, which is a set that satisfies the following property: for any colour class \(C^{\alpha }_{i}\) and two vertices \(u,v\in C^{\alpha }_{i}\), if there exists a colour class \(C^{\alpha }_{j}\) with \(|N^{+}(u)\cap C^{\alpha }_{j}|\not =|N^{+}(v)\cap C^{\alpha }_{j}|\), then there exists a j ^{′} ∈ S such that \(|N^{+}(u)\cap C^{\alpha }_{j^{\prime }}|\not =|N^{+}(v)\cap C^{\alpha }_{j^{\prime }}|\). Note that {1,…,ℓ} trivially forms a sufficient refining colour set for any ℓ-colouring, but that smarter choices of S may give a faster algorithm (which will be necessary in Section 3.4).
Throughout, the algorithm maintains an (ordered) partition (C _{1},…,C _{ k }) of V(G), starting with the partition \((C^{\alpha }_{1},\ldots ,C^{\alpha }_{\ell })\) (Lines 1–3). We also view this partition as a colouring, so the sets C _{ i } will be called colour classes, and indices i ∈ {1,…,k} will be called colours. In the main while-loop (Line 5), this partition is iteratively refined using refining operations of the form (R, V), where R = C _{ r } for some r ∈ {1,…,k}. We will show that when the algorithm terminates, no effective refining operations are possible on the resulting partition. So the resulting partition is the unique coarsest stable partition of G that refines π _{ α } (Propositions 2, 3). The next colour r that is used as refining colour is chosen using a stack (sequence) S _{refine} (Line 6), which contains all colours that still need to be considered. For a given refining colour class C _{ r } and any v ∈ V, call \(d^{+}_{r}(v):=|N^{+}(v)\cap C_{r}|\) the colour degree of v (with respect to colour r). Then every colour s ∈ {1,…,k} will be split up according to colour degrees (in the for-loop of Line 10). We only consider colours that actually split up, in increasing order. When splitting up colour class C _{ s }, the new colours will be s and k + 1,…,k + d−1, where d is the number of different colour degrees that occur in C _{ s }. These new colours are assigned to the vertices in C _{ s } according to increasing colour degrees (Lines 11–12).
It remains to explain how newly introduced colours are added to the stack S _{refine}. Initially, S _{refine} contains all colours in S, in increasing order (Line 4). (To be precise, this means that the highest colour is on top of the stack.) Whenever new colours are introduced during the splitting of a colour class C _{ s }, these are pushed onto the stack S _{refine}, in increasing order (Lines 21–27). There are however exceptions: for instance, if we have already used the vertex set C _{ s } as refining colour class before, and this set is split up into d new colours, then it is not necessary to use all of these new colours as refining colours later; one colour b may be omitted from S _{refine} (Line 27). To obtain a good complexity, we choose b such that the size of the corresponding colour class is maximised, in order to minimise the sizes of the refining colour sets used later during the computation. (This is Hopcroft’s trick [15], which was also used by e.g. [24].)
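The high-level procedure described in the preceding paragraphs can be sketched in Python as follows. This is a simplified reading of the description, not a transcription of Algorithm 1: colour degrees are recomputed naively in every iteration (so the sketch does not achieve the O((n + m)log n) bound), S is taken to be {1,…,ℓ}, the smallest colour degree is assumed to keep the old colour s, and ties for the omitted colour b are broken towards the smallest colour degree. The name `canonical_refine` and these tie-breaking details are assumptions of this sketch.

```python
def canonical_refine(n, out_nbrs, alpha):
    """High-level canonical colour refinement on a digraph (unoptimised sketch).

    Vertices are 1..n, out_nbrs[v] is the set of out-neighbours of v, and
    alpha[v] is a colour in 1..l (every colour is assumed to occur).
    Returns a dict vertex -> colour of the coarsest stable refinement.
    """
    l = max(alpha.values())
    classes = {i: set() for i in range(1, l + 1)}
    for v, c in alpha.items():
        classes[c].add(v)
    stack = list(range(1, l + 1))  # increasing order; the last element is the top
    in_stack = set(stack)
    k = l
    while stack:
        r = stack.pop()
        in_stack.discard(r)
        R = set(classes[r])  # snapshot of the refining colour class
        cdeg = {v: len(out_nbrs[v] & R) for v in range(1, n + 1)}
        # Colours that actually split, in increasing order.
        split = [s for s in range(1, k + 1) if len({cdeg[v] for v in classes[s]}) > 1]
        for s in split:
            members = classes[s]
            degs = sorted({cdeg[v] for v in members})
            new_cols = [s] + list(range(k + 1, k + len(degs)))
            f = dict(zip(degs, new_cols))  # monotone map: colour degree -> colour
            sizes = {d: 0 for d in degs}
            for v in members:
                sizes[cdeg[v]] += 1
            for c in new_cols:
                classes[c] = set()
            for v in members:
                classes[f[cdeg[v]]].add(v)
            if s in in_stack:
                to_push = new_cols[1:]  # s itself is still on the stack
            else:
                # Omit the colour of a largest new class (ties: smallest degree).
                b = max(degs, key=lambda d: (sizes[d], -d))
                to_push = [c for c in new_cols if c != f[b]]
            for c in to_push:  # pushed in increasing order
                stack.append(c)
                in_stack.add(c)
            k += len(degs) - 1
    return {v: c for c, vs in classes.items() for v in vs}


# Directed path 1 -> 2 -> 3 -> 4 and its relabelling under h(v) = 5 - v.
p_out = {1: {2}, 2: {3}, 3: {4}, 4: set()}
q_out = {5 - v: {5 - w for w in ws} for v, ws in p_out.items()}
col_p = canonical_refine(4, p_out, {v: 1 for v in range(1, 5)})
col_q = canonical_refine(4, q_out, {v: 1 for v in range(1, 5)})
```

On the directed path the stable partition is discrete, and the two runs agree up to the isomorphism h, that is, `col_q[5 - v] == col_p[v]` for every vertex v, which is the behaviour expected of a canonical method.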
Informally, this algorithm is canonical since at every point, both the (colourings given by the) ordered partition (C _{1},…,C _{ k }) and stack S _{refine} remain canonical; new colours that we assign to vertices, and the order in which colours are considered in the various loops of the algorithm, are completely determined by isomorphism-invariant values such as colour degrees and colour numbers. The order in which vertices of G or neighbour lists are given in the input is irrelevant. A formal proof is given in Lemma 6 below. We first prove that Algorithm 1 returns the unique coarsest stable partition, which requires the following invariant.
Proposition 4
At the end of every iteration of the for-loop in Line 10 of Algorithm 1, {C _{1} ,…,C _{ k } } is a partition of V(G) into nonempty sets, and the set of colours in S _{ refine } is a sufficient refining colour set for the corresponding k-colouring of G.
Proof 3
Since new colours correspond to colour degrees that actually occur (Lines 11–16), every new colour class will be nonempty. Lines 19 and 20 show that every vertex of G remains part of exactly one colour class. So the algorithm maintains a partition of V(G).
By definition, the set of colours in S _{refine} is a sufficient refining colour set before the first iteration. We prove that this invariant is maintained during any iteration of the for-loop, where colour class C _{ s } for s ∈ {1,…,k} is split up (by colour r), into the new colour classes \(C_{\sigma _{1}},\ldots ,C_{\sigma _{p}}\). Denote S = C _{ s }, as it is at the start of the iteration (so \(S=C_{\sigma _{1}}\cup \ldots \cup C_{\sigma _{p}}\)). Because the new colour classes form a partition of the old colour class S, for every z ∈ V(G), it holds that
\[|N^{+}(z)\cap S| = \sum_{i=1}^{p} |N^{+}(z)\cap C_{\sigma _{i}}|. \qquad (1)\]
Consider two vertices u, v ∈ V(G) that are in the same colour class after the refining operation, and therefore also before the refining operation. If |N ^{+}(u) ∩ S| ≠ |N ^{+}(v) ∩ S|, then there exists an i ∈ {1,…,p} such that \(|N^{+}(u)\cap C_{\sigma _{i}}|\not =|N^{+}(v)\cap C_{\sigma _{i}}|\) (because of (1)). So if s ∈ S _{refine}, then the invariant is maintained after splitting up the colour, since every new colour is added to S _{refine} (Lines 22–23), and s remains in S _{refine}.
So now assume s∉S _{refine}. Then every colour in {σ _{1},…,σ _{ p }} is added to S _{refine}, except for one colour σ _{ i } = f(b) (Line 27). Then we need to consider the case that \(|N^{+}(u)\cap C_{\sigma _{i}}|\not =|N^{+}(v)\cap C_{\sigma _{i}}|\), but \(|N^{+}(u)\cap C_{\sigma _{j}}|=|N^{+}(v)\cap C_{\sigma _{j}}|\) for all j ∈ {1,…,p}∖{i}. But then also |N ^{+}(u) ∩ S| ≠ |N ^{+}(v) ∩ S| (because of (1)). Since s∉S _{refine}, and the invariant held before the refining operation, there exists another colour j ^{′} ∈ S _{refine} such that \(|N^{+}(u)\cap C_{j^{\prime }}|\not =|N^{+}(v)\cap C_{j^{\prime }}|\). Since this colour remains in S _{refine}, the invariant is also maintained in this case. □
Using the above proposition, we can prove that Algorithm 1 computes a coarsest stable colouring, provided that S is a sufficient refining colour set. Recall that this condition is certainly satisfied when choosing S = {1,…,ℓ}.
Lemma 5
Let G be a digraph, α be a surjective ℓ-colouring of G, and let S⊆{1,…,ℓ} be a sufficient refining colour set for α. Then Algorithm 1 computes a surjective k-colouring β of G such that π _{ β } is the coarsest stable partition that refines π _{ α }.
Proof 4
Let ω be the coarsest stable partition of V(G) that refines π _{ α }. The partition π _{ β } given by the algorithm is refined by ω because it is obtained from π _{ α } using refining operations (Proposition 2). The stack S _{refine} is empty when the algorithm terminates, so the empty set is a sufficient refining colour set at this point (Proposition 4), and therefore π _{ β } is stable. It follows that π _{ β } is equal to ω (Proposition 3). At any point, the sets C _{ i } for i ∈ {1,…,k} are nonempty (Proposition 4), so the resulting kcolouring β is surjective. □
Lemma 6
Algorithm 1 is a canonical colouring algorithm.
Proof 5
Consider two digraphs G and G ^{′}, with ℓ-colourings α resp. α ^{′}, and S⊆{1,…,ℓ}. For \(i\in \mathbb {N}\), let \(C^{G,i}_{j}\) (resp. \(C^{G^{\prime },i}_{j}\)) denote the set C _{ j } as it is at the start of the i-th iteration of the while-loop in Line 5, when running Algorithm 1 with input G, α, S (resp. G ^{′},α ^{′},S). Let \(S_{\text {\footnotesize refine}}^{G,i}\) (resp. \(S_{\text {\footnotesize refine}}^{G^{\prime },i}\)) denote the stack S _{refine} as it is at the start of iteration i of the while-loop in Line 5, when running Algorithm 1 with input G, α, S (resp. G ^{′},α ^{′},S).
To show that Algorithm 1 is canonical, we prove by induction over i that for every isomorphism h : V(G)→V(G ^{′}) that is colour preserving for α and α ^{′}, the following properties are maintained: \(S_{\text {\footnotesize refine}}^{G,i}=S_{\text {\footnotesize refine}}^{G^{\prime },i}\), and for all c and \(v\in C^{G,i}_{c}\), it holds that \(h(v)\in C^{G^{\prime },i}_{c}\). For i = 1, the claim follows immediately from how S _{refine} is initialised (Line 4), and how the sets C _{ c } are initialised (Line 2). We now consider the places in the algorithm where these sets and stacks are modified. In Line 6, the last element of both \(S_{\text {\footnotesize refine}}^{G,i}\) and \(S_{\text {\footnotesize refine}}^{G^{\prime },i}\) is removed, so these sequences stay the same. Furthermore, it follows that the same colour is used as refining colour for both G and G ^{′} in this iteration. The induction assumption shows that h is a colour preserving isomorphism for the colourings given by the various sets \(C^{G,i}_{c}\) and \(C^{G^{\prime },i}_{c}\). So the isomorphism h shows that for every c and every d, \(C^{G,i}_{c}\) and \(C^{G^{\prime },i}_{c}\) contain the same number of vertices with colour degree d. Hence the set Colors_{split} is the same for both G and G ^{′}, and for each colour c ∈ Colors_{split}, the values maxcdeg and numcdeg(j) (for every j) are the same. Therefore, in every iteration of the for-loop in Line 10, the sets D, I will be the same for both G and G ^{′}. The choice of the bijection f in Line 16 is unique because of the monotonicity; hence f will be the same for G and G ^{′} as well. It follows that when in Lines 19 and 20, a vertex v ∈ V(G) is moved from colour class \(C^{G,i}_{s}\) to colour class \(C^{G,i}_{f(d^{+}_{r}(v))}\), the vertex h(v) ∈ V(G ^{′}) is also moved from \(C^{G^{\prime },i}_{s}\) to \(C^{G^{\prime },i}_{f(d^{+}_{r}(v))}\), since \(d^{+}_{r}(v)=d^{+}_{r}(h(v))\). Hence h remains colour preserving for the new partition.
From the previous observations it also follows that in Line 25, b is chosen to be the same value for both G and G ^{′}. Therefore, in Lines 27 and 23, the stack S _{refine} is modified in the same way for both G and G ^{′} (note that in both cases, the colours are added in increasing order). This shows that the claimed properties are maintained in one iteration of the while-loop in Line 5, so by induction, h is also a colour preserving isomorphism for the final colouring β that is returned in Line 32. □
3.3 Implementation and Complexity Bound
We now describe a fast implementation of Algorithm 1. The main idea of the complexity proof is the following: one iteration (of the main while-loop; Line 5 of Algorithm 1) consists of popping a refining colour r from the stack S _{refine}, and applying the refining operation (R, V), with R = C _{ r }. Below we give implementation details and prove the following lemma:
Lemma 7
Algorithm 1 can be implemented such that one iteration, in which a refining operation (R,V) is applied, takes time
\[O\bigl(|R| + D^{-}(R) + k\log k\bigr),\]
where \(D^{-}(R)={\sum }_{v\in R} d^{-}(v)\) and k is the number of new colours that are introduced in this iteration. This implementation requires an initialisation step with complexity O(n).
Using the above lemma, we can prove the desired complexity bound. (The main idea is again based on Hopcroft’s idea [15].)
Lemma 8
Algorithm 1 has an implementation with complexity O((n+m)log n), where n=|V(G)| and m=|E(G)| for the input digraph G.
Proof 6
Consider a vertex v ∈ V(G). Let \({R^{v}_{1}},\ldots ,{R^{v}_{q}}\) denote the refining colour classes C _{ r } with v ∈ C _{ r } that are considered throughout the computation, in chronological order. Then we observe that for all i ∈ {1,…,q−1}, \(|{R^{v}_{i}}|\ge 2|R^{v}_{i+1}|\) holds. This holds because whenever a set S = C _{ s } is split up into \(C_{\sigma _{1}},\ldots ,C_{\sigma _{p}}\), where s has been considered earlier as a refining colour (so it is not in S _{refine} anymore), then for all new colours σ _{ i } that are added to the stack S _{refine}, \(|C_{\sigma _{i}}|\le \frac {1}{2}|S|\) holds, since the largest colour class is not added to S _{refine}. Note that if a colour class \(C_{\sigma _{i}}\) is subsequently split up before σ _{ i } is considered as refining colour, the bound of course also holds. It follows that every v ∈ V(G) appears at most log_{2} n times in a refining colour class. Then we can write
\[\sum_{R}\bigl(|R| + D^{-}(R)\bigr) = \sum_{R}\sum_{v\in R}\bigl(1+d^{-}(v)\bigr) \le \sum_{v\in V(G)}\bigl(1+d^{-}(v)\bigr)\log_{2} n = (n+m)\log_{2} n,\]
where the first summation is over all refining colour classes R = C _{ r } considered during the computation. In addition, the total number of new colours that is introduced is at most n, since every colour class, after it is introduced, remains nonempty throughout the computation. So we may write
\[\sum_{i} k_{i}\log k_{i} \le \log n \sum_{i} k_{i} \le n\log n,\]
where k _{ i } denotes the number of colours introduced during iteration i. Combining these bounds with Lemma 7 shows that the total complexity of the algorithm can be bounded by O(n) + O((n + m)log n) + O(nlog n) ⊆ O((n + m)log n). □
Combining Lemmas 5, 6 and 8 (using S = {1,…,ℓ}), we obtain our main theorem:
Theorem 9
For any digraph G on n vertices and m edges, with surjective ℓ-colouring α, in time O((n+m)log n) a canonical surjective k-colouring β of G can be computed such that π _{ β } is the coarsest stable partition that refines π _{ α }.
Implementation Details
It remains to prove Lemma 7. In Algorithm 2 and its subroutine Algorithm 3, the detailed, fast implementation of Algorithm 1 is given. The colour classes C _{ i } are represented by doubly linked lists C[i], indexed by i ∈ {1,…,n}. (So C is an array containing (pointers to) doubly linked lists, indexed by colour numbers 1,…,n.) For all lists L, we keep track of their length, which we denote by |L|.
The first challenge is how to compute the colour degrees \(d^{+}_{r}(v)\) efficiently for every v ∈ V(G) (Lines 7 and 8 of Algorithm 1), with respect to the refining colour r, and corresponding colour class R. For this we use an array cdeg[v] of integers, indexed by v ∈ {1,…,n}. We use the following invariant: at the beginning of every iteration, cdeg[v]=0 for all v. Then we can compute these colour degrees by looping over all in-neighbours w of all vertices v ∈ R, and increasing cdeg[w]. At the same time, we compute the maximum colour degree for every colour c, using an array maxcdeg (this is an array of integers indexed by c ∈ {1,…,n}), we compute a list Colors_{adj} of colours i that contain at least one vertex w ∈ C _{ i } with cdeg[w] ≥ 1, and for every such colour i, we compute a list A[i] of all vertices w with cdeg[w] ≥ 1. (So A is an array containing (pointers to) lists, indexed by colour numbers 1,…,n.) None of these lists contain duplicates. See Lines 47–54 of Algorithm 2. This implementation is correct because we also maintain the following invariant: at the beginning of every iteration, maxcdeg[c]=0 and A[c] is an empty list, for every c, Colors_{adj} is an empty list, and flags are maintained for colours to keep track of membership in Colors_{adj}. To maintain this invariant, we reset all of these data structures again at the end of every iteration (Lines 68–73). Note that it suffices to only reset cdeg[v] for vertices v that occur in some list A[c] (Lines 69–70).
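The colour degree computation described above can be sketched as follows. This is a minimal illustration only, not the paper's Algorithm 2: the function name `colour_degrees` and its parameters are hypothetical, and dictionaries are used in place of the preallocated arrays with their reset invariant.

```python
def colour_degrees(in_nbrs, colour, R):
    """Count, for every vertex w, its out-neighbours inside the refining class R.

    in_nbrs[v] lists the in-neighbours of v; colour[w] is the current colour of w.
    Assumption: dicts replace the arrays cdeg, maxcdeg and A of the text, so no
    explicit reset step is needed between iterations.
    """
    cdeg = {}        # cdeg[w] = number of out-neighbours of w in R
    maxcdeg = {}     # maxcdeg[c] = largest colour degree within colour c
    A = {}           # A[c] = vertices of colour c with cdeg >= 1, no duplicates
    colors_adj = []  # colours that contain at least one such vertex
    for v in R:
        for w in in_nbrs[v]:
            cdeg[w] = cdeg.get(w, 0) + 1
            c = colour[w]
            if cdeg[w] == 1:          # w is seen for the first time
                if c not in A:        # plays the role of the membership flag
                    A[c] = []
                    colors_adj.append(c)
                A[c].append(w)
            if cdeg[w] > maxcdeg.get(c, 0):
                maxcdeg[c] = cdeg[w]
    return cdeg, maxcdeg, A, colors_adj
```

The work done is proportional to |R| plus the number of pairs (v, w) visited, which is D^{−}(R), in line with the bound used in the proof of Lemma 7.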
Next, we address how we can consider all colours that split up in one iteration, in canonical (increasing) order (see Lines 9,10 of Algorithm 1 and Lines 55–66). To this end, we compute a new list Colors_{split}, which represents the subset of Colors_{adj} containing all colours that actually split up. This is necessary since this list needs to be sorted, in order to consider the colours in canonical order (in the for-loop in Line 66). By ensuring that all colours in Colors_{split} split up, we have that |Colors_{split}| ≤ k (where k is the number of colours introduced in this iteration), and therefore we can afford to sort this list. This can be done using any list sorting algorithm of complexity O(k log k), such as merge sort. To compute which colours split up, we compute for every colour c ∈ Colors_{adj} the maximum colour degree maxcdeg[c] and minimum colour degree mincdeg[c]. The values maxcdeg[c] were computed before. Observe that we have mincdeg[c]=0 if |A[c]| < |C[c]|. Otherwise, we can afford to compute mincdeg[c] by iterating over A[c] (see Lines 55–61).
Finally, we need to show how a single colour class S = C[s] can be split up efficiently, and how the appropriate new colours can be added to the stack S _{refine} in the proper order (Lines 10–28 of Algorithm 1). The details of this procedure are given in Algorithm 3. Firstly, for every relevant d, we compute how many vertices in C[s] have colour degree d. These values are stored in an array numcdeg[d], indexed by d ∈ {0,…,maxcdeg[s]} (Lines 76–80). Using this array numcdeg, we can easily compute the (minimum) colour degree b that occurs most often in S (Lines 81–83), which corresponds to the new colour that is possibly not added to S _{refine}. Using numcdeg, we can also easily construct an array f, indexed by d ∈ {0,…,maxcdeg[s]}, which represents the mapping from colour degrees that occur in S to newly introduced colours, or to the current colour s (Lines 85–93). Finally, we can move all vertices v ∈ A[s] from C[s] to C[i], where i = f[cdeg[v]] is the new colour that corresponds to the colour degree of v (Lines 94–98). Note that looping over A[s] suffices, because if there are vertices in C[s] with colour degree 0, then these keep the same colour, and thus do not need to be addressed. This fact is essential since the number of such vertices may be too large to consider, for our desired complexity bound. In conclusion, Algorithms 2 and 3 are indeed implementations of Algorithm 1. We now prove Lemma 7 by analysing the complexity.
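The splitting step can be illustrated by the following sketch. It is not a transcription of Algorithm 3: the name `split_up_colour`, the use of Python sets for colour classes, and the list scan for stack membership are our own simplifications. We also assume, consistently with the remark that vertices of colour degree 0 keep their colour, that the class of minimum colour degree retains the colour s, and that A[s] is nonempty.

```python
def split_up_colour(s, C, A, cdeg, colour, k, stack):
    """Split class C[s] by colour degree; return the new number of colours.

    C[c] is a set of vertices, A[s] lists the vertices of C[s] with cdeg >= 1,
    k is the number of colours in use, stack plays the role of S_refine.
    """
    maxdeg = max(cdeg[v] for v in A[s])
    numcdeg = [0] * (maxdeg + 1)           # numcdeg[d] = #vertices of degree d
    numcdeg[0] = len(C[s]) - len(A[s])
    for v in A[s]:
        numcdeg[cdeg[v]] += 1
    # b: the smallest colour degree that occurs most often in C[s]
    b = max(range(maxdeg + 1), key=lambda d: (numcdeg[d], -d))
    mindeg = min(d for d in range(maxdeg + 1) if numcdeg[d] > 0)
    in_stack = s in stack                  # the text uses a 0/1 flag instead
    f = {}                                 # colour degree -> colour
    for d in range(maxdeg + 1):
        if numcdeg[d] == 0:
            continue
        if d == mindeg:                    # assumption: this class keeps colour s
            f[d] = s
            if not in_stack and d != b:
                stack.append(s)
        else:
            k += 1
            f[d] = k
            C[k] = set()
            if in_stack or d != b:         # skip only the largest class
                stack.append(k)
    for v in A[s]:                         # vertices of degree 0 are never touched
        i = f[cdeg[v]]
        if i != s:
            C[s].remove(v)
            C[i].add(v)
            colour[v] = i
    return k
```

All loops run over A[s] or over the range 0,…,maxcdeg[s], which is what keeps the cost of one call within the bound discussed in the proof of Lemma 7.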
Proof 7 (Proof of Lemma 7)
The given implementation uses a number of arrays of length n, either containing integers (cdeg, colour, maxcdeg, numcdeg, f), or containing (pointers to) lists/doubly linked lists (C, A). All of these arrays can be initialised in time O(n). In general, the initialisation steps (Lines 33–44) take time O(n) (for Line 43, use bucket sort).
We first consider the complexity of the subroutine SplitUpColour (s), given in Algorithm 3. We prove that it terminates in time \(O(D^{+}_{R}(S))\), where R = C[r] denotes the refining colour class, S = C[s] denotes the class to be split up, and \(D^{+}_{R}(S)={\sum }_{v\in S} |N^{+}(v)\cap R|\). Every (non-loop) line takes constant time. For the list deletion (Line 96), this requires a proper implementation of doubly linked lists. The test in Line 84 whether s ∈ S _{refine} can be done in constant time by maintaining a 0/1 flag for every colour, which indicates whether the colour is in S _{refine}. Since colours are added to and deleted from the stack S _{refine} one by one, maintaining these flags is no problem. All for-loops in Algorithm 3 are repeated either maxcdeg[s] times or |A[s]| times. Both values are bounded by \(D^{+}_{R}(S)\). So the total complexity of one call to the subroutine can be bounded by \(O(D^{+}_{R}(S))\).
Now consider the complexity of one while-loop iteration of Algorithm 2. The first two (nested) for-loops (Lines 47–54) take time O(|R| + D ^{−}(R)). This holds because in total, D ^{−}(R) choices of w are considered, and the operations for every such choice take constant time. The test in Line 51 can be implemented in constant time using a 0/1 flag that keeps track of whether a colour appears in Colors_{adj}. Since elements are added to and deleted from Colors_{adj} one by one (Lines 52, 73), maintaining these flags is again no problem.
Since |Colors_{adj}| ≤ D ^{−}(R), the complexity of the for-loops in Lines 55 and 63 can be bounded by O(D ^{−}(R)). Sorting Colors_{split} takes time O(k log k), when using e.g. merge sort, since |Colors_{split}| ≤ k (every colour in Colors_{split} will split up and thus introduce at least one new colour). One call to the subroutine SplitUpColour (s) takes time \(O(D^{+}_{R}(S))\), with S = C[s], as shown above. Since
\[\sum_{s\in \text{Colors}_{\text{split}}} D^{+}_{R}(C[s]) \le D^{-}(R),\]
the complexity of the for-loop in Line 66 can be bounded by O(D ^{−}(R)). The complexity of the last for-loop (Line 68) can also be bounded by O(D ^{−}(R)). Note in particular that in total, at most D ^{−}(R) choices of v are considered in Line 70. This shows that the complexity of one iteration of the while-loop can be bounded by O(|R| + D ^{−}(R) + k log k). □
3.4 Extensions, Generalisations and Variants
Stack vs. queue
In our algorithm, we use a stack to select the colour for the next refining operation, whereas previous similar algorithms use a queue [20, 24]. Firstly, we remark that if we replace the stack by a queue, it can easily be checked that all of the claims proved in the previous sections still hold. So the best choice is determined by other concerns, which we now briefly discuss.
Using a queue gives the nice property that during the algorithm execution, all of the following ‘standard’ partitions will be generated: given an initial partition π = π _{0} of the vertices V of a graph G, for every i ≥ 0 one can define π _{ i + 1} to be the partition obtained from π _{ i } using the refining operation (V, V). The coarsest stable partition of G that refines π is now the first partition π _{ i } with π _{ i } = π _{ i + 1}. This characterisation is sometimes used as an alternative definition of coarsest stable partitions. One can verify that when using a queue, for every i a colouring α with π _{ α } = π _{ i } will be generated during the execution of the algorithm.
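The sequence π _{0}, π _{1}, … of 'standard' partitions can be computed directly by the following naive round-based procedure. This is only a sketch for illustration: it is not the O((n + m) log n) algorithm of this paper, and the names `refine_once` and `coarsest_stable_colouring` are our own. Each round applies the refining operation (V, V) by recolouring every vertex according to its old colour and the multiset of colours of its out-neighbours.

```python
def refine_once(out_nbrs, colour):
    """One round: pi_i -> pi_{i+1}, via the refining operation (V, V)."""
    signature = {
        v: (colour[v], tuple(sorted(colour[w] for w in out_nbrs[v])))
        for v in out_nbrs
    }
    # Number the distinct signatures in sorted order, so the result is deterministic.
    index = {sig: i for i, sig in enumerate(sorted(set(signature.values())), start=1)}
    return {v: index[signature[v]] for v in out_nbrs}


def coarsest_stable_colouring(out_nbrs, colour=None):
    """Iterate refine_once until the number of colours stops growing."""
    if colour is None:
        colour = {v: 1 for v in out_nbrs}   # unit colouring
    while True:
        new = refine_once(out_nbrs, colour)
        # new always refines colour, so equal class counts mean pi_i = pi_{i+1}
        if len(set(new.values())) == len(set(colour.values())):
            return new
        colour = new
```

For example, on a 6-cycle (with each undirected edge given as two arcs) the unit colouring is already stable, whereas on a path with three vertices the middle vertex is separated from the two end vertices after one round.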
When using a stack, the behaviour of the algorithm seems somewhat less predictable. Nevertheless, this yields a ‘depth-first’ type of strategy that tends to give very small colour classes much quicker, which seems an advantage. In our own (limited) computational studies, we observed that using a stack was never worse than using a queue, and in some cases significantly better. Furthermore, we had an earlier lower bound example construction that required time Ω((n + m) log n) for a queue-based algorithm, but could be solved in time O(n + m) using a stack-based algorithm. For these reasons, we would recommend using a stack.
The Complexity of Iterative Refinement
Consider Algorithm 4. This algorithm takes as input a digraph G on n vertices, and returns a discrete colouring β of G, or more precisely: a surjective n-colouring of G. This colouring is not canonical, since in Line 103, an arbitrary vertex is chosen to be individualised, that is, to receive a unique colour. So by itself this algorithm is not very interesting (there are easier ways to obtain an arbitrary discrete colouring of G). However, it corresponds to one recursion branch of various state-of-the-art canonical labelling algorithms, based on the algorithm introduced by McKay [20]. We now briefly sketch how one should modify this algorithm (into a recursive algorithm) to obtain such a canonical labelling algorithm: In Line 103, instead choose a colour class \(C^{\alpha }_{i}\) of the current colouring α with \(|C^{\alpha }_{i}|\ge 2\). We branch on this colour class, as follows: for every \(v\in C^{\alpha }_{i}\), continue with a separate branch of the algorithm where v is individualised (Line 105), and a new stable colouring is computed (as shown in Line 106). Continuing recursively this way, one obtains a number of discrete colourings of G; one for every leaf of the recursion tree. A canonical discrete colouring of G can be obtained by choosing one of these colourings that maximises some value. For instance, consider the adjacency matrix representation of G where rows and columns are ordered according to the colour numbers, and view this as a binary number in the straightforward way. This is the basic algorithm; by keeping track of automorphisms of the graph, there are various ways to speed up the algorithm by pruning the recursion tree. For more details, we refer to [11, 18, 20–22].
The algorithm for obtaining a canonical discrete colouring β for a digraph G sketched above does not terminate in polynomial time for all graphs G. (If it did, this would yield a polynomial time isomorphism test: for two digraphs G and G ^{′}, compute canonical discrete n-colourings β and β ^{′}, respectively. Since β and β ^{′} are discrete n-colourings, they define a unique colour preserving bijection h : V(G)→V(G ^{′}). Since β and β ^{′} are canonical, G and G ^{′} are isomorphic if and only if h is an isomorphism.) Examples are known where such an algorithm will consider an exponential number of branches [23]. Nevertheless, a single branch of this algorithm (as shown in Algorithm 4) terminates quickly. From [20] it follows that Algorithm 4 has an implementation that terminates in time O(n ^{2} log n). Using our results, we can show that it has an O((n + m) log n) implementation.
Theorem 10
Algorithm 4 can be implemented such that it terminates in time O((n + m) log n), where n = |V(G)| and m = |E(G)| for the input graph G.
Proof 8
The main part of the computation occurs in Lines 99 and 106, where we compute a surjective canonical k-colouring β such that π _{ β } refines π _{ α }, for a given surjective ℓ-colouring α (in Line 99, the unit colouring is chosen for α). For this we use the fast implementation of Algorithm 1, given in Section 3.3. To obtain the desired complexity, we make the following simple changes, compared to Algorithm 1: we do not initialise the sets C _{ i } and stack S _{refine} every time we call the algorithm (Lines 1–4), and do not explicitly compute the new colouring β (Lines 29–32). In addition, we do not actually copy the colouring β (Line 101 of Algorithm 4). Instead, we initialise these sets once, keep working with the same sets {C _{1},…,C _{ k }} throughout different iterations of the while-loop in Algorithm 4, and only compute the corresponding colouring β at the very end of the algorithm. Whenever we individualise a vertex v by assigning it a new colour (Line 105 of Algorithm 4), we move v from its previous colour class C _{ i } to the new colour class C _{ ℓ + 1}. In addition, we update the stack S _{refine}, which is currently empty, to contain the single colour ℓ + 1. (Both can be done in constant time.)
We now argue that for computing the next stable colouring β (Line 106), it is sufficient that S _{refine} contains only the colour ℓ + 1. Denote by α _{1} the stable ℓ-colouring before this step (with α _{1}(v) = i), and by α _{2} the new (ℓ + 1)-colouring (with α _{2}(v) = ℓ + 1). Consider the colour classes \(C^{\alpha _{1}}_{i}\), \(C^{\alpha _{2}}_{i}\) and \(C^{\alpha _{2}}_{\ell +1}\), so \(\{C^{\alpha _{2}}_{i},C^{\alpha _{2}}_{\ell +1}\}\) is a partition of \(C^{\alpha _{1}}_{i}\). Consider any two vertices u, v with α _{2}(u) = α _{2}(v). If \(|N^{+}(u)\cap C^{\alpha _{2}}_{i}|\not =|N^{+}(v)\cap C^{\alpha _{2}}_{i}|\), then \(|N^{+}(u)\cap C^{\alpha _{2}}_{\ell +1}|\not =|N^{+}(v)\cap C^{\alpha _{2}}_{\ell +1}|\), since \(|N^{+}(u)\cap C^{\alpha _{1}}_{i}|=|N^{+}(v)\cap C^{\alpha _{1}}_{i}|\) (because α _{1} is stable). We conclude that {ℓ + 1} is a sufficient refining colour set for α _{2}, so Algorithm 1 will compute the desired stable colouring β when S _{refine} is initialised like this (Lemma 5).
We can now use the same argument as in the proof of Theorem 9 to show that the total complexity of all calls to Algorithm 1 (without the initialisation steps, as described above) is bounded by O((n + m) log n). Indeed, for every vertex v ∈ V(G), if \({R^{v}_{1}},\ldots ,{R^{v}_{q}}\) denote the refining colour classes C _{ r } with v ∈ C _{ r } that are considered throughout the entire computation, in chronological order, then again for all i ∈ {1,…,q−1} it holds that \(|{R^{v}_{i}}|\ge 2|R^{v}_{i+1}|\). If v is the vertex that is individualised in Line 105 (of Algorithm 4), then this holds because the next refining colour class that contains v has size one, whereas the previous colour class that contained v had size at least two (because v was chosen with a non-unique colour in Line 103). In all other cases, the argument given in the proof of Theorem 9 applies. Following that proof, this shows that the total complexity of all refining operations done in Algorithm 4 can be bounded by O((n + m) log n).
It remains to bound the complexity of the other steps of Algorithm 4. As described above, the various sets are initialised only once, and the final colouring β is computed only once, so this only adds a term O(n) to the complexity. In addition, all steps in the while-loop (Line 100) other than the stable colouring computation in Line 106 can be done in constant time, since we do not actually copy the colouring (Line 101). For the selection of the vertex v in Line 103, this claim is not entirely obvious, but one can observe that during the computation, one can maintain a doubly linked list that contains the colours of all colour classes of size at least two. This list can be updated in constant time whenever vertices are recoloured (so it does not change the total asymptotic complexity), and it can be used to select a vertex in Line 103 in constant time. The while-loop in Line 100 terminates after at most n iterations. In total, this shows that Algorithm 4 has an implementation with complexity O(n) + O(n) + O((n + m) log n) ⊆ O((n + m) log n). □
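The overall structure of a single individualise-and-refine branch can be sketched as follows. This sketch only illustrates the control flow of such a branch; it uses a naive round-based refinement and recomputes class sizes in every iteration, so it does not attain the O((n + m) log n) bound of Theorem 10. The function names and the rule of individualising the smallest eligible vertex are assumptions made for the example.

```python
from collections import Counter


def refine(out_nbrs, colour):
    """Naive refinement to the coarsest stable colouring refining `colour`."""
    while True:
        sig = {v: (colour[v], tuple(sorted(colour[w] for w in out_nbrs[v])))
               for v in out_nbrs}
        index = {s: i for i, s in enumerate(sorted(set(sig.values())), start=1)}
        new = {v: index[sig[v]] for v in out_nbrs}
        if len(index) == len(set(colour.values())):
            return new
        colour = new


def individualise_refine(out_nbrs):
    """One branch: alternate individualisation and refinement until discrete."""
    colour = refine(out_nbrs, {v: 1 for v in out_nbrs})
    n = len(out_nbrs)
    while len(set(colour.values())) < n:
        sizes = Counter(colour.values())
        # Assumption: pick the smallest vertex lying in a class of size >= 2.
        v = min(u for u in out_nbrs if sizes[colour[u]] >= 2)
        colour[v] = max(colour.values()) + 1      # give v a unique colour
        colour = refine(out_nbrs, colour)
    return colour
```

Every iteration of the outer loop increases the number of colours by at least one, so the loop runs at most n times, as in the proof above.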
We remark that in practice, one might wish to use smarter methods to select the vertex v to be individualised (Line 103), or more generally, to select the nontrivial colour class on which the recursive canonical labelling algorithm should branch. For instance, one can always branch on the smallest nontrivial colour class, or on the largest colour class^{Footnote 1}. In that case, an efficient heap-based priority queue implementation (see e.g. [13]) can be used instead of a doubly linked list to keep track of the sizes of colour classes, to attain the above complexity.
Alternative Stability Criteria
We formulated our results only for digraphs, with stability defined only in terms of out-neighbours. We now summarise how our results should be modified to accommodate alternative stability criteria.
Theorem 11
For any undirected graph G on n vertices with m edges, in time O((n + m) log n) a canonical coarsest stable colouring can be computed.
Proof 9
For an undirected graph G, denote by G ^{∗} the digraph with V(G ^{∗}) = V(G), constructed by replacing every undirected edge by two directed edges in both directions. Observe that a colouring α is stable for G if and only if it is stable for G ^{∗}, so we can use the fast implementation of Algorithm 1 on input G ^{∗} to compute a coarsest stable colouring of G. Next, observe that a bijection h : V(G)→V(H) is an isomorphism from G to H if and only if it is a (digraph) isomorphism from G ^{∗} to H ^{∗}. It follows that the computed colouring is a canonical coarsest stable colouring. □
For a positive integer p, we define a p-edge coloured digraph G to be a tuple (V, E, c) where (V, E) is a digraph that may have parallel edges and/or loops, and c : E→{1,…,p} is an edge colouring of G. For e ∈ E, we write e = (u, v) to denote that e is an edge from u to v. For j ∈ {1,…,p}, v ∈ V and C ⊆ V, denote \(d^{+}_{j}(v,C)=|\{ e\in E \mid \exists w\in C \ e=(v,w) \wedge c(e)=j\}|\) (the number of edges of colour j, leaving v, with head in C). A (vertex) ℓ-colouring α of G is called edge-colour stable if for all u, v ∈ V with α(u) = α(v), all j ∈ {1,…,p} and all i ∈ {1,…,ℓ}, it holds that \(d^{+}_{j}(u,C^{\alpha }_{i}) = d^{+}_{j}(v,C^{\alpha }_{i})\). For two p-edge coloured digraphs G = (V, E, c) and G ^{′}=(V ^{′},E ^{′},c ^{′}), a bijection h : V→V ^{′} is called an isomorphism if for all j ∈ {1,…,p} and u, v ∈ V (possibly the same), it holds that the number of edges of colour j from u to v equals the number of edges of colour j from h(u) to h(v). Using this notion of isomorphism, canonical colouring methods/canonical colourings for edge coloured digraphs are defined the same as before.
Theorem 12
Let G=(V,E,c) be an edge coloured digraph with n = |V| and m = |E|. In time O((n + m) log(n + m)), a canonical coarsest edge-colour stable colouring can be computed for G.
Proof 10
In time O(n + m), we can construct the following digraph G ^{′} from G, with vertex colouring α: Start with the vertex set V (the original vertices), and for every edge e ∈ E with e = (u, v), add a vertex v _{ e } (the new vertices) and two edges (u, v _{ e }) and (v _{ e }, v). Assign colour α(v _{ e }) = c(e) to the new vertices, and colour α(v) = 0 to the original vertices v ∈ V.
We will now show that a colouring β of G is edge-colour stable if and only if there exists a stable colouring β ^{′} for G ^{′} that refines α and coincides with β on V.
Consider an edge-colour stable colouring β of G. We extend it to a colouring β ^{′} of G ^{′}, as follows: for each new vertex v _{ e } that corresponds to an edge e = (u, v), assign the tuple (c(e),β(v)). Extend β by assigning new colours to the new vertices, according to the lexicographical order of these tuples. (So two new vertices receive the same colour if and only if they are assigned the same tuple, and a new vertex and an original vertex never receive the same colour.) The resulting colouring β ^{′} of G ^{′} clearly refines α, and is stable for G ^{′}: for every vertex colour i used by β, vertex u ∈ V and edge colour j ∈ {1,…,p}, the number \(d^{+}_{j}(u,C^{\beta }_{i})\) (with respect to G) equals the number of out-neighbours of u in G ^{′} that have the colour corresponding to the tuple (j, i). For the new vertices of G ^{′}, the stability criterion follows easily.
For the other direction, consider a stable colouring β ^{′} of G ^{′} that refines α, and define β to be the restriction of β ^{′} to V. We argue that β is edge-colour stable for G. For two new vertices v _{ e } and v _{ f } of G ^{′}, with respective out-neighbours x and y, we have that β ^{′}(v _{ e }) = β ^{′}(v _{ f }) implies β ^{′}(x) = β ^{′}(y) and α(v _{ e }) = α(v _{ f }), so c(e) = c(f). This can be used to conclude that for any two vertices u, v ∈ V, colour i and edge colour j, if β(u) = β(v) then \(d^{+}_{j}(u,C^{\beta }_{i})=d^{+}_{j}(v,C^{\beta }_{i})\). So β is edge-colour stable for G.
It follows that a coarsest edge-colour stable colouring β of G corresponds to a coarsest stable colouring β ^{′} of G ^{′} that refines α. Since we can compute such a colouring β ^{′} in a canonical way, we can compute such a colouring β in a canonical way (Theorem 9). It remains to consider the complexity. The graph G ^{′} and colouring α can be constructed from G in time O(n + m). It has n + m vertices, and 2m edges. So β ^{′} can be computed in time O((n + 3m) log(n + m)) = O((n + m) log(n + m)), by Theorem 9. □
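The construction of G ^{′} and α used in this proof can be sketched as follows. The representation is an assumption made for the example: vertices of G are the integers 0,…,n−1, edges are triples (u, v, c) with edge colour c ≥ 1, and the new vertex for the edge with list index idx is numbered n + idx. The function name is hypothetical.

```python
def subdivide_edge_coloured(n, edges):
    """Build (out_nbrs, alpha) for G' from an edge coloured digraph G.

    Every edge e = (u, v) of colour c is replaced by a new vertex v_e with
    arcs (u, v_e) and (v_e, v); v_e gets colour c, original vertices colour 0.
    Parallel edges and loops are allowed.
    """
    out_nbrs = {v: [] for v in range(n)}
    alpha = {v: 0 for v in range(n)}
    for idx, (u, v, c) in enumerate(edges):
        v_e = n + idx                 # the new vertex representing this edge
        out_nbrs[u].append(v_e)
        out_nbrs[v_e] = [v]
        alpha[v_e] = c
    return out_nbrs, alpha
```

The result has n + m vertices and 2m arcs, and it is built in time O(n + m), matching the counts used in the complexity argument.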
We remark that for any class of edge-coloured digraphs where the number of edges is polynomially bounded in the number of vertices (so they satisfy m ∈ O(n ^{d}) for a constant d), we can write log(n + m) ∈ O(log n ^{d}) = O(log n). So for such a graph class, the above theorem shows that a canonical coarsest edge-colour stable colouring can again be computed in time O((n + m) log n).
The above theorem can be used for various stronger isomorphism tests. We now give details for one of these. For digraphs, we defined stability only considering out-neighbourhoods. Nevertheless, an isomorphism h between two digraphs not only maps the out-neighbourhood of a vertex v bijectively to the out-neighbourhood of h(v), but does the same with the in-neighbourhoods. So for the purpose of digraph isomorphism testing, the following stronger stability criterion is more useful: a k-colouring α of a digraph G is bistable if for every pair of vertices u, v ∈ V(G) with α(u) = α(v) and every colour c ∈ {1,…,k}, both \(|N^{+}(u)\cap C^{\alpha }_{c}|=|N^{+}(v)\cap C^{\alpha }_{c}|\) and \(|N^{-}(u)\cap C^{\alpha }_{c}|=|N^{-}(v)\cap C^{\alpha }_{c}|\) hold.
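To make the bistability criterion concrete, the following sketch checks it directly from the definition. The function name `is_bistable` and the input format (a list of arcs (u, v) and a dictionary mapping each vertex to its colour) are assumptions made for this example.

```python
from collections import Counter


def is_bistable(edges, colour):
    """Check whether `colour` is bistable for the digraph given by `edges`."""
    out_sig = {v: Counter() for v in colour}   # colours of out-neighbours
    in_sig = {v: Counter() for v in colour}    # colours of in-neighbours
    for u, v in edges:
        out_sig[u][colour[v]] += 1
        in_sig[v][colour[u]] += 1
    seen = {}                                  # colour -> signature of its class
    for v in colour:
        sig = (frozenset(out_sig[v].items()), frozenset(in_sig[v].items()))
        # All vertices of one colour must share the same pair of signatures.
        if seen.setdefault(colour[v], sig) != sig:
            return False
    return True
```

For instance, on the digraph with arcs (0, 2) and (1, 2) and an isolated vertex 3, the colouring with classes {0, 1} and {2, 3} is stable with respect to out-neighbours only, but it is not bistable, since vertex 2 has two in-neighbours and vertex 3 has none.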
Theorem 13
For any digraph G on n vertices with m edges, in time O((n + m) log n) a canonical coarsest bistable colouring can be computed.
Proof 11
Let V = V(G). Construct a 2-edge coloured digraph G ^{′}=(V, E, c) on the same vertex set as G as follows: for every edge (u, v) ∈ E(G), add an edge e = (u, v) to E with c(e) = 1, and an edge f = (v, u) to E with c(f) = 2. (Note that this may introduce parallel edges.) Observe that a colouring α : V→{1,…,k} is edge-colour stable for G ^{′} if and only if it is bistable for G, and that a canonical colouring method for G ^{′} is a canonical colouring method for G. So Theorem 12 can be applied. We use that G ^{′} has 2m ∈ O(n ^{2}) edges, which yields the complexity bound O((n + m) log n). □
4 Complexity Lower Bound
We shall prove our lower bound for undirected graphs; this makes it as general as possible. The cost of a refining operation (R, S) in a graph G is
\[\text{cost}(R,S) := |\{(u,v) \mid uv\in E(G),\ u\in R,\ v\in S\}|.\]
This is basically the number of edges between R and S, except that edges with both ends in R ∩ S are counted twice. For a partition π that admits a refining operation (R, S), denote by π(R, S) the partition that results from this operation.
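As a small illustration, the cost of a refining operation can be computed by counting ordered pairs (u, v) with u ∈ R, v ∈ S and uv ∈ E(G), which is one reading that matches the description above. The function name `cost` and the adjacency-list input (each undirected edge listed at both endpoints) are assumptions made for this sketch.

```python
def cost(adj, R, S):
    """Number of ordered pairs (u, v) with u in R, v in S and uv an edge.

    adj[u] lists the neighbours of u in an undirected graph, so an edge with
    both ends in the intersection of R and S contributes two pairs.
    """
    S = set(S)
    return sum(1 for u in R for v in adj[u] if v in S)
```

On a triangle with vertices 0, 1, 2, taking R = {0, 1} and S = {1, 2} gives cost 3 (one per edge), while R = S = {0, 1} gives cost 2, because the single edge 01 lies inside R ∩ S and is counted twice.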
Definition 14
Let G = (V, E) be a graph, and π be a partition of V.

If π is stable, then cost(π):=0.

Otherwise, cost(π) := \(\min_{R,S}\) cost(π(R, S)) + cost(R, S), where the minimum is taken over all effective refining operations (R, S) that can be applied to π.
Note that this is well-defined; if π is unstable, then there exists at least one effective elementary refining operation (R, S), and for any such operation, |π(R, S)| > |π|. We can now formulate the main result of this section.
Theorem 15
For every integer k ≥ 2, there is a graph G _{ k } with n∈O(2 ^{k} k) vertices and m∈O(2 ^{k} k ^{2} ) edges, such that cost(α)∈Ω((m+n)logn), where α is the unit partition of V(G _{ k } ).
Note that this theorem implies a complexity lower bound for all partition-refinement based algorithms for colour refinement, as discussed in the introduction. We will first prove some basic observations related to the above definitions, then give the construction of the graph, and finally prove Theorem 15.
4.1 Basic Observations
We start with two basic properties of stable partitions. The first proposition follows easily from the definitions.
Proposition 16
Let G=(V,E) be a graph, and π be a stable partition of V. For any π-closed subset S⊆V, π[S] is a stable partition for G[S].
Proposition 17
Let G=(V,E) be a graph, and π be a stable partition of V. For any π-closed set S and vertices u,v∈V: if the distance from u to S is different from the distance from v to S, then u≉ _{ π } v.
Proof 12
Denote the distance from a vertex x to S by dist(x, S). W.l.o.g. we may assume that dist(u, S) < dist(v, S), so in particular dist(u, S) is finite. We prove the statement by induction over dist(u, S). If dist(u, S) = 0 then u ∈ S but v ∉ S. Since S is π-closed, this implies u ≉ _{ π } v. Otherwise, u is adjacent to a vertex w with dist(w, S) = dist(u, S)−1, but v is not. Let R ∈ π be the cell with w ∈ R. Then by induction, |N(v) ∩ R| = 0, so u ≉ _{ π } v, since π is stable. □
For a partition π of V, denote by π _{ ∞ } the coarsest stable partition of V that refines π.
Proposition 18
Let π and ρ be partitions of V such that π≼ρ≼π _{ ∞ } . Then cost(π) ≥ cost(ρ).
Proof 13
Let (R, S) be a refining operation that can be applied to π, which yields π ^{′}. Then it can be observed that the operation (R, S) can also be applied to ρ, and that for the resulting partition ρ ^{′}, it holds again that π ^{′}≼ρ ^{′}≼π _{ ∞ } (Proposition 2 shows that ρ ^{′}≼π _{ ∞ }).
An induction proof based on this observation shows that a minimum cost sequence of refining operations that refines π to π _{ ∞ } can also be applied to ρ, to yield the stable partition π _{ ∞ }, at the same cost. Therefore, cost(π) ≥ cost(ρ). □
A refining operation (R, S) on π is elementary if both R ∈ π and S ∈ π. The next proposition shows that adding the word ‘elementary’ in Definition 14 yields an equivalent definition.
Proposition 19
Let π be an unstable partition of V(G). Then
\[\text{cost}(\pi) = \min_{R,S}\ \text{cost}(\pi(R,S)) + \text{cost}(R,S),\]
where the minimum is taken over all effective elementary refining operations (R,S) that can be applied to π.
Proof 14
Let (R, S) be a non-elementary refining operation for π, and let ρ _{1} be the result of applying (R, S) to π. We shall prove that there is a sequence of elementary refining operations of total cost at most cost(R, S) that, when applied to π, yields a partition ρ _{2} that refines ρ _{1}. The claim follows by Proposition 18.
Suppose that R consists of the cells R _{1},…,R _{ q } and S consists of the cells S _{1},…,S _{ p }. We apply the elementary refining operations (R _{ i }, S _{ j }) for all i ∈ {1,…,q},j ∈ {1,…,p} in an arbitrary order and let ρ _{2} be the resulting partition. The cost of these elementary refinements is
\[\sum_{i=1}^{q}\sum_{j=1}^{p} \text{cost}(R_{i},S_{j}) = \text{cost}(R,S).\]
It is easy to see that ρ _{2} refines ρ _{1}. Indeed, if u, v ∈ S belong to the same class of ρ _{2}, then they belong to the same class S _{ j }, and for all classes R _{ i } they have the same number of neighbours in R _{ i }. Hence they have the same number of neighbours in \(R=\bigcup _{i}R_{i}\), and this means that they belong to the same class of ρ _{1}. □
4.2 Construction of the Graph
For \(k\in \mathbb {N}\), denote \(\mathcal {B}_{k}=\{0,\ldots ,2^{k}-1\}\). For ℓ ∈ {0,…,k} and q ∈ {0,…,2^{ℓ}−1}, the subset \(\mathcal {B}^{\ell }_{q}=\{q2^{k-\ell },\ldots ,(q+1)2^{k-\ell }-1\}\) is called the q-th binary block of level ℓ. Analogously, for any set of vertices with indices in \(\mathcal {B}_{k}\), we also consider binary blocks. For instance, if \(X=\{x_{i} \mid i\in \mathcal {B}_{k}\}\), then \(X^{\ell }_{q}=\{x_{i} \mid i\in \mathcal {B}^{\ell }_{q}\}\) is called a binary block of X. For such a set X, a partition π of X into binary blocks is a partition where every S ∈ π is a binary block. A key fact for binary blocks that we will often use is that for any ℓ and q, \(\mathcal {B}^{\ell }_{q}=\mathcal {B}^{\ell +1}_{2q}\cup \mathcal {B}^{\ell +1}_{2q+1}\).
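The definition of binary blocks and the key fact about halving them can be checked with a few lines of code. The function name `binary_block` is our own; the sketch assumes the block of level ℓ with index q consists of the 2^{k−ℓ} consecutive indices starting at q·2^{k−ℓ}.

```python
def binary_block(k, level, q):
    """Return the q-th binary block of the given level as a set of indices."""
    size = 2 ** (k - level)               # every block of this level has this size
    return set(range(q * size, (q + 1) * size))


# Each block is the union of its two halves on the next level.
k = 3
for level in range(k):
    for q in range(2 ** level):
        halves = binary_block(k, level + 1, 2 * q) | binary_block(k, level + 1, 2 * q + 1)
        assert binary_block(k, level, q) == halves
```

For k = 3, the blocks of level 1 are {0, 1, 2, 3} and {4, 5, 6, 7}, and the blocks of level 3 are the singletons.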
For every integer k ≥ 2, we will construct a graph G _{ k }. (An example for k = 3 is given in Fig. 1.) In its core this graph consists of the vertex sets \(X=\{x_{i}\mid i\in \mathcal {B}_{k}\}\), \(\mathcal X=\{{x^{j}_{i}}\mid i\in \mathcal {B}_{k},j\in \{1,\ldots ,k\}\}\), \(\mathcal Y=\{{y^{j}_{i}}\mid i\in \mathcal {B}_{k},j\in \{1,\ldots ,k\}\}\) and \(Y=\{y_{i}\mid i\in \mathcal {B}_{k}\}\) . Every vertex x _{ i } is adjacent to \({x^{j}_{i}}\) for all j ∈ {1,…,k} and every y _{ i } is adjacent to all \({y_{i}^{j}}\) . Furthermore, for all i, j _{1}, j _{2} there is an edge between \(x^{j_{1}}_{i}\) and \(y^{j_{2}}_{i}\). (For \(\mathcal X\), binary blocks are subsets of the form \(\mathcal X^{\ell }_{q}:=\{{x^{j}_{i}} \mid i\in \mathcal {B}^{\ell }_{q}, j\in \{1,\ldots ,k\}\}\), and for \(\mathcal Y\) the definition is analogous.)
We add gadgets to the graph to ensure that any sequence of refining operations behaves as follows. After the first step, which distinguishes vertices according to their degrees, X and Y are cells of the resulting partition. Next, X splits up into two binary blocks \({X^{1}_{0}}\) and \({X^{1}_{1}}\) of equal size. This causes \(\mathcal X\) to split up accordingly into \({\mathcal X^{1}_{0}}\) and \({\mathcal X^{1}_{1}}\). One of these cells will be used to halve \(\mathcal Y\) in the same way. This refining operation (R, S) is expensive because [R, S] contains half of the edges between \(\mathcal X\) and \(\mathcal Y\). Next, Y can be split up into \({Y^{1}_{0}}\) and \({Y^{1}_{1}}\). Once this happens, there is a gadget AND_{1} that causes the two cells \({X^{1}_{0}}\), \({X^{1}_{1}}\) to split up into the four cells \({X^{2}_{q}}\), for q = 0,…,3. Again, this causes cells in \(\mathcal X, \mathcal Y\) and Y to split up in the same way and to achieve this, half of the edges between \(\mathcal X\) and \(\mathcal Y\) have to be considered. The next gadget AND_{2} ensures that if both cells of Y are split, then the four cells of X can be halved again, etc. In general, we design a gadget AND_{ ℓ } of level ℓ that ensures that if Y is partitioned into 2^{ℓ + 1} binary blocks of equal size, then X can be partitioned into 2^{ℓ + 2} binary blocks of equal size. By halving all the cells of X and Y k = Θ(log n) times (with n = |V(G _{ k })|), this refinement process ends up with a discrete colouring of these vertices. Since every iteration uses half of the edges between \(\mathcal X\) and \(\mathcal Y\) (which are Θ(m)), we get the cost lower bound of Ω(m log n) (with m = |E(G _{ k })|).
We now define these gadgets in more detail. For every integer ℓ ≥ 1, we define a gadget AND_{ ℓ }, which consists of a graph G together with two out-terminals a _{0}, a _{1}, and an ordered sequence of p = 2^{ℓ} in-terminals b _{0},…,b _{ p−1}. For ℓ = 1, the graph G has V(G) = {a _{0}, a _{1}, b _{0}, b _{1}}, and E(G) = {a _{0} b _{0}, a _{1} b _{1}}. For ℓ = 2, the graph G is identical to the construction of Cai, Fürer and Immerman [8]. (See Fig. 2. The out-terminals a _{0}, a _{1} and in-terminals b _{0},…,b _{3} are indicated.) For ℓ ≥ 3, AND_{ ℓ } is obtained by taking one copy G ^{∗} of an AND_{2}-gadget, and two copies G ^{′} and G ^{″} of an AND_{ ℓ−1}-gadget, and adding four edges to connect the two pairs of in-terminals of G ^{∗} with the pairs of out-terminals of G ^{′} and G ^{″}, respectively. As out-terminals of the resulting gadget we choose the out-terminals of G ^{∗}. The in-terminal sequence is obtained by concatenating the sequences of in-terminals of G ^{′} and G ^{″}. (See Fig. 3 for an example of AND_{3}.) For any AND_{ ℓ }-gadget G with in-terminals \(b_{0},\ldots ,b_{2^{\ell }-1}\), the in-terminal pairs are pairs b _{2p } and b _{2p + 1}, for all p ∈ {0,…,2^{ℓ−1}−1}.
The graph G _{ k } is now constructed as follows. Start with vertex sets \(X,\mathcal X,\mathcal Y\) and Y, and edges between them, as defined above. For every ℓ ∈ {1,…,k−1}, we add a copy G of an AND_{ ℓ }-gadget to the graph. Denote the out- and in-terminals of G by a _{0}, a _{1} and \(b_{0},\ldots ,b_{2^{\ell }-1}\), respectively.

For i = 0,1 and all relevant q: we add edges from a _{ i } to every vertex in \(X^{\ell +1}_{2q+i}\).

For every i, we add edges from b _{ i } to every vertex in \(Y^{\ell }_{i}\).
Finally, we add a starting gadget to the graph, consisting of three vertices v _{0}, v _{1}, v _{2}, the edge v _{1} v _{2}, and edges \(\{v_{0}x_{i} \mid i{\in \mathcal {B}^{1}_{0}}\}\cup \{v_{1}x_{i} \mid i{\in \mathcal {B}^{1}_{1}}\}\). See Fig. 1 for an example of this construction. (In the figure, we have expanded the terminals of AND_{2} into edges, for readability. This does not affect the behaviour of the graph.)
Proposition 20
G _{ k } has O(2 ^{k} k) vertices and O(2 ^{k} k ^{2} ) edges.
Proof 15
An easy induction proof shows that the AND_{ℓ}-gadget has O(2^{ℓ}) vertices and edges. So, all AND_{ℓ}-gadgets together, for ℓ ∈ {1,…,k−1}, have at most O(2^{k}) vertices and edges. Therefore, the bounds on the total number of vertices and edges of G_{k} are dominated by the number of vertices and edges in \(G_{k}[\mathcal X\cup \mathcal Y]\), which is k2^{k+1} and k^{2}2^{k}, respectively. □
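The induction in the proof above can be spelled out as follows. This is only a sketch: the symbols v_ℓ and e_ℓ (the numbers of vertices and edges of the AND_{ℓ}-gadget) are our shorthand and are not used elsewhere, and v_2, e_2 are treated as unspecified constants. For ℓ ≥ 3 the recursive definition (one AND_{2}-gadget, two AND_{ℓ−1}-gadgets, four connecting edges) gives

\[
v_{\ell} = 2v_{\ell-1} + v_{2}, \qquad e_{\ell} = 2e_{\ell-1} + e_{2} + 4 .
\]

Adding v_2 (respectively e_2 + 4) to both sides turns each recurrence into a pure doubling, so for all ℓ ≥ 2

\[
v_{\ell} = (2^{\ell-1}-1)\,v_{2}, \qquad e_{\ell} = 2^{\ell-2}(2e_{2}+4) - (e_{2}+4),
\]

both of which are O(2^{ℓ}). Summing over the gadgets used in G_{k} then gives \(\sum_{\ell=1}^{k-1} O(2^{\ell}) = O(2^{k})\), as claimed.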
We now state and prove the key property for AND_{ℓ}-gadgets. This requires the following definitions. For a graph G = (V, E), if ψ is a partition of a subset S ⊆ V, then for short we say that a partition ρ of V refines ψ if it refines ψ∪{V∖S}. We say that ρ agrees with ψ if ρ[S] = ψ. (So if V∖S ≠ ∅, one can choose ρ such that it agrees with ψ but does not refine ψ.) For two graphs G and H, by G⊎H we denote the graph obtained by taking the disjoint union of G and H. We say that a partition π of V distinguishes two sets V_{1} ⊆ V and V_{2} ⊆ V if there is a set R ∈ π with |R ∩ V_{1}| ≠ |R ∩ V_{2}|. This is often used for the case where V_{1} = N(u) and V_{2} = N(v) for two vertices u and v, to conclude that if π is stable, then u ≉_{π} v. If V_{1} = {x} and V_{2} = {y}, then we also say that π distinguishes x from y.
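The notion of one partition distinguishing two vertex sets, and the resulting stability test, can be sketched in a few lines. This is a minimal illustration that assumes a partition is a list of Python sets and a graph is a dictionary of neighbour sets; the function names and the three-vertex path are hypothetical and chosen only for this example.

```python
def distinguishes(partition, v1, v2):
    """True if some cell R of the partition has |R & v1| != |R & v2|."""
    return any(len(cell & v1) != len(cell & v2) for cell in partition)


def is_stable(adj, partition):
    """True if no two vertices of a common cell have distinguished neighbourhoods."""
    return not any(
        distinguishes(partition, adj[u], adj[v])
        for cell in partition
        for u in cell
        for v in cell
    )


# Hypothetical example graph: the path a - b - c.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
unit = [{"a", "b", "c"}]          # all vertices in one cell
by_degree = [{"a", "c"}, {"b"}]   # end vertices versus the middle vertex
```

On this path the unit partition is not stable, because N(a) and N(b) meet the single cell in one and two vertices respectively, whereas the partition by degree is stable.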
Lemma 21
Let G be an AND_{ℓ}-gadget with in-terminals \(B=\{b_{0},\ldots ,b_{2^{\ell }-1}\}\) and out-terminals a_{0}, a_{1}. Let ψ be a partition of B into binary blocks, and let ρ be the coarsest stable partition of V(G) that refines ψ. Then ρ agrees with ψ. Furthermore, ρ distinguishes a_{0} from a_{1} if and only if ψ distinguishes all in-terminal pairs.
Proof 16
We prove the statement by induction over ℓ. For ℓ = 1, the statement is trivial. Now suppose ℓ = 2. We only consider partitions of {b_{0},…,b_{3}} into binary blocks. Because of the automorphisms of this gadget, it follows that it suffices to consider the following four partitions for ψ. For all of them, a corresponding partition ρ is given; it can be verified that ρ is the coarsest stable partition of V(AND_{ℓ}) that refines ψ. (The nonterminal vertices are labelled c_{0},…,c_{3}, as shown in Fig. 2.)
We see that in all four cases, ρ agrees with ψ on B. Furthermore, ρ distinguishes the out-terminals if and only if ψ distinguishes all in-terminal pairs (which is only the case for the last ψ).
Now suppose ℓ ≥ 3. Recall that an AND_{ℓ}-gadget H is obtained by taking two copies G^{′} and G^{″} of an AND_{ℓ−1}-gadget, and informally, putting a copy G^{∗} of an AND_{2}-gadget on top of those. Any partition ψ of the in-terminal set B of H into binary blocks corresponds to partitions ψ^{′} and ψ^{″} of the in-terminal sets B^{′} and B^{″} of G^{′} and G^{″} respectively, again into binary blocks. So by induction, we have coarsest stable partitions ρ^{′} and ρ^{″} of V(G^{′}) and V(G^{″}) that refine ψ^{′} and ψ^{″} and agree with them on B^{′} and B^{″}, respectively. Together, this yields a partition π of V(G^{′})∪V(G^{″}), which is stable for G^{′}⊎G^{″}, refines ψ, and agrees with ψ on B. (To be precise: if ψ is not the unit partition, then we can simply take π = ρ^{′}∪ρ^{″}, because ψ is a partition into binary blocks, and thus distinguishes every single in-terminal of G^{′} from every single in-terminal of G^{″}. Otherwise, every set in π should be the union of the two corresponding sets in ρ^{′} and ρ^{″}.) Then π gives a partition of the out-terminals of G^{′} and G^{″}, which yields a matching partition ψ^{∗} of the in-terminals B^{∗} of G^{∗}, again into binary blocks. Applying the induction hypothesis to G^{∗}, we obtain a coarsest stable partition ρ^{∗} of V(G^{∗}) that refines and agrees with ψ^{∗}. Combining π and ρ^{∗} yields a stable partition ρ of the vertices V(H) of the entire gadget.
Applying the induction hypothesis to G^{′} and G^{″} shows that at least one in-terminal pair of G^{∗} is not distinguished by ψ^{∗} if and only if at least one in-terminal pair of G^{′} or G^{″} is not distinguished by ψ^{′} or ψ^{″} respectively. Applying the induction hypothesis to G^{∗} then shows that ρ does not distinguish the out-terminals of H if ψ does not distinguish at least one in-terminal pair of H. This then also holds for the coarsest stable partition of V(H) that refines ψ.
Finally, let ψ be a partition into binary blocks of the in-terminals B of H that distinguishes every pair, and let ρ be a coarsest stable partition that refines ψ. We prove that ρ also distinguishes a_{0} from a_{1}. By definition, ρ distinguishes any vertex in B from any vertex not in B. We conclude that for any two vertices u, v ∈ V(H), if they have different distance to B, then u ≉_{ρ} v (Proposition 17). So by Proposition 16, ρ induces stable partitions ρ^{∗} and π for both G^{∗} and G^{′}⊎G^{″}, respectively. The graphs G^{′} and G^{″} are components of G^{′}⊎G^{″}, so we conclude that ρ induces stable partitions ρ^{′} and ρ^{″} for both G^{′} and G^{″}, respectively. By induction, it follows that ρ^{′} and ρ^{″} both distinguish the out-terminals of G^{′} and G^{″}, respectively. (If this holds for the coarsest stable partition, then it holds for any stable partition.) Then ψ := ρ[B^{∗}] (where B^{∗} denotes again the in-terminal set of G^{∗}) distinguishes all in-terminal pairs of G^{∗}. So by induction, ρ distinguishes a_{0} from a_{1}. □
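The statement of Lemma 21 can be checked experimentally on small gadgets. The following sketch builds AND_{ℓ} recursively and computes the coarsest stable partition refining ψ by naive iterated refinement (not the O((m + n)log n) algorithm). Several details are assumptions on our part, since the figures are not reproduced here: the edge list used for AND_{2} is one standard labelling of the Cai, Fürer and Immerman gadget, the pairing of the four connecting edges is a guess, and all function names are hypothetical.

```python
import itertools


def build_and(level, fresh):
    """Return (vertices, edges, (a0, a1), in_terminals) of an AND_level gadget."""
    if level == 1:
        a0, a1, b0, b1 = (next(fresh) for _ in range(4))
        return [a0, a1, b0, b1], [(a0, b0), (a1, b1)], (a0, a1), [b0, b1]
    if level == 2:
        # ASSUMED labelling of the CFI gadget: every c-vertex is adjacent to one
        # terminal of each pair (a0,a1), (b0,b1), (b2,b3), and picks the
        # odd-indexed terminal in an even number of pairs.
        a0, a1, b0, b1, b2, b3, c0, c1, c2, c3 = (next(fresh) for _ in range(10))
        edges = [(c0, a0), (c0, b0), (c0, b2),
                 (c1, a1), (c1, b1), (c1, b2),
                 (c2, a1), (c2, b0), (c2, b3),
                 (c3, a0), (c3, b1), (c3, b3)]
        return ([a0, a1, b0, b1, b2, b3, c0, c1, c2, c3], edges,
                (a0, a1), [b0, b1, b2, b3])
    top_v, top_e, outs, top_in = build_and(2, fresh)
    left_v, left_e, left_out, left_in = build_and(level - 1, fresh)
    right_v, right_e, right_out, right_in = build_and(level - 1, fresh)
    # Four connecting edges; the pairing b_i -- a_i is an assumption.
    glue = list(zip(top_in, left_out + right_out))
    return (top_v + left_v + right_v, top_e + left_e + right_e + glue,
            outs, left_in + right_in)


def coarsest_stable(vertices, edges, initial):
    """Naive colour refinement starting from the colouring `initial`."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colour = dict(initial)
    while True:
        sig = {v: (colour[v], tuple(sorted(colour[u] for u in adj[v])))
               for v in vertices}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        if len(ids) == len(set(colour.values())):
            return colour  # no cell was split, so the colouring is stable
        colour = {v: ids[sig[v]] for v in vertices}


def check_lemma(level, blocks):
    """blocks[i] names the block of psi containing in-terminal b_i.

    Returns (rho distinguishes a0 from a1, rho agrees with psi on B)."""
    vs, es, (a0, a1), ins = build_and(level, itertools.count())
    initial = {v: 0 for v in vs}  # a single colour for V \ B
    initial.update({b: 1 + blocks[i] for i, b in enumerate(ins)})
    rho = coarsest_stable(vs, es, initial)
    agrees = all((rho[x] == rho[y]) == (blocks[i] == blocks[j])
                 for i, x in enumerate(ins) for j, y in enumerate(ins))
    return rho[a0] != rho[a1], agrees
```

For example, `check_lemma(3, list(range(8)))` uses the discrete partition of the eight in-terminals, while `check_lemma(3, [0, 0, 2, 3, 4, 5, 6, 7])` leaves the first in-terminal pair undistinguished.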
The following corollary is immediate from Lemma 21.
Corollary 22
Let π be a stable partition for an AND-gadget G such that ψ = π[B] is a partition of the in-terminals B into binary blocks, and such that B is π-closed. If π does not distinguish the out-terminals, then at least one in-terminal pair is not distinguished.
Proof 17
Since B is π-closed, π refines ψ = π[B]. Since π is stable, it refines the coarsest stable partition ρ of V(G) that refines ψ. Now apply Lemma 21. □
4.3 Cost Lower Bound Proof
Intuitively, at level ℓ of the refinement process, the current partition contains all blocks \(\mathcal X^{\ell +1}_{q}\) of level ℓ + 1 and for all 0 ≤ q < 2^{ℓ}, either \(\mathcal Y^{\ell }_{q}\) or the two blocks \(\mathcal Y^{\ell +1}_{2q}\) and \(\mathcal Y^{\ell +1}_{2q+1}\). In this situation one can split up the blocks \(\mathcal Y^{\ell }_{q}\) into blocks \(\mathcal Y^{\ell +1}_{2q}\) and \(\mathcal Y^{\ell +1}_{2q+1}\) using either refining operation \((\mathcal X^{\ell +1}_{2q},\mathcal Y^{\ell }_{q})\) or \((\mathcal X^{\ell +1}_{2q+1},\mathcal Y^{\ell }_{q})\). These operations both have cost 2^{k−(ℓ+1)}k^{2}, and refining all the \(\mathcal Y^{\ell }_{q}\) cells in this way costs 2^{k−1}k^{2}. Once \(\mathcal Y\) is partitioned into binary blocks of level ℓ + 1, we can partition \(\mathcal X\) into blocks of level ℓ + 2 (using the AND_{ℓ}-gadget), and proceed the same way. Since there are k such refinement levels, we can lower bound the total cost of refining the graph by 2^{k−1}k^{3} = Ω(m log n) and are done. What remains to show is that applying the refining operations in this specific way is the only way to obtain a stable partition. To formalise this, we introduce a number of partitions of V(G_{k}) that are stable with respect to the (spanning) subgraph \(G^{\prime }_{k}=G_{k}-[\mathcal X,\mathcal Y]\), and that partition \(\mathcal X\) and \(\mathcal Y\) into binary blocks. (For disjoint vertex sets S, T, we denote [S, T] = {uv ∈ E(G) ∣ u ∈ S, v ∈ T}.) So on G_{k}, these partitions can only be refined using operations (R, S), where R is a binary block of \(\mathcal X\) and S is a binary block of \(\mathcal Y\).
Definition 23
For any ℓ ∈ {0,…,k−1}, and nonempty set \(Q\subseteq \mathcal {B}_{\ell }\), by τ _{ Q, ℓ } we denote the partition of \(\mathcal X\cup \mathcal Y\) that contains cells

\(\mathcal X^{\ell +1}_{q}\) for all \(q\in \mathcal {B}_{\ell +1}\),

\(\mathcal Y^{\ell }_{q}\) for all q ∈ Q, and both \(\mathcal Y^{\ell +1}_{2q}\) and \(\mathcal Y^{\ell +1}_{2q+1}\) for all \(q\in \mathcal {B}_{\ell }\setminus Q\).
π_{Q,ℓ} denotes the coarsest stable partition for \(G^{\prime }_{k}=G_{k}-[\mathcal X,\mathcal Y]\) that refines τ_{Q,ℓ}.
We now show that for every ℓ and Q, there is also a stable partition of \(G^{\prime }_{k}\) that partitions \(\mathcal X\) and \(\mathcal Y\) as prescribed by the above definition. In particular, this holds for π _{ Q, ℓ }.
Lemma 24
For every ℓ∈{0,…,k−1} and nonempty set \(Q\subseteq \mathcal {B}_{\ell }\) , π _{ Q,ℓ } agrees with τ _{ Q,ℓ }.
Proof 18
We design a partition ρ of \(V(G_{k})=V(G^{\prime }_{k})\) that is stable on \(G^{\prime }_{k}\) and agrees with τ_{Q,ℓ}. We start with ρ = τ_{Q,ℓ}. For every cell \(\mathcal X^{\ell +1}_{q}\) in τ_{Q,ℓ}, we add the cell \(X^{\ell +1}_{q}\) to ρ. For every cell \({\mathcal Y^{m}_{q}}\) in τ_{Q,ℓ} (ℓ ≤ m ≤ ℓ + 1), we add the cell \({Y^{m}_{q}}\) to ρ. Then we add cells {v_{0}}, {v_{1}} and {v_{2}}.
For every AND_{p}-gadget G of G_{k} (with in-terminals adjacent to Y and out-terminals adjacent to X), we define a partition ψ of the in-terminals B as follows: for u, v ∈ B, u ≈_{ψ} v if and only if N(u) ∩ Y is not distinguished from N(v) ∩ Y. Note that this yields a partition of B into binary blocks, and that this distinguishes an in-terminal pair b_{2q}, b_{2q+1} (which are adjacent to \(Y^{p}_{2q}\) and \(Y^{p}_{2q+1}\), respectively, with union \(Y^{p-1}_{q}\)) if and only if ℓ ≥ p holds, or both ℓ = p−1 and q ∉ Q hold. Now we extend ρ by adding all cells of the coarsest stable partition of the AND_{p}-gadget G that refines ψ. By Lemma 21, this partition distinguishes the out-terminals of G if and only if ℓ ≥ p (since Q is nonempty). Extending ρ this way for every AND-gadget yields the final partition ρ of V(G_{k}). By definition, ρ agrees with τ_{Q,ℓ}. From the construction, the stability condition is easily verified for almost all cells of ρ. Only cells {a_{0}, a_{1}} ∈ ρ consisting of out-terminals of AND_{p}-gadgets need to be considered in more detail. As noted before, such cells only occur when p ≥ ℓ + 1. Then we have for every integer q that \(X^{p+1}_{2q}\cup X^{p+1}_{2q+1}={X^{p}_{q}}\subseteq X^{\ell +1}_{q^{\prime }}\in \rho \) (for some value q^{′}). Since a_{0} is adjacent to every \(X^{p+1}_{2q}\) and a_{1} is adjacent to every \(X^{p+1}_{2q+1}\), it follows that N(a_{0}) and N(a_{1}) are not distinguished by ρ. Therefore, ρ is stable for \(G^{\prime }_{k}\). Then the coarsest stable partition π_{Q,ℓ} that refines τ_{Q,ℓ} also agrees with τ_{Q,ℓ}. □
Since π _{ Q, ℓ } is stable on \(G^{\prime }_{k}\), any effective refining operation (with respect to G _{ k }) should involve the edges between \(\mathcal X\) and \(\mathcal Y\). Since π _{ Q, ℓ } partitions \(\mathcal X\) and \(\mathcal Y\) as prescribed by τ _{ Q, ℓ }, we conclude that any effective elementary refining operation has the form described in the following corollary. Recall that a refining operation (R, S) for a partition π is elementary if both R and S are classes of π, and that by Proposition 19 it suffices to consider elementary refining operations.
Corollary 25
Let (R,S) be an effective elementary refining operation on π _{ Q,ℓ } . Then for some q∈Q, \(R=\mathcal X^{\ell +1}_{2q}\) or \(R=\mathcal X^{\ell +1}_{2q+1}\) , and \(S=\mathcal Y^{\ell }_{q}\) . The cost of this operation is k ^{2}2^{k−(ℓ+1)}.
This motivates the following definition: for q ∈ Q, by r _{ q }(π _{ Q, ℓ }) we denote the partition of V(G _{ k }) that results from the above refining operation. (Both choices of R yield the same result.)
Lemma 26
For every ℓ ∈ {0,…,k − 1}, nonempty \(Q\subseteq \mathcal {B}_{\ell }\) and q ∈ Q:

\(r_{q}(\pi _{Q,\ell })\preceq \pi _{\mathcal {B}_{\ell +1},\ell +1}\), and

if Q ^{′} = Q ∖ {q} is nonempty, then \(r_{q}(\pi _{Q,\ell })\preceq \pi _{Q^{\prime },\ell }\).
Proof 19
Choose Q ^{′} and ℓ ^{′} satisfying one of the conditions (i.e. \(Q^{\prime }=\mathcal {B}_{\ell +1}\) and ℓ ^{′} = ℓ + 1, or Q ^{′} = Q∖{q} and ℓ ^{′} = ℓ). Then \(\tau _{Q,\ell }\preceq \tau _{Q^{\prime },\ell ^{\prime }}\) , so also \(\pi _{Q,\ell }\preceq \pi _{Q^{\prime },\ell ^{\prime }}\) (since \(\pi _{Q^{\prime },\ell ^{\prime }}\) is also a stable partition that refines τ _{ Q, ℓ }). If we now obtain a partition ρ from π _{ Q, ℓ } by splitting up one cell such that the only vertex pairs u, v with \(u\approx _{\pi _{Q,\ell }} v\) but u ≉ _{ ρ } v are vertex pairs with \(u\not \approx _{\pi _{Q^{\prime },\ell ^{\prime }}} v\), then clearly still \(\rho \preceq \pi _{Q^{\prime },\ell ^{\prime }}\) holds. This is exactly how r _{ q }(π _{ Q, ℓ }) is obtained. □
Lemma 27
Let ω be the coarsest stable partition for G _{ k } . For all ℓ∈{0,…,k−1} and nonempty \(Q\subseteq \mathcal {B}_{\ell }\) : π _{ Q,ℓ } ≼ω.
Proof 20
First, we note that by considering the various vertex degrees and using Proposition 17, one can verify that ω refines \(\{X,\mathcal X,\mathcal Y,Y,\{v_{0}\},\{v_{1}\},\{v_{2}\},V_{G}\}\), where V_{G} denotes all vertices in AND-gadgets. In particular, V_{G} is ω-closed, so ω induces a stable partition on G[V_{G}] (Proposition 16), and therefore it does so on every AND-gadget of G_{k} (which are components of G[V_{G}]). Note that for any two different AND_{ℓ}-gadgets H_{1} and H_{2} of G_{k}, there exists an integer d such that H_{1} contains a vertex at distance exactly d from the ω-closed set X∪Y, but H_{2} does not. This observation can be combined with Proposition 17 to show that if u and v are part of different AND-gadgets, then u ≉_{ω} v. Subsequently this yields that for any AND-gadget of G_{k} with output terminals a_{0}, a_{1}, the set {a_{0}, a_{1}} is ω-closed, and the set of input terminals B of this gadget is ω-closed.
We now prove that ω[X] is discrete. Suppose that there is an AND-gadget in G_{k} for which the out-terminals are not distinguished by ω. Then let ℓ be the minimum value such that this holds for the AND_{ℓ}-gadget G of G_{k}. As observed above, we may apply Corollary 22 to G, which shows that there is at least one pair of in-terminals b_{2q} and b_{2q+1} that is not distinguished by ω. By stability, and since Y is ω-closed, this shows that there are vertices \(y_{i}\in Y^{\ell }_{2q}\) and \(y_{j}\in Y^{\ell }_{2q+1}\) in the adjacent binary blocks such that y_{i} ≈_{ω} y_{j}. Then, considering the ω-closed subgraph \(G_{k}[X\cup \mathcal X\cup \mathcal Y\cup Y]\), it easily follows that x_{i} ≈_{ω} x_{j}. If ℓ ≥ 2, then \(x_{i}\in X^{\ell }_{2q}\) is adjacent to the out-terminal a_{0} of the AND_{ℓ−1}-gadget of G_{k}, whereas \(x_{j}\in X^{\ell }_{2q+1}\) is adjacent to the other out-terminal a_{1} of this gadget. By choice of ℓ, a_{0} ≉_{ω} a_{1}, so since {a_{0}, a_{1}} is ω-closed, this gives a contradiction with stability. If ℓ = 1, then we consider the starting gadget: \(x_{i}\in X^{\ell }_{0}\) is adjacent to v_{0}, and \(x_{j}\in X^{\ell }_{1}\) is adjacent to v_{1}, but {v_{0}} and {v_{1}} are distinct cells of ω, a contradiction with stability.
We conclude that for every AND-gadget, ω distinguishes the out-terminals. Since every vertex x_{i} ∈ X is adjacent to a unique set of such out-terminals, it follows that ω[X] is discrete. Therefore, for every q, \({\mathcal X^{k}_{q}}\) and \({\mathcal Y^{k}_{q}}\) are ω-closed. Hence ω refines τ_{Q,ℓ} for every Q and ℓ, and thus it refines π_{Q,ℓ} for every Q and ℓ. □
Proof 21 (Proof of Theorem 15)
Let G_{k} be the graph described in Section 4.2, and π_{Q,ℓ} be the partitions of V(G_{k}) from Definition 23. By Lemma 27, the coarsest stable partition ω of G_{k} refines all partitions π_{Q,ℓ}. For ease of notation, we define \(\pi _{\emptyset ,\ell }:=\pi _{\mathcal {B}_{\ell +1},\ell +1}\) for all ℓ < k−1. By Corollary 25, any effective elementary refining operation on a partition π_{Q,ℓ} has cost 2^{k−(ℓ+1)}k^{2}, and results in r_{q}(π_{Q,ℓ}) for some q ∈ Q. Denote Q^{′} = Q∖{q}. By Lemma 26, \(r_{q}(\pi _{Q,\ell })\preceq \pi _{Q^{\prime },\ell }\). By Proposition 19, to compute cost(π_{Q,ℓ}), it suffices to consider only partitions that can be obtained by elementary refining operations. So we may now apply Proposition 18 to conclude that
\[\operatorname {cost}(\pi _{Q,\ell })\geq 2^{k-(\ell +1)}k^{2}+\min _{q\in Q}\operatorname {cost}(\pi _{Q\setminus \{q\},\ell }).\]
By induction on |Q| it then follows that \(\operatorname {cost}(\pi _{{\mathcal {B}}_{\ell },\ell })\geq 2^{k-1}k^{2}+ \operatorname {cost}(\pi _{\mathcal {B}_{\ell +1},\ell +1})\) for all 0 ≤ ℓ ≤ k−1. Hence, by induction on ℓ, \(\operatorname {cost}(\pi _{\mathcal {B}_{0},0})\geq 2^{k-1}k^{3}\), which lower bounds cost(α). By Proposition 20, n ∈ O(2^{k}k) and m ∈ O(2^{k}k^{2}), so log n ∈ O(k). This shows that cost(α) ∈ Ω((m + n)log n). □
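The arithmetic behind the two inductions can be made explicit. The following is a sketch that assumes the recurrence obtained from Proposition 18 has the form \(\operatorname{cost}(\pi_{Q,\ell}) \geq 2^{k-(\ell+1)}k^{2} + \min_{q\in Q}\operatorname{cost}(\pi_{Q\setminus\{q\},\ell})\), as suggested by Corollary 25 and Lemma 26. Since \(|\mathcal{B}_{\ell}| = 2^{\ell}\), removing the elements of Q one at a time charges the per-operation cost 2^{ℓ} times:

\[
\operatorname{cost}(\pi_{\mathcal{B}_{\ell},\ell}) \;\geq\; 2^{\ell}\cdot 2^{k-(\ell+1)}k^{2} + \operatorname{cost}(\pi_{\mathcal{B}_{\ell+1},\ell+1}) \;=\; 2^{k-1}k^{2} + \operatorname{cost}(\pi_{\mathcal{B}_{\ell+1},\ell+1}).
\]

Summing the level contributions for ℓ = 0,…,k−1 gives

\[
\operatorname{cost}(\pi_{\mathcal{B}_{0},0}) \;\geq\; \sum_{\ell=0}^{k-1} 2^{k-1}k^{2} \;=\; 2^{k-1}k^{3}.
\]

On the other hand, n ∈ O(2^{k}k) and m ∈ O(2^{k}k^{2}) give \((m+n)\log n \in O(2^{k}k^{2}\cdot k) = O(2^{k}k^{3})\), so a cost of at least 2^{k−1}k^{3} is indeed Ω((m + n)log n).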
4.4 Related lower bounds
In this section, we sketch how our construction also yields lower bounds for two other partitioning problems.
Bisimilarity
Bisimilarity is a key concept in concurrency theory and automated verification. A bisimulation is a binary relation defined on the states of a transition system (or between two transition systems). Intuitively, two states are bisimilar if the processes starting in these states look the same. Formally, a transition system is a vertex-labelled directed graph. Let S = (V, E, λ), where (V, E) is a directed graph and λ a function that assigns a set of properties to each state v ∈ V. A bisimulation on S is a relation ∼ on V satisfying the following three properties for all v, w ∈ V such that v∼w:

(i) λ(v) = λ(w);

(ii) for all v^{′} ∈ N^{+}(v) there is a w^{′} ∈ N^{+}(w) such that v^{′}∼w^{′};

(iii) for all w^{′} ∈ N^{+}(w) there is a v^{′} ∈ N^{+}(v) such that v^{′}∼w^{′}.
Not every bisimulation is an equivalence relation, but the reflexive symmetric transitive closure of a bisimulation is still a bisimulation. For convenience, in the following we assume that all bisimulations are equivalence relations. This is justified by the fact that the partition refinement algorithms (see below) that are commonly used to compute bisimulations, and that we study here, represent the relations using partitions of the vertex set and hence implicitly assume that the relations they represent are equivalence relations.
It is not hard to see that on each transition system S there is a unique coarsest bisimulation, which we call the bisimilarity relation on S. The bisimilarity relation can be defined by letting v be bisimilar to w if there is a bisimulation ∼ such that v∼w; it is then straightforward to verify that bisimilarity is a bisimulation and that all other bisimulations refine it. We remark that the bisimilarity relation on a transition system is precisely what Paige and Tarjan [24] call the coarsest relational partition of the initial partition given by the labelling. Thus the problem of computing the bisimilarity relation of a given transition system is equivalent to the problem of computing the coarsest relational partition considered in [24].
Note the similarity between a bisimulation and a stable colouring of a vertex-coloured digraph, which we may view as a transition system with a labelling λ that maps each vertex to its colour. Condition (i) just says that a bisimulation refines the original colouring, as a stable colouring is supposed to do as well. Conditions (ii) and (iii), which are equivalent under the assumption that a bisimulation be an equivalence relation and hence symmetric, say that if two vertices v, w are in the same class C then for every other class D, either both v and w have an out-neighbour in D or neither of them has. Thus instead of refining by the degree in D, we just refine by the Boolean value “degree at least 1”. This immediately implies that the coarsest stable colouring of S refines the coarsest bisimulation, that is, the bisimilarity relation, on S.
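The difference between refining by degree and refining by the Boolean value “degree at least 1” can be seen on a tiny example. The sketch below uses naive iterated refinement over out-neighbours only, with a constant labelling; the function names and the four-state system are hypothetical and serve only as an illustration.

```python
def refine(succ, key):
    """Iterate colour[v] <- (colour[v], key(colours of out-neighbours)) to a fixed point."""
    colour = {v: 0 for v in succ}  # constant labelling
    while True:
        sig = {v: (colour[v], key(colour[w] for w in succ[v])) for v in succ}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        if len(ids) == len(set(colour.values())):
            return colour  # no class was split
        colour = {v: ids[sig[v]] for v in succ}


def by_count(colours):
    """Colour-refinement style: the multiset of out-neighbour colours."""
    return tuple(sorted(colours))


def by_presence(colours):
    """Bisimulation style: only which colours occur among the out-neighbours."""
    return tuple(sorted(set(colours)))


# Hypothetical system: u has one transition into a sink, v has two.
succ = {"u": ["s1"], "v": ["s1", "s2"], "s1": [], "s2": []}
counting = refine(succ, by_count)
bisim = refine(succ, by_presence)
```

Here `bisim` puts u and v in the same class, since each has some transition into the class of sinks, whereas `counting` separates them because their out-degrees differ. The counting partition therefore refines the bisimilarity partition, in line with the remark above.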
It should be clear from these considerations that the bisimilarity relation on a transition system S with n vertices and m edges can be computed in time O((n + m)log n) by a slight modification of the partitioning algorithm for computing the coarsest stable colouring (assuming, of course, that the labels can be computed and compared in constant time) [24].
As for the coarsest stable colouring, we may ask if the bisimilarity relation can be computed in linear time. It turns out that our lower bound for colour refinement implies a lower bound for bisimilarity. Again, we consider the class of partition refinement algorithms. Like the partition refinement algorithms for colour refinement, partition refinement algorithms for bisimilarity maintain a partition of the set of vertices of the given transition system, and they iteratively refine it using refining operations until a bisimulation is reached. In each refining operation, such an algorithm chooses a union of current partition cells as refining set R, and chooses another (possibly overlapping) union of partition cells S. Cells in S are split up according to the out-neighbourhoods of the vertices in the cells in R. That is, two vertices v, w currently in the same cell in S remain in the same cell after the refinement step if and only if for all cells C of the partition, with C ⊆ R, it holds that
\[N^{+}(v)\cap C=\emptyset \iff N^{+}(w)\cap C=\emptyset .\]
Recall that N^{+}(v) denotes the set of out-neighbours of a vertex v in a directed graph (or transition system). The cost bcost(R, S) of such a refining operation (R, S) is the number of edges from S to R. Again, the sum of the costs of all refining operations is a reasonable lower bound for the running time of a partition refinement algorithm. The cost bcost(α) of a partition α of the vertex set is then defined as the minimum cost of a sequence of refining operations that transforms α to the coarsest bisimulation refining it (see Definition 14).
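A single refining operation of this kind, together with its cost, might be sketched as follows. The representation (a partition as a list of frozensets, R and S as lists of cells, successor lists without multi-edges) and the function name are assumptions made for this illustration.

```python
def bisim_refining_operation(succ, partition, R, S):
    """Split the cells in S by which cells of R their vertices can reach in one step.

    Returns (new_partition, cost), where cost counts the edges from S to R."""
    r_vertices = set().union(*R)
    s_vertices = set().union(*S)
    cost = sum(1 for v in s_vertices for w in succ[v] if w in r_vertices)

    result = [cell for cell in partition if cell not in S]
    for cell in S:
        groups = {}
        for v in cell:
            # Indices of the cells C in R with a non-empty intersection with N+(v).
            key = frozenset(i for i, c in enumerate(R)
                            if any(w in c for w in succ[v]))
            groups.setdefault(key, set()).add(v)
        result.extend(frozenset(g) for g in groups.values())
    return result, cost


# Hypothetical three-state system with a single transition u -> s.
succ = {"u": ["s"], "v": [], "s": []}
unit = [frozenset({"u", "v", "s"})]
refined, cost = bisim_refining_operation(succ, unit, unit, unit)
```

In this example the only cell is split into {u} and {v, s}, and the operation has cost 1 because exactly one edge leads from S to R.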
Theorem 28
For every integer k ≥ 2, there is a transition system S_{k} with n ∈ O(2^{k}k) vertices and m ∈ O(2^{k}k^{2}) edges and constant labelling function, such that bcost(α) ∈ Ω((m + n)log n), where α is the unit partition of V(S_{k}).
Proof 22 (sketch)
The proof is essentially the same as the proof of Theorem 15. The transition system S_{k} is a directed version of the graph G_{k}. Fig. 4 illustrates the direction of the edges. All vertices get the same label.
It is not hard to show that the bisimilarity classes of S_{k} are exactly the same as the colour classes of G_{k} in the coarsest stable colouring and that essentially the refinement steps do the same on G_{k} and S_{k}. Thus the lower-bound proof carries over. □
Equivalence in 2-Variable Logic
It is a well-known fact (due to Immerman and Lander [17]) that colour refinement assigns the same colour to two vertices of a graph if and only if the vertices satisfy the same formulas of the logic C^{2}, two-variable first-order logic with counting.
Two-variable first-order logic L^{2} is the fragment of first-order logic consisting of all formulas built with just two variables. For example, the following L^{2}-formula ϕ(x) in the language of directed graphs says that from vertex x one can reach a sink (a vertex of out-degree 0) in four steps:
\[\phi (x):=\exists y\,(E(x,y)\wedge \exists x\,(E(y,x)\wedge \exists y\,(E(x,y)\wedge \exists x\,(E(y,x)\wedge \forall y\,\neg E(x,y))))).\]
Two-variable first-order logic with counting C^{2} is the extension of L^{2} by counting quantifiers of the form ∃^{≥i}x, for all i ≥ 1. For example, the following C^{2}-formula ψ(x) in the language of directed graphs says that from vertex x one can reach a vertex of out-degree at least 10 in four steps:
\[\psi (x):=\exists y\,(E(x,y)\wedge \exists x\,(E(y,x)\wedge \exists y\,(E(x,y)\wedge \exists x\,(E(y,x)\wedge \exists ^{\geq 10}y\,E(x,y))))).\]
This formula is not equivalent to any formula of L^{2}. Two-variable logics, and more generally finite-variable logics, have been studied extensively in finite model theory (see, for example, [12, 14, 16, 19]).
We call two vertices of a graph L^{2}-equivalent (C^{2}-equivalent) if they satisfy the same formulas of the logic L^{2} (C^{2}, respectively). Now Immerman and Lander’s theorem states that for all graphs G (possibly coloured and/or directed) and all vertices v, w ∈ V(G), the vertices v and w have the same colour in the coarsest bistable colouring of G if and only if they are C^{2}-equivalent. (Recall that bistable was defined in Section 3.4.) In particular, this implies that the C^{2}-equivalence classes of a graph can be computed in time O((n + m)log n), but not better (by a partition-refinement algorithm).
On plain undirected graphs, the logic L^{2} is extremely weak. However, on coloured and/or directed graphs, the logic is quite interesting. The L^{2}-equivalence relation refines the bisimilarity relation. It is well known that the L^{2}-equivalence relation can be computed in time O((n + m)log n) by a variant of the colour refinement algorithm. Our lower bounds can be extended to show that it cannot be computed faster by a partition-refinement algorithm.
An Open Problem
The key idea of the O((n + m)log n) partitioning algorithms is Hopcroft’s idea of processing the smaller half. Hopcroft originally proposed this idea for the minimisation of deterministic finite automata. The algorithm proceeds by identifying equivalent states and then collapsing each equivalence class to a single new state. The partitioning problem (computing classes of equivalent states) is actually just the bisimilarity problem for finite automata, which may be viewed as edge-labelled transition systems.
However, for DFA-minimisation we only need to compute the bisimilarity relation for deterministic finite automata, that is, transition systems where each state has exactly one outgoing edge of each edge label. The systems in our lower bound proof are highly nondeterministic. Thus our lower bounds do not apply.
It remains a very interesting open problem whether similar lower bounds can be proved for DFA-minimisation, or whether DFA-minimisation is possible in linear time. Paige, Tarjan, and Bonic [25] proved that this is possible for DFAs with a single-letter alphabet. To the best of our knowledge, the only known result in this direction is a family of examples due to Berstel and Carton [7] (also see [6, 10]) showing that the O(n log n) bound for Hopcroft’s original algorithm is tight.
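For readers unfamiliar with the smaller-half rule, the following sketch shows where it enters a textbook formulation of Hopcroft-style DFA minimisation. It is an illustration only: the function name and the input encoding (a transition dictionary keyed by (state, letter)) are our assumptions, and the preimage is computed by scanning all states, so this version does not attain the O(n log n) bound, which requires inverse transition lists.

```python
def hopcroft_minimise(states, alphabet, delta, accepting):
    """Return the partition of `states` into classes of equivalent states."""
    accepting = frozenset(accepting)
    rejecting = frozenset(states) - accepting
    partition = {block for block in (accepting, rejecting) if block}
    worklist = set(partition)
    while worklist:
        splitter = worklist.pop()
        for letter in alphabet:
            # States whose `letter`-successor lies in the splitter.
            pre = frozenset(s for s in states if delta[s, letter] in splitter)
            for block in list(partition):
                inside, outside = block & pre, block - pre
                if not inside or not outside:
                    continue  # the block is not split by this splitter
                partition.remove(block)
                partition.update((inside, outside))
                if block in worklist:
                    worklist.remove(block)
                    worklist.update((inside, outside))
                else:
                    # Smaller-half rule: only the smaller part is queued.
                    worklist.add(min(inside, outside, key=len))
    return partition


# Hypothetical single-letter DFA: 0 -> 1 -> 2 -> 3 -> 3.
states = range(4)
delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3}
all_distinct = hopcroft_minimise(states, "a", delta, {3})
merged = hopcroft_minimise(states, "a", delta, {1, 2, 3})
```

With accepting set {3} no two states are equivalent, while with accepting set {1, 2, 3} the states 1, 2 and 3 collapse into a single class.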
Notes
Our own computational tests showed, somewhat surprisingly, that branching on the largest colour class is clearly the best strategy of these two.
References
1. Babai, L.: Moderately Exponential Bound for Graph Isomorphism. In: Proc. FCT’81, pp. 34–50. Springer (1981)
2. Babai, L., Erdös, P., Selkow, S.: Random graph isomorphism. SIAM J. Comput. 9, 628–635 (1980)
3. Babai, L., Luks, E.: Canonical Labeling of Graphs. In: Proc. STOC’83, pp. 171–183 (1983)
4. Berkholz, C.: Lower bounds for heuristic algorithms. Ph.D. thesis, RWTH Aachen University. http://publications.rwthaachen.de/record/462250 (2014)
5. Berkholz, C., Nordström, J.: Near-Optimal Lower Bounds on Quantifier Depth and Weisfeiler-Leman Refinement Steps. In: Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science (LICS’16). To appear (2016)
6. Berstel, J., Boasson, L., Carton, O.: Continuant polynomials and worst-case behavior of Hopcroft’s minimization algorithm. Theor. Comput. Sci. 410(30–32), 2811–2822 (2009)
7. Berstel, J., Carton, O.: On the Complexity of Hopcroft’s State Minimization Algorithm. In: Proc. CIAA’04, pp. 35–44 (2005)
8. Cai, J., Fürer, M., Immerman, N.: An optimal lower bound on the number of variables for graph identifications. Combinatorica 12(4), 389–410 (1992)
9. Cardon, A., Crochemore, M.: Partitioning a graph in O(|A| log_{2} |V|). Theor. Comput. Sci. 19(1), 85–98 (1982)
10. Castiglione, G., Restivo, A., Sciortino, M.: Circular Sturmian words and Hopcroft’s algorithm. Theor. Comput. Sci. 410(43), 4372–4381 (2009)
11. Darga, P., Liffiton, M., Sakallah, K., Markov, I.: Exploiting Structure in Symmetry Detection for CNF. In: Proc. DAC’04, pp. 530–534 (2004)
12. Ebbinghaus, H. D., Flum, J.: Finite Model Theory, 2nd edn. Springer-Verlag (1999)
13. Fredman, M., Tarjan, R.: Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34, 596–615 (1987)
14. Grädel, E., Kolaitis, P., Libkin, L., Marx, M., Spencer, J., Vardi, M., Venema, Y., Weinstein, S.: Finite Model Theory and Its Applications. Springer-Verlag (2007)
15. Hopcroft, J.: An N Log N Algorithm for Minimizing States in a Finite Automaton. In: Z. Kohavi, A. Paz (eds.) Theory of Machines and Computations, pp. 189–196. Academic Press (1971)
16. Immerman, N.: Descriptive Complexity. Springer-Verlag (1999)
17. Immerman, N., Lander, E.: Describing Graphs: a First-Order Approach to Graph Canonization. In: A. Selman (ed.) Complexity Theory Retrospective, pp. 59–81. Springer-Verlag (1990)
18. Junttila, T., Kaski, P.: Engineering an Efficient Canonical Labeling Tool for Large and Sparse Graphs. In: Proc. ALENEX’07, pp. 135–149 (2007)
19. Libkin, L.: Elements of Finite Model Theory. Springer-Verlag (2004)
20. McKay, B.: Practical graph isomorphism. Congressus Numerantium 30, 45–87 (1981)
21. McKay, B.: Nauty User’s Guide (Version 2.4). Computer Science Dept., Australian National University (2007)
22. McKay, B., Piperno, A.: Practical graph isomorphism, II. J. Symb. Comput. 60, 94–112 (2014)
23. Miyazaki, T.: The complexity of McKay’s canonical labeling algorithm. Groups and Computation II 28, 239–256 (1997)
24. Paige, R., Tarjan, R.: Three partition refinement algorithms. SIAM J. Comput. 16(6), 973–989 (1987)
25. Paige, R., Tarjan, R., Bonic, R.: A linear time solution to the single function coarsest partition problem. Theor. Comput. Sci. 40, 67–84 (1985)
Acknowledgments
We thank Christof Löding for discussions and references regarding the lower bounds for bisimilarity and DFA-minimisation.
Additional information
Supported by the European Community’s Seventh Framework Programme (FP7/2007-2013), grant agreement n° 317662.
Supported by the German Research Foundation DFG Koselleck Grant GR 1492/141.
An extended abstract of this paper has appeared in the proceedings of ESA’13, LNCS 8125, pp 145–156 (2013).
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Berkholz, C., Bonsma, P. & Grohe, M. Tight Lower and Upper Bounds for the Complexity of Canonical Colour Refinement. Theory Comput Syst 60, 581–614 (2017). https://doi.org/10.1007/s0022401696860
DOI: https://doi.org/10.1007/s0022401696860