The Todd-Coxeter Algorithm for Semigroups and Monoids

In this paper we provide an account of the Todd-Coxeter algorithm for computing congruences on semigroups and monoids. We also give a novel description of an analogue for semigroups of the so-called Felsch strategy from the Todd-Coxeter algorithm for groups.


Introduction
In this article we describe the Todd-Coxeter algorithm for congruence enumeration in semigroups and monoids. The essential purpose of this algorithm is to compute the action of a finitely presented semigroup or monoid on the equivalence classes of a left, right, or two-sided congruence. Most existing implementations (see, for example, ACE [14], GAP [12], and MAF [38]) and expository accounts (see, for example, [29, p. 351], [31, Sections 4.5, 4.6 and Chapter 5], and [15, Chapter 5]) of the Todd-Coxeter algorithm relate to the enumeration of the cosets of a subgroup of a finitely presented group; or, more precisely, to the production of a permutation representation of the action of the group on the cosets. This was extended to linear representations by Linton [19, 20]. The purpose of this article is to provide an expository, but more or less complete, account of the Todd-Coxeter algorithm for semigroups and monoids.
The Todd-Coxeter algorithm is not a single procedure, but rather an infinite collection of different but related procedures. In the literature for finitely presented groups, examples of procedures in this collection are referred to as coset enumerations; see, for example, [31, Sections 4.5, 4.6 and Chapter 5]. Coset enumerations are also not algorithms, at least by some definitions, in that they might consist of infinitely many steps, and they do not always terminate. In fact, a coset enumeration terminates if and only if the subgroup whose cosets are being enumerated has finite index; for further details see Section 4. This is not to say, however, that the number of steps, or the run time, of a coset enumeration can be predicted in advance; it is relatively straightforward to find examples of finite presentations for the trivial group where the number of steps in a coset enumeration is arbitrarily high. For instance, the group presentation ⟨a, b | ab^n, a^n b^{n+1}, b^n a b^{-1} a^{-1}⟩ defines the trivial group, and it is likely that at least n steps are required in any coset enumeration for this presentation; see [17]. Although this might seem rather negative, nothing more can really be expected. For example, if a coset enumeration for the trivial subgroup of a finitely presented group G does successfully terminate, then the output can be used to solve the word problem in G. It is well-known by the theorem of Novikov [27] and Boone [3] that the word problem for finitely presented groups is undecidable and so no procedure that solves the word problem, including any coset enumeration, can be expected to terminate in every case. Analogous statements about the undecidability of the word problem hold for finitely presented semigroups and monoids; see [29, Chapter 12].
Congruences are to semigroups and monoids what cosets are to groups, and so we will refer to congruence enumeration for finitely presented semigroups and monoids as the analogue of coset enumeration for groups. We will not consider the special case of enumerating the cosets of a subgroup of a group separately from the more general case of enumerating classes of a congruence of a semigroup.
The first computer implementation of the Todd-Coxeter algorithm is attributed to Haselgrove in 1953 by Leech [18]. Several authors (for example, Neubüser in [25] and Walker in [36]) comment that this may be the first computer implementation of an algorithm for groups, possibly representing the starting point of the field of computational group theory. Neumann [26] adapted the algorithm in [35] to semigroups, and Jura [17] proved that Neumann's adaptation was valid. One congruence enumeration strategy is described in Ruškuc [30, Chapter 12] as well as variants for computing Rees congruences and minimal ideals. Stephen [32, Chapter 4] also describes a variant of Todd-Coxeter, which can be used to solve the word problem by constructing only part of the action of a finitely presented semigroup or monoid on itself by right multiplication. Stephen's procedure [32, Section 4.1] is similar to one of the two main strategies for congruence enumeration described in Section 6; see also Section 9. Versions of the Todd-Coxeter algorithm for semigroups were implemented in FORTRAN by Robertson and Ünlü [28]; in C by Walker [36, 37]; and in GAP by Pfeiffer [12]. The C++ library libsemigroups [24] (by three of the present authors) contains a flexible optimized implementation of the different versions of the Todd-Coxeter algorithm described in this document for semigroups and monoids.
The paper is organised as follows: some mathematical prerequisites from the theory of monoids are given in Section 2; in Section 3 we define the infinite family of procedures for enumerating a right congruence on a monoid; the validity of congruence enumeration is shown in Section 4; in Section 5 we discuss how congruence enumeration can be used to compute congruences of a monoid that is not given a priori by a finite presentation; in Section 6 and Section 7 we describe the two main strategies for congruence enumeration; in Section 8 we discuss some issues related to the implementation of congruence enumeration; and, finally, in Section 9 we present some variants: for enumerating congruences of monoids with zero elements, Rees congruences, and Stephen's procedure [32,Section 4.1].
In Appendix A we give a number of extended examples, and in Appendix B there is a comparison of the performance of the implementations in libsemigroups [24] of various strategies when applied to a number of group, monoid, or semigroup presentations. The authors have also implemented basic versions of the algorithms described in this paper in python3 as a concrete alternative to the description given here; see [23].

Prerequisites
In this section, we provide some required mathematical prerequisites. The contents of this section are well-known in the theory of semigroups; for further background information we refer the reader to [16].
If µ is an equivalence relation on a set X, then we denote by x/µ the equivalence class of x in µ and by X/µ the set { x/µ : x ∈ X } of equivalence classes. The least equivalence relation on X with respect to containment is ∆_X = { (x, x) ∈ X × X : x ∈ X }; we refer to this as the trivial equivalence on X.
If S is a semigroup and R ⊆ S × S, then there is a (two-sided) elementary sequence with respect to R between x, y ∈ S if x = y or x = s_1, s_2, ..., s_m = y, for some m ≥ 2, where s_i = p_i u_i q_i, s_{i+1} = p_i v_i q_i, with p_i, q_i ∈ S^1 and (u_i, v_i) ∈ R for all i. If p_i = ε for all i in an elementary sequence, then we refer to the sequence as a right elementary sequence; left elementary sequences are defined analogously. We define R# ⊆ S × S so that (x, y) ∈ R# if and only if there is an elementary sequence between x and y with respect to R. Note that R# is the least congruence on the semigroup S containing R; see [16, Section 1.5]. The least right congruence containing a subset of S × S is defined analogously, using right elementary sequences rather than two-sided elementary sequences; we will not reserve any special notation for such least right congruences. Least left congruences are defined dually.
Throughout this article we write functions to the right of their arguments, and compose from left to right. Let S be a semigroup and let X be a set. A function Ψ : X × S −→ X is a right action of S on X if ((x, s)Ψ, t)Ψ = (x, st)Ψ for all x ∈ X and for all s, t ∈ S. If in addition S has an identity element e, we require (x, e)Ψ = x for all x ∈ X also.
If S is a semigroup, then we may adjoin an identity to S (if necessary) so that S is a monoid. We denote S with an adjoined identity 1_S by S^1. If Ψ : X × S −→ X is a right action of a semigroup S on a set X, then Ψ^1 : X × S^1 −→ X defined by (x, s)Ψ^1 = (x, s)Ψ for all x ∈ X and s ∈ S, and (x, 1_S)Ψ^1 = x for all x ∈ X, is a right action of S^1 on X also.
For the sake of brevity, we will write x • s instead of (x, s)Ψ, and we will say that S acts on X on the right. The kernel of a function f : X −→ Y, where X and Y are any sets, is the equivalence relation
ker(f) = { (x, y) ∈ X × X : (x)f = (y)f }.
If X is any set, then X^X denotes the set of all functions from X to X. Endowed with the operation of function composition, X^X is a monoid, called the full transformation monoid on X. A right action Ψ of a semigroup S on a set X induces a homomorphism Ψ′ : S −→ X^X defined by (s)Ψ′ : x ↦ x • s for all s ∈ S and all x ∈ X.
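As an informal illustration, the following Python sketch checks the right action axiom and the homomorphism property of the induced map into the full transformation monoid on a toy example. The example (the integers modulo 3 under multiplication acting on {0, 1, 2}) and the names `act`, `transformation`, and `compose` are hypothetical choices made only for this sketch; they do not come from the text.

```python
from itertools import product

X = range(3)   # the set acted upon
S = range(3)   # hypothetical example: integers modulo 3 under multiplication


def mul(s, t):
    return (s * t) % 3


def act(x, s):
    # x . s, with the semigroup element written on the right
    return (x * s) % 3


def transformation(s):
    # image of s under the induced map: the tuple (0 . s, 1 . s, 2 . s)
    return tuple(act(x, s) for x in X)


def compose(f, g):
    # left-to-right composition: first f, then g
    return tuple(g[f[x]] for x in X)


# Right action axiom: (x . s) . t == x . (st)
is_action = all(act(act(x, s), t) == act(x, mul(s, t))
                for x, s, t in product(X, S, S))

# The induced map is a homomorphism into the full transformation monoid
is_homomorphism = all(
    compose(transformation(s), transformation(t)) == transformation(mul(s, t))
    for s, t in product(S, S))

# Kernel of the action: pairs of elements inducing the same transformation
kernel = {(s, t) for s, t in product(S, S)
          if transformation(s) == transformation(t)}
```

In this toy case the three elements induce three distinct transformations, so the kernel is the trivial equivalence on S.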
The kernel of a right action Ψ of a semigroup S on a set X is the kernel of the function Ψ′ : S −→ X^X, that is,
ker(Ψ) = { (s, t) ∈ S × S : x • s = x • t for all x ∈ X }.
It is straightforward to verify that the kernel of a homomorphism, and the kernel of a right action of a semigroup S, is a congruence on S.
If S acts on the sets X and Y on the right, then we say that λ : X −→ Y is a homomorphism of right actions if (x • s)λ = (x)λ • s for all x ∈ X and s ∈ S^1 (see Fig. 2.1 for a diagram). An isomorphism of right actions is a bijective homomorphism of right actions.
If A is a set, then a word over A is a finite sequence (a_1, ..., a_n) where a_1, ..., a_n ∈ A and n ≥ 0; we say that the length of the word is n. We will write a_1 ⋯ a_n rather than (a_1, ..., a_n), and will refer to A as an alphabet and a ∈ A as a letter. If A is any set, then the free semigroup on A is the set A+ consisting of all words over A of length at least 1 with operation given by the juxtaposition of words. The free monoid on A is the set A+ ∪ {ε} with the operation again the juxtaposition of words and where ε denotes the unique word of length 0, the so-called empty word; the free monoid on A is denoted A*.
The next proposition might be viewed as an analogue of the Third Isomorphism Theorem for semigroups ([16, Theorem 1.5.4]) in the context of actions. More precisely, Proposition 2.1 will allow us to replace the action of a free monoid on a quotient of a quotient by an action on a single quotient. We require the following definition. Let S be a semigroup, let ρ be a congruence of S, and let σ be a right congruence of S such that ρ ⊆ σ. The set σ/ρ defined as
σ/ρ = { (x/ρ, y/ρ) ∈ S/ρ × S/ρ : (x, y) ∈ σ }
is a right congruence on S/ρ.
Proposition 2.1. Let S be a semigroup, let ρ be a congruence of S, and let σ be a right congruence of S such that ρ ⊆ σ. The right actions of S on (S/ρ)/(σ/ρ) and S/σ defined by
(x/ρ)/(σ/ρ) • s = (xs/ρ)/(σ/ρ) and x/σ • s = xs/σ
for all x, s ∈ S are isomorphic.
The proof of Proposition 2.1 is similar to the proof of [16, Theorem 1.5.4]. An analogue of Proposition 2.1 can be formulated for left congruences, but we do not require this explicitly; see Proposition 3.1, and the surrounding text, for further details.
We also require the following result, the proof of which is routine, and hence omitted.
Proposition 2.2. Let S be a semigroup, let X be a set, and let Ψ : X × S −→ X be a right action. If µ ⊆ ker(Ψ) is a congruence on S, then Ψ̄ : X × S/µ −→ X defined by
(x, s/µ)Ψ̄ = (x, s)Ψ
for all x ∈ X and s ∈ S, is a right action that is isomorphic to Ψ.

Congruence enumeration
In this section we define what we mean by a congruence enumeration for a semigroup or monoid, and establish some further notational conventions. Congruence enumeration, as described in this section, will provide the general context for the more explicit algorithms discussed in Section 6 and Section 7; this section is based, at least in spirit, on [31, Section 5.1] and is also influenced by [32, 33].
The purpose of a congruence enumeration is to determine, in some sense, the structure of a finitely presented monoid; or more generally, a congruence on such a monoid. For the sake of simplicity, we will assume throughout that P = ⟨A | R⟩ is a monoid presentation defining a monoid M, where A is some totally ordered finite alphabet, and R is a (possibly empty) finite subset of A* × A*. Additionally, we suppose that S is a finite subset of A* × A*. We write R# for the least two-sided congruence on A* containing R, and denote by ρ the least right congruence on A* containing R# and S. In this notation, the monoid M defined by the presentation P is A*/R#. If the number of congruence classes of ρ is finite, then the output of a congruence enumeration is a description of the natural right action of A* on the congruence classes of ρ. In particular, the output yields the number of such classes, can be used to determine whether or not two words in A* belong to the same class, and provides a homomorphism from A* to the full transformation monoid (A*/ρ)^(A*/ρ) on the set of congruence classes of ρ. Since it is undecidable whether a finitely presented monoid is finite or not [22] (see also [4, Remark 4]), an upper bound for the number of steps required for a congruence enumeration for ⟨A | R⟩ to terminate cannot be computed as a function of |A| and |R|. If the number of congruence classes of ρ is infinite, then the enumeration will not terminate. If the set S is empty, then a congruence enumeration, if it terminates, gives us a description of the monoid A*/R# together with the natural right action of A* on A*/R#.
Perhaps it is more natural to want to determine a right congruence of the monoid M defined by the presentation P rather than on the free monoid A*. However, by Proposition 2.1, to determine the right congruence on M generated by the image of S in M, it suffices to determine the right congruence ρ on A* containing R# and the set S.
It is only for the sake of convenience that we have chosen to consider finitely presented monoids, rather than finitely presented semigroups.To apply the algorithms described in this article to a semigroup that is not a monoid, simply adjoin an identity, perform the algorithm, and disregard the adjoined identity in the output.
The choice of "right" rather than "left" in the previous paragraphs was arbitrary. If w = b_1 ⋯ b_n ∈ A*, then we write w† to denote the reverse b_n ⋯ b_1 of w; if W ⊆ A* × A*, then we write W† = { (u†, v†) : (u, v) ∈ W }.
Proposition 3.1. Let A be an alphabet, let R, S ⊆ A* × A*, let R# be the least congruence on A* containing R, and let ρ be the least left congruence on A* containing R# and S. Then ρ† is the least right congruence on A* containing R#† and S†.
It follows from Proposition 3.1 that any algorithm for computing right congruences can be used for left congruences, by simply reversing every word.
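The reduction from left to right congruences can be sketched in a few lines of Python. The representation of words as strings, the helper name `reverse_pairs`, and the sample pairs are assumptions made for this illustration only.

```python
def reverse_pairs(pairs):
    # Reverse both words in every pair; words are represented as strings.
    return [(u[::-1], v[::-1]) for u, v in pairs]


# Hypothetical input for a left congruence enumeration
R = [("ab", "ba"), ("aab", "b")]
S = [("abb", "a")]

# Input for the corresponding right congruence enumeration
R_dagger = reverse_pairs(R)
S_dagger = reverse_pairs(S)
```

Two words u and v are then related by the left congruence precisely when their reverses are related by the right congruence computed from `R_dagger` and `S_dagger`.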
The inputs of a congruence enumeration are the finite alphabet A, the finite set of relations R ⊆ A * × A * , the finite set S ⊆ A * × A * , and a certain type of digraph that is defined in the next section.

Word graphs
One of the central components of the proofs presented in Section 4 is that of a word graph, which is used in a natural representation of equivalence relations on a free monoid. Let A be any alphabet and let Γ = (N, E) be a digraph with non-empty finite set of nodes N ⊆ ℕ with 0 ∈ N and edges E ⊆ N × A × N. Then, following [32, 33], we refer to Γ as a word graph. The word graph Γ = ({0}, ∅) is the trivial word graph.
If (α, a, β) ∈ E is an edge in a word graph Γ, then α is the source, a is the label, and β is the target of (α, a, β). An edge (α, a, β) is said to be incident to its source α and target β.
A word graph Γ is deterministic if for every node α and every letter a ∈ A there is at most one edge of the form (α, a, β) in Γ. A word graph Γ is complete if for every node α and every letter a ∈ A there is at least one edge with source α labelled by a in Γ.
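The two definitions above translate directly into code. The following Python sketch assumes a word graph is stored as a set of nodes together with a set of (source, label, target) triples; this representation and the function names are choices made for the illustration, not something prescribed by the text.

```python
def is_deterministic(edges):
    # At most one edge for each (source, label) pair.
    seen = set()
    for source, label, _target in edges:
        if (source, label) in seen:
            return False
        seen.add((source, label))
    return True


def is_complete(nodes, edges, alphabet):
    # At least one edge for each (source, label) pair.
    present = {(source, label) for source, label, _target in edges}
    return all((n, a) in present for n in nodes for a in alphabet)


nodes = {0, 1}
alphabet = "ab"
partial = {(0, "a", 1), (1, "a", 1)}
full = partial | {(0, "b", 0), (1, "b", 0)}
branching = full | {(0, "a", 0)}   # two edges labelled a with source 0
```

Here `partial` is deterministic but not complete, `full` is both, and `branching` is complete but not deterministic.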
If α, β ∈ N , then an (α, β)-path is a sequence of edges (α 1 , a 1 , α 2 ), . . ., (α n , a n , α n+1 ) ∈ E where α 1 = α and α n+1 = β and a 1 , . . ., a n ∈ A ; α is the source of the path; the word a 1 • • • a n ∈ A * labels the path; β is the target of the path; and the length of the path is n.If α, β ∈ N and there is an (α, β)-path in Γ, then we say that β is reachable from α.If Γ = (N, E) is a word graph and P(A * × A * ) denotes the power set of A * × A * , then the path relation of Γ is the function π Γ : N −→ P(A * × A * ) defined by (u, v) ∈ (α)π Γ if there exists a node β such that u and v both label (α, β)-paths in Γ.If Γ is a word graph and α is a node in Γ, then (α)π Γ is reflexive and symmetric, and (α)π Γ is transitive for all α if and only if Γ is deterministic.In particular, (α)π Γ is an equivalence relation for all α if and only if Γ is deterministic.If R ⊆ A * × A * , Γ is a word graph, and π Γ is the path relation of Γ, then we say that Γ is compatible with R if R ⊆ (α)π Γ for every node α in Γ.
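For a deterministic word graph, membership in the path relation and compatibility with a set of relations can be tested by following paths. The sketch below assumes the graph is stored as a dictionary from (source, letter) to target and that words are strings; the names `follow` and `is_compatible` and the example graph are hypothetical.

```python
def follow(graph, node, word):
    # Target of the path from `node` labelled by `word`, or None if no such path.
    for letter in word:
        node = graph.get((node, letter))
        if node is None:
            return None
    return node


def is_compatible(graph, nodes, relations):
    # (u, v) must label paths with a common target from every node.
    for node in nodes:
        for u, v in relations:
            target = follow(graph, node, u)
            if target is None or target != follow(graph, node, v):
                return False
    return True


# A deterministic word graph over the alphabet {a}: 0 -a-> 1 -a-> 2 -a-> 1
nodes = {0, 1, 2}
graph = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 1}
```

In this example the graph is compatible with the single relation (aaa, a) but not with (aa, a), since from node 0 the words aa and a lead to different nodes.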
Before giving the definition of a congruence enumeration, we highlight that many accounts of the Todd-Coxeter algorithm (see, for instance, [15, 25, 31]) are not formulated in terms of digraphs, but rather as a table whose rows are labelled by a set C of non-negative integers, and columns are labelled by the generating set A. If Γ = (N, E) is a deterministic word graph and f : C −→ N is a bijection such that (0)f = 0, then the value in the row labelled c and column labelled a is f^{-1} of the target of the unique edge in Γ with source (c)f and label a. According to Neubüser [25], until the 1950s, congruence enumeration was often performed by hand, and, in this context, using tables is more straightforward than using graphs. On the other hand, the language of word graphs provides a more accessible means of discussing congruence enumeration in theory.
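The correspondence between tables and deterministic word graphs can be made concrete as follows. This is a minimal Python sketch, assuming the graph is a dictionary from (source, letter) to target and taking the bijection f to be the order-preserving map from row labels 0, 1, 2, ... to the sorted nodes; the function name `table` and the example graph are hypothetical.

```python
def table(graph, nodes, alphabet):
    rows = sorted(nodes)
    # Inverse of f: node -> row label. Since 0 is the least node, 0 maps to 0.
    row_of = {node: row for row, node in enumerate(rows)}
    # Missing edges appear as None (an empty cell in the table).
    return [[row_of.get(graph.get((node, a))) for a in alphabet]
            for node in rows]


nodes = {0, 2, 5}
graph = {(0, "a"): 2, (0, "b"): 5,
         (2, "a"): 2, (2, "b"): 5,
         (5, "a"): 2}
rows = table(graph, nodes, "ab")
```

The entry in row c and column a is the row label of the target of the edge with source (c)f and label a; the last row has an empty cell because node 5 has no edge labelled b.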

The definition
Recall that we suppose throughout that ⟨A | R⟩ for some R ⊆ A* × A* is a finite monoid presentation. Additionally, we will suppose throughout that A is a totally ordered alphabet. A congruence enumeration is a sequence of the following steps TC1, TC2, and TC3 where the input to the i-th step (where i ∈ ℕ) is (Γ_i, κ_i) for some word graph Γ_i with a totally ordered set of nodes N_i and set of edges E_i, and some equivalence relation κ_i on N_i.
TC1 (define a new node). If α is a node in Γ_i and there is no edge in Γ_i labelled by a ∈ A with source α, then we define Γ_{i+1} to be the word graph obtained from Γ_i by adding the new node β := 1 + max ⋃_{j≤i} N_j and the edge (α, a, β). We define κ_{i+1} := κ_i ∪ {(β, β)}.
TC2 (follow the paths defined by a relation). Suppose that α ∈ N_i and (u, v) ∈ R where u = u_1 a and v = v_1 b for some u_1, v_1 ∈ A* and a, b ∈ A.
(a) If u and v_1 label paths from α to some nodes β, γ ∈ N_i in Γ_i, respectively, but γ is not the source of any edge labelled by b, then we set Γ_{i+1} to be the word graph obtained from Γ_i by adding the edge (γ, b, β) and we define κ_{i+1} := κ_i.
(b) The dual of (a) where there are paths from α labelled by u_1 and v to nodes β and γ, respectively, but β is not the source of any edge labelled by a.
(c) If u and v label paths from α to some nodes β and γ, respectively, and β ≠ γ, then we define Γ_{i+1} := Γ_i and κ_{i+1} to be the least equivalence containing κ_i and (β, γ).
Note that conditions (a), (b), and (c) are mutually exclusive, and it may be the case that none of them hold.
TC3 (process coincidences or a determination). We define Γ_{i+1} to be the quotient of Γ_i by κ_i and define κ_{i+1} to be the least equivalence on N_{i+1} containing every (β, γ), β ≠ γ, for which there exist α ∈ N_{i+1} and a ∈ A such that (α, a, β), (α, a, γ) ∈ E_{i+1}. Recall that the quotient Γ_i/κ_i of Γ_i by κ_i is the word graph with nodes {min α/κ_i : α ∈ N_i} and hence after an application of TC3 each node in Γ_{i+1} is set to be equal to the minimum of the set of nodes in its equivalence class in κ_i.
There are only finitely many possible quotients of any word graph. Hence if TC3 is applied repeatedly, then after finitely many iterations the output κ_{i+1} will equal ∆_{N_i}, and Γ_{i+1} and Γ_i will be equal.
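Repeated application of TC3 can be sketched as below. This is an illustrative Python sketch under stated assumptions: the word graph is a set of nodes and a set of (source, label, target) triples, the equivalence κ is given by a set of generating pairs, and the names `quotient`, `determinations`, and `process_coincidences` are hypothetical. Each class is represented by its minimum, as in the definition of the quotient.

```python
def quotient(nodes, edges, pairs):
    # Quotient of the word graph by the least equivalence containing `pairs`.
    rep = {n: n for n in nodes}

    def find(n):
        while rep[n] != n:
            n = rep[n]
        return n

    for b, c in pairs:
        b, c = find(b), find(c)
        if b != c:
            rep[max(b, c)] = min(b, c)   # the minimum represents the class
    return ({find(n) for n in nodes},
            {(find(s), a, find(t)) for s, a, t in edges})


def determinations(edges):
    # Pairs of distinct targets of edges with a common source and label.
    first_target, pairs = {}, set()
    for s, a, t in sorted(edges):
        if (s, a) in first_target:
            pairs.add((first_target[(s, a)], t))
        else:
            first_target[(s, a)] = t
    return pairs


def process_coincidences(nodes, edges, pairs):
    # Apply TC3 until the equivalence is trivial.
    while pairs:
        nodes, edges = quotient(nodes, edges, pairs)
        pairs = determinations(edges)
    return nodes, edges


nodes = {0, 1, 2, 3, 4}
edges = {(0, "a", 1), (0, "b", 2), (1, "a", 3), (2, "a", 4)}
result = process_coincidences(nodes, edges, {(1, 2)})
```

In the example, identifying nodes 1 and 2 produces two edges labelled a with source 1, which forces nodes 3 and 4 to be identified in a second application of TC3, after which the graph is deterministic.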
If w labels an (α, β)-path P in some Γ i , then neither TC1 nor TC2 changes any of the edges belonging to P .Hence if Γ i+1 is obtained from Γ i by applying TC1 or TC2, then w labels an (α, β)-path in Γ i+1 also.If Γ i+1 is obtained from Γ i by applying TC3, then Γ i+1 is a homomorphic image of Γ i .As already noted, homomorphisms preserve paths, and so w labels a path in Γ i+1 also.
We can now formally define a congruence enumeration.
Definition 3.3.[Congruence enumeration.]Suppose that A is a finite alphabet, that R, S ⊆ A * × A * are finite, that ρ is the least right congruence on A * containing both R # and S, and that Γ 1 = (N 1 , E 1 ) is a word graph with path relation π Γ1 : N 1 −→ P(A * × A * ) such that (0)π Γ1 ⊆ ρ and κ 1 = ∆ N1 .Then a congruence enumeration for ρ with input (Γ 1 , κ 1 ) consists of: (a) For every (u, v) ∈ S, by repeatedly applying TC1 (if necessary), add edges to Γ 1 so that it contains paths labelled by u and v both with source 0.
If (Γ m , κ m ) is the output of steps (a) and (b), then the enumeration is concluded by performing any sequence of applications of TC1, TC2, and TC3 such that the following conditions hold for Γ i = (N i , E i ) for every i ∈ N, i ≥ m: (c) If α ∈ N i and there is no edge incident to α with label a ∈ A, then there exists j ≥ i such that either: α is no longer a node in Γ j ; or there is an edge incident to α with label a in Γ j .
(d) If α ∈ N i and (u, v) ∈ R, then there exists j ≥ i such that either: α is no longer a node in Γ j or (u, v) ∈ (α)π Γj .
(e) If κ i ̸ = ∆ Ni for some i, then there exists j ≥ i such that κ j = ∆ Nj .
The initial value of the word graph Γ 1 that forms the input to a congruence enumeration is usually either the trivial word graph or, if M is finite, the right Cayley graph of the monoid M defined by the presentation ⟨A|R⟩; see Section 5 for more details.
A congruence enumeration terminates if the output (Γ i , κ i ) has the property that applying any of TC1, TC2, or TC3 to (Γ i , κ i ) results in no changes to the output, i.e. (Γ i+1 , κ i+1 ) = (Γ i , κ i ).It is straightforward to verify that a congruence enumeration terminates at step i if and only if Γ i is complete, compatible with R ∪ S, and deterministic.
For any given finite monoid presentation, there is a wide range of choices for the order in which steps TC1, TC2, and TC3 are performed, and to which nodes, generators, and relations they are applied.We will examine two specific strategies for enumerating congruences for an arbitrary finite monoid presentation in more detail in Section 6 and Section 7.
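To make the discussion concrete, here is a minimal Python sketch of one deliberately naive strategy, which is not one of the strategies discussed in the later sections: for each node in increasing order and each relation, paths are completed using TC1 only, and the two targets are then identified using TC2(c) and TC3. The function name `todd_coxeter`, its parameters (including the safety bound `max_nodes`), and the representation of words as strings are all assumptions made for this illustration.

```python
def todd_coxeter(alphabet, relations, extra=(), max_nodes=10_000):
    edges = {}      # (source, letter) -> target; targets are normalised by find
    parent = [0]    # union-find forest; the root of each class is its minimum

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    def trace(node, word):
        # Follow `word` from `node`, applying TC1 whenever an edge is missing.
        for a in word:
            if (node, a) not in edges:
                if len(parent) >= max_nodes:
                    raise RuntimeError("node limit reached")
                parent.append(len(parent))
                edges[(node, a)] = len(parent) - 1
            node = find(edges[(node, a)])
        return node

    def merge(b, c):
        # TC2(c) followed by TC3 until no coincidences remain.
        queue = [(b, c)]
        while queue:
            x, y = queue.pop()
            b, c = find(x), find(y)
            if b == c:
                continue
            b, c = min(b, c), max(b, c)
            parent[c] = b
            for a in alphabet:           # move the edges of c to b
                t = edges.pop((c, a), None)
                if t is None:
                    continue
                if (b, a) in edges:
                    queue.append((edges[(b, a)], t))
                else:
                    edges[(b, a)] = t

    for u, v in extra:                   # the pairs in S, applied at node 0 only
        merge(trace(0, u), trace(0, v))
    n = 0
    while n < len(parent):
        # The trivial pairs (a, a) ensure every live node has all its edges.
        for u, v in list(relations) + [(a, a) for a in alphabet]:
            if find(n) != n:             # n has been identified with a smaller node
                break
            merge(trace(n, u), trace(n, v))
        n += 1
    nodes = [m for m in range(len(parent)) if find(m) == m]
    return nodes, {(s, a): find(t) for (s, a), t in edges.items()}


nodes, graph = todd_coxeter("ab", [("ab", "ba"), ("aa", "a"), ("bb", "b")])
```

For the presentation ⟨a, b | ab = ba, a^2 = a, b^2 = b⟩ used in the example, the sketch terminates with four nodes, corresponding to the classes of ε, a, ab, and b. If the congruence has infinitely many classes the loop would never finish, which is why the hypothetical `max_nodes` bound raises an error instead.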
In this section we address the validity of congruence enumeration as defined in Section 3; we will continue to use the notation established therein.
The main results in this section are the following.
Theorem 4.1.Let A be a finite alphabet, let R ⊆ A * × A * be a finite set, and let R # be the least two-sided congruence on A * containing R. If S ⊆ A * × A * is any finite set, and ρ is the least right congruence on A * containing R # and S, then the following hold: (a) If a congruence enumeration for ρ terminates with output word graph Γ = (N, E), then A * /ρ is finite and the function ϕ : N × A * −→ N , defined by (α, w)ϕ = β whenever w labels an (α, β)-path in Γ, is a right action that is isomorphic to the natural action of A * on A * /ρ by right multiplication.
(b) If A * /ρ is finite, then any congruence enumeration for ρ terminates.
Corollary 4.2.Let A be a finite alphabet, let R ⊆ A * × A * be a finite set, and let R # be the least two-sided congruence on A * containing R. Then the following hold: (a) If a congruence enumeration for R # terminates with output word graph Γ = (N, E), then A * /R # is finite and the function ϕ : N × A * −→ N defined by (α, w)ϕ = β whenever w labels an (α, β)-path in Γ is a right action that is isomorphic to the (faithful) natural action of A * /R # on itself by right multiplication.
(b) If A * /R # is finite, then any congruence enumeration for R # terminates.
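The following Python sketch illustrates how the output word graph of a terminated enumeration yields the action ϕ and a test for whether two words are congruent. The hard-coded graph is assumed to be the output of an enumeration for the presentation ⟨a, b | ab = ba, a^2 = a, b^2 = b⟩ with S empty; the names `phi` and `same_class` are hypothetical.

```python
from itertools import product

# Assumed output word graph: (node, letter) -> node
graph = {(0, "a"): 1, (0, "b"): 3,
         (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 2, (2, "b"): 2,
         (3, "a"): 2, (3, "b"): 3}
nodes = [0, 1, 2, 3]


def phi(node, word):
    # (node, word)phi: target of the unique path from `node` labelled by `word`.
    for letter in word:
        node = graph[(node, letter)]
    return node


def same_class(u, v):
    # Two words are congruent if and only if they lead from 0 to the same node.
    return phi(0, u) == phi(0, v)


# Spot check of the action property on all words of length at most 2
words = [""] + ["".join(w) for k in (1, 2) for w in product("ab", repeat=k)]
is_action = all(phi(phi(n, u), v) == phi(n, u + v)
                for n in nodes for u in words for v in words)
```

The number of nodes is the number of congruence classes, and calls such as `same_class("aba", "ab")` decide individual instances of the word problem for the monoid defined by the presentation.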
We will prove Theorem 4.1 and Corollary 4.2 in Section 4.2.We start by showing that the steps TC1, TC2, and TC3 preserve certain properties of word graphs in Proposition 4.4.In Section 4.1, we show that every congruence enumeration eventually stabilises and is eventually compatible with R; and in Section 4.2 we give the proofs of Theorem 4.1 and Corollary 4.2.
We will make repeated use, without reference, to the following straightforward lemma.
Lemma 4.3. If any of TC1, TC2, or TC3 is applied to (Γ_i, κ_i) where every node in Γ_i is reachable from 0, then every node in the output Γ_{i+1} is reachable from 0 also.
The next proposition also plays a crucial role in the proof of Theorem 4.1 and Corollary 4.2.
Proposition 4.4. If TC1, TC2, or TC3 is applied to (Γ_i, κ_i), where (0)π_{Γ_i/κ_i} ⊆ ρ, then the output (Γ_{i+1}, κ_{i+1}) satisfies (0)π_{Γ_{i+1}/κ_{i+1}} ⊆ ρ.
The proof of Proposition 4.4 is split into two parts due to commonalities in the proofs rather than the statements. The two cases are: TC1, TC2(a), TC2(b); and TC2(c), TC3.
Lemma 4.5. Proposition 4.4 holds when TC1, TC2(a), or TC2(b) is applied.
Proof. Since (a) and (b) of TC2 are symmetric, it suffices to prove the proposition when TC1 or TC2(a) is applied. In TC1, α is a node in Γ_i and there is no edge with source α labelled by a. In this case, Γ_{i+1} is obtained from Γ_i by adding the single node β := 1 + max ⋃_{j≤i} N_j and the single edge (α, a, β), and κ_{i+1} := κ_i ∪ {(β, β)}. In TC2(a), there exist (u, v) ∈ R, b ∈ A, and v_1 ∈ A* such that v = v_1 b, and there exist nodes α, β, and γ in Γ_i such that u and v_1 label (α, β)- and (α, γ)-paths in Γ_i, respectively. In this case, Γ_{i+1} is obtained from Γ_i by adding the single edge (γ, b, β) and κ_{i+1} := κ_i.
TC1: Suppose that (u, v) ∈ (0)π Γi+1 \ (0)π Γi is arbitrary.Then there are paths in Γ i+1 from 0 to some node γ labelled by u and v but there are no such paths in Γ i .Since β is not a node in Γ i , β is not the target of any path in Γ i .It follows that the edge (α, a, β) must occur at least once in both paths.But β is the source of no edges in Γ i+1 , and so (α, a, β) occurs once, and it must be the last edge, in both paths.Hence

TC2(a):
We proceed by induction on the total number k of occurrences of the edge (γ, b, β) (defined at the start of the proof) in any pair of paths in Γ i+1 with source 0 and the same target node.The inductive hypothesis is: if (x, y) ∈ (0)π Γi+1 , X and Y are paths with source 0 labelled by x and y, respectively, and the total number of occurrences of the edge (γ, b, β) in X and Y is strictly less than k, then (x, y) ∈ ρ.The base case is when k = 0 and, in this case, (x, y) ∈ (0)π Γi ⊆ ρ, as required.
We conclude the proof by showing that (0)π_{Γ_{i+1}/κ_{i+1}} ⊆ ρ when either TC1 or TC2(a) is applied. Suppose that and η are nodes in Γ_i and hence they are reachable from 0 in Γ_i by Lemma 4.3. In other words, there exist
Lemma 4.6. Proposition 4.4 holds when TC2(c) or TC3 is applied.

Completeness, determinism, and compatibility
In this section, we will prove that if at some step i in a congruence enumeration the word graph Γ i is complete, then that enumeration terminates, and that every congruence enumeration is eventually compatible with ρ.
We say that a congruence enumeration stabilises if for every i ∈ ℕ and every node α of Γ_i there exists K ∈ ℕ such that for all j ≥ K either α is not a node in Γ_j or if (α, a, β) ∈ E_j, then (α, a, β) ∈ E_{j+1} for any a ∈ A.
Lemma 4.7. Every congruence enumeration stabilises.
Proof. We will prove a slightly stronger statement, that any sequence of TC1, TC2, and TC3 stabilises.
Suppose that i ∈ N, α is an arbitrary node in Γ i , and a ∈ A is arbitrary.Either there exists K > i such that α is no longer a node in Γ K , or α is a node in Γ j for all j ≥ i.If α is no longer a node in Γ K , then, from the definitions of TC1, TC2, and TC3, α is not a node in Γ k for any k ≥ K.
Suppose that α is a node in Γ_j for all j ≥ i and that (α, a, β_1) ∈ E_j for some β_1 ∈ N_j. If β_1 ∈ N_j is replaced by β_2 ∈ N_k at some step k ≥ j, then step k is an application of TC3. In particular, 0 ≤ β_2 ≤ β_1. This process can be repeated only finitely many times because there are only finitely many natural numbers less than β_1.
If Γ i is complete for some i, then TC1 cannot be applied again at any step after i.In this case, TC2 and TC3 make κ coarser and hence reduce the number of nodes in Γ i .Since the number of nodes in Γ i is finite, it follows that the enumeration must terminate at some point.We record this in the following corollary.
Corollary 4.8.If Γ i is complete at some step i of a congruence enumeration, then the enumeration terminates at some step j ≥ i.
The next lemma shows that, roughly speaking, if the word graph at some step of a congruence enumeration is nondeterministic, then at some later step it is deterministic.
Lemma 4.9. If w ∈ A* labels a path in Γ_i starting at 0 for some i ∈ ℕ, then there exists j ≥ i such that w labels a unique path in Γ_j starting at 0.
Proof. We proceed by induction on the length of the word w. If w = ε, then w labels a unique path in every Γ_i starting (and ending) at 0.
Suppose that every word of length at most n − 1 labelling a path starting at 0 in Γ i labels a unique path, and let w ∈ A * be any word of length n ≥ 1, labelling a path in Γ i starting at 0. If w = w 1 a for some w 1 ∈ A * and a ∈ A, then w 1 labels a unique path in Γ i starting at 0 by induction.Suppose that α is the target of the unique path labelled w 1 .By Lemma 4.7, we may assume without loss of generality that α is a node in Γ j for every j ≥ i.If there exist edges (α, a, β 1 ), . . ., (α, a, β r ) in Γ i , then at most one of these edges was created by an application of TC1 or TC2.
By part (e) of the definition of a congruence enumeration (Definition 3.3), there exists j > i such that κ j = ∆ Nj and so at step j, there is only one edge incident to α labelled a.
Next we show that every congruence enumeration is eventually compatible with ρ.
Lemma 4.10. If (u, v) ∈ R#, then there exists K ∈ ℕ such that (u, v) ∈ (0)π_{Γ_i} for all i ≥ K.
Suppose that j ∈ {1, . . ., k − 1} is arbitrary, and that w j = pxq and w j+1 = pyq where p, q ∈ A * and (x, y) ∈ R. From part (c) of the definition of a congruence enumeration (Definition 3.3), p labels a path with source 0 in Γ j for all j ≥ i for some sufficiently large i.We may choose i large enough so that the target node α of this path is a node in every Γ j for j ≥ i.By Definition 3.3(d), for sufficiently large i, (x, y) ∈ (α)π Γi and so (px, py) ∈ (0)π Γi .Again if i is large enough, there is a path labelled q from the target of the path from 0 labelled by px (or py) and so (w j , w j+1 ) = (pxq, pyq) ∈ (0)π Γi .
Proof.Recall that ρ is the least right congruence containing R # and S (as defined in Theorem 4.1).If (u, v) ∈ ρ, then there exists a right elementary sequence u = w 1 , w 2 , . . ., w k = v such that w j = xq and w j+1 = yq for some (x, y) ∈ R # ∪ S and some q ∈ A * .Hence it suffices to show that if (x, y) ∈ R # ∪ S and q ∈ A * , then, for sufficiently large i, (xq, yq) ∈ (0)π Γi .
We also obtain the following stronger result for an enumeration of a two-sided congruence.
Corollary 4.13. Suppose that u, v ∈ A* are arbitrary. Then the following are equivalent:
(i) (u, v) ∈ R#;
(ii) there exists K ∈ ℕ such that (u, v) ∈ (α)π_{Γ_i} for all nodes α in Γ_i and for all i ≥ K;
(iii) there exists K ∈ ℕ such that (u, v) ∈ (0)π_{Γ_i} for all i ≥ K.
Proof. (i) ⇒ (ii). If (u, v) ∈ R# and w ∈ A* is arbitrary, then (wu, wv) ∈ R# and so, by Lemma 4.10, (wu, wv) ∈ (0)π_{Γ_i} for sufficiently large i. In particular, for large enough i, wu and wv both label (0, β)-paths for some node β. If α is the target of the path starting at 0 labelled by w, then u and v both label (α, β)-paths in Γ_i and so (u, v) ∈ (α)π_{Γ_i} for i large enough.
(ii) ⇒ (iii).This follows immediately since 0 is a node in every word graph.
(iii) ⇒ (i). Follows by applying Corollary 4.12 to ρ = R#.
Proof of Theorem 4.1. We start the proof by showing that there is always a bijection between a certain set of nodes in the word graphs Γ_i (defined in a moment) and the classes of ρ. Let M be the subset of ⋃_{i∈ℕ} N_i so that α ∈ M if α ∈ N_i for all large enough i. By Lemma 4.7, for every α ∈ M there exists w_α ∈ A* such that w_α labels a path from 0 to α in every Γ_i for sufficiently large i. We define f : M −→ A*/ρ by
(α)f = w_α/ρ. (4.1)
We will show that f is a bijection. If α, β ∈ M are such that (α)f = (β)f, then (w_α, w_β) ∈ ρ and so (w_α, w_β) ∈ (0)π_{Γ_i} for large enough i by Corollary 4.12. In particular, w_α and w_β both label paths from 0 to the same node in every Γ_i for large enough i. By Lemma 4.7, we can also choose i large enough so that the target γ of these paths is a node in Γ_j for all j ≥ i. But w_α and w_β also label (0, α)- and (0, β)-paths in every Γ_i for sufficiently large i. By Lemma 4.9, there exists j ≥ i such that w_α labels a unique path starting at 0, and so α = γ. Similarly, there exists k ≥ j such that w_β labels a unique path from 0, and so β = γ also. In particular, α = β and so f is injective.
For surjectivity, let u ∈ A * be arbitrary.Then u labels a path in Γ i for large enough i.Since every congruence enumeration stabilises, we can choose i large enough so that every edge in the path labelled by u belongs to every Γ j for j ≥ i.In particular, u labels a (0, α)-path in Γ i for some α ∈ M .But w α also labels a (0, α)-path in Γ i , and so (u, w α ) ∈ (0)π Γi .Hence, again by Corollary 4.12, for large enough i, (u, w α ) ∈ ρ and so (α)f = w α /ρ = u/ρ, and f is surjective.
(a).Suppose that (Γ, κ) is the output of the enumeration where Γ = (N, E) and that π : N −→ P(A * × A * ) is the path relation of Γ.Since the congruence enumeration process has terminated, Γ is finite and its set of nodes N coincides with the set M defined at the start of the proof.Hence A * /ρ is finite, since f : N −→ A * /ρ defined in (4.1) is a bijection.
Since any application of TC3 results in no changes to Γ, it follows that κ = ∆ N .Hence Γ is deterministic, and so every w ∈ A * labels exactly one path starting at every u ∈ N .In particular, ϕ (as defined in the statement) is a well-defined function.If w 1 , w 2 ∈ A * label (α 1 , α 2 )-and (α 2 , α 3 )-paths in Γ, respectively, then certainly w 1 w 2 labels a (α 1 , α 3 )-path, and so ϕ is an action.
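The action ϕ can be made concrete with a short sketch. The Python below is illustrative only: the function name follow and the three-node graph are assumptions chosen for the example, not data from this paper. It follows the unique path labelled by a word in a complete deterministic word graph and checks that reading w 1 and then w 2 agrees with reading w 1 w 2 .

```python
def follow(graph, node, word):
    """Return the target of the unique path labelled `word` starting at `node`.

    `graph[n][a]` is the target of the edge with source n labelled a; the
    graph is assumed to be complete and deterministic.
    """
    for letter in word:
        node = graph[node][letter]
    return node


# Hypothetical complete deterministic word graph over {a, b} (illustration only).
graph = {
    0: {"a": 1, "b": 2},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 2},
}

# The action property: (alpha, w1 w2) phi = ((alpha, w1) phi, w2) phi.
words = ["", "a", "b", "ab", "ba", "abb"]
action_ok = all(
    follow(graph, follow(graph, n, u), v) == follow(graph, n, u + v)
    for n in graph
    for u in words
    for v in words
)
```

The empty word fixes every node, which corresponds to the identity of A * acting trivially.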

(b). If A * /ρ is finite, then the set M of nodes defined at the start of the proof is also finite. Since every congruence enumeration stabilises, it follows that for every α ∈ M , there exists K ∈ N such that (α, a, β) ∈ E i for all i ≥ K. It follows that β ∈ M . Since A and M are finite, there exists K ∈ N such that (α, a, β) ∈ E i for all α ∈ M , all a ∈ A, and all i ≥ K. It follows that none of TC1, TC2, or TC3 results in any changes to Γ i or κ i for i ≥ K, and so the enumeration terminates.
We conclude this section by proving Corollary 4.2.

Example 5.1. Suppose that M is the monoid generated by the matrices a, b, and c over the boolean semiring B = {0, 1} with addition defined by 0 + 0 = 0 and 0 + 1 = 1 + 0 = 1 + 1 = 1, and multiplication defined as usual for the real numbers 0 and 1. The right Cayley graph of M with respect to its generating set {a, b, c} is shown in Fig. 5.1; see also Table 5.1.
Suppose that we want to determine a right congruence on a finite monoid M generated by a non-abstract set of generators A, such as transformations, partial permutations, or matrices over a semiring. Of course, every finite monoid M is finitely presented, and so congruence enumeration can be applied to any presentation for M . It is possible that a presentation for M is already known, or we can compute a presentation for M ; for example, by using the Froidure-Pin Algorithm (see [11]).
It is also possible, instead of starting with the trivial word graph, to start a congruence enumeration with the right Cayley graph Γ 1 = (N 1 , E 1 ) of M with respect to A as the input to the process. As mentioned above, (0)π Γ1 = R # . Experiments indicate that if the congruence being enumerated has a relatively large number of classes in comparison to |M |, then this second approach is faster than starting from the trivial word graph. On the other hand, if the number of congruence classes is relatively small compared with |M |, then starting from the trivial word graph is often faster. Since the number of congruence classes is usually not known in advance, the implementation in libsemigroups [24] runs both these approaches in parallel, accepting whichever provides an answer first.
We end this section with an example of performing a congruence enumeration with a right Cayley graph as an input.
Example 5.2.Suppose that M is the monoid from Example 5.1.To compute the least right congruence ρ on M containing S = {(a, b)} we perform the following steps.
Set Γ 1 to be the right Cayley graph of M with respect to the generating set {a, b, c}; see Fig. 5.1 and Table 5.1.Suppose that a < b < c.
Step 1: Apply TC2 to the only pair (a, b) in S, and the output of this is (Γ 2 , κ 2 ) where κ 2 is the least equivalence on the nodes of Γ 2 containing (1, 2); see Fig. 5.2a.
We conclude that ρ has 4 equivalence classes whose representatives ε, a, c, and a 2 correspond to the nodes {0, 1, 3, 4} in the graph Γ 5 . By following the paths in Γ 5 starting at 0 labelled by each of the words in Table 5.1 we determine that the congruence classes of ρ are: {ε}, {a, b, ab}, {c}, {a 2 , ba, bc, bab}.

Figure 5.2: See Table 5.1 for a representative word for each node. A dashed edge with a single arrowhead denotes an edge that is obtained from TC2 or TC3, solid edges correspond to edges that existed at the previous step.

The HLT strategy
In this section, we describe the so-called HLT strategy for congruence enumeration.The acronym HLT stands for Haselgrove, Leech, and Trotter; for further details see [5] or [31,Section 5.2].For the record, this is Walker's Strategy 1 in [36], and is referred to as R-style in ACE [14] ("R" for "relators").
The HLT strategy, like every congruence enumeration, starts by applying steps (a) and (b) in the definition (Definition 3.3). We then repeatedly apply TC3 until the resulting κ is trivial, after which we repeatedly perform the following steps:

HLT1. Let α be the minimum node in Γ i for which there exists (u, v) ∈ R such that either u or v does not label a path starting at α, or u and v label paths starting at α that end in different nodes. In the former case, we apply TC1 repeatedly until u and v 1 label paths starting at α, where v = v 1 b for some b ∈ A, and then apply TC2 where part (a) applies. In the latter case, we simply apply TC2 to α and (u, v) ∈ R where part (c) applies.

HLT2. We apply TC3.

HLT3. If α is the node from HLT1, then we apply TC1 to α and every a ∈ A, if any, such that there is no edge incident to α labelled by a.
We note that if for every a ∈ A there exists (u, v) ∈ R such that either u or v starts with a, then HLT3 does not have to be performed.
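To make the interplay of TC1, TC2, and TC3 in an HLT-style enumeration concrete, the following Python sketch may help. It is a simplified illustration under several assumptions and is not the implementation in libsemigroups or ACE: the function name enumerate_right_congruence and the max_nodes guard are hypothetical, words are Python strings, only TC2(a) and TC2(c) are used, coincidences are processed eagerly (TC3 is applied as soon as a pair is found), and stale edge targets are resolved lazily through a union-find lookup.

```python
def enumerate_right_congruence(alphabet, relations, pairs=(), max_nodes=10_000):
    """Return the output word graph as {node: {letter: target}} (sketch only)."""
    edges = [{}]   # edges[n][a] is the target of the edge with source n labelled a
    parent = [0]   # union-find forest playing the role of kappa; roots are least nodes

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    def target(n, a):
        t = edges[n].get(a)
        return None if t is None else find(t)

    def trace(n, word):
        # TC1: follow `word` from n, defining new nodes and edges where needed.
        for a in word:
            t = target(n, a)
            if t is None:
                if len(edges) >= max_nodes:
                    raise RuntimeError("node limit reached")
                t = len(edges)
                edges.append({})
                parent.append(t)
                edges[n][a] = t
            n = t
        return n

    def merge(x, y):
        # TC3, applied eagerly: identify x and y together with all consequences.
        queue = [(x, y)]
        while queue:
            x, y = (find(z) for z in queue.pop())
            if x == y:
                continue
            if x > y:
                x, y = y, x
            parent[y] = x
            for a, t in edges[y].items():
                if a in edges[x]:
                    queue.append((edges[x][a], t))  # two a-edges: new coincidence
                else:
                    edges[x][a] = t
            edges[y] = {}

    def close(n, u, v):
        # TC2(a) or TC2(c): force u and v to label paths from n to the same node.
        if not v:
            u, v = v, u
        if not v:
            return
        end_u = trace(n, u)
        last = trace(n, v[:-1])
        t = target(last, v[-1])
        if t is None:
            edges[last][v[-1]] = end_u   # TC2(a): define the final edge
        else:
            merge(end_u, t)              # TC2(c): record a coincidence

    for u, v in pairs:                   # the pairs in S are only applied at node 0
        close(0, u, v)
    n = 0
    while n < len(edges):                # nodes are processed in increasing order
        for u, v in relations:
            if find(n) != n:             # n was identified with a smaller node
                break
            close(n, u, v)
        if find(n) == n:
            for a in alphabet:           # HLT3: make the node complete
                trace(n, a)
        n += 1
    return {m: {a: find(t) for a, t in edges[m].items()}
            for m in range(len(edges)) if find(m) == m}


# The monoid <a, b | a^2 = a, b^2 = b, ab = ba> has the 4 elements 1, a, b, ab.
graph = enumerate_right_congruence("ab", [("aa", "a"), ("bb", "b"), ("ab", "ba")])
quotient = enumerate_right_congruence(
    "ab", [("aa", "a"), ("bb", "b"), ("ab", "ba")], pairs=[("a", "b")]
)
```

In this sketch the nodes of the returned graph keep their original numbers, so the node set need not be an initial segment of the natural numbers; a standardization pass could be applied afterwards if a canonical numbering is wanted.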
Next we will show that the HLT strategy fulfils the definition of a congruence enumeration.
Proposition 6.1.If ⟨A|R⟩ is a finite monoid presentation, S is a finite subset of A * × A * , R # is the least two-sided congruence on A * containing R, and ρ is the least right congruence on A * containing S and R # , then the HLT strategy applied to (Γ 1 , κ 1 ) where Γ 1 = (N 1 , E 1 ) is a word graph such that (0)π Γ1 ⊆ ρ and κ 1 = ∆ N1 is a congruence enumeration.
Proof.It suffices to check that the conditions in Definition 3.3 hold.Clearly parts (a) and (b) are included in the HLT strategy and so there is nothing to check for these.
(c).Suppose that α is a node in Γ i = (N i , E i ) such that there is no edge labelled by some a ∈ A incident to α.If α ∈ N j for all j ≥ i, then there exists k ∈ N such that α is the minimum node to which HLT1 is applied.If there exists (u, v) ∈ R such that the first letter of u is a, then an edge labelled by a incident to α is defined in HLT1.Otherwise, such an edge is defined in HLT3.

(d). Suppose that α is a node in Γ i and (u, v) ∈ R are such that u and v both label paths starting at α. If α ∈ N j for all j ≥ i, then there exists k ∈ N such that α is the minimum node to which HLT1 is applied, and so TC2 is applied to α and (u, v) at some later step. Hence either the paths labelled by u and v starting at α end at the same node, or they end at distinct nodes β and γ such that (β, γ) ∈ κ i . But TC3 is applied at some later step, and so β and γ are identified in the corresponding quotient. Eventually the paths labelled by u and v starting at α end at the same node, i.e. (u, v) ∈ (α)π Γi for sufficiently large i.
(e). If (β, γ) ∈ κ i for some i, then there exists j ≥ i such that TC3 is applied by the definition of HLT2. If we choose the smallest such j, then κ i ⊆ κ j and so (β, γ) ∉ κ j+1 .
Example 6.2. Let M be the monoid defined in Example 5.1. Then M is isomorphic to the monoid defined by the presentation ⟨a, b, c | ...⟩. We will apply the HLT strategy to the presentation and the set S = {(a, b)}. In other words, we enumerate the least right congruence on the monoid M containing (a, b). This is the same right congruence that we enumerated in Example 5.2. The input word graph Γ 1 is the trivial graph and the input κ 1 is ∆ N1 . We suppose that a < b < c.
Step 2: We continue with Definition 3.3(b). We apply TC3 and we get the graph Γ 3 in Fig. 6.1b which is the quotient of Γ 2 by κ 2 . The set κ 3 is defined to be ∆ N3 .

Table 6.1: A word labelling a path from 0 to each node in the right Cayley graph on the monoid M from Example 6.2; see Fig. 6.1.
Node  0  1  2  3    4
Word  ε  a  b  a 2  c

Step 3: Now we are ready to apply HLT1. Since 0 is the minimum node in Γ 3 such that (ac, a 2 ) ∈ R and ac and a 2 do not label paths in Γ 3 , we apply TC1 until node 3 and a path labelled by a 2 are defined and we continue by applying TC2(b) for u = a 2 , v = ac and v 1 = a. The output of this step is the graph Γ 4 in Fig. 6.1c.
Step 9: In this step, we do not apply HLT1 to 0 and (aba, a 2 ) since aba and a 2 both label paths that reach the same node in Γ 9 .We apply HLT1 to the node 1 and the relation (ac, a 2 ) ∈ R and the output is graph Γ 10 in Fig. 6.1i.
Step 10: We apply HLT1 to the node 1 and the relation (cb, bc) ∈ R and the output is graph Γ 11 in Fig. 6.1j.
After Step 10, Γ 11 is complete, deterministic, and compatible with R. Hence the enumeration terminates; see Example 5.2 for details of the enumerated congruence.

The Felsch strategy
In this section, we describe two versions of the so-called Felsch strategy for congruence enumeration.For the record, this is Walker's Strategy 2 in [36], and is referred to as C-style in ACE [14] ("C" for "cosets").

First version
The Felsch strategy starts with the trivial word graph Γ 1 = (N 1 , E 1 ) and with the trivial equivalence κ 1 = ∆ N1 on the nodes N 1 of Γ 1 . Just like the HLT strategy, the Felsch strategy starts by applying steps (a) and (b) from the definition of a congruence enumeration (Definition 3.3). We then repeatedly apply TC3 until the resulting κ is trivial. The following steps are then repeatedly applied:

F1. If α ∈ N i is the minimum node in Γ i such that, for some a ∈ A, there is no edge with source α labelled by a, then apply TC1 to α and a. Apply TC2 to every node in Γ i and every relation in R.
It follows immediately from the definition of the Felsch strategy that it satisfies the definition of a congruence enumeration.To illustrate the Felsch strategy, we repeat the calculation from Example 6.2.
Example 7.1. Let M be the monoid defined in Example 5.1. Then M is isomorphic to the monoid defined by the presentation ⟨a, b, c | ...⟩. We will apply the Felsch strategy to the presentation (with a < b < c) and the set S = {(a, b)}. In other words, we enumerate the least right congruence on the monoid M containing (a, b). This is the same right congruence that we enumerated in Example 5.2. The input word graph Γ 1 is the trivial graph and the input κ 1 is ∆ N1 .
Step 2: We continue with Definition 3.3(b). We apply TC3 and we get graph Γ 3 in Fig. 7.1b which is the quotient of Γ 2 by κ 2 . The set κ 3 is defined to be ∆ N3 .

Step 3: We apply F1 and hence we apply TC1 to 0 and c. We add the node 3 ∈ N 4 and define the edge (0, c, 3). The output is graph Γ 4 in Fig. 7.1c.

Figure 6.1: The output (Γ i , κ i ) of each step in the HLT strategy in Example 6.2. Purple arrows correspond to a, gray to b, pink to c, shaded nodes of the same colour belong to κ i , and unshaded nodes belong to singleton classes; see Table 6.1 for a representative word for each node. A dashed edge with a double arrowhead indicates the edge being defined in TC1, a dashed edge with a single arrowhead denotes an edge that is obtained from TC2 or TC3, solid edges correspond to edges that existed at the previous step.

Figure 7.1: See Table 7.1 for a representative word for each node. A dashed edge with a double arrowhead indicates the edge being defined in TC1, a dashed edge with a single arrowhead denotes an edge that is obtained from TC2 or TC3, solid edges correspond to edges that existed at the previous step.

Step 4: We continue with the application of the second part of F1 and hence we apply TC2 to 1 and (b 2 , b). The output is graph Γ 5 in Fig. 7.1d.
Step 5: We apply F1 and hence we apply TC1 to 1 and a.We add the node 4 ∈ N 6 and define the edge (1, a, 4).The output is graph Γ 6 in Fig. 7.1e.
Steps 6-12: We continue with the application of the second part of F1 and hence we apply TC2 to 0 and (a 3 , a 2 ), to 0 and (ac, a 2 ), to 1 and (ac, a 2 ), to 0 and (ca, a 2 ), to 0 and (c 2 , a 2 ) and to 3 and (b 2 , b).

After Step 12, Γ 13 in Fig. 7.1f is complete, deterministic, and compatible with R. Hence the enumeration terminates; see Example 5.2 for details of the enumerated congruence.

Second version
The purpose of F2 in the Felsch strategy is to squeeze as much information as possible out of every definition of an edge (α, a, β) made in F1. The implementation of the Felsch strategy in [23] spends most of its time performing TC2. In this section, we propose a means of reducing the number of times TC2 is applied in F2 of the Felsch strategy. Roughly speaking, we do this by only applying TC2 to (u, v) ∈ R and a node α in Γ i if the path in Γ i starting at α and labelled by u, or v, goes through a part of Γ i that has recently changed.
To enable us to do this, we require a new set D i ⊆ N i × A, in addition to the word graph Γ i and equivalence relation κ i , at every step of a congruence enumeration. An element (α, a) of D i corresponds to a recently defined edge incident to α and labelled by a. More precisely, we define D 1 = ∅ and for i ≥ 1 we define
D i+1 = { (α, a) ∈ N i+1 × A : (α, a, β) ∈ E i+1 \ E i for some β ∈ N i+1 }.
In order to efficiently use the information recorded in D i , we require the following definition.

Definition 7.2. If ⟨A|R⟩ is a finite monoid presentation, then we define the Felsch tree F (A, R) of this presentation to be a pair (Θ, ι) where Θ is a word graph whose nodes are words in A * and whose edges have the form (v, b, bv), and ι is a function mapping every node v of Θ to a set (v)ι of relations in R.

We modify TC1, TC2, and TC3 so that the set D i is defined appropriately at every step, and then, roughly speaking, we replace F2 in the Felsch strategy by a backtrack search through Θ in F (A, R) for every pair in D i . We will refer to this as the modified Felsch strategy.
If α is a node in Γ i and v ∈ A * is a node in Θ, then we perform a backtrack search consisting of the following steps:

PD1. Apply TC2 to α and every relation in (v)ι.

PD2. For every edge (v, b, bv) in Θ that has not been traversed: for every node β in Γ i such that (β, b, α) ∈ E i , apply PD1 and PD2 to β and bv.

PD3. Apply TC3.
We will refer to either removing a pair from D i or performing the backtrack search as processing a deduction. The backtrack search is initiated for every (α, a) ∈ D i . If a is not a node in Θ, then a does not occur in any relation in R, and so no path in Γ i labelled by a word u with (u, v) ∈ R, starting at any node of Γ i , contains an edge labelled by a. In particular, no such path contains the edge (α, a, β) ∈ E i , the definition of which caused (α, a) to belong to D i in the first place. Hence, whenever a pair (α, a) ∈ D i can affect an application of TC2, there is a node a in Θ.
We repeatedly apply PD1, PD2, and PD3 starting from every pair in D i .This must terminate eventually because Θ is finite and we only apply TC2 and TC3 in PD1, PD2, and PD3.
To show that the modified Felsch strategy is a congruence enumeration, we require the following lemma.
Lemma 7.4.Suppose that α is a node in Γ j for all j ≥ i, and that (u, v) ∈ R is such that u and v label paths P u and P v in Γ i starting at α.If (β, a) ∈ D i for some i, and there is an edge (β, a, γ) ∈ Γ i in either P u or P v , then (u, v) ∈ (α)π Γ k for some k ≥ i.
As noted above, Θ is finite, and PD1 and PD2 are only applied finitely many times before there is an application of PD3.If the first application of TC3 (in PD3) after step j occurs at step k, then Γ k+1 is the quotient of Γ k by κ k and κ i ⊆ κ k .Therefore (u, v) ∈ (α)π Γ k+1 as required.
Proposition 7.5.If ⟨A|R⟩ is a finite monoid presentation, S is a finite subset of A * × A * , R # is the least two-sided congruence on A * containing R, and ρ is the least right congruence on A * containing S and R # , then the modified Felsch strategy applied to (Γ 1 , κ 1 ), where Γ 1 = (N 1 , E 1 ) is the trivial word graph and κ 1 = ∆ N1 , is a congruence enumeration.
Proof. The only difference between the modified Felsch strategy and the original Felsch strategy is that after F1, TC2 is only applied to particular nodes and relations. Hence, it suffices to show that Definition 3.3(d) holds. Assume α ∈ N i at some step i in the congruence enumeration and let (u, v) ∈ R. In order for modified Felsch to be a congruence enumeration we need to show that there exists j ≥ i such that either α ∉ N j or (u, v) ∈ (α)π Γj . If α ∉ N j for some j > i, then α ∉ N k for all k ≥ j (since new nodes introduced in TC1 are larger than all previous nodes). Hence it suffices to prove that if α is a node for all j ≥ i, there exists some k such that (u, v) ∈ (α)π Γ k . Since F1 is repeatedly applied there exists a step k + 1 when α is a node in Γ k+1 and paths P u and P v leaving α labelled by u and v, respectively, exist in Γ k+1 . Suppose that k ∈ N is the least value such that this holds. Then, by Lemma 7.4, it suffices to show that there exists an edge (β, a, γ) in either P u or P v that belongs to E k+1 \ E k so that (β, a) ∈ D k+1 . If every edge in P u and P v belongs to E k , then u and v both label paths in Γ k starting at α, which contradicts the assumed minimality of k.

Implementation issues
In this section, we briefly address some issues relating to any implementation of the Todd-Coxeter algorithm for semigroups and monoids as described herein.
In some examples, it can be observed that the HLT strategy defines many more nodes than the Felsch strategy.One possible antidote to this is to, roughly speaking, perform "periods of definition à la HLT [that] alternate with periods of intensive scan à la Felsch" [25, p. 14].
In both ACE [14] and libsemigroups [24] it is possible to specify the precise lengths of the periods of applications of HLT and Felsch.As might be expected, some choices work better than others in particular examples, and this is difficult (or impossible) to predict in advance.It is routine to show that alternating between HLT and Felsch in this way still meets the definition of a congruence enumeration given in Definition 3.3.Although we presented the HLT and Felsch strategies separately, it seems that some combination of the two sometimes offers better performance.
The next issue is: how to represent the equivalence relations κ i ?A method suggested in [31,Section 4.6], which is now a standard approach for representing equivalence relations, is to use the disjoint-sets data structure to represent the least equivalence relation containing the pairs (u, v) added to κ i in TC2(c) or TC3.The theoretical time complexity of updating the data structure to merge two classes, or to find a canonical representative of a class given another representative, is O(α(m)) time (in both the worst and the average case) and requires O(m) space where m is the number of elements in the underlying set and α is the inverse Ackermann function; see [34].
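As an illustration of the disjoint-sets data structure, here is a minimal Python sketch. The class name DisjointSets and its method names are hypothetical and are not taken from any of the cited implementations. It uses union by size with path compression, and additionally records the least element of every class, since a congruence enumeration typically wants the smallest node as the canonical representative.

```python
class DisjointSets:
    """Union-find with union by size, path compression, and least elements."""

    def __init__(self, size):
        self.parent = list(range(size))
        self.size = [1] * size
        self.least = list(range(size))  # least element of a class, indexed by root

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:   # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def representative(self, x):
        """Return the least element of the class containing x."""
        return self.least[self.find(x)]

    def union(self, x, y):
        x, y = self.find(x), self.find(y)
        if x == y:
            return
        if self.size[x] < self.size[y]:  # attach the smaller tree to the larger
            x, y = y, x
        self.parent[y] = x
        self.size[x] += self.size[y]
        self.least[x] = min(self.least[x], self.least[y])


kappa = DisjointSets(5)
kappa.union(3, 4)
kappa.union(1, 3)
```

After these two calls the nodes 1, 3, and 4 lie in one class whose representative is 1, while 0 and 2 remain in singleton classes.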
Another issue that arises in the implementation is how to represent the word graphs Γ i . In order to efficiently obtain the word graph Γ i+1 from Γ i in TC3 it is necessary to keep track of both the edges with given source node, and with given target. This is more complex for semigroups and monoids than for groups, because a word graph Γ = (N, E) output by a coset enumeration for a group has the property that for every β ∈ N and every a ∈ A there is exactly one α ∈ N such that (α, a, β) is an edge. As such if it ever arises that there are edges (α 1 , a, β) and (α 2 , a, β) in a word graph during a coset enumeration, the pair (α 1 , α 2 ) can immediately be added to κ i . It is therefore possible to ensure that every word graph arising during a coset enumeration has the property that |{ α ∈ N : (α, a, β) ∈ E }| ≤ 1 for every β ∈ N and a ∈ A, which simplifies the data structure required to represent such a graph.
In contrast, if Γ = (N, E) is a word graph arising in a congruence enumeration for a monoid, then |{ α ∈ N : (α, a, β) ∈ E }| can be as large as |N |.In practice, in TC3 pairs of nodes belonging to κ i are merged successively.A balance must be struck between repeatedly updating the data structure for the edges with given target in Γ i+1 or only retaining the edges with given source and rebuilding the data structure for the target edges later in the process.The former works better if the number of nodes in Γ i is comparable to the number in Γ i+1 , i.e.only a relatively small number of nodes are merged.On the other hand, if Γ i+1 is considerably smaller than Γ i , then it can be significantly faster to do the latter.
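The first of these two options, keeping the edges indexed by target up to date while nodes are merged, can be sketched as follows. This Python is only an illustration under assumed names (WordGraph, out, inn, merge are all hypothetical) and does not reflect the data structures actually used in libsemigroups. Every edge is stored twice, once by source and once by target, so that merging a node drop into a node keep can redirect the incoming edges without scanning the whole graph.

```python
from collections import defaultdict


class WordGraph:
    def __init__(self):
        self.out = defaultdict(dict)                      # out[s][a] = t
        self.inn = defaultdict(lambda: defaultdict(set))  # inn[t][a] = {s, ...}

    def add_edge(self, s, a, t):
        self.out[s][a] = t
        self.inn[t][a].add(s)

    def merge(self, keep, drop):
        """Identify `drop` with `keep` (assumed distinct).

        Returns the pairs of nodes that must be identified as a consequence.
        """
        coincidences = []
        # Redirect every edge with target `drop` to `keep`.
        for a, sources in self.inn.pop(drop, {}).items():
            for s in sources:
                self.add_edge(s, a, keep)
        # Move every edge with source `drop` to `keep`.
        for a, t in self.out.pop(drop, {}).items():
            self.inn[t][a].discard(drop)
            if a in self.out[keep]:
                if self.out[keep][a] != t:
                    coincidences.append((self.out[keep][a], t))
            else:
                self.add_edge(keep, a, t)
        return coincidences


g = WordGraph()
for edge in [(0, "a", 1), (0, "b", 2), (1, "a", 3), (2, "a", 4), (2, "b", 2)]:
    g.add_edge(*edge)
forced = g.merge(1, 2)
```

In this example the node 2 has two incoming b-edges (from 0 and from itself), which is exactly the situation that cannot occur in a coset enumeration for a group; merging 2 into 1 forces the nodes 3 and 4 to be identified as well.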
Depending on the sequence of applications of TC1, TC2, and TC3 in two successful congruence enumerations with the same input, the output word graphs may not be equal.However, the output word graphs are unique up to isomorphism.Standardization is a process for transforming a word graph into a standard form.To discuss this we require the following notion.
If A is any alphabet and ⪯ is a total order on A * , then we say that ⪯ is a reduction ordering if ⪯ has no infinite descending chains and if u ⪯ v for some u, v ∈ A * , then puq ⪯ pvq for all p, q ∈ A * .It follows from this definition that ε is the ⪯-minimum word in A * for every reduction ordering on A * .
If the set A is totally ordered by ≤, then we may extend ≤ to ≤ lex over A * , by defining ε ≤ lex w for all w ∈ A * and u ≤ lex v whenever u = au 1 and v = bv 1 for some a, b ∈ A with a < b, or a = b and u 1 ≤ lex v 1 .This order is usually referred to as the lexicographic order on A * .Note that lexicographic order is not a reduction ordering.The short-lex order ≤ slex on A * is defined as follows: if u, v ∈ A * , then u ≤ slex v if |u| < |v| or |u| = |v| and u ≤ lex v.It is straightforward to verify that the short-lex order on A * is a reduction ordering.Further examples of reduction orderings on A * include recursive path descent, as well as the wreath product of any finite collection of reduction orderings; see [31, Section 2.1] for further details.
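The two orders can be compared directly in a short Python sketch; the helper names lex_key and shortlex_key are hypothetical, and words are represented as strings over an alphabet listed in increasing order. Python compares lists lexicographically with a proper prefix being smaller, which matches the convention ε ≤ lex w used above.

```python
def lex_key(word, alphabet):
    """Sort key realising the lexicographic order induced by `alphabet`."""
    rank = {a: i for i, a in enumerate(alphabet)}
    return [rank[a] for a in word]


def shortlex_key(word, alphabet):
    """Sort key realising short-lex: compare lengths first, then lexicographically."""
    return (len(word), lex_key(word, alphabet))


words = ["b", "aa", "a", "", "ab", "ba"]
by_shortlex = sorted(words, key=lambda w: shortlex_key(w, "ab"))

# With a < b the chain b > ab > aab > ... descends forever in the lexicographic
# order, which is why that order is not a reduction ordering.
descending = [lex_key("a" * k + "b", "ab") > lex_key("a" * (k + 1) + "b", "ab")
              for k in range(5)]
```

Here by_shortlex lists the words as ε, a, b, aa, ab, ba, and every entry of descending is True.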
Suppose that ⪯ is a reduction ordering on A * .We will say that the word graph Γ = (N, E) is standardized with respect to ⪯ if α < β if and only if w α ≺ w β for any α, β ∈ N where w α , w β ∈ A * are the ⪯-minimum words labelling (0, α)-and (0, β)-paths, respectively.Any process that transforms a word graph Γ into a standardized word graph, is referred to as standardization; for example, see STANDARDIZE in [31, p195].
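For the short-lex order, one way to standardize is a breadth-first traversal from 0 that follows edges in increasing order of their labels: nodes are then discovered in the short-lex order of their minimum words. The Python sketch below illustrates this under stated assumptions (every node is reachable from 0, the graph is deterministic, and the function name standardize is hypothetical); it is not the STANDARDIZE procedure of [31] verbatim, and other reduction orderings would need a different traversal.

```python
from collections import deque


def standardize(graph, alphabet):
    """Renumber the nodes of `graph` so that it is standardized for short-lex.

    `graph[n][a]` is the target of the edge with source n labelled a, and
    `alphabet` lists the letters in increasing order.
    """
    order = {0: 0}          # old node -> new node
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for a in alphabet:
            t = graph[node].get(a)
            if t is not None and t not in order:
                order[t] = len(order)
                queue.append(t)
    return {order[n]: {a: order[t] for a, t in graph[n].items()} for n in order}


# Hypothetical input in which the node reached by a has a larger number
# than the node reached by b.
graph = {0: {"a": 2, "b": 1}, 1: {"a": 1, "b": 1}, 2: {"a": 2, "b": 1}}
standard = standardize(graph, "ab")
```

In the output, the node reached from 0 by a is numbered 1 and the node reached by b is numbered 2, in agreement with a preceding b in short-lex.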
A word graph Γ i can be replaced by any standardized word graph at any step of a congruence enumeration, provided that the values in κ i and D i are also updated accordingly.In particular, it can be applied repeatedly during a congruence enumeration, or only at the end.Standardization during a congruence enumeration can be very costly in the context of semigroups and monoids.However, it can also be somewhat beneficial in some examples.
The order of the definitions of new nodes in Γ i+1 in both HLT1 and F1 depends on the numerical values of their source nodes.Hence replacing Γ i by a standardized word graph can change the order of these definitions, which in turn can influence the number of steps in the enumeration.In an actual implementation, standardising a word graph is a rather complicated process that involves applying a permutation to the data structure representing the graph.

Further variants
In this section, we present some variants of the Todd-Coxeter algorithm that appear in the literature, which are used to compute special types of congruences (namely Rees congruences) and for computing congruences on finitely presented inverse monoids.

Monoids with zero
The first such variant is for finitely presented monoids with a zero element 0. For example, if M is the monoid defined by the presentation ⟨a, b, 0|ab = 0, a 4 = a, b 3 = b, (ab) 2 = 0, a0 = 0a = 0 = b0 = 0b = 0 2 ⟩, then the relations a0 = 0a = 0 = b0 = 0b = 0 2 indicate that 0 is a zero element of M . Of course, the algorithms described above can be applied to this finite monoid presentation, as well as every other. On the other hand, the inclusion of the relations a0 = 0a = 0 = b0 = 0b = 0 2 is rather cumbersome, and so we might rather write: ⟨a, b|ab = 0, a 4 = a, b 3 = b, (ab) 2 = 0⟩, where the relations a0 = 0a = 0 = b0 = 0b = 0 2 are implicit. This is directly analogous to the implicit relations for the identity in a monoid presentation, and in a group presentation for inverses. We refer to such a presentation as a finite monoid-with-0 presentation. Both the HLT and Felsch strategies can be adapted for monoid-with-0 presentations without much difficulty as follows.
Suppose that ⟨A|R⟩ is a finite monoid-with-0 presentation.We refer to a word graph Γ i = (N i , E i ) over A ∪ {0} with a distinguished node ω ∈ N i such that the only edges with source ω are loops of the form (ω, a, ω) ∈ E i for all a ∈ A and edges (α, 0, ω) for all α ∈ N i as a word graph-with-0.We augment TC1 with the following step: Z: If β is the new node introduced in TC1, then we define the edge (β, 0, ω).
Given a monoid-with-0 presentation, it is routine to verify that if we perform any congruence enumeration (where TC1 is augmented with Z) with input (Γ 1 , κ 1 ) where Γ 1 = (N 1 , E 1 ) is a word graph-with-0 and κ 1 = ∆ N1 , then the conclusions in Corollary 4.2 still hold.
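A minimal sketch of a word graph-with-0 and of TC1 augmented with Z is given below. All names are hypothetical, and numbering the distinguished node ω as 1 is an assumption made purely for the illustration.

```python
ZERO = "0"


def trivial_graph_with_zero(alphabet):
    """Return (graph, omega): the node 0 plus the distinguished node omega."""
    omega = 1
    graph = {
        0: {ZERO: omega},                     # the edge (0, 0, omega)
        omega: {a: omega for a in alphabet},  # loops (omega, a, omega)
    }
    graph[omega][ZERO] = omega                # the edge (omega, 0, omega)
    return graph, omega


def tc1_with_z(graph, omega, source, letter):
    """TC1 followed by Z: define a new node and its 0-edge to omega."""
    new = max(graph) + 1
    graph[source][letter] = new   # TC1: the edge (source, letter, new)
    graph[new] = {ZERO: omega}    # Z: the edge (new, 0, omega)
    return new


graph, omega = trivial_graph_with_zero("ab")
beta = tc1_with_z(graph, omega, 0, "a")
```

Because every node has a 0-edge to ω and ω only has loops, any word containing the letter 0 labels a path ending at ω from every node at which the path is defined, which is how the implicit relations for 0 are built into the graph.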

Rees congruences
Following [30,Chapter 12], we may extend the discussion of the previous section, to obtain a procedure for enumerating a left, right, or two-sided Rees congruence on a finitely presented monoid (with or without zero element). If I is a left, right, or two-sided ideal of the monoid S, then the Rees congruence associated with I is the congruence ∆ S ∪ (I × I). Such a procedure applies to a finite monoid presentation ⟨A|R⟩ and finite set S ⊆ A * × {0} (rather than S ⊆ A * × A * as in Definition 3.3). The input to such an enumeration is a word graph-with-0 Γ 1 = (N 1 , E 1 ) and κ 1 = ∆ N1 . The first steps are identical to those given in Definition 3.3(a) and (b) except that TC1 and Z are applied in part (a). The subsequent steps are just any sequence of applications of TC1+Z, TC2, and TC3 satisfying Definition 3.3(c), (d) and (e). It follows immediately from Theorem 4.1, and the validity of the congruence enumeration for monoid-with-0 presentations, that this process is valid.
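The definition of a Rees congruence can be illustrated on a small finite monoid. In the Python sketch below, the function name rees_congruence is hypothetical and the example monoid (the integers modulo 4 under multiplication, with the ideal {0, 2}) is chosen purely for illustration; it does not appear elsewhere in this paper.

```python
def rees_congruence(elements, ideal):
    """Return the relation Delta_S ∪ (I × I) as a set of pairs."""
    diagonal = {(s, s) for s in elements}
    return diagonal | {(s, t) for s in ideal for t in ideal}


# The integers modulo 4 under multiplication form a monoid with ideal {0, 2}.
elements = range(4)
ideal = {0, 2}
rho = rees_congruence(elements, ideal)

# Check compatibility with multiplication on the right.
compatible = all(((s * x) % 4, (t * x) % 4) in rho for (s, t) in rho for x in elements)
```

The classes of this congruence are the ideal {0, 2} together with the singletons {1} and {3}.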

Stephen's procedure
Another variant of the Todd-Coxeter algorithm is that of Stephen [32,Chapter 4], mentioned in the introduction of the current article. Note that a similar method for constructing the Cayley graph of groups was described by Dehn [6]. Suppose that M is the monoid defined by a finite monoid presentation ⟨A|R⟩ and that Γ = (N, E) is the right Cayley graph of M with respect to A. If w ∈ A * is arbitrary and w labels a (0, α)-path in Γ for some α ∈ N , then the aim of this variant is to output the subgraph Λ of Γ induced by the set X of nodes in N from which α is reachable. Note that the set of these nodes corresponds to the set of elements in M which are ≥ R w/R # (recall that two monoid elements s, t ∈ M satisfy s ≥ R t if tM ⊆ sM ). If A is the automaton with alphabet A, state set X, initial state 0, accept state α, and edges consisting of those in Λ, then the language L(A) accepted by A is the set of words v ∈ A * that represent the same element of M as w (i.e. v/R # = w/R # ). As such, if the (as yet to be described) procedure terminates, the output allows us to decide the word problem for w in M .
Suppose that w = a 1 • • • a n ∈ A * for some a 1 , . . ., a n ∈ A, that Γ 1 = (N 1 , E 1 ) is the trivial word graph, and κ 1 = ∆ N1 . A special case of Stephen's procedure consists of the following steps described in terms of TC1, TC2, and TC3. The first step is always:

S1: TC1 is applied to 0 and a 1 , then to i and a i+1 for every i such that 1 ≤ i ≤ n − 1. The resulting Γ n consists of the single path from 0 to the node n. (The graph Γ n is referred to as the linear graph of w in [32].)

S1 is then followed by any sequence of the following steps:

S2: At step i, suppose that the word graph Γ i = (N i , E i ) contains a path with source α ∈ N i labelled u for some (u, v) ∈ R. If v = v 1 b where v 1 ∈ A * and b ∈ A, then TC1 is applied until there is a path with source α labelled by v 1 and then TC2 is applied to α and (u, v) ∈ R. (This is referred to as an elementary expansion in [32].)

S3: Apply TC3. (Quotienting Γ i by the least equivalence containing a single (α, β) ∈ N i × N i is referred to as a determination in [32]. In TC3 we quotient Γ i by the entire equivalence κ i ; this is the only point where the procedure described here differs from the description in [32].)
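Step S1 is simple enough to sketch directly. The Python below is an illustration only: the function name linear_graph is hypothetical and the word is represented as a string. It builds the single path from 0 to n labelled by w, that is, the result of the n applications of TC1 in S1.

```python
def linear_graph(word):
    """Return the linear graph of `word` as {node: {letter: target}}."""
    graph = {i: {} for i in range(len(word) + 1)}
    for i, letter in enumerate(word):
        graph[i][letter] = i + 1   # TC1 applied to the node i and the letter a_{i+1}
    return graph


graph = linear_graph("aba")
```

For the word aba this yields the path 0 → 1 → 2 → 3 with labels a, b, a; the steps S2 and S3 would then add and identify nodes around this path.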
It is shown in [32] that if any sequence of S2 and S3 has the property that after finitely many steps any subsequent applications of S2 and S3 result in no changes to the output (i.e.(Γ i+1 , κ i+1 ) = (Γ i , κ i )), then Γ i is isomorphic to the induced subgraph Λ defined at the start of this section.Note that S2 and S3 are similar to HLT1, HLT2, and HLT3 described in Section 6.While it is not possible to use congruence enumeration, at least as described in this paper, to solve the word problem when the monoid M defined by a presentation ⟨A|R⟩ is infinite, it is possible to decide whether or not two words u, v ∈ A * represent the same element of M using the procedure defined in this section, whenever the induced subgraph Λ is finite.Since the set of nodes in Λ corresponds to the set of elements in M which are ≥ R u/R # (assuming that u is the input word for Stephen's procedure) the word graph Λ is finite when there are only finitely many elements of M that are ≥ R u/R # .

A Extended examples
This appendix contains a number of extended examples of congruence enumerations.
Example A.1.We will apply the Felsch strategy to the presentation with a < b.The input word graph Γ 1 is the trivial graph and the input κ 1 = ∆ N1 .Since S = ∅, we do not apply steps Definition 3.3(a) and (b).
For the sake of simplicity, the steps in this example correspond to either a single application of F1 (a single application of TC1 and multiple applications of TC2) or a single application of TC3. If a step produces no change to Γ i or κ i , this step is skipped and does not have a number; see Table A [...]

Step 1-2: The only node in Γ 1 is 0. Since there is no edge incident to 0 labelled by a, in Step 1 we apply F1 and add the node 1 and the edge (0, a, 1). Similarly, in Step 2 we add the node 2 and the edge (0, b, 2). The output is the word graph Γ 3 in Fig. A.1a.
Step 3: An application of TC1 which leads to the definition of the node 3 and the edge (1, a, 3).At this point F1 leads to an application of TC2 to the node 3 and the relation (a Step 13: We apply TC1 and define the node 13 and the edge (7, a, 13).We apply TC2 to the node 1 and the relation ((ab) 2 , a 2 ) and define the edge (13, b, 1).We also apply TC2 to node 7 and the relation ((ab) 2 , a 2 ) and define the edge (13, a, 7).Next, we apply TC2 to the node 1 and the relation (b  Step 20: We apply TC1 and the node 18 and the edge ( Step 25: We apply TC1 and the node 21 and the edge (18, a, 21) are defined.We apply TC2 to 6 and ((ab) 2 , a 2 ) and define the edge (21, b, 17).We also apply TC2 to 18 and ((ab) 2 , a 2 ) and define (21, a, 18).Finally, we apply TC2 Step 27: We apply TC1 and define 23 and (20, a, 23).We apply TC2 to 12 and ((ab) 2 , a 2 ) and define (23, b, 12).We also apply TC2 to 20 and ((ab) 2 , a 2 ) and define (23, a, 20).Finally, we apply TC2 to 23 and (b with a < b < c.Each step of this enumeration corresponds to either: at least one application of TC1 (in HLT1) followed by multiple applications of TC2 (in the same HLT1 step as the application of TC1 or subsequent applications of HLT1 where TC1 is not invoked); or a single application of TC3 (in HLT2); see Step 1: At this step we apply HLT1 to the node 0 and relation (a 2 , ac).TC1 yields the new nodes 1 and 2, and the edges (0, a, 1) and (1, a,                   Step 1: At this step we apply F1 to the node 0. 
The node 1 and the edge (0, a, 1) are defined; see Step 4: F1 is applied to 1 and hence the node 4 and the edge (1, a, 4) are defined.We apply F2 and the following applications of TC2 yield new information; applying TC2(b) to 0 and (a 2 , ac) yields the edge (1, c, 4), applying TC2(b) to 0 and (a 2 , ca) yields the edge (3, a, 4), the application of TC2(b) to 0 and (a 2 , c 2 ) yields the edge (3, c, 4), the application of TC2(a) to 0 and (a 3 , a 2 ) yields the edge (4, a, 4) and finally the application of TC2(b) to 1 and (a 2 , ac) yields the edge (4, c, 4); see Fig. A.8d.
Step 5: We apply F1 to 1, which leads to the definition of the node 5 and the edge (1, b, 5). We apply F2 and the following applications of TC2 yield new information: the application of TC2(b) to 0 and $(a^2, aba)$ yields the edge (5, a, 4) and the application of

Step 10: We apply F1. The node 9 and the edge (6, a, 9) are defined and an application of F2 follows. Applying TC2(b) to 2 and $(a^2, ac)$ yields the edge (6, c, 9), applying TC2(b) to 2 and $(a^2, ca)$ yields the edge (7, a, 9), applying TC2(a) to 2 and $(a^3, a^2)$ yields the edge (9, a, 9) and applying TC2(b) to 6 and $(a^2, ac)$ yields the edge (9, c, 9). In addition, applying

Step 12: We apply F1 and the node 10 and the edge (6, b, 10) are defined. We apply F2 and the following applications of TC2 yield new information: applying TC2(b) to 2 and $(a^2, aba)$ yields the edge (10, a, 4), applying

Figure 2.1: A commutative diagram illustrating a homomorphism of actions, where $X \times S \to Y \times S$ is the function defined by $(x, s) \mapsto ((x)\lambda, s)$.

Figure 3.1: A diagram representing TC2(a); solid lines correspond to paths in $\Gamma_i$ and the dashed edge is the one defined in TC2(a).

Figure 4.1: The commutative diagram in the proof of Theorem 4.1(a).

4.2 The proofs of Theorem 4.1 and Corollary 4.2

In this section, we prove Theorem 4.1 and Corollary 4.2.

Theorem 4.1. Let $A$ be a finite alphabet, let $R \subseteq A^* \times A^*$ be a finite set, and let $R^{\#}$ be the least two-sided congruence on $A^*$ containing $R$. If $S \subseteq A^* \times A^*$ is any finite set, and $\rho$ is the least right congruence on $A^*$ containing $R^{\#}$ and $S$, then the following hold:
(a) If a congruence enumeration for $\rho$ terminates with output word graph $\Gamma = (N, E)$, then $A^*/\rho$ is finite and the function $\phi : N \times A^* \to N$, defined by $(\alpha, w)\phi = \beta$ whenever $w$ labels an $(\alpha, \beta)$-path in $\Gamma$, is a right action that is isomorphic to the natural action of $A^*$ on $A^*/\rho$ by right multiplication.
(b) If $A^*/\rho$ is finite, then any congruence enumeration for $\rho$ terminates.
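The function $\phi$ in Theorem 4.1(a) is computed by following a word through the output word graph. The Python sketch below is only an illustration under stated assumptions: the three-node graph, its dictionary encoding, and the names `EDGES`, `act` and `is_right_action` are hypothetical and are not the output of any enumeration discussed in this paper. It checks, on short words, the defining property $(\alpha, uv)\phi = ((\alpha, u)\phi, v)\phi$ of a right action.

```python
from itertools import product

# Hypothetical complete, deterministic word graph over the alphabet {a, b}.
# EDGES[(node, letter)] is the target of the unique edge with that source and label.
EDGES = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 2, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 2,
}


def act(node, word):
    """Return (node, word)phi: the end of the path from `node` labelled by `word`."""
    for letter in word:
        node = EDGES[(node, letter)]
    return node


def is_right_action(nodes, alphabet, max_len):
    """Check (x, uv)phi == ((x, u)phi, v)phi for all words u, v of length <= max_len."""
    words = ["".join(p)
             for n in range(max_len + 1)
             for p in product(alphabet, repeat=n)]
    return all(act(x, u + v) == act(act(x, u), v)
               for x in nodes for u in words for v in words)


print(act(0, "ab"))
print(is_right_action(range(3), "ab", 3))
```

Because the graph is complete and deterministic, every word labels exactly one path from every node, so `act` is defined on all of $N \times A^*$; this is the reason the theorem only considers the output of a terminated enumeration.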
$a^2$   $ab$   $ba$   $bc$   $bab$

Table 5.1: A word labelling a path from 0 to each node in the right Cayley graph on the monoid M from Example 5.1; see Fig. 5.1.

Corollary 4.2. Let $A$ be a finite alphabet, let $R \subseteq A^* \times A^*$ be a finite set, and let $R^{\#}$ be the least two-sided congruence on $A^*$ containing $R$. Then the following hold:
(a) If a congruence enumeration for $R^{\#}$ terminates with output word graph $\Gamma = (N, E)$, then $A^*/R^{\#}$ is finite and the function $\phi : N \times A^* \to N$ defined by $(\alpha, w)\phi = \beta$ whenever $w$ labels an $(\alpha, \beta)$-path in $\Gamma$ is a right action that is isomorphic to the (faithful) natural action of $A^*/R^{\#}$ on itself by right multiplication.
(b) If $A^*/R^{\#}$ is finite, then any congruence enumeration for $R^{\#}$ terminates.

Proof. If $S = \varnothing$ in Theorem 4.1, then the least right congruence $\rho$ containing $S$ and $R^{\#}$ is just $R^{\#}$. By Theorem 4.1, the action $\phi : N \times A^* \to N$ is isomorphic to the natural action of $A^*$ on $A^*/R^{\#}$ by right multiplication. By Corollary 4.13, the kernel of $\phi$ is $R^{\#}$, and it is routine to verify that the kernel of the action of $A^*$ on $A^*/R^{\#}$ by right multiplication is also $R^{\#}$. Hence, by Proposition 2.2, the action $\phi$ of $A^*$ on $N$ and the induced action of $A^*/R^{\#}$ on $N$ are isomorphic. Similarly, the action of $A^*$ on $A^*/R^{\#}$ is isomorphic to the action of $A^*/R^{\#}$ on $A^*/R^{\#}$ by right multiplication. Thus the actions of $A^*/R^{\#}$ on $N$ and $A^*/R^{\#}$ on $A^*/R^{\#}$ are isomorphic also. The latter action is faithful since $A^*/R^{\#}$ is a monoid. Hence the action of $A^*/R^{\#}$ on $N$ is faithful also, and there is nothing further to prove.
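The faithfulness asserted in Corollary 4.2(a) can be observed on a small example. The sketch below is illustrative only: the four-node graph is a hypothetical right Cayley graph of a 4-element monoid generated by a and b, node 0 is assumed to correspond to the identity, and the names `EDGES`, `act` and `induced_map` are introduced here for the example. Each word induces a transformation of the nodes, and the check confirms that, on words of length at most 4, two words induce the same transformation exactly when they label paths from 0 to the same node.

```python
from itertools import product

# Hypothetical right Cayley graph of a 4-element monoid generated by a and b;
# node 0 is assumed to be the identity of the monoid.
EDGES = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 0, (1, "b"): 2,
    (2, "a"): 3, (2, "b"): 2,
    (3, "a"): 2, (3, "b"): 2,
}
NODES = range(4)


def act(node, word):
    """Follow the path labelled by `word` starting at `node`."""
    for letter in word:
        node = EDGES[(node, letter)]
    return node


def induced_map(word):
    """The transformation of the nodes induced by `word`, as a tuple of images."""
    return tuple(act(node, word) for node in NODES)


WORDS = ["".join(p) for n in range(5) for p in product("ab", repeat=n)]

# Distinct transformations of the nodes induced by the sampled words.
MAPS = {induced_map(w) for w in WORDS}

# Faithfulness on the sample: a transformation is determined by the image of node 0.
FAITHFUL = len({t[0] for t in MAPS}) == len(MAPS)
print(len(MAPS), FAITHFUL)
```

The check is a finite sample rather than a proof; it mirrors the argument above, where the identity of the monoid guarantees that distinct elements act differently.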

5 Monoids not defined by a presentation

Suppose that the monoid $M$ defined by the presentation $\langle A \mid R \rangle$ is finite and that $\phi : A^* \to M$ is the unique surjective homomorphism extending the inclusion of $A$ in $M$. Then $\ker(\phi) = R^{\#}$ and $(a)\phi = a$ for every $a \in A$. If $M = \{m_0, m_1, \ldots, m_{|M| - 1}\}$, then the right Cayley graph of $M$ with respect to $A$ is the word graph $\Gamma$ with nodes $N = \{0, \ldots, |M| - 1\}$ and edges $(\alpha, a, \beta)$ for all $\alpha, \beta \in N$ and every $a \in A$ such that $m_\alpha a = m_\beta$. The right Cayley graph $\Gamma$ is complete and deterministic, and so if $\pi : N \to \mathcal{P}(A^* \times A^*)$ is the path relation on $\Gamma$, then $(\alpha)\pi = R^{\#}$ for all $\alpha \in N$.
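When the elements of $M$ can be multiplied directly, the right Cayley graph can be built by a breadth-first search from the identity. The following Python sketch makes several assumptions that are not taken from this paper: the monoid is a hypothetical transformation monoid on two points, transformations are written on the right and stored as tuples of images, and the names `right_cayley_graph`, `compose` and `GENERATORS` are introduced for the example. Nodes are numbered in order of discovery, so node 0 is the identity.

```python
from collections import deque


def right_cayley_graph(identity, generators, multiply):
    """Return (elements, edges) with edges[(alpha, a)] = beta iff m_alpha * a = m_beta."""
    elements = [identity]      # elements[alpha] is the monoid element m_alpha
    index = {identity: 0}      # inverse lookup: element -> node
    edges = {}
    queue = deque([0])
    while queue:
        alpha = queue.popleft()
        for letter, gen in generators.items():
            prod = multiply(elements[alpha], gen)
            if prod not in index:          # a new element, hence a new node
                index[prod] = len(elements)
                elements.append(prod)
                queue.append(index[prod])
            edges[(alpha, letter)] = index[prod]
    return elements, edges


def compose(f, g):
    """Product fg of transformations written on the right: (x)fg = ((x)f)g."""
    return tuple(g[x] for x in f)


# Hypothetical generators: a swaps the two points, b is constant.
GENERATORS = {"a": (1, 0), "b": (0, 0)}
elements, edges = right_cayley_graph((0, 1), GENERATORS, compose)
print(elements)
print(edges)
```

Every node receives exactly one outgoing edge per generator, so the resulting word graph is complete and deterministic, as stated above.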

Figure 5.1: The right Cayley graph of the monoid M from Example 5.1 with respect to the generating set {a, b, c}; see Table 5.1 for a representative word for each node. Purple arrows correspond to edges labelled a, gray labelled b, and pink labelled c.

Figure 5.2: The output $(\Gamma_i, \kappa_i)$ of each step in Example 5.2. Purple arrows correspond to a, gray to b, and pink to c; shaded nodes of the same colour belong to $\kappa_i$, and unshaded nodes belong to singleton classes; see Table 5.1 for a representative word for each node. A dashed edge with a single arrowhead denotes an edge that is obtained from TC2 or TC3, and solid edges correspond to edges that existed at the previous step.

Figure 6.1: The output $(\Gamma_i, \kappa_i)$ of each step in the HLT strategy in Example 6.2. Purple arrows correspond to a, gray to b, and pink to c; shaded nodes of the same colour belong to $\kappa_i$, and unshaded nodes belong to singleton classes; see Table 6.1 for a representative word for each node. A dashed edge with a double arrowhead indicates the edge being defined in TC1, a dashed edge with a single arrowhead denotes an edge that is obtained from TC2 or TC3, and solid edges correspond to edges that existed at the previous step.

Figure 7.1: The output $(\Gamma_i, \kappa_i)$ of each step in the Felsch strategy in Example 7.1. Purple arrows correspond to a, gray to b, and pink to c; shaded nodes of the same colour belong to $\kappa_i$, and unshaded nodes belong to singleton classes; see Table 7.1 for a representative word for each node. A dashed edge with a double arrowhead indicates the edge being defined in TC1, a dashed edge with a single arrowhead denotes an edge that is obtained from TC2 or TC3, and solid edges correspond to edges that existed at the previous step.

Node   0   1   2   3   4
Word   ε   a   b   c   $a^2$

Table 7.1: A word labelling a path from 0 to each node in the right Cayley graph on the monoid M from Example 7.1; see Fig. 7.1.

Step 15: Taking the quotient of $\Gamma_{15}$ in F2 by $\kappa_{15}$ yields the graph in Fig. A.2b.

Step 16: We apply TC1 and define the node 14 and the edge (10, b, 14); see Fig. A.2c.

Step 17: We apply TC1 and define the node 15 and the edge (11, a, 15). Applying TC2 to 2 and the relation $((ab)^2, a^2)$ leads to the definition of the edge (15, b, 10) and applying TC2 to 11 and the relation $((ab)^2, a^2)$ leads to the definition of the edge (15, a, 11). Finally, an application of TC2 to 15 and the relation $(b^3, b)$ leads to the definition of the edge (14, b, 10); see Fig. A.2d.

Step 18: The node 16 and the edge (11, b, 16) are defined. An application of TC2 to 5 and $(b^3, b)$ leads to the definition of (16, b, 11); see Fig. A.2e.

Step 19: The node 17 and the edge (12, a, 17) are defined. An application of TC2 to 6 and $(b^3, b)$ leads to the definition of (17, a, 12); see Fig. A.2f.

b 2 a a 2 ba ba 2 b (ba) 2 bab 2 b 2 a 2 b 2 ab ba 2 ba b 2 a 2 b b 2 aba b 2 ab 2 b 2 a 2 ba

Table A.1: A word labelling a path from 0 to each node in Example A.1; see Fig. A.1, Fig. A.2, Fig. A.3, Fig. A.4, and Fig. A.5.
Fig. A.6 and Fig. A.7 for the word graphs $\Gamma_i$ and equivalence relation $\kappa_i$ after every step $i$; see also Table A.2.

Figure A.1: The output $(\Gamma_i, \kappa_i)$ for $i \in \{3, 4, 6, 7, 8, 9, 10, 11, 13, 14\}$ in the Felsch strategy in Example 7.1. Purple arrows correspond to a and gray to b; shaded nodes of the same colour belong to $\kappa_i$, and unshaded nodes belong to singleton classes. A dashed edge with a double arrowhead indicates the edge being defined in TC1, a dashed edge with a single arrowhead denotes a new edge obtained from TC2 or TC3, and solid edges correspond to edges that existed at the previous step.

Figure A.2: The output $(\Gamma_i, \kappa_i)$ for $i \in \{15, \ldots, 20\}$ in the Felsch strategy in Example 7.1. Purple arrows correspond to a and gray to b; shaded nodes of the same colour belong to $\kappa_i$, and unshaded nodes belong to singleton classes. A dashed edge with a double arrowhead indicates the edge being defined in F1, a dashed edge with a single arrowhead denotes an edge that is obtained from F2, and solid edges correspond to edges that existed at the previous step.

Figure A.3: The output $(\Gamma_i, \kappa_i)$ for $i \in \{21, 22, 23, 24\}$ in the Felsch strategy in Example 7.1. Purple arrows correspond to a and gray to b; shaded nodes of the same colour belong to $\kappa_i$, and unshaded nodes belong to singleton classes. A dashed edge with a double arrowhead indicates the edge being defined in TC1, a dashed edge with a single arrowhead denotes an edge that is obtained from TC2 or TC3, and solid edges correspond to edges that existed at the previous step.

Figure A.4: The output $(\Gamma_i, \kappa_i)$ for $i \in \{25, \ldots, 28\}$ in the Felsch strategy for Example 7.1. Purple arrows correspond to a and gray to b; shaded nodes of the same colour belong to $\kappa_i$, and unshaded nodes belong to singleton classes. A dashed edge with a double arrowhead indicates the edge being defined in TC1, a dashed edge with a single arrowhead denotes an edge that is obtained from TC2 or TC3, and solid edges correspond to edges that existed at the previous step.

Figure A.5: The output $(\Gamma_i, \kappa_i)$ for $i \in \{29, 30\}$ in the Felsch strategy of Example 7.1. Purple arrows correspond to a and gray to b; shaded nodes of the same colour belong to $\kappa_i$, and unshaded nodes belong to singleton classes. A dashed edge with a double arrowhead indicates the edge being defined in TC1, a dashed edge with a single arrowhead denotes an edge that is obtained from TC2 or TC3, and solid edges correspond to edges that existed at the previous step.

Fig. A.8a.

Step 2: We apply F1 to the node 0. The node 2 and the edge (0, b, 2) are defined. Applying F2 to 0 and the relation $(b^2, b)$ leads to the definition of the edge (2, b, 2); see Fig. A.8b.

Step 3: We apply F1 to 0. The node 3 and the edge (0, c, 3) are defined; see Fig. A.8c.
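The steps that refer to Fig. A.8a to Fig. A.8d can be replayed with a short Python sketch. It is a simplification under stated assumptions rather than the Felsch strategy itself: the relation list contains only the relations that are mentioned in those steps (the full presentation is not assumed), TC2(a) and TC2(b) are applied exhaustively at every node instead of only where a new edge is involved, coincidences (TC2(c) and TC3) are ignored, and the names `follow`, `tc1`, `tc2` and `close_under_tc2` are hypothetical.

```python
def follow(edges, node, word):
    """End of the path from `node` labelled `word`, or None if the path is incomplete."""
    for letter in word:
        node = edges.get((node, letter))
        if node is None:
            return None
    return node


def tc1(edges, num_nodes, node, letter):
    """TC1: define a new node and the edge (node, letter, new node); return the node count."""
    assert (node, letter) not in edges
    edges[(node, letter)] = num_nodes
    return num_nodes + 1


def tc2(edges, node, u, v):
    """Try TC2(a), then TC2(b), at `node` for the relation (u, v).

    If x = x1 + a, where x1 and the other side y label complete paths from `node`
    and the end of the x1-path has no edge labelled a, that edge is defined so
    that x and y end at the same node. Returns True if an edge was defined.
    """
    for x, y in ((u, v), (v, u)):
        if not x:
            continue
        beta = follow(edges, node, x[:-1])
        gamma = follow(edges, node, y)
        if beta is not None and gamma is not None and (beta, x[-1]) not in edges:
            edges[(beta, x[-1])] = gamma
            return True
    return False


def close_under_tc2(edges, num_nodes, relations):
    """Apply TC2(a) and TC2(b) everywhere until no new edge can be defined."""
    changed = True
    while changed:
        changed = False
        for node in range(num_nodes):
            for u, v in relations:
                if tc2(edges, node, u, v):
                    changed = True


# Only the relations that appear in the steps above (an assumption of this sketch).
RELATIONS = [("aa", "ac"), ("bb", "b"), ("aa", "ca"),
             ("aa", "cc"), ("aaa", "aa"), ("aa", "aba")]

edges, n = {}, 1
for node, letter in [(0, "a"), (0, "b"), (0, "c"), (1, "a")]:  # Steps 1 to 4
    n = tc1(edges, n, node, letter)
    close_under_tc2(edges, n, RELATIONS)
print(sorted(edges.items()))
```

With these relations the replay defines the same edges as Steps 1 to 4: (2, b, 2) after the second definition, and (1, c, 4), (3, a, 4), (3, c, 4), (4, a, 4) and (4, c, 4) after the fourth.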

Figure A.6: The output $(\Gamma_i, \kappa_i)$ for $i = 1, \ldots, 20$ in Example A.2. Purple arrows correspond to a, gray to b, and pink to c; shaded nodes of the same colour belong to $\kappa_i$, and unshaded nodes belong to singleton classes. A dashed edge with a double arrowhead indicates the edge being defined in TC1, a dashed edge with a single arrowhead denotes an edge that is obtained from TC2 or TC3, and solid edges correspond to edges that existed at the previous step.

Figure A.7: The output $(\Gamma_i, \kappa_i)$ for $i = 21, \ldots, 24$ in Example A.2. Purple arrows correspond to a, gray to b, and pink to c; shaded nodes of the same colour belong to $\kappa_i$, and unshaded nodes belong to singleton classes. A dashed edge with a double arrowhead indicates the edge being defined in TC1, a dashed edge with a single arrowhead denotes an edge that is obtained from TC2 or TC3, and solid edges correspond to edges that existed at the previous step.