1 Introduction

In this article we describe the Todd–Coxeter algorithm for congruence enumeration in semigroups and monoids. The essential purpose of this algorithm is to compute the action of a finitely presented semigroup or monoid on the equivalence classes of a left, right, or two-sided congruence. Most existing implementations (see, for example, ACE [14], GAP [12], and MAF [38]) and expository accounts (see, for example, [29, p. 351], [31, Sections 4.5, 4.6 and Chapter 5], and [15, Chapter 5]) of the Todd–Coxeter algorithm relate to the enumeration of the cosets of a subgroup of a finitely presented group; or, more precisely, to the production of a permutation representation of the action of the group on the cosets. This was extended to linear representations by Linton [19, 20]. The purpose of this article is to provide an expository, but more or less complete, account of the Todd–Coxeter algorithm for semigroups and monoids.

The Todd–Coxeter algorithm is not a single procedure, but rather an infinite collection of different but related procedures. In the literature for finitely presented groups, examples of procedures in this collection are referred to as coset enumerations; see, for example, [31, Sections 4.5, 4.6 and Chapter 5]. Coset enumerations are also not algorithms, at least by some definitions, in that they might consist of infinitely many steps, and they do not always terminate. In fact, a coset enumeration terminates if and only if the subgroup whose cosets are being enumerated has finite index; for further details see Sect. 4. This is not to say, however, that the number of steps, or the run time, of a coset enumeration can be predicted in advance; it is relatively straightforward to find examples of finite presentations for the trivial group where the number of steps in a coset enumeration is arbitrarily high. For instance, the group presentation \(\langle a, b\ |\ ab ^ n, a ^ n b ^ {n + 1}, b ^ n ab ^ {-1}a ^ {-1}\rangle \) defines the trivial group, and it is likely that at least n steps are required in any coset enumeration for this presentation; see [17]. Although this might seem rather negative, nothing more can really be expected. For example, if a coset enumeration for the trivial subgroup of a finitely presented group G does successfully terminate, then the output can be used to solve the word problem in G. It is well-known by the theorem of Novikov [27] and Boone [3] that the word problem for finitely presented groups is undecidable and so no procedure that solves the word problem, including any coset enumeration, can be expected to terminate in every case. Analogous statements about the undecidability of the word problem hold for finitely presented semigroups and monoids; see [29, Chapter 12].

Congruences are to semigroups and monoids what cosets are to groups, and so we will refer to congruence enumeration for finitely presented semigroups and monoids as the analogue of coset enumeration for groups. We will not consider the special case of enumerating the cosets of a subgroup of a group separately from the more general case of enumerating classes of a congruence of a semigroup.

The first computer implementation of the Todd–Coxeter algorithm is attributed to Haselgrove in 1953 by Leech [18]. Several authors (for example, Neubüser in [25] and Walker in [36]) comment that this may be the first computer implementation of an algorithm for groups, possibly representing the starting point of the field of computational group theory. Neumann [26] adapted the algorithm in [35] to semigroups, and Jura [17] proved that Neumann’s adaptation was valid. One congruence enumeration strategy, as well as variants for computing Rees congruences and minimal ideals, is described in Ruškuc [30, Chapter 12]. Stephen [32, Chapter 4] also describes a variant of Todd–Coxeter, which can be used to solve the word problem by constructing only part of the action of a finitely presented semigroup or monoid on itself by right multiplication. Stephen’s procedure [32, Section 4.1] is similar to one of the two main strategies for congruence enumeration described in Sect. 6; see also Sect. 9.

Versions of the Todd–Coxeter algorithm for semigroups were implemented in FORTRAN by Robertson and Ünlü [28]; in C by Walker [36, 37]; and in GAP by Pfeiffer [12]. The C++ library libsemigroups [24] (by three of the present authors) contains a flexible optimized implementation of the different versions of the Todd–Coxeter algorithm described in this document for semigroups and monoids.

The paper is organised as follows: some mathematical prerequisites from the theory of monoids are given in Sect. 2; in Sect. 3 we define the infinite family of procedures for enumerating a right congruence on a monoid; the validity of congruence enumeration is shown in Sect. 4; in Sect. 5 we discuss how congruence enumeration can be used to compute congruences of a monoid that is not given a priori by a finite presentation; in Sects. 6 and 7 we describe the two main strategies for congruence enumeration; in Sect. 8 we discuss some issues related to the implementation of congruence enumeration; and, finally, in Sect. 9 we present some variants: for enumerating congruences of monoids with zero elements, Rees congruences, and Stephen’s procedure [32, Section 4.1].

In Appendix A we give a number of extended examples, and in Appendix B there is a comparison of the performance of the implementations in libsemigroups [24] of various strategies when applied to a number of group, monoid, or semigroup presentations. The authors have also implemented basic versions of the algorithms described in this paper in python3 as a concrete alternative to the description given here; see [23].

2 Prerequisites

In this section, we provide some required mathematical prerequisites. The contents of this section are well-known in the theory of semigroups; for further background information we refer the reader to [16].

If \(\mu \) is an equivalence relation on a set X, then we denote by \(x / \mu \) the equivalence class of x in \(\mu \) and by \(X / \mu \) the set \(\{\, x / \mu \, :\, x\in X \,\}\) of equivalence classes. The least equivalence relation on X with respect to containment is \(\Delta _X = \{\, (x, x)\in X \times X \, :\, x\in X \,\}\); we refer to this as the trivial equivalence on X.

If S is a semigroup and \(R\subseteq S\times S\), then there is a (two-sided) elementary sequence with respect to R between \(x, y\in S\) if \(x = y\) or \(x=s_1, s_2, \ldots , s_m=y\), for some \(m \ge 2\), where \(s_i=p_iu_iq_i\), \(s_{i+1}=p_iv_iq_i\), with \(p_i, q_i\in S ^ 1\) and \((u_i, v_i) \in R\) for all i. If \(p_i = \varepsilon \) for all i in an elementary sequence, then we refer to the sequence as a right elementary sequence; left elementary sequences are defined analogously. We define \(R ^ {\#} \subseteq S \times S\) so that \((x,y)\in R^{\#}\) if and only if there is an elementary sequence between x and y with respect to R. Note that \(R^{\#}\) is the least congruence on the semigroup S containing R; see [16, Section 1.5]. The least right congruence containing a subset of \(S \times S\) is defined analogously, using right elementary sequences rather than two-sided elementary sequences; we will not reserve any special notation for such least right congruences. Least left congruences are defined dually.

Throughout this article we write functions to the right of their arguments, and compose from left to right. Let S be a semigroup and let X be a set. A function \(\Psi :X\times S\longrightarrow X\) is a right action of S on X if \(((x, s)\Psi , t)\Psi =(x, st)\Psi \) for all \(x\in X\) and for all \(s, t \in S\). If in addition S has an identity element e, we require \((x, e)\Psi = x\) for all \(x\in X\) also.

If S is a semigroup, then we may adjoin an identity to S (if necessary) so that S is a monoid. We denote S with an adjoined identity \(1_S\) by \(S ^ 1\). If \(\Psi :X\times S\longrightarrow X\) is a right action of a semigroup S on a set X, then \(\Psi ^ 1:X\times S ^ 1\longrightarrow X\) defined by \( (x, s)\Psi ^ 1 = (x, s)\Psi \) for all \(x\in X\) and \(s\in S\), and \((x, 1_S)\Psi ^ 1 = x\) for all \(x\in X\), is a right action of \(S ^ 1\) on X also.

For the sake of brevity, we will write \(x\cdot s\) instead of \((x, s)\Psi \), and we will say that S acts on X on the right. The kernel of a function \(f: X\longrightarrow Y\), where X and Y are any sets, is the equivalence relation

$$\begin{aligned} \ker (f) = \{\, (x,y)\in X\times X \, :\, (x)f = (y)f \,\}. \end{aligned}$$

If X is any set, then \(X ^ X\) denotes the set of all functions from X to X. Endowed with the operation of function composition, \(X ^ X\) is a monoid, called the full transformation monoid on X. A right action \(\Psi \) of a semigroup S on a set X induces a homomorphism \(\Psi ': S \longrightarrow X ^ X\) defined by \((s)\Psi ': x\mapsto x \cdot s\) for all \(s\in S\) and all \(x\in X\).
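To illustrate the conventions of writing functions to the right of their arguments and composing from left to right, and the induced homomorphism \(\Psi '\), the following Python sketch (our own illustration, independent of the implementations in [23, 24]; the action shown is chosen arbitrarily) represents the action of each generator as a tuple.

```python
# A small sketch (ours, not code from [23] or [24]) of a right action of the
# free monoid over {a, b} on the set X = {0, 1, 2}, and of the induced
# homomorphism Psi' into the full transformation monoid X^X.  Each generator
# acts via a tuple t, with x . letter = t[x]; composing from left to right
# means that the first letter of a word acts first.

action = {"a": (1, 2, 2), "b": (0, 0, 1)}  # an arbitrary choice of generators

def act(x, w):
    """Return x . w for a word w over {'a', 'b'} acting on the right."""
    for letter in w:
        x = action[letter][x]
    return x

def transformation(w, n=3):
    """The image of w under the induced homomorphism into X^X, as a tuple."""
    return tuple(act(x, w) for x in range(n))

# The image of "ab" is the image of "a" followed by the image of "b":
assert transformation("ab") == tuple(transformation("b")[x] for x in transformation("a"))
```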

The kernel of a right action \(\Psi \) of a semigroup S on a set X is the kernel of the function \(\Psi ': S \longrightarrow X ^ X\),

$$\begin{aligned} \ker (\Psi )&= \{\, (s, t) \, :\, (s)\Psi ' = (t)\Psi ' \,\} \\&= \{\, (s, t)\in S\times S \, :\, (x, s)\Psi = (x, t)\Psi ,\ \text {for all }x\in X \,\}. \end{aligned}$$

It is straightforward to verify that the kernel of a homomorphism, and the kernel of a right action of a semigroup S, are congruences on S.

If S acts on the sets X and Y on the right, then we say that \(\lambda : X\longrightarrow Y\) is a homomorphism of right actions if \((x \cdot s) \lambda =(x)\lambda \cdot s\) for all \(x \in X\) and \(s\in S^1\) (see Fig. 1 for a diagram). An isomorphism of right actions is a bijective homomorphism of right actions.

Fig. 1. A commutative diagram illustrating a homomorphism of actions, where \(X\times S \longrightarrow Y\times S\) is the function defined by \((x, s) \mapsto ((x)\lambda , s)\)

If A is a set, then a word over A is a finite sequence \((a_1, \ldots , a_n)\) where \(a_1, \ldots , a_n\in A\) and \(n \ge 0\); say that the length of the word is n. We will write \(a_1 \cdots a_n\) rather than \((a_1, \ldots , a_n)\), and will refer to A as an alphabet and \(a\in A\) as a letter. If A is any set, then the free semigroup on A is the set \(A ^ +\) consisting of all words over A of length at least 1 with operation given by the juxtaposition of words. The free monoid on A is the set \(A ^ + \cup \{\varepsilon \}\) with the operation again the juxtaposition of words and where \(\varepsilon \) denotes the unique word of length 0, the so-called empty word; the free monoid on A is denoted \(A ^ *\).

The next proposition might be viewed as an analogue of the Third Isomorphism Theorem for semigroups ([16, Theorem 1.5.4]) in the context of actions. More precisely, Proposition 2.1 will allow us to replace the action of a free monoid on a quotient of a quotient by an action on a single quotient. We require the following definition. Let S be a semigroup, let \(\rho \) be a congruence of S, and let \(\sigma \) be a right congruence of S such that \(\rho \subseteq \sigma \). The set \(\sigma / \rho \) defined as

$$\begin{aligned} \sigma / \rho = \{\, (x/\rho , y/\rho )\in S/\rho \times S/\rho \, :\, (x,y) \in \sigma \,\} \end{aligned}$$

is a right congruence on \(S/\rho \).

Proposition 2.1

Let S be a semigroup, let \(\rho \) be a congruence of S, and let \(\sigma \) be a right congruence of S such that \(\rho \subseteq \sigma \). The right actions of S on \((S / \rho ) / (\sigma / \rho )\) and \(S / \sigma \) defined by

$$\begin{aligned} (u/ \rho )/ (\sigma / \rho ) \cdot v = (uv/ \rho )/ (\sigma / \rho ) \quad \text {and} \quad u/\sigma \cdot v = uv/\sigma \quad \text {for all}\quad u, v\in S \end{aligned}$$

are isomorphic.

The proof of Proposition 2.1 is similar to the proof of [16, Theorem 1.5.4]. An analogue of Proposition 2.1 can be formulated for left congruences, but we do not require this explicitly; see Proposition 3.1, and the surrounding text, for further details.

We also require the following result, the proof of which is routine, and hence omitted.

Proposition 2.2

Let S be a semigroup, let X be a set, and let \(\Psi :X \times S\longrightarrow X\) be a right action. If \(\mu \subseteq \ker (\Psi )\) is a congruence on S, then \({\overline{\Psi }}: X \times S/\mu \longrightarrow X\) defined by

$$\begin{aligned} (x, s/\mu ) {\overline{\Psi }} = (x, s)\Psi \end{aligned}$$

for all \(x \in X\) and \(s\in S\), is a right action that is isomorphic to \(\Psi \).

3 Congruence enumeration

In this section we define what we mean by a congruence enumeration for a semigroup or monoid, and establish some further notational conventions. Congruence enumeration, as described in this section, will provide the general context for the more explicit algorithms discussed in Sects. 6 and 7; this section is based, at least in spirit, on [31, Section 5.1] and is also influenced by [32, 33].

The purpose of a congruence enumeration is to determine, in some sense, the structure of a finitely presented monoid; or more generally, a congruence on such a monoid. For the sake of simplicity, we will assume throughout that \({\mathcal {P}}= \langle A\mid R \, \rangle \) is a monoid presentation defining a monoid M, where A is some totally ordered finite alphabet, and R is a (possibly empty) finite subset of \(A ^ * \times A ^ *\). Additionally, we suppose that S is a finite subset of \(A ^ * \times A ^ *\). We write \(R ^ {\#}\) to be the least two-sided congruence on \(A ^ *\) containing R, and denote by \(\rho \) the least right congruence on \(A ^ *\) containing \(R ^ {\#}\) and S. In this notation, the monoid M defined by the presentation \({\mathcal {P}}\) is \(A ^ * / R ^ {\#}\). If the number of congruence classes of \(\rho \) is finite, then the output of a congruence enumeration is a description of the natural right action of \(A ^ *\) on the congruence classes of \(\rho \). In particular, the output yields the number of such classes, can be used to determine whether or not two words in \(A ^ *\) belong to the same class, and provides a homomorphism from \(A ^ *\) to the full transformation monoid \({(A ^ * / \rho )} ^ {(A ^ * / \rho )}\) on the set of congruence classes of \(\rho \). Since it is undecidable whether a finitely presented monoid is finite or not [22] (see also [4, Remark 4]), an upper bound for the number of steps required for a congruence enumeration for \(\langle A\mid R \, \rangle \) to terminate cannot be computed as a function of |A| and |R|. If the number of congruence classes of \(\rho \) is infinite, then the enumeration will not terminate. If the set S is empty, then a congruence enumeration, if it terminates, gives us a description of the monoid \(A ^ * /R ^ {\#}\) together with the natural right action of \(A ^ *\) on \(A ^ * / R ^ {\#}\).
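To illustrate how such output can be used, the following Python sketch (ours; the transformations are invented purely for illustration) decides whether two words belong to the same class of \(\rho \), assuming that the action of each generator on the classes is given as a tuple and that 0 is the class of the empty word.

```python
# A sketch (ours) of using the output of a congruence enumeration: suppose
# that, for each generator, the action on the classes of rho is given as a
# tuple, and that 0 is the class of the empty word.  The tuples below are
# invented solely for illustration.

classes_action = {"a": (1, 1, 2), "b": (2, 0, 2)}

def class_of(w):
    """The class of rho containing the word w (a string over the generators)."""
    c = 0  # start at the class of the empty word
    for letter in w:
        c = classes_action[letter][c]
    return c

def same_class(u, v):
    """True if and only if u and v represent the same class of rho."""
    return class_of(u) == class_of(v)

print(same_class("ab", "ba"))  # compares the classes reached from class 0
```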

Perhaps it is more natural to want to determine a right congruence of the monoid M defined by the presentation \({\mathcal {P}}\) rather than on the free monoid \(A ^ *\). However, by Proposition 2.1, to determine the right congruence on \(M = A ^ * / R ^ {\#}\) generated by

$$\begin{aligned} \{\, (s/R ^ {\#}, t/R ^ {\#}) \, :\, (s, t)\in S \,\} \end{aligned}$$

it suffices to determine the right congruence \(\rho \) on \(A ^ *\) containing \(R ^ {\#}\) and the set S.

It is only for the sake of convenience that we have chosen to consider finitely presented monoids, rather than finitely presented semigroups. To apply the algorithms described in this article to a semigroup that is not a monoid, simply adjoin an identity, perform the algorithm, and disregard the adjoined identity in the output.

The choice of “right” rather than “left” in the previous paragraphs was arbitrary. If \(w = a_1a_2\cdots a_n\in A ^ *\), then we denote the reversed word \(a_n\cdots a_2a_1\) by \(w ^ {\dagger }\). If \(W \subseteq A ^ * \times A ^ *\), then we denote by \(W ^ \dagger \) the set \(\{\, (u^ {\dagger }, v ^ \dagger ) \, :\, (u, v) \in W \,\}\).

Proposition 3.1

Let A be an alphabet, let \(R, S\subseteq A ^ * \times A ^ *\), let \(R ^ {\#}\) be the least congruence on \(A ^ *\) containing R, and let \(\rho \) be the least left congruence on \(A ^ *\) containing \(R ^ {\#}\) and S. Then \(\rho ^ {\dagger }\) is the least right congruence on \(A ^ *\) containing \({R ^ {\#}} ^ {\dagger }\) and \(S ^ {\dagger }\).

It follows from Proposition 3.1 that any algorithm for computing right congruences can be used for left congruences, by simply reversing every word.
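In code, this reduction amounts to reversing every word; the following minimal Python sketch (ours) is included only to make the reduction concrete.

```python
# A minimal sketch (ours) of the reduction in Proposition 3.1: reverse every
# word in R and S, compute with the resulting right congruence, and reverse
# again when interpreting the answers.

def dagger(w):
    """The reversed word w^dagger, for a word given as a string."""
    return w[::-1]

def dagger_pairs(W):
    """Apply dagger to both entries of every pair in W."""
    return {(dagger(u), dagger(v)) for (u, v) in W}

print(dagger_pairs({("ab", "ba"), ("aab", "")}))  # {('ba', 'ab'), ('baa', '')}
```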

The inputs of a congruence enumeration are the finite alphabet A, the finite set of relations \(R \subseteq A ^ * \times A ^ *\), the finite set \( S \subseteq A ^ * \times A ^ *\), and a certain type of digraph that is defined in the next section.

3.1 Word graphs

One of the central components of the proofs presented in Sect. 4 is the notion of a word graph, which provides a natural representation of equivalence relations on a free monoid. Let A be any alphabet and let \(\Gamma = (N, E)\) be a digraph with a non-empty finite set of nodes \(N\subseteq {\mathbb {N}}\) with \(0\in N\) and edges \(E\subseteq N \times A \times N\). Then, following [32, 33], we refer to \(\Gamma \) as a word graph. The word graph \(\Gamma = (\{0\}, \varnothing )\) is the trivial word graph.

If \((\alpha , a, \beta )\in E\) is an edge in a word graph \(\Gamma \), then \(\alpha \) is the source, a is the label, and \(\beta \) is the target of \((\alpha , a, \beta )\). An edge \((\alpha , a, \beta )\) is said to be incident to its source \(\alpha \) and target \(\beta \).

A word graph \(\Gamma \) is deterministic if for every node \(\alpha \) and every letter \(a\in A\) there is at most one edge of the form \((\alpha , a, \beta )\) in \(\Gamma \). A word graph \(\Gamma \) is complete if for every node \(\alpha \) and every letter \(a\in A\) there is at least one edge incident to \(\alpha \) labelled by a in \(\Gamma \).

If \(\alpha , \beta \in N\), then an \((\alpha , \beta )\) -path is a sequence of edges \((\alpha _1, a_1, \alpha _{2}), \ldots , (\alpha _{n}, a_{n}, \alpha _{n + 1})\in E\) where \(\alpha _1 = \alpha \) and \(\alpha _{n + 1} = \beta \) and \(a_1, \ldots , a_{n}\in A\); \(\alpha \) is the source of the path; the word \(a_1\cdots a_{n}\in A ^ *\) labels the path; \(\beta \) is the target of the path; and the length of the path is n. If \(\alpha , \beta \in N\) and there is an \((\alpha , \beta )\)-path in \(\Gamma \), then we say that \(\beta \) is reachable from \(\alpha \). If \(\Gamma = (N, E)\) is a word graph and \({\mathfrak {P}}(A ^ * \times A ^ *)\) denotes the power set of \(A ^ * \times A ^ *\), then the path relation of \(\Gamma \) is the function \(\pi _{\Gamma }: N \longrightarrow {\mathfrak {P}}(A ^ * \times A ^ *)\) defined by \((u, v)\in (\alpha )\pi _{\Gamma }\) if there exists a node \(\beta \) such that u and v both label \((\alpha , \beta )\)-paths in \(\Gamma \). If \(\Gamma \) is a word graph and \(\alpha \) is a node in \(\Gamma \), then \((\alpha )\pi _{\Gamma }\) is reflexive and symmetric, and \((\alpha )\pi _{\Gamma }\) is transitive for all \(\alpha \) if and only if \(\Gamma \) is deterministic. In particular, \((\alpha )\pi _{\Gamma }\) is an equivalence relation for all \(\alpha \) if and only if \(\Gamma \) is deterministic. If \(R\subseteq A ^ *\times A ^ *\), \(\Gamma \) is a word graph, and \(\pi _{\Gamma }\) is the path relation of \(\Gamma \), then we say that \(\Gamma \) is compatible with R if \(R\subseteq (\alpha )\pi _{\Gamma }\) for every node \(\alpha \) in \(\Gamma \).
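The following Python sketch (ours, with illustrative names, and independent of the implementation in libsemigroups [24]) stores a word graph as a set of nodes and a set of edge triples, and implements the properties just defined.

```python
# A sketch (ours) of a word graph over an alphabet A: nodes is a set of
# integers containing 0 and edges is a set of triples (source, letter, target).

from itertools import product

def is_deterministic(nodes, edges, A):
    """At most one edge with a given source and label."""
    return all(
        sum(1 for (s, x, _) in edges if s == n and x == a) <= 1
        for n, a in product(nodes, A)
    )

def is_complete(nodes, edges, A):
    """At least one edge with a given source and label."""
    return all(
        any(s == n and x == a for (s, x, _) in edges) for n, a in product(nodes, A)
    )

def targets(edges, alpha, w):
    """All targets of paths with source alpha labelled by the word w."""
    current = {alpha}
    for a in w:
        current = {t for (s, x, t) in edges if s in current and x == a}
    return current

def is_compatible(nodes, edges, R):
    """True if R is contained in the path relation of every node."""
    return all(
        bool(targets(edges, alpha, u) & targets(edges, alpha, v))
        for alpha in nodes
        for (u, v) in R
    )

# The complete deterministic word graph below is compatible with
# {("aa", "a")} but not with {("ab", "ba")}:
A, nodes = "ab", {0, 1}
edges = {(0, "a", 1), (0, "b", 0), (1, "a", 1), (1, "b", 0)}
print(is_deterministic(nodes, edges, A), is_complete(nodes, edges, A))
print(is_compatible(nodes, edges, {("aa", "a")}), is_compatible(nodes, edges, {("ab", "ba")}))
```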

Suppose that \(\Gamma = (N, E)\) is a word graph with nodes N and edges \(E\subseteq N\times A \times N\), and \(\kappa \subseteq N \times N\) is an equivalence relation. The quotient \(\Gamma /\kappa \) of \(\Gamma \) by \(\kappa \) is the word graph with nodes \(\{\, \min \alpha /\kappa \, :\, \alpha \in N \,\}\), and edges \((\min \alpha /\kappa , a, \min \beta /\kappa )\) whenever \((\alpha , a, \beta ) \in E\). It is routine to verify that if \(\Gamma = (N, E)\) is a word graph, \(\kappa \) is any equivalence relation on N, and \(\kappa '\) is any equivalence relation on the set of nodes of \(\Gamma / \kappa \), then \((\Gamma /\kappa ) / \kappa ' = \Gamma / \kappa ''\) where \(\kappa ''\) is the least equivalence on N containing \(\kappa \) and every \((\alpha , \beta )\in N \times N\) where \((\min \alpha /\kappa , \min \beta /\kappa ) \in \kappa '\). If \(\Gamma _1= (N_1, E_1)\) and \(\Gamma _2= (N_2, E_2)\) are word graphs over the same alphabet A, then \(\phi :N_1\longrightarrow N_2\) is a homomorphism if \((\alpha , a, \beta )\in E_1\) implies \(((\alpha )\phi , a, (\beta )\phi )\in E_2\); and we write \(\phi : \Gamma _1\longrightarrow \Gamma _2\). An isomorphism of word graphs \(\Gamma _1\) and \(\Gamma _2\) is a bijection \(\phi : \Gamma _1\longrightarrow \Gamma _2\) such that both \(\phi \) and \(\phi ^ {-1}\) are homomorphisms. If \(\kappa \) is any equivalence relation on a word graph \(\Gamma = (N, E)\), then the function \(\phi : \Gamma \longrightarrow \Gamma / \kappa \) defined by \((\alpha )\phi = \alpha / \kappa \) is a homomorphism. Conversely, if \(\phi : \Gamma _1 \longrightarrow \Gamma _2\) is a homomorphism, then \((\Gamma _1)\phi := \{\, (\alpha )\phi \in N_2 \, :\, \alpha \in N_1 \,\}\) is isomorphic to the quotient \(\Gamma _1 / \ker (\phi )\). If \((\alpha _1, a_1, \alpha _{2}), \ldots , (\alpha _{n}, a_{n}, \alpha _{n + 1})\in E_1\) is a path in \(\Gamma _1\) and \(\phi : \Gamma _1\longrightarrow \Gamma _2\) is a homomorphism, then, by definition, \(((\alpha _1)\phi , a_1, (\alpha _{2})\phi ), \ldots , ((\alpha _{n})\phi , a_{n}, (\alpha _{n + 1})\phi )\in E_2\) is a path in \(\Gamma _2\) with the same label \(a_1 \cdots a_n \in A ^ *\). In this way, we say that homomorphisms of word graphs preserve paths and labels of paths. This leads to the following straightforward lemma.

Lemma 3.2

Let \(\Gamma _1= (N_1, E_1)\) and \(\Gamma _2= (N_2, E_2)\) be word graphs, and let \(\pi _{\Gamma _1}: N_1 \longrightarrow {\mathfrak {P}}(A ^ * \times A ^ *)\) and \(\pi _{\Gamma _2}:N_2\longrightarrow {\mathfrak {P}}(A ^ * \times A ^ *)\) be the path relations of \(\Gamma _1\) and \(\Gamma _2\), respectively. If \(\phi : \Gamma _1\longrightarrow \Gamma _2\) is a homomorphism and \(\alpha \in N_1\) is arbitrary, then \((\alpha )\pi _{\Gamma _1}\subseteq ((\alpha )\phi ) \pi _{\Gamma _2}\).
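The quotient construction defined before Lemma 3.2 can be sketched as follows (our code; here the equivalence \(\kappa \) is represented by the map sending each node to the least node of its \(\kappa \)-class, which matches the representatives used above).

```python
# A sketch (ours) of the quotient of a word graph by an equivalence kappa,
# where kappa is given as a map sending each node to the minimum node of its
# kappa-class.

def quotient(nodes, edges, rep):
    """Return the nodes and edges of the quotient word graph."""
    q_nodes = {rep[n] for n in nodes}
    q_edges = {(rep[s], a, rep[t]) for (s, a, t) in edges}
    return q_nodes, q_edges

# Identifying the nodes 1 and 2 of the word graph below:
nodes = {0, 1, 2}
edges = {(0, "a", 1), (0, "b", 2), (1, "a", 2)}
rep = {0: 0, 1: 1, 2: 1}
print(quotient(nodes, edges, rep))
# the quotient has nodes {0, 1} and edges (0, a, 1), (0, b, 1), and (1, a, 1)
```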

Before giving the definition of a congruence enumeration, we highlight that many accounts of the Todd–Coxeter algorithm (see, for instance, [15, 25, 31]) are not formulated in terms of digraphs, but rather as a table whose rows are labelled by a set C of non-negative integers, and columns are labelled by the generating set A. If \(\Gamma = (N, E)\) is a deterministic word graph and \(f: C \longrightarrow N\) is a bijection such that \((0)f = 0\), then the value in the row labelled c and column labelled a is \(f ^ {-1}\) of the target of the unique edge in \(\Gamma \) with source (c)f and label a. According to Neubüser [25], until the 1950s, congruence enumeration was often performed by hand, and, in this context, using tables is more straightforward than using graphs. On the other hand, the language of word graphs provides a more accessible means of discussing congruence enumeration in theory.
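For a deterministic word graph the correspondence with the table form can be sketched as follows (our code; for simplicity we take C to be N and f to be the identity).

```python
# A sketch (ours) of the table form of a deterministic word graph: the row of
# a node records, for each letter, the target of the unique edge with that
# source and label, or None if there is no such edge.

def as_table(nodes, edges, A):
    lookup = {(s, a): t for (s, a, t) in edges}
    return {n: [lookup.get((n, a)) for a in A] for n in sorted(nodes)}

A, nodes = "ab", {0, 1}
edges = {(0, "a", 1), (0, "b", 0), (1, "a", 1)}
print(as_table(nodes, edges, A))  # {0: [1, 0], 1: [1, None]}
```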

3.2 The definition

Recall that we suppose throughout that \(\langle A\mid R\rangle \) is a finite monoid presentation, where \(R\subseteq A ^ * \times A ^ *\). Additionally, we will suppose throughout that A is a totally ordered alphabet. A congruence enumeration is a sequence of the following steps TC1, TC2, and TC3, where the input to the i-th step (where \(i\in {\mathbb {N}}\)) is \((\Gamma _{i}, \kappa _{i})\) for some word graph \(\Gamma _{i}\) with a totally ordered set of nodes \(N_i\) and set of edges \(E_i\), and some equivalence relation \(\kappa _{i}\subseteq N_i \times N_i\).

TC1:

(define a new node). If \(\alpha \) is a node in \(\Gamma _{i}\) and there is no edge in \(\Gamma _{i}\) labelled by \(a\in A\) with source \(\alpha \), then we define \(\Gamma _{i + 1}\) to be the word graph obtained from \(\Gamma _{i}\) by adding the new node \(\beta := 1 + \max \bigcup _{j \le i} N_j\) and the edge \((\alpha , a, \beta )\). We define \(\kappa _{i + 1}:= \kappa _{i} \cup \{(\beta , \beta )\}\).

TC2:

(follow the paths defined by a relation). Suppose that \(\alpha \in N_i\) and \((u,v)\in R\) where \(u = u_1a\) and \(v= v_1b\) for some \(u_1, v_1\in A ^ {*}\) and \(a, b \in A\).

(a):

If u and \(v_1\) label paths from \(\alpha \) to some nodes \(\beta , \gamma \in N_i\) in \(\Gamma _{i}\), respectively, but \(\gamma \) is not the source of any edge labelled by b, then we set \(\Gamma _{i + 1}\) to be the word graph obtained from \(\Gamma _{i}\) by adding the edge \((\gamma , b, \beta )\) and we define \(\kappa _{i + 1}:= \kappa _{i}\) (Fig. 2).

(b):

The dual of (a) where there are paths labelled by \(u_1\) and v to nodes \(\beta \) and \(\gamma \), respectively, but \(\beta \) is not the source of any edge labelled by a.

(c):

If u and v label paths from \(\alpha \) to some nodes \(\beta \) and \(\gamma \), respectively, and \(\beta \ne \gamma \), then we define \(\Gamma _{i + 1}:= \Gamma _{i}\) and \(\kappa _{i + 1}\) to be the least equivalence containing \(\kappa _{i}\) and \((\beta , \gamma )\).

Note that conditions (a), (b), and (c) are mutually exclusive, and it may be the case that none of them hold.

TC3:

(process coincidences or a determination). We define \(\Gamma _{i + 1}\) to be the quotient of \(\Gamma _{i}\) by \(\kappa _{i}\) and define \(\kappa _{i + 1}\) to be the least equivalence on \(N_{i + 1}\) containing every \((\beta , \gamma )\), \(\beta \ne \gamma \), for which there exist \(\alpha \in N_{i + 1}\) and \(a\in A\) such that \((\alpha , a, \beta ), (\alpha , a, \gamma )\in E_{i + 1}\). Recall that the quotient \(\Gamma _i/\kappa _i\) of \(\Gamma _i\) by \(\kappa _i\) is the word graph with nodes \(\{\, \min \alpha /\kappa _i \, :\, \alpha \in N_i \,\}\) and hence after an application of TC3 each node in \(\Gamma _{i + 1}\) is set to be equal to the minimum of the set of nodes in its equivalence class in \(\kappa _i\).

There are only finitely many possible quotients of any word graph. Hence if TC3 is applied repeatedly, then after finitely many iterations the output \(\kappa _{i + 1}\) will equal \(\Delta _{N_i}\), and \(\Gamma _{i + 1}\) and \(\Gamma _{i}\) will be equal.
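The steps TC1, TC2, and TC3 above can be sketched in Python as follows (our code, simplified for exposition, and independent of the implementation in libsemigroups [24]). A state is a triple (nodes, edges, kappa), where kappa is stored as the map sending each node to the least node of its \(\kappa \)-class, and, for simplicity, tc2 is written under the assumption that the word graph is deterministic at the moment it is applied.

```python
# A sketch (ours) of TC1, TC2, and TC3.  A state is (nodes, edges, kappa):
# nodes is a set of integers containing 0, edges is a set of triples
# (source, letter, target), and kappa maps each node to the least node of its
# kappa-class (so kappa is trivial exactly when it is the identity map).

def merge(kappa, beta, gamma):
    """Representative map of the least equivalence containing kappa and (beta, gamma)."""
    b, g = kappa[beta], kappa[gamma]
    lo, hi = min(b, g), max(b, g)
    return {n: lo if r == hi else r for n, r in kappa.items()}

def targets(edges, alpha, w):
    """The set of targets of paths with source alpha labelled by w."""
    current = {alpha}
    for a in w:
        current = {t for (s, x, t) in edges if s in current and x == a}
    return current

def tc1(nodes, edges, kappa, alpha, a):
    """TC1: define a new node if alpha has no edge labelled a."""
    if any(s == alpha and x == a for (s, x, _) in edges):
        return nodes, edges, kappa
    beta = max(nodes) + 1  # a careful implementation uses 1 + the max over all nodes ever defined
    return nodes | {beta}, edges | {(alpha, a, beta)}, {**kappa, beta: beta}

def tc2(nodes, edges, kappa, alpha, u, v):
    """TC2: follow the relation (u, v) from alpha; cases (a), (b), and (c).
    We assume the word graph is deterministic, so each set below has at most
    one element."""
    beta, gamma = targets(edges, alpha, u), targets(edges, alpha, v)
    beta1, gamma1 = targets(edges, alpha, u[:-1]), targets(edges, alpha, v[:-1])
    if beta and gamma1 and not gamma:       # (a): add the missing last edge of v
        edges = edges | {(min(gamma1), v[-1], min(beta))}
    elif gamma and beta1 and not beta:      # (b): the dual of (a)
        edges = edges | {(min(beta1), u[-1], min(gamma))}
    elif beta and gamma and beta != gamma:  # (c): record a coincidence
        kappa = merge(kappa, min(beta), min(gamma))
    return nodes, edges, kappa

def tc3(nodes, edges, kappa):
    """TC3: quotient by kappa, then record the coincidences forced by pairs of
    edges with the same source and label."""
    nodes = {kappa[n] for n in nodes}
    edges = {(kappa[s], a, kappa[t]) for (s, a, t) in edges}
    kappa = {n: n for n in nodes}
    for (s1, a1, t1) in edges:
        for (s2, a2, t2) in edges:
            if s1 == s2 and a1 == a2 and kappa[t1] != kappa[t2]:
                kappa = merge(kappa, t1, t2)
    return nodes, edges, kappa
```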

Fig. 2. A diagram representing TC2(a); solid lines correspond to paths in \(\Gamma _{i}\) and the dashed edge is the one defined in TC2(a)

If w labels an \((\alpha , \beta )\)-path P in some \(\Gamma _{i}\), then neither TC1 nor TC2 changes any of the edges belonging to P. Hence if \(\Gamma _{i + 1}\) is obtained from \(\Gamma _{i}\) by applying TC1 or TC2, then w labels an \((\alpha ,\beta )\)-path in \(\Gamma _{i + 1}\) also. If \(\Gamma _{i + 1}\) is obtained from \(\Gamma _{i}\) by applying TC3, then \(\Gamma _{i + 1} \) is a homomorphic image of \(\Gamma _{i}\). As already noted, homomorphisms preserve paths, and so w labels a path in \(\Gamma _{i + 1}\) also.

We can now formally define a congruence enumeration.

Definition 3.3

(Congruence enumeration.) Suppose that A is a finite alphabet, that \(R, S \subseteq A ^ * \times A ^ *\) are finite, that \(\rho \) is the least right congruence on \(A ^ *\) containing both \(R ^ {\#}\) and S, and that \(\Gamma _{1} = (N_1, E_1)\) is a word graph with path relation \(\pi _{\Gamma _1}: N_1 \longrightarrow {\mathfrak {P}}(A ^ * \times A ^ *)\) such that \((0)\pi _{\Gamma _1}\subseteq \rho \) and \(\kappa _{1} = \Delta _{N_1}\). Then a congruence enumeration for \(\rho \) with input \((\Gamma _{1}, \kappa _{1})\) consists of:

  (a)

    For every \((u, v)\in S\), by repeatedly applying TC1 (if necessary), add edges to \(\Gamma _{1}\) so that it contains paths labelled by u and v both with source 0.

  (b)

    Apply TC2 to 0 and every \((u, v) \in S\).

If \((\Gamma _{m}, \kappa _{m})\) is the output of steps (a) and (b), then the enumeration is concluded by performing any sequence of applications of TC1, TC2, and TC3 such that the following conditions hold for \(\Gamma _{i} = (N_i, E_i)\) for every \(i\in {\mathbb {N}}\), \(i\ge m\):

  (c)

    If \(\alpha \in N_i\) and there is no edge incident to \(\alpha \) with label \(a\in A\), then there exists \(j\ge i\) such that either: \(\alpha \) is no longer a node in \(\Gamma _j\); or there is an edge incident to \(\alpha \) with label a in \(\Gamma _j\).

  (d)

    If \(\alpha \in N_i\) and \((u, v)\in R\), then there exists \(j\ge i\) such that either: \(\alpha \) is no longer a node in \(\Gamma _j\) or \((u, v)\in (\alpha )\pi _{\Gamma _j}\).

  (e)

    If \(\kappa _{i}\ne \Delta _{N_i}\) for some i, then there exists \(j\ge i\) such that \(\kappa _j = \Delta _{N_j}\).

The initial value of the word graph \(\Gamma _{1}\) that forms the input to a congruence enumeration is usually either the trivial word graph or, if M is finite, the right Cayley graph of the monoid M defined by the presentation \(\langle A |R \rangle \); see Sect. 5 for more details.

A congruence enumeration terminates if the output \((\Gamma _{i}, \kappa _{i})\) has the property that applying any of TC1, TC2, or TC3 to \((\Gamma _{i}, \kappa _{i})\) results in no changes to the output, i.e. \((\Gamma _{i + 1}, \kappa _{i + 1}) = (\Gamma _{i}, \kappa _{i})\). It is straightforward to verify that a congruence enumeration terminates at step i if and only if \(\Gamma _{i}\) is complete, compatible with \(R \cup S\), and deterministic.
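In code, and assuming that the functions is_complete, is_deterministic, and is_compatible from the sketch in Sect. 3.1 are in scope (they are not repeated here), this termination criterion reads as follows (our code).

```python
# A sketch (ours) of the termination criterion above: complete, deterministic,
# and compatible with R union S.  The helper functions are the ones from the
# sketch in Sect. 3.1.

def has_terminated(nodes, edges, A, R, S):
    """True when the word graph is complete, deterministic, and compatible."""
    return (
        is_complete(nodes, edges, A)
        and is_deterministic(nodes, edges, A)
        and is_compatible(nodes, edges, set(R) | set(S))
    )
```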

For any given finite monoid presentation, there is a wide range of choices for the order in which steps TC1, TC2, and TC3 are performed, and to which nodes, generators, and relations they are applied. We will examine two specific strategies for enumerating congruences for an arbitrary finite monoid presentation in more detail in Sects. 6 and 7.

4 Validity

In this section we address the validity of congruence enumeration as defined in Sect. 3; we will continue to use the notation established therein.

The main results in this section are the following.

Theorem 4.1

Let A be a finite alphabet, let \(R\subseteq A ^ * \times A ^ *\) be a finite set, and let \(R ^ {\#}\) be the least two-sided congruence on \(A ^ *\) containing R. If \(S\subseteq A ^ * \times A ^ *\) is any finite set, and \(\rho \) is the least right congruence on \(A ^ *\) containing \(R ^ {\#}\) and S, then the following hold:

  (a)

    If a congruence enumeration for \(\rho \) terminates with output word graph \(\Gamma =(N, E)\), then \(A ^ * / \rho \) is finite and the function \(\phi : N \times A ^ * \longrightarrow N\), defined by \((\alpha , w)\phi = \beta \) whenever w labels an \((\alpha , \beta )\)-path in \(\Gamma \), is a right action that is isomorphic to the natural action of \(A ^ *\) on \(A ^ * / \rho \) by right multiplication.

  (b)

    If \(A ^ * / \rho \) is finite, then any congruence enumeration for \(\rho \) terminates.

Corollary 4.2

Let A be a finite alphabet, let \(R\subseteq A ^ * \times A ^ *\) be a finite set, and let \(R ^ {\#}\) be the least two-sided congruence on \(A ^ *\) containing R. Then the following hold:

  (a)

    If a congruence enumeration for \(R ^ {\#}\) terminates with output word graph \(\Gamma =(N, E)\), then \(A ^ * / R ^ {\#}\) is finite and the function \(\phi : N \times A ^ * \longrightarrow N\) defined by \((\alpha , w)\phi = \beta \) whenever w labels an \((\alpha , \beta )\)-path in \(\Gamma \) is a right action that is isomorphic to the (faithful) natural action of \(A ^ * / R ^ {\#}\) on itself by right multiplication.

  (b)

    If \(A ^ * / R ^ {\#}\) is finite, then any congruence enumeration for \(R ^ {\#}\) terminates.

We will prove Theorem 4.1 and Corollary 4.2 in Sect. 4.2. We start by showing that the steps TC1, TC2, and TC3 preserve certain properties of word graphs in Proposition 4.4. In Sect. 4.1, we show that every congruence enumeration eventually stabilises and is eventually compatible with R; and in Sect. 4.2 we give the proofs of Theorem 4.1 and Corollary 4.2.

We will make repeated use, without reference, of the following straightforward lemma.

Lemma 4.3

If any of TC1, TC2, or TC3 is applied to \((\Gamma _{i}, \kappa _{i})\) where every node in \(\Gamma _i\) is reachable from 0, then every node in the output \(\Gamma _{i + 1}\) is reachable from 0 also.

The next proposition also plays a crucial role in the proof of Theorem 4.1 and Corollary 4.2.

Proposition 4.4

If TC1, TC2, or TC3 is applied to \((\Gamma _{i}, \kappa _{i})\) where \(\Gamma _{i} = (N_i, E_i)\) is a word graph such that \((0)\pi _{\Gamma _i}\subseteq \rho \), and \((0)\pi _{\Gamma _i / \kappa _i}\subseteq \rho \), then the output \((\Gamma _{i + 1}, \kappa _{i + 1})\) satisfies \((0)\pi _{\Gamma _i} \subseteq (0)\pi _{\Gamma _{i + 1}} \subseteq \rho \), and \((0)\pi _{\Gamma _{i}/ \kappa _{i}} \subseteq (0)\pi _{\Gamma _{i + 1}/ \kappa _{i + 1}}\subseteq \rho \).

The proof of Proposition 4.4 is split into two parts due to commonalities in the proofs rather than the statements. The two cases are: TC1, TC2(a), TC2(b); and TC2(c), TC3.

Lemma 4.5

Proposition 4.4 holds when TC1, TC2(a), or TC2(b) is applied.

Proof

Since (a) and (b) of TC2 are symmetric, it suffices to prove the proposition when TC1 or TC2(a) is applied. In TC1, \(\alpha \) is a node in \(\Gamma _{i}\) and there is no edge incident to \(\alpha \) labelled by a. In this case, \(\Gamma _{i + 1}\) is obtained from \(\Gamma _{i}\) by adding the single node \(\beta := 1 + \max \bigcup _{j \le i} N_j\) and the single edge \((\alpha , a, \beta )\), and \(\kappa _{i + 1}:= \kappa _i \cup \{(\beta , \beta )\}\). In TC2(a), there exists \((u, v)\in R\), \(b\in A\), and \(v_1\in A ^ *\) such that \(v = v_1b\), and there exist nodes \(\alpha \), \(\beta \), and \(\gamma \) in \(\Gamma _i\) such that u and \(v_1\) label \((\alpha , \beta )\)- and \((\alpha , \gamma )\)-paths in \(\Gamma _i\), respectively. In this case, \(\Gamma _{i + 1}\) is obtained from \(\Gamma _{i}\) by adding the single edge \((\gamma , b, \beta )\) and \(\kappa _{i + 1}:= \kappa _i\).

Hence if either TC1 or TC2(a) is applied, then the identity map \(N_i \longrightarrow N_{i + 1}\) is a homomorphism from \(\Gamma _i\) to \(\Gamma _{i + 1}\) and so the identity map \(N_i/\kappa _i\longrightarrow N_{i + 1} / \kappa _{i + 1}\) is also a homomorphism from \(\Gamma _i/\kappa _i\) to \(\Gamma _{i+ 1}/\kappa _{i + 1}\). Hence, by Lemma 3.2, \((0)\pi _{\Gamma _i}\subseteq (0)\pi _{\Gamma _{i + 1}}\) and \((0)\pi _{\Gamma _i/\kappa _i}\subseteq (0)\pi _{\Gamma _{i + 1}/\kappa _{i + 1}}\).

To show that \((0)\pi _{\Gamma _{i + 1}}\subseteq \rho \) we consider the cases when TC1 and TC2(a) are applied separately.

TC1: Suppose that \((u, v) \in (0)\pi _{\Gamma _{i + 1}}{\setminus } (0)\pi _{\Gamma _i}\) is arbitrary. Then there are paths in \(\Gamma _{i + 1}\) from 0 to some node \(\gamma \) labelled by u and v but there are no such paths in \(\Gamma _i\). Since \(\beta \) is not a node in \(\Gamma _i\), \(\beta \) is not the target of any path in \(\Gamma _i\). It follows that the edge \((\alpha , a, \beta )\) must occur at least once in both paths. But \(\beta \) is the source of no edges in \(\Gamma _{i + 1}\), and so \((\alpha , a, \beta )\) occurs once, and it must be the last edge, in both paths. Hence \(u = u_1a\) and \(v= v_1a\) where \(u_1, v_1\in A ^ *\) label \((0, \alpha )\)-paths in \(\Gamma _{i}\). Therefore \((u_1, v_1) \in \rho \) by assumption and since \(\rho \) is a right congruence, \((u, v) = (u_1a, v_1a) \in \rho \) as well. Hence \((0)\pi _{\Gamma _{i + 1}} \subseteq \rho \).

TC2(a): We proceed by induction on the total number k of occurrences of the edge \((\gamma , b, \beta )\) (defined at the start of the proof) in any pair of paths in \(\Gamma _{i + 1}\) with source 0 and the same target node. The inductive hypothesis is: if \((x, y)\in (0)\pi _{\Gamma _{i + 1}}\), X and Y are paths with source 0 labelled by x and y, respectively, and the total number of occurrences of the edge \((\gamma , b, \beta )\) in X and Y is strictly less than k, then \((x, y) \in \rho \). The base case is when \(k = 0\) and, in this case, \((x, y) \in (0)\pi _{\Gamma _{i}}\subseteq \rho \), as required.

Suppose that \(k \ge 1\), that \(x, y\in A ^ *\) are any words labelling \((0, \delta )\)-paths X and Y, respectively, for some node \(\delta \), and that \((\gamma , b, \beta )\) occurs k times in X and Y. Without loss of generality we can assume that \((\gamma , b, \beta )\) occurs at least once in X. If X consists of the edges \(e_1, \ldots , e_r\in E_{i + 1}\), and \(j\in \{1, \ldots , r\}\) is the largest value such that \(e_j = (\gamma , b, \beta )\), then we may write \(x = x_1bx_2\) where \(x_1\) labels the \((0,\gamma )\)-path \((e_1, \ldots , e_{j - 1})\) and \(x_2\) labels the \((\beta , \delta )\)-path \((e_{j + 1}, \ldots , e_r)\). If \(w'\) labels any path from 0 to \(\alpha \) in \(\Gamma _i\), then \(w'v_1\) labels a \((0,\gamma )\)-path in \(\Gamma _{i}\). It follows that \((x_1, w'v_1) \in (0)\pi _{\Gamma _{i + 1}}\). The number of occurrences of \((\gamma , b, \beta )\) in \((e_1, \ldots , e_{j -1})\) is strictly less than k, and the number in the path (in \(\Gamma _{i}\)) labelled by \(w'v_1\) is 0. Hence, by the inductive hypothesis, \((x_1, w'v_1) \in \rho \) and so \((x, w'vx_2) = (x_1bx_2, w'v_1bx_2) \in \rho \).

The word \(w'\) labels a \((0, \alpha )\)-path, u labels a \((\alpha , \beta )\)-path, and \(x_2\) labels a \((\beta , \delta )\)-path. The first two of these paths belong to \(\Gamma _i\) by assumption, and the third does also, by the maximality of j. Hence \(w'ux_2\) labels a \((0, \delta )\)-path in \(\Gamma _i\), which therefore contains no occurrences of the edge \((\gamma , b, \beta )\). The word y also labels the \((0, \delta )\)-path Y in \(\Gamma _{i + 1}\) and this path contains at most \(k - 1\) occurrences of \((\gamma , b, \beta )\), since X and Y together contained k occurrences, and X contained at least 1 occurrence of \((\gamma , b, \beta )\). Therefore, by induction, \((w'ux_2, y) \in \rho \). Finally, since \((u, v)\in R\subseteq \rho \), \((w'ux_2, w'vx_2)\in R ^ {\#} \subseteq \rho \) also. Hence \((x, w'vx_2), (w'vx_2, w'ux_2), (w'ux_2, y) \in \rho \) and so by transitivity, \((x, y) \in \rho \).

We conclude the proof by showing that \((0)\pi _{\Gamma _{i + 1}/ \kappa _{i + 1}} \subseteq \rho \) when either TC1 or TC2(a) is applied. Suppose that \((u, v) \in (0)\pi _{\Gamma _{i + 1}/ \kappa _{i + 1}}\). If \((u, v) \in (0)\pi _{\Gamma _{i + 1}}\), then \((u, v)\in \rho \), as required. If \((u, v)\not \in (0)\pi _{\Gamma _{i + 1}}\), then u and v label \((0, \zeta )\)- and \((0, \eta )\)-paths in \(\Gamma _{i + 1}\), respectively, for some nodes \(\zeta \ne \eta \). Recall that \( \kappa _{i + 1} = \kappa _{i}\cup \{(\beta , \beta )\}\) if TC1 is applied, and \(\kappa _{i + 1} = \kappa _i\) if TC2(a) is applied. In either case, \((\zeta , \eta )\in \kappa _{i + 1}\) and \(\zeta \ne \eta \) implies that \((\zeta , \eta ) \in \kappa _i\). But \(\zeta \) and \(\eta \) are nodes in \(\Gamma _i\) and hence they are reachable from 0 in \(\Gamma _i\) by Lemma 4.3. In other words, there exist \(u', v'\in A ^ * \) labelling \((0, \zeta )\)- and \((0, \eta )\)-paths in \(\Gamma _i\), respectively. Hence \((u, u'), (v', v)\in (0)\pi _{\Gamma _{i + 1}} \subseteq \rho \) and, by construction, \((u', v')\in (0)\pi _{\Gamma _i / \kappa _i}\subseteq \rho \). Thus, by transitivity, \((u, v)\in \rho \), as required. \(\square \)

Lemma 4.6

Proposition 4.4 holds when TC2(c) or TC3 is applied.

Proof

If TC2(c) is applied, then there exist a node \(\alpha \) in \(\Gamma _i\) and \((u, v) \in R\) such that u and v label \((\alpha , \beta )\)- and \((\alpha , \gamma )\)-paths in \(\Gamma _i\) for some distinct nodes \(\beta \) and \(\gamma \) in \(\Gamma _i\). In this case, \(\Gamma _{i + 1} = \Gamma _i\) and \(\kappa _{i + 1}\) is the least equivalence relation containing \(\kappa _i\) and \((\beta , \gamma )\). If TC3 is applied, then \(\Gamma _{i + 1} = \Gamma _i / \kappa _i\) and \(\kappa _{i + 1}\) is the least equivalence on \(N_{i + 1}\) containing every \((\delta , \zeta )\), \(\delta \ne \zeta \), for which there exist \(\eta \in N_{i + 1}\) and \(a\in A\) such that \((\eta , a, \delta ), (\eta , a, \zeta )\in E_{i + 1}\).

In either case, there is a homomorphism from \(\Gamma _{i + 1}\) to \(\Gamma _{i + 1} / \kappa _{i + 1}\) and so Lemma 3.2 implies that \((0)\pi _{\Gamma _{i + 1}}\subseteq (0)\pi _{\Gamma _{i + 1}/ \kappa _{i + 1}}\). If TC2(c) is applied, then the identity map is a homomorphism from \(\Gamma _i\) to \(\Gamma _{i + 1}\). If TC3 is applied, then \(\Gamma _{i + 1} = \Gamma _{i}/ \kappa _i\) is a homomorphic image of \(\Gamma _i\) also. Hence, in either case, \((0)\pi _{\Gamma _i}\subseteq (0)\pi _{\Gamma _{i + 1}}\) by Lemma 3.2.

Hence, in both cases, it suffices to show that \((0)\pi _{\Gamma _{i}/ \kappa _{i}} \subseteq (0)\pi _{\Gamma _{i + 1}/ \kappa _{i + 1}}\subseteq \rho \). We consider the cases when TC2(c) and TC3 are applied separately.

TC2(c): If \(\beta \) and \(\gamma \) are the nodes defined above, and \(\sigma \) is the least equivalence relation on the set of nodes in \(\Gamma _i/\kappa _i\) containing \((\beta /\kappa _i, \gamma / \kappa _i)\), then \((\Gamma _i/\kappa _i)/\sigma = \Gamma _{i} / \kappa _{i + 1} = \Gamma _{i + 1}/ \kappa _{i + 1}\). Hence \(\Gamma _{i + 1}/ \kappa _{i + 1}\) is a homomorphic image of \(\Gamma _i/\kappa _i\) and so \((0)\pi _{\Gamma _i/ \kappa _i} \subseteq (0)\pi _{\Gamma _{i + 1}/ \kappa _{i + 1}}\).

Suppose that \((x, y) \in (0)\pi _{\Gamma _{i + 1}/ \kappa _{i + 1}}\). Then x and y both label \((0, \delta )\)-paths in \(\Gamma _{i + 1}/ \kappa _{i + 1}\) for some node \(\delta \) in \(\Gamma _{i + 1}/ \kappa _{i + 1}\). It follows that there are nodes \(\zeta \) and \(\eta \) in \(\Gamma _{i + 1}\) such that \((\delta , \zeta ), (\delta , \eta )\in \kappa _{i + 1}\) and x and y label \((0, \zeta )\)- and \((0, \eta )\)-paths in \(\Gamma _{i + 1} = \Gamma _i\), respectively.

If \((\zeta , \eta ) \in \kappa _i\), then \((x, y)\in (0)\pi _{\Gamma _i / \kappa _i}\) and so \((x, y) \in \rho \) by assumption. If \((\zeta , \eta )\not \in \kappa _i\), then without loss of generality we may assume that \((\zeta , \eta ) = (\beta , \gamma )\). In this case, if \(w\in A ^ *\) labels a \((0, \alpha )\)-path in \(\Gamma _i\), then wu and x both label \((0, \beta ) = (0, \zeta )\)-paths in \(\Gamma _i\), and so \((wu, x)\in (0)\pi _{\Gamma _i}\subseteq \rho \). Similarly, wv and y both label \((0, \gamma )\)-paths in \(\Gamma _i\), and so \((wv, y) \in \rho \), also. Finally, \((u, v) \in R\) and so \((wu, wv)\in R ^ {\#} \subseteq \rho \), and so \((x, y) \in \rho \) also.

TC3: Clearly, \(\Gamma _{i + 1}/ \kappa _{i + 1}\) is a homomorphic image of \(\Gamma _{i + 1}\) and \(\Gamma _{i + 1}\) is defined to be \(\Gamma _i/ \kappa _i\) in TC3. Hence \((0)\pi _{\Gamma _i/ \kappa _i} \subseteq (0)\pi _{\Gamma _{i + 1}/ \kappa _{i + 1}}\) by Lemma 3.2. It remains to show that \((0)\pi _{\Gamma _{i + 1}/ \kappa _{i + 1}}\subseteq \rho \).

Suppose that \((x, y) \in (0)\pi _{\Gamma _{i + 1}/ \kappa _{i + 1}}\). Then there exist nodes \(\zeta \) and \(\eta \) in \(\Gamma _{i + 1}\) such that \((\zeta , \eta )\in \kappa _{i + 1}\) where x labels a \((0, \zeta )\)-path and y labels a \((0, \eta )\)-path in \(\Gamma _{i + 1}\). In TC3, \(\kappa _{i + 1}\) is defined to be the least equivalence containing every \((\beta , \gamma ) \in N_{i + 1} \times N_{i + 1}\) such that there are edges \((\alpha , a, \beta )\) and \((\alpha , a, \gamma )\) in \(\Gamma _{i + 1}\). Hence \((\zeta , \eta ) \in \kappa _{i + 1}\) implies that there exist \(\alpha _1, \ldots , \alpha _{n - 1} \in N_{i + 1}\), \(\beta _1:= \zeta , \beta _2, \ldots , \beta _n:= \eta \in N_{i + 1}\), and \(a_1, \ldots , a_{n - 1}\in A\) such that \((\alpha _j, a_j, \beta _j), (\alpha _j, a_j, \beta _{j + 1}) \in E_{i + 1}\) and \(\beta _j \ne \beta _{j + 1}\) for every j. Suppose that \(w_j\in A ^ *\) is any word labelling a \((0, \alpha _j)\)-path for \(j = 1, \ldots , n - 1\), that \(x_1 = x\), that \(x_j\) labels a \((0, \beta _j)\)-path for \(j = 2, \ldots , n - 1\), and that \(x_n = y\).

Then \(w_ja_j\) and \(x_j\) both label \((0, \beta _j)\)-paths in \(\Gamma _{i + 1}\) for every j. Similarly, \(w_ja_j\) and \(x_{j + 1}\) both label \((0, \beta _{j + 1})\)-paths in \(\Gamma _{i + 1}\) for every j. Hence \((w_ja_j, x_j), (w_ja_j, x_{j + 1}) \in (0)\pi _{\Gamma _{i + 1}} \subseteq \rho \) for every j. Therefore \((x_j, x_{j + 1}) \in \rho \) for every j, and so by transitivity \((x_1, x_n) = (x, y)\in \rho \), as required. \(\square \)

4.1 Completeness, determinism, and compatibility

In this section, we will prove that if at some step i in a congruence enumeration the word graph \(\Gamma _{i}\) is complete, then that enumeration terminates, and that every congruence enumeration is eventually compatible with \(\rho \).

We say that a congruence enumeration stabilises if for every \(i\in {\mathbb {N}}\) and every node \(\alpha \) of \(\Gamma _{i}\) there exists \(K\in {\mathbb {N}}\) such that for all \(j\ge K\) either \(\alpha \) is not a node in \(\Gamma _{j}\) or if \((\alpha , a, \beta ) \in E_j\), then \((\alpha , a, \beta ) \in E_{j + 1}\) for any \(a\in A\).

Lemma 4.7

Every congruence enumeration stabilises.

Proof

We will prove a slightly stronger statement, that any sequence of TC1, TC2, and TC3 stabilises.

Suppose that \(i\in {\mathbb {N}}\), \(\alpha \) is an arbitrary node in \(\Gamma _{i}\), and \(a\in A\) is arbitrary. Either there exists \(K > i\) such that \(\alpha \) is no longer a node in \(\Gamma _{K}\), or \(\alpha \) is a node in \(\Gamma _{j}\) for all \(j \ge i\). If \(\alpha \) is no longer a node in \(\Gamma _{K}\), then, from the definitions of TC1, TC2, and TC3, \(\alpha \) is not a node in \(\Gamma _{k}\) for any \(k\ge K\).

Suppose that \(\alpha \) is a node in \(\Gamma _{j}\) for all \(j \ge i\) and that \((\alpha , a, \beta _1) \in E_j\) for some \(\beta _1\in N_j\). If \(\beta _1\) is replaced by \(\beta _2\in N_k\) at some step \(k\ge j\), then step k is an application of TC3. In particular, \(0 \le \beta _2 \le \beta _1\). This process can be repeated only finitely many times because there are only finitely many natural numbers less than \(\beta _1\). \(\square \)

If \(\Gamma _{i}\) is complete for some i, then TC1 cannot be applied again at any step after i. In this case, any application of TC2 either makes \(\kappa \) coarser or has no effect, and any application of TC3 either reduces the number of nodes or has no effect. Since the number of nodes in \(\Gamma _i\) is finite, it follows that the enumeration must terminate at some point. We record this in the following corollary.

Corollary 4.8

If \(\Gamma _{i}\) is complete at some step i of a congruence enumeration, then the enumeration terminates at some step \(j\ge i\).

The next lemma shows that, roughly speaking, if the word graph at some step of a congruence enumeration is non-deterministic, then at some later step it is deterministic.

Lemma 4.9

If \(w\in A ^ *\) labels a path in \(\Gamma _i\) starting at 0 for some \(i\in {\mathbb {N}}\), then there exists \(j\ge i\) such that w labels a unique path in \(\Gamma _j\) starting at 0.

Proof

We proceed by induction on the length of the word w. If \(w = \varepsilon \), then w labels a unique path in every \(\Gamma _i\) starting (and ending) at 0.

Suppose that every word of length at most \(n - 1\) labelling a path starting at 0 in \(\Gamma _i\) labels a unique path, and let \(w\in A ^ *\) be any word of length \(n\ge 1\), labelling a path in \(\Gamma _i\) starting at 0. If \(w = w_1a\) for some \(w_1\in A ^ *\) and \(a \in A\), then \(w_1\) labels a unique path in \(\Gamma _i\) starting at 0 by induction. Suppose that \(\alpha \) is the target of the unique path labelled \(w_1\). By Lemma 4.7, we may assume without loss of generality that \(\alpha \) is a node in \(\Gamma _j\) for every \(j\ge i\). If there exist edges \((\alpha , a, \beta _1), \ldots , (\alpha , a, \beta _r)\) in \(\Gamma _i\), then at most one of these edges was created by an application of TC1 or TC2. So, if \(r > 1\), then \((\beta _1, \beta _2), \ldots , (\beta _{r - 1}, \beta _r) \in \kappa _i\).

By part (e) of the definition of a congruence enumeration (Definition 3.3), there exists \(j > i\) such that \(\kappa _j = \Delta _{N_j}\) and so at step j, there is only one edge incident to \(\alpha \) labelled a. \(\square \)

Next we show that every congruence enumeration is eventually compatible with \(\rho \).

Lemma 4.10

If \((u, v) \in R ^ {\#}\), then there exists \(K\in {\mathbb {N}}\) such that \((u, v) \in (0)\pi _{\Gamma _i}\) for all \(i \ge K\).

Proof

If \((u, v)\in R ^ {\#}\), then there exists an elementary sequence \(u = w_1, w_2, \ldots , w_k = v\) in \(A ^ *\) with respect to R. If \((w_j, w_{j + 1}) \in (0)\pi _{\Gamma _i}\) for every \(j \in \{1, \ldots , k - 1\}\), then, by transitivity, \((w_1, w_k) = (u, v) \in (0)\pi _{\Gamma _i}\) for large enough i. Hence it suffices to show that \((w_j, w_{j + 1}) \in (0)\pi _{\Gamma _i}\) for all j and for sufficiently large i.

Suppose that \(j\in \{1, \ldots , k - 1\}\) is arbitrary, and that \(w_j = pxq\) and \(w_{j + 1} = pyq\) where \(p, q\in A ^ *\) and \((x, y) \in R\). From part (c) of the definition of a congruence enumeration (Definition 3.3), p labels a path with source 0 in \(\Gamma _{j}\) for all \(j \ge i\) for some sufficiently large i. We may choose i large enough so that the target node \(\alpha \) of this path is a node in every \(\Gamma _j\) for \(j\ge i\). By Definition 3.3(d), for sufficiently large i, \((x, y) \in (\alpha )\pi _{\Gamma _i}\) and so \((px, py) \in (0)\pi _{\Gamma _i}\). Again if i is large enough, there is a path labelled q from the target of the path from 0 labelled by px (or py) and so \((w_j, w_{j + 1}) = (pxq, pyq)\in (0)\pi _{\Gamma _i}\). \(\square \)

Corollary 4.11

If \((u, v) \in \rho \), then there exists \(K\in {\mathbb {N}}\) such that \((u, v) \in (0)\pi _{\Gamma _i}\) for all \(i \ge K\).

Proof

Recall that \(\rho \) is the least right congruence containing \(R ^ {\#}\) and S (as defined in Theorem 4.1). If \((u, v)\in \rho \), then there exists a right elementary sequence \(u = w_1, w_2, \ldots , w_k =v\) such that, for each j, \(w_j = xq\) and \(w_{j + 1} = yq\) for some \((x,y) \in R ^ {\#} \cup S\) and some \(q\in A ^ *\). Hence it suffices to show that if \((x, y) \in R ^ {\#} \cup S\) and \(q\in A ^ *\), then, for sufficiently large i, \((xq, yq)\in (0)\pi _{\Gamma _i}\).

If \((\Gamma _{m}, \kappa _{m})\) is the output of applying steps (a) and (b) of the congruence enumeration (Definition 3.3), then \(S\subseteq (0)\pi _{\Gamma _m / \kappa _m}\). Proposition 4.4 implies that \((0)\pi _{\Gamma _m/ \kappa _m} \subseteq (0)\pi _{\Gamma _i/ \kappa _i}\) for all \(i\ge m\). By Definition 3.3(e), TC3 is eventually applied to \((\Gamma _{i - 1}, \kappa _{i - 1})\) for some \(i > m\). Hence \(S \subseteq (0)\pi _{\Gamma _{i - 1}/ \kappa _{i - 1}} = (0)\pi _{\Gamma _{i}}\) since \(\Gamma _{i} = \Gamma _{i - 1}/ \kappa _{i - 1}\), for large enough i.

Suppose that \(q\in A ^ *\) and \((x, y) \in R ^ {\#} \cup S\) are arbitrary. If \((x, y) \in R ^ {\#}\) then \((x, y) \in (0)\pi _{\Gamma _i}\) for large enough i by Lemma 4.10. If \((x, y) \in S\), then it follows from the previous paragraph that \((x, y) \in (0)\pi _{\Gamma _i}\) for large enough i. As in the proof of Lemma 4.10, for sufficiently large i, xq and yq label paths in \(\Gamma _{i}\), and so \((xq, yq) \in (0)\pi _{\Gamma _i}\). \(\square \)

Corollary 4.12

Suppose that \(u, v \in A ^ *\) are arbitrary. Then \((u, v)\in \rho \) if and only if there exists \(K\in {\mathbb {N}}\) such that \((u,v) \in (0)\pi _{\Gamma _i}\) for all \(i\ge K\).

Proof

(\(\Rightarrow \)) Assume that \((u, v)\in \rho \). By Corollary 4.11, \((u, v) \in (0)\pi _{\Gamma _K}\) for large enough K. But \((0)\pi _{\Gamma _i} \subseteq (0)\pi _{\Gamma _{i + 1}}\) for all i, and so \((u, v) \in (0)\pi _{\Gamma _i}\) for all \(i\ge K\).

(\(\Leftarrow \)) By assumption, \(\Gamma _{1}\) is a word graph with \((0)\pi _{\Gamma _1}\subseteq \rho \). By Proposition 4.4, \((0)\pi _{\Gamma _i} \subseteq \rho \) for all \(i\in {\mathbb {N}}\). Hence if \((u, v)\in (0)\pi _{\Gamma _i}\) for some i, then \((u, v)\in \rho \). \(\square \)

We also obtain the following stronger result for an enumeration of a two-sided congruence.

Corollary 4.13

Suppose that \(u, v\in A ^ *\) are arbitrary. Then the following are equivalent:

  (i)

    \((u, v)\in R ^ {\#}\);

  (ii)

    there exists \(K\in {\mathbb {N}}\) such that \((u, v) \in (\alpha )\pi _{\Gamma _i}\) for all nodes \(\alpha \) in \(\Gamma _{i}\) and for all \(i\ge K\);

  (iii)

    there exists \(K\in {\mathbb {N}}\) such that \((u, v) \in (0)\pi _{\Gamma _i}\) for all \(i\ge K\).

Proof

(i) \(\Rightarrow \) (ii). If \((u, v)\in R ^ {\#}\) and \(w\in A ^ *\) is arbitrary, then \((wu, wv) \in R ^ {\#}\) and so, by Lemma 4.10, \((wu, wv)\in (0)\pi _{\Gamma _i}\) for sufficiently large i. In particular, for large enough i, wu and wv both label \((0, \beta )\)-paths for some node \(\beta \). If \(\alpha \) is the target of the path starting at 0 labelled by w, then u and v both label \((\alpha , \beta )\)-paths in \(\Gamma _{i}\) and so \((u, v) \in (\alpha )\pi _{\Gamma _{i}}\) for i large enough.

(ii) \(\Rightarrow \) (iii) This follows immediately since 0 is a node in every word graph.

(iii) \(\Rightarrow \) (i) Follows by applying Corollary 4.12 to \(\rho = R ^ {\#}\). \(\square \)

4.2 The proofs of Theorem 4.1 and Corollary 4.2

In this section, we prove Theorem 4.1 and Corollary 4.2.

Theorem 4.1. Let A be a finite alphabet, let \(R\subseteq A ^ * \times A ^ *\) be a finite set, and let \(R ^ {\#}\) be the least two-sided congruence on \(A ^ *\) containing R. If \(S\subseteq A ^ * \times A ^ *\) is any finite set, and \(\rho \) is the least right congruence on \(A ^ *\) containing \(R ^ {\#}\) and S, then the following hold:

  (a)

    If a congruence enumeration for \(\rho \) terminates with output word graph \(\Gamma =(N, E)\), then \(A ^ * / \rho \) is finite and the function \(\phi : N \times A ^ * \longrightarrow N\), defined by \((\alpha , w)\phi = \beta \) whenever w labels an \((\alpha , \beta )\)-path in \(\Gamma \), is a right action that is isomorphic to the natural action of \(A ^ *\) on \(A ^ * / \rho \) by right multiplication.

  (b)

    If \(A ^ * / \rho \) is finite, then any congruence enumeration for \(\rho \) terminates.

Proof

We start the proof by showing that there is always a bijection between a certain set of nodes in the word graphs \(\Gamma _i\) (defined in a moment) and the classes of \(\rho \). Let M be the subset of \(\bigcup _{i \in {\mathbb {N}}} N_i\) such that \(\alpha \in M\) if and only if \(\alpha \in N_i\) for all large enough i. By Lemma 4.7, for every \(\alpha \in M\) there exists \(w_\alpha \in A ^ *\) such that \(w_\alpha \) labels a path from 0 to \(\alpha \) in every \(\Gamma _i\) for sufficiently large i. We define \(f: M \longrightarrow A ^ * / \rho \) by

$$\begin{aligned} (\alpha )f = w_\alpha /\rho . \end{aligned}$$
(4.1)

We will show that f is a bijection.

If \(\alpha , \beta \in M\) are such that \((\alpha )f = (\beta )f\), then \((w_\alpha , w_\beta ) \in \rho \) and so \((w_\alpha , w_\beta ) \in (0)\pi _{\Gamma _i}\) for large enough i by Corollary 4.12. In particular, \(w_\alpha \) and \(w_\beta \) both label paths from 0 to the same node in every \(\Gamma _i\) for large enough i. By Lemma 4.7, we can also choose i large enough so that the target \(\gamma \) of these paths is a node in \(\Gamma _j\) for all \(j\ge i\). But \(w_\alpha \) and \(w_\beta \) also label \((0, \alpha )\)- and \((0, \beta )\)-paths in every \(\Gamma _i\) for sufficiently large i. By Lemma 4.9, there exists \(j \ge i\) such that \(w_{\alpha }\) labels a unique path starting at 0, and so \(\alpha = \gamma \). Similarly, there exists \(k\ge j\) such that \(w_{\beta }\) labels a unique path from 0, and so \(\beta = \gamma \) also. In particular, \(\alpha = \beta \) and so f is injective.

For surjectivity, let \(u\in A ^ *\) be arbitrary. Then u labels a path in \(\Gamma _{i}\) for large enough i. Since every congruence enumeration stabilises, we can choose i large enough so that every edge in the path labelled by u belongs to every \(\Gamma _{j}\) for \(j\ge i\). In particular, u labels a \((0, \alpha )\)-path in \(\Gamma _{i}\) for some \(\alpha \in M\). But \(w_{\alpha }\) also labels a \((0, \alpha )\)-path in \(\Gamma _i\), and so \((u, w_{\alpha }) \in (0)\pi _{\Gamma _{i}}\). Hence, again by Corollary 4.12, for large enough i, \((u, w_{\alpha }) \in \rho \) and so \((\alpha )f = w_{\alpha } /\rho = u /\rho \), and f is surjective.

(a). Suppose that \((\Gamma , \kappa )\) is the output of the enumeration where \(\Gamma =(N, E)\) and that \(\pi :N \longrightarrow {\mathfrak {P}}(A ^ * \times A ^ *)\) is the path relation of \(\Gamma \). Since the congruence enumeration process has terminated, \(\Gamma \) is finite and its set of nodes N coincides with the set M defined at the start of the proof. Hence \(A ^ * / \rho \) is finite, since \(f:N \longrightarrow A ^ * / \rho \) defined in (4.1) is a bijection.

Since any application of TC3 results in no changes to \(\Gamma \), it follows that \(\kappa = \Delta _{N}\), and so \(\Gamma \) is deterministic. Since the enumeration has terminated, \(\Gamma \) is also complete, and so every \(w\in A ^ *\) labels exactly one path starting at each node \(\alpha \in N\). In particular, \(\phi \) (as defined in the statement) is a well-defined function. If \(w_1, w_2\in A ^ *\) label \((\alpha _1,\alpha _2)\)- and \((\alpha _2, \alpha _3)\)-paths in \(\Gamma \), respectively, then certainly \(w_1w_2\) labels an \((\alpha _1, \alpha _3)\)-path, and so \(\phi \) is an action.

Fig. 3 The commutative diagram in the proof of Theorem 4.1(a)

We will show that \(f: N \longrightarrow A ^ * / \rho \) defined in (4.1) is an isomorphism of the actions \(\phi \) and the natural action \(\Phi : A ^ * / \rho \times A ^ * \longrightarrow A ^ * / \rho \) defined by \((u / \rho , v)\Phi = uv /\rho \). We define \(g: N \times A ^ * \longrightarrow (A ^ * / \rho ) \times A ^ *\) by \((\alpha , v)g = ((\alpha )f, v) = (w_{\alpha }/\rho , v)\) for all \((\alpha , v)\in N \times A ^ *\). To show that f is a homomorphism of actions, we must prove that the diagram in Fig. 3 commutes. Suppose that \((\alpha , v) \in N\times A ^ *\). Then

$$\begin{aligned} ((\alpha , v)g)\Phi = (w_{\alpha }/\rho , v)\Phi = w_{\alpha }v / \rho \end{aligned}$$

and if v labels a \((\alpha , \beta )\)-path in \(\Gamma \), then

$$\begin{aligned} ((\alpha , v)\phi )f = (\beta )f = w_\beta / \rho . \end{aligned}$$

But \(w_{\beta }\) and \(w_{\alpha }v\) both label \((0, \beta )\)-paths in \(\Gamma \) and so \((w_{\alpha }v, w_{\beta }) \in (0)\pi \). By Corollary 4.12, \((0)\pi \subseteq \rho \) and so \((w_{\alpha }v, w_{\beta })\in \rho \). Hence \(((\alpha , v)g)\Phi = ((\alpha , v)\phi )f\) and so f is a homomorphism of the actions \(\phi \) and \(\Phi \).

(b). If \(A ^ * / \rho \) is finite, then the set M of nodes defined at the start of the proof is also finite. By Definition 3.3(c), and since every congruence enumeration stabilises, for every \(\alpha \in M\) and every \(a\in A\) there exist a node \(\beta \) and \(K\in {\mathbb {N}}\) such that \((\alpha , a, \beta )\in E_i\) for all \(i\ge K\); in particular, \(\beta \in M\). Since A and M are finite, there exists \(K\in {\mathbb {N}}\) such that \((\alpha , a, \beta ) \in E_i\) for all \(\alpha \in M\), all \(a\in A\), and all \(i\ge K\). It follows that none of TC1, TC2, or TC3 results in any changes to \(\Gamma _i\) or \(\kappa _i\) for \(i\ge K\), and so the enumeration terminates. \(\square \)

We conclude this section by proving Corollary 4.2.

Corollary 4.2. Let A be a finite alphabet, let \(R\subseteq A ^ * \times A ^ *\) be a finite set, and let \(R ^ {\#}\) be the least two-sided congruence on \(A ^ *\) containing R. Then the following hold:

(a) If a congruence enumeration for \(R ^ {\#}\) terminates with output word graph \(\Gamma =(N, E)\), then \(A ^ * / R ^ {\#}\) is finite and the function \(\phi : N \times A ^ * \longrightarrow N\) defined by \((\alpha , w)\phi = \beta \) whenever w labels an \((\alpha , \beta )\)-path in \(\Gamma \) is a right action that is isomorphic to the (faithful) natural action of \(A ^ * / R ^ {\#}\) on itself by right multiplication.

(b) If \(A ^ * / R ^ {\#}\) is finite, then any congruence enumeration for \(R ^ {\#}\) terminates.

Proof

If \(S = \varnothing \) in Theorem 4.1, then the least right congruence \(\rho \) containing S and \(R ^ {\#}\) is just \(R ^ {\#}\). By Theorem 4.1, the action \(\phi : N\times A ^ *\longrightarrow N\) is isomorphic to the natural action of \(A ^ *\) on \(A ^ * / R ^ {\#}\) by right multiplication. By Corollary 4.13, the kernel of \(\phi \) is \(R ^ {\#}\), and it is routine to verify that the kernel of the action of \(A ^ *\) on \(A ^ * / R ^ {\#}\) by right multiplication is also \(R ^ {\#}\). Hence, by Proposition 2.2, the action \(\phi \) of \(A ^ * \) on N and the induced action of \(A ^ * / R ^ {\#}\) on N are isomorphic. Similarly, the action of \(A ^ *\) on \(A ^ * / R ^ {\#}\) is isomorphic to the action of \(A ^ * /R ^ {\#}\) on \(A ^ * / R ^ {\#}\) by right multiplication. Thus the actions of \(A ^ * / R ^ {\#}\) on N and \(A ^ * /R ^ {\#}\) on \(A ^ * / R ^ {\#}\) are isomorphic also. The latter action is faithful since \(A ^ * / R ^ {\#}\) is a monoid. Hence the action of \(A ^ * / R ^ {\#}\) on N is faithful also, and there is nothing further to prove. \(\square \)

5 Monoids not defined by a presentation

Suppose that the monoid M defined by the presentation \(\langle A\mid R\rangle \) is finite and that \(\phi : A ^* \longrightarrow M\) is the unique surjective homomorphism extending the inclusion of A in M. Then \(\ker (\phi ) = R ^ {\#}\) and \((a)\phi = a\) for every \(a\in A\). If \(M = \{m_0, m_1, \ldots , m_{|M|-1}\}\), where \(m_0\) is the identity of M, then the right Cayley graph of M with respect to A is the word graph \(\Gamma \) with nodes \(N = \{0, \ldots , |M|-1\}\) and edges \((\alpha , a, \beta )\) for all \(\alpha , \beta \in N\) and every \(a\in A\) such that \(m_{\alpha }a = m_{\beta }\). The right Cayley graph \(\Gamma \) is complete and deterministic, and so if \(\pi : N \longrightarrow {\mathfrak {P}}(A ^ * \times A ^ *)\) is the path relation on \(\Gamma \), then \((0)\pi = R ^ {\#}\).

Example 5.1

Suppose that M is the monoid generated by the matrices

$$\begin{aligned} a = \begin{pmatrix} 1 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 1 \\ 1 &{}\quad 0 &{}\quad 1 \end{pmatrix}, \quad b = \begin{pmatrix} 1 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 1 \end{pmatrix}, \quad c = \begin{pmatrix} 1 &{}\quad 0 &{}\quad 1 \\ 1 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 1 \end{pmatrix} \end{aligned}$$

over the boolean semiring \({\mathbb {B}} = \{0, 1\}\) with addition defined by

$$\begin{aligned} 0 + 1 = 1 + 0 = 1 + 1 = 1, \qquad 0 + 0 = 0 \end{aligned}$$

and multiplication defined as usual for the real numbers 0 and 1. The right Cayley graph of M with respect to its generating set \(\{a, b, c\}\) is shown in Fig. 4; see also Table 1.

Fig. 4 The right Cayley graph of the monoid M from Example 5.1 with respect to the generating set \(\{a, b, c\}\); see Table 1 for a representative word for each node. Purple arrows correspond to edges labelled a, gray labelled b, and pink labelled c

Table 1 A word labelling a path from 0 to each node in the right Cayley graph on the monoid M from Example 5.1; see Fig. 4
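To make Example 5.1 concrete, the following is a minimal Python sketch (ours, for illustration only; it is neither the Froidure–Pin Algorithm of [11] nor the implementation in libsemigroups [24]) that computes the elements of M and its right Cayley graph by breadth-first multiplication by the generators. Matrices over \({\mathbb {B}}\) are represented as tuples of tuples of 0s and 1s so that they can be used as dictionary keys; all function and variable names are ours.

```python
# Boolean matrix multiplication over B = {0, 1}: + is "or", * is "and".
def bool_mat_mult(x, y):
    n = len(x)
    return tuple(tuple(int(any(x[i][k] and y[k][j] for k in range(n)))
                       for j in range(n)) for i in range(n))

# Breadth-first closure of the generators under right multiplication; node 0
# is the identity, and edges[m][g] is the target of the edge (m, g, .).
def right_cayley_graph(gens):
    n = len(next(iter(gens.values())))
    one = tuple(tuple(int(i == j) for j in range(n)) for i in range(n))
    elements, index, edges = [one], {one: 0}, []
    for x in elements:                    # `elements` grows during the loop
        row = {}
        for g, mat in gens.items():
            y = bool_mat_mult(x, mat)
            if y not in index:
                index[y] = len(elements)
                elements.append(y)
            row[g] = index[y]
        edges.append(row)
    return elements, edges

a = ((1, 1, 0), (0, 1, 1), (1, 0, 1))
b = ((1, 1, 0), (0, 1, 0), (0, 0, 1))
c = ((1, 0, 1), (1, 1, 0), (0, 1, 1))
elements, cayley = right_cayley_graph({"a": a, "b": b, "c": c})
print(len(elements))                      # 9 elements, i.e. 9 nodes in Fig. 4
```

The word graph cayley produced here is complete and deterministic and, as discussed below, can be used as the input \(\Gamma _1\) to a congruence enumeration.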

Suppose that we want to determine a right congruence on a finite monoid M generated by a non-abstract set of generators A—such as transformations, partial permutations, or matrices over a semiring. Of course, every finite monoid M is finitely presented, and so congruence enumeration can be applied to any presentation for M. It is possible that a presentation for M is already known, or we can compute a presentation for M; for example, by using the Froidure–Pin Algorithm (see [11]).

It is also possible, instead of starting with the trivial word graph, to start a congruence enumeration with the right Cayley graph \(\Gamma _{1} = (N_1, E_1)\) of M with respect to A as the input to the process. As mentioned above, \((0)\pi _{\Gamma _1} = R ^ {\#}\) and so \((\Gamma _1, \Delta _{N_1})\) is a valid input to a congruence enumeration. The right Cayley graph \(\Gamma _{1}\) is already complete, deterministic, and compatible with \(R ^ {\#}\) by definition. It follows that TC2 and TC3 are the only steps actually applied in a congruence enumeration with input \(\Gamma _{1}\). More precisely, TC2(c) is applied to 0 and every \((u, v) \in S\), and then TC3 is repeatedly applied until \(\kappa _{i} = \Delta _{N_i}\). The right Cayley graph of a monoid \(M = \langle \, A \,\rangle \) of one of the types mentioned above can also be computed using the Froidure–Pin Algorithm, which has complexity O(|A||M|). Given that the right Cayley graph has |M| nodes and |A||M| edges, it is unlikely that, in general, there is a better method for finding the Cayley graph of a monoid from a generating set.
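As a complement to the description above, here is a minimal sketch (ours, not the implementation in libsemigroups [24]) of this special case of congruence enumeration: the input is a complete deterministic word graph, stored as a list of dicts with graph[n][a] = m for each edge (n, a, m) and with node 0 corresponding to the identity; TC2(c) is applied to node 0 and each pair in S, and TC3 is then applied using a union-find structure for \(\kappa \). The two-line example at the end is a toy of ours (the monogenic monoid \(\langle a \mid a^4 = a^2\rangle \)), included only so that the sketch runs as-is.

```python
def collapse(graph, pairs):
    # graph[n][a] = m encodes the edge (n, a, m); `pairs` is the set S of pairs
    # of words.  Returns the surviving nodes, one per class of the congruence.
    parent = list(range(len(graph)))

    def find(n):                              # representative of a kappa-class
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    def trace(n, word):                       # follow `word` from node n
        for a in word:
            n = find(graph[find(n)][a])
        return n

    queue = [(trace(0, u), trace(0, v)) for u, v in pairs]   # TC2(c) at node 0
    while queue:                                             # TC3
        p, q = map(find, queue.pop())
        if p == q:
            continue
        if q < p:
            p, q = q, p
        parent[q] = p                         # merge the larger node into the smaller
        for a, m in graph[q].items():
            queue.append((graph[p][a], m))    # targets must agree after merging
    return sorted({find(n) for n in range(len(graph))})

# Toy example (not from the paper): the right Cayley graph of <a | a^4 = a^2>
# has nodes 0..3 for 1, a, a^2, a^3; the least right congruence containing
# (a, a^3) has the 3 classes {1}, {a, a^3}, {a^2}.
print(collapse([{"a": 1}, {"a": 2}, {"a": 3}, {"a": 2}], [("a", "aaa")]))  # [0, 1, 2]
```

Feeding this function the graph cayley computed in the sketch after Example 5.1 together with the single pair ("a", "b") should return the four surviving nodes corresponding to the four classes found in Example 5.2 below.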

Experiments indicate that if the congruence being enumerated has a relatively large number of classes in comparison to |M|, then this second approach is faster than starting from the trivial word graph. On the other hand, if the number of congruence classes is relatively small compared with |M|, then starting from the trivial word graph is often faster. Since the number of congruence classes is usually not known in advance, the implementation in libsemigroups [24] runs both these approaches in parallel, accepting whichever provides an answer first.

We end this section with an example of performing a congruence enumeration with a right Cayley graph as an input.

Example 5.2

Suppose that M is the monoid from Example 5.1. To compute the least right congruence \(\rho \) on M containing \(S= \{(a, b)\}\) we perform the following steps.

Set \(\Gamma _{1}\) to be the right Cayley graph of M with respect to the generating set \(\{a, b, c\}\); see Fig. 4 and Table 1. Suppose that \(a< b < c\).

Step 1:

Apply TC2 to the only pair \((a, b)\) in S, and the output of this is \((\Gamma _{2}, \kappa _{2})\) where \(\kappa _{2}\) is the least equivalence on the nodes of \(\Gamma _2\) containing (1, 2); see Fig. 5a.

Step 2:

Apply TC3 to obtain the quotient of \(\Gamma _{2}\) by \(\kappa _{2}\); the output is \((\Gamma _{3}, \kappa _{3})\) where \(\Gamma _{3} = \Gamma _{2} / \kappa _{2}\) and \(\kappa _{3}\) is the least equivalence containing (1, 5), (4, 6), and (4, 7); see Fig. 5b.

Step 3:

Apply TC3 with input \((\Gamma _{3}, \kappa _{3})\) and output \((\Gamma _{4}, \kappa _{4})\) where \(\Gamma _{4} = \Gamma _{3} / \kappa _{3}\) and \(\kappa _{4}\) is the least equivalence containing (4, 8); see Fig. 5c.

Step 4:

Apply TC3 with input \((\Gamma _{4}, \kappa _{4})\) and output \((\Gamma _{5}, \kappa _{5})\) where \(\Gamma _{5} = \Gamma _{4} / \kappa _{4}\) and \(\kappa _{5} = \Delta _{N_5}\). Since \(\Gamma _{5}\) is complete and compatible with \(\rho \) and \(\kappa _{5}\) is trivial, the enumeration terminates with \(\Gamma _{5}\); see Fig. 5d.

We conclude that \(\rho \) has 4 equivalence classes whose representatives \(\varepsilon \), a, c, and \(a ^ 2\) correspond to the nodes \(\{0, 1, 3, 4\}\) in the graph \(\Gamma _{5}\). By following the paths in \(\Gamma _{5}\) starting at 0 labelled by each of the words in Table 1 we determine that the congruence classes of \(\rho \) are: \(\{\varepsilon \}\), \(\{a, b, ab\}\), \(\{c\}\), \(\{a ^ 2, ba, bc, bab\}\).

Fig. 5 The output \((\Gamma _{i}, \kappa _{i})\) of each step in Example 5.2. Purple arrows correspond to a, gray to b, pink to c, shaded nodes of the same colour belong to \(\kappa _{i}\), and unshaded nodes belong to singleton classes; see Table 1 for a representative word for each node. A dashed edge with a single arrowhead denotes an edge that is obtained from TC2 or TC3, solid edges correspond to edges that existed at the previous step

6 The HLT strategy

In this section, we describe the so-called HLT strategy for congruence enumeration. The acronym HLT stands for Haselgrove, Leech, and Trotter; for further details see [5] or [31, Section 5.2]. For the record, this is Walker’s Strategy 1 in [36], and is referred to as R-style in ACE [14] (“R” for “relators”).

The HLT strategy, like every congruence enumeration, starts by applying steps (a) and (b) in the definition (Definition 3.3). We then repeatedly apply TC3 until the resulting \(\kappa \) is trivial, after which we repeatedly perform the following steps:

HLT1.:

If \(\alpha \) is the minimum node in \(\Gamma _{i}\) such that there exists \((u, v)\in R\) where either u or v does not label a path starting at \(\alpha \), or u and v label paths starting at \(\alpha \) that end at different nodes, then we proceed as follows. In the former case, we apply TC1 repeatedly until u and \(v_1\) label paths starting at \(\alpha \), where \(v= v_1b\) for some \(b\in A\), and then apply TC2, where part (a) applies. In the latter case, we simply apply TC2 to \(\alpha \) and \((u, v)\in R\), where part (c) applies.

HLT2.:

Repeatedly apply TC3 until the resulting \(\kappa _{i + 1}\) is trivial.

HLT3.:

If \(\alpha \) is the node from HLT1, then we apply TC1 to \(\alpha \) and every \(a\in A\), if any, such that there is no edge incident to \(\alpha \) labelled by a.

We note that if for every \(a\in A\) there exists \((u, v)\in R\) such that either u or v starts with a, then HLT3 does not have to be performed.
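To illustrate HLT1–HLT3, here is a minimal, self-contained Python sketch of the HLT strategy. It is an illustration only, not the implementation in ACE [14] or libsemigroups [24]; the class name Enumeration and all identifiers are ours. Relations and the pairs in S are given as pairs of strings over the alphabet, \(\kappa \) is maintained with a union-find structure, and, since the nodes introduced by TC1 are always larger than existing ones, the smaller node of each pair is kept when classes are merged.

```python
class Enumeration:
    def __init__(self, alphabet, relations, extra):
        self.A = list(alphabet)       # generators
        self.R = list(relations)      # pairs (u, v) from the presentation
        self.S = list(extra)          # pairs generating the right congruence
        self.edges = [{}]             # edges[n][a] = m encodes the edge (n, a, m)
        self.parent = [0]             # union-find parents: the equivalence kappa
        self.queue = []               # pairs of nodes waiting to be identified

    def find(self, n):                # canonical representative of a node
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]
            n = self.parent[n]
        return n

    def define(self, n, a):           # TC1: a new node m and a new edge (n, a, m)
        m = len(self.edges)
        self.edges.append({})
        self.parent.append(m)
        self.edges[n][a] = m
        return m

    def trace(self, n, word, define_missing):
        # Follow `word` from node n, optionally defining missing edges (TC1);
        # returns None if the path does not exist and define_missing is False.
        for a in word:
            n = self.find(n)
            m = self.edges[n].get(a)
            if m is None:
                if not define_missing:
                    return None
                m = self.define(n, a)
            n = m
        return self.find(n)

    def push(self, n, u, v):
        # HLT1: make u and v label paths from n, then record the coincidence
        # of their endpoints (TC2) if these differ.
        p, q = self.trace(n, u, True), self.trace(n, v, True)
        if p != q:
            self.queue.append((p, q))

    def process_coincidences(self):   # HLT2: TC3 until kappa is trivial
        while self.queue:
            p, q = map(self.find, self.queue.pop())
            if p == q:
                continue
            if q < p:
                p, q = q, p
            self.parent[q] = p        # merge the larger node q into p
            for a, m in self.edges[q].items():
                if a in self.edges[p]:
                    self.queue.append((self.edges[p][a], m))
                else:
                    self.edges[p][a] = m

    def run_hlt(self):
        for u, v in self.S:           # Definition 3.3(a) and (b)
            self.push(0, u, v)
        self.process_coincidences()
        n = 0
        while n < len(self.edges):    # new nodes are processed as they appear
            if self.find(n) == n:
                for u, v in self.R:
                    self.push(n, u, v)            # HLT1
                    self.process_coincidences()   # HLT2
                m = self.find(n)                  # n may have been merged away
                for a in self.A:                  # HLT3
                    if a not in self.edges[m]:
                        self.define(m, a)
            n += 1
        return [k for k in range(len(self.edges)) if self.find(k) == k]
```

When the enumeration terminates, run_hlt returns the surviving nodes, which are in bijection with the classes of the enumerated congruence by Theorem 4.1; an example run is given at the end of this section.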

Next we will show that the HLT strategy fulfils the definition of a congruence enumeration.

Proposition 6.1

If \(\langle A| R\rangle \) is a finite monoid presentation, S is a finite subset of \(A ^ * \times A ^ *\), \(R ^ {\#}\) is the least two-sided congruence on \(A ^ *\) containing R, and \(\rho \) is the least right congruence on \(A ^ *\) containing S and \(R ^ {\#}\), then the HLT strategy applied to \((\Gamma _{1}, \kappa _{1})\) where \(\Gamma _{1} = (N_1, E_1)\) is a word graph such that \((0)\pi _{\Gamma _1}\subseteq \rho \) and \(\kappa _{1} = \Delta _{N_1}\) is a congruence enumeration.

Proof

It suffices to check that the conditions in Definition 3.3 hold. Clearly parts (a) and (b) are included in the HLT strategy and so there is nothing to check for these.

(c). Suppose that \(\alpha \) is a node in \(\Gamma _{i}= (N_i, E_i)\) such that there is no edge labelled by some \(a\in A\) incident to \(\alpha \). If \(\alpha \in N_j\) for all \(j \ge i\), then there exists \(k\in {\mathbb {N}}\) such that \(\alpha \) is the minimum node to which HLT1 is applied at step k. If there exists \((u, v)\in R\) such that the first letter of u is a, then an edge labelled by a incident to \(\alpha \) is defined in HLT1. Otherwise, such an edge is defined in HLT3.

(d). Suppose that \(\alpha \) is a node in \(\Gamma _{i}\) and \((u, v)\in R\) are such that u and v both label paths starting at \(\alpha \). If \(\alpha \in N_j\) for all \(j \ge i\), then there exists \(k\in {\mathbb {N}}\) such that \(\alpha \) is the minimum node to which HLT1 is applied at step k, and so TC2 is applied to \(\alpha \) and \((u, v)\) at some later step. Hence either the paths labelled by u and v starting at \(\alpha \) end at the same node, or they end at distinct nodes \(\beta \) and \(\gamma \) such that \((\beta , \gamma )\in \kappa _{i}\). But TC3 is applied at some later step, and so \(\beta \) and \(\gamma \) are identified in the corresponding quotient. Eventually the paths labelled by u and v starting at \(\alpha \) end at the same node, i.e. \((u, v) \in (\alpha )\pi _{\Gamma _{i}}\) for sufficiently large i.

(e). If \((\beta , \gamma )\in \kappa _{i}\) for some i, then, by the definition of HLT2, there exists \(j\ge i\) such that TC3 is applied at step j. If we choose the smallest such j, then \(\kappa _{i}\subseteq \kappa _{j}\) and so \((\beta , \gamma )\not \in \kappa _{j + 1}\). \(\square \)

Example 6.2

Let M be the monoid defined in Example 5.1. Then M is isomorphic to the monoid defined by the presentation

$$\begin{aligned} \langle a,b,c \;\mid \; ac = a^2,\ b^2= b,\ ca= a^2,\ cb =bc,\ c^2= a^2,\ a^3= a^2,\ aba= a^2 \rangle . \end{aligned}$$

We will apply the HLT strategy to the presentation and the set \(S = \{(a, b)\}\). In other words, we enumerate the least right congruence on the monoid M containing \((a, b)\). This is the same right congruence that we enumerated in Example 5.2. The input word graph \(\Gamma _{1}\) is the trivial graph and the input \(\kappa _{1}\) is \(\Delta _{N_1}\). We suppose that \(a<b<c\).

Step 1:

We apply Definition 3.3(a). We add the nodes \(1, 2 \in N_2\) and define the edges (0, a, 1) and (0, b, 2). Since \(S = \{(a,b)\}\), this concludes Definition 3.3(a). We continue by applying TC2 to 0 and \((a, b)\). Since a and b label paths from 0 to the nodes 1 and 2, respectively, we define \(\kappa _{2}\) to be the least equivalence containing \(\kappa _{1} = \Delta _{N_1}\) and (1, 2). The output is graph \(\Gamma _{2}\) in Fig. 6a.

Step 2:

We continue with Definition 3.3(b). We apply TC3 and we get the graph \(\Gamma _{3}\) in Fig. 6b which is the quotient of \(\Gamma _{2}\) by \(\kappa _{2}\). The set \(\kappa _{3}\) is defined to be \(\Delta _{N_3}\).

Step 3:

Now we are ready to apply HLT1. Since 0 is the minimum node in \(\Gamma _{3}\) such that \((ac, a ^2) \in R\) and ac and \(a^2\) do not label paths starting at 0 in \(\Gamma _{3}\), we apply TC1 until the node 3 and a path labelled by \(a^2\) starting at 0 are defined, and we continue by applying TC2(b) with \(u = a^2\), \(v = ac\), and \(v_1 = a\). The output of this step is the graph \(\Gamma _{4}\) in Fig. 6c.

Step 4–8:

We apply HLT1 to the node 0 and each of the relations \((b^2, b)\), \((ca, a^2)\), \((cb, bc)\), \((c^2, a^2)\), \((a^3, a^2)\in R\) (in this order), and the outputs \((\Gamma _{i}, \kappa _{i})\) for \(i = 5, \ldots , 9\) are shown in Fig. 6d–h.

Step 9:

In this step, we do not apply HLT1 to 0 and \((aba, a^2)\) since aba and \(a^2\) both label paths that reach the same node in \(\Gamma _{9}\). We apply HLT1 to the node 1 and the relation \((ac, a^2) \in R\) and the output is graph \(\Gamma _{10}\) in Fig. 6i.

Step 10:

We apply HLT1 to the node 1 and the relation \((cb, bc) \in R\) and the output is graph \(\Gamma _{11}\) in Fig. 6j.

After Step 10, \(\Gamma _{11}\) is complete, deterministic, and compatible with R. Hence the enumeration terminates; see Example 5.2 for details of the enumerated congruence.

Fig. 6 The output \((\Gamma _{i}, \kappa _{i})\) of each step in the HLT strategy in Example 6.2. Purple arrows correspond to a, gray to b, pink to c, shaded nodes of the same colour belong to \(\kappa _{i}\), and unshaded nodes belong to singleton classes; see Table 2 for a representative word for each node. A dashed edge with a double arrowhead indicates the edge being defined in TC1, a dashed edge with a single arrowhead denotes an edge that is obtained from TC2 or TC3, solid edges correspond to edges that existed at the previous step

Table 2 A word labelling a path from 0 to each node in the word graphs from Example 6.2; see Fig. 6
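Assuming the hypothetical Enumeration sketch given after HLT1–HLT3 above, the enumeration of Example 6.2 can be reproduced as follows; the run should finish with four surviving nodes, matching the four congruence classes found in Example 5.2.

```python
# The presentation from Example 6.2 and the extra pair (a, b).
R = [("ac", "aa"), ("bb", "b"), ("ca", "aa"), ("cb", "bc"),
     ("cc", "aa"), ("aaa", "aa"), ("aba", "aa")]
e = Enumeration("abc", R, [("a", "b")])
print(len(e.run_hlt()))   # 4 classes, as in Example 5.2
```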

7 The Felsch strategy

In this section, we describe two versions of the so-called Felsch strategy for congruence enumeration. For the record, this is Walker’s Strategy 2 in [36], and is referred to as C-style in ACE [14] (“C” for “cosets”).

7.1 First version

The Felsch strategy starts with the trivial word graph \(\Gamma _{1} = (N_{1}, E_{1})\) and with the trivial equivalence \(\kappa _1 =\Delta _{N_1}\) on the nodes \(N_1\) of \(\Gamma _{1}\). Just like the HLT strategy, the Felsch strategy starts by applying steps (a) and (b) from the definition of a congruence enumeration (Definition 3.3). We then repeatedly apply TC3 until the resulting \(\kappa \) is trivial. The following steps are then repeatedly applied:

F1.:

If \(\alpha \in N_i\) is the minimum node in \(\Gamma _{i}\) for which there exists \(a\in A\) such that there is no edge with source \(\alpha \) labelled by a, then apply TC1 to \(\alpha \) and a. Then apply TC2 to every node in \(\Gamma _{i}\) and every relation in R.

F2.:

Apply TC3 repeatedly until \(\kappa _{i + 1}\) is trivial.

It follows immediately from the definition of the Felsch strategy that it satisfies the definition of a congruence enumeration. To illustrate the Felsch strategy, we repeat the calculation from Example 6.2.
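As an illustration of F1 and F2 (and not of the implementation in [23] or [24]), the following sketch reuses the hypothetical Enumeration class from the sketch in Sect. 6. Here TC2 is performed as a scan that never defines interior nodes: it either deduces a single missing final edge (TC2(a) or (b)), records a coincidence (TC2(c)), or does nothing. Relation words are assumed to be non-empty, and the helper names are ours.

```python
def scan(e, n, u, v):
    # TC2 at node n and relation (u, v), without defining any new nodes.
    pu, pv = e.trace(n, u[:-1], False), e.trace(n, v[:-1], False)
    if pu is None or pv is None:
        return                                    # TC2 does not apply yet
    tu, tv = e.edges[pu].get(u[-1]), e.edges[pv].get(v[-1])
    if tu is None and tv is None:
        return
    if tu is None:
        e.edges[pu][u[-1]] = e.find(tv)           # TC2(a): deduce the edge
    elif tv is None:
        e.edges[pv][v[-1]] = e.find(tu)           # TC2(b)
    elif e.find(tu) != e.find(tv):
        e.queue.append((tu, tv))                  # TC2(c): a coincidence

def snapshot(e):
    # Degrees of the surviving nodes; used to detect whether a pass changed anything.
    return [(n, len(e.edges[n])) for n in range(len(e.edges)) if e.find(n) == n]

def run_felsch(e):
    for u, v in e.S:                              # Definition 3.3(a) and (b)
        e.push(0, u, v)
    e.process_coincidences()
    while True:
        # F1: apply TC1 to the minimum node with a missing edge, if there is one ...
        missing = next(((n, a) for n in range(len(e.edges)) if e.find(n) == n
                        for a in e.A if a not in e.edges[n]), None)
        if missing is not None:
            e.define(*missing)
        # ... then apply TC2 to every node and every relation, followed by F2
        # (TC3); the scans are repeated until they change nothing, so that the
        # final word graph is compatible with R and not merely complete.
        while True:
            before = snapshot(e)
            for n in range(len(e.edges)):
                if e.find(n) == n:
                    for u, v in e.R:
                        scan(e, n, u, v)
            e.process_coincidences()
            if snapshot(e) == before:
                break
        if missing is None:                       # complete and compatible
            return [k for k in range(len(e.edges)) if e.find(k) == k]
```

Running run_felsch on the same data as before (that is, on Enumeration("abc", R, [("a", "b")]) with R as in Example 6.2) should again finish with four surviving nodes.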

Example 7.1

Let M be the monoid defined in Example 5.1. Then M is isomorphic to the monoid defined by the presentation

$$\begin{aligned} \langle a,b,c \;\mid \; ac = a^2,\ b^2= b,\ ca= a^2,\ cb =bc,\ c^2= a^2,\ a^3= a^2,\ aba= a^2 \rangle . \end{aligned}$$

We will apply the Felsch strategy to the presentation (with \(a<b<c\)) and the set \(S = \{(a, b)\}\). In other words, we enumerate the least right congruence on the monoid M containing \((a, b)\). This is the same right congruence that we enumerated in Example 5.2. The input word graph \(\Gamma _{1}\) is the trivial graph and the input \(\kappa _{1}\) is \(\Delta _{N_1}\).

Step 1:

We apply Definition 3.3(a). We add the nodes \(1, 2 \in N_2\) and define the edges (0, a, 1) and (0, b, 2). Since \(S = \{(a,b)\}\), this concludes Definition 3.3(a). We continue by applying TC2 to 0 and \((a, b)\). Since a and b label paths from 0 to the nodes 1 and 2, respectively, we define \(\kappa _{2}\) to be the least equivalence containing \(\kappa _{1} = \Delta _{N_1}\) and (1, 2). The output is graph \(\Gamma _{2}\) in Fig. 7a.

Step 2:

We continue with Definition 3.3(b). We apply TC3 and we get the graph \(\Gamma _{3}\) in Fig. 7b, which is the quotient of \(\Gamma _{2}\) by \(\kappa _{2}\). The set \(\kappa _{3}\) is defined to be \(\Delta _{N_3}\).

Step 3:

We apply F1 and hence we apply TC1 to 0 and c. We add the node \(3 \in N_4\) and define the edge (0, c, 3). The output is graph \(\Gamma _{4}\) in Fig. 7c.

Step 4:

We continue with the application of the second part of F1 and hence we apply TC2 to 1 and \((b^2, b)\). The output is graph \(\Gamma _{5}\) in Fig. 7d.

Step 5:

We apply F1 and hence we apply TC1 to 1 and a. We add the node \(4 \in N_6\) and define the edge (1, a, 4). The output is graph \(\Gamma _{6}\) in Fig. 7e.

Steps 6–12:

We continue with the application of the second part of F1 and hence we apply TC2 to 0 and \((a^3, a^2)\), to 0 and \((ac, a^2)\), to 1 and \((ac, a^2)\), to 0 and \((ca, a^2)\), to 0 and \((c^2, a^2)\) and to 3 and \((b^2, b)\).

After Step 12, \(\Gamma _{13}\) in Fig. 7f is complete, deterministic, and compatible with R. Hence the enumeration terminates; see Example 5.2 for details of the enumerated congruence.

Fig. 7 The output \((\Gamma _{i}, \kappa _{i})\) of each step in the Felsch strategy in Example 7.1. Purple arrows correspond to a, gray to b, pink to c, shaded nodes of the same colour belong to \(\kappa _{i}\), and unshaded nodes belong to singleton classes; see Table 3 for a representative word for each node. A dashed edge with a double arrowhead indicates the edge being defined in TC1, a dashed edge with a single arrowhead denotes an edge that is obtained from TC2 or TC3, solid edges correspond to edges that existed at the previous step

Table 3 A word labelling a path from 0 to each node in the word graphs from Example 7.1; see Fig. 7

7.2 Second version

The purpose of the second part of F1 in the Felsch strategy (applying TC2 to every node and every relation) is to squeeze as much information as possible out of every definition of an edge \((\alpha , a, \beta )\) made in the first part of F1. The implementation of the Felsch strategy in [23] spends most of its time performing TC2. In this section, we propose a means of reducing the number of times TC2 is applied in the Felsch strategy. Roughly speaking, we do this by only applying TC2 to \((u, v) \in R\) and a node \(\alpha \) in \(\Gamma _{i}\) if the path in \(\Gamma _{i}\) starting at \(\alpha \) and labelled by u, or by v, goes through a part of \(\Gamma _{i}\) that has recently changed.

To enable us to do this, we require a new set \(D_i \subseteq N_i \times A\), in addition to the word graph \(\Gamma _{i}\) and equivalence relation \(\kappa _{i}\), at every step of a congruence enumeration. An element \((\alpha , a)\) of \(D_i\) corresponds to a recently defined edge incident to \(\alpha \) and labelled by a. More precisely, we define \(D_1 = \varnothing \) and for \(i\ge 1\) we define

$$\begin{aligned} D_{i+1}:= \{\, (\alpha , a)\in N_i\times A \, :\, (\alpha , a, \beta )\in E_{i + 1}\setminus E_i \,\}. \end{aligned}$$

In order to efficiently use the information recorded in \(D_i\), we require the following definition.

Definition 7.2

If \(\langle A\mid R\rangle \) is a finite monoid presentation, then we define the Felsch tree \(F(A, R)\) of this presentation to be a pair \((\Theta , \iota )\) where:

(a) \(\Theta = (N, E)\) is a graph with nodes N consisting of every (contiguous) subword \(w\in A^ *\) of a word in a relation in R and edges \((u, a, au)\in E\) whenever \(u, au \in N\) and \(a\in A\); and

(b) \(\iota \) is a function from N to the power set \({\mathfrak {P}}(R)\) of R such that \((u, v)\in (w)\iota \) whenever \(w\in N\) is a prefix of u or v.

Fig. 8 The Felsch tree \(F(A, R)\) from Example 7.3

Example 7.3

If \(A = \{a, b\}\) and \(R = \{(a ^ 4, a), (b ^ 3, b), ((ab) ^ 2, a ^ 2)\}\), then the nodes in the word graph \(\Theta \) in \(F(A, R)\) are

$$\begin{aligned} N = \{\varepsilon , a, b, a ^ 2, ab, ba, b ^ 2, a ^ 3, aba,bab, b ^ 3, a ^4, (ab) ^ 2 \}. \end{aligned}$$

A diagram of \(\Theta \) can be found in Fig. 8.

The function \(\iota : N \longrightarrow {\mathfrak {P}}(R)\) is:

$$\begin{aligned} (\varepsilon )\iota= & {} R, \\ (a)\iota = (a ^ 2)\iota= & {} \{(a ^ 4, a), ((ab) ^ 2, a ^ 2)\}\\ (a ^ 3)\iota = (a ^ 4)\iota= & {} \{(a ^ 4, a)\}\\ (aba)\iota= & {} \{((ab) ^ 2, a ^ 2)\}\\ (b)\iota = (b ^ 2)\iota = (b ^ 3)\iota= & {} \{ (b ^ 3, b)\}\\ (ab)\iota = ((ab) ^ 2)\iota= & {} \{((ab) ^ 2, a ^ 2)\} \\ (ba)\iota = (bab)\iota= & {} \varnothing . \end{aligned}$$
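Definition 7.2 and Example 7.3 can be reproduced with the following short sketch (names ours), in which words are Python strings, the nodes of \(F(A, R)\) are the contiguous subwords of the relation words, and \((w)\iota \) is the set of relations one of whose sides has w as a prefix.

```python
def felsch_tree(A, R):
    words = {w for pair in R for w in pair}
    # Nodes: every contiguous subword (including the empty word) of a relation word.
    nodes = {w[i:j] for w in words
             for i in range(len(w) + 1) for j in range(i, len(w) + 1)}
    # Edges (u, a, au) whenever u and au are both nodes.
    edges = {(u, a, a + u) for u in nodes for a in A if a + u in nodes}
    # iota(w): the relations (u, v) such that w is a prefix of u or of v.
    iota = {w: {(u, v) for (u, v) in R if u.startswith(w) or v.startswith(w)}
            for w in nodes}
    return nodes, edges, iota

# The data of Example 7.3: A = {a, b}, R = {(a^4, a), (b^3, b), ((ab)^2, a^2)}.
nodes, edges, iota = felsch_tree("ab", [("aaaa", "a"), ("bbb", "b"), ("abab", "aa")])
print(len(nodes))           # 13 nodes, as listed above
print(sorted(iota["aba"]))  # [('abab', 'aa')], i.e. (aba)iota = {((ab)^2, a^2)}
```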

We modify TC1, TC2, and TC3 so that the set \(D_{i}\) is defined appropriately at every step, and then, roughly speaking, we replace the application of TC2 to every node and every relation in F1 (and the subsequent TC3 in F2) by a backtrack search through \(\Theta \) in \(F(A, R)\) for every pair in \(D_{i}\). We will refer to this as the modified Felsch strategy.

If \(\alpha \) is a node in \(\Gamma _{i}\) and \(v\in A ^ *\) is a node in \(\Theta \), then we perform a backtrack search consisting of the following steps:

PD1.:

Apply TC2 to \(\alpha \) and every relation in \((v)\iota \).

PD2.:

For every edge \((v, b, bv)\) in \(\Theta \) that has not been traversed: for every node \(\beta \) in \(\Gamma _{i}\) such that \((\beta , b, \alpha ) \in E_i\), apply PD1 and PD2 to \(\beta \) and bv.

PD3.:

Repeatedly apply TC3 until \(\kappa \) is trivial.

We will refer to either removing a pair from \(D_i\) or performing the backtrack search as processing a deduction.

The backtrack search is initiated for every \((\alpha , a) \in D_i\). If a is not a node in \(\Theta \), then a does not occur in any relation in R, and so no path in \(\Gamma _{i}\) labelled by u or v for any \((u, v) \in R\), starting at any node of \(\Gamma _{i}\), contains an edge labelled by a. In particular, no such path contains the edge \((\alpha , a, \beta )\in E_{i}\) whose definition caused \((\alpha , a)\) to belong to \(D_{i}\) in the first place. Hence the backtrack search only needs to be performed for those \((\alpha , a)\in D_i\) such that a is a node in \(\Theta \); any other pair can simply be removed from \(D_i\).

We repeatedly apply PD1, PD2, and PD3 starting from every pair in \(D_{i}\). This must terminate eventually because \(\Theta \) is finite and we only apply TC2 and TC3 in PD1, PD2, and PD3.

To show that the modified Felsch strategy is a congruence enumeration, we require the following lemma.

Lemma 7.4

Suppose that \(\alpha \) is a node in \(\Gamma _{j}\) for all \(j\ge i\), and that \((u, v)\in R\) is such that u and v label paths \(P_u\) and \(P_v\) in \(\Gamma _{i}\) starting at \(\alpha \). If \((\beta , a) \in D_{i}\) for some i, and there is an edge \((\beta , a, \gamma )\in \Gamma _i\) in either \(P_u\) or \(P_v\), then \((u, v) \in (\alpha )\pi _{\Gamma _k}\) for some \(k\ge i\).

Proof

Suppose that \(P_u\) consists of the edges \((\mu _1, b_1, \mu _2), \ldots , (\mu _{r}, b_r, \mu _{r + 1})\) where \(\mu _1 = \alpha \) and \(u = b_1\cdots b_r\). Similarly, suppose that \(P_v\) consists of the edges \((\nu _1, c_1, \nu _2), \ldots , (\nu _s, c_s, \nu _{s + 1})\) where \(\nu _1 = \alpha \) and \(v = c_1\cdots c_s\). We may assume that \(\mu _{r + 1} \ne \nu _{s + 1}\) and that there exists \(t\in \{1, \ldots , r\}\) such that \((\mu _t, b_t, \mu _{t + 1}) = (\beta , a, \gamma )\).

Since \((\beta , a)\in D_{i}\), the backtrack search in PD1 and PD2 will be applied to \((\beta , a)\) at some step \(j\ge i\) of the enumeration. This search begins with PD1, where TC2 is applied to \(\beta \) and every relation in \((a)\iota \). At step PD2 we apply PD1 and PD2 to every \(\zeta \in N_i\) and w such that there exists an edge of the form \((a, b, w)\) in \(\Theta \) and an edge \((\zeta , b, \beta ) \in E_i\).

By definition, \(u = b_1\cdots b_{t - 1}a b_{t + 1}\cdots b_r\). If \(m\in \{2, \ldots , t\}\), then, since \((u, v) \in R\), \(b_m\cdots b_{t - 1}a\) and \(b_{m - 1}\cdots b_{t - 1}a\) are nodes of \(\Theta \) and there exists an edge \((b_m\cdots b_{t - 1}a, b_{m - 1}, b_{m - 1}\cdots b_{t - 1}a)\) in \(\Theta \). Also by assumption \((\mu _{m - 1}, b_{m - 1}, \mu _m) \in E_i\) and so \((\mu _{m - 1}, b_{m - 1}\cdots b_{t - 1}a)\) is one of the pairs in PD2 if it is applied to \((\mu _m, b_{m}\cdots b_{t - 1}a)\). If \(m = t\), then \((\mu _m, b_{m}\cdots b_{t - 1}a) = (\mu _t, a) = (\beta , a)\) and hence PD2 is applied to \((\mu _m, b_{m}\cdots b_{t - 1}a)\) for every \(m\in \{1, \ldots , t\}\). In particular, PD1 is applied to \((\mu _1, b_1\cdots b_{t - 1}a) = (\alpha , b_1\cdots b_{t - 1}a)\) at some point. This means that TC2 is applied to \(\alpha \) and every relation in \((b_1\cdots b_{t - 1}a)\iota \). Since \(b_1\cdots b_{t - 1}a = b_1\cdots b_{t}\) is a prefix of u, it follows that \((u, v)\in (b_1\cdots b_{t - 1}a)\iota \) and so TC2 is applied to \(\alpha \) and \((u, v)\) at some step \(j \ge i\). Because u and v were assumed to label paths originating at \(\alpha \), neither TC2(a) nor (b) applies. Additionally, we assumed that \(\mu _{r + 1}\ne \nu _{s + 1}\) and so TC2(c) is applied and \((\mu _{r + 1}, \nu _{s + 1})\in \kappa _{j + 1}\).

As noted above, \(\Theta \) is finite, and PD1 and PD2 are only applied finitely many times before there is an application of PD3. If the first application of TC3 (in PD3) after step j occurs at step k, then \(\Gamma _{k + 1}\) is the quotient of \(\Gamma _{k}\) by \(\kappa _{k}\) and \(\kappa _{j + 1}\subseteq \kappa _{k}\). Therefore \((u, v)\in (\alpha )\pi _{\Gamma _{k + 1}}\) as required. \(\square \)

Proposition 7.5

If \(\langle A| R\rangle \) is a finite monoid presentation, S is a finite subset of \(A ^ * \times A ^ *\), \(R ^ {\#}\) is the least two-sided congruence on \(A ^ *\) containing R, and \(\rho \) is the least right congruence on \(A ^ *\) containing S and \(R ^ {\#}\), then the modified Felsch strategy applied to \((\Gamma _{1}, \kappa _{1})\), where \(\Gamma _{1} = (N_1, E_1)\) is the trivial word graph and \(\kappa _{1} = \Delta _{N_1}\), is a congruence enumeration.

Proof

The only difference between the modified Felsch strategy and the original Felsch strategy is that, after each edge is defined in F1, TC2 is only applied to particular nodes and relations. Hence, it suffices to show that Definition 3.3(d) holds. Assume \(\alpha \in N_i\) at some step i in the congruence enumeration and let \((u, v) \in R\). In order for the modified Felsch strategy to be a congruence enumeration, we need to show that there exists \(j \ge i\) such that either \(\alpha \notin N_j\) or \((u, v) \in (\alpha )\pi _{\Gamma _j}\). If \(\alpha \notin N_j\) for some \(j > i\), then \(\alpha \notin N_k\) for all \(k \ge j\) (since new nodes introduced in TC1 are larger than all previous nodes). Hence it suffices to prove that if \(\alpha \) is a node of \(\Gamma _j\) for all \(j \ge i\), then there exists some k such that \((u,v) \in (\alpha )\pi _{\Gamma _k}\). Since F1 is repeatedly applied, there exists a step \(k + 1\) at which \(\alpha \) is a node in \(\Gamma _{k + 1}\) and paths \(P_u\) and \(P_v\) leaving \(\alpha \) labelled by u and v, respectively, exist in \(\Gamma _{k + 1}\). Suppose that \(k\in {\mathbb {N}}\) is the least value such that this holds. Then, by Lemma 7.4, it suffices to show that there exists an edge \((\beta , a, \gamma )\) in either \(P_u\) or \(P_v\) that belongs to \(E_{k + 1}\setminus E_k\) so that \((\beta , a)\in D_{k + 1}\). If every edge in \(P_u\) and \(P_v\) belongs to \(E_k\), then u and v both label paths in \(\Gamma _{k}\) starting at \(\alpha \), which contradicts the assumed minimality of k. \(\square \)

8 Implementation issues

In this section, we briefly address some issues relating to any implementation of the Todd–Coxeter algorithm for semigroups and monoids as described herein.

In some examples, it can be observed that the HLT strategy defines many more nodes than the Felsch strategy. One possible antidote to this is to, roughly speaking, perform

“periods of definition à la HLT [that] alternate with periods of intensive scan à la Felsch” [25, p. 14].

In both ACE [14] and libsemigroups [24] it is possible to specify the precise lengths of the periods of applications of HLT and Felsch. As might be expected, some choices work better than others in particular examples, and this is difficult (or impossible) to predict in advance. It is routine to show that alternating between HLT and Felsch in this way still meets the definition of a congruence enumeration given in Definition 3.3. Although we presented the HLT and Felsch strategies separately, it seems that some combination of the two sometimes offers better performance.

The next issue is: how to represent the equivalence relations \(\kappa _i\)? A method suggested in [31, Section 4.6], which is now a standard approach for representing equivalence relations, is to use the disjoint-sets data structure to represent the least equivalence relation containing the pairs \((u, v)\) added to \(\kappa _i\) in TC2(c) or TC3. The amortised time complexity of updating the data structure, either to merge two classes or to find a canonical representative of a class given another representative, is \(O(\alpha (m))\) per operation, and the data structure requires O(m) space, where m is the number of elements in the underlying set and \(\alpha \) is the inverse Ackermann function; see [34].
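For concreteness, the following is a standard disjoint-sets sketch with path compression and union by rank (the variant analysed in [34]); it is generic, not the data structure used in libsemigroups [24]. In a congruence enumeration one often prefers to keep the minimum node of a class as its representative, as in the examples above, in which case union by rank is not used.

```python
class DisjointSets:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Canonical representative of the class of x, with path compression.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        # Merge the classes of x and y (union by rank); returns the representative.
        x, y = self.find(x), self.find(y)
        if x == y:
            return x
        if self.rank[x] < self.rank[y]:
            x, y = y, x
        self.parent[y] = x
        if self.rank[x] == self.rank[y]:
            self.rank[x] += 1
        return x

ds = DisjointSets(5)
ds.union(0, 3)
ds.union(3, 4)
print(ds.find(4) == ds.find(0))   # True: 0, 3, and 4 are now in one class
```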

Another issue that arises in the implementation is how to represent the word graphs \(\Gamma _i\). In order to efficiently obtain the word graph \(\Gamma _{i + 1}\) from \(\Gamma _i\) in TC3 it is necessary to keep track of both the edges with a given source node and the edges with a given target. This is more complex for semigroups and monoids than for groups, because a word graph \(\Gamma =(N, E)\) output by a coset enumeration for a group has the property that for every \(\beta \in N\) and every \(a \in A\) there is exactly one \(\alpha \in N\) such that \((\alpha , a, \beta )\) is an edge. As such, if it ever arises that there are edges \((\alpha _1, a, \beta )\) and \((\alpha _2, a, \beta )\) in a word graph during a coset enumeration, the pair \((\alpha _1, \alpha _2)\) can immediately be added to \(\kappa _i\). It is therefore possible to arrange that every word graph arising during a coset enumeration has the property that \(|\{\, \alpha \in N \, :\, (\alpha , a, \beta )\in E \,\}| = 1\), which simplifies the data structure required to represent such a graph.

In contrast, if \(\Gamma = (N, E)\) is a word graph arising in a congruence enumeration for a monoid, then \(|\{\, \alpha \in N \, :\, (\alpha , a, \beta )\in E \,\}|\) can be as large as |N|. In practice, in TC3 pairs of nodes belonging to \(\kappa _i\) are merged successively. A balance must be struck between repeatedly updating the data structure for the edges with a given target in \(\Gamma _{i + 1}\) and only retaining the edges with a given source and rebuilding the data structure for the target edges later in the process. The former works better if the number of nodes in \(\Gamma _i\) is comparable to the number in \(\Gamma _{i + 1}\), i.e. only a relatively small number of nodes are merged. On the other hand, if \(\Gamma _{i + 1}\) is considerably smaller than \(\Gamma _i\), then it can be significantly faster to do the latter.

Depending on the sequence of applications of TC1, TC2, and TC3 in two successful congruence enumerations with the same input, the output word graphs may not be equal. However, the output word graphs are unique up to isomorphism. Standardization is a process for transforming a word graph into a standard form. To discuss this we require the following notion.

If A is any alphabet and \(\preceq \) is a total order on \(A ^ *\), then we say that \(\preceq \) is a reduction ordering if \(\preceq \) has no infinite descending chains and if, whenever \(u\preceq v\) for some \(u, v\in A ^ *\), also \(puq \preceq pvq\) for all \(p, q\in A ^ *\). It follows from this definition that \(\varepsilon \) is the \(\preceq \)-minimum word in \(A ^ *\) for every reduction ordering on \(A ^ *\).

If the set A is totally ordered by \(\le \), then we may extend \(\le \) to \(\le _{\text {lex}}\) over \(A^*\), by defining \(\varepsilon \le _{\text {lex}} w\) for all \(w\in A ^*\) and \(u\le _{\text {lex}} v\) whenever \(u = au_1\) and \(v = bv_1\) for some \(a, b\in A\) with \(a<b\), or \(a=b\) and \(u_1 \le _{\text {lex}} v_1\). This order is usually referred to as the lexicographic order on \(A ^ *\). Note that lexicographic order is not a reduction ordering. The short-lex order \(\le _{\text {slex}}\) on \(A ^ *\) is defined as follows: if \(u, v \in A ^*\), then \(u \le _{\text {slex}} v\) if \(|u| < |v|\) or \(|u| = |v|\) and \(u \le _{\text {lex}} v\). It is straightforward to verify that the short-lex order on \(A ^ *\) is a reduction ordering. Further examples of reduction orderings on \(A ^ *\) include recursive path descent, as well as the wreath product of any finite collection of reduction orderings; see [31, Section 2.1] for further details.
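For example, under the assumption that words are Python strings and that the generators are ordered by their position in the string A, the short-lex order can be realised as a sort key as follows (a small sketch of ours).

```python
def shortlex_key(A):
    # Key for sorting words by short-lex: first by length, then lexicographically
    # with the letters ordered as in A.
    rank = {a: i for i, a in enumerate(A)}
    return lambda w: (len(w), [rank[a] for a in w])

words = ["ba", "a", "", "ab", "b", "aa"]
print(sorted(words, key=shortlex_key("abc")))  # ['', 'a', 'b', 'aa', 'ab', 'ba']
```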

Suppose that \(\preceq \) is a reduction ordering on \(A ^ *\). We will say that the word graph \(\Gamma = (N, E)\) is standardized with respect to \(\preceq \) if \(\alpha < \beta \) if and only if \(w_{\alpha } \prec w_{\beta }\) for any \(\alpha , \beta \in N\) where \(w_{\alpha }, w_{\beta }\in A ^ *\) are the \(\preceq \)-minimum words labelling \((0, \alpha )\)- and \((0, \beta )\)-paths, respectively. Any process that transforms a word graph \(\Gamma \) into a standardized word graph is referred to as standardization; for example, see STANDARDIZE in [31, p. 195].
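The following is a sketch (ours, not the STANDARDIZE procedure of [31]) of standardization with respect to the short-lex order for a complete deterministic word graph in which every node is reachable from 0, stored as in the earlier sketches. Nodes are renumbered in the order in which they are first reached by a breadth-first search from 0 that explores the letters in increasing order; with this exploration order, the nodes are discovered in the short-lex order of their minimal representative words.

```python
def standardize(graph, A):
    # graph[n][a] = m encodes the edge (n, a, m); returns the relabelled graph.
    order = {0: 0}                      # old node -> new node
    queue = [0]
    for n in queue:                     # breadth-first, letters in increasing order
        for a in A:
            m = graph[n][a]
            if m not in order:
                order[m] = len(order)
                queue.append(m)
    return [{a: order[graph[old][a]] for a in A}
            for old in sorted(order, key=order.get)]

g = [{"a": 2, "b": 1}, {"a": 1, "b": 1}, {"a": 0, "b": 2}]
print(standardize(g, "ab"))  # [{'a': 1, 'b': 2}, {'a': 0, 'b': 1}, {'a': 2, 'b': 2}]
```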

A word graph \(\Gamma _i\) can be replaced by an isomorphic standardized word graph at any step of a congruence enumeration, provided that the values in \(\kappa _i\) and \(D_i\) are also updated accordingly. In particular, standardization can be applied repeatedly during a congruence enumeration, or only at the end. Standardization during a congruence enumeration can be very costly in the context of semigroups and monoids, but it can also be somewhat beneficial in some examples.

The order of the definitions of new nodes in \(\Gamma _{i + 1}\) in both HLT1 and F1 depends on the numerical values of their source nodes. Hence replacing \(\Gamma _i\) by a standardized word graph can change the order of these definitions, which in turn can influence the number of steps in the enumeration. In an actual implementation, standardizing a word graph is a rather complicated process that involves applying a permutation to the data structure representing the graph.

9 Further variants

In this section, we present some variants of the Todd–Coxeter algorithm that appear in the literature, which are used to compute special types of congruences (namely Rees congruences) and congruences on finitely presented inverse monoids.

9.1 Monoids with zero

The first such variant is for finitely presented monoids with a zero element 0. For example, if M is the monoid defined by the presentation

$$\begin{aligned} \langle a, b, 0\mid ab = 0, a ^ 4 = a, b ^ 3 = b, (ab) ^ 2 = 0, a0=0a=0=b0=0b=0 ^2\rangle , \end{aligned}$$

then the relations \(a0=0a=0=b0=0b=0 ^2\) indicate that 0 is a zero element of M. Of course, the algorithms described above can be applied to this finite monoid presentation, as well as to every other. On the other hand, the inclusion of the relations \(a0=0a=0=b0=0b=0 ^2\) is rather cumbersome, and so we might prefer to write:

$$\begin{aligned} \langle a, b\mid ab = 0, a ^ 4 = a, b ^ 3 = b, (ab) ^ 2 = 0\rangle \end{aligned}$$

where the relations \(a0=0a=0=b0=0b=0 ^2\) are implicit. This is directly analogous to the implicit relations for the identity in a monoid presentation, and to the implicit relations for inverses in a group presentation. We refer to such a presentation as a finite monoid-with-0 presentation. Both the HLT and Felsch strategies can be adapted for monoid-with-0 presentations without much difficulty as follows.

Suppose that \(\langle A\mid R\rangle \) is a finite monoid-with-0 presentation. We refer to a word graph \(\Gamma _i = (N_i, E_i)\) over \(A \cup \{0\}\) as a word graph-with-0 if it has a distinguished node \(\omega \in N_i\) such that \((\alpha , 0, \omega )\in E_i\) for all \(\alpha \in N_i\) and the only edges with source \(\omega \) are the loops \((\omega , a, \omega ) \in E_i\) for \(a\in A\cup \{0\}\). We augment TC1 with the following step:

Z::

If \(\beta \) is the new node introduced in TC1, then we define the edge \((\beta , 0, \omega )\).

Given a monoid-with-0 presentation, it is routine to verify that if we perform any congruence enumeration (where TC1 is augmented with Z) with input \((\Gamma _1, \kappa _1)\) where \(\Gamma _1 = (N_1, E_1)\) is a word graph-with-0 and \(\kappa _1 = \Delta _{N_1}\), then the conclusions in Corollary 4.2 still hold.

9.2 Rees congruences

Following [30, Chapter 12], we may extend the discussion of the previous section to obtain a procedure for enumerating a left, right, or two-sided Rees congruence on a finitely presented monoid (with or without a zero element). If I is a left, right, or two-sided ideal of the monoid S, then the Rees congruence associated with I is the congruence \(\Delta _{S} \cup (I\times I)\). Such a procedure applies to a finite monoid presentation \(\langle A\mid R \rangle \) and a finite set \(S \subseteq A ^ * \times \{0\}\) (rather than \(S \subseteq A ^ * \times A ^ *\) as in Definition 3.3). The input to such an enumeration is a word graph-with-0 \(\Gamma _1 = (N_1, E_1)\) and \(\kappa _1 = \Delta _{N_1}\). The first steps are identical to those given in Definition 3.3(a) and (b) except that TC1 and Z are applied in part (a). The subsequent steps are just any sequence of applications of TC1+Z, TC2, and TC3 satisfying Definition 3.3(c), (d) and (e). It follows immediately from Theorem 4.1, and the validity of congruence enumeration for monoid-with-0 presentations, that this process is valid.

9.3 Stephen’s procedure

Another variant of the Todd–Coxeter algorithm is that of Stephen [32, Chapter 4], mentioned in the introduction of the current article. Note that a similar method for constructing the Cayley graphs of groups was described by Dehn [6]. Suppose that M is the monoid defined by a finite monoid presentation \(\langle A\mid R\rangle \) and that \(\Gamma =(N, E)\) is the right Cayley graph of M with respect to A. If \(w\in A ^ *\) is arbitrary and w labels a \((0, \alpha )\)-path in \(\Gamma \) for some \(\alpha \in N\), then the aim of this variant is to output the subgraph \(\Lambda \) of \(\Gamma \) induced by the set X of nodes in N from which \(\alpha \) is reachable. Note that the set of these nodes corresponds to the set of elements of M which are \(\ge _{{\mathcal {R}}} w/{R^{\#}}\) (recall that two monoid elements \(s, t \in M\) satisfy \(s \ge _{{\mathcal {R}}} t\) if \(tM \subseteq sM\)). If \({\mathcal {A}}\) is the automaton with alphabet A, state set X, initial state 0, accept state \(\alpha \), and edges consisting of those in \(\Lambda \), then the language \(L({\mathcal {A}})\) accepted by \({\mathcal {A}}\) is the set of words \(v\in A ^ *\) that represent the same element of M as w (i.e. \(v/R ^ {\#} = w / R ^ {\#}\)). As such, if the (as yet to be described) procedure terminates, then its output allows us to decide the word problem for w in M.

Suppose that \(w = a_1 \cdots a_n\in A ^ *\) for some \(a_1, \ldots , a_n \in A\), that \(\Gamma _1 = (N_1, E_1)\) is the trivial word graph, and \(\kappa _1=\Delta _{N_1}\). A special case of Stephen’s procedure consists of the following steps described in terms of TC1, TC2, and TC3. The first step is always:

S1::

TC1 is applied to 0 and \(a_1\), then to i and \(a_{i + 1}\) for every i such that \(1\le i \le n - 1\). The resulting word graph consists of a single path from 0 to the node n, and is referred to as the linear graph of w in [32].

S1 is then followed by any sequence of the following steps:

S2::

At step i, suppose that the word graph \(\Gamma _i = (N_i, E_i)\) contains a path with source \(\alpha \in N_i\) labelled u for some \((u, v) \in R\). If \(v = v_1b\) where \(v_1\in A ^ *\) and \(b\in A\), then TC1 is applied until there is a path with source \(\alpha \) labelled by \(v_1\) and then TC2 is applied to \(\alpha \) and \((u, v)\in R\). (This is referred to as an elementary expansion in [32].)

S3::

Apply TC3. (Quotienting \(\Gamma _i\) by the least equivalence containing a single \((\alpha , \beta ) \in N_i \times N_i\) is referred to as a determination in [32]. In TC3 we quotient \(\Gamma _i\) by the entire equivalence \(\kappa _i\); this is the only point where the procedure described here differs from the description in [32].)

It is shown in [32] that if any sequence of S2 and S3 has the property that after finitely many steps any subsequent applications of S2 and S3 result in no changes to the output (i.e. \((\Gamma _{i + 1}, \kappa _{i + 1}) = (\Gamma _{i}, \kappa _{i})\)), then \(\Gamma _i\) is isomorphic to the induced subgraph \(\Lambda \) defined at the start of this section. Note that S2 and S3 are similar to HLT1, HLT2, and HLT3 described in Sect. 6. While it is not possible to use congruence enumeration, at least as described in this paper, to solve the word problem when the monoid M defined by a presentation \(\langle A\mid R \rangle \) is infinite, it is possible to decide whether or not two words \(u, v\in A ^ *\) represent the same element of M using the procedure defined in this section, whenever the induced subgraph \(\Lambda \) is finite. Since the set of nodes in \(\Lambda \) corresponds to the set of elements in M which are \(\ge _{{\mathcal {R}}} u/R^{\#}\) (assuming that u is the input word for Stephen’s procedure) the word graph \(\Lambda \) is finite when there are only finitely many elements of M that are \(\ge _{{\mathcal {R}}} u/R^{\#}\).
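The special case of Stephen's procedure described in S1–S3 can be sketched as follows, again reusing the hypothetical Enumeration class from the sketch in Sect. 6 (constructed with no extra pairs); both orientations of each relation are handled, and, as discussed above, the procedure need not terminate when \(\Lambda \) is infinite.

```python
def stephen(e, w):
    # e is an Enumeration over the presentation with e.S empty; w is a word.
    alpha = e.trace(0, w, True)          # S1: the linear graph of w
    changed = True
    while changed:
        changed = False
        for n in range(len(e.edges)):
            if e.find(n) != n:
                continue
            for u, v in e.R:
                tu, tv = e.trace(n, u, False), e.trace(n, v, False)
                if (tu is None) != (tv is None):
                    e.push(n, u, v)      # S2: an elementary expansion
                    changed = True
                elif tu is not None and tu != tv:
                    e.queue.append((tu, tv))
                    changed = True
            e.process_coincidences()     # S3: determinations
    # On termination, the words equal to w in M are exactly those labelling a
    # path from node 0 to the returned node.
    return e.find(alpha)
```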