The study of algorithmic properties of algebraic structures has a rich and fascinating history, linking with a vast number of different areas of mathematics. One of the central problems for an algebraic structure is the word problem. This is the problem of, given a presentation of the structure in terms of generators and relations, deciding whether or not two given words over the generators represent the same element of the structure. This problem can be dated back to the early 20th century in the investigations by Dehn and Thue [21, 72]. Natural algebraic structures to investigate with respect to this problem are monoids, i.e. semigroups with an identity element. A special monoid is one which admits a presentation in which all the defining relations are of the form \(w = 1\). Special monoids were first defined, and given their name, in 1958 by Tseitin [73, p. 178], although special monoids with a single defining relation \(w=1\) had implicitly already been studied in 1914 by Thue [72, Sects. 3 and 5–8]. Indeed, [72, Problem II] asks precisely for a solution to the word problem for special one-relation monoids \(\text {Mon} \langle A \mid w=1 \rangle \), and the remainder of Thue’s article deals with solving special cases of precisely this problem. Thus special monoids form one of the cornerstones of combinatorial semigroup theory.

The first structured approach to special monoids, however, was due to Adian and his student Makanin in the 1960s [1,2,3, 51, 52]. The usage of the term special for such monoids was subsequently exported beyond the Soviet Union; it appears to have been first introduced to the French school by Lallement [43, p. 371], and to the German by Jantzen [37], both of whom explicitly borrow the phrase from Adian [3]. Other terms in use, before special became standard, were unitary, used by e.g. Book [10], and trivial, used by e.g. Cochet [20]; the usage of this latter term is (justifiably) derided by Jantzen [37, p. 72]. A rewriting of Adian’s and Makanin’s proofs and results into the language of rewriting systems was later made by Zhang in the early 1990s. Other authors have also investigated special monoids, see e.g. [8, 39, 40].

Arguably the most important result to date on special monoids is the following, sometimes referred to as the Adian-Makanin Theorem: the group of units of a k-relation special monoid is a k-relator group, and a finitely presented special monoid has decidable word problem if and only if its group of units does [1, 3, 52]. An example of an immediate corollary to this result is that the word problem for any special monoid with a single defining relation \(w=1\) is decidable, using Magnus’ solution to the word problem for one-relator groups [49]. Hence, given a special monoid M, the group of units U(M) plays a key rôle in understanding the monoid. We shall presently see this theme reinforced in this article.

From another direction, the methods of formal language theory have been highly successful in applications to group theory. This was initiated in 1971 by Anīsīmov [5], and considers, for a group G generated by a set A, the set of all words over A which represent the identity element. The realisation that the language-theoretic properties of this set of words can reveal structural information about the group in question unlocked an entirely new angle of approach to group theory. The set is commonly referred to as the word problem for the group, where the definite article is supported by the fact that the language-theoretic properties of this set do not generally depend on the generating set chosen for the group. Anīsīmov showed that the word problem of a group is a regular language if and only if the group is finite. Further, he showed that the class of groups with context-free word problem is closed under free products, and that it does not contain the group \(\mathbb {Z}^2\). He later proved that every context-free group has the Howson property [6]. A full and striking characterisation of the class of context-free groups was then provided: a finitely generated group has context-free word problem if and only if it is virtually free. This was proved in 1983 by Muller and Schupp [54] (when supplemented by a result of Dunwoody [24]). This famous result, the Muller-Schupp theorem, is now the basis of a large literature and much active research, see e.g. [7, 22, 29, 30].

With the successes of language-theoretic techniques applied to the theory of groups evident, efforts were made to translate the relevant definitions and results to semigroup theory. In 2004, Duncan and Gilman [23] defined a language which provides an elegant analogy for monoids of the word problem for groups, and which takes into account the fact that the set of words representing the identity element in a monoid need not be particularly enlightening. The language they defined, also referred to as the word problem, enjoys the same types of invariance under choice of generating set as in the group case, and the analogy of Anīsīmov’s result remains true: a monoid has regular word problem if and only if it is finite [23, p. 523]. In view of the Muller-Schupp theorem, a natural question is thus characterising which monoids have context-free word problem; this was the first question asked by Duncan and Gilman about their definition of the word problem [23, Question 4]. This remains a wide open problem.

The goal of this article is to investigate (and fully answer) Duncan and Gilman’s question in the setting of special monoids. More generally, we will ask:

Question 1

Let \(\mathbf {C}\) be a class of languages. What is the algebraic structure of a finitely presented special monoid with word problem in \(\mathbf {C}\)?

Duncan and Gilman’s question is thus Question 1 for \(\mathbf {C}= \mathbf {CF}\), the class of context-free languages. In this article, we will answer Question 1 completely for \(\mathbf {C}= \mathbf {CF}\), but also for much more general classes \(\mathbf {C}\), namely when \(\mathbf {C}\) is a super-\({{\,\mathrm{AFL}\,}}\) (in the sense of Greibach [33]) closed under reversal. Specifically, we have the following main theorem:

Theorem A

Let M be a finitely presented special monoid. Let \(\mathbf {C}\) be a super-\({{\,\mathrm{AFL}\,}}\) closed under reversal. Then M has word problem in \(\mathbf {C}\) if and only if its group of units U(M) has word problem in \(\mathbf {C}\).

We deduce the following (Corollary 3.1) by applying the Muller-Schupp theorem:

Corollary

Let M be a finitely presented special monoid. Then M has context-free word problem if and only if the group of units U(M) is virtually free.

As every group is a special monoid, this is a strict generalisation of the Muller-Schupp theorem for groups. Using Theorem A, we can deduce many language-theoretic properties of special monoids. First, we prove a result (Theorem 3.2) which we do not state in full generality here, but which has as a special case:

Theorem

Let M be a finitely presented special monoid. Then M has context-free word problem if and only if the set of all words representing the identity element of M is a context-free language.

That is, the usual group-theoretic approach of considering the language of words equal to the identity element is also applicable to special monoids. Using this, we completely answer a question first posed in 1992 by Zhang. We also deduce some results regarding decision problems; for example, we prove (Theorem 3.5) that for a special monoid with virtually free group of units, the rational subset membership problem is decidable. As another application, we find the second main theorem of this article:

Theorem B

Let \(M = \text {Mon} \langle A \mid w=1 \rangle \) be a special one-relation monoid. Then it is decidable whether M has context-free word problem.

In fact, this can be decided in polynomial time. Whether this was the case was asked by Zhang in 1992, and so Theorem B answers this question. This theorem can be seen as a first step in a program to completely classify the language-theoretic properties of one-relation monoids with respect to various classes of languages.

We also investigate Question 1 for some classes of languages \(\mathbf {C}\) which are not super-\({{\,\mathrm{AFL}\,}}\)s. Namely, we consider the case when \(\mathbf {C}\) is the class of regular languages or the class of deterministic context-free languages; we (easily) classify the special monoids with regular word problem, and prove that any special monoid with deterministic context-free word problem is cancellative (Proposition 3.7). We end with a discussion of a property which a special monoid presentation may have (being infix), and ask whether every special monoid admits an infix presentation. When the group of units is trivial, we answer this affirmatively (Proposition 3.8).

1 Introduction

We assume the reader is familiar with formal language theory, including regular and context-free languages, as well as some elementary properties of codes. For some background on this, and other topics in formal language theory, we refer the reader to standard books on the subject [9, 34, 36]. The paper also assumes familiarity with the basics of the theory of monoid and group presentations, which will be written as \(\text {Mon} \langle A \mid R \rangle \) and \(\text {Gp} \langle A \mid R \rangle \), respectively. In particular, we assume the reader is familiar with the word problem as a decision problem. For further background information and examples of this theory, see e.g. [3, 17, 48, 50, 57].

1.1 Basic notation

Let A be a finite alphabet, and let \(A^*\) denote the free monoid on A, with identity element denoted \(\varepsilon \) or 1, depending on the context. Let \(A^+\) denote the free semigroup on A, i.e. \(A^+ = A^*\setminus \{ \varepsilon \}\). For \(u, v \in A^*\), by \(u \equiv v\) we mean that u and v are the same word. For \(w \in A^*\), we let |w| denote the length of w, i.e. the number of letters in w. We have \(|\varepsilon | = 0\). If \(w \equiv a_1 a_2 \cdots a_n\) for \(a_i \in A\), then we let \(w^\text {rev}\) denote the reverse of w, i.e. the word \(a_n a_{n-1} \cdots a_1\). Note that \(^\text {rev}:A^*\rightarrow A^*\) is an anti-homomorphism, i.e. \((uv)^\text {rev}\equiv v^\text {rev}u^\text {rev}\) for all \(u, v \in A^*\). If \(X \subseteq A^*\), then we let \(X^\text {rev}= \{ x^\text {rev}\mid x \in X \}\). If the words \(u, v \in A^*\) are equal in the monoid \(M = \text {Mon} \langle A \mid R \rangle \), then we denote this \(u =_M v\). For a monoid M generated by a finite set A, by which we mean there exists a surjective homomorphism \(\pi :A^*\rightarrow M\), we define, for \(w \in A^*\), the set

$$\begin{aligned} {{\,\mathrm{Rep}\,}}_A^M(w) := \{ u \in A^*\mid u =_M w \}. \end{aligned}$$

This set \({{\,\mathrm{Rep}\,}}_A^M(w) \subseteq A^*\) is the set of representatives of the element \(\pi (w)\). Finally, for \(X \subseteq A^*\) we let \(\langle X \rangle \) denote the submonoid of M generated by X, i.e. \(\pi (X^*)\).

We give some notation for rewriting systems. For an in-depth treatment and some terminology, see e.g. [10, 12, 38]. A rewriting system R on A is a subset of \(A^*\times A^*\). An element of R is called a rule. If \((u, v) \in R\), we will sometimes denote this \((u \rightarrow v) \in R\). The system R induces several relations on \(A^*\). We will write \(u \xrightarrow {}_{R} v\) if there exist \(x, y \in A^*\) and a rule \((\ell , r) \in R\) such that \(u \equiv x\ell y\) and \(v \equiv xry\). We let \(\xrightarrow {*}_{R}\) denote the reflexive and transitive closure of \(\xrightarrow {}_{R}\). We denote by \(\xleftrightarrow {*}_{R}\) the symmetric, reflexive, and transitive closure of \(\xrightarrow {}_{R}\). The relation \(\xleftrightarrow {*}_{R}\) defines the least congruence on \(A^*\) containing R. For \(X \subseteq A^*\), we let \(\langle X \rangle _R\) denote the set of ancestors of X, i.e. \(\langle X \rangle _R = \{ w \in A^*\mid \exists x \in X \text { such that } w \xrightarrow {*}_{R} x \}\). The monoid \(\text {Mon} \langle A \mid R \rangle \) is identified with the quotient \(A^*/\xleftrightarrow {*}_{R}\). Let \(u, v \in A^*\) and let \(n \ge 0\). If there exist words \(u_0, u_1, \dots , u_n \in A^*\) such that

$$\begin{aligned} u \equiv u_0 \xrightarrow {}_{R} u_1 \xrightarrow {}_{R} \cdots \xrightarrow {}_{R} u_{n-1} \xrightarrow {}_{R} u_n \equiv v, \end{aligned}$$

then we denote this \(u \xrightarrow {}_{R}^n v\), i.e. u rewrites to v in n steps. Thus \(\xrightarrow {*}_{R} = \bigcup _{n \ge 0} \xrightarrow {}_{R}^n\).
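As an illustration of these definitions, the following Python sketch computes one-step and exactly-n-step rewriting for a finite rewriting system given as a list of pairs of strings. The function names `one_step` and `n_steps` and the encoding of words as Python strings are assumptions made for this sketch only; they do not appear in the cited literature.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, str]  # a rule (l, r), read as l -> r


def one_step(u: str, rules: List[Rule]) -> Set[str]:
    """Return all v such that u -> v in one step.

    That is, u = x l y and v = x r y for some rule (l, r) and words x, y.
    """
    out = set()
    for l, r in rules:
        for i in range(len(u) - len(l) + 1):
            if u.startswith(l, i):
                out.add(u[:i] + r + u[i + len(l):])
    return out


def n_steps(u: str, rules: List[Rule], n: int) -> Set[str]:
    """Return all v such that u rewrites to v in exactly n steps."""
    frontier = {u}
    for _ in range(n):
        # Apply one rewriting step to every word reached so far.
        frontier = set().union(*(one_step(w, rules) for w in frontier))
    return frontier


if __name__ == "__main__":
    R = [("bc", "")]  # the single special rule bc -> empty word
    print(one_step("bbcc", R))
    print(n_steps("bbcc", R, 2))
```

Taking the union of `n_steps(u, R, n)` over all n gives the set of descendants of u; the set of ancestors \(\langle X \rangle _R\) is obtained dually and is in general infinite, so it is not enumerated here.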

A rewriting system \(R \subseteq A^*\times A^*\) is said to be monadic if \((u, v) \in R\) implies \(|u| \ge |v|\) and \(v \in A \cup \{ \varepsilon \}\). We say that R is special if \((u, v) \in R\) implies \(v \equiv \varepsilon \). Every special system is monadic. Let \(\mathbf {C}\) be a class of languages. A monadic rewriting system R is said to be a \(\mathbf {C}\)-rewriting system if for every \(a \in A \cup \{ \varepsilon \}\), the language \(\{ u \mid (u, a) \in R\}\) is in \(\mathbf {C}\). Thus, we may speak of e.g. monadic \(\mathbf {C}\)-rewriting systems or monadic context-free rewriting systems.

Let G be a group with finite (group) generating set A, with \(A^{-1}\) the set of inverses of the generators A. The language

$$\begin{aligned} {{\,\mathrm{IP}\,}}_{A \cup A^{-1}}^G := \{ w \mid w \in (A \cup A^{-1})^*, w =_G 1 \} \end{aligned}$$

is called the (group-theoretic) word problem for G, see e.g. [5, 54]. Let M be a monoid with a finite generating set A. Translating the above definition of the word problem directly to M does not, in general, yield much insight into the structure of M. Duncan and Gilman [23, p. 522], realising this, introduced a different language. The (monoid) word problem of M with respect to A is defined as the language

$$\begin{aligned} {{\,\mathrm{WP}\,}}_A^M := \{ u \# v^\text {rev}\mid u, v \in A^*, u =_M v\}, \end{aligned}$$

where \(\#\) is some fixed symbol not in A. For a class of languages \(\mathbf {C}\), we say that M has \(\mathbf {C}\)-word problem if \({{\,\mathrm{WP}\,}}_A^M\) is in \(\mathbf {C}\). If \(\mathbf {C}\) is closed under inverse homomorphism, then M having \(\mathbf {C}\)-word problem does not depend on the finite generating set A chosen for M [23, Theorem 5.2]. Furthermore, if M is a group, then M has group-theoretic word problem in \(\mathbf {C}\) if and only if \({{\,\mathrm{WP}\,}}_A^M\) is in \(\mathbf {C}\) [23, Theorem 3].
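To make the shape of the language \({{\,\mathrm{WP}\,}}_A^M\) concrete, here is a minimal Python sketch of a membership test. It assumes that equality in M is available as a function `equal(u, v)`; this oracle, and the names `in_word_problem` and `equal_mod2`, are hypothetical and introduced only for illustration. The example monoid is the cyclic group of order two, presented as \(\text {Mon} \langle a \mid a^2 = 1 \rangle \), where two words are equal exactly when their lengths have the same parity.

```python
def in_word_problem(word, alphabet, equal, sep="#"):
    """Decide whether `word` has the form u # v^rev with u =_M v.

    `equal(u, v)` is an assumed oracle deciding equality in M.
    """
    if word.count(sep) != 1:
        return False
    u, v_rev = word.split(sep)
    if any(ch not in alphabet for ch in u + v_rev):
        return False
    # Undo the reversal of the second half before comparing.
    return equal(u, v_rev[::-1])


def equal_mod2(u, v):
    """Equality in Mon<a | aa = 1>: compare lengths modulo 2."""
    return (len(u) - len(v)) % 2 == 0


if __name__ == "__main__":
    print(in_word_problem("aaa#a", "a", equal_mod2))
    print(in_word_problem("aa#a", "a", equal_mod2))
```

In this small example the language is \(\{ a^i \# a^j \mid i \equiv j \bmod 2 \}\), which is regular, in line with the fact that finite monoids are exactly those with regular word problem.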

1.2 Monadic ancestry and super-\({{\,\mathrm{AFL}\,}}\)s

Many of our theorems will be stated for a special type of classes of languages. Such classes of languages are called super-\({{\,\mathrm{AFL}\,}}\)s, and were introduced by Greibach [33].

We follow Book, Jantzen and Wrathall [11] in the following definitions. Let A be an alphabet. For each \(a \in A\), let \(\sigma (a)\) be a language (over any finite alphabet); let \(\sigma (\varepsilon ) = \{ \varepsilon \}\); for every \(x, y \in A^*\) let \(\sigma (xy) = \sigma (x)\sigma (y)\); and for every \(L \subseteq A^*\), let \(\sigma (L) = \bigcup _{w\in L} \sigma (w)\). We then say that \(\sigma \) is a substitution. For a class \(\mathbf {C}\) of languages, if for every \(a \in A\) we have \(\sigma (a) \in \mathbf {C}\), then we say that \(\sigma \) is a \(\mathbf {C}\)-substitution. Let A be an alphabet, and \(\sigma \) a substitution on A. For every \(a \in A\), let \(A_a\) denote the smallest finite alphabet such that \(\sigma (a) \subseteq A_a^*\). Extend \(\sigma \) to \(A \cup (\bigcup _{a \in A}A_a)\) by defining \(\sigma (b) = \{ b \}\) whenever \(b \in (\bigcup _{a \in A} A_a) \setminus A\). For \(L \subseteq A^*\), let \(\sigma ^1(L) = \sigma (L)\), and let \(\sigma ^{n+1}(L) = \sigma (\sigma ^n(L))\) for \(n \ge 1\). Let \(\sigma ^\infty (L) = \bigcup _{n\ge 0} \sigma ^n(L)\). Then we say that \(\sigma ^\infty \) is an iterated substitution. If for every \(b \in A \cup (\bigcup _a A_a)\) we have \(b \in \sigma (b)\), then we say that \(\sigma ^\infty \) is a nested iterated substitution. We say that \(\mathbf {C}\) is closed under nested iterated substitution if for every \(\mathbf {C}\)-substitution \(\sigma \) and every \(L \in \mathbf {C}\), we have: if \(\sigma ^\infty \) is a nested iterated substitution, then \(\sigma ^\infty (L) \in \mathbf {C}\).
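The definitions above can be sketched for finite languages as follows. This is only an illustration under stated assumptions: languages are finite Python sets of strings, a substitution is a dictionary `sigma` from letters to finite sets of words, letters without an entry are fixed (matching the convention \(\sigma (b) = \{ b \}\) for new letters), and `iterated` computes the finite truncation \(\bigcup _{1 \le m \le n} \sigma ^m(L)\) rather than \(\sigma ^\infty (L)\) itself. All function names are hypothetical.

```python
from itertools import product


def substitute_word(w, sigma):
    """sigma(a1 ... an) = sigma(a1) ... sigma(an); sigma(empty word) = {empty word}."""
    pieces = [sigma.get(a, {a}) for a in w]  # letters without an entry are fixed
    return {"".join(choice) for choice in product(*pieces)}


def substitute(language, sigma):
    """sigma(L) is the union of sigma(w) over all w in L."""
    out = set()
    for w in language:
        out |= substitute_word(w, sigma)
    return out


def iterated(language, sigma, n):
    """Union of sigma^m(L) for 1 <= m <= n, a finite truncation of sigma^infinity(L)."""
    result, current = set(), set(language)
    for _ in range(n):
        current = substitute(current, sigma)
        result |= current
    return result


if __name__ == "__main__":
    # A nested substitution on A = {a}: a is in sigma(a), and b, c are new fixed letters.
    sigma = {"a": {"a", "bac"}}
    print(sorted(iterated({"a"}, sigma, 2)))
```

In this example the iterates produce the words \(b^n a c^n\), so a finite substitution applied to a one-word language already yields a non-regular context-free language in the limit.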

Let \(\mathbf {C}\) be a class of languages. Recall that \(\mathbf {C}\) is said to be closed under inverse homomorphism if, whenever A and B are alphabets and \(L \in \mathbf {C}\) with \(L \subseteq B^*\), for every homomorphism \(\varphi :A^*\rightarrow B^*\) we have \(\varphi ^{-1}(L) \in \mathbf {C}\). We say that \(\mathbf {C}\) is a super-\({{\,\mathrm{AFL}\,}}\) if it is a full \({{\,\mathrm{AFL}\,}}\) (i.e. it is closed under homomorphism, inverse homomorphism, intersection with regular languages, union, concatenation, and the Kleene star) and if it is closed under nested iterated substitution. As mentioned, the class \(\mathbf {CF}\) of context-free languages is a super-\({{\,\mathrm{AFL}\,}}\) [42, 11, Theorem 2.2], as is the class \(\mathbf {IND}\) of indexed languages [4, 26] and the class \(\mathbf {ET0L}\) of \(\mathbf {ET0L}\)-languages [18, Corollary 4.10]. On the other hand, the class \(\mathbf {REG}\) of regular languages is clearly not a super-\({{\,\mathrm{AFL}\,}}\). Indeed, if \(\mathbf {C}\) is a super-\({{\,\mathrm{AFL}\,}}\), then \(\mathbf {CF}\subseteq \mathbf {C}\), by [33, Theorem 2.2]. Thus \(\mathbf {CF}\) is the smallest super-\({{\,\mathrm{AFL}\,}}\). For more examples and generalisations, we refer the reader to the so-called hyper-\({{\,\mathrm{AFL}\,}}\)s defined by Engelfriet [26], all of which are super-\({{\,\mathrm{AFL}\,}}\)s.

A crucial definition, mixing rewriting systems with formal language theory, is the following. Let \(\mathbf {C}\) be a class of languages. We say that a rewriting system \(\mathscr {R} \subseteq A^*\times A^*\) is \(\mathbf {C}\)-ancestry preserving if for every \(L \in \mathbf {C}\) with \(L \subseteq A^*\), we have \(\langle L \rangle _{\mathscr {R}} \in \mathbf {C}\). A class of languages \(\mathbf {C}\) has the monadic ancestor property if every monadic \(\mathbf {C}\)-rewriting system is \(\mathbf {C}\)-ancestry preserving. It is not hard to see that any full \({{\,\mathrm{AFL}\,}}\) closed under nested iterated substitution has the monadic ancestor property (see [11, Theorem 2.2]). The converse holds, too, though we shall only require the forward direction herein.

Proposition 1.1

Let \(\mathbf {C}\) be a full \({{\,\mathrm{AFL}\,}}\). Then \(\mathbf {C}\) is a super-\({{\,\mathrm{AFL}\,}}\) if and only if it has the monadic ancestor property.

We refer the reader to Nyberg-Brodda [61] for a full (rather straightforward) proof. Thus we have a combinatorial characterisation of super-\({{\,\mathrm{AFL}\,}}\)s in terms of rewriting systems. We now turn to the main objects of study of this article.

1.3 Special monoids

Let \(M = \text {Mon} \langle A \mid w_1 =1, w_2 = 1, \dots , w_i = 1, \dots \rangle \). Then M is called special. That is, a monoid is special if it admits a presentation in which the right-hand side of every defining relation is the empty word. Unless stated otherwise, we always consider finitely presented special monoids. The group of units U(M) is the subgroup of M consisting of all (two-sided) invertible elements. We say that a word \(w \in A^*\) is an invertible word if it represents an invertible element of the monoid M. Arguably the most fundamental result for special monoids is the following, which relates a special monoid to its group of units.

Theorem

(Makanin [51]) Let \(M = \text {Mon} \langle A \mid w_1 =1, w_2 = 1, \dots , w_k = 1 \rangle \). Then U(M) is a k-relator group. Furthermore, the word and divisibility problems for M reduce to the word problem for U(M).

The results were announced in [52], and a proof appeared in Makanin’s Ph.D. thesis [51], see [64] for an English translation by the author of the present article. For the case \(k=1\), i.e. special one-relation monoids \(\text {Mon} \langle A \mid w=1 \rangle \), and more generally the case when \(|w_i| = |w_j|\) for all \(i, j \ge 1\), Makanin’s result had already been proved by Adian [1]. In particular, we deduce the decidability of the word problem for any special one-relation monoid \(\text {Mon} \langle A \mid w=1 \rangle \), as the word problem for one-relator groups \(\text {Gp} \langle X \mid r=1 \rangle \) is decidable, a classical result of Magnus [49].

We give some definitions and notation which will be used throughout the article. We follow Zhang [78] in notation, who produced a significantly shorter rewriting of Makanin’s result in terms of rewriting systems. We begin by stating a simple lemma, which appears in most of the arguments used throughout this article, and indeed in the literature in general. We shall almost always use it implicitly.

Lemma 1.2

If the words xy and yz are invertible in a monoid M, then all three of the words x, y, and z are invertible in M.

An example of a useful application of Lemma 1.2 is: if a word x is a prefix of an invertible word u, and a suffix of an invertible word v, then x is itself invertible.

Consider now an arbitrary finitely presented special monoid

$$\begin{aligned} M = \text {Mon} \langle A \mid w_1 =1, w_2 = 1, \dots , w_k = 1 \rangle , \end{aligned} \tag{1.1}$$

which shall remain fixed throughout this section. The words \(w_1, w_2, \dots , w_k\) will be called the defining words of the monoid. We say that an invertible word \(u \in A^+\) is minimal if none of its non-empty proper prefixes is invertible (in M). The set of all minimal words forms a biprefix code, denoted \(\mathfrak {M}\). Obviously any invertible word factors (uniquely) as a product of minimal words.

Every defining word \(w_i\) for \(1 \le i \le k\) is an invertible word, as \(w_i =_M 1\). Hence, we can uniquely factor every \(w_i\) into minimal words as \(w_i \equiv w_{i,1} w_{i,2} \cdots w_{i,\ell _i}\), where \(w_{i,j} \in \mathfrak {M}\) for \(1 \le j \le \ell _i\). The set of all minimal words arising in this way shall be denoted \(\Lambda \), and called the set of presentation pieces of (1.1). That is,

$$\begin{aligned} \Lambda = \bigcup _{i=1}^{k} \bigcup _{j=1}^{\ell _i} \{ w_{i,j} \} \subseteq A^*. \end{aligned}$$

We let \(\Delta \) denote the set of all minimal words \(\delta \in \mathfrak {M}\) satisfying: there exists some \(\lambda \in \Lambda \) with \(\delta =_M \lambda \) and \(|\delta | \le |\lambda |\). The set \(\Delta \) is called the set of invertible pieces of the presentation. As \(\Delta \subseteq \mathfrak {M}\), no elements of \(\Delta \) overlap non-trivially. In particular, \(\Delta \) is a biprefix code. Furthermore, note that \(\Lambda \subseteq \Delta \) and that \(\langle \Delta \rangle = \langle \Lambda \rangle \), as submonoids of M. We partition \(\Delta \) according to which words in \(\Delta \) represent the same elements of M. This partition of \(\Delta \), i.e. the partition of \(\Delta \) induced by the equivalence relation \(=_M\), will be denoted \(\Delta _1 \cup \Delta _2 \cup \cdots \cup \Delta _\nu \). Let \(X = \{ x_1, \dots , x_\nu \}\) be a set of new symbols, and let \(\phi :\Delta ^*\rightarrow X^*\) be the map induced by \(\delta \mapsto x_i\) when \(\delta \in \Delta _i\). This is a well-defined homomorphism, as \(\Delta \) is a biprefix code. One can show (see [78, Theorem 3.7]) that

$$\begin{aligned} \text {Gp} \langle X \mid \{ \phi (w_i) = 1 \, (1 \le i \le k) \} \rangle \end{aligned} \tag{1.2}$$

is a (k-relator) group presentation for the group of units U(M) of M. In particular, we have \(\langle \Delta \rangle = \langle \Lambda \rangle = U(M)\). In fact, for \(u, v \in \Delta ^*\) we have \(u =_M v\) if and only if \(\phi (u) =_{U(M)} \phi (v)\), see [78, Lemmas 3.1 and 3.6]. We will find it convenient to consider \(\Delta \) as a finite generating set for U(M), and consider the language

$$\begin{aligned} {{\,\mathrm{WP}\,}}_\Delta ^{U(M)} = \{ u \# v^\text {rev}\mid u, v \in \Delta ^*, \phi (u) =_{U(M)} \phi (v)\} = {{\,\mathrm{WP}\,}}_A^M \cap \Delta ^*\# (\Delta ^\text {rev})^*. \end{aligned} \tag{1.3}$$

Zhang [78] introduced a rewriting system for studying special monoids, which we shall also find use for in this article. For any total order \(\prec \) on A, let \(<_{s}\) denote the short-lex order on \(A^*\) induced by \(\prec \). Define the rewriting system \(S = S(M)\) as:

$$\begin{aligned} S = \{ (u \rightarrow v) \mid u, v \in \Delta ^*:u =_M v \text { and } u >_s v\}. \end{aligned} \tag{1.4}$$

Zhang [78, Proposition 3.2] proves that this (generally infinite) system defines M, and is furthermore a complete rewriting system.
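The short-lex order used to orient the rules of S can be sketched in Python as follows. The helper names `shortlex_key` and `shortlex_less`, and the encoding of the total order \(\prec \) as a string listing the letters in increasing order, are assumptions made for this illustration.

```python
def shortlex_key(w, order):
    """Sort key for the short-lex order: compare by length, then letter by letter."""
    rank = {a: i for i, a in enumerate(order)}
    return (len(w), [rank[a] for a in w])


def shortlex_less(u, v, order):
    """Return True if u is strictly smaller than v in the short-lex order."""
    return shortlex_key(u, order) < shortlex_key(v, order)


if __name__ == "__main__":
    order = "abc"  # a < b < c
    print(shortlex_less("c", "aa", order))   # shorter words come first
    print(shortlex_less("ab", "ba", order))  # equal length: lexicographic
    print(sorted(["ba", "c", "", "ab"], key=lambda w: shortlex_key(w, order)))
```

Since short-lex is a well-order on \(A^*\) when A is finite, every pair \(u =_M v\) of distinct words in \(\Delta ^*\) is oriented in exactly one direction, from the larger word to the smaller one.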

Now, in general, neither the set \(\Delta \) nor the presentation (1.2) is effectively computable from (1.1). However, if \(k=1\), then there is an algorithm, called Adian’s (overlap) algorithm, which computes both \(\Delta \) and (1.2). In fact, in this case, we also have \(\Lambda = \Delta \) and \(\nu =1\), which follows from Magnus’ Freiheitssatz. We refer the reader to [32, 43, 80] for details. Adian’s algorithm can fail to produce the correct output already when \(k=2\). Indeed, in general the problem of computing (1.2) is undecidable. A procedure for when \(k\ge 1\), implicit in Makanin’s Ph.D. thesis, is described in the author’s Ph.D. thesis [60]. This does not always terminate, but when it does it always outputs (1.2) and \(\Delta \) correctly.

We make one final definition. If \(\lambda \in \Lambda \) is a presentation piece, then we say that \(\lambda \) is obtained from itself by the piece-generating operation, and inductively, we say:

\((*)\):

Suppose that \(w \equiv h_1 \delta _{i,1} \delta _{i,2} \cdots \delta _{i,p} h_2\) is obtained from \(\lambda \) by the piece-generating operation, where \(p \ge 0\) and \(h_1, h_2\) are non-empty, and \(\delta _{i,j} \in \Delta \) for every such \(\delta _{i,j}\). Suppose then that \(w' \equiv h_1 \delta _{j,1} \delta _{j,2} \cdots \delta _{j,t} h_2\), with \(t \ge 0\), that \(|w'| \le |w|\), and that \(\delta _{i,1} \delta _{i,2} \cdots \delta _{i,p} =_M \delta _{j,1} \delta _{j,2} \cdots \delta _{j,t}\). Then \(w'\) is also said to be obtained from \(\lambda \) by the piece-generating operation.

If \(w \in A^*\) can be obtained from \(\lambda \) by the piece-generating operation, then we denote this by \(w \in [\lambda ]^\downarrow \). It is easy to see that for any \(w \in [\lambda ]^\downarrow \), we have that w is a minimal word, i.e. \(w \in \mathfrak {M}\) (see e.g. the second half of the proof of Lemma 2.2). We have \(\lambda \in [\lambda ]^\downarrow \), and \(|w| \le |\lambda |\) for every \(w \in [\lambda ]^\downarrow \), so in particular for every \(\lambda \in \Lambda \) the set \([\lambda ]^\downarrow \) is finite. In general, for \(\lambda _1, \lambda _2 \in \Lambda \) we can have \([\lambda _1]^\downarrow \cap [\lambda _2]^\downarrow \ne \varnothing \) even when \(\lambda _1 \not \equiv \lambda _2\). We remark the following useful fact: if \(\lambda \in \Lambda \) and \(\delta \in \Delta \) are such that \(\lambda \equiv h_1 w h_2\) and \(\delta \equiv h_1 w' h_2\), with \(h_1, h_2 \in A^+\) and \(w \xrightarrow {*}_{S} w'\), then \(\delta \in [\lambda ]^\downarrow \). The converse does not, in general, hold.

Example 1

Let \(M_1 = \text {Mon} \langle a,b,c \mid ab^3c = 1, b^2 = 1 \rangle \). Then it is not hard to show (e.g. by using a finite complete rewriting system for \(M_1\)) that \(\Lambda = \{ b, ab^3c\}\), while \(\Delta = \{ b, abc, ab^3c\}\). Then \(abc \in \Delta \) is obtained from the piece \(ab^3c \in \Lambda \) by the piece-generating operation, as \(b^3 =_{M_1} b\) and \(|abc| \le |ab^3c|\). That is, \(abc \in [ab^3c]^\downarrow \).
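The equalities used in Example 1 can be checked mechanically. The sketch below assumes the rewriting system \(\{ bb \rightarrow \varepsilon , abc \rightarrow \varepsilon \}\) for \(M_1\); this particular system is our choice for illustration and is not taken from the cited sources. It is length-reducing, its only overlap \(bbb\) resolves to b in both ways, and it defines \(M_1\) since \(ab^3c\) and abc are equal modulo \(b^2 = 1\). The function name `normal_form` is likewise hypothetical.

```python
def normal_form(w, rules):
    """Rewrite w until no left-hand side occurs.

    Assumes the rules are terminating, with non-empty left-hand sides.
    """
    changed = True
    while changed:
        changed = False
        for l, r in rules:
            i = w.find(l)
            if i != -1:
                w = w[:i] + r + w[i + len(l):]
                changed = True
                break
    return w


# Assumed complete rewriting system for M_1 = Mon<a, b, c | ab^3c = 1, b^2 = 1>.
RULES_M1 = [("bb", ""), ("abc", "")]

if __name__ == "__main__":
    for word in ["abbbc", "abc", "bbb", "cab"]:
        print(repr(word), "->", repr(normal_form(word, RULES_M1)))
```

Under this assumption, \(ab^3c\) and abc both reduce to the empty word, and \(b^3\) reduces to b, in agreement with the example. Computing \(\Lambda \) and \(\Delta \) themselves additionally requires deciding invertibility, which this sketch does not attempt.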

Otto and Zhang [66, Theorem 5.2] proved the following “normal form lemma”.

Lemma 1.3

(Otto and Zhang) Let M be as given in (1.1), and let \(u, v \in A^*\) be such that \(u =_M v\). Then we can uniquely factorise u and v as

$$\begin{aligned} u \equiv u_0 a_1 u_1 \cdots a_m u_m, \qquad \text { and} \qquad v \equiv v_0 a_1 v_1 \cdots a_m v_m, \end{aligned}$$

respectively, where \(a_i \in A\) for all \(1 \le i \le m\), and for all \(0 \le i \le m\) we have

(1) \(u_i =_M v_i\);

(2) \(u_i\) is a maximal invertible factor of u;

(3) \(v_i\) is a maximal invertible factor of v.

Here, a maximal invertible factor of w is an invertible subword of w which is not properly contained in any other invertible subword of w. Lemma 1.3 tells us, roughly speaking, that understanding the equality of invertible words is important for understanding \({{\,\mathrm{WP}\,}}_A^M\). For this reason, we will begin by studying the invertible words of special monoids.

2 Invertible elements

Throughout this section, if not explicitly stated otherwise, we let

$$\begin{aligned} M = \text {Mon} \langle A \mid w_1 = 1, w_2 = 1, \dots , w_k =1 \rangle \end{aligned} \tag{2.1}$$

be a fixed special monoid. We assume \(|A|<\infty \). Let \(\Delta \) be the set of minimal invertible pieces of M, and let \(\phi :\Delta ^*\rightarrow X^*\) be the associated homomorphism. In general, if \(w \in A^*\) is invertible, it need not be the case that \(w \in \Delta ^*\). For example, in the bicyclic monoid \(B = \text {Mon} \langle b,c \mid bc=1 \rangle \), we have \(\Delta = \{ bc\}\), but the word \(b^n c^n\) is invertible and satisfies \(b^n c^n \not \in \Delta ^*\) for every \(n \ge 2\). We remark as an aside that it is easy to prove that every non-empty invertible word contains a piece as a subword. The aim of this section is to give an explicit description of the invertible words in terms of \(\Delta \), in order to study their language-theoretic properties.
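The bicyclic monoid is a convenient test case for the notion of minimal words. The following Python sketch uses two standard facts about \(B = \text {Mon} \langle b,c \mid bc=1 \rangle \): every word has a unique normal form \(c^i b^j\), and the group of units of B is trivial, so a word is invertible exactly when its normal form is empty. The function names are hypothetical, and the greedy factorisation relies on the observation that the shortest non-empty invertible prefix of an invertible word is a minimal word.

```python
def bicyclic_normal_form(w):
    """Return (i, j) such that w equals c^i b^j in the bicyclic monoid."""
    i = j = 0
    for ch in w:
        if ch == "b":
            j += 1
        elif ch == "c":
            if j > 0:
                j -= 1  # cancel one occurrence of bc
            else:
                i += 1
        else:
            raise ValueError("word must be over the alphabet {b, c}")
    return i, j


def is_invertible(w):
    """In the bicyclic monoid, w is invertible if and only if w equals 1."""
    return bicyclic_normal_form(w) == (0, 0)


def factor_minimal(w):
    """Factor an invertible word into minimal invertible words, greedily."""
    pieces, start = [], 0
    while start < len(w):
        for end in range(start + 1, len(w) + 1):
            if is_invertible(w[start:end]):
                pieces.append(w[start:end])
                start = end
                break
        else:
            raise ValueError("word is not invertible")
    return pieces


if __name__ == "__main__":
    print(factor_minimal("bbccbc"))
```

Here the word \(b^2c^2bc\) factors into the minimal words \(b^2c^2\) and bc; the first factor is minimal and invertible but is not a product of elements of \(\Delta = \{ bc \}\).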

2.1 Generalised pieces

Let \(w \in A^*\) be a minimal word such that there exists some \(\delta \in \Delta \) with \(w \xrightarrow {*}_{S} \delta \). Then we say that w is a generalised piece. Recall that \(S = S(M)\) is the system (1.4). The collection of all generalised pieces is denoted \(\overline{\Delta }\). Note that \(\Delta \subseteq \overline{\Delta }\). Furthermore, as \(\overline{\Delta }\) consists of minimal words, we conclude that \(\overline{\Delta }\) is a biprefix code as a subset of \(A^*\). The following elementary properties are easy to prove, with the same proof, mutatis mutandis, as [78, Proposition 2.1].

Lemma 2.1

Let \(x, y, z \in A^*\). Then (1) \(xy, x \in \overline{\Delta }^*\) implies \(y \in \overline{\Delta }^*\). (2) \(yz, z \in \overline{\Delta }^*\) implies \(y \in \overline{\Delta }^*\). (3) Suppose that \(xy \in \overline{\Delta }^*\). If either x or y is invertible, then \(x \in \overline{\Delta }^*\) and \(y \in \overline{\Delta }^*\). (4) Suppose that \(xy \in \overline{\Delta }^*\) and \(yz \in \overline{\Delta }^*\). Then \(x, y, z \in \overline{\Delta }^*\).

Proof

Statements (1) and (2) follow directly from the fact that \(\overline{\Delta }\) is a biprefix code. For (3), suppose x is invertible, and let \(xy \equiv x_1 x_2 \cdots x_m\) with \(x_i \in \overline{\Delta }\) where \(1 \le i \le m\). Then \(x \equiv x_1 x_2 \cdots x_{\ell -1} E\), where E is a prefix of \(x_\ell \) for some \(\ell \le m\). Write \(x_\ell \equiv EF\) where \(F \in A^*\). As x and \(x_\ell \) are invertible, it follows that E is invertible by overlaps. As \(x_\ell \) is minimal, we must thus have that either E is empty, or else E is all of \(x_\ell \). In either case, \(x \in \overline{\Delta }^*\). By (1), we conclude \(y \in \overline{\Delta }^*\). Symmetrically, the results hold when y is invertible. For (4), as we have \(xy \in \overline{\Delta }^*\) and \(yz \in \overline{\Delta }^*\), we have that xy and yz are invertible. Hence y is invertible. By (3), we have \(x, y, z \in \overline{\Delta }^*\). \(\square \)

Lemma 2.1(4) ensures that the types of overlap arguments possible for words over \(\Delta ^*\) are also possible for \(\overline{\Delta }^*\). We shall use this implicitly throughout the remainder of the article. Now, \(\overline{\Delta }\) is a set whose elements are, in a certain sense, controlled by elements of \(\Delta \). The main benefit of introducing this set is the following lemma.

Lemma 2.2

A word \(w \in A^*\) is invertible if and only if \(w \in \overline{\Delta }^*\).

Proof

Any element of \(\overline{\Delta }^*\) is clearly invertible. For the converse, assume \(w \in A^*\) is invertible. By [78, Lemma 3.4] there exists some least \(n \ge 0\) and a \(D \in \Delta ^*\) such that \(w \xrightarrow {*}_{S}^n D\). We will prove the claim by induction on n. The base case \(n=0\) is clear, for then \(w \equiv D \in \Delta ^*\subseteq \overline{\Delta }^*\). Assume that the claim is true for some \(n \ge 0\), and let w be such that \(w \xrightarrow {*}_{S}^{n+1} D\). Then there exists some \(w_1 \in A^*\) such that \(w \xrightarrow {}_{S} w_1\) and \(w_1 \xrightarrow {*}_{S}^n D\). As \(w =_M w_1\), the word \(w_1\) is invertible and by the inductive hypothesis \(w_1 \in \overline{\Delta }^*\). Write \(w_1 \equiv \overline{\delta }_0 \overline{\delta }_1 \cdots \overline{\delta }_k\) where \(\overline{\delta }_i \in \overline{\Delta }\) for \(0 \le i \le k\). As \(w \xrightarrow {}_{S} w_1\), there exists some \((\ell , r) \in S\) and words \(u, v \in A^*\) such that \(w \equiv u \ell v\) and \(w_1 \equiv urv\). We subdivide into two cases, depending on whether r contains as a subword one of the \(\overline{\delta }_i\) or not.

In the first case, we assume the fixed subword r of \(w_1\) contains some \(\overline{\delta }_i\), where \(0 \le i \le k\), as a subword. Then, as \(\overline{\Delta }\) is a biprefix code and \(r \in \Delta ^*\subseteq \overline{\Delta }^*\), we must have that \(w_1 \equiv ErF\), where \(E, F \in \overline{\Delta }^*\). Hence \(w \equiv E\ell F\), and as \(\ell \in \Delta ^*\), we have \(w \in \overline{\Delta }^*\). In the second case, this fixed subword r does not contain any \(\overline{\delta }_i\) as a subword. We deal with two separate subcases, depending on whether r is empty or not.

First, if \(r \equiv \varepsilon \), then there exists \(0 \le i \le k\) such that we can write \(u \equiv \overline{\delta }_0 \cdots \overline{\delta }_{i-1}\overline{\delta }'_i\) and \(v \equiv \overline{\delta }''_i \overline{\delta }_{i+1} \cdots \overline{\delta }_k\), where \(\overline{\delta }'_i, \overline{\delta }''_i \in A^*\) are such that \(\overline{\delta }'_i \overline{\delta }''_i \equiv \overline{\delta }_i\); and such that

$$\begin{aligned} w \equiv u\ell v \equiv \overline{\delta }_0 \cdots \overline{\delta }_{i-1}(\overline{\delta }_i'\ell \overline{\delta }''_i) \overline{\delta }_{i+1} \cdots \overline{\delta }_k. \end{aligned}$$
(2.2)

Assume \(|\overline{\delta }'_i|\cdot |\overline{\delta }''_i| = 0\), i.e. at least one of \(\overline{\delta }'_i, \overline{\delta }''_i\) is empty. If \(\overline{\delta }'_i \equiv \varepsilon \), then \(\overline{\delta }''_i \equiv \overline{\delta }_i \in \overline{\Delta }\), and as \(\ell \in \overline{\Delta }^*\), we have \(w \in \overline{\Delta }^*\) by (2.2). The case \(\overline{\delta }''_i \equiv \varepsilon \) is entirely symmetrical. Thus assume instead that \(|\overline{\delta }'_i|\cdot |\overline{\delta }''_i| > 0\). We claim that no proper non-empty prefix of \(\overline{\delta }'_i \ell \overline{\delta }''_i\) is invertible, i.e. this word is minimal. By minimality of \(\overline{\delta }'_i\overline{\delta }''_i \in \overline{\Delta }\), no proper non-empty prefix or suffix of this word is invertible; thus if some proper non-empty prefix of \(\overline{\delta }'_i \ell \overline{\delta }''_i\) were invertible, then it would be of the form \(\overline{\delta }'_i \ell _1\), where \(\ell _1 \in A^+\) is some non-empty proper prefix of \(\ell \). Then \(\ell _1\) is left invertible, being a suffix of the invertible \(\overline{\delta }'_i \ell _1\), but also right invertible, being a prefix of \(\ell \). It follows that \(\ell _1\) is invertible. As \(\overline{\delta }_i' \ell _1\) is also invertible, so is \(\overline{\delta }_i'\), which contradicts the minimality of \(\overline{\delta }_i \equiv \overline{\delta }'_i \overline{\delta }''_i\) as \(|\overline{\delta }''_i| > 0\). It follows that \(\overline{\delta }'_i \ell \overline{\delta }''_i\) is minimal. As \(\overline{\delta }'_i \ell \overline{\delta }''_i\) is clearly invertible, being equal in M to \(\overline{\delta }'_i \overline{\delta }''_i \in \overline{\Delta }\) by virtue of \(\ell =_M 1\), we have that \(\overline{\delta }'_i \ell \overline{\delta }''_i \in \overline{\Delta }\). By (2.2), we have \(w \in \overline{\Delta }^*\).

The case \(r \not \equiv \varepsilon \) uses very similar reasoning. We omit the proof for brevity. \(\square \)

Proposition 2.3

\(\overline{\Delta }= \mathfrak {M}\).

Proof

Clearly, \(\mathfrak {M}^*\) is the set of invertible words of M, so \(\mathfrak {M}^*= \overline{\Delta }^*\) by Lemma 2.2. As \(\mathfrak {M}\) and \(\overline{\Delta }\) are both biprefix codes, we thus necessarily have \(\overline{\Delta }= \mathfrak {M}\). \(\square \)

The description of \(\mathfrak {M}\) as \(\overline{\Delta }\) gives us access to an explicit description of the minimal words in terms of the objects \(\Delta \) and \(\xrightarrow {*}_{S}\), which will be useful. For some more properties of the set \(\overline{\Delta }\), we refer the reader to Chapter 3 of the author’s Ph.D. thesis [60].

2.2 Controlling the pieces

We will now demonstrate certain manipulations of the pieces \(\Delta \) of a presentation, in order to gain sufficient control over the set \(\overline{\Delta }\). One desirable property, which would simplify much of the reasoning, would be that no element of \(\Delta \) appears as a proper subword of another piece in \(\Delta \), i.e. that \(\Delta \) is an infix code.Footnote 5 We do not know if every special monoid admits a presentation where \(\Delta \) is an infix code (see Question ). Instead, we introduce and consider a weaker property, which shall be sufficient for our purposes. Let \(\delta \in \Delta \) be a piece. If \(\delta \equiv h_1 w h_2\) for some \(h_1, h_2 \in A^+\) and \(w \in \Delta ^+\), then we say that w is a subpiece (of \(\delta \)). If w is a subpiece and \(|w|=1\), then we say that w is a small subpiece. A subpiece which is not small is called large. The special monoid presentation (2.1) is said to satisfy the small subpiece condition if all subpieces of pieces in \(\Delta \) are small.

Example 2

If \(M_2 = \text {Mon} \langle a,b,c \mid ab^2c =1, b = 1 \rangle \), then one can show that the pieces are \(\Delta = \{ ab^2c, abc, ac, b \}\). Thus there are three subpieces; b appearing as a subword of abc and of \(ab^2c\), and \(b^2\) appearing as a subword of \(ab^2c\). The first two subpieces are small, but as \(|b^2|=2\), this last subpiece is not small.
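The subpieces in Example 2 can be enumerated mechanically. The following Python fragment is only a sketch under stated assumptions: the names `factorises`, `subpieces` and `satisfies_small_subpiece_condition` are ours (they do not appear in the article), and the set \(\Delta \) is taken as given from the example rather than computed from the presentation.

```python
def factorises(word, pieces):
    """Return True if `word` lies in pieces^+ (a non-empty product of pieces)."""
    if not word:
        return False
    # reachable[k] is True when word[:k] is a product of pieces
    reachable = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        reachable[end] = any(
            len(p) <= end and reachable[end - len(p)] and word[end - len(p):end] == p
            for p in pieces
        )
    return reachable[-1]


def subpieces(delta, pieces):
    """All w in pieces^+ with delta = h1 w h2 and h1, h2 both non-empty."""
    found = set()
    for i in range(1, len(delta)):             # h1 = delta[:i] is non-empty
        for j in range(i + 1, len(delta)):     # h2 = delta[j:] is non-empty
            if factorises(delta[i:j], pieces):
                found.add(delta[i:j])
    return found


def satisfies_small_subpiece_condition(pieces):
    """True if every subpiece of every piece has length 1."""
    return all(len(w) == 1 for d in pieces for w in subpieces(d, pieces))


# Pieces of M_2 as stated in Example 2 (hard-coded assumption).
DELTA_M2 = {"abbc", "abc", "ac", "b"}
# Pieces of the bicyclic presentation Mon<a, c | ac = 1>.
DELTA_BICYCLIC = {"ac"}
```

With these definitions, `subpieces("abbc", DELTA_M2)` contains both `"b"` and `"bb"`, so the first presentation fails the condition, whereas the bicyclic presentation has no subpieces at all and satisfies it vacuously.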

The property of satisfying the small subpiece condition is quite strongly tied to the presentation chosen for the monoid; indeed, \(M_2\) is clearly isomorphic to the bicyclic monoid \(\text {Mon} \langle a,c \mid ac=1 \rangle \), which satisfies the small subpiece condition. In this section we will, however, prove the following:

Proposition 2.4

Every finitely presented special monoid admits a presentation satisfying the small subpiece condition.

Before proving this, we state a useful lemma, proved by Makanin [51, Lemma 12].

Lemma 2.5

[Makanin’s Lemma] Let M be a special monoid given by the finite presentation

$$\begin{aligned} M= \text {Mon} \langle A \mid w_1 = 1, \dots , w_k = 1 \rangle \end{aligned}$$

with presentation pieces \(\Lambda \). Fix some \(1 \le i \le k\), and let \(w_i \equiv \lambda _1 \lambda _2 \cdots \lambda _\ell \) with \(\lambda _j \in \Lambda \) for \(1 \le j \le \ell \). Suppose that \(\delta \in \Delta \) is such that \(\delta \in [\lambda _p]^\downarrow \) for some fixed \(1 \le p \le \ell \). Let \(w_i' \equiv \lambda _1 \lambda _2 \cdots \lambda _{p-1} \delta \lambda _{p+1} \cdots \lambda _\ell \), and let

$$\begin{aligned} M' = \text {Mon} \langle A \mid w_{1}=1, \dots , w_i' = 1, \dots , w_k = 1 \rangle . \end{aligned}$$

Then \(M \cong M'\) via the identity map, i.e. the presentations define the same congruence on \(A^*\). Furthermore, the factorisation in \(M'\) of the defining word \(w'_i\) into minimal invertible factors is obtained by replacing \(\lambda _p\) with \(\delta \) in the factors of the factorisation of \(w_i\), and the factorisation of the defining word \(w_j\) (\(j \ne i\), \(1 \le j \le k\)) is identical to its factorisation in M.

For a full discussion of the validity of the translation of the lemma into the language of minimal invertible pieces \(\Delta \) (rather than Makanin’s original “c-words”), see Sect. 1.3 of the author’s Ph.D. thesis [60]. We will now give an example for how Makanin’s lemma can be applied to remove large subpieces. The idea in the example is the same as the general idea which will be used in the proof of Proposition 2.4.

Example 3

Let \(M_3 = \text {Mon} \langle a,b \mid abaabbab = 1 \rangle \). By Adian’s algorithm (see Lallement [43]), the defining word factors into invertible pieces as (ab)(aabb)(ab), so \(\Delta = \Lambda = \{ ab, aabb \}\). Thus \(ab \in \Delta ^+\) is a large subpiece of \(aabb \in \Delta \). We will replace this large subpiece ab by a small subpiece p.

Let p be any new symbol, and introduce the defining relation \(p = ab\) to the presentation via a Tietze transformation, giving \(\text {Mon} \langle a,b,p \mid abaabbab = 1, p=ab \rangle \). It is clear that \(aabb \cdot ab\) is an inverse of ab, so from \(p=_{M_3} ab\) it thus follows that \(M_3\) is isomorphic to

$$\begin{aligned} \text {Mon} \langle a,b,p \mid abaabbab = 1, p=ab, p(aabb \cdot ab) = 1, (aabb \cdot ab)p = 1 \rangle . \end{aligned}$$

As the fact that both p and ab are inverses of aabbab follows from the two added relations, the relation \(p=ab\) follows from these relations; thus we can remove \(p=ab\), and find that \(M_3\) is isomorphic to the special monoid

$$\begin{aligned} \text {Mon} \langle a,b,p \mid (ab)(aabb)(ab) = 1, p(aabb)(ab)=1, (aabb)(ab)p = 1 \rangle . \end{aligned}$$
(2.3)

It is clear that each of p, aabb, apb, and ab is a minimal invertible piece of this presentation. As \(p=_{M_3} ab\), we have that the piece apb is obtained from aabb by the piece-generating operation. Thus, by Makanin’s Lemma, we may replace aabb by apb in (2.3) without changing the monoid defined by it. Thus

$$\begin{aligned} M_3 \cong \text {Mon} \langle a,b,p \mid (ab)(apb)(ab)=1, p(apb)(ab) = 1, (apb)(ab)p = 1 \rangle . \end{aligned}$$

It is now not difficult to show that, for this new presentation, \(\Delta = \Lambda = \{ p, ab, apb\}\). Thus, this new presentation satisfies the small subpiece condition.

We now generalise the above example to the general case. First, for any set \(S \subseteq A^*\), we let \(\omega (S)\) denote the natural number \(\sum _{s \in S}(|s|-1)\). If \(S = \Lambda \), where \(\Lambda \) is the set of presentation pieces of (2.1), then in a loose sense \(\omega (\Lambda )\) is a measure of the “complexity” of the pieces of the presentation – we remark that M is cancellative if and only if \(\omega (\Lambda )=0\) (see Sect. 3.3). We shall, in the subsequent proof of Proposition 2.4, use several operations on the presentation (2.1), each of which reduces or does not increase \(\omega (\Lambda )\). In particular, if a presentation (2.1), with presentation pieces \(\Lambda \), does not satisfy the small subpiece condition, we will show that we find a new presentation with presentation pieces \(\Lambda '\) such that \(\omega (\Lambda ')<\omega (\Lambda )\). The proof will then be complete by induction.
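To make the measure \(\omega \) concrete, the following short Python sketch evaluates it on the presentation pieces appearing in Example 3. The function name `omega` and the three named sets are our own labels, and the letter p is encoded as the single character `"p"`; the sets themselves are copied from the example.

```python
def omega(pieces):
    """The quantity omega(S) = sum over s in S of (|s| - 1)."""
    return sum(len(s) - 1 for s in pieces)


# Presentation pieces at the three stages of Example 3.
LAMBDA_START = {"ab", "aabb"}          # original presentation of M_3
LAMBDA_WITH_P = {"ab", "aabb", "p"}    # after adding the new generator p
LAMBDA_FINAL = {"p", "ab", "apb"}      # after replacing aabb by apb
```

Adding the single letter p contributes \(1-1=0\) and so leaves \(\omega \) unchanged, while the replacement of aabb by apb lowers it by one; this is the pattern that drives the induction.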

Before we can realise the above idea in practice, we remark that it is not difficult to construct special monoids in which no presentation piece has a large subpiece, but there is some piece with a large subpiece. We begin with a lemma to remedy this, by showing that we can always find a presentation where large subpieces are “brought to light” in the presentation pieces.

Lemma 2.6

Let M be as in (2.1), with presentation pieces \(\Lambda \). Then M admits a special monoid presentation, with presentation pieces \(\Lambda '\), such that either this presentation satisfies the small subpiece condition; or else there is a presentation piece \(\lambda \in \Lambda '\) containing a large subpiece, and \(\omega (\Lambda ') \le \omega (\Lambda )\).

Proof

If (2.1) satisfies the small subpiece condition, then we are done, so suppose that it does not. Let \(\delta \in \Delta \) and \(w \in \Delta ^+\) be such that w is a large subpiece of \(\delta \). If \(\delta \in \Lambda \), then we are done. If \(\delta \not \in \Lambda \), then there is some \(\lambda _0 \in \Lambda \) such that \(\delta =_M \lambda _0\). Fix such a \(\lambda _0\). Then there are words \(u_0, u_1, \dots , u_n \in A^*\) and a sequence

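$$\begin{aligned} \delta \equiv u_0 \xrightarrow {}_{M} u_1 \xrightarrow {}_{M} \cdots \xrightarrow {}_{M} u_n \equiv \lambda _0. \end{aligned}$$
(2.4)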

In the rewriting (2.4), suppose (without loss of generality, by symmetry) that the first letter of \(\delta \) is affected (in the sense of Novikov [59, I. Sect. 1] and Adian [3]) before the last letter is. Suppose the first time, if any, this happens is in the rewriting \(u_i \xrightarrow {}_{M} u_{i+1}\). Then \(u_i \equiv v_0 \delta ' v_1\), where \(v_0 =_M v_1 =_M 1\) and \(\delta ' =_M \delta \). The rewriting \(u_i \xrightarrow {}_{M} u_{i+1}\) affects the first letter of \(\delta '\) (which is the same as the first letter of \(\delta \)) by deleting a defining relation \(w_j\) (\(1 \le j \le k\)), and therefore must, by minimality of \(\delta '\), and invertibility of \(v_0, v_1\), be such that \(\delta '\) is one of the minimal invertible pieces in the factorisation of this \(w_j\); thus \(\delta ' \in \Lambda \).

We conclude that for our chosen \(\delta \), we can find a piece \(\lambda \in \Lambda \) such that \(\delta =_M \lambda \), and there is a rewriting which does not affect the first or last letter of \(\delta \). Indeed, we can take \(\lambda \equiv \delta '\) as above if the first letter of \(\delta \) is affected in (2.4); otherwise, we can take \(\lambda \equiv \lambda _0\). In either case, pick the longest \(\lambda \) with the given property. Then there exist \(h_1, h_2 \in A^+\) with \(\delta \equiv h_1 w h_2\) and \(\lambda \equiv h_1 w' h_2\) with \(w =_M w'\). Thus there is some \(W \in A^*\) with \(w \xrightarrow {*}_{S} W\) and \(w' \xrightarrow {*}_{S} W\). Hence \(|w| \ge |W|\) and \(|w'| \ge |W|\). As \(\lambda \) was chosen longest, we also have \(|w| \le |w'|\) (for otherwise \(\delta \not \in \Delta \)).

Now, if \(|w'| = |W|\), then also \(|w| = |W|\). Thus the sequences of rules

$$\begin{aligned} (s_{1,1} , s_{1,2}), (s_{2,1} , s_{2,2}), \dots , (s_{m,1} , s_{m,2})&\in S \\ (s'_{1,1} , s'_{1,2}), (s'_{2,1} , s'_{2,2}), \dots , (s'_{\ell ,1} , s'_{\ell ,2})&\in S \end{aligned}$$

transforming \(w \xrightarrow {*}_{S} W\) resp. \(w' \xrightarrow {*}_{S} W\) satisfy \(|s_{i,1}| = |s_{i,2}|\) resp. \(|s'_{j,1}| = |s'_{j,2}|\) for all \(1 \le i \le m\) resp. \(1 \le j \le \ell \). Thus, by composing the sequence of rules rewriting \(w'\) to W with the reverse of the sequence of rules rewriting w to W, we find a sequence of applications of the piece-generating operation rewriting \(h_1 w' h_2\) to \(h_1 w h_2\). In other words, \(h_1 w h_2 \in [h_1 w' h_2]^\downarrow \), i.e. \(\delta \in [\lambda ]^\downarrow \). By Makanin’s Lemma, we may everywhere replace \(\lambda \) by \(\delta \) without changing the monoid; in the resulting presentation, whose presentation pieces will be denoted \(\Lambda '\), we have \(\delta \in \Lambda '\), and \(\delta \) contains a large subpiece. As \(|\delta | = |\lambda |\), we have \(\omega (\Lambda ') = \omega (\Lambda )\), and we are done.

Suppose instead that \(|w'|>|W|\). We have \(\delta ' :\equiv h_1 W h_2 \in [\lambda ]^\downarrow \). By Makanin’s Lemma, we may everywhere replace \(\lambda \) with \(\delta '\) in (2.1) without changing the monoid M. Let \(\Lambda '\) be the new presentation pieces of this presentation. Then, as \(\lambda \) was chosen longest and \(|\delta '| < |\lambda |\), we have \(\omega (\Lambda ') < \omega (\Lambda )\), as \(\Lambda ' = (\Lambda - \{\lambda \}) \cup \{ \delta ' \}\) by the second part of Makanin’s Lemma. We may thus repeat the above proof for the new presentation, and are done by induction. \(\square \)

Having proved this rather technical lemma, we can proceed with our main proof.

Proof of Proposition 2.4

Suppose M is given by a presentation (2.1), with presentation pieces \(\Lambda _0\). Then M admits a presentation

$$\begin{aligned} \text {Mon} \langle A \mid w_1 = 1, w_2 =1, \dots , w_k = 1 \rangle \end{aligned}$$
(2.5)

satisfying the conclusions of Lemma 2.6, with presentation pieces \(\Lambda \) resp. pieces \(\Delta \), and such that \(\omega (\Lambda ) \le \omega (\Lambda _0)\). If (2.5) satisfies the small subpiece condition, we are done, so assume the second part of Lemma 2.6 holds, and let \(\lambda \in \Lambda \) be a presentation piece such that \(w \in \Delta ^+\) is a large subpiece of \(\lambda \). Write \(\lambda \equiv h_1 w h_2\) with \(h_1, h_2 \in A^+\). Introduce a new symbol p, disjoint from A, and add by way of Tietze transformation the relation \(p=w\) to the presentation (2.5). The resulting presentation is not special. However, as w is invertible, there exists some \(w' \in \Lambda ^+\) such that \(ww' =_M w'w =_M 1\). Hence also \(pw' =_M w'p =_M 1\). We add these relations to the presentation. As inverses in a group are unique, and as p and w are both invertible words, we find the relation \(p=w\) redundant. We remove it by a Tietze transformation, resulting in a new special presentation:

$$\begin{aligned} M' = \text {Mon} \langle A \cup \{ p \} \mid w_1 = 1, \dots , w_k = 1, pw' = 1, w'p = 1 \rangle . \end{aligned}$$
(2.6)

Now the map induced by \(a \mapsto a\) for all \(a \in A\) extends to an isomorphism from M to \(M'\). Thus the factorisation of \(w'\) and the \(w_i\), for \(1 \le i \le k\), into minimal invertible pieces is the same in \(M'\) as in M. Clearly, p is invertible. It follows that the set \(\Lambda '\) of presentation pieces of (2.6) is precisely \(\Lambda ' = \Lambda \cup \{ p \}\).

From the presentation piece \(\lambda \equiv h_1 w h_2 \in \Lambda \subset \Lambda '\) in \(M'\) we can by the piece-generating operation obtain the piece \(\delta := h_1 p h_2\), as \(p, w \in (\Lambda ')^*\) satisfy \(p =_{M'} w\) and \(|p| < |w|\). That is, \(\delta \in [\lambda ]^\downarrow \). By Makanin’s Lemma, we can thus in the factorisations of the defining words in (2.6) replace \(\lambda \) by \(\delta \) without changing the monoid. Let \(w_i'\) denote the word obtained by this replacement from \(w_i\) (for \(1 \le i \le k\)), and \(w''\) the word from \(w'\). We find a new presentation

$$\begin{aligned} M'' = \text {Mon} \langle A \cup \{ p \} \mid w_1' = 1, \dots , w_k' = 1, pw'' =1, w''p = 1 \rangle . \end{aligned}$$
(2.7)

Let \(\Lambda ''\) denote the presentation pieces of (2.7). As \(|p| = 1\) and \(|w|>1\), it follows that \(\delta := h_1 p h_2\) satisfies \(|\delta |<|\lambda |\). By the second part of Makanin’s Lemma, \(\delta \in \Lambda ''\), and the other presentation pieces of (2.7) are presentation pieces of (2.6), i.e. in \(\Lambda '\). Thus \(\omega (\Lambda '') < \omega (\Lambda ')\). In particular, we find

$$\begin{aligned} \omega (\Lambda '') < \omega (\Lambda ') = \sum _{\lambda ' \in \Lambda \cup \{ p \}} (|\lambda '|-1) = (1-1) + \sum _{\lambda ' \in \Lambda }(|\lambda '|-1) = \omega (\Lambda ). \end{aligned}$$

Thus, repeating the above proof starting with the presentation (2.7), either (2.7) satisfies the small subpiece condition, or else we obtain a presentation \(M'''\) with presentation pieces \(\Lambda '''\) satisfying \(\omega (\Lambda ''') < \omega (\Lambda '')\), etc. We conclude by induction on \(\omega \) that there is some \(n \ge 0\) and a presentation \(M^{(n)}\) with pieces \(\Delta ^{(n)}\) such that no piece \(\delta \in \Delta ^{(n)}\) has a large subpiece; that is, \(M^{(n)}\) satisfies the small subpiece condition, and defines M. \(\square \)

Presentations satisfying the small subpiece condition will now prove crucial for describing the language of representatives of pieces in special monoids.

2.3 Representatives of pieces

Recall that X is a set of cardinality \(\nu \), where \(\nu \) is the number of parts in the partition \(\Delta = \Delta _1 \cup \Delta _2 \cup \cdots \cup \Delta _\nu \) of \(\Delta \) into classes of pieces equal to each other in M, and that \(\phi :\Delta ^*\rightarrow X^*\) is the canonical surjective homomorphism. For a word \(w \in \Delta ^*\), we define the language of \(\Delta \)-representatives of w as the set

$$\begin{aligned} {{\,\mathrm{Rep}\,}}\Delta _A^M(w) := \{ v \in \Delta ^*\mid v =_M w\} = \phi ^{-1}\left( {{\,\mathrm{Rep}\,}}_X^{U(M)}(\phi (w)) \right) . \end{aligned}$$
(2.8)

From the rightmost representation in (2.8), it follows that if \(\mathbf {C}\) is a class of languages closed under inverse homomorphisms, then \({{\,\mathrm{Rep}\,}}\Delta _A^M(w) \in \mathbf {C}\) if and only if \({{\,\mathrm{Rep}\,}}_X^{U(M)}(\phi (w)) \in \mathbf {C}\). For example, in the bicyclic monoid \(B = \text {Mon} \langle b,c \mid bc=1 \rangle \), with \(\Delta = \{ bc\}\), \(X = \{ x_1 \}\), and \(U(B) = \text {Mon} \langle x_1 \mid x_1 = 1 \rangle \), we have

$$\begin{aligned} {{\,\mathrm{Rep}\,}}\Delta _{\{b, c\}}^B(1) = \phi ^{-1}\left( {{\,\mathrm{Rep}\,}}_{X}^{U(B)}(\phi (1))\right) = \phi ^{-1}\left( x_1^*\right) = (bc)^*. \end{aligned}$$

Thus \({{\,\mathrm{Rep}\,}}\Delta _{\{b, c \}}^B(1)\) is regular, as U(B) is a group with regular word problem.
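The bicyclic computation above can be traced in a few lines of Python. This is a minimal sketch, not a general implementation: the names `factor_over_code`, `PHI`, `phi` and `in_rep_delta_of_one` are ours, the map \(\phi \) is hard-coded for \(\Delta = \{bc\}\), and we use the fact stated above that every word over X represents the identity of the trivial group U(B).

```python
def factor_over_code(word, code):
    """Factorise `word` over a prefix code, or return None if impossible.

    For a prefix code at most one code word can start at each position,
    so a greedy left-to-right scan finds the unique factorisation.
    """
    factors, pos = [], 0
    while pos < len(word):
        matches = [piece for piece in code if word.startswith(piece, pos)]
        if len(matches) != 1:
            return None
        factors.append(matches[0])
        pos += len(matches[0])
    return factors


# The homomorphism phi : Delta^* -> X^* for the bicyclic monoid B.
PHI = {"bc": "x1"}


def phi(word):
    """Image of a word of Delta^* in X^*, as a list of letters of X."""
    factors = factor_over_code(word, PHI)
    return None if factors is None else [PHI[f] for f in factors]


def in_rep_delta_of_one(word):
    """Membership in the Delta-representatives of 1 in B.

    U(B) is trivial, so every word over X represents the identity;
    membership therefore reduces to `word` lying in Delta^*.
    """
    return phi(word) is not None
```

For instance, `phi("bcbc")` is `["x1", "x1"]`, while `phi("bbcc")` is `None`: the word bbcc equals 1 in B but does not lie in \(\Delta ^*\), so it is not a \(\Delta \)-representative.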

The idea of the following section is as follows. We will describe the language of representatives \({{\,\mathrm{Rep}\,}}_A^M(\delta )\) of a given piece \(\delta \in \Delta \) as the set of ancestors of \({{\,\mathrm{Rep}\,}}\Delta _A^M(\delta )\) under a certain monadic rewriting system, which in turn is controlled by the group of units U(M). Because \({{\,\mathrm{Rep}\,}}\Delta _A^M(\delta )\) can be understood in terms of U(M) by (2.8), this gives an understanding of \({{\,\mathrm{Rep}\,}}_A^M(\delta )\) in terms of U(M).

We will define the rewriting system

$$\begin{aligned} \mathscr {R}_\Delta = \bigcup _{\begin{array}{c} p \in \Delta \cup \{ \varepsilon \} \\ |p|\le 1 \end{array}} \big \{ (W_p \rightarrow p) \mid W_p \in {{\,\mathrm{Rep}\,}}\Delta _A^M(p) \big \}. \end{aligned}$$
(2.9)

In general, this is an infinite rewriting system. Furthermore, it is not in general a complete rewriting system. However, it has the following desirable property: let \(\mathbf {C}\) be a class of languages closed under inverse homomorphism such that U(M) has word problem in \(\mathbf {C}\). Then for every \(p \in \Delta \cup \{ \varepsilon \}\) with \(|p| \le 1\) (i.e. for every right-hand side in \(\mathscr {R}_\Delta \)), the language \({{\,\mathrm{Rep}\,}}\Delta _A^M(p)\) is in \(\mathbf {C}\), as \({{\,\mathrm{Rep}\,}}_{X}^{U(M)}(\phi (p)) \in \mathbf {C}\). Thus \(\mathscr {R}_\Delta \) is a monadic \(\mathbf {C}\)-rewriting system. For example, in the bicyclic monoid B as above, the set of left-hand sides in \(\mathscr {R}_\Delta \) of \(p \equiv \varepsilon \) is the language \((bc)^*\), which is a regular language, as U(B) has regular word problem.
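For the bicyclic monoid B, the description of representatives as ancestors announced above can be checked by brute force on short words. The sketch below makes two assumptions that are not part of the article's text: it uses the standard fact that a word over \(\{b, c\}\) equals 1 in B exactly when cancelling factors bc reduces it to the empty word, and it restricts \(\mathscr {R}_\Delta \) to the rules \((bc)^n \rightarrow \varepsilon \) with \(n \ge 1\). All function names are ours.

```python
import re
from functools import lru_cache
from itertools import product


def equals_one_in_B(word):
    """Decide word = 1 in B by cancelling factors bc with a stack."""
    stack = []
    for letter in word:
        if letter == "c" and stack and stack[-1] == "b":
            stack.pop()          # cancel an adjacent pair b c
        else:
            stack.append(letter)
    return not stack


BC_STAR = re.compile(r"(?:bc)*")


@lru_cache(maxsize=None)
def is_ancestor_of_bc_star(word):
    """True if `word` rewrites to an element of (bc)^* using (bc)^n -> empty."""
    if BC_STAR.fullmatch(word):
        return True
    for n in range(1, len(word) // 2 + 1):
        lhs = "bc" * n
        start = word.find(lhs)
        while start != -1:
            # delete this occurrence of the left-hand side and recurse
            if is_ancestor_of_bc_star(word[:start] + word[start + len(lhs):]):
                return True
            start = word.find(lhs, start + 1)
    return False


def agree_up_to(max_len):
    """Compare the two predicates on all words of length <= max_len."""
    return all(
        equals_one_in_B("".join(t)) == is_ancestor_of_bc_star("".join(t))
        for n in range(max_len + 1)
        for t in product("bc", repeat=n)
    )
```

A call such as `agree_up_to(8)` compares the two descriptions on every word of length at most 8; this is of course only a finite sanity check and not a proof.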

Before showing the key technical lemma (Lemma 2.7), we introduce some useful terminology. For any word in \(\overline{\Delta }^*\), we can obtain a word in \(\Delta ^*\) by successively removing left-hand sides of rules in S(M), replacing them by their corresponding right-hand sides. We will consider this process in reverse, attributing the terminology of this idea to Cain & Maltcev [15]. First, let \(u \in \Delta ^*\) and factorise \(u \equiv \delta _1 \delta _2 \cdots \delta _n\) uniquely, where \(\delta _i \in \Delta \) for \(1 \le i \le n\). Then every non-empty subword of u of the form \(\delta _j \delta _{j+1} \cdots \delta _{\ell }\) for \(1 \le j \le \ell \le n\) is called a depth-0 inserted word. Inductively, for \(\mu \ge 0\), we define a depth-\((\mu +1)\) insertion as follows: if (1) a right-hand side \(s_2\) of a rule \((s_1 \rightarrow s_2) \in S(M)\) appearing as a proper non-prefix, non-suffix subword of some depth-\(\mu \) inserted word D, with \(D \in \Delta ^*\), is replaced by \(s_1\), then we call that occurrence of \(s_1\) a depth-\((\mu +1)\) inserted word, and the reversed rewriting \((s_2 \rightarrow s_1)\) is then called a depth-\((\mu +1)\) insertion; but (2) if instead the specified occurrence of \(s_2\) is a depth-\(\mu \) inserted word \(D \in \Delta ^*\), or if \(s_2 \equiv \varepsilon \) and does not satisfy the condition in (1), then the word \(s_1\) is a depth-\(\mu \) inserted word. The reversed rewriting \((s_2 \rightarrow s_1)\) is then called a depth-\(\mu \) insertion.

We give a concrete example. If \(\Delta = \{ d, b, abc\}\), and (for simplicity) we have the rewriting system \(\mathscr {T}\) with the rules \(\{dbd \rightarrow b, abc \rightarrow \varepsilon \}\), then an ancestor of the word \(u \equiv (abc)(abc)(b) \in \Delta ^*\) modulo \(\mathscr {T}\) might look like:

$$\begin{aligned} u' \equiv (adbdc)(abababccc)(dababccdbdd) \equiv (\underbrace{adbdc}_{\text {depth 0}}) (ab \underbrace{ab \overbrace{abc}^{\text {depth 2}}c }_{\text {depth 1}} c)(\underbrace{dab\overbrace{abc}^{\text {depth 1}}cdbdd}_{\text {depth 0}}). \end{aligned}$$

Thus, the word abc in the middle of the word \(u'\) is a depth-2 inserted word, and the rewriting of the leftmost term adbdc to abc is via the reverse of a depth-0 insertion \((b \rightarrow dbd)\). Now, just as in [15, Example 4.2], it is clear by definition of insertions (using no properties of the rewriting systems involved) that since \(u' \xrightarrow {*}_{\mathscr {T}} u\), we can perform this rewriting by first rewriting the depth-2 insertions in reverse, then the depth-1 insertions in reverse, and finally have a depth-0 inserted word in \(\Delta ^*\), which is then rewritten to u.
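That \(u'\) really is an ancestor of u modulo \(\mathscr {T}\) can be confirmed by an exhaustive search over all descendants of \(u'\). The following Python sketch does only this reachability check; it does not track insertion depths, and the names `one_step` and `descendants` are ours. The search terminates because both rules of \(\mathscr {T}\) strictly shorten the word.

```python
from collections import deque

# The rewriting system T of the example: dbd -> b and abc -> empty word.
RULES = [("dbd", "b"), ("abc", "")]

U = "abc" + "abc" + "b"
U_PRIME = "adbdc" + "abababccc" + "dababccdbdd"


def one_step(word, rules):
    """Yield every word obtained from `word` by one rule application."""
    for lhs, rhs in rules:
        start = word.find(lhs)
        while start != -1:
            yield word[:start] + rhs + word[start + len(lhs):]
            start = word.find(lhs, start + 1)


def descendants(word, rules):
    """Breadth-first search for all words reachable from `word`."""
    seen = {word}
    queue = deque([word])
    while queue:
        current = queue.popleft()
        for successor in one_step(current, rules):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen
```

Here `U in descendants(U_PRIME, RULES)` holds. Since u is itself reducible modulo \(\mathscr {T}\), a naive leftmost reduction strategy need not stop at u, which is why the sketch searches all descendants.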

Lemma 2.7

Let M be a finitely presented special monoid. Then M admits a special presentation, with finite generating set A and minimal invertible pieces \(\Delta \), such that for every \(\delta \in \Delta \) we have \({{\,\mathrm{Rep}\,}}_A^M(\delta ) = \langle {{\,\mathrm{Rep}\,}}\Delta _A^M(\delta ) \rangle _{\mathscr {R}_\Delta }\).

Proof

Let M be as given. By Proposition 2.4, M admits a special presentation satisfying the small subpiece condition. Thus, let us assume that M is given by such a presentation

$$\begin{aligned} \text {Mon} \langle A \mid w_1 = 1, w_2 = 1, \dots , w_k=1 \rangle \end{aligned}$$
(2.10)

with pieces \(\Delta \). We will prove that for this presentation \({{\,\mathrm{Rep}\,}}_A^M(\delta ) = \langle {{\,\mathrm{Rep}\,}}\Delta _A^M(\delta ) \rangle _{\mathscr {R}_\Delta }\).

\((\supseteq )\) Let \(w \in {{\,\mathrm{Rep}\,}}\Delta _A^M(\delta )\) and \(w' \in A^*\) be arbitrary words such that \(w' \in \langle w \rangle _{\mathscr {R}_\Delta }\), i.e. \(w' \xrightarrow {*}_{\mathscr {R}_\Delta } w\). Now, for every rule \((W_p, p) \in \mathscr {R}_\Delta \), we have by definition that \(p =_M W_p\). Thus, by induction on the number of rules applied in rewriting \(w'\) to w, we have \(w' =_M w\). As \(w =_M \delta \), we have \(w' \in {{\,\mathrm{Rep}\,}}_A^M(\delta )\).

\((\subseteq )\) Let \(w \in {{\,\mathrm{Rep}\,}}_A^M(\delta )\). Then \(w =_M \delta \), so w is invertible. In particular, \(w \in \overline{\Delta }^*\) by Lemma 2.2, and there is some \(u \in \Delta ^*\) such that \(w \xrightarrow {*}_{S} u\). By the earlier reasoning, we can thus obtain w from u by first performing all depth-0 insertions, then all depth-1 insertions, etc. until after performing a finite number of insertions we obtain w. Let \(\mu \ge 0\) be the highest depth of any such insertion performed.

We claim that \(w \in \langle {{\,\mathrm{Rep}\,}}\Delta _A^M(u)\rangle _{\mathscr {R}_\Delta }\) by induction on this \(\mu \). The base case \(\mu =0\) is clear, for then \(w \in \Delta ^*\). As for every rule \((s_1 \rightarrow s_2) \in S(M)\) we have \(s_1 =_M s_2\), it follows by induction on the number of rules applied in rewriting \(w \xrightarrow {*}_{S} u\) that \(w =_M u\). As \(w \in \Delta ^*\), we have \(w \in {{\,\mathrm{Rep}\,}}\Delta _A^M(u) \subseteq \langle {{\,\mathrm{Rep}\,}}\Delta _A^M(u)\rangle _{\mathscr {R}_\Delta }\). Assume, then, for induction that the claim is true for some \(\mu \ge 0\), and that w requires depth-\((\mu +1)\) insertions (but no higher). In the fixed rewriting \(w \xrightarrow {*}_{S} u\), let \(u' \in A^*\) be such that \(w \xrightarrow {}_{S} u' \xrightarrow {*}_{S} u\). Then the rewriting \(w \xrightarrow {}_{S} u'\) is by replacing the depth-\((\mu +1)\) inserted word \(s_1 \in \Delta ^*\) in w with the word \(s_2\), where \(s_2\) is a proper non-prefix non-suffix subword of either (I) a depth-\((\mu +1)\) word \(Q \in \Delta ^*\), or (II) a depth-\(\mu \) inserted piece \(Q \in \Delta \); and where \((s_1 \rightarrow s_2) \in S(M)\) is the specified rule.

In case (I), write \(Q \equiv Q_0 s_2 Q_1\), where necessarily \(Q_0, Q_1 \in \Delta ^*\). As Q is a depth-\((\mu +1)\) inserted word in \(u'\), the word \(u'\) can be obtained from some word \(u'' \in A^*\) by replacing a depth-\((\mu +1)\) or a depth-\(\mu \) inserted word \(s_3 \in \Delta ^*\) in \(u''\) with Q. That is, there is some rule \((Q \rightarrow s_3) \in S(M)\), which rewrites \(u' \xrightarrow {}_{S} u''\). But as \(Q_0 s_1 Q_1 =_M Q_0 s_2 Q_1 \equiv Q =_M s_3\), and \(|Q_0 s_1 Q_1| \ge |Q_0 s_2 Q_1| = |Q| \ge |s_3|\), we have \((Q_0 s_1 Q_1 \rightarrow s_3) \in S(M)\). Hence, we can obtain w already from \(u''\) by performing the insertion of replacing \(s_3\) by \(Q_0 s_1 Q_1\), i.e. \(w \xrightarrow {}_{S} u''\) by the rule \((Q_0 s_1 Q_1 \rightarrow s_3)\), thus reducing the rewriting \(w \xrightarrow {*}_{S} u\) by one step; we may thus by another induction assume without loss of generality that w is obtained from \(u'\) as in case (II).

Thus, assume case (II), i.e. \(Q \in \Delta \) is a depth-\(\mu \) inserted word in \(u'\), and \(s_2\) appears as a proper non-suffix non-prefix subword of the piece Q. As the presentation satisfies the small subpiece condition, it follows from \(s_2 \in \Delta ^*\) that the subpiece \(s_2\) satisfies \(|s_2| \le 1\). Hence also \(s_2 \in \Delta \cup \{ \varepsilon \}\). As \(s_1 =_M s_2\) and \(s_1 \in \Delta ^*\), we have \(s_1 \in {{\,\mathrm{Rep}\,}}\Delta _A^M(s_2)\). Hence, \((s_1 \rightarrow s_2) \in \mathscr {R}_\Delta \). Thus, w can be rewritten to \(u'\) in a single application of a rule from \(\mathscr {R}_\Delta \), and so, by repeating the same step for all depth-\((\mu +1)\) insertions, we find that there is a word \(w' \in A^*\) such that (1) \(w \xrightarrow {*}_{\mathscr {R}_\Delta } w'\); and (2) \(w'\) can be obtained from the word u with at most depth-\(\mu \) insertions. By the inductive hypothesis, thus \(w' \in \langle u \rangle _{\mathscr {R}_\Delta }\), and hence also \(w \in \langle u \rangle _{\mathscr {R}_\Delta }\). Now \(u \in {{\,\mathrm{Rep}\,}}\Delta _A^M(\delta )\), as \(u \in \Delta ^*\) and \(u =_M \delta \), and so we conclude that \(w \in \langle {{\,\mathrm{Rep}\,}}\Delta _A^M(\delta ) \rangle _{\mathscr {R}_\Delta }\), which is what was to be shown. \(\square \)

We conclude:

Theorem 2.8

Let \(M = \text {Mon} \langle A \mid w_1 =1, w_2 = 1, \dots , w_k = 1 \rangle \). Let the group of units U(M) of M be generated by a finite set X, and let \(\mathbf {C}\) be a super-\({{\,\mathrm{AFL}\,}}\). Then M admits a special presentation, with invertible pieces \(\Delta \), such that

$$\begin{aligned} {{\,\mathrm{WP}\,}}_X^{U(M)} \in \mathbf {C}\quad \implies \quad {{\,\mathrm{Rep}\,}}_A^M(\delta ) \in \mathbf {C}\,\, \text {for all }\delta \in \Delta . \end{aligned}$$

Proof

By Lemma 2.7, M admits a presentation satisfying the conclusions of that lemma. Suppose M is given by such a presentation, with pieces \(\Delta \), and X as usual. Suppose \({{\,\mathrm{WP}\,}}_X^{U(M)} \in \mathbf {C}\), and let \(\delta \in \Delta \) be any piece. Then for every \(p \in \Delta \cup \{ \varepsilon \}\) with \(|p|\le 1\), we have \({{\,\mathrm{Rep}\,}}\Delta _A^M(p) \in \mathbf {C}\). Hence the left-hand side of every letter or the empty word in \(\mathscr {R}_\Delta \) is in \(\mathbf {C}\), so \(\mathscr {R}_\Delta \) is a monadic \(\mathbf {C}\)-rewriting system. The conclusions of Lemma 2.7 being satisfied yields that \({{\,\mathrm{Rep}\,}}_A^M(\delta ) = \langle {{\,\mathrm{Rep}\,}}\Delta _A^M(\delta ) \rangle _{\mathscr {R}_\Delta }\). As \(\mathbf {C}\) has the monadic ancestor property and \({{\,\mathrm{Rep}\,}}\Delta _A^M(\delta ) \in \mathbf {C}\), we find that \({{\,\mathrm{Rep}\,}}_A^M(\delta ) \in \mathbf {C}\), as required. \(\square \)

This theorem is thus a full description of what the invertible words of a special monoid look like (we shall soon see that the assumption on the presentation becomes somewhat unimportant). We shall now use this description to language-theoretically describe when two invertible words are equal in a special monoid.

2.4 The invertible word problem

By the normal form lemma (Lemma 1.3), understanding equality of invertible words is tantamount to understanding equality of words, i.e. to describing the word problem. We note that, on the surface, equality of invertible words (over \(A^*\)) is quite distinct from equality of elements in U(M). Motivated by the word problem \({{\,\mathrm{WP}\,}}_A^M\) for M, we define the invertible word problem of M (with respect to A) as

$$\begin{aligned} {{\,\mathrm{InvP}\,}}_A^M = \{ w_1 \# w_2^\text {rev}\mid w_1, w_2 \in A^*\text { invertible, and } w_1 =_M w_2 \}. \end{aligned}$$
(2.11)

By Lemma 2.2, \({{\,\mathrm{InvP}\,}}_A^M = {{\,\mathrm{WP}\,}}_A^M \cap \overline{\Delta }^*\# (\overline{\Delta }^\text {rev})^*= {{\,\mathrm{WP}\,}}_A^M \cap \mathfrak {M}^*\# (\mathfrak {M}^\text {rev})^*\).

We first provide the language-theoretic version of Lemma 1.3.

Lemma 2.9

Let \(\mathbf {C}\) be a super-\({{\,\mathrm{AFL}\,}}\). Then \({{\,\mathrm{InvP}\,}}_A^M \in \mathbf {C}\implies {{\,\mathrm{WP}\,}}_A^M \in \mathbf {C}\).

Proof

The result is obvious using Lemma 1.3 and the alternating products introduced by the author [61, 63]. We provide a direct proof instead. Let \(\mathscr {T}\) be the rewriting system over the alphabet \(A \cup \{ \# \}\) with the rules

$$\begin{aligned} \{ W \rightarrow \# \mid W \in {{\,\mathrm{InvP}\,}}_A^M \} \cup \{ a \# a \rightarrow \# \mid a \in A \}. \end{aligned}$$
(2.12)

Then \(\mathscr {T}\) is a monadic rewriting system. It is obviously a \(\mathbf {C}\)-rewriting system, as the language of left-hand sides of the symbol \(\#\) in (2.12) is the union of two languages in \(\mathbf {C}\), and \(\mathbf {C}\) is closed under unions. We claim that \(\langle \# \rangle _{\mathscr {T}} = {{\,\mathrm{WP}\,}}_A^M\). Upon proving this, we will conclude, as \(\mathscr {T}\) is a monadic \(\mathbf {C}\)-rewriting system, \(\mathbf {C}\) has the monadic ancestor property, and \(\{ \# \} \in \mathbf {C}\), that \({{\,\mathrm{WP}\,}}_A^M \in \mathbf {C}\), as desired.

\((\subseteq )\) Let \(w \in \langle \# \rangle _{\mathscr {T}}\). Then there exist some least \(\mu \ge 0\) and words \(w_i \in (A \cup \{ \# \})^*\), where \(0 \le i \le \mu \), such that

$$\begin{aligned} w \equiv w_0 \xrightarrow {}_{\mathscr {T}} w_1 \xrightarrow {}_{\mathscr {T}} \cdots \xrightarrow {}_{\mathscr {T}} w_{\mu -1} \xrightarrow {}_{\mathscr {T}} w_\mu \equiv \#. \end{aligned}$$
(2.13)

We prove by induction on \(\mu \) that \(w \in {{\,\mathrm{WP}\,}}_A^M\). The base case \(\mu =0\) implies \(w \equiv \#\), and of course \(\# \in {{\,\mathrm{WP}\,}}_A^M\). Suppose \(\mu > 0\) and that the claim is true for all rewritings of the form (2.13) requiring at most \(\mu -1\) steps. Now \(w_1 \xrightarrow {}_{\mathscr {T}}^{\mu -1} \#\), so by the inductive hypothesis we have \(w_1 \in {{\,\mathrm{WP}\,}}_A^M\). Thus, we can write \(w_1 \equiv w_1' \# (w_1'')^\text {rev}\), where \(w_1', w_1'' \in A^*\) satisfy \(w_1' =_M w_1''\). Let \((r,s) \in \mathscr {T}\) be the rule which rewrites \(w_0 \xrightarrow {}_{\mathscr {T}} w_1\). Then \(s \equiv \#\) by (2.12), so \(w \equiv w_0 \equiv w_1' r (w_1'')^\text {rev}\). As r is a left-hand side of a rule in \(\mathscr {T}\), we have \(r \equiv u \# v^\text {rev}\), where either \(u \# v^\text {rev}\in {{\,\mathrm{InvP}\,}}_A^M\), or else \(u \equiv a \equiv v\), where \(a \in A\). In either case, \(u =_M v\). Thus

$$\begin{aligned} w \equiv w_1' r (w_1'')^\text {rev}\equiv w_1' u \# v^\text {rev}(w_1'')^\text {rev}\equiv (w_1'u) \# (w_1'' v)^\text {rev}, \end{aligned}$$

and as \(w_1' u =_M w_1'' u =_M w_1'' v\), thus \(w \in {{\,\mathrm{WP}\,}}_A^M\).

\((\supseteq )\) Suppose that \(w \in {{\,\mathrm{WP}\,}}_A^M\). Then \(w \equiv u \# v^\text {rev}\) for some \(u, v \in A^*\) with \(u=_M v\). By Lemma 1.3, we can factorise u and v uniquely as

$$\begin{aligned} u \equiv u_0a_1 u_1 \cdots a_m u_m, \quad v \equiv v_0a_1 v_1 \cdots a_m v_m, \end{aligned}$$

respectively, where for every \(0 \le i \le m\) we have \(a_i \in A\), \(u_i =_M v_i\), and \(u_i\) (resp. \(v_i\)) is a maximal invertible factor of u (resp. v). We prove the claim by induction on this m. The base case \(m=0\) is clear, for then \(u \equiv u_0, v \equiv v_0\) are invertible, and thus \(w \equiv u \# v^\text {rev}\in {{\,\mathrm{InvP}\,}}_A^M\). Hence \((w \rightarrow \#) \in \mathscr {T}\), so \(w \in \langle \# \rangle _{\mathscr {T}}\). Assume \(m>0\) and that the claim holds for \(m-1\). As \(u_m, v_m\) are invertible, we have \((u_m \# v_m^\text {rev}\rightarrow \#), (a_m \# a_m \rightarrow \#) \in \mathscr {T}\). Thus

$$\begin{aligned} w \equiv u \# v^\text {rev}\equiv u_0a_1 u_1 \cdots u_{m-1} a_m u_m&\# v_m^\text {rev}a_m v_{m-1}^\text {rev}\cdots a_1 v_0^\text {rev}\\ \xrightarrow {*}_{\mathscr {T}} u_0a_1 u_1 \cdots u_{m-1}&\# v_{m-1}^\text {rev}\cdots a_1 v_0^\text {rev}. \end{aligned}$$

By the inductive hypothesis, \(u_0a_1 u_1 \cdots u_{m-1} \# v_{m-1}^\text {rev}\cdots a_1 v_0^\text {rev}\in \langle \# \rangle _{\mathscr {T}}\), and so also \(w \in \langle \# \rangle _{\mathscr {T}}\), as was to be shown. \(\square \)
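As a small illustration of the \((\supseteq )\) direction (our own worked example, not part of the original argument), consider the bicyclic monoid \(\text {Mon} \langle b,c \mid bc=1 \rangle \), whose invertible words are exactly the words equal to 1. Take \(u \equiv bcb\) and \(v \equiv bbc\), so that \(u =_M v =_M b\). The factorisations are \(u_0 \equiv bc\), \(a_1 \equiv b\), \(u_1 \equiv \varepsilon \) and \(v_0 \equiv \varepsilon \), \(a_1 \equiv b\), \(v_1 \equiv bc\), and the rules \((\# cb \rightarrow \#)\), \((b \# b \rightarrow \#)\) and \((bc \# \rightarrow \#)\) of \(\mathscr {T}\) give

$$\begin{aligned} u \# v^\text {rev}\equiv bcb \# cbb \xrightarrow {}_{\mathscr {T}} bcb \# b \xrightarrow {}_{\mathscr {T}} bc \# \xrightarrow {}_{\mathscr {T}} \#. \end{aligned}$$

The first and last steps use rules coming from \({{\,\mathrm{InvP}\,}}_A^M\) (namely \(\varepsilon \# (bc)^\text {rev}\) and \(bc \# \varepsilon ^\text {rev}\)), while the middle step uses a rule of the form \(a \# a \rightarrow \#\).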

Lemma 2.9 shows that understanding \({{\,\mathrm{InvP}\,}}_A^M\) translates to understanding \({{\,\mathrm{WP}\,}}_A^M\). We now describe \({{\,\mathrm{InvP}\,}}_A^M\) in terms of the word problem for U(M).

Lemma 2.10

Let \(\mathbf {C}\) be a super-\({{\,\mathrm{AFL}\,}}\) closed under reversal. Then the special monoid M admits a special monoid presentation, with generators A, such that U(M) has word problem in \(\mathbf {C}\) if and only if \({{\,\mathrm{InvP}\,}}_A^M \in \mathbf {C}\).

Proof

By Lemma 2.7, M admits a special monoid presentation satisfying the conclusions of that lemma; let M be given by such a presentation, and let \(A, \Delta , X\) and \(\phi :\Delta ^*\rightarrow X^*\) be as usual.

\((\impliedby )\) Suppose \({{\,\mathrm{InvP}\,}}_A^M \in \mathbf {C}\). For notational brevity, write \(\Delta _r = \Delta ^\text {rev}\). The language \(\Delta ^*\# \Delta _r^*\) is a regular language. Let \(K = {{\,\mathrm{InvP}\,}}_A^M \cap (\Delta ^*\# \Delta _r^*)\). As \(\mathbf {C}\) is closed under intersection with regular languages, \(K \in \mathbf {C}\). Now K consists of precisely the words of the form \(u \# v^\text {rev}\) where \(u, v \in \Delta ^*\) and \(u =_M v\). That is, \(K = {{\,\mathrm{WP}\,}}_\Delta ^{U(M)}\), cf. (1.3). Hence \({{\,\mathrm{WP}\,}}_\Delta ^{U(M)} \in \mathbf {C}\), so U(M) has word problem in \(\mathbf {C}\).

\((\implies )\) For every \(\delta \in \Delta \), let \(\heartsuit _\delta , \widetilde{\heartsuit }_\delta \) be new symbols. Define \(\heartsuit _\Delta = \{ \heartsuit _\delta \mid \delta \in \Delta \}\), and \(\widetilde{\heartsuit }_\Delta = \{ \widetilde{\heartsuit }_\delta \mid \delta \in \Delta \}\) such that \(\heartsuit _\Delta \cap \widetilde{\heartsuit }_\Delta = \varnothing \). Let \(\mathscr {R}_\delta , \mathscr {R}_\delta ^r\) be the rewriting systems on the alphabet \(A \cup \heartsuit _\Delta \cup \widetilde{\heartsuit }_\Delta \) defined by

$$\begin{aligned} \mathscr {R}_\delta :=&\bigcup _{\delta \in \Delta } \big \{ (w, \heartsuit _\delta ) \mid w \in {{\,\mathrm{Rep}\,}}_A^M(\delta ) \big \}, \\ \mathscr {R}_\delta ^r :=&\bigcup _{\delta \in \Delta } \big \{ (w^\text {rev}, \widetilde{\heartsuit }_\delta ) \mid w \in {{\,\mathrm{Rep}\,}}_A^M(\delta ) \big \}. \end{aligned}$$

The conclusions of Lemma 2.7 being satisfied, \({{\,\mathrm{Rep}\,}}_A^M(\delta ) \in \mathbf {C}\) for every \(\delta \in \Delta \); as \(\Delta \) is finite and \(\mathbf {C}\) is closed under finite unions, \(\mathscr {R}_\delta \) is a monadic \(\mathbf {C}\)-rewriting system. As \(\mathbf {C}\) is closed under reversal, \(\mathscr {R}_\delta ^r\) is also a monadic \(\mathbf {C}\)-rewriting system, and so the system \(\mathscr {R}_0 := \mathscr {R}_\delta \cup \mathscr {R}_\delta ^r\) is, too.

There exists a surjective homomorphism \(\varrho :(\heartsuit _\Delta \cup \widetilde{\heartsuit }_\Delta )^*\rightarrow \Delta ^*\) defined, for \(\delta \in \Delta \) by \(\varrho (\heartsuit _\delta ) = \varrho (\widetilde{\heartsuit }_\delta ) = \delta \). Thus \(\heartsuit _\Delta \cup \widetilde{\heartsuit }_\Delta \) is a finite generating set for U(M). Let

$$\begin{aligned} L := {{\,\mathrm{WP}\,}}_{\heartsuit _\Delta \cup \widetilde{\heartsuit }_\Delta }^{U(M)} \cap (\heartsuit _\Delta ^*\# \widetilde{\heartsuit }_\Delta ^*). \end{aligned}$$
(2.14)

As U(M) has word problem in \(\mathbf {C}\), and \(\mathbf {C}\) is a super-\({{\,\mathrm{AFL}\,}}\), we have \(L \in \mathbf {C}\). We claim that \({{\,\mathrm{InvP}\,}}_A^M = \langle L \rangle _{\mathscr {R}_0} \cap A^*\# A^*\). As \(\mathbf {C}\) has the monadic ancestor property, this would imply \(\langle L \rangle _{\mathscr {R}_0} \in \mathbf {C}\) and consequently \({{\,\mathrm{InvP}\,}}_A^M \in \mathbf {C}\), completing the proof.

\((\subseteq )\) Let \(w \in {{\,\mathrm{InvP}\,}}_A^M\) be arbitrary. By Lemma 2.2, there exist \(w_1, w_2 \in \overline{\Delta }^*\) such that \(w \equiv w_1 \# w_2^\text {rev}\) with \(w_1 =_M w_2\). Write \(w_1 \equiv \vartheta _0 \vartheta _1 \cdots \vartheta _n\) and \(w_2 \equiv \vartheta _0' \vartheta _1' \cdots \vartheta '_m\), where \(\vartheta _i, \vartheta _j' \in \overline{\Delta }\) for \(0 \le i \le n\) and \(0 \le j \le m\). By definition of \(\overline{\Delta }\), for every \(\vartheta _i, \vartheta _j'\) there exist \(\delta _i, \delta _j' \in \Delta \) such that \(\vartheta _i =_M \delta _i\) and \(\vartheta _j' =_M \delta _j'\), i.e. \(\vartheta _i \in {{\,\mathrm{Rep}\,}}_A^M(\delta _i)\) and \(\vartheta _j' \in {{\,\mathrm{Rep}\,}}_A^M(\delta _j')\). Thus for every i, j as above, \((\vartheta _i, \heartsuit _{\delta _i}), ( (\vartheta _j')^\text {rev}, \widetilde{\heartsuit }_{\delta _j'}) \in \mathscr {R}_0\). Let W be the word \(\heartsuit _{\delta _0} \heartsuit _{\delta _1} \cdots \heartsuit _{\delta _n} \# \widetilde{\heartsuit }_{\delta '_m} \widetilde{\heartsuit }_{\delta '_{m-1}} \cdots \widetilde{\heartsuit }_{\delta '_0}\). Then \(w \xrightarrow {*}_{\mathscr {R}_0} W\). As \(\delta _0 \delta _1 \cdots \delta _n =_M \delta _0' \delta _1' \cdots \delta _m'\), it follows that \(W \in {{\,\mathrm{WP}\,}}_{\heartsuit _\Delta \cup \widetilde{\heartsuit }_\Delta }^{U(M)}\), and as \(W \in \heartsuit _\Delta ^*\# \widetilde{\heartsuit }_\Delta ^*\), we have \(W \in L\). We conclude that \(w \in \langle L \rangle _{\mathscr {R}_0}\). As \(w \in A^*\# A^*\), thus \(w \in \langle L \rangle _{\mathscr {R}_0} \cap A^*\# A^*\).

\((\supseteq )\) Let \(w \in \langle L \rangle _{\mathscr {R}_0} \cap A^*\# A^*\). Let \(W \in L\) be such that \(w \xrightarrow {*}_{\mathscr {R}_0} W\). Then there exist \(\delta _i, \delta _j' \in \Delta \) and \(\heartsuit _{\delta _i} \in \heartsuit _\Delta , \widetilde{\heartsuit }_{\delta _j'} \in \widetilde{\heartsuit }_\Delta \) for \(0 \le i \le n\) and \(0 \le j \le m\) such that

$$\begin{aligned} W \equiv \heartsuit _{\delta _0} \heartsuit _{\delta _1} \cdots \heartsuit _{\delta _n} \# \widetilde{\heartsuit }_{\delta _m'} \widetilde{\heartsuit }_{\delta _{m-1}'} \cdots \widetilde{\heartsuit }_{\delta _0'}, \end{aligned}$$
(2.15)

with \(\delta _0 \delta _1 \cdots \delta _n =_M \delta _0' \delta _1' \cdots \delta _m'\). Let \(u \in (A \cup \heartsuit _\Delta \cup \widetilde{\heartsuit }_\Delta \cup \{ \# \})^*\) be such that \(u \xrightarrow {*}_{\mathscr {R}_0} W\). As no left-hand side of a rule in either \(\mathscr {R}_\delta \) or \(\mathscr {R}_\delta ^r\) contains an occurrence of any letter from \(\heartsuit _\Delta \) or \(\widetilde{\heartsuit }_\Delta \), it easily follows that u can be written as \(\alpha _0 \alpha _1 \cdots \alpha _n \# \alpha _m' \alpha _{m-1}' \cdots \alpha _0'\), where for \(0 \le i \le n\), \(\alpha _i\) is either (i) \(\heartsuit _{\delta _i}\), or else (ii) \(u_i\), where \(u_{i} \in {{\,\mathrm{Rep}\,}}_A^M(\delta _i)\); and similarly for \(0 \le j \le m\), \(\alpha _j'\) is either (i’) \(\widetilde{\heartsuit }_{\delta _j'}\), or else (ii’) \(v_j^\text {rev}\), where \(v_j \in {{\,\mathrm{Rep}\,}}_A^M(\delta _j')\). In particular, as \(w \in A^*\# A^*\) and \(w \xrightarrow {*}_{\mathscr {R}_0} W\), there exist \(u_i, v_j\) as above such that

$$\begin{aligned} w \equiv u_0 u_1 \cdots u_n \# v_m^\text {rev}v_{m-1}^\text {rev}\cdots v_0^\text {rev}\equiv u_0 u_1 \cdots u_n \# (v_0 v_1 \cdots v_m)^\text {rev}, \end{aligned}$$

with \(u_i \in {{\,\mathrm{Rep}\,}}_A^M(\delta _i)\) and \(v_j \in {{\,\mathrm{Rep}\,}}_A^M(\delta _j')\) for all \(0 \le i \le n\) and \(0 \le j \le m\). As \(\delta _0 \delta _1 \cdots \delta _n =_M \delta _0' \delta _1' \cdots \delta _m'\), it follows that \(u_0 u_1 \cdots u_n =_M v_0 v_1 \cdots v_m\). Furthermore, every \(u_i\) and every \(v_j\) is invertible. Thus \(w \in {{\,\mathrm{InvP}\,}}_A^M\). \(\square \)

Example 4

We give an example of the system \(\mathscr {R}_0\) from the proof of Lemma 2.10 for a concrete special monoid. Let \(M_4 = \text {Mon} \langle a,b,c \mid abc=1, b^2 =1 \rangle \). Then it is easily seen that \(\Delta = \{ abc, b\}\). Thus \(\heartsuit _\Delta = \{ \heartsuit _{abc}, \heartsuit _b \}\) and \(\widetilde{\heartsuit }_\Delta = \{ \widetilde{\heartsuit }_{abc}, \widetilde{\heartsuit }_b \}\). Note that \(ab^{17}c =_{M_4} abc\), though certainly \(ab^{17}c \not \in \Delta \). Thus we have the rule \((ab^{17}c \rightarrow \heartsuit _{abc})\) in \(\mathscr {R}_\delta \subset \mathscr {R}_0\). Similarly, \(bab^3 cb^2 =_{M_4} b\), so \((b^2cb^3ab \rightarrow \widetilde{\heartsuit }_{b}) \in \mathscr {R}_\delta ^r \subset \mathscr {R}_0\).
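The two equalities claimed in Example 4 can be checked mechanically. The following is a minimal sketch, assuming that the system \(\{ abc \rightarrow \varepsilon , b^2 \rightarrow \varepsilon \}\) is complete for \(M_4\) (it is length-reducing, and the only overlap of left-hand sides is \(b \cdot b \cdot b\), which resolves); the function name `normal_form` and the string encoding are illustrative choices, not from the source.

```python
# Hypothetical helper: normal forms in M_4 = Mon<a,b,c | abc=1, b^2=1>,
# assuming the rewriting system {abc -> 1, bb -> 1} is complete.
RULES = [("abc", ""), ("bb", "")]

def normal_form(word, rules=RULES):
    """Apply the rules until no left-hand side occurs in the word."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in word:
                # Rewrite one occurrence; completeness makes the order irrelevant.
                word = word.replace(lhs, rhs, 1)
                changed = True
    return word

# ab^17c and abc have the same normal form, so (ab^17c -> heart_abc) is a rule.
same_as_abc = normal_form("a" + "b" * 17 + "c") == normal_form("abc")

# bab^3cb^2 equals b, so its reversal b^2cb^3ab is the left-hand side of a
# rule with right-hand side tilde-heart_b.
left_hand_side = "babbbcbb"[::-1]
```

Under the stated assumption, two words are equal in \(M_4\) exactly when `normal_form` returns the same string for both.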

Let \(\mathbf {C}\) be closed under inverse homomorphism. While the word problem for M being in \(\mathbf {C}\) is invariant under change of generating set, it is not obvious that the invertible word problem for M being in \(\mathbf {C}\) is also such an invariant. We suspect that this is the case, but have been unable to find a proof. In any case, Lemma 2.10 provides a sufficiently good description of \({{\,\mathrm{InvP}\,}}_A^M\) in terms of U(M) for our purposes.

3 The word problem

We have now completely described the invertible word problem for M in terms of U(M). This yields a proof of the main theorem of this article.

Theorem A

Let M be a finitely presented special monoid. Let \(\mathbf {C}\) be a super-\({{\,\mathrm{AFL}\,}}\) closed under reversal. Then M has word problem in \(\mathbf {C}\) if and only if the group of units U(M) of M has word problem in \(\mathbf {C}\).

Proof

\((\implies )\) As M is finitely presented, U(M) is finitely generated, and if M has word problem in \(\mathbf {C}\), then by [35, Proposition 8(a)] as \(\mathbf {C}\) is a super-\({{\,\mathrm{AFL}\,}}\) any finitely generated submonoid of M has word problem in \(\mathbf {C}\). Thus U(M) has word problem in \(\mathbf {C}\).

\((\impliedby )\) Suppose U(M) has word problem in \(\mathbf {C}\). By Lemma 2.10, M admits a presentation satisfying the conclusions of Lemma 2.10. Let A be the finite generating set for M in this presentation, and let further \(\Delta , X\) be the minimal pieces and associated set. Then, as \(\mathbf {C}\) is closed under inverse homomorphism, \({{\,\mathrm{WP}\,}}_{X}^{U(M)} \in \mathbf {C}\). As the conclusions of Lemma 2.10 are satisfied, we have \({{\,\mathrm{InvP}\,}}_A^M \in \mathbf {C}\). By Lemma 2.9, thus \({{\,\mathrm{WP}\,}}_A^M \in \mathbf {C}\). As \(\mathbf {C}\) is closed under inverse homomorphism, the word problem for M (with respect to any finite generating set) is thus also in \(\mathbf {C}\). \(\square \)

We remark that the assumption of “finitely presented” cannot in general be dropped from the statement of Theorem A, see Remark 1(3). As one application of Theorem A, we can take the class \(\mathbf {IND}\) of indexed languages, which is a super-\({{\,\mathrm{AFL}\,}}\) closed under reversal (see [26]), or the class \(\mathbf {ET0L}\). Furthermore, as the class \(\mathbf {CF}\) of context-free languages is a super-\({{\,\mathrm{AFL}\,}}\) closed under reversal, we find the following.

Corollary 3.1

A finitely presented special monoid has context-free word problem if and only if its group of units is virtually free.

As before, the assumption of “finitely presented” in Corollary 3.1 cannot in general be dropped, see Remark 1(3). Now, as every group is a special monoid (equal to its own group of units), this is a full generalisation of the Muller-Schupp theorem. In 2004, Duncan and Gilman asked for a characterisation of monoids with context-free word problem [23, Question 4]. We have thus answered this question completely for the class of special monoids. Hoffmann et al. [35, p. 97] write “the depth of the Muller-Schupp result and its reliance on the geometrical structure of Cayley graphs of groups suggests that a generalization to semigroups could be very hard to obtain”. Nevertheless, the above Corollary 3.1 is a generalisation of this sort, free from any geometric structure. We note that every monoid with context-free word problem is word-hyperbolic (in the sense of [23], see also [14]); thus a further corollary is that a finitely presented special monoid with virtually free group of units is word-hyperbolic. For example, \(\text {Mon} \langle a,b,c,d \mid abcdab=1 \rangle \) is word-hyperbolic, as it has infinite cyclic group of units.
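To spell out the final example, here is a short worked computation; the piece decomposition is our own reading of Adian's overlap algorithm (cf. Example 5) and is not stated in the source. The word ab is both a prefix and a suffix of abcdab, hence invertible, and the defining word factors into pieces as (ab)(cd)(ab). Writing \(x_1\) for ab and \(x_2\) for cd, this gives

$$\begin{aligned} U(M) \cong \text {Gp} \langle x_1, x_2 \mid x_1 x_2 x_1 = 1 \rangle \cong \text {Gp} \langle x_1 \mid \, \rangle \cong \mathbb {Z}, \end{aligned}$$

since the single relation only expresses \(x_2 = x_1^{-2}\) and so eliminates the generator \(x_2\).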

Remark 1

We give some examples illustrating how context-free special monoids, despite the similarities suggested by Theorem A, differ from groups. All the following statements are false if one substitutes “group” for “special monoid”.

  1. (1)

    There exists a (finitely generated) context-free special monoid which cannot be finitely presented. Namely, let \(M_5 = \text {Mon} \langle a,b,c \mid ab^ic = 1 \, (i \ge 1) \rangle \). Then \(M_5\), being defined by a context-free complete monadic rewriting system, has context-free word problem [11, Corollary 3.8], but obviously \(M_5\) cannot be finitely presented. By contrast, every context-free group is finitely presented.

  2. (2)

    There exists a finitely generated special monoid with context-free word problem, whose group of units is not finitely generated (indeed none of whose maximal subgroups is finitely generated). We refer the reader to Nyberg-Brodda [65] for this example, which answered in the negative a question of Brough, Cain and Pfeiffer (Question 10.4 in the pre-print version of [13]).

  3. (3)

    There exists a finitely generated special monoid with context-free group of units, but such that the word problem for the monoid is not context-free. Namely, let \(M_6 = \text {Mon} \langle a,b,c,d \mid ab^ic^jd = 1 \, (i, j \ge 2, j = i^2) \rangle \). Write, for brevity, \(A = \{ a, b, c, d\}\). Then the set of defining relations of \(M_6\) clearly forms a complete monadic rewriting system defining \(M_6\). For any defining word \(w_{i,j}: \equiv ab^ic^jd\), as no rule of the complete system begins with b, c or d, no word bu, cu, or du is (right) invertible for any \(u \in A^*\), and hence the factorisation of \(w_{i,j}\) into minimal invertible factors is trivial. As the set of minimal invertible factors generates U(M) (see e.g. Gray and Ruškuc [32]), we conclude that \(U(M_6) = 1\). On the other hand, using the complete rewriting system it is straightforward to see that

    $$\begin{aligned} {{\,\mathrm{Rep}\,}}_{A}^{M_6}(1) \cap ab^*c^*d = \{ ab^i c^j d \mid i, j \ge 2, j = i^2 \}, \end{aligned}$$

    but this right-hand side is well-known to not be a context-free language. Thus \({{\,\mathrm{Rep}\,}}_A^{M_6}(1)\) cannot possibly be context-free, and so \(M_6\) cannot have context-free word problem, see Theorem 3.2(i)\(\implies \)(iii) (this direction does not require finite presentability).

Despite the examples given in Remark 1, groups and special monoids share many language-theoretic similarities. We present these now, all of which essentially follow from Theorem A.

3.1 Representatives of words

Recall that to speak of a “context-free group” or a “regular group” means to speak of the language of words representing the identity element of this group. For monoids, such a language is not, in general, interesting; if all defining relations of a monoid \(\Pi \) are of the form \(u = v\) with \(u, v \in A^+\) non-empty, for example, then only the empty word represents the identity element. On the other hand, for special monoids, this language completely characterises the language-theoretic behaviour of the monoid:

Theorem 3.2

Let \(M = \text {Mon} \langle A \mid w_1 = 1, w_2 = 1, \dots , w_k = 1 \rangle \) be a finitely presented special monoid. Let \(\mathbf {C}\) be a super-\({{\,\mathrm{AFL}\,}}\) closed under reversal. Then the following are equivalent:

  1. (i)

    M has word problem in \(\mathbf {C}\).

  2. (ii)

    For every word \(w \in A^*\), \({{\,\mathrm{Rep}\,}}_A^M(w) \in \mathbf {C}\).

  3. (iii)

    \({{\,\mathrm{IP}\,}}_A^M = \{ w \mid w =_M 1, w \in A^*\}\) is in \(\mathbf {C}\).

Proof

(i)\(\implies \)(ii). Let \(w \in A^*\). If \({{\,\mathrm{WP}\,}}_A^M \in \mathbf {C}\), then the right quotient \({{\,\mathrm{WP}\,}}_A^M / \{ \# w^\text {rev}\}\) is in \(\mathbf {C}\), as \(\mathbf {C}\) is a full \({{\,\mathrm{AFL}\,}}\) and hence closed under right quotients with regular languages. But this quotient language is just \(\{ u \in A^*\mid u =_M w \} = {{\,\mathrm{Rep}\,}}_A^M(w)\).

(ii)\(\implies \)(iii) Obvious.

(iii)\(\implies \)(i) Suppose \({{\,\mathrm{IP}\,}}_A^M \in \mathbf {C}\). Let \(\Delta \) be the pieces of the presentation. Then, as \(\mathbf {C}\) is closed under intersection with regular languages, \({{\,\mathrm{IP}\,}}_A^M \cap \Delta ^*\in \mathbf {C}\). But

$$\begin{aligned} {{\,\mathrm{IP}\,}}_A^M \cap \Delta ^*= \{ w \in \Delta ^*\mid w =_M 1\} = {{\,\mathrm{IP}\,}}_{\Delta }^{U(M)} \end{aligned}$$

where \(\Delta \) is considered as a generating set for U(M) (cf. (1.3)). As U(M) is a group, having \({{\,\mathrm{IP}\,}}_\Delta ^{U(M)} \in \mathbf {C}\) implies by [23,  Theorem 5.3] that U(M) has word problem in \(\mathbf {C}\). Thus, by Theorem A, M has word problem in \(\mathbf {C}\). \(\square \)

Taking \(\mathbf {C}= \mathbf {CF}\), the class of context-free languages, Theorem 3.2 answers a 1992 question first posed by Zhang [79, Problem 1], who asked: if the group of units U(M) of a special monoid is context-free, are the languages \({{\,\mathrm{Rep}\,}}_A^M(w)\) for \(w \in A^*\) context-free? By Theorem A and (i)\(\implies \)(ii) in Theorem 3.2, the answer to Zhang’s question is thus affirmative. For \(w \in A^*\), the set \({{\,\mathrm{Rep}\,}}_A^M(w)\) is called a basic congruential language by Zhang. The study of congruences on the free monoid all of whose congruence classes are context-free languages has a long history, which we do not expand on here; we mention only the early work by Cochet [20].

Corollary 3.3

Let \(M, \mathbf {C}\) be as in Theorem 3.2. If the group of units U(M) has word problem in \(\mathbf {C}\), then \(\mathfrak {M}^*= \overline{\Delta }^*\in \mathbf {C}\).

Proof

Obvious by Theorems A and 3.2, as \(\overline{\Delta }^*\) is the right quotient of \({{\,\mathrm{WP}\,}}_A^M\) (which is in \(\mathbf {C}\)) by the regular language \(\# (\Delta ^\text {rev})^*\), and \(\mathbf {C}\) is a full \({{\,\mathrm{AFL}\,}}\). \(\square \)

In particular, if the group of units of a (finitely presented) special monoid M (generated by A) is virtually free, then the set of all invertible words is a context-free language. It is thus decidable in sub-cubic time (see the proof of Proposition 3.4) whether a given word \(w \in A^*\) represents an invertible element of M or not.

Finally, we remark that in the proof of Theorem 3.2, it is easy to use Lemma 1.3 to allow for replacing the empty word 1 in Theorem 3.2(iii) with any word \(w \in A^*\). That is, \({{\,\mathrm{Rep}\,}}_A^M(w)\) is in \(\mathbf {C}\) for some word \(w \in A^*\) if and only if it is in \(\mathbf {C}\) for every word \(w \in A^*\). We leave the details to the reader. Theorem 3.2 shows that special monoids and groups share similar language-theoretic behaviour, and that for special monoids the definition of the word problem as given here is a good generalisation of the word problem for groups.

3.2 Some decision problems

In 1992, Zhang [79, Problem 3] asked: given a special one-relation monoid \(\text {Mon} \langle A \mid w=1 \rangle \), is it decidable whether the congruence class of every word is a context-free language? The below theorem, when combined with Theorem 3.2, gives a complete and affirmative answer to Zhang’s question.

Theorem B

It is decidable whether a special one-relation monoid \(\text {Mon} \langle A \mid w=1 \rangle \) has context-free word problem.

Proof

Let \(M = \text {Mon} \langle A \mid w=1 \rangle \) be a special one-relation monoid. By Theorem A, it suffices to decide whether the group of units U(M) is virtually free. Now U(M) is a one-relator group \(\text {Gp} \langle X \mid r=1 \rangle \), where X and r are effectively computable from the given presentation of M by Adian’s overlap algorithm. First, we decide whether U(M) is torsion-free: this is the case if and only if r is not equal in the free group \(F_X\) on X to a proper power \(u^n\) of some other word u, \(n >1\). If U(M) is torsion-free, then U(M) is virtually free if and only if it is free [71], and U(M) is in turn free if and only if r is empty or a primitive element of \(F_X\) by [75, Theorem 4]. As primitivity of r can be decided by Whitehead’s algorithm [75], this yields the result if U(M) is torsion-free. If U(M) is not torsion-free, then we can (uniquely) write \(r = u^n\) as above, with equality in \(F_X\). Then U(M) is virtually free if and only if u is primitive in \(F_X\) by [27, Theorem 3], which can again be decided by Whitehead’s algorithm. \(\square \)
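The torsion test in this proof amounts to asking whether r is a proper power in \(F_X\). The following is a minimal sketch of that single step only (it does not implement Adian's or Whitehead's algorithm). It assumes r is encoded as a string in which a lowercase letter denotes a generator and the matching uppercase letter its inverse; this encoding and the function names are illustrative. It relies on the standard fact that a word is a proper power in a free group exactly when its cyclic reduction is a proper power as a string.

```python
def free_reduce(word):
    """Cancel adjacent pairs such as 'xX' or 'Xx' until none remain."""
    out = []
    for ch in word:
        if out and out[-1] == ch.swapcase():
            out.pop()
        else:
            out.append(ch)
    return "".join(out)

def cyclic_reduce(word):
    """Freely reduce, then strip matching first/last letters (a conjugation)."""
    w = free_reduce(word)
    while len(w) >= 2 and w[0] == w[-1].swapcase():
        w = w[1:-1]
    return w

def root_and_exponent(word):
    """Return (u, n) with the cyclic reduction of `word` equal to u^n, n maximal.

    The input is a proper power in the free group iff n > 1.
    The trivial word yields ("", 0).
    """
    c = cyclic_reduce(word)
    n = len(c)
    if n == 0:
        return "", 0
    for p in range(1, n + 1):
        # The shortest period p dividing n gives the primitive string root.
        if n % p == 0 and c[:p] * (n // p) == c:
            return c[:p], n // p

def is_proper_power(word):
    return root_and_exponent(word)[1] > 1
```

For instance, with `x` and `y` standing for \(x_1\) and \(x_2\), the relator \((x_1x_2x_1)^2\) of Example 5(1) is encoded as `"xyxxyx"` and is detected as a square of `"xyx"`, while `"xyyx"` is not a proper power.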

We remark that Zhang additionally asked if this problem can be solved in polynomial time; the above algorithm is polynomial-time, thus also answering this part of the question affirmatively. Indeed, the only part of the algorithm which is not obviously polynomial-time is using the Whitehead algorithm to check if a word is primitive; but this can be done in quadratic time, see e.g. [56].

Example 5

As an illustration of Theorem B, we give two examples.

  1. (1)

    Let \(M_7 = \text {Mon} \langle a,b \mid (abaabbab)^2 = 1 \rangle \). Then by Adian’s overlap algorithm the defining word factors into minimal invertible pieces as \(((ab)(aabb)(ab))^2\), and the group of units of \(M_7\) is isomorphic to \(\text {Gp} \langle x_1,x_2 \mid (x_1x_2x_1)^2 = 1 \rangle \). The word \(x_1x_2x_1\) is readily seen to be a primitive word; we may use the Nielsen transformations \(x_2 \mapsto x_2x_1^{-1}\) followed by \(x_1 \mapsto x_1x_2^{-1}\) to transform \(x_1x_2x_1\) first into \(x_1 x_2\), and then into \(x_1\). Thus \(U(M_7) \cong \text {Gp} \langle x_1,x_2 \mid x_1^2 =1 \rangle \), so \(U(M_7) \cong C_2 *\mathbb {Z}\) is virtually free (explicitly, the subgroup generated by \(x_2\) and \(x_1 x_2 x_1^{-1}\) has index 2 in \(U(M_7)\) and is free of rank 2). We conclude that \(M_7\) has context-free word problem.

  2. (2)

    Let \(M_8 = \text {Mon} \langle a,b,c \mid acabcabcac=1 \rangle \). We factor, as before, the defining word into minimal invertible pieces as (ac)(abc)(abc)(ac), which yields \(U(M_8) \cong \text {Gp} \langle x_1,x_2 \mid x_1 x_2^2 x_1 = 1 \rangle \). However, \(x_1x_2^2 x_1\) is not a primitive word; indeed, \(U(M_8) \cong \text {Gp} \langle x_1,x_2 \mid x_1^2x_2^2 =1 \rangle \), the fundamental group of the Klein bottle, which is virtually \(\mathbb {Z}^2\), and therefore not even hyperbolic. In particular, \(U(M_8)\) is not virtually free, so by Theorem A (in the form of Corollary 3.1) \(M_8\) is not context-free.
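The two primitivity claims in Example 5 can be checked by hand, but a short sketch makes the computations explicit. The code below is not Whitehead's algorithm: it only applies a chosen substitution of generators and freely reduces, and computes exponent sums. The encoding (`x`, `y` for \(x_1, x_2\), uppercase for inverses), the function names, and the particular pair of Nielsen transformations used are choices made for this sketch. The exponent-sum test is a necessary condition only: a primitive element of \(F_X\) must map to a vector in \(\mathbb {Z}^{|X|}\) whose entries have greatest common divisor 1.

```python
from functools import reduce
from math import gcd

def free_reduce(word):
    """Cancel adjacent inverse pairs such as 'xX' until none remain."""
    out = []
    for ch in word:
        if out and out[-1] == ch.swapcase():
            out.pop()
        else:
            out.append(ch)
    return "".join(out)

def inverse(word):
    """Formal inverse: reverse the word and invert every letter."""
    return word[::-1].swapcase()

def apply_map(word, images):
    """Apply the endomorphism g -> images[g] (identity on unlisted generators)."""
    out = []
    for ch in word:
        gen = ch.lower()
        img = images.get(gen, gen)
        out.append(img if ch.islower() else inverse(img))
    return free_reduce("".join(out))

def exponent_sums(word, gens):
    """Image of the word in the abelianisation, as a tuple of integers."""
    return tuple(word.count(g) - word.count(g.upper()) for g in gens)

def passes_abelian_test(word, gens):
    """Necessary (not sufficient) condition for primitivity."""
    return reduce(gcd, exponent_sums(word, gens)) == 1

# Example 5(1): x1 x2 x1 is carried to x1 by two Nielsen transformations.
step1 = apply_map("xyx", {"y": "yX"})    # x2 -> x2 x1^{-1}
step2 = apply_map(step1, {"x": "xY"})    # x1 -> x1 x2^{-1}

# Example 5(2): x1 x2^2 x1 has exponent sums (2, 2), so it is not primitive.
klein_sums = exponent_sums("xyyx", "xy")
```

Here `step1` is the word \(x_1x_2\) and `step2` is the single letter \(x_1\), while `klein_sums` has greatest common divisor 2, which rules out primitivity of \(x_1x_2^2x_1\).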

Theorem B can be seen as partial progress in studying the following question, which does not appear to have been studied anywhere previously.

Question 2

Let \(M = \text {Mon} \langle A \mid u=v \rangle \) be a one-relation monoid. Is it decidable whether M has context-free word problem?

Note that whereas it is still an open problem whether the word problem for all one-relation monoids is decidable (see Nyberg-Brodda [62] for a survey of this problem), this does not preclude classifying those one-relation monoids with context-free word problem; cf. e.g. the classification by Shneerson [69, 70] of the one-relation monoids which satisfy some non-trivial identity. For further progress on Question 2, see Nyberg-Brodda [63].

We turn to the word problem for special monoids, considered as a decision problem. Let M be a finitely presented special monoid, generated by A and whose group of units has decidable word problem. The method devised by Makanin [52] or indeed Zhang [78] to decide whether, for \(u, v \in A^*\), we have \(u =_M v\), is exponential in \(f(|u|+|v|)\), where f is the complexity of the word problem for the group of units. It would be interesting to see to what extent this can be sharpened. The best we can do for now is the following, which follows directly from Theorem A.

Proposition 3.4

Let M be a finitely presented special monoid, finitely generated by A, with virtually free group of units. Then the word problem for M, with input \(u, v \in A^*\), is decidable in \(O(n^{2.3728639})\)-time, where \(n = |u|+|v|\).

Proof

By Theorem A, there exists a context-free grammar \(\Gamma _M\) generating \({{\,\mathrm{WP}\,}}_A^M\). Thus the word problem reduces to checking whether \(u \# v^\text {rev}\in \mathscr {L}(\Gamma _M)\). By a result of Valiant [74], checking membership for a word of length n in the language of a context-free grammar reduces to the problem of multiplying \(n \times n\)-matrices with entries in \({\text {GF}}(2)\). The current best known algorithm for this latter problem is \(O(n^{2.3728639})\), due to Le Gall [44], see also Williams [76]. \(\square \)
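To make the reduction to grammar membership concrete, here is a minimal sketch for one small case, the bicyclic monoid \(\text {Mon} \langle b,c \mid bc=1 \rangle \). It is a naive dynamic-programming recogniser, not Valiant's algorithm, and makes no attempt at the stated time bound. The grammar \(S \rightarrow \# \mid bSb \mid cSc \mid D\,S\,D'\), where D is the language of words equal to 1 and \(D'\) its reversal, is our own assumption, read off from the system \(\mathscr {T}\) in the proof of Lemma 2.9; the function names are illustrative.

```python
from functools import lru_cache

def is_balanced(s, opener, closer):
    """True iff s is a balanced bracket word over {opener, closer}."""
    depth = 0
    for ch in s:
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth < 0:
                return False
        else:
            return False
    return depth == 0

def in_word_problem(word):
    """Decide whether `word` is generated by S -> # | bSb | cSc | D S D'."""

    @lru_cache(maxsize=None)
    def derives_s(i, j):
        if word[i:j] == "#":
            return True
        # S -> a S a for a letter a in {b, c}.
        if j - i >= 3 and word[i] == word[j - 1] and word[i] in "bc":
            if derives_s(i + 1, j - 1):
                return True
        # S -> D S D', with D = words equal to 1 and D' = reversals of such.
        for p in range(i, j):
            for q in range(p + 1, j + 1):
                if (p, q) == (i, j):
                    continue  # both D and D' empty: no progress
                if (is_balanced(word[i:p], "b", "c")
                        and is_balanced(word[q:j], "c", "b")
                        and derives_s(p, q)):
                    return True
        return False

    return derives_s(0, len(word))

def equal_in_bicyclic(u, v):
    """Word problem for the bicyclic monoid via membership of u # v^rev."""
    return in_word_problem(u + "#" + v[::-1])
```

For example, `equal_in_bicyclic("bcb", "bbc")` tests the word `bcb#cbb`, which the grammar derives, whereas `b#c` is not derivable, matching the fact that b and c are distinct elements.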

We mentioned earlier that any context-free monoid is word-hyperbolic. Unlike for hyperbolic groups (for which the word problem is decidable in linear time), the best known result along these lines for word-hyperbolic monoids is that the word problem can be solved in polynomial time [16]. No upper bound on the degree for the polynomials that arise in this way is known to exist, but the best current known algorithm cannot be faster than \(O(n^5 \log n)\).

Conjecture 1

Let M be a finitely presented special monoid with virtually free group of units. Then the word problem is decidable in quadratic time.

The conjectured fastest time for matrix multiplication is \(O(n^2)\); if that conjecture is true, then the proof of Proposition 3.4 demonstrates that Conjecture 1 is also true. In line with this conjecture, we ask the following rather broad question.

Question 3

Let M be a finitely presented special monoid such that the word problem for the group of units of M is decidable in O(f(n)) for some function f. Does there always exist a polynomial g such that the word problem for M is decidable in O(g(f(n)))?

Finally, we turn to the rational subset membership problem. A subset \(K \subseteq M\) is rational if and only if there exists a regular language \(L \subseteq A^*\) such that \(K = \pi (L)\) (see [25] for this definition). The rational subset membership problem is said to be decidable if, given as input a word \(w \in A^*\) and a regular language \(K \subseteq A^*\) (given e.g. as a finite-state automaton accepting the language), then we can decide if \(\pi (w) \in \pi (K)\). Every singleton element of a monoid M is a rational subset of M; as is every finitely generated submonoid; as is, for any \(u \in A^*\), the set of elements \(m \in M\) with a representative uv (resp. vu) for some \(v \in A^*\), where for this final subset, we can take \(L = uA^*\) (resp. \(A^*u\)). Therefore, decidability of the rational subset membership problem implies decidability of the word, submonoid membership, and divisibility problems for M. It is not difficult to show that, for any monoid M with context-free word problem, generated by a finite set A and with associated surjective homomorphism \(\pi :A^*\rightarrow M\), the pre-image \(\pi ^{-1}(R) \subseteq A^*\) of any rational subset \(R \subseteq M\) is a context-free language; indeed, the proof is virtually identical to the proof of (i)\(\implies \)(ii) in Theorem 3.2. Thus, from Theorem A, we deduce the following:

Theorem 3.5

Let M be a finitely presented special monoid with virtually free group of units. Then the rational subset membership problem for M is decidable.

This theorem generalises the fact that virtually free groups have decidable rational subset membership problem. A good deal is known about the rational subset membership problem for groups, see e.g. [46, 47] and the survey by Lohrey [45]. This survey devotes 19 pages to the problem for groups, and yet a single paragraph (Sect. 12) summarises all material known in 2015 regarding the problem for monoids. Thus Theorem 3.5 greatly expands the known results for monoids. For example, Render and Kambites [68] prove that the bicyclic monoid \(\text {Mon} \langle b,c \mid bc=1 \rangle \) has decidable rational subset membership problem. As the bicyclic monoid has trivial group of units, their result is a very special case of Theorem 3.5.
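To make the bicyclic monoid concrete, the following minimal Python sketch decides its word problem, which is the singleton case of the rational subset membership problem. It relies on the standard fact that the single rule \(bc \rightarrow 1\) forms a complete rewriting system whose normal forms are the words \(c^i b^j\). The function names are our own, and the sketch is an illustration only; it is not the method of [68].

```python
def bicyclic_normal_form(word: str) -> str:
    """Reduce a word over {b, c} with the rule bc -> (empty word).

    The irreducible result has the shape c^i b^j.
    """
    stack = []
    for letter in word:
        if letter not in "bc":
            raise ValueError(f"unexpected generator: {letter!r}")
        # A 'c' arriving directly after a 'b' completes a factor bc, which is deleted.
        if letter == "c" and stack and stack[-1] == "b":
            stack.pop()
        else:
            stack.append(letter)
    return "".join(stack)


def equal_in_bicyclic(u: str, v: str) -> bool:
    """Two words are equal in Mon<b, c | bc = 1> iff their normal forms coincide."""
    return bicyclic_normal_form(u) == bicyclic_normal_form(v)


print(equal_in_bicyclic("bc", ""))   # True:  bc = 1
print(equal_in_bicyclic("cb", ""))   # False: cb is already in normal form
```

The second call shows the one-sided behaviour that makes the monoid infinite: \(bc = 1\) holds, but \(cb = 1\) does not.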

3.3 Other classes of languages

In view of the restrictions placed on the class \(\mathbf {C}\) of languages in order to apply Theorem A, one might ask to what extent these may be weakened. The purpose of this section is to demonstrate that it is not at all straightforward to weaken them. First, we remark that the class \(\mathbf {REG}\) of regular languages is not a super-\({{\,\mathrm{AFL}\,}}\), as it does not have the monadic ancestor property. The assumption of the monadic ancestor property cannot be directly removed from the statement of Theorem A, as the bicyclic monoid (whose group of units is trivial, and hence has regular word problem) does not have regular word problem, being infinite. However, it is not hard to understand the special monoids with regular word problem.

Proposition 3.6

Let M be a finitely presented special monoid. Then M has regular word problem if and only if M is a finite group.

Proof

By Anīsīmov’s theorem, M has regular word problem if and only if it is finite. On the other hand, from the results by Adian [2] on identities in special monoids it follows that a special monoid is finite if and only if it is a group; see [53, Theorem 6] for a proof of this fact using rewriting techniques. \(\square \)

Another class of languages which is not a super-\({{\,\mathrm{AFL}\,}}\) is the class \(\mathbf {DCF}\) of deterministic context-free languages. This is not, for example, closed under homomorphism or union. We can describe special monoids with deterministic context-free word problem quite well. First, free monoids have word problem in \(\mathbf {DCF}\), as the language \(\{ w \#w^\text {rev}\mid w \in A^*\}\) is easily seen to be in \(\mathbf {DCF}\). As an aside, this demonstrates the importance of the symbol \(\#\), as the language of palindromes \(\{ w w^\text {rev}\mid w \in A^*\}\) is not in \(\mathbf {DCF}\) (see [36, Exercise 12.6(b)]).
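The role of the marker \(\#\) can be seen in a stack-based recogniser for \(\{ w \#w^\text {rev}\mid w \in A^*\}\): the marker tells the machine exactly when to stop pushing and start popping, so no guessing is needed. The Python sketch below simulates such a deterministic pushdown recogniser; the function name and the default alphabet are illustrative assumptions.

```python
def accepts_marked_reverse(s: str, alphabet: str = "ab", marker: str = "#") -> bool:
    """Deterministically decide whether s has the form w # w^rev with w over `alphabet`."""
    stack = []
    symbols = iter(s)

    # Phase 1: push letters until the marker is read.
    for ch in symbols:
        if ch == marker:
            break
        if ch not in alphabet:
            return False
        stack.append(ch)
    else:
        return False  # the input ended without a marker

    # Phase 2: every remaining letter must match the top of the stack.
    for ch in symbols:
        if not stack or stack.pop() != ch:
            return False

    return not stack  # accept only if the whole of w has been matched


print(accepts_marked_reverse("ab#ba"))  # True
print(accepts_marked_reverse("abba"))   # False: no marker
```

Without the marker, a recogniser for \(\{ w w^\text {rev}\mid w \in A^*\}\) would have to guess the midpoint of its input, which is the intuition behind that language not being in \(\mathbf {DCF}\).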

We make an observation. Any special monoid M (generated by A and with pieces \(\Delta \)) which is not isomorphic to a free product of a free monoid by a group contains a submonoid isomorphic to the bicyclic monoid. Indeed, M is not such a free product if and only if there is some piece \(\delta \in \Delta \) with \(|\delta |>1\). Write \(\delta \equiv u_1 u_2\) with \(u_1, u_2 \in A^+\), and let \(u \in A^*\) be such that u is an inverse of \(\delta \). Let \(v \equiv u_2 u\). Then \(u_1 v \equiv u_1 u_2 u \equiv \delta u =_M 1\), while it is not hard to see that we do not have \(vu_1 =_M 1\). It follows by [19, Lemma 1.31] that \(\langle u_1, v \rangle \le M\) is isomorphic to the bicyclic monoid. This is essentially the proof given in Lallement [43, Theorem 1.2], generalised from the one-relation case. Now, Brough, Cain and Pfeiffer [13] conjectured that the bicyclic monoid does not have deterministic context-free word problem; Kambites (unpublished) confirmed this conjecture, by an application of the pumping lemma for deterministic context-free languages [77]. In particular, by [35, Proposition 8(a)], no monoid containing the bicyclic monoid has deterministic context-free word problem. By the above, it follows that no special monoid which is not a free product of a free monoid by a group has deterministic context-free word problem. It follows implicitly from Makanin [51] or explicitly from Benois [8] that a special monoid is isomorphic to a free product of a free monoid by a group if and only if it is right cancellative, and for special monoids, right cancellativity is equivalent to cancellativity. We conclude:

Proposition 3.7

Let M be a finitely presented special monoid with deterministic context-free word problem. Then M is cancellative, and the group of units U(M) is a context-free group.

We emphasise that cancellative special monoids are somewhat pathological. It is natural to conjecture that the converse of the above proposition holds, too.

Conjecture 2

Let M be a finitely presented special monoid. Then M has deterministic context-free word problem if and only if M is cancellative and the group of units U(M) is a context-free group.

The class of monoids with word problem in \(\mathbf {C}= \mathbf {CF}\) is closed under taking free products [13, Theorem 6.2]. Indeed, the author has shown that the same is true for any super-\({{\,\mathrm{AFL}\,}}\) \(\mathbf {C}\) [61]. On the other hand, it is an open problem whether the same is true for the class of monoids with word problem in \(\mathbf {DCF}\). If this were true, then this would imply Conjecture 2 holds, in view of Benois’ result, the fact that free monoids have word problem in \(\mathbf {DCF}\), and that the class of groups with word problem in \(\mathbf {CF}\) coincides with the class of groups with word problem in \(\mathbf {DCF}\) [55].

3.4 Infix presentations

Recall that in Sect. 2.2 it was proved that every special monoid admits a presentation (with pieces \(\Delta \)) satisfying the small subpiece condition. A stronger condition than the small subpiece condition is the infix condition, namely the condition that \(\Delta \) be an infix code; i.e. no piece contains a piece as a proper subword. For example, let

$$\begin{aligned} M = \text {Mon} \langle a,b,c,d \mid (ab)(cd)(ab)=1 \rangle \quad \text {resp.} \quad M' = \text {Mon} \langle a,b \mid (ab)(aabb)(ab)=1 \rangle . \end{aligned}$$

These presentations have pieces \(\Delta = \{ ab, cd\}\) resp. \(\Delta ' = \{ ab, aabb \}\). Thus M is given by an infix presentation, but \(M'\) is not. To the author it appears unlikely that \(M'\) admits an infix presentation. However, at present, we have no direct means of proving this suspicion. This makes the following question natural.

Question 4

Does every special monoid admit an infix presentation?
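Once a finite set of pieces is known, the infix condition itself is a purely combinatorial check on words. The short Python sketch below tests it on the two piece sets \(\Delta = \{ ab, cd\}\) and \(\Delta ' = \{ ab, aabb \}\) given above. The function name is our own, and the sketch takes the pieces as input; it does not compute them from a presentation.

```python
def is_infix_code(pieces: set[str]) -> bool:
    """Return True if no word in `pieces` contains another one as a proper subword."""
    return not any(
        inner != outer and inner in outer
        for outer in pieces
        for inner in pieces
    )


print(is_infix_code({"ab", "cd"}))    # True:  the pieces of M
print(is_infix_code({"ab", "aabb"}))  # False: ab occurs inside aabb
```

Note that this only decides whether a given presentation is infix. Question 4 asks whether some presentation of the monoid is infix, which the check does not address.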

We conjecture that the answer to Question 4 is negative. The only general result we are able to show towards answering Question 4 is the following.

Proposition 3.8

Let \(M = \text {Mon} \langle A \mid w_1 = 1, w_2 = 1, \dots , w_k = 1 \rangle \). If the group of units of M is trivial, then M admits an infix presentation. Furthermore, an infix presentation for M can be effectively computed from the given presentation for M.

Proof

Let \(\Delta = \{ \delta _1, \delta _2, \dots , \delta _n \}\) be the pieces of the presentation. As every \(\delta _i\) is invertible, and \(U(M) = 1\), we have \(\delta _i =_M 1\) for every \(1 \le i \le n\). Furthermore, if \(w_\mu \equiv \delta _{i_1} \delta _{i_2} \cdots \delta _{i_\ell }\) with \(1 \le \mu \le k\) and pieces \(\delta _{i_j} \in \Delta \) and \(1 \le i_j \le n\) for \(1 \le j \le \ell \), then \(w_\mu =_M 1\) follows from the set of relations \(\{ \delta _i = 1 \mid 1 \le i \le n\}\). Thus the given presentation for M is equivalent to the presentation

$$\begin{aligned} M = \text {Mon} \langle A \mid \delta _1 = 1, \delta _2 = 1, \dots , \delta _n = 1 \rangle . \end{aligned} \tag{3.1}$$

It is obvious that \(\Delta \) is the set of minimal invertible pieces for this presentation, too. Suppose the presentation (3.1) is not infix. Then there are pieces \(\delta , \delta ' \in \Delta \) such that \(\delta \equiv h_1 \delta ' h_2\) for some \(h_1, h_2 \in A^+\). As \(\delta =_M 1\) and \(\delta ' =_M 1\), we have \(h_1 h_2 =_M 1\), so we may add the relation \(h_1 h_2 = 1\) to (3.1) without changing M. At this point the relation \(\delta = 1\) (i.e. \(h_1 \delta ' h_2 = 1\)) is redundant, as it follows from \(h_1h_2 = 1\) and \(\delta ' = 1\), and we remove it. The resulting presentation clearly has pieces \(\Delta '\), where \(\Delta ' \subseteq \Delta \). We then apply the same process to the new presentation. As every piece in \(\Delta \) contains at most finitely many occurrences of other pieces, this eventually terminates in an infix presentation.

As \(U(M) = 1\), the word problem is decidable for M by Makanin’s theorem, and so every step above is effective. \(\square \)

For example, consider the monoid

$$\begin{aligned} M_9 = \text {Mon} \langle a,b,c,d \mid bcabcd=1, abcd=1, bc=1 \rangle . \end{aligned}$$

It is easy to check that \(\Delta = \{ bc, ad, aadd, abcd \}\), so this is not an infix presentation. However, \(U(M_9) = 1\), so applying the treatment in the proof of Proposition 3.8 we find

$$\begin{aligned} M_9 \cong \text {Mon} \langle a,b,c,d \mid abcd=1, bc=1 \rangle \cong \text {Mon} \langle a,b,c,d \mid ad=1, bc=1 \rangle \end{aligned}$$

and the set of pieces of the final presentation is \(\Delta ' = \{ ad, bc \}\). Indeed, \(M_9\) is simply the free product of two copies of the bicyclic monoid. In particular this final presentation is infix. This method is rather useful, and can likely be extended somewhat (say, to when U(M) is finite?). Note, however, that Proposition 3.8 does nothing for \(M'\) above, as \(U(M') \cong \mathbb {Z}\).
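The rewriting loop in the proof of Proposition 3.8 can be sketched in Python as follows. This is a purely syntactic illustration under strong assumptions: the input is taken to be a piece set \(\Delta \) in which every piece equals 1 in M (the situation of the presentation \(\text {Mon} \langle A \mid \delta _1 = 1, \dots , \delta _n = 1 \rangle \) in the proof), and the code neither computes pieces nor verifies that the group of units is trivial. The function names are our own.

```python
def find_inner_piece(relators: set[str]):
    """Find delta = h1 + inner + h2 with inner in `relators` and h1, h2 non-empty.

    Returns (delta, h1, h2), or None if no such factorisation exists.
    """
    for delta in sorted(relators):
        for inner in sorted(relators):
            # Search strictly inside delta, so that h1 and h2 are both non-empty.
            start = delta.find(inner, 1, len(delta) - 1)
            if start != -1:
                return delta, delta[:start], delta[start + len(inner):]
    return None


def infix_reduce(pieces: set[str]) -> set[str]:
    """Repeatedly replace a relation delta = 1 by h1 h2 = 1, as in the proof.

    Assumes (without checking) that every input word equals 1 in the monoid.
    Each step strictly lowers the total length of the relators, so the loop stops.
    """
    relators = set(pieces)
    while (hit := find_inner_piece(relators)) is not None:
        delta, h1, h2 = hit
        relators.discard(delta)
        relators.add(h1 + h2)
    return relators


# The pieces of M_9 stated above reduce to the relators of the final presentation.
print(sorted(infix_reduce({"bc", "ad", "aadd", "abcd"})))  # ['ad', 'bc']
```

The assumption that every piece equals 1 is essential. The function would also run on the pieces \(\{ ab, aabb \}\) of \(M'\), but its output would be meaningless there, since \(U(M') \cong \mathbb {Z}\) is not trivial.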