1 Introduction

One motivation for this work stems from Description Logic (DL) [1], where constant symbols (called individual names) are used within knowledge bases to denote objects or individuals in an application domain. If such objects are composed of other objects, it makes sense to represent them as (ground) terms rather than constants. For example, the couple consisting of individual a in the first component and individual b in the second component is more reasonably represented by the term f(ab) (where f is a binary function symbol denoting the couple constructor) than by a third constant c that is unrelated to a and b. In fact, if we have two couples, one consisting of a and b and the other of \(a'\) and \(b'\), and we learn (by DL reasoning or from external sources) that a is equal to \(a'\) and b is equal to \(b'\), then this automatically implies that f(ab) is equal to \(f(a',b')\), i.e., that this is one and the same couple, whereas we would not obtain such a consequence if we had introduced constants c and \(c'\) for the two couples.

If we use terms to represent objects, and can learn (e.g., by DL reasoning) that two terms are supposed to be equal, we need to be able to decide which other identities between terms can be derived from the given ones. Fortunately, this problem (usually called the word problem for ground identities) is decidable in polynomial time. The standard approach for deciding this word problem is congruence closure [2,3,4,5]. Basically, congruence closure starts with the given set of ground identities E, and then extends it using closure under reflexivity, symmetry, transitivity, and congruence. The set \( CC (E)\) obtained this way is usually infinite, and the main observation that yields decidability in polynomial time is that one can restrict it to the subterms of E and the subterms of the terms for which one wants to decide the word problem. An alternative approach for deciding the word problem for ground identities is based on term rewriting. Basically, in this approach one generates an appropriate canonical term rewriting system from E, and then decides whether two terms are equal modulo the theory E by computing their canonical forms and checking whether they are syntactically equal. This was implicit in [6], and made explicit in [7] (see also [8, 9] for other rewriting-based approaches).

In the motivating example from DL, but also in other settings where congruence closure is employed (such as SMT [10, 11]), it sometimes makes sense to assume that certain function symbols satisfy additional semantic properties that are not expressible by ground identities. Such symbols are then often called interpreted symbols, in contrast to the uninterpreted ones, for which no such properties are required. For example, one may want to consider couples where the order of the components is irrelevant, which means that the couple constructor function is commutative. Another interesting property for (ordered) couples is extensionality: if two couples are equal then they must have the same first and second components, i.e., the couple constructor f must satisfy the extensionality axiom \(f(x,y) \mathrel {\approx }f(x',y') \Rightarrow x\mathrel {\approx }x' \wedge y\mathrel {\approx }y'\). While it is well-known that adding commutativity does not increase the complexity (see, e.g., [3, 12]), extensionality has, to the best of our knowledge, not been considered in this context before. The problem with extensionality is that it allows us to derive “small” identities from larger ones. Consequently, it is conceivable that one first needs to generate such large identities using congruence and other rules, before one can get back to a smaller one through the application of extensionality. Thus, it is not obvious that also with extensionality one can restrict congruence closure to a finite set of terms determined by the input.

Here, we will tackle this problem using a rewriting-based approach. Our proofs imply that, also with extensional symbols, proofs of identities that detour through “large” terms can be replaced by proofs using only “small” terms, but it is not clear how this could be shown directly without the rewriting-based approach introduced in this paper.

The conference version of this paper [13] shows how the rewriting-based approach of [7] can be extended to also handle commutative symbols. In contrast to approaches that deal with associative-commutative (AC) symbols [14, 15] using rewriting modulo AC, commutativity is treated by introducing an additional ground rewrite system \(R(\varSigma _c)\) consisting of all appropriately ordered ground instances of commutativity. In Sect. 3, we extend this approach from commutativity to more general shallow identities [16] of the form \(h(x_1,\ldots ,x_n) \mathrel {\approx }h(y_1,\ldots ,y_n)\) where \(\{x_1,\ldots ,y_n\}=\{y_1\ldots ,y_n\}\) is a set of (not necessarily distinct) variables. Although it deals with a more general setting, we were able to design a simpler congruence closure algorithm. Basically, it applies the classical rewriting-based congruence closure for uninterpreted function symbols. Shallow identities of the above form, called strongly shallow in this paper, are dealt with by reducing the left-hand sides of the generated rules using an additional ground rewrite system induced by the strongly shallow identities. This ground system is infinite, but finitely represented by an ordered rewrite system [17]. No other interaction between the ground identities and the strongly shallow identities is needed.

In [13] it was shown how to extend the commutative congruence closure algorithm developed there to deal also with (non-commutative) extensional symbols. In Sect. 4, we not only extend this approach from commutativity to strongly shallow identities. Instead, we exhibit general properties that the canonical rewrite system computed by a semantic congruence closure algorithm must satisfy such that extensionality for otherwise uninterpreted function symbols can be handled by a simple iterative algorithm that derives new identities between constants due to extensionality. As already observed in [13], it does not make sense to consider symbols that are both commutative and extensional since this makes the induced equational theory trivial. Instead, we introduce, in [13] and in Sect. 5 of the present paper, the notion of c-extensionality, which is more appropriate for commutative symbols, but increases the complexity of the word problem from polynomial to coNP-complete. Section 6 gives a more detailed description of related work.

2 Preliminaries

We assume that the reader is familiar with basic notions and results regarding equational theories, universal algebra, and term rewriting. In this section, we first recall the relevant notions and results regarding equational theories and term rewriting, but refer the readers to [5] for details. We will keep as close as possible to the notation introduced in [5]. In particular, we use \(\mathrel {\approx }\) to denote identities between terms and \(=\) to denote syntactic equality. Then, we introduce congruence closure, both without and with interpreted function symbols. Finally, we define our notion of strongly shallow identities, and show how such identities give rise to an ordered rewrite system that is canonical on ground terms.

2.1 Equational Theories and Term Rewriting

Terms are built as usual from variables, constants, and function symbols. A term of the form \(f(t_1,\ldots ,t_n)\) for an n-ary function symbol f is said to have root symbol f. An identity is a pair of terms (st), which we usually write as \(s\mathrel {\approx }t\). A ground term is a term not containing variables and a ground identity is a pair of ground terms. Given a set of (not necessarily ground) identities F, the equational theory induced by F is defined semantically as \( {\mathrel {\approx }_F} := \{ s\mathrel {\approx }t \mid \text{ every } \text{ model } \text{ of } F \text { is a model of} s\mathrel {\approx }t\}. \) The notion of model used here is the usual one from first-order logic, where the equality symbol is interpreted as the equality relation on the domain and we assume that identities are (implicitly) universally quantified. Since we consider signatures consisting only of constant and function symbols, we call first-order interpretations algebras.

Birkhoff’s theorem (see Theorem 4.5.14 in [5]) provides us with an alternative syntactic characterization of \(\mathrel {\approx }_F\) that is based on rewriting. A given set of identities F induces a binary relation \(\rightarrow _F\) on terms. Basically, we have \(s \rightarrow _F t\) if there is an identity \(\ell \mathrel {\approx }r\) in F such that s contains a substitution instance \(\sigma (\ell )\) of \(\ell \) as subterm, and t is obtained from s by replacing this subterm with \(\sigma (r)\). Birkhoff’s theorem says that \(\mathrel {\approx }_F\) is identical to \(\mathrel {\displaystyle \mathop {\leftrightarrow }^{*}}_F\), where \(\mathrel {\displaystyle \mathop {\leftrightarrow }^{*}}_F\) denotes the reflexive, transitive, and symmetric closure of \(\rightarrow _F\).

The normal form of a term s is an irreducible term \({\widehat{s}}\) such that \(s \mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F {\widehat{s}}\), where \(\mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F\) denotes the reflexive and transitive closure of \(\rightarrow _F\) and \({\widehat{s}}\) is irreducible if there is no \(s'\) with \({\widehat{s}}\rightarrow _F s'\). If \(\rightarrow _F\) is canonical (i.e., terminating and confluent), then every term has a unique normal form, which we then called its canonical form. In addition, \(s \mathrel {\displaystyle \mathop {\leftrightarrow }^{*}}_F t\) iff s and t have the same canonical forms. Termination ensures that the normal form exists and confluence that it is unique. The relation \(\rightarrow _F\) is confluent if \(s\mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F t_1\) and \(s\mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F t_2\) imply that the pair of terms \(\langle t_1,t_2\rangle \) is joinable, i.e., there is a term t such that \(t_1\mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F t\) and \(t_2\mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F t\). It is terminating if there is no infinite chain \(t_0\rightarrow _F t_1\rightarrow _F t_2\rightarrow _F\cdots \). Termination can be proved using a reduction order, which is a well-founded order > on terms such that \(\ell > r\) for all \(\ell \mathrel {\approx }r\in F\) implies \(s > t\) for all terms st with \(s\rightarrow _F t\). Since > is well-founded this then implies termination. If \(\rightarrow _F\) is terminating, then confluence can be tested by checking whether all critical pairs of F are joinable. Basically, critical pairs \(\langle t_1,t_2\rangle \) consider the most general “forks” of the form \(s\rightarrow _F t_1\) and \(s \rightarrow _F t_2\) that are due to overlapping left-hand sides of identities. They can be obtained by computing the most general unifiers of left-hand sides of rules with subterms of renamed variants of left-hand sides of other or the same rule. To be more precise, if \(\ell \mathrel {\approx }r\) and \(g\mathrel {\approx }d\) are identities (whose variables have been renamed apart) such that a non-variable subterm \(\ell '\) of \(\ell \) unifies with g with most general unifier \(\sigma \), then this overlap induces the critical pair \(\langle t_1,t_2\rangle \) where \(t_1 = \sigma (r)\) and \(t_2\) is obtained from \(\sigma (\ell )\) by replacing the subterm \(\sigma (\ell ') = \sigma (g)\) with \(\sigma (d)\). Such a critical pair \(\langle t_1,t_2\rangle \) is joinable if there is a term t such that \(t_1\mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F t\) and \(t_2\mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F t\).

When considering the relation \(\rightarrow _F\), one calls F a term rewriting system rather than a set of identities, and writes its elements (called rewrite rules) as \(\ell \rightarrow r\) instead of \(\ell \mathrel {\approx }r\). From a formal point of view, however, both rewrite rules and identities are (ordered) pairs of terms. Given a set of such pairs, we can view it as a set of identities or a term rewriting system, and thus the notions introduced above apply to both.

2.2 Congruence Closure

Let \(\varSigma \) be a finite set of function symbols of arity \(\ge 1\) and \(C_0\) a finite set of constant symbols. We denote the set of ground terms built using symbols from \(\varSigma \) and \(C_0\) with \(G(\varSigma ,C_0)\). In the following, let E be a finite set of ground identities \(s \mathrel {\approx }t\) between terms \(s, t\in G(\varSigma ,C_0)\), and \(\mathrel {\approx }_E\) the equational theory induced by E on \(G(\varSigma ,C_0)\).

2.2.1 Classical Congruence Closure

It is well known (see, e.g., [5], Lemma 4.3.3) that \(\mathrel {\approx }_E\) (viewed as a subset of \(G(\varSigma ,C_0)\times G(\varSigma ,C_0)\)) can be generated using congruence closure, i.e., by exhaustively applying reflexivity, transitivity, symmetry, and congruence to E. To be more precise, \( CC (E)\) is the smallest subset of \(G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) that contains E and is closed under the following rules:

  • if \(s\in G(\varSigma ,C_0)\), then \({s\mathrel {\approx }s}\in CC (E)\) (reflexivity);

  • if \({s_1\mathrel {\approx }s_2}, {s_2\mathrel {\approx }s_3}\in CC (E)\), then \({s_1\mathrel {\approx }s_3}\in CC (E)\) (transitivity);

  • if \({s_1\mathrel {\approx }s_2}\in CC (E)\), then \({s_2\mathrel {\approx }s_1}\in CC (E)\) (symmetry);

  • if f is an n-ary function symbol and \({s_1\mathrel {\approx }t_1},\ldots , {s_n\mathrel {\approx }t_n}\in CC (E)\), then \(f(s_1,\ldots ,s_n) \mathrel {\approx }f(t_1,\ldots ,t_n)\in CC (E)\) (congruence).

The set \( CC (E)\) is usually infinite. To obtain a decision procedure for the word problem for E (i.e., for the relation \(\mathrel {\approx }_E\)), one can show that it is sufficient to restrict the application of the above rules to a finite subset of \(G(\varSigma ,C_0)\), which consists of the subterms of terms occurring in E and of the subterms of the terms for which one wants to decide whether holds or not (see, e.g., [5], Theorem 4.3.5).

2.2.2 Semantic Congruence Closure

As mentioned in the introduction, it sometimes makes sense to assume that certain function symbols are interpreted in the sense that they satisfy additional semantic properties that are not expressible by ground identities. Thus, we now consider a setting where there is a subset \(\varSigma _F \) of \(\varSigma \) and a set of (non-ground) identities F formulated using the function symbols in \(\varSigma _F \) and variables. Given a finite set E of ground identities \(s \mathrel {\approx }t\) between terms \(s, t\in G(\varSigma ,C_0)\), we assume that also the identities in F are satisfied. This means that we consider algebras \({\mathcal {A}}\) that satisfy not only the ground identities in E, but also the universal closure of the non-ground identities in F, i.e., algebras \({\mathcal {A}}\) that are models of \(E\cup F\).

Given \(s, t\in G(\varSigma ,C_0)\), we say that \(s\mathrel {\approx }t\) follows from E w.r.t. F (written \(s \mathrel {\approx }_E^{F}t\)) if \(s^{\mathcal {A}} = t^{\mathcal {A}} \) holds in all models \({\mathcal {A}} \) of \(E\cup F\). Thus, \(\mathrel {\approx }_E^{F}\) is the restriction of \(\mathrel {\approx }_{E\cup F}\) to the ground terms in \(G(\varSigma ,C_0)\). By a slight abuse of notation, we will later use \(\mathrel {\approx }_E^{F}\) also to denote the restriction of \(\mathrel {\approx }_{E\cup F}\) to \(G(\varSigma ,C)\) for a set of constants C containing \(C_0\).

The relation \({\mathrel {\approx }_E^{F}}\subseteq G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) can again be generated by extending congruence closure by an additional rule that deals with the identities in F, i.e., \( CC ^{F} (E)\) is the smallest subset of \(G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) that contains E and is closed under reflexivity, transitivity, symmetry, congruence, and the following rule:

  • if \({\ell \mathrel {\approx }r} \in F\) and \(\sigma \) is a substitution that maps the variables in \(\ell , r\) to elements of \(G(\varSigma ,C_0)\), then \({\sigma (\ell ) \mathrel {\approx }\sigma (r)} \in CC ^{F} (E)\) (F-identities).

We call \( CC ^{F} (E)\) the congruence closure of E w.r.t. F. Birkhoff’s theorem again shows that \( CC ^{F} (E)\) coincides with \({\mathrel {\approx }_E^{F}}\).

Without restrictions on the set of identities F, the word problem for E w.r.t. F (i.e., the relation \({\mathrel {\approx }_E^{F}}\)) need not be decidable. For example, if F axiomatizes associativity of a binary function symbol f, then this word problem corresponds to the word problem for finitely presented semigroups, which is known to be undecidable [18]. Thus, one needs to impose some restrictions on the set of identities F to guarantee decidability of the word problem.

2.3 Strongly Shallow Identities and Ordered Rewrite Systems

In [13], we considered the setting where \(\varSigma _F \) (denoted as \(\varSigma _c\) in [13]) contains only binary function symbols and F consists of the commutativity axioms \(f(x,y)\mathrel {\approx }f(y,x)\) for all \(f\in \varSigma _F \). In the present paper, we extend the results of [13] from commutativity to more general kinds of semantic properties of function symbols that can be expressed by strongly shallow identities. Note that the notion of strongly shallow identities introduced in this paper is more restrictive than what is often called “shallow” in the literature [16] (see Sect. 6 for more details).

Definition 1

Let \(\varSigma _S \subseteq \varSigma \) be a set of function symbols and \(h\in \varSigma _S \) a symbol in \(\varSigma _S \) of arity n. A strongly shallow h-identity is of the form \(h(x_1,\ldots ,x_n)\mathrel {\approx }h(y_1,\ldots ,y_n)\) where \(x_1,y_1,\ldots ,x_n,y_n\) are not necessarily distinct variables such that \(\{x_1,\ldots ,x_n\} = \{y_1,\ldots ,y_n\}\). The set of identities S is a set of strongly shallow identities for \(\varSigma _S \) if \(S = \bigcup _{h\in \varSigma _S}S_h\), where \(S_h\) is a non-empty set of strongly shallow h-identities for all \(h\in \varSigma _S \).

Example 1

Let fgh respectively be a binary, an n-ary, and a ternary function symbol (for \(n>2\)), and a a constant symbol.

  1. 1.

    The identity \(f(x,y)\mathrel {\approx }f(y,x)\) stating that f is commutative is a strongly shallow f-identity.

  2. 2.

    The identities \(g(x_1,\ldots ,x_i,x_{i+1},\ldots ,x_n)\mathrel {\approx }g(x_1,\ldots ,x_{i+1},x_{i},\ldots ,x_n)\) for distinct variables \(x_1,\ldots ,x_n\) and \(1\le i < n\) are strongly shallow g-identities. We call such an identity a g-transposition.

  3. 3.

    The identity \(h(x,x,y)\mathrel {\approx }h(x,y,y)\) is a strongly shallow h-identity.

  4. 4.

    The following identities are not strongly shallow according to our definition, though they are shallow according to the definition in [16]: \(f(x,x) \mathrel {\approx }x\), \(f(x,y)\mathrel {\approx }h(x,y,x)\), \(f(x,y) \mathrel {\approx }f(x,x)\), \(f(x,a)\mathrel {\approx }f(a,x)\).

The word problem for a finite set of strongly shallow identities (i.e., the relation \(\mathrel {\approx }_S\)) can easily be shown to be decidable using rewriting. However, the classical notion of a canonical rewrite system, as introduced above, cannot be used for this purpose. The reason is that strongly shallow identities, like commutativity or transposition identities, usually cannot be oriented into terminating rewrite rules. However, we can orient their ground instances into terminating rules using an appropriate reduction order.

Given a finite set of constant symbols \(C\supseteq C_0\), we denote the set of ground terms built using symbols from \(\varSigma \) and C with \(G(\varSigma ,C)\). Ground instances of terms are obtained by applying ground substitutions, which replace the variables in the given terms with elements of \(G(\varSigma ,C)\).

Definition 2

(Ordered rewrite system [17]) Given a finite set of (non-ground) identities F and a reduction order >, we call the pair \((F,{>})\) an ordered rewrite system. The ground rewrite system \(R(F,{>})\) induced by \((F,{>})\) is defined as

$$\begin{aligned} \{\theta (s)\rightarrow \theta (t) \mid s\mathrel {\approx }t\in F\cup F^{-1},\ \theta \ \text{ is } \text{ a } \text{ ground } \text{ substitution, } \text{ and }\ \theta (s)>\theta (t)\}, \end{aligned}$$

where \(F^{-1} = \{t\mathrel {\approx }s\mid {s\mathrel {\approx }t}\in F\}\).

Since > is assumed to be a reduction order, \(R(F,{>})\) is clearly terminating. The system \(R(F,{>})\) is usually infinite. However, if F is finite and > decidable, then one can effectively decide whether some rule in \(R(F,{>})\) applies to a given ground term, and compute a term resulting from the application of this rule. In particular, this implies that we can effectively compute a normal form of a given ground term w.r.t. the ground rewrite system \(R(F,{>})\). Note, however, that this normal form need not be unique since there is no guarantee that this system is confluent. We say that the ordered rewrite system \((F,{>})\) is canonical on ground terms if \(R(F,{>})\) is confluent.

As a first example, we consider commutativity, i.e., the setting where \(F_c\) consists of the commutativity axioms for the (binary) function symbols in \(\varSigma _c \subseteq \varSigma \). Instead of \(\varSigma _F \) and F we use the symbols \(\varSigma _c \) and \(F_c\) to indicate already in the notation that we now consider commutativity. Let \(\mathrel {>_{ lpo }}\) be the lexicographic path orderFootnote 1 induced by a linear order on \(\varSigma \cup C\). Note that \(\mathrel {>_{ lpo }}\) is then a linear order on \(G(\varSigma ,C)\) (see Exercise 5.20 in [5]). The ground rewrite system induced by \((F_c,{\mathrel {>_{ lpo }}})\) is of the form

$$\begin{aligned} R(F_c,{\mathrel {>_{ lpo }}}) = \{ f(s,t)\rightarrow f(t,s) \mid f\in \varSigma _c,\ s, t\in G(\varSigma ,C),\ \text{ and }\ s \mathrel {>_{ lpo }}t\}, \end{aligned}$$

and it is easy to see that it is confluent (this is an instance of Lemma 4 in [13], and also follows from what we show for strongly shallow identities below). Thus, \((F_c,{\mathrel {>_{ lpo }}})\) is canonical on ground terms.

Now, assume \(\varSigma _S \subseteq \varSigma \) and let S be a set of strongly shallow identities for \(\varSigma _S \). In general, the ground rewrite system \(R(S,{\mathrel {>_{ lpo }}})\) is not confluent. For example, assume that \(h\in \varSigma _S \) is ternary and \(S_h = \{h(x,y,z)\mathrel {\approx }h(x,z,y), h(x,x,y)\mathrel {\approx }h(x,y,y)\}\). If the linear order > on \(\varSigma \cup C\) that induces \(\mathrel {>_{ lpo }}\) satisfies \(a>b\), then the ground term h(aab) rewrites with \(R(S,{\mathrel {>_{ lpo }}})\) both to h(abb) and h(aba). Since these terms are irreducible w.r.t. \(R(S,{\mathrel {>_{ lpo }}})\), this shows that \(R(S,{\mathrel {>_{ lpo }}})\) is not confluent.

To obtain an ordered rewrite system that is canonical on ground terms, we extend S by all implied strongly shallow identities, i.e., we consider the set

$$\begin{aligned} S^* = \{\{h(x_1,\ldots ,x_n)\mathrel {\approx }h(y_1,\ldots ,y_n) \mid h\in \varSigma _S, h(x_1,\ldots ,x_n)\mathrel {\approx }_{S_h} h(y_1,\ldots ,y_n)\}. \end{aligned}$$

Finiteness of this set (up to renaming of variables) is an easy consequence of the fact that such identities can contain at most n different variables. Thus, we can fix an arbitrary set \(V_n\) of n different variables and assume without loss of generality that the variables \(x_1,y_1,\ldots ,x_n,y_n\) belong to \(V_n\). In addition, it is an easy consequence of the restricted form of strongly shallow identities that it is decidable whether an identity of this form follows from S: starting with \(h(x_1,\ldots ,x_n)\) one can simply enumerate the finitely many terms that can be obtained by iterated application of the identities in \(S_h\).

Thus, \((S^*,{\mathrel {>_{ lpo }}})\) is a finite ordered rewrite system that can effectively be computed. Due to the finiteness of \(S^*\) and decidability of \(\mathrel {>_{ lpo }}\) (Proposition 5.4.16 in [5]), checking whether a rule of \(R(S^*,{\mathrel {>_{ lpo }}})\) is applicable and actually applying it can effectively be done. The following lemma shows that \((S^*,{\mathrel {>_{ lpo }}})\) is canonical on ground terms.

Lemma 1

The ground rewrite system \(R(S^*,{\mathrel {>_{ lpo }}})\) is canonical and equivalent to S on \(G(\varSigma ,C)\), i.e., \(s\mathrel {\approx }_S t\) iff \(s \mathrel {\approx }_{R(S^*,{\mathrel {>_{ lpo }}})} t\) holds for all terms \(s,t\in G(\varSigma ,C)\).

Proof

Termination of \(R(S^*,{\mathrel {>_{ lpo }}})\) is an immediate consequence of the fact that \(\mathrel {>_{ lpo }}\) is a reduction order. Confluence is an easy consequence of the critical pair lemma for ordered rewrite systems in [17] (Lemma 2.1 and 2.2), but we prove it here explicitly for the sake of completeness.

Since \(R(S^*,{\mathrel {>_{ lpo }}})\) is terminating, it is sufficient to show that all critical pairs of this ground rewrite system are joinable. There are two cases to consider: an overlap at the root or a rewrite within the substitution part. If there is an overlap at the root, then there are rules \(h(\sigma (x_1),\ldots ,\sigma (x_n))\rightarrow h(\sigma (y_1),\ldots ,\sigma (y_n))\) and \(h(\theta (x_1'),\ldots ,\theta (x_n'))\rightarrow h(\theta (y_1'),\ldots ,\theta (y_n'))\) in \(R(S^*,{\mathrel {>_{ lpo }}})\) with identical left-hand sides, i.e., such that \(h(\sigma (x_1),\ldots , \sigma (x_n)) = h(\theta (x_1'),\ldots ,\theta (x_n'))\), which yields the critical pair:

$$\begin{aligned} \langle h(\sigma (y_1),\ldots ,\sigma (y_n)), h(\theta (y_1'),\ldots ,\theta (y_n'))\rangle . \end{aligned}$$
(1)

If we assume without loss of generality that the sets \(\{x_1,\ldots ,x_n\} =\{y_1,\ldots ,y_n\}\) and \(\{x_1',\ldots ,x_n'\} =\{y_1',\ldots ,y_n'\}\) are disjoint, then we can actually also assume that \(\sigma =\theta \). Consequently, this substitution is a unifier of the terms \(h(x_1,\ldots ,x_n)\) and \(h(x_1',\ldots ,x_n')\), and thus is an instance of the most general unifier (mgu) \(\delta \) of these two terms, i.e., there is a substitution \(\lambda \) such that \(\sigma = \lambda \delta = \theta \). Obviously, this mgu must map variables to variables, i.e., \(\delta (x_i) = \delta (x_i')\) is a variable for all \(i = 1,\ldots ,n\). But then \(h(x_1,\ldots ,x_n) \mathrel {\approx }_S h(y_1,\ldots ,y_n)\) and \(h(x_1',\ldots ,x_n') \mathrel {\approx }_S h(y_1',\ldots ,y_n')\) imply that \(h(\delta (y_1),\ldots ,\delta (y_n)) \mathrel {\approx }_S h(\delta (y_1'),\ldots ,\delta (y_n'))\). Since \(\mathrel {>_{ lpo }}\) is total on ground terms, this implies that one of the following two rules belongs to \(R(S^*,{\mathrel {>_{ lpo }}})\):

Since \(\sigma = \lambda \delta = \theta \), this shows that the critical pair (1) is joinable.

A rewrite in the substitution part means that there is a rule of the form \(h(\sigma (x_1),\ldots ,\sigma (x_n))\rightarrow h(\sigma (y_1),\ldots ,\sigma (y_n))\) in \(R(S^*,{\mathrel {>_{ lpo }}})\) and an \(i, 1\le i\le n\), such that \(\sigma (x_i)\) can be rewritten to a term \(r_i\) using a rule from \(R(S^*,{\mathrel {>_{ lpo }}})\). The critical pair obtained this way is

$$\begin{aligned} \langle h(\sigma (x_1),\ldots ,\sigma (x_{i-1}),r_i,\sigma (x_{i+1}),\ldots ,\sigma (x_n)), h(\sigma (y_1),\ldots ,\sigma (y_n))\rangle . \end{aligned}$$

If we modify the substitution \(\sigma \) to \(\sigma '\) by setting \(\sigma '(x_i) = r_i\), then both the left and the right component of the critical pair can be rewritten to the common term \(h(\sigma '(y_1),\ldots ,\sigma '(y_n))\) by \(R(S^*,{\mathrel {>_{ lpo }}})\). This finishes the proof that \(R(S^*,{\mathrel {>_{ lpo }}})\) is canonical.

By the definition of \(R(S^*,{\mathrel {>_{ lpo }}})\), every rule \(\ell \rightarrow r\) in \(R(S^*,{\mathrel {>_{ lpo }}})\) satisfies \(\ell \mathrel {\approx }_S r\), which shows that \(s \mathrel {\approx }_{R(S^*,{\mathrel {>_{ lpo }}})} t\) implies \(s\mathrel {\approx }_S t\) for all terms \(s,t\in G(\varSigma ,C)\). The other direction is an easy consequence of the fact that \(\mathrel {>_{ lpo }}\) is a linear order since this implies that every non-trivialFootnote 2 substitution instance in \(G(\varSigma ,C)\) of a non-trivial identity in S occurs (appropriately ordered) as a rule in \(R(S^*,{\mathrel {>_{ lpo }}})\). \(\square \)

Coming back to the example illustrating that \(R(S,{\mathrel {>_{ lpo }}})\) need not be confluent, we note that, in this example, the identity \(h(x,y,y)\mathrel {\approx }h(x,y,x)\) belongs to \(S^*\), and thus the term h(aba) rewrites to h(abb) in \(R(S^*,{\mathrel {>_{ lpo }}})\).

3 Semantic Congruence Closure Based on Rewriting

Let \(\varSigma \) be a finite set of function symbols of arity \(\ge 1\) and \(C_0\) a finite set of constant symbols. In the following, let E be a finite set of ground identities \(s \mathrel {\approx }t\) between terms \(s, t\in G(\varSigma ,C_0)\). In addition, assume that \(\varSigma _S \subseteq \varSigma \) and let S be a set of strongly shallow identities for \(\varSigma _S \). Below, we show how to construct a canonical rewrite system that can be used to decide the word problem for E w.r.t. S. This system will be the union of \(R(S^*,\mathrel {>_{ lpo }})\) with a finite ground rewrite system \({\widehat{R}}^{S} (E)\), whose construction is described below.

3.1 Construction of the Rewrite System \({\widehat{R}}^{S} (E)\)

Let \( Sub (E)\) denote the set of subterms of the terms occurring in E. In a first step, we introduce a new constant \(c_s\) for every term \(s\in Sub (E)\setminus C_0\). To simplify notation, for a constant \(a\in C_0\) we sometimes use \(c_a\) to denote a. Let \(C_1\) be the set of new constants introduced this way and \(C := C_0\cup C_1\). We fix an arbitrary linear order > on C, which will be used to orient identities between constants into rewrite rules. Note that this order does not take into account which terms the constants correspond to, and thus we may well have \(c_s > c_{f(s)}\).

The initial rewrite system R(E) induced by E consists of the following rules:

  • If \(s\in Sub (E)\setminus C_0\), then s is of the form \(f(s_1,\ldots ,s_n)\) for an n-ary function symbol f and terms \(s_1,\ldots ,s_n\) for some \(n\ge 1\). For every such s we add the rule \( f(c_{s_1},\ldots ,c_{s_n}) \rightarrow c_s \) to R(E).

  • For every identity \({s\mathrel {\approx }t}\in E\) we add \(c_s \rightarrow c_t\) to R(E) if \(c_s > c_t\), and \(c_t \rightarrow c_s\) if \(c_t > c_s\).

Obviously, the cardinality of \(C_1\) is linear in the size of E, and R(E) can be constructed in time linear in the size of E. From the above construction, it follows that R(E) has two types of rules: constant rules of the form \(c \rightarrow d\) for \(c > d\) and function rules of the form \(f(c_1, \ldots , c_n) \rightarrow d\).

Example 2

Consider \(E = \{ f(g(a),a) \mathrel {\approx }c, g(b) \mathrel {\approx }h(a), a \mathrel {\approx }b\}\) with \(\varSigma _S = \{f\}\) and \(S_f = \{f(x,y)\mathrel {\approx }f(y,x)\}\). It is easy to see that \(f(b,h(a)) \mathrel {\approx }_E^{S}c\).

Using our construction, we first introduce the constants \(C_1 = \{c_{f(g(a),a)}, c_{g(a)}, c_{g(b)}, c_{h(a)}\}\). If we fix the linear order on C as \(c_{f(g(a),a)}> c_{g(a)}> c_{g(b)}> c_{h(a)}> a> b > c\), then we obtain the following rewrite system: \(R(E) = \{ f(c_{g(a)},a) \rightarrow c_{f(g(a),a)}, g(a) \rightarrow c_{g(a)}, g(b) \rightarrow c_{g(b)}, h(a) \rightarrow c_{h(a)}, c_{f(g(a),a)} \rightarrow c, c_{g(b)} \rightarrow c_{h(a)}, a \rightarrow b \}\).

The following lemma is an easy consequence of the definition of R(E). It can be shown by a simple induction on the structure of the term s.

Lemma 2

For all terms \(s\in Sub (E)\) it holds that \(s \mathrel {\approx }_{R(E)} c_s\).

Using this lemma, we can show that the construction of R(E) is correct for consequence w.r.t. strongly shallow identities in the sense stated in the next lemma. Recall that E and \(\mathrel {\approx }_E^{S}\) are binary relations in \(G(\varSigma ,C_0)\), whereas R(E) and \(\mathrel {\approx }_{R(E)}^{S}\) are binary relations in \(G(\varSigma ,C)\) for \(C = C_0\cup C_1\).

Lemma 3

Viewed as a set of identities, w.r.t. S is a conservative extension of E w.r.t. S, i.e., iff for all terms .

Proof

To show the if-direction, assume that , i.e., there is an algebra \({\mathcal {A}}\) over the signature \(\varSigma \cup C_0\) satisfying all identities in \(E\cup S\) such that . We expand \({\mathcal {A}} \) to the new constants in \(C_1\) by interpreting \(c_s\) as \(s^{\mathcal {A}} \) for all \(s\in Sub (E)\setminus C_0\), and call the expansion obtained this way \({\mathcal {B}} \). Since , it is sufficient to show that \({\mathcal {B}}\) satisfies the identities in \(R(E)\cup S\). Since \({\mathcal {A}}\) satisfies S, its expansion obviously satisfies S as well. If \(f(c_{s_1},\ldots ,c_{s_n}) \rightarrow c_s\) belongs to R(E), then \(s = f(s_1,\ldots ,s_n)\). Consequently, \(f(c_{s_1},\ldots ,c_{s_n})^{\mathcal {B}} = f^{\mathcal {B}} (c_{s_1}^{\mathcal {B}},\ldots ,c_{s_n}^{\mathcal {B}}) = f^{\mathcal {A}} (s_1^{\mathcal {A}},\ldots ,s_n^{\mathcal {A}}) = s^{\mathcal {A}} = s^{\mathcal {B}} \), which shows that \({\mathcal {B}} \) satisfies the identity \(f(c_{s_1},\ldots ,c_{s_n}) \mathrel {\approx }c_s\). If \(c_s \rightarrow c_t\) belongs to R(E), then \({s\mathrel {\approx }t}\in E\). Consequently, \(c_s^{\mathcal {B}} = s^{\mathcal {A}} = t^{\mathcal {A}} = c_t^{\mathcal {B}} \), which shows that \({\mathcal {B}} \) satisfies the identity \(c_s \mathrel {\approx }c_t\).

To show the only-if-direction, assume that , i.e., there is an algebra \({\mathcal {B}}\) over the signature \(\varSigma \cup C\) satisfying all identities in \(R(E)\cup S\) such that . Let \({\mathcal {A}}\) be the reduct of \({\mathcal {B}}\) to the signature \(\varSigma \cup C_0\), which is obtained from \({\mathcal {B}}\) by forgetting about the interpretation of the symbols in \(C_1\). Since , it is sufficient to show that \({\mathcal {A}}\) satisfies the identities in \(E\cup S\). For the identities in S, this is again trivially true. Thus, consider an identity \({s\mathrel {\approx }t}\in E\). Then R(E) contains the rule \(c_s \rightarrow c_t\) or \(c_t \rightarrow c_s\). In addition, Lemma 2 yields \(s \mathrel {\approx }_{R(E)} c_s\) and \(t \mathrel {\approx }_{R(E)} c_t\). Consequently, \(s^{\mathcal {A}} = s^{\mathcal {B}} = c_s^{\mathcal {B}} = c_t^{\mathcal {B}} = t^{\mathcal {B}} = t^{\mathcal {A}} \) since \({\mathcal {B}}\) satisfies the identities in in R(E). \(\square \)

In Lemma 3, we consider the relation \(\mathrel {\approx }_{R(E)}^{S}\), where the elements of \(S_h\) for \(h\in \varSigma _S \) are used as additional identities. Our goal is, however, to deal both with the ground identities in E and with the strongly shallow identities in S by rewriting. This is where the ground rewrite system \(R(S^*,{\mathrel {>_{ lpo }}})\) comes into play. We assumed that \(\mathrel {>_{ lpo }}\) is induced by a linear order on \(\varSigma \cup C\) that extends > on C, makes each function symbol in \(\varSigma \) greater than each constant symbol in C, and linearly orders the function symbols in an arbitrary way. This implies that \(\ell \mathrel {>_{ lpo }}r\) for all rules \({\ell \rightarrow r}\in R(E)\). Consequently, termination of the rewrite system \(R(E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) can also easily be shown using \(\mathrel {>_{ lpo }}\).

However, in general \(R(E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) need not be confluent due to possible interactions of rules in R(E) with each other and of rules in R(E) with rules in \(R(S^*,{\mathrel {>_{ lpo }}})\). We turn \(R(E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) into a confluent and terminating system by modifying R(E) appropriately.

Algorithm 1

We start with \(R^{S}_0(E) := R(E)\) and \(i:=0\):

  1. (a)

    Let \(R^{S}_i(E){|}_ con \) consist of the constant rules in \(R^{S}_i(E)\). For every constant \(c\in C\), consider

    and let e be the least element in \([c]_i\) w.r.t. the order >. We call e the representative of c w.r.t. \(R^{S}_i(E)\) and >. If \(c\ne e\), then add \(c\rightarrow e\) to \(R^{S}_{i+1}(E)\).

  2. (b)

    In all function rules in \(R^{S}_i(E)\), replace each constant by its representative w.r.t. \(R^{S}_i(E)\) and >, and call the resulting set of function rules \(F^{S}_i(E)\).

    1. (b1)

      Let h be an n-ary function symbol in \(\varSigma _S\). For each function rule of the form \(h(c_1,\ldots ,c_n)\rightarrow d\) in \(F^{S}_i(E)\), replace this rule with the rule \(\ell \rightarrow d\), where \(\ell \) is the canonical form of \(h(c_1,\ldots ,c_n)\) w.r.t. the canonical rewrite system \(R(S^*,{\mathrel {>_{ lpo }}})\). Then continue with (b2).

    2. (b2)

      Let f be an n-ary function symbol in \(\varSigma \). For every term \(f(c_1,\ldots ,c_n)\) occurring as the left-hand side of a rule in \(F^{S}_i(E)\), consider all the rules \(f(c_1,\ldots ,c_n)\rightarrow d_1, \ldots , f(c_1,\ldots ,c_n)\rightarrow d_k\) in \(F^{S}_i(E)\) with this left-hand side. Let d be the least element w.r.t. > in \(\{d_1, \ldots , d_k\}\). Add \(f(c_1,\ldots ,c_n)\rightarrow d\) and \(d_j\rightarrow d\) for all j with \(d_j\ne d\) to \(R^{S}_{i+1}(E)\).

    If at least one constant rule has been added in step (b), then set \(i:= i+1\) and continue with step (a). Otherwise, terminate with output \({\widehat{R}}^{S} (E) := R^{S}_{i+1}(E)\).

Let us illustrate the construction of \({\widehat{R}}^{S} (E)\) using Example 2. In step (a), the non-trivial equivalence classes are \([a]_0 =\{a,b\}\) with representative b, \([c_{f(g(a),a)}] = \{c_{f(g(a),a)},c\}\) with representative c, and \([c_{g(b)}] = \{c_{g(b)}, c_{h(a)}\}\) with representative \(c_{h(a)}\). Thus, \(a \rightarrow b, c_{f(g(a),a)} \rightarrow c, c_{g(b)} \rightarrow c_{h(a)}\) are the constant rules added to \(R^{S}_1(E)\). The function rules in \(F^{S}_0(E)\) are then \(f(c_{g(a)},b) \rightarrow c, g(b) \rightarrow c_{g(a)}, g(b) \rightarrow c_{h(a)}, h(b) \rightarrow c_{h(a)}.\) Since \(c_{g(a)} > b\), the left-hand side of the rule \(f(c_{g(a)},b)\rightarrow c\) is reduced with \(R(S^*,{\mathrel {>_{ lpo }}})\) to \(f(b,c_{g(a)})\), and thus in step (b1) this rule is replaced with \(f(b,c_{g(a)})\rightarrow c\). In step (b2), there are two rules with left-hand side g(b). For these rules, we add \(c_{g(a)}\rightarrow c_{h(a)}\) and \(g(b) \rightarrow c_{h(a)}\) to \(R^{S}_1(E)\). The rules with left-hand sides different from g(b) are moved unchanged from \(F^{S}_0(E)\) to \(R^{S}_1(E)\) since their left-hand sides are unique. Thus, \(R^{S}_1(E) = \{ a \rightarrow b, c_{f(g(a),a)} \rightarrow c, c_{g(b)} \rightarrow c_{h(a)}, c_{g(a)}\rightarrow c_{h(a)}, f(b,c_{g(a)}) \rightarrow c, g(b) \rightarrow c_{h(a)}, h(b) \rightarrow c_{h(a)}\}.\)

In the second iteration step, we now have the new non-trivial equivalence class \([c_{g(b)}]_1 = \{c_{g(b)}, c_{h(a)}, c_{g(a)}\}\) with representative \(c_{h(a)}\). The net effect of step (a) is, however, that the constant rules are moved unchanged from \(R^{S}_1(E)\) to \(R^{S}_2(E)\). The function rules in \(F^{S}_1(E)\) are then \(f(b,c_{h(a)}) \rightarrow c, g(b) \rightarrow c_{h(a)}, h(b) \rightarrow c_{h(a)}\). Consequently, no reduction is possible in step (b1) and no constant rules are added in step (b2). The construction thus terminates with output \({\widehat{R}}^{S} (E) = \{ a \rightarrow b, c_{f(g(a),a)} \rightarrow c, c_{g(b)} \rightarrow c_{h(a)}, c_{g(a)}\rightarrow c_{h(a)}, f(b,c_{h(a)}) \rightarrow c, g(b) \rightarrow c_{h(a)}, h(b) \rightarrow c_{h(a)}\}.\)

3.2 Proof of Correctness and Tractability

Our goal is now to show that \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) provides us with a polynomial-time decision procedure for the word problem in E w.r.t. S. When looking at the complexity of the word problem, one needs to clarify what counts as input and what is assumed to be fixed. In our setting, where we want to know whether holds or not, one could on the one hand assume that S and E are fixed, and only the terms are part of the input. On the other hand, one could assume that the input consists of S, E, and . In this paper, we make the latter assumption, which makes our polynomiality result considerably stronger than if we had used the former one.

However, to obtain this result, we make the additional assumption that the maximal arity of the function symbols in \(\varSigma _S\) is bounded by a constant. In this case we say that \(\varSigma _S\) is arity-bounded. For the case of commutativity, this restriction is satisfied since then all the symbols in \(\varSigma _S\) are binary. If we consider g-transpositions, but do not bound the arity n of the symbol g (i.e., the arities of the symbols satisfying transpositions is determined by the input rather than fixed from the outset), then the arity-boundedness condition is not satisfied. If \(S_g\) consists of all g-transitions, then the set of identities \(\{g(x_1,\ldots ,x_n)\mathrel {\approx }g(y_1,\ldots ,y_n) \mid g(x_1,\ldots ,x_n)\mathrel {\approx }_{S_h} g(y_1,\ldots ,y_n)\}\) contains (for distinct variables \(x_1,\ldots ,x_n\)) all identities of the form \(g(x_1,\ldots ,x_n)\mathrel {\approx }g(x_{\pi (1)},\ldots ,x_{\pi (n)})\), where \(\pi \) is a permutation of the set \(\{1,\ldots ,n\}\), and is thus exponential in n. If n is assumed to be a constant, then this set of identities is actually of constant size.

In the following, we assume for our complexity results that \(\varSigma _S\) is arity bounded. Note that, even if the maximal arity of the symbols in \(\varSigma _S\) is bounded in this way, how many symbols of a certain allowed arity are present in \(\varSigma _S\) and which strongly shallow identities they satisfy, can still be determined by the input. First, we prove that, under this assumption, the construction of \({\widehat{R}}^{S} (E)\) takes only polynomial time.

Lemma 4

If \(\varSigma _S\) is arity-bounded, then the system \({\widehat{R}}^{S} (E)\) can be computed from R(E) in polynomial time.

Proof

First, note that step (a) can clearly be performed in polynomial time since deciding \(\mathrel {\approx }_{R^{S}_i(E){|}_ con }\) amounts to performing reachability tests in an undirected graph.Footnote 3 Producing the original rules in \(F^{S}_i(E)\) can then clearly also be done in polynomial time.

The canonical forms of left-hand sides of the form \(h(c_1,\ldots ,c_n)\) for \(h\in \varSigma _S \) required in step (b1) can also be computed in polynomial time. In fact, the terms obtained by a rewrite sequence in \(R(S^*,{\mathrel {>_{ lpo }}})\) that starts with \(h(c_1,\ldots ,c_n)\) are all of the form \(h(d_1,\ldots ,d_n)\) with \(\{d_1,\ldots ,d_n\} = \{c_1,\ldots ,c_n\}\). The number of such terms is bounded by \(n^n\) (which is a constant) and since \(R(S^*,{\mathrel {>_{ lpo }}})\) is terminating, there cannot be a repetition in this sequence. Consequently, at most \(n^n\) (i.e., constantly many) rewrite steps are applied.

Grouping the rules in \(F^{S}_i(E)\) according to their left-hand sides in step (b2) is clearly also possible in polynomial time, as is adding the new rules to \(R^{S}_{i+1}(E)\). In case the procedure does not terminate after step (b), the number of different equivalence classes of constants decreases by at least one. Thus, the iteration must terminate after at most |C| steps. \(\square \)

Next, we show that the construction of \({\widehat{R}}^{S} (E)\) is correct in the following sense.

Lemma 5

Viewed as a set of identities, is equivalent to R(E) w.r.t. S, i.e., \(s \mathrel {\approx }_{R(E)}^{S} t\) iff \(s \mathrel {\approx }_{{\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})} t\) for all terms \(s, t \in G(\varSigma ,C)\).

Proof

Note that (on \(G(\varSigma ,C)\)) we have \({\mathrel {\approx }_{R(E)}^{S}} = {\mathrel {\approx }_{R^{S}_0(E)}^{S}} = {\mathrel {\approx }_{R^{S}_0(E)\cup R(S^*,{\mathrel {>_{ lpo }}})}}\) since \(R(E) = R^{S}_0(E)\) and \(R(S^*,{\mathrel {>_{ lpo }}})\) is equivalent to S on \(G(\varSigma ,C)\) by Lemma 1. It is thus sufficient to show that the modifications performed when going from \(R^{S}_i(E)\) to \(R^{S}_{i+1}(E)\) do not change the induced equational theory, i.e., \({\mathrel {\approx }_{R^{S}_i(E)\cup R(S^*,{\mathrel {>_{ lpo }}})}} = {\mathrel {\approx }_{R^{S}_{i+1}(E)\cup R(S^*,{\mathrel {>_{ lpo }}})}}\) for all \(i\ge 0\) for which \(R^{S}_{i+1}(E)\) is defined.

To show the inclusion from left to right, first consider step (a). If \(R^{S}_i(E)\) contains the constant rule \(c_1\rightarrow c_2\), then \(c_1\) and \(c_2\) belong to the same equivalence class w.r.t. \(R^{S}_i(E){|}_ con \) and \(c_1 > c_2\). In case \(c_2\) is the least element in this class, then \(R^{S}_{i+1}(E)\) still contains the rule \(c_1\rightarrow c_2\). Otherwise, since we know that \(c_1 > c_2\), the least element e in the class is different from these two constants, and thus \(R^{S}_{i+1}(E)\) contains the rules \(c_1\rightarrow e, c_2\rightarrow e\), which shows \(c_1 \mathrel {\approx }_{R^{S}_{i+1}(E)} c_2\), and thus \(c_1 \mathrel {\approx }_{R^{S}_{i+1}(E)\cup R(S^*,{\mathrel {>_{ lpo }}})} c_2\).

Regarding step (b), note that the replacements performed in the construction of \(F^{S}_i(E)\) replace constants c occurring in function rules by constants e that are equivalent to c both w.r.t. \(R^{S}_i(E)\) and w.r.t. \(R^{S}_{i+1}(E)\). Thus, these replacements do not change the overall equational theory. In step (b1), rules of the form \(h(c_1,\ldots ,c_n)\rightarrow d\) are replaced with rules \(\ell \rightarrow d\) such that \(h(c_1,\ldots ,c_n) \mathrel {\approx }_{R(S^*,{\mathrel {>_{ lpo }}})} \ell \), which thus also does not change the overall equational theory.

For step (b2), consider a function rule \(f(c_1,\ldots ,c_n)\rightarrow d_j\) in \(F^{S}_i(E)\). Either this rule also belongs to \(R^{S}_{i+1}(E)\), or \(R^{S}_{i+1}(E)\) contains rules \(f(c_1,\ldots ,c_n)\rightarrow d, d_j\rightarrow d\). In both cases, \(f(c_1,\ldots ,c_n) \mathrel {\approx }_{R^{S}_{i+1}(E)} d_j\), and thus \(f(c_1,\ldots ,c_n) \mathrel {\approx }_{R^{S}_{i+1}(E)\cup R(S^*,{\mathrel {>_{ lpo }}})} d_j\) is preserved.

The inclusion from right to left can be shown similarly to the one from left to right. \(\square \)

Lemma 6

Viewed as a term rewriting system, \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) is canonical, i.e., terminating and confluent.

Proof

Termination of the term rewriting system \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) can be shown as for \(R(E)\cup R(S^*,{\mathrel {>_{ lpo }}})\), by using the reduction order \(\mathrel {>_{ lpo }}\).

Regarding confluence, we first claim that there are no non-trivial critical pairs (see Sect. 6.2 in [5]) between the rules in \({\widehat{R}}^{S} (E)\). To see this, note that two function rules from \({\widehat{R}}^{S} (E)\) cannot overlap due to the fact that in step (b2) no more constants are identified, and thus all left-hand sides of function rules in \({\widehat{R}}^{S} (E)\) are unique. In addition, any constant can occur as left-hand side of at most one rule due to step (a). A rule of the form \(f(c_1,\ldots ,c_n)\rightarrow d\) cannot overlap with a rule of the form \(e\rightarrow e'\) since the \(c_i\) are representatives, whereas e is not a representative.

Second, we have already shown in the proof of Lemma 1 that all critical pairs between rules in \(R(S^*,{\mathrel {>_{ lpo }}})\) are joinable.

Thus, consider an overlap of a rule \(h(s_1,\ldots ,s_n)\rightarrow r\) in \(R(S^*,{\mathrel {>_{ lpo }}})\) with a rule in \({\widehat{R}}^{S} (E)\). If this overlap occurs at the root position, then the rule in \({\widehat{R}}^{S} (E)\) is a function rule of the form \(h(c_1,\ldots ,c_n) \rightarrow d\) in \({\widehat{R}}^{S} (E)\). However, due to step (b1), the left-hand sides of such function rules are irreducible w.r.t. \(R(S^*,{\mathrel {>_{ lpo }}})\), which contradicts our assumption that \(h(s_1,\ldots ,s_n) = h(c_1,\ldots ,c_n)\) is the left-hand side of a rule in \(R(S^*,{\mathrel {>_{ lpo }}})\).

Finally, assume that the rule from \({\widehat{R}}^{S} (E)\) is applied below the root position in the left-hand side of the rule in \(R(S^*,{\mathrel {>_{ lpo }}})\). Joinability of the critical pair obtained this way can be shown as in the case of a rewrite in the substitution part in the proof of Lemma 1.

Summing up, we have shown that all non-trivial critical pairs of \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) can be joined, which proves confluence of \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\). \(\square \)

Since \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) is canonical, each term \(s\in G(\varSigma ,C)\) has a unique canonical form (i.e., irreducible term reachable from s) w.r.t. \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\). We can thus use the system \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) to decide whether terms are equivalent in E w.r.t. S, i.e., whether , by computing the canonical forms of the terms \(s_0\) and \(t_0\).

Theorem 1

If , then iff \(s_0\) and \(t_0\) have the same canonical form w.r.t. \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\).

Proof

If , then , and thus by Lemma 3 and Lemma 5. Since \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) is canonical, this implies that \(s_0\) and \(t_0\) have the same canonical form w.r.t. \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\).

Conversely, if \(s_0\) and \(t_0\) have the same canonical form w.r.t. \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\), then . Lemma 5 yields . Since , we can now apply Lemma 3 to obtain , which is equivalent to . \(\square \)

As an example, consider the rewrite system \({\widehat{R}}^{S} (E)\) that we have computed (above Lemma 4) from the set of ground identities E in Example 2, and recall that \(f(b,h(a)) \mathrel {\approx }_E^{S}c\). The canonical form of c is clearly c, and the canonical form of f(bh(a)) can be computed by the following rewrite sequence: \(f(b,h(a)) \rightarrow _{{\widehat{R}}^{S} (E)} f(b,h(b)) \rightarrow _{{\widehat{R}}^{S} (E)} f(b,c_{h(a)}) \rightarrow _{{\widehat{R}}^{S} (E)} c\). In this example, no rule of \(R(S^*,{\mathrel {>_{ lpo }}})\) is used in the computation of the canonical forms, which is of course not the case in general. Also note that we have used \(R(S^*,{\mathrel {>_{ lpo }}})\) in the construction of \({\widehat{R}}^{S} (E)\).

The construction of \({\widehat{R}}^{S} (E)\) is actually independent of the terms \(s_0, t_0\) for which we want to decide the word problem in E w.r.t. S. This is in contrast to approaches that restrict the construction of the congruence closure to the subterms of E and the subterms of the terms \(s_0, t_0\) for which one wants to decide the word problem. This fact will turn out to be useful in the next section.

It remains to show that the decision procedure obtained by applying Theorem 1 requires only polynomial time if \(\varSigma _S \) is assumed to be arity-bounded.

Corollary 1

If \(\varSigma _S\) is arity-bounded, then the word problem for finite sets of ground identities w.r.t. S is decidable in polynomial time, i.e., given a finite set of ground identities \(E\subseteq G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) and terms , we can decide in polynomial time, whether holds or not if we assume that the maximal arity of the symbols in \(\varSigma _S\) is bounded by a constant.

Proof

Since we already know that \({\widehat{R}}^{S} (E)\) can be constructed in time polynomial in the size of E, and thus also has polynomial size, it is sufficient to show that the canonical form of a term w.r.t. the rewrite system \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) can be computed in time polynomial in the size of \(s_0\) and \({\widehat{R}}^{S} (E)\). This is clearly the case if only a polynomial number of rewrite steps are needed to produce the canonical form of \(s_0\), and each application of a rewrite step also takes only polynomial time. The latter is an easy consequence of the facts that the relevant part of the set of identities

$$\begin{aligned} S^* = \{h(x_1,\ldots ,x_n)\mathrel {\approx }h(y_1,\ldots ,y_n) \mid h\in \varSigma _S, h(x_1,\ldots ,x_n)\mathrel {\approx }_{S_h} h(y_1,\ldots ,y_n)\} \end{aligned}$$

is of linear size if n is assumed to be a constant and \(\mathrel {>_{ lpo }}\) is decidable in polynomial time (see Proposition 5.4.16 in [5]).

We show the former by induction on the structure of \(s_0\). If \(s_0\) is a constant, then there are at most \(|C|-1\) rewrite steps starting with \(s_0\) possible. This yields the desired polynomial bound since the cardinality of C is linear in the size of E.

Now assume that for a function symbol \(h\in \varSigma \). First, we compute the canonical forms \(t_1,\ldots ,t_n\) of \(s_1,\ldots ,s_n\). By induction, we can assume that the number of rewrite steps required for these computations is bounded by polynomials \(p_1,\ldots ,p_n\) in the sizes of E and \(s_1,\ldots ,s_n\), respectively. If \(h\in \varSigma _S \), then we rewrite \(h(t_1,\ldots ,t_n)\) with the rules in \(R(S^*,{\mathrel {>_{ lpo }}})\). Note that the terms obtained by this rewrite sequence are all of the form \(h(u_1,\ldots ,u_n)\) with \(\{t_1,\ldots ,t_n\} = \{u_1,\ldots ,u_n\}\). Since the number of such terms is bounded by \(n^n\) (which is a constant) and \(R(S^*,{\mathrel {>_{ lpo }}})\) is terminating, there cannot be a repetition in this sequence. Consequently, at most \(n^n\) rewrite steps are applied. If the term \(h(v_1,\ldots ,v_n)\) obtained by this rewrite sequence is the left-hand side of a rule in \({\widehat{R}}^{S} (E)\), then it rewrites to a constant d, from which at most \(|C|-1\) rewrite steps are possible. Overall, we thus need at most \(p_1+\ldots + p_n + n^n + |C|\) rewrite steps to compute the canonical form of , which is clearly a polynomial bound in the size of \(s_0\) and E.

In the case where \(h\in \varSigma \setminus \varSigma _S \), the bound is \(p_1+\ldots + p_n + |C|\) instead. \(\square \)

Commutative function symbols all have arity 2, and thus \(\varSigma _S = \varSigma _c\) is clearly arity-bounded. Given \(\varSigma _c\), we denote the set of commutativity axioms for the elements of \(\varSigma _c\) as \(F_c\).

Corollary 2

The commutative word problem for finite sets of ground identities is decidable in polynomial time, i.e., given a finite set of ground identities \(E\subseteq G(\varSigma ,C_0)\times G(\varSigma ,C_0)\), commutativity axioms \(F_c\) for the symbols in \(\varSigma _c \subseteq \varSigma \), and terms , we can decide in polynomial time whether holds or not.

Even if \(\varSigma _S \) is not arity-bounded, we may still obtain polynomially decidable word problems. As an example, consider for all \(n>2\) function symbols \(g_n\) that are assumed to satisfy the transposition identities

$$\begin{aligned} S_{g_n} = \{g_n(x_1,\ldots ,x_i,x_{i+1},\ldots ,x_n)\mathrel {\approx }g_n(x_1,\ldots ,x_{i+1},x_{i},\ldots ,x_n)\mid 1\le i\le n-1\}. \end{aligned}$$

In this case, it is not necessary to consider the (exponentially many) derived identities between strongly shallow \(g_n\)-terms in \(S_{g_n}^*\) to construct the ordered rewrite system. Using \((S_{g_n},{\mathrel {>_{ lpo }}})\) instead is sufficient since this system is canonical on ground terms. In fact, using the rules in \(R(S_{g_n},{\mathrel {>_{ lpo }}})\) one can compute the canonical form of a ground term in polynomial time since this corresponds to applying bubble-sort. This observation can be used to show that the word problem for finite sets of ground identities w.r.t. function symbols satisfying transposition identities is decidable in polynomial time even without assuming arity-boundedness.

4 Semantic Congruence Closure with Extensionality

Here, we additionally assume that some of the function symbols are extensional, i.e., there is a set of function symbols \(\varSigma ^e \subseteq \varSigma \) whose elements we call extensional symbols. An extensional symbol \(f\in \varSigma ^e \) must satisfy the conditional identities

$$\begin{aligned} f(x_1,\ldots ,x_n) \mathrel {\approx }f(y_1,\ldots ,y_n) \Rightarrow x_i \mathrel {\approx }y_i\ \ \text{ for } \text{ all } \, i, 1\le i\le n. \end{aligned}$$
(2)

In the present section, we assume that extensional symbols do not satisfy any other semantic properties. The reason for this restriction will be explained in the next section.

We have shown in [13] how to integrate extensional symbols into the rewriting-based commutative congruence closure algorithm developed there. It is not hard to see that this approach can be adapted to the integration of extensionality into congruence closure w.r.t. strongly shallow identities as developed in the previous section. Below, we describe this integration in the more general setting of a generic rewriting-based semantic congruence closure. In addition to making our result more general, this also has the advantage that it clarifies which properties of our rewriting-based congruence closure algorithm are vital for integrating extensional symbols.

4.1 Generic Semantic Congruence Closure

Abstracting from the specific form of the semantic properties, we now consider a setting where there is a subset \(\varSigma _F \) of \(\varSigma \) and a set of identities F formulated using the function symbols in \(\varSigma _F \) and variables (see Sect. 2.2.2). We assume that one can construct a canonical ground rewrite system \({\widehat{R}}^{F} \) that satisfies properties similar to the system \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) considered in the previous section.

We call a ground rewrite system \(R\subseteq G(\varSigma ,C)\times G(\varSigma ,C)\) flat if it consists of constant rules of the form \(c \rightarrow d\) and function rules of the form \(f(c_1, \ldots , c_n) \rightarrow d\), where \(c,c_1,\ldots ,c_n,d \in C\) and \(f\in \varSigma \) is an n-ary function symbol.

Definition 3

We say that rewrite-based congruence closure is effective for \(\varSigma _F \) and F if, for every finite flat ground rewrite system \(R\subseteq G(\varSigma ,C)\times G(\varSigma ,C)\), there is a

(possibly infinite) ground rewrite system \({\widehat{R}}^{F} \subseteq G(\varSigma ,C)\times G(\varSigma ,C)\) that satisfies the following properties:

  1. 1.

    All rules containing a symbol in \(\varSigma _F \) have left-hand sides whose root symbols belong to \(\varSigma _F \),

  2. 2.

    All rules containing a symbol \(f\in \varSigma \setminus \varSigma _F \) as root symbol of the left-hand side are flat function rules of the form \(f(c_1,\ldots ,c_n)\rightarrow d\), and the subset of these rules can effectively be computed,

  3. 3.

    \({\widehat{R}}^{F} \) is equivalent to R w.r.t. F on terms in \(G(\varSigma ,C)\),

  4. 4.

    \({\widehat{R}}^{F} \) is canonical and canonical forms of terms in \(G(\varSigma ,C)\) w.r.t. this term rewriting system can effectively be computed.

Note that we do not require that the canonical ground rewrite system \({\widehat{R}}^{F} \) itself is computable, but only that there is an algorithm for computing canonical forms w.r.t. this system. The reason is that \({\widehat{R}}^{F} \) may actually be infinite, and thus may not be computable in finite time. One may wonder how one can get a procedure for computing canonical forms w.r.t. \({\widehat{R}}^{F} \) if this system is infinite. One possibility is that one can compute a finite representation of this system that allows one to produce these normal forms. In the case of strongly shallow identities, this finite representation consists of the finite ground rewrite system \({\widehat{R}}^{S} (E)\) together with the ordered rewrite system \((S^*,\mathrel {>_{ lpo }})\). It is easy to see that the system \({\widehat{R}}^{S}:= {\widehat{R}}^{S} (E) \cup R(S^*,\mathrel {>_{ lpo }})\) induced by these two components satisfies the conditions required by Definition 3.

Under the assumptions formulated in this definition, given two terms , we can decide whether holds or not by computing their canonical forms w.r.t. \({\widehat{R}}^{F} \) and checking whether they are syntactically equal. In order to obtain a polynomial-time decision procedure, it is sufficient to assume that all the computations mentioned above can be performed in polynomial time. In this case we say that rewrite-based congruence closure is polynomial for \(\varSigma _F \) and F. If S is a strongly shallow set of identities over an arity-bounded signature \(\varSigma _S \), then \({\widehat{R}}^{S} = {\widehat{R}}^{S} (E) \cup R(S^*,\mathrel {>_{ lpo }})\) satisfies the conditions required for a polynomial rewrite-based congruence closure. However, as mentioned at the end of Sect. 3, polynomiality may be achieved even if \(\varSigma _S \) is not arity-bounded.

4.2 Generic Semantic Congruence Closure with Extensionality

Let \(\varSigma _F \) and F be as above, and assume that \(\varSigma ^e \subseteq \varSigma \setminus \varSigma _F \). In addition to the identities in E and the identities in F for the symbols in \(\varSigma _F \), we now assume that also the conditional identities in (2) are satisfied for the symbols \(f\in \varSigma ^e \). From a semantic point of view, this means that we now consider algebras \({\mathcal {A}}\) that satisfy not only the identities in \(E\cup F\), but also extensionality for the symbols in \(\varSigma ^e\), i.e., for all \(f\in \varSigma ^e \), all \(i, 1\le i\le n\), and all elements \(a_1,\ldots ,a_n,b_1,\ldots ,b_n\) of \({\mathcal {A}}\) it holds that \(f^{\mathcal {A}} (a_1,\ldots ,a_n) = f^{\mathcal {A}} (b_1,\ldots ,b_n)\) implies \(a_i=b_i\) for all \(i, 1\le i\le n\). We say that \(s\mathrel {\approx }t\) follows from E w.r.t. F and the extensional symbols in \(\varSigma ^e \) (written \(s \mathrel {\approx }_E^{F,\varSigma ^e}t\)) if \(s^{\mathcal {A}} = t^{\mathcal {A}} \) holds in all algebras \({\mathcal {A}} \) that satisfy the identities in \(E\cup F\) and extensionality for the symbols in \(\varSigma ^e\).

The relation \({\mathrel {\approx }_E^{F,\varSigma ^e}}\subseteq G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) can also be generated using the following extension of congruence closure by an extensionality rule.

To be more precise, \( CC ^{F,\varSigma ^e} (E)\) is the smallest subset of \(G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) that contains E and is closed under reflexivity, transitivity, symmetry, congruence, F-identities, and the following extensionality rule:

  • if \(f\in \varSigma ^e \) is an n-ary function symbol, \(1\le i\le n\), and \(f(s_1,\ldots ,s_n) \mathrel {\approx }f(t_1,\ldots ,t_n)\in CC ^{F,\varSigma ^e} (E)\), then \({s_i\mathrel {\approx }t_i}\in CC ^{F,\varSigma ^e} (E)\) (extensionality).

Proposition 1

For all terms \(s,t\in G(\varSigma ,C_0)\) it holds that \(s \mathrel {\approx }_E^{F,\varSigma ^e}t\) iff \({s\mathrel {\approx }t}\in CC ^{F,\varSigma ^e} (E)\).

Proof

This proposition is an easy consequence of Theorem 54 in [19]. Adapted to our setting, this theorem says that \(\mathrel {\approx }_E^{F,\varSigma ^e}\) is the least congruence containing E that is invariant under applying F-identities and extensionality. Clearly, this is exactly \( CC ^{F,\varSigma ^e} (E)\). \(\square \)

To obtain a decision procedure for \(\mathrel {\approx }_E^{F,\varSigma ^e}\), we assume that rewrite-based congruence closure is effective for \(\varSigma _F \) and F. To apply this assumption, we first need to transform E into a flat rewrite system. Thus, let the flat term rewriting system R(E) be defined as in Sect. 3.

Example 3

Consider \(E' = \{ f(g(a),a) \mathrel {\approx }c, g(b) \mathrel {\approx }h(a), g(a) \mathrel {\approx }g(b)\}\) with \(\varSigma _F = \{f\}\) where \(F = \{f(x,y)\mathrel {\approx }f(y,x)\}\) and \(\varSigma ^e = \{g\}\). Then \(f(b,h(a)) \mathrel {\approx }_{E'}^{F,\varSigma ^e}c\).

Let the set \(C_1\) of new constants and the linear order on all constants be defined as in Example 2. Then we obtain the following flat system:

Lemma 7

The system R(E) is a conservative extension of E also w.r.t. F and the extensional symbols in \(\varSigma ^e \), i.e., iff for all terms .

Proof

The proof of this lemma is basically identical to the proof of Lemma 3. One only needs to note, additionally, that going to the expansion and the reduct constructed in this proof does not change satisfaction of the extensionality axioms. \(\square \)

The confluent and terminating rewrite system for deciding \(\mathrel {\approx }_E^{F,\varSigma ^e}\) is now constructed by first applying the generic semantic congruence closure to R(E), then applying extensionality to derive new identities between constants, and iterating these two steps until no new identities are derived in the second step.

Algorithm 2

The rewrite system \({\widehat{R}}^{F,\varSigma ^e} (E)\) is constructed by performing the following steps, starting with \(R_0 := R(E)\) and \(i:= 0\):

  1. (g)

    Apply the generic semantic congruence closure for \(\varSigma _F \) and F to \(R_i\) to obtain the canonical rewrite system \({\widehat{R}}^{F} _i\).

  2. (e)

    Let \(R_{i+1} := R_i\). For all \(f\in \varSigma ^e \), all pairs of distinct flat function rules \(f(c_1,\ldots ,c_n)\rightarrow d, f(c_1',\ldots ,c_n')\rightarrow d\) in \({\widehat{R}}^{F} _i\), and all \(j, 1\le j\le n\), such that \(c_j\ne c_j'\), add \(c_j\rightarrow c_j'\) to \(R_{i+1}\) if \(c_j > c_j'\) and otherwise add \(c_j'\rightarrow c_j\) to \(R_{i+1}\). If at least one constant rule has been added in this step, then set \(i:= i+1\) and continue with step (g). Otherwise, terminate with output \({\widehat{R}}^{F,\varSigma ^e} (E) := {\widehat{R}}^{F} _i\).

Lemma 8

The construction of \({\widehat{R}}^{F,\varSigma ^e} (E)\) terminates after a linear number of iterations and produces a canonical rewriting system.

Proof

Termination after linearly many iterations is an immediate consequence of the fact that, in each iteration of the construction (except the last one), the number of distinct equivalence classes of constants decreases, and the cardinality of the set of constants C is linear in the size of the input. The system \({\widehat{R}}^{F,\varSigma ^e} (E)\) is canonical since it is produced by an application of rewrite-based congruence closure to a finite flat ground rewrite system \(R_i\). \(\square \)

We illustrate the construction using Example 3. In step (g), we apply Algorithm 1 from the previous section. In step (a) of that algorithm, the non-trivial equivalence classes of constants are \([c_{f(g(a),a)}] = \{c_{f(g(a),a)},c\}\) with representative c and \([c_{g(b)}] = \{c_{g(a)}, c_{g(b)}, c_{h(a)}\}\) with representative \(c_{h(a)}\). Thus, \(c_{f(g(a),a)} \rightarrow c, c_{g(a)} \rightarrow c_{h(a)}, c_{g(b)} \rightarrow c_{h(a)}\) are the constant rules added. The function rules are then \(f(c_{h(a)},a) \rightarrow c, g(a) \rightarrow c_{h(a)}, g(b) \rightarrow c_{h(a)}, h(a) \rightarrow c_{h(a)}.\) In step (b1), the first rule is rewritten to \(f(a,c_{h(a)}) \rightarrow c\). Since the function rules obtained this way have unique left-hand sides, no constant rule is added in step (b2). Thus, the construction in step (g) terminates and yields the system that is the union of the generated flat rules with the ones in \(R(F,{\mathrel {>_{ lpo }}})\) (with the strongly shallow identities F being commutativity of the symbols in \(\varSigma _F\)). Note that, for commutativity, \(F^* = F\).

Consequently, we now proceed with step (e). Since \(g\in \varSigma ^e \), the presence of the rules \(g(a) \rightarrow c_{h(a)}\) and \(g(b) \rightarrow c_{h(a)}\) triggers the addition of \(a\rightarrow b\) to \(R_1\).

In the second iteration step, we now have in step (a) of the construction applied in (g) the additional non-trivial equivalence class \([a] = \{a,b\}\) with representative b. The net effect of step (a) is then to generate the constant rules \(c_{f(g(a),a)} \rightarrow c, c_{g(a)} \rightarrow c_{h(a)}, c_{g(b)} \rightarrow c_{h(a)}\), and \(a\rightarrow b\). The obtained function rules after step (b1) are then \(f(b,c_{h(a)}) \rightarrow c, g(b) \rightarrow c_{h(a)}, h(b) \rightarrow c_{h(a)}.\) No new constant rules are added in step (b2). In step (e), the construction terminates with the set of flat rules \(R_1 = \{ a\rightarrow b, c_{f(g(a),a)} \rightarrow c, c_{g(a)} \rightarrow c_{h(a)}, c_{g(b)} \rightarrow c_{h(a)}, f(b,c_{h(a)}) \rightarrow c, g(b) \rightarrow c_{h(a)}, h(b) \rightarrow c_{h(a)}\},\) which is identical to the system \({\widehat{R}}^{S} (E)\) computed for the set of identity E of Example 2. The system \({\widehat{R}}^{F,\varSigma ^e} (E)\) produced by this run of Algorithm 2 is \(R_1\cup R(F,{\mathrel {>_{ lpo }}})\).

Our goal is now to show, for the general case, that \({\widehat{R}}^{F,\varSigma ^e} (E)\) provides us with a decision procedure for the extensional word problem for \(E\cup F\), i.e., it allows us to decide the relation \(\mathrel {\approx }_E^{F,\varSigma ^e}\). This is an easy consequence of the following theorem.

Theorem 2

If , then iff \(s_0\) and \(t_0\) have the same canonical form w.r.t. \({\widehat{R}}^{F,\varSigma ^e} (E)\).

Proof

Assume that Algorithm 2 has terminated with output \({\widehat{R}}^{F,\varSigma ^e} (E) = {\widehat{R}}^{F} _n\).

To show the if-direction, assume that \(s_0\) and \(t_0\) have the same canonical form w.r.t. \({\widehat{R}}^{F,\varSigma ^e} (E)\). Then, , and thus since \({\widehat{R}}^{F} _n\) is equivalent to \(R_n\) w.r.t. F. Consequently, it is sufficient to prove that \({\mathrel {\approx }^{F}_{R_{i+1}}} \subseteq {\mathrel {\approx }^{F,\varSigma ^e}_{R_{i}}}\) holds for all \(i, 0\le i<n\). In fact, then implies , which yields by Lemma 7 since \(R_0 = R(E)\). To see the inclusion \({\mathrel {\approx }^{F}_{R_{i+1}}} \subseteq {\mathrel {\approx }^{F,\varSigma ^e}_{R_{i}}}\), note that the presence of the rules \(f(c_1,\ldots ,c_n)\rightarrow d, f(c_1',\ldots ,c_n')\rightarrow d\) in \({\widehat{R}}^{F} _i\) implies that \(f(c_1,\ldots ,c_n) \mathrel {\approx }_{R_i}^{F} f(c_1',\ldots ,c_n')\), and thus \(c_j \mathrel {\approx }^{F,\varSigma ^e}_{R_i} c_j'\). Since these are the only identities added when going from \(R_i\) to \(R_{i+1}\), this completes the proof of the if-direction.

We show the only-if-direction by contraposition. Thus, assume that \(s_0, t_0\) have different canonical forms w.r.t. \({\widehat{R}}^{F,\varSigma ^e} (E)\). Let \({\mathcal {A}} \) be the initial algebra (i.e., the free algebra over the empty set of generators) for \({\widehat{R}}^{F,\varSigma ^e} (E)\) viewed as a set of identities over the signature . Recall that this algebra has the equivalence classes of terms in \(G(\varSigma ,C)\) w.r.t. \(\mathrel {\approx }_{{\widehat{R}}^{F,\varSigma ^e} (E)}\) as its elements, and any term \(s\in G(\varSigma ,C)\) is interpreted in \({\mathcal {A}} \) as the class of s. Since \({\widehat{R}}^{F,\varSigma ^e} (E)\) is canonical, we can represent such a class by the unique canonical form of its elements. Obviously, the fact that \(s_0, t_0\) have different canonical forms w.r.t. \({\widehat{R}}^{F,\varSigma ^e} (E)\) implies . Thus, if we can show that \({\mathcal {A}} \) satisfies E, the identities in F, and extensionality for every \(f\in \varSigma ^e \), then implies that , and thus by Proposition 1.

If \({s\mathrel {\approx }t} \in E\), then \({s\mathrel {\approx }_{R(E)}^{F} t}\) (see the proof of Lemma 3). Since \(R(E) = R_0\subseteq \ldots \subseteq R_n\), this implies \(s \mathrel {\approx }_{{\widehat{R}}^{F} _n} t\), and thus \(s \mathrel {\approx }_{{\widehat{R}}^{F,\varSigma ^e} (E)} t\). Consequently, these two terms are evaluated to the same element of \({\mathcal {A}} \), which shows that \({\mathcal {A}} \) satisfies the identities in E.

Let \({u\mathrel {\approx }v} \in F\) and \(\theta (u), \theta (v)\) be ground instances of uv. Then we know that \(\theta (u) \mathrel {\approx }_{R(E)}^{F} \theta (v)\). As above, this implies that the two terms \(\theta (u)\) and \(\theta (v)\) evaluated to the same element of \({\mathcal {A}} \), which shows that \({\mathcal {A}} \) satisfies the identities in F.

Let \(f\in \varSigma ^e \) be an n-ary function symbol and \(s_1,t_1,\ldots ,s_n,t_n\in G(\varSigma ,C)\) be terms such that the terms \(f(s_1,\ldots ,s_n)\) and \(f(t_1,\ldots ,t_n)\) evaluate to the same element of \({\mathcal {A}} \). Then \(f(s_1,\ldots ,s_n) \mathrel {\approx }_{{\widehat{R}}^{F,\varSigma ^e} (E)} f(t_1,\ldots ,t_n)\), and thus these terms have the same canonical forms w.r.t. \({\widehat{R}}^{F,\varSigma ^e} (E)\). Let \(s_1',t_1',\ldots ,s_n',t_n'\) be the canonical forms of the terms \(s_1,t_1,\ldots ,s_n,t_n\), respectively. Then the canonical forms of \(f(s_1,\ldots ,s_n)\) and \(f(t_1,\ldots ,t_n)\) are respectively identical to the canonical forms of \(f(s_1',\ldots ,s_n')\) and \(f(t_1',\ldots ,t_n')\). Since \(f\not \in \varSigma _F \), no rule with root symbol from \(\varSigma _F \) on the left-hand side is applicable to the latter terms. Thus, the only possible rules applicable to these terms are flat function rules for f. Consequently, there are two possible cases for how the (identical) canonical forms of \(f(s_1,\ldots ,s_n)\) and \(f(t_1,\ldots ,t_n)\) can look like:

  1. 1.

    The terms \(f(s_1,\ldots ,s_n)\) and \(f(t_1,\ldots ,t_n)\) respectively have the canonical forms \(f(s_1',\ldots ,\) \(s_n')\) and \(f(t_1',\ldots ,t_n')\), and the corresponding arguments are syntactically equal, i.e., \(s_j' = t_j'\) for \(j = 1, \ldots , n\). In this case, \(s_j \mathrel {\approx }_{{\widehat{R}}^{F,\varSigma ^e} (E)} s_j' = t_j' \mathrel {\approx }_{{\widehat{R}}^{F,\varSigma ^e} (E)} t_j\) for \(j = 1, \ldots , n\), and thus, for \(j = 1, \ldots , n\), the terms \(s_j\) and \(t_j\) evaluate to the same element of \({\mathcal {A}} \).

  2. 2.

    The terms \(f(s_1,\ldots ,s_n)\) and \(f(t_1,\ldots ,t_n)\) reduce to the same constant d. Then \({\widehat{R}}^{F_c,\varSigma ^e} (E)\) must contain the rules \(f(s_1',\ldots ,s_n')\rightarrow d\) and \(f(t_1',\ldots ,t_n')\rightarrow d\). By the construction of \({\widehat{R}}^{F,\varSigma ^e} (E)\), we thus have \(s_j' = t_j'\) for all \(j =1, \ldots , n\) since otherwise new constant rules would have been added in step (e) and the construction would not yet have terminated. This yields \(s_j \mathrel {\approx }_{{\widehat{R}}^{F,\varSigma ^e} (E)} s_j' = t_j' \mathrel {\approx }_{{\widehat{R}}^{F,\varSigma ^e} (E)} t_j\) for \(j = 1, \ldots , n\), and thus, for \(j = 1, \ldots , n\), the terms \(s_j\) and \(t_j\) evaluate to the same element of \({\mathcal {A}} \).

Summing up, we have thus shown that \({\mathcal {A}} \) also satisfies extensionality for the symbols in \(\varSigma ^e \), which completes the proof of the only-if-direction. \(\square \)

Recall that \(f(b,h(a)) \mathrel {\approx }_{E'}^{F,\varSigma ^e}c\) for the set of identities \(E'\) of Example 3. We have already seen that these two terms rewrite to the same canonical form w.r.t. \({\widehat{R}}^{S} (E)\cup R(F,{\mathrel {>_{ lpo }}}) = {\widehat{R}}^{F,\varSigma ^e} (E')\).

The following corollary is an easy consequence of the above theorem, the conditions satisfied if rewrite-based congruence closure is effective (polynomial), and Lemma 8.

Corollary 3

Let \(\varSigma _F \subseteq \varSigma \) and \(\varSigma ^e \subseteq \varSigma \setminus \varSigma _F \), F be a set of identities formulated using the function symbols in \(\varSigma _F \) and variables, and assume that rewrite-based congruence closure is effective (polynomial) for \(\varSigma _F \) and F. Then the word problem for finite sets E of ground identities w.r.t. F and the extensional symbols in \(\varSigma ^e \) is decidable (in polynomial time), i.e., given terms , we can decide (in polynomial time) whether holds or not.

In the case of a set of strongly shallow identities S for a signature \(\varSigma _S\) that is arity bounded, the results shown in the previous section imply that rewrite-based congruence closure is polynomial for \(\varSigma _S \) and S. Thus, we obtain the following instance of Corollary 3.

Corollary 4

Let \(\varSigma _S \subseteq \varSigma \) and \(\varSigma ^e \subseteq \varSigma \setminus \varSigma _S \), S be a set of strongly shallow identities for \(\varSigma _S\), and assumed that \(\varSigma _S\) is arity-bounded. Then the word problem for finite sets of ground identities w.r.t. S and the extensional symbols in \(\varSigma ^e \) is decidable in polynomial time, i.e., given a finite set of ground identities \(E\subseteq G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) and terms , we can decide in polynomial time whether holds or not if we assume that the maximal arity of the symbols in \(\varSigma _S\) is bounded by a constant.

We have mentioned in the introduction that it is unclear how this polynomiality result could be obtained by a simple adaptation of the usual approach that restricts congruence closure to a polynomially large set of subterms determined by the input (informally called “small” terms in the following). The main problem is that one might have to generate identities between “large” terms before one can get back to a desired identity between “small” terms using extensionality. The question is now where our proof actually deals with this problem. The answer is: in Case 1 of the case distinction in the proof of the only-if-direction of Theorem 2. In fact, there we consider a derived identity \(f(s_1,\ldots ,s_n)\mathrel {\approx }f(t_1,\ldots ,t_n)\) such that the (syntactically identical) canonical forms of \(f(s_1,\ldots ,s_n)\) and \(f(t_1,\ldots ,t_n)\) are not a constant from C, but of the form \(f(s_1',\ldots ,s_n') = f(t_1',\ldots ,t_n')\). Basically, this means that \(f(s_1,\ldots ,s_n)\) and \(f(t_1,\ldots ,t_n)\) are terms that are not equivalent modulo E to subterms of terms occurring in E, since the latter terms have a constant representing them. Thus, these terms are “large” terms that potentially could cause a problem: an identity between them has been derived, and now extensionality applied to this identity yields new identities \(s_i\mathrel {\approx }t_i\) between smaller terms. Our proof shows that these identities can nevertheless be derived from \({\widehat{R}}^{F,\varSigma ^e} (E)\), and thus do not cause a problem. For the proof of Theorem 2 to go through, it is important that the canonical rewrite system \({\widehat{R}}^{F,\varSigma ^e} (E)\) works for all ground terms in \(G(\varSigma ,C)\), and not just for a finite set of “small” terms.

5 Symbols that are Commutative and Extensional

In the previous section, we have made the assumptions that the sets \(\varSigma _F \) and \(\varSigma ^e \) are disjoint, i.e., we did not consider extensionality for symbols satisfying the identities in F. The reason is that, without this restriction, we might obtain more consequences than we bargained for. For example, it is easy to see that the presence of a commutative and extensional symbol would trivialize the equational theory. In fact, if f is assumed to be commutative and extensional, then commutativity yields \(f(s,t) \mathrel {\approx }f(t,s)\) for all terms \(s,t\in G(\varSigma ,C_0)\), and extensionality then yields \(s\mathrel {\approx }t\). This shows that, in this case, the commutative and extensional congruence closure would be \(G(\varSigma ,C_0)\times G(\varSigma ,C_0)\), independently of E, and thus even for \(E=\emptyset \).

To avoid this problem, one can, however, look at appropriate variants of extensionality. In this section, we restrict the attention to commutativity, and consider the following variant of extensionality for commutative function symbols f, which we call c-extensionality:

$$\begin{aligned} f(x_1,x_2) \mathrel {\approx }f(y_1,y_2) \Rightarrow (x_1 \mathrel {\approx }y_1 \wedge x_2 \mathrel {\approx }y_2) \vee (x_1 \mathrel {\approx }y_2 \wedge x_2 \mathrel {\approx }y_1). \end{aligned}$$
(3)

For example, if f is a commutative couple constructor, and two couples turn out to be equal, then we want to infer that they consist of the same two persons, independently of the order in which they were put into the constructor.

Unfortunately, adding such a rule makes the word problem coNP-hard, which can be shown by a reduction from validity of propositional formulae.

Proposition 2

In the presence of at least one commutative and c-extensional symbol, the word problem for finite sets of ground identities is coNP-hard.

We prove this proposition by a reduction from validity of propositional formulae. Thus, consider a propositional formula \(\phi \), and let \(p_1,\ldots ,p_n\) be the propositional variables occurring in \(\phi \). We take the constants 0 and 1, and for every \(i, 1\le i\le n\), we view \(p_i\) as a constant symbol, and add a second constant symbol \({\overline{p}}_i\). In addition, we consider the function symbols \(f_\vee , f_\wedge , f_\lnot , f\), and assume that f is commutative and satisfies (3). We then consider ground identities that axiomatize the truth tables for \(\vee ,\wedge ,\lnot \), i.e.,

(4)

In addition, we consider, for every \(i, 1\le i\le n\), the identities

$$\begin{aligned} f(p_i,{\overline{p}}_i) \mathrel {\approx }f(0,1). \end{aligned}$$

Let \(E_\phi \) be the set of these ground identities, and let \(t_\phi \) be the term obtained from \(\phi \) by replacing the Boolean operations \(\vee ,\wedge \), and \(\lnot \) by the corresponding function symbols \(f_\vee , f_\wedge \), and \(f_\lnot \). Proposition 2 is now an immediate consequence of the following lemma.

Lemma 9

The identity \(t_\phi \mathrel {\approx }1\) holds in every algebra satisfying \(E_\phi \) together with (3) and commutativity of f iff \(\phi \) is valid.

Proof

First assume that \(\phi \) is valid, and let \({\mathcal {A}} \) be an algebra satisfying \(E_\phi \) together with (3) and commutativity of f. Since \({\mathcal {A}} \) satisfies the identities \(f(p_i,{\overline{p}}_i) \mathrel {\approx }f(0,1)\) for \(i = 1,\ldots ,n\), as well as (3), we can deduce that \(p_i^{\mathcal {A}} \in \{0^{\mathcal {A}},1^{\mathcal {A}} \}\) for all \(i = 1,\ldots ,n\). Let v be the propositional valuation that satisfies \(v(p_i) = 1\) iff \(p_i^{\mathcal {A}} = 1^{\mathcal {A}} \). Since \(\phi \) is valid, we know that \(v(\phi ) = 1\). Using this fact and the identities axiomatizing the truth tables for \(\vee ,\wedge ,\lnot \), we obtain that \(t_\phi ^{\mathcal {A}} = 1^{\mathcal {A}} \). Thus, the identity \(t_\phi \mathrel {\approx }1\) holds in \({\mathcal {A}}\).

Conversely, assume that \(\phi \) is not valid, and let v be a propositional valuation such that \(v(\phi ) = 0\). Let \({\mathcal {B}} \) be the free commutative algebra with generators 0, 1 and the binary function symbol f as binary operation. Thus, the domain B of \({\mathcal {B}} \) consists of the equivalence classes of ground terms in \(G(\{f\},\{0,1\})\) modulo commutativity. We expand \({\mathcal {B}} \) to an algebra \({\mathcal {A}} \) that also interprets the symbols \(p_1,\ldots ,p_n, {\overline{p}}_1,\ldots ,{\overline{p}}_n, f_\vee , f_\wedge , f_\lnot \) as follows:

  • the domain of \({\mathcal {A}} \) is B;

  • \(0^{\mathcal {A}} = 0^{\mathcal {B}} \) and \(1^{\mathcal {A}} = 1^{\mathcal {B}} \);

  • \(p_i^{\mathcal {A}} = v(p_i)^{\mathcal {B}} \) and \({\overline{p}}_i^{\mathcal {A}} = (1-v(p_i))^{\mathcal {B}} \) for \(i = 1,\ldots ,n\);

  • on \(\{0^{\mathcal {B}},1^{\mathcal {B}} \}\), \(f_\vee ^{\mathcal {A}} \) behaves like disjunction, \(f_\wedge ^{\mathcal {A}} \) behaves like conjunction, and \(f_\lnot ^{\mathcal {A}} \) behaves like negation, and on tuples not in \(\{0^{\mathcal {B}},1^{\mathcal {B}} \}\times \{0^{\mathcal {B}},1^{\mathcal {B}} \}\) (for \(f_\vee \) and \(f_\wedge \)) or \(\{0^{\mathcal {B}},1^{\mathcal {B}} \}\) (for \(f_\lnot \)) they yield an arbitrary value, say \(0^{\mathcal {B}} \);

  • \(f^{\mathcal {A}} \) is \(f^{\mathcal {B}} \).

It is easy to see that \({\mathcal {A}} \) satisfies the identities in \(E_\phi \). In addition, commutativity of f and (3) are satisfied since these properties hold in the free commutative algebra [20]. It is now easy to see that \(t_\phi ^{\mathcal {A}} = v(\phi )^{\mathcal {B}} = 0^{\mathcal {B}} = 0^{\mathcal {A}} \), and thus \({\mathcal {A}} \) does not satisfy the identity \(t_\phi \mathrel {\approx }1\) since in \({\mathcal {B}} \) the class of the generator 0 is different from the class of the generator 1. \(\square \)

To prove a complexity upper bound that matches the lower bound stated in Proposition 2, we consider a finite signature \(\varSigma \), a finite set of ground identities \(E\subseteq G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) as well as sets \(\varSigma _c \subseteq \varSigma \) and \(\varSigma ^e \subseteq \varSigma \) of commutative and extensional symbols, respectively, and assume that the non-commutative extensional symbols in \(\varSigma ^e \setminus \varSigma _c \) satisfy extensionality (2), whereas the commutative extensional symbols in \(\varSigma ^e \cap \varSigma _c \) satisfy c-extensionality (3). We want to show that, in this setting, the problem of deciding, for given terms , whether is not equivalent to \(t_0\) is in NP.

For this purpose, we employ a nondeterministic variant of Algorithm 2. In step (g) this procedure applies Algorithm 1 for the special case where the set \(F_c\) of strongly shallow identities axiomatize commutativity of the symbols in \(\varSigma _c\). We denote the canonical rewrite system produced by an application of this algorithm to the current flat system \(R_i\) by \({\widehat{R}}^{F_c} _i\), i.e., \({\widehat{R}}^{F_c} _i = {\widehat{R}}^{F_c} (R_i) \cup R(F_c,{\mathrel {>_{ lpo }}})\).

For extensional symbols \(f\in \varSigma ^e \setminus \varSigma _c \), step (e) is also performed as in Algorithm 2. For extensional symbols \(f\in \varSigma ^e \cap \varSigma _c \), step (e) is modified as follows: for all pairs of distinct flat function rules \(f(c_1,c_2) \rightarrow d, f(c_1',c_2') \rightarrow d\) in \({\widehat{R}}^{F_c} _i\), nondeterministically choose whether

  • \(c_1\) and \(c_1'\) as well as \(c_2\) and \(c_2'\) are to be identified, or

  • \(c_1\) and \(c_2'\) as well as \(c_2\) and \(c_1'\) are to be identified,

and then add the corresponding constant rules to \(R_{i+1}\) unless the respective constants are already syntactically equal.

This nondeterministic algorithm has different runs, depending on the choices made in the nondeterministic part of step (e). But each run r produces a rewrite system \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\).

Example 4

We illustrate the nondeterministic construction using the identities \(E_\phi \) for \(\phi = p\vee \lnot p\) from our coNP-hardness proof. Then \(E_\phi \) consists of the identities in (4) together with the identity \(f(p,{\overline{p}}) \mathrel {\approx }f(0,1)\). Assuming an appropriate order on the constants, the system \(R(E_\phi )\) contains, among others, the rules

In steps (a) and (b) of Algorithm 1, these rules are transformed into the form

(5)

Since no new constant rule is added, the construction proceeds with step (e). Due to the presence of the rules \(f(p,{\overline{p}}) \rightarrow c_{f(1,0)}\) and \(f(1,0) \rightarrow c_{f(1,0)}\) for \(f\in \varSigma _c \cap \varSigma ^e \), it now nondeterministically chooses between identifying p with 1 or with 0. In the first case, the constant rules \(p\rightarrow 1, {\overline{p}}\rightarrow 0\) are added, and in the second \(p\rightarrow 0, {\overline{p}}\rightarrow 1\) are added. In the next iteration, no new constant rules are added, and thus the construction terminates. It has two runs \(r_1\) and \(r_2\). The generated rewrite systems \({\widehat{R}}^{F_c,\varSigma ^e} _{r_1}(E)\) and \({\widehat{R}}^{F_c,\varSigma ^e} _{r_2}(E)\) share the rules in (5), but the first contains \(p\rightarrow 1\) whereas the second contains \(p\rightarrow 0\).

Lemma 10

Every run r of the construction terminates after a linear number of iterations and the system \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\) this run produces is canonical.

Proof

Termination after a linear number of iterations can be shown as in the proof of Lemma 8. By Lemma 6, \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\) is canonical since it is produced by an application of Algorithm 1 to a flat system \(R_n\). \(\square \)

Using the canonical rewrite systems \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\), we can now characterize when an identity follows from E w.r.t. commutativity of the symbols in \(\varSigma _c \), extensionality of the symbols in \(\varSigma ^e \setminus \varSigma _c \), an c-extensionality of the symbols in \(\varSigma ^e \cap \varSigma _c \) as follows.

Theorem 3

Let . The identity holds in every algebra that satisfies E, commutativity for every \(f\in \varSigma _c \), extensionality for every \(f\in \varSigma ^e \setminus \varSigma _c \), and c-extensionality for every \(f\in \varSigma ^e \cap \varSigma _c \) iff \(s_0, t_0\) have the same canonical forms w.r.t. \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\) for every run r of the nondeterministic construction.

Proof

First, we show the if-direction. Thus, assume that \(s_0, t_0\) have the same canonical forms w.r.t. \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\) for every run r. Let \({\mathcal {A}} \) be an algebra that satisfies E, commutativity for every \(f\in \varSigma _c \), extensionality for every \(f\in \varSigma ^e \setminus \varSigma _c \), and c-extensionality for every \(f\in \varSigma ^e \cap \varSigma _c \). We must show that \(s_0^{\mathcal {A}} = t_0^{\mathcal {A}} \).

To this purpose, we expand \({\mathcal {A}} \) to the new constants in \(C_1\) by setting \(c_s^{\mathcal {A}}:= s^{\mathcal {A}} \), and call the expansion obtained this way still \({\mathcal {A}} \). Since \({\mathcal {A}} \) is a model of E, it is easy to see that it is also a model of R(E). In addition, it satisfies the rewrite rules (viewed as identities) added in steps (a) and (b) of Algorithm 1. The same is true for the rules added in step (e) of the modified version of Algorithm 2 when treating symbols in \(\varSigma ^e \setminus \varSigma _c \). For symbols \(f\in \varSigma ^e \cap \varSigma _c \), we let \({\mathcal {A}} \) decide which option to take. To be more precise, for all pairs of distinct rules \(f(c_1,c_2) \rightarrow d, f(c_1',c_2') \rightarrow d\) in \({\widehat{R}}^{F_c} _i\), we can assume by induction that \(f(c_1,c_2)^{\mathcal {A}} = f^{\mathcal {A}} (c_1^{\mathcal {A}},c_2^{\mathcal {A}}) = d^{\mathcal {A}} = f^{\mathcal {A}} (c_1'^{\mathcal {A}},c_2'^{\mathcal {A}}) = f(c_1',c_2')^{\mathcal {A}} \) holds. Since \({\mathcal {A}} \) satisfies c-extensionality for f, this implies that \(c_1^{\mathcal {A}} = c_1'^{\mathcal {A}}, c_2^{\mathcal {A}} = c_2'^{\mathcal {A}} \) or \(c_1^{\mathcal {A}} = c_2'^{\mathcal {A}}, c_2^{\mathcal {A}} = c_1'^{\mathcal {A}} \). If the former is the case, then we take the first option in the nondeterministic choice, and otherwise we take the second option. Overall, this yields a run r of the nondeterministic construction such that \({\mathcal {A}} \) is a model of \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\). Note that \({\mathcal {A}} \) is a model of the rules in \(R(F_c,{\mathrel {>_{ lpo }}})\) since it satisfies commutativity for every symbol in \(\varSigma _c \). Since \(s_0, t_0\) have the same canonical forms w.r.t. \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\), we know that . This yields \(s_0^{\mathcal {A}} = t_0^{\mathcal {A}} \) since \({\mathcal {A}} \) is a model of \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\).

Second, we show the only-if-direction by contraposition. Thus, assume that there is a run r of the algorithm such that \(s_0, t_0\) have different canonical forms w.r.t. \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\). Since \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\) is canonical, this implies that . As in the proof of Theorem 2, let \({\mathcal {A}} \) be the initial algebra for \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\) viewed as a set of identities over the signature . Then implies . Thus, it is sufficient to show that \({\mathcal {A}} \) satisfies E, commutativity for every \(f\in \varSigma _c \), extensionality for every \(f\in \varSigma ^e \setminus \varSigma _c \), and c-extensionality for every \(f\in \varSigma ^e \cap \varSigma _c \). Since the former three properties can be shown as in the proof of Theorem 2, we concentrate on the last one.

Thus, let \(f\in \varSigma ^e \cap \varSigma _c \) be a commutative and c-extensional function symbol and \(s_1,t_1,s_2,t_2\in G(\varSigma ,C)\) be terms such that the terms \(f(s_1,s_2)\) and \(f(t_1,t_2)\) evaluate to the same element of \({\mathcal {A}} \). Again, this implies that these terms have the same canonical forms w.r.t. \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\). Let \(s_1',t_1',s_2',t_2'\) be the canonical forms of the terms \(s_1,t_1,s_2,t_2\), respectively. As before, we distinguish several cases, but since f is commutative we now also need to take the rules in \(R(F_c,{\mathrel {>_{ lpo }}})\), which are part of \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\), into account:

  • First, assume that the canonical forms of \(f(s_1,s_2)\) and \(f(t_1,t_2)\) still have root symbol f. We distinguish several subcases, depending on how the rules in \(R(F_c,{\mathrel {>_{ lpo }}})\) have been applied:

    • The terms \(f(s_1,s_2)\) and \(f(t_1,t_2)\) have the canonical forms \(f(s_1',s_2')\) and \(f(t_1',t_2')\), respectively, and the corresponding arguments are syntactically equal, i.e., \(s_1'=t_1'\) and \(s_2'=t_2'\). As in the proof of Theorem 2, this implies that \(s_1\) and \(t_1\) as well as \(s_2\) and \(t_2\) respectively evaluate to the same elements of \({\mathcal {A}} \).

    • The case where the terms \(f(s_1,s_2)\) and \(f(t_1,t_2)\) have the canonical forms \(f(s_2',s_1')\) and \(f(t_2',t_1')\), respectively, can be handled in the same way since then again \(s_1'=t_1'\) and \(s_2'=t_2'\).

    • The terms \(f(s_1,s_2)\) and \(f(t_1,t_2)\) have the canonical forms \(f(s_2',s_1')\) and \(f(t_1',t_2')\), respectively, and the corresponding arguments are syntactically equal, i.e., \(s_2'=t_1'\) and \(s_1'=t_2'\). In this case we can derive that \(s_2\) and \(t_1\) as well as \(s_1\) and \(t_2\) respectively evaluate to the same elements of \({\mathcal {A}} \).

    • The case where the terms \(f(s_1,s_2)\) and \(f(t_1,t_2)\) have the canonical forms \(f(s_1',s_2')\) and \(f(t_2',t_1')\), respectively, can be handled in the same way since then again \(s_2'=t_1'\) and \(s_1'=t_2'\).

  • Second, assume that the canonical forms of \(f(s_1,s_2)\) and \(f(t_1,t_2)\) are the same constant d. Again, we distinguish several subcases, depending on how the rules in \(R(F_c,{\mathrel {>_{ lpo }}})\) have been applied before the reduction to the constant d:

    • The term rewriting system \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\) contains the rules \(f(s_1',s_2')\rightarrow d\) and \(f(t_1',t_2')\rightarrow d\). By the construction of \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\), we thus have \(s_1' = t_1'\) and \(s_2' = t_2'\), or \(s_1' = t_2'\) and \(s_2' = t_1'\) since no new constant rules have been added. This implies that \(s_1\) and \(t_1\) as well as \(s_2\) and \(t_2\) respectively evaluate to the same elements of \({\mathcal {A}} \), or \(s_1\) and \(t_2\) as well as \(s_2\) and \(t_1\) respectively evaluate to the same elements of \({\mathcal {A}} \).

    • The cases where the function rules reducing to d contain the arguments of f in other permutations can be treated in the same way.

Summing up, we have thus shown that \({\mathcal {A}} \) satisfies c-extensionality for the symbols in \(\varSigma ^e \cap \varSigma _c \), which completes the proof of the theorem.\(\square \)

Coming back to Example 4, we note that \(\phi = p\vee \lnot p\) is valid, and thus (by Lemma 9), the identity \(f_\vee (p,f_\lnot (p)) \mathrel {\approx }1\) holds in all algebras that satisfy \(E_\phi \) and interpret f as a commutative and c-extensional symbol. Using the rewrite system generated by the run \(r_1\), we obtain the following rewrite sequence:

$$\begin{aligned} f_\vee (p,f_\lnot (p)) \rightarrow f_\vee (1,f_\lnot (p)) \rightarrow f_\vee (1,f_\lnot (1)) \rightarrow f_\vee (1,0) \rightarrow 1. \end{aligned}$$

For the run \(r_2\), we obtain the sequence

$$\begin{aligned} f_\vee (p,f_\lnot (p)) \rightarrow f_\vee (0,f_\lnot (p)) \rightarrow f_\vee (0,f_\lnot (0)) \rightarrow f_\vee (0,1) \rightarrow 1. \end{aligned}$$

Thus, for both runs the terms \(f_\vee (p,f_\lnot (p))\) and 1 have the same canonical form 1.

Together with Proposition 2, Theorem 3 yields the following complexity results.

Corollary 5

Consider a finite set of ground identities \(E\subseteq G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) as well as sets \(\varSigma _c \subseteq \varSigma \) and \(\varSigma ^e \subseteq \varSigma \) of commutative and extensional symbols, respectively, and two terms \(s_0,t_0\in G(\varSigma ,C_0)\). The problem of deciding whether the identity \(s_0\mathrel {\approx }t_0\) holds in every algebra that satisfies E, commutativity for every \(f\in \varSigma _c \), extensionality for every \(f\in \varSigma ^e \setminus \varSigma _c \), and c-extensionality for every \(f\in \varSigma ^e \cap \varSigma _c \) is coNP-complete.

Proof

Since Proposition 2 yields coNP-hardness, it is sufficient to show that the complement problem is in NP. This is an easy consequence of Theorem 3. In fact, to show that \(s_0\mathrel {\approx }t_0\) does not hold in all such algebras, it is sufficient to generate one run r of our nondeterministic construction, and then test whether \(s_0\) and \(t_0\) have different canonical forms w.r.t. \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\). The system \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\) can be generated in nondeterministic polynomial time, and the canonical forms can then be computed in polynomial time. \(\square \)

6 Related Work

As already mentioned in the introduction, several approaches for how to solve the word problem for finite sets of ground identities by constructing a canonical rewrite system have been developed in the literature [7,8,9]. Restricted to the case of uninterpreted function symbols, the approach employed in the present paper is similar to the rewriting-based congruence closure algorithm developed in [7]. In the presence of interpreted function symbols, i.e., ones that are must satisfy certain non-ground identities, our approach differs from previous ones [12, 14, 15] by the use of an ordered rewrite system. Basically, in our approach the ordered rewrite system deals with the semantic properties of the interpreted function symbols, and the only interaction with the construction of the rewrite system for the ground identities is through computing canonical forms of the left-hand sides of function rules using the ordered rewrite system. Otherwise, the construction is the same as for the case of uninterpreted function symbols. The papers [14, 15], which consider symbols that are associativity and commutativity (AC), deal with the problem that AC does not allow for a terminating non-ground rewrite system not by employing ordered rewriting, but by using rewriting modulo AC. This makes the computation of critical pairs more complicated and requires the use of completion modulo AC. One can actually use ordered rewriting to obtain a canonical ground rewrite system for AC [17], but since associativity is not shallow, the simple approach of combining this system with a rewrite system constructed from ground identities used in the present paper would not work. In fact, since it is known that the word problem for commutative semigroups is ExpSpace-complete [21], it is clear that some expensive additional computations are required. The approach in [12] treats, e.g., commutativity by dealing with it directly in the construction of the ground rewrite system, rather than by employing a separate ordered rewrite system. To the best of our knowledge, no other work than ours has dealt with extensionality in the context of ground identities.

Shallow identities have been investigated in term rewriting and unification theory as a class of identities with good computational properties before. Comon et al. [16] introduce a more general notion of shallow identities than the strongly shallow identities employed in the present paper. According to [16], an identity is shallow if every variable occurs at depth at most one in the left- and right-hand side of the identity. In particular, this notion of shallow also encompasses ground identities and identities like idempotency [i.e. \(f(x,x) = x\)]. It is shown in [16] that the word problem for shallow theories is decidable. The approach basically applies a completion procedure akin to unfailing completion [22] to obtain an ordered rewrite system whose ground instantiation is canonical. This yields decidability of the word problem as considered in our Corollary 1, but no complexity result is stated in [16]. Extensionality is not considered in that paper. Our restriction to sets of strongly shallow identities S allows us to obtain an ordered rewrite system \((S^*,\mathrel {>_{ lpo }})\) whose ground instantiation \(R(S^*,\mathrel {>_{ lpo }})\) is canonical in a very simple way. For more general shallow identities, this simple construction would not work (e.g., idempotency would destroy finiteness of \(S^*\)). In particular, our polynomiality results depend of the fact that \(S^*\) is of polynomial size if the signature \(\varSigma _S \) is arity-bounded.

Lynch et al. [23, 24] introduce a method called schematic saturation for automatically generating a ground satisfiability procedure for a given theory T by combining the axioms of T with ground clauses using paramodulation. Similar to Knuth-Bendix completion, schematic saturation need not terminate. However, if it does terminate, then the word problem for ground identities w.r.t. the theory T is decidable in polynomial time by superposition because superposition only generates polynomially many clauses. Similar to our complexity result for strongly shallow identities stated in Corollary 1, this polynomiality result needs the assumption that the signature used by the theory T is arity-bounded. If applied to commutativity and clauses specifying extensionality for non-commutative symbols, schematic saturation terminates,Footnote 4 which means that superposition can be used to decides the word problem for ground identities w.r.t. that theory in polynomial time. Schematic saturation does not terminate in the presence of commutative symbols that are c-extensional. But it can be extended by one more deletion rule, which then leads to termination.Footnote 5 If schematic saturation with the deletion rule halts on the clauses representing some theory, then the word problem for ground identities w.r.t. that theory is again decidable by superposition, but now it may generate exponentially many clauses. This appears to yield an ExpTime decision procedure for the coNP-complete word problem considered in Corollary 5. The problem with applying the results in [23, 24] is that there are no criteria known that are both easy to check and that guarantee that schematic saturation (with or without the additional deletion rule) terminates. One must run the procedure and see what happens. Unfortunately, there is currently no publicly accessible implementation of this procedure available that could be used for doing this.

The paper [25] develops a procedure that can combine decision procedures for the word problem for equational theories that share only constants. To be more precise, the following result is shown there.

Theorem 4

(Baader and Tinelli [25]) Let \(E := E_1 \cup E_2\), where, for \(i = 1, 2\), \(E_i\) is a non-trivial equational theory over \(\varSigma _i\). Furthermore, assume that \(\varSigma := \varSigma _1 \cap \varSigma _2\) is a finite set of constant symbols, and that \(E_1\) and \(E_2\) agree on \(\varSigma \), that is, \(c =_{E_1} d\) iff \(c =_{E_2} d\) for all \(c, d \in \varSigma \).

If the word problem is decidable for \(E_1\) and \(E_2\), then it is also decidable for E.

Though it is not explicitly stated in [25], the combination procedure preserves polynomiality for the word problem, i.e., if the word problem is decidable in polynomial time for \(E_1\) and \(E_2\), then it is also decidable in polynomial time for E (see the termination proof for the procedure in [26]).

This result can be applied in the setting of ground identities with interpreted function symbols considered in the present paper. Indeed, assume that F and G are equational theories over the disjoint signatures \(\varSigma _F \) and \(\varSigma _G \) such that the word problem for finite set of ground identities w.r.t. F is decidable, and the same is true for G. Then the word problem for finite sets of ground identities w.r.t. \(F\cup G\) is decidable as well. In fact, while the original ground identities may then contain symbols from both \(\varSigma _F\) and \(\varSigma _G\), flattening as in our construction of the rewrite system R(E) can be used to obtain a conservative extension of the input theory in which only constants are shared between the F and the G part. These theories may not yet agree on the shared constants, but this can be achieved by iteratedly adding identities between the shared constants derived from one theory to the other. This way, we can, for example, combine the decision procedure for the case of ground identities w.r.t. strongly shallow identities described in Sect. 3 with a decision procedure for ground identities w.r.t. AC symbols [14, 15, 27, 28]. This kind of combination also works in the presence of extensional symbols. In fact, while extensionality is axiomatized by conditional identities rather than identities, our results in Sect. 4 show that we can represent its effect using the equational theory axiomatized by the computed canonical rewrite system, which allows us to apply the above combination result.

7 Conclusion

We have shown, using a rewriting-based approach, that adding strongly shallow identities (like commutativity) and extensionality for certain function symbols to a finite set of ground identities leaves the complexity of the word problem in P. In contrast, adding c-extensionality for commutative function symbols raises the complexity to coNP. For classical congruence closure, it is well-known that it can actually be computed in \(O(n\log n)\) [4, 11]. Since this complexity upper bound can also be achieved using a rewriting-based approach [9, 12], we believe that the approach developed here can also be used to obtain an \(O(n\log n)\) upper bound for the word problem for ground identities in the presence of commutativity (and maybe other kinds of strongly shallow identities) and extensionality, as in Sect. 4, but this question was not in the focus of the present paper.

Our result on adding extensionality is actually not restricted to the case of interpreted function symbols whose semantic properties are defined by strongly shallow identities. Instead, we exhibited general properties that the canonical rewrite system computed by semantic congruence closure for the equational theory under consideration must satisfy such that extensionality can be treated in the simple way described in Sect. 4. An important condition is that the system is canonical for all ground terms, and not just a finite set of “small” terms determined by the input. It would be interesting to see whether rewriting-based approaches to congruence closure for other theories produced rewrite systems that satisfy the conditions stated in Sect. 4. For the rewriting-based approaches that deal with associative-commutative symbols proposed in [14, 15], this is not directly the case since they use rewriting modulo AC. However, our approach for adding extensional symbols in Sect. 4 can probably be extended to deal with rewriting modulo AC since the AC symbols are not extensional. We have mentioned in the previous section that extensionality in the presence of non-extensional AC symbols can be dealt with by the combination results in [25]. However, while this approach yields a decision procedure for the word problem, it does not provide us with a canonical ground rewrite system. Another, and more challenging question would be to come up with an appropriate notion of extensionality for AC symbols, and to show that adding it leaves the word problem decidable.

In our treatment of strongly shallow identities, we have used a rather brute-force way to obtain an ordered rewrite system that is canonical on ground terms, which basically adds all implied identities between strongly shallow h-terms (see the definition of the set \(S^*\) in Sect. 2.3). Instead, one could use unfailing completion [22] to add only those consequences that are needed to achieve ground confluence. The results in [16] show that this always terminates on shallow (and thus also strongly shallow) identities. The interesting question would then be under what conditions on the strongly shallow identities this actually terminates in polynomial time without requiring arity-boundedness. As mentioned in the previous section, the approach used in [16] can actually deal with more general kinds of shallow identities than the strongly shallow ones considered here. It produces an ordered rewrite system that is canonical for ground terms, but it is not clear whether, with the more general notion of shallow identities employed in [16], this system satisfies the properties required in Sect. 4 for adding extensionality. For strongly shallow identities other than commutativity, it would also be interesting to come up with appropriate variants of c-extensionality.

Regarding the application motivation from DL, it should be easy to extend tableau-based algorithms for DLs to deal with individuals named by ground terms and identities between these terms. Basically, the tableau algorithm then works with the canonical forms of such terms, and if it identifies two terms (e.g., when applying a tableau-rule dealing with number restrictions), then the rewrite system and the canonical forms need to be updated. More challenging would be a setting where rules are added to the knowledge base that generate new terms if they find a certain constellation in the knowledge base (e.g., a married couple, for which the rule introduces a ground term denoting the couple and assertions that link the couple with its components). In the context of first-order logic and modal logics, the combination of tableau-based reasoning and congruence closure has respectively been investigated in [29] and [30].