1 Introduction

Anti-unification (AU) is a fundamental operation for reasoning about generalizations of formal objects. It is the dual operation to unification. The seminal works of Plotkin and Reynolds, introducing the area, were published more than fifty years ago [27, 28]. Recent applications renewed the interest in this technique. This current tendency is mainly due to the significance of generalization operations within frameworks crucial for software analysis and related areas [19].

In contrast to unification, where identifying the equivalence classes induced by a set of expressions is the main objective, AU methods search for the least general commonalities induced by a set of expressions. Investigations have exploited AU methods for various applications such as the implementation of efficient parallel compilers [8], plagiarism detection and code cloning [33, 35, 36], automated bug detection and fixing [7, 24, 32, 34], and indexing/compression/library learning [15, 26], just to name a few. Anti-unification has been studied for several mathematical and computational structures such as term-graphs [13], higher-order terms [12, 20, 25], unranked (variadic) languages [10, 21], nominal terms [11, 29, 30], modulo approximations [5, 22, 23] and background (first-order) equational theories, which is also the subject of this paper. Some of these algorithms have been implemented and can be accessed online [2, 9].

Syntactic AU algorithms [27, 28] compute the least general generalization (lgg) of the given terms. In the equational case, the given terms do not necessarily have a single lgg. Problems are instead characterized by their minimal complete sets of generalizations (mcsg), which leads to a classification of theories depending on the existence and cardinality of such sets: If the mcsg does not exist for some problem in the given theory, then the theory has the nullary AU type. Otherwise, theories may have unitary (all problems have a singleton mcsg, i.e., a single lgg), finitary (all problems have a finite mcsg, at least one of which is not a singleton), or infinitary (there is a problem with an infinite mcsg) AU type.

There have been quite a few developments concerned with AU modulo equational theories. For example, Burghardt [14] considered AU modulo an arbitrary equational theory using grammars. Most other authors studied AU over fundamental algebraic properties and their combinations, e.g., associative (A), commutative (C), AC, idempotent (I) operators, or operators with unit (U) elements. An early work by Baader [6] studied AU over so-called “commutative theories”, covering commutative monoids (ACU), commutative idempotent monoids (ACUI), and Abelian groups. In a restricted setting, he showed that AU in such theories is unitary. Alpuente et al. [1, 3] studied AU over combinations of A, C, and U operators in an order-sorted setting, providing complete AU algorithms, and proving that all studied problems are finitary. Cerna and Kutsia [18] showed that some results depend on the number of symbols that satisfy the associated equational axioms. For instance, they proved the nullarity of theories containing more than one equational symbol: \(U^{>1}, (AU)^{>1}, (CU)^{>1}, (ACU)^{>1}\), and (AU)(CU). They also showed that I, AI, and CI are infinitary [17], and Cerna proved that \((UI)^{>1}, (AUI)^{>1}, (CUI)^{>1},(ACUI)^{>1}\), and semirings are nullary [16].

This paper extends the state-of-the-art on equational anti-unification by providing an algorithm to solve first-order AU problems in which collapsing symbols may occur. These are symbols that are associated with an absorption constant such that \(f(\varepsilon _{f},x)\,\approx \, \varepsilon _{f} \,\approx \, f(x, \varepsilon _{f})\). Such properties often appear in syntactic, logical, and algebraic frameworks (e.g., \(0 \times x \,\approx \, 0,\) \( \text{ false } \wedge p \,\approx \, \text{ false }\)). They are an instance of the subterm-collapsing property. Concerning applications, one could consider such operations as modeling exception handling and other methods of flagging errors in software development, where much of the context is discarded when the error handling code is triggered. In such cases, like absorption theories, the state before triggering the error handling code is not precisely captured by the resulting context and, in a sense, can be abstracted away.

In this paper, we provide a detailed study of anti-unification in absorption theories: investigating its type (which turns out to be infinitary), coming up with a finitary algorithmic representation of the potentially infinite mcsg, developing an algorithm that computes such a representation, and studying its properties. Moreover, our work opens a way toward characterizing anti-unification for a bigger class of subterm-collapsing equational theories, where techniques introduced in this paper can be useful. We leave this as future work.

Plan of the Paper. After defining the notions (Sect. 2), we introduce an algorithm for anti-unification over absorption theories (Sect. 3), prove its soundness and completeness (Sect. 4), show that anti-unification over absorption theories is of type infinitary and provide a brief complexity analysis (Sect. 5). Some proofs and explanatory examples can be found in [4].

2 Preliminaries

Let \(\mathcal {V}\) be a countable set of variables and \(\mathcal {F}\) a set of function symbols with a fixed arity. Additionally, we assume \(\mathcal {F}\) contains a special constant \(\star \), referred to as the wild card. The set of terms derived from \(\mathcal {F}\) and \(\mathcal {V}\) is denoted by \(\mathcal {T}(\mathcal {F},\mathcal {V})\), whose members are constructed using the grammar \(t \,{:}{:}\!= x \mid f(t_1,\dots ,t_n)\), where \(x\,\in \,\mathcal {V}\) and \(f\,\in \, \mathcal {F}\) with arity \(n\ge 0\). When \(n\,=\,0\), f is called a constant. Constant and function symbols, terms, and variables are denoted by lower-case letters of the first, second, third, and fourth quarter of the alphabet (\(a,b,\ldots \); \(f,g,\ldots \); \(r,s,\ldots \); \(x,y,\ldots \)). The set of variables occurring in t is denoted by \( var (t)\). The size of a term is defined inductively as: \( size (x)\,=\,1\), and \(\textstyle size (f(t_1,\ldots ,t_n)) \,=\, 1 + \sum _{i\,=\,1}^{n} size (t_i)\). The depth of a term t is defined inductively as \(dep(x)\,=\,1\) for variables and \(dep(f(t_1,\ldots ,t_n))\,=\,max\{dep(t_1), \ldots , dep(t_n)\}+1\) otherwise.

The set of positions of a term t, denoted by \( pos (t)\), is a set of strings of positive integers, defined as \( pos (f(t_1,\ldots ,t_n))\,=\,\{\epsilon \}\cup {\textstyle \bigcup _{i\,=\,1}^n} \{i. p \mid p\,\in \, pos (t_i)\}\), where \(f\,\in \, \mathcal {F}\), \(t_1,\ldots , t_n\) are terms, and \(\epsilon \) denotes the empty string. For example, the term at position 1.2 of \(g(f(x,a))\) is a. Given a term t and \(p\,\in \, pos (t)\), \(t|_p\) denotes the subterm of t at position p. Given a term t and \(p,q\,\in \, pos (t)\), we write \(p\,\sqsubseteq \, q\) if \(q\,=\,p.q'\) for some string \(q'\), and \(p\sqsubset q\) if \(p\,\sqsubseteq \, q\) and \(p{\,}\ne {\,} q\). The set of subterms of a term t is defined as \( sub (t)\,=\, \{t\vert _p\ \vert \ p\,\in \, pos (t)\}\). The head of a term t is defined as \( head (x)\,=\,x\) and \( head (f(t_1,\dots ,t_n))\,=\,f\), for \(n\ge 0\).

A substitution is a function \(\sigma : \mathcal {V}\rightarrow \mathcal {T}(\mathcal {F},\mathcal {V})\) such that \(\sigma (x){\,}\ne {\,} x\) for only finitely many variables. The set of the variables that are not mapped to themselves is called the domain of \(\sigma \), denoted as \( dom (\sigma )\). The range of \(\sigma \), denoted \( ran (\sigma )\), is the set of terms \(\{\sigma (x) \mid x\,\in \, dom (\sigma ) \}\). A term t is called ground if \( var (t)\,=\,\emptyset \), and a substitution \(\sigma \) is called ground if every \(t\,\in \, ran (\sigma )\) is ground. Substitutions are extended to terms in the usual manner. We use the postfix notation for substitution application to terms and write \(t\sigma \) instead of \(\sigma (t)\).

Substitutions can be described as sets of bindings of variables in their domains into terms in their ranges, e.g., we represent a substitution \(\sigma \) as the set \(\{x\,\mapsto \, x\sigma \mid x\,\in \, dom (\sigma ) \}\). Lowercase Greek letters denote substitutions except for the identity substitution that we denote by id. The set of variables occurring in the terms of \( ran (\sigma )\) is denoted as \( rvar (\sigma )\). The composition of substitutions \(\sigma \) and \(\rho \) is written \(\sigma \rho \) and is defined by \((\sigma \rho )(x)\,=\,(x\sigma )\rho \) for each \(x\,\in \, \mathcal {V}\). The restriction of a substitution \(\sigma \) to a set of variables V, denoted by \(\sigma \vert _{V}\), is a substitution defined as \(\sigma \vert _{V}(x)\,=\,\sigma (x)\) for all \(x\,\in \, V\) and \(\sigma \vert _{V}(x)\,=\,x\) otherwise.

In this work, we focus on equational anti-unification. Thus, we refrain from presenting syntactic variants of the concepts discussed below. For such details, we refer to the recent survey on the topic [19].

Definition 1

(Equational theory [31]). An equational theory \(T_E\) is a class of algebraic structures that satisfy a set of equational axioms E over \(\mathcal {T}(\mathcal {F},\mathcal {V})\).

The relation \(\{ (s,t) \,\in \, \mathcal {T}(\mathcal {F},\mathcal {V}) \times \mathcal {T}(\mathcal {F},\mathcal {V}) \; | \; E\models (s,t)\}\) induced by a set of equalities E gives the set of equalities satisfied by all structures in the theory of E. We will use the notation \(s\,\approx _E t\) for \((s,t)\) belonging to this set. Also, we will identify \(T_E\) with the set of axioms E. Groups, monoids, and semirings are examples of equational theories.

Definition 2

(E-generalization, \(\preceq _{E}\)). The generalization relation of the theory induced by E holds for terms \(r,s\,\in \, \mathcal {T}(\mathcal {F},\mathcal {V})\), written \(r\preceq _{E} s\), if there exists a substitution \(\sigma \) such that \(r\sigma \,\approx _E\, s\). In this case, we say that r is more general than s modulo E. If \(r\preceq _{E} s\) and \(r\preceq _{E} t\), we say that r is an E-generalization of s and t. The set of all E-generalizations of s and t is denoted as \(\mathcal {G}_{E}(s,t)\). By \(\prec _E\) and \(\simeq _E\), we denote the strict and equivalence relations induced by \(\preceq _{E}\), respectively.

Example 1

Consider the equational theory \(\texttt {Abs}\,=\,\{f(\varepsilon _{f},x)\,\approx \, \varepsilon _{f}, f(x,\varepsilon _{f})\,\approx \,\varepsilon _{f}\}\), and the terms \(s\,=\,\varepsilon _{f}\) and \(t\,=\,f(f(b,c),a)\). Then \(f(f(b,x),a)\) is an Abs-generalization of s and t. Indeed, \(\sigma \,=\,\{x\,\mapsto \, \varepsilon _{f}\}\) and \(\rho \,=\,\{x\,\mapsto \, c\}\) satisfy \(f(f(b,x),a)\sigma \,=\,f(f(b,\varepsilon _{f}),a)\,\approx _{\texttt {Abs}}\varepsilon _{f}\) and \(f(f(b,x),a)\rho \,=\,f(f(b,c),a)\).

Definition 3

(Minimal complete set of E-generalizations). The minimal complete set of E-generalizations of the terms s and t, denoted as \( mcsg _E(s,t)\), is a subset of \(\mathcal {G}_{E}(s,t)\) satisfying:

  1. For each \(r\,\in \, \mathcal {G}_{E}(s,t)\) there exists \(r'\,\in \, mcsg _E(s,t)\) such that \(r\preceq _{E} r'\) (completeness).

  2. If \(r,r'\,\in \, mcsg _E(s,t)\) and \(r\preceq _{E} r'\), then \(r\,=\,r'\) (minimality).

Example 2

For Example 1, the minimal complete set of \(\texttt {Abs}\)-generalizations is \( mcsg _{\texttt {Abs}}(\varepsilon _{f},f(f(b,c),a))\,=\,\{f(f(x,c),a),f(f(b,x),a),f(f(b,c),x)\}.\)

Definition 4

(Anti-unification type). The anti-unification type of an equational theory E may have one of the following forms:

  • Unitary: \( mcsg _{E}(s,t)\) exists for all \(s,t\,\in \, \mathcal {T}(\mathcal {F},\mathcal {V})\) and is always singleton.

  • Finitary: \( mcsg _{E}(s,t)\) exists and is finite for all \(s,t\,\in \, \mathcal {T}(\mathcal {F},\mathcal {V})\), and there exist \(s',t'\,\in \, \mathcal {T}(\mathcal {F},\mathcal {V})\) for which \(1 < | mcsg _{E}(s',t')| < \,\infty \).

  • Infinitary: \( mcsg _{E}(s,t)\) exists for all \(s,t\,\in \, \mathcal {T}(\mathcal {F},\mathcal {V})\), and there exist \(s',t'\,\in \, \mathcal {T}(\mathcal {F},\mathcal {V})\) such that \( mcsg _{E}(s',t')\) is infinite.

  • Nullary: for some \(s,t\,\in \, \mathcal {T}(\mathcal {F},\mathcal {V})\), \( mcsg _{E}(s,t)\) does not exist.

Example 3

From the introduction: Syntactic AU is unitary [27, 28], AU over associative (A) and commutative (C) theories is finitary [1], AU over idempotent theories is infinitary [17], and AU with multiple unital equations is nullary [18].

3 Anti-unification in Absorption Theories

Absorption is one of the fundamental properties used in various algebraic structures. For example, in semirings, rings, and Boolean algebras, the additive identity is the absorption constant for multiplication. Concrete examples are the product operation and 0 in number fields and the intersection operation and \(\emptyset \) in set theory. So far, investigations on anti-unification over absorption theories have only considered equational theories defining more elaborate algebraic structures (semirings [16]). In this work, we study pure absorption theories as part of a general study on the anti-unification of subterm-collapsing theories.

Remark 1

We only consider anti-unification of ground terms. Given that the generalization of two distinct variables is a fresh variable and the generalization of a variable with itself is the same variable, we can treat variables in the input problem as constants.

For a binary function symbol \(f\,\in \, \mathcal {F}\) and a constant \(\varepsilon _{f}\,\in \, \mathcal {F}\), the absorption property \(\texttt {Abs}(f,\varepsilon _{f})\) is given by the axioms \(\{f(x,\varepsilon _{f})\,\approx \,\varepsilon _{f}, \, f(\varepsilon _{f},x)\,\approx \,\varepsilon _{f}\}\). An absorption theory is induced by a finite union of absorption axiom sets \(\texttt {Abs}(f_1,\varepsilon _{f_1}) \cup \cdots \cup \texttt {Abs}(f_n,\varepsilon _{f_n})\), \(n\ge 1\), such that for all \(1\le i {\,}\ne {\,} j \le n\), \(f_i {\,}\ne {\,} f_j\) and \(\varepsilon _{f_i}{\,}\ne {\,} \varepsilon _{f_j}\). Each pair \(f_i,\varepsilon _{f_i}\) is called a pair of related absorption symbols. When the concrete symbols are not relevant or if they are clear from the context, we refer to an absorption theory simply as \(\texttt {Abs}\).

An anti-unification triple (AUT) is a triple of the form \(s\,\triangleq _{x}\, t\), where \(x\,\in \, \mathcal {V}\) is called the label of the AUT and \(s,t\,\in \, \mathcal {T}(\mathcal {F},\mathcal {V})\). Given a set A of AUTs, \( labels (A) \,=\, \{ x\mid s\,\triangleq _{x}\, t\,\in \, A\}\) and \( size (A)\,=\, \sum _{s\,\triangleq _{x}t\in A} \big ( size (s) + size (t) \big )\). A set of AUTs is valid if the labels of its AUTs are pairwise distinct. An AUT is referred to as wild if either the left or right side is the wild card.

Definition 5

(Solved AUT). An AUT \(s\,\triangleq _{x}\, t\) is solved over an absorption theory \(\texttt {Abs}\) if \( head (s)\) \(\not =\, head (t)\), \( head (s)\) and \( head (t)\) are not related absorption symbols, and \(s\,\triangleq _{x}\, t\) is not wild.

Intuitively, solved means the label of the AUT is the lgg of the two terms.

3.1 Generalization Procedure for Abs Theories

We present a set of inference rules (Table 1), which, when applied exhaustively (AUnif procedure), return a set of objects from which Abs-generalizations of the input AUTs may be derived. The inference rules of the AUnif procedure work on configurations, defined below.

Definition 6

(Configuration). A configuration is a quadruple of the form \( \langle A;S;D;\theta \rangle \), where:

  • A is a valid set of AUTs (active set);

  • S is a valid set of solved AUTs (store);

  • D is a valid set of wild AUTs (delayed set);

  • \(\theta \) is a substitution such that \( rvar (\theta )\,=\, labels (A)\cup labels (S)\cup labels (D)\) (anti-unifier);

  • \( labels (A), labels (S), labels (D)\), and \( dom (\theta )\) are pairwise disjoint.

All terms occurring in a configuration are in their \(\texttt {Abs}\)-normal forms: an absorption constant does not occur as the argument to its absorption symbol.

The rules in Table 1 will be referred to as follows: Decompose (\(\overset{{{\tiny {\textit{Dec}}}}}{\Longrightarrow }\)), Solve (\(\overset{{{\tiny {\textit{Sol}}}}}{\Longrightarrow }\)), Expansions for Left Absorption (\(\overset{{{\tiny {\textit{ExpLA1}}}}}{\Longrightarrow }\) and \(\overset{{{\tiny {\textit{ExpLA2}}}}}{\Longrightarrow }\)), Expansions for Right Absorption (\(\overset{{{\tiny {\textit{ExpRA1}}}}}{\Longrightarrow }\) and \(\overset{{{\tiny {\textit{ExpRA2}}}}}{\Longrightarrow }\)), Expansions for Absorption in Both Sides (\(\overset{{{\tiny {\textit{ExpBA1}}}}}{\Longrightarrow }\) and \(\overset{{{\tiny {\textit{ExpBA2}}}}}{\Longrightarrow }\)), and Merge (\(\overset{{{\tiny {\textit{Mer}}}}}{\Longrightarrow }\)). By \(\mathcal {C}\Longrightarrow \mathcal {C}'\) we denote the application of some inference rule of Table 1 to \(\mathcal {C}\) resulting in \(\mathcal {C}'\). By \(\mathcal {C}\Longrightarrow ^{*}\mathcal {C}'\) we denote a finite sequence of inference rule applications starting at \(\mathcal {C}\) and ending with \(\mathcal {C}'\). In both cases we say \(\mathcal {C}'\) is derived from \(\mathcal {C}\). An initial configuration is a configuration of the form \(\langle A; \emptyset ;\emptyset ; \iota \rangle \), where \(\iota \,=\, \{\textsf{f}_A(x)\,\mapsto \, x\mid x\,\in \, labels (A)\}\) with \(\textsf{f}_A:\mathcal {V}\rightarrow (\mathcal {V}\,\setminus \, labels (A))\) being a bijection. A configuration \(\mathcal {C}\) is referred to as final if no inference rule is applicable to \(\mathcal {C}\). We denote the set of final configurations finitely derived from an initial configuration \(\mathcal {C}\) by \(\textsc {AUnif}(\mathcal {C})\).

Table 1. Inference rules for the AUnif procedure for Abs theory.

Lemma 1

(Preservation). If \(\mathcal C\) is a configuration and \(\mathcal {C}\Longrightarrow \mathcal {C}'\), then \(\mathcal {C'}\) is a configuration.

Proof

According to the rules in Table 1, we can have the following two cases:

  • A rule removes an AUT \(s\,\triangleq _{x}\,t\) from the active set of \(\mathcal {C}\). Then either \(s\,\triangleq _{x}\,t\) occurs in the store of \(\mathcal {C'}\), or the anti-unifier component of \(\mathcal {C}'\) is the composition of the anti-unifier component of \(\mathcal {C}\) with \(\{x\,\mapsto \, r\}\), where \( var (r)\) are fresh variables labeling newly added AUTs in the active and delayed sets of \(\mathcal {C}'\).

  • A rule removes an AUT \(s\,\triangleq _{x}\,t\) from the store of \(\mathcal {C}\). Then the store of \(\mathcal {C}'\) is a subset of the store of \(\mathcal {C}\) and the anti-unifier component of \(\mathcal {C}'\) is the composition of the anti-unifier component of \(\mathcal {C}\) with \(\{x\,\mapsto \, \textit{y}\}\), where y is a label of an AUT in the store of \(\mathcal {C}\) such that \(x{\,}\ne {\,} y\).

In both cases, the properties of a configuration are preserved.    \(\square \)

Remark 2

For the rest of the paper, we will only consider configurations derived from initial configurations.

Theorem 1

(Termination). Let \(\mathcal {C}\) be a configuration. Then \(\textsc {AUnif}(\mathcal {C})\) is computable in a finite number of steps.

Proof

Let \(\mathcal {C}\,=\,\langle A; S; D ; \theta \rangle \). We define \( size (\mathcal {C}):=\,( size (A), size (S))\) and compare these pairs lexicographically. This ordering is well-founded since the size of a set of AUTs is a natural number. Observe that if \(\mathcal {C}\Longrightarrow \mathcal {C}'\), then \( size (\mathcal {C})> size (\mathcal {C}')\). Thus, every sequence of rule applications terminates. Furthermore, any configuration can be transformed by rules from Table 1 in finitely many ways. Thus, by König’s Lemma, \(\textsc {AUnif}(\mathcal {C})\) is finite and finitely computable.    \(\square \)

Let \(\langle \emptyset ; S; D; \theta \rangle \,\in \, \textsc {AUnif}(\langle A; \emptyset ;\emptyset ; \iota \rangle )\), where \(\langle A; \emptyset ;\emptyset ; \iota \rangle \) is an initial configuration. We will show that for any AUT \(s\,\triangleq _{x}\,t\,\in \, A\), \(x\theta \,\in \,\mathcal {G}_{\texttt {Abs}}(s,t)\). Moreover, we can construct additional generalizations by considering the AUTs in the delayed sets. We discuss this process in the next section.

3.2 Abstraction Set and Substitutions

We construct the abstraction set and abstraction substitutions from the store and delayed sets of the final configurations derived using the \(\textsc {AUnif}\) procedure. Let \(\langle \{s\,\triangleq _{x}\, t\};\emptyset ; \emptyset ; \iota \rangle \) be an initial configuration and \(\langle \emptyset ; S ; D; \theta \rangle \,\in \, \textsc {AUnif}(\langle \{s\,\triangleq _{x}\, t\};\emptyset ; \emptyset ; \iota \rangle )\). While \(x\theta \) may be more specific than the syntactic generalization of s and t, any use of the absorption theory while computing \(x\theta \) is completely dependent on the presence of absorption symbols and constants within s and t. Absorption theories allow the introduction of additional structure beyond what is present in the initial AUTs. For example, AUnif computes the generalization \(f(x,y)\) for the terms \(\varepsilon _{f}\) and \( f(h(\varepsilon _{f}),h(h(\varepsilon _{f})))\), yet \(\texttt {Abs}\) allows a more specific generalization, \(f(x,h(x))\). In more extreme cases, infinitely many more specific generalizations may exist.

Definition 7

(Abstraction set). Let t be a ground term in Abs-normal form, and \(\sigma \) be a substitution whose range is in Abs-normal form. The abstraction set of t with respect to \(\sigma \) is the set

$$\begin{aligned} \uparrow (t,\sigma ):=\,\{r \mid r\sigma \,\approx _{\texttt {Abs}}\,t,\ {r~is~in~\texttt {Abs}\text {-}normal~form,~and~} var (r)\,\subseteq \, dom (\sigma )\}. \end{aligned}$$

Observe that \(t\,\in \, \uparrow (t,\sigma )\) since \( var (t) \,=\,\emptyset \,\subseteq \, dom (\sigma )\) and \(t\sigma \,=\, t\). To obtain an \(r\,\in \, \uparrow (t,\sigma )\), we abstract some occurrences of some \(x\sigma \)’s in t by x, where \(x\,\in \, dom (\sigma )\); this is the origin of the term “abstraction set”.

Example 4

Let \(t\,=\,g(\varepsilon _{f},f(h(a),b))\) and \(\sigma \,=\,\{x\,\mapsto \, a, y\,\mapsto \, f(h(a),b),z\,\mapsto \, b \}\). Then the abstraction set of t with respect to \(\sigma \) is

$$\begin{aligned} \uparrow (t,\sigma )\,=\,\{ t,\ g(\varepsilon _{f},y),\ g(\varepsilon _{f},f(h(x),b)),\ g(\varepsilon _{f},f(h(a),z)), \ g(\varepsilon _{f},f(h(x),z))\}. \end{aligned}$$

Now, consider \(t\,=\,h(\varepsilon _{f})\) and \(\sigma \,=\,\{y\,\mapsto \, a, v\,\mapsto \, \varepsilon _{f}\}\). Then \(\uparrow (t,\sigma )\) is infinite:

$$\begin{aligned} \uparrow (t,\sigma )\,=\,&\{h(\varepsilon _{f}),\, h(v)\} \cup \{ h(f(v,s)) \mid s \,\in \, \mathcal {T}(\mathcal {F},\{y,v\}) \} \cup {} \\ & \{ h(f(s,v)) \mid s \,\in \, \mathcal {T}(\mathcal {F},\{y,v\}) \} \cup {} \\ & \{ h(f(f(v,s),r)) \mid s,r \,\in \, \mathcal {T}(\mathcal {F},\{y,v\}) \} \cup {} \cdots \end{aligned}$$

Let us consider a particular configuration \(\mathcal {C}\). Observe that all AUTs occurring in the delayed set of \(\mathcal {C}\) are wild, i.e., of the form \(\star \,\triangleq _{x}\, t\) or \(t\,\triangleq _{x}\, \star \) where t is ground and \(\star \) is a constant, indicating that the particular term occurring in the AUT at this position is irrelevant. We produce more specific generalizations by composing abstraction substitutions with the anti-unifier of \(\mathcal {C}\). Essentially, abstraction substitutions are anti-unifiers of AUTs in the delayed set of \(\mathcal {C}\) constructed from an interpretation of the wild cards as particular terms. The variables occurring in the range of an abstraction substitution are restricted to labels of the store of \(\mathcal {C}\). In Sect. 4, we show that this restriction does not influence completeness.

Definition 8

(Abstraction substitutions). Let \(\mathcal {C} \,=\, \langle A;S;D;\theta \rangle \) be a configuration such that \(D{\,}\ne {\,} \emptyset \). A substitution \(\tau \) is called an abstraction substitution of \(\mathcal {C}\) if \( dom (\tau ) \,=\, labels (D)\), and for each \(y\,\in \, dom (\tau )\) we have \(y\tau \,\in \, \uparrow _{y}(D,S)\), where

$$\uparrow _{y}(D,S) :=\, \left\{ \begin{array}{cc} \uparrow (t,\{x\,\mapsto \, r\mid l\,\triangleq _{x}\,r\,\in \, S, \text { for some } l\}) &{} \text { if }\ \star \,\triangleq _{y}\, t \,\in \, D, \\ \uparrow (s,\{x\,\mapsto \, l\, \mid l\,\triangleq _{x}\,r\,\in \, S, \text { for some } r\}) &{} \text { if }\ s\,\triangleq _{y}\, \star \,\in \, D. \end{array} \right. $$

The set of abstraction substitutions of \(\mathcal {C}\) is denoted by \(\varPsi (D,S)\).

Corollary 1

Let \(\langle A;S;D;\theta \rangle \) be a configuration such that \(D{\,}\ne {\,} \emptyset \). Then for any \(y\,\in \, labels (D)\) and \(\tau \,\in \, \varPsi (D,S)\), \( var (y\tau ) \,\subseteq \, labels (S)\).

The following example illustrates the computation of final configurations using AUnif and the construction of the abstraction sets.

Example 5

Applying AUnif to \(g(\varepsilon _{f},f(a,h(\varepsilon _{f}))) \,\triangleq _{x}\, g(f(h(\varepsilon _{f}),a),\varepsilon _{f})\), we get the following four derivations that lead to four final configurations:

$$\begin{aligned} {\textbf {Derivation}}\, 1: \qquad \quad \quad \langle \{g(\varepsilon _{f},f(a,h(\varepsilon _{f})))\,\triangleq _{x}\, g(f(h(\varepsilon _{f}),a),\varepsilon _{f})\}; \emptyset ;\emptyset ;\iota \rangle & \!\overset{{{\tiny {\textit{Dec}}}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\,\triangleq _{w_1}\, f(h(\varepsilon _{f}),a), f(a,h(\varepsilon _{f}))\,\triangleq _{w_2}\, \varepsilon _{f}\}; \emptyset ;\emptyset ;\{x\,\mapsto \, g(w_1,w_2), \ldots \}\rangle &\!\overset{{{\tiny {\textit{ExpLA1}}}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\!\,\triangleq _{u_1}\,\! h(\varepsilon _{f}), f(a,h(\varepsilon _{f}))\!\,\triangleq _{w_2}\, \!\varepsilon _{f}\}; \emptyset ;\{\star \!\,\triangleq _{v_1}\,\! a\}; \{x\,\mapsto \, g(f(u_1,v_1),w_2), \ldots \}\rangle & \!\overset{{{\tiny {\textit{ExpRA1}}}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\,\triangleq _{u_1}\, h(\varepsilon _{f}), a\,\triangleq _{u_2}\, \varepsilon _{f}\}; \emptyset ;\{\star \,\triangleq _{v_1}\, a,h(\varepsilon _{f})\,\triangleq _{v_2}\, \star \}; & \\ \{x\,\mapsto \, g(f(u_1,v_1),f(u_2,v_2)), \ldots \}\rangle & \!\overset{{{\tiny { Sol } \times 2}}}{\Longrightarrow } \\ \langle \emptyset ; \{\varepsilon _{f}\,\triangleq _{u_1}\, h(\varepsilon _{f}), a\,\triangleq _{u_2}\, \varepsilon _{f}\};\{\star \,\triangleq _{v_1}\, a,h(\varepsilon _{f})\,\triangleq _{v_2}\, \star \};&\\ \{x\,\mapsto \, g(f(u_1,v_1),f(u_2,v_2)), \ldots \}\rangle . \end{aligned}$$

Then \(D\,=\,\{\star \,\triangleq _{v_1}\, a,h(\varepsilon _{f})\,\triangleq _{v_2}\, \star \}\) and \(S\,=\,\{\varepsilon _{f}\,\triangleq _{u_1}\, h(\varepsilon _{f}), a\,\triangleq _{u_2}\, \varepsilon _{f}\}\). For the variable \(v_1\), \(\uparrow _{v_1}(D,S)\,=\,\uparrow (a,\{u_1\,\mapsto \, h(\varepsilon _{f}),u_2\,\mapsto \, \varepsilon _{f}\})\,=\,\{a\}\). For the variable \(v_2\), \(\uparrow _{v_2}(D,S)\,=\,\) \(\uparrow (h(\varepsilon _{f}),\{u_1\,\mapsto \, \varepsilon _{f},u_2\,\mapsto \, a\})\) is an infinite set

$$\begin{aligned} &\{h(\varepsilon _{f}),\, h(u_1)\}\cup \{ h(f(u_1, s)) \mid s\,\in \, \mathcal {T}(\mathcal {F},\{u_1,u_2\} )\} \\ & \qquad \qquad \qquad \,\, \cup \{ h(f(s,u_1)) \mid s\,\in \, \mathcal {T}(\mathcal {F},\{u_1,u_2\} )\} \\ & \qquad \qquad \qquad \,\, \cup \{ h(f(f(u_1,s),t)) \mid s,t\,\in \, \mathcal {T}(\mathcal {F},\{u_1,u_2\} )\} \cup \cdots \end{aligned}$$

The set of abstraction substitutions \(\varPsi (D,S)\) is an infinite set including \(\{\{v_1\,\mapsto \, a, v_2\,\mapsto \, h(\varepsilon _{f})\}, \{v_1\,\mapsto \, a, v_2\,\mapsto \, h(u_1)\}, \{v_1\,\mapsto \, a, v_2\,\mapsto \, h(f(u_1,a))\}, \ldots \}\). From the final configuration, we get an infinite set of Abs-generalizations of the initial AUT, including, e.g., \(g(f(u_1,a),f(u_2,h(\varepsilon _{f})))\), \(g(f(u_1,a),f(u_2,h(u_1)))\), \( g(f(u_1,a),f(u_2,h(f(u_1,a))))\), etc.

$$\begin{aligned} {\textbf {Derivation}}\, 2: \qquad \quad \langle \{g(\varepsilon _{f},f(a,h(\varepsilon _{f})))\,\triangleq _{x}\, g(f(h(\varepsilon _{f}),a),\varepsilon _{f})\}; \emptyset ;\emptyset ;\iota \rangle & \!\!\overset{{{\tiny {\textit{Dec}}}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\,\triangleq _{w_1}\, f(h(\varepsilon _{f}),a), f(a,h(\varepsilon _{f}))\,\triangleq _{w_2}\, \varepsilon _{f}\}; \emptyset ;\emptyset ;\{x\,\mapsto \, g(w_1,w_2),\ldots \}\rangle & \!\!\overset{{{\tiny {\textit{ExpLA1}}}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\!\!\,\triangleq _{u_1}\,\!\! h(\varepsilon _{f}), f(a,h(\varepsilon _{f}))\!\!\,\triangleq _{w_2}\,\!\! \varepsilon _{f}\}; \emptyset ;\{\star \!\!\,\triangleq _{v_1}\,\!\! a\}; \{x\!\,\mapsto \,\! g(f(u_1,v_1),w_2), \ldots \}\rangle & \!\!\overset{{{\tiny {\textit{ExpRA2}}}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\,\triangleq _{u_1}\, h(\varepsilon _{f}), h(\varepsilon _{f})\,\triangleq _{v_2}\, \varepsilon _{f}\}; \emptyset ;\{\star \,\triangleq _{v_1}\, a,a\,\triangleq _{u_2}\, \star \};&\\ \{x\,\mapsto \, g(f(u_1,v_1),f(u_2,v_2)), \ldots \}\rangle & \!\!\overset{{{\tiny { Sol } \times 2}}}{\Longrightarrow }\\ \langle \emptyset ; \{\varepsilon _{f}\,\triangleq _{u_1}\, h(\varepsilon _{f}), h(\varepsilon _{f})\,\triangleq _{v_2}\, \varepsilon _{f}\};\{\star \,\triangleq _{v_1}\, a,a\,\triangleq _{u_2}\, \star \};&\\ \{x\,\mapsto \, g(f(u_1,v_1),f(u_2,v_2)),\ldots \}\rangle . & \end{aligned}$$

Then \(D\,=\,\{\star \,\triangleq _{v_1}\, a,a\,\triangleq _{u_2}\, \star \}\) and \(S\,=\,\{\varepsilon _{f}\,\triangleq _{u_1}\, h(\varepsilon _{f}), h(\varepsilon _{f})\,\triangleq _{v_2}\, \varepsilon _{f}\}\). Thus, \(\uparrow _{v_1}(D,S)\,=\,\uparrow (a,\{u_1\,\mapsto \, h(\varepsilon _{f}),v_2\,\mapsto \, \varepsilon _{f}\})\,=\,\{a\}\), and \(\uparrow _{u_2}(D,S)\,=\,\uparrow (a,\{u_1\,\mapsto \, \varepsilon _{f}, v_2\,\mapsto \, h(\varepsilon _{f})\})\,=\,\{a\}\). This leads to the generalization \(g(f(u_1,a),f(a,v_2))\).

$$\begin{aligned} {\textbf {Derivation}}\, 3: \qquad \quad \langle \{g(\varepsilon _{f},f(a,h(\varepsilon _{f})))\,\triangleq _{x}\, g(f(h(\varepsilon _{f}),a),\varepsilon _{f})\}; \emptyset ;\emptyset ;\iota \rangle & \!\overset{{{\tiny {\textit{Dec}}}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\,\triangleq _{w_1}\, f(h(\varepsilon _{f}),a), f(a,h(\varepsilon _{f}))\,\triangleq _{w_2}\, \varepsilon _{f}\}; \emptyset ;\emptyset ;\{x\,\mapsto \, g(w_1,w_2), \ldots \}\rangle & \!\overset{{{\tiny {\textit{ExpLA2}}}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\!\,\triangleq _{v_1}\,\! a, f(a,h(\varepsilon _{f}))\!\,\triangleq _{w_2}\,\! \varepsilon _{f}\}; \emptyset ;\{\star \!\,\triangleq _{u_1}\,\! h(\varepsilon _{f})\}; \\ \{x\,\mapsto \, g(f(u_1,v_1),w_2), \ldots \}\rangle & \!\overset{{{\tiny {\textit{ExpRA1}}}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\,\triangleq _{v_1}\,\! a, a\,\triangleq _{u_2}\,\! \varepsilon _{f}\}; \emptyset ;\{\star \,\triangleq _{u_1}\, \!h(\varepsilon _{f}), h(\varepsilon _{f})\,\triangleq _{v_2}\,\! \star \};&\\ \{x\,\mapsto \, g(f(u_1,v_1),f(u_2,v_2)), \ldots \}\rangle & \!\overset{{{\tiny { Sol } \times 2}}}{\Longrightarrow }\\ \langle \emptyset ; \{\varepsilon _{f}\,\triangleq _{v_1}\, a, a\,\triangleq _{u_2}\, \varepsilon _{f}\};\{\star \,\triangleq _{u_1}\, h(\varepsilon _{f}),h(\varepsilon _{f})\,\triangleq _{v_2}\, \star \};&\\ \{x\,\mapsto \, g(f(u_1,v_1),f(u_2,v_2)), \ldots \}\rangle . & \end{aligned}$$

Then \(D\,=\,\{\star \,\triangleq _{u_1}\,\! h(\varepsilon _{f}),h(\varepsilon _{f})\,\triangleq _{v_2}\, \! \star \}\) and \(S\,=\,\{\varepsilon _{f}\,\triangleq _{v_1}\,\! a, a\,\triangleq _{u_2}\,\! \varepsilon _{f}\}\). Thus, we get

$$\begin{aligned} & \uparrow _{u_1}(D,S)\,=\,\uparrow (h(\varepsilon _{f}),\, \{v_1\,\mapsto \, a,\, u_2\,\mapsto \, \varepsilon _{f}\})\,=\, \\ & \qquad \qquad \{h(\varepsilon _{f}),\ h(u_2)\} \cup \{h(f(u_2,s)) \mid s \,\in \, \mathcal {T}(\mathcal {F},\{v_1,u_2\} ) \} \cup {} \cdots , \text { and } \\ & \uparrow _{v_2}(D,S)\,=\,\uparrow (h(\varepsilon _{f}),\, \{v_1\,\mapsto \, \varepsilon _{f},\, u_2\,\mapsto \, a\})\,=\, \\ & \qquad \qquad \{h(\varepsilon _{f}),\ h(v_1)\} \cup \{h(f(v_1,s)) \mid s \,\in \, \mathcal {T}(\mathcal {F},\{v_1,u_2\} ) \} \cup {} \cdots \end{aligned}$$

Then \(\varPsi (D,S)\) is infinite; it contains, e.g., the substitutions \(\{u_1\,\mapsto \, h(\varepsilon _{f}), v_2\,\mapsto \, h(\varepsilon _{f})\},\) \( \{u_1\,\mapsto \, h(\varepsilon _{f}), v_2\,\mapsto \, h(v_1)\}, \{u_1\,\mapsto \, h(u_2), v_2\,\mapsto \, h(\varepsilon _{f})\} \), etc. This leads to infinitely many generalizations of the initial AUT, including, e.g., \(g(f(h(\varepsilon _{f}),v_1),f(u_2,h(\varepsilon _{f}))), g(f(h(\varepsilon _{f}),v_1),f(u_2, h(v_1)))\), etc.

$$\begin{aligned} {\textbf {Derivation}}\, 4:\qquad \quad \quad \langle \{g(\varepsilon _{f},f(a,h(\varepsilon _{f})))\,\triangleq _{x}\, g(f(h(\varepsilon _{f}),a),\varepsilon _{f})\}; \emptyset ;\emptyset ;\iota \rangle & \!\overset{{{\tiny {\textit{Dec}}}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\,\triangleq _{w_1}\, f(h(\varepsilon _{f}),a), f(a,h(\varepsilon _{f}))\,\triangleq _{w_2}\, \varepsilon _{f}\}; \emptyset ;\emptyset ;\{x\,\mapsto \, g(w_1,w_2)\}\rangle & \!\overset{{{\tiny {\textit{ExpLA2}}}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\!\,\triangleq _{v_1}\,\! a, f(a,h(\varepsilon _{f}))\!\,\triangleq _{w_2}\,\! \varepsilon _{f}\}; \emptyset ;\{\star \!\,\triangleq _{u_1}\,\! h(\varepsilon _{f})\}; \{x\,\mapsto \, g(f(u_1,v_1),w_2), \ldots \}\rangle & \!\overset{{{\tiny {\textit{ExpRA2}}}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\,\triangleq _{v_1}\, a, h(\varepsilon _{f})\,\triangleq _{v_2}\, \varepsilon _{f}\}; \emptyset ;\{\star \,\triangleq _{u_1}\, h(\varepsilon _{f}), a\,\triangleq _{u_2}\, \star \};&\\ \{x\,\mapsto \, g(f(u_1,v_1),f(u_2,v_2)), \ldots \}\rangle & \!\overset{{{\tiny { Sol } \times 2}}}{\Longrightarrow }\\ \langle \emptyset ; \{\varepsilon _{f}\,\triangleq _{v_1}\, a, h(\varepsilon _{f})\,\triangleq _{v_2}\, \varepsilon _{f}\};\{\star \,\triangleq _{u_1}\, h(\varepsilon _{f}),a\,\triangleq _{u_2}\, \star \};&\\ \{x\,\mapsto \, g(f(u_1,v_1),f(u_2,v_2)),\ldots \}\rangle . & \end{aligned}$$

Then \(D\,=\,\{\star \,\triangleq _{u_1}\, h(\varepsilon _{f}),a\,\triangleq _{u_2}\, \star \}\) and \(S\,=\,\{\varepsilon _{f}\,\triangleq _{v_1}\, a, h(\varepsilon _{f})\,\triangleq _{v_2}\, \varepsilon _{f}\}\). This leads to infinitely many generalizations of the initial AUT, including, e.g., \(g(f(h(\varepsilon _{f}),v_1), f(a,v_2))\), \(g(f(h(v_2),v_1),f(a,v_2))\), \( g(f(h(f(v_2,a)),v_1),f(a,v_2))\), etc., since

$$\begin{aligned} & \uparrow _{u_1}(D,S)\!\,=\,\!\uparrow (h(\varepsilon _{f}),\{v_1\,\mapsto \, a,v_2\,\mapsto \, \varepsilon _{f}\})\,=\, \\ & \qquad \qquad \{h(\varepsilon _{f}),h(v_2)\} \cup \{h(f(v_2,s)) \mid s\,\in \, \mathcal {T}(\mathcal {F},\{v_1,v_2\}) \} \cup {} \cdots \text { and } \\ & \uparrow _{u_2}(D,S)\!\,=\,\!\uparrow (a,\{v_1\,\mapsto \, \varepsilon _{f},v_2\,\mapsto \, h(\varepsilon _{f})\})\!\,=\,\!\{a\}. \end{aligned}$$
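The three generalizations listed for Derivation 4 can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it assumes the absorption axioms \(f(\varepsilon _{f},x)\,\approx \,\varepsilon _{f}\,\approx \,f(x,\varepsilon _{f})\), encodes terms as nested Python tuples, and uses the hypothetical helper names `subst`, `normalize`, and `is_generalization`. The witnesses `sigma` and `rho` are read off the left- and right-hand sides of the store S.

```python
EPS = "eps_f"  # stands for the absorption constant of the binary symbol f

def subst(term, sigma):
    """Apply a substitution (dict from variable names to terms) to a term."""
    if isinstance(term, str):
        return sigma.get(term, term)
    return (term[0],) + tuple(subst(arg, sigma) for arg in term[1:])

def normalize(term):
    """Abs-normal form under the assumed axioms f(eps_f, x) = eps_f = f(x, eps_f)."""
    if isinstance(term, str):
        return term
    args = tuple(normalize(arg) for arg in term[1:])
    if term[0] == "f" and EPS in args:
        return EPS
    return (term[0],) + args

def is_generalization(r, s, t, sigma, rho):
    """Check that r*sigma equals s and r*rho equals t modulo absorption."""
    return (normalize(subst(r, sigma)) == normalize(s)
            and normalize(subst(r, rho)) == normalize(t))

# Input terms of the example and the final configuration of Derivation 4.
s = ("g", EPS, ("f", "a", ("h", EPS)))
t = ("g", ("f", ("h", EPS), "a"), EPS)
theta = {"x": ("g", ("f", "u1", "v1"), ("f", "u2", "v2"))}
sigma = {"v1": EPS, "v2": ("h", EPS)}  # left-hand sides of the store
rho = {"v1": "a", "v2": EPS}           # right-hand sides of the store

# Three abstraction substitutions, one per generalization listed in the text.
taus = [
    {"u1": ("h", EPS), "u2": "a"},
    {"u1": ("h", "v2"), "u2": "a"},
    {"u1": ("h", ("f", "v2", "a")), "u2": "a"},
]
candidates = [subst(subst("x", theta), tau) for tau in taus]
results = [is_generalization(r, s, t, sigma, rho) for r in candidates]
```

Each candidate is the term \(x\theta \tau \) for one choice of \(\tau \); a choice outside the abstraction sets, such as mapping \(u_1\) to \(a\), fails the same check.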

Example 6

To generalize \(g(\varepsilon _{f},\varepsilon _{f},a)\) and \(g(\varepsilon _{f},b,\varepsilon _{f})\), the AUnif procedure generates two derivations, which differ from each other only in the last step:

$$\begin{aligned} {\textbf {Derivation}}\,1: \qquad \qquad \qquad \qquad \qquad \langle \{g(\varepsilon _{f},\varepsilon _{f},a)\,\triangleq _{x}\, g(\varepsilon _{f},b,\varepsilon _{f})\};\emptyset ; \emptyset ; \iota \rangle & \overset{{{\tiny {\textit{Dec}}}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\,\triangleq _{y_1}\,\varepsilon _{f},\varepsilon _{f}\,\triangleq _{y_2}\, b,a\,\triangleq _{y_3}\, \varepsilon _{f}\};\emptyset ; \emptyset ; \{x\,\mapsto \, g(y_1,y_2,y_3), \ldots \}\rangle & \!\overset{{{\tiny { Sol } \times 2}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\,\triangleq _{y_1}\,\varepsilon _{f}\};\{\varepsilon _{f}\,\triangleq _{y_2}\, b,a\,\triangleq _{y_3}\, \varepsilon _{f}\};\emptyset ; \{x\,\mapsto \, g(y_1,y_2,y_3), \ldots \}\rangle & \overset{{{\tiny {\textit{ExpBA1}}}}}{\Longrightarrow }\\ \langle \emptyset ; \{\varepsilon _{f}\,\triangleq _{y_2}\, b, a\,\triangleq _{y_3}\, \varepsilon _{f}\};\{\star \!\,\triangleq _{u_1}\, \!\varepsilon _{f},\varepsilon _{f}\!\,\triangleq _{u_2}\, \!\star \}; \{x\,\mapsto \, g(f(u_1,u_2),y_2,y_3),\ldots \}\rangle . & \end{aligned}$$

Here, for the store S and the delayed set D in the last configuration, we get

$$\begin{aligned} & \uparrow _{u_1}(D,S)\,=\,\uparrow (\varepsilon _{f},\{y_2\,\mapsto \, b,y_3\,\mapsto \, \varepsilon _{f}\})\,=\, {} \\ & \qquad \{\varepsilon _{f},y_3\} \cup \{ f(y_3,s) \mid s \,\in \, \mathcal {T}(\mathcal {F},\{y_2,y_3\}) \} \cup \{ f(s, y_3) \mid s \,\in \, \mathcal {T}(\mathcal {F},\{y_2,y_3\}) \} \cup {} \\ & \qquad \{ f(f(y_3,s),t) \mid s,t \,\in \, \mathcal {T}(\mathcal {F},\{y_2,y_3\}) \} \cup {} \cdots \\ & \uparrow _{u_2}(D,S)\,=\,\uparrow (\varepsilon _{f},\{y_2\,\mapsto \, \varepsilon _{f},y_3\,\mapsto \, a\})\,=\, {} \\ & \qquad \{\varepsilon _{f},y_2\} \cup \{ f(y_2,s) \mid s \,\in \, \mathcal {T}(\mathcal {F},\{y_2,y_3\}) \} \cup \{ f(s, y_2) \mid s \,\in \, \mathcal {T}(\mathcal {F},\{y_2,y_3\}) \} \cup {} \\ & \qquad \{ f(f(y_2,s),t) \mid s,t \,\in \, \mathcal {T}(\mathcal {F},\{y_2,y_3\}) \} \cup {} \cdots \end{aligned}$$

From these, we get an infinite set of generalizations that includes, among others, \(g(\varepsilon _{f}, y_2, y_3),\ \, g(f(y_3,y_2), y_2, y_3), \ \, g(f(f(y_3,y_3),y_2), y_2, y_3)\), etc.

$$\begin{aligned} {\textbf {Derivation}}\,2: \qquad \qquad \qquad \qquad \qquad \langle \{g(\varepsilon _{f},\varepsilon _{f},a)\,\triangleq _{x}\, g(\varepsilon _{f},b,\varepsilon _{f})\};\emptyset ; \emptyset ; \iota \rangle & \overset{{{\tiny {\textit{Dec}}}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\,\triangleq _{y_1}\,\varepsilon _{f},\varepsilon _{f}\,\triangleq _{y_2}\, b,a\,\triangleq _{y_3}\, \varepsilon _{f}\};\emptyset ; \emptyset ; \{x\,\mapsto \, g(y_1,y_2,y_3)\}\rangle & \overset{{{\tiny { Sol } \times 2}}}{\Longrightarrow }\\ \langle \{\varepsilon _{f}\,\triangleq _{y_1}\,\varepsilon _{f}\};\{\varepsilon _{f}\,\triangleq _{y_2}\, b,a\,\triangleq _{y_3}\, \varepsilon _{f}\};\emptyset ; \{x\,\mapsto \, g(y_1,y_2,y_3)\}\rangle & \overset{{{\tiny {\textit{ExpBA2}}}}}{\Longrightarrow }\\ \langle \emptyset ; \{\varepsilon _{f}\,\triangleq _{y_2}\, b, a\,\triangleq _{y_3}\, \varepsilon _{f}\};\{\varepsilon _{f}\,\triangleq _{v_1}\, \star , \star \,\triangleq _{v_2}\, \varepsilon _{f}\}; \{x\,\mapsto \, g(f(v_1,v_2),y_2,y_3),\ldots \}\rangle . & \end{aligned}$$

Again, taking S and D from the last configuration, we get

$$\begin{aligned} & \uparrow _{v_1}(D,S)\,=\,\uparrow (\varepsilon _{f},\{y_2\,\mapsto \, \varepsilon _{f},y_3\,\mapsto \, a\})\,=\, {} \\ & \qquad \{\varepsilon _{f},y_2\} \cup \{ f(y_2,s) \mid s \,\in \, \mathcal {T}(\mathcal {F},\{y_2,y_3\}) \} \cup \{ f(s, y_2) \mid s \,\in \, \mathcal {T}(\mathcal {F},\{y_2,y_3\}) \} \cup {} \\ & \qquad \{ f(f(y_2,s),t) \mid s,t \,\in \, \mathcal {T}(\mathcal {F},\{y_2,y_3\}) \} \cup {} \cdots \\ & \uparrow _{v_2}(D,S)\,=\,\uparrow (\varepsilon _{f},\{y_2\,\mapsto \, b,y_3\,\mapsto \, \varepsilon _{f}\})\,=\, {} \\ & \qquad \{\varepsilon _{f},y_3\} \cup \{ f(y_3,s) \mid s \,\in \, \mathcal {T}(\mathcal {F},\{y_2,y_3\}) \} \cup \{ f(s, y_3) \mid s \,\in \, \mathcal {T}(\mathcal {F},\{y_2,y_3\}) \} \cup {} \\ & \qquad \{ f(f(y_3,s),t) \mid s,t \,\in \, \mathcal {T}(\mathcal {F},\{y_2,y_3\}) \} \cup {} \cdots \end{aligned}$$

From these, we get an infinite set of generalizations that includes, among others, \(g(\varepsilon _{f}, y_2, y_3),\ \, g(f(y_2,y_3), y_2, y_3), \ \, g(f(f(y_2,y_2),y_3), y_2, y_3)\), etc.
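A finite slice of such an abstraction set can be enumerated by brute force. The sketch below is an illustration under stated assumptions, not the paper's definition: it reads \(\uparrow (t,\sigma )\) as the set of terms r with \(r\sigma \,\approx _{\texttt {Abs}}\, t\), restricts the signature to f, \(\varepsilon _{f}\), and the two store labels, and bounds the nesting depth. The names `terms_up_to` and `abstraction_slice` are hypothetical, and the actual definition may exclude some of the terms produced here (for instance, non-normal ones).

```python
from itertools import product

EPS = "eps_f"  # stands for the absorption constant of the binary symbol f

def subst(term, sigma):
    """Apply a substitution (dict from variable names to terms) to a term."""
    if isinstance(term, str):
        return sigma.get(term, term)
    return (term[0],) + tuple(subst(arg, sigma) for arg in term[1:])

def normalize(term):
    """Abs-normal form under the assumed axioms f(eps_f, x) = eps_f = f(x, eps_f)."""
    if isinstance(term, str):
        return term
    args = tuple(normalize(arg) for arg in term[1:])
    if term[0] == "f" and EPS in args:
        return EPS
    return (term[0],) + args

def terms_up_to(depth, atoms):
    """All terms built from the atoms and binary f with nesting depth <= depth."""
    level = set(atoms)
    for _ in range(depth):
        level = level | {("f", l, r) for l, r in product(level, repeat=2)}
    return level

def abstraction_slice(target, sigma, atoms, depth):
    """Terms r (up to the depth bound) whose instance r*sigma normalizes to target."""
    goal = normalize(target)
    return {r for r in terms_up_to(depth, atoms)
            if normalize(subst(r, sigma)) == goal}

# Store substitution used for the label u_1 in Derivation 1 of this example.
sigma = {"y2": "b", "y3": EPS}
slice_u1 = abstraction_slice(EPS, sigma, [EPS, "y2", "y3"], depth=2)
```

Under this reading, `slice_u1` contains \(\varepsilon _{f}\), \(y_3\), \(f(y_3,y_2)\), and \(f(f(y_3,y_3),y_2)\), but neither \(y_2\) nor \(f(y_2,y_2)\), which matches the pattern of the sets displayed above.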

4 Soundness and Completeness

Preserving the stated properties of configurations (Definition 6) is essential to both the soundness and completeness proofs as these properties enforce consistency with respect to the use of the labels.

Theorem 2

(Soundness). Consider a derivation from \(\langle A_0;S_0;D_0;\theta _0\rangle \) to a final configuration \(\langle \emptyset ;S_n;D_n;\theta _n\rangle \). Then for all \(s\,\triangleq _{x}\, t\,\in \, A_0\cup S_0\), \(x \theta _n\,\in \,\mathcal {G}_{\texttt {Abs}}(s,t)\).

Proof

We proceed by induction over the derivation length.

Basecase. If the derivation has length 0, then it starts with a final configuration implying that \(A_0\,=\,\emptyset \) and for all \(s\,\triangleq _{x}\, t\,\in \, S_0\), \(x\theta _0\,=\,x\,\in \,\mathcal {G}_{\texttt {Abs}}(s,t)\).

Stepcase. Now consider a derivation having the following form:

(1)

We assume for the induction hypothesis (IH) that for derivations of the form

figure f

the theorem holds and show that the theorem holds for derivations of the form presented in Derivation 1. We continue the proof considering the various options for the transition from \(\langle A_0;S_0;D_0;\theta _0 \rangle \) to \(\langle A_1;S_1;D_1;\theta _1\rangle \).

  1.

    (Dec). Assume that the derivation is of the form:

    figure g

    where \(\theta _1 \,=\, \theta _0\{y\,\mapsto \, f(x_1,\dots ,x_m)\}\). By the IH, we know that for all \(1\le i\le m\), \(x_i\theta _{n+1}\,\in \, \mathcal {G}_{\texttt {Abs}}(s_i,t_i)\) implying that

    $$f(x_1,\dots ,x_m)\theta _{n+1}\,\in \, \mathcal {G}_{\texttt {Abs}}(f(s_1,\dots , s_m),f(t_1,\dots , t_m)).$$
  2.

    (Sol). Assume that the derivation is of the form:

    figure h

    where \(S_1 \,=\,\{s\,\triangleq _{y}\, t\} \cup S_0\). By IH, \(\theta _{n+1}\) generalizes all the AUTs with labels in \(S_1\). Thus, \(y\theta _{n+1}\,\in \,\mathcal {G}_{\texttt {Abs}}(s,t)\).

  3.

    (ExpLA1). Assume that the derivation is of the form:

    figure i

    where \(D_1 \,=\,\{\star \,\triangleq _{x_2}\, t\}\cup D_0\) and \(\theta _1 \,=\,\theta _0\{y\,\mapsto \, f(x_1,x_2)\}\). By the IH, all the AUTs in \(\{\varepsilon _{f}\,\triangleq _{x_1}\, s\}\cup A'\) are generalized by the substitution \(\theta _{n+1}\), thus, \(x_1\theta _{n+1}\,\in \, \mathcal {G}_{\texttt {Abs}}(\varepsilon _{f},s)\). Furthermore, since \(x_2\,\in \, labels (D)\), we have \(x_2\theta _{n+1} \,=\,x_2\) and \(x_2\preceq _{\texttt {Abs}} t\). We can build the generalization \(y\theta _{n+1}\,=\,f(x_1\theta _{n+1},x_2\theta _{n+1})\). Observe that \(f(x_1\theta _{n+1},x_2\theta _{n+1})\,=\,f(x_1\theta _{n+1},x_2)\,\in \, \mathcal {G}_{\texttt {Abs}}(f(\varepsilon _{f},t),f(s,t))\) and since \(f(\varepsilon _{f},t)\,\approx _{\texttt {Abs}} \varepsilon _{f}\), we get that \(y\theta _{n+1}\) belongs to \(\mathcal {G}_{\texttt {Abs}}(\varepsilon _{f},f(s,t))\).

  4.

    The analysis of other one-side expansion rules is analogous to the previous one.

  5.

    (ExpBA1). Assume that the derivation is of the form:

    figure j

    where \(D_1 \,=\,\{\varepsilon _{f} \!\,\triangleq _{x_1}\,\! \star , \star \! \,\triangleq _{x_2}\,\!\varepsilon _{f}\}\cup D_0\) and \( \theta _1 \,=\,\theta _0\{y\,\mapsto \, f(x_1,x_2)\}\). Notice that \(x_i\theta _{n+1}\,=\,x_i\) and \(x_i\preceq _{\texttt {Abs}} \varepsilon _{f}\), for \(i\,\in \,\{1,2\}\). This implies that \(y\theta _{n+1} \,=\,f(x_1\theta _{n+1}, x_2\theta _{n+1} )\,=\,f(x_1,x_2)\,\in \, \mathcal {G}_{\texttt {Abs}}(\varepsilon _{f}, \varepsilon _{f})\). The case (ExpBA2) is analogous.

  6.

    (Mer). Assume that the derivation is of the form:

    figure k

    Notice that \(\theta _{1}\,=\,\theta _0\{y\,\mapsto \, z\}\), where z is the label of the AUT \(\{s\,\triangleq _{z}\, t\}\,\in \, S_0\). By IH, \(z\theta _{n+1}\,\in \,\mathcal {G}_{\texttt {Abs}}(s,t)\) implying that \(y\theta _{n+1}\,=\,y\{y\,\mapsto \, z\}\theta _{n+1}\,\in \,\mathcal {G}_{\texttt {Abs}}(s,t)\).    \(\square \)

While the soundness theorem covers the construction of generalizations of AUTs present in a given configuration, it does not consider the abstraction set or the construction of more specific generalizations when generalizing over an absorption theory. The abstraction set allows us to consider generalizations between a given term and an arbitrary term.

Lemma 2

Consider a derivation from \(\langle A_0;S_0;D_0;\theta _0\rangle \) to \(\langle A_n;S_n;D_n;\theta _n\rangle \). Then for all \(\star \,\triangleq _{u}\, t\,\in \, D_n\) (resp. for all \(s \,\triangleq _{u}\, \star \,\in \, D_n\)) and \(\tau \,\in \,\varPsi (D_n,S_n)\), there exists a term r such that \(u\tau \,\in \,\mathcal {G}_{\texttt {Abs}}(r,t)\) (resp. \(u\tau \,\in \,\mathcal {G}_{\texttt {Abs}}(r,s)\)).

Proof

Let \(\eta \) be a ground substitution with \( dom (\eta ) \,=\, var (u\tau )\). Then \(r\,=\,u\tau \eta \).    \(\square \)

Intuitively, Lemma 2 formalizes the following observation: if \(\star \,\triangleq _{u}\, t\,\in \, D_n\), then \(u\tau \,\in \, \uparrow _u(D_n,S_n)\) implies \(u\tau \,\in \, \uparrow (t,\{x\,\mapsto \, t' \mid s\,\triangleq _{x}\,t'\,\in \, S_n \text { for some } s \})\). From this, we can deduce that \(u\tau \preceq _{\texttt {Abs}} t\). Thus, for every AUT in the set \(D_n\), the wild card can be interpreted as r and \(u\tau \preceq _{\texttt {Abs}} r\). We can now prove the following:

Theorem 3

Consider a derivation from \(\langle A_0;S_0;D_0;\theta _0\rangle \) to a final configuration \(\langle \emptyset ;S_n;D_n;\theta _n\rangle \), and let \(s\,\triangleq _{x}\, t\,\in \, A_0\cup S_0\). Then for all \(\tau \,\in \,\varPsi (D_n,S_n)\), \(x\theta _n\tau \,\in \,\mathcal {G}_{\texttt {Abs}}(s,t)\).

Proof

From Theorem 2, \(x\theta _n\,\in \,\mathcal {G}_{\texttt {Abs}}(s,t)\). Furthermore, every \(u\,\in \, labels (D_n)\) is unique, only occurs once in \(x\theta _n\), and \(u\theta _n\tau \,=\,u\tau \). Considering these facts together with Lemma 2 and u being an \(\texttt {Abs}\)-generalization of the respective subterms in s and t, we deduce that \(x\theta _n \tau \,\in \,\mathcal {G}_{\texttt {Abs}}(s,t)\).    \(\square \)

Theorem 4

(Completeness). Let \(r\,\in \, \mathcal {G}_{\texttt {Abs}}(t_1,t_2)\). Then for all configurations \(\langle A; S; D; \theta \rangle \) such that \(t_1\,\triangleq _{x}\, t_2 \,\in \, A\) there exist a final configuration \(\langle \emptyset ; S';D';\theta '\rangle \) \(\,\in \, \textsc {AUnif}(\langle A; S;D;\theta \rangle )\) and \(\tau \,\in \,\varPsi (D',S')\) such that \(r\preceq _{\texttt {Abs}}x\theta '\tau \).

Proof

The proof is by structural induction over r.

Basecase.

  1.

    Let r be a variable. Then, we must consider the following four cases:

    (a)

      If \( head (t_1)\,=\, head (t_2)\), then from \(\langle A; S; D; \theta \rangle \) such that \(t_1\,\triangleq _{x}\, t_2 \,\in \, A\), we can reach \(\langle A'; S; D; \theta '\rangle \) by decomposition so that \( head (x\theta ')\,=\, head (t_1)\,=\, head (t_2)\). Thus, for any final configuration \(\langle \emptyset ; S''; D''; \theta ''\rangle \,\in \, \textsc {AUnif}(\langle A';S; D; \theta '\rangle )\), \(r\preceq _{\texttt {Abs}}x\theta ''\) as \(\theta ''\) can only be more specific than \(\theta '\).

    (b)

      If \( head (t_1)\,=\, head (t_2)\) are absorption constants, w.l.o.g, \(t_1\,=\,\varepsilon _{f}\), then from \(\langle A; S; D; \theta \rangle \) such that \(t_1\,\triangleq _{x}\, t_2 \,\in \, A\), we can reach \(\langle A'; S; D; \theta '\rangle \) by (ExpBA1) so that \( head (x\theta ')\,=\,f\). Thus, for any final configuration \(\langle \emptyset ; S''; D''; \theta ''\rangle \,\in \, \textsc {AUnif}(\langle A';S; D; \theta '\rangle )\), \(r\preceq _{\texttt {Abs}}x\theta ''\) as \(\theta ''\) can only be more specific than \(\theta '\).

    (c)

      W.l.o.g, if \(t_1 \,=\, \varepsilon _{f}\) and \(t_2\,=\,f(s_1,s_2)\), then from \(\langle A; S; D; \theta \rangle \) such that \(t_1\,\triangleq _{x}\, t_2 \,\in \, A\), we can reach \(\langle A'; S'; D'; \theta '\rangle \) using ExpLA1 such that \( head (x\theta ')\,=\, head (t_2)\). Thus, for any final configuration \(\langle \emptyset ; S''; D''; \theta ''\rangle \,\in \, \textsc {AUnif}(\langle A';S'; D'; \theta '\rangle )\), \(r\preceq _{\texttt {Abs}}x\theta ''\) as \(\theta ''\) is more specific than \(\theta '\).

    (d)

      Otherwise, if \( head (t_1)\not =\, head (t_2)\), then from \(\langle A; S; D; \theta \rangle \) with \(t_1\,\triangleq _{x}\,\! t_2 \,\in \, A\), we reach \(\langle A'; S'; D; \theta \rangle \) using Solve where \(t_1\,\triangleq _{x}\,\! t_2 \,\in \, S'\). Thus, for any final configuration \(\langle \emptyset ; S''; D''; \theta ''\rangle \,\in \, \textsc {AUnif}(\langle A'; S'; D; \theta \rangle )\), we get \(r\,\approx _{\texttt {Abs}} x\theta ''\).

    In all four cases \(r\preceq _{\texttt {Abs}} x\theta ''\) and by Theorem 3 we get \(r\preceq _{\texttt {Abs}} x\theta '\tau \).

  2.

    Let r be a constant. Then \(t_1\,=\,t_2\,=\,r\) and from a configuration \(\langle A; S; D; \theta \rangle \) where \(t_1\,\triangleq _{x}\, t_2 \,\in \, A\), we can reach a configuration \(\langle A'; S; D; \theta '\rangle \) using the decomposition rule such that \(x\theta '\,=\,t_1\,=\,t_2\,=\,r\). Thus, for any final configuration \(\langle \emptyset ; S''; D''; \theta ''\rangle \,\in \, \textsc {AUnif}(\langle A'; S'; D'; \theta '\rangle )\), \(r\preceq _{\texttt {Abs}} x\theta ''\tau \) trivially follows.

Stepcase.

  1.

    \(r\,=\,g(r_1,\ldots ,r_n)\), \(t_1\,=\,g(t'_1,\ldots ,t'_n)\), and \(t_2\,=\,g(t''_1,\ldots ,t''_n)\); this implies that \(r_i\) is a generalization of \(t'_i \,\triangleq _{y_i}\, t''_i\) for \(1\le i \le n\). From \(\langle A; S; D; \theta \rangle \) we can reach \(\langle A'; S'; D'; \theta '\rangle \), using the decomposition rule, such that \(t'_i \,\triangleq _{y_i}\, t''_i\,\in \, A'\). Note that there may exist \(1\le i < j \le n\) such that \( var (r_i)\cap var (r_j)\not =\, \emptyset \). Let \(R\,\subseteq \, var (r)\) be such that for every \(z\,\in \, R\) there exist \(1\le i < j \le n\) such that \(z\,\in \, var (r_i)\cap var (r_j)\). For any \(z\,\in \, R\), there are two cases to consider:

    (i)

      There does not exist a position \(p\,\in \, pos (t_1)\cap pos (t_2)\) such that \(s^*\,\triangleq _{z}\, t^*\) where \(s^*\,=\,t_1\vert _p\) and \(t^* \,=\, t_2\vert _p\). In other words, z generalizes terms which are absorbed during Abs-normalization of \(r\sigma \) and \(r\rho \), where \(r\sigma \,\approx _{\texttt {Abs}} t_1\) and \(r\rho \,\approx _{\texttt {Abs}} t_2\); this implies that replacing occurrences of z by \(\varepsilon _{f}\) (for the appropriate absorption symbol f) within r results in a more specific generalization \(r'\). For the remainder of this proof, we can consider r to be the generalization resulting from replacing all such variables in R by the appropriate absorption constant \(\varepsilon _{f}\).

    (ii)

      There exists a position \(p\,\in \, pos (t_1)\cap pos (t_2)\) such that \(s^*\,\triangleq _{z}\, t^*\) where \(s^*\,=\,t_1\vert _p\) and \(t^* \,=\, t_2\vert _p\). Notice that z is structurally smaller than r and thus, by the IH, there exists a final configuration \(\langle \emptyset ; S^*; D^*; \theta ^*\rangle \,\in \, \textsc {AUnif}(\langle \{s^*\,\triangleq _{z}\, t^* \}; \emptyset ; \emptyset ; \iota \rangle )\) and \(\tau ^*\,\in \,\varPsi (D^*, S^*)\) such that \(z\le x'\theta ^*\tau ^*\). We will use \(\theta ^*\tau ^*\) to guarantee variables occurring in multiple \(r_i\), for \(0\le i\le n\), are replaced by the same term in the generalizations resulting from the IH.

    By the induction hypothesis, there exists a final configuration \(\langle \emptyset ; S''; D'';\) \( \theta ''\rangle \,\in \, \textsc {AUnif}(\langle A'; S'; D'; \theta '\rangle )\) and \(\tau _i\,\in \,\varPsi (D'',S'')\) such that \(r_i\preceq _{\texttt {Abs}} y_i\theta ''\tau _i\) where \(1\le i\le n\). Note, we can choose the same configuration \(\langle \emptyset ; S''; D''; \theta ''\rangle \) for all AUTs \(t'_i\,\triangleq _{y_i}\, t''_i\) as the procedure produces all combinations of solutions to the subproblems. Furthermore, we can choose \(\langle \emptyset ; S''; D''; \theta ''\rangle \) such that \(S^*\,\subseteq \, S''\) and \(D^*\,\subseteq \, D''\) modulo label renaming as \(s^*\) and \(t^*\) are subterms of \(t_1\) and \(t_2\), respectively, modulo absorption symbol introduction. Now, we define \(\gamma _i\) as the substitution such that \(r_i\gamma _i\,\approx _{\texttt {Abs}}y_i\theta ''\tau _i\). By the above construction, we can safely assume for all \(z\,\in \, var (r_1)\cap var (r_2)\) such that z has not been replaced by an absorption constant, that \(z\gamma _i\,\approx _{\texttt {Abs}} z\theta ^*\tau ^*\) as there exist AUTs corresponding to \(S^*\) and \(D^*\) in \(S''\) and \(D''\), respectively. Now let \(\mu \) be a substitution and \(r_i'\) (\(1\le i \le n\)) be terms such that for all \(1\le i\le n\), \(r_i\,=\, r_i'\mu \) and \(g(r_1',\ldots ,r_n') \preceq _{\texttt {Abs}} g(y_1\theta '',\ldots , y_n\theta '')\). If \(\mu \) is the identity substitution, then we are done. Otherwise, we can use \(\mu \) to construct a \(\tau \,\in \,\varPsi (D'',S'')\). Additionally, we need to consider the \(\tau _i \,\in \,\varPsi (D'',S'')\) derived above for each \(r_i\), where \(1\le i\le n\), and the corresponding substitutions \(\gamma _i\). Thus, \(r_i'\mu \preceq _{\texttt {Abs}} y_i\theta ''\tau _i\) and \(r_i'\mu \gamma _i\,\approx _{\texttt {Abs}} y_i\theta ''\tau _i\). 
Now let \(\mu _i^1\) and \(\mu _i^2\) be substitutions such that \(\mu \gamma _i \,=\, (\mu _i^1\mu _i^2)\vert _{ dom (\mu \gamma _i)}\) and \(r'_i\mu _i^1\,\approx _{\texttt {Abs}}y_i\theta ''\). This is possible given the assumption that \(g(r_1',\ldots ,r_n') \preceq _{\texttt {Abs}} g(y_1\theta '',\cdots , y_n\theta '')\). Note that \(r'_i\mu _i^1\,\approx _{\texttt {Abs}} y_i\theta ''\) implies that for every \(x\,\in \, dom (\mu _i^2)\) there exists a \(z\,\in \, dom (\tau _i)\) such that \(z\tau _i\,\approx _{\texttt {Abs}} x\mu _i^2\). We now construct \(\tau \,\in \, \varPsi (D'',S'')\) using the \(\mu _i^2\), that is for all \(1\le j\le n\) and \(x\,\in \, dom (\mu _j^2)\) there exists a \(z\,\in \, dom (\tau )\) such that \(z\tau \,\approx _{\texttt {Abs}} x\mu _j^2\). It now follows that \(r_i \preceq _{\texttt {Abs}} y_i\theta ''\tau \) holds for all \(1\le i\le n\) and thus we have shown that \(g(r_1,\ldots ,r_n) \preceq _{\texttt {Abs}} g(y_1,\cdots , y_n)\theta ''\tau \).

  2.

    \(r\,=\,f(r_1,r_2)\), where f is an absorption symbol and, w.l.o.g., \(t_1\,=\,\varepsilon _{f}\) and \(t_2\,=\,f(s_1,s_2)\). Then from \(\langle A; S; D; \theta \rangle \) we can derive a configuration \(\langle A'; S'; D'; \theta '\rangle \) using the ExpLA1 rule such that \(\star \,\triangleq _{y_2}\, s_2\,\in \, D'\) and \(\varepsilon _{f}\,\triangleq _{y_1}\, s_1\,\in \, A'\). Now let \(\langle \emptyset ; S''; D''; \theta ''\rangle \,\in \, \textsc {AUnif}(\langle A'; S'; D'; \theta '\rangle )\) be a final configuration. By the induction hypothesis we know that \(r_1\preceq _{\texttt {Abs}} y_1\theta ''\tau _1\) for some \(\tau _1\,\in \, \varPsi (D'',S'')\). Let \(\mu '\) be a substitution such that \(r_1\mu '\,\approx _{\texttt {Abs}} y_1\theta ''\tau _1\) and \(R_2\,\subseteq \, var (r)\) such that \(R_2\cap var (r_1) \,=\, \emptyset \). Using \(R_2\) we define a bijective renaming \(\nu \) such that for all \(z\,\in \, R_2\), \(z\nu \,\notin \, var (r_1\mu ')\cup var (r_1)\). We will now consider the term \(r\nu \mu '\,=\,f(r_1\mu ',r_2\nu \mu ')\). Note that for all variables \(z\,\in \, var (r_1)\cap var (r_2\nu )\), it must be the case that \(z\mu ' \preceq _{\texttt {Abs}} z\mu ^*\) where \(r_1\mu ^* \,\approx _{\texttt {Abs}} s_1\) and \(r_2\mu ^* \,\approx _{\texttt {Abs}} s_2\). Thus, observe that \(r_2\nu \mu '\preceq _{\texttt {Abs}} s_2\). Now let \(\gamma '\) be a substitution such that \( dom (\gamma ')\,=\, var (r_2\nu \mu ')\), \(r_2\nu \mu '\gamma '\,\approx _{\texttt {Abs}} s_2\), and \(r_1\mu '\gamma '\,\approx _{\texttt {Abs}} s_1\). Now consider \(R_2'\,=\, \{z\mid z\,\in \, dom (\gamma ')\wedge z\,\notin \, var (r_1\mu ')\}\) and \(\nu ' \,=\,\{ z\,\mapsto \, l\mid z\,\in \, R_2'\wedge z\gamma '\,=\,l\}\). Note that \(r_2\nu \mu '\nu '\preceq _{\texttt {Abs}} s_2\) and there exists \(t^* \,\in \, \uparrow _{y_2}(D'',S'')\) such that \(r_2\nu \mu '\nu '\,\approx _{\texttt {Abs}} t^*\) by the definition of the abstraction set.
For terms in \(\uparrow _{y_2}(D'',S'')\) we know how to build a \(\tau _2\,\in \, \varPsi (D'',S'')\). Now let \(\mu '_1\) and \(\mu '_2\) be substitutions such that \(r_1\mu '\,\approx _{\texttt {Abs}} r_1'\mu '_1\mu '_2\) and for all \(z\,\in \, dom (\mu '_2)\) there exists \(y\,\in \, dom (\tau _1)\) such that \(z\mu '_2\,\approx _{\texttt {Abs}} y\tau _1\). Notice we can apply the same rewriting to \(r_2\nu \mu '\nu '\), that is, \(r_2'\mu ''_1\mu ''_2\,\approx _{\texttt {Abs}}r_2\nu \mu '\nu '\). We are free to choose \( dom (\nu ')\) such that it does not compose with the range of \(\mu '\). Thus for variables \(z\,\in \, var (r_1'\mu '_1) \cap var (r_2'\mu ''_1)\) such that \(z\,\in \, dom (\mu _2'')\), there exists \(y\,\in \, dom (\tau _2)\) such that \(z\mu ''_2\,\approx _{\texttt {Abs}}y\tau _2\) and \(z\mu '_2\,\approx _{\texttt {Abs}}y\tau _1\). We can safely assume that \( dom (\tau _2)\cap var ( ran (\tau _1))\,=\, \emptyset \); thus, we can choose \(\tau \,\in \, \varPsi (D'',S'')\) such that \(\tau \,=\,\tau _1\tau _2\) as the required substitution. So, \(r\preceq _{\texttt {Abs}} f(y_1,y_2)\theta ''\tau \).

  3.

    \(r\,=\,f(r_1,r_2)\), where f is an absorption symbol and, \(t_1\,=\,\varepsilon _{f}\) and \(t_2\,=\,\varepsilon _{f}\). Then from \(\langle A; S; D; \theta \rangle \) we can derive a configuration \(\langle A'; S'; D'; \theta '\rangle \) using, w.l.o.g, the ExpBA1 rule such that \(\varepsilon _{f}\,\triangleq _{y_1}\, \star ,\star \,\triangleq _{y_2}\, \varepsilon _{f}\,\in \, D'\). Now let \(\langle \emptyset ; S''; D''; \theta ''\rangle \,\in \, \textsc {AUnif}(\langle A'; S'; D'; \theta '\rangle )\) be a final configuration. Because \(y_1,y_2\,\in \, labels (D')\), \(y_1\theta ' \,=\, y_1\) and \(y_2\theta '\,=\,y_2\). Thus, there exist \(t_1 \,\in \, \uparrow _{y_1}(D'',S'')\), \(t_2\,\in \, \uparrow _{y_2}(D'',S'')\), a renaming \(\nu \), and \(\tau \,\in \, \varPsi (D'',S'')\) such that \(r_1\nu \,\approx _{\texttt {Abs}} y_1\tau \) and \(r_2\nu \,\approx _{\texttt {Abs}} y_2\tau \); this follows from the abstraction set containing all terms Abs-equivalent to \(\varepsilon _{f}\) under the substitution derived from \(S''\). The substitution \(\nu \) is required to rename variables in r by the appropriate variables in \( labels (S'')\).    \(\square \)

Given the complexity of the construction used in this theorem, the extended version contains examples that illustrate it [4]. We also show there that completeness would not hold if the Merge rule were applied to T.

5 Anti-unification Type, Complexity

Here we show that the complete set of generalizations produced by \(\textsc {AUnif}\) is minimal. Merging the set of final configurations and then showing that constructible generalizations are incomparable play an important role in the proof.

Definition 9

(Merged configurations). Let s and t be terms. We refer to \(\textsc {AUnif}(\langle \{s\,\triangleq _{x}\, t\};\emptyset ; \emptyset ; \iota \rangle )\) as merged if for all \(\langle \emptyset ; S_0 ; D_0; \theta _0 \rangle ,\langle \emptyset ; S_1 ; D_1; \theta _1 \rangle \,\in \, \textsc {AUnif}(\langle \) \(\{s\,\triangleq _{x}\, t\};\emptyset ; \emptyset ; \iota \rangle )\) and \(s'\,\triangleq _{y_1}\,t'\,\in \, S_0\), \(s'\,\triangleq _{y_2}\,t'\,\in \, S_1\) iff \(y_1 \,=\, y_2\).

A merged set of final configurations can be obtained by an appropriate renaming of the store labels and applying this renaming to the final substitutions.
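The renaming can be sketched as follows. This is a minimal illustration rather than the authors' construction: it assumes stores are given as dictionaries from labels to pairs of terms, that each AUT occurs at most once per store, and that the fresh labels `m0`, `m1`, ... do not occur anywhere else. The function names and the labels `z1`, `z2` in the second configuration are hypothetical.

```python
EPS = "eps_f"  # stands for the absorption constant of the binary symbol f

def rename(term, renaming):
    """Simultaneously rename variables in a term (nested tuples and strings)."""
    if isinstance(term, str):
        return renaming.get(term, term)
    return (term[0],) + tuple(rename(arg, renaming) for arg in term[1:])

def merge_labels(configs):
    """Give equal store AUTs the same label across all final configurations.

    configs is a list of (store, theta) pairs, where store maps a label to
    the pair (s', t') it generalizes and theta maps variables to terms.
    """
    canonical = {}  # AUT pair (s', t') -> fresh shared label
    merged = []
    for store, theta in configs:
        renaming = {}
        for label, pair in store.items():
            # Reuse the label already chosen for this pair, or create a fresh one.
            renaming[label] = canonical.setdefault(pair, f"m{len(canonical)}")
        new_store = {renaming[label]: pair for label, pair in store.items()}
        new_theta = {var: rename(term, renaming) for var, term in theta.items()}
        merged.append((new_store, new_theta))
    return merged

# Two final configurations whose stores hold the same AUTs under different labels.
config0 = ({"y2": (EPS, "b"), "y3": ("a", EPS)},
           {"x": ("g", ("f", "u1", "u2"), "y2", "y3")})
config1 = ({"z1": ("a", EPS), "z2": (EPS, "b")},
           {"x": ("g", ("f", "v1", "v2"), "z2", "z1")})
merged = merge_labels([config0, config1])
```

After the call, both stores use the same label for \(\varepsilon _{f}\,\triangleq \, b\) and the same label for \(a\,\triangleq \,\varepsilon _{f}\), and the renaming has been pushed into both substitutions.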

Lemma 3

Let s and t be terms and \(\langle \emptyset ; S ; D; \theta \rangle \,\in \, \textsc {AUnif}(\langle \{s\,\triangleq _{x}\, t\};\emptyset ; \emptyset ; \iota \rangle )\). Then for all \(s'\,\triangleq _{y}\, t'\,\in \, S\) and any non-variable term r, \(x\theta \{y\,\mapsto \, r\}\notin \mathcal {G}_{\texttt {Abs}}(s,t)\).

Proof

Given that \(s'\,\triangleq _{y}\, t'\,\in \, S\), we know that \( head (s')\not =\, head (t')\) and that \( head (s')\) and \( head (t')\) are not related absorption symbols. In \(x\theta \{y\,\mapsto \, r\}\), the non-variable term r replaces y, which was a generalization of \(s'\) and \(t'\), but by this replacement, \( head (r)\) will clash with \( head (s')\), \( head (t')\), or both. Hence, it cannot be a generalization of \(s'\) and \(t'\), which implies \(x\theta \{y\,\mapsto \, r\}\notin \mathcal {G}_{\texttt {Abs}}(s,t)\).    \(\square \)

Definition 10

Let s and t be terms and \(\textsc {AUnif}(\langle \{s\,\triangleq _{x}\, t\};\emptyset ; \emptyset ; \iota \rangle )\) merged. We define the set \(\mathcal {C}_{\textsc {AUnif}}(s,t)\) as \(\mathcal {C}_{\textsc {AUnif}}(s,t) \,=\, \{ x\theta \tau \mid \langle \emptyset ; S ; D; \theta \rangle \,\in \, \textsc {AUnif}(\langle \{s\,\triangleq _{x}\, t\};\emptyset ; \emptyset ; \iota \rangle ) \wedge \tau \,\in \,\varPsi (D,S)\}.\)

Lemma 4

For any terms s and t, \(\mathcal {C}_{\textsc {AUnif}}(s,t)\) is their complete set of \(\texttt {Abs}\)-generalizations.

Proof

The lemma follows from the completeness of \(\textsc {AUnif}\) (Theorem 4).    \(\square \)

Lemma 5

For all terms s and t and all \(r_0,r_1\,\in \, \mathcal {C}_{\textsc {AUnif}}(s,t)\), if \(r_0{\,}\ne {\,} r_1\) then neither \(r_0 \preceq _{\texttt {Abs}} r_1\) nor \(r_1 \preceq _{\texttt {Abs}} r_0\) holds.

Proof

By Corollary 1, \( var (r_0) \subseteq labels (S_0)\) and \( var (r_1)\subseteq labels (S_1)\) for some final configurations \(\langle \emptyset ; S_0 ; D_0; \theta _0 \rangle ,\langle \emptyset ; S_1; D_1; \theta _1 \rangle \) \(\,\in \, \textsc {AUnif}(\langle \{s\,\triangleq _{x}\, t\};\emptyset ; \emptyset ; \iota \rangle )\) as \(r_0\) and \(r_1\) are derived via the composition of the anti-unifiers of the associated final configurations with an abstraction substitution. By Lemma 3, w.l.o.g., for \(x\,\in \, labels (S_0)\) we have \(r_0\{x\,\mapsto \, r\}\notin \,\mathcal {G}_{\texttt {Abs}}(s,t)\) when r is not a variable. If r is a variable and \(r\,\in \, labels (S_0)\cup labels (S_1)\), then \(r_0\{x\,\mapsto \, r\}\notin \,\mathcal {G}_{\texttt {Abs}}(s,t)\) because labels in \( labels (S_0)\cup labels (S_1)\) are assigned to unique AUTs (due to merging of \(\textsc {AUnif}\)) and thus x and r generalize different terms. Thus, \(r\notin \, labels (S_0)\cup labels (S_1)\) implying neither \(r_0 \preceq _{\texttt {Abs}} r_1\) nor \(r_1 \preceq _{\texttt {Abs}} r_0\) hold.    \(\square \)

Theorem 5

For all terms s and t, \(\mathcal {C}_{\textsc {AUnif}}(s,t)\) is actually \( mcsg _{\texttt {Abs}}(s,t)\).

Proof

Lemma 4 shows completeness. Minimality follows from Lemma 5.    \(\square \)

Corollary 2

Anti-unification modulo Abs theories is of type infinitary.

Proof

By Theorem 5, the set of \(\texttt {Abs}\)-generalizations computed in Example 5 is an mcsg, which is infinite since Configuration 1 produces infinitely many generalizations.    \(\square \)

Theorem 5 marks a contrast with idempotent anti-unification [17], another infinitary anti-unification problem: there, the algorithm produces a finitely representable complete set of generalizations, which must be further minimized to obtain an mcsg. In our case, AUnif directly gives a finitely represented mcsg.

Finally, we briefly comment on the complexity of AUnif in terms of the number of final configurations produced.

Definition 11

(Absorption positions). An absorption position of terms s and t is a position \(p\,\in \, pos(s)\cap pos(t)\) such that \( \{\varepsilon _f,f\} \,=\, \{head(s |_p), head(t |_p)\}\) for some \(f\,\in \, Abs_f\), and \(head(s |_q)\,=\, head(t |_q)\) for all \(q\sqsubset p\). The set of absorption positions of s and t is denoted by \(ap(s,t)\).
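Definition 11 can be read as a simultaneous top-down traversal of both terms. The following Python sketch illustrates this reading under several assumptions that are not part of the paper: terms are encoded as tuples `(head, arg_1, ..., arg_k)`, every symbol has a fixed arity, positions are tuples of 1-based argument indices, and the hypothetical dictionary `absorbers` maps each absorption symbol f to the name chosen for its constant \(\varepsilon _f\). The function name `absorption_positions` is likewise only illustrative.

```python
from typing import Dict, List, Tuple

# Assumed encoding: a term is a tuple (head, arg_1, ..., arg_k);
# constants and variables are 1-tuples such as ("a",).
Term = tuple
Position = Tuple[int, ...]


def absorption_positions(s: Term, t: Term,
                         absorbers: Dict[str, str]) -> List[Position]:
    """Collect ap(s, t) for terms in the tuple encoding.

    `absorbers` maps an absorption symbol f to the name of its
    absorbing constant eps_f (a naming convention assumed here).
    """
    found: List[Position] = []

    def walk(u: Term, v: Term, p: Position) -> None:
        hu, hv = u[0], v[0]
        if hu != hv:
            # Heads differ at p, while all proper prefixes of p had equal
            # heads. p qualifies iff the two heads are {eps_f, f} for some f.
            if absorbers.get(hu) == hv or absorbers.get(hv) == hu:
                found.append(p)
            # Every position below p has a prefix with different heads.
            return
        # Equal heads (hence equal arity): descend into the arguments.
        for i, (a, b) in enumerate(zip(u[1:], v[1:]), start=1):
            walk(a, b, p + (i,))

    walk(s, t, ())
    return found


# Usage: eps_f against f(f(a, b), c); the root is the only candidate.
eps_f = ("eps_f",)
nested = ("f", ("f", ("a",), ("b",)), ("c",))
print(absorption_positions(eps_f, nested, {"f": "eps_f"}))  # [()]
```

Because the traversal stops as soon as the heads differ, no reported position is a prefix of another, which matches the observation that absorption positions are disjoint.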

Absorption positions are disjoint from each other. If \(s\,\triangleq _{x}\,t\) is an initial AUT and \(p\,\in \, ap(s,t)\), then after finitely many steps the AUnif algorithm will generate an AUT \(s|_p\,\triangleq _{x}\,t|_p\), that is, an AUT whose side heads form an absorption pair. To each such AUT, two inference rules from AUnif are applicable, i.e., this is a branching point in the algorithm. No other pair of joint positions causes branching. Hence, \(\textsc {AUnif}(\langle \{s\,\triangleq _{x}\,t\};\emptyset ;\emptyset ;\iota \rangle )\) contains more than one final configuration iff \(ap(s,t)\not =\, \emptyset \). Each absorption position may lead to at most \(\max \{ size (s), size (t)\}\) branches due to nested f’s below absorption positions (as, e.g., in \(\varepsilon _f \,\triangleq _{x}\, f(f(a,b),c)\)); they resurface after applying the expansion rules and create new AUTs between terms whose heads are absorption pairs (\(\varepsilon _f\) and f). This implies the following:

Theorem 6

Let s and t be terms and n be the cardinality of \(ap(s,t)\). Then the cardinality of \(\textsc {AUnif}(\langle \{s\,\triangleq _{x}\,t\} ; \emptyset ; \emptyset ; \iota \rangle )\) is bounded by \(\max \{ size (s), size (t)\}^n\).

If we fix the number of absorption positions in the input terms, the set of final configurations has polynomial size. Moreover, note that computing one final configuration requires a linear number of steps since each rule eliminates at least one pair of symbols from the set of AUTs to be transformed.
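As a small worked instance of the bound in Theorem 6, consider the pair \(\varepsilon _f \,\triangleq _{x}\, f(f(a,b),c)\) mentioned above. We assume here, as a convention not fixed in this section, that \( size \) counts symbol occurrences. The only common position is the root, and its heads are \(\varepsilon _f\) and f, so \(n\,=\,1\):

\[
size (\varepsilon _f) \,=\, 1, \qquad size (f(f(a,b),c)) \,=\, 5, \qquad \max \{ size (s), size (t)\}^n \,=\, \max \{1,5\}^{1} \,=\, 5 .
\]

Under this assumption, the algorithm yields at most five final configurations for this pair. More generally, for fixed n and \(m\,=\,\max \{ size (s), size (t)\}\), the bound \(m^n\) is a polynomial of degree n in m.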

6 Conclusion

We introduced a rule-based algorithm that computes generalizations for problems modulo absorption symbols and proved its soundness and completeness. Furthermore, the algorithm computes, in finitely many steps, a finite set of final configurations from which we can extract a minimal complete set of generalizations. This set can be infinite, implying that \(\texttt {Abs}\)-anti-unification is of type infinitary.

In contrast to other grammar-based approaches, our algorithm is generalizable to similar subterm-collapsing theories, which would allow a finite representation of the minimal complete set of generalizations. Therefore, studying extensions of our method for such theories would be a natural next step.

For future work, we will consider how to combine our algorithm with algorithms for computing generalizations in other equational theories, similar to [3]. It would also be interesting to see how generalization techniques in such (combined) theories can be used in practice as part of methods for software analysis.