Higher-Order Pattern Anti-Unification in Linear Time

We present a rule-based, Huet-style anti-unification algorithm for simply typed lambda-terms, which computes a least general higher-order pattern generalization. For a pair of arbitrary terms of the same type, such a generalization always exists and is unique modulo α-equivalence and variable renaming. With a minor modification, the algorithm works for untyped lambda-terms as well. The time complexity of both algorithms is linear.


Introduction
The anti-unification problem of two terms t 1 and t 2 is concerned with finding their generalization, a term t such that both t 1 and t 2 are instances of t under some substitutions. Interesting generalizations are the least general ones. The purpose of anti-unification algorithms is to compute such least general generalizations (lggs).
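To make the notion concrete, here is a minimal first-order anti-unification sketch in the classical Plotkin/Reynolds style. The tuple term representation, the variable naming scheme X0, X1, ..., and the function name lgg are ours, for illustration only; the higher-order algorithm of this paper is considerably more involved.

```python
# Illustrative first-order anti-unification in the Plotkin/Reynolds
# style. Terms are tuples ("f", arg1, ...) for applications and plain
# strings for constants; generalization variables are named X0, X1, ...

def lgg(t, s, store=None):
    """Least general generalization of first-order terms t and s.
    `store` maps a disagreement pair to its variable, so that equal
    disagreements are generalized by the same variable (this is what
    makes the result *least* general)."""
    if store is None:
        store = {}
    if t == s:
        return t
    if (isinstance(t, tuple) and isinstance(s, tuple)
            and t[0] == s[0] and len(t) == len(s)):
        return (t[0],) + tuple(lgg(a, b, store) for a, b in zip(t[1:], s[1:]))
    if (t, s) not in store:                 # new disagreement: fresh variable
        store[(t, s)] = "X%d" % len(store)
    return store[(t, s)]

# f(a, g(a)) vs. f(b, g(b)) yields f(X0, g(X0)), not f(X0, g(X1)).
assert lgg(("f", "a", ("g", "a")),
           ("f", "b", ("g", "b"))) == ("f", "X0", ("g", "X0"))
```

The shared store is the first-order analogue of the global disagreement function (the "store") used by the algorithm of this paper.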
For higher-order terms there is, in general, no unique higher-order lgg. Therefore, special classes have been considered for which uniqueness is guaranteed. One such class is formed by higher-order patterns: λ-terms in which the arguments of free variables are distinct bound variables. They were introduced by Miller [28] and gained popularity because of an attractive combination of expressive power and computational cost: there are practical unification algorithms [29][30][31] that compute most general unifiers whenever they exist. Pfenning [31] gave the first algorithm for higher-order pattern anti-unification in the Calculus of Constructions, with the intention of using it for proof generalization.
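The defining restriction on higher-order patterns can be checked mechanically. The following sketch uses our own tuple encoding of λ-terms, ("lam", x, body) for abstractions and ("app", head, args) for applications, together with the convention (also used later in this paper) that free variables start with a capital letter; it is an illustration, not part of the algorithm.

```python
# A minimal check that a λ-term is a higher-order pattern: every free
# (capital-letter) variable must be applied to a list of pairwise
# distinct bound variables. Terms are ("lam", x, body) or
# ("app", head, [args]); this representation is our own.

def is_pattern(t, bound=frozenset()):
    tag = t[0]
    if tag == "lam":
        return is_pattern(t[2], bound | {t[1]})
    _, head, args = t
    if head[0].isupper() and head not in bound:   # free variable head
        names = [a[1] for a in args if a[0] == "app" and not a[2]]
        return (len(names) == len(args)             # all args are atoms
                and all(x in bound for x in names)  # ... bound variables
                and len(set(names)) == len(names))  # ... pairwise distinct
    return all(is_pattern(a, bound) for a in args)

v = lambda x: ("app", x, [])   # shorthand for a variable occurrence
# λx,y. f(X(x,y), x) is a pattern; λx. X(x,x) and λx. X(f(x)) are not.
assert is_pattern(("lam", "x", ("lam", "y",
    ("app", "f", [("app", "X", [v("x"), v("y")]), v("x")]))))
assert not is_pattern(("lam", "x", ("app", "X", [v("x"), v("x")])))
assert not is_pattern(("lam", "x", ("app", "X", [("app", "f", [v("x")])])))
```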
Since then, there have been several approaches to higher-order anti-unification, designing algorithms in various restricted cases. Motivated by applications in inductive learning, Feng and Muggleton [15] proposed anti-unification in Mλ, which is essentially an extension of higher-order patterns by permitting free variables to apply to object terms, not only to bound variables. Object terms may contain constants, free variables, and variables which are bound outside of object terms. The algorithm has been implemented and was used for inductive generalization.
Pientka [32] studied anti-unification of linear higher-order patterns in the framework of developing substitution tree indexing for higher-order terms. Linear higher-order patterns require that every meta-variable occur at most once and apply to all the distinct bound variables in its context. The generalization algorithm has been used for the insertion of terms into the index.
Lu et al. [26] investigated anti-unification in a restricted version of λ2 (a second-order λ-calculus with type variables [3]) and its applications in analogical programming and analogical theorem proving. The imposed restrictions guarantee uniqueness of the least general generalization. This algorithm as well as the one for higher-order patterns by Pfenning [31] have influenced the generalization algorithm used in the program transformation technique called supercompilation [27].
There are other fragments of higher-order anti-unification, motivated by analogical reasoning. A restricted version of second-order generalization has an application in the replay of program derivations [16]. A symbolic analogy model, called Heuristic-Driven Theory Projection, uses yet another restriction of higher-order anti-unification to detect analogies between different domains [21].
The last decade has seen a revived interest in anti-unification. The problem has been studied in various theories (e.g., [1,10,22]) and from different application points of view (e.g., [2,9,21,25,26,35]). A particularly interesting application comes from software code refactoring, to find similar pieces of code, e.g., in Python, Java [7,8] and Erlang [25] programs. These approaches are based on first-order anti-unification [33,34]. To advance the refactoring and clone detection techniques for languages based on λProlog, one needs to employ anti-unification for higher-order terms. Yet another motivation to look into the problem of higher-order anti-unification in more detail is the improvement of indexing techniques for λ-terms used, e.g., in mathematical assistant systems.
In this paper, we revisit the problem of higher-order anti-unification and present a rule-based anti-unification algorithm (in the style of Huet [19]) for the simply typed λ-calculus. The input of the algorithm consists of arbitrary terms in η-long β-normal form. The output is a higher-order pattern. The global function for recording disagreements is represented as a store, in the spirit of Alpuente et al. [1]. We prove that a least general pattern generalization always exists and is unique modulo α-equivalence. The proposed algorithm computes it in linear time. As is done in related work, we assume that symbols and pointers are encoded in constant space, and basic operations on them are performed in constant time. With a small modification, the algorithm works for the untyped lambda-calculus as well.
This paper is an extended and improved version of our conference publication [5], where the problem was proved to be solvable in cubic time. A free open-source implementation of this previous version of the algorithm, for both the simply typed and the untyped calculi, is available.

Comparison with Some Related Work
The approaches closest to ours are the following:
- Pfenning [31] studied anti-unification in the Calculus of Constructions, whose type system is richer than the simple types we consider. Both the input and the output were required to be higher-order patterns. Some questions remained open, including the efficiency, applicability, and implementation of the algorithm. Due to the nature of type dependencies in the calculus, the author was not able to formulate the algorithm in Huet's style [19], where a global function is used to guarantee that the same disagreements between the input terms are mapped to the same variable. The complexity was not studied, and the proofs of the algorithm's properties were only sketched.
- Anti-unification in Mλ [15] is performed on simply typed terms, where both the input and the output are restricted to a certain extension of higher-order patterns. In this sense it is not comparable to our case, because we do not restrict the input but require patterns in the output. Moreover, [15] contains neither a complexity analysis of the Mλ anti-unification algorithm nor proofs of its properties.
- The anti-unification algorithm proposed by Pientka [32] also considers simply typed terms, with the input and output restricted to linear higher-order patterns. Complexity results are not reported. This approach also differs from ours for the same reason as above: we do not restrict the input. It should be noted that, by omitting one of the rules of our algorithm (the merging rule), we can also compute linear pattern generalizations for arbitrary input.
Some more remote results are listed below:
- Anti-unification for a restricted version of λ2 [26] requires that λ-abstraction not be used in arguments. The algorithm computes a generalization which is least general with respect to a combination of several orderings defined in the paper. The properties of the algorithm are formally proved, but the complexity is not analyzed. As the authors point out, the orderings they define are not comparable with the ordering used to compute higher-order pattern generalizations.
- The generalization algorithms proposed by Hirata et al. [18] work on second-order terms which contain no λ-abstractions. The output is also restricted: it may contain variables which can be instantiated with multi-hole contexts only. By varying the restrictions on the instantiation, various versions of generalizations are obtained. This approach is not comparable with ours.
- Yet another anti-unification algorithm for λ-abstraction-free terms has been developed for analogy making [21]. The application dictates that the typical input be first-order, while the generalizations may contain second-order variables. A certain measure is introduced to compare generalizations, and the algorithm computes those which are preferred by this measure. This approach is not comparable with ours either.
- The approach of Hasker [16] is also different from ours. The anti-unification algorithm there works on a restricted class of combinator terms and computes their generalizations in quadratic time. It has been used for program derivation.

Preliminaries
In higher-order signatures we have types constructed from a set of basic types (typically δ) using the grammar τ ::= δ | τ → τ. Terms are built using the grammar t ::= x | c | λx.t | t 1 t 2 , where x is a variable and c is a constant, and are typed as usual. Terms of the form ((h t 1 ) . . . t m ), where h is a constant or a variable, will be written as h(t 1 , . . . , t m ), and terms of the form λx 1 . · · · .λx n .t as λx 1 , . . . , x n .t. We use #» x as a short-hand for x 1 , . . . , x n . Other standard notions of the simply typed λ-calculus, such as bound and free occurrences of variables, α-conversion, β-reduction, η-long β-normal form, etc., are defined as usual (see [13]). By default, terms are assumed to be written in η-long β-normal form. Therefore, all terms have the form λx 1 , . . . , x n .h(t 1 , . . . , t m ), where n, m ≥ 0, h is either a constant or a variable, the terms t 1 , . . . , t m also have this form, and h(t 1 , . . . , t m ) has a basic type.
The set of free variables of a term t is denoted by Vars(t). When we write an equality between two λ-terms, we mean that they are equivalent modulo α, β and η equivalence.
For a term t = λx 1 , . . . , x n .h(t 1 , . . . , t m ) with n, m ≥ 0, its head is defined as Head(t) = h. Positions in λ-terms are defined with respect to their tree representation in the usual way, as strings of integers. For instance, in the term f (λx.λy.g(λz.h(z, y), x), λu.g(u)), the symbol f stands in the position ε (the empty sequence), the occurrence of λx. stands in the position 1, the bound occurrence of y in 1.1.1.1.1.2, the bound occurrence of u in 2.1.1, etc.
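Position access can be sketched in code. The term encoding below, ("lam", x, body) for abstractions and ("app", head, args) for applications, is our own illustrative representation, not the paper's: index 1 descends under a binder, and index i under h(t 1 , . . . , t m ) selects t i .

```python
# Subterm lookup at a position (a sequence of integers) in a λ-term
# encoded as ("lam", x, body) / ("app", head, [args]).

def subterm(t, pos):
    for i in pos:
        if t[0] == "lam":
            assert i == 1          # a binder has a single child: its body
            t = t[2]
        else:
            t = t[2][i - 1]        # i-th argument of h(t1, ..., tm)
    return t

# The example term from the text: f(λx.λy.g(λz.h(z,y), x), λu.g(u)).
term = ("app", "f", [
    ("lam", "x", ("lam", "y", ("app", "g", [
        ("lam", "z", ("app", "h", [("app", "z", []), ("app", "y", [])])),
        ("app", "x", [])]))),
    ("lam", "u", ("app", "g", [("app", "u", [])]))])

assert subterm(term, [2, 1, 1]) == ("app", "u", [])            # bound u
assert subterm(term, [1, 1, 1, 1, 1, 2]) == ("app", "y", [])   # bound y
```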
The path to a position in a λ-term is defined as the sequence of symbols from the root to the node at that position (not including it) in the tree representation of the term. For instance, in the term above, the path to the position 1.1.1.1.1.2 is f, λx., λy., g, λz., h.
Substitutions are finite sets of pairs {X 1 → t 1 , . . . , X n → t n }, where X i and t i have the same type and the X's are pairwise distinct variables. They can be extended to type-preserving functions from terms to terms as usual, avoiding variable capture. The notions of substitution domain and range are also standard and are denoted, respectively, by Dom and Ran.
We use postfix notation for substitution applications, writing tσ instead of σ (t). As usual, the application tσ affects only the free occurrences of variables from Dom(σ ) in t. We write #» x σ for x 1 σ, . . . , x n σ , if #» x = x 1 , . . . , x n . Similarly, for a set of terms S, we define Sσ = {tσ | t ∈ S}. The composition of σ and ϑ is written as juxtaposition σ ϑ and is defined as x(σ ϑ) = (xσ )ϑ for all x. Yet another standard operation, restriction of a substitution σ to a set of variables S, is denoted by σ | S .
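The composition law x(σϑ) = (xσ)ϑ can be illustrated on first-order terms; the tuple encoding and function names below are ours, and capture avoidance does not arise in this binder-free fragment.

```python
# Substitution application and composition on first-order terms
# (tuples ("f", arg1, ...); capital-letter strings act as variables).

def apply(t, sub):
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply(a, sub) for a in t[1:])
    return sub.get(t, t)

def compose(sigma, theta):
    """Return the substitution sigma theta with x(sigma theta) = (x sigma) theta."""
    comp = {x: apply(t, theta) for x, t in sigma.items()}
    comp.update({x: t for x, t in theta.items() if x not in sigma})
    return comp

sigma = {"X": ("f", "Y")}
theta = {"Y": "a"}
# Applying the composition agrees with applying sigma, then theta.
assert apply(apply("X", sigma), theta) == apply("X", compose(sigma, theta))
assert compose(sigma, theta) == {"X": ("f", "a"), "Y": "a"}
```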
A substitution σ 1 is more general than σ 2 , written σ 1 ⪯ σ 2 , if there exists ϑ such that X σ 1 ϑ = X σ 2 for all X ∈ Dom(σ 1 ) ∪ Dom(σ 2 ). The strict part of this relation is denoted by ≺. The relation ⪯ is a partial order and generates an equivalence relation which we denote by ≃. We overload ⪯ by defining s ⪯ t if there exists a substitution σ such that sσ = t.
A term t is called a generalization or an anti-instance of two terms t 1 and t 2 if t ⪯ t 1 and t ⪯ t 2 . It is a higher-order pattern generalization if, additionally, t is a higher-order pattern. It is the least general generalization (lgg, in short), also known as a most specific anti-instance, of t 1 and t 2 , if there is no generalization s of t 1 and t 2 which satisfies t ≺ s.
An anti-unification problem (AUP, for short) is a triple X( #» x ) : t ≜ s, where
- λ #» x · X( #» x ), λ #» x · t, and λ #» x · s are terms of the same type,
- t and s are in η-long β-normal form, and
- X does not occur in t and s.
The variable X is called a generalization variable. The term X( #» x ) is called the generalization term. The variables that belong to #» x , as well as bound variables, are written as lower-case letters x, y, z, . . .. Originally free variables, including the generalization variables, are written as capital letters X, Y, Z, . . .. This notation intuitively corresponds to the usual convention of syntactically distinguishing bound and free variables. The size of a set of AUPs is defined as the number of symbols in the second and third components of its elements. An anti-unifier of X( #» x ) : t ≜ s is a substitution σ with Dom(σ) = {X} such that λ #» x · X( #» x )σ generalizes both λ #» x · t and λ #» x · s. An anti-unifier σ of X( #» x ) : t ≜ s is least general (or most specific) if there is no anti-unifier ϑ of the same problem that satisfies σ ≺ ϑ. Obviously, if σ is a least general anti-unifier of X( #» x ) : t ≜ s, then λ #» x · X( #» x )σ is an lgg of λ #» x · t and λ #» x · s. Here we consider a variant of the higher-order anti-unification problem: Given: Higher-order terms t and s of the same type in η-long β-normal form.
Find: A higher-order pattern generalization r of t and s.
The problem statement means that we are looking for an r which is least general among all higher-order patterns which generalize t and s. There can still exist a term which is less general than r, generalizes both s and t, but is not a higher-order pattern. For instance, if t = λx, y · f (h(x, x, y), h(x, y, y)) and s = λx, y · f (g(x, x, y), g(x, y, y)), then r = λx, y · f (Y 1 (x, y), Y 2 (x, y)) is a higher-order pattern, which is an lgg of t and s. However, the term λx, y · f (Z (x, x, y), Z (x, y, y)), which is not a higher-order pattern, is less general than r and generalizes t and s.
Below we assume that in the AUPs of the form X( #» x ) : t ≜ s, the term λ #» x · X( #» x ) is a higher-order pattern.

Transformation Rules for a Variant of Higher-Order Anti-Unification
In this section we describe a set of transformation rules for higher-order anti-unification. These rules work on triples A; S; σ, which we call states. Here A is a set of AUPs of the form {X 1 ( #» x 1 ) : t 1 ≜ s 1 , . . . , X n ( #» x n ) : t n ≜ s n } that are pending to anti-unify, S is a set of already solved AUPs (the store), and σ is a substitution (computed so far) mapping variables to patterns.

Remark 1
We assume that in the set A ∪ S each occurrence of λ binds a variable with a distinct name (in other words, all names of bound variables are distinct), and that each X i occurs in A ∪ S only once.

Definition 1
The set of transformations P is defined by the following rules:

Dec (Decomposition):
{X( #» x ) : h(t 1 , . . . , t m ) ≜ h(s 1 , . . . , s m )} ∪ A; S; σ ⇒ {Y 1 ( #» x ) : t 1 ≜ s 1 , . . . , Y m ( #» x ) : t m ≜ s m } ∪ A; S; σ{X → λ #» x · h(Y 1 ( #» x ), . . . , Y m ( #» x ))},
where h is a constant or h ∈ #» x , and Y 1 , . . . , Y m are fresh variables of the appropriate types.

Abs (Abstraction):
{X( #» x ) : λy.t ≜ λz.s} ∪ A; S; σ ⇒ {X ′ ( #» x , y) : t ≜ s{z → y}} ∪ A; S; σ{X → λ #» x .λy.X ′ ( #» x , y)},
where X ′ is a fresh variable of the appropriate type.

Sol (Solve):
{X( #» x ) : t ≜ s} ∪ A; S; σ ⇒ A; {Y( #» y ) : t ≜ s} ∪ S; σ{X → λ #» x · Y( #» y )},
where t and s are of a basic type, Head(t) ≠ Head(s) or Head(t) = Head(s) = Z ∉ #» x , the sequence #» y is the subsequence of #» x consisting of the variables that appear freely in t or in s, and Y is a fresh variable of the appropriate type.

Mer (Merge):
A; {X( #» x ) : t 1 ≜ t 2 , Y( #» y ) : s 1 ≜ s 2 } ∪ S; σ ⇒ A; {X( #» x ) : t 1 ≜ t 2 } ∪ S; σ{Y → λ #» y · X( #» x π)},
where π : { #» x } → { #» y } is a bijection such that match({ #» x }, { #» y }, {t 1 ≜ s 1 , t 2 ≜ s 2 }) = π.
To compute generalizations for terms t and s, we start with the initial state {X : t ≜ s}; ∅; ∅, where X is a fresh variable, and apply the transformations as long as possible. The final states have the form ∅; S; ϕ, where Mer does not apply to S. The result computed by P is then X ϕ.
One can easily show that a triple obtained by applying any of the rules above to a state A; S; σ is indeed a state: for each expression X( #» x ) : t ≜ s ∈ A ∪ S, the terms X( #» x ), t and s have the same type, λ #» x · X( #» x ) is a higher-order pattern, s and t are in η-long β-normal form, and X does not occur in t and s. Moreover, all generalization variables are distinct, and the substitutions map variables to patterns.
The property that each occurrence of λ in A ∪ S binds a unique variable is also maintained. It guarantees that in the Abs rule, the variable y is fresh for s. After the application of the rule, y will appear nowhere else in A ∪ S except X ( #» x , y) and, maybe, t and s.
As in the anti-unification algorithms working on triple states [1,22], the idea of the store here is to keep track of already solved AUPs in order to reuse an existing variable in generalizations. This is important, since we aim at computing lggs.
The Mer rule requires solving a matching problem {t 1 ≜ s 1 , t 2 ≜ s 2 } with a substitution π which bijectively maps the variables from #» x to the variables from #» y . In general, a matching problem is defined as follows.
Definition 2 (Permuting matcher) Given a set of pairs of terms in η-long β-normal form P = {t 1 ≜ s 1 , . . . , t n ≜ s n } and two sets of variables D and R such that D contains all the variables of the t i 's and R contains all the variables of the s i 's, a permuting matcher is a bijection π : D → R which, extended to a substitution replacing each variable x ∈ D by π(x) ∈ R, satisfies t i π = s i for all 1 ≤ i ≤ n. The permuting matcher, if it exists, is unique and is denoted by match(D, R, P). When this map does not exist, we write match(D, R, P) = ⊥.
An algorithm that decides the existence of the permuting matcher and computes it in linear time is given in [5]. Here, in Sect. 5, we show that a more general problem can also be solved in linear time.
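For illustration, a naive version of match can be sketched on binder-free terms in an ad hoc tuple encoding, using the first-occurrence ordering that Sect. 5 exploits. This is our own sketch, not the implementation of [5]; λ-binders inside the terms are not handled here.

```python
# Sketch of match(D, R, P): order D and R by first occurrence in a
# left-to-right traversal, pair them up, and verify that the induced
# renaming maps each t_i to s_i. Terms are ("app", head, [args]).

def first_occurrences(t, dom, acc):
    if t[0] == "app":
        if t[1] in dom and t[1] not in acc:
            acc.append(t[1])
        for a in t[2]:
            first_occurrences(a, dom, acc)
    return acc

def rename(t, pi):
    return ("app", pi.get(t[1], t[1]), [rename(a, pi) for a in t[2]])

def match(D, R, P):
    xs, ys = [], []
    for t, s in P:
        first_occurrences(t, D, xs)
        first_occurrences(s, R, ys)
    if len(xs) != len(D) or len(ys) != len(R) or len(D) != len(R):
        return None                  # some variable never occurs
    pi = dict(zip(xs, ys))           # the only candidate bijection
    if all(rename(t, pi) == s for t, s in P):
        return pi
    return None                      # corresponds to ⊥

v = lambda x: ("app", x, [])
t1 = ("app", "f", [v("x"), ("app", "g", [v("y"), v("x")])])
s1 = ("app", "f", [v("u"), ("app", "g", [v("w"), v("u")])])
assert match({"x", "y"}, {"u", "w"}, [(t1, s1)]) == {"x": "u", "y": "w"}
```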

Example 1 The transformations of P can be illustrated on concrete input terms t and s: a sequence of Abs, Dec and Sol steps, possibly followed by Mer steps on the store, ends in a final state whose substitution, applied to X, yields a pattern generalization of t and s. From such runs one can notice yet another advantage of using the store (besides its help in merging): in the final state, it contains AUPs from which one can read off the substitutions that show how the original terms can be obtained from the computed result.

Properties of the Set of Transformations P
In this section we will prove termination (Theorem 1), soundness (Theorem 2) and completeness (Theorem 3) of P.

Theorem 1 (Termination) The set of transformations P is terminating.
Moreover, any transformation sequence starting in the state A; S; σ terminates in O(|A| + |S|) steps.
Proof We define the measure of a state as M(A; S; σ) = 2|A| + |S|, where |·| denotes the size (the number of symbols in the second and third components of the AUPs). All rules in P strictly decrease this measure.
Theorem 2 (Soundness) Let {X : t ≜ s}; ∅; ∅ ⇒ * ∅; S; σ be a transformation sequence in P. Then (a) X σ is a higher-order pattern in η-long β-normal form, and (b) X σ is a generalization of both t and s.
Proof To prove that X σ is a higher-order pattern, we use the facts that, first, X is a higher-order pattern and, second, at each step A 1 ; S 1 ; ϕ ⇒ A 2 ; S 2 ; ϕϑ, if X ϕ is a higher-order pattern, then X ϕϑ is also a higher-order pattern. The latter property follows from the stability of patterns under substitution application and the fact that the substitutions in the rules map variables to higher-order patterns. As for X σ being in η-long β-normal form, this is guaranteed by the series of applications of the Abs rule, even if Dec introduces an AUP whose generalization term is not in this form. This finishes the proof of (a). For (b), we first prove the property for a single transformation step. If an AUP was not transformed at this step, then the property trivially holds for it. Therefore, we assume that X( #» x ) : t ≜ s is selected and prove the property for each rule. For Dec, let t = h(t 1 , . . . , t m ) and s = h(s 1 , . . . , s m ), and let ψ 1 and ψ 2 be substitutions defined, respectively, by Y i ψ 1 = λ #» x · t i and Y i ψ 2 = λ #» x · s i for all 1 ≤ i ≤ m. Such substitutions obviously exist, since the Y's introduced by the Dec rule are fresh. Now, we proceed by induction on the length l of the transformation sequence. In fact, we will prove a more general statement: if A 0 ; S 0 ; ϑ 0 ⇒ * ∅; S n ; ϑ 0 ϑ 1 · · · ϑ n is a transformation sequence in P, then for any X( #» x ) : t ≜ s ∈ A 0 ∪ S 0 we have X( #» x )ϑ 1 · · · ϑ n ⪯ t and X( #» x )ϑ 1 · · · ϑ n ⪯ s.
When l = 1, it is exactly the one-step case we just proved. Assume that the statement is true for any transformation sequence of length n, and prove it for a transformation sequence A 0 ; S 0 ; ϑ 0 ⇒ A 1 ; S 1 ; ϑ 0 ϑ 1 ⇒ * ∅; S n ; ϑ 0 ϑ 1 · · · ϑ n of length n + 1.
Below the composition ϑ i ϑ i+1 · · · ϑ k is abbreviated as ϑ k i with k ≥ i. Let X ( #» x ) : t s be an AUP selected for transformation at the current step. (Again, the property trivially holds for the AUPs which are not selected). We consider each rule: (t 1 , . . . , t m ), s = h(s 1 , . . . , s m ) and By construction of ϑ n 2 , if there is U ∈ Vars(Ran(ϑ n 2 )), then there is an AUP of the form U ( #» u ) : t s ∈ S n . Let σ (resp. ϕ) be a substitution which maps each such U to the corresponding t (resp. s ). , y), and A 1 contains the AUP x , y)ϑ n 2 and due to the way how y was chosen, we finally get Hence, the result computed by P for X : t s generalizes both t and s. We call X σ a generalization of t and s computed by P. Moreover, given a transformation sequence {X : t s}; ∅; ∅ ⇒ * ∅; S; σ in P, we say that σ is a substitution computed by P for X : t s; -the restriction of σ on X , σ | X , is an anti-unifier of X : t s computed by P.

Theorem 3 (Completeness)
Let λ #» x · t 1 and λ #» x · t 2 be higher-order terms and λ #» x · s be a higher-order pattern such that λ #» x · s is a generalization of both λ #» x · t 1 and λ #» x · t 2 . Then there exists a transformation sequence in P that, starting from {X( #» x ) : t 1 ≜ t 2 }; ∅; ∅, computes a substitution σ such that λ #» x · s ⪯ λ #» x · X( #» x )σ.
Proof By structural induction on s. We can assume without loss of generality that λ #» x · s is an lgg of λ #» x · t 1 and λ #» x · t 2 . We also assume that it is in η-long β-normal form.
If s is a variable, then there are two cases: either s ∈ #» x , or s ∉ #» x . In the first case, we have s = t 1 = t 2 . The Dec rule gives σ = {X → λ #» x · s} and, hence, λ #» x · s = λ #» x · X( #» x )σ. In the second case, either Head(t 1 ) ≠ Head(t 2 ), or Head(t 1 ) = Head(t 2 ) ∉ #» x . Sol gives σ = {X → λ #» x · Y( #» y )}, where #» y is a subsequence of #» x consisting of the variables occurring freely in t 1 or in t 2 . But #» y must be empty, because otherwise s would not be just a variable (remember that λ #» x · s is an lgg of λ #» x · t 1 and λ #» x · t 2 in η-long β-normal form). Hence, we have λ #» x · s ⪯ λ #» x · X( #» x )σ. If s is a constant c, then t 1 = t 2 = c. We can apply the Dec rule, obtaining σ = {X → λ #» x · c} and, hence, λ #» x · s = λ #» x · X( #» x )σ. If s = λx.s , then t 1 and t 2 must have the forms t 1 = λx.t 1 and t 2 = λy.t 2 , and s must be an lgg of t 1

and t 2 . Abs gives a new state {X
By the induction hypothesis, we can compute a substitution generalizing this new AUP. Finally, assume that s is a compound term h(s 1 , . . . , s n ). If h ∉ #» x is a variable, then s 1 , . . . , s n are distinct variables from #» x (because λ #» x · s is a higher-order pattern). That means that s 1 , . . . , s n appear freely in t 1 or t 2 . Moreover, either Head(t 1 ) ≠ Head(t 2 ), or Head(t 1 ) = Head(t 2 ) = h. In both cases, we can apply the Sol rule to obtain σ = {X → λ #» x · Y(s 1 , . . . , s n )}.
If h ∈ #» x or if it is a constant, then we should have Head(t 1 ) = Head(t 2 ). Assume they have the forms t 1 = h(t 1 1 , . . . , t 1 n ) and t 2 = h(t 2 1 , . . . , t 2 n ). We proceed by the Dec rule. By the induction hypothesis, we can construct transformation sequences computing the substitutions σ 1 , . . . , σ n for the argument AUPs. These transformation sequences, together with the initial Dec step, can be combined into one transformation sequence. For any term t, let t| p denote the subterm of t at position p. If s does not contain duplicate variables free in λ #» x · s, this already finishes the construction. Otherwise, if s contains two occurrences of a free variable (applied to argument sequences #» z 1 and #» z 2 of the same length) at positions p 1 and p 2 , this indicates that (a) t 1 | p 1 and t 1 | p 2 differ from each other by a permutation of variables bound in t 1 , (b) t 2 | p 1 and t 2 | p 2 differ from each other by the same (modulo variable renaming) permutation of variables bound in t 2 , (c) the path to p 1 is the same (modulo bound variable renaming) in t 1 and t 2 , and equals (modulo bound variable renaming) the path to p 1 in s, and (d) the path to p 2 is the same (modulo bound variable renaming) in t 1 and t 2 , and equals (modulo bound variable renaming) the path to p 2 in s.
Then, because of (c) and (d), we should have two AUPs in S n : one between (renamed variants of) t 1 | p 1 and t 2 | p 1 , and the other between (renamed variants of) t 1 | p 2 and t 2 | p 2 . The possible renaming of variables is caused by the fact that Abs might have been applied to obtain the AUPs. Let those AUPs be Z 1 ( #» z 1 ) : r 1 1 ≜ r 2 1 and Z 2 ( #» z 2 ) : r 1 2 ≜ r 2 2 . The conditions (a) and (b) make sure that match({ #» z 1 }, { #» z 2 }, {r 1 1 ≜ r 1 2 , r 2 1 ≜ r 2 2 }) is a permuting matcher π, which means that we can apply the rule Mer. We can repeat this process for all duplicated variables in s, extending the transformation sequence to ∅; S n ; σ 0 σ 1 · · · σ n ⇒ * ∅; S n+m ; σ 0 σ 1 · · · σ n σ′ 1 · · · σ′ m , where σ′ 1 , . . . , σ′ m are the substitutions introduced by the applications of the Mer rule. Let σ = σ 0 σ 1 · · · σ n σ′ 1 · · · σ′ m . By this construction, we have λ #» x · s ⪯ λ #» x · X( #» x )σ, which finishes the proof.
Depending on which AUP is selected to perform a transformation, there can be different transformation sequences in P starting from the same initial state, leading to different generalizations. The next theorem states that all those generalizations are equivalent.
Theorem 4 Let {X : t ≜ s}; ∅; ∅ ⇒ * ∅; S 1 ; ϑ 1 and {X : t ≜ s}; ∅; ∅ ⇒ * ∅; S 2 ; ϑ 2 be two transformation sequences in P. Then X ϑ 1 ≃ X ϑ 2 .
Proof It is not hard to notice that if it is possible to change the order of applications of two rules (while sticking to the same selected AUPs for each rule), then the result remains the same: if A 1 ; S 1 ; σ 1 ⇒ A 2 ; S 2 ; σ 1 ϑ 1 ⇒ A 3 ; S 3 ; σ 1 ϑ 1 ϑ 2 and A 1 ; S 1 ; σ 1 ⇒ A′ 2 ; S′ 2 ; σ 1 ϑ 2 ⇒ A′ 3 ; S′ 3 ; σ 1 ϑ 2 ϑ 1 are two two-step transformation sequences, where R1 and R2 are the (not necessarily different) rules applied in the two steps and each of them transforms the same AUP(s) in both sequences, then A 3 = A′ 3 , S 3 = S′ 3 , and σ 1 ϑ 1 ϑ 2 = σ 1 ϑ 2 ϑ 1 (modulo the names of fresh variables).
Dec, Abs and Sol rules transform the selected AUP in a unique way. We show that it is irrelevant in which order we perform matching in the Mer rule.

Its left-hand side is transformed as follows.
Next, starting from X ϑ 1 ρ 2 , we can transform it as X ϑ 1 ρ 2 = X σρ. At this step, since the equality = is αβη-equivalence, we can omit the application of the substitution {Y → λ #» y · Y( #» y )} and proceed. The fact X ϑ 2 ⪯ X ϑ 1 can be proved analogously. Hence, X ϑ 1 ≃ X ϑ 2 , which means that it is irrelevant in which order we perform matching in the Mer rule. Therefore, no matter how the different transformation sequences are constructed, the computed generalizations are equivalent.
Corollary 1 For any given terms t and s, and any transformation sequence {X : t s}; ∅; ∅ ⇒ * ∅; S; σ in P, the higher-order pattern X σ is the unique least general generalization of t and s.

Complexity
In this section we describe an algorithm based on the set of transformations P and prove that this algorithm has linear time complexity. Notice that the Termination Theorem (Theorem 1) already proves that any transformation sequence in P has at most linear length. However, the direct application of one transformation rule, as described in Sect. 3, may require quadratic time, which would result in a cubic-time algorithm, similar to our previous one [5]. In this section we will improve this result.

Remark 2
In the complexity analysis that follows, we assume that all pointers require constant space and that all basic operations on them can be done in constant time. This assumption is popular in the literature, despite being inaccurate: in any implementation of trees based on the use of pointers, these need space O(log n), since they address a memory of size O(n). The same argument applies to all traditional algorithms for first-order unification; in fact, all those claimed to be linear would, without this assumption, have O(n log n) time complexity. Therefore, we will continue with this traditional assumption. In other words, we neglect logarithmic factors in front of polynomial functions.
The proposed algorithm works in three phases. Later, we will prove that each one of them can be done in linear time. The following lemma allows us to decompose any transformation sequence into the three phases that we will analyze separately. Notice that, if we counted the number of symbols of Y i (x 1 , . . . , x n ) when computing the size of the new AUPs, this would be quadratic in the size of the original AUP. This is the reason to only consider the number of symbols of the second and third components of an AUP when defining its size. Moreover, reusing the representation of x 1 , . . . , x n in a directed acyclic graph, we can also represent the new AUPs in space linear in the size of the representation of the original AUP. Proof In the Dec rule, we reuse the representation of t i from f (t 1 , . . . , t m ) to get a representation of each t i . Using a directed acyclic graph, we also reuse the argument vector #» x from the representation of X( #» x ) in the original AUP and σ to construct the representation of each Y i ( #» x ) in the new AUPs and the substitution. We assume that, by using an appropriate hash table, we can find the unique occurrence of X in σ in constant time. Notice that β-reduction is trivial in this use case. Therefore, the rule can be applied in time O(m), and the space requirements also increase by O(m).
In the Abs rule, we reuse the representation of λy.t to construct the representation of t. We also reuse the representation of #» x from X( #» x ) when we construct the representation of X( #» x , y). The most expensive step is to compute the substitution s{z → y}. We assume that, using an appropriate data structure in which all occurrences of the bound variable z are linked, this can be done in time linear in the number of occurrences of this variable. This structure can be constructed for the initial problem in linear time.
If we bound the complexity by the product of the number of times we can apply these rules (linear) and the cost of each application (also linear in the worst case), we get a quadratic bound. In order to refine this bound, we need to introduce the notion of extended size of a term, denoted ‖t‖ and defined inductively as ‖h(t 1 , . . . , t m )‖ = m + ‖t 1 ‖ + · · · + ‖t m ‖ + 1 and ‖λy.t‖ = r + ‖t‖ + 1, where r is the number of free occurrences of y in t.
It can easily be proved that ‖t‖ ≤ 3|t|. We can prove that the applications of the Dec and Abs rules decrease the sum of the extended sizes of the terms of the AUPs by the same amount as the time they need to be applied. All this together proves that this phase of the algorithm can be computed in linear time, and that the increase in space is also linear.
Finally, |P| ≤ |t| + |s| is proved by inspection of the rules.
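The extended size defined above can be computed directly. The sketch below uses our own tuple encoding of λ-terms, ("lam", x, body) and ("app", head, args), and also checks the bound ‖t‖ ≤ 3|t| on an example.

```python
# Plain size |t| (number of symbols) and extended size ||t|| as defined
# in the text: ||h(t1,...,tm)|| = m + sum ||ti|| + 1, and
# ||λy.t|| = r + ||t|| + 1 with r the free occurrences of y in t.

def size(t):
    if t[0] == "lam":
        return 1 + size(t[2])
    return 1 + sum(size(a) for a in t[2])

def free_occurrences(t, y):
    if t[0] == "lam":
        return 0 if t[1] == y else free_occurrences(t[2], y)
    return (1 if t[1] == y else 0) + sum(free_occurrences(a, y) for a in t[2])

def ext_size(t):
    if t[0] == "lam":
        return free_occurrences(t[2], t[1]) + ext_size(t[2]) + 1
    return len(t[2]) + sum(ext_size(a) for a in t[2]) + 1

term = ("lam", "x", ("app", "f", [("app", "x", []), ("app", "x", [])]))
assert size(term) == 4        # λx, f, x, x
assert ext_size(term) == 8    # 2 free occurrences of x + ||f(x,x)|| + 1
assert ext_size(term) <= 3 * size(term)
```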

Lemma 3 (Second phase)
There exists an algorithm that, given a representation of a state P; ∅; σ resulting from the first phase, computes a representation of S and σ′ for a final state ∅; S; σ′. Proof In this second phase, we basically move AUPs from the problem set P to the store S. However, notice that the arguments of generalization variables in X( #» x ) : t ≜ s are narrowed in Y( #» y ) : t ≜ s, where #» y is a subsequence of #» x . As only those argument variables which appear in one of the terms t and s are kept, the length of the narrowed vector #» y is bounded by |t| + |s|.
There is no need to share the representation of those narrowed argument vectors #» y anymore: the representation of #» y can be constructed without reusing the representation of #» x . Finally, |S| = |P|, by inspection of the rule.
As a direct consequence of the previous two lemmas, we conclude that the size of the store S, after the first and second phases of the anti-unification algorithm, is linear in the size of the original problem.
In order to prove that the third phase can also be implemented by a linear-time algorithm, we cannot directly use the rules described in the previous section. This would lead to a cubic-time algorithm, similar to our previous one [5].
First, we will reduce the computation of permuting matchers to α-equivalence, using the following lemma.
Lemma 4 Let P = {t ≜ s} and let D and R be sets of variables as in Definition 2. If the permuting matcher match(D, R, P) exists, then it is the bijection mapping x i to y i for i = 1, . . . , n, where x 1 , . . . , x n (resp. y 1 , . . . , y n ) is an ordering of the set D (resp. R) such that, for any i < j, the variable x i occurs free for the first time in t (resp. s) before the first occurrence of x j with respect to the depth-first pre-order traversal of t (resp. s).
Altogether, this proves the first part of the lemma. The second part of the lemma relies on the first part. Given a term t (resp. s), we can close it, adding λ-bindings for all free variables in D (resp. R) in the same order in which they appear for the first time in the term w.r.t. the depth-first pre-order traversal. Then, we can transform it into de Bruijn form. Both processes can be done in linear time. Then, to check whether the two closing sequences (sequences of λ-bindings) yield a permuting matcher for the two terms, we only need to check, in linear time, whether the de Bruijn forms of the closed representations are equal.
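The reduction just described can be sketched in code: free variables from D are numbered by first occurrence, as if λ-closed outside the term, while bound variables get ordinary de Bruijn indices; two terms then admit a permuting matcher iff the resulting forms are equal. The tuple encoding of terms and the function names are ours.

```python
# De Bruijn conversion of a "closed" term: free variables from D get
# indices by first-occurrence order, offset past all enclosing binders,
# mimicking λ-closure outside the term; bound variables get their usual
# distance-to-binder index. Constants keep their names.

def de_bruijn(t, D, env=None, free=None):
    env = [] if env is None else env      # innermost binder first
    free = [] if free is None else free   # first-occurrence order of D
    if t[0] == "lam":
        return ("lam", de_bruijn(t[2], D, [t[1]] + env, free))
    head = t[1]
    if head in env:
        idx = env.index(head)             # bound: distance to its binder
    elif head in D:
        if head not in free:
            free.append(head)
        idx = len(env) + free.index(head) # free: past all local binders
    else:
        idx = head                        # constant: keep the name
    return ("app", idx, [de_bruijn(a, D, env, free) for a in t[2]])

v = lambda x: ("app", x, [])
t = ("app", "f", [v("x"), ("lam", "z", ("app", "g", [v("z"), v("y")]))])
s = ("app", "f", [v("u"), ("lam", "w", ("app", "g", [v("w"), v("v")]))])
# {x -> u, y -> v} is a permuting matcher: the de Bruijn forms agree.
assert de_bruijn(t, {"x", "y"}) == de_bruijn(s, {"u", "v"})
```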
However, we still have the problem that this check would have to be repeated for every pair of AUPs in the store; a naive implementation of this process would result in an algorithm quadratic in the size of the store. Instead, this can be done in quasi-linear time using the following result. If the numbers are unbounded, then we can use a similar idea without hash functions. We use a trie, i.e., a binary tree T such that, for any i = 1, . . . , m, T contains a node at position n_i with label i. Starting with the empty tree, we add all nodes necessary to ensure that n_i is a position in the tree. This can be done in time linear in the representation of n_i, i.e., O(log n_i) = O(|n_i|). Then, we add label i to the corresponding node, in time linear in the representation of i. At the end, the sets of labels represent the subsets of equal numbers in S.
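The trie idea can be illustrated with a short sketch. This is an assumed encoding for demonstration only: numbers are given as binary strings in a canonical form (no leading zeros), and `partition_equal` is a hypothetical helper name.

```python
# Hedged sketch: partition a multiset of numbers, given in binary,
# into subsets of equal values by inserting each one into a binary
# trie. Each insertion costs O(|bits|), so the whole partitioning is
# linear in the total representation size, with no hash functions.

def partition_equal(numbers):
    """numbers: list of canonical binary strings; returns the index
    sets of equal values, read off from the trie's label sets."""
    root = {}                        # children stored under '0'/'1'
    for i, bits in enumerate(numbers):
        node = root
        for b in bits:               # walk/extend the path for n_i
            node = node.setdefault(b, {})
        node.setdefault('labels', []).append(i)   # attach label i
    groups, stack = [], [root]
    while stack:                     # collect all label sets
        node = stack.pop()
        if 'labels' in node:
            groups.append(node['labels'])
        stack.extend(child for key, child in node.items()
                     if key in ('0', '1'))
    return groups
```

Two indices end up in the same group exactly when their bit strings coincide, i.e., when the numbers are equal.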
Using the same ideas as Hinze [17], the previous lemma can be generalized to the case where, instead of numbers written in binary, we have other objects represented over a fixed signature. In our case, we will use the algorithm described in the proof of Lemma 5 to find subsets of α-equivalent λ-terms. For this purpose, we translate the λ-terms into their de Bruijn form and then represent them by their pre-order traversal sequences.
We are now able to establish the complexity of the third phase of the algorithm. Here we assume that every de Bruijn index requires only constant space to be represented. Assuming that the remaining symbols also require constant space, we can represent the terms by their pre-order traversal sequences; such a sequence has size linear in the size of the term. By Lemma 5, we can partition the store S into subsets of equal terms, i.e., subsets related by permuting matchers. Then, applying rule Mer to all AUPs of each subset, we remove all but one representative of the subset. Each application of the rule costs time linear in the size of the removed equation, since the side condition of the rule has already been checked.
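The merging step admits a compact sketch. For simplicity, this illustration groups AUPs by a serialization key via a dictionary, standing in for the trie-based partitioning described above; `merge_store` and the `(X, t, s)` triple encoding are assumptions made for the example, not the paper's implementation.

```python
# Sketch of the third phase: partition the store into classes of
# AUPs whose serialized (closed de Bruijn, pre-order) forms coincide
# and keep one representative per class, recording a renaming for
# the merged generalization variables (the effect of rule Mer).
# `serialize` is an assumed helper producing the pre-order traversal
# sequence of a side's closed de Bruijn form.

def merge_store(store, serialize):
    """store: list of (X, t, s) AUPs; returns (kept, renaming),
    where renaming maps each merged variable to its representative."""
    classes = {}                 # serialization key -> representative
    kept, renaming = [], {}
    for X, t, s in store:
        key = (serialize(t), serialize(s))
        if key in classes:
            renaming[X] = classes[key]   # merge into representative
        else:
            classes[key] = X
            kept.append((X, t, s))
    return kept, renaming
```

Each AUP is visited once and the work per AUP is linear in its serialized size, which is consistent with the stated complexity of the phase.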
From Lemmas 1, 2, 3 and 6 we can conclude the following result.
Theorem 5 (Complexity of P) Computing the least general higher-order pattern generalization of two simply typed lambda-terms in η-long β-normal form has linear time complexity in the size of the input.

Remark 3
The previous result is only valid under the assumption that all pointers, de Bruijn indices, and representations of symbols require only constant space. Without these assumptions, we could prove that our anti-unification algorithm, like all traditional unification algorithms, is in fact quasi-linear. One could argue that in traditional unification algorithms the input terms are also represented as trees encoded with pointers; hence the input size is also quasi-linear in the number of symbols of the terms, like the complexity of the algorithm; therefore the algorithm would be quasi-linear in the number of symbols but linear in the size of the input. However, this is not true. As Jacobsen proves, for trees with a constant number of distinct nodes there exist succinct representations that require only space linear in the number of nodes [20]. In such a representation, even accessing a node's child requires logarithmic time. Since in complexity theory inputs are assumed to be represented succinctly, even the traditional tree-traversal algorithm strictly requires quasi-linear time.

Conclusion and Final Remarks
We designed an algorithm for computing higher-order pattern generalizations of simply typed lambda-terms. The algorithm does not impose any restriction on the input except requiring the terms to be of the same type and in η-long β-normal form. The computed pattern is a least general generalization of the input terms and is unique modulo free-variable renaming and α-equivalence. It is computed in time linear in the size of the input (under the usual conventions made in the unification literature on the space requirements for encoding symbols and pointers, and on the complexity of basic operations on them). One can observe that the set of transformations P used in the paper can be adapted with relatively little effort to work on untyped terms (cf. the formulation of the unification algorithm for both untyped and simply typed patterns [30]). One thing to be added is lazy η-expansion: an AUP of the form X(x⃗): λy.t ≜ h(s_1, . . . , s_m) should be transformed into X(x⃗): λy.t ≜ λz.h(s_1, . . . , s_m, z) for a fresh z (dually for abstractions in the right-hand side). In addition, Sol needs an extra condition for the case when Head(t) = Head(s) but the terms have different numbers of arguments, as, e.g., in f(a, x) and f(b, x, y). Note that the complexity of the algorithm remains linear in the untyped case, since the size enlargements caused by lazy η-expansion are bounded by the size of the original problem in such a way that, in the worst case, summing up all enlargements would only double the size.
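The lazy η-expansion step for the untyped adaptation can be sketched as follows. This is a minimal illustration under the same assumed nested-tuple term shape used above; `eta_expand_if_needed` and `fresh_var` are hypothetical helpers, not part of the paper's transformation set.

```python
# Hedged sketch of lazy η-expansion: when one side of an AUP is an
# abstraction and the other a head application, the application side
# is wrapped in an abstraction over a fresh variable, which is also
# appended to its argument list. Term shape
# ('var', x) | ('lam', x, body) | ('app', head, args) is assumed.
import itertools

_fresh = itertools.count()

def fresh_var():
    """Generate a variable name assumed not to occur elsewhere."""
    return f"z{next(_fresh)}"

def eta_expand_if_needed(t, s):
    """Given the two sides t and s of an AUP, η-expand an application
    side whenever the other side is an abstraction, so that both
    sides become abstractions."""
    def expand(u):
        z = fresh_var()
        head, args = u[1], list(u[2])
        return ('lam', z, ('app', head, tuple(args + [('var', z)])))
    if t[0] == 'lam' and s[0] == 'app':
        return t, expand(s)
    if s[0] == 'lam' and t[0] == 'app':
        return expand(t), s
    return t, s
```

Each expansion adds only a constant-size wrapper per missing argument, which is why the total enlargement stays bounded as argued above.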
The anti-unification algorithm has been implemented (both for simply typed and untyped terms, without perfect hashing, using a simpler but more expensive method to compute permuting matchers) in Java as part of an open-source anti-unification library [4]. It can be used online or downloaded freely from http://www.risc.jku.at/projects/stout/software/hoau.php.
As for related topics, we mention nominal anti-unification. Several authors have explored the relationship between nominal terms and higher-order patterns (see, e.g., [12,14,23,24] among others), proposing translations between them in the context of unification. However, it is not immediately clear how to reuse those translations for anti-unification, in particular how to obtain nominal generalizations from pattern generalizations. Therefore, we proposed a direct algorithm for nominal anti-unification [6].
Studying anti-unification in calculi with more complex type systems, such as the extension of System F with subtyping, F<: [11], would be a very interesting direction for future work, as it may have applications in clone detection and refactoring for functional programming languages in the ML family.