Extending a High-Performance Prover to Higher-Order Logic

Abstract. Most users of proof assistants want more proof automation. Some proof assistants discharge goals by translating them to first-order logic and invoking an efficient prover on them, but much is lost in translation. Instead, we propose to extend first-order provers with native support for higher-order features. Building on our extension of E to λ-free higher-order logic, we extend E to full higher-order logic. The result is the strongest prover on benchmarks exported from a proof assistant.


Introduction
In the last few decades, proof assistants have become indispensable tools for developing trustworthy formal proofs. They are used both in academia to verify mathematical theories [14] and in industry to verify the correctness of hardware [18] and software [13,19,21]. However, due to the lack of strong built-in proof automation, proving seemingly simple goals can be a tedious manual task. To mitigate this, many proof assistants include a subsystem such as CoqHammer, HOL(y)Hammer, or Sledgehammer [7] that translates higher-order goals to first-order logic and passes them to efficient first-order automatic provers. If a first-order prover succeeds, the proof is reconstructed and the goal is closed.
Unfortunately, the translation of higher-order constructs is clumsy and leads to poor performance on goals that require higher-order reasoning. Using native higher-order provers such as Satallax [8] as backends is not always a good solution because they are much less efficient than their first-order counterparts [32]. To bridge this gap, in 2016 we proposed to develop a new generation of higher-order provers that extend the arguably most successful first-order calculus, superposition, to higher-order logic, starting from a position of strength.
Our research has focused on three milestones: supporting λ-free higher-order logic, adding λ-terms, and adding first-class Boolean terms. In 2019, we extended the state-of-the-art first-order prover E [29] with a λ-free superposition calculus [37], obtaining a version of E called Ehoh, as a stepping stone towards full higher-order logic. Together with Bentkamp, Tourret, and Waldmann, we have since developed calculi, called λ-superposition, corresponding to the other two milestones [2,3] and implemented them in the experimental superposition prover Zipperposition [11]. This OCaml prover is not nearly as efficient as E. Nevertheless, it won the higher-order division of the CASC prover competition [34] in 2020, 2021, and 2022, ending nearly a decade of Satallax domination.
In this paper, we fulfill a three-year-old promise: We present the extension of Ehoh to full higher-order logic (Sect. 2) based on incomplete variants of λ-superposition. We call this prover λE. In λE's implementation, we used the extensive experience with Zipperposition to choose a set of effective rules that could easily be retrofitted into an originally first-order prover. Another principle that guided the design of λE was gracefulness: We made sure that our changes do not impact the strong first-order performance of E and Ehoh.
One of the main challenges we faced was retrofitting λ-terms in Ehoh's term representation (Sect. 3). Furthermore, Ehoh's inference engine assumes that inferences compute a most general unifier. We implemented a higher-order unification procedure [36] that can return multiple unifiers (Sect. 4) and integrated it in the inference engine. Finally, we extended and adapted the superposition rule, resulting in an incomplete, pragmatic variant of λ-superposition (Sect. 5).
We evaluated λE on a selection of proof assistant benchmarks as well as all higher-order theorems in the TPTP library [33] (Sect. 6). λE outperformed all other higher-order provers on the proof assistant benchmarks; on the TPTP benchmarks, it ended up second only to the cooperative version of Zipperposition, which employs Ehoh as a backend. An arguably fairer comparison without the backend puts λE in first place for both benchmark suites. We also compared the performance of λE with E on first-order problems and found that no overhead has been introduced by the extension to higher-order logic.
λE is part of the E prover's development repository and will be part of E 3.0. It can be enabled by passing the option --enable-ho to the configure script. The source code of E and λE is freely available online.

Logic
Our target logic is monomorphic classical higher-order logic with Hilbert choice.
We let s t̄ₙ stand for s t₁ … tₙ and λx̄ₙ. s for λx₁. … λxₙ. s. Every β-normal term can be written as λx̄ₘ. s t̄ₙ, where s is not an application; we call s the head of the term. If s is a free variable, we call the term flex; otherwise, the term is rigid. A term of type o, where o is the distinguished Boolean type, is called a formula. A type whose type is of the form
Logical symbols are part of the signature and may thus occur within terms. We write them in bold:
On top of the terms, we define some clausal structure. This structure is needed by λ-superposition. A literal l is an equation s ≈ t or a disequation s ≉ t. A clause is a finite multiset of literals, interpreted and written disjunctively: l₁ ∨ ⋯ ∨ lₙ. Notice that the clause-level operators are not set in bold. Predicate literals are encoded as (dis)equations with ⊤ based on their sign; for example, even(x) is encoded as even(x) ≈ ⊤, and ¬ even(x) as even(x) ≉ ⊤.

Terms
E is designed around perfect term sharing [22], a principle that we kept in Ehoh and λE: Any two structurally identical terms are guaranteed to be the same object in memory. This is achieved through term cells, which represent individual terms. Each cell has (among other fields) (1) f_code, an integer corresponding to the symbol at the head of the term (negative if the head is a free variable, positive otherwise); (2) num_args, corresponding to the number of arguments applied to the head; and (3) args, an array of size num_args of pointers to argument terms. We use the notation f(s₁, …, sₙ) to denote a cell whose f_code corresponds to f, num_args equals n, and args points to the cells for s₁, …, sₙ.
Ehoh represents λ-free higher-order terms using a flattened, spine notation. Thus, the terms f, f a, and f a b are represented by the cells f, f(a), and f(a, b). To ensure that free variables are perfectly shared, Ehoh treats applied free variables differently: Arguments are not applied directly to a free variable, but using an internal symbol @ of variable arity. For example, the term X a b is represented by the cell @(X, a, b). This ensures that two different occurrences of the free variable X correspond to the same object, which makes substitutions more efficient [37].
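Perfect sharing of this kind is commonly obtained by hash-consing. The following Python sketch illustrates the idea under simplifying assumptions: the names Cell, TermBank, insert, and apply_var, as well as the code 0 chosen for @, are hypothetical and do not reflect E's actual C data structures.

```python
class Cell:
    """A term cell: head code plus argument cells (illustrative only)."""
    __slots__ = ("f_code", "args")

    def __init__(self, f_code, args):
        self.f_code = f_code
        self.args = args

    @property
    def num_args(self):
        return len(self.args)


class TermBank:
    """Hypothetical term bank that returns one shared Cell per distinct term."""

    APP = 0  # assumed code for the internal symbol @

    def __init__(self):
        self._table = {}

    def insert(self, f_code, *args):
        # Arguments are already shared, so the identity of the argument
        # objects is enough to identify the term structurally.
        key = (f_code, tuple(id(a) for a in args))
        cell = self._table.get(key)
        if cell is None:
            cell = Cell(f_code, args)
            self._table[key] = cell
        return cell

    def apply_var(self, var, *args):
        # X a b is stored as @(X, a, b), so X itself stays a single cell.
        return self.insert(self.APP, var, *args)


bank = TermBank()
F, A, B, X = 1, 2, 3, -1  # positive: symbols; negative: free variables
a, b, x = bank.insert(A), bank.insert(B), bank.insert(X)
t1 = bank.insert(F, a, b)
t2 = bank.insert(F, a, b)
xab = bank.apply_var(x, a, b)
```

Here t1 and t2 are the same object, and the first argument of xab is the unique cell for X, so binding X once affects every occurrence.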
Representation of λ-Terms. To support full higher-order logic, Ehoh's λ-free cell data structure must be extended to support the λ binder. We use the locally nameless representation [10]: De Bruijn indices represent (possibly loose) bound variables, whereas we keep the current representation for free variables.
Extending the term representation of Ehoh with a new term kind involves intricate manipulation of the cell data structure. De Bruijn indices must be represented like other cells with either a negative or a positive f_code, but the code must clearly identify that the cell is a De Bruijn index.
Other than possibly being instantiated during β-reduction, De Bruijn indices mostly behave as constants. Therefore, we choose to represent De Bruijn indices using positive f_codes: The De Bruijn index of value i will have i as the f_code. To ensure De Bruijn indices are not mistaken for function symbols, we use the properties bitfield of the cell, which holds precomputed properties of the cell. We introduce the property IsDBVar to denote that the cell represents a De Bruijn index. De Bruijn indices are systematically created through a dedicated function that sets the IsDBVar property. When given the same De Bruijn index and type, this function always returns the same object. Finally, we guard all the functions and macros that manipulate function codes to check if the property IsDBVar is set. To ensure perfect sharing of De Bruijn indices, arguments to De Bruijn indices are applied like for free variables, using @.
Extending cells to support λ-abstraction is easier. Each λ-abstraction has the distinguished function code LAM as the head symbol and two arguments: (1) a De Bruijn index 0 of the type of the abstracted variable; (2) the body of the λ-abstraction. Consider the term λx. λy. f x x, where both x and y have the type ι. This term is represented as λ λ f 1 1 in locally nameless representation, where bold numbers represent De Bruijn indices. In λE, the same term is represented by the cell LAM(0, LAM(0, f(1, 1))), where all De Bruijn variables have type ι.
The first argument of LAM is redundant, since it can be deduced from the type of the λ-abstraction. However, basic λ-term manipulation operations often require access to this term. We store it explicitly to avoid creating it repeatedly.
A term can be β-reduced as follows: When a cell @(LAM(0, s), t) is encountered, the field binding (normally used to record the substitution for a free variable) of the cell 0 is set to t. Then s is traversed to instantiate every loose occurrence of 0 in s with binding, whose loose De Bruijn indices are shifted by the number of λ binders above the occurrence of 0 in s [17]. Next, this procedure is applied to the resulting term and its subterms, in leftmost outermost fashion.
λE's basic β-normalization works in this way, but it features a few optimizations. First, given a term of the form (λx̄ₙ. s) t̄ₙ, λE replaces the bound variables xᵢ with tᵢ in parallel. By avoiding the construction of intermediate terms, this reduces the number of recursive function calls and calls to the cell allocator.
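To make the role of index shifting concrete, here is a minimal Python sketch of β-normalization over De Bruijn terms. It is an illustration under stated assumptions, not λE's implementation: terms are plain tuples rather than shared cells, the function names are hypothetical, binders are substituted one at a time rather than in parallel, and input terms are assumed to be simply typed so that normalization terminates.

```python
# Term encoding (plain tuples, not λE's shared cells):
#   ("db", i)       De Bruijn index i
#   ("lam", body)   λ-abstraction
#   ("app", f, a)   application of f to a
#   ("sym", name)   constant or free variable


def shift(t, d, cutoff=0):
    """Add d to every loose De Bruijn index of t (indices >= cutoff)."""
    tag = t[0]
    if tag == "db":
        return ("db", t[1] + d) if t[1] >= cutoff else t
    if tag == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    if tag == "app":
        return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))
    return t


def subst(t, depth, s):
    """Instantiate the index bound by the removed binder with s inside t."""
    tag = t[0]
    if tag == "db":
        if t[1] == depth:
            # s moves under `depth` binders, so its loose indices shift up.
            return shift(s, depth)
        # Indices pointing past the removed binder drop by one.
        return ("db", t[1] - 1) if t[1] > depth else t
    if tag == "lam":
        return ("lam", subst(t[1], depth + 1, s))
    if tag == "app":
        return ("app", subst(t[1], depth, s), subst(t[2], depth, s))
    return t


def beta_normalize(t):
    """Return the β-normal form of a simply typed term t."""
    if t[0] == "lam":
        return ("lam", beta_normalize(t[1]))
    if t[0] == "app":
        f = beta_normalize(t[1])
        if f[0] == "lam":
            return beta_normalize(subst(f[1], 0, t[2]))
        return ("app", f, beta_normalize(t[2]))
    return t


f, a = ("sym", "f"), ("sym", "a")
# (λx. λy. f x y) a
redex = ("app", ("lam", ("lam", ("app", ("app", f, ("db", 1)), ("db", 0)))), a)
normal = beta_normalize(redex)
```

The result normal is the encoding of λy. f a y. Substituting a term with a loose index, as in (λx. λy. x) applied to the loose index 0, exercises the shifting step and yields λy. 1.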
Second, in line with the gracefulness principle, we want λE to incur little (or no) overhead on first-order problems and to excel on higher-order problems with a large first-order component. If β-reduction is implemented naively, finding a β-redex involves traversing the entire term. On purely first-order terms, β-reduction is then a complete waste of time. To avoid this, we use Ehoh's perfectly shared terms and their properties field. We introduce the property HasBetaReducibleSubterm, which is set if a cell is β-reducible. Whenever a new cell that contains a β-reducible term as a direct subterm is shared, the property is set. Setting of the property is inductively continued when further superterms are shared. For example, in the term t = f a (g ((λx. x) a)), the cells for (λx. x) a, g ((λx. x) a), and t itself have the property HasBetaReducibleSubterm set. When it needs to find β-reducible subterms, λE will visit only the cells with this property set. This further means that on first-order subterms, a single bit masking operation is enough to determine that no subterm should be visited.
Along similar lines, we introduce a property HasDBSubterm that caches whether the cell contains a De Bruijn subterm. This makes instantiating De Bruijn indices during β-normalization faster, since only the subterms that contain De Bruijn indices must be visited. Similarly, some other operations such as shifting De Bruijn indices or determining whether a term is closed (i.e., it contains no loose bound variables) can be sped up or even avoided if the term is first-order.
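The following Python sketch shows how such a cached property could be computed when a cell is shared and then used to prune traversals. The names Bank, share, redexes, and the bit HAS_BETA_REDEX are hypothetical stand-ins; the LAM cell is simplified to a single argument (the body), unlike the two-argument cell described above.

```python
HAS_BETA_REDEX = 1 << 0  # stand-in for the HasBetaReducibleSubterm property


class Cell:
    def __init__(self, head, args, props):
        self.head, self.args, self.props = head, args, props


def is_redex(head, args):
    """A cell @(LAM(...), t, ...) is a β-redex."""
    return head == "@" and len(args) > 0 and args[0].head == "LAM"


class Bank:
    def __init__(self):
        self.table = {}

    def share(self, head, *args):
        key = (head, tuple(id(a) for a in args))
        if key not in self.table:
            props = 0
            # The flag is computed once, from the direct subterms only.
            if is_redex(head, args) or any(a.props & HAS_BETA_REDEX for a in args):
                props |= HAS_BETA_REDEX
            self.table[key] = Cell(head, args, props)
        return self.table[key]


def redexes(cell):
    """Collect β-redexes, skipping every cell whose flag is clear."""
    if not cell.props & HAS_BETA_REDEX:
        return []  # a single bit test rules out the whole subterm
    found = [cell] if is_redex(cell.head, cell.args) else []
    for arg in cell.args:
        found += redexes(arg)
    return found


bank = Bank()
a = bank.share("a")
ident = bank.share("LAM", bank.share(0))  # λx. x, body is De Bruijn index 0
redex = bank.share("@", ident, a)         # (λx. x) a
g = bank.share("g", redex)                # g ((λx. x) a)
t = bank.share("f", a, g)                 # f a (g ((λx. x) a))
```

As in the example from the text, the flag is set on redex, g, and t, but not on a, so a traversal of t never descends into the first-order argument a.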
Efficient η-Reduction. The term λx. s x is η-reduced to s whenever x does not occur unbound in s. We use the observation that a term cannot be η-reduced if it has no λ-abstraction subterms and introduce a property HasLambda that notes the presence of λ-abstraction in a term. Only terms with this property are visited during η-reduction.
λE performs parallel η-reduction: It recognizes terms of the form λx̄ₙ. s x̄ₙ such that none of the xᵢ occurs unbound in s. If done naively, reducing terms of this kind requires up to n traversals of s to check if each xᵢ occurs in s. In λE, exactly one traversal of s is required. More precisely, when η-reducing a cell LAM(0, s), λE considers all λ binders in s as well. In general, the cell will be of the form LAM(0, …, LAM(0, t) …), where t is not a λ-abstraction, and l is the number of LAM symbols above t. Then λE breaks the body t down into a maximal decomposition u (n − 1) … 1 0. If n = 0, the cell is not η-reducible. Otherwise, u is traversed to determine the minimal index j of a loose De Bruijn index, taking j = ∞ if no such index exists. λE can then remove the k = min{j, l, n} rightmost outermost λ binders in LAM(0, …, LAM(0, t) …) and replace t by the variant of u (n − 1) … (k + 1) k obtained by shifting the loose De Bruijn indices down by k.
Parallel η-reduction both speeds up η-reduction and avoids creating intermediate terms. For finding the minimal loose De Bruijn index, optimizations such as the HasDBSubterm property are used.
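The computation of k = min{j, l, n} can be sketched in Python as follows. This is an illustrative reading of the description above, not λE's code: terms are tuples in spine form, the function names are hypothetical, only the top-level λ prefix is reduced, and the removed binders are taken to be the k binders closest to the body (those referenced by the indices 0, …, k − 1).

```python
import math

# Term encoding in spine form (plain tuples):
#   ("lam", body)
#   ("app", head, args)  head is an int (De Bruijn index) or a str (symbol),
#                        args is a tuple of terms; a bare head has args == ().


def db(i):
    return ("app", i, ())


def min_loose(t, depth=0):
    """Smallest loose De Bruijn index in t, or infinity if there is none."""
    if t[0] == "lam":
        return min_loose(t[1], depth + 1)
    _, head, args = t
    best = head - depth if isinstance(head, int) and head >= depth else math.inf
    for arg in args:
        best = min(best, min_loose(arg, depth))
    return best


def shift_down(t, k, depth=0):
    """Subtract k from every loose De Bruijn index of t."""
    if t[0] == "lam":
        return ("lam", shift_down(t[1], k, depth + 1))
    _, head, args = t
    if isinstance(head, int) and head >= depth:
        head -= k
    return ("app", head, tuple(shift_down(arg, k, depth) for arg in args))


def eta_reduce_top(term):
    """Remove as many binders of the top-level λ prefix as η allows."""
    l, body = 0, term
    while body[0] == "lam":          # count the λ prefix
        l, body = l + 1, body[1]
    _, head, args = body
    n = 0                            # maximal suffix (n-1) ... 1 0
    while n < len(args) and args[len(args) - 1 - n] == db(n):
        n += 1
    if n == 0:
        return term
    u = ("app", head, args[:len(args) - n])
    j = min_loose(u)                 # the single traversal of u
    k = min(j, l, n)
    if k == 0:
        return term
    reduced = shift_down(("app", head, args[:len(args) - k]), k)
    for _ in range(l - k):           # rebuild the remaining binders
        reduced = ("lam", reduced)
    return reduced


full = eta_reduce_top(("lam", ("lam", ("app", "f", (db(1), db(0))))))
partial = eta_reduce_top(("lam", ("lam", ("app", "f", (db(1), db(1), db(0))))))
```

Here full is the encoding of f (from λx. λy. f x y), and partial is the encoding of λx. f x x (from λx. λy. f x x y), where only one binder can be removed because u = f 1 contains the loose index 1.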
Representation of Boolean Terms. E and Ehoh represent Boolean terms using cells whose f_codes are reserved for logical symbols. Quantified formulas are represented by cells in which the first argument is the quantified variable and the second one is the body of the quantified formula. For example, the term ∀x. p x corresponds to the cell ∀(X, p(X)), where X is a free variable. This representation is convenient for parsing and clausification, which is what E and Ehoh use it for, but in full higher-order logic, it is problematic during proof search: Booleans can occur as subterms in clauses, as in q(X) ∨ p(∀(X, r(X))), and instantiating X in the first literal should not affect X in the second literal.
To avoid this issue, in λE we use λ binders to represent quantified formulas. Thus, ∀x. s is represented by ∀ (λx. s). Quantifiers are then unary symbols that do not directly bind the variables. Since λE represents bound variables using De Bruijn indices, this solves the α-conversion issues. However, this solution is incompatible with thousands of lines of decades-old clausification code that assume the E representation of quantified formulas. Therefore, λE converts quantified formulas only after clausification, for Boolean terms that occur in a higher-order context (e.g., as argument to a function symbol).
New Term Orders. The λ-superposition calculus is parameterized by a term order that is used to break symmetries in the search space. We implemented the versions of the Knuth-Bendix order (KBO) and lexicographic path order (LPO) for higher-order terms described by Bentkamp et al. [2]. These orders encode λ-terms as first-order terms and then invoke the standard KBO or LPO. For efficiency, we implemented separate KBO and LPO functions that compute the order directly, intertwining the encoding and the order computation.
Ehoh cells contain a binding field that can be used to store the substitution for a free variable. Substitutions can then be applied by following the binding pointers, replacing each free variable with its instance. Thus, when Ehoh needs to perform a KBO or LPO comparison of an instantiated term, it needs only follow the binding pointers. In full higher-order logic, however, instantiating a variable can trigger a chain of βη-reductions, changing the shape of the term dramatically. To prevent this, λE computes the βη-reduced instances of the terms before comparing them using KBO or LPO.

Unification, Matching, and Term Indexing
Standard superposition crucially depends on the concept of a most general unifier (MGU). In higher-order logic, such a unifier does not always exist, and the concept is replaced by that of a complete set of unifiers (CSU), which may be infinite. Vukmirović et al. [36] designed an efficient procedure to enumerate a CSU for a term pair. It is implemented in Zipperposition, together with some extensions to term indexing. In λE, we further improve the performance of this procedure by implementing a terminating, incomplete variant. We also introduce a new indexing data structure.
The Unification Procedure. The unification procedure works by maintaining a list of unification pairs to be solved. After choosing a pair, it first normalizes it by β-reducing and instantiating the heads of both terms in the pair. Then, if either head is a variable, it computes an appropriate binding for this variable, thereby approximating the solution.
Unlike in first-order and λ-free higher-order unification, in the full higher-order case there may be many bindings that lead to a solution. To reduce this mostly blind guessing of bindings, the procedure features support for oracles [36]. These are procedures that solve the unification problem for a subclass of higher-order terms on which unification is decidable and, for λE, unary. Oracles help increase performance, avoid nontermination, and avoid redundant bindings.
Vukmirović et al. described their procedure as a transition system. In λE, the procedure is implemented nonrecursively, and the unifiers are enumerated using an iterator object that encapsulates the state of the unifier search. The iterator consists of five fields: (1) constraints, which holds the unification constraints; (2) bt_state, a stack that contains information necessary to backtrack to a previous state; (3) branch_iter, which stores how far we are in exploring different possibilities from the current search node; (4) steps, which remembers how many different unification bindings (such as imitation, projection, and identification) are applied; and (5) subst, a stack storing the variables bound so far.
The iterator is initialized to hold the original problem in constraints, and all other fields are initially empty. The unifiers are retrieved one by one by calling the function ForwardIter. It returns True if the iterator made progress, in which case the unifier can be read via the iterator's subst field. Otherwise, no more unifiers can be found, and the iterator is no longer valid. The function's pseudocode is given below, including two auxiliary functions:
ForwardIter begins by backtracking if the previous attempt was successful (i.e., all constraints were solved). If it finds a state from which it can continue, it takes term pairs from constraints until there are no more constraints or it is determined that no unifier exists. The terms are normalized by instantiating the head variable with its binding and reducing the potential top-level β-redex that might appear. This instantiation and reduction process is repeated until there are no more top-level β-redexes and the head is not a variable bound to some term. Then the term with the shorter λ prefix is expanded (only on the top level) so that both λ prefixes have the same length. Finally, the λ prefix is ignored, and we focus only on the body. In this way, we avoid fully substituting and normalizing terms and perform just enough operations to determine the next step of the procedure.
If either term of the constraint is flex, we first invoke oracles to solve the constraint. λE implements the most efficient oracles implemented in Zipperposition: fixpoint and pattern [36, Sect. 6]. An oracle can return three results: (1) Unifiable: there is an MGU for the pair, which is recorded in subst, and the next pair in constraints is tried; (2) NotUnifiable: no MGU exists for the pair, which causes the iterator to backtrack; (3) NotInFragment: the pair does not belong to the subclass that the oracle can solve, in which case we generate possible variable bindings, that is, we guess the approximate form of the solution.
λE has a dedicated module that generates bindings (NextBinding). This module is given the current constraint and the values of branch_iter and steps, and it either returns the next binding and the new values of branch_iter and steps or reports that all different variable bindings are exhausted. The bindings that λE's unification procedure creates are imitation, Huet-style projection, identification, and elimination (one argument at a time) [36, Sect. 3]. A limit on the total number of applied binding rules can be set, as well as a limit on the number of individual rule applications. The binding module checks whether limits are reached using the iterator's steps field.
Computing bindings is the only point in the procedure where the search tree branches and different possibilities are explored. Thus, when λE follows the branch indicated by the binding module, it records the state to which it needs to return should the followed branch be backtracked. The state consists of the values of constraints, steps, and subst before the branch is followed and the value of branch_iter that points past the followed branch. The values of branch_iter are BindBegin, which denotes that no binding was created; intermediate values that NextBinding uses to remember how far through the bindings it is; and BindEnd, which indicates that all bindings are exhausted.
If all bindings are exhausted, the procedure checks whether the pair is flex-flex and both sides have the same head. If so, the pair is decomposed and constraints are derived from the pair's arguments; otherwise, the iterator backtracks. If the pair is rigid-rigid, for unification to succeed, the heads of both sides must be the same. Unification then continues with new constraints derived from the arguments. Otherwise, the iterator must be backtracked.
Matching. In E, the matching algorithm is mostly used inside simplification rules such as demodulation and subsumption [26]. As these rules must be efficiently performed, using a complex matching algorithm is not viable. Instead, we provide a matching algorithm for the pattern class of terms [24] to complement Ehoh's λ-free higher-order matching algorithm [37, Sect. 4]. A term is a pattern if each of its free variables either has no arguments (as in first-order logic) or is applied to distinct De Bruijn indices.
To determine which of the two algorithms to call (pattern or λ-free), we introduce a cached property HasNonPatternVar, which is set for terms of the form X s̄ₙ where n > 0 and either there exists some sᵢ that is not a De Bruijn index or there exist indices i < j such that sᵢ = sⱼ is a De Bruijn index. This property is propagated to the superterms when they are perfectly shared. This allows later checks of whether a term belongs to the pattern class to be performed in constant time.
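The condition defining HasNonPatternVar can be written down directly. The Python sketch below recomputes it recursively on a tuple encoding of terms, whereas λE caches it as a property bit when cells are shared; the encoding and the function names are assumptions made for illustration.

```python
# Term encoding (plain tuples):
#   ("var", name, args)   free variable applied to args (possibly none)
#   ("sym", name, args)   function symbol applied to args
#   ("db", index, args)   De Bruijn index applied to args
#   ("lam", body)         λ-abstraction


def is_bare_db(t):
    """True for a De Bruijn index that is not itself applied."""
    return t[0] == "db" and not t[2]


def has_non_pattern_var(t):
    """True if some applied free variable violates the pattern restriction."""
    if t[0] == "lam":
        return has_non_pattern_var(t[1])
    tag, _, args = t
    if tag == "var" and args:
        if not all(is_bare_db(arg) for arg in args):
            return True                      # some argument is not an index
        indices = [arg[1] for arg in args]
        return len(set(indices)) != len(indices)  # a repeated index
    return any(has_non_pattern_var(arg) for arg in args)


def d(i):
    return ("db", i, ())


a = ("sym", "a", ())
pattern = ("lam", ("lam", ("sym", "g", (("var", "X", (d(0), d(1))),))))
repeated = ("lam", ("sym", "g", (("var", "X", (d(0), d(0))),)))
applied_to_const = ("var", "X", (a,))
```

With these definitions, pattern is in the pattern class, whereas repeated and applied_to_const are not.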
We modify the λ-free higher-order matching algorithm to treat λ prefixes as above in the unification procedure, by bringing the prefixes to the same length and ignoring them afterwards. This ensures that the algorithm will never try to match a free variable with a λ-abstraction, making sure that β-redexes never appear. We also modify the algorithm to ensure that free variables are never bound to terms that have loose bound variables. This algorithm cannot find many complex matching substitutions (matchers), but it can efficiently determine whether two terms are variable renamings of each other or whether a simple matcher can be used, as in the case of (X (λx. x) b, f (λx. x) b), where X ↦ f is usually the desired matcher. If this algorithm does not find a matcher and both terms are patterns, pattern matching is tried.
Indexing. E, like other modern theorem provers, efficiently retrieves unifiable or matchable pairs of terms using indexing data structures. To find terms unifiable with a query term or instances of a query term, it uses fingerprint indexing [27]. Vukmirović et al. extended this data structure to support full higher-order terms in Zipperposition [36, Sect. 6]. We use the same approach in λE, and we extend feature vector indices [28] in the same way. E uses perfect discrimination trees [23] to find generalizations of the query term (i.e., terms of which the query term is an instance). This data structure is a trie that indexes terms by representing them in a serialized, flattened form. The left branch from the root in Figure 1 shows how the first-order terms f a X and f a a are stored. In Ehoh, this data structure is extended to support partial application and applied variables [37].
Fig. 1. First-order, λ-free higher-order, and higher-order pattern terms in a perfect discrimination tree
In λE, we extend this structure to support λ-abstractions and the higher-order pattern matching algorithm. To this end, we change the way in which terms are serialized. First, we require that all terms are fully η-expanded (except for arguments of variables applied in patterns). Then, when the term is serialized, we use a single node for applied variable terms X s̄ₙ, instead of a node for X followed by nodes for the arguments s̄ₙ. We serialize a λ-abstraction λx. s using a dedicated node LAM τ, where τ is the type of x, followed by the serialization of s. Other than these changes, serialization remains as in Ehoh, following the gracefulness principle. Figure 1 shows how g (X a b) c and h (λx. λy. X y x) are serialized. Since the terms are stored in serialized form, it is hard to manipulate λ prefixes of stored terms during matching. Performing η-expansion when serializing terms ensures that matchable terms have λ prefixes of the same length.
We have dedicated separate nodes for applied variables because access to arguments of applied variables is necessary for the pattern matching algorithm. Even though arguments can be obtained by querying the arity n of the variable and taking the next n arguments in the serialization, this is both inefficient and inelegant. As for De Bruijn indices, we treat them the same as function symbols.
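The serialization scheme just described can be sketched in Python as follows. The tuple encoding, the function name serialize, and the type ι given to the bound variables are assumptions for illustration, and the input is assumed to be η-expanded already; the exact node labels used in λE's trees may differ.

```python
# Term encoding (plain tuples):
#   ("var", name, args)    possibly applied free variable
#   ("sym", name, args)    function symbol applied to args
#   ("db", index, args)    De Bruijn index applied to args
#   ("lam", type, body)    λ-abstraction over a variable of the given type


def serialize(t):
    """Flatten a term into the sequence of node labels stored in the trie."""
    tag = t[0]
    if tag == "lam":
        # Dedicated node LAM τ, followed by the serialized body.
        return [("LAM", t[1])] + serialize(t[2])
    if tag == "var":
        # One node for the whole applied variable X s_1 ... s_n.
        return [t]
    # Symbols and De Bruijn indices: the head, then each argument in turn.
    nodes = [(tag, t[1])]
    for arg in t[2]:
        nodes += serialize(arg)
    return nodes


a, b, c = (("sym", name, ()) for name in "abc")
x_ab = ("var", "X", (a, b))
first = ("sym", "g", (x_ab, c))                    # g (X a b) c
x_yx = ("var", "X", (("db", 0, ()), ("db", 1, ())))
second = ("sym", "h", (("lam", "ι", ("lam", "ι", x_yx)),))  # h (λx. λy. X y x)
```

Serializing first yields three nodes (g, the applied variable X a b, and c), and serializing second yields h, two LAM ι nodes, and a single node for X 0 1.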
Following the notation from the extension of perfect discrimination trees to λ-free higher-order logic [37], we now describe how enumeration of generalizations is performed. To traverse the tree, λE begins at the root node and maintains two stacks: term_stack and term_proc, where term_stack contains the subterms of the query term that have to be matched, and term_proc contains processed terms that are used to backtrack to previous states. Initially, term_stack contains the query term, the current matching substitution σ is empty, and the successor node is chosen among the child nodes as follows:
A. If the node is labeled with a symbol ξ (where ξ is either a De Bruijn index or a constant) and the top item t of term_stack is of the form ξ t̄ₙ, replace t by n new items t₁, …, tₙ, and push t onto term_proc.
B. If the node is labeled with a symbol LAM τ and the top item t of term_stack is of the form λx. s and the type of x is τ, replace t by s, and push t onto term_proc.
C. If the node is labeled with a possibly applied variable X s̄ₙ (where n ≥ 0), and the top item of term_stack is t, the matching algorithm described above is run on X s̄ₙ and t. The algorithm takes into account σ built so far and extends it if necessary. If the algorithm succeeds, pop t from term_stack, push it onto term_proc, and save the original value of σ in the node.
Backtracking works in the opposite direction: If the current node is labeled with a De Bruijn index or function symbol node of arity n, pop n terms from term_stack and move the top of term_proc to term_stack. If the node is labeled with LAM τ, pop the top of term_stack and move the top of term_proc to term_stack. Finally, if the node is labeled with a possibly applied variable, move the top of term_proc to term_stack and restore the value of σ.
As an example of how finding a generalization works, consider the following states of stacks and substitutions, which emerge when looking for generalizations of g (f a b) c in the tree of Figure 1:

Preprocessing, Calculus, and Extensions
Ehoh's simple λ-free higher-order calculus performed well on Sledgehammer problems and formed a promising stepping stone to full higher-order logic [37]. When implementing support for full higher-order logic, we were guided by efficiency and gracefulness with respect to Ehoh's calculus rather than completeness. Whereas Zipperposition provides both complete and incomplete modes, λE only offers incomplete modes.
Preprocessing. Our experience with Zipperposition showed the importance of flexibility in preprocessing the higher-order problems [35]. Therefore, we implemented a flexible preprocessing module in λE.
To maintain compatibility with Ehoh, λE can optionally transform all λ-abstractions into named functions. This process is called λ-lifting [16]. λE also removes all occurrences of Boolean subterms (other than ⊤, ⊥, and free variables) in higher-order contexts using a FOOL-like transformation [20]. For example, the formula f(p
Many TPTP problems use the definition role to identify the definitions of symbols. λE can treat definition axioms as rewrite rules, and replace all occurrences of defined symbols during preprocessing. Furthermore, during SInE [15] axiom selection, it can always include the defined symbol in the trigger relation.
Calculus. λE implements the same superposition calculus as Ehoh with three important changes. First, wherever Ehoh requires the MGU of terms, λE enumerates unifiers from a finite subset of the CSU, as explained in Sect. 4. Second, λE uses versions of the KBO and LPO orders designed for λ-terms.
The third difference is more subtle. One of the main features of Ehoh is prefix optimization [37, Sect. 1]: a method that, given a demodulator s ≈ t, makes it possible to replace both applied and unapplied occurrences of s by t by traversing only the first-order subterms of a rewritable term. In a λ-free setting, this optimization is useful, but in the presence of βη-normalization, the shapes of terms can change drastically, making it much harder to track prefixes of terms. This is why we disable the prefix optimization in λE. To compensate for losing this optimization, we introduce the argument congruence rule AC in λE and enable positive and negative functional extensionality (PE and NE) by default:
AC and NE assume that s and t are of function type. In NE, X denotes all the free variables occurring in s and t, and sk is a fresh Skolem symbol of the appropriate type. PE has a side condition that X may not occur in s, t, or C.
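For orientation, the side conditions stated above are consistent with the following usual formulation of the three rules. This is offered as an assumption based on how argument congruence and functional extensionality are commonly stated for superposition calculi, not as the authors' exact definitions; in particular, the fresh variable X in AC and the tuple notation \bar{X} in NE are notational choices made here.

```latex
\[
\frac{s \approx t \,\lor\, C}
     {s\,X \approx t\,X \,\lor\, C}\;\mathrm{AC}
\qquad
\frac{s\,X \approx t\,X \,\lor\, C}
     {s \approx t \,\lor\, C}\;\mathrm{PE}
\qquad
\frac{s \not\approx t \,\lor\, C}
     {s\,(\mathsf{sk}\;\bar{X}) \not\approx t\,(\mathsf{sk}\;\bar{X}) \,\lor\, C}\;\mathrm{NE}
\]
```

Read this way, AC lets an equation between functions act on applied occurrences, PE recovers an equation between functions from a pointwise one, and NE turns a disequation between functions into a disequation at a Skolem witness.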
Saturation. E's saturation procedure assumes that each attempt to perform an inference will either result in a single clause or fail due to one of the inference side conditions. Unification procedures that produce multiple substitutions break this invariant, and the saturation procedure needed to be adjusted.
For Zipperposition, Vukmirović et al. developed a variant of the saturation procedure that interleaves computing unifiers and scheduling inferences to be performed [35]. Since completeness was not a design goal for λE, we did not implement this version of the saturation procedure. Instead, in places where previously a single unifier was expected, λE consumes all elements of the iterator used for enumerating unifiers, converting them into clauses.
Reasoning about Formulas.Even though most of the Boolean structure is removed during preprocessing, formulas can reappear at the top level of clauses during saturation.For example, after instantiating X with λx.λy.x∧ ∧ ∧y, the clause X p q ∨ a ≈ b becomes (p ∧ ∧ ∧ q) ∨ a ≈ b. λE converts every clause of the form ϕ ∨ C, where ϕ has a logic symbol as its head, or it is a (dis)equation between two formulas different than , to an explicitly quantified formula.Then, the clausification algorithm is invoked on the formula to restore the clausal structure.Zipperposition features more dynamic clausification modes, but for simplicity we decided not to implement them in λE.
The λ-superposition calculus for full higher-order logic [2] includes many rules that act on Boolean subterms, which are necessary for completeness.Other than Boolean simplification rules, which use simple tautologies such as p ∧ ∧ ∧ ↔ ↔ ↔ p to simplify terms, we have implemented none of the Boolean rules of this calculus in λE.First, we have observed that complicated rules such as FluidBoolHoist and FluidLoobHoist are hardly ever useful in practice and usually only contribute to an uncontrolled increase in the proof state size.Second, simpler rules such as BoolHoist can usually be simulated by pragmatic rules that perform Boolean extensionality reasoning, described below.
To make up for excluding Boolean rules, we use an incomplete but more easily controllable and intuitive rule, called primitive instantiation. This rule instantiates free predicate variables with approximations of formulas that are ground instances of this variable. We use the approximations described by Vukmirović and Nummelin [38, Sect. 3.3] and implement them in a similar manner.
λE's handling of the Hilbert choice operator is inspired by Leo-III's [30]. λE recognizes clauses of the form ¬ P X ∨ P (f P), which essentially denote that f is a choice symbol. Then, when a subterm f s is found during saturation, s is used to instantiate the choice axiom for f. Similarly, Leibniz equality [38] is eliminated by recognizing clauses of the form ¬ P a ∨ P b ∨ C. These clauses are then instantiated with P → λx. x ≈ a and P → λx. x ≈ b, which results in a ≈ b ∨ C.
Finally, λE treats induction axioms specially. Like Zipperposition [35, Sect. 4], it abstracts literals from the goal clauses and instantiates induction axioms with these abstractions. Since Zipperposition supports dynamic calculus-level clausification, induction axioms are instantiated during saturation, when the axioms are processed. In λE, this instantiation is performed immediately after clausification. After λE has collected all the abstractions, it traverses the clauses and instantiates those that have an applied variable of the same type as the abstraction.
Extensionality. λE takes a pragmatic approach to reasoning about functional and Boolean extensionality: It uses abstracting rules [3], which simulate basic superposition calculus rules but do not require unifiability of the partner terms in the inference. More precisely, assume a core inference needs to be performed between two β-reduced terms u and v, such that they can be represented as u = C[s₁, …, sₙ] and v = C[t₁, …, tₙ], where C is the most general (green [3]) common context of u and v, not all of sᵢ and tⱼ are free variables, and for at least one i, sᵢ ≠ tᵢ, sᵢ and tᵢ are not possibly applied free variables, and they are of Boolean or function type. Then, the conclusion is formed by taking the conclusion D of the core inference rule (which would be created if u and v were unifiable) and adding the literals s₁ ≉ t₁ ∨ ··· ∨ sₙ ≉ tₙ. These rules are particularly useful because λE has no rules that dynamically process Booleans in FOOL-like fashion, such as BoolHoist. For example, given the clauses f (p ∧ q) ≈ a and g (f p) ≈ b, the abstracting version of the superposition rule would result in g a ≈ b ∨ (p ∧ q) ≉ p. In this way, the Boolean structure bubbles up to the top level and is further processed by clausification. We noticed that this alleviates the need for the other Boolean rules in practice.
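The abstracting superposition step on this example can be sketched as follows. The sketch makes several simplifying assumptions that are not part of the calculus: terms are plain first-order-style trees, the common context is found by descending through equal heads (ignoring the restriction to green positions), type and variable side conditions are not checked, and the names `Term`, `mismatches`, `replace`, and `abstracting_sup` are hypothetical. The added literals are negative: the conclusion of the core inference holds unless some pair of mismatching arguments differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    head: str
    args: tuple = ()

    def __str__(self):
        if not self.args:
            return self.head
        if self.head == "∧":
            return f"({self.args[0]} ∧ {self.args[1]})"
        # Parenthesize applied arguments; conjunctions already print with parentheses.
        shown = (str(a) if not a.args or a.head == "∧" else f"({a})" for a in self.args)
        return self.head + " " + " ".join(shown)


def mismatches(u, v):
    """Collect the pairs (s_i, t_i) at which u and v leave their common context."""
    if u == v:
        return []
    if u.head == v.head and len(u.args) == len(v.args):
        return [pair for a, b in zip(u.args, v.args) for pair in mismatches(a, b)]
    return [(u, v)]


def replace(term, old, new):
    """Replace every occurrence of the subterm old in term by new."""
    if term == old:
        return new
    return Term(term.head, tuple(replace(a, old, new) for a in term.args))


def abstracting_sup(lhs, rhs, into_l, into_r, target):
    """Superpose lhs ≈ rhs into the subterm target of into_l ≈ into_r."""
    literals = [f"{replace(into_l, target, rhs)} ≈ {into_r}"]
    literals += [f"{s} ≉ {t}" for s, t in mismatches(lhs, target)]
    return " ∨ ".join(literals)


p, q, a, b = Term("p"), Term("q"), Term("a"), Term("b")
f_pq = Term("f", (Term("∧", (p, q)),))  # f (p ∧ q)
f_p = Term("f", (p,))                   # f p
g_fp = Term("g", (f_p,))                # g (f p)

conclusion = abstracting_sup(f_pq, a, g_fp, b, f_p)
print(conclusion)
```

The terms f (p ∧ q) and f p are not unifiable, but they share the context f □, so the sketch rewrites f p to a and records the mismatch between p ∧ q and p as an extra literal of the conclusion.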

Evaluation
We now try to answer two questions about λE: How does λE compare against other higher-order provers (including Ehoh)? Does λE introduce any overhead compared with Ehoh? To answer these questions, we ran provers on problems from the TPTP library [33] and on benchmarks generated by Sledgehammer (SH) [25]. The experiments were carried out on StarExec Miami [31] nodes equipped with Intel Xeon E5-2620 v4 CPUs clocked at 2.10 GHz. For the TPTP part, we used the CASC 2021 time limits: 120 s wall-clock and 960 s CPU. For SH benchmarks and to answer the second question, we used Sledgehammer's default time limit: 30 s wall-clock and CPU. The raw evaluation data is available online.
Comparison with Other Provers. To answer the first question, we let λE compete with the top contenders in the higher-order division of CASC 2021: cvc5 0.0.7, Ehoh 2.7 [37], Leo-III 1.6.6 [30], Vampire 4.6 [6], and Zipperposition 2.1 [35]. We also included Satallax 3.5 [8]. We used all 2899 higher-order theorems in TPTP 7.5.0 as well as 5000 SH higher-order benchmarks originating from the Seventeen benchmark suite [12]. On SH benchmarks, cvc5, Ehoh, λE, Vampire, and Zipperposition were run using custom schedules provided by their developers, optimized for single-core usage and low timeouts. Otherwise, we used the corresponding CASC configurations.
Although it focuses on λ-free higher-order logic, Ehoh 2.7 can parse full higher-order logic using λ-lifting. We included two versions of Zipperposition: coop uses Ehoh 2.7 as a backend to finish proof attempts, whereas uncoop does not use this feature. Both Ehoh and λE were run in the automatic scheduling mode. Compared to Ehoh, λE features a redesigned module for automatic scheduling, it can use multiple CPU cores, and its heuristics have been trained better on higher-order problems.
The results are shown in Figure 2. λE dramatically improves E's higher-order reasoning capabilities compared with Ehoh. It solves 20% more problems on TPTP benchmarks and 7% more problems on SH benchmarks, where Ehoh was already very successful. λE was mainly designed as an efficient backend to proof assistants. As such, it excels on SH benchmarks, outperforming the competition. On TPTP, it outperforms all higher-order provers other than Zipperposition-coop. If Zipperposition's Ehoh backend is disabled, λE outperforms Zipperposition by a wide margin. This comparison is arguably fairer; after all, λE does not use an older version of Zipperposition as a backend. These results suggest that λE already implements most of the necessary features for a high-performance higher-order prover but could benefit from the kind of fine-tuning that Zipperposition underwent over the last three years.
Remarkably, the raw evaluation data reveals that λE solves 181 SH problems and 24 TPTP problems that Zipperposition-coop does not. The lower number of uniquely solved TPTP problems is likely because Zipperposition was heavily optimized on TPTP.
Comparison with the First-Order E. Both Ehoh and λE can be compiled in a mode that disables most of the higher-order reasoning. This mode is designed for users who are interested only in E's first-order capabilities and care a lot about performance. To answer the second evaluation question, about the overhead of λE, we chose all the 1138 unique problems used at CASC from 2019 to 2021 in the first-order theorem division and ran Ehoh and λE both in this first-order (FO) mode and in higher-order (HO) mode.
We fixed a single configuration of options, because Ehoh's and λE's automatic scheduling methods could select different configurations, in which case we would be measuring the quality of the chosen configurations rather than the overhead. We chose the boa configuration [37, Sect. 7], which is the configuration most often used by E 2.2 in its automatic scheduling mode. The results are shown in Figure 3.
Counterintuitively, the higher-order versions of both provers outperform the first-order counterparts. However, the difference is so small that it can be attributed to the changes to memory layout that affect the order in which clauses are chosen. Similar effects are visible when comparing the first-order versions.
CASC Results. λE also took part in CASC 2022. In the TPTP higher-order division, λE finished second, after Zipperposition, as expected. In the Sledgehammer division, λE tied with Ehoh for first place, a disappointment. The likely explanation is that λE used a wrong configuration in this division, as we found out afterwards. We expect better performance at CASC 2023.

Discussion and Related Work
On the trajectory to λE, Bentkamp et al. developed three superposition calculi: for λ-free higher-order logic [4], for a higher-order logic with λ-abstraction but no Booleans [3], and for full higher-order logic [2]. These milestones allowed us to carefully estimate how the increased reasoning capabilities of each calculus influence its performance.
Extending first-order provers with higher-order reasoning capabilities has been attempted by other researchers as well. Barbosa et al. extended the SMT solvers CVC4 (now cvc5) and veriT to higher-order logic in an incomplete way [1]. Bhayat and Reger first extended Vampire to higher-order logic using combinatory unification [6], an incomplete approach, before they designed and implemented a complete higher-order superposition calculus based on SKBCI-combinators [5]. The advantage is that combinators can be supported as a thin layer on top of λ-free terms. This calculus is also implemented in Zipperposition. However, in informal experiments, we found that λ-superposition performs substantially better, corroborating the CASC results, so we decided to make a more profound change to Ehoh and implement λ-superposition.
Possibly the only actively maintained higher-order provers built from the bottom up as higher-order provers are Leo-III [30] and Satallax's [8] successor Lash [9]. A further overview of other traditional higher-order provers and the calculi they are based on can be found in the paper about Ehoh [37, Sect. 9].

Conclusion
In 2019, the reviewers of our Ehoh paper [37] were skeptical that extending Ehoh with support for full higher-order logic would be feasible. One of them wrote: "A potential criticism could be that this step from E to Ehoh is just extending FOL by those aspects of HOL that are easily in reach with rather straightforward extensions (none of the extensions is indeed very complicated), and that the difficult challenges of fully supporting HOL have yet to be confronted."
We ended up addressing the theoretical "difficult challenges" in other work with colleagues. In this paper, we faced the practical challenges pertaining to the extension of Ehoh's data structures and algorithms to support full higher-order logic and demonstrated that such an extension is possible. Our evaluation shows that this extension makes λE the best higher-order prover on benchmarks coming from interactive theorem proving practice, which was our goal. λE lags slightly behind Zipperposition on TPTP problems. One reason might be that Zipperposition does not assume a clausal structure and can perform subtle formula-level inferences. It would be useful to implement the same features in λE. We have also only started tuning λE's heuristics on higher-order problems.