Extending a Brainiac Prover to Lambda-Free Higher-Order Logic
Abstract
Decades of work have gone into developing efficient proof calculi, data structures, algorithms, and heuristics for first-order automatic theorem proving. Higher-order provers lag behind in terms of efficiency. Instead of developing a new higher-order prover from the ground up, we propose to start with the state-of-the-art superposition-based prover E and gradually enrich it with higher-order features. We explain how to extend the prover’s data structures, algorithms, and heuristics to \(\lambda \)-free higher-order logic, a formalism that supports partial application and applied variables. Our extension outperforms the traditional encoding and appears promising as a stepping stone towards full higher-order logic.
1 Introduction
Superposition-based provers, such as E [26], SPASS [33], and Vampire [18], are among the most successful first-order reasoning systems. They serve as backends in various frameworks, including software verifiers (Why3 [15]), automatic higher-order theorem provers (Leo-III [27], Satallax [12]), and “hammers” in proof assistants (HOLyHammer for HOL Light [17], Sledgehammer for Isabelle [21]). Decades of research have gone into refining calculi, devising efficient data structures and algorithms, and developing heuristics to guide proof search. This work has mostly focused on first-order logic with equality, with or without arithmetic.
Research on higher-order automatic provers has resulted in systems such as LEO [8], Leo-II [9], and Leo-III [27], based on resolution and paramodulation, and Satallax [12], based on tableaux. These provers feature a “cooperative” architecture, pioneered by LEO: They are full-fledged higher-order provers that regularly invoke an external first-order prover in an attempt to finish the proof quickly using only first-order reasoning. However, the first-order backend will succeed only if all the necessary higher-order reasoning has been performed, meaning that much of the first-order reasoning is carried out by the slower higher-order prover. As a result, this architecture leads to suboptimal performance on first-order problems and on problems with a large first-order component. For example, at the 2017 installment of the CADE ATP System Competition (CASC) [30], Leo-III, using E as one of its backends, proved 652 out of 2000 first-order problems in the Sledgehammer division, compared with 1185 for E on its own and 1433 for Vampire.
To obtain better performance, we propose to start with a competitive first-order prover and extend it to full higher-order logic one feature at a time. Our goal is a graceful extension, so that the system behaves as before on first-order problems, performs mostly like a first-order prover on typical, mildly higher-order problems, and scales up to arbitrary higher-order problems, in keeping with the zero-overhead principle: What you don’t use, you don’t pay for.
The three main challenges are generalizing the term representation (Sect. 3), the unification algorithm (Sect. 4), and the indexing data structures (Sect. 5). We also adapted the inference rules (Sect. 6) and the heuristics (Sect. 7). This paper explains the key ideas. Details, including correctness proofs, are given in a separate technical report [32].
A novel aspect of our work is prefix optimization. Higher-order terms contain twice as many proper subterms as first-order terms; for example, the term \(\mathsf {f}\;(\mathsf {g}\;\mathsf {a})\;\mathsf {b}\) contains not only the argument subterms \(\mathsf {g}\;\mathsf {a}\), \(\mathsf {a}\), \(\mathsf {b}\) but also the “prefix” subterms \(\mathsf {f}\), \(\mathsf {f}\;(\mathsf {g}\;\mathsf {a})\), \(\mathsf {g}\). Using prefix optimization, the prover traverses subterms recursively in a first-order fashion, considering all the prefixes of the current subterm together, at no significant additional cost. Our experiments (Sect. 8) show that Ehoh, our extended version of E, is effectively as fast as E on first-order problems and can also prove higher-order problems that do not require synthesizing \(\lambda \)-terms. As a next step, we plan to add support for \(\lambda \)-terms and higher-order unification.
2 Logic
Our logic corresponds to the intensional \(\lambda \)-free higher-order logic (\(\lambda \)fHOL) described by Bentkamp, Blanchette, Cruanes, and Waldmann [7, Sect. 2]. Another possible name for this logic would be “applicative first-order logic.” Extensionality can be obtained by adding suitable axioms [7, Sect. 3.1].
A type is either an atomic type \(\iota \) or a function type \(\tau \rightarrow \upsilon \), where \(\tau \) and \(\upsilon \) are themselves types. Terms, ranged over by s, t, u, v, are either variables \(x, y, z, \dots \), (function) symbols \(\mathsf {a}, \mathsf {b}, \mathsf {c}, \mathsf {d}, \mathsf {f}, \mathsf {g}, \ldots {}\) (often called “constants” in the higher-order literature), or binary applications \(s \; t.\) Application associates to the left, whereas \(\rightarrow \) associates to the right. The typing rules are as for the simply typed \(\lambda \)-calculus. A term’s arity is the number of extra arguments it can take; thus, if \(\mathsf {f}\) has type \(\iota \rightarrow \iota \rightarrow \iota \) and \(\mathsf {a}\) has type \(\iota \), then \(\mathsf {f}\) is binary, \(\mathsf {f}\;\mathsf {a}\) is unary, and \(\mathsf {f}\;\mathsf {a}\;\mathsf {a}\) is nullary. Terms have a unique “flattened” decomposition of the form \(\zeta \; s_1 \, \ldots \, s_m\), where \(\zeta \), the head, is a variable x or symbol \(\mathsf {f}\). We abbreviate tuples \((a_1, \ldots , a_m)\) to \(\overline{a_m}\) or \(\overline{a}\); abusing notation, we write \(\zeta \; \overline{s_m}\) for the curried application \(\zeta \; s_1 \, \dots \, s_m.\)
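The arity bookkeeping described above can be made concrete in a few lines of Python (an illustrative stand-in for the prover’s C data structures; the tuple encoding of types is ours, not E’s):

```python
# Illustrative sketch (not E's code): a type is an atomic name such as "iota"
# or a triple ("->", argument_types, result); the arity of a term is the
# arity of its head's type minus the number of arguments already applied.

def fn(*args, result):
    """Build the function type args[0] -> ... -> args[-1] -> result."""
    return ("->", tuple(args), result)

def type_arity(ty):
    """Number of arguments a term of this type can still take."""
    return len(ty[1]) if isinstance(ty, tuple) and ty[0] == "->" else 0

def term_arity(head_type, num_args):
    """Arity of a flattened term with the given head type and argument count."""
    return type_arity(head_type) - num_args

iota = "iota"
f_ty = fn(iota, iota, result=iota)     # f : iota -> iota -> iota

assert term_arity(f_ty, 0) == 2        # f is binary
assert term_arity(f_ty, 1) == 1        # f a is unary
assert term_arity(f_ty, 2) == 0        # f a a is nullary
```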
An equation \(s \approx t\) corresponds to an unordered pair of terms. A literal L is an equation or its negation. Clauses C, D are finite multisets of literals, written \(L_1 \vee \cdots \vee L_n\). E and Ehoh clausify the input as a preprocessing step.
A well-known technique to support \(\lambda \)fHOL using first-order reasoning systems is to employ the applicative encoding. Following this scheme, every \(n\)-ary symbol is converted to a nullary symbol, and application is represented by a distinguished binary symbol \(\mathsf {@}.\) For example, the \(\lambda \)fHOL term \(\mathsf {f} \; (x\; \mathsf {a}) \; \mathsf {b}\) is encoded as the first-order term \(\mathsf {@}(\mathsf {@}(\mathsf {f}, \mathsf {@}(x, \mathsf {a})), \mathsf {b}).\) However, this representation is not graceful; it clutters data structures and impacts proof search in subtle ways, leading to poorer performance, especially on large benchmarks. In our empirical evaluation, we find that for some prover modes, the applicative encoding incurs a 15% decrease in success rate (Sect. 8). For these and further reasons (Sect. 9), it is not an ideal basis for higher-order reasoning.
3 Types and Terms
The term representation is a fundamental question when building a theorem prover. Delicate changes to E’s term representation were needed to support partial application and especially applied variables. In contrast, the introduction of a higher-order type system had a less dramatic impact on the prover’s code.
Types. For most of its history, E supported only untyped first-order logic. Cruanes implemented support for atomic types for E 2.0 [13, p. 117]. Symbols \(\mathsf {f}\) are declared with a type signature: \(\mathsf {f} : \tau _1 \times \cdots \times \tau _m \rightarrow \tau .\) Atomic types are represented by integers in memory, leading to efficient type comparisons.
In \(\lambda \)fHOL, a type signature consists of types \(\tau \), in which the function type constructor \(\rightarrow \) can be nested—e.g., \((\iota \rightarrow \iota ) \rightarrow \iota \rightarrow \iota .\) A natural way to represent such types is to mimic their recursive structure using tagged unions. However, this leads to memory fragmentation, and a simple operation such as querying the type of a function’s \(i\)th argument would require dereferencing \(i\) pointers. We prefer a flattened representation, in which a type \(\tau _1 \rightarrow \cdots \rightarrow \tau _n \rightarrow \iota \) is represented by a single node labeled with \({\rightarrow }\) and pointing to the array \((\tau _1,\dots ,\tau _n,\iota ).\) Applying \(k \le n\) arguments to a function of the above type yields a term of type \(\tau _{k+1} \rightarrow \cdots \rightarrow \tau _n \rightarrow \iota \). In memory, this corresponds to skipping the first k array elements.
To speed up type comparisons, Ehoh stores all types in a shared bank and implements perfect sharing, ensuring that types that are structurally the same are represented by the same object in memory. Type equality can then be implemented as a pointer comparison.
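The flattened representation and the shared type bank can be sketched together as follows (a Python illustration under our own encoding; Ehoh itself implements this in C):

```python
# Hypothetical sketch of Ehoh's flattened, perfectly shared types. A type
# tau1 -> ... -> taun -> iota is a single node pointing to the array
# (tau1, ..., taun, iota); an atomic type is a one-element node.

_type_bank = {}

def mk_type(elems):
    """Intern a flattened type: structurally equal types share one object."""
    return _type_bank.setdefault(elems, elems)

def apply_args(ty, k):
    """Type of a head of type `ty` applied to k arguments:
    skip the first k elements of the flattened array."""
    rest = ty[k:]
    return rest[0] if len(rest) == 1 else mk_type(rest)

iota = mk_type(("iota",))
t = mk_type((iota, iota, iota))        # iota -> iota -> iota

# Thanks to perfect sharing, type equality is pointer equality:
assert apply_args(t, 1) is mk_type((iota, iota))   # iota -> iota
assert apply_args(t, 2) is iota
```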
Terms. In E, terms are represented as perfectly shared directed acyclic graphs. Each node, or cell, contains 11 fields, including f_code, an integer that identifies the term’s head symbol (if \({\ge }\;0\)) or variable (if \({<}\;0\)); arity, an integer corresponding to the number of arguments passed to the head symbol; args, an array of size arity consisting of pointers to argument terms; and binding, which possibly stores a substitution for a variable used for unification and matching.
In higher-order logic, variables may have function type and be applied, and symbols can be applied to fewer arguments than specified by their type signatures. A natural representation of \(\lambda \)fHOL terms as tagged unions would distinguish between variables x, symbols \(\mathsf {f}\), and binary applications \(s \; t.\) However, this scheme suffers from memory fragmentation and linear-time access, as with the representation of types, affecting performance on purely or mostly first-order problems. Instead, we propose a flattened representation, as a generalization of E’s existing data structures: Allow arguments to variables, and for symbols let arity be the number of actual arguments.
A side effect of the flattened representation is that prefix subterms are not shared. For example, the terms \(\mathsf {f}\;\mathsf {a}\) and \(\mathsf {f}\;\mathsf {a}\;\mathsf {b}\) correspond to the flattened cells \(\mathsf {f}(\mathsf {a})\) and \(\mathsf {f}(\mathsf {a}, \mathsf {b}).\) The argument subterm \(\mathsf {a}\) is shared, but not the prefix \(\mathsf {f}\;\mathsf {a}.\) Similarly, x and \(x\;\mathsf {b}\) are represented by two distinct cells, x() and \(x(\mathsf {b})\), and there is no connection between the two occurrences of x. In particular, despite perfect sharing, their binding fields are unconnected, leading to inconsistencies.
A potential solution would be to systematically traverse a clause and set the Open image in new window fields of all cells of the form \(x(\overline{s})\) whenever a variable x is bound, but this would be inefficient and inelegant. Instead, we implemented a hybrid approach: Variables are applied by an explicit application operator \(@\), to ensure that they are always perfectly shared. Thus, \(x\;\mathsf {b}\;\mathsf {c}\) is represented by the cell \(@(x, \mathsf {b}, \mathsf {c})\), where x is a shared subcell. This is graceful, since variables never occur applied in firstorder terms. The main drawback of this technique is that some normalization is necessary after substitution: Whenever a variable is instantiated by a term with a symbol head, the \(@\) symbol must be eliminated. Applying the substitution \(\{x \mapsto \mathsf {f} \; \mathsf {a}\}\) to the cell \(@(x, \mathsf {b}, \mathsf {c})\) must produce the cell \(\mathsf {f}(\mathsf {a}, \mathsf {b}, \mathsf {c})\) and not \(@(\mathsf {f}(\mathsf {a}), \mathsf {b}, \mathsf {c})\), for consistency with other occurrences of \(\mathsf {f} \; \mathsf {a} \; \mathsf {b} \; \mathsf {c}.\)
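The normalization step can be sketched as follows (an illustrative Python model of the cells, not Ehoh’s actual code; variables are plain strings and applications are (head, arguments) pairs, with the explicit operator "@" for applied variables):

```python
# Sketch of substitution with @-normalization: whenever a variable with an
# explicit application operator is instantiated by a symbol-headed term,
# the @ operator is eliminated by merging the argument lists.

def apply_subst(t, subst):
    if isinstance(t, str):                       # a variable cell
        return subst.get(t, t)
    head, args = t
    args = tuple(apply_subst(a, subst) for a in args)
    if head == "@" and not isinstance(args[0], str):
        inner_head, inner_args = args[0]         # the variable was instantiated
        if inner_head != "@":
            # eliminate @: produce a flattened symbol-headed cell
            return (inner_head, inner_args + args[1:])
        return ("@", inner_args + args[1:])      # still variable-headed
    return (head, args)

a, b, c = ("a", ()), ("b", ()), ("c", ())
cell = ("@", ("x", b, c))                        # x b c
subst = {"x": ("f", (a,))}                       # {x -> f a}

# The result is f(a, b, c), not @(f(a), b, c):
assert apply_subst(cell, subst) == ("f", (a, b, c))
```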
There is one more complication related to the binding field. In E, it is easy and useful to traverse a term as if a substitution has been applied, by following all set binding fields. In Ehoh, this is not enough, because cells must also be normalized. To avoid repeatedly creating the same normalized cells, we introduced a binding_cache field that connects a \(@(x, \overline{s})\) cell with its substitution. However, this cache can easily become stale when the binding pointer is updated. To detect this situation, we store x’s binding value in the \(@(x, \overline{s})\) cell’s binding field (which is otherwise unused). To find out whether the cache is valid, it suffices to check that the binding fields of x and \(@(x, \overline{s})\) are equal.
Term Orders. Superposition provers rely on term orders to prune the search space. To ensure completeness, the order must be a simplification order that can be extended to a simplification order that is total on variable-free terms. The Knuth–Bendix order (KBO) and the lexicographic path order (LPO) meet this criterion. KBO is generally regarded as the more robust and efficient option for superposition. E implements both. In earlier work, Blanchette and colleagues have shown that only KBO can be generalized gracefully while preserving all the necessary properties for superposition [5]. For this reason, we focus on KBO.
E implements the linear-time algorithm for KBO described by Löchner [19], which relies on the tupling method to store intermediate results, avoiding repeated computations. It is straightforward to generalize the algorithm to compute the graceful \(\lambda \)fHOL version of KBO [5]. The main difference is that when comparing two terms \(\mathsf {f} \; \overline{s_m}\) and \(\mathsf {f} \; \overline{t_n}\), because of partial application we may now have \(m \not = n\); this required changing the implementation to perform a length-lexicographic comparison of the tuples \(\overline{s_m}\) and \(\overline{t_n}.\)
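The length-lexicographic extension can be sketched as follows (illustrative Python; Löchner’s tupling machinery is omitted):

```python
# Sketch of the tuple comparison used by the lambda-free KBO: argument tuples
# of different lengths are first compared by length, then lexicographically
# using the given term comparison.

def length_lex_compare(ss, ts, cmp):
    """Return -1, 0, or 1; `cmp` compares individual terms the same way."""
    if len(ss) != len(ts):
        return -1 if len(ss) < len(ts) else 1
    for s, t in zip(ss, ts):
        c = cmp(s, t)
        if c != 0:
            return c
    return 0

# With integers standing in for terms:
int_cmp = lambda a, b: (a > b) - (a < b)
assert length_lex_compare((1, 2), (1, 2, 3), int_cmp) == -1   # shorter tuple first
assert length_lex_compare((2, 1), (1, 9), int_cmp) == 1       # then lexicographic
assert length_lex_compare((1, 2), (1, 2), int_cmp) == 0
```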
4 Unification and Matching
Syntactic unification of \(\lambda \)fHOL terms has a definite first-order flavor. It is decidable, and most general unifiers (MGUs) are unique up to variable renaming. For example, the unification constraint \(\mathsf {f}\;(y\;\mathsf {a}) \overset{?}{=} y\;(\mathsf {f}\;\mathsf {a})\) has the MGU \(\{y \mapsto \mathsf {f}\}\), whereas in full higher-order logic it would admit infinitely many independent solutions of the form \(\{ y \mapsto \lambda x.\; \mathsf {f} \; (\mathsf {f} \; (\cdots (\mathsf {f} \; x)\cdots )) \}.\) Matching is a special case of unification where only the variables on the left-hand side can be instantiated.
An easy but inefficient way to implement unification and matching for \(\lambda \)fHOL is to apply the applicative encoding (Sect. 2), perform first-order unification or matching, and decode the result. Instead, we propose to generalize the first-order unification and matching procedures to operate directly on \(\lambda \)fHOL terms.
We present our unification procedure as a transition system, generalizing Baader and Nipkow [3]. A unification problem consists of a finite set S of unification constraints \(s_i \overset{?}{=} t_i\), where \(s_i\) and \(t_i\) are of the same type. A problem is in solved form if it has the form \(\{x_1 \overset{?}{=} t_1, \ldots , x_n \overset{?}{=} t_n\}\), where the \(x_i\)’s are distinct and do not occur in the \(t_j\)’s. The corresponding unifier is \(\{ x_1 \mapsto t_1, \ldots , x_n \mapsto t_n \}.\) The transition rules attempt to bring the input constraints into solved form.

Delete \(\quad \{s \overset{?}{=} s\} \uplus S \Longrightarrow S\)

Decompose \(\quad \{\mathsf {f}\;\overline{s_m} \overset{?}{=} \mathsf {f}\;\overline{t_m}\} \uplus S \Longrightarrow S \cup \{s_1 \overset{?}{=} t_1, \ldots , s_m \overset{?}{=} t_m\}\)

DecomposeX \(\quad \{x\;\overline{s_m} \overset{?}{=} u\;\overline{t_m}\} \uplus S \Longrightarrow S \cup \{x \overset{?}{=} u,\; s_1 \overset{?}{=} t_1, \ldots , s_m \overset{?}{=} t_m\}\) if x and u have the same type and \(m \ge 1\)

Orient \(\quad \{\mathsf {f}\;\overline{s} \overset{?}{=} x\;\overline{t}\} \uplus S \Longrightarrow S \cup \{x\;\overline{t} \overset{?}{=} \mathsf {f}\;\overline{s}\}\)

OrientXY \(\quad \{x\;\overline{s_m} \overset{?}{=} y\;\overline{t_n}\} \uplus S \Longrightarrow S \cup \{y\;\overline{t_n} \overset{?}{=} x\;\overline{s_m}\}\) if \(m > n\)

Eliminate \(\quad \{x \overset{?}{=} t\} \uplus S \Longrightarrow \{x \overset{?}{=} t\} \cup \{x \mapsto t\}(S)\) if \(x \in \mathcal {V} ar (S) \setminus \mathcal {V} ar (t)\)
The Delete, Decompose, and Eliminate rules are essentially as for first-order terms. The Orient rule is generalized to allow applied variables and complemented by a new OrientXY rule. DecomposeX, also a new rule, can be seen as a variant of Decompose that analyzes applied variables; the term u may be an application.
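To make the rules concrete, the following Python sketch applies them eagerly to flattened terms (our own illustrative encoding, not Ehoh’s C implementation; a head is treated as a variable iff its name starts with an uppercase letter, and the occurs check of Eliminate is built in):

```python
# Unification of flattened lambda-free HOL terms, following the transition
# rules. Terms are (head, args) pairs.

def is_var(head):
    return head[:1].isupper()          # assumption of this sketch

def occurs(v, t):
    head, args = t
    return head == v or any(occurs(v, a) for a in args)

def substitute(t, subst):
    """Apply subst, flattening instantiated variable heads (normalization)."""
    head, args = t
    args = tuple(substitute(a, subst) for a in args)
    if head in subst:
        h, inner = substitute(subst[head], subst)
        return (h, inner + args)
    return (head, args)

def unify(s, t):
    """Return an MGU as a dict, or None if the constraint is unsolvable."""
    constraints, subst = [(s, t)], {}
    while constraints:
        s, t = constraints.pop()
        s, t = substitute(s, subst), substitute(t, subst)
        if s == t:
            continue                                    # Delete
        (sh, sa), (th, ta) = s, t
        if sh == th:                                    # Decompose
            if len(sa) != len(ta):
                return None
            constraints += list(zip(sa, ta))
            continue
        if not is_var(sh) and not is_var(th):
            return None                                 # head clash
        # Orient / OrientXY: put a variable head with at most as many
        # arguments on the left
        if not is_var(sh) or (is_var(th) and len(ta) < len(sa)):
            (sh, sa), (th, ta) = (th, ta), (sh, sa)
        m = len(sa)
        if len(ta) < m:
            return None
        # DecomposeX + Eliminate: bind the variable to t's prefix and
        # unify the trailing arguments pairwise
        prefix = (th, ta[:len(ta) - m])
        if occurs(sh, prefix):
            return None
        subst[sh] = prefix
        constraints += list(zip(sa, ta[len(ta) - m:]))
    return subst

a = ("a", ())
# f (y a) =? y (f a) has the MGU {y -> f}:
s = ("f", (("Y", (a,)),))
t = ("Y", (("f", (a,)),))
assert unify(s, t) == {"Y": ("f", ())}
```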
During proof search, E repeatedly needs to test a term s for unifiability not only with some other term t but also with t’s subterms. Prefix optimization speeds up this test: The subterms of t are traversed in a first-order fashion; for each such subterm \(\zeta \; \overline{t_n}\), at most one prefix \(\zeta \; \overline{t_k}\), with \(k \le n\), is possibly unifiable with s, by virtue of their having the same arity. Using this technique, Ehoh is virtually as efficient as E on first-order terms.
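The arity argument can be made explicit: a prefix \(\zeta \; \overline{t_k}\) has arity \(\mathit{arity}(\zeta ) - k\), so for it to be unifiable with s the two arities must coincide, fixing \(k = \mathit{arity}(\zeta ) - \mathit{arity}(s)\). A sketch (with a hypothetical helper name):

```python
# Sketch of the prefix-optimization arity computation: for a subterm with an
# n-argument head zeta of arity `head_arity`, the only prefix possibly
# unifiable with a query term of arity `query_arity` takes
# k = head_arity - query_arity arguments, provided 0 <= k <= n.

def candidate_prefix_length(head_arity, n, query_arity):
    k = head_arity - query_arity
    return k if 0 <= k <= n else None   # None: no prefix can match

# g : iota -> iota -> iota (binary). For subterm g a b and a unary query term,
# the only candidate prefix is g a (k = 1):
assert candidate_prefix_length(2, 2, 1) == 1
# For a nullary (fully applied) query, the whole term g a b is the candidate:
assert candidate_prefix_length(2, 2, 0) == 2
# No prefix of g a b has arity 3:
assert candidate_prefix_length(2, 2, 3) is None
```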
5 Indexing Data Structures
Superposition provers like E work by saturation. Their main loop heuristically selects a clause and searches for potential inference partners among a possibly large set of other clauses. Mechanisms such as simplification and subsumption also require locating terms in a large clause set. For example, when E derives a new equation \(s \approx t\), if s is larger than t according to the term order, it will rewrite all instances \(\sigma (s)\) of s to \(\sigma (t)\) in existing clauses.
To avoid iterating over all terms (including subterms) in large clause sets, superposition provers store the potential inference partners in indexing data structures. A term index stores a set of terms S. Given a query term t, a query returns all terms \(s \in S\) that satisfy a given retrieval condition: \(\sigma (s) = \sigma (t)\) (s and t are unifiable), \(\sigma (s) = t\) (s generalizes t), or \(s = \sigma (t)\) (s is an instance of t), for some substitution \(\sigma .\) Perfect indices return exactly the subset of terms satisfying the retrieval condition. In contrast, imperfect indices return a superset of eligible terms, and the retrieval condition needs to be checked for each candidate.
E relies on two term indexing data structures, perfect discrimination trees [20] and fingerprint indices [24], that needed to be generalized to \(\lambda \)fHOL. It also uses feature vector indices [25] to speed up clause subsumption and related techniques, but these require no changes to work with \(\lambda \)fHOL clauses.
E uses perfect discrimination trees for finding generalizations of query terms. For example, if the query term is \(\mathsf {g}(\mathsf {a}, \mathsf {a})\), it would follow the path \(\mathsf {g}.\mathsf {a}.\mathsf {a}\) in the tree \(D_1\) and return \(\{\mathsf {g}(\mathsf {a}, \mathsf {a})\}.\) For \(D_2\), it would also explore paths labeled with variables, binding them as it proceeds, and return \(\{ \mathsf {g}(\mathsf {a}, \mathsf {a}){,}\; \mathsf {g}(y, \mathsf {a}){,}\; \mathsf {g}(y, x){,}\; x \}.\)
The data structure relies on the observation that the serialization of a term is unambiguous. Conveniently, this property also holds for \(\lambda \)fHOL terms. Assume that two distinct \(\lambda \)fHOL terms yield the same serialization. Clearly, they must disagree on parentheses; one will have the subterm \(s\; t\; u\) where the other has \(s\; (t\; u).\) However, these two subterms cannot both be well typed.
When generalizing the data structure to \(\lambda \)fHOL, we face a slight complication due to partial application. First-order terms can be stored only in leaf nodes, but in Ehoh we must also be able to represent partially applied terms, such as \(\mathsf {f}\), \(\mathsf {g}\), or \(\mathsf {g}\;\mathsf {a}\) (assuming, as above, that \(\mathsf {f}\) is unary and \(\mathsf {g}\) is binary). Conceptually, this can be solved by storing a Boolean on each node indicating whether it is an accepting state. In the implementation, the change is more subtle, because several parts of E’s code implicitly assume that only leaf nodes are accepting.
The main difficulty specific to \(\lambda \)fHOL concerns applied variables. To enumerate all generalizing terms, E needs to backtrack from child to parent nodes. To achieve this, it relies on two stacks that store subterms of the query term: term_stack stores the terms that must be matched in turn against the current subtree, and term_proc stores, for each node from the root to the current subtree, the corresponding processed term, including any arguments yet to be matched.
A. If the node is labeled with a symbol \(\mathsf {f}\) and the top item t of term_stack is \(\mathsf {f}(\overline{t_n})\), replace t by n new items \(t_1,\dots ,t_n\), and push t onto term_proc.

B. If the node is labeled with a variable x, pop the top item t of term_stack and push it onto term_proc. There are two subcases: If x is already bound, check that \(\sigma (x) = t\); otherwise, extend \(\sigma \) so that \(\sigma (x) = t.\)
The goal is to reach an accepting node. If the query term and all the terms stored in the tree are firstorder, term_stack will then be empty, and the entire query term will have been matched.
Backtracking works in reverse: Pop a term t from term_proc; if the current node is labeled with an nary symbol, discard term_stack’s topmost n items; finally, push t onto term_stack. Variable bindings must also be undone.
To adapt the procedure to \(\lambda \)fHOL, the key idea is that an applied variable is not very different from an applied symbol. A node labeled with an \(n\)-ary symbol or variable \(\zeta \) matches a prefix \(t'\) of the \(k\)-ary term t popped from term_stack and leaves \(n - k\) arguments \(\overline{u}\) to be pushed back, with \(t = t' \; \overline{u}.\) If \(\zeta \) is a variable, it must be bound to the prefix \(t'.\) Backtracking works analogously: Given the arity n of the node label \(\zeta \) and the arity k of the term t popped from term_proc, we discard the topmost \(n - k\) items \(\overline{u}\) from term_stack.
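A single matching step of this generalization can be sketched as follows (illustrative Python with list-based stacks; the uppercase-variable convention and helper names are ours, not E’s):

```python
# Sketch of the generalized matching step: a node labeled with an n-ary
# symbol or variable matches a prefix of the k-ary term t popped from
# term_stack, pushing the n - k trailing arguments back for later matching.

def is_var(head):
    return head[:1].isupper()       # assumption of this sketch

def match_step(node_label, node_arity, t, t_arity, term_stack, term_proc, sigma):
    head, args = t
    push_back = node_arity - t_arity            # n - k arguments u
    if push_back < 0 or push_back > len(args):
        return False
    prefix = (head, args[:len(args) - push_back])
    if is_var(node_label):
        if sigma.setdefault(node_label, prefix) != prefix:
            return False                        # conflicts with earlier binding
    elif head != node_label or push_back != len(args):
        return False                            # a symbol node matches the bare head
    # push the remaining arguments, leftmost on top, and record t for backtracking
    term_stack.extend(reversed(args[len(args) - push_back:]))
    term_proc.append(t)
    return True

a, b = ("a", ()), ("b", ())
stack, proc, sigma = [], [], {}
# A node labeled with the unary variable X, matched against the nullary
# term f a b, binds X to the prefix f a and pushes b back:
assert match_step("X", 1, ("f", (a, b)), 0, stack, proc, sigma)
assert sigma == {"X": ("f", (a,))}
assert stack == [b]
```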
Fingerprint Indices. Fingerprint indices [24] trade perfect indexing for a compact memory representation and more flexible retrieval conditions. The basic idea is to compare terms by looking only at a few predefined sample positions. If we know that term s has symbol \(\mathsf {f}\) at the head of the subterm at 2.1 and term t has \(\mathsf {g}\) at the same position, we can immediately conclude that s and t are not unifiable.
A fingerprint index is a trie that stores a term set T keyed by fingerprint. The term \(\mathsf {f}(\mathsf {g}(x), \mathsf {g}(\mathsf {a}))\) above would be stored in the node addressed by \(\mathsf {f}.\mathsf {g}.\mathsf {g}.{\textsf {A}}.{\textsf {N}}.\mathsf {a}.{\textsf {N}}\), possibly together with other terms that share the same fingerprint. This organization makes it possible to unify or match a query term s against all the terms T in one traversal. Once a node storing the terms \(U \subseteq T\) has been reached, due to overapproximation we must apply unification or matching on s and each \(u \in U.\)
When adapting this data structure to \(\lambda \)fHOL, we must first choose a suitable notion of position in a term. Conventionally, higher-order positions are strings over \(\{1, 2\}\) indicating, for each binary application \(t_1\;t_2\), which term \(t_i\) to follow. Given that this is not graceful, it seems preferable to generalize the first-order notion to flattened \(\lambda \)fHOL terms—e.g., \((x\; \mathsf {a}\; \mathsf {b})|_{1} = \mathsf {a}\) and \((x\; \mathsf {a}\; \mathsf {b})|_{2} = \mathsf {b}.\) However, this approach fails on applied variables. For example, although \(x \; \mathsf {b}\) and \(\mathsf {f} \; \mathsf {a} \; \mathsf {b}\) are unifiable (using \(\{ x \mapsto \mathsf {f} \; \mathsf {a} \}\)), sampling position 1 would yield a clash between \(\mathsf {b}\) and \(\mathsf {a}.\) To ensure that positions remain stable under substitution, we propose to number arguments in reverse: \(t|_{\epsilon } = t\) and \((\zeta \; t_n \, \ldots \, t_1)|_{i.p} = t_{i}|_{p}\) if \(1 \le i \le n\).
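Reverse numbering can be sketched as follows (illustrative Python on (head, arguments) pairs; the helper name is ours):

```python
# Sketch of reverse argument numbering: position i selects the i-th argument
# counting from the right, so positions survive instantiation of an applied
# variable head.

def subterm_at(t, pos):
    """Subterm of flattened term t at reversed position pos (a tuple of ints)."""
    for i in pos:
        head, args = t
        t = args[len(args) - i]     # argument i, counted from the right
    return t

a, b = ("a", ()), ("b", ())
xb = ("X", (b,))                    # x b
fab = ("f", (a, b))                 # f a b

# Position 1 denotes the last argument in both terms, before and after
# applying {x -> f a}:
assert subterm_at(xb, (1,)) == b
assert subterm_at(fab, (1,)) == b
assert subterm_at(fab, (2,)) == a
```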
We can easily support prefix optimization for both terms s and t being compared: We ensure that s and t are fully applied, by adding enough fresh variables as arguments, before computing their fingerprints.
6 Inference Rules
Saturating provers try to show the unsatisfiability of a set of clauses by systematically adding logical consequences (up to simplification and redundancy), eventually deriving the empty clause as an explicit witness of unsatisfiability. They employ two kinds of inference rules: generating rules produce new clauses and are necessary for completeness, whereas simplification rules delete existing clauses or replace them by simpler clauses. This simplification is crucial for success, and most modern provers spend a large part of their time on simplification.
Ehoh implements essentially the same logical calculus as E, except that it is generalized to \(\lambda \)fHOL terms. The standard inference rules and completeness proof of superposition can be reused verbatim; the only changes concern the basic definitions of terms and substitutions [7, Sect. 1].
In each rule, \(\sigma \) denotes the MGU of s and \(s'.\) Not shown are order- and selection-based side conditions that restrict the rules’ applicability.
Equality resolution and factoring (ER and EF) work on entire terms that occur on either side of a literal occurring in the given clause. To generalize them, it suffices to disable prefix optimization for our unification algorithm. By contrast, the rules for superposition into negative and positive literals (SN and SP) are more complex. As two-premise rules, they require the prover to find a partner for the given clause. There are two cases to consider.
To cover the case where the given clause acts as the left premise, the prover relies on a fingerprint index to compute a set of clauses containing terms possibly unifiable with a side s of a positive literal of the given clause. Thanks to our generalization of fingerprints, in Ehoh this candidate set is guaranteed to overapproximate the set of all possible inference partners. The unification algorithm is then applied to filter out unsuitable candidates. Thanks to prefix optimization, we can avoid gracelessly polluting the index with all prefix subterms.
For the case where the given clause is the right premise, the prover traverses its subterms \(s'\) looking for inference partners in another fingerprint index, which contains only entire left- and right-hand sides of equalities. Like E, Ehoh traverses subterms in a first-order fashion. If prefix unification succeeds, Ehoh determines the unified prefix and applies the appropriate inference instance.
E maintains a perfect discrimination tree that stores clauses of the form \(s \approx t\) indexed by s and t. When applying the ES (equality subsumption) rule, E considers each literal \(u \approx v\) of the given clause in turn. It starts by taking the left-hand side u as a query term. If an equation \(s \approx t\) (or \(t \approx s\)) is found in the tree, with \(\sigma (s) = u\), the prover checks whether \(\sigma '(t) = v\) for some extension \(\sigma '\) of \(\sigma \). If so, ES is applicable. To consider nonempty contexts, the prover traverses the subterms \(u'\) and \(v'\) of u and v in lockstep, as long as they appear under identical contexts. Thanks to prefix optimization, when Ehoh is given a subterm \(u'\), it can find an equation \(s \approx t\) in the tree such that \(\sigma (s)\) is equal to some prefix of \(u'\), with n arguments \(\overline{u_n}\) remaining unmatched. Checking for equality subsumption then amounts to checking that \(v' = \sigma '(t) \; \overline{u_n}\), for some extension \(\sigma '\) of \(\sigma \).
For example, let \(\mathsf {f} \; (\mathsf {g} \; \mathsf {a} \; \mathsf {b}) \approx \mathsf {f} \; (\mathsf {h} \; \mathsf {g} \; \mathsf {b})\) be the given clause, and suppose that \(x \; \mathsf {a} \approx \mathsf {h} \; x\) is indexed. Under the context \(\mathsf {f}\>[ {~}]\), Ehoh considers the subterms \(\mathsf {g} \; \mathsf {a} \; \mathsf {b}\) and \(\mathsf {h} \; \mathsf {g} \; \mathsf {b}\). It finds the prefix \(\mathsf {g} \; \mathsf {a}\) of \(\mathsf {g} \; \mathsf {a} \; \mathsf {b}\) in the tree, with \(\sigma = \{x \mapsto \mathsf {g}\}\). The prefix \(\mathsf {h} \; \mathsf {g}\) of \(\mathsf {h} \; \mathsf {g} \; \mathsf {b}\) matches the indexed equation’s right-hand side \(\mathsf {h} \; x\) using the same substitution, and the remaining argument in both subterms, \(\mathsf {b}\), is identical.
7 Heuristics
E’s heuristics are largely independent of the prover’s logic and work unchanged for Ehoh. On first-order problems, Ehoh’s behavior is virtually the same as E’s. Yet, in preliminary experiments, we observed that some \(\lambda \)fHOL benchmarks were proved quickly by E in conjunction with the applicative encoding (Sect. 2) but timed out with Ehoh. Based on these observations, we extended the heuristics.
Term Order Generation. The inference rules and the redundancy criterion are parameterized by a term order (Sect. 3). E can generate a symbol weight function (for KBO) and a symbol precedence (for KBO and LPO) based on criteria such as the symbols’ frequencies and whether they appear in the conjecture.
In preliminary experiments, we discovered that the presence of an explicit application operator \(@\) can be beneficial for some problems. With the applicative encoding, generation schemes can take the symbols \(\mathsf {@}_{\tau ,\upsilon }\) into account, effectively exploiting the type information carried by such symbols. To simulate this behavior, we introduced four generation schemes that extend E’s existing symbol-frequency-based schemes by partitioning the symbols by type. To each symbol, the new schemes assign a frequency corresponding to the sum of all symbol frequencies for its class. In addition, we designed four schemes that combine E’s type-agnostic and Ehoh’s type-aware approaches.
To generate symbol precedences, E can sort symbols by weight and use the symbol’s position in the sorted array as the basis for precedence. To account for the type information introduced by the applicative encoding, we implemented four type-aware precedence generation schemes.
Literal Selection. The side conditions of the superposition rules (SN and SP, Sect. 6) allow the use of a literal selection function to restrict the set of inference literals, thereby pruning the search space. Given a clause, a literal selection function returns a (possibly empty) subset of its literals. For completeness, any nonempty subset selected must contain at least one negative literal. If no literal is selected, all maximal literals become inference literals. The most widely used function in E is probably SelectMaxLComplexAvoidPosPred, which we abbreviate to SelectMLCAPP. It selects at most one negative literal, based on size, groundness, and maximality of the literal in the clause. It also avoids negative literals that share a predicate symbol with a positive literal in the same clause.
Clause Selection. Selection of the given clause is a critical choice point. E heuristically assigns clause priorities and clause weights to the candidates. E’s main loop visits, in roundrobin fashion, a set of priority queues. From each queue, it selects a number of clauses with the highest priorities, breaking ties by preferring smaller weights.
E provides template weight functions that allow users to fine-tune parameters such as the weights assigned to variables or function symbols. The most widely used template is ConjectureRelativeSymbolWeight. It computes term and clause weights according to eight parameters, notably conj_mul, a multiplier applied to the weight of conjecture symbols. We implemented a new type-aware template function, called ConjectureRelativeSymbolTypeWeight, that applies the conj_mul multiplier to all symbols whose type occurs in the conjecture.
Configurations and Modes. A combination of parameters—including term order, literal selection, and clause selection—is called a configuration. For years, E has provided an auto mode, which analyzes the input problem and chooses a configuration known to perform well on similar problems. More recently, E has been extended with an autoschedule mode, which applies a portfolio of configurations in sequence on the given problem. Configurations that perform well on a wide range of problems have emerged over time. One of them is the configuration that is most often chosen by E’s auto mode. We call it boa (“best of auto”).
8 Evaluation
In this section, we consider the following questions: How useful are Ehoh's new heuristics? And how does Ehoh perform compared with the previous version of E (2.2), used directly or in conjunction with the applicative encoding, and compared with other provers? To answer the first question, we evaluated each new parameter independently. From the empirical results, we derived a new configuration optimized for \(\lambda \)fHOL problems. To answer the second question, we compared Ehoh's success rate on \(\lambda \)fHOL problems with that of native higher-order provers, and with E's success rate on their applicatively encoded counterparts. We also included first-order benchmarks to measure Ehoh's overhead with respect to E.
We set a CPU time limit of 60 s per problem. The experiments were performed on StarExec [28] nodes equipped with Intel Xeon E5-2609 0 CPUs clocked at 2.40 GHz and with 8192 MB of memory. Our raw data are publicly available.^{2}

The main findings are as follows:

1. The combination of the weight generation scheme invtypefreqrank and the precedence generation scheme invtypefreq performs best.

2. The literal selection heuristics SelectMLCAPP, SelectMLCAPPPreferAppVar, and SelectMLCAPPAvoidAppVar give virtually the same results.

3. The clause selection function ConjectureRelativeSymbolTypeWeight with ConstPrio priority and an appv_mul factor of 1.41 performs best.
We derived a new configuration from boa, called hoboa, by enabling the features identified in the first and third points. Below, we present a more detailed evaluation of hoboa, along with other configurations, on a larger benchmark suite. The benchmarks are partitioned as follows: (1) 1147 first-order TPTP [29] problems belonging to the FOF (untyped) and TF0 (monomorphic) categories, excluding arithmetic; (2) 5012 Sledgehammer-generated problems from the Judgment Day [11] suite, targeting the monomorphic first-order logic embodied by TPTP TF0; (3) all 530 monomorphic higher-order problems from the TH0 category of the TPTP library belonging to the \(\lambda \)fHOL fragment; (4) 5012 Judgment Day problems targeting the \(\lambda \)fHOL fragment of TPTP TH0.
For the first group of benchmarks, we randomly chose 1000 FOF problems (out of 8172) and all monomorphic TFF problems that are parsable by E. Both groups of Sledgehammer problems include two subgroups of 2506 problems, generated to include 32 or 512 Isabelle lemmas (SH32 and SH512), to represent both smaller and larger problems arising in interactive verification. Each subgroup itself consists of two sub-subgroups of 1253 problems, generated by using either \(\lambda \)-lifting or SK-style combinators to encode \(\lambda \)-expressions.
We evaluated Ehoh against Leo-III and Satallax and against a version of E, called \(\mathsf {@}{+}\text {E}\), that first performs the applicative encoding. Leo-III and Satallax have the advantage that they can instantiate higher-order variables by \(\lambda \)-terms. Thus, some formulas that are provable by these two systems may be nontheorems for \(\mathsf {@}{+}\text {E}\) and Ehoh. A simple example is the conjecture \(\exists f.\>\forall x\> y.\; f\>x\;y \approx \mathsf {g}\;y\;x\), whose proof requires taking \(\lambda x\> y.\; \mathsf {g}\;y\;x\) as the witness for f.
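The role of the \(\lambda \)-term witness can be made concrete in Lean (a hypothetical formalization of the example above; the type \(\alpha \) and the binary function g are arbitrary):

```lean
variable {α : Type} (g : α → α → α)

-- The conjecture ∃ f. ∀ x y, f x y = g y x is proved by instantiating
-- f with the λ-term fun x y => g y x, a step available to Leo-III and
-- Satallax but not to @+E or Ehoh, which lack λ-terms.
example : ∃ f : α → α → α, ∀ x y, f x y = g y x :=
  ⟨fun x y => g y x, fun _ _ => rfl⟩
```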

Comparing the Ehoh rows with the corresponding E rows, we see that Ehoh’s overhead is barely noticeable—the difference is at most one problem. The raw evaluation data reveal that Ehoh’s time overhead is about 3.7%.

Ehoh generally outperforms the applicative encoding, on both firstorder and higherorder problems. On Sledgehammer benchmarks, the best Ehoh mode (autoschedule) clearly outperforms all \(\mathsf {@}{+}\text {E}\) modes and configurations. Despite this, there are problems that \(\mathsf {@}{+}\text {E}\) proves faster than Ehoh.

Especially on large benchmarks, the E variants are substantially more successful than Leo-III and Satallax. On the other hand, Leo-III emerges as the winner on the first-order SH32 benchmark set, presumably thanks to the combination of first-order backends (CVC4, E, and iProver) it depends on.

The new hoboa configuration outperforms boa on higher-order problems, suggesting that it could be worthwhile to retrain auto and autoschedule based on \(\lambda \)fHOL benchmarks and to design further heuristics.
9 Discussion and Related Work
Most higher-order provers were developed from the ground up. Two exceptions are Otter-\(\lambda \) by Beeson [6] and Zipperposition by Cruanes [14]. Otter-\(\lambda \) adds \(\lambda \)-terms and second-order unification to the superposition-based Otter. The approach is pragmatic, with little emphasis on completeness. Zipperposition is a superposition-based prover written in OCaml. It was initially designed for first-order logic but subsequently extended to higher-order logic. Its performance is a far cry from E's, but it is easier to modify. It is used by Bentkamp et al. [7] for experimenting with higher-order features. Finally, there is noteworthy preliminary work by the developers of Vampire [10] and of CVC4 and veriT [4].
Native higher-order reasoning was pioneered by Robinson [22], Andrews [1], and Huet [16]. TPS, by Andrews et al. [2], was based on expansion proofs and let users specify proof outlines. The Leo systems, developed by Benzmüller and his colleagues, are based on resolution and paramodulation. LEO [8] introduced the cooperative paradigm to integrate first-order provers. Leo-III [27] expands the cooperation with SMT (satisfiability modulo theories) solvers and introduces term orders. Brown's Satallax [12] is based on a higher-order tableau calculus, guided by a SAT solver; recent versions also cooperate with first-order provers.
An alternative to all of the above is to reduce higher-order logic to first-order logic by means of a translation. Robinson [23] outlined this approach decades before tools such as Sledgehammer [21] and HOLyHammer [17] popularized it in proof assistants. In addition to performing an applicative encoding, such translations must eliminate the \(\lambda \)-expressions and encode the type information.
By removing the need for the applicative encoding, our work reduces the translation gap. The encoding buries the \(\lambda \)fHOL terms' heads under layers of \(\mathsf {@}\) symbols. Terms double in size, cluttering the data structures, and twice as many subterm positions must be considered for inferences. Moreover, the encoding is incompatible with interpreted operators, notably for arithmetic. A further complication is that in a monomorphic logic, \(\mathsf {@}\) is not a single symbol but a type-indexed family of symbols \(\mathsf {@}_{\tau ,\upsilon }\), which must be correctly introduced and recognized. Finally, the encoding must be undone in the generated proofs. While it should be possible to base a higher-order prover on such an encoding, the prospect is aesthetically and technically unappealing, and performance would likely suffer.
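The blowup caused by the applicative encoding is easy to demonstrate on a toy term representation (our own illustrative encoding of terms as nested tuples; not the data structures of E or of any translation tool):

```python
# Sketch of the applicative encoding on first-order-style terms.
# Terms are either strings (constants, variables) or tuples
# ("f", arg1, ..., argN); "@" is the binary application symbol.

def app_encode(term):
    """Curry an n-ary application into n nested binary @-applications."""
    if isinstance(term, str):
        return term
    head, *args = term
    result = head
    for arg in args:
        result = ("@", result, app_encode(arg))
    return result

def size(term):
    """Number of symbol occurrences in a term."""
    if isinstance(term, str):
        return 1
    return sum(size(t) for t in term)
```

For instance, `app_encode(("f", "a", ("g", "b")))` yields `("@", ("@", "f", "a"), ("@", "g", "b"))`, growing the term from 4 to 7 symbol occurrences; in general, each argument position contributes one extra `@` occurrence, roughly doubling the number of subterm positions the prover must consider.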
10 Conclusion
Despite considerable progress since the 1970s, higher-order automated reasoning has not yet assimilated some of the most successful methods for first-order logic with equality, such as superposition. We presented a graceful extension of a state-of-the-art first-order theorem prover to a fragment of higher-order logic devoid of \(\lambda \)-terms. Our work covers both theoretical and practical aspects. Experiments show promising results on \(\lambda \)-free higher-order problems and very little overhead for first-order problems, as we would expect from a graceful generalization.
The resulting Ehoh prover will form the basis of our work towards strong higher-order automation. Our aim is to turn it into a prover that excels on proof obligations emerging from interactive verification; in our experience, these tend to be large but only mildly higher-order. Our next steps will be to extend E's term data structure with \(\lambda \)-expressions and to investigate techniques for computing higher-order unifiers efficiently.
Acknowledgment
We are grateful to the maintainers of StarExec for letting us use their service. We thank Ahmed Bhayat, Alexander Bentkamp, Daniel El Ouraoui, Michael Färber, Pascal Fontaine, Predrag Janičić, Robert Lewis, Tomer Libal, Giles Reger, HansJörg Schurr, Alexander Steen, Mark Summerfield, Dmitriy Traytel, and the anonymous reviewers for suggesting many improvements to this text. We also want to thank the other members of the Matryoshka team, including Sophie Tourret and Uwe Waldmann, as well as Christoph Benzmüller, Andrei Voronkov, Daniel Wand, and Christoph Weidenbach, for many stimulating discussions.
Vukmirović and Blanchette’s research has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 713999, Matryoshka). Blanchette has received funding from the Netherlands Organization for Scientific Research (NWO) under the Vidi program (project No. 016.Vidi.189.037, Lean Forward). He also benefited from the NWO Incidental Financial Support scheme.
References
 1. Andrews, P.B.: Resolution in type theory. J. Symb. Log. 36(3), 414–432 (1971)
 2. Andrews, P.B., Bishop, M., Issar, S., Nesmith, D., Pfenning, F., Xi, H.: TPS: a theorem-proving system for classical type theory. J. Autom. Reason. 16(3), 321–353 (1996)
 3. Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press, Cambridge (1998)
 4. Barbosa, H., Reynolds, A., Fontaine, P., Ouraoui, D.E., Tinelli, C.: Higher-order SMT solving (work in progress). In: Dimitrova, R., D'Silva, V. (eds.) SMT 2018 (2018)
 5. Becker, H., Blanchette, J.C., Waldmann, U., Wand, D.: A transfinite Knuth–Bendix order for lambda-free higher-order terms. In: de Moura, L. (ed.) CADE 2017. LNCS (LNAI), vol. 10395, pp. 432–453. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63046-5_27
 6. Beeson, M.: Lambda logic. In: Basin, D., Rusinowitch, M. (eds.) IJCAR 2004. LNCS (LNAI), vol. 3097, pp. 460–474. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-25984-8_34
 7. Bentkamp, A., Blanchette, J.C., Cruanes, S., Waldmann, U.: Superposition for lambda-free higher-order logic. In: Galmiche, D., Schulz, S., Sebastiani, R. (eds.) IJCAR 2018. LNCS (LNAI), vol. 10900, pp. 28–46. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94205-6_3
 8. Benzmüller, C., Kohlhase, M.: System description: Leo—a higher-order theorem prover. In: Kirchner, C., Kirchner, H. (eds.) CADE 1998. LNCS (LNAI), vol. 1421, pp. 139–143. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0054256
 9. Benzmüller, C., Sultana, N., Paulson, L.C., Theiss, F.: The higher-order prover LEO-II. J. Autom. Reason. 55(4), 389–404 (2015)
10. Bhayat, A., Reger, G.: Set of support for higher-order reasoning. In: Konev, B., Urban, J., Rümmer, P. (eds.) PAAR-2018. CEUR Workshop Proceedings, vol. 2162, pp. 2–16. CEUR-WS.org (2018)
11. Böhme, S., Nipkow, T.: Sledgehammer: Judgement Day. In: Giesl, J., Hähnle, R. (eds.) IJCAR 2010. LNCS (LNAI), vol. 6173, pp. 107–121. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14203-1_9
12. Brown, C.E.: Satallax: an automatic higher-order prover. In: Gramlich, B., Miller, D., Sattler, U. (eds.) IJCAR 2012. LNCS (LNAI), vol. 7364, pp. 111–117. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31365-3_11
13. Cruanes, S.: Extending Superposition with Integer Arithmetic, Structural Induction, and Beyond. PhD thesis, École polytechnique (2015). https://who.rocq.inria.fr/Simon.Cruanes/files/thesis.pdf
14. Cruanes, S.: Superposition with structural induction. In: Dixon, C., Finger, M. (eds.) FroCoS 2017. LNCS (LNAI), vol. 10483, pp. 172–188. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66167-4_10
15. Filliâtre, J.-C., Paskevich, A.: Why3—where programs meet provers. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013. LNCS, vol. 7792, pp. 125–128. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37036-6_8
16. Huet, G.P.: A mechanization of type theory. In: Nilsson, N.J. (ed.) IJCAI-73, pp. 139–146. Morgan Kaufmann Publishers Inc., Burlington (1973)
17. Kaliszyk, C., Urban, J.: HOL(y)Hammer: online ATP service for HOL Light. Math. Comput. Sci. 9(1), 5–22 (2015)
18. Kovács, L., Voronkov, A.: First-order theorem proving and Vampire. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 1–35. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_1
19. Löchner, B.: Things to know when implementing KBO. J. Autom. Reason. 36(4), 289–310 (2006)
20. McCune, W.: Experiments with discrimination-tree indexing and path indexing for term retrieval. J. Autom. Reason. 9(2), 147–167 (1992)
21. Paulson, L.C., Blanchette, J.C.: Three years of experience with Sledgehammer, a practical link between automatic and interactive theorem provers. In: Sutcliffe, G., Schulz, S., Ternovska, E. (eds.) IWIL-2010. EPiC, vol. 2, pp. 1–11. EasyChair (2012)
22. Robinson, J.: Mechanizing higher order logic. In: Meltzer, B., Michie, D. (eds.) Machine Intelligence, vol. 4, pp. 151–170. Edinburgh University Press, Edinburgh (1969)
23. Robinson, J.: A note on mechanizing higher order logic. In: Meltzer, B., Michie, D. (eds.) Machine Intelligence, vol. 5, pp. 121–135. Edinburgh University Press, Edinburgh (1970)
24. Schulz, S.: Fingerprint indexing for paramodulation and rewriting. In: Gramlich, B., Miller, D., Sattler, U. (eds.) IJCAR 2012. LNCS (LNAI), vol. 7364, pp. 477–483. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31365-3_37
25. Schulz, S.: Simple and efficient clause subsumption with feature vector indexing. In: Bonacina, M.P., Stickel, M.E. (eds.) Automated Reasoning and Mathematics. LNCS (LNAI), vol. 7788, pp. 45–67. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36675-8_3
26. Schulz, S.: System description: E 1.8. In: McMillan, K., Middeldorp, A., Voronkov, A. (eds.) LPAR 2013. LNCS, vol. 8312, pp. 735–743. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-45221-5_49
27. Steen, A., Benzmüller, C.: The higher-order prover Leo-III. In: Galmiche, D., Schulz, S., Sebastiani, R. (eds.) IJCAR 2018. LNCS (LNAI), vol. 10900, pp. 108–116. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94205-6_8
28. Stump, A., Sutcliffe, G., Tinelli, C.: StarExec: a cross-community infrastructure for logic solving. In: Demri, S., Kapur, D., Weidenbach, C. (eds.) IJCAR 2014. LNCS (LNAI), vol. 8562, pp. 367–373. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08587-6_28
29. Sutcliffe, G.: The TPTP problem library and associated infrastructure. From CNF to TH0, TPTP v6.4.0. J. Autom. Reason. 59(4), 483–502 (2017)
30. Sutcliffe, G.: The CADE-26 automated theorem proving system competition–CASC-26. AI Commun. 30(6), 419–432 (2017)
31. Vukmirović, P.: Implementation of Lambda-Free Higher-Order Superposition. MSc thesis, Vrije Universiteit Amsterdam (2018). http://matryoshka.gforge.inria.fr/pubs/vukmirovic_msc_thesis.pdf
32. Vukmirović, P., Blanchette, J.C., Cruanes, S., Schulz, S.: Extending a brainiac prover to lambda-free higher-order logic (technical report). Technical report (2019). http://matryoshka.gforge.inria.fr/pubs/ehoh_report.pdf
33. Weidenbach, C., Dimova, D., Fietzke, A., Kumar, R., Suda, M., Wischnewski, P.: SPASS version 3.5. In: Schmidt, R.A. (ed.) CADE 2009. LNCS (LNAI), vol. 5663, pp. 140–145. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02959-2_10
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.