The
calculus closely resembles \(\lambda \)Sup, augmented with rules for Boolean reasoning that are inspired by
. As in \(\lambda \)Sup, superposition-like inferences are restricted to certain first-order-like subterms, the green subterms, which we define inductively as follows: Every term t is a green subterm of t, and for all symbols
, if t is a green subterm of \(u_i\) for some i, then t is a green subterm of
. For example, the green subterms of
are the term itself,
,
, \({\mathsf {p}}\),
, and
. We write
to denote a term s with a green subterm t and call the first-order-like context
a green context.
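For illustration, the inductive definition of green subterms can be sketched as a traversal over a simplified, untyped term representation. The constructor tags (`"sym"`, `"var"`, `"lam"`) and the exact treatment of special symbols are our illustrative assumptions, not the paper's formal definition:

```python
def green_subterms(t):
    """Yield the green subterms of t: t itself, plus (recursively) the
    green subterms of the arguments of a symbol application.  Arguments
    of applied variables and bodies of lambda-expressions are opaque."""
    yield t
    if t[0] == "sym":                  # ("sym", name, [args])
        for arg in t[2]:
            yield from green_subterms(arg)
    # ("var", name, [args]) and ("lam", body): no green descent

# Example: f a (y b) (lam q) -- the arguments a and (lam q) are green,
# but b (below the applied variable y) and q (inside the lambda) are not.
t = ("sym", "f", [("sym", "a", []),
                  ("var", "y", [("sym", "b", [])]),
                  ("lam", ("sym", "q", []))])
greens = list(green_subterms(t))
```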
Following \(\lambda \)Sup, we call a term t fluid if (1) \(t{\downarrow }_{\beta \eta {\mathsf {Q}}_{\eta }}\) is of the form \(y\>\bar{u}_n\) where \(n \ge 1\), or (2) \(t{\downarrow }_{\beta \eta {\mathsf {Q}}_{\eta }}\) is a \(\lambda \)-expression and there exists a substitution \(\sigma \) such that \(t\sigma {\downarrow }_{\beta \eta {\mathsf {Q}}_{\eta }}\) is not a \(\lambda \)-expression (due to \(\eta \)-reduction). Intuitively, fluid terms are terms whose normal form can change radically as a result of instantiation.
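A minimal worked example, using an arbitrary unary symbol \(\mathsf{f}\):

```latex
\begin{align*}
t_1 &= y\;\mathsf{a}
  && \text{fluid by (1): its normal form is an applied variable;}\\
t_2 &= \lambda x.\; y\;x
  && \text{fluid by (2): with } \sigma = \{y \mapsto \mathsf{f}\},\ 
     t_2\sigma = \lambda x.\;\mathsf{f}\;x \;\longrightarrow_{\eta}\; \mathsf{f},
     \text{ which is not a } \lambda\text{-expression.}
\end{align*}
```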
We define deeply occurring variables as in \(\lambda \)Sup, but exclude \(\lambda \)-expressions directly below quantifiers: A variable occurs deeply in a clause C if it occurs inside an argument of an applied variable or inside a \(\lambda \)-expression that is not directly below a quantifier.
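For instance, by this definition:

```latex
\begin{itemize}
  \item $y$ occurs deeply in $z\;y \approx \mathsf{a}$ (inside an argument of the
        applied variable $z$) and in $\mathsf{f}\;(\lambda x.\;y) \approx \mathsf{a}$
        (inside a $\lambda$-expression);
  \item $y$ does not occur deeply in $\mathsf{f}\;y \approx \mathsf{a}$, nor in
        $\forall x.\;y\;x$, where the $\lambda$-expression introduced by the
        binder is directly below the quantifier.
\end{itemize}
```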
Preprocessing. Our completeness theorem requires that quantified variables do not appear in certain higher-order contexts. We use preprocessing to eliminate problematic occurrences of quantifiers. The rewrite rules
and
, which we collectively denote by
, are defined as
and
where the rewritten occurrence of \({\mathsf {Q}}{\langle \tau \rangle }\) is unapplied or has an argument of the form
such that x occurs as a nongreen subterm of v. If either of these rewrite rules can be applied to a given term, the term is
-reducible; otherwise, it is
-normal.
For example, the term
is
-normal. A term may be
-reducible because a quantifier appears unapplied (e.g.,
); a quantified variable occurs applied (e.g.,
); a quantified variable occurs inside a nested \(\lambda \)-expression (e.g.,
); or a quantified variable occurs in the argument of a variable, either a free variable (e.g.,
) or a variable bound above the quantifier (e.g.,
).
A preprocessor
-normalizes the input problem. Although inferences may produce
-reducible clauses, we do not
-normalize during the derivation process itself. Instead,
-reducible ground instances of clauses will be considered redundant by the redundancy criterion. Thus, clauses whose ground instances are all
-reducible can be deleted. However, there are
-reducible clauses, such as
, that nevertheless have
-normal ground instances. Such clauses must be kept because the completeness proof relies on their
-normal ground instances.
In principle, we could omit the side condition of the
-rewrite rules and eliminate all quantifiers. However, the calculus (especially the redundancy criterion) performs better with quantifiers than with \(\lambda \)-expressions, which is why we restrict
-normalization as much as the completeness proof allows. Extending the preprocessing to eliminate all Boolean terms as in Kotelnikov et al. [21] does not work for higher-order logic because Boolean terms can contain variables bound by enclosing \(\lambda \)-expressions.
Term Order. The calculus is parameterized by a well-founded strict total order \(\succ \) on ground terms satisfying these four criteria: (O1) compatibility with green contexts—i.e., \(s' \succ s\) implies
; (O2) green subterm property—i.e.,
where \(\succeq \) is the reflexive closure of \(\succ \); (O3)
for all terms
; (O4)
for all types \(\tau \), terms t, and terms u such that
and u are
-normal and the only Boolean green subterms of u are
and
. The restriction of (O4) to
-normal terms ensures that term orders fulfilling the requirements exist, but it forces us to preprocess the input problem. We extend \(\succ \) to literals and clauses via the multiset extensions in the standard way [2, Sect. 2.4].
For nonground terms, \(\succ \) is required to be a strict partial order such that \(t \succ s\) implies \(t\theta \succ s\theta \) for all grounding substitutions \(\theta \). As in \(\lambda \)Sup, we also introduce a nonstrict variant \(\succsim \) for which we require that \(t\theta \succeq s\theta \) for all grounding substitutions \(\theta \) whenever \(t \succsim s\), and similarly for literals and clauses.
To construct a concrete order fulfilling these requirements, we define an encoding into untyped first-order terms, and compare these using a variant of the Knuth–Bendix order. In a first step, denoted
, the encoding translates fluid terms t as fresh variables
; nonfluid \(\lambda \)-expressions
as
; applied quantifiers
as
; and other terms
as
. Bound variables are encoded as constants \({\mathsf {db}}^i\) corresponding to De Bruijn indices. In a second step, denoted
, the encoding replaces \({\mathsf {Q}}_1\) by \({\mathsf {Q}}_1'\) and variables z by \(z'\) whenever they occur below \({\mathsf {lam}}\). For example,
is encoded as
. The first-order terms can then be compared using a transfinite Knuth–Bendix order \(\succ _{{\mathsf {kb}}}\) [22]. Let the weight of
and
be \(\omega \), the weight of
and
be 1, and the weights of all other symbols be less than \(\omega \). Let the precedence > be total and
be the symbols of lowest precedence, with
. Then let \(t \succ s\) if
and \(t \succsim s\) if
.
Selection Functions. The calculus is also parameterized by a literal selection function and a Boolean subterm selection function. We define an element x of a multiset M to be \(\unrhd \)-maximal for some relation \(\unrhd \) if for all \(y \in M\) with \(y \unrhd x\), we have \(y = x\). It is strictly \(\unrhd \)-maximal if it is \(\unrhd \)-maximal and occurs only once in M.
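The maximality definitions translate directly into code; a small illustrative sketch (the function names are ours):

```python
def is_maximal(x, multiset, geq):
    """x is geq-maximal in the multiset: any y with y geq x equals x."""
    return all(y == x for y in multiset if geq(y, x))

def is_strictly_maximal(x, multiset, geq):
    """Maximal, and x occurs only once in the multiset."""
    return is_maximal(x, multiset, geq) and multiset.count(x) == 1

geq = lambda a, b: a >= b
# 3 is maximal in [1, 2, 3]; in [3, 3, 1] it is maximal but not strictly so.
```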
The literal selection function \( HLitSel \) maps each clause to a subset of selected literals. A literal may not be selected if it is positive and neither side is
. Moreover, a literal
may not be selected if \(y\>\bar{u}_n\), with \(n \ge 1\), is a \(\succeq \)-maximal term of the clause.
The Boolean subterm selection function \( HBoolSel \) maps each clause C to a subset of selected subterms in C. Selected subterms must be green subterms of Boolean type. Moreover, a subterm s must not be selected if
, if
, if s is a variable-headed term, if s is at the topmost position on either side of a positive literal, or if s contains a variable y as a green subterm, and
, with \(n \ge 1\), is a \(\succeq \)-maximal term of the clause.
Eligibility. A literal L is (strictly) eligible w.r.t. a substitution \(\sigma \) in C if it is selected in C or there are no selected literals and no selected Boolean subterms in C and \(L\sigma \) is (strictly) \(\succsim \)-maximal in \(C\sigma \).
The eligible subterms of a clause C w.r.t. a substitution \(\sigma \) are inductively defined as follows: Any selected subterm is eligible. If a literal
with
is either eligible and negative or strictly eligible and positive, then the subterm s is eligible. If a subterm t is eligible and the head of t is not
or
, all direct green subterms of t are eligible. If a subterm t is eligible and t is of the form
or
, then u is eligible if
and v is eligible if
.
The Core Inference Rules. The calculus consists of the following core inference rules. The first five rules stem from \(\lambda \)Sup, with minor adaptations concerning Booleans:
-
Sup 1. u is not fluid; 2. u is not a variable deeply occurring in C; 3. if u is a variable y, there must exist a grounding substitution \(\theta \) such that
and \(C\sigma \theta \prec C''\sigma \theta \), where \(C'' = C\{y\mapsto t'\}\); 4. \(\sigma \in {{\,\mathrm{CSU}\,}}(t,u)\); 5. \(t\sigma \not \precsim t'\sigma \); 6. u is eligible in C w.r.t. \(\sigma \); 7. \(C\sigma \not \precsim D\sigma \); 8. \(t \approx t'\) is strictly eligible in D w.r.t. \(\sigma \); 9. \(t\sigma \) is not a fully applied logical symbol; 10. if
, the subterm u is at the top level of a positive literal.
-
ERes 1. \(\sigma \in {{\,\mathrm{CSU}\,}}(u,u')\); 2. \(u \not \approx u'\) is eligible in C w.r.t. \(\sigma \).
-
EFact 1. \(\sigma \in {{\,\mathrm{CSU}\,}}(u,u')\); 2. \(u\sigma \not \precsim v\sigma \); 3. \((u \approx v)\sigma \) is \(\succsim \)-maximal in \(C\sigma \); 4. \(u\sigma \not \precsim v\sigma \); 5. nothing is selected in C.
-
FluidSup 1. u is a variable deeply occurring in C or u is fluid; 2. z is a fresh variable; 3.
; 4. \((z\>t')\sigma \not = (z\>t)\sigma \); 5.–10. as for Sup.
-
ArgCong 1. \(n > 0\); 2. \(\sigma \) is the most general type substitution that ensures well-typedness of the conclusion for a given n; 3. \(\bar{x}_n\) is a tuple of distinct fresh variables; 4. the literal \(s \approx s'\) is strictly eligible in C w.r.t. \(\sigma \).
The following rules are concerned with Boolean reasoning and originate from
. They have been adapted to support polymorphism and applied variables.
-
BoolHoist 1. \(\sigma \) is a type unifier of the type of u with the Boolean type
(i.e., the identity if u is Boolean or
if u is of type \(\alpha \) for some type variable \(\alpha \)); 2. the head of u is neither a variable nor a logical symbol; 3. u is eligible in C; 4. the occurrence of u is not at the top level of a positive literal.
-
EqHoist, NeqHoist, ForallHoist, ExistsHoist 1.
,
,
, or
, respectively; 2. x, y, and \(\alpha \) are fresh variables; 3. u is eligible in C w.r.t. \(\sigma \); 4. if the head of u is a variable, it must be applied and the affected literal must be of the form
,
, or
where v is a variable-headed term.
-
FalseElim 1.
; 2.
is strictly eligible in C w.r.t. \(\sigma \).
-
BoolRw 1. \(\sigma \in {{\,\mathrm{CSU}\,}}(t,u)\) and \((t, t')\) is one of the following pairs, where y is a fresh variable:
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
; 2. u is not a variable; 3. u is eligible in C w.r.t. \(\sigma \); 4. if the head of u is a variable, it must be applied and the affected literal must be of the form
,
, or \(u \approx v\) where v is a variable-headed term.
-
ForallRw, ExistsRw 1.
and
, respectively, where \(\beta \) is a fresh type variable, y is a fresh term variable, \(\bar{\alpha }\) are the free type variables and \(\bar{x}\) are the free term variables occurring in \(y\sigma \) in order of first occurrence; 2. u is not a variable; 3. u is eligible in C w.r.t. \(\sigma \); 4. if the head of u is a variable, it must be applied and the affected literal must be of the form
, or \(u \approx v\) where v is a variable-headed term; 5. for ForallRw, the indicated occurrence of u is not in a literal
, and for ExistsRw, the indicated occurrence of u is not in a literal
.
Like Sup, the Boolean rules must also be simulated below fluid terms. The following rules are Boolean counterparts of FluidSup:
In addition to the inference rules, our calculus relies on the two axioms below. Axiom (Ext), from \(\lambda \)Sup, embodies functional extensionality; the expression
abbreviates
. Axiom (Choice) characterizes the Hilbert choice operator \(\varepsilon \).
Rationale for the Rules. Most of the calculus’s rules are adapted from its precursors. Sup, ERes, and EFact are already present in Sup, with slightly different side conditions. Notably, as in \(\lambda \)fSup and \(\lambda \)Sup, Sup inferences are required only into green contexts. Other subterms are accessed indirectly via ArgCong and (Ext).
The rules BoolHoist, EqHoist, NeqHoist, ForallHoist, ExistsHoist, FalseElim, BoolRw, ForallRw, and ExistsRw, concerned with Boolean reasoning, stem from
, which was inspired by
. Except for BoolHoist and FalseElim, these rules have a condition stating that “if the head of u is a variable, it must be applied and the affected literal must be of the form
,
, or
where v is a variable-headed term.” The inferences at variable-headed terms permitted by this condition are our form of primitive substitution [1, 18], a mechanism that blindly substitutes logical connectives and quantifiers for variables z with a Boolean result type.
Example 1
Our calculus can prove that Leibniz equality implies equality (i.e., if two values behave the same for all predicates, they are equal) as follows:
The EqHoist inference, applied on
, illustrates how our calculus introduces logical symbols without a dedicated primitive substitution rule. Although
does not appear in the premise, we still need to apply EqHoist on
with
. Other calculi [1, 9, 18, 26] would apply an explicit primitive substitution rule instead, yielding essentially
. However, in our approach this clause is subsumed and could be discarded immediately. By hoisting the equality to the clausal level, we bypass the redundancy criterion.
Next, BoolRw can be applied to
with
. The two FalseElim steps remove the
literals. Then Sup is applicable with the unifier
, and ERes derives the contradiction.
As in \(\lambda \)Sup, the FluidSup rule is responsible for simulating superposition inferences below applied variables, other fluid terms, and deeply occurring variables. Complementarily, FluidBoolHoist and FluidLoobHoist simulate the various Boolean inference rules below fluid terms. Initially, we considered adding a fluid version of each rule that operates on Boolean subterms, but we discovered that FluidBoolHoist and FluidLoobHoist suffice to achieve refutational completeness.
Example 2
The clause set consisting of
and
highlights the need for FluidBoolHoist and its companion. The set is unsatisfiable because the instantiation
produces the clause
, which is unsatisfiable in conjunction with
.
The literal selection function can select either literal in the first clause. ERes is applicable in either case, but the unifiers
and
do not lead to a contradiction. Instead, we need to apply FluidBoolHoist if the first literal is selected or FluidLoobHoist if the second literal is selected. In the first case, the derivation is as follows:
The FluidBoolHoist inference uses the unifier
. We apply ERes to the first literal of the resulting clause, with unifier
. Next, we apply EqHoist with the unifier
to the literal created by FluidBoolHoist, effectively performing a primitive substitution. The resulting clause can superpose into
with the unifier
. The two sides of the interpreted equality in the first literal can then be unified, allowing us to apply BoolRw with the unifier
. Finally, applying ERes twice and FalseElim once yields the empty clause.
Remarkably, none of the provers that participated in the CASC-J10 competition can solve this two-clause problem within a minute. Satallax finds a proof after 72 s and LEO-II after over 7 minutes. Our new Zipperposition implementation solves it in 3 s.
The Redundancy Criterion. In first-order superposition, a clause is considered redundant if all its ground instances are entailed by \(\prec \)-smaller ground instances of other clauses. In essence, this will also be our definition, but we will use a different notion of ground instances and a different notion of entailment.
Given a clause C, let its ground instances
be the set of all clauses of the form \(C\theta \) for some substitution \(\theta \) such that \(C\theta \) is ground and
-normal, and for all variables x occurring in C, the only Boolean green subterms of \(x\theta \) are
and
. The rationale of this definition is to ensure that ground instances of the conclusion of ForallHoist, ExistsHoist, ForallRw, and ExistsRw inferences are smaller than the corresponding instances of their premise by property (O4).
The redundancy criterion’s notion of entailment is defined via an encoding into a weaker logic, following \(\lambda \)fSup and \(\lambda \)Sup. In this paper, the weaker logic is ground first-order logic with interpreted Booleans—the ground fragment of the logic of
. Its signature \((\mathrm {\Sigma }_\mathsf {ty},\mathrm {\Sigma }_{\mathrm {GF}})\) is derived from our higher-order signature \((\mathrm {\Sigma }_\mathsf {ty},\mathrm {\Sigma })\) as follows. The type constructors \(\mathrm {\Sigma }_\mathsf {ty}\) are the same in both signatures, but \({\rightarrow }\) is an uninterpreted type constructor in first-order logic. For each ground instance
, we introduce a first-order symbol
with argument types \(\bar{\tau }_{\!j}\) and result type \(\tau _{\!j+1} \rightarrow \cdots \rightarrow \tau _n \rightarrow \tau \), for each j. Moreover, for each ground term \(\lambda x.\>t\), we introduce a symbol
of the same type. The symbols
, and
are identified with the corresponding first-order logical symbols.
We define an encoding
of
-normal ground higher-order terms into this ground first-order logic recursively as follows:
and
for applied quantifiers;
for \(\lambda \)-expressions; and
for other terms. For quantified variables, we define
. Here,
-normality is crucial to ensure that bound variables do not occur applied or within \(\lambda \)-expressions. The definition of green subterms is devised such that green subterms correspond to first-order subterms via the encoding
, with the exception of first-order subterms below quantifiers. The encoding
is extended to clauses by mapping each literal and each side of a literal individually. From the entailment relation \(\models \) for the ground first-order logic, we derive an entailment relation
on
-normal ground higher-order clauses by defining
if
. This relation is weaker than standard higher-order entailment; for example,
(because of the subscripts added by
) and
(because of the \({\mathsf {lam}}\) symbols used by
).
Using
, we define a clause C to be redundant w.r.t. a clause set N if for every
, we have
or there exists a clause \(C' \in N\) such that \(C \sqsupset C'\) and
. The tiebreaker \(\sqsupset \) can be an arbitrary well-founded partial order on clauses; in practice, we use a well-founded restriction of the ill-founded strict subsumption relation [6, Sect. 3.4]. We denote the set of redundant clauses w.r.t. a clause set N by \({ Red _{\mathrm {C}}}(N)\). Note that
is weak enough to ensure that the ArgCong inference rule and axiom (Ext) are not immediately redundant and can fulfill their purpose.
For first-order superposition, an inference is considered redundant if for each of its ground instances, a premise is redundant or the conclusion is entailed by clauses smaller than the main premise. For most inference rules, our definition follows this idea, using
for entailment; other rules need nonstandard notions of ground instances and redundancy. The definition of inference redundancy presented below is simpler than the more sophisticated notion in our technical report. Nonetheless, the redundant inferences below are a strict subset of the redundant inferences of our report and thus completeness also holds using the notion below. For the few prover optimizations based on inference redundancy that we know about (e.g., simultaneous superposition [4]), the following criterion suffices.
For Sup, ERes, EFact, BoolHoist, FalseElim, EqHoist, NeqHoist, and BoolRw, we define ground instances as usual: Ground instances are all inferences obtained by applying a grounding substitution to premises and conclusion such that the result adheres to the conditions of the given rule w.r.t. selection functions that select literals and subterms as in the original premise. For FluidSup and FluidBoolHoist, we define ground instances in the same way except that we require that ground instances adhere to the conditions of Sup or BoolHoist, respectively. For ForallRw, ExistsRw, ForallHoist, ExistsHoist, which do not have ground instances in the sense above, we define a ground instance as any inference that is obtained by applying the unifier \(\sigma \) to the premise and then applying a grounding substitution to premise and conclusion, regardless of whether the resulting inference is an inference of our calculus.
For all rules except FluidLoobHoist and ArgCong, we define an inference to be redundant w.r.t. a clause set N if for each ground instance \(\iota \), a premise of
is redundant w.r.t.
or the conclusion of
is entailed w.r.t.
by clauses from
that are smaller than the main (i.e., rightmost) premise of \(\iota \). For the rules FluidLoobHoist and ArgCong, as well as axioms (Ext) and (Choice)—viewed as premiseless inferences—we define an inference to be redundant w.r.t. a clause set N if all ground instances of its conclusion are contained in
or redundant w.r.t.
. We denote the set of redundant inferences w.r.t. N by \({ Red _{\mathrm {I}}}(N)\).
Simplification Rules. Our redundancy criterion is strong enough to support counterparts of most simplification rules implemented in Schulz’s first-order E [25, Sect. 2.3.1 and 2.3.2]. Deletion of duplicated literals, deletion of resolved literals, syntactic tautology deletion, negative simplify-reflect, and clause subsumption adhere to our redundancy criterion. Positive simplify-reflect, equality subsumption, and rewriting (demodulation) of positive and negative literals are supported if they are applied on green subterms or on other subterms that are encoded into first-order subterms by
and
. Semantic tautology deletion can be applied as well, using
; moreover, when rewriting positive literals, the rewriting clause must be smaller than the rewritten clause.
Under some circumstances, inference rules can be applied as simplifications. The FalseElim and BoolRw rules can be applied as simplifications if \(\sigma \) is the identity. If the head of u is
, ForallHoist and ForallRw can both be applied and, together, serve as one simplification rule. The same holds for ExistsHoist and ExistsRw if the head of u is
. For all of these rules, the eligibility conditions can be ignored.
Clausification. Like
, our calculus does not require the input problem to be clausified during the preprocessing, and it supports higher-order analogues of the three inprocessing clausification methods introduced by Nummelin et al. Inner delayed clausification relies on our core calculus rules to destruct logical symbols. Outer delayed clausification adds the following clausification rules to the calculus:
The double bars identify simplification rules (i.e., the conclusions make the premise redundant and can replace it). The first two rules require that s has a logical symbol as its head, whereas the last two require that s and t are Boolean terms other than
and
. The function \(oc\) distributes the logical symbols over the clause C—e.g.,
, and
. It is easy to check that our redundancy criterion allows us to replace the premise of the OuterClaus rules with their conclusion. Nonetheless, we apply EqOuterClaus and NeqOuterClaus as inferences because the premises might be useful in their original form.
Besides the two delayed clausification methods, a third inprocessing clausification method is immediate clausification. This clausifies the input problem’s outer Boolean structure in one swoop, resulting in a set of higher-order clauses. If unclausified Boolean terms rise to the top during saturation, the same algorithm is run to clausify them.
Unlike delayed clausification, immediate clausification is a black box and is unaware of the proof state other than the Boolean term it is applied to. Delayed clausification, on the other hand, clausifies the term step by step, allowing us to interleave clausification with the strong simplification machinery of superposition provers. It is especially powerful in higher-order contexts: Examples such as
can be refuted directly by equality resolution, rather than via more explosive rules on the clausified form.