Superposition with First-Class Booleans and Inprocessing Clausification

We present a complete superposition calculus for first-order logic with an interpreted Boolean type. The design of our calculus was driven by three motivating factors. First, the proof of our calculus's completeness lays the foundation for complete calculi in more expressive logics with Booleans, such as higher-order logic. Second, since clausification can make some simple problems hard to prove, our calculus works directly on formulas. Last, we avoid the costly encoding of the theory of Booleans into first-order logic. We evaluate our calculus using the Zipperposition theorem prover and observe that, with no tuning of heuristic parameters, our approach is on a par with the state-of-the-art approach.


Introduction
Superposition is a state-of-the-art calculus for equational first-order logic, which operates on problems given in clausal normal form. Its immense success made clausification preprocessing a predominant mechanism in modern automatic theorem proving. However, clausification is not without drawbacks. In our own experience, we noticed that clausification can transform simple problems, such as ϕ → ϕ where ϕ is a large formula, in such a way that their original simplicity is hidden from the superposition calculus. Ganzinger and Stuber's superposition variant [12] operates on clauses that contain formulas as well as terms, and thus replaces clausification preprocessing by inprocessing. Inprocessing clausification allows the powerful simplification engine of superposition to operate on formulas. For example, unit equalities can rewrite formulas before clausification, possibly avoiding many future rewrites after clausification. Whole formulas rather than single literals can be removed by rules such as subsumption resolution [4].
Another issue with Boolean reasoning in the standard superposition calculus is that, in first-order logic, formulas cannot appear inside terms, although this is often desirable for problems coming from software verifiers or proof assistants. Instead, authors of such tools need to resort to translations. Kotelnikov et al. studied the effects of these translations in detail. They showed that simple axioms such as the domain cardinality axiom for Booleans (∀(x : o). x ≈ ⊤ ∨ x ≈ ⊥) can severely slow down superposition provers. To support more efficient reasoning on problems with first-class Booleans, they describe the FOOL logic, which admits functions that accept arguments of Boolean type and quantification over Booleans. They further describe two approaches to reason in FOOL: The first one [16] requires an additional rule in the superposition calculus, whereas the second one [15] is completely based on preprocessing.
Our calculus combines complementary advantages of Ganzinger and Stuber's work and of Kotelnikov et al.'s. Following Kotelnikov et al. (but deviating from Ganzinger and Stuber), our logic (Sect. 2) is similar to FOOL and supports nesting formulas inside terms, as well as quantifying over Booleans. Following Ganzinger and Stuber (but deviating from Kotelnikov et al.), our calculus (Sect. 3) reasons with formulas and supports inprocessing clausification.
In addition to combining the two approaches, our calculus extends them as well. To reduce the number of possible inferences, we generalize Ganzinger and Stuber's Boolean selection functions, which allow us to restrict the Boolean subterms in a clause on which inferences can be performed. The term order requirements of our calculus are less restrictive than Ganzinger and Stuber's. In addition to the lexicographic path order, we also support the Knuth-Bendix order, which is known to work better with superposition in practice.
Our proof of refutational completeness (Sect. 4) lays the foundation for complete calculi in more complex logics with Booleans. Indeed, Bentkamp et al. [8] devised a refutationally complete calculus for higher-order logic based on our completeness theorem. Our theorem incorporates a powerful redundancy criterion that allows for a variety of inprocessing clausification methods (Sect. 5).
We implemented our approach in the Zipperposition theorem prover (Sect. 6). Our evaluation is performed on thousands of problems that target our logic, ranging from Sledgehammer-generated to SMT-LIB to TPTP benchmarks (Sect. 7). The results are promising: without fine-tuning, our new calculus performs as well as known techniques. Exploring the heuristic choices that our calculus opens should lead to further performance improvements. In addition, we corroborate the claims of Ganzinger and Stuber concerning the applicability of formula-based superposition reasoning: We find a set of 17 TPTP problems (out of 1000 randomly selected) that Zipperposition can solve only using the techniques described in this paper. We refer to our technical report [23] for more details on our calculus and the full completeness proof.

Logic
Our logic is a first-order logic with an interpreted Boolean type. It is essentially identical to the UF logic of SMT-LIB [5], including the Core theory, but without if-then-else and let expressions, which can be supported through simple translations. It also closely resembles Kotelnikov et al.'s FOOL [16], which additionally supports if-then-else and let expressions.
Our logic requires an interpreted Boolean type o and allows for an arbitrary number of uninterpreted types. The set of symbols must contain the logical symbols and the overloaded symbols ≈, ≉ : (τ × τ) → o for each type τ. The logical symbols are printed in bold to distinguish them from the notation used for clauses below. Throughout the paper, we write tuples (a_1, . . ., a_n) as ā_n or ā.
The set of terms is defined inductively as follows. Every variable is a term. If f : τ̄_n → υ is a symbol and t̄_n : τ̄_n is a tuple of terms, then the application f(t̄_n) is a term. A literal is an equation s ≈ t or a disequation s ≉ t. We write s ≈̇ t for a literal that can be either an equation or a disequation. Unlike terms constructed using the function symbols ≈ and ≉, literals are unordered, i.e., s ≈̇ t and t ≈̇ s denote the same literal. A clause is a finite multiset of literals, written as a disjunction L_1 ∨ · · · ∨ L_n. The empty clause is written as ⊥. Terms t of Boolean type are not literals. They must be encoded as t ≈ ⊤ or t ≈ ⊥, which we call predicate literals. Both are considered positive literals because they are equations, not disequations.
We considered excluding negative literals s ≉ t by encoding them as (s ≈ t) ≈ ⊥, following Ganzinger and Stuber. However, this approach requires an additional term order condition to make the conclusion of equality factoring small enough, excluding the Knuth-Bendix order. To support both the Knuth-Bendix order and the lexicographic path order, we allow negative literals. Regardless, our simplification mechanism will allow us to simplify negative literals of the form t ≉ ⊥ and t ≉ ⊤ into t ≈ ⊤ and t ≈ ⊥, respectively, thereby eliminating redundant representations of predicate literals.
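These conventions are easy to state operationally. The following Python sketch (a hypothetical tuple-based representation, not Zipperposition's actual data structures) models unordered literals, the predicate-literal encoding, and the simplification of t ≉ ⊥ and t ≉ ⊤ just described:

```python
# Sketch of the literal conventions (hypothetical representation).
TRUE, FALSE = ('T',), ('F',)

def lit(s, t, positive=True):
    """s ≈ t (positive) or s ≉ t (negative); the sides are unordered,
    so lit(s, t, ...) and lit(t, s, ...) denote the same literal."""
    return (frozenset([s, t]), positive)

def pred_lit(t, holds=True):
    """Encode the Boolean term t as t ≈ ⊤ or t ≈ ⊥. Both encodings are
    *positive* literals (equations, not disequations)."""
    return lit(t, TRUE if holds else FALSE, positive=True)

def normalize(l):
    """Simplify t ≉ ⊥ into t ≈ ⊤ and t ≉ ⊤ into t ≈ ⊥, eliminating the
    redundant representations (assumes t is itself neither ⊤ nor ⊥)."""
    sides, positive = l
    if not positive and (TRUE in sides or FALSE in sides):
        (t,) = sides - {TRUE, FALSE}
        return pred_lit(t, holds=FALSE in sides)   # t ≉ ⊥ means t is true
    return l
```

For instance, `normalize(lit('a', FALSE, positive=False))` yields the same value as `pred_lit('a')`, matching the simplification above.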
The semantics is a straightforward extension of standard first-order logic, as in Kotelnikov et al.'s FOOL logic. Some of our calculus rules introduce Skolem symbols, which are intended to be interpreted as witnesses for existentially quantified terms. Still, our semantics treats them as uninterpreted symbols. To achieve a satisfiability-preserving calculus, we assume that these symbols do not occur in the input problem. More precisely, we inductively extend the signature of the input problem by a symbol sk_{∃z.t} : τ̄ → υ for each term of the form ∃z.t over the extended signature, where υ is the type of z and τ̄ are the types of the free variables occurring in ∃z.t, in order of first appearance.
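To illustrate the Skolem-symbol convention, here is a small Python sketch (terms as tuples; the `sk[...]` name mangling is our own invention, not the paper's) that computes the free variables of ∃z.t in order of first appearance and builds the corresponding Skolem term:

```python
# Tuple terms (hypothetical representation):
#   ('var', x) | ('app', f, [args]) | ('forall', x, body) | ('exists', x, body)
def free_vars(t, bound=(), seen=None):
    """Free variables of t, in order of first appearance."""
    if seen is None:
        seen = []
    tag = t[0]
    if tag == 'var':
        if t[1] not in bound and t[1] not in seen:
            seen.append(t[1])
    elif tag in ('forall', 'exists'):
        free_vars(t[2], bound + (t[1],), seen)
    else:                                    # application: recurse into args
        for a in t[2]:
            free_vars(a, bound, seen)
    return seen

def skolem_term(ex):
    """For ex = ∃z.t, build sk_{∃z.t} applied to the free variables of ex."""
    return ('app', f'sk[{ex!r}]', [('var', x) for x in free_vars(ex)])

# For ∃z. p(y, z, x), the Skolem symbol takes y and x, in that order.
ex = ('exists', 'z', ('app', 'p', [('var', 'y'), ('var', 'z'), ('var', 'x')]))
print(skolem_term(ex)[2])   # [('var', 'y'), ('var', 'x')]
```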

The Calculus
Following standard superposition, our calculus employs a term order and a literal selection function to restrict the search space. To accommodate quantified Boolean terms, we impose additional requirements on the term order. To support flexible reasoning with Boolean subterms, we introduce, in addition to the literal selection function, a Boolean subterm selection function.
Term Order The calculus is parameterized by a strict well-founded order ≻ on ground terms that fulfills: (O1) u ≻ ⊤ ≻ ⊥ for any term u that is not ⊤ or ⊥; (O2) ∀x.t ≻ {x → u}t and ∃x.t ≻ {x → u}t for any term u whose only Boolean subterms are ⊤ and ⊥; (O3) the subterm property; (O4) compatibility with contexts (not necessarily below ∀ and ∃); (O5) totality. The order is extended to literals, clauses, and nonground terms as usual [2]. The nonground order then also enjoys (O6) stability under grounding substitutions.
Ganzinger and Stuber's term order restrictions are similar but incompatible with the Knuth-Bendix order (KBO) [14]. Using an encoding of our terms into untyped first-order logic, we describe how both the lexicographic path order (LPO) and the transfinite variant of KBO [18] can satisfy conditions (O1)-(O6).
Our encoding represents bound variables by De Bruijn indices, which become new constant symbols db_n for n ∈ N. Quantifiers are represented by two new unary function symbols, also denoted by ∀ and ∃. All other symbols are simply identified with their untyped counterparts. Regardless of symbol precedence or symbol weight assignment, KBO and LPO satisfy properties (O3)-(O6) when applied to the encoded terms because they are ground-total simplification orders. They are even compatible with contexts below quantifiers.
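The encoding can be sketched as follows in Python (tuple-based terms, a hypothetical representation): bound variables become db_n constants, counting binders from the occurrence outward, and each quantifier becomes a unary function symbol:

```python
# Sketch of the De Bruijn encoding applied before comparing terms with KBO/LPO.
# Terms are plain tuples (hypothetical representation):
#   ('var', x) | ('app', f, [args]) | ('forall', x, body) | ('exists', x, body)

def encode(t, env=()):
    """Bound variables become constants db_n; quantifiers become the unary
    symbols 'forall'/'exists'. env lists bound names, innermost first."""
    tag = t[0]
    if tag == 'var':
        name = t[1]
        if name in env:
            return ('app', f'db{env.index(name)}', [])  # nearest binder -> db0
        return t                                        # free variables stay
    if tag == 'app':
        return ('app', t[1], [encode(a, env) for a in t[2]])
    # 'forall' / 'exists': one unary function symbol wrapping the encoded body
    return ('app', tag, [encode(t[2], (t[1],) + env)])

# forall x. exists y. p(x, y)  encodes to  forall(exists(p(db1, db0)))
t = ('forall', 'x', ('exists', 'y', ('app', 'p', [('var', 'x'), ('var', 'y')])))
print(encode(t))
```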
To satisfy (O1) and (O2), we choose a precedence for LPO on which the quantifier symbols are largest and ⊤ and ⊥ are smallest, where f is any other symbol. For KBO, we can use the same symbol precedence and a symbol weight function W that assigns each symbol an ordinal weight (of the form ωa + b with a, b ∈ N). The literal selection function FLSel maps each clause to a subset of its literals. The Boolean subterm selection function FBSel maps each clause to a subset of its Boolean subterms. The literals in FLSel(C) and the subterms in FBSel(C) are called selected in C. The following selection restrictions apply: (S1) a literal can only be selected if it is negative or of the form s ≈ ⊥; (S2) a Boolean subterm can only be selected if it is not ⊤, ⊥, or a variable; (S3) a Boolean subterm can only be selected if its occurrence is not below a quantifier; and (S4) the topmost terms on either side of a positive literal cannot be selected.

The interplay of maximality with respect to the term order and the literal and Boolean selection functions gives rise to a new notion of eligibility:
Definition 2 (Eligibility). A literal L is (strictly) eligible w.r.t. a substitution σ in C if it is selected in C, or if there are no selected literals and no selected Boolean subterms in C and σL is (strictly) maximal in σC. Thus, a selected literal is strictly eligible. The eligible subterms of a clause C w.r.t. a substitution σ are inductively defined as follows: (E1) any selected subterm is eligible; (E2) if a literal s ≈̇ t with σs ⊀ σt is either eligible and negative or strictly eligible and positive, then s is eligible; (E3) if a subterm is eligible and its head is not ≈, ≉, ∀, or ∃, all of its direct subterms are also eligible; (E4) if a subterm is eligible and of the form s ≈ t or s ≉ t, then s is eligible if σs ⊀ σt and t is eligible if σs ⊁ σt. The substitution σ is left implicit if it is the identity substitution.
The Core Inference Rules Our calculus consists of the rules Sup, Irrefl, and EqFact together with the Boolean rules discussed below. The rules are subject to side conditions, among them: the head of t is not a logical symbol; and, if σt = ⊥, the subterm u is at the top level of a positive literal.
Rationale for the Rules Our calculus is a graceful generalization of superposition: if the input clauses do not contain any Boolean terms, it coincides with standard superposition. In addition to the standard superposition rules Sup, Irrefl, and EqFact, our calculus contains various rules to cope with Booleans.
For each logical symbol and quantifier, we must consider the case where it is true and the case where it is false. Whenever possible, we prefer rules that rewrite the Boolean subterm in place (with names ending in Rw). When this cannot be done in a satisfiability-preserving way, we resort to rules hoisting the Boolean subterm into a dedicated literal (with names ending in Hoist). For terms headed by an uninterpreted predicate, the rule BoolHoist only copes with the case that the term is false. If it is true, we rely on Sup to eventually rewrite it to ⊤.
Example 3. The clause a ∧ ¬a ≈ ⊤ can be refuted by the core inferences as follows. First we derive a ≈ ⊤ (displayed on the left) and then we use it to derive ⊥ (displayed on the right). In this and the following example, we assume eager selection of literals whenever the selection restrictions allow it.
The derivation illustrates how BoolHoist and Sup replace uninterpreted predicates by ⊤ and ⊥ to allow BoolRw to eliminate the surrounding logical symbols.
Example 4. The clause (∃x.∀y. y ≈ x) ≈ ⊤ can be refuted as follows:

Redundancy Criterion In standard superposition, a clause is defined as redundant if all of its ground instances follow from smaller ground instances of other clauses. We keep this definition but use a nonstandard notion of ground instances. In our completeness proof, this new notion of ground instances ensures that ground instances of the conclusion of ∀Hoist, ∃Hoist, ∀Rw, and ∃Rw inferences are smaller than the corresponding instances of their premise by property (O2). In standard superposition, an inference is defined as redundant if all of its ground instances are, and a ground inference is defined as redundant if its conclusion follows from other clauses smaller than the main premise. We keep this definition as well, but we use a nonstandard notion of ground instances for some of the Boolean rules. In our report, we define a slightly stronger variant of inference redundancy via an explicit ground calculus, but the following notion is also strong enough to justify the few prover optimizations based on inference redundancy that we know from the literature (e.g., simultaneous superposition [7]).

Definition 6 (Redundancy of inferences).
A ground instance of a ∀Hoist, ∃Hoist, ∀Rw, or ∃Rw inference is an inference obtained by applying a grounding substitution to premise and conclusion, regardless of whether the result is a valid ∀Hoist, ∃Hoist, ∀Rw, or ∃Rw inference. A ground instance of an inference ι of the other rules is an inference ι′ of the same rule such that the premises and conclusion of ι′ are ground instances of the respective premises and conclusion of ι. For ι′, we use selection functions that select the ground literals and Boolean subterms corresponding to the ones selected in the nonground premises. A ground inference with main premise C, side premises C_1, . . ., C_n, and conclusion D is redundant with respect to N if there exist clauses D_1, . . ., D_k ∈ N smaller than C such that D_1, . . ., D_k, C_1, . . ., C_n |= D. A nonground inference is redundant if all of its ground instances are redundant.
A clause set N is saturated if every inference from N is redundant w.r.t. N.

Simplification Rules
The redundancy criterion is a graceful generalization of the criterion of standard superposition. Thus, the standard simplification and deletion rules, such as deletion of trivial literals and clauses, subsumption, and demodulation, can be justified. Demodulation below quantifiers is justified if the term order is compatible with contexts below quantifiers.
Some calculus rules can act as simplifications. ⊥Elim can always be a simplification. Given a clause on which both QHoist and QRw apply, where Q ∈ {∀, ∃}, the clause can be replaced by the conclusions of these rules. If QRw does not apply because of condition 4 or 5, QHoist alone can be a simplification.
While experimenting with our implementation, we observed that the following simplification rule, LocalRw, can substantially shorten proofs. In this rule, we require s ≻ t. Even though we discovered the rule independently, a similar rule was already available in the Vampire prover [17].
Interpreting literals of the form s ≈ ⊤ and s ≈ ⊥ as s ≉ ⊥ and s ≉ ⊤, respectively, we can apply the rule even to these positive literals. This especially comes in handy with rules such as BoolHoist. Consider the clause C = p^i(⊥) ≈ ⊥ ∨ q ≈ ⊥, and assume that no literal is selected and that the Boolean selection function always selects a subterm of the form p(⊥). If we did not use LocalRw, BoolHoist would produce i − 2 intermediary clauses starting from C, none of which would be recognized as a tautology.
Many rules of our calculus replace subterms with ⊤ or ⊥. After this replacement, the resulting terms can be simplified using Boolean equivalences that specify the behavior of the logical operations on ⊤ and ⊥. To this end, we use the rule BoolSimp [31], similar to simp of Leo-III [25, Sect. 4.2.1]: this rule replaces s with t whenever s ≈ t is contained in a predefined set of tautological equations. In addition to all equations that Leo-III uses for simp, we also include more complex ones, such as equations that apply when u_i = v_j for some i and j. The exhaustive list is given in our technical report. Using BoolSimp and ⊥Elim, Example 3 can be solved in two simplification steps instead of twelve inference steps.
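The flavor of BoolSimp can be conveyed by a small Python sketch that exhaustively rewrites with a handful of the tautological equations (only a subset of the full list from the report; the s ∧ ¬s → ⊥ case stands in for the more complex equations mentioned above):

```python
# Sketch of BoolSimp-style normalization: bottom-up rewriting with a table of
# tautological Boolean equations. Atoms are strings; compounds are tuples:
#   ('T',) ('F',) ('not', a) ('and', a, b) ('or', a, b)
def simp(t):
    if not isinstance(t, tuple) or t[0] in ('T', 'F'):
        return t
    op, *args = t
    args = [simp(a) for a in args]          # simplify subterms first
    if op == 'not':
        a = args[0]
        if a == ('T',): return ('F',)
        if a == ('F',): return ('T',)
        if isinstance(a, tuple) and a[0] == 'not': return a[1]   # ¬¬s -> s
        return ('not', a)
    a, b = args
    if op == 'and':
        if ('F',) in (a, b): return ('F',)
        if a == ('T',): return b
        if b == ('T',): return a
        if a == b: return a
        if a == ('not', b) or b == ('not', a): return ('F',)     # s ∧ ¬s -> ⊥
        return ('and', a, b)
    # op == 'or'
    if ('T',) in (a, b): return ('T',)
    if a == ('F',): return b
    if b == ('F',): return a
    if a == b: return a
    if a == ('not', b) or b == ('not', a): return ('T',)         # s ∨ ¬s -> ⊤
    return ('or', a, b)

# Example 3's formula  a ∧ ¬a  simplifies in one pass:
print(simp(('and', 'a', ('not', 'a'))))   # ('F',) -- then ⊥Elim derives ⊥
```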

Refutational Completeness
Our calculus is not sound because of the introduction of Skolem symbols in ∀Rw and ∃Rw, but it preserves satisfiability. It is also dynamically refutationally complete:

Theorem 7 (Completeness). Let S_0 be an unsatisfiable set of clauses. Let (S_i)_i be a fair derivation, i.e., a sequence S_0, S_1, . . . where every inference from the clauses in ⋃_{i=1}^∞ ⋂_{j>i} S_j is computed and added to some S_j, or at least becomes redundant w.r.t. S_j. Then ⊥ ∈ S_i for some i.
We outline some key parts of the proof here but refer to our technical report [23] for the details. In our proof, we first define a ground calculus and prove it complete. Devising suitable ground analogues of the rules ∀Rw and ∃Rw was difficult because the arguments of the Skolem terms depend on the variables occurring in the premise. Therefore, we parameterize the ground calculus by a function that provides ground Skolem terms in the ground versions of these rules. When lifting the completeness result to the nonground level, we instantiate the parameter with a specific function that allows us to lift the ∀Rw and ∃Rw inferences.
To prove the ground calculus complete, we employ the framework for reduction of counterexamples [3]. It requires us to construct an interpretation I from a saturated unsatisfiable clause set that does not contain ⊥. Then we must show that any counterexample, i.e., a clause that does not hold in I, can be reduced to a smaller (≺) counterexample by some inference.
The interpretation I is defined by a normalizing rewrite system, as in the standard completeness proof of superposition. To ensure a correct semantics, we incrementally add Boolean rewrite rules alongside the rules produced by clauses as usual. Intuitively, if a counterexample can be rewritten by a Boolean rule, we reduce it by a Hoist or Rw inference. If it can be rewritten by a rule produced by a clause, we reduce it by a Sup inference.
We derive the dynamic completeness of our nonground calculus using the saturation framework [33]. It gives us a nonground clause set N to work with. We then have to choose the parameters of our ground calculus such that all of its inferences from the grounding of N are redundant or liftable. We show that inferences rewriting below variables are redundant. Other inferences are shown to be liftable, i.e., they are a ground instance of some nonground inference from N.

Inprocessing Clausification Methods
Our calculus makes preprocessing clausification unnecessary: a problem specified by a formula f can be represented as a clause f ≈ ⊤. Our redundancy criterion allows us to add various sets of rules to steer the inprocessing clausification.
Without any additional rules, our core calculus rules perform all the necessary reasoning with formulas.We call this method inner delayed clausification because the calculus rules tend to operate on the inner Boolean subterms first.
The outer delayed clausification method adds the following rules to the calculus, which are guided by the outermost logical symbols. Let s and t be Boolean terms. Below, we denote literals of the form s ≈ ⊤ and s ≉ ⊥ as s+, and literals of the form s ≈ ⊥ and s ≉ ⊤ as s−.
The rules +OuterClaus and −OuterClaus are applicable to any term s whose head is a logical symbol, whereas the rules ≈OuterClaus and ≉OuterClaus are only applicable if neither s nor t is ⊤ or ⊥. Clearly, our redundancy criterion allows us to replace the premise of all OuterClaus rules with their conclusions. Nonetheless, the rules ≈OuterClaus and ≉OuterClaus are not used as simplification rules, since destructing equivalences disturbs the syntactic structure of the formulas, as noted by Ganzinger and Stuber [12]. The function oc(s, C) analyzes the shape of the formula s and distributes it over the clause C. This function also replaces quantified terms by their bodies, in which the bound variable is replaced by either a fresh free variable or a Skolem term, depending on the polarity. The full definition of oc(s, C) is specified in our technical report.
A third inprocessing clausification method is immediate clausification. It first preprocesses the input problem using a standard first-order clausification procedure such as Nonnengart and Weidenbach's [22]. Then, during proof search, when a clause C appears to which the OuterClaus rules could be applied, we instead apply the standard clausification procedure to the formula ∀x̄.C (where x̄ are the free variables of C) and replace C with the clausification results. With this method, formulas are clausified in one step, making intermediate clausification results inaccessible to the simplification machinery.
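For concreteness, here is a naive Python rendition of the core of such a clausification procedure (negation normal form followed by distribution of ∨ over ∧; a production procedure such as Nonnengart and Weidenbach's additionally renames subformulas and handles quantifiers and equivalences):

```python
# Formulas: atoms are strings; compounds are ('not', a) | ('and', a, b) |
# ('or', a, b) | ('imp', a, b). This is a simplified sketch, not the
# procedure used in the paper.
def nnf(t, pos=True):
    """Push negations down to the atoms."""
    if isinstance(t, str):                  # atom
        return t if pos else ('not', t)
    op, *a = t
    if op == 'not':
        return nnf(a[0], not pos)
    if op == 'and':
        return (('and' if pos else 'or'), nnf(a[0], pos), nnf(a[1], pos))
    if op == 'or':
        return (('or' if pos else 'and'), nnf(a[0], pos), nnf(a[1], pos))
    # op == 'imp': a -> b is ¬a ∨ b
    return (('or' if pos else 'and'), nnf(a[0], not pos), nnf(a[1], pos))

def cnf(t):
    """Return a list of clauses, each a list of literals."""
    if isinstance(t, str) or t[0] == 'not':
        return [[t]]
    op, l, r = t
    if op == 'and':
        return cnf(l) + cnf(r)
    # op == 'or': distribute over the conjunctions on both sides
    return [cl + cr for cl in cnf(l) for cr in cnf(r)]

print(cnf(nnf(('imp', ('and', 'a', 'b'), 'c'))))
# [[('not', 'a'), ('not', 'b'), 'c']]
```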
Renaming Common Formulas Following Tseitin [29], clausification procedures usually rename common subformulas to prevent a possible combinatorial explosion caused by naive clausification. In our two delayed clausification methods, we realize this idea using the rule Rename, given a definition clause R, i.e., a clause of the form p(x̄) ≈ f. Here, the formula f has a logical head, x̄ are the free variables in f, p is a fresh symbol, σ_i is a substitution, and the clauses R_1, . . ., R_m are the result of simplifying R as described below. The rule avoids exponential explosion by collapsing the n positions in which the results of f's clausification would appear into a single position in R. Optimizations such as polarity-aware renaming [22, Sect. 4] also apply to Rename.
Several issues arise when Rename is used as an inprocessing rule. We need to ensure that in R, f ≻ p(x̄), since otherwise demodulation might reintroduce the formula f in the simplified clauses. This can be achieved by giving the fresh symbol p a precedence smaller than that of all symbols initially present in the problem (other than ⊤ and ⊥). To ensure that the precedence remains well founded, the precedence of p must be greater than that of symbols previously introduced by the calculus. For KBO, we additionally set the weight of p to the minimal possible weight.
For Rename to be used as a simplification rule, we need to ensure that the conclusions are smaller than the premises. This is trivially true for all conclusions other than the definition clause R. For example, let C_i = f ≈ ⊤ (σ_i is the identity). Clearly, R is larger than C_i. However, we can view the definition clause R as the two clauses R+ = p(x̄) ≈ ⊥ ∨ f ≈ ⊤ and R− = p(x̄) ≈ ⊤ ∨ f ≈ ⊥. Then, we can apply a single step of the OuterClaus rules to R+ and R− (on their subformula f), which results in the clauses R_1, . . ., R_m. Inspecting the OuterClaus rules, it is clear that m ≤ 4, which makes enforcing this simplification tolerable. Furthermore, as f is simplified in each of R_1, . . ., R_m, they are smaller than any premise C_i.
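The renaming mechanism can be sketched as follows in Python (a simplification: formulas are ground tuples, the occurrence threshold mirrors the "more than four times" heuristic from our evaluation, the definition is added as the two polarized clauses ¬p ∨ f and p ∨ ¬f instead of the equational clause R, and the precedence and weight bookkeeping for p is omitted):

```python
from collections import Counter

def subformulas(t):
    """Yield t and all its subformulas (operator strings at index 0 skipped)."""
    yield t
    if isinstance(t, tuple):
        for a in t[1:]:
            yield from subformulas(a)

def replace(t, old, new):
    if t == old:
        return new
    if isinstance(t, tuple):
        return tuple(replace(a, old, new) for a in t)
    return t

def rename_common(clauses, limit=4):
    """Once a compound formula occurs more than `limit` times in the clause
    set, replace every occurrence by a fresh predicate p and add the two
    polarized definition clauses ¬p ∨ f and p ∨ ¬f."""
    counts = Counter(f for c in clauses for l in c
                     for f in subformulas(l) if isinstance(f, tuple))
    out, fresh = clauses, 0
    for f, n in counts.items():
        if n > limit:
            p = f'def{fresh}'; fresh += 1
            out = [[replace(l, f, p) for l in c] for c in out]
            out += [[('not', p), f], [p, ('not', f)]]
    return out
```

For example, with five copies of the clause (a ∧ b) ∨ c, every occurrence of a ∧ b is collapsed into the single position inside the definition clauses.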
Another potential source of a combinatorial explosion in our calculus are formulas that occur deep in the arguments of uninterpreted predicates. Consider the clause C = p^i(x) ≈ ⊤ ∨ q^j(y) ≈ ⊤ where i, j > 2. If the first and the second literal are eligible in C, any clause resulting from multiple BoolHoist applications can be obtained in many different ways. This explosion can be avoided using the rule RenameDeep, where p is a fresh symbol, x̄ are all free variables occurring in s ≈ t, the clauses R_1, . . ., R_4 result from simplifying R = p(x̄) ≈ (s ≈ t) as described above, and we impose the same precedence and weight restrictions on p as for Rename. Finally, we require that both s ≈ t and C contain deep Booleans, where a Boolean subterm u|_p of a term u is a deep Boolean if there are at least two distinct proper prefixes q of the position p such that the head of u|_q is an uninterpreted predicate.
Similarly to Rename, the definition clause R can be larger than the premise. As the OuterClaus rules might not apply to s ≈ t, we need a different solution: the rule BoolHoistSimp. In this rule, u is a non-variable Boolean subterm, different from ⊤ and ⊥, whose indicated occurrence is not in a literal u ≈̇ b where b is ⊤, ⊥, or a variable. Clearly, both conclusions of BoolHoistSimp are smaller than the premise. As before, observing that R is equivalent to the two clauses R+ = p(x̄) ≈ ⊥ ∨ s ≈ t and R− = p(x̄) ≈ ⊤ ∨ s ≉ t, we simplify R+ and R− into clauses that are guaranteed to be smaller than the premise. This is achieved by applying BoolHoistSimp to one of the deep Boolean occurrences in both R+ and R−, which produces R_1, . . ., R_4 and reduces the size of the resulting clauses enough for them to be smaller than the premise of RenameDeep. The rule RenameDeep can be applied analogously to negative literals s ≉ t.

Implementation
Zipperposition [10] is an automatic theorem prover designed for easy prototyping of various extensions of superposition. So far, it has been extended to support induction, arithmetic, and various fragments of higher-order logic. We have implemented our calculus and the extensions described above in Zipperposition.
Zipperposition has long supported λ as the only binder.Because introducing new binders would significantly complicate the implementation, we decided to represent the terms ∀x.t and ∃x.t as ∀(λx.t) and ∃(λx.t), respectively.
We introduced a normalized representation of predicate literals as either s ≈ ⊤ or s ≈ ⊥. As Zipperposition previously used a different encoding, enforcing the new representation required substantial implementation effort.
The presentation and the implementation of the calculus rules differ in some ways, without affecting completeness. Relying on the presence of BoolSimp, we can simplify the implementation of BoolRw. BoolSimp simplifies logical symbols if one argument is either ⊤ or ⊥ or if two arguments are identical. Therefore, we need to implement BoolRw only for the two cases where BoolSimp does not apply: if all arguments of a logical symbol are distinct variables, and if the sides of an equation or disequation are different but unifiable. Justified by our redundancy criterion, BoolHoist and the Hoist rules simultaneously replace all occurrences of the eligible subterm they act on. For example, applying ≈Hoist to p(x ≈ y) ≈ ⊤ ∨ q(x ≈ y) ≈ ⊥ yields p(⊥) ≈ ⊤ ∨ q(⊥) ≈ ⊥ ∨ x ≈ y. The last two conditions of the Rw rules are approximated by checking whether the affected literal is of the form ∃z.v ≈ ⊥ (for ∃Rw) or ∀z.v ≈ ⊤ (for ∀Rw). These are variants of the EqFact conditions; as this was discovered only after the evaluation, our implementation performs EqFact inferences even if the maximal literal is selected.
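The simultaneous replacement performed by the Hoist rules is straightforward to sketch; the following Python fragment (tuple-based terms and literals, a hypothetical representation) reproduces the ≈Hoist example above:

```python
# Literals are ('pos', s, t) for s ≈ t; 'T'/'F' stand for ⊤/⊥.
def replace_all(t, old, new):
    """Replace every occurrence of old in t by new, simultaneously."""
    if t == old:
        return new
    if isinstance(t, tuple):
        return tuple(replace_all(a, old, new) for a in t)
    return t

def eq_hoist(clause, u):
    """≈Hoist on the eligible subterm u = ('eq', s, t): replace all
    occurrences of u by ⊥ and append the dedicated literal s ≈ t."""
    _, s, t = u
    return [replace_all(l, u, 'F') for l in clause] + [('pos', s, t)]

# p(x ≈ y) ≈ ⊤ ∨ q(x ≈ y) ≈ ⊥   -->   p(⊥) ≈ ⊤ ∨ q(⊥) ≈ ⊥ ∨ x ≈ y
c = [('pos', ('p', ('eq', 'x', 'y')), 'T'),
     ('pos', ('q', ('eq', 'x', 'y')), 'F')]
print(eq_hoist(c, ('eq', 'x', 'y')))
# [('pos', ('p', 'F'), 'T'), ('pos', ('q', 'F'), 'F'), ('pos', 'x', 'y')]
```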
Zipperposition's existing selection functions were not designed with Boolean subterm selection in mind.For instance, a function that selects a literal L with a selectable Boolean subterm s can make s eligible, even if the Boolean selection function did not select s.To mitigate this issue, we can optionally block selection of literals that contain selectable Boolean subterms.
We implemented four Boolean selection functions, selecting the leftmost innermost, leftmost outermost, syntactically largest, or syntactically smallest selectable subterm. Ties are broken by selecting the leftmost term. Additionally, we implemented a Boolean selection function that does not select any subterm.
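The four strategies can be sketched as follows in Python (a simplification: type information is omitted, the caller-supplied selectable predicate is assumed to encode restrictions (S1)-(S4), and "innermost" is approximated by maximal depth):

```python
def boolean_subterms(t, pos=()):
    """Yield (position, subterm) pairs in pre-order, i.e. leftmost outermost."""
    yield pos, t
    if isinstance(t, tuple):
        for i, a in enumerate(t[1:]):
            yield from boolean_subterms(a, pos + (i,))

def size(t):
    return 1 + sum(size(a) for a in t[1:]) if isinstance(t, tuple) else 1

def select(t, strategy, selectable):
    """Pick one selectable subterm of t according to the given strategy;
    ties are broken towards the leftmost candidate."""
    cands = [(p, s) for p, s in boolean_subterms(t) if selectable(s)]
    if not cands:
        return None                                   # no Boolean selection
    if strategy == 'leftmost_outermost':
        return cands[0]                               # first in pre-order
    if strategy == 'leftmost_innermost':
        return max(cands, key=lambda ps: len(ps[0]))  # deepest; max keeps first
    key = lambda ps: size(ps[1])
    return max(cands, key=key) if strategy == 'largest' else min(cands, key=key)
```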
Vukmirović and Nummelin [31, Sect. 3.4] explored inprocessing clausification as part of their pragmatic approach to Boolean reasoning. They describe in detail how the formula renaming mechanism is implemented. We reuse their mechanism and additionally simplify definition clauses as described in Sect. 5.

Evaluation
The mode using our new calculus performs immediate inprocessing clausification, and we call it base; the mode that preprocesses Boolean subterms is denoted by preprocess in Figure 1.
The obtained results do not give a conclusive answer to question 1.On both TPTP-HO and Sledgehammer problems, some configuration of our new calculus manages to prove one problem more than preprocessing.On SMT-LIB benchmarks, the best configuration of our calculus matches preprocessing.This shows that our calculus performs roughly as well as previously known techniques.
Our base mode uses immediate inprocessing clausification.To answer question 2, we compared base with a variant of base with outer delayed clausification (base+outer ) and with a variant with inner delayed clausification (base+inner ).In the delayed modes, we invoke the Rename rule on formulas that are discovered to occur more than four times in the proof state.
The results show that inner delayed clausification, which performs the laziest form of clausification, gives the worst results on most benchmark sets.Outer delayed clausification performs roughly as well as immediate clausification on problems targeting our logic.On purely first-order problems, it performs a bit worse than immediate clausification.However, outer delayed clausification solves 17 problems not solved by immediate clausification on these problems.This suggests that it opens new possibilities for first-order reasoning that need to be explored further with specialized heuristics and additional rules.
We found a problem with a conjecture of the form ϕ → ϕ that only the delayed clausification modes can prove: the TPTP problem SWV122+1. The subformula renaming mechanism of immediate clausification obfuscates this problem, whereas delayed clausification allows BoolSimp to convert the negated conjecture to ⊥ directly, completing the proof in half a second.
To answer question 3, we compared the mode of Zipperposition in which all of the additional rules introduced by our calculus are disabled (Zip-FO) with base.Our results show that both modes perform roughly the same.Some base modes prove up to two more problems within the last seconds of the time allotted due to fluctuations in the evaluation environment beyond our control.
To answer question 4, we evaluated the Boolean selection functions we have implemented: syntactically smallest selectable term (used in base), syntactically largest selectable term (sel_max), leftmost innermost selectable term (sel_li), leftmost outermost selectable term (sel_lo), and no Boolean selection (sel_∅). We also evaluated two modes in which the rules LocalRw and BoolHoistSimp (BHS) are enabled. None of the selection functions influences the performance greatly. Similarly, we observe no substantial difference regardless of whether the rules LocalRw and BoolHoistSimp are enabled.

Related Work and Conclusion
The research presented in this paper extends superposition in two directions: with inprocessing clausification and with first-class Booleans.The first direction has been explored before by Ganzinger and Stuber [12], and others have investigated it in the context of other superposition-related calculi [1,4,19,20].
The other direction has been explored before by Kotelnikov et al., who developed two approaches to cope with first-class Booleans [15,16].For the quantified Boolean formula fragment of our logic, Seidl et al. developed a translation into effectively propositional logic [24].More general approaches to incorporate theories into superposition include superposition for finite domains [13], hierarchic superposition [6], and superposition with (co)datatypes [9].
For SMT solvers [21], supporting first-class Booleans is a widely accepted standard [5]. In contrast, the TPTP TFX format [28], intended to promote first-class Booleans in the rest of the automated reasoning community, is yet to gain traction. Software verification tools could clearly benefit from its popularization, as some of them identify terms and formulas in their logic, e.g., Why3 [11].
In conclusion, we devised a refutationally complete superposition calculus for first-order logic with interpreted Booleans. Its redundancy criterion allows us to flexibly add inprocessing clausification and other simplification rules. We believe our calculus is an excellent basis for new superposition provers: it offers the full power of regular superposition while supporting rich input languages such as SMT-LIB and TPTP TFX. Even with an unoptimized implementation and basic heuristics, our calculus matches the performance of earlier approaches. In addition, the freedom it offers in the term order and in literal and Boolean subterm selection opens many possibilities that are yet to be explored. Overall, our calculus is a solid foundation for richer logics in which the Boolean type cannot be efficiently preprocessed, such as higher-order logic [8]. In future work, we plan to tune the heuristics, and we would find it interesting to combine our calculus with clause splitting techniques such as AVATAR [30]. This combination would amount to a hybrid of superposition and tableaux.
and W(f) ∈ Z+ for any other symbol f.

Selection and Eligibility Following an idea of Ganzinger and Stuber, we parameterize our calculus with two selection functions: one selecting literals and one selecting Boolean subterms. Our selection functions have weaker restrictions: Definition 1 (Selection functions). The calculus is parameterized by a literal selection function FLSel and a Boolean subterm selection function FBSel.

Definition 5 (Redundancy of clauses). The ground instances of a clause C are all clauses of the form γC where γ is a substitution such that, for all variables x, the only Boolean subterms of γx are ⊤ and ⊥. A ground clause C is redundant with respect to a ground clause set N if there exist clauses C_1, . . ., C_k ∈ N such that C_1, . . ., C_k |= C and C ≻ C_i for all 1 ≤ i ≤ k. A nonground clause C is redundant w.r.t. a clause set N if C is strictly subsumed by a clause in N or every ground instance of C is redundant w.r.t. the ground instances of N.

Fig. 1: Number of problems solved per benchmark set and Zipperposition mode. The x-axes start from the number of problems solved by all evaluated modes.
If x is a variable and t : o a Boolean term, then the quantified terms ∀x.t and ∃x.t are terms of Boolean type. We view quantified terms modulo α-renaming. A formula is a term of Boolean type. The head of a term is f if the term is an application f(t̄_n); it is x if the term is a variable x; and it is ∀ or ∃ if the term is a quantified term ∀x.t or ∃x.t. A variable occurrence is free in a term if it is not bound by ∀ or ∃. A term is ground if it contains no free variables.
1. v is a term that may contain the loose bound variable z; 2. ȳ are the free variables occurring in ∃z.v and ∀z.v, respectively, in order of first appearance; 3. the indicated subterm is eligible in C; 4. for ∀Rw, C[⊤] is not a tautology; 5. for ∃Rw, C[⊥] is not a tautology.
⊥Elim: 1. σ = mgu(s ≈ t, ⊥ ≈ ⊤); 2. s ≈ t is strictly eligible in C w.r.t. σ.
BoolHoist: 1. u is a Boolean term whose head is an uninterpreted predicate; 2. u is eligible in C; 3. u is not a variable; 4. u is not at the top level of a positive literal.