SAT-Based Subsumption Resolution

Subsumption resolution is an expensive but highly effective simplifying inference for first-order saturation theorem provers. We present a new SAT-based reasoning technique for subsumption resolution, without requiring radical changes to the underlying saturation algorithm. We implemented our work in the theorem prover Vampire.


Introduction
Saturation-based proof search is a popular approach to first-order theorem proving [6,14,18]. In addition to efficient inference systems [8,1], saturation provers also implement redundancy elimination to reduce the size of the search space. Redundancy elimination deletes clauses from the search space by showing them to be logical consequences of other (smaller) clauses, and therefore redundant. However, checking whether a first-order formula is implied by another first-order formula is undecidable, and so eliminating redundant clauses is in general undecidable too. In practice, saturation systems apply cheaper conditions for redundancy elimination, such as removing equational tautologies by congruence closure or deleting subsumed clauses by establishing multiset inclusion. Recently, SAT solving has been applied to efficiently detect and remove subsumed clauses [10]. We extend SAT-based reasoning in first-order theorem proving to a combination of subsumption and resolution, namely subsumption resolution [2] (Section 4).
Both subsumption and subsumption resolution are NP-complete [4]. To improve efficiency in practice, we (i) encode subsumption resolution as SAT formulas over (match) set constraints (Section 5) and (ii) directly integrate CDCL SAT solving for checking subsumption resolution in first-order theorem proving (Section 6). We implement our approach in the theorem prover Vampire [6], improving the state of the art in first-order reasoning (Section 7).
Related Work. Subsumption and subsumption resolution are some of the most powerful and frequently used redundancy criteria in saturation-based provers. Subsumption resolution is supported as contextual literal cutting in [14], along with efficient approaches for detecting multiset inclusions among clauses [6,18,13]. Special cases of unit deletion as a by-product of subsumption tests are also proposed in [16]. Much attention has been given to refinements of term indexing [16,13] to drastically reduce the set of candidate clauses checked for subsumption. Recently, these approaches have been complemented by SAT solving [10], reducing subsumption checking to SAT. Our work generalises this approach by solving for both subsumption and subsumption resolution via SAT.
SAT solvers have been applied widely to first-order theorem proving, including but not limited to AVATAR [17], instance-based methods [5], heuristic grounding [14], global subsumption [12] and combinations thereof [11], but using SAT solvers for classical subsumption methods is under-explored. To the best of our knowledge, SAT solving for subsumption resolution has so far not been addressed in the landscape of automated reasoning.

Illustrative Examples and Main Contributions
Let us illustrate a few challenges of subsumption resolution, which motivate our approach to solving it (Section 4). Given a pair of clauses L and M, denoted as (L, M), the problem is to decide whether M can be simplified by L via a special case of logical consequence. In Figure 1 we show examples where it is not obvious for which pairs (L_i, M_i) subsumption resolution can be applied. In fact, subsumption resolution can only be applied to (L_1, M_1). Later, we show how our approach determines that M_1 can be shortened in the presence of L_1 (Example 3.1), but also how subsumption resolution cannot be applied to the remaining pairs (Examples 5.1, 5.2, and 4.1). For example, (L_4, M_4) is filtered by pruning to bypass the SAT routine altogether.
Our Contributions.

Preliminaries
We assume familiarity with first-order logic with equality. We include standard Boolean connectives and quantifiers in the language, and the constants ⊤, ⊥ for truth and falsehood. We use x, y, z for first-order variables, c, d, e for constants, f, g for functions, p, q, r for atoms, l, m for literals, and L, M for clauses, all potentially with indices. If L is a clause l_1 ∨ ... ∨ l_n, we sometimes consider it as a multiset of its literals l_i, and write |L| for its cardinality (i.e. the number n of literals in L). The empty clause is denoted □. Free variables are universally quantified. An expression E is a term, atom, literal, clause, or formula.

Substitutions and matches.
A substitution σ is a (partial) mapping from variables to terms. The result of applying a substitution σ to an expression E is denoted σ(E) and is the expression obtained by simultaneously replacing each variable x in E by σ(x). For example, the application of σ := {x → f(c)} to the clause L := {p(x), q(x, y)} yields σ(L) = {p(f(c)), q(f(c), y)}. Note that σ(L) is a logical consequence of L.
A matching substitution, in short a match, between literals l and m is a substitution σ such that σ(l) = m. For example, the match of p(x) onto p(f(c)) is {x → f(c)}. Two matches are compatible and can be combined in the same substitution iff they do not assign different terms to the same variable. For example, the substitutions {x → f(c), y → g(d)} and {x → f(c), z → h(e)} are compatible, but {x → f(c)} and {x → g(c)} are not.
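The notions of match and compatibility can be made concrete with a small sketch. The term representation (variables as strings, function applications and constants as tuples) and the names `match_term` and `combine` are assumptions made for illustration only; they are not taken from Vampire.

```python
# Terms: a variable is a str such as "x"; a function application is a tuple
# ("f", arg1, ...); a constant is a 1-tuple such as ("c",). Atoms use the
# same tuple shape, e.g. ("p", "x") for p(x).

def match_term(pattern, target, subst):
    """Extend subst so that subst(pattern) == target, or return None.

    Matching is one-way: variables of target are treated like constants.
    """
    if isinstance(pattern, str):                 # pattern is a variable
        if pattern in subst:
            return subst if subst[pattern] == target else None
        return {**subst, pattern: target}
    # pattern is an application: symbol and arity must agree with target
    if isinstance(target, str) or pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p_arg, t_arg in zip(pattern[1:], target[1:]):
        subst = match_term(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst


def combine(s1, s2):
    """Union of two matches, or None if they bind some variable differently."""
    for var, term in s2.items():
        if var in s1 and s1[var] != term:
            return None
    return {**s1, **s2}


c, d, e = ("c",), ("d",), ("e",)
f_c = ("f", c)
print(match_term(("p", "x"), ("p", f_c), {}))                          # match of p(x) onto p(f(c))
print(combine({"x": f_c, "y": ("g", d)}, {"x": f_c, "z": ("h", e)}))   # compatible
print(combine({"x": f_c}, {"x": ("g", c)}))                            # incompatible
```

The three calls replay the examples of this paragraph: the first yields the match {x → f(c)}, the second combines two compatible substitutions, and the third reports an incompatibility.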
Saturation and redundancy. Many first-order systems apply the superposition calculus [1] in a saturation loop [8]. Given an input set F of clauses, saturation iteratively derives logical consequences and adds them to F. By soundness and completeness of superposition, if □ is derived the system can report unsatisfiability of F; if □ is not encountered and no further clauses can be derived, the system reports satisfiability of F.
Saturation is more efficient when F is as small as possible. For this reason, saturation-based provers also employ simplifying inferences. Simplifying inferences reduce the number or size of clauses in F. This is formalised using the following notion of redundancy: a ground clause M is redundant in a set of ground clauses F if M is a logical consequence of clauses in F that are strictly smaller than M w.r.t. a fixed simplification ordering ≻. A non-ground clause M is redundant in a set of clauses F if each ground instance of M is redundant in the set of ground instances of F. If M is redundant in F, then M can be removed from F while retaining completeness.
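Subsumption. A clause L subsumes a distinct clause M iff there is a substitution σ such that
    σ(L) ⊆_M M,    (1)
where ⊆_M denotes multiset inclusion. We also say that M is subsumed by L.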
Note that subsumed clauses are redundant.
Removing subsumed clauses M from the search space F is implemented through a simplifying rule, checking condition (1) over pairs of clauses (L, M) from F. Matches from every literal in L to some literal in M are checked; if a compatible set of matches is found, then M can be removed from F.
Subsumption resolution. Subsumption resolution aims to remove one redundant literal from a clause. Clauses M and L are said to be the main and side premise of subsumption resolution, respectively, iff there is a substitution σ, a set of literals L′ ⊆ L and a literal m′ ∈ M such that
    σ(L′) = {¬m′} and σ(L \ L′) ⊆ M \ {m′}.    (2)
If so, M can be replaced by M \ {m′}. Subsumption resolution is hence the rule
    L    M̶
    ──────────  (SR)
    M \ {m′}
We indicate the deletion of a clause M by drawing a line through it (M̶), and we refer to the literal m′ of M as the resolution literal of SR. Intuitively, subsumption resolution is binary resolution followed by subsumption of one of its premises by the conclusion. However, by combining two inferences into one it can be treated as a simplifying inference, which is advantageous from the perspective of proof search dynamics.

SAT-based Subsumption Resolution
We describe the main steps of our SAT-based approach for deciding the applicability of subsumption resolution on a pair (L, M) of clauses. The core of our work solves (2) by finding match substitutions between literals in L and M. Our technique is summarised in Algorithm 1.
Pruning. The first step of Algorithm 1 prunes pairs (L, M) of clauses that cannot be simplified by subsumption resolution due to a syntactic restriction over symbols in L and M, viz. whether the set of predicates in L is a subset of the predicates in M. If not, then there is a literal in L that cannot be matched to any literal in M, and hence subsumption resolution cannot be applied.
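The pruning test amounts to a subset check on predicate symbols, as the following sketch illustrates. The literal representation (a sign paired with an atom tuple) and the function names are assumptions for illustration. The two clauses are hypothetical stand-ins that only reproduce the predicate sets {p, q, r} and {p, q} reported for (L_4, M_4) in Example 4.1, not the actual clauses of Figure 1.

```python
# A literal is (sign, atom): sign is True for a positive literal, and atom
# is a tuple ("p", arg1, ...). Variables are strings, constants 1-tuples.

def predicates(clause):
    """Set of predicate symbols occurring in a clause."""
    return {atom[0] for _sign, atom in clause}


def may_apply_sr(L, M):
    """Necessary (not sufficient) condition for L to simplify M."""
    return predicates(L) <= predicates(M)


# Hypothetical clauses with the predicate sets {p, q, r} and {p, q}.
L4 = [(True, ("p", "x1")), (True, ("q", "x2")), (True, ("r", "x3"))]
M4 = [(True, ("p", ("c",))), (False, ("q", ("d",)))]

print(may_apply_sr(L4, M4))  # the pair is pruned because r does not occur in M4
```

The check ignores polarity on purpose: a literal of L may be matched either to a literal of M or to the complement of one.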
Example 4.1. The clause pair (L_4, M_4) from Figure 1 is pruned by Algorithm 1: the sets of predicates in L_4 and M_4 are respectively {p, q, r} and {p, q}, implying that the literal r(x_3) of L_4 cannot be matched to any literal in M_4.
Match set. The match set of Algorithm 1 computes matching substitutions over literals of L and M. The match set ms consists of a sparse matrix that assigns each literal pair (l_i, m_j) ∈ L × M a substitution σ_{i,j} such that σ_{i,j}(l_i) = m_j or σ_{i,j}(l_i) = ¬m_j. In addition, a polarity P_{i,j} is also assigned to (l_i, m_j), as follows: we set polarity P_{i,j} = + if σ_{i,j}(l_i) = m_j and P_{i,j} = − if σ_{i,j}(l_i) = ¬m_j. This matrix is sparse because in general not all literal pairs (l_i, m_j) ∈ L × M can be matched. Additionally, it is again possible to prune (L, M) while filling the match set: if a row of the match set is empty, then there is some literal in L that cannot be matched to any literal in M. In this case, subsumption resolution cannot use L to simplify M, so the pair (L, M) is pruned.
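As an illustration of the match set, the sketch below fills a sparse dictionary keyed by (i, j) with a polarity and a substitution, and prunes as soon as a row stays empty. The data representation, the names `match` and `match_set`, and the example clauses are assumptions chosen for this sketch; indices are 0-based here.

```python
def match(pattern, target, subst):
    """One-way matching: extend subst so that subst(pattern) == target, else None."""
    if isinstance(pattern, str):                       # variable
        if pattern in subst:
            return subst if subst[pattern] == target else None
        return {**subst, pattern: target}
    if isinstance(target, str) or pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p_arg, t_arg in zip(pattern[1:], target[1:]):
        subst = match(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst


def match_set(L, M):
    """Map (i, j) -> (polarity, substitution); None if some row is empty."""
    ms = {}
    for i, (sign_l, atom_l) in enumerate(L):
        row_empty = True
        for j, (sign_m, atom_m) in enumerate(M):
            sigma = match(atom_l, atom_m, {})
            if sigma is not None:
                # equal signs: sigma(l_i) = m_j; opposite signs: sigma(l_i) = ¬m_j
                ms[(i, j)] = ("+" if sign_l == sign_m else "-", sigma)
                row_empty = False
        if row_empty:
            return None                                # literal l_i has no match: prune
    return ms


# Hypothetical pair: L = ¬p(x) ∨ q(x), M = p(c) ∨ q(c)
L = [(False, ("p", "x")), (True, ("q", "x"))]
M = [(True, ("p", ("c",))), (True, ("q", ("c",)))]
print(match_set(L, M))
```

For this hypothetical pair, the literal ¬p(x) matches the complement of p(c) with negative polarity and q(x) matches q(c) with positive polarity, both under the substitution {x → c}.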
SAT solver. The solver of Algorithm 1 is the CDCL-based SAT solver introduced previously [10], which supports reasoning over matching substitutions in addition to standard propositional reasoning. This solver also features direct support for AtMostOne constraints. Solver performance was tuned for subsumption, which we retain for subsumption resolution. Each propositional variable v is associated with a substitution σ_v, and the solver ensures that all substitutions σ_v, for which v is assigned ⊤ in the current model, are compatible. Conceptually, a global substitution σ satisfying the invariant σ = ⋃{σ_v | v = ⊤} is kept in the SAT solver. In the following, we will write this binding as v ⇒ σ_v ⊆ σ.
Encoding constraints. Given the match set of (L, M), we formalise the subsumption resolution problem (2) as the conjunction of four constraints over matching substitutions. Our formalisation is given in Theorem 5.1 and is complete in the following sense: subsumption resolution can be applied over (L, M) iff the constraints of Theorem 5.1 are jointly satisfiable. Application of subsumption resolution is tested via satisfiability checking over our constraints from Theorem 5.1. Encodings of our subsumption resolution constraints are given in Section 5.
Building the conclusion. If a model is found for the constraints encoding subsumption resolution, the conclusion M \ {m′} of SR is built using the model.

Subsumption Resolution and SAT Encodings
As mentioned in Section 4, we turn the application of subsumption resolution SR over (L, M) into the satisfiability checking problem of Algorithm 1. We give our formalisation of SR in Theorem 5.1, followed by two encodings to SAT (Sections 5.1-5.2) and adjustments to subsumption (Section 5.3).
Theorem 5.1 (Subsumption Resolution Constraints). Clauses M and L are the main and side premise, respectively, of an instance of the subsumption resolution rule SR iff there exists a substitution σ that satisfies the following four properties:
    existence:     ∃i, j. σ(l_i) = ¬m_j    (3)
    uniqueness:    ∃j′. ∀i, j. (σ(l_i) = ¬m_j ⇒ j = j′)    (4)
    completeness:  ∀i. ∃j. (σ(l_i) = ¬m_j ∨ σ(l_i) = m_j)    (5)
    coherence:     ∀j. (∃i. σ(l_i) = m_j ⇒ ∀i. σ(l_i) ≠ ¬m_j)    (6)
We relate these constraints to the definition of subsumption resolution (2). The existence property (3) requires a literal m_j in M such that a literal l_i of L can be matched to ¬m_j, ensuring the existence of the resolution literal in SR. Uniqueness (4) asserts that the resolution literal m_j of SR is unique, required because SR performs only a single resolution step. Completeness (5) requires each literal in L to be matched either to the complement of a resolution literal, or to a literal in M. Since each (complementary) literal in L is matched to one (resolution) literal of M, the completeness property ensures that the conclusion of SR subsumes M. Finally, coherence (6) states that all literals in M must be matched by literals in L with uniform polarity. This implies that all literals of L other than the resolution literal are present in the conclusion of SR. We note that these constraints can be used to recreate Example 3.1.
Example 5.1. The clause pair (L_2, M_2) of Figure 1 does not satisfy the uniqueness property: both the match between p(x_1) and ¬p(y) and the match between q(x_2) and ¬q(c) are negative and so no substitution can satisfy all constraints simultaneously. Therefore, subsumption resolution cannot be applied over (L_2, M_2).
Example 5.2. The clause pair (L_3, M_3) violates the coherence property for all possible σ, since a negative map from p(x_1) to ¬p(y) cannot coexist with a positive map from ¬p(x_2) to ¬p(y). Subsumption resolution cannot be performed over (L_3, M_3).

Direct SAT Encoding of Subsumption Resolution
We present our encoding of subsumption resolution constraints as a SAT problem, allowing us to use Algorithm 1 for deciding the application of SR. In the sequel we consider the clauses L, M as in Theorem 5.1.
Compatibility. We introduce indexed propositional variables b⁺_{i,j} and b⁻_{i,j} to represent σ(l_i) = m_j and σ(l_i) = ¬m_j respectively, which we use to track compatible matching substitutions between literals of L and M. More precisely, a propositional variable is created if and only if the corresponding match is possible (i.e., in the formulas below, if no match exists, replace the corresponding propositional variable by ⊥). As it is not possible to have simultaneously a substitution σ_{i,j}(l_i) = m_j and σ_{i,j}(l_i) = ¬m_j, we also write b_{i,j} to mean either b⁺_{i,j} or b⁻_{i,j} when the polarity of the match is irrelevant. Following Section 4, the variables are bound to their substitutions:
    SAT-based compatibility:  ⋀_i ⋀_j (b_{i,j} ⇒ σ_{i,j} ⊆ σ)    (7)
SR constraints. Constraints (3)-(6) of Theorem 5.1 employ bounded quantification over the finite number of literals in L, M. Expanding these quantifiers over their respective domains, we translate them into the following SAT formulas:
    SAT-based existence:      ⋁_i ⋁_j b⁻_{i,j}    (8)
    SAT-based uniqueness:     ⋀_{i,i′} ⋀_{j<j′} (¬b⁻_{i,j} ∨ ¬b⁻_{i′,j′})    (9)
    SAT-based completeness:   ⋀_i ⋁_j b_{i,j}    (10)
    SAT-based coherence:      ⋀_j ⋀_{i,i′} (¬b⁺_{i,j} ∨ ¬b⁻_{i′,j})    (11)
SR as SAT problem. Based on the above, application of subsumption resolution is decided by the satisfiability of (7) ∧ (8) ∧ (9) ∧ (10) ∧ (11). This SAT formula extended with substitutions represents the result of encodeConstraint() in Algorithm 1 and is used further in Algorithm 3. When this formula is satisfiable, we construct the substitution σ required for SR as the union of the substitutions σ_{i,j} whose variables b_{i,j} are assigned ⊤ in the model. From the model of the SAT solver, we extract the first literal b⁻_{i,j} assigned ⊤, from which we conclude that the j-th literal in M is the resolution literal of SR. As such, application of SR over L and M results in replacing M by M \ {m_j}.
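To make the direct encoding tangible, the following sketch generates the propositional part of the constraints from a match set and decides them by exhaustive search. Everything here is an assumption for illustration: the match set is a dictionary (i, j) → (polarity, substitution) with 0-based indices, the exhaustive search merely stands in for the CDCL solver of Algorithm 1, and compatibility is checked on candidate models rather than propagated inside the solver.

```python
from itertools import combinations, product


def direct_encoding(ms, n_rows):
    """CNF over match variables (i, j); a literal is (variable, sign)."""
    neg = sorted(v for v, (pol, _) in ms.items() if pol == "-")
    pos = sorted(v for v, (pol, _) in ms.items() if pol == "+")
    cnf = [[(v, True) for v in neg]]                              # existence
    cnf += [[(a, False), (b, False)]
            for a, b in combinations(neg, 2) if a[1] != b[1]]     # uniqueness
    cnf += [[(v, True) for v in sorted(ms) if v[0] == i]
            for i in range(n_rows)]                               # completeness
    cnf += [[(a, False), (b, False)]
            for a in pos for b in neg if a[1] == b[1]]            # coherence
    return cnf


def compatible(substs):
    """True iff no variable is bound to two different terms."""
    merged = {}
    for s in substs:
        for var, term in s.items():
            if merged.setdefault(var, term) != term:
                return False
    return True


def solve(ms, n_rows):
    """Index j of the resolution literal, or None if SR does not apply."""
    variables = sorted(ms)
    cnf = direct_encoding(ms, n_rows)
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if not all(any(model[v] == sign for v, sign in clause) for clause in cnf):
            continue
        if compatible(ms[v][1] for v in variables if model[v]):
            return next(j for (i, j) in variables
                        if model[(i, j)] and ms[(i, j)][0] == "-")
    return None


# Hypothetical pair L = ¬p(x) ∨ q(x), M = p(c) ∨ q(c): SR removes m_0 = p(c).
ms_ok = {(0, 0): ("-", {"x": ("c",)}), (1, 1): ("+", {"x": ("c",)})}
# The two negative matches described in Example 5.1 violate uniqueness.
ms_two_negative = {(0, 0): ("-", {"x1": "y"}), (1, 1): ("-", {"x2": ("c",)})}
print(solve(ms_ok, 2), solve(ms_two_negative, 2))
```

The exhaustive search is exponential in the number of match variables and is only meant to make the semantics of the constraints checkable on small inputs.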
Remark 5.1. Implicitly, all l_i literals are mapped to at most one literal m_j. Indeed, if there were several literals m_j such that σ(l_i) = m_j or σ(l_i) = ¬m_j, then either the respective matches are not compatible (guarded by the compatibility property (7)), there are identical literals in M, or M is a tautology (which is not allowed).
Remark 5.2. While we defined b_{i,j} to be true if, and only if, σ_{i,j} ⊆ σ, we only encode the sufficient condition b_{i,j} ⇒ σ_{i,j} ⊆ σ. The completeness property (10) together with Remark 5.1 states that each l_i must have exactly one match to some m_j or ¬m_j. Therefore, if σ_{i,j} ⊆ σ then the respective b_{i,j} must be true and the condition also becomes necessary: b_{i,j} ⇐ σ_{i,j} ⊆ σ.
Example 5.3. Consider the pair (L_1, M_1) of Figure 1. The match set ms of Algorithm 1 has entries only for the pairs (l_1, m_1), (l_1, m_2) and (l_2, m_2). As the pair (l_2, m_1) is incompatible with any substitution, b_{2,1} = ⊥ need not be defined. This also allows us to disregard SAT clauses that are trivially satisfied. The existence (8) and completeness (10) properties cannot have empty clauses: this is easily detected while filling the match set, and the instance of SR is pruned. Adding falsified literals in these constraints is unnecessary. The uniqueness (9) and coherence (11) properties have only negative polarity literals and therefore there is no need to add clauses containing b_{2,1}. In light of the previous comment, we use variables b⁺_{1,1}, b⁻_{1,2} and b⁻_{2,2} and encode SR using the compatibility bindings (7) for these three variables together with the following constraints:
    SAT-based existence:             b⁻_{1,2} ∨ b⁻_{2,2}
    SAT-based completeness, i = 1:   b⁺_{1,1} ∨ b⁻_{1,2}
    SAT-based completeness, i = 2:   b⁻_{2,2}
The uniqueness (9) and coherence (11) properties are trivial here because the problem is simple: all b⁻_{i,j} have the same j, and no literal m_j can be mapped with different polarities. By using SAT solving from Algorithm 1 over the above SAT constraints, we obtain a SAT model in which b⁻_{2,2} is the first literal assigned ⊤ with negative polarity. The application of SR over (L_1, M_1) yields the conclusion M_1 \ {m_2} = p(g(y_1), c), replacing M_1.

Indirect SAT Encoding of Subsumption Resolution
SAT-based formulas (9) and (11) may yield many constraints, with worst-case complexity O(|L|²|M|²). In practice such situations rarely occur, since the match set ms is sparsely populated. Nevertheless, to alleviate this worst-case complexity, we further constrain the approach of Section 5.1. We introduce structuring propositional variables c_j such that c_j is ⊤ iff there exists a literal l_i with σ(l_i) = ¬m_j, which we encode as:
    ⋀_j (c_j ⇔ ⋁_i b⁻_{i,j})    (12)
SR as revised SAT problem. While the compatibility property (7) remains unchanged, the SR constraints of Theorem 5.1 are revised as given below.
    SAT-based revised existence:      ⋁_j c_j    (13)
    SAT-based revised uniqueness:     AtMostOne({c_j | j = 1, ..., |M|})    (14)
    SAT-based revised completeness:   ⋀_i ⋁_j b_{i,j}    (15)
    SAT-based revised coherence:      ⋀_j ⋀_i (¬c_j ∨ ¬b⁺_{i,j})    (16)
Using the above SAT formula as the result of encodeConstraint() in Algorithm 1, the worst-case behaviour is eliminated in exchange for O(|M|) additional propositional variables c_j. While the direct encoding of Section 5.1 is more efficient on small problems as it requires fewer variables and constraints, the indirect encoding of this section is expected to behave better on larger problems (see Section 7).
Remark 5.3. Note that the uniqueness property (14) is handled via AtMostOne constraints, based on the approach of [10]. If a variable c_j is set to ⊤, then our SAT solver in Algorithm 1 infers that all other variables c_{j′} are set to ⊥.
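The indirect encoding can be sketched in the same style. The code below builds the propositional clauses around the structuring variables c_j and keeps uniqueness as a separate AtMostOne group, mirroring the remark above. The function names and the data layout are assumptions for illustration, and creating c_j only for columns that have a negative match is a choice made in this sketch. The polarities used in the example are those of the variables b⁺_{1,1}, b⁻_{1,2} and b⁻_{2,2} named in Example 5.3, with 1-based indices.

```python
def indirect_encoding(polarity):
    """polarity maps (i, j) to "+" or "-" for every possible match.

    Returns (clauses, at_most_one); a literal is (variable, sign), a match
    variable is (i, j) and a structuring variable is ("c", j).
    """
    rows = sorted({i for i, _ in polarity})
    cols = sorted({j for (_, j), pol in polarity.items() if pol == "-"})
    clauses = []
    for j in cols:
        c_j = ("c", j)
        neg = [v for v, pol in polarity.items() if pol == "-" and v[1] == j]
        pos = [v for v, pol in polarity.items() if pol == "+" and v[1] == j]
        clauses.append([(c_j, False)] + [(v, True) for v in neg])   # c_j implies some b-
        clauses += [[(v, False), (c_j, True)] for v in neg]         # each b- implies c_j
        clauses += [[(c_j, False), (v, False)] for v in pos]        # coherence
    clauses.append([(("c", j), True) for j in cols])                # existence
    clauses += [[(v, True) for v in polarity if v[0] == i]
                for i in rows]                                      # completeness
    at_most_one = [("c", j) for j in cols]                          # uniqueness
    return clauses, at_most_one


def satisfies(model, clauses, at_most_one):
    """Check a full assignment against the clauses and the AtMostOne group."""
    cnf_ok = all(any(model[v] == sign for v, sign in clause) for clause in clauses)
    return cnf_ok and sum(model[v] for v in at_most_one) <= 1


polarity = {(1, 1): "+", (1, 2): "-", (2, 2): "-"}
clauses, amo = indirect_encoding(polarity)
model = {(1, 1): True, (1, 2): False, (2, 2): True, ("c", 2): True}
print(len(clauses), satisfies(model, clauses, amo))
```

The assignment checked at the end is the solution b⁺_{1,1} ∧ ¬b⁻_{1,2} ∧ b⁻_{2,2} ∧ c_2 reported for Example 5.4.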
SAT-based revised completeness, i = 2. The SAT solver returns b⁺_{1,1} ∧ ¬b⁻_{1,2} ∧ b⁻_{2,2} ∧ c_2 as a solution to the above SAT problem, from which the application of SR yields a similar result to that of Example 5.3.
Remark 5.4. We note that our method naturally supports commutative predicates, such as equality. Let ≃ denote object-level equality. Suppose we have literals l_i := a ≃ b and m_j := c ≃ d. Two propositional variables with associated matching substitutions σ_{i,j} and σ′_{i,j} are introduced, where σ_{i,j} matches a ≃ b against c ≃ d and σ′_{i,j} matches a ≃ b against d ≃ c. If zero or one matches exist, then the problem behaves exactly like the non-symmetric case. If both matches exist, then σ_{i,j} and σ′_{i,j} must be incompatible: otherwise, c and d would be identical terms and the trivial literal m_j would have been eliminated. Therefore, our SAT-based encodings for subsumption resolution do not need to be adapted and behave as expected.

SAT Constraints for Subsumption
In the new framework of Algorithm 1, the formulation suggested by [10] was adjusted to work with subsumption resolution. Algorithm 1 needs very little adaptation for subsumption: the encodeConstraint() method uses the encoding below, and the conclusion need not be built as only the satisfiability of the formulas is relevant. The re-written SAT encoding becomes:
Note that the set of propositional variables used in our SAT-based formulas (17)-(19) encoding subsumption is a subset of the variables used by our SAT-based subsumption resolution constraints.
Pruning for subsumption. The pruning technique described in Section 4 can be adapted into a stronger form for subsumption. In this case, we check for multiset inclusion between multisets of (predicate, polarity) pairs.
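A minimal sketch of this stronger test is shown below, using multisets of (predicate, polarity) pairs. The literal representation and the function name `may_subsume` are assumptions for illustration, and the clauses are hypothetical.

```python
from collections import Counter


def signature(clause):
    """Multiset of (predicate, polarity) pairs of a clause."""
    return Counter((atom[0], sign) for sign, atom in clause)


def may_subsume(L, M):
    """Necessary (not sufficient) condition for L to subsume M."""
    need, have = signature(L), signature(M)
    return all(have[key] >= count for key, count in need.items())


# Hypothetical clauses: L = p(x) ∨ p(y), M = p(c) ∨ q(c).
L = [(True, ("p", "x")), (True, ("p", "y"))]
M = [(True, ("p", ("c",))), (True, ("q", ("c",)))]
print(may_subsume(L, M))  # L needs two positive p-literals, M offers one
```

The pair above passes the predicate-subset test used for subsumption resolution but fails the multiset test, because subsumption requires every literal of L to land on its own literal of M with the same polarity.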

SAT-based Subsumption Resolution in Saturation
In this section we discuss the integration of our SAT-based subsumption resolution approach within saturation-based proof search.
Forward/backward simplifications. For the purpose of efficient reasoning, saturation algorithms use two main variants of simplification inferences implementing redundancy. Forward simplifications are applied on a newly generated clause M to check whether M can be simplified by an existing clause L. Backward simplifications use a newly generated clause L to check whether L can simplify existing clauses M. Backward simplification tends to be more expensive.
SAT-based subsumption resolution in saturation. Since subsumption is a stronger form of simplification, subsumption is checked before subsumption resolution. This means that subsumption resolution is applied only if subsumption fails for all candidate premises. We integrate Algorithm 1 within saturation so that it is used both for subsumption and subsumption resolution.
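The ordering described above (subsumption over all candidates first, subsumption resolution only afterwards) can be sketched as follows. To keep the example self-contained it assumes ground clauses, represented as frozensets of non-zero integers with negation as sign change, so that matching is the identity. The function names are hypothetical, and the sketch is not the combined procedure of Algorithm 4.

```python
def subsumes(L, M):
    """Ground subsumption: every literal of L occurs in M."""
    return L <= M


def subsumption_resolution(L, M):
    """Ground SR: return M without the resolution literal, or None."""
    for lit in L:
        # lit resolves against -lit in M; the rest of L must occur in the rest of M
        if -lit in M and (L - {lit}) <= (M - {-lit}):
            return M - {-lit}
    return None


def forward_simplify(M, candidates):
    """None if M is deleted, a shortened clause if SR applies, else M itself."""
    if any(subsumes(L, M) for L in candidates):
        return None
    for L in candidates:
        conclusion = subsumption_resolution(L, M)
        if conclusion is not None:
            return conclusion
    return M


M = frozenset({1, 2})
print(forward_simplify(M, [frozenset({-1, 2})]))                  # SR shortens M
print(forward_simplify(M, [frozenset({-1, 2}), frozenset({2})]))  # subsumption wins
```

In the second call the clause {2} subsumes M, so M is deleted even though an earlier candidate would have allowed subsumption resolution.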
Algorithms 2-3 display a variation of the integration of our SAT-based approach for checking subsumption resolution during saturation.Since most of the setup of subsumption is also required for subsumption resolution, both simplification rules are set up at the same time.As such, whenever turning to subsumption resolution, the same match set ms from Algorithm 2 can be reused, while also taking advantage of pruning steps performed during subsumption.
We modified the forward simplification algorithm as described in Algorithm 4. In this new setting, checking the same pair (L, M) for subsumption directly followed by subsumption resolution enables us to use Algorithms 2-3 efficiently. Algorithm 4 pays the price of checking subsumption resolution even if subsumption may succeed, but in practice inefficiencies in this respect are seen rarely.
Algorithm 4 Forward simplification with SAT-based subsumption resolution (procedure ForwardSimplify(M, F))

Role of indices.
When applying inferences that require terms or literals to unify or match, modern automated first-order theorem provers typically use term indices [9] to consider only viable candidates within the set of clauses. Subsumption and subsumption resolution are no exception. Our testbed system Vampire currently uses a substitution tree to index clauses for matching by their literals (Section 7).

Implementation and Experiments
We implemented and integrated our SAT-based subsumption resolution approach in the saturation-based first-order theorem prover Vampire [6].
Versions compared. We use the following versions of Vampire in our evaluation:
• Vampire_M is the master branch without SAT-based subsumption resolution;
• Vampire_I uses SAT-based subsumption resolution with the indirect encoding of Section 5.2 and a standard forward simplification algorithm with Algorithm 1, that is, Algorithm 4 is not used here;
• Vampire*_I uses the indirect encoding with Algorithms 2-4;
• Vampire*_D uses the direct encoding of Section 5.1 and Algorithms 2-4.
Experimental setting. To evaluate our work, we used the examples of the TPTP library (version 8.1.2) [15]. In our evaluation, 24 926 problems were used out of the 25 257 TPTP problems; the remaining problems are not supported by Vampire (e.g., problems with both higher-order operators and polymorphism).
Our experimental evaluation was done on a machine with two 32-core AMD Epyc 7502 CPUs clocked at 2.5 GHz and 1006 GiB of RAM (split into 8 memory nodes of 126 GiB shared by 8 cores). Each benchmark problem was run with the options -sa otter -t 60, meaning that we used the Otter saturation algorithm [7] with a 60-second time-out. We use the Otter strategy because it is the most aggressive in terms of simplification and therefore runs the most subsumption resolutions. We turned off the AVATAR framework (-av off) in order to have full control over SAT-based reasoning in Vampire.
Evaluation setup.Our evaluation process is summarised in Algorithm 5, incorporating the following notes.
• The conclusion clause of the subsumption resolution rule SR is not necessarily unique. Therefore, different versions of subsumption resolution, including our work based on direct and indirect SAT encodings, may not return the same conclusion clause of SR. Hence, applying different versions of subsumption resolution over the same clauses may change the saturation process.
• Saturation with our SAT-based subsumption resolution takes advantage of subsumption checking. Therefore, checking only subsumption resolution on pairs of clauses is neither a fair nor a viable comparison, as isolating subsumption checks from subsumption resolution is not what we aimed for (due to efficiency).
• CPU cache influences results. For example, two consecutive runs of Algorithm 4 may be up to 25% faster on the second execution, due to cache effects.
For the reasons above, we decided to measure the run time of a complete execution of Algorithm 4. To keep the branches of the proof search from changing, an Oracle is used to choose the path to follow. The Oracle is based on our indirect SAT encoding (Vampire*_I). This way, the same computation graph is used for all evaluated methods. To prevent cache preheating, we run the Oracle after the respective evaluated method. This way the cache is in a normal state for the evaluated method. To measure the run time of Algorithm 4, a Wrapper method was built on top of the Forward Simplify procedure of Algorithm 4. This Wrapper replaces the Forward Simplify loop in Vampire with minimal changes to the code. To empirically verify the correctness of our results, we used the Wrapper to compare the result of the evaluated method with the result of the Oracle.
Experimental details and analysis. Figure 2 shows the total number of Forward Simplify loops run in 60 seconds. However, the average and standard deviation were computed only on the intersection of the problems solved. That is, only the Forward Simplify loops finished by all the methods are taken into account. Otherwise, if a hard problem is solved in, for instance, 1 000 000 µs by one method, and times out for another, the average for the better method would increase a lot, but the weaker method would not be penalised. Table 1 summarises the average solving time of our evaluation.
Comparison of encodings. We correlated the constraint building and SAT solving time with the length of clauses, using the different encodings of Sections 5.1-5.2. Figure 3 shows that on larger clauses, the average computation time increases faster for the direct encoding than for the indirect encoding.
Table 2: Number of TPTP problems solved by the considered versions of Vampire. The run was made using the options -sa otter -av off with a timeout of 60 s. The Gain/Loss column reports the difference of solved instances compared to Vampire_M.
Experimental summary. Our experiments show that Vampire*_I yields the most stable approach for SAT-based subsumption resolution (Table 1), especially when it comes to solving large instances (Figure 3). Our results demonstrate the superiority of SAT-based subsumption resolution used with forward simplifications in saturation (e.g., Vampire*_D and Vampire*_I), as concluded by Table 2.

Conclusion
We advocate SAT solving for improving saturation-based first-order theorem proving. We encode powerful simplification rules, in particular subsumption resolution, as SAT problems, triggering eager and efficient reasoning steps for the purpose of keeping proof search small. Our experiments with Vampire showcase the benefit of SAT-based subsumption. In the future, we aim to further extend simplification rules with SAT solving, in particular focusing on subsumption demodulation for equality reasoning [3].

Example 5.4. Consider again the clause pair (L_1, M_1) of Figure 1. Compared to Example 5.3, our revised encoding of SR requires one additional variable c_2, as m_2 in Example 5.3 is used with negative polarity. The revised constraints are:

Figure 2 lists the cumulative instances solved by the respective Vampire versions, highlighting the strength of forward simplifications for effective saturation.

Fig. 2: Cumulative instances of applying subsumption resolution, using the TPTP examples.A point (n, t) on the graph means that n forward simplify loops were executed in less than t µs.The flatter the curve, the faster the Vampire version is.

Fig. 3: Average time (µs) spent on creating and solving SAT-based subsumption resolution constraints.
Algorithm 2 SAT-based subsumption in saturation

Table 1: Average time spent in the Forward Simplify loop. Vampire*_D is the fastest method, closely followed by Vampire*_I. However, the indirect encoding is much more stable and has a lower variance.