Superposition with Delayed Unification

Classically, in saturation-based proof systems, unification has been considered atomic. However, it is also possible to move unification to the calculus level, turning the steps of the unification algorithm into inferences. For calculi that rely on unification procedures returning large or even infinite sets of unifiers, integrating unification into the calculus is an attractive method of dovetailing unification and inference. This applies, for example, to AC-superposition and higher-order superposition. We show that first-order superposition remains complete when moving unification rules to the calculus level. We discuss some of the benefits this has even for standard first-order superposition and provide an experimental evaluation.


Introduction
Unification is a key feature in many proof calculi, particularly those based on the saturation framework. It acts as a filter, reducing the number of inferences that need to be carried out by instantiating terms only to the degree necessary. However, many unification algorithms have large time complexities and produce large, or even infinite, sets of unifiers. This is the case, for example, for AC-unification, which can produce a doubly exponential number of unifiers [10], and higher-order unification, which can produce an infinite set of unifiers [20]. This motivates the study of how unification rules can be integrated into proof calculi to allow them to dovetail with standard calculus rules. One way to achieve this is to use the concept of unification with abstraction [17,13]. The general idea is that during the unification process, instead of solving all unification pairs, certain pairs are retained and added to the conclusion of an inference as negative constraint literals. Calculus-level unification inferences then work on such literals to solve these constraints and remove the literals in the case they are unifiable. Note how this differs from constrained resolution-style calculi such as [4,15] where the constraints are completely separate from the rest of the clause and are not subject to inferences.
To demonstrate the idea of dedicated unification inferences in combination with unification with abstraction, we provide the following example.
A standard superposition calculus would proceed by unifying f(g(a, b)) and f(g(a, x)) with the unifier σ = {x → b} and then rewriting C₁ with C₂ to derive tσ ≉ tσ. Equality resolution on tσ ≉ tσ would then derive ⊥. It is also possible to proceed by rewriting C₁ with C₂ without computing σ and instead add the constraint literal g(a, x) ≉ g(a, b) to the conclusion to derive t ≉ t ∨ g(a, x) ≉ g(a, b). A dedicated unification inference could then decompose the constraint literal resulting in t ≉ t ∨ a ≉ a ∨ b ≉ x. Further unification inferences could bind x to b, and remove the trivial pairs a ≉ a and t ≉ t to derive ⊥.
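The delayed-unification route of this example can be laid out step by step. The derivation below is only a restatement of the steps described above; the order in which the two trivial literals are deleted is an assumption made for illustration, since in the calculus it depends on which literal is eligible.

```latex
\begin{align*}
  & t \not\approx t \,\lor\, g(a,x) \not\approx g(a,b) \\
  \vdash\; & t \not\approx t \,\lor\, a \not\approx a \,\lor\, b \not\approx x
     && \text{decompose } g(a,x) \not\approx g(a,b) \\
  \vdash\; & t\sigma \not\approx t\sigma \,\lor\, a \not\approx a
     && \text{bind, with } \sigma = \{x \mapsto b\} \\
  \vdash\; & t\sigma \not\approx t\sigma
     && \text{delete the trivial pair } a \not\approx a \\
  \vdash\; & \bot
     && \text{delete the trivial pair } t\sigma \not\approx t\sigma
\end{align*}
```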
In this paper, we investigate moving unification to the calculus level for standard first-order superposition. Whilst this may seem like a regressive step, as we lose much of unification's power to act as a filter on inferences and hence produce many more clauses, we think the investigation is valuable for two reasons.
Firstly, by showing how syntactic first-order unification can be lifted to the calculus level, we provide a roadmap for how more complex unification problems can be lifted to the calculus level. This may prove particularly useful in the higher-order case, where abstraction may expose terms to standard calculus rules that were unavailable before. Moreover, we note that in our calculus we do not turn the entire unification problem into a constraint, but rather a subproblem. Whilst this may be merely an interesting detail for first-order unification, for more complex unification problems, such a method could be used to eagerly solve simple unification subproblems whilst delaying complex subproblems by adding them as constraints.
Secondly, one of the most expensive operations in first-order theorem provers is the maintenance of indices. Indices are crucial to the performance of modern solvers, as they facilitate the efficient retrieval of terms unifiable or matchable with a query term. However, solvers typically spend a large amount of time inserting and removing terms from indices as well as unifying against terms in the indices. This is particularly the case in the presence of the AVATAR architecture [24] wherein a change in the model can trigger the insertion and removal of thousands of terms from various indices. By moving unification to the calculus level, we can replace complex indices with simple hash maps, since to trigger an inference we merely need to check for top symbol equality and not unifiability. Insertion and deletion become O(1) time operations. However, for first-order logic, we do not expect the time gained to offset the downsides of extra inferences carried out and extra clauses created. Our experimental results back up this hypothesis (see Section 7).
Our main contributions are:
■ Designing a modified superposition calculus that moves unification to the calculus level (Section 3).
■ Proving the calculus to be statically and dynamically refutationally complete (Section 5).
■ Providing a thorough empirical evaluation of the calculus (Section 7).
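The remark in the introduction that complex term indices can be replaced by hash maps keyed on the top symbol can be illustrated with a minimal sketch. The class `TopSymbolIndex`, the term encoding (variables as strings, applications as tuples `(symbol, arg, ...)`) and the decision to always return variable entries are assumptions made for this illustration; they are not taken from the implementation described in this paper.

```python
from collections import defaultdict

def top_symbol(term):
    """Head symbol of an application, or None when the term is a variable."""
    return term[0] if isinstance(term, tuple) else None

class TopSymbolIndex:
    """Hypothetical index: one hash bucket per top symbol."""

    def __init__(self):
        self._buckets = defaultdict(set)

    def insert(self, term, clause_id):
        # Expected O(1): a single hash-map update.
        self._buckets[top_symbol(term)].add((term, clause_id))

    def remove(self, term, clause_id):
        # Expected O(1): a single hash-map update.
        self._buckets[top_symbol(term)].discard((term, clause_id))

    def candidates(self, query):
        # No unifiability check: entries with the same head, plus variable
        # entries (bucket None), are all returned as inference partners.
        same_head = self._buckets.get(top_symbol(query), set())
        variables = self._buckets.get(None, set())
        return same_head | variables

a, b = ("a",), ("b",)
lhs = ("f", ("g", a, b))        # f(g(a, b))
query = ("f", ("g", a, "x"))    # f(g(a, x))

index = TopSymbolIndex()
index.insert(lhs, "C2")
found = index.candidates(query)
missed = index.candidates(("g", a, "x"))
index.remove(lhs, "C2")
after_removal = index.candidates(query)
```

The price of the cheap lookup is that candidates are no longer guaranteed to unify with the query; any mismatch is only discovered later by the unification inferences.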

Preliminaries
Syntax We consider standard monomorphic first-order logic with equality. We assume a signature consisting of a finite set of (monomorphically) typed function symbols and a single predicate, equality, denoted by ≈. A non-equality atom A can be expressed using equality as A ≈ ⊤ where ⊤ is a special function symbol [18]. Terms are formed in the normal way from variables and function symbols. We commonly use s, t or u or their primed variants to refer to terms. We write s : τ to show that term s has type τ. A term is ground if it contains no variables. We use the notation s̄ₙ to refer to a tuple or list of terms of length n. More generally, we use the over bar notation to refer to tuples and lists of various objects. Where the length of the tuple or list is not relevant, we drop the subscript. By sᵢ we denote the ith element of the tuple s̄ₙ. Literals are positive or negative equalities written as s ≈ t and s ≉ t respectively. We use s ≈̇ t to refer to either a positive or a negative equality. Clauses are multisets of literals. A clause that contains no literals is known as the empty clause and denoted by ⊥.
A substitution is a mapping from variables to terms. We assume, w.l.o.g., that all substitutions are idempotent. We commonly denote substitutions using σ and θ and denote the application of a substitution σ to a term s by sσ. A substitution θ is grounding for a term s, if sθ is ground. The definition of grounding substitution can be extended to literals and clauses in the obvious manner. A substitution σ is a unifier of terms s and t if sσ = tσ. A unifier σ is more general than a unifier σ′ if there exists a substitution ρ such that σρ = σ′. With respect to syntactic first-order unification, if two terms are unifiable then they have a single most general unifier up to variable naming [1].
A transitive irreflexive relation over terms is known as an ordering. The superposition calculus we present below is, as usual, parameterised by a simplification ordering on ground terms. An ordering ≻ is a simplification ordering, if it possesses the following properties. It is total on ground terms. It is compatible with contexts, meaning that if s ≻ s′ then t[s] ≻ t[s′]. It is well-founded. Note that every simplification ordering has the subterm property. Namely, that if t is a proper subterm of s, then s ≻ t. For non-ground terms, the only property that is required of the ordering is that it is stable under substitution. That is, if s ≻ t then for all substitutions σ, sσ ≻ tσ. We extend the ordering ≻ to literals in the standard fashion via its multiset extension. A positive literal s ≈ s′ is treated as the multiset {s, s′}, whilst a negative literal s ≉ s′ is treated as the multiset {s, s, s′, s′}. The ordering is extended to clauses by its two-fold multiset extension. We use ≻ to denote the ordering on terms and its multiset extensions to literals and clauses.
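The lifting of the term ordering to literals and clauses can be made concrete with a small sketch. The multiset comparison follows the usual Dershowitz-Manna definition. The ground term order `term_greater` (size first, then a structural tie-break) is a toy stand-in chosen for this illustration and is not the ordering used in the paper, and the literal encoding `(positive, s, t)` is likewise an assumption.

```python
from collections import Counter

def multiset_greater(m, n, greater):
    """Multiset extension: m > n iff m != n and every element that n has in
    excess is dominated by some element that m has in excess."""
    m, n = Counter(m), Counter(n)
    m_only, n_only = m - n, n - m
    if not m_only and not n_only:
        return False  # equal multisets
    return all(any(greater(x, y) for x in m_only) for y in n_only)

def size(term):
    """Number of symbols in a ground term encoded as (symbol, arg, ...)."""
    return 1 + sum(size(arg) for arg in term[1:])

def term_greater(s, t):
    # Toy total order on ground terms (an assumption for illustration).
    return (size(s), s) > (size(t), t)

def literal_multiset(literal):
    positive, s, t = literal
    # s = t is {s, t}; s != t is {s, s, t, t}, as in the text.
    return [s, t] if positive else [s, s, t, t]

def literal_greater(l1, l2):
    return multiset_greater(literal_multiset(l1), literal_multiset(l2), term_greater)

def clause_greater(c1, c2):
    # Two-fold multiset extension: clauses are multisets of literals.
    return multiset_greater(c1, c2, literal_greater)

a, b = ("a",), ("b",)
fa = ("f", a)
neg = (False, fa, a)   # f(a) != a
pos = (True, fa, a)    # f(a) = a
small = (True, a, b)   # a = b
```

With this encoding, `literal_greater(neg, pos)` holds because the negative literal doubles each term, which is the standard reason why a negative literal outweighs a positive literal over the same terms.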
Semantics An interpretation is a pair (U, I), where U is a set of typed universes and I is an interpretation function, such that for each function symbol f :
Intuitively, what we are aiming for with our calculus, is that whenever standard superposition applies a substitution σ to a conclusion with the side condition "σ is a unifier of terms t₁ and t₂", our calculus adds a constraint t₁ ≉ t₂ to the conclusion. The calculus then has further inference rules that mimic the steps of a first-order unification algorithm and work on negative literals. Our presentation below does not quite follow this intuition. Instead, if the unification problem is trivial we solve it immediately. If it is non-trivial, we carry out a single step of unification and add the resulting sub-problems as constraints. Our reasons for doing this are two-fold.
1. Adding the entire unification problem t₁ ≉ t₂ as a constraint can lead to a constraint literal that is larger, with respect to ≻, than any literal occurring in the premises. This causes difficulties in the completeness proof.
2. More pertinently, keeping in mind our planned applications to more complex logics, we wish to show that delayed unification remains complete even when only selected sub-problems of the original unification problem are added as constraints.
In the context of higher-order logic, for example, this could allow for the eager solving of simple unification sub-problems whilst only the most difficult are added as constraints.See Section 6 for further details.
Wherever we present a clause as a subclause C′ and a literal l (e.g. C′ ∨ l), we denote the entire clause by the same name as the subclause without the dash (e.g. we refer to the clause C′ ∨ l by C). As in the classical superposition calculus, our calculus is parameterised by a selection function that is used to restrict the number of applicable inferences in order to avoid the search space growing unnecessarily. A selection function sel is a function that maps a clause to a subset of its negative literals. We say that literal l is σ-eligible in a clause C′ ∨ l if it is selected in C (l ∈ sel(C)), or there are no selected literals and lσ is maximal in Cσ. Strict σ-eligibility is defined in a like fashion, with maximality replaced by strict maximality. Where σ is empty, we sometimes speak of eligibility instead of σ-eligibility. In what follows, CS is a multiset of literals that we refer to as constraints.
Both rules share the following side conditions. Let t stand for either f(t̄ₙ) or x. For SUP, the substitution σ mentioned in the side conditions is of course empty.
For VEQFACT, either u or u′ must be a variable and σ is the most general unifier of u and u′. The side conditions for EQFACT are:
The side conditions for VEQFACT are:
The calculus also contains the following resolution / unification inferences. We refer to these as unification inferences, because each inference represents carrying out a single step of the well-known Robinson unification algorithm [11].
where for BIND, σ = {x → t} and x does not occur in t.
All three inferences require that the final literal be σ-eligible in Cσ (for DECOMPOSE and REFLDEL, σ is empty). We provide some examples to show how the calculus works.
Example 1. Consider the unsatisfiable clause set:
Example 2. Consider the unsatisfiable clause set:
Note 1. We abuse terminology and use inference and inference rule to refer both to schemas such as shown above, as well as concrete instances of such schemas. Given an inference ι, we refer to the tuple of its premises by prems(ι), to its maximal premise by mprem(ι), and to its conclusion by concl(ι).
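The way the three unification inferences act on a negative literal can be sketched in a few lines. This is a simplified reading of BIND, DECOMPOSE and REFLDEL, not the authors' implementation: eligibility and selection are ignored, the inference always acts on the last literal of the clause, and the encoding (variables as strings, applications as tuples, literals as `(sign, s, t)`) is an assumption made for illustration.

```python
def is_var(term):
    return isinstance(term, str)

def apply(subst, term):
    """Apply a substitution (dict from variable to term) to a term."""
    if is_var(term):
        return subst.get(term, term)
    return (term[0],) + tuple(apply(subst, arg) for arg in term[1:])

def occurs(var, term):
    if is_var(term):
        return var == term
    return any(occurs(var, arg) for arg in term[1:])

def unification_step(clause):
    """One unification inference on the last literal of a non-empty clause.
    Returns (rule name, conclusion) or None if no rule applies."""
    *rest, (sign, s, t) = clause
    if sign != "-":
        return None
    if s == t:
        return "REFLDEL", rest
    if is_var(t) and not is_var(s):
        s, t = t, s
    if is_var(s):
        if occurs(s, t):
            return None  # occurs check fails, BIND is not applicable
        sigma = {s: t}
        return "BIND", [(sg, apply(sigma, l), apply(sigma, r)) for sg, l, r in rest]
    if s[0] == t[0] and len(s) == len(t):
        return "DECOMPOSE", rest + [("-", x, y) for x, y in zip(s[1:], t[1:])]
    return None  # clash of top symbols

def run(clause):
    """Apply unification inferences until the clause is empty or stuck."""
    trace = []
    while clause:
        step = unification_step(clause)
        if step is None:
            break
        rule, clause = step
        trace.append(rule)
    return trace, clause

a, b, t = ("a",), ("b",), ("t",)
# t != t  \/  g(a, x) != g(a, b), the constraint clause from the introduction
start = [("-", t, t), ("-", ("g", a, "x"), ("g", a, b))]
trace, final = run(start)
stuck_trace, stuck = run([("-", ("f", a), ("g", a))])
```

On the clause from the introduction the sketch applies DECOMPOSE, BIND and then REFLDEL twice, ending in the empty clause, whereas the clashing literal f(a) ≉ g(a) admits no unification inference at all.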
We utilise Waldmann et al.'s framework [25] for proving the completeness of our calculus. Hence, our redundancy criterion is based on their intersected lifted criterion. In instantiating the framework, we roughly follow Bentkamp et al. [6]. Let the calculus defined above be referred to as Inf. We introduce a ground inference system GInf that coincides with standard superposition [3]. That is, it contains the well known three inferences, SUP, EQFACT and EQRES. We refer to these inferences by GSUP, GEQFACT and GEQRES to indicate that they are only applied to ground clauses. Following the notation of the framework, we write Inf(N) (GInf(N)) to denote the set of all Inf (GInf) inferences with premises in a clause set N. We introduce a grounding function G that maps terms, literals and clauses to the sets of their ground instances. For example, given a clause C, G(C) is the set {Cθ | θ is a grounding substitution}. We extend the function G to clause sets by letting G(N) = ⋃_{C ∈ N} G(C) where N is a set of clauses.
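The grounding function G can be illustrated on a finite fragment. In general G(C) is infinite, so the sketch below only enumerates the instances of a clause over a given finite list of ground terms; the function name `ground_instances`, the `universe` parameter and the term encoding are assumptions made for this illustration.

```python
from itertools import product

def is_var(term):
    return isinstance(term, str)

def variables(term):
    """Set of variables occurring in a term."""
    if is_var(term):
        return {term}
    result = set()
    for arg in term[1:]:
        result |= variables(arg)
    return result

def apply(subst, term):
    if is_var(term):
        return subst.get(term, term)
    return (term[0],) + tuple(apply(subst, arg) for arg in term[1:])

def ground_instances(clause, universe):
    """Yield C.theta for every theta mapping the variables of C into universe."""
    xs = sorted(set().union(*(variables(s) | variables(t) for _, s, t in clause)))
    for values in product(universe, repeat=len(xs)):
        theta = dict(zip(xs, values))
        yield [(sign, apply(theta, s), apply(theta, t)) for sign, s, t in clause]

a, b = ("a",), ("b",)
clause = [("+", ("f", "x"), "y")]            # f(x) = y
instances = list(ground_instances(clause, [a, b]))
ground_clause = [("-", a, b)]                # a != b, already ground
single = list(ground_instances(ground_clause, [a, b]))
```

A clause with two variables yields four instances over a two-element universe, and a clause that is already ground is its own single instance.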
A ground clause C is redundant with respect to a set of ground clauses N if there are clauses C₁, ..., Cₙ ∈ N such that for 1
The set of all ground clauses redundant with respect to a set of ground clauses N is denoted GRed_Cl(N).
The set of all clauses redundant with respect to a set of clauses N is denoted Red_Cl(N).
In order to define redundant inferences, we have to pay careful attention to selection functions. For non-ground clauses, we fix a selection function sel. We then let G(sel) be a set of selection functions on ground clauses with the following property. For each gsel ∈ G(sel), for every ground clause C, there exists a clause D such that C ∈ G(D) and the literals selected in C by gsel correspond to those selected in D by sel. We write GInf^gsel to show that the ground inference system GInf is parameterised by the selection function gsel. Let ι be an inference in Inf. We extend the grounding function G to a family of grounding functions G^gsel for each gsel ∈ G(sel). Each function G^gsel maps terms, literals and clauses as above, and maps members of Inf to subsets of GInf^gsel as follows.
Definition 1 (Ground Instance of an Inference). Let ι be of the form C₁, ..., Cₙ ⊢
for some grounding substitution θ. In this case, we say that ι_g is the θ-ground instance of ι. Note that we ignore the constraints in the definition of ground instances.
The set of all ground inferences redundant with respect to a set N is denoted GRed_I^gsel(N).
An inference ι is redundant with respect to a clause set N if for every gsel ∈ G(sel) and for every ι′ ∈ G^gsel(ι), ι′ ∈ GRed_I^gsel(G(N)). In words, every ground instance of the inference is redundant with respect to G(N). We denote the set of all redundant inferences with respect to a set N as Red_I(N).
A clause set N is saturated up to redundancy by an inference system Inf if every member of Inf(N) is redundant with respect to N.
Note 2. Given the definition of clause redundancy above, the REFLDEL inference can be utilised as a simplification inference. That is, the conclusion of the inference renders the premise redundant.

Refutational Completeness
To prove refutational completeness we utilise the above-mentioned framework of Waldmann et al. [25]. In particular, we use Theorem 14 from the paper to lift completeness from the ground level to the non-ground level. We restate Theorem 14 here for clarity and to keep the paper self-contained. We then present it in our notation. Let GRed = (GRed_I^gsel, GRed_Cl) and Red = (Red_I, Red_Cl).
Theorem 14 (from Waldmann et al. [25]). If (GInf^q, Red^q) is statically refutationally complete w.r.t. |=^q for every q ∈ Q and if for every N ⊆ F that is saturated w.r.t. FInf and Red^∩G there exists a q such that GInf^q(G^q(N)) ⊆ G^q(FInf(N)) ∪ Red_I^q(G^q(N)), then (FInf, Red^∩G) is statically refutationally complete w.r.t. |=^∩_G.
Thus, in our context, the set Q is G(sel), the ground inference system GInf^q maps to GInf^gsel, the ground redundancy criterion Red^q maps to (GRed_I^gsel, GRed_Cl) and the ground entailment relation |=^q maps to standard entailment on first-order clauses. Moreover, the non-ground inference system FInf maps to Inf and the redundancy criterion Red^∩G maps to (Red_I, Red_Cl). Note that this final mapping is not exact, as the criterion Red^∩G does not allow for a tiebreaker ordering, such as the strict subsumption relation, to be utilised in the definition of non-ground redundancy. However, this mismatch can easily be repaired since Theorem 16 of the framework paper extends the result of Theorem 14 to the case where tiebreaker orderings are used.
As our ground inference systems GInf^gsel are ground superposition systems, static refutational completeness with respect to standard entailment and standard redundancy is a famous result. See for example [2]. What remains for us to prove in order to apply Theorem 14 and show the static refutational completeness of Inf, is:
1. For every gsel ∈ G(sel), the grounding function G^gsel is a grounding function in the sense of the framework.
2. For every clause set N saturated up to redundancy by Inf, there exists a gsel ∈ G(sel) such that GInf^gsel(G(N)) ⊆ G^gsel(Inf(N)) ∪ GRed_I^gsel(G(N)). In words, there exists a ground selection function such that every ground inference with that selection function and premises in G(N) is either the instance of a non-ground inference with premises in N or is redundant with respect to G(N).
Lemma 1. For every gsel ∈ G(sel), the grounding function G^gsel is a grounding function in the sense of the framework.
Proof. We need to show that properties (G1)-(G3) defined by Waldmann et al. hold for grounding functions. These properties are:
As properties (G1) and (G2) relate to the grounding of terms and clauses, and our grounding of these is fully standard, we skip these. We prove (G3), which in our terminology is: for every ι ∈ Inf, G^gsel(ι) ⊆ GRed_I^gsel(G(concl(ι))). This can be achieved by showing that for every ι′ ∈ G^gsel(ι), there exist clauses C ∈ G(concl(ι)) such that
In what follows, let θ be the substitution by which ι′ is a grounding of ι.
On the other hand, if CS is not empty, let u = f(t̄ₙ) and u′ = f(s̄ₙ) be the two terms within prems(ι) from which the constraints are created. By the existence of ι′, we have that uθ = u′θ, and hence that tᵢθ = sᵢθ for 1 ≤ i ≤ n. Hence, every literal in CSθ has the form t ≉ t and is trivially false in every interpretation. Thus, we still have concl(ι)θ |= concl(ι′). Moreover, by the subterm property of the ordering ≻ we have that tᵢθ ≉ sᵢθ is smaller than the maximal / selected literal of mprem(ι′) for 1 ≤ i ≤ n and hence that concl(ι)θ ≺ mprem(ι′). ⊓ ⊔
Lemma 2. Let σ be the most general unifier of terms s and s′, and θ be any unifier of the same terms. Then for any term t, (tσ)θ = tθ.
Proof. Since σ is the most general unifier, there must be a substitution ρ such that σρ = θ. Hence (tσ)θ = (tσ)σρ = tσρ = tθ where the second to last step follows from the fact that σ is idempotent. ⊓ ⊔
Lemma 3. For every clause set N saturated by Inf, there exists a gsel ∈ G(sel) such that GInf^gsel(G(N)) ⊆ G^gsel(Inf(N)) ∪ GRed_I^gsel(G(N)).
Proof. For every D ∈ G(N) there must exist a clause C ∈ N such that D ∈ G(C). Let ≫ be an arbitrary well-founded ordering on clauses. We let C = G⁻¹(D) denote the ≫-smallest clause such that D ∈ G(C). We then choose the gsel ∈ G(sel) that for a clause D ∈ G(N) selects the corresponding literals to those selected by sel in G⁻¹(D).
Given this gsel, we need to show that every inference with premises in G(N) is either the ground instance of an inference with premises in N, or is redundant with respect to G(N).
A SUP inference is redundant if the term t replaced in the second premise occurs at or below a variable. The proof is exactly the same as in the standard proof of the completeness of superposition [3], so we do not repeat it. All other inferences can be shown to be the ground instance of inferences from clauses in N.
Let ι ∈ GInf^gsel be the following GSUP inference with premises in G(N).
ι fulfils all the side conditions of GSUP. Let σ be any substitution. The literal tθ ≈ t′θ being strictly maximal in Dθ implies that tσ ≈ t′σ is strictly maximal in Dσ due to the stability under substitution of ≻. The literal sθ[tθ] ≈ s′θ being (strictly) eligible in Cθ with respect to gsel implies that sσ ≈ s′σ is strictly eligible in Cσ with respect to sel. Let p be the position of tθ within sθ and let u be the subterm of s at p. Since the term tθ does not occur below a variable of C, such a position must exist. Moreover, u cannot be a variable, since if it were, tθ would occur at a variable of C. As θ is a unifier of u and t, it must be the case that either t is a variable, or u and t have the same top symbol. Further, Dθ ≺ Cθ implies that Cσ ⋠ Dσ, tθ ≻ t′θ implies that tσ ⋠ t′σ, and sθ[t′θ] ≻ s′θ implies sσ ⋠ s′σ. Thus, if t is not a variable, there exists the following SUP inference ι′ from clauses D and C.
That is, the grounding of the conclusion of ι′ less the constraint literals is equal to the conclusion of ι. Thus, ι is the θ-ground instance of ι′ as per Definition 1. If t is a variable x, then there exists the following VSUP inference ι′ from clauses D and C.
Where σ = {x → u} is the most general unifier of t and u. Thus, we can use Lemma 2 to show that concl(ι′)θ = concl(ι) and again ι is the θ-ground instance of ι′. Let ι ∈ GInf^gsel be the following GEQFACT inference with premise in G(N).
and ι fulfils all the side conditions of GEQFACT. Let σ be any substitution. The literal uθ ≈ vθ being maximal in Dθ implies that uσ ≈ vσ is maximal in Dσ. Since θ is a unifier of u′ and u, at least one of them must be a variable, or they must share a top symbol. Moreover, uθ ≻ vθ implies that uσ ⋠ vσ and u , making ι the θ-ground instance of ι′ as per Definition 1. If either u or u′ is a variable there exists the following VEQFACT inference ι′ from C.
Where σ is the most general unifier of u and u′. Thus, we can use Lemma 2 to show that concl(ι′)θ = concl(ι). Finally, let ι ∈ GInf^gsel be the following GEQRES inference with premise in G(N).
and ι fulfils all the side conditions of GEQRES. Let σ be any substitution. The literal sθ ≉ s′θ being eligible with respect to gsel in Cθ implies that s ≉ s′ is eligible in C with respect to sel. Since θ is a unifier of s and s′, at least one of them must be a variable, or they must share a top symbol. If s = s′, then there exists the following REFLDEL inference ι′ from C.
Otherwise we have two options. If either s (or analogously s′) is a variable, then there is the following BIND inference ι′ from C.
Otherwise s and s′ must share a top symbol and there is the following DECOMPOSE inference ι′ from C.
In the first case, we have concl(ι′)θ = concl(ι). In the second case, σ is the most general unifier of s and s′, so we can use Lemma 2 to show that concl(ι′)θ = concl(ι). In the last case, we have that C′θ = concl(ι). Thus in all cases, ι is the θ-ground instance of ι′.

⊓ ⊔
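Lemma 2, which is used repeatedly in the proof above, can be sanity-checked on a concrete instance. The sketch below computes an idempotent most general unifier with a textbook Robinson-style procedure and then checks (tσ)θ = tθ for one hand-picked unifier θ. The function `mgu`, the example terms and the term encoding are assumptions chosen for illustration; a check on one instance is of course not a proof.

```python
def is_var(term):
    return isinstance(term, str)

def apply(subst, term):
    if is_var(term):
        return subst.get(term, term)
    return (term[0],) + tuple(apply(subst, arg) for arg in term[1:])

def occurs(var, term):
    if is_var(term):
        return var == term
    return any(occurs(var, arg) for arg in term[1:])

def mgu(s, t):
    """Idempotent most general unifier of s and t, or None if none exists."""
    subst, todo = {}, [(s, t)]
    while todo:
        l, r = todo.pop()
        l, r = apply(subst, l), apply(subst, r)
        if l == r:
            continue
        if is_var(r) and not is_var(l):
            l, r = r, l
        if is_var(l):
            if occurs(l, r):
                return None
            # Keep the substitution idempotent: rewrite old bindings first.
            subst = {v: apply({l: r}, u) for v, u in subst.items()}
            subst[l] = r
        elif l[0] == r[0] and len(l) == len(r):
            todo.extend(zip(l[1:], r[1:]))
        else:
            return None
    return subst

a = ("a",)
s1 = ("f", "x", ("g", "y"))          # f(x, g(y))
s2 = ("f", ("g", "z"), "x")          # f(g(z), x)
sigma = mgu(s1, s2)
theta = {"x": ("g", a), "y": a, "z": a}   # some other unifier of s1 and s2
t = ("h", "x", "y", "z", "w")             # an arbitrary term

lemma_holds = apply(theta, apply(sigma, t)) == apply(theta, t)
```

Here θ instantiates every variable that σ leaves open, yet applying σ before θ makes no difference to the result, as the lemma predicts.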
Using Lemmas 1 and 3 we can instantiate Theorem 14 to prove the static refutational completeness of Inf. There is a slight issue here, as Theorem 14 gives us refutational completeness with respect to Herbrand entailment. That is
We would like to prove completeness with respect to entailment as defined in Section 2 (known as Tarski entailment). This issue can easily be resolved by showing that the two concepts are equivalent with regards to refutations which can be achieved in a manner similar to Bentkamp et al. (Lemma 4.19 of [6]).
Theorem 1 (Static refutational completeness).For a set of clauses N saturated up to redundancy by Inf , N |= ⊥ if and only if ⊥ ∈ N.
Theorem 17 of Waldmann et al.'s framework can be used to derive dynamic refutational completeness from static refutational completeness.We refer readers to the framework for the formal definition of dynamic refutational completeness.
Theorem 2 (Dynamic refutational completeness).The inference system Inf is dynamically refutationally complete with respect to the redundancy criterion (Red I , Red Cl ).

Extending to Higher-Order Logic
We sketch how the ideas above can be extended to higher-order logic. This is ongoing research, and many of the technical details have yet to be fully worked out. Here, we provide a (very) informal description and then provide examples. The higher-order unification problem is undecidable and there can exist a potentially infinite number of incomparable most general unifiers for a pair of terms [12]. Existing higher-order paramodulation style calculi deal with this issue in two main ways. One method is to abandon completeness and only unify to some predefined depth [22]. Another approach is to produce potentially infinite streams of unifiers and interleave the fetching of items from such streams with the standard saturation procedure [7]. Our idea is to solve easy sub-problems eagerly, such as when terms are first-order or in the pattern fragment [16], and add harder sub-problems as constraints. We then utilise dedicated inferences on negative literals to mimic the rules of Huet's well known (pre-)unification procedure [12]. We think that inferences similar to the following two could be sufficient to achieve refutational completeness.
In both rules, each zᵢ is a fresh variable of the relevant type, and x s̄ₙ ≉ f t̄ₘ is selected in C. PROJECT has k ≤ n conclusions, one for each yᵢ of suitable type. We hope that through a careful definition of the selection function, along with the use of purification, we can avoid the need to apply unification inferences to flex-flex literals (negative literals where both sides of the equality have variable heads). Moreover, we are hopeful that the calculus we propose can remain complete without the need for inferences that carry out superposition beneath variables such as the FLUIDSUP rule of λ-superposition [7] and the SUBVARSUP rule of combinatory-superposition [9].
Example 3. Consider the unsatisfiable clause set:
where σ = {y → c}. Assume that the literal x a is selected in C₃. We can carry out either a PROJECT step on this literal or an IMITATE step. The result of a project step is
Applying the substitution and β-reducing results in C₅ = tσ ≉ tσ ∨ a ≉ a ∨ b ≉ b from which it is easy to reach a contradiction.
Example 4 (Example 1 of Bentkamp et al. [7]). Consider the unsatisfiable clause set:
An IMITATE inference on the first literal of C₃ followed by the application of the substitution and some β-reduction results in
We again carry out IMITATE on the first literal followed by an EQRES to leave us with
We can now carry out a SUP inference between C₁ and C₆ resulting in
from which it is simple to derive ⊥ via an application of IMITATE on either the first or the third literal. Note that the empty clause was derived without the need for an inference that simulates superposition underneath variables, unlike in [7].
Example 5 (Example 2 of Bentkamp et al. [7]). Consider the unsatisfiable clause set:
An EQRES inference on C₂ results in C₃ = y (λx. g (f x)) a ≉ g c ∨ y ≉ λw x. w x. Assuming that the second literal is selected, an EQRES inference results in C₄ = (y (λx. g (f x)) a ≉ g c){y → λw x. w x}. Simplifying C₄ via applying the substitution and β-reducing, we achieve g (f a) ≉ g c. Superposing C₁ onto this clause we end up with C₅ = g c ≉ g c from which the empty clause can easily be derived. Note again that the empty clause has been derived without recourse to a FLUIDSUP-like inference.

Experimental Results
We implemented the calculus in the Vampire theorem prover [14]. We also implemented a variant of the calculus that utilises fingerprint indices [19] to act as an imperfect filter.
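The role of a fingerprint index as an imperfect filter can be sketched as follows. The code follows the general idea of fingerprint indexing (sample a term at a fixed list of positions and compare the sampled features), with the feature values and the compatibility check written from our understanding of that technique. It is not Vampire's implementation, and the sample positions, names and term encoding are assumptions made for illustration.

```python
VAR, BELOW_VAR, NONE = "<var>", "<below-var>", "<none>"

def is_var(term):
    return isinstance(term, str)

def feature(term, position):
    """Sample a term at a position (a tuple of 1-based argument indices)."""
    for i in position:
        if is_var(term):
            return BELOW_VAR   # position lies below a variable
        if i >= len(term):
            return NONE        # position cannot exist in any instance
        term = term[i]
    return VAR if is_var(term) else term[0]

def fingerprint(term, positions):
    return tuple(feature(term, p) for p in positions)

def may_unify(fp1, fp2):
    """False only when the fingerprints rule unification out."""
    for x, y in zip(fp1, fp2):
        if x == y or BELOW_VAR in (x, y):
            continue
        if NONE in (x, y):
            return False       # the position exists in one term only
        if VAR in (x, y):
            continue
        return False           # two distinct function symbols
    return True

POSITIONS = [(), (1,), (2,), (1, 1)]   # hypothetical sample positions

a, b = ("a",), ("b",)
fp = lambda term: fingerprint(term, POSITIONS)

query = ("f", ("g", a, "x"))                                   # f(g(a, x))
passes = may_unify(fp(query), fp(("f", ("g", a, b))))          # unifiable
filtered = may_unify(fp(query), fp(("f", ("h", a))))           # clash g / h
false_positive = may_unify(fp(("k", "x", "x")), fp(("k", a, b)))
```

The last pair shows why the filter is imperfect: k(x, x) and k(a, b) have compatible fingerprints but are not unifiable, so in the delayed setting the mismatch is only detected by later unification inferences.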
The completeness proof indicates that a superposition inference only needs to be carried out when the two terms can possibly unify. Therefore, we store terms in fingerprint indices, which act as fast imperfect filters for finding unification partners, and only carry out superposition inferences with terms returned by the index. This restricts, somewhat, the number of inferences that take place, at the expense of some loss of speed. Thus, it represents a midway path between eager unification and delayed unification. As a final twist, we implemented a version of the calculus that uses fingerprint indices as well as solving constraint literals of the form x ≉ t (where x is not a subterm of t) and t ≉ t eagerly. Thus, in this version of the calculus there is no need for the BIND and REFLDEL rules. We compared each of these approaches with the standard superposition calculus implemented in Vampire. We refer to the standard calculus as VAMPIRE and the delayed inference calculus without fingerprint indices by VAMPIRE*.⁵ We refer to the delayed inference calculus with fingerprint indices by VAMPIRE†. Finally, we refer to the calculus that eagerly solves some constraint literals by VAMPIRE‡.⁶
We tested these approaches against each other on benchmarks from the CASC 2023 system competition [23]. As our new approach is not currently compatible with higher-order or polymorphic input, we restricted the comparison to monomorphic first-order problems. Namely, we used the 500 benchmarks in the FNE and FEQ categories. These are monomorphic, first-order benchmarks that either include equality (FEQ) or do not contain equality (FNE). All benchmarks in the set are theorems. The results can be seen in Table 1. All experiments were run on a node cluster located at The University of Manchester. Each node in the cluster is equipped with 192 gigabytes of RAM and 32 Intel® Xeon processors with two threads per core. Each configuration was given 100s of CPU time per problem and run in single core mode. VAMPIRE was run with options --mode casc which causes it to use a tuned portfolio of strategies. All other variants were run with options --mode casc --forced_options duc=on which forces the use of the new calculus on top of the aforementioned portfolio. The calculi based on delayed unification perform badly in comparison to standard superposition. This is unsurprising, as syntactic first-order unification is already an efficient process. By replacing it with delayed unification, we gain little in terms of time, but pay a heavy penalty in terms of the number of inferences carried out. The use of fingerprint indices helps somewhat in mitigating this issue, but not a great deal. Eagerly solving trivial constraints shows more promise and is actually able to solve two problems that the standard calculus cannot (within the time limit). These are the benchmarks CSR036+3.p and LAT347+3.p.

Table 1. [Table body not recovered; the surviving column header is "Approach".]

⁵ Our implementation can be found at https://github.com/vprover/vampire/tree/delayed-unification. To run the new calculus, use option -duc on. To run the standard calculus, the option duc is set to off.
⁶ The code for both VAMPIRE† and VAMPIRE‡ can be found at branch https://github.com/vprover/vampire/tree/delayed-unif-with-fp. VAMPIRE† was built from commit c04a08feb5db3e7468a1fa and VAMPIRE‡ from commit fa2f139302b6a7a6487e73. Again, option -duc on is required for the new calculi to run.
The only other proof calculi that we are aware of that explicitly integrate unification rules at the calculus level are the higher-order paramodulation calculi [8,22] and lazy paramodulation [21]. However, these calculi are paramodulation calculi and do not incorporate certain concepts of redundancy so crucial to the success of superposition provers. Moreover, the completeness proofs for these calculi are based on very different techniques to the Bachmair & Ganzinger style model building proofs commonly employed in the completeness proofs of superposition calculi.
There are other calculi that in some form do represent the folding of unification into the calculus, but the link between the unification rules and the calculus is less clear. For example, the recent work by one of the authors of this paper [13] relating to reasoning about linear arithmetic, moves theory reasoning relating to a number of equations from the unification algorithm to the calculus level. A different example, by another author of this paper, is the combinatory-superposition calculus [9] which essentially folds higher-order combinatory unification into the calculus. In both cases, the relationship between the unification algorithm and the calculus rules is not obvious.
There are other methods of dovetailing unification with inference rules. For example, a unification procedure can be modified to return a stream of results. This stream can be interrupted in order to carry out further inferences and then returned to later. This is the approach taken by the higher-order Zipperposition prover [7] in order to handle the infinite sets of unifiers returned by higher-order unification. Conceptually, this is a very different solution to using constraints, since the intermediate terms created during unification are not available to the entire calculus as they are in our approach. Furthermore, from an implementation perspective, streams of unifiers are a far greater departure from the standard saturation architecture than adding constraints. Unification can also be partially delayed by preprocessing techniques such as Brand's modification method and its developments [5].
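The stream-based style of dovetailing can be illustrated with a small Python sketch. It is not how Zipperposition is implemented; the function name dovetail, the budget parameter and the placeholder strings standing in for unifiers and inferences are all assumptions made for illustration. The sketch only shows the scheduling idea: a possibly infinite stream is interrupted after a bounded number of results and resumed later.

```python
from collections import deque
from itertools import count, islice

def dovetail(streams, budget):
    """Round-robin over possibly infinite iterators.

    Takes at most `budget` items from each stream in turn, so that an
    infinite stream (e.g. of unifiers) cannot starve the others.
    `dovetail` and `budget` are hypothetical names for this sketch.
    """
    queue = deque(iter(s) for s in streams)
    while queue:
        stream = queue.popleft()
        chunk = list(islice(stream, budget))
        yield from chunk
        if len(chunk) == budget:
            # The stream may have more results; come back to it later.
            queue.append(stream)

# Placeholder data: an infinite "unifier" stream and a finite "inference" stream.
unifiers = (f"sigma{i}" for i in count())
inferences = iter(["resolve", "superpose"])
first_six = list(islice(dovetail([unifiers, inferences], 2), 6))
```

With a budget of 2, the first six results alternate between the two sources in chunks of two, and the infinite stream is never exhausted before the finite one gets its turn.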
As mentioned in the introduction, abstraction resembles the basic strategy [4,15], where unification problems are added to the constraint part of a clause. Periodically, these constraints can be checked for satisfiability and clauses with unsatisfiable constraints removed. However, in the basic strategy, the constraints do not interact with the rest of the proof calculus. Moreover, redundancy of clauses can no longer be defined in terms of ground instances, but only in terms of ground instances that satisfy the constraints. This significantly affects the simplification machinery of superposition/resolution. Unification with abstraction was first introduced, to the best of our knowledge, by Reger et al. in [17] in the context of theory reasoning. However, the concept was introduced in an ad-hoc fashion with no theoretical analysis of its impact on the completeness of the underlying calculus. Recently, the relationship between unification modulo an equational theory and unification with abstraction has been analysed [13] and a framework developed linking the two. It remains to explore whether the current work can fit into that framework.
We have developed a first-order superposition calculus that delays unification through the use of constraints, and proved its completeness. Whilst the calculus does not perform well in practice, we feel that the calculus and its completeness proof form a template that can be followed to prove the completeness of calculi that involve unification procedures more complex than syntactic first-order unification, for example unification modulo a set of equations E. Some of the crucial features of our approach are: (1) the carrying out of partial unification and adding the remaining unification pairs back as constraints, and (2) the ignoring of constraint literals in the definition of redundant inference. In particular, feature (1) may well be crucial in taming issues relating to undecidable unification problems. For example, in higher-order logic, where unification is undecidable, it is common to run unification to a particular depth and then give up if termination has not occurred. Of course, this harms completeness. With our approach it should be possible to add the remaining unification pairs back as constraints and maintain completeness. In the future, we would like to generalise our approach into a framework that can be used to prove the completeness of a variety of calculi as long as the unification problem for the underlying terms meets certain conditions. We would also like to explore instantiating such a framework to prove the completeness of particular calculi of interest to us, such as AC-superposition and higher-order superposition.
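Feature (1) can be made concrete with a minimal Python sketch of syntactic first-order unification that runs for a bounded number of steps and hands back whatever pairs remain unsolved. This is an illustration under our own assumptions, not the calculus of this paper nor the VAMPIRE implementation: the names Var, App, partial_unify and max_steps are hypothetical, and a step bound stands in for the depth bound mentioned above. In a delayed-unification setting, each returned pair (s, t) would be added to the conclusion as a negative constraint literal.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fn: str
    args: tuple = ()

def walk(t, subst):
    # Follow variable bindings until an unbound variable or an application.
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t

def apply(t, subst):
    # Apply the (triangular) substitution exhaustively to a term.
    t = walk(t, subst)
    if isinstance(t, Var):
        return t
    return App(t.fn, tuple(apply(a, subst) for a in t.args))

def occurs(v, t, subst):
    t = walk(t, subst)
    if isinstance(t, Var):
        return t == v
    return any(occurs(v, a, subst) for a in t.args)

def partial_unify(s, t, max_steps):
    """Unify s and t for at most max_steps steps (hypothetical bound).

    Returns None on a clash or occurs-check failure found within the
    budget, otherwise (substitution, residual pairs). The residual pairs
    are the unsolved part of the problem.
    """
    subst, pending, steps = {}, [(s, t)], 0
    while pending and steps < max_steps:
        steps += 1
        a, b = pending.pop()
        a, b = walk(a, subst), walk(b, subst)
        if a == b:
            continue
        if isinstance(a, Var):
            if occurs(a, b, subst):
                return None
            subst[a] = b
        elif isinstance(b, Var):
            if occurs(b, a, subst):
                return None
            subst[b] = a
        elif a.fn != b.fn or len(a.args) != len(b.args):
            return None
        else:
            pending.extend(zip(a.args, b.args))
    residual = [(apply(a, subst), apply(b, subst)) for a, b in pending]
    return subst, residual
```

For f(X, g(a)) and f(b, g(Y)), a generous budget solves the problem outright, while a budget of one step only decomposes the outer f and returns the pairs (X, b) and (g(a), g(Y)) as residual constraints.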

A valuation ξ is a function that maps each variable x : τ to a member of $U_\tau$. For a given interpretation M and valuation ξ, we use $\llbracket t \rrbracket^{\xi}_{M}$ to represent the denotation of t in M given ξ. A positive literal s ≈ t is true in an interpretation M for valuation ξ if $\llbracket s \rrbracket^{\xi}_{M} = \llbracket t \rrbracket^{\xi}_{M}$. A negative literal s ≉ t is true in an interpretation M for valuation ξ if s ≈ t is false. A clause C holds in an interpretation M for valuation ξ if one of its literals is true in M for ξ. An interpretation M models a clause C if C holds in M for every valuation. An interpretation models a clause set if it models every clause in the set. A set of clauses M entails a set of clauses N, denoted M |= N, if every model of M is also a model of N.
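These definitions can be checked mechanically on small finite interpretations. The following Python sketch is an illustration under simplifying assumptions that are ours and not part of the formal development: it is single-sorted, the universe is finite, variables are strings, applications are tuples (symbol, arguments...), and a literal is a triple (is_positive, s, t). All function names are hypothetical.

```python
from itertools import product

def denote(t, interp, xi):
    """Denotation of term t in interpretation interp under valuation xi."""
    if isinstance(t, str):            # a variable: look it up in the valuation
        return xi[t]
    fn, *args = t                     # an application: interpret the symbol
    return interp[fn](*(denote(a, interp, xi) for a in args))

def literal_true(lit, interp, xi):
    positive, s, t = lit
    equal = denote(s, interp, xi) == denote(t, interp, xi)
    return equal if positive else not equal

def clause_holds(clause, interp, xi):
    # A clause holds if at least one of its literals is true.
    return any(literal_true(lit, interp, xi) for lit in clause)

def variables(t):
    if isinstance(t, str):
        return {t}
    return set().union(*(variables(a) for a in t[1:]))

def models(interp, universe, clause):
    """True if the clause holds for every valuation over the finite universe."""
    vs = sorted(set().union(*(variables(s) | variables(t) for _, s, t in clause)))
    return all(
        clause_holds(clause, interp, dict(zip(vs, vals)))
        for vals in product(universe, repeat=len(vs))
    )

# Example interpretation over {0, 1}: f flips its argument, a denotes 0.
universe = [0, 1]
interp = {"f": lambda x: 1 - x, "a": lambda: 0}
involution = [(True, ("f", ("f", "x")), "x")]              # f(f(x)) ≈ x
fixpoint = [(True, ("f", "x"), "x")]                       # f(x) ≈ x
split = [(True, ("f", "x"), ("a",)), (True, "x", ("a",))]  # f(x) ≈ a ∨ x ≈ a
```

In this interpretation the clauses involution and split are modelled, whereas fixpoint is not, since f has no fixed point; the empty clause is modelled by no interpretation.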

Table 1: Summary of experimental results