A Comprehensive Framework for Saturation Theorem Proving

A crucial operation of saturation theorem provers is deletion of subsumed formulas. Designers of proof calculi, however, usually discuss this only informally, and the rare formal expositions tend to be clumsy. This is because the equivalence of dynamic and static refutational completeness holds only for derivations where all deleted formulas are redundant, but the standard notion of redundancy is too weak: A clause C does not make an instance Cσ redundant. We present a framework for formal refutational completeness proofs of abstract provers that implement saturation calculi, such as ordered resolution and superposition. The framework modularly extends redundancy criteria derived via a familiar ground-to-nonground lifting. It allows us to extend redundancy criteria so that they cover subsumption, and also to model entire prover architectures so that the static refutational completeness of a calculus immediately implies the dynamic refutational completeness of a prover implementing the calculus within, for instance, an Otter or DISCOUNT loop. Our framework is mechanized in Isabelle/HOL.


Introduction
Saturation is one of the most successful approaches to automatic theorem proving. Provers based on resolution, superposition, and related proof calculi all implement some saturation procedure that systematically derives conclusions of inferences up to a redundancy criterion. More specifically, a saturation prover starts with a problem to refute, typically given as a set of clauses, and draws inferences from the clauses, adding the conclusions to the set. The prover may remove redundant clauses at any point. The refutation attempt ends as soon as the empty clause ⊥, denoting a contradiction, has been derived. The proof calculus specifies which inferences need to be performed and which clauses are redundant.
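The saturation loop described above can be sketched for the simplest possible setting. The following Python fragment is only an illustration under stated assumptions: it uses unordered propositional binary resolution rather than any calculus from this article, encodes a clause as a frozenset of nonzero integers (with -n standing for the negation of atom n), and deletes tautologies as its only form of redundancy elimination. The names resolvents, is_tautology, and saturate are hypothetical.

```python
from itertools import combinations


def resolvents(c1, c2):
    """Return all binary resolvents of two propositional clauses."""
    out = set()
    for lit in c1:
        if -lit in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {-lit})))
    return out


def is_tautology(clause):
    """A clause containing a literal and its negation is a tautology."""
    return any(-lit in clause for lit in clause)


def saturate(clauses):
    """Naive saturation: return True iff the empty clause is derived."""
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                # Tautologies are discarded; everything else is kept.
                if not is_tautology(r) and r not in known:
                    new.add(r)
        if frozenset() in known | new:
            return True   # the empty clause (a contradiction) was derived
        if not new:
            return False  # saturated without deriving the empty clause
        known |= new
```

For instance, saturate([[1], [-1, 2], [-2]]) derives the empty clause, whereas saturate([[1, 2], [-1, 2]]) reaches a saturated set without it. Termination relies on the finite number of clauses over finitely many atoms; a first-order prover has no such guarantee, which is why fairness matters there.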
In their Handbook of Automated Reasoning chapter [6, Sect. 4], Bachmair and Ganzinger remark that "unfortunately, comparatively little effort has been devoted to a formal analysis of redundancy and other fundamental concepts of theorem proving strategies, while more emphasis has been placed on investigating the refutational completeness of a variety of modifications of inference rules, such as resolution." As a remedy, they present an abstract framework for saturation up to redundancy. Briefly, theorem proving derivations take the form N_0 ▷ N_1 ▷ ···, where N_0 is the initial clause set and each step either adds inferred clauses or deletes redundant clauses. Given a suitable notion of fairness, the limit N_∞ of a fair derivation is saturated up to redundancy. If the calculus is refutationally complete and N_∞ does not contain ⊥, then N_0 has a model. We will refer to the calculus' refutational completeness as static as opposed to a prover's dynamic completeness.
Bachmair and Ganzinger also define a concrete prover, RP, based on a first-order ordered resolution calculus and the given clause procedure. However, like all realistic resolution provers, RP implements subsumption deletion: It will delete a clause C if it is subsumed by another clause C′, meaning that C = C′σ ∨ D for some substitution σ and some clause D. Yet the case where D = ⊥ is not covered by the standard definition of redundancy, according to which a clause C is redundant w.r.t. a clause set N if all its ground instances Cθ are entailed by strictly smaller ground instances of clauses belonging to N. Concretely, we would like P(x) to make P(a) redundant, but this fails because the instance of P(x) that entails P(a), namely P(a) itself, is not strictly smaller than P(a). As a result, RP-derivations are not ▷-derivations, and Bachmair and Ganzinger's saturation framework is not applicable.
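To make the subsumption condition concrete, the following Python sketch tests whether one clause subsumes another by searching for a matching substitution. It is a minimal illustration, not RP's implementation, and it rests on assumptions of our own: variables are plain strings, function and predicate applications are tuples whose first component is the symbol, a literal is a pair of a polarity and an atom, and clauses are treated as sets of literals (so several literals may be mapped onto the same one). The names match and subsumes are hypothetical.

```python
def match(pattern, target, subst):
    """Extend subst so that pattern under subst equals target, or return None."""
    if isinstance(pattern, str):  # pattern is a variable
        if pattern in subst:
            return subst if subst[pattern] == target else None
        return {**subst, pattern: target}
    # pattern is an application: target must be one with the same symbol and arity
    if isinstance(target, str) or pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p, t in zip(pattern[1:], target[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst


def subsumes(c, d, subst=None):
    """True iff some substitution maps every literal of c onto a literal of d."""
    subst = {} if subst is None else subst
    if not c:
        return True
    first, rest = c[0], c[1:]
    for polarity, atom in d:
        if polarity == first[0]:
            extended = match(first[1], atom, subst)
            if extended is not None and subsumes(rest, d, extended):
                return True
    return False


p_x = [(True, ("P", "x"))]        # P(x)
p_a = [(True, ("P", ("a",)))]     # P(a)
```

With these definitions, subsumes(p_x, p_a) holds while subsumes(p_a, p_x) does not, which is exactly the situation in which the standard redundancy criterion fails to justify deleting P(a).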
There are two ways to address this problem. In the Handbook, Bachmair and Ganzinger start from scratch and prove the dynamic refutational completeness of RP by relating nonground derivations to ground derivations. This proof, though, turns out to be rather nonmodular: it refers simultaneously to properties of the calculus, to properties of the prover, and to the fairness of the derivations. Extending it to other calculi or prover architectures would be costly. For this reason, most authors stop after proving static refutational completeness of their calculi.
An alternative approach is to extend the redundancy criterion so that subsumed clauses become redundant. As demonstrated by Bachmair and Ganzinger in 1990 [3], this is possible by redefining redundancy in terms of closures (C, θ) instead of ground instances Cθ . We show that this approach can be generalized and modularized: First, any redundancy criterion that is obtained by lifting a ground criterion can be extended to a redundancy criterion that supports subsumption without affecting static refutational completeness. Second, by applying this property to labeled formulas, it becomes possible to give generic completeness proofs for prover architectures in a straightforward way.
Most saturation provers implement a variant of the given clause procedure. We present an abstract version of the procedure that can be refined to obtain an Otter [28] or DISCOUNT [1] loop, and we prove it refutationally complete. We also present a generalization that decouples scheduling and computation of inferences, to support orphan formula deletion and inference dovetailing. A formula is an orphan when it has lost one of its parents to the redundancy criterion [24,39]; removing such formulas reduces the search space. Dovetailing makes it possible to support infinitary inference rules, such as λ-superposition and its possibly infinite sequences of higher-order unifiers [14].
When users of the framework instantiate these prover architectures with a concrete saturation calculus, they obtain the dynamic refutational completeness of the combination from the properties of the prover architecture and the static refutational completeness proof for the calculus. The framework is applicable to a wide range of calculi, including ordered resolution [6], unfailing completion [2], standard superposition [5], constraint superposition [29], theory superposition [44], and hierarchic superposition [8]. It is already used in several published and ongoing works on combinatory superposition [15], λ-free superposition [12], λ-superposition [13,14], superposition with interpreted Booleans [33], AVATAR-style splitting [21], and superposition with SAT-inspired inprocessing [43].
This article is structured as follows:
- Section 2 introduces the basic notions of Bachmair-Ganzinger-style saturation theorem proving, which also form the basis of our framework. These include the notions of inferences, redundancy, and static and dynamic refutational completeness.
- Section 3 shows how to lift the refutational completeness of a ground calculus to the nonground level. This step adds support for subsumption and formula labels.
- Section 4 presents several prover architectures that are variants of the given clause procedure, showing how to cope with multiple formula sets.
- Section 5 briefly describes the Isabelle mechanization of the framework.
An earlier version of this work was presented at IJCAR 2020 [45]. This article extends the IJCAR paper with detailed proofs, more explanations and examples, and an index. The Isabelle mechanization is described in more detail in a CPP 2021 paper [41].

Preliminaries
Our framework is parameterized by abstract notions of formulas, inferences, and redundancy criteria, defined below. We also introduce various auxiliary concepts, notably static and dynamic refutational completeness, and study variations found in the literature.

Inferences and Redundancy
A set F of formulas is a nonempty set with a nonempty subset F_⊥ ⊆ F. Elements of F_⊥ represent false. Typically, F_⊥ is a singleton, i.e., F_⊥ = {⊥}. The possibility to distinguish between several false elements will be useful when we model concrete prover architectures, where different elements of F_⊥ represent different situations in which a contradiction has been derived.
It is easy to show that N_1 ⊨ N_2 if and only if N_1 ⊨ {C} for every C ∈ N_2, and that N ⊨ ⋃_{i∈I} N_i if and only if N ⊨ N_i for every i ∈ I. Moreover, all elements of F_⊥ are logically equivalent: If N ⊨ {⊥} for some ⊥ ∈ F_⊥, then N ⊨ {⊥′} for every ⊥′ ∈ F_⊥.
Consequence relations are used (1) when one discusses the soundness of a calculus (and hence, when we justify the addition of formulas) and (2) when one discusses the refutational completeness of a calculus (and hence, when we justify the deletion of redundant formulas). Perhaps unexpectedly, the consequence relations used for these purposes may be different ones. A typical example is theory superposition, where one may use entailment w.r.t. all theory axioms for (1), but only entailment w.r.t. a subset of the (instances of the) theory axioms for (2). Another example is constraint superposition, where one uses entailment w.r.t. the set of all ground instances for (1), but entailment w.r.t. a subset of those instances for (2). Typically, the consequence relation |≈ used for (1) is the intended one, and some additional calculus-dependent argument is necessary to show that refutational completeness w.r.t. the consequence relation ⊨ used for (2) implies refutational completeness w.r.t. |≈.
An F-inference ι is a tuple (C_n, ..., C_0) ∈ F^{n+1}, n ≥ 0. The formulas C_n, ..., C_1 are called premises of ι; C_0 is called the conclusion of ι, denoted by concl(ι). An F-inference system Inf is a set of F-inferences. If N ⊆ F, we write Inf(N) for the set of all inferences in Inf whose premises are contained in N. For N, M ⊆ F, we write Inf(N, M) for the set of all inferences in Inf such that at least one premise is in M (and possibly in N as well) and the other premises are contained in N ∪ M.
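The two notations for selecting inferences by their premises can be mirrored directly on finite data. In the following Python sketch, an inference is a tuple whose last component is the conclusion and whose other components are the premises; the function names inf_from and inf_from_with, and the string-valued formulas in the example, are hypothetical and serve only as an illustration.

```python
def inf_from(inferences, n):
    """Inf(N): the inferences whose premises all lie in n."""
    return {i for i in inferences if all(p in n for p in i[:-1])}


def inf_from_with(inferences, n, m):
    """Inf(N, M): at least one premise in m, the other premises in n | m."""
    both = n | m
    return {
        i for i in inferences
        if any(p in m for p in i[:-1]) and all(p in both for p in i[:-1])
    }


# A toy inference system over formulas represented as strings.
INFERENCES = {("A", "B", "C"), ("B", "D", "E"), ("T",)}
```

Here inf_from(INFERENCES, {"A", "B"}) contains the premise-free inference ("T",) because the condition on premises holds vacuously, whereas inf_from_with never contains it because it requires a premise in the second set.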
One can find several slightly differing definitions for redundancy criteria, fairness, and saturation in the literature [6,8,44]. We discuss the differences in Sect. 2.3. Here we mostly follow Waldmann [44].
A redundancy criterion for an inference system Inf and a consequence relation ⊨ is a pair Red = (Red_I, Red_F), where Red_I : P(F) → P(Inf) and Red_F : P(F) → P(F) are mappings from sets of formulas to sets of inferences and from sets of formulas to sets of formulas that satisfy the following conditions for all sets of formulas N and N′:
(R1) if N ⊨ {⊥} for some ⊥ ∈ F_⊥, then N \ Red_F(N) ⊨ {⊥};
(R2) if N ⊆ N′, then Red_F(N) ⊆ Red_F(N′) and Red_I(N) ⊆ Red_I(N′);
(R3) if N′ ⊆ Red_F(N), then Red_F(N) ⊆ Red_F(N \ N′) and Red_I(N) ⊆ Red_I(N \ N′);
(R4) if ι ∈ Inf and concl(ι) ∈ N, then ι ∈ Red_I(N).
Inferences in Red_I(N) and formulas in Red_F(N) are called redundant w.r.t. N. In a prover, Red_I indicates which inferences need not be performed (e.g., because they have already been performed), whereas Red_F justifies the deletion and simplification of formulas. Intuitively, (R1) states that deleting redundant formulas preserves inconsistency. (R2) and (R3) state that formulas or inferences that are redundant w.r.t. a set N remain redundant if arbitrary formulas are added to N or redundant formulas are deleted from N. (R4) ensures that computing an inference makes it redundant. Note that C ∈ Red_F({C}) generally does not hold.
The connection between redundant inferences and redundant formulas is given by the second part of (R3). Together with (R4) it implies that inferences with a redundant conclusion are themselves redundant (Lemma 1).
Let A be a set. An A-sequence is a finite sequence (a_i)_{i=0}^{k} = a_0, a_1, ..., a_k or an infinite sequence (a_i)_{i=0}^{∞} = a_0, a_1, ... with a_i ∈ A for all indices i. We use the notation (a_i)_{i≥0} or (a_i)_i for both finite and infinite sequences. A nonempty sequence (a_i)_i can be decomposed into a head a_0 and a tail (a_i)_{i≥1}. Given a relation ▷ ⊆ A × A, a ▷-derivation is a nonempty A-sequence such that a_i ▷ a_{i+1} for all valid indices.
We define the relation ▷_Red ⊆ P(F) × P(F) such that N ▷_Red N′ if and only if N \ N′ ⊆ Red_F(N′). In other words, when taking a transition, we may add arbitrary formulas (N′ \ N) and remove redundant formulas (N \ N′). In practice, the added formulas would normally be entailed by N. But since our framework is designed to establish only dynamic refutational completeness, we impose no soundness restrictions on added formulas. If dynamic soundness of a prover is desired, it can be derived immediately from the soundness of the inferences and does not require its own framework.
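The informal reading of a transition (add anything, remove only what is redundant) can be checked mechanically on finite clause sets. The Python sketch below makes two assumptions that are ours and not stated in the surrounding text: a removed formula must be redundant w.r.t. the successor set, and redundancy is approximated by a simple stand-in test for propositional clauses, namely that a strictly smaller clause (a proper subset) is present. The names strictly_subsumed and is_red_step are hypothetical.

```python
def strictly_subsumed(clause, clause_set):
    """Stand-in redundancy test: some proper subset of clause is in clause_set."""
    return any(other < clause for other in clause_set)


def is_red_step(n, n_next, is_redundant=strictly_subsumed):
    """Check a transition from n to n_next.

    Added formulas (n_next - n) are unrestricted; every removed formula
    (n - n_next) must be redundant w.r.t. n_next.
    """
    return all(is_redundant(c, n_next) for c in n - n_next)


c12 = frozenset({1, 2})
c1 = frozenset({1})
c3 = frozenset({3})
```

For example, is_red_step({c12, c1}, {c1, c3}) holds, since the removed clause c12 is made redundant by c1 and adding c3 needs no justification, whereas is_red_step({c1}, set()) fails because nothing justifies deleting c1.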
Let (N_i)_i be a P(F)-sequence. Its limit (inferior) is the set N_∞ := ⋃_i ⋂_{j≥i} N_j. The limit consists of the persistent formulas: These are formulas that eventually emerge in some N_i and remain present in all the following sets.
Example 5 Let Σ be a signature consisting of a binary predicate P, a unary function f, and a constant a. Let Inf be the set of inferences of the ordered resolution calculus with selection on Σ-clauses, and let Red be the trivial redundancy criterion (Example 2). This calculus is refutationally complete [6, Sect. 3].
Consider the initial set N = {P(a, a), ¬P(x, x), P(f(x), y) ∨ ¬P(x, y)}, where the second literal of the third clause is selected and no other literals are selected. The inference ι = (¬P(x, x), P(a, a), ⊥) belongs to Inf(N) and derives ⊥, so N is clearly unsatisfiable. However, without fairness, a derivation may never perform this inference. One such example is the infinite derivation where the clause ¬P(x, x) is always ignored in favor of inferences between P(f(x), y) ∨ ¬P(x, y) and a clause of the form P(f^i(a), a). This derivation is not fair because ι is an inference from the limit that never becomes redundant at any step. In contrast, any derivation that is similar to the previous one but that performs the inference ι at some step would be fair. Now consider the satisfiable set N = {P(a, a), ...}; ... is clearly fair.
Using properties (R1)-(R3), it is possible to show that static and dynamic refutational completeness agree [6].
In some situations, we may wish to apply techniques that delete clauses even if they are not redundant. If such techniques are used at most finitely many times, we can incorporate them in the framework by letting the initial set N_0 correspond to the set of formulas after all inprocessing has been performed, effectively considering all the transitions that happened before N_0 as one big preprocessing step. For example, a first-order prover might discover a clause that contains a pure predicate symbol (a predicate symbol that always occurs with the same polarity in the clause set) and delete it. If the signature is finite, this can be done only finitely many times and is hence compatible with the framework. We only need to show that inprocessing preserves unsatisfiability.

Variations on a Theme
For some of the notions in Sects. 2.1 and 2.2 one can find alternative definitions in the literature.

Redundancy Criteria
As in Bachmair and Ganzinger's chapter [6, Sect. 4.1], we have specified in condition (R1) of redundancy criteria that the deletion of redundant formulas must preserve inconsistency. Alternatively, one can require that redundant formulas must be entailed by the nonredundant ones (i.e., N \ Red_F(N) ⊨ Red_F(N)), leading to some obvious changes in Lemmas 10 and 37.
Bachmair and Ganzinger's definition of a redundancy criterion differs from ours in that they require only conditions (R1)-(R3). They call a redundancy criterion effective if an inference ι ∈ Inf is in Red I (N ) whenever concl(ι) ∈ N ∪ Red F (N ). As demonstrated by Lemma 1, that condition is equivalent to our condition (R4).

Inferences from Redundant Premises
Inferences from redundant premises are sometimes excluded in the definitions of saturation, fairness, and refutational completeness [6], and sometimes not [5,10,30,44]. Similarly, redundancy of inferences is sometimes defined in such a way that inferences from redundant premises are necessarily redundant themselves [5,10], and sometimes not [6,30,44]. There are good arguments for each of these choices. On the one hand, one can argue that the saturation of a set of formulas should not depend on the presence or absence of redundant formulas, and that inferences from redundant formulas should be redundant as well. On the other hand, in any reasonable proof system, formulas are deleted from the set of formulas as soon as they are shown to be redundant, so why should we care whether the set is saturated even if we do not delete formulas that have been proved to be redundant?
To clarify how the different definitions found in the literature relate to each other, we define "reduced" variants of the definitions in Sects. 2.1 and 2.2. A set N ⊆ F is called reducedly saturated w.r.t. Inf and Red if Inf(N \ Red_F(N)) ⊆ Red_I(N). The pair (Inf, Red) is reducedly statically refutationally complete w.r.t. ⊨ if for every reducedly saturated set N ⊆ F such that N ⊨ {⊥} for some ⊥ ∈ F_⊥, we have ⊥′ ∈ N for some ⊥′ ∈ F_⊥. A ▷_Red-derivation (N_i)_i is called reducedly fair if Inf(N_∞ \ ⋃_i Red_F(N_i)) ⊆ ⋃_i Red_I(N_i). The pair (Inf, Red) is reducedly dynamically refutationally complete w.r.t. ⊨ if for every reducedly fair ▷_Red-derivation (N_i)_i such that N_0 ⊨ {⊥} for some ⊥ ∈ F_⊥, we have ⊥′ ∈ N_i for some i and some ⊥′ ∈ F_⊥. A reduced redundancy criterion for ⊨ and Inf is a redundancy criterion Red = (Red_I, Red_F) that additionally satisfies Inf(F, Red_F(N)) ⊆ Red_I(N) for every N ⊆ F. Recall that Inf(N, M) denotes the set of Inf-inferences with at least one premise in M and the others in N ∪ M.
For reduced redundancy criteria, saturation and reduced saturation agree: ... and since Red is reduced, we get again ι ∈ Red_I(N).

Corollary 14 If Red is a reduced redundancy criterion, then (Inf , Red) is statically refutationally complete if and only if it is reducedly statically refutationally complete.
An arbitrary redundancy criterion Red = (Red_I, Red_F) can always be extended to a reduced redundancy criterion Red′ := (Red′_I, Red_F), where Red′_I(N) := Red_I(N) ∪ Inf(F, Red_F(N)).

Lemma 15 Red′ is a reduced redundancy criterion.
Proof Since Red_F is left unchanged, (R1) and the first parts of (R2) and (R3) are obvious. (R4) holds because ι ∈ Red_I(N) ⊆ Red′_I(N) for every inference ι with concl(ι) ∈ N. Moreover, Red′ is clearly reduced. It remains to prove the second parts of (R2) and (R3).
For (R2), assume N ⊆ N′. Then Red_I(N) ⊆ Red_I(N′) and Red_F(N) ⊆ Red_F(N′). Moreover, Inf is clearly monotonic, so Inf(F, Red_F(N)) ⊆ Inf(F, Red_F(N′)), and therefore Red′_I(N) ⊆ Red′_I(N′).

Lemma 16
If N ⊆ F is saturated w.r.t. Inf and Red, then N is saturated w.r.t. Inf and Red′.
The converse does not hold:

Example 17
Consider a signature consisting of the four propositional variables (or nullary predicate symbols) P, Q, R, S. Let Inf be the set of inferences of the ordered resolution calculus with selection over clauses over the signature. Define Red F such that a clause C is contained in Red F (N ) if it is entailed by clauses in N that are smaller than C. Define Red I such that an inference is contained in Red I (N ) if its conclusion is entailed by clauses in N that are smaller than its largest premise. Then Red := (Red I , Red F ) is a redundancy criterion.
Let N be the set of clauses (1) ¬Q ∨ P, (2) ¬S ∨ R ∨ Q, (3) ¬S ∨ Q, where the atom ordering is P > Q > R > S and the first literals of (1) and (3) are selected. Due to the selection, Inf(N) contains only a single inference, namely the ordered resolution inference ι between (2) and (1). The largest premise of ι is (1). The premise (2) is entailed by the smaller clause (3) and therefore contained in Red_F(N). Consequently, ι ∈ Red′_I(N), which means that N is saturated w.r.t. Red′. On the other hand, the conclusion ¬S ∨ R ∨ P is not entailed by the clauses in N that are smaller than (1), i.e., (2) and (3), so ι ∉ Red_I(N). Therefore, N is not saturated w.r.t. Red.
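The two entailment claims in this example are purely propositional and can be confirmed by enumerating all 16 valuations of P, Q, R, S. The following Python sketch does so; the clause encoding as lists of (atom, polarity) pairs and the names holds and entails are our own illustrative choices.

```python
from itertools import product

ATOMS = ["P", "Q", "R", "S"]


def holds(clause, valuation):
    """A clause holds under a valuation if at least one of its literals is true."""
    return any(valuation[atom] == polarity for atom, polarity in clause)


def entails(premises, conclusion):
    """Propositional entailment, decided by enumerating all valuations."""
    for values in product([False, True], repeat=len(ATOMS)):
        valuation = dict(zip(ATOMS, values))
        if all(holds(c, valuation) for c in premises) and not holds(conclusion, valuation):
            return False
    return True


c1 = [("Q", False), ("P", True)]                  # (1) ¬Q ∨ P
c2 = [("S", False), ("R", True), ("Q", True)]     # (2) ¬S ∨ R ∨ Q
c3 = [("S", False), ("Q", True)]                  # (3) ¬S ∨ Q
concl = [("S", False), ("R", True), ("P", True)]  # conclusion ¬S ∨ R ∨ P
```

The check entails([c3], c2) succeeds, confirming that (2) follows from (3), while entails([c2, c3], concl) fails: the valuation that makes S and Q true and P and R false satisfies (2) and (3) but falsifies the conclusion. As expected for a sound inference, entails([c1, c2], concl) succeeds.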

Lemma 18
The following properties are equivalent for every N ⊆ F: (i) N is reducedly saturated w.r.t. Inf and Red; (ii) N is saturated w.r.t. Inf and Red′; ...
Proof To show that (i) implies (ii), assume that N is reducedly saturated w.r.t. Inf and Red, i.e., Inf(N \ Red_F(N)) ⊆ Red_I(N). We must show that Inf(N) ⊆ Red′_I(N). ... In both cases, we conclude ι ∈ Red′_I(N).
To show that (ii) implies (i), assume that N is saturated w.r.t. Inf and Red′, i.e., Inf(N) ⊆ Red′_I(N). ... The equivalence of (i) ...
Even though Red and Red′ are not equivalent as far as saturation is concerned, they are equivalent w.r.t. refutational completeness:
The equivalence of (iii) and (iv) follows from Lemma 15 and Corollary 14.
It remains to show the equivalence of (ii) and (iii). Observe that (ii) means that (*) holds for every set N ⊆ F that is reducedly saturated w.r.t. Inf and Red, and that (iii) means that (*) holds for every set N ⊆ F that is saturated w.r.t. Inf and Red′. By Lemma 18, these two properties are equivalent.
The limit of a reducedly fair ▷_Red-derivation is a reducedly saturated set. This is proved analogously to Lemma 9:
Lemma 20 If (N_i)_i is a reducedly fair ▷_Red-derivation, then the limit N_∞ is reducedly saturated w.r.t. Inf and Red.
Lemmas 10 and 11 can then be reproved for reduced static and reduced dynamic refutational completeness. Together with Theorem 19, we obtain this result:

Theorem 21
The properties (i)-(iv) of Theorem 19 and the following four properties are equivalent: ...
Summarizing, we see that there are some differences between the "reduced" and the "nonreduced" approach, but that these differences are restricted to the intermediate notions, notably saturation. As far as (static or dynamic) refutational completeness is concerned, both approaches agree. Furthermore, Theorem 21 demonstrates that we can mix and match definitions from both worlds. Consequently, when we want to build on an existing refutational completeness proof for some saturation calculus, it does not matter which approach has been used there.
Given that the "nonreduced" definitions in Sects. 2.1 and 2.2 are simpler than the "reduced" ones in the current section, there is usually little reason to prefer the "reduced" ones. For our purposes, a major advantage of the "nonreduced" definitions is that Red_F and Red_I are separated as much as possible. In particular, our definitions of saturation and static refutational completeness do not depend on redundant formulas, but only on redundant inferences. This property will be crucial for the proof of Theorem 45 in Sect. 3.

Bachmair and Ganzinger consider a sequence ... [6, Sect. 4.1]. This is a quite peculiar property. First of all, it is overly complicated: If the conclusion of an inference ... But this contradicts the assumption that ι ∈ Inf(N) \ Red_I(N). So the condition can be simplified to Inf(N) ⊆ Red_I(N), and since Red ..., the (simplified) condition is entailed by reduced fairness. There is a crucial difference, though: While reduced fairness requires that every inference from N_∞ is redundant or has a redundant premise at some finite step of the derivation, the Bachmair-Ganzinger definition also admits derivations where redundancy is achieved only in the limit.

Example 22
Consider a signature consisting of two unary predicate symbols P, Q, a unary function symbol f, and a constant b. Let Inf be the set of inferences of the ordered resolution calculus with selection over clauses over the signature.
Let N be the set of clauses (1) ..., where the atom ordering is a lexicographic path ordering with precedence P > Q > f > b and the first literals of (2) and (4) are selected. From (1) and (2), we obtain in the first derivation step P(f(b)), in the second step P(f(f(b))), and so on. The limit N_∞ consists of the four initial clauses (1)-(4) and all clauses of the form P(f^i(b)) with i ≥ 1. The resolution inference between (3) and (4), yielding P(f(x)), is therefore redundant w.r.t. N_∞, since for each of its ground instances the conclusion P ... Therefore, the sequence of clause sets is fair according to the definition in Bachmair and Ganzinger [6, Sect. 4.1], but neither fair nor reducedly fair according to our definitions.
Of course, a redundancy property that holds only for the limit of an infinite sequence generally cannot be checked effectively. In other words, Bachmair and Ganzinger's definition is more permissive than our alternative definition, but the additional degree of freedom can hardly be exploited in a theorem prover.

Semi-redundancy
Bachmair, Ganzinger, and Waldmann [8] use a definition of redundancy criteria that requires (R2) only for formulas and (R3) only for inferences. With their definition of fairness, this is sufficient to show that the limit of a fair ▷_Red-derivation is saturated, and thus, to show that static refutational completeness implies dynamic refutational completeness. Their definition of fairness, however, requires essentially that inferences from formulas in the limit N_∞ are redundant w.r.t. the limit, and since they do not enforce that an inference that is redundant at some step of the derivation is redundant w.r.t. the limit, this cannot be checked effectively in a theorem prover.

Nonstrict Redundancy
Nieuwenhuis and Rubio [29,30] and Peltier [34] define a ground clause C to be nonstrictly redundant w.r.t. a set N of ground clauses if C is entailed by smaller or equal clauses in N . This definition does not satisfy our condition (R3). Consequently, it can be used for proving the static completeness of a calculus, but it is insufficient to establish the connection between static and dynamic completeness (unless the notion of fairness is strengthened).

Intersections of Redundancy Criteria
In descriptions of concrete saturation calculi, we frequently encounter the situation that the calculus is parameterized in some way and that exactly one value of the parameter is used to show that every saturated N ⊆ F with N ⊨ {⊥} contains ⊥, but that this value is still unknown during the actual saturation process. Consequently, inferences and formulas may be considered as redundant during the saturation only if they are redundant for every possible value of the parameter. To model this situation in our framework, it is useful to define consequence relations and redundancy criteria as intersections of previously defined consequence relations or redundancy criteria.
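Let Q be an arbitrary nonempty set, and let (⊨_q)_{q∈Q} be a Q-indexed family of consequence relations over F. Define ⊨∩ := ⋂_{q∈Q} ⊨_q.
Let Inf be an inference system, and let (Red^q)_{q∈Q} be a Q-indexed family of redundancy criteria, where each Red^q = (Red^q_I, Red^q_F) is a redundancy criterion for ⊨_q and Inf. Define Red∩_I(N) := ⋂_{q∈Q} Red^q_I(N), Red∩_F(N) := ⋂_{q∈Q} Red^q_F(N), and Red∩ := (Red∩_I, Red∩_F).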

Lemma 24
Red∩ is a redundancy criterion for ⊨∩ and Inf.

Lemma 25 A set N ⊆ F is saturated w.r.t. Inf and Red∩ if and only if it is saturated w.r.t. Inf and Red^q for every q ∈ Q.
In many cases where a redundancy criterion Red∩ is defined as the intersection of other criteria, the consequence relations ⊨_q agree for all q ∈ Q.
There are some exceptions, though, for example constraint superposition [29], where the parameter q is a convergent rewrite system R and ⊨_q is entailment modulo R. For such calculi, one can typically demonstrate the static refutational completeness of (Inf, Red∩) in the following form:

Lemma 26 If for every set N ⊆ F that is saturated w.r.t. Inf and Red∩ and does not contain ...
If the condition of the lemma holds, then N must contain some ⊥ ∈ F_⊥. Therefore, (Inf, Red∩) is statically refutationally complete w.r.t. ⊨∩.

Lifting
A standard approach for establishing the refutational completeness of a calculus is to first concentrate on the ground case and then lift the results to the nonground case. In this section, we show how to perform this lifting abstractly, given a suitable grounding function G. The function maps every formula C ∈ F to a set G(C) of formulas from a set of formulas G. Depending on the logic and the calculus, G(C) may be, for example, the set of all ground instances of C, a subset of the set of ground instances of C, or even a set of formulas from another logic. Similarly, FInf -inferences are mapped to sets of GInf -inferences, and saturation w.r.t. FInf -inferences is related to saturation w.r.t. GInf -inferences.
There are calculi where some FInf -inferences ι do not have a counterpart in GInf , such as the PosExt inferences of λ-free superposition [12]. In these cases, we set G(ι) = undef .

Standard Lifting
Given two sets of formulas F and G, an F-inference system FInf, a G-inference system GInf, and a redundancy criterion Red for GInf, let G be a function that maps every formula in F to a subset of G and every F-inference in FInf to undef or to a subset of GInf. The function G is called a grounding function if it satisfies the conditions (G1)-(G3). The function G is extended to sets N ⊆ F by defining G(N) := ⋃_{C∈N} G(C).

Remark 27
Conditions (G1) and (G2) express that false formulas may only be mapped to sets of false formulas, and that only false formulas may be mapped to sets of false formulas. For most applications, it would be possible to replace condition (G3) by a condition (G3′), which implies (G3) by property (R4). There are some calculi, however, for which (G3′) is too strong. Typical examples are calculi where the F-inferences include some normalization or abstraction step that does not have a counterpart in the G-inferences. So an F-inference ι may have a conclusion C ∨ t ≉ t′, where the literal t ≉ t′ results from the normalization step, but the conclusions of the instances of ι have the form Cθ for a substitution θ that unifies t and t′. In this case, (G3) is still satisfied, but (G3′) is not.

Example 28
In standard superposition, F is the set of all universally quantified first-order clauses over some signature Σ, G is the set of all ground first-order clauses over Σ, and G maps every clause C to the set of its ground instances Cθ and every superposition inference ι to the set of its ground instances ιθ.
Let G be a grounding function from F and FInf to G and GInf, and let ⊨ ⊆ P(G) × P(G) be a consequence relation over G. We define the relation ⊨_G ⊆ P(F) × P(F) such that N_1 ⊨_G N_2 if and only if G(N_1) ⊨ G(N_2). We call ⊨_G the G-lifting of ⊨. It corresponds to Herbrand entailment. If Tarski entailment (i.e., N_1 ⊨_T N_2 if and only if any model of N_1 is also a model of N_2) is desired, the mismatch can be repaired by showing that the two notions of entailment are equivalent as far as refutations are concerned.
Let Red = (Red_I, Red_F) be a redundancy criterion for ⊨ and GInf. We define functions Red ...

Theorem 30 Red^G is a redundancy criterion for ⊨_G and FInf.
We omit the proof at this point since we will prove a more general result (Theorem 42) in Sect. 3.2. The following folklore lemma connects a nonground calculus with a ground calculus it overapproximates. ... In the second case, we are done immediately. In the first case, ... Using this terminology, we can rephrase the lemma as follows: If N is saturated and every unliftable inference from G(N) is redundant w.r.t. G(N), then G(N) is saturated.
By the previous lemma, we know that G(N ) is saturated w.r.t. GInf and Red, so there exists some ⊥ G ∈ G ⊥ such that ⊥ G ∈ G(N ). Hence ⊥ G ∈ G(C) for some C ∈ N , which implies C ∈ F ⊥ by property (G2) of grounding functions. Now define ⊥ := C.

Example 33
In ordered binary resolution without selection [6,35], all inferences are liftable, as demonstrated below. Let Σ be a first-order signature containing at least one constant, let F be the set of all Σ-clauses without equality, and let G be the set of all ground Σ-clauses without equality. Let FInf and GInf be the sets of all resolution or factoring inferences from clauses in respectively F and G that satisfy the given ordering restrictions, and let G be the function that maps every clause C ∈ F to the set of all its ground instances Cθ and that maps every inference (C_n, ..., C_0) ∈ FInf to the set of all (C_nθ, ..., C_0θ) ∈ GInf. Then every resolution inference in GInf from ground instances of clauses in N has the form ..., and analogously for factoring inferences.
Therefore, the static refutational completeness of GInf implies the static refutational completeness of FInf .
The liftability result above holds also for ordered binary resolution with selection, provided that the selection function fsel on F and the selection function gsel on G are such that every clause D ∈ G(N ) inherits the selection of at least one clause C ∈ N for which D ∈ G(C). One can show that for every N ⊆ G and fsel, such a gsel exists. However, this gsel depends on N , and therefore Theorem 32 is not applicable. We will discuss this issue further in Sect. 3.3.

Example 34
In the superposition calculus without selection [5], all inferences are liftable, except superpositions at or below a variable position. Let Σ be a first-order signature containing at least one constant and no predicate symbols except ≈, let F be the set of all Σ-clauses with equality, and let G be the set of all ground Σ-clauses with equality. Let FInf and GInf be the sets of all superposition, equality resolution, and equality factoring inferences from clauses in F and G, respectively, that satisfy the given ordering restrictions, and let G be the function that maps every clause C ∈ F to the set of all its ground instances Cθ and that maps every inference (C_n, ..., C_0) ∈ FInf to the set of all (C_nθ, ..., C_0θ) ∈ GInf. Then every equality resolution or equality factoring inference from ground instances of clauses in N is contained in G(ι) for some inference ι ∈ FInf(N). The same applies to superposition inferences with first premise D'θ ∨ tθ ≈ t'θ and second premise C'θ ∨ [¬] sθ ≈ s'θ, where sθ|_p = tθ, provided that p is a position of s and s|_p is not a variable. Otherwise, p = p_1 p_2 for some variable x occurring in s at the position p_1, so xθ|_{p_2} = tθ. In this case, define θ' such that xθ' = xθ[t'θ]_{p_2} and yθ' = yθ for y ≠ x. By congruence, the conclusion of the inference is entailed by the first premise (which is necessarily smaller than the second) and C'θ' ∨ [¬] sθ' ≈ s'θ'. The ordering restrictions of the calculus require that tθ ≻ t'θ; hence the latter clause is also smaller than the second premise. By the usual redundancy criterion for superposition, this renders the inference redundant w.r.t. N.
As for ordered resolution, the static refutational completeness of GInf implies the static refutational completeness of FInf.

Adding Tiebreaker Orderings
We now strengthen the G-lifting of redundancy criteria introduced in the previous subsection to also support subsumption deletion. Let ⊐ = (⊐_D)_{D∈G} be a G-indexed family of strict partial orderings on F that are well founded (i.e., for every D, there exists no infinite descending chain C_0 ⊐_D C_1 ⊐_D ···). We define Red_F^{G,⊐} : P(F) → P(F) as follows: C ∈ Red_F^{G,⊐}(N) if and only if for every D ∈ G(C), D ∈ Red_F(G(N)) or there exists C' ∈ N such that C ⊐_D C' and D ∈ G(C').
Notice how ⊐_D is used to break ties between C and C', possibly making C redundant. We call Red^{G,⊐} := (Red_I^G, Red_F^{G,⊐}) the (G, ⊐)-lifting of Red.
For nearly all applications (with a notable exception in Example 49 below), the orderings ⊐_D agree for all D ∈ G. In these cases, we may take ⊐ as a single well-founded strict partial ordering, rather than as a G-indexed family of such orderings. We get the previously defined lifting (Red_I^G, Red_F^G) by setting ⊐_D := ∅ (i.e., the empty strict partial ordering on F) for every D ∈ G.
We may assume without loss of generality that the formula C' in the definition of Red_F^{G,⊐} is contained in N \ Red_F^{G,⊐}(N). We now show that Red^{G,⊐} is a redundancy criterion. We start with a technical lemma: G(N) \ Red_F(G(N)) ⊆ G(N \ Red_F^{G,⊐}(N)).
Proof Let D ∈ G(N) \ Red_F(G(N)). Since D ∈ G(N), there exists C ∈ N with D ∈ G(C). Let C be a minimal formula with this property w.r.t. ⊐_D.
Assume that C ∈ Red_F^{G,⊐}(N). Then, by definition, D ∈ Red_F(G(N)) or there exists C' ∈ N such that C ⊐_D C' and D ∈ G(C'). The first property contradicts our initial assumption, whereas the second property contradicts the minimality of C. So C ∉ Red_F^{G,⊐}(N), and therefore D ∈ G(N \ Red_F^{G,⊐}(N)).
We can now show that (Red G I , Red G, F ) satisfies the properties (R1)-(R4) of redundancy criteria: (G(N )). Combining the two relations, we obtain G(N \ Red Proof Obvious. N \ N )). Case 2: D / ∈ Red F (G(N )) and there exists C ∈ N \ Red G, N )), and by property (R3) also in Red N ) by Lemma 36, this implies ι ∈ Red I (G (N \ N )) by (R2). Since every ι ∈ G(ι) is contained in Red I (G(N \ N )), we conclude that ι ∈ Red G I (N \ N ). (G(N )). Let D ∈ G(concl(ι)). We consider two cases: N \ N )). Combining both cases, we obtain G(concl(ι)) ∈ G(N \ N ) ∪ Red F (G(N \ N )), hence ι ∈ Red G I (N \ N ).
Proof Let ι ∈ FInf such that concl(ι) ∈ N. If G(ι) ≠ undef, then by property (G3) of grounding functions, G(ι) is a subset of Red_I(G(concl(ι))), which in turn is a subset of Red_I(G(N)). So ι ∈ Red_I^G(N).

Example 46
For first-order clauses, let C ·≥ C' hold if C = C'σ for some substitution σ. The instantiation ordering ·> := ·≥ \ ·≤ is well founded. By choosing ⊐ := ·>, we obtain a criterion Red^{G,⊐} that includes standard redundancy (Example 3) and also supports subsumption deletion. (It is customary to define subsumption so that C is subsumed by C' if C = C'σ ∨ D for some substitution σ and some possibly empty clause D, but since the case where D is nonempty is already supported by the standard redundancy criterion, the instantiation ordering ·> is sufficient.) Similarly, for proof calculi modulo commutativity (C) or associativity and commutativity (AC), we can let C ·≥ C' be true if there exists a substitution σ such that C equals C'σ up to the equational theory (C or AC). The relation ·> = ·≥ \ ·≤ is then again well founded.
Example 47 For higher-order calculi such as higher-order resolution [25] and λ-superposition [14], the instantiation ordering is not well founded, as witnessed by the chain is an infinite chain if is a simplification ordering.
Even if the instantiation ordering for some logic is not well founded, as in the two examples above, we can always define ⊐ as the intersection of the instantiation quasi-ordering and an appropriate ordering based on formula sizes or weights, such as C ⊐ C' if size(C) > size(C'), or size(C) = size(C') and C contains fewer distinct variables than C'.
Conversely, the relation ⊐ can be more general than subsumption. In Sect. 4, we will use it to justify the movement of formulas between sets in the given clause procedure.
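The size-and-variable comparison mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the term representation (variables as strings, applications as tuples) and the names size, variables, and tiebreak_greater are hypothetical, and the sketch covers only the size-based component, which would still have to be intersected with the instantiation quasi-ordering.

```python
# Hypothetical term representation (an assumption of this sketch):
#   a variable is a str, an application is a tuple (symbol, arg1, ..., argn),
#   so the constant a is ("a",) and P(x, f(y)) is ("P", "x", ("f", "y")).

def size(term):
    """Number of symbol and variable occurrences in a term."""
    if isinstance(term, str):
        return 1
    return 1 + sum(size(arg) for arg in term[1:])

def variables(term):
    """Set of distinct variables occurring in a term."""
    if isinstance(term, str):
        return {term}
    result = set()
    for arg in term[1:]:
        result |= variables(arg)
    return result

def tiebreak_greater(c, d):
    """True if c is larger than d: bigger size, or equal size and
    fewer distinct variables."""
    if size(c) != size(d):
        return size(c) > size(d)
    return len(variables(c)) < len(variables(d))
```

Under this comparison, P(x, x) is larger than P(x, y), and P(f(x)) is larger than P(x). Descending chains are finite because the size can only decrease finitely often and, at a fixed size, the number of distinct variables is bounded by the size.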

Example 49
There are a few applications, notably for superposition-based decision procedures [7], where one would like to define Red_F^{G,⊐} using the reverse instantiation ordering ·<. In this way, a clause P(x) would for example become redundant in the presence of the clauses P(b) and P(c), provided that b and c are the only two ground terms. The reverse instantiation ordering is not well founded on the set of all first-order clauses: P(x) ·< P(f(x)) ·< P(f(f(x))) ·< ···. However, it is well founded if we restrict it to the set of generalizations gen(D) := {C | D = Cθ for some θ} of a fixed ground clause D, so that we may in fact define ⊐ := (⊐_D)_{D∈G} where ⊐_D := ·< ∩ (gen(D) × gen(D)). For this application, the possibility of taking ⊐ to be a G-indexed family of well-founded strict partial orderings, as opposed to a single such ordering, is vital.

Intersections of Liftings
The results of the previous subsection can be extended in a straightforward way to intersections of lifted redundancy criteria. As before, let F and G be two sets of formulas, and let FInf be an F-inference system. In addition, let Q be a nonempty set. For every q ∈ Q, let ⊨^q be a consequence relation over G, let GInf^q be a G-inference system, let Red^q be a redundancy criterion for ⊨^q and GInf^q, and let G^q be a grounding function from F and FInf to G and GInf^q. Let ⊐ := (⊐_D)_{D∈G} be a G-indexed family of well-founded strict partial orderings on F.
For each q ∈ Q, we know by Theorem 42 that the (G^q, ∅)-lifting Red^{q,G^q,∅} of Red^q is a redundancy criterion for the G^q-lifting of ⊨^q and FInf. Consequently, the intersection Red^{∩G} of these liftings is a redundancy criterion for the intersection ⊨^∩_G of the lifted consequence relations and FInf.

Theorem 50 If (GInf^q, Red^q) is statically refutationally complete w.r.t. ⊨^q for every q ∈ Q, and if for every N ⊆ F that is saturated w.r.t. FInf and Red^{∩G} there exists a q such that GInf^q(G^q(N)) ⊆ G^q(FInf(N)) ∪ Red^q_I(G^q(N)), then (FInf, Red^{∩G}) is statically refutationally complete w.r.t. ⊨^∩_G.
Proof Assume that (GInf^q, Red^q) is statically refutationally complete w.r.t. ⊨^q for every q ∈ Q and that for every N ⊆ F that is saturated w.r.t. FInf and Red^{∩G} there exists a q such that GInf^q(G^q(N)) ⊆ G^q(FInf(N)) ∪ Red^q_I(G^q(N)). Let N ⊆ F be saturated w.r.t. FInf and Red^{∩G} and assume that N ⊨^∩_G {⊥} for some ⊥ ∈ F_⊥. We must show that ⊥' ∈ N for some ⊥' ∈ F_⊥. First, we know that there exists a q such that GInf^q(G^q(N)) ⊆ G^q(FInf(N)) ∪ Red^q_I(G^q(N)). Since Red^{∩G} is the intersection of the liftings Red^{q,G^q,∅} over all q ∈ Q, we know by Lemma 25 that N is saturated w.r.t. FInf and the (G^q, ∅)-lifting Red^{q,G^q,∅} of Red^q. Therefore, by Lemma 31, G^q(N) is saturated w.r.t. GInf^q and Red^q.
Since G^q(N) is saturated w.r.t. GInf^q and Red^q, and (GInf^q, Red^q) is statically refutationally complete, there must exist some ⊥_G ∈ G_⊥ such that ⊥_G ∈ G^q(N). Hence ⊥_G ∈ G^q(C) for some C ∈ N, which implies C ∈ F_⊥ by property (G2) of grounding functions. Now define ⊥' := C.
Since the first components of Red^{∩G} and Red^{∩G,⊐} agree, we obtain the analogues of Lemmas 43 and 44 and Theorem 45.

Example 54
Intersections of liftings are needed to support selection functions in ordered resolution [6] and superposition [5]. The calculus FInf is parameterized by a function fsel on the set F of first-order clauses that selects a subset of the negative literals in each C ∈ F. There are several ways to extend fsel to a selection function gsel on the set G of ground clauses such that for every D ∈ G there exists some C ∈ F such that D = Cθ and D and C have corresponding selected literals.
For example, if fsel selects the first literal in C 1 = ¬P(x) ∨ ¬Q(c) and the second literal in C 2 = ¬P(b) ∨ ¬Q(y), then gsel could select the first literal in D = ¬P(b) ∨ ¬Q(c) (as in C 1 ) or the second literal (as in C 2 ). For every such gsel, | gsel is first-order entailment, GInf gsel is the set of ground inferences satisfying gsel, and Red gsel is the redundancy criterion for GInf gsel . The grounding function G gsel maps C ∈ F to { D ∈ G | D = Cθ for some θ } and ι ∈ FInf to the set of ground instances of ι in GInf gsel with corresponding literals selected in the premises.
If ι is the FInf -inference where the first literal in the second premise C 1 is selected by fsel, then is contained in G gsel (ι) ⊆ GInf gsel if gsel selects the first literal in the right premise (as in C 1 ) but it is not contained in GInf gsel (and hence not in G gsel (ι)) if gsel selects the second literal in the right premise (as in C 2 ). In the static refutational completeness proof, only one gsel is needed, but this gsel depends on the limit of a derivation and is not known during the derivation. Therefore, fairness must be guaranteed w.r.t. Red gsel,G gsel I for every possible extension gsel of fsel. Checking Red ∩G I amounts to a worst-case analysis, where we must assume that every ground instance Cθ ∈ G of a premise C ∈ F inherits the selection of C.

Example 55
Intersections of liftings are also necessary for constraint superposition calculi (Nieuwenhuis and Rubio [29]). Here the calculus FInf operates on the set F of first-order clauses with (ordering and equality) constraints. For a convergent rewrite system R, ⊨_R is first-order entailment up to R on the set G of unconstrained ground clauses, GInf_R is the set of ground superposition inferences, and Red_R is redundancy up to R. The grounding function G_R maps C[[K]] ∈ F to {D ∈ G | D = Cθ, Kθ = true, xθ is R-irreducible for all x} (except in degenerate cases where x occurs only in positive literals x ≈ t) and ι ∈ FInf to the set of ground instances of ι where the premises and conclusion of G_R(ι) are the G_R-ground instances of the premises and conclusion of ι. In the static refutational completeness proof, only one particular R is needed, but this R is not known during a derivation, so fairness must be guaranteed w.r.t. the lifted inference redundancy criterion for every convergent rewrite system R.
To obtain a practically useful criterion, the intersection Red ∩G must be approximated appropriately; compare Nieuwenhuis and Rubio's Definition 6.5 (which corresponds to our Red ∩G ) to their Lemma 6.18.

Example 56
Some calculi have inference rules that introduce Skolem function symbols. An example is the δ-elimination rule of Ganzinger and Stuber [23]. At the nonground level, the difficulty is that whenever we generate a conclusion with a fresh symbol sk_i, we need to mark all other instances of the rule with sk_j (j ≠ i) as redundant; otherwise, we would end up generating lots of needless conclusions. This determinism can be avoided by encoding enough information to identify the rule instance in the subscript i.
At the ground level, a second difficulty arises. The ground inference cannot simply introduce a nullary Skolem symbol; in general, this would not match the behavior of the corresponding nonground inference. Instead, it must guess both the Skolem symbol and its argument list. This guessing can be achieved using a selection function that takes the rule instance as argument and returns the Skolem term. We can then lift the ground inference by taking the intersection of all possible selection functions.
Almost every redundancy criterion for a nonground inference system FInf that can be found in the literature can be written as Red^{G,∅} for some grounding function G from F and FInf to G and GInf, and some redundancy criterion Red for GInf, or as an intersection Red^{∩G} of such criteria. As Theorem 53 demonstrates, every static refutational completeness result for FInf and Red^{∩G}, which does not generally support the deletion of subsumed formulas during a run, immediately yields a dynamic refutational completeness result for FInf and Red^{∩G,⊐}, which permits the deletion of subsumed formulas during a run, provided that they are larger according to ⊐.

Adding Labels
In practice, the orderings ⊐_D used in (G, ⊐)-liftings often depend on meta-information about a formula, such as its age or the way in which it has been processed so far during a derivation.
To capture this meta-information, we extend formulas and inference systems in a rather trivial way with labels.
As before, let F and G be two sets of formulas, let FInf be an F-inference system, let GInf be a G-inference system, let | ⊆ P (G) × P (G) be a consequence relation over G, let Red be a redundancy criterion for | and GInf , and let G be a grounding function from F and FInf to G and GInf .
Let L be a nonempty set of labels. Define FL := F × L and FL_⊥ := F_⊥ × L. Notice that there are at least as many false values in FL as there are labels in L. We use M, N to denote labeled formula sets. Given a set N ⊆ FL, let ⌊N⌋ := {C | (C, l) ∈ N} denote the set of formulas without their labels.
In other words, whenever there is an FInf-inference from some premises, there is a corresponding FLInf-inference from the labeled premises (regardless of the labeling), and whenever there is an FLInf-inference from labeled premises, there is a corresponding FInf-inference from the unlabeled premises.
The extension to intersections of redundancy criteria is also straightforward. Let F and G be two sets of formulas, and let FInf be an F-inference system. Let Q be a nonempty set. For every q ∈ Q, let ⊨^q be a consequence relation over G, let GInf^q be a G-inference system, let Red^q be a redundancy criterion for ⊨^q and GInf^q, and let G^q be a grounding function from F and FInf to G and GInf^q. Then for every q ∈ Q, the (G^q, ∅)-lifting Red^{q,G^q,∅} of Red^q is a redundancy criterion for the G^q-lifting of ⊨^q and FInf, and so Red^{∩G} is a redundancy criterion for ⊨^∩_G and FInf. Now let L be a nonempty set of labels, and define FL, FL_⊥, and FLInf as above. For every q ∈ Q, define the function G^q_L as the labeled counterpart of G^q, which applies G^q after omitting the labels. Analogously to Lemmas 58-60, we obtain the corresponding results for intersections.
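The relationship between an inference system and its labeled version can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: clauses are propositional (frozensets of nonzero integers), and the names strip_labels, labeled_inferences, and toy_resolution are hypothetical, not part of the framework.

```python
def strip_labels(labeled_formulas):
    """Forget the labels of a set of (formula, label) pairs."""
    return {formula for (formula, _label) in labeled_formulas}

def toy_resolution(premises):
    """Toy unlabeled rule: binary resolvents of two propositional clauses,
    each represented as a frozenset of nonzero ints (-n negates n)."""
    c, d = premises
    return [frozenset((c - {lit}) | (d - {-lit}))
            for lit in sorted(c) if -lit in d]

def labeled_inferences(unlabeled_rule, labeled_premises, conclusion_label):
    """Labeled version of a rule: omit the premises' labels, apply the
    unlabeled rule, and attach a label to each conclusion."""
    premises = [formula for (formula, _label) in labeled_premises]
    return [(conclusion, conclusion_label)
            for conclusion in unlabeled_rule(premises)]
```

The labels of the premises play no role in which conclusions are derived; only the label attached to the conclusion is chosen by the labeled system.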

Prover Architectures
We now use the above results to prove the refutational completeness of a popular prover architecture: the given clause procedure invented by McCune and Wos [28]. The architecture is parameterized by an inference system and a redundancy criterion. A generalization of the architecture decouples scheduling and computation of inferences, which has several benefits.

Given Clause Procedure
For this section, we fix the following. Let F and G be two sets of formulas, and let FInf be an F-inference system without premise-free inferences. Let Q be a nonempty set. For every q ∈ Q, let | q be a consequence relation over G, let GInf q be a G-inference system, let Red q be a redundancy criterion for | q and GInf q , and let G q be a grounding function from F and FInf to G and GInf q . Assume (FInf , Red ∩G ) is statically refutationally complete w.r.t. | ∩ G . Let L be a nonempty set of labels, let FL := F × L, and let the FL-inference system FLInf be a labeled version of FInf . By Theorem 63, (FLInf , Red ∩G L ) is statically refutationally complete w.r.t. | ∩ G L .
Let ≐ be an equivalence relation on F, and let ·≻ be a well-founded strict partial ordering on F such that ·≻ is compatible with ≐ (i.e., C ·≻ D, C ≐ C', and D ≐ D' imply C' ·≻ D'), such that C ≐ D implies G^q(C) = G^q(D) for all q ∈ Q, and such that C ·≻ D implies G^q(C) ⊆ G^q(D) for all q ∈ Q. We define ·⪰ := ·≻ ∪ ≐. In practice, ≐ is typically α-renaming (or equality if formulas are considered up to α-equivalence), ·≻ is either the instantiation ordering ·> (Example 46), provided it is well founded, or some well-founded ordering included in ·>, and for every q ∈ Q, G^q maps every formula C ∈ F to the set of ground instances of C, possibly modulo some theory.
Let ⊐ be a well-founded strict partial ordering on L. We define the ordering ⊐ on FL by (C, l) ⊐ (C', l') if either C ·≻ C' or else C ≐ C' and l ⊐ l'. By Lemma 52, the static refutational completeness of (FLInf, Red^{∩G_L}) w.r.t. ⊨^∩_{G_L} implies the static refutational completeness of (FLInf, Red^{∩G_L,⊐}), which by Lemma 10 implies the dynamic refutational completeness of (FLInf, Red^{∩G_L,⊐}).
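The ordering on labeled formulas defined above compares formulas first and uses labels only to break ties between equivalent formulas. The following Python sketch illustrates this for flat first-order atoms. It is a toy under stated assumptions: the atom representation, the fixed variable set, the three label names with their ranks, and all function names are hypothetical.

```python
VARIABLES = {"x", "y", "z"}                         # assumed variable names
LABEL_RANK = {"active": 0, "passive": 1, "new": 2}  # assumed label ordering

def is_instance(c, d):
    """True if atom c equals d·σ for some substitution σ.
    Atoms are flat tuples (predicate, arg1, ..., argn)."""
    if c[0] != d[0] or len(c) != len(d):
        return False
    sigma = {}
    for arg_c, arg_d in zip(c[1:], d[1:]):
        if arg_d in VARIABLES:
            # Each variable of d must be mapped consistently.
            if sigma.setdefault(arg_d, arg_c) != arg_c:
                return False
        elif arg_d != arg_c:
            return False
    return True

def strictly_more_specific(c, d):
    """Strict instantiation ordering: c is a proper instance of d."""
    return is_instance(c, d) and not is_instance(d, c)

def variants(c, d):
    """Equivalence up to renaming of variables."""
    return is_instance(c, d) and is_instance(d, c)

def labeled_greater(lf1, lf2):
    """(C, l) is greater than (C', l') if C is a proper instance of C',
    or else C and C' are variants and l is greater than l'."""
    (c1, l1), (c2, l2) = lf1, lf2
    if strictly_more_specific(c1, c2):
        return True
    return variants(c1, c2) and LABEL_RANK[l1] > LABEL_RANK[l2]
```

For example, (P(a), active) is greater than (P(x), new) because P(a) is a proper instance of P(x), whereas (P(x), passive) is greater than (P(y), active) only because of the labels.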
This result may look intimidating, so let us unroll it. The FL-inference system FLInf is a labeled version of FInf , which means that we get an FLInf -inference by first omitting the labels of the premises (C n , l n ), . . . , (C 1 , l 1 ), then performing an FInf -inference (C n , . . . , C 0 ), and finally attaching an arbitrary label l 0 to the conclusion C 0 . Since the labeled grounding functions G q L differ from the corresponding unlabeled grounding functions G q only by the omission of the labels and the first components of Red ∩G L , and Red ∩G L agree, we get this result:

Lemma 64 An FLInf -inference ι is redundant w.r.t. Red ∩G L , and N if and only if the underlying FInf -inference ι is redundant w.r.t. Red ∩G and N .

Lemma 65 A labeled formula (C, l) is Red^{∩G_L,⊐}-redundant w.r.t. N if one of the following conditions holds: (i) C is Red_F^{∩G}-redundant w.r.t. the unlabeled formulas of N; (ii) C ·≻ C' for some (C', l') ∈ N; (iii) C ·⪰ C' for some (C', l') ∈ N with l ⊐ l'.
Proof (ii) Assume that C ·≻ C' for some C' among the unlabeled formulas of N. Then there exists a label l' such that (C', l') ∈ N. By the definition of ⊐, we have (C, l) ⊐ (C', l').
The given clause procedure that lies at the heart of saturation provers can be presented and studied abstractly. We assume that the set of labels L contains at least two values, one of which is a distinguished ⊐-smallest value denoted by active, and that the labeled version FLInf of FInf never assigns the label active to a conclusion.
The state of a prover is a set of labeled formulas. The label identifies to which formula set each formula belongs. The active label identifies the active formula set from the given clause procedure. The other, unspecified formula sets are considered passive. Given a set N and a label l, we define the projection N ↓ l as consisting only of the formulas labeled by l.
The given clause prover GC is defined as a transition system ⇒_GC with two rules, Process and Infer. The initial state consists of the input formulas, paired with arbitrary labels different from active. A key invariant of the given clause procedure is that all inferences from active formulas are redundant w.r.t. the current set of formulas.
The Process rule covers most operations performed in a theorem prover. By Lemma 65, this includes
- deleting Red_F^{∩G}-redundant formulas with arbitrary labels and adding formulas that make other formulas Red_F^{∩G}-redundant, by (i);
- deleting formulas that are ·≻-subsumed by other formulas with arbitrary labels, by (ii);
- deleting formulas that are ·⪰-subsumed by other formulas with smaller labels, by (iii);
- replacing the label of a formula by a smaller label different from active, also by (iii).
Like for Red , in practice the added formulas would normally be entailed by N , but we impose no soundness restrictions.
Infer is the only rule that puts a formula in the active set. It relabels a passive formula C to active and ensures that all inferences between C and the active formulas, including C itself, become redundant. Recall that by Lemma 64, FLInf (N ↓ active , {(C, active)}) ⊆ Red By property (R4) of redundancy criteria, every inference is redundant if its conclusion is contained in the set of formulas, and typically, inferences are in fact made redundant by adding their conclusions to any of the passive sets. Then, M equals concl (FInf ( N ↓ active , {C})). There are, however, some techniques commonly implemented in theorem provers for which we need Infer's side condition in full generality.
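To make the interplay between the two rules concrete, here is a minimal Python sketch of a given clause loop for propositional binary resolution. It is an illustration under stated assumptions rather than the abstract prover itself: clauses are frozensets of nonzero integers, the state is split into a passive queue and an active list instead of a single labeled set, and the names resolvents, is_tautology, and given_clause are hypothetical.

```python
from collections import deque

def resolvents(c, d):
    """All binary resolvents of two propositional clauses."""
    return {frozenset((c - {lit}) | (d - {-lit})) for lit in c if -lit in d}

def is_tautology(c):
    return any(-lit in c for lit in c)

def given_clause(input_clauses):
    passive = deque(frozenset(c) for c in input_clauses)  # label: passive
    active = []                                           # label: active
    while passive:
        given = passive.popleft()   # oldest first, a simple fair choice
        # Process-like step: delete the given clause if it is a tautology
        # or if an active clause is a subset of it (or equal to it).
        if is_tautology(given) or any(a <= given for a in active):
            continue
        if not given:
            return "unsat"          # the empty clause was derived
        # Process-like step: delete active clauses properly subsumed by it.
        active = [a for a in active if not given < a]
        # Infer-like step: relabel the given clause to active and make all
        # inferences with the active clauses redundant by adding their
        # conclusions to the passive set.
        active.append(given)
        for a in list(active):
            for r in resolvents(given, a):
                if r not in active and r not in passive:
                    passive.append(r)
    return "saturated"
```

Deleting a passive clause that is equal to an active one corresponds to condition (iii) of Lemma 65, where the label ordering breaks the tie between equivalent formulas. Making inferences redundant by adding their conclusions is the typical case described above, not the only way to satisfy Infer's side condition.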

Lemma 66 Every ⇒ GC -derivation is a Red ∩G L , -derivation.
Proof We need to show that every labeled formula that is deleted in a ⇒_GC-step is Red^{∩G_L,⊐}-redundant w.r.t. the remaining labeled formulas. For Process, this is trivial. For Infer, the only deleted formula is (C, l), which is Red^{∩G_L,⊐}-redundant w.r.t. (C, active) by part (iii) of Lemma 65, since l ⊐ active.
Since (FLInf , Red ∩G L , ) is dynamically refutationally complete, it now suffices to show fairness to prove the refutational completeness of GC.
Given k ∈ N ∪ {∞}, let Inv_N(k) denote the condition FLInf(N_k↓active) ⊆ ⋃_{i=0}^{k} Red_I^{∩G_L}(N_i). If (N_i)_i is a Red^{∩G_L,⊐}-derivation and k ∈ N, by (R2) and (R3), the right-hand side is equal to Red_I^{∩G_L}(N_k). We will show that Inv_N(i) is an invariant of GC and that it extends to the limit, enabling us to establish fairness:

Lemma 67 Let (N_i)_i be a ⇒_GC-derivation. If N_0↓active = ∅, then Inv_N(k) holds for all indices k.
Proof Base case: The hypothesis N_0↓active = ∅ and the exclusion of premise-free inferences ensure that FLInf(N_0↓active) = ∅ and hence Inv_N(0) holds.
Case Process: Consider the step The first inclusion relies on Process's side condition that M ↓ active = ∅. The second inclusion corresponds to the induction hypothesis.

Lemma 68 Let (N i ) i be a nonempty P (FL)-sequence. If Inv N (i) holds for all indices i, then Inv N (∞) holds.
Proof We assume ι ∈ FLInf(N_∞↓active) for some ι and show ι ∈ ⋃_i Red_I^{∩G_L}(N_i). For ι to be in FLInf(N_∞↓active), all of its finitely many premises must be in N_∞↓active. Therefore, there must exist an index k such that N_k↓active contains all of them, and therefore ι ∈ FLInf(N_k↓active). Since Inv_N(k) holds, ι ∈ ⋃_{i=0}^{k} Red_I^{∩G_L}(N_i) ⊆ ⋃_i Red_I^{∩G_L}(N_i).
By Lemmas 66 and 69, we know that (N_i)_i is a fair Red^{∩G_L,⊐}-derivation. Since (FLInf, Red^{∩G_L,⊐}) is dynamically refutationally complete, we can conclude that some N_i contains (⊥', l) for some ⊥' ∈ F_⊥ and l ∈ L.

Example 71
The following Otter loop [28, Sect. 2.3.1] prover OL is an instance of the given clause prover GC. This loop design is inspired by Weidenbach's prover without splitting from his Handbook chapter [46,. The prover's state is a five-tuple N | X | P | Y | A of formula sets. The N , P, and A sets store the new, passive, and active formulas, respectively. The X and Y sets are subsingletons (i.e., sets of at most one element) that can store a chosen new or passive formula, respectively. Initial states are of the form N | ∅ | ∅ | ∅ | ∅.
Weidenbach identifies the X and Y components of OL's five-tuples; this is possible since the former is used only in his inner loop, whereas the latter is used only in his outer loop.
If we are interested in soundness, we can require that the formulas added by simplification and Infer are |≈-entailed by the formulas in the state before the transition. This can be relaxed to consistency preservation (e.g., for calculi that perform skolemization).
A reasonable strategy for applying the OL rules is presented below. It relies on a well-founded ordering on formulas to ensure that the simplification rules actually "simplify" their target, preventing nontermination of the inner loop. It also assumes that FInf (N , {C}) is finite if N is finite.
condition of dynamic refutational completeness is trivially satisfied. Otherwise, the argument is as follows. With each application of a rule other than Infer, the state, viewed as a multiset of labeled formulas, decreases w.r.t. the multiset extension of a relation that compares formulas using and breaks ties using on the labels. This ensures no formula is left in N or X forever. The fair choice of C ensures that no formula is left in P forever, and the application of Infer following ChooseP ensures the same about Y . As a result, we have N ∞ = X ∞ = P ∞ = Y ∞ = ∅. Therefore, by Theorem 70, OL is dynamically refutationally complete.
In most saturation calculi, Red is defined in terms of some total well-founded ordering ≻_G on G. We can then define ≻ so that C ≻ C' if the smallest element of G^q(C) is greater than the smallest element of G^q(C') w.r.t. ≻_G, for some arbitrary fixed q ∈ Q. This allows a wide range of simplifications implemented in resolution or superposition provers.
To ensure fairness, the heuristic used to apply ChooseP must guarantee that no formula remains indefinitely in P. Fair choice strategies typically rely on formula age, which can be represented through labels. Consider labeled formulas (C, t), where t is the timestamp, and a labeled version of OL where formulas introduced by simplification or Infer are labeled with strictly increasing timestamps.

Example 72
One fair formula choice strategy is to alternate between heuristically choosing n formulas and taking the formula with the smallest timestamp [28, Sect. 2.3.1].
Proof By contradiction. Assume P_∞ ≠ ∅. Consider the formula (C, t) ∈ P_∞ with the smallest timestamp t. There exists an index i such that C is the formula with the smallest timestamp in P_i. After at most n + 1 applications of ChooseP, C will be chosen.
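The alternating strategy of Example 72 can be sketched as follows in Python. The class name AlternatingChooser, its methods, and the list-based passive set are assumptions made for illustration; the heuristic is an arbitrary caller-supplied scoring function where smaller scores are preferred.

```python
class AlternatingChooser:
    """Passive set that alternates between n heuristic choices and
    one choice of the formula with the smallest timestamp."""

    def __init__(self, n, heuristic):
        self.n = n
        self.heuristic = heuristic
        self.clock = 0       # source of strictly increasing timestamps
        self.calls = 0       # number of choices made so far
        self.passive = []    # list of (timestamp, formula) pairs

    def add(self, formula):
        self.passive.append((self.clock, formula))
        self.clock += 1

    def choose(self):
        if not self.passive:
            raise IndexError("no passive formula to choose")
        self.calls += 1
        if self.calls % (self.n + 1) == 0:
            # Every (n + 1)-th choice takes the oldest formula.
            pick = min(self.passive, key=lambda entry: entry[0])
        else:
            # Otherwise use the heuristic, breaking ties by age.
            pick = min(self.passive,
                       key=lambda entry: (self.heuristic(entry[1]), entry[0]))
        self.passive.remove(pick)
        return pick[1]
```

With n = 2 and formula length as the heuristic, a long formula added first is chosen no later than the third call to choose, regardless of how many shorter formulas are present.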

Example 73
Another fair option is to use an N-valued weight function w that is strictly monotonic in the timestamp (i.e., for any unlabeled formula C, if t < t', then w(C, t) < w(C, t')) and take a formula with the smallest weight [37,Sect. 4].
Proof Consider the labeled formula (C, t) with the smallest weight in P_∞. The weight function satisfies the inequation n ≤ w(D, n) for every n and every unlabeled D. Therefore, after w(C, t) applications of Infer, new formulas introduced by simplification or Infer all have a weight larger than that of (C, t), and thus ChooseP will eventually have to choose (C, t).
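A weight-based choice in the spirit of Example 73 can be sketched with a priority queue. The concrete weight function (formula length plus timestamp) is an assumption chosen only because it is strictly monotonic in the timestamp and satisfies n ≤ w(D, n); the names weight, WeightedPassiveSet, and steps_until_chosen are hypothetical.

```python
import heapq

def weight(formula, timestamp):
    """Assumed weight: size of the formula plus its timestamp."""
    return len(formula) + timestamp

class WeightedPassiveSet:
    def __init__(self):
        self.heap = []
        self.clock = 0

    def add(self, formula):
        entry = (weight(formula, self.clock), self.clock, formula)
        heapq.heappush(self.heap, entry)
        self.clock += 1

    def choose(self):
        """Remove and return a formula of smallest weight."""
        return heapq.heappop(self.heap)[2]

def steps_until_chosen(target, filler, limit=1000):
    """Add one new filler formula per step and choose the lightest
    formula; return the step at which target is chosen, or None."""
    passive = WeightedPassiveSet()
    passive.add(target)
    for step in range(1, limit + 1):
        passive.add(filler)
        if passive.choose() == target:
            return step
    return None
```

Because the timestamp contributes to the weight, newly added formulas eventually become heavier than any fixed older formula, so a heavy formula cannot be postponed forever by a stream of light ones.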

Example 74
In its superposition module [20], iProver implements a rule that eliminates the chosen passive clause, or given clause, if it is redundant w.r.t. a subset of its child clauses together with the active set. The following iProver loop prover IL captures this. It is based on GC and consists of all the OL transition rules and of the following rule: As M, iProver would use a set of possibly simplified clauses from concl (FInf (A, {C})).

Example 75
Bachmair and Ganzinger's resolution prover RP [6, Sect. 4.3] is another instance of GC. It embodies both a concrete prover architecture and a concrete inference system: ordered resolution with selection (O S ). States are triples N | P | O of finite clause sets consisting of new, processed, and old (active) clauses, respectively. The instantiation relies on three labels l_3 ⊐ l_2 ⊐ l_1 = active. Subsumption can be supported as described in Example 46.

Delayed Inferences
The given clause prover GC presented in the previous subsection is sufficient to describe a prover based on an Otter loop as well as a basic DISCOUNT loop prover, which differs from the Otter loop prover OL in that the passive formulas are neither simplified nor deleted using SimplifyBwdP or DeleteBwdP, nor are they used to simplify or delete other formulas in SimplifyFwd or DeleteFwd. To describe a DISCOUNT loop prover with orphan formula deletion, however, we need to extend GC.
An orphan formula is a passive formula generated by an inference for which at least one premise is no longer active.
To model orphan formula deletion, we need to decouple the scheduling of inferences and their computation. The same scheme can be used to model provers based on inference systems that contain premise-free inferences or that may generate infinitely many conclusions from finitely many premises. Yet another use of the scheme is to save memory: A delayed inference can be stored more compactly than a new formula, as a tuple of premises together with instructions on how to compute the conclusion.
The lazy given clause prover LGC generalizes GC. It is defined as the following transition system on pairs (T, N ), where T ("to do") is a set of scheduled inferences and N is a set of labeled formulas. We use the same assumptions as for GC except that we now permit premise-free inferences in FInf .
Initial states are states (T , N ) such that T consists of all premise-free inferences of FInf and N contains the input formulas paired with arbitrary labels different from active. A key invariant of LGC is that all inferences from active formulas are either scheduled in T or redundant w.r.t. N .
Process has the same behavior as the corresponding GC rule, except for the additional T component, which it ignores.
The Infer rule of GC is split into two parts in LGC: ScheduleInfer relabels a passive formula C to active and puts all inferences between C and the active formulas, including C itself, into the set T . ComputeInfer removes an inference from T and ensures that it becomes redundant by adding appropriate labeled formulas to N (typically the conclusion of the inference).
DeleteOrphans can delete scheduled inferences from T if some of their premises have been deleted from N ↓ active in the meantime by an application of Process. Note that the rule cannot delete premise-free inferences, since the side condition is then trivially false.
Abstractly, the T component of the state is a set of inferences (C n , . . . , C 0 ). In an actual implementation, it can be represented in different ways: as a set of compactly encoded recipes for computing the conclusion C 0 from the premises (C n , . . . , C 1 ) as in Waldmeister [24], or as a set of explicit formulas C 0 with information about their parents (C n , . . . , C 1 ) as in E [39]. In the latter case, some presimplifications may be performed on C 0 ; this could be modeled more faithfully by defining T as a set of pairs (ι, simp(C 0 )).
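The decoupling of scheduling and computation can be illustrated with a small Python sketch for propositional resolution. It is a toy under stated assumptions: clauses are frozensets of nonzero integers, a scheduled inference is stored as a recipe (premise, premise, resolved literal), and the class and method names only mirror the rule names informally.

```python
class LazyGivenClause:
    def __init__(self, clauses):
        self.todo = []                                  # T: scheduled inferences
        self.passive = [frozenset(c) for c in clauses]  # passive formulas
        self.active = set()                             # active formulas

    def schedule_infer(self, given):
        """Relabel a passive clause to active and schedule, without
        computing them, all resolution inferences with active clauses."""
        self.passive.remove(given)
        self.active.add(given)
        for other in self.active:
            for lit in given:
                if -lit in other:
                    self.todo.append((given, other, lit))

    def compute_infer(self):
        """Take one scheduled inference, compute its conclusion, and
        add the conclusion to the passive formulas."""
        given, other, lit = self.todo.pop(0)
        conclusion = frozenset((given - {lit}) | (other - {-lit}))
        self.passive.append(conclusion)
        return conclusion

    def delete_active(self, clause):
        """Stand-in for a Process step that deletes an active clause."""
        self.active.discard(clause)

    def delete_orphans(self):
        """Drop scheduled inferences with a premise that is no longer active."""
        self.todo = [inf for inf in self.todo
                     if inf[0] in self.active and inf[1] in self.active]
```

Storing the recipe instead of the conclusion means that an orphaned inference can be discarded before any work is spent on computing its conclusion.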
Proof We must show that every labeled formula that is deleted in an ⇒_LGC-step from the N component is Red^{∩G_L,⊐}-redundant w.r.t. the remaining labeled formulas. For Process this is trivial. For ScheduleInfer, the only deleted formula is (C, l), which is Red^{∩G_L,⊐}-redundant w.r.t. (C, active) by part (iii) of Lemma 65, since l ⊐ active. Finally, the rules ComputeInfer and DeleteOrphans do not delete any formulas.
where ι ∈ T if and only if ι ∈ T for every ι and every T . We will show that Inv T,N (i) is an invariant of LGC and that it extends to the limit, enabling us to establish fairness.  N k+1 ). We assume ι ∈ FLInf (N k+1 ↓ active ) for some ι and show

be a nonempty (P (FInf ) × P (FL))-sequence. If Inv T,N (i) holds for all indices i, then Inv T,N (∞) holds.
Proof We assume ι ∈ FLInf (N ∞ ↓ active ) for some ι and show ι ∈ T ∞ ∪ i Red Clearly, there must exist an index k such that N k ↓ active contains all of ι's premises and ι / ∈ T k . Therefore is dynamically refutationally complete, we can conclude that some N i contains (⊥ , l) for some ⊥ ∈ F ⊥ and l ∈ L.

Example 81
The following DISCOUNT loop [1] prover DL is an instance of the lazy given clause prover LGC. This loop design is inspired by Schulz's description of E [39] but omits E's presimplification of concl(ι). The prover's state is a four-tuple T | P | Y | A, where T is a set of inferences and P, Y , A are sets of formulas. The T , P, and A sets correspond to the scheduled inferences, the passive formulas, and the active formulas, respectively. The Y set is a subsingleton that can store a chosen passive formula. Initial states have the form T | P | ∅ | ∅, where T is the set of all premise-free inferences of FInf .
A reasonable strategy for applying the DL rules is presented below. It relies on a well-founded ordering on formulas to make sure that the simplification rules actually simplify their target in some sense, preventing infinite looping. It assumes that FInf(N, {C}) is finite whenever N is finite.
1. Repeat while T ∪ P ≠ ∅ and ⊥ ∉ Y ∪ A:
1.1. Apply ComputeInfer or ChooseP to retrieve the next conclusion of an inference from T or the next formula from P, where T and P are organized as a single priority queue, to ensure fairness.
1.2. Apply SimplifyFwd as long as the simplified formula C' is smaller than the original formula C w.r.t. this ordering.
The instantiation of LGC relies on three labels l_3 ⊐ l_2 ⊐ l_1 = active corresponding to the sets P, Y, A, respectively.

Example 82
Higher-order unification can give rise to infinitely many incomparable unifiers. As a result, in λ-superposition [14], performing all inferences between two clauses can lead to infinitely many conclusions, which need to be enumerated fairly. The Zipperposition prover [14], which implements the calculus, performs this enumeration in an extended DISCOUNT loop.
Infinitary inference rules are also useful to reason about the theory of datatypes and codatatypes. Superposition with (co)datatypes [19] includes n-ary Acycl and Uniq rules, which had to be restricted and complemented with axioms so that they could be implemented in Vampire [27]. In Zipperposition, it would be possible to support the rules in full generality, eliminating the need for the axioms.
Abstractly, a Zipperposition loop prover ZL operates on states T | P | Y | A, where T is organized as a finite set of possibly infinite sequences (ι_i)_i of inferences and the other components are as in DL (Example 81). The ChooseP, DeleteFwd, SimplifyFwd, DeleteBwd, and SimplifyBwd rules are essentially as in DL. The other rules follow: ComputeInfer works on the first element of sequences. ScheduleInfer adds new sequences to T. Typically, these sequences store FInf (A, {C}), which may be countably infinite, in such a way that all inferences in one sequence have identical premises and can be removed together by DeleteOrphan. The same rule can also be used to remove empty sequences from T, since the side condition is then vacuously true, thereby providing a form of garbage collection.
A subtle difference with DL is that ComputeInfer puts the formula C in P instead of Y. This gives more flexibility for scheduling; for example, a prover can pick several formulas from the same sequence and then choose the most suitable one (not necessarily the first one) to move to the active set.
To produce fair derivations, a prover needs to choose the sequence in ComputeInfer fairly and to choose the formula in ChooseP fairly. In combination, this achieves a form of dovetailing. The prover could use a simple algorithm, such as round-robin, for ComputeInfer and employ more sophisticated heuristics for ChooseP.
The implementation in Zipperposition uses a slightly more complicated representation for T , with sequences of subsingletons of inferences. Thus, each sequence element is either a single inference ι or the empty set, which signifies that no new unifier was found up to a certain depth.
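The fair enumeration described in this example can be sketched as a round-robin over lazily generated sequences. This is an illustrative toy, not Zipperposition's implementation: the names `dovetail` and `unifier_stream` and the representation of an inference as a `(name, depth)` pair are assumptions, and `None` plays the role of the empty subsingleton.

```python
from collections import deque
from itertools import count, islice
from typing import Deque, Iterable, Iterator, Optional, Tuple

# Hypothetical stand-in for an inference: (sequence name, search depth).
Inference = Tuple[str, int]
Sequence = Iterator[Optional[Inference]]


def dovetail(sequences: Iterable[Sequence]) -> Iterator[Inference]:
    """Visit the sequences in round-robin order, one element per visit."""
    queue: Deque[Sequence] = deque(sequences)
    while queue:
        seq = queue.popleft()
        try:
            item = next(seq)
        except StopIteration:
            continue  # exhausted sequence is dropped, akin to garbage collection
        queue.append(seq)  # requeue at the back so every sequence gets its turn
        if item is not None:  # None: no new inference found at this depth
            yield item


def unifier_stream(name: str) -> Sequence:
    """Infinite toy sequence in which only even depths produce an inference."""
    for depth in count():
        yield (name, depth) if depth % 2 == 0 else None


if __name__ == "__main__":
    finite = iter([("c", 0), ("c", 1)])
    merged = dovetail([unifier_stream("a"), unifier_stream("b"), finite])
    print(list(islice(merged, 8)))
```

Because each visit consumes exactly one element, whether it is an inference or an empty slot, no infinite sequence can starve the others. If every remaining sequence produced only empty slots forever, the generator would keep searching without yielding, which mirrors an unbounded unifier search that finds nothing.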

Remark 83
The above approach to orphan formula deletion works because formulas recognized as orphans, belonging to the T state component, cannot have been used to make other formulas or inferences redundant; only passive and active formulas are considered by the redundancy criterion. If formulas from the P or A set could be detected as orphans, we could lose refutational completeness. To see this, consider the abstract scenario in which a formula C that is crucial for a refutation is subsumed by D, which is in turn deleted for being an orphan formula. Then C is lost forever even if it is not an orphan formula.
Nevertheless, the idea of detecting orphan formulas outside the scheduled inferences in T can be salvaged as follows: Annotate each formula D with its parentage, and whenever D is used to simplify other clauses or to make other inferences redundant, remember this fact. Only consider D an orphan formula if it has lost a parent and if it has never been used to delete other formulas or to make other inferences redundant.
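The bookkeeping suggested in this remark can be sketched as follows. The record type `Tracked` and the helpers `record_use` and `is_deletable_orphan` are hypothetical names introduced only for illustration; formulas are identified by plain strings, and the sketch is not taken from any prover.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass
class Tracked:
    """A formula annotated with its parentage and a usage flag (hypothetical)."""
    name: str
    parents: FrozenSet[str] = frozenset()
    used: bool = False  # set once the formula has justified a deletion


def record_use(d: Tracked) -> None:
    """Remember that d simplified a formula or made an inference redundant."""
    d.used = True


def is_deletable_orphan(d: Tracked, present: Set[str]) -> bool:
    """d may be deleted as an orphan only if it lost a parent and was never used."""
    lost_parent = any(p not in present for p in d.parents)
    return lost_parent and not d.used


if __name__ == "__main__":
    d = Tracked("D", frozenset({"P1", "P2"}))
    present = {"P1", "P2", "C", "D"}
    print(is_deletable_orphan(d, present))  # both parents still present
    present.discard("P1")
    print(is_deletable_orphan(d, present))  # a parent was deleted, D unused
    record_use(d)                           # D subsumes C, say
    print(is_deletable_orphan(d, present))  # D must now be kept
```

The `used` flag is what blocks the problematic scenario above: once D has been used to delete C, D is no longer eligible for orphan deletion, so C's justification is not lost.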

Integrating Saturation Calculi
The prover architectures described above can be instantiated with saturation calculi that use a redundancy criterion obtained as an intersection of lifted redundancy criteria. Some saturation calculi are defined in such a way that this requirement is trivially satisfied. For others, some reformulation of the redundancy criterion may be necessary.

Example 84
As explained in Examples 54 and 55, redundancy criteria for calculi with selection functions [5,6] or constraints [29,30] can be defined as intersections Red^{∩G} of lifted redundancy criteria.

Example 85
In Bachmair and Ganzinger's associative-commutative (AC) superposition calculus [4], the redundancy of general clauses and inferences is defined using a grounding function G that maps every clause C to the set of its ground instances Cθ and every inference ι to the set of its ground instances ιθ. ("Instance" means "syntactic instance" here, that is, not "instance modulo AC.") In principle, one could now apply (G, ⊐)-lifting, where we choose ⊐ as the instantiation ordering modulo AC. This would be pointless, though, since in the definition of Red_F^{G,⊐} the ordering ⊐ is used only if D is a common syntactic instance of C and C′. Note that, for example, C′ = f(c + (c + z)) ≈ b is an AC-instance of C = f((x + x) + y) ≈ b, but since C and C′ have no common syntactic ground instances, this fact is never exploited in Red_F^{G,⊐}. We can repair this by redefining G so that it maps every ι to the set of its syntactic ground instances ιθ, as before, but C to the set of all D that are AC-equal to some ground instance Cθ. This qualifies as a grounding function as well, and since Bachmair and Ganzinger's definition of redundancy for ground clauses is invariant under AC, the new definition of redundancy for general clauses is equivalent to the old one.

Example 86
Waldmann [44] considers a superposition calculus modulo -torsion-free cancellative abelian monoids. Redundant clauses and inferences are defined in the standard way by lifting, except for the Abstraction inference rule: According to Waldmann's definition, a ground instance of an Abstraction inference ι = (C₂, C₁, C₀) is an Abstraction inference (C₂θ, C₁θ, C₀θ) where C₂θ and C₁θ are ground. But the conclusion of an Abstraction inference is never ground, and this applies also to C₀θ. When defining redundancy for such inferences, it is therefore necessary to further instantiate the abstraction variable y in C₀θ using a substitution ρ that maps y to a sufficiently small ground term. To obtain a grounding function G as defined in Sect. 3.1, we need to redefine G(ι) as the set of all inferences (C₂θ, C₁θ, C₀θρ), rather than the set of all (C₂θ, C₁θ, C₀θ).

Example 87
The definition of redundancy for Bachmair, Ganzinger, and Waldmann's hierarchic superposition calculus [8] is mostly standard, using a grounding function that maps every clause C to a subset G(C) of the set of its ground instances and every hierarchic superposition inference ι to a set G(ι) of ground standard superposition inferences. There is one exception, namely, Close inferences, which derive ⊥ from a list of premises that is inconsistent w.r.t. some base (background) theory. For these inferences, G(ι) = undef .
Baumgartner and Waldmann's variant of hierarchic superposition [10] uses a slightly different definition of redundancy: where Th is a fixed set of ground base clauses and Red is the usual redundancy criterion for ground standard superposition. To convert this into the format required in Sect It is easy to check that Red Th := (Red Th I , Red Th F ) is also a redundancy criterion and that the properties above are equivalent to G(C) ⊆ Red Th F (G(N )) and G(ι) ⊆ Red Th I (G(N )). For Close inferences, we have again G(ι) = undef .

Example 88
For saturation calculi whose refutational completeness proof is based on some kind of lifting of ground instances, the requirement to use a redundancy criterion obtained as an intersection of lifted redundancy criteria is rather natural. The outlier is unfailing completion [2].
Unfailing completion predates the introduction of Bachmair-Ganzinger-style redundancy, but it can be incorporated into that framework. The formulas are the rewrite rules and equations. The only inferences are orientation and critical pair computation; the other inferences of the unfailing completion calculus (e.g., simplifications of equations or rules) must be considered as simplifications in our framework, rather than as inferences. With these definitions, formulas and inferences are redundant if for every rewrite proof using that rewrite rule, equation, or critical peak, there exists a smaller rewrite proof. The requirement that the redundancy criterion must be obtained by lifting (which is necessary to introduce the labeling) can then be trivially fulfilled by "self-lifting," that is, by defining G := F and ⊐ := ∅ and by taking G as the function that maps every formula or inference to the set of its α-renamings.
Note that this definition of redundancy differs from the usual definition of redundancy for superposition. For example, with a term ordering satisfying f(c) ≻ f(b) ≻ f(a) ≻ c ≻ b ≻ a, the equations c ≈ b and c ≈ a make f(b) ≈ f(a) redundant in the superposition calculus (since they are smaller in the induced clause ordering), but they do not make f(b) ≈ f(a) redundant in unfailing completion (since the rewrite proof f(b) ↔ f(c) ↔ f(a) using c ≈ b and c ≈ a is larger than the rewrite proof f(b) ↔ f(a) using f(b) ≈ f(a)).
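The superposition half of this comparison can be spelled out as a short worked check. It is a sketch under two assumptions that this section does not restate: the term ordering is f(c) ≻ f(b) ≻ f(a) ≻ c ≻ b ≻ a, and a positive unit equation s ≈ t is compared through the multiset {s, t} of its two sides, with the ordering extended to multisets in the usual way.

```latex
\begin{align*}
\{c, b\} &\prec_{\mathrm{mul}} \{f(b), f(a)\}
  && \text{since } f(b) \succ c \text{ and } f(b) \succ b, \\
\{c, a\} &\prec_{\mathrm{mul}} \{f(b), f(a)\}
  && \text{since } f(b) \succ c \text{ and } f(b) \succ a, \\
\{c \approx b,\ c \approx a\} &\models b \approx a \models f(b) \approx f(a)
  && \text{by symmetry, transitivity, and congruence.}
\end{align*}
```

Both equations are thus smaller than f(b) ≈ f(a) and together entail it, which is what the superposition-style criterion asks for. In unfailing completion, by contrast, the two-step proof detours through f(c), a term larger than both endpoints f(b) and f(a), in line with the remark that this proof is larger than the one-step proof.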

Isabelle Development
The framework described in the previous sections has been formalized in Isabelle/HOL [31,32], including all the theorems and lemmas, the prover architectures GC and LGC, and the example prover RP. The Isabelle theory files are available in the Archive of Formal Proofs [16,40]. The development is also part of the IsaFoL (Isabelle Formalization of Logic) [17] effort, which aims at developing a reusable computer-checked library of results about automated reasoning.
The main theory files of the development are listed below:
- Calculus.thy collects basic definitions and lemmas about consequence relations, inference systems, and redundancy criteria, including the equivalence of static and dynamic refutational completeness.
- Calculus_Variations.thy contains alternative notions of inferences, redundancy, saturation, and completeness found in the literature.
- Intersection_Calculus.thy introduces calculi equipped with a family of redundancy criteria, whose intersection is taken.
- Lifting_to_Non_Ground_Calculi.thy gathers the results on nonground liftings of calculi without and with well-founded orderings ⊐_D.
- Labeled_Lifting_to_Non_Ground_Calculi.thy contains the labeled extensions of the previous liftings.
- Given_Clause_Architectures.thy and Given_Clause_Architectures_Revisited.thy include results about the given clause prover GC and its extension LGC with delayed inferences. The invariant-based proofs presented in this paper are found in the latter theory file.
- FO_Ordered_Resolution_Prover_Revisited.thy re-proves the ordered resolution prover RP refutationally complete using our framework, providing a modular alternative to the Isabelle formalization by Schlichtkrull et al. [36,38].
The development relies heavily on Isabelle's locales [9]. These are contexts that fix variables and make assumptions about these. Definitions and lemmas occurring inside the locale may then refer to them. With locales, the definitions and lemmas look similar to or even simpler than how they are stated on paper, but the proofs often become more complicated: Layers of locales may hide definitions, and often these need to be manually unfolded in several steps before the desired lemma can be proved. A pathological example is Lemma 64, which obviously holds by construction from a human perspective but whose Isabelle proof required more than a hundred lines of code.
We chose to represent basic nonempty sets such as F and L by types. This lightened the development in two ways. First, it relieved us from having to thread through nonemptiness conditions. Second, objects are automatically typed appropriately based on the context, meaning that lemmas could be stated without explicit hypotheses that given objects are formulas, labels, or indices. On the other hand, for sets such as F ⊥ and FInf that are subsets of other sets, it was natural to use simply typed sets. Derivations, which describe the dynamic behavior of a calculus, are represented by the same lazy list codatatype [18] and auxiliary definitions that were used by Schlichtkrull et al.
The framework's design and its mechanization were carried out largely in parallel. This resulted in more work on the mechanization side because changes had to be propagated, but it also helped detect missing conditions and shape the theory itself. For example, an earlier version of the framework considered only single lifted redundancy criteria instead of intersections of lifted redundancy criteria (Sect. 3.3); our first attempt at verifying RP in Isabelle using the framework made it clear that the theory was not quite general enough yet to support selection functions (Example 54).

Conclusion
We presented a formal framework for saturation theorem proving inspired by Bachmair and Ganzinger's Handbook chapter [6]. Users can conveniently derive a dynamic refutational completeness result for a concrete prover based on a statically refutationally complete calculus. The key was to strengthen the standard redundancy criterion so that all prover operations, including subsumption deletion, can be justified by inference or redundancy. The framework is mechanized in Isabelle/HOL and can be used to verify actual provers.
To employ the framework, users must provide a statically complete saturation calculus expressible as the lifting (FInf, Red^G) or (FInf, Red^{∩G}) of a ground calculus (GInf, Red), where Red qualifies as a redundancy criterion and G qualifies as a grounding function or grounding function family. The framework can be used to derive two main results:
1. After defining a well-founded ordering ⊐ or a family of well-founded orderings that capture instantiation, invoke Theorem 53 to show (FInf, Red^{∩G,⊐}) dynamically complete.
2. Based on the previous step, invoke Theorems 70 or 80 to derive the dynamic completeness of a prover architecture building on the given clause procedure, such as the Otter, iProver, DISCOUNT, or Zipperposition loop (Examples 71, 74, 81, or 82).
The framework can also help establish the static completeness of the nonground calculus. For many calculi (with the notable exceptions of constraint and hierarchic superposition), Theorems 32 or 50 can be used to lift the static completeness of (GInf, Red) to (FInf, Red^G) or (FInf, Red^{∩G}).
The main missing piece of the framework is a generic treatment of clause splitting. Until recently, the only formal treatment of splitting, by Fietzke and Weidenbach [22], hard-codes both the underlying calculus and the splitting strategy. Voronkov's AVATAR architecture [42] is more flexible and yields truly impressive empirical results, but he and his collaborators left the question of AVATAR's refutational completeness open. Ebner et al. [21] recently provided an answer by introducing and instantiating a generic splitting framework based on our saturation framework.