A Unifying Splitting Framework

AVATAR is an elegant and effective way to split clauses in a saturation prover using a SAT solver. But is it refutationally complete? And how does it relate to other splitting architectures? To answer these questions, we present a unifying framework that extends a saturation calculus (e.g., superposition) with splitting and embeds the result in a prover guided by a SAT solver. The framework also allows us to study locking, a subsumption-like mechanism based on the current propositional model. Various architectures are instances of the framework, including AVATAR, labeled splitting, and SMT with quantifiers.


Introduction
One of the great strengths of saturation calculi such as superposition [1] is that they avoid case distinctions. Derived clauses hold unconditionally, and the prover can stop as soon as it derives the empty clause, without having to backtrack. The drawback is that these calculi often generate long, unwieldy clauses that slow down the prover. A remedy is to partition the search space by splitting a multiple-literal clause C_1 ∨ ··· ∨ C_n into variable-disjoint subclauses C_i. Splitting approaches include splitting with backtracking [24], splitting without backtracking [20], labeled splitting [10], and AVATAR [22].
The SAT-based AVATAR architecture is of particular interest because it is so successful. Voronkov reported that an AVATAR-enabled Vampire could solve 421 TPTP [21] problems that had never been solved before by any system [22, Sect. 9], a mind-boggling number. AVATAR works well in combination with the superposition calculus because it combines superposition's strong equality reasoning with the SAT solver's strong clausal reasoning. It is also appealing theoretically, because it gracefully generalizes traditional saturation provers and yet degenerates to a SAT solver if the problem is propositional.
x)]}, which closes the branch. Next, the SAT solver makes the right disjunct true, and resolving q(y, b) ← {[q(y, b)]} with ¬q(z, z) yields ⊥ ← {[q(y, b)]}. The SAT solver then reports "unsatisfiable," concluding the refutation.
What about refutational completeness? Far from being a purely theoretical concern, establishing completeness, or finding counterexamples, could yield insights and perhaps lead to an even stronger AVATAR. Before we can answer this open question, we must mathematize splitting. Our starting point is the saturation framework by Waldmann, Tourret, Robillard, and Blanchette [23], based on Bachmair and Ganzinger [2]. It covers a wide array of techniques, but "the main missing piece of the framework is a generic treatment of clause splitting" [23, p. 332]. We provide that missing piece, in the form of a splitting framework, and use it to show the completeness of an AVATAR-like architecture.
Our framework has five layers, linked by refinement. The first layer consists of a refutationally complete base calculus, such as resolution or superposition. It must be presentable as an inference system and a redundancy criterion.
From a base calculus, we derive a splitting calculus (Sect. 3). This extends the base calculus with splitting and inherits the base's completeness. It works on A-clauses or A-formulas C ← A, where A is a set of propositional literals.
Using the saturation framework, we can prove the dynamic completeness of an abstract prover, formulated as a transition system, that implements the splitting calculus. However, this ignores a vital component of AVATAR: the SAT solver. AVATAR considers only inferences involving A-formulas whose assertions are true in the current propositional model. The role of the third layer is to reflect this behavior. A model-guided prover operates on states of the form (J, N ), where J is a propositional model and N is a set of A-formulas (Sect. 4).
The fourth layer introduces AVATAR's locking mechanism (Sect. 5). With locking, an A-formula D ← B can be temporarily disabled by another A-formula C ← A if C subsumes D, even if A ⊈ B. Here we make a first discovery: AVATAR-style locking compromises completeness and must be curtailed.
Finally, the fifth layer is an AVATAR-based prover (Sect. 6). This refines the locking model-guided prover of the fourth layer with the given clause procedure, which saturates an A-formula set by distinguishing between active and passive A-formulas. Here we make another discovery: Selecting A-formulas fairly is not enough to guarantee completeness. We need a stronger criterion.
In a hypothetical tête-à-tête with the designers of labeled splitting, they might gently point out that by pioneering the use of a propositional model, including locking, they almost invented AVATAR themselves. Likewise, developers of SMT solvers might be tempted to claim that Voronkov merely reinvented SMT. To investigate such questions, we apply our framework to splitting without backtracking, labeled splitting, and SMT with quantifiers (Sect. 7). This gives us a solid basis for comparison as well as some new theoretical results.
A technical report [8] is available with the proofs, several counterexamples, and further details. A formalization using Isabelle/HOL [16] is underway.

Preliminaries
Our framework is parameterized by abstract notions of formulas, consequence relations, inferences, and redundancy. We largely follow the conventions of Waldmann et al. [23]. A-formulas generalize Voronkov's A-clauses [22].

Formulas.
A set F of formulas is a set that contains a distinguished element ⊥ denoting falsehood. A consequence relation |= ⊆ (P(F))^2 has the following properties for all M, N, P, Q ⊆ F and C, D ∈ F:
we can easily derive a relation understood as M ⟶ N, as required by the saturation framework.
The |= notation can be extended to allow negation on either side. Let F∼ be defined as
Following the saturation framework [23, p. 318], we distinguish between the consequence relation |= used for stating refutational completeness and a possibly stronger consequence relation |≈ for soundness. We require that |≈ is compact.
Example 2. In clausal first-order logic with equality, the formulas in F consist of clauses over a signature Σ. Each clause C is a finite multiset of literals L_1, ..., L_n written C = L_1 ∨ ··· ∨ L_n. Each literal L is either an atom or its negation (¬), and each atom is an unoriented equation s ≈ t. We have M |= N if and only if every Σ-model of M also satisfies at least one clause in N.
Calculi and Derivations. A refutational calculus (Inf, Red) combines a set of inferences Inf and a redundancy criterion Red. We refer to Waldmann et al. [23] for the precise definitions. Recall in particular that Inf(N) is the set of inferences from N.
Let (X_i)_i be a sequence of sets. Its limit inferior is X_∞ = lim inf_{j→∞} X_j = ⋃_i ⋂_{j≥i} X_j, and its limit superior is X^∞ = lim sup_{j→∞} X_j = ⋂_i ⋃_{j≥i} X_j. Given a relation ▷, a ▷-derivation is an infinite sequence (x_i)_i such that x_i ▷ x_{i+1} for every i. Finite runs can be extended to derivations via stuttering.
Let ▷_RedF ⊆ (P(F))^2 be the relation such that M ▷_RedF N if and only if M \ N ⊆ Red_F(N). The calculus (Inf, Red) is dynamically (refutationally) complete (w.r.t. |=) if for every ▷_RedF-derivation (N_i)_i that is weakly fair w.r.t. Inf and Red_I and such that N_0 |= {⊥}, we have ⊥ ∈ N_i for some i.
A-Formulas. We fix throughout a countable set V of propositional variables v_0, v_1, .... For each v ∈ V, let ¬v ∈ ¬V denote its negation, with ¬¬v = v. We assume that a formula fml(v) ∈ F is associated with each v ∈ V. Intuitively, v approximates fml(v) at the propositional level. This definition is extended so that fml(¬v) = ∼fml(v). An assertion a ∈ A = V ∪ ¬V is either a propositional variable v or its negation ¬v. Given a formula C ∈ F∼, let asn(C) denote the set of assertions a ∈ A such that {fml(a)} |≈ {C} and {C} |≈ {fml(a)}.
A propositional interpretation J ⊆ A is a set such that for every v ∈ V, exactly one of v ∈ J and ¬v ∈ J holds. We reserve the letter J for interpretations, and define fml (J) = {fml (a) | a ∈ J}.
An A-formula over a set F of base formulas and an assertion set A is a pair 𝒞 = (C, A) ∈ AF = F × P_fin(A), written C ← A, where C is a formula and A is a finite set of assertions {a_1, ..., a_n} understood as an implication a_1 ∧ ··· ∧ a_n ⟶ C. We identify C ← ∅ with C and define the projection ⌊C ← A⌋ = C. Moreover, 𝒩_⊥ is the set consisting of all A-formulas of the form ⊥ ← A ∈ 𝒩. We call such A-formulas propositional clauses. Note the use of calligraphic letters (e.g., 𝒞, 𝒩) to range over A-formulas and sets of A-formulas.
We say that C ← A ∈ AF is enabled in J if A ⊆ J. A set of A-formulas is enabled in J if all of its members are enabled in J. The enabled projection 𝒩_J ⊆ ⌊𝒩⌋ consists of the projections ⌊𝒞⌋ of all A-formulas 𝒞 ∈ 𝒩 enabled in J. Analogously, the enabled projection Inf_J ⊆ ⌊Inf⌋ of a set Inf of AF-inferences consists of the projections ⌊ι⌋ of all inferences ι ∈ Inf whose premises are all enabled in J.
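The notions of A-formula, enabledness, and enabled projection can be illustrated with a small Python sketch. The integer encoding of assertions (v_k as k + 1, ¬v_k as −(k + 1)), the class name AFormula, and the use of strings for base formulas are assumptions made for illustration only; the interpretation J is restricted to the variables that occur.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set

# Assumed encoding: v_k is the integer k + 1, and its negation is -(k + 1).
Assertion = int


@dataclass(frozen=True)
class AFormula:
    formula: str                      # base formula C, kept opaque in this sketch
    assertions: FrozenSet[Assertion]  # finite assertion set A


def is_enabled(c: AFormula, model: Set[Assertion]) -> bool:
    """C <- A is enabled in J iff A is a subset of J."""
    return c.assertions <= model


def enabled_projection(n: Set[AFormula], model: Set[Assertion]) -> Set[str]:
    """Base formulas of all A-formulas in n that are enabled in J."""
    return {c.formula for c in n if is_enabled(c, model)}


n = {
    AFormula("p(x)", frozenset()),         # p(x) <- {} is enabled everywhere
    AFormula("q(a)", frozenset({1})),      # q(a) <- {v_0}
    AFormula("r(b)", frozenset({-1, 2})),  # r(b) <- {not v_0, v_1}
}
j = {1, -2}  # v_0 true, v_1 false
print(sorted(enabled_projection(n, j)))
```

Switching to the interpretation {-1, 2} would disable q(a) and enable r(b), while the assertion-free p(x) stays enabled in both.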
A propositional interpretation J is a propositional model of N ⊥ , written

Splitting Calculi
Let F be a set of base formulas equipped with ⊥, |=, and |≈. The relation |≈ is assumed to be nontrivial: (D5) ∅ |≉ ∅. Let A be a set of assertions over V and AF be the set of A-formulas over F and A. Let (FInf, FRed) be a base calculus for F, where FRed is a redundancy criterion that additionally satisfies (1) an inference is FRed_I-redundant if one of its premises is FRed_F-redundant; (2) ⊥ ∉ FRed_F(N) for every N ⊆ F; and (3) C ∈ FRed_F({⊥}) for every C ≠ ⊥. These requirements can easily be met by a well-designed redundancy criterion [1, Sect. 4.3].
Below, we will define the splitting calculus induced by the base calculus. We will see that it not only is statically and dynamically complete w.r.t. | =, but also meets stronger, "local completeness" criteria that capture model switching.
The Inference Rules. We start with the mandatory inference rules.

Definition 4. The splitting inference system SInf consists of all instances of
In addition, the following optional inference rules can be used:
The following side conditions apply. For Split:
The three rules identified by double bars are simplifications; they replace their premises with their conclusions in the current A-formula set. The premises' removal is justified by SRed_F, defined below. Also note that Base preserves the soundness of FInf w.r.t. |≈ and that the other rules are sound w.r.t. |≈.
The Split rule performs an n-way case split on C. Each case C_i is approximated by an assertion a_i. The first conclusion expresses that the case distinction is exhaustive. The n other conclusions assume C_i if its approximation a_i is true. In a clausal prover, typically C = C_1 ∨ ··· ∨ C_n, where the subclauses C_i have mutually disjoint sets of variables and form a maximal split.
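For a clausal prover, the maximal split and the conclusions of Split can be sketched as follows. This is an illustrative sketch under stated assumptions: literals are given as a printed form paired with their variable set, assertions are allocated as fresh positive integers starting at first_var (a real prover would derive them from fml), and the exhaustiveness conclusion is rendered as ⊥ ← {¬a_1, ..., ¬a_n}. The function names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Literal = Tuple[str, FrozenSet[str]]  # (printed literal, its variables)


def variable_disjoint_components(clause: List[Literal]) -> List[List[Literal]]:
    """Partition a clause into maximal subclauses that share no variables."""
    components: List[Tuple[set, List[Literal]]] = []
    for lit in clause:
        merged_vars, merged_lits = set(lit[1]), [lit]
        rest = []
        for comp_vars, comp_lits in components:
            if comp_vars & merged_vars:   # shares a variable: merge
                merged_vars |= comp_vars
                merged_lits = comp_lits + merged_lits
            else:
                rest.append((comp_vars, comp_lits))
        components = rest + [(merged_vars, merged_lits)]
    return [lits for _, lits in components]


def split(clause: List[Literal], first_var: int):
    """Return the conclusions of an n-way split as (formula, assertions) pairs."""
    comps = variable_disjoint_components(clause)
    assertions = [first_var + i for i in range(len(comps))]
    # The case distinction is exhaustive: not all a_i can be false.
    exhaustive = ("⊥", frozenset(-a for a in assertions))
    # Each case C_i holds under its approximation a_i.
    cases = [
        (" ∨ ".join(text for text, _ in comp), frozenset({a}))
        for comp, a in zip(comps, assertions)
    ]
    return [exhaustive] + cases


clause = [
    ("p(x)", frozenset({"x"})),
    ("q(x, y)", frozenset({"x", "y"})),
    ("r(z)", frozenset({"z"})),
    ("s(a)", frozenset()),
]
for conclusion in split(clause, 1):
    print(conclusion)
```

Ground literals such as s(a) have no variables and therefore always end up in a component of their own.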
Collect and Trim do some garbage collection. StrongUnsat is a variant of Unsat that uses |≈ instead of |=. It might correspond to invoking an SMT solver [3] (|≈) with a time limit, falling back on a SAT solver (|=). Approx can be used to make any derived A-formula visible to |≈. Tauto allows communication in the other direction, from the SAT solver to the calculus.
The Redundancy Criterion. Next, we lift the base redundancy criterion.

Definition 6. The splitting redundancy criterion
SRed qualifies as a redundancy criterion. It can justify the deletion of A-formulas that are propositionally tautological. It also allows other simplifications, as long as the assertions on A-formulas used to simplify a given C ← A are contained in A. If the base criterion FRed_F supports subsumption, this also extends to A-formulas:
Provided that the base calculus is statically complete, the splitting calculus (SInf, SRed) is statically and hence dynamically complete. However, this result fails to capture a key aspect of most splitting architectures. Since ▷_SRedF-derivations have no notion of current split branch or model J, they must also perform disabled inferences. To respect enabledness, we need a weaker notion of saturation. If an A-formula set is consistent, it should suffice to saturate w.r.t. a single propositional model. In other words, if no A-formula ⊥ ← A with A ⊆ J is derivable for some model J |= N_⊥, the prover should be allowed to give a verdict of "consistent." We will call such model-specific saturations local.
Theorem 11 (Strong dynamic completeness). Assume (FInf, FRed) is statically complete. Given a ▷_SRedF-derivation (N_i)_i that is locally fair w.r.t. SInf and SRed_I and such that N_0 |= {⊥}, we have ⊥ ∈ N_i for some i.
In Sects. 4 to 6, we will review three transition systems of increasing complexity, culminating with an idealized specification of AVATAR. They will be linked by a chain of stepwise refinements, like pearls on a string. All derivations using these will correspond to ▷_SRedF-derivations, and their fairness criteria will imply local fairness. Consequently, by Theorem 11, they will all be complete.

Model-Guided Provers
AVATAR and other splitting architectures maintain a model of the propositional clauses, which represents the split tree's current branch. We can capture this abstractly by refining ▷_SRedF-derivations to incorporate a propositional model.
The states are now pairs (J, N), where J is a propositional model and N ⊆ AF. Initial states have the form (J, N), where N ⊆ F. The model-guided prover MG is defined by the following transition rules:
From an ⟹_MG-derivation, we obtain a ▷_SRedF-derivation by simply erasing the J components. The Derive rule can add new A-formulas and delete redundant A-formulas. J should be a model of N_⊥ most of the time; when it is not, Switch can be used to switch model or StrongUnsat to finish the refutation.
The natural option is to switch model. We take J_1 = {v_0, ¬v_1}. We then derive
Finally, we detect that the propositional clauses are unsatisfiable.
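The interplay between the propositional clauses and the model J can be sketched in Python. The sketch assumes that J is a model of the propositional clauses exactly when no clause ⊥ ← A is enabled in J (that is, no A ⊆ J), and it uses brute-force enumeration as a stand-in for the SAT solver. The integer encoding of assertions and the function names are hypothetical.

```python
from itertools import product
from typing import FrozenSet, Iterable, List, Optional, Set


def is_model(model: Set[int], prop_clauses: Iterable[FrozenSet[int]]) -> bool:
    """J is a model iff no propositional clause ⊥ <- A is enabled, i.e. no A ⊆ J."""
    return not any(a <= model for a in prop_clauses)


def switch(prop_clauses: List[FrozenSet[int]]) -> Optional[Set[int]]:
    """Brute-force stand-in for the SAT solver: return some model, or None."""
    variables = sorted({abs(a) for clause in prop_clauses for a in clause})
    for signs in product([True, False], repeat=len(variables)):
        model = {v if s else -v for v, s in zip(variables, signs)}
        if is_model(model, prop_clauses):
            return model
    return None  # no model exists, so the refutation can be finished


# ⊥ <- {¬v_0, ¬v_1} comes from a split; ⊥ <- {v_0} closes the left branch.
clauses = [frozenset({-1, -2}), frozenset({1})]
print(switch(clauses))                       # a model that enables the right branch
print(switch(clauses + [frozenset({2})]))    # None once both branches are closed
```

A production prover would of course call an incremental SAT solver instead of enumerating interpretations; the point here is only the model condition.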
We need a fairness criterion for MG that implies local fairness of the underlying ▷_SRedF-derivation. The latter requires a witness J but gives us no hint as to where to look for one. Our solution involves a topological concept: J is a limit point in
., v_1, v_3, ..., v_{2i−1} are true and the other variables are false) and J_{2i+1} = (J_{2i} \ {¬v_{2i}}) ∪ {v_{2i}}. Although it is not in the sequence, the interpretation J ∩ V = {v_1, v_3, ...} is a limit point. The associated split tree is shown in Fig. 1. The direct path from the root to a node J_i specifies the assertions that are true in J_i.
This sequence has two limit points: J′ = lim inf_{i→∞} J_{4i+1} and J″ = lim inf_{i→∞} J_{4i+3}. The split tree is depicted in Fig. 2.
Basic topology tells us that every sequence has a limit point. No matter how erratically the prover switches branches, it will fully explore at least one of them. It then suffices to perform the base FInf-inferences fairly in that branch:
Fairness of an ⟹_MG-derivation implies local fairness of the underlying ▷_SRedF-derivation. A well-behaved propositional solver, as in labeled splitting, always gives rise to a single limit point J_∞, which can be taken for J in Definition 15. By contrast, an unconstrained solver, as supported by AVATAR, can produce multiple limit points. Then it is more challenging to ensure fairness.
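The limit point of the first interpretation sequence above can be explored numerically. The following Python sketch encodes only the true variables of each J_k, following the description that J_{2i} makes v_1, v_3, ..., v_{2i−1} true and that J_{2i+1} additionally makes v_{2i} true. Restricting attention to the first m variables and to a finite horizon is an assumption of the sketch, so the result only approximates the limit inferior and limit superior; the function names are hypothetical.

```python
def positive_part(k: int) -> set:
    """Indices of the variables that are true in J_k."""
    i = k // 2
    odd = set(range(1, 2 * i, 2))           # v_1, v_3, ..., v_{2i-1}
    return odd | {2 * i} if k % 2 else odd  # J_{2i+1} also makes v_{2i} true


def tail_bounds(start: int, horizon: int, m: int):
    """Intersection and union of J_start, ..., J_{horizon-1}, restricted to v_0..v_{m-1}."""
    window = set(range(m))
    tails = [positive_part(j) & window for j in range(start, horizon)]
    return set.intersection(*tails), set.union(*tails)


lower, upper = tail_bounds(start=11, horizon=200, m=10)
print(sorted(lower), sorted(upper))
```

On this window, the tail intersection and the tail union coincide and contain exactly the odd-indexed variables, which matches the limit point {v_1, v_3, ...}: every even-indexed variable is true in only one interpretation of the sequence.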
Example 16. Consider the consistent set consisting of ¬p(x), p(a) ∨ q(a), and ¬q(y) ∨ p(f(y)) ∨ q(f(y)). Splitting the second clause into p(a) and q(a) and resolving q(a) with the third clause yields p(f(a)) ∨ q(f(a)). This process can be iterated. Now suppose that v_{2i} and v_{2i+1} are associated with p(f^i(a)) and q(f^i(a)), respectively. If we split every emerging p(f^i(a)) ∨ q(f^i(a)) and the SAT solver always makes v_{2i} true first, we end up with the situation of Example 13 and Fig. 1. For the limit point J, all FInf-inferences are performed. Thus, the derivation is fair.
Example 17. We build a clause set from two copies of Example 16, where each clause C from each copy i ∈ {1, 2} is extended to ¬r_i ∨ C. We add the clause r_1 ∨ r_2 and split it as our first move. From there, each branch imitates Example 16. A SAT solver might jump back and forth, as in Example 14 and Fig. 2. Even if A-clauses get disabled and re-enabled infinitely often, we must perform all nonredundant inferences in at least one of the two limit points (J′ or J″).

Locking Provers
Next, we refine the model-guided prover into a locking prover that temporarily locks away A-formulas that are redundant locally w.r.t. some J but not globally. Locking can cause incompleteness, because an A-formula can be locally redundant at every point in the derivation and yet not be so at any limit point, thereby breaking local saturation. For example, if we have derived p(x) ← {¬v_k} for every k, then p(c) is locally redundant in any J that contains ¬v_k. For the models J_i = {v_1, ..., v_i, ¬v_{i+1}, ...}, the clause p(c) would always be locally redundant and ignored. Yet p(c) might not be locally redundant at the unique limit point J = V. We could rule out this counterexample by requiring that derivations are strongly fair, that is, every inference possible infinitely often must eventually be made redundant. However, we have found a counterexample showing that strong fairness does not ensure completeness [8, Example 46]. It would seem that this counterexample could arise with Vampire if the underlying SAT solver produces this specific sequence of interpretations.
Our solution is as follows. Let (J_i, N_i, L_i)_i be an ⟹_L-derivation, let (J′_j)_j be a subsequence of (J_i)_i, and let (N′_j)_j be the corresponding subsequence of (N_i)_i. To achieve fairness, we now consider N′_∞, the A-formulas persistent in the unlocked subsequence (N′_j)_j. By contrast, fairness of ⟹_MG-derivations used N_∞.
Fairness of an ⟹_L-derivation implies fairness of the corresponding ⟹_MG-derivation. The condition on the sets L_j ensures that inferences from A-formulas that are locked infinitely often, but not infinitely often with the same lock, are redundant at the limit point. In particular, if we know that each A-formula is locked at most finitely often, then lim sup_{j→∞} L_j = L_∞ and the inclusion in the definition above simplifies to FInf((N_∞)_J) ⊆ ⋃_i FRed_I((N_i ∪ L_i)_J).

AVATAR-Based Provers
AVATAR was unveiled in 2014 by Voronkov [22]. Since then, he and his colleagues have studied many options and extensions [3,17]. A second implementation, in Lean's super tactic, is due to Ebner [9]. Here we attempt to capture AVATAR's essence.
The abstract AVATAR-based prover we define in this section extends the locking prover L with a given clause procedure [13]. A-formulas are moved in turn from the passive to the active set, where inferences are performed. The heuristic for choosing the next given A-formula to move is guided by timestamps indicating when the A-formulas were derived, to ensure fairness.
Let TAF = AF × ℕ be the set of timestamped A-formulas. Given N ⊆ TAF, we define ⌊N⌋ = {𝒞 | (𝒞, t) ∈ N for some t}, and we overload existing notations to erase timestamps. Thus, notations such as the projection and the set of propositional clauses N_⊥ also apply to sets of timestamped A-formulas. Note that we use a new set of calligraphic letters to range over timestamped A-formulas and sets of timestamped A-formulas. Using the saturation framework [23, Sect. 3], we lift (SInf, SRed) to a calculus (TSInf, TSRed) on TAF with the tiebreaker order > on timestamps, so that (𝒞, t + k) ∈ TSRed_F({(𝒞, t)}) for any k > 0.
A state is a tuple (J, A, P, Q, L) ∈ P(A) × P(TAF)^3 × P(P_fin(A) × TAF), where A, P, and Q are respectively the sets of active, passive, and other (disabled or propositional) timestamped A-formulas, and L is the set of locked timestamped A-formulas such that (1) A_⊥ = P_⊥ = ∅, (2) A ∪ P is enabled in J, and (3) Q_J ⊆ {⊥}. The AVATAR-based prover AV is defined as follows:
There is also a LockP rule that is identical to LockA except that it starts in the state (J, A, P ⊎ {(C ← A, t)}, Q, L). An AV-derivation is well timestamped if every A-formula introduced by a rule is assigned a unique timestamp.
In contrast with nonsplitting provers, for AV, fairness w.r.t. formulas does not imply fairness w.r.t. inferences. A problematic scenario involves two premises C, D of an inference ι and four transitions repeated forever, possibly with other steps interleaved: Infer makes C active; Switch disables it; Infer makes D active; Switch disables it. Even though C and D are selected in a strongly fair fashion, ι is never performed. We need an even stronger fairness criterion.
infinitely many indices i and there exists a subsequence (J′_j)_j converging to a limit point
Condition (3) ensures that all inferences involving passive A-formulas are redundant at the limit point. It would not suffice to require P_∞ = ∅ because A-formulas can move back and forth between A, P, and Q, as we just saw. Condition (4) is similar to the condition on locks in Definition 18. If the ⟹_AV-derivation is fair, the corresponding ⟹_L-derivation is also fair.
Many selection strategies are combinations of basic strategies, such as choosing the smallest formula by weight or the oldest by age. We capture such strategies using selection orders ⊲. Intuitively, C ⊲ D if the prover will always select C before D if both are present. We use two selection orders: ⊲_TAF, based on timestamps, must be followed infinitely often; ⊲_F must be followed otherwise.
For the first one, we can use ⊲_age defined so that (𝒞, t) ⊲_age (𝒞′, t′) if t < t′.
Definition 20. Let X be a set. A selection order on X is an irreflexive and transitive relation ⊲ such that {y | y ⊲ x} is finite for all x ∈ X.
The intersection of two orders ⊲_1 and ⊲_2 corresponds to the nondeterministic alternation between them. The prover may choose either a ⊲_1-minimal or a ⊲_2-minimal A-formula, at its discretion.
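A common way to realize such an alternation in a given clause loop is to pick the oldest passive formula at regular intervals and the lightest one otherwise. The Python sketch below illustrates this under loud assumptions: weight is approximated by the length of the printed formula, the age order is followed on every age_every-th step, and the names select and age_every are hypothetical rather than taken from any prover.

```python
from typing import List, Tuple

Item = Tuple[str, int]  # (printed formula, timestamp)


def select(passive: List[Item], step: int, age_every: int = 3) -> Item:
    """Remove and return the next given formula from the passive list."""
    if step % age_every == 0:
        # Follow the timestamp-based order: take the oldest formula.
        chosen = min(passive, key=lambda item: item[1])
    else:
        # Follow the weight-based order, breaking ties by age.
        chosen = min(passive, key=lambda item: (len(item[0]), item[1]))
    passive.remove(chosen)
    return chosen


passive = [("p(f(f(a)))", 0), ("q(a)", 5), ("r(b) ∨ s(c)", 2)]
order = [select(passive, step) for step in range(3)]
print(order)
```

Because the age order is followed on infinitely many steps, no passive formula that stays present can be postponed forever, which is the role the timestamp-based order plays in the fairness argument.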
To ensure completeness, we must restrict the inferences that the prover may perform; otherwise, it could derive infinitely many A-formulas with different assertions, causing it to switch between two branches of the split tree without making progress. Given N ⊆ AF, consider the set {A | C ← A ∈ N for some C} of assertion sets occurring in N.
Intuitively, a strongly finitary function F returns finitely many base formulas and finitely many new assertions, although it may return infinitely many A-formulas. Clearly, F(N) is finite for any finite N ⊆ AF. If FInf(N) is finite for any finite N ⊆ F, then performing SInf-inferences is strongly finitary. Deterministic Split rules, such as AVATAR's, are also strongly finitary. We can lift a strongly finitary F to any N ⊆ TAF by taking F_TAF(N) = F(⌊N⌋) × ℕ. If F and G are strongly finitary, then so is N ↦ F(N) ∪ G(N).
Simplification rules used by the prover must be restricted even more to ensure completeness, because they can lead to new splits and assertions. For example, simplifying p(x * 0) ∨ p(x) to p(0) ∨ p(x) transforms an unsplittable clause into a splittable one. If simplifications were to produce infinitely many such clauses, the prover might split and switch models forever without making progress.

Definition 22.
Let ≺ be a well-founded relation on F, and let ⪯ be its reflexive closure. A function S : AF → P(AF) is a strongly finitary simplification bound for ≺ if N ↦ ⋃_{𝒞∈N} S(𝒞) is strongly finitary and ⌊𝒞′⌋ ⪯ ⌊𝒞⌋ for all 𝒞′ ∈ S(𝒞).
The prover may simplify an A-formula 𝒞 to 𝒞′ only if 𝒞′ ∈ S(𝒞). It may also delete 𝒞. Strongly finitary simplification bounds are closed under unions, allowing the combination of simplification techniques based on ≺. For superposition, a natural choice for ≺ is the clause order. The key property of strongly finitary simplification bounds is that if we saturate a finite set of A-formulas w.r.t. simplifications, the saturation is also finite.
Example 23. Let F be the set of first-order clauses and S(C ← A) = {C′ ← A′ | C′ is a subclause of C and A′ ⊆ A}. Then S is a strongly finitary simplification bound. This S covers many simplification techniques, including elimination of duplicate literals, deletion of resolved literals, and subsumption resolution.
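The bound of Example 23 is easy to enumerate for a concrete A-clause, which makes its finiteness tangible. In the following Python sketch, a clause is simplified to a set of distinct printed literals (ignoring multiplicities), assertions are integers, and the helper names are hypothetical.

```python
from itertools import chain, combinations


def subsets(items):
    """All subsets of a finite collection, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def simplification_bound(clause, assertions):
    """All C' <- A' such that C' is a subclause of C and A' is a subset of A."""
    return {
        (frozenset(c), frozenset(a))
        for c in subsets(clause)
        for a in subsets(assertions)
    }


bound = simplification_bound({"p(x)", "q(x)", "r(a)"}, {1, -2})
print(len(bound))
```

For a clause with k distinct literals and n assertions, the bound has 2^k · 2^n elements, so any chain of simplifications that stays within it must terminate.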
Example 24. If the Knuth-Bendix order [12] is used and all weights are positive, then S(C ← A) = {C′ ← A′ | C′ ≺ C and A′ ⊆ A} is a strongly finitary simplification bound. This can be used to cover demodulation.
Equipped with the above definitions, we introduce a fairness criterion that is more concrete and easier to apply than fairness of ⟹_AV-derivations. We could refine AV further and use this criterion to show the completeness of an imperative procedure such as Voronkov's extended Otter loop [22, Fig. 3], thus showing that Vampire with AVATAR is complete if locking is sufficiently restricted.
Lemma 25. Let I be a strongly finitary function, and let S be a strongly finitary simplification bound. Then a well-timestamped ⟹_AV-derivation (J_i, A_i, P_i, Q_i, L_i)_i is fair if all of the following conditions hold:

Application to Other Architectures
AVATAR may be the most natural application of our framework, but it is not the only one. Below we complete the picture by studying splitting without backtracking, labeled splitting, and SMT with quantifiers.
Splitting without Backtracking. Before AVATAR, Riazanov and Voronkov [20] had already experimented with splitting in Vampire in a lighter variant without backtracking. They based their work on ordered resolution O with selection [2]. Weidenbach [24, end of Sect. 4.5] independently outlined the same technique. The basic idea is to extend the signature Σ with a countable set P of nullary predicate symbols and to augment the base calculus with a binary splitting rule that replaces a Σ-clause C ∨ D with two Σ_P-clauses C ∨ p and D ∨ ¬p. Riazanov and Voronkov require that the precedence ≺ makes all P-literals smaller than the Σ-literals. Binary splitting is then a simplification. They also extend the selection function of the base calculus to support P-literals. Their parallel selection function imitates as much as possible the original selection function.
The calculus O_P is closely related to an instance of our framework. Let F be the set of Σ-clauses, with the empty clause as ⊥. Let O = (FInf, FRed) be the base calculus. We take V = P. Let LA = (SInf, SRed), whose name stands for lightweight AVATAR, be the induced splitting calculus. Lightweight AVATAR amounts to the splitting architecture Cruanes implemented in Zipperposition [7, Sect. 2.5]. Binary splitting can be realized in LA as a Split-like simplification rule. The calculi O_P and LA disagree slightly because O_P's order ≺ can break ties using P-literals and because LA can detect unsatisfiability early using the Unsat rule. Despite its slightly weaker order, LA is tighter than O_P in the sense that saturation w.r.t. O_P implies saturation w.r.t. LA but not vice versa.
Labeled Splitting. Labeled splitting, as originally described by Fietzke and Weidenbach [10] and implemented in SPASS, is a first-order resolution-based calculus with binary splitting that traverses the split tree in a depth-first way, using an elaborate backtracking mechanism inspired by CDCL [15]. It works on states (Ψ, N), where Ψ is a stack storing the current state of the split tree and N is a set of labeled clauses, that is, clauses annotated with finite sets of natural numbers.
We model labeled splitting as an instance of the locking prover L based on the splitting calculus LS = (SInf, SRed) induced by the resolution calculus R = (FInf, FRed), where |= and |≈ are as in Example 2 and V = ⋃_{i∈ℕ} {l_i, r_i, s_i}. A-clauses correspond to labeled clauses. Splits are identified by unique split levels. Given a split on C ∨ D with level k, l_k ∈ asn(C) and r_k ∈ asn(D) represent the left and right branches. In practice, the prover would dynamically extend fml to ensure that fml(l_k) = C and fml(r_k) = D.
When splitting, if we simply added ⊥ ← {¬l_k, ¬r_k}, we would always need to consider either C ← {l_k} or D ← {r_k}, depending on the interpretation. However, labeled splitting can undo splits when backtracking. Yet fairness would require us to perform inferences with either C or D even when labeled splitting would not. We solve this as follows. Let ⊤ = ∼⊥. We introduce the variable s_k ∈ asn(⊤) so that we can enable or disable the split. The StrongUnsat rule then knows that s_k is true, but we can still switch to propositional models that disable both C and D. A-clauses are then split using the following binary variant of Split:
where C and D share no variables and k is the next split level. Unlike AVATAR, labeled splitting keeps the premise and might split it again with another level.
To emulate the original, the locking prover based on LS must repeatedly apply the following three steps in any order until saturation: 1. Apply Base to perform an inference from the enabled A-clauses. If an enabled
Switch is powerful enough to support all of Fietzke and Weidenbach's backtracking rules, but to explore the tree in the same order as they do, we must choose the new model carefully. If a left branch is closed, the model must be updated so as to disable the splits that were not used to close this branch and to enable the right branch. If a right branch is closed, the split must be disabled, and the model must switch to the right branch of the closest enabled split above it with an enabled left branch. If a right branch is closed but there is no split above with an enabled left branch, the entire tree has been visited. Then, a propositional clause ⊥ ← A with A ⊆ ⋃_i {s_i} is |=-entailed by the A-clause set, and StrongUnsat can finish the refutation by exploiting fml(s_i) = ⊤.
The above strategy helps achieve fairness, because it ensures that there exists exactly one limit point. It also uses locks in a well-behaved way. This means we can considerably simplify the notion of fairness for ⟹_L-derivations and obtain a criterion that is almost identical to, but slightly more liberal than, Fietzke and Weidenbach's, thereby re-proving the completeness of labeled splitting.
For terminating derivations, their fairness criterion coincides with ours. For diverging derivations, Fietzke and Weidenbach construct a limit subsequence (Φ′_i, N′_i)_i of the derivation (Φ_i, N_i)_i and require that every persistent inference in it be made redundant, exactly as we do for ⟹_L-derivations. The subsequence consists of all states that lie on the split tree's unique infinite branch. Locks are well behaved, with lim sup_{j→∞} L_j = L_∞, because with the strategy above, once an A-clause is enabled on the rightmost branch, it remains enabled forever. Our definition of fairness allows more subsequences, although this is difficult to exploit without bringing in all the theoretical complexity of AVATAR.
SMT with Quantifiers. Satisfiability modulo theories (SMT) solvers based on DPLL(T ) [15] combine a SAT solver with theory solvers. In the classical setup, the theories are decidable, and the SMT solver is a decision procedure for the union of the theories. Some SMT solvers also support quantified formulas via instantiation at the expense of decidability.
Complete instantiation strategies have been developed for various fragments of first-order logic [11,18,19]. In particular, enumerative quantifier instantiation [18] is complete under some conditions. An SMT solver following such a strategy ought to be refutationally complete, but this has never been proved. Although SMT is quite different from the architectures considered above, we can instantiate our framework to show the completeness of an abstract SMT solver. The model-guided prover MG will provide a suitable starting point.
Let F be the set of first-order Σ-formulas. We represent the SMT solver's underlying SAT solver by the Unsat rule and complement it with an inference system FInf that includes rules for clausification outside quantifiers, theory reasoning, and instantiation. The clausification rules derive C and D from a premise C ∧ D, among others; the theory rules derive ⊥ from some Σ-formula set N such that N |= {⊥}, ignoring quantifiers; and the instantiation rules derive ϕ(u) from premises ∀x. ϕ(x), where u is a ground term. For FRed, we take an arbitrary instance of standard redundancy. Its only purpose is to split disjunctions destructively. We define the "theories with quantifiers" calculus TQ = (FInf, FRed). For |= and |≈, we use entailment in the supported theories including quantifiers.
We use the same approximation function as in AVATAR (Example 3). Let us call C ← A a subunit if C is not a disjunction. Whenever a (ground) disjunction C ∨ D ← A emerges, we immediately apply Split. This delegates clausal reasoning to the SAT solver. It then suffices to assume that TQ is complete for subunits.
Theorem 26 (Dynamic completeness). Assume TQ is statically complete for subunit sets. Let (J_i, N_i)_i be a fair ⟹_MG-derivation based on TQ. If N_0 |= {⊥} and N_∞ contains only subunits, then ⊥ ∈ N_j for some j.
Like AVATAR-based provers, SMT solvers will typically not perform all SInf-inferences, not even up to SRed_I. Given a ≈ b ← {v_0}, b ≈ c ← {v_1}, a ≈ d ← {v_2}, c ≈ d ← {v_3}, and a ≉ c ← {v_4}, an SMT solver will find only one of the conflicts ⊥ ← {v_0, v_1, v_4} or ⊥ ← {v_2, v_3, v_4} but not both. For decidable theories, a practical fair strategy is to instantiate quantifiers only if no other rules are applicable.
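The two conflicts in the equality example above can be recomputed with a short Python sketch. It assumes ground constants only (so that transitivity suffices and no congruence reasoning is needed), represents each equation as an edge labeled with the index of its assertion, and enumerates every simple path between the two sides of the disequation. The function name conflict_sets is hypothetical, and exhaustive enumeration is shown only to make visible what saturation w.r.t. all inferences would require; as stated above, an actual SMT solver stops at the first conflict.

```python
from typing import Dict, FrozenSet, Set, Tuple

Equation = Tuple[str, str]


def conflict_sets(
    equalities: Dict[Equation, int], disequality: Tuple[Equation, int]
) -> Set[FrozenSet[int]]:
    """Assertion sets of all conflicts between the equalities and the disequality."""
    (source, target), dis_var = disequality
    graph: Dict[str, list] = {}
    for (s, t), var in equalities.items():
        graph.setdefault(s, []).append((t, var))
        graph.setdefault(t, []).append((s, var))

    found: Set[FrozenSet[int]] = set()

    def walk(node: str, visited: frozenset, used: frozenset) -> None:
        if node == target:
            # A chain of equations connects both sides of the disequation.
            found.add(frozenset(used | {dis_var}))
            return
        for nxt, var in graph.get(node, []):
            if nxt not in visited:
                walk(nxt, visited | {nxt}, used | {var})

    walk(source, frozenset({source}), frozenset())
    return found


equalities = {("a", "b"): 0, ("b", "c"): 1, ("a", "d"): 2, ("c", "d"): 3}
conflicts = conflict_sets(equalities, (("a", "c"), 4))
print(sorted(sorted(c) for c in conflicts))
```

Each returned set corresponds to one propositional clause ⊥ ← A that could be handed to the SAT solver.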
Our mathematization of AVATAR and SMT with quantifiers exposes their dissimilarities. With SMT, splitting is mandatory, and there is no subsumption or simplification, locking, or active and passive sets. And of course, theory inferences are n-ary and quantifier instantiation is unary, whereas superposition is binary. Nevertheless, their completeness follows from the same principles.

Conclusion
Our framework captures splitting calculi and provers in a general way, independently of the base calculus. Users can conveniently derive a dynamic refutational completeness result for a splitting prover based on a given statically refutationally complete calculus. As we developed the framework, we faced some tension between constraining the SAT solver's behavior and the saturation prover's. It seemed preferable to constrain the prover, because the prover is typically easier to modify than an off-the-shelf SAT solver. To our surprise, we discovered counterexamples related to locking, formula selection, and simplification, which may affect Vampire's AVATAR implementation, depending on the SAT solver used. We proposed some restrictions, but alternatives could be investigated.
We found that labeled splitting can be seen as a variant of AVATAR where the SAT solver follows a strict strategy and propositional variables are not reused across branches. A benefit of the strict strategy is that locking preserves completeness. As for the relationship between AVATAR and SMT, there are some glaring differences, including that splitting is necessary to support disjunctions in SMT but fully optional in AVATAR. For future work, we could try to complete the picture by considering other related architectures [4,5,6,14].