1 Introduction

One of the great strengths of saturation calculi such as superposition [1] is that they avoid case distinctions. Derived clauses hold unconditionally, and the prover can stop as soon as it derives the empty clause, without having to backtrack. The drawback is that these calculi often generate long, unwieldy clauses that slow down the prover. A remedy is to partition the search space by splitting a multiple-literal clause into variable-disjoint subclauses \(C_i\). Splitting approaches include splitting with backtracking [24], splitting without backtracking [20], labeled splitting [10], and AVATAR [22].

The SAT-based AVATAR architecture is of particular interest because it is so successful. Voronkov reported that an AVATAR-enabled Vampire could solve 421 TPTP [21] problems that had never been solved before by any system [22, Sect. 9], a mind-boggling number. AVATAR works well in combination with the superposition calculus because it combines superposition’s strong equality reasoning with the SAT solver’s strong clausal reasoning. It is also appealing theoretically, because it gracefully generalizes traditional saturation provers and yet degenerates to a SAT solver if the problem is propositional.

Example 1

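To illustrate the approach, we follow the key steps of an AVATAR-enabled resolution prover on the initial clause set containing \(\lnot \mathsf {p}(\mathsf {a}),\) \(\lnot \mathsf {q}(z, z),\) and \(\mathsf {p}(x)\vee \mathsf {q}(y, \mathsf {b}).\) The disjunction can be split into \(\mathsf {p}(x) \mathbin {\leftarrow }\{[\mathsf {p}(x)]\}\) and \(\mathsf {q}(y, \mathsf {b}) \mathbin {\leftarrow }\{[\mathsf {q}(y, \mathsf {b})]\},\) where \(C \mathbin {\leftarrow }\{[C]\}\) indicates that the clause C is enabled only in models in which the associated propositional variable [C] is true. A SAT solver is then run to choose a model \(\mathcal {J}\) of \([\mathsf {p}(x)] \vee [\mathsf {q}(y, \mathsf {b})].\) Suppose \(\mathcal {J}\) makes \([\mathsf {p}(x)]\) true and \([\mathsf {q}(y, \mathsf {b})]\) false. Then resolving \(\mathsf {p}(x) \mathbin {\leftarrow }\{[\mathsf {p}(x)]\}\) with \(\lnot \mathsf {p}(\mathsf {a})\) produces \(\bot \mathbin {\leftarrow }\{[\mathsf {p}(x)]\},\) which closes the branch. Next, the SAT solver makes the right disjunct true, and resolving \(\mathsf {q}(y, \mathsf {b}) \mathbin {\leftarrow }\{[\mathsf {q}(y, \mathsf {b})]\}\) with \(\lnot \mathsf {q}(z, z)\) yields \(\bot \mathbin {\leftarrow }\{[\mathsf {q}(y, \mathsf {b})]\}.\) The SAT solver then reports “unsatisfiable,” concluding the refutation.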

What about refutational completeness? Far from being a purely theoretical concern, establishing completeness—or finding counterexamples—could yield insights and perhaps lead to an even stronger AVATAR. Before we can answer this open question, we must mathematize splitting. Our starting point is the saturation framework by Waldmann, Tourret, Robillard, and Blanchette [23], based on Bachmair and Ganzinger [2]. It covers a wide array of techniques, but “the main missing piece of the framework is a generic treatment of clause splitting” [23, p. 332]. We provide that missing piece, in the form of a splitting framework, and use it to show the completeness of an AVATAR-like architecture.

Our framework has five layers, linked by refinement. The first layer consists of a refutationally complete base calculus, such as resolution or superposition. It must be presentable as an inference system and a redundancy criterion.

From a base calculus, we derive a splitting calculus (Sect. 3). This extends the base calculus with splitting and inherits the base’s completeness. It works on A-clauses or A-formulas \(C \mathbin {\leftarrow }A\), where A is a set of propositional literals.

Using the saturation framework, we can prove the dynamic completeness of an abstract prover, formulated as a transition system, that implements the splitting calculus. However, this ignores a vital component of AVATAR: the SAT solver. AVATAR considers only inferences involving A-formulas whose assertions are true in the current propositional model. The role of the third layer is to reflect this behavior. A model-guided prover operates on states of the form \((\mathcal {J}, \mathcal {N}),\) where \(\mathcal {J}\) is a propositional model and \(\mathcal {N}\) is a set of A-formulas (Sect. 4).

The fourth layer introduces AVATAR’s locking mechanism (Sect. 5). With locking, an A-formula \(D \mathbin {\leftarrow }B\) can be temporarily disabled by another A-formula \(C \mathbin {\leftarrow }A\) if C subsumes D, even if \(A \not \subseteq B.\) Here we make a first discovery: AVATAR-style locking compromises completeness and must be curtailed.

Finally, the fifth layer is an AVATAR-based prover (Sect. 6). This refines the locking model-guided prover of the fourth layer with the given clause procedure, which saturates an A-formula set by distinguishing between active and passive A-formulas. Here we make another discovery: Selecting A-formulas fairly is not enough to guarantee completeness. We need a stronger criterion.

In a hypothetical tête-à-tête with the designers of labeled splitting, they might gently point out that by pioneering the use of a propositional model, including locking, they almost invented AVATAR themselves. Likewise, developers of SMT solvers might be tempted to claim that Voronkov merely reinvented SMT. To investigate such questions, we apply our framework to splitting without backtracking, labeled splitting, and SMT with quantifiers (Sect. 7). This gives us a solid basis for comparison as well as some new theoretical results.

A technical report [8] is available with the proofs, several counterexamples, and further details. A formalization using Isabelle/HOL [16] is underway.

2 Preliminaries

Our framework is parameterized by abstract notions of formulas, consequence relations, inferences, and redundancy. We largely follow the conventions of Waldmann et al. [23]. A-formulas generalize Voronkov’s A-clauses [22].

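Formulas. A set \(\mathbf {F}\) of formulas is a set that contains a distinguished element \(\bot \) denoting falsehood. A consequence relation \({\models } \subseteq (\mathcal {P}(\mathbf {F}))^2\) has the following properties for all \(M, N, P, Q \subseteq \mathbf {F}\) and \(C, D \in \mathbf {F}\): (D1) \(\{\bot \} \models \emptyset \); (D2) \(\{C\} \models \{C\}\); (D3) if \(M \subseteq N\) and \(P \subseteq Q,\) then \(M \models P\) implies \(N \models Q\); (D4) if \(M \models P\) and \(N \models Q \mathrel {\cup }\{C\}\) for every \(C \in M\) and \(N \mathrel {\cup }\{D\} \models Q\) for every \(D \in P,\) then \(N \models Q.\) The intended meaning of \(M \models N\) is \(\bigwedge M \rightarrow \bigvee N\). From \(\models \), we can easily derive a relation \(\models _\wedge \) understood as \(\bigwedge M \rightarrow \bigwedge N\), as required by the saturation framework.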

The notation can be extended to allow negation on either side. Let \(\mathbf {F}_{\!{{\sim }}}\) be defined as such that Given we have if and only if .

Following the saturation framework [23, p. 318], we distinguish between the consequence relation  used for stating refutational completeness and a possibly stronger consequence relation for soundness. We require that is compact.

Example 2

In clausal first-order logic with equality, the formulas in \(\mathbf {F}\) consist of clauses over a signature \(\mathrm {\Sigma }.\) Each clause C is a finite multiset of literals \(L_1, \dotsc , L_n\) written \(C = L_1 \vee \dotsb \vee L_n.\) Each literal L is either an atom or its negation (\(\lnot \)), and each atom is an unoriented equation \(s \approx t.\) We have \(M \models N\) if and only if every \(\mathrm {\Sigma }\)-model of M also satisfies at least one clause in N.

Calculi and Derivations. A refutational calculus \(( Inf , Red )\) combines a set of inferences \( Inf \) and a redundancy criterion \( Red \). We refer to Waldmann et al. [23] for the precise definitions. Recall in particular that \( Inf (N)\) is the set of inferences from N, \( Inf (N, M) = Inf (N \mathrel {\cup }M) \setminus Inf (N \setminus M)\), N is saturated w.r.t. \( Inf \) and \( Red _\mathrm {I}\) if \( Inf (N) \subseteq Red _\mathrm {I}(N),\) and \(( Inf , Red )\) is statically (refutationally) complete (w.r.t. ) if for every saturated w.r.t. \( Inf \) and \( Red _\mathrm {I}.\)

Let \((X_i)_i\) be a sequence of sets. Its limit inferior is \(X_\infty = \bigcup _i \bigcap _{j \ge i} X_j\) and its limit superior is \(X^\infty = \bigcap _i \bigcup _{j \ge i} X_j.\) The elements of \(X_\infty \) are called persistent. A sequence \((N_i)_i\) over \(\mathcal {P}(\mathbf {F})\) is weakly fair w.r.t. \( Inf \) and \( Red _\mathrm {I}\) if \( Inf (N_\infty ) \subseteq \bigcup _i Red _\mathrm {I}(N_i)\) and strongly fair if \(( Inf (N_i))^\infty \subseteq \bigcup _i Red _\mathrm {I}(N_i).\) Given a relation \({\rhd },\) a \(\rhd \)-derivation is an infinite sequence \((x_i)_i\) such that \(x_i \rhd x_{i+1}\) for every i. Finite runs can be extended to derivations via stuttering.
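The two limits can be made concrete on sequences that are eventually periodic, where they are computable from the repeating part alone. The following is a minimal sketch under that assumption; the function name and the prefix/cycle representation are hypothetical and not part of the framework. An element belongs to the limit inferior exactly when it occurs in every set of the cycle, and to the limit superior exactly when it occurs in at least one.

```python
def limits_eventually_periodic(prefix, cycle):
    """Limit inferior and limit superior of the infinite sequence
    prefix[0], ..., prefix[-1], cycle[0], ..., cycle[-1], cycle[0], ...

    The finite prefix does not influence either limit.
    """
    if not cycle:
        raise ValueError("cycle must be nonempty")
    sets = [frozenset(s) for s in cycle]
    # Persistent elements: present from some index onward, hence in every cycle set.
    lim_inf = frozenset.intersection(*sets)
    # Elements present infinitely often: in at least one cycle set.
    lim_sup = frozenset.union(*sets)
    return lim_inf, lim_sup


lim_inf, lim_sup = limits_eventually_periodic(
    prefix=[{"C", "D"}],
    cycle=[{"C", "E"}, {"C", "F"}],
)
```

Here "C" is persistent, whereas "E" and "F" occur infinitely often without being persistent, and "D" occurs only in the prefix and so belongs to neither limit.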

Let \({\rhd _{\! Red _\mathrm {F}}} \subseteq (\mathcal {P}(\mathbf {F}))^2\) be the relation such that \(M \rhd _{\! Red _\mathrm {F}}N\) if and only if \(M \setminus N \subseteq Red _\mathrm {F}(N).\) The calculus \(( Inf , Red )\) is dynamically (refutationally) complete (w.r.t. \(\models \)) if for every \(\rhd _{\! Red _\mathrm {F}}\)-derivation \((N_i)_i\) that is weakly fair w.r.t. \( Inf \) and \( Red _\mathrm {I}\) and such that \(N_0 \models \{\bot \},\) we have \(\bot \in N_i\) for some i.

A-Formulas. We fix throughout a countable set \(\mathbf {V}\) of propositional variables \(\mathsf {v}_0, \mathsf {v}_1, \dots .\) For each \(\mathsf {v} \in \mathbf {V},\) let \(\lnot \mathsf {v} \in \lnot \mathbf {V}\) denote its negation, with \(\lnot \lnot \mathsf {v} = \mathsf {v}.\) We assume that a formula \( fml (\mathsf {v}) \in \mathbf {F}\) is associated with each \(\mathsf {v} \in \mathbf {V}.\) Intuitively, \(\mathsf {v}\) approximates \( fml (\mathsf {v})\) at the propositional level. This definition is extended so that \( fml (\lnot \mathsf {v}) = {{\sim } fml (\mathsf {v})}.\) An assertion \(a \in \mathbf {A}= \mathbf {V}\mathrel {\cup }\lnot \mathbf {V}\) is either a propositional variable \(\mathsf {v}\) or its negation \(\lnot \mathsf {v}.\) Given a formula \(C \in \mathbf {F}_{\!{{\sim }}},\) let \( asn (C)\) denote the set of assertions \(a \in \mathbf {A}\) such that and .

A propositional interpretation is a set such that for every \(\mathsf {v} \in \mathbf {V}\), exactly one of and holds. We reserve the letter for interpretations, and define .

An A-formula over a set \(\mathbf {F}\) of base formulas and an assertion set \(\mathbf {A}\) is a pair \(\mathcal {C} = (C, A) \in \mathbf {AF}= \mathbf {F}\times \mathcal {P}_\mathrm {fin}(\mathbf {A}),\) written \(C \mathbin {\leftarrow }A,\) where C is a formula and A is a finite set of assertions \(\{a_1, \dotsc , a_n\}\) understood as an implication . We identify \(C \mathbin {\leftarrow }\emptyset \) with C and define the projection \(\lfloor C \mathbin {\leftarrow }A \rfloor = C.\) Moreover, \(\mathcal {N}_\bot \) is the set consisting of all A-formulas of the form . We call such A-formulas propositional clauses. Note the use of calligraphic letters (e.g., \(\mathcal {C}, \mathcal {N}\)) to range over A-formulas and sets of A-formulas.

We say that \(C \mathbin {\leftarrow }A \in \mathbf {AF}\) is enabled in if . A set of A-formulas is enabled in  if all of its members are enabled in . The enabled projection consists of the projections \(\lfloor \mathcal {C} \rfloor \) of all A-formulas \(\mathcal {C}\) enabled in . Analogously, the enabled projection of a set \( Inf \) of \(\mathbf {AF}\)-inferences consists of the projections \(\lfloor \iota \rfloor \) of all inferences \(\iota \in Inf \) whose premises are all enabled in 
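Enabledness and the enabled projection can be sketched in a few lines. This is an illustration only: it assumes that an A-formula \(C \mathbin {\leftarrow }A\) is enabled in an interpretation exactly when every assertion in A belongs to the interpretation, and it uses a hypothetical encoding in which assertions are strings ("v0" for a variable, "~v0" for its negation) and an A-formula is a pair of a formula string and a frozenset of assertions.

```python
def is_enabled(a_formula, interpretation):
    """Assumed reading: C <- A is enabled iff A is a subset of the interpretation."""
    _formula, assertions = a_formula
    return assertions <= interpretation


def enabled_projection(a_formulas, interpretation):
    """Base formulas of all A-formulas that are enabled in the interpretation."""
    return {
        formula
        for formula, assertions in a_formulas
        if assertions <= interpretation
    }


# Interpretation in which v0 is true and v1 is false.
interp = frozenset({"v0", "~v1"})
a_formulas = [
    ("~p(a)", frozenset()),          # no assertions: enabled everywhere
    ("p(x)", frozenset({"v0"})),     # enabled, since v0 is true
    ("q(y,b)", frozenset({"v1"})),   # disabled, since v1 is false
]
projection = enabled_projection(a_formulas, interp)
```

With this encoding, an A-formula without assertions is enabled in every interpretation, which matches the identification of \(C \mathbin {\leftarrow }\emptyset \) with C.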

A propositional interpretation is a propositional model of \(\mathcal {N}_\bot ,\) written if Moreover, we write if or . A set \(\mathcal {N}_\bot \) is propositionally satisfiable if there exists an interpretation  such that . In contrast to consequence relations, propositional modelhood  interprets the set \(\mathcal {N}_\bot \) conjunctively: is understood as

Finally, we lift and from \(\mathcal {P}(\mathbf {F})\) to \(\mathcal {P}(\mathbf {AF})\): if and only if for every in which \(\mathcal {N}\) is enabled, and if and only if for every in which \(\mathcal {N}\) is enabled.

Example 3

In the original AVATAR [22], the connection between first-order clauses and assertions takes the form of a function The encoding is such that \([\lnot C] = \lnot [C]\) for every ground unit clause C and \([C] = [D]\) if and only if C is syntactically equal to D up to variable renaming. This can be supported in our framework by letting \( fml (\mathsf {v}) = C\) for some C such that \([C] = \mathsf {v}\), for every \(\mathsf {v}.\)

3 Splitting Calculi

Let \(\mathbf {F}\) be a set of base formulas equipped with , , and . The relation is assumed to be nontrivial: (D5) . Let \(\mathbf {A}\) be a set of assertions over \(\mathbf {V}\) and \(\mathbf {AF}\) be the set of A-formulas over \(\mathbf {F}\) and \(\mathbf {A}.\) Let \(( FInf , FRed )\) be a base calculus for \(\mathbf {F}\), where \( FRed \) is a redundancy criterion that additionally satisfies (1) an inference is \( FRed _\mathrm {I}\)-redundant if one of its premises is \( FRed _\mathrm {F}\)-redundant; (2)  for every \(N \subseteq \mathbf {F}\); and (3)  for every . These requirements can easily be met by a well-designed redundancy criterion [1, Sect. 4.3].

Below, we will define the splitting calculus induced by the base calculus. We will see that it not only is statically and dynamically complete w.r.t. , but also meets stronger, “local completeness” criteria that capture model switching.

The Inference Rules. We start with the mandatory inference rules.

Definition 4

The splitting inference system \( SInf \) consists of all instances of

[Figure b: the inference rules Base and Unsat]

For Base, the side condition is \((C_n,\dotsc ,C_1,D) \in FInf .\) For Unsat, the side condition is that \(\{\bot \mathbin {\leftarrow }A_1,\dotsc ,\bot \mathbin {\leftarrow }A_n\}\) is propositionally unsatisfiable.
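The side condition of Unsat can be checked naively by enumerating valuations. The sketch below is not how a SAT solver works; it is a brute-force illustration under the reading that \(\bot \mathbin {\leftarrow }A\) stands for the propositional clause stating that not all assertions in A hold. The string encoding of assertions ("v0", "~v0") and the function names are hypothetical.

```python
from itertools import product


def holds(assertion, valuation):
    """Truth value of an assertion such as "v0" or "~v0" under a valuation."""
    if assertion.startswith("~"):
        return not valuation[assertion[1:]]
    return valuation[assertion]


def propositionally_unsatisfiable(assertion_sets):
    """True iff no valuation satisfies all clauses bot <- A simultaneously."""
    variables = sorted({a.lstrip("~") for A in assertion_sets for a in A})
    for values in product([False, True], repeat=len(variables)):
        valuation = dict(zip(variables, values))
        # bot <- A is satisfied iff at least one assertion in A is false.
        if all(any(not holds(a, valuation) for a in A) for A in assertion_sets):
            return False
    return True


# Hypothetical set: "v0 or v1 must hold", "v0 is false", "v1 is false".
closed = propositionally_unsatisfiable([{"~v0", "~v1"}, {"v0"}, {"v1"}])
still_open = propositionally_unsatisfiable([{"~v0", "~v1"}, {"v0"}])
```

The first set is unsatisfiable, so Unsat would apply; the second is satisfied by making \(\mathsf {v}_0\) false and \(\mathsf {v}_1\) true.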

In addition, the following optional inference rules can be used:

[Figures c to e: the inference rules Split, Collect, Trim, StrongUnsat, Approx, and Tauto]

The following side conditions apply. For Split: is splittable into \(C_1,\dotsc , C_n\) and \(a_i \in asn (C_i)\) for each i. A formula C is splittable into two or more formulas \(C_1,\dotsc ,C_n\) if and \(C \in FRed _\mathrm {F}(\{C_i\})\) for each i. For Collect: and For Trim: and For StrongUnsat: For Approx: \(a \in asn (C).\) For Tauto:

The three rules identified by double bars are simplifications; they replace their premises with their conclusions in the current A-formula set. The premises’ removal is justified by \( SRed _\mathrm {F},\) defined below. Also note that Base preserves the soundness of \( FInf \) w.r.t. and that the other rules are sound w.r.t. .

The Split rule performs an n-way case split on C. Each case \(C_i\) is approximated by an assertion \(a_i.\) The first conclusion expresses that the case distinction is exhaustive. The n other conclusions assume \(C_i\) if its approximation \(a_i\) is true. In a clausal prover, typically where the subclauses \(C_i\) have mutually disjoint sets of variables and form a maximal split.
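For the clausal case, a maximal split into variable-disjoint subclauses can be computed by grouping literals that share variables. The following is a minimal sketch, assuming each literal is given as a pair of its printed form and its set of variable names; this representation and the function name are hypothetical. Ground literals have no variables and therefore each end up in a component of their own.

```python
def split_components(literals):
    """Partition literals into maximal groups with mutually disjoint variables.

    literals: iterable of (text, set_of_variable_names).
    Returns a list of components, each a list of literal texts.
    """
    components = []  # pairs (literal texts, variables); pairwise variable-disjoint
    for text, lit_vars in literals:
        merged_lits, merged_vars = [text], set(lit_vars)
        untouched = []
        for lits, comp_vars in components:
            if comp_vars & merged_vars:
                # Shares a variable with the new literal: merge into one component.
                merged_lits = lits + merged_lits
                merged_vars |= comp_vars
            else:
                untouched.append((lits, comp_vars))
        components = untouched + [(merged_lits, merged_vars)]
    return [lits for lits, _ in components]


# p(x) and q(y, b) share no variable, so the clause splits in two.
two_way = split_components([("p(x)", {"x"}), ("q(y,b)", {"y"})])
# p(x, y) and q(y) share y and must stay together; r(z) is separate.
mixed = split_components([("p(x,y)", {"x", "y"}), ("q(y)", {"y"}), ("r(z)", {"z"})])
```

A single pass over the existing components suffices because they are kept pairwise variable-disjoint, so a component can overlap the merged variable set only through the variables of the new literal.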

Collect and Trim do some garbage collection. StrongUnsat is a variant of Unsat that uses instead of It might correspond to invoking an SMT solver [3] () with a time limit, falling back on a SAT solver (). Approx can be used to make any derived A-formula visible to . Tauto allows communication in the other direction, from the SAT solver to the calculus.

Example 5

Suppose the base calculus is first-order resolution [2] and the initial clauses are \(\lnot \mathsf {p}(\mathsf {a}),\) \(\lnot \mathsf {q}(z, z),\) and \(\mathsf {p}(x)\vee \mathsf {q}(y, \mathsf {b}),\) as in Example 1. Split replaces the last clause by \(\bot \mathbin {\leftarrow }\{\lnot \mathsf {v}_0, \lnot \mathsf {v}_1\},\) \(\mathsf {p}(x)\mathbin {\leftarrow }\{\mathsf {v}_0\},\) and \(\mathsf {q}(y, \mathsf {b})\mathbin {\leftarrow }\{\mathsf {v}_1\}.\) Two Base inferences then generate \(\bot \mathbin {\leftarrow }\{\mathsf {v}_0\}\) and \(\bot \mathbin {\leftarrow }\{\mathsf {v}_1\}.\) Finally, Unsat generates \(\bot .\)

The Redundancy Criterion. Next, we lift the base redundancy criterion.

Definition 6

The splitting redundancy criterion \( SRed = ( SRed _\mathrm {I}, SRed _\mathrm {F})\) is specified as follows. An A-formula \(C \mathbin {\leftarrow }A \in \mathbf {AF}\) is redundant w.r.t. \(\mathcal {N}\), written \(C \mathbin {\leftarrow }A \in SRed _\mathrm {F}(\mathcal {N}),\) if (1)  for every propositional interpretation or (2) there exists an A-formula \(C \mathbin {\leftarrow }B \in \mathcal {N}\) with \(B \subset A.\) An inference \(\iota \in SInf \) is redundant w.r.t. \(\mathcal {N}\), written \(\iota \in SRed _\mathrm {I}(\mathcal {N}),\) if (1) \(\iota \) is a Base inference and for every or (2) \(\iota \) is an Unsat inference and

\( SRed \) qualifies as a redundancy criterion. It can justify the deletion of A-formulas that are propositionally tautological. It also allows other simplifications, as long as the assertions on A-formulas used to simplify a given \(C \mathbin {\leftarrow }A\) are contained in A. If the base criterion \( FRed _\mathrm {F}\) supports subsumption, this also extends to A-formulas: \(D \mathbin {\leftarrow }B \in SRed _\mathrm {F}(\{C \mathbin {\leftarrow }A\})\) if D is strictly subsumed by C and \(B \supseteq A\), or if \(C = D\) and \(B \supset A.\)

Local Saturation. It is not difficult to show that if \(( FInf , FRed )\) is statically complete, then \(( SInf , SRed )\) is statically and hence dynamically complete. However, this result fails to capture a key aspect of most splitting architectures. Since \(\rhd _{\! SRed _\mathrm {F}}\)-derivations have no notion of current split branch or model , they must also perform disabled inferences. To respect enabledness, we need a weaker notion of saturation. If an A-formula set is consistent, it should suffice to saturate w.r.t. a single propositional model. In other words, if no A-formula is derivable for some model  the prover should be allowed to give a verdict of “consistent.” We will call such model-specific saturations local.

Definition 7

A set \(\mathcal {N} \subseteq \mathbf {AF}\) is locally saturated w.r.t. \( SInf \) and \( SRed _\mathrm {I}\) if either \(\bot \in \mathcal {N}\) or there exists \(\mathcal {J} \models \mathcal {N}_\bot \) such that \({\mathcal {N}_{\mathcal {J}}}\) is saturated w.r.t. \( FInf \) and \( FRed _\mathrm {I}.\)

Theorem 8

(Strong static completeness). Assume \(( FInf , FRed )\) is statically complete. Given a set \(\mathcal {N} \subseteq \mathbf {AF}\) that is locally saturated w.r.t. \( SInf \) and \( SRed _\mathrm {I}\) and such that \(\mathcal {N} \models \{\bot \},\) we have \(\bot \in \mathcal {N}.\)

Example 9

Consider the A-clause set expressed using AVATAR conventions. It is not saturated for resolution, because the conclusion of resolving the last two A-clauses is missing, but it is locally saturated with .

Definition 10

A sequence \((\mathcal {N}_i)_i\) of sets of A-formulas is locally fair w.r.t. \( SInf \) and \( SRed _\mathrm {I}\) if either for some i or there exists such that .

Theorem 11

(Strong dynamic completeness). Assume \(( FInf , FRed )\) is statically complete. Given an \(\rhd _{\! SRed _\mathrm {F}}\)-derivation \((\mathcal {N}_i)_i\) that is locally fair w.r.t. \( SInf \) and \( SRed _\mathrm {I}\) and such that \(\mathcal {N}_0 \models \{\bot \},\) we have \(\bot \in \mathcal {N}_i\) for some i.

In Sects. 4 to 6, we will review three transition systems of increasing complexity, culminating with an idealized specification of AVATAR. They will be linked by a chain of stepwise refinements, like pearls on a string. All derivations using these will correspond to \(\rhd _{\! SRed _\mathrm {F}}\)-derivations, and their fairness criteria will imply local fairness. Consequently, by Theorem 11, they will all be complete.

4 Model-Guided Provers

AVATAR and other splitting architectures maintain a model of the propositional clauses, which represents the split tree’s current branch. We can capture this abstractly by refining \(\rhd _{\! SRed _\mathrm {F}}\)-derivations to incorporate a propositional model. The states are now pairs , where is a propositional model and \(\mathcal {N} \subseteq \mathbf {AF}\). Initial states have the form , where \(N \subseteq \mathbf {F}.\) The model-guided prover \(\mathsf {MG}\) is defined by the following transition rules:

[Figure g: the transition rules of \(\mathsf {MG}\), including Derive, Switch, and StrongUnsat]

From an \(\Longrightarrow _\mathsf {MG}\)-derivation, we obtain an \(\rhd _{\! SRed _\mathrm {F}}\)-derivation by simply erasing the components. The Derive rule can add new A-formulas and delete redundant A-formulas.  should be a model of \(\mathcal {N}_\bot \) most of the time; when it is not, Switch can be used to switch model or StrongUnsat to finish the refutation.

Example 12

Let us revisit Example 5. Initially, let . After the split, we have \(\lnot \mathsf {p}(\mathsf {a}),\) \(\lnot \mathsf {q}(z, z),\) \(\mathsf {p}(x)\mathbin {\leftarrow }\{\mathsf {v}_0\},\) \(\mathsf {q}(y, \mathsf {b})\mathbin {\leftarrow }\{\mathsf {v}_1\},\) and The natural option is to switch model. We take . We then derive Since we switch to , where we derive Finally, we detect that the propositional clauses are unsatisfiable.

We need a fairness criterion for \(\mathsf {MG}\) that implies local fairness of the underlying \(\rhd _{\! SRed _\mathrm {F}}\)-derivation. The latter requires a witness  but gives us no hint as to where to look for one. Our solution involves a topological concept: is a limit point in if there exists a subsequence of such that .

Example 13

Let be the sequence such that (i.e., \(\mathsf {v}_1,\mathsf {v}_3, \ldots , \mathsf {v}_{2i-1}\) are true and the other variables are false) and Although it is not in the sequence, the interpretation  is a limit point. The associated split tree is shown in Fig. 1. The direct path from the root to a node specifies the assertions that are true in .

Fig. 1. A split tree with a single infinite branch

Fig. 2. A split tree with two infinite branches

Example 14

Let be such that and . This sequence has two limit points: and . The split tree is depicted in Fig. 2.

Basic topology tells us that every sequence has a limit point. No matter how erratically the prover switches branches, it will fully explore at least one of them. It then suffices to perform the base \( FInf \)-inferences fairly in that branch:

Definition 15

An \(\Longrightarrow _\mathsf {MG}\)-derivation is fair if either (1)  for some i or (2)  for infinitely many indices i and there exists a limit point of such that .

Fairness of an \(\Longrightarrow _\mathsf {MG}\)-derivation implies local fairness of the underlying \(\rhd _{\! SRed _\mathrm {F}}\)-derivation. A well-behaved propositional solver, as in labeled splitting, always gives rise to a single limit point , which can be taken for in Definition 15. By contrast, an unconstrained solver, as supported by AVATAR, can produce multiple limit points. Then it is more challenging to ensure fairness.

Example 16

Consider the consistent set consisting of \(\lnot \mathsf {p}(x),\) and Splitting the second clause into \(\mathsf {p}(\mathsf {a})\) and \(\mathsf {q}(\mathsf {a})\) and resolving \(\mathsf {q}(\mathsf {a})\) with the third clause yields This process can be iterated. Now suppose that \(\mathsf {v}_{2i}\) and \(\mathsf {v}_{2i+1}\) are associated with \(\mathsf {p}(\mathsf {f}^i(\mathsf {a}))\) and \(\mathsf {q}(\mathsf {f}^i(\mathsf {a})),\) respectively. If we split every emerging and the SAT solver always makes \(\mathsf {v}_{2i}\) true first, we end up with the situation of Example 13 and Fig. 1. For the limit point , all \( FInf \)-inferences are performed. Thus, the derivation is fair.

Example 17

We build a clause set from two copies of Example 16, where each clause C from each copy  is extended to . We add the clause and split it as our first move. From there, each branch imitates Example 16. A SAT solver might jump back and forth, as in Example 14 and Fig. 2. Even if A-clauses get disabled and re-enabled infinitely often, we must perform all nonredundant inferences in at least one of the two limit points ( or ).

5 Locking Provers

Next, we refine the model-guided prover into a locking prover that temporarily locks away A-formulas that are redundant locally w.r.t. some but not globally. The states are triples , with \(\mathcal {L} \subseteq \mathcal {P}_\mathrm {fin}(\mathbf {A})\times \mathbf {AF}\). Intuitively, \((B{,}~ C \mathbin {\leftarrow }A) \in \mathcal {L}\) means that \(C \mathbin {\leftarrow }A\) is “locally redundant” in interpretations . The function erases the locks: Initial states have the form , where The locking prover is defined by these two rules:

figure j

We note that \(\Longrightarrow _\mathsf {L}\)-derivations refine \(\Longrightarrow _\mathsf {MG}\)-derivations, with states  mapped to .

Locking can cause incompleteness, because an A-formula can be locally redundant at every point in the derivation and yet not be so at any limit point, thereby breaking local saturation. For example, if we have derived \(\mathsf {p}(x) \mathbin {\leftarrow }\{\lnot \mathsf {v}_k\}\) for every k,  then \(\mathsf {p}(\mathsf {c})\) is locally redundant in any that contains \(\lnot \mathsf {v}_k.\) For the models , the clause \(\mathsf {p}(\mathsf {c})\) would always be locally redundant and ignored. Yet \(\mathsf {p}(\mathsf {c})\) might not be locally redundant at the unique limit point . We could rule out this counterexample by requiring that derivations are strongly fair—that is, every inference possible infinitely often must eventually be made redundant. However, we have found a counterexample showing that strong fairness does not ensure completeness [8, Example 46]. It would seem that this counterexample could arise with Vampire if the underlying SAT solver produces this specific sequence of interpretations.

Our solution is as follows. Let be an \(\Longrightarrow _\mathsf {L}\)-derivation, let be a subsequence of , and let \((\mathcal {N}'_j)_j\) be the corresponding subsequence of \((\mathcal {N}_i)_i.\) To achieve fairness, we now consider \(\mathcal {N}'_\infty \), the A-formulas persistent in the unlocked subsequence \((\mathcal {N}'_j)_j\). By contrast, fairness of \(\Longrightarrow _\mathsf {MG}\)-derivations used \(\mathcal {N}_\infty \).

Definition 18

An \(\Longrightarrow _\mathsf {L}\)-derivation is fair if either (1) or (2) for infinitely many indices i and there exists a subsequence  converging to a limit point  such that , where \((\mathcal {N}'_j)_j\) and \((\mathcal {L}'_j)_j\) correspond to .

Fairness of an \(\Longrightarrow _\mathsf {L}\)-derivation implies fairness of the corresponding \(\Longrightarrow _\mathsf {MG}\)-derivation. The condition on the sets \(\mathcal {L}'_j\) ensures that inferences from A-formulas that are locked infinitely often, but not infinitely often with the same lock, are redundant at the limit point. In particular, if we know that each A-formula is locked at most finitely often, then and the inclusion in the definition above simplifies to .

6 AVATAR-Based Provers

AVATAR was unveiled in 2014 by Voronkov [22]. Since then, he and his colleagues have studied many options and extensions [3, 17]. A second implementation, in Lean’s super tactic, is due to Ebner [9]. Here we attempt to capture AVATAR’s essence.

The abstract AVATAR-based prover we define in this section extends the locking prover \(\mathsf {L}\) with a given clause procedure [13]. A-formulas are moved in turn from the passive to the active set, where inferences are performed. The heuristic for choosing the next given A-formula to move is guided by timestamps indicating when the A-formulas were derived, to ensure fairness.

Let \(\mathbf {TAF}= \mathbf {AF}\times \mathbb N\) be the set of timestamped A-formulas. Given we define and we overload existing notations to erase timestamps. Thus, and so on. Note that we use a new set of calligraphic letters (e.g., ) to range over timestamped A-formulas and A-formula sets. Using the saturation framework [23, Sect. 3], we lift \(( SInf , SRed )\) to a calculus \(( TSInf , TSRed )\) on \(\mathbf {TAF}\) with the tiebreaker order > on timestamps, so that \((\mathcal {C}, t + k) \in TSRed _\mathrm {F}(\{(\mathcal {C}, t)\})\) for any \(k > 0\).

A state is a tuple , where , , and are respectively the sets of active, passive, and other (disabled or propositional) timestamped A-formulas, and is the set of locked timestamped A-formulas such that (1) , (2)  is enabled in , and (3) . The AVATAR-based prover \(\mathsf {AV}\) is defined as follows:

figure l

There is also a LockP rule that is identical to LockA except that it starts in the state . An \(\mathsf {AV}\)-derivation is well timestamped if every A-formula introduced by a rule is assigned a unique timestamp.

Let be an \(\Longrightarrow _\mathsf {AV}\)-derivation. It is easy to see that it refines the \(\Longrightarrow _\mathsf {L}\)-derivation and that the saturation invariant holds if .

In contrast with nonsplitting provers, for \(\mathsf {AV}\), fairness w.r.t. formulas does not imply fairness w.r.t. inferences. A problematic scenario involves two premises of an inference \(\iota \) and four transitions repeated forever, possibly with other steps interleaved: Infer makes active; Switch disables it; Infer makes active; Switch disables it. Even though and are selected in a strongly fair fashion, \(\iota \) is never performed. We need an even stronger fairness criterion.

Definition 19

An \(\Longrightarrow _\mathsf {AV}\)-derivation is fair if (1) or (2) for infinitely many indices i and there exists a subsequence converging to a limit point such that (3)  and (4) 

Condition (3) ensures that all inferences involving passive A-formulas are redundant at the limit point. It would not suffice to require because A-formulas can move back and forth between , , and , as we just saw. Condition (4) is similar to the condition on locks in Definition 18. If the \(\Longrightarrow _\mathsf {AV}\)-derivation is fair, the corresponding \(\Longrightarrow _\mathsf {L}\)-derivation is also fair.

Many selection strategies are combinations of basic strategies, such as choosing the smallest formula by weight or the oldest by age. We capture such strategies using selection orders \(\mathrel {\lessdot }\). Intuitively,

if the prover will always select before if both are present. We use two selection orders: \(\mathrel {\lessdot }_\mathbf {TAF}\), based on timestamps, must be followed infinitely often; \(\mathrel {\lessdot }_\mathbf {F}\) must be followed otherwise. For the first one, we can use \(\mathrel {\lessdot }_\mathrm {age}\) defined so that \((\mathcal {C},t) \mathrel {\lessdot }_\mathrm {age} (\mathcal {C}',t')\) if \(t < t'.\)

Definition 20

Let X be a set. A selection order \({\mathrel {\lessdot }}\) on X is an irreflexive and transitive relation such that is finite for all \(x \in X\).

The intersection of two orders \(\mathrel {\lessdot }_1\) and \(\mathrel {\lessdot }_2\) corresponds to the nondeterministic alternation between them. The prover may choose either a \(\mathrel {\lessdot }_1\)-minimal or a \(\mathrel {\lessdot }_2\)-minimal A-formula, at its discretion.
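One common way to realize such an alternation in a given clause loop is a fixed pick ratio between age and weight. The sketch below is only an illustration under assumptions: the tuple representation of passive entries, the function name, and the ratio parameter `age_every` are hypothetical, and the framework itself requires only that the timestamp-based order be followed infinitely often.

```python
def select_given(passive, step, age_every=5):
    """Pick the next given entry from a nonempty passive list.

    passive: list of (formula, weight, timestamp) triples.
    Every `age_every`-th step follows the age order (oldest timestamp first),
    so that order is followed infinitely often along an infinite run;
    all other steps follow a weight order, with ties broken by age.
    """
    if step % age_every == 0:
        return min(passive, key=lambda entry: entry[2])
    return min(passive, key=lambda entry: (entry[1], entry[2]))


passive = [
    ("p(f(f(a)))", 4, 0),  # oldest but heaviest
    ("q(a)", 2, 1),
    ("r", 1, 2),           # newest and lightest
]
by_age = select_given(passive, step=0)
by_weight = select_given(passive, step=1)
```

At step 0 the oldest entry is chosen despite its weight; at step 1 the lightest entry is chosen.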

To ensure completeness, we must restrict the inferences that the prover may perform; otherwise, it could derive infinitely many A-formulas with different assertions, causing it to switch between two branches of the split tree without making progress. Given \(\mathcal {N} \subseteq \mathbf {AF}\), let

Definition 21

A function \(F : \mathcal {P}(\mathbf {AF})\rightarrow \mathcal {P}(\mathbf {AF})\) is strongly finitary if \(\lfloor F(\mathcal {N}) \rfloor \) and \(\bigcup \lceil F(\mathcal {N}) \rceil \setminus \bigcup \lceil \mathcal {N} \rceil \) are finite for any \(\mathcal {N} \subseteq \mathbf {AF}\) such that \(\lfloor \mathcal {N} \rfloor \) is finite.

Intuitively, a strongly finitary function F returns finitely many base formulas and finitely many new assertions, although it may return infinitely many A-formulas. Clearly, \(F(\mathcal {N})\) is finite for any finite \(\mathcal {N} \subseteq \mathbf {AF}\). If \( FInf (N)\) is finite for any finite \(N \subseteq \mathbf {F}\), then performing \( SInf \)-inferences is strongly finitary. Deterministic Split rules, such as AVATAR’s, are also strongly finitary. We can lift a strongly finitary F to any by taking . If F and G are strongly finitary, then so is .

Simplification rules used by the prover must be restricted even more to ensure completeness, because they can lead to new splits and assertions. For example, simplifying to transforms an unsplittable clause into a splittable one. If simplifications were to produce infinitely many such clauses, the prover might split and switch models forever without making progress.

Definition 22

Let \(\prec \) be a well-founded relation on \(\mathbf {F}\), and let \(\preceq \) be its reflexive closure. A function \(S : \mathbf {AF}\rightarrow \mathcal {P}(\mathbf {AF})\) is a strongly finitary simplification bound for \(\prec \) if \(\mathcal {N} \mapsto \bigcup _{\mathcal {C} \in \mathcal {N}} S(\mathcal {C})\) is strongly finitary and \(\lfloor \mathcal {C}' \rfloor \preceq \lfloor \mathcal {C} \rfloor \) for all \(\mathcal {C}' \in S(\mathcal {C}).\)

The prover may simplify an A-formula \(\mathcal {C}\) to \(\mathcal {C}'\) only if \(\mathcal {C}' \in S(\mathcal {C})\). It may also delete \(\mathcal {C}\). Strongly finitary simplification bounds are closed under unions, allowing the combination of simplification techniques based on \(\prec \). For superposition, a natural choice for \(\prec \) is the clause order. The key property of strongly finitary simplification bounds is that if we saturate a finite set of A-formulas w.r.t. simplifications, the saturation is also finite.

Example 23

Let \(\mathbf {F}\) be the set of first-order clauses and . Then S is a strongly finitary simplification bound. This S covers many simplification techniques, including elimination of duplicate literals, deletion of resolved literals, and subsumption resolution.

Example 24

If the Knuth–Bendix order [12] is used and all weights are positive, then is a strongly finitary simplification bound. This can be used to cover demodulation.

Equipped with the above definitions, we introduce a fairness criterion that is more concrete and easier to apply than fairness of \(\Longrightarrow _\mathsf {AV}\)-derivations. We could refine \(\mathsf {AV}\) further and use this criterion to show the completeness of an imperative procedure such as Voronkov’s extended Otter loop [22, Fig. 3], thus showing that Vampire with AVATAR is complete if locking is sufficiently restricted.

Lemma 25

Let I be a strongly finitary function, and let S be a strongly finitary simplification bound. Then a well-timestamped \(\Longrightarrow _\mathsf {AV}\)-derivation is fair if all of the following conditions hold:

  1. \(\mathrel {\lessdot }_\mathbf {TAF}\) is a selection order on , and \(\mathrel {\lessdot }_\mathbf {F}\) is a selection order on \(\mathbf {F}\);

  2. and is finite;

  3. for every Infer transition, either is \(\mathrel {\lessdot }_\mathbf {TAF}\)-minimal in  or is \(\mathrel {\lessdot }_\mathbf {F}\)-minimal in  ;

  4. for every Infer transition, ;

  5. for every Process transition, ;

  6. if , then eventually Switch or StrongUnsat occurs;

  7. if , then eventually Infer, Switch or StrongUnsat occurs;

  8. there are infinitely many indices i such that either or Infer chooses a \(\mathrel {\lessdot }_\mathbf {TAF}\)-minimal at i;

  9. for every subsequence converging to a limit point.

7 Application to Other Architectures

AVATAR may be the most natural application of our framework, but it is not the only one. Below we complete the picture by studying splitting without backtracking, labeled splitting, and SMT with quantifiers.

Splitting without Backtracking. Before AVATAR, Riazanov and Voronkov [20] had already experimented with splitting in Vampire in a lighter variant without backtracking. They based their work on ordered resolution \(\textsf {O}\) with selection [2]. Weidenbach [24, end of Sect. 4.5] independently outlined the same technique. The basic idea is to extend the signature \(\mathrm {\Sigma }\) with a countable set \(\mathbb {P}\) of nullary predicate symbols and to augment the base calculus with a binary splitting rule that replaces a \(\mathrm {\Sigma }_\mathbb {P}\)-clause with two \(\mathrm {\Sigma }_\mathbb {P}\)-clauses and . Riazanov and Voronkov require that the precedence \(\prec \) makes all \(\mathbb {P}\)-literals smaller than the \(\mathrm {\Sigma }\)-literals. Binary splitting is then a simplification. They also extend the selection function of the base calculus to support \(\mathbb {P}\)-literals. Their parallel selection function imitates the original selection function as closely as possible.

The calculus \(\mathsf {O}_\mathbb {P}{}\) is closely related to an instance of our framework. Let \(\mathbf {F}\) be the set of \(\mathrm {\Sigma }\)-clauses, with the empty clause as . Let \(\textsf {O}= ( FInf , FRed )\) be the base calculus. We take \(\mathbf {V}=\mathbb {P}\). Let \(\mathsf {LA}= ( SInf , SRed )\), whose name stands for lightweight AVATAR, be the induced splitting calculus. Lightweight AVATAR amounts to the splitting architecture Cruanes implemented in Zipperposition [7, Sect. 2.5]. Binary splitting can be realized in \(\mathsf {LA}\) as a Split-like simplification rule. The calculi \(\mathsf {O}_\mathbb {P}\) and \(\mathsf {LA}\) disagree slightly because \(\mathsf {O}_\mathbb {P}\)’s order \(\prec \) can break ties using \(\mathbb {P}\)-literals and because \(\mathsf {LA}\) can detect unsatisfiability early using the Unsat rule. Despite its slightly weaker order, \(\mathsf {LA}\) is tighter than \(\mathsf {O}_\mathbb {P}\) in the sense that saturation w.r.t. \(\mathsf {O}_\mathbb {P}\) implies saturation w.r.t. \(\mathsf {LA}\) but not vice versa.

Labeled Splitting. Labeled splitting, as originally described by Fietzke and Weidenbach [10] and implemented in SPASS, is a first-order resolution-based calculus with binary splitting that traverses the split tree in a depth-first way, using an elaborate backtracking mechanism inspired by CDCL [15]. It works on states \((\mathrm {\Psi },\mathcal {N})\), where \(\mathrm {\Psi }\) is a stack storing the current state of the split tree and \(\mathcal {N}\) is a set of labeled clauses—clauses annotated with finite sets of natural numbers.

We model labeled splitting as an instance of the locking prover \(\mathsf {L}\) based on the splitting calculus \(\mathsf {LS}=( SInf , SRed )\) induced by the resolution calculus \(\textsf {R}= ( FInf , FRed )\), where  and  are as in Example 2 and \(\mathbf {V}= \bigcup _{i\in \mathbb N} \{\mathsf {l}_i, \mathsf {r}_i, \mathsf {s}_i\}\). A-clauses correspond to labeled clauses. Splits are identified by unique split levels. Given a split on  with level k, \(\mathsf {l}_k \in asn (C)\) and \(\mathsf {r}_k \in asn (D)\) represent the left and right branches. In practice, the prover would dynamically extend \( fml \) to ensure that \( fml (\mathsf {l}_k) = C\) and \( fml (\mathsf {r}_k) = D.\)

When splitting, if we simply added we would always need to consider either \(C \mathbin {\leftarrow }\{\mathsf {l}_k\}\) or \(D \mathbin {\leftarrow }\{\mathsf {r}_k\},\) depending on the interpretation. However, labeled splitting can undo splits when backtracking. Yet fairness would require us to perform inferences with either C or D even when labeled splitting would not. We solve this as follows. Let . We introduce the variable \(\mathsf {s}_k \in asn (\top )\) so that we can enable or disable the split. The StrongUnsat rule then knows that \(\mathsf {s}_k\) is true, but we can still switch to propositional models that disable both C and D. A-clauses are then split using the following binary variant of Split:

[Figure: the SoftSplit inference rule]

where C and D share no variables and k is the next split level. Unlike AVATAR, labeled splitting keeps the premise and might split it again with another level.
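The side condition that C and D share no variables can be checked mechanically by grouping the literals of a clause into variable-disjoint components. The following is a minimal sketch in Python, not taken from any prover: the literal representation (a printed form paired with its set of variables) and the function name `variable_disjoint_components` are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: (printed literal, variables occurring in it).
Literal = Tuple[str, FrozenSet[str]]


def variable_disjoint_components(clause: List[Literal]) -> List[List[str]]:
    """Group literals so that literals sharing a variable end up together."""
    parent = list(range(len(clause)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_use: Dict[str, int] = {}
    for i, (_, variables) in enumerate(clause):
        for v in variables:
            if v in first_use:
                # Merge with the literal that first mentioned this variable.
                parent[find(i)] = find(first_use[v])
            else:
                first_use[v] = i

    groups: Dict[int, List[str]] = {}
    for i, (text, _) in enumerate(clause):
        groups.setdefault(find(i), []).append(text)
    return list(groups.values())


example = [("p(x)", frozenset({"x"})), ("q(y, b)", frozenset({"y"}))]
components = variable_disjoint_components(example)
```

A clause is a candidate for a binary split exactly when this function returns at least two components; any partition of the components into two nonempty groups yields suitable C and D. Ground literals have no variables and therefore always form singleton components.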

To emulate the original, the locking prover based on \(\mathsf {LS}\) must repeatedly apply the following three steps in any order until saturation:

  1. Apply Base to perform an inference from the enabled A-clauses. If an enabled is derived with \(A\subseteq \bigcup _i \{\mathsf {l}_i, \mathsf {r}_i\}\), apply Switch or StrongUnsat.

  2. Apply Derive to simplify or delete an enabled A-clause. Use Lock if necessary to remove the original A-clause. If an enabled is derived with \(A\subseteq \bigcup _i \{\mathsf {l}_i, \mathsf {r}_i\}\), apply Switch or StrongUnsat.

  3. Apply SoftSplit with split level k on an A-clause \(\mathcal {C}\). Then use Switch to enable the left branch and apply Lock on \(\mathcal {C}\) with \(\mathsf {s}_k\) as the lock.

Switch is powerful enough to support all of Fietzke and Weidenbach’s backtracking rules, but to explore the tree in the same order as they do, we must choose the new model carefully. If a left branch is closed, the model must be updated so as to disable the splits that were not used to close this branch and to enable the right branch. If a right branch is closed, the split must be disabled, and the model must switch to the right branch of the closest enabled split above it with an enabled left branch. If a right branch is closed but there is no split above with an enabled left branch, the entire tree has been visited. Then, a propositional clause with is -entailed by the A-clause set, and StrongUnsat can finish the refutation by exploiting \( fml (\mathsf {s}_i) = \top \).

The above strategy helps achieve fairness, because it ensures that there exists exactly one limit point. It also uses locks in a well-behaved way. This means we can considerably simplify the notion of fairness for \(\Longrightarrow _\mathsf {L}\)-derivations and obtain a criterion that is almost identical to, but slightly more liberal than, Fietzke and Weidenbach’s—thereby re-proving the completeness of labeled splitting.

For terminating derivations, their fairness criterion coincides with ours. For diverging derivations, Fietzke and Weidenbach construct a limit subsequence \((\mathrm \Phi '_i, \mathcal {N}'_i)_i\) of the derivation \((\mathrm \Phi _i, \mathcal {N}_i)_i\) and require that every persistent inference in it be made redundant, exactly as we do for \(\Longrightarrow _\mathsf {L}\)-derivations. The subsequence consists of all states that lie on the split tree’s unique infinite branch. Locks are well behaved, with , because with the strategy above, once an A-clause is enabled on the rightmost branch, it remains enabled forever. Our definition of fairness allows more subsequences, although this is difficult to exploit without bringing in all the theoretical complexity of AVATAR.

SMT with Quantifiers. Satisfiability modulo theories (SMT) solvers based on DPLL(T) [15] combine a SAT solver with theory solvers. In the classical setup, the theories are decidable, and the SMT solver is a decision procedure for the union of the theories. Some SMT solvers also support quantified formulas via instantiation at the expense of decidability.

Complete instantiation strategies have been developed for various fragments of first-order logic [11, 18, 19]. In particular, enumerative quantifier instantiation [18] is complete under some conditions. An SMT solver following such a strategy ought to be refutationally complete, but this has never been proved. Although SMT is quite different from the architectures considered above, we can instantiate our framework to show the completeness of an abstract SMT solver. The model-guided prover \(\mathsf {MG}\) will provide a suitable starting point.

Let \(\mathbf {F}\) be the set of first-order \(\mathrm {\Sigma }\)-formulas. We represent the SMT solver’s underlying SAT solver by the \(\textsc {Unsat}\) rule and complement it with an inference system \( FInf \) that includes rules for clausification outside quantifiers, theory reasoning, and instantiation. The clausification rules derive C and D from a premise , among others; the theory rules derive from some \(\mathrm {\Sigma }\)-formula set N such that , ignoring quantifiers; and the instantiation rules derive \(\varphi (u)\) from premises \(\forall x.\>\varphi (x)\), where u is a ground term. For \( FRed \), we take an arbitrary instance of standard redundancy. Its only purpose is to split disjunctions destructively. We define the “theories with quantifiers” calculus \(\mathsf {TQ}= ( FInf , FRed )\). For and , we use entailment in the supported theories including quantifiers.

We use the same approximation function as in AVATAR (Example 3). Let us call \(C \mathbin {\leftarrow }A\) a subunit if C is not a disjunction. Whenever a (ground) disjunction emerges, we immediately apply Split. This delegates clausal reasoning to the SAT solver. It then suffices to assume that \(\mathsf {TQ}\) is complete for subunits.
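To illustrate how splitting delegates clausal reasoning to the SAT solver, the sketch below splits a ground disjunction under an assertion into one propositional clause and one subunit per disjunct. It assumes the usual AVATAR-style encoding, in which each disjunct is guarded by its own propositional variable; the names `split_ground_disjunction` and `names`, the string encoding of formulas, and the `~` prefix for negated variables are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: (base formula, assertion as a set of literals).
AFormula = Tuple[str, FrozenSet[str]]


def split_ground_disjunction(
    disjuncts: List[str],
    assertion: FrozenSet[str],
    names: Dict[str, str],
) -> Tuple[AFormula, List[AFormula]]:
    """Return the propositional clause and the subunits of a split."""
    guards = []
    subunits = []
    for d in disjuncts:
        # Reuse the variable of a known disjunct, or allocate a fresh one.
        v = names.setdefault(d, f"v{len(names)}")
        guards.append(v)
        subunits.append((d, frozenset({v})))
    # "false" is enabled when the assertion holds and every guard is false,
    # which forces the SAT solver to make at least one guard true.
    propositional = ("false", assertion | frozenset("~" + v for v in guards))
    return propositional, subunits


names: Dict[str, str] = {}
prop, units = split_ground_disjunction(["a = b", "c = d"], frozenset(), names)
```

Because `names` persists across calls, a disjunct that reappears in a later split is guarded by the same variable, so the SAT solver sees the connection between the two splits.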

Theorem 26

(Dynamic completeness).  Assume \(\mathsf {TQ}\) is statically complete for subunit sets. Let be a fair \(\Longrightarrow _\mathsf {MG}\)-derivation based on \(\mathsf {TQ}\). If and contains only subunits, then for some j.

Like AVATAR-based provers, SMT solvers will typically not perform all \( SInf \)-inferences, not even up to \( SRed _\mathrm {I}\). Given \(\mathsf {a}\approx \mathsf {b} \mathbin {\leftarrow }\{\mathsf {v}_0\}\), \(\mathsf {b}\approx \mathsf {c} \mathbin {\leftarrow }\{\mathsf {v}_1\}\), \(\mathsf {a} \approx \mathsf {d} \mathbin {\leftarrow }\{\mathsf {v}_2\}\), \(\mathsf {c} \approx \mathsf {d} \mathbin {\leftarrow }\{\mathsf {v}_3\}\), and \(\mathsf {a} \not \approx \mathsf {c} \mathbin {\leftarrow }\{\mathsf {v}_4\},\) an SMT solver will find only one of the conflicts \(\bot \mathbin {\leftarrow }\{\mathsf {v}_0, \mathsf {v}_1, \mathsf {v}_4\}\) or \(\bot \mathbin {\leftarrow }\{\mathsf {v}_2, \mathsf {v}_3, \mathsf {v}_4\}\) but not both. For decidable theories, a practical fair strategy is to instantiate quantifiers only if no other rules are applicable.
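The five A-formulas in this example admit two distinct equality chains from a to c, each of which contradicts the disequation. The sketch below enumerates both conflicts by searching the equality graph over constants. It is only an illustration under simplifying assumptions (no function symbols, so no congruence reasoning is needed); the function name `conflicts` and the tuple encoding are hypothetical, and an actual SMT solver would use congruence closure with explanations and stop at the first conflict.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical encoding: (left constant, right constant, guarding variable).
Equation = Tuple[str, str, str]


def conflicts(equations: List[Equation], disequation: Equation) -> List[FrozenSet[str]]:
    """List the assertion sets of all simple equality chains that
    connect the two sides of the disequation."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for s, t, v in equations:
        graph.setdefault(s, []).append((t, v))
        graph.setdefault(t, []).append((s, v))

    src, dst, guard = disequation
    found: List[FrozenSet[str]] = []

    def walk(node: str, visited: Set[str], labels: Set[str]) -> None:
        if node == dst:
            # The chain proves src = dst, contradicting the disequation.
            found.append(frozenset(labels | {guard}))
            return
        for nxt, v in graph.get(node, []):
            if nxt not in visited:
                walk(nxt, visited | {nxt}, labels | {v})

    walk(src, {src}, set())
    return found


eqs = [("a", "b", "v0"), ("b", "c", "v1"), ("a", "d", "v2"), ("c", "d", "v3")]
result = conflicts(eqs, ("a", "c", "v4"))
```

Here `result` contains one assertion set per chain: one via b and one via d. A solver that reports only the first of them has not performed every possible inference, which is the behavior the fairness discussion has to accommodate.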

Our mathematization of AVATAR and SMT with quantifiers exposes their dissimilarities. With SMT, splitting is mandatory, and there is no subsumption or simplification, locking, or active and passive sets. And of course, theory inferences are n-ary and quantifier instantiation is unary, whereas superposition is binary. Nevertheless, their completeness follows from the same principles.

8 Conclusion

Our framework captures splitting calculi and provers in a general way, independently of the base calculus. Users can conveniently derive a dynamic refutational completeness result for a splitting prover based on a given statically refutationally complete calculus. As we developed the framework, we faced some tension between constraining the SAT solver’s behavior and the saturation prover’s. It seemed preferable to constrain the prover, because the prover is typically easier to modify than an off-the-shelf SAT solver. To our surprise, we discovered counterexamples related to locking, formula selection, and simplification, which may affect Vampire’s AVATAR implementation, depending on the SAT solver used. We proposed some restrictions, but alternatives could be investigated.

We found that labeled splitting can be seen as a variant of AVATAR where the SAT solver follows a strict strategy and propositional variables are not reused across branches. A benefit of the strict strategy is that locking preserves completeness. As for the relationship between AVATAR and SMT, there are some glaring differences, including that splitting is necessary to support disjunctions in SMT but fully optional in AVATAR. For future work, we could try to complete the picture by considering other related architectures [4, 5, 6, 14].