Range-Restricted Interpolation through Clausal Tableaux

We show how variations of range-restriction and also the Horn property can be passed from inputs to outputs of Craig interpolation in first-order logic. The proof system is clausal tableaux, which stems from first-order ATP. Our results are induced by a restriction of the clausal tableau structure, which can be achieved in general by a proof transformation, even if the source proof is by resolution/paramodulation. The primarily addressed applications are query synthesis and reformulation with interpolation. Our methodical approach combines operations on proof structures with the immediate perspective of feasible implementation through incorporating highly optimized first-order provers.


Introduction
We show how variations of range-restriction and also the Horn property can be passed from inputs to outputs of Craig interpolation in first-order logic. The primarily envisaged application field is synthesis and reformulation of queries with interpolation [39,56,5]. Basically, the sought target query R is understood there as the right side of a definition of a given query Q within a given background knowledge base K, i.e., it holds that K |= (Q ↔ R), where the vocabulary of R is restricted to a given set of permitted target symbols. In first-order logic, the formulas R can be characterized as the Craig interpolants of K ∧ Q and ¬K′ ∨ Q′, where K′, Q′ are copies of K, Q with the symbols not allowed in R replaced by fresh symbols [14]. Formulas R exist if and only if the entailment K ∧ Q |= ¬K′ ∨ Q′ holds. They can be constructed as Craig interpolants from given proofs of the entailment in a suitable calculus.
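To make this reduction concrete, the following Python sketch checks it on a propositional toy instance by brute-force truth tables. The instance (K as q ↔ r, Q as q, target symbol r, fresh copy q1 of the non-target symbol q) and all names are hypothetical illustrations, not taken from the paper; in first-order logic entailment cannot, of course, be decided this way.

```python
from itertools import product

# Hypothetical propositional toy instance: the non-target symbol q has the
# fresh copy q1, and r is the only permitted target symbol.
ATOMS = ["q", "q1", "r"]

K = lambda v: v["q"] == v["r"]     # background knowledge K:  q <-> r
Q = lambda v: v["q"]               # query Q
K1 = lambda v: v["q1"] == v["r"]   # K': K with q replaced by q1
Q1 = lambda v: v["q1"]             # Q': Q with q replaced by q1
R = lambda v: v["r"]               # candidate definition R over {r}


def entails(f, g) -> bool:
    """Return True iff f |= g, checked over all valuations of ATOMS."""
    for bits in product([False, True], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, bits))
        if f(v) and not g(v):
            return False
    return True


lhs = lambda v: K(v) and Q(v)          # K ∧ Q
rhs = lambda v: (not K1(v)) or Q1(v)   # ¬K' ∨ Q'

print(entails(lhs, rhs))                     # True: some definition R exists
print(entails(lhs, R) and entails(R, rhs))   # True: R is an interpolant
print(entails(K, lambda v: Q(v) == R(v)))    # True: K |= (Q <-> R)
```

Any formula over r that lies between K ∧ Q and ¬K′ ∨ Q′ in the entailment order serves as a definition; here it is r itself.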
In databases and knowledge representation, syntactic fragments of first-order logic ensure desirable properties, for example domain independence. Typically, for given K and Q in some such fragment, also R must be in some specific fragment to be usable as a query or as a knowledge base component. Our work addresses this by showing for certain such fragments how membership is passed on to interpolants and thus to the constructed right sides of definitions. The fragment in focus here is a variant of range-restriction from [59], known as a rather general syntactic condition to ensure domain independence [1, p. 97]. It permits conversion into a shape suitable for "evaluation" by binding free and quantified variables successively to the members of given predicate extensions. Correspondingly, if the vocabulary is relational, a range-restricted formula can be translated into a relational algebra expression. First-order representations of widely-used classes of integrity constraints, such as tuple-generating dependencies, are sentences that are range-restricted in the considered sense.
As proof system we use clausal tableaux [29,31,30,26,33], devised in the 1990s to take account of automated first-order provers that may be viewed as enumerating tree-shaped proof structures, labeled with instances of input clauses. Such systems include the Prolog Technology Theorem Prover [53], SETHEO [32], leanCoP [43,42] and CMProver [16,60,61,45]. As shown in [62], a given closed clausal tableau is quite well-suited as a proof structure to extract a Craig interpolant. Via the translation of a resolution deduction tree [12] to a clausal tableau in cut normal form [31,62] this transfers also to interpolation from a given resolution/paramodulation proof.
Since the considered notion of range-restriction is based on prenexing and on properties of both a CNF and a DNF representation of the formula, it fits well with the common first-order ATP setting involving Skolemization and clausification, and with ATP-oriented interpolation on the basis of clausal tableaux, where the propositional structure of the interpolant is constructed in a first stage and its quantifier prefix in a second stage.
Our strengthenings of Craig interpolation are induced by a specific restriction of the clausal tableau structure, which we call hyper, since it relates to the proof structure restrictions of hyperresolution [46] and hypertableaux [2]. However, it is considered here for tree structures with rigid variables. A proof transformation that converts an arbitrary closed clausal tableau to one with the hyper property shows that the restriction is w.l.o.g. and, moreover, allows the prover unhampered search for the closed clausal tableaux or resolution/paramodulation proof underlying interpolation.
Structure of the Paper. Section 2 summarizes preliminaries, in particular interpolation with clausal tableaux [62]. Our main result on strengthenings of Craig interpolation for range-restricted formulas is developed in Sect. 3. Section 4 discusses Craig interpolation from a Horn formula, also combined with range-restriction. The proof transformation underlying these results is introduced in Sect. 5. We conclude in Sect. 6 with discussing related work, open issues and perspectives.
Proofs of nontrivial claims that are not proven in the body of the paper are supplemented in the appendix. An implementation with the PIE environment [60,61] is in progress.

Notation
We consider formulas of first-order logic. An NNF formula is a quantifier-free formula built up from literals (atoms or negated atoms), truth-value constants ⊤, ⊥, conjunction and disjunction. A CNF formula, also called clausal formula, is an NNF formula that is a conjunction of disjunctions (clauses) of literals. A DNF formula is an NNF formula that is a disjunction of conjunctions (conjunctive clauses) of literals. The complement of a literal L is denoted by L̄. An occurrence of a subformula in a formula has positive (negative) polarity, depending on whether it is in the scope of an even (odd) number of possibly implicit occurrences of negation. Let F be a formula. Var(F) is the set of its free variables. Var+(F) (Var−(F)) is the set of its free variables with an occurrence in an atom with positive (negative) polarity. Fun(F) is the set of functions occurring in it, including constants, regarded here throughout as 0-ary functions. Pred±(F) is the set of pairs ⟨p, pol⟩, where p is a predicate and pol ∈ {+, −}, such that an atom with predicate p occurs in F with the polarity indicated by pol. Voc±(F) is Fun(F) ∪ Pred±(F). A sentence is a formula without free variables. An NNF is ground if it has no variables. If S is a set of terms, we call its members S-terms. The |= symbol expresses semantic entailment.
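A minimal Python sketch of Var, Var+, Var− and Pred± for NNF formulas follows. It assumes an ad hoc tuple encoding, chosen only for this illustration, in which variables are strings and function applications are tuples. In NNF the only negations are those on literals, so the polarity of an atom occurrence is simply whether its literal is unnegated, which the sketch exploits.

```python
from __future__ import annotations

# Hypothetical encoding: a variable is a str; a function application is a tuple
# (f, arg1, ..., argk), constants being 0-ary, e.g. ("c",).
# NNF: ("lit", positive, pred, args), ("and", [...]), ("or", [...]), ("top",), ("bot",).


def term_vars(t) -> set[str]:
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(a) for a in t[1:]))


def literals(f):
    """Yield the literal nodes of an NNF formula."""
    if f[0] == "lit":
        yield f
    elif f[0] in ("and", "or"):
        for g in f[1]:
            yield from literals(g)


def var_pol(f, pol=None) -> set[str]:
    """Var(F) if pol is None, Var+(F) if pol is True, Var-(F) if pol is False."""
    out: set[str] = set()
    for _, positive, _, args in literals(f):
        if pol is None or positive == pol:
            for a in args:
                out |= term_vars(a)
    return out


def pred_pm(f) -> set[tuple[str, str]]:
    """Pred±(F) as pairs (p, '+') and (p, '-')."""
    return {(p, "+" if positive else "-") for _, positive, p, _ in literals(f)}


# The clause ¬p(X, f(Y)) ∨ q(Y, c)
C = ("or", [("lit", False, "p", ("X", ("f", "Y"))),
            ("lit", True, "q", ("Y", ("c",)))])
print(sorted(var_pol(C, True)), sorted(var_pol(C, False)))  # ['Y'] ['X', 'Y']
```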

Clausal First-Order Tableaux
A clausal tableau (briefly tableau) for a clausal formula F is a finite ordered tree whose nodes N with the exception of the root are labeled with a literal lit(N), such that for each node N the disjunction of the literals of all its children in their left-to-right order, clause(N), is an instance of a clause in F. A branch of a tableau is closed iff it contains nodes with complementary literals. A node is closed iff all branches through it are closed. A tableau is closed iff its root is closed. A node is closing iff it has an ancestor with complementary literal. With a closing node N, a particular such ancestor is associated as target of N, written tgt(N). A tableau is regular iff no node has an ancestor with the same literal and is leaf-closing iff all closing nodes are leaves. A closed tableau that is leaf-closing is called leaf-closed. Tableau simplification can convert any tableau to a regular and leaf-closing tableau for the same clausal formula, closed if the original tableau is so. Regularity is achieved by repeating the following operation [31, Sect. 2.1.3]: Select a node N with an ancestor that has the same literal, remove the edges originating in the parent of N and replace them with the edges originating in N. The leaf-closing property is achieved by repeatedly selecting an inner node N that is closing and removing the edges originating in N. All occurrences of variables in (the literal labels of) a tableau are free and their scope spans the whole tableau. That is, we consider free-variable tableaux [30, p. 158ff] with rigid variables [26, p. 114]. A tableau without variables is called ground. The universal closure of a clausal formula F is unsatisfiable iff there exists a closed clausal tableau for F. This holds also if clausal tableaux are restricted by the properties ground, regular and leaf-closing in arbitrary combinations.
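For ground tableaux, the notions closed, regular and leaf-closing can be checked mechanically. The Python sketch below assumes, purely for illustration, that literals are strings with a '~' prefix for negation and that clauses are given as tuples of literals in left-to-right order, so that "instance of a clause" degenerates to equality; the function names are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    lit: str | None                                   # None for the unlabeled root
    children: list[Node] = field(default_factory=list)


def comp(lit: str) -> str:
    """Complement of a ground literal written as 'p' or '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def walk(n: Node, anc: tuple[str, ...] = ()):
    """Yield (node, literals of its proper ancestors) in pre-order."""
    yield n, anc
    below = anc if n.lit is None else anc + (n.lit,)
    for c in n.children:
        yield from walk(c, below)


def is_closing(n: Node, anc: tuple[str, ...]) -> bool:
    return n.lit is not None and comp(n.lit) in anc


def is_tableau_for(root: Node, clauses: set[tuple[str, ...]]) -> bool:
    """Every clause(N) must be one of the given ground clauses."""
    return all(tuple(c.lit for c in n.children) in clauses
               for n, _ in walk(root) if n.children)


def is_closed(n: Node, anc: tuple[str, ...] = ()) -> bool:
    """Every branch through n contains a pair of complementary literals."""
    if is_closing(n, anc):
        return True
    if not n.children:
        return False
    below = anc if n.lit is None else anc + (n.lit,)
    return all(is_closed(c, below) for c in n.children)


def is_regular(root: Node) -> bool:
    return all(n.lit not in anc for n, anc in walk(root) if n.lit is not None)


def is_leaf_closing(root: Node) -> bool:
    return all(not n.children for n, anc in walk(root) if is_closing(n, anc))


# Clauses a, ¬a ∨ ¬b and b, refuted by a regular, leaf-closed tableau.
clauses = {("a",), ("~a", "~b"), ("b",)}
t = Node(None, [Node("a", [Node("~a"), Node("~b", [Node("b")])])])
print(is_tableau_for(t, clauses), is_closed(t), is_regular(t), is_leaf_closing(t))
```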

Interpolation with Clausal Tableaux
Craig's interpolation theorem [13,15] along with Lyndon's observation on the preservation of predicate polarities [35] ensures for first-order logic the existence of Craig-Lyndon interpolants, defined as follows. Let F, G be formulas such that F |= G. A Craig-Lyndon interpolant of F and G is a formula H such that F |= H, H |= G, Voc±(H) ⊆ Voc±(F) ∩ Voc±(G) and Var(H) ⊆ Var(F) ∩ Var(G). A Craig-Lyndon interpolant of F and ¬G is also called a reverse interpolant of F and G.

Following [62], our interpolant construction is based on a generalization of clausal tableaux where nodes have an additional side label that is shared by siblings and indicates whether the tableau clause is an instance of an input clause derived from the formula F or of the formula G of the statement F ∧ G |= ⊥ underlying the reverse interpolant. Thus, a two-sided clausal tableau for clausal formulas F and G is a tableau for F ∧ G whose nodes N with the exception of the root are additionally labeled with a side side(N) ∈ {F, G}, such that (1) if N and N′ are siblings, then side(N) = side(N′), and (2) if N has a child N′ with side(N′) = F, then clause(N) is an instance of a clause in F, and if N has a child N′ with side(N′) = G, then clause(N) is an instance of a clause in G. We also refer to the side of the children of a node N as the side of clause(N). For side ∈ {F, G} define path_side(N) := ⋀ {lit(N′) | N′ ∈ Path and side(N′) = side}, where Path is the union of the set of the ancestors of N and {N}.
Let N be a node of a leaf-closed two-sided clausal tableau. The value of ipol(N) is an NNF formula, defined inductively as follows. In the base case, where N is a leaf, ipol(N) is ⊥ if side(N) = side(tgt(N)) = F; ⊤ if side(N) = side(tgt(N)) = G; lit(N) if side(N) = F and side(tgt(N)) = G; and the complement of lit(N) if side(N) = G and side(tgt(N)) = F. In the case where N is an inner node with children N1, ..., Nn, ipol(N) is ipol(N1) ∨ ... ∨ ipol(Nn) if side(N1) = F, and ipol(N1) ∧ ... ∧ ipol(Nn) if side(N1) = G. If N0 is the root of a leaf-closed two-sided tableau for clausal ground formulas F and G, then ipol(N0) is a Craig-Lyndon interpolant of F and ¬G. The CTIF (Clausal Tableau Interpolation for First-Order Formulas) procedure (Fig. 2) [62] extends this to a two-stage [9,24] (inductive construction and lifting) interpolation method for full first-order logic. It is complete (it yields a Craig-Lyndon interpolant for all first-order formulas F and G such that F |= G) under the assumption that the method for tableau computation in step 3 is complete (it yields a closed tableau for all unsatisfiable clausal formulas). Some steps leave room for interpolation-specific heuristics: in step 4 the choice of the terms used for grounding; in step 5 the choice of the side assigned to clauses that are an instance of both a clause in F′ and a clause in G′; and in step 7 the quantifier prefix, which is constrained just by a partial order.
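The following Python sketch computes ipol for a ground, leaf-closed two-sided tableau. It assumes the case distinction of [62]: a leaf yields ⊥ (side F) or ⊤ (side G) if its target has the same side, and otherwise lit(N) (side F) or its complement (side G); an inner node yields the disjunction of its children's values below an F-side clause and the conjunction below a G-side clause. Taking the nearest complementary ancestor as tgt(N), the '~' literal encoding and the ⊤/⊥ simplification are choices made only for this illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    lit: str | None                   # None for the root
    side: str | None                  # "F" or "G"; None for the root
    children: list[Node] = field(default_factory=list)


def comp(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit


def mk(op: str, args: list) -> object:
    """Build a disjunction ("or") or conjunction ("and"), simplifying ⊤/⊥."""
    unit, zero = ("⊥", "⊤") if op == "or" else ("⊤", "⊥")
    args = [a for a in args if a != unit]
    if zero in args:
        return zero
    if not args:
        return unit
    return args[0] if len(args) == 1 else (op, args)


def ipol(n: Node, anc: tuple[Node, ...] = ()) -> object:
    if not n.children:
        # Leaf of a leaf-closed tableau: take the nearest complementary
        # ancestor as tgt(N) (one admissible choice of target).
        tgt = next(a for a in reversed(anc) if a.lit == comp(n.lit))
        if n.side == tgt.side:
            return "⊥" if n.side == "F" else "⊤"
        return n.lit if n.side == "F" else comp(n.lit)
    below = anc if n.lit is None else anc + (n,)
    parts = [ipol(c, below) for c in n.children]
    return mk("or" if n.children[0].side == "F" else "and", parts)


# F = {a, ¬a ∨ b}, G = {¬b}: the interpolant of F and ¬G comes out as b.
t = Node(None, None, [Node("a", "F", [Node("~a", "F"),
                                       Node("b", "F", [Node("~b", "G")])])])
print(ipol(t))  # b
```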

Interpolation and Range-Restriction
We now develop our main result on strengthenings of Craig interpolation for range-restricted formulas.

CNF and DNF with Some Assumed Syntactic Properties
Following [59] we will consider a notion of range-restriction defined in terms of properties of two prenex formulas that are equivalent to the original formula and have the same quantifier prefix, but matrices in CNF and DNF, respectively. Although these formulas are not syntactically unique, we refer to them functionally as cnf(F) and dnf(F), since we only rely on specific syntactic properties, easy to achieve, that are stated in the following Props. 4-6.
Proposition 4. For all formulas F it holds that Var(cnf(F)) ⊆ Var(F);

Fig. 2. The CTIF procedure [62].

Input: First-order formulas F and G such that F |= G.

Method:
1. Free variables to placeholder constants. Let Fc and Gc be the sentences obtained from F and G by replacing each free variable with a dedicated fresh constant.
2. Skolemization and clausification. Apply conversion to prenex form and second-order Skolemization independently to Fc and to ¬Gc, resulting in disjoint sets of fresh Skolem functions F′, G′, clausal formulas F′, G′, and sets In case F′ or G′ contains the empty clause, exit with result H := ⊥ or H := ⊤, respectively.
3. Tableau computation. Compute a leaf-closed clausal tableau for the clausal formula F′ ∧ G′. This can be obtained, for example, from a clausal tableaux prover for clausal first-order formulas.
4. Tableau grounding. Instantiate all variables of the tableau with ground terms built up from functions in F′ ∧ G′ and possibly also fresh functions S = S1 ⊎ S2. Observe that the grounded tableau is still a leaf-closed tableau for F′ ∧ G′.
5. Side assignment. Convert the ground tableau to a two-sided tableau for F′ and G′ by attaching appropriate side labels to all nodes except the root. This is always possible because every clause of the tableau is an instance of a clause in F′ or in G′.
6. Ground interpolant extraction. Let Hgrd be the value of ipol(N0), where N0 is the root of the tableau.
7. Interpolant lifting. Let F := F′ ∪ (Fun(F) \ Fun(G)) ∪ S1 and let G := G′ ∪ (Fun(G) \ Fun(F)) ∪ S2. Let FG stand for F ∪ G. An FG-maximal occurrence of an FG-term in a formula is an occurrence that is not within another FG-term. Let {t1, ..., tn} be the set of the FG-terms with an FG-maximal occurrence in Hgrd, ordered such that if ti is a subterm of tj, then i < j. Let {v1, ..., vn} be a set of fresh variables. For i ∈ {1, ..., n} define the quantifiers Qi as ∃ if ti ∈ F-terms and as ∀ if ti ∈ G-terms. Let Hc := Q1v1 ... Qnvn H′grd, where H′grd is obtained from Hgrd by replacing all FG-maximal occurrences of terms ti with variable vi, simultaneously for all i ∈ {1, ..., n}.
8. Placeholder constants to free variables. Let H be Hc after replacing any constants that were introduced in step 1 with their corresponding variables.
Output: Return H, a Craig-Lyndon interpolant of the input formulas F and G.

For prenex formulas F with an NNF matrix let dual(F) be the formula obtained from F by switching quantifiers ∀ and ∃, connectives ∧ and ∨, truth-value constants ⊤ and ⊥, and literals with their complement.
Proposition 5. For all formulas F it holds that cnf(F
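Step 7 of CTIF (interpolant lifting, Fig. 2) can be sketched in Python for a ground NNF formula. In this sketch the sets of F-functions and G-functions are simply passed in (in CTIF they are assembled from Skolem functions, non-shared functions and grounding functions), and sorting the FG-maximal terms by size is one admissible linear order, since a proper subterm is always smaller than the term containing it. The term and formula encodings are assumptions of the sketch.

```python
from __future__ import annotations

# Hypothetical encoding. Ground NNF: ("lit", positive, pred, args),
# ("and", [...]), ("or", [...]), ("top",), ("bot",).
# Ground terms: (f, arg1, ..., argk); constants are 1-tuples such as ("c",).


def size(t) -> int:
    return 1 + sum(size(a) for a in t[1:])


def literals(h):
    if h[0] == "lit":
        yield h
    elif h[0] in ("and", "or"):
        for g in h[1]:
            yield from literals(g)


def collect(t, fg: set[str], out: set) -> None:
    """Add the FG-maximal term occurrences within t to out."""
    if t[0] in fg:
        out.add(t)                       # do not descend into an FG-term
    else:
        for a in t[1:]:
            collect(a, fg, out)


def replace(t, fg: set[str], var_of: dict):
    if t[0] in fg:
        return var_of[t]
    return (t[0],) + tuple(replace(a, fg, var_of) for a in t[1:])


def subst(h, fg: set[str], var_of: dict):
    if h[0] == "lit":
        return ("lit", h[1], h[2], tuple(replace(a, fg, var_of) for a in h[3]))
    if h[0] in ("and", "or"):
        return (h[0], [subst(g, fg, var_of) for g in h[1]])
    return h                             # ("top",) or ("bot",)


def lift(h_grd, f_funs: set[str], g_funs: set[str]):
    """Return (quantifier prefix, matrix) for the lifted interpolant."""
    fg = f_funs | g_funs
    found: set = set()
    for lit in literals(h_grd):
        for a in lit[3]:
            collect(a, fg, found)
    order = sorted(found, key=lambda t: (size(t), repr(t)))
    var_of = {t: f"v{i}" for i, t in enumerate(order, start=1)}
    prefix = [("∃" if t[0] in f_funs else "∀", var_of[t]) for t in order]
    return prefix, subst(h_grd, fg, var_of)


# p(f(c)) ∧ q(g(f(c))) with f an F-function and g a G-function
h = ("and", [("lit", True, "p", (("f", ("c",)),)),
             ("lit", True, "q", (("g", ("f", ("c",))),))])
print(lift(h, {"f"}, {"g"})[0])  # [('∃', 'v1'), ('∀', 'v2')]
```

The occurrence of f(c) inside g(f(c)) is not FG-maximal, so it disappears into v2, while its maximal occurrence in p(f(c)) becomes v1.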

Used Notions of Range-Restriction
The following definition renders the characteristics of the range-restricted formulas as considered by Van Gelder and Topor in [59, Theorem 7.2] (except for the special consideration of equality in [59]).
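Definition 7 (VGT-Range-Restriction). A formula F is called VGT-range-restricted if and only if cnf(F) = Q M_C and dnf(F) = Q M_D, where Q is a quantifier prefix (the same in both formulas) upon universally quantified variables U and existentially quantified variables E (in arbitrary order), and M_C, M_D are quantifier-free formulas in CNF and DNF, respectively, such that
1. For all clauses C in M_C it holds that Var(C) ∩ U ⊆ Var−(C).
2. For all conjunctive clauses D in M_D it holds that Var(D) ∩ E ⊆ Var+(D).
3. For all conjunctive clauses D in M_D it holds that Var(F) ⊆ Var+(D).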
For VGT-range-restricted formulas it is shown in [59] that these can be translated via two intermediate formula classes to a relational algebra expression.Related earlier results include [40,41,17,18].The constraint on universal variables is also useful on its own as a weaker variation of range-restriction, defined as follows.
Definition 8 (U-Range-Restriction). A formula F is called U-range-restricted if and only if cnf(F) = Q M_C, where Q is a quantifier prefix upon the universally quantified variables U (there may also be existentially quantified variables in Q) and M_C is a quantifier-free formula in CNF such that for all clauses C in M_C it holds that Var(C) ∩ U ⊆ Var−(C).

For formulas without free variables, U-range-restriction and VGT-range-restriction are related as follows.
Proposition 9. Let F be a sentence. Then (i) F is VGT-range-restricted iff F and ¬F are both U-range-restricted. (ii) If F is universal (i.e., in prenex form with only universal quantifiers), then F is VGT-range-restricted iff F is U-range-restricted. (iii) If F is existential (i.e., in prenex form with only existential quantifiers), then F is VGT-range-restricted iff ¬F is U-range-restricted.
U-range-restriction covers well-known restrictions of knowledge bases and inputs of bottom-up calculi for first-order logic and fragments of it that are naturally represented by clausal formulas [3]. First-order representations of tuple-generating dependencies (TGDs) are VGT-range-restricted sentences: conjunctions of sentences of the form ∀XY (A(XY) → ∃Z B(YZ)), where A is a possibly empty conjunction of relational atoms, B is a nonempty conjunction of relational atoms and the free variables of A and B are exactly those in the sequences XY and YZ, respectively. Also certain generalizations, e.g., to disjunctive TGDs, where B is built up from atoms, ∧ and ∨, are VGT-range-restricted.
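The two notions can be checked on a prenex formula given as a quantifier prefix together with a CNF and a DNF matrix. The Python sketch below reads condition 3 of Def. 7 as requiring every free variable of the formula to occur in a positive literal of each conjunctive clause, as used in the proof of Theorem 10.iii; function terms are omitted for brevity and the encoding is hypothetical. The usage checks a single TGD ∀X∀Y (a(X,Y) → ∃Z b(Y,Z)).

```python
from __future__ import annotations

# Hypothetical encoding: a literal is (positive, pred, args); arguments are
# variables (str) or constants (1-tuples such as ("c",)).


def vars_of(lits, pol=None) -> set[str]:
    """Var, Var+ (pol=True) or Var- (pol=False) of a (conjunctive) clause."""
    return {a for positive, _, args in lits if pol is None or positive == pol
            for a in args if isinstance(a, str)}


def u_range_restricted(prefix, cnf) -> bool:
    """For all clauses C: Var(C) ∩ U ⊆ Var-(C)."""
    u = {v for q, v in prefix if q == "∀"}
    return all(vars_of(c) & u <= vars_of(c, False) for c in cnf)


def vgt_range_restricted(prefix, cnf, dnf, free=()) -> bool:
    """Conditions 1-3 on the CNF and DNF matrices sharing the same prefix."""
    e = {v for q, v in prefix if q == "∃"}
    return (u_range_restricted(prefix, cnf)
            and all(vars_of(d) & e <= vars_of(d, True) for d in dnf)
            and all(set(free) <= vars_of(d, True) for d in dnf))


prefix = [("∀", "X"), ("∀", "Y"), ("∃", "Z")]
body = (False, "a", ("X", "Y"))
head = (True, "b", ("Y", "Z"))
# ¬a(X,Y) ∨ b(Y,Z) is one clause of the CNF and two conjunctive clauses of the DNF.
print(u_range_restricted(prefix, [[body, head]]),
      vgt_range_restricted(prefix, [[body, head]], [[body], [head]]))  # True True
```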

Results on Range-Restricted Interpolation
The following theorem shows three variations for obtaining range-restricted interpolants from range-restricted inputs. Observe that Theorem 10.i requires range-restriction only for F, the first of the two interpolation arguments. Theorem 10.iii aims at applications for query reformulation that in a basic form are expressed as an interpolation task for the input formulas F = K ∧ Q(X) and G = ¬K′ ∨ Q′(X). Here K expresses background knowledge and constraints as a U-range-restricted sentence and Q(X) represents a query to be reformulated, with free variables X. Formulas K′ and Q′ are copies of K and Q, respectively, where predicates not allowed in the interpolant are replaced by primed versions. If the query Q is Boolean, i.e., X is empty, and Q is VGT-range-restricted, then Theorem 10.ii already suffices to justify the construction of a VGT-range-restricted interpolant. If X is not empty, the fine-print preconditions of Theorem 10.iii come into play. Precondition (1) requires that cnf(K) does not have a clause with only negative literals, which is satisfied if K represents TGDs. Also cnf(Q) is not allowed to have a clause with only negative literals. By precondition (2) all the free variables X must occur in all those clauses of cnf(¬Q) that only have negative literals, which follows if Q meets condition (3) of the VGT-range-restriction (Def. 7). By precondition (3) for all clauses C in cnf(¬Q) it must hold that Var(C) ∩ X ⊆ Var−(C). A sufficient condition for Q to meet all these preconditions is that dnf(Q) has a purely existential quantifier prefix and a matrix with only positive literals where each query variable, i.e., member of X, occurs in each conjunctive clause.
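The sufficient condition on dnf(Q) stated above is purely syntactic. A Python sketch of a check follows, assuming a hypothetical encoding of dnf(Q) as a quantifier prefix plus a list of conjunctive clauses, each a list of literals (positive, pred, args) with variables as strings. The example query is ∃Y (r(X,Y) ∧ s(Y)) ∨ t(X) with query variable X.

```python
def meets_sufficient_condition(prefix, dnf, query_vars) -> bool:
    """Check the syntactic sufficient condition on dnf(Q) for Theorem 10.iii."""
    if any(q != "∃" for q, _ in prefix):
        return False                     # prefix must be purely existential
    for d in dnf:
        if not all(positive for positive, _, _ in d):
            return False                 # only positive literals are allowed
        occurring = {a for _, _, args in d for a in args if isinstance(a, str)}
        if not set(query_vars) <= occurring:
            return False                 # each query variable in each conj. clause
    return True


q_prefix = [("∃", "Y")]
q_dnf = [[(True, "r", ("X", "Y")), (True, "s", ("Y",))], [(True, "t", ("X",))]]
print(meets_sufficient_condition(q_prefix, q_dnf, ("X",)))  # True
```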

Proving Range-Restricted Interpolation -The Hyper Property
We will prove Theorem 10 by showing how the claimed interpolants can be obtained with CTIF.As a preparatory step we match items from the specification of CTIF (Fig. 2) with the constraints of range-restriction.The following notion gathers intermediate formulas and sets of symbols of CTIF.
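Definition 11 (Interpolation Context). An interpolation context is a tuple ⟨F, G, F′, G′, F, G, E, U, C, V⟩, where F, G are formulas, F′, G′ are clausal formulas, C is a set of constants, F, G are sets of functions, and E, U, V are sets of terms such that the following holds.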
(i) F |= G. (ii) Let Fc and Gc be F and G after replacing each free variable with a dedicated fresh constant. Let C be those constants that were used there to replace a variable that occurs in both F and G. F′ and G′ are the matrices of cnf(Fc) and of cnf(¬Gc), after replacing existentially quantified variables with Skolem terms. (iii) F is the union of the set of the Skolem functions introduced for existential quantifiers of cnf(Fc), the set of functions occurring in Fc but not in Gc and, possibly, further functions freshly introduced in the grounding step of CTIF. Analogously, G is the union of the set of the Skolem functions introduced for cnf(¬Gc), the set of functions occurring in Gc but not in Fc, and, possibly, further functions introduced in grounding. (iv) E and U are the sets of all terms with outermost function symbol in F and G, respectively.
The following statements about an interpolation context are easy to infer.
, then for all clauses C in F′ it holds that if a variable occurs in C in a position that is not within an E-term, it occurs in C in a negative literal, in a position that is not within an E-term. (iv) If ¬G is U-range-restricted, then for all clauses C in G′ it holds that if a variable occurs in C in a position that is not within a U-term, it occurs in C in a negative literal, in a position that is not within a U-term. (v) If G satisfies condition (3) of Theorem 10.iii, then for all clauses C in G′ it holds that any member of C that occurs in C in a position that is not within a U-term occurs in C in a negative literal, in a position that is not within a U-term.
CTIF involves conversion of terms to variables at lifting (step 7) and at replacing placeholder constants (step 8).We introduce a notation to identify those terms that will be converted there to variables.It mimics the notation for the set of free variables of a formula but applies to a set of terms, those with occurrences that are "maximal" with respect to a given set S of terms, i.e., are not within another term from S. For NNF formulas F define S-Max (F ) as the set of S-terms that occur in F in a position other than as subterm of another S-term.Define S-Max + (F ) (S-Max − (F ), respectively) as the set of S-terms that occur in F in a positive (negative, respectively) literal in a position other than as subterm of another S-term.We can now conclude from Lemma 12 the following properties of instances of clauses used for interpolant construction.
The following proposition adapts Props. 6.v and 6.vi to S-Max.

The key to obtaining range-restricted interpolants from CTIF is that the tableau must have a specific form, which we call hyper, as it resembles proofs by hyperresolution [46] and hypertableaux [2].

Definition 15. A clausal tableau is called hyper if the nodes labeled with a negative literal are exactly the leaf nodes.
While hyperresolution and related approaches, e.g., [46,36,11,2,3], consider DAG-shaped proofs with non-rigid variables, we consider, aiming at interpolant extraction, the hyper property for tree-shaped proofs with rigid variables. The hyper requirement is w.l.o.g. because arbitrary closed clausal tableaux can be converted to tableaux with the hyper property, as we will see in Sect. 5.
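The invariants below are phrased in terms of the S-Max sets defined above, that is, S-terms occurring other than as a subterm of another S-term. They can be computed by a descent that stops at the first S-term on each path. In the Python sketch the set S is given by a membership predicate; the encoding and names are illustrative assumptions. In the example, c counts for S-Max+ through q(g(c)), but not for S-Max−, since in the negative literal it occurs only inside the S-term f(c).

```python
from __future__ import annotations

# Hypothetical encoding: variables are str, applications are tuples (f, arg1, ...);
# a literal is (positive, pred, args); S is given by a predicate on terms.


def s_max_term(t, in_s, out: set) -> None:
    if in_s(t):
        out.add(t)                       # maximal: do not descend into an S-term
    elif not isinstance(t, str):
        for a in t[1:]:
            s_max_term(a, in_s, out)


def s_max(lits, in_s, pol=None) -> set:
    """S-Max (pol=None), S-Max+ (pol=True) or S-Max- (pol=False) of a clause."""
    out: set = set()
    for positive, _, args in lits:
        if pol is None or positive == pol:
            for a in args:
                s_max_term(a, in_s, out)
    return out


# S: terms with outermost symbol f or c; clause ¬p(f(c)) ∨ q(g(c))
in_s = lambda t: not isinstance(t, str) and t[0] in {"f", "c"}
clause = [(False, "p", (("f", ("c",)),)), (True, "q", (("g", ("c",)),))]
print(s_max(clause, in_s, True), s_max(clause, in_s, False))  # {('c',)} {('f', ('c',))}
```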
The proof of Theorem 10 is based on three properties that invariantly hold for all nodes, or for all inner nodes, respectively, stated in the following lemma.
Lemma 16.Let ⟨F, G, F ′ , G ′ , F, G, E, U, C, V⟩ be an interpolation context and assume a leaf-closed and hyper two-sided clausal ground tableau for F ′ and G ′ .
(i) If F is U-range-restricted, then for all nodes N the property INV_C(N) defined as follows holds: (ii) If ¬G is U-range-restricted, then for all nodes N the property INV_D(N) defined as follows holds: (iii) If ¬G is U-range-restricted and conditions (1)-(3) of Theorem 10.iii hold, then for all inner nodes N the property INV_X(N) defined as follows holds:

Each of Lemmas 16.i, 16.ii and 16.iii can be proven independently by an induction on the tableau structure, but for the same tableau, such that the properties claimed by them can be combined. In proving these three sub-lemmas it is sufficient to use their respective preconditions only to justify the application of matching sub-lemmas of Lemma 13. That lemma might thus be seen as an abstract interface that delivers everything that depends on these preconditions and is relevant for Theorem 10.
We show here the proof of Lemma 16.i.Lemma 16.ii can be proven in full analogy.The proof of Lemma 16.iii is deferred to App. A. In general, recall that the tableau in Lemma 16 is a two-sided tableau for F ′ and G ′ that is leaf-closed and hyper.Hence literal labels of leaves are negative, while those of inner nodes are positive.All tableau clauses are ground and with an associated side in {F, G} such that a tableau clause with side F is an instance of a clause in F ′ and one with side G is an instance of a clause in G ′ .Proof (Lemma 16.i).By induction on the tableau structure.
Base case where N is a leaf. If N and tgt(N) have the same side, then ipol(N) is a truth value constant, hence V-Max(ipol(N)) = ∅, implying INV_C(N). If N has side F and tgt(N) has side G, then ipol(N) = lit(N), which, because N is a leaf, is a negative literal. Thus

Induction Step. Let N1, ..., Nn, where 1 ≤ n, be the children of N. Assume as induction hypothesis that for i ∈ {1, ..., n} it holds that INV_C(Ni). Consider the case where the side of the children is F.
Assume that INV_C(N) does not hold. Then there exists a clause K in cnf(ipol(N)) and a term t such that (2 To derive a contradiction, we first show that given (2), (4) and (5) it holds that (6) For all children N′ of N: t ∉ V-Max+(path_F(N′)).
Statement (6) can be proven as follows. Assume to the contrary that there is a child N′ of N such that t ∈ V-Max+(path_F(N′)). By (5) it follows that t ∈ V-Max(lit(N′)) and lit(N′) is positive. By Lemma 13.i and (2) there is another child N″ of N such that lit(N″) is negative and t ∈ V-Max(lit(N″)). Since the tableau is closed, it follows from (5) that tgt(N″) has side G, which implies that ipol(N″) = lit(N″). Hence t ∈ V-Max(ipol(N″)). Since ipol(N″) is a negative literal and a disjunct of ipol(N), it follows from (1) and Prop. 6.iii that for all clauses C in cnf(ipol(N)) it holds that t ∈ V-Max−(C), contradicting assumption (4). Hence (6) must hold. From (6), (2) and the induction hypothesis it follows that for all children N′ of N and clauses Hence, by (1) and Prop. 14.i it follows that for all clauses C in cnf(ipol(N)) it holds that V-Max(C) ∩ {t} ⊆ V-Max−(C). This, however, contradicts our assumption of the existence of a clause K in cnf(ipol(N)) that satisfies (3) and (4). Hence INV_C(N) must hold.
We conclude the proof of the induction step for INV_C(N) by considering the case where the side of the children of N is G. Then (7) INV_C(N) follows from the induction hypothesis, (8), (7) and Prop. 6.i.

⊓⊔
The invariant properties of tableau nodes shown in Lemmas 16.i-16.iii apply in particular to the tableau root. We now apply this to prove Theorem 10.
Proof (Theorem 10).Interpolants with the stated properties are obtained with CTIF, assuming w.l.o.g. that the CNF computed in step 2 meets the requirement of Sect.3.1, and that the closed clausal tableau computed in step 3 is leaf-closed and has the hyper property.That CTIF constructs a Craig-Lyndon interpolant has been shown in [62].It remains to show the further claimed properties of the interpolant.Let ⟨F, G, F ′ , G ′ , F, G, E, U, C, V⟩ be the interpolation context for the input formulas F and G and let N 0 be the root of the tableau computed in step 3. Since N 0 is the root, path F (N 0 ) = path G (N 0 ) = ⊤ and thus the expressions V-Max + (path F (N 0 )) and V-Max + (path G (N 0 )) in the specifications of INV C (N 0 ), INV D (N 0 ) and INV X (N 0 ) all denote the empty set.The claims made in the particular sub-theorems can then be shown as follows.
(10.i) By Lemma 16.i it follows that INV_C(N0). Hence, for all clauses C in cnf(ipol(N0)) it holds that V-Max(C) ∩ U ⊆ V-Max−(C). It follows that the result of the interpolant lifting (step 7) of CTIF applied to ipol(N0) is U-range-restricted. Placeholder constant replacement (step 8) does not alter this.
(10.ii) As for Theorem 10.i it follows that for all clauses C in cnf(ipol(N0)) it holds that V-Max(C) ∩ U ⊆ V-Max−(C). By Lemma 16.ii it follows that INV_D(N0). Hence, for all conjunctive clauses D in dnf(ipol(N0)) it holds that V-Max(D) ∩ E ⊆ V-Max+(D). It follows that the result of the interpolant lifting of CTIF applied to ipol(N0) is VGT-range-restricted. Since F and G have no free variables, placeholder constant replacement has no effect.

(10.iii) As for Theorem 10.ii it follows that for all clauses C in cnf(ipol(N0)) it holds that V-Max(C) ∩ U ⊆ V-Max−(C) and for all conjunctive clauses D in dnf(ipol(N0)) it holds that V-Max(D) ∩ E ⊆ V-Max+(D). By Lemma 16.iii it follows that INV_X(N0). Hence, for all conjunctive clauses D in dnf(ipol(N0)) it holds that C ⊆ V-Max+(D). It follows that the result of the interpolant lifting of CTIF applied to ipol(N0), followed by placeholder constant replacement, now applied to C, is VGT-range-restricted.

Horn Interpolation
A Horn clause is a clause with at most one positive literal.A Horn formula is built up from Horn clauses with the connectives ∧, ∃ and ∀.Horn formulas are important in countless theoretical and practical respects.Our interpolation method on the basis of clausal tableaux with the hyper property can be applied to obtain a Horn interpolant under the precondition that the first argument formula F of the interpolation problem is Horn.The following theorem makes this precise.It can be proven by an induction on the structure of a clausal tableau with the hyper property (see App. B).
Theorem 17 (Interpolation from a Horn Formula).Let F be a Horn formula and let G be a formula such that F |= G. Then there exists a Craig-Lyndon interpolant H of F and G that is a Horn formula.Moreover, H can be effectively constructed from a clausal tableau proof of F |= G.
An apparently weaker property than Theorem 17 has been shown in [38, § 4] with techniques from model theory: For two universal Horn formulas F and G there exists a universal Horn formula that is like a Craig interpolant, except that function symbols are not constrained. There, a universal Horn formula is a prenex formula with only universal quantifiers and a Horn matrix. For CTIF, the corresponding strengthening of the interpolant to a universal formula can be read off from the specification of interpolant lifting (step 7 in Fig. 2).
The following corollary shows that Theorem 17 can be combined with Theorem 10 to obtain interpolants that are both Horn and range-restricted.
Corollary 18 (Range-Restricted Horn Interpolants).Theorems 10.i, 10.ii and 10.iii can be strengthened: If F is a Horn formula, then there exists a Craig-Lyndon interpolant H with the properties shown in the respective theorem and the additional property that it is Horn.Moreover, H can be effectively constructed from a clausal tableau proof of F |= G.
Proof. This can be shown by combining the proofs of Theorems 10.i, 10.ii and 10.iii, respectively, with the proof of interpolation from a Horn formula, Theorem 17. The combined proofs are based on inductions on the same closed tableau with the hyper property.
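The Horn condition in Theorem 17 and Corollary 18 is syntactic. For a prenex formula with a CNF matrix it suffices that every matrix clause is a Horn clause, which a short Python sketch can check; the literal encoding (positive, pred, args) is a hypothetical choice for this illustration.

```python
def is_horn_clause(clause) -> bool:
    """A Horn clause has at most one positive literal."""
    return sum(1 for positive, _, _ in clause if positive) <= 1


def is_horn_cnf(cnf) -> bool:
    return all(is_horn_clause(c) for c in cnf)


# A TGD with a single head atom gives a Horn clause ...
tgd = [(False, "a", ("X", "Y")), (True, "b", ("Y", "Z"))]
# ... whereas a disjunctive head p(X) → q(X) ∨ r(X) does not.
dis = [(False, "p", ("X",)), (True, "q", ("X",)), (True, "r", ("X",))]
print(is_horn_clause(tgd), is_horn_clause(dis))  # True False
```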

Obtaining Proofs with the Hyper Property
Our new interpolation theorems, Theorems 10 and 17, depend on the hyper property of the underlying closed clausal tableaux from which interpolants are extracted. We present a proof transformation that converts any closed clausal tableau to one with the hyper property. The transformation can be applied to a clausal tableau as obtained directly from a clausal tableaux prover. Moreover, it can also be applied indirectly to a resolution proof. To this end, the resolution deduction tree [12] is translated to a clausal tableau in cut normal form [31,62].

We specify the hyper conversion in Fig. 3 as a procedure that destructively manipulates a tableau. There, a fresh copy of an ordered tree T is an ordered tree T′ with fresh nodes and edges, related to T through a bijection c such that any node N of T has the same labels (literal label and side label) as node c(N) of T′ and such that the i-th edge originating in node N of T ends in node M if and only if the i-th edge originating in node c(N) of T′ ends in node c(M). The procedure is performed as an iteration that in each round chooses an inner node with negative literal label and then modifies the tableau. Hence, at termination there is no inner node with negative literal, which means that the tableau is hyper: in a leaf-closed tableau, a leaf with positive literal would close against a negative, and thus inner, ancestor. Termination of the procedure can be shown with a measure that strictly decreases in each round (Prop. 20 in App. C). Figures 4 and 5 show example applications of the procedure.
Since the hyper conversion procedure copies parts of subtrees, it is not a polynomial operation. To get an idea of its practical feasibility, we experimented with an unbiased set of proofs of miscellaneous problems. For this we took those 112 CASC-J11 [54] problems that could be proven with Prover9 [37] in 400 s per problem, including a basic proof conversion with Prover9's tool Prooftrans. The hyper conversion succeeded on 107 (or 96%) of these, given 400 s timeout per proof, where the actual median of used time was only 0.01 s. It was applied to a tableau in cut normal form that represents the proof tree of Prover9's proof. The two intermediate steps, translation of paramodulation to binary resolution and expansion to cut normal form, succeeded in fractions of a second, except for one case where the expansion took 121 s and two cases where it failed due to memory exhaustion. The hyper conversion then failed in three further cases.

Input: A closed clausal tableau.
Method: Simplify the tableau to leaf-closed and regular form (Sect. 2.2). Repeat the following operations until the resulting tableau is hyper.
1. Let N′ be the first node visited in pre-order that has a child which is an inner node with a negative literal label. Let N be the leftmost such child.
2. Create a fresh copy U of the subtree rooted at N′. In U, remove the edges that originate in the node corresponding to N.
3. Replace the edges originating in N′ with the edges originating in N.
4. For each leaf descendant M of N′ whose literal label lit(M) is the complement of lit(N): create a fresh copy U′ of U and change the origin of the edges originating in the root of U′ to M.
5. Simplify the tableau to leaf-closed and regular form (Sect. 2.2).
Output: A leaf-closed, regular and hyper clausal tableau whose clauses are clauses of the input tableau.

For each round the result after procedure steps 1-4 is shown and then the result after step 5, simplification, applied here to achieve regularity.
For all but two of the successfully converted proofs, the hyper conversion reduced the proof size; the overall median of the ratio of hyper-converted size to input size was 0.39. See App. D for details.
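The procedure of Fig. 3 can be sketched for ground tableaux in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the authors' SWI-Prolog implementation: side labels are ignored, a literal is a string with "~" marking negation, the root carries no literal, and `copy.deepcopy` plays the role of a fresh copy. The simplification of Sect. 2.2 is not reproduced in this excerpt, so it is approximated here by two assumed rewrite rules: an inner node whose literal is complementary to an ancestor's literal loses its children, and a clause whose inner child repeats an ancestor's literal is replaced by the clause below that child. In step 4 the sketch attaches copies of U at the leaves whose literal is the complement of lit(N), i.e., the leaves that were closed by N. All names (`Node`, `simplify_once`, `find_target`, `hyper_convert`, `is_hyper`) are hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)  # nodes are compared by identity, not by structure
class Node:
    lit: Optional[str]  # None for the root; "~p(a)" is the complement of "p(a)"
    children: List["Node"] = field(default_factory=list)

def compl(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit

def is_neg(lit: Optional[str]) -> bool:
    return lit is not None and lit.startswith("~")

def simplify_once(root: Node) -> bool:
    """Apply one assumed simplification step; return True if the tree changed."""
    def walk(node: Node, path: set) -> bool:
        for child in node.children:
            if child.children and compl(child.lit) in path:
                child.children = []  # already closed by an ancestor: make it a leaf
                return True
            if child.children and child.lit in path:
                node.children = child.children  # irregular: skip this clause
                return True
            if walk(child, path | {child.lit}):
                return True
        return False
    return walk(root, set())

def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for c in node.children for leaf in leaves(c)]

def find_target(root: Node):
    """Step 1: first N' in pre-order with an inner child N that has a negative literal."""
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:  # leftmost such child first
            if child.children and is_neg(child.lit):
                return node, child
        stack.extend(reversed(node.children))
    return None

def hyper_convert(root: Node) -> Node:
    while simplify_once(root):
        pass
    while (target := find_target(root)) is not None:
        n_prime, n = target
        # Step 2: fresh copy U of the subtree at N', cut below the copy of N.
        u = copy.deepcopy(n_prime)
        u.children[n_prime.children.index(n)].children = []
        # Step 3: the clause below N now hangs directly below N'.
        n_prime.children = n.children
        # Step 4: below each leaf that N used to close, attach a fresh copy of U.
        for m in leaves(n_prime):
            if m.lit == compl(n.lit):
                m.children = copy.deepcopy(u).children
        # Step 5: simplify again.
        while simplify_once(root):
            pass
    return root

def is_hyper(node: Node) -> bool:
    """True if no inner node below the root has a negative literal."""
    return all(not (c.children and is_neg(c.lit)) and is_hyper(c)
               for c in node.children)

def shape(node: Node):
    return (node.lit, tuple(shape(c) for c in node.children))

# Ground tableau with the clauses listed for Figure 1 (side labels omitted).
fig1 = Node(None, [Node("~r(a)", [
    Node("~q(a)", [Node("~p(a)", [Node("p(a)")]), Node("q(a)")]),
    Node("r(a)")])])
result = hyper_convert(fig1)
print(is_hyper(result))  # True
print(shape(result))
```

On this input, three rounds produce a tableau whose inner nodes are labeled only with the positive literals p(a), q(a) and r(a), each negative literal now sitting at a leaf. The stack-based search visits nodes in pre-order, which matches the choice of N′ in step 1.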

Conclusion
We conclude with a discussion of related work, open issues and perspectives. Our interpolation method CTIF [62] is complete for first-order logic with function symbols. Vampire's native interpolation [23,22], targeted at verification, is, like all local methods, incomplete [28]. Princess [47,10] implements interpolation with a sequent calculus that supports theories for verification and permits uninterpreted predicates and functions. Suitable proofs for our approach can currently be obtained from CMProver (clausal tableaux) and Prover9 (resolution/paramodulation). As of today, Vampire [27] and E [49] with optimized settings only output proofs with gaps. This seems to be improving [48], and it might also be overcome by re-proving with Prover9 using lemmas from the more powerful systems. So far we have not addressed special handling of equality in the context of range-restriction, a topic on its own, e.g., [59,3]. We treat equality as a predicate, with axioms for reflexivity, symmetry, transitivity and substitutivity. CTIF works smoothly with these, respecting polarity constraints of equality in interpolants [62, Sect. 10.4]. With the exception of reflexivity, these axioms are U-range-restricted. We do not interfere with the provers' equality handling; in finished proofs we just translate paramodulation into binary resolution with substitutivity axioms.
Our hyper property might be of interest for proof presentation and exchange, since it gives the proof tree a constrained shape and in experiments often shortens it. Like hyperresolution and hypertableaux, it can be generalized to take a "semantics" into account [51], [12, Chap. 6], [26, Sect. 4.5]. To shorten interpolants, it might be combined with proof reductions (e.g., [63]).
For query reformulation, interpolation on the basis of general first-order ATP has so far hardly been considered. Most methods are based on sequent calculi [56,6] or analytic tableaux [21,25,5,57]. Experiments with ATP systems and propositional inputs indicate that the requirements are quite different from those in verification [4]. An implemented system [25,57] uses analytic tableaux with dedicated refinements for enumerating alternative proofs/interpolants, which correspond to query plans, for heuristic choice. In [5] the focus is on interpolants that are sentences respecting binding patterns, which, like range-restriction, ensures database evaluability. Our interpolation theorems show fine-grained conditions for passing variations of range-restriction and the Horn property on to interpolants. Matching these with the many formula classes considered in knowledge representation and databases is an issue for future work. A further open topic is adapting recent synthesis techniques for nested relations [6] to the clausal tableaux proof system.
Methodically, we exemplified a way to approach operations on proof structures while taking efficient automated first-order provers into account. Feasible implementations are thus brought within reach, for practical application and also for carefully validating abstract claims and conjectures. The prover is a black box with freedom regarding optimizations, strategy and even calculus. For interfacing, the overall setting incorporates clausification and Skolemization. Requirements on the proof structure do not hamper proof search, but are ensured by transformations applied to the proofs returned by the efficient systems.

A Proof of Lemma 16.iii
This appendix supplements Sect. 3.4 with the proof of Lemma 16.iii, used for proving Theorem 10.iii on range-restricted interpolation with free variables.
Lemma 16. Let ⟨F, G, F′, G′, F, G, E, U, C, V⟩ be an interpolation context and assume a leaf-closed and hyper two-sided clausal ground tableau for F′ and G′.
(iii) If ¬G is U-range-restricted and conditions (1)-(3) of Theorem 10.iii hold, then for all inner nodes N the property INV_X(N) defined as follows holds:
The proof of Lemma 16.iii proceeds by induction on the tableau structure, referring to DNF conversions of the interpolant constituents, similarly to the proof of Lemma 16.ii, but with a base case that rests on the following lemma:
Lemma 19. For all inner nodes N of a closed tableau that is hyper it holds that either all literals in clause(N) are negative or N has a descendant N′ such that all literals in clause(N′) are negative.
Proof. The tableau is hyper, hence its leaves are exactly the nodes with a negative literal label. Thus, if clause(N) contains a positive literal, the child of N labeled with it is an inner node, to which the same argument applies. Failure of the claimed property would therefore imply that the tableau has an infinite branch.
⊓⊔
Proof (Lemma 16.iii). By induction on the tableau structure, with the nodes N where all literals in clause(N) are negative as base case. That this base case suffices to show INV_X(N) for all inner nodes N as claimed follows from Lemma 19.
Base case where all literals in clause(N) are negative. By Lemma 13.iii the children of N have side G. Hence
For all children N′ of N where the side of tgt(N′) is F it holds that ipol(N′) = lit(tgt(N′)), a positive literal, hence V-Max(lit(N′)) = V-Max+(ipol(N′)). For all children N′ of N where the side of tgt(
By Lemma 13.iv it holds that C ⊆ V-Max(clause(N)). With (2) it follows that
Because ipol(N) is a conjunction of literals and ⊤, the formula dnf(ipol(N)) consists of a single conjunctive clause D, which contains exactly the literals in ipol(N) and thus satisfies V-Max+(D) = V-Max+(ipol(N)). Together with (3) this implies INV_X(N).
Induction Step. Let N₁, ..., Nₙ, where 1 ≤ n, be the children of N. Assume as induction hypothesis that INV_X(Nᵢ) holds for all i ∈ {1, ..., n}. Consider the case where the side of the children is G. Then

literals, ⊥ and at most one Horn-like formula. It is easy to see that a Horn-like formula can be converted to an equivalent conjunction of Horn clauses by truth-value simplification and distributing disjunction over conjunction.
The claimed Horn interpolant is obtained via CTIF (Fig. 2), assuming w.l.o.g. that the CNF computed in step 2 there meets the requirement of Sect. 3.1, and that the closed clausal tableau computed in step 3 is leaf-closed and has the hyper property. That CTIF constructs a Craig-Lyndon interpolant has been shown in [62]. It remains to show that it can be converted to a Horn formula. Let ⟨F, G, F′, G′⟩ be the prefix of an interpolation context for the input formulas F and G and let N₀ be the root of the tableau computed in step 3.
We now show by induction on the tableau structure that H_grd = ipol(N₀) for the tableau root N₀ is a Horn-like formula. The theorem then follows since we can obtain the final interpolant H from H_grd by interpolant lifting (step 7 of CTIF), replacing placeholder constants with free variables (step 8), and converting the Horn-like matrix to an equivalent conjunction of Horn clauses, where all syntactic properties relevant for a Craig-Lyndon interpolant are preserved.
For the base case where N is a leaf, it is immediate from the definition of ipol that ipol(N) is a ground literal or a truth value constant and thus a Horn-like formula. To show the induction step, let N be an inner node with children N₁, ..., Nₙ where n ≥ 1. As induction hypothesis assume that ipol(Nᵢ) is a Horn-like formula for all i ∈ {1, ..., n}. We prove the induction step by showing that then also ipol(N) is a Horn-like formula.
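The definition of Horn-like formulas is not part of this excerpt. As a hedged illustration of the two operations named above, truth-value simplification and distribution of disjunction over conjunction, the following sketch converts propositional NNF formulas to clause sets and then checks the Horn property (at most one positive literal per clause). The formula encoding and the function names `cnf` and `is_horn` are hypothetical. Note how the CNF of a conjunction is simply the union of the CNFs of its conjuncts, whereas a disjunction requires distribution.

```python
# Hypothetical encoding of propositional NNF formulas: "TRUE", "FALSE",
# a literal such as "p" or "~p", or a tuple ("and" | "or", f1, ..., fn).

def cnf(f):
    """Return a set of clauses (frozensets of literals) equivalent to f."""
    if f == "TRUE":
        return set()                      # empty conjunction
    if f == "FALSE":
        return {frozenset()}              # one empty clause
    if isinstance(f, str):
        return {frozenset([f])}
    op, *args = f
    parts = [cnf(a) for a in args]
    if op == "and":
        return set().union(*parts)        # CNF of a conjunction: union of the parts
    clauses = {frozenset()}               # "or": distribute over the conjunctions
    for part in parts:
        clauses = {c | d for c in clauses for d in part}
    # Truth-value simplification: drop tautological clauses containing p and ~p.
    return {c for c in clauses if not any("~" + lit in c for lit in c)}

def is_horn(clauses):
    """Horn: every clause has at most one positive literal."""
    return all(sum(not lit.startswith("~") for lit in c) <= 1 for c in clauses)

f1 = ("or", "~a", ("and", "b", "~c", "TRUE"))
f2 = ("or", "a", ("and", "b", "~c"))
print(cnf(f1), is_horn(cnf(f1)))  # clauses {~a, b} and {~a, ~c}; Horn
print(cnf(f2), is_horn(cnf(f2)))  # clauses {a, b} and {a, ~c}; not Horn
```

Distributing a disjunction with a single positive literal over a conjunction keeps every resulting clause Horn only if the other disjuncts contribute no further positive literals, which is the kind of restriction a notion such as "Horn-like" has to capture.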

C Termination of the Hyper Conversion
This appendix supplements Sect. 5 with a proof that the hyper conversion procedure (Fig. 3) terminates.
Proof. We give a measure that strictly decreases in each round of the procedure. Consider a single round of steps 1-5 of the procedure with N and N′ as determined in step 1. We observe the following.
(i) All tableau modifications made in the round are in the subtree rooted at N ′ .
(ii) When the round is finished, all descendants of N′ with the same literal label as N are leaves.
(iii) All literal labels of inner nodes that are descendants of N′ and differ from lit(N) when the round is finished were already literal labels of inner nodes that were descendants of N′ when the round was entered.
We can now specify the measure that strictly decreases in each round of the procedure. For a node N define bad-literals(N) as the set of negative literal labels that occur in inner (i.e., non-leaf) descendants of N. From observations (ii) and (iii) above it follows that, for N′ as determined in step 1 of the procedure, the cardinality of bad-literals(N′) strictly decreases in a round of steps 1-5 of the procedure. However, a different node might be determined as N′ in step 1 of the next round. To specify a globally decreasing measure we define a further auxiliary notion: Let Nₙ be a node whose ancestors are, in root-to-leaf order, the nodes N₁, ..., Nₙ₋₁. Define path-string(Nₙ) as the string I₁ ... Iₙ ω of numbers, where for i ∈ {1, ..., n} the number Iᵢ is the number of right siblings of Nᵢ. With observation (i) it then follows that the following string of numbers, determined in step 1 of a round, is strictly reduced from round to round w.r.t. the lexicographical order of strings of numbers:
Regularity ensures that the length of the strings to be considered cannot be larger than the finite number of literal labels of nodes of the input tableau plus 3 (a leading 0 for the root, which has no literal label; ω; and |bad-literals(N′)|).
With the lexicographical order restricted to strings up to that length we have a well-order and the strict reduction ensures termination.⊓ ⊔
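The two ingredients of the measure, path-string and bad-literals, can be computed directly from their definitions. The exact string combining them is not recoverable from this excerpt, so the sketch below only illustrates the ingredients, under a hypothetical tree encoding (label, [subtrees]) and with `math.inf` standing in for ω. It also shows, on a ground tableau with the clauses of Figure 1, that visiting nodes in pre-order yields strictly decreasing path-strings in the lexicographical order, which is our reading of why the pre-order choice of N′ in step 1 fits this measure.

```python
import math

OMEGA = math.inf  # plays the role of the symbol ω: larger than any number

# Hypothetical tree encoding: (literal_label, [subtrees]); the root label is None.
fig1 = (None, [("~r(a)", [("~q(a)", [("~p(a)", [("p(a)", [])]), ("q(a)", [])]),
                          ("r(a)", [])])])

def path_strings(tree, prefix=(0,)):
    """Yield (label, path-string) in pre-order; prefix is I_1 ... I_n of this node."""
    label, children = tree
    yield label, prefix + (OMEGA,)
    for i, child in enumerate(children):
        right_siblings = len(children) - 1 - i
        yield from path_strings(child, prefix + (right_siblings,))

def bad_literals(tree):
    """Negative literal labels of inner (non-leaf) proper descendants."""
    result = set()
    for child in tree[1]:
        label, grandchildren = child
        if grandchildren and label.startswith("~"):
            result.add(label)
        result |= bad_literals(child)
    return result

strings = [s for _, s in path_strings(fig1)]
for label, s in path_strings(fig1):
    print(label, s)
# Python compares tuples lexicographically, so this checks strict decrease.
print(all(a > b for a, b in zip(strings, strings[1:])))  # True
print(sorted(bad_literals(fig1)))  # ['~p(a)', '~q(a)', '~r(a)']
```

The root receives the path-string 0 ω, matching the leading 0 mentioned above; the trailing ω makes every node larger than all of its descendants.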

D Experimental Indicators of Practical Feasibility
This appendix supplements Sect. 5 with details on the experiments to verify the practical feasibility of the proof conversions involved in our strengthened variations of Craig interpolation. Instructions for reproducing the experiments are also given. An implementation of the techniques from the paper is currently in progress, written in SWI-Prolog [64] and embedded in PIE [60,61]. Core parts of the functionality are already available⁶ but not yet integrated into full application workflows. The involved proof transformations lead from a proof with resolution and paramodulation via pure binary resolution and a clausal tableau in cut normal form to a clausal tableau with the hyper property. To get an impression of their practical feasibility, we tested them on problems from the latest CASC competition, CASC-J11 [54], as an unbiased set of proofs of miscellaneous problems.
As basis we took those FOF problems of CASC-J11 on which Prover9 succeeded in the competition. We tried to reprove these in Prover9's default auto mode⁷ via PIE and to convert their proofs with the Prooftrans tool, which comes with Prover9, with a timeout of 400 s per problem.⁸ Prooftrans was configured with the expand option, which translates Prover9's proofs to just binary resolution, paramodulation and a few other equality rules. This succeeded for 112 problems. For one additional problem, Prooftrans failed.⁹ The length of the obtained proofs (number of steps, including axioms) was between 12 and 919, with median 55.
Equality-specific rules were then translated to binary resolution steps, which for no proof took longer than 0.04 s.The proof length (number of steps, including axioms) of the results was between 10 and 4,833, median 81 (in some cases the size decreased because non-clausal axioms were deleted).
These proofs were then converted to clausal tableaux in cut normal form that correspond to resolution trees. This failed for two of the 112 proofs due to memory exhaustion but succeeded for each of the others in less than 0.1 s, with the exception of one problem, where it took 121 s. The median time per problem was 0.001 s. The proof size (number of inner nodes of the clausal tableau) in the results was between 20 and 97,866,317, median 259.
Finally the hyper conversion was applied with a timeout of 400 s per proof to the remaining 110 proofs.It succeeded for 107 proofs, within a median time of 0.01 s per proof, and a maximum time of 235 s.One failure was due to memory exhaustion, the other two were timeouts.The proof size (number of inner nodes of the clausal tableau) of the results was between 11 and 3,110, median 77.In 105 of the 107 cases the size was reduced.The largest proof on which the conversion succeeded had size 51,359 and was reduced to size 507.The ratios of the size of the hyper-converted tableau to the size of the source tableau were between 0.01 and 4.48, median 0.39.
Tables 1-3 show result data for each of the 112 problems.Figure 6 shows Prolog code to reproduce the experiments with PIE.The columns in these tables are as follows.
Problem: The TPTP problem (TPTP v8.1.2).
Rtg: Its latest rating in TPTP v8.1.2.
T1: Proving time in seconds, rounded (timeout 400 s).
S1: Number of steps of the original proof, after expansion into binary resolution and paramodulation by Prooftrans, including axioms.
S2: Number of steps of the binary resolution proof after translation of paramodulation and some other equality inferences to pure binary resolution, including axioms.
S3: Tree size (number of inner nodes) of the clausal tableau in cut normal form, where "-" indicates failure of the conversion.
S4: Tree size (number of inner nodes) of the clausal tableau after the hyper conversion, where "-" indicates a timeout (400 s) of the conversion.
T2: Time for the hyper conversion in seconds, rounded.

(3) Var(H) ⊆ Var(F) ∩ Var(G).
The perspective of validating an entailment F |= G by showing unsatisfiability of F ∧ ¬G is reflected in the notion of a reverse Craig-Lyndon interpolant of F and G, defined as a Craig-Lyndon interpolant of F and ¬G.

Figure 1 shows a two-sided tableau for F = p(a) ∧ (¬p(a) ∨ q(a)) and G = (¬q(a) ∨ r(a)) ∧ ¬r(a). Side G is indicated by gray background. For each node the value of ipol, after truth-value simplification, is annotated in brackets. The clauses of the tableau are ¬r(a) and ¬q(a) ∨ r(a), which have side G, and ¬p(a) ∨ q(a) and p(a), which have side F. If N is the node shown bottom left, labeled with p(a), then path_F(N) = ¬p(a) ∧ p(a) and path_G(N) = ¬r(a) ∧ ¬q(a).
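Since all atoms in this example are ground, the semantic conditions of a reverse Craig-Lyndon interpolant of F and G (F |= H and H |= ¬G) can be checked by brute force over truth assignments. The sketch treats p(a), q(a) and r(a) as propositional atoms and checks the candidate H = q(a), which we pick by hand for illustration; it does not implement CTIF or compute ipol values. The vocabulary condition holds by inspection: q occurs in both F and G, and it occurs positively in F and negatively in G, as the Lyndon condition on polarities requires.

```python
from itertools import product

ATOMS = ("p", "q", "r")  # stand for the ground atoms p(a), q(a), r(a)

def F(v):
    return v["p"] and (not v["p"] or v["q"])

def G(v):
    return (not v["q"] or v["r"]) and not v["r"]

def H(v):  # candidate reverse interpolant q(a), chosen by hand
    return v["q"]

def valuations():
    for bits in product((False, True), repeat=len(ATOMS)):
        yield dict(zip(ATOMS, bits))

def entails(a, b):
    return all(b(v) for v in valuations() if a(v))

def unsatisfiable(f):
    return not any(f(v) for v in valuations())

print(unsatisfiable(lambda v: F(v) and G(v)))  # True: F ∧ G is refuted
print(entails(F, H))                           # True: F |= H
print(unsatisfiable(lambda v: H(v) and G(v)))  # True: H |= ¬G
```

A truth-table check like this only scales to a handful of ground atoms; it serves here as an independent sanity check of the semantic conditions, not as a construction method.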

Theorem 10 (Interpolation and Range-Restriction). Let F and G be formulas such that F |= G.
(i) If F is U-range-restricted, then there exists a U-range-restricted Craig-Lyndon interpolant H of F and G. Moreover, H can be effectively constructed from a clausal tableau proof of F |= G.
(ii) If F and G are sentences such that F and ¬G are U-range-restricted, then there exists a VGT-range-restricted Craig-Lyndon interpolant H of F and G. Moreover, H can be effectively constructed from a clausal tableau proof of F |= G.
(iii) If F and ¬G are U-range-restricted, Var(F) = Var(G) = X, and
(1) no clause in cnf(F) has only negative literals;
(2) for all clauses C in cnf(¬G) with only negative literals it holds that X ⊆ Var−(C);
(3) for all clauses C in cnf(¬G) it holds that Var(C) ∩ X ⊆ Var−(C),
then there exists a VGT-range-restricted Craig-Lyndon interpolant H of F and G. Moreover, H can be effectively constructed from a clausal tableau proof of F |= G.
(iii) If condition (1) of Theorem 10.iii holds, then no instance C of a clause in F′ has only negative literals.
(iv) If condition (2) of Theorem 10.iii holds, then for all instances C of a clause in G′ with only negative literals it holds that C ⊆ V-Max−(C).
(v) If ¬G is U-range-restricted and condition (3) of Theorem 10.iii holds, then for all instances C of a clause in

Proposition 14. Let F₁, F₂, ..., Fₙ be NNF formulas and let T be a set of terms. Then
(i) If S is a set of terms such that for all i ∈ {1, ..., n} and clauses C in cnf(Fᵢ) it holds that T-Max(C) ∩ S ⊆ T-Max−(C), then for all clauses C in cnf(⋁ᵢ₌₁ⁿ Fᵢ) it holds that T-Max(C) ∩ S ⊆ T-Max−(C).
(ii) If S is a set of terms such that for all i ∈ {1, ..., n} and conjunctive clauses D in dnf(Fᵢ) it holds that T-Max(D) ∩ S ⊆ T-Max+(D), then for all conjunctive clauses D in dnf(⋀ᵢ₌₁ⁿ Fᵢ) it holds that T-Max(D) ∩ S ⊆ T-Max+(D).
The binary resolution proof is first translated to a closed clausal ground tableau in cut normal form [31, Sect. 7.2.2]. There, the inner clauses are atomic cuts, i.e., tautologies of the form ¬p(t₁, ..., tₙ) ∨ p(t₁, ..., tₙ) or p(t₁, ..., tₙ) ∨ ¬p(t₁, ..., tₙ), corresponding to literals upon which a (tree) resolution step has been performed. Clauses of nodes whose children are leaves are instances of input clauses. Our hyper conversion can then be applied to the tableau in cut normal form. A regular leaf-closed tableau with the hyper property cannot have atomic cuts: the child labeled ¬p(t₁, ..., tₙ) of such a cut would have to be a leaf closed by an ancestor labeled p(t₁, ..., tₙ), so its sibling labeled p(t₁, ..., tₙ) would repeat a literal on its branch, contrary to regularity. Hence the conversion might be viewed as an elimination method for these cuts.