Abstract
We show how variations of rangerestriction and also the Horn property can be passed from inputs to outputs of Craig interpolation in firstorder logic. The proof system is clausal tableaux, which stems from firstorder ATP. Our results are induced by a restriction of the clausal tableau structure, which can be achieved in general by a proof transformation, also if the source proof is by resolution/paramodulation. Primarily addressed applications are query synthesis and reformulation with interpolation. Our methodical approach combines operations on proof structures with the immediate perspective of feasible implementation through incorporating highly optimized firstorder provers.
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – ProjectID 457292495. The work was supported by the NorthGerman Supercomputing Alliance (HLRN).
Download conference paper PDF
1 Introduction
We show how variations of rangerestriction and also the Horn property can be passed from inputs to outputs of Craig interpolation in firstorder logic. The primarily envisaged application field is synthesis and reformulation of queries with interpolation [5, 39, 56]. Basically, the sought target query R is understood there as the right side of a definition of a given query Q within a given background knowledge base K, i.e., it holds that \(K \models (Q \leftrightarrow R)\), where the vocabulary of R is in a given set of permitted target symbols. In firstorder logic, the formulas R can be characterized as the Craig interpolants of \(K \wedge Q\) and \(\lnot K' \vee Q'\), where K, Q are copies of \(K',Q'\) with the symbols not allowed in R replaced by fresh symbols [14]. Formulas R exist if and only if the entailment \(K \wedge Q \models \lnot K' \vee Q'\) holds. They can be constructed as Craig interpolants from given proofs of the entailment in a suitable calculus.
In databases and knowledge representation, syntactic fragments of firstorder logic ensure desirable properties, for example domain independence. Typically, for given K and Q in some such fragment, also R must be in some specific fragment to be usable as a query or as a knowledge base component. Our work addresses this by showing for certain such fragments how membership is passed on to interpolants and thus to the constructed right sides of definitions. The fragment in focus here is a variant of rangerestriction from [59], known as a rather general syntactic condition to ensure domain independence [1, p. 97]. It permits conversion into a shape suitable for “evaluation” by binding free and quantified variables successively to the members of given predicate extensions. Correspondingly, if the vocabulary is relational, a rangerestricted formula can be translated into a relational algebra expression. Firstorder representations of widelyused classes of integrity constraints, such as tuplegenerating dependencies, are sentences that are rangerestricted in the considered sense.
As proof system we use clausal tableaux [26, 29,30,31, 33], devised in the 1990s to take account of automated firstorder provers that may be viewed as enumerating treeshaped proof structures, labeled with instances of input clauses.^{Footnote 1} Such systems include the Prolog Technology Theorem Prover [53], SETHEO [32], leanCoP [42, 43] and CMProver [16, 45, 60, 61]. As shown in [62], a given closed clausal tableau is quite wellsuited as a proof structure to extract a Craig interpolant. Via the translation of a resolution deduction tree [12] to a clausal tableau in cut normal form [31, 62] this transfers also to interpolation from a given resolution/paramodulation proof.
Since the considered notion of rangerestriction is based on prenexing and properties of both a CNF and a DNF representation of the formula, it fits well with the common firstorder ATP setting involving Skolemization and clausification and the ATPoriented interpolation on the basis of clausal tableaux, where in a first stage the propositional structure of the interpolant is constructed and in a second stage the quantifier prefix.
Our strengthenings of Craig interpolation are induced by a specific restriction of the clausal tableau structure, which we call hyper, since it relates to the proof structure restrictions of hyperresolution [46] and hypertableaux [2]. However, it is considered here for tree structures with rigid variables. A proof transformation that converts an arbitrary closed clausal tableau to one with the hyper property shows that the restriction is w.l.o.g. and, moreover, allows the prover unhampered search for the closed clausal tableaux or resolution/paramodulation proof underlying interpolation.
Structure of the Paper. Section 2 summarizes preliminaries, in particular interpolation with clausal tableaux [62]. Our main result on strengthenings of Craig interpolation for rangerestricted formulas is developed in Sect. 3. Section 4 discusses Craig interpolation from a Horn formula, also combined with rangerestriction. The proof transformation underlying these results is introduced in Sect. 5. We conclude in Sect. 6 with discussing related work, open issues and perspectives.
Proofs of nontrivial claims that are not proven in the body of the paper are supplemented in the preprint version [63]. An implementation with the PIE environment [60, 61]^{Footnote 2} is in progress.
2 Notation and Preliminaries
2.1 Notation
We consider formulas of firstorder logic. An NNF formula is a quantifierfree formula built up from literals (atoms or negated atoms), truthvalue constants \(\top , \bot \), conjunction and disjunction. A CNF formula, also called clausal formula, is an NNF formula that is a conjunction of disjunctions (clauses) of literals. A DNF formula is an NNF formula that is a disjunction of conjunctions (conjunctive clauses) of literals. The complement of a literal L is denoted by \(\overline{L}\). An occurrence of a subformula in a formula has positive (negative) polarity, depending on whether it is in the scope of an even (odd) number of possibly implicit occurrences of negation. Let F be a formula. \( \mathcal {V}ar (F)\) is set of its free variables. \( \mathcal {V}ar ^{+}(F)\) (\( \mathcal {V}ar ^{}(F)\)) is the set of its free variables with an occurrence in an atom with positive (negative) polarity. \( \mathcal {F}un (F)\) is the set of functions occurring in it, including constants, regarded here throughout as 0ary functions. \( \mathcal {P}red ^\pm (F)\) is the set of pairs \(\langle p, pol \rangle \), where p is a predicate and \( pol \in \{{+},{}\}\), such that an atom with predicate p occurs in F with the polarity indicated by \( pol \). \( \mathcal {V}oc ^\pm {(F)}\) is \( \mathcal {F}un (F) \cup \mathcal {P}red ^\pm (F)\). A sentence is a formula without free variables. An NNF is ground if it has no variables. If S is a set of terms, we call its members Sterms. The \(\models \) symbol expresses semantic entailment.
2.2 Clausal FirstOrder Tableaux
A clausal tableau (briefly tableau) for a clausal formula F is a finite ordered tree whose nodes N with exception of the root are labeled with a literal \(\textsf{lit}(N)\), such that for each node N the disjunction of the literals of all its children in their lefttoright order, \(\textsf{clause}(N)\), is an instance of a clause in F. A branch of a tableau is closed iff it contains nodes with complementary literals. A node is closed iff all branches through it are closed. A tableau is closed iff its root is closed. A node is closing iff it has an ancestor with complementary literal. With a closing node N, a particular such ancestor is associated as target of N, written \(\textsf{tgt}(N)\). A tableau is regular iff no node has an ancestor with the same literal and is leafclosing iff all closing nodes are leaves. A closed tableau that is leafclosing is called leafclosed. Tableau simplification can convert any tableau to a regular and leafclosing tableau for the same clausal formula, closed iff the original tableau is so. Regularity is achieved by repeating the following operation [31, Sect. 2.1.3]: Select a node N with an ancestor that has the same literal, remove the edges originating in the parent of N and replace them with the edges originating in N. The leafclosing property is achieved by repeatedly selecting an inner node N that is closing and removing the edges originating in N. All occurrences of variables in (the literal labels of) a tableau are free and their scope spans the whole tableau. That is, we consider freevariable tableaux [30, p. 158ff] with rigid variables [26, p. 114]. A tableau without variables is called ground. The universal closure of a clausal formula F is unsatisfiable iff there exists a closed clausal tableau for F. This holds also if clausal tableau is restricted by the properties ground, regular and leafclosing in arbitrary combinations.
2.3 Interpolation with Clausal Tableaux
Craig’s interpolation theorem [13, 15] along with Lyndon’s observation on the preservation of predicate polarities [35] ensures for firstorder logic the existence of CraigLyndon interpolants, defined as follows. Let F, G be formulas such that \(F \models G\). A CraigLyndon interpolant of F and G is a formula H such that (1) \(F \models H\) and \(H \models G\). (2) \( \mathcal {V}oc ^\pm {(H)} \subseteq \mathcal {V}oc ^\pm {(F)} \cap \mathcal {V}oc ^\pm {(G)}\). (3) \( \mathcal {V}ar (H) \subseteq \mathcal {V}ar (F) \cap \mathcal {V}ar (G)\). The perspective of validating an entailment \(F \models G\) by showing unsatisfiability of \(F \wedge \lnot G\) is reflected in the notion of reverse CraigLyndon interpolant of F and \(G\), defined as CraigLyndon interpolant of F and \(\lnot G\).
Following [62], our interpolant construction is based on a generalization of clausal tableaux where nodes have an additional side label that is shared by siblings and indicates whether the tableau clause is an instance of an input clause derived from the formula F or of the formula G of the statement \(F \wedge G\models \bot \) underlying the reverse interpolant. Thus, a twosided clausal tableau for clausal formulas \(F\) and \(G\) is a tableau for \(F\wedge G\) whose nodes N with exception of the root are labeled additionally with a side \(\textsf{side}(N) \in \{\textsf{F}, \textsf{G}\}\), such that (1) if N and \(N^\prime \) are siblings, then \(\textsf{side}(N) = \textsf{side}(N^\prime )\); (2) if N has a child \(N'\) with \(\textsf{side}(N^\prime ) = \textsf{F}\), then \(\textsf{clause}(N)\) is an instance of a clause in \(F\), and if N has a child \(N'\) with \(\textsf{side}(N^\prime ) = \textsf{G}\), then \(\textsf{clause}(N)\) is an instance of a clause in \(G\). We also refer to the side of the children of a node N as side of \(\textsf{clause}(N)\). For \( side \in \{\textsf{F},\textsf{G}\}\) define , where \( Path \) is the union of the set of the ancestors of N and \(\{N\}\).
Let N be a node of a leafclosed twosided clausal tableau. The value of \(\textsf{ipol}(N)\) is an NNF formula, defined inductively as specified with the tables below, the left for the base case where N is a leaf, the right for the case where N is an inner node with children \(N_1, \ldots , N_n\).
Example 1
Figure 1 shows a twosided tableau for \(F = \textsf{p}(\textsf{a}) \wedge (\lnot \textsf{p}(\textsf{a}) \vee \textsf{q}(\textsf{a}))\) and \(G = (\lnot \textsf{q}(\textsf{a}) \vee \textsf{r}(\textsf{a})) \wedge \lnot \textsf{r}(\textsf{a})\). is indicated by gray background. For each node the value of \(\textsf{ipol}\), after truthvalue simplification, is annotated in brackets. The clauses of the tableau are \(\lnot \textsf{r}(\textsf{a})\) and \(\lnot \textsf{q}(\textsf{a}) \vee \textsf{r}(\textsf{a})\), which have side \(\textsf{G}\), and \(\lnot \textsf{p}(\textsf{a}) \vee \textsf{q}(\textsf{a})\) and \(\textsf{p}(\textsf{a})\), which have side \(\textsf{F}\). If N is the node shown bottom left, labeled with \(\textsf{p}(\textsf{a})\), then \(\textsf{path}_{\textsf{F}}(N) = \lnot \textsf{p}(\textsf{a}) \wedge \textsf{p}(\textsf{a})\) and \(\textsf{path}_{\textsf{G}}(N) = \lnot \textsf{r}(\textsf{a}) \wedge \lnot \textsf{q}(\textsf{a})\).
If \(N_0\) is the root of a twosided tableaux for clausal ground formulas F and G, then \(\textsf{ipol}(N_0)\) is a CraigLyndon interpolant of F and \(\lnot G\).^{Footnote 3} The CTIF (Clausal Tableau Interpolation for FirstOrder Formulas) procedure (Fig. 2) [62] extends this to a twostage [9, 24] (inductive construction and lifting) interpolation method for full firstorder logic. It is complete (yields a CraigLyndon interpolant for all first order formulas F and G such that \(F \models G\)) under the assumption that the method for tableau computation in Step 3 is complete (yields a closed tableau for all unsatisfiable clausal formulas). Some steps leave room for interpolationspecific heuristics: In step 4 the choice of the terms used for grounding; in step 5 the choice of the side assigned to clauses that are an instance of both a clause in \(F'\) and a clause in \(G'\); and in step 7 the quantifier prefix, which is constrained just by a partial order.
Example 2
Let and let . Clausifying F and \(\lnot G\) then yields \(F' = \textsf{p}(x) \wedge \lnot \textsf{p}(x) \vee \textsf{q}(x)\) and \(G' = \lnot \textsf{q}(x) \vee \textsf{r}(x) \wedge \lnot \textsf{r}(\textsf{a})\). The tableau from Fig. 1 is a leafclosed ground tableau for \(F'\) and \(G'\) and we obtain \(\textsf{q}(\textsf{a})\) as \(H_{\textsc {grd}}\). Lifting for \(\mathcal {F}= \{\}\) and \(\mathcal {G}= \{\textsf{a}\}\) yields the interpolant \(H = \forall v_1\, \textsf{q}(v_1)\).
Example 3
Let and let . Clausifying yields \(F' = \textsf{p}(x,\textsf{f}(x),y)\) and \(G' = \lnot \textsf{p}(\textsf{a},z,\textsf{g}(z))\). We obtain \(\textsf{p}(\textsf{a},\textsf{f}(\textsf{a}),\textsf{g}(\textsf{f}(\textsf{a})))\) as \(H_{\textsc {grd}}\). Lifting is for \(\mathcal {F}= \{\textsf{f}\}\) and \(\mathcal {G}= \{\textsf{a}, \textsf{g}\}\) with \(t_1 = \textsf{a}\), \(t_2 = \textsf{f}(\textsf{a})\) and \(t_3 = \textsf{g}(\textsf{f}(\textsf{a}))\). It yields \(H = \forall v_1 \exists v_2 \forall v_3\, \textsf{p}(v_1, v_2, v_3)\).
3 Interpolation and RangeRestriction
We now develop our main result on strengthenings of Craig interpolation for rangerestricted formulas.
3.1 CNF and DNF with Some Assumed Syntactic Properties
Following [59] we will consider a notion of rangerestriction defined in terms of properties of two prenex formulas that are equivalent to the original formula, have both the same quantifier prefix but matrices in CNF and DNF, respectively. Although not syntactically unique, we refer to them functionally as \(\textsf{cnf}(F)\) and \(\textsf{dnf}(F)\) since we only rely on specific – easy to achieve – syntactic properties that are stated in the following Proposition 4–6.
Proposition 4
For all formulas F it holds that \( \mathcal {V}ar (\textsf{cnf}(F)) \subseteq \mathcal {V}ar (F)\); \( \mathcal {V}oc ^\pm {(\textsf{cnf}(F))} \subseteq \mathcal {V}oc ^\pm {(F)}\); \( \mathcal {V}ar (\textsf{dnf}(F)) \subseteq \mathcal {V}ar (F)\); \( \mathcal {V}oc ^\pm {(\textsf{dnf}(F))} \subseteq \mathcal {V}oc ^\pm {(F)}\).
For prenex formulas F with an NNF matrix let \(\textsf{dual}(F)\) be the formula obtained from F by switching quantifiers \(\forall \) and \(\exists \), connectives \(\wedge \) and \(\vee \), truthvalue constants \(\top \) and \(\bot \), and literals with their complement.
Proposition 5
For all formulas F it holds that \(\textsf{cnf}(F) = \textsf{dual}(\textsf{dnf}(\lnot F))\); \(\textsf{dnf}(F) = \textsf{dual}(\textsf{cnf}(\lnot F))\); \(\textsf{cnf}(\lnot F) = \textsf{dual}(\textsf{dnf}(F))\); \(\textsf{dnf}(\lnot F) = \textsf{dual}(\textsf{cnf}(F))\).
Proposition 6
Let \(F_1, F_2,\) \(\ldots , F_n\) be NNF formulas. Then (i) Each clause in \(\textsf{cnf}(\bigwedge _{i=1}^{n} F_i)\) is in some \(\textsf{cnf}(F_j)\). (ii) Each conjunctive clause in \(\textsf{dnf}(\bigvee _{i=1}^{n} F_i)\) is in some \(\textsf{dnf}(F_j)\). (iii) Formulas \(F_j\) that are literals are in each clause in \(\textsf{cnf}(\bigvee _{i=1}^{n} F_i)\). (iv) Formulas \(F_j\) that are literals are in each conjunctive clause in \(\textsf{dnf}(\bigwedge _{i=1}^{n} F_i)\). (v) If S is a set of variables such that for all \(i \in \{1,\ldots ,n\}\) and clauses C in \(\textsf{cnf}(F_i)\) it holds that \( \mathcal {V}ar (C) \cap S \subseteq \mathcal {V}ar ^{}(C)\), then for all clauses C in \(\textsf{cnf}(\bigvee _{i=1}^{n} F_i)\) it holds that \( \mathcal {V}ar (C) \cap S \subseteq \mathcal {V}ar ^{}(C)\). (vi) If S is a set of variables such that for all \(i \in \{1,\ldots ,n\}\) and conjunctive clauses D in \(\textsf{dnf}(F_i)\) it holds that \( \mathcal {V}ar (D) \cap S \subseteq \mathcal {V}ar ^{+}(D)\), then for all conjunctive clauses D in \(\textsf{dnf}(\bigwedge _{i=1}^{n} F_i)\) it holds that \( \mathcal {V}ar (D) \cap S \subseteq \mathcal {V}ar ^{+}(D)\).
3.2 Used Notions of RangeRestriction
The following definition renders the characteristics of the rangerestricted formulas as considered by Van Gelder and Topor in [59, Theorem 7.2] (except for the special consideration of equality in [59]).
Definition 7
A formula F with free variables \(\mathcal {X}\) is called VGTrangerestricted if \(\textsf{cnf}(F) = Q\, M_{\textsf{C}}\) and \(\textsf{dnf}(F) = Q\, M_{\textsf{D}}\), where Q is a quantifier prefix (the same in both formulas) upon universally quantified variables \(\mathcal {U}\) and existentially quantified variables \(\mathcal {E}\) (in arbitrary order), and \(M_{\textsf{C}}\), \(M_{\textsf{D}}\) are quantifierfree formulas in CNF and DNF, respectively, such that

1.
For all clauses C in \(M_{\textsf{C}}\) it holds that \( \mathcal {V}ar (C) \cap \mathcal {U} \subseteq \mathcal {V}ar ^{}(C)\).

2.
For all conjunctive clauses D in \(M_{\textsf{D}}\) it holds that \( \mathcal {V}ar (D) \cap \mathcal {E} \subseteq \mathcal {V}ar ^{+}(D)\).

3.
For all conjunctive clauses D in \(M_{\textsf{D}}\) it holds that \(\mathcal {X} \subseteq \mathcal {V}ar ^{+}(D)\).
For VGTrangerestricted formulas it is shown in [59] that these can be translated via two intermediate formula classes to a relational algebra expression. Related earlier results include [17, 18, 40, 41]. The constraint on universal variables is also useful on its own as a weaker variation of rangerestriction, defined as follows.
Definition 8
A formula F is called Urangerestricted if \(\textsf{cnf}(F) = Q\, M_{\textsf{C}}\) where Q is a quantifier prefix upon of the universally quantified variables \(\mathcal {U}\) (there may also be existentially quantified variables in Q) and \(M_{\textsf{C}}\) is a quantifierfree formula in CNF such that for all clauses C in \(M_{\textsf{C}}\) it holds that \( \mathcal {V}ar (C) \cap \mathcal {U} \subseteq \mathcal {V}ar ^{}(C)\).
For formulas without free variables, Urangerestriction and VGTrangerestriction are related as follows.
Proposition 9
Let F be a sentence. Then (i)F is VGTrangerestricted iff F and \(\lnot F\) are both Urangerestricted. (ii) If F is universal (i.e., in prenex form with only universal quantifiers), then F is VGTrangerestricted iff F is Urangerestricted. (iii) If F is existential (i.e., in prenex form with only existential quantifiers), then F is VGTrangerestricted iff \(\lnot F\) is Urangerestricted.
Urangerestriction covers wellknown restrictions of knowledge bases and inputs of bottomup calculi for firstorder logic and fragments of it that are naturally represented by clausal formulas [3]. Firstorder representations of tuplegenerating dependencies (TGDs) are VGTrangerestricted sentences: conjunctions of sentences of the form \(\forall \mathcal {X}\mathcal {Y}\, (A(\mathcal {X}\mathcal {Y}) \rightarrow \exists \mathcal {Z}\, B(\mathcal {Y}\mathcal {Z}))\), where A is a possibly empty conjunction of relational atoms, B is a nonempty conjunction of relational atoms and the free variables of A and B are exactly those in the sequences \(\mathcal {X}\mathcal {Y}\) and \(\mathcal {Y}\mathcal {Z}\), respectively. Also certain generalizations, e.g., to disjunctive TGDs, where B is built up from atoms, \(\wedge \) and \(\vee \), are VGTrangerestricted.
3.3 Results on RangeRestricted Interpolation
The following theorem shows three variations for obtaining rangerestricted interpolants from rangerestricted inputs.
Theorem 10
(Interpolation and RangeRestriction). Let F and G be formulas such that \(F \models G\).

(i)
If F is Urangerestricted, then there exists a Urangerestricted CraigLyndon interpolant H of F and G. Moreover, H can be effectively constructed from a clausal tableau proof of \(F \models G\).

(ii)
If F and G are sentences such that F and \(\lnot G\) are Urangerestricted, then there exists a VGTrangerestricted CraigLyndon interpolant H of F and G. Moreover, H can be effectively constructed from a clausal tableau proof of \(F \models G\).

(iii)
If F and \(\lnot G\) are Urangerestricted, \( \mathcal {V}ar (F) = \mathcal {V}ar (G) = \mathcal {X}\), and (1) no clause in \(\textsf{cnf}(F)\) has only negative literals; (2) for all clauses C in \(\textsf{cnf}(\lnot G)\) with only negative literals it holds that \(\mathcal {X} \subseteq \mathcal {V}ar ^{}(C)\); (3) for all clauses C in \(\textsf{cnf}(\lnot G)\) it holds that \( \mathcal {V}ar (C) \cap \mathcal {X} \subseteq \mathcal {V}ar ^{}(C)\), then there exists a VGTrangerestricted CraigLyndon interpolant H of F and G. Moreover, H can be effectively constructed from a clausal tableau proof of \(F \models G\).
Observe that Theorem 10.i requires rangerestriction only for F, the first of the two interpolation arguments. Theorem 10.iii aims at applications for query reformulation that in a basic form are expressed as interpolation task for input formulas \(F = K \wedge Q(\mathcal {X})\) and \(G = \lnot K' \vee Q'(\mathcal {X})\). Here K expresses background knowledge and constraints as a Urangerestricted sentence and \(Q(\mathcal {X})\) represents a query to be reformulated, with free variables \(\mathcal {X}\). Formulas \(K'\) and \(Q'\) are copies of K and Q, respectively, where predicates not allowed in the interpolant are replaced by primed versions. If the query Q is Boolean, i.e., \(\mathcal {X}\) is empty, and Q is VGTrangerestricted, then Theorem 10.ii already suffices to justify the construction of a VGTrangerestricted interpolant. If \(\mathcal {X}\) is not empty, the fineprint preconditions of Theorem 10.iii come into play. Precondition (1) requires that \(\textsf{cnf}(K)\) does not have a clause with only negative literals, which is satisfied if K represents TGDs. Also \(\textsf{cnf}(Q)\) is not allowed to have a clause with only negative literals. By precondition (2) all the free variables \(\mathcal {X}\) must occur in all those clauses of \(\textsf{cnf}(\lnot Q)\) that only have negative literals, which follows if Q meets condition (3.) of the VGTrangerestriction (Definition 7). By precondition (3) for all clauses C in \(\textsf{cnf}(\lnot Q)\) it must hold that \( \mathcal {V}ar (C) \cap \mathcal {X}\subseteq \mathcal {V}ar ^{}(C)\). A sufficient condition for Q to meet all these preconditions is that \(\textsf{dnf}(Q)\) has a purely existential quantifier prefix and a matrix with only positive literals where each query variable, i.e., member of \(\mathcal {X}\), occurs in each conjunctive clause.
3.4 Proving RangeRestricted Interpolation – The Hyper Property
We will prove Theorem 10 by showing how the claimed interpolants can be obtained with CTIF. As a preparatory step we match items from the specification of CTIF (Fig. 2) with the constraints of rangerestriction. The following notion gathers intermediate formulas and sets of symbols of CTIF.
Definition 11
An interpolation context is a tuple , where F, G are formulas, \(F', G'\) are clausal formulas, \(\mathcal {C}\) is a set of constants, \(\mathcal {F}, \mathcal {G}\) are sets of functions, and \(\mathcal {E},\mathcal {U} , \mathcal {V}\) are sets of terms such that the following holds. (i) \(F \models G\). (ii) Let \(F_c\) and \(G_c\) be F and G after replacing each free variable with a dedicated fresh constant. Let \(\mathcal {C}\) be those constants that were used there to replace a variable that occurs in both F and G. \(F'\) and \(G'\) are the matrices of \(\textsf{cnf}(F_c)\) and of \(\textsf{cnf}(\lnot G_c)\), after replacing existentially quantified variables with Skolem terms. (iii) \(\mathcal {F}\) is the union of the set of the Skolem functions introduced for existential quantifiers of \(\textsf{cnf}(F_c)\), the set of functions occurring in \(F_c\) but not in \(G_c\) and, possibly, further functions freshly introduced in the grounding step of CTIF. Analogously, \(\mathcal {G}\) is the union of the set of the Skolem functions introduced for \(\textsf{cnf}(\lnot G_c)\), the set of functions occurring in \(G_c\) but not in \(F_c\), and, possibly, further functions introduced in grounding. (iv) \(\mathcal {E}\) and \(\mathcal {U}\) are the sets of all terms with outermost function symbol in \(\mathcal {F}\) and \(\mathcal {G}\), respectively. (v) \(\mathcal {V}\) is \(\mathcal {E} \cup \mathcal {U} \cup \mathcal {C}\).
The following statements about an interpolation context are easy to infer.
Lemma 12
Let be an interpolation context. Then (i) No member of \(\mathcal {G}\) occurs in \(F'\). (ii) No member of \(\mathcal {F}\) occurs in \(G'\). (iii) If F is Urangerestricted, then for all clauses C in \(F'\) it holds that if a variable occurs in C in a position that is not within an \(\mathcal {E}\)term it occurs in C in a negative literal, in a position that is not within an \(\mathcal {E}\)term. (iv) If \(\lnot G\) is Urangerestricted, then for all clauses C in \(G'\) it holds that if a variable occurs in C in a position that is not within an \(\mathcal {U}\)term, it occurs in C in a negative literal, in a position that is not within an \(\mathcal {U}\)term. (v) If G satisfies condition (3) of Theorem 10.iii, then for all clauses C in \(G'\) it holds that any member of \(\mathcal {C}\) that occurs in C in a position that is not within an \(\mathcal {U}\)term occurs in C in a negative literal in a position that is not within an \(\mathcal {U}\)term.
CTIF involves conversion of terms to variables at lifting (step 7) and at replacing placeholder constants (step 8). We introduce a notation to identify those terms that will be converted there to variables. It mimics the notation for the set of free variables of a formula but applies to a set of terms, those with occurrences that are “maximal” with respect to a given set S of terms, i.e., are not within another term from S. For NNF formulas F define \(S\hbox {} \mathcal {M}ax (F)\) as the set of Sterms that occur in F in a position other than as subterm of another Sterm. Define \(S\hbox {} \mathcal {M}ax ^{+}(F)\) (\(S\hbox {} \mathcal {M}ax ^{}(F)\), respectively) as the set of Sterms that occur in F in a positive (negative, respectively) literal in a position other than as subterm of another Sterm. We can now conclude from Lemma 12 the following properties of instances of clauses used for interpolant construction.
Lemma 13
Let be an interpolation context. Then

(i)
If F is Urangerestricted, then for all instances C of a clause in \(F'\) it holds that \(\mathcal {V}\hbox {} \mathcal {M}ax (C) \cap \mathcal {U} \subseteq \mathcal {V}\hbox {} \mathcal {M}ax ^{}(C)\).

(ii)
If \(\lnot G\) is Urangerestricted, then for all instances C of a clause in \(G'\) it holds that \(\mathcal {V}\hbox {} \mathcal {M}ax (C) \cap \mathcal {E} \subseteq \mathcal {V}\hbox {} \mathcal {M}ax ^{}(C)\).

(iii)
If condition (1) of Theorem 10.iii holds, then no instance C of a clause in \(F'\) has only negative literals.

(iv)
If condition (2) of Theorem 10.iii holds, then for all instances C of a clause in \(G'\) with only negative literals it holds that \(\mathcal {C} \subseteq \mathcal {V}\hbox {} \mathcal {M}ax ^{}(C)\).

(v)
If \(\lnot G\) is Urangerestricted and condition (3) of Theorem 10.iii holds, then for all instances C of a clause in \(G'\) it holds that \(\mathcal {V}\hbox {} \mathcal {M}ax (C) \cap \mathcal {C} \subseteq \mathcal {V}\hbox {} \mathcal {M}ax ^{}(C)\).
The following proposition adapts Props. 6.v and 6.vi to \(S\hbox {}{ \mathcal {M}ax }\).
Proposition 14
Let \(F_1, F_2,\) \(\ldots , F_n\) be NNF formulas and let T be a set of terms. Then (i) If S is a set of terms such that for all \(i \in \{1,\ldots ,n\}\) and clauses C in \(\textsf{cnf}(F_i)\) it holds that \(T\hbox {} \mathcal {M}ax (C) \cap S \subseteq T\hbox {} \mathcal {M}ax ^{}(C)\), then for all clauses C in \(\textsf{cnf}(\bigvee _{i=1}^{n} F_i)\) it holds that \(T\hbox {} \mathcal {M}ax (C) \cap S \subseteq T\hbox {} \mathcal {M}ax ^{}(C)\). (ii) If S is a set of terms such that for all \(i \in \{1,\ldots ,n\}\) and conjunctive clauses D in \(\textsf{dnf}(F_i)\) it holds that \(T\hbox {} \mathcal {M}ax (D) \cap S \subseteq T\hbox {} \mathcal {M}ax ^{+}(D)\), then for all conjunctive clauses D in \(\textsf{dnf}(\bigwedge _{i=1}^{n} F_i)\) it holds that \(T\hbox {} \mathcal {M}ax (D) \cap S \subseteq T\hbox {} \mathcal {M}ax ^{+}(D)\).
The key to obtain rangerestricted interpolants from CTIF is that the tableau must have a specific form, which we call hyper, as it resembles proofs by hyperresolution [46] and hypertableaux [2].
Definition 15
A clausal tableau is called hyper if the nodes labeled with a negative literal are exactly the leaf nodes.
While hyperresolution and related approaches, e.g., [2, 3, 11, 36, 46], consider DAGshaped proofs with nonrigid variables, aiming at interpolant extraction we consider the hyper property for treeshaped proofs with rigid variables. The hyper requirement is w.l.o.g. because arbitrary closed clausal tableaux can be converted to tableaux with the hyper property, as we will see in Sect. 5.
The proof of Theorem 10 is based on three properties that invariantly hold for all nodes, or for all inner nodes, respectively, stated in the following lemma.
Lemma 16
Let be an interpolation context and assume a leafclosed and hyper twosided clausal ground tableau for \(F'\) and \(G'\).

(i)
If F is Urangerestricted, then for all nodes N the property \(\textsf{INV}_{\textsf{C}}(N)\) defined as follows holds: \(\textsf{INV}_{\textsf{C}}(N)\) For all clauses C in \(\textsf{cnf}(\textsf{ipol}(N))\) it holds that \(\mathcal {V}\hbox {} \mathcal {M}ax (C) \cap \mathcal {U} \subseteq \mathcal {V}\hbox {} \mathcal {M}ax ^{}(C) \cup \mathcal {V}\hbox {}{ \mathcal {M}ax }^{+}(\textsf{path}_{\textsf{F}}(N)).\)

(ii)
If \(\lnot G\) is Urangerestricted, then for all nodes N the property \(\textsf{INV}_{\textsf{D}}(N)\) defined as follows holds: \(\textsf{INV}_{\textsf{D}}(N)\) For all conjunctive clauses D in \(\textsf{dnf}(\textsf{ipol}(N))\) it holds that \(\mathcal {V}\hbox {}{ \mathcal {M}ax }(D) \cap \mathcal {E} \subseteq \mathcal {V}\hbox {}{ \mathcal {M}ax }^{+}(D) \cup \mathcal {V}\hbox {}{ \mathcal {M}ax }^{+}(\textsf{path}_{\textsf{G}}(N)).\)

(iii)
If \(\lnot G\) is Urangerestricted and conditions (1)–(3) Theorem 10.iii hold, then for all inner nodes N the property \(\textsf{INV}_{\textsf{X}}(N)\) defined as follows holds: \(\textsf{INV}_{\textsf{X}}(N)\) For all conjunctive clauses D in \(\textsf{dnf}(\textsf{ipol}(N))\) it holds that \(\mathcal {C} \subseteq \mathcal {V}\hbox {}{ \mathcal {M}ax }^{+}(D) \cup \mathcal {V}\hbox {}{ \mathcal {M}ax }^{+}(\textsf{path}_{\textsf{G}}(N)).\)
Each of Lemma 16.i, 16.ii and 16.iii can be proven independently by an induction on the tableau structure, but for the same tableau, such that the properties claimed by them can be combined. In proving these three sublemmas it is sufficient to use their respective preconditions only to justify the application of matching sublemmas of Lemma 13. That lemma might thus be seen as an abstract interface that delivers everything that depends on these preconditions and is relevant for Theorem 10.
We show here the proof of Lemma 16.i. Lemma 16.ii can be proven in full analogy. The proof of Lemma 16.iii is deferred to [63, App. A]. In general, recall that the tableau in Lemma 16 is a twosided tableau for \(F'\) and \(G'\) that is leafclosed and hyper. Hence literal labels of leaves are negative, while those of inner nodes are positive. All tableau clauses are ground and with an associated side in \(\{\textsf{F},\textsf{G}\}\) such that a tableau clause with side \(\textsf{F}\) is an instance of a clause in \(F'\) and one with side \(\textsf{G}\) is an instance of a clause in \(G'\).
Proof
(Lemma 16.i). By induction on the tableau structure.
Base case where N is a leaf. If N and \(\textsf{tgt}(N)\) have the same side, then \(\textsf{ipol}(N)\) is a truth value constant, hence \(\mathcal {V}\hbox {}{ \mathcal {M}ax }(\textsf{ipol}(N)) =\emptyset \), implying \(\textsf{INV}_{\textsf{C}}(N)\). If N has side \(\textsf{F}\) and \(\textsf{tgt}(N)\) has side \(\textsf{G}\), then \(\textsf{ipol}(N) = \textsf{lit}(N)\), which, because N is a leaf, is a negative literal. Thus \(\mathcal {V}\hbox {}{ \mathcal {M}ax }(\textsf{ipol}(N)) = \mathcal {V}\hbox {}{ \mathcal {M}ax }^{}(\textsf{ipol}(N))\), which implies \(\textsf{INV}_{\textsf{C}}(N)\). If N has side \(\textsf{G}\) and \(\textsf{tgt}(N)\) has side \(\textsf{F}\), then \(\textsf{ipol}(N) = \textsf{lit}(\textsf{tgt}(N))\), which, because N is a leaf, is a positive literal. Thus \(\mathcal {V}\hbox {}{ \mathcal {M}ax }(\textsf{ipol}(N)) \subseteq \mathcal {V}\hbox {}{ \mathcal {M}ax }^{+}(\textsf{path}_{\textsf{F}}(N))\), implying \(\textsf{INV}_{\textsf{C}}(N)\).
Induction Step. Let \(N_1, \ldots , N_n\), where \(1 \le n\), be the children of N. Assume as induction hypothesis that for \(i \in \{1,\ldots ,n\}\) it holds that \(\textsf{INV}_{\textsf{C}}(N_i)\). Consider the case where the side of the children is \(\textsf{F}\). Then

(1)
\(\textsf{ipol}(N) = \bigvee _{i = 1}^{n} \textsf{ipol}(N_i)\).
Assume that \(\textsf{INV}_{\textsf{C}}(N)\) does not hold. Then there exists a clause K in \(\textsf{cnf}(\textsf{ipol}(N))\) and a term t such that (2) \(t \in \mathcal {U}\); (3) \(t \in \mathcal {V}\hbox {}{ \mathcal {M}ax }(K)\); (4) \(t \notin \mathcal {V}\hbox {}{ \mathcal {M}ax }^{}(K)\); (5) \(t \notin \mathcal {V}\hbox {}{ \mathcal {M}ax }^{+}(\textsf{path}_{\textsf{F}}(N))\). To derive a contradiction, we first show that given (2), (4) and (5) it holds that

(6)
For all children \(N'\) of N: \(t \notin \mathcal {V}\hbox {}{ \mathcal {M}ax }^{+}(\textsf{path}_{\textsf{F}}(N'))\).
Statement (6) can be proven as follows. Assume to the contrary that there is a child \(N'\) of N such that \(t \in \mathcal {V}\hbox {}{ \mathcal {M}ax }^{+}(\textsf{path}_{\textsf{F}}(N'))\). By (5) it follows that \(t \in \mathcal {V}\hbox {}{ \mathcal {M}ax }(\textsf{lit}(N'))\) and \(\textsf{lit}(N')\) is positive. By Lemma 13.i and (2) there is another child \(N''\) of N such that \(\textsf{lit}(N'')\) is negative and \(t \in \mathcal {V}\hbox {}{ \mathcal {M}ax }(\textsf{lit}(N''))\). Since the tableau is closed, it follows from (5) that \(\textsf{tgt}(N'')\) has side \(\textsf{G}\), which implies that \(\textsf{ipol}(N'') = \textsf{lit}(N'')\). Hence \(t \in \mathcal {V}\hbox {}{ \mathcal {M}ax }(\textsf{ipol}(N''))\). Since \(\textsf{ipol}(N'')\) is a negative literal and a disjunct of \(\textsf{ipol}(N)\), it follows from (1) and Prop. 6.iii that for all clauses C in \(\textsf{cnf}(\textsf{ipol}(N))\) it holds that \(t \in \mathcal {V}\hbox {}{ \mathcal {M}ax }^{}(C)\), contradicting assumption (4). Hence (6) must hold.
From (6), (2) and the induction hypothesis it follows that for all children \(N'\) of N and clauses \(C'\) in \(\textsf{cnf}(\textsf{ipol}(N'))\) it holds that \(\mathcal {V}\hbox {}{ \mathcal {M}ax }(C') \cap \{t\} \subseteq \mathcal {V}\hbox {}{ \mathcal {M}ax }^{}(C')\). Hence, by (1) and Prop. 14.i it follows that for all clauses C in \(\textsf{cnf}(\textsf{ipol}(N))\) it holds that \(\mathcal {V}\hbox {}{ \mathcal {M}ax }(C) \cap \{t\} \subseteq \mathcal {V}\hbox {}{ \mathcal {M}ax }^{}(C)\). This, however, contradicts our assumption of the existence of a clause K in \(\textsf{cnf}(\textsf{ipol}(N))\) that satisfies (3) and (4). Hence \(\textsf{INV}_{\textsf{C}}(N)\) must hold.
We conclude the proof of the induction step for \(\textsf{INV}_{\textsf{C}}(N)\) by considering the case where the side of the children of N is \(\textsf{G}\). Then

(7)
\(\textsf{ipol}(N) = \bigwedge _{i = 1}^{n} \textsf{ipol}(N_i)\).

(8)
For all children \(N'\) of N: \(\textsf{path}_{\textsf{F}}(N) = \textsf{path}_{\textsf{F}}(N')\).
\(\textsf{INV}_{\textsf{C}}(N)\) follows from the induction hypothesis, (8), (7) and Prop. 6.i. \(\square \)
The invariant properties of tableau nodes shown in Lemmas 16.i–16.iii apply in particular to the tableau root. We now apply this to prove Theorem 10.
Proof
(Theorem 10). Interpolants with the stated properties are obtained with CTIF, assuming w.l.o.g. that the CNF computed in step 2 meets the requirement of Sect. 3.1, and that the closed clausal tableau computed in step 3 is leafclosed and has the hyper property. That CTIF constructs a CraigLyndon interpolant has been shown in [62]. It remains to show the further claimed properties of the interpolant. Let be the interpolation context for the input formulas F and G and let \(N_0\) be the root of the tableau computed in step 3. Since \(N_0\) is the root, \(\textsf{path}_{\textsf{F}}(N_0) = \textsf{path}_{\textsf{G}}(N_0) = \top \) and thus the expressions \(\mathcal {V}\hbox {}{ \mathcal {M}ax }^{+}(\textsf{path}_{\textsf{F}}(N_0))\) and \(\mathcal {V}\hbox {}{ \mathcal {M}ax }^{+}(\textsf{path}_{\textsf{G}}(N_0))\) in the specifications of \(\textsf{INV}_{\textsf{C}}(N_0)\), \(\textsf{INV}_{\textsf{D}}(N_0)\) and \(\textsf{INV}_{\textsf{X}}(N_0)\) all denote the empty set. The claims made in the particular subtheorems can then be shown as follows.
(10.i) By Lemma 16.i it follows that \(\textsf{INV}_{\textsf{C}}(N_0)\). Hence, for all clauses C in \(\textsf{cnf}(\textsf{ipol}(N_0))\) it holds that \(\mathcal {V}\hbox {}{ \mathcal {M}ax }(C) \cap \mathcal {U} \subseteq \mathcal {V}\hbox {}{ \mathcal {M}ax }^{}(C)\). It follows that the result of the interpolant lifting (step 7) of CTIF applied to \(\textsf{ipol}(N_0)\) is Urangerestricted. Placeholder constant replacement (step 8) does not alter this.
(10.ii) As for Theorem 10.i it follows that for all clauses C in \(\textsf{cnf}(\textsf{ipol}(N_0))\) it holds that \(\mathcal {V}\hbox {}{ \mathcal {M}ax }(C) \cap \mathcal {U} \subseteq \mathcal {V}\hbox {}{ \mathcal {M}ax }^{}(C)\). By Lemma 16.ii it follows that \(\textsf{INV}_{\textsf{D}}(N_0)\). Hence, for all conjunctive clauses D in \(\textsf{dnf}(\textsf{ipol}(N_0))\) it holds that \(\mathcal {V}\hbox {}{ \mathcal {M}ax }(D) \cap \mathcal {E} \subseteq \mathcal {V}\hbox {}{ \mathcal {M}ax }^{+}(D)\). It follows that the result of the interpolant lifting of CTIF applied to \(\textsf{ipol}(N_0)\) is Urangerestricted. Since F and G have no free variables, placeholder constant replacement has no effect.
(10.iii) As for Theorem 10.ii it follows that for all clauses C in \(\textsf{cnf}(\textsf{ipol}(N_0))\) it holds that \(\mathcal {V}\hbox {}{ \mathcal {M}ax }(C) \cap \mathcal {U} \subseteq \mathcal {V}\hbox {}{ \mathcal {M}ax }^{}(C)\) and for all conjunctive clauses D in \(\textsf{dnf}(\textsf{ipol}(N_0))\) it holds that \(\mathcal {V}\hbox {}{ \mathcal {M}ax }(D) \cap \mathcal {E} \subseteq \mathcal {V}\hbox {}{ \mathcal {M}ax }^{+}(D)\). By Lemma 16.iii it follows that \(\textsf{INV}_{\textsf{X}}(N_0)\). Hence, for all conjunctive clauses D in \(\textsf{dnf}(\textsf{ipol}(N_0))\) it holds that \(\mathcal {C} \subseteq \mathcal {V}\hbox {}{ \mathcal {M}ax }^{+}(D)\). It follows that the result of the interpolant lifting of CTIF applied to \(\textsf{ipol}(N_0)\) followed by placeholder constant replacement, now applied to \(\mathcal {C}\), is VGTrangerestricted. \(\square \)
4 Horn Interpolation
A Horn clause is a clause with at most one positive literal. A Horn formula is built up from Horn clauses with the connectives \(\wedge \), \(\exists \) and \(\forall \). Horn formulas are important in countless theoretical and practical respects. Our interpolation method on the basis of clausal tableaux with the hyper property can be applied to obtain a Horn interpolant under the precondition that the first argument formula F of the interpolation problem is Horn. The following theorem makes this precise. It can be proven by an induction on the structure of a clausal tableau with the hyper property (see [63, App. B]).
Theorem 17
(Interpolation from a Horn Formula). Let F be a Horn formula and let G be a formula such that \(F\models G.\) Then there exists a CraigLyndon interpolant H of F and G that is a Horn formula. Moreover, H can be effectively constructed from a clausal tableau proof of \(F \models G\).
An apparently weaker property than Theorem 17 has been shown in [38, § 4] with techniques from model theory: For two universal Horn formulas F and G there exists a universal Horn formula that is like a Craig interpolant, except that function symbols are not constrained. A universal Horn formula is there a prenex formula with only universal quantifiers and a Horn matrix. For CTIF, the corresponding strengthening of the interpolant to a universal formula can be readoff from the specification of interpolant lifting (step 7 in Fig. 2).
The following corollary shows that Theorem 17 can be combined with Theorem 10 to obtain interpolants that are both Horn and rangerestricted.
Corollary 18
(RangeRestricted Horn Interpolants). Theorems 10.i, 10.ii and 10.iii can be strengthened: If F is a Horn formula, then there exists a CraigLyndon interpolant H with the properties shown in the respective theorem and the additional property that it is Horn. Moreover, H can be effectively constructed from a clausal tableau proof of \(F \models G\).
Proof
Can be shown by combining the proof of Theorem 10.i, 10.ii and 10.iii , respectively, with the proof of interpolation from a Horn sentence, Theorem 17. The combined proofs are based on inductions on the same closed tableau with the hyper property. \(\square \)
5 Obtaining Proofs with the Hyper Property
Our new interpolation theorems, Theorems 10 and 17, depend on the hyper property of the underlying closed clausal tableaux from which interpolants are extracted. We present a proof transformation that converts any closed clausal tableau to one with the hyper property. The transformation can be applied to a clausal tableau as obtained directly from a clausal tableaux prover. Moreover, it can be also be indirectly applied to a resolution proof. To this end, the resolution deduction tree [12] of the binary resolution proof is first translated to a closed clausal ground tableau in cut normal form [31, Sect. 7.22]. There the inner clauses are atomic cuts, tautologies of the form \(\lnot p(t_1,\ldots ,t_n) \vee p(t_1,\ldots ,t_n)\) or \(p(t_1,\ldots ,t_n) \vee \lnot p(t_1,\ldots ,t_n)\), corresponding to literals upon which a (tree) resolution step has been performed. Clauses of nodes whose children are leaves are instances of input clauses. Our hyper conversion can then be applied to the tableau in cut normal form. It is easy to see that a regular leafclosed tableau with the hyper property can not have atomic cuts. Hence the conversion might be viewed as an elimination method for these cuts.
We specify the hyper conversion in Fig. 3 as a procedure that destructively manipulates a tableau. A fresh copy of an ordered tree T is there an ordered tree \(T^\prime \) with fresh nodes and edges, related to T through a bijection c such that any node N of T has the same labels (literal label and side label) as node c(N) of \(T^\prime \) and such that the ith edge originating in node N of T ends in node M if and only if the ith edge originating in node c(N) of \(T^\prime \) ends in node c(M). The procedure is performed as an iteration that in each round chooses an inner node with negative literal label and then modifies the tableau. Hence, at termination there is no inner node with negative literal, which means that the tableau is hyper. Termination of the procedure can be shown with a measure that strictly decreases in each round (Prop. 20 in [63, App. C]). Figures 4 and 5 show example applications of the procedure.
Since the hyper conversion procedure copies parts of subtrees it is not a polynomial operation.^{Footnote 4} To get an idea of its practical feasibility, we experimented with an unbiased set of proofs of miscellaneous problems. For this we took those 112 CASCJ11 [54] problems that could be proven with Prover9 [37] in 400 s per problem, including a basic proof conversion with Prover9 ’s tool Prooftrans.^{Footnote 5} The hyper conversion succeeded on 107 (or 96%) of these, given 400 s timeout per proof, where the actual median of used time was only 0.01 s. It was applied to a tableau in cut normal form that represents the proof tree of Prover9 ’s proof. The two intermediate steps, translation of paramodulation to binary resolution and expansion to cut normal form, succeeded in fractions of a second, except for one case where the expansion took 121 s and two cases where it failed due to memory exhaustion. The hyper conversion then failed in three further cases. For all except two proofs the hyper conversion reduced the proof size, where the overall median of the size ratio hypertoinput was 0.39. See [63, App. D] for details.
6 Conclusion
We conclude with discussing related work, open issues and perspectives. Our interpolation method CTIF [62] is complete for firstorder logic with function symbols. Vampire ’s native interpolation [22, 23], targeted at verification, is like all local methods incomplete [28]. Princess [10, 47] implements interpolation with a sequent calculus that supports theories for verification and permits uninterpreted predicates and functions. Suitable proofs for our approach can currently be obtained from CMProver (clausal tableaux) and Prover9 (resolution/paramodulation). With optimized settings, Vampire [27] and E [49] as of today only output proofs with gaps. This seems to improve [48] or might be overcome by reproving with Prover9 using lemmas from the more powerful systems.
So far we did not address special handling of equality in the context of rangerestriction, a topic on its own, e.g., [3, 59]. We treat it as predicate, with axioms for reflexivity, symmetry, transitivity and substitutivity. CTIF works smoothly with these, respecting polarity constraints of equality in interpolants [62, Sect. 10.4]. With exception of reflexivity these axioms are Urangerestricted. We do not interfere with the provers’ equality handling and just translate in finished proofs paramodulation into binary resolution with substitutivity axioms.
The potential bottleneck of conversion to clausal form in CTIF may be remedied with structurepreserving (aka definitional) normal forms [19, 44, 50, 58].
Our hyper property might be of interest for proof presentation and exchange, since it gives the proof tree a constrained shape and in experiments often shortens it. Like hyperresolution and hypertableaux it can be generalized to take a “semantics” into account [51] [12, Chap. 6] [26, Sect. 4.5]. To shorten interpolants, it might be combined with proof reductions (e.g., [64]).
For query reformulation, interpolation on the basis of general firstorder ATP was so far hardly considered. Most methods are sequent calculi [6, 56] or analytic tableaux systems [5, 21, 25, 57]. Experiments with ATP systems and propositional inputs indicate that requirements are quite different from those in verification [4]. An implemented system [25, 57] uses analytic tableaux with dedicated refinements for enumerating alternate proofs/interpolants corresponding to query plans for heuristic choice. In [5] the focus is on interpolants that are sentences respecting binding patterns, which, like rangerestriction, ensures database evaluability. Our interpolation theorems show finegrained conditions for passing variations of rangerestriction and the Horn property on to interpolants. Matching these with the many formula classes considered in knowledge representation and databases is an issue for future work. A further open topic is adapting recent synthesis techniques for nested relations [6] to the clausal tableaux proof system.
Methodically, we exemplified a way to approach operations on proof structures while taking efficient automated firstorder provers into account. Feasible implementations are brought within reach, for practical application and also for validating abstract claims and conjectures with scrutiny. The prover is a black box, given freedom on optimizations, strategy and even calculus. For interfacing, the overall setting incorporates clausification and Skolemization. Requirements on the proof structure do not hamper proof search, but are ensured by transformations applied to proofs returned by the efficient systems.
Notes
 1.
 2.
 3.
 4.
A thorough complexity analysis should take calculus or strategydependent properties of the input proofs into account. And possibly also the blowup from resolution to tree resolution underlying the cut normal form tableaux.
 5.
On a Linux notebook with 12th Gen Intel^{®} Core™ i71260P CPU and 32 GB RAM.
References
Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison Wesley, Boston (1995)
Baumgartner, P., Furbach, U., Niemelä, I.: Hyper tableaux. In: Alferes, J.J., Pereira, L.M., Orlowska, E. (eds.) JELIA 1996. LNCS, vol. 1126, pp. 1–17. Springer, Heidelberg (1996). https://doi.org/10.1007/3540616306_1
Baumgartner, P., Schmidt, R.A.: Blocking and other enhancements for bottomup model generation methods. J. Autom. Reasoning 64, 197–251 (2020). https://doi.org/10.1007/11814771_11
Benedikt, M., Kostylev, E.V., Mogavero, F., Tsamoura, E.: Reformulating queries: theory and practice. In: Sierra, C. (ed.) IJCAI 2017, pp. 837–843. ijcai.org (2017). https://doi.org/10.24963/ijcai.2017/116
Benedikt, M., Leblay, J., ten Cate, B., Tsamoura, E.: Generating Plans from Proofs: The Interpolationbased Approach to Query Reformulation. Morgan & Claypool, San Rafael (2016). https://doi.org/10.1007/9783031018565
Benedikt, M., Pradic, C., Wernhard, C.: Synthesizing nested relational queries from implicit specifications. In: PODS ’23, pp. 33–45 (2023). https://doi.org/10.1145/3584372.3588653
Bibel, W.: Automated Theorem Proving, 2nd edn. Vieweg, Braunschweig (1987). https://doi.org/10.1007/9783322901026. First edition 1982
Bibel, W., Otten, J.: From Schütte’s formal systems to modern automated deduction. In: The Legacy of Kurt Schütte, pp. 217–251. Springer, Cham (2020). https://doi.org/10.1007/9783030494247_13
Bonacina, M.P., Johansson, M.: On interpolation in automated theorem proving. J. Autom. Reasoning 54(1), 69–97 (2014). https://doi.org/10.1007/s1081701493140
Brillout, A., Kroening, D., Rümmer, P., Wahl, T.: Beyond quantifierfree interpolation in extensions of Presburger arithmetic. In: Jhala, R., Schmidt, D. (eds.) VMCAI 2011. LNCS, vol. 6538, pp. 88–102. Springer, Heidelberg (2011). https://doi.org/10.1007/9783642182754_8
Bry, F., Yahya, A.H.: Positive unit hyperresolution tableaux and their application to minimal model generation. J. Autom. Reasoning 25(1), 35–82 (2000). https://doi.org/10.1023/A:1006291616338
Chang, C.L., Lee, R.C.T.: Symbolic Logic and Automated Theorem Proving. Academic Press, Cambridge (1973)
Craig, W.: Linear reasoning. A new form of the HerbrandGentzen theorem. J. Symb. Log. 22(3), 250–268 (1957). https://doi.org/10.2307/2963593
Craig, W.: Three uses of the HerbrandGentzen theorem in relating model theory and proof theory. J. Symb. Log. 22(3), 269–285 (1957). https://doi.org/10.2307/2963594
Craig, W.: The road to two theorems of logic. Synthese 164(3), 333–339 (2008). https://doi.org/10.1007/s1122900893533
Dahn, I., Wernhard, C.: First order proof problems extracted from an article in the Mizar mathematical library. In: Bonacina, M.P., Furbach, U. (eds.) FTP’97, pp. 58–62. RISCLinz Report Series No. 97–50, Joh. Kepler Univ., Linz (1997). https://www.logic.at/ftp97/papers/dahn.pdf
Demolombe, R.: Syntactical characterization of a subset of domain independent formulas. Technical report, ONERACERT, Toulouse (1982)
Demolombe, R.: Syntactical characterization of a subset of domain independent formulas. JACM 39, 71–94 (1992). https://doi.org/10.1145/147508.147520
Eder, E.: An implementation of a theorem prover based on the connection method. In: Bibel, W., Petkoff, B. (eds.) AIMSA’84, pp. 121–128. NorthHolland (1985)
Fitting, M.: FirstOrder Logic and Automated Theorem Proving, 2nd edn. Springer, Cham (1995). https://doi.org/10.1007/9781461223603
Franconi, E., Kerhet, V., Ngo, N.: Exact query reformulation over databases with firstorder and description logics ontologies. JAIR 48, 885–922 (2013). https://doi.org/10.1613/jair.4058
Hoder, K., Holzer, A., Kovács, L., Voronkov, A.: Vinter: a Vampirebased tool for interpolation. In: Jhala, R., Igarashi, A. (eds.) APLAS 2012. LNCS, vol. 7705, pp. 148–156. Springer, Heidelberg (2012). https://doi.org/10.1007/9783642351822_11
Hoder, K., Kovács, L., Voronkov, A.: Interpolation and symbol elimination in Vampire. In: Giesl, J., Hähnle, R. (eds.) IJCAR 2010. LNCS (LNAI), vol. 6173, pp. 188–195. Springer, Heidelberg (2010). https://doi.org/10.1007/9783642142031_16
Huang, G.: Constructing Craig interpolation formulas. In: Du, D.Z., Li, M. (eds.) COCOON 1995. LNCS, vol. 959, pp. 181–190. Springer, Heidelberg (1995). https://doi.org/10.1007/BFb0030832
Hudek, A., Toman, D., Weddell, G.: On enumerating query plans using analytic tableau. In: De Nivelle, H. (ed.) TABLEAUX 2015. LNCS (LNAI), vol. 9323, pp. 339–354. Springer, Cham (2015). https://doi.org/10.1007/9783319243122_23
Hähnle, R.: Tableaux and related methods. In: Robinson, A., Voronkov, A. (eds.) Handbook of Automated Reasoning, vol. 1, chap. 3, pp. 101–178. Elsevier (2001). https://doi.org/10.1016/b9780444508133/500059
Kovács, L., Voronkov, A.: Firstorder theorem proving and Vampire. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 1–35. Springer, Heidelberg (2013). https://doi.org/10.1007/9783642397998_1
Kovács, L., Voronkov, A.: Firstorder interpolation and interpolating proof systems. In: Eiter, T., Sands, D. (eds.) LPAR21. EPiC, vol. 46, pp. 49–64. EasyChair (2017). https://doi.org/10.29007/1qb8
Letz, R.: Clausal tableaux. In: Bibel, W., Schmitt, P.H. (eds.) Automated Deduction  A Basis for Applications, vol. I, pp. 43–72. Kluwer Academic Publishers (1998)
Letz, R.: Firstorder tableau methods. In: D’Agostino, A., Gabbay, D.M., Hähnle, R., Posegga, J. (eds.) Handbook of Tableau Methods, pp. 125–196. Springer, Dordrecht (1999)
Letz, R.: Tableau and Connection Calculi. Structure, Complexity, Implementation. Habilitationsschrift, TU München (1999). http://www2.tcs.ifi.lmu.de/~letz/habil.pdf. Accessed 19 July 2023
Letz, R., Schumann, J., Bayerl, S., Bibel, W.: SETHEO: a highperformance theorem prover. J. Autom. Reasoning 8(2), 183–212 (1992). https://doi.org/10.1007/BF00244282
Letz, R., Stenz, G.: Model elimination and connection tableau procedures. In: Robinson, A., Voronkov, A. (eds.) Handbook of Automated Reasoning, vol. 1, pp. 2015–2114. Elsevier (2001)
Loveland, D.W.: Automated Theorem Proving: A Logical Basis. NorthHolland, Amsterdam (1978)
Lyndon, R.: An interpolation theorem in the predicate calculus. Pac. J. Math. 9, 129–142 (1959). https://doi.org/10.2140/pjm.1959.9.129
Manthey, R., Bry, F.: SATCHMO: A theorem prover implemented in Prolog. In: Lusk, E., Overbeek, R. (eds.) CADE 1988. LNCS, vol. 310, pp. 415–434. Springer, Heidelberg (1988). https://doi.org/10.1007/BFb0012847
McCune, W.: Prover9 and Mace4 (2005–2010). http://www.cs.unm.edu/~mccune/prover9. Accessed 19 July 2023
McNulty, G.F.: Fragments of first order logic, I: universal Horn logic. J. Symb. Log. 42(2), 221–237 (1977). https://doi.org/10.2307/2272123
Nash, A., Segoufin, L., Vianu, V.: Views and queries: determinacy and rewriting. ACM Trans. Database Syst. 35(3), 1–41 (2010). https://doi.org/10.1145/1806907.1806913
Nicolas, J.M.: Logics for improving integrity checking in relational data bases. Technical report, ONERACERT, Toulouse (1979)
Nicolas, J.M.: Logics for improving integrity checking in relational data bases. Acta Informatica 18(3), 227–253 (1982). https://doi.org/10.1007/BF00263192
Otten, J.: Restricting backtracking in connection calculi. AI Commun. 23(2–3), 159–182 (2010). https://doi.org/10.3233/AIC20100464
Otten, J., Bibel, W.: leanCoP: lean connectionbased theorem proving. J. Symb. Comput. 36(1–2), 139–161 (2003). https://doi.org/10.1016/S07477171(03)000373
Plaisted, D.A., Greenbaum, S.: A structurepreserving clause form translation. J. Symb. Comput. 2, 293–304 (1986). https://doi.org/10.1016/S07477171(86)800281
Rawson, M., Wernhard, C., Zombori, Z., Bibel, W.: Lemmas: generation, selection, application. In: Ramanayake, R., Urban, J. (eds.) TABLEAUX 2023. LNCS (LNAI), vol. 14278, pp. 153–174. Springer, Heidelberg (2023). https://doi.org/10.1007/9783031435133_9
Robinson, J.A.: Automatic deduction with hyperresolution. Int. J. Comput. Math. 1(3), 227–234 (1965)
Rümmer, P.: A constraint sequent calculus for firstorder logic with linear integer arithmetic. In: Cervesato, I., Veith, H., Voronkov, A. (eds.) LPAR 2008. LNCS (LNAI), vol. 5330, pp. 274–289. Springer, Heidelberg (2008). https://doi.org/10.1007/9783540894391_20
Schulz, S.: Credo Quia absurdum (?) – proof generation and challenges of proof generation. In: PAMLTP/DG4D\(^3\) (2023), workshop presentation. https://europroofnet.github.io/_pages/WG5/Prague23/pres/Schulz.pdf
Schulz, S., Cruanes, S., Vukmirović, P.: Faster, higher, stronger: E 2.3. In: Fontaine, P. (ed.) CADE 2019. LNCS (LNAI), vol. 11716, pp. 495–507. Springer, Cham (2019). https://doi.org/10.1007/9783030294366_29
Scott, D.: A decision method for validity of sentences in two variables. J. Symb. Log. 27(4), 477 (1962)
Slagle, J.R.: Automatic theorem proving with renamable and semantic resolution. JACM 14(4), 687–697 (1967). https://doi.org/10.1145/321420.321428
Smullyan, R.M.: FirstOrder Logic. Springer, New York (1968). also republished with corrections by Dover publications (1995)
Stickel, M.E.: A Prolog technology theorem prover: implementation by an extended Prolog compiler. J. Autom. Reasoning 4(4), 353–380 (1988). https://doi.org/10.1007/BF00297245
Sutcliffe, G., Desharnais, M.: The 11th IJCAR automated theorem proving system competition  CASCJ11. AI Commun. (2023). https://doi.org/10.3233/AIC220244
Takeuti, G.: Proof Theory, second edn. NorthHolland (1987)
Toman, D., Weddell, G.: Fundamentals of Physical Design and Query Compilation. Morgan & Claypool, San Rafael (2011). https://doi.org/10.1007/9783031018817
Toman, D., Weddell, G.: An interpolationbased compiler and optimizer for relational queries (system design report). In: Eiter, T., Sands, D., Sutcliffe, G., Voronkov, A. (eds.) IWIL 2017 Workshop and LPAR21 Short Presentations. Kalpa, vol. 1. EasyChair (2017). https://doi.org/10.29007/53fk
Tseitin, G.S.: On the complexity of derivation in propositional calculus. In: Slisenko, A.O. (ed.) Studies in Constructive Mathematics and Mathematical Logic, vol. Part II, pp. 115–125. Steklov Mathematical Institute (1970)
Van Gelder, A., Topor, R.W.: Safety and translation of relational calculus queries. ACM Trans. Database Syst. 16(2), 235–278 (1991). https://doi.org/10.1145/114325.103712
Wernhard, C.: The PIE system for proving, interpolating and eliminating. In: Fontaine, P., Schulz, S., Urban, J. (eds.) PAAR 2016. CEUR Workshop Proc., vol. 1635, pp. 125–138. CEURWS.org (2016). http://ceurws.org/Vol1635/paper11.pdf
Wernhard, C.: Facets of the PIE environment for proving, interpolating and eliminating on the basis of firstorder logic. In: Hofstedt, P., Abreu, S., John, U., Kuchen, H., Seipel, D. (eds.) INAP/WLP/WFLP 2019. LNCS (LNAI), vol. 12057, pp. 160–177. Springer, Cham (2020). https://doi.org/10.1007/9783030467142_11
Wernhard, C.: Craig interpolation with clausal firstorder tableaux. J. Autom. Reasoning 65(5), 647–690 (2021). https://doi.org/10.1007/s10817021095903
Wernhard, C.: Rangerestricted and Horn interpolation through clausal tableaux. CoRR abs/2306.03572 (2023). https://doi.org/10.48550/arXiv.2306.03572
Wernhard, C., Bibel, W.: Learning from Łukasiewicz and Meredith: investigations into proof structures. In: Platzer, A., Sutcliffe, G. (eds.) CADE 2021. LNCS (LNAI), vol. 12699, pp. 58–75. Springer, Cham (2021). https://doi.org/10.1007/9783030798765_4
Acknowledgments
The author thanks Michael Benedikt for bringing the subtleties of rangerestriction in databases to attention, Cécilia Pradic for insights into subtleties of proof theory, and anonymous reviewers for helpful suggestions to improve the presentation.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2023 The Author(s)
About this paper
Cite this paper
Wernhard, C. (2023). RangeRestricted and Horn Interpolation through Clausal Tableaux. In: Ramanayake, R., Urban, J. (eds) Automated Reasoning with Analytic Tableaux and Related Methods. TABLEAUX 2023. Lecture Notes in Computer Science(), vol 14278. Springer, Cham. https://doi.org/10.1007/9783031435133_1
Download citation
DOI: https://doi.org/10.1007/9783031435133_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 9783031435126
Online ISBN: 9783031435133
eBook Packages: Computer ScienceComputer Science (R0)