Logical reduction of metarules
Abstract
Many forms of inductive logic programming (ILP) use metarules, second-order Horn clauses, to define the structure of learnable programs and thus the hypothesis space. Deciding which metarules to use for a given learning task is a major open problem and is a trade-off between efficiency and expressivity: the hypothesis space grows given more metarules, so we wish to use fewer metarules, but if we use too few metarules then we lose expressivity. In this paper, we study whether fragments of metarules can be logically reduced to minimal finite subsets. We consider two traditional forms of logical reduction: subsumption and entailment. We also consider a new reduction technique called derivation reduction, which is based on SLD-resolution. We compute reduced sets of metarules for fragments relevant to ILP and theoretically show whether these reduced sets are reductions for more general infinite fragments. We experimentally compare learning with reduced sets of metarules on three domains: Michalski trains, string transformations, and game rules. In general, derivation reduced sets of metarules outperform subsumption and entailment reduced sets, both in terms of predictive accuracies and learning times.
Keywords
Inductive logic programming · Meta-interpretive learning · Logical reduction · Program induction · Inductive programming

1 Introduction
In this paper, we study whether potentially infinite fragments of metarules can be logically reduced to minimal, or irreducible, finite subsets, where a fragment is a syntactically restricted subset of a logical theory (Bradley and Manna 2007).
1.1 Contributions
We describe the logical reduction problem (Sect. 3).
We describe subsumption and entailment reduction, and introduce derivation reduction, the problem of removing derivationally redundant clauses from a clausal theory (Sect. 3).
We study the decidability of the three reduction problems and show, for instance, that the derivation reduction problem is undecidable for arbitrary Horn theories (Sect. 3).
We introduce two general reduction algorithms that take a reduction relation as a parameter. We also study their complexity (Sect. 4).
We run the reduction algorithms on finite sets of metarules to identify minimal sets (Sect. 5).
We theoretically show whether infinite fragments of metarules can be logically reduced to finite sets (Sect. 5).
We experimentally compare the learning performance of Metagol when supplied with reduced sets of metarules on three domains: Michalski trains, string transformations, and game rules (Sect. 6).
2 Related work
This section describes work related to this paper, mostly work on logical reduction techniques. We first, however, describe work related to MIL and metarules.
2.1 Meta-interpretive learning
Although the study of metarules has implications for many ILP approaches (Albarghouthi et al. 2017; Campero et al. 2018; Cropper and Muggleton 2019; Emde et al. 1983; Evans and Grefenstette 2018; Flener 1996; Kaminski et al. 2018; Kietz and Wrobel 1992; Muggleton et al. 2015; De Raedt and Bruynooghe 1992; Si et al. 2018; Wang et al. 2014), we focus on meta-interpretive learning (MIL), a form of ILP based on a Prolog meta-interpreter.^{5} The key difference between a MIL learner and a standard Prolog meta-interpreter is that whereas a standard Prolog meta-interpreter attempts to prove a goal by repeatedly fetching first-order clauses whose heads unify with a given goal, a MIL learner additionally attempts to prove a goal by fetching second-order metarules, supplied as background knowledge (BK), whose heads unify with the goal. The resulting meta-substitutions are saved and can be reused in later proofs. Following the proof of a set of goals, a logic program is formed by projecting the meta-substitutions onto their corresponding metarules, allowing for a form of ILP which supports predicate invention and learning recursive theories.
Most existing work on MIL has assumed suitable metarules as input to the problem, or has used metarules without any theoretical justification. In this paper, we try to address this issue by identifying minimal sets of metarules for interesting fragments of logic, such as Datalog, from which a MIL system can theoretically learn any logic program.
2.2 Metarules
McCarthy (1995) and Lloyd (2003) advocated using second-order logic to represent knowledge. Similarly, Muggleton et al. (2012) argued that using second-order representations in ILP provides more flexible ways of representing BK compared to existing methods. Metarules are second-order Horn clauses and are used as a form of declarative bias (Nédellec et al. 1996; De Raedt 2012) to determine the structure of learnable programs which in turn defines the hypothesis space. In contrast to other forms of declarative bias, such as modes (Muggleton 1995) or grammars (Cohen 1994), metarules are logical statements that can be reasoned about, such as to reason about the redundancy of sets of metarules, which we explore in this paper.
Metarules were introduced in the Blip system (Emde et al. 1983). Kietz and Wrobel (1992) studied generality measures for metarules in the RDT system. A generality order is necessary because the RDT system searches the hypothesis space (which is defined by the metarules) in a top-down general-to-specific order. A key difference between RDT and MIL is that whereas RDT requires metarules of increasing complexity (e.g. rules with an increasing number of literals in the body), MIL derives more complex metarules through SLD-resolution. This point is important because this ability allows MIL to start from smaller sets of primitive metarules. In this paper we try to identify such primitive sets.
Using metarules to build a logic program is similar to the use of refinement operators in ILP (Nienhuys-Cheng and de Wolf 1997; Shapiro 1983) to build a definite clause literal-by-literal.^{6} As with refinement operators, it seems reasonable to ask about completeness and irredundancy of a set of metarules, which we explore in this paper.
2.3 Logical redundancy
Detecting and eliminating redundancy in a clausal theory is useful in many areas of computer science. In ILP logically reducing a theory is useful to remove redundancy from a hypothesis space to improve learning performance (Cropper and Muggleton 2014; Fonseca et al. 2004). In general, simplifying or reducing a theory often makes a theory easier to understand and use, and may also have computational efficiency advantages.
2.3.1 Literal redundancy
Plotkin (1971) used subsumption to decide whether a literal is redundant in a first-order clause. Joyner (1976) independently investigated the same problem, which he called clause condensation, where a condensation of a clause C is a minimum cardinality subset \(C'\) of C such that \(C' \models C\). Gottlob and Fermüller (1993) improved Joyner’s algorithm and also showed that determining whether a clause is condensed is co-NP-complete. In contrast to removing redundant literals, we focus on removing redundant clauses.
2.3.2 Clause redundancy
Plotkin (1971) introduced methods to decide whether a clause is subsumption redundant in a first-order clausal theory. This problem has also been extensively studied in the context of first-order logic with equality due to its application in superposition-based theorem proving (Hillenbrand et al. 2013; Weidenbach and Wischnewski 2010). The same problem, and slight variants, has been extensively studied in the propositional case (Liberatore 2005, 2008). Removing redundant clauses has numerous applications, such as to improve the efficiency of SAT (Heule et al. 2015). In contrast to these works, we focus on reducing theories formed of second-order Horn clauses (without equality), which to our knowledge has not yet been extensively explored. Another difference is that we additionally study redundancy based on SLD-derivations.
Cropper and Muggleton (2014) used Progol’s entailment-reduction algorithm (Muggleton 1995) to identify irreducible sets of metarules. Their approach removed entailment redundant clauses from sets of metarules. They identified theories that are (1) entailment complete for certain fragments of second-order Horn logic, and (2) irreducible. They demonstrated that in some cases as few as two clauses are sufficient to entail an infinite theory. However, they only considered small and highly constrained fragments of metarules. In particular, they focused on an exactly-two-connected fragment of metarules where each literal is dyadic and each first-order variable appears exactly twice in distinct literals. However, as discussed in the introduction, entailment reduction is not always the most appropriate form of reduction because it can remove metarules necessary to specialise a clause. Therefore, in this paper, we go beyond entailment reduction and introduce derivation reduction. We also consider more general fragments of metarules, such as a fragment of metarules sufficient to learn Datalog programs.
Cropper and Tourret (2018) introduced the derivation reduction problem and studied whether sets of metarules could be derivationally reduced. They considered the exactly-two-connected fragment previously considered by Cropper and Muggleton and a two-connected fragment in which every variable appears at least twice, which is analogous to our singleton-free fragment (Sect. 5.3). They used graph theoretic methods to show that certain fragments could not be completely derivationally reduced. They demonstrated on the Michalski trains dataset that the partially derivationally reduced set of metarules outperforms the entailment reduced set. In similar work Cropper and Tourret elaborated on their graph theoretic techniques and expanded the results to unconstrained resolution (Tourret and Cropper 2019).
In this paper, we go beyond the work of (Cropper and Tourret 2018) in several ways. First, we consider more general fragments of metarules, including connected and Datalog fragments. We additionally consider fragments with zero arity literals. In all cases we provide additional theoretical results showing whether certain fragments can be reduced, and, where possible, show the actual reductions. Second, Tourret and Cropper (2019) focused on derivation reduction modulo first-order variable unification, i.e. they considered the case where factorisation (Nienhuys-Cheng and de Wolf 1997) was allowed when resolving two clauses, which is not implemented in practice in current MIL systems. For this reason, although Section 5 in Tourret and Cropper (2019) and Sect. 5.1 in the present paper seemingly consider the same problem, the results are opposite to one another. Third, in addition to entailment and derivation reduction, we also consider subsumption reduction. We provide more theoretical results on the decidability of the reduction problems, such as showing a decidable case for derivation reduction (Theorem 4). Fourth, we describe the reduction algorithms and discuss their computational complexity. Finally, we corroborate the experimental results of Cropper and Tourret on Michalski’s train problem (Cropper and Tourret 2018) and provide additional experimental results on two more domains: real-world string transformations and inducing Datalog game rules from observations.
2.3.3 Theory minimisation
We focus on removing clauses from a clausal theory. A related yet distinct topic is theory minimisation where the goal is to find a minimum equivalent formula to a given input formula. This topic is often studied in propositional logic (Hemaspaandra and Schnoor 2011). The minimisation problem allows for the introduction of new clauses. By contrast, the reduction problem studied in this paper does not allow for the introduction of new clauses and instead only allows for the removal of redundant clauses.
2.3.4 Prime implicates
Implicates of a theory T are the clauses that are entailed by T and are called prime when they do not themselves entail other implicates of T. This notion differs from the subsumption and derivation reduction because it focuses on entailment, and it differs from entailment reduction because (1) the notion of a prime implicate has been studied only in propositional, first-order, and some modal logics (Bienvenu 2007; Echenim et al. 2015; Marquis 2000); (2) the generation of prime implicates allows for the introduction of new clauses in the formula.
3 Logical reduction
We now introduce the reduction problem: the problem of finding redundant clauses in a theory. We first describe the reduction problem starting with preliminaries, and then describe three instances of the problem. The first two instances are based on existing logical reduction methods: subsumption and entailment. The third instance is a new form of reduction introduced in Cropper and Tourret (2018) based on SLD-derivations.
3.1 Preliminaries
We assume familiarity with logic programming notation (Lloyd 1987) but we restate some key terminology. A clause is a disjunction of literals. A clausal theory is a set of clauses. A Horn clause is a clause with at most one positive literal. A Horn theory is a set of Horn clauses. A definite clause is a Horn clause with exactly one positive literal. A Horn clause is a Datalog clause if (1) it contains no function symbols, and (2) every variable that appears in the head of the clause also appears in a positive (i.e. not negated) literal in the body of the clause.^{7} We denote the powerset of the set S as \(2^S\).
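The Datalog condition can be phrased operationally. The following is a minimal Python sketch (an illustration, not from the paper) that checks condition (2) on a clause represented as a head atom and a list of body atoms; condition (1) holds by construction here because terms are plain strings rather than function terms:

```python
# Illustrative check of the Datalog condition on Horn clauses: every
# variable in the head must appear in a (positive) body literal.
# Atoms are (predicate, args) tuples; variables are capitalised strings.

def is_variable(term):
    return isinstance(term, str) and term[:1].isupper()

def is_datalog(head, body):
    """Condition (2): head variables must also occur in the body literals.
    Condition (1), no function symbols, holds by construction because
    terms in this representation are plain strings."""
    head_vars = {t for t in head[1] if is_variable(t)}
    body_vars = {t for _, args in body for t in args if is_variable(t)}
    return head_vars <= body_vars

# path(A,B) :- edge(A,C), path(C,B)   -- a Datalog clause
print(is_datalog(("path", ("A", "B")),
                 [("edge", ("A", "C")), ("path", ("C", "B"))]))  # True
# p(A,B) :- q(A)                      -- B is unbound, so not Datalog
print(is_datalog(("p", ("A", "B")), [("q", ("A",))]))            # False
```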
3.1.1 Metarules
Although the reduction problem applies to any clausal theory, we focus on theories formed of metarules:
Definition 1
(Metarule) A metarule is a second-order Horn clause of the form:
\(A_0 \leftarrow A_1, \dots , A_m\)
where each \(A_i\) is a literal of the form \(P(T_1,\dots ,T_n)\) in which P is either a predicate symbol or a second-order variable that can be substituted by a predicate symbol, and each \(T_j\) is either a constant symbol or a first-order variable that can be substituted by a constant symbol.
Table 1
Example metarules
Name | Metarule
---|---
\(\hbox {Ident}_1\) | \(P(A) \leftarrow Q(A)\)
\(\hbox {DIdent}_1\) | \(P(A) \leftarrow Q(A),R(A)\)
\(\hbox {Ident}_2\) | \(P(A,B) \leftarrow Q(A,B)\)
\(\hbox {DIdent}_2\) | \(P(A,B) \leftarrow Q(A,B),R(A,B)\)
Precon | \(P(A,B) \leftarrow Q(A),R(A,B)\)
Postcon | \(P(A,B) \leftarrow Q(A,B),R(B)\)
Curry | \(P(A,B) \leftarrow Q(A,B,R)\)
Chain | \(P(A,B) \leftarrow Q(A,C), R(C,B)\)
Table 1 shows a selection of metarules commonly used in the MIL literature (Cropper and Muggleton 2015, 2016a, 2019; Cropper et al. 2015; Morel et al. 2019). As Definition 1 states, metarules may include predicate and constant symbols. However, we focus on the more general case where metarules only contain variables.^{8} In addition, although metarules can be any Horn clauses, we focus on definite clauses with at least one body literal, i.e. we disallow facts, because their inclusion leads to uninteresting reductions, where in almost all such cases the theories can be reduced to a single fact.^{9} We denote the infinite set of all such metarules as \({{{\mathscr {M}}}}^{{}}_{}\). We focus on fragments of \({{{\mathscr {M}}}}^{{}}_{}\), where a fragment is a syntactically restricted subset of a theory (Bradley and Manna 2007):
Definition 2
(The fragment\({{{\mathscr {M}}}}^{{a}}_{m}\)) We denote as \({{{\mathscr {M}}}}^{{a}}_{m}\) the fragment of \({{{\mathscr {M}}}}^{{}}_{}\) where each literal has arity at most a and each clause has at most m literals in the body. We replace a by the explicit set of arities when we restrict the allowed arities further.
Example 1
\({{{\mathscr {M}}}}^{{\{2\}}}_{2}\) is a subset of \({{{\mathscr {M}}}}^{{}}_{}\) where each predicate has arity 2 and each clause has at most 2 body literals.
Example 2
\({{{\mathscr {M}}}}^{{\{2\}}}_{m}\) is a subset of \({{{\mathscr {M}}}}^{{}}_{}\) where each predicate has arity 2 and each clause has at most m body literals.
Example 3
\({{{\mathscr {M}}}}^{{\{0,2\}}}_{m}\) is a subset of \({{{\mathscr {M}}}}^{{}}_{}\) where each predicate has arity 0 or 2 and each clause has at most m body literals.
Example 4
\({{{\mathscr {M}}}}^{{a}}_{\{1,2\}}\) is a subset of \({{{\mathscr {M}}}}^{{}}_{}\) where each predicate has arity at most a and each clause has either 1 or 2 body literals.
Let T be a clausal theory. Then we say that T is in the fragment \({{{\mathscr {M}}}}^{{a}}_{m}\) if and only if each clause in T is in \({{{\mathscr {M}}}}^{{a}}_{m}\).
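Membership in a fragment is a purely syntactic check. The following Python helper is a hypothetical sketch (the names are assumptions, not from the paper) that tests whether a clause lies in \({{{\mathscr {M}}}}^{{a}}_{m}\), with the arity constraint given either as a maximum arity or as an explicit set of allowed arities, and with facts excluded as in the paper:

```python
# Hypothetical membership test for the fragment M^a_m: every literal has
# arity at most a (or arity in an explicit set), and the clause has
# between 1 and m body literals (facts are disallowed).

def in_fragment(head, body, arities, max_body):
    """`arities` is either an int (maximum arity) or a set of allowed arities."""
    def arity_ok(atom):
        n = len(atom[1])
        return n <= arities if isinstance(arities, int) else n in arities
    return (1 <= len(body) <= max_body
            and arity_ok(head)
            and all(arity_ok(b) for b in body))

chain = (("P", ("A", "B")), [("Q", ("A", "C")), ("R", ("C", "B"))])
print(in_fragment(*chain, arities={2}, max_body=2))   # True: chain is in M^{2}_2
print(in_fragment(*chain, arities={1}, max_body=2))   # False: arity-2 literals
```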
3.2 Meta-interpretive learning
In Sect. 6 we conduct experiments to see whether using reduced sets of metarules can improve learning performance. The primary purpose of the experiments is to test our claim that entailment reduction is not always the most appropriate form of reduction. Our experiments focus on MIL. For self-containment, we briefly describe MIL.
Definition 3
(MIL input) A MIL input is a tuple \((B, E^+, E^-, M)\) where:
B is a set of Horn clauses denoting background knowledge
\(E^+\) and \(E^-\) are disjoint sets of ground atoms representing positive and negative examples respectively
M is a set of metarules
The MIL problem is defined from a MIL input:
Definition 4
(MIL problem) Given a MIL input \((B, E^+, E^-, M)\), the MIL problem is to return a logic program hypothesis H such that:
\(\forall c \in H, \exists m \in M\) such that \(c=m\theta \), where \(\theta \) is a substitution that grounds all the existentially quantified variables in m
\(H \cup B \models E^{+}\)
\(H \cup B \not \models E^{-}\)
The metarules and background define the hypothesis space. To explain our experimental results in Sect. 6, it is important to understand the effect that metarules have on the size of the MIL hypothesis space, and thus on learning performance. The following result generalises previous results (Cropper and Muggleton 2016a; Lin et al. 2014):
Theorem 1
(MIL hypothesis space) Given p predicate symbols and k metarules in \({{{\mathscr {M}}}}^{{a}}_{m}\), the number of programs expressible with n clauses is at most \((p^{m+1}k)^n\).
Proof
The number of first-order clauses which can be constructed from a \({{{\mathscr {M}}}}^{{a}}_{m}\) metarule given p predicate symbols is at most \(p^{m+1}\) because for a given metarule there are at most \(m+1\) predicate variables with at most \(p^{m+1}\) possible substitutions. Therefore the set of such clauses S which can be formed from k distinct metarules in \({{{\mathscr {M}}}}^{{a}}_{m}\) using p predicate symbols has cardinality at most \(p^{m+1}k\). It follows that the number of programs which can be formed from a selection of n clauses chosen from S is at most \((p^{m+1}k)^n\). \(\square \)
Theorem 1 shows that the MIL hypothesis space increases given more metarules. The Blumer bound (Blumer et al. 1987)^{10} states that, given two hypothesis spaces, searching the smaller space will result in fewer errors than searching the larger space, assuming that the target hypothesis is in both spaces. This result suggests that removing redundant metarules should improve learning performance. We explore this idea in the rest of the paper.
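To make the growth concrete, the bound of Theorem 1 can be computed directly. The short sketch below (illustrative only, not from the paper) shows that with p = 10 predicate symbols and m = 2 body literals, doubling the number of metarules multiplies the bound by \(2^n\):

```python
# The bound of Theorem 1, (p^{m+1} k)^n, computed directly to illustrate
# how quickly the MIL hypothesis space grows with the number of metarules k.

def max_programs(p, m, k, n):
    """Upper bound on the number of programs with n clauses, given p
    predicate symbols and k metarules each with at most m body literals."""
    return (p ** (m + 1) * k) ** n

# Doubling k from 5 to 10 multiplies the bound by 2^n = 8 when n = 3:
print(max_programs(10, 2, 5, 3))    # 125000000000
print(max_programs(10, 2, 10, 3))   # 1000000000000
```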
3.3 Encapsulation
To reason about metarules (especially when running the Prolog implementations of the reduction algorithms), we use a method called encapsulation (Cropper and Muggleton 2014) to transform a second-order logic program to a first-order logic program. We first define encapsulation for atoms:
Definition 5
(Atomic encapsulation) Let A be a second-order or first-order atom of the form \(P(T_{1},..,T_{n})\). Then \(enc(A) = enc(P,T_{1},..,T_{n})\) is the encapsulation of A.
For instance, the encapsulation of the atom parent(ann,andy) is enc(parent,ann,andy). Note that encapsulation essentially ignores the quantification of variables in metarules by treating all variables, including predicate variables, as first-order universally quantified variables of the first-order enc predicate. In particular, replacing existential quantifiers with universal quantifiers on predicate variables is fine for our work because we only reason about the form of metarules, not their semantics, i.e. we treat metarules as templates for first-order clauses. We extend atomic encapsulation to logic programs:
Definition 6
(Program encapsulation) The logic program enc(P) is the encapsulation of the logic program P in the case enc(P) is formed by replacing all atoms A in P by enc(A).
For example, the encapsulation of the metarule \(P(A,B) \leftarrow Q(A,C), R(C,B)\) is \(enc(P,A,B) \leftarrow enc(Q,A,C), enc(R,C,B)\). We extend encapsulation to interpretations (Nienhuys-Cheng and de Wolf 1997) of logic programs:
Definition 7
(Interpretation encapsulation) Let I be an interpretation over the predicate and constant symbols in a logic program. Then the encapsulated interpretation enc(I) is formed by replacing each atom A in I by enc(A).
We now have the proposition:
Proposition 1
[Encapsulation models (Cropper and Muggleton 2014)] The second-order logic program P has a model M if and only if enc(P) has the model enc(M).
Proof
Follows trivially from the definitions of encapsulated programs and interpretations.
\(\square \)
We can extend the definition of entailment to logic programs:
Proposition 2
[Entailment (Cropper and Muggleton 2014)] Let P and Q be second-order logic programs. Then \(P\models Q\) if and only if every model enc(M) of enc(P) is also a model of enc(Q).
Proof
Follows immediately from Proposition 1. \(\square \)
These results allow us to reason about metarules using standard first-order logic. In the rest of the paper all the reasoning about second-order theories is performed at the first-order level. However, to aid the readability we continue to write non-encapsulated metarules in the rest of the paper, i.e. we will continue to refer to sets of metarules as second-order theories.
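As an illustration of Definitions 5 and 6, the following Python sketch (an assumed representation, not the paper's implementation) encapsulates atoms and clauses represented as (predicate, arguments) pairs:

```python
# Sketch of encapsulation: a second-order atom P(T1,..,Tn) becomes the
# first-order atom enc(P,T1,..,Tn), so predicate variables are demoted
# to ordinary term positions of the single first-order enc predicate.

def enc_atom(atom):
    pred, args = atom
    return ("enc", (pred,) + tuple(args))

def enc_clause(head, body):
    return enc_atom(head), [enc_atom(b) for b in body]

# Encapsulating the chain metarule P(A,B) <- Q(A,C), R(C,B):
head, body = enc_clause(("P", ("A", "B")),
                        [("Q", ("A", "C")), ("R", ("C", "B"))])
print(head)   # ('enc', ('P', 'A', 'B'))
print(body)   # [('enc', ('Q', 'A', 'C')), ('enc', ('R', 'C', 'B'))]
```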
3.4 Logical reduction problem
We now describe the logical reduction problem. For the clarity of the paper, and to avoid repeating definitions for each form of reduction that we consider (entailment, subsumption, and derivability), we describe a general reduction problem which is parametrised by a binary relation \(\sqsubset \) defined over any clausal theory, although in the case of derivability, \(\sqsubset \) is in fact only defined over Horn clauses. Our only constraint on the relation \(\sqsubset \) is that if \(A\sqsubset {}B\), \(A\subseteq A'\) and \(B'\subseteq B\) then \(A'\sqsubset {}B'\). We first define a redundant clause:
Definition 8
(\(\sqsubset \)-redundant clause) The clause C is \(\sqsubset \)-redundant in the clausal theory \(T \cup \{C\}\) whenever \(T \sqsubset \{C\}\).
In a slight abuse of notation, we allow Definition 8 to also refer to a single clause, i.e. in our notation \(T \sqsubset C\) means the same as \(T \sqsubset \{C\}\). We define a reduced theory:
Definition 9
(\(\sqsubset \)-reduced theory) A clausal theory is \(\sqsubset \)-reduced if and only if it is finite and it does not contain any \(\sqsubset \)-redundant clauses.
We define the input to the reduction problem:
Definition 10
(\(\sqsubset \)-reduction input) A reduction input is a pair (T, \(\sqsubset \)) where T is a clausal theory and \(\sqsubset \) is a binary relation over a clausal theory.
Note that a reduction input may (and often will) be an infinite clausal theory. We define the reduction problem:
Definition 11
(\(\sqsubset \)-reduction problem) Let (T, \(\sqsubset \)) be a reduction input. Then the \(\sqsubset \)-reduction problem is to find a finite theory \(T' \subseteq T\) such that (1) \(T' \sqsubset T\) (i.e. \(T' \sqsubset C\) for every clause C in T), and (2) \(T'\) is \(\sqsubset \)-reduced. We call \(T'\) a \(\sqsubset \)-reduction.
Although the input to a \(\sqsubset \)-reduction problem may contain an infinite theory, the output (a \(\sqsubset \)-reduction) must be a finite theory. We also introduce a variant of the \(\sqsubset \)-reduction problem where the reduction must obey certain syntactic restrictions:
Definition 12
(\({{{\mathscr {M}}}}^{{a}}_{m}\)-\(\sqsubset \)-reduction problem) Let (T,\(\sqsubset \),\({{{\mathscr {M}}}}^{{a}}_{m}\)) be a triple, where the first two elements are as in a standard reduction input and \({{{\mathscr {M}}}}^{{a}}_{m}\) is a target reduction theory. Then the \({{{\mathscr {M}}}}^{{a}}_{m}\)-\(\sqsubset \)-reduction problem is to find a finite theory \(T' \subseteq T\) such that (1) \(T'\) is a \(\sqsubset \)-reduction of T, and (2) \(T'\) is in \({{{\mathscr {M}}}}^{{a}}_{m}\).
3.5 Subsumption reduction
The first form of reduction we consider is based on subsumption, which, as discussed in Sect. 2, is often used to eliminate redundancy in a clausal theory:
Definition 13
(Subsumption) A clause C subsumes a clause D, denoted as \(C \preceq D\), if there exists a substitution \(\theta \) such that \(C\theta \subseteq D\).
Note that if a clause C subsumes a clause D then \(C \models D\) (Robinson 1965). However, if \(C \models D\) then it does not necessarily follow that \(C \preceq D\). Subsumption is therefore weaker than entailment. Whereas checking entailment between clauses is undecidable (Church 1936), Robinson (1965) showed that checking subsumption between clauses is decidable [although deciding subsumption is in general NP-complete (Nienhuys-Cheng and de Wolf 1997)].
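Subsumption can be decided by a backtracking search over candidate substitutions. The following Python sketch (illustrative; the representation is an assumption, with negation folded into the predicate name, e.g. `-q`) checks whether one clause \(\theta \)-subsumes another:

```python
# Brute-force theta-subsumption: C subsumes D if some substitution theta
# maps every literal of C onto a literal of D. Literals are (predicate,
# args) tuples; variables are capitalised strings.

def is_var(t):
    return t[:1].isupper()

def unify_atom(c_atom, d_atom, theta):
    """Try to extend theta so that c_atom under theta equals d_atom."""
    if c_atom[0] != d_atom[0] or len(c_atom[1]) != len(d_atom[1]):
        return None
    theta = dict(theta)  # copy: each branch extends its own substitution
    for s, t in zip(c_atom[1], d_atom[1]):
        if is_var(s):
            if theta.setdefault(s, t) != t:
                return None
        elif s != t:
            return None
    return theta

def subsumes(c, d, theta=None):
    """True if clause c (a list of literals) theta-subsumes clause d."""
    theta = theta or {}
    if not c:
        return True
    first, rest = c[0], c[1:]
    return any(t2 is not None and subsumes(rest, d, t2)
               for t2 in (unify_atom(first, d_atom, theta) for d_atom in d))

# p(A,B) <- q(A,B) subsumes p(x,y) <- q(x,y), r(x) with {A/x, B/y}:
c = [("p", ("A", "B")), ("-q", ("A", "B"))]
d = [("p", ("x", "y")), ("-q", ("x", "y")), ("-r", ("x",))]
print(subsumes(c, d))                               # True
print(subsumes([("p", ("A", "A"))], [("p", ("x", "y"))]))  # False
```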
If T is a clausal theory then the pair \((T,\preceq )\) is an input to the \(\sqsubset \)-reduction problem, which leads to the subsumption reduction problem (S-reduction problem). We show that the S-reduction problem is decidable for finite theories:
Proposition 3
(Finite S-reduction problem decidability) Let T be a finite theory. Then the corresponding S-reduction problem is decidable.
Proof
We can enumerate each element \(T'\) of \(2^T\) in ascending order on the cardinality of \(T'\). For each \(T'\) we can check whether \(T'\) subsumes T, which is decidable because subsumption between clauses is decidable. If \(T'\) subsumes T then we correctly return \(T'\); otherwise we continue to enumerate. Because the set \(2^T\) is finite the enumeration must halt. Because the set \(2^T\) contains T the algorithm will in the worst-case return T. Thus the problem is decidable. \(\square \)
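The decision procedure in this proof can be sketched directly. In the illustrative Python below (not from the paper), `redundant(T', C)` stands in for the decidable clause-level check, here instantiated with propositional subsumption (set inclusion on clauses represented as frozensets of literals):

```python
from itertools import combinations

# Enumerate subsets T' of T in ascending cardinality and return the first
# one under which every clause of T is redundant, mirroring the proof of
# Proposition 3.

def minimum_reduction(theory, redundant):
    theory = list(theory)
    for size in range(len(theory) + 1):
        for subset in combinations(theory, size):
            if all(redundant(subset, c) for c in theory):
                return list(subset)
    return theory  # unreachable: the full theory always qualifies

# Toy propositional example: a clause is redundant if some clause in the
# subset is a subset of it (propositional subsumption).
redundant = lambda sub, c: any(d <= c for d in sub)
theory = [frozenset({"p"}), frozenset({"p", "q"}), frozenset({"q", "r"})]
# The subsumed clause {p, q} is dropped; {p} and {q, r} remain.
print(minimum_reduction(theory, redundant))
```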
3.6 Entailment reduction
As mentioned in the introduction, Cropper and Muggleton (2014) previously used entailment reduction (Muggleton 1995) to reduce sets of metarules using the notion of an entailment redundant clause:
Definition 14
(E-redundant clause) The clause C is entailment redundant (E-redundant) in the clausal theory \(T \cup \{C\}\) whenever \(T\models C\).
If T is a clausal theory then the pair \((T,\models )\) is an input to the \(\sqsubset \)-reduction problem, which leads to the entailment reduction problem (E-reduction problem). We show the relationship between an E-reduction and an S-reduction:
Proposition 4
Let T be a clausal theory, \(T_S\) be an S-reduction of T, and \(T_E\) be an E-reduction of T. Then \(T_E \models T_S\).
Proof
Assume the opposite, i.e. \(T_E \not \models T_S\). This assumption implies that there is a clause \(C \in T_S\) such that \(T_E \not \models C\). By the definition of S-reduction, \(T_S\) is a subset of T so C must be in T, which implies that \(T_E \not \models T\). But this contradicts the premise that \(T_E\) is an E-reduction of T. Therefore the assumption cannot hold, and thus \(T_E \models T_S\). \(\square \)
We show that the E-reduction problem is undecidable for arbitrary clausal theories:
Proposition 5
(E-reduction problem clausal decidability) The E-reduction problem for clausal theories is undecidable.
Proof
Follows from the undecidability of entailment in clausal logic (Church 1936). \(\square \)
The E-reduction problem for Horn theories is also undecidable:
Proposition 6
(E-reduction problem Horn decidability) The E-reduction problem for Horn theories is undecidable.
Proof
Follows from the undecidability of entailment in Horn logic (Marcinkowski and Pacholski 1992). \(\square \)
The E-reduction problem is, however, decidable for finite Datalog theories:
Proposition 7
(E-reduction problem Datalog decidability) The E-reduction problem for finite Datalog theories is decidable.
3.7 Derivation reduction
To define derivation reduction, we first define the SLD-resolution closure of a Horn theory:
Definition 15
(SLD-resolution closure) Let T be a Horn theory. Then \(R^{0}(T) = T\) and, for \(n > 0\), \(R^{n}(T) = R^{n-1}(T) \cup \{C \mid C_1 \in R^{n-1}(T), C_2 \in T, \text { and } C \text { is the binary resolvent of } C_1 \text { and } C_2\}\). The SLD-resolution closure of T is \(R^{*}(T) = \bigcup _{n \ge 0} R^{n}(T)\).
We state our notion of derivability:
Definition 16
(Derivability) A Horn clause C is derivable from the Horn theory T, written \(T \vdash C\), if and only if \(C \in R^*(T)\).
We define a derivationally redundant (D-redundant) clause:
Definition 17
(D-redundant clause) A clause C is derivationally redundant in the Horn theory \(T \cup \{C\}\) if \(T \vdash C\).
We can show the relationship between E- and D-reductions by restating the notion of an SLD-deduction (Nienhuys-Cheng and de Wolf 1997):
Definition 18
[SLD-deduction (Nienhuys-Cheng and de Wolf 1997)] Let T be a Horn theory and C be a Horn clause. Then there exists an SLD-deduction of C from T, written \(T \vdash _d C\), if C is a tautology or if there exists a clause D such that \(T \vdash D\) and D subsumes C.
We can use the subsumption theorem (Nienhuys-Cheng and de Wolf 1997) to show the relationship between SLD-deductions and logical entailment:
Theorem 2
[SLD-subsumption theorem (Nienhuys-Cheng and de Wolf 1997)] Let T be a Horn theory and C be a Horn clause. Then \(T \models C\) if and only if \(T \vdash _d C\).
We can use this result to show the relationship between an E- and a D-reduction:
Proposition 8
Let T be a Horn theory, \(T_E\) be an E-reduction of T, and \(T_D\) be a D-reduction of T. Then \(T_E \models T_D\).
Proof
Follows from the definitions of E-reduction and D-reduction because an E-reduction can be obtained from a D-reduction with an additional subsumption check. \(\square \)
We also use the SLD-subsumption theorem to show that the D-reduction problem is undecidable for Horn theories:
Theorem 3
(D-reduction problem Horn decidability) The D-reduction problem for Horn theories is undecidable.
Proof
Assume the opposite, i.e. that the problem is decidable, which implies that \(T \vdash C\) is decidable. Since \(T \vdash C\) is decidable and subsumption between Horn clauses is decidable (Garey and Johnson 1979), finding an SLD-deduction is also decidable. Therefore, by the SLD-subsumption theorem, entailment between Horn clauses is decidable. However, entailment between Horn clauses is undecidable (Schmidt-Schauß 1988), so the assumption cannot hold. Therefore, the problem must be undecidable. \(\square \)
However, the D-reduction problem is decidable for any fragment \({{{\mathscr {M}}}}^{{a}}_{m}\) (e.g. definite Datalog clauses where each clause has at least one body literal, with additional arity and body size constraints). To show this result, we first introduce two lemmas:
Lemma 1
Let D, \(C_1\), and \(C_2\) be definite clauses with \(m_d\), \(m_{c1}\), and \(m_{c2}\) body literals respectively, where \(m_d\), \(m_{c1}\), and \(m_{c2} > 0\). If \(\{C_1,C_2\} \vdash D\) then \(m_{c1} \le m_{d}\) and \(m_{c2} \le m_{d}\).
Proof
Follows from the definitions of SLD-resolution (Nienhuys-Cheng and de Wolf 1997). \(\square \)
Note that Lemma 1 does not hold for unconstrained resolution because it allows for factorisation (Nienhuys-Cheng and de Wolf 1997). Lemma 1 also does not hold when facts (bodyless definite clauses) are allowed because they would allow for resolvents that are smaller in body size than one of the original two clauses.
Lemma 2
Let \({{{\mathscr {M}}}}^{{a}}_{m}\) be a fragment of metarules. Then \({{{\mathscr {M}}}}^{{a}}_{m}\) is finite up to variable renaming.
Proof
Any literal in \({{{\mathscr {M}}}}^{{a}}_{m}\) has at most a first-order variables and 1 second-order variable, so any literal has at most \(a+1\) variables. Any metarule has at most m body literals plus the head literal, so any metarule has at most \(m+1\) literals. Therefore, any metarule has at most \(((a+1)(m+1))\) variables. We can arrange the variables in at most \(((a+1)(m+1))!\) ways, so there are at most \(((a+1)(m+1))!\) metarules in \({{{\mathscr {M}}}}^{{a}}_{m}\) up to variable renaming. Thus \({{{\mathscr {M}}}}^{{a}}_{m}\) is finite up to variable renaming. \(\square \)
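The counting argument can be checked by brute force on a tiny fragment. The sketch below (an illustration, not the paper's method; all names are assumptions) enumerates clauses with arity fixed to 1 and exactly one body literal, canonicalising variables in order of first occurrence so that variants differing only by variable renaming collapse:

```python
from itertools import product

# Enumerate metarules in a tiny sub-case of M^{1}_1 (arity exactly 1, one
# body literal) up to variable renaming, by canonicalising second-order
# and first-order variables separately in order of first occurrence.

def canonical(clause):
    so_map, fo_map = {}, {}
    out = []
    for pred, args in clause:
        p = so_map.setdefault(pred, f"P{len(so_map)}")
        out.append((p, tuple(fo_map.setdefault(x, f"X{len(fo_map)}")
                             for x in args)))
    return tuple(out)

# Raw enumeration over 2 predicate variables and 2 first-order variables
# gives 16 clauses, collapsing to 4 up to renaming:
preds, fovars = ["P", "Q"], ["A", "B"]
metarules = {canonical(((hp, (ha,)), (bp, (ba,))))
             for hp, ha, bp, ba in product(preds, fovars, preds, fovars)}
print(len(metarules))  # 4: P(A)<-P(A), P(A)<-P(B), P(A)<-Q(A), P(A)<-Q(B)
```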
Note that the bound in the proof of Lemma 2 is a worst-case result. In practice there are fewer usable metarules because we consider fragments of constrained theories, thus not all clauses are admissible, and in all cases the order of the body literals is irrelevant. We use these two lemmas to show that the D-reduction problem is decidable for \({{{\mathscr {M}}}}^{{a}}_{m}\):
Theorem 4
(\({{{\mathscr {M}}}}^{{a}}_{m}\)-D-reduction problem decidability) The D-reduction problem for theories included in \({{{\mathscr {M}}}}^{{a}}_{m}\) is decidable.
Proof
Let T be a finite clausal theory in \({{{\mathscr {M}}}}^{{a}}_{m}\) and C be a definite clause with \(n>0\) body literals. The problem is to decide whether \(T \vdash C\). By Lemma 1, we cannot derive C from any clause which has more than n body literals. We can therefore restrict the resolution closure \(R^*(T)\) to only include clauses with body lengths less than or equal to n. In addition, by Lemma 2 there are only a finite number of such clauses, so we can compute the fixed-point of \(R^*(T)\) restricted to clauses of size smaller or equal to n in a finite number of steps and check whether C is in the set. If it is then \(T \vdash C\); otherwise \(T \not \vdash C\). \(\square \)
3.8 k-Derivable clauses
Propositions 3 and 7 and Theorem 4 show that the \(\sqsubset \)-reduction problem is decidable under certain conditions. However, as we will show in Sect. 4, even in decidable cases, solving the \(\sqsubset \)-reduction problem is computationally expensive. We therefore solve restricted k-bounded versions of the E- and D-reduction problems, which both rely on SLD-derivations. Specifically, we focus on resolution depth-limited derivations using the notion of k-derivability:
Definition 19
(k-derivability) Let k be a natural number. Then a Horn clause C is k-derivable from the Horn theory T, written \(T \vdash _k C\), if and only if \(C \in R^k(T)\).
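Definition 19 can be sketched operationally: iterate the resolution operator R k times and test membership. The toy below uses propositional definite clauses, not the paper's second-order SLD setting, and the `resolve` function is an assumption of the sketch.

```python
def r_step(clauses, resolve):
    """One application of the resolution operator R: close the set under
    single resolvents of pairs of clauses."""
    new = set(clauses)
    for c1 in clauses:
        for c2 in clauses:
            new.update(resolve(c1, c2))
    return new

def k_derivable(clause, theory, k, resolve):
    """T |-_k C  iff  C is in R^k(T) (Definition 19)."""
    closure = set(theory)
    for _ in range(k):
        closure = r_step(closure, resolve)
    return clause in closure

def resolve(c1, c2):
    """Toy resolvent for propositional definite clauses, represented as
    (head, frozenset_of_body_atoms): resolve c2's head against c1's body."""
    (h1, b1), (h2, b2) = c1, c2
    if h2 in b1:
        yield (h1, (b1 - {h2}) | b2)

T = {("a", frozenset({"b"})), ("b", frozenset({"c"}))}
goal = ("a", frozenset({"c"}))              # a :- c
print(k_derivable(goal, T, 0, resolve))     # False: R^0(T) = T
print(k_derivable(goal, T, 1, resolve))     # True: one resolution suffices
```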
The definitions for k-bounded E- and D-reductions follow from this definition but are omitted for brevity. In Sect. 4 we introduce a general algorithm (Algorithm 1) to solve the S-reduction problem and k-bounded E- and D-reduction problems.
4 Reduction algorithms
In Sect. 5 we logically reduce sets of metarules. We now describe the reduction algorithms that we use.
4.1 \(\sqsubset \)-Reduction algorithm
The reduce algorithm (Algorithm 1) shows a general \(\sqsubset \)-reduction algorithm that solves the \(\sqsubset \)-reduction problem (Definition 11) when the input theory is finite.^{11} We ignore cases where the input is infinite because of the inherent undecidability of the problem. Algorithm 1 is largely based on Plotkin’s clausal reduction algorithm (Plotkin 1971). Given a finite clausal theory T and a binary relation \(\sqsubset \), the algorithm repeatedly tries to remove a \(\sqsubset \)-redundant clause in T. If it cannot find a \(\sqsubset \)-redundant clause, then it returns the \(\sqsubset \)-reduced theory. Note that since derivation reduction is only defined over Horn theories, in a \(\vdash \)-reduction input \((T,\vdash )\), the theory T has to be Horn. We show total correctness of the algorithm:
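The removal loop of Algorithm 1 can be sketched as follows, assuming a user-supplied decision procedure `redundant(c, rest)` for the chosen relation (subsumption, k-bounded entailment, or k-bounded derivability); the function name `reduce_theory` and the propositional toy instance are ours, not the paper's.

```python
def reduce_theory(theory, redundant):
    """Plotkin-style reduction sketch: repeatedly drop a clause that is
    redundant w.r.t. the remaining clauses; return the theory once no
    clause can be removed.  Termination requires `redundant` to be a
    decision procedure."""
    theory = list(theory)
    removed = True
    while removed:
        removed = False
        for c in theory:
            rest = [d for d in theory if d is not c]
            if rest and redundant(c, rest):
                theory = rest
                removed = True
                break
    return theory

# Toy instance: propositional clauses as sets of literals, where
# theta-subsumption degenerates to the subset relation.
redundant = lambda c, rest: any(d <= c for d in rest)
T = [frozenset({"p", "q"}), frozenset({"p", "q", "r"}), frozenset({"p"})]
print(reduce_theory(T, redundant))  # [frozenset({'p'})]
```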
Proposition 9
(Algorithm 1 total correctness) Let (T,\(\sqsubset \)) be a \(\sqsubset \)-reduction input where T is finite. Let the corresponding \(\sqsubset \)-reduction problem be decidable. Then Algorithm 1 solves the \(\sqsubset \)-reduction problem.
Proof
Trivial by induction on the size of T. \(\square \)
Note that Proposition 9 assumes that the given reduction problem is decidable and that the input theory is finite. If Algorithm 1 is called with an arbitrary clausal theory and the \(\models \) relation, it will not necessarily terminate. We can call Algorithm 1 with specific binary relations, where each variation has a different time-complexity. Table 2 shows different ways of calling Algorithm 1 with their corresponding time complexities, where we assume finite theories as input. We show the complexity of calling Algorithm 1 with the subsumption relation:
Proposition 10
(S-reduction complexity) If T is a finite clausal theory then calling Algorithm 1 with (T,\(\preceq \)) requires at most \(O(|T|^3)\) calls to a subsumption algorithm.
Proof
For every clause C in T the algorithm checks whether any other clause in T subsumes C, which requires at most \(O(|T|^2)\) calls to a subsumption algorithm. If any clause C is found to be S-redundant then the algorithm repeats the procedure on the theory (\(T \setminus \{C\}\)), so overall the algorithm requires at most \(O(|T|^3)\) calls to a subsumption algorithm. \(\square \)
Note that a more detailed analysis of calling Algorithm 1 with the subsumption relation would depend on the subsumption algorithm used, which is an NP-complete problem (Garey and Johnson 1979). We show the complexity of calling Algorithm 1 with the k-bounded entailment relation:
Proposition 11
(k-bounded E-reduction complexity) If T is a finite Horn theory and k is a natural number then calling Algorithm 1 with (T,\(\models _k\)) requires at most \(O(|T|^{k+2})\) resolutions.
Proof
In the worst case the derivation check (line 4) requires searching the whole SLD-tree which has a maximum branching factor |T| and a maximum depth k and takes \(O(|T|^{k})\) steps. The algorithm potentially does this step for every clause in T so the complexity of this step is \(O(|T|^{k+1})\). The algorithm has to perform this check for every clause in T with an overall worst-case complexity \(O(|T|^{k+2})\). \(\square \)
The complexity of calling Algorithm 1 with the k-derivation relation is identical:
Proposition 12
(k-bounded D-reduction complexity) Let T be a finite Horn theory and k be a natural number then calling Algorithm 1 with (T,\(\vdash _k\)) requires at most \(O(|T|^{k+2})\) resolutions.
Proof
Follows using the same reasoning as Proposition 11. \(\square \)
Outputs and complexity of Algorithm 1 for different input relations and an arbitrary finite clausal theory T
Relation | Output | Complexity |
---|---|---|
\(\preceq \) | S-reduction | \(O(|T|^3)\) |
\(\models \) | E-reduction | Undecidable |
\(\models _k\) | k-E-reduction | \(O(|T|^{k+2})\) |
\(\vdash \) | D-reduction | Undecidable |
\(\vdash _k\) | k-D-reduction | \(O(|T|^{k+2})\) |
4.2 \({{{\mathscr {M}}}}^{{a}}_{m}\)-\(\sqsubset \)-reduction algorithm
Although \(T'\) is an E-reduction of T, it is not in \({{{\mathscr {M}}}}^{{2}}_{2}\) because \(M_4\) is not in \({{{\mathscr {M}}}}^{{2}}_{2}\). However, the theory T can be \({{{\mathscr {M}}}}^{{2}}_{2}\)-E-reduced to \(\{M_1,M_2,M_3\}\) because \(\{M_2,M_3\} \models M_4\),^{14} and \(\{M_1,M_2,M_3\}\) cannot be further reduced. In general, let T be a theory in \({{{\mathscr {M}}}}^{{a}}_{m}\) and \(T'\) be an E-reduction of T; then \(T'\) is not necessarily in \({{{\mathscr {M}}}}^{{a}}_{m}\).
Algorithm 2 overcomes this limitation of Algorithm 1. Given a finite clausal theory T, a binary relation \(\sqsubset \), and a reduction fragment \({{{\mathscr {M}}}}^{{a}}_{m}\), Algorithm 2 determines whether there is a \(\sqsubset \)-reduction of T in \({{{\mathscr {M}}}}^{{a}}_{m}\). If there is, it returns the reduced theory; otherwise it returns false. In other words, Algorithm 2 solves the \({{{\mathscr {M}}}}^{{a}}_{m}\)-\(\sqsubset \)-reduction problem. We show total correctness of Algorithm 2:
Proposition 13
(Algorithm 2 correctness) Let (T,\(\sqsubset \),\({{{\mathscr {M}}}}^{{a}}_{m}\)) be a \({{{\mathscr {M}}}}^{{a}}_{m}\)-\(\sqsubset \)-reduction input. If the corresponding \(\sqsubset \)-reduction problem is decidable then Algorithm 2 solves the corresponding \({{{\mathscr {M}}}}^{{a}}_{m}\)-\(\sqsubset \)-reduction problem.
Sketch Proof
We provide a sketch proof for brevity. We need to show that the function aux correctly determines whether \(B \sqsubset T\), which we can show by induction on the size of T. Assuming aux is correct, if T can be reduced to B, then the mreduce function calls Algorithm 1 to reduce B, which is correct by Proposition 9. Otherwise it returns false. \(\square \)
5 Reduction of metarules
- G1:
identify a \({{{\mathscr {M}}}}^{{a}}_{k}\)-\(\sqsubset \)-reduction of \({{{\mathscr {M}}}}^{{a}}_{m}\) for some k as small as possible
- G2:
determine whether \({{{\mathscr {M}}}}^{{a}}_{2} \sqsubset {} {{{\mathscr {M}}}}^{{a}}_{\infty }\)
- G3:
determine whether \({{{\mathscr {M}}}}^{{a}}_{\infty }\) has any (finite) \(\sqsubset \)-reduction
We work on these goals for fragments of \({{{\mathscr {M}}}}^{{a}}_{m}\) relevant to ILP. Table 3 shows the four fragments and their main restrictions. The subsequent sections precisely describe the fragments.
Our first goal (G1) is to essentially minimise the number of body literals in a set of metarules, which can be seen as trying to enforce an Occamist bias. We are particularly interested in reducing sets of metarules to fragments with at most two body literals because \({{{\mathscr {M}}}}^{{\{2\}}}_{2}\) augmented with one function symbol has universal Turing machine expressivity (Tärnlund 1977). In addition, previous work on MIL has almost exclusively used metarules from the fragment \({{{\mathscr {M}}}}^{{2}}_{2}\). Our second goal (G2) is more general and concerns reducing an infinite set of metarules to \({{{\mathscr {M}}}}^{{a}}_{2}\). Our third goal (G3) is similar, but is about determining whether an infinite set of metarules has any finite reduction.
The four main fragments of \({{{\mathscr {M}}}}^{{}}_{}\) that we consider
Fragment | Description |
---|---|
\({{{\mathscr {C}}}}^{}_{}\) | Connected clauses |
\({{{\mathscr {D}}}}^{}_{}\) | Connected Datalog clauses |
\({{{\mathscr {K}}}}^{}_{}\) | Connected Datalog clauses without singleton variables |
\({{{\mathscr {U}}}}^{}_{}\) | Connected Datalog clauses without duplicate variables |
5.1 Connected (\({{{\mathscr {C}}}}^{a}_{m}\)) results
We first consider a general fragment of metarules. The only constraint is that we follow the standard ILP convention (Cropper and Muggleton 2014; Evans and Grefenstette 2018; Gottlob et al. 1997; Nienhuys-Cheng and de Wolf 1997) and focus on connected clauses^{16}:
Definition 20
(Connected clause) A clause is connected if the literals in the clause cannot be partitioned into two sets such that the variables appearing in the literals of one set are disjoint from the variables appearing in the literals of the other set.
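Definition 20 amounts to a reachability condition on the graph whose nodes are literals, joined when they share a variable. A minimal sketch (our own, illustrative only), modelling literals as tuples of variable names since predicate symbols are irrelevant to connectedness:

```python
def is_connected(clause):
    """A clause is connected iff its literal-sharing graph has a single
    component (Definition 20)."""
    if len(clause) <= 1:
        return True
    lits = [set(lit) for lit in clause]
    seen, frontier = {0}, [0]
    while frontier:                      # breadth-first search from literal 0
        i = frontier.pop()
        for j in range(len(lits)):
            if j not in seen and lits[i] & lits[j]:
                seen.add(j)
                frontier.append(j)
    return len(seen) == len(lits)

# P(A,B) :- Q(A,C),R(C,B) is connected; P(A,B) :- Q(C,D) is not.
print(is_connected([("A", "B"), ("A", "C"), ("C", "B")]))  # True
print(is_connected([("A", "B"), ("C", "D")]))              # False
```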
Cardinality and maximal body size of the reductions of \({{{\mathscr {C}}}}^{a}_{5}\)
Arities | S-reduction | E-reduction | D-reduction | |||
---|---|---|---|---|---|---|
a | Bodysize | Cardinality | Bodysize | Cardinality | Bodysize | Cardinality |
0 | 1 | 1 | 1 | 1 | 2 | 2 |
1 | 1 | 1 | 1 | 1 | 2 | 2 |
2 | 1 | 4 | 1 | 1 | 5 | 6 |
0, 1 | 1 | 3 | 1 | 3 | 2 | 5 |
0, 2 | 1 | 6 | 1 | 3 | 5 | 21 |
1, 2 | 1 | 9 | 1 | 2 | 4 | 8 |
0, 1, 2 | 1 | 12 | 1 | 4 | 5 | 13 |
Reductions of the connected fragment \({{{\mathscr {C}}}}^{\{1,2\}}_{5}\)
S-reduction | E-reduction | D-reduction |
---|---|---|
\(P(A) \leftarrow Q(A)\) | \(P(A) \leftarrow Q(B,A)\) | \(P(A) \leftarrow Q(B,A)\) |
\(P(A) \leftarrow Q(A,B)\) | \(P(A,B) \leftarrow Q(A)\) | \(P(A,A) \leftarrow Q(B,A)\) |
\(P(A) \leftarrow Q(B,A)\) | \(P(A,B) \leftarrow Q(B)\) | |
\(P(A,B) \leftarrow Q(A)\) | \(P(A,B) \leftarrow Q(B,A)\) | |
\(P(A,B) \leftarrow Q(B)\) | \(P(A,B) \leftarrow Q(B,B)\) | |
\(P(A,B) \leftarrow Q(A,C)\) | \(P(A,B) \leftarrow Q(A,B),R(A,B)\) | |
\(P(A,B) \leftarrow Q(B,C)\) | \(P(A,B) \leftarrow Q(A,C),R(B,C)\) | |
\(P(A,B) \leftarrow Q(C,A)\) | \(P(A,B) \leftarrow Q(A,C),R(A,D),S(B,C),T(B,D),U(C,D)\) | |
\(P(A,B) \leftarrow Q(C,B)\) |
As Table 4 shows, all the fragments can be S- and E-reduced to \({{{\mathscr {C}}}}^{a}_{1}\). We show that in general \({{{\mathscr {C}}}}^{a}_{\infty }\) has a \({{{\mathscr {C}}}}^{a}_{1}\)-S-reduction:
Theorem 5
(\({{{\mathscr {C}}}}^{a}_{\infty }\) S-reducibility) For all \(a>0\), the fragment \({{{\mathscr {C}}}}^{a}_{\infty }\) has a \({{{\mathscr {C}}}}^{a}_{1}\)-S-reduction.
Proof
Let C be any clause in \({{{\mathscr {C}}}}^{a}_{\infty }\), where \(a>0\). By the definition of connected clauses there must be at least one body literal in C that shares a variable with the head literal of C. The clause formed of the head of C with the body literal directly connected to it is by definition in \({{{\mathscr {C}}}}^{a}_{1}\) and clearly subsumes C. Therefore \({{{\mathscr {C}}}}^{a}_{1} \preceq {{{\mathscr {C}}}}^{a}_{\infty }\). \(\square \)
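The witness construction in the proof of Theorem 5 is direct to mechanise: keep the head and one body literal sharing a variable with it. The sketch below (our own names; literals as tuples of variable names) relies on the observation in the proof that a connected clause must have such a body literal.

```python
def c1_subsumer(clause):
    """For a connected clause [head, body...], return the head plus one body
    literal sharing a variable with the head; this clause lies in C^a_1 and
    subsumes the input (proof of Theorem 5)."""
    head, *body = clause
    for lit in body:
        if set(head) & set(lit):
            return [head, lit]
    # Unreachable for connected clauses: otherwise {head} | body would
    # partition the clause into variable-disjoint sets.
    raise ValueError("no body literal shares a variable with the head")

# P(A,B) :- Q(C,D),R(A,C): the head and R(A,C) form the subsuming clause.
print(c1_subsumer([("A", "B"), ("C", "D"), ("A", "C")]))
```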
We likewise show that \({{{\mathscr {C}}}}^{a}_{\infty }\) always has a \({{{\mathscr {C}}}}^{a}_{1}\)-E-reduction:
Theorem 6
(\({{{\mathscr {C}}}}^{a}_{\infty }\) E-reducibility) For all \(a>0\), the fragment \({{{\mathscr {C}}}}^{a}_{\infty }\) has a \({{{\mathscr {C}}}}^{a}_{1}\)-E-reduction.
As Table 4 shows, the fragment \({{{\mathscr {C}}}}^{2}_{5}\) could not be D-reduced to \({{{\mathscr {C}}}}^{2}_{2}\) when running the derivation reduction algorithm. However, because we run the derivation reduction algorithm with a maximum derivation depth, this result alone is not enough to guarantee that the output cannot be further reduced. Therefore, we show that \({{{\mathscr {C}}}}^{2}_{5}\) cannot be D-reduced to \({{{\mathscr {C}}}}^{2}_{2}\):
Proposition 14
(\({{{\mathscr {C}}}}^{2}_{5}\) D-irreducibility) The fragment \({{{\mathscr {C}}}}^{2}_{5}\) has no \({{{\mathscr {C}}}}^{2}_{2}\)-D-reduction.
Proof
We denote by \({{{\mathscr {P}}}}(C)\) the set of all clauses that can be obtained from a given clause C by permuting the arguments in its literals up to variable renaming. For example if \(C=P(A,B)\leftarrow Q(A,C)\) then \({{{\mathscr {P}}}}(C)=\{C,\ P(A,B)\leftarrow Q(C,A),\ P(B,A)\leftarrow Q(A,C),\ P(B,A)\leftarrow Q(C,A)\}\) up to variable renaming.
Let \(C_I\) denote the clause \(P(A,B) \leftarrow Q(A,C),R(A,D),S(B,C),T(B,D),U(C,D)\). We prove that no clause in \({{{\mathscr {P}}}}(C_I)\) can be derived from \({{{\mathscr {C}}}}^{2}_{2}\) by induction on the length of derivations. Formally, we show that for every n there exists no derivation of length n from \({{{\mathscr {C}}}}^{2}_{2}\) to a clause in \({{{\mathscr {P}}}}(C_I)\). We reason by contradiction and w.l.o.g. we consider only the clause \(C_I\).
For the base case \(n=0\), assume that there is a derivation of length 0 from \({{{\mathscr {C}}}}^{2}_{2}\) to \(C_I\). This assumption implies that \(C_I\in {{{\mathscr {C}}}}^{2}_{2}\), but this clearly cannot hold given the body size of \(C_I\).
For the inductive step, assume that no clause in \({{{\mathscr {P}}}}(C_I)\) is derivable from \({{{\mathscr {C}}}}^{2}_{2}\) by a derivation of length smaller than n, and suppose for contradiction that \(C_I\) has a derivation of length n whose final inference resolves two premises \(C_1\) and \(C_2\). We distinguish three cases according to how the literals of \(C_I\) are split between the premises. All the literals of \(C_I\) occur in the same premise: this case is impossible by Lemma 1 because this premise would contain more literals than \(C_I\) (the ones from \(C_I\) plus the resolved literal).
Only one of the literals of \(C_I\) occurs separately from the others: w.l.o.g., assume that the literal Q(A, C) occurs alone in \(C_2\) (up to variable renaming). Then \(C_2\) must be of the form \(H(A,C)\leftarrow Q(A,C)\) or \(H(C,A)\leftarrow Q(A,C)\) for some H, where the H-headed literal is the resolved literal of the inference that allows the unification of A and C with their counterparts in \(C_1\).^{17} In this case, \(C_1\) belongs to \({{{\mathscr {P}}}}(C_I)\) and a derivation of \(C_1\) from \({{{\mathscr {C}}}}^{2}_{2}\) of length smaller than n exists as a strict subset of the derivation to \(C_I\) of length n. This contradicts the induction hypothesis, thus the assumed derivation of \(C_I\) cannot exist.
Otherwise, the split of the literals of \(C_I\) between \(C_1\) and \(C_2\) is always such that at least three variables must be unified during the inference. For example, consider the case where \(P(A,B) \leftarrow Q(A,C) \subset C_1\) and the set \(\{R(A',D),S(B',C'),T(B',D),U(C',D)\}\) occurs in the body of \(C_2\) (up to variable renaming). Then \(A'\), \(B'\) and \(C'\) must unify respectively with A, B and C for \(C_I\) to be derived (up to variable renaming). However the inference can at most unify two variable pairs since the resolved literal must be dyadic at most, so this inference is impossible, a contradiction. \(\square \)
We generalise this result to \({{{\mathscr {C}}}}^{2}_{\infty }\):
Theorem 7
(\({{{\mathscr {C}}}}^{2}_{\infty }\) D-irreducibility) The fragment \({{{\mathscr {C}}}}^{2}_{\infty }\) has no D-reduction.
Proof
It is enough to prove that \({{{\mathscr {C}}}}^{2}_{\infty }\) does not have a \({{{\mathscr {C}}}}^{2}_{m}\)-D-reduction for an arbitrary m because any D-reduced theory, being finite, admits a bound on the body size of the clauses it contains. Starting from \(C_I\) as defined in the proof of Proposition 14, apply the following transformation iteratively for k from 1 to m: replace the literals containing Q and R (i.e. at first Q(A, C) and R(A, D)) with the following set of literals \(Q(A,C_k)\), \(R(A,D_k)\), \(V_k(C_k,D_k)\), \(Q_k(C_k,C)\), \(R_k(D_k,D)\) where all variables and predicate variables labeled with k are new. Let the resulting clause be denoted \(C_{I_m}\). This clause is of body size \(3m+5\) and thus does not belong to \({{{\mathscr {C}}}}^{2}_{m}\). Moreover, for the same reason that \(C_I\) cannot be derived from any \({{{\mathscr {C}}}}^{2}_{m'}\) with \(m'<5\) (see the proof of Proposition 14) \(C_{I_m}\) cannot be derived from any \({{{\mathscr {C}}}}^{2}_{m'}\) with \(m'<3m+5\). In particular, \(C_{I_m}\) cannot be derived from \({{{\mathscr {C}}}}^{2}_{m}\). \(\square \)
Another way to generalise Proposition 14 is the following:
Theorem 8
(\({{{\mathscr {C}}}}^{a}_{\infty }\) D-irreducibility) For \(a\ge 2\), the fragment \({{{\mathscr {C}}}}^{a}_{\infty }\) has no \({{{\mathscr {C}}}}^{a}_{a^2+a-2}\)-D-reduction.
Proof
The proof is analogous to that of Proposition 14, with a counterexample clause \(C_a\) generalising \(C_I\) to arity a: the split of the literals of \(C_a\) between the premises \(C_1\) and \(C_2\) is always such that at least \(a+1\) variables must be unified during the inference, which is impossible since the resolved literal can hold at most a variables. \(\square \)
Note that this is enough to conclude that \({{{\mathscr {C}}}}^{a}_{\infty }\) cannot be reduced to \({{{\mathscr {C}}}}^{a}_{2}\) but it does not prove that \({{{\mathscr {C}}}}^{a}_{\infty }\) is not D-reducible.
5.1.1 Summary
Existence of a S-, E- or D-reduction of \({{{\mathscr {C}}}}^{a}_{\infty }\) to \({{{\mathscr {C}}}}^{a}_{2}\)
Arity | S | E | D |
---|---|---|---|
1 | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) |
2 | \(\checkmark \) | \(\checkmark \) | \(\times \) |
\(>2\) | \(\checkmark \) | \(\checkmark \) | \(\times \) |
5.2 Datalog (\({{{\mathscr {D}}}}^{a}_{m}\)) results
Cardinality and maximal body size of the reductions of \({{{\mathscr {D}}}}^{a}_{5}\)
Arities | S-reduction | E-reduction | D-reduction | |||
---|---|---|---|---|---|---|
a | Bodysize | Cardinality | Bodysize | Cardinality | Bodysize | Cardinality |
0 | 1 | 1 | 1 | 1 | 2 | 2 |
1 | 1 | 1 | 1 | 1 | 2 | 2 |
2 | 2 | 4 | 2 | 2 | 5 | 10 |
0, 1 | 1 | 2 | 1 | 2 | 2 | 5 |
0, 2 | 2 | 5 | 2 | 3 | 5 | 38 |
1, 2 | 2 | 10 | 2 | 3 | 5 | 11 |
0, 1, 2 | 2 | 11 | 2 | 4 | 5 | 14 |
Reductions of the Datalog fragment \({{{\mathscr {D}}}}^{\{1,2\}}_{5}\)
S-reduction | E-reduction | D-reduction |
---|---|---|
\(P(A) \leftarrow Q(A)\) | \(P(A) \leftarrow Q(A,B)\) | \(P(A) \leftarrow Q(B,A)\) |
\(P(A) \leftarrow Q(A,B)\) | \(P(A,B) \leftarrow Q(B,A)\) | \(P(A,A) \leftarrow Q(A)\) |
\(P(A) \leftarrow Q(B,A)\) | \(P(A,B) \leftarrow Q(A),R(B)\) | \(P(A,A) \leftarrow Q(A,A)\) |
\(P(A,A) \leftarrow Q(B,A)\) | \(P(A,B) \leftarrow Q(B,A)\) | |
\(P(A,B) \leftarrow Q(A),R(B)\) | \(P(A,B) \leftarrow Q(A,B),R(A,B)\) | |
\(P(A,B) \leftarrow Q(A),R(B,C)\) | \(P(A,B) \leftarrow Q(A,C),R(B,C)\) | |
\(P(A,B) \leftarrow Q(A,B)\) | \(P(A,B) \leftarrow Q(B,C),R(A,D)\) | |
\(P(A,B) \leftarrow Q(B),R(A,C)\) | \(P(A,B) \leftarrow Q(B,C),R(A,D),S(B,D),T(C,E)\) | |
\(P(A,B) \leftarrow Q(B,A)\) | \(P(A,B) \leftarrow Q(A,C),R(A,D),S(B,C),T(B,D),U(C,D)\) | |
\(P(A,B) \leftarrow Q(B,C),R(A,D)\) | \(P(A,B) \leftarrow Q(B,C),R(A,D),S(C,E),T(B,F),U(D,F)\) | |
\(P(A,B) \leftarrow Q(B,C),R(B,D),S(C,E),T(A,F),U(D,F)\) |
We show that \({{{\mathscr {D}}}}^{2}_{\infty }\) can be S-reduced to \({{{\mathscr {D}}}}^{2}_{2}\):
Proposition 15
(\({{{\mathscr {D}}}}^{2}_{\infty }\) S-reducibility) The fragment \({{{\mathscr {D}}}}^{2}_{\infty }\) has a \({{{\mathscr {D}}}}^{2}_{2}\)-S-reduction.
Proof
Follows using the same argument as in Theorem 5 but the reduction is to \({{{\mathscr {D}}}}^{2}_{2}\) instead of \({{{\mathscr {D}}}}^{2}_{1}\). This difference is due to the Datalog constraint that states: if a variable appears in the head it must also appear in the body. For clauses with dyadic heads, if the two head argument variables occur in two distinct body literals then the clause cannot be further reduced beyond \({{{\mathscr {D}}}}^{2}_{2}\). \(\square \)
We show how this result cannot be generalised to \({{{\mathscr {D}}}}^{a}_{\infty }\):
Theorem 9
(\({{{\mathscr {D}}}}^{a}_{\infty }\) S-irreducibility) For \(a>0\), the fragment \({{{\mathscr {D}}}}^{a}_{\infty }\) does not have a \({{{\mathscr {D}}}}^{a}_{a-1}\)-S-reduction.
Proof
As a counterexample to a \({{{\mathscr {D}}}}^{a}_{a-1}\)-S-reduction, consider \(C_a=P(X_1,\dots ,X_a)\leftarrow Q_1(X_1), \, \dots ,Q_a(X_a)\). The clause \(C_a\) does not belong to \({{{\mathscr {D}}}}^{a}_{a-1}\) and cannot be S-reduced to it because the removal of any subset of its literals leaves argument variables in the head without their counterparts in the body. Hence no strict subclause of \(C_a\) belongs to the Datalog fragment, and thus \(C_a\) cannot be subsumed by a clause in \({{{\mathscr {D}}}}^{a}_{a-1}\). \(\square \)
However, we can show that \({{{\mathscr {D}}}}^{a}_{\infty }\) can always be S-reduced to \({{{\mathscr {D}}}}^{a}_{a}\):
Theorem 10
(\({{{\mathscr {D}}}}^{a}_{\infty }\) to \({{{\mathscr {D}}}}^{a}_{a}\) S-reducibility) For \(a>0\), the fragment \({{{\mathscr {D}}}}^{a}_{\infty }\) has a \({{{\mathscr {D}}}}^{a}_{a}\)-S-reduction.
Proof
To prove that \({{{\mathscr {D}}}}^{a}_{\infty }\) has a \({{{\mathscr {D}}}}^{a}_{a}\)-S-reduction it is enough to remark that any clause in \({{{\mathscr {D}}}}^{a}_{\infty }\) has a subclause of body size at most a that is also in \({{{\mathscr {D}}}}^{a}_{\infty }\), the worst case being clauses such as \(C_a\) where all argument variables in the head occur in a distinct literal in the body. \(\square \)
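The subclause extraction used in Theorem 10 can be sketched directly: keep one body literal per head variable, so at most a body literals remain while the Datalog head-variable constraint is preserved. The sketch (our own names; literals as tuples of variable names) assumes its input satisfies the Datalog constraint that every head variable occurs in the body.

```python
def datalog_a_subsumer(clause):
    """For a Datalog clause [head, body...], keep one body literal per head
    variable; the result has at most a (= head arity) body literals, stays
    Datalog and connected, and subsumes the input (Theorem 10)."""
    head, *body = clause
    kept = []
    for v in head:
        if not any(v in lit for lit in kept):
            # Datalog guarantees some body literal contains v.
            kept.append(next(lit for lit in body if v in lit))
    return [head] + kept

# Worst case from the proof: each head variable sits in its own body literal.
print(datalog_a_subsumer([("A", "B"), ("A",), ("B",), ("C", "C")]))
```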
We also show that \({{{\mathscr {D}}}}^{a}_{\infty }\) always has a \({{{\mathscr {D}}}}^{a}_{2}\)-E-reduction, starting with the following lemma:
Lemma 3
For all a and n with \(2 \le n \le a\), the clause \(P_0(A_1,A_2,\dots ,A_n) \leftarrow P_1(A_1), P_2(A_2),\dots ,P_n(A_n)\) is \({{{\mathscr {D}}}}^{a}_{2}\)-E-reducible.
Proof
For the base case \(n=2\), by definition \({{{\mathscr {D}}}}^{a}_{2}\) contains \(P_0(A_1,A_2) \leftarrow P_1(A_1), P_2(A_2)\), so the claim holds trivially.
For the inductive step, assume the claim holds for \(n-1\). We show it holds for n. By definition \({{{\mathscr {D}}}}^{a}_{2}\) contains the clause \(D_1{=}P(A_1,A_2,\dots ,A_{n}) \leftarrow P_0(A_1,A_2,\dots ,A_{n-1}), P_n(A_{n})\). By the inductive hypothesis, \(D_2=P_0(A_1,A_2,\dots ,A_{n-1})\leftarrow P_1(A_1),\dots ,P_{n-1}(A_{n-1})\) is \({{{\mathscr {D}}}}^{a-1}_{2}\)-E-reducible, and thus also \({{{\mathscr {D}}}}^{a}_{2}\)-E-reducible. Together, \(D_1\) and \(D_2\) entail \(D=P_0(A_1,A_2,\dots ,A_n) \leftarrow P_1(A_1), P_2(A_2), \dots ,\)\( P_n(A_n)\), which can be seen by resolving the literal \(P_0(A_1,A_2,\dots ,A_{n-1})\) from \(D_1\) with the same literal from \(D_2\) to derive D. Thus D is \({{{\mathscr {D}}}}^{a}_{2}\)-E-reducible. \(\square \)
Theorem 11
(\({{{\mathscr {D}}}}^{a}_{\infty }\) E-reducibility) For \(a>0\), the fragment \({{{\mathscr {D}}}}^{a}_{\infty }\) has a \({{{\mathscr {D}}}}^{a}_{2}\)-E-reduction.
Proof
Let C be any clause in \({{{\mathscr {D}}}}^{a}_{\infty }\). We denote the head of C by \(P(A_1,\dots ,A_n)\), where \(0<n\le a\). The possibility that some of the \(A_i\) are equal does not impact the reasoning.
If \(n=1\), then by definition, there exists a literal \(L_1\) in the body of C such that \(A_1\) occurs in \(L_1\). It is enough to consider the clause \(P(A_1)\leftarrow L_1\) to conclude, because \(P(A_1)\) is the head of C and \(L_1\) belongs to the body of C, thus \(P(A_1)\leftarrow L_1\) entails C, and this clause belongs to \({{{\mathscr {D}}}}^{a}_{2}\).
If \(n>1\), then by definition, for each \(A_i\) with \(1\le i\le n\) there exists a literal \(L_i\) in the body of C in which \(A_i\) occurs. Consider the clause \(C'=P(A_1,\dots ,A_n)\leftarrow L_1,\dots ,L_n\).
The clause \(C'\) belongs to \({{{\mathscr {D}}}}^{a}_{\infty }\).
Some \(L_i\) may be identical with each other, since the \(A_i\)s may occur together in literals or simply be equal, but this scenario does not impact the reasoning.
The clause \(C'\) entails C because \(C'\) is equivalent to a subset of C (but this subset may be distinct from \(C'\) due to \(C'\) possibly including some extra duplicated literals). Finally, \(C'\) is \({{{\mathscr {D}}}}^{a}_{2}\)-E-reducible by Lemma 3 combined with the clauses \(P_i(A_i)\leftarrow L_i\), each of which belongs to \({{{\mathscr {D}}}}^{a}_{2}\), and therefore C is \({{{\mathscr {D}}}}^{a}_{2}\)-E-reducible. \(\square \)
As Table 7 shows, not all of the fragments can be D-reduced to \({{{\mathscr {D}}}}^{a}_{2}\). In particular, the result that \({{{\mathscr {D}}}}^{2}_{\infty }\) has no \({{{\mathscr {D}}}}^{2}_{2}\)-D-reduction follows from Theorem 7 because the counterexamples presented in the proof also belong to \({{{\mathscr {D}}}}^{2}_{\infty }\).
5.2.1 Summary
Existence of a S-, E- or D-reduction of \({{{\mathscr {D}}}}^{a}_{\infty }\) to \({{{\mathscr {D}}}}^{a}_{2}\)
Arity | S | E | D |
---|---|---|---|
1 | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) |
2 | \(\checkmark \) | \(\checkmark \) | \(\times \) |
\(>2\) | \(\times \) | \(\checkmark \) | \(\times \) |
5.3 Singleton-free (\({{{\mathscr {K}}}}^{a}_{m}\)) results
It is common in ILP to require that all the variables in a clause appear at least twice (Cropper and Muggleton 2014; Muggleton and Feng 1990; De Raedt and Bruynooghe 1992), which essentially eliminates singleton variables. We call this fragment the singleton-free fragment:
Definition 21
(Singleton-free) A clause is singleton-free if each first-order variable appears at least twice.
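Definition 21 reduces to counting variable occurrences across the whole clause. A minimal sketch (our own, illustrative only; literals as tuples of variable names):

```python
from collections import Counter

def is_singleton_free(clause):
    """Singleton-free (Definition 21): every first-order variable occurs at
    least twice in the clause."""
    counts = Counter(v for lit in clause for v in lit)
    return all(n >= 2 for n in counts.values())

print(is_singleton_free([("A", "B"), ("B", "A")]))  # True
print(is_singleton_free([("A", "B"), ("A", "C")]))  # False: B and C are singletons
```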
Cardinality and maximal body size of the reductions of \({{{\mathscr {K}}}}^{a}_{5}\)
Arities | S-reduction | E-reduction | D-reduction | |||
---|---|---|---|---|---|---|
a | Bodysize | Cardinality | Bodysize | Cardinality | Bodysize | Cardinality |
0 | 1 | 1 | 1 | 1 | 2 | 2 |
1 | 1 | 1 | 1 | 1 | 2 | 2 |
2 | 4 | 3 | 2 | 3 | 5 | 7 |
0, 1 | 1 | 2 | 1 | 2 | 2 | 5 |
0, 2 | 5 | 4 | 2 | 3 | 5 | 23 |
1, 2 | 4 | 8 | 2 | 4 | 5 | 8 |
0, 1, 2 | 4 | 9 | 2 | 5 | 5 | 11 |
Reductions of the singleton-free fragment \({{{\mathscr {K}}}}^{\{2\}}_{5}\)
S-reduction | E-reduction | D-reduction |
---|---|---|
\(P(A,B) \leftarrow Q(A,B)\) | \(P(A,B) \leftarrow Q(B,A)\) | \(P(A,A) \leftarrow Q(A,A)\) |
\(P(A,B) \leftarrow Q(B,A)\) | \(P(A,B) \leftarrow Q(A,A),R(B,B)\) | \(P(A,B) \leftarrow Q(B,A)\) |
\(P(A,B) \leftarrow Q(B,C),R(A,D),S(A,D),T(B,C)\) | \(P(A,B) \leftarrow Q(A,C),R(B,C)\) | \(P(A,A) \leftarrow Q(A,B),R(B,B)\)
\(P(A,B) \leftarrow Q(A,A),R(B,B)\) | ||
\(P(A,B) \leftarrow Q(A,B),R(A,B)\) | ||
\(P(A,B) \leftarrow Q(A,C),R(B,C)\) | ||
\(P(A,B) \leftarrow Q(A,C),R(A,D),S(B,C),T(B,D),U(C,D)\) | |
Unlike in the connected and Datalog cases, the fragment \({{{\mathscr {K}}}}^{\{2\}}_{5}\) is no longer S-reducible to \({{{\mathscr {K}}}}^{\{2\}}_{2}\). We show that \({{{\mathscr {K}}}}^{2}_{\infty }\) cannot be S-reduced to \({{{\mathscr {K}}}}^{2}_{2}\):
Proposition 16
(\({{{\mathscr {K}}}}^{2}_{\infty }\) S-irreducibility) The fragment \({{{\mathscr {K}}}}^{2}_{\infty }\) does not have a \({{{\mathscr {K}}}}^{2}_{2}\)-S-reduction.
Proof
Consider the clause \(C = P(A_1,A_2)\leftarrow P_1(A_1,B_1),P_2(A_1,B_1),P_3(A_2,B_2),P_4(A_2,B_2)\). Removing any subset of the body literals of C leaves a variable occurring only once, so no strict subclause of C belongs to \({{{\mathscr {K}}}}^{2}_{\infty }\). Since C has body size 4, it cannot be subsumed by any clause in \({{{\mathscr {K}}}}^{2}_{2}\), so C is a counterexample to a \({{{\mathscr {K}}}}^{2}_{2}\)-S-reduction of \({{{\mathscr {K}}}}^{2}_{\infty }\). \(\square \)
We can likewise show that this result holds in the general case:
Theorem 12
(\({{{\mathscr {K}}}}^{a}_{\infty }\) S-irreducibility) For \(a\ge 2\), the fragment \({{{\mathscr {K}}}}^{a}_{\infty }\) does not have a \({{{\mathscr {K}}}}^{a}_{2a-1}\)-S-reduction.
Proof
We generalise the clause C from the proof of Proposition 16 to define the clause \(C_a = P(A_1,\dots ,A_a)\leftarrow P_1(A_1,B_1),P_2(A_1,B_1),\dots ,P_{2a-1}(A_a,B_a),P_{2a}(A_a,B_a)\). The same reasoning applies to \(C_a\) as to \(C (= C_2)\), making \(C_a\) irreducible in \({{{\mathscr {K}}}}^{a}_{\infty }\). Moreover \(C_a\) is of body size 2a, thus \(C_a\) is a counterexample to a \({{{\mathscr {K}}}}^{a}_{2a-1}\)-S-reduction of \({{{\mathscr {K}}}}^{a}_{\infty }\). \(\square \)
However, all the fragments can be E-reduced to \({{{\mathscr {K}}}}^{a}_{2}\).
Theorem 13
(\({{{\mathscr {K}}}}^{a}_{\infty }\) E-reducibility) For \(a>0\), the fragment \({{{\mathscr {K}}}}^{a}_{\infty }\) has a \({{{\mathscr {K}}}}^{a}_{2}\)-E-reduction.
Proof
The proof of Theorem 13 is an adaptation of that of Theorem 11. The only difference is that if \(n=1\) then \(P(A_1)\leftarrow L_1,L_1\) must be considered instead of \(P(A_1)\leftarrow L_1\) to ensure the absence of singleton variables in the body of the clause, and for the same reason, in the general case, the clause \(D'=P(A_1,\dots ,A_n)\leftarrow L_1,\dots ,L_n\) must be replaced by \(D'=P(A_1,\dots ,A_n)\leftarrow L_1,L_1,\dots ,L_n,L_n\). Note that \(C'\) is not modified and thus may or may not belong to \({{{\mathscr {K}}}}^{a}_{\infty }\). However, it is enough that \(C'\in {{{\mathscr {D}}}}^{a}_{\infty }\). With these modifications, the proof carries over from \({{{\mathscr {K}}}}^{a}_{\infty }\) to \({{{\mathscr {K}}}}^{a}_{2}\) as it does from \({{{\mathscr {D}}}}^{a}_{\infty }\) to \({{{\mathscr {D}}}}^{a}_{2}\), including the results in Lemma 3. \(\square \)
5.3.1 Summary
Existence of a S-, E- or D-reduction of \({{{\mathscr {K}}}}^{a}_{\infty }\) to \({{{\mathscr {K}}}}^{a}_{2}\)
Arity | S | E | D |
---|---|---|---|
1 | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) |
2 | \(\times \) | \(\checkmark \) | \(\times \) |
\(>2\) | \(\times \) | \(\checkmark \) | \(\times \) |
5.4 Duplicate-free (\({{{\mathscr {U}}}}^{a}_{m}\)) results
The previous three fragments are general in the sense that they have been widely used in ILP. By contrast, the final fragment that we consider is of particular interest to MIL. Table 1 shows a selection of metarules commonly used in the MIL literature. These metarules have been successfully used despite having no theoretical justification. However, if we consider the reductions of the three fragments so far, the identity, precon, and postcon metarules do not appear in any reduction. These metarules can be derived from the reductions, typically using either the \(P(A) \leftarrow Q(A,A)\) or \(P(A,A) \leftarrow Q(A)\) metarules. To try to identify a reduction that more closely matches the metarules shown in Table 1, we consider a fragment that excludes clauses in which a literal contains multiple occurrences of the same variable. For instance, this fragment excludes the previously mentioned metarules and also excludes the metarule \(P(A,A) \leftarrow Q(B,A)\), which was in the D-reduction shown in Table 5. We call this fragment duplicate-free. It is a sub-fragment of \({{{\mathscr {K}}}}^{a}_{m}\) and we denote it as \({{{\mathscr {U}}}}^{a}_{m}\).
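The defining constraint of the duplicate-free fragment is a per-literal check. A minimal sketch (our own, illustrative only; literals as tuples of variable names, with the inherited singleton-free and connectedness constraints of \({{{\mathscr {K}}}}^{a}_{m}\) omitted):

```python
def is_duplicate_free(clause):
    """Duplicate-free: no literal contains two occurrences of the same
    variable, e.g. P(A,A) is excluded."""
    return all(len(set(lit)) == len(lit) for lit in clause)

print(is_duplicate_free([("A", "B"), ("B", "A")]))  # True
print(is_duplicate_free([("A", "A"), ("B", "A")]))  # False: head P(A,A)
```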
Table 13 shows the reductions for the fragment \({{{\mathscr {U}}}}^{\{1,2\}}_{5}\). Reductions for other duplicate-free fragments are in Appendix “A.4”. As Table 13 shows, the D-reduction of \({{{\mathscr {U}}}}^{\{1,2\}}_{5}\) contains some metarules commonly used in the MIL literature. For instance, it contains the \({\textit{identity}}_1\), \({\textit{didentity}}_2\), and precon metarules. We use the metarules shown in Table 13 in Experiments 1 and 2 (Sects. 6.1 and 6.2) to learn Michalski trains solutions and string transformation programs respectively.
The following observations show that the summary results for \({{{\mathscr {K}}}}^{a}_{\infty }\) carry over to \({{{\mathscr {U}}}}^{a}_{\infty }\):
(S) The clauses in the proofs of Proposition 16 and Theorem 12 belong to \({{{\mathscr {U}}}}^{a}_{\infty }\).
(E) If the clause C considered initially in the proof of Theorem 13 belongs to \({{{\mathscr {U}}}}^{a}_{\infty }\), then all the subsequent clauses in that proof are also duplicate-free.
(D) In the proof of Theorem 7, the \(C_{I_m}\) family of clauses all belong to \({{{\mathscr {U}}}}^{a}_{\infty }\).
Reductions of the fragment \({{{\mathscr {U}}}}^{\{1,2\}}_{5}\)
S-reduction | E-reduction | D-reduction |
---|---|---|
\(P(A) \leftarrow Q(A)\) | \(P(A) \leftarrow Q(A,B),R(A,B)\) | \(P(A) \leftarrow Q(A)\) |
\(P(A) \leftarrow Q(A,B),R(A,B)\) | \(P(A,B) \leftarrow Q(B,A)\) | \(P(A) \leftarrow Q(A),R(A)\) |
\(P(A,B) \leftarrow Q(A,B)\) | \(P(A,B) \leftarrow Q(A),R(B)\) | \(P(A) \leftarrow Q(A,B),R(B)\) |
\(P(A,B) \leftarrow Q(B,A)\) |  | \(P(A) \leftarrow Q(A,B),R(A,B)\) |
\(P(A,B) \leftarrow Q(A),R(B)\) |  | \(P(A,B) \leftarrow Q(B,A)\) |
\(P(A,B) \leftarrow Q(B),R(A,C),S(A,C)\) |  | \(P(A,B) \leftarrow Q(A),R(B)\) |
\(P(A,B) \leftarrow Q(A),R(B,C),S(B,C)\) |  | \(P(A,B) \leftarrow Q(A),R(A,B)\) |
\(P(A,B) \leftarrow Q(B,C),R(A,D),S(A,D),T(B,C)\) |  | \(P(A,B) \leftarrow Q(A,B),R(A,B)\) |
 |  | \(P(A,B) \leftarrow Q(A,C),R(B,C)\) |
 |  | \(P(A,B) \leftarrow Q(A,C),R(A,D),S(B,C),T(B,D),U(C,D)\) |
 |  | \(P(A,B) \leftarrow Q(B,C),R(A,D),S(B,D),T(C,E),U(E)\) |
 |  | \(P(A,B) \leftarrow Q(B,C),R(A,D),S(B,D),T(C,E),U(C,E)\) |
Table 14 Cardinality and body size of the reductions of \({{{\mathscr {U}}}}^{a}_{5}\)
Arities | S-reduction | | E-reduction | | D-reduction | |
---|---|---|---|---|---|---|
a | Body size | Cardinality | Body size | Cardinality | Body size | Cardinality |
0 | 1 | 1 | 1 | 1 | 2 | 2 |
1 | 1 | 1 | 1 | 1 | 2 | 2 |
2 | 4 | 3 | 5 | 2 | 5 | 10 |
0, 1 | 1 | 2 | 1 | 2 | 2 | 5 |
0, 2 | 5 | 4 | 5 | 3 | 5 | 38 |
1, 2 | 4 | 8 | 2 | 3 | 5 | 12 |
0, 1, 2 | 4 | 9 | 2 | 4 | 5 | 16 |
5.5 Summary
Table 15 Existence of an S-, E-, or D-reduction of \({{{\mathscr {M}}}}^{{a}}_{\infty }\) to \({{{\mathscr {M}}}}^{{a}}_{2}\)
Arities | \({{{\mathscr {C}}}}^{a}_{\infty }\) | \({{{\mathscr {D}}}}^{a}_{\infty }\) | \({{{\mathscr {K}}}}^{a}_{\infty }\) | \({{{\mathscr {U}}}}^{a}_{\infty }\) | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
a | S | E | D | S | E | D | S | E | D | S | E | D |
1 | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) |
2 | \(\checkmark \) | \(\checkmark \) | \(\times \) | \(\checkmark \) | \(\checkmark \) | \(\times \) | \(\times \) | \(\checkmark \) | \(\times \) | \(\times \) | \(\checkmark \) | \(\times \) |
\(>2\) | \(\checkmark \) | \(\checkmark \) | \(\times \) | \(\times \) | \(\checkmark \) | \(\times \) | \(\times \) | \(\checkmark \) | \(\times \) | \(\times \) | \(\checkmark \) | \(\times \) |
6 Experiments
Null hypothesis 1 There is no difference in the learning performance of Metagol when using different reduced sets of metarules.
6.1 Michalski trains
6.1.1 Materials
Table 16 The \(\hbox {D}^*\) fragment, which is the D-reduction of the fragment \({{{\mathscr {U}}}}^{\{1,2\}}_{5}\) restricted to the fragment \({{{\mathscr {U}}}}^{\{1,2\}}_{2}\)
\(P(A) \leftarrow Q(A)\) | \(P(A,B) \leftarrow Q(B,A)\) |
\(P(A) \leftarrow Q(A),R(A)\) | \(P(A,B) \leftarrow Q(A),R(B)\) |
\(P(A) \leftarrow Q(A,B),R(B)\) | \(P(A,B) \leftarrow Q(A),R(A,B)\) |
\(P(A) \leftarrow Q(A,B),R(A,B)\) | \(P(A,B) \leftarrow Q(A,B),R(A,B)\) |
\(P(A,B) \leftarrow Q(A,C),R(B,C)\) |
6.1.2 Method
- 1.
Generate 10 training examples of \(t_i\), half positive and half negative
- 2.
Generate 200 testing examples of \(t_i\), half positive and half negative
- 3. For each set of metarules m in the S-, E-, D-, and \(D^*\)-reductions:
- (a)
Learn a program for task \(t_i\) using the training examples and metarules m
- (b)
Measure the predictive accuracy of the learned program using the testing examples
Table 17 Predictive accuracies when using different reduced sets of metarules on the Michalski trains problems
Task | S | E | D | \(\hbox {D}^*\) |
---|---|---|---|---|
\(T_1\) | 100 ± 0 | 100 ± 0 | 100 ± 0 | 100 ± 0 |
\(T_2\) | 100 ± 0 | 100 ± 0 | 100 ± 0 | 100 ± 0 |
\(T_3\) | \(68 \pm 5\) | \(62 \pm 5\) | 100 ± 0 | 100 ± 0 |
\(T_4\) | \(75 \pm 6\) | \(75 \pm 6\) | 100 ± 0 | 100 ± 0 |
\(T_5\) | \(92 \pm 4\) | \(78 \pm 6\) | \(78 \pm 6\) | 100 ± 0 |
\(T_6\) | \(52 \pm 2\) | \(50 \pm 0\) | \(70 \pm 6\) | 100 ± 0 |
\(T_7\) | \(95 \pm 3\) | \(65 \pm 5\) | \(82 \pm 5\) | 100 ± 0 |
\(T_8\) | \(55 \pm 3\) | \(52 \pm 2\) | \(72 \pm 6\) | 98 ± 2 |
Mean | \(80 \pm 1\) | \(73 \pm 2\) | \(88 \pm 2\) | 100 ± 0 |
Table 18 Learning times in seconds when using different reduced sets of metarules on the Michalski trains problems
Task | S | E | D | \(\hbox {D}^*\) |
---|---|---|---|---|
T\(_1\) | 0 ± 0 | 0 ± 0 | 0 ± 0 | 0 ± 0 |
T\(_2\) | 0 ± 0 | 0 ± 0 | 0 ± 0 | 0 ± 0 |
T\(_3\) | \(424 \pm 59\) | \(461 \pm 56\) | 0 ± 0 | 0 ± 0 |
T\(_4\) | \(322 \pm 64\) | \(340 \pm 61\) | 0 ± 0 | 0 ± 0 |
T\(_5\) | \(226 \pm 48\) | \(320 \pm 59\) | \(361 \pm 59\) | 5 ± 2 |
T\(_6\) | \(583 \pm 17\) | \(600 \pm 0\) | \(429 \pm 51\) | 7 ± 2 |
T\(_7\) | \(226 \pm 44\) | \(446 \pm 55\) | \(243 \pm 61\) | 6 ± 1 |
T\(_8\) | \(550 \pm 35\) | \(570 \pm 30\) | \(361 \pm 64\) | 183 ± 40 |
Mean | \(292 \pm 16\) | \(342 \pm 17\) | \(174 \pm 16\) | 25 ± 5 |
6.1.3 Results
Table 17 shows the predictive accuracies when learning with the different sets of metarules. The D set generally outperforms the S and E sets with a higher mean accuracy of 88% versus 80% and 73% respectively. Moreover, the \(D^*\) set easily outperforms them all with a mean accuracy of 100%. A McNemar’s test^{21} on the D and \(D^*\) accuracies confirmed the significance at the \(p < 0.01\) level.
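For reference, McNemar's test compares the discordant outcomes of two learners on the same paired test examples. A minimal self-contained sketch of the continuity-corrected statistic (the counts b and c below are hypothetical, not the experimental data):

```python
# McNemar's test on paired classification outcomes.
# b and c are the discordant counts: examples where exactly one of
# the two compared learners classified correctly.

def mcnemar_statistic(b, c):
    """Chi-squared statistic with continuity correction."""
    return (abs(b - c) - 1) ** 2 / (b + c)

# Critical value of the chi-squared distribution with 1 degree of
# freedom at the p < 0.01 level.
CRITICAL_P001 = 6.635

# Hypothetical counts: first learner correct / second wrong on 2
# examples, first wrong / second correct on 40 examples.
stat = mcnemar_statistic(b=2, c=40)
significant = stat > CRITICAL_P001
```

With these illustrative counts the statistic far exceeds the critical value, so the difference would be significant at \(p < 0.01\).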
Table 18 shows the corresponding learning times when using different reduced sets of metarules. The D set outperforms the S and E sets (it has a lower mean learning time), and again the \(D^*\) set outperforms them all. A paired t-test^{22} on the D and \(D^*\) learning times confirmed the significance at the \(p < 0.01\) level.
The \(D^*\) set performs particularly well on the more difficult tasks. The poor performance of the S and E sets on these tasks has one of two causes. The first is that the S- and E-reduction algorithms have removed the metarules necessary to express the target concept. This observation strongly corroborates our claim that E-reduction can be too strong because it can remove metarules necessary to specialise a clause. The second is that the S- and E-reduction algorithms produce sets of metarules that are still sufficient to express the target theory, but doing so requires a much larger and more complex program, measured by the number of clauses needed.
The performance discrepancy between the D and \(D^*\) sets of metarules can be explained by comparing the hypothesis spaces searched. For instance, when searching for a program with 3 clauses, Theorem 1 shows that when using the D set of metarules the hypothesis space contains approximately \(10^{24}\) programs. By contrast, when using the \(D^*\) set of metarules the hypothesis space contains approximately \(10^{14}\) programs. As explained in Sect. 3.2, assuming that the target hypothesis is in both hypothesis spaces, the Blumer bound (Blumer et al. 1987) tells us that searching the smaller hypothesis space will result in less error, which helps to explain these empirical results. Of course, there is the potential for the \(D^*\) set to perform worse than the D set when the target theory requires the three removed metarules, but we did not observe this situation in this experiment.
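The Blumer-bound argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a bound of the form \(|H_n| \le (m\,p^{j+1})^n\) on the number of programs, a form used in the MIL literature, with m metarules of at most j body literals, p predicate symbols, and n clauses; the parameter values are illustrative and do not reproduce the \(10^{24}\) and \(10^{14}\) figures above:

```python
# Rough upper bound on the size of a MIL hypothesis space, assuming
# |H_n| <= (m * p**(j+1))**n, where m is the number of metarules,
# p the number of predicate symbols, j the maximum number of body
# literals, and n the number of clauses in a program.

def hypothesis_space_bound(m, p, j, n):
    return (m * p ** (j + 1)) ** n

# Illustrative comparison: removing metarules, and in particular the
# larger-bodied ones, shrinks the space searched for a fixed n.
bound_large = hypothesis_space_bound(m=10, p=10, j=5, n=3)
bound_small = hypothesis_space_bound(m=9, p=10, j=2, n=3)
```

Even a small reduction in the number and body size of the metarules shrinks the bound by many orders of magnitude, which is the effect the Blumer bound converts into lower expected error.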
Figure 3 shows the target program for \(\hbox {T}_8\) and example programs learned by Metagol using the various reduced sets of metarules. Only the \(\hbox {D}^*\) program is success set equivalent^{23} to the target program when restricted to the target predicate f/1. In all three cases Metagol discovered that if a carriage has three wheels then it is a long carriage, i.e. Metagol discovered that the literal long(C2) is redundant in the target program. Indeed, if we unfold the \(\hbox {D}^*\) program to remove the invented predicates then the resulting single clause program is one literal shorter than the target program.
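The unfolding step mentioned above can be sketched as a single inlining pass over the learned clauses. The encoding and the predicates (f, inv1, has_car, three_wheels) are illustrative rather than the actual \(\hbox {T}_8\) program, and variable standardisation is omitted because the example's variables already agree across clauses:

```python
# One unfolding pass that inlines invented predicates having a
# single defining clause. A clause is (head, body); a literal is a
# tuple (predicate, arg1, ..., argn).

def unfold(clause, definitions):
    head, body = clause
    new_body = []
    for literal in body:
        pred = literal[0]
        if pred in definitions:
            # Replace the call by the body of its definition.
            new_body.extend(definitions[pred][1])
        else:
            new_body.append(literal)
    return (head, new_body)

# f(A) <- inv1(A).   inv1(A) <- has_car(A,B), three_wheels(B).
program = (("f", "A"), [("inv1", "A")])
definitions = {"inv1": (("inv1", "A"),
                        [("has_car", "A", "B"),
                         ("three_wheels", "B")])}
unfolded = unfold(program, definitions)
```

The result is the single clause f(A) <- has_car(A,B), three_wheels(B), with the invented predicate removed.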
Overall, the results from this experiment suggest that we can reject the null hypothesis, both in terms of predictive accuracies and learning times.
6.2 String transformations
In Lin et al. (2014) and Cropper and Muggleton (2019) the authors evaluate Metagol on 17 real-world string transformation tasks using a predefined (hand-crafted) set of metarules. In this experiment, we compare learning with different metarules on an expanded dataset with 250 string transformation tasks.
6.2.1 Materials
Table 19 Example input–output pairs for the p6 string transformation problem
Input | Output |
---|---|
Arthur Joe Juan | AJJ |
Jose Larry Scott | JLS |
Kevin Jason Matthew | KJM |
Donald Steven George | DSG |
Raymond Frank Timothy | RFT |
6.2.2 Method
- 1.
Sample 50 tasks Ts from the set \(\{p1,\dots ,p250\}\)
- 2. For each \(t \in Ts\):
- (a)
Sample 5 training examples and use the remaining examples as testing examples
- (b) For each set of metarules m in the S-, E-, D-, and \(D^*\)-reductions:
- i.
Learn a program p for task t using the training examples and metarules m
- ii.
Measure the predictive accuracy of p using the testing examples
6.2.3 Results
Table 20 shows the mean predictive accuracies and learning times when learning with the different sets of metarules. Note that we are not interested in the absolute predictive accuracy, which is limited by factors such as the low timeout and insufficiency of the BK. We are instead interested in the relative accuracies. Table 20 shows that the D set outperforms the S and E sets, with a higher mean accuracy of 32%, versus 22% and 22% respectively. The \(D^*\) set outperforms them all with a mean accuracy of 56%. A McNemar’s test on the D and \(D^*\) accuracies confirmed the significance at the \(p < 0.01\) level.
Table 20 shows the corresponding learning times when varying the metarules. Again, the D set outperforms the S and E sets, and again the \(D^*\) set outperforms them all. A paired t-test on the D and \(D^*\) learning times confirmed the significance at the \(p < 0.01\) level.
Table 20 Experimental results on the string transformation problems
S | E | D | \(\hbox {D}^*\) | |
---|---|---|---|---|
Mean predictive accuracy (%) | \(22 \pm 0\) | \(22 \pm 0\) | \(32 \pm 0\) | 56 ± 1 |
Mean learning time (seconds) | \(467 \pm 1\) | \(467 \pm 1\) | \(407 \pm 3\) | 270 ± 3 |
6.3 Inducing game rules
The general game playing (GGP) framework (Genesereth et al. 2005) is a system for evaluating an agent’s general intelligence across a wide range of tasks. In the GGP competition, agents are tested on games they have never seen before. In each round, the agents are given the rules of a new game. The rules are described symbolically as a logic program. The agents are given a few seconds to think, to process the rules of the game, and to then start playing, thus producing game traces. The winner of the competition is the agent who gets the best total score over all the games. In this experiment, we use the IGGP dataset (Cropper et al. 2019) which inverts the GGP task: an ILP system is given game traces and the task is to learn a set of rules (a logic program) that could have produced these traces.
6.3.1 Materials
Table 21 IGGP games used in the experiments
GT attrition | GT chicken |
GT prisoner | Minimal decay |
Minimal even | Multiple buttons and lights |
Scissors paper stone | Untwisty corridor |
6.3.2 Method
The majority of game examples are negative. We therefore use balanced accuracy to evaluate the approaches. Given background knowledge B, sets of positive \(E^+\) and negative \(E^-\) testing examples, and a logic program H, we define the number of positive examples as \(p=|E^+|\), the number of negative examples as \(n=|E^-|\), the number of true positives as \(tp=|\{e \in E^+ | B \cup H \models e\}|\), the number of true negatives as \(tn=|\{e \in E^- | B \cup H \not \models e\}|\), and the balanced accuracy \(ba = (tp/p + tn/n)/2\).
- 1.
Learn a program p for \(g_t\) using all the training examples and the metarules m, with a timeout of 10 min
- 2.
Measure the balanced accuracy of p using the testing examples
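The balanced accuracy defined above can be computed directly from the four counts; a minimal sketch (the function name is ours):

```python
# Balanced accuracy as defined above: ba = (tp/p + tn/n) / 2,
# where tp/tn are true positive/negative counts and p/n are the
# total numbers of positive/negative testing examples.

def balanced_accuracy(tp, tn, p, n):
    return (tp / p + tn / n) / 2

# With many more negatives than positives, a program entailing no
# examples still scores 50%, which is why plain accuracy would
# mislead on this dataset.
ba = balanced_accuracy(tp=0, tn=90, p=10, n=90)
```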
6.3.3 Results
Table 22 Experimental results on the IGGP data
S | E | D | \(\hbox {D}^*\) | |
---|---|---|---|---|
Balanced accuracy (%) | 66 | 66 | 72 | 73 |
Learning time (seconds) | 316 | 316 | 327 | 296 |
7 Conclusions and further work
As stated in Sect. 1, despite the widespread use of metarules, there is little work determining which metarules to use for a given learning task. Instead, suitable metarules are assumed to be given as part of the background knowledge, or are used without any theoretical justification. Deciding which metarules to use for a given learning task is a major open challenge (Cropper 2017; Cropper and Muggleton 2014) and is a trade-off between efficiency and expressivity: the hypothesis space grows given more metarules (Cropper and Muggleton 2014; Lin et al. 2014), so we wish to use fewer metarules, but if we use too few metarules then we lose expressivity. To address this issue, Cropper and Muggleton (2014) used E-reduction on sets of metarules and showed that learning with E-reduced sets of metarules can lead to higher predictive accuracies and lower learning times compared to learning with non-E-reduced sets. However, as we claimed in Sect. 1, E-reduction is not always the most appropriate form of reduction because it can remove metarules necessary to learn programs with the necessary specificity.
To support our claim, we have compared three forms of logical reduction: S-, E-, and D-reduction, where the latter is a new form of reduction based on SLD-derivations. We have used the reduction algorithms to reduce finite sets of metarules. Table 15 summarises the results. We have shown that many sets of metarules relevant to ILP do not have finite reductions (Theorem 7). These negative results have direct (negative) implications for MIL. Specifically, our results mean that, in certain cases, a MIL system, such as Metagol or HEXMIL (Kaminski et al. 2018), cannot be given a finite set of metarules from which it can learn any program, such as when learning arbitrary Datalog programs. The results will also likely have implications for other forms of ILP which rely on metarules.
Our experiments compared the learning performance of Metagol when using the different reduced sets of metarules. In general, using the D-reduced set outperforms both the S- and E-reduced sets in terms of predictive accuracy and learning time. Our experimental results give strong evidence for our claim. We also compared a \(D^*\)-reduced set, a subset of the D-reduced metarules, which, although derivationally incomplete, outperforms the other two sets in terms of predictive accuracies and learning times.
7.1 Limitations and future work
Theorem 7 shows that certain fragments of metarules do not have finite D-reductions. However, our experimental results show that using D-reduced sets of metarules leads to higher predictive accuracies and lower learning times compared to the other forms of reduction. Therefore, our work now opens up a new challenge of overcoming this negative theoretical result. One idea is to explore whether special metarules, such as a currying metarule (Cropper and Muggleton 2016a), could alleviate the issue.
In future work we would also like to reduce more general fragments of logic, such as triadic logics, which would allow us to tackle a wider variety of problems, such as more of the games in the IGGP dataset.
We have compared the learning performance of Metagol when using different reduced sets of metarules. However, we have not investigated whether these reductions are optimal. For instance, when considering derivation reductions, it may, in some cases, be beneficial to re-add redundant metarules to the reduced sets to avoid having to derive them through SLD-resolution. In future work, we would like to investigate identifying an optimal set of metarules for a given learning task, or preferably learning which metarules to use for a given learning task.
We have shown that, although derivationally incomplete, the \(D^*\)-reduced set of metarules outperforms the other reductions. In future work we would like to explore other methods which sacrifice completeness for efficiency.
We have used the logical reduction techniques to remove redundant metarules. It may also be beneficial to simultaneously reduce metarules and standard background knowledge. The idea of purposely removing background predicates is similar to dimensionality reduction, widely used in other forms of machine learning (Skillicorn 2007), but under-researched in ILP (Fürnkranz 1997). Initial experiments indicate that this is possible (Cropper 2017; Cropper and Muggleton 2014), and we aim to develop this idea in future work.
Footnotes
- 1.
- 2.
The fully quantified rule is \(\exists P \exists Q \exists R \forall A \forall B \forall C \; P(A,B) \leftarrow Q(A,C), R(C,B)\).
- 3.
A chained dyadic Datalog clause has the restriction that every first-order variable in a clause appears in exactly two literals and a path connects every literal in the body of C to the head of C. In other words, a chained dyadic Datalog clause has the form \(P_0(X_0,X_1) \leftarrow P_1(X_0,X_2), P_2(X_2,X_3), \dots , P_n(X_n,X_1)\) where the order of the arguments in the literals does not matter.
- 4.
- 5.
Although the MIL problem has also been encoded as an ASP problem (Kaminski et al. 2018).
- 6.
MIL uses example-driven test incorporation for finding consistent programs, as opposed to the generate-and-test approach of clause refinement.
- 7.
Datalog also imposes additional constraints on negation in the body of a clause, but because we disallow negation in the body we omit these constraints for simplicity.
- 8.
By more general we mean we focus on metarules that are independent of any particular ILP problem with particular predicate and constant symbols.
- 9.
For instance, the metarule \(P(A) \leftarrow \) entails and subsumes every metarule with a monadic head.
- 10.
The Blumer bound is a reformulation of Lemma 2.1 in Blumer et al. (1987).
- 11.
In practice we use more efficient algorithms for each approach. For instance, in the Prolog implementation of derivation reduction we use the knowledge gained from Lemma 1 to prune clauses that are too large to be useful when checking whether a clause is derivable.
- 12.
Rename the variables in \(M_4\) to form \(M_4' = P_0(X_1,X_2) \leftarrow P_1(X_2,X_3),P_2(X_1,X_4),P_3(X_1,X_4),\)\(P_4(X_2,X_3)\). Then \(M_4' \theta = P(A,B) \leftarrow R(B,B),Q(A,A),Q(A,A),R(B,B)\) where \(\theta =\{P_0/P,P_1/R,P_2/Q,P_3/Q,\)\(P_4/R,X_1/A,X_2/B,X_3/B,X_4/A\}\). It follows that \(M_4' \theta \subseteq M_2\), so \(M_4 \preceq M_2\), which in turn implies \(M_4 \models M_2\).
- 13.
Rename the variables in \(M_4\) to form \(M_4' = P_0(X_1,X_2) \leftarrow P_1(X_2,X_3),P_2(X_1,X_4),P_3(X_1,X_4),\)\(P_4(X_2,X_3)\). Then \(M_4' \theta = P(A,B) \leftarrow R(B,C),Q(A,C),Q(A,C),R(B,C)\) where \(\theta =\{P_0/P,P_1/R,P_2/Q,P_3/Q,\)\(P_4/R,X_1/A,X_2/B,X_3/C,X_4/C\}\). It follows that \(M_4' \theta \subseteq M_3\), so \(M_4 \preceq M_3\), which in turn implies \(M_4 \models M_3\).
- 14.
Rename the variables in \(M_3\) to form \(M_3' = P_0(X,Y) \leftarrow P_1(X,Z),P_2(Y,Z)\). Resolve the first body literal of \(M_2\) with \(M_3'\) to form \(R_1 = P(A,B) \leftarrow P_1(A,Z),P_2(A,Z),R(B,B)\). Rename the variables \(P_1\) to \(P_3\), \(P_2\) to \(P_4\), and Z to \(Z_1\) in \(R_1\) (to standardise apart the variables) to form \(R_2 = P(A,B) \leftarrow P_3(A,Z_1),P_4(A,Z_1),R(B,B)\). Resolve the last body literal of \(R_2\) with \(M_3'\) to form \(R_3 = P(A,B) \leftarrow P_3(A,Z_1),P_4(A,Z_1),P_1(B,Z),P_2(B,Z)\). Rename the variables \(Z_1\) to D, Z to C, \(P_3\) to R, \(P_4\) to S, \(P_1\) to Q, and \(P_2\) to T in \(R_3\) to form \(R_4 = P(A,B) \leftarrow R(A,D),S(A,D),Q(B,C),T(B,C)\). Thus \(R_4 = M_4\), so it follows that \(\{M_2,M_3\} \models M_4\).
- 15.
The entailment and derivation reduction algorithms often took 4–5 h to find a reduction. However, in some cases, typically where the fragments contained many metarules, the algorithms took around 12 h to find a reduction. By contrast, the subsumption reduction algorithm typically found a reduction in 30 min.
- 16.
Connected clauses are also known as linked clauses (Gottlob et al. 1997).
- 17.
Those are the only options to derive \(C_I\). Otherwise, e.g. with \(C_2 = H(A',C') \leftarrow Q(A',D')\), the resulting clause is not \(C_I\) because \(D'\) is not unified with any of the variables in \(C_1\) (whereas \(A'\) unifies with A and \(C'\) with C); e.g. the result includes the literal \(Q(A,D')\) instead of Q(A, C), hence it is not \(C_I\).
- 18.
Note that this proof also shows that \({{{\mathscr {K}}}}^{2}_{\infty }\) does not have a \({{{\mathscr {K}}}}^{2}_{3}\)-S-reduction.
- 19.
- 20.
Experimental data is available at http://github.com/andrewcropper/mlj19-reduce.
- 21.
A statistical test on paired nominal data https://en.wikipedia.org/wiki/McNemar%27s_test.
- 22.
A statistical test on paired ordinal data http://www.biostathandbook.com/pairedttest.html.
- 23.
The success set of a logic program P is the set of ground atoms \(\{A \in hb(P)|P\cup \{ \lnot A \}\;\text {has an SLD-refutation}\}\), where hb(P) denotes the Herbrand base of the logic program P. The success set restricted to a specific predicate symbol p is the subset of the success set containing only atoms with the predicate symbol p.
Acknowledgements
The authors thank Stephen Muggleton and Katsumi Inoue for discussions on this topic. We especially thank Rolf Morel for valuable feedback on the paper.
References
- Albarghouthi, A., Koutris, P., Naik, M., & Smith, C. (2017). Constraint-based synthesis of Datalog programs. In J. C. Beck (Ed.), Principles and practice of constraint programming—23rd international conference, CP 2017, Melbourne, VIC, Australia, August 28–September 1, 2017, proceedings, volume 10416 of Lecture Notes in Computer Science (pp. 689–706). Springer.
- Bienvenu, M. (2007). Prime implicates and prime implicants in modal logic. In Proceedings of the twenty-second AAAI conference on artificial intelligence, July 22–26, 2007, Vancouver, BC, Canada (pp. 379–384). AAAI Press.
- Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. K. (1987). Occam’s razor. Information Processing Letters, 24(6), 377–380.
- Bradley, A. R., & Manna, Z. (2007). The calculus of computation: Decision procedures with applications to verification. Berlin: Springer.
- Campero, A., Pareja, A., Klinger, T., Tenenbaum, J., & Riedel, S. (2018). Logical rule induction and theory learning using neural theorem proving. ArXiv e-prints, September 2018.
- Church, A. (1936). A note on the Entscheidungsproblem. The Journal of Symbolic Logic, 1(1), 40–41.
- Cohen, W. W. (1994). Grammatically biased learning: Learning logic programs using an explicit antecedent description language. Artificial Intelligence, 68(2), 303–366.
- Cropper, A. (2017). Efficiently learning efficient programs. Ph.D. thesis, Imperial College London, UK.
- Cropper, A., Evans, R., & Law, M. (2019). Inductive general game playing. ArXiv e-prints, arXiv:1906.09627, Jun 2019.
- Cropper, A., & Muggleton, S. H. (2014). Logical minimisation of meta-rules within meta-interpretive learning. In J. Davis & J. Ramon (Eds.), Inductive logic programming—24th international conference, ILP 2014, Nancy, France, September 14–16, 2014, revised selected papers, volume 9046 of Lecture Notes in Computer Science (pp. 62–75). Springer.
- Cropper, A., & Muggleton, S. H. (2015). Learning efficient logical robot strategies involving composable objects. In Yang, Q., & Wooldridge, M. (Eds.), Proceedings of the twenty-fourth international joint conference on artificial intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25–31, 2015 (pp. 3423–3429). AAAI Press.
- Cropper, A., & Muggleton, S. H. (2016a). Learning higher-order logic programs through abstraction and invention. In Kambhampati, S. (Ed.), Proceedings of the twenty-fifth international joint conference on artificial intelligence, IJCAI 2016, New York, NY, USA, 9–15 July 2016 (pp. 1418–1424). IJCAI/AAAI Press.
- Cropper, A., & Muggleton, S. H. (2016b). Metagol system. https://github.com/metagol/metagol. Accessed 1 July 2019.
- Cropper, A., & Muggleton, S. H. (2019). Learning efficient logic programs. Machine Learning, 108(7), 1063–1083.
- Cropper, A., Tamaddoni-Nezhad, A., & Muggleton, S. H. (2015). Meta-interpretive learning of data transformation programs. In Inoue, K., Ohwada, H., & Yamamoto, A. (Eds.), Inductive logic programming—25th international conference, ILP 2015, Kyoto, Japan, August 20–22, 2015, revised selected papers, volume 9575 of Lecture Notes in Computer Science (pp. 46–59). Springer.
- Cropper, A., & Tourret, S. (2018). Derivation reduction of metarules in meta-interpretive learning. In Riguzzi, F., Bellodi, E., & Zese, R. (Eds.), Inductive logic programming—28th international conference, ILP 2018, Ferrara, Italy, September 2–4, 2018, proceedings, volume 11105 of Lecture Notes in Computer Science (pp. 1–21). Springer.
- Dantsin, E., Eiter, T., Gottlob, G., & Voronkov, A. (2001). Complexity and expressive power of logic programming. ACM Computing Surveys, 33(3), 374–425.
- De Raedt, L. (2012). Declarative modeling for machine learning and data mining. In Algorithmic learning theory—23rd international conference, ALT 2012, Lyon, France, October 29–31, 2012, proceedings (p. 12).
- De Raedt, L., & Bruynooghe, M. (1992). Interactive concept-learning and constructive induction by analogy. Machine Learning, 8, 107–150.
- Echenim, M., Peltier, N., & Tourret, S. (2015). Quantifier-free equational logic and prime implicate generation. In A. P. Felty & A. Middeldorp (Eds.), Automated deduction—CADE-25, 25th international conference on automated deduction, Berlin, Germany, August 1–7, 2015, proceedings, volume 9195 of Lecture Notes in Computer Science (pp. 311–325). Springer.
- Emde, W., Habel, C., & Rollinger, C.-R. (1983). The discovery of the equator or concept driven learning. In A. Bundy (Ed.), Proceedings of the 8th international joint conference on artificial intelligence, Karlsruhe, FRG, August 1983 (pp. 455–458). William Kaufmann.
- Evans, R., & Grefenstette, E. (2018). Learning explanatory rules from noisy data. Journal of Artificial Intelligence Research, 61, 1–64.
- Flener, P. (1996). Inductive logic program synthesis with DIALOGS. In Muggleton, S. (Ed.), Inductive logic programming, 6th international workshop, ILP-96, Stockholm, Sweden, August 26–28, 1996, selected papers, volume 1314 of Lecture Notes in Computer Science (pp. 175–198). Springer.
- Fonseca, N. A., Costa, V. S., Silva, F. M. A., & Camacho, R. (2004). On avoiding redundancy in inductive logic programming. In R. Camacho, R. D. King & A. Srinivasan (Eds.), Inductive logic programming, 14th international conference, ILP 2004, Porto, Portugal, September 6–8, 2004, proceedings, volume 3194 of Lecture Notes in Computer Science (pp. 132–146). Springer.
- Fürnkranz, J. (1997). Dimensionality reduction in ILP: A call to arms. In Proceedings of the IJCAI-97 workshop on frontiers of inductive logic programming (pp. 81–86).
- Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: A guide to the theory of NP-completeness. New York: W. H. Freeman.
- Genesereth, M. R., Love, N., & Pell, B. (2005). General game playing: Overview of the AAAI competition. AI Magazine, 26(2), 62–72.
- Gottlob, G., & Fermüller, C. G. (1993). Removing redundancy from a clause. Artificial Intelligence, 61(2), 263–289.
- Gottlob, G., Leone, N., & Scarcello, F. (1997). On the complexity of some inductive logic programming problems. In N. Lavrac & S. Dzeroski (Eds.), Inductive logic programming, 7th international workshop, ILP-97, Prague, Czech Republic, September 17–20, 1997, proceedings, volume 1297 of Lecture Notes in Computer Science (pp. 17–32). Springer.
- Hemaspaandra, E., & Schnoor, H. (2011). Minimization for generalized boolean formulas. In T. Walsh (Ed.), IJCAI 2011, proceedings of the 22nd international joint conference on artificial intelligence, Barcelona, Catalonia, Spain, July 16–22, 2011 (pp. 566–571). IJCAI/AAAI.
- Heule, M., Järvisalo, M., Lonsing, F., Seidl, M., & Biere, A. (2015). Clause elimination for SAT and QSAT. Journal of Artificial Intelligence Research, 53, 127–168.
- Hillenbrand, T., Piskac, R., Waldmann, U., & Weidenbach, C. (2013). From search to computation: Redundancy criteria and simplification at work. In A. Voronkov, & C. Weidenbach (Eds.), Programming logics—essays in memory of Harald Ganzinger, volume 7797 of Lecture Notes in Computer Science (pp. 169–193). Springer.
- Joyner, W. H., Jr. (1976). Resolution strategies as decision procedures. Journal of the ACM, 23(3), 398–417.
- Kaminski, T., Eiter, T., & Inoue, K. (2018). Exploiting answer set programming with external sources for meta-interpretive learning. TPLP, 18(3–4), 571–588.
- Kietz, J.-U., & Wrobel, S. (1992). Controlling the complexity of learning in logic through syntactic and task-oriented models. In Inductive logic programming. Citeseer.
- Kowalski, R. A. (1974). Predicate logic as programming language. In IFIP congress (pp. 569–574).
- Larson, J., & Michalski, R. S. (1977). Inductive inference of VL decision rules. SIGART Newsletter, 63, 38–44.
- Liberatore, P. (2005). Redundancy in logic I: CNF propositional formulae. Artificial Intelligence, 163(2), 203–232.
- Liberatore, P. (2008). Redundancy in logic II: 2CNF and Horn propositional formulae. Artificial Intelligence, 172(2–3), 265–299.
- Lin, D., Dechter, E., Ellis, K., Tenenbaum, J. B., & Muggleton, S. (2014). Bias reformulation for one-shot function induction. In ECAI 2014—21st European conference on artificial intelligence, 18–22 August 2014, Prague, Czech Republic—including prestigious applications of intelligent systems (PAIS 2014) (pp. 525–530).
- Lloyd, J. W. (1987). Foundations of logic programming (2nd ed.). Berlin: Springer.
- Lloyd, J. W. (2003). Logic for learning. Berlin: Springer.
- Marcinkowski, J., & Pacholski, L. (1992). Undecidability of the Horn-clause implication problem. In 33rd annual symposium on foundations of computer science, Pittsburgh, Pennsylvania, USA, 24–27 October 1992 (pp. 354–362).
- Marquis, P. (2000). Consequence finding algorithms. In Handbook of defeasible reasoning and uncertainty management systems (pp. 41–145). Springer.
- McCarthy, J. (1995). Making robots conscious of their mental states. In Machine intelligence 15, intelligent agents [St. Catherine’s College, Oxford, July 1995] (pp. 3–17).
- Morel, R., Cropper, A., & Ong, C.-H. L. (2019). Typed meta-interpretive learning of logic programs. In Calimeri, F., Leone, N., & Manna, M. (Eds.), Logics in artificial intelligence—16th European conference, JELIA 2019, Rende, Italy, May 7–11, 2019, proceedings, volume 11468 of Lecture Notes in Computer Science (pp. 198–213). Springer.
- Muggleton, S. (1995). Inverse entailment and Progol. New Generation Computing, 13(3&4), 245–286.
- Muggleton, S., De Raedt, L., Poole, D., Bratko, I., Flach, P. A., Inoue, K., et al. (2012). ILP turns 20: Biography and future challenges. Machine Learning, 86(1), 3–23.
- Muggleton, S., & Feng, C. (1990). Efficient induction of logic programs. In Algorithmic learning theory, first international workshop, ALT ’90, Tokyo, Japan, October 8–10, 1990, proceedings (pp. 368–381).
- Muggleton, S. H., Lin, D., Pahlavi, N., & Tamaddoni-Nezhad, A. (2014). Meta-interpretive learning: Application to grammatical inference. Machine Learning, 94(1), 25–49.
- Muggleton, S. H., Lin, D., & Tamaddoni-Nezhad, A. (2015). Meta-interpretive learning of higher-order dyadic Datalog: Predicate invention revisited. Machine Learning, 100(1), 49–73.
- Nédellec, C., Rouveirol, C., Adé, H., Bergadano, F., & Tausend, B. (1996). Declarative bias in ILP. Advances in Inductive Logic Programming, 32, 82–103.
- Nienhuys-Cheng, S.-H., & de Wolf, R. (1997). Foundations of inductive logic programming. Secaucus, NJ: Springer.
- Plotkin, G. D. (1971). Automatic methods of inductive inference. Ph.D. thesis, Edinburgh University, August 1971.
- Robinson, J. A. (1965). A machine-oriented logic based on the resolution principle. Journal of the ACM, 12(1), 23–41.
- Schmidt-Schauß, M. (1988). Implication of clauses is undecidable. Theoretical Computer Science, 59, 287–296.
- Shapiro, E. Y. (1983). Algorithmic program debugging. London: MIT Press.
- Si, X., Lee, W., Zhang, R., Albarghouthi, A., Koutris, P., & Naik, M. (2018). Syntax-guided synthesis of Datalog programs. In G. T. Leavens, A. Garcia, & C. S. Pasareanu (Eds.), Proceedings of the 2018 ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, November 04–09, 2018 (pp. 515–527). ACM.
- Skillicorn, D. (2007). Understanding complex datasets: Data mining with matrix decompositions. New York: Chapman and Hall/CRC.
- Tärnlund, S. Å. (1977). Horn clause computability. BIT, 17(2), 215–226.
- Tourret, S., & Cropper, A. (2019). SLD-resolution reduction of second-order Horn fragments. In F. Calimeri, N. Leone & M. Manna (Eds.), Logics in artificial intelligence—16th European conference, JELIA 2019, Rende, Italy, May 7–11, 2019, proceedings, volume 11468 of Lecture Notes in Computer Science (pp. 259–276). Springer.
- Wang, W. Y., Mazaitis, K., & Cohen, W. W. (2014). Structure learning via parameter learning. In Li, J., Wang, X. S., Garofalakis, M. N., Soboroff, I., Suel, T., & Wang, M. (Eds.), Proceedings of the 23rd ACM international conference on conference on information and knowledge management, CIKM 2014, Shanghai, China, November 3–7, 2014 (pp. 1199–1208). ACM.
- Weidenbach, C., & Wischnewski, P. (2010). Subterm contextual rewriting. AI Communications, 23(2–3), 97–109.MathSciNetzbMATHGoogle Scholar
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.