Abstract
In Meta-interpretive learning (MIL) the metarules, second-order datalog clauses acting as inductive bias, are manually defined by the user. In this work we show that second-order metarules for MIL can be learned by MIL. We define a generality ordering of metarules by \(\theta\)-subsumption and show that user-defined sort metarules are derivable by specialisation of the most-general matrix metarules in a language class; and that these matrix metarules are in turn derivable by specialisation of third-order punch metarules with variables quantified over the set of atoms and for which only an upper bound on their number of literals need be user-defined. We show that the cardinality of a metarule language is polynomial in the number of literals in punch metarules. We reframe MIL as metarule specialisation by resolution. We modify the MIL metarule specialisation operator to return new metarules rather than first-order clauses and prove the correctness of the new operator. We implement the new operator as TOIL, a subsystem of the MIL system Louise. Our experiments show that as user-defined sort metarules are progressively replaced by sort metarules learned by TOIL, Louise’s predictive accuracy and training times are maintained. We conclude that automatically derived metarules can replace user-defined metarules.
1 Introduction
Meta-Interpretive Learning (MIL) (Muggleton et al., 2014; Muggleton & Lin, 2015) is a recent approach to Inductive Logic Programming (ILP) (Muggleton & de Raedt, 1994) capable of learning logic programs with recursive clauses and invented predicates from examples, background knowledge and a declarative bias, the “metarules”. Metarules are datalog clauses with variables quantified over predicate symbols and are therefore second-order clauses.
Metarules are often interpreted as language bias or “clause templates” in earlier work, particularly outside the MIL literature, but they are in truth second-order background knowledge used in mixed-order resolution with the first-order background knowledge to derive the first-order clauses of a hypothesis that explains the training examples. Resolution with second-order metarules is made decidable thanks to their encapsulation into first-order definite clauses, e.g. a metarule \(P(x,y) \leftarrow Q(x,y)\) is encapsulated as a definite clause \(m(P,x,y) \leftarrow m(Q,x,y)\).
In the MIL literature, metarules are typically defined by a user according to intuition or knowledge of a problem domain. A common criticism of MIL is the dearth of principled approaches for metarule selection (see e.g. Cropper & Tourret, 2018). In this work, we formalise MIL as metarule specialisation by resolution and thereby provide a principled approach to learning new metarules from examples, background knowledge and maximally general metarules.
In particular, we show that user-defined, fully-connected sort metarules commonly used in the MIL literature can be derived automatically by specialisation of maximally general second-order matrix metarules; and that matrix metarules can be themselves derived by specialisation of third-order punch metarules. This specialisation can be performed by a standard MIL clause construction operator modified to return metarules rather than first-order clauses. Predicate invention, performed in MIL by resolution between metarules, is equivalent to a derivation of new metarules with arbitrary numbers of literals. Additionally, specialisation of third-order punch metarules imposes no restriction on the arities of literals of the derived metarules. These two capabilities combined finally liberate MIL from the confines of the \(H^2_2\) language fragment of metarules with up to two body literals of arity at most 2, that is almost exclusively used in the literature and that is expressive but difficult to use in practice. Table 1 illustrates punch, matrix and sort metarules and their generality relations, highlighting the subsumption ordering of second-order metarules with arbitrary numbers of literals of arbitrary arities. We illustrate learning metarules outside \(H^2_2\) with a worked example in Sect. 7.2. We present an example of learning in \(H^2_2\) in Appendix B.
While the number of specialisations of third-order punch metarules can grow very large, they can be derived efficiently by Top Program Construction (TPC) (Patsantzis & Muggleton, 2021), a polynomial-time MIL algorithm that forms the basis of the MIL system Louise (Patsantzis & Muggleton, 2019a). We implement metarule learning by TPC in Louise as a new subsystem called TOIL.
We make the following contributions:

We define a generality ordering of third- and second-order metarules and first-order clauses, by \(\theta\)-subsumption.

We prove that metarules and first-order clauses are derivable by specialisation of more general metarules.

We redefine MIL as metarule specialisation by SLD-resolution.

We propose a modified MIL specialisation operator to return metarules rather than first-order clauses and prove its correctness.

We prove that sets of metarule specialisations are polynomial-time enumerable.

We implement our modified operator as a subsystem of Louise called TOIL.

We verify experimentally that when user-defined metarules are replaced by metarules learned by TOIL, Louise maintains its predictive accuracy at the cost of a small increase in training times.
In Sect. 2 we discuss relevant earlier work. In Sect. 3 we give some background on MIL. In Sect. 4 we develop the framework of MIL as metarule specialisation and derive our main theoretical results. In Sect. 5 we describe TOIL. In Sect. 6 we compare Louise’s performance with user-defined and TOIL-learned metarules. We summarise our findings in Sect. 7 and propose future work.
2 Related work
Declarative bias in clausal form, similar to metarules, is common in machine learning. Emde et al. (1983) propose the use of Horn clauses to declare transitivity, conversity, or parallelism relations between binary predicates; however, these early “metarules” are first-order definite clauses with ground predicate symbols. A related approach of “rule models”, having variables in place of predicate symbols, is developed in later systems METAXA.3 (Emde 1987), BLIP (Wrobel 1988) and MOBAL (Kietz and Wrobel 1992; Morik 1993). More recent work in ILP and program synthesis uses metarules as templates to restrict the hypothesis search space, for example the work by Evans and Grefenstette (2018), or Si et al. (2018) and Si et al. (2019), which draw inspiration from MIL but use metarules only as templates.
By contrast, metarules are used in MIL not as “templates” with “blanks” to be “filled in”, but as second-order formulae with the structure of datalog clauses that are resolved with first-order background knowledge to refute examples and derive the first-order clauses of a hypothesis. This use of metarules in reasoning is unique to MIL, which includes our present work.
Our work improves on the earlier use of metarules in MIL and ILP in several ways. Firstly, we extend the concept of metarules to include third-order metarules with variables quantified over the set of atoms. Such metarules are too general to be used efficiently as clause templates and are best understood within a generality framework that relates them to second-order metarules. Secondly, we define such a generality ordering by \(\theta\)-subsumption over third- and second-order metarules and first-order definite clauses. Kietz and Wrobel (1992) define a generality ordering between “rule models” as a special case of \(\theta\)-subsumption (Plotkin, 1972) restricted to variable instantiation. Our framework instead extends the full definition of \(\theta\)-subsumption to metarules. Si et al. (2018) also extend \(\theta\)-subsumption to metarules without restriction, but maintain two disjoint generality orders: one for metarules; and one for first-order clauses. Our work is the first to consider third-order metarules and place third- and second-order metarules and first-order clauses in a single ordering. Thirdly, we formalise the description of MIL as metarule specialisation by resolution and thus explain clause construction and predicate invention in MIL as the application of a metarule specialisation operator.
Our main contribution is the modification of the MIL specialisation operator to derive new metarules by specialisation of more general metarules, thus learning second-order theories. Previous work on MIL requires metarules to be defined manually, by intuition or domain knowledge, but our approach learns metarules from examples and first-order background knowledge. Automatic selection of metarules for MIL by logical reduction of a metarule language is studied by Cropper and Muggleton (2015) and Cropper and Tourret (2018). Unlike this earlier work, our approach learns new metarules rather than selecting from a user-defined set and does not suffer from the irreducibility of metarule languages to minimal sets reported by Cropper and Tourret (2018). Our approach naturally reduces each metarule language to a minimal set of most-general metarules from which all other metarules in the same language can be derived. Further, in our approach, the number of literals in third-order metarules suffices to define a metarule language. Finally, previous work on MIL has remained restricted to the \(H^2_2\) language, of datalog metarules with at most two body literals of arity at most 2, which is expressive but difficult to use in practice.^{Footnote 1} Our approach is capable of learning metarules without restriction to the numbers of literals or their arities.
Our work is comparable to recent work by Cropper and Morel (2021), on the system Popper, and Si et al. (2019) on the system ALPS, both of which are capable of learning inductive bias, the latter in the form of metarules. Both of those systems employ generate-and-test approaches guided by \(\theta\)-subsumption. Our approach differs from Popper and ALPS in that it is an application of SLD-resolution to second- and third-order clauses, rather than a generate-and-test approach.
Our implementation extends the Top Program Construction algorithm (Patsantzis & Muggleton, 2021) that avoids an expensive search of a potentially large hypothesis space and instead constructs all clauses that entail an example with respect to background knowledge. We extend this earlier work with the ability to construct second-order clauses without compromising efficiency.
3 Background
3.1 Logical notation
In this section, we extend the terminology established in Nienhuys-Cheng and de Wolf (1997) with MIL-specific terminology for second- and third-order definite clauses and programs. We only describe the salient terms in the nomenclature.
We define a language of first- and higher-order logic programs composed of clauses, themselves composed of terms, as follows. \({\mathcal{P}}, {\mathcal{F}}, {\mathcal{C}}, {\mathcal{A}}\) are disjoint sets of predicate symbols, function symbols, constants and atoms, respectively. Variables are quantified over (the elements of) a set. A variable is a term. A constant is a term. If F is a function symbol or a variable quantified over \({\mathcal {F}}\), and \(t_1, \ldots, t_n\) are terms, then \(F(t_1, \ldots, t_n)\) is a term, and n is the arity of F. If P is a predicate symbol or a variable quantified over \({\mathcal {P}}\), and \(t_1, \ldots, t_n\) are terms, then \(P(t_1, \ldots, t_n)\) is an atomic formula, or simply atom, and n is the arity of P. Arities are natural numbers. Comma-separated terms in parentheses are the arguments of a term or an atom. Terms with \(n > 0\) arguments are functional terms, or simply functions. Constants are functions with 0 arguments.
A literal is an atom, or the negation of an atom, or a variable quantified over \({\mathcal {A}}\), or the negation of a variable quantified over \({\mathcal {A}}\). A clause is a set of literals, interpreted as a disjunction. A logic program, or simply program, is a set of clauses, interpreted as a conjunction. A clause is Horn if it has at most one positive literal. A Horn clause is definite if it has exactly one positive literal, otherwise it is a Horn goal. A literal is datalog (Ceri et al., 1989) if it contains no functions other than constants. A Horn clause is datalog if it contains only datalog literals and each variable in a positive literal is shared with at least one negative literal. A logic program is definite if it contains only definite clauses and datalog if it contains only datalog clauses.
Terms without variables are ground. Atoms with only ground terms are ground. Clauses with only ground atoms are ground. Ground terms, atoms and clauses are 0-th order. Variables quantified over \({\mathcal {C}}\) are first-order, variables quantified over \({\mathcal {P}}\) or \({\mathcal {F}}\) are second-order, and variables quantified over \({\mathcal {A}}\) are third-order. Functions and atoms with order k arguments are order k. Clauses with order k literals are order k. Programs with order k clauses are order k. A non-ground order k term, atom or clause, or an order k program is also order \(k-1\).
We denote a variable X quantified over a set S as \(\exists _{\in S} X\) or \(\forall _{\in S} X\).
A substitution of variables in a clause C is a finite set \(\vartheta = \{x_1/t_1, \ldots, x_n/t_n\}\) mapping each variable \(x_i\) in C to a term, atom, or symbol \(t_i\). \(C\vartheta\), the application of \(\vartheta\) to C, replaces each occurrence of \(x_i\) with \(t_i\) simultaneously.
In keeping with logic programming convention, we will write clauses as implications, e.g. the definite clause \({A \vee \lnot B \vee \lnot C}\) will be written as \(A \leftarrow B, C\) where the comma, “,”, indicates a conjunction; and refer to the positive and negative literals in a clause as the “head” and “body” of the clause, respectively.
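The application of a substitution described above can be sketched in a few lines of Python. The representation is an assumption made for illustration and is not taken from any MIL system: a clause is a set of literals, a literal is a triple of sign, symbol and arguments, and `apply_substitution` is a hypothetical helper name.

```python
def apply_substitution(clause, theta):
    """Return the clause obtained by replacing every symbol and argument
    that is a key of `theta` with its image, all at the same time."""
    def sub(term):
        # Each term is looked up exactly once, so replacement is simultaneous.
        return theta.get(term, term)

    return frozenset(
        (positive, sub(symbol), tuple(sub(arg) for arg in args))
        for positive, symbol, args in clause
    )


# parent(x, y) <- father(x, y), written as a set of signed literals.
clause = frozenset({
    (True, "parent", ("x", "y")),
    (False, "father", ("x", "y")),
})

ground = apply_substitution(clause, {"x": "alice", "y": "bob"})
swapped = apply_substitution(clause, {"x": "y", "y": "x"})
```

Because each term is looked up once, a substitution such as \(\{x/y, y/x\}\) swaps the two variables rather than collapsing them into one.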
3.2 Meta-interpretive learning
MIL is an approach to ILP where programs are learned by specialisation of a set of higher-order metarules. Metarules are defined in the MIL literature as second-order definite datalog clauses with existentially quantified variables in place of predicate symbols and constants (Muggleton & Lin, 2015). In Sect. 4 we introduce third-order metarules, with variables quantified over \({\mathcal {A}}\) as literals.
A system that performs MIL is a Meta-Interpretive Learner, or MIL-learner (with a slight abuse of abbreviation to allow a natural pronunciation). Examples of MIL systems are Metagol (Cropper & Muggleton, 2016b), Thelma (Patsantzis & Muggleton, 2019b), Louise (Patsantzis & Muggleton, 2019a) and Hexmil (Kaminski et al., 2018). A MIL-learner is given the elements of a MIL problem and returns a hypothesis, a logic program constructed as a solution to the MIL problem. A MIL problem is a sextuple, \({\mathcal {T}} = \langle E^+, E^-, B, {\mathcal {M}}, I, {\mathcal {H}} \rangle\) where (a) \(E^+\) is a set of ground atoms and \(E^-\) is a set of negated ground atoms of one or more target predicates, the positive and negative examples, respectively; (b) B is the background knowledge, a set of definite clause definitions with datalog heads; (c) \({\mathcal {M}}\) is a set of second-order metarules; (d) I is a set of additional symbols reserved for invented predicates not defined in B or \(E^+\); and (e) \({\mathcal {H}}\) is the hypothesis space, a set of hypotheses.
Each hypothesis H in \({\mathcal {H}}\) is a set of datalog clauses, a definition of one or more target predicates in \(E^+\), and may include definitions of one or more predicates in I. For each \(H \in {\mathcal {H}}\), if \(H \wedge B \models E^+\) and \(\forall e^- \in E^-: H \wedge B \not \models e^-\), then H is a correct hypothesis.
The set, \({\mathcal {L}}\), of clauses in all hypotheses in \({\mathcal {H}}\) is the Hypothesis Language. For each clause \(C \in {\mathcal {L}}\), there exists a metarule \(M \in {\mathcal {M}}\) such that \(M \wedge B \wedge E^+ \models C\), i.e. each clause in \({\mathcal {L}}\) is an instance of a metarule in \({\mathcal {M}}\) with second-order existentially quantified variables substituted for symbols in \({\mathcal {P}}\) and first-order existentially quantified variables substituted for constants in \({\mathcal {C}}\). \({\mathcal {P}}\) and \({\mathcal {C}}\) are populated from the symbols and constants in \(B, E^+\) and I.
Typically a MIL learner is not explicitly given \({\mathcal {H}}\) or \({\mathcal {L}}\), rather those are implicitly defined by \({\mathcal {M}}\) and the constants \({\mathcal {C}}\) and symbols \({\mathcal {P}}\).
A substitution of the existentially quantified variables in a metarule M is a metasubstitution of M. By logic programming convention, a substitution of universally quantified variables is denoted by a lowercase letter, \(\vartheta , \sigma , \omega\). We follow the convention and also denote metasubstitutions with capital Greek letters, \(\varTheta , \varSigma , \varOmega\), etc. \(\vartheta \varTheta\) denotes the composition of the substitution \(\vartheta\) and metasubstitution \(\varTheta\). For brevity, we refer to the composition of a substitution and metasubstitution as meta/substitution. To help the reader distinguish a meta/substitution from its application to a metarule we note a meta/substitution as, e.g. \(\vartheta \varTheta /M\), whereas \(M\vartheta \varTheta\) is the result of applying \(\vartheta \varTheta\) to M.
MIL learners construct clauses in H during refutation-proof of a set of positive examples by SLD-resolution (Nienhuys-Cheng & de Wolf, 1997) with B and \({\mathcal {M}}\). Resolution is performed by a meta-interpreter designed to preserve the metasubstitutions of metarules in a successful proof by refutation, while discarding the substitutions of universally quantified variables to avoid over-specialisation. Metasubstitutions applied to their corresponding metarules are first-order definite clauses. In a “second pass” negative examples are refuted by the same meta-interpreter with B, H and \({\mathcal {M}}\), and any clauses in H found to entail a negative example are either removed from H (Louise) or replaced on backtracking (Metagol).
Resolution between second-order metarules and first-order clauses is made decidable by encapsulation, a mapping from metarules with existentially and universally quantified second- and first-order variables, to definite clauses with only universally quantified, first-order variables. For example, the metarule \(\exists P,Q \forall x,y: P(x,y) \leftarrow Q(x,y)\) is encapsulated as the first-order definite clause \(\forall P,Q,x,y: m(P,x,y) \leftarrow m(Q,x,y)\). Encapsulation further maps each predicate symbol \(p \in {\mathcal {P}}\) to a new constant \(p \in {\mathcal {C}}\).^{Footnote 2}
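The encapsulation mapping can be sketched as follows. This is a minimal illustration under assumed conventions, not the code of Louise or Metagol: a clause is a list of (symbol, arguments) pairs with the head first, the wrapper predicate is named `m` as in the example above, and `encapsulate` and `show` are hypothetical helper names.

```python
def encapsulate(clause, wrapper="m"):
    """Rewrite each literal P(t1, ..., tn) as m(P, t1, ..., tn).

    `clause` is a list of (symbol, args) pairs with the head literal first.
    After the rewrite, former predicate variables are ordinary arguments.
    """
    return [(wrapper, (symbol, *args)) for symbol, args in clause]


def show(clause):
    """Render a clause as 'head <- body' for inspection."""
    atoms = [f"{symbol}({','.join(args)})" for symbol, args in clause]
    return f"{atoms[0]} <- {', '.join(atoms[1:])}"


identity = [("P", ("x", "y")), ("Q", ("x", "y"))]
encapsulated = encapsulate(identity)

print(show(identity))      # P(x,y) <- Q(x,y)
print(show(encapsulated))  # m(P,x,y) <- m(Q,x,y)
```

In the encapsulated clause the symbols P and Q occupy argument positions, which is why they can be treated as universally quantified first-order variables during resolution.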
4 Framework
4.1 Metarule languages
In this section we introduce a formal notation for sets of metarules.
Definition 1
A metarule language \({\mathscr {M}}^{l}_{a}\) is a set of metarules and their instances where l is a natural number or an interval over the natural numbers, denoting the number of literals in clauses in \({\mathscr {M}}^{l}_{a}\), and a is a natural number, or an interval over the natural numbers, or a sequence of natural numbers, denoting the arities of literals in clauses in \({\mathscr {M}}^{l}_{a}\).
When the arity term, a, in \({\mathscr {M}}^{l}_{a}\) is a sequence, a total ordering is assumed over the literals in each clause in \({\mathscr {M}}^{l}_{a}\) such that (a) the positive literal is ordered before any negative literals, (b) negative literals are ordered by lexicographic order of the names of their symbols and variables and (c) literals with the same symbol and variable names are ordered by ascending arity.
Example 1
\({\mathscr {M}}^{3}_{2}\) is the language of metarules and first-order definite clauses with exactly three literals each of arity exactly 2; \({\mathscr {M}}^{[2,3]}_{\,[1,2]}\) is the language of metarules and first-order definite clauses with 2 or 3 literals of arities between 1 and 2; and \({\mathscr {M}}^{3}_{\langle 1,2,3 \rangle }\) is the language of metarules and first-order definite clauses having exactly three literals, a positive literal of arity 1 and two negative literals of arity 2 and arity 3, in that order.
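The three forms of the literal and arity terms in Definition 1 can be made concrete with a small membership check. The encoding is an assumption of this sketch: an integer stands for an exact value, a tuple `(lo, hi)` for an interval, a list for a sequence of arities, and `None` for an omitted arity term. The function `in_language` is a hypothetical name, and the caller is assumed to supply the arities already sorted by the literal ordering of Definition 1.

```python
def _matches(n, spec):
    """True if n equals an exact value or falls in a (lo, hi) interval."""
    if isinstance(spec, tuple):
        lo, hi = spec
        return lo <= n <= hi
    return n == spec


def in_language(arities, l, a=None):
    """Check a clause, given as the list of its literal arities (head first),
    against the language with literal term `l` and arity term `a`."""
    if not _matches(len(arities), l):
        return False
    if a is None:           # arity term omitted
        return True
    if isinstance(a, list):  # sequence: arities must agree position by position
        return list(arities) == a
    return all(_matches(n, a) for n in arities)


exactly_three_binary = in_language([2, 2, 2], 3, 2)
two_or_three_mixed = in_language([2, 1], (2, 3), (1, 2))
sequence_match = in_language([1, 2, 3], 3, [1, 2, 3])
too_many_literals = in_language([2, 2, 2, 2], 3, 2)
```

The first three calls mirror the three languages of Example 1 and succeed; the last fails because the clause has four literals.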
The arities of literals in third-order metarules may not be known or relevant, in which case the arity term may be omitted from the formal notation of a language.
Example 2
\({\mathscr {M}}^3\) is the language of third-order metarules, second-order metarules and first-order definite clauses with exactly 3 literals. \({\mathscr {M}}^{[1,5]}\) is the language of third- and second-order metarules and first-order definite clauses with 1 to 5 literals.
Of special interest to MIL is the \({\mathscr {M}}^{[2,3]}_{2}\) language of fully-connected (see Definition 2) second-order metarules with one function symbol, which is decidable when \({\mathcal {P}}\) and \({\mathcal {C}}\) are finite (Muggleton & Lin, 2015) and second-order variables are universally quantified by first-order encapsulation. We denote this language exceptionally as \(H^2_2\) in keeping with the MIL literature. The set of 14 second-order \(H^2_2\) metarules defined in Cropper and Muggleton (2015) is given in Table 2.
4.2 Metarule taxonomy
Metarules found in the MIL literature, such as the ones listed in Table 2, are typically fully-connected. Definition 2 extends the definition of fully-connected metarules in Cropper and Muggleton (2015) to encompass first-order clauses.
Definition 2
(Fully-connected datalog) Let M be a second-order metarule or a first-order definite clause. Two literals \(l_i, l_j \in M\) are connected if they share a first-order term, or if there exists a literal \(l_k \in M\) such that \(l_i, l_k\) are connected and \(l_j, l_k\) are connected. M is fully-connected iff each literal l in M appears exactly once in M and l is connected to every other literal in M.
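Definition 2 amounts to a reachability test over literals that share first-order terms. The following sketch assumes a clause is given as a list of (symbol, arguments) pairs; `is_fully_connected` is a hypothetical name and the two example metarules are chosen only for illustration.

```python
def is_fully_connected(literals):
    """`literals` is a list of (symbol, args) pairs; args hold first-order terms.

    Two literals are connected if they share a term directly or through a
    chain of other literals, so it suffices to check that every literal is
    reachable from the first one.
    """
    if len(set(literals)) != len(literals):
        return False  # some literal appears more than once
    if not literals:
        return True
    reached, frontier = {0}, [0]
    while frontier:
        i = frontier.pop()
        for j, (_, args) in enumerate(literals):
            if j not in reached and set(literals[i][1]) & set(args):
                reached.add(j)
                frontier.append(j)
    return len(reached) == len(literals)


# P(x,y) <- Q(x,z), R(z,y): every literal shares a variable with another.
chain = [("P", ("x", "y")), ("Q", ("x", "z")), ("R", ("z", "y"))]
# P(x,y) <- Q(z,u), R(v,w): no variable is shared between literals.
unconnected = [("P", ("x", "y")), ("Q", ("z", "u")), ("R", ("v", "w"))]

chain_ok = is_fully_connected(chain)
unconnected_ok = is_fully_connected(unconnected)
```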
Fully-connected metarules are specialised, in the sense that all their universally quantified variables are shared between their literals; existentially quantified variables may also be shared. Accordingly, fully-connected metarules can be generalised by replacing each instance of each of their variables with a new, unique variable of the same order and quantification as the replaced variable. Applying this generalisation procedure to the metarules in Table 2 we obtain the metarules in Table 3.
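The generalisation procedure just described can be sketched as a renaming pass. The sketch assumes every literal symbol of the input is a second-order variable, follows the capitalisation convention of the MIL literature (uppercase for existential, lowercase for universal first-order variables), and invents the fresh names `P1, P2, …`, `X1, X2, …` and `x1, x2, …`; `generalise` is a hypothetical name.

```python
from itertools import count


def generalise(metarule):
    """Replace every variable occurrence with a fresh variable of the same
    order and quantification. `metarule` is a list of (symbol, args) pairs."""
    pred_ids, exist_ids, univ_ids = count(1), count(1), count(1)

    def fresh(arg):
        # Uppercase first-order variables are existentially quantified,
        # lowercase ones universally quantified.
        if arg[0].isupper():
            return f"X{next(exist_ids)}"
        return f"x{next(univ_ids)}"

    return [
        (f"P{next(pred_ids)}", tuple(fresh(arg) for arg in args))
        for _symbol, args in metarule
    ]


# P(x,y) <- Q(x,z), R(z,y)
chain = [("P", ("x", "y")), ("Q", ("x", "z")), ("R", ("z", "y"))]
most_general = generalise(chain)
# P1(x1,x2) <- P2(x3,x4), P3(x5,x6)
```

No variable is shared between literals in the result, which is the defining property of the most-general metarules discussed next.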
We observe that each metarule in Table 3 is a generalisation of a metarule in the \(H^2_2\) language listed in Table 2 and so the metarules in Table 3 are the most-general metarules in \(H^2_2\). Further, those most-general \(H^2_2\) metarules can themselves be generalised to the third-order metarules in Table 4 by replacing each of their literals with a third-order variable. This observation informs our definition of three taxa of metarules and a total ordering by \(\theta\)-subsumption of third- and second-order metarules and their first-order instances.
4.3 Punch, sort and matrix metarules
In the following sections we employ a moveable type metaphor for the elements of our metarule taxonomy. In typeset printing, first a punch of a glyph is sculpted in relief in steel. The punch is used to emboss the shape of the glyph in copper, creating a matrix. The copper matrix is filled with molten soft metal to form a cast of the glyph called a sort and used to finally imprint the glyph onto paper. Thus each “level” of type elements “stamps” its shape onto the next.
Accordingly, in our taxonomy of metarules, a third-order metarule is a punch metarule denoted by \(\breve{M}\), a most-general metarule in a second-order language is a matrix metarule denoted by \(\mathring{M}\), and a fully-connected second-order metarule is a sort metarule, denoted by \(\dot{M}\). As a mnemonic device the reader may remember that a “wider” accent denotes higher generality.
Definition 3
(Punch metarules) A punch metarule \(\breve{M}\) is a third-order definite clause of the form: \(\exists _{\in {\mathcal {A}}} A_1, A_2, \ldots, A_n: \{A_1 \vee \lnot A_2 \vee \ldots \vee \lnot A_n\}\).
Definition 4
(Matrix metarules) A matrix metarule \(\mathring{M}\) (not to be confused with the linear algebra, or first-order logic concepts of a matrix) is a second-order definite clause of the form: \(\exists _{\in {\mathcal {P}}} \tau , \exists _{\in {\mathcal {C}}} \sigma , \forall _{\in {\mathcal {C}}} \rho : \{L_1 \vee \lnot L_2 \vee \ldots \vee \lnot L_n\}\) where \(\tau , \sigma , \rho\) are disjoint sets of variables, each \(L_i \in \mathring{M}\) is a second-order atom \(P_i(v_{i1}, \ldots, v_{im})\), \(P_i \in \tau , v_{i1}, \ldots, v_{im} \in \sigma \cup \rho\) and none of \(P_i, v_{i1}, \ldots, v_{im}\) is shared with any other literal \(L_k \in \mathring{M}\).
Definition 5
(Sort metarules) A sort metarule \(\dot{M}\) (not to be confused with the logic programming concept of a sort) is a second-order definite clause of the form: \(\exists _{\in {\mathcal {P}}} \tau , \exists _{\in {\mathcal {C}}} \sigma , \forall _{\in {\mathcal {C}}} \rho : \{L_1 \vee \lnot L_2 \vee \ldots \vee \lnot L_n\}\) where \(\tau , \sigma , \rho\) are disjoint sets of variables, each \(L_i \in \dot{M}\) is a second-order atom \(P_i(v_{i1}, \ldots, v_{im})\), \(P_i \in \tau , v_{i1}, \ldots, v_{im} \in \sigma \cup \rho\) and at least one \(L_i \in \dot{M}\) shares a variable in \(\tau \cup \sigma \cup \rho\) with at least one other literal \(L_k \in \dot{M}\).
Note 1
Sort metarules are not necessarily fully-connected, rather fully-connected metarules are a subset of the sort metarules. The metarules typically used in the MIL literature, such as the 14 Canonical \(H^2_2\) metarules in Table 2, are fully-connected sort metarules.
We define metarules as sets of literals according to Sect. 3.1. As discussed at the end of that section we will write clauses in the logic programming convention, as implications, and this also applies to metarules. Additionally, in keeping with MIL convention we will write metarules concisely without quantifiers instead denoting quantification by means of capitalisation: uppercase letters for existentially quantified variables, lowercase letters for universally quantified variables.
Thus, we will write a punch metarule \(\exists _{\in {\mathcal {A}}}P,Q,R: \{P \vee \lnot Q \vee \lnot R\}\) as an implication \(P \leftarrow Q, R\), a matrix metarule \(\exists _{\in {\mathcal {P}}} P,Q,R, \forall _{\in {\mathcal {C}}}x,y,z,u,v,w: \{P(x,y) \vee \lnot Q(z,u) \vee \lnot R(v,w)\}\) as an implication \(P(x,y) \leftarrow Q(z,u), R(v,w)\) and a sort metarule \(\exists _{\in {\mathcal {P}}}P, Q, R\), \(\exists _{\in {\mathcal {C}}}X, \forall _{\in {\mathcal {C}}}x,y,z: \{P(x,y) \vee \lnot Q(x,z) \vee \lnot R(X)\}\) as an implication \(P(x,y) \leftarrow Q(x,z), R(X)\).
Defining metarules as sets of literals facilitates their comparison in terms of generality, while denoting them as implications, without quantifiers, makes them easier to read and closely follows their implementation in MIL systems as Prolog clauses (with encapsulation).
4.4 Metarule generality order
We extend \(\theta\)-subsumption between clauses, as defined by Plotkin (1972), to encompass metarules with existentially quantified variables:
Definition 6
(Meta-subsumption) Let C be a metarule or a first-order definite clause and D be a metarule or a first-order definite clause. \(C \preceq D\) (read C subsumes D) iff \(\exists \vartheta , \varTheta : C \vartheta \varTheta \subseteq D\) where \(\vartheta\) is a substitution of the universally quantified variables in C and \(\varTheta\) is a metasubstitution of the existentially quantified variables in C.
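For second-order metarules and first-order clauses, Definition 6 can be checked by a brute-force backtracking search for a suitable \(\vartheta \varTheta\). The sketch below is only an illustration under stated assumptions: literals are (sign, symbol, arguments) triples, the caller names the variables of C explicitly, a single dictionary stands in for the composed meta/substitution, and third-order variables are not handled. `subsumes` is a hypothetical name, and the search is exponential in the worst case.

```python
def subsumes(c, d, variables):
    """True if some substitution of `variables` maps every literal of c
    onto a literal of d, i.e. if c (meta-)subsumes d."""
    c = list(c)

    def bind(term, target, theta):
        # Extend theta with term -> target, or return None on a clash.
        if term in variables:
            if theta.get(term, target) != target:
                return None
            return {**theta, term: target}
        return theta if term == target else None

    def search(i, theta):
        if i == len(c):
            return True
        positive, symbol, args = c[i]
        for d_positive, d_symbol, d_args in d:
            if positive != d_positive or len(args) != len(d_args):
                continue
            new = bind(symbol, d_symbol, theta)
            for a, b in zip(args, d_args):
                if new is None:
                    break
                new = bind(a, b, new)
            if new is not None and search(i + 1, new):
                return True
        return False

    return search(0, {})


matrix = {(True, "P", ("x", "y")), (False, "Q", ("z", "u")), (False, "R", ("v", "w"))}
chain = {(True, "P", ("x", "y")), (False, "Q", ("x", "z")), (False, "R", ("z", "y"))}
grandfather = {
    (True, "grandfather", ("a", "b")),
    (False, "father", ("a", "c")),
    (False, "parent", ("c", "b")),
}

matrix_vars = {"P", "Q", "R", "x", "y", "z", "u", "v", "w"}
chain_vars = {"P", "Q", "R", "x", "y", "z"}
```

With these inputs, `subsumes(matrix, chain, matrix_vars)` and `subsumes(chain, grandfather, chain_vars)` both succeed, while `subsumes(chain, matrix, chain_vars)` fails because the shared variables of the sort metarule cannot be mapped onto the distinct variables of the matrix metarule.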
Lemma 1
(3rd-order subsumption) Let \(\breve{M}\) be a punch metarule in the language \({\mathscr {M}}^{l}\) and \(\mathring{M}\) be a matrix metarule in the language \({\mathscr {M}}^{k}_{a}\). Then, \(\forall a: l \le k \rightarrow \breve{M} \preceq \mathring{M}\).
Proof
Let \(\breve{M} = \{A_1 \vee \lnot A_2 \ldots \vee \lnot A_l\}\) and \(\mathring{M} = \{L_1 \vee \lnot L_2 \ldots \vee \lnot L_k\}\). Assume a total ordering over the literals in \(\breve{M}\) and \(\mathring{M}\) as described in Sect. 4.1. Let \(\vartheta = \emptyset\) and let \(\varTheta\) be the metasubstitution that maps each \(A_i \in \breve{M}\) to the corresponding \(L_i \in \mathring{M}\). Whenever \(l \le k\), such \(\vartheta , \varTheta\) exist and \(\breve{M} \vartheta \varTheta \subseteq \mathring{M}\). \(\square\)
Lemma 2
(2nd-order subsumption) Let \(\mathring{M}\) be a matrix metarule in the language \({\mathscr {M}}^{l}_{a}\) and \(\dot{M}\) be a sort metarule in the language \({\mathscr {M}}^{k}_{b}\), where a, b are two integers or two sequences of integers having the same first element. Then \(l \le k \rightarrow \mathring{M} \preceq \dot{M}\) iff \(a = b\) or a is a subsequence of b.
Proof
Let \(P_1, \ldots, P_l\) be the existentially quantified variables and \(v_1, \ldots, v_n\) the universally quantified variables in \(\mathring{M} \in {\mathscr {M}}^{l}_{a}\). Let \(Q_1, \ldots, Q_k\) be the existentially quantified variables and \(u_1, \ldots, u_m\) the universally quantified variables in \(\dot{M} \in {\mathscr {M}}^{k}_{b}\). Assume a total ordering over the literals in \(\mathring{M}\) and \(\dot{M}\) as described in Sect. 4.1. Let \(\vartheta\) be the substitution that maps each \(v_i\) to \(u_i\) and \(\varTheta\) the metasubstitution that maps each \(P_j\) to \(Q_j\). Whenever \(l \le k\), and either \(a = b\) or a is a subsequence of b, such \(\vartheta , \varTheta\) exist and \(\mathring{M} \vartheta \varTheta \subseteq \dot{M}\). \(\square\)
Lemma 3
(1st-order subsumption) Let \(\dot{M}\) be a sort metarule in the language \({\mathscr {M}}^{l}_{a}\) and C be a first-order clause in the language \({\mathscr {M}}^{k}_{b}\), where a, b are two integers or two sequences of integers having the same first element. Then \(l \le k \rightarrow \dot{M} \preceq C\) iff \(a = b\) or a is a subsequence of b.
Proof
Let \(P_1, \ldots, P_l\) be the existentially quantified variables and \(v_1, \ldots, v_n\) be the universally quantified variables in \(\dot{M} \in {\mathscr {M}}^{l}_{a}\). Let \(Q_1, \ldots, Q_k\) be the predicate symbols and constants in \(C \in {\mathscr {M}}^{k}_{b}\) and \(t_1, \ldots, t_m\) be the first-order terms in C. Assume a total ordering over the literals in \(\dot{M}\) and C as described in Sect. 4.1. Let \(\vartheta\) be the substitution that maps each \(v_i\) to \(t_i\) and \(\varTheta\) be the metasubstitution that maps each \(P_j\) to \(Q_j\). Whenever \(l \le k\), and either \(a = b\) or a is a subsequence of b, such \(\vartheta , \varTheta\) exist and \(\dot{M} \vartheta \varTheta \subseteq C\). \(\square\)
Corollary 1
Let \({\mathscr {M}}^{l}_{a}\) be a metarule language. There exists a unique, minimal set of matrix metarules \({\mathcal {M}}^* = \{\mathring{M}_1, \ldots, \mathring{M}_n\} \subseteq {\mathscr {M}}^{l}_{a}\) such that for each sort metarule \(\dot{M} \in {\mathscr {M}}^{l}_{a}\) \(\exists \mathring{M}_i \in {\mathcal {M}}^*: \mathring{M}_i \preceq \dot{M}\). Each \(\mathring{M}_i \in {\mathcal {M}}^*\) can be derived by replacing each variable in any single \(\dot{M} \in {\mathscr {M}}^{l}_{a}\) subsumed by \(\mathring{M}_i\) with a new, unique variable.
4.5 Metarule specialisation
We now show how first-order clauses and second-order metarules can be derived by specialisation of more general metarules. We define two ways to specialise a metarule or a clause: by variable substitution or introduction of new literals.
Note 2
In the following definitions and theorem, let \(M_1, M_2\) be two metarules, or a metarule and a definite clause.
Definition 7
(V-specialisation) Let \(\vartheta , \varTheta\) be substitutions of the universally and existentially quantified variables, respectively, in \(M_1\) such that \(M_1 \vartheta \varTheta = M_2\). Then \(M_1 \vartheta \varTheta\) is a variable specialisation, or v-specialisation, of \(M_1\), and \(M_2\) is derivable from \(M_1\) by v-specialisation, or \(M_1 \vdash _v M_2\).
Definition 8
(Lspecialisation) Let L be a set of literals such that \(M_1 \cup L = M_2\). Then \(M_1 \cup L\) is a literal specialisation, or lspecialisation, of \(M_1\), and \(M_2\) is derivable from \(M_1\) by lspecialisation, or \(M_1 \vdash _l M_2\).
Definition 9
(VL Specialisation) Let \(M_1 \vartheta \varTheta\) be a vspecialisation of \(M_1\), \(M_1 \cup L\) be an lspecialisation of \(M_1\), and \(M_1 \vartheta \varTheta \cup L = M_2\). Then \(M_1 \vartheta \varTheta \cup L\) is a variable and literal specialisation, or vlspecialisation, of \(M_1\), and \(M_2\) is derivable from \(M_1\) by vlspecialisation, or \(M_1 \vdash _{vl} M_2\).
Theorem 1
(Metarule specialisation) \(M_1 \preceq M_2 \rightarrow M_1 \vdash _{vl} M_2\).
Proof
If \(M_1 \preceq M_2\) then: (a) \(\exists \vartheta , \varTheta : M_1 \vartheta \varTheta \subseteq M_2\) and (b) \(\exists L: M_1 \vartheta \varTheta \cup L = M_2\). (a) follows directly from Definition 6. (b) follows from (a) and the subset relation: if \(M_1 \vartheta \varTheta \subseteq M_2\) then \(\exists L \subseteq M_2: M_2 \setminus L = M_1 \vartheta \varTheta\) and \(M_1 \vartheta \varTheta \cup L = M_2\). By Definitions 7 and 8, \(M_1 \vartheta \varTheta\) is a v-specialisation of \(M_1\) and \(M_1 \vartheta \varTheta \cup L\) is an l-specialisation of \(M_1 \vartheta \varTheta\). Therefore, \(M_1 \vdash _v M_1 \vartheta \varTheta \vdash _l M_1 \vartheta \varTheta \cup L = M_2\) and so \(M_1 \vdash _{vl} M_2\). \(\square\)
Observation 1
There are two special cases of (a) in the proof of Theorem 1 with respect to the set of literals L: either \(M_1 \vartheta \varTheta = M_2\) or \(M_1 \vartheta \varTheta \subset M_2\). In the case where \(M_1 \vartheta \varTheta = M_2\), \(L = \emptyset\). Otherwise, \(L \ne \emptyset\).
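Definitions 7 to 9 can be illustrated with a minimal Python sketch. The representation is an assumption made only for illustration: a clause is a set of literals `(sign, symbol, args)`, and a single dictionary stands in for both \(\vartheta\) and \(\varTheta\). The function names `substitute` and `vl_specialise` are hypothetical and are not part of any MIL system.

```python
# Hypothetical representation: a literal is (sign, symbol, args),
# a clause or metarule is a frozenset of such literals.
def substitute(clause, subst):
    """Apply one dict to predicate variables and argument variables alike."""
    return frozenset(
        (sign, subst.get(sym, sym), tuple(subst.get(a, a) for a in args))
        for sign, sym, args in clause
    )

def vl_specialise(m1, subst, extra_literals=frozenset()):
    """Return M1 with the substitution applied, united with the literals L."""
    return substitute(m1, subst) | frozenset(extra_literals)

# M1 = P(x,y) <- Q(x,y)
m1 = frozenset({("+", "P", ("x", "y")), ("-", "Q", ("x", "y"))})
theta = {"P": "p", "Q": "q", "x": "a", "y": "b"}

# L empty: a v-specialisation, p(a,b) <- q(a,b)
v_spec = vl_specialise(m1, theta)
# L non-empty: a vl-specialisation, p(a,b) <- q(a,b), r(b,c)
vl_spec = vl_specialise(m1, theta, {("-", "r", ("b", "c"))})
```

With `L` empty the result equals \(M_1 \vartheta \varTheta\); with `L` non-empty the v-specialisation is a proper subset of the result, matching the two cases in Observation 1.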
4.6 MIL as metarule specialisation
In this section we explain MIL as vl-specialisation of metarules.
Algorithm 1 lists the MIL specialisation operator used in a MIL meta-interpreter to construct new clauses by refutation of a literal, \(\lnot e\). MIL systems implement Algorithm 1 idiosyncratically: in Metagol \(B^* = B \cup H\) and \(\lnot e\) is refuted with B, H or \({\mathcal {M}}\) successively (Cropper & Muggleton, 2016a); in Louise \(B^* = B \cup E^+\) and \(\lnot e\) is refuted with \(B^*\) and \({\mathcal {M}}\) simultaneously (Patsantzis & Muggleton, 2021).
Initially, e is a positive example in \(E^+\) and if refutation of \(\lnot e\) succeeds the returned clause \(M\varTheta\) is a clause in the definition of a target predicate.
If refutation fails, each atom in \(body(M\sigma \varSigma )\) in line 4 in Procedure CONSTRUCT becomes the input literal \(\lnot e\) and is refuted recursively by resolution with \(B^* \cup {\mathcal {M}}\), until \(\square\) is derived. In that case, \(M\varTheta\) is a clause in the definition of an invented predicate; thus predicate invention in MIL is achieved by resolution between metarules. Given that metarules do not have predicate symbols, when \(\lnot body(M\sigma \varSigma )\) is successfully refuted, the existentially quantified second-order variable P in \(head(M\varTheta )\) remains free. Hence, a new predicate symbol in I is substituted for P.^{Footnote 3}
If e is in \(E^-\) and refutation succeeds, \(M\varTheta\) is inconsistent and must be replaced in, or discarded from, H.
We observe that Algorithm 1 returns vl-specialisations of metarules in \({\mathcal {M}}\).
Theorem 2
(MIL as metarule specialisation) Let \(e, B^*, {\mathcal {M}}\) be as in Algorithm 1, M be a fully-connected sort metarule selected in line 2 of Procedure CONSTRUCT and \(M\varTheta =\) CONSTRUCT\((\lnot e, B^*, {\mathcal {M}})\). \(M\varTheta\) is a vl-specialisation of M.
Proof
Let \(\vartheta = L = \emptyset\). By Definition 9, \(M\vartheta \varTheta \cup L = M \varTheta\) is a vl-specialisation of M. \(\square\)
4.7 Implicit l-specialisation
Theorem 2 states that Procedure CONSTRUCT returns vl-specialisations of metarules when the set of introduced literals, L, is empty. This is a special case of vl-specialisation. What about the general case, when \(L \ne \emptyset\)? We conjecture that it is not necessary to explicitly construct such non-empty l-specialisations, because the v-specialisations returned by Procedure CONSTRUCT suffice to reconstruct non-empty l-specialisations by resolution. Suppose \(M_1, M_2 \in {\mathcal {M}}\) and \(C_1, C_2\) are non-empty v-specialisations of \(M_1, M_2\), respectively, such that \(\exists e \in E^+, \not \exists e \in E^-: \{\lnot e\} \cup \{C_1, C_2\} \cup B^* \vdash \square\). We assume that \(C_1, C_2\) can resolve with each other and that \(\{\lnot e\} \cup B^* \setminus \{e\} \cup \{C_{i \in \{1,2\}}\} \nvdash \square\). Then, there exists a resolvent \(C_3\) of \(C_1, C_2\) such that \(\exists e \in E^+, \not \exists e \in E^-: \{\lnot e\} \cup \{C_3\} \cup B^* \vdash \square\). This follows from the Resolution Theorem (Robinson, 1965). Moreover, there exists a metarule \(M_3\) that is a resolvent of \(M_1, M_2\) and such that \(C_3\) is a vl-specialisation of \(M_3\) where the set of introduced literals, L, is not empty. If so, it should not be necessary to explicitly derive \(M_3\) and \(C_3\), given \(M_1, M_2\) and Procedure CONSTRUCT. Cropper and Muggleton (2015) prove our conjecture for the \(H^2_2\) language. We leave a more general proof for future work. Table 5 illustrates the concept of such occult specialisations.
4.8 Metarule specialisation by MIL
Algorithm 1 learns first-order definite clauses. Our motivation for this work is to learn metarules that can replace user-defined metarules. User-defined metarules are fully-connected sort metarules and are chosen so that if M is a user-defined metarule and \(M\varTheta\) is a vl-specialisation of M returned by Procedure CONSTRUCT in Algorithm 1, then \(\exists e^+ \in E^+: M\varTheta \wedge B^* \models e^+\). Therefore, to replace user-defined metarules with automatically derived metarules, we must automatically derive fully-connected sort metarules having vl-specialisations that entail one or more positive examples in \(E^+\) with respect to \(B^*\).
We achieve this goal by modifying Algorithm 1, as Algorithm 2, to generalise the substitutions of both universally and existentially quantified variables in metarules. Such substitutions are fully ground after successful resolution with \(B^*\); therefore, in order to produce metarules rather than first-order clauses, we must replace the ground terms in those substitutions with new variables. We propose Procedure LIFT in Algorithm 3 to perform this “variabilisation” operation.
Lemma 4
(Fully-connected lifting) Let \(\vartheta \varTheta /M\) be a meta-substitution of a punch or matrix metarule M and \(M\vartheta \varTheta\) be a fully-connected definite clause. The application of LIFT\((\vartheta \varTheta )\) to M, M.LIFT\((\vartheta \varTheta )\), is a fully-connected sort metarule.
Proof
Procedure LIFT replaces each occurrence of a ground term with the same variable throughout \(\vartheta \varTheta\) so that if two literals \(l_i,l_k \in M\vartheta \varTheta\) share a ground term, \(\{l_i,l_k\}.\)LIFT\((\vartheta \varTheta ) \in M.\)LIFT\((\vartheta \varTheta )\) share a variable. Therefore \(M\vartheta \varTheta\) is fully-connected iff M.LIFT\((\vartheta \varTheta )\) is fully-connected. \(\square\)
Lemma 5
(Lifting subsumption) Let \(\vartheta \varTheta /M\) be a meta-substitution of a punch or matrix metarule M. Then \(M \preceq M.\)LIFT\((\vartheta \varTheta ) \preceq M\vartheta \varTheta\).
Proof
\(M \preceq M.\)LIFT\((\vartheta \varTheta )\) by Definition 6. Construct a meta-substitution \(\sigma \varSigma\) by mapping each \(w_i\) in \(v_i/w_i \in \,\)LIFT\((\vartheta \varTheta )\) to \(t_i\) in \(v_i/t_i \in \vartheta \varTheta\). \(\exists \sigma \varSigma : M.\)LIFT\((\vartheta \varTheta )\sigma \varSigma = M \vartheta \varTheta\), therefore M.LIFT\((\vartheta \varTheta ) \preceq M \vartheta \varTheta\). \(\square\)
Example 3
Let \(M = P(x,y) \leftarrow Q(z,u)\), \(\vartheta \varTheta = \{P/p, Q/q, x/a, y/b, z/a, u/b\}\). Then: \(M \vartheta \varTheta = p(a,b) \leftarrow q(a,b)\), LIFT\((\vartheta \varTheta ) =\) \(\{P/P_1, Q/Q_1, x/x_1, y/y_1, z/x_1\), \(u/y_1\}\), M.LIFT\((\vartheta \varTheta ) =\) \(P_1(x_1,y_1) \leftarrow Q_1(x_1,y_1)\), \(\sigma \varSigma = \{P_1/p, Q_1/q, x_1/a, y_1/b \}\) and M.LIFT\((\vartheta \varTheta ) \sigma \varSigma = p(a,b) \leftarrow q(a,b) = M \vartheta \varTheta\).
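Example 3 can be retraced with a short Python sketch of the lifting step. This is only an illustration under stated assumptions, not the procedure in Algorithm 3: substitutions are dictionaries, a metarule is a list of `(symbol, args)` pairs with the head first, and fresh variables are named `V1, V2, ...` (hypothetical names). The sketch also assumes that no predicate symbol has the same name as a constant.

```python
from itertools import count

def lift(subst):
    """Replace each distinct ground term with a fresh variable;
    equal ground terms are mapped to the same fresh variable."""
    fresh, counter, lifted = {}, count(1), {}
    for var, term in subst.items():
        if term not in fresh:
            fresh[term] = f"V{next(counter)}"
        lifted[var] = fresh[term]
    return lifted

def apply(clause, subst):
    """Apply a substitution to symbols and arguments of each literal."""
    return [(subst.get(s, s), tuple(subst.get(a, a) for a in args))
            for s, args in clause]

# M = P(x,y) <- Q(z,u), head literal first
m = [("P", ("x", "y")), ("Q", ("z", "u"))]
theta = {"P": "p", "Q": "q", "x": "a", "y": "b", "z": "a", "u": "b"}

lifted = lift(theta)                 # z and x share a variable, as do u and y
lifted_m = apply(m, lifted)          # V1(V3,V4) <- V2(V3,V4)
# sigma maps each fresh variable back to the ground term it replaced
sigma = {lifted[v]: t for v, t in theta.items()}
ground = apply(lifted_m, sigma)      # p(a,b) <- q(a,b)
```

The lifted clause connects head and body through shared variables exactly where the ground clause shared constants, and applying `sigma` recovers \(M\vartheta \varTheta\), as Lemma 5 requires.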
Theorem 3
(Soundness) Let \(e, B^*, {\mathcal {M}}\) be as in Algorithm 2. If \(M' =\) VLSPECIALISE\((\lnot e, B^*, {\mathcal {M}})\) then \(M'\) is a fully-connected sort metarule and \(\exists \varSigma / M': M'\varSigma \wedge B^* \models e\).
Proof
Assume Theorem 3 is false. Then, \(M' =\) VLSPECIALISE\((\lnot e, B^*, {\mathcal {M}})\) and (a) \(M'\) is not a fully-connected sort metarule, or (b) \(\not \exists \varSigma /M': M'\varSigma \wedge B^* \models e\). In Procedure VLSPECIALISE \(M' = M.\)LIFT\((\vartheta \varTheta )\) is returned iff (c) M is the punch or matrix metarule selected in line 2, (d) \(\exists \vartheta , \varTheta /M: M\vartheta \varTheta \cup B^* \cup \{\lnot e\} \vdash _{SLD} \square\) iff \(M\vartheta \varTheta \wedge B^* \models e\) and (e) \(M \vartheta \varTheta\) is a fully-connected definite clause. By Lemma 5, if (c) and (d) hold then \(M' \preceq M \vartheta \varTheta\) because \(\exists \sigma \varSigma : M' \sigma \varSigma = M \vartheta \varTheta\). Therefore if (c) and (d) hold then \(M'\sigma \varSigma \wedge B^* \models e\) and \(M' \varSigma \wedge B^* \models e\). By Lemma 4, if (e) holds then \(M'\) is a fully-connected sort metarule. Therefore, either \(M' \ne\) VLSPECIALISE\((\lnot e, B^*, {\mathcal {M}})\) and (a), (b) are false, or Theorem 3 is true. This refutes the assumption and completes the proof. \(\square\)
4.9 Cardinality of metarule languages
In this section we turn our attention to the cardinalities of metarule languages and show that they are polynomial in the number of punch metarule literals (Table 6).
Definition 10
(Clause length) Let C be a metarule or a first-order definite clause. The length of C is the number of literals in C.
Note 3
In the following lemmas and proofs, metarules that differ only in the names of their variables are considered identical.
Lemma 6
(Number of punch metarules) The number of punch metarules of length in [1, k] is k.
Proof
There exist k n-tuples of third-order variables for \(n \in [1,k]\). Exactly one definite clause can be formed from each such n-tuple (see Note 3). \(\square\)
Example 4
Suppose \(k = 3\). The set of n-tuples of third-order variables in punch metarules of length 1 to k is \(\{\{P\}, \{P,Q\}, \{P,Q,R\}\}\). Exactly one definite clause can be formed from each such n-tuple: \(\{P\}\), \(\{P \vee \lnot Q\}\) and \(\{P \vee \lnot Q \vee \lnot R\}\).
Lemma 7
(Number of matrix metarules) Let \({\mathcal {A}}_{\mathring{M}} \subseteq {\mathcal {A}}\) be the set of matrix metarule atoms and \(a = |{\mathcal {A}}_{\mathring{M}}|\). The number of matrix metarules of length k is at most \(k(a^k/k!)\).
Proof
Let \(\{A_1,\ldots,A_k: A_i \in {\mathcal {A}}_{\mathring{M}}\}\) be the k-tuple of atoms in a matrix metarule of length k. The number of such k-tuples is the number of subsets of \({\mathcal {A}}_{\mathring{M}}\) of size k, called the k-combinations of \({\mathcal {A}}_{\mathring{M}}\), which is equal to the binomial coefficient \(\left( {\begin{array}{c}a\\ k\end{array}}\right)\), which is at most \(a^k/k!\) for \(1 \le k \le a\) (Cormen et al., 2001). Note that if a is less than k a matrix metarule of length k cannot be formed because matrix metarule literals must be distinct atoms. For each such k-tuple, T, exactly k definite clauses can be formed by taking one atom in T as the single positive literal, in turn. \(\square\)
Example 5
Suppose \(k = 3\), \({\mathcal {A}}_{\mathring{M}} = \{P(x,y), Q(z,u,v), R(w)\}\). The set, T, of 3-tuples of atoms in \({\mathcal {A}}_{\mathring{M}}\) is \(\{ \{P(x,y), Q(z,u,v), R(w) \} \} = \{{\mathcal {A}}_{\mathring{M}}\}\). The set of definite clauses formed by taking each atom in a 3-tuple in T as a positive literal in turn is: \(\{\{P(x,y), \lnot Q(z,u,v), \lnot R(w)\}, \{\lnot P(x,y), Q(z,u,v), \lnot R(w)\}, \{\lnot P(x,y), \lnot Q(z,u,v), R(w)\}\}\).
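The counting argument of Lemma 7 can be checked on Example 5 with a few lines of Python. This is a sketch under the assumption that atoms are represented as plain strings; the function name `matrix_metarules` is hypothetical.

```python
from itertools import combinations
from math import comb, factorial

def matrix_metarules(atoms, k):
    """For each k-combination of atoms, take each atom in turn as the head."""
    clauses = []
    for combo in combinations(atoms, k):
        for head in combo:
            body = tuple(a for a in combo if a != head)
            clauses.append((head, body))
    return clauses

atoms = ["P(x,y)", "Q(z,u,v)", "R(w)"]
k = 3
clauses = matrix_metarules(atoms, k)

exact = k * comb(len(atoms), k)                 # k clauses per k-combination
bound = k * len(atoms) ** k / factorial(k)      # k(a^k / k!) from Lemma 7
```

For the three atoms of Example 5 the enumeration yields three clauses, one per choice of head, which is within the bound stated in the lemma.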
Lemma 8
(Number of sort metarules) Let e be the number of existentially quantified first- and second-order variables, and u be the number of universally quantified first-order variables, in all sort metarules of length k in the language \({\mathscr {M}}^{k}_{b}\). Let \(n = e + u\). The number of sort metarules in \({\mathscr {M}}^{k}_{b}\) is less than \((2n-1)^n/n!\).
Proof
Let \(\{P_1, \ldots, P_e, v_1, \ldots, v_u\}\) be the multiset^{Footnote 4} with cardinality \(n = e + u\) of existentially quantified first- and second-order variables \(P_i\) and universally quantified first-order variables \(v_j\), each with multiplicity 1 or more. Let S be the set of all such multisets. Let \(S'\) be the set of multisets in S each containing existentially quantified second-order variables of total multiplicity k, existentially quantified first-order variables of total multiplicity \(e-k\), and universally quantified first-order variables with total multiplicity u, at least one of which has multiplicity between 2 and u. Multiplicities of variables in elements of \(S'\) are constrained by Definition 5 and the cardinality of elements of \(S'\) is \(n = e + u\); therefore \(S'\) is the set of multisets of variables in sort metarules of length k in the language \({\mathscr {M}}^{k}_{b}\). The cardinality of S is equal to the multiset coefficient \(\left( \!\left( {\begin{array}{c}n\\ n\end{array}}\right) \!\right)\), which is equal to the binomial coefficient \(\left( {\begin{array}{c}2n-1\\ n\end{array}}\right)\) (Stanley, 2011). S necessarily includes elements not in \(S'\), for example multisets containing no universally quantified variables with multiplicity 2. Therefore \(|S'| < |S|\) and so \(|S'| < \left( {\begin{array}{c}2n-1\\ n\end{array}}\right)\). \((2n-1)^n/n!\) is an upper bound for \(\left( {\begin{array}{c}2n-1\\ n\end{array}}\right)\) for \(1 \le n \le 2n-1\), which is always the case; therefore \(|S'| < (2n-1)^n/n!\). \(\square\)
Example 6
Suppose \(k = 3\), \(n = 9\). The set, S, of n-multisets of existentially and universally quantified variables is \(\{ \{P, P, P, x, x, x, x, x, x\}, \{P, P, P, x, x, x, x, x, y\}, \{P, P, P, x, x, x, x, x, z\}, \ldots \}\). The set, \(S'\), of n-multisets of existentially and universally quantified variables in sort metarules of length 3 in the language \({\mathscr {M}}^{3}_{2}\) is \(\{ \{P, P, P, x, x, x, x, x, x\}, \{P, P, P, y, y, x, x, x, x\}, \{P, P, P, y, x, y, x, x, x\}, \{P, P, P, y, x, x, y, x, x\}, \ldots \}\).
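The two combinatorial facts used in the proof of Lemma 8 can be checked numerically for small n. The sketch below is an illustration only: it enumerates multisets of size n drawn from n abstract symbols (integers stand in for variables) and compares the count with the binomial coefficient and with the stated upper bound. The function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb, factorial

def count_multisets(n):
    """Number of multisets of size n over n distinct symbols, by enumeration."""
    return sum(1 for _ in combinations_with_replacement(range(n), n))

def sort_metarule_bound(n):
    """The upper bound (2n-1)^n / n! used in Lemma 8."""
    return (2 * n - 1) ** n / factorial(n)

n = 4
enumerated = count_multisets(n)
binomial = comb(2 * n - 1, n)
bound = sort_metarule_bound(n)
```

For these small values the enumeration agrees with the binomial coefficient, and the binomial coefficient does not exceed the bound; the set \(S'\) of the lemma is a proper subset of the multisets counted here.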
Lemma 9
(Number of meta-substitutions) Let \(p = |{\mathcal {P}}|\), \(c = |{\mathcal {C}}|\) and let n be as in Lemma 8. The number of meta-substitutions of sort metarules of length k is less than \(p^kc^n\).
Proof
Let h be the number of predicate symbols in the heads and b the number of predicate symbols in literals in the body, of all meta-substitutions of a sort metarule of length k, and let e be as in Lemma 8. Let \(\{H,B_1,\ldots,B_{k-1},c_1,\ldots,c_{e-k}\}\) be the e-tuple of a predicate symbol H substituting the existentially quantified second-order variable in the head literal, predicate symbols \(B_i\) substituting existentially quantified second-order variables in the body literals, and constants \(c_j\) substituting existentially quantified first-order variables in all literals, in a sort metarule of length k. There exist \(hb^{k-1}c^{e-k}\) such e-tuples. \(hb^{k-1}\) is at most \(p^k\), as is the case when all symbols in \({\mathcal {P}}\) are symbols of target predicates. \(c^{e-k}\) is always less than \(c^n\) because at least k existentially quantified variables in a sort metarule of length k must be second-order. Therefore, \(hb^{k-1}c^{e-k}\) is less than \(p^kc^n\). \(\square\)
Example 7
Suppose \(k = 3, e = 4\), \(H \in \{p\}\), \(B_i \in \{q,r\}\), \({\mathcal {C}} = \{a,b,c\}\). The set of 4-tuples of predicate symbols and constants in meta-substitutions of sort metarules of length 3 with one existentially quantified first-order variable is: \(\{ \{p,q,q,a\}, \{p,q,q,b\}, \{p,q,q,c\}, \{p,q,r,a\}, \ldots, \{p,r,r,a\}, \{p,r,r,b\}, \{p,r,r,c\}\}\).
Observation 2
Lemma 9 is a refinement of earlier results by Lin et al. (2014) and Cropper and Tourret (2018), who calculate the cardinality of the set of meta-substitutions of a single sort metarule as \(p^k\) (or \(p^3\) for \(H^2_2\) metarules). Our result takes into account, firstly, the restriction that only symbols in \(E^+\) and I can be substituted for second-order variables in the heads of sort metarules, and secondly, the possible meta-substitution of existentially quantified first-order variables by constants, neither of which is considered in the earlier results.
Lemma 10
(Number of ground clauses) Let n, c be as in Lemmas 8 and 9. The number of ground substitutions of the universally quantified variables in a sort metarule is less than \(c^n\).
Proof
Let \(\{v_1,\ldots,v_u\}\) be the u-tuple of constants substituted for u universally quantified variables. There are \(c^u\) such u-tuples. In a ground substitution of the universally quantified variables in a sort metarule, \(c^u\) is always less than \(c^n\) because \(n = e+u\), where e is as in Lemma 8, and there are exactly \(k > 0\) existentially quantified second-order variables in a sort metarule of length k, therefore \(e > 0\). \(\square\)
Example 8
Let \(u = 3,\; {\mathcal {C}} = \{a,b,c\}\). The set of 3-tuples of constants substituted for 3 universally quantified variables is \(\{\{a,a,a\}, \{a,a,b\}, \{a,a,c\}, \{a,b,a\}, \ldots, \{c,c,b\}, \{c,c,c\} \}\).
Note 4
While we have derived exact results in the proofs of Lemmas 9 and 10 we have chosen to state these two Lemmas in terms of upper bounds in the interest of simplifying notation, particularly the notation of Theorem 4.
Theorem 4
(Cardinality of metarule languages) Let k, a, n, p, c be as in Lemmas 6–10. The number of vl-specialisations of punch metarules of length in [1, k] is at most:
\[\sum_{i=1}^{k} \frac{i a^i (2n-1)^n p^i c^{2n}}{i!\,n!}\]
Proof
By Lemma 6 there are k punch metarules in the language \(M^{k}_{}\). The cardinality of the set of vl-specialisations of the k punch metarules in the language \(M^{k}_{}\) is the sum of the cardinalities of the sets of vl-specialisations of punch metarules in each language \(M^{i}_{}\), where \(i \in [1,k]\).
Let \(i \in [1,k]\). By Lemma 7 there exist at most \(i(a^i/i!)\) matrix metarule specialisations of a punch metarule with i body literals. By Lemma 8, there exist fewer than \((2n-1)^n/n!\) sort metarule specialisations of each such matrix metarule. By Lemma 9 there exist fewer than \(p^kc^n\) meta-substitutions of each such sort metarule. By Lemma 10 there exist fewer than \(c^n\) ground first-order clause specialisations of each such meta-substitution.
Thus, the cardinality of the set of vl-specialisations of punch metarules in the language \(M^{k}_{}\) is at most the sum for all \(i \in [1,k]\) of the product \(i(a^i/i!) ((2n-1)^n/n!) p^ic^nc^n\). We may rewrite this product as the fraction \(\frac{ia^i (2n-1)^n p^ic^{2n}}{i!n!}\). \(\square\)
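To get a feel for the magnitude of this bound, the sum over \(i \in [1,k]\) of the fraction in the proof can be evaluated directly. The following Python sketch does so with exact rational arithmetic; the function name `vl_specialisation_bound` and the parameter values in the example are assumptions chosen only for illustration, not values from the experiments.

```python
from fractions import Fraction
from math import factorial

def vl_specialisation_bound(k, a, n, p, c):
    """Sum over i in [1, k] of  i * a^i * (2n-1)^n * p^i * c^(2n) / (i! * n!)."""
    return sum(
        Fraction(i * a**i * (2 * n - 1)**n * p**i * c**(2 * n),
                 factorial(i) * factorial(n))
        for i in range(1, k + 1)
    )

# Tiny illustrative values: with n = p = c = 1 only i, a^i and i! vary.
small = vl_specialisation_bound(k=2, a=2, n=1, p=1, c=1)   # 2 + 4
# Increasing k can only add non-negative terms to the sum.
larger = vl_specialisation_bound(k=3, a=2, n=1, p=1, c=1)
```

Because every term of the sum is non-negative, the bound is monotone in k, and using `Fraction` avoids floating-point rounding when the factorials become large.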
Corollary 2
Each metarule language \({\mathscr {M}}^{l}_{a}\) is enumerable in time polynomial in the number of literals in the most general metarule in \({\mathscr {M}}^{l}_{a}\), i.e. l.
5 Implementation
We have created a prototype, partial implementation of Algorithms 2 and 3 in Prolog, as a new module added to Louise.^{Footnote 5} The implementation is partial in that it performs only v-specialisation of punch and matrix metarules, but not l-specialisation. For clarity, we will refer to this new module as TOIL (an abbreviation of Third Order Inductive Learner). We now briefly discuss TOIL but leave a full description for future work, alongside a complete implementation.^{Footnote 6}
We distinguish punch metarule specialisation in TOIL as TOIL3 and matrix metarule specialisation as TOIL2. Both subsystems are implemented as variants of the Top Program Construction algorithm in Louise. Each subsystem takes as input a MIL problem with punch or matrix metarules, respectively for TOIL3 and TOIL2, instead of sort metarules, and outputs a set of sort metarules.
According to line 5 of Procedure VLSPECIALISE in Algorithm 2, both subsystems test that a ground instance \(M \vartheta \varTheta\) of an input metarule M is fully-connected before passing it to their implementation of Procedure LIFT. To do so, TOIL2 maintains a “substitution buffer”, S, of tuples \(c \mapsto k\) where each c is a constant and each k is the number of first-order variables in M substituted by c. If, when line 5 is reached, S includes any tuples where \(k = 1\), \(M \vartheta \varTheta\) is not fully-connected. S is first instantiated to the constants in an input example. When a new literal L of M is specialised, the first-order variables in L are first substituted by constants in S, ensuring that L is connected to literals earlier in M. L is then resolved with \(B^*\). If resolution succeeds, a “lookahead” heuristic, listed in Algorithm 4, attempts to predict whether the now fully ground L allows a fully-connected instantiation of M to be derived. If so, S is updated with the new constants derived during resolution and the new counts of existing constants. If not, the process backtracks to try a new grounding of L.
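The counting test performed on the substitution buffer can be sketched in Python as follows. This is not TOIL's Prolog code: it is a minimal illustration that assumes a ground clause is given as a list of `(symbol, constants)` pairs, builds the map from each constant to the number of argument positions it fills, and rejects the clause if any constant occurs only once. The lookahead heuristic and backtracking are not modelled, and the function names are hypothetical.

```python
from collections import Counter

def substitution_buffer(ground_literals):
    """Map each constant to the number of argument positions it fills."""
    return Counter(c for _symbol, args in ground_literals for c in args)

def is_fully_connected(ground_literals):
    """Reject the clause if some constant fills exactly one position."""
    buffer = substitution_buffer(ground_literals)
    return all(n > 1 for n in buffer.values())

# p(a,b) <- q(a,c), r(c,b): every constant occurs twice
connected = [("p", ("a", "b")), ("q", ("a", "c")), ("r", ("c", "b"))]
# p(a,b) <- q(a,c): b and c each occur once
dangling = [("p", ("a", "b")), ("q", ("a", "c"))]
```

In the second clause the constants `b` and `c` each have a count of 1, which corresponds to the case \(k = 1\) in the text, so the clause would not be passed on to lifting.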
TOIL3 restricts instantiation of punch metarule literals to the set \({\mathcal {A}}_{\mathring{M}, B}\), of matrix metarule literals unifiable with the heads of clauses in \(B^*\) (\({\mathcal {A}}_{\mathring{M}, B}\) is generated automatically by TOIL3). Because atoms in \({\mathcal {A}}_{\mathring{M}, B}\) are non-ground, it is not possible to apply the lookahead heuristic employed in TOIL2; TOIL3 only uses the substitution buffer to ensure derived metarules are fully-connected.
Theorem 4 predicts that metarule languages are enumerable in polynomial time, but generating an entire metarule language is still expensive, and unnecessary. To avoid over-generation of metarule specialisations, TOIL limits the number of attempted metarule specialisations in three ways: (a) by subsampling, i.e. training on a randomly selected sample of \(E^+\); (b) by directly limiting the number of metarule specialisation attempts; and (c) by a coverset procedure that removes from \(E^+\) each example entailed by the last derived specialisation of an input metarule, before attempting a new one.
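Limits (b) and (c) can be combined in a generic covering loop, sketched below in Python. Everything here is a stand-in: `specialise` and `covers` are hypothetical callbacks in place of metarule specialisation and entailment with respect to \(B^*\), and the integer "examples" and parity "rules" are toy data. The sketch is not TOIL's implementation.

```python
def coverset(examples, specialise, covers, max_attempts):
    """Derive rules one at a time, dropping examples each new rule covers,
    and stop after max_attempts specialisation attempts."""
    remaining = list(examples)
    learned = []
    attempts = 0
    while remaining and attempts < max_attempts:
        attempts += 1
        example, rest = remaining[0], remaining[1:]
        rule = specialise(example)       # may fail and return None
        if rule is None:
            remaining = rest
            continue
        learned.append(rule)
        # Remove every remaining example covered by the new rule.
        remaining = [e for e in rest if not covers(rule, e)]
    return learned

# Toy stand-ins: a "rule" is the parity of the example it was derived from.
toy_specialise = lambda e: e % 2
toy_covers = lambda rule, e: e % 2 == rule

all_rules = coverset([2, 4, 5, 6, 7], toy_specialise, toy_covers, max_attempts=10)
one_rule = coverset([2, 4, 5, 6, 7], toy_specialise, toy_covers, max_attempts=1)
```

Without the covering step the loop would attempt one specialisation per example; with it, two attempts suffice for the five toy examples, and the attempt limit caps the work regardless of how many examples remain.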
TOIL cannot directly derive sort metarules with existentially quantified firstorder variables. These must be simulated by monadic background predicates representing possible theory constants, e.g., pi(3.14), e(2.71), g(9.834), c(300000), etc.
We leave a formal treatment of the properties of the connectedness constraints and specialisation limits described above to the aforementioned future work.
6 Experiments
A common criticism of the MIL approach is its dependence on user-defined metarules. In this section we show experimentally that automatically derived fully-connected sort metarules can replace user-defined fully-connected sort metarules as shown in Sect. 4.8, thus addressing the aforementioned criticism. We formalise our motivation for our experiments as Experimental Hypotheses 1 and 2.
Experimental Hypothesis 1
Metarules learned by TOIL can replace user-defined metarules without decreasing Louise’s predictive accuracy.
Experimental Hypothesis 2
Metarules learned by TOIL can replace user-defined metarules without increasing Louise’s training time.
6.1 Experiment setup
We conduct a set of metarule replacement experiments where an initial set, \({\mathcal {M}}\), of user-defined, fully-connected sort metarules are progressively replaced by metarules learned by TOIL^{Footnote 7}.
Each metarule replacement experiment proceeds for \(k = |{\mathcal {M}}| + 1\) steps. Each step is split into three separate legs. We repeat the experiment for \(j = 10\) runs at the end of which we aggregate results. Each leg is associated with a new set of metarules: \({\mathcal {M}}_1\), \({\mathcal {M}}_2\) and \({\mathcal {M}}_3\) for legs 1 through 3, respectively. At the start of each run we initialise \({\mathcal {M}}_1\) to \({\mathcal {M}}\), and \({\mathcal {M}}_2\), \({\mathcal {M}}_3\) to \(\emptyset\). At each step i after the first, we select, uniformly at random and without replacement, a new user-defined metarule \(M_i\) and set \({\mathcal {M}}_1 = {\mathcal {M}}_1 \setminus \{M_i\}\), leaving \(k-i\) metarules in \({\mathcal {M}}_1\). Thus, at step \(i = 1\), \({\mathcal {M}}_1 = {\mathcal {M}}\) while at step \(i = k\), \({\mathcal {M}}_1 = \emptyset\). In each step i we train TOIL2 and TOIL3 with a set of matrix or punch metarules (described in the following section), respectively, then we replace all the metarules in \({\mathcal {M}}_2\) with the output of TOIL2 and replace all the metarules in \({\mathcal {M}}_3\) with the output of TOIL3 (in other words, we renew \({\mathcal {M}}_2\) and \({\mathcal {M}}_3\) in each step). Then, in leg 1 we train Louise with the metarules in \({\mathcal {M}}_1\) only; in leg 2 we train Louise with the metarules in \({\mathcal {M}}_1 \cup {\mathcal {M}}_2\); and in leg 3 we train Louise with the metarules in \({\mathcal {M}}_1 \cup {\mathcal {M}}_3\). As to examples, at each step i we sample at random and without replacement 50% of the examples in each of \(E^+\) and \(E^-\) as a training partition and hold the rest out as a testing partition. We sample a new pair of training and testing partitions in each leg of each step of each run of the experiment and perform a learning attempt with Louise on the training partition.
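The way \({\mathcal {M}}_1\) shrinks over the steps of one run can be sketched in Python. The sketch covers only the removal schedule, not training with TOIL or Louise; the function name `replacement_schedule`, the fixed random seed and the three metarule names used as input are assumptions made for illustration.

```python
import random

def replacement_schedule(metarules, seed=0):
    """Return the contents of M1 at each of the |M| + 1 steps of one run."""
    rng = random.Random(seed)          # seeded only to make the sketch repeatable
    m1 = list(metarules)
    schedule = [list(m1)]              # step 1: M1 = M
    while m1:
        m1.remove(rng.choice(m1))      # remove one metarule uniformly at random
        schedule.append(list(m1))
    return schedule                    # last step: M1 is empty

user_defined = ["chain", "inverse", "identity"]   # illustrative names only
schedule = replacement_schedule(user_defined)
sizes = [len(step) for step in schedule]
```

For a set of three metarules the schedule has four steps with 3, 2, 1 and 0 metarules left in \({\mathcal {M}}_1\), which matches the \(k-i\) count given in the text.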
We measure the accuracy of the hypothesis learned in each learning attempt on the testing partition, and the duration of the learning attempt in seconds. We measure accuracy and duration in two separate learning attempts for each leg. In total we perform (10 runs * \(|{\mathcal {M}}|\) steps * 3 legs * 2 measurements) distinct learning attempts, each with a new randomly chosen training and testing partition. We set a time limit of 300 sec. for each learning attempt. If a learning attempt exhausts the time limit we calculate the accuracy of the empty hypothesis on the testing partition. Finally, we return the mean and standard error of the accuracy and duration for the learning attempts at the same step of each leg over all 10 runs.
We run all experiments on a PC with 32 8-core Intel Xeon E5-2650 v2 CPUs clocked at 2.60 GHz with 251 Gb of RAM and running Ubuntu 16.04.7.
6.2 Experiment datasets
We reuse the datasets described in Patsantzis and Muggleton (2021). These comprise: (a) Grid World, a gridworld generator for robot navigation problems; (b) Coloured Graph, a generator of fullyconnected coloured graphs where the target predicate is a representation of the connectedness relation and comprising four separate datasets with different types of classification noise in the form of misclassified examples (false positives, false negatives, both kinds and none); and (c) M:tG Fragment, a handcrafted grammar of the Controlled Natural Language of the Collectible Card Game, “Magic: the Gathering” where examples are strings entailed by the grammar. Table 7 summarises the MIL problem elements of the three datasets. We refer the reader to Patsantzis and Muggleton (2021) for a full description of the three datasets.
Instead of the metarules defined in the experiment datasets we start each experiment by initialising \({\mathcal {M}}\) to the set of 14 \(H^2_2\) metarules in Cropper and Muggleton (2015), listed in Table 2, which we call the canonical \(H^2_2\) set. We replace them with specialisations of the matrix metarules Metadyadic and Metamonadic from Table 3 and the punch metarules TOM3 for M:tG Fragment or TOM2, TOM3, from Table 4 otherwise. We limit overgeneration in metarule specialisation as described in Sect. 5 by limiting metarule specialisation attempts to 1 for M:tG Fragment; and subsampling 50% of \(E^+\) at the start of each metarule learning attempt, for Grid World and Coloured Graph.
Our configuration of the three experimental datasets is identical to that in Patsantzis and Muggleton (2021) with the exception of the Grid World dataset, which we configure to generate a grid world of dimensions \(3 \times 3\). The resulting learning problem is trivial, but hypotheses learned with metarules derived by TOIL for worlds of larger dimensions tend to be extremely large (hypothesis cardinalities upwards of 6000 clauses are logged in preliminary experiments), consuming an inordinate amount of resources during evaluation. By comparison, Patsantzis and Muggleton (2021) report a hypothesis of 2567 clauses for a \(5 \times 5\) world (as in our preliminary experiment). This observation indicates that future work must address overgeneration by TOIL. Still, the size of the learned hypotheses serves as a stress test for our implementation.
6.3 Experiment results
Figure 1 lists the results of the experiments measuring predictive accuracy. We immediately observe that in the two legs of the experiment where userdefined metarules are replaced by learned metarules, marked by “TOIL2” and “TOIL3”, Louise’s accuracy is maintained, while it degrades in the leg of the experiment where metarules are reduced without replacement, marked “No replacement”. These results support Experimental Hypothesis 1.
Figure 2 lists the results of the six experiments measuring training times. We observe that training times for the “No replacement” leg of the experiments decrease as the number of userdefined metarules decreases, but remain more or less constant for the other two legs as removed metarules are replaced by metarules learned by TOIL2 and 3. In the M:tG Fragment dataset, all but a single metarule in the canonical set, the Chain metarule, are redundant. TOIL2 and 3 only learn the Chain metarule in their two legs of the experiment and so, as redundant metarules are removed and not replaced, training times decrease in all three legs of that experiment. These results support Experimental Hypothesis 2.
6.3.1 Learned metarules
During our experiments, the metarules learned by TOIL are logged to the command line of the executing system. We could thus examine and will now discuss examples of the metarules learned during execution.
For the M:tG Fragment dataset, we observed that TOIL2 and 3 both learned a single metarule, the Chain metarule listed in Table 2. The target theory for M:tG Fragment is a grammar in Definite Clause Grammar form (Colmerauer, 1978; Kowalski, 1974), where each clause is indeed an instance of Chain. For this dataset, TOIL was able to learn the set of metarules that would probably also be chosen by a user.
For the Grid World dataset, TOIL2 and 3 both learned a set of 22 \(H^2_2\) metarules including the canonical set and 4 metarules with a single variable in the head literal, e.g. \(P(x,x)\leftarrow Q(x,y), R(x,y)\) or \(P(x,x)\leftarrow Q(x,y), R(y,x)\), that are useful to represent solutions of navigation tasks beginning and ending in the same “cell” of the grid world. Such metarules may be seen as over-specialisations, but they are fully-connected sort metarules, which suggests that the constraints imposed on metarule specialisation to ensure only fully-connected metarules are returned, described in Sect. 5, are correctly defined.
Table 8 lists metarules learned by TOIL2 and 3 for the Coloured Graph - False Positives dataset. The learned metarules are exactly the set of 14 canonical \(H^2_2\) metarules in Table 2. Cropper and Muggleton (2015) show that the 14 canonical \(H^2_2\) metarules are reducible to a minimal set including only Inverse and Chain, therefore returning the entire canonical set is redundant. This is an example of the over-generation discussed in Sect. 5, which TOIL attempts to control by limiting the number of attempted metarule specialisations. Logical minimisation by Plotkin’s program reduction algorithm, as described by Cropper and Muggleton (2015), could also help to reduce redundancy in an already-learned set of metarules, although TOIL may be overwhelmed by over-generation before reduction has a chance to be applied. In any case, over-generation is a clear weakness of our approach and must be further addressed by future work.
7 Conclusions and future work
7.1 Summary
We have presented a novel approach for the automatic derivation of metarules for MIL, by MIL. We have shown that the user-defined fully-connected second-order sort metarules used in the MIL literature can be derived by specialisation of the most-general second-order matrix metarules in a language class, themselves derivable by specialisation of third-order punch metarules with literals that range over the set of second-order literals. We have shown that metarule languages are enumerable in time polynomial in the number of literals in punch metarules. We have defined two methods of metarule specialisation, v- and l-specialisation, and shown that they are performed by MIL. We have proposed a modification of the MIL clause construction operator to return fully-connected second-order sort metarules rather than first-order clauses, and proved its correctness. We have partially implemented the modified MIL operator as TOIL, a new subsystem of the MIL system Louise, and presented experiments demonstrating that metarules automatically derived by TOIL can replace user-defined metarules while maintaining predictive accuracy and training times.
7.2 Future work
The major practical limitations of our approach are the incomplete state of its implementation and its overgeneration of metarule specialisations.
Our prototype implementation of Algorithms 2 and 3 in TOIL is only capable of v-specialisation. Work is under way to complete the implementation with the capability for l-specialisation. In Sect. 5, we have deferred a formal treatment of TOIL until this work is complete. Further work would improve the lookahead heuristic in Algorithm 4 and our ability to limit attempted metarule specialisations to reduce overgeneration. In general, we do not know of a good, principled (as in non-heuristic) and efficient approach to derive just enough metarules to solve a problem, without deriving too many and overgeneralising.
Conversely to overgeneration, TOIL also exhibits a tendency to produce metarules that are overspecialised to the examples in a MIL problem, a form of overfitting. This seems to be a limitation of TOIL’s lookahead heuristic, listed in Algorithm 4 and used to ensure learned metarules are fully-connected. Future work should also look for a principled approach to replace this heuristic.
TOIL-3 is capable of learning new metarules with literals of arbitrary arities, as illustrated in Table 9. We have not demonstrated this important ability with experiments. Additionally, we have not presented any empirical results measuring training times for TOIL itself, only for Louise. Theoretical results in Sect. 4.9 predict that learning metarules should be time-consuming, especially for larger metarule languages, and we have observed this while executing the experiments in Sect. 6, although more so for TOIL-3 than TOIL-2.
Our theoretical framework described in Sect. 4 extends \(\theta\)-subsumption to metarules. It remains to be seen if the related frameworks of relative subsumption and relative entailment (Nienhuys-Cheng & de Wolf, 1997) can also be extended to metarules. v- and l-specialisation seem to be related to Shapiro’s refinement operators (Shapiro, 2004), a point also made about metarules in general by Cropper and Muggleton (2015), but we have not explored this relation in this work.
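To make the generality ordering concrete, the following is a minimal sketch of a brute-force \(\theta\)-subsumption test between two small clauses: a clause \(C\) subsumes a clause \(D\) if some substitution \(\theta\) makes \(C\theta\) a subset of \(D\). The encoding is an assumption made for illustration: literals are flat tuples `(sign, symbol, arg, ...)`, both second-order and first-order variables of the general clause are listed in `variables`, and the variables of the specific clause are treated as constants. The function name `subsumes` and the example clauses are hypothetical and do not reproduce the specialisation operator of the paper.

```python
def subsumes(general, specific, variables):
    """Return True if some substitution over `variables` maps every literal
    of `general` onto a literal of `specific` (theta-subsumption)."""

    def match(literals, theta):
        if not literals:
            return True
        first, rest = literals[0], literals[1:]
        for candidate in specific:
            if len(candidate) != len(first):
                continue
            new_theta = dict(theta)
            ok = True
            for s, t in zip(first, candidate):
                if s in variables:
                    # Bind the variable, or check an existing binding.
                    if new_theta.setdefault(s, t) != t:
                        ok = False
                        break
                elif s != t:
                    # Non-variable symbols (e.g. the sign) must match exactly.
                    ok = False
                    break
            if ok and match(rest, new_theta):
                return True
        return False

    return match(list(general), {})


# A clause with no shared first-order variables: P(x,y) <- Q(z,u), R(v,w).
matrix = [("+", "P", "x", "y"), ("-", "Q", "z", "u"), ("-", "R", "v", "w")]
matrix_vars = {"P", "Q", "R", "x", "y", "z", "u", "v", "w"}
# Chain, P(x,y) <- Q(x,z), R(z,y), with its variables frozen as constants.
chain_ground = [("+", "p1", "a", "b"), ("-", "q1", "a", "c"), ("-", "r1", "c", "b")]

# The same two clauses with their roles reversed.
chain = [("+", "P", "x", "y"), ("-", "Q", "x", "z"), ("-", "R", "z", "y")]
chain_vars = {"P", "Q", "R", "x", "y", "z"}
matrix_ground = [("+", "p1", "a", "b"), ("-", "q1", "c", "d"), ("-", "r1", "e", "f")]
```

In this sketch `subsumes(matrix, chain_ground, matrix_vars)` holds, while `subsumes(chain, matrix_ground, chain_vars)` does not, so the clause without shared variables is strictly more general than Chain under this ordering.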
Corollary 1 suggests that classes of learning problems can be solved by the same sets of metarules, as long as suitable solutions belong to the same metarule language. This observation introduces the possibility of transferring generalisations, in the form of learned metarules, across learning problems or problem domains, thus in a sense forming analogies, a capability poorly represented in modern machine learning and, in general, AI systems (Mitchell, 2021). Such a capability would however rely on a method to determine the relevance of metarules to a problem; currently, no such method is known. Table 9 illustrates the transfer of learned metarules as analogies between problems.
Future work should test the accuracy of the metarules learned by TOIL with other systems that use metarules, besides Louise, for example Metagol (Cropper & Muggleton, 2016b), Popper (Cropper & Morel, 2021) and ALPS (Si et al., 2018).
Change history
16 May 2022
A Correction to this paper has been published: https://doi.org/10.1007/s10994-022-06180-1
Notes
For example it is natural to define addition by an arity-3 predicate sum(x, y, z). The same can be expressed as an arity-2 predicate sum([x, y], z) but this is not datalog. The function symbol [] can be removed by flattening (Rouveirol, 1994), requiring two new body literals and two new background predicates, e.g. \(sum(XY,z) \leftarrow head(XY,x), tail(XY,y), ...\) but this is not in \(H^2_2\). We leave an \(H^2_2\) datalog definition of sum as an exercise to the reader.
Such sly cheating of FOL semantics is enabled by Prolog where \(\mathcal{P}\), \(\mathcal{C}\) need not be disjoint.
To simplify notation, we omit recursion and predicate invention in Algorithm 1 and also Algorithm 2, below. See Appendix A for a complete description.
Informally, a multiset is a collection of a set’s elements each repeating a number of times equal to its multiplicity.
Our new module is available from the Louise repository, at the following url: https://github.com/stassa/louise/blob/master/src/toil.pl
We reserve the title TOIL: A full-term report for this future work.
Experiment code and datasets are available from https://github.com/stassa/mlj_2021
References
Ceri, S., Gottlob, G., & Tanca, L. (1989). What you always wanted to know about datalog (and never dared to ask). IEEE Transactions on Knowledge and Data Engineering, 1(1), 146–166.
Colmerauer, A. (1978). Metamorphosis grammars (pp. 133–188). Berlin: Springer. https://doi.org/10.1007/BFb0031371.
Cormen, T., Leiserson, C., Rivest, R., & Stein, C. (2001). Introduction to algorithms (2nd ed.).
Cropper, A., & Morel, R. (2021). Learning programs by learning from failures. Machine Learning. https://doi.org/10.1007/s10994-020-05934-z
Cropper, A., & Muggleton, S. (2016a). Learning higher-order logic programs through abstraction and invention. In Proceedings of the 25th international joint conference on artificial intelligence (IJCAI 2016), IJCAI (pp. 1418–1424). http://www.doc.ic.ac.uk/~shm/Papers/metafunc.pdf
Cropper, A., & Muggleton, S. H. (2015). Logical minimisation of meta-rules within meta-interpretive learning. In Proceedings of the 24th international conference on inductive logic programming (pp. 65–78).
Cropper, A., & Muggleton, S. H. (2016b). Metagol system. https://github.com/metagol/metagol
Cropper, A., & Tourret, S. (2018). Derivation reduction of metarules in meta-interpretive learning. In F. Riguzzi, E. Bellodi, & R. Zese (Eds.), Inductive logic programming (pp. 1–21). Springer.
Emde, W. (1987). Non-cumulative learning in METAXA.3. In Proceedings of IJCAI-87, Morgan Kaufmann (pp. 208–210).
Emde, W., Habel, C. U., & Rollinger, C.-R. (1983). The discovery of the equator or concept driven learning. In Proceedings of the 8th international joint conference on artificial intelligence, Morgan Kaufmann (pp. 455–458).
Evans, R., & Grefenstette, E. (2018). Learning explanatory rules from noisy data. Journal of Artificial Intelligence Research, 61, 1–64. https://doi.org/10.1613/jair.5714
Kaminski, T., Eiter, T., & Inoue, K. (2018). Exploiting answer set programming with external sources for meta-interpretive learning. TPLP, 18, 571–588.
Kietz, J. U., & Wrobel, S. (1992). Controlling the complexity of learning in logic through syntactic and task-oriented models. Inductive logic programming (pp. 335–359). Academic Press.
Kowalski, R. (1974). Logic for problem solving. Memo No 75, March 1974, Department of Computational Logic, School of Artificial Intelligence, University of Edinburgh. http://www.doc.ic.ac.uk/~rak/papers/Memo75.pdf
Lin, D., Dechter, E., Ellis, K., Tenenbaum, J., Muggleton, S., & Dwight, M. (2014). Bias reformulation for one-shot function induction. In Proceedings of the 23rd European conference on artificial intelligence (pp. 525–530). https://doi.org/10.3233/978-1-61499-419-0-525
Mitchell, M. (2021). Abstraction and analogy-making in artificial intelligence. arXiv:2102.10717v1 [cs.AI].
Morik, K. (1993). Balanced cooperative modeling (pp. 109–127). Springer. https://doi.org/10.1007/978-1-4615-3202-6_6
Muggleton, S., & Lin, D. (2015). Meta-interpretive learning of higher-order dyadic datalog: Predicate invention revisited. Machine Learning, 100(1), 49–73.
Muggleton, S., & de Raedt, L. (1994). Inductive logic programming: Theory and methods. The Journal of Logic Programming, 19–20(SUPPL. 1), 629–679. https://doi.org/10.1016/0743-1066(94)90035-3
Muggleton, S. H., Lin, D., Pahlavi, N., & Tamaddoni-Nezhad, A. (2014). Meta-interpretive learning: Application to grammatical inference. Machine Learning, 94(1), 25–49. https://doi.org/10.1007/s10994-013-5358-3
Nienhuys-Cheng, S. H., & de Wolf, R. (1997). Foundations of inductive logic programming. Berlin: Springer-Verlag.
Patsantzis, S., & Muggleton, S. H. (2019a). Louise system. https://github.com/stassa/louise
Patsantzis, S., & Muggleton, S. H. (2019b). Thelma system. https://github.com/stassa/thelma
Patsantzis, S., & Muggleton, S. H. (2021). Top program construction and reduction for polynomial time meta-interpretive learning. Machine Learning. https://doi.org/10.1007/s10994-020-05945-w
Plotkin, G. (1972). Automatic methods of inductive inference. PhD thesis, The University of Edinburgh.
Robinson, J. A. (1965). A machine-oriented logic based on the resolution principle. Journal of the ACM (JACM), 12(1), 23–41. https://doi.org/10.1145/321250.321253
Rouveirol, C. (1994). Flattening and saturation: Two representation changes for generalization. Machine Learning, 14(2), 219–232. https://doi.org/10.1023/A:1022678217288
Shapiro, E. Y. (2004). Algorithmic program debugging. The MIT Press. https://doi.org/10.7551/mitpress/1192.001.0001
Si, X., Lee, W., Zhang, R., Albarghouthi, A., Koutris, P., & Naik, M. (2018). Syntax-guided synthesis of datalog programs. In Proceedings of the 2018 26th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, association for computing machinery, New York, NY, USA, ESEC/FSE 2018 (pp. 515–527). https://doi.org/10.1145/3236024.3236034
Si, X., Raghothaman, M., Heo, K., & Naik, M. (2019). Synthesizing datalog programs using numerical relaxation. In Proceedings of the twenty-eighth international joint conference on artificial intelligence, IJCAI-19, international joint conferences on artificial intelligence organization (pp. 6117–6124). https://doi.org/10.24963/ijcai.2019/847
Stanley, R. P. (2011). Enumerative combinatorics (2nd ed.). Cambridge: Cambridge University Press.
Wrobel, S. (1988). Design goals for sloppy modeling systems. International Journal of Man-Machine Studies, 29(4), 461–477. https://doi.org/10.1016/S0020-7373(88)80006-3
Acknowledgements
The first author acknowledges the UK’s EPSRC for financial support of her studentship. The second author acknowledges support from the UK’s EPSRC Human-Like Computing Network, for which he acts as director. We thank the anonymous reviewers for their diligent and knowledgeable reviews that have helped us significantly improve our work.
Ethics declarations
Conflicts of interest
Author 1 wrote all sections of the paper. Author 2 provided feedback and corrections on all sections of the paper. The authors have no conflicts of interest to disclose. Ethics approval, consent to participate and consent for publication were not required. Code and data have been made available in Sect. 6.
Additional information
Editors: Nikos Katzouris, Alexander Artikis, Luc De Raedt, Artur d’Avila Garcez, Sebastijan Dumancic, Ute Schmid, Jay Pujara.
The original online version of this article was revised: On page 5 of the original article, the correct notation “P,F,C,A” was changed by the typesetter to “P <=> F <=> C <=> A” after the author had approved the proof. The article was subsequently published with the incorrect notation “P <=> F <=> C <=> A”.
Appendices
Appendix A: Predicate invention in MIL—full description
In Sect. 4.6 we have given a simplified description of Algorithm 1 omitting the recursive resolution step that takes place during predicate invention. We have done this to simplify the description of the algorithm and to isolate the specialisation operation that is the primary subject of Sect. 4.6. Algorithm 1 is accurate as long as predicate invention is not required. In this Appendix, Algorithm 5 is a more complete description of Algorithm 1 that includes recursion and predicate invention. Similarly, Algorithm 6 is a more complete description, including the predicate invention step, of Algorithm 2. In our implementation of TOIL the propagation of meta/substitution \(\vartheta \varTheta\) in line 11 of Algorithms 5, 6 is handled by the Prolog engine.
Appendix B: An example of metarule specialisation
Table 10 illustrates the use of TOIL to learn metarules for Louise. In table section (A) the elements of a MIL problem are defined. In table section (B) a set of matrix metarules \({\mathcal {M}}_1\) and a set of punch metarules \({\mathcal {M}}_2\) are defined, each with a single member. In row (c) TOIL-2 learns a new fully-connected sort metarule from the elements of the MIL problem in table section (A) and the matrix metarule in \({\mathcal {M}}_1\). In row (d) TOIL-3 learns a new fully-connected sort metarule from the elements of the MIL problem in table section (A) and the punch metarule in \({\mathcal {M}}_2\). Note that both subsystems of TOIL learn the same fully-connected sort metarule (the \(H^2_2\) Chain metarule, listed in Table 2).
In rows (e) and (f) Louise is given the elements of the MIL problem in table section (A) and the metarule learned by TOIL-2 and TOIL-3, and learns the hypothesis starting at row (f). Note that this is a correct hypothesis constituting a grammar of the context-free \(a^nb^n\) language.
It is interesting to observe that the program learned by Louise includes a definition of an invented predicate, \(\$1\), in row (f). This is despite the fact that our implementation of TOIL does not perform predicate invention and so has not learned any metarules that require predicate invention to be learned. In the MIL problem in Table 10 a single metarule is sufficient to learn a correct hypothesis and this metarule can be learned without predicate invention, even though the correct hypothesis starting in (f) cannot, itself, be learned without predicate invention. This observation suggests that even the current, limited version of TOIL that cannot perform predicate invention, may be capable of learning a set of metarules that is sufficient to learn a correct hypothesis, when given to a system capable of predicate invention, like Louise (or Metagol).
The observation about predicate invention in the previous paragraph further highlights the generality of metarules and suggests the existence of a class of learning problems that can be solved with a number of metarules much smaller than the number of clauses in their target theory, a subject for further study.
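The point that a single Chain metarule can suffice for a hypothesis with an invented predicate can be sketched as follows. The three clauses below are an assumed hypothesis of the general shape discussed in this appendix; they are not copied from Table 10, and the names `chain`, `inv_1` (standing in for the invented predicate \(\$1\)) and `accepts` are hypothetical. Each dyadic predicate is modelled as a function from an input list to the list of possible remainders, so that the Chain metarule \(P(x,y) \leftarrow Q(x,z), R(z,y)\) becomes relation composition.

```python
def a(xs):
    """a(x,y): consume one 'a' from the front of the input."""
    return [xs[1:]] if xs[:1] == ["a"] else []


def b(xs):
    """b(x,y): consume one 'b' from the front of the input."""
    return [xs[1:]] if xs[:1] == ["b"] else []


def chain(q, r):
    """Chain metarule P(x,y) <- Q(x,z), R(z,y) as relation composition."""
    def p(xs):
        return [y for z in q(xs) for y in r(z)]
    return p


def s(xs):
    # s(x,y) <- a(x,z), b(z,y).
    # s(x,y) <- a(x,z), inv_1(z,y).
    return chain(a, b)(xs) + chain(a, inv_1)(xs)


def inv_1(xs):
    # inv_1(x,y) <- s(x,z), b(z,y).  (assumed definition of the invented predicate)
    return chain(s, b)(xs)


def accepts(string):
    """True if the whole string is consumed by s, i.e. the remainder is empty."""
    return [] in s(list(string))
```

All three clauses are instances of Chain, so under these assumptions `accepts("aabb")` holds while `accepts("aab")` and `accepts("abab")` do not.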
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Patsantzis, S., Muggleton, S.H. Meta-interpretive learning as metarule specialisation. Mach Learn 111, 3703–3731 (2022). https://doi.org/10.1007/s10994-022-06156-1