Abstract
Despite early interest Predicate Invention has lately been underexplored within ILP. We develop a framework in which predicate invention and recursive generalisations are implemented using abduction with respect to a metainterpreter. The approach is based on a previously unexplored case of Inverse Entailment for Grammatical Inference of Regular languages. Every abduced grammar H is represented by a conjunction of existentially quantified atomic formulae. Thus ¬H is a universally quantified clause representing a denial. The hypothesis space of solutions for ¬H can be ordered by θsubsumption. We show that the representation can be mapped to a fragment of HigherOrder Datalog in which atomic formulae in H are projections of firstorder definite clause grammar rules and the existentially quantified variables are projections of firstorder predicate symbols. This allows predicate invention to be effected by the introduction of firstorder variables. Previous work by Inoue and Furukawa used abduction and metalevel reasoning to invent predicates representing propositions. By contrast, the present paper uses abduction with a metainterpretive framework to invent relations. We describe the implementations of Metainterpretive Learning (MIL) using two different declarative representations: Prolog and Answer Set Programming (ASP). We compare these implementations against a stateoftheart ILP system MCTopLog using the dataset of learning Regular and ContextFree grammars as well learning a simplified natural language grammar and a grammatical description of a staircase. Experiments indicate that on randomly chosen grammars, the two implementations have significantly higher accuracies than MCTopLog. In terms of running time, Metagol is overall fastest in these tasks. Experiments indicate that the Prolog implementation is competitive with the ASP one due to its ability to encode a strong procedural bias. We demonstrate that MIL can be applied to learning natural grammars. In this case experiments indicate that increasing the available background knowledge, reduces the running time. Additionally ASP_{M} (ASP using a metainterpreter) is shown to have a speed advantage over Metagol when background knowledge is sparse. We also demonstrate that by combining Metagol_{ R } (Metagol with a Regular grammar metainterpreter) and Metagol_{ CF } (ContextFree metainterpreter) we can formulate a system, Metagol_{ RCF }, which can change representation by firstly assuming the target to be Regular, and then failing this, switch to assuming it to be ContextFree. Metagol_{ RCF } runs up to 100 times faster than Metagol_{ CF } on grammars chosen randomly from Regular and nonRegular ContextFree grammars.
1 Introduction
Consider the problem of using an ILP system to learn a Regular grammar which accepts all and only those binary sequences containing an even number of 1s (see Fig. 1). Since the 1950s automatonbased learning algorithms have existed (Moore 1956) which inductively infer Regular languages, such as Parity, from positive and negative examples. If we try to learn Parity using an ILP system the obvious representation of the target would be a Definite Clause Grammar (DCG) (see Fig. 1a). However, if the ILP system were provided with examples for the predicate q _{0} then the predicate q _{1} would need to be invented since the only single state finite acceptor consistent with the examples would accept all finite strings consisting of 0s and 1s. It is widely accepted that Predicate Invention is a hard and underexplored topic within ILP (Muggleton et al. 2011), and indeed stateoftheart ILP systems, including MCTopLog (Muggleton et al. 2012) and Progol (Muggleton 1995; Muggleton and Bryant 2000), are unable to learn grammars such as Parity in the form of a DCG using only firstorder (nonmetalogical) background knowledge since these systems do not support Predicate Invention. However, note that in Fig. 1a each clause of the DCG has one of the following two forms.
where Q, C, P are the only symbols which vary between the clauses. Figure 1b shows how these two forms of clauses above can be captured within the two clauses of a recursive metainterpreter parse/3 which uses the auxiliary predicates acceptor/1 and delta1/3 ^{Footnote 1} ^{Footnote 2} to instantiate the predicate symbols and constants from the original DCG. The predicates acceptor/1 and delta1/3 can each be interpreted as HigherOrder Datalog (Muggleton and Pahlavi 2012) predicates since they take arguments which are predicate symbols q _{0},q _{1} from the DCG. By making acceptor/1 and delta1/3 abducible, Parity, and indeed any other Regular grammar, could in principle be learned from ground instances of parse/1 using abduction. The paper explores this form of learning with respect to a metainterpreter.
We show that such abductively inferred grammars are a special case of Inverse Entailment. We also show that the hypothesis space forms a lattice ordered by subsumption. The extensions of this use of abduction with respect to a metainterpreter lead to a new class of inductive algorithm for learning Regular and ContextFree languages. The new approach blurs the normal distinctions between abductive and inductive techniques (see Flach and Kakas, 2000). Usually abduction is thought of as providing an explanation in the form of a set of ground facts while induction provides an explanation in the form of a set of universally quantified rules. However, the metainterpreter in Fig. 1b can be viewed as projecting the universally quantified rules in Fig. 1a onto the ground facts associated with acceptor/1 and delta1/3 in Fig. 1b. In this way abducing these ground facts with respect to a metainterpreter is equivalent to induction, since it is trivial to map the ground acceptor/1 and delta1/3 facts back to the original universally quantified DCG rules.
In this paper, we show that the MIL framework can be directly implemented using declarative techniques such as Prolog and Answer Set Programming (ASP). In this way, the search for an hypothesis in a learning task is delegated to the search engine in Prolog or ASP. Although existing abductive systems can achieve predicate invention if loaded with the metainterpreter introduced in this paper, a direct implementation of MIL has the following advantages.

1.
As a declarative machine learning (De Raedt 2012) approach, it can make use of the advances in solvers. For example, techniques ASP solvers such as Clasp (Gebser et al. 2007) compete favourably in international competitions. Recently Clasp has been extended to Unclasp (Andres et al. 2012) which is highly efficiency for optimisation tasks. This advance is exploited in the experiments of this paper, as we use Unclasp for our experiments.

2.
As demonstrated by the experiments in this paper, direct implementation of the approach using a metainterpreter has increased efficiency due to an ordered search in the case of Prolog and effective pruning in the case of ASP. While existing abductive systems like SOLAR (Nabeshima et al. 2010), ASystem (Kakas et al. 2001) and MCTopLog do not have an ordered search, but instead enumerate all hypotheses that are consistent with the data.

3.
The resulting hypotheses achieve higher predictive due to global optimisation, as opposed to the greedy covering algorithm used in many systems including MCTopLog.
The paper is structured as follows. Section 2 introduces the theoretical framework for MIL and its application to grammatical inference. We then describe implementations for a variant of Metagol, ASP_{ M } (ASP using a metainterpreter). In Sect. 4 the performance of these systems is compared experimentally against MCToplog on Regular and ContextFree grammar learning problems. In Sect. 5 we describe related work. Lastly we conclude and describe directions for further work in Sect. 6.
2 MIL framework
2.1 Logical notation
A variable is represented by an upper case letter followed by a string of lower case letters and digits. A function symbol or predicate symbol is a lower case letter followed by a string of lower case letters and digits. The arity of a function or predicate symbol is the number of arguments it takes. A constant is a function or predicate symbol which has arity zero. Variables and constants are terms, and a function symbol immediately followed by a bracketed ntuple of terms is a term. Thus f(g(X),h) is a term when f, g and h are function symbols and X is a variable. A predicate symbol immediately followed by a bracketed ntuple of terms is called an atomic formula. The negation symbol is ¬. Both A and ¬A are literals whenever A is an atomic formula. In this case A is called a positive literal and ¬A is called a negative literal. A finite set (possibly empty) of literals is called a clause. A clause represents the disjunction of its literals. Thus the clause {A _{1},A _{2},…¬A _{ i },¬A _{ i+1},…} can be equivalently represented as (A _{1}∨A _{2}∨…¬A _{ i }∨¬A _{ i+1}∨…) or A _{1},A _{2},…←A _{ i },A _{ i+1},…. A Horn clause is a clause which contains at most one positive literal. A Horn clause is unit if and only if it contains exactly one literal. A denial or goal is a Horn clause which contains no positive literals. A definite clause is a Horn clause which contains exactly one positive literal. The positive literal in a definite clause is called the head of the clause while the negative literals are collectively called the body of the clause. A unit clause is positive if it contains a head and no body. A unit clause is negative if it contains one literal in the body. A set of clauses is called a clausal theory. A clausal theory represents the conjunction of its clauses. Thus the clausal theory {C _{1},C _{2},…} can be equivalently represented as (C _{1}∧C _{2}∧⋯). A clausal theory in which all predicates have arity at most one is called monadic. A clausal theory in which all predicates have arity at most two is called dyadic. A clausal theory in which each clause is Horn is called a Horn logic program. A logic program is said to be definite in the case it contains only definite clauses. A logic program is said to be a Datalog program if it contains no function symbols other than constants. A Datalog program is said to be higherorder in the case that it contains at least one constant predicate symbol which is the argument of a term. Literals, clauses and clausal theories are all wellformedformulae (wffs) in which the variables are assumed to be universally quantified. Let E be a wff or term. E is said to be ground if and only if it contains no variables. The process of replacing (existential) variables by constants is called Skolemisation. The unique constants are called Skolem constants. Let C and D be clauses. We say that C⪰_{ θ } D or C θsubsumes D if and only if there exists a substitution θ such that Cθ⊆D.
2.2 Formal language notation
Let Σ be a finite alphabet. Σ ^{∗} is the infinite set of strings made up of zero or more letters from Σ. λ is the empty string. uv is the concatenation of strings u and v. u is the length of string u. A language L is any subset of Σ ^{∗}. Let ν be a set of nonterminal symbols disjoint from Σ. A production rule r=LHS→RHS is wellformed in the case that LHS∈(ν∪Σ)^{∗}, RHS∈(ν∪Σ∪λ)^{∗} and when applied replaces LHS by RHS in a given string. A grammar G is a pair 〈s,R〉 consisting of a start symbol s∈ν and a finite set of production rules R. A grammar is Regular Chomskynormal in the case that it contains only production rules of the form S→λ or S→aB where S,B∈ν and a∈Σ. A grammar is Linear ContextFree in the case that it contains only Regular Chomskynormal production rules or rules of the form S→Ab where S,A∈ν and b∈Σ. A grammar is ContextFree in the case that it contains only Linear ContextFree Chomskynormal production rules or rules of the form S→AB where S,A,B∈ν.^{Footnote 3} A ContextFree grammar is said to be deterministic in the case that it does not contain two Regular Chomskynormal production rules S→aB and S→aC where B≠C. A sentence σ∈Σ ^{∗} is in L(G) iff given a start symbol S∈ν there exists a sequence of production rule applications \(S\rightarrow_{R_{1}}\cdots\rightarrow_{R_{n}}\sigma\) where R _{ i }∈G. A language L is Regular, Linear Contextfree or ContextFree in the case there exists a grammar G for which L=L(G) where G is Regular, Linear ContextFree or ContextFree respectively. According to the ContextFree Pumping Lemma (Hopcroft and Ullman 1979), if a language L is ContextFree, then there exists some integer p≥1 such that any string s in L with s≥p (where p is a constant) can be written as s=uvxyz with substrings u, v, x, y and z, such that vxy≤p, vy≥1 and uv ^{n} xy ^{n} z is in L for every integer n≥0.
2.3 Framework
The MetaInterpretive Learning (MIL) setting is a variant of the normal setting for ILP.
Definition 1
(MetaInterpretive Learning setting)
A MetaInterpretive Learning (MIL) problem consists of Input=〈B,E〉 and Output=H where the background knowledge B=B _{ M }∪B _{ A }. B _{ M } is a definite logic program^{Footnote 4} representing a metainterpreter and B _{ A } and H are ground definite HigherOrder Datalog programs consisting of positive unit clauses. The predicate symbol constants in B _{ A } and H are represented by Skolem constants. The examples are E=〈E ^{+},E ^{−}〉 where E ^{+} is a ground logic program consisting of positive unit clauses and E ^{−} is a ground logic program consisting of negative unit clauses. The Input and Output are such that B,H⊨E ^{+} and for all e ^{−} in E ^{−}, \(B, H \not\models e^{}\).
Inverse Entailment can be applied to allow H to be derived from B and E ^{+} as follows.
Since both H and E ^{+} can each be treated as conjunctions of ground atoms containing Skolem constants in place of existential variables, it follows that ¬H and ¬E ^{+} are universally quantified denials where the variables come from replacing Skolem constants by unique variables. We now define the concept of a Metainterpretive learner.
Definition 2
(Metainterpretive learner)
Let \({\mathcal{H}}_{B,E}\) represent the complete set of abductive hypotheses H for the MIL setting of Definition 1. Algorithm A is said to be a Metainterpretive learner iff for all B,E such that H is the output of Algorithm A given B and E as inputs, it is the case that \(H\in {\mathcal{H}}_{B,E}\).
Example 1
(Parity example)
Let B=〈B _{ M },B _{ A }〉, E=〈E ^{+},E ^{−}〉 and \(H\in{\mathcal{H}}_{B,E}\) represents the parity grammar. Figure 2 shows H as a possible output of a Metainterpretive learner.
Note that this example of abduction produces Predicate Invention by introducing Skolem constants representing new predicate symbols. By contrast an ILP system such as Progol uses Inverse Entailment (Muggleton 1995) to construct a single clause from a single example, while a Metainterpretive learner uses Inverse Entailment to construct the set of all clauses H as the abductive solution to a single goal ¬E ^{+} using E ^{−} as integrity constraints. In the example the hypothesised grammar H corresponds to the firstorder DCG from Fig. 1a, which contains both invented predicates and mutual recursion. Neither predicate invention nor mutual recursion can be achieved with DCGs in this way using ILP systems such as Progol or MCTopLog.
2.4 Lattice properties of hypothesis space
In this section we investigate orderings over MIL hypotheses.
Definition 3
(⪰_{ B,E } relation in MIL)
Within the MIL setting we say that H⪰_{ B,E } H′ in the case that \(H,H' \in {\mathcal{H}}_{B,E}\) and ¬H′⪰_{ θ }¬H.
We now show that ⪰_{ B,E } forms a quasiordering and a lattice.
Proposition 1
(Quasiordering)
Within the MIL setting \(\langle {\mathcal{H}}_{B,E},\succeq_{B,E}\rangle\) forms a quasiordering.
Proof
Follows from the fact that \(\langle \{\neg H: H \in {\mathcal{H}}_{B,E}\},\succeq_{\theta}\rangle\) forms a quasiordering since each ¬H is a clause (NienhuysCheng and de Wolf 1997). □
Proposition 2
(Lattice)
Within the MIL setting \(\langle {\mathcal{H}}_{B,E},\succeq_{B,E}\rangle\) forms a lattice.
Proof
Follows from the fact that \(\langle \{\neg H: H \in {\mathcal{H}}_{B,E}\},\succeq_{\theta}\rangle\) forms a lattice since each ¬H is a clause (NienhuysCheng and de Wolf 1997). □
We now show that this ordering has a unique top element.
Proposition 3
(Unique ⊤ element)
Within the MIL setting there exists \(\top \in {\mathcal{H}}_{B,E}\) such that for all \(H\in{\mathcal{H}}_{B,E}\) we have ⊤⪰_{ B,E } H and ⊤ is unique up to renaming of Skolem constants.
Proof
Let \(\neg H' = \bigvee_{H\in {\mathcal{H}}_{B,E}}\neg H\) and ¬⊤=¬H′θ _{ v } where v is a variable and θ _{ v }={u/v:u variable in ¬H′}. By construction for each \(H\in {\mathcal{H}}_{B,E}\) it follows that ¬⊤⪰_{ θ }¬H with substitution θ _{ v }. Therefore for all \(H\in{\mathcal{H}}_{B,E}\) we have ⊤⪰_{ B,E } H and ⊤ is unique up to renaming of Skolem constants. □
This proposition can be illustrated with a grammar example.
Example 2
(Subsumption example)
In terms of the Metainterpreter of Fig. 1a the universal grammar {0,1}^{∗} can be expressed using ⊤={(acceptor($0)←),(delta1($0,0,$0)←),(delta1($0,1,$0)←)}. Letting H represent the Parity grammar from Example 1 it is clear that ¬H⪰_{ θ }¬⊤ and so ⊤⪰_{ B,E } H. So unlike the subsumption relation between universally quantified clauses, binding all the (existentially quantified) variables in H to each other produces a maximally general grammar ⊤.
We now show the circumstances under which a unique bottom element of the lattice can be constructed using Plotkin’s lgg algorithm.
Proposition 4
(Unique ⊥ element)
In the case that \({\mathcal{H}}_{B,E}\) is finite up to renaming of Skolem constants there exists \(\bot \in {\mathcal{H}}_{B,E}\) such that for all \(H\in{\mathcal{H}}_{B,E}\) we have H⪰_{ B,E }⊥ and ⊥ is unique up to renaming of Skolem constants.
Proof
Since \({\mathcal{H}}_{B,E}\) is finite \(\neg\bot = lgg(\{\neg H: H\in {\mathcal{H}}_{B,E}\})\) where lgg is Plotkin’s algorithm for computing the least general generalisation of a set of clauses under subsumption (Plotkin 1969). □
For most purposes the construction of the unique bottom clause is intractable since the cardinality of the lgg clause increases exponentially in the cardinality of \({\mathcal{H}}_{B,E}\). We now show a method for reducing hypotheses.
2.5 Reduction of hypotheses
Proposition 5
(Logical reduction of hypotheses)
Suppose H′ is an hypothesis in the MIL setting and ¬H is the result of applying Plotkin’s clause reduction algorithm ^{Footnote 5} (Plotkin 1969) to ¬H′. Then H is a reduced hypothesis equivalent to H′.
Proof
Follows from the fact that ¬H′ is θsubsumption equivalent to ¬H by construction. □
Example 3
(Reduction example)
Let H′=H∪{r} where H is the Parity grammar from Fig. 2 and r=(delta1($0,0,$2)←) represents an additional redundant grammar rule. Now Plotkin’s reduction algorithm would reduce ¬H′ to the equivalent clause ¬H and consequently grammar H is a reduced equivalent form of H.
In the following section we show the existence of a compact bottom hypothesis in the case of MIL for Regular languages.
2.6 Framework applied to grammar learning
Figure 3 shows how the Metainterpreter for Regular Grammars can be extended to ContextFree Grammars.
The Chomsky language types form an inclusion hierarchy in which Regular ⊆ ContextFree. Algorithms for learning the Regular languages have been widely studied since the 1970s within the topic of Grammatical Inference (de la Higuera 2005). Many of these start with a prefix tree acceptor, and then progressively merge the states.
Proposition 6
(Unique ⊥ for Regular languages)
Prefix trees act as a compact bottom theory in the MIL setting for Regular languages.
Proof
Follows from the fact that all deterministic Regular grammars which include the positive examples can be formed by merging the arcs of a prefix tree acceptor (Muggleton 1990). Merging the arcs of the prefix tree is achieved by unifying the delta1 atoms in ¬H within the MIL setting. □
Example 4
(Prefix tree)
Assume the MIL setting with B _{ M } being the metainterpreter for Regular languages. Let E ^{+}={parse([1,1]),parse([1,1,0])} then ⊥={delta1($0,1,$1),delta1($1,1,$2),acceptor($2),delta1($2,0,$3),acceptor($3)} represents the prefix tree automaton.
Proposition 7
(⊥ for ContextFree languages)
Any bottom theory ⊥ for a ContextFree language contains a set of delta1 atoms representing a Regular prefix tree.
Proof
Follows from the fact that the Regular subset of MIL hypotheses are all subsumed by ¬⊥_{ R } where ⊥_{ R } represents the Regular prefix tree. □
3 Implementations
In this section, we describe the implementations of Metainterpretive Learning (MIL) using two different declarative languages: Prolog and Answer Set Programming (ASP). The resulting systems are called Metagol^{Footnote 6} and ASP_{M},^{Footnote 7} respectively.
3.1 Implementation in Prolog
The systems Metagol_{ R }, Metagol_{ CF }, and Metagol_{ RCF } are three simple Prolog implementations of MIL.
3.1.1 Metagol_{ R }
Before introducing Metagol_{ R }, we first explain its simplified version noMetagol_{ R } (nonoptimising Metagol_{ R }) as shown in Fig. 4. The system noMetagol_{ R } is based on the following abductive variant of the Regular Metainterpreter from Fig. 3 (the standard definition of member/2 is omitted for brevity).
The abduced atoms are simply accumulated in the extra variables G1,G2,G3. The term s(0) represents the start symbol and a finite set of Skolem constants is provided by the monadic predicate Skolem. Hypotheses are now the answer substitutions of a goal such as the following.
Note that each of the positive examples are provided sequentially within the goal and the resulting grammar is then tested for noncoverage of each of the negative examples. The final grammar returned in the variable G is a solution which covers all positives and none of the negatives. In the case shown above the first hypothesis found by Prolog is as follows.
This hypothesis correctly represents the Parity acceptor of Fig. 1. All other consistent hypotheses can be generated by making Prolog backtrack through the SLD proof space.
Metagol_{ R }
We will now explain the following procedural biases, which extends noMetagol_{ R } to Metagol_{ R }.
Minimal hypothesis
Occam’s razor suggests to select the shortest hypothesis that fits the data. Therefore we introduce the clause bound into Metagol_{ R } so that the search starts from shorter hypotheses. In Metagol_{ R } (Fig. 5) the variables K, K1, K2 and K3 are related to the clause bound. They are instantiated with Peano numbers (s(0),s(1),…) representing a bound on the maximum number of abduced clauses. Thus the second clause of abduce/5 fails once K1 has a value of 0. K1 is iteratively increased until an hypothesis is found within that bound. The search thus guarantees finding an hypothesis with minimal description length.
SpecifictoGeneral
Within the MIL setting an hypothesis H _{ s } is said to be more specific than H _{ g } in the case that ¬H _{ s }⪰_{ θ }¬H _{ g }, as explained in Sect. 2.4. Therefore H _{ s } is a refinement of H _{ g } by renaming with new Skolem constants. In Metagol_{ R } the Skolem constants are enumerated by the program of Skolem/3. The first clause of Skolem/3 introduces a new Skolem constant, while the second clause of Skolem/3 provides a Skolem constant that has already been used in the deriving hypothesis. Due to Prolog’s procedural semantics, the first clause of Skolem/3 will be tried before the second one, thus H _{ s }, that is, the one with more Skolem constants, will be considered before H _{ g }.^{Footnote 8} Switching the order of the two clauses in Skolem/3 will result in a generaltospecific search. In that case, the universal grammar will be considered first, since it is maximally general and can be expressed with only one Skolem constant (see Example 2).
Metagol_{ CF }
The Metagol_{ CF } system is based on an abductive variant of the ContextFree Metainterpreter from Fig. 3, though we omit the full Prolog description due to space restrictions. Once more, abduction is carried out with respect to a single goal as in Metagol_{ R }.
Metagol_{ RCF }
The Metagol_{ RCF } system simply combines Metagol_{ R } and Metagol_{ CF } sequentially, as shown below in Fig. 6. Due to Prolog’s procedural semantics, the hypothesis returned will be Regular in the case Metagol_{ R } finds a consistent grammar and otherwise will be the result of Metagol_{ CF }.
3.2 Implementation in Answer Set Programming (ASP)
Compared to Prolog, ASP not only has advantages in handling nonmonotonic reasoning, but also has higher efficiency in tackling search problems (Gebser et al. 2012). The systems ASP_{MR } and ASP_{MCF } are two simple ASP implementations of Metainterpretive learning. Each sequence is encoded as a set of facts. For example, the positive example posEx(Seq2,[1,1]) is encoded in the second line in Fig. 7, where seq2 is the ID of the sequence and the predicate seqT(SeqID,P,T) means the sequence has a terminal T at position P. The metainterpretive parser uses position to mark a substring, rather than storing the substring in a list. The goal of finding an hypothesis that covers all positive examples and none of the negatives is encoded as an integrity constraint.
ASP_{MR }
The program in Fig. 8 is an ASP implementation of the Regular Metainterpreter in Fig. 3. It is sectioned into parts describing generating, defining, testing, optimising, and displaying. The generating part specifies the hypothesis space as a set of facts about delta1/3 and acceptor/1. ASP choice rules are used to indicate that any subset of this set is allowed in the answer sets of this program. The defining part corresponds to the Regular Metainterpreter. The testing part contains an integrity constraint saying that an answer set of this program should contain production rules which parse all positive examples and no negative examples. The display part restricts the output to containing only predicates delta1/3 and acceptor/1, which corresponds to the hypothesis.
In order to find a minimal hypothesis like that in Metagol_{ R }, the optimisation component in ASP is used. Although the use of optimisation increases the computational complexity (Gebser et al. 2012), it improves^{Footnote 9} the predictive accuracy of the hypothesis. An optimisation statement like the one in Fig. 8 specifies the objective function to be optimised. The weight following each atom is part of the objective function. In our case, the objective function corresponds to the description length of an hypothesis. Therefore the weight is set to 1 for each atom, meaning the description length of a unit clause is 1.
Most ASP solvers do not support variables directly, therefore a grounder is needed for transforming the input program with firstorder variables into an equivalent ground program. Then an ASP solver can be applied to find an answer set that satisfies all the constraints. The hypothesised grammar will be part of the returned answer set. In the case shown above the first hypothesis returned by ASP is the same as the one found by Metagol and correctly represents the Parity acceptor of Fig. 1.
ASP solvers use efficient constraint handling techniques to efficiently find stable models known as answer sets. This computational mechanism is very different from that of Prolog, leading to their different implementations, in particular, in the use of iterative deepening. In addition, the bound on clauses puts an implicit limit on Skolem constants, since the number of Skolem constants in a derived hypothesis is at most the number of clauses it contains. Therefore Metagol_{ R } is immune to the number of Skolem constants prespecified in the background knowledge. By contrast, ASP_{MR } is largely affected by the number of Skolem constants due to its bottomup search. Therefore ASP_{MR } has to put an explicit bound on the number of Skolem constants. More specifically, the second line of ‘Generate’ {delta1(NT1,T,NT2):Skolem(NT1):terminal(T):Skolem(NT2)} has a default size of T∗NT ^{2}, where T corresponds to the number of terminals and NT denote the number of Skolem constants. While a cardinality constraint on this set does not always reduce the search space, because it can lead to a quadratic blowup in search space (Gebser et al. 2012) when the cardinality constraint is translated into normal logic program during the grounding stage. Additionally, ASP solvers’ buildin optimisation component is handy for finding a global minimal hypothesis. Thus ASP_{MR } does not use iterative deepening on the clause bound like that in Metagol_{ R } for finding a global minimal hypothesis.
ASP_{MCF }
Similar to Metagol_{ CF }, the ASP_{MCF } system is based on a variant of the ContextFree Metainterpreter from Fig. 3. However, there is no equivalent ASP implementation to Metagol_{ RCF }. Since Metagol_{ RCF } exploits the procedural semantics of Prolog programs, while there is no similar procedural semantics for ASP programs.
4 Experiments
In this section we describe experiments on learning Regular, ContextFree and a simplified natural language grammar. It was shown in Sect. 1 that ILP systems cannot learn grammars in a DCG representation with predicate invention. However, an ILP system given a metainterpreter as part of background knowledge becomes capable of doing predicate invention. In the experiments described below, the performance of a stateoftheart ILP system MCTopLog, loaded with suitable metainterpretive background, is compared against variants of Metagol and ASP_{M} as described in Sect. 3. MCTopLog is chosen for this comparison since it can learn multiple dependent clauses from examples (unlike say Progol). This is a necessary ability for grammar learning tasks. In the final experiment we show how MIL can be used to learn a definition of a staircase. This indicates the applicability of MIL in more general learning applications beyond grammar learning. All datasets and learning systems used in these experiments are available at http://ilp.doc.ic.ac.uk/metagol.
4.1 Learning regular languages
We investigate the following Null hypotheses.
 Null Hypothesis 1.1 :

Metagol_{ R }, ASP_{MR } and a stateoftheart ILP system cannot learn randomly chosen Regular languages.
 Null Hypothesis 1.2 :

Metagol_{ R } and ASP_{MR } cannot outperform a stateoftheart ILP system on learning randomly chosen Regular languages.
 Null Hypothesis 1.3 :

Metagol_{ R } can not outperform ASP_{MR } on learning randomly chosen Regular languages.
4.1.1 Materials and methods
Randomly chosen deterministic Regular grammars were generated by sampling from a Stochastic Logic Program (SLP) (Muggleton 1996) which defined the space of target grammars. More specifically, the SLP used for sampling consists of a metainterpreter and all possible grammars. Then the following steps were conducted. Firstly, an integer i (1≤i≤3) was randomly sampled. This integer corresponds to the number of seed examples.^{Footnote 10} Secondly, the query “sample(parse(Seq,Grammar))’ returned one sequence as well as the grammar that parse this sequence. Thirdly, the grammars were aggregated by issuing the query ‘sample(parse(Seq,Grammar))” i times. Finally, each generated grammar was reduced using Plotkin’s reduction algorithm (see Sect. 2.5) to remove redundancy and equivalent nonterminals. Nondeterministic and finite language grammars were discarded. Sampling of examples was also done using an SLP. Sampling was with replacement.
In this experiment, we used two different datasets sampled from different distributions. In dataset RG1, the examples were randomly chosen from Σ ^{∗} for Σ={a,b}, while in RG2 Σ={a,b,c}. RG2 has longer sequence lengths, as shown by Table 1. Both datasets contains 200 randomly chosen Regular grammars. We compared the performance of Metagol_{ R }, ASP_{MR } and MCTopLog on learning Regular grammars using RG1. Only Metagol_{ R } and ASP_{MR } were compared on RG2, since MCTopLog failed to terminate due to the longer sequence examples. The performance was evaluated on predictive accuracies and running time.^{Footnote 11} The results were averaged over 200 randomly sampled grammars. For each sample, we used a fixed test set of size 1000. The size of training set varied from 2 to 50 in RG1 and from 4 to 100 in RG2.
4.1.2 Results and discussion
As shown by Fig. 9(a), all three systems have predictive accuracies significantly higher than default. Therefore Null hypothesis 1.1 is refuted. MCTopLog is not usually able to carry out predicate invention, but is enabled to do so by the inclusion of a metainterpreter as background knowledge.
As shown in Fig. 9(b), MCTopLog’s running time is considerably longer than Metagol_{ R } and ASP_{MR }. MCTopLog has slightly lower predictive accuracies than both Metagol_{ R } and ASP_{MR }. The difference is statistically significant according to a ttest (p<0.01). Therefore, Null hypothesis 1.2 is refuted with respect to both predictive accuracy and running time. MCTopLog’s longer running time is due to the fact that it enumerates all candidate hypotheses within the version space. By contrast, both Metagol_{ R } and ASP_{MR } do not traverse the entire space. In particular, ASP solver like Clasp incorporate effective optimisation techniques based on branchandbound algorithms (Gebser et al. 2007). The larger hypothesis space leads to lower accuracy in MCTopLog. This is consistent with the Blumer Bound (Blumer et al. 1989), according to which the error bound decreases with the size of the hypothesis space. Moreover, MCTopLog’s accuracy is also affected by its covering algorithm which is greedy and does not guarantee finding a global optimal. By contrast, both Metagol_{ R } and ASP_{MR } find an hypothesis which is minimal in terms of description length. Figure 11 compares the different hypothesis suggested by the three systems. MCTopLog’s hypothesis H _{ mcTopLog } is longer than both of Metagol_{ R } and ASP_{MR }. By contrast, both Metagol_{ R } and ASP_{MR } derive the one with minimal description length, although they are not exactly the same. H _{ metagolR } is more specific than H _{ aspMR } due to the specifictogeneral search in Metagol_{ R }. In this example, H _{ metagolR } is the same as the target hypothesis.
Figure 9(b) indicates that Metagol_{ R } has considerably lower running time than ASP_{MR }, and the difference increases when examples are long, as shown in Fig. 10(b). Metagol_{ R } also has slightly higher accuracy than ASP_{MR }. A ttest suggests that their difference in accuracy is statistically significant (p<0.01) as one is consistently higher than the other. Therefore, Null hypothesis 1.3 is refuted with respect to both predictive accuracy and running time. The reasons that Metagol_{ R } is faster than ASP_{MR } on learning regular languages are: (1) Metagol_{ R }, as a Prolog implementation, can use forms of procedural bias which cannot be defined declaratively in ASP since the search in ASP is not affected by the order of clauses in the logic program; (2) there are few constraints in the learning task so that efficient constraint handling techniques in ASP do not increase efficiency.
Both Metagol_{ R } and ASP_{MR }’s running times appear to increase linearly with the number of examples. By contrast, MCTopLog’s running time appears to be unaffected by the number of examples. MCTopLog’s running time is determined by the size of the hypothesis space it enumerates, which depends on the lengths of examples. It therefore fails to learn from RG2 which has longer sequences (see Table 1).
4.2 Learning contextfree languages
We investigate the following Null hypotheses.
 Null Hypothesis 2.1 :

Metagol_{ CF }, ASP_{MCF } and a stateoftheart ILP system cannot learn randomly chosen ContextFree languages.
 Null Hypothesis 2.2 :

Metagol_{ CF } and ASP_{MCF } cannot outperform a stateoftheart ILP system on learning randomly chosen ContextFree languages.
 Null Hypothesis 2.3 :

Metagol_{ CF } cannot outperform ASP_{MCF } on learning randomly chosen ContextFree languages.
4.2.1 Materials and methods
Randomly chosen ContextFree grammars were generated using an SLP and reduced using Plotkin’s reduction algorithm (see Sect. 2.5). Grammars were removed if they corresponded to finite languages or could be recognised using the pumping lemma for ContextFree grammars. However, not all Regular grammars can be filtered in this way, since it is undecidable whether a ContextFree grammar is Regular. More specifically, if a grammar is not pumpable, then it is definitely Regular, while a pumpable grammar is not necessarily nonRegular.
The examples were generated in the same way as that in the Regularlanguage experiment. There were two datasets, each containing 200 samples. Details are shown in Table 1. The comparisons of Metagol_{ CF }, ASP_{MCF } and MCTopLog on learning ContextFree grammars was done using only dataset CFG3 since MCTopLog failed to terminate on CFG4 with longsequence examples. The evaluation method was the same as that for learning regular languages.
4.2.2 Results and discussion
As shown in Fig. 12(a), all three systems derive hypotheses with predictive accuracies considerably higher than default. Therefore Null hypotheses 2.1 is refuted. Compared to MCTopLog, both Metagol_{ CF } and ASP_{MCF } have consistently higher averaged predictive accuracies. This is again explained by the Blumer Bound since MCTopLog considers a larger hypothesis space. Metagol_{ CF } conducts a bounded search using a bottom clause so that it is feasible even though the version space is potentially infinite. ASP solvers can also deal with infinite spaces.
Null hypothesis 2.2 is refuted with respect to both running time and predictive accuracy. The predictive accuracies of Metagol_{ CF } and ASP_{MCF }, have no significant difference on either dataset, as shown by the graphs in Figs. 12(a) and 13(a), since both derive globally optimal solutions. However, Metagol_{ CF } has shorter running time due to its procedural bias. Therefore Null hypothesis 2.3 is refuted.
4.3 Representation change
 Null Hypothesis 3 :

Metagol_{ RCF } cannot improve performance by changing representation from Regular to ContextFree languages.
4.3.1 Materials and methods
The experiment used RG1 and CFG3 from the previous two experiments. Therefore, there were 400 sampled grammars in total, half being Regular and the other half mostly ContextFree and nonRegular.
We compared Metagol_{ RCF } (variable hypothesis space) against Metagol_{ CF } (fixed hypothesis space). The predictive accuracies and running time were measured as before. The results were averaged over the 400 grammars.
4.3.2 Results and discussion
As shown in Fig. 14(a), Metagol_{ RCF } has slightly higher predictive accuracies than Metagol_{ CF }. This refutes Null hypothesis 3. The accuracy difference is once more consistent with the Blumer Bound (Blumer et al. 1989), according to which the error bound decreases with the size of the hypothesis space.
Note also in Fig. 14(b), that the running times of Metagol_{ CF } are significantly higher than Metagol_{ RCF }. This can be explained by the fact that when the target grammar is Regular, ContextFree grammars were still considered.
4.4 Learning a simplified natural language grammar
Metagol_{ N } and ASP_{MN } are two systems resulting from the application of Metagol and ASP_{M} in learning a simplified natural language grammar. We investigate the following Null hypotheses. MCTopLog was not included for comparison since its search time was excessive in these learning tasks.
 Null Hypothesis 4.1 :

Metagol_{ N } and ASP_{MN } cannot learn a simplified natural language grammar.
 Null Hypothesis 4.2 :

Metagol_{ N } cannot outperform ASP_{MN } on learning a simplified natural language grammar.
 Null Hypothesis 4.3 :

The provision of background knowledge does not improve learning accuracies and efficiency.
4.4.1 Materials and methods
The training examples come from the same domain considered in Muggleton et al. (2012) and consist of 50 sentences such as “a ball hits the small dog”. Half the examples are positive and half negative, resulting in a default accuracy of 50 %. The complete target grammar rules for parsing the training examples are given in Fig. 15. Each learning task is generated by randomly removing a set of clauses. The leftout clauses become the target to be reconstructed. For each size of leaveout, we sampled ten times. For each sample, the predictive accuracies were computed by 10fold cross validation.^{Footnote 12} The results plotted on the figure are averaged over all leaveout samples.
4.4.2 Results and discussion
The predictive accuracies and running times are plotted in Figs. 16 and 17 respectively. The xaxis corresponds to the percentage of remaining production rules. More specifically, 0 % corresponds to the case when B _{ A }=∅, while 90 % means 9 out of 10 production rules remain. Figure 16 shows that the predictive accuracies of both Metagol_{ N } and ASP_{MN } are significantly higher than default, therefore Null hypothesis 4.1 is refuted.
Although there is no significant difference between Metagol_{ N } and ASP_{MN } in terms of predictive accuracy, ASP_{MN } takes much shorter running time than Metagol_{ N } when more than half of the production rules are missing (x<50 %). However, the expanded version for 50 %≤x≤90 % in Fig. 17(b) shows that ASP_{MN } becomes slower than Metagol_{ N } when background knowledge is less sparse. Therefore, Null hypothesis 4.2 is refuted since when more than 70 % of the production rules remain Metagol_{ N } has significantly shorter running time than ASP_{MN } without sacrificing its predictive accuracy. This is due to the procedural bias encoded in Metagol_{ N }.
The running times of both Metagol_{ N } and ASP_{MN } decrease dramatically with the increase of background knowledge. The predictive accuracies increase with increasing background knowledge, reaching 100 % when the degree of remaining background clauses increases to 70 %. Therefore, Null hypothesis 4.3 is refuted.
Figure 18 compares the different hypotheses derived by Metagol_{ N } and ASP_{MN }. These are derived when B _{ A }=∅. Since both Metagol_{ N } and ASP_{MN } find an hypothesis which is globally optimal in terms of description length, these hypotheses have identical description length although they are not identical hypotheses. Among all the invented predicates in H _{ M }, s _{4} corresponds to np in natural grammars and s _{3} is closed to vp. Similarly in H _{ A }, s _{3} and s6 corresponds to vp and np respectively.
4.5 Learning a definition of a staircase
The authors of Farid and Sammut (2012) have shown that ALEPH can learn a definition of a staircase for a rescue robot from visuallyderived data. Part of such definition is shown in Fig. 19. This kind of definition is not entirely general since it does not involve recursion. We now demonstrate that MIL can be used to learn a general recursive definition of a staircase using predicate invention. A staircase can be represented by a set of ordered planes. For example, staircase([p1,p2,p3]) represents a staircase composed of three planes. Relational information from the camera indicates that plane1 is vertical relative to plane2. This can be encoded as a delta rule delta4(vertical,p1,p2), where vertical is a nonterminal of a grammar and p1 and p2 are terminals. The metainterpreter used in this experiment is a variant of the ContextFree Metainterpreter from Fig. 3.
Training examples of staircases and their planar description were provided as input to both Metagol and ASP_{M}. The resulting hypothesis produced by both systems is shown in Fig. 20, where s _{1} is an invented predicate corresponding to step. Due to its recursive form, this definition has shorter description length than those found by ALEPH. It is also general in its applicability and easily understood.
5 Related work
Grammatical inference (or grammatical induction) is the process of learning a grammar from a set of examples. It is closely related to the fields of machine learning as well as the theory of formal languages. It has numerous realworld applications including speech recognition (e.g. Stolcke 1995), computational linguistics (e.g. Florêncio 2002) and computational biology (e.g. Salvador and Benedi 2002).
The problem of learning or inferring Regular languages, which can be represented by deterministic finite state automata, has been well studied and efficient automatonbased learning algorithms have existed since the 1950s (Moore 1956). Some heuristic approaches to machine learning contextfree grammars (Vanlehn and Ball 1987; Langley and Stromsten 2000) have been investigated, though the completeness of these approaches is unclear. Although an efficient and complete approach exists for learning contextfree grammars from parse trees (Sakakibara 1992), no comparable complete approach exists in the literature for learning contextfree grammars from positive and negative samples of the language. According to a recent survey article learning contextfree languages is widely believed to be intractable and the state of the art mainly consists of negative results (de la Higuera 2005). There are some positive PAC (probably approximately correct) learning results concerning Regular languages (e.g. Denis 2001), but to the best of our knowledge, these have not been extended to the contextfree case. The difficulty of learning contextfree languages arises from a very large search space compared to regular languages.
ILP, among other learning methods, has previously been applied to grammatical inference (e.g. Boström 1998). However, as discussed in Sect. 1, ILP systems normally require predicate invention even for learning Regular languages. Predicate invention has been viewed as an important problem since the early days of ILP (e.g. Muggleton and Buntine 1988), but it is widely accepted to be a hard and underexplored topic within ILP (Muggleton et al. 2011). Although Cussens and Pulman (2000) has applied ALEPH for learning natural language grammar, its learning setting avoids predicate invention by assuming all predicates like np (noun phrase) are known in the background knowledge. Additionally, the entailmentincompleteness of ALEPH restricts the applicability of the approach.
In the Metainterpretive Learning (MIL) framework introduced in this paper, predicate invention is done via abduction with respect to a metainterpreter and by the introduction of firstorder variables. This method is therefore related to other studies where abduction has been used for predicate invention. For instance, (Inoue et al. 2010) assumes background knowledge such as the following.
Here the predicates connected and caused are both metapredicates for objectlevel propositions g and s. Given multiple observations such as caused(g,s) and caused(h,s) abduction can be used to generate an explanation
in which X can be thought of as a new propositional predicate. One important feature of MIL, which makes it distinct from this approach, is that it introduces new predicate symbols which represent relations rather than new objects or propositions. In comparison to previous approaches to predicate invention one might question what is meant by the predicate symbols being new. In our case, we assume a source containing either a finite or an infinite source (e.g. the natural numbers) of uninterpreted predicate symbols. Rather than providing these implicitly in hidden code (as was the case in CIGOL (Muggleton and Buntine 1988)), we prefer to have these symbols explicitly defined as part of the Herbrand universe of the metainterpreter. Abductive hypothesis formation then provides the interpretation for these otherwise uninterpreted symbols.
6 Conclusions and further work
This paper explores the theory, implementation and experimental application of a new framework (MIL) for machine learning by abduction with respect to a given Metainterpreter. We have demonstrated that the MIL framework can be implemented using a simple Prolog program or within a more sophisticated solver such as ASP. We have applied these implementations to the problem of inductive inference of grammars, where our experiments indicate that they compete favourably in speed and accuracy with the state of the art ILP system MCTopLog. The MIL framework has a number of advantages with respect to the standard ILP framework. In particular, predicate invention and mutual recursion can be incorporated with ease by way of Skolem constants. The Metainterpreter provides an efficient declarative bias mechanism for controlling the search for hypotheses, which takes advantage of the completeness of SLD resolution in Prolog. This mechanism is distinct from the use of firstorder declarative bias in the form of a ⊤ theory (Muggleton et al. 2010, 2012) since it is not assumed that the metainterpreter entails each hypothesis.
The approach presented here is limited to learning grammars in the form of DCGs. Such grammars can be learned with predicates of arity at most 2. In future work we hope to deal with a number of extensions of this study. In particular, we would like to extend the applications of the MIL framework to nongrammar fragments of firstorder logic. We have shown an example of nongrammar learning, but more general learning problem requires Monadic and Dyadic and higher arity fragments of firstorder logic. We would like to incorporate a number of other features of ILP and SRL learning systems such as probabilistic parameters (similar to SRL) and noise handling.
Clearly devising an appropriate metainterpreter for a fragment of logic other than those studied in this paper will require careful mathematical analysis. The situation may be compared to that within Support Vector Machines, in which certain mathematical properties have to be established for each new form of kernel function. Hopefully, over time, such a process will become more routine and it may be possible to provide end users with general tools which support this activity. In the ideal case, we would like in future work, to develop a metainterpreter which is capable of implementing highly expressive, ideally Turingcomplete, languages. Such a metainterpreter might then be reasonably expected to learn effectively on arbitrary new problems without further manual revision of its metainterpreter.
In closing we believe the MIL framework provides a promising and novel form of Inductive Logic Programming which avoids a number of the bottlenecks of existing approaches.
Notes
Considering delta1/3 as an arity 3 ground relation, if c,k are bounds on the number of terminals and nonterminals respectively then the number of possible definitions for delta1/3 is \(2^{ck^{2}}\).
This is an adaptation of Chomskynormal form Contextfree, which only permits productions of the form S→λ, S→a and S→AB.
Note that the metainterpreter shown in Fig. 1b is a definite logic program. Such a metainterpreter is only a part of implementations such as those described later in Sect. 3. Within such an implementation negationbyfailure is used to implement operations such as abduction, so the implementation as a whole is not a definite logic program. However, this does not affect this definition or the later propositions which use it.
This algorithm iteratively remove logically redundant literals from a clause until no redundant clauses remain.
Metagol is MIL encoded within YAP Prolog. The name comes from the combination of Meta and gol, where Meta corresponds to the MetaInterpreter, and gol is the reverse of log which is short for logic.
The name ASP_{M} is MIL encoded within an ASP solver.
Provided H _{ g } and H _{ s } are within the same clause bound.
Occam’s razor suggests that simpler hypotheses have higher predictive power. This is further supported by Sect. 4 about experiments, where the nonminimal hypotheses suggested by MCTopLog have lower predictive accuracies than the minimal ones hypothesised by ASP_{M} and Metagol.
The parsing of seed examples requires all rules in the grammar.
The running times of Metagol_{ R } and ASP_{MR } are measured in terms of getting the first minimal hypothesis, rather than all minimal hypotheses.
The size of available examples is 50, therefore not large enough for reserving a subset as test set.
References
Andres, B., Kaufmann, B., Matheis, O., & Schaub, T. (2012). Unsatisfiabilitybased optimization in clasp. In Proceedings of the 28th International Conference on Logic Programming.
Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. K. (1989). Learnability and the VapnikChervonenkis dimension. Journal of the ACM, 36(4), 929–965.
Boström, H. (1998). Predicate invention and learning from positive examples only. In 10th European Conference on Machine Learning (ECML98) (pp. 226–237). Berlin: Springer.
Cussens, J., & Pulman, S. (2000). Experiments in inductive chart parsing. In J. Cussens & S. Dzeroski (Eds.), LNAI: Vol. 1925. Proceedings of Learning Language in Logic (LLL2000) (pp. 143–156). Berlin: Springer.
de la Higuera, C. (2005). A bibliographical study of grammatical inference. Pattern Recognition, 38, 1332–1348.
Denis, F. (2001). Learning regular languages from simple positive examples. Machine Learning, 44(1), 37–66.
Farid, R., & Sammut, C. (2012, to appear). Planebased object categorization using relational learning. ILP2012 MLJ special issue.
Flach, P. A. & Kakas, A. C. (Eds.) (2000). Abductive and Inductive Reasoning. Pure and Applied Logic. Amsterdam: Kluwer.
Florêncio, C. (2002). Consistent identification in the limit of rigid grammars from strings is nphard. In Grammatical Inference: Algorithms and Applications (pp. 729–733).
Gebser, M., Kaminski, R., Kaufmann, B., & Schaub, T. (2012). Answer Set Solving in Practice. Synthesis Lectures on Artificial Intelligence and Machine Learning. San Mateo: Morgan and Claypool.
Gebser, M., Kaufmann, B., Neumann, A., & Schaub, T. (2007). clasp: A conflictdriven answer set solver. In C. Baral, G. Brewka, & J. Schlipf (Eds.), Lecture Notes in Computer Science: Vol. 4483. Logic Programming and Nonmonotonic Reasoning (pp. 260–265). Berlin: Springer.
Hopcroft, J. E., & Ullman, J. D. (1979). Introduction to Automata and Formal Languages. Reading: AddisonWesley.
Inoue, K., Furukawa, K., Kobayashiand, I., & Nabeshima, H. (2010). Discovering rules by metalevel abduction. In L. De Raedt (Ed.), LNAI: Vol. 5989. Proceedings of the Nineteenth International Conference on Inductive Logic Programming (ILP09) (pp. 49–64). Berlin: Springer.
Kakas, A. C., Van Nuffelen, B., & Denecker, M. (2001). Asystem: Problem solving through abduction. In IJCAI (pp. 591–596).
Langley, P., & Stromsten, S. (2000). Learning contextfree grammars with a simplicity bias. In R. López de Mántaras & E. Plaza (Eds.), Lecture Notes in Computer Science: Vol. 1810. Machine Learning: ECML 2000 (pp. 220–228). Berlin: Springer.
Moore, E. F. (1956). Gedankenexperiments on sequential machines. In C. E. Shannon & J. McCarthy (Eds.), Automata Studies (pp. 129–153). Princeton: Princeton University Press.
Muggleton, S. H. (1990). Inductive Acquisition of Expert Knowledge. Wokingham: AddisonWesley.
Muggleton, S. H. (1995). Inverse entailment and Progol. New Generation Computing, 13, 245–286.
Muggleton, S. H. (1996). Stochastic logic programs. In L. de Raedt (Ed.), Advances in Inductive Logic Programming (pp. 254–264). Amsterdam: IOS Press.
Muggleton, S. H., & Bryant, C. H. (2000). Theory completion using inverse entailment. In Proc. of the 10th International Workshop on Inductive Logic Programming (ILP00) (pp. 130–146). Berlin: Springer.
Muggleton, S. H., & Buntine, W. (1988). Machine invention of firstorder predicates by inverting resolution. In Proceedings of the 5th International Conference on Machine Learning (pp. 339–352). Los Altos: Kaufmann.
Muggleton, S. H., Lin, D., & TamaddoniNezhad, A. (2012). MCToplog: Complete multiclause learning guided by a top theory. In LNAI: Vol. 7207. Proceedings of the 21st International Conference on Inductive Logic Programming (pp. 238–254).
Muggleton, S. H., De Raedt, L., Poole, D., Bratko, I., Flach, P., & Inoue, K. (2011). ILP turns 20: biography and future challenges. Machine Learning, 86(1), 3–23.
Muggleton, S. H., Santos, J., & TamaddoniNezhad, A. (2010). TopLog: ILP using a logic program declarative bias. In LNCS: Vol. 5366. Proceedings of the International Conference on Logic Programming 2008 (pp. 687–692). Berlin: Springer.
Muggleton, S. H., & Pahlavi, N. (2012, in press). Towards efficient higherorder logic learning in a firstorder datalog framework. In Latest Advances in Inductive Logic Programming. London: Imperial College Press.
Nabeshima, H., Iwanuma, K., Inoue, K., & Ray, O. (2010). Solar: An automated deduction system for consequence finding. AI Commun., 23(2–3), 183–203.
NienhuysCheng, S.H., & de Wolf, R. (1997). LNAI: Vol. 1228. Foundations of Inductive Logic Programming. Berlin: Springer.
Plotkin, G. D. (1969). A note on inductive generalisation. In B. Meltzer & D. Michie (Eds.), Machine Intelligence (Vol. 5, pp. 153–163). Edinburgh: Edinburgh University Press.
De Raedt, L. (2012). Declarative modeling for machine learning and data mining. In Proceedings of the International Conference on Algorithmic Learning Theory (p. 12).
Sakakibara, Y. (1992). Efficient learning of contextfreegrammars from positive structural examples. Information and Computation, 97(1), 23–60.
Salvador, I., & Benedi, J. M. (2002). Rna modeling by combining stochastic contextfree grammars and ngram models. International Journal of Pattern Recognition and Artificial Intelligence, 16(3), 309–316.
Stolcke, A. (1995). An efficient probabilistic contextfree parsing algorithm that computes prefix probabilities. Computational Linguistics, 21(2), 165–201.
Vanlehn, K., & Ball, W. (1987). A version space approach to learning contextfree grammars. Machine Learning, 2, 39–74.
Author information
Authors and Affiliations
Corresponding author
Additional information
Editors: Fabrizio Riguzzi and Filip Zelezny.
Rights and permissions
About this article
Cite this article
Muggleton, S.H., Lin, D., Pahlavi, N. et al. Metainterpretive learning: application to grammatical inference. Mach Learn 94, 25–49 (2014). https://doi.org/10.1007/s1099401353583
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s1099401353583