Learning efficient logic programs
Abstract
When machine learning programs from data, we ideally want to learn efficient rather than inefficient programs. However, existing inductive logic programming (ILP) techniques cannot distinguish between the efficiencies of programs, such as permutation sort \(O(n!)\) and merge sort \(O(n \log n)\). To address this limitation, we introduce Metaopt, an ILP system which iteratively learns lower cost logic programs, each time further restricting the hypothesis space. We prove that given sufficiently large numbers of examples, Metaopt converges on minimal cost programs, and our experiments show that in practice only small numbers of examples are needed. To learn minimal time-complexity programs, including nondeterministic programs, we introduce a cost function called tree cost which measures the size of the SLD-tree searched when a program is given a goal. Our experiments on programming puzzles, robot strategies, and real-world string transformation problems show that Metaopt learns minimal cost programs. To our knowledge, Metaopt is the first machine learning approach that, given sufficient numbers of training examples, is guaranteed to learn minimal cost logic programs, including minimal time-complexity programs.
1 Introduction
As this find duplicate scenario shows, when machine learning programs from examples, we should consider the efficiency of learned programs. However, existing ILP systems cannot distinguish between the efficiencies of programs, and typically rely on an Occamist bias to learn textually simple programs, such as those using the fewest literals (Law et al. 2014) or fewest clauses (Muggleton et al. 2015).
A recent paper (Cropper and Muggleton 2015) attempted to address this issue by introducing Metagol\(_O\), an ILP system based on meta-interpretive learning (MIL) (Muggleton et al. 2014, 2015; Cropper and Muggleton 2016a). Metagol\(_O\) learns minimal resource complexity robot strategies, described as dyadic logic programs, where resource complexity is the sum of the action costs required to execute a strategy.
In this paper, we introduce Metaopt, which generalises Metagol\(_O\) by adding a general cost function into a meta-interpretive learner, where specific cost functions are provided as background knowledge. Metaopt uses a search procedure called iterative descent, introduced in Cropper and Muggleton (2015) but generalised in this paper, to iteratively learn lower cost programs, each time further restricting the hypothesis space. We prove that given sufficiently large numbers of examples, Metaopt converges on minimal cost programs, and our experiments show that in practice only small numbers of examples are necessary. In contrast to Metagol\(_O\), Metaopt is not restricted to dyadic logic programs, does not need designated input-output arguments, can handle negative examples, and considers backtracking steps when measuring program costs, which is crucial when learning nondeterministic logic programs. To learn minimal time-complexity programs, we introduce a cost function called tree cost, which measures the size of the SLD-tree searched when a program is given a goal. Our experiments on programming puzzles, robot strategies, and real-world string transformation problems show that Metaopt learns minimal tree cost programs. Our main contributions are as follows:

We describe a general framework for learning minimal cost logic programs (Sect. 3).

We extend MIL to support learning minimal cost logic programs (Sect. 3).

To learn minimal time-complexity logic programs, we introduce a cost function called tree cost which is based on SLD-tree sizes (Sect. 3).

We introduce Metaopt, a MIL implementation, and prove that it converges on minimal cost programs given sufficiently large numbers of examples (Sect. 4).

We show through experimentation that Metaopt converges on minimal cost programs given small numbers of examples (Sect. 5).

We demonstrate the generality of Metaopt by simulating Metagol\(_O\) to learn efficient robot strategies (Sect. 5).

We show that Metaopt learns more efficient programs than existing ILP systems on real-world string transformation problems (Sect. 5).
2 Related work
Universal search If nothing is known about a problem besides input/output examples, and assuming that the solution can be verified in polynomial time, Levin’s universal search (Levin 1984) is the asymptotically fastest way of finding a program to solve the problem. Levin search differs from our work because it returns the first (and smallest) program that solves a problem, which is not necessarily the most efficient program. By contrast, after finding a program, our approach, Metaopt, continues to search for more efficient programs using the cost of the previously found program to restrict the hypothesis space. In addition, for many problems, it is unlikely that the most efficient program can be encoded with a small number of bits, making Levin search impractical. By contrast, although our approach is less general, because it assumes background knowledge of the problem, it is more practical.
Deductive program synthesis In contrast to universal search methods, deductive program synthesis (Manna and Waldinger 1979) approaches build programs from full specifications, where a specification precisely states the requirements and behaviour of the desired program. Both Kant (1983) and Zelle and Mooney (1993) synthesise efficient programs from complete specifications. In Zelle and Mooney (1993) the authors take as input a program to sort lists and use an explanation-based learning (Mitchell 1997) approach to speed up the execution of the program by analysing program execution traces over training examples. A similar, yet distinct, approach is that of logic program transformation (Pettorossi and Proietti 1994). This approach starts with an initial program and successively applies transformation rules to programs to improve on previous versions. The aim is to end up with a final program that has the same meaning as the initial one, but is preferably better in some way, such as being more compact or more efficient (Pettorossi and Proietti 1994). In contrast to these approaches, our approach induces programs from incomplete specifications in the form of input/output examples. In addition, the aforementioned approaches do not guarantee that the resulting program is optimal in terms of efficiency. By contrast, we prove that Metaopt finds the minimal cost program given sufficiently large numbers of examples (Theorem 1), and Experiment 1 shows that in practice only small numbers of examples are necessary.
Program induction Program induction, also known as inductive programming (Gulwani et al. 2015), refers to inducing programs from incomplete specifications, typically input/output examples. Early work includes Plotkin on least generalisation (Plotkin 1969, 1971), Vere on induction algorithms for expressions in predicate calculus (Vere 1975), and Summers on inducing Lisp programs (Summers 1977). Interest in program induction has grown recently, partially due to the success of mass-market tools, such as FlashFill (Gulwani 2011). Most forms of program induction are biased towards learning simple programs, typically those with minimal textual complexity, such as the number of clauses (Muggleton et al. 2015) or the number of literals (Law et al. 2014). This bias ignores the efficiency of hypothesised programs. In ILP, for instance, Golem (Muggleton and Feng 1990) and Progol (Muggleton 1995) can both learn sorting algorithms from examples, but when given background predicates partition/3 and append/3, suitable for learning quick sort, both systems learn variants of insertion sort, because the definition is smaller and there is no bias for learning more efficient algorithms. By contrast, Metaopt is biased towards learning efficient programs.
AI planning In AI, planning typically involves deriving a sequence of actions to achieve a specific goal from an initial situation (Russell and Norvig 2010). Planning research mostly focuses on efficiently learning plans (Hoffmann and Nebel 2001). However, we are often interested in plans that are optimal with respect to an objective function which measures the quality of a plan. A common objective function is the length of the plan (Xing et al. 2006), and existing systems can learn optimal plans based on this function (Eiter et al. 2003).
Plan length alone is only one criterion. If executing actions is costly, we may prefer a plan which minimises the overall cost of the actions, e.g. to minimise the use of resources. The answer set programming literature has started to address learning optimal plans by incorporating action costs into the learning (Eiter et al. 2003; Yang et al. 2014). In contrast to these approaches, in Sect. 5.4, we use Metaopt to learn robot strategies (Cropper and Muggleton 2015), representing a (sometimes infinite) set of plans, which contain conditions and recursion.
Various machine learning approaches support constructing strategies, such as the SOAR architecture (Laird 2008), reinforcement learning (Sutton and Barto 1998), and action learning in ILP (Moyle and Muggleton 1997; Otero 2005). Relational Markov decision processes (van Otterlo and Wiering 2012) provide a general setting for reinforcement learning. Strategies can be viewed as a deterministic special case of Markov decision processes (MDPs) (Puterman 2014). Unlike these approaches, we learn recursive logic programs, including the use of predicate invention for automatic problem decomposition.
Efficient logic programs Kaplan (1988) describes a method for estimating the average-case complexity of deterministic logic programs. However, in contrast to functional and imperative programs, logic programs can be nondeterministic, i.e. a logic program may return multiple solutions. Multiple solutions to a logic program are found by searching a SLD-tree for a SLD-refutation, and then backtracking to find other SLD-refutations. Debray et al. (1997) introduce a semi-automatic method to estimate the worst-case time complexity of deterministic and nondeterministic logic programs. However, their approach requires meta-information, such as mode declarations and type information. In contrast to these approaches, we introduce a cost function which estimates the worst-case complexity of both deterministic and nondeterministic logic programs. Our approach does not require meta-information, and works by measuring the size of the SLD-tree searched to find a SLD-refutation of a goal. In addition, the aforementioned approaches do not consider how to machine learn efficient programs.
In Sect. 4, we introduce Metaopt, which is used in Experiments 1, 2, and 3 to learn minimal time-complexity programs by iteratively restricting the hypothesis space by bounding the number of resolutions allowed to find a hypothesis. Our approach is similar to one proposed by Blum and Blum (1975) which was later adapted by Shapiro (1983) who uses the notion of h-easy functions to limit the search for a hypothesis, where an atom A is h-easy with respect to a logic program P if there exists a derivation of A from P using at most h resolution steps. Shapiro’s approach measures the number of resolutions in the derivation of A from P, and thus ignores backtracking steps. By contrast, our approach uses the notion of tree cost to measure the total number of resolutions required to find a SLD-refutation of a goal with respect to a program, i.e. tree cost includes backtracking steps.
Our approach of evaluating a hypothesis based on the SLD-tree size is similar to the idea of Muggleton et al. (1992), who combined proof complexity (the number of choice points in a SLD-refutation) with other factors, such as hypothesis length and predictive accuracy, to measure the significance of a hypothesis. However, the authors only considered how to characterise the significance of a hypothesis. By contrast, we use a meta-interpretive learner to simultaneously induce and evaluate the time complexity of a hypothesis.
Metagol\(_O\) To our knowledge, the only work on inducing efficient programs is the work of Cropper and Muggleton (2015), who introduce an ILP system called Metagol\(_O\) and show that it learns efficient robot strategies. A robot strategy is a dyadic logic program where the first argument of each predicate is the input and the second argument is the output. Each argument is a state description, represented as a list of Prolog atoms. Each predicate represents an action which modifies the state. The resource complexity of a strategy is maintained in the state as a monadic atom named energy. Each time a robot action is successfully executed, the resource complexity is increased by an amount specified by the user. Metagol\(_O\) is not a general approach for learning efficient logic programs and has several limitations. Problems must be represented as dyadic programs, which is inconvenient and may lead to reduced learning performance because concepts may be less succinctly represented. Also, the user must specify predicate costs. For instance, if mergesort/2 is part of the background knowledge, then the user must specify its cost. If no costs are given, Metagol\(_O\) assumes uniform costs, and cannot, for instance, distinguish between mergesort/2 and tail/2. Finally, Metagol\(_O\) ignores failed actions and cannot accurately measure the time complexity of programs with backtracking. Our system, Metaopt, addresses all of these issues.
3 Framework
In this section, we describe the cost minimisation problem. We also describe MIL, which we extend to support learning minimal cost programs. We assume familiarity with logic programming (Nienhuys-Cheng and Wolf 1997).
3.1 Cost minimisation problem
We assume a language of examples \(\mathscr {E}\), background knowledge \(\mathscr {B}\), and hypotheses \(\mathscr {H}\). We denote the Herbrand base of the language L as \(\sigma _{L}\). We denote the power set of the set S as \(2^S\).
We first define a cost function that measures the cost of a program with respect to an atom:
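Definition 1
(Program cost function) A program cost function \(\varPhi \) maps a program \(H \in \mathscr {H}\) and an atom \(e \in \mathscr {E}\) to the cost \(\varPhi (H,e)\) of H with respect to e.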
A program cost function forms part of the cost minimisation input:
Definition 2
(Cost minimisation input) A cost minimisation input is a triple \((B,E,\varPhi )\) where:

\(B \subseteq \mathscr {B}\) is background knowledge

\(E = (E^{+}, E^{-})\) is a pair where \(E^+ \subseteq \mathscr {E}, E^- \subseteq \mathscr {E}\) are sets of atoms representing positive and negative examples respectively

\(\varPhi \) is a program cost function
We measure the maximum (i.e. worst-case) cost of a program over a set of examples:
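Definition 3
(Maximum cost) Given a program cost function \(\varPhi \), a program H, and examples \(E = (E^{+}, E^{-})\), the maximum cost of H over E is \(max\_cost(\varPhi ,H,E) = \max _{e \in E^{+} \cup E^{-}} \varPhi (H,e)\).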
We also define a function that measures the size of a logic program:
Definition 4
(Program size) The size size(H) of the program H is the number of clauses in H.
We use the maximum cost and size of a program to define an efficiency ordering over programs:
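Definition 5
(Efficiency ordering) Given a program cost function \(\varPhi \) and examples E, \(H_1 \preceq _\varPhi H_2\) (i.e. \(H_1\) is at least as efficient as \(H_2\)) if and only if either: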
 1.
\(max\_cost(\varPhi ,H_1,E) < max\_cost(\varPhi ,H_2,E)\)
 2.
\(max\_cost(\varPhi ,H_1,E) = max\_cost(\varPhi ,H_2,E)\) and \(size(H_1) \le size(H_2)\)
This ordering prioritises programs first by their maximum cost, then by their size. We use this ordering to define the cost minimisation problem. For convenience, we first define the version space (Mitchell 1997), which contains only hypotheses consistent with the examples:
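Definition 6
(Version space) Given background knowledge B and examples \(E = (E^{+}, E^{-})\), the version space \(\mathscr {V}_{B,E}\) is the set of programs \(H \in \mathscr {H}\) such that \(B \cup H \models e\) for every \(e \in E^{+}\) and \(B \cup H \not \models e\) for every \(e \in E^{-}\).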
We now define the cost minimisation problem:
Definition 7
(Cost minimisation problem) Given a cost minimisation input \((B,E,\varPhi )\), the cost minimisation problem is to return a program \(H \in \mathscr {V}_{B,E}\) such that \(H \preceq _\varPhi H'\) for all \( H' \in \mathscr {V}_{B,E}\).
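For a small finite hypothesis space, Definitions 3 to 7 can be read directly as a brute-force procedure. The Python sketch below is only that reading, not how Metaopt searches: programs are opaque labels, the `covers(h, e)` callback is a hypothetical stand-in for the entailment test \(B \cup H \models e\), and at least one example is assumed so that the maximum is defined.

```python
from typing import Callable, Hashable, List, Optional, Sequence

CostFn = Callable[[Hashable, Hashable], float]
Covers = Callable[[Hashable, Hashable], bool]


def max_cost(phi: CostFn, h: Hashable, examples: Sequence[Hashable]) -> float:
    """Worst-case cost of program h over all examples (Definition 3)."""
    return max(phi(h, e) for e in examples)


def precedes(phi: CostFn, size: Callable[[Hashable], int],
             h1: Hashable, h2: Hashable, examples: Sequence[Hashable]) -> bool:
    """h1 <=_phi h2: lower max cost, or equal max cost and no larger size (Definition 5)."""
    c1, c2 = max_cost(phi, h1, examples), max_cost(phi, h2, examples)
    return c1 < c2 or (c1 == c2 and size(h1) <= size(h2))


def version_space(hypotheses: Sequence[Hashable], pos: Sequence[Hashable],
                  neg: Sequence[Hashable], covers: Covers) -> List[Hashable]:
    """Programs covering every positive and no negative example (Definition 6)."""
    return [h for h in hypotheses
            if all(covers(h, e) for e in pos) and not any(covers(h, e) for e in neg)]


def cost_minimal(hypotheses: Sequence[Hashable], pos: Sequence[Hashable],
                 neg: Sequence[Hashable], covers: Covers, phi: CostFn,
                 size: Callable[[Hashable], int]) -> Optional[Hashable]:
    """A program H in the version space with H <=_phi H' for all H' in it (Definition 7)."""
    consistent = version_space(hypotheses, pos, neg, covers)
    if not consistent:
        return None
    examples = list(pos) + list(neg)
    # Comparing (max cost, size) lexicographically matches the ordering of Definition 5.
    return min(consistent, key=lambda h: (max_cost(phi, h, examples), size(h)))
```

Because the ordering compares maximum cost first and size second, the program with the smallest key (max cost, size) precedes every other program in the version space, which is exactly what Definition 7 asks for.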
3.2 Metainterpretive learning
Table 1 Example metarules. The letters P, Q, R, and F denote existentially quantified variables. The letters A, B, and C denote universally quantified variables.
Name  Metarule
Ident  \(P(A,B) \leftarrow Q(A,B)\) 
Precon  \(P(A,B) \leftarrow Q(A),R(A,B)\) 
Curry  \(P(A,B) \leftarrow Q(A,B,F)\) 
Chain  \(P(A,B) \leftarrow Q(A,C), R(C,B)\) 
Tailrec  \(P(A,B) \leftarrow Q(A,C), P(C,B)\) 
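To make metarules and meta-substitutions concrete, the Python sketch below encodes the metarules of Table 1 as data and projects a meta-substitution (a mapping from the existentially quantified variables P, Q, R, F to symbols) onto a metarule to obtain a clause. The tuple encoding, the `project` helper, and invented predicate names such as `f1` are hypothetical illustrations, not the Prolog representation used by Metagol or Metaopt.

```python
from typing import Dict, List, Tuple

Literal = Tuple[str, ...]  # (predicate, arg1, arg2, ...)

# Table 1 as data: name -> (head, body). P, Q, R, F are existentially quantified;
# A, B, C are universally quantified and are left untouched by projection.
METARULES: Dict[str, Tuple[Literal, List[Literal]]] = {
    "ident":   (("P", "A", "B"), [("Q", "A", "B")]),
    "precon":  (("P", "A", "B"), [("Q", "A"), ("R", "A", "B")]),
    "curry":   (("P", "A", "B"), [("Q", "A", "B", "F")]),
    "chain":   (("P", "A", "B"), [("Q", "A", "C"), ("R", "C", "B")]),
    "tailrec": (("P", "A", "B"), [("Q", "A", "C"), ("P", "C", "B")]),
}


def project(name: str, subs: Dict[str, str]) -> str:
    """Apply a meta-substitution to a metarule and render the clause in Prolog syntax."""
    head, body = METARULES[name]

    def render(lit: Literal) -> str:
        pred, *args = (subs.get(t, t) for t in lit)
        return f"{pred}({','.join(args)})"

    return f"{render(head)} :- {', '.join(render(b) for b in body)}."


print(project("chain", {"P": "f", "Q": "mergesort", "R": "f1"}))
# f(A,B) :- mergesort(A,C), f1(C,B).
```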
A standard MIL input is defined as:
Definition 8
(MIL input) A MIL input is a pair (B, E) where:

\(B = B_C \cup M\) where \(B_C\) is a set of definite clauses and M is a set of metarules

\(E = (E^{+}, E^{-})\) is a pair where \(E^+\) and \(E^-\) are sets of atoms representing positive and negative examples respectively
A standard MIL learner is defined as:
Definition 9
(MIL learner) Given a MIL input (B, E), a MIL learner returns a program \(H \in \mathscr {V}_{B,E}\).
We extend MIL to support the cost minimisation problem. We first extend the MIL input:
Definition 10
(Cost minimal MIL input) A cost minimal MIL input is a triple \((B,E,\varPhi )\) where B and E are as in a standard MIL input and \(\varPhi \) is a program cost function.
We now define a cost minimal MIL learner:
Definition 11
(Cost minimal MIL learner) Given a cost minimal MIL input \((B, E,\varPhi )\), a cost minimal MIL learner returns a program \(H \in \mathscr {V}_{B,E}\) such that \(H \preceq _\varPhi H'\) for all \( H' \in \mathscr {V}_{B,E}\).
In Sect. 4, we introduce Metaopt, a MIL learner that solves the MIL cost minimisation problem.
3.3 Tree cost minimisation
The cost minimisation input includes a program cost function (Definition 1). We now introduce a cost function for learning minimal time-complexity logic programs. In computer science, time complexity refers to the time an algorithm needs to perform some computation. In logic programming, computation is formalised by means of SLD-resolution. Given a logic program H and a goal G, computation involves finding a SLD-refutation of \(H \cup \{G\}\). A SLD-refutation is found by searching a SLD-tree, which contains all possible SLD-derivations, and thus all possible SLD-refutations. Prolog searches for SLD-refutations using a depth-first search (Sterling and Shapiro 1994). We can therefore measure the runtime (time complexity) of a Prolog program as a function of the size of the SLD-tree that is being searched. For a positive example, we can measure the size of the leftmost branch of the SLD-tree in which the first SLD-refutation is found, i.e. the leftmost successful branch:
Definition 12
(Successful branch) Let H be a definite program, G be an initial goal, and T be a SLD-tree for \(H \cup \{G\}\). Then a successful branch is a path between the root (G) and a leaf containing the empty clause.
We measure the size of the leftmost successful branch:
Definition 13
(Branch size) Let H be a definite program, G a goal, T a SLD-tree for \(H \cup \{G\}\), and L be the leftmost successful branch of T. Then the branch size \(branch\_size(H,G)\) is the number of resolutions prior to and including L in the depth-first enumeration of T.
For a negative example, we can measure the size of the finitely failed SLD-tree:
Definition 14
(Finitely failed tree) Let H be a definite program and G be an initial goal. Then a finitely failed SLD-tree for \(H \cup \{G\}\) is one which is finite and contains no successful branches.
We measure the size of the finitely failed SLD-tree:
Definition 15
(Failed tree size) Let H be a definite program, G be an initial goal, and T be a finitely failed SLD-tree for \(H \cup \{G\}\). Then the failed tree size \(tree\_size(H,G)\) is the number of resolutions in the depth-first enumeration of T.
We now define our tree cost function:
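Definition 16
(Tree cost) Let H be a definite program and e an example. Then the tree cost of H with respect to e is \(branch\_size(H,e)\) if e is a positive example and \(tree\_size(H,e)\) if e is a negative example.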
In Experiments 1, 2, and 3, Metaopt uses tree cost to learn minimal time-complexity programs.
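The following Python sketch illustrates Definitions 13 and 15 on ground (propositional) definite programs, where no unification is needed. It performs Prolog-style depth-first search with the leftmost selection rule and counts every resolution step, including steps on branches that are later backtracked over; it also reports the number of steps on the successful branch alone, the quantity that Shapiro's derivation-based measure in Sect. 2 considers. The program encoding, the names `sld_search`, `tree_cost`, and `BoundExceeded`, and the `bound` parameter (a stand-in for a cost bound) are assumptions of this sketch.

```python
class BoundExceeded(Exception):
    """Raised when the search performs more resolution steps than allowed."""


def sld_search(program, goals, bound=10_000):
    """Depth-first SLD search, leftmost atom first, over a ground definite program.

    `program` maps each atom to the bodies of its clauses, in clause order.
    Returns (succeeded, resolutions, refutation_length): `resolutions` counts every
    resolution step until the first refutation is found (or until the whole tree
    has been explored), and `refutation_length` is the number of steps on the
    first successful branch (None if there is none).
    """
    count = 0

    def solve(goal_list, depth):
        nonlocal count
        if not goal_list:
            return depth                      # empty clause: a successful branch
        selected, rest = goal_list[0], goal_list[1:]
        for body in program.get(selected, []):
            count += 1                        # resolve `selected` with this clause
            if count > bound:
                raise BoundExceeded(count)
            found = solve(list(body) + rest, depth + 1)
            if found is not None:
                return found                  # stop at the leftmost refutation
        return None                           # every child failed: backtrack

    length = solve(list(goals), 0)
    return length is not None, count, length


def tree_cost(program, atom, positive, bound=10_000):
    """Branch size for a positive example, failed tree size for a negative one."""
    succeeded, count, _ = sld_search(program, [atom], bound)
    if succeeded != positive:
        raise ValueError("the program is inconsistent with this example")
    return count


# p :- q.   p :- r.   q :- s.   r.
prog = {"p": [["q"], ["r"]], "q": [["s"]], "r": [[]]}
print(sld_search(prog, ["p"]))               # (True, 4, 2)
print(tree_cost(prog, "q", positive=False))  # 1
```

Here the refutation p, r, \(\square\) takes only two resolution steps, but depth-first search first tries and abandons the branch through q, so the branch size of p is 4; a measure that counted only the successful derivation would miss that backtracking work.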
4 Implementation
In this section, we introduce Metaopt, a MIL implementation that learns minimal cost logic programs. We also introduce two cost function implementations for learning minimal tree cost (Definition 16) and minimal resource complexity (Cropper and Muggleton 2015) programs. Finally, we describe Metagol\(_O\) which is used as a comparator in the experiments in Sect. 5.
4.1 Metaopt
Metaopt extends Metagol (Cropper and Muggleton 2016b), an existing MIL implementation, to support learning minimal cost logic programs. The two key extensions are (1) the addition of a general cost function into the metainterpreter, and (2) the use of a procedure called iterative descent to search for lower cost programs. We describe these two extensions in turn.
4.1.1 Meta-interpreter
The meta-interpreter works as follows. Given sets of atoms representing positive (Pos) and negative (Neg) examples, Metaopt tries to prove each positive atom in turn. Metaopt first tries to deductively prove an atom by calling pos_program_cost/2, which is defined as background knowledge. When an atom is proven this way, the cost of proving that atom is added to the overall proof cost. If the overall proof cost exceeds a bound (MaxCost), then the proof is terminated, so as to ignore inefficient programs. This bound is determined by the iterative descent procedure, described in the next section. If Metaopt cannot deductively prove an atom, it tries to unify the atom with the head of a metarule (metarule(Name,Subs,(Atom :- Body))) and to bind the existentially quantified variables in the metarule to symbols in the predicate signature. Metaopt saves the resulting meta-substitutions, which are eventually used to form a program. Metaopt then tries to prove the body of the metarule recursively through meta-interpretation. After proving all positive atoms, a logic program is formed by projecting the meta-substitutions onto their corresponding metarules. Metaopt then checks the consistency of the learned program with the negative examples. If the program is inconsistent, then Metaopt backtracks to explore different branches of the SLD-tree.
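The cost-bounding step of the meta-interpreter can be sketched in Python as follows. The sketch covers only the deductive part: each positive atom is proved in turn, the proof costs are summed, and the candidate is abandoned as soon as the running total exceeds MaxCost. The callback `prove_with_cost` is hypothetical (in Metaopt this role is played by the cost predicate in the background knowledge), and the abduction of meta-substitutions when a deductive proof fails is omitted.

```python
from typing import Callable, Hashable, Iterable, Optional


def prove_within_bound(
    atoms: Iterable[Hashable],
    prove_with_cost: Callable[[Hashable], Optional[int]],
    max_cost: int,
) -> Optional[int]:
    """Total proof cost of `atoms`, or None if a proof fails or the bound is exceeded."""
    total = 0
    for atom in atoms:
        cost = prove_with_cost(atom)  # cost of a deductive proof, None if unprovable
        if cost is None:
            return None               # Metaopt would instead try to apply a metarule here
        total += cost
        if total > max_cost:
            return None               # prune: this candidate program is too expensive
    return total
```

Pruning as soon as the running total passes the bound, rather than after all atoms are proved, is what lets a tight bound cut off expensive candidates early.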
4.1.2 Iterative descent
Iterative descent works as follows. Starting at iteration 1, Metaopt uses iterative deepening on the number of clauses to find a consistent program \(H_1\) with the minimal program size (Definition 4). The program \(H_1\) is the quickest to learn because the hypothesis space is exponential in the number of clauses (Lin et al. 2014; Cropper and Muggleton 2016a). Metaopt then calculates the cost of \(H_1\) using the max_program_cost/4 predicate, which measures the maximum cost of proving the positive examples and disproving the negative examples. This cost becomes the maximum cost for the next iteration for both the meta-interpreter and the iterative descent algorithm. In iteration \(i > 1\), Metaopt searches for a program \(H_i\), again with minimal program size, but ensuring that the cost of \(H_i\) is less than the cost of \(H_{i-1}\). Iterative descent continues until it cannot find a lower cost program.
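A minimal Python sketch of the iterative descent loop, assuming a hypothetical `learn(bound)` oracle that plays the role of one iteration (iterative deepening on the number of clauses, restricted to programs whose cost is below `bound`). The toy hypothesis space and its costs are invented purely to exercise the loop.

```python
import math
from typing import Callable, Optional, TypeVar

P = TypeVar("P")


def iterative_descent(learn: Callable[[float], Optional[P]],
                      cost: Callable[[P], float]) -> Optional[P]:
    """Keep learning programs that are strictly cheaper than the last one found.

    `learn(bound)` must return a smallest consistent program whose cost is
    below `bound`, or None if there is no such program.
    """
    best: Optional[P] = None
    bound = math.inf                  # iteration 1: no cost restriction
    while True:
        candidate = learn(bound)
        if candidate is None:
            return best               # no cheaper program exists
        best, bound = candidate, cost(candidate)


# Toy hypothesis space: (name, number of clauses, maximum cost); values are invented.
CANDIDATES = [("p1", 2, 720), ("p2", 3, 36), ("p3", 5, 16), ("p4", 6, 30)]


def toy_learn(bound):
    # Mimic iterative deepening on program size: try smaller programs first.
    for program in sorted(CANDIDATES, key=lambda p: p[1]):
        if program[2] < bound:
            return program
    return None


print(iterative_descent(toy_learn, cost=lambda p: p[2]))  # ('p3', 5, 16)
```

On the toy data the loop returns p1 (cost 720), then p2 (cost 36), then p3 (cost 16), and stops when no program cheaper than 16 exists; p4 is never returned because it is both larger and more expensive than p3.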
We now prove that Metaopt solves the cost minimisation problem (Definition 7), i.e. that Metaopt converges on minimal cost programs given sufficiently large numbers of examples:
Theorem 1
(Metaopt convergence) Assume E consists of m examples drawn from an enumeration of the infinite example space \(E'\). Without loss of generality consider the hypothesis space formed of two programs \(H_1\) and \(H_2\) such that \(H_1 \preceq _{\varPhi } H_2\) for an arbitrary cost function \(\varPhi \). Then for a sufficiently large value m, Metaopt will return \(H_1\) in preference to \(H_2\).
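Proof
Assume, for contradiction, that Metaopt returns \(H_2\) in preference to \(H_1\). Since Metaopt prefers programs with lower maximum cost and, among programs with equal maximum cost, programs with fewer clauses, one of the following must hold: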
 1.
\(max\_cost(\varPhi ,H_1,E) > max\_cost(\varPhi ,H_2,E)\)
 2.
\(max\_cost(\varPhi ,H_1,E) = max\_cost(\varPhi ,H_2,E)\) and \(size(H_1) > size(H_2)\)
 Case 1
With sufficiently large m there will exist an example e such that \(\varPhi (H_1,e) < \varPhi (H_2,e)\) and \(\varPhi (H_2,e) > \varPhi (H_2,e')\) for all other \(e'\) in E and \(\varPhi (H_1,e) > \varPhi (H_1,e')\) for all other \(e'\) in E. In this case \(max\_cost(\varPhi ,H_1,E) < max\_cost(\varPhi ,H_2,E)\) and Metaopt returns \(H_1\) which contradicts the assumption, so we discard this case.
 Case 2
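Metaopt performs iterative deepening search (IDS) on the number of clauses. From the optimality of IDS, Metaopt returns \(H_1\) which contradicts the assumption, so we discard this case.
Both cases lead to a contradiction, so Metaopt returns \(H_1\) in preference to \(H_2\). \(\square \)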
4.2 Program costs
Metaopt assumes a program cost function (Definition 1) as background knowledge. We now describe two cost function implementations.
4.2.1 Tree cost
4.2.2 Resource complexity
4.3 Metagol\(_O\)
In Experiments 2 and 3 we compare Metaopt with Metagol\(_O\). However, we cannot use the implementation from Cropper and Muggleton (2015) because Metagol\(_O\) requires that problems be represented as dyadic robot strategies (as detailed in Sect. 2). Therefore, we simulate Metagol\(_O\) in Metaopt by defining a program cost predicate in which all dyadic predicates have a uniform cost of 1, which is the assumption in Metagol\(_O\), and non-dyadic predicates have a cost of 0, because Metagol\(_O\) does not take these into account when calculating resource complexity.
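The uniform-cost assumption used to simulate Metagol\(_O\) amounts to a one-line cost function. This Python sketch assumes a hypothetical execution trace listing each successfully executed predicate as a (name, arity) pair; it is an illustration of the rule, not Metaopt's actual cost predicate.

```python
from typing import Iterable, Tuple


def metagol_o_cost(trace: Iterable[Tuple[str, int]]) -> int:
    """Each executed dyadic predicate costs 1; predicates of any other arity cost 0."""
    return sum(1 for _name, arity in trace if arity == 2)


trace = [("move_right", 2), ("is_space", 1), ("filter", 3), ("take_letter", 2)]
print(metagol_o_cost(trace))  # 2
print(metagol_o_cost([("mergesort", 2)]) == metagol_o_cost([("tail", 2)]))  # True
```

The second line shows the limitation noted in Sect. 2: under uniform costs, a call to mergesort/2 costs the same as a call to tail/2.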
5 Experiments
We now describe four experiments which test whether Metaopt learns minimal cost programs. We also compare Metaopt with Metagol and Metagol\(_O\), which minimise textual complexity and resource complexity respectively.
5.1 Experiment 1: convergence on minimal cost programs
 Null hypothesis 1

Metaopt cannot learn minimal cost programs without very large numbers of examples.
This experiment focuses on learning minimal tree cost programs, and thus minimal time-complexity programs.
Materials To refute null hypothesis 1, we must identify a minimal tree cost program in the hypothesis space, otherwise our hypothesis would be untestable. We provide Metaopt with background knowledge containing the ident, chain, and tailrec metarules (Table 1) and four predicates: mergesort/2, tail/2, head/2, and element/2 (Fig. 1). Given this background knowledge, the minimal tree cost program in the hypothesis space is shown in Fig. 2b. This minimal cost program has the following tree cost:
Proposition 1
(Find duplicate minimal cost program) Let n be the list length. Then the tree cost of the minimal tree cost program is \(O(n \log n)\).
Sketch proof 1
The minimal tree cost program first sorts the list using mergesort/2, which takes \(O(n \log n)\) steps, and then passes once through the sorted list checking whether any two adjacent elements are the same, which takes \(O(n)\) steps. Thus the overall cost is \(O(n \log n) + O(n) = O(n \log n)\). \(\square \)
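For intuition, here is a Python rendering of the strategy in the sketch proof: sort, then scan adjacent pairs. It is a functional illustration rather than the learned logic program, and Python's built-in `sorted` (an \(O(n \log n)\) sort) stands in for mergesort/2.

```python
from typing import List, Optional


def find_duplicate(xs: List[int]) -> Optional[int]:
    """Return an element occurring at least twice in xs, or None if all are distinct."""
    ys = sorted(xs)                   # O(n log n)
    for a, b in zip(ys, ys[1:]):      # one O(n) pass over adjacent pairs
        if a == b:
            return a
    return None


print(find_duplicate([3, 1, 4, 1, 5]))  # 1
print(find_duplicate([2, 7, 1]))        # None
```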
Positive examples are generated as follows:
 1.
Select a random integer k from the interval [5, 100] to represent the size of the input
 2.
Select a random integer j from the interval [1, k] to represent the duplicate element
 3.
Append j to the sequence \(1\dots k\) and randomly shuffle the resulting list to form \(s'\)
 4.
Form the atom \(f(s',j)\)
Negative examples are generated as follows:
 1.
Select a random integer k from the interval [5, 100] to represent the size of the input
 2.
Select a random integer j from the interval [1, k] to represent the false duplicate element
 3.
Randomly shuffle the sequence \(1\dots k\) to form \(s'\)
 4.
Form the atom \(f(s',j)\)
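The two generation procedures above can be sketched in Python as follows. Representing the atom \(f(s',j)\) as a tuple `("f", s, j)` is an assumption of the sketch; `random.Random.randint` includes both endpoints, matching the closed intervals in the steps.

```python
import random
from typing import List, Tuple

Example = Tuple[str, List[int], int]


def positive_example(rng: random.Random) -> Example:
    k = rng.randint(5, 100)           # size of the input
    j = rng.randint(1, k)             # the duplicated element
    s = list(range(1, k + 1)) + [j]   # 1..k with j appended
    rng.shuffle(s)
    return ("f", s, j)


def negative_example(rng: random.Random) -> Example:
    k = rng.randint(5, 100)
    j = rng.randint(1, k)             # a "false duplicate": it occurs exactly once
    s = list(range(1, k + 1))
    rng.shuffle(s)
    return ("f", s, j)


rng = random.Random(0)                # fixed seed for reproducibility
train = [positive_example(rng) for _ in range(5)] + [negative_example(rng) for _ in range(5)]
```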
Method We perform the following steps:
 1.
Generate m training examples, half positive and half negative
 2.
Generate 2000 testing examples, half positive and half negative
 3.
Learn a program p using the training examples with a timeout of 10 minutes.
 4.
Measure the tree cost and running time of p over the testing examples
5.2 Experiment 2: comparison with other systems
 Null hypothesis 2

Metaopt cannot learn programs with lower costs and lower running times than Metagol and Metagol\(_O\).
Materials We provide all three systems with the same background knowledge as in Experiment 1. We generate training examples in the same way as in Experiment 1. We generate testing examples using the same procedure but for fixed list sizes from the set \(\{1000,2000,\ldots ,10{,}000\}\) to measure tree costs as the input grows.
Method We perform the following steps:
 1.
Generate 20 training examples, half positive and half negative
 2.
Generate 100 testing examples, half positive and half negative
 3.
Learn a program p using the training examples with a timeout of 10 minutes.
 4.
Measure the tree cost and running time of p over the testing examples
5.3 Experiment 3: real-world string transformations
Examples for the p01 string transformation problem
Input  Output 

My name is John  John 
My name is Bill  Bill 
My name is Josh  Josh 
My name is Albert  Albert 
My name is Richard  Richard 
Materials We provide Metaopt, Metagol, and Metagol\(_O\) with the same background knowledge containing the curry and chain metarules (Table 1) and the predicates: is_letter/1, not_letter/1, is_uppercase/1, not_uppercase/1, is_number/1, not_number/1, is_space/1, not_space/1, tail/2, dropLast/2, reverse/2, filter/3, dropWhile/3, and takeWhile/3.
Method The dataset from Lin et al. (2014) contains five examples of each problem. We perform leave-two-out (keep-three-in) cross validation. We measure median program costs and running times over all trials. We set a timeout of 10 minutes.
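With five examples per problem, leave-two-out (keep-three-in) cross validation yields \(\binom{5}{3} = 10\) train/test splits, assuming every combination is used. A small Python sketch using `itertools.combinations`, applied to the p01 examples above:

```python
from itertools import combinations
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def keep_three_in_splits(examples: Sequence[T]) -> List[Tuple[List[T], List[T]]]:
    """Every split with three training examples and the remaining two held out."""
    splits = []
    for train_idx in combinations(range(len(examples)), 3):
        train = [examples[i] for i in train_idx]
        test = [examples[i] for i in range(len(examples)) if i not in train_idx]
        splits.append((train, test))
    return splits


p01 = [("My name is John", "John"), ("My name is Bill", "Bill"),
       ("My name is Josh", "Josh"), ("My name is Albert", "Albert"),
       ("My name is Richard", "Richard")]
print(len(keep_three_in_splits(p01)))  # 10
```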
5.4 Experiment 4: robot postman strategies
 Null hypothesis 3

Metaopt cannot learn programs with lower resource complexities than Metagol.
Materials Imagine a humanoid robot postman learning to collect and deliver letters in a d-sized one-dimensional space. In the initial state, the robot is at position 1 and n letters are to be collected. In the final state, the robot is at position 1 and n letters have been delivered to their intended destinations. The state is represented as a list of Prolog facts. The robot can perform primitive actions to transform the state: move_right/2, move_left/2, pick_up_left/2, pick_up_right/2, drop_left/2, drop_right/2, take_letter/2, bag_letter/2, and give_letter/2. All primitive actions have a cost of 1. The robot can also perform complex actions, which are defined in terms of primitive actions: find_next_sender/2, find_next_recipient/2, go_to_start/2, and go_to_end/2. The costs of complex actions are determined by their constituent primitive actions. The robot can take and carry a single letter from a sender using the action take_letter/2. Alternatively, the robot can take a letter from a sender and place it in a postbag using the action bag_letter/2, which allows the robot to carry multiple letters. We use the chain and tailrec metarules (Table 1).
Examples are generated as follows:
 1.
Select a random integer d from the interval [10, 25] representing the number of houses.
 2.
Select a random integer n from the interval [1, 5] representing the number of letters.
 3.
For each letter l, select random integers i and j from the interval [1, d] representing the letter’s start and end positions, such that \(i \ne j\)
 4.
Form an input state \(s_1=\)
$$\begin{aligned}&[pos(pman,0),energy(0),pos(l_1,i_1),\dots ,pos(l_n,i_n),\\&\quad letter(l_1,i_1,j_1),\dots ,letter(l_n,i_n,j_n)] \end{aligned}$$
 5.
Form an output state \(s_2=\)
$$\begin{aligned}&[pos(pman,\_),energy(\_),pos(l_1,j_1),\dots ,pos(l_n,j_n),\\&\quad letter(l_1,i_1,j_1),\dots ,letter(l_n,i_n,j_n)] \end{aligned}$$
 6.
Form an example \(f(s_1,s_2)\).
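Steps 1 to 6 can be sketched in Python as below, following the state layout given in steps 4 and 5. Facts are encoded as tuples, and `None` stands in for Prolog's anonymous variable `_` in the output state; both encodings are assumptions of the sketch.

```python
import random
from typing import List, Tuple

Fact = Tuple[object, ...]


def postman_example(rng: random.Random) -> Tuple[str, List[Fact], List[Fact]]:
    d = rng.randint(10, 25)                   # step 1: number of houses
    n = rng.randint(1, 5)                     # step 2: number of letters
    letters = []
    for k in range(1, n + 1):                 # step 3: distinct start i and end j
        i, j = rng.sample(range(1, d + 1), 2)
        letters.append((f"l{k}", i, j))
    # Step 4: input state.
    s1: List[Fact] = [("pos", "pman", 0), ("energy", 0)]
    s1 += [("pos", name, i) for name, i, _ in letters]
    s1 += [("letter", name, i, j) for name, i, j in letters]
    # Step 5: output state; None marks an unconstrained argument (Prolog's `_`).
    s2: List[Fact] = [("pos", "pman", None), ("energy", None)]
    s2 += [("pos", name, j) for name, _, j in letters]
    s2 += [("letter", name, i, j) for name, i, j in letters]
    return ("f", s1, s2)                      # step 6: the example f(s1, s2)
```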
Method We perform the following steps:
 1.
Generate 5 positive training and 100 positive testing examples
 2.
Learn a program p using the training examples with a timeout of 10 minutes.
 3.
Measure the resource complexity and running time of p over the testing examples
6 Conclusions and further work
We have introduced Metaopt, which extends MIL to support learning minimal cost logic programs. To find minimal cost programs, Metaopt uses iterative descent, which iteratively learns lower cost programs, each time further restricting the hypothesis space. We have shown (Theorem 1) that given sufficiently large numbers of examples, Metaopt converges on minimal cost programs, and that in practice (Experiment 1), only small numbers of examples are required. To learn minimal time-complexity programs, we introduced a cost function called tree cost (Definition 16), which is based on the size of a SLD-tree at the point at which a goal is proved by a logic program. Our experiments on the find duplicate problem show that Metaopt learns minimal cost programs given small numbers of examples. By contrast, Metagol and Metagol\(_O\) both learn non-minimal cost programs with longer running times. Our experiments also show that Metaopt learns programs with lower costs than existing systems on some real-world string transformation problems. Finally, our experiments on learning robot strategies show that Metaopt can simulate Metagol\(_O\) and learn minimal resource complexity robot strategies by treating resource complexity as a specific case of the cost minimisation problem.
6.1 Future work
Theorem 1 shows that Metaopt learns minimal cost programs given sufficient examples. The find duplicate experiment (Sect. 5.1) supported this result and showed that in practice only small numbers of examples (<20) are necessary. Future work should further test this result on other domains, such as learning from visual data or learning efficient teleoreactive programs (Nilsson 1994). In addition, in all of our experiments, we have assumed noise-free examples, which means that a learned program must be consistent with all examples. This assumption restricts MIL from being applied to noisy problems. To address this limitation, we could relax the requirement that a program must be consistent with all examples. One method, similar to the one used by Muggleton et al. (2018), is to repeatedly learn programs from random subsets of the examples, and to then calculate confidence levels of the learned programs based on the size of the subsets and the number of repetitions.
Other complexity measures We have introduced techniques to learn minimal worst-case complexity programs. However, we are often interested in finding minimal average-case complexity programs, and future work should explore this topic. To do so, the framework in Sect. 3 would need to be adjusted to accommodate a different function to measure the cost of a program over a set of examples (Definition 3), which would affect the proof of convergence of Metaopt (Theorem 1). In this case, it would be desirable to analyse how many examples Metaopt would need to converge on the optimal average-case program. Other promising areas of future work include investigating whether Metaopt can learn minimal space-complexity programs or minimal power-consumption programs.
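To see why the choice of aggregation function matters, the sketch below compares two hypothetical programs by their per-example costs. It assumes, in line with the worst-case framing above, that the current cost over a set of examples takes the maximum per-example cost; replacing the maximum with the mean gives an average-case criterion, and the two criteria can prefer different programs. The program names and costs are illustrative only.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

Aggregate = Callable[[Sequence[float]], float]


def worst_case_cost(costs: Sequence[float]) -> float:
    """Cost over a set of examples as the maximum per-example cost (assumed)."""
    return max(costs)


def average_case_cost(costs: Sequence[float]) -> float:
    """Alternative cost over a set of examples: the mean per-example cost."""
    return mean(costs)


def cheapest(per_example_costs: Dict[str, Sequence[float]], aggregate: Aggregate) -> str:
    """Name of the program with the lowest aggregated cost over the examples."""
    return min(per_example_costs, key=lambda name: aggregate(per_example_costs[name]))


# Hypothetical per-example costs for two programs on the same four examples.
COSTS: Dict[str, Sequence[float]] = {
    "A": [1, 1, 1, 10],   # usually cheap, one expensive example
    "B": [5, 5, 5, 5],    # uniformly moderate
}

print(cheapest(COSTS, worst_case_cost))    # B  (max 10 vs 5)
print(cheapest(COSTS, average_case_cost))  # A  (mean 3.25 vs 5)
```

Under the maximum, a single expensive example decides the ranking; under the mean, its effect is diluted by the other examples.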
Program complexity analysis Metaopt uses iterative descent to continually prune the hypothesis space of programs that are less efficient than already learned ones. However, this approach is inefficient when the first program found (in the first iteration of iterative descent) has a prohibitively high cost. For instance, suppose you are learning to sort lists and that the shortest program in the hypothesis space is permutation sort. Then in the first iteration of iterative descent, Metaopt would find permutation sort, which requires \(O(n!)\) time. If the examples are large, then this approach would be impractical. To overcome this issue, iterative descent could start with a low program cost bound and then iteratively relax this bound until the first program is found. Once a program has been found, iterative descent could then work as it does now and search for more efficient programs by continually restricting the hypothesis space. Alternatively, we could estimate the tree complexity of a program by approximating the SLD-tree size (Kilby and Slaney 2006).
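A minimal sketch of iterative descent, together with the proposed variant that first relaxes a low cost bound, is given below. Everything in it is a hypothetical stand-in: programs are names paired with a per-input-size cost function, the hypothesis space is a list in search order (shortest program first), and a program's cost is its worst cost over the examples. A real learner would abort executing a candidate as soon as it exceeded the bound, which is what makes a low starting bound worthwhile; the sketch simply computes costs directly.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in for a program: a name and its cost on input size n.
Program = Tuple[str, Callable[[int], int]]


def program_cost(prog: Program, examples: Sequence[int]) -> int:
    """Worst-case cost of a program over the examples (an assumption here)."""
    return max(prog[1](n) for n in examples)


def search(space: Sequence[Program], examples: Sequence[int], bound: float) -> Optional[Program]:
    """First program, in search order, whose cost is strictly below `bound`."""
    for prog in space:
        if program_cost(prog, examples) < bound:
            return prog
    return None


def iterative_descent(space: Sequence[Program], examples: Sequence[int],
                      initial_bound: float = math.inf) -> Optional[Program]:
    """With the default infinite bound this is plain iterative descent; a finite
    initial bound gives the variant that doubles the bound until a program is found."""
    if initial_bound <= 0:
        raise ValueError("initial_bound must be positive")
    if not space:
        return None
    bound = float(initial_bound)
    best = search(space, examples, bound)
    while best is None:                  # phase 1: relax the bound
        if math.isinf(bound):
            return None
        bound *= 2
        best = search(space, examples, bound)
    while True:                          # phase 2: restrict below the best cost so far
        better = search(space, examples, program_cost(best, examples))
        if better is None:
            return best
        best = better


# Hypothetical sorting programs, listed shortest first as a textual bias might order them.
SPACE: List[Program] = [
    ("permutation_sort", lambda n: math.factorial(n)),
    ("insertion_sort", lambda n: n * n),
    ("merge_sort", lambda n: n * n.bit_length()),  # rough n log n proxy
]
EXAMPLES = [4, 8]

print(iterative_descent(SPACE, EXAMPLES)[0])                     # merge_sort
print(iterative_descent(SPACE, EXAMPLES, initial_bound=1.0)[0])  # merge_sort
```

Both runs return the same program. With the default bound, the first program accepted is permutation sort; with a starting bound of 1, permutation sort is never accepted, and the search first succeeds at a bound of 64 with merge sort.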
Algorithm discovery We have used Metaopt to learn efficient programs, such as an efficient quicksort robot strategy and an efficient find duplicate program. However, although the learning techniques are novel, the learned programs are not: we have learned programs that were already known. In future work, we want to use Metaopt for algorithm discovery, where the goal is to learn programs that are both efficient and novel.
Footnotes
 1.
To aid readability, the Prolog code for the meta-interpreter is a slimmed-down version of the one used in the experiments.
 2.
 3.
All code and experimental data used in the experiments are available at https://github.com/andrewcropper/mlj18metaopt.
 4.
One could find the duplicate in \(O(n)\) time using a hash table, but this program is not in the hypothesis space and so could not be found by Metaopt.
References
Blum, L., & Blum, M. (1975). Toward a mathematical theory of inductive inference. Information and Control, 28(2), 125–155.
Cropper, A., & Muggleton, S. H. (2015). Learning efficient logical robot strategies involving composable objects. In IJCAI (pp. 3423–3429). AAAI Press.
Cropper, A., & Muggleton, S. H. (2016a). Learning higher-order logic programs through abstraction and invention. In IJCAI (pp. 1418–1424). IJCAI/AAAI Press.
Cropper, A., & Muggleton, S. H. (2016b). Metagol system. https://github.com/metagol/metagol.
Debray, S. K., López-García, P., Hermenegildo, M. V., & Lin, N.-W. (1997). Lower bound cost estimation for logic programs. In Logic programming. Proceedings of the 1997 international symposium (pp. 291–305), Port Jefferson, Long Island, NY, USA, October 13–16, 1997.
Eiter, T., Faber, W., Leone, N., Pfeifer, G., & Polleres, A. (2003). Answer set planning under action costs. Journal of Artificial Intelligence Research, 19, 25–71.
Gulwani, S. (2011). Automating string processing in spreadsheets using input–output examples. In Proceedings of the 38th ACM SIGPLAN-SIGACT symposium on principles of programming languages, POPL 2011 (pp. 317–330), Austin, TX, USA, January 26–28, 2011.
Gulwani, S., Hernández-Orallo, J., Kitzelmann, E., Muggleton, S. H., Schmid, U., & Zorn, B. G. (2015). Inductive programming meets the real world. Communications of the ACM, 58(11), 90–99.
Hoffmann, J., & Nebel, B. (2001). The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research, 14, 253–302.
Kant, E. (1983). On the efficient synthesis of efficient programs. Artificial Intelligence, 20(3), 253–305.
Kaplan, S. (1988). Algorithmic complexity of logic programs. In Logic programming, Proceedings of the fifth international conference and symposium (pp. 780–793), Seattle, Washington, August 15–19, 1988 (2 Volumes).
Kilby, P., Slaney, J. K., Thiébaux, S., & Walsh, T. (2006). Estimating search tree size. In AAAI (pp. 1014–1019). AAAI Press.
Laird, J. E. (2008). Extending the Soar cognitive architecture. Frontiers in Artificial Intelligence and Applications, 171, 224–235.
Law, M., Russo, A., & Broda, K. (2014). Inductive learning of answer set programs. In E. Fermé & J. Leite (Eds.), Logics in artificial intelligence (pp. 311–325). Berlin: Springer.
Levin, L. A. (1984). Randomness conservation inequalities; information and independence in mathematical theories. Information and Control, 61(1), 15–37.
Lin, D., Dechter, E., Ellis, K., Tenenbaum, J. B., & Muggleton, S. (2014). Bias reformulation for one-shot function induction. In ECAI, volume 263 of Frontiers in artificial intelligence and applications (pp. 525–530). IOS Press.
Manna, Z., & Waldinger, R. (1979). A deductive approach to program synthesis. In IJCAI (pp. 542–551). William Kaufmann.
Mitchell, T. M. (1997). Machine learning. McGraw-Hill series in computer science. New York: McGraw-Hill.
Moyle, S., & Muggleton, S. H. (1997). Learning programs in the event calculus. In N. Lavrač & S. Džeroski (Eds.), Proceedings of the seventh inductive logic programming workshop (ILP-97), LNAI 1297 (pp. 205–212). Berlin: Springer-Verlag.
Muggleton, S. H., Dai, W.-Z., Sammut, C., Tamaddoni-Nezhad, A., Wen, J., & Zhou, Z.-H. (2018). Meta-interpretive learning from noisy images. Machine Learning. https://doi.org/10.1007/s10994-018-5710-8.
Muggleton, S. (1995). Inverse entailment and Progol. New Generation Computing, 13(3&4), 245–286.
Muggleton, S., & Feng, C. (1990). Efficient induction of logic programs. In ALT (pp. 368–381).
Muggleton, S., Srinivasan, A., & Bain, M. (1992). Compression, significance, and accuracy. In D. H. Sleeman & P. Edwards (Eds.), Proceedings of the ninth international workshop on machine learning (ML 1992) (pp. 338–347), Aberdeen, Scotland, UK, July 1–3, 1992. Morgan Kaufmann.
Muggleton, S. H., Lin, D., Pahlavi, N., & Tamaddoni-Nezhad, A. (2014). Meta-interpretive learning: Application to grammatical inference. Machine Learning, 94(1), 25–49.
Muggleton, S. H., Lin, D., & Tamaddoni-Nezhad, A. (2015). Meta-interpretive learning of higher-order dyadic datalog: Predicate invention revisited. Machine Learning, 100(1), 49–73.
Nienhuys-Cheng, S.-H., & de Wolf, R. (1997). Foundations of inductive logic programming. New York: Springer.
Nilsson, N. J. (1994). Teleo-reactive programs for agent control. Journal of Artificial Intelligence Research (JAIR), 1, 139–158.
Otero, R. P. (2005). Induction of the indirect effects of actions by monotonic methods. In S. Kramer & B. Pfahringer (Eds.), Inductive logic programming. 15th international conference, ILP 2005. Proceedings, volume 3625 of Lecture notes in computer science (pp. 279–294), Bonn, Germany, August 10–13, 2005. Springer.
Pettorossi, A., & Proietti, M. (1994). Transformation of logic programs: Foundations and techniques. The Journal of Logic Programming, 19(20), 261–320.
Plotkin, G. D. (1969). A note on inductive generalisation. In B. Meltzer & D. Michie (Eds.), Machine Intelligence (Vol. 5, pp. 153–163). Edinburgh: Edinburgh University Press.
Plotkin, G. D. (1971). A further note on inductive generalization. In Machine intelligence (Vol. 6). Edinburgh: University Press.
Puterman, M. L. (2014). Markov decision processes: Discrete stochastic dynamic programming. Hoboken: Wiley.
Russell, S. J., & Norvig, P. (2010). Artificial intelligence: A modern approach (3rd ed.). New Jersey: Pearson.
Shapiro, E. Y. (1983). Algorithmic program debugging. Cambridge: MIT Press.
Sterling, L., & Shapiro, E. Y. (1994). The art of Prolog: Advanced programming techniques (2nd ed.). Cambridge: MIT Press.
Summers, P. D. (1977). A methodology for LISP program construction from examples. Journal of the ACM, 24(1), 161–175.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Adaptive computation and machine learning. Cambridge: MIT Press.
van Otterlo, M., & Wiering, M. (2012). Reinforcement learning and Markov decision processes. In M. Wiering & M. van Otterlo (Eds.), Reinforcement learning (pp. 3–42). Berlin: Springer.
Vera, S. (1975). Induction of concepts in the predicate calculus. In Advance papers of the fourth international joint conference on artificial intelligence (pp. 281–287), Tbilisi, Georgia, USSR, September 3–8, 1975.
Wielemaker, J., Schrijvers, T., Triska, M., & Lager, T. (2012). SWI-Prolog. Theory and Practice of Logic Programming, 12(1–2), 67–96.
Xing, Z., Chen, Y., & Zhang, W. (2006). Optimal STRIPS planning by maximum satisfiability and accumulative learning. In Proceedings of the international conference on autonomous planning and scheduling (ICAPS) (pp. 442–446).
Yang, F., Khandelwal, P., Leonetti, M., & Stone, P. (2014). Planning in answer set programming while learning action costs for mobile robots. In AAAI spring 2014 symposium on knowledge representation and reasoning in robotics (AAAI-SSS).
Zelle, J. M., & Mooney, R. J. (1993). Combining FOIL and EBG to speed up logic programs. In IJCAI (pp. 1106–1113). Morgan Kaufmann.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.