1 Introduction

General game playing (GGP) (Genesereth and Thielscher 2014) is a framework for evaluating an agent’s general intelligence across a wide variety of games. In the GGP competition, an agent is given the rules of a game that it has never seen before. The rules are described in a first-order logic-based language called the game description language (GDL) (Love et al. 2008). The rules specify the initial game state, what constitutes legal moves, how moves update the game state, and how the game terminates (Björnsson 2012). Before the game begins, the agent is given a few seconds of thinking time to process the rules and devise a game-specific strategy. The agent then starts playing the game, thus generating game traces. The winner of the competition is the agent with the best total score over all the games. Figure 1 shows six example GGP games. Figure 2 shows a selection of rules, written in GDL, for the game Rock Paper Scissors.

Fig. 1
figure 1

Sample GGP games described in clockwise order starting from the top left: Alquerque, Chinese Checkers, Eight Puzzle, Farming Quandries, Knights Tour, and Tic Tac Toe

Fig. 2
figure 2

A selection of rules for the game Rock Paper Scissors. The rules are written in the game description language, a variant of Datalog that is usually written in prefix notation. The relation (succ 0 1) means succ(0,1), i.e. 1 is the successor of 0. Variables begin with “?”. The relation (<= (next (step ?n)) (true (step ?m)) (succ ?m ?n)) can be rewritten in Prolog notation as next(step(N)):- true(step(M)),succ(M,N).

In this paper, we invert the GGP competition task: the learner (a machine learning system) is given game traces and the task is to induce (learn) the rules that could have produced the traces. In other words, the learner must learn the rules of a game by observing others play. This problem is a core part of inductive general game playing (IGGP) (Genesereth and Björnsson 2013), the task of jointly learning the rules of a game and playing the game successfully. We focus exclusively on the first task. Once the rules of the game have been learned then existing GGP techniques (Finnsson 2012; Koriche et al. 2016, 2017) can be used to play the games.

Figure 3 shows an example IGGP task, described as a logic program, for the game Rock Paper Scissors. In this task, a learner is given a set of ground atoms representing background knowledge (BK) and sets of disjoint ground atoms representing positive (\(E^+\)) and negative (\(E^-\)) examples of target concepts. The task is for the learner to induce a set of general rules (a logic program) that explains all of the positive but none of the negative examples. In this scenario, the examples are observations of the next_score and next_step predicates, and the task is to learn the rules for these predicates, such as the rules shown in Fig. 4.

Fig. 3
figure 3

An example learning task for the game Rock Paper Scissors. The input is a set of ground atoms representing background knowledge (BK) and sets of ground atoms representing positive (\(E^+\)) and negative (\(E^-\)) examples. In this task, the examples are observations of the next_score and next_step predicates. The task is to learn the rules for these predicates, such as the rules shown in Fig. 4

Fig. 4
figure 4

The GGP reference solution for the Rock Paper Scissors game described as a logic program. Note that the predicates draws, loses, and wins are not given as background knowledge and the learner must discover these

In this paper, we expand on the idea proposed by Genesereth and Björnsson (2013) and we introduce the IGGP problem (Sect.  3.2). Our main claim is that IGGP is difficult for existing inductive logic programming (ILP) techniques, and in Sect. 2 we outline the reasons why we think IGGP is difficult, such as the lack of task-specific language biases. To support our claim, we make three key contributions.

Our main contribution is a new IGGP dataset.Footnote 1 The dataset is based on game traces from 50 games from the GGP competition. The games vary across a number of dimensions, including the number of players (1–4), the number of spatial dimensions (0–2), the reward structure (whether the rewards are zero-sum, cooperative, or orthogonal), and complexity. Some of the games are turn-taking (Alquerque) while others (Rock Paper Scissors) are simultaneous. Some of the games are classic board games (Checkers and Hex); some are puzzles (Sokoban and Sudoku); some are dilemmas from game theory (Prisoner’s Dilemma and Chicken); others are simple implementations of classic video games (Centipede and Tron). Table 1 lists the 50 games and also shows, for each game, the number of dimensions, the number of players, and, as an estimate of the game’s complexity, the number of rules and literals in the GGP reference solution. Each game is described by four relational learning tasks (goal, next, legal, and terminal) with varying arities, although flattening the dataset to remove function symbols leads to more relations, as illustrated in Fig. 3, where the next predicate is flattened to the relations next_score/2 and next_step/2. For each game, we provide (1) training/validation/test data composed of sets of ground atoms in a 4:1:1 split, (2) a type signature file describing the arities of the predicates and the types of their arguments, and (3) a reference solution in GDL. It is important to note that we have not designed these games: the games were designed independently from our IGGP problem without this induction task in mind.

Our second contribution is a mechanism to continually expand the dataset. The GGP competition produces new games each year, which provides a continual rich source of challenges to the GGP participants. Our technical contribution allows us to easily add these new games to our dataset. We implemented an automatic procedure for producing a new learning task from a game. When a new game is added to the GGP competition, our system can read the GDL description, generate traces of sample play, and extract an IGGP task from those traces (see Sect. 4.3 for technical details). This automatic procedure means that our dataset can expand each year as new games are added to the GGP competition. We again stress that the GGP games were not designed with this induction task in mind. The games were designed to be challenging for GGP systems. Thus, this induction task is based on a challenging “real world” problem, not a task that was designed to be the appropriate level of difficulty for current ILP systems.

Table 1 The IGGP dataset. We list the number of rules (clauses) R, the number of literals L, the number of dimensions D, and the number of players P

Our third contribution is an empirical evaluation of existing ILP approaches, to test our claim that IGGP is difficult for current ILP approaches. We evaluate the classical ILP system Aleph (Srinivasan 2001) and the more recent systems ASPAL (Corapi et al. 2011), Metagol (Cropper and Muggleton 2016b), and ILASP (Law et al. 2014). Although non-exhaustive, these systems cover a breadth of ILP approaches and techniques. We also compare non-ILP approaches in the form of simple baselines and clustering (KNN) approaches. Table 2 summarises the results. Although some systems can solve some of the simpler games, most of the games cannot be solved by existing approaches. In terms of balanced accuracy (Sect. 6.1.1), the best performing system, ILASP, achieves 86%. However, in terms of our perfectly solved metric (Sect. 6.1.2), the best performing system, ILASP, achieves only 40%. Our empirical results suggest that our current IGGP dataset poses many challenges to existing ILP approaches. Furthermore, because of our second contribution, our dataset will continue to grow with the GGP competition, as new games are added every year. We therefore think that the IGGP problem and dataset will be valuable for motivating and evaluating future research.

Table 2 Results summary. The baseline represents accepting everything. The results show that all of the approaches struggle in terms of the perfectly solved metric (which represents how many tasks were solved with 100% accuracy)

The rest of the paper is organised as follows. Section 2 describes related work and further motivates this new problem and dataset. Section 3 describes the IGGP problem, GDL (the language in which GGP games are described), and how IGGP games are Markov games. Section 4 introduces a technique to produce an IGGP task from a GGP game and provides specific details on how we generated our initial IGGP dataset. Section 5 describes the baselines and ILP systems used in the evaluation of current ILP techniques. Section 6 details the results of the evaluation and also describes why IGGP is so challenging for existing approaches. Finally, Sect. 7 concludes the paper and details future work.

2 Related work

2.1 General game playing

As Björnsson states (Björnsson 2012), games have, since the inception of AI, played a significant role as a test-bed for advancing the field. Although the early focus was on developing general problem-solving approaches, the focus shifted towards developing problem-specific approaches, such as approaches to play chess (Campbell et al. 2002) or checkers (Schaeffer et al. 1996) very well. One motivation of the GGP competition is to reverse this shift, so as to encourage work on developing general AI approaches that can solve a variety of problems.

Our motivation for introducing the IGGP problem and dataset is similar. As we will discuss in the next section, there is much work in ILP on learning rules for specific games, or for specific patterns in games. However, there is little work on demonstrating general techniques for learning rules for a wide variety of games (i.e. the IGGP problem). We want to encourage such work by showing that current ILP systems struggle on this problem.

2.2 Inducing game rules

Inducing game rules has a long history in ILP, where chess has often been the focus. Bain (1994) studied inducing first-order Horn rules to determine the legality of moves in the chess KRK (king-rook-king) endgame, which is similar to the problem of learning the legal predicate in the IGGP games. Bain also studied inducing rules to optimally play the KRK endgame. Other works on chess include Goodacre (1996), Morales (1996), who induced rules to play the KRK endgame and rules to describe the fork pattern, and Muggleton et al. (2009).

Besides chess, Castillo and Wrobel (2003) used a top-down ILP system and active learning to induce a rule for when a square is safe in the game minesweeper. Law et al. (2014) used an ASP-based ILP approach to induce the rules for Sudoku and showed that this more expressive formalism allows for game rules to be expressed more compactly.

Kaiser (2012) learned the legal moves and the win condition (but not the state transition function) for a variety of board games (breakthrough, connect4, gomoku, pawn whopping, and tictactoe). The system represents game rules as formulas of first-order logic augmented with a transitive closure operator TC; it learns by enumerative search, starting with the guarded fragment before proceeding to full first-order logic with TC. Unusually, the system learns the game rules from videos of correct and incorrect play: before it can start learning the rules, it has to parse the video, converting a sequence of pixel arrays into a sequence of sets of ground atoms.

Relatedly, Grohe and Ritzert (2017) also use enumerative search, searching through the space of first-order formulas. They exploit Gaifman’s locality theorem to search through a restricted set of local formulas. They show, remarkably, that if the max degree of the Gaifman graph is polylogarithmic in the number n of objects, then the running time of their enumerative learning algorithm is also polylogarithmic in n. This intriguing result does not, however, suggest a practical algorithm as the constants involved are very large.

GRL (Gregory et al. 2015) builds on SGRL (Björnsson 2012) and LOCM (Cresswell et al. 2009) to learn game dynamics from traces. In these systems, the game dynamics are modelled as finite deterministic automata. They do not learn the legal predicate (determining which subset of the possible moves are available in the current state) or the goal predicate.

As is clear from these works, there is little work in ILP demonstrating general techniques for learning rules for a wide variety of games. This limitation partially motivates the introduction of the IGGP problem and dataset.

2.3 Existing datasets

One of our main contributions is the introduction of an IGGP dataset. In contrast to existing datasets, our dataset introduces many new challenges.

2.3.1 Size and diversity

Our dataset is larger and more diverse than most existing ILP datasets, especially those for learning game rules. Commonly used ILP datasets, such as kinship data (Hinton 1986), Michalski trains (Larson and Michalski 1977), Mutagenesis (Debnath et al. 1991), Carcinogenesis (Srinivasan et al. 1997), string transformations (Lin et al. 2014), and chess positions (Muggleton et al. 1989), typically contain a single predicate to be learned, such as eastbound/1 or westbound/1 in the Michalski trains dataset or active/1 in the Mutagenesis dataset. By contrast, our dataset contains 50 distinct games, each described by at least four target predicates, where flattening leads to more relations, as illustrated in Fig. 3. In addition, whereas some datasets use only dyadic concepts, such as kinship or string transformations, our dataset also requires learning programs with a mixture of predicate arities, such as input_jump/8 in Checkers and next_cell/4 in Sudoku. Learning programs with high-arity predicates is a challenge for some ILP approaches (Cropper and Muggleton 2016b; Kaminski et al. 2018; Evans and Grefenstette 2018). Moreover, because of our second main contribution, we can continually and automatically expand the dataset as new games are introduced into the GGP competition. Therefore, our IGGP dataset will continue to expand to include more games.

2.3.2 Inductive bias

Our IGGP games come from the GGP competition. As stated in the introduction, the games were not designed with this induction task in mind. One key challenge posed by the IGGP problem is the lack of inductive bias provided. Most existing work on inducing game rules has assumed as input a set of high-level concepts. For instance, Morales (1996) assumed as input a predicate to determine when a chess piece is in check. Likewise, Law et al. (2014) assumed high-level concepts such as same_row/2 and same_col/2 as background knowledge when learning whether a Sudoku board was valid. Moreover, most existing ILP work on learning game rules (and learning in general) involves the system designers choosing an appropriate representation of the problem for their system. By contrast, in our IGGP problem the representation is fixed: it is the GDL representation provided by the GGP competition.

Many existing ILP techniques assume a task-specific language bias that defines a hypothesis space containing at least one correct representation of the target concept. When available, language biases are extremely useful because a smaller hypothesis space can mean that the ILP systems need fewer examples and fewer computational resources. In many practical situations, however, task-specific language biases are either not available or are extremely wide, as very little is known about the structure of the target concept.

In our IGGP dataset we provide only the simplest (most primitive) low-level concepts, which come directly from the GGP competition, i.e. our IGGP dataset does not provide any task-specific language biases. For each game, the only language bias given is the type schema of each predicate in the language of the background knowledge. For instance, in Sudoku the higher-level concepts of same row and same col are not given. Likewise, to learn the terminal predicate in Connect Four, a learner must learn the concept of a line, which in turn requires learning rules for vertical, horizontal, and diagonal lines. This means that for an approach to solve the IGGP problem in general (and to be able to accept future games without changing its method), it must be able to learn without a game-specific bias, or be able to generate this game-specific bias from the type schemas in the task. In addition, a learner must learn concepts from only primitive low-level background predicates, such as cell(X,Y,Filled). If these high-level concepts are reusable, then it would be advantageous to perform predicate invention, which has long been a key challenge in ILP (Muggleton et al. 2012, 2014). Popular ILP systems, such as FOIL (Ross Quinlan 1990) and Progol (Muggleton 1995), do not support predicate invention, and although recent work (Inoue et al. 2013; Muggleton et al. 2015; Cropper and Muggleton 2016a) has tackled this challenge, predicate invention is still a difficult problem.

2.3.3 Large programs

Many reference solutions for IGGP games are large, both in the number of clauses and in the number of literals. For instance, the GGP reference solution for the goal predicate for Connect Four uses 14 clauses and a total of 72 literals. This solution uses predicate invention to essentially compress the solution, where the auxiliary predicates include the concept of a line, which in turn uses the auxiliary predicates for the concepts of columns, rows, and diagonals. If we unfold the reference solution so as to remove auxiliary predicates, then the total number of literals required to learn a solution for this single predicate easily exceeds 400. However, learning large programs is a challenge for most ILP systems (Cropper 2017), which typically struggle to learn programs with hundreds of clauses or literals.

2.3.4 ILP2016 competition

The work most similar to ours is the ILP 2016 competition (Law et al. 2016). The ILP 2016 competition was based on a single type of task (with various hand-crafted target hypotheses) aimed at learning the valid moves of an agent as it moved through a grid. In some ways this is similar to our legal tasks, although many tasks required learning invented predicates representing changes in state, similar to our next tasks. By contrast, our IGGP problem and dataset are based on a variety of real games, which we did not design. Furthermore, the ILP 2016 dataset provides restricted inductive biases to aid the ILP systems, whereas we (deliberately) do not give such help.

2.4 Model learning

AlphaZero (Silver et al. 2017) has shown the power of combining tree search with a deep neural network for distilling a search policy into a neural net. But this technique presupposes that we have been given a model of the game dynamics: we must already know the state transition function and the reward function. Suppose we want to extend AlphaZero-style techniques to domains where we are not given an explicit model of the environment. We would need some way of learning a model of the environment from traces. Ideally, we would like to learn data-efficiently, without needing hundreds of thousands of traces.

Model-free reinforcement learning agents have high sample complexity: they often require millions of episodes before they can learn a reasonable policy. Model-based agents, by contrast, are able to use their understanding of the dynamics of the environment to learn much more efficiently (Džeroski et al. 2001; Duff and Barto 2002; Guez et al. 2012). Whether, and to what extent, model-based methods are more sample efficient than model-free methods depends on the complexity of the particular MDP. Sometimes, in simple environments, one needs less data to learn a policy than to learn a model. It has also been shown that, for Q-learning, the worst-case asymptotics for model-based and model-free methods are the same (Kearns and Singh 1999). But these qualifications do not, of course, undermine the claim that in complex environments that require anticipation or planning, a model-based agent will be significantly more sample-efficient than its model-free counterpart.

The IGGP dataset was designed to test an agent’s ability to learn a model that can be useful in planning. The most successful GGP algorithms, e.g. Cadiaplayer (Finnsson 2012), Sancho (Koriche et al. 2016), and WoodStock (Koriche et al. 2017), use Monte Carlo Tree Search (MCTS). MCTS relies on an accurate forward model of the Markov decision process. The further into the future we search, the more important it is that our forward model is accurate, as errors compound. To avoid having to give our MCTS agents a hand-coded model of the game dynamics, they must be able to learn an accurate model of the dynamics from a handful of behaviour traces.

Two things make the IGGP dataset an appealing task for model learning. First, hundreds of games have already been designed for the GGP competition, with more being added each year. Second, each game comes with ‘ground truth’: a set of rules that completely describe the game. From these rules, we know the learning problem is solvable, and we have a good measure of how hard it is (by measuring the complexity of the ground-truth programFootnote 2).

3 IGGP dataset

In this section, we describe the Game Description Language (GDL) in which GGP games are described, the IGGP problem setting, and finally an illustrative example of a typical IGGP task.

3.1 Game description language

GGP games are described using GDL. This language describes the state of a game as a set of facts and the game mechanics as logical rules. GDL is a variant of Datalog with two syntactic extensions (stratified negation and restricted function symbols) and with a small set of distinguished predicates that have a special meaning (Love et al. 2008) (shown in Fig. 5).

The first syntactic extension is stratified negation. Standard Datalog (lacking negation altogether) has the useful property that there is a unique minimal model (Dantsin et al. 2001). If we add unrestricted negation, we lose this attractive property: now there can be multiple distinct minimal models. To maintain the property of having a unique minimal model, GDL adds a restricted form of negation called stratified negation (Apt et al. 1988). The dependency graph of a set of rules is formed by creating an edge from predicate p to predicate q whenever there is a rule whose head is \(p(\ldots )\) and that contains an atom \(q(\ldots )\) in the body. The edge is labelled with a negation if the body atom is negated. A set of rules is stratified if the dependency graph contains no cycle that includes a negated edge.
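As a rough illustration of this check, the following Python sketch builds the dependency graph for a small hypothetical rule set and tests whether any cycle passes through a negated edge. The rule encoding and the example program are assumptions made for illustration; they are not part of GDL or of the tooling described in this paper.

```python
# Minimal sketch of the stratification check (illustrative only).
# A rule is encoded as (head_predicate, [(body_predicate, is_negated), ...]).
from collections import defaultdict, deque

def is_stratified(rules):
    """True iff no dependency cycle passes through a negated edge."""
    edges = defaultdict(set)      # head predicate -> body predicates it depends on
    negated_edges = []            # (p, q) edges labelled with negation
    for head, body in rules:
        for pred, is_neg in body:
            edges[head].add(pred)
            if is_neg:
                negated_edges.append((head, pred))

    def reaches(src, dst):        # BFS reachability in the dependency graph
        seen, queue = {src}, deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                return True
            for nxt in edges[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        return False

    # A cycle through the negated edge p -> q exists iff q can reach p.
    return not any(reaches(q, p) for p, q in negated_edges)

# p(X) :- q(X), not r(X).    r(X) :- p(X).    -- not stratified
print(is_stratified([("p", [("q", False), ("r", True)]),
                     ("r", [("p", False)])]))       # False
```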

GDL’s second syntactic extension to Datalog is restricted function symbols. The Herbrand base of a standard Datalog program is always finite. If we add unrestricted function symbols, the Herbrand base can be infinite. To maintain the property of having a finite Herbrand base, GDL restricts the use of function symbols in recursive rules (Love et al. 2008).

The two syntactic extensions of GDL, stratified negation and restricted function symbols, extend the expressive power of Datalog without sacrificing its key attractive property: there is always a single, finite minimal model (Love et al. 2008).

Fig. 5
figure 5

Main predicates in GDL where variables begin with a “?” symbol

Fig. 6
figure 6

In this Fizz Buzz scenario the learner is given four positive examples of the legal_say/2 predicate and many negative examples. This predicate represents what legal moves a player can make in the game. The column H shows the reference GGP solution described as a logic program. In Fizz Buzz, the player can always make three legal moves in any state, saying fizz, buzz, or fizzbuzz. The player can additionally say the current number (the counter)

3.2 Problem setting

We now define the IGGP problem. Our problem setting is based on the ILP learning from entailment setting (De Raedt 2008), where an example corresponds to an observation about the truth or falsity of a formula F and a hypothesis H covers F if H entails F. We assume languages of background knowledge \({\mathscr {B}}\) and examples \({\mathscr {E}}\), each formed of function-free ground atoms. The atoms are function-free because we flatten the GDL atoms. For example, in Fig. 6, the atom true(count(9)) has been flattened into true_count(p9). We flatten atoms because some ILP systems do not support function symbols. We likewise assume a language of hypotheses \({\mathscr {H}}\) formed of Datalog programs with stratified negation. Stratified negation is not strictly necessary, but in practice it allows significantly more concise programs, and thus often makes the learning task computationally easier. Note that GDL also supports recursion, but in practice most GGP games do not use it. In future work we intend to contribute recursive games to the GGP competition.
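To make the flattening step mentioned above concrete, the following Python sketch shows one way such a transformation could work. The atom encoding and the constant-renaming convention (prefixing numbers with “p”, mirroring true_count(p9) in Fig. 6) are assumptions for illustration, not the exact procedure used to produce the dataset.

```python
# Minimal sketch of flattening a GDL atom into a function-free atom.
# An atom is (predicate, [args]); a nested function symbol is a (name, [args]) tuple.
def flatten(pred, args):
    names, constants = [pred], []
    for arg in args:
        if isinstance(arg, tuple):            # one level of nesting, e.g. count(9)
            inner_name, inner_args = arg
            names.append(inner_name)
            constants.extend(inner_args)
        else:
            constants.append(arg)
    constants = ["p%s" % c if isinstance(c, int) else c for c in constants]
    return "_".join(names), constants

print(flatten("true", [("count", [9])]))      # ('true_count', ['p9'])
print(flatten("does", ["p1", "paper"]))       # ('does', ['p1', 'paper'])
```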

We now define the IGGP input:

Definition 1

(IGGP input) An IGGP input \(\Delta \) is a set of m triples \(\{(B_i,E^+_i,E^-_i)\}^m_{i=1}\) where

  • \(B_i \subset {\mathscr {B}}\) represents background knowledge

  • \(E_i^+\subseteq {\mathscr {E}}\) and \(E_i^-\subseteq {\mathscr {E}}\) represent positive and negative examples respectively.

An IGGP input forms the IGGP problem:

Definition 2

(IGGP problem) Given an IGGP input \(\Delta \), the IGGP problem is to return a hypothesis \(H \in {\mathscr {H}}\) such that \(\text {for all} \;\; (B_i,E^+_i,E^-_i) \in \Delta \) it holds that \(H \cup B_i \models E^+_i\) and \(H \cup B_i \not \models E_i^-\).

Note that a single hypothesis should be consistent with all given triples.
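The following Python sketch spells out this coverage condition. A hypothesis is represented as a function from a set of background atoms to the set of atoms entailed together with that background; in practice this stand-in would be an evaluation of the learned Datalog program, so the encoding here is an assumption for illustration only.

```python
# Minimal sketch of Definition 2: H solves the task iff, for every triple,
# it entails all positive examples and no negative examples.
def solves(hypothesis, triples):
    for bk, pos, neg in triples:
        entailed = hypothesis(bk)
        if not pos <= entailed or entailed & neg:
            return False
    return True

# Hypothetical hypothesis corresponding to the rule p(X) :- q(X).
hyp = lambda bk: {("p", x) for (pred, x) in bk if pred == "q"}
triples = [({("q", "a")}, {("p", "a")}, {("p", "b")})]
print(solves(hyp, triples))                   # True
```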

3.2.1 Illustrating example: Fizz Buzz

To give the reader an intuition for the IGGP problem and the GGP games, we now describe example scenarios for the game Fizz Buzz. Although typically a multi-player game, in our IGGP dataset Fizz Buzz is a single-player game. The aim of the game is for the player to replace any number divisible by three with the word fizz, any number divisible by five with the word buzz, and any number divisible by both three and five with fizzbuzz. For example, a game of Fizz Buzz up to the number 17 would go: 1, 2, fizz, 4, buzz, fizz, 7, 8, fizz, buzz, 11, fizz, 13, 14, fizzbuzz, 16, 17.

Fig. 7
figure 7

In this Fizz Buzz scenario, the learner is given one positive example of the next_count/1 predicate, one positive example of the next_success/1 predicate, and many negative examples of both predicates. These predicates represent the change of game state. The column H shows the reference GGP solution described as a logic program, which is not necessarily the most textually compact solution. The next_count/1 relation represents the count in the game. This relation has a single-clause, two-literal definition, which says that the count increases by one after each step in the game. The next_success/1 relation requires two clauses with many literals. This relation counts how many times a player says the correct output. The reference GGP solution for this relation includes the correct/0 predicate, which is not provided as BK but which is reused in both clauses of next_success/1. For an ILP system to learn the reference solution it would need to invent this predicate. Also note that this solution uses negation in the body, including the negation of the invented predicate correct/0

Figures 6, 7, 8, and 9 show example IGGP problems and solutions for the target predicates legal, next, goal, and terminal respectively. For simplicity each example is a single \((B,E^+,E^-)\) triple, although in the dataset each learning task is often a set of multiple triples, where a single hypothesis should explain all the triples. In all cases the BK shown in Fig. 10 holds, so we omit it from the individual examples for brevity. Note that the game only runs to the number 31.

Fig. 8
figure 8

In this Fizz Buzz scenario the learner is given one example of the goal/2 predicate and four negative examples. This predicate represents the reward for a move. In Fizz Buzz the reward is based on the value of true_success/1. The column H shows the reference GGP solution described as a logic program. The reference solution requires five clauses, which means that it would be difficult for ILP systems that only support learning single-clause programs (Muggleton 1995; Ross Quinlan 1990)

Fig. 9
figure 9

In this Fizz Buzz scenario the learner is given a single negative example of the terminal/0 predicate. This predicate indicates when the game has finished. In this scenario the game has not terminated. In the dataset the Fizz Buzz game runs until the count is 31, so the learner must learn a rule such as the one shown in column H

Fig. 10
figure 10

Common BK for Fizz Buzz

4 Generating the GGP dataset

In this section, we describe our procedure to automatically generate IGGP tasks from GGP game descriptions. We first explain how GGP games fit inside the framework of multi-agent Markov decision processes. We also explain the need for a type-signature for each game.

4.1 Preliminaries: Markov games

GGP games are Markov games (Littman 1994), a strict superset of multi-agent Markov decision processes (MDPs) that allows simultaneous moves.Footnote 3 The four components \((S, A, T, R)\) of an MDP are:

  • S is a finite set of states

  • A is a finite set of actions

  • T is a transition function \(T: S \times A \rightarrow S\)

  • R is a reward function

We describe these elements in turn for a GGP game.

4.1.1 States

Each state \(s \in S\) is a set of ground atoms representing fluents (propositions whose truth-value can change from one state to another). The true predicate indicates which fluents are true in the current state. For instance, one state of a best-of-three game of Rock Paper Scissors is:

$$\begin{aligned}&\texttt {true(score(p1,0)).} \\&\texttt {true(score(p2,2)).} \\&\texttt {true(step(2)).} \\ \end{aligned}$$

This state represents that the current score is 0 to 2 in favour of player p2, and 2 time-steps have been performed.

4.1.2 Actions

Each action \(a \in A\) is a set of ground atoms representing a joint action for agents 1..n. The does predicate indicates which agents perform which actions. For instance, one joint action for Rock Paper Scissors is:

$$\begin{aligned}&\texttt {does(p1,paper).} \\&\texttt {does(p2,stone).} \\ \end{aligned}$$

4.1.3 Transition function

In a stochastic MDP, the transition function T has the signature \(T : S \times A \times S \rightarrow \{0,1\}\). By contrast, in a deterministic MDP, such as a GGP game, the transition function is \(T : S \times A \rightarrow S\). Given a current state s and a set of actions a, the next predicate indicates which fluents are true in the (unique) next state \(s'\). For instance, in Rock Paper Scissors, given the current state s and actions a above, the next state \(s'\) is:

$$\begin{aligned}&\texttt {next(score(p1,1)).} \\&\texttt {next(score(p2,2)).} \\&\texttt {next(step(3)).} \\ \end{aligned}$$

The transition function is a set of definite clauses defining next in terms of true. For instance, the following two clauses define part of the transition function for Rock Paper Scissors:

figure a

4.1.4 Reward function

In a continuous multi-agent MDP, the reward function has the signatureFootnote 4\(R : S \rightarrow {\mathbb {R}}^n\). In a discrete MDP, such as a GGP game, we assume a small fixed set of k discrete rewards \(\{r_1,\dots ,r_k\}\), where \(r_i\) is not necessarily numeric. Let G[i] be the set of atoms representing that player i has one of the k rewards \(G[i] = \{ goal(i, r_j) \mid j = 1 .. k \}\). Let \(G = G[1] \times \cdots \times G[n]\) be the joint rewards for agents 1..n. In our GGP dataset, the reward function has the signature \(R : S \rightarrow G\). Note that, in this framework, learning the reward function becomes a classification problem rather than a regression problem. For example, in the Rock Paper Scissors state above, the reward for state \(s'\) depends only on the score and is:

$$\begin{aligned}&\texttt {goal(p1,1).} \\&\texttt {goal(p2,2).} \\ \end{aligned}$$

4.1.5 Legal

In the GGP framework, actions are sometimes unavailable. It is not that all possible actions from A can be performed but some of them have no effect; rather, only a subset of the actions is available in a particular state.

The legal function L determines which actions are available in which states: \(L : S \rightarrow 2^A\). Recall that an element of A is not an individual action performed by a single player, but rather a joint action: a set of simultaneous actions, one for each player. For example, one element of A is \(\{\texttt {does(p1,paper).} , \texttt {does(p2,stone).} \}\). Note that the availability of an action for one agent does not depend on what other actions are being performed concurrently by other agents; it depends only on the current state.

4.1.6 Terminal

The GDL language contains a distinguished predicate, the nullary terminal predicate, that indicates when an episode has terminated (i.e. when the game is over).

4.2 Preliminaries: the type-signature for a GGP game

In order to calculate the complete set of ground atoms for a game,Footnote 5 we use a type signature \(\Sigma \). The type signature defines the types of constants, functions, and predicates used in the GDL description. Our type signatures include a simple subtyping mechanism for inclusion polymorphism. For example:

figure b

In this example, true and next are predicates, at is a function that takes an (x, y) coordinate and a cell type and returns a fluent (prop). A cell is either blank or one of the agents. The expression agent :> cell means that an agent is a subtype of cell.

Let \(\sqsubseteq \) be the reflexive transitive closure of :> . Let \(\Sigma (f)\) be the type assigned to element f by signature \(\Sigma \). Then \(f(k_1,\ldots , k_n)\) is a well-formed term of type t if:

  • \(\Sigma (f) = (t_1,\ldots , t_n) \rightarrow t\)

  • \(\Sigma (k_i) \sqsubseteq t_i\) for all \(i = 1\ldots n\)

Predicates are functions that return a bool and constants are functions with no arguments. For example, using the type signature above, true(at(3, 4, black)) is a well-formed term of type bool, i.e. a well-formed ground atom.
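The following Python sketch implements this well-formedness check for the example signature above. The encoding of the signature, and the use of int as the coordinate type, are assumptions made for illustration (the actual signature syntax is shown in the figure above).

```python
# Minimal sketch of the well-formedness check for typed ground terms.
SUBTYPES = {("agent", "cell")}        # agent :> cell: an agent is a subtype of cell

def subtype_of(t1, t2):
    """The reflexive transitive closure of :> ."""
    return t1 == t2 or any(a == t1 and subtype_of(b, t2) for a, b in SUBTYPES)

SIG = {                               # symbol: (argument types, result type)
    "true":  (("prop",), "bool"),
    "next":  (("prop",), "bool"),
    "at":    (("int", "int", "cell"), "prop"),
    "black": ((), "agent"),
    "blank": ((), "cell"),
    3: ((), "int"), 4: ((), "int"),
}

def type_of(term):
    """The type of a well-formed term, or None if the term is ill-formed."""
    if not isinstance(term, tuple):                       # a constant
        arg_types, result = SIG[term]
        return result if not arg_types else None
    f, args = term[0], term[1:]
    arg_types, result = SIG[f]
    if len(args) == len(arg_types) and all(
            subtype_of(type_of(a), t) for a, t in zip(args, arg_types)):
        return result
    return None

# true(at(3, 4, black)) is a well-formed ground atom
print(type_of(("true", ("at", 3, 4, "black"))))           # bool
```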

4.3 Automatically generating induction tasks for a GGP game

Given a GGP game \(\Gamma \) written in GDL, and a type signature \(\Sigma \) for that game, our system automatically generates an IGGP induction task. Before presenting the details, we summarise the general approach. To generate the GGP dataset, we built a simple forward-chaining GDL interpreter. We used the GDL interpreter to calculate the initial state, the currently valid moves, the transition function, and the reward. When generating traces, we first calculate the actions that are currently available for each player. Then we let each player choose an action uniformly at random. We record the state trace \((s_1,\ldots , s_n)\), and extract a set of \((B_i, E^+_i, E^-_i)\) triples from each trace. The target predicates we wish to learn are legal, next, goal, and terminal. The \((B_i, E^+_i, E^-_i)\) triples for the predicates legal, goal, and terminal are calculated from a single state, while the triples for next are calculated from a pair of consecutive states \((s_i, s_{i+1})\).

We generated multiple traces for each game: 1000 episodes with a maximum of 100 time-steps. However, we chose these numbers somewhat arbitrarily because there is a complex tradeoff in how much data to generate. We want to generate enough data to capture the diversity of a game, so that a learner can (in theory) learn the correct game rules. However, we do not want to generate so much data that we provide every game state, as this would mean that a learner would not need to learn anything, and could instead simply memorise game situations. We also do not want to generate so much data that it becomes expensive to compute or store. It is, however, unclear where the boundary is between too little and too much data. Whether such a boundary even exists is unclear because, by imposing different biases, different learners may need more or less information on the same task. In future work we would like to expand the dataset. We then intend to repeat the experiments with different amounts of training data.

figure c

Our approach is presented in Algorithm 1. This procedure generates a number of traces. Each trace is a sequence of game states, and each game state is represented by a set of ground atoms. We use the \( extract \) function (described in Sect. 4.3.1) to produce a set of \((B_i,E^+_i,E^-_i)\) triples from a trace. We add this set of triples to \(\Lambda \). At the end, when we have finished all the traces, we return \(\Lambda \), the set of triples. The variable s stores the current state (a set of ground atoms). Initially, s is set to the initial state: \( initial (\Gamma )\) produces the initial state from the GDL description. Then for each time-step, we calculate the next state via \( next (\Gamma , s)\). This function \( next (\Gamma , s)\) involves three steps. First, we calculate the available actions for each player. Second, we let each player take a uniformly random move. Third, we use the transition function T to calculate the next state from the current state s and the actions of the players. Once we have calculated the new state, we append it to the end of t. Here, t is a trace, i.e. a sequence of states. Then we check if the new state is terminal. If it is terminal, we finish the episode; otherwise, we continue for another time-step. Once the episode is finished, we extract the set of \((B_i,E^+_i,E^-_i)\) triples from the sequence of states, and continue to the next trace. Note that we need the type signature \(\Sigma \) to extract the triples from the trace, but we do not need it to generate the trace itself. For our experiments, we generated 1000 traces for each game, and ran each trace for a maximum of 100 time-steps.
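The following Python sketch mirrors the procedure just described. The helper functions initial, legal_actions, transition, is_terminal, and extract stand in for the forward-chaining GDL interpreter and the extraction step of Sect. 4.3.1; their names and the state encoding are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of Algorithm 1: generate traces of random play and
# extract (B, E+, E-) triples from each trace.
import random

def generate_tasks(game, signature, num_traces=1000, max_steps=100):
    tasks = []                                      # the set of extracted triples
    for _ in range(num_traces):
        state = initial(game)                       # initial state: a set of ground atoms
        trace = [state]
        for _ in range(max_steps):
            actions = {player: random.choice(list(moves))     # uniformly random move per player
                       for player, moves in legal_actions(game, state).items()}
            state = transition(game, state, actions)          # the transition function T
            trace.append(state)
            if is_terminal(game, state):
                break
        tasks.extend(extract(trace, signature))     # Sect. 4.3.1
    return tasks
```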

4.3.1 The \( extract \) function

The \( extract (t, \Sigma )\) function in Algorithm 1 takes a trace \(t = (s_1,\ldots , s_n)\) (a sequence of sets of ground atoms), and a type signature \(\Sigma \) and produces a set of \((B_i,E^+_i,E^-_i)\) triples. This set of triples represents a set of induction tasks for the distinguished predicates \( legal \), \( goal \), \( terminal \), and \( next \). It is defined as:

$$\begin{aligned} extract ((s_1,\ldots , s_n), \Sigma ) = \Lambda _1 \cup \Lambda _2 \cup \Lambda _3 \cup \Lambda _4 \end{aligned}$$

where:

$$\begin{aligned} \Lambda _1= & {} \{ triple _1(s_i, legal , \Sigma ) \mid i = 1 .. n\} \\ \Lambda _2= & {} \{ triple _1(s_i, goal , \Sigma ) \mid i = 1 .. n\} \\ \Lambda _3= & {} \{ triple _1(s_i, terminal , \Sigma ) \mid i = 1 .. n\} \\ \Lambda _4= & {} \{ triple _2(s_i, s_{i+1}, \Sigma ) \mid i = 1 .. n-1\} \end{aligned}$$

Before we define the \( triple _1\) and \( triple _2\) functions, we introduce the relevant notation. If s is a set of ground atoms and p is a predicate, let \(s_p\) be the subset of atoms in s that use the predicate p. If \(\Sigma \) is a type signature and p is a predicate, then \( ground (\Sigma , p)\) is the set of all ground atoms generated by \(\Sigma \) that use predicate p. Given this notation, we define \( triple _1(s, p, \Sigma ) = (B, E^+, E^-)\) where:

$$\begin{aligned}&B = s - s_p \\&E^+ = s_p \\&E^- = ground (\Sigma , p) - E^+ \end{aligned}$$

To calculate the negative instances \({E}^-_i\), we use the closed-world assumption: all p-atoms not known to be true in \(E^+\) are assumed to be false in \(E^-\). Given a type signature \(\Sigma \), we generate the set \( ground (\Sigma , p)\) of all possible ground atoms whose predicate is the distinguished predicate p. For example, in a one player game, if \( ground (\Sigma , legal ) = \{\)legal(p1, up), legal(p1, down), legal(p1, left), and legal(p1, right)\(\}\), and \(s_ legal \) only contains legal(p1, up) and legal(p1, down), then:

$$\begin{aligned} {E}^+_i= & {} \{{\texttt {legal(p1, up)}}, {\texttt {legal(p1, down)}}\} \\ {E}^-_i= & {} ground (\Sigma , legal ) - {E}^+_i = \{{\texttt {legal(p1, left)}}, {\texttt {legal(p1, right)}}\} \end{aligned}$$

We define \( triple _2(s_i, s_{i+1}, \Sigma ) = (B, E^+, E^-)\) where:

$$\begin{aligned}&B = s_i \\&E^+ = s_{i+1} [ true / next ] \\&E^- = ground (\Sigma , next ) - E^+ \end{aligned}$$

When learning \( next \), we use the facts at the earlier time-step \(s_i\) as background facts, we use the facts at the later time-step \(s_{i+1}\) as the positive facts \(E^+\) to be learned (with the predicate \( true \) replaced by \( next \)), and we use all the rest of the ground atoms involving \( next \) as the negative facts \(E^-\). Note, again, the use of the closed-world assumption: we assume all \( next \) atoms not known to be in \(E^+\) to be in \(E^-\).
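The following Python sketch follows these definitions directly. Ground atoms are encoded as (predicate, arguments) tuples, states are sets of such atoms, and ground_atoms(signature, p) stands in for \( ground (\Sigma , p)\); this encoding is an assumption for illustration.

```python
# Minimal sketch of extract, triple_1, and triple_2 (illustrative encoding).
def atoms_with(state, p):
    """The subset s_p of atoms in state s whose predicate is p."""
    return {a for a in state if a[0] == p}

def triple_1(state, p, signature):
    pos = atoms_with(state, p)
    return (state - pos, pos, ground_atoms(signature, p) - pos)

def triple_2(state, next_state, signature):
    # E+ is the next state's fluents with the predicate true renamed to next.
    pos = {("next", args) for (_, args) in atoms_with(next_state, "true")}
    return (state, pos, ground_atoms(signature, "next") - pos)

def extract(trace, signature):
    tasks = []
    for i, state in enumerate(trace):
        for p in ("legal", "goal", "terminal"):
            tasks.append(triple_1(state, p, signature))
        if i + 1 < len(trace):
            tasks.append(triple_2(state, trace[i + 1], signature))
    return tasks
```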

5 Baselines and ILP systems

We claim that IGGP is challenging for existing ILP approaches. To support this claim we evaluate existing ILP systems on our IGGP dataset. We compare the ILP systems against simple baselines. We first describe the baselines and then each ILP system.

5.1 Baselines

Figure 11 shows the four baselines. Each baseline is a Boolean function \(f :2^{{\mathscr {B}}} \times {\mathscr {E}} \rightarrow \{\top ,\bot \}\), i.e. a function that takes background knowledge and an example and returns true (\(\top \)) or false \((\bot )\). We describe these baselines in detail.

Fig. 11
figure 11

Baselines where \(\Delta = \{(B_i,E^+_i,E^-_i)\}^m_{i=1}\) represents training data. The syntax a[next/true] means to replace the predicate symbol next with true in the atom a

Our first two baselines ignore the training data:

  • True deems that every atom is true:

    $$\begin{aligned} True(B,a) = \top \end{aligned}$$
  • Inertia is the same as True for atoms with the target predicates goal, legal, and terminal, but for the next predicate an atom is true if and only if the corresponding true atom is in B. For instance, the atom next(at(1,4,x)) is true if and only if true(at(1,4,x)) is in B:

    $$\begin{aligned} Inertia(B,a) = a[next/true] \in B \end{aligned}$$

    The intuition behind this baseline is the empirical observation that, in most of the games, most ground atoms retain their truth value from one time-step to the next. Of course, it is possible to design games in which most or all of the atoms change their truth value each time-step; but in typical games, such radical changes are unusual.

Our next two baselines consider the training data \(\Delta = \{(B_i,E^+_i,E^-_i)\}^m_{i=1}\):

  • Mean deems that a testing atom a is true if and only if a appears in at least half of the positive training example sets:

    $$\begin{aligned} Mean(B, a) = |\{(B_i, E^+_i,E^-_i) \in \Delta \mid a \in E^+_i\}| \ge \frac{|\Delta |}{2} \end{aligned}$$
  • KNN\(_k\) is based on clustering the data. In \(KNN_k(B,a)\) we find the k triples in \(\Delta \), denoted as \(\kappa _k(\Delta , B)\), whose backgrounds are most ‘similar’ to the background B. To assess the similarity of two sets A and B of ground atoms, we look at the size of the symmetric differenceFootnote 6 between A and B:

    $$\begin{aligned} d(A, B) = |A - B| + |B - A| \end{aligned}$$

    It is straightforward to show that the d function satisfies the conditions for a distance metric:

    • \(d(A, B) \ge 0\)

    • \(d(A, B) = d(B, A)\)

    • \(d(A, B) = 0\) iff \(A = B\)

    • \(d(A, C) \le d(A, B) + d(B, C)\)

    We set the closest k triples \(\kappa _k(\Delta , B)\) to be the k triples \(\{(B_i,E^+_i,E^-_i)\}^k_{i=1}\) with the smallest d distance between \(B_i\) and B. Given the k closest triples \(\kappa _k(\Delta , B)\) the KNN baseline outputs \(\top \) if a appears in \(E^{+'}\) in at least half of the closest k triples. More formally:

    $$\begin{aligned} KNN_k(B, a) = |\{ (B', E^{+'}, E^{-'}) \in \kappa _k(\Delta , B) \mid a \in E^{+'}\}| \ge \frac{k}{2} \end{aligned}$$

One potential limitation of the KNN approach is that, in contrast to the ILP approaches, the KNN approaches learn at the propositional level and are unable to learn general first-order rules. To illustrate this limitation, suppose we are trying to learn the target predicate p/1 given the background predicate q/1 and that the underlying target rule is \(p(X) \leftarrow q(X)\). Suppose there are only two training triples of the form \((B,E^+,E^-)\):

$$\begin{aligned} T_1= & {} (\{q(a)\}, \{ p(a) \}, \{ p(b), p(c) \})\\ T_2= & {} (\{q(b)\}, \{ p(b) \}, \{ p(a), p(c) \}) \end{aligned}$$

Given the test triple \((\{ q(c) \}, \{ p(c) \}, \{ p(a), p(b) \})\), a KNN approach will deem that p(c) is false because it has not seen a positive instance of this particular ground atom and has no representational resources for generalising.
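The following Python sketch gives one possible implementation of the four baselines in Fig. 11. Atoms are encoded as (predicate, arguments) tuples with flattened predicate names (e.g. next_score), and the training data is a list of (B, E+, E-) triples; this encoding and the helper names are assumptions for illustration.

```python
# Minimal sketch of the True, Inertia, Mean, and KNN_k baselines.
def true_baseline(bk, atom):
    return True

def inertia_baseline(bk, atom):
    pred, args = atom
    if pred.startswith("next"):                     # a[next/true] in B
        return ("true" + pred[len("next"):], args) in bk
    return True

def mean_baseline(train, atom):
    # True iff the atom appears in at least half of the positive training sets.
    return sum(atom in pos for _, pos, _ in train) >= len(train) / 2

def knn_baseline(train, bk, atom, k=1):
    def dist(a, b):                                 # size of the symmetric difference
        return len(a ^ b)
    nearest = sorted(train, key=lambda t: dist(t[0], bk))[:k]
    return sum(atom in pos for _, pos, _ in nearest) >= k / 2
```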

5.2 ILP systems

We evaluate four ILP systems on our dataset. It is important to note that we are not trying to directly compare the ILP systems, or to demonstrate that any particular ILP system is better than another. We are instead trying to show that the IGGP problem is challenging for existing systems, and that it (and the dataset) will provide a challenging problem for evaluating future research. Indeed, a direct comparison of ILP systems is often difficult (Cropper 2017), largely because different systems excel at certain classes of problems. For instance, directly comparing the Prolog-based Metagol against ASP-based systems, such as ILASP and HEXMIL (Kaminski et al. 2018), is difficult because Metagol is often used to learn recursive list manipulation programs, including string transformations and sorting algorithms (Cropper and Muggleton 2019). By contrast, many ASP solvers disallow explicit lists, such as the popular Clingo system (Gebser et al. 2014), and thus a direct comparison is difficult. Likewise, ASP-based systems can be used to learn non-deterministic specifications represented through choice rules and preferences modelled as weak constraints (Law et al. 2018), which is not necessarily the case for Prolog-based systems. In addition, because many of the systems have learning parameters, it is often possible to show that there exist some parameter settings for which system X performs better than system Y on a particular dataset. Therefore, the relative performances of the systems should largely be ignored.

We compare the ILP systems Aleph, ASPAL, Metagol, and ILASP. We describe these systems in turn.

5.2.1 Aleph

Aleph is an ILP system written in Prolog based on Progol (Muggleton 1995). Aleph uses the following procedure to induce a logic program hypothesis (paraphrased from the Aleph websiteFootnote 7):

  1. 1.

    Select an example to be generalised. If none exist, stop, otherwise proceed to the next step.

  2. 2.

    Construct the most specific clause (also known as the bottom clause (Muggleton 1995)) that entails the selected example and is within the language restrictions provided.

  3. 3.

    Search for a clause more general than the bottom clause. This step is done by searching for some subset of the literals in the bottom clause that has the ‘best’ score.

  4. 4.

    The clause with the best score is added to the current theory and all the examples made redundant are removed. Return to step 1.

To restrict the hypothesis space (mainly at step 2), Aleph uses both mode declarations (Muggleton 1995) and determinations to denote how and when a literal can appear in a clause. In the mode language, modeh declarations are for head literals and modeb declarations are for body literals. An example modeb declaration is modeb(2,mult(+int,+int,-int)). The first argument of a mode declaration is an integer denoting how often a literal may appear in a clause. The second argument denotes that the literal mult/3 may appear in the body of a clause and specifies the types of its arguments. The symbols \(+\) and − denote whether the arguments are input or output arguments respectively. Determinations declare which predicates can be used to construct a hypothesis and are of the form determination(TargetName/Arity,BackgroundName/Arity). The first argument is the name and arity of the target predicate. The second argument is the name and arity of a predicate that can appear in the body of such clauses. Typically there will be many determination declarations for a target predicate, corresponding to the predicates thought to be relevant in constructing hypotheses. If no determinations are present, Aleph does not construct any clauses.

Aleph assumes that modes will be declared by the user. For the IGGP tasks this is quite a burden because it requires that we create them for each game, and it also requires some knowledge of the target hypothesis we want to learn. Fortunately, Aleph can extract mode declarations from determinations, and determinations are straightforward to supply: for each target predicate we simply supply a determination for each background predicate. Therefore, for each game, we allow Aleph to use all the predicates available for that game as determinations and allow Aleph to induce the necessary mode declarations.

There are many parameters in Aleph which greatly influence the output, such as parameters that change the search strategy when generalising a bottom clause (step 3) and parameters that change the structure of learnable programs (such as limiting the number of literals in the bottom clause). We use Aleph 5 with YAP 6.2.2 (Costa et al. 2012), keeping the default parameters throughout. Therefore, there will most likely exist some parameter settings for which Aleph performs better than the results we present.

5.2.2 ASPAL

ASPAL (Corapi et al. 2011) is a system for brave induction under the answer set programming (ASP) (Lifschitz 2008) semantics. Brave induction systems aim to find a hypothesis H such that there is at least one answer set of \(B\cup H\) that covers the examples.Footnote 8

ASPAL works by transforming a brave induction task T into a meta-level ASP program \({\mathscr {M}}(T)\) such that the answer sets of \({\mathscr {M}}(T)\) correspond to the inductive solutions of T. The first step of state-of-the-art ASP solvers, such as clingo (Gebser et al. 2011), is to compute the grounding of the program. Systems which follow this approach therefore have scalability issues with respect to the size of the hypothesis space, as every ground instance of every rule in the hypothesis space—i.e. the ground instances of every rule that has the potential to be learned—is computed when the ASP solver solves \({\mathscr {M}}(T)\).

Similarly to Aleph, ASPAL has several input parameters, which influence the size of the hypothesis space, such as the maximum number of body literals. For most of these, we used the default value, but we increased the maximum number of body literals from 3 to 5 and the maximum number of rules in the hypothesis space from 3 to 15. Our initial experiments showed that the maximum number of rules had very little effect on the feasibility of the ASPAL approach (as the size of the grounding of \({\mathscr {M}}(T)\) is unaffected by this change), whereas the maximum number of body literals can make a significant difference to the size of the grounding of \({\mathscr {M}}(T)\). It is possible that there is a set of parameters for ASPAL that performs better than those we have chosen.

Predicate invention is supported in ASPAL by allowing new predicates (which do not occur in the rest of the task) to appear in the mode declarations. This predicate invention is prescriptive rather than automatic, as the schema of the new predicates (i.e. the arity and argument types) must be specified in the mode declarations. As it is unclear in this problem setting how to guess the structure of the predicates that should be invented, we did not allow ASPAL to use predicate invention on this dataset. It should be noted that when programs are stratified, hypotheses containing predicate invention can always be translated into equivalent hypotheses with no predicate invention. Of course, as such hypotheses may be significantly longer than the compact hypotheses which are possible through predicate invention, they may require more examples to be learned accurately by ASPAL.

Similarly, although ASPAL does enable learning recursive hypotheses, we did not permit recursion in these experiments. Recursive hypotheses can also be translated into non-recursive hypotheses over finite domains. Our initial experiments using ASPAL showed that in addition to increasing the size of the hypothesis space, allowing recursion also significantly increased the grounding of ASPAL’s meta program, \({\mathscr {M}}(T)\).

5.2.3 Metagol

Metagol (Muggleton et al. 2015; Cropper and Muggleton 2016a, b) is an ILP system based on a Prolog meta-interpreter. The key difference between Metagol and a standard Prolog meta-interpreter is that whereas a standard Prolog meta-interpreter attempts to prove a goal by repeatedly fetching first-order clauses whose heads unify with a given goal, Metagol additionally attempts to prove a goal by fetching higher-order metarules (Fig. 12), supplied as background knowledge, whose heads unify with the goal. The resulting meta-substitutions are saved and can be reused in later proofs. Following the proof of a set of goals, Metagol forms a logic program by projecting the meta-substitutions onto their corresponding metarules. Metagol is notable for its support for (non-prescriptive) predicate invention and learning recursive programs.

Metarules define the structure of learnable programs, which in turn defines the hypothesis space. Deciding which metarules to use for a given task is an unsolved problem (Cropper 2017; Cropper and Tourret 2019). To compute the benchmark, we set Metagol to use the same metarules for all games and tasks. This set is composed of 9 derivationally irreducible metarules (Cropper and Tourret 2018, 2019), a set of metarules to allow for constants in a program, and a set of nullary metarules (to learn the terminal predicates). Full details on the metarules used can be found in the code repository.

For each game, we allow Metagol to use all the predicates available for that game. We also allow Metagol to support a primitive form of negation by additionally using the negation of predicates. For instance, in Firesheep we allow Metagol to use the rule not_does_kill(A,B) :- not(does_kill(A,B)). To allow Metagol to induce a program given all \((B_i,E^+_i,E^-_i)\) triples, we prefix each atom with an extra argument to denote which triple each atom belongs to. For instance, in the first minimal even triple, the atom does_choose(player,1) becomes does_choose(triple1,player,1), and in the second triple the same atom becomes does_choose(triple2,player,1). To account for this extra argument, we also add an extra argument to each literal in a metarule. For instance, the ident metarule becomes \(P(I,A) \leftarrow Q(I,A)\) and the chain metarule becomes \(P(I,A,B) \leftarrow Q(I,A,C), R(I,C,B)\).

We use Metagol 2.2.3 with YAP 6.2.2.

Fig. 12
figure 12

Example metarules. The letters P, Q, R denote existentially quantified variables. The letters A, B, and C denote universally quantified variables

5.2.4 ILASP

ILASP (Inductive Learning of Answer Set Programs) (Law et al. 2014, 2015a, b) is a collection of ILP systems, which are capable of learning ASP programs consisting of normal rules, choice rules, hard and weak constraints. Unlike many other ILP approaches, ILASP guarantees the computation of an optimal inductive solution (where optimality is defined in terms of the length of a hypothesis). Similarly to ASPAL, early ILASP systems, such as ILASP1 (Law et al. 2014) and ILASP2 (Law et al. 2015b), work by representing an ILP task (i.e. every example and every rule in the hypothesis space) as a meta-level ASP program whose optimal answer sets correspond to the optimal inductive solutions of the task. The ILASP systems each target learning unstratified ASP programs with normal rules, choice rules and both hard and weak constraints. Therefore, the stratified normal logic programs which are targeted in this paper do not require the full generality of ILASP; in fact, on this dataset, the meta-level ASP programs used by both ILASP1 and ILASP2 are isomorphic to the meta-level program used by ASPAL.

ILASP2i (Law et al. 2016) addresses the scalability with respect to the number of examples by iteratively computing a subset of the examples, called relevant examples, and only representing the relevant examples in the ASP program. In each iteration, ILASP2i uses ILASP2 to find a hypothesis H that covers the set of relevant examples and then searches for a new relevant example which is not covered by H. When no further relevant examples exist, the computed H is guaranteed to be an optimal inductive solution of the full task.

Although ILASP2i significantly improves on the scalability of ILASP1 and ILASP2 with respect to the number of examples, on tasks with large hypothesis spaces ILASP2i still suffers from the same grounding bottleneck as ASPAL, ILASP1, and ILASP2. As the size of the hypothesis space is one of the major challenges of the dataset in this paper, ILASP2i would likely not perform significantly better than ASPAL. To scale the ILASP framework up to the GGP dataset, we used an extended version of ILASP2i that, in each iteration, computes a relevant hypothesis space using the type signature and the current set of relevant examples, and then uses ILASP2 to solve a learning task restricted to the current relevant examples and relevant hypothesis space. Throughout the rest of the paper, we refer to this extended ILASP algorithm as \(\hbox {ILASP}^{*}\). Specifically, rules that entail negative examples or do not cover at least one relevant positive example are omitted from the relevant hypothesis space. A rule is also omitted if there is another rule that is shorter and covers the same (or more) relevant positive examples. Similarly to ASPAL, \(\hbox {ILASP}^{*}\) takes a parameter for the maximum number of literals in the body of a rule. Our preliminary experiments showed that the method for computing the relevant hypothesis space performed best with this parameter set to 5, so this value was used in the experiments.
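
The omission criteria just described can be sketched as follows; this is an illustration of the filtering conditions rather than the \(\hbox {ILASP}^{*}\) source, with covers/2 and rule_length/2 as assumed black boxes.

% Illustrative sketch: keep a candidate rule only if it entails no negative
% example, covers at least one relevant positive example, and is not
% dominated by a shorter rule covering at least the same relevant positives.
keep_rule(Rule, Pos, Neg, OtherRules) :-
    \+ (member(N, Neg), covers(Rule, N)),
    once((member(P, Pos), covers(Rule, P))),
    \+ dominated(Rule, Pos, OtherRules).

dominated(Rule, Pos, OtherRules) :-
    member(Other, OtherRules),
    rule_length(Other, LenOther),
    rule_length(Rule, LenRule),
    LenOther < LenRule,
    forall((member(P, Pos), covers(Rule, P)), covers(Other, P)).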

The construction of a relevant hypothesis space was made significantly easier by forbidding recursion and predicate invention in \(\hbox {ILASP}^{*}\). Although the standard ILASP algorithms do support recursion and (prescriptive) predicate invention, these two features mean that the usefulness of a rule in covering examples cannot be evaluated independently of the rest of the hypothesis, which makes constructing the relevant hypothesis space much more challenging. In future work, we hope to generalise the method of relevant hypothesis space construction to relax these two constraints.

6 Results

We now describe the results of running the baselines and ILP systems on our dataset. All the experimental data is available at https://github.com/andrewcropper/mlj19-iggp. When running the ILP systems, we allowed each system the same amount of time (30 minutes) to learn each target predicate.

6.1 Evaluation metrics

We use two evaluation metrics: balanced accuracy and perfectly solved.

6.1.1 Balanced accuracy

In our dataset the majority of examples are negative. To account for this class imbalance, we use balanced accuracy (Brodersen et al. 2010) to evaluate the approaches. Given background knowledge B, disjoint sets of positive \(E^+\) and negative \(E^-\) testing examples, and a logic program H, we define the number of positive examples as \(p=|E^+|\), the number of negative examples as \(n=|E^-|\), the number of true positives as \(tp=|\{e \in E^+ | B \cup H \models e\}|\), the number of true negatives as \(tn=|\{e \in E^- | B \cup H \not \models e\}|\), and the balanced accuracy \(ba = (tp/p + tn/n)/2\).
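For illustration, with \(p=10\) and \(n=90\), a hypothesis that rejects every example achieves a standard accuracy of \(90/100 = 0.9\) but a balanced accuracy of only \((0/10 + 90/90)/2 = 0.5\).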

6.1.2 Perfectly solved

We also consider a perfectly solved metric, which is the number (or percentage) of tasks that an approach solves with 100% accuracy. The perfectly solved metric is important in IGGP because we know that every game has at least one perfect solution: the GDL description from which the traces were generated is a perfectly accurate model of the deterministic MDP. Perfect accuracy is important because even slightly inaccurate models compound their errors as the game progresses.

6.2 Results summary

Table 3 summarises the results and shows, for each approach, the balanced accuracy and the percentage of perfectly solved tasks. The full results are in the “Appendix”. As the results show, the ILP and KNN approaches perform better than the simple baselines (True, Inertia, and Mean). In terms of balanced accuracy, the KNN approaches often perform better than the ILP systems. However, in terms of the important perfectly solved metric, the ILP methods easily outperform the baselines and the KNN approaches. The most successful system, \(\hbox {ILASP}^{*}\), perfectly solves 40% of the tasks. It should be noted that 4% of test cases have no positive instances in either the training set or the test set, meaning that a perfect score can be achieved with the empty hypothesis. Each of the ILP systems achieved a perfect score on these tasks. Without these trivial cases, the perfectly solved score of each system would be even lower.

Table 3 Results summary. The baseline represents accepting everything. The results show that all of the approaches struggle in terms of the perfectly solved metric (which represents how many tasks were solved with 100% accuracy)
Table 4 Balanced accuracy results for each target predicate
Table 5 Perfectly solved percentage for each target predicate
Table 6 Balanced accuracies for the next target predicate for the alphabetically first ten games

As Table 4 shows, in terms of balanced accuracy, the most difficult task is the terminal predicate, although the margin between the predicates is small. As Table 5 shows, in terms of the important perfectly solved metric, the most difficult task is the next predicate. The mean percentage of perfectly solved tasks is a mere 3%; even if we exclude the baselines and only consider the ILP systems, the mean is still only 10%. Table 6 shows the balanced accuracies for the next predicate on the alphabetically first ten games. This predicate corresponds to the state transition function (Sect. 4.1). The next atoms are the most difficult to learn: there is only one of the first ten games, Buttons and Lights, for which any of the methods finds a perfect solution. The next predicate is the most difficult to learn because it has the highest mean complexity in terms of the number of dependent predicates in the dependency graph (Sect. 3.1) in the reference GDL game definitions.

In the following sections we analyse the results for each system and discuss the relative limitations of the respective systems on this dataset.

6.2.1 KNN

As Table 3 shows, the KNN approaches perform well in terms of balanced accuracy but poorly in terms of perfectly solved. Note that \(\hbox {KNN}_1\) occasionally scores higher than \(\hbox {KNN}_5\), which is to be expected because looking at additional triples sometimes gives misleading information. As already mentioned, the KNN approaches learn at the propositional level. This limitation is evident in the results, which show that the \(\hbox {KNN}_1\) and \(\hbox {KNN}_5\) approaches only perform well when the target predicate can be learned by memorising particular atoms. For some of the simpler games (e.g. Coins), the KNN approach is often able to learn the goal predicate because the reward can be extracted directly from the value of an internal state variable representing the score. Similarly, the KNN approach sometimes learns the legal predicate when the set of legal actions is static and does not depend on the current state. But the KNN approach is not able to perfectly learn any of the next rules for any of the games in our dataset. In addition, the KNN approaches are expensive to compute: obtaining these results took three days on a machine with an Intel Xeon 3.6 GHz CPU (6 cores), 62 GB of RAM, and a 425 GB hard drive.

6.2.2 Aleph

As Table 3 shows, Aleph performs reasonably well, and outperforms most of the baselines in terms of the perfectly solved metric. However, after inspecting the learned programs, we found that Aleph rarely learned general rules for the games, and instead typically learned facts to explain the specific examples. In other words, on this task, Aleph tends to learn overly specific programs. There are several potential explanations for this limitation. First, as we stated in Sect. 5.2.1, we did not provide mode declarations to Aleph, and instead allowed Aleph to infer them from the determinations. Second, we ran Aleph with its default parameters. However, as stated in Sect. 5.2.1, Aleph has many learning parameters which greatly influence learning performance, and it is reasonable to assume that Aleph could perform better with a different set of parameters. Third, to learn a program Aleph must first construct the most specific clause (the bottom clause) that entails an example. However, constructing the bottom clause requires time exponential in the depth of variables in the target theory (Muggleton 1995), so learning large and complex clauses is intractable.
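
As a toy illustration of a bottom clause (our own example, not one from the dataset): given the example grandparent(ann,carl), background facts parent(ann,bob) and parent(bob,carl), and suitable mode declarations, the variabilised bottom clause would be

% the most specific clause, built from the background knowledge,
% that entails the single example grandparent(ann,carl)
grandparent(A, C) :- parent(A, B), parent(B, C).

Aleph then searches the space of clauses that subsume this bottom clause, so the deeper the chains of variables needed, the larger the bottom clause and the harder the search.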

6.2.3 ASPAL

As Table 3 shows, ASPAL performs quite poorly on this dataset. It is outperformed by the mean baseline, both in terms of the perfectly solved metric and the average balanced accuracy. ASPAL timed out on the majority of the test problems; the timeouts were caused by the size of the hypothesis space, and therefore of the grounding of ASPAL’s meta-level ASP program. It is possible that with different parameters to control the size of the hypothesis space, or a different representation of the problem with a smaller grounding, ASPAL could perform better.

The results of ASPAL also help to explain the need for a specialised version of the ILASP algorithm for this dataset. On this constrained problem domain, where we only aim to learn stratified programs (which are guaranteed to have a single answer set), ILASP2 and ASPAL are almost identical in their approaches: both map the input ILP task into a meta-level ASP program and use the Clingo ASP solver to find an optimal answer set, corresponding to an optimal inductive solution of the input task. The specialised \(\hbox {ILASP}^*\) algorithm presented in Sect. 5.2.4 can overcome this grounding problem in some cases by reducing the size of the hypothesis space being considered, and thus the size of the grounding of the meta-level program. In principle, this specialisation (along with ILASP2i’s relevant example method) could be applied to ASPAL, to create an \(\hbox {ASPAL}^*\), which would likely perform better.

6.2.4 Metagol

Although Metagol outperforms the baselines in terms of the perfectly solved metric (34%), it is outperformed in terms of balanced accuracy.

One of the main limitations of Metagol on this dataset is that it only returns a program if that program covers all of the positive examples and none of the negative examples. However, in some of the games, Metagol could learn a single simple rule that explains 99% of the training examples (and perhaps 99% of the testing examples) but may need an additional complex rule to cover the remaining 1%. If this extra rule is too complex to learn, then Metagol will not learn anything. To explore this limitation, we ran a modified version of Metagol that relaxes this constraint: the modified version simply samples the training examples rather than learning from all of them. This stochastic version of Metagol improved the balanced accuracy from 69% to 76%. In future work we intend to develop more sophisticated versions of stochastic Metagol.
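
As a rough sketch (ours, not the modified system's code), the sampling wrapper amounts to something like the following, assuming SWI-Prolog's random_permutation/2 and Metagol's usual learn/3 entry point:

% Illustrative example-sampling wrapper around Metagol (hypothetical).
:- use_module(library(random)).   % for random_permutation/2 (SWI-Prolog)

take(0, _, []) :- !.
take(_, [], []) :- !.
take(K, [X|Xs], [X|Ys]) :- K1 is K - 1, take(K1, Xs, Ys).

% Sample at most K positive and K negative examples, then call Metagol's
% learn/3 (assumed interface: learn(+Pos, +Neg, -Program)).
sample_and_learn(Pos, Neg, K, Prog) :-
    random_permutation(Pos, ShuffledPos),
    take(K, ShuffledPos, SampledPos),
    random_permutation(Neg, ShuffledNeg),
    take(K, ShuffledNeg, SampledNeg),
    learn(SampledPos, SampledNeg, Prog).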

Metagol can generalise from few examples because of the strong inductive bias enforced by the metarules. However, this strong bias is also a key reason why Metagol struggles to learn programs for many of the games. Given insufficient metarules, Metagol cannot induce the target program. For instance, given only monadic metarules, Metagol can only learn monadic programs. Although there is work studying which metarules to use for monadic and dyadic logics (Cropper and Muggleton 2014; Cropper and Tourret 2018, 2019), there is no work on determining which metarules to use for higher-arity logics. Therefore, when computing the benchmarks, Metagol could not learn some of the higher-arity target predicates, such as the next_cell/4 predicate in Sudoku. Similarly, Metagol often could not use higher-arity background predicates, such as does_move/5 and triplet/6 in Alquerque.

Another issue with the metarules is that, as described in Sect. 5.2.3, we used the same set of metarules for all games. This approach is inefficient because in almost all cases it meant using irrelevant metarules, which added unnecessary search to the learning task. We expect that a simple preprocessing step to remove unusable metarules would improve learning performance, although probably not by any considerable margin.

Another reason why Metagol fails to solve certain games is that, as with most ILP systems, it struggles to learn large and complex programs. For Metagol the bottleneck is the size of the target program, because the search space grows exponentially with the number of clauses in the target program (Cropper and Tourret 2019). Although there is work on mitigating this issue (Cropper and Muggleton 2016a), developing approaches that can learn large and complex programs is a major challenge for MIL and ILP in general (Cropper 2017).

6.2.5 \(\hbox {ILASP}^{*}\)

The system with the highest percentage of perfectly solved tasks (see Table 3) is \(\hbox {ILASP}^{*}\), which perfectly solves 40% of the tasks. In most of the cases where \(\hbox {ILASP}^{*}\) terminated with a solution within the time limit of 30 minutes, a perfect solution was returned. On the rare occasions that \(\hbox {ILASP}^{*}\) terminated but learned an imperfect solution, the hypothesis did cover the training examples but performed imperfectly on the test set. For example, the terminal training set for Untwisty Corridor contains no positive examples, meaning that \(\hbox {ILASP}^{*}\) returns the empty hypothesis (which is consistent with all of the negative examples); however, there is a positive instance of terminal in the test set, meaning that \(\hbox {ILASP}^{*}\) (and every other approach) scores a balanced accuracy of 50% on this problem.

In some cases, the restriction on the number of body literals meant that the task had no solutions. In these unsatisfiable cases, \(\hbox {ILASP}^{*}\) returned the hypothesis from the last satisfiable iteration. In principle, the maximum number of body literals could have been iteratively increased until the task became satisfiable, but our initial experiments showed that this made little or no difference to the number of perfectly solved cases. Some of the unsatisfiable cases may have been caused by the restriction forbidding predicate invention for \(\hbox {ILASP}^{*}\) on this dataset: although there will always be an equivalent hypothesis that does not contain predicate invention, that hypothesis may have rules with more than 5 body literals.

Fig. 13
figure 13

The raw hypothesis returned by \(\hbox {ILASP}^*\) for the next learning task for Rock Paper Scissors

Similarly to the unsatisfiable cases, in the timeout cases the hypothesis found in \(\hbox {ILASP}^{*}\)’s final iteration was used to compute the accuracy. Returning the hypothesis found in the last iteration explains \(\hbox {ILASP}^{*}\)’s much higher average balanced accuracy compared to Metagol, which either returns a solution covering all of the training examples or no solution at all.

\(\hbox {ILASP}^*\) is able to perfectly solve some tasks that are not perfectly solved by any of the baselines or other ILP systems. One example is the \(\mathtt {next}\) learning task for Rock Paper Scissors. In this case, the raw hypothesis returned by \(\hbox {ILASP}^*\) is shown in Fig. 13, which is equivalent to the (more readable) hypothesis shown in Fig. 14. Note that this hypothesis is slightly more complicated than necessary. If \(\hbox {ILASP}^*\) had been permitted to use \(!=\) to check that two player variables did not represent the same player, it is possible that the last three rules would have been replaced with:

figure d

It is possible to learn hypotheses with \(!=\) (and other binary comparison operators) in ILASP, but this would have increased the size of the hypothesis space, so in these experiments, we only allowed \(\hbox {ILASP}^*\) to construct hypothesis spaces using the language of the input task. In future work, we may consider extending the relevant hypothesis space construction method to allow binary comparison operators. The increase in the size of the hypothesis space may be outweighed by the fact that the final hypothesis can be shorter—shorter hypotheses tend to need fewer iterations to learn.

Fig. 14
figure 14

A more readable version of the hypothesis returned by \(\hbox {ILASP}^*\) for the next learning task for Rock Paper Scissors

6.3 Discussion

As Table 3 shows, most of the IGGP tasks cannot be perfectly learned by existing ILP systems. The best performing system (\(\hbox {ILASP}^{*}\)) solves only 40% of the tasks perfectly. Our results suggest that the IGGP problem poses many challenges to existing approaches.

As mentioned in Sect. 4.3, we are unsure whether the dataset contains sufficient training examples for each approach to perfectly solve all of the tasks. Moreover, determining whether there is sufficient data is especially difficult because the different systems employ different biases. However, in most cases the ILP systems simply timed out, rather than learning an incorrect solution. The key issue is that the ILP systems we have considered do not scale to the large problems in the IGGP dataset. In the previous sections we discussed the limitations of each system; we now summarise these limitations to help explain what makes IGGP difficult for existing approaches.

Large programs

As discussed in Sect. 2, many reference solutions for IGGP games are large, both in the number of literals and in the number of clauses they contain. For instance, the GGP reference solution for the goal predicate for Connect Four uses 14 clauses and a total of 72 literals. However, learning large programs is a challenge for most ILP systems (Cropper 2017), which typically struggle to learn programs with hundreds of clauses or literals. Metagol, for instance, struggles to learn programs with more than 8 clauses.

Predicate invention The reference solution for goal in Connect Four uses auxiliary predicates (goal is defined in terms of lines, which are defined in terms of columns, rows, and diagonals). These auxiliary predicates are not strictly required, as any stratified definition with auxiliary predicates can be translated into an equivalent program with no auxiliary predicates; however, such equivalent programs are often significantly longer. If we unfold the reference solution to remove the auxiliary predicates, the resulting equivalent unfolded program contains over 400 literals. For ILP approaches that do not support the learning of programs containing auxiliary predicates (such as Progol, Aleph, and FOIL), it is infeasible to learn such a large program. More modern ILP approaches support predicate invention, enabling the learning of auxiliary predicates which are not in the language of the background knowledge or the examples; however, predicate invention is far from easy, and there are significant challenges associated with it, even for state-of-the-art ILP systems. ASPAL and ILASP support prescriptive predicate invention, where the schema of the auxiliary predicates (i.e. the arity and argument types) must be specified in the mode declarations (Law 2018). By contrast, Metagol supports automatic predicate invention, where Metagol invents auxiliary predicates without the need for user-supplied arities or type information. However, Metagol’s approach can still often lead to inefficiencies in the search, especially when multiple new predicate symbols are introduced.
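
As a schematic illustration (our own toy example, not the actual Connect Four solution) of why auxiliary predicates keep programs short, compare a definition that uses an auxiliary line/1 predicate with its unfolded equivalent:

% With an auxiliary predicate:
goal(P, 100) :- line(P).
line(P) :- row(P).
line(P) :- column(P).
line(P) :- diagonal(P).

% Unfolded equivalent without the auxiliary predicate: each alternative of
% line/1 is copied into the caller. If row/1, column/1, and diagonal/1 were
% themselves auxiliary, each of their alternatives would be copied in turn,
% quickly blowing up the program size.
goal(P, 100) :- row(P).
goal(P, 100) :- column(P).
goal(P, 100) :- diagonal(P).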

7 Conclusion

In this paper, we have expanded on the Inductive General Game Playing task proposed by Genesereth. We claimed that learning the rules of the GGP games is difficult for existing ILP techniques. To support this claim, we introduced an IGGP dataset based on 50 games from the GGP competition and we evaluated existing ILP systems on the dataset. Our empirical results show that most of the games cannot be perfectly learned by existing systems. The best performing system (\(\hbox {ILASP}^{*}\)) solves only 40% of the tasks perfectly. Our results suggest that the IGGP problem poses many challenges to existing approaches. We think that the IGGP problem and dataset will provide an exciting challenge for future research, especially as we have introduced techniques to continually expand the dataset with new games.

7.1 Limitations and future work

Better ILP systems

Our primary motivation for introducing this dataset is to encourage future research in ILP, especially on general ILP systems able to learn rules for a diverse set of tasks. In fact, we have already demonstrated two advancements in this paper: (1) a stochastic version of Metagol (Sect. 6.2.4), and (2) \(\hbox {ILASP}^{*}\) (Sect. 5.2.4), which scales ILASP2 up to the GGP dataset. In future work we intend to develop better ILP systems.

More games One of the main advantages of the IGGP problem is that the games are based on the GGP competition. As mentioned in the introduction, the GGP competition produces new games each year. These games are introduced independently from our dataset without any particular ILP system in mind. Therefore, because of our second contribution, we can continually expand the IGGP dataset with these new games. In future work we intend to automate this whole process and to ensure that all the data is publicly available.

More systems We have evaluated four ILP systems (Aleph, ASPAL, Metagol, and ILASP). In future work we would like to evaluate more ILP systems. We would also like to consider non-ILP systems (i.e. systems that may not necessarily learn explicit human-readable rules).

More evaluation metrics We have evaluated ILP systems according to two metrics: balanced accuracy and perfectly solved. However, there are other dimensions on which to evaluate the systems. We have not, for instance, considered the learning times of the systems (although they all had the same maximum time to learn during the evaluation), nor have we considered the sample complexity of the approaches. In future work it would be valuable to evaluate approaches while varying the number of game traces (i.e. observations) available, so as to identify the most data-efficient approaches.

More challenges The main challenge in using existing systems on this dataset is the deliberate lack of game-specific language biases, meaning that for many games the hypothesis space that each system must consider is extremely large. This reflects a major current issue in ILP, where systems are often given well-crafted language biases to ensure feasibility. However, this is not the only current challenge in ILP. For example, some ILP approaches target challenges such as learning from noisy data (Oblak and Bratko 2010; Evans and Grefenstette 2018; Law et al. 2018), probabilistic reasoning (Raedt et al. 2007; De Raedt and Thon 2010; Riguzzi et al. 2014; Bellodi and Riguzzi 2015; Riguzzi et al. 2016), non-determinism expressed through unstratified negation (Otero 2001; Law et al. 2018), and preference learning (Law et al. 2015b). Future versions of this dataset could be extended to contain these features.

Competitions SAT competitions have been held since 1992 with the aim of providing an objective evaluation of contemporary SAT solvers (Järvisalo et al. 2012). The competitions have significantly contributed to the progress of developing ever more efficient SAT techniques (Järvisalo et al. 2012). In addition, the competitions have motivated the SAT community to develop more robust, reliable, and general-purpose SAT solvers (i.e. implementations). We believe that the ILP community stands to benefit from an equivalent competition to focus and motivate research. We hope that this new IGGP problem and dataset will become a central component of such a competition.