Meta-Interpretive Learning from noisy images

Statistical machine learning is widely used in image classification. However, most techniques (1) require many images to achieve high accuracy and (2) do not provide support for reasoning below the level of classification, and so are unable to support secondary reasoning, such as the existence and position of light sources and other objects outside the image. This paper describes an Inductive Logic Programming approach called Logical Vision (LV) which overcomes some of these limitations. LV uses Meta-Interpretive Learning (MIL) combined with low-level extraction of high-contrast points sampled from the image to learn recursive logic programs describing the image. In published work LV was demonstrated capable of high-accuracy prediction of classes such as regular polygon from small numbers of images where Support Vector Machines and Convolutional Neural Networks gave near random predictions in some cases. LV has so far only been applied to noise-free, artificially generated images. This paper extends LV by (a) addressing classification noise using a new noise-tolerant version of the MIL system Metagol, (b) addressing attribute noise using primitive-level statistical estimators to identify sub-objects in real images, (c) using a wider class of background models representing classical 2D shapes such as circles and ellipses, and (d) providing richer learnable background knowledge in the form of a simple but generic recursive theory of light reflection. In our experiments we consider noisy images in both natural science settings and in a RoboCup competition setting. The natural science settings involve identification of the position of the light source in telescopic and microscopic images, while the RoboCup setting involves identification of the position of the ball. Our results indicate that with real images the new noise-robust version of LV using a single example (i.e. one-shot LV) converges to an accuracy at least comparable to a thirty-shot statistical machine learner on both prediction of hidden light sources in the scientific settings and in the RoboCup setting. Moreover, we demonstrate that a general background recursive theory of light can itself be invented using LV and used to identify ambiguities in the convexity/concavity of objects such as craters in the scientific setting and partial obscuration of the ball in the RoboCup setting.

Introduction
Galileo's Sidereus Nuncius (Galilei 2004) describes the first ever telescopic observations of the moon. Using sketches of shadow patterns Galileo conjectured the existence of mountains containing hollow areas (i.e. craters) on a celestial body previously thought perfectly spherical. His reasoned description, derived from a handful of observations, relies on a knowledge of (i) classical geometry, (ii) straight line movement of light and (iii) the Sun as an out-of-view light source. This paper investigates the use of Inductive Logic Programming (ILP) (Muggleton et al. 2011) to derive logical hypotheses, related to those of Galileo, from a small set of real-world images. Figure 1 illustrates part of the generic background knowledge used by ILP for interpreting object convexity in Experiment 1 (Sect. 5.1). Figure 1a shows an image of the crescent moon in the night sky, in which convexity of the overall surface implies the position of the Sun as a hidden light source beyond the lower right corner of the image. Figure 1b shows an illusion in which assuming a light source in the lower right leads to perception of convex circles on the leading diagonal. Conversely, a light source in the upper left implies their being concave. Figure 1c shows how interpretation of a convex feature, such as a mountain, comes from illumination of the right side of a convex object. Figure 1d shows that perception of a concave feature, such as a crater, comes from illumination of the left side. Figure 1e shows how Prolog background knowledge encodes a simple recursive definition of the reflected path of a photon. This paper explores the phenomenon of knowledge-based perception using an extension of Logical Vision (LV) (Dai et al. 2015). In the previous work LV was shown to accurately learn a variety of polygon classes from artificial images with low sample requirements compared to statistical learners. LV generates logical hypotheses concerning images using an ILP technique called Meta-Interpretive Learning (MIL) (Cropper and Muggleton 2016).

Contributions of this paper
The main contributions of this paper are:
1. We describe a generalisation of LV (Dai et al. 2015), which is tolerant to both classification noise and attribute noise.
2. We show that even in the presence of noise in images [absent in artificial images in Dai et al. (2015)] effective learning can be achieved from as few as one image.
3. We demonstrate that in all cases studied the combination of a logic-based learner with a statistical estimator requires far fewer images (sometimes one) to achieve accuracies requiring large numbers of images using statistical machine learning on its own.
4. We demonstrate that LV can use, as well as invent, generic background knowledge about reflection of photons in providing explanations of visual features.
5. We demonstrate that LV has potential in real application domains such as RoboCup.

Fig. 1 Interpretation of light source direction: a waxing crescent moon (Credit: UC Berkeley), b concave/convex illusion, c concave and d convex photon reflection models, e Prolog recursive model of photon reflection
RoboCup domain
In Experiment 3 (Sect. 5.3) we investigate LV in the context of robotics. Figure 2 shows images from the RoboCup Soccer Standard Platform League. This is a competition with five Aldebaran Nao robots on each team. They are placed on a 9 m × 6 m field, and operate autonomously to play soccer. The robots use cameras to detect the ball, field lines, goals and other robots. In Fig. 2a, the ball can be seen distinctly, whereas in Fig. 2b, c the ball is partially occluded. The problem with recognising the ball is that it consists of several patches of black and white, but there are many other objects on the field that also contain white regions. However, background knowledge concerning the geometry of a sphere projected on a 2D plane guarantees a ball has a circular appearance. If three edge points can be found, our approach can fit them to a circle, and if that circle has the appropriate proportions of black and white pixels, the system concludes it is a ball.
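The circle test described above can be illustrated with a minimal Python sketch. It is not the LV implementation: the function names, the pixel representation (a dictionary from coordinates to colour labels) and the proportion thresholds are all assumptions chosen for illustration.

```python
import math

def circle_from_three_points(p1, p2, p3):
    """Return (cx, cy, r) of the circle through three points, or None if collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        return None  # collinear points define no circle
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)

def looks_like_ball(pixels, circle, white_range=(0.3, 0.9), black_range=(0.1, 0.7)):
    """Check the black/white proportions inside the circle.

    pixels maps (x, y) to a colour label; the ranges are hypothetical thresholds.
    """
    cx, cy, r = circle
    inside = [c for (x, y), c in pixels.items()
              if (x - cx) ** 2 + (y - cy) ** 2 <= r * r]
    if not inside:
        return False
    white = inside.count("white") / len(inside)
    black = inside.count("black") / len(inside)
    return (white_range[0] <= white <= white_range[1]
            and black_range[0] <= black <= black_range[1])
```

Three edge points determine the candidate circle, and only then is the colour evidence inside it consulted, which mirrors the order of reasoning in the text.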
The paper is organised as follows. Section 2 describes related work. The theoretical framework for LV is provided in Sect. 3. Section 4 describes the implementation of LV, including the recursive background knowledge for describing radiation and reflection of light. In Sect. 5 we describe experiments on (1) learning abstract definitions of polygons from artificial images, (2) predicting the light source direction and identification of ambiguities in images of the moon and microscopic images of illuminated micro-organisms and (3) identifying the ball in the RoboCup domain. Finally, we conclude and discuss further work in Sect. 6.

Related work
Statistical machine learning based on low-level feature extraction has been increasingly successful in image classification (Rautaray and Agrawal 2015). However, high-level vision, involving interpretation of objects and their relations in the external world, is still relatively poorly understood (Cox 2014). Since the 1990s perception-by-induction (Gregory 1998) has been the dominant model within computer vision, where human perception is viewed as inductive inference of hypotheses from sensory data. The idea originated in the work of the nineteenth century physiologist von Helmholtz (1962). The approach described in this paper is in line with perception-by-induction in using ILP for generating high-level perceptual hypotheses by combining sensory data with a strong bias in the form of explicitly encoded background knowledge. Whilst Gregory (1974) was one of the earliest to demonstrate the power of Helmholtz's perception model for explaining human visual illusion, recent experiments (Heath and Ventura 2016) show that Deep Neural Networks fail to reproduce human-like perception of illusion. This contrasts with results in Sect. 5.2, in which LV achieves analogous outcomes to human vision.
Early work in Computer Vision investigated the interaction between visual analysis, linguistic descriptions and geometric models (Waltz 1980; Huffman 1971). In some such approaches visual illusions were identified by testing logical models of images for contradictions (Barrow and Tenenbaum 1981). However, these techniques were based on preformulated models, and did not use machine learning augmented by background knowledge in the fashion described in this paper. Preformulated models are also used in more recent work to capture, for instance, the movement of a human being walking (Hogg 1983) or a hyperbolic curve involved in analysing images of penetrating radar (Olhoeft 2000). However, these techniques lack the flexibility of our Logical Vision approach to combine a set of primitive models in a modular fashion to form a set of composite structured and reusable models from an image.
Shape-from-shading (Horn 1989; Zhang et al. 1999) is a key computer vision technology for estimating low-level surface orientation in images. Unlike our approach for identifying concavities and convexities, shape-from-shading generally requires observation of the same object under multiple lighting conditions. By using background knowledge as a bias we reduce the number of images required for accurate perception of high-level shape properties such as the identification of convex and concave image areas.
ILP has previously been used for learning concepts from images. For instance, in Cohn et al. (2006) object recognition is carried out using existing low-level computer vision approaches, with ILP being used for learning general relational concepts from this already symbolised starting point. Farid and Sammut (2014a, b) adopted a similar approach, extracting planar surfaces from a 3D image of objects encountered by urban search and rescue robots and household objects, then using ILP to learn relational descriptions of those objects. By contrast, LV (Dai et al. 2015) uses ILP to provide a bridge from very low-level features, such as high contrast points, to high-level interpretation of objects. The present paper extends the earlier work on LV by implementing a noise-proofing technique, applicable to real images, and extending the use of generic background knowledge to allow the identification of objects, such as light sources, not directly identifiable within the image itself.
Various statistics-based techniques, making use of high-level vision, have been proposed for one- or even zero-shot learning (Palatucci et al. 2009; Vinyals et al. 2016). They usually start from an existing model pre-trained on a large corpus of instances, and then adapt the model to data with unseen concepts. Approaches can be separated into two categories. The first exploits a mapping from images to a set of semantic attributes, then high-level models are learned based on these attributes (Lampert et al. 2014; Mensink et al. 2011; Palatucci et al. 2009). The second approach uses statistics-based methods, pre-trained on a large corpus, to find localized attributes belonging to objects but not the entire image, and then exploits the semantic or spatial relationships between the attributes for scene understanding (Hu et al. 2016; Li et al. 2014; Duan et al. 2012). Unlike these approaches, we focus on one-shot learning from scratch, i.e. high-level vision based on just very low-level features such as high contrast points.
Machine learning is used extensively in robotics, mainly to learn perceptual and motor skills. Current approaches for learning perceptual tasks include Deep Learning and Convolutional Neural Networks (Krizhevsky et al. 2012; Redmon et al. 2016). The different approaches to vision in RoboCup can be seen in the SPQR team's use of convolutional neural networks (Suriani et al. 2016) and the ad hoc, but effective method used by the 2016 SPL champions, B-Human (Rofer et al. 2016). This approach clearly depends on domain knowledge that has been acquired by the human designers. However, the approach described in this paper promises the possibility that similar knowledge could be acquired through machine learning.

Framework
The framework for LV is a special case of MIL. MIL (Muggleton et al. 2014b; Muggleton 2015, 2016) is a form of ILP based on an adapted Prolog meta-interpreter. A standard Prolog meta-interpreter proves goals by repeatedly fetching first-order clauses whose heads unify with the goals. By contrast, a MIL learner proves the set of all examples by fetching higher-order metarules (Table 1)
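To make the idea of fetching a metarule concrete, the following Python sketch instantiates a single chain metarule, P(x,y) :- Q(x,z), R(z,y), against ground background facts. It is a deliberately reduced illustration and not Metagol: there is no recursion, no predicate invention and no Prolog meta-interpreter, and the function names and data layout are assumptions.

```python
from itertools import product

def chain_covers(q_facts, r_facts, example):
    """True if some z satisfies Q(x, z) and R(z, y) for the example (x, y)."""
    x, y = example
    return any(a == x and (b, y) in r_facts for a, b in q_facts)

def learn_chain(target, background, positives, negatives=()):
    """Return meta-substitutions (P, Q, R) for the chain metarule
    P(x, y) :- Q(x, z), R(z, y) that cover every positive and no negative.

    background maps predicate names to sets of ground (arg1, arg2) pairs.
    """
    solutions = []
    for q, r in product(sorted(background), repeat=2):
        q_facts, r_facts = background[q], background[r]
        covers_pos = all(chain_covers(q_facts, r_facts, e) for e in positives)
        covers_neg = any(chain_covers(q_facts, r_facts, e) for e in negatives)
        if covers_pos and not covers_neg:
            solutions.append((target, q, r))
    return solutions
```

Each returned triple is a meta-substitution: binding the predicate variables of the metarule to these names yields an ordinary first-order clause that proves the examples.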

Noise tolerant Meta-Interpretive Learning
The MIL framework described in the previous section has been implemented in a system called Metagol (Muggleton et al. 2014a, b, 2015; Muggleton 2015, 2016).
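Later sections state that the noise-tolerant variant repeatedly draws a random subset of the training examples over n iterations. A minimal Python sketch of such a wrapper is given below. The selection rule (keeping the hypothesis with the best accuracy on the examples left out of the sample) is an assumption made for illustration, as are the function name and the `learn` and `score` callables, which stand in for a call to the underlying learner and an accuracy measure.

```python
import random

def noise_tolerant_learn(learn, score, examples, sample_size, n, seed=0):
    """Run a learner on n random subsets and keep the best-scoring hypothesis.

    learn(train) returns a hypothesis or None; score(hypothesis, held_out)
    returns a number. Both are hypothetical stand-ins supplied by the caller.
    """
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n):
        train = rng.sample(examples, sample_size)
        hypothesis = learn(train)
        if hypothesis is None:
            continue  # nothing learnable from this sample
        held_out = [e for e in examples if e not in train]
        s = score(hypothesis, held_out)
        if s > best_score:
            best, best_score = hypothesis, s
    return best
```

Because each hypothesis is induced from a small sample, a mislabelled example can only corrupt the iterations in which it is drawn, and such hypotheses tend to score poorly on the remaining examples.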

Logical Vision
Our implementation of Logical Vision, called LogVis, is shown in Algorithm 2. The input consists of a set of images I, background knowledge B including both Prolog primitives B_p and metarules M, a set of training examples E of the target concept, and Metagol_NT's parameters ν and n.
The procedure of LogVis is divided into two stages. The first stage is to extract symbolic background knowledge from images, which is done by the visualAbduce function. By including abductive theories in B_p ∈ B, visualAbduce can abduce points, lines, ellipses and even complex mid-level visual representations such as super-pixels (see Sect. 5.3). In our implementation, visualAbduce can take logic rules, statistical models and functions from a computer vision toolbox as background knowledge, which provide visual primitives. This makes LogVis flexible in learning many kinds of concepts. More details about visual abduction are introduced in Sect. 5.2.

Algorithm 2: LogVis(I, B, E, ν, n)
Input: Training images I; Background knowledge B; Set of (noisy) examples E; Parameter about noise level ν and number of iterations n.
Output: Hypothesised logic program H.
/* Initialise the knowledge base of visual primitives */
1 B_v = ∅;
2 for each image i ∈ I do
/* Do visual abduction to get facts of visual primitives P */

Fig. 3 Object detection: a sampled lines with edge points; b fitting of initial ellipse centred at O. Hypothesis tested using new edge points halfway between existing adjacent points. c Revised hypothesis tested until hypothesis passes test
The second stage of LogVis simply calls the noise-tolerant MIL system Metagol_NT to induce a hypothesis for the target concept, as both abduced visual primitives B_v and training examples E from an image dataset can be noisy.
Visual abduction
The target of visual abduction is to obtain symbolic interpretation of images for further learning. The abduced logical facts are groundings of primitives defined in the background knowledge B_p. For example, in order to learn the concept of a polygon one at least needs to extract points and edges from an image. When the data is noise-free, this can be done by sampling high-contrast pixels from the image, such as the background knowledge about edge_point applied in Dai et al. (2015).
However, for real images that contain a degree of noise, we can include a statistical model in visualAbduce and use it to implement a noise-robust version of edge_point. For example, in the Protist and Moon experiments of Sect. 5, the edge_point/1 predicate calls a pre-trained statistical image background model which can categorise pixels into foreground or background points using Gaussian models or image segmentation. Furthermore, we can use an abductive theory about shapes to abduce objects. For example, in real images many objects of interest are composed of curves and can be approximated by ellipses or circles. Therefore we can include background knowledge about them in visualAbduce to perform ellipse and circle abduction, as shown in Fig. 3. The  B, Tilt] are the axis lengths and tilting angle and Radius is the circle radius. The computational complexity of the abduction procedure is O(rkn), where n is the number of edge_points, and k is the number of iterations of the ellipse fitting algorithm. r is the time required for resampling when the fitted object is not accurate enough, hence it is a constant that reflects the noise level of the input image.
In LogVis, background knowledge about visual primitives is implemented as logical predicates in a library, including basic geometrical concepts and extractors for low-level computer vision features such as the colour histogram and super-pixels. Users can implement their own background knowledge for visual abduction based on these primitives to address different kinds of problems flexibly.

Experiment 1
In the first experiment [detailed report in Dai et al. (2015)] we compared a noise-free variant of the LogVis algorithm (referred to as LV_Poly) with statistics-based approaches on the task of learning simple geometrical concepts (see example images in Fig. 4).
Materials and methods
We used Inkscape to randomly generate 3 labelled image datasets for 3 polygon shape learning tasks respectively. Training sets contain 40 examples. For simplicity, the images are binary-coloured and each image contains one polygon. Target concepts are: (1) triangle/1, quadrangle/1, pentagon/1 and hexagon/1; (2) regular_poly/1 (regular polygon); (3) right_tri/1 (right triangle). Each dataset was partitioned into five folds, 4 of which were used for training and the remaining one for testing; thus each experiment was conducted 5 times.
Results and discussion
Table 2 compares the predictive accuracies of an implementation of LV_Poly versus several statistics-based computer vision algorithms. We used a popular statistics-based computer vision toolbox VLFeat (Vedaldi and Fulkerson 2008) to implement the statistical learning algorithms. The experiments were carried out with different kinds of features. Because the sizes of datasets are small, we used a support vector machine [libSVM (Chang

Experiment 2
This subsection describes experiments comparing one-shot LV with multi-shot statistics-based learning. In these experiments, we investigate the following null hypothesis:
Null hypothesis One-shot LV cannot learn models with accuracy comparable to thirty-shot statistics-based learning.
vector of colour distribution (which is represented by a histogram of grey-scale value) of the 10 × 10 region centred at (X,Y) is calculated, then the background model is applied to determine whether this vector represents an edge point. The neighbourhood region size of 10 was chosen as a compromise between accuracy and efficiency after testing sizes ranging from 5 to 20. The background model is trained from five randomly sampled images in the training set by providing the bounding box of the objects.
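The feature computation described above can be sketched in Python as follows. This is an illustration under stated assumptions: the image is a list of rows of grey values in 0..255, the number of histogram bins is arbitrary, the window is clipped at image borders, and `model` is a hypothetical callable standing in for the pre-trained background model, whose form is not reproduced here.

```python
def grey_histogram(image, x, y, size=10, bins=8):
    """Normalised grey-level histogram of the size x size window around (x, y).

    image is a list of rows of integers in 0..255; the window is clipped
    at the image borders.
    """
    half = size // 2
    rows = range(max(0, y - half), min(len(image), y + half))
    cols = range(max(0, x - half), min(len(image[0]), x + half))
    hist = [0] * bins
    for r in rows:
        for c in cols:
            hist[image[r][c] * bins // 256] += 1
    total = sum(hist)
    return [h / total for h in hist]

def edge_point(image, x, y, model):
    """Apply a background model (any callable on a histogram) to the window."""
    return bool(model(grey_histogram(image, x, y)))
```

For example, a stand-in model such as `lambda hist: max(hist) < 0.9` would flag windows whose grey values are spread over more than one bin, although the model used in the experiments is trained from labelled images.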

Statistics-based Classification
The experiments with statistics-based classification were conducted in different colour spaces combined with various features. Firstly, we performed feature extraction to transform images into fixed length vectors. Next SVMs [libSVM (Chang and Lin 2011)] with RBF kernel were applied to learn a multiclass-classifier model. Parameters of the SVM are chosen by cross validation on the training set. Like LV, we used grey intensity from both image datasets for the experiments. For the coloured Protists dataset, we transformed the images to HSV and Lab colour spaces to improve the performance. Since the image sizes in the dataset are irregular, during the object detection stage of LV, we used background models and computer graphics techniques (e.g. curve fitting) to extract the main objects and unified them into same sized patches for feature extraction. The sizes of object patches were 80×80 and 401×401 in Protists and Moons respectively. For the feature extraction process, we avoided descriptors which are insensitive to scale and rotation, instead we selected the luminance-sensitive features HOG and LBP. The Histogram of Oriented Gradient (HOG) (Dalal and Triggs 2005) is known for its ability to describe the local gradient orientation in an image, and widely used in computer vision and image processing for the purpose of object detection. Local binary pattern (LBP) (Ojala et al. 2002) is a powerful feature for texture classification by converting the local texture of an image into a binary number.
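As an illustration of how LBP converts local texture into a binary number, the basic 3 × 3 form can be sketched in Python. The neighbour ordering and the use of `>=` for the comparison are conventions that vary between implementations, and the function names are assumptions; this is not the feature extractor used in the experiments.

```python
def lbp_code(image, r, c):
    """Basic 8-neighbour LBP code of the pixel at row r, column c.

    Each neighbour contributes one bit, set when its grey value is at
    least that of the centre pixel. Neighbours are visited clockwise
    starting from the top-left.
    """
    centre = image[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """Normalised 256-bin histogram of LBP codes over interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

The histogram of codes over a patch is the fixed-length vector that a classifier such as an SVM would consume.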
In the Moons task, LV and the compared statistics-based approach both used geometrical background knowledge for fitting circles (though in different forms) during object extraction. However, in the Protists task, the noise in images always caused poor performance in automatic object extraction for the statistics-based method. Therefore, we provided additional supervision to the statistics-based method consisting of bounding boxes labelling the position of the main objects in both training and test images during feature extraction. By comparison LV discovers the objects from raw images without any label information.
Results
Figure 6a shows the results for Moons. Note that performance of the statistics-based approach only surpasses one-shot LV after 100 training examples. In this task, the background knowledge involving circle fitting exploited by LV and by the statistics-based approaches is similar, though the low-level features used by the statistics-based approach are first-order information (grey-scale gradients), which is stronger than the zeroth-order information (grey-scale value)
Fig. 6a, b are represented by horizontal lines. When the number of training examples exceeds one, LV performs multiple one-shot learning and selects the most frequent output (see Algorithm 2), which we found is always in the same equivalence class in LV's hypothesis space. This suggests LV learns the optimal model in its hypothesis space from a single example. The learned program is shown in Fig. 7.
The results in Fig. 6 demonstrate that Logical Vision can learn an accurate model using a single training example. By comparison, the statistics-based approaches require 40 or even 100 more training examples to reach similar accuracy, which refutes the null hypothesis. However, the performance of LV relies heavily on the accuracy of the statistical estimator of edge_point/1, because mistakes in edge point detection harm the shape fitting and consequently the accuracy of main object extraction. Unless we train a better edge_point/1 classifier, the best performance of LV is limited, as Fig. 6 shows.
LV is implemented in SWI-Prolog (Wielemaker et al. 2012) with multi-thread processing. Experiments were executed on a laptop with an Intel i5-3210M CPU (2.50 GHz). The time costs of object discovery are 9.5 and 6.4 s per image on the Protists and Moons datasets respectively; the average running time of the Metagol procedure is 0.001 s on both datasets.
Protists and Moons contain only convex objects. If instead we provide images with concave objects (such as Fig. 9), LV learns a program such as Fig. 8. Here the invented predicate clock_angle2/1 can be interpreted as concave because its interpretation can be related to the appearance of opposite_angle/2.
Discussion: Learning ambiguity
Figure 9 shows two images of a crater on Mars, where Fig. 9b is a 180° rotated image of Fig. 9a. Human perception often confuses the convexity of the crater in such images. This phenomenon, called the crater/mountain illusion, occurs because human vision usually interprets pictures under the default assumption that the light is from the top of the image.
LV can use MIL to perform abductive learning. We show below that incorporation of generic recursive background knowledge concerning light enables LV to generate multiple mutually inconsistent perceptual hypotheses from real images. To the authors' knowledge, such ambiguous prediction has not been demonstrated previously with machine learning.
Recall the learned programs from Figs. 7 and 8 from the previous experiments. If we rename the invented predicates we get the general theory about lighting and convexity shown in Fig. 10. Now we can use the program as a part of interpreted background knowledge for LV to do abductive learning, where the abducible predicates and the rest of background knowledge are shown in Fig. 11.

Fig. 11
Abducibles
  prim(convex/1). prim(concave/1). prim(light_source/1). prim(light_source_angle/3).
Compiled BK
  % "obj1" is an object abduced from image, "obj2" is
  % the brighter part of "obj1"; "observer" is the camera
  contains(obj1,obj2).
Interpreted BK
  highlight(X,Y) :- contains(X,Y), brighter(Y,X), light_source(L), light_path(L,R), reflector(R), light_path(R,O), observer(O).

If we input Fig. 9a to LV, it will output four different abductive hypotheses for the image, as shown in Fig. 12. From the first two results we see that, by considering different possibilities of light source direction, LV can predict that the main object (which is the crater) is either convex or concave, which shows the power of learning ambiguity. The last two results are even more interesting: they suggest that obj2 (the highlighted part of the crater) might be the light source as well, which is indeed possible, though it seems unlikely.

Experiment 3
In this subsection we describe the experiments conducted on real images involving RoboCup soccer, where the task is to locate the football. We address this task in two stages: first we try to approximately locate the football in the image and then we use the model-driven technique of Logical Vision to abduce its location and shape. By doing this, one can estimate the size of the football, recognise occluded footballs and deduce depth information from the images.
Dataset and task
The dataset contains 377 colour images sampled from a video of the robot's camera view of the football field. As Fig. 13 shows, the scene of this dataset contains the green field, a robot, and a football. The original size of the images is 480 × 720. In this experiment they have been scaled to 240 × 360 to reduce the computational complexity.
This task is more difficult than those in the previous experiments. The objects in the images are more complex and contain more noise. Therefore it is difficult to learn a hypothesis using simple primitives such as "edge_point". For example, the robot and football contain many edges so the original line sampling based abduction used by Logical Vision will become a large-scale combinatorial optimisation problem. Moreover, in 41 of the images the football is either occluded by or connected to other objects, and in 40 images there is no football at all.
Fig. 12 Depiction of abduced hypotheses from Fig. 9a

To address the challenges, we consider a two-staged learning procedure. The first sub-task is to quickly find candidate locations of the footballs, which can reduce the search space of the fine grained football discovery. The second sub-task is to use Logical Vision to abduce the location and shape of the football from the candidate positions. For the first sub-task, we use a super-pixel algorithm (Achanta et al. 2012) to segment the images into small regions, which can serve as primitives for estimating the location of the football. Super-pixel algorithms are able to group pixels into atomic regions that capture image redundancy, greatly reducing the complexity of subsequent image processing tasks. The super-pixel implementation we used is from OpenCV_contrib (Bradski 2000). The tuned parameter is the size of each super-pixel, which ranges from 10 to 30 with step size 5. During data transformation, we use the football bounding boxes shipped with the original images to label the super-pixels: those which have 95% area inside of a bounding box of footballs (which is the label information in the original data) are labelled as positive examples with predicate "ball_sp". The rest are labelled as negatives. Examples from the dataset are shown in Fig. 14. The second sub-task, model-driven football abduction, directly takes "ball_sp" and an abductive theory as input and outputs the circle parameters (centre and radius), where "ball_sp" should be the result produced by the classification model learned in the first stage.
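The labelling rule for super-pixels can be written down as a short Python sketch. It assumes a super-pixel is given as a collection of pixel coordinates and a bounding box as inclusive corner coordinates, and it reads "95% area inside" as "at least 95% of the pixels"; these representations and the function name are illustrative assumptions.

```python
def is_ball_superpixel(pixels, box, threshold=0.95):
    """Label a super-pixel positive if enough of its area lies in the box.

    pixels is an iterable of (x, y) coordinates belonging to the super-pixel;
    box is (x_min, y_min, x_max, y_max) with inclusive bounds.
    """
    pixels = list(pixels)
    if not pixels:
        return False
    x0, y0, x1, y1 = box
    inside = sum(1 for x, y in pixels if x0 <= x <= x1 and y0 <= y <= y1)
    return inside / len(pixels) >= threshold
```

Super-pixels for which this returns True would become positive "ball_sp" examples and all others negatives.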

Experiment: Football super-pixel classification
This experiment is related to the first sub-task described above, i.e. locating the football from super-pixel segmented images. In this experiment we compare the performance of Metagol_NT versus a statistical learner [we choose the CART algorithm (Breiman et al. 1984), see footnote 14] and investigate the same null hypothesis used in Sect. 5.2.
Materials and methods
In this experiment we use the super-pixel dataset as described above. Each super-pixel is regarded as a symbolic object in the background knowledge. We extract some basic properties, such as size, location and colour distribution, as features. The colour distribution is represented by the proportion of white, grey, black and green pixels inside a super-pixel, which are identified by the Lab values of the pixels. Moreover, we exploit the neighbourhood relationship between super-pixels, which is represented by the "next_to/2" predicate (see footnote 15). In this experiment we randomly sample 128 images for training and use the remaining 249 images for testing. Similar to the Protists and Moons experiments in Sect. 5.2, we randomly sample 1, 2, 4, 8, 16, 32, 64, 128 images from the training set for learning the classification model. Random data partitioning is performed 5 times. The positive training examples (both for the statistical learner and the relational learner) are football super-pixels from each of 1, 2, 4, 8, 16, 32, 64, 128 images, and the same number of negative examples (i.e. non-football super-pixels) are randomly sampled from the same set of training images. Similarly, for the test data the negative examples are randomly sampled from non-football super-pixels in the test images. For relational learning (i.e. Metagol_NT), background predicates mostly_white/1, partly_white/1, mostly_black/1, partly_black/1, etc. were defined based on the colour distribution of super-pixels. For example the following background definitions describe a super-pixel which is mostly white or partly white:
mostly_white(S) :- white(S, P), P > 0.6.
partly_white(S) :- white(S, P), P > 0.4, P =< 0.6.
The background knowledge for the relational learner also includes the neighbourhood relationship between super-pixels, i.e. "next_to/2" predicates.
In this experiment the following parameters were used for the relational learner, i.e. Metagol_NT(B, E, ν, n) in Algorithm 1. In addition to the above mentioned background knowledge, B includes the Pre2 and Post2 metarules from Table 1.
E is the set of positive and negative training examples as described above. The size of randomly selected training examples T r i ⊂ E in each iteration i of Algorithm 1 and the number of iterations n can be set according to the expected degree of noise. Given that the expected error rate in the training data is not known in this problem, we choose an extreme case where T r i contains one randomly selected positive example (and one or two randomly For the statistics-based learner we use the CART decision tree algorithm (Breiman et al. 1984). The goal is to create a model that predicts the value of a target variable based on splitting the feature space. We choose CART as the compared method because we want to ensure the statistical model uses the same features as the relational model. Since the number of features, i.e. the green/white/grey/black pixel proportions, is relatively small, it is 14 CART was chosen since it is efficient and provides human-comprehensible output comparable to logic programs and execution of decision trees within the Robocup environment is sufficiently efficient (under 1/30th of a second) for localisation and decision making. 15 Dataset located at https://github.com/haldai/LogicalVision2/tree/master-2.1/data. A second reason for the choice of decision trees is efficiency of execution. The robots in RoboCup soccer must operate in real-time, which means that all vision, localisation, decision making, localisation and locomotion tasks must be completed in the time it takes to capture the next camera frame, typically 1/30th of a second. Thus, the classifier in the vision system must be extremely efficient to execute. A decision tree, with only a few comparisons leading to a decision in the leaf node, satisfies these stringent timing requirements. Results Figure 15 compares the predictive accuracy of the relational learner (Metagol N T ) vs the statistics-base learner (CART). 
As shown in the figure, Metagol_NT achieves consistently higher accuracy than CART, with the accuracy difference particularly high for small numbers of training examples. An example of the hypotheses found by the relational learner is as follows:

Model-driven football abduction
After narrowing down the candidate location of the football, Logical Vision is able to exploit geometrical background knowledge to perform model-driven abduction of the football's exact shape and position (i.e. its centre and radius as a circle). This is important in robotic football games since the robot can use this information to infer the distance between itself and the football. More importantly, by modelling the football as a circle, the robot can infer the occlusion of the football by other robots and choose appropriate actions accordingly. We apply Logical Vision with an abductive theory for this task, whose abducible is football/3. To sample edge points, Logical Vision draws random straight lines inside a super-pixel and its neighbourhood and returns the points associated with a colour transition. Examples of football abduction are shown in Fig. 16.
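The abduction step described here recovers a circle (centre and radius) from sampled edge points. The text does not spell out the estimator used, so the following Python sketch swaps in a standard algebraic least-squares (Kåsa) circle fit purely as an illustration. The names fit_circle and solve3 and the synthetic edge points are hypothetical and are not part of Logical Vision. Because the fit needs only points on a visible arc, it also suggests why a circle model can cope with a partially occluded ball.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate edge points (e.g. collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_circle(points):
    """Fit x^2 + y^2 + D*x + E*y + F = 0 to (x, y) points by least squares.

    Returns (cx, cy, r). This is an assumed stand-in estimator, not the
    paper's abductive procedure.
    """
    if len(points) < 3:
        raise ValueError("at least three edge points are needed")
    sxx = sxy = syy = sx = sy = sxz = syz = sz = 0.0
    for x, y in points:
        z = x * x + y * y
        sxx += x * x
        sxy += x * y
        syy += y * y
        sx += x
        sy += y
        sxz += x * z
        syz += y * z
        sz += z
    # Normal equations of the linear least-squares problem in (D, E, F).
    a = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(len(points))]]
    b = [-sxz, -syz, -sz]
    d, e, f = solve3(a, b)
    cx, cy = -d / 2.0, -e / 2.0
    return cx, cy, math.sqrt(cx * cx + cy * cy - f)


# Usage: only the upper half of a synthetic ball outline is visible.
arc = [(40 + 10 * math.cos(k * math.pi / 8), 25 + 10 * math.sin(k * math.pi / 8))
       for k in range(9)]
print(fit_circle(arc))
```

With noisy edge points, a fit of this kind returns the best compromise circle rather than an exact one, so in practice it would typically be combined with some rejection of outlying points.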

Conclusions and further work
Human beings often learn visual concepts from single image presentations (so-called one-shot learning) (Lake et al. 2011). This phenomenon is hard to explain from a standard Machine Learning perspective, given that it is unclear how to estimate any statistical parameter from a single randomly selected instance drawn from an unknown distribution. In this paper we show that learnable generic logical background knowledge can be used to generate high-accuracy logical hypotheses from single examples. This compares with similar demonstrations concerning one-shot MIL on string transformations (Lin et al. 2014) as well as previous concept learning in artificial images (Dai et al. 2015). The experiments in Sect. 5 show that the LV system can accurately identify the position of a light source from a single real image, in a way analogous to scientists such as Galileo observing the moon for the first time through a telescope, or Hooke observing micro-organisms for the first time through a microscope. In Sect. 5.2 we show that logical theories learned by LV from labelled images can also be used to predict concavity and convexity predicated on the assumed position of a light source. Section 5.3 shows how LV can be used effectively in real-time robot vision. Ball recognition in robot soccer is challenging because the ball is frequently occluded by other robots, and the similarity in colours of the ball, robots and field lines makes the ball difficult to distinguish.
We have studied LV's failure cases carefully. The main cause of misclassification is noise in the images. Because edge_point/1 is implemented with statistical models, noise can cause it to misclassify points, and these mistakes propagate to edge detection and shape fitting. As a result, the accuracy of the main object extraction is limited by both the noise level in the input images and the power of the statistical model behind edge_point/1. LV then fails too, since the wrongly extracted objects are its inputs. Conversely, if we train stronger models for detecting edge points, the accuracy of LV should increase as well.
In further work we aim to investigate broader sets of visual phenomena which can naturally be treated using background knowledge. Examples include the effects of object obscuration, the interpretation of shadows in an image to infer the existence of out-of-frame objects, and the existence of unseen objects reflected in a mirror found within the image. All these phenomena could possibly be considered in a general way from the point of view of a logical theory describing reflection and absorption of light, where each image pixel is used as evidence of photons arriving at the image plane. In this further work we also aim to compare our approach against a wider variety of competing methods.