A hypothesis H is called incomplete if it does not account for some positive examples, and inconsistent if it erroneously accounts for some negative examples. Incompleteness is typically treated by generalization, e.g. the addition of new clauses or the removal of literals from existing clauses. Inconsistency is treated by specialization, e.g. the removal of clauses or the addition of new literals to existing clauses. Theory revision is the process of modifying a hypothesis in order to change the set of examples it accounts for. It is at the core of incremental learning settings, where examples are provided over time. A learner induces a hypothesis from scratch, from the first available set of examples, and treats this hypothesis as a revisable background theory in order to account for new examples.
Definition 1 provides a concrete account of the incremental setting we assume for our approach, which we call incremental learning of event definitions (ILED).
Definition 1
(Incremental learning) We assume an ILP task \( ILP(\mathsf {SDEC},\mathcal {E},M) \), where \(\mathcal {E}\) is a database of examples, called historical memory, storing examples presented over time. Initially \(\mathcal {E} = \emptyset \). At time n the learner is presented with a hypothesis \(H_n\) such that \(\mathsf {SDEC} \cup H_n \vDash \mathcal {E}\), in addition to a new set of examples \(w_n\). The goal is to revise \(H_n\) to a hypothesis \(H_{n+1}\), so that \(\mathsf {SDEC} \cup H_{n+1} \vDash \mathcal {E}\cup w_n\).
A main challenge of adopting a full memory approach is to scale it up to a growing size of experience. This is in line with a key requirement of incremental learning where “the incorporation of experience into memory during learning should be computationally efficient, that is, theory revision must be efficient in fitting new incoming observations” (Langley 1995; Di Mauro et al. 2005). In the stream processing literature, the number of passes over a stream of data is often used as a measure of the efficiency of algorithms (Li et al. 2004; Li and Lee 2009). In this spirit, the main contribution of ILED, in addition to scaling up XHAIL, is that it adopts a “single-pass” theory revision strategy, that is, a strategy that requires at most one pass over \(\mathcal {E}\) in order to compute \(H_{n+1}\) from \(H_{n}\).
Since experience may grow over time to an extent that is impossible to maintain in the working memory, we follow an external memory approach (Biba et al. 2006). This implies that the learner does not have access to all past experience as a whole, but to independent sets of training data, in the form of sliding windows. At time n, ILED is presented with a hypothesis \(H_n\) that accounts for the historical memory so far, and a new example window \(w_n\). If \(H_{n}\) covers the new window then it is returned as is, otherwise ILED starts the process of revising \(H_n\). In this process, revision operators that retract knowledge, such as the deletion of clauses or antecedents, are excluded, due to the exponential cost of backtracking in the historical memory (Badea 2001). The supported revision operators are thus the addition of new clauses and the refinement of existing clauses, i.e. the addition of new literals to their bodies.
To treat incompleteness we add initiatedAt clauses and refine terminatedAt clauses, while to treat inconsistency we add terminatedAt clauses and refine initiatedAt clauses. The goal is to retain the preservable clauses of \(H_n\) intact, refine its revisable clauses and, if necessary, generate a set of new clauses that account for new examples in the incoming window \(w_n\). We henceforth call a clause preservable w.r.t. a set of examples if it neither covers negatives nor disproves positives, and call it revisable otherwise.
Figure 1 illustrates the revision process with a simple example. New clauses are generated by generalizing a Kernel Set of the incoming window, as shown in Fig. 1, where a terminatedAt/2 clause is generated from the new window \(w_n\). To facilitate the refinement of existing clauses, each clause in the running hypothesis is associated with a memory of the examples it covers throughout \(\mathcal {E}\), in the form of a “bottom program”, which we call support set. The support set is constructed gradually, from previous Kernel Sets, as new example windows arrive. It serves as a refinement search space; in the example of Fig. 1, the single clause in the running hypothesis \(H_n\) is refined w.r.t. the incoming window \(w_n\) into two specializations. Each specialization is constructed by adding to the initial clause one antecedent from one of the two support set clauses presented in Fig. 1. The revised hypothesis \(H_{n+1}\) is constructed from the refined clauses and the new ones, along with the preserved clauses of \(H_n\), if any.
ILED’s support set can be seen as the S-set in a version space (Mitchell 1979), i.e. the space of all overly-specific hypotheses, progressively augmented while new examples arrive. Similarly, a running hypothesis of ILED can be seen as an element of the G-set in a version space, i.e. the space of all overly-general hypotheses that account for all examples seen so far, and need to be further refined as new examples arrive.
There are two key features of ILED that contribute towards its scalability. First, re-processing of past experience is necessary only in the case where new clauses are generated by a revision, and is redundant when a revision consists solely of refinements of existing clauses. Second, re-processing of past experience requires a single pass over the historical memory, meaning that it suffices to re-visit each past window exactly once to ensure that the output revised hypothesis \(H_{n+1}\) is complete and consistent w.r.t. the entire historical memory. These properties of ILED are due to the support set, which we present in detail next. A proof of ILED’s soundness and of its single-pass revision strategy is given in Proposition 3, “Appendix 2”. The pseudocode of ILED’s strategy is provided in Algorithm 3, “Appendix 2”.
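As an illustration of this control flow, the following Python sketch traces the single-pass strategy under two assumptions: the caller supplies covers(H, w), testing whether \(\mathsf {SDEC} \cup H\) accounts for window w, and revise(H, w), returning a revised hypothesis together with any newly generated clauses. It is a reading of the strategy described above, not the paper's Algorithm 3.

```python
# A minimal sketch of the incremental, single-pass revision loop. The helpers
# covers() and revise() are assumed to be supplied by the caller; their names
# and signatures are illustrative, not part of the paper's code.

def incremental_revision(windows, covers, revise):
    E = []          # historical memory of past example windows
    H = set()       # running hypothesis (a set of clauses)
    for w_n in windows:
        if not covers(H, w_n):
            H, new_clauses = revise(H, w_n)
            # Refinements of existing clauses are checked only against their
            # support sets; the historical memory is re-visited (once per past
            # window) only when brand new clauses have been added.
            if new_clauses:
                for w_past in E:
                    if not covers(H, w_past):
                        H, _ = revise(H, w_past)
        E.append(w_n)   # store the window; H now accounts for all of E
        yield H
```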
Support set
In order to define the support set, we use the notion of most-specific clause. Given a set of mode declarations M, a clause C in the mode language \(\mathcal {L}(M)\) (see “Appendix 1” for a formal definition) is most-specific if it does not \(\theta \)-subsume any other clause in \(\mathcal {L}(M)\). \(\theta \)-subsumption is defined below.
Definition 2
(\(\theta \)-subsumption) Clause C \(\theta \)-subsumes clause D, denoted \(C \preceq D\), if there exists a substitution \(\theta \) such that \(head(C)\theta = head(D)\) and \(body(C)\theta \subseteq body(D)\), where \( head(C) \) and \( body(C) \) denote the head and the body of clause C respectively. Program \(\Pi _1\) \(\theta \)-subsumes program \(\Pi _2\) if for each clause \(C\in \Pi _1\) there exists a clause \(D\in \Pi _2\) such that \(C\preceq D\).
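To make Definition 2 concrete, the following Python sketch implements the \(\theta \)-subsumption test for a simple term representation (variables are capitalized strings, compound terms are tuples). The representation and the example clauses, loosely modelled on the activity recognition domain, are illustrative assumptions rather than the paper's code.

```python
# Theta-subsumption (Definition 2) for clauses represented as (head, [body literals]),
# where a literal is a nested tuple whose first element is the predicate/functor,
# variables are strings starting with an uppercase letter, constants are other strings.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, target, theta):
    """Extend substitution theta so that pattern*theta equals target, or return None."""
    if is_var(pattern):
        if pattern in theta:
            return theta if theta[pattern] == target else None
        new = dict(theta)
        new[pattern] = target
        return new
    if isinstance(pattern, tuple) and isinstance(target, tuple) and len(pattern) == len(target):
        for p, t in zip(pattern, target):
            theta = match(p, t, theta)
            if theta is None:
                return None
        return theta
    return theta if pattern == target else None

def subsumes(c, d):
    """True iff clause c theta-subsumes clause d."""
    (head_c, body_c), (head_d, body_d) = c, d
    theta0 = match(head_c, head_d, {})
    if theta0 is None:
        return False

    def cover(body, theta):
        if not body:
            return True
        first, rest = body[0], body[1:]
        for lit in body_d:                 # map the literal onto some body literal of d
            theta1 = match(first, lit, theta)
            if theta1 is not None and cover(rest, theta1):
                return True
        return False

    return cover(list(body_c), theta0)

# Illustrative example: a general initiatedAt clause subsumes a more specific one.
c = (('initiatedAt', ('fighting', 'X', 'Y'), 'T'),
     [('happensAt', ('abrupt', 'X'), 'T')])
d = (('initiatedAt', ('fighting', 'X', 'Y'), 'T'),
     [('happensAt', ('abrupt', 'X'), 'T'), ('close', 'X', 'Y', '24', 'T')])
assert subsumes(c, d) and not subsumes(d, c)
```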
Intuitively, the support set of a clause C is a “bottom program” that consists of most-specific versions of the clauses that disjunctively define the concept captured by C. A formal account is given in Definition 3.
Definition 3
(Support set) Let \(\mathcal {E}\) be the historical memory, M a set of mode declarations, \(\mathcal {L}(M)\) the corresponding mode language of M and \(C\in \mathcal {L}(M)\) a clause. Also, let us denote by \(cov_{\mathcal {E}}(C)\) the coverage of clause C in the historical memory, i.e. \(cov_{\mathcal {E}}(C) = \{e \in \mathcal {E} \ | \ \mathsf {SDEC}\cup C \vDash e\}\). The support set \( C.supp \) of clause C is defined as follows:

\( C.supp = \{D \in \mathcal {L}(M) \ | \ D \ \text {is a most-specific clause of} \ \mathcal {L}(M), \ C \preceq D \ \text {and} \ \mathsf {SDEC} \cup D \vDash e \ \text {for some} \ e \in cov_{\mathcal {E}}(C)\} \)
The support set of clause C is thus defined as the set consisting of one bottom clause (Muggleton 1995) for each example \(e \in cov_{\mathcal {E}}(C) \), i.e. one most-specific clause D of \(\mathcal {L}(M)\) such that \(C \preceq D\) and \(\mathsf {SDEC} \cup D \vDash e\). Assuming no length bounds on hypothesized clauses, each such bottom clause is unique (see footnote 2) and covers at least one example from \( cov_{\mathcal {E}}(C) \); note that since the bottom clauses for a set of examples in \( cov_{\mathcal {E}}(C) \) may coincide (i.e. be \(\theta \)-subsumption equivalent, that is, \(\theta \)-subsume each other), a clause D in \( C.supp \) may cover more than one example from \( cov_{\mathcal {E}}(C) \). Proposition 1 highlights the main property of the support set. The proof is given in “Appendix 2”.
Proposition 1
Let C be a clause in \(\mathcal {L}(M)\). Then (i) \( cov_{\mathcal {E}}(C.supp)=cov_{\mathcal {E}}(C) \), and (ii) \( C.supp \) is the most specific program of \(\mathcal {L}(M)\) for which property (i) holds.
Proposition 1 implies that clause C and its support set \( C.supp \) define a space \(\mathcal {S}\) of specializations of C, each of which is bounded by a most-specific specialization, among those that cover the positive examples that C covers. In other words, for every \(D \in \mathcal {S}\) there is a \( C_s \in C.supp \) such that \(C \preceq D \preceq C_s\) and \(C_s\) covers at least one example from \( cov_{\mathcal {E}}(C) \). Moreover, Proposition 1 ensures that the space \(\mathcal {S}\) contains refinements of clause C that collectively preserve the coverage of C in the historical memory. The purpose of \( C.supp \) is thus to serve as a search space for refinements \(R_{C}\) of clause C for which \( C\preceq R_C \preceq C.supp \) holds. Since such refinements preserve C’s coverage of positive examples, clause C may be refined w.r.t. a window \(w_n\) without the overhead of re-testing the refined program on \(\mathcal {E}\) for completeness. However, to ensure that the support set can indeed be used as a refinement search space, one must ensure that \( C.supp \) always contains such a refinement \(R_C\). This is proved in Proposition 2, “Appendix 2”.
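The condition \( C\preceq R_C \preceq C.supp \) can be checked mechanically. The sketch below does so under a simplifying assumption: all clauses share a common variable naming, so that \(\theta \)-subsumption reduces to equality of heads and inclusion of body literals. Clauses are modelled as (head, frozenset of body literals) pairs; this encoding is illustrative, not the paper's.

```python
# A small sketch of the refinement space discussed above, under the simplifying
# assumption that theta-subsumption reduces to "same head, body-literal inclusion".

def subsumes_simple(c, d):
    # c theta-subsumes d: identical heads and body(c) a subset of body(d)
    return c[0] == d[0] and c[1] <= d[1]

def program_subsumes(p1, p2):
    # Definition 2, lifted to programs: every clause of p1 subsumes some clause of p2.
    return all(any(subsumes_simple(c, d) for d in p2) for c in p1)

def in_refinement_space(C, R, supp):
    # R is a valid refinement of C iff C <= R <= C.supp: every clause of R is a
    # specialization of C, and R (as a program) subsumes the whole support set.
    return all(subsumes_simple(C, r) for r in R) and program_subsumes(R, supp)
```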
The construction of the support set, presented in Algorithm 1, is a process that starts when C is added to the running hypothesis and continues for as long as new example windows arrive. During this process, clause C may be refined or retained, and its support set is updated accordingly. The details of Algorithm 1 are presented in Example 2, which also demonstrates how ILED processes incoming examples and revises hypotheses.
Table 4 Knowledge for Example 2
Example 2
Consider the annotated examples and running hypothesis related to the fighting high-level event from the activity recognition application shown in Table 4. We assume that ILED starts with an empty hypothesis and an empty historical memory, and that \(w_1\) is the first input example window. The currently empty hypothesis does not cover the provided examples, since in \(w_1\) fighting between persons \(id_1\) and \(id_2\) is initiated at time 10 and thus holds at time 11. Hence ILED starts the process of generating an initial hypothesis. In the case of an empty hypothesis, ILED reduces to XHAIL and operates on a Kernel Set of \(w_1\) only. The variabilized Kernel Set in this case will be the single-clause program \(K_1\) presented in Table 4, generated from the corresponding ground clause. Generalizing this Kernel Set yields a minimal hypothesis that covers \(w_1\). One such hypothesis is clause C shown in Table 4. ILED stores \(w_1\) in \(\mathcal {E}\) and initializes the support set of the newly generated clause C as in line 3 of Algorithm 1, by selecting from \(K_1\) the clauses that are \(\theta \)-subsumed by C, in this case, \(K_1\)’s single clause.
Window \(w_2\) arrives next. In \(w_2\), fighting is initiated at time 20 and thus holds at time 21. The running hypothesis correctly accounts for that and thus no revision is required. However, \( C.supp \) does not cover \(w_2\) and unless proper actions are taken, property (i) of Proposition 1 will not hold once \(w_2\) is stored in \(\mathcal {E}\). ILED thus generates a new Kernel Set \(K_2\) from window \(w_2\), as presented in Table 4, and updates \( C.supp \) as shown in lines 7–11 of Algorithm 1. Since C \(\theta \)-subsumes \(K_2\), the latter is added to \( C.supp \), which now becomes \( C.supp = \{K_1,K_2\}\). Now \( cov_{\mathcal {E}}(C.supp) = cov_{\mathcal {E}}(C) \), hence, in effect, \( C.supp \) is a summarization of the coverage of clause C in the historical memory.
Window \(w_3\) arrives next, which has no positive examples for the initiation of fighting. The running hypothesis is revisable in window \(w_3\), since clause C covers a negative example at time 31, by means of initiating the fluent \( fighting(id_1,id_2) \) at time 30. To address the issue, ILED searches \( C.supp \), which now serves as a refinement search space, to find a refinement \(R_{C}\) that rejects the negative example, and moreover \( R_C \preceq C.supp \). Several choices exist for that. For instance, the following program
is such a refinement \(R_C\), since it does not cover the negative example in \(w_3\) and subsumes \( C.supp \). ILED, however, is biased towards minimal theories in terms of the overall number of literals, and would prefer the more compressed refinement \(C_1\), shown in Table 4, which also rejects the negative example in \(w_3\) and subsumes \( C.supp \). Clause \(C_1\) replaces the initial clause C in the running hypothesis. The hypothesis now becomes complete and consistent w.r.t. \(\mathcal {E}\). Note that the hypothesis was refined by local reasoning only, i.e. reasoning within window \(w_3\) and the support set, avoiding a costly look-back in the historical memory. The support set of the new clause \(C_1\) is initialized (line 5 of Algorithm 1), by selecting the subset of the support set of its parent clause that is \(\theta \)-subsumed by \(C_1\). In this case \( C_1 \preceq C.supp = \{K_1,K_2\} \), hence \( C_1.supp = C.supp \). \(\square \)
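The support set bookkeeping that Example 2 walks through (lines 3, 5 and 7–11 of Algorithm 1) can be sketched as follows, reusing the simplified clause encoding and the subsumes_simple test from the earlier sketch; kernel_set(w) and covers(program, w) are assumed, caller-supplied routines, not the paper's API.

```python
# Illustrative reconstruction of the support set maintenance steps of Algorithm 1,
# as described in Example 2. All routine names are assumptions for the sketch.

def init_support(C, kernel):
    # Line 3: a newly added clause keeps the Kernel Set clauses it subsumes.
    return [k for k in kernel if subsumes_simple(C, k)]

def inherit_support(refined, parent_supp):
    # Line 5: a refinement inherits the part of its parent's support set it subsumes.
    return [k for k in parent_supp if subsumes_simple(refined, k)]

def update_support(C, supp, w, kernel_set, covers):
    # Lines 7-11: if C covers the new window but its support set does not,
    # extend the support set with the subsumed clauses of the window's Kernel Set.
    if covers([C], w) and not covers(supp, w):
        supp = supp + [k for k in kernel_set(w) if subsumes_simple(C, k)]
    return supp
```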
The support set of a clause C is a compressed enumeration of the examples that C covers throughout the historical memory. It is compressed because each variabilized clause in the set is expected to encode many examples. In contrast, a ground version of the support set would be a plain enumeration of examples, since in the general case, it would require one ground clause per example. The main advantage of the “lifted” character of the support set over a plain enumeration of the examples is that it requires much less memory to encode the necessary information, an important feature in large-scale (temporal) applications. Moreover, given that training examples are typically characterized by heavy repetition, abstracting away redundant parts of the search space results in a memory structure that is expected to grow in size slowly, allowing for fast search that scales to a large amount of historical data.
Implementing revisions
Table 5 Syntactic transformations performed by ILED
Algorithm 2 presents the revision function of ILED. The input consists of SDEC as background knowledge, a running hypothesis \(H_n\), an example window \(w_n\) and a variabilized Kernel Set \(K_{v}^{w_n}\) of \(w_n\). The clauses of \(K_{v}^{w_n}\) and \(H_n\) are subject to the GeneralizationTransformation and the RefinementTransformation respectively, presented in Table 5. The former is the transformation discussed in Sect. 2.2.1, which turns the Kernel Set into a defeasible program, allowing the construction of new clauses. The RefinementTransformation aims at the refinement of the clauses of \(H_n\) using their support sets. It involves two fresh predicates, \( exception/3 \) and \( use/3 \). For each clause \(D_i \in H_n\) and for each of its support set clauses \({\varGamma }_{i}^{j} \in D_i.supp\), one new clause \( head(D_i) \leftarrow body(D_i) \wedge \mathsf{not }\ exception(i,j,v(head(D_i))) \) is generated, where \( v(head(D_i)) \) is a term that contains the variables of \( head(D_i) \). Then an additional clause \( exception(i,j,v(head(D_i))) \leftarrow use(i,j,k)\wedge \mathsf{not }\ \delta _{i}^{j,k} \) is generated, for each body literal \(\delta _{i}^{j,k} \in {\varGamma }_{i}^{j}\).
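The following sketch mimics this transformation by emitting the transformed clauses as plain logic-program strings. The clause encoding, the rendering of \( v(head(D_i)) \) and the 0-based indices are illustrative assumptions, not the paper's implementation.

```python
# An illustrative generator for the RefinementTransformation described above.
# hypothesis[i] = (head, [body literals]) for clause D_i, given as strings;
# supports[i][j] = (head, [delta literals]) for support clause Gamma_i^j;
# head_vars[i] renders v(head(D_i)), e.g. "vars(X0, X1, T)". Indices are 0-based.

def refinement_transformation(hypothesis, supports, head_vars):
    rules = []
    for i, (head, body) in enumerate(hypothesis):
        for j, (_, deltas) in enumerate(supports[i]):
            exc = f"exception({i},{j},{head_vars[i]})"
            # head(D_i) <- body(D_i), not exception(i, j, v(head(D_i)))
            rules.append(f"{head} :- {', '.join(body + ['not ' + exc])}.")
            # exception(i, j, v(head(D_i))) <- use(i, j, k), not delta_i^{j,k}
            for k, delta in enumerate(deltas):
                rules.append(f"{exc} :- use({i},{j},{k}), not {delta}.")
    return rules
```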
The syntactically transformed clauses are put together in a program \(U(K_{v}^{w_n},H_n)\) (line 1 of Algorithm 2), which is used as a background theory along with SDEC. A minimal set of \( use/2 \) and \( use/3 \) atoms is abduced as a solution to the abductive task \({\varPhi }\) in line 2 of Algorithm 2. Abduced \( use/2 \) atoms are used to construct a set of \( NewClauses \), as discussed in Sect. 2.2.1 (line 5 of Algorithm 2). These new clauses account for some of the examples in \(w_n\), which cannot be covered by existing clauses in \(H_n\). The abduced \( use/3 \) atoms indicate clauses of \(H_n\) that must be refined. From these atoms, a refinement \(R_{D_i}\) is generated for each incorrect clause \(D_i \in H_n\), such that \(D_i \preceq R_{D_i} \preceq D_i.supp\) (line 6 of Algorithm 2). Clauses that lack a corresponding \( use/3 \) atom in the abductive solution are retained (line 7 of Algorithm 2).
The intuition behind refinement generation is as follows: Assume that clause \(D_i\in H_n\) must be refined. This can be achieved by means of the extra clauses generated by the RefinementTransformation. These clauses provide definitions for the exception atom, namely one for each body literal in each clause of \(D_i.supp\). From these clauses, one can satisfy the exception atom by satisfying the complement of the corresponding support set literal and abducing the accompanying \( use/3 \) atom. Since an abductive solution \({\varDelta }\) is minimal, the abduced \( use/3 \) atoms correspond precisely to the clauses that must be refined.
Hence, each inconsistent clause \(D_i\in H_n\) and each \({\varGamma }_{i}^{j} \in D_i.supp \) correspond to a set of abduced \( use/3 \) atoms of the form \(use(i,j,k_1),\ldots ,use(i,j,k_n)\). These atoms indicate that a specialization of \(D_i\) may be generated by adding to the body of \(D_i\) the literals \(\delta _{i}^{j,k_1},\ldots ,\delta _{i}^{j,k_n}\) from \({\varGamma }_{i}^{j}\). Then a refinement \(R_{D_i}\) such that \( D_i \preceq R_{D_i} \preceq D_i.supp \) may be generated by selecting one specialization of clause \(D_i\) from each support set clause in \( D_i.supp \).
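The mapping from abduced \( use/3 \) atoms to specializations can be sketched as follows, again with the simplified clause encoding used in the earlier sketches and with 0-based indices in place of the 1-based use(i, j, k) indices of the text; all names here are illustrative.

```python
# Assemble specializations of the clauses D_i from a set of abduced use/3 atoms.
# hypothesis[i] = (head, frozenset of body literals) for clause D_i;
# supports[i][j] = (head, [delta literals]) for support clause Gamma_i^j;
# use_atoms is a set of (i, j, k) index triples (0-based in this sketch).

def refine_from_use_atoms(hypothesis, supports, use_atoms):
    refinements = {}
    for (i, j, k) in use_atoms:
        head, body = hypothesis[i]
        delta = supports[i][j][1][k]                      # literal delta_i^{j,k} of Gamma_i^j
        _, spec_body = refinements.setdefault((i, j), (head, set(body)))
        spec_body.add(delta)                              # specialize D_i with the support literal
    # The specializations collected per support clause of each refined D_i together form R_{D_i}.
    result = {}
    for (i, j), (head, body) in refinements.items():
        result.setdefault(i, []).append((head, frozenset(body)))
    return result                                         # maps i -> list of specializations
```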
Table 6 Clause refinement by ILED
Example 3
Table 6 illustrates ILED’s refinement process. The annotation contains no positive examples and the running hypothesis consists of a single clause C, with a support set of two clauses. Clause C is inconsistent since it entails two negative examples, namely \( \mathsf{holdsAt }(fighting(id_1,id_2), \ 2) \) and \( \mathsf{holdsAt }(fighting(id_3,id_4), \ 3) \). The program that results from applying the RefinementTransformation to the support set of clause C is presented in Table 6, along with a minimal abductive explanation of the examples, in terms of \( use/3 \) atoms. Atoms use(1, 1, 2) and use(1, 1, 3) correspond respectively to the second and third body literals of the first support set clause, which are added to the body of clause C, resulting in the first specialization presented in Table 6. The third abduced atom use(1, 2, 2) corresponds to the second body literal of the second support set clause, which results in the second specialization in Table 6. Together, these specializations form a refinement of clause C that subsumes \( C.supp \). \(\square \)
Minimal abductive solutions imply that the running hypothesis is minimally revised. Revisions are minimal w.r.t. the length of the clauses in the revised hypothesis, but not w.r.t. the number of clauses, since the refinement strategy described above may result in refinements that include redundant clauses: selecting one specialization from each support set clause to generate a refinement of a clause is sub-optimal, since there may exist other refinements with fewer clauses that also subsume the whole support set, as Example 2 demonstrates. To avoid an unnecessary increase in hypothesis size, the generation of refinements is followed by a “reduction” step (line 8 of Algorithm 2). The ReduceRefined function works as follows. For each refined clause C, it first generates all possible refinements from \( C.supp \). This can be realized with the abductive refinement technique described above; the only difference is that the abductive solver is instructed to find all abductive explanations in terms of \( use/3 \) atoms, instead of a single one. Once all refinements are generated, ReduceRefined searches the revised hypothesis, augmented with all refinements of clause C, to find a reduced set of refinements of C that subsumes \( C.supp \).
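A hedged sketch of the reduction idea follows: given all candidate specializations of a refined clause C, it greedily keeps a small subset that still subsumes every clause of \( C.supp \). The greedy choice and the subsumes_simple test (from the earlier sketch) are illustrative simplifications under the stated assumptions, not the paper's exact search procedure.

```python
# Greedy reduction of candidate specializations, as a simplified stand-in for
# the search performed by ReduceRefined. candidates: all specializations of a
# refined clause C; supp: the clauses of C.supp (same simplified encoding as above).

def reduce_refinements(candidates, supp):
    chosen, uncovered = [], list(supp)
    while uncovered and candidates:
        # Prefer the candidate that subsumes the most still-uncovered support
        # clauses; break ties in favour of shorter (more compressed) clauses.
        best = max(candidates,
                   key=lambda c: (sum(subsumes_simple(c, s) for s in uncovered), -len(c[1])))
        covered = [s for s in uncovered if subsumes_simple(best, s)]
        if not covered:
            break              # no remaining candidate helps; stop reducing
        chosen.append(best)
        uncovered = [s for s in uncovered if s not in covered]
    return chosen
```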