
Learning Prime Implicant Conditions from Interpretation Transition

  • Conference paper
Inductive Logic Programming

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9046)

Abstract

In previous work we proposed a framework for learning normal logic programs from transitions of interpretations. Given a set of pairs of interpretations (I, J) such that \(J=T_P(I)\), where \(T_P\) is the immediate consequence operator, we infer the program P. Here we propose a new learning approach that improves on previous ones in terms of output quality. This new approach relies on specialization in place of generalization: it generates hypotheses by specializing the most general clauses until no negative transition is covered. Contrary to previous approaches, the output of this method does not depend on the ordering of variables or transitions. The new method guarantees that the learned rules are minimal, that is, the body of each rule constitutes a prime implicant to infer the head.
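For intuition, the setting can be sketched as follows. This is an illustrative Python reconstruction, not the authors' implementation; the rule representation and names are ours. A propositional rule is a head atom with positive and negative body atoms, and \(T_P(I)\) collects the heads of the rules whose bodies are satisfied by the interpretation I.

```python
# A minimal sketch of the setting (illustrative only, not the authors' code):
# a propositional normal logic program and its immediate consequence operator.

# A rule "h <- p1, ..., pm, not n1, ..., not nk" is modelled as a triple
# (head, positive body atoms, negative body atoms).
Rule = tuple  # (str, frozenset[str], frozenset[str])

def t_p(program: list, interpretation: frozenset) -> frozenset:
    """One step of the immediate consequence operator T_P."""
    return frozenset(
        head
        for head, pos, neg in program
        if pos <= interpretation and not (neg & interpretation)
    )

# Example program: p <- q.   q <- not p.
program = [("p", frozenset({"q"}), frozenset()),
           ("q", frozenset(), frozenset({"p"}))]
state = frozenset({"q"})
print(sorted(t_p(program, state)))  # ['p', 'q'] : the transition ({q}, {p, q})
```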


References

  1. Apt, K.R., Blair, H.A., Walker, A.: Towards a theory of declarative knowledge. In: Foundations of Deductive Databases and Logic Programming, p. 89. Morgan Kaufmann, USA (1988)

  2. Garcez, A., Zaverucha, G.: The connectionist inductive learning and logic programming system. Appl. Intell. 11(1), 59–77 (1999)

  3. Dubrova, E., Teslenko, M.: A SAT-based algorithm for finding attractors in synchronous Boolean networks. IEEE/ACM Trans. Comput. Biol. Bioinf. 8(5), 1393–1399 (2011)

  4. d’Avila Garcez, A.S., Broda, K., Gabbay, D.: Symbolic knowledge extraction from trained neural networks: a sound approach. Artif. Intell. 125(2), 155–207 (2001). http://www.sciencedirect.com/science/article/pii/S0004370200000771

  5. Gebser, M., Kaminski, R., Kaufmann, B., Schaub, T.: Answer Set Solving in Practice. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan and Claypool Publishers, San Rafael (2012)

  6. Inoue, K.: Logic programming for Boolean networks. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI 2011), vol. 2, pp. 924–930. AAAI Press (2011)

  7. Inoue, K.: DNF hypotheses in explanatory induction. In: Muggleton, S.H., Tamaddoni-Nezhad, A., Lisi, F.A. (eds.) ILP 2011. LNCS, vol. 7207, pp. 173–188. Springer, Heidelberg (2012)

  8. Inoue, K., Ribeiro, T., Sakama, C.: Learning from interpretation transition. Mach. Learn. 94(1), 51–79 (2014)

  9. Inoue, K., Sakama, C.: Oscillating behavior of logic programs. In: Erdem, E., Lee, J., Lierler, Y., Pearce, D. (eds.) Correct Reasoning. LNCS, vol. 7265, pp. 345–362. Springer, Heidelberg (2012)

  10. Kean, A., Tsiknis, G.: An incremental method for generating prime implicants/implicates. J. Symb. Comput. 9(2), 185–206 (1990)

  11. Michalski, R.S.: A theory and methodology of inductive learning. Artif. Intell. 20(2), 111–161 (1983)

  12. Mitchell, T.M.: Generalization as search. Artif. Intell. 18(2), 203–226 (1982)

  13. Muggleton, S., De Raedt, L.: Inductive logic programming: theory and methods. J. Logic Program. 19, 629–679 (1994)

  14. Muggleton, S., De Raedt, L., Poole, D., Bratko, I., Flach, P., Inoue, K., Srinivasan, A.: ILP turns 20. Mach. Learn. 86(1), 3–23 (2012)

  15. Plotkin, G.D.: A note on inductive generalization. Mach. Intell. 5(1), 153–163 (1970)

  16. Ribeiro, T., Inoue, K., Sakama, C.: A BDD-based algorithm for learning from interpretation transition. In: Zaverucha, G., Santos Costa, V., Paes, A. (eds.) ILP 2013. LNCS, vol. 8812, pp. 47–63. Springer, Heidelberg (2014)

  17. Tison, P.: Generalization of consensus theory and application to the minimization of Boolean functions. IEEE Trans. Electron. Comput. 4, 446–456 (1967)

  18. Van Emden, M.H., Kowalski, R.A.: The semantics of predicate logic as a programming language. J. ACM 23(4), 733–742 (1976)


Appendix

A.1 Proof of Theorem 1 (Completeness)

Given a set E of pairs of interpretations, LF1T with full naïve (resp. ground) resolution is complete for E.

Proof

According to Theorem 1 (resp. 2) of [8], LF1T with naïve (resp. ground) resolution is complete for E. Any rule produced by naïve (resp. ground) resolution can also be obtained by full naïve (resp. ground) resolution. Hence, if P and \(P'\) are obtained by naïve (resp. ground) resolution and by full naïve (resp. ground) resolution respectively, then \(P'\) theory-subsumes P. If a program P is complete for E, any program \(P'\) that theory-subsumes P is also complete for E. Since P is complete for E by Theorem 1 of [8], \(P'\) is complete for E. \(\square \)
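The subsumption relations used in this proof can be sketched as follows. This is an illustrative fragment under our own representation, not the paper's code: a rule is a pair of a head and a frozenset of body literals, a literal being an atom "p" or its negation "-p".

```python
# Rule subsumption and theory-subsumption (illustrative sketch; a rule is
# (head, frozenset of body literals), e.g. ("p", frozenset({"q", "-r"}))).

def subsumes(r1, r2) -> bool:
    """R1 subsumes R2 iff both rules share a head and b(R1) ⊆ b(R2)."""
    (h1, b1), (h2, b2) = r1, r2
    return h1 == h2 and b1 <= b2

def theory_subsumes(p1, p2) -> bool:
    """P1 theory-subsumes P2 iff every rule of P2 is subsumed by some rule of P1."""
    return all(any(subsumes(r1, r2) for r1 in p1) for r2 in p2)

# A program with a more general rule theory-subsumes one with a specialization:
p = [("p", frozenset({"q", "r"}))]
p_prime = [("p", frozenset({"q"}))]
print(theory_subsumes(p_prime, p))  # True: p <- q subsumes p <- q, r
```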

A.2 Proof of Theorem 1 (Soundness)

Given a set E of pairs of interpretations, LF1T with full naïve (resp. ground) resolution is sound for E.

Proof

All rules that can be produced by naïve (resp. ground) resolution can be obtained by full naïve (resp. ground) resolution. Since all rules produced by naïve (resp. ground) resolution are sound for E (Corollary 1 (resp. 2) of [8]), full naïve (resp. ground) resolution is sound for E. \(\square \)

A.3 Proof of Theorem 2

Given a set E of pairs of interpretations, LF1T with full naïve (resp. ground) resolution learns the complete prime NLP that realizes E.

Proof

Let us assume that LF1T with full naïve resolution does not learn a prime NLP of E. This assumption implies that there exists a prime rule R for E that cannot be learned by LF1T with full naïve resolution. Let \(\mathcal{B}\) be the Herbrand base of E.

Case 1: \(|b(R)| = |\mathcal{B}|\). Then R is directly inferred from a transition \((I,J) \in E\), which contradicts our assumption.

Case 2: \(|b(R)| < |\mathcal{B}|\). Let l be a literal such that \(l \not \in b(R)\). By our assumption, at least one of the rules \(R_1 := h(R) \leftarrow b(R) \cup \{l\}\) and \(R_2 := h(R) \leftarrow b(R) \cup \{\overline{l}\}\), call it \(R'\), cannot be learned; otherwise \(res(R_1,R_2) = R\) would be produced. Recursively, what applies to R applies to \(R'\), until we reach a rule \(R''\) such that \(|b(R'')| = |\mathcal{B}|\). Our assumption implies that \(R''\) cannot be learned, but \(R''\) is directly inferred from a transition \((I,J) \in E\): a contradiction. Since ground resolution can learn all rules learned by naïve resolution, the proof also applies to LF1T with full ground resolution. \(\square \)
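The resolution step \(res(R_1,R_2)\) used above can be sketched as follows, in the same illustrative representation as the earlier fragment. This is our reading of naïve resolution, where two rules share a head and their bodies differ only on one complementary pair of literals:

```python
# Sketch of a naive resolution step (illustrative; a rule is
# (head, frozenset of literals), a literal "p" or its negation "-p").

def complement(lit: str) -> str:
    return lit[1:] if lit.startswith("-") else "-" + lit

def naive_resolve(r1, r2):
    """If the rules share a head and their bodies differ only on one
    complementary literal l / not l, return the rule with l removed."""
    (h1, b1), (h2, b2) = r1, r2
    if h1 != h2:
        return None
    diff1, diff2 = b1 - b2, b2 - b1
    if len(diff1) == 1 and len(diff2) == 1:
        (l1,), (l2,) = tuple(diff1), tuple(diff2)
        if complement(l1) == l2:
            return (h1, b1 - diff1)
    return None

# res(p <- q, r ; p <- q, not r) = (p <- q)
print(naive_resolve(("p", frozenset({"q", "r"})),
                    ("p", frozenset({"q", "-r"}))))  # ('p', frozenset({'q'}))
```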

A.4 Proof of Theorem 3

Let \(R_1\), \(R_2\) be two rules such that \(b(R_1) \subseteq b(R_2)\). Let \(S_1\) be the set of rules subsumed by \(R_1\) and \(S_2\) be the rules of \(S_1\) that subsume \(R_2\). The least specialization of \(R_1\) by \(R_2\) only subsumes the set of rules \(S_1{\setminus }S_2\).

Proof

According to Definition 9, the least specialization of \(R_1\) by \(R_2\) is as follows:

$$ls(R_1,R_2) = \{h(R_1) \leftarrow (b(R_1) \wedge \lnot b(R_2))\}$$

Every rule R of \(S_2\) subsumes \(R_2\); hence, according to Definition 1, \(b(R) \subseteq b(R_2)\). Suppose \(ls(R_1,R_2)\) subsumes such an R: then there exists \(R' \in ls(R_1,R_2)\) with \(b(R') \subseteq b(R)\). But since \(R' \in ls(R_1,R_2)\), there is an \(l \in b(R_2)\) such that \(\overline{l} \in b(R')\), so that \(b(R') \not \subseteq b(R_2)\). Since \(b(R) \subseteq b(R_2)\), \(b(R') \subseteq b(R)\) would imply \(b(R') \subseteq b(R_2)\), a contradiction. Hence \(R'\) cannot subsume any \(R \in S_2\).

Conclusion 1: the least specialization of \(R_1\) by \(R_2\) cannot subsume any \(R \in S_2\).

Let us suppose there is a rule \(R' \in S_1\) that does not subsume \(R_2\) and is not subsumed by \(ls(R_1,R_2)\). Let \(l_i\) be the \(i^{th}\) literal of \(b(R_2)\); then:

$$ls(R_1,R_2) = \{h(R_1) \leftarrow (b(R_1) \wedge \overline{l_i}) \mid l_i \in b(R_2){\setminus }b(R_1)\} \quad (1)$$

\(R'\) is subsumed by \(R_1\), so that \(R' = h(R_1) \leftarrow b(R_1) \cup S\), with S a set of literals. \(R'\) does not subsume \(R_2\), so there exists an \(l \in b(R_2){\setminus }b(R_1)\) such that \(\overline{l} \in S\). According to (1), the rule \(R'' = h(R_1) \leftarrow b(R_1) \wedge \overline{l}\) is in \(ls(R_1,R_2)\). Since \(R''\) subsumes \(R'\) and \(R'' \in ls(R_1,R_2)\), \(ls(R_1,R_2)\) subsumes \(R'\).

Conclusion 2: the least specialization of \(R_1\) by \(R_2\) subsumes every rule of \(S_1\) that does not subsume \(R_2\).

Final conclusion: the least specialization of \(R_1\) by \(R_2\) only subsumes \(S_1{\setminus }S_2\).\(\square \)
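This operation can be sketched directly from the expanded form (1), in the same illustrative representation as the earlier fragments (a rule as a head plus a frozenset of literals; names are ours):

```python
# Least specialization ls(R1, R2) of R1 by R2, where b(R1) ⊆ b(R2),
# implementing form (1) above (illustrative sketch).

def complement(lit: str) -> str:
    return lit[1:] if lit.startswith("-") else "-" + lit

def least_specialization(r1, r2):
    """One rule h(R1) <- b(R1) ∧ ¬l_i per literal l_i in b(R2) \\ b(R1)."""
    (h1, b1), (_, b2) = r1, r2
    return [(h1, b1 | {complement(lit)}) for lit in b2 - b1]

# ls(p <- q ; p <- q, r, not s) = { p <- q, not r ;  p <- q, s }
for rule in least_specialization(("p", frozenset({"q"})),
                                 ("p", frozenset({"q", "r", "-s"}))):
    print(rule)
```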

Now, let P be an NLP and R be a rule such that P subsumes R. Let \(S_P\) be the set of rules subsumed by P and \(S_R\) be the rules of \(S_P\) that subsume R. The least specialization of P by R only subsumes the set of rules \(S_P{\setminus }S_R\).

Proof

According to Definition 5, the least specialization \(ls(P,R)\) of P by R is as follows:

$$ls(P,R) = (P{\setminus }S_P) \cup (\underset{R_P \in S_P}{\bigcup }{ls(R_P,R)})$$

For any rule \(R_P \in S_P\), let \(S_{R_P}\) be the set of rules subsumed by \(R_P\) and let \(S_{R_{P}2} \subseteq S_R\) be the set of rules of \(S_{R_P}\) that subsume R.

According to Theorem 3, the least specialization of \(R_P\) by R only subsumes \(S_{R_P}{\setminus }S_{R_{P}2}\). Hence \(\underset{R_P \in S_P}{\bigcup }{ls(R_P,R)}\) only subsumes \(\underset{R_P \in S_P}{\bigcup }(S_{R_P}{\setminus }S_{R_{P}2}) = (\underset{R_P \in S_P}{\bigcup }S_{R_P}){\setminus }S_R\). Then \(ls(P,R)\) only subsumes the rules subsumed by \((P{\setminus }S_P) \cup ((\underset{R_P \in S_P}{\bigcup }S_{R_P}){\setminus }S_R)\), that is, \(S_P{\setminus }S_R\).

Conclusion: The least specialization of P by R only subsumes \(S_P{\setminus }S_R\). \(\square \)
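The program-level operation follows the formula above directly. In this sketch (which reuses subsumes and least_specialization from the previous fragments), we read the \(S_P\) of the formula as the rules of P that subsume R, i.e., the rules that must be specialized:

```python
# Program-level least specialization ls(P, R): rules of P that subsume R are
# replaced by their least specialization, the others are kept unchanged
# (illustrative sketch reusing the helpers defined earlier).

def program_least_specialization(program, r):
    result = []
    for rp in program:
        if subsumes(rp, r):          # rp must be specialized away from R
            result.extend(least_specialization(rp, r))
        else:                        # rp does not subsume R: kept as-is
            result.append(rp)
    return result
```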

A.5 Proof of Theorem 4

Let \(P^\mathcal{B}_0\) be the most general complete prime NLP of a given Herbrand base \(\mathcal{B}\), i.e., the NLP that contains only facts:

$$P^\mathcal{B}_0 = \{p. | p \in \mathcal{B}\}$$

Initializing LF1T with \(P^\mathcal{B}_0\), by using least specialization iteratively on the transitions of a set of state transitions E, LF1T learns an NLP P that realizes E.

Proof

Let P be an NLP consistent with a set of transitions \(E'\), let \(S_P\) be the set of rules subsumed by P, and let (I,J) be a state transition such that \(E' \subset E\), \((I,J) \in E\) and \((I,J) \not \in E'\). According to Theorem 3, for any rule \(R^I_A\) inferred by LF1T from (I,J) that is subsumed by P, the least specialization \(ls(P,R^I_A)\) of P by \(R^I_A\) subsumes exactly the rules subsumed by P except the ones subsumed by \(R^I_A\). Since \(|b(R^I_A)| = |\mathcal{B}|\), \(R^I_A\) only subsumes itself, so that \(ls(P,R^I_A)\) subsumes exactly \(S_P{\setminus }\{R^I_A\}\). Let \(P'\) be the NLP obtained by least specialization of P with all the rules \(R^I_A\) that can be inferred from (I,J); then \(P'\) is consistent with \(E' \cup \{(I,J)\}\).

Conclusion 1: LF1T preserves the consistency of the learned NLP.

LF1T starts with \(P^\mathcal{B}_0\) as its initial NLP. \(P^\mathcal{B}_0\) is at least consistent with \(\emptyset \subseteq E\). By Conclusion 1, initializing LF1T with \(P^\mathcal{B}_0\) and applying least specialization iteratively on the elements of E whenever needed, LF1T learns an NLP that realizes E. \(\square \)
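Putting the pieces together, the iteration of Theorem 4 can be sketched as below, reusing program_least_specialization from the previous fragment. Subsumption pruning is omitted, so non-prime rules may remain; Theorem 5 additionally requires keeping only prime rules.

```python
# Sketch of the LF1T loop with least specialization (illustrative only).

def lf1t_specialize(transitions, atoms):
    # P_0: the most general program, one fact per atom of the Herbrand base.
    program = [(a, frozenset()) for a in atoms]
    for i, j in transitions:  # i, j are states, i.e. sets of atoms
        # Body of the rules R^I_A: the full description of state i (|body| = |B|).
        body = frozenset(a if a in i else "-" + a for a in atoms)
        for a in atoms:
            if a not in j:    # A not in J: no rule of head A may fire in i
                program = program_least_specialization(program, (a, body))
    return program

# Transitions of {p <- q.  q <- not p.} over B = {p, q}:
E = [(set(), {"q"}), ({"q"}, {"p", "q"}), ({"p"}, set()), ({"p", "q"}, {"p"})]
for rule in lf1t_specialize(E, ["p", "q"]):
    print(rule)   # rules consistent with E, e.g. q <- not p (plus subsumed ones)
```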

A.6 Proof of Theorem 5

Let \(P^\mathcal{B}_0\) be the most general complete prime NLP of a given Herbrand base \(\mathcal{B}\), i.e., the NLP that contains only facts:

$$P^\mathcal{B}_0 = \{p. | p \in \mathcal{B}\}$$

Initializing LF1T with \(P^\mathcal{B}_0\), by using least specialization iteratively on a set of state transitions E, LF1T learns the complete prime NLP of E.

Proof

Let us assume that LF1T with least specialization does not learn a prime NLP of E. Under this assumption, according to Theorem 4, LF1T learns an NLP P that is consistent with E but is not the complete prime NLP of E. LF1T starts with \(P^\mathcal{B}_0\) as its initial NLP, and \(P^\mathcal{B}_0\) is the most general complete prime NLP that can cover E.

Consequence 1: under this assumption, LF1T with least specialization can transform a complete prime NLP into an NLP that is not a complete prime NLP.

Let P be the complete prime NLP of a set of state transitions \(E' \subset E\) and let \((I,J) \not \in E'\) be a transition such that P is not consistent with (I,J). Our assumption implies that the least specialization \(P'\) of P by the rules inferred from (I,J) is not the complete prime NLP of \(E'\cup \{(I,J)\}\). According to Definition 7, there are two possibilities:

  • Case 1: \(\exists R \in P'\) such that R is not a prime rule of \(E'\cup \{(I,J)\}\).

  • Case 2: \(\exists R' \not \in P'\) such that \(R'\) is a prime rule of \(E'\cup \{(I,J)\}\).

Case 1.1: If \(R \in P\), then R is a prime rule of \(E'\) and R is consistent with (I,J); otherwise R would have been specialized. Since R is not a prime rule of \(E' \cup \{(I,J)\}\), there exists a rule \(R_m\) consistent with \(E' \cup \{(I,J)\}\) that is more general than R, i.e., \(b(R_m) \subset b(R)\). Then \(R_m\) is also consistent with \(E'\); but since R is a prime rule of \(E'\), there is no rule consistent with \(E'\) that is more general than R. This is a contradiction.

Case 1.2: Now suppose that \(R \not \in P\); then R has been obtained by least specialization of a rule \(R_P \in P\) by a rule inferred from (I,J). This implies that there exists \(l \in b(R)\) with \(\overline{l} \in I\). If R is not a prime rule of \(E'\cup \{(I,J)\}\), there exists a prime rule \(R_m\) of \(E'\cup \{(I,J)\}\) that is more general than R. Then \(l \in b(R_m)\); otherwise \(R_m\) would not be consistent with (I,J), because it would also subsume \(R_P\), which is not consistent with (I,J). Since \(R_m\) is consistent with \(E' \cup \{(I,J)\}\), it is also consistent with \(E'\). This implies that there exists a prime rule \(R_m'\) of \(E'\) that subsumes \(R_m\) (possibly \(R_m\) itself); \(R_m'\) also subsumes R. Since P is the complete prime NLP of \(E'\), \(R_m' \in P\).

Case 1.2.1: Suppose that \(l \not \in b(R_m')\). Since \(l \in b(R)\) and \(R_m'\) subsumes R, \(R_m'\) subsumes \(R_P\), because \(R = h(R_P) \leftarrow b(R_P) \cup \{l\}\). But since \(R_P\) is a prime rule of \(E'\), this implies \(R_m' = R_P\). In that case \(R_P\) subsumes \(R_m\), and since \(l \in b(R_m)\), \(h(R_P) \leftarrow b(R_P) \cup \{l\}\) also subsumes \(R_m\). Since \(h(R_P) \leftarrow b(R_P) \cup \{l\}\) is R, R subsumes \(R_m\), so \(R_m\) can neither be more general than R nor a prime rule of \(E' \cup \{(I,J)\}\). This contradicts the choice of \(R_m\) in Case 1.2.

Case 1.2.2: Finally, suppose that \(l \in b(R_m')\). Since \(\overline{l} \in I\), \(R_m'\) cannot fire in I and is therefore consistent with (I,J); being also consistent with \(E'\), \(R_m'\) is consistent with \(E' \cup \{(I,J)\}\). But \(R_m'\) subsumes \(R_m\), and since \(R_m\) is a prime rule of \(E' \cup \{(I,J)\}\), this implies \(R_m' = R_m\). In that case \(R_m \in P\), and because \(R_m\) is consistent with (I,J) and \(R_m\) subsumes R, LF1T will not add R to \(P'\). This contradicts the assumption of Case 1.

Case 2: Consider now that there exists \(R' \not \in P'\) such that \(R'\) is a prime rule of \(E'\cup \{(I,J)\}\). Since \(R' \not \in P'\), \(R' \not \in P\), and \(R'\) is not a prime rule of \(E'\), since P is the complete prime NLP of \(E'\). Then there exists a prime rule \(R_m \in P\) of \(E'\) such that \(R_m\) subsumes \(R'\), and \(R_m \not \in P'\) since \(R'\) is a prime rule of \(E' \cup \{(I,J)\}\). Then \(b(R') = b(R_m) \cup S\), with S a non-empty set of literals such that for all \(l \in S\), \(l \not \in b(R_m)\). Since \(R_m \not \in P'\), there is a rule \(R^I_{h(R_m)}\) that can be inferred from (I,J) and is subsumed by \(R_m\). Moreover, no rule \(R_m' \in ls(R_m,R^I_{h(R_m)})\) subsumes \(R'\), since \(R'\) is a prime rule of \(E' \cup \{(I,J)\}\). Then, for all \(l' \in b(R^I_{h(R_m)})\), \(\overline{l'} \not \in b(R')\); otherwise some \(R_m'\) would subsume \(R'\). Since \(|b(R^I_{h(R_m)})| = |\mathcal{B}|\), \(b(R')\) cannot contain a literal that is not in \(b(R^I_{h(R_m)})\), so that \(R'\) subsumes \(R^I_{h(R_m)}\). Hence \(R'\) is not consistent with (I,J) and cannot be a prime rule of \(E' \cup \{(I,J)\}\): a contradiction.

Conclusion: If P is the complete prime NLP of \(E' \subset E\), then for any \((I,J) \in E\), LF1T with least specialization learns the complete prime NLP \(P'\) of \(E' \cup \{(I,J)\}\). Since LF1T starts with the complete prime NLP \(P^\mathcal{B}_0\), by Theorem 4 LF1T learns an NLP consistent with E, and the preceding argument implies that this NLP is the complete prime NLP of E, since LF1T cannot specialize a complete prime NLP into an NLP that is not a complete prime NLP.    \(\square \)

A.7 Proof of Theorem 6

Let n be the size of the Herbrand base \(|\mathcal{B}|\). Using least specialization, the memory complexity of LF1T remains of the same order as that of the previous algorithms based on ground resolution, i.e., \(O(2^n)\). However, the computational complexity of LF1T with least specialization is higher than that of the previous algorithms based on ground resolution: \(O(n \cdot 4^n)\) versus \(O(4^n)\), respectively. The same complexity results hold for full naïve (resp. ground) resolution.

Proof

Let n be the size of the Herbrand base \(|\mathcal{B}|\) of a set of state transitions E. This n is also the number of possible rule heads. Furthermore, n is the maximum size of a rule, i.e., the number of literals in its body: a literal can appear at most once in the body of a rule. For each head there are \(3^n\) possible bodies: each literal can be positive, negative or absent from the body. From these preliminaries we conclude that the size of an NLP |P| learned by LF1T from E is at most \(n\cdot 3^n\). But since an NLP P learned by LF1T contains only prime rules of E, |P| cannot exceed \(n\cdot 2^n\): in the worst case, P contains only rules of size n in which all literals appear, and there are only \(n\cdot 2^n\) such rules. If P contains a rule with m literals (\(m<n\)), this rule subsumes \(2^{n-m}\) rules which cannot appear in P. Finally, least specialization also ensures that P does not contain any pair of complementary rules, so that the bound is further divided by n; that is, |P| is bounded by \(O(\frac{n\cdot 2^n}{n})=O(2^n)\).

When LF1T infers a rule \(R^{I}_A\) from a transition \((I,J) \in E\) with \(A \not \in J\), it has to compare it with all rules in P to extract the ones that need to be specialized. This operation has a complexity of \(O(|P|) = O(2^n)\). Since \(|b(R^{I}_A)| = n\), according to Definition 5 the least specialization of a rule \(R \in P\) can generate at most n different rules. In the worst case, all rules of P with \(h(R^{I}_A)\) as head subsume \(R^{I}_A\). There are possibly \(2^n/n\) such rules in P, so that LF1T generates at most \(2^n\) rules for each \(R^{I}_A\). For each \((I,J) \in E\), LF1T can infer at most n rules \(R^{I}_A\). In the worst case, LF1T generates \(n \cdot 2^n\) rules that are compared with the \(2^n\) rules of P. Thus, constructing an NLP that realizes E requires \(n \cdot 2^n \cdot 2^n = n \cdot 4^n\) operations. The same proof applies to LF1T with full naïve (resp. ground) resolution, where LF1T infers a rule \(R^{I}_A\) from a transition \((I,J) \in E\) with \(A \in J\). The complexity of learning an NLP from a complete set of state transitions over a Herbrand base of size n is therefore \(O(n \cdot 4^n)\). \(\square \)
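The counting at the start of the proof is easy to check by brute force for a small n; the following toy enumeration (ours, purely illustrative) verifies the \(3^n\) bodies-per-head and \(n \cdot 2^n\) maximal-size-rules figures:

```python
# A small enumeration check of the counting argument (illustration only):
# 3^n possible bodies per head, and 2^n bodies of maximal size n.
from itertools import product

n = 3  # size of the Herbrand base in this toy check

bodies = list(product(("pos", "neg", "absent"), repeat=n))
assert len(bodies) == 3 ** n            # each atom: positive, negative or absent

full_bodies = [b for b in bodies if "absent" not in b]
assert len(full_bodies) == 2 ** n       # rules of size n: every atom appears

print(f"{n} heads x {3 ** n} bodies = {n * 3 ** n} candidate rules")
print(f"rules of maximal size: n * 2^n = {n * 2 ** n}")
```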


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ribeiro, T., Inoue, K. (2015). Learning Prime Implicant Conditions from Interpretation Transition. In: Davis, J., Ramon, J. (eds.) Inductive Logic Programming. Lecture Notes in Computer Science (LNAI), vol. 9046. Springer, Cham. https://doi.org/10.1007/978-3-319-23708-4_8


  • DOI: https://doi.org/10.1007/978-3-319-23708-4_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23707-7

  • Online ISBN: 978-3-319-23708-4
