
Lifted discriminative learning of probabilistic logic programs

  • Arnaud Nguembang Fadja
  • Fabrizio Riguzzi
Part of the following topical collections:
  1. Special Issue on Inductive Logic Programming (ILP)

Abstract

Probabilistic logic programming (PLP) provides a powerful tool for reasoning with uncertain relational models. However, learning probabilistic logic programs is expensive because of the high cost of inference. Among the proposals to overcome this problem, one of the most promising is lifted inference. In this paper we consider PLP models that are amenable to lifted inference and present an algorithm for performing parameter and structure learning of these models from positive and negative examples. We discuss parameter learning with EM and LBFGS, and structure learning with LIFTCOVER, an algorithm similar to SLIPCOVER. A comparison of LIFTCOVER with SLIPCOVER on 12 datasets shows that LIFTCOVER achieves solutions of similar or better quality in a fraction of the time.
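As an illustration of the parameter-learning task mentioned in the abstract, the sketch below fits the clause probabilities of a liftable PLP model by maximizing the likelihood of positive and negative examples with L-BFGS. It assumes a noisy-OR style model in which an example covered by m_i groundings of clause C_i is true with probability 1 - prod_i (1 - pi_i)^(m_i); this formula, and all names such as fit_parameters, M_pos and M_neg, are illustrative assumptions rather than notation or code taken from the paper. An EM update could be derived from the same likelihood; only the LBFGS route is sketched here.

```python
# Minimal sketch: fitting clause probabilities of a liftable PLP model by
# maximizing the likelihood of positive/negative examples with L-BFGS.
# Assumed model (noisy-OR over clause groundings): an example e covered by
# m_i groundings of clause C_i is true with probability
#     P(e) = 1 - prod_i (1 - pi_i)^(m_i)
# M_pos / M_neg hold the counts m_i for positive / negative examples.
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(p, M_pos, M_neg):
    log_not = np.log1p(-p)                   # log(1 - pi_i) for each clause
    q_pos = M_pos @ log_not                  # log P(e is false) for positive examples
    q_neg = M_neg @ log_not                  # log P(e is false) for negative examples
    q_pos = np.minimum(q_pos, -1e-12)        # avoid log(0) when no grounding covers e
    ll = np.log1p(-np.exp(q_pos)).sum() + q_neg.sum()
    return -ll

def fit_parameters(M_pos, M_neg, n_clauses):
    p0 = np.full(n_clauses, 0.5)             # uninformative starting point
    bounds = [(1e-6, 1 - 1e-6)] * n_clauses  # keep probabilities strictly in (0, 1)
    res = minimize(neg_log_likelihood, p0, args=(M_pos, M_neg),
                   method="L-BFGS-B", bounds=bounds)
    return res.x

if __name__ == "__main__":
    # Toy data: 2 clauses; rows are examples, columns the grounding counts m_i.
    M_pos = np.array([[2, 0], [1, 1], [0, 3]], dtype=float)
    M_neg = np.array([[0, 1], [1, 0]], dtype=float)
    print("estimated clause probabilities:", fit_parameters(M_pos, M_neg, 2))
```

Bounding the probabilities away from 0 and 1 keeps the log-likelihood finite; a logistic reparameterization of the parameters would be an equally valid choice.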

Keywords

  • Statistical relational learning
  • Probabilistic inductive logic programming
  • Probabilistic logic programming
  • Lifted inference
  • Expectation maximization

Acknowledgements

This work was supported by the “National Group of Computing Science (GNCS-INDAM)” and by Regione Emilia Romagna under the Piano triennale alte competenze—POR FSE 2014/2020 Obiettivo tematico 10.

Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Dipartimento di Ingegneria, University of Ferrara, Ferrara, Italy
  2. Dipartimento di Matematica e Informatica, University of Ferrara, Ferrara, Italy
