Advertisement

Machine Learning

, Volume 106, Issue 12, pp 1933–1969 | Cite as

kProbLog: an algebraic Prolog for machine learning

Article

Abstract

We introduce kProbLog as a declarative logical language for machine learning. kProbLog is a simple algebraic extension of Prolog with facts and rules annotated by semi-ring labels. It allows to elegantly combine algebraic expressions with logic programs. We introduce the semantics of kProbLog, its inference algorithm, its implementation and provide convergence guarantees. We provide several code examples to illustrate its potential for a wide range of machine learning techniques. In particular, we show the encodings of state-of-the-art graph kernels such as Weisfeiler-Lehman graph kernels, propagation kernels and an instance of graph invariant kernels, a recent framework for graph kernels with continuous attributes. However, kProbLog is not limited to kernel methods and it can concisely express declarative formulations of tensor-based algorithms such as matrix factorization and energy-based models, and it can exploit semirings of dual numbers to perform algorithmic differentiation. Furthermore, experiments show that kProbLog is not only of theoretical interest, but can also be applied to real-world datasets. At the technical level, kProbLog extends aProbLog (an algebraic Prolog) by allowing multiple semirings to coexist in a single program and by introducing meta-functions for manipulating algebraic values.

Keywords

Algebraic Prolog Kernel programming Graph kernels Machine learning 

Notes

Acknowledgements

We would like to thank Angelika Kimmig and Anton Dries for the fruitful discussions about ProbLog.

References

  1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving G, Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens J, Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke M, Yu, Y., & Zheng, X. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. http://tensorflow.org/, software available from tensorflow.org.
  2. Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow IJ, Bergeron, A., Bouchard, N., & Bengio, Y. (2012). Theano: New features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop.Google Scholar
  3. Baydin, A. G., Pearlmutter, B. A., Radul, A. A., & Siskind, J. M. (2015). Automatic differentiation in machine learning: A survey. arXiv:1502.05767
  4. Bryant, R. E. (1992). Symbolic boolean manipulation with ordered binary-decision diagrams. ACM Computing Surveys, 24(3), 293–318.MathSciNetCrossRefGoogle Scholar
  5. Ceri, S., Gottlob, G., & Tanca, L. (1989). What you always wanted to know about datalog (and never dared to ask). IEEE Transactions on Knowledge and Data Engineering, 1(1), 146–166. doi: 10.1109/69.43410.CrossRefGoogle Scholar
  6. Collobert, R., Bengio, S., & Mariéthoz, J. (2002). Torch: A modular machine learning software library. IDIAP: Tech. rep.Google Scholar
  7. Costa, F., & De Grave, K. (2010). Fast neighborhood subgraph pairwise distance kernel. In Proceedings of the 27th international conference on machine learning (ICML-10), June 21–24, Haifa, Israel, pp. 255–262, http://www.icml2010.org/papers/347.pdf
  8. Darwiche, A. (2011). SDD: A new canonical representation of propositional knowledge bases. In IJCAI 2011, Proceedings of the 22nd international joint conference on artificial intelligence, Barcelona, Catalonia, Spain, July 16–22, pp. 819–826. doi: 10.5591/978-1-57735-516-8/IJCAI11-143
  9. Darwiche, A., & Marquis, P. (2002). A knowledge compilation map. Journal of Artificial Intelligence Research, 17(1), 229–264.MathSciNetMATHGoogle Scholar
  10. De Marneffe, M. C., & Manning, C. D. (2008). The stanford typed dependencies representation. InColing 2008: Proceedings of the workshop on cross-framework and cross-domain parser evaluation, Association for Computational Linguistics, pp. 1–8.Google Scholar
  11. De Raedt, L. (2008). Logical and relational learning. Berlin: Springer.CrossRefMATHGoogle Scholar
  12. De Raedt, L., Kimmig, A., Toivonen, H. (2007). Problog: A probabilistic prolog and its application in link discovery. In IJCAI 2007, proceedings of the 20th international joint conference on artificial intelligence, Hyderabad, India, January 6–12, pp. 2462–2467, http://ijcai.org/Proceedings/07/Papers/396.pdf
  13. De Raedt, L., Kersting, K., Natarajan, S., & Poole, D. (2016). Statistical relational artificial intelligence: Logic, probability, and computation. Synthesis Lectures on Artificial Intelligence and Machine Learning, 10(2), 1–189.CrossRefMATHGoogle Scholar
  14. Debnath, A. K., Lopez de Compadre, R. L., Debnath, G., Shusterman, A. J., & Hansch, C. (1991). Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. Journal of Medicinal Chemistry, 34(2), 786–797.CrossRefGoogle Scholar
  15. Droste, M., & Kuich, W. (2009). Semirings and formal power series. Berlin: Springer.CrossRefGoogle Scholar
  16. Eisner, J. (2002). Parameter estimation for probabilistic finite-state transducers. In Proceedings of the 40th annual meeting on association for computational linguistics, Association for Computational Linguistics, pp. 1–8.Google Scholar
  17. Eisner, J., Blatz, J. (2007). Program transformations for optimization of parsing algorithms and other weighted logic programs. In Proceedings of formal grammar, pp. 45–85.Google Scholar
  18. Eisner, J., & Filardo, N. W. (2011). Dyna: Extending datalog for modern AI. In Datalog reloaded. Springer, pp. 181–220.Google Scholar
  19. Eisner, J., Goldlust, E., & Smith, N. A. (2004). Dyna: A declarative language for implementing dynamic programs. In Proceedings of the 42nd annual meeting of the association for computational linguistics (ACL), Companion Volume, Barcelona, pp. 218–221.Google Scholar
  20. Esparza, J., Luttenberger, M., & Schlund, M. (2014). Fpsolve: A generic solver for fixpoint equations over semirings. In Proceedings of 19th international conference implementation and application of automata, CIAA 2014, Giessen, Germany, July 30–August 2, pp. 1–15. doi: 10.1007/978-3-319-08846-4_1.
  21. Frasconi, P., Costa, F., De Raedt, L., & De Grave, K. (2014). klog: A language for logical and relational learning with kernels. Artificial Intelligence, 217, 117–143. doi: 10.1016/j.artint.2014.08.003.CrossRefMATHGoogle Scholar
  22. Garcez, Ad., Besold, T. R., de Raedt, L., Földiak, P., Hitzler, P., Icard, T., Kühnberger, K. U., Lamb, L. C., Miikkulainen, R., & Silver, D. L. (2015). Neural-symbolic learning and reasoning: contributions and challenges. In Proceedings of the AAAI spring symposium on knowledge representation and reasoning: Iintegrating symbolic and neural approaches, Stanford.Google Scholar
  23. Garcez, A. S., Lamb, L. C., & Gabbay, D. M. (2008). Neural-symbolic cognitive reasoning. Berlin: Springer.MATHGoogle Scholar
  24. Gärtner, T., Flach, P., & Wrobel, S. (2003). On graph kernels: Hardness results and efficient alternatives. In Learning Theory and Kernel Machines. Springer, pp. 129–143Google Scholar
  25. Gärtner, T., Lloyd, J. W., & Flach, P. A. (2004). Kernels and distances for structured data. Machine Learning, 57(3), 205–232.CrossRefMATHGoogle Scholar
  26. Getoor, L., & Taskar, B. (Eds.). (2007). Introduction to statistical relational learning. Adaptive computation and machine learning. Cambridge, MA: MIT Press.MATHGoogle Scholar
  27. Golub, G. H., & Van Loan, C. F. (2012). Matrix computations (Vol. 3). Baltimore: JHU Press.MATHGoogle Scholar
  28. Green, T. J., Karvounarakis, G., & Tannen, V. (2007). Provenance semirings. In Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. ACM.Google Scholar
  29. Griewank, A., & Walther, A. (2008). Evaluating derivatives: Principles and techniques of algorithmic differentiation (2nd ed.). Philadelphia: Society for Industrial and Applied Mathematics.Google Scholar
  30. Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels between labeled graphs. ICML, 3, 321–328.Google Scholar
  31. Kazius, J., McGuire, R., & Bursi, R. (2005). Derivation and validation of toxicophores for mutagenicity prediction. Journal of Medicinal Chemistry, 48(1), 312–320.Google Scholar
  32. Kim, M., & Candan, K. S. (2011). Approximate tensor decomposition within a tensor-relational algebraic framework. In Proceedings of the 20th ACM conference on information and knowledge management, CIKM 2011, Glasgow, United Kingdom, October 24–28, pp. 1737–1742. doi: 10.1145/2063576.2063827
  33. Kimmig, A., Van den Broeck, G., & De Raedt, L. (2011). An algebraic prolog for reasoning about possible worlds. In Proceedings of the twenty-fifth AAAI conference on artificial intelligence, AAAI 2011, San Francisco, California, USA, August 7–11, http://www.aaai.org/ocs/index.php/AAAI/AAAI11/paper/view/3685.
  34. Kimmig, A., Van den Broeck, G., & De Raedt, L. (2012). Algebraic model counting. CoRR abs/1211.4475, arXiv:1211.4475
  35. Kisa, D., Van den Broeck, G., Choi, A., & Darwiche, A. (2014). Probabilistic sentential decision diagrams. In Proceedings of the 14th international conference on principles of knowledge representation and reasoning (KR).Google Scholar
  36. Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30–37.Google Scholar
  37. Kuich, W. (1997). Semirings and formal power series: Their relevance to formal languages and automata (pp. 609–677). In Handbook of formal languages: Springer.Google Scholar
  38. Landwehr, N., Passerini, A., De Raedt, L., & Frasconi, P. (2006). kfoil: Learning simple relational kernels, pp. 389–394.Google Scholar
  39. Li, X., & Roth, D. (2002). Learning question classifiers. In Proceedings of the 19th international conference on computational linguistics–Volume 1, Association for Computational Linguistics.Google Scholar
  40. Mahé, P., Ueda, N., Akutsu, T., Perret, J. L., & Vert, J. P. (2004). Extensions of marginalized graph kernels. In Proceedings of the twenty-first international conference on machine learning, ACM, p. 70.Google Scholar
  41. Milch, B., Marthi, B., Russell, S. J., Sontag, D., Ong, D. L., & Kolobov, A. (2005). BLOG: Probabilistic models with unknown objects, pp. 1352–1359, http://ijcai.org/Proceedings/05/Papers/1546.pdf.
  42. Muggleton, S., Raedt, L. D., Poole, D., Bratko, I., Flach, P. A., Inoue, K., et al. (2012). ILP turns 20–Biography and future challenges. Machine Learning, 86(1), 3–23. doi: 10.1007/s10994-011-5259-2.MathSciNetCrossRefMATHGoogle Scholar
  43. Neumann, M., Patricia, N., Garnett, R., & Kersting, K. (2012). Efficient graph kernels by randomization. In Proceedings of machine learning and knowledge discovery in databases—European conference, ECML PKDD 2012, Bristol, UK, September 24–28. Part I, pp. 378–393. doi: 10.1007/978-3-642-33460-3_30.
  44. Nickel, M., Tresp, V., & Kriegel, H. P. (2011). A three-way model for collective learning on multi-relational data. In Proceedings of the 28th international conference on machine learning (ICML-11), pp. 809–816.Google Scholar
  45. Nilsson, U., & Maluszynski, J. (1990). Logic, programming and Prolog. Chichester: Wiley.MATHGoogle Scholar
  46. Orsini, F., Frasconi, P., & De Raedt, L. (2015). Graph invariant kernels. In Proceedings of the twenty-fourth international joint conference on artificial intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25–31, pp. 3756–3762, http://ijcai.org/Abstract/15/528.
  47. Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning, 5(3), 239–266.Google Scholar
  48. Richardson, M., & Domingos, P. M. (2006). Markov logic networks. Machine Learning, 62(1–2), 107–136. doi: 10.1007/s10994-006-5833-1.CrossRefGoogle Scholar
  49. Sammut, C. (1993). The origins of inductive logic programming: A prehistoric tale. In Proceedings of the 3rd international workshop on inductive logic programming. J Stefan Institute, pp. 127–147.Google Scholar
  50. Sato, T. (1995). A statistical learning method for logic programs with distribution semantics. In Logic programming, proceedings of the twelfth international conference on logic programming, Tokyo, Japan, June 13–16, pp. 715–729.Google Scholar
  51. Sato, T., & Kameya, Y. (1997). PRISM: A language for symbolic-statistical modeling. In Proceedings of the fifteenth international joint conference on artificial intelligence, IJCAI 97, Nagoya, Japan, August 23–29, Vol. 2, pp. 1330–1339. http://ijcai.org/Proceedings/97-2/Papers/078.pdf.
  52. Shervashidze, N., Schweitzer, P., van Leeuwen, E. J., Mehlhorn, K., & Borgwardt, K. M. (2011). Weisfeiler-lehman graph kernels. Journal of Machine Learning Research 12:2539–2561, http://dl.acm.org/citation.cfm?id=2078187.
  53. van Emden, M. H., & Kowalski, R. A. (1976). The semantics of predicate logic as a programming language. Journal of the ACM, 23(4), 733–742. doi: 10.1145/321978.321991.MathSciNetCrossRefMATHGoogle Scholar
  54. Van Laer, W., & De Raedt, L. (2001). How to upgrade propositional learners to first order logic: A case study. In Machine learning and its applications. Springer, pp. 102–126.Google Scholar
  55. Vishwanathan, S. V. N., Schraudolph, N. N., Kondor, R., & Borgwardt, K. M. (2010). Graph kernels. The Journal of Machine Learning Research, 11, 1201–1242.MathSciNetMATHGoogle Scholar
  56. Vlasselaer, J., Van den Broeck, G., Kimmig, A., Meert, W., & De Raedt, L. (2015). Anytime inference in probabilistic logic programs with tp-compilation. In Proceedings of the twenty-fourth international joint conference on artificial intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25–31, pp. 1852–1858, http://ijcai.org/Abstract/15/263.
  57. Whaley, J., Avots, D., Carbin, M., & Lam, M. S. (2005). Using datalog with binary decision diagrams for program analysis. In Programming languages and systems: Springer.CrossRefMATHGoogle Scholar
  58. Zhang, D., & Lee, W. S. (2003). Question classification using support vector machines. In SIGIR 2003: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval, July 28–August 1, 2003, Toronto, Canada, pp. 26–32. doi: 10.1145/860435.860443.

Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. 1.Department of Computer ScienceKatholieke Universiteit LeuvenLeuvenBelgium
  2. 2.Department of Information EngineeringUniversità degli Studi di FirenzeFlorenceItaly

Personalised recommendations