kProbLog: an algebraic Prolog for machine learning
We introduce kProbLog as a declarative logical language for machine learning. kProbLog is a simple algebraic extension of Prolog with facts and rules annotated by semi-ring labels. It allows to elegantly combine algebraic expressions with logic programs. We introduce the semantics of kProbLog, its inference algorithm, its implementation and provide convergence guarantees. We provide several code examples to illustrate its potential for a wide range of machine learning techniques. In particular, we show the encodings of state-of-the-art graph kernels such as Weisfeiler-Lehman graph kernels, propagation kernels and an instance of graph invariant kernels, a recent framework for graph kernels with continuous attributes. However, kProbLog is not limited to kernel methods and it can concisely express declarative formulations of tensor-based algorithms such as matrix factorization and energy-based models, and it can exploit semirings of dual numbers to perform algorithmic differentiation. Furthermore, experiments show that kProbLog is not only of theoretical interest, but can also be applied to real-world datasets. At the technical level, kProbLog extends aProbLog (an algebraic Prolog) by allowing multiple semirings to coexist in a single program and by introducing meta-functions for manipulating algebraic values.
KeywordsAlgebraic Prolog Kernel programming Graph kernels Machine learning
We would like to thank Angelika Kimmig and Anton Dries for the fruitful discussions about ProbLog.
- Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving G, Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens J, Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke M, Yu, Y., & Zheng, X. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. http://tensorflow.org/, software available from tensorflow.org.
- Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow IJ, Bergeron, A., Bouchard, N., & Bengio, Y. (2012). Theano: New features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop.Google Scholar
- Baydin, A. G., Pearlmutter, B. A., Radul, A. A., & Siskind, J. M. (2015). Automatic differentiation in machine learning: A survey. arXiv:1502.05767
- Collobert, R., Bengio, S., & Mariéthoz, J. (2002). Torch: A modular machine learning software library. IDIAP: Tech. rep.Google Scholar
- Costa, F., & De Grave, K. (2010). Fast neighborhood subgraph pairwise distance kernel. In Proceedings of the 27th international conference on machine learning (ICML-10), June 21–24, Haifa, Israel, pp. 255–262, http://www.icml2010.org/papers/347.pdf
- Darwiche, A. (2011). SDD: A new canonical representation of propositional knowledge bases. In IJCAI 2011, Proceedings of the 22nd international joint conference on artificial intelligence, Barcelona, Catalonia, Spain, July 16–22, pp. 819–826. doi: 10.5591/978-1-57735-516-8/IJCAI11-143
- De Marneffe, M. C., & Manning, C. D. (2008). The stanford typed dependencies representation. InColing 2008: Proceedings of the workshop on cross-framework and cross-domain parser evaluation, Association for Computational Linguistics, pp. 1–8.Google Scholar
- De Raedt, L., Kimmig, A., Toivonen, H. (2007). Problog: A probabilistic prolog and its application in link discovery. In IJCAI 2007, proceedings of the 20th international joint conference on artificial intelligence, Hyderabad, India, January 6–12, pp. 2462–2467, http://ijcai.org/Proceedings/07/Papers/396.pdf
- Debnath, A. K., Lopez de Compadre, R. L., Debnath, G., Shusterman, A. J., & Hansch, C. (1991). Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. Journal of Medicinal Chemistry, 34(2), 786–797.CrossRefGoogle Scholar
- Eisner, J. (2002). Parameter estimation for probabilistic finite-state transducers. In Proceedings of the 40th annual meeting on association for computational linguistics, Association for Computational Linguistics, pp. 1–8.Google Scholar
- Eisner, J., Blatz, J. (2007). Program transformations for optimization of parsing algorithms and other weighted logic programs. In Proceedings of formal grammar, pp. 45–85.Google Scholar
- Eisner, J., & Filardo, N. W. (2011). Dyna: Extending datalog for modern AI. In Datalog reloaded. Springer, pp. 181–220.Google Scholar
- Eisner, J., Goldlust, E., & Smith, N. A. (2004). Dyna: A declarative language for implementing dynamic programs. In Proceedings of the 42nd annual meeting of the association for computational linguistics (ACL), Companion Volume, Barcelona, pp. 218–221.Google Scholar
- Esparza, J., Luttenberger, M., & Schlund, M. (2014). Fpsolve: A generic solver for fixpoint equations over semirings. In Proceedings of 19th international conference implementation and application of automata, CIAA 2014, Giessen, Germany, July 30–August 2, pp. 1–15. doi: 10.1007/978-3-319-08846-4_1.
- Garcez, Ad., Besold, T. R., de Raedt, L., Földiak, P., Hitzler, P., Icard, T., Kühnberger, K. U., Lamb, L. C., Miikkulainen, R., & Silver, D. L. (2015). Neural-symbolic learning and reasoning: contributions and challenges. In Proceedings of the AAAI spring symposium on knowledge representation and reasoning: Iintegrating symbolic and neural approaches, Stanford.Google Scholar
- Gärtner, T., Flach, P., & Wrobel, S. (2003). On graph kernels: Hardness results and efficient alternatives. In Learning Theory and Kernel Machines. Springer, pp. 129–143Google Scholar
- Green, T. J., Karvounarakis, G., & Tannen, V. (2007). Provenance semirings. In Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. ACM.Google Scholar
- Griewank, A., & Walther, A. (2008). Evaluating derivatives: Principles and techniques of algorithmic differentiation (2nd ed.). Philadelphia: Society for Industrial and Applied Mathematics.Google Scholar
- Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels between labeled graphs. ICML, 3, 321–328.Google Scholar
- Kazius, J., McGuire, R., & Bursi, R. (2005). Derivation and validation of toxicophores for mutagenicity prediction. Journal of Medicinal Chemistry, 48(1), 312–320.Google Scholar
- Kim, M., & Candan, K. S. (2011). Approximate tensor decomposition within a tensor-relational algebraic framework. In Proceedings of the 20th ACM conference on information and knowledge management, CIKM 2011, Glasgow, United Kingdom, October 24–28, pp. 1737–1742. doi: 10.1145/2063576.2063827
- Kimmig, A., Van den Broeck, G., & De Raedt, L. (2011). An algebraic prolog for reasoning about possible worlds. In Proceedings of the twenty-fifth AAAI conference on artificial intelligence, AAAI 2011, San Francisco, California, USA, August 7–11, http://www.aaai.org/ocs/index.php/AAAI/AAAI11/paper/view/3685.
- Kimmig, A., Van den Broeck, G., & De Raedt, L. (2012). Algebraic model counting. CoRR abs/1211.4475, arXiv:1211.4475
- Kisa, D., Van den Broeck, G., Choi, A., & Darwiche, A. (2014). Probabilistic sentential decision diagrams. In Proceedings of the 14th international conference on principles of knowledge representation and reasoning (KR).Google Scholar
- Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30–37.Google Scholar
- Kuich, W. (1997). Semirings and formal power series: Their relevance to formal languages and automata (pp. 609–677). In Handbook of formal languages: Springer.Google Scholar
- Landwehr, N., Passerini, A., De Raedt, L., & Frasconi, P. (2006). kfoil: Learning simple relational kernels, pp. 389–394.Google Scholar
- Li, X., & Roth, D. (2002). Learning question classifiers. In Proceedings of the 19th international conference on computational linguistics–Volume 1, Association for Computational Linguistics.Google Scholar
- Mahé, P., Ueda, N., Akutsu, T., Perret, J. L., & Vert, J. P. (2004). Extensions of marginalized graph kernels. In Proceedings of the twenty-first international conference on machine learning, ACM, p. 70.Google Scholar
- Milch, B., Marthi, B., Russell, S. J., Sontag, D., Ong, D. L., & Kolobov, A. (2005). BLOG: Probabilistic models with unknown objects, pp. 1352–1359, http://ijcai.org/Proceedings/05/Papers/1546.pdf.
- Neumann, M., Patricia, N., Garnett, R., & Kersting, K. (2012). Efficient graph kernels by randomization. In Proceedings of machine learning and knowledge discovery in databases—European conference, ECML PKDD 2012, Bristol, UK, September 24–28. Part I, pp. 378–393. doi: 10.1007/978-3-642-33460-3_30.
- Nickel, M., Tresp, V., & Kriegel, H. P. (2011). A three-way model for collective learning on multi-relational data. In Proceedings of the 28th international conference on machine learning (ICML-11), pp. 809–816.Google Scholar
- Orsini, F., Frasconi, P., & De Raedt, L. (2015). Graph invariant kernels. In Proceedings of the twenty-fourth international joint conference on artificial intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25–31, pp. 3756–3762, http://ijcai.org/Abstract/15/528.
- Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning, 5(3), 239–266.Google Scholar
- Sammut, C. (1993). The origins of inductive logic programming: A prehistoric tale. In Proceedings of the 3rd international workshop on inductive logic programming. J Stefan Institute, pp. 127–147.Google Scholar
- Sato, T. (1995). A statistical learning method for logic programs with distribution semantics. In Logic programming, proceedings of the twelfth international conference on logic programming, Tokyo, Japan, June 13–16, pp. 715–729.Google Scholar
- Sato, T., & Kameya, Y. (1997). PRISM: A language for symbolic-statistical modeling. In Proceedings of the fifteenth international joint conference on artificial intelligence, IJCAI 97, Nagoya, Japan, August 23–29, Vol. 2, pp. 1330–1339. http://ijcai.org/Proceedings/97-2/Papers/078.pdf.
- Shervashidze, N., Schweitzer, P., van Leeuwen, E. J., Mehlhorn, K., & Borgwardt, K. M. (2011). Weisfeiler-lehman graph kernels. Journal of Machine Learning Research 12:2539–2561, http://dl.acm.org/citation.cfm?id=2078187.
- Van Laer, W., & De Raedt, L. (2001). How to upgrade propositional learners to first order logic: A case study. In Machine learning and its applications. Springer, pp. 102–126.Google Scholar
- Vlasselaer, J., Van den Broeck, G., Kimmig, A., Meert, W., & De Raedt, L. (2015). Anytime inference in probabilistic logic programs with tp-compilation. In Proceedings of the twenty-fourth international joint conference on artificial intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25–31, pp. 1852–1858, http://ijcai.org/Abstract/15/263.
- Zhang, D., & Lee, W. S. (2003). Question classification using support vector machines. In SIGIR 2003: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval, July 28–August 1, 2003, Toronto, Canada, pp. 26–32. doi: 10.1145/860435.860443.