Machine Learning, Volume 86, Issue 1, pp 89–114

Applying the information bottleneck to statistical relational learning

Abstract

In this paper we propose to apply the Information Bottleneck (IB) approach to the sub-class of Statistical Relational Learning (SRL) languages that are reducible to Bayesian networks. When the resulting networks involve hidden variables, learning these languages requires techniques for learning from incomplete data, such as the Expectation Maximization (EM) algorithm. Recently, the IB approach was shown to avoid some of the local maxima in which EM can get trapped when learning with hidden variables. Here we present the algorithm Relational Information Bottleneck (RIB), which learns the parameters of SRL languages reducible to Bayesian networks. In particular, we present the specialization of RIB to a language belonging to the family of languages based on the distribution semantics, Logic Programs with Annotated Disjunctions (LPADs). This language is prototypical for the family, and its equivalent Bayesian networks contain hidden variables. RIB is evaluated on the IMDB, Cora and artificial datasets and compared with LeProbLog, EM, Alchemy and PRISM. The experimental results show that RIB performs well, especially when some logical atoms are unobserved. Moreover, it is particularly suitable when learning from interpretations that share the same Herbrand base.
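The abstract contrasts RIB with EM for parameter learning under hidden variables. As background, the following is a minimal sketch of classic EM on a toy hidden-variable problem (the well-known two-coin mixture, not the paper's RIB algorithm or its LPAD setting): the identity of the coin behind each sequence of flips is unobserved, and EM alternates between computing posterior responsibilities (E-step) and re-estimating the coin biases from expected counts (M-step). All names and data here are illustrative.

```python
# Hedged sketch: standard EM for a two-coin mixture with a hidden variable
# (which coin produced each sequence is unobserved). This is generic EM,
# not the paper's RIB algorithm.

def em_two_coins(sequences, theta_a, theta_b, iters=50):
    """sequences: list of (heads, tails) pairs; theta_*: initial biases."""
    for _ in range(iters):
        # E-step: expected heads/tails counts attributed to each coin
        counts_a = [0.0, 0.0]
        counts_b = [0.0, 0.0]
        for heads, tails in sequences:
            like_a = theta_a ** heads * (1 - theta_a) ** tails
            like_b = theta_b ** heads * (1 - theta_b) ** tails
            w_a = like_a / (like_a + like_b)  # posterior P(coin A | data)
            counts_a[0] += w_a * heads
            counts_a[1] += w_a * tails
            counts_b[0] += (1 - w_a) * heads
            counts_b[1] += (1 - w_a) * tails
        # M-step: maximum-likelihood re-estimation from expected counts
        theta_a = counts_a[0] / (counts_a[0] + counts_a[1])
        theta_b = counts_b[0] / (counts_b[0] + counts_b[1])
    return theta_a, theta_b

# Illustrative data: each pair is (heads, tails) over 10 flips of an unknown coin.
data = [(9, 1), (8, 2), (4, 6), (3, 7), (9, 1)]
theta_a, theta_b = em_two_coins(data, 0.6, 0.5)
```

Note that starting from a symmetric initialization (theta_a == theta_b) would leave EM stuck at a stationary point, a small illustration of the sensitivity to local optima that motivates alternatives such as the IB approach.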

Keywords

Probabilistic inductive logic programming · Statistical relational learning · Inductive logic programming · Knowledge based model construction · Distribution semantics


Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. Dipartimento di Ingegneria, Università di Ferrara, Ferrara, Italy
  2. Dipartimento di Informatica, Università di Bari “Aldo Moro”, Bari, Italy
