Machine Learning, Volume 86, Issue 1, pp 25–56

Gradient-based boosting for statistical relational learning: The relational dependency network case

  • Sriraam Natarajan
  • Tushar Khot
  • Kristian Kersting
  • Bernd Gutmann
  • Jude Shavlik


Dependency networks approximate a joint probability distribution over multiple random variables as a product of conditional distributions. Relational Dependency Networks (RDNs) are graphical models that extend dependency networks to relational domains. This higher expressivity, however, comes at the expense of a more complex model-selection problem: an unbounded number of relational abstraction levels might need to be explored. Whereas current learning approaches for RDNs learn a single probability tree per random variable, we propose to turn the problem into a series of relational function-approximation problems using gradient-based boosting. In doing so, one can easily induce highly complex features over several iterations and, in turn, quickly estimate a very expressive model. Our experimental results on several different data sets show that this boosting method results in efficient learning of RDNs when compared to state-of-the-art statistical relational learning approaches.
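
To make the idea of learning a conditional distribution as a series of function-approximation problems concrete, the following is a minimal propositional sketch, not the relational method of the paper. It assumes a single numeric feature, a sigmoid link P(y=1 | x) = sigmoid(psi(x)), and one-split regression stumps as stand-ins for the relational function approximators; the names `fit_stump`, `predict_psi`, and `boost` are hypothetical. Each iteration fits a stump to the pointwise gradient of the log-likelihood, y - P(y=1 | x), and adds it to psi.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, targets):
    """Least-squares regression stump on one numeric feature.

    Returns (threshold, left_value, right_value).
    """
    best = None
    for threshold in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        if not left or not right:
            continue  # a split must put examples on both sides
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = sum((t - lmean) ** 2 for t in left)
        err += sum((t - rmean) ** 2 for t in right)
        if best is None or err < best[0]:
            best = (err, threshold, lmean, rmean)
    if best is None:
        # All feature values are identical: fall back to a constant.
        mean = sum(targets) / len(targets)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_psi(stumps, x):
    """psi(x) is the sum of all stumps learned so far (psi_0 = 0)."""
    return sum(lv if x <= t else rv for t, lv, rv in stumps)


def boost(xs, ys, n_iters=20):
    """Functional-gradient boosting of a single conditional P(y=1 | x)."""
    stumps = []
    for _ in range(n_iters):
        # Pointwise gradient of the log-likelihood w.r.t. psi(x_i).
        grads = [y - sigmoid(predict_psi(stumps, x)) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, grads))
    return stumps


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4, 5]
    ys = [0, 0, 0, 1, 1, 1]
    model = boost(xs, ys)
    for x in xs:
        print(x, round(sigmoid(predict_psi(model, x)), 3))
```

In a dependency network, one such conditional would be learned per random variable; the sketch covers only a single variable with one parent feature.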


Statistical relational learning, Graphical models, Ensemble methods



Copyright information

© The Author(s) 2011

Authors and Affiliations

  • Sriraam Natarajan (1)
  • Tushar Khot (2)
  • Kristian Kersting (3)
  • Bernd Gutmann (4)
  • Jude Shavlik (2)

  1. School of Medicine, Wake Forest University, Winston-Salem, USA
  2. University of Wisconsin-Madison, Madison, USA
  3. Fraunhofer IAIS, Sankt Augustin, Germany
  4. K.U. Leuven, Leuven, Belgium
