Lifted Online Training of Relational Models with Stochastic Gradient Methods

  • Babak Ahmadi
  • Kristian Kersting
  • Sriraam Natarajan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7523)


Lifted inference approaches have rendered large, previously intractable probabilistic inference problems quickly solvable by exploiting symmetries to handle whole sets of indistinguishable random variables. Still, in many if not most situations, training relational models will not benefit from lifting: symmetries within models break easily, since variables become correlated by virtue of depending asymmetrically on evidence. An appealing idea for such situations is to train and recombine local models. This breaks long-range dependencies and makes it possible to exploit lifting within and across the local training tasks. Moreover, it naturally paves the way for online training of relational models. Specifically, we develop the first lifted stochastic gradient optimization method with gain vector adaptation, which processes the lifted pieces one after the other. On several datasets, the resulting optimizer converges to a solution of the same quality over an order of magnitude faster, simply because, unlike batch training, it starts optimizing long before having seen the entire mega-example even once.
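The idea of processing lifted pieces one at a time with a per-parameter gain vector can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a toy logistic model in place of a relational model, treats "lifting" as simply grouping identical pieces and counting them, and uses a simplified multiplicative gain rule (a gradient trace without any curvature term). All names (`lift`, `train_online`, `mu`, `decay`, and the toy data) are hypothetical.

```python
import math
from collections import Counter


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lift(ground_pieces):
    """Group identical pieces and count them (a toy stand-in for lifting)."""
    return list(Counter(ground_pieces).items())


def avg_log_likelihood(w, lifted_pieces):
    """Count-weighted average log-likelihood of all pieces under weights w."""
    total = sum(count for _, count in lifted_pieces)
    ll = 0.0
    for (features, label), count in lifted_pieces:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, features)))
        ll += count * math.log(p if label == 1 else 1.0 - p)
    return ll / total


def train_online(lifted_pieces, dim, epochs=200, eta0=0.1, mu=1.0,
                 eta_min=0.01, eta_max=0.5, decay=0.9):
    """Stochastic gradient ascent, one lifted piece per update.

    Each parameter has its own gain (step size). A gain grows when the
    current gradient agrees with a running trace of past gradients and
    shrinks when it disagrees (assumed, simplified adaptation rule).
    """
    total = sum(count for _, count in lifted_pieces)
    w = [0.0] * dim
    gains = [eta0] * dim
    trace = [0.0] * dim
    for _ in range(epochs):
        for (features, label), count in lifted_pieces:
            # Prediction uses the weights as they were before this update.
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, features)))
            for i in range(dim):
                # One lifted piece stands for `count` identical ground pieces.
                g = (count / total) * (label - p) * features[i]
                factor = max(0.5, 1.0 + mu * g * trace[i])
                gains[i] = min(eta_max, max(eta_min, gains[i] * factor))
                w[i] += gains[i] * g
                trace[i] = decay * trace[i] + (1.0 - decay) * g
    return w


# Toy data: 80 ground pieces that collapse into 4 lifted pieces.
ground_pieces = (
    [((1.0, 1.0), 1)] * 30 + [((1.0, 1.0), 0)] * 10
    + [((1.0, 0.0), 1)] * 10 + [((1.0, 0.0), 0)] * 30
)
lifted = lift(ground_pieces)
w = train_online(lifted, dim=2)
```

In this sketch the weights are updated after every lifted piece rather than after a full pass over the data, which is the property the abstract credits for the faster convergence; the count weighting is what lets one update stand in for many identical ground pieces.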


Keywords: Relational Model, Stochastic Gradient, Factor Graph, Online Training, Natural Gradient



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Babak Ahmadi (1)
  • Kristian Kersting (1, 2, 3)
  • Sriraam Natarajan (3)

  1. Knowledge Discovery Department, Fraunhofer IAIS, Sankt Augustin, Germany
  2. Institute of Geodesy and Geoinformation, University of Bonn, Bonn, Germany
  3. School of Medicine, Wake Forest University, Winston-Salem, USA
