Markov logic networks

Abstract

We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a first-order formula in the KB, with the corresponding weight. Inference in MLNs is performed by Markov chain Monte Carlo (MCMC) over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudo-likelihood measure. Optionally, additional clauses are learned using inductive logic programming techniques. Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach.
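
To make the construction above concrete, the joint distribution an MLN defines over possible worlds x can be written in the standard log-linear form (a sketch in common MLN notation; the symbols are defined here rather than quoted from the paper):

P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big)

where the sum ranges over the first-order formulas F_i in the knowledge base, w_i is the weight attached to F_i, n_i(x) is the number of true groundings of F_i in world x, and Z is the partition function. Pseudo-likelihood learning replaces the intractable likelihood with

\log \mathit{PL}_w(X = x) = \sum_l \log P_w\big( X_l = x_l \mid \mathit{MB}_x(X_l) \big),

the sum over ground atoms X_l of each atom's conditional log-probability given the state of its Markov blanket MB_x(X_l), which avoids computing Z.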

Author information

Corresponding author

Correspondence to Pedro Domingos.

Additional information

Editors: Hendrik Blockeel, David Jensen and Stefan Kramer

An erratum to this article is available at http://dx.doi.org/10.1007/s10994-006-8633-8.

About this article

Cite this article

Richardson, M., Domingos, P. Markov logic networks. Mach Learn 62, 107–136 (2006). https://doi.org/10.1007/s10994-006-5833-1

Keywords

  • Statistical relational learning
  • Markov networks
  • Markov random fields
  • Log-linear models
  • Graphical models
  • First-order logic
  • Satisfiability
  • Inductive logic programming
  • Knowledge-based model construction
  • Markov chain Monte Carlo
  • Pseudo-likelihood
  • Link prediction