Abstract
We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base (KB) with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a first-order formula in the KB, with the corresponding weight. Inference in MLNs is performed by Markov chain Monte Carlo (MCMC) over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudo-likelihood measure. Optionally, additional clauses are learned using inductive logic programming techniques. Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach.
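To illustrate how weighted formulas plus a set of constants define a ground network, here is a minimal Python sketch. It is not the authors' implementation. It assumes a hypothetical two-constant domain (A, B) and two hand-picked formulas, `Smokes(x) => Cancer(x)` with weight 1.5 and `Friends(x, y) => (Smokes(x) <=> Smokes(y))` with weight 1.1. All names and weights are illustrative. The sketch grounds each formula over the constants, counts the true groundings n_i(x) in a world x, and scores the world with the standard MLN log-linear form P(X = x) = exp(sum_i w_i n_i(x)) / Z. Here Z is computed by brute-force enumeration, which is feasible only for toy domains. The approach summarized above instead uses MCMC for inference and a pseudo-likelihood, which avoids Z, for weight learning. For comparison, the sketch also includes that pseudo-log-likelihood.

```python
import functools
import itertools
import math

# Hypothetical toy domain (illustrative only): two constants.
CONSTANTS = ["A", "B"]

# Ground atoms, written as (predicate, arg1, ...) tuples.
ATOMS = (
    [("Smokes", c) for c in CONSTANTS]
    + [("Cancer", c) for c in CONSTANTS]
    + [("Friends", a, b) for a in CONSTANTS for b in CONSTANTS]
)


def implies(p, q):
    return (not p) or q


# Each entry: (weight, number of variables, truth function of a grounding).
# Weights are made up for illustration, not learned.
FORMULAS = [
    # Smokes(x) => Cancer(x)
    (1.5, 1, lambda world, x: implies(world[("Smokes", x)], world[("Cancer", x)])),
    # Friends(x, y) => (Smokes(x) <=> Smokes(y))
    (1.1, 2, lambda world, x, y: implies(
        world[("Friends", x, y)],
        world[("Smokes", x)] == world[("Smokes", y)])),
]


def true_groundings(world, arity, formula):
    """n_i(x): how many groundings of the formula are true in `world`."""
    return sum(formula(world, *args)
               for args in itertools.product(CONSTANTS, repeat=arity))


def log_score(world):
    """sum_i w_i * n_i(x), the log of the unnormalized probability."""
    return sum(w * true_groundings(world, k, f) for w, k, f in FORMULAS)


def all_worlds():
    """Every truth assignment to the ground atoms (2^8 = 256 here)."""
    for values in itertools.product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


@functools.lru_cache(maxsize=None)
def log_partition():
    """log Z by exhaustive enumeration; only viable for tiny domains."""
    return math.log(sum(math.exp(log_score(v)) for v in all_worlds()))


def probability(world):
    """Exact P(X = x) = exp(sum_i w_i n_i(x)) / Z."""
    return math.exp(log_score(world) - log_partition())


def pseudo_log_likelihood(world):
    """sum over ground atoms l of log P(X_l = x_l | all other atoms).

    Each term compares the world with the same world with atom l flipped,
    so Z is never needed.
    """
    total = 0.0
    s_true = log_score(world)
    for atom in ATOMS:
        flipped = dict(world)
        flipped[atom] = not world[atom]
        s_flip = log_score(flipped)
        # log(e^s_true / (e^s_true + e^s_flip)) = -log1p(e^(s_flip - s_true))
        total -= math.log1p(math.exp(s_flip - s_true))
    return total
```

Within each pseudo-likelihood term, every formula grounding that does not mention the flipped atom contributes equally to both scores and cancels. That cancellation is why this objective stays cheap on large ground networks, where the exact probability above does not.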
Additional information
Editors: Hendrik Blockeel, David Jensen and Stefan Kramer
An erratum to this article is available at http://dx.doi.org/10.1007/s10994-006-8633-8.
About this article
Cite this article
Richardson, M., Domingos, P. Markov logic networks. Mach Learn 62, 107–136 (2006). https://doi.org/10.1007/s10994-006-5833-1
DOI: https://doi.org/10.1007/s10994-006-5833-1