Bayesian Learning of Markov Network Structure

  • Aleks Jakulin
  • Irina Rish
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


We propose a simple and efficient approach to building undirected probabilistic classification models (Markov networks) that extend naïve Bayes classifiers and outperform existing directed probabilistic classifiers (Bayesian networks) of similar complexity. Our Markov network model is represented as a set of consistent probability distributions on subsets of variables. Inference with such a model can be done efficiently in closed form for problems such as class probability estimation. We also propose a highly efficient Bayesian structure-learning algorithm for conditional prediction problems, based on integrating along a hill-climb in structure space. A prior based on the model's degrees of freedom effectively prevents overfitting.
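
The abstract compresses two ideas that are easy to make concrete. First, prediction: the model is a set of regions (subsets of attributes, each taken jointly with the class) with counting numbers, and the class posterior is a product of region marginals raised to those counting numbers, so it can be evaluated in closed form. The following minimal Python sketch illustrates a factorization of the kind the abstract describes; it is not the authors' implementation, and the `FactorizedClassifier` name, the Dirichlet-style smoothing constant `alpha`, and the data layout (discrete attributes as integer tuples) are all assumptions made for the example.

```python
from collections import Counter
import math

class FactorizedClassifier:
    """Closed-form classifier over a set of regions with counting numbers."""

    def __init__(self, regions, alpha=1.0):
        # regions: list of (attribute_indices, counting_number); each region
        # is taken jointly with the class variable. `alpha` is a Dirichlet
        # smoothing pseudo-count (an assumption, not the paper's prior).
        self.regions = regions
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.tables = []
        for attrs, c in self.regions:
            counts = Counter()
            for xi, yi in zip(X, y):
                counts[(tuple(xi[a] for a in attrs), yi)] += 1
            # size of the region's joint state space, used for smoothing
            card = len(self.classes) * math.prod(
                len(set(row[a] for row in X)) for a in attrs)
            self.tables.append((attrs, c, counts, len(y) + self.alpha * card))
        return self

    def predict_proba(self, x):
        # P(y | x) is proportional to prod_R P(y, x_R) ** c_R,
        # evaluated directly -- no iterative inference is needed.
        log_p = {}
        for yv in self.classes:
            s = 0.0
            for attrs, c, counts, total in self.tables:
                key = (tuple(x[a] for a in attrs), yv)
                s += c * math.log((counts[key] + self.alpha) / total)
            log_p[yv] = s
        zmax = max(log_p.values())
        w = {yv: math.exp(v - zmax) for yv, v in log_p.items()}
        norm = sum(w.values())
        return {yv: v / norm for yv, v in w.items()}

# Naive Bayes over n attributes is the special case where each region
# {Y, X_i} has counting number 1 and the class-only region has 1 - n:
# P(y) * prod_i P(x_i | y) = P(y)**(1 - n) * prod_i P(y, x_i).
nb = FactorizedClassifier([((0,), 1.0), ((1,), 1.0), ((), -1.0)])
nb.fit([(0, 1), (1, 0), (1, 1), (0, 0)], [0, 1, 1, 0])
print(nb.predict_proba((1, 1)))
```

Second, structure learning: the hill-climb adds one region at a time under a score that trades data fit against a penalty on the degrees of freedom, and predictions are averaged over every structure visited along the climb rather than committing to the single best one. The helper below is a hedged sketch of that loop; `log_score` (the penalized Bayesian score) and `predict` are assumed to be supplied by the caller, and the paper's exact degrees-of-freedom prior is not reproduced here.

```python
import math

def hill_climb(candidates, log_score, max_steps=10):
    """Greedy search: log_score(structure) -> log posterior (fit + prior)."""
    structure, path = [], []
    for _ in range(max_steps):
        remaining = [r for r in candidates if r not in structure]
        if not remaining:
            break
        best = max(remaining, key=lambda r: log_score(structure + [r]))
        if log_score(structure + [best]) <= log_score(structure):
            break  # no candidate improves the penalized score
        structure = structure + [best]
        path.append((list(structure), log_score(structure)))
    return path

def averaged_prediction(path, predict, x):
    # Integrate along the hill-climb (assumes a non-empty path): weight
    # each visited structure's class posterior by exp(log_score).
    zmax = max(s for _, s in path)
    weights = [(st, math.exp(s - zmax)) for st, s in path]
    total = sum(w for _, w in weights)
    mix = {}
    for st, w in weights:
        for yv, p in predict(st, x).items():
            mix[yv] = mix.get(yv, 0.0) + w * p / total
    return mix
```

Averaging along the search path is what keeps the procedure cheap: the structures whose scores were already computed during the climb double as the Bayesian model average, so no separate pass over structure space is required.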


Keywords: Bayesian networks · Bayesian model averaging · high posterior probability · region graphs



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Aleks Jakulin (1)
  • Irina Rish (2)
  1. Department of Statistics, Columbia University, New York, USA
  2. IBM T.J. Watson Research Center, Hawthorne, USA
