Machine Learning, Volume 106, Issue 9–10, pp 1289–1329

Efficient parameter learning of Bayesian network classifiers

  • Nayyar A. Zaidi
  • Geoffrey I. Webb
  • Mark J. Carman
  • François Petitjean
  • Wray Buntine
  • Mike Hynes
  • Hans De Sterck
Part of the following topical collections:
  1. Special Issue of the ECML PKDD 2017 Journal Track


Abstract

Recent advances have demonstrated substantial benefits from learning with both generative and discriminative parameters. Generative approaches estimate the parameters of the joint distribution \(\mathrm{P}(y,\mathbf{x})\), which for most network types is very computationally efficient (Markov networks being a notable exception). Discriminative approaches estimate the parameters of the posterior distribution \(\mathrm{P}(y|\mathbf{x})\) and, because they fit \(\mathrm{P}(y|\mathbf{x})\) directly, are more effective for classification. However, discriminative approaches are less computationally efficient, because the normalization factor in the conditional log-likelihood precludes closed-form parameter estimation. This paper introduces a new discriminative parameter learning method for Bayesian network classifiers that elegantly combines parameters learned using both generative and discriminative methods. The proposed method is discriminative in nature, but uses estimates of generative probabilities to speed up the optimization process. A second contribution is a simple framework that characterizes the parameter learning task for Bayesian network classifiers. We conduct an extensive set of experiments on 72 standard datasets and demonstrate that our proposed discriminative parameterization provides an efficient alternative to other state-of-the-art parameterizations.
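The general idea described above can be illustrated with a minimal sketch: fit naive Bayes log-probabilities in closed form (the fast generative step), then attach a weight to each log-probability and tune the weights by gradient ascent on the conditional log-likelihood, starting from all weights equal to one, which recovers the generative classifier exactly. This is an assumption-laden toy illustration of the generative/discriminative combination, not the paper's exact algorithm; all names and the synthetic data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 2 binary attributes; class = attribute 0 with ~10% label noise.
X = rng.integers(0, 2, size=(200, 2))
y = (X[:, 0] ^ (rng.random(200) < 0.1)).astype(int)
n_class, n_attr, n_val = 2, X.shape[1], 2

# Generative step: Laplace-smoothed maximum-likelihood estimates (closed form, one pass).
cnt = np.bincount(y, minlength=n_class) + 1.0
log_prior = np.log(cnt / cnt.sum())
log_cond = np.zeros((n_class, n_attr, n_val))
for c in range(n_class):
    for a in range(n_attr):
        cav = np.bincount(X[y == c, a], minlength=n_val) + 1.0
        log_cond[c, a] = np.log(cav / cav.sum())

def class_logits(wp, wc):
    """Weighted sum of generative log-probabilities; wp = wc = 1 is plain naive Bayes."""
    Z = wp * log_prior[None, :]
    for a in range(n_attr):
        Z = Z + (wc[:, a, :] * log_cond[:, a, :])[:, X[:, a]].T
    return Z

def accuracy(wp, wc):
    return float((class_logits(wp, wc).argmax(axis=1) == y).mean())

# Discriminative step: gradient ascent on the conditional log-likelihood,
# initialized at the generative solution (all weights = 1).
wp, wc = np.ones(n_class), np.ones((n_class, n_attr, n_val))
acc_gen = accuracy(wp, wc)
onehot = np.eye(n_class)[y]
for _ in range(200):
    Z = class_logits(wp, wc)
    P = np.exp(Z - Z.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    R = onehot - P                               # per-sample residuals (y one-hot minus P)
    wp += 0.05 * (R * log_prior[None, :]).mean(axis=0)
    for a in range(n_attr):
        for v in range(n_val):
            m = X[:, a] == v
            wc[:, a, v] += 0.05 * R[m].sum(axis=0) * log_cond[:, a, v] / len(y)
acc_disc = accuracy(wp, wc)
```

Because the optimization starts from the generative estimates rather than from zero, only a small correction to the weights is needed, which is the intuition behind using generative probabilities to speed up discriminative training.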



Acknowledgements

This research has been supported by the Australian Research Council (ARC) under Grant DP140100087. This material is based upon work supported by the Air Force Office of Scientific Research, Asian Office of Aerospace Research and Development (AOARD) under Award Numbers FA2386-15-1-4007 and FA2386-16-1-4023. The authors would like to thank Reza Haffari and Ana Martinez for helpful discussions during the course of this research. The authors would also like to acknowledge Jesus Cerquides for his extremely useful ideas and suggestions that shaped this work.



Copyright information

© The Author(s) 2017

Authors and Affiliations

  • Nayyar A. Zaidi (1)
  • Geoffrey I. Webb (1)
  • Mark J. Carman (1)
  • François Petitjean (1)
  • Wray Buntine (1)
  • Mike Hynes (2)
  • Hans De Sterck (3)
  1. Faculty of Information Technology, Monash University, Clayton, Australia
  2. Department of Applied Mathematics, University of Waterloo, Waterloo, Canada
  3. School of Mathematical Sciences, Monash University, Clayton, Australia
