# Efficient parameter learning of Bayesian network classifiers

## Abstract

Recent advances have demonstrated substantial benefits from learning with both generative and discriminative parameters. On the one hand, generative approaches estimate the parameters of the joint distribution \(\mathrm{P}(y,\mathbf{x})\), which for most network types is computationally very *efficient* (a notable exception being Markov networks). On the other hand, discriminative approaches estimate the parameters of the posterior distribution and are more *effective* for classification, since they fit \(\mathrm{P}(y|\mathbf{x})\) directly. However, discriminative approaches are less computationally efficient because the normalization factor in the conditional log-likelihood precludes closed-form parameter estimation. This paper introduces a new discriminative parameter learning method for Bayesian network classifiers that elegantly combines parameters learned using both generative and discriminative methods. The proposed method is discriminative in nature, but uses estimates of generative probabilities to speed up the optimization process. A second contribution is a simple framework for characterizing the parameter learning task for Bayesian network classifiers. We conduct an extensive set of experiments on 72 standard datasets and demonstrate that our proposed discriminative parameterization provides an efficient alternative to other state-of-the-art parameterizations.
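To make the contrast drawn above concrete, the sketch below is a minimal, hypothetical illustration (not the paper's actual algorithm): a naive Bayes model whose parameters are obtained in closed form by frequency counting (the generative step), then refined by gradient ascent on the conditional log-likelihood, with the optimizer initialized at the generative estimates so it starts near a good solution. All function and variable names here are illustrative assumptions.

```python
import numpy as np

# Toy dataset: binary class y and two binary discrete attributes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [1, 1], [0, 0]])
y = np.array([0, 0, 1, 1, 1, 0])
n_classes, n_vals = 2, 2

def nb_log_params(X, y, alpha=1.0):
    # Generative step: closed-form (frequency-counting) naive Bayes
    # estimates with Laplace smoothing. No iterative optimization needed,
    # which is why this step is computationally cheap.
    log_prior = np.log((np.bincount(y, minlength=n_classes) + alpha)
                       / (len(y) + alpha * n_classes))
    log_cond = np.zeros((n_classes, X.shape[1], n_vals))
    for c in range(n_classes):
        Xc = X[y == c]
        for i in range(X.shape[1]):
            counts = np.bincount(Xc[:, i], minlength=n_vals) + alpha
            log_cond[c, i] = np.log(counts / counts.sum())
    return log_prior, log_cond

def class_log_scores(X, bias, theta):
    # Unnormalized per-class log-scores of a log-linear model whose
    # weights coincide with naive Bayes log-probabilities at initialization.
    scores = np.tile(bias, (len(X), 1))
    for i in range(X.shape[1]):
        scores += theta[:, i, X[:, i]].T
    return scores

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cll(X, y, bias, theta):
    # Conditional log-likelihood: the discriminative objective. Its
    # normalization factor (the softmax denominator) is what rules out a
    # closed-form solution and forces iterative optimization.
    p = softmax(class_log_scores(X, bias, theta))
    return np.log(p[np.arange(len(y)), y]).sum()

# Discriminative step: gradient ascent on the CLL, initialized at the
# generative estimates so the optimizer starts close to a good solution.
bias, theta = nb_log_params(X, y)
cll_before = cll(X, y, bias, theta)

lr = 0.05
for _ in range(300):
    p = softmax(class_log_scores(X, bias, theta))
    resid = -p
    resid[np.arange(len(y)), y] += 1.0          # 1[y = c] - P(c | x)
    bias += lr * resid.sum(axis=0)
    for i in range(X.shape[1]):
        for v in range(n_vals):
            theta[:, i, v] += lr * resid[X[:, i] == v].sum(axis=0)

cll_after = cll(X, y, bias, theta)
preds = softmax(class_log_scores(X, bias, theta)).argmax(axis=1)
```

On this toy problem the generative initialization already classifies well, and the discriminative refinement only has to increase the conditional log-likelihood from a good starting point rather than from scratch; this is the general intuition behind combining the two kinds of parameters, under the simplifying assumptions above.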

## Notes

### Acknowledgements

This research has been supported by the Australian Research Council (ARC) under Grant DP140100087. This material is based upon work supported by the Air Force Office of Scientific Research, Asian Office of Aerospace Research and Development (AOARD) under Award Numbers FA2386-15-1-4007 and FA2386-16-1-4023. The authors would like to thank Reza Haffari and Ana Martinez for helpful discussions during the course of this research. The authors would also like to acknowledge Jesus Cerquides for his extremely useful ideas and suggestions that shaped this work.
