Learning Naive Bayes for Probability Estimation by Feature Selection

  • Liangxiao Jiang
  • Harry Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4013)


Naive Bayes is a well-known, effective, and efficient classification algorithm, but its probability estimation is poor. In many applications, accurate probability estimation is required in order to make optimal decisions. Probability estimation is usually measured by conditional log likelihood (CLL). Several learning algorithms have recently been proposed to extend naive Bayes for high CLL, such as ERL [8, 9] and BNC-2P [10]; unfortunately, their computational complexity is relatively high. Is there a simple but effective and efficient approach to improving the probability estimation of naive Bayes? In this paper, we propose to use feature selection for this purpose. More precisely, a search process selects a subset of attributes, and a naive Bayes classifier is then deployed on the selected attribute set. Feature selection has already been applied successfully to naive Bayes and achieves significant improvement in classification accuracy. Among the feature selection algorithms for naive Bayes, the selective Bayesian classifier (SBC) of Langley and Sage [13] demonstrates good performance. In this paper, we first study the performance of SBC in terms of probability estimation, and then propose an improved algorithm, SBC-CLL, in which the CLL score is used directly for attribute selection instead of classification accuracy. Our experiments show that both SBC and SBC-CLL achieve significant improvement over naive Bayes, and that SBC-CLL substantially outperforms SBC, in probability estimation measured by CLL. Our work provides an efficient and surprisingly effective approach to improving the probability estimation of naive Bayes.
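The procedure the abstract describes (search over attribute subsets, a naive Bayes classifier trained on the selected attributes, and CLL as the selection score) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the greedy forward search, Laplace smoothing, the train-set CLL score, and the toy dataset are all assumptions made here for the sketch; the paper's SBC-CLL may differ in search direction and scoring details.

```python
import math
from collections import Counter, defaultdict

def train_nb(X, y, feats):
    """Fit a naive Bayes (Laplace smoothing) over the attribute subset `feats`."""
    classes = sorted(set(y))
    nc = Counter(y)
    prior = {c: (nc[c] + 1) / (len(y) + len(classes)) for c in classes}
    cond = defaultdict(Counter)                      # cond[(f, c)][v] = count
    vals = {f: sorted({row[f] for row in X}) for f in feats}
    for row, c in zip(X, y):
        for f in feats:
            cond[(f, c)][row[f]] += 1
    return classes, prior, cond, vals, nc

def log_posterior(model, row, feats):
    """Return {class: log P(class | row)} under the fitted model."""
    classes, prior, cond, vals, nc = model
    joint = {c: math.log(prior[c]) +
                sum(math.log((cond[(f, c)][row[f]] + 1) / (nc[c] + len(vals[f])))
                    for f in feats)
             for c in classes}
    z = max(joint.values())                          # log-sum-exp normalization
    logz = z + math.log(sum(math.exp(v - z) for v in joint.values()))
    return {c: v - logz for c, v in joint.items()}

def cll(X, y, feats):
    """Conditional log likelihood of the labels given the selected attributes.
    Scored on the training data for brevity; held-out or cross-validated CLL
    would be the safer choice in practice."""
    model = train_nb(X, y, feats)
    return sum(log_posterior(model, row, feats)[c] for row, c in zip(X, y))

def select_attributes(X, y):
    """Greedy forward selection maximizing CLL, in the spirit of SBC-CLL."""
    remaining, selected = set(range(len(X[0]))), []
    best = cll(X, y, selected)
    while remaining:
        f, score = max(((f, cll(X, y, selected + [f])) for f in remaining),
                       key=lambda t: t[1])
        if score <= best:                            # no attribute improves CLL
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best

# Toy data: attribute 0 perfectly predicts the class, attribute 1 is noise.
X = [(0, 0), (0, 1), (0, 0), (1, 1), (1, 0), (1, 1), (0, 1), (1, 0)]
y = [0, 0, 0, 1, 1, 1, 0, 1]
feats, score = select_attributes(X, y)
print(feats)
```

On this toy dataset the search keeps only the informative attribute: adding the noise attribute multiplies every class posterior by equal factors, so it cannot raise the CLL score and the search stops.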


Keywords: Feature Selection · Bayesian Network · Probability Estimation · Feature Selection Algorithm · Attribute Node




  1. Bennett, P.N.: Assessing the calibration of Naive Bayes' posterior estimates. Technical Report CMU-CS-00-155 (2000)
  2. Chickering, D.M.: Learning Bayesian networks is NP-complete. In: Fisher, D., Lenz, H. (eds.) Learning from Data: Artificial Intelligence and Statistics V, pp. 121–130. Springer, Heidelberg (1996)
  3. Chickering, D.M.: The WinMine Toolkit. Technical Report MSR-TR-2002-103 (2002)
  4. Domingos, P., Pazzani, M.: Beyond independence: Conditions for the optimality of the simple Bayesian classifier. Machine Learning 29, 103–130 (1997)
  5. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley-Interscience, Chichester (1973)
  6. Frank, E., Trigg, L., Holmes, G., Witten, I.H.: Naive Bayes for regression. Machine Learning 41(1), 5–15 (2000)
  7. Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Machine Learning 29, 131–163 (1997)
  8. Greiner, R., Zhou, W.: Structural extension to logistic regression: Discriminative parameter learning of belief net classifiers. In: Proceedings of the Eighteenth National Conference on Artificial Intelligence, pp. 167–173. AAAI Press, Menlo Park (2002)
  9. Greiner, R., Su, X., Shen, B., Zhou, W.: Structural extension to logistic regression: Discriminative parameter learning of belief net classifiers. Machine Learning 59(3) (2005)
  10. Grossman, D., Domingos, P.: Learning Bayesian network classifiers by maximizing conditional likelihood. In: Proceedings of the Twenty-First International Conference on Machine Learning, pp. 361–368. ACM Press, Banff, Canada (2004)
  11. Guo, Y., Greiner, R.: Discriminative model selection for belief net structures. In: Proceedings of the Twentieth National Conference on Artificial Intelligence, pp. 770–776. AAAI Press, Menlo Park (2005)
  12. Langley, P., Iba, W., Thompson, K.: An analysis of Bayesian classifiers. In: Proceedings of the Tenth National Conference on Artificial Intelligence, pp. 223–228. AAAI Press, Menlo Park (1992)
  13. Langley, P., Sage, S.: Induction of selective Bayesian classifiers. In: Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pp. 399–406 (1994)
  14. Lowd, D., Domingos, P.: Naive Bayes models for probability estimation. In: Proceedings of the Twenty-Second International Conference on Machine Learning, pp. 529–536. ACM Press, New York (2005)
  15. Kohavi, R.: Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-1996), pp. 202–207. AAAI Press, Menlo Park (1996)
  16. Merz, C., Murphy, P., Aha, D.: UCI repository of machine learning databases. Dept. of ICS, University of California, Irvine (1997)
  17. Pearl, J.: Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Francisco (1988)
  18. Provost, F.J., Domingos, P.: Tree induction for probability-based ranking. Machine Learning 52(3), 199–215 (2003)
  19.
  20. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Liangxiao Jiang (1)
  • Harry Zhang (2)
  1. Faculty of Computer Science, China University of Geosciences, Hubei, P.R. China
  2. Faculty of Computer Science, University of New Brunswick, Fredericton, Canada
