A Comparative Study of Bandwidth Choice in Kernel Density Estimation for Naive Bayesian Classification

  • Bin Liu
  • Ying Yang
  • Geoffrey I. Webb
  • Janice Boughton
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5476)

Abstract

Kernel density estimation (KDE) is an important method in nonparametric learning. While KDE has been studied extensively in terms of the accuracy of distribution estimation, it has received far less attention in the context of classification. This paper studies nine bandwidth selection schemes for kernel density estimation in the context of Naive Bayesian classification, using 52 machine learning benchmark datasets. The contributions of this paper are threefold. First, it shows that some commonly used and very sophisticated bandwidth selection schemes do not give good performance in Naive Bayes; surprisingly, some very simple bandwidth selection schemes perform statistically significantly better. Second, it shows that kernel density estimation can achieve statistically significantly better classification performance than a commonly used discretization method in Naive Bayes, but only when appropriate bandwidth selection schemes are applied. Third, the study reports the bandwidth distribution patterns produced by the investigated bandwidth selection schemes.
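
To make the setting concrete, the following is a minimal sketch of a Naive Bayesian classifier that models each continuous attribute with a per-class Gaussian-kernel density estimate and takes the bandwidth rule as a pluggable parameter. It is not the authors' implementation: the class and function names (`KDENaiveBayes`, `silverman_bandwidth`, `kde_log_density`), the NumPy-only design, and the choice of Silverman's normal-reference rule as the default bandwidth are illustrative assumptions, shown only to indicate where a bandwidth selection scheme enters the classifier.

```python
import numpy as np

def silverman_bandwidth(x):
    """Normal-reference (Silverman-style) rule of thumb: h = 1.06 * sigma * n^(-1/5)."""
    n = len(x)
    sigma = np.std(x, ddof=1) if n > 1 else 1.0
    return max(1.06 * sigma * n ** (-0.2), 1e-6)     # guard against a zero bandwidth

def kde_log_density(x_train, x_query, h):
    """Log of a Gaussian-kernel density estimate, evaluated at each query point."""
    z = (x_query - x_train[:, None]) / h             # shape (n_train, n_query)
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return np.log(k.mean(axis=0) / h + 1e-300)       # guard against log(0)

class KDENaiveBayes:
    """Naive Bayes with one Gaussian KDE per (class, attribute) pair."""

    def fit(self, X, y, bandwidth_rule=silverman_bandwidth):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.priors_ = {c: np.mean(y == c) for c in self.classes_}
        # Store the training values and a bandwidth for every (class, attribute) pair.
        self.models_ = {
            c: [(X[y == c, j], bandwidth_rule(X[y == c, j])) for j in range(X.shape[1])]
            for c in self.classes_
        }
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = np.empty((X.shape[0], len(self.classes_)))
        for i, c in enumerate(self.classes_):
            log_post = np.log(self.priors_[c])
            for j, (x_c, h) in enumerate(self.models_[c]):
                log_post = log_post + kde_log_density(x_c, X[:, j], h)
            scores[:, i] = log_post
        return self.classes_[np.argmax(scores, axis=1)]
```

In this sketch, comparing bandwidth selection schemes amounts to swapping `bandwidth_rule` (e.g., a cross-validated or plug-in selector instead of the rule of thumb) while the rest of the classifier stays fixed, which is the kind of controlled comparison the paper performs.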

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Bin Liu¹
  • Ying Yang¹
  • Geoffrey I. Webb¹
  • Janice Boughton¹

  1. Clayton School of Information Technology, Monash University, Australia
