A Comparative Study of Bandwidth Choice in Kernel Density Estimation for Naive Bayesian Classification

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5476)


Kernel density estimation (KDE) is an important method in nonparametric learning. While KDE has been studied extensively with respect to the accuracy of distribution estimation, it has received far less attention in the context of classification. This paper studies nine bandwidth selection schemes for kernel density estimation in the context of Naive Bayesian classification, using 52 machine learning benchmark datasets. The contributions of this paper are threefold. First, it shows that some commonly used and very sophisticated bandwidth selection schemes do not perform well in Naive Bayes; surprisingly, some very simple bandwidth selection schemes perform statistically significantly better. Second, it shows that kernel density estimation can achieve statistically significantly better classification performance than a commonly used discretization method in Naive Bayes, but only when appropriate bandwidth selection schemes are applied. Third, the study reports bandwidth distribution patterns for the investigated bandwidth selection schemes.
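To make the setting concrete, the following is a minimal sketch of how a simple bandwidth selection scheme feeds into Naive Bayes: a Gaussian-kernel density estimate is fitted per class for a numeric attribute, with the bandwidth chosen by Silverman's rule of thumb (one of the simple schemes of the kind the paper compares). The data and variable names are illustrative, not taken from the paper, and equal class priors are assumed.

```python
import math

def silverman_bandwidth(xs):
    # Silverman's rule of thumb: h = 1.06 * sigma * n^(-1/5),
    # using the sample standard deviation as the scale estimate.
    n = len(xs)
    mean = sum(xs) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return 1.06 * sigma * n ** (-0.2)

def kde(xs, h):
    # Gaussian-kernel density estimate with bandwidth h:
    # f(x) = (1/n) * sum_i K((x - x_i) / h) / h
    n = len(xs)
    def f(x):
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2)
                   / (h * math.sqrt(2 * math.pi)) for xi in xs) / n
    return f

# Illustrative per-class samples of one numeric attribute.
class_a = [1.0, 1.2, 0.8, 1.1, 0.9]
class_b = [3.0, 3.2, 2.8, 3.1, 2.9]
f_a = kde(class_a, silverman_bandwidth(class_a))
f_b = kde(class_b, silverman_bandwidth(class_b))

# With equal priors, Naive Bayes assigns x to the class whose
# class-conditional density at x is larger.
x = 1.05
label = "A" if f_a(x) > f_b(x) else "B"
```

In a full Naive Bayes classifier, one such density estimate is fitted per (class, numeric attribute) pair and the per-attribute densities are multiplied together with the class prior; the bandwidth scheme determines how smooth each estimate is, which is exactly the design choice the paper evaluates.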


Keywords: Mean Squared Error · Kernel Density Estimation · Numeric Attribute · Optimal Bandwidth · Mean Integrated Squared Error
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. Fayyad, U.M., Irani, K.B.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, vol. 2, pp. 1022–1027 (1993)
  2. Yang, Y., Webb, G.: Discretization for naive-Bayes learning: managing discretization bias and variance. Machine Learning (2008), Online First
  3. Bay, S.D.: Multivariate discretization for set mining. Knowledge and Information Systems 3(4), 491–512 (2001)
  4. John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345 (1995)
  5. Silverman, B.W.: Density Estimation for Statistics and Data Analysis, 1st edn. Chapman & Hall/CRC (1986)
  6. Wand, M.P., Jones, M.C.: Kernel Smoothing. Chapman & Hall/CRC (1994)
  7. Epanechnikov, V.A.: Non-parametric estimation of a multivariate probability density. Theory of Probability and its Applications 14(1), 153–158 (1969)
  8. Friedman, J.H.: On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery 1(1), 55–77 (1997)
  9. Hall, P., Kang, K.H.: Bandwidth choice for nonparametric classification. Annals of Statistics 33(1), 284–306 (2005)
  10. Bowman, A.W.: An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2), 353–360 (1984)
  11. R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria (2008)
  12. Scott, D.W., Terrell, G.R.: Biased and unbiased cross-validation in density estimation. Journal of the American Statistical Association 82(400), 1131–1146 (1987)
  13. Sheather, S.J., Jones, M.C.: A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B 53(3), 683–690 (1991)
  14. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
  15. Hyndman, R.J.: The problem with Sturges' rule for constructing histograms (1995)
  16. Sturges, H.A.: The choice of a class interval. Journal of the American Statistical Association 21(153), 65–66 (1926)
  17. Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S-PLUS, 3rd edn. Springer, Heidelberg (1999)
  18. Asuncion, A., Newman, D.J.: UCI Machine Learning Repository (2007)
  19. Webb, G.I.: MultiBoosting: A technique for combining boosting and wagging. Machine Learning 40(2), 159–196 (2000)
  20. Kohavi, R., Wolpert, D.H.: Bias plus variance decomposition for zero-one loss functions. In: Machine Learning: Proceedings of the Thirteenth International Conference, pp. 275–283 (1996)
  21. Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association 32(200), 675–701 (1937)
  22. Demsar, J.: Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research 7, 1–30 (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  1. Clayton School of Information Technology, Monash University, Australia
