A Comparative Study of Bandwidth Choice in Kernel Density Estimation for Naive Bayesian Classification
Kernel density estimation (KDE) is an important method in nonparametric learning. While KDE has been studied extensively in the context of accuracy of distribution estimation, it has not been studied extensively in the context of classification. This paper studies nine bandwidth selection schemes for kernel density estimation in Naive Bayesian classification context, using 52 machine learning benchmark datasets. The contributions of this paper are threefold. First, it shows that some commonly used and very sophisticated bandwidth selection schemes do not give good performance in Naive Bayes. Surprisingly, some very simple bandwidth selection schemes give statistically significantly better performance. Second, it shows that kernel density estimation can achieve statistically significantly better classification performance than a commonly used discretization method in Naive Bayes, but only when appropriate bandwidth selection schemes are applied. Third, this study gives bandwidth distribution patterns for the investigated bandwidth selection schemes.
KeywordsMean Square Error Kernel Density Estimation Numeric Attribute Optimal Bandwidth Mean Integrate Square Error
Unable to display preview. Download preview PDF.
- 1.Fayyad, U.M., Irani, K.B.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, vol. 2, pp. 1022–1027 (1993)Google Scholar
- 2.Yang, Y., Webb, G.: Discretization for naive-bayes learning: managing discretization bias and variance. Machine Learning (2008) Online FirstGoogle Scholar
- 4.John, G.H., Langley, P.: Estimating continuous distributions in bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345 (1995)Google Scholar
- 5.Silverman, B.W.: Density Estimation for Statistics and Data Analysis, 1st edn. Chapman & Hall/CRC (1986)Google Scholar
- 6.Wand, M.P., Jones, M.C.: Kernel Smoothing. Chapman & Hall/CRC (1994)Google Scholar
- 11.R Development Core Team: R: A Language and Environment for Statistical Computing, Austria, Vienna (2008), http://www.R-project.org
- 15.Hyndman, R.J.: The problem with sturge’s rule for constructing histograms (1995), http://www-personal.buseco.monash.edu.au/~hyndman/papers
- 18.Asuncion, A., Newman, D.J.: UCI Machine Learning Repository (2007), http://www.ics.uci.edu/~mlearn/MLRepository.html
- 20.Kohavi, R., Wolpert, D.H.: Bias plus variance decomposition for zero-one loss functions. In: Machine Learning: Proceedings of the Thirteenth International Conference, vol. 275, p. 283 (1996)Google Scholar