Non-asymptotic Bandwidth Selection for Density Estimation of Discrete Data

  • Zdravko I. Botev
  • Dirk P. Kroese


We propose a new method for density estimation of categorical data. The method implements a non-asymptotic data-driven bandwidth selection rule and provides model sparsity not present in the standard kernel density estimation method. Numerical experiments with a well-known ten-dimensional binary medical data set illustrate the effectiveness of the proposed approach for density estimation, discriminant analysis and classification.
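
For orientation, here is a minimal sketch of the *standard* kernel density estimator for binary data that the abstract contrasts with. It is the Aitchison–Aitken (1976) kernel, with the smoothing parameter λ chosen by leave-one-out likelihood cross-validation. It is not the authors' non-asymptotic, generalized-cross-entropy rule. The function names, the grid search over λ in [0.5, 0.999], and the toy sample are hypothetical choices made only for illustration. The sketch assumes the data are tuples of 0/1 values with at least two observations.

```python
import math
from itertools import product


def aa_kernel(x, y, lam):
    """Aitchison-Aitken binary kernel: lam**(d - D) * (1 - lam)**D,
    where D is the Hamming distance between binary vectors x and y."""
    d = len(x)
    dist = sum(a != b for a, b in zip(x, y))
    return lam ** (d - dist) * (1.0 - lam) ** dist


def kde(x, data, lam):
    """Kernel estimate of P(X = x): the kernel averaged over the sample."""
    return sum(aa_kernel(x, xi, lam) for xi in data) / len(data)


def loo_log_likelihood(data, lam):
    """Leave-one-out log-likelihood: each point is scored by the estimate
    built from the remaining n - 1 points (requires n >= 2)."""
    n = len(data)
    total = 0.0
    for i, xi in enumerate(data):
        s = sum(aa_kernel(xi, xj, lam) for j, xj in enumerate(data) if j != i)
        total += math.log(s / (n - 1))
    return total


def select_bandwidth(data, grid_size=101):
    """Hypothetical grid search for lam in [0.5, 0.999]. lam = 1 is excluded
    because the leave-one-out estimate of a unique point would then be zero."""
    grid = [0.5 + 0.499 * k / (grid_size - 1) for k in range(grid_size)]
    return max(grid, key=lambda lam: loo_log_likelihood(data, lam))


def all_binary_vectors(d):
    """Every point of {0, 1}^d, used to check that the estimate sums to one."""
    return list(product((0, 1), repeat=d))


if __name__ == "__main__":
    sample = [(0, 1, 1), (1, 1, 0), (0, 0, 1), (1, 1, 1), (0, 1, 1)]
    lam = select_bandwidth(sample)
    print("selected lambda:", lam)
    print("total mass:", sum(kde(x, sample, lam) for x in all_binary_vectors(3)))
```

At λ = 0.5 the estimate is uniform over {0, 1}^d, and as λ approaches 1 it approaches the empirical frequencies. For every λ the estimate sums to one, because the kernel summed over all x equals (λ + (1 − λ))^d = 1. Every cell in {0, 1}^d keeps positive mass whenever λ < 1. This is the lack of sparsity in the standard method that the abstract mentions.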


Keywords: Bandwidth selection, Kernel density estimator, Generalized cross entropy, Statistical modeling, Discrete data smoothing, Multivariate binary discrimination

AMS 2000 Subject Classification

Primary 94A17, 60K35; Secondary 68Q32, 93E14


Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Department of Mathematics, The University of Queensland, Brisbane, Australia
