Abstract
We propose a new method for density estimation of categorical data. The method implements a non-asymptotic data-driven bandwidth selection rule and provides model sparsity not present in the standard kernel density estimation method. Numerical experiments with a well-known ten-dimensional binary medical data set illustrate the effectiveness of the proposed approach for density estimation, discriminant analysis and classification.
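For orientation, the standard kernel density estimator that the abstract takes as its baseline can be sketched for multivariate binary data as below. This is a minimal sketch, not the proposed method: it assumes the classical Aitchison-Aitken kernel and picks the bandwidth by leave-one-out likelihood cross-validation over a grid. The function names, the toy data, and the grid are all hypothetical and chosen only for illustration.

```python
import math

def hamming(x, y):
    """Number of coordinates in which two binary vectors differ."""
    return sum(a != b for a, b in zip(x, y))

def aa_kernel(x, y, lam):
    """Aitchison-Aitken kernel for binary vectors, with lam in [0.5, 1]."""
    d = len(x)
    k = hamming(x, y)
    return lam ** (d - k) * (1.0 - lam) ** k

def kde(x, data, lam):
    """Baseline kernel density estimate at x: average kernel over the sample."""
    return sum(aa_kernel(x, xi, lam) for xi in data) / len(data)

def loo_log_likelihood(data, lam):
    """Leave-one-out log-likelihood used here as the bandwidth criterion."""
    n = len(data)
    score = 0.0
    for i, x in enumerate(data):
        total = sum(aa_kernel(x, xj, lam) for j, xj in enumerate(data) if j != i)
        if total <= 0.0:
            # A unique point gets zero mass when lam == 1 (no smoothing).
            return float("-inf")
        score += math.log(total / (n - 1))
    return score

def select_bandwidth(data, grid):
    """Pick the grid value that maximises the leave-one-out log-likelihood."""
    return max(grid, key=lambda lam: loo_log_likelihood(data, lam))

# Hypothetical toy sample of three-dimensional binary observations.
toy_data = [(0, 0, 1), (0, 0, 1), (0, 1, 1), (1, 0, 0), (0, 0, 1)]
toy_grid = [0.6, 0.7, 0.8, 0.9, 0.95]
best_lam = select_bandwidth(toy_data, toy_grid)
```

Every observation contributes a kernel term to this estimate, so the baseline offers no sparsity. That lack of sparsity is the property the abstract contrasts the proposed method against.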
Additional information
Supported by the Australian Research Council, under grant number DP0558957.
About this article
Cite this article
Botev, Z.I., Kroese, D.P. Non-asymptotic Bandwidth Selection for Density Estimation of Discrete Data. Methodol Comput Appl Probab 10, 435–451 (2008). https://doi.org/10.1007/s11009-007-9057-z
DOI: https://doi.org/10.1007/s11009-007-9057-z
Keywords
- Bandwidth selection
- Kernel density estimator
- Generalized cross entropy
- Statistical modeling
- Discrete data smoothing
- Multivariate binary discrimination