Abstract
When only interval-grouped data are available, the classical Parzen–Rosenblatt kernel density estimator has to be modified to obtain a computable and useful estimator. The resulting nonparametric grouped-data estimator requires the choice of a smoothing parameter. In this paper, two bandwidth selectors for this estimator are analyzed. A plug-in bandwidth selector is proposed and its relative rate of convergence is obtained. Additionally, a bootstrap algorithm to select the bandwidth in this framework is designed. This method is easy to implement and does not require Monte Carlo approximation. Both proposals are compared through simulations in different scenarios. It is observed that when the sample size is medium or large and grouping is not heavy, both bandwidth selection methods perform similarly well. However, when the sample size is large and grouping is heavy, the bootstrap bandwidth selector leads to better results.
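As an illustration of the kind of modification the abstract refers to, the sketch below replaces each unobserved observation by the midpoint of its interval and weights that midpoint by the proportion of the sample falling in the interval. This is one common construction for grouped or binned data and is an assumption here, not necessarily the exact estimator studied in the paper. The function names `gaussian_kernel` and `grouped_kde`, the Gaussian kernel, the example intervals and the bandwidth value are all hypothetical choices made for the example; no bandwidth selector is implemented.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def grouped_kde(x, edges, counts, h):
    """Kernel density estimate at x from interval counts.

    Assumed form: (1/h) * sum_i w_i * K((x - t_i) / h), where t_i is the
    midpoint of interval i and w_i is the proportion of data in interval i.
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if len(edges) != len(counts) + 1:
        raise ValueError("edges must have exactly one more entry than counts")
    n = sum(counts)
    if n <= 0:
        raise ValueError("counts must sum to a positive number")

    total = 0.0
    for left, right, count in zip(edges[:-1], edges[1:], counts):
        midpoint = 0.5 * (left + right)   # t_i
        weight = count / n                # w_i
        total += weight * gaussian_kernel((x - midpoint) / h)
    return total / h


# Hypothetical grouped sample: four intervals and the number of
# observations recorded in each one.
example_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
example_counts = [5, 20, 15, 10]
example_h = 0.5  # fixed by hand; choosing h from the data is the paper's topic

for point in (0.5, 1.5, 2.5, 3.5):
    print(point, grouped_kde(point, example_edges, example_counts, example_h))
```

In this sketch the bandwidth `h` is fixed by hand. The plug-in and bootstrap selectors discussed in the abstract address precisely how to choose it from the grouped sample.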

Acknowledgements
This research has been partially supported by the Spanish Ministry of Science and Innovation, Grants MTM2011-22392 and MTM2014-52876-R, and Xunta de Galicia Grant CN2012/130. The authors thank two anonymous referees for numerous useful comments that significantly improved this article.
About this article
Cite this article
Reyes, M., Francisco-Fernández, M. & Cao, R. Bandwidth selection in kernel density estimation for interval-grouped data. TEST 26, 527–545 (2017). https://doi.org/10.1007/s11749-017-0523-9
DOI: https://doi.org/10.1007/s11749-017-0523-9
Keywords
- Smoothing parameter selection
- Plug-in bandwidth
- Bootstrap bandwidth selector
- Interval data
Mathematics Subject Classification
- 62G07
- 62N99
- 62G09