Bandwidth selection in kernel density estimation for interval-grouped data

  • Original Paper
  • Published in TEST

Abstract

When interval-grouped data are available, the classical Parzen–Rosenblatt kernel density estimator has to be modified to obtain a computable and useful estimator. The resulting nonparametric grouped-data estimator requires the choice of a smoothing parameter. In this paper, two bandwidth selectors for this estimator are analyzed. A plug-in bandwidth selector is proposed and its relative rate of convergence is obtained. Additionally, a bootstrap algorithm to select the bandwidth in this framework is designed. This method is easy to implement and does not require Monte Carlo. Both proposals are compared through simulations in different scenarios. When the sample size is medium or large and grouping is not heavy, both bandwidth selection methods perform similarly well. However, when the sample size is large and grouping is heavy, the bootstrap bandwidth selector leads to better results.
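
To make the setting concrete, the following is a minimal sketch of one common way to adapt a kernel density estimator to interval-grouped data: each interval midpoint receives a weight equal to the proportion of observations in that interval, and a Gaussian kernel is placed on it. This form is an assumption for illustration and is not taken from the abstract; the function names `grouped_kde` and `normal_reference_bandwidth` are hypothetical. The bandwidth here comes from a simple normal-reference rule of thumb, used as a stand-in; it is neither the plug-in nor the bootstrap selector studied in the paper.

```python
import math


def grouped_kde(x, edges, counts, h):
    """Evaluate a midpoint-weighted Gaussian kernel density estimate at x.

    edges  : interval limits y_0 < y_1 < ... < y_k (length k + 1)
    counts : number of observations falling in each of the k intervals
    h      : bandwidth (smoothing parameter), must be positive
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("edges must have one more element than counts")
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = sum(counts)
    total = 0.0
    for left, right, c in zip(edges[:-1], edges[1:], counts):
        mid = 0.5 * (left + right)   # interval midpoint
        w = c / n                    # sample proportion in the interval
        u = (x - mid) / h
        total += w * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return total / h


def normal_reference_bandwidth(edges, counts):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) from grouped moments.

    Illustrative stand-in only: NOT the plug-in or bootstrap selector
    analyzed in the paper.
    """
    n = sum(counts)
    mids = [0.5 * (a + b) for a, b in zip(edges[:-1], edges[1:])]
    mean = sum(c * m for c, m in zip(counts, mids)) / n
    var = sum(c * (m - mean) ** 2 for c, m in zip(counts, mids)) / n
    return 1.06 * math.sqrt(var) * n ** (-0.2)


# Hypothetical grouped sample: four intervals on [0, 4] with 50 observations.
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
counts = [5, 20, 15, 10]
h = normal_reference_bandwidth(edges, counts)
density_at_2 = grouped_kde(2.0, edges, counts, h)
```

With heavy grouping (few, wide intervals), a rule of thumb like this can be expected to behave poorly, which is the kind of situation where a purpose-built bandwidth selector matters.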

References

  • Bowman A (1984) An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71:353–360

  • Bowman A, Azzalini A (1997) Applied smoothing techniques for data analysis: the kernel approach with S-plus illustrations. Oxford University Press, Oxford

  • Cao R (1993) Bootstrapping the mean integrated squared error. J Multivar Anal 45:137–160

  • Cao R, Francisco-Fernandez M, Anand A, Bastida F, Gonzalez-Andujar J (2011) Computing statistical indices for hydrothermal times using weed emergence data. J Agric Sci 149:701–712

  • Chacón JE, Duong T (2010) Multivariate plug-in bandwidth selection with unconstrained pilot bandwidth matrices. TEST 19:375–398

  • Devroye L (1997) Universal smoothing factor selection in density estimation: theory and practice. TEST 6:223–320

  • Faraway J, Jhun M (1990) Bootstrap choice of bandwidth for density estimators. J Am Stat Assoc 85:1119–1122

  • Guidoum AC (2014) kedd: Kernel estimator and bandwidth selection for density and its derivatives. R package version 1.0.1. http://CRAN.R-project.org/package=kedd

  • Hall P, Marron JS (1987) Estimation of integrated squared density derivatives. Stat Probab Lett 6:109–115

  • Hall P, Wand MP (1996) On the accuracy of binned kernel density estimators. J Multivar Anal 56:165–184

  • Jang W, Loh JM (2010) Density estimation for grouped data with application to line transect sampling. Ann Appl Stat 4:893–915

  • Jones MC (1991) The roles of ISE and MISE in density estimation. Stat Probab Lett 12:51–56

  • Jones MC, Sheather SJ (1991) Using non-stochastic terms to advantage in kernel-based estimation of integrated squared density derivatives. Stat Probab Lett 11:511–514

  • Jones MC, Marron JS, Sheather SJ (1996) A brief survey of bandwidth selection for density estimation. J Am Stat Assoc 91:401–407

  • Mächler M (2014) nor1mix: normal (1-d) mixture models (S3 classes and methods). R package version 1.2-0. http://CRAN.R-project.org/package=nor1mix

  • Mammen E (1990) A short note on optimal bandwidth selection for kernel estimators. Stat Probab Lett 9:23–25

  • Marron J (1992) Bootstrap bandwidth selection. In: LePage R, Billard L (eds) Exploring the limits of bootstrap. Wiley, New York, pp 249–262

  • Marron JS, Wand MP (1992) Exact mean integrated squared error. Ann Stat 20:712–736

  • Park BU, Marron JS (1990) Comparison of data-driven bandwidth selectors. J Am Stat Assoc 85:66–72

  • Parzen E (1962) On estimation of a probability density function and mode. Ann Math Stat 33:1065–1076

  • R Core Team (2016) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org/

  • Reyes M, Francisco-Fernandez M, Cao R (2016) Nonparametric kernel density estimation for general grouped data. J Nonparametr Stat 28:235–249

  • Rosenblatt M (1956) Remarks on some nonparametric estimates of a density function. Ann Math Stat 27:832–837

  • Scott D, Sheather SJ (1985) Kernel density estimation with binned data. Commun Stat Theory Methods 14:1353–1359

  • Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc Series B 53:683–690

  • Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London

  • Taylor C (1989) Bootstrap choice of the tuning parameter in kernel density estimators. Biometrika 76:705–712

  • Wand M (2014) KernSmooth: functions for kernel smoothing for Wand & Jones (1995). R package version 2.23-12. http://CRAN.R-project.org/package=KernSmooth

  • Wand MP, Jones MC (1995) Kernel smoothing. Chapman and Hall/CRC, London

Acknowledgements

This research has been partially supported by the Spanish Ministry of Science and Innovation, Grants MTM2011-22392 and MTM2014-52876-R, and Xunta de Galicia Grant CN2012/130. The authors thank two anonymous referees for numerous useful comments that significantly improved this article.

Author information

Corresponding author

Correspondence to Mario Francisco-Fernández.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 281 KB)

About this article

Cite this article

Reyes, M., Francisco-Fernández, M. & Cao, R. Bandwidth selection in kernel density estimation for interval-grouped data. TEST 26, 527–545 (2017). https://doi.org/10.1007/s11749-017-0523-9

  • DOI: https://doi.org/10.1007/s11749-017-0523-9
