
Automatic bandwidth selection for recursive kernel density estimators with length-biased data

  • Original Paper
  • Published in Japanese Journal of Statistics and Data Science

Abstract

In this paper, we propose an automatic bandwidth selection procedure for recursive kernel estimators of a probability density function defined by a stochastic approximation algorithm in the case of length-biased data. We compare the proposed plug-in method with the cross-validation method and the so-called smooth bootstrap bandwidth selector through simulations as well as on a real data set. The results show that, using the selected plug-in bandwidth together with suitable stepsizes, the proposed recursive estimators are very competitive with the non-recursive one in terms of estimation error and much better in terms of computational cost.
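The recursive scheme summarized in the abstract is straightforward to implement. The sketch below is an illustration only, not the paper's method: it assumes a Gaussian kernel, the simple stepsizes \(\beta_n = 1/n\) and bandwidths \(h_n = n^{-1/5}\) in place of the plug-in choices studied in the paper, and a harmonic-mean estimator of the normalizing constant \(\eta\); the function name `recursive_lb_kde` is ours.

```python
import math
import random

def recursive_lb_kde(sample, grid, beta=lambda n: 1.0 / n, h=lambda n: n ** (-1 / 5)):
    """Recursive kernel density estimate under length bias (illustrative sketch).

    Assumptions (not the paper's tuned choices): Gaussian kernel,
    stepsizes beta_n = 1/n, bandwidths h_n = n^{-1/5}, and eta estimated
    on-line by the harmonic-mean estimator eta_n = n / sum(1/Y_i).
    """
    f = [0.0] * len(grid)
    inv_sum = 0.0
    for n, y in enumerate(sample, start=1):
        inv_sum += 1.0 / y
        eta = n / inv_sum
        bn, hn = beta(n), h(n)
        for i, x in enumerate(grid):
            z = (x - y) / hn
            kern = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
            # recursive update: f_n = (1 - beta_n) f_{n-1} + beta_n eta h_n^{-1} Y_n^{-1} K(.)
            f[i] = (1 - bn) * f[i] + bn * eta * kern / (hn * y)
    return f

random.seed(0)
# Length-biased sampling from f = Exp(1): the observed density is Gamma(2, 1)
ys = [random.gammavariate(2.0, 1.0) for _ in range(5000)]
grid = [0.05 * i for i in range(1, 121)]
est = recursive_lb_kde(ys, grid)
mass = sum(est) * 0.05  # crude Riemann sum over the grid (some mass falls outside the grid)
```

With \(\beta_n = 1/n\) the recursion collapses to the plain average of the kernel terms \(Z_k\), which provides a convenient sanity check on the implementation.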

Fig. 1


References

  • Altman, N., & Leger, C. (1995). Bandwidth selection for kernel distribution function estimation. Journal of Statistical Planning and Inference, 46, 195–214.

  • El Barmi, H., & Simonoff, J. S. (2000). Transformation-based density estimation for weighted distributions. Journal of Nonparametric Statistics, 12, 861–878.

  • Bhattacharyya, B. B., Franklin, L. A., & Richardson, G. D. (1988). A comparison of nonparametric unweighted and length-biased density estimation of fibres. Communications in Statistics-Theory and Methods, 17, 3629–3644.

  • Bojanic, R., & Seneta, E. (1973). A unified theory of regularly varying sequences. Mathematische Zeitschrift, 134, 91–106.

  • Borrajo, M. I., González-Manteiga, W., & Martínez-Miranda, M. D. (2017). Bandwidth selection for kernel density estimation with length-biased data. Journal of Nonparametric Statistics, 29, 636–668.

  • Brunel, E., Comte, F., & Guilloux, A. (2009). Nonparametric density estimation in presence of bias and censoring. Test, 18, 166–194.

  • Cox, D. (2005). Some sampling problems in technology. In D. Hand & A. Herzberg (Eds.), Selected statistical papers of Sir David Cox (Vol. 1, pp. 81–92). Cambridge: Cambridge University Press.

  • Cutillo, L., De Feis, I., Nikolaidou, C., & Sapatinas, T. (2014). Wavelet density estimation for weighted data. Journal of Statistical Planning and Inference, 146, 1–19.

  • de Uña-Álvarez, J. (2004). Nonparametric estimation under length-biased sampling and type I censoring: A moment based approach. Annals of the Institute of Statistical Mathematics, 56, 667–681.

  • Delaigle, A., & Gijbels, I. (2004). Practical bandwidth selection in deconvolution kernel density estimation. Computational Statistics and Data Analysis, 45, 249–267.

  • Duflo, M. (1997). Random iterative models. Applications of Mathematics. Berlin: Springer.

  • Duin, R. P. (1976). On the choice of smoothing parameters for Parzen estimators of probability density functions. IEEE Transactions on Computers, 25, 1175–1179.

  • Efromovich, S. (2004). Density estimation for biased data. The Annals of Statistics, 32, 1137–1161.

  • Fisher, R. A. (1934). The effects of methods of ascertainment upon the estimation of frequencies. The Annals of Eugenics, 6, 13–25.

  • Galambos, J., & Seneta, E. (1973). Regularly varying sequences. Proceedings of the American Mathematical Society, 41, 110–116.

  • Hanin, L. G., Rachev, S. T., Tsodikov, A. D., & Yakovlev, Y. (1997). A stochastic model of carcinogenesis and tumor size at detection. Advances in Applied Probability, 29, 607–628.

  • Hart, J. D., & Vieu, P. (1990). Data-driven bandwidth choice for density estimation based on dependent data. The Annals of Statistics, 18, 873–890.

  • Huang, Y., Chen, X., & Wu, W. B. (2014). Recursive nonparametric estimation for time series. IEEE Transactions on Information Theory, 60, 1301–1312.

  • Jmaei, A., Slaoui, Y., & Dellagi, W. (2017). Recursive distribution estimators defined by stochastic approximation method using Bernstein polynomials. Journal of Nonparametric Statistics, 29, 792–805.

  • Jones, M. C. (1991). Kernel density estimation for length-biased data. Biometrika, 78, 511–519.

  • Kushner, H. J., & Yin, G. G. (2003). Stochastic approximation and recursive algorithms and applications. Applications of Mathematics (Vol. 35). New York: Springer.

  • Marron, J. S. (1988). Automatic smoothing parameter selection: A survey. Empirical Economics, 13, 187–208.

  • Milet, J., Nuel, G., Watier, L., Courtin, D., Slaoui, Y., Senghor, P., et al. (2010). Genome wide linkage study, using a 250K SNP map, of Plasmodium falciparum infection and mild malaria attack in a Senegalese population. PLoS One, 5(7), e11616.

  • Mokkadem, A., & Pelletier, M. (2007). A companion for the Kiefer–Wolfowitz–Blum stochastic approximation algorithm. The Annals of Statistics, 35, 1749–1772.

  • Mokkadem, A., Pelletier, M., & Slaoui, Y. (2009a). The stochastic approximation method for the estimation of a multivariate probability density. Journal of Statistical Planning and Inference, 139, 2459–2478.

  • Mokkadem, A., Pelletier, M., & Slaoui, Y. (2009b). Revisiting Révész's stochastic approximation method for the estimation of a regression function. ALEA: Latin American Journal of Probability and Mathematical Statistics, 6, 63–114.

  • Patil, G. P. (2002). Weighted distributions. Encyclopedia of Environmetrics, 4, 2369–2377.

  • Rao, C. R. (1965). On discrete distributions arising out of methods of ascertainment. In G. P. Patil (Ed.), Classical and contagious discrete distributions (pp. 320–332). Calcutta: Pergamon Press and Statistical Publishing Society.

  • Révész, P. (1973). Robbins–Monro procedure in a Hilbert space and its application in the theory of learning processes I. Studia Scientiarum Mathematicarum Hungarica, 8, 391–398.

  • Révész, P. (1977). How to apply the method of stochastic approximation in the non-parametric estimation of a regression function. Mathematische Operationsforschung und Statistik, Series Statistics, 8, 119–126.

  • Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9, 65–78.

  • Scott, D. W. (1992). Multivariate density estimation: Theory, practice and visualization. Hoboken: Wiley.

  • Scott, D. W., & Terrell, G. R. (1987). Biased and unbiased cross-validation in density estimation. Journal of the American Statistical Association, 82, 1131–1146.

  • Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall.

  • Slaoui, Y. (2013). Large and moderate deviation principles for recursive kernel density estimators defined by stochastic approximation method. Serdica Mathematical Journal, 39, 53–82.

  • Slaoui, Y. (2014a). Bandwidth selection for recursive kernel density estimators defined by stochastic approximation method. Journal of Probability and Statistics, 2014, Article ID 739640. https://doi.org/10.1155/2014/739640.

  • Slaoui, Y. (2014b). The stochastic approximation method for the estimation of a distribution function. Mathematical Methods of Statistics, 23, 306–325.

  • Slaoui, Y. (2016). On the choice of smoothing parameters for semirecursive nonparametric hazard estimators. Journal of Statistical Theory and Practice, 10, 656–672.

  • Slaoui, Y., & Jmaei, A. (2019). Recursive density estimators based on Robbins–Monro's scheme and using Bernstein polynomials. Statistics and Its Interface, 12, 439–455.

  • Slaoui, Y., & Nuel, G. (2014). Parameter estimation in a hierarchical random intercept model with censored response: An approach using a SEM algorithm and Gibbs sampling. Sankhya B, 76, 210–233.

  • Tsybakov, A. B. (1990). Recurrent estimation of the mode of a multidimensional distribution. Problems of Information Transmission, 8, 119–126.


Acknowledgements

I would like to thank the Editor, the Associate Editor and the referees for their constructive comments and suggestions.

Author information

Correspondence to Yousri Slaoui.


Proofs

Throughout this section we use the following notation:

$$\begin{aligned} Q_n &= \prod _{j=1}^{n}\left( 1-\beta _j\right) ,\nonumber \\ Z_n\left( x\right) &= \eta h_n^{-1}Y_n^{-1}K\left( \frac{x-Y_n}{h_n}\right) . \end{aligned}$$
(22)

Let us first state the following proposition, which gives the bias and the variance of \(f_n\).

Proposition 2

(Bias and variance of \(f_n\)) Let assumptions \(\left( A1\right) -\left( A3\right)\) hold.

  1. If \(a\in ]0, \beta /5]\), then

    $$\begin{aligned} \mathbb {E}\left[ f_n\left( x\right) \right] -f\left( x\right) =\frac{1}{2\left( 1-2a\xi \right) }f^{\left( 2\right) }\left( x\right) h_n^2\mu _2\left( K\right) +o\left( h_n^2\right) . \end{aligned}$$
    (23)

    If \(a\in ]\beta /5, 1[\), then

    $$\begin{aligned} \mathbb {E}\left[ f_n\left( x\right) \right] -f\left( x\right) =o\left( \sqrt{\beta _nh_n^{-1}}\right) . \end{aligned}$$
    (24)
  2. If \(a\in [\beta /5, 1[\), then

    $$\begin{aligned} Var\left[ f_n\left( x\right) \right] &= \frac{1}{\left( 2-\left( \beta -a\right) \xi \right) }\beta _nh_n^{-1}H\left( x\right) R\left( K\right) +o\left( \beta _nh_n^{-1}\right) . \end{aligned}$$
    (25)

    If \(a\in ]0,\beta /5[\), then

    $$\begin{aligned} {\mathrm{Var}}\left[ f_n\left( x\right) \right] =o\left( h_n^4\right) . \end{aligned}$$
    (26)
  3. If \(\lim _{n\rightarrow \infty }\left( n\beta _n\right) >\max \left\{ 2a, \left( \beta -a\right) /2\right\}\), then (23) and (25) hold simultaneously.

The bias and the variance of the estimator \(f_n\) defined by the stochastic approximation algorithm (1) thus depend heavily on the choice of the stepsize \(\left( \beta _n\right)\).
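The threshold \(a=\beta /5\) that organizes Proposition 2 comes from balancing the squared bias against the variance. With \(h_n\in \mathcal {GS}\left( -a\right)\) and \(\beta _n\in \mathcal {GS}\left( -\beta \right)\), a heuristic order comparison (ignoring constants) gives

$$\begin{aligned} \text {squared bias}\asymp h_n^4\asymp n^{-4a},\qquad \text {variance}\asymp \beta _nh_n^{-1}\asymp n^{-\left( \beta -a\right) }, \end{aligned}$$

and the two orders coincide exactly when \(4a=\beta -a\), that is, \(a=\beta /5\); for \(a<\beta /5\) the variance is negligible with respect to the squared bias, and conversely for \(a>\beta /5\).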

Before giving the outlines of the proofs, we state the following technical lemma, which is proved in Mokkadem et al. (2009a) and which is applied repeatedly throughout the proofs.

Lemma 1

Let \(\left( v_n\right) \in \mathcal {GS}\left( v^*\right)\), \(\left( \beta _n\right) \in \mathcal {GS}\left( -\beta \right)\), and \(m>0\) such that \(m-v^*\xi >0\) where \(\xi\) is defined in (4). We have

$$\begin{aligned} \lim _{n \rightarrow +\infty }v_nQ_n^{m}\sum _{k=1}^nQ_k^{-m}\frac{\beta _k}{v_k} =\frac{1}{m-v^*\xi }. \end{aligned}$$
(27)

Moreover, for all positive sequences \(\left( \alpha _n\right)\) such that \(\lim _{n \rightarrow +\infty }\alpha _n=0\), and all \(\delta \in \mathbb {R}\),

$$\begin{aligned} \lim _{n \rightarrow +\infty }v_nQ_n^{m}\left[ \sum _{k=1}^n Q_k^{-m} \frac{\beta _k}{v_k}\alpha _k+\delta \right] =0. \end{aligned}$$
(28)

Let us underline that the application of Lemma 1 requires Assumption (A2)(iii) on the limit of \(\left( n\beta _n\right)\) as n goes to infinity.
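Lemma 1 can be checked numerically. The sketch below rests on assumed concrete forms: \(v_n=n^{v^*}\) with \(v^*=0.4\), \(\beta _n=1/(n+1)\) (so that, assuming \(\xi =\lim (n\beta _n)^{-1}\) as in (4), \(\xi =1\)), and \(m=1\); the function name `lemma1_sum` is ours.

```python
import math

def lemma1_sum(n, m=1, v_star=0.4):
    """Compute v_n * Q_n^m * sum_{k=1}^n Q_k^{-m} beta_k / v_k for
    beta_k = 1/(k+1) and v_k = k^{v_star} (illustrative assumptions)."""
    beta = lambda k: 1.0 / (k + 1)
    v = lambda k: k ** v_star
    # accumulate log Q_k = sum_j log(1 - beta_j) to avoid underflow
    logs = []
    log_q = 0.0
    for k in range(1, n + 1):
        log_q += math.log(1 - beta(k))
        logs.append(log_q)
    log_qn = logs[-1]
    total = 0.0
    for k in range(1, n + 1):
        # Q_n^m * Q_k^{-m} = exp(m * (log Q_n - log Q_k))
        total += math.exp(m * (log_qn - logs[k - 1])) * beta(k) / v(k)
    return v(n) * total

approx = lemma1_sum(100000)
limit = 1 / (1 - 0.4)  # Lemma 1 predicts 1/(m - v* xi) with m = 1, v* = 0.4, xi = 1
```

For these choices \(Q_k=1/(k+1)\) in closed form, so the sum can also be checked by hand; the numerical value approaches \(1/0.6\) as n grows.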

Our proofs are organized as follows: Proposition 1 is proven in Sect. A.1, Proposition 2 in Sect. A.2, and Theorem 1 in Sect. A.3.

1.1 Proof of Proposition 1

Let us first note that, in view of (35), we have

$$\begin{aligned}&\int _{\mathbb {R}}\left\{ Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _k\left[ \mathbb {E}\left( Z_k\left( x\right) \right) -f\left( x\right) \right] \right\} ^2f\left( x\right) \mathrm{d}x\\&\quad =\frac{1}{4}\mu _2^2\left( K\right) \left[ Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _kh_k^2\right] ^2\int _{\mathbb {R}}\left( f^{\left( 2\right) }\left( x\right) \right) ^2f\left( x\right) \mathrm{d}x +\int _{\mathbb {R}}\left[ Q_n\sum _{k=1}^nQ_k^{-1}\beta _kh_k^2\delta _k\left( x\right) \right] ^2f\left( x\right) \mathrm{d}x \\&\qquad +\,\mu _2\left( K\right) \left( Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _kh_k^2\right) \left( Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _kh_k^2\int _{\mathbb {R}}f^{\left( 2\right) }\left( x\right) \delta _k\left( x\right) f\left( x\right) \mathrm{d}x\right) . \end{aligned}$$

In view of \(\left( A3\right)\), the application of Lebesgue’s convergence theorem ensures that \(\lim _{k \rightarrow +\infty }\int _{\mathbb {R}}\delta _k^2\left( x\right) f\left( x\right) \mathrm{d}x=0\) and \(\lim _{k \rightarrow +\infty }\int _{\mathbb {R}} f^{\left( 2\right) }\left( x\right) \delta _k\left( x\right) f\left( x\right) \mathrm{d}x=0\). Moreover, Jensen’s inequality gives

$$\begin{aligned} \int _{\mathbb {R}}\left[ Q_n\sum _{k=1}^nQ_k^{-1}\beta _kh_k^2\delta _k\left( x\right) \right] ^2f\left( x\right) \mathrm{d}x\le & {} \left( Q_n\sum _{k=1}^nQ_k^{-1}\beta _kh_k^2\right) \left( Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _kh_k^2\int _{\mathbb {R}}\delta _k^2\left( x\right) f\left( x\right) \mathrm{d}x\right) \\\le & {} \left( Q_n\sum _{k=1}^nQ_k^{-1}\beta _kh_k^2\right) \left( Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _ko\left( h_k^2\right) \right) , \end{aligned}$$

so that we get

$$\begin{aligned} \int _{\mathbb {R}}\left\{ Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _k\left[ \mathbb {E}\left( Z_k\left( x\right) \right) -f\left( x\right) \right] \right\} ^2f\left( x\right) \mathrm{d}x &= \frac{1}{4}\mu _2^2\left( K\right) \left[ Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _kh_k^2\right] ^2\int _{\mathbb {R}}\left( f^{\left( 2\right) }\left( x\right) \right) ^2f\left( x\right) \mathrm{d}x\\&\quad +\,O\left( \left[ Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _kh_k^2\right] \left[ Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _ko\left( h_k^2\right) \right] \right) . \end{aligned}$$
  • Let us first consider the case \(a\le \beta /5\). In this case, \(\lim _{n\rightarrow \infty }\left( n\beta _n\right) >2a\), and the application of Lemma 1 gives

    $$\begin{aligned} \int _{\mathbb {R}}\left\{ Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _k\left[ \mathbb {E}\left( Z_k\left( x\right) \right) -f\left( x\right) \right] \right\} ^2f\left( x\right) \mathrm{d}x &= \frac{1}{4\left( 1-2a\xi \right) ^2}h_n^4\mu _2^2\left( K\right) \int _{\mathbb {R}} \left( f^{\left( 2\right) }\left( x\right) \right) ^2f\left( x\right) \mathrm{d}x+o\left( h_n^4\right) , \end{aligned}$$

    and ensures that \(Q_n^2=o(h_n^4)\). In view of (34), we then deduce that

    $$\begin{aligned} \int _{\mathbb {R}}\left\{ \mathbb {E}\left( f_{n}\left( x\right) \right) -f\left( x\right) \right\} ^2f\left( x\right) \mathrm{d}x=\frac{1}{4\left( 1-2a\xi \right) ^2}h_n^4\mu _2^2\left( K\right) I_2+o\left( h_n^4\right) . \end{aligned}$$
    (29)
  • Let us now consider the case \(a>\beta /5\). In this case, we have \(h_k^2=o(\sqrt{\beta _kh_k^{-1}})\) and \(\lim _{n\rightarrow \infty }\left( n\beta _n\right) >\left( \beta -a\right) /2\). The application of Lemma 1 then gives

    $$\begin{aligned} \int _{\mathbb {R}}\left\{ Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _k\left[ \mathbb {E}\left( Z_k\left( x\right) \right) -f\left( x\right) \right] \right\} ^2f\left( x\right) \mathrm{d}x &= O\left( \left[ Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _ko\left( \sqrt{\beta _kh_k^{-1}}\right) \right] ^2\right) \\ &= o\left( \beta _nh_n^{-1}\right) , \end{aligned}$$

    and ensures that \(Q_n^2=o(\beta _nh_n^{-1})\). In view of (34), we then deduce that

    $$\begin{aligned} \int _{\mathbb {R}}\left\{ \mathbb {E}\left( f_{n}\left( x\right) \right) -f\left( x\right) \right\} ^2f\left( x\right) \mathrm{d}x= o\left( \beta _nh_n^{-1}\right) . \end{aligned}$$
    (30)

    On the other hand, we note that

    $$\begin{aligned} \int _{\mathbb {R}}xVar\left[ f_n\left( x\right) \right] f_Y\left( x\right) \mathrm{d}x &= Q_n^{2}\sum _{k=1}^nQ_k^{-2}\beta _k^2\int _{\mathbb {R}}xVar\left[ Z_k\left( x\right) \right] f_Y\left( x\right) \mathrm{d}x\\ &= Q_n^{2}\sum _{k=1}^nQ_k^{-2}\beta _k^2 \left[ \frac{1}{h_k}\int _{\mathbb {R}^2}xK^2\left( z\right) H\left( x-zh_k\right) f_Y\left( x\right) \mathrm{d}z\mathrm{d}x\right. \\&\quad -\,\left. \int _{\mathbb {R}}x\left( \int _{\mathbb {R}}K\left( z\right) f\left( x-zh_k\right) \mathrm{d}z\right) ^2f_Y\left( x\right) \mathrm{d}x \right] \end{aligned}$$

    with

    $$\begin{aligned} \int _{\mathbb {R}}\int _{\mathbb {R}}xK^2\left( z\right) H\left( x-zh_k\right) f_Y\left( x\right) \mathrm{d}z\mathrm{d}x &= \int _{\mathbb {R}}K^2\left( z\right) \left( \int _{\mathbb {R}}xH\left( x-zh_k\right) f_Y\left( x\right) \mathrm{d}x\right) \mathrm{d}z\\ &= \int _{\mathbb {R}}K^2\left( z\right) \mathrm{d}z\int _{\mathbb {R}}xH\left( x\right) f_Y\left( x\right) \mathrm{d}x \end{aligned}$$

    and

    $$\begin{aligned} \int _{\mathbb {R}}x\left( \int _{\mathbb {R}}K\left( z\right) f\left( x-zh_k\right) \mathrm{d}z\right) ^2f_Y\left( x\right) \mathrm{d}x &= \int _{\mathbb {R}^{3}}xK\left( z\right) K\left( z{^{\prime }}\right) f\left( x-zh_k\right) f\left( x-z^{\prime }h_k\right) f_Y\left( x\right) \mathrm{d}z \mathrm{d}z^{\prime }\mathrm{d}x\\ &\le \Vert f_Y\Vert ^2_{\infty }\Vert K\Vert _1^2. \end{aligned}$$
  • In the case \(a\ge \beta /5\), we have \(\lim _{n\rightarrow \infty }\left( n\beta _n\right) >\left( \beta -a\right) /2\), and Lemma 1 ensures that

    $$\begin{aligned} \int _{\mathbb {R}}x\mathrm{Var}\left[ f_n\left( x\right) \right] f_Y\left( x\right) \mathrm{d}x &= Q_n^{2}\sum _{k=1}^n\frac{Q_k^{-2}\beta _k^2}{h_k}\left[ I_1\int _{\mathbb {R}}K^2\left( z\right) \mathrm{d}z+o(1)\right] \nonumber \\ &=\beta _nh_n^{-1}\frac{1}{(2-\left( \beta -a\right) \xi )} I_1\int _{\mathbb {R}}K^2\left( z\right) \mathrm{d}z+o\left( \beta _nh_n^{-1}\right) . \end{aligned}$$
    (31)
  • In the case \(a<\beta /5\), we have \(\beta _nh_n^{-1}=o\left( h_n^4\right)\) and \(\lim _{n\rightarrow \infty }\left( n\beta _n\right) >2a\), so that Lemma 1 gives

    $$\begin{aligned} \int _{\mathbb {R}}x\mathrm{Var}\left[ f_n\left( x\right) \right] f_Y\left( x\right) \mathrm{d}x &= Q_n^{2}\sum _{k=1}^nQ_k^{-2}\beta _ko\left( h_k^4\right) \nonumber \\ &= o\left( h_n^4\right) . \end{aligned}$$
    (32)

    Part 1 of Proposition 1 follows from the combination of (29) and (32), Part 2 from that of (29) and (31), and Part 3 from that of (30) and (31).

1.2 Proof of Proposition 2

First, since \(\eta _n\) is a consistent estimator of \(\eta\), we work throughout the proof with the following version of the estimator of f at the point x,

$$\begin{aligned} f_n\left( x\right) =\left( 1-\beta _n\right) f_{n-1}\left( x\right) +\eta \beta _nh_n^{-1}Y_n^{-1}K\left( h_n^{-1}\left[ x-Y_n\right] \right) , \end{aligned}$$
(33)

which is asymptotically equivalent to our proposed kernel density estimator (1).

Moreover, in view of (22), we have

$$\begin{aligned} f_n\left( x\right) -f\left( x\right) &= \left( 1-\beta _n\right) \left( f_{n-1}\left( x\right) -f\left( x\right) \right) +\beta _n\left( Z_n\left( x\right) -f\left( x\right) \right) \nonumber \\ &= \sum _{k=1}^{n-1}\left[ \prod _{j=k+1}^{n}\left( 1-\beta _j\right) \right] \beta _k\left( Z_k\left( x\right) -f\left( x\right) \right) +\beta _n\left( Z_n\left( x\right) -f\left( x\right) \right) \nonumber \\&\quad +\,\left[ \prod _{j=1}^{n}\left( 1-\beta _j\right) \right] \left( f_{0}\left( x\right) -f\left( x\right) \right) \nonumber \\ &= Q_n\sum _{k=1}^nQ_k^{-1}\beta _k\left( Z_k\left( x\right) -f\left( x\right) \right) +Q_n\left( f_0\left( x\right) -f\left( x\right) \right) . \end{aligned}$$
(34)
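The unrolling of the recursion into the weighted-sum form (34) is a purely algebraic identity, so it can be verified numerically. In the sketch below, the terms \(Z_k\left( x\right) -f\left( x\right)\) and \(f_0\left( x\right) -f\left( x\right)\) are replaced by arbitrary placeholder numbers, and the stepsizes \(\beta _k=1/(k+2)\) are an arbitrary choice:

```python
import random

random.seed(1)
n = 50
beta = [1.0 / (k + 2) for k in range(1, n + 1)]  # arbitrary stepsizes beta_k = 1/(k+2)
z = [random.random() for _ in range(n)]          # placeholders for Z_k(x) - f(x)
f0 = 0.3                                         # placeholder for f_0(x) - f(x)

# Left-hand side: iterate f_k - f = (1 - beta_k)(f_{k-1} - f) + beta_k (Z_k - f)
iterated = f0
for bk, zk in zip(beta, z):
    iterated = (1 - bk) * iterated + bk * zk

# Right-hand side of (34): Q_n sum_k Q_k^{-1} beta_k (Z_k - f) + Q_n (f_0 - f)
q = [1.0]                                        # q[k] = Q_k = prod_{j<=k} (1 - beta_j)
for bk in beta:
    q.append(q[-1] * (1 - bk))
qn = q[-1]
closed = qn * sum((beta[k] / q[k + 1]) * z[k] for k in range(n)) + qn * f0
```

Both routes give the same number up to floating-point roundoff, which is exactly the content of (34).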

It follows that

$$\begin{aligned} \mathbb {E}\left( f_n\left( x\right) \right) -f\left( x\right) &= Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _k\left( \mathbb {E}\left( Z_k\left( x\right) \right) -f\left( x\right) \right) +Q_n\left( f_0\left( x\right) -f\left( x\right) \right) . \end{aligned}$$

Now, since \(f_Y\left( y\right) =\eta ^{-1}yf\left( y\right)\), Taylor’s expansion with integral remainder ensures that

$$\begin{aligned} \mathbb {E}\left[ Z_k\left( x\right) \right] -f\left( x\right) &= \eta \int _{\mathbb {R}}h_k^{-1}y^{-1}K\left( h_k^{-1}\left( x-y\right) \right) f_Y\left( y\right) dy-f\left( x\right) \nonumber \\ &= \int _{\mathbb {R}}K\left( z\right) \left[ f\left( x-zh_k\right) -f\left( x\right) \right] \mathrm{d}z \nonumber \\& = \frac{h_k^2}{2} \mu _2\left( K\right) f^{\left( 2\right) }\left( x\right) +h_k^2\delta _k\left( x\right) \end{aligned}$$
(35)

with

$$\begin{aligned} \delta _k\left( x\right) =h_k^{-2}\int _{\mathbb {R}} K\left( z\right) \left[ f\left( x-zh_k\right) -f\left( x\right) -z^2\frac{h_k^2}{2}f^{\left( 2\right) }\left( x\right) \right] \mathrm{d}z, \end{aligned}$$

and, since \(f^{\left( 2\right) }\) is bounded and continuous at x, we have \(\lim _{k\rightarrow \infty }\delta _k\left( x\right) =0\). In the case \(a\le \beta /5\), we have \(\lim _{n\rightarrow \infty }\left( n\beta _n\right) >2a\); the application of Lemma 1 then gives

$$\begin{aligned} \mathbb {E}\left[ f_n\left( x\right) \right] -f\left( x\right) &= \frac{1}{2}\mu _2\left( K\right) f^{\left( 2\right) }\left( x\right) Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _kh_k^2[1+o(1)]+Q_n\left( f_0\left( x\right) -f\left( x\right) \right) \\ &= \frac{1}{2(1-2a\xi )}\mu _2\left( K\right) f^{\left( 2\right) }\left( x\right) h_n^2+o\left( h_n^2\right) , \end{aligned}$$

and (23) follows. In the case \(a>\beta /5\), we have \(h_n^2=o\left( \sqrt{\beta _nh_n^{-1}}\right)\); since \(\lim _{n\rightarrow \infty }\left( n\beta _n\right) >\left( \beta -a\right) /2\), Lemma 1 then ensures that

$$\begin{aligned} \mathbb {E}\left[ f_n\left( x\right) \right] -f\left( x\right) &= Q_n\sum _{k=1}^{n}Q_k^{-1}\beta _ko\left( \sqrt{\beta _kh_k^{-1}}\right) +O\left( Q_n\right) \\ &= o\left( \sqrt{\beta _nh_n^{-1}}\right) , \end{aligned}$$

which gives (24). Now, since \(f_Y\left( y\right) =\eta ^{-2}y^2H\left( y\right)\), we have

$$\begin{aligned} Var\left[ f_n\left( x\right) \right] &= Q_n^{2}\sum _{k=1}^nQ_k^{-2}\beta _k^2Var\left[ Z_k\left( x\right) \right] \\& = Q_n^{2}\sum _{k=1}^n\frac{Q_k^{-2}\beta _k^2}{h_k} \left[ \int _{\mathbb {R}}K^2\left( z\right) H\left( x-zh_k\right) \mathrm{d}z- h_k\left( \int _{\mathbb {R}}K\left( z\right) f\left( x-zh_k\right) \mathrm{d}z\right) ^2\right] \\&= Q_n^{2}\sum _{k=1}^n\frac{Q_k^{-2}\beta _k^2}{h_k} \left[ H\left( x\right) \int _{\mathbb {R}}K^2\left( z\right) \mathrm{d}z+ \nu _k\left( x\right) -h_k{\tilde{\nu }}_k\left( x\right) \right] \end{aligned}$$

with

$$\begin{aligned} \nu _k\left( x\right) &= \int _{\mathbb {R}}K^2\left( z\right) \left[ H\left( x-zh_k\right) -H\left( x\right) \right] \mathrm{d}z,\\ {\tilde{\nu }}_k\left( x\right)&= \left( \int _{\mathbb {R}}K\left( z\right) f\left( x-zh_k\right) \mathrm{d}z\right) ^2. \end{aligned}$$

In view of \(\left( A3\right)\), we have \(\lim _{k\rightarrow \infty }\nu _k\left( x\right) =0\) and \(\lim _{k\rightarrow \infty }h_k{\tilde{\nu }}_k\left( x\right) =0\). In the case \(a\ge \beta /5\), we have \(\lim _{n\rightarrow \infty }\left( n\beta _n\right) >\left( \beta -a\right) /2\), and the application of Lemma 1 gives

$$\begin{aligned} Var\left[ f_n\left( x\right) \right] &= Q_n^{2}\sum _{k=1}^n\frac{Q_k^{-2}\beta _k^2}{h_k}\left[ H\left( x\right) \int _{\mathbb {R}}K^2\left( z\right) \mathrm{d}z+o\left( 1\right) \right] \\ &= \frac{1}{2-\left( \beta -a\right) \xi }\beta _nh_n^{-1} \left[ H\left( x\right) \int _{\mathbb {R}}K^2\left( z\right) \mathrm{d}z+o\left( 1\right) \right] , \end{aligned}$$

which proves (25). In the case \(a<\beta /5\), we have \(\beta _nh_n^{-1}=o\left( h_n^4\right)\); since \(\lim _{n\rightarrow \infty }\left( n\beta _n\right) >2a\), Lemma 1 then ensures that

$$\begin{aligned} Var\left[ f_n\left( x\right) \right] &= Q_n^{2}\sum _{k=1}^nQ_k^{-2}\beta _ko\left( h_k^4\right) \\&= o\left( h_n^4\right) , \end{aligned}$$

which gives (26).

1.3 Proof of Theorem 1

Let us at first assume that, if \(a\ge \beta /5\), then

$$\begin{aligned} \sqrt{\beta _n^{-1} h_n}\left( f_{n}\left( x\right) -\mathbb {E}\left[ f_n\left( x\right) \right] \right) {\mathop {\rightarrow }\limits ^{\mathcal {D}}}\mathcal {N}\left( 0, \frac{1}{2-\left( \beta -a\right) \xi }H\left( x\right) \int _{\mathbb {R}}K^2\left( z\right) \mathrm{d}z\right) . \end{aligned}$$
(36)

In the case when \(a>\beta /5\), Part 1 of Theorem 1 follows from the combination of (24) and (36). In the case when \(a=\beta /5\), Parts 1 and 2 of Theorem 1 follow from the combination of (23) and (36). In the case \(a<\beta /5\), (26) implies that

$$\begin{aligned} h_n^{-2}\left( f_n\left( x\right) -\mathbb {E}\left( f_n\left( x\right) \right) \right) {\mathop {\rightarrow }\limits ^{\mathbb {P}}}0, \end{aligned}$$

and the application of (23) gives Part 2 of Theorem 1.

We now prove (36). In view of (33), we have

$$\begin{aligned} f_n\left( x\right) -\mathbb {E}\left[ f_n\left( x\right) \right] &= \left( 1-\beta _n\right) \left( f_{n-1}\left( x\right) - \mathbb {E}\left[ f_{n-1}\left( x\right) \right] \right) +\beta _n\left( Z_n\left( x\right) -\mathbb {E}\left[ Z_n\left( x\right) \right] \right) \\&= Q_n\sum _{k=1}^nQ_k^{-1}\beta _k\left( Z_k\left( x\right) - \mathbb {E}\left[ Z_k\left( x\right) \right] \right) . \end{aligned}$$

Set

$$\begin{aligned} T_{k}\left( x\right) =Q_k^{-1}\beta _k\left( Z_k\left( x\right) -\mathbb {E}\left( Z_k\left( x\right) \right) \right) . \end{aligned}$$

The application of Lemma 1 ensures that

$$\begin{aligned} v_n^2 &= \sum _{k=1}^nVar\left( T_{k}\left( x\right) \right) \\ &= \sum _{k=1}^nQ_k^{-2}\beta _k^2Var\left( Z_k\left( x\right) \right) \\&= \sum _{k=1}^n\frac{Q_k^{-2}\beta _k^2}{h_k}\left[ H\left( x\right) \int _{\mathbb {R}}K^2\left( z\right) \mathrm{d}z+o\left( 1\right) \right] \\&= \frac{1}{Q_n^2}\beta _nh_n^{-1}\left[ \frac{1}{2-\left( \beta -a\right) \xi }H\left( x\right) \int _{\mathbb {R}}K^2\left( z\right) \mathrm{d}z+o\left( 1\right) \right] . \end{aligned}$$

On the other hand, we have, for all \(p>0\),

$$\begin{aligned} \mathbb {E}\left[ \left| Z_k\left( x\right) \right| ^{2+p}\right] &= O\left( \frac{1}{h_k^{1+p}}\right) , \end{aligned}$$

and, since \(\lim _{n\rightarrow \infty }\left( n\beta _n\right) >\left( \beta -a\right) /2\), there exists \(p>0\) such that \(\lim _{n\rightarrow \infty }\left( n\beta _n\right) >\frac{1+p}{2+p}\left( \beta -a\right)\). Applying Lemma 1, we get

$$\begin{aligned} \sum _{k=1}^n\mathbb {E}\left[ \left| T_{k}\left( x\right) \right| ^{2+p}\right] &= O\left( \sum _{k=1}^n Q_k^{-2-p}\beta _k^{2+p}\mathbb {E}\left[ \left| Z_k\left( x\right) \right| ^{2+p}\right] \right) \\&= O\left( \sum _{k=1}^n \frac{Q_k^{-2-p}\beta _k^{2+p}}{h_k^{1+p}}\right) \\&= O\left( \frac{\beta _n^{1+p}}{Q_n^{2+p}h_n^{1+p}}\right) , \end{aligned}$$

and we thus obtain

$$\begin{aligned} \frac{1}{v_n^{2+p}}\sum _{k=1}^n\mathbb {E}\left[ \left| T_{k}\left( x\right) \right| ^{2+p}\right] &= O\left( {\left[ \beta _nh_n^{-1}\right] }^{p/2}\right) =o\left( 1\right) . \end{aligned}$$

The convergence in (36) then follows from the application of Lyapunov’s Theorem.

About this article


Cite this article

Slaoui, Y. Automatic bandwidth selection for recursive kernel density estimators with length-biased data. Jpn J Stat Data Sci 3, 429–452 (2020). https://doi.org/10.1007/s42081-019-00053-z

