Kernel Density Estimation Based on the Distinct Units in Sampling with Replacement

Abstract

This paper considers the problem of estimating density functions using the kernel method based on the set of distinct units in sampling with replacement. Using a combined design-model-based inference framework, which accounts for the underlying superpopulation model as well as the randomization distribution induced by the sampling design, we derive asymptotic expressions for the bias and integrated mean squared error (MISE) of a Parzen-Rosenblatt-type kernel density estimator (KDE) based on the distinct units from sampling with replacement. We also prove the asymptotic normality of the distinct-units KDE under both the design-based and combined inference frameworks. Additionally, we give the asymptotic MISE formulas of several alternative estimators, including the estimator based on the full with-replacement sample and estimators based on without-replacement sampling of similar cost. Using the MISE expressions, we discuss how the various estimators compare asymptotically. Moreover, we use Monte Carlo simulations to investigate the finite-sample properties of these estimators. Our simulation results show that the distinct-units KDE and the without-replacement KDEs perform similarly and are always superior to the full with-replacement sample KDE. Furthermore, we briefly discuss a Nadaraya-Watson-type kernel regression estimator based on the distinct units from sampling with replacement, derive its MSE under the combined inference framework, and demonstrate its finite-sample properties using a small simulation study. Finally, we extend the distinct-units density and regression estimators to the case of two-stage sampling with replacement.
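
As a concrete illustration of the estimator studied here, the following minimal sketch (not taken from the paper; the function names, the Gaussian kernel, and the toy population are assumptions) builds the distinct-units KDE: draw a simple random sample with replacement, reduce it to its ν distinct units, and apply an ordinary Parzen-Rosenblatt estimator to those units.

import numpy as np

def distinct_units_kde(population, n, y_grid, h, seed=None):
    # Draw a with-replacement sample of size n from the finite population,
    # keep the nu distinct units, and evaluate a Gaussian-kernel KDE on y_grid.
    rng = np.random.default_rng(seed)
    sample = rng.choice(population, size=n, replace=True)
    distinct = np.unique(sample)                      # the nu distinct units
    nu = distinct.size
    u = (y_grid[:, None] - distinct[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)    # standard normal kernel
    return K.sum(axis=1) / (nu * h), nu

# Toy usage: a population of 2000 values, n = 500 draws with replacement.
pop = np.random.default_rng(1).normal(size=2000)
grid = np.linspace(-3.0, 3.0, 121)
f_hat, nu = distinct_units_kde(pop, n=500, y_grid=grid, h=0.3, seed=2)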

References

  • Alin, A., Martin, M.A., Beyaztas, U. and Pathak, P.K. (2017). Sufficient m-out-of-n (m/n) bootstrap. J. Stat. Comput. Simul. 87, 1742–1753.

  • Antal, E. and Tillé, Y. (2011a). Direct bootstrap method for complex sampling designs from a finite population. J. Am. Stat. Assoc. 106, 534–543.

  • Antal, E. and Tillé, Y. (2011b). Simple random sampling with over-replacement. J. Stat. Plan. Inference 141, 597–601.

  • Arnab, R. (1999). On use of distinct respondents in randomized response surveys. Biom. J. 41, 507–513.

  • Basu, D. (1958). On sampling with and without replacement. Sankhyā 20, 287–294.

  • Bellhouse, D.R. and Stafford, J.E. (1999). Density estimation from complex surveys. Stat. Sin. 9, 407–424.

  • Bleuer, S.R. and Kratina, I.S. (2005). On the two-phase framework for joint model and design-based inference. Ann. Stat. 33, 2789–2810.

  • Bonnéry, D., Breidt, F.J. and Coquet, F. (2017). Kernel estimation for a superpopulation probability density function under informative selection. Metron 75, 301–318.

  • Buskirk, T.D. and Lohr, S.L. (2005). Asymptotic properties of kernel density estimation with complex survey data. J. Stat. Plan. Inference 128, 165–190.

  • Cochran, W.G. (1977). Sampling Techniques. Wiley, New York.

  • Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Stat. 7, 1–26.

  • Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York.

  • Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall, New York.

  • Guillera-Arroita, G. (2011). Impact of sampling with replacement in occupancy studies with spatial replication. Methods Ecol. Evol. 2, 401–406.

  • Harms, T. and Duchesne, P. (2010). On kernel nonparametric regression designed for complex survey data. Metrika 72, 111–138.

  • Hartley, H.O. and Sielken, R.L. (1975). A “superpopulation viewpoint” for finite population sampling. Biometrics 31, 411–422.

  • Isaki, C.T. and Fuller, W.A. (1982). Survey design under the regression superpopulation model. J. Am. Stat. Assoc. 77, 89–96.

  • Korwar, R.M. and Serfling, R.J. (1970). On averaging over distinct units in sampling with replacement. Ann. Math. Stat. 41, 2132–2134.

  • Lanke, J. (1975). Some contributions to the theory of survey sampling. PhD thesis, Department of Mathematical Statistics, University of Lund, Sweden.

  • Lohr, S.L. (2010). Sampling: Design and Analysis. Cengage Learning, Massachusetts.

  • Mostafa, S.A. and Ahmad, I.A. (2019). Kernel density estimation from complex surveys in the presence of complete auxiliary information. Metrika 82, 295–338.

  • Nadaraya, E.A. (1964). On estimating regression. Theory Probab. Appl. 9, 141–142.

  • Naiman, D.Q. and Torcaso, F. (2016). To replace or not to replace in finite population sampling. arXiv:1606.01782.

  • Park, B.H., Ostrouchov, G. and Samatova, N.F. (2007). Sampling streaming data with replacement. Comput. Stat. Data Anal. 52, 750–762.

  • Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076.

  • Pathak, P.K. (1961). On the evaluation of moments of distinct units in a sample. Sankhyā Ser. A 23, 415–420.

  • Pathak, P.K. (1962a). On sampling with unequal probabilities. Sankhyā Ser. A 24, 315–326.

  • Pathak, P.K. (1962b). On simple random sampling with replacement. Sankhyā Ser. A 24, 287–302.

  • Pathak, P.K. (1982). Asymptotic normality of the average of distinct units in simple random sampling with replacement. In Essays in Honour of C.R. Rao, G. Kallianpur, P.R. Krishnaiah and J.K. Ghosh (Eds.), pp. 567–573.

  • Pfeffermann, D. (1993). The role of sampling weights when modeling survey data. Int. Stat. Rev. 61, 317–337.

  • R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.

  • Raj, D. and Khamis, S.H. (1958). Some remarks on sampling with replacement. Ann. Math. Stat. 39, 550–557.

  • Ramakrishnan, M.K. (1969). Some results on the comparison of sampling with and without replacement. Sankhyā Ser. A 51, 333–342.

  • Rao, J.K.N. (1966). On the comparison of sampling with and without replacement. Rev. Int. Stat. Inst. 34, 125–138.

  • Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 27, 832–837.

  • Scott, D.W. (2015). Multivariate Density Estimation: Theory, Practice and Visualization. Wiley, New York.

  • Sengupta, S. (2016). On comparisons of with and without replacement sampling strategies for estimating finite population mean in randomized response surveys. Sankhyā Ser. B 78, 66–77.

  • Seth, G.R. and Rao, J.K.N. (1964). On the comparison between simple random sampling with and without replacement. Sankhyā Ser. A 26, 85–86.

  • Sheather, S.J. and Jones, M.C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. J. R. Stat. Soc. Ser. B 53, 683–690.

  • Singh, S. and Sedory, S.A. (2011). Sufficient bootstrapping. Comput. Stat. Data Anal. 55, 1629–1637.

  • Sinha, B.K. and Sen, P.K. (1989). On averaging over distinct units in sampling with replacement. Sankhyā Ser. B 51, 65–83.

  • Stuart, A. and Ord, J.K. (1987). Kendall’s Advanced Theory of Statistics, 1. Oxford University Press, New York.

  • Wand, M. and Jones, M. (1995). Kernel Smoothing. Chapman and Hall, London.

  • Watson, G.S. (1964). Smooth regression analysis. Sankhyā Ser. A 26, 359–372.

Author information

Corresponding author

Correspondence to Sayed A. Mostafa.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

The electronic supplementary material is available as a PDF file (664 KB).

Appendix: Additional Proofs

Proof of Corollary 2.2.

The estimator \(\hat{f}_{n}(y;h)\) can be viewed as the sample mean of a fixed-size simple random sample drawn with replacement from the finite population. Therefore, \(\hat{f}_{n}(y;h)\) is design-unbiased for the finite population smooth fU(y;h) and its design variance is given by (cf. Cochran, 1977, p. 30)

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{D}}\{\hat{f}_n(y)|\mathbf{Y}_{\text{U}}\} &=&\frac{1}{n}\left( 1-\frac{1}{N}\right)\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[K_h\left( y-Y_i\right)-f_{\text{U}}(y)\right]^2. \end{array} $$

Now, the bias of \(\hat {f}_{n}(y;h)\) under combined inference is identical to the bias of \(\hat {f}_{\nu }(y;h)\) (see Theorem 2.2), whereas the combined variance can be written as

$$ \begin{array}{@{}rcl@{}}\mathbb{V}\{\hat{f}_n(y)\}& = &\mathrm{E}_{\mathcal{\xi}}[\mathrm{V}_{\mathcal{D}}\{\hat{f}_n(y)|\mathbf{Y}_{\text{U}}\}] + \mathrm{V}_{\mathcal{\xi}}[\mathrm{E}_{\mathcal{D}}\{\hat{f}_{n}(y)|\mathbf{Y}_{\text{U}}\}].\end{array} $$

Using (14) and (15), it is not difficult to see that

$$ \begin{array}{@{}rcl@{}}\mathbb{V}\{\hat{f}_n(y)\}&=&\frac{1}{nh}\left[\left( 1-\frac{1}{N}\right)+\frac{n}{N}\right]d_K f(y)+o\left( \frac{1}{Nh}\right). \end{array} $$

The result follows upon collecting the squared bias and variance of \(\hat {f}_{n}(y;h)\) and integrating over y.□
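
As an informal numerical check of the design-based part of this argument (a sketch, not code from the paper; the Gaussian kernel and all names are assumptions), one can fix a finite population, repeatedly draw with-replacement samples, and compare the Monte Carlo mean and variance of \(\hat{f}_{n}(y;h)\) with fU(y;h) and the Cochran-type design variance displayed above.

import numpy as np

def kh(u, h):
    # Gaussian K_h(u) = (1/h) K(u/h)
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
Y_U = rng.normal(size=500)                       # fixed finite population Y_U
N, n, h, y0 = Y_U.size, 50, 0.4, 0.0

f_U = kh(y0 - Y_U, h).mean()                     # finite population smooth f_U(y0; h)
reps = np.array([kh(y0 - rng.choice(Y_U, size=n, replace=True), h).mean()
                 for _ in range(20000)])

V_design = (1 / n) * (1 - 1 / N) * ((kh(y0 - Y_U, h) - f_U) ** 2).sum() / (N - 1)
print(reps.mean() - f_U)                         # design bias: approximately zero
print(reps.var(), V_design)                      # simulated vs. formula design variance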

Proof of Corollary 2.3.

The estimator \(\hat{f}_{\nu, \text{ wor}}(y;h)\) is the sample mean of a fixed-size simple random sample drawn without replacement from the finite population. Therefore, \(\hat{f}_{\nu, \text{ wor}}(y;h)\) is design-unbiased for the finite population smooth fU(y;h) and its design variance is given by (cf. Cochran, 1977, p. 23)

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{D}}\{\hat{f}_{\nu, \text{ wor}}(y)|\mathbf{Y}_{\text{U}}\} &=&\frac{1}{\nu}\left( 1-\frac{\nu}{N}\right)\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[K_h\left( y-Y_i\right)-f_{\text{U}}(y)\right]^2. \end{array} $$

The rest of the proof follows the lines of the proof of Corollary 2.2.□

Proof of Corollary 2.4.

As in the proof of Corollary 2.3, the estimator \(\hat {f}_{\nu _{0}, \text { wor}}(y;h)\) is design-unbiased for fU(y;h) and its design variance is given by

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{D}}\{\hat{f}_{\nu_0, \text{ wor}}(y)|\mathbf{Y}_{\text{U}}\}&=&\left( \frac{1}{\nu_0}-\frac{1}{N}\right)\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[K_h\left( y-Y_i\right)-f_{\text{U}}(y)\right]^2. \end{array} $$
(A.1)

Now, using \(\nu _{0}=\mathrm {E}_{\nu }(\nu )=N\left (1-\left \{1-1/N\right \}^{n}\right )\) to substitute in (A.1) and simplifying, we get

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{D}}\{\hat{f}_{\nu_0, \text{ wor}}(y)|\mathbf{Y}_{\text{U}}\}&=&\frac{1}{n}\mathcal{C}^{**}_{N,n}\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[K_h\left( y-Y_i\right)-f_{\text{U}}(y)\right]^2,\end{array} $$

where \(\mathcal {C}^{**}_{N,n}=(n/N)(1-1/N)^{n}/\{1-(1-1/N)^{n}\}\). The rest of the proof follows the lines of the proof of Corollary 2.2.□
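
For reference, the "simplifying" step above is the elementary identity

$$ \frac{1}{\nu_0}-\frac{1}{N}=\frac{N-\nu_0}{N\nu_0}=\frac{(1-1/N)^n}{N\left\{1-(1-1/N)^n\right\}}=\frac{1}{n}\cdot\frac{(n/N)(1-1/N)^n}{1-(1-1/N)^n}=\frac{1}{n}\mathcal{C}^{**}_{N,n}. $$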

Proof of Corollary 2.5.

Given λ, the estimator \(\hat {f}_{\nu _{r}, \text { wor}}(y;h)\) is design-unbiased for fU(y;h) and its design variance is given by

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{D}}\{\hat{f}_{\nu_r, \text{ wor}}(y)|\mathbf{Y}_{\text{U}}\} &=&\left( \frac{1}{\nu_r}-\frac{1}{N}\right)\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[K_h\left( y-Y_i\right)-f_{\text{U}}(y)\right]^2. \end{array} $$
(A.2)

Further, note that since λ = 1 with probability p = (ν0 −⌊ν0⌋), and λ = 0 with probability (1 − p), we can write

$$ \begin{array}{@{}rcl@{}}\mathrm{E}_\lambda \left( \frac{1}{\nu_r}\right)&=&\mathrm{E}_\lambda \left( \frac{1}{(1-\lambda)\lfloor \nu_0 \rfloor+\lambda\left\lceil \nu_0 \right\rceil}\right)\\ &=&\frac{p}{\left\lceil \nu_0 \right\rceil}+\frac{(1-p)}{\lfloor \nu_0 \rfloor}=\frac{1-\nu_0+2\lfloor \nu_0 \rfloor}{\lfloor \nu_0 \rfloor\left\lceil \nu_0 \right\rceil}. \end{array} $$
(A.3)

To reach the second equality in (A.3), use the fact that \(\left \lceil \nu _{0} \right \rceil -\lfloor \nu _{0} \rfloor =1\) and some basic algebra. Using (A.2) and (A.3), we get

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{P}}\{\hat{f}_{\nu_r, \text{ wor}}(y)|\mathbf{Y}_{\text{U}}\}&=&\frac{1}{n}\left( \mathcal{C}^{***}_{N,n}-\frac{n}{N}\right)\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[K_h\left( y-Y_i\right)-f_{\text{U}}(y)\right]^2. \end{array} $$

The rest of the proof follows the lines of the proof of Corollary 2.2.□
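
For reference, the "basic algebra" behind the second equality in (A.3) is, writing \(p=\nu_0-\lfloor \nu_0 \rfloor\) and \(\left\lceil \nu_0 \right\rceil=\lfloor \nu_0 \rfloor+1\),

$$ \frac{p}{\left\lceil \nu_0 \right\rceil}+\frac{1-p}{\lfloor \nu_0 \rfloor}=\frac{p\lfloor \nu_0 \rfloor+(1-p)\left( \lfloor \nu_0 \rfloor+1\right)}{\lfloor \nu_0 \rfloor\left\lceil \nu_0 \right\rceil}=\frac{\lfloor \nu_0 \rfloor+1-p}{\lfloor \nu_0 \rfloor\left\lceil \nu_0 \right\rceil}=\frac{1-\nu_0+2\lfloor \nu_0 \rfloor}{\lfloor \nu_0 \rfloor\left\lceil \nu_0 \right\rceil}. $$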

Proof of Lemma 4.1.

Observe that,

$$ \begin{array}{@{}rcl@{}}\mathrm{E}\left\{\frac{1}{\nu}\sum\limits_{i=1}^{\nu}\hat{f}_{-i}(Y^{*}_i)\right\}&=& \mathrm{E}_{\nu}\left\{\mathrm{E}_{\mathcal{\xi}}\left[\frac{1}{\nu}\sum\limits_{i=1}^{{\nu}}\hat{f}_{-i}(Y^{*}_i)\Big|\nu\right]\right\}.\end{array} $$

But,

$$ \begin{array}{@{}rcl@{}}\mathrm{E}_{\mathcal{\xi}}\left[\frac{1}{\nu}\sum\limits_{i=1}^{\nu}\hat{f}_{-i}(Y^{*}_i)\Big|\nu\right]&=& \mathrm{E}_{\mathcal{\xi}}\left[\frac{1}{\nu(\nu-1)}\sum\limits_{i=1}^{\nu}\sum\limits_{j\not=i}^{\nu}K_h(Y^{*}_i-Y^{*}_j)\Big|\nu \right]\\ &\overset{iid}{=}& \mathrm{E}_{\mathcal{\xi}}\left[K_h(Y^{*}_1-Y^{*}_2)\right]\\ &=& \mathrm{E}_{Y^{*}_1}\left\{\mathrm{E}_{Y^{*}_2}\left[K_h(Y^{*}_1-Y^{*}_2)\right]\right\}\\ &=& \mathrm{E}_{Y^{*}_1}\left\{\int K_h(Y^{*}_1-y)f(y) dy\right\}.\end{array} $$

Therefore,

$$ \begin{array}{@{}rcl@{}}\mathrm{E}\left\{\frac{1}{\nu}\sum\limits_{i=1}^{\nu}\hat{f}_{-i}(Y^{*}_i)\right\} &=& \mathrm{E}_{\mathcal{\xi}}\left\{\int K_h(Y^{*}_1-y)f(y) dy\right\}\\&=&\mathrm{E}_{\mathcal{\xi}}\left\{\int \hat{f}_{\nu}(y)f(y) dy\Big|\nu\right\}=\mathrm{E}\left\{\int \hat{f}_{\nu}(y)f(y) dy\right\},\end{array} $$

and the proof is complete. □
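
Lemma 4.1 is what justifies least-squares cross-validation for the distinct-units KDE: the leave-one-out average is an unbiased estimator of \(\mathrm{E}\{\int \hat{f}_{\nu}(y)f(y)dy\}\), so the usual CV score estimates the integrated squared error up to a constant. A minimal sketch of that score (assuming a Gaussian kernel; the names are illustrative, not from the paper):

import numpy as np

def lscv_score(distinct, h):
    # Least-squares CV criterion for the KDE built on the nu distinct units:
    # integral of f_hat^2 minus twice the leave-one-out average of Lemma 4.1.
    y = np.asarray(distinct, dtype=float)
    nu = y.size
    d = (y[:, None] - y[None, :]) / h
    # Gaussian kernel: int K_h(u - y_i) K_h(u - y_j) du = exp(-d_ij^2 / 4) / (2 h sqrt(pi))
    int_fhat_sq = np.exp(-0.25 * d**2).sum() / (nu**2 * 2 * h * np.sqrt(np.pi))
    K = np.exp(-0.5 * d**2) / np.sqrt(2.0 * np.pi)
    loo = (K.sum() - np.trace(K)) / (nu * (nu - 1) * h)   # (1/nu) sum_i f_hat_{-i}(y_i)
    return int_fhat_sq - 2 * loo

Minimizing lscv_score over h then gives a data-driven bandwidth for \(\hat{f}_{\nu}(y;h)\).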

Proof of (4.5).

$$ \begin{array}{@{}rcl@{}}d_{\hat{f}^{\prime\prime}_\nu}&=&{\int}_{\mathbb{R}}\left\{(\nu h^{3})^{-1}\sum\limits_{j=1}^{\nu}K^{\prime\prime}\left( \frac{y-y^{*}_j}{h}\right)\right\}^2dy\\ &=&(\nu h^{3})^{-2}{\int}_{\mathbb{R}}\left\{\sum\limits_{i=1}^{\nu}\left[K^{\prime\prime}\left( \frac{y-y^{*}_i}{h}\right)\right]^2+2\underset{i <j}{\sum\sum}K^{\prime\prime}\left( \frac{y-y^{*}_i}{h}\right)K^{\prime\prime}\left( \frac{y-y^{*}_j}{h}\right)\right\}dy\\ &=&(\nu h^{3})^{-2}\sum\limits_{i=1}^{\nu}{\int}_{\mathbb{R}}\left[K^{\prime\prime}\left( \frac{y-y^{*}_i}{h}\right)\right]^2 dy + 2(\nu h^{3})^{-2}\underset{i <j}{\sum\sum}{\int}_{\mathbb{R}}K^{\prime\prime}\left( \frac{y-y^{*}_i}{h}\right)K^{\prime\prime} \left( \frac{y-y^{*}_j}{h}\right) dy\\ &=&{\nu}^{-1}h^{-6}{\int}_{\mathbb{R}}\left[K^{\prime\prime}(z)\right]^2 hdz+2{\nu}^{-2}h^{-6}\underset{i <j}{\sum\sum}{\int}_{\mathbb{R}}K^{\prime\prime}(z)K^{\prime\prime}\left( z+\frac{y^{*}_i-y^{*}_j}{h}\right) hdz\\ &=&({\nu}h^{5})^{-1}d_{K^{\prime\prime}}+2{\nu}^{-2}h^{-5}\underset{i <j}{\sum\sum}\phi(c_{ij}), \end{array} $$

where \(\phi (c_{ij})={\int \limits }_{\mathbb {R}}K^{\prime \prime }(z)K^{\prime \prime }(z+h^{-1}\{y^{*}_{i}-y^{*}_{j}\})dz\). □
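
A direct numerical transcription of the last display (again a sketch; the Gaussian kernel and the simple quadrature used for φ(c_ij) are assumptions, not choices made in the paper):

import numpy as np

def d_fhat_second_deriv(distinct, h):
    # Computes (nu h^5)^{-1} d_{K''} + 2 nu^{-2} h^{-5} sum_{i<j} phi(c_ij)
    # for a Gaussian kernel, with phi(c) = int K''(z) K''(z + c) dz.
    y = np.asarray(distinct, dtype=float)
    nu = y.size
    z = np.linspace(-8.0, 8.0, 4001)
    dz = z[1] - z[0]
    Kpp = lambda t: (t**2 - 1.0) * np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)
    d_Kpp = np.sum(Kpp(z) ** 2) * dz
    cross = 0.0
    for i in range(nu):
        for j in range(i + 1, nu):
            c = (y[i] - y[j]) / h
            cross += np.sum(Kpp(z) * Kpp(z + c)) * dz     # phi(c_ij)
    return d_Kpp / (nu * h**5) + 2.0 * cross / (nu**2 * h**5)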

Proofs of (6.7), (6.8) and (6.9).

For (6.7), notice that

$$ \begin{array}{@{}rcl@{}}\mathbb{E}(Q)&=&\mathrm{E}_{\mathcal{\xi}}\left[\mathrm{E}_\nu \left\{\mathrm{E}_{\mathcal{D}}\left[\frac{1}{\nu}\sum\limits_{i=1}^{\nu}Y^{*}_iK_b(x-X^{*}_i)|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}},\nu\right]|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}}\right\}\right]\\ &=&\mathrm{E}_{\mathcal{\xi}}\left[\mathrm{E}_\nu \left\{\frac{1}{N}\sum\limits_{i=1}^{N}Y_iK_b(x-X_i)|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}}\right\}\right]\\ &=&\mathrm{E}_{\mathcal{\xi}}\left[\frac{1}{N}\sum\limits_{i=1}^{N}Y_iK_b(x-X_i)\right]\\ &=&\mathrm{E}_{\mathcal{\xi}}\left[Y_1K_b(x-X_1)\right] \end{array} $$
(A.4)
$$ \begin{array}{@{}rcl@{}} &=&\iint_{\mathbb{R}}y_1K_b(x-x_1)t(y_1|x_1)g(x_1)dy_1dx_1\\ &=&{\int}_{\mathbb{R}}m(x_1)K_b(x-x_1)g(x_1)dx_1\\ &=&{\int}_{\mathbb{R}}m(x+bu)K(u)g(x+bu)du\\ &=&{\int}_{\mathbb{R}}K(u)[m(x)+bu \ m^{\prime}(x)+\frac{1}{2}b^2u^2m^{\prime\prime}(x)+o(b^2)][g(x)\\&&+bu g^{\prime}(x)+\frac{1}{2}b^2u^2g^{\prime\prime}(x)+o(b^2)]du\\ &=&m(x)g(x)+b^2m^{\prime}(x)g^{\prime}(x)c_K+\frac{1}{2}b^2m(x)g^{\prime\prime}(x)c_K\\&&+\frac{1}{2}b^2m^{\prime\prime}(x)g(x)c_K+o(b^2)\\ &=& m(x)g(x)+\frac{1}{2}b^2[2m^{\prime}(x)g^{\prime}(x)+m(x)g^{\prime\prime}(x)\\&&+m^{\prime\prime}(x)g(x)]c_K+o(b^2), \end{array} $$
(A.5)

Next, we prove (6.8) as follows. Note that

$$ \begin{array}{@{}rcl@{}}\mathbb{V}(Q) &=& \mathrm{E}_{\mathcal{\xi}}\left[\mathrm{V}_{\mathcal{P}}\left\{Q|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}}\right\}\right] + \mathrm{V}_{\mathcal{\xi}}\left[\mathrm{E}_{\mathcal{P}}\left\{Q|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}}\right\}\right] =: L_1 + L_2.\end{array} $$
(A.6)

Using (A.4), we have

$$ \begin{array}{@{}rcl@{}}L_2&=&\frac{1}{N}\mathrm{V}_{\mathcal{\xi}}\left[Y_1K_b(x-X_1)\right]=\frac{1}{N}\mathrm{E}_{\mathcal{\xi}}\left[Y_1K_b(x-X_1)\right]^2\\&&-\frac{1}{N}E^2_{\mathcal{\xi}}\left[Y_1K_b(x-X_1)\right] =: L_{21}-L_{22}.\end{array} $$

Observe that,

$$ \begin{array}{@{}rcl@{}}L_{21}&=&\frac{1}{N}\iint_{\mathbb{R}}y^2_1K^2_b(x-x_1)t(y_1|x_1)g(x_1)dy_1dx_1\\ &=&\frac{1}{N}{\int}_{\mathbb{R}}K^2_b(x-x_1)g(x_1)\left[{\int}_{\mathbb{R}}y^2_1t(y_1|x_1)dy_1\right]dx_1\\ &=&\frac{1}{N}{\int}_{\mathbb{R}}\left[\sigma^2(x_1)+m^2(x_1)\right]K^2_b(x-x_1)g(x_1)dx_1\\ &=&\frac{1}{Nb}{\int}_{\mathbb{R}}\left[\sigma^2(x+bu)+m^2(x+bu)\right]K^2(u)g(x+bu)du\\ &=&\frac{1}{Nb}\left[\sigma^2(x)g(x)+m^2(x)g(x)\right]d_K+o\left( \frac{1}{Nb}\right).\end{array} $$

From (A.5), we have

$$ \begin{array}{@{}rcl@{}}L_{22}&=&\frac{1}{N}\left[m(x)g(x)+O(b^2)\right]^2.\end{array} $$

Therefore,

$$ \begin{array}{@{}rcl@{}}L_{2}&=&\frac{1}{Nb}\left[\sigma^2(x)g(x)+m^2(x)g(x)\right]d_K-\frac{1}{N}\left[m(x)g(x)+O(b^2)\right]^2+o\left( \frac{1}{Nb}\right)\\ &=&\frac{1}{Nb}\left[\sigma^2(x)+m^2(x)\right]g(x)d_K+o\left( \frac{1}{Nb}\right). \end{array} $$
(A.7)

Moreover, given ν, Q can be viewed as the sample mean of a fixed-size simple random sample drawn without replacement from the finite population. Therefore, Q is design-unbiased for the finite population mean of the variable \(YK_b(x-X)\), and its design variance is given by

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{D}}\left[Q|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}},\nu\right]&=&\left( \frac{1}{\nu}-\frac{1}{N}\right)\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[Y_iK_b(x-X_i)-\frac{1}{N}\sum\limits_{i=1}^{N}Y_iK_b(x-X_i)\right]^2,\end{array} $$

and, hence,

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{P}}\left[Q|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}}\right]&=&\left[\mathrm{E}_\nu \left( \frac{1}{\nu}\right)-\frac{1}{N}\right]\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[Y_iK_b(x-X_i)-\frac{1}{N}\sum\limits_{i=1}^{N}Y_iK_b(x-X_i)\right]^2\\&&+\mathrm{V}_\nu \left[\frac{1}{N}\sum\limits_{i=1}^{N}Y_iK_b(x-X_i)\right]\\ &=&\left[\mathrm{E}_\nu \left( \frac{1}{\nu}\right) - \frac{1}{N}\right]\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[Y_iK_b(x-X_i)-\frac{1}{N}\sum\limits_{i=1}^{N}Y_iK_b(x-X_i)\right]^2. \end{array} $$

Consequently,

$$ \begin{array}{@{}rcl@{}}L_1&=&\left[\mathrm{E}_\nu \left( \frac{1}{\nu}\right)-\frac{1}{N}\right]\mathrm{V}_{\mathcal{\xi}}\left\{Y_1K_b(x-X_1)\right\}\\ &=&\frac{1}{nb}\mathcal{C}_{N,n}\left[\sigma^2(x)+m^2(x)\right]g(x)d_K+o\left( \frac{1}{Nb}\right), \end{array} $$
(A.8)

Using (A.7) and (A.8) in (A.6), we get (6.8).

For (6.9), observe that

$$ \begin{array}{@{}rcl@{}} \mathbb{E}(QW)&=&\mathbb{E}\left[\left\{\frac{1}{\nu}\sum\limits_{i=1}^{\nu}y^{*}_iK_b(x-x^{*}_i)\right\}\left\{\frac{1}{\nu}\sum\limits_{i=1}^{\nu}K_b(x-x^{*}_i)\right\}\right]\\ &=&\mathbb{E}\left[\frac{1}{\nu^2}\sum\limits_{i=1}^{\nu}y^{*}_iK^2_b(x-x^{*}_i)\right]+\mathbb{E}\left[\frac{1}{\nu^2}\underset{i\not=j}{\sum\sum}y^{*}_iK_b(x-x^{*}_i)K_b(x-x^{*}_j)\right]\\ &=:&B_1+B_2. \end{array} $$
(A.9)

First,

$$ \begin{array}{@{}rcl@{}}B_1&=&\mathrm{E}_\nu \left( \frac{1}{\nu}\right)\mathrm{E}_{\mathcal{\xi}}\left[Y_1K^2_b(x-X_1)\right]\\ &=&\mathrm{E}_\nu \left( \frac{1}{\nu}\right){\int}_{\mathbb{R}}m(x_1)K^2_b(x-x_1)g(x_1)dx_1\\ &=&\mathrm{E}_\nu \left( \frac{1}{\nu}\right)\left[\frac{1}{b}m(x)g(x)d_K+o\left( \frac{1}{Nb}\right)\right]\\ &=&\frac{1}{nb}\left[\mathcal{C}_{N,n}+\frac{n}{N}\right]m(x)g(x)d_K+o\left( \frac{1}{Nb}\right). \end{array} $$
(A.10)

Second,

$$ \begin{array}{@{}rcl@{}} B_2&=&\mathrm{E}_{\mathcal{\xi}}\left[\mathrm{E}_\nu \left\{\mathrm{E}_{\mathcal{D}}\left[\frac{1}{\nu^2}\underset{i,j\in U, i\not=j}{\sum\sum}I_iI_jY_iK_b(x-X_i)K_b(x-X_j)|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}},\nu\right]|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}}\right\}\right]\\ &=&\mathrm{E}_{\mathcal{\xi}}\left[\mathrm{E}_\nu \left\{\frac{1}{\nu^2}\underset{i,j\in U, i\not=j}{\sum\sum}\pi_{ij}Y_iK_b(x-X_i)K_b(x-X_j)|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}}\right\}\right]\\ &=&\mathrm{E}_{\mathcal{\xi}}\left[\mathrm{E}_\nu \left\{\frac{\nu(\nu-1)}{\nu^2N(N-1)}\right\}\underset{i,j\in U, i\not=j}{\sum\sum}Y_iK_b(x-X_i)K_b(x-X_j)\right]\\ &=&\left\{1-\mathrm{E}_\nu \left( \frac{1}{\nu}\right)\right\}\frac{1}{N(N-1)}\underset{i,j\in U, i\not=j}{\sum\sum}\mathrm{E}_{\mathcal{\xi}}\left[Y_iK_b(x-X_i)\right]\mathrm{E}_{\mathcal{\xi}}\left[K_b(x-X_j)\right]\\ &=&\left\{1-\mathrm{E}_\nu \left( \frac{1}{\nu}\right)\right\}\left[m(x)g(x)+\frac{1}{2}b^2\{m(x)g^{\prime\prime}(x)+2m^{\prime}(x)g^{\prime}(x)+m^{\prime\prime}(x)g(x)\}c_K+o(b^2)\right]\\ &&\qquad\qquad\qquad\qquad\times \left[g(x)+\frac{1}{2}b^2g^{\prime\prime}(x)c_K+o(b^2)\right]\\ &=& \left\{1 - \frac{1}{n}\left( \mathcal{C}_{N,n} + \frac{n}{N}\right)\right\}\left[m(x)g^2(x) + \frac{1}{2}b^2\left\{2m(x)g^{\prime\prime}(x) + 2m^{\prime}(x)g^{\prime}(x) + m^{\prime\prime}(x)g(x)\right\}g(x)c_K\right.\\ &&\left.\qquad\qquad\qquad\qquad\qquad\quad+o(b^2)\vphantom{\frac{1}{2}}\right]\\ &=&m(x)g^2(x)+\frac{1}{2}b^2\left[2m(x)g^{\prime\prime}(x)+2m^{\prime}(x)g^{\prime}(x)+m^{\prime\prime}(x)g(x)\right]g(x)c_K+o(b^2)\\&&-O\left( \frac{1}{n}\right). \end{array} $$
(A.11)

Finally, using (A.10) and (A.11) in (A.9), we get (6.9).□
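
As a quick numerical sanity check on (6.7) (a sketch outside the paper; the regression function, the design density, and all names are assumptions), one can simulate the combined scheme, a superpopulation draw followed by with-replacement sampling reduced to distinct units, and compare the Monte Carlo mean of Q with m(x)g(x), which should agree up to the O(b²) smoothing bias.

import numpy as np

rng = np.random.default_rng(0)
N, n, b, x0 = 2000, 400, 0.15, 0.5
m = lambda x: np.sin(2.0 * x)                        # assumed regression function m(x)
g_x0 = np.exp(-0.5 * x0**2) / np.sqrt(2.0 * np.pi)   # X ~ N(0, 1), so g(x0) is known

Q_vals = []
for _ in range(3000):
    X = rng.normal(size=N)                           # superpopulation draw of X_U
    Y = m(X) + 0.2 * rng.normal(size=N)              # superpopulation draw of Y_U
    idx = np.unique(rng.integers(0, N, size=n))      # distinct units of a WR sample
    Kb = np.exp(-0.5 * ((x0 - X[idx]) / b) ** 2) / (b * np.sqrt(2.0 * np.pi))
    Q_vals.append(np.mean(Y[idx] * Kb))
print(np.mean(Q_vals), m(x0) * g_x0)                 # close, up to O(b^2) bias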

About this article

Cite this article

Mostafa, S.A., Ahmad, I.A. Kernel Density Estimation Based on the Distinct Units in Sampling with Replacement. Sankhya B 83, 507–547 (2021). https://doi.org/10.1007/s13571-019-00223-9
