Kernel Density Estimation Based on the Distinct Units in Sampling with Replacement

Mostafa, Sayed A.; Ahmad, Ibrahim A.

doi:10.1007/s13571-019-00223-9

Published: 13 March 2020

Volume 83, pages 507–547, (2021)

Sankhya B

Abstract

This paper considers the problem of estimating density functions using the kernel method based on the set of distinct units in sampling with replacement. Using a combined design-model-based inference framework, which accounts for the underlying superpopulation model as well as the randomization distribution induced by the sampling design, we derive asymptotic expressions for the bias and mean integrated squared error (MISE) of a Parzen-Rosenblatt-type kernel density estimator (KDE) based on the distinct units from sampling with replacement. We also prove the asymptotic normality of the distinct units KDE under both design-based and combined inference frameworks. Additionally, we give the asymptotic MISE formulas of several alternative estimators, including the estimator based on the full with-replacement sample and estimators based on without-replacement sampling of similar cost. Using the MISE expressions, we discuss how the various estimators compare asymptotically. Moreover, we use Monte Carlo simulations to investigate the finite sample properties of these estimators. Our simulation results show that the distinct units KDE and the without-replacement KDEs perform similarly, and all of them are always superior to the full with-replacement sample KDE. Furthermore, we briefly discuss a Nadaraya-Watson-type kernel regression estimator based on the distinct units from sampling with replacement, derive its MSE under the combined inference framework, and demonstrate its finite sample properties using a small simulation study. Finally, we extend the distinct units density and regression estimators to the case of two-stage sampling with replacement.
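As a rough illustration of the estimator described in the abstract, the following Python sketch draws a simple random sample with replacement, keeps the distinct units, and applies a Parzen-Rosenblatt-type KDE to them. The Gaussian kernel, the synthetic population, the bandwidth, and all function names are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumed choice)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(y_grid, data, h):
    """Parzen-Rosenblatt estimate: average of K_h(y - Y_i) over the data."""
    u = (np.asarray(y_grid)[:, None] - np.asarray(data)[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h

def distinct_units_kde(y_grid, population, n, h, rng):
    """Draw n labels with replacement, keep the distinct ones, smooth their values."""
    draws = rng.integers(0, len(population), size=n)  # SRSWR on unit labels
    distinct = np.unique(draws)                       # the nu distinct labels
    return kde(y_grid, population[distinct], h), distinct.size

rng = np.random.default_rng(0)
population = rng.normal(size=2000)   # hypothetical finite population
grid = np.linspace(-4.0, 4.0, 201)
f_hat, nu = distinct_units_kde(grid, population, n=500, h=0.4, rng=rng)
```

Here `nu` is the random number of distinct units, which never exceeds the number of draws `n`; the bandwidth `h=0.4` is picked by hand rather than by any data-driven rule.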

References

Alin, A., Martin, M.A., Beyaztas, U. and Pathak, P.K. (2017). Sufficient m-out-of-n (m/n) bootstrap. J. Stat. Comput. Simul. 87, 1742–1753.
Antal, E. and Tillé, Y. (2011a). Direct bootstrap method for complex sampling designs from a finite population. J. Am. Stat. Assoc. 106, 534–543.
Antal, E. and Tillé, Y. (2011b). Simple random sampling with over-replacement. J. Stat. Plan. Inference 141, 597–601.
Arnab, R. (1999). On use of distinct respondents in randomized response surveys. Biom. J. 41, 507–513.
Basu, D. (1958). On sampling with and without replacement. Sankhyā 20, 287–294.
Bellhouse, D.R. and Stafford, J.E. (1999). Density estimation from complex surveys. Stat. Sin. 9, 407–424.
Bleuer, S.R. and Kratina, I.S. (2005). On the two-phase framework for joint model and design-based inference. Ann. Stat. 33, 2789–2810.
Bonnéry, D., Breidt, F.J. and Coquet, F. (2017). Kernel estimation for a superpopulation probability density function under informative selection. Metron 75, 301–318.
Buskirk, T.D. and Lohr, S.L. (2005). Asymptotic properties of kernel density estimation with complex survey data. J. Stat. Plan. Inference 128, 165–190.
Cochran, W.G. (1977). Sampling Techniques. Wiley, New York.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Stat. 7, 1–26.
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York.
Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall, New York.
Guillera-Arroita, G. (2011). Impact of sampling with replacement in occupancy studies with spatial replication. Methods Ecol. Evol. 2, 401–406.
Harms, T. and Duchesne, P. (2010). On kernel nonparametric regression designed for complex survey data. Metrika 72, 111–138.
Hartley, H.O. and Sielken, R.L. (1975). A “superpopulation viewpoint” for finite population sampling. Biometrics 31, 411–422.
Isaki, C.T. and Fuller, W.A. (1982). Survey design under the regression superpopulation model. J. Am. Stat. Assoc. 77, 89–96.
Korwar, R.M. and Serfling, R.J. (1970). On averaging over distinct units in sampling with replacement. Ann. Math. Stat. 41, 2132–2134.
Lanke, J. (1975). Some contributions to the theory of survey sampling. PhD thesis, Department of Mathematical Statistics, University of Lund, Sweden.
Lohr, S.L. (2010). Sampling: Design and Analysis. Cengage Learning, Massachusetts.
Mostafa, S.A. and Ahmad, I.A. (2019). Kernel density estimation from complex surveys in the presence of complete auxiliary information. Metrika 82, 295–338.
Nadaraya, E.A. (1964). On estimating regression. Theory Probab. Appl. 9, 141–142.
Naiman, D.Q. and Torcaso, F. (2016). To replace or not to replace in finite population sampling. arXiv:1606.01782.
Park, B.H., Ostrouchov, G. and Samatova, N.F. (2007). Sampling streaming data with replacement. Comput. Stat. Data Anal. 52, 750–762.
Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076.
Pathak, P.K. (1961). On the evaluation of moments of distinct units in a sample. Sankhyā Ser. A 23, 415–420.
Pathak, P.K. (1962a). On sampling with unequal probabilities. Sankhyā Ser. A 24, 315–326.
Pathak, P.K. (1962b). On simple random sampling with replacement. Sankhyā Ser. A 24, 287–302.
Pathak, P.K. (1982). Asymptotic normality of the average of distinct units in simple random sampling with replacement. Essays in Honour of CR Rao. G. Kallianpur, P. R. Krishnaiah, J. K. Ghosh (Eds), pp. 567–573.
Pfeffermann, D. (1993). The role of sampling weights when modeling survey data. Int. Stat. Rev. 61, 317–337.
R Core Team (2017). A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.
Raj, D. and Khamis, S.H. (1958). Some remarks on sampling with replacement. Ann. Math. Stat. 39, 550–557.
Ramakrishnan, M.K. (1969). Some results on the comparison of sampling with and without replacement. Sankhyā Ser. A 51, 333–342.
Rao, J.K.N. (1966). On the comparison of sampling with and without replacement. Rev. Int. Stat. Inst. 34, 125–138.
Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 27, 832–837.
Scott, D.W. (2015). Multivariate Density Estimation: Theory, Practice and Visualization. Wiley, New York.
Sengupta, S. (2016). On comparisons of with and without replacement sampling strategies for estimating finite population mean in randomized response surveys. Sankhyā Ser. B 78, 66–77.
Seth, G.R. and Rao, J.K.N. (1964). On the comparison between simple random sampling with and without replacement. Sankhyā Ser. A 26, 85–86.
Sheather, S.J. and Jones, M.C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. J. R. Stat. Soc. Ser. B 53, 683–690.
Singh, S. and Sedory, S.A. (2011). Sufficient bootstrapping. Comput. Stat. Data Anal. 55, 1629–1637.
Sinha, B.K. and Sen, P.K. (1989). On averaging over distinct units in sampling with replacement. Sankhyā Ser. B 51, 65–83.
Stuart, A. and Ord, J.K. (1987). Kendall’s Advanced Theory of Statistics, 1. Oxford University Press, New York.
Wand, M. and Jones, M. (1995). Kernel Smoothing. Chapman and Hall, London.
Watson, G.S. (1964). Smooth regression analysis. Sankhyā Ser. A 26, 359–372.

Author information

Authors and Affiliations

Department of Mathematics & Statistics, North Carolina A&T State University, Greensboro, NC, USA
Sayed A. Mostafa
Department of Statistics, Oklahoma State University, Stillwater, OK, USA
Ibrahim A. Ahmad

Corresponding author

Correspondence to Sayed A. Mostafa.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Additional Proofs

Proof of Corollary 2.2.

The estimator $\hat {f}_{n}(y;h)$ can be viewed as the sample mean of a fixed-size simple random sample drawn with replacement from the finite population. Therefore, $\hat {f}_{n}(y;h)$ is design-unbiased for the finite population smooth $f_{\text{U}}(y;h)$, and its design variance is given by (cf. Cochran, 1977, p. 30)

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{D}}\{\hat{f}_n(y)|\mathbf{Y}_{\text{U}}\} &=&\frac{1}{n}\left( 1-\frac{1}{N}\right)\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[K_h\left( y-Y_i\right)-f_{\text{U}}(y)\right]^2. \end{array} $$

Now, the bias of $\hat {f}_{n}(y;h)$ under combined inference is identical to the bias of $\hat {f}_{\nu }(y;h)$ (see Theorem 2.2), whereas the combined variance can be written as

$$ \begin{array}{@{}rcl@{}}\mathbb{V}\{\hat{f}_n(y)\}& = &\mathrm{E}_{\mathcal{\xi}}[\mathrm{V}_{\mathcal{D}}\{\hat{f}_n(y)|\mathbf{Y}_{\text{U}}\}] + \mathrm{V}_{\mathcal{\xi}}[\mathrm{E}_{\mathcal{D}}\{\hat{f}_{n}(y)|\mathbf{Y}_{\text{U}}\}].\end{array} $$

Using (14) and (15), it is not difficult to see that

$$ \begin{array}{@{}rcl@{}}\mathbb{V}\{\hat{f}_n(y)\}&=&\frac{1}{nh}\left[\left( 1-\frac{1}{N}\right)+\frac{n}{N}\right]d_K f(y)+o\left( \frac{1}{Nh}\right). \end{array} $$

The result follows upon collecting the squared bias and variance of $\hat {f}_{n}(y;h)$ and integrating over y.□
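The design-variance display used in this proof can be checked exactly on a toy population by enumerating every ordered with-replacement sample. The sketch below does this for the variable $K_h(y-Y_i)$ at one fixed point; the population values, the evaluation point `y0`, the bandwidth, and the Gaussian kernel are hypothetical choices for illustration only.

```python
import itertools
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumed choice)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def srswr_design_variance(z, n):
    """Formula (1/n)(1 - 1/N) S^2, where S^2 uses the N - 1 divisor."""
    N = len(z)
    return (1.0 / n) * (1.0 - 1.0 / N) * np.var(z, ddof=1)

def srswr_variance_by_enumeration(z, n):
    """Exact variance of the sample mean over all N^n equally likely ordered samples."""
    means = np.array([np.mean(s) for s in itertools.product(z, repeat=n)])
    return np.mean((means - means.mean()) ** 2)

y_pop = np.array([0.3, 1.1, -0.7, 2.0, 0.5, -1.4])  # hypothetical population
h, y0 = 0.8, 0.4
z = gaussian_kernel((y0 - y_pop) / h) / h           # K_h(y0 - Y_i) for each unit
v_formula = srswr_design_variance(z, n=3)
v_exact = srswr_variance_by_enumeration(z, n=3)
```

The two numbers agree up to floating-point error because $(1-1/N)\,S^2$ is the population variance with divisor $N$, which is the per-draw variance under sampling with replacement.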

Proof of Corollary 2.3.

The estimator $\hat {f}_{\nu , \text { wor}}(y;h)$ is the sample mean of a fixed-size simple random sample drawn without replacement from the finite population. Therefore, $\hat {f}_{\nu , \text { wor}}(y;h)$ is design-unbiased for the finite population smooth $f_{\text{U}}(y;h)$, and its design variance is given by (cf. Cochran, 1977, p. 23)

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{D}}\{\hat{f}_{\nu, \text{ wor}}(y)|\mathbf{Y}_{\text{U}}\} &=&\frac{1}{\nu}\left( 1-\frac{\nu}{N}\right)\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[K_h\left( y-Y_i\right)-f_{\text{U}}(y)\right]^2. \end{array} $$

The rest of the proof follows the lines of the proof of Corollary 2.2.□

Proof of Corollary 2.4.

As in the proof of Corollary 2.3, the estimator $\hat {f}_{\nu _{0}, \text { wor}}(y;h)$ is design-unbiased for $f_{\text{U}}(y;h)$, and its design variance is given by

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{D}}\{\hat{f}_{\nu_0, \text{ wor}}(y)|\mathbf{Y}_{\text{U}}\}&=&\left( \frac{1}{\nu_0}-\frac{1}{N}\right)\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[K_h\left( y-Y_i\right)-f_{\text{U}}(y)\right]^2. \end{array} $$

(A.1)

Now, using $\nu _{0}=\mathrm {E}_{\nu }(\nu )=N\left (1-\left \{1-1/N\right \}^{n}\right )$ to substitute in (A.1) and simplifying, we get

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{D}}\{\hat{f}_{\nu_0, \text{ wor}}(y)|\mathbf{Y}_{\text{U}}\}&=&\frac{1}{n}\mathcal{C}^{**}_{N,n}\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[K_h\left( y-Y_i\right)-f_{\text{U}}(y)\right]^2,\end{array} $$

where $\mathcal {C}^{**}_{N,n}=(n/N)(1-1/N)^{n}/\{1-(1-1/N)^{n}\}$. The rest of the proof follows the lines of the proof of Corollary 2.2.□
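The substitution step in this proof amounts to the identity $1/\nu_0 - 1/N = \mathcal{C}^{**}_{N,n}/n$. A minimal numerical check is sketched below; the function names and the values of `N` and `n` are assumptions made for the example.

```python
def expected_distinct(N, n):
    """nu_0 = E(nu) = N * (1 - (1 - 1/N)^n) under simple random sampling with replacement."""
    return N * (1.0 - (1.0 - 1.0 / N) ** n)

def c_double_star(N, n):
    """C**_{N,n} = (n/N) (1 - 1/N)^n / {1 - (1 - 1/N)^n}."""
    q = (1.0 - 1.0 / N) ** n
    return (n / N) * q / (1.0 - q)

N, n = 1000, 200                      # hypothetical population and sample sizes
nu0 = expected_distinct(N, n)
lhs = 1.0 / nu0 - 1.0 / N             # factor appearing in (A.1)
rhs = c_double_star(N, n) / n         # factor after simplification
```

For these values `nu0` is below `n`, reflecting that some of the `n` draws repeat units already selected.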

Proof of Corollary 2.5.

Given $\lambda$, the estimator $\hat {f}_{\nu _{r}, \text { wor}}(y;h)$ is design-unbiased for $f_{\text{U}}(y;h)$, and its design variance is given by

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{D}}\{\hat{f}_{\nu_r, \text{ wor}}(y)|\mathbf{Y}_{\text{U}}\} &=&\left( \frac{1}{\nu_r}-\frac{1}{N}\right)\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[K_h\left( y-Y_i\right)-f_{\text{U}}(y)\right]^2. \end{array} $$

(A.2)

Further, note that since $\lambda = 1$ with probability $p = \nu_0 - \lfloor \nu_0 \rfloor$, and $\lambda = 0$ with probability $1 - p$, we can write

$$ \begin{array}{@{}rcl@{}}\mathrm{E}_\lambda \left( \frac{1}{\nu_r}\right)&=&\mathrm{E}_\lambda \left( \frac{1}{(1-\lambda)\lfloor \nu_0 \rfloor+\lambda\left\lceil \nu_0 \right\rceil}\right)\\ &=&\frac{p}{\left\lceil \nu_0 \right\rceil}+\frac{(1-p)}{\lfloor \nu_0 \rfloor}=\frac{1-\nu_0+2\lfloor \nu_0 \rfloor}{\lfloor \nu_0 \rfloor\left\lceil \nu_0 \right\rceil}. \end{array} $$

(A.3)

To reach the last equality in (A.3), use the fact that $\left \lceil \nu _{0} \right \rceil -\lfloor \nu _{0} \rfloor =1$, substitute $p=\nu_0-\lfloor \nu_0 \rfloor$, and apply some basic algebra. Using (A.2) and (A.3), we get

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{P}}\{\hat{f}_{\nu_r, \text{ wor}}(y)|\mathbf{Y}_{\text{U}}\}&=&\frac{1}{n}\left( \mathcal{C}^{***}_{N,n}-\frac{n}{N}\right)\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[K_h\left( y-Y_i\right)-f_{\text{U}}(y)\right]^2. \end{array} $$

The rest of the proof follows the lines of the proof of Corollary 2.2.□
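The algebra behind (A.3) is easy to confirm numerically. The sketch below evaluates both the two-point expectation and the closed form for a non-integer $\nu_0$ (so that the ceiling and floor differ by one, as the proof assumes); the function name and the test value are hypothetical.

```python
import math

def expected_inverse_nu_r(nu0):
    """Return E(1/nu_r) computed two ways: directly and via the closed form in (A.3).

    Assumes nu0 is not an integer, so that ceil(nu0) - floor(nu0) = 1.
    """
    lo, hi = math.floor(nu0), math.ceil(nu0)
    p = nu0 - lo                                # P(lambda = 1), i.e. nu_r = ceil(nu0)
    direct = p / hi + (1.0 - p) / lo
    closed = (1.0 - nu0 + 2.0 * lo) / (lo * hi)
    return direct, closed

direct, closed = expected_inverse_nu_r(181.4)
```

Since the randomized size has mean $\nu_0$, Jensen's inequality gives $\mathrm{E}_\lambda(1/\nu_r) \geq 1/\nu_0$, so randomized rounding costs slightly more variance than a hypothetical fixed size $\nu_0$ would.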

Proof of Lemma 4.1.

Observe that,

$$ \begin{array}{@{}rcl@{}}\mathrm{E}\left\{\frac{1}{\nu}\sum\limits_{i=1}^{\nu}\hat{f}_{-i}(Y^{*}_i)\right\}&=& \mathrm{E}_{\nu}\left\{\mathrm{E}_{\mathcal{\xi}}\left[\frac{1}{\nu}\sum\limits_{i=1}^{{\nu}}\hat{f}_{-i}(Y^{*}_i)\Big|\nu\right]\right\}.\end{array} $$

But,

$$ \begin{array}{@{}rcl@{}}\mathrm{E}_{\mathcal{\xi}}\left[\frac{1}{\nu}\sum\limits_{i=1}^{\nu}\hat{f}_{-i}(Y^{*}_i)\Big|\nu\right]&=& \mathrm{E}_{\mathcal{\xi}}\left[\frac{1}{\nu(\nu-1)}\sum\limits_{i=1}^{\nu}\sum\limits_{j\not=i}^{\nu}K_h(Y^{*}_i-Y^{*}_j)\Big|\nu \right]\\ &\overset{iid}{=}& \mathrm{E}_{\mathcal{\xi}}\left[K_h(Y^{*}_1-Y^{*}_2)\right]\\ &=& \mathrm{E}_{Y^{*}_1}\left\{\mathrm{E}_{Y^{*}_2}\left[K_h(Y^{*}_1-Y^{*}_2)\right]\right\}\\ &=& \mathrm{E}_{Y^{*}_1}\left\{\int K_h(Y^{*}_1-y)f(y) dy\right\}.\end{array} $$

Therefore,

$$ \begin{array}{@{}rcl@{}}\mathrm{E}\left\{\frac{1}{\nu}\sum\limits_{i=1}^{\nu}\hat{f}_{-i}(Y^{*}_i)\right\} &=& \mathrm{E}_{\mathcal{\xi}}\left\{\int K_h(Y^{*}_1-y)f(y) dy\right\}\\&=&\mathrm{E}_{\mathcal{\xi}}\left\{\int \hat{f}_{\nu}(y)f(y) dy\Big|\nu\right\}=\mathrm{E}\left\{\int \hat{f}_{\nu}(y)f(y) dy\right\},\end{array} $$

and the proof is complete. □
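The first step of the proof rewrites the average of leave-one-out estimates as a double sum over pairs $i \neq j$. The following sketch computes the quantity both ways; the Gaussian kernel, the sample values, and the function names are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumed choice)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def loo_mean_pairwise(y, h):
    """1 / (nu (nu - 1)) * sum over i != j of K_h(Y_i - Y_j)."""
    y = np.asarray(y, dtype=float)
    nu = y.size
    k = gaussian_kernel((y[:, None] - y[None, :]) / h) / h
    return (k.sum() - np.trace(k)) / (nu * (nu - 1))   # drop the i == j terms

def loo_mean_naive(y, h):
    """(1/nu) * sum_i f_{-i}(Y_i), with f_{-i} the KDE that omits unit i."""
    y = np.asarray(y, dtype=float)
    total = 0.0
    for i in range(y.size):
        others = np.delete(y, i)
        total += np.mean(gaussian_kernel((y[i] - others) / h)) / h
    return total / y.size

y_star = np.array([-1.2, -0.3, 0.0, 0.4, 0.9, 2.1])   # hypothetical distinct units
a = loo_mean_pairwise(y_star, h=0.5)
b = loo_mean_naive(y_star, h=0.5)
```

The pairwise form avoids refitting the estimator for each left-out unit, at the cost of storing a `nu` by `nu` kernel matrix.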

Proof of (4.5).

$$ \begin{array}{@{}rcl@{}}d_{\hat{f}^{\prime\prime}_\nu}&=&{\int}_{\mathbb{R}}\left\{(\nu h^{3})^{-1}\sum\limits_{j=1}^{\nu}K^{\prime\prime}\left( \frac{y-y^{*}_j}{h}\right)\right\}^2dy\\ &=&(\nu h^{3})^{-2}{\int}_{\mathbb{R}}\left\{\sum\limits_{i=1}^{\nu}\left[K^{\prime\prime}\left( \frac{y-y^{*}_i}{h}\right)\right]^2+2\underset{i <j}{\sum\sum}K^{\prime\prime}\left( \frac{y-y^{*}_i}{h}\right)K^{\prime\prime}\left( \frac{y-y^{*}_j}{h}\right)\right\}dy\\ &=&(\nu h^{3})^{-2}\sum\limits_{i=1}^{\nu}{\int}_{\mathbb{R}}\left[K^{\prime\prime}\left( \frac{y-y^{*}_i}{h}\right)\right]^2 dy + 2(\nu h^{3})^{-2}\underset{i <j}{\sum\sum}{\int}_{\mathbb{R}}K^{\prime\prime}\left( \frac{y-y^{*}_i}{h}\right)K^{\prime\prime} \left( \frac{y-y^{*}_j}{h}\right) dy\\ &=&{\nu}^{-1}h^{-6}{\int}_{\mathbb{R}}\left[K^{\prime\prime}(z)\right]^2 hdz+2{\nu}^{-2}h^{-6}\underset{i <j}{\sum\sum}{\int}_{\mathbb{R}}K^{\prime\prime}(z)K^{\prime\prime}\left( z+\frac{y^{*}_i-y^{*}_j}{h}\right) hdz\\ &=&({\nu}h^{5})^{-1}d_{K^{\prime\prime}}+2{\nu}^{-2}h^{-5}\underset{i <j}{\sum\sum}\phi(c_{ij}), \end{array} $$

where $\phi (c_{ij})={\int \limits }_{\mathbb {R}}K^{\prime \prime }(z)K^{\prime \prime }(z+h^{-1}\{y^{*}_{i}-y^{*}_{j}\})dz$. □
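The decomposition in this derivation can be verified numerically by comparing a direct integration of the squared second derivative of the KDE with the sum of the diagonal term and the $\phi(c_{ij})$ cross terms. The sketch below assumes a Gaussian kernel, a hand-picked set of distinct units, and simple trapezoidal integration on wide grids; none of these choices come from the paper.

```python
import numpy as np

def k2(z):
    """Second derivative of the standard normal kernel: (z^2 - 1) * phi(z)."""
    return (z**2 - 1.0) * np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def integrate(values, grid):
    """Trapezoidal rule on a fixed grid."""
    return np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid))

def roughness_direct(y_star, h, y_grid):
    """Integrate {(nu h^3)^{-1} sum_j K''((y - y_j)/h)}^2 over y."""
    nu = y_star.size
    f2 = k2((y_grid[:, None] - y_star[None, :]) / h).sum(axis=1) / (nu * h**3)
    return integrate(f2**2, y_grid)

def roughness_decomposed(y_star, h, z_grid):
    """(nu h^5)^{-1} d_{K''} + 2 nu^{-2} h^{-5} sum_{i<j} phi(c_ij)."""
    nu = y_star.size
    d_k2 = integrate(k2(z_grid) ** 2, z_grid)
    cross = 0.0
    for i in range(nu):
        for j in range(i + 1, nu):
            c = (y_star[i] - y_star[j]) / h
            cross += integrate(k2(z_grid) * k2(z_grid + c), z_grid)
    return d_k2 / (nu * h**5) + 2.0 * cross / (nu**2 * h**5)

y_star = np.array([-0.9, -0.2, 0.1, 0.8, 1.5])   # hypothetical distinct units
h = 0.5
direct = roughness_direct(y_star, h, np.linspace(-10.0, 10.0, 8001))
decomposed = roughness_decomposed(y_star, h, np.linspace(-20.0, 20.0, 8001))
```

The decomposed form needs only one-dimensional integrals in the kernel scale, which is why expressions of this type are convenient in plug-in bandwidth calculations.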

Proofs of (6.7), (6.8) and (6.9).

For (6.7), notice that

$$ \begin{array}{@{}rcl@{}}\mathbb{E}(Q)&=&\mathrm{E}_{\mathcal{\xi}}\left[\mathrm{E}_\nu \left\{\mathrm{E}_{\mathcal{D}}\left[\frac{1}{\nu}\sum\limits_{i=1}^{\nu}Y^{*}_iK_b(x-X^{*}_i)|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}},\nu\right]|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}}\right\}\right]\\ &=&\mathrm{E}_{\mathcal{\xi}}\left[\mathrm{E}_\nu \left\{\frac{1}{N}\sum\limits_{i=1}^{N}Y_iK_b(x-X_i)|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}}\right\}\right]\\ &=&\mathrm{E}_{\mathcal{\xi}}\left[\frac{1}{N}\sum\limits_{i=1}^{N}Y_iK_b(x-X_i)\right]\\ &=&\mathrm{E}_{\mathcal{\xi}}\left[Y_1K_b(x-X_1)\right] \end{array} $$

(A.4)

$$ \begin{array}{@{}rcl@{}} &=&\iint_{\mathbb{R}}y_1K_b(x-x_1)t(y_1|x_1)g(x_1)dy_1dx_1\\ &=&{\int}_{\mathbb{R}}m(x_1)K_b(x-x_1)g(x_1)dx_1\\ &=&{\int}_{\mathbb{R}}m(x+bu)K(u)g(x+bu)du\\ &=&{\int}_{\mathbb{R}}K(u)[m(x)+bu \ m^{\prime}(x)+\frac{1}{2}b^2u^2m^{\prime\prime}(x)+o(b^2)][g(x)\\&&+bu g^{\prime}(x)+\frac{1}{2}b^2u^2g^{\prime\prime}(x)+o(b^2)]du\\ &=&m(x)g(x)+b^2m^{\prime}(x)g^{\prime}(x)c_K+\frac{1}{2}b^2m(x)g^{\prime\prime}(x)c_K\\&&+\frac{1}{2}b^2m^{\prime\prime}(x)g(x)c_K+o(b^2)\\ &=& m(x)g(x)+\frac{1}{2}b^2[2m^{\prime}(x)g^{\prime}(x)+m(x)g^{\prime\prime}(x)\\&&+m^{\prime\prime}(x)g(x)]c_K+o(b^2), \end{array} $$

(A.5)

Next, we prove (6.8) as follows. Note that

$$ \begin{array}{@{}rcl@{}}\mathbb{V}(Q) &=& \mathrm{E}_{\mathcal{\xi}}\left[\mathrm{V}_{\mathcal{P}}\left\{Q|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}}\right\}\right] + \mathrm{V}_{\mathcal{\xi}}\left[\mathrm{E}_{\mathcal{P}}\left\{Q|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}}\right\}\right] =: L_1 + L_2.\end{array} $$

(A.6)

Using (A.4), we have

$$ \begin{array}{@{}rcl@{}}L_2&=&\frac{1}{N}\mathrm{V}_{\mathcal{\xi}}\left[Y_1K_b(x-X_1)\right]=\frac{1}{N}\mathrm{E}_{\mathcal{\xi}}\left[Y_1K_b(x-X_1)\right]^2\\&&-\frac{1}{N}E^2_{\mathcal{\xi}}\left[Y_1K_b(x-X_1)\right] =: L_{21}-L_{22}.\end{array} $$

Observe that,

$$ \begin{array}{@{}rcl@{}}L_{21}&=&\frac{1}{N}\iint_{\mathbb{R}}y^2_1K^2_b(x-x_1)t(y_1|x_1)g(x_1)dy_1dx_1\\ &=&\frac{1}{N}{\int}_{\mathbb{R}}K^2_b(x-x_1)g(x_1)\left[{\int}_{\mathbb{R}}y^2_1t(y_1|x_1)dy_1\right]dx_1\\ &=&\frac{1}{N}{\int}_{\mathbb{R}}\left[\sigma^2(x_1)+m^2(x_1)\right]K^2_b(x-x_1)g(x_1)dx_1\\ &=&\frac{1}{Nb}{\int}_{\mathbb{R}}\left[\sigma^2(x+bu)+m^2(x+bu)\right]K^2(u)g(x+bu)du\\ &=&\frac{1}{Nb}\left[\sigma^2(x)g(x)+m^2(x)g(x)\right]d_K+o\left( \frac{1}{Nb}\right).\end{array} $$

From (A.5), we have

$$ \begin{array}{@{}rcl@{}}L_{22}&=&\frac{1}{N}\left[m(x)g(x)+O(b^2)\right]^2.\end{array} $$

Therefore,

$$ \begin{array}{@{}rcl@{}}L_{2}&=&\frac{1}{Nb}\left[\sigma^2(x)g(x)+m^2(x)g(x)\right]d_K-\frac{1}{N}\left[m(x)g(x)+O(b^2)\right]^2+o\left( \frac{1}{Nb}\right)\\ &=&\frac{1}{Nb}\left[\sigma^2(x)+m^2(x)\right]g(x)d_K+o\left( \frac{1}{Nb}\right). \end{array} $$

(A.7)

Moreover, given $\nu$, $Q$ can be viewed as the sample mean of a fixed-size simple random sample drawn without replacement from the finite population. Therefore, $Q$ is design-unbiased for the finite population mean of the variable $Y K_b(x - X)$, and its design variance is given by

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{D}}\left[Q|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}},\nu\right]&=&\left( \frac{1}{\nu}-\frac{1}{N}\right)\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[Y_iK_b(x-X_i)-\frac{1}{N}\sum\limits_{i=1}^{N}Y_iK_b(x-X_i)\right]^2,\end{array} $$

and, hence,

$$ \begin{array}{@{}rcl@{}}\mathrm{V}_{\mathcal{P}}\left[Q|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}}\right]&=&\left[\mathrm{E}_\nu \left( \frac{1}{\nu}\right)-\frac{1}{N}\right]\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[Y_iK_b(x-X_i)-\frac{1}{N}\sum\limits_{i=1}^{N}Y_iK_b(x-X_i)\right]^2\\&&+\mathrm{V}_\nu \left[\frac{1}{N}\sum\limits_{i=1}^{N}Y_iK_b(x-X_i)\right]\\ &=&\left[\mathrm{E}_\nu \left( \frac{1}{\nu}\right) - \frac{1}{N}\right]\frac{1}{N-1}\sum\limits_{i=1}^{N}\left[Y_iK_b(x-X_i)-\frac{1}{N}\sum\limits_{i=1}^{N}Y_iK_b(x-X_i)\right]^2. \end{array} $$

Consequently,

$$ \begin{array}{@{}rcl@{}}L_1&=&\left[\mathrm{E}_\nu \left( \frac{1}{\nu}\right)-\frac{1}{N}\right]\mathrm{V}_{\mathcal{\xi}}\left\{Y_1K_b(x-X_1)\right\}\\ &=&\frac{1}{nb}\mathcal{C}_{N,n}\left[\sigma^2(x)+m^2(x)\right]g(x)d_K+o\left( \frac{1}{Nb}\right). \end{array} $$

(A.8)

Using (A.7) and (A.8) in (A.6), we get (6.8).

For (6.9), observe that

$$ \begin{array}{@{}rcl@{}} \mathbb{E}(QW)&=&\mathbb{E}\left[\left\{\frac{1}{\nu}\sum\limits_{i=1}^{\nu}y^{*}_iK_b(x-x^{*}_i)\right\}\left\{\frac{1}{\nu}\sum\limits_{i=1}^{\nu}K_b(x-x^{*}_i)\right\}\right]\\ &=&\mathbb{E}\left[\frac{1}{\nu^2}\sum\limits_{i=1}^{\nu}y^{*}_iK^2_b(x-x^{*}_i)\right]+\mathbb{E}\left[\frac{1}{\nu^2}\underset{i\not=j}{\sum\sum}y^{*}_iK_b(x-x^{*}_i)K_b(x-x^{*}_j)\right]\\ &=:&B_1+B_2. \end{array} $$

(A.9)

First,

$$ \begin{array}{@{}rcl@{}}B_1&=&\mathrm{E}_\nu \left( \frac{1}{\nu}\right)\mathrm{E}_{\mathcal{\xi}}\left[Y_1K^2_b(x-X_1)\right]\\ &=&\mathrm{E}_\nu \left( \frac{1}{\nu}\right){\int}_{\mathbb{R}}m(x_1)K^2_b(x-x_1)g(x_1)dx_1\\ &=&\mathrm{E}_\nu \left( \frac{1}{\nu}\right)\left[\frac{1}{b}m(x)g(x)d_K+o\left( \frac{1}{Nb}\right)\right]\\ &=&\frac{1}{nb}\left[\mathcal{C}_{N,n}+\frac{n}{N}\right]m(x)g(x)d_K+o\left( \frac{1}{Nb}\right). \end{array} $$

(A.10)

Second,

$$ \begin{array}{@{}rcl@{}} B_2&=&\mathrm{E}_{\mathcal{\xi}}\left[\mathrm{E}_\nu \left\{\mathrm{E}_{\mathcal{D}}\left[\frac{1}{\nu^2}\underset{i,j\in U, i\not=j}{\sum\sum}I_iI_jY_iK_b(x-X_i)K_b(x-X_j)|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}},\nu\right]|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}}\right\}\right]\\ &=&\mathrm{E}_{\mathcal{\xi}}\left[\mathrm{E}_\nu \left\{\frac{1}{\nu^2}\underset{i,j\in U, i\not=j}{\sum\sum}\pi_{ij}Y_iK_b(x-X_i)K_b(x-X_j)|\mathbf{X}_{\text{U}},\mathbf{Y}_{\text{U}}\right\}\right]\\ &=&\mathrm{E}_{\mathcal{\xi}}\left[\mathrm{E}_\nu \left\{\frac{\nu(\nu-1)}{\nu^2N(N-1)}\right\}\underset{i,j\in U, i\not=j}{\sum\sum}Y_iK_b(x-X_i)K_b(x-X_j)\right]\\ &=&\left\{1-\mathrm{E}_\nu \left( \frac{1}{\nu}\right)\right\}\frac{1}{N(N-1)}\underset{i,j\in U, i\not=j}{\sum\sum}\mathrm{E}_{\mathcal{\xi}}\left[Y_iK_b(x-X_i)\right]\mathrm{E}_{\mathcal{\xi}}\left[K_b(x-X_j)\right]\\ &=&\left\{1-\mathrm{E}_\nu \left( \frac{1}{\nu}\right)\right\}\left[m(x)g(x)+\frac{1}{2}b^2\{m(x)g^{\prime\prime}(x)+2m^{\prime}(x)g^{\prime}(x)+m^{\prime\prime}(x)g(x)\}c_K+o(b^2)\right]\\ &&\qquad\qquad\qquad\qquad\times \left[g(x)+\frac{1}{2}b^2g^{\prime\prime}(x)c_K+o(b^2)\right]\\ &=& \left\{1 - \frac{1}{n}\left( \mathcal{C}_{N,n} + \frac{n}{N}\right)\right\}\left[m(x)g^2(x) + \frac{1}{2}b^2\left\{2m(x)g^{\prime\prime}(x) + 2m^{\prime}(x)g^{\prime}(x) + m^{\prime\prime}(x)g(x)\right\}g(x)c_K\right.\\ &&\left.\qquad\qquad\qquad\qquad\qquad\quad+o(b^2)\vphantom{\frac{1}{2}}\right]\\ &=&m(x)g^2(x)+\frac{1}{2}b^2\left[2m(x)g^{\prime\prime}(x)+2m^{\prime}(x)g^{\prime}(x)+m^{\prime\prime}(x)g(x)\right]g(x)c_K+o(b^2)\\&&-O\left( \frac{1}{n}\right). \end{array} $$

(A.11)

Finally, using (A.10) and (A.11) in (A.9), we get (6.9).□
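The quantities $Q$ and $W$ analysed above are the numerator and denominator averages over the distinct units, and a Nadaraya-Watson-type estimate is commonly formed as their ratio. The sketch below computes $Q$, $W$, and $Q/W$ at a single point; the ratio form, the Gaussian kernel, the synthetic linear population, and all names are assumptions for illustration rather than the paper's own implementation.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumed choice)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nw_distinct_units(x0, x_pop, y_pop, n, b, rng):
    """Kernel regression at x0 from the distinct units of an SRSWR of size n."""
    draws = rng.integers(0, len(x_pop), size=n)   # labels drawn with replacement
    idx = np.unique(draws)                        # the nu distinct labels
    k = gaussian_kernel((x0 - x_pop[idx]) / b) / b
    q = np.mean(y_pop[idx] * k)                   # Q = (1/nu) sum Y*_i K_b(x - X*_i)
    w = np.mean(k)                                # W = (1/nu) sum K_b(x - X*_i)
    return q / w, q, w

rng = np.random.default_rng(1)
x_pop = rng.uniform(0.0, 1.0, size=2000)                      # hypothetical covariate
y_pop = 1.0 + 2.0 * x_pop + rng.normal(scale=0.1, size=2000)  # hypothetical m(x) = 1 + 2x
m_hat, q, w = nw_distinct_units(0.5, x_pop, y_pop, n=500, b=0.1, rng=rng)
```

With this synthetic population the target value is $m(0.5)=2$, and `m_hat` should land near it; the bandwidth `b=0.1` is chosen by hand.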

About this article

Cite this article

Mostafa, S.A., Ahmad, I.A. Kernel Density Estimation Based on the Distinct Units in Sampling with Replacement. Sankhya B 83, 507–547 (2021). https://doi.org/10.1007/s13571-019-00223-9

Received: 10 June 2019
Published: 13 March 2020
Issue Date: November 2021
DOI: https://doi.org/10.1007/s13571-019-00223-9
