Abstract
This paper considers the problem of estimating density functions using the kernel method based on the set of distinct units in sampling with replacement. Using a combined design-model-based inference framework, which accounts for the underlying superpopulation model as well as the randomization distribution induced by the sampling design, we derive asymptotic expressions for the bias and mean integrated squared error (MISE) of a Parzen-Rosenblatt-type kernel density estimator (KDE) based on the distinct units from sampling with replacement. We also prove the asymptotic normality of the distinct units KDE under both the design-based and the combined inference frameworks. Additionally, we give the asymptotic MISE formulas of several alternative estimators, including the estimator based on the full with-replacement sample and estimators based on without-replacement sampling of similar cost. Using the MISE expressions, we discuss how the various estimators compare asymptotically. Moreover, we use Monte Carlo simulations to investigate the finite sample properties of these estimators. Our simulation results show that the distinct units KDE and the without-replacement KDEs perform similarly, and that all of them are always superior to the full with-replacement sample KDE. Furthermore, we briefly discuss a Nadaraya-Watson-type kernel regression estimator based on the distinct units from sampling with replacement, derive its MSE under the combined inference framework, and demonstrate its finite sample properties using a small simulation study. Finally, we extend the distinct units density and regression estimators to the case of two-stage sampling with replacement.
References
Alin, A., Martin, M.A., Beyaztas, U. and Pathak, P.K. (2017). Sufficient m-out-of-n (m/n) bootstrap. J. Stat. Comput. Simul. 87, 1742–1753.
Antal, E. and Tillé, Y. (2011a). Direct bootstrap method for complex sampling designs from a finite population. J. Am. Stat. Assoc. 106, 534–543.
Antal, E. and Tillé, Y. (2011b). Simple random sampling with over-replacement. J. Stat. Plan. Inference 141, 597–601.
Arnab, R. (1999). On use of distinct respondents in randomized response surveys. Biom. J. 41, 507–513.
Basu, D. (1958). On sampling with and without replacement. Sankhyā 20, 287–294.
Bellhouse, D.R. and Stafford, J.E. (1999). Density estimation from complex surveys. Stat. Sin. 9, 407–424.
Bleuer, S.R. and Kratina, I.S. (2005). On the two-phase framework for joint model and design-based inference. Ann. Stat. 33, 2789–2810.
Bonnéry, D., Breidt, F.J. and Coquet, F. (2017). Kernel estimation for a superpopulation probability density function under informative selection. Metron 75, 301–318.
Buskirk, T.D. and Lohr, S.L. (2005). Asymptotic properties of kernel density estimation with complex survey data. J. Stat. Plan. Inference 128, 165–190.
Cochran, W.G. (1977). Sampling Techniques. Wiley, New York.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Stat. 7, 1–26.
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York.
Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall, New York.
Guillera-Arroita, G. (2011). Impact of sampling with replacement in occupancy studies with spatial replication. Methods Ecol. Evol. 2, 401–406.
Harms, T. and Duchesne, P. (2010). On kernel nonparametric regression designed for complex survey data. Metrika 72, 111–138.
Hartley, H.O. and Sielken, R.L. (1975). A “superpopulation viewpoint” for finite population sampling. Biometrics 31, 411–422.
Isaki, C.T. and Fuller, W.A. (1982). Survey design under the regression superpopulation model. J. Am. Stat. Assoc. 77, 89–96.
Korwar, R.M. and Serfling, R.J. (1970). On averaging over distinct units in sampling with replacement. Ann. Math. Stat. 41, 2132–2134.
Lanke, J. (1975). Some contributions to the theory of survey sampling. PhD thesis, Department of Mathematical Statistics, University of Lund, Sweden.
Lohr, S.L. (2010). Sampling: Design and Analysis. Cengage Learning, Massachusetts.
Mostafa, S.A. and Ahmad, I.A. (2019). Kernel density estimation from complex surveys in the presence of complete auxiliary information. Metrika 82, 295–338.
Nadaraya, E.A. (1964). On estimating regression. Theory Probab. Appl. 9, 141–142.
Naiman, D.Q. and Torcaso, F. (2016). To replace or not to replace in finite population sampling. arXiv:1606.01782.
Park, B.H., Ostrouchov, G. and Samatova, N.F. (2007). Sampling streaming data with replacement. Comput. Stat. Data Anal. 52, 750–762.
Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076.
Pathak, P.K. (1961). On the evaluation of moments of distinct units in a sample. Sankhyā Ser. A 23, 415–420.
Pathak, P.K. (1962a). On sampling with unequal probabilities. Sankhyā Ser. A 24, 315–326.
Pathak, P.K. (1962b). On simple random sampling with replacement. Sankhyā Ser. A 24, 287–302.
Pathak, P.K. (1982). Asymptotic normality of the average of distinct units in simple random sampling with replacement. Essays in Honour of CR Rao. G. Kallianpur, P. R. Krishnaiah, J. K. Ghosh (Eds), pp. 567–573.
Pfeffermann, D. (1993). The role of sampling weights when modeling survey data. Int. Stat. Rev. 61, 317–337.
R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.
Raj, D. and Khamis, S.H. (1958). Some remarks on sampling with replacement. Ann. Math. Stat. 29, 550–557.
Ramakrishnan, M.K. (1969). Some results on the comparison of sampling with and without replacement. Sankhyā Ser. A 31, 333–342.
Rao, J.K.N. (1966). On the comparison of sampling with and without replacement. Rev. Int. Stat. Inst. 34, 125–138.
Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 27, 832–837.
Scott, D.W. (2015). Multivariate Density Estimation: Theory, Practice and Visualization. Wiley, New York.
Sengupta, S. (2016). On comparisons of with and without replacement sampling strategies for estimating finite population mean in randomized response surveys. Sankhyā Ser. B 78, 66–77.
Seth, G.R. and Rao, J.K.N. (1964). On the comparison between simple random sampling with and without replacement. Sankhyā Ser. A 26, 85–86.
Sheather, S.J. and Jones, M.C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. J. R. Stat. Soc. Ser. B 53, 683–690.
Singh, S. and Sedory, S.A. (2011). Sufficient bootstrapping. Comput. Stat. Data Anal. 55, 1629–1637.
Sinha, B.K. and Sen, P.K. (1989). On averaging over distinct units in sampling with replacement. Sankhyā Ser. B 51, 65–83.
Stuart, A. and Ord, J.K. (1987). Kendall’s Advanced Theory of Statistics, 1. Oxford University Press, New York.
Wand, M. and Jones, M. (1995). Kernel Smoothing. Chapman and Hall, London.
Watson, G.S. (1964). Smooth regression analysis. Sankhyā Ser. A 26, 359–372.
Appendix: Additional Proofs
Proof of Corollary 2.2.
The estimator \(\hat {f}_{n}(y;h)\) can be seen as the sample mean of a fixed-size simple random sample drawn with replacement from the finite population. Therefore, \(\hat {f}_{n}(y;h)\) is design-unbiased for the finite population smooth \(f_{U}(y;h)\) and its design variance is given by (cf. Cochran, 1977, p. 30)
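For reference, the textbook with-replacement variance of a mean of n draws can be written in the present setting as below. The notation is assumed for illustration: \(K_{h}(u)=h^{-1}K(u/h)\), \(y_{1},\dots ,y_{N}\) are the finite population values, and \(f_{U}(y;h)=N^{-1}\sum _{i=1}^{N}K_{h}(y-y_{i})\). The paper's own display may be arranged differently.

```latex
\[
\mathrm{Var}_{p}\{\hat{f}_{n}(y;h)\}
  = \frac{1}{n}\left\{\frac{1}{N}\sum_{i=1}^{N}K_{h}^{2}(y-y_{i})-f_{U}^{2}(y;h)\right\}.
\]
```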
Now, the bias of \(\hat {f}_{n}(y;h)\) under combined inference is identical to the bias of \(\hat {f}_{\nu }(y;h)\) (see Theorem 2.2), whereas the combined variance can be written as
Using (14) and (15), it is not difficult to see that
The result follows upon collecting the squared bias and variance of \(\hat {f}_{n}(y;h)\) and integrating over y.□
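The design-based part of this comparison can be checked numerically. The following is a minimal sketch, not the paper's simulation study: it assumes a Gaussian kernel, a standard normal superpopulation, and illustrative values of N, n, h and the evaluation point y0, and it looks only at the design variance at one point rather than at the MISE. The helper name `gaussian_kde_at` is hypothetical.

```python
import numpy as np

def gaussian_kde_at(y, data, h):
    """Parzen-Rosenblatt estimate at y: mean of K_h(y - Y_i), Gaussian K (assumed)."""
    data = np.asarray(data, dtype=float)
    z = (y - data) / h
    return np.exp(-0.5 * z**2).sum() / (data.size * h * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
N, n, h, y0 = 500, 300, 0.4, 0.0          # illustrative values, not from the paper
population = rng.normal(size=N)           # one realization of the superpopulation
f_U = gaussian_kde_at(y0, population, h)  # finite population smooth at y0

full, distinct = [], []
for _ in range(2000):
    idx = rng.integers(0, N, size=n)      # n draws with replacement
    full.append(gaussian_kde_at(y0, population[idx], h))                  # all n draws
    distinct.append(gaussian_kde_at(y0, population[np.unique(idx)], h))  # distinct units only
full, distinct = np.array(full), np.array(distinct)

print("f_U(y0; h)         :", f_U)
print("mean, var full     :", full.mean(), full.var())
print("mean, var distinct :", distinct.mean(), distinct.var())
```

Both Monte Carlo means should sit close to the finite population smooth, while the variance of the distinct-units version should be the smaller of the two.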
Proof of Corollary 2.3.
The estimator \(\hat {f}_{\nu , \text { wor}}(y;h)\) is the sample mean of a fixed-size simple random sample drawn without replacement from the finite population. Therefore, \(\hat {f}_{\nu , \text { wor}}(y;h)\) is design-unbiased for the finite population smooth \(f_{U}(y;h)\) and its design variance is given by (cf. Cochran, 1977, p. 23)
The rest of the proof follows the lines of the proof of Corollary 2.2.□
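As a point of comparison, the textbook variance of a mean under simple random sampling without replacement carries a finite population correction. In the sketch below, m is a placeholder for the fixed sample size (not the paper's symbol), \(K_{h}(u)=h^{-1}K(u/h)\), and \(f_{U}(y;h)=N^{-1}\sum _{i=1}^{N}K_{h}(y-y_{i})\); these are assumptions made for illustration.

```latex
\[
\mathrm{Var}_{p}\{\hat{f}_{m,\text{wor}}(y;h)\}
  = \left(1-\frac{m}{N}\right)\frac{1}{m}\cdot
    \frac{1}{N-1}\sum_{i=1}^{N}\left\{K_{h}(y-y_{i})-f_{U}(y;h)\right\}^{2}.
\]
```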
Proof of Corollary 2.4.
As in the proof of Corollary 2.3, the estimator \(\hat {f}_{\nu _{0}, \text { wor}}(y;h)\) is design-unbiased for \(f_{U}(y;h)\) and its design variance is given by
Now, using \(\nu _{0}=\mathrm {E}_{\nu }(\nu )=N\left (1-\left \{1-1/N\right \}^{n}\right )\) to substitute in (A.1) and simplifying, we get
where \(\mathcal {C}^{**}_{N,n}=(n/N)(1-1/N)^{n}/\{1-(1-1/N)^{n}\}\). The rest of the proof follows the lines of the proof of Corollary 2.2.□
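The two quantities appearing in this proof are easy to evaluate numerically. The sketch below computes \(\nu _{0}\) and \(\mathcal {C}^{**}_{N,n}\) from the formulas stated above and compares \(\nu _{0}\) with a Monte Carlo average of the number of distinct units. The function names and the values of N and n are illustrative assumptions.

```python
import numpy as np

def expected_distinct(N, n):
    """nu_0 = E(nu) = N * (1 - (1 - 1/N)**n)."""
    return N * (1.0 - (1.0 - 1.0 / N) ** n)

def c_star_star(N, n):
    """C**_{N,n} = (n/N) * (1 - 1/N)**n / (1 - (1 - 1/N)**n)."""
    q = (1.0 - 1.0 / N) ** n
    return (n / N) * q / (1.0 - q)

rng = np.random.default_rng(1)
N, n = 200, 150                            # illustrative values
# Average number of distinct units over repeated with-replacement samples.
nu_sim = np.mean([np.unique(rng.integers(0, N, size=n)).size
                  for _ in range(5000)])
nu_0 = expected_distinct(N, n)

print("nu_0 (formula)   :", nu_0)
print("nu (simulated)   :", nu_sim)
print("C**_{N,n}        :", c_star_star(N, n))
```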
Proof of Corollary 2.5.
Given λ, the estimator \(\hat {f}_{\nu _{r}, \text { wor}}(y;h)\) is design-unbiased for \(f_{U}(y;h)\) and its design variance is given by
Further, note that since λ = 1 with probability \(p=\nu _{0}-\lfloor \nu _{0} \rfloor \), and λ = 0 with probability 1 − p, we can write
To reach the second equality in (A.3), use the fact that \(\left \lceil \nu _{0} \right \rceil -\lfloor \nu _{0} \rfloor =1\) and some basic algebra. Using (A.2) and (A.3), we get
The rest of the proof follows the lines of the proof of Corollary 2.2.□
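The randomization through λ can be sketched in a few lines. The code assumes, consistently with the probabilities stated in the proof, that the realized sample size is \(\lceil \nu _{0}\rceil \) when λ = 1 and \(\lfloor \nu _{0}\rfloor \) when λ = 0, so that its expectation equals \(\nu _{0}\). The function name and the values of N and n are hypothetical.

```python
import math
import numpy as np

def randomized_sample_size(nu_0, rng):
    """Return floor(nu_0) + lambda, where lambda = 1 with probability nu_0 - floor(nu_0)."""
    lower = math.floor(nu_0)
    p = nu_0 - lower                  # probability of rounding up
    lam = int(rng.random() < p)       # Bernoulli(p) indicator
    return lower + lam

rng = np.random.default_rng(3)
N, n = 100, 50                        # illustrative values
nu_0 = N * (1.0 - (1.0 - 1.0 / N) ** n)
sizes = np.array([randomized_sample_size(nu_0, rng) for _ in range(20000)])

print("nu_0                 :", nu_0)
print("mean realized size   :", sizes.mean())
print("distinct sizes drawn :", np.unique(sizes))
```

The realized size is always an integer, while its average over repeated randomizations matches the generally non-integer target.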
Proof of Lemma 4.1.
Observe that,
But,
Therefore,
and the proof is complete. □
Proof of (4.5).
where \(\phi (c_{ij})={\int \limits }_{\mathbb {R}}K^{\prime \prime }(z)K^{\prime \prime }(z+h^{-1}\{y^{*}_{i}-y^{*}_{j}\})dz\). □
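For a concrete kernel, \(\phi \) can be evaluated directly. The sketch below assumes a Gaussian kernel (an assumption, since the kernel is not specified here) and writes c for the argument \(h^{-1}\{y^{*}_{i}-y^{*}_{j}\}\). It compares a plain Riemann-sum approximation of the integral with the closed form that follows from the fact that, for an even kernel, the integral equals the fourth derivative of the convolution of K with itself, which for the Gaussian is the N(0, 2) density. The function names are hypothetical.

```python
import numpy as np

def gauss_d2(z):
    """Second derivative of the standard normal density."""
    return (z**2 - 1.0) * np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def phi_numeric(c, half_width=12.0, step=1e-3):
    """Riemann-sum approximation of the integral of K''(z) K''(z + c) over the real line."""
    z = np.arange(-half_width, half_width + step, step)
    return np.sum(gauss_d2(z) * gauss_d2(z + c)) * step

def phi_closed(c):
    """Fourth derivative of the N(0, 2) density at c (Gaussian kernel only)."""
    return ((c**4 - 12.0 * c**2 + 12.0) * np.exp(-0.25 * c**2)
            / (16.0 * np.sqrt(4.0 * np.pi)))

for c in (0.0, 0.7, 2.5):
    print(c, phi_numeric(c), phi_closed(c))
```

At c = 0 both versions reduce to the familiar roughness of K'' for the Gaussian kernel, \(3/(8\sqrt {\pi })\).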
Proofs of (6.7), (6.8) and (6.9).
For (6.7), notice that
Next, we prove (6.8) as follows. Note that
Using (A.4), we have
Observe that,
From (A.5), we have
Therefore,
Moreover, given ν, Q can be seen as the sample mean of a fixed-size simple random sample drawn without replacement from the finite population. Therefore, Q is design-unbiased for the finite population mean of the variable \(YK_{b}(x-X)\), and its design variance is given by
and, hence,
Consequently,
Using (A.7) and (A.8) in (A.6), we get (6.8).
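The role of Q is easiest to see in code. The sketch below builds a Nadaraya-Watson-type estimate from the distinct units of a with-replacement sample as the ratio of Q, the mean of \(YK_{b}(x-X)\) over the distinct units, to the corresponding mean of \(K_{b}(x-X)\). The Gaussian kernel, the name R for the denominator, the function names, and the simulated population are all assumptions made for illustration.

```python
import numpy as np

def k_b(u, b):
    """Scaled Gaussian kernel K_b(u) = K(u / b) / b (Gaussian K is an assumption)."""
    return np.exp(-0.5 * (u / b) ** 2) / (b * np.sqrt(2.0 * np.pi))

def nw_distinct(x, X, Y, idx, b):
    """Nadaraya-Watson-type estimate at x using only the distinct sampled units."""
    d = np.unique(idx)            # labels of the distinct units
    w = k_b(x - X[d], b)
    Q = np.mean(Y[d] * w)         # numerator: mean of Y * K_b(x - X)
    R = np.mean(w)                # denominator: mean of K_b(x - X)
    return Q / R

rng = np.random.default_rng(2)
N, n = 400, 250                   # illustrative values
X = rng.uniform(0.0, 1.0, size=N)
Y = np.sin(2.0 * np.pi * X) + 0.1 * rng.normal(size=N)
idx = rng.integers(0, N, size=n)  # n draws with replacement

m_hat = nw_distinct(0.25, X, Y, idx, b=0.05)
print("estimate at x = 0.25:", m_hat, "(regression function value: 1.0)")
```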
For (6.9), observe that
First,
Second,
Mostafa, S.A., Ahmad, I.A. Kernel Density Estimation Based on the Distinct Units in Sampling with Replacement. Sankhya B 83, 507–547 (2021). https://doi.org/10.1007/s13571-019-00223-9
Keywords and phrases
- Distinct units
- Kernel density estimation
- Kernel regression
- Random sample size
- Sampling with/without replacement