Abstract
Stratified sampling is one of the most important survey sampling approaches and is widely used in practice. In this paper, we consider the estimation of the distribution function of a finite population in stratified sampling by the empirical distribution function (EDF) and the kernel distribution estimator (KDE), respectively. Under general conditions, the rescaled estimation error processes are shown to converge to a weighted sum of transformed Brownian bridges. Moreover, simultaneous confidence bands (SCBs) are constructed for the population distribution function based on the EDF and the KDE. Simulation experiments and an illustrative data example show that the coverage frequencies of the proposed SCBs under the optimal and proportional allocations are close to the nominal confidence levels.
References
Bickel, P. J., Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. Annals of Statistics, 1, 1071–1095.
Billingsley, P. (1999). Convergence of Probability Measures (2nd ed.). New York: Wiley.
Cai, L., Yang, L. (2015). A smooth simultaneous confidence band for conditional variance function. TEST, 24, 632–655.
Cao, G., Yang, L., Todem, D. (2012). Simultaneous inference for the mean function based on dense functional data. Journal of Nonparametric Statistics, 24, 359–377.
Cao, G., Wang, L., Li, Y., Yang, L. (2016). Oracle-efficient confidence envelopes for covariance functions in dense functional data. Statistica Sinica, 26, 359–383.
Cardot, H., Josserand, E. (2011). Horvitz–Thompson estimators for functional data: asymptotic confidence bands and optimal allocation for stratified sampling. Biometrika, 98, 107–118.
Cardot, H., Degras, D., Josserand, E. (2013). Confidence bands for Horvitz–Thompson estimators using sampled noisy functional data. Bernoulli, 19, 2067–2097.
Chambers, R. L., Dunstan, R. (1986). Estimating distribution functions from survey data. Biometrika, 73, 597–604.
Chen, J., Wu, C. (2002). Estimation of distribution function and quantiles using the model-calibrated pseudo empirical likelihood method. Statistica Sinica, 12, 1223–1239.
Cochran, W. G. (1977). Sampling techniques (3rd ed.). New York: Wiley.
Degras, D. (2011). Simultaneous confidence bands for nonparametric regression with functional data. Statistica Sinica, 21, 1735–1765.
Frey, J. (2009). Confidence bands for the CDF when sampling from a finite population. Computational Statistics and Data Analysis, 53, 4126–4132.
Gu, L., Yang, L. (2015). Oracally efficient estimation for single-index link function with simultaneous confidence band. Electronic Journal of Statistics, 9, 1540–1561.
Gu, L., Wang, L., Härdle, W., Yang, L. (2014). A simultaneous confidence corridor for varying coefficient regression with sparse functional data. TEST, 23, 806–843.
Härdle, W. (1989). Asymptotic maximal deviation of M-smoothers. Journal of Multivariate Analysis, 29, 163–179.
Liu, R., Yang, L. (2008). Kernel estimation of multivariate cumulative distribution function. Journal of Nonparametric Statistics, 20, 661–677.
Lohr, S. (2009). Sampling: Design and analysis (2nd ed.). Boston: Brooks/Cole.
Ma, S., Yang, L., Carroll, R. (2012). A simultaneous confidence band for sparse longitudinal regression. Statistica Sinica, 22, 95–122.
McCarthy, P. J., Snowden, C. B. (1985). The bootstrap and finite population sampling. Vital and Health Statistics, 73, 1–23.
O’Neill, T., Stern, S. (2012). Finite population corrections for the Kolmogorov–Smirnov tests. Journal of Nonparametric Statistics, 24, 497–504.
Reiss, R. (1981). Nonparametric estimation of smooth distribution functions. Scandinavian Journal of Statistics, 8, 116–119.
Rosén, B. (1964). Limit theorems for sampling from finite population. Arkiv för Matematik, 5, 383–424.
Shao, Q., Yang, L. (2012). Polynomial spline confidence band for time series trend. Journal of Statistical Planning and Inference, 142, 1678–1689.
Song, Q., Yang, L. (2009). Spline confidence bands for variance function. Journal of Nonparametric Statistics, 21, 589–609.
Song, Q., Liu, R., Shao, Q., Yang, L. (2014). A simultaneous confidence band for dense longitudinal regression. Communications in Statistics-Theory and Methods, 43, 5195–5210.
Wang, J., Yang, L. (2009). Polynomial spline confidence bands for regression curves. Statistica Sinica, 19, 325–342.
Wang, J., Cheng, F., Yang, L. (2013). Smooth simultaneous confidence bands for cumulative distribution functions. Journal of Nonparametric Statistics, 25, 395–407.
Wang, J., Liu, R., Cheng, F., Yang, L. (2014). Oracally efficient estimation of autoregressive error distribution with simultaneous confidence band. Annals of Statistics, 42, 654–668.
Wang, J., Wang, S., Yang, L. (2016). Simultaneous confidence bands for the distribution function of a finite population and of its superpopulation. TEST, 25, 692–709.
Wang, S., Dorfman, A. (1996). A new estimator for the finite population distribution function. Biometrika, 83, 639–652.
Xia, Y. (1998). Bias-corrected confidence bands in nonparametric regression. Journal of the Royal Statistical Society Series B, 60, 797–811.
Zheng, S., Yang, L., Härdle, W. (2014). A smooth simultaneous confidence corridor for the mean of sparse functional data. Journal of the American Statistical Association, 109, 661–673.
Zhu, H., Li, R., Kong, L. (2012). Multivariate varying coefficient model for functional responses. Annals of Statistics, 40, 2634–2666.
Additional information
This research was supported in part by Jiangsu Specially-Appointed Professor Program SR10700111, Jiangsu Province Key-Discipline Program ZY107992, National Natural Science Foundation of China Awards NSFC 11371272, 11771240, 11701403, Research Fund for the Doctoral Program of Higher Education of China Award 20133201110002, 2017 Jiangsu Overseas Visiting Scholar Program for University Prominent Young and Middle-aged Teachers and Presidents, and the Simons Foundation Mathematics and Physical Sciences Program Award #499650. Helpful comments from a reviewer are greatly appreciated.
Appendix
In this Appendix, we use \(a_{n}=o\left( b_{n}\right) \) to denote that \(\lim _{n\rightarrow \infty }a_{n}/b_{n}=0\), and \(a_{n}=O\left( b_{n}\right) \) to denote that \(\limsup _{n\rightarrow \infty }a_{n}/b_{n}=c\), where \(c\) is a constant. In addition, we denote by \(o_{p}\) \(\left( O_{p}\right) \) and \(o_{a.s.}\) a sequence of random variables of order \(o\) \(\left( O\right) \) in probability and almost surely, respectively, while \(u_{a.s.}\) means \(o_{a.s.}\) uniformly in the domain.
In the following we will prove Lemma 1 and Theorems 2–4.
A.1 Proof of Lemma 1
Our framework given in Sect. 2 and Condition (C2) ensure that, for any \(s\in \left\{ 1,\ldots ,S\right\} \),
Hence,
Making use of the simple inequality
and letting \(k\rightarrow \infty \), one obtains that
Since \(C<1\) according to Condition (C2), one obtains that \(\min _{1\le s\le S}CW_{s}^{-1}w_{s}<1\). Lemma 1 is proved. \(\square \)
A.2 Proof of Theorem 2
For \(s=1,\ldots ,S\), combining (8) in Theorem 1 with Skorohod’s Representation Theorem (Theorem 6.7 of Billingsley 1999), there exists a version \(\tilde{B}_{sk}\left( \cdot \right) \) of the Brownian bridge \(B_{s}\left( \cdot \right) \) that satisfies \(\tilde{B}_{sk}\left( F_{s}( x) \right) \overset{d}{\rightarrow } B_{s}\left( F_{s}( x) \right) \) as \(k\rightarrow \infty \) such that
which implies that
Recalling the definitions of \(F_{N_{k}}( x) \) and \( F_{n_{k}}( x) \) given in (1) and (4), one has
According to Condition (C2), as \(k\rightarrow \infty \),
and
Hence,
The proof of Theorem 2 is completed. \(\square \)
A.3 Proof of Theorem 3
Note that \(\lambda _{k}N_{k}^{-1/2}=\left( n_{k}^{-1}-N_{k}^{-1}\right) ^{-1/2}N_{k}^{-1/2}=\left( n_{k}/N_{k}\right) ^{1/2}\left( 1-n_{k}/N_{k}\right) ^{-1/2}\rightarrow 0\) when \(n_{k}/N_{k}\rightarrow C\equiv 0\) as \(k\rightarrow \infty \). Because the sequence of populations \(\left\{ \pi _{k}\right\} _{k=1}^{\infty }\) consists of i.i.d. random samples generated from \(F(x)\), Donsker’s Theorem entails that \(N_{k}^{1/2}\left\{ F_{N_{k}}( x) -F(x) \right\} \overset{d}{\rightarrow }B\left\{ F( x) \right\} \). Hence, as \(k\rightarrow \infty \),
Then Theorem 3 follows by Theorem 2 and Slutsky’s Theorem. \(\square \)
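For readers who want the algebra behind the first identity of this proof spelled out, the following worked steps use only \(\lambda _{k}=\left( n_{k}^{-1}-N_{k}^{-1}\right) ^{-1/2}\) as it appears above; this is an illustrative expansion and not part of the original argument.

\[
\lambda _{k}N_{k}^{-1/2}
=\left( \frac{N_{k}-n_{k}}{n_{k}N_{k}}\right) ^{-1/2}N_{k}^{-1/2}
=\left( \frac{n_{k}N_{k}}{N_{k}-n_{k}}\right) ^{1/2}N_{k}^{-1/2}
=\left( \frac{n_{k}}{N_{k}-n_{k}}\right) ^{1/2}
=\left( \frac{n_{k}}{N_{k}}\right) ^{1/2}\left( 1-\frac{n_{k}}{N_{k}}\right) ^{-1/2}.
\]

When \(n_{k}/N_{k}\rightarrow 0\), the first factor tends to 0 and the second tends to 1, so the product tends to 0. Since \(N_{k}^{1/2}\left\{ F_{N_{k}}( x) -F(x) \right\} \) converges in distribution by Donsker’s Theorem, multiplying it by the deterministic factor \(\lambda _{k}N_{k}^{-1/2}\) gives a term of order \(o_{p}\left( 1\right) \).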
A.4 Proof of Theorem 4
According to the definitions of \(F_{n_{k}}( x) \) and \(\hat{F} _{k}( x) \) given in (4) and (6), one has
Theorem 4 then follows from Theorems 2 and 3 and Slutsky’s Theorem. \(\square \)
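The closeness of the EDF and the KDE that this proof relies on can be illustrated numerically. The sketch below is only a minimal illustration under stated assumptions: definitions (4) and (6) are not reproduced in this Appendix, so the code assumes that both estimators are weighted sums of per-stratum estimators with weights \(W_{s}=N_{s}/N\), and it uses an integrated Gaussian kernel with a hand-picked bandwidth `h`. The function names `stratified_edf` and `stratified_kde_cdf`, the simulated strata, and the weights are all hypothetical.

```python
import math
import numpy as np

# Standard normal CDF applied elementwise; the Gaussian kernel is an assumption.
_phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def stratified_edf(samples, weights, x):
    """Weighted sum of per-stratum empirical distribution functions at points x."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for w, y in zip(weights, samples):
        y = np.sort(np.asarray(y, dtype=float))
        # Fraction of this stratum's sample values that are <= x.
        total += w * np.searchsorted(y, x, side="right") / y.size
    return total


def stratified_kde_cdf(samples, weights, x, h):
    """Weighted sum of per-stratum kernel-smoothed distribution estimates at points x."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for w, y in zip(weights, samples):
        y = np.asarray(y, dtype=float)
        # Integrated kernel evaluated at every (grid point, observation) pair.
        z = (x[:, None] - y[None, :]) / h
        total += w * _phi(z).mean(axis=1)
    return total


# Hypothetical two-stratum example with simulated data.
rng = np.random.default_rng(0)
samples = [rng.normal(0.0, 1.0, 200), rng.normal(3.0, 1.0, 100)]
weights = [0.6, 0.4]  # assumed W_s = N_s / N
grid = np.linspace(-4.0, 7.0, 221)
gap = np.max(np.abs(stratified_edf(samples, weights, grid)
                    - stratified_kde_cdf(samples, weights, grid, h=0.05)))
print(f"sup-norm gap on the grid: {gap:.4f}")
```

Shrinking `h` makes the smoothed estimate track the step function more closely away from its jump points. The bandwidth conditions actually required for Theorem 4 are those stated in the paper, not the value chosen here.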
About this article
Cite this article
Gu, L., Wang, S. & Yang, L. Simultaneous confidence bands for the distribution function of a finite population in stratified sampling. Ann Inst Stat Math 71, 983–1005 (2019). https://doi.org/10.1007/s10463-018-0668-7
DOI: https://doi.org/10.1007/s10463-018-0668-7