Simultaneous confidence bands for the distribution function of a finite population in stratified sampling

Gu, Lijie; Wang, Suojin; Yang, Lijian

doi:10.1007/s10463-018-0668-7


Published: 21 May 2018

Annals of the Institute of Statistical Mathematics, Volume 71, pages 983–1005 (2019)



Abstract

Stratified sampling is one of the most important survey sampling approaches and is widely used in practice. In this paper, we consider estimating the distribution function of a finite population in stratified sampling by the empirical distribution function (EDF) and by the kernel distribution estimator (KDE). Under general conditions, the rescaled estimation error processes are shown to converge to a weighted sum of transformed Brownian bridges. Moreover, simultaneous confidence bands (SCBs) for the population distribution function are constructed based on the EDF and the KDE. Simulation experiments and an illustrative data example show that the coverage frequencies of the proposed SCBs under the optimal and proportional allocations are close to the nominal confidence levels.


References

Bickel, P. J., Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. Annals of Statistics, 1, 1071–1095.
Billingsley, P. (1999). Convergence of Probability Measures (2nd ed.). New York: Wiley.
Cai, L., Yang, L. (2015). A smooth simultaneous confidence band for conditional variance function. TEST, 24, 632–655.
Cao, G., Yang, L., Todem, D. (2012). Simultaneous inference for the mean function based on dense functional data. Journal of Nonparametric Statistics, 24, 359–377.
Cao, G., Wang, L., Li, Y., Yang, L. (2016). Oracle-efficient confidence envelopes for covariance functions in dense functional data. Statistica Sinica, 26, 359–383.
Cardot, H., Josserand, E. (2011). Horvitz-Thompson estimators for functional data: asymptotic confidence bands and optimal allocation for stratified sampling. Biometrika, 98, 107–118.
Cardot, H., Degras, D., Josserand, E. (2013). Confidence bands for Horvitz–Thompson estimators using sampled noisy functional data. Bernoulli, 19, 2067–2097.
Chambers, R. L., Dunstan, R. (1986). Estimating distribution functions from survey data. Biometrika, 73, 597–604.
Chen, J., Wu, C. (2002). Estimation of distribution function and quantiles using the model-calibrated pseudo empirical likelihood method. Statistica Sinica, 12, 1223–1239.
Cochran, W. G. (1977). Sampling techniques (3rd ed.). New York: Wiley.
Degras, D. (2011). Simultaneous confidence bands for nonparametric regression with functional data. Statistica Sinica, 21, 1735–1765.
Frey, J. (2009). Confidence bands for the CDF when sampling from a finite population. Computational Statistics and Data Analysis, 53, 4126–4132.
Gu, L., Yang, L. (2015). Oracally efficient estimation for single-index link function with simultaneous confidence band. Electronic Journal of Statistics, 9, 1540–1561.
Gu, L., Wang, L., Härdle, W., Yang, L. (2014). A simultaneous confidence corridor for varying coefficient regression with sparse functional data. TEST, 23, 806–843.
Härdle, W. (1989). Asymptotic maximal deviation of M-smoothers. Journal of Multivariate Analysis, 29, 163–179.
Liu, R., Yang, L. (2008). Kernel estimation of multivariate cumulative distribution function. Journal of Nonparametric Statistics, 20, 661–677.
Lohr, S. (2009). Sampling: Design and analysis (2nd ed.). Boston: Brooks/Cole.
Ma, S., Yang, L., Carroll, R. (2012). A simultaneous confidence band for sparse longitudinal regression. Statistica Sinica, 22, 95–122.
McCarthy, P. J., Snowden, C. B. (1985). The bootstrap and finite population sampling. Vital and Health Statistics, 73, 1–23.
O’Neill, T., Stern, S. (2012). Finite population corrections for the Kolmogorov–Smirnov tests. Journal of Nonparametric Statistics, 24, 497–504.
Reiss, R. (1981). Nonparametric estimation of smooth distribution functions. Scandinavian Journal of Statistics, 8, 116–119.
Rosén, B. (1964). Limit theorems for sampling from finite population. Arkiv för Matematik, 5, 383–424.
Shao, Q., Yang, L. (2012). Polynomial spline confidence band for time series trend. Journal of Statistical Planning and Inference, 142, 1678–1689.
Song, Q., Yang, L. (2009). Spline confidence bands for variance function. Journal of Nonparametric Statistics, 21, 589–609.
Song, Q., Liu, R., Shao, Q., Yang, L. (2014). A simultaneous confidence band for dense longitudinal regression. Communications in Statistics - Theory and Methods, 43, 5195–5210.
Wang, J., Yang, L. (2009). Polynomial spline confidence bands for regression curves. Statistica Sinica, 19, 325–342.
Wang, J., Cheng, F., Yang, L. (2013). Smooth simultaneous confidence bands for cumulative distribution functions. Journal of Nonparametric Statistics, 25, 395–407.
Wang, J., Liu, R., Cheng, F., Yang, L. (2014). Oracally efficient estimation of autoregressive error distribution with simultaneous confidence band. Annals of Statistics, 42, 654–668.
Wang, J., Wang, S., Yang, L. (2016). Simultaneous confidence bands for the distribution function of a finite population and of its superpopulation. TEST, 25, 692–709.
Wang, S., Dorfman, A. (1996). A new estimator for the finite population distribution function. Biometrika, 83, 639–652.
Xia, Y. (1998). Bias-corrected confidence bands in nonparametric regression. Journal of the Royal Statistical Society Series B, 60, 797–811.
Zheng, S., Yang, L., Härdle, W. (2014). A smooth simultaneous confidence corridor for the mean of sparse functional data. Journal of the American Statistical Association, 109, 661–673.
Zhu, H., Li, R., Kong, L. (2012). Multivariate varying coefficient model for functional responses. Annals of Statistics, 40, 2634–2666.


Author information

Authors and Affiliations

School of Mathematical Sciences and Center for Advanced Statistics and Econometrics Research, Soochow University, Suzhou, 215006, China
Lijie Gu
Department of Statistics, Texas A&M University, College Station, TX, 77843, USA
Suojin Wang
Center for Statistical Science and Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
Lijian Yang


Corresponding author

Correspondence to Lijian Yang.

Additional information

This research was supported in part by Jiangsu Specially-Appointed Professor Program SR10700111, Jiangsu Province Key-Discipline Program ZY107992, National Natural Science Foundation of China Awards NSFC 11371272, 11771240, 11701403, Research Fund for the Doctoral Program of Higher Education of China Award 20133201110002, 2017 Jiangsu Overseas Visiting Scholar Program for University Prominent Young and Middle-aged Teachers and Presidents, and the Simons Foundation Mathematics and Physical Sciences Program Award #499650. Helpful comments from a reviewer are greatly appreciated.

Appendix

In this Appendix, we write $a_{n}=o\left( b_{n}\right) $ if $\lim _{n\rightarrow \infty }a_{n}/b_{n}=0$, and $a_{n}=O\left( b_{n}\right) $ if $\limsup _{n\rightarrow \infty }\left| a_{n}/b_{n}\right| \le c$ for some constant $c<\infty $. In addition, $o_{p}$ ($O_{p}$) denotes a sequence of random variables of order $o$ ($O$) in probability, $o_{a.s.}$ denotes a sequence of random variables of order $o$ almost surely, and $u_{a.s.}$ means $o_{a.s.}$ uniformly over the domain.

In the following, we prove Lemma 1 and Theorems 2–4.

A.1 Proof of Lemma 1

Our framework given in Sect. 2 and Condition (C2) ensure that, for any $s\in \left\{ 1,\ldots ,S\right\} $,

$$\begin{aligned} \lim _{k\rightarrow \infty }\left( N_{sk}/N_{k}\right) =W_{s},\ \ \lim _{k\rightarrow \infty }\left( n_{sk}/n_{k}\right) =w_{s},\ \ \lim _{k\rightarrow \infty }\left( n_{k}/N_{k}\right) =C. \end{aligned}$$

Hence,

$$\begin{aligned} CW_{s}^{-1}w_{s}={\normalsize \lim _{k\rightarrow \infty }}\left( n_{k}/N_{k}\right) \left( N_{sk}/N_{k}\right) ^{-1}\left( n_{sk}/n_{k}\right) ={\normalsize \lim _{k\rightarrow \infty }}\left( n_{sk}/N_{sk}\right) \le 1. \end{aligned}$$

Making use of the simple inequality

$$\begin{aligned} n_{k}/N_{k}=\frac{\sum \nolimits _{s=1}^{S}n_{sk}}{\sum \nolimits _{s=1}^{S}N_{sk}}\ge \min _{1\le s\le S}\left( n_{sk}/N_{sk}\right) \end{aligned}$$

and letting $k\rightarrow \infty $, one obtains that

$$\begin{aligned} C= & {} \lim _{k\rightarrow \infty }n_{k}/N_{k}\ge \lim _{k\rightarrow \infty }\min _{1\le s\le S}\left( n_{sk}/N_{sk}\right) =\min _{1\le s\le S} \lim _{k\rightarrow \infty }\left( n_{sk}/N_{sk}\right) \\= & {} \min _{1\le s\le S}CW_{s}^{-1}w_{s}, \end{aligned}$$

and, since $C<1$ by Condition (C2), it follows that $\min _{1\le s\le S}CW_{s}^{-1}w_{s}\le C<1$. This proves Lemma 1. $\square $
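The two facts used above, the finite-$k$ identity $(n_k/N_k)(N_{sk}/N_k)^{-1}(n_{sk}/n_k)=n_{sk}/N_{sk}$ and the inequality stating that the overall sampling fraction is never below the smallest stratum fraction, can be checked with exact rational arithmetic. A minimal sketch; the stratum sizes below are hypothetical and not taken from the paper:

```python
from fractions import Fraction

# Hypothetical stratum population sizes N_sk and sample sizes n_sk (illustrative only).
N = [500, 300, 200]
n = [50, 60, 10]

overall = Fraction(sum(n), sum(N))                      # n_k / N_k
per_stratum = [Fraction(a, b) for a, b in zip(n, N)]    # n_sk / N_sk

# n_k / N_k is a weighted average of the n_sk / N_sk with weights N_sk / N_k,
# so it cannot fall below the smallest stratum sampling fraction.
assert min(per_stratum) <= overall <= max(per_stratum)

# Identity used in Lemma 1: (n_k/N_k) * (N_sk/N_k)^{-1} * (n_sk/n_k) = n_sk/N_sk.
for a, b in zip(n, N):
    product = overall / Fraction(b, sum(N)) * Fraction(a, sum(n))
    assert product == Fraction(a, b)

print(float(overall), [float(f) for f in per_stratum])
```

Exact fractions are used so the identity is checked without floating-point tolerance.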

A.2 Proof of Theorem 2

For $s=1,\ldots ,S$, combining (8) in Theorem 1 with Skorohod’s Representation Theorem (Theorem 6.7 of Billingsley 1999), there exists a version $\tilde{B}_{sk}\left( \cdot \right) $ of the Brownian bridge $B_{s}\left( \cdot \right) $, with $\tilde{B}_{sk}\left( F_{s}( x) \right) \overset{d}{\rightarrow }B_{s}\left( F_{s}( x) \right) $ as $k\rightarrow \infty $, such that

$$\begin{aligned} \sup \nolimits _{x\in \mathbb {R}}\left| \lambda _{sk}\left\{ F_{n_{sk}}( x) -F_{N_{sk}}(x) \right\} -\tilde{B} _{sk}\left( F_{s}( x) \right) \right| \rightarrow 0,\,a.s., \end{aligned}$$

which implies that

$$\begin{aligned} F_{n_{sk}}( x) -F_{N_{sk}}(x) =\lambda _{sk}^{-1} \tilde{B}_{sk}\left( F_{s}( x) \right) +u_{a.s.}\left( \lambda _{sk}^{-1}\right) \text {.} \end{aligned}$$

Recalling the definitions of $F_{N_{k}}( x) $ and $ F_{n_{k}}( x) $ given in (1) and (4), one has

$$\begin{aligned} \lambda _{k}\left\{ F_{n_{k}}( x) -F_{N_{k}}( x) \right\}= & {} \lambda _{k}\left\{ \sum \limits _{s=1}^{S}W_{sk}F_{n_{sk}}( x) -\sum \limits _{s=1}^{S}W_{sk}F_{N_{sk}}( x) \right\} \\= & {} \lambda _{k}\sum \limits _{s=1}^{S}W_{sk}\left\{ F_{n_{sk}}( x) -F_{N_{sk}}( x) \right\} \\= & {} \lambda _{k}\sum \limits _{s=1}^{S}W_{sk}\left\{ \lambda _{sk}^{-1}\tilde{ B}_{sk}\left( F_{s}( x) \right) +u_{a.s.}\left( \lambda _{sk}^{-1}\right) \right\} . \end{aligned}$$
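The decomposition above rests on both $F_{N_k}$ and $F_{n_k}$ being $W_{sk}$-weighted sums of within-stratum EDFs. The Python sketch below builds a hypothetical three-stratum finite population, draws a simple random sample without replacement in each stratum, and checks that the population EDF equals the weighted sum of stratum EDFs, so that the error splits stratum by stratum. It assumes that the stratum-level quantities in (1) and (4), which are not reproduced in this Appendix, are ordinary unweighted EDFs; the helper names `edf` and `stratified_edf` are illustrative.

```python
import numpy as np

def edf(sample, x):
    """Empirical distribution function of `sample`, evaluated at each point of x."""
    s = np.sort(np.asarray(sample))
    return np.searchsorted(s, x, side="right") / s.size

def stratified_edf(samples, weights, x):
    """sum_s W_s * F_s(x), where F_s is the EDF of the s-th stratum sample."""
    return sum(w * edf(s, x) for s, w in zip(samples, weights))

rng = np.random.default_rng(0)
# Hypothetical finite population: three strata with different sizes and locations.
pops = [rng.normal(loc=mu, size=N_s) for mu, N_s in [(0.0, 500), (1.0, 300), (2.0, 200)]]
N_total = sum(p.size for p in pops)
W = [p.size / N_total for p in pops]          # W_sk = N_sk / N_k
n_sizes = [50, 60, 10]                         # hypothetical allocation n_sk
samples = [rng.choice(p, size=m, replace=False) for p, m in zip(pops, n_sizes)]

x = np.linspace(-3.0, 5.0, 201)
F_N = edf(np.concatenate(pops), x)            # EDF of the whole finite population
F_n = stratified_edf(samples, W, x)           # stratified EDF from the sample

# The population EDF equals the W-weighted sum of within-stratum population EDFs,
# so F_n - F_N = sum_s W_s (F_{n_s} - F_{N_s}).
assert np.allclose(F_N, stratified_edf(pops, W, x))
err = F_n - F_N
print(np.max(np.abs(err)))
```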

According to Condition (C2), as $k\rightarrow \infty $,

$$\begin{aligned} \frac{n_{sk}}{N_{sk}}=\frac{n_{sk}}{n_{k}}\cdot \frac{n_{k}}{N_{k}}\cdot \frac{N_{k}}{N_{sk}}\rightarrow _{p} w_{s}CW_{s}^{-1}, \end{aligned}$$

and

$$\begin{aligned} \lambda _{k}W_{sk}\lambda _{sk}^{-1}=\frac{N_{sk}}{N_{k}}\sqrt{\frac{n_{k}}{ n_{sk}}\cdot \frac{1-n_{sk}/N_{sk}}{1-n_{k}/N_{k}}}\rightarrow _{p} W_{s}\sqrt{ w_{s}^{-1}\frac{1-w_{s}CW_{s}^{-1}}{1-C}}. \end{aligned}\tag{A.1}$$
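Both sides of (A.1) depend on the design only through the ratios $N_{sk}/N_k$, $n_{sk}/n_k$ and $n_k/N_k$, so a design that is scaled up proportionally reproduces the limit exactly at every $k$. The sketch below evaluates both sides for hypothetical stratum sizes, taking $\lambda=(n^{-1}-N^{-1})^{-1/2}$ as in the proof of Theorem 3 and assuming $\lambda_{sk}$ is defined analogously within stratum $s$ (consistent with the middle expression in (A.1)). The limit is coded in the simplified form $W_s\{(w_s^{-1}-CW_s^{-1})/(1-C)\}^{1/2}$ that appears in Theorem 2.

```python
import math

def lam(n, N):
    """Finite-population scaling lambda = (1/n - 1/N)^(-1/2)."""
    return (1.0 / n - 1.0 / N) ** -0.5

def a1_finite(n_s, N_s, n, N):
    """Left-hand side of (A.1): lambda_k * W_sk / lambda_sk at a finite k."""
    return lam(n, N) * (N_s / N) / lam(n_s, N_s)

def a1_limit(W_s, w_s, C):
    """Right-hand side of (A.1), simplified: W_s * sqrt((1/w_s - C/W_s) / (1 - C))."""
    return W_s * math.sqrt((1.0 / w_s - C / W_s) / (1.0 - C))

# Hypothetical design, scaled up by a factor m so all ratios stay fixed.
N_base, n_base, m = [500, 300, 200], [50, 60, 10], 10
N_s_list = [m * v for v in N_base]
n_s_list = [m * v for v in n_base]
N, n = sum(N_s_list), sum(n_s_list)
C = n / N

finite = [a1_finite(ns, Ns, n, N) for ns, Ns in zip(n_s_list, N_s_list)]
limits = [a1_limit(Ns / N, ns / n, C) for ns, Ns in zip(n_s_list, N_s_list)]
print([round(v, 4) for v in finite], [round(v, 4) for v in limits])
```

Agreement between the two lists also confirms the algebra $w_s^{-1}(1-w_sCW_s^{-1})=w_s^{-1}-CW_s^{-1}$.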

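Since each factor $\lambda _{k}W_{sk}\lambda _{sk}^{-1}$ has a finite limit by (A.1), the remainder terms $\lambda _{k}W_{sk}\,u_{a.s.}\left( \lambda _{sk}^{-1}\right) $ are $u_{a.s.}\left( 1\right) $. Hence,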

$$\begin{aligned} \lambda _{k}\left\{ F_{n_{k}}( x) -F_{N_{k}}( x) \right\} \overset{d}{\rightarrow }\sum \limits _{s=1}^{S}W_{s}\sqrt{\left( w_{s}^{-1}-CW_{s}^{-1}\right) /\left( 1-C\right) }B_{s}\left\{ F_{s}( x) \right\} .\ \ \end{aligned}$$

This completes the proof of Theorem 2. $\square $
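The limit in Theorem 2 can be simulated on a grid of $x$ values, which is one way to approximate quantiles of its supremum for a band of the form $F_{n_k}(x)\pm q_{1-\alpha}\lambda_k^{-1}$. This is a minimal sketch, not the band construction of the paper: it assumes the bridges $B_1,\ldots,B_S$ are independent, treats $F_s$ as known, and uses hypothetical normal strata, weights and allocation. Each bridge is built as $B(t)=\mathbb{W}(t)-t\,\mathbb{W}(1)$ from a standard Brownian motion $\mathbb{W}$ sampled at $t=F_s(x_j)$; the function names are illustrative.

```python
import math
import numpy as np

def limit_paths(F_vals, W, w, C, n_rep=2000, seed=0):
    """Draws of sum_s c_s B_s(F_s(x_j)) on a grid, with c_s taken from Theorem 2.

    F_vals has shape (S, m); row s holds F_s(x_1), ..., F_s(x_m), nondecreasing in j.
    The bridges B_1, ..., B_S are simulated independently (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    F_vals = np.asarray(F_vals, dtype=float)
    W = np.asarray(W, dtype=float)
    w = np.asarray(w, dtype=float)
    c = W * np.sqrt((1.0 / w - C / W) / (1.0 - C))       # weights of the limit process
    S, m = F_vals.shape
    total = np.zeros((n_rep, m))
    for s in range(S):
        t = F_vals[s]
        dt = np.diff(t, prepend=0.0)                      # increments t_1 - 0, t_2 - t_1, ...
        bm = np.cumsum(rng.standard_normal((n_rep, m)) * np.sqrt(dt), axis=1)  # W(t_j)
        b1 = bm[:, -1] + rng.standard_normal(n_rep) * np.sqrt(1.0 - t[-1])    # W(1)
        total += c[s] * (bm - t * b1[:, None])            # bridge B(t) = W(t) - t W(1)
    return total

def normal_cdf(x, mu):
    """Standard normal CDF shifted to location mu, evaluated at each x."""
    return np.array([0.5 * (1.0 + math.erf((xi - mu) / math.sqrt(2.0))) for xi in x])

x = np.linspace(-3.0, 5.0, 81)
mus, W, w, C = [0.0, 1.0, 2.0], [0.5, 0.3, 0.2], [0.5, 0.3, 0.2], 0.12   # hypothetical
F_vals = np.vstack([normal_cdf(x, mu) for mu in mus])
paths = limit_paths(F_vals, W, w, C, n_rep=20000)
q95 = np.quantile(np.max(np.abs(paths), axis=1), 0.95)   # approx. 95% critical value
print(round(float(q95), 3))
```

Here the allocation is proportional ($w_s=W_s$), in which case the coefficients reduce to $\sqrt{W_s}$ and the pointwise variance of the limit is $\sum_s W_sF_s(x)\{1-F_s(x)\}$.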

A.3 Proof of Theorem 3

Note that $\lambda _{k}N_{k}^{-1/2}=\left( n_{k}^{-1}-N_{k}^{-1}\right) ^{-1/2}N_{k}^{-1/2}=\left( n_{k}/N_{k}\right) ^{1/2}\left( 1-n_{k}/N_{k}\right) ^{-1/2}\rightarrow 0$ when $n_{k}/N_{k}\rightarrow C=0$ as $k\rightarrow \infty $. Since each population $\pi _{k}$ in the sequence $\left\{ \pi _{k}\right\} _{k=1}^{\infty }$ is an i.i.d. random sample of size $N_{k}$ generated from $F(x)$, Donsker’s Theorem entails that $N_{k}^{1/2}\left\{ F_{N_{k}}( x) -F(x) \right\} \overset{d}{\rightarrow }B\left\{ F( x) \right\} $, and hence, by the continuous mapping theorem, $M( F_{N_{k}},F) =\sup _{x\in \mathbb {R}}\left| F_{N_{k}}( x) -F( x) \right| =O_{p}\left( N_{k}^{-1/2}\right) $. Therefore, as $k\rightarrow \infty $,

$$\begin{aligned} \lambda _{k}M( F_{N_{k}},F) =\lambda _{k}O_{p}\left( N_{k}^{-1/2}\right) =o_{p}\left( 1\right) . \end{aligned}$$
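The two ingredients of this step, the identity $\lambda_kN_k^{-1/2}=\{f_k/(1-f_k)\}^{1/2}$ with $f_k=n_k/N_k$ and the $O_p(N_k^{-1/2})$ rate of $M(F_{N_k},F)$, can be illustrated numerically. A minimal sketch assuming, purely for illustration, that $F$ is the Uniform(0,1) distribution, so that $M(F_{N_k},F)$ is the Kolmogorov–Smirnov statistic and can be computed exactly from order statistics; the sample size $n_k=100$ and the population sizes are hypothetical:

```python
import math
import numpy as np

def lam(n, N):
    """Finite-population scaling lambda = (1/n - 1/N)^(-1/2)."""
    return (1.0 / n - 1.0 / N) ** -0.5

def ks_uniform(u):
    """Exact sup_x |F_N(x) - x| for a sample u from Uniform(0, 1)."""
    u = np.sort(u)
    N = u.size
    i = np.arange(1, N + 1)
    return max(np.max(i / N - u), np.max(u - (i - 1) / N))

rng = np.random.default_rng(2)
n = 100                                              # hypothetical fixed total sample size
rows = []
for N in (10_000, 1_000_000):
    f = n / N
    scale = lam(n, N) / math.sqrt(N)                 # equals sqrt(f / (1 - f)), tends to 0
    M = ks_uniform(rng.uniform(size=N))              # M(F_N, F): population vs superpopulation
    rows.append((N, scale, math.sqrt(N) * M, lam(n, N) * M))

for row in rows:
    print(row)   # (N, lambda/sqrt(N), sqrt(N)*M, lambda*M)
```

The third column stays bounded in probability (Donsker), while the first factor shrinks, so their product $\lambda_kM(F_{N_k},F)$ is $o_p(1)$.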

Theorem 3 then follows from Theorem 2 and Slutsky’s Theorem. $\square $

A.4 Proof of Theorem 4

According to the definitions of $F_{n_{k}}( x) $ and $\hat{F} _{k}( x) $ given in (4) and (6), one has

$$\begin{aligned} \lambda _{k}\left\{ F_{n_{k}}( x) -\hat{F}_{k}( x) \right\} =\lambda _{k}\left\{ \sum \limits _{s=1}^{S}W_{sk}F_{n_{sk}}( x) -\sum \limits _{s=1}^{S}W_{sk}\hat{F}_{sk}( x) \right\} . \end{aligned}$$

Then (9), together with the boundedness of $\lambda _{k}W_{sk}\lambda _{sk}^{-1}$ guaranteed by (A.1), implies that

$$\begin{aligned} \lambda _{k}M( \hat{F}_{k},F_{n_{k}})= & {} \lambda _{k}\sup \nolimits _{x\in \mathbb {R}}\left| \hat{F}_{k}( x) -F_{n_{k}}( x) \right| \\\le & {} \lambda _{k}\sum \limits _{s=1}^{S}W_{sk}\sup \nolimits _{x\in \mathbb {R}}\left| \hat{F}_{sk}( x) -F_{n_{sk}}( x) \right| \\= & {} \lambda _{k}\sum \limits _{s=1}^{S}W_{sk}\lambda ^{-1} _{sk}\times o_{p}\left( 1\right) =o_{p}\left( 1\right) . \end{aligned}$$

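Since $\lambda _{k}\{ \hat{F}_{k}( x) -F_{N_{k}}( x) \} =\lambda _{k}\{ F_{n_{k}}( x) -F_{N_{k}}( x) \} +\lambda _{k}\{ \hat{F}_{k}( x) -F_{n_{k}}( x) \} $, Theorem 4 then follows from Theorems 2 and 3 and Slutsky’s Theorem. $\square $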
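The key inequality in this proof is that the supremum distance between the stratified KDE and the stratified EDF is at most the $W_{sk}$-weighted sum of the within-stratum distances. The sketch below checks this on a grid. Definition (6) is not reproduced in this Appendix, so the sketch assumes the common smoothed form $\hat F_{sk}(x)=n_{sk}^{-1}\sum_i G\{(x-X_{si})/h\}$ with $G$ the standard normal distribution function; the kernel, the bandwidth $h$, the weights, the data and the helper names are all hypothetical.

```python
import math
import numpy as np

def kde_cdf(sample, x, h):
    """Smoothed CDF n^{-1} sum_i G((x - X_i)/h), G = standard normal CDF (assumed kernel)."""
    z = (np.asarray(x)[:, None] - np.asarray(sample)[None, :]) / h
    G = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return G.mean(axis=1)

def edf(sample, x):
    """Empirical distribution function of `sample`, evaluated at each point of x."""
    s = np.sort(np.asarray(sample))
    return np.searchsorted(s, x, side="right") / s.size

rng = np.random.default_rng(1)
samples = [rng.normal(mu, 1.0, size=m) for mu, m in [(0.0, 50), (1.0, 60), (2.0, 10)]]
W = [0.5, 0.3, 0.2]                      # hypothetical stratum weights N_sk / N_k
x = np.linspace(-4.0, 6.0, 401)
h = 0.2                                  # hypothetical bandwidth

per_stratum = [np.max(np.abs(kde_cdf(s, x, h) - edf(s, x))) for s in samples]
F_hat = sum(w * kde_cdf(s, x, h) for s, w in zip(samples, W))   # stratified KDE
F_n = sum(w * edf(s, x) for s, w in zip(samples, W))            # stratified EDF
overall = np.max(np.abs(F_hat - F_n))

# Triangle-inequality step from the proof; it holds pointwise, hence on any grid.
assert overall <= sum(w * d for w, d in zip(W, per_stratum)) + 1e-12
print(overall, per_stratum)
```

A grid only approximates the supremum over $\mathbb{R}$, but the inequality itself does not depend on the grid.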

About this article

Cite this article

Gu, L., Wang, S. & Yang, L. Simultaneous confidence bands for the distribution function of a finite population in stratified sampling. Ann Inst Stat Math 71, 983–1005 (2019). https://doi.org/10.1007/s10463-018-0668-7


Received: 16 August 2017
Revised: 24 April 2018
Published: 21 May 2018
Issue Date: 07 August 2019
DOI: https://doi.org/10.1007/s10463-018-0668-7
