An empirical likelihood approach under cluster sampling with missing observations

Abstract

The parameter of interest is the unique solution to a set of estimating equations, such as the regression parameters of generalised linear models. We consider a design-based approach; that is, the sampling distribution is specified by stratification, cluster (multi-stage) sampling, unequal selection probabilities, side information and a response mechanism. The proposed empirical likelihood approach takes these features into account. Empirical likelihood has mostly been developed under more restrictive settings, such as the assumption of independent and identically distributed observations, which is violated under a design-based framework. A proper empirical likelihood approach which deals with cluster sampling, missing data and multidimensional parameters is absent from the literature. This paper shows that a cluster-level empirical log-likelihood ratio statistic is pivotal. The main contribution of the paper is to provide the rigorous asymptotic theory and underlying regularity conditions which imply \({\surd {n}}\)-consistency and Wilks’s theorem, or the self-normalisation property. Negligible and large sampling fractions are considered.

References

  • Alfons, A., Filzmoser, P., Hulliger, B., Kolb, J., Kraft, S., Münnich, R. (2011). Synthetic Data Generation of SILC Data. Research Project Report WP6 – D6.2, University of Trier. http://ameli.surveystatistics.net. Accessed 22 July 2018.

  • Berger, Y. G. (2018). Empirical likelihood approaches under complex sampling designs. In Wiley StatsRef: Statistics Reference Online. Wiley.

  • Berger, Y. G., Rao, J. N. K. (2006). Adjusted jackknife for imputation under unequal probability sampling without replacement. Journal of the Royal Statistical Society Series B, 68, 531–547.

  • Berger, Y. G., Torres, O. D. L. R. (2016). An empirical likelihood approach for inference under complex sampling design. Journal of the Royal Statistical Society Series B, 78, 319–341.

  • Binder, D. A. (1983). On the variance of asymptotically normal estimators from complex surveys. International Statistical Review, 51, 279–292.

  • Binder, D. A., Patak, Z. (1994). Use of estimating functions for estimation from complex surveys. Journal of the American Statistical Association, 89, 1035–1043.

  • Brewer, K., Gregoire, T. (2009). Introduction to survey sampling. In D. Pfeffermann, C. Rao (Eds.), Sample Surveys: Design, Methods and Applications (pp. 9–38). Handbook of Statistics. Amsterdam: Elsevier.

  • Brick, J., Kalton, G. (1996). Handling missing data in survey research. Statistical Methods in Medical Research, 5, 215–238.

  • Brick, J. M., Montaquila, J. M. (2009). Nonresponse and weighting. In D. Pfeffermann, C. R. Rao (Eds.), Sample surveys: Design, methods and applications, vol. 29A of Handbook of Statistics (pp. 163–185). Amsterdam: Elsevier.

  • Chen, J., Sitter, R. R. (1999). A pseudo empirical likelihood approach to the effective use of auxiliary information in complex surveys. Statistica Sinica, 9, 385–406.

  • Chen, S., Kim, J. K. (2014). Population empirical likelihood for nonparametric inference in survey sampling. Statistica Sinica, 24, 335–355.

  • Chen, S., Van Keilegom, I. (2009). A review on empirical likelihood methods for regression. Test, 18, 415–447.

  • Deville, J. C. (1999). Variance estimation for complex statistics and estimators: Linearization and residual techniques. Survey Methodology, 25, 193–203.

  • Deville, J. C., Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376–382.

  • Eurostat (2012). European union statistics on income and living conditions (EU-SILC). http://ec.europa.eu/eurostat/web/income-and-living-conditions/overview. Accessed 22 July 2018.

  • Fang, F., Hong, Q., Shao, J. (2009). A pseudo empirical likelihood approach for stratified samples with nonresponse. The Annals of Statistics, 37, 371–393.

  • Fang, F., Hong, Q., Shao, J. (2010). Empirical likelihood estimation for samples with nonignorable nonresponse. Statistica Sinica, 20, 263–280.

  • Fay, B. E. (1991). A design-based perspective on missing data variance. In Proceedings of the 1991 annual research conference. U.S. Bureau of the Census (pp. 429–440).

  • Fuller, W. A. (2009). Some design properties of a rejective sampling procedure. Biometrika, 96, 933–944.

  • Godambe, V., Thompson, M. E. (1974). Estimating equations in the presence of a nuisance parameter. The Annals of Statistics, 2, 568–571.

  • Godambe, V. P., Thompson, M. (2009). Estimating functions and survey sampling. In D. Pfeffermann, C. Rao (Eds.), Sample surveys: Inference and analysis. Handbook of statistics (pp. 83–101). Amsterdam: Elsevier.

  • Graf, E., Tillé, Y. (2014). Variance estimation using linearization for poverty and social exclusion indicators. Survey Methodology, 40, 61–79.

  • Hájek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. The Annals of Mathematical Statistics, 35, 1491–1523.

  • Hartley, H. O., Rao, J. N. K. (1962). Sampling with unequal probabilities without replacement. The Annals of Mathematical Statistics, 33, 350–374.

  • Hartley, H. O., Rao, J. N. K. (1968). A new estimation theory for sample surveys. Biometrika, 55, 547–557.

  • Haziza, D. (2009). Imputation and inference in the presence of missing data. In D. Pfeffermann, C. R. Rao (Eds.), Sample surveys: Design, methods and applications, vol. 29A of Handbook of Statistics (pp. 215–246). Amsterdam: Elsevier.

  • Haziza, D., Beaumont, J. F. (2007). On the construction of imputation classes in surveys. International Statistical Review, 75, 25–43.

  • Haziza, D., Lesage, E. (2016). A discussion of weighting procedures for unit nonresponse. Journal of Official Statistics, 32, 129–145.

  • Imbens, G. W., Lancaster, T. (1994). Combining micro and macro data in microeconometric models. The Review of Economic Studies, 61, 655–680.

  • Isaki, C. T., Fuller, W. A. (1982). Survey design under the regression super-population model. Journal of the American Statistical Association, 77, 89–96.

  • Kalton, G. (1983). Compensating for missing survey data. Ann Arbor, MI: University of Michigan Press.

  • Kovar, J. G., Rao, J. N. K., Wu, C. F. J. (1988). Bootstrap and other methods to measure errors in survey estimates. The Canadian Journal of Statistics, 16, 25–45.

  • Krewski, D., Rao, J. N. K. (1981). Inference from stratified samples: Properties of the linearization, jackknife and balanced repeated replication methods. The Annals of Statistics, 9, 1010–1019.

  • Little, R. (1986). Survey nonresponse adjustments for estimates of means. International Statistical Review, 54, 139–157.

  • Little, R., Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed). Hoboken, NJ: Wiley.

  • Little, R., Vartivarian, S. (2005). Models for nonresponse in sample surveys. Survey Methodology, 31, 161–168.

  • Lundström, S., Särndal, C. E. (1999). Calibration as a standard method for treatment of nonresponse. Journal of Official Statistics, 15, 305–327.

  • Montaquila, J., Brick, J., Hagedorn, M., Kennedy, C., Keeter, S. (2008). Aspects of nonresponse bias in RDD telephone surveys. In J. Lepkowski, C. Tucker, J. Brick et al. (Eds.), Advances in Telephone Survey Methodology. New York: Wiley.

  • National Center for Health Statistics (2016). National health and nutrition examination survey (NHANES). http://www.cdc.gov/nchs/nhanes. Accessed 22 July 2018.

  • Neyman, J. (1938). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97, 558–625.

  • OECD (2006). PISA 2006 Technical Report. https://www.oecd.org/pisa/data/42025182.pdf. Accessed 22 July 2018.

  • OECD (2007). PISA 2006: Science competencies for tomorrow's world, volume 1 - analysis. Paris: OECD Publisher.

  • Oǧuz-Alper, M., Berger, Y. G. (2016). Empirical likelihood approach for modelling survey data. Biometrika, 103, 447–459.

  • Osier, G., Berger, Y. G., Goedemé, T. (2013). Standard error estimation for the EU-SILC indicators of poverty and social exclusion. Eurostat Methodologies and Working Papers series.

  • Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75, 237–249.

  • Owen, A. B. (2001). Empirical likelihood. New York: Chapman & Hall.

  • Pfeffermann, D., Skinner, C., Holmes, D., Goldstein, H., Rasbash, J. (1998). Weighting for unequal selection probabilities in multilevel models. Journal of the Royal Statistical Society Series B, 60, 23–40.

  • Qin, J., Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22, 300–325.

  • Qin, J., Zhang, B., Leung, D. H. Y. (2009). Empirical likelihood in missing data problems. Journal of the American Statistical Association, 104, 1492–1503.

  • R Development Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria. http://www.R-project.org. Accessed 22 July 2018.

  • Rao, C. (1973). Linear statistical inference and its applications (2nd ed.). New York: Wiley.

  • Rao, J. N. K., Shao, A. J. (1992). Jackknife variance estimation with survey data under hotdeck imputation. Biometrika, 79, 811–822.

  • Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.

  • Rust, K., Rao, J. (1996). Variance estimation for complex surveys using replication techniques. Statistical Methods in Medical Research, 5, 281–310.

  • Särndal, C. E., Lundström, S. (2005). Estimation in surveys with nonresponse. Chichester: Wiley.

  • Särndal, C. E., Swensson, B. (1987). A general view of estimation for two-phases of selection with applications to two-phase sampling and non-response. International Statistical Review, 55, 279–294.

  • Särndal, C.-E., Swensson, B., Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer.

  • Scheffé, H. (1959). The analysis of variance. New York: Wiley.

  • Shao, J., Steel, P. (1999). Variance estimation for survey data with composite imputation and nonnegligible sampling fractions. Journal of the American Statistical Association, 94, 254–265.

  • Valliant, R. (2004). The effect of multiple weighting steps on variance estimation. Journal of Official Statistics, 20, 1–18.

  • Van Der Vaart, A. W. (1998). Asymptotic statistics. Cambridge: Cambridge University Press.

  • Wang, D., Chen, S. X. (2009). Empirical likelihood for estimating equations with missing values. The Annals of Statistics, 37, 490–517.

  • Wang, Q., Rao, J. N. K. (2002a). Empirical likelihood-based inference in linear models with missing data. Scandinavian Journal of Statistics, 29, 563–576.

  • Wang, Q., Rao, J. N. K. (2002b). Empirical likelihood-based inference under imputation for missing response data. The Annals of Statistics, 30, 896–924.

  • Wolter, K. M. (2007). Introduction to Variance Estimation (2nd ed). New York: Springer.

  • Wu, C., Rao, J. N. K. (2006). Pseudo-empirical likelihood ratio confidence intervals for complex surveys. Canadian Journal of Statistics, 34, 359–375.

  • Wu, C., Zhao, P., Haziza, D. (2017). Empirical likelihood inference for complex surveys and the design-based oracle variable selection theory. In Proceedings of the section on survey research methods. American Statistical Association.

Acknowledgements

This work was supported by the European Union’s Seventh Programme for Research, Technological Development and Demonstration under Grant Agreement No 312691 - InGRID. I wish to thank Dr. Melike Oǧuz-Alper (Statistics Norway) for useful comments and help with Sect. 9. I also wish to thank an anonymous reviewer for suggesting adding Sects. 7, 8.4 and 9.

Author information

Corresponding author

Correspondence to Yves G. Berger.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 281 KB)

Appendix A

In this Appendix, we propose an estimator for (38). We have that [see (C.10) and (C.11) in “Appendix C” of the online supplement]

(45)

where

(46)

and is defined by (29). The operators \({{\mathbb {E}}}_r(\cdot )\) and \({{\mathbb {V}}}_r(\cdot )\) denote the expectation and variance with respect to the response mechanism. The operators \({{\mathbb {V}}}_d(\cdot \!\mid \!{\varvec{r}})\) and \({{\mathbb {E}}}_{d}(\cdot \!\mid \!{\varvec{r}})\) denote the conditional variance and expectation with respect to the sampling design, given \({\varvec{r}}\). An asymptotically unbiased estimator of \({{\varvec{V}}\!\!_{0}}\!^\mathrm{{I}}\) is

(47)

where denotes the customary two-stage variance estimator of \({{\mathbb {V}}}_d({\bar{{\varvec{\epsilon }}}}_{\pi }\mid {\varvec{r}})\) (e.g. Särndal et al. 1992, p. 137), treating \({\varvec{r}}\) as constant. This estimator takes large sampling fractions into account, because it depends on the joint-inclusion probabilities of the clusters. The second term \({{\varvec{V}}\!\!_{0}}\!^\mathrm{{II}}\) can be estimated by [see (C.12) in “Appendix C” of the online supplement]

(48)

where \(P_i({\varvec{\lambda }}_{0})\) is defined by (3) and

(49)

The unknown quantity is substituted by within (47) and (49).

Finally, (45), (47) and (48) give the following estimator for (38)

(50)

The estimates of are the eigenvalues of (50), after substituting by within the right-hand side of (50).
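The split into a first term involving \({{\mathbb {V}}}_d(\cdot \!\mid \!{\varvec{r}})\) and a second term involving the response mechanism is consistent with the law of total variance, applied with the response indicators \({\varvec{r}}\) as the outer layer; this ordering is often called the reverse approach (Fay 1991; Shao and Steel 1999). The display below is only a generic sketch of that identity, not a reconstruction of (45) or (46): the statistic \(\hat{T}\) is a symbol introduced here for illustration, and the identity assumes that the relevant second moments exist.

```latex
% Law of total variance with the response indicators r as the outer layer.
% \hat{T} is a hypothetical generic statistic, not a quantity from the paper.
\begin{align*}
\mathbb{V}(\hat{T})
  &= \mathbb{E}_r\bigl\{\mathbb{V}_d(\hat{T}\mid \boldsymbol{r})\bigr\}
   + \mathbb{V}_r\bigl\{\mathbb{E}_d(\hat{T}\mid \boldsymbol{r})\bigr\}
\end{align*}
```

Under this reading, the first term can be estimated by a design-based variance estimator computed with \({\varvec{r}}\) held fixed, which appears to be the role of (47), while the second term reflects the variability due to the response mechanism alone, which appears to be the role of (48).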

About this article

Cite this article

Berger, Y.G. An empirical likelihood approach under cluster sampling with missing observations. Ann Inst Stat Math 72, 91–121 (2020). https://doi.org/10.1007/s10463-018-0681-x

  • DOI: https://doi.org/10.1007/s10463-018-0681-x

Keywords

  • Design-based approach
  • Estimating equations
  • Stratification
  • Side information
  • Unequal probabilities