An empirical likelihood approach under cluster sampling with missing observations

Berger, Yves G.

doi:10.1007/s10463-018-0681-x


Published: 03 August 2018

Volume 72, pages 91–121 (2020)

Annals of the Institute of Statistical Mathematics

Yves G. Berger¹


Abstract

The parameter of interest is the unique solution to a set of estimating equations, such as the regression parameters of generalised linear models. We consider a design-based approach; that is, the sampling distribution is specified by stratification, cluster (multi-stage) sampling, unequal selection probabilities, side information and a response mechanism. The proposed empirical likelihood approach takes all of these features into account. Empirical likelihood has mostly been developed under more restrictive settings, such as the independent and identically distributed (i.i.d.) assumption, which is violated under a design-based framework. A proper empirical likelihood approach that deals with cluster sampling, missing data and multidimensional parameters is absent from the literature. This paper shows that a cluster-level empirical log-likelihood ratio statistic is pivotal. The main contribution of the paper is to provide the rigorous asymptotic theory and the underlying regularity conditions which imply \(\sqrt{n}\)-consistency and Wilks’s theorem, or the self-normalisation property. Both negligible and large sampling fractions are considered.
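To make the idea of a cluster-level empirical log-likelihood ratio concrete, the Python sketch below computes Owen’s (1988, 2001) statistic \(-2\log R(\theta)\) when each sampled cluster contributes a single row: its estimating function summed over the units of that cluster. It is a simplified illustration, not the paper’s method: it treats the cluster-level rows as i.i.d. and omits the design weights, side information, response model and sampling-fraction adjustments that the proposed approach accounts for. The function name `el_stat` and the toy inputs are hypothetical. Under i.i.d. clusters and standard moment conditions, Wilks’s theorem makes the statistic approximately \(\chi^2_k\) at the true parameter.

```python
import numpy as np


def el_stat(g, max_iter=100, tol=1e-12):
    """Return -2 log R for H0: E[g_i] = 0, following Owen (1988, 2001).

    g : (n,) or (n, k) array whose row i is the estimating function summed
        over the units of cluster i (clusters treated as i.i.d. in this sketch).
    Assumes 0 lies in the interior of the convex hull of the rows of g.
    """
    g = np.asarray(g, dtype=float)
    if g.ndim == 1:
        g = g[:, None]
    lam = np.zeros(g.shape[1])

    def dual(l):
        # -log R(theta) equals the maximum of this concave function of l.
        return np.sum(np.log1p(g @ l))

    for _ in range(max_iter):
        z = g / (1.0 + g @ lam)[:, None]        # rows g_i / (1 + l'g_i)
        grad = z.sum(axis=0)                     # gradient of the dual
        step = np.linalg.solve(z.T @ z, grad)    # Newton ascent step
        t = 1.0
        # Halve the step until every implied weight p_i is positive and the
        # dual objective does not decrease.
        while (np.any(1.0 + g @ (lam + t * step) <= 0.0)
               or dual(lam + t * step) < dual(lam)):
            t /= 2.0
            if t < 1e-12:
                break
        lam = lam + t * step
        if np.linalg.norm(t * step) < tol:
            break
    return 2.0 * dual(lam)
```

For instance, with two clusters whose totals are 1 and −3, the constrained weights are 3/4 and 1/4, so `el_stat([1.0, -3.0])` equals \(2\log(4/3) \approx 0.575\). Solving for the Lagrange multiplier by damped Newton steps on the concave dual keeps every implied weight positive.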


References

Alfons, A., Filzmoser, P., Hulliger, B., Kolb, J., Kraft, S., Münnich, R. (2011). Synthetic Data Generation of SILC Data. Research Project Report WP6 – D6.2, University of Trier. http://ameli.surveystatistics.net. Accessed 22 July 2018.
Berger, Y. G. (2018). Empirical likelihood approaches under complex sampling designs. In Wiley StatsRef: Statistics Reference Online. Wiley.
Berger, Y. G., Rao, J. N. K. (2006). Adjusted jackknife for imputation under unequal probability sampling without replacement. Journal of the Royal Statistical Society Series B, 68, 531–547.
Berger, Y. G., Torres, O. D. L. R. (2016). An empirical likelihood approach for inference under complex sampling design. Journal of the Royal Statistical Society Series B, 78, 319–341.
Binder, D. A. (1983). On the variance of asymptotically normal estimators from complex surveys. International Statistical Review, 51, 279–292.
Binder, D. A., Patak, Z. (1994). Use of estimating functions for estimation from complex surveys. Journal of the American Statistical Association, 89, 1035–1043.
Brewer, K., Gregoire, T. (2009). Introduction to survey sampling. In D. Pfeffermann, C. R. Rao (Eds.), Sample Surveys: Design, Methods and Applications (pp. 9–38). Handbook of Statistics. Amsterdam: Elsevier.
Brick, J., Kalton, G. (1996). Handling missing data in survey research. Statistical Methods in Medical Research, 5, 215–238.
Brick, J. M., Montaquila, J. M. (2009). Nonresponse and weighting. In D. Pfeffermann, C. R. Rao (Eds.), Sample surveys: Design, methods and applications, Vol. 29A of Handbook of Statistics (pp. 163–185). Amsterdam: Elsevier.
Chen, J., Sitter, R. R. (1999). A pseudo empirical likelihood approach to the effective use of auxiliary information in complex surveys. Statistica Sinica, 9, 385–406.
Chen, S., Kim, J. K. (2014). Population empirical likelihood for nonparametric inference in survey sampling. Statistica Sinica, 24, 335–355.
Chen, S., Van Keilegom, I. (2009). A review on empirical likelihood methods for regression. Test, 18, 415–447.
Deville, J. C. (1999). Variance estimation for complex statistics and estimators: Linearization and residual techniques. Survey Methodology, 25, 193–203.
Deville, J. C., Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376–382.
Eurostat (2012). European Union statistics on income and living conditions (EU-SILC). http://ec.europa.eu/eurostat/web/income-and-living-conditions/overview. Accessed 22 July 2018.
Fang, F., Hong, Q., Shao, J. (2009). A pseudo empirical likelihood approach for stratified samples with nonresponse. The Annals of Statistics, 37, 371–393.
Fang, F., Hong, Q., Shao, J. (2010). Empirical likelihood estimation for samples with nonignorable nonresponse. Statistica Sinica, 20, 263–280.
Fay, R. E. (1991). A design-based perspective on missing data variance. In Proceedings of the 1991 Annual Research Conference (pp. 429–440). U.S. Bureau of the Census.
Fuller, W. A. (2009). Some design properties of a rejective sampling procedure. Biometrika, 96, 933–944.
Godambe, V., Thompson, M. E. (1974). Estimating equations in the presence of a nuisance parameter. The Annals of Statistics, 2, 568–571.
Godambe, V. P., Thompson, M. E. (2009). Estimating functions and survey sampling. In D. Pfeffermann, C. R. Rao (Eds.), Sample surveys: Inference and analysis. Handbook of Statistics (pp. 83–101). Amsterdam: Elsevier.
Graf, E., Tillé, Y. (2014). Variance estimation using linearization for poverty and social exclusion indicators. Survey Methodology, 40, 61–79.
Hájek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. The Annals of Mathematical Statistics, 35, 1491–1523.
Hartley, H. O., Rao, J. N. K. (1962). Sampling with unequal probabilities without replacement. The Annals of Mathematical Statistics, 33, 350–374.
Hartley, H. O., Rao, J. N. K. (1968). A new estimation theory for sample surveys. Biometrika, 55, 547–557.
Haziza, D. (2009). Imputation and inference in the presence of missing data. In D. Pfeffermann, C. R. Rao (Eds.), Sample surveys: Design, methods and applications, Vol. 29A of Handbook of Statistics (pp. 215–246). Amsterdam: Elsevier.
Haziza, D., Beaumont, J. F. (2007). On the construction of imputation classes in surveys. International Statistical Review, 75, 25–43.
Haziza, D., Lesage, E. (2016). A discussion of weighting procedures for unit nonresponse. Journal of Official Statistics, 32, 129–145.
Imbens, G. W., Lancaster, T. (1994). Combining micro and macro data in microeconometric models. The Review of Economic Studies, 61, 655–680.
Isaki, C. T., Fuller, W. A. (1982). Survey design under the regression super-population model. Journal of the American Statistical Association, 77, 89–96.
Kalton, G. (1983). Compensating for missing survey data. Ann Arbor, MI: University of Michigan Press.
Kovar, J. G., Rao, J. N. K., Wu, C. F. J. (1988). Bootstrap and other methods to measure errors in survey estimates. The Canadian Journal of Statistics, 16, 25–45.
Krewski, D., Rao, J. N. K. (1981). Inference from stratified samples: Properties of the linearization, jackknife and balanced repeated replication methods. The Annals of Statistics, 9, 1010–1019.
Little, R. (1986). Survey nonresponse adjustments for estimates of means. International Statistical Review, 54, 139–157.
Little, R., Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Hoboken, NJ: Wiley.
Little, R., Vartivarian, S. (2005). Models for nonresponse in sample surveys. Survey Methodology, 31, 161–168.
Lundström, S., Särndal, C. E. (1999). Calibration as a standard method for treatment of nonresponse. Journal of Official Statistics, 15, 305–327.
Montaquila, J., Brick, J., Hagedorn, M., Kennedy, C., Keeter, S. (2008). Aspects of nonresponse bias in RDD telephone surveys. In J. Lepkowski, C. Tucker, J. Brick et al. (Eds.), Advances in Telephone Survey Methodology. New York: Wiley.
National Center for Health Statistics (2016). National Health and Nutrition Examination Survey (NHANES). http://www.cdc.gov/nchs/nhanes. Accessed 22 July 2018.
Neyman, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97, 558–625.
OECD (2006). PISA 2006 Technical Report. https://www.oecd.org/pisa/data/42025182.pdf. Accessed 22 July 2018.
OECD (2007). PISA 2006: Science competencies for tomorrow’s world, Volume 1: Analysis. Paris: OECD Publishing.
Oǧuz-Alper, M., Berger, Y. G. (2016). Empirical likelihood approach for modelling survey data. Biometrika, 103, 447–459.
Osier, G., Berger, Y. G., Goedemé, T. (2013). Standard error estimation for the EU-SILC indicators of poverty and social exclusion. Eurostat Methodologies and Working Papers series.
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75, 237–249.
Owen, A. B. (2001). Empirical likelihood. New York: Chapman & Hall.
Pfeffermann, D., Skinner, C., Holmes, D., Goldstein, H., Rasbash, J. (1998). Weighting for unequal selection probabilities in multilevel models. Journal of the Royal Statistical Society Series B, 60, 23–40.
Qin, J., Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22, 300–325.
Qin, J., Zhang, B., Leung, D. H. Y. (2009). Empirical likelihood in missing data problems. Journal of the American Statistical Association, 104, 1492–1503.
R Development Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria. http://www.R-project.org. Accessed 22 July 2018.
Rao, C. R. (1973). Linear statistical inference and its applications (2nd ed.). New York: Wiley.
Rao, J. N. K., Shao, J. (1992). Jackknife variance estimation with survey data under hot deck imputation. Biometrika, 79, 811–822.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Rust, K., Rao, J. N. K. (1996). Variance estimation for complex surveys using replication techniques. Statistical Methods in Medical Research, 5, 281–310.
Särndal, C. E., Lundström, S. (2005). Estimation in surveys with nonresponse. Chichester: Wiley.
Särndal, C. E., Swensson, B. (1987). A general view of estimation for two phases of selection with applications to two-phase sampling and nonresponse. International Statistical Review, 55, 279–294.
Särndal, C.-E., Swensson, B., Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer.
Scheffé, H. (1959). The analysis of variance. New York: Wiley.
Shao, J., Steel, P. (1999). Variance estimation for survey data with composite imputation and nonnegligible sampling fractions. Journal of the American Statistical Association, 94, 254–265.
Valliant, R. (2004). The effect of multiple weighting steps on variance estimation. Journal of Official Statistics, 20, 1–18.
Van Der Vaart, A. W. (1998). Asymptotic statistics. Cambridge: Cambridge University Press.
Wang, D., Chen, S. X. (2009). Empirical likelihood for estimating equations with missing values. The Annals of Statistics, 37, 490–517.
Wang, Q., Rao, J. N. K. (2002a). Empirical likelihood-based inference in linear models with missing data. Scandinavian Journal of Statistics, 29, 563–576.
Wang, Q., Rao, J. N. K. (2002b). Empirical likelihood-based inference under imputation for missing response data. The Annals of Statistics, 30, 896–924.
Wolter, K. M. (2007). Introduction to Variance Estimation (2nd ed.). New York: Springer.
Wu, C., Rao, J. N. K. (2006). Pseudo-empirical likelihood ratio confidence intervals for complex surveys. Canadian Journal of Statistics, 34, 359–375.
Wu, C., Zhao, P., Haziza, D. (2017). Empirical likelihood inference for complex surveys and the design-based oracle variable selection theory. In Proceedings of the Section on Survey Research Methods. American Statistical Association.

Acknowledgements

This work was supported by the European Union’s Seventh Framework Programme for Research, Technological Development and Demonstration under Grant Agreement No. 312691 (InGRID). I wish to thank Dr. Melike Oǧuz-Alper (Statistics Norway) for useful comments and help with Sect. 9. I also wish to thank an anonymous reviewer for suggesting the addition of Sects. 7, 8.4 and 9.

Author information

Authors and Affiliations

Southampton Statistical Sciences Research Institute, University of Southampton, Southampton, SO17 1BJ, United Kingdom
Yves G. Berger


Corresponding author

Correspondence to Yves G. Berger.

Electronic supplementary material


Supplementary material 1 (PDF, 281 KB)

Appendix A

In this Appendix, we propose an estimator for (38). We have that [see (C.10) and (C.11) in “Appendix C” of the online supplement]

(45)

where

(46)

and is defined by (29). The operators \({{\mathbb {E}}}_r(\cdot )\) and \({{\mathbb {V}}}_r(\cdot )\) denote the expectation and variance with respect to the response mechanism. The operators \({{\mathbb {E}}}_{d}(\cdot \mid {\varvec{r}})\) and \({{\mathbb {V}}}_d(\cdot \mid {\varvec{r}})\) denote the conditional expectation and variance with respect to the sampling design, given \({\varvec{r}}\). An asymptotically unbiased estimator of \({\varvec{V}}_{0}^{\mathrm{I}}\) is

(47)

where denotes the customary two-stage variance estimator of \({{\mathbb {V}}}_d({\bar{{\varvec{\epsilon }}}}_{\pi }\mid {\varvec{r}})\) (e.g. Särndal et al. 1992, p. 137), treating \({\varvec{r}}\) as constant. This estimator takes large sampling fractions into account, because it depends on the joint inclusion probabilities of the clusters. The second term \({\varvec{V}}_{0}^{\mathrm{II}}\) can be estimated by (see (C.12) in “Appendix C” of the online supplement)

(48)

where \(P_i({\varvec{\lambda }}_{0})\) is defined by (3) and

(49)

The unknown quantity is substituted by within (47) and (49).
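The customary two-stage variance estimator cited above (Särndal et al. 1992, p. 137) picks up large sampling fractions through the joint inclusion probabilities of the clusters. The Python sketch below shows that estimator for a scalar Horvitz–Thompson total; it is not an implementation of (47), whose full form is not reproduced here. The function names are hypothetical, and simple random sampling without replacement (SRSWOR) within clusters is an illustrative assumption.

```python
import numpy as np


def two_stage_var(t_hat, pi, pi_joint, v_within):
    """Customary two-stage variance estimator of the HT estimator of a total.

    t_hat    : (m,) estimated totals within the m sampled clusters
    pi       : (m,) first-stage inclusion probabilities of the clusters
    pi_joint : (m, m) joint inclusion probabilities (diagonal equal to pi)
    v_within : (m,) unbiased estimates of the second-stage variance of t_hat
    """
    t_hat = np.asarray(t_hat, dtype=float)
    pi = np.asarray(pi, dtype=float)
    pi_joint = np.asarray(pi_joint, dtype=float)
    # (pi_ij - pi_i pi_j) / pi_ij: the joint inclusion probabilities, and
    # hence large first-stage sampling fractions, enter here.
    check_delta = (pi_joint - np.outer(pi, pi)) / pi_joint
    expanded = t_hat / pi
    between = expanded @ check_delta @ expanded               # first-stage term
    within = np.sum(np.asarray(v_within, dtype=float) / pi)   # second-stage term
    return between + within


def srswor_within_var(y, N_i):
    """Second-stage variance estimate when n_i units are drawn by SRSWOR from N_i."""
    y = np.asarray(y, dtype=float)
    n_i = y.size
    return N_i ** 2 * (1.0 - n_i / N_i) * np.var(y, ddof=1) / n_i
```

As a check, when m clusters are drawn by SRSWOR from M, the first-stage term reduces to the familiar \(M^2(1 - m/M)s_t^2/m\), where \(s_t^2\) is the sample variance of the estimated cluster totals; the factor \(1 - m/M\) is the finite population correction that matters when the sampling fraction is large.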

Finally, (45), (47) and (48) give the following estimator for (38):

(50)

The estimates of are the eigenvalues of (50), after substituting by within the right-hand side of (50).

About this article

Cite this article

Berger, Y.G. An empirical likelihood approach under cluster sampling with missing observations. Ann Inst Stat Math 72, 91–121 (2020). https://doi.org/10.1007/s10463-018-0681-x


Received: 12 September 2017
Revised: 22 June 2018
Published: 03 August 2018
Issue Date: February 2020
DOI: https://doi.org/10.1007/s10463-018-0681-x
