# An empirical likelihood approach under cluster sampling with missing observations

## Abstract

The parameter of interest is the unique solution to a set of estimating equations, such as the regression parameters of a generalised linear model. We consider a design-based approach; that is, the sampling distribution is specified by stratification, cluster (multi-stage) sampling, unequal selection probabilities, side information and a response mechanism. The proposed empirical likelihood approach takes these features into account. Empirical likelihood has mostly been developed under more restrictive settings, such as the assumption of independent and identically distributed observations, which is violated under a design-based framework. A proper empirical likelihood approach that deals with cluster sampling, missing data and multidimensional parameters is absent from the literature. This paper shows that a cluster-level empirical log-likelihood ratio statistic is pivotal. The main contribution of the paper is to provide the rigorous asymptotic theory and underlying regularity conditions which imply \(\sqrt{n}\)-consistency and Wilks’s theorem or self-normalisation property. Both negligible and large sampling fractions are considered.
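For orientation only, the following is a minimal sketch of the classical empirical log-likelihood ratio statistic for a scalar mean in the independent and identically distributed setting (Owen's construction), which is the restrictive setting the abstract contrasts with. It is not the cluster-level, design-based statistic proposed in the paper: it ignores stratification, clustering, unequal selection probabilities and nonresponse. The function name `el_log_ratio_mean` and the toy data are assumptions introduced for illustration.

```python
import math


def el_log_ratio_mean(x, mu):
    """Return -2 log R(mu), the empirical log-likelihood ratio statistic
    for a scalar mean under i.i.d. sampling (illustrative sketch only)."""
    # Estimating function for the mean: g(x_i, mu) = x_i - mu.
    z = [xi - mu for xi in x]
    z_min, z_max = min(z), max(z)
    # mu must lie strictly inside the range of the data; otherwise no
    # positive weights can satisfy the constraint sum_i p_i * z_i = 0.
    if not (z_min < 0.0 < z_max):
        return math.inf

    # The Lagrange multiplier lam solves sum_i z_i / (1 + lam * z_i) = 0
    # with 1 + lam * z_i > 0 for all i. Start slightly inside the
    # admissible interval so the denominators stay positive.
    lo = (-1.0 / z_max) * (1.0 - 1e-12)
    hi = (-1.0 / z_min) * (1.0 - 1e-12)

    def score(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # score is strictly decreasing in lam, so bisection finds the root.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)

    # Optimal weights are p_i = 1 / (n * (1 + lam * z_i)), which gives
    # -2 log R(mu) = 2 * sum_i log(1 + lam * z_i).
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 6.0]  # toy data, not from the paper
    for mu0 in (2.5, 3.2, 4.5):
        print(mu0, el_log_ratio_mean(data, mu0))
```

In this i.i.d. setting, Wilks's theorem says the statistic is asymptotically chi-squared with one degree of freedom, so a 95% confidence interval collects the values of `mu` for which the statistic is below roughly 3.84. The paper's contribution is to establish an analogous pivotal property for a cluster-level statistic under a complex sampling design with missing observations.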

## Keywords

Design-based approach; Estimating equations; Stratification; Side information; Unequal probabilities

## Notes

### Acknowledgements

This work was supported by the European Union’s Seventh Programme for Research, Technological Development and Demonstration under Grant Agreement No 312691 - InGRID. I wish to thank Dr. Melike Oǧuz-Alper (Statistics Norway) for useful comments and help with Sect. 9. I also wish to thank an anonymous reviewer for suggesting the addition of Sects. 7, 8.4 and 9.
