An empirical likelihood approach under cluster sampling with missing observations

  • Yves G. Berger


The parameter of interest is the unique solution to a set of estimating equations, for example the regression parameters of a generalised linear model. We consider a design-based approach; that is, the sampling distribution is specified by stratification, cluster (multi-stage) sampling, unequal selection probabilities, side information and a response mechanism. The proposed empirical likelihood approach takes these features into account. Empirical likelihood has mostly been developed under more restrictive settings, such as the independent and identically distributed assumption, which is violated under a design-based framework. A proper empirical likelihood approach that deals with cluster sampling, missing data and multidimensional parameters is absent from the literature. This paper shows that a cluster-level empirical log-likelihood ratio statistic is pivotal. The main contribution of the paper is to provide the rigorous asymptotic theory and underlying regularity conditions which imply \(\sqrt{n}\)-consistency and Wilks’s theorem, or self-normalisation property. Negligible and large sampling fractions are considered.
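As a rough illustration of what a cluster-level empirical log-likelihood ratio statistic looks like, the sketch below applies the textbook construction of Owen (1988) to cluster totals of a simple estimating function, y - theta, whose solution is the overall mean. It is not the statistic proposed in the paper: it assumes a single stratum, equal selection probabilities, a negligible sampling fraction and complete response, and the function names `cluster_totals`, `el_log_ratio` and `cluster_el_statistic` are hypothetical.

```python
import numpy as np


def cluster_totals(values, cluster_ids, theta):
    """Sum the unit-level estimating function (y - theta) within each cluster."""
    values = np.asarray(values, dtype=float)
    _, inverse = np.unique(cluster_ids, return_inverse=True)
    return np.bincount(inverse, weights=values - theta)


def el_log_ratio(g, n_iter=100):
    """Return -2 log empirical likelihood ratio under sum_i p_i * g_i = 0.

    g holds one scalar value per cluster. The optimal weights are
    p_i = 1 / (n * (1 + lam * g_i)), where lam solves
    sum_i g_i / (1 + lam * g_i) = 0.
    """
    g = np.asarray(g, dtype=float)
    # Zero must lie strictly between min(g) and max(g); otherwise the
    # constraint is treated as infeasible and the statistic is infinite.
    if g.min() >= 0.0 or g.max() <= 0.0:
        return np.inf
    # lam must keep every 1 + lam * g_i positive.
    lo, hi = -1.0 / g.max(), -1.0 / g.min()
    # The score in lam is strictly decreasing on (lo, hi), so bisect.
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if np.sum(g / (1.0 + mid * g)) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return float(2.0 * np.sum(np.log1p(lam * g)))


def cluster_el_statistic(values, cluster_ids, theta):
    """Empirical log-likelihood ratio statistic built from cluster totals."""
    return el_log_ratio(cluster_totals(values, cluster_ids, theta))


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ids = [0, 0, 1, 1, 2, 2]
    for theta in (3.0, 3.5, 4.0):
        print(theta, cluster_el_statistic(y, ids, theta))
```

If a Wilks-type result holds, the statistic can be compared with a chi-squared quantile (about 3.84 for one parameter at the 95% level) to form a confidence region, without a separate variance estimate. Aggregating to cluster totals before forming the likelihood is what keeps the within-cluster dependence from entering the constraint as if the units were independent.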


Design-based approach · Estimating equations · Stratification · Side information · Unequal probabilities



This work was supported by the European Union’s Seventh Programme for Research, Technological Development and Demonstration under Grant Agreement No 312691 - InGRID. I wish to thank Dr. Melike Oǧuz-Alper (Statistics Norway) for useful comments and help with Sect. 9. I also wish to thank an anonymous reviewer for suggesting adding Sects. 7, 8.4 and 9.

Supplementary material

Supplementary material 1: 10463_2018_681_MOESM1_ESM.pdf (PDF, 282 KB)



Copyright information

© The Institute of Statistical Mathematics, Tokyo 2018

Authors and Affiliations

  1. Southampton Statistical Sciences Research Institute, University of Southampton, Southampton, United Kingdom
