Journal of Statistical Theory and Practice, Volume 4, Issue 4, pp 827–844

Variance Estimation in Two-stage Cluster Sampling under Imputation for Missing Data

  • David Haziza
  • J. N. K. Rao


Variance estimation in the presence of imputed data has been widely studied in the literature. It is well known that treating the imputed values as if they were true values can lead to serious underestimation of the true variance, especially if the response rates are low. In this paper, we consider the problem of model-based variance estimation in the context of two-stage cluster sampling designs, which are widely used in social and household surveys. In cluster sampling designs, units in the same neighborhood tend to have similar characteristics (e.g., income and education level). It is thus important to account for the intra-cluster correlation when formulating the model and then to derive variance estimators under the appropriate model. We consider weighted random hot-deck imputation and derive consistent variance estimators under two distinct frameworks: (i) the two-phase framework and (ii) the reverse framework. In the case of the two-phase framework, we use a variance estimation method proposed by Särndal (1992), whereas in the case of the reverse framework we use a method developed by Fay (1991) and Shao and Steel (1999). Finally, we perform a simulation study to evaluate the performance of the proposed variance estimators in terms of relative bias. We conclude that the variance estimators obtained by the Shao-Steel method are more robust to model misspecification than those derived using Särndal’s method.
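
As an illustration of the imputation step named in the abstract, the following is a minimal sketch of weighted random hot-deck imputation. It assumes that donors are drawn from all respondents in the sample with probability proportional to their survey weights; the paper may instead draw donors within imputation classes or clusters. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import random


def weighted_random_hot_deck(values, weights, rng):
    """Replace missing entries (None) with donor values.

    Assumption: each donor is a respondent selected with probability
    proportional to its survey weight, independently for every
    missing unit.
    """
    donors = [(y, w) for y, w in zip(values, weights) if y is not None]
    if not donors:
        raise ValueError("no respondents available as donors")
    donor_values = [y for y, _ in donors]
    donor_weights = [w for _, w in donors]

    imputed = []
    for y in values:
        if y is None:
            # Draw one donor value with probability proportional to weight.
            y = rng.choices(donor_values, weights=donor_weights, k=1)[0]
        imputed.append(y)
    return imputed


def weighted_mean(values, weights):
    """Weighted mean computed on the completed data set."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


# Hypothetical toy sample: None marks a nonrespondent.
y = [12.0, None, 15.0, 9.0, None, 11.0]
w = [10.0, 10.0, 20.0, 20.0, 5.0, 5.0]

rng = random.Random(0)  # fixed seed so the draw is reproducible
y_imputed = weighted_random_hot_deck(y, w, rng)
estimate = weighted_mean(y_imputed, w)
```

The sketch stops at the point estimate. Applying a standard variance formula to `y_imputed` would treat the imputed values as observed, which is the naive approach that the abstract says can seriously underestimate the true variance.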


Nonresponse; Random hot-deck imputation; Reverse framework; Two-phase framework; Two-stage cluster sampling; Variance estimation

AMS Subject Classification





  1. Beaumont, J.F., Bocci, C., 2009. Variance estimation when donor imputation is used to fill in missing data. The Canadian Journal of Statistics, 37, 400–416.
  2. Brick, J.M., Kalton, G., Kim, J.K., 2004. Variance estimation with hot deck imputation using a model. Survey Methodology, 30, 57–66.
  3. Demnati, A., Rao, J.N.K., 2004. Linearization variance estimators for survey data. Survey Methodology, 30, 17–27.
  4. Deville, J.C., Särndal, C.E., 1994. Variance estimation for the regression imputed Horvitz-Thompson estimator. Journal of Official Statistics, 23, 33–40.
  5. Fay, R.E., 1991. A design-based perspective on missing data variance. Proceedings of the 1991 Annual Research Conference, US Bureau of the Census, 429–440.
  6. Haziza, D., 2009. Imputation and inference in the presence of missing data. In Handbook of Statistics, Volume 29, Sample Surveys: Theory Methods and Inference, Pfeffermann, D. and Rao, C.R. (editors), pp. 215–246.
  7. Kim, J.K., Rao, J.N.K., 2009. Unified approach to linearization variance estimation from survey data after imputation for item nonresponse. Biometrika, 96, 917–932.
  8. Rao, J.N.K., 1990. Variance estimation under imputation for missing data. Technical report, Statistics Canada, Ottawa.
  9. Rao, J.N.K., 1996. On variance estimation with imputed survey data. Journal of the American Statistical Association, 91, 499–506.
  10. Rao, J.N.K., Shao, J., 1992. On variance estimation under imputation for missing data. Biometrika, 79, 811–822.
  11. Reiter, J.P., Raghunathan, T.E., Kinney, S.K., 2006. The importance of modeling the sampling design in multiple imputation for missing data. Survey Methodology, 32, 143–149.
  12. Särndal, C.E., 1992. Method for estimating the precision of survey estimates when imputation has been used. Survey Methodology, 18, 241–252.
  13. Searle, S.R., Casella, G., McCulloch, C.E., 1992. Variance components. John Wiley & Sons, Inc.
  14. Shao, J., 2007. Handling survey nonresponse in cluster sampling. Survey Methodology, 33, 81–85.
  15. Shao, J., Sitter, R.R., 1996. Bootstrap for imputed survey data. Journal of the American Statistical Association, 93, 819–831.
  16. Shao, J., Steel, P., 1999. Variance estimation for survey data with composite imputation and nonnegligible sampling fractions. Journal of the American Statistical Association, 94, 254–265.
  17. Yuan, Y., Little, R.J.A., 2007. Parametric and semi-parametric model based estimates of the finite population mean for two-stage cluster samples with item nonresponse. Biometrics, 63, 1172–1180.

Copyright information

© Grace Scientific Publishing 2010

Authors and Affiliations

  1. Département de mathématiques et de statistique, Université de Montréal, Montreal, Canada
  2. School of Mathematics and Statistics, Carleton University, Ottawa, Canada
