Variance Estimation in Two-stage Cluster Sampling under Imputation for Missing Data
Variance estimation in the presence of imputed data has been widely studied in the literature. It is well known that treating the imputed values as if they were true values can lead to serious underestimation of the true variance, especially if the response rates are low. In this paper, we consider the problem of model-based variance estimation in the context of two-stage cluster sampling designs, which are widely used in social and household surveys. In cluster sampling designs, units in the same neighborhood tend to have similar characteristics (e.g., income and education level). It is thus important to account for the intra-cluster correlation when formulating the model and then to derive variance estimators under the appropriate model. We consider weighted random hot-deck imputation and derive consistent variance estimators under two distinct frameworks: (i) the two-phase framework and (ii) the reverse framework. In the case of the two-phase framework, we use a variance estimation method proposed by Särndal (1992), whereas in the case of the reverse framework we use a method developed by Fay (1991) and Shao and Steel (1999). Finally, we perform a simulation study to evaluate the performance of the proposed variance estimators in terms of relative bias. We conclude that the variance estimators obtained by Shao-Steel’s method are more robust to model misspecification than those derived using Särndal’s method.
Key-words: Nonresponse; Random hot-deck imputation; Reverse framework; Two-phase framework; Two-stage cluster sampling; Variance estimation
AMS Subject Classification: 62D05
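To make the imputation step named in the abstract concrete, here is a minimal Python sketch of weighted random hot-deck imputation. It is not the paper's implementation: it assumes a single imputation class, ignores the cluster structure, and uses hypothetical names (`weighted_hot_deck`, `weighted_mean`) and toy data. Each missing value is replaced by the value of a respondent (donor) drawn with probability proportional to the donor's survey weight.

```python
import random


def weighted_hot_deck(y, w, seed=0):
    """Return a copy of y in which each missing value (None) is replaced by
    the value of a respondent drawn with probability proportional to w.

    Assumption: all units belong to one imputation class.
    """
    rng = random.Random(seed)
    donors = [i for i, value in enumerate(y) if value is not None]
    if not donors:
        raise ValueError("no respondents available as donors")
    donor_weights = [w[i] for i in donors]

    completed = list(y)
    for i, value in enumerate(y):
        if value is None:
            # Draw one donor index, selection probability w_j / sum of donor weights.
            j = rng.choices(donors, weights=donor_weights, k=1)[0]
            completed[i] = y[j]
    return completed


def weighted_mean(y, w):
    """Weighted mean of a fully observed (or completed) data vector."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


# Toy data (hypothetical): None marks a nonrespondent.
y = [10.0, None, 12.0, None, 9.0, 11.0]
w = [2.0, 2.0, 1.0, 1.0, 3.0, 3.0]

completed = weighted_hot_deck(y, w, seed=1)
print(completed)
print(weighted_mean(completed, w))
```

Applying a standard complete-data variance formula to `completed` would treat the donated values as observed, which is the source of the underestimation the abstract describes; the variance estimators studied in the paper are designed to account for the imputation.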
- Brick, J.M., Kalton, G., Kim, J.K., 2004. Variance estimation with hot deck imputation using a model. Survey Methodology, 30, 57–66.
- Demnati, A., Rao, J.N.K., 2004. Linearization variance estimators for survey data. Survey Methodology, 30, 17–27.
- Deville, J.C., Särndal, C.E., 1994. Variance estimation for the regression imputed Horvitz-Thompson estimator. Journal of Official Statistics, 23, 33–40.
- Fay, R.E., 1991. A Design-Based Perspective on Missing Data Variance. Proceedings of the 1991 Annual Research Conference, US Bureau of the Census, 429–440.
- Haziza, D., 2009. Imputation and inference in the presence of missing data. In Handbook of Statistics, Volume 29, Sample Surveys: Theory Methods and Inference, Pfeffermann, D. and Rao, C.R. (editors), pp. 215–246.
- Rao, J.N.K., 1990. Variance estimation under imputation for missing data. Technical report, Statistics Canada, Ottawa.
- Reiter, J.P., Raghunathan, T.E., Kinney, S.K., 2006. The importance of modeling the sampling design in multiple imputation for missing data. Survey Methodology, 32, 143–149.
- Särndal, C.E., 1992. Method for estimating the precision of survey estimates when imputation has been used. Survey Methodology, 18, 241–252.
- Shao, J., 2007. Handling survey nonresponse in cluster sampling. Survey Methodology, 33, 81–85.