Abstract
In an ideal survey, the finite population is well defined, the sampling frame is complete and up to date, sample units are selected according to the survey design, and the variables of interest are measured without error for every selected unit. In practice, problems can arise at some or all steps of survey operations, resulting in under-coverage, nonresponse and measurement error. If any of these problems is not handled with care, naive statistical analyses of the observed survey data often lead to invalid results. Missing data is a general topic arising in many fields; this chapter discusses the issues specific to missing survey data and methods for handling them.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this chapter
Wu, C., Thompson, M.E. (2020). Methods for Handling Missing Data. In: Sampling Theory and Practice. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-44246-0_9
DOI: https://doi.org/10.1007/978-3-030-44246-0_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44244-6
Online ISBN: 978-3-030-44246-0
eBook Packages: Mathematics and Statistics (R0)