Methods for Handling Missing Data

Wu, Changbao; Thompson, Mary E.

doi:10.1007/978-3-030-44246-0_9

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

In an ideal world for surveys, the finite population is well defined, sampling frames are complete and up to date, units for the sample are selected according to the survey design, and the variables of interest are measured without error for all units selected in the sample. In practice, however, issues can arise at some or all of the steps in survey operations, resulting in problems such as under-coverage, nonresponse, and measurement error. If any of these problems is not handled with care, naive statistical analyses of the observed survey data often lead to invalid results. While missing data is a general topic arising in many fields, this chapter discusses the issues related to missing survey data and methods for handling them.

Author information

Authors and Affiliations

Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
Changbao Wu & Mary E. Thompson

About this chapter

Cite this chapter

Wu, C., Thompson, M.E. (2020). Methods for Handling Missing Data. In: Sampling Theory and Practice. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-44246-0_9

DOI: https://doi.org/10.1007/978-3-030-44246-0_9
Published: 16 May 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44244-6
Online ISBN: 978-3-030-44246-0
eBook Packages: Mathematics and Statistics (R0)
