Methods for Handling Missing Data

  • Chapter
Sampling Theory and Practice

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

In an ideal world for surveys, the finite population is well defined, sampling frames are complete and up-to-date, units for the sample are selected based on the survey design, and variables of interest are measured without any error for all units selected in the sample. In practice, however, there might be issues associated with some or all of the steps in survey operations, resulting in problems such as under-coverage, nonresponse and measurement error. If any of the problems is not handled with care, naive statistical analyses using the observed survey data often lead to invalid results. While missing data itself is a general topic arising from many fields, this chapter discusses issues related to and methods for handling missing survey data.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Wu, C., Thompson, M.E. (2020). Methods for Handling Missing Data. In: Sampling Theory and Practice. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-44246-0_9
