Donor Limited Hot Deck Imputation: A Constrained Optimization Problem

  • Dieter William Joenssen
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

Hot deck methods impute missing data by matching complete records (donors) to records with missing values (recipients). Values missing in a recipient are then replaced by copying the corresponding values from the matched donor. Some hot deck procedures limit how often any donor may be matched in order to increase the precision of post-imputation parameter estimates. This constraint, called a donor limit, also mitigates the risk of using a single donor for all imputations or of using a donor with one or more extreme values “too often.” Despite these desirable properties, the results of a donor-limited hot deck depend on the order in which the recipients are imputed, which is undesirable. For nearest-neighbor hot deck procedures, constraining donor usage means that stepwise matching of each recipient to its closest donor no longer minimizes the sum of all donor–recipient distances. Imputation results may therefore be improved further by procedures that minimize the total donor–recipient distance. The discrete optimization problem is formulated, and a simulation detailing the possible improvements from solving this integer program is presented.
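The order dependence described in the abstract can be illustrated with a small sketch. It is not the chapter's formulation or the HotDeckImputation package's API: the function names, the univariate absolute-difference distance, and the toy data are all assumptions made for illustration, and the "optimal" matching is found by brute-force enumeration rather than by an integer programming solver.

```python
from itertools import product


def total_distance(recipients, donors, match):
    """Sum of |recipient - donor| over all matched pairs (assumed distance)."""
    return sum(abs(r - donors[j]) for r, j in zip(recipients, match))


def greedy_match(recipients, donors, limit):
    """Stepwise nearest-neighbor matching under a donor limit.

    Recipients are processed in the order given; each takes the closest
    donor that has been used fewer than `limit` times so far.
    Assumes len(recipients) <= limit * len(donors).
    """
    usage = [0] * len(donors)
    match = []
    for r in recipients:
        available = [j for j in range(len(donors)) if usage[j] < limit]
        best = min(available, key=lambda j: abs(r - donors[j]))
        usage[best] += 1
        match.append(best)
    return match


def optimal_match(recipients, donors, limit):
    """Brute-force minimizer of the total distance under the donor limit.

    Enumerates every assignment of donors to recipients, so it is only
    usable for tiny illustrative problems.
    """
    best, best_cost = None, float("inf")
    for match in product(range(len(donors)), repeat=len(recipients)):
        if any(match.count(j) > limit for j in range(len(donors))):
            continue  # violates the donor limit
        cost = total_distance(recipients, donors, match)
        if cost < best_cost:
            best, best_cost = list(match), cost
    return best


# Hypothetical toy data: two donors, two recipients, each donor usable once.
donors = [4, 10]
for order in ([5, 3], [3, 5]):
    greedy = greedy_match(order, donors, limit=1)
    print(order, "greedy total:", total_distance(order, donors, greedy))
# [5, 3] greedy total: 8
# [3, 5] greedy total: 6

best = optimal_match([5, 3], donors, limit=1)
print("optimal total:", total_distance([5, 3], donors, best))
# optimal total: 6
```

With these assumed values, imputing the recipient 5 first uses up the donor 4 and forces the recipient 3 onto the distant donor 10, whereas the reverse order happens to reach the minimum total distance. The enumeration grows exponentially with the number of recipients, so a realistic implementation would need a dedicated solver.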

Keywords

Donor Limit; Miss Data Mechanism; Missingness Mechanism; Multivariate Parameter; Imputation Quality
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Ilmenau University of Technology, Ilmenau, Germany
