AStA Advances in Statistical Analysis

Volume 97, Issue 1, pp 49–76

Illuminate the unknown: evaluation of imputation procedures based on the SAVE survey

  • Michael Ziegelmeyer
Original Paper


Questions about monetary variables (such as income, wealth or savings) are key components of questionnaires on household finances. However, missing information on such sensitive topics is a well-known phenomenon that can seriously bias any inference based only on complete-case analysis. Many imputation techniques have been developed and implemented in several surveys. For the German SAVE data, a new estimation technique is necessary to overcome the upward bias of monetary variables caused by the initially implemented imputation procedure. The upward bias results from adding random draws to the implausible negative values predicted by OLS regressions until all values are positive. To overcome this problem, the logarithm of the dependent variable is taken and the predicted values are retransformed to the original scale by Duan’s smearing estimate. This paper evaluates the two techniques for the imputation of monetary variables in a simulation study in which a random pattern of missingness is imposed on the observed values of the variables of interest. A Monte-Carlo simulation based on the observed data shows the superiority of the newly implemented smearing estimate for constructing the missing data structure. All waves are consistently imputed using the new method.
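The retransformation step described above can be illustrated with a minimal sketch. It is not the SAVE imputation code: the data are synthetic, the single covariate and the 30% missingness rate are assumptions, and the function name `smearing_predict` is hypothetical. The sketch fits OLS on the logarithm of a strictly positive variable, imposes missingness completely at random, and retransforms the predictions with Duan's smearing factor, that is, the mean of the exponentiated log-scale residuals.

```python
import numpy as np


def smearing_predict(x_obs, y_obs, x_new):
    """OLS on log(y), retransformed to the original scale with Duan's smearing factor.

    Hypothetical helper for illustration; returns (predictions, smearing factor).
    """
    # Design matrices with an intercept column.
    a_obs = np.column_stack([np.ones(len(x_obs)), x_obs])
    a_new = np.column_stack([np.ones(len(x_new)), x_new])

    # Least-squares fit on the log scale.
    log_y = np.log(y_obs)
    beta, *_ = np.linalg.lstsq(a_obs, log_y, rcond=None)

    # Smearing factor: average of exp(residual) over the observed cases.
    resid = log_y - a_obs @ beta
    smear = np.mean(np.exp(resid))

    # Retransformed predictions are positive by construction.
    return np.exp(a_new @ beta) * smear, smear


# Synthetic, strictly positive "monetary" variable (assumed data-generating process).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = np.exp(1.0 + 0.5 * x + rng.normal(scale=0.8, size=n))

# Impose a random missingness pattern on the observed values (assumed rate: 30%).
missing = rng.random(n) < 0.3

y_hat, smear = smearing_predict(x[~missing], y[~missing], x[missing])

print("smearing factor:", smear)
print("mean of held-out true values:", y[missing].mean())
print("mean of smearing predictions:", y_hat.mean())
```

Because the predictions are exponentiated, none of them can be negative, so no random draws have to be added to force positivity. Comparing the held-out true values with the predictions over many random missingness patterns would give a simple Monte-Carlo comparison in the spirit of the evaluation described above.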


Imputation methods; Monte-Carlo simulation; Imputation evaluation; Item-nonresponse; Missing data; Imputation; Retransformation; Sample surveys; SAVE

JEL Classification

C01; C81; C49



Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  1. Economics and Research Department, Banque centrale du Luxembourg (BCL), Munich Center for the Economics of Aging (MEA), Luxembourg, Luxembourg
