# Illuminate the unknown: evaluation of imputation procedures based on the SAVE survey

## Abstract

Questions about monetary variables (such as income, wealth or savings) are key components of questionnaires on household finances. However, missing information on such sensitive topics is a well-known phenomenon that can seriously bias any inference based only on complete-case analysis. Many imputation techniques have been developed and implemented in several surveys. For the German SAVE data, a new estimation technique is necessary to overcome the upward bias in monetary variables caused by the imputation procedure that was initially implemented. This bias arises because random draws are added to the implausible negative values predicted by OLS regressions until all values are positive. To overcome this problem, the logarithm of the dependent variable is modeled and the predicted values are retransformed to the original scale using Duan’s smearing estimate. This paper evaluates the two techniques for imputing monetary variables in a simulation study in which a random pattern of missingness is imposed on the observed values of the variables of interest. A Monte-Carlo simulation based on the observed data shows that the newly implemented smearing estimate is superior in reconstructing the missing data. All waves are consistently imputed using the new method.
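The abstract contrasts two ways of imputing a strictly positive monetary variable. For a log-scale model $\log y_i = x_i'\beta + \varepsilon_i$ fitted by OLS with residuals $\hat\varepsilon_j$, $j = 1, \dots, n$, Duan's smearing estimate of $E[y \mid x_0]$ is

$$
\hat{y}_0 = \exp(x_0'\hat\beta)\,\frac{1}{n}\sum_{j=1}^{n}\exp(\hat\varepsilon_j).
$$

The Python sketch below illustrates the comparison under stated assumptions; it is not the SAVE imputation code. The data-generating process (one standard-normal covariate, a log-normal outcome), the 30 % MCAR deletion rate and all function names (`ols_fit`, `impute_ols_redraw`, `impute_log_smearing`, `monte_carlo`) are hypothetical, and `impute_ols_redraw` is only one reading of "adding random draws until all values are positive": a randomly drawn observed OLS residual is added to the prediction and redrawn while the result is not positive.

```python
import math
import random
import statistics


def ols_fit(x, y):
    """OLS of y on an intercept and a single regressor x.

    Returns the intercept, the slope and the in-sample residuals.
    """
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, resid


def impute_ols_redraw(x_obs, y_obs, x_mis, rng, max_tries=10_000):
    """Assumed reading of the initial procedure: OLS on the raw scale plus a
    randomly drawn observed residual, redrawn while the value is not positive."""
    a, b, resid = ols_fit(x_obs, y_obs)
    imputed = []
    for xm in x_mis:
        pred = a + b * xm
        for _ in range(max_tries):
            value = pred + rng.choice(resid)
            if value > 0:  # negative draws are discarded, truncating at zero
                imputed.append(value)
                break
        else:
            raise RuntimeError("no positive draw found")
    return imputed


def impute_log_smearing(x_obs, y_obs, x_mis):
    """OLS on log(y), retransformed with Duan's smearing factor mean(exp(resid))."""
    a, b, resid = ols_fit(x_obs, [math.log(v) for v in y_obs])
    smear = statistics.fmean([math.exp(r) for r in resid])
    return [math.exp(a + b * xm) * smear for xm in x_mis]


def one_replication(rng, n=1000, miss_rate=0.3):
    """Simulate complete data, impose MCAR missingness and return, per method,
    the imputed mean minus the true mean of the deleted values."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Hypothetical DGP: a log-normal "monetary" variable driven by one covariate.
    y = [math.exp(1.0 + 0.5 * xi + rng.gauss(0.0, 1.0)) for xi in x]
    missing = [rng.random() < miss_rate for _ in range(n)]
    x_obs = [xi for xi, m in zip(x, missing) if not m]
    y_obs = [yi for yi, m in zip(y, missing) if not m]
    x_mis = [xi for xi, m in zip(x, missing) if m]
    y_mis = [yi for yi, m in zip(y, missing) if m]  # true values, hidden from both methods
    true_mean = statistics.fmean(y_mis)
    old = impute_ols_redraw(x_obs, y_obs, x_mis, rng)
    new = impute_log_smearing(x_obs, y_obs, x_mis)
    return statistics.fmean(old) - true_mean, statistics.fmean(new) - true_mean


def monte_carlo(reps=100, seed=12345):
    """Average the per-replication biases of both methods over `reps` runs."""
    rng = random.Random(seed)
    results = [one_replication(rng) for _ in range(reps)]
    bias_old = statistics.fmean([r[0] for r in results])
    bias_new = statistics.fmean([r[1] for r in results])
    return bias_old, bias_new


if __name__ == "__main__":
    bias_old, bias_new = monte_carlo()
    print(f"average bias, OLS + redraw until positive: {bias_old:.3f}")
    print(f"average bias, log-OLS + smearing:          {bias_new:.3f}")
```

Running the script prints the average bias of each method; the exact values depend on the seed and the assumed parameters. Because only positive draws are kept, the raw-scale method discards the lower tail of its predictive distribution, which is one mechanism that pushes imputed values upward, whereas the smearing factor corrects the retransformation without assuming normal errors.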

## Keywords

Imputation methods, Monte-Carlo simulation, Imputation evaluation, Item-nonresponse, Missing data, Imputation, Retransformation, Sample surveys, SAVE

## JEL Classification

C01, C81, C49

## References

- Aittokallio, T.: Dealing with missing values in large-scale studies: microarray data imputation and beyond. Briefings Bioinf. **11**(2), 253–264 (2009)
- Banca D’Italia (eds.): Supplements to the Statistical Bulletin, Sample Surveys: Household Income and Wealth in 2008. New series, vol. XX, no. 8–10, February (2010)
- Barceló, C.: Imputation of the 2002 wave of the Spanish Survey of Household Finances (EFF). Occasional Paper No. 0603, Bank of Spain (2006)
- Bello, A.L.: Choosing among imputation techniques for incomplete multivariate data: a simulation study. Commun. Stat. Theory Methods **22**(3), 853–877 (1993)
- Bello, A.L.: Imputation techniques in regression analysis: looking closely at their implementation. Comput. Stat. Data Anal. **20**, 45–57 (1995)
- Börsch-Supan, A., Coppola, M., Essig, L., Eymann, A., Schunk, D.: The German SAVE Study: Design and Results. MEA Studies 06, MEA, University of Mannheim (2008)
- Bover, O.: The Spanish Survey of Household Finances (EFF): Description and Methods of the 2002 Wave. Documentos Ocasionales No. 0409 (2004)
- Cameron, A.C., Trivedi, P.K.: Microeconometrics using Stata. Stata Press, College Station (2009)
- Chambers, R.: Evaluation Criteria for Statistical Editing and Imputation. University of Southampton, Southampton, UK, Working Paper for the Euredit Project on the Development and Evaluation of New Methods for Editing and Imputation (2003)
- Duan, N.: Smearing estimate: a nonparametric retransformation method. J. Am. Stat. Assoc. **78**(383), 605–610 (1983)
- Essig, L., Winter, J.: Item nonresponse to financial questions in household surveys: an experimental study of interviewer and mode effects. Fiscal Stud. **30**(3/4), 367–390 (2009)
- Frick, J.R., Grabka, M.M.: Item Non-response and Imputation of Annual Labor Income in Panel Surveys from a Cross-National Perspective. DIW Discussion Paper 736 (2007)
- Gelman, A.: Struggles with survey weighting and regression modeling. Stat. Sci. **22**(2), 153–164 (2007)
- Giorgi, R., Belot, A., Gaudart, J., Launoy, G., The French Network of Cancer Registries FRANCIM: The performance of multiple imputation for missing covariate data within the context of regression relative survival analysis. Stat. Med. **27**, 6310–6331 (2008)
- Graham, J.W., Schafer, J.L.: On the performance of multiple imputation for multivariate data with small sample size. In: Hoyle, R. (ed.) Statistical Strategies for Small Sample Research, pp. 1–29. Sage, Thousand Oaks (1999)
- Hu, M., Salvucci, S.M., Cohen, M.P.: Evaluation of some popular imputation algorithms. In: Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 308–313 (1998)
- Hu, M., Salvucci, S.: A Study of Imputation Algorithms. U.S. Department of Education, National Center for Education Statistics, Working Paper No. 2001–17, Project Officer: Ralph Lee. Washington, DC (2001)
- Jonsson, P., Wohlin, C.: An evaluation of k-nearest neighbour imputation using Likert data. In: Proceedings of the 10th International Symposium on Software Metrics, pp. 108–118 (2004)
- Kennickell, A.B.: Imputation of the 1989 survey of consumer finances: stochastic relaxation and multiple imputation. In: Proceedings of the Section on Survey Research Methods, American Statistical Association, Atlanta (1991)
- Kennickell, A.B.: Multiple imputation of the 1983 and 1989 waves of the SCF. Presented at the 1994 Annual Meetings of the American Statistical Association, Toronto (1994)
- Kennickell, A.B.: Multiple Imputation and Disclosure Protection: the Case of the 1995 Survey of Consumer Finances. SCF Working Paper (1997)
- Kennickell, A.B.: Multiple imputation in the survey of consumer finances. In: Proceedings of the 1998 Joint Statistical Meetings, Dallas (1998)
- Kuchler, C., Spiess, M.: The data quality concept of accuracy in the context of public use data sets. AStA Wirtschafts- und Sozialstatistisches Archiv **3**(1), 67–80 (2009)
- Little, R.J.A., Raghunathan, T.E.: Should imputation of missing data condition on all observed variables? In: Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 617–622 (1997)
- Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data, 1st and 2nd edn. Wiley, New York (1987, 2002)
- Manning, W.G.: The logged dependent variable, heteroscedasticity, and the retransformation problem. J. Health Econ. **17**, 283–295 (1998)
- Manning, W.G., Mullahy, J.: Estimating log models: to transform or not to transform? J. Health Econ. **20**, 461–494 (2001)
- Mullahy, J.: Much ado about two: reconsidering retransformation and the two-part model in health econometrics. J. Health Econ. **17**, 247–281 (1998)
- Münnich, R.T., Burgard, J.P.: On the influence of sampling design on small area estimates. J. Indian Soc. Agric. Stat. **66**(1), 145–156 (2012)
- Nicoletti, C., Peracchi, F.: The effects of income imputation on micro analyses: evidence from the ECHP. Working Papers of the Institute for Social and Economic Research, paper 2004–2019, University of Essex, Colchester (2004)
- Rässler, S., Riphahn, R.: Survey item nonresponse and its treatment. Allgemeines Statistisches Archiv **90**, 217–232 (2006)
- Rick, A.: The Saving Behavior of German Families: Heterogeneity in the Effect of Children on Annual Saving, Saving Motives, and the Regularity of Saving. MEA Studies 11, MEA, University of Mannheim (2010)
- Royston, P.: Multiple imputation of missing values. Stata J. **4**(3), 227–241 (2004)
- Rubin, D.B.: Inference and missing data. Biometrika **63**, 581–592 (1976)
- Rubin, D.B.: Multiple Imputation for Nonresponse in Surveys, 1st and 2nd edn. Wiley, New York (1987, 2004)
- Rubin, D.B.: Multiple imputation after 18+ years. J. Am. Stat. Assoc. **91**(434), 473–489 (1996)
- Rubin, D.B.: Discussion on multiple imputation. Int. Stat. Rev. **73**(3), 619–625 (2003)
- Schafer, J.L., Ezzati-Rice, T.M., Johnson, W., Khare, M., Little, R.J.A., Rubin, D.B.: The NHANES III multiple imputation project. In: Proceedings of the Survey Research Methods Section of the American Statistical Association, pp. 28–27 (1996)
- Schafer, J.L.: Analysis of Incomplete Multivariate Data. Chapman & Hall, London (1997)
- Schunk, D.: A Markov Chain Monte Carlo Multiple Imputation Procedure for Dealing with Item Nonresponse in the German SAVE Survey. MEA Discussion Paper 121–2007, MEA, University of Mannheim (2007)
- Schunk, D.: A Markov chain Monte Carlo algorithm for multiple imputation in large surveys. Adv. Stat. Anal. **92**(1), 101–114 (2008)
- Tanner, M.A., Wong, W.H.: The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. **82**(398), 528–550 (1987)
- Taylor, M.F., Brice, J., Buck, N., Prentice-Lane, E. (eds.): British Household Panel Survey User Manual, vol. A. Introduction, Technical Report and Appendices, University of Essex, Colchester (2010). https://www.iser.essex.ac.uk/bhps/documentation/pdf_versions/volumes/bhpsvola.pdf
- Tseng, S., Wang, K., Lee, C.: A pre-processing method to deal with missing values by integrating clustering and regression techniques. Appl. Artif. Intell. **17**(5/6), 535–544 (2003)
- van Buuren, S., Brand, J.P.L., Groothuis-Oudshoorn, C.G.M., Rubin, D.B.: Fully conditional specification in multivariate imputation. J. Stat. Comput. Simul. **76**(12), 1049–1064 (2006)
- Wasito, I., Mirkin, B.: Nearest neighbours in least-squares data imputation algorithms with different missing patterns. Comput. Stat. Data Anal. **50**(4), 926–949 (2006)
- Wooldridge, J.M.: Introductory Econometrics: A Modern Approach, 2nd edn. Thomson, Mason (2003)
- Zhang, P.: Multiple imputation: theory and method. Int. Stat. Rev. **71**(3), 581–592 (2003)
- Ziegelmeyer, M.: Documentation of the logical imputation using the panel structure of the 2003–2008 German SAVE Survey. MEA Discussion Paper 173–09, MEA Mannheim (2009a)
- Ziegelmeyer, M.: Analysis of the Precautionary Saving Motive Based on a Subjective Measure (SAVE 2005–2007). MEA Studies 07, MEA, University of Mannheim (2009b)