Nonparametric imputation method for nonresponse in surveys

  • Caren Hasler
  • Radu V. Craiu
Original Paper


Many imputation methods are based on a statistical model that assumes the variable of interest is a noisy observation of a function of the auxiliary variables or covariates. Misspecification of this function may lead to severe errors in estimation and to misleading conclusions. Imputation techniques can therefore benefit from flexible formulations that can capture a wide range of patterns. We consider the use of smoothing splines within an additive model framework to estimate the functional dependence between the variable of interest and the auxiliary variables. The resulting estimator allows us to build an imputation model when there are multiple auxiliary variables. The performance of our method is assessed via numerical experiments involving simulated and real data.
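To make the idea concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. Each additive component is a cubic smoothing spline fitted with the Reinsch algorithm (Green and Silverman 1994), the components are combined by backfitting, and nonrespondents receive deterministic predicted values fitted on respondents only (assuming missingness depends only on the auxiliary variables). The smoothing parameter `lam` is fixed by hand here rather than selected by GCV or REML, survey weights are ignored, and the function names `fit_smoothing_spline`, `fit_additive` and `impute` as well as the simulated data are hypothetical.

```python
import numpy as np


def fit_smoothing_spline(x, y, lam):
    """Cubic smoothing spline via the Reinsch algorithm.

    Minimises sum_i (y_i - g(x_i))^2 + lam * integral g''(t)^2 dt over natural
    cubic splines g. Assumes at least 3 distinct x values and no ties.
    Returns a function that evaluates the fitted spline at new points.
    """
    order = np.argsort(x)
    t = np.asarray(x, dtype=float)[order]
    yy = np.asarray(y, dtype=float)[order]
    n, h = t.size, np.diff(t)
    if n < 3 or np.any(h <= 0):
        raise ValueError("need at least 3 distinct x values")
    # Band matrices Q (n x n-2) and R (n-2 x n-2); the penalty is gamma' R gamma.
    Q, R = np.zeros((n, n - 2)), np.zeros((n - 2, n - 2))
    for j in range(n - 2):
        Q[j, j], Q[j + 2, j] = 1 / h[j], 1 / h[j + 1]
        Q[j + 1, j] = -1 / h[j] - 1 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6
    # Reinsch: solve (R + lam Q'Q) gamma = Q'y, then g = y - lam Q gamma.
    gamma = np.linalg.solve(R + lam * Q.T @ Q, Q.T @ yy)
    g = yy - lam * Q @ gamma                      # fitted values at the knots
    gam = np.concatenate(([0.0], gamma, [0.0]))   # g'' = 0 at the end knots
    slope_lo = (g[1] - g[0]) / h[0] - h[0] * gam[1] / 6
    slope_hi = (g[-1] - g[-2]) / h[-1] + h[-1] * gam[-2] / 6

    def predict(x_new):
        x_new = np.asarray(x_new, dtype=float)
        out = np.empty_like(x_new)
        below, above = x_new < t[0], x_new > t[-1]
        out[below] = g[0] + (x_new[below] - t[0]) * slope_lo  # linear beyond range
        out[above] = g[-1] + (x_new[above] - t[-1]) * slope_hi
        inside = ~(below | above)
        xi = x_new[inside]
        i = np.clip(np.searchsorted(t, xi, side="right") - 1, 0, n - 2)
        a, b, hk = xi - t[i], t[i + 1] - xi, h[i]
        # Cubic piece on [t_i, t_{i+1}] from knot values g and second derivatives gam.
        out[inside] = (a * g[i + 1] + b * g[i]) / hk - a * b / 6 * (
            (1 + a / hk) * gam[i + 1] + (1 + b / hk) * gam[i])
        return out

    return predict


def fit_additive(X, y, lam, n_iter=30):
    """Backfitting for y = mu + sum_k f_k(x_k) + error, each f_k a smoothing spline."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n, p = X.shape
    mu = y.mean()
    f = np.zeros((n, p))                    # current f_k evaluated at the data
    smooths, shifts = [None] * p, np.zeros(p)
    for _ in range(n_iter):
        for k in range(p):
            partial = y - mu - f.sum(axis=1) + f[:, k]   # partial residuals
            smooths[k] = fit_smoothing_spline(X[:, k], partial, lam)
            fk = smooths[k](X[:, k])
            shifts[k] = fk.mean()                        # centre for identifiability
            f[:, k] = fk - shifts[k]

    def predict(X_new):
        X_new = np.asarray(X_new, dtype=float)
        return mu + sum(smooths[k](X_new[:, k]) - shifts[k] for k in range(p))

    return predict


def impute(X, y, respond, lam):
    """Deterministic imputation: nonrespondents get additive-model predictions."""
    model = fit_additive(X[respond], y[respond], lam)
    y_imp = np.array(y, dtype=float)
    y_imp[~respond] = model(X[~respond])
    return y_imp


# Hypothetical example: two auxiliary variables, response probability depends on x1.
rng = np.random.default_rng(0)
n = 300
X = rng.uniform(size=(n, 2))
truth = np.sin(2 * np.pi * X[:, 0]) + 4 * (X[:, 1] - 0.5) ** 2
y = truth + rng.normal(scale=0.1, size=n)
respond = rng.uniform(size=n) < 0.4 + 0.4 * X[:, 0]
y_obs = np.where(respond, y, np.nan)
y_imp = impute(X, y_obs, respond, lam=1e-3)
rmse = np.sqrt(np.mean((y_imp[~respond] - truth[~respond]) ** 2))
```

Because the sin term is far from linear, a linear regression imputation model would be misspecified here, which is the situation the abstract describes; the spline components adapt to it without specifying the functional form in advance.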


Keywords: Additive models; Data imputation; Sample survey; Smoothing spline



The authors thank Yves Tillé for his constructive suggestions. This research was supported by the Swiss National Science Foundation and the Natural Sciences and Engineering Research Council of Canada.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Institute of Statistics, University of Neuchâtel, Neuchâtel, Switzerland
  2. Department of Computer and Mathematical Sciences, University of Toronto Scarborough, Toronto, Canada
  3. Department of Statistical Sciences, University of Toronto, Toronto, Canada
