# Nonparametric imputation method for nonresponse in surveys

## Abstract

Many imputation methods are based on a statistical model that assumes the variable of interest is a noisy observation of a function of the auxiliary variables or covariates. Misspecification of this function may lead to severe errors in estimation and to misleading conclusions. Imputation techniques can therefore benefit from flexible formulations that can capture a wide range of patterns. We consider the use of smoothing splines within an additive model framework to estimate the functional dependence between the variable of interest and the auxiliary variables. The estimator obtained allows us to build an imputation model in the case of multiple auxiliary variables. The performance of our method is assessed via numerical experiments involving simulated and real data.
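The idea in the abstract can be illustrated with a minimal sketch: fit an additive model on the respondents, then use its predictions as imputed values for the nonrespondents. This is not the authors' implementation. It swaps in a penalized regression spline (truncated-line basis with a ridge penalty) as a low-rank stand-in for smoothing splines, ignores survey weights, and fixes the smoothing parameter by hand. The function names (`spline_basis`, `fit_additive`, `predict_additive`, `impute`), the parameters `n_knots` and `lam`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def spline_basis(x, knots):
    """Linear term plus truncated-line terms (x - k)_+, one per knot."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x] + [np.clip(x - k, 0.0, None) for k in knots])


def fit_additive(X, y, n_knots=8, lam=0.01):
    """Penalized least squares fit of y ~ alpha + sum_j f_j(x_j)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    # Interior knots at equally spaced quantiles of each covariate.
    probs = np.linspace(0, 1, n_knots + 2)[1:-1]
    knots = [np.quantile(X[:, j], probs) for j in range(p)]
    blocks = [spline_basis(X[:, j], knots[j]) for j in range(p)]
    design = np.column_stack([np.ones(len(y))] + blocks)
    # Penalize only the truncated-line coefficients (the roughness),
    # leaving the intercept and the linear terms unpenalized.
    penalty = np.concatenate(
        [[0.0]] + [np.r_[0.0, np.ones(n_knots)] for _ in range(p)]
    )
    gram = design.T @ design + lam * np.diag(penalty)
    beta = np.linalg.solve(gram, design.T @ y)
    return knots, beta


def predict_additive(model, X):
    """Evaluate the fitted additive function at the rows of X."""
    knots, beta = model
    X = np.asarray(X, dtype=float)
    blocks = [spline_basis(X[:, j], knots[j]) for j in range(X.shape[1])]
    design = np.column_stack([np.ones(X.shape[0])] + blocks)
    return design @ beta


def impute(X, y, responded, **kwargs):
    """Fit on respondents only; fill missing y with model predictions."""
    model = fit_additive(X[responded], y[responded], **kwargs)
    return np.where(responded, y, predict_additive(model, X))


# Simulated example (hypothetical data, uniform response probability).
rng = np.random.default_rng(0)
n = 400
X = rng.uniform(0, 1, size=(n, 2))
y = (np.sin(2 * np.pi * X[:, 0]) + 4 * (X[:, 1] - 0.5) ** 2
     + rng.normal(0, 0.2, n))
responded = rng.uniform(size=n) < 0.7
y_obs = np.where(responded, y, np.nan)  # nonrespondents are missing

y_imp = impute(X, y_obs, responded)
print("Estimated mean after imputation:", y_imp.mean())
```

In practice the smoothing parameter would be chosen from the data, for example by generalized cross-validation or REML, and design weights would enter both the fit and the final estimator. The sketch leaves both out to keep the structure visible.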

## Keywords

Additive models, Data imputation, Sample survey, Smoothing spline

## Notes

### Acknowledgements

The authors thank Yves Tillé for his constructive suggestions. This research was supported by the Swiss National Science Foundation and the Natural Sciences and Engineering Research Council of Canada.
