Abstract
Methods are proposed for constructing kernel estimators of a regression function in the presence of incomplete data. Furthermore, exponential upper bounds are derived on the performance of the \(L_p\) norms of the proposed estimators, which can then be used to establish various strong convergence results. Incomplete data points are handled by a Horvitz–Thompson-type inverse weighting approach, where the unknown selection probabilities are estimated by both kernel regression and least-squares methods. As an immediate application of these results, the problem of nonparametric classification with partially observed data is studied.
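The abstract does not state the exact form of the estimators, so the following is only a minimal sketch of the general idea of Horvitz–Thompson-type inverse weighting combined with kernel regression. It assumes a one-dimensional covariate, a response that is missing at random (the source does not say which component is incomplete), and a Gaussian kernel. All names (`ipw_kernel_regression`, `h`, `h_p`, `p_floor`) are hypothetical and do not come from the paper. The selection probability is estimated by kernel regression of the observation indicator on the covariate; the least-squares alternative mentioned in the abstract is not shown.

```python
import numpy as np


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel, evaluated elementwise."""
    return np.exp(-0.5 * u ** 2)


def kernel_weights(x_eval, x_obs, h):
    """Matrix of K((x - X_i) / h): rows are evaluation points, columns are sample points."""
    return gaussian_kernel((x_eval[:, None] - x_obs[None, :]) / h)


def ipw_kernel_regression(x_eval, x, y, delta, h, h_p, p_floor=1e-3):
    """Inverse-weighted kernel regression sketch (assumed setup, not the paper's estimator).

    x      : covariates, always observed
    y      : responses; entries with delta == 0 may be NaN
    delta  : 1 if the response is observed, 0 otherwise
    h, h_p : bandwidths for the regression and for the selection probability
    """
    # Step 1: estimate p(X_i) = P(delta = 1 | X = X_i) by kernel regression of delta on x.
    w_p = kernel_weights(x, x, h_p)
    p_hat = (w_p @ delta) / w_p.sum(axis=1)
    # Assumption: bound the estimate away from zero to avoid exploding weights.
    p_hat = np.clip(p_hat, p_floor, 1.0)

    # Step 2: inverse-weighted pseudo-responses; missing responses contribute zero.
    y_filled = np.where(delta == 1, y, 0.0)
    pseudo = delta * y_filled / p_hat

    # Step 3: ordinary kernel regression of the pseudo-responses on x.
    w = kernel_weights(x_eval, x, h)
    return (w @ pseudo) / w.sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)
    # Responses are more likely to be missing for small x (missing at random given x).
    delta = (rng.uniform(size=n) < 0.4 + 0.5 * x).astype(float)
    y_incomplete = np.where(delta == 1, y, np.nan)
    grid = np.linspace(0.1, 0.9, 5)
    print(ipw_kernel_regression(grid, x, y_incomplete, delta, h=0.05, h_p=0.1))
```

When every response is observed, the estimated selection probabilities equal one up to rounding and the sketch reduces to the ordinary Nadaraya–Watson estimator. The floor `p_floor` is a practical safeguard added here, not a feature taken from the paper.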
Acknowledgments
This work is supported in part by the National Science Foundation Grant DMS-1407400 of Majid Mojirsheibani.
Cite this article
Mojirsheibani, M., Reese, T. Kernel regression estimation for incomplete data with applications. Stat Papers 58, 185–209 (2017). https://doi.org/10.1007/s00362-015-0693-z
DOI: https://doi.org/10.1007/s00362-015-0693-z