Kernel regression estimation for incomplete data with applications

  • Regular Article
Statistical Papers

Abstract

Methods are proposed to construct kernel estimators of a regression function in the presence of incomplete data. Furthermore, exponential upper bounds are derived on the performance of the \(L_p\) norms of the proposed estimators, which can then be used to establish various strong convergence results. Incomplete data points are handled by a Horvitz–Thompson-type inverse weighting approach, where the unknown selection probabilities are estimated by both kernel regression and least-squares methods. As an immediate application of these results, the problem of nonparametric classification with partially observed data is studied.
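
The inverse weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the estimator's formula, so everything below is an assumption made for illustration: the response is taken to be missing at random with a selection probability that depends only on the covariate, the selection probabilities are estimated by a Nadaraya-Watson regression of the observation indicator on the covariate, and each observed response is divided by its estimated selection probability before a second kernel smoothing step. The function names, the Gaussian kernel, the bandwidths `h` and `h_pi`, and the `floor` on the estimated probabilities are all hypothetical choices, not the authors' construction.

```python
import math
import random


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel; the constant cancels in the ratios below."""
    return math.exp(-0.5 * u * u)


def nw_estimate(x, xs, values, h):
    """Nadaraya-Watson estimate of E[value | X = x] with bandwidth h."""
    weights = [gaussian_kernel((x - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:  # kernel underflow far from all data: return 0 by convention
        return 0.0
    return sum(w * v for w, v in zip(weights, values)) / total


def ht_kernel_regression(x, xs, ys, deltas, h, h_pi, floor=0.05):
    """Inverse-weighted kernel regression estimate at x (illustrative form).

    deltas[i] is 1 if ys[i] was observed and 0 otherwise; ys[i] is ignored
    (and may be None) when deltas[i] is 0.
    """
    # Step 1 (assumed): estimate the selection probability at each X_i by
    # kernel regression of the indicator on X, floored to avoid division
    # by values near zero.
    pi_hat = [max(nw_estimate(xi, xs, deltas, h_pi), floor) for xi in xs]
    # Step 2: Horvitz-Thompson-type pseudo-responses delta_i * Y_i / pi_hat_i.
    pseudo = [(y / p) if d == 1 else 0.0 for y, d, p in zip(ys, deltas, pi_hat)]
    # Step 3: smooth the pseudo-responses over all covariates, observed or not.
    return nw_estimate(x, xs, pseudo, h)


def simulate_mar_sample(n, seed=0):
    """Hypothetical toy data: m(x) = 2x, selection probability 0.5 + 0.4x."""
    rng = random.Random(seed)
    xs, ys, deltas = [], [], []
    for _ in range(n):
        x = rng.random()
        y = 2.0 * x + rng.gauss(0.0, 0.1)
        observed = 1 if rng.random() < 0.5 + 0.4 * x else 0
        xs.append(x)
        ys.append(y if observed else None)
        deltas.append(observed)
    return xs, ys, deltas


xs, ys, deltas = simulate_mar_sample(300)
print(ht_kernel_regression(0.5, xs, ys, deltas, h=0.1, h_pi=0.1))
```

When every response is observed, the estimated selection probabilities equal one and the sketch reduces to the ordinary Nadaraya-Watson estimator. Step 1 could be replaced by a least-squares fit of the indicator on the covariate, which is the other route the abstract mentions; that variant is not shown here.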

References

  • Bernstein S (1946) The theory of probabilities. Gastehizdat Publishing House, Moscow

  • Chen J, Fan J, Li K, Zhou H (2006) Local quasi-likelihood estimation with data missing at random. Statistica Sinica 16:1071–1100

  • Cheng PE, Chu CK (1996) Kernel estimation of distribution functions and quantiles with missing data. Statistica Sinica 6:63–78

  • Chu CK, Cheng PE (1995) Nonparametric regression estimation with missing data. J Stat Plan Inference 48:85–99

  • Devroye L (1981) On the almost everywhere convergence of nonparametric regression function estimates. Ann Stat 9:1310–1319

  • Devroye L, Krzyżak A (1989) An equivalence theorem for \(L_1\) convergence of kernel regression estimate. J Stat Plan Inference 23:71–82

  • Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer-Verlag, New York

  • Devroye L, Györfi L (1985) Nonparametric density estimation: the \(L_1\) view. Wiley, New York

  • Devroye L, Wagner T (1980) On the \(L_1\) convergence of kernel estimators of regression functions with applications in discrimination. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 51:15–25

  • Efromovich S (2011) Nonparametric regression with response missing at random. J Stat Plan Inference 141:3744–3752

  • González S, Rueda M, Arcos A (2008) An improved estimator to analyse missing data. Stat Pap 49:791–796

  • Györfi L, Kohler M, Krzyżak A, Walk H (2002) A distribution-free theory of nonparametric regression. Springer-Verlag, New York

  • Hirano K, Imbens G, Ridder G (2003) Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71:1161–1189

  • Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58:13–30

  • Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685

  • Hu XJ, Zhang B (2012) Pseudolikelihood ratio test with biased observations. Stat Pap 53:387–400

  • Karimi O, Mohammadzadeh M (2012) Bayesian spatial regression models with closed skew normal correlated errors and missing observations. Stat Pap 53:205–218

  • Kohler M, Krzyżak A, Walk H (2003) Strong consistency of automatic kernel regression estimates. Ann Inst Stat Math 55:287–308

  • Kolmogorov AN, Tikhomirov VM (1959) \(\epsilon\)-entropy and \(\epsilon\)-capacity of sets in function spaces. Uspekhi Mat Nauk 14:3–86

  • Krzyżak A, Pawlak M (1984) Distribution-free consistency of a nonparametric kernel regression estimate and classification. IEEE Trans Inform Theory 30:78–81

  • Liang H, Wang S, Carroll R (2007) Partially linear models with missing response variables and error-prone covariates. Biometrika 94:185–198

  • Little RJA, Rubin DB (2002) Statistical analysis with missing data. Wiley, New York

  • McCullagh P, Nelder J (1983) Generalized linear models. Chapman & Hall, London

  • Mojirsheibani M (2007) Nonparametric curve estimation with missing data: a general empirical process approach. J Stat Plan Inference 137:2733–2758

  • Müller U (2009) Estimating linear functionals in nonlinear regression with responses missing at random. Ann Stat 37:2245–2277

  • Nadaraya EA (1964) On estimating regression. Theory Probab Appl 9:141–142

  • Robins J, Rotnitzky A (1995) Semiparametric efficiency in multivariate regression models with missing data. J Am Stat Assoc 90:122–129

  • Robins J, Rotnitzky A, Zhao L (1995) Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J Am Stat Assoc 90:106–121

  • Schisterman E, Rotnitzky A (2001) Estimation of the mean of a K-sample U-statistic with missing outcomes and auxiliaries. Biometrika 88:713–725

  • Spiegelman C, Sacks J (1980) Consistent window estimation in nonparametric regression. Ann Stat 8:240–246

  • Takai K, Kano Y (2013) Asymptotic inference with incomplete data. Commun Stat Theor Meth 42:3174–3190

  • Tsiatis A (2006) Semiparametric theory and missing data. Springer, New York

  • Toutenburg H, Shalabh (2003) Estimation of regression models with equi-correlated responses when some observations on the response variable are missing. Stat Pap 44:217–232

  • van der Vaart A, Wellner J (1996) Weak convergence and empirical processes. Springer-Verlag, New York

  • Walk H (2002) On cross-validation in kernel and partitioning regression estimation. Stat Probab Lett 59:113–123

  • Walk H (2002b) Almost sure convergence properties of Nadaraya–Watson regression estimates. In: Modeling uncertainty. Inter Ser Oper Res Manag Sci, vol 46. Kluwer Academic Publishers, Boston, pp 201–223

  • Wang D, Chen S (2009) Empirical likelihood for estimating equations with missing values. Ann Stat 37:490–517

  • Wang Q, Linton O, Härdle W (2004) Semiparametric regression analysis with missing response at random. J Am Stat Assoc 99:334–345

  • Wang Q, Qin Y (2010) Empirical likelihood confidence bands for distribution functions with missing responses. J Stat Plan Inference 140:2778–2789

  • Wang L, Rotnitzky A, Lin X (2010) Nonparametric regression with missing outcomes using weighted kernel estimating equations. J Am Stat Assoc 105:1135–1146

  • Watson GS (1964) Smooth regression analysis. Sankhya: Indian J Stat, Ser A 26:359–372

Acknowledgments

This work is supported in part by the National Science Foundation Grant DMS-1407400 of Majid Mojirsheibani.

Author information

Corresponding author

Correspondence to Majid Mojirsheibani.

About this article

Cite this article

Mojirsheibani, M., Reese, T. Kernel regression estimation for incomplete data with applications. Stat Papers 58, 185–209 (2017). https://doi.org/10.1007/s00362-015-0693-z

  • DOI: https://doi.org/10.1007/s00362-015-0693-z
