On the \(L_p\) norms of kernel regression estimators for incomplete data with applications to classification

  • Original Paper
  • Statistical Methods & Applications

Abstract

We consider kernel methods to construct nonparametric estimators of a regression function based on incomplete data. To tackle the presence of incomplete covariates, we employ Horvitz–Thompson-type inverse weighting techniques, where the weights are the selection probabilities. The unknown selection probabilities are themselves estimated using (1) kernel regression, when the functional form of these probabilities is completely unknown, and (2) the least-squares method, when the selection probabilities belong to a known class of candidate functions. To assess the overall performance of the proposed estimators, we establish exponential upper bounds on the \(L_p\) norms, \(1\le p<\infty \), of our estimators; these bounds immediately yield various strong convergence results. We also apply our results to deal with the important problem of statistical classification with partially observed covariates.
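The inverse-weighting idea in the abstract can be illustrated with a small sketch. It is not the paper's exact estimator, only a generic Horvitz–Thompson-type weighted Nadaraya–Watson estimate under assumptions that the abstract does not state: the covariate vector splits into an always-observed part X and a possibly missing part V, the selection probability P(δ = 1 | X, Y) depends only on the always-observed pair (X, Y), and a Gaussian kernel is used. The selection probability is estimated by kernel regression of the missingness indicator δ on (X, Y), which corresponds to case (1) of the abstract. All function names, bandwidths, and the toy data-generating model are hypothetical.

```python
import math
import random


def gaussian_kernel(u):
    """Unnormalized Gaussian product kernel evaluated at the vector u."""
    return math.exp(-0.5 * sum(t * t for t in u))


def nw_estimate(query, points, responses, h, weights=None):
    """Weighted Nadaraya-Watson estimate at `query`, with the convention 0/0 = 0."""
    if weights is None:
        weights = [1.0] * len(points)
    num = den = 0.0
    for z, y, w in zip(points, responses, weights):
        k = w * gaussian_kernel([(q - c) / h for q, c in zip(query, z)])
        num += k * y
        den += k
    return num / den if den > 0.0 else 0.0


def ht_kernel_regression(query, x, v, y, delta, h, h_sel):
    """Inverse-weighted kernel estimate of E[Y | X = x0, V = v0] at query = (x0, v0).

    x, y  : always observed (lists of floats)
    v     : covariate that may be missing (None where delta[i] == 0)
    delta : 1 if v[i] is observed, else 0
    h     : bandwidth for the regression step (assumed, not tuned)
    h_sel : bandwidth for the selection-probability step (assumed, not tuned)
    """
    xy = list(zip(x, y))
    points, responses, weights = [], [], []
    for i, d in enumerate(delta):
        if d == 1:
            # Kernel estimate of P(delta = 1 | X_i, Y_i) from all n cases.
            # It is positive here because case i itself has kernel weight 1.
            p_hat = nw_estimate(xy[i], xy, delta, h_sel)
            points.append((x[i], v[i]))
            responses.append(y[i])
            weights.append(1.0 / p_hat)
    # Only complete cases enter, each weighted by 1 / (estimated selection prob.).
    return nw_estimate(query, points, responses, h, weights)


def simulate(n, seed):
    """Hypothetical toy data: Y = 2X + V + noise; V is missing more often when Y is small."""
    rng = random.Random(seed)
    x, v, y, delta = [], [], [], []
    for _ in range(n):
        xi, vi = rng.random(), rng.random()
        yi = 2.0 * xi + vi + rng.gauss(0.0, 0.1)
        p = 1.0 / (1.0 + math.exp(-(0.3 + 0.8 * yi)))  # true selection probability
        di = 1 if rng.random() < p else 0
        x.append(xi)
        y.append(yi)
        delta.append(di)
        v.append(vi if di == 1 else None)
    return x, v, y, delta


if __name__ == "__main__":
    x, v, y, delta = simulate(n=500, seed=0)
    est = ht_kernel_regression((0.5, 0.5), x, v, y, delta, h=0.15, h_sel=0.3)
    print("estimate at (0.5, 0.5):", est, "| true regression value: 1.5")
```

Replacing the kernel step for `p_hat` with a least-squares fit over a parametric family would give a sketch of case (2). For classification with two classes coded 0 and 1, one natural use of such an estimate is a plug-in rule that predicts class 1 when the estimated regression function exceeds 1/2.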

Acknowledgments

This work is supported by the National Science Foundation Grant DMS-1407400 of Majid Mojirsheibani.

Author information

Corresponding author

Correspondence to Majid Mojirsheibani.

About this article

Cite this article

Reese, T., Mojirsheibani, M. On the \(L_p\) norms of kernel regression estimators for incomplete data with applications to classification. Stat Methods Appl 26, 81–112 (2017). https://doi.org/10.1007/s10260-016-0359-6

  • DOI: https://doi.org/10.1007/s10260-016-0359-6
