On histogram-based regression and classification with incomplete data

Han, Eric; Mojirsheibani, Majid

doi:10.1007/s00184-020-00794-y

Published: 19 August 2020

Metrika, volume 84, pages 635–662 (2021)

Abstract

We consider the problem of nonparametric regression with possibly incomplete covariate vectors. The proposed estimators, which are based on histogram methods, are fully nonparametric and straightforward to implement. The presence of incomplete covariates is handled by an inverse weighting method, where the weights are estimates of the conditional probabilities of having incomplete covariate vectors. We also derive various exponential bounds on the $L_1$ norms of our estimators, which can be used to establish strong consistency results for the corresponding, closely related, problem of nonparametric classification with missing covariates. As the main focus and application of our results, we consider the problem of pattern recognition and statistical classification in the presence of incomplete covariates and propose histogram classifiers that are asymptotically optimal.
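
The inverse weighting idea described in the abstract can be illustrated with a small sketch. This is not the authors' estimator: it assumes a single covariate `x` that may be missing, a response `y` that is always observed, and a missingness mechanism that depends on `y` only. The function name `ipw_histogram_regression` and the arguments `x_edges` and `y_edges` are hypothetical, introduced here for illustration. The observation probability is estimated by a histogram over `y`, and each complete case is then weighted by the inverse of that estimate inside its `x` bin.

```python
import numpy as np

def ipw_histogram_regression(x, y, observed, x_edges, y_edges):
    """Per-bin estimates of E[Y | X in bin] from complete cases only,
    each reweighted by 1 / pi_hat, where pi_hat estimates P(observed | Y).

    Illustrative assumption: missingness of x depends on y alone.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    # Step 1: histogram estimate of pi(y) = P(x is observed | Y in y-bin).
    n_y_bins = len(y_edges) - 1
    y_bin = np.clip(np.digitize(y, y_edges) - 1, 0, n_y_bins - 1)
    total = np.bincount(y_bin, minlength=n_y_bins)
    seen = np.bincount(y_bin, weights=observed.astype(float), minlength=n_y_bins)
    pi_hat = np.divide(seen, total, out=np.ones(n_y_bins), where=total > 0)

    # Step 2: inverse-probability weights; incomplete cases get weight 0.
    # Any bin holding a complete case has pi_hat > 0, so no division by zero.
    w = np.zeros_like(y)
    w[observed] = 1.0 / pi_hat[y_bin[observed]]

    # Step 3: weighted average of y within each x-bin (complete cases only).
    n_x_bins = len(x_edges) - 1
    x_bin = np.clip(np.digitize(x[observed], x_edges) - 1, 0, n_x_bins - 1)
    num = np.bincount(x_bin, weights=w[observed] * y[observed], minlength=n_x_bins)
    den = np.bincount(x_bin, weights=w[observed], minlength=n_x_bins)

    # Empty bins default to 0.
    return np.divide(num, den, out=np.zeros(n_x_bins), where=den > 0)
```

With a binary response coded 0/1, a plug-in classifier under the same assumptions would predict class 1 in the bins where the returned estimate exceeds 1/2.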

Author information

Authors and Affiliations

California State University Northridge, Northridge, CA, 91330, USA
Eric Han & Majid Mojirsheibani

Corresponding author

Correspondence to Majid Mojirsheibani.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Availability of data and material

The Pima Indian Diabetes data set is available from https://www.kaggle.com/.

Code availability

Not applicable.

Funding

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is supported by the National Science Foundation (NSF) Grant DMS-1916161 of M. Mojirsheibani.

About this article

Cite this article

Han, E., Mojirsheibani, M. On histogram-based regression and classification with incomplete data. Metrika 84, 635–662 (2021). https://doi.org/10.1007/s00184-020-00794-y

Received: 27 December 2019
Published: 19 August 2020
Issue Date: July 2021
DOI: https://doi.org/10.1007/s00184-020-00794-y
