Plug-in marginal estimation under a general regression model with missing responses and covariates

  • Original Paper
  • Journal: TEST

Abstract

In this paper, we consider a general regression model where missing data occur both in the response and in the covariates. Our aim is to estimate the marginal distribution function of the response variable and a marginal functional of it, such as the mean, the median or any \(\alpha \)-quantile. A missing at random condition is assumed in order to avoid the bias that a non-ignorable missing mechanism would induce in the estimation of the marginal measures. We give two different approaches for estimating the distribution function of the response and a given marginal functional: one based on inverse probability weighting, the other on the convolution of the distribution function of the observed residuals with that of the observed estimated regression function. We illustrate the behaviour of our proposals through a Monte Carlo study and two real data sets.
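
As an illustration of the inverse probability weighting idea mentioned in the abstract, the following is a minimal sketch and not the authors' estimator. It assumes that `delta[i]` equals 1 when the pair for case `i` is fully observed and 0 otherwise, and that `p[i]` is a known or previously estimated probability of full observation; the function names and the way `p` is obtained are hypothetical. Each complete case is weighted by `1 / p[i]`, and the weights are normalised so that the estimated distribution function reaches 1.

```python
import numpy as np


def _observed_weights(y, delta, p):
    """Return the observed responses and their normalised 1/p weights."""
    obs = np.asarray(delta) == 1
    y_obs = np.asarray(y, dtype=float)[obs]
    w = 1.0 / np.asarray(p, dtype=float)[obs]
    return y_obs, w / w.sum()


def ipw_cdf(y, delta, p, t):
    """IPW estimate of P(Y <= t) built from complete cases only."""
    y_obs, w = _observed_weights(y, delta, p)
    return float(np.sum(w * (y_obs <= t)))


def ipw_mean(y, delta, p):
    """IPW estimate of E(Y): the mean of the weighted distribution."""
    y_obs, w = _observed_weights(y, delta, p)
    return float(np.sum(w * y_obs))


def ipw_quantile(y, delta, p, alpha):
    """Plug-in alpha-quantile: smallest observed y whose weighted CDF >= alpha."""
    y_obs, w = _observed_weights(y, delta, p)
    order = np.argsort(y_obs)
    cum = np.cumsum(w[order])
    # First index whose cumulative weight reaches alpha, clamped for rounding.
    idx = min(int(np.searchsorted(cum, alpha, side="left")), len(cum) - 1)
    return float(y_obs[order][idx])
```

The mean and the quantile are obtained by plugging the weighted distribution function into the corresponding functional, which is the sense of "plug-in" used here; the convolution-based approach of the paper is not covered by this sketch.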

Acknowledgements

The authors wish to thank two anonymous referees for valuable comments which led to an improved version of the original paper. This work was partially developed while Ana M. Bianco and Graciela Boente were visiting the Departamento de Estatística, Análise Matemática e Optimización of the Universidad de Santiago de Compostela, Spain, under the bilateral agreement between the Universidad de Buenos Aires and the Universidad de Santiago de Compostela. This research was partially supported by Grants PICT 2014-0351 from ANPCYT and 20020130100279BA from the Universidad de Buenos Aires, Argentina, and also by the Spanish Projects MTM2013-41383P and MTM2016-76969P from the Ministry of Science and Innovation, Spain. A. Bianco and G. Boente also wish to thank the Minerva Foundation for its support in presenting some of this paper's results at the International Conference on Robust Statistics 2017.

Author information

Corresponding author

Correspondence to Graciela Boente.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 399 KB)

About this article

Cite this article

Bianco, A.M., Boente, G., González-Manteiga, W. et al. Plug-in marginal estimation under a general regression model with missing responses and covariates. TEST 28, 106–146 (2019). https://doi.org/10.1007/s11749-018-0591-5

  • DOI: https://doi.org/10.1007/s11749-018-0591-5
