Abstract
We consider semiparametric estimation of logistic regression models in which both the covariates and the outcome variable may be missing, and we propose two new estimators. The first, based solely on the validation set, extends the validation likelihood estimator of Breslow and Cain (Biometrika 75:11–20, 1988). The second is a joint conditional likelihood estimator based on both the validation and non-validation data sets. Both estimators are semiparametric in that they require neither a model for the missing-data mechanism nor a specification of the conditional distribution of the missing covariates given the observed covariates. The asymptotic distribution theory is developed under the assumption that all covariates are categorical. Simulation studies of the finite-sample properties of the proposed estimators show that the joint conditional likelihood estimator is the most efficient. A cable TV survey data set from Taiwan illustrates the practical use of the proposed methodology.
References
Albert PS, Hunsberger SA, Bird FM (1997) Modeling repeated measures with monotonic ordinal responses and misclassification, with applications to studying maturation. J Am Stat Assoc 92: 1304–1311
Bollinger CR, David MH (1997) Modeling discrete choice with response error: food stamp participation. J Am Stat Assoc 92: 827–835
Breslow NE, Cain KC (1988) Logistic regression for two-stage case-control data. Biometrika 75: 11–20
Carroll RJ, Stefanski LA (1990) Approximate quasi-likelihood estimation in models with surrogate predictors. J Am Stat Assoc 85: 652–663
Chen J, Breslow NE (2004) Semiparametric efficient estimation for the auxiliary outcome problem with the conditional mean model. Can J Stat 32: 359–372
Cheng KF, Hsueh HM (1999) Correcting bias due to misclassification in the estimation of logistic regression models. Stat Probab Lett 44: 229–240
Cheng KF, Hsueh HM (2003) Estimation of a logistic regression model with mismeasured observations. Statistica Sinica 13: 111–127
Chu H, Halloran ME (2004) Estimating vaccine efficacy using auxiliary outcome data and a small validation sample. Stat Med 23: 2697–2711
Cox DR (1970) The analysis of binary data. Chapman and Hall, London
Foutz RV (1977) On the unique consistent solution to the likelihood equations. J Am Stat Assoc 72: 147–148
Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47: 663–685
Pepe MS (1992) Inference using surrogate outcome data and a validation sample. Biometrika 79: 355–365
Pepe MS, Reilly M, Fleming TR (1994) Auxiliary outcome data and the mean-score method. J Stat Plan Inference 42: 137–160
Pregibon D (1981) Logistic regression diagnostics. Ann Stat 9: 705–724
Reilly M, Pepe MS (1995) A mean-score method for missing and auxiliary covariate data in regression models. Biometrika 82: 299–314
Rosner B, Willett WC, Spiegelman DP (1989) Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Stat Med 8: 1051–1069
Rosner B, Spiegelman D, Willett WC (1990) Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. Am J Epidemiol 132: 734–745
Rubin DB (1976) Inference and missing data. Biometrika 63: 581–592
Wang CY, Wang S (1997) Semiparametric methods in logistic regression with measurement error. Statistica Sinica 7: 1103–1120
Wang CY, Chen JC, Lee SM, Ou ST (2002) Joint conditional likelihood estimator in logistic regression with missing covariate data. Statistica Sinica 12: 555–574
Acknowledgments
The authors thank two referees whose helpful comments improved the presentation. The research of S.M. Lee and S.H. Hsieh was supported by National Science Council of Taiwan, ROC, grants 97-2118-M-035-002-MY2 and 99-2811-M035-001, respectively. This publication was made possible by Grant Number UL1 RR024146 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH), and the NIH Roadmap for Medical Research (C.S. Li).
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Lee, SM., Li, CS., Hsieh, SH. et al. Semiparametric estimation of logistic regression model with missing covariates and outcome. Metrika 75, 621–653 (2012). https://doi.org/10.1007/s00184-011-0345-9
DOI: https://doi.org/10.1007/s00184-011-0345-9