Annals of the Institute of Statistical Mathematics, Volume 58, Issue 4, pp 687–706

Semiparametric Maximum Likelihood for Missing Covariates in Parametric Regression



We consider parameter estimation in parametric regression models with covariates missing at random. This problem admits a semiparametric maximum likelihood approach which requires no parametric specification of the selection mechanism or the covariate distribution. The semiparametric maximum likelihood estimator (MLE) has been found to be consistent. We show here, for some specific models, that the semiparametric MLE converges weakly to a zero-mean Gaussian process in a suitable space. The regression parameter estimate, in particular, achieves the semiparametric information bound, which can be consistently estimated by perturbing the profile log-likelihood. Furthermore, the profile likelihood ratio statistic is asymptotically chi-squared. The techniques used here extend to other models.
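The idea of estimating the information bound by perturbing the profile log-likelihood can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy model (a normal mean with the variance profiled out, rather than a missing-covariate regression), the function names `profile_loglik`, `curvature_information` and `profile_lr_statistic`, and the step size h = n^(-1/2). The sketch uses a second-order difference of the profile log-likelihood at the maximizer as a curvature estimate, and also forms the profile likelihood ratio statistic that would be referred to a chi-squared distribution.

```python
import math
import random


def profile_loglik(mu, x):
    """Toy profile log-likelihood for a normal mean (illustrative assumption).

    The variance is profiled out, so up to an additive constant
    pl(mu) = -(n / 2) * log(mean((x - mu)^2)).
    """
    n = len(x)
    return -0.5 * n * math.log(sum((xi - mu) ** 2 for xi in x) / n)


def curvature_information(pl, theta_hat, n, h):
    """Estimate the information from a perturbation of the profile log-likelihood.

    Relies on the quadratic expansion
    pl(theta_hat + h) - pl(theta_hat) ~ -(1/2) * n * h^2 * I
    for a scalar parameter, solved for I.
    """
    return -2.0 * (pl(theta_hat + h) - pl(theta_hat)) / (n * h ** 2)


def profile_lr_statistic(pl, theta_hat, theta_0):
    """Profile likelihood ratio statistic for H0: theta = theta_0."""
    return 2.0 * (pl(theta_hat) - pl(theta_0))


# Simulated data for the toy model (hypothetical values).
random.seed(0)
x = [random.gauss(1.0, 2.0) for _ in range(2000)]
n = len(x)
mu_hat = sum(x) / n  # maximizer of the toy profile log-likelihood


def pl(mu):
    return profile_loglik(mu, x)


info_hat = curvature_information(pl, mu_hat, n, h=n ** -0.5)
# In this toy model the information for the mean is 1 / sigma^2.
info_ref = 1.0 / (sum((xi - mu_hat) ** 2 for xi in x) / n)
lr_stat = profile_lr_statistic(pl, mu_hat, theta_0=1.0)

print(info_hat, info_ref, lr_stat)
```

In this toy case the curvature estimate can be compared with the closed-form information 1/σ², which gives a quick sanity check on the step size. For a vector parameter the same difference would be taken along several directions to recover the full information matrix.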


Asymptotic normality; Efficiency; Infinite-dimensional M-estimation; Missing at random; Missing covariates; Parametric regression; Profile likelihood; Semiparametric likelihood





Copyright information

© The Institute of Statistical Mathematics, Tokyo 2006

Authors and Affiliations

  1. Division of Biostatistics, U.S. Food and Drug Administration, Rockville, USA
  2. Department of Biostatistics, University of Pittsburgh, Pittsburgh, USA
