Annals of the Institute of Statistical Mathematics, Volume 71, Issue 5, pp 1201–1232

Semiparametric estimation in regression with missing covariates using single-index models

  • Zhuoer Sun
  • Suojin Wang


We investigate semiparametric estimation of regression coefficients through generalized estimating equations with single-index models when some covariates are missing at random. Existing popular semiparametric estimators may run into difficulties when some selection probabilities are small or the dimension of the covariates is not low. We propose a new simple parameter estimator using a kernel-assisted estimator for the augmentation by a single-index model without using the inverse of selection probabilities. We show that under certain conditions the proposed estimator is as efficient as the existing methods based on standard kernel smoothing, which are often practically infeasible in the case of multiple covariates. A simulation study and a real data example are presented to illustrate the proposed method. The numerical results show that the proposed estimator avoids some numerical issues caused by estimated small selection probabilities that are needed in other estimators.
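The dimension-reduction idea behind smoothing on a single index can be illustrated with a minimal sketch. It is not the paper's estimator: the simulated data, the index direction `gamma` (treated as known here, whereas in practice it would have to be estimated), the Gaussian kernel, and the bandwidth are all assumptions made for illustration. The sketch smooths a quantity observed only on complete cases against a scalar index with a Nadaraya-Watson estimator, so the kernel operates in one dimension regardless of the number of covariates and no inverse selection probability appears.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def nw_single_index(index_obs, values_obs, index_query, bandwidth):
    """Nadaraya-Watson estimate of E[values | index] at each query point.

    Only complete cases (index_obs, values_obs) enter the smoother, and no
    inverse selection probability is used anywhere.
    """
    u = (index_query[:, None] - index_obs[None, :]) / bandwidth
    w = gaussian_kernel(u)
    return (w @ values_obs) / w.sum(axis=1)


# Hypothetical simulated data (not from the paper).
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 3))                      # always-observed covariates
gamma = np.array([1.0, 0.5, -0.5])               # index direction, assumed known
index = z @ gamma                                # scalar single index
x = np.sin(index) + 0.1 * rng.normal(size=n)     # covariate subject to missingness
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + z[:, 0])))   # MAR: depends on observed z only
delta = rng.random(n) < p_obs                    # True where x is observed

# Smooth x against the scalar index using complete cases, then evaluate
# the fitted curve at the incomplete cases.
m_hat = nw_single_index(index[delta], x[delta], index[~delta], bandwidth=0.3)
mean_abs_err = np.mean(np.abs(m_hat - np.sin(index[~delta])))
```

Because the smoother depends on the covariates only through the scalar `index`, adding more columns to `z` does not change the dimension in which the kernel operates, which is the practical appeal over multivariate kernel smoothing.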


Keywords: Asymptotic efficiency; Generalized estimating equation; Kernel estimation; Missing at random; Regression; Single-index model



The authors thank the Associate Editor and two referees for their helpful comments and suggestions that have led to much improvement of this paper. This research was supported in part by the Simons Foundation Mathematics and Physical Sciences—Collaboration Grants for Mathematicians Program Award No. 499650.



Copyright information

© The Institute of Statistical Mathematics, Tokyo 2018

Authors and Affiliations

  1. Department of Statistics, Texas A&M University, College Station, USA
