Volume 79, Issue 4, pp 457–483

Semiparametric estimation of a zero-inflated Poisson regression model with missing covariates



Zero-inflated Poisson (ZIP) regression models have been widely used to study the effects of covariates in count data sets that have many zeros. However, some covariates involved in ZIP regression modeling often have missing values. We propose inverse probability weighting (IPW) estimators for the parameters of a ZIP regression model with covariates missing at random, treating the selection probability either as known or as unknown and estimated by a nonparametric method. The asymptotic properties of the proposed estimators are studied in detail under certain regularity conditions. Both theoretical analysis and simulation results show that the semiparametric IPW estimator, which uses nonparametrically estimated weights, is more efficient than the IPW estimator that uses the true weights. The practical use of the proposed methodology is illustrated with data from a motorcycle survey of traffic regulations conducted in 2007 in Taiwan by the Ministry of Transportation and Communication.
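To make the weighting idea concrete, the following is a minimal sketch of an IPW-weighted ZIP log-likelihood, not the authors' implementation. It assumes the standard ZIP form of Lambert (1992), a single covariate x with a log link for the Poisson mean and a logit link for the zero-inflation probability, and a selection probability pi supplied for each subject. The function names, the tuple layout of `rows`, and the choice of links are illustrative assumptions. Maximizing this function over (beta, gamma) corresponds to solving the weighted score equations; the optimizer is left out.

```python
import math


def zip_logpmf(y, lam, p):
    """Log-probability of count y under a ZIP model.

    Assumed form: P(Y=0) = p + (1-p)exp(-lam), and for y > 0,
    P(Y=y) = (1-p) exp(-lam) lam^y / y!.
    """
    if y == 0:
        return math.log(p + (1.0 - p) * math.exp(-lam))
    return math.log1p(-p) - lam + y * math.log(lam) - math.lgamma(y + 1)


def ipw_loglik(beta, gamma, rows):
    """IPW-weighted ZIP log-likelihood (illustrative).

    rows: iterable of (y, x, delta, pi), where delta = 1 if the covariate x
    is observed and pi is the (known or estimated) probability that it is
    observed. Incomplete cases (delta = 0) contribute nothing; complete
    cases are weighted by 1/pi.
    """
    total = 0.0
    for y, x, delta, pi in rows:
        if delta == 0:
            continue  # x is missing, so this subject is skipped
        lam = math.exp(beta[0] + beta[1] * x)  # Poisson mean, log link
        p = 1.0 / (1.0 + math.exp(-(gamma[0] + gamma[1] * x)))  # logit link
        total += zip_logpmf(y, lam, p) / pi
    return total
```

In the "true weight" setting, pi would be the known selection probability; in the semiparametric setting it would be replaced by a nonparametric estimate computed from the always-observed variables.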


Score function; Inverse probability weighting (IPW); Missing at random (MAR)



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • T. Martin Lukusa (1)
  • Shen-Ming Lee (1)
  • Chin-Shang Li (2)

  1. Department of Statistics, Feng Chia University, Taichung, Taiwan
  2. Division of Biostatistics, Department of Public Health Sciences, University of California, Davis, USA
