Two-Part Models for Zero-Modified Count and Semicontinuous Data

  • Brian Neelon
  • Alistair James O’Malley
Reference work entry
Part of the Health Services Research book series (HEALTHSR)


Health services data often contain a high proportion of zeros. In studies examining patient hospitalization rates, for instance, many patients will have no hospitalizations, resulting in a count of zero. When the number of zeros is greater or less than expected under a standard count model, the data are said to be zero modified relative to the standard model. More precisely, the data are zero inflated if there is an overabundance of zeros, and zero deflated if there are fewer zeros than expected. A similar phenomenon arises with semicontinuous data, which are characterized by a spike at zero followed by a right-skewed continuous distribution of positive values. When dealing with zero-modified count and semicontinuous data, flexible two-part mixture distributions are often needed to accommodate both the excess zeros and the skewed distribution of nonzero values. A broad array of two-part models has been introduced over the past three decades to accommodate such data. These include hurdle models, zero-inflated models, and two-part semicontinuous models. While these models differ in their distributional assumptions, they each incorporate a two-part structure in which the zero and nonzero observations are modeled in distinct but related ways. This chapter describes recent developments in two-part modeling of zero-modified count and semicontinuous data and highlights their application in health services research.
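The distinction between zero inflation and zero deflation can be made concrete with a small numerical sketch. The code below is an illustration only, not the chapter's own implementation: it assumes a Poisson base distribution, and the function names (`poisson_pmf`, `zip_pmf`, `hurdle_pmf`) and parameter values are hypothetical choices made for this example. It compares the probability of a zero under a standard Poisson model with the probability under a zero-inflated Poisson model and under a Poisson hurdle model.

```python
import math


def poisson_pmf(y, mu):
    """Standard Poisson probability P(Y = y) with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def zip_pmf(y, pi, mu):
    """Zero-inflated Poisson (illustrative parameterization).

    With probability pi the observation is a structural zero;
    otherwise it is drawn from an ordinary Poisson(mu).
    """
    base = (1 - pi) * poisson_pmf(y, mu)
    return pi + base if y == 0 else base


def hurdle_pmf(y, p0, mu):
    """Poisson hurdle model (illustrative parameterization).

    P(Y = 0) = p0; positive counts follow a Poisson(mu)
    truncated at zero and scaled by (1 - p0).
    """
    if y == 0:
        return p0
    return (1 - p0) * poisson_pmf(y, mu) / (1 - math.exp(-mu))


if __name__ == "__main__":
    mu = 2.0  # assumed mean of the base Poisson distribution
    print("Poisson P(Y=0):          ", poisson_pmf(0, mu))
    print("ZIP P(Y=0), pi = 0.3:    ", zip_pmf(0, 0.3, mu))
    print("Hurdle P(Y=0), p0 = 0.05:", hurdle_pmf(0, 0.05, mu))
```

In this sketch the zero-inflated model always places at least as much mass at zero as the base Poisson model whenever `pi` lies between 0 and 1, whereas the hurdle model sets the zero probability freely, so a value of `p0` below `exp(-mu)` yields zero-deflated data.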



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, USA
  2. The Dartmouth Institute for Health Policy and Clinical Practice, Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, USA
  3. Department of Health Care Policy, Harvard Medical School, Boston, USA
