In health services research, it is common to encounter semicontinuous data characterized by a point mass at zero followed by a continuous distribution with positive support. These are often analyzed using two-part mixtures that separately model the probability of use to account for the portion of the sample with zero values. Commonly, but not always, the second component models the continuous values conditional on them being positive. Prior work examining whether such two-part models are needed to appropriately draw inference from semicontinuous data compared to standard one-part regression models has found mixed results. However, prior studies have generally used only measures of model fit on a single dataset, leaving a definitive conclusion uncertain. This paper uses simulations to evaluate in detail whether standard one-part generalized linear models (GLMs) are appropriate for such data, compared to a recently developed marginalized two-part (MTP) model. The MTP model, unlike the one-part GLMs, explicitly accounts for the point mass at zero, yet takes the same form for the marginal mean as the commonly used GLM with log link, making the covariate effects directly comparable. We simulate data scenarios with varying sample sizes and percentages of zeros. One-part GLMs resulted in increased bias, lower-than-nominal coverage of confidence intervals, and inflated type I error rates, rendering them inappropriate for use with semicontinuous data. Even when distributional assumptions were violated, estimates of covariate effects and type I error rates under the MTP model remained robust.
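To make the structure of such data concrete, the following is a minimal sketch, not the authors' simulation code, of how semicontinuous outcomes with a log-linear marginal mean might be generated. The logit model for the probability of a positive value, the log-normal distribution for the positive part, the single binary covariate, and all parameter values and names (`simulate_mtp_lognormal`, `alpha`, `beta`, `sigma`) are assumptions made purely for illustration. The only property taken from the abstract is that the marginal mean E[Y] follows a log link, so that the coefficients act on the overall mean rather than on the mean among positive values.

```python
import numpy as np


def simulate_mtp_lognormal(x, alpha, beta, sigma, rng):
    """Draw semicontinuous outcomes with marginal mean exp(beta[0] + beta[1] * x).

    Assumed for illustration: logit link for P(Y > 0) and a log-normal
    distribution for the positive values.
    """
    # Part 1: probability of a positive value (logit link, assumed).
    pi = 1.0 / (1.0 + np.exp(-(alpha[0] + alpha[1] * x)))
    # Marginal mean E[Y] on the log scale.
    nu = np.exp(beta[0] + beta[1] * x)
    # Choose the log-scale location so that pi * E[Y | Y > 0] equals nu,
    # using E[Y | Y > 0] = exp(delta + sigma^2 / 2) for a log-normal.
    delta = np.log(nu) - np.log(pi) - sigma**2 / 2.0
    positive = rng.random(x.shape) < pi
    y = np.where(positive, rng.lognormal(mean=delta, sigma=sigma), 0.0)
    return y


rng = np.random.default_rng(2017)
n = 200_000
x = rng.integers(0, 2, size=n).astype(float)  # hypothetical binary covariate
y = simulate_mtp_lognormal(x, alpha=(-0.5, 0.8), beta=(1.0, 0.3), sigma=0.7, rng=rng)

share_zero = float(np.mean(y == 0))
# Log ratio of the group means, zeros included; this should be close to beta[1].
log_ratio = float(np.log(y[x == 1].mean() / y[x == 0].mean()))
print(f"share of zeros: {share_zero:.3f}, log mean ratio: {log_ratio:.3f}")
```

In this sketch the log ratio of the two group means, computed with the zeros included, should land near the assumed `beta[1]`, which is the sense in which covariate effects under a marginalized specification are comparable to those from a one-part GLM with log link.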
The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs, Duke University, the Medical University of South Carolina, or the University of North Carolina.
This work was funded by the Diabetes QUERI and the National Center for Health Promotion and Disease Prevention within the Department of Veterans Affairs (VA). This work was supported by the Center of Innovation for Health Services Research in Primary Care (CIN 13-410) at the Durham VA Medical Center. Dr. Maciejewski is supported by a Research Career Scientist award (RCS 10-391) and a Grant (IIR 10-159) from the VA. The Diabetes QUERI, the National Center for Health Promotion and Disease Prevention, and the Health Services Research and Development Service, Department of Veterans Affairs had no role in the design, conduct, collection, management, analysis, or interpretation of the data; or in the preparation, review, or approval of the manuscript.
Conflict of interest
Dr. Maciejewski owns stock in Amgen due to his spouse’s employment. No other authors have conflicts of interest.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
This study was approved by the institutional review board (including waiver of informed consent) of the Durham VA Medical Center.
Cite this article
Smith, V.A., Neelon, B., Maciejewski, M.L. et al. Two parts are better than one: modeling marginal means of semicontinuous data. Health Serv Outcomes Res Method 17, 198–218 (2017). https://doi.org/10.1007/s10742-017-0169-9
Keywords
- Generalized gamma distribution
- Health care expenditures
- Log-skew-normal distribution
- Marginalized models
- Two-part models