Two parts are better than one: modeling marginal means of semicontinuous data

Abstract

In health services research, it is common to encounter semicontinuous data characterized by a point mass at zero followed by a continuous distribution with positive support. These are often analyzed using two-part mixtures that separately model the probability of use to account for the portion of the sample with zero values. Commonly, but not always, the second component models the continuous values conditional on them being positive. Prior work examining whether such two-part models are needed to appropriately draw inference from semicontinuous data compared to standard one-part regression models has found mixed results. However, prior studies have generally used only measures of model fit on a single dataset, leaving a definitive conclusion uncertain. This paper provides a detailed evaluation using simulations of the appropriateness of standard one-part generalized linear models (GLMs) compared to a recently developed marginalized two-part (MTP) model. The MTP model, unlike the one-part GLMs, explicitly accounts for the point mass at zero, yet takes the same form for the marginal mean as the commonly used GLM with log link, making the covariate effects directly comparable. We simulate data scenarios with varying sample sizes and percentages of zeros. One-part GLMs resulted in increased bias, lower than nominal coverage of confidence intervals, and inflated type I error rates, rendering them inappropriate for use with semicontinuous data. Even when distributional assumptions were violated, estimates of covariate effects and type I error rates under the MTP model remained robust.
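
The data structure described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design. It assumes a log-normal distribution for the positive values (the keywords list generalized gamma and log-skew-normal alternatives), a single binary covariate, and hypothetical parameter values `alpha`, `beta`, and `sigma`. The probability of a positive value follows a logistic model, and the log-scale mean of the positive part is offset so that the marginal mean satisfies E[Y] = exp(beta0 + beta1 * x), the same form as a one-part GLM with log link.

```python
import math
import random


def simulate_mtp_lognormal(n, alpha, beta, sigma, seed=0):
    """Draw (x, y) pairs from a log-normal marginalized two-part model.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.randint(0, 1)  # hypothetical binary covariate
        # Part 1: probability of a positive value, logistic in x.
        pi = 1.0 / (1.0 + math.exp(-(alpha[0] + alpha[1] * x)))
        # Marginal mean E[Y], log-linear in x.
        nu = math.exp(beta[0] + beta[1] * x)
        # Part 2: log-scale mean chosen so that pi * E[Y | Y > 0] = nu.
        mu = math.log(nu) - math.log(pi) - sigma ** 2 / 2.0
        y = rng.lognormvariate(mu, sigma) if rng.random() < pi else 0.0
        data.append((x, y))
    return data


def group_summary(data, x_value):
    """Return the sample mean and the share of zeros for one covariate level."""
    ys = [y for x, y in data if x == x_value]
    mean = sum(ys) / len(ys)
    share_zero = sum(1 for y in ys if y == 0.0) / len(ys)
    return mean, share_zero


data = simulate_mtp_lognormal(
    n=200_000, alpha=(0.0, 1.0), beta=(1.0, 0.5), sigma=0.5
)
mean0, zeros0 = group_summary(data, 0)
mean1, zeros1 = group_summary(data, 1)

# The log ratio of the overall group means (zeros included) estimates beta1.
log_mean_ratio = math.log(mean1 / mean0)
print(f"share of zeros: x=0 {zeros0:.3f}, x=1 {zeros1:.3f}")
print(f"log ratio of marginal means: {log_mean_ratio:.3f}")
```

Because the offset `-log(pi) - sigma**2 / 2` is built into the positive part, the covariate effect on the overall mean is carried entirely by `beta`, which is what makes it comparable to the coefficient from a log-link GLM fit to the same outcome.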

Acknowledgements

The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs, Duke University, the Medical University of South Carolina, or the University of North Carolina.

Funding

This work was funded by the Diabetes QUERI and the National Center for Health Promotion and Disease Prevention within the Department of Veterans Affairs (VA). This work was supported by the Center of Innovation for Health Services Research in Primary Care (CIN 13-410) at the Durham VA Medical Center. Dr. Maciejewski is supported by a Research Career Scientist award (RCS 10-391) and a Grant (IIR 10-159) from the VA. The Diabetes QUERI, the National Center for Health Promotion and Disease Prevention, and the Health Services Research and Development Service, Department of Veterans Affairs had no role in the design, conduct, collection, management, analysis, or interpretation of the data; or in the preparation, review, or approval of the manuscript.

Author information

Corresponding author

Correspondence to Valerie A. Smith.

Ethics declarations

Conflict of interest

Dr. Maciejewski owns stock in Amgen due to his spouse’s employment. No other authors have conflicts of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent

This study was approved by the institutional review board (including waiver of informed consent) of the Durham VA Medical Center.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 64 KB)

About this article

Cite this article

Smith, V.A., Neelon, B., Maciejewski, M.L. et al. Two parts are better than one: modeling marginal means of semicontinuous data. Health Serv Outcomes Res Method 17, 198–218 (2017). https://doi.org/10.1007/s10742-017-0169-9

Keywords

  • Generalized gamma distribution
  • Health care expenditures
  • Log-skew-normal distribution
  • Marginalized models
  • Two-part models