Fitting Heavy-Tailed Distributions to Health Care Data by Parametric and Bayesian Methods

Journal of Statistical Theory and Practice

Abstract

We consider fitting parametric distributions to health care data, which often exhibit heavy tails. The three-parameter generalized gamma distribution and its special cases, the lognormal, Weibull, and gamma, are examined. Continuous gamma mixing of the Weibull distribution leads to the Burr distribution and its special cases, the log-logistic and Pareto distributions. For finite mixtures we consider Coxian phase-type distributions and mixtures of exponentials. Both maximum likelihood and Bayesian methods are presented, with prescriptions for fitting these models using recent enhancements to SAS software. Competing models are compared using probability-probability plots, formal likelihood ratio tests for nested models, and the Vuong test for strictly nonnested models. We demonstrate these methods on two empirical data sets, one with completely observed hospital lengths of stay, and the other with censored follow-up times of patients who received a bone marrow transplant.
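The gamma-mixing statement in the abstract can be illustrated with a small numerical check. The sketch below is not taken from the article: it assumes a Weibull survival function exp(-lam * t^alpha) whose rate lam follows a gamma distribution with shape k and rate beta, a parameterization chosen here for illustration. Under that assumption the mixed survival function is (1 + t^alpha / beta)^(-k), a Burr type XII form; setting alpha = 1 gives a Pareto (Lomax) tail and setting k = 1 gives the log-logistic. The function names and the Monte Carlo settings are hypothetical.

```python
import math
import random


def burr_survival(t, alpha, k, beta):
    """Closed-form survival of the gamma-mixed Weibull (Burr type XII).

    Assumed parameterization (not from the article):
    S(t | lam) = exp(-lam * t**alpha), lam ~ Gamma(shape=k, rate=beta).
    """
    return (1.0 + t ** alpha / beta) ** (-k)


def mixed_weibull_survival_mc(t, alpha, k, beta, n_draws=100_000, seed=12345):
    """Monte Carlo estimate of E[exp(-lam * t**alpha)] over the gamma mixing law."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        lam = rng.gammavariate(k, 1.0 / beta)
        total += math.exp(-lam * t ** alpha)
    return total / n_draws


if __name__ == "__main__":
    t, alpha, k, beta = 1.5, 2.0, 3.0, 4.0
    print("closed form :", burr_survival(t, alpha, k, beta))
    print("Monte Carlo :", mixed_weibull_survival_mc(t, alpha, k, beta))
```

The two printed values should agree to within Monte Carlo error. The same closed form shows why the mixture is heavy tailed: the survival function decays polynomially in t rather than exponentially.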

Author information

Correspondence to Joseph C. Gardiner.

Additional information

Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/ujsp.

Cite this article

Gardiner, J.C., Luo, Z., Tang, X. et al. Fitting Heavy-Tailed Distributions to Health Care Data by Parametric and Bayesian Methods. J Stat Theory Pract 8, 619–652 (2014). https://doi.org/10.1080/15598608.2013.824823
