Abstract
We consider fitting parametric distributions to health care data, which often exhibit heavy tails. We examine the three-parameter generalized gamma distribution and its special cases: the lognormal, Weibull, and gamma distributions. Continuous gamma mixing of the Weibull distribution leads to the Burr distribution and its special cases, the log-logistic and Pareto distributions. For finite mixtures we consider Coxian phase-type distributions and mixtures of exponentials. Both maximum likelihood and Bayesian methods are presented, with prescriptions for fitting these models using recent enhancements to SAS software. Competing models are compared using probability-probability plots, formal likelihood ratio tests for nested models, and the Vuong test for strictly nonnested models. We demonstrate these methods on two empirical data sets, one with completely observed hospital lengths of stay and the other with censored follow-up times of patients who received a bone marrow transplant.
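The gamma-mixing step mentioned above can be sketched as a short derivation. The parameterization here is an assumption made for illustration (Weibull shape \alpha with a random scale factor \lambda, and a gamma mixing density with shape k and rate \beta); the article itself may use different symbols.

```latex
% Assumed conditional Weibull survival function, given the random factor \lambda:
%   S(t \mid \lambda) = \exp(-\lambda t^{\alpha}), \quad t > 0,\ \alpha > 0.
% Assumed gamma mixing density for \lambda, with shape k > 0 and rate \beta > 0:
%   g(\lambda) = \frac{\beta^{k}}{\Gamma(k)} \lambda^{k-1} e^{-\beta \lambda}.
\begin{align*}
S(t) &= \int_{0}^{\infty} e^{-\lambda t^{\alpha}}\,
        \frac{\beta^{k}}{\Gamma(k)}\,\lambda^{k-1} e^{-\beta\lambda}\, d\lambda \\
     &= \frac{\beta^{k}}{\Gamma(k)} \cdot \frac{\Gamma(k)}{(\beta + t^{\alpha})^{k}} \\
     &= \left(1 + \frac{t^{\alpha}}{\beta}\right)^{-k}.
\end{align*}
% This is the survival function of a Burr (type XII) distribution.
% Setting k = 1 gives the log-logistic; setting \alpha = 1 gives the Pareto (type II).
```

Under this parameterization the marginal survival function decays polynomially in t rather than exponentially, which is the sense in which mixing produces a heavier tail than the Weibull itself.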
Additional information
Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/ujsp.
Cite this article
Gardiner, J.C., Luo, Z., Tang, X. et al. Fitting Heavy-Tailed Distributions to Health Care Data by Parametric and Bayesian Methods. J Stat Theory Pract 8, 619–652 (2014). https://doi.org/10.1080/15598608.2013.824823
DOI: https://doi.org/10.1080/15598608.2013.824823