Abstract
Despite the increasing availability of routine data, no analysis method has yet been presented for cost-of-illness (COI) studies based on massive data. We aim, first, to present such a method and, second, to assess the relevance of the associated gain in numerical efficiency. We propose a prevalence-based, top-down regression approach consisting of five steps: aggregating the data; fitting a generalized additive model (GAM); predicting costs via the fitted GAM; comparing predicted costs between prevalent and non-prevalent subjects; and quantifying the stochastic uncertainty via error propagation. To demonstrate the method, we applied it, in the context of chronic lung disease, to aggregated German sickness funds data from 1999 covering over 7.3 million insured. To assess the gain in numerical efficiency, we compared the computational time of the proposed approach with that of corresponding GAMs fitted to simulated individual-level data. Furthermore, the probability of model failure was modeled via logistic regression. Applying the proposed method was reasonably fast (19 min). In contrast, with patient-level data, computational time increased disproportionately with sample size. Furthermore, using patient-level data was accompanied by a substantial risk of model failure (about 80 % for 6 million subjects). The gain in computational efficiency of the proposed COI method therefore seems to be of practical relevance. Furthermore, the method may yield more precise cost estimates.
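The five steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the GAM (steps 2 and 3) is replaced here by plain cell means, so that the aggregation, the comparison of prevalent with non-prevalent subjects, and the first-order (Gauss) error propagation stay visible in a few lines. The record layout, the stratification by age group and sex, the function names `aggregate`, `mean_and_squared_se`, and `excess_cost`, and the toy cost values are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def aggregate(records):
    """Step 1 (aggregation): collapse individual records into cells.

    Each record is (age_group, sex, prevalent, cost). Per cell we keep the
    count, the sum of costs, and the sum of squared costs.
    """
    cells = defaultdict(lambda: [0, 0.0, 0.0])
    for age_group, sex, prevalent, cost in records:
        cell = cells[(age_group, sex, prevalent)]
        cell[0] += 1
        cell[1] += cost
        cell[2] += cost * cost
    return {key: tuple(value) for key, value in cells.items()}


def mean_and_squared_se(n, total, total_sq):
    """Cell mean and squared standard error of that mean.

    Stand-in for steps 2 and 3: a cell mean instead of a GAM prediction.
    """
    mean = total / n
    if n < 2:
        # Assumption: a single observation yields no variance estimate.
        return mean, 0.0
    variance = max((total_sq - n * mean * mean) / (n - 1), 0.0)
    return mean, variance / n


def excess_cost(cells):
    """Steps 4 and 5: mean excess cost per prevalent subject and its SE.

    Stratum differences are weighted by the number of prevalent subjects.
    The weights are treated as fixed and the cell means as independent, so
    the variance of the weighted sum is the sum of squared weights times
    the variances of the differences.
    """
    strata = sorted({(age_group, sex) for age_group, sex, _ in cells})
    weighted_sum, variance_sum, n_prevalent = 0.0, 0.0, 0
    for age_group, sex in strata:
        case = cells.get((age_group, sex, True))
        control = cells.get((age_group, sex, False))
        if case is None or control is None:
            continue  # no comparison possible in this stratum
        mean_1, se2_1 = mean_and_squared_se(*case)
        mean_0, se2_0 = mean_and_squared_se(*control)
        weight = case[0]
        weighted_sum += weight * (mean_1 - mean_0)
        variance_sum += weight ** 2 * (se2_1 + se2_0)
        n_prevalent += weight
    if n_prevalent == 0:
        raise ValueError("no stratum contains both prevalent and non-prevalent subjects")
    return weighted_sum / n_prevalent, sqrt(variance_sum) / n_prevalent


# Hypothetical toy data: (age_group, sex, prevalent, annual cost).
records = [
    ("40-49", "f", True, 300.0), ("40-49", "f", True, 500.0),
    ("40-49", "f", False, 100.0), ("40-49", "f", False, 200.0),
    ("50-59", "m", True, 600.0), ("50-59", "m", True, 800.0),
    ("50-59", "m", False, 200.0), ("50-59", "m", False, 400.0),
]
cells = aggregate(records)
estimate, standard_error = excess_cost(cells)
```

After aggregation, the work in every later step scales with the number of cells rather than the number of insured, which is the property the efficiency comparison in the abstract relies on. A real analysis would fit a model to the aggregated cells instead of comparing raw cell means.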
Acknowledgments
Financial support for this study was provided by the Helmholtz Zentrum München, German Research Centre for Environmental Health (HMGU) and the Institute of Health Economics and Clinical Epidemiology, University of Cologne, Germany. The funding agreement enabled the authors to design the study, interpret the data, and write and publish the manuscript. The following authors are employed by the sponsors: Björn Stollenwerk (HMGU), Stephanie Stock (University of Cologne). We thank Heather Hynd for proofreading the manuscript.
Cite this article
Stollenwerk, B., Welchowski, T., Vogl, M. et al. Cost-of-illness studies based on massive data: a prevalence-based, top-down regression approach. Eur J Health Econ 17, 235–244 (2016). https://doi.org/10.1007/s10198-015-0667-z
DOI: https://doi.org/10.1007/s10198-015-0667-z