Abstract
A difficult problem in healthcare is predicting who will become very sick in the near future. In our data, the 10% of newly diagnosed type 2 diabetes patients with the highest utilization account for 68% of total healthcare utilization. In this paper, we demonstrate how the U.S. healthcare system can deliver better healthcare quality per dollar spent by applying improved data-driven predictive analytics to the growing volume of available healthcare claims data. Specifically, we demonstrate the effectiveness of data mining by applying machine learning methods to large-scale medical and pharmacy claims data for over 65,000 patients newly diagnosed with type 2 diabetes, a disease that is common and costly worldwide. The analysis reveals important, previously unknown patterns in the cost and quality of the disease's common treatments, and it demonstrates how large-scale data mining can efficiently focus further inquiry.
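The concentration figure above (the top 10% of patients accounting for 68% of utilization) is a simple ratio: sort patients by utilization, take the highest-utilizing fraction, and divide their total by the overall total. The following is a minimal Python sketch, assuming utilization has already been summed into one number per patient; the function name `top_share` and the toy numbers are hypothetical and are not the study's data or code.

```python
def top_share(costs, fraction=0.10):
    """Return the share of total utilization incurred by the highest-utilizing
    `fraction` of patients (e.g. 0.10 for the top decile)."""
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(costs, reverse=True)  # highest utilizers first
    # Size of the top group: nearest whole patient, but never fewer than one.
    k = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Toy per-patient totals (invented for illustration, not study data).
toy = [100, 50, 10, 10, 10, 5, 5, 5, 3, 2]
print(top_share(toy))        # 0.5: one patient out of ten incurs half the total
print(top_share(toy, 0.20))  # 0.75
```

Rounding the group size to the nearest whole patient, rather than always rounding up, avoids floating-point artifacts such as `30 * 0.1` evaluating to slightly more than 3.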


Appendices
Appendix A
Medical claim coding for uncomplicated type 2 diabetes
ICD9-CM diagnosis codes
- 250.00: Diabetes mellitus without mention of complication, Type II, Controlled
- 250.02: Diabetes mellitus without mention of complication, Type II, Uncontrolled
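Applying Appendix A to claims is a set-membership test, but claims extracts frequently store ICD-9-CM codes without the decimal point, so a normalization step helps. The following is a minimal Python sketch under that assumption about the input format; the function names are hypothetical, and the sketch handles numeric and V codes (three-character category) but not E codes.

```python
# Appendix A: codes identifying uncomplicated type 2 diabetes.
UNCOMPLICATED_T2_CODES = {"250.00", "250.02"}


def normalize_icd9_dx(code: str) -> str:
    """Return an ICD-9-CM diagnosis code in dotted form, e.g. '25000' -> '250.00'.

    Assumes numeric or V codes, whose category is the first three characters;
    E codes (four-character category) are not handled by this sketch.
    """
    code = code.strip().upper()
    if "." in code or len(code) <= 3:
        return code  # already dotted, or a bare three-character category
    return code[:3] + "." + code[3:]


def is_uncomplicated_t2(code: str) -> bool:
    """True if the code is one of the Appendix A codes, dotted or not."""
    return normalize_icd9_dx(code) in UNCOMPLICATED_T2_CODES


print(is_uncomplicated_t2("25000"))   # True
print(is_uncomplicated_t2("250.02"))  # True
print(is_uncomplicated_t2("250.01"))  # False (a type 1 code)
```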
Appendix B
Medical claim coding for type 1 diabetes
ICD9-CM diagnosis codes
- 250.x1: Diabetes mellitus with or without mention of complication, Type I, Controlled
- 250.x3: Diabetes mellitus with or without mention of complication, Type I, Uncontrolled
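The patterns 250.x1 and 250.x3 work because, within ICD-9-CM category 250, the fifth digit encodes diabetes type and control status: 0 and 2 denote type II (or unspecified type), 1 and 3 denote type I. The following minimal Python sketch, with a hypothetical function name, shows how such a rule could exclude type 1 patients from a cohort, assuming codes may arrive with or without the decimal point.

```python
def is_type1_diabetes(code: str) -> bool:
    """True if an ICD-9-CM code matches 250.x1 or 250.x3 (Appendix B).

    Within category 250 the fifth digit encodes type and control status:
    0 and 2 are type II (or unspecified), 1 and 3 are type I.
    """
    digits = code.strip().replace(".", "")
    # Expect exactly five digits: '250', the complication digit, the type digit.
    if len(digits) != 5 or not digits.isdigit() or not digits.startswith("250"):
        return False
    return digits[4] in ("1", "3")


for c in ["250.01", "25043", "250.00", "250.62", "585.1"]:
    print(c, is_type1_diabetes(c))
```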
Appendix C
Medical claim coding for diabetes complications
ICD9-CM diagnosis codes
- 250.1x: Diabetes with ketoacidosis
- 250.2x: Diabetes with hyperosmolarity
- 250.3x: Diabetes with other coma
- 250.4x: Diabetes with renal manifestations
- 585.xx: Chronic kidney disease
- 583.81: Nephritis and nephropathy, not specified as acute or chronic, in diseases classified elsewhere
- 581.81: Nephrotic syndrome in diseases classified elsewhere
- 250.5x: Diabetes with ophthalmic manifestations
- 369.xx: Blindness and low vision
- 362.xx: Other retinal disorders
- 366.41: Diabetic cataract
- 365.44: Glaucoma associated with systemic syndromes
- 250.6x: Diabetes with neurological manifestations
- 337.1x: Peripheral autonomic neuropathy in disorders classified elsewhere
- 353.5x: Thoracic root lesions, not elsewhere classified
- 354.xx: Mononeuritis of upper limb and mononeuritis multiplex
- 355.xx: Mononeuritis of lower limb
- 357.2x: Polyneuropathy in diabetes
- 536.3x: Gastroparesis
- 713.5x: Arthropathy associated with neurological disorders
- 250.7x: Diabetes with peripheral circulatory disorders
- 443.81: Peripheral angiopathy in diseases classified elsewhere
- 785.4x: Gangrene
- 250.8x: Diabetes with other specified manifestations
- 707.1x: Ulcer of lower limbs, except pressure ulcer
- 707.2x: Pressure ulcer stages
- 707.8x: Chronic ulcer of other specified sites
- 707.9x: Chronic ulcer of unspecified site
- 731.8x: Other bone involvement in diseases classified elsewhere
- 250.9x: Diabetes with unspecified complication
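The diagnosis list above mixes exact codes with wildcard patterns in which a trailing "x" stands for any further digit. The following is a minimal Python sketch of how such a list could be applied to a patient's diagnosis codes. It assumes all wildcards are trailing (true of this list) and compares codes with the decimal point removed; this is safe here because every numeric ICD-9-CM diagnosis category has exactly three digits. The function names and the reduced pattern subset are hypothetical.

```python
# A few of the Appendix C diagnosis patterns (not the full list).
# A trailing 'x' means "any further digit, or none".
COMPLICATION_DX_PATTERNS = ["250.1x", "250.4x", "585.xx", "583.81", "357.2x", "707.1x"]


def _digits(code: str) -> str:
    """Drop the decimal point so '250.40' and '25040' compare equal."""
    return code.strip().replace(".", "")


def matches(code: str, pattern: str) -> bool:
    """Match one diagnosis code against a pattern whose wildcards are all trailing."""
    code, pat = _digits(code), _digits(pattern)
    stem = pat.rstrip("x")
    if stem == pat:  # no wildcard: exact match only
        return code == pat
    # Same leading digits, and no more digits than the pattern has positions.
    return code.startswith(stem) and len(code) <= len(pat)


def has_dx_complication(codes) -> bool:
    """True if any diagnosis code on a patient's claims matches any pattern."""
    return any(matches(c, p) for c in codes for p in COMPLICATION_DX_PATTERNS)


print(has_dx_complication(["401.9", "250.00"]))  # False
print(has_dx_complication(["250.00", "585.6"]))  # True
```

Treating each "x" as optional lets a pattern such as 585.xx cover four-digit codes (585.6) as well as five-digit ones.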
ICD9-CM procedure codes
- 84.0x: Amputation of upper limb
- 84.1x: Amputation of lower limb
CPT4 procedure codes
- 26910: Amputate metacarpal bone
- 26951: Amputation of finger/thumb
- 26952: Amputation of finger/thumb
- 27590: Amputate leg at thigh
- 27591: Amputate leg at thigh
- 27592: Amputate leg at thigh
- 27594: Amputation follow-up surgery
- 27596: Amputation follow-up surgery
- 27598: Amputate lower leg at knee
- 27880: Amputation of lower leg
- 27881: Amputation of lower leg
- 27882: Amputation of lower leg
- 27884: Amputation follow-up surgery
- 27886: Amputation follow-up surgery
- 27888: Amputation of foot at ankle
- 27889: Amputation of foot at ankle
- 28800: Amputation of midfoot
- 28805: Amputation through metatarsal
- 28810: Amputation toe and metatarsal
- 28820: Amputation of toe
- 28825: Partial amputation of toe
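Appendix C draws on three code systems (ICD-9-CM diagnosis codes, ICD-9-CM procedure codes, and CPT-4 codes), and codes should be matched only within their own system. Once decimal points are removed, a procedure code and a diagnosis code can collapse to the same digits: procedure 84.11 (within 84.1x, amputation of lower limb) and diagnosis 841.1 both become 8411. The following is a minimal Python sketch, assuming each claim code can be tagged with its system. The system labels (`icd9_dx`, `icd9_px`, `cpt`), the function names, and the reduced code subsets are hypothetical.

```python
from typing import Iterable, Tuple

# Illustrative subsets of the Appendix C lists, keyed by code system.
# A trailing 'x' means "any further digit, or none".
COMPLICATION_CODES = {
    "icd9_dx": ["250.7x", "785.4x"],
    "icd9_px": ["84.0x", "84.1x"],
    "cpt": ["27880", "28820", "28825"],
}


def _digits(code: str) -> str:
    return code.strip().replace(".", "")


def matches(code: str, pattern: str) -> bool:
    """Exact match, or prefix match when the pattern ends in 'x' wildcards."""
    code, pat = _digits(code), _digits(pattern)
    stem = pat.rstrip("x")
    if stem == pat:
        return code == pat
    return code.startswith(stem) and len(code) <= len(pat)


def has_complication(claim_codes: Iterable[Tuple[str, str]]) -> bool:
    """claim_codes holds (code_system, code) pairs. Each code is compared only
    with patterns from its own system, so a diagnosis code can never be
    mistaken for a procedure code that shares its digits."""
    return any(
        matches(code, pattern)
        for system, code in claim_codes
        for pattern in COMPLICATION_CODES.get(system, [])
    )


print(has_complication([("icd9_px", "84.11")]))  # True: lower-limb amputation
print(has_complication([("icd9_dx", "841.1")]))  # False: not a complication code
print(matches("841.1", "84.1x"))                 # True if systems were pooled
```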
Appendix D
Appendix E
About this article
Cite this article
Maguire, J., Dhar, V. Comparative effectiveness for oral anti-diabetic treatments among newly diagnosed type 2 diabetics: data-driven predictive analytics in healthcare. Health Syst 2, 73–92 (2013). https://doi.org/10.1057/hs.2012.20
DOI: https://doi.org/10.1057/hs.2012.20
Keywords
- diabetes
- comparative effectiveness
- healthcare
- data mining
- predictive analytics
- claims data