
Comparative effectiveness for oral anti-diabetic treatments among newly diagnosed type 2 diabetics: data-driven predictive analytics in healthcare


A difficult problem in healthcare is predicting who will become very sick in the near future. In our case, we find that the top 10% of newly diagnosed type 2 diabetes patients account for 68% of healthcare utilization. In this paper, we demonstrate how the U.S. healthcare system can provide improved healthcare quality per unit of spending through better predictive, data-based analytics applied to the increasingly available troves of healthcare claims data. Specifically, we demonstrate the effectiveness of data mining by applying machine learning methods to large-scale medical and pharmacy claims data for over 65,000 patients newly diagnosed with type 2 diabetes, a common and costly disease globally. This analysis reveals some important, previously unknown patterns in the cost and quality of the disease's common treatments and demonstrates the potential of large-scale data mining for efficiently focusing further inquiry.
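The concentration statistic quoted above (the share of utilization attributable to the costliest 10% of patients) can be illustrated with a minimal sketch. The function name `top_share`, the `fraction` parameter, and the ten cost values are assumptions made for illustration; the numbers are constructed so that the result mirrors the 68% figure and are not taken from the study's claims data, nor does this represent the authors' actual procedure.

```python
def top_share(costs, fraction=0.10):
    """Share of total cost attributable to the highest-cost `fraction` of patients.

    Hypothetical helper for illustration; not the method used in the paper.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    ranked = sorted(costs, reverse=True)           # most expensive patients first
    k = max(1, round(len(ranked) * fraction))      # size of the top group
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Made-up annual costs for ten patients (illustrative only, not study data).
example_costs = [68_000, 6_000, 5_000, 4_500, 4_000,
                 3_500, 3_000, 2_500, 2_000, 1_500]

share = top_share(example_costs)
print(f"Top 10% share of utilization: {share:.0%}")
```

On real claims data, the per-patient cost would first have to be aggregated from individual medical and pharmacy claims; that step is omitted here.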





Author information



Corresponding author

Correspondence to Jon Maguire.


Appendix A

Medical claim coding for uncomplicated type 2 diabetes

ICD9-CM diagnosis codes


Diabetes mellitus without mention of complication, Type II, Controlled


Diabetes mellitus without mention of complication, Type II, Uncontrolled

Appendix B

Medical claim coding for type 1 diabetes

ICD9-CM diagnosis codes


Diabetes mellitus with or without mention of complication, Type I, Controlled


Diabetes mellitus with or without mention of complication, Type I, Uncontrolled

Appendix C

Medical claim coding for diabetes complications

ICD9-CM diagnosis codes


Diabetes with ketoacidosis


Diabetes with hyperosmolarity


Diabetes with other coma


Diabetes with renal manifestations


Chronic kidney disease


Nephritis and nephropathy, not specified as acute or chronic, in diseases classified elsewhere


Nephrotic syndrome in diseases classified elsewhere


Diabetes with ophthalmic manifestations


Blindness and low vision


Other retinal disorders


Diabetic cataract


Glaucoma associated with systemic syndromes


Diabetes with neurological manifestations


Peripheral autonomic neuropathy in disorders classified elsewhere


Thoracic root lesions, not elsewhere classified


Mononeuritis of upper limb and mononeuritis multiplex


Mononeuritis of lower limb


Polyneuropathy in diabetes




Arthropathy associated with neurological disorders


Diabetes with peripheral circulatory disorders


Peripheral angiopathy in diseases classified elsewhere




Diabetes with other specified manifestations


Ulcer of lower limbs, except pressure ulcer


Pressure ulcer stages


Chronic ulcer of other specified sites


Chronic ulcer of unspecified site


Other bone involvement in diseases classified elsewhere


Diabetes with unspecified complication

ICD9-CM procedure codes


Amputation of upper limb


Amputation of lower limb

CPT4 procedure codes


Amputate metacarpal bone


Amputation of finger/thumb


Amputation of finger/thumb


Amputate leg at thigh


Amputate leg at thigh


Amputate leg at thigh


Amputation follow-up surgery


Amputation follow-up surgery


Amputate lower leg at knee


Amputation of lower leg


Amputation of lower leg


Amputation of lower leg


Amputation follow-up surgery


Amputation follow-up surgery


Amputation of foot at ankle


Amputation of foot at ankle


Amputation of midfoot


Amputation through metatarsal


Amputation toe and metatarsal


Amputation of toe


Partial amputation of toe

Appendix D


Table D1 Insulin medications

Appendix E


Table E1 Oral antidiabetic medications


About this article

Cite this article

Maguire, J., Dhar, V. Comparative effectiveness for oral anti-diabetic treatments among newly diagnosed type 2 diabetics: data-driven predictive analytics in healthcare. Health Syst 2, 73–92 (2013).


Keywords

  • diabetes
  • comparative effectiveness
  • healthcare
  • data mining
  • predictive analytics
  • claims data