Machine Learning: Multi-site Evidence-Based Best Practice Discovery

  • Eva K. Lee
  • Yuanbo Wang
  • Matthew S. Hagen
  • Xin Wei
  • Robert A. Davis
  • Brent M. Egan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10122)


This study establishes interoperability among electronic medical records from 737 healthcare sites and performs machine learning for best practice discovery. A mapping algorithm is designed to disambiguate free text entries and to provide a unique and unified way to link content to structured medical concepts despite the extreme variations that can occur during clinical diagnosis documentation. Redundancy is reduced through concept mapping. A SNOMED-CT graph database is created to allow for rapid data access and queries. These integrated data can be accessed through a secured web-based portal. A classification model (DAMIP) is then designed to uncover discriminatory characteristics that can predict the quality of treatment outcome. We demonstrate system usability by analyzing Type II diabetic patients. DAMIP establishes a classification rule on a training set which results in greater than 80% blind predictive accuracy on an independent set of patients. By including features obtained from structured concept mapping, the predictive accuracy is improved to over 88%. The results facilitate evidence-based treatment and optimization of site performance through best practice dissemination and knowledge transfer.
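The idea of linking varied free-text diagnosis entries to a single structured concept can be illustrated with a minimal sketch. This is not the authors' mapping algorithm, which the abstract does not specify; it is a plain dictionary lookup after text normalization, shown only to make the notion of redundancy reduction concrete. The names `SYNONYM_TO_CONCEPT`, `normalize`, `map_entry`, and `group_by_concept` are hypothetical, and the concept labels are placeholders rather than real SNOMED-CT identifiers.

```python
import re
from collections import defaultdict

# Hypothetical lookup table: normalized phrase -> illustrative concept label.
# The labels are placeholders, NOT real SNOMED-CT or UMLS identifiers.
SYNONYM_TO_CONCEPT = {
    "type 2 diabetes": "CONCEPT_DIABETES_T2",
    "type ii diabetes": "CONCEPT_DIABETES_T2",
    "t2dm": "CONCEPT_DIABETES_T2",
    "diabetes mellitus type 2": "CONCEPT_DIABETES_T2",
    "hypertension": "CONCEPT_HYPERTENSION",
    "high blood pressure": "CONCEPT_HYPERTENSION",
    "htn": "CONCEPT_HYPERTENSION",
}


def normalize(text):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def map_entry(text):
    """Return the concept label for a free-text entry, or None if unmapped."""
    return SYNONYM_TO_CONCEPT.get(normalize(text))


def group_by_concept(entries):
    """Group raw entries under the concept they map to (None = unmapped)."""
    groups = defaultdict(list)
    for entry in entries:
        groups[map_entry(entry)].append(entry)
    return dict(groups)


entries = [
    "Type II Diabetes",
    "T2DM",
    "High blood-pressure",
    "HTN.",
    "sprained ankle",
]
grouped = group_by_concept(entries)
for concept, raw in grouped.items():
    print(concept, raw)
```

In this toy example, five differently written entries collapse to two concepts plus one unmapped entry. A real system would need a terminology service and disambiguation logic well beyond an exact-match table.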


Keywords: Classification Rule, Depth Level, Discriminatory Feature, Unified Medical Language System, Graph Database
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This paper received the 2016 NSF Health Organization Transformation award (second place). The work is partially supported by National Science Foundation grant IIP-1361532. Findings and conclusions in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Eva K. Lee (1, 2, 3), corresponding author
  • Yuanbo Wang (1, 2, 3)
  • Matthew S. Hagen (1, 2, 3)
  • Xin Wei (1, 2, 3)
  • Robert A. Davis (4, 5)
  • Brent M. Egan (4, 5)

  1. Center for Operations Research in Medicine and HealthCare, Atlanta, USA
  2. NSF I/UCRC Center for Health Organization Transformation, Atlanta, USA
  3. Georgia Institute of Technology, Atlanta, USA
  4. University of South Carolina School of Medicine, Greenville, USA
  5. Care Coordination Institute, Greenville, USA
