Detecting MRSA Infections by Fusing Structured and Unstructured Electronic Health Record Data

  • Thomas Hartvigsen
  • Cansu Sen
  • Elke A. Rundensteiner
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1024)


Methicillin-resistant Staphylococcus aureus (MRSA), an antibiotic-resistant bacterium, is a common cause of one of the more devastating hospital-acquired infections (HAI) in the United States. In this work, we study the practicality of leveraging machine learning methods for early detection of MRSA infections based on the rich variety of patient information commonly available in modern Electronic Health Records (EHR). We explore heterogeneous types of EHR data, including on-admission demographics, throughout-stay time series, and free-form clinical notes. On-admission data capture non-clinical information (e.g., age, marital status), while throughout-stay data include vital signs, medications, laboratory studies, and other clinical assessments. Clinical notes, free-form text documents created by medical professionals, contain expert observations about patients. Our proposed system extracts features from each data type to generate dense patient-level representations, then generates a score for each patient indicating their risk of acquiring MRSA. We evaluate the prediction performance of core machine learning methods, namely Logistic Regression, Support Vector Machine, and Random Forest, when mining these different types of EHR data retrospectively to detect patterns predictive of MRSA infection. The evaluation uses MIMIC III, a critical care data set comprising 12 years of patient records from the Beth Israel Deaconess Medical Center Intensive Care Unit in Boston, MA. Our experiments show that while all types of data contain predictive signals, fusing all sources of data yields the highest prediction accuracy.
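The fusion idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the patient records, the three-word vocabulary, the summary statistics, and all function names are hypothetical, and a simple bag-of-words count stands in for the dense note representations the abstract mentions. The sketch builds one feature vector per data type, concatenates them, and fits a small logistic regression (one of the three methods named in the abstract) using only the Python standard library.

```python
import math

# Hypothetical vocabulary; the actual note features are not given in the abstract.
VOCAB = ["wound", "fever", "catheter"]

def summarize_series(values):
    """Collapse a throughout-stay time series into fixed-length statistics."""
    return [sum(values) / len(values), min(values), max(values)]

def embed_note(text, vocabulary):
    """Toy bag-of-words counts, a stand-in for a learned note representation."""
    tokens = text.lower().split()
    return [float(tokens.count(term)) for term in vocabulary]

def fuse(demographics, series, note, vocabulary):
    """Feature fusion: concatenate the per-modality feature vectors."""
    return list(demographics) + summarize_series(series) + embed_note(note, vocabulary)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def risk_score(features, weights, bias):
    """Estimated probability of the positive class for one patient."""
    return sigmoid(bias + sum(w * f for w, f in zip(weights, features)))

def train_logistic_regression(X, y, lr=0.1, epochs=200):
    """Plain stochastic gradient descent on the logistic loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            error = risk_score(features, weights, bias) - label
            weights = [w - lr * error * f for w, f in zip(weights, features)]
            bias -= lr * error
    return weights, bias

# Synthetic patients: ([age / 100, married], temperature in °C relative to 37.0,
# clinical note, label). All values are invented for illustration only.
patients = [
    ([0.71, 1.0], [1.9, 2.2, 1.7], "fever and wound drainage near catheter", 1),
    ([0.34, 0.0], [-0.2, 0.0, -0.1], "patient resting comfortably", 0),
    ([0.65, 0.0], [1.5, 2.0, 1.8], "persistent fever wound culture sent", 1),
    ([0.52, 1.0], [0.1, -0.1, 0.0], "routine follow up no complaints", 0),
]

X = [fuse(demo, series, note, VOCAB) for demo, series, note, _ in patients]
y = [label for *_, label in patients]
weights, bias = train_logistic_regression(X, y)
scores = [risk_score(x, weights, bias) for x in X]
```

Each entry of `scores` is a value between 0 and 1 for the corresponding synthetic patient. Dropping one of the three parts inside `fuse` gives a single-modality or two-modality baseline, which mirrors the comparison between individual data types and their fusion described in the abstract.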


Keywords: MRSA, Machine learning, Early prediction, Feature fusion



Thomas Hartvigsen thanks the US Department of Education for supporting his PhD studies via the grant P200A150306 on “GAANN Fellowships to Support Data-Driven Computing Research”, while Cansu Sen thanks WPI for granting her the Arvid Anderson Fellowship (2015–2016) to pursue her PhD studies. We also thank the DSRG and Data Science Community at WPI for their continued support and feedback.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Thomas Hartvigsen (1), corresponding author
  • Cansu Sen (1)
  • Elke A. Rundensteiner (1)
  1. Worcester Polytechnic Institute, Worcester, USA
