Text Mining of the Electronic Health Record: An Information Extraction Approach for Automated Identification and Subphenotyping of HFpEF Patients for Clinical Trials

  • Original Article
Journal of Cardiovascular Translational Research

Abstract

Precision medicine requires clinical trials that are able to efficiently enroll subtypes of patients in whom targeted therapies can be tested. To reduce the large amount of time spent screening, identifying, and recruiting patients with specific subtypes of heterogeneous clinical syndromes (such as heart failure with preserved ejection fraction [HFpEF]), we need prescreening systems that are able to automate data extraction and decision-making tasks. However, a major obstacle is the vast amount of unstructured free-form text in medical records. Here we describe an information extraction-based approach that automatically converts unstructured text into structured data, which is cross-referenced against eligibility criteria using a rule-based system to determine which patients qualify for a major HFpEF clinical trial (PARAGON). We show that we can achieve a sensitivity and positive predictive value of 0.95 and 0.86, respectively. Our open-source algorithm could be used to efficiently identify and subphenotype patients with HFpEF and other disorders.
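
The pipeline described above (extract structured values from free text, apply eligibility rules, then score the result against a reference standard) can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the regular expression, the `PatientRecord` type, the function names, and the age and ejection-fraction thresholds are all hypothetical placeholders and are not the actual PARAGON criteria.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical pattern: matches phrases such as "LVEF 60%" or
# "ejection fraction of 55 %". A real system would also need to handle
# negation, ranges, and temporal context.
EF_PATTERN = re.compile(
    r"\b(?:ejection fraction|LVEF|EF)\b\D{0,20}?(\d{1,2})\s*%",
    re.IGNORECASE,
)


@dataclass
class PatientRecord:
    """Illustrative record: one structured field plus one free-text note."""
    age: int
    note: str


def extract_ef(note: str) -> Optional[int]:
    """Return the first ejection fraction (percent) found in the note, if any."""
    match = EF_PATTERN.search(note)
    return int(match.group(1)) if match else None


def is_candidate(record: PatientRecord, min_age: int = 50, min_ef: int = 45) -> bool:
    """Apply placeholder rules; the thresholds are assumptions for illustration."""
    ef = extract_ef(record.note)
    return ef is not None and ef >= min_ef and record.age >= min_age


def sensitivity_and_ppv(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Compute sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, ppv
```

In this sketch, `predicted` would hold the rule-based decisions and `actual` the manually adjudicated eligibility labels, so that the two reported metrics correspond to the fraction of truly eligible patients that were flagged (sensitivity) and the fraction of flagged patients that were truly eligible (positive predictive value).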

Author information

Corresponding author

Correspondence to Siddhartha R. Jonnalagadda.

Ethics declarations

Funding Sources

This work was funded by the National Library of Medicine: R00LM011389 and R01LM011416 (to S.R.J.), and an investigator-initiated study grant from Novartis. S.J.S. is also supported by grants from the National Institutes of Health (R01 HL107577 and R01 HL127028). The authors acknowledge Prasanth Nannapaneni for his valuable ideas on extracting information from the electronic health record.

Conflicts of Interest

Siddhartha R. Jonnalagadda is currently an employee of Microsoft Corporation.

Abhishek K. Adupa declares that he has no conflict of interest.

Ravi P. Garg declares that he has no conflict of interest.

Jessica Corona-Cox declares that she has no conflict of interest.

Sanjiv J. Shah reports receiving consulting fees from Novartis.

Ethical Approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Informed Consent

Informed consent was waived for this study by the Northwestern University Institutional Review Board because the study only involved retrospective chart review.

About this article

Cite this article

Jonnalagadda, S.R., Adupa, A.K., Garg, R.P. et al. Text Mining of the Electronic Health Record: An Information Extraction Approach for Automated Identification and Subphenotyping of HFpEF Patients for Clinical Trials. J. of Cardiovasc. Trans. Res. 10, 313–321 (2017). https://doi.org/10.1007/s12265-017-9752-2

  • DOI: https://doi.org/10.1007/s12265-017-9752-2
