Pharmaceutical Medicine, Volume 32, Issue 1, pp 31–37

A Case Study of the Incremental Utility for Disease Identification of Natural Language Processing in Electronic Medical Records

  • Lisa S. Weiss
  • Xiaofeng Zhou
  • Alexander M. Walker
  • Ashwin N. Ananthakrishnan
  • Rongjun Shen
  • Rachel E. Sobel
  • Andrew Bate
  • Robert F. Reynolds
Short Communication



Much clinical information in healthcare databases exists only as unstructured medical text. Such information is not routinely considered in safety surveillance, which typically relies solely on structured (coded) data. Natural language processing (NLP) may allow the capture of concepts from unstructured data and thus enhance safety surveillance capability.


We sought to assess the added contribution of unstructured data extracted from medical text by NLP for detecting acute liver dysfunction (ALD) in patients with inflammatory bowel disease (IBD).


Using a previously developed rule, we evaluated structured and unstructured NLP-extracted terms from a commercially available electronic medical record (EMR) system. The rule was intended to identify ALD diagnosis and timing of onset and was the result of three iterations of rule development using 150 ALD candidate cases. We evaluated the performance of the rule with or without NLP among all candidate cases and among 50 new cases with clinical adjudication.
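The comparison described above, the same rule applied with and without NLP-extracted terms and scored against clinical adjudication, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Candidate` structure, the term names, the `toy_rule` stand-in and the four toy cases are hypothetical and are not the study's rule or data.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One candidate case (hypothetical structure, not the study's data model)."""
    structured_terms: set = field(default_factory=set)  # coded EMR findings
    nlp_terms: set = field(default_factory=set)         # concepts extracted from notes
    adjudicated_ald: bool = False                       # clinician adjudication


def toy_rule(terms):
    """Hypothetical stand-in rule: flag ALD when an indicator term is present
    and no exclusion term is present. The study's actual rule is not shown here."""
    return "elevated_alt" in terms and "chronic_liver_disease" not in terms


def sens_spec(cases, use_nlp):
    """Return (sensitivity, specificity) of toy_rule against adjudication."""
    tp = fp = tn = fn = 0
    for c in cases:
        terms = c.structured_terms | (c.nlp_terms if use_nlp else set())
        predicted = toy_rule(terms)
        if predicted and c.adjudicated_ald:
            tp += 1
        elif predicted:
            fp += 1
        elif c.adjudicated_ald:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy cases, invented purely to exercise the code.
CASES = [
    Candidate({"elevated_alt"}, set(), True),                       # found either way
    Candidate(set(), {"elevated_alt"}, True),                       # found only with NLP
    Candidate({"elevated_alt"}, {"chronic_liver_disease"}, False),  # ruled out by NLP
    Candidate(set(), set(), False),                                 # negative either way
]

if __name__ == "__main__":
    print("structured only:", sens_spec(CASES, use_nlp=False))
    print("structured + NLP:", sens_spec(CASES, use_nlp=True))
```

In this toy setup the NLP terms contribute in both directions: one case is diagnosed only because of a term found in text, and one false positive is ruled out by an exclusion term found in text.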


NLP terms were necessary for the diagnosis of 9% of cases and for ruling out 3% of false-positive cases. Inclusion of NLP terms led to the identification of an additional 9% of ALD-onset dates, with consequent earlier recognition in 5%.
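One possible way to tally the onset-date effect is sketched below in Python. It assumes that the onset date is the earliest dated qualifying term, that a date counts as "additional" when it is available only with NLP terms, and that it counts as "earlier" when the NLP-inclusive date precedes the structured-only date. These definitions, the function names and the toy dates are assumptions for illustration, not the study's definitions or data.

```python
from datetime import date


def onset(events, use_nlp):
    """Earliest date among qualifying events, or None if there are none.
    events: list of (date, source) pairs, source is "structured" or "nlp"."""
    dates = [d for d, source in events if use_nlp or source == "structured"]
    return min(dates) if dates else None


def summarize(cases):
    """Count cases whose onset date is gained, or moved earlier, by NLP terms."""
    added = earlier = 0
    for events in cases:
        base = onset(events, use_nlp=False)
        full = onset(events, use_nlp=True)
        if base is None and full is not None:
            added += 1      # onset date available only with NLP terms
        elif base is not None and full < base:
            earlier += 1    # NLP term predates the first structured term
    return added, earlier


# Toy cases, invented purely to exercise the code.
TOY_CASES = [
    [(date(2012, 3, 10), "structured")],
    [(date(2012, 5, 1), "nlp")],
    [(date(2012, 6, 20), "structured"), (date(2012, 6, 12), "nlp")],
]

if __name__ == "__main__":
    print(summarize(TOY_CASES))
```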


NLP-derived terms in one large commercially available EMR system modestly improved the sensitivity and specificity in the identification of ALD and identified earlier onset.


Compliance with Ethical Standards

All patient and provider information was provided in the form of non-identifying study code numbers. The work did not require institutional review board approval.


This work was conducted using Pfizer, Inc., internal funds and under a research contract between Pfizer and World Health Information Science Consultants (AW and AA).

Conflicts of Interest

LSW, XZ, RS, RES, AB and RR are employees and may be shareholders of Pfizer, Inc. AW has worked under contract with Optum, which owns Humedica (whose data resource is being studied). AA has received consulting fees or honoraria for serving on scientific advisory boards for AbbVie, Takeda, and Merck. The views expressed herein are those of the authors and do not necessarily represent those of Pfizer, Inc.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2017

Authors and Affiliations

  • Lisa S. Weiss (1), corresponding author
  • Xiaofeng Zhou (1)
  • Alexander M. Walker (2)
  • Ashwin N. Ananthakrishnan (3)
  • Rongjun Shen (1)
  • Rachel E. Sobel (1)
  • Andrew Bate (1)
  • Robert F. Reynolds (1)

  1. Epidemiology, Research and Development, Worldwide Safety and Regulatory, Pfizer, New York, USA
  2. WHISCON, Newton, USA
  3. Division of Gastroenterology, Crohn’s and Colitis Center, Massachusetts General Hospital, Boston, USA
