The goal of pharmacovigilance is to detect, monitor, characterize, and prevent adverse drug events (ADEs) associated with pharmaceutical products. This article is a comprehensive structured review of recent advances in applying natural language processing (NLP) to electronic health record (EHR) narratives for pharmacovigilance. We review methods of varying complexity and problem focus, summarize the current state of the art in methodology, discuss limitations, and point out several promising future directions. The ability to accurately capture both semantic and syntactic structures in clinical narratives is becoming increasingly critical for efficient and accurate ADE detection. Significant progress has been made in algorithm development and resource construction since 2000. Since 2012, statistical analysis and machine learning methods have gained traction in automating ADE mining from EHR narratives. Current state-of-the-art methods for NLP-based ADE detection from EHRs show promise for integration into production pharmacovigilance systems. In addition, integrating multifaceted, heterogeneous data sources has shown promise in improving ADE detection and is increasingly adopted. Nevertheless, challenges and opportunities remain across the frontier of NLP application to EHR-based pharmacovigilance, including proper characterization of ADE context, differentiation between off- and on-label drug-use ADEs, recognition of the importance of polypharmacy-induced ADEs, better integration of heterogeneous data sources, creation of shared corpora, and organization of shared-task challenges to advance the state of the art.
We would like to thank Philip Silberman and Dan Schneider of the Northwestern Medicine Enterprise Data Warehouse for help with data management.
This work is funded by a Grant from the Pharmacovigilance and Patient Safety department at AbbVie, Inc.
Conflict of interest
Yuan Luo, William Thompson, Timothy Herr, Zexian Zeng, Mark Berendsen, Siddhartha Jonnalagadda, Matthew Carson, and Justin Starren have no conflicts of interest that are directly relevant to the content of this study.
Cite this article
Luo, Y., Thompson, W.K., Herr, T.M. et al. Natural Language Processing for EHR-Based Pharmacovigilance: A Structured Review. Drug Saf 40, 1075–1089 (2017). https://doi.org/10.1007/s40264-017-0558-6