MADEx: A System for Detecting Medications, Adverse Drug Events, and Their Relations from Clinical Notes
Early detection of adverse drug events (ADEs) from electronic health records is an important and challenging task that supports pharmacovigilance and drug safety surveillance. A well-known challenge in using clinical text for ADE detection is that much of the detailed information is documented in narrative form. Clinical natural language processing (NLP) is the key technology for extracting information from unstructured clinical text.
We present a machine learning-based clinical NLP system—MADEx—for detecting medications, ADEs, and their relations from clinical notes.
We developed a recurrent neural network (RNN) model using a long short-term memory (LSTM) strategy for clinical named entity recognition (NER) and compared it with baseline conditional random fields (CRFs). We also developed a modified training strategy for the RNN, which outperformed the widely used early stopping strategy. For relation extraction, we compared support vector machines (SVMs) and random forests on single-sentence relations and cross-sentence relations. In addition, we developed an integrated pipeline that extracts entities and relations together by combining RNNs and SVMs.
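To make the distinction between single-sentence and cross-sentence relations concrete, the following is a minimal sketch of how candidate entity pairs might be split into the two groups before classification. It is an illustration only, not the MADEx implementation: the `Entity` class, the `candidate_pairs` function, the `max_sentence_gap` parameter, the label names, and the example entities are all assumptions introduced here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Entity:
    """A recognized entity (hypothetical structure, for illustration)."""
    text: str
    label: str      # illustrative label names such as "Drug" or "ADE"
    sentence: int   # index of the sentence that contains the entity


def candidate_pairs(entities, max_sentence_gap=1):
    """Split entity pairs into single-sentence and cross-sentence candidates.

    max_sentence_gap is an assumed cutoff: pairs further apart than this
    many sentences are not considered as candidates at all.
    """
    single, cross = [], []
    for a, b in combinations(entities, 2):
        gap = abs(a.sentence - b.sentence)
        if gap == 0:
            single.append((a, b))
        elif gap <= max_sentence_gap:
            cross.append((a, b))
    return single, cross


# Made-up entities; not taken from the MADE1.0 corpus.
entities = [
    Entity("warfarin", "Drug", 0),
    Entity("bleeding", "ADE", 0),
    Entity("5 mg", "Dose", 1),
    Entity("rash", "ADE", 3),
]
single, cross = candidate_pairs(entities)
print(len(single), len(cross))
```

Under this sketch, each group of candidate pairs could then be passed to its own classifier (for example, an SVM or a random forest), which is one plausible way to compare models separately on the two relation types.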
MADEx achieved one of the top three performances (F1 score of 0.8233) for clinical NER in the 2018 Medication and Adverse Drug Events (MADE1.0) challenge. The post-challenge evaluation showed that the relation extraction module and the integrated pipeline (which identifies entities and relations together) of MADEx are comparable with the best systems developed in this challenge.
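For readers unfamiliar with the metric, the sketch below shows how an entity-level F1 score can be computed from gold and predicted annotations. It assumes an exact-match criterion on `(start, end, label)` tuples, which may differ from the official MADE1.0 scoring; the function name `precision_recall_f1` and the example spans are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 over exactly matching entities.

    Each entity is a (start, end, label) tuple; exact match is an
    assumption made for this illustration.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up spans: 2 correct predictions, 2 spurious, 1 gold entity missed.
gold = {(0, 8, "Drug"), (20, 28, "ADE"), (40, 44, "Dose")}
predicted = {(0, 8, "Drug"), (20, 28, "ADE"), (50, 55, "ADE"), (60, 64, "Drug")}
p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

F1 is the harmonic mean of precision and recall, so a system has to do well on both to score highly.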
This study demonstrated the efficiency of deep learning methods for automatic extraction of medications, ADEs, and their relations from clinical text to support pharmacovigilance and drug safety surveillance.
The authors would like to thank the organizers who provided the annotated corpus and word embeddings for this challenge, and gratefully acknowledge the support of the NVIDIA Corporation with the donation of the GPUs used for this research. The authors would also like to thank the anonymous reviewers for their helpful feedback.
Compliance with Ethical Standards
This study was supported in part by the University of Florida Clinical and Translational Science Institute, which is funded by the National Institutes of Health (NIH) National Center for Advancing Translational Sciences under award number UL1TR001427, and the OneFlorida Clinical Research Consortium, which is funded by the Patient-Centered Outcomes Research Institute (PCORI) under award number CDRN-1501-26692. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Conflict of Interest
Xi Yang, Jiang Bian, Yan Gong, William R. Hogan, and Yonghui Wu have no conflicts of interest to declare that are directly relevant to the contents of this study.
This study utilized de-identified clinical notes provided by the University of Massachusetts Medical School through the MADE1.0 challenge, and was approved by the University of Florida Institutional Review Board.