Multi-layered Learning for Information Extraction from Adverse Drug Event Narratives

Wunnava, Susmitha; Qin, Xiao; Kakar, Tabassum; Tlachac, M. L.; Kong, Xiangnan; Rundensteiner, Elke A.; Sahoo, Sanjay K.; De, Suranjan

doi:10.1007/978-3-030-29196-9_22

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1024)

Included in the following conference series:

International Joint Conference on Biomedical Engineering Systems and Technologies

Abstract

Recognizing named entities in Adverse Drug Reaction narratives is a crucial step towards extracting valuable patient information from unstructured text and transforming it into an easily processable structured format. This motivates the use of advanced data analytics to support data-driven pharmacovigilance. Yet existing biomedical named entity recognition (NER) tools are limited in their ability to identify certain entity types in these domain-specific narratives, resulting in poor accuracy. To address this shortcoming, we propose a novel methodology called Tiered Ensemble Learning System with Diversity (TELS-D), an ensemble approach that integrates a rich variety of named entity recognizers to produce the final result. Biomedical NER faces two specific challenges: class imbalance and the lack of a single best-performing method. We address the first challenge with a balanced, under-sampled bagging strategy that depends on the imbalance level and thereby overcomes the highly skewed data problem. To address the second challenge, we design an ensemble of heterogeneous entity recognizers that leverages a novel ensemble combiner. Our experimental results demonstrate that, for biomedical text datasets, (i) a balanced learning environment combined with an ensemble of heterogeneous classifiers consistently improves performance over individual base learners, and (ii) stacking-based ensemble combiner methods outperform simple majority-voting-based solutions by 0.3 in F1-score.
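As a rough illustration of the two ideas named in the abstract, the sketch below builds balanced, under-sampled bags from imbalanced token labels and combines base-learner outputs by simple majority voting (the baseline combiner the abstract compares against). It is not the TELS-D implementation: the function names `balanced_bags` and `majority_vote`, the toy labels, and the rule that sets the number of bags to the imbalance ratio are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def balanced_bags(labels, seed=0):
    """Build under-sampled bags in which every class has equal size.

    Assumption (not from the paper): the number of bags equals the
    imbalance ratio, ceil(largest class size / smallest class size).
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    smallest = min(len(ids) for ids in by_class.values())
    largest = max(len(ids) for ids in by_class.values())
    n_bags = math.ceil(largest / smallest)

    bags = []
    for _ in range(n_bags):
        bag = []
        for ids in by_class.values():
            # Sample without replacement down to the smallest class size.
            bag.extend(rng.sample(ids, smallest))
        bags.append(sorted(bag))
    return bags


def majority_vote(predictions):
    """Combine per-learner label sequences token by token."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy token labels: "O" (outside any entity) outnumbers "DRUG" 8 to 2.
labels = ["O"] * 8 + ["DRUG"] * 2
bags = balanced_bags(labels)
print(len(bags), [len(b) for b in bags])

# Three hypothetical base recognizers labelling the same four tokens.
preds = [
    ["O", "DRUG", "O", "O"],
    ["O", "DRUG", "DRUG", "O"],
    ["O", "O", "DRUG", "O"],
]
print(majority_vote(preds))
```

Each bag would train one base learner on class-balanced data. A stacking-based combiner, as favoured in the abstract, would replace `majority_vote` with a meta-learner trained on the base learners' predictions; that step is not shown here.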

We are grateful for funding that supported this research in part, including from the Seeds of STEM at WPI via the Institute of Education Sciences, U.S. Department of Education grant R305A150571; Oak Ridge Associated Universities (ORAU) for two ORISE Fellowships to conduct research with the U.S. Food and Drug Administration; and the Department of Education GAANN fellowship grant P200A150306.

Notes

1. https://www.fda.gov/safety/medwatch/

Author information

Authors and Affiliations

Worcester Polytechnic Institute, Worcester, MA, USA
Susmitha Wunnava, Xiao Qin, Tabassum Kakar, M. L. Tlachac, Xiangnan Kong & Elke A. Rundensteiner
Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, USA
Sanjay K. Sahoo & Suranjan De

Corresponding authors

Correspondence to Susmitha Wunnava, Xiao Qin, Tabassum Kakar, Xiangnan Kong, Elke A. Rundensteiner, Sanjay K. Sahoo or Suranjan De.

Editor information

Editors and Affiliations

University of Sao Paulo, São Paulo, Brazil
Alberto Cliquet Jr.
University of Saskatchewan, Saskatoon, Canada
Sheldon Wiebe
College of Charleston, Charleston, SC, USA
Paul Anderson
University of Rome Tor Vergata, Rome, Italy
Giovanni Saggio
Aberystwyth University, Aberystwyth, UK
Reyer Zwiggelaar
LIBPhys, New University of Lisbon, Lisbon, Portugal
Hugo Gamboa
Instituto de Telecomunicações and Instituto Superior Técnico, Lisbon, Portugal
Ana Fred
Madeira Interactive Technologies Institute, Universidade da Madeira, Funchal, Portugal
Sergi Bermúdez i Badia

About this paper

Cite this paper

Wunnava, S. et al. (2019). Multi-layered Learning for Information Extraction from Adverse Drug Event Narratives. In: Cliquet Jr., A., et al. Biomedical Engineering Systems and Technologies. BIOSTEC 2018. Communications in Computer and Information Science, vol 1024. Springer, Cham. https://doi.org/10.1007/978-3-030-29196-9_22

DOI: https://doi.org/10.1007/978-3-030-29196-9_22
Published: 13 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29195-2
Online ISBN: 978-3-030-29196-9
eBook Packages: Computer Science, Computer Science (R0)
