Multi-layered Learning for Information Extraction from Adverse Drug Event Narratives

  • Conference paper
Biomedical Engineering Systems and Technologies (BIOSTEC 2018)

Abstract

Recognizing named entities in Adverse Drug Reaction narratives is a crucial step towards extracting valuable patient information from unstructured text and transforming it into an easily processable structured format. This motivates the use of advanced data analytics to support data-driven pharmacovigilance. Yet existing biomedical named entity recognition (NER) tools are limited in their ability to identify certain entity types in these domain-specific narratives, resulting in poor accuracy. To address this shortcoming, we propose a novel methodology called Tiered Ensemble Learning System with Diversity (TELS-D), an ensemble approach that integrates a rich variety of named entity recognizers to produce the final result. Biomedical NER faces two specific challenges: the classes are imbalanced, and no single method performs best. We address the first challenge through a balanced, under-sampled bagging strategy that depends on the imbalance level to overcome this highly skewed data problem. To address the second challenge, we design an ensemble of heterogeneous entity recognizers that leverages a novel ensemble combiner. Our experimental results demonstrate that for biomedical text datasets: (i) a balanced learning environment combined with an ensemble of heterogeneous classifiers consistently improves performance over individual base learners and (ii) stacking-based ensemble combiner methods outperform simple majority-voting-based solutions by 0.3 in F1-score.
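
The two ideas named in the abstract, balanced under-sampled bagging and a majority-voting combiner (the baseline that stacking is compared against), can be illustrated with a minimal Python sketch. This is not the TELS-D implementation. The function names, the "O" label for the majority (non-entity) class, and the rule that the number of bags equals the rounded majority-to-minority ratio are all assumptions made for illustration.

```python
import random
from collections import Counter


def balanced_bags(labels, majority="O", seed=0):
    """Build class-balanced bags of token indices by under-sampling the majority class.

    Assumption (not from the paper): the number of bags is the rounded
    majority/minority ratio, so a more skewed dataset yields more bags.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y != majority]
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    if not minority_idx or not majority_idx:
        # Nothing to balance: return a single bag with every index.
        return [list(range(len(labels)))]
    n_bags = max(1, round(len(majority_idx) / len(minority_idx)))
    bags = []
    for _ in range(n_bags):
        # Each bag keeps all minority tokens plus an equal-sized majority sample.
        k = min(len(minority_idx), len(majority_idx))
        sampled = rng.sample(majority_idx, k)
        bags.append(sorted(minority_idx + sampled))
    return bags


def majority_vote(predictions):
    """Combine per-recognizer label sequences token by token.

    Ties go to the label that appears first among the recognizers.
    """
    return [Counter(token_preds).most_common(1)[0][0]
            for token_preds in zip(*predictions)]


if __name__ == "__main__":
    labels = ["O"] * 8 + ["DRUG"] * 2
    for bag in balanced_bags(labels):
        print(bag)
    print(majority_vote([["O", "DRUG"], ["DRUG", "DRUG"], ["O", "O"]]))
```

A stacking combiner would replace `majority_vote` with a meta-classifier trained on the base recognizers' outputs; that step is omitted here because the abstract does not specify the meta-learner.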

We are grateful for funding that supported this research in part, including from the Seeds of STEM at WPI via the Institute of Education Sciences, U.S. Department of Education grant R305A150571; Oak Ridge Associated Universities (ORAU) for two ORISE Fellowships to conduct research with the U.S. Food and Drug Administration; and the Department of Education GAANN fellowship grant P200A150306.

Corresponding authors

Correspondence to Susmitha Wunnava, Xiao Qin, Tabassum Kakar, Xiangnan Kong, Elke A. Rundensteiner, Sanjay K. Sahoo or Suranjan De.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Wunnava, S. et al. (2019). Multi-layered Learning for Information Extraction from Adverse Drug Event Narratives. In: Cliquet Jr., A., et al. Biomedical Engineering Systems and Technologies. BIOSTEC 2018. Communications in Computer and Information Science, vol 1024. Springer, Cham. https://doi.org/10.1007/978-3-030-29196-9_22

  • DOI: https://doi.org/10.1007/978-3-030-29196-9_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29195-2

  • Online ISBN: 978-3-030-29196-9

  • eBook Packages: Computer Science (R0)
