Comparing NLP Systems to Extract Entities of Eligibility Criteria in Dietary Supplements Clinical Trials Using NLP-ADAPT

  • Conference paper
  • In: Artificial Intelligence in Medicine (AIME 2020)

Abstract

Natural Language Processing (NLP) techniques have been used extensively to extract concepts from unstructured clinical trial eligibility criteria. Recruiting patients whose Electronic Health Record information matches clinical trial eligibility criteria can potentially facilitate and accelerate the recruitment process. However, a significant obstacle is identifying an efficient Named Entity Recognition (NER) system to parse the eligibility criteria. In this study, we used NLP-ADAPT (Artifact Discovery and Preparation Toolkit) to compare existing biomedical NLP systems (BioMedICUS, CLAMP, cTAKES and MetaMap) and their Boolean ensemble in identifying entities in the eligibility criteria of 150 randomly selected Dietary Supplement (DS) clinical trials. We created a custom mapping of the gold standard annotated entities to UMLS semantic types to align them with the annotations from each system. All systems in NLP-ADAPT used their default pipelines to extract entities based on our custom mappings. The systems performed reasonably well in extracting UMLS concepts belonging to the semantic groups Disorders and Chemicals and Drugs. Among the individual systems, cTAKES performed best for the Chemicals and Drugs and Disorders semantic groups, and BioMedICUS performed best for Procedures, Living Beings, Concepts and Ideas, and Devices. The Boolean ensemble, however, outperformed the individual systems. This study sets a baseline that can potentially be improved with modifications to the NLP-ADAPT pipeline.
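The idea of a Boolean ensemble can be illustrated with a minimal Python sketch. It is not the NLP-ADAPT implementation: the `Entity` type, the exact-span matching rule, the `prf` helper, and all of the toy annotations below are assumptions made only for illustration. The sketch treats each system's output as a set of annotations, combines two systems with OR (set union) and AND (set intersection), and scores each result against a gold standard.

```python
from typing import NamedTuple, Set, Tuple


class Entity(NamedTuple):
    """Hypothetical annotation: character span plus a semantic group label."""
    start: int
    end: int
    group: str


def prf(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float, float]:
    """Precision, recall and F1 under exact span-and-label matching (an assumption)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data, invented for this example only.
gold = {
    Entity(0, 8, "Disorders"),
    Entity(20, 29, "Chemicals and Drugs"),
    Entity(40, 47, "Procedures"),
}
system_a = {
    Entity(0, 8, "Disorders"),
    Entity(20, 29, "Chemicals and Drugs"),
    Entity(50, 55, "Disorders"),  # spurious annotation
}
system_b = {
    Entity(0, 8, "Disorders"),
    Entity(40, 47, "Procedures"),
}

# Boolean combinations of the two systems' outputs.
ensemble_or = system_a | system_b   # annotation kept if either system found it
ensemble_and = system_a & system_b  # annotation kept only if both systems found it

for name, pred in [("A", system_a), ("B", system_b),
                   ("A OR B", ensemble_or), ("A AND B", ensemble_and)]:
    p, r, f = prf(pred, gold)
    print(f"{name:8s} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy setup the OR combination recovers every gold entity at some cost in precision, while the AND combination keeps only annotations both systems agree on. Real comparisons would also need a policy for partial span overlaps, which this sketch deliberately leaves out.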

A. Bompelli and G. Silverman are equal-contribution first authors.

Notes

  1. https://z.umn.edu/annotation_guidelines

Acknowledgements

This work was partially supported by the NIH’s National Center for Complementary and Integrative Health and the Office of Dietary Supplements under grant number R01AT009457 (Zhang), and by the National Center for Advancing Translational Sciences under grant numbers UL1TR002494 and U01TR002062.

Author information

Corresponding author

Correspondence to Rui Zhang.

Appendix

Fig. 2. Overview of the study

Fig. 3. Mapping to UMLS semantic groups across NLP systems

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Bompelli, A. et al. (2020). Comparing NLP Systems to Extract Entities of Eligibility Criteria in Dietary Supplements Clinical Trials Using NLP-ADAPT. In: Michalowski, M., Moskovitch, R. (eds) Artificial Intelligence in Medicine. AIME 2020. Lecture Notes in Computer Science, vol 12299. Springer, Cham. https://doi.org/10.1007/978-3-030-59137-3_7

  • DOI: https://doi.org/10.1007/978-3-030-59137-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59136-6

  • Online ISBN: 978-3-030-59137-3

  • eBook Packages: Computer Science, Computer Science (R0)
