Relying on Discourse Trees to Extract Medical Ontologies from Text

  • Conference paper
Artificial Intelligence (RCAI 2021)

Abstract

We explore the role of discourse analysis in ontology construction. When extracting candidate phrases from text to form ontology entries, it is important to consider which discourse units those phrases occur in, because not all discourse units contribute equally to forming ontology entries. We survey text mining and ontology information extraction techniques in the medical domain and select those that make the most use of advanced linguistic analysis, including the discourse level, to produce a robust and efficient ontology. Using several real-life medical datasets, we evaluate the consistency of the resulting ontology and its role in ensuring high search relevance, and we show the importance of introducing discourse information into ontology construction.

Notes

  1. https://medlineplus.gov/druginfo/meds/a603012.html

Author information

Corresponding author

Correspondence to Elizaveta Goncharova.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Galitsky, B., Ilvovsky, D., Goncharova, E. (2021). Relying on Discourse Trees to Extract Medical Ontologies from Text. In: Kovalev, S.M., Kuznetsov, S.O., Panov, A.I. (eds) Artificial Intelligence. RCAI 2021. Lecture Notes in Computer Science, vol 12948. Springer, Cham. https://doi.org/10.1007/978-3-030-86855-0_15

  • DOI: https://doi.org/10.1007/978-3-030-86855-0_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86854-3

  • Online ISBN: 978-3-030-86855-0

  • eBook Packages: Computer Science (R0)
