Abstract
We explore the role of discourse analysis in ontology construction. When extracting candidate phrases from text to form ontology entries, it is important to consider which discourse units these phrases occur in, because not all discourse units contribute equally to forming ontology entries. We survey text mining and ontology information extraction techniques in the medical domain and select those that make the most use of advanced linguistic analysis, including the discourse level, to produce a robust and efficient ontology. We evaluate the consistency of the resulting ontology and its role in ensuring high search relevance on several real-life medical datasets, and show the importance of introducing discourse information into ontology construction.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Galitsky, B., Ilvovsky, D., Goncharova, E. (2021). Relying on Discourse Trees to Extract Medical Ontologies from Text. In: Kovalev, S.M., Kuznetsov, S.O., Panov, A.I. (eds) Artificial Intelligence. RCAI 2021. Lecture Notes in Computer Science, vol 12948. Springer, Cham. https://doi.org/10.1007/978-3-030-86855-0_15
DOI: https://doi.org/10.1007/978-3-030-86855-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86854-3
Online ISBN: 978-3-030-86855-0
eBook Packages: Computer Science, Computer Science (R0)