Abstract
Word Sense Disambiguation (WSD) is a subfield of Natural Language Processing that determines which meaning of a given term is used in a specific context. In the biomedical domain in particular, automatic processing of medical documents that mention biomolecular entities, diseases, proteins, and genes, for instance, is challenging, mainly because many of the terms are ambiguous. Word Sense Disambiguation algorithms aim to identify the correct sense of an ambiguous word in its specific context. This paper reviews the existing methods and datasets for WSD in the biomedical domain. The methods range from knowledge-based approaches to supervised and unsupervised machine learning, and have been applied to different types of datasets such as PubMed and PubMed Central, MIMIC-III, and MEDLINE. The main findings of our study are the following: (i) The lack of large, publicly available test datasets containing medical ambiguities has limited the application of WSD in the biomedical domain. (ii) Whilst unsupervised methods rely mainly on the UMLS Metathesaurus and can be applied widely, supervised learning methods, although restricted to a manually annotated dataset containing both ambiguous words and their definitions, have given promising results in selecting the best definition for a given entity. (iii) Without accurate word sense disambiguation approaches, automatic analysis of massive health-related corpora is prone to error.
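To make the task concrete, the following is a minimal sketch of knowledge-based disambiguation by gloss overlap (a simplified Lesk-style heuristic), not a method taken from the reviewed papers. The sense inventory `SENSES`, its glosses, the stopword list, and the function names are hypothetical; a real system would draw candidate senses and definitions from a resource such as the UMLS Metathesaurus.

```python
import re

# Hypothetical toy sense inventory. The glosses are illustrative only and are
# NOT taken from the UMLS Metathesaurus or any other real resource.
SENSES = {
    "cold": {
        "common_cold": "viral infection of the upper respiratory tract "
                       "causing cough and sore throat",
        "cold_temperature": "low temperature sensation or exposure of the "
                            "body to a cool environment",
    }
}

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "or", "with", "in",
             "for", "was", "is", "had"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword alphabetic tokens."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def disambiguate(term, context, senses=SENSES):
    """Pick the sense whose gloss shares the most tokens with the context.

    Returns the best sense label and the overlap score of every candidate.
    Ties are resolved in favour of the first sense listed.
    """
    context_tokens = tokenize(context) - {term}
    scores = {
        sense: len(context_tokens & tokenize(gloss))
        for sense, gloss in senses[term].items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    sentence = "The patient presented with a cough and sore throat after a cold."
    print(disambiguate("cold", sentence))
```

Supervised approaches replace the overlap count with a classifier trained on annotated occurrences of each ambiguous term, which is why they depend on manually labelled data.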
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
El Hannaoui, O., Nfaoui, E.H., El Haoussi, F. (2023). Word Sense Disambiguation in the Biomedical Domain: Short Literature Review. In: Kacprzyk, J., Ezziyyani, M., Balas, V.E. (eds) International Conference on Advanced Intelligent Systems for Sustainable Development. AI2SD 2022. Lecture Notes in Networks and Systems, vol 713. Springer, Cham. https://doi.org/10.1007/978-3-031-35248-5_23
DOI: https://doi.org/10.1007/978-3-031-35248-5_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35247-8
Online ISBN: 978-3-031-35248-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)