Word Sense Disambiguation in the Biomedical Domain: Short Literature Review

  • Conference paper
International Conference on Advanced Intelligent Systems for Sustainable Development (AI2SD 2022)

Abstract

Word Sense Disambiguation (WSD) is a subfield of Natural Language Processing concerned with discerning which meaning of a given term is used in a specific context. In the biomedical domain especially, automatic processing of medical documents containing, for instance, biomolecular entities, diseases, proteins, and genes is demanding, chiefly because many of the terms are ambiguous. WSD algorithms aim to identify the correct sense of a given ambiguous word in its specific context. This paper reviews the existing methods and datasets for WSD in the biomedical domain. The methods range from knowledge-based methods to supervised and unsupervised machine learning methods, and have been applied to different types of datasets such as PubMed and PubMed Central, MIMIC-III, and MEDLINE. The main findings of our study are the following: (i) The lack of large, publicly available test datasets containing medical ambiguities has limited the application of WSD in the biomedical domain. (ii) Whilst unsupervised methods rely mainly on the UMLS Metathesaurus and can be applied widely, the more restricted use of a manually annotated dataset containing both ambiguous words and their definitions, together with supervised learning methods, has given promising results in providing the best definition for a given entity. (iii) Without accurate word sense disambiguation approaches, automatic analysis of massive health-related corpora would be liable to err.
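
To make the task concrete, the following is a minimal sketch of a knowledge-based disambiguation step, assuming a simplified gloss-overlap (Lesk-style) scoring. It is not the method of any system surveyed here. The sense inventory `SENSES`, its glosses, the stopword list, and the function names are hypothetical and written for illustration only; a real system would draw senses and definitions from a resource such as the UMLS Metathesaurus.

```python
import re

# Hypothetical toy sense inventory. The glosses are illustrative only
# and are NOT taken from the UMLS Metathesaurus or any real resource.
SENSES = {
    "cold": {
        "common_cold": "viral infection of the nose and throat causing cough and congestion",
        "cold_temperature": "low temperature or absence of heat in the environment",
    }
}

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "the", "of", "or", "in", "to", "with", "was", "is", "for"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword alphabetic tokens."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def disambiguate(term, context):
    """Pick the sense whose gloss shares the most tokens with the context.

    Returns the chosen sense label and the overlap score of every sense.
    Ties are resolved in favour of the sense listed first in SENSES.
    """
    context_tokens = tokenize(context)
    scores = {
        sense: len(context_tokens & tokenize(gloss))
        for sense, gloss in SENSES[term].items()
    }
    return max(scores, key=scores.get), scores


clinical = "The patient presented with a cough, nasal congestion and a sore throat after catching a cold"
storage = "Samples were stored at low temperature in a cold room"

print(disambiguate("cold", clinical)[0])
print(disambiguate("cold", storage)[0])
```

In this sketch the same surface form "cold" resolves to different senses depending only on the surrounding words, which is the behaviour WSD methods are evaluated on. Supervised approaches replace the hand-written overlap score with a classifier trained on annotated examples.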

Author information

Corresponding author

Correspondence to Oumayma El Hannaoui.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

El Hannaoui, O., Nfaoui, E.H., El Haoussi, F. (2023). Word Sense Disambiguation in the Biomedical Domain: Short Literature Review. In: Kacprzyk, J., Ezziyyani, M., Balas, V.E. (eds) International Conference on Advanced Intelligent Systems for Sustainable Development. AI2SD 2022. Lecture Notes in Networks and Systems, vol 713. Springer, Cham. https://doi.org/10.1007/978-3-031-35248-5_23
