Abstract
A common way to represent a text is as a Bag of Words (BoW). This representation ignores the semantics of the original text, so the resulting document vectors carry little sense information. Conceptualization, by contrast, uses background knowledge to enrich the document representation. When the senses of a polysemous term are looked up in semantic resources, multiple matches are found, which introduces ambiguity into the final document representation. Three disambiguation strategies can be used: First Concept, All Concepts, and Context-Based. SenseRelate is a well-known Context-Based algorithm that uses a fixed window size and weights each context term by its distance from the target word. This weighting may negatively affect the concepts or senses that are selected.
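As a rough illustration of the difference between BoW and a conceptualized representation, the sketch below maps terms to concepts under the First Concept and All Concepts strategies. The sense inventory, the concept identifiers, and the function names are hypothetical and are not taken from the paper or from any real resource; replacing matched terms by their concepts (rather than adding concepts alongside the terms) is also an assumption.

```python
from collections import Counter

# Hypothetical toy sense inventory: term -> candidate concepts, in resource order.
# These identifiers are invented for illustration; they are not real UMLS/MeSH IDs.
SENSES = {
    "cold": ["C_common_cold", "C_low_temperature"],
    "fever": ["C_fever"],
}


def bag_of_words(tokens):
    """Plain BoW: count surface terms, with no semantics attached."""
    return Counter(tokens)


def conceptualize(tokens, strategy="first"):
    """Replace matched terms by concepts; unmatched terms stay as words."""
    bag = Counter()
    for tok in tokens:
        candidates = SENSES.get(tok)
        if not candidates:
            bag[tok] += 1                 # no match in the resource
        elif strategy == "first":
            bag[candidates[0]] += 1       # First Concept: keep the first match only
        elif strategy == "all":
            bag.update(candidates)        # All Concepts: keep every match
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return bag


tokens = "patient with cold and fever".split()
bow = bag_of_words(tokens)
first_concept = conceptualize(tokens, "first")
all_concepts = conceptualize(tokens, "all")
```

In this toy example, First Concept keeps a single sense for "cold" regardless of context, while All Concepts keeps both senses and so carries the ambiguity into the document vector.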
To avoid this effect of distance weighting, and thereby improve biomedical Word Sense Disambiguation (WSD), we propose a simple modified version of the SenseRelate algorithm, named NoDistanceSenseRelate, which ignores distance: all terms in the context receive the same weight.
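The contrast between distance-weighted and uniform context scoring can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the relatedness table holds invented values, the `1 / distance` weighting is one plausible choice (the abstract does not specify the formula), and all names are hypothetical.

```python
# Hypothetical relatedness scores between a candidate sense and a context word.
# A real system would compute these with a semantic relatedness measure.
RELATEDNESS = {
    ("C_low_temperature", "winter"): 0.6,
    ("C_common_cold", "fever"): 0.9,
}


def relatedness(sense, word):
    """Look up the toy relatedness score; unknown pairs score 0."""
    return RELATEDNESS.get((sense, word), 0.0)


def disambiguate(tokens, target_index, candidates, window=2, use_distance=True):
    """Pick the candidate sense most related to the words in a fixed window.

    use_distance=True  : weight each context word by 1 / distance (assumed form).
    use_distance=False : give every context word the same weight of 1.
    """
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    scores = {}
    for sense in candidates:
        total = 0.0
        for i in range(lo, hi):
            if i == target_index:
                continue  # the target word is not part of its own context
            weight = 1.0 / abs(i - target_index) if use_distance else 1.0
            total += weight * relatedness(sense, tokens[i])
        scores[sense] = total
    best = max(candidates, key=lambda s: scores[s])
    return best, scores


tokens = ["winter", "cold", "and", "fever"]
candidates = ["C_common_cold", "C_low_temperature"]

with_distance, _ = disambiguate(tokens, 1, candidates, use_distance=True)
no_distance, _ = disambiguate(tokens, 1, candidates, use_distance=False)
```

With these toy values, the distance-weighted variant selects `C_low_temperature` because the adjacent word "winter" dominates, while the uniform variant selects `C_common_cold` because the more strongly related but more distant word "fever" is no longer discounted.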
To compare SenseRelate and NoDistanceSenseRelate with the other methods, we conducted several experiments on the OHSUMED corpus. Results obtained with a biomedical text categorization system based on three machine learning models, Support Vector Machine (SVM), Naïve Bayes (NB), and Maximum Entropy (ME), show that the Context-Based methods (SenseRelate and NoDistanceSenseRelate) outperform the others.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Rais, M., Lachkar, A. (2016). Evaluation of Disambiguation Strategies on Biomedical Text Categorization. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2016. Lecture Notes in Computer Science, vol 9656. Springer, Cham. https://doi.org/10.1007/978-3-319-31744-1_68
DOI: https://doi.org/10.1007/978-3-319-31744-1_68
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31743-4
Online ISBN: 978-3-319-31744-1
eBook Packages: Computer Science, Computer Science (R0)