Abstract
In this paper, we propose a system for Arabic word sense disambiguation that combines unsupervised and knowledge-based methods by using the Harman, Croft, and Okapi information retrieval measures together with the Lesk algorithm. The system aims to resolve lexical semantic ambiguity in Arabic. The information retrieval measures estimate the most relevant sense of the ambiguous word by returning a semantic coherence score for the context that is semantically closest to the original sentence containing the ambiguous word. The Lesk algorithm then assigns and selects the appropriate sense from those proposed by these measures. This selection is based on a comparison between the glosses of the word to be disambiguated and its different contexts of use extracted from a corpus. Our experimental study shows that combining the Lesk algorithm with the Harman, Croft, and Okapi measures yields an accuracy of 73%.
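The two-stage idea in the abstract (rank corpus contexts of the ambiguous word by closeness to the original sentence, then pick the sense whose gloss best matches those contexts) can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the paper. It implements only an Okapi-style measure, using the common BM25 weighting with k1 = 1.2 and b = 0.75, and does not reproduce the Harman or Croft measures. It uses plain whitespace tokenization with no Arabic normalization or root extraction. It applies a simplified Lesk overlap count. The function names `tokenize`, `bm25_scores` and `lesk_select`, the `top_k` parameter, and the toy English glosses and contexts are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace splitting only. A real Arabic system
    # would also normalize letters and extract roots or stems.
    return text.lower().split()


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenized document against the query."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if tf[term] == 0:
                continue  # term absent from this document
            # Non-negative IDF variant, log(1 + ...); other BM25 variants exist.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def lesk_select(sentence: str, contexts: list[str],
                glosses: dict[str, str], top_k: int = 2) -> str:
    """Hypothetical pipeline: BM25-rank the contexts, then simplified Lesk."""
    query = tokenize(sentence)
    docs = [tokenize(c) for c in contexts]
    scores = bm25_scores(query, docs)
    # Indices of the top_k contexts closest to the original sentence.
    best = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:top_k]
    signature = set(query)
    for i in best:
        signature |= set(docs[i])

    def overlap(sense: str) -> int:
        # Simplified Lesk: count distinct gloss words found in the signature.
        return len(set(tokenize(glosses[sense])) & signature)

    # max() returns the first sense in dict order when overlaps tie.
    return max(glosses, key=overlap)


# Toy example (English placeholders, not data from the paper).
glosses = {
    "bank/finance": "institution that keeps money and gives loans",
    "bank/river": "sloping land beside a river or stream",
}
contexts = [
    "the river bank was muddy after rain",
    "the bank approved the loan",
    "children fished in the river near the stream bank",
]
print(lesk_select("they fished from the bank", contexts, glosses))  # bank/river
```

Here the third context ranks highest because it shares the rare word "fished" with the sentence. Its words "river" and "stream" then match the second gloss. A closer reproduction of the paper would substitute its own Harman, Croft, and Okapi scoring and Arabic preprocessing for these placeholders.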
References
Agirre E, Rigau G (1996) Word sense disambiguation using conceptual density. In: Proceedings of COLING, the 16th international conference on computational linguistics. Copenhagen, Denmark
Agirre E, Edmonds P (2006) Word sense disambiguation: algorithms and applications. Springer
Al-Shalabi R, Kanaan G, Al-Serhan H (2003) New approach for extracting Arabic roots. In: The international Arab conference on information technology
Al-Sulaiti L (2004a) Arabic corpora and corpus analysis tools. In: The 11th conference on natural language processing
Al-Sulaiti L (2004b) Designing and developing a corpus of contemporary Arabic. PhD thesis
Banerjee S (2002) Adapting the Lesk algorithm for word sense disambiguation to WordNet. Master’s thesis, University of Minnesota, USA
Bar-Hillel Y (1960) Automatic translation of languages. In: Advances in computers. Academic Press, New York
Belgacem M, Zouaghi A, Zrigui M, Antoniadis G (2009) Amelioration of the performance of a semantic analyzer for the comprehension of the spontaneous Arabic speech. In: The international conference on artificial intelligence and pattern recognition
Black WJ, Elkateb S (2004) A prototype English–Arabic dictionary based on WordNet. In: The 2nd Global WordNet Conference (GWC 2004). Czech Republic
Chen P, Ding W, Bowes C, Brown D (2009) A fully unsupervised word sense disambiguation method using dependency knowledge. In: The annual conference of the North American chapter of the ACL
Croft W (1983) Experiments with representation in a document retrieval system. Res Dev 2(1): 1–21
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41: 391–407
Diab M, Resnik P (2002) An unsupervised method for word sense tagging using parallel corpora. In: The 40th meeting of the Association for Computational Linguistics. Philadelphia, USA
Dictionary Al Wasit (2003) Academy of the Arabic language, 4th Edn. Sunrise International Library. http://www.almeshkat.net/books/archive/books/almuajm%20alwasetzip
Elmougy S, Taher H, Noaman H (2008) Naïve Bayes classifier for Arabic word sense disambiguation. In: The NLP.
Fedaghi S, Al-Anzi F (1989) A new algorithm to extract Arabic root-pattern forms. In: The 11th national computer conference. Saudi Arabia
Gal Y (2002) An HMM approach to vowel restoration in Arabic and Hebrew. In: The ACL workshop on computational approaches to Semitic languages. Philadelphia, PA
Harman D (1986) An experimental study of factors important in document ranking. In: The ACM conference on research and development in information retrieval
Ide N, Véronis J (1998) Word sense disambiguation: the state of the art. Comput Linguist 24(1): 1–40
Lesk M (1986) Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In: The 5th annual international conference on systems documentation
Navigli R (2009) Word sense disambiguation: a survey. ACM Comput Surv 41(2): 1–69
Nelken R, Shieber S (2005) Arabic diacritization using weighted finite-state transducers. In: The ACL workshop on computational approaches to Semitic languages. Ann Arbor, Michigan
Robertson S, Walker M, Gatford M (1994) Okapi at TREC-3. In: The 3rd Text REtrieval Conference (TREC-3), NIST special publication
Vasilescu F (2003) Monolingual corpus disambiguation using Lesk-type approaches. MSc thesis in computer science, Faculty of Arts and Sciences, University of Montreal
Zitouni I, Sorensen J, Sarikaya R (2006) Maximum entropy based restoration of Arabic diacritics. In: The 21st international conference on computational linguistics and 44th annual meeting of the association for computational linguistics (COLING-ACL). Sydney, Australia
Zouaghi A, Zrigui M, Antoniadis G (2008) Compréhension de la parole arabe spontanée : une modélisation numérique [Understanding spontaneous Arabic speech: a numerical modeling]. Revue TAL 49(1), Varia
Cite this article
Zouaghi, A., Merhbene, L. & Zrigui, M. Combination of information retrieval methods with LESK algorithm for Arabic word sense disambiguation. Artif Intell Rev 38, 257–269 (2012). https://doi.org/10.1007/s10462-011-9249-3
DOI: https://doi.org/10.1007/s10462-011-9249-3