Combination of information retrieval methods with LESK algorithm for Arabic word sense disambiguation

Artificial Intelligence Review

Abstract

In this paper, we propose combining the Harman, Croft, and Okapi measures with the Lesk algorithm to build a system for Arabic word sense disambiguation that draws on both unsupervised and knowledge-based methods. The system aims to resolve lexical semantic ambiguity in Arabic. The information retrieval measures estimate the most relevant sense of the ambiguous word by returning a semantic coherence score for the context of use that is semantically closest to the original sentence containing the ambiguous word. The Lesk algorithm then selects the appropriate sense from those proposed by the information retrieval measures. This selection is based on a comparison between the glosses of the word to be disambiguated and its different contexts of use extracted from a corpus. Our experimental study shows that using the Lesk algorithm with the Harman, Croft, and Okapi measures yields an accuracy of 73%.
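
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only an Okapi BM25-style score (the Harman and Croft measures are omitted), a simplified Lesk overlap count, and toy English data in place of an Arabic corpus and dictionary. The function names, the parameters `k1=1.2` and `b=0.75`, and the example sentence, contexts, and glosses are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the tokenized query (BM25-style)."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def lesk_overlap(gloss, context):
    """Simplified Lesk: number of distinct words shared by a gloss and a context."""
    return len(set(gloss) & set(context))


def disambiguate(sentence, contexts, glosses):
    """Pick the corpus context closest to the sentence, then the sense whose
    gloss overlaps most with that context."""
    scores = bm25_scores(sentence, contexts)
    best_context = contexts[scores.index(max(scores))]
    return max(glosses, key=lambda sense: lesk_overlap(glosses[sense], best_context))


# Hypothetical toy data for the ambiguous word "bank".
sentence = "he sat on the bank of the river".split()
contexts = [
    "the bank approved the loan and the money was deposited".split(),
    "they walked along the river bank near the water".split(),
]
glosses = {
    "financial": "institution that accepts money deposits and grants a loan".split(),
    "riverside": "sloping land beside a river or other body of water".split(),
}

print(disambiguate(sentence, contexts, glosses))
```

On this toy data the second context scores higher because it shares the rarer word "river" with the sentence, and the "riverside" gloss is the only one that overlaps with it. A real system would also need Arabic-specific preprocessing such as stop-word removal and root extraction, which this sketch leaves out.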

Author information

Corresponding author

Correspondence to Anis Zouaghi.

About this article

Cite this article

Zouaghi, A., Merhbene, L. & Zrigui, M. Combination of information retrieval methods with LESK algorithm for Arabic word sense disambiguation. Artif Intell Rev 38, 257–269 (2012). https://doi.org/10.1007/s10462-011-9249-3

  • DOI: https://doi.org/10.1007/s10462-011-9249-3
