Word sense disambiguation for Arabic text using Wikipedia and Vector Space Model

Abstract

In this research, we introduce a new approach to Arabic word sense disambiguation that uses Wikipedia as the lexical resource. The nearest sense of an ambiguous word is selected by representing texts in the Vector Space Model (VSM) and measuring the cosine similarity between the word's context and each sense retrieved from Wikipedia. Three experiments were conducted to evaluate the proposed approach: two use the first sentence retrieved from Wikipedia for each sense but differ in their VSM representation, while the third uses the first paragraph retrieved for each sense. The experiments show that using the first paragraph gives better results than using the first sentence, and that TF-IDF weighting gives better results than abstract frequency in the VSM. The proposed approach was also tested on English words, where it gives better results with the first sentence retrieved from Wikipedia for each sense.
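The sense-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names, and the two English glosses are all assumptions made for illustration, and the glosses are hard-coded rather than retrieved from Wikipedia. The sketch only shows the general idea of weighting terms with TF-IDF and choosing the sense whose text has the highest cosine similarity to the context.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (assumed; real Arabic text needs proper preprocessing)."""
    return text.lower().split()


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (a dict) per tokenized text."""
    n = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF (an assumed variant) so terms present in every text keep a nonzero weight.
    idf = {term: math.log((1 + n) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(context, sense_glosses):
    """Return the sense label whose gloss is most similar to the context, plus all scores."""
    labels = list(sense_glosses)
    texts = [tokenize(context)] + [tokenize(sense_glosses[label]) for label in labels]
    vectors = tfidf_vectors(texts)
    scores = {label: cosine(vectors[0], vec) for label, vec in zip(labels, vectors[1:])}
    return max(scores, key=scores.get), scores


# Hypothetical glosses standing in for the first sentence of each Wikipedia sense.
senses = {
    "bank (finance)": "a bank is a financial institution that accepts deposits and makes loans",
    "bank (river)": "a bank is the land alongside a river or lake",
}
context = "she sat on the bank of the river and watched the water"
best, scores = disambiguate(context, senses)
print(best, scores)
```

Replacing the TF-IDF weights with plain term counts in `tfidf_vectors` would give a frequency-only representation, which is one way to read the comparison between representations that the abstract reports.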


Author information

Corresponding author

Correspondence to Marwah Alian.

About this article

Cite this article

Alian, M., Awajan, A. & Al-Kouz, A. Word sense disambiguation for Arabic text using Wikipedia and Vector Space Model. Int J Speech Technol 19, 857–867 (2016). https://doi.org/10.1007/s10772-016-9376-y

Keywords

  • Arabic word sense disambiguation
  • Disambiguation resource
  • Vector space model
  • Wikipedia