International Journal of Speech Technology, Volume 19, Issue 4, pp 857–867

Word sense disambiguation for Arabic text using Wikipedia and Vector Space Model

  • Marwah Alian
  • Arafat Awajan
  • Akram Al-Kouz

Abstract

In this research we introduce a new approach to Arabic word sense disambiguation that uses Wikipedia as the lexical resource. The nearest sense for an ambiguous word is selected by representing the word's context and the senses retrieved from Wikipedia in a Vector Space Model (VSM) and measuring the cosine similarity between them. Three experiments were conducted to evaluate the proposed approach. Two experiments use the first sentence retrieved from Wikipedia for each sense but differ in their Vector Space Model representation, while the third uses the first paragraph retrieved for each sense. The experiments show that using the first paragraph is better than using the first sentence, and that TF-IDF is better than abstract frequency in the VSM. The proposed approach was also tested on English words, where it gives better results using the first sentence retrieved from Wikipedia for each sense.
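
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the TF-IDF variant (tf × (1 + log(N/df))), the function names, and the English toy senses are all assumptions made for illustration, and retrieving sense texts from Wikipedia is left out entirely. The sketch only shows the general idea of building TF-IDF vectors for the context and for each candidate sense text, then picking the sense with the highest cosine similarity.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption). A real system for Arabic would
    # need language-aware normalization and stop-word handling.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * (1 + log(N / df))."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * (1.0 + math.log(n / df[t])) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_sense(context, sense_texts):
    """Pick the sense whose text is most similar to the context.

    sense_texts maps a sense label to the text describing that sense
    (for example a first sentence or first paragraph; hypothetical input).
    """
    labels = list(sense_texts)
    vectors = tfidf_vectors([context] + [sense_texts[label] for label in labels])
    scores = {label: cosine(vectors[0], vec) for label, vec in zip(labels, vectors[1:])}
    return max(scores, key=scores.get), scores


# Toy English example with made-up sense descriptions.
senses = {
    "bank (finance)": "a bank is a financial institution that accepts deposits and makes loans",
    "bank (river)": "a bank is the land alongside a river or lake",
}
best, scores = nearest_sense("he sat on the bank of the river and watched the water", senses)
print(best, scores)
```

Replacing the TF-IDF weight with the plain term count in `tfidf_vectors` would give a frequency-based representation for comparison, and passing a longer sense text in `sense_texts` corresponds to moving from a first sentence to a first paragraph.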

Keywords

  • Arabic word sense disambiguation
  • Disambiguation resource
  • Vector space model
  • Wikipedia

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Hashemite University, Zarqa, Jordan
  2. Princess Sumaya University for Technology, Amman, Jordan
