International Journal of Speech Technology, Volume 19, Issue 2, pp 259–267

Improving Arabic morphological analyzers benchmark

  • Younes Jaafar
  • Karim Bouzoubaa
  • Abdellah Yousfi
  • Rachida Tajmout
  • Hakima Khamar

Abstract

Tools for Arabic natural language processing have developed significantly in recent years. Among them, Arabic morphological analyzers are of particular importance because they are often embedded in more advanced systems such as syntactic parsers, search engines, and machine translation systems. Researchers must therefore decide which morphological analyzer to use in their projects, a difficult choice because many criteria must be taken into account. To facilitate this choice, we addressed the problem of benchmarking morphological analyzers in previous work by proposing a solution that reports a set of metrics for each analyzer: accuracy, precision, recall, F-measure, and execution time. In this article, we present two major improvements to that solution: the first version of a corpus dedicated to the evaluation of morphological analyzers, and a new metric that combines all result-related metrics with the execution time of the analyzers.
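
As a rough illustration of the result-related metrics named above, the following minimal sketch computes precision, recall, and F-measure for one analyzer on one word, and times a single call. It is not the authors' implementation: the function names, the set-based comparison of analyses against a gold standard, and the use of balanced F-measure are all assumptions made for the example. The combined metric is not defined in the abstract, so it is not sketched here.

```python
import time


def precision_recall_f1(predicted, gold):
    """Compare predicted analyses with gold-standard analyses (hypothetical helper).

    Both arguments are iterables of hashable analyses; exact set matching
    is an assumption of this sketch, not a detail taken from the article.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    # Balanced F-measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def timed(analyze, word):
    """Run a hypothetical analyzer callable and return (result, seconds elapsed)."""
    start = time.perf_counter()
    result = analyze(word)
    return result, time.perf_counter() - start
```

Here `analyze` stands for any callable that wraps an analyzer and returns its analyses for a word. Averaging the per-word scores over a corpus would give one figure per analyzer, although the aggregation the authors actually use is not stated in the abstract.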

Keywords

Arabic morphological analyzers, Benchmark, Standard corpus

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Younes Jaafar (1)
  • Karim Bouzoubaa (1)
  • Abdellah Yousfi (2)
  • Rachida Tajmout (1)
  • Hakima Khamar (3)
  1. Mohammadia School of Engineers, Mohammed Vth University, Rabat, Morocco
  2. FSJES, Mohammed Vth University, Rabat, Morocco
  3. Faculty of Letters and Human Sciences, Mohammed Vth University, Rabat, Morocco
