Improving Arabic morphological analyzers benchmark
The various tools dedicated to Arabic natural language processing have undergone significant development during recent years. Among these tools, Arabic morphological analyzers are of great importance because they are often used within other projects that are more advanced such as syntactic parsers, search engines, machine translation systems, etc. Thus, researchers are forced to make a decision concerning which morphological analyzer to use in their research projects, and this task is very difficult since there are many criteria to take into account. In order to facilitate this choice, we considered the problem of benchmarking morphological analyzers in a previous work by proposing a solution that allows returning a set of metrics of each analyzer that are: accuracy, precision, recall, F-measure and the execution time. In this article, we present two new major improvements to our solution: the establishment of the first version of our corpus that is dedicated to the evaluation of morphological analyzers, as well as the introduction of a new metric, which combines all metrics related to results as well as the execution time of the analyzers.
KeywordsArabic morphological analyzers Benchmark Standard corpus
- Alansary, S., Nagi, M., & Adly, N. (2007). Building an international corpus of Arabic. 7th international conference on language engineering, (p. np.). Cairo.Google Scholar
- ALECSO. (n.d.). Retrieved December 23, 2014, from مواصفات نظام التحليل الصرفي في اللغة العربية: http://www.alecso.org.tn/images/stories/OULOUM/MOHALLILAT%20SARFIA_DAMAS_2009/022%202%20SPECIFICATIONS.pdf.
- Alkhalil Morpho Sys. (2013). Retrieved April 23, 2015, from Alkhalil Morpho Sys: http://sourceforge.net/projects/alkhalil/.
- Al-Sughaiyer, I. A., & Al-Kharashi, I. A. (2004). Arabic morphological analysis techniques: A comprehensive survey. Journal of the American Society For Information Science and Technology, 55(3), 189–213. Retrieved from Imad Al-Sughayer and Ibrahim Al-Kharashi. “Arabic morphological Analysis Techniques: a comprehensive Survey”. Computer and Electronics.Google Scholar
- Boudlal, A., Lakhouaja, A., Mazroui, A., Meziane, A., Ould Abdallahi, O. B., & Shoul, M. (2011). Alkhalil Morpho Sys: A morphosyntactic analysis system for Arabic texts. Proceedings of ACIT’2010. Google Scholar
- Brihaye, P. (2003). AraMorph. Retrieved April 23, 2015, from AraMorph: http://www.nongnu.org/aramorph/english/index.html.
- Buckwalter, T. (2002a). Arabic morphology analysis. Retrieved April 23, 2015, from QAMUS: http://www.qamus.org/morphology.htm.
- Buckwalter, T. (2002b). Buckwalter Arabic morphological analyzer version 1.0.Google Scholar
- Champsaur, C. (2013, January). La traduction automatique : Un outil pour les traducteurs? The Journal of Specialised Translation, 19, pp. 19–28.Google Scholar
- Chennoufi, A., & Mazroui, A. (2014). Apport de la deuxième version de l’analyseur Alkhalil Morpho Sys dans la voyellation automatique des textes Arabes. 5th international conference on Arabic language processing (CITALA 2014), (pp. 223–230). Oujda.Google Scholar
- Darwish, K. (2002). Building a shallow Arabic morphological analyzer in one day. Proceedings of the ACL-2002 workshop on computational approaches to semitic languages, (pp. 47–54). Retrieved from https://aclweb.org/anthology.
- Diab, M. (2009). Second generation tools (AMIRA 2.0): Fast and robust tokenization, POS tagging, and base phrase chunking. Second international conference on Arabic language resources and tools, (pp. 285–288). Cairo.Google Scholar
- Dukes, K. (2010). The Quranic Arabic corpus. Retrieved April 23, 2015, from Quranic Arabic Corpus. http://corpus.quran.com.
- Dukes, K., & Habash, N. (2010). Morphological annotation of Quranic Arabic. Language resources and evaluation conference (LREC). Malta. Retrieved from https://aclweb.org/anthology.
- Graff, D., Maamouri, M., Bouziri, B., Krouna, S., Kulick, S., & Buckwalter, T. (2009). Standard Arabic morphological analyzer (SAMA) version 3.1. Linguistic Data Consortium LDC2009E73.Google Scholar
- Habash, N., Rambow, O., & Roth, R. (n.d.). MADA + TOKAN software suite. Retrieved April 23, 2015, from MADA + TOKAN: http://www1.cs.columbia.edu/~rambow/software-downloads/MADA_Distribution.html.
- Habash, N., Rambow, O., & Roth, R. (2009). Mada + tokan: A toolkit for Arabic tokenization, diacritization, morphological disambiguation, pos tagging, stemming and lemmatization. Proceedings of the 2nd international conference on Arabic language resources and Tools (MEDAR), (pp. 102–109). Cairo.Google Scholar
- Hassan, Y., Aly, M., & Atiya, A. (2014). Arabic spelling correction using supervised learning. Proceedings of the EMNLP 2014 workshop on Arabic, (pp. 121–126). Doha.Google Scholar
- Hattab, M., Haddad, B., Yaseen, M., Duraidi, A., & Shmais, A. A. (2009). Addaall Arabic search engine: Improving search based on combination of morphological analysis and generation considering semantic patterns. The 2nd international conference on Arabic language resources & tools, (pp. 159–162).Google Scholar
- Jaafar, Y., & Bouzoubaa, K. (2014). Benchmark of Arabic morphological analyzers: Challenges and solutions. Intelligent systems: Theories and applications (SITA-14), (pp. 1–6). Rabat.Google Scholar
- Kano, Y., Dorado, R., McCrohon, L., Ananiadou, S., & Tsujii, J. (2010). U-Compare: An integrated language resource evaluation platform including a comprehensive UIMA resource library. Proceedings of the seventh international conference on language resources and evaluation (LREC 2010), (pp. 428–434).Google Scholar
- Koulali, R., & Meziane, A. (2013). Experiments with Arabic topic detection. Journal of Theoretical and Applied Information Technology, 50(1), 28–32.Google Scholar
- Pasha, A., Al-Badrashiny, M., Diab, M., El Kholy, A., Eskander, R., Habash, N., & Roth, R. M. (2014). MADAMIRA: A fast, comprehensive tool for morphological analysis and disambiguation of Arabic. LREC’14, (pp. 1094–1101). Reykjavik.Google Scholar
- Sawalha, M., & Atwell, E. (2008). Comparative evaluation of Arabic language morphological analysers and stemmers. International conference on computational linguistics—COLING, (pp. 107–110). Retrieved from https://aclweb.org/anthology.
- Sawalha, M. (n.d.). Gold Standard of Arabic. Gold standard for evaluating Arabic morphological analyzers. Retrieved April 23, 2015, from http://www.comp.leeds.ac.uk/sawalha/goldstandard.html.
- Smrž, O. (2007). ElixirFM: Implementation of functional Arabic morphology. Proceedings of the 2007 workshop on computational approaches to Semitic languages: common issues and resources (pp. 1–8). Stroudsburg: Association for Computational Linguistics.Google Scholar
- Wali, W., Gargouri, B., & Ben Hamadou, A. (2014). A system for evaluating the content of LMF Arabic dictionaries. 5th international conference on Arabic language processing (CITALA 2014), (pp. 159–167). Oujda.Google Scholar