Abstract
Diacritization of written text has a significant impact on Arabic NLP applications. We present an approach to automatic Arabic diacritization that integrates morphological analysis with shallow syntactic analysis. The developed system, Alserag, is rule-based. It relies on three modules to produce fully diacritized Arabic words: a morphological analysis module, a syntactic analysis module, and a morpho-phonological processing module. The system's output was evaluated for accuracy against a reference using two metrics: diacritization error rate (DER) and word error rate (WER). The DER was 8.68% and the WER was 18.63%. The system is benchmarked against three known diacritization systems: Harakat, Mishkal, and Aldoaly.
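The abstract does not define the two metrics, so the following is only a minimal sketch under commonly used definitions, which are an assumption here: DER as the fraction of letters whose restored diacritics differ from the reference, and WER as the fraction of words containing at least one such letter. The function names `split_diacritics` and `error_rates` are hypothetical and are not part of Alserag; the sketch also assumes the reference and the system output share the same tokenization and base letters.

```python
# Arabic diacritic marks (tanween, short vowels, shadda, sukun): U+064B..U+0652.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def split_diacritics(word):
    """Return a list of (base_letter, diacritic_marks) pairs for one word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base letter
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def error_rates(reference, hypothesis):
    """Return (DER, WER) under the assumed definitions described above."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words or len(ref_words) != len(hyp_words):
        raise ValueError("texts must be non-empty and have the same word count")
    letters = letter_errors = word_errors = 0
    for ref, hyp in zip(ref_words, hyp_words):
        ref_pairs, hyp_pairs = split_diacritics(ref), split_diacritics(hyp)
        if [b for b, _ in ref_pairs] != [b for b, _ in hyp_pairs]:
            raise ValueError("base letters differ between reference and output")
        # Sort the marks so that shadda+vowel order does not count as an error.
        wrong = sum(
            sorted(rm) != sorted(hm)
            for (_, rm), (_, hm) in zip(ref_pairs, hyp_pairs)
        )
        letters += len(ref_pairs)
        letter_errors += wrong
        word_errors += int(wrong > 0)
    return letter_errors / letters, word_errors / len(ref_words)


# Two words, eight letters; only the final case ending differs (damma vs fatha).
reference = "\u0643\u064E\u062A\u064E\u0628\u064E \u0627\u0644\u0648\u064E\u0644\u064E\u062F\u064F"
hypothesis = "\u0643\u064E\u062A\u064E\u0628\u064E \u0627\u0644\u0648\u064E\u0644\u064E\u062F\u064E"
der, wer = error_rates(reference, hypothesis)
print(f"DER = {der:.2%}, WER = {wer:.2%}")  # DER = 12.50%, WER = 50.00%
```

A single wrong case ending costs one letter out of eight but one word out of two, which illustrates why a WER is typically higher than the corresponding DER, as in the figures reported above.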
Notes
- 4. It is a web application developed in Java and available at http://dev.undlfoundation.org/index.jsp.
- 5. It is a web application developed in Java and available at http://dev.undlfoundation.org/index.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Alansary, S. (2017). Alserag: An Automatic Diacritization System for Arabic. In: Hassanien, A., Shaalan, K., Gaber, T., Azar, A., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016. AISI 2016. Advances in Intelligent Systems and Computing, vol 533. Springer, Cham. https://doi.org/10.1007/978-3-319-48308-5_18
DOI: https://doi.org/10.1007/978-3-319-48308-5_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48307-8
Online ISBN: 978-3-319-48308-5
eBook Packages: Engineering, Engineering (R0)