
Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 533)

Abstract

Diacritization of written text has a significant impact on Arabic NLP applications. We present an approach to automatic Arabic diacritization that integrates morphological analysis with shallow syntactic analysis. The developed system, Alserag, is rule-based. It relies on three modules to produce fully diacritized Arabic words: a morphological analysis module, a syntactic analysis module, and a morpho-phonological processing module. The system's output was evaluated for accuracy against the reference using two metrics: diacritization error rate (DER) and word error rate (WER). The DER was 8.68% and the WER was 18.63%. The system is benchmarked against three known diacritization systems: Harakat, Mishkal, and Aldoaly.
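The two metrics named in the abstract can be made concrete with a short sketch. The Python below is not the paper's evaluation code; it assumes one common convention. Reference and system output contain the same undiacritized letters, the diacritics are the Unicode marks U+064B to U+0652 (tanween, short vowels, shadda, sukun), DER is the fraction of letters whose diacritics differ from the reference, and WER is the fraction of words containing at least one such letter error. Published definitions vary (for example, in whether undiacritized letters or case endings are counted), so the function names `split_letters` and `der_wer` and these conventions are illustrative assumptions.

```python
from __future__ import annotations

# Assumption: Arabic diacritics are U+064B..U+0652 (tanween, fatha, damma,
# kasra, shadda, sukun); other marks (e.g. U+0670 superscript alef) are
# treated as base letters by this sketch.
DIACRITICS = frozenset(chr(c) for c in range(0x064B, 0x0653))


def split_letters(word: str) -> list[tuple[str, str]]:
    """Split a word into (base_letter, attached_diacritics) pairs."""
    pairs: list[tuple[str, str]] = []
    for ch in word:
        if ch in DIACRITICS:
            if not pairs:
                raise ValueError(f"diacritic with no base letter in {word!r}")
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)  # attach mark to previous letter
        else:
            pairs.append((ch, ""))
    return pairs


def _normalize(marks: str) -> str:
    # Compare the marks on one letter as an unordered set, so that
    # shadda+fatha and fatha+shadda count as the same diacritization.
    return "".join(sorted(marks))


def der_wer(reference: str, hypothesis: str) -> tuple[float, float]:
    """Return (DER, WER) as fractions in [0, 1] under the assumed conventions."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("reference and hypothesis must have the same word count")

    letters = letter_errors = word_errors = 0
    for ref_word, hyp_word in zip(ref_words, hyp_words):
        ref_pairs = split_letters(ref_word)
        hyp_pairs = split_letters(hyp_word)
        # Assumption: the system only adds diacritics, never changes letters.
        if [b for b, _ in ref_pairs] != [b for b, _ in hyp_pairs]:
            raise ValueError(f"base letters differ: {ref_word!r} vs {hyp_word!r}")
        errors = sum(
            _normalize(r) != _normalize(h)
            for (_, r), (_, h) in zip(ref_pairs, hyp_pairs)
        )
        letters += len(ref_pairs)
        letter_errors += errors
        word_errors += errors > 0  # a word is wrong if any letter is wrong

    if letters == 0:
        return 0.0, 0.0
    return letter_errors / letters, word_errors / len(ref_words)
```

Because a single wrong letter makes its whole word wrong, WER under this convention is always at least as large as DER on the same text, which is consistent with the gap between the two figures reported above.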


Notes

  1. http://tahadz.com/mishkal.

  2. http://harakat.ae/.

  3. http://www.unlweb.net/unlarium/index.php?unlarium=dictionary.

  4. It is a web application developed in Java and available at http://dev.undlfoundation.org/index.jsp.

  5. It is a web application developed in Java and available at http://dev.undlfoundation.org/index.


Author information

Corresponding author

Correspondence to Sameh Alansary.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Alansary, S. (2017). Alserag: An Automatic Diacritization System for Arabic. In: Hassanien, A., Shaalan, K., Gaber, T., Azar, A., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016. AISI 2016. Advances in Intelligent Systems and Computing, vol 533. Springer, Cham. https://doi.org/10.1007/978-3-319-48308-5_18


  • DOI: https://doi.org/10.1007/978-3-319-48308-5_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48307-8

  • Online ISBN: 978-3-319-48308-5

  • eBook Packages: Engineering (R0)
