Arabic Corpus Linguistics: Major Progress, but Still a Long Way to Go

  • Imad Zeroual
  • Abdelhak Lakhouaja
Part of the Studies in Computational Intelligence book series (SCI, volume 740)


Arabic is an old Semitic language whose lexicon and grammar were standardized long ago and remain well established. It is a morphologically rich language characterized by derivation and inflection. It is also an international language, with over 500 million native speakers across 29 countries. Over the last 15 years, Arabic has shown the highest growth among the top ten online languages; consequently, the volume of electronically stored Arabic text is increasing rapidly. Despite this proud heritage, lexical richness, and growth in online users, Arabic remains relatively under-resourced compared to languages with a similar or smaller number of speakers (e.g., French and German). This chapter covers the major progress made in Arabic linguistic resources, primarily corpus compilation, and the challenges that researchers face in developing such resources. It is hoped that this overview of Arabic corpus linguistics will guide current and future research directions.


Corpus linguistics; Arabic language; Linguistic resources; Corpus compilation; Natural language processing



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Computer Sciences Laboratory, Faculty of Sciences, Mohammed First University, Oujda, Morocco
