Arabic Readability Assessment for Foreign Language Learners

  • Naoual Nassiri
  • Abdelhak Lakhouaja
  • Violetta Cavalli-Sforza
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10859)


Reading in a foreign language is difficult, especially when the texts presented to readers are chosen without regard to their skill level. Foreign language learners need reading material suited to their reading abilities. A basic tool for determining whether a text is appropriate for a reader’s level is readability assessment, a measure that aims to represent the human capacities required to comprehend a given text. Readability prediction is an important part of teaching and learning, for reading in a foreign language as well as in one’s native language, and it remains a central area of research and practice. In this paper, we present our approach to readability assessment for Modern Standard Arabic (MSA) as a foreign language. Readability prediction is carried out using the Global Language Online Support System (GLOSS) corpus, which was developed to help independent learners improve their foreign language skills and is annotated with the Interagency Language Roundtable (ILR) scale. In this study, we also introduce a frequency dictionary, developed to calculate frequency-based features. The approach yields results that surpass the state of the art for Arabic.
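To make the idea of frequency-based features concrete, the following is a minimal sketch of how a frequency dictionary might be turned into text-level features. It is not the paper's implementation: the dictionary contents, the counts, the whitespace tokenization, and the names `FREQ` and `frequency_features` are all assumptions chosen for illustration.

```python
import math

# Hypothetical toy frequency dictionary (word -> corpus count).
# The counts are invented for illustration and do not come from the paper.
FREQ = {"في": 1000, "من": 800, "كتاب": 50}


def frequency_features(text, freq=FREQ):
    """Return two simple frequency-based features for a text.

    Assumption: tokens are obtained by whitespace splitting; a real system
    for Arabic would more likely look up lemmas or stems.
    """
    tokens = text.split()
    if not tokens:
        return {"mean_log_freq": 0.0, "oov_ratio": 0.0}

    # Counts of the tokens that appear in the dictionary.
    known = [freq[t] for t in tokens if t in freq]

    # Average log10 frequency of known tokens (0.0 if none are known).
    mean_log = sum(math.log10(c) for c in known) / len(known) if known else 0.0

    # Share of tokens missing from the dictionary (out-of-vocabulary).
    oov_ratio = 1 - len(known) / len(tokens)

    return {"mean_log_freq": mean_log, "oov_ratio": oov_ratio}


features = frequency_features("في كتاب مجهول")
```

Under these assumptions, a text with many rare or unknown words would get a lower mean log frequency and a higher out-of-vocabulary ratio. Values of this kind could then be passed to a classifier alongside other text features.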


Keywords: Readability; Modern Standard Arabic (MSA); Foreign language; Machine Learning (ML); Text features



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Naoual Nassiri (1)
  • Abdelhak Lakhouaja (1)
  • Violetta Cavalli-Sforza (2)

  1. Department of Computer Science, Faculty of Sciences, University Mohamed First, Oujda, Morocco
  2. School of Science and Engineering, Al Akhawayn University, Ifrane, Morocco
