Modern Standard Arabic Readability Prediction

  • Naoual Nassiri
  • Abdelhak Lakhouaja
  • Violetta Cavalli-Sforza
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 782)

Abstract

Reading is the most critical skill for satisfactory progress in school and remains essential for accessing information throughout life. For this reason, readability is one of the main challenges when choosing academic texts for learners or for readers in general, especially for materials containing important information, such as newspapers and medical or legal articles. Readability refers to how easily a text can be understood by its reader. Predicting the readability level of a text is important in several domains, primarily education. In this paper, we present our approach to readability prediction for Modern Standard Arabic. The method is based on 170 features that measure different types of text characteristics. We used a corpus of 230 Arabic texts annotated on the Interagency Language Roundtable (ILR) scale, together with a frequency dictionary derived from the Tashkeela corpus. The results are very encouraging and improve on previously presented work.

Keywords

Readability, Modern Standard Arabic, Machine learning, Classification, Arabic readability

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Naoual Nassiri (1)
  • Abdelhak Lakhouaja (1)
  • Violetta Cavalli-Sforza (2)
  1. Department of Computer Science, Faculty of Sciences, University Mohamed First, Oujda, Morocco
  2. School of Science and Engineering, Al Akhawayn University, Ifrane, Morocco
