ICALP 2017: Arabic Language Processing: From Theory to Practice, pp. 120–133
Modern Standard Arabic Readability Prediction
Abstract
Reading is the most critical skill for satisfactory progress in school, and it remains highly important for access to information throughout one's life. For this reason, readability is one of the main concerns when choosing academic texts for learners or for readers in general, especially for materials that carry important information, such as newspapers and medical or legal articles. Readability refers to how easily a text can be understood by its reader. Predicting readability levels is useful in several domains, primarily education. In this paper we present our approach to readability prediction for Modern Standard Arabic. The method relies on 170 features that measure different types of text characteristics. We use a corpus of 230 Arabic texts annotated with the Interagency Language Roundtable (ILR) scale, together with a frequency dictionary built from the Tashkeela corpus. The results are encouraging and improve on previously published work.
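The abstract does not enumerate the 170 features, so the following is only a rough illustration of what feature-based readability prediction can look like. It is a minimal sketch that computes four common surface features (average word length, average sentence length, type-token ratio, and the share of words that are rare according to a frequency dictionary). The function names, the feature set, the `rare_threshold` value, the diacritic stripping, and the regex tokenizer are all assumptions for illustration, not the authors' method; a real system would more likely rely on a morphological analyzer, and the toy reference corpus here merely stands in for a Tashkeela-derived dictionary.

```python
import re
from collections import Counter

# Assumption: diacritics (tanwin through sukun, plus superscript alef) are
# removed so that vocalized and unvocalized spellings share one entry.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Sentence delimiters, including the Arabic question mark (U+061F).
SENTENCE_END = re.compile(r"[.!?\u061F]+")
# Naive word tokenizer; \w matches Arabic letters in Python str patterns.
WORD = re.compile(r"\w+")


def strip_diacritics(text):
    """Remove Arabic short-vowel and related diacritic marks."""
    return DIACRITICS.sub("", text)


def build_frequency_dictionary(corpus_texts):
    """Count word frequencies over a reference corpus (hypothetical stand-in)."""
    counts = Counter()
    for text in corpus_texts:
        counts.update(WORD.findall(strip_diacritics(text)))
    return counts


def surface_features(text, freq, rare_threshold=2):
    """Return a small, hypothetical feature dictionary for one text."""
    text = strip_diacritics(text)
    sentences = [s for s in SENTENCE_END.split(text) if WORD.search(s)]
    words = WORD.findall(text)
    n_words = len(words)
    if n_words == 0:
        return {"avg_word_len": 0.0, "avg_sent_len": 0.0,
                "type_token_ratio": 0.0, "rare_word_ratio": 0.0}
    return {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "avg_sent_len": n_words / max(len(sentences), 1),
        "type_token_ratio": len(set(words)) / n_words,
        # Counter returns 0 for unseen words, so they count as rare.
        "rare_word_ratio": sum(1 for w in words if freq[w] < rare_threshold) / n_words,
    }


if __name__ == "__main__":
    freq = build_frequency_dictionary(["\u0643\u062a\u0628 \u0648\u0644\u062f"])
    print(surface_features("\u0643\u064e\u062a\u064e\u0628\u064e \u0648\u0644\u062f\u061f \u0646\u0639\u0645.", freq))
```

In a full pipeline, vectors like these (alongside many other lexical, morphological, and syntactic features) would be paired with each text's ILR level and passed to a supervised classifier; the choice of classifier is left open here.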
Keywords
Readability · Modern Standard Arabic · Machine learning · Classification · Arabic readability