
Texts for reading instruction and the most common words in modern standard Arabic: an investigation

Reading and Writing


Reading instruction for young Arabic speakers presents challenges for textbook publishers and teachers. In the present study, the authors conducted a word-level analysis of four multidisciplinary textbooks used for reading instruction in grades one and two in Egypt. The study sought to answer the following questions: What are the most common words in standard Arabic? How many of the most common words in standard Arabic are used in the textbooks? How dense is the use of common words? How many rare words are used in the textbooks studied? A word frequency analysis of existing corpora was used to create a list of the most common words. From that list, the researchers determined the frequency and dispersion of the most common Arabic words that were also used in the textbooks. Frequency and dispersion were also calculated by octile. The analysis found that the texts did not use any of the rare words found in the corpus, but many words in the texts did not appear in the reference corpus, inclusive of the common words list. Recommendations for policymakers and textbook publishers follow the discussion of results.
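The kind of analysis summarized above can be illustrated with a minimal Python sketch. It is not the authors' procedure (the acknowledgements indicate they worked with WordSmith 8); it only shows one plausible reading of the terms. The sketch assumes that texts are already tokenized into word forms, that "octile" means eight equal bands of the frequency-ranked common word list, that "density" is the share of textbook tokens drawn from a band, and that "dispersion" can be approximated by range (the number of textbooks in which a word occurs). All function names and the placeholder tokens (`w01`, `x1`, and so on) are hypothetical.

```python
from collections import Counter


def common_word_list(corpus_tokens, size):
    """Return the `size` most frequent word forms, most frequent first."""
    counts = Counter(corpus_tokens)
    # Sort by descending frequency, then alphabetically so ties are stable.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:size]


def split_into_octiles(ranked_words):
    """Split a ranked list into eight consecutive bands of near-equal size."""
    n = len(ranked_words)
    bounds = [i * n // 8 for i in range(9)]
    return [ranked_words[bounds[i]:bounds[i + 1]] for i in range(8)]


def octile_report(octiles, textbooks):
    """Per octile: coverage, density, and mean range across textbooks."""
    book_types = [set(book) for book in textbooks]
    text_counts = Counter(t for book in textbooks for t in book)
    total_tokens = sum(text_counts.values())
    report = []
    for band in octiles:
        # Words from this band that occur at least once in any textbook.
        used = [w for w in band if text_counts[w] > 0]
        # Range (assumed dispersion proxy): textbooks containing each used word.
        ranges = [sum(w in types for types in book_types) for w in used]
        band_tokens = sum(text_counts[w] for w in band)
        report.append({
            "coverage": len(used) / len(band) if band else 0.0,
            "density": band_tokens / total_tokens if total_tokens else 0.0,
            "mean_range": sum(ranges) / len(ranges) if ranges else 0.0,
        })
    return report


def unlisted_words(common_words, textbooks):
    """Textbook word types that are absent from the common word list."""
    listed = set(common_words)
    return sorted({t for book in textbooks for t in book} - listed)


# Toy data only: 16 placeholder words with corpus frequencies 16 down to 1.
words = [f"w{i:02d}" for i in range(1, 17)]
corpus = [w for rank, w in enumerate(words) for _ in range(16 - rank)]
books = [["w01", "w01", "w02", "w05", "x1"], ["w01", "w03", "x2", "x2"]]

common = common_word_list(corpus, 16)
for number, row in enumerate(octile_report(split_into_octiles(common), books), 1):
    print(number, row)
print(unlisted_words(common, books))
```

With real Arabic text, the tokenization step matters a great deal, because attached clitics and optional diacritics change what counts as a single word form; the sketch deliberately leaves that decision outside its scope.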


Availability of data and material

Corpora are publicly available.

Code availability





Acknowledgements

The authors wish to thank Mike Scott for his assistance with the WordSmith 8 tool and for his feedback on procedures. Blind acknowledgement to N. We are also grateful to M. Abbas for his interest in our work and for his permission to use the K and W corpora.


Funding

No funding was obtained for this project.

Author information

Authors and Affiliations



Contributions

Study conception and design were the work of TDW. Material preparation, data collection, and analysis were performed by TDW and IMK. EH contributed heavily to the theoretical framework. DAES provided expert Arabic language review and corrections. The first draft of the manuscript was written by TDW, EH, and IMK. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Thomas DeVere Wolsey.

Ethics declarations

Conflicts of interest

There are no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wolsey, T.D., Karkouti, I.M., Hiebert, E.H. et al. Texts for reading instruction and the most common words in modern standard Arabic: an investigation. Read Writ 36, 1567–1587 (2023).
