Abstract
Reading instruction for young Arabic speakers presents challenges for textbook publishers and teachers. In the present study, the authors conduct a word-level analysis of four multidisciplinary textbooks used for reading instruction in grades one and two in Egypt. The study sought to answer the following questions: What are the most common words in standard Arabic? How many of the most common words in standard Arabic are used in the textbooks? How dense is the use of common words? How many rare words are used in the textbooks studied? A word frequency analysis of existing corpora was used to create a list of the most common words. From that list, the researchers determined the frequency and dispersion of the most common Arabic words that were also used in the textbooks. Frequency and dispersion were also calculated by octile. The analysis found that the texts did not use any of the rare words found in the corpus, but many words in the texts did not appear in the reference corpus, inclusive of the common words list. Recommendations for policymakers and textbook publishers follow the discussion of results.
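The kind of word-list comparison described above can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not define "octile" or "dispersion", so the sketch assumes that octiles are eight equal rank bands of the common-word list and that dispersion is the number of texts in which a word occurs. The function names, the whitespace tokenizer, and the toy data are all hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace split (assumption): real Arabic text would need
    # normalization and a decision about attached clitics and diacritics.
    return text.split()


def octile_bands(ranked_words):
    # Cut a frequency-ranked word list into eight contiguous bands
    # of near-equal size, most frequent band first.
    n = len(ranked_words)
    bounds = [i * n // 8 for i in range(9)]
    return [ranked_words[bounds[i]:bounds[i + 1]] for i in range(8)]


def band_summary(ranked_words, texts):
    # texts maps a (hypothetical) textbook name to its raw text.
    counts = [Counter(tokenize(t)) for t in texts.values()]
    rows = []
    for band in octile_bands(ranked_words):
        # Frequency: total occurrences of the band's words across all texts.
        tokens = sum(c[w] for c in counts for w in band)
        # Range per word: number of texts in which the word occurs.
        ranges = [sum(1 for c in counts if c[w] > 0) for w in band]
        rows.append({
            "types_used": sum(1 for r in ranges if r > 0),
            "tokens": tokens,
            "mean_range": sum(ranges) / len(band) if band else 0.0,
        })
    return rows


def unlisted_tokens(ranked_words, texts):
    # Tokens in the texts that are absent from the common-word list.
    listed = set(ranked_words)
    return [w for t in texts.values() for w in tokenize(t) if w not in listed]


# Toy data with Latin placeholders standing in for Arabic word forms.
ranked = [f"w{i}" for i in range(1, 17)]
texts = {"book_a": "w1 w1 w2 x", "book_b": "w1 w16"}
summary = band_summary(ranked, texts)
print(summary[0])
print(unlisted_tokens(ranked, texts))
```

Each row of the summary corresponds to one band of the ranked list, so a drop in `types_used` or `tokens` from the first band to the last would indicate that the texts lean on the most frequent words. Words returned by `unlisted_tokens` correspond to textbook words that are missing from the reference list.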
Availability of data and material
Corpora are publicly available.
Code availability
N/A.
Acknowledgements
The authors wish to thank Mike Scott for his assistance with the WordSmith 8 tool and for his feedback on procedures. Blind acknowledgement to N. We are also grateful to M. Abbas for his interest in our work and for his permission to use the K and W corpora.
Funding
No funding was obtained for this project.
Author information
Contributions
Study conception and design are the work of TDW. Material preparation, data collection, and analysis were performed by TDW and IMK. EH contributed heavily to the theoretical framework. DAES provided expert Arabic language review and corrections. The first draft of the manuscript was written by TDW, EH, and IMK. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflicts of interest
There are no conflicts of interest.
About this article
Cite this article
Wolsey, T.D., Karkouti, I.M., Hiebert, E.H. et al. Texts for reading instruction and the most common words in modern standard Arabic: an investigation. Read Writ 36, 1567–1587 (2023). https://doi.org/10.1007/s11145-022-10307-0
DOI: https://doi.org/10.1007/s11145-022-10307-0