1 Introduction

Second and foreign language (L2) reading offers pedagogical advantages, such as providing examples of creative language use and abundant cultural information in authentic contexts. In both second- and foreign-language environments, reading has long been recognized as the primary source of L2 vocabulary learning (Boers, 2022; Huckin & Coady, 1999; Krashen, 1989; Pigada & Schmitt, 2006; Waring & Takaki, 2003; Webb & Chang, 2015). With the increasing popularity of digital technology in language education (Golonka et al., 2014), digital reading has become ubiquitous in L2 reading. This study broadly defines 'digital reading' as onscreen reading with or without access to digital resources outside the reading texts. Given the crucial role of vocabulary in all language use (Schmitt et al., 2017, 2021) and in successful language learning (Devine, 1988, p. 49; Laufer, 2003), the effectiveness of digital reading for L2 vocabulary learning has attracted considerable research interest over the last twenty years (AbuSeileek, 2011; Akbulut, 2007; Hsieh et al., 2012; Hsu et al., 2013; Lee et al., 2016; Ruiz et al., 2021; Wang, 2016). Constantly evolving e-learning and artificial intelligence (AI) tools present new opportunities and challenges for L2 vocabulary learning through digital reading, which allows not only anytime-and-anywhere reading but also access to various digital resources, including but not limited to lexical priming, quizzes, audio narrations, lexical glosses, and personalized reading systems.

Although L2 vocabulary learning through reading can be enhanced by different digital resources, it remains unknown which resource is most facilitative. The present study systematically synthesizes the overall effect of digital reading on L2 vocabulary learning. The 21 empirical studies included in this meta-analysis provide a panoramic view of the cumulative effect and allow us to examine three potential moderators: 1) L2 proficiency, 2) test formats, and 3) digital resources. These variables were identified as potential moderators based on previous findings in L2 reading and vocabulary learning research. Understanding the relationships among learners' individual differences, research designs, and digital affordances has important implications for maximizing the effectiveness of digital reading for L2 vocabulary learning. In addition, the results provide insights into directions for future research.

2 Literature Review

2.1 Previous Meta-Analyses

Several attempts have been made to synthesize studies comparing reading on paper and on digital devices. Delgado et al. (2018) included 38 between-subject and 16 within-subject studies published between 2000 and 2017 and retrieved from seven databases. The mean effect size (g = -0.21) suggested that paper-based reading outperformed digital reading. Clinton (2019) also conducted a meta-analysis comparing reading from paper and screens, which included 29 studies published between 2008 and 2018 and retrieved from seven databases. Together, these studies generated 33 independent effect sizes for onscreen and paper-based reading performance. Results showed that onscreen reading had a negative effect on reading performance compared with paper-based reading (g = -0.25). Although these meta-analyses revealed an advantage of paper-based over digital reading, the generalizability of this finding is limited in two respects: the outcomes measured and the resources considered. First, these meta-analyses focused primarily on reading comprehension, whereas other learning outcomes, such as vocabulary acquisition, were under-explored. Without sufficient vocabulary skills, the concepts underlying given words cannot be known, leaving knowledge of syntax and discourse almost useless (Schmitt et al., 2021). Receiving abundant and adequate input during reading is one of the most efficient ways to improve vocabulary skills (Krashen, 1989). It is therefore necessary to explore the facilitative potential of digital reading for vocabulary learning. Second, the learning-enhancing potential of digital reading may have been underestimated because digital resources outside the reading texts were largely neglected.

To date, only a few meta-analyses have been conducted on L2 vocabulary learning through digital reading, with a specific focus on digital resources outside the reading texts. For example, Abraham (2008) conducted a meta-analysis to investigate the overall effects of digital reading with lexical glosses on L2 vocabulary learning. The study analyzed 11 papers published up to 2007. Digital reading showed large mean weighted effects on immediate (d = 1.40) and delayed (d = 1.25) vocabulary post-tests compared with control groups without access to digital glosses. Later, Yun (2011) ran a similar meta-analysis of 10 papers published between 1990 and 2009 and found a medium positive effect (g = 0.46) of digital reading with lexical glosses on L2 vocabulary learning.

Although revealing, these previous meta-analytic studies have several limitations that need to be addressed. First, they were conducted more than ten years ago and reported inconsistent results (e.g., Abraham, 2008; Yun, 2011). Hence, up-to-date systematic reviews that incorporate recent empirical studies on vocabulary learning through digital reading are needed. Second, although recent reviews have examined L2 vocabulary learning and lexical glosses (Boers, 2022; Ramezanali et al., 2021; Vahedi et al., 2016; Yanagisawa et al., 2020), none focused specifically on the digital context. Moreover, L2 vocabulary learning through reading can be facilitated not only by lexical glosses but also by other digital resources, such as personalized reading systems (Wang, 2016). Thus, the effect of digital reading on L2 vocabulary learning should be synthesized and compared from a more holistic perspective. To fill these gaps, the current study consolidates the literature to estimate the magnitude and robustness of L2 vocabulary learning through digital reading. This meta-analysis also examines potential moderating factors for a more thorough understanding of the role of digital reading in L2 vocabulary learning. The following section discusses and justifies the selection of these moderator variables, which cover learner variables, research designs, and digital affordances.

2.2 Potential Moderator Variables

2.2.1 Learner Variables: Individual Differences in L2 Proficiency

Successful L2 vocabulary learning through reading relies primarily on text comprehension and word inference (Boers, 2022; Krashen, 1989). Whether a text can be sufficiently understood and whether novel words can be successfully inferred are influenced by learner and textual factors. Learner factors mainly refer to individual differences in L2 proficiency. When learners notice an unknown word, they usually need to pause to infer its meaning or consult a reference. Learners with lower L2 proficiency may have difficulty inferring and identifying the correct word meaning from the reading context (Bengeleil & Paribakht, 2004). It therefore seems that more proficient learners acquire L2 vocabulary more efficiently through reading. For example, Vahedi et al. (2016) found that L2 proficiency was a statistically significant moderator (Q = 6.53, p < 0.05): compared with beginners (g = 0.75), intermediate (g = 0.85) and advanced learners (g = 0.82) learned L2 vocabulary more efficiently through reading with lexical glosses. However, Yun (2011) found that onscreen reading with computerized lexical glosses was most effective for beginners (g = 0.70) and least effective for intermediate learners (g = 0.23). It appears that the relationship between language proficiency and learning efficiency varies with reading settings and designs. As Yanagisawa et al. (2020) suggested, more research is needed on the interaction between reading designs and L2 proficiency. To this end, we include learners' individual differences in L2 proficiency as a moderator variable.

2.2.2 Research Designs: Vocabulary Test Formats

Another factor that influences text comprehension and vocabulary learning is the textual factor, which mainly refers to the density of novel words and the number of word recurrences. For a text to be comprehensible, learners need to recognize and decode at least 95% of its words (Laufer, 1997; Nation, 2013). If novel words exceed 5% of the text, learners may not obtain sufficient contextual clues to infer their meanings (Laufer, 2020). However, because word learning is an incremental process (Milton, 2009; Schmitt, 2010) and word knowledge is a multifaceted construct (Fitzpatrick & Clenton, 2017; Nation, 2013), correctly inferring a word's meaning is usually not enough for efficient vocabulary learning. For one thing, a novel word must reappear several times before it is learned incidentally through reading (Chen & Truscott, 2010; Pellicer-Sánchez, 2016; Waring & Takaki, 2003); each new encounter can extend or consolidate the lexical knowledge gained from previous encounters (Webb & Chang, 2015). For another, receptive vocabulary knowledge is usually acquired before productive vocabulary knowledge, which makes word recall harder to achieve than word recognition (González-Fernández & Schmitt, 2020; Laufer & Goldstein, 2004; Laufer & Paribakht, 1998). Therefore, test format may moderate the vocabulary gains measured after reading. In a meta-analysis of L2 vocabulary learning with lexical glosses, Ramezanali et al. (2021) found no significant difference across four test formats (Q = 0.29, p > 0.05), namely form recall (g = 0.44), form recognition (g = 0.61), meaning recall (g = 0.35), and meaning recognition (g = 0.49), although other formats, such as the Vocabulary Knowledge Scale, were not explored. These results suggest a need for further investigation of the relationship between different aspects of word knowledge and vocabulary test formats. 
This meta-analysis sets out to explore the potential moderating effect of test formats on L2 vocabulary learning through digital reading.
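To make the 95% threshold concrete, lexical coverage is the proportion of running words in a text that the reader already knows. The Python sketch below is illustrative only: it assumes naive lowercase tokenization and a hypothetical set of known surface forms, whereas coverage research typically counts word families and may treat proper nouns separately.

```python
import re


def lexical_coverage(text, known_words):
    """Return the proportion of running word tokens in `text` found in `known_words`.

    Assumption: naive tokenization into lowercase alphabetic strings; coverage
    studies usually work with word families rather than surface forms.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


def is_comprehensible(text, known_words, threshold=0.95):
    """Apply the 95% coverage threshold (Laufer, 1997; Nation, 2013)."""
    return lexical_coverage(text, known_words) >= threshold


known = {"the", "cat", "sat", "on", "mat"}
print(lexical_coverage("The cat sat on the mat", known))  # 1.0
print(lexical_coverage("The cat sat on the rug", known))  # 5 of 6 tokens known
```

With a single unknown token out of six, coverage drops to about 83%, well below the threshold, which illustrates how quickly a few unfamiliar words can push a short text out of the comprehensible range.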

2.2.3 Digital Affordances: Facilitating Digital Resources

Vocabulary learning through reading can be facilitated by digital resources, as they enable anytime-and-anywhere learning. More importantly, digital resources can address both learner and textual factors. For example, digital reading can present texts with lexical glosses, which compensate for a limited vocabulary size by providing first language (L1) or L2 explanations of word meanings embedded in or hyperlinked to the text, with the glossed words often shown in bold or in color (Chen & Yen, 2013; Huang, 2018). According to Schmidt's (1990) noticing hypothesis, lexical glosses enhance vocabulary learning because the bold or colored forms draw learners' attention to the glossed words (Rouhi & Mohebbi, 2012; Yanguas, 2009). With explanations of word meanings available, learners can more easily establish correct form-meaning connections and better comprehend texts (Yanagisawa et al., 2020). Digital reading can also present texts with lexical priming, which momentarily exposes learners to a formal priming stimulus before displaying the target word (Liu & Leveridge, 2017). For example, the word 'fake' is formally similar to the target word 'fate' and can thus serve as a formal priming stimulus. Briefly presenting the stimulus before the target word can pre-activate learners' lexical knowledge and enhance word recognition (Liu & Leveridge, 2017). With recent advances in e-learning tools, digital reading allows learners to access multiple digital resources simultaneously. For example, learners can use a reading system to access e-dictionaries, lexical glosses, quizzes, and/or audio narration while reading onscreen. Further, natural language processing tools and adaptive algorithms have made possible AI-based reading systems, such as personalized reading systems, which analyze and accumulate learner profiles in order to recommend the most appropriate reading materials. 
The recommended materials are designed to suit learners' language proficiency and to contain new words that learners have recently encountered, thereby reinforcing word learning. In other words, personalized reading systems provide comprehensible texts while increasing the number of word recurrences. Although empirical studies have shown the effectiveness of L2 vocabulary learning through onscreen reading with access to lexical glosses (AbuSeileek, 2011; Khezrlou, 2019; Khezrlou & Ellis, 2017), multiple digital resources (Gorjian et al., 2011; Johnson & Heffernan, 2006; Proctor et al., 2007), or personalized reading systems (Hsieh et al., 2012; Hsu et al., 2013; Wang, 2016), the most effective design of digital reading remains unknown. This meta-analysis addresses this gap through a moderator analysis comparing the effectiveness of onscreen reading with access to different digital resources for L2 vocabulary learning.
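The recommendation logic described above can be illustrated with a deliberately simplified, hypothetical sketch. It is not the algorithm of any cited system (Hsieh et al., 2012; Hsu et al., 2013; Wang, 2016); the function name `recommend_texts`, the 95% coverage filter, and the rule of counting recently encountered words toward coverage are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Naive lowercase tokenizer (an illustrative assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def recommend_texts(texts, known_words, recent_words, min_coverage=0.95):
    """Hypothetical recommender that ranks candidate texts for one learner.

    texts:        dict mapping a text title to its body
    known_words:  words the learner profile marks as known
    recent_words: new words the learner has encountered lately
    Texts below `min_coverage` are dropped (proficiency fit); the rest are
    ranked by how often recently encountered words recur (word recurrence).
    """
    familiar = known_words | recent_words  # assumption: recent words count toward coverage
    scored = []
    for title, body in texts.items():
        tokens = tokenize(body)
        if not tokens:
            continue
        coverage = sum(t in familiar for t in tokens) / len(tokens)
        if coverage < min_coverage:
            continue
        counts = Counter(tokens)
        recurrences = sum(counts[w] for w in recent_words)
        scored.append((recurrences, title))
    # Most recurrences first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


library = {
    "A": "the cat sat on the mat",
    "B": "the cat saw the cat",
    "C": "the dog saw the zebra",
}
print(recommend_texts(library, {"the", "sat", "on", "mat", "saw"}, {"cat"}))
# ['B', 'A']  (C falls below the coverage threshold)
```

The two-stage design mirrors the two functions attributed to personalized reading systems: the coverage filter stands in for matching texts to proficiency, and the recurrence score stands in for recycling recently met words.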

2.3 Research Questions

The current meta-analysis intends to investigate L2 vocabulary learning through digital reading from a holistic perspective. In addition to the overall effect of digital reading, three moderator variables were included and analyzed. The following two research questions guided this study:

  1) What is the overall effect of digital reading on L2 vocabulary learning?

  2) How is the effect of digital reading on L2 vocabulary learning moderated by L2 proficiency, test formats, and digital resources?

3 Methodology

3.1 Literature Search and Retrieval

We conducted a comprehensive literature search to retrieve quantitative studies with a within-subject or between-subject design that explored the effects of digital reading on L2 vocabulary learning. We aimed to retrieve high-quality academic literature written in English and published in the last twenty years. Eight widely recognized scholarly databases were searched: Academic Search Premier, ERIC, JSTOR, LLBA, PsycArticles, PsycINFO, Scopus, and Web of Science. Various combinations of the following keywords were used for each database: (computerized OR mobile OR digital OR electronic OR personalized OR adaptive OR intelligent OR hypertext OR hypermedia) AND reading AND vocabulary AND learning. To ensure that all relevant literature had been included, we also consulted the reference sections of previous review studies and cross-referenced the bibliographies of the retrieved studies.

3.2 Literature Screening and Reviewing

All authors independently screened the titles and abstracts of all records. The full texts of the records that passed initial screening were then retrieved and imported into EndNote, and duplicates were removed manually. All authors further examined each full text. Records meeting the inclusion and exclusion criteria presented below were identified as eligible for data extraction and coding. Any discrepancies during literature screening and reviewing were resolved through discussion and consensus.

  • Studies published in the last twenty years were included.

  • Studies written in English were included.

  • Studies with vocabulary measurements were included.

  • Studies reporting means, sample sizes, and standard deviations, from which Cohen's d could be calculated, were included.

  • Within-subject studies investigating L2 vocabulary learning through digital reading with a pretest–posttest design were included.

  • Between-subject studies were included when L2 vocabulary gains through onscreen reading without access to digital resources outside the actual texts were compared with the gains of a comparison group that read on paper with the same instruction.

  • Between-subject studies were included when L2 vocabulary gains through onscreen reading with access to an extra digital resource outside the actual texts were compared with the gains of a comparison group that received the same instruction but read onscreen without access to that resource.

  • Case or review studies were excluded.

  • Studies without accessible full texts were excluded.

  • Studies that focused on participants diagnosed with learning disabilities, such as dyslexia, were excluded.

Following the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 statement (Page et al., 2021), Fig. 1 documents the records identified, screened, and excluded at each stage of literature screening and reviewing. As shown in Fig. 1, 21 studies, marked with an asterisk in the References section, were included in this meta-analysis.

Fig. 1 PRISMA Flow Chart for Literature Searching and Reviewing

3.3 Data Coding

Data coding is essential for meta-analysis, as it translates various information from studies into a standardized format on a coding scheme table (Plonsky & Oswald, 2015, p. 246). Microsoft Excel was used for data recording and coding. As shown in Table 1, the present study coded all eligible studies in three main categories. First, information on the study context (i.e., publication characteristics and learner variables) was collected and coded where applicable. Second, information on the research design (i.e., between-subject or within-subject design, text readability, digital resources outside the reading texts, and vocabulary measurements) was collected and coded where applicable. Finally, descriptive statistics, namely sample sizes (N), mean scores (M), and standard deviations (SD), were recorded for calculating the effect sizes.
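As an illustration of how N, M, and SD feed into a d-type effect size, the Python sketch below computes Cohen's d with a pooled standard deviation for a between-subject comparison, together with its usual large-sample sampling variance. The within-subject version divides the pretest-posttest gain by the average of the two SDs; this is only one of several conventions, and the source does not state which formula the included studies or the Stata analysis used. The example numbers are hypothetical.

```python
import math


def cohens_d_between(m_treat, sd_treat, n_treat, m_ctrl, sd_ctrl, n_ctrl):
    """Cohen's d for two independent groups, standardized by the pooled SD."""
    pooled_var = ((n_treat - 1) * sd_treat**2 + (n_ctrl - 1) * sd_ctrl**2) / (
        n_treat + n_ctrl - 2
    )
    return (m_treat - m_ctrl) / math.sqrt(pooled_var)


def variance_of_d(d, n_treat, n_ctrl):
    """Large-sample sampling variance of d for two independent groups."""
    n_total = n_treat + n_ctrl
    return n_total / (n_treat * n_ctrl) + d**2 / (2 * n_total)


def cohens_d_within(m_post, sd_post, m_pre, sd_pre):
    """Pretest-posttest d standardized by the average SD (one convention of several)."""
    return (m_post - m_pre) / math.sqrt((sd_pre**2 + sd_post**2) / 2)


# Hypothetical groups of 8 learners each: a 2-point gap with SD = 2 gives d = 1.
d = cohens_d_between(12.0, 2.0, 8, 10.0, 2.0, 8)
print(d, variance_of_d(d, 8, 8))  # 1.0 0.28125
```

The variance matters because inverse-variance weighting, used when fitting meta-analytic models, gives more precise studies more influence on the pooled estimate.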

Table 1 Coding Book

Two independent coders were involved in the coding process to ensure reliability. Inter-coder reliability was assessed with Cohen's Kappa statistic (k), calculated in SPSS. Table 2 presents the criteria for interpreting the magnitude of k values. Kappa is often 'presented along with the agreement rate, which is the number of agreed-on codes divided by the total number of coding opportunities' (Cooper, 2017, p. 136). The agreement rate and Kappa statistic were 99.82% and 0.99, respectively, indicating almost perfect agreement between the two coders. The two coders then discussed and resolved the single discrepancy.
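For readers who wish to verify such figures, the agreement rate and Cohen's kappa can be computed directly from two coders' labels using the standard formula k = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement. The Python sketch below is a minimal illustration, not the SPSS procedure used in the study, and the example codes are hypothetical.

```python
from collections import Counter


def agreement_and_kappa(codes_a, codes_b):
    """Return (agreement rate, Cohen's kappa) for two coders' categorical codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same, non-empty set of items")
    n = len(codes_a)
    # Observed agreement: agreed-on codes divided by coding opportunities.
    p_observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement from each coder's marginal category frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_chance == 1:
        return p_observed, float("nan")  # kappa undefined: a single category only
    return p_observed, (p_observed - p_chance) / (1 - p_chance)


coder_1 = ["gloss", "gloss", "priming", "priming"]
coder_2 = ["gloss", "priming", "priming", "priming"]
print(agreement_and_kappa(coder_1, coder_2))  # (0.75, 0.5)
```

The example shows why kappa is reported alongside the raw agreement rate: 75% agreement shrinks to k = 0.5 once the agreement expected by chance is removed.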

Table 2 Interpretation of Cohen's Kappa Statistic (k) for the Strength of Agreement (Landis & Koch, 1977)

3.4 Meta-Analysis Procedures and Statistics

The meta-analytic procedures were conducted in Stata/SE version 17.0. Because the effect sizes of within-subject studies tend to be larger than those of between-subject studies (Plonsky & Oswald, 2014), effect sizes were calculated separately for within-subject and between-subject studies, and both unweighted (aggregated) mean effect sizes and mean effect sizes weighted by sample size were computed. Table 3 presents Plonsky and Oswald's (2014) criteria for interpreting the magnitude of d-type effect sizes. To estimate the range of plausible effects in the participant population, 95% confidence intervals were constructed.
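The distinction between aggregated (unweighted) and sample-size-weighted mean effect sizes can be sketched as follows. This minimal Python illustration assumes simple weighting by each study's total N, as in Hunter and Schmidt's approach; it does not reproduce the inverse-variance weights that Stata applies when fitting a meta-analytic model, and the numbers are hypothetical.

```python
def mean_effect_sizes(effect_sizes, sample_sizes):
    """Return (unweighted mean d, mean d weighted by sample size)."""
    if len(effect_sizes) != len(sample_sizes) or not effect_sizes:
        raise ValueError("need one sample size per effect size")
    unweighted = sum(effect_sizes) / len(effect_sizes)
    weighted = sum(n * d for n, d in zip(sample_sizes, effect_sizes)) / sum(sample_sizes)
    return unweighted, weighted


# Hypothetical studies: the larger study pulls the weighted mean toward its d.
print(mean_effect_sizes([1.0, 2.0], [10, 30]))  # (1.5, 1.75)
```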

Table 3 The Interpretation of Effect Size (Plonsky & Oswald, 2014)

The heterogeneity of the effect sizes was examined using the Q statistic and the I-squared (I²) statistic. A statistically significant Q-test rejects the null hypothesis of homogeneity, indicating that effect sizes vary more than would be expected from sampling error alone. I² indicates the percentage of total variation in effect sizes that is due to between-study heterogeneity rather than sampling error. Specifically, I² > 25% indicates heterogeneity among the included studies (Talan et al., 2020; Upadhyay et al., 2022), and I² > 75% indicates considerable heterogeneity (Cooper, 2017). Because the Q and I² statistics indicated considerable heterogeneity for both within-subject (I² = 91.55% > 75%; Q = 71.71, p < 0.01) and between-subject (I² = 98.44% > 75%; Q = 1229.89, p < 0.01) studies, this meta-analysis employed the random-effects model, which accounts for variability in effect sizes due to between-study variance (Cooper, 2017). Part of this heterogeneity may be attributable to moderator variables, which can be identified through subgroup analysis when more than two eligible studies are included in a subgroup (Upadhyay et al., 2022).
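The heterogeneity statistics and the random-effects model can be illustrated with a minimal Python sketch of Cochran's Q, I² = (Q - df) / Q, and a DerSimonian-Laird random-effects estimate. The DerSimonian-Laird estimator of the between-study variance (tau²) is an assumption made here for simplicity; the study does not state which estimator was used, and other estimators (for example, REML) can yield somewhat different tau², I², and pooled values. A p-value for Q would come from a chi-square distribution with k - 1 degrees of freedom (e.g., `scipy.stats.chi2.sf`), omitted here to keep the sketch dependency-free.

```python
import math


def random_effects_dl(effects, variances):
    """Cochran's Q, I^2 (%), DerSimonian-Laird tau^2, and the random-effects mean."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effect sizes with matching variances")
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # I^2: share of total variation beyond what sampling error would produce.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird moment estimator of the between-study variance.
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = 1.959963984540054  # 97.5th percentile of the standard normal
    return {"Q": q, "df": df, "I2": i2, "tau2": tau2, "d": pooled,
            "ci95": (pooled - z * se, pooled + z * se)}


result = random_effects_dl([0.0, 1.0], [0.1, 0.1])
print(result["Q"], result["I2"], result["tau2"], result["d"])  # 5.0 80.0 0.4 0.5
```

Adding tau² to every study's variance makes the random-effects weights more equal than fixed-effect weights, so small studies retain more influence when heterogeneity is large.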

Publication bias was assessed to establish the reliability of the findings. Academic publishing tends to favor studies reporting statistically significant, positive results with larger effect sizes, which can result in publication bias (Lin & Chu, 2018; Merino-Armero et al., 2021; Talan et al., 2020). Publication bias may also be caused by researchers' expectations of good performance, the names of certain authors, and some degree of subjectivity in assessing outcomes (Merino-Armero et al., 2021). As this meta-analysis adopted the random-effects model, studies with larger sample sizes were given greater weight. Therefore, rather than relying on visual inspection of the funnel plot alone, the nonparametric trim-and-fill method, which is based on funnel-plot asymmetry, was used to 'estimate the number of missing studies that might exist in a meta-analysis' (Duval & Tweedie, 2000, p. 456). No studies needed to be imputed for the immediate effect sizes of either within-subject (observed = 7, imputed = 0) or between-subject studies (observed = 46, imputed = 0), suggesting that the immediate effect sizes were not affected by publication bias. However, one study was imputed for the delayed effect sizes of both within-subject (observed = 4, imputed = 1) and between-subject studies (observed = 20, imputed = 1). As a result, delayed effect sizes were not weighted but reported as aggregated mean effect sizes for both the observed and the observed-plus-imputed studies.

4 Results

In terms of publication characteristics, the eligible studies were published between 2006 and 2021 in 15 journals and two conference proceedings. Apart from eight studies, of which two each were published in Computer Assisted Language Learning, Computers & Education, Language Learning & Technology, and Language Teaching Research, every included study appeared in a different journal or conference proceedings. As for learner variables, most participants had reached an intermediate level of L2 proficiency. In terms of research designs, five studies used within-subject designs, and the other 16 used between-subject designs. Lexical glosses were the most frequently used digital resource outside the reading texts: fifteen studies used lexical glosses, one used a personalized reading system, one used lexical priming, three used multiple resources, and in one study learners read onscreen without access to any digital resources outside the reading texts. The most frequently used test format was form recall, followed by meaning recognition. Finally, in terms of descriptive statistics, the sample sizes of the 21 included studies ranged from 16 to 282, and a total of 77 effect sizes were obtained from immediate and delayed vocabulary post-tests. The five within-subject studies yielded 11 effect sizes: seven immediate effect sizes ranging from 0.14 to 3.76 and four delayed effect sizes ranging from 0.53 to 1.30. The 16 between-subject studies yielded 66 effect sizes: 46 immediate effect sizes ranging from 0.03 to 11.18 and 20 delayed effect sizes ranging from -0.12 to 8.17.

Table 4 presents the immediate effect of digital reading on L2 vocabulary learning. The third column reports the aggregated effect size, and the fourth column reports the effect size weighted by sample size. As Table 4 shows, digital reading had a statistically significant immediate effect on L2 vocabulary learning: the effect was upper-medium for within-subject studies (d_weighted = 1.39, p < 0.01) and large for between-subject studies (d_weighted = 1.45, p < 0.01). The immediate effect sizes and confidence intervals of the individual within-subject and between-subject studies are shown in Fig. 2 and Fig. 3, respectively.

Table 4 Immediate Effect Sizes

Fig. 2 Forest Plot of the Immediate Effects of Within-Subject Studies

Fig. 3 Forest Plot of the Immediate Effects of Between-Subject Studies

Table 5 reports the delayed effect of digital reading on L2 vocabulary learning. As reported in the publication bias analysis, one study was imputed for the within-subject and one for the between-subject delayed effect sizes. The third column of Table 5 presents the aggregated effect size for the observed studies, and the fourth column presents the aggregated effect size for the observed and imputed studies. As shown in Table 5, digital reading had a medium effect on delayed vocabulary post-tests in within-subject studies (d_observed+imputed = 0.86, p < 0.01), indicating that the overall effect decreased over time. For between-subject studies, digital reading had an accumulated large effect on L2 vocabulary learning, as the delayed effect size (d_observed+imputed = 2.98, p < 0.01) was larger than the immediate effect size (d_weighted = 1.45, p < 0.01). The delayed effect sizes and confidence intervals of the individual within-subject and between-subject studies are shown in Fig. 4 and Fig. 5, respectively.

Table 5 Delayed Effect Sizes

Fig. 4 Forest Plot of the Delayed Effects of Within-Subject Studies

Fig. 5 Forest Plot of the Delayed Effects of Between-Subject Studies

Subgroup analysis was conducted to identify sources of heterogeneity and their moderating effects. Three potential moderator variables, namely L2 proficiency, test formats, and digital resources, were examined for their moderating effects on L2 vocabulary learning through digital reading. Table 6 and Table 7 show the moderating effects on the effect sizes of within-subject and between-subject studies, respectively. To visualize the results and add transparency, the corresponding forest plots are provided in Fig. 6 and Fig. 7.
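The between-subgroup Q statistics reported in this section test whether pooled effects differ across the levels of a moderator. The minimal Python sketch below assumes a DerSimonian-Laird random-effects estimate within each subgroup (with a separate tau² per subgroup) and compares the subgroup means with their inverse-variance weighted grand mean; the statistic is referred to a chi-square distribution with G - 1 degrees of freedom, where G is the number of subgroups. Stata's implementation details may differ, and the data are hypothetical.

```python
def pool_dl(effects, variances):
    """DerSimonian-Laird random-effects mean and its variance for one subgroup."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    # A single-study subgroup carries no information about tau^2, so use 0.
    if len(effects) > 1 and c > 0:
        tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    else:
        tau2 = 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    return sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re), 1.0 / sum(w_re)


def q_between(subgroups):
    """Between-subgroup Q and its df; `subgroups` maps label -> (effects, variances)."""
    pooled = {label: pool_dl(e, v) for label, (e, v) in subgroups.items()}
    weights = {label: 1.0 / var for label, (_, var) in pooled.items()}
    grand = sum(weights[g] * pooled[g][0] for g in pooled) / sum(weights.values())
    q_b = sum(weights[g] * (pooled[g][0] - grand) ** 2 for g in pooled)
    return q_b, len(subgroups) - 1


groups = {
    "beginner": ([0.0, 0.0], [0.1, 0.1]),  # hypothetical effect sizes and variances
    "intermediate": ([1.0, 1.0], [0.1, 0.1]),
}
print(q_between(groups))  # (10.0, 1)
```

Because the test is run on subgroup means, subgroups containing only one or two studies contribute imprecise estimates, which is one reason subgroup contrasts with very few studies should be read cautiously.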

Table 6 Moderating Effects on Effect Sizes of Within-Subject Studies

Table 7 Moderating Effects on Effect Sizes of Between-Subject Studies

Fig. 6 The Forest Plot for Moderating Effects on Effect Sizes of Within-Subject Studies

Fig. 7 The Forest Plot for Moderating Effects on Effect Sizes of Between-Subject Studies

L2 proficiency had a statistically significant moderating effect on the effect sizes of both within-subject (Q = 19.35, p < 0.01) and between-subject (Q = 78.63, p < 0.01) studies. In within-subject comparisons, digital reading had a statistically significant, upper-medium effect on intermediate L2 learners' vocabulary learning (d = 1.35, p < 0.01) but only a small, statistically significant effect on that of L2 beginners (d = 0.40, p = 0.02). In between-subject comparisons, digital reading had statistically significant, large effects for learners at both the beginning (d = 1.42, p < 0.01) and intermediate (d = 3.73, p < 0.01) levels of L2 proficiency.

The effect sizes of both within-subject (Q = 6.77, p = 0.03) and between-subject (Q = 21.21, p < 0.01) studies also varied significantly by test format. In within-subject studies, digital reading had a statistically significant, medium effect when vocabulary was measured by meaning recognition (d = 0.64, p = 0.02), a statistically significant, large effect when it was measured by the Vocabulary Knowledge Scale (d = 2.24, p < 0.01), and a statistically significant, upper-medium effect when it was measured by mixed formats (d = 1.38, p < 0.01). In between-subject studies, the effect of digital reading was large and statistically significant for every test format.

Finally, digital resources had a statistically significant moderating effect only on the effect sizes of between-subject studies (Q = 55.80, p < 0.01). Among the digital resources, personalized reading systems (d = 1.58, p < 0.01) and lexical glosses (d = 2.59, p < 0.01) were associated with statistically significant, large effects on L2 vocabulary learning through digital reading.

5 Discussion

The main objectives of this meta-analysis were to synthesize research results and examine moderator variables to understand the effect of digital reading on L2 vocabulary learning. Mean effect sizes were calculated, and potential moderator variables were examined to capture the complex relationship between digital reading and L2 vocabulary learning. Digital reading had a significant effect on immediate L2 vocabulary post-tests, with an upper-medium effect in within-subject studies (d_weighted = 1.39, p < 0.01) and a large effect in between-subject studies (d_weighted = 1.45, p < 0.01). These positive results support learning L2 vocabulary through digital reading, although the effect decreased over time in within-subject studies (d_observed+imputed = 0.86, p < 0.01). Conversely, the effect in between-subject studies accumulated over time (d_observed+imputed = 2.98, p < 0.01). The immediate (g = 0.46) and delayed (g = 0.28) effect sizes recently reported by Ramezanali et al. (2021) were both smaller than those found in this meta-analysis. One explanation is that some studies included by Ramezanali et al. (2021) were not conducted in digital contexts but focused on paper-based reading with lexical glosses, whereas the lexical gloss studies in this meta-analysis all involved onscreen reading. In digital contexts, lexical glosses can be provided through hyperlinks, which let learners consult glosses only when needed, prevent their attention from being split between the text and the glosses, and reduce their cognitive load (AbuSeileek, 2017; Chen & Yen, 2013). Consistent with this explanation, Abraham's (2008) meta-analysis of L2 vocabulary learning with computerized lexical glosses reported larger effect sizes (d_immediate = 1.40, d_delayed = 1.25) than Ramezanali et al. (2021). Nevertheless, the between-subject effect sizes in this meta-analysis (d_weighted = 1.45 immediate; d_observed+imputed = 2.98 delayed) were larger still. 
This difference may be explained by the fact that Abraham (2008) and Ramezanali et al. (2021) investigated only the facilitative potential of lexical glosses, whereas this meta-analysis also included other resources, such as personalized reading systems. Different digital resources available during onscreen reading moderate L2 vocabulary learning through digital reading, as discussed in the following analysis of moderator variables.

Subgroup analysis suggested that digital resources were a statistically significant moderator for between-subject studies. Onscreen reading without access to any digital resources outside the reading texts (d = 0.17, p = 0.57) and onscreen reading with lexical priming (d = 0.26, p = 0.28) had only small effects on L2 vocabulary learning, whereas onscreen reading with access to multiple digital resources outside the reading texts had a large effect (d = 1.17, p = 0.26). However, none of these three effects was statistically significant, possibly because of the small number of eligible studies, so these results must be interpreted with caution. Personalized reading systems (d = 1.58, p < 0.01) and lexical glosses (d = 2.59, p < 0.01) had statistically significant, large effects, with lexical glosses appearing to be the most effective resource. Since only one eligible study used a personalized reading system, comparing its effect with that of the 15 eligible studies on lexical glosses is risky. In interpreting this result, it should be noted that one advantage of personalized reading systems is that they adapt reading texts to learners' language proficiency. Although personalized reading systems afford comprehensible texts, and reading comprehensible texts is one of the most efficient ways of L2 learning (Boers, 2022; Krashen, 1989), all eligible studies on lexical glosses also used comprehensible texts, which offset this adaptive advantage. Another advantage of personalized reading systems is that they recommend texts containing unfamiliar words that learners have previously encountered, thereby increasing the number of word recurrences and enhancing vocabulary learning. Nevertheless, learners may not correctly infer word meanings, as no definitions were provided for unfamiliar words in the recommended texts. 
Further, personalized reading systems do not highlight these words, so learners may skip novel words during reading instead of noticing them and inferring their meanings. According to the noticing hypothesis (Schmidt, 1990), a lexical item must be noticed before being processed and learned. In other words, any lexical item that is not noticed is unlikely to be learned.
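
The recommendation mechanism described above can be sketched in a few lines. The following Python sketch is hypothetical: the function names, the naive tokenizer, and the learner model (a count of prior encounters per word plus a set of known words) are illustrative assumptions, not the design of the personalized reading system in the eligible study. The `highlight` helper illustrates the kind of highlighting that, on the noticing account, such systems lack; it is likewise an assumption rather than a reported feature.

```python
import re
from collections import Counter

def tokenize(text):
    """Naive lower-cased word tokenizer, for illustration only."""
    return re.findall(r"[a-z']+", text.lower())

def recommend_text(candidates, encountered, known):
    """Pick the candidate text with the most recurrences of words the learner has
    already encountered (count > 0) but does not yet know."""
    def score(text):
        return sum(1 for tok in tokenize(text) if encountered[tok] > 0 and tok not in known)
    return max(candidates, key=lambda text_id: score(candidates[text_id]))

def highlight(text, targets):
    """Wrap target words in asterisks so they stand out on screen
    (hypothetical support for noticing)."""
    return re.sub(
        r"[A-Za-z']+",
        lambda m: f"*{m.group(0)}*" if m.group(0).lower() in targets else m.group(0),
        text,
    )

# Hypothetical learner model: prior encounters with words, plus words already known.
encountered = Counter({"harbor": 2, "lantern": 1, "cat": 5})
known = {"cat"}
texts = {
    "t1": "The cat sat by the door.",
    "t2": "A lantern glowed over the harbor; the harbor was quiet.",
}
best = recommend_text(texts, encountered, known)
targets = {w for w in encountered if w not in known}
print(best, highlight(texts[best], targets))
```

Here "t2" is preferred because it repeats two previously encountered but still unknown words, and highlighting makes those recurrences visible rather than leaving learners to notice them unaided.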

In addition to digital resources, L2 proficiency was found to be a statistically significant moderator in both within-subject and between-subject studies. In both cases, intermediate L2 learners benefited more from digital reading than L2 beginners, although their effects had wider confidence intervals. The larger effect for intermediate learners may reflect concerns that L2 beginners lack a sufficient vocabulary base to infer word meanings and retain words from the reading context (AkbuSeileek, 2011; Bengeleil & Paribakht, 2004; Laufer, 1997). Intermediate learners, on the other hand, can infer the meanings of novel words from contextual cues. Therefore, intermediate learners can better learn L2 vocabulary through onscreen reading without access to any digital resources outside the reading texts, digital reading with lexical priming, or digital reading via personalized reading systems. In digital reading with lexical glosses or multiple resources, learners do not need to infer the meanings of novel words because these meanings are provided; however, less proficient learners tend to allocate attentional resources less efficiently than more proficient learners (AkbuSeileek, 2008; Liu & Leveridge, 2017; Payne & Ross, 2005; Ruiz et al., 2021). Specifically, L2 beginners split more of their attention between the reading text and the lexical glosses than intermediate learners do.

This finding is consistent with that of Abraham (2008), who reported that more proficient learners (beginning: d = 0.57; intermediate: d = 1.34; advanced: d = 2.06) could better connect the vocabulary in glosses to their pre-existing vocabulary networks and semantic systems. Similar conclusions have been drawn in recent empirical research on other aspects of foreign and second language development. For instance, Zhang and MacWhinney (2023) demonstrated that increasing the number of unfamiliar training stimuli helped intermediate learners acquire L2 phonetic knowledge more effectively than it helped beginning learners. Lantz-Andersson (2018) indicated that language activities on social platforms provide diverse linguistic repertoires through which L2 learners can develop socio-pragmatic competence, but that advanced L2 proficiency is needed to fully exploit such language-in-use on social media. Pedagogically, these findings support gradually increasing the novelty and diversity of instructional designs, learning materials, and strategies, tailored to learners' proficiency levels and individual differences. In the context of the current study, this approach entails broadening learners' exposure to various digital resources and providing more personalized reading systems as their L2 proficiency advances.

Finally, test format was also a statistically significant moderator in both within-subject and between-subject studies. For between-subject studies, although researchers have suggested that word recall is more difficult to acquire than word recognition (González-Fernández & Schmitt, 2020; Laufer & Goldstein, 2004; Laufer & Paribakht, 1998), statistically significant and large effects were found for recognition tests, recall tests, and VKS tests that measured word use. It therefore appears that digital reading can effectively enhance all the aspects of L2 vocabulary knowledge measured in these studies. For within-subject studies, statistically significant and large effects were found for VKS tests and mixed tests, whereas meaning recognition tests showed a medium effect. Given the wide confidence intervals and the small number of within-subject studies, this result must be interpreted with caution and awaits confirmation from future replication studies.
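
Each moderator result in this section (digital resource, L2 proficiency, test format) rests on a subgroup comparison: effects are pooled within each subgroup, and the subgroup estimates are then tested for between-group differences. As a minimal sketch, assuming a DerSimonian-Laird random-effects model with a separate between-study variance in each subgroup and a Q-between test (a common approach, not necessarily the exact model or software used in this meta-analysis), the computation might look like this. The function names and all effect sizes and variances below are hypothetical.

```python
import math

def pool_random_effects(d, var):
    """DerSimonian-Laird random-effects pooling of effect sizes d with sampling
    variances var. Returns (pooled effect, variance of pooled effect, tau^2)."""
    w = [1 / v for v in var]
    sw = sum(w)
    fixed = sum(wi * di for wi, di in zip(w, d)) / sw
    q = sum(wi * (di - fixed) ** 2 for wi, di in zip(w, d))
    c = sw - sum(wi ** 2 for wi in w) / sw
    k = len(d)
    # Between-study variance cannot be estimated from one study; fall back to 0.
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in var]
    pooled = sum(wi * di for wi, di in zip(w_star, d)) / sum(w_star)
    return pooled, 1 / sum(w_star), tau2

def chi2_sf(x, df):
    """Exact upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total

def q_between(subgroups):
    """Q-between test: do pooled effects differ across subgroups? df = G - 1."""
    pooled = [pool_random_effects(d, v) for d, v in subgroups.values()]
    w = [1 / var for _, var, _ in pooled]
    grand = sum(wi * est for wi, (est, _, _) in zip(w, pooled)) / sum(w)
    q = sum(wi * (est - grand) ** 2 for wi, (est, _, _) in zip(w, pooled))
    df = len(pooled) - 1
    return q, df, chi2_sf(q, df)

# Hypothetical per-study effect sizes (d) and variances; NOT data from this meta-analysis.
subgroups = {
    "resource A": ([0.2, 0.4, 0.1], [0.05, 0.06, 0.04]),
    "resource B": ([1.5, 2.1, 1.8], [0.08, 0.10, 0.09]),
}
q, df, p = q_between(subgroups)
print(f"Q_between = {q:.2f}, df = {df}, p = {p:.3g}")
```

When a subgroup contains a single study, this sketch sets its between-study variance to zero, so the subgroup's uncertainty reflects sampling error alone; this is one reason comparisons involving one-study subgroups, such as personalized reading systems here, call for caution. Pooling a common between-study variance across subgroups is an alternative design choice.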

6 Conclusion

To summarize, this meta-analysis found that digital reading effectively enhanced L2 vocabulary learning, and that L2 proficiency, test format, and digital resource were statistically significant moderators. Subgroup analyses suggested that intermediate learners benefited more from digital reading than L2 beginners. However, only a few eligible studies (i.e., AkbuSeileek, 2011, 2017; Eom et al., 2012; Gorjian et al., 2011; Khezrlou, 2019; Lee et al., 2017; Liu & Leveridge, 2017; Rassaei, 2020; Ruiz et al., 2021) specified the tests or criteria used to determine learners' proficiency levels. We therefore suggest that future research report proficiency assessment more transparently and define L2 proficiency more rigorously. Further, no eligible study investigated advanced learners' L2 vocabulary learning through digital reading, a gap for future empirical studies to address. Results also suggested that all aspects of L2 vocabulary knowledge measured, including meaning recognition, meaning recall, form recognition, form recall, and vocabulary use, were facilitated by digital reading. Digital reading with access to lexical glosses appears to be the most efficient design for L2 vocabulary learning, followed by personalized reading systems. Pedagogically, then, teachers and learners may wish to make greater use of personalized reading systems and lexical glosses in digital reading to enhance intermediate and advanced L2 learners' vocabulary learning. Because L2 proficiency emerged as a prominent moderator, another pedagogical implication is that intermediate and advanced learners should be offered increased exposure to various digital resources with difficulty graded to their proficiency levels, to optimally enhance their vocabulary learning.

Although the statistical analysis is generally reliable, these conclusions should be interpreted as suggestive rather than definitive, owing to the small number of within-subject studies included in the current meta-analysis and, more importantly, the limited number of studies on computerized lexical priming, personalized reading systems, and multiple digital resources. Future research should further explore the effectiveness of digital reading with access to lexical glosses and other digital resources outside the reading texts. With the continuing development of natural language processing techniques and the rapid evolution of adaptive learning algorithms, personalized reading systems have great potential to facilitate L2 vocabulary learning through digital reading and merit further exploration. For example, future studies may apply personalized learning technology to lexical glosses: whereas personalized reading systems adapt reading texts to learners' L2 proficiency, personalized lexical glosses could adapt the choice of glossed words to each learner's L2 vocabulary knowledge. Finally, as this meta-analysis appears to be the first to compare the effects of various digital resources on L2 vocabulary learning through digital reading, future systematic reviews could replicate it to confirm our findings and extend the analysis to the effects of digital reading on other aspects of L2 learning.
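
The personalized-gloss idea above can be sketched as follows. This is a hypothetical illustration, not a design tested in any eligible study: the function name and glossary are invented for the example, and it assumes the learner's vocabulary knowledge is available as a simple set of known words, whereas a real system would more likely estimate word knowledge from prior test or reading data.

```python
import re

def personalized_glosses(text, glossary, known):
    """Insert a bracketed gloss after the first occurrence of each glossary word
    that this learner does not yet know; known words stay unglossed."""
    glossed = set()

    def annotate(match):
        word = match.group(0)
        key = word.lower()
        if key in glossary and key not in known and key not in glossed:
            glossed.add(key)  # gloss only the first occurrence of each word
            return f"{word} [{glossary[key]}]"
        return word

    return re.sub(r"[A-Za-z']+", annotate, text)

# Hypothetical English-Spanish glossary and two hypothetical learners.
glossary = {"harbor": "puerto", "lantern": "linterna"}
text = "The lantern lit the harbor. The harbor was calm."
print(personalized_glosses(text, glossary, known={"harbor"}))
print(personalized_glosses(text, glossary, known=set()))
```

The same text thus receives different glosses for different learners. Glossing only the first occurrence is one design choice; glossing every occurrence, or letting learners request a gloss on demand, are alternatives.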