A recent National Assessment of Educational Progress (NAEP) report indicated that 68% of fourth-grade students and 71% of eighth-grade students in the United States read below proficient levels (NAEP, 2022). Developing proficiency in reading is essential, as there are adverse consequences associated with persistent reading difficulties (RDs), including high school dropout, incarceration, anxiety, and depression (Dahle & Knivsberg, 2014; Daniel et al., 2006; Greenberg et al., 2007; Jordan et al., 2014). Early access to evidence-based core and supplemental reading instruction can reduce the incidence and severity of RDs experienced by students (Al Otaiba et al., 2009; VanDerHeyden et al., 2007). Thus, providing effective reading instruction is vital not only for supporting students’ development of proficient reading but also for supporting their overall success and wellbeing. In 2000, the National Reading Panel (NRP) conducted a systematic review of reading research and identified five key components of effective reading instruction: phonological awareness, phonics, vocabulary, comprehension, and fluency (NRP, 2000). More recent reviews have supported and extended these findings (e.g., Donegan & Wanzek, 2021; Foorman et al., 2016; Gersten et al., 2020). For example, the What Works Clearinghouse (WWC) published a practice guide recommending that effective early reading instruction address phonological awareness, phonics/decoding, fluency, and comprehension in an explicit and systematic manner (Foorman et al., 2016).

While it is known that (a) writing is an essential skill needed for learning and demonstrating knowledge in all content areas in school and (b) the majority of students in eighth grade in the United States perform below grade level in writing (National Center for Education Statistics, 2012), considerably less research has evaluated approaches to writing instruction than approaches to reading instruction (Kim et al., 2021; McMaster et al., 2017). That said, there is some evidence supporting the efficacy of early writing instruction for reducing future writing difficulties (e.g., Berninger et al., 2008; McMaster et al., 2017), and a WWC practice guide that reviewed 34 studies of writing instructional practices for elementary students recommended that students receive instruction in handwriting, spelling, and composition (Graham et al., 2012). Given that literacy involves the ability to both read and write (United Nations Educational, Scientific and Cultural Organization [UNESCO], 2008), addressing reading-related skills, such as phonological awareness, phonics, vocabulary, comprehension, and fluency, as well as writing-related skills, such as spelling and written composition, is key to providing effective literacy instruction for elementary students.

Technology in education

The presence of technology to support student learning has increased rapidly during the last few decades. In 1983, there was one computer for every 125 students enrolled in public schools (Manzo, 2023). By 2009, there was one computer for every five students (National Center for Education Statistics, 2021). During the 2019–2020 school year, 45% of schools reported having a computer for each student; an additional 37% had a computer for each student in some grades or classrooms (Gray & Lewis, 2021). Technology has become equally prevalent in students’ homes: In 2021, 94% of adults who had school-age children at home reported that computers and internet access were always or usually available for educational purposes (Hemphill et al., 2021).

There is evidence that technology-based instruction increases student engagement and motivation (Bryant et al., 2015) and adapts to individual students’ needs (Ciampa, 2014). Many schools credit technology with helping students learn at their own pace (35%), learn collaboratively with peers (30%), learn more actively (41%), and think critically (27%; Gray & Lewis, 2021). Additionally, previous meta-analyses have found that technology-based instruction is associated with positive effects ranging from small (0.16) to moderate (0.60) on the literacy performance of students in grades K-12 (e.g., Cheung & Slavin, 2012; Lee et al., 2022; Moran et al., 2008; Verhoeven et al., 2020; Wen & Walters, 2022; Xu et al., 2019). At the same time, researchers have noted that not all approaches to incorporating educational technology into the classroom are equally effective. For example, Ganimian (2022) argues that educational technology is most effective when it complements, rather than replaces, high-quality teacher-led instruction (e.g., by facilitating personalized feedback, expanding opportunities for practice, and increasing learner engagement). In the context of this rapidly changing educational landscape, there is a need to describe technology-based literacy instruction and examine its effects on student literacy outcomes.

Recent reviews of technology-based literacy instruction

We identified eight reviews of research investigating technology-based literacy instruction published in the last 5 years: four syntheses (Alqahtani, 2020; Dean et al., 2021; Eutsler et al., 2020; Jamshidifarsani et al., 2019) and four meta-analyses (Lee et al., 2022; Verhoeven et al., 2020; Wen & Walters, 2022; Xu et al., 2019). Table 1 summarizes the characteristics of these eight reviews. Five of the reviews of technology-based instruction focused on a broad array of literacy skills, one focused solely on the development of early literacy skills related to phonological awareness, letter knowledge, and storybook reading (Verhoeven et al., 2020), one focused on reading comprehension instruction only (Xu et al., 2019), and one focused on writing instruction only (Wen & Walters, 2022). Two of the reviews included studies with students in Grades K-12 (Lee et al., 2022; Xu et al., 2019), five of the reviews primarily included studies with elementary students (Alqahtani, 2020; Dean et al., 2021; Eutsler et al., 2020; Jamshidifarsani et al., 2019; Wen & Walters, 2022), and one only included studies with students in preschool and kindergarten (Verhoeven et al., 2020).

Table 1 Characteristics of previous reviews of technology-delivered literacy instruction

Jamshidifarsani et al. (2019) conducted a synthesis of studies involving technology-based reading interventions for students in Grades 1–6. They classified 42 studies with 32 intervention programs into six instructional categories based on the NRP (2000) report: phonological awareness, phonics, vocabulary, comprehension, fluency, and multi-component. They reported that 10 of the programs focused on phonics instruction and only one program focused on vocabulary instruction. Notably, six of the programs were classified as multi-component interventions. The authors noted that phonological awareness and phonics interventions were mostly evaluated in the early elementary grades whereas fluency interventions were used mostly in upper elementary grades; comprehension interventions were more evenly distributed across the grades. Of the 32 reading programs evaluated in the included studies, most (n = 29) were computer-based. They also noted that 72 different reading outcome measures were used in the studies they reviewed, but they did not analyze study outcomes.

Another recent synthesis that focused on studies of technology-based reading interventions for elementary students was Alqahtani (2020). Notably, the author was only interested in studies involving students with or at risk for RDs. Like Jamshidifarsani et al. (2019), Alqahtani classified included studies into categories based on the NRP (2000) literacy domain the evaluated intervention addressed. Among the 45 studies included in the Alqahtani synthesis, the most common intervention target was fluency (n = 19) and the least common was vocabulary (n = 1); nine studies were categorized as addressing multiple skills. The majority of interventions were delivered on computers (n = 39), with the remaining six delivered on tablets. In about half of the studies (n = 25), students used the program independently. One study took place in students’ homes instead of their schools. Results showed that 41 out of 45 studies demonstrated a positive effect when using technology to improve students’ reading skills.

Similar to Jamshidifarsani et al. (2019) and Alqahtani (2020), Dean et al. (2021) conducted a synthesis of studies involving elementary reading interventions delivered by technology. Dean and colleagues reported that in their sample of 49 studies, most focused on students in Grades 1–3 (n = 34) and included students with or at risk for RDs (n = 33). The devices used to deliver intervention primarily included computers (n = 35) and iPads or tablets (n = 10). In about half of the studies (n = 22), students worked independently. The average total intervention time was 16 h. Like Alqahtani (2020), the authors noted that, of the 44 studies that reported reading outcome data, 41 demonstrated some form of positive effect of the technology-based reading intervention on students’ reading skills.

Eutsler et al. (2020) specifically examined the influence of mobile technologies (e.g., tablets, smartphones, laptops) in the school setting on pre-kindergarten through fifth-grade students’ literacy achievement. Of the 61 studies included in their synthesis, 39 addressed a single literacy domain, with vocabulary (n = 16) and comprehension (n = 12) being the most common single literacy domains addressed. Tablets were the most common mobile device used (n = 45). About two-thirds of the studies (n = 42) involved students in Grades K-5. Of the 61 studies, 52 reported gains or mixed results in literacy outcomes.

Like Eutsler et al. (2020), Verhoeven et al. (2020) included pre-kindergarten students in their review. However, unlike the previously described reviews, Verhoeven and colleagues were interested in examining the effects of computer-supported early literacy interventions for pre-kindergarten and kindergarten students only. They analyzed 59 studies that evaluated the effects of computer-assisted interventions addressing phonological awareness (n = 11), letter knowledge and phonological awareness combined (n = 28), or storybook reading (n = 20). On average, interventions lasted 10 weeks, with 51 min of instruction per week. The average student age was 65 months, and students were designated as at risk for RDs in 16 studies. The authors reported a small positive mean effect of evaluated interventions on phonological awareness and reading-related (i.e., print concepts, letter knowledge, decoding, and spelling) outcomes (average g = 0.28). Notably, there were no significant differences in effects for the three types of interventions. Additionally, neither intervention duration nor participant characteristics (i.e., age or reading risk) were statistically significantly related to the effect size. However, intervention effects were significantly associated with research design; studies with random assignment to conditions demonstrated lower effect sizes on average.

Lee et al. (2022) analyzed the effectiveness of technology-integrated literacy instruction in the classroom setting for K-12 English learners (ELs). Of the 36 studies, approximately half (n = 17) involved elementary students. Most studies implemented instruction on desktop computers (n = 28) and required students to work independently (n = 23). The average total instructional time was 21 h. The authors estimated a moderate mean effect of technology-integrated instruction on ELs’ literacy achievement (average d = 0.47). They also examined whether intervention effects were moderated by eight study features. Only learning context and literacy outcome were statistically significant moderators of intervention effects, with the English as a foreign language context generating a larger effect size (g = 0.58) than the English as a second language context (g = 0.20) and writing outcomes producing a large effect size (g = 0.91), whereas vocabulary (g = 0.47) and reading outcomes (g = 0.26) were moderate to small.

Two meta-analyses focused on technology-based instruction within specific literacy domains: reading comprehension instruction (Xu et al., 2019) and writing instruction (Wen & Walters, 2022). Xu et al. (2019) analyzed 19 studies that examined the effect of intelligent tutoring systems (i.e., computer-based instructional systems that provide feedback) on reading comprehension outcomes for K-12 students. They discovered that these interventions were moderately effective at improving reading comprehension, with a mean effect size of g = 0.60. Wen and Walters (2022) analyzed 20 studies that evaluated the impact of technology-integrated writing instruction on writing outcomes for elementary students. They estimated that the mean effect on writing quality was g = 0.56 and the mean effect on writing quantity was g = 0.28.

Rationale for the present meta-analysis of technology-delivered literacy instruction

We conducted a new meta-analysis for several reasons. First, of the eight previously described reviews of technology-based literacy instruction, three had very specific focus areas in terms of instructional content. Verhoeven et al. (2020) only included studies addressing early literacy (defined as phonological awareness, letter knowledge, or storybook reading), Xu et al. (2019) only included studies addressing reading comprehension instruction, and Wen and Walters (2022) only included studies addressing writing instruction. Further, although five of the reviews focused on technology-based literacy instruction broadly, they typically focused on the five reading components outlined by the NRP (2000) report (i.e., they did not explore spelling or writing components of literacy instruction). For example, Dean et al. (2021) required studies to have interventions addressing reading or reading-related skills (defined as phonological awareness, letter-sound knowledge or phonics, word reading, fluency, vocabulary, and reading comprehension). Moreover, although Jamshidifarsani et al. (2019) and Alqahtani (2020) included studies focused on improving reading skills broadly, both categorized their included studies based on the NRP report’s five reading components, and Alqahtani (2020) excluded studies of writing interventions. One review (Lee et al., 2022) did not provide information on the specific literacy components addressed in its included studies. Therefore, none of the previous reviews fully explored all components of literacy instruction.

Additionally, three of the eight previous reviews had very specific focus areas in terms of student characteristics. Alqahtani (2020) required participants to have or be at risk for RDs, Lee et al. (2022) focused on ELs, and Verhoeven et al. (2020) only included students in pre-kindergarten and kindergarten. Further, some reviews excluded studies that included students with specific characteristics. For example, Jamshidifarsani et al. (2019), Verhoeven et al. (2020), and Xu et al. (2019) all excluded studies with second language learners. Some of the previous reviews also required the instruction evaluated in included studies to have specific features. For example, Verhoeven et al. (2020) required studies to focus on interventions utilizing computers only. Eutsler et al. (2020) required the use of mobile devices (e.g., tablets, smartphones, laptops) in a classroom or school setting. Lee et al. (2022) and Wen and Walters (2022) similarly focused on instruction provided in the classroom or school setting only. Thus, few reviews included students in a wide range of grades demonstrating a wide range of language skills and reading abilities, and some reviews limited the types of technological devices and instructional settings.

Further, of the eight reviews, only four employed meta-analytic methods. Without rigorous analyses of outcomes and potential moderators, we are unable to determine what works for whom and under what conditions. The present meta-analysis therefore employed such analyses. We reported only the effects on standardized, norm- or criterion-referenced measures (i.e., we excluded studies that only employed researcher-developed measures). Although researcher-developed measures that are closely aligned to content taught during instruction can provide important insight when evaluating the effectiveness of instruction (Clemens & Fuchs, 2022), such measures can also vary widely across studies. The use of norm- or criterion-referenced measures enabled us to more confidently compare effects across studies. Additionally, to more accurately describe the complex relationship between technology-delivered instruction and literacy outcomes, we explored the effects of several moderator variables. Based on previous meta-analyses of technology-based literacy instruction, we were specifically interested in whether effects were moderated by study characteristics (i.e., publication type, research design, sample size), participant characteristics (i.e., grade level, RDs), instruction characteristics (i.e., dosage, program availability, content foci), or outcome measure characteristics (i.e., literacy domain). Publication type was of particular interest given that (a) two of the four previous meta-analyses of technology-based literacy instruction explicitly required studies to be published in peer-reviewed journals and (b) research supports the importance of including grey literature (e.g., dissertations, conference proceedings, research reports) in meta-analyses to avoid bias in estimating the effects of instruction (McAuley et al., 2000).

Given the increasing prevalence of educational technology and the fact that not all educational technology approaches are equally effective, there is a need to describe the technology-based literacy instruction that has been evaluated in recent research and examine its effects on student literacy outcomes. Although previous reviews have begun to embark on this work, none of them have both focused on the full array of literacy skills that can be addressed during technology-delivered instruction for elementary students and also utilized meta-analytic methods to rigorously examine effects of such instruction. Therefore, the present meta-analysis sought to address this gap in the literature by asking:

1. What are the characteristics of studies that examine the effects of technology-delivered literacy instruction on literacy outcomes for K-5 students?

2. What is the mean effect of technology-delivered literacy instruction on K-5 students’ literacy outcomes?

3. Are effects moderated by study, participant, instruction, or outcome characteristics?

Method

Identification of studies

To identify studies evaluating the effects of technology-delivered literacy instruction for elementary students, we followed Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2009).

Search procedures

The search involved three steps. First, we searched the electronic databases ERIC and PsycINFO for studies published in English between January 1, 2000 and December 31, 2022. Our search syntax was as follows: (literacy OR read* OR spell* OR writ* OR vocab* OR phon*) AND (instruct* OR interven* OR program*) AND (computer* OR technolog*) AND (child* OR student* OR grade* OR kinder*). As we had already compiled an initial list of eligible studies, we verified that the results produced by these terms included all of them. Next, we completed an ancestral search of articles included in recent, relevant reviews (Alqahtani, 2020; Dean et al., 2021; Eutsler et al., 2020; Jamshidifarsani et al., 2019). Finally, we conducted a hand search of the three education technology journals with the greatest number of studies included in the full-text review (Computers & Education, Educational Technology Research and Development, and Journal of Computer Assisted Learning) for articles published between 2020 and 2022. Figure 1 presents our search procedure and results at each stage of the search process.

Fig. 1 PRISMA search flow diagram

Inclusion criteria

Studies were included if they met the following criteria:

1. Studies were published in English between January 1, 2000 and December 31, 2022.

2. Studies included participants in Grades K-5, or if grade level was not specified, participants were aged 5–11 years old, or the average age fell in this range.

3. Studies employed experimental or quasi-experimental, treatment-comparison research designs with at least 15 participants per group and groups were approximately equivalent (i.e., no major differences in age/grade, language learner status, or reading abilities). Acceptable comparison conditions included any group not receiving technology-delivered literacy instruction.

4. The instruction being evaluated focused on literacy (e.g., phonological awareness, phonics, decoding, encoding, vocabulary, fluency, reading comprehension, writing) and was provided on a technological device (e.g., computer, laptop, tablet, iPad, smartphone) primarily (i.e., at least 50%) in English.

5. Studies employed and reported data for at least one norm- or criterion-referenced literacy outcome measure. Studies included information needed to calculate a standardized mean difference (Hedges’ g) between treatment and comparison conditions (i.e., means, standard deviations, and sample sizes per group for each outcome measure of interest).

Screening and full-text review

We utilized the Covidence systematic review software (Covidence, 2023) to screen abstracts and identify articles that met inclusion criteria. Before screening abstracts, six screeners participated in a one-hour training and achieved ≥ 90% reliability with the first author when screening a practice set of ten abstracts. Each abstract was independently screened by two members of the research team, with the first author resolving any disagreements. A total of 9411 articles were excluded during the abstract screening stage. After the conclusion of abstract screening, 232 full texts were retrieved for review.

Before beginning full-text review, six reviewers participated in a one-hour training and achieved ≥ 90% reliability with the first author when reviewing a practice set of five articles. Reviewers were expected to apply inclusion criteria in a pre-specified order and identify the same reason for exclusion when exclusion was appropriate. Each article was independently reviewed by two members of the research team, with the first author resolving any disagreements. During full-text review, 179 articles were excluded because they did not meet at least one eligibility criterion, applied in the following order: enrolled students outside of the eligible grade/age range (n = 24), employed an ineligible research design (n = 50), did not use a technological device to provide literacy instruction in English (n = 48), did not employ an eligible, norm- or criterion-referenced outcome measure and report sample sizes, means, and standard deviations for each group at post-test (n = 46), and reported the results of analyses also reported in another publication (n = 11). Thus, of the 232 full texts reviewed, a total of 53 studies met the inclusion criteria and were coded.

Coding procedures

The first author, a researcher with extensive coding experience, served as the gold standard for coding. Prior to beginning coding, the third author, a graduate research assistant, participated in a one-hour training and achieved 93% interrater agreement on a set of two independently coded studies. All studies were then coded by the first author with the third author double coding approximately 30% of studies. Coders achieved 96% overall agreement across double-coded studies. Disagreements were resolved by discussion and consensus.

Articles were coded for (a) study and participant characteristics (e.g., research design, grade level, EL status, eligibility for free or reduced-price lunch), (b) instruction characteristics (e.g., literacy components present in the instruction, instruction dosage), and (c) outcome characteristics (e.g., literacy domains measured). Information about definitions used to code for these variables is provided in Table 2.

Table 2 Definitions of variables used in coding

Data analysis

To quantify the effects of technology-delivered literacy instruction on outcomes for elementary students, we analyzed relevant data using the Comprehensive Meta-Analysis software (Borenstein et al., 2005). We used standardized mean differences between treatment and comparison groups estimated with Hedges’ g (Hedges, 1981), using reported posttest means and standard deviations by condition. We used adjusted posttest means (i.e., posttest means adjusted for pretest scores) when they were available. When more than one outcome measure was reported for a study, effect size estimates were aggregated using the mean to avoid overrepresentation of multi-measure studies in the overall analyses (Rosenthal, 1991). Although there are alternative analyses that account for effect size dependency without aggregating scores to a mean score (e.g., robust variance estimation), the variance across effect sizes when we assume independence (i.e., when the correlation is set to 0.0) is 0.001; when the correlation is set to 1.0 (i.e., when effect size estimates are aggregated using the mean), the variance is 0.002. Given the very small difference between these two extremes, it is unlikely that setting the correlation to a numerical value between 0.0 and 1.0 (e.g., 0.8) would produce results different from those currently reported in the manuscript. Further, given that all our effect sizes were calculated from norm- or criterion-referenced outcome measures of related literacy skills (e.g., word reading, spelling), our effect sizes are likely to be highly correlated (i.e., near 1.0).
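
To make the effect size computation concrete, the following minimal Python sketch shows the standard Hedges’ g formula (a pooled posttest standard deviation with a small-sample correction) and the within-study averaging of multiple outcome measures described above. This is an illustration only, not the Comprehensive Meta-Analysis implementation, and all numbers in the example are hypothetical.

```python
import math
from statistics import mean

def hedges_g(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Standardized mean difference (Hedges' g) from posttest summary statistics."""
    df = n_t + n_c - 2
    # Pooled posttest standard deviation across treatment and comparison groups
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (m_t - m_c) / sd_pooled
    # Hedges (1981) small-sample correction factor
    j = 1 - 3 / (4 * df - 1)
    return d * j

# Hypothetical study reporting two outcome measures: aggregate to a single
# study-level estimate by taking the simple mean, as described above.
effects = [
    hedges_g(m_t=102.4, sd_t=12.1, n_t=48, m_c=98.7, sd_c=11.8, n_c=45),  # word reading
    hedges_g(m_t=99.6, sd_t=13.0, n_t=48, m_c=97.1, sd_c=12.5, n_c=45),   # spelling
]
study_effect = mean(effects)
print(round(study_effect, 2))
```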

We also calculated the Q and I2 statistics to reveal the extent to which heterogeneity among true effect sizes contributes to the observed variation in the effect size estimates. To test categorical moderator effects, we analyzed the random-effects between-groups heterogeneity statistic. For continuous variables, we used meta-regression based on restricted maximum likelihood estimation under the random-effects model to predict variation in effect size across studies from the moderator variables. We also examined funnel plots and conducted Egger’s regression test for random-effects models to determine the presence of publication bias. Lastly, we implemented the trim-and-fill method (Duval & Tweedie, 2000).
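
As a rough, self-contained illustration of these analyses, the Python sketch below computes Cochran’s Q, I2, the DerSimonian-Laird between-study variance, the random-effects mean, and the classic Egger regression intercept from study-level effect sizes and variances. It is a simplified stand-in for the procedures run in Comprehensive Meta-Analysis (it omits, e.g., the trim-and-fill adjustment and the moderator tests), and the function names are our own.

```python
import numpy as np

def dl_random_effects(g, v):
    """DerSimonian-Laird random-effects summary with Q and I2 heterogeneity statistics."""
    g, v = np.asarray(g, float), np.asarray(v, float)
    w = 1.0 / v                                  # fixed-effect (inverse-variance) weights
    g_fixed = np.sum(w * g) / np.sum(w)
    q = np.sum(w * (g - g_fixed) ** 2)           # Cochran's Q
    df = len(g) - 1
    i2 = max(0.0, (q - df) / q) * 100            # % of observed variation due to true heterogeneity
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                # between-study variance estimate
    w_star = 1.0 / (v + tau2)                    # random-effects weights
    g_re = np.sum(w_star * g) / np.sum(w_star)   # random-effects mean effect
    se_re = np.sqrt(1.0 / np.sum(w_star))
    return {"g": g_re, "se": se_re, "Q": q, "I2": i2, "tau2": tau2}

def egger_intercept(g, v):
    """Classic Egger test: regress the standardized effect (g/se) on precision (1/se).
    An intercept far from zero suggests funnel-plot asymmetry (small-study effects)."""
    g, se = np.asarray(g, float), np.sqrt(np.asarray(v, float))
    slope, intercept = np.polyfit(1.0 / se, g / se, 1)
    return intercept
```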

Results

Descriptive findings

Study and participant characteristics

The final corpus consisted of 53 studies. Table 3 summarizes study and participant characteristics. Of the 53 studies, 43 studies were published in peer-reviewed journals and 10 studies were doctoral dissertations. In terms of research design, 36 studies employed randomized controlled trials and 17 studies employed quasi-experimental designs in which participants were not randomized to condition. The average number of participants per study was 204 (range 32 to 1672). Almost half (n = 24) of the 53 studies reported that at least 50% of the participating students were identified as having or being at risk for RDs. Eight studies reported that at least 50% of participants were ELs and nine studies reported that at least 50% of participants experienced economic disadvantage (i.e., were eligible to receive free or reduced-price lunch).

Table 3 Study and participant characteristics

Although our search and inclusion criteria required studies to have a majority of participants in Grades K-5, participant grades ranged from pre-kindergarten to eighth grade. The two studies that included pre-kindergarten students (O'Callaghan et al., 2016; Schmitt et al., 2018) reported average ages indicating that at least 50% of participants were in kindergarten. Similarly, the four studies that included participants beyond Grade 5 (Boller, 2010; Kim et al., 2010, 2011; Troia, 2004) reported the percentage of students in each grade, which indicated that at least 50% of participants were in Grades K-5. About half of the studies (n = 29) included participants in two or more grades. Overall, the majority of studies (n = 40) involved students primarily in early elementary (i.e., Grades K-2); 13 studies involved students primarily in later elementary (i.e., Grades 3–5).

Instruction characteristics

The 53 included studies reported on 61 treatment conditions and 62 comparison conditions; the total number of treatment-comparison contrasts across the 53 studies was 66. Instruction characteristics for each treatment condition are reported in Table 4.

Table 4 Instruction characteristics

The average reported dosage of technology-delivered literacy instruction was 36 min a day, 4 days per week, for 18 weeks. Overall, the average total instructional time was 37 h. Of the 61 treatment conditions, 50 implemented publicly available literacy programs. Notably, 12 of the 50 used ABRACADABRA (A Balanced Approach for Children Designed to Achieve Best Results for All), seven used Lexia Learning Systems, and five used Fast ForWord. Most of the 61 treatment conditions (n = 45) did not report involving adult or peer support (i.e., the child independently engaged with the literacy program on the technological device). Of the 61 treatment conditions, only three were implemented on a device that was not a computer/laptop: one used smartphones (Patel et al., 2022) and two used tablets (both treatment conditions in Stein et al., 2022). All except one of the 61 treatment conditions were implemented in a school setting; Schmitt et al. (2018) took place at the participants’ homes.

The research team analyzed descriptions of all 61 treatment conditions and coded for the presence of seven literacy domains: phonological awareness, phonics/decoding/word reading, encoding/spelling, text reading/fluency, vocabulary, comprehension, and writing. The most prevalent literacy domains were phonics/decoding/word reading (n = 49), text reading/fluency (n = 45), phonological awareness (n = 44), and comprehension (n = 43). Instruction in vocabulary was also common (n = 36). Instruction in encoding/spelling (n = 23) and writing (n = 11) was less prevalent. Most treatment conditions addressed more than one literacy domain during instruction (n = 57), with an average of four literacy domains addressed during instruction.

Outcome characteristics

The 53 included studies provided data for a total of 246 norm- or criterion-referenced literacy outcome measures. The most frequently measured literacy domains were phonological awareness (n = 71) and phonics/decoding/word reading/non-word reading (n = 68). Reading comprehension (n = 40) and text reading/fluency (n = 24) were also commonly measured. Measures of vocabulary (n = 19), oral language/listening comprehension (n = 13) and encoding/spelling (n = 11) were used less often.

Meta-analytic findings

Main effects

The average effect on combined outcomes was estimated as g = 0.24 (95% CI [.15, .32], p < .001), indicating a small, positive, and significant effect of technology-delivered literacy instruction on elementary student outcomes. The Q statistic indicated that the effect sizes were heterogeneous, and the I2 statistic indicated that a substantial proportion of the observed variation in effect sizes reflected true heterogeneity rather than chance (Q = 165.38, I2 = 68.56, df = 52, p < .001). Accordingly, analyses were conducted to examine the impact of potential moderator variables on treatment effect sizes.
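
For reference, the reported I2 value follows directly from the reported Q statistic and its degrees of freedom under the standard definition of I2 as the percentage of observed variation in effect sizes attributable to true heterogeneity rather than sampling error:

$$ I^2 = \frac{Q - df}{Q} \times 100\% = \frac{165.38 - 52}{165.38} \times 100\% \approx 68.6\% $$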

Moderator analyses

Study and participant characteristics

The estimates of average effect size disaggregated by the levels of each study and participant moderator are reported in Table 5. There were no statistically significant differences in effect size based on publication type, research design, sample size, grade level, or presence of RDs.

Table 5 Moderator results for study and participant characteristics
Instruction characteristics

Table 6 reports the estimates of average effect size disaggregated by each instruction characteristic moderator. There were no statistically significant differences based on dosage (i.e., total number of hours of instruction), program availability, or literacy domain (i.e., phonological awareness, phonics/decoding/word reading, encoding/spelling, vocabulary, text reading/fluency, comprehension, or writing). Two literacy domains, phonics/decoding/word reading and encoding/spelling, approached statistical significance (p = .10 for both). Descriptively speaking, treatments with instruction in phonics/decoding/word reading tended to have larger effects (g = 0.27) than treatments without it (g = 0.11); similarly, treatments with instruction in encoding/spelling tended to have larger effects (g = 0.31) than treatments without it (g = 0.18).

Table 6 Moderator results for instruction characteristics
Outcome characteristics

The mean effects of instruction did not statistically significantly differ based on outcome domain (Table 7). Descriptively speaking, effects on measures of phonological awareness (g = 0.31) and phonics/decoding/word reading/non-word reading (g = 0.29) appeared larger than those of other outcome domains, with the effects on measures of reading comprehension appearing particularly small (g = 0.12).

Table 7 Moderator results for outcome characteristics

Publication bias

A funnel plot was created to investigate the presence of any potential publication bias. An Egger’s regression test indicated no evidence of funnel plot asymmetry (b = 0.37, df = 51, p = .51). Results of the trim-and-fill analysis indicated that five studies were missing from the right side of the funnel plot. Including these missing studies in the random-effects model would increase the mean effect size from g = 0.24 to g = 0.29.

Discussion

Technology is increasingly prevalent in students’ schools and homes (Gray & Lewis, 2021; Hemphill et al., 2021), and there is emerging evidence suggesting that technology-based instruction is associated with improved motivation, engagement, and academic outcomes (Cheung & Slavin, 2012; Gray & Lewis, 2021). At the same time, not all approaches to incorporating educational technology into the classroom are equally effective (Ganimian, 2022). In the context of this rapidly changing educational landscape, the purpose of this meta-analysis was to describe and evaluate recent research on technology-delivered literacy instruction for students in Grades K-5.

Characteristics of studies evaluating technology-delivered literacy instruction

Our first research aim was to describe the characteristics of studies that examine the effects of technology-delivered literacy instruction on literacy outcomes for K-5 students. There were some similarities across the 53 included studies. For example, most included studies were published in peer-reviewed journals and employed randomized controlled trials. Additionally, despite our attempts to include studies of students across Grades K-5, the majority of studies primarily included students in Grades K-2. Many meta-analyses report a dearth of literacy research with students in the upper elementary and secondary grades (e.g., Donegan & Wanzek, 2021; Scammacca et al., 2015). That said, there is some evidence that literacy instruction tends to produce greater benefits when it is delivered earlier rather than later in elementary school (Al Otaiba et al., 2009; Lovett et al., 2017; Wanzek et al., 2018), which may explain the greater number of studies of technology-delivered literacy instruction focused on students in the earlier elementary grades.

In terms of participant characteristics, about half of the studies reported that the majority of participating students were identified as having or being at risk for RDs. The use of technology-delivered reading instruction to support students with RDs is not surprising based on Cheung and Slavin’s (2012) finding that technology-based reading intervention had a moderate, positive effect of 0.37 for students with RDs. Unfortunately, many studies did not report information about other characteristics of students (e.g., EL status or socioeconomic status).

In contrast to both the Jamshidifarsani et al. (2019) and Alqahtani (2020) reviews, which found that approximately 20% of technology-based reading interventions were multicomponent, almost all the programs included in our meta-analysis (90%) addressed multiple literacy skills. The programs in our corpus addressed an average of four literacy skills during instruction. The most commonly addressed literacy skills were phonics/decoding/word reading, text reading/fluency, phonological awareness, and comprehension. This finding is in alignment with recommendations from prior systematic reviews of reading research, such as the NRP (2000) report and the WWC practice guide on effective early reading instruction (Foorman et al., 2016).

Additionally, the majority of the technology-delivered literacy programs evaluated in the present meta-analysis were publicly available. It is noteworthy that approximately 20% of treatment groups used the web-based ABRACADABRA program. ABRACADABRA was developed based on the recommendations of the NRP (2000) report and targets a range of skills, including phonological awareness, phonics, vocabulary, comprehension, and fluency. Activities in ABRACADABRA can be customized to a student’s specific abilities and challenges. In a recent meta-analysis of 17 studies implementing ABRACADABRA, Abrami et al. (2020) estimated an overall weighted average effect size of g = 0.26, suggesting that, on average, students receiving ABRACADABRA show small positive improvements in literacy outcomes relative to comparison conditions.

Although we did not set any restrictions on the type of technological device or instructional setting, almost all literacy programs were delivered on a computer/laptop in a school setting. Although we anticipated that a larger range of technological devices would be used in our studies, this finding is in alignment with results reported by Jamshidifarsani et al. (2019), who acknowledged that “the use of non-computer technologies, such as tablets and smartphones, are less than what was expected” (p. 446). Further, only one study in our corpus was not implemented in a school setting. Given that (a) children spend about 80% of their time not at school (Hall & Nielsen, 2020), (b) 92% of households have at least one type of technological device (i.e., desktops, laptops, tablets, or smartphones; US Census Bureau, 2021) and (c) empirical evidence suggests that home-based, family-implemented reading interventions can have a positive impact on the development of literacy skills (Sénéchal & Young, 2008; Van Steensel et al., 2011), it is somewhat surprising that more technology-delivered instruction was not provided at home. However, this finding is consistent with other recent reviews of technology-delivered literacy instruction. For example, Alqahtani (2020) also included studies that occurred in a range of settings, but only found one that occurred in the students’ homes.

To summarize, the studies examined in the present meta-analysis were primarily randomized controlled trials with students in early elementary grades. About half of the studies included students with or at risk for RDs. Almost all of the technology-delivered literacy instructional programs in these studies were publicly available and addressed multiple literacy skills. Lastly, most of the programs were delivered on a computer/laptop in a school setting.

Effects of technology-delivered literacy instruction

Our second research aim was to evaluate the effect of technology-delivered instruction on the literacy skills of K-5 students. The average effect on combined literacy outcomes indicated a small, positive, statistically significant effect of technology-delivered literacy instruction relative to comparison conditions that did not receive technology-delivered literacy instruction (g = 0.24). Therefore, students in Grades K-5 are likely to benefit from technology-delivered literacy instruction. This finding is in alignment with the Verhoeven et al. (2020) meta-analysis of computer-supported early literacy programs, which demonstrated a mean effect of g = 0.28. However, because researcher-developed measures are often proximal measures closely aligned to the content taught during instruction, their inclusion is associated with larger effect sizes (Scammacca et al., 2015; Swanson et al., 1999). When only standardized, norm- or criterion-referenced measures are included in meta-analyses, overall mean effect sizes are often much smaller. For example, Xu et al. (2019) examined the effects of technology-supported reading comprehension instruction and estimated a mean effect for reading comprehension of g = 0.60. When only including standardized outcome measures in their estimate, Xu and colleagues determined that the average effect size was much smaller (g = 0.25) and more similar to the findings in the present meta-analysis. Thus, including researcher-developed measures in the present meta-analysis may have resulted in a larger effect size.

Our third research aim was to determine whether effects were moderated by study, participant, instruction, or outcome characteristics. Although we explored the effects of several moderator variables representing study characteristics (i.e., publication type, research design, sample size), participant characteristics (i.e., grade level, RDs), instruction characteristics (i.e., dosage, program availability, content foci), and outcome measure characteristics (i.e., literacy domain), none of them emerged as statistically significant moderators. This finding was unexpected, given that other meta-analyses of technology-based literacy instruction have reported variables that impact the effectiveness of instruction. For example, Cheung and Slavin (2012) found that publication type, research design, sample size, and grade level were all statistically significant moderators of intervention effects. Notably, in the present study, the impact of two components of literacy instruction on intervention effectiveness approached statistical significance (p = .10 for both). Descriptively speaking, instruction that addressed phonics/decoding/word reading and instruction that addressed encoding/spelling were more effective in improving combined literacy outcomes than instruction that did not include these components (g = 0.27 and g = 0.31, respectively). Thus, technology-delivered literacy instruction that includes word reading or spelling components may be beneficial in improving literacy outcomes. Additionally, technology-delivered literacy instruction was more effective than comparison conditions in improving outcomes of phonological awareness (g = 0.31, p < .01) and phonics/decoding/word reading/non-word reading (g = 0.29, p < .01).

Another moderator worthy of further discussion is dosage. The treatments in the present meta-analysis represented a wide range of dosages (i.e., total instructional time ranged from 2 to 126 h), with the average amount of instructional time being 37 h. It was not surprising that the amount of time students received technology-delivered literacy instruction did not have a statistically significant impact on the effectiveness of the instruction, given that meta-analyses of reading intervention research have often reported nonsignificant relations between dosage and reading outcomes (e.g., Suggate, 2010; Wanzek et al., 2016). Roberts et al. (2022) posited that these findings may be due to the nonlinearity of reading intervention dosage response and conducted a nonlinear meta-analysis of reading intervention dosage. They found that increasing dosage improved intervention effects until 40 h of instruction, after which intervention effects decreased (i.e., the maximal effect of instruction occurred at 40 h). Although dosage was not a statistically significant moderator of intervention effects, the average number of hours of technology-delivered literacy instruction in the present meta-analysis (37 h) is similar to the optimal dosage of reading intervention estimated by Roberts and colleagues (40 h).

Limitations

There are a few limitations worth noting and considering within the context of the findings from this meta-analysis. As with any meta-analysis, our findings are unique to the search procedures, inclusion criteria, coding procedures, and analytic methods we used. Had we used different search procedures and inclusion criteria, we may have identified more studies to include. For example, although the ERIC database indexes grey literature content (e.g., research reports, curriculum and teaching guides, conference papers, dissertations, and theses) published by 1057 selected centers, agencies, programs, associations, and non-profit organizations, other databases (e.g., Google Scholar, ProQuest Dissertations & Theses Global) might have yielded a larger pool of studies for screening and potential inclusion. The inclusion of a larger number of studies would have enabled us to conduct better-powered moderator analyses that could have shed light on the characteristics of effective technology-delivered literacy instruction. That said, the present meta-analysis included a similar number of studies relative to other recent reviews of technology-based literacy instruction for elementary students (e.g., Alqahtani, 2020 [n = 45]; Dean et al., 2021 [n = 49]; Jamshidifarsani et al., 2019 [n = 42]). Additionally, it may have been a limitation that we only explored effects of technology-delivered literacy instruction on norm- or criterion-referenced assessments. While exclusively reporting effects on these types of measures enabled us to more confidently compare effects across studies, it also meant that we were unable to examine effects on researcher-developed measures that were more proximal to the instruction being delivered. Notably, none of the included studies reported outcome data from norm- or criterion-referenced writing measures. Thus, we were unable to determine the effects of technology-delivered literacy instruction on writing outcomes.

Further, and perhaps most importantly, as with previous reviews of technology-based literacy instruction (e.g., Jamshidifarsani et al., 2019), this meta-analysis was limited by the information provided in the included studies. For example, reports occasionally lacked information about the type of support students received during instruction (i.e., whether students engaged with the program independently or were supported by an adult or peers) and the intended dosage of instruction. It was also sometimes difficult to reliably code for the literacy components of instruction, as study authors provided very little detail about the computer-delivered literacy instruction they were evaluating. In the Jamshidifarsani et al. (2019) synthesis, the authors similarly noted, “Unfortunately, many of the studies do not provide sufficient information and enough details about their intervention programs” (p. 446). Additionally, some aspects of instruction were not coded because studies did not provide sufficient information. For example, we did not code for whether the technology-delivered literacy instruction complemented or replaced teacher-led instruction, nor did we code for whether a blended-learning or fully technology-delivered approach was used. Overall, the lack of reporting on key characteristics of the programs being evaluated is concerning: without detailed information, it is impossible to conduct rigorous analyses that determine which aspects of a program are most beneficial for improving student literacy outcomes.

Implications and future directions

Overall, our meta-analysis supports the use of technology-delivered literacy instruction to improve literacy outcomes for elementary students. This finding is encouraging for practitioners, program developers, and policymakers because it implies that technology-based efforts to support the development of elementary students’ literacy skills at school can be fruitful. In particular, it is noteworthy that technology-delivered literacy instruction was effective for students in earlier elementary grades (g = 0.23, p < .01) as well as in later elementary grades (g = 0.26, p < .01). It was also effective for elementary students with (g = 0.23, p < .01) and without RDs (g = 0.24, p < .01).

Unfortunately, our findings did not provide conclusive answers about the factors that distinguish more effective approaches to technology-based instruction from less effective approaches to such instruction. Thus, there is more work that needs to be done to determine what works for whom and under what conditions. For example, it may be worth further exploring the components of technology-delivered literacy instruction that are most effective at improving literacy outcomes. We identified two literacy components that showed promise. However, given the tendency for such instruction to be multicomponent, it is important for future research to evaluate which combinations of literacy components demonstrate the greatest improvement in literacy outcomes. It is also extremely important for researchers to more thoroughly describe the technology-delivered literacy programs they are evaluating. As noted previously, without detailed descriptions of the programs being evaluated, we are unable to determine what works best for improving student literacy skills.

We also identified some aspects of technology-delivered literacy instruction that lacked rigorous research, including the types of technological devices used and the settings in which instruction occurred. In particular, we echo other researchers in the field who noted “too few papers studied an in-home intervention” (Jamshidifarsani et al., 2019, p. 445) and call for the study of technology-delivered literacy instruction that takes place in the home rather than at school. Additionally, as indicated by Ganimian (2022), it may be worthwhile for future research to explore the effects of technology-based instruction that complements versus replaces teacher-led instruction. We also believe there may be value in exploring instruction that implements blended-learning approaches versus fully technology-delivered approaches.