The efficacy of a computer-adaptive reading program on grade 5 students’ reading achievement scores

Reading in upper-elementary grades includes comprehending complex texts and learning discipline-specific vocabulary. This study aims to determine the effects of a computer-adaptive supplementary reading program on fifth-grade students’ reading achievement. Using propensity score matching to create equivalent control and intervention groups of 450 students each (N = 900), a quasi-experimental design was employed to examine changes in the overall reading, comprehension, and vocabulary achievement scores of fifth-grade students who used this program as compared to those who did not. Students from eight school districts and 108 schools in a Southeastern state were matched based on demographics, initial reading ability, and the Title I status of their schools. Students in the intervention group received at least 30 min per week of supplemental computer-adaptive reading instruction between September and May. Mean differences between pre- and post-test scores showed that students in the intervention group had statistically significantly higher gains than students in the control group on their overall reading, vocabulary, and reading comprehension scores, with small effect sizes. The findings support the use of a supplemental computer-adaptive reading program for improving overall reading and reading comprehension outcomes among these fifth-grade students.


Introduction
The results of the National Assessment of Educational Progress (NAEP, 2019) identify the continued challenges to reading achievement in the United States. Only 35% of fourth graders are performing at or above proficient in reading, down from 36% in 2017. Supporting reading proficiency in upper-elementary classrooms entails responding to the differentiated needs of readers representing varying levels of proficiency (Connor et al., 2014; Connor & Morrison, 2016). The National Reading Panel (2000) notes that instructional approaches for reading achievement should include a focus on (a) phonemic awareness, (b) phonics, (c) fluency, (d) comprehension, and (e) vocabulary. To support effective reading instruction, teachers require evidence of how their students are progressing in the different reading domains (Sztajn et al., 2012). Formative assessments and adaptive curriculum found in a computer adaptive reading program (CARP) may support the teaching of reading by providing teachers with data to guide instructional decisions and tailor their reading instruction to students' current abilities. Real-time knowledge of students' reading ability may be important for learners in upper-elementary grades, given that research findings identify a "slump" in reading achievement for low-income students after the third grade (Campbell et al., 2019; Chall & Hirsch, 2003; Chall & Jacobs, 1983; Stockard, 2010).

Reading instruction in the upper-elementary grades
Once students reach upper-elementary grades, a shift in classroom reading focus occurs, away from "learning to read" to "reading to learn" (Chall, 1983). Students in upper-elementary grades are tasked with comprehending increasingly complex texts, which requires drawing upon a wider range of vocabulary. Shifts in reading purpose and instruction in upper-elementary grades are associated with widening reading achievement gaps (termed the "fourth-grade slump") between low-income students and non-low-income students (Chall & Jacobs, 1983). Torgesen and colleagues (2007) identified six areas of "knowledge, skill, and aptitude" that readers in early adolescence should develop to maintain and increase reading proficiency as they advance to upper grades, including (a) fluency, (b) vocabulary knowledge, (c) content knowledge, (d) higher-level reasoning and thinking skills, (e) cognitive strategies specific to reading comprehension, and (f) motivation and engagement. A meta-analysis on students with reading difficulties and disabilities in upper-elementary grades found evidence in support of instruction in comprehension strategies, fluency interventions, vocabulary instruction, and multi-component interventions but emphasized the need for more research on interventions that address the unique needs of reading learners in upper-elementary grades (Wanzek et al., 2010). In addition, a systematic review of research studies indicated that interventions for upper-elementary grade students that change teachers' daily teaching practices were more effective at raising reading achievement; however, the need remains for more high-quality studies on effective programs to support upper-elementary reading proficiency (Slavin et al., 2009). In sum, research findings have identified promising practices to promote upper-elementary grade students' reading achievement.
Nevertheless, additional research is warranted to distinguish effective reading programs that teachers, schools, and districts may implement to address proficiency gaps.

Computer adaptive reading and differentiated instruction
There is evidence that targeting the specific learning needs of reading students in upper-elementary classrooms improves reading achievement (Jacob et al., 2015; Kim et al., 2011; Stockard, 2010). Differentiation in general instruction is a promising method for responding to the unique needs of learners representing diverse skill sets and interests (Reis et al., 2011; Tomlinson, 2014) and may be particularly important for struggling readers. Computer-based programs have the potential to support differentiated instruction since they enable real-time data production and individualized learning profiles. Differentiation in reading instruction can be especially important when a student is missing foundational skills and needs additional instructional support. Technology-enhanced learning may increase motivation, especially when using gamification to engage students (Dicheva et al., 2015; Hong & Masood, 2014).
Research findings identify the potential of computer-assisted programs to increase students' reading achievement scores (e.g., Cheung & Slavin, 2013; Kamil & Chou, 2009). For example, an investigation into the effectiveness of a supplementary, computer-based program in Texas found statistically significant reading skill growth for fourth and fifth graders who received the technology-based intervention (SEG Measurement, 2018). In their review of 25 studies on PK-12 reading instruction and computer technology, Kamil and Chou (2009) found evidence that computer-based programs may enhance learning of vocabulary and comprehension. However, they note a need for more research on recent technological advances, such as adaptive technology. Computer adaptive technologies may be used for both reading instruction and assessment. For assessments, adaptive technology overcomes a major challenge of non-adaptive standardized tests by tailoring the assessment to examinees' individual ability, such that it is neither too challenging nor too simple to measure students' knowledge (McGlohen & Chang, 2008; Merrell & Tymms, 2007).
Adaptive technology is used in summative state assessments of students' yearly progress (Flanigan, 2014), but may be particularly useful for formative assessments of students' reading level. There are several ways formative reading assessments may be useful for elementary school teachers, including offering evidence of students' progress towards goals, providing feedback to teachers and students, identifying next steps in learning, and judging achievement (Harlen, 2012). Formative reading assessments may also help teachers predict students' performance on standardized achievement tests (Marcotte & Hintze, 2009). In reading, formative assessments such as miscue analysis inform teachers' response to both struggling and excelling students and serve as an impetus for teachers to engage students in their own learning process (Goodman & Goodman, 2014). When formative reading assessments incorporate adaptive technology, benefits include pinpointing students' specific learning needs (Bennett, 2011), which may vary in upper-elementary classrooms. Further, computer-adaptive assessments can be linked to adaptive curriculum delivered through technology or in a face-to-face environment. In some computer-adaptive programs, such as the one investigated in our study, the computer delivers a supplemental computer-adaptive curriculum in addition to teacher-delivered supplemental lessons to target specific deficit skills based on learners' needs.

Theoretically, computer-adaptive technologies employed in reading support students' achievement in their potential Zone of Proximal Development (Murray & Arroyo, 2002; Navarro & Mourges-Codern, 2018; Vygotsky, 1978). As computers adapt to students' reading knowledge and skill levels, new content is presented (Murray & Arroyo, 2002). In some circumstances, computers may serve as the more knowledgeable other, moving the student towards their potential (Vygotsky, 1978).
As a supplementary computer-adaptive reading program (CARP) that provides a formative assessment tool and adaptive curriculum, the Istation Reading Program is designed to help teachers identify and respond to the diverse learning needs of their students. Within the reading program, formative assessment results are included in the interactive digital lessons scaffolded to students' individualized learning needs. Previous research investigating the reading program identified that third-grade students most in need of reading intervention have greater achievement when they practice reading using the supplemental program both in and out of class (Campbell, Sutter, & Lambie, 2019). Yet, the efficacy of the reading program for fifth-grade reading achievement has not been investigated. Specifically, more research is needed to examine how CARPs may support upper-elementary school students' reading achievement (Blok et al., 2002; Jamshidifarsani et al., 2019; Wanzek et al., 2010).

The present study
To better inform educational stakeholders, the purpose of the present study was to investigate the efficacy of a CARP on fifth grade students' reading achievement. The current investigation employed a quasi-experimental design to determine the influence of the CARP (intervention) on students' reading proficiency scores. Specifically, the study addresses the following research questions: 1. What were the differences, if any, of fifth-grade students' overall reading, vocabulary, and reading comprehension scores when using a CARP as compared to students that do not use the CARP? 2. When considering all of the fifth-grade students by Academic Level, what were the differences, in the overall reading, vocabulary, and reading comprehension scores between the intervention (CARP) and control group?

Design
To examine the efficacy of the CARP on fifth-grade students' reading scores, we used propensity score matching (PSM; Rosenbaum & Rubin, 1983) to create groups matched on key ability and background characteristics and to establish group equivalence in observational studies (Graham & Kurlaender, 2011; Rosenbaum, 2009). Specifically, groups were matched on gender, race, pre-treatment reading level, and the Title I status of their schools. The PSM approach enables researchers to address the issue of selection bias and control for confounding variables, or factors that existed before treatment that may be associated with outcomes. PSM is recognized by the What Works Clearinghouse (U.S. Department of Education, 2017) as an acceptable approach to meet the standards for quasi-experimental designs outlined for "evidence-based practices" in the Every Student Succeeds Act (2015). The Institute of Education Sciences describes specific considerations for quasi-experimental designs using PSM in its standards handbook, and the criteria for meeting those standards were consulted to support evidence of validity and reliability in the research design.

Participants
Before using PSM, the population included 14,525 fifth graders from 14 school districts and 326 different schools across a Southeastern state. These districts and schools implemented the CARP to varying degrees of fidelity, employing distinct policies and procedures to assign, encourage, and monitor the reading program's usage. Variations in the implementation of the CARP allow us to examine the intervention as it is used under typical, instead of ideal, conditions. Common among all participants is that the reading assessment component was used to establish a pre- and posttest measure. Because the reading program recommended at least 30 min of curriculum usage per week, the intervention group consisted of students who used at least 30 min of the Istation reading curriculum per week between the months of September and May of the 2016-2017 school year. The control group consisted of fifth-grade students who only used the CARP for assessment purposes. The pretest scores determined the students' academic level.

Students' academic level
Students' academic level was based on their achievement percentiles on the initial formative assessment. Students at Level 3 include all students who initially scored at or below the 20th percentile. In other words, students at Level 3 are most in need of reading support. Students at Level 2 initially scored between the 21st and 40th percentiles, indicating their need for reading support. Finally, students at Level 1 scored above the 40th percentile in reading, suggesting that they have sound reading proficiencies.

Control and intervention groups
A comparison of control and intervention groups before PSM identified that the groups were uneven (Table 1). Compared to the intervention group (n = 5,548), the control group (n = 8,977) had higher percentages of struggling students (Academic Level 3), male students, white, non-Hispanic students, and students from Title I schools. The results of an independent samples t-test showed that baseline scores for the overall and comprehension assessment were significantly different between groups (overall: t(11,491)=5.7, p<.001; comprehension: t(10,869)=1.9, p<.001) (Table 2).
To reduce the imbalance between the two fifth-grade student groups, nonparametric preprocessing was used to "match" students on the variables of race, gender, initial reading ability, and the Title I status of their schools using the MatchIt package in R (Ho et al., 2007). These variables were selected based on prior literature. A historical reading achievement gap has been identified in the United States related to race and socioeconomic status (SES) (Paschall et al., 2018). Moreover, other reading achievement studies have included these types of demographics (Chung et al., 2022). To prepare the data for matching, cases with missing values for assessment scores were removed. Next, a propensity score, or the probability of receiving treatment, was estimated using logistic regression. Students from each group were then matched with the "nearest neighbor" in the corresponding group, as determined by similarities in the propensity score. Comparison of pre-matched and matched groups suggests that the matching worked well, resulting in two similar groups of 450 students each (Table 3). On baseline pretest measures, there was no statistically significant difference between control and intervention groups on overall reading and reading comprehension scores (overall: t(899)=0.911, p=.162; comprehension: t(899)=0.556, p=.056). However, there was a statistically significant difference between groups on the vocabulary score (t(899)=-0.421, p=.027) (Table 4). To adjust for the difference in groups' vocabulary scores without upsetting the balance of the groups, a regression model was fit to remove the baseline difference.
Students used the CARP on a computer or tablet on campus as a supplement to their regular language arts (inclusive of reading) instruction. At the beginning of each month, students were prompted to take a short formative assessment inside the program that provided scores for overall reading, comprehension, and vocabulary.
Teachers were afforded real-time data to evaluate students' progress and reading needs. Subsequent supplementary reading sessions included practice of specific reading skills based on students' individualized reading abilities. As students demonstrated mastery of skills, their instructional pathway would include new content and skills to learn through practice for mastery. For a typical fifth-grade student with average fifth-grade reading abilities, content might include informational text centered in the sciences. For the teachers of the students in the study, professional development on using the CARP, its features, and the assessments was provided at the school or district level. Additionally, teachers had access to the CARP's online training and live webinars offered throughout the school year.
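The nearest-neighbor step of the matching procedure described under "Control and intervention groups" can be illustrated with a minimal Python sketch. It is not the MatchIt implementation used in the study: it assumes the propensity scores have already been estimated (for example, by logistic regression), and the function name, the student identifiers, and the choice to process treated students in descending score order are all hypothetical choices made for illustration.

```python
def nearest_neighbor_match(treated_scores, control_scores):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Both arguments map a student id to an already-estimated propensity
    score (probability of receiving treatment). Returns a list of
    (treated_id, control_id) pairs.
    """
    available = dict(control_scores)  # controls not yet matched
    pairs = []
    # Assumption: treated students are processed from highest to lowest score.
    ordered = sorted(treated_scores.items(), key=lambda kv: kv[1], reverse=True)
    for treated_id, score in ordered:
        if not available:
            break  # no controls left to match
        # Pick the remaining control whose score is closest to this student's.
        control_id = min(available, key=lambda c: abs(available[c] - score))
        pairs.append((treated_id, control_id))
        del available[control_id]  # matching without replacement
    return pairs


# Invented toy scores, for illustration only.
treated = {"t1": 0.80, "t2": 0.30}
control = {"c1": 0.78, "c2": 0.35, "c3": 0.10}
print(nearest_neighbor_match(treated, control))
```

Matching without replacement keeps the two groups the same size, which is consistent with the equal-sized matched groups reported here; unmatched students are simply dropped from the analysis sample.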

Measure
The CARP's assessment component, called Istation's Indicators of Progress Advanced Reading (ISIP-AR), is a web-delivered and computer-adaptive assessment. The ISIP-AR measures students' (grades 4-8) reading level and progress on essential elements of reading identified by Torgesen et al. (2007). The overall reading score includes four subtests: (a) word analysis, (b) word fluency, (c) vocabulary, and (d) reading comprehension. ISIP-AR also provides separate reading comprehension and vocabulary scores. The reading comprehension subtest assesses known comprehension skills including: (a) main idea, (b) cause and effect, (c) inference, and (d) critical judgment. The vocabulary words chosen for the vocabulary subtest included Tier 2 words (general vocabulary) and Tier 3 words (content-specific vocabulary; Matthes, 2016).
Students' abilities are measured based on their performance on items: successfully answered questions result in increasingly more difficult items, while missed questions lead to less challenging items. To measure the CARP's influence on reading achievement for fifth graders, gain scores were calculated for the overall reading, reading comprehension, and vocabulary subscale scores. Simple gain scores were calculated by subtracting September scale scores from May scale scores.

Data analysis procedures
A one-way ANCOVA (analysis of covariance) was used to directly compare groups and measure the effect of treatment, with gain as a dependent variable (DV), treatment as independent variable, and initial score as a covariate. Specifically, ANCOVA is used "to adjust the means of the DV themselves to what they would be if all participants scored equally on the CVs" (covariates; Tabachnick & Fidell, 2019, p. 167). Therefore, the students' baseline scores were included as a CV to improve the precision of the estimates, and, in the case of vocabulary scores, to adjust for differences at baseline, given that there was a statistically significant difference between groups at baseline on this measure. One-way ANCOVA was also used to compare treatment and control groups for each academic level. To understand the magnitude or the importance of the effect, Glass's Δ was selected as the most appropriate measure for effect size because of the differences between standard deviations for each group (Cumming, 2013; Glass, 1976; Grissom & Kim, 2012).
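The analysis described above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not the authors' analysis code: a two-group ANCOVA with one covariate is treated as the regression of gain on a treatment indicator and the pretest score, and Glass's Δ is computed on gain scores with the control group's standard deviation as the denominator. The function names and the toy data are invented.

```python
from statistics import mean, stdev


def residualize(y, x):
    """Residuals of y after a simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


def ancova_treatment_effect(pre, post, treated):
    """Covariate-adjusted treatment effect on gain scores.

    Returns (adjusted mean difference, F statistic, error df), where F
    tests the treatment term in: gain ~ intercept + treatment + pretest.
    """
    gain = [b - a for a, b in zip(pre, post)]  # simple gain score
    treat = [1.0 if t else 0.0 for t in treated]
    r_gain = residualize(gain, pre)    # gain with the pretest partialled out
    r_treat = residualize(treat, pre)  # treatment with the pretest partialled out
    adj_diff = sum(t * g for t, g in zip(r_treat, r_gain)) / sum(t * t for t in r_treat)
    ss_reduced = sum(g * g for g in r_gain)  # model without the treatment term
    ss_full = sum((g - adj_diff * t) ** 2 for g, t in zip(r_gain, r_treat))
    df_error = len(gain) - 3  # intercept, treatment, covariate
    f_stat = (ss_reduced - ss_full) / (ss_full / df_error)
    return adj_diff, f_stat, df_error


def glass_delta(gain_treatment, gain_control):
    """Glass's delta: mean difference scaled by the control group's SD."""
    return (mean(gain_treatment) - mean(gain_control)) / stdev(gain_control)


# Invented toy data: first four students are controls, last four are treated.
pre = [10, 12, 14, 16, 10, 12, 14, 16]
post = [11, 14, 15, 18, 13, 16, 17, 20]
treated = [False] * 4 + [True] * 4
print(ancova_treatment_effect(pre, post, treated))
print(glass_delta([3, 4, 3, 4], [1, 2, 1, 2]))
```

With 900 students, the error degrees of freedom in this formulation are 900 - 3 = 897, which matches the F(1, 897) form reported in the Results.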

Results
The study reports findings of a quasi-experimental design using matched groups to compare reading outcomes for students who used the curricular portion of Istation, a CARP. PSM was used to create matched control and treatment groups. The first research question asked: What were the differences, if any, of fifth-grade students' overall reading, vocabulary, and reading comprehension scores when using a CARP as compared to students that do not use the CARP?
Prior to data analysis, preliminary checks were conducted to test the statistical assumptions associated with ANCOVA (e.g., outliers, normality of sample distribution, linearity, homogeneity of variance, and homogeneity of regression). The results of the ANCOVA identified positive and statistically significant differences-in-differences for treatment groups for overall reading, reading comprehension, and vocabulary scores. There was a significant difference in mean gains for the overall reading score, F (1, 897) = 11.801, p < .001, Δ = 0.21, between treatment and control group, adjusting for pretest reading scores (Table 5). For vocabulary scores, there was a significant difference in mean gains, F (1, 897) = 4.112, p = .043, Δ = 0.13, between treatment and control group, adjusting for pretest scores (Table 5). For the reading comprehension scores, there was a significant difference in mean gains, F (1, 897) = 16.348, p < .001, Δ = 0.71, between treatment and control group, adjusting for pretest scores (Table 5). Therefore, the fifth-grade students who used the CARP had greater gains in their overall reading, reading comprehension, and vocabulary scores as compared to the fifth-grade students who did not use the CARP.
The second research question asked: When considering all of the fifth-grade students by Academic Level, what were the differences in the overall reading, vocabulary, and reading comprehension scores between the intervention (CARP) and control group? Comparable groups were determined by academic level through PSM (Table 6). For overall reading scores, results of ANCOVA identified a statistically significant difference between control and treatment for students at all academic levels, including those in Academic Level 3 (students in most need of reading support), F (1, 361) = 6.135, p = .014, Δ = 0.23. For reading comprehension, there was a statistically significant difference between control and treatment groups for students at Level 1, F (1, 317) = 15.501, p < .001, Δ = 0.44, and Level 3, F (1, 361) = 4.024, p = .046, Δ = 0.20. For vocabulary scores, differences were not statistically significant for students at any of the three Academic Levels (Table 7). Comparing mean gains adjusted for the effect of the covariate identified that overall reading and vocabulary score gains were highest for students at Academic Level 3 and reading comprehension gains were highest for students in Academic Level 1. Therefore, when comparing achievement as mean gains between the matched control and treatment groups (the latter having used the program for the recommended 30 min per week), achievement was greater for students in the treatment group. Likewise, the results by Academic Level and subtest provided evidence that students' Academic Level made a difference in students' overall gains.

Discussion
Effective reading instruction entails a focus on fluency, comprehension, and vocabulary for upper-elementary education. Students in upper-elementary grades may have unique, differentiated reading needs. Previous studies indicated that computer adaptive reading may support personalized instruction based on students' reading needs. The current investigation was conducted to examine the efficacy of a supplementary CARP in promoting fifth-grade students' reading achievement. The findings identified the use of the CARP for fifth-grade students as a means to promote their overall reading, reading comprehension, and vocabulary scores. Statistically significant differences were identified between fifth-grade students in the control and treatment groups for their overall reading, reading comprehension, and vocabulary scores. Further analysis indicated that the fifth-grade students in Academic Level 3 (lowest quartile) using the CARP showed the greatest gains in their overall reading scores compared with students in Academic Level 1 (upper quartile) and Academic Level 2 (middle quartile). In addition, fifth-grade students in the treatment group at all three Academic Levels demonstrated increases in their mean scores, suggesting that the Istation CARP can benefit reading achievement when used consistently (30 min per week). Implications for schools include allocating the recommended amount of time in schools' learning schedules to allow for consistent access to and use of the CARP, as time can be a predictor of academic achievement (Fisher et al., 2015). With instructional time compressed by non-instructional activities, organizing the school week to meet the recommended weekly usage may make a difference in students' reading achievement, as was found in this study (Smith, 2000).
Moreover, schools can support teachers by providing the necessary resources (e.g., computer access for the whole class at the same time) to optimize scheduling students' use of the CARP.
Because the program is supplemental to the primary reading curriculum, how teachers integrate the program into their reading instruction may be considered key to program effectiveness. Teachers may need time and support to make sense of how to use the data generated by the program to inform daily instructional practices. Professional development opportunities could include: (a) developing shared understanding of the purposes and guidelines of the program and how it relates to instructional goals and teaching practices and (b) supporting teachers in making sense of data generated from formative assessment to target reading skill deficits (Wayman, 2005). Statistically significant differences in overall reading mean score gains by students' Academic Level for the treatment group support the conclusion that the CARP can promote all learners' reading scores, especially those of the fifth-grade students in need at Level 3. Although the effect size for overall gain is categorized as small by Cohen's (1988) standard (0.13-0.27), the What Works Clearinghouse (2017) considers findings "substantively important" if the effect size is 0.25 standard deviations or greater (p. 77). In this case, the overall effect size indicates that the effect of the treatment (30 min a week for 30 weeks) would be a gain of 6-9 percentile points (Marzano, 2010) in comparison to those who did not have the treatment and had teacher instruction alone. Conversely, even though the fifth-grade students showed mean gains on their vocabulary scores, the results were not statistically significant; however, the effect sizes (0.06-0.13) were small and translate to a 2-5 percentile point increase. For the fifth-grade students at Academic Level 3, the effect sizes for their reading gains were the largest. The fifth-grade students' reading comprehension scores indicated that the effect of the CARP was small to moderate, with gains ranging from 6 to 17 percentile points.
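The translation from an effect size to a percentile-point gain used above can be approximated with the standard normal curve. The sketch below is a common textbook conversion, assuming normally distributed scores and a comparison student who starts at the 50th percentile; whether it reproduces the exact table in Marzano (2010) is an assumption, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percentile_gain(effect_size):
    """Percentile points gained by a 50th-percentile student shifted
    upward by `effect_size` standard deviations."""
    return 100.0 * (normal_cdf(effect_size) - 0.5)


# Effect sizes reported in the Results, used here only as inputs.
for delta in (0.13, 0.21, 0.44):
    print(f"effect size {delta:.2f}: about {percentile_gain(delta):.1f} percentile points")
```

Under these assumptions, the conversion gives values broadly in line with the percentile ranges discussed in this section.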
The fifth-grade students in Academic Level 2 (> 20th and < 40th percentile) evidenced the least amount of growth in their overall reading, reading comprehension, and vocabulary scores. Potential reasons for students in Academic Level 2 not scoring higher on the assessment may range from instructional attention to time engaged in practicing skills. Instructional attention can be greater for students most in need of reading support (Academic Level 3) or for those who evidence sustainable growth and/or may need reading challenges (Academic Level 1). Students in the middle (Academic Level 2) may not express the same needs. More research is needed to understand the factors that may have contributed to the phenomenon observed in this study and how that may relate to the documented challenges of upper-elementary grades readers (Chall & Jacobs, 1983; Torgesen et al., 2007).

Limitations of the investigation
While PSM procedures were designed to minimize selection bias and confounding variables, it is possible that differences at the district, school, and classroom level also influenced students' reading score outcomes, suggesting an important avenue for future analysis. For example, districts may employ different policies and policy levers to promote CARP usage, which are, in turn, implemented differently at the school-administrator level. Perhaps most importantly, how teachers "make sense" of district and school policies influences how they use the CARP to drive classroom instruction (Spillane, Reiser, & Reimer, 2002). A factor unexplored in this study is how teachers use the formative feedback generated by the CARP to inform their instruction. Previous research has found that interventions that change daily teaching practices are more effective at raising reading achievement in upper-elementary grades (Slavin et al., 2009). Investigating how CARPs alter teachers' practices may further explain findings and identify implications for practice.

Conclusions
The present study examined a CARP (inclusive of curriculum component and curriculum-based measure) to determine the efficacy of the use of the supplementary curriculum component for the recommended time. PSM procedures were utilized to create a comparable treatment and control group. The findings indicated that those fifth-grade students who used the Istation supplementary computer adaptive reading curriculum for a minimum of 30 min each week (30 weeks) during the school year evidenced greater gains on the ISIP-AR than those students who did not use the curriculum. While the present study provided evidence that the CARP can be used to improve achievement outcomes for fifth graders, future quantitative and qualitative studies could investigate how district, school, and teacher-level factors influence the implementation of a supplementary CARP to increase reading proficiency levels for all students.