Background

There is growing evidence for the positive effects of physically active academic lessons, in which physical exercises are integrated into academic lessons, on academic outcomes (Donnelly et al. 2009; Norris et al. 2015; Watson et al. 2017; Webster et al. 2015). Review studies have shown that the main goal of physically active academic interventions was to increase physical activity levels and to reduce sedentary time, but occasionally effects on academic outcomes were also reported (Norris et al. 2015; Watson et al. 2017; Webster et al. 2015). In the short term, it was found that physically active academic lessons positively influenced children’s on-task behavior and academic motivation (Grieco et al. 2009; Mahar et al. 2006; Mullender-Wijnsma et al. 2015a; Vazou et al. 2012). Since academic engagement and academic motivation are important for children’s academic success (Greenwood et al. 2002; Linnenbrink and Pintrich 2002), physically active academic lessons could lead to increased academic achievement in the longer term.

So far, three cluster-randomized controlled studies have examined the effects of prolonged physically active academic interventions. Positive effects of the Physical Activity Across the Curriculum project were found on math (ES = 0.44), reading (ES = 0.35), and spelling (ES = 0.45) after three intervention years (J.E. Donnelly and J.L. Greene, personal communication). In this project, a variety of academic areas were coupled with moderate to vigorous physical activity (Donnelly et al. 2009; Donnelly and Lambourne 2011). Based on these results, Donnelly and colleagues recently conducted another cluster-randomized controlled trial to further examine the effects of physically active lessons on academic achievement. In this 3-year study, no effects on academic outcomes were found (Donnelly et al. 2017). We recently reported the outcomes of ‘Fit & Vaardig op school’ (F&V; Fit and academically proficient at school), a 2-year cluster-randomized controlled trial in the Netherlands (Mullender-Wijnsma et al. 2016). The intervention used physical activity in the teaching of math and language. After two intervention years, the children who followed the F&V lessons had significantly greater gains in math speed scores (ES = 0.51), general math scores (ES = 0.42), and spelling scores (ES = 0.45) in comparison with the control group. No effects were found on reading.

These findings provide evidence for the positive effects of physically active academic lessons on academic achievement. However, it is necessary to follow the effects of physically active academic lessons over time to determine whether they last when the lessons are no longer taught. Another question that arises is whether integrating physical exercise into academic lessons also improves the academic achievement of children from disadvantaged groups. The academic achievement gap between members of low and high socio-economic status groups is a worldwide problem (Reardon 2011; Rothstein 2009). This gap also exists in the Netherlands: socially disadvantaged children (SDC) have lower academic achievement than children without a disadvantage (non-SDC) (De Greeff et al. 2014; Driessen and Dekkers 2008). Recent studies show that despite many efforts for over 40 years, the achievement gap remains large (Driessen and Dekkers 2008). It seems that new teaching methods might be necessary to enhance the academic achievement of SDC.

The first aim of the current study was to examine what happened with the effects of the F&V intervention on academic achievement more than half a year after the end of the intervention (follow-up), when the lessons were no longer taught. Because the children could no longer benefit from the F&V lessons, we expected that their math and spelling improvements would be lower than they were after two intervention years (Mullender-Wijnsma et al. 2016). The second aim was to examine the effects of the intervention especially for SDC. We expected that the intervention would improve the academic achievement of SDC, and because these children are more often overweight and less physically active than other children (Fredriks et al. 2005; de Vries et al. 2005), we expected that they would benefit more from the intervention than non-SDC.

Methods

Participants

The participants were 499 children from second- and third-grade classes of twelve elementary mainstream schools in the Netherlands that agreed to participate in the study. The schools were selected from 5 elementary school boards (46 schools). One hundred thirteen children were classified as SDC. The classification into SDC and non-SDC was based on parental education: the children whose parents or guardians had completed less than 3 years of secondary school were classified as SDC (Ministry of Education, Culture and Science 2006).

At each school, a second- and a third-grade class participated in the study. Per school the classes were randomly assigned to the intervention (n = 249) or the control group (n = 250). Randomization was performed by the Netherlands Bureau for Economic Policy Analysis (CPB). When the second grade was assigned to the intervention group, the third grade automatically served as control, and vice versa. Based on the initial findings from the Physical Activity Across the Curriculum study (Donnelly et al. 2009; Donnelly and Lambourne 2011), an effect size of 0.44 was assumed (J.E. Donnelly and J.L. Greene, personal communication). The power analysis resulted in a total sample of ≥ 20 classes, with 25 children per class (power 0.8, 1-tailed, α = 0.05) (Spybrook and Raudenbush 2008).

Intervention

In the F&V intervention, physical exercise was used when teaching math and language in the classroom. In a 20–30 min lesson, half of the time was spent on math, and half on language activities. The main focus was on constant practice and repetition of concepts learned in earlier classes. The physical exercises were of moderate to vigorous intensity. For example, children spelled a word by performing a squat for each letter mentioned. The lessons were supported by presentations on interactive whiteboards. A previous study showed that the intervention program could be successfully implemented (Mullender-Wijnsma et al. 2015b).

Instruments

Academic achievement was measured using the One-Minute test (Brus and Voeten 1973) to assess reading, and the Speed-Test arithmetic (de Vos 1992) to assess the math speed performance. In addition, ability scores on math and spelling were retrieved from a child academic monitoring system, a standardized and norm-referenced test battery (Janssen and Hickendorff 2008; Janssen et al. 2010; de Wijs et al. 2010).

The One-Minute test is used to assess children’s technical reading skills from second to sixth grade. The children get one minute to read aloud as many words as possible; this is then repeated with a different set of words. The score is calculated as the total number of words read correctly (from 0 to 232). For the Dutch population (based on a representative sample of 25 Dutch elementary schools), the construct validity (r varied from .78 to .86) and test-retest reliability (r varied from .89 to .92) of the test are good (Brus and Voeten 1973).

The Speed-Test arithmetic is used to assess children’s math speed performance. Arithmetic problems have to be solved as quickly as possible. The total number of tasks solved determines the score (from 0 to 200). The test has been standardized in the Netherlands on a representative sample of 4804 elementary school children from 54 schools (de Vos 1992).

In the Netherlands, the spelling and math tests from the child academic monitoring system are administered twice a year by most elementary schools. During the first part of the spelling test a sentence is read out by the teacher. A certain word from the sentence is then repeated, and the children have to write that word down correctly. The second part is an individual task in which the children identify misspelled words. For the Dutch population (based on a representative sample of 59 Dutch elementary schools), the reliability (r varied from .90 to .93), and content and construct validity of the spelling test were good (de Wijs et al. 2010). The math test is an individual task and involves several domains: number sense, arithmetic, algebra, geometry, time and money, and knowledge of ratios and fractions (Janssen and Hickendorff 2008). The test can be done digitally or with pencil and paper. For the Dutch population (based on a representative sample of 189 Dutch elementary schools), the reliability (r varied from .93 to .96) and content and construct validity of the test were good (Janssen et al. 2010).

Procedure

The children were assessed before the intervention (T0), after the first intervention year [8 months after T0 (T1)], after the second intervention year [1 year after T1 (T2)], and 7–9 months after the end of the intervention (when the children no longer participated in the lessons) (T3). In both intervention years, the children in the intervention group participated in the F&V lessons for 22 weeks, three times a week; and the children in the control group participated in the regular sedentary classroom lessons. In the first year, six recently graduated and qualified teachers taught the F&V lessons. The regular classroom teachers taught the lessons in the second year (Mullender-Wijnsma et al. 2016). Because the intervention included physically active academic lessons, blinding of children and teachers to group assignment was not possible.

The spelling and math tests from the child academic monitoring system were administered by the participating schools at fixed time points. The One-Minute test and the Speed-Test arithmetic were administered by trained test administrators (Mullender-Wijnsma et al. 2016). Follow-up measurements (T3) of the One-Minute test and the Speed-Test arithmetic were administered 7 months after the completion of the intervention. The participating schools administered the spelling and math tests from the child academic monitoring system 9 months after the intervention. The One-Minute test and the Speed-Test arithmetic were administered 2 months before the tests from the child academic monitoring system because of practical considerations, such as a school holiday period.

Data Analysis

The Statistical Package for the Social Sciences (version 23.0) was used to calculate the pretest characteristics; the significance level was set at 0.05. Baseline differences between the intervention and control groups, and between SDC and non-SDC, were examined using an independent-samples t test (age) or a chi-square test (grade, gender, SDC).

Repeated measures multilevel modeling (MLwiN 2.29) was used to examine the effects of the F&V intervention at 7–9 months follow-up and to examine the effects of the intervention for SDC. To account for multiple testing, we used Bonferroni correction, resulting in a significance level of 0.0031. Multilevel models were calculated for each academic achievement posttest, with time (T0, T1, T2, T3) as level-one units, children as level-two units, and schools as level-three units. The first model contained the covariates grade, gender, and time. SDC and the interaction between time and SDC were also entered in the first model in order to examine if SDC scored lower on the academic achievement tests.
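The corrected significance level can be illustrated with a short calculation. The number of comparisons is not stated in the text; the value of 16 used below is an assumption, chosen only because 0.05/16 = 0.003125 is consistent with the reported level of 0.0031. The function name is hypothetical.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Return the per-test significance level under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumption: 16 comparisons (inferred from the reported threshold,
# not stated in the text).
per_test_alpha = bonferroni_alpha(0.05, 16)
print(f"{per_test_alpha:.6f}")
```

Under this assumption, a test statistic would be considered significant only when its p value falls below the per-test level rather than below 0.05.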

Subsequently, to assess the effect of the intervention on academic achievement at T3 (aim one), we entered condition and the interaction between condition and time as possible predictors in model 2. The interaction between condition and time was entered to see what happened at the different time points, over and above differences that may already have been present at T0. Significant interaction effects would mean that there is an effect of the intervention. To assess the effect of the intervention for SDC at T1, T2, and T3 (aim two), the interaction between condition and SDC and the interaction between condition, time, and SDC were entered (model three). Significant interaction effects would mean that the effect of the intervention differs between SDC and non-SDC. Effect sizes (ES) were calculated as (estimated intervention effect)/√(variance at student level) (Hedges 2007).
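The effect size calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function name and the input values are hypothetical, and in practice the estimated intervention effect and the student-level variance would be taken from the fitted multilevel model.

```python
import math


def effect_size(intervention_effect: float, student_level_variance: float) -> float:
    """ES = estimated intervention effect / sqrt(variance at student level)."""
    if student_level_variance <= 0:
        raise ValueError("student_level_variance must be positive")
    return intervention_effect / math.sqrt(student_level_variance)


# Hypothetical illustrative values, not taken from the study's tables:
# an estimated effect of 4.0 score points and a student-level variance of 64.0.
example_es = effect_size(4.0, 64.0)
print(example_es)
```

Dividing by the square root of the student-level variance expresses the intervention effect in units of the between-student standard deviation, which makes effects comparable across tests with different score ranges.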

Results

Figure 1 shows that 249 children were assigned to the intervention and 250 children to the control group. Two schools dropped out in the second intervention year: one because of the long-term absence of the teacher and the other because it was closed down. At T3 between 322 and 336 children were measured. Common reasons for not completing the tests were absence from school or leaving to attend another school.

Fig. 1 Flow of schools and students through enrollment, allocation, and analysis

The demographic characteristics of the children are shown in Table 1. The proportion of SDC and boys was similar in the control and intervention groups. Because there were more third-grade children in the control group (χ2 = 5.2, p < .05), the children in the control group were significantly older than the children in the intervention group (t = 2.2, p < .05). Furthermore, the proportions of boys and children in second grade were similar among SDC and non-SDC, but SDC were significantly older than non-SDC (t = 3.0, p < .05). Table 2 presents the mean scores and the number of children that took the academic achievement tests per measurement moment.

Table 1 Pretest characteristics, by condition and by SDC and non-SDC
Table 2 Mean scores on academic achievement tests with (sd); n

The first aim of the study was to examine the follow-up effects of the 2-year intervention for all participating children. The results of the second model of the multilevel analysis can be found in Table 3. For each academic achievement test except the reading test (Δχ2 (4) = 3.9, p = 0.42), inserting condition and the interaction between condition and time significantly improved the first model (Δχ2 (4) = 20.5–50.5, p < 0.01). The results showed that at T0 the score on the math test from the child academic monitoring system of the intervention group was significantly lower than that of the control group (t = −2.99; p < 0.003). No differences between the intervention and the control group were found on the reading, spelling, and math speed test scores at T0. The results further revealed no significant follow-up effect of the intervention on the reading scores (t = 1.40; p = 0.16; ES = 0.08; Fig. 2a) and spelling scores (t = 1.31; p = 0.19; ES = 0.14; Fig. 2b). The results of the math speed test (t = 3.99; p < 0.005; ES = 0.35; Fig. 2c) and math test from the child academic monitoring system (t = 5.83; p < 0.005; ES = 0.54; Fig. 2d) revealed that the scores of the children in the intervention group had improved significantly more (with respect to T0) than those of the control group at follow-up.

Table 3 Multilevel regression coefficients (B) and standard error (SE) for each factor predicting the achievement (model 2)
Fig. 2 Predicted mean scores (based on the third model of the multilevel analysis) on the reading test (a), the spelling test from the child academic monitoring system (b), the math speed test (c), and the math test from the child academic monitoring system (d) per measurement moment (T0, T1, T2)

The second aim of the study was to examine the effects of the intervention especially for SDC. SDC was a negative predictor of academic achievement for all achievement tests, indicating that SDC scored lower than non-SDC on the academic achievement tests. For each academic achievement test, inserting the interaction between condition and SDC and the interaction between condition, time, and SDC did not significantly improve the model (Δχ2 (4) = 2.5–5.8, p > 0.05). The results revealed no significantly different intervention effects between SDC and non-SDC on any of the four academic achievement tests, whether after one or two intervention years or at follow-up. This indicates that the intervention did not affect SDC and non-SDC differently.

Discussion

In the current study we examined the follow-up effects of the F&V intervention and the effects of the F&V intervention for SDC. It builds on a previous study of F&V lessons, in which effects of a 2-year intervention on math and spelling were found (Mullender-Wijnsma et al. 2016). The results of the current study indicated that effects on math were still apparent at follow-up, but no follow-up effects on spelling were found. The focus of the F&V lessons was on constant practice and repetition (Mullender-Wijnsma et al. 2015b). During the spelling lessons children had to spell words from specific word categories. It is likely that the knowledge acquired during these lessons was not very useful in later (regular) spelling lessons because other word categories were practiced. The mathematical knowledge acquired during F&V lessons might be more useful in later (regular) math lessons because constant practice of math problems during F&V lessons might provide an additional basis for recognition of more difficult math problems. The lasting effects on math indicate that gains in math achievement can be maintained at least 7–9 months after physically active academic lessons end. However, since the effects on spelling disappeared when the F&V lessons were no longer taught, schools may have to encourage their staff to use physically active academic lessons throughout elementary school. Donnelly et al. (2009) showed that 95% of the teachers continued to use the physically active academic lessons after the intervention stopped. This is a hopeful finding, but more research is necessary to assess the sustainability of physically active academic lesson programs in elementary schools and to discover the additional learning gains of continued use of the lessons.

The F&V lessons improved the math and spelling performance of children from low socioeconomic status families (SDC). SDC in the intervention group showed more improvement than SDC in the control group, but SDC did not benefit more from the intervention than non-SDC. These findings correspond with the findings from a previous study on acute effects of F&V lessons (Mullender-Wijnsma et al. 2015a). According to this previous study, the on-task behavior of SDC is generally lower than the on-task behavior of non-SDC. Both SDC and non-SDC benefited from the F&V lessons; their time-on-task was higher immediately after an F&V lesson. However, SDC did not benefit more from the lessons than non-SDC. A study that examined relationships between children’s literacy and numeracy scores and their health behaviors showed that physical activity independently predicted children’s achievement scores after socioeconomic status was controlled for (O’Dea and Mugridge 2012). This suggests that the academic achievement of children from low and high socio-economic status groups is influenced equally by physical activity. Therefore, both SDC and non-SDC can be provided with physically active academic lessons and benefit academically.

Because time-on-task is positively related to academic achievement (Scheerens et al. 2013), the increased time-on-task immediately after an F&V lesson may be one of the mechanisms that caused the improved academic achievement scores. Other working mechanisms that may have contributed to the effects on children’s academic achievement are the effect of physical activity on the brain (Best 2010; Hillman et al. 2008) and the sensorimotor information that the children obtained during the F&V lesson (Kontra et al. 2012; Paas and Sweller 2012). Further research is necessary to examine through which mechanism physically active academic lessons improve academic achievement.

Strengths and Limitations

The current study has some limitations that should be acknowledged. First, administration of the tests from the child academic monitoring system was done by the schools themselves. Teacher expectations may have led to teachers influencing children’s test results. However, the teachers knew under what conditions they had to administer a test; therefore it is not likely that they had a large influence on the children’s test results. Second, the score on the math test from the child academic monitoring system of the intervention group was significantly lower at baseline. Given that there was more room for improvement, this could possibly have influenced the results. The strengths of this study were the large sample size, the design (cluster-randomized controlled trial), and the investigation of long-term effects of physically active academic lessons.

Conclusions

The F&V lessons positively influenced the math and spelling scores of SDC. Both SDC and non-SDC benefited, but SDC did not benefit more from the lessons than non-SDC. Furthermore, at 7–9 months follow-up, the effects of F&V physically active math and language lessons were found to have lasted on math. No follow-up effects were found on spelling and reading. The findings suggest that physically active academic lessons should be considered for inclusion in school curriculums in order to improve the academic achievement of SDC and non-SDC.