Background

Schools are the most important educational institution for children and young people (WHO, 2018b). They are well positioned to reach children and young people of all ages and social classes in most parts of the world. Furthermore, schools can play an important role in promoting physical activity (PA). Within a school day, PA can be encouraged not only in physical education lessons, but also during active breaks between classes, at recess, or by implementing after-school programs (WHO, 2018b). Recommendations and policies concerning PA requirements and the promotion of PA in schools vary between countries (Aubert et al., 2018; Hills, Dengel, & Lubans, 2015; Rütten & Pfeifer, 2016). About 50% of schools worldwide can create an environment that provides sufficient PA on school days (Aubert et al., 2018). The quality and quantity of PA in schools are significantly correlated with sociodemographic indicators such as the human development index (HDI), the number of years of schooling provided, or the degree of food security (Aubert et al., 2018). In some countries, schools are eliminating or reducing physical education to give more time to traditional academic teaching, despite existing evidence that physical education is conducive to academic success (Trudeau & Shephard, 2008).

Physical inactivity in children and adolescents can lead to physical and mental illnesses as well as to unfavorable social, physical, and cognitive health outcomes (Biddle, Ciaccioni, Thomas, & Vergeer, 2019; Janssen & Leblanc, 2010; Kremer et al., 2014; McMahon et al., 2017; Poitras et al., 2016; Warburton & Bredin, 2017). Physical inactivity in young people often becomes a lifetime problem, as PA behavior is transferred from childhood and adolescence into adulthood (Telama et al., 2014). It is therefore important to begin encouraging children to adopt a more active lifestyle at an early age. Nevertheless, the prevalence of physical inactivity is high among children and adolescents and is even higher among girls than among boys. Only 15% of girls aged 11 to 17 and 22% of boys in that age group meet the World Health Organization’s recommended guideline of 60 min of moderate to vigorous PA (MVPA) per day (WHO, 2019). In general, boys are more active than girls (Guthold, Stevens, Riley, & Bull, 2020; WHO, 2018a), including during recess periods (Ridgers, Salmon, Parrish, Stanley, & Okely, 2012; Sarkin, McKenzie, & Sallis, 1997; Sato, Ishii, Shibata, & Oka, 2012).

Due to the large amount of time most children and adolescents spend at school, a portion of overall daily PA should be performed during school hours. Unfortunately, in most cases this does not happen, although opportunities for PA are often offered in various areas of everyday school life (e.g., recess, physical education, and after-school programs) (McKenzie, 2019).

Although differences in PA between boys and girls have been identified, the reasons for these differences vary and have not been fully captured. There is a strong tradition of research on gender and health that conceptualizes health behaviors (such as PA) as both shaped by and expressions of societal constructions of gender (Courtenay, 2000; Johnson & Repta, 2012; Saltonstall, 1993). Increasingly, theoretical approaches to gender and health acknowledge that sex-based biological factors and gendered social factors are intertwined to the extent that it is not always possible to theoretically or empirically isolate the influence of each category of factors (Springer, Mager-Stellman, & Jordan-Young, 2012). To recognize this complexity, in this article we use the term “sex/gender” (Doull et al., 2014).

Sex/Gender differences in PA might be fostered and generated by interventions intended to promote PA in the school context. A review by Love, Adams, and van Sluijs (2017) described the effects of gender equality parameters on PA interventions for children. These parameters included gender, socioeconomic status, body mass index (BMI), ethnicity, place of residence, and religion. In a meta-analysis of accelerometer-assessed data, no effect related to sex/gender could be determined. Another systematic review by Love et al. (Love, Adams, & van Sluijs, 2019), which included 17 studies, found that school-based interventions to increase PA had no long-term effectiveness. In addition, no significant differences in intervention effectiveness related to sex/gender were observed. Mears and Jago (2016) found, in their systematic review of the effects of after-school interventions, that insufficient data on sex/gender differences were reported to enable quantitative analysis. Thus, until now, it is unknown where the differences in PA behavior between girls and boys come from, why interventions may increase them, and why interventions are only marginally effective.

Overall, sex/gender has received limited attention in interventions designed to promote PA (Love et al., 2017; Watson, Timperio, Brown, Best, & Hesketh, 2017). Since girls are less active than boys, we need to focus on interventions that promote PA in boys and girls in a similar way, so that boys and girls benefit equally from the positive effects of PA. Differential effects in school-based PA interventions for boys and girls have been sporadically observed. Furthermore, up to now, sex/gender has been discussed mainly with regard to the effectiveness of interventions and not in terms of how the design, implementation, or analysis of the intervention could themselves produce differential effects. To reach reliable conclusions about how sex/gender affects interventions and their effectiveness, this oversight must be addressed. Therefore, this systematic review aims both to evaluate the effects of interventions to promote PA among girls and boys in the school context and to assess the extent to which these intervention studies took sex/gender into account in their design, implementation, and evaluation phases. Additionally, we conducted a meta-analysis to compare the intervention effects between girls and boys.

Methods

This paper is part of the genEffects systematic review, which seeks to analyze the effects of interventions to promote PA and/or reduce sedentary behavior (SB) in children and adolescents (Demetriou et al., 2019). The genEffects systematic review is reported according to PRISMA guidance (supplementary material 1) (Welch et al., 2012). The protocol for the genEffects review was published previously (Demetriou et al., 2019) and is also registered with PROSPERO (ref CRD42018109528). There were no protocol amendments for the present study, except that the GRADE framework was not used due to the narrative synthesis of data. The set of studies we reviewed was delimited to those that focused on interventions to promote PA in school. The consideration of sex/gender was assessed using a newly developed sex/gender checklist. Furthermore, a meta-analysis was conducted as noted in the previous paragraph.

Search strategy and eligibility criteria

Within the genEffects review, we searched the following eleven electronic databases: Cochrane Central Register of Controlled Trials (CENTRAL); Ovid MEDLINE; Epub Ahead of Print, In-Process and other Non-Indexed Citations, Daily, and Versions; Ovid Embase; Science Citation Index Expanded (SCI-EXPANDED); Clarivate Web of Science; Conference Proceedings Citation Index (CPCI-S); EBSCO PsycINFO; EBSCO Eric; EBSCO SPORTDiscus; and ProQuest Dissertations & Theses Global. The search included studies from January 2000 to August 2018, with a search strategy based on Cochrane standards (see supplementary material 2).

The search aimed to identify randomized and nonrandomized controlled trials of interventions to reduce SB and/or promote PA in children and adolescents aged 3 to 19. Eligible studies were limited to peer-reviewed English-language publications reporting a quantified measure of PA and/or SB. Studies primarily targeting children and adolescents with specific health issues were excluded, as were those that focused exclusively on college students. Additionally, we required all intervention studies to meet at least one of the following criteria: reporting PA separately by sex/gender at baseline and/or follow-up; explaining how sex/gender was addressed in outcome analyses (e.g., adjusting the analysis for sex/gender); and/or reporting on sex/gender similarities or differences among the outcomes. The comparators were either a control group with an activity that did not promote PA or reduce SB, or a control group without an intervention (Table 1).

Table 1 Eligibility criteria for the genEffects systematic review

Study selection and data extraction

Two researchers performed the study selection process independently using Covidence software. All discrepancies were resolved by a third, senior researcher. After the removal of duplicates, titles and abstracts were screened, and all potentially relevant articles, as well as those of undetermined relevance, were subsequently retrieved and screened against the eligibility criteria.

For each intervention study selected for inclusion, specific details were extracted by two reviewers independently. Data extraction covered general study characteristics (country, design, name of intervention program), sample size for intervention and control groups stratified by sex/gender and dropout rate, details of the intervention content, and intervention approaches and settings. Additionally, the extraction forms included information on the main outcomes of each intervention, measurement points and instruments, and statistical approaches, including the confounding variables taken into account in order to analyze the effectiveness of the intervention. For additional information, study protocols and supplementary material were used and in the case of missing information, authors were contacted (maximum of two contact attempts).

Quality assessment and risk of bias

Internal validity assessment was carried out independently by two reviewers using the Cochrane risk-of-bias tool for randomized trials, version 1 (Higgins et al., 2011; Higgins & Green, 2011). Discrepancies were resolved through discussion or through adjudication by a third reviewer if consensus was not reached. Primary studies were assessed across each of the five types of bias (selection, performance, attrition, detection, and reporting). Each domain was assessed as having a low, high, or unclear risk of bias, with the last category indicating either lack of information or uncertainty about the potential risk of bias. Nonrandomized controlled trials were considered to be at high risk of bias in domains related to randomization. To identify other potential risks, we examined the assessment of baseline differences between intervention and control groups, as well as seasonal differences in measurement points and monetary motivational incentives.

Sex/gender assessment

To assess the degree to which sex/gender was considered in the intervention studies, we used a newly developed sex/gender checklist (Demetriou et al., 2019). This sex/gender checklist consists of 10 items that analyzed background and concepts, study design, intervention planning and delivery, and presentation and interpretation of findings (Table 2). Each item was rated with regard to the extent to which the study took sex/gender into account on that item, using three categories: “basic,” “detailed,” and “no information provided.” A fourth category, “not relevant,” was used for items that were considered not applicable to studies in which all subjects were of the same sex/gender (items MI, SSR, PF, and SR). On the first item, another rating category, “poor,” was applied to those studies that used the terms “sex” and “gender” interchangeably.

Table 2 Sex/gender checklist: categories, items and their definitions

Data analysis and qualitative synthesis

We undertook a narrative synthesis to analyze differences and similarities between girls’ and boys’ PA in the interventions, based on their effectiveness with both sexes. Differences and similarities are reflected in the qualitative ratings obtained on the sex/gender checklist. In this analysis, we divided the studies into three superordinate groups. The first group consisted of studies with intervention effects in the same direction for girls and boys; this group was then subdivided into those with significant positive effects for girls and boys, those with significant negative effects for girls and boys, and those with no intervention effect for either sex/gender. Second, studies with different intervention effects for girls and boys were divided into four subgroups: positive significant effect for boys and no effect for girls, positive effect for girls and no effect for boys, negative effect for girls and no effect for boys, and negative effect for boys and no effect for girls. Third, among studies involving subjects of a single sex, we distinguished those that were effective from those that were not. In all three above-mentioned groups, the number of ratings of “detailed,” “basic,” “no information provided,” “poor,” and “not relevant” on every item of the checklist was calculated. By applying these analyses, we could compare the degree of sex/gender consideration between studies that were or were not effective in affecting PA for both girls and boys, on one hand, with those that revealed different effects on PA for girls and boys on the other hand.
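The tallying step described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure the review team used: the input layout (pairs of an effect-group label and a mapping from checklist item code to rating), the group labels, and the function name `tally_by_group` are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Rating categories named in the sex/gender assessment section.
RATINGS = ("detailed", "basic", "no information provided", "poor", "not relevant")


def tally_by_group(studies):
    """Count checklist ratings per effect group and express them as percentages.

    `studies` is a hypothetical layout: an iterable of (group_label, ratings)
    pairs, where `ratings` maps a checklist item code to its rating.
    Returns {group_label: {rating: (count, percent_of_group_ratings)}}.
    """
    counts = defaultdict(Counter)
    for group, ratings in studies:
        counts[group].update(ratings.values())

    summary = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        summary[group] = {
            rating: (
                counter[rating],
                round(100 * counter[rating] / total, 1) if total else 0.0,
            )
            for rating in RATINGS
        }
    return summary


# Invented ratings for three imaginary studies (not data from the review).
example = [
    ("same effect", {"DU": "poor", "BI": "basic", "SR": "detailed", "D": "detailed"}),
    ("same effect", {"DU": "no information provided", "BI": "basic",
                     "SR": "detailed", "D": "no information provided"}),
    ("different effect", {"DU": "basic", "SR": "detailed"}),
]
summary = tally_by_group(example)
```

Percentages are taken over all ratings within a group (studies × items), which matches how the shares in the Results section appear to be computed.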

Meta-analysis

Meta-analytic procedures were performed using Comprehensive Meta-analysis Software, version 3 (Biostat Inc., Englewood, NJ, USA). The meta-analysis was conducted to determine the effect of school-based interventions to promote PA in children and adolescents for girls and boys separately. Randomized controlled trials (RCTs) and cluster RCTs with pre/post control-group design were included in the meta-analysis if the study either disaggregated the results by sex/gender or included subjects of only one sex/gender. Nonrandomized controlled studies were excluded, since random assignment is crucial for generating unbiased estimates of effects (Flay et al., 2005; Valentine & Thompson, 2013). If key information for the calculation of Hedges’ g was missing or if studies failed to report the results for boys and girls separately, a study was eliminated from the analyses. The main data entry format used for calculation of effect size was mean, standard deviation, and sample size for each group. A random-effects model was chosen to account for heterogeneity across the studies (Hedges & Olkin, 1985; Hedges & Vevea, 1998). Heterogeneity was analyzed by calculating the Q statistic and the I² statistic. The four included cluster RCTs were assessed for unit-of-analysis error and for their handling of the clustering effect in the analyses (Campbell, Elbourne, & Altman, 2004; Eldridge, Ashby, Feder, Rudnicka, & Ukoumunne, 2004).
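For readers who want to see the quantities named above in concrete form, the following Python sketch computes Hedges’ g from group means, standard deviations, and sample sizes, and then pools several effects under a random-effects model while reporting Q and I². It is an illustration under stated assumptions rather than a reproduction of the software used: the between-study variance is estimated with the DerSimonian-Laird method (the text does not say which estimator was applied), each effect compares intervention and control groups at a single time point, and the function names are hypothetical.

```python
import math


def hedges_g(m_int, sd_int, n_int, m_ctl, sd_ctl, n_ctl):
    """Return (g, variance) for intervention vs. control at one time point."""
    df = n_int + n_ctl - 2
    sd_pooled = math.sqrt(((n_int - 1) * sd_int**2 + (n_ctl - 1) * sd_ctl**2) / df)
    d = (m_int - m_ctl) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n_int + n_ctl) / (n_int * n_ctl) + d**2 / (2 * (n_int + n_ctl))
    return j * d, j**2 * var_d


def random_effects(effects, variances):
    """Pool effects with a DerSimonian-Laird tau^2 (assumed estimator)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "g": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }
```

Note that this sketch does not adjust for clustering; for cluster RCTs, the variances would first need to be inflated by a design effect or taken from an analysis that already accounts for clustering.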

Publication bias was assessed by visual inspection of the funnel plot (an asymmetric, as opposed to a symmetric inverted, funnel shape indicated potential publication bias) and by Egger’s test of the intercept, which quantifies the bias captured by the funnel plot and tests whether it is significant (p ≤ 0.05).
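Egger’s test as commonly formulated regresses each study’s standardized effect (effect divided by its standard error) on its precision (one divided by the standard error); an intercept far from zero points to funnel-plot asymmetry. The sketch below is a plain least-squares version under that assumption. The function name is hypothetical, and the output stops at the t statistic; a p value would come from a t distribution with n − 2 degrees of freedom.

```python
import math


def egger_intercept(effects, std_errors):
    """Return (intercept, se_intercept, t_stat) for Egger's regression.

    Regresses effect / SE on 1 / SE with ordinary least squares.
    """
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least three studies with matching inputs")

    x = [1.0 / se for se in std_errors]                  # precision
    y = [g / se for g, se in zip(effects, std_errors)]   # standardized effect
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")

    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_var = residual_ss / (n - 2)
    se_intercept = math.sqrt(residual_var * (1.0 / n + mean_x**2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat
```

When every study reports the same effect regardless of its size, the intercept is zero; when smaller studies (larger standard errors) report larger effects, the intercept moves away from zero.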

Several subgroup-moderator analyses were conducted according to the mixed-effects model. Two analyses concerning outliers were conducted by excluding (1) studies with the highest and lowest effect size and (2) studies with values of Hedges’ g not located within the 95% confidence interval of the random-effects model. Three further subgroup analyses were conducted: study sample (single sex/gender versus mixed sex/gender), PA (measured only during school time as opposed to being measured through the whole school day), and study design (RCT versus cluster RCT).
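The two outlier rules can be expressed as simple filters applied before re-running the pooled analysis. The following is a hedged sketch only: the `(label, g)` tuple layout and both function names are assumptions for illustration, and the confidence limits passed to the second function are taken to come from the random-effects model fitted to all studies.

```python
def exclude_extremes(studies):
    """Rule 1: drop the studies with the highest and the lowest Hedges' g.

    `studies` is a hypothetical list of (label, g) tuples; the result is
    sorted by effect size.
    """
    if len(studies) < 3:
        return []
    ordered = sorted(studies, key=lambda study: study[1])
    return ordered[1:-1]


def exclude_outside_ci(studies, ci_low, ci_high):
    """Rule 2: keep only studies whose g lies within the pooled 95% CI."""
    return [study for study in studies if ci_low <= study[1] <= ci_high]


# Invented effect sizes for four imaginary studies (not data from the review).
studies = [("A", 0.1), ("B", 0.5), ("C", 1.6), ("D", 0.3)]
trimmed = exclude_extremes(studies)
within_ci = exclude_outside_ci(studies, 0.2, 0.6)
```

Comparing the pooled effect and heterogeneity before and after each filter shows how strongly the overall result depends on a few extreme studies.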

Results

In total, 58 articles reporting 56 unique school-based intervention studies with school PA as a primary outcome were included in our analyses (see supplementary material 3). Originally, in the genEffects systematic review, we identified 24,878 references through the electronic database search, leading to the inclusion of 244 articles reporting 217 unique studies (Fig. 1). We identified two publications each for two of the included interventions (Christiansen et al., 2017; Ha, Burnett, Sum, Medic, & Ng, 2015; Ha, Lonsdale, Ng, & Lubans, 2017; Toftager et al., 2014).

Fig. 1 PRISMA flowchart

The included studies were categorized as either cluster RCTs (n = 30; 53.6%), RCTs (n = 14; 25.0%), or involving nonrandomized intervention and control groups (n = 12; 21.4%). In the included studies, the mean age was 10.9 ± 2.8 years (median = 10.7 years; minimum = 6.0 years; maximum = 18.4 years). The mean duration of the interventions was 46.7 ± 49.9 weeks (median = 30 weeks; minimum = 1 week; maximum = 208 weeks).

Risk of bias of primary studies

The risk of bias of each of the 56 studies was rated using the Cochrane risk-of-bias tool (Higgins et al., 2011; Higgins & Green, 2011). We analyzed the frequency with which each risk rating level occurred across all domains and studies, finding high risk of bias in 27.8% of all ratings, unclear risk in 30.6%, and low risk in 41.6% (Fig. 2). The risk-of-bias assessment for each included study is provided in supplementary material 4.

Fig. 2 Risk of bias (RoB) for all 56 school physical activity (PA) studies

Overall sex/gender analysis of primary studies

Of the 56 studies, 19 (33.9%) reported results in a disaggregated manner for boys and girls separately; 18 (32.1%) analyzed sex/gender through interaction analyses (group allocation × time × sex/gender); 12 (21.4%) tested for differences or similarities in sex/gender at baseline or follow-up or via interaction analysis but did not find any (no effect size shown); and 7 (12.5%) included and analyzed girls only. No study included boys only. The consideration of sex/gender for each included study is provided in supplementary material 5.

The sex/gender assessment for each item according to the sex/gender checklist is provided in Fig. 3. Due to the inclusion criteria, the item Statistical results (SR) was the one rated most frequently as “detailed” (n = 42; 75.0%). The item rated most frequently as “no information provided” was Theoretical and/or conceptual linkages with sex/gender (TCL), in 55 (98.2%) studies. No study was rated as “detailed” on the items Definition and use of sex and/or gender terminology (DU), Study sample recruitment (SSR), or Intervention content and materials (ICM).

Fig. 3 Sex/gender assessment of all 56 school physical activity (PA) studies

In the items MI, SSR, ICM, and IDLI (all in the intervention delivery category), we found that most of the studies provided no information about sex/gender. The specific percentages were as follows: Measurement instruments (MI), n = 43, 76.8%; Study sample recruitment (SSR), n = 46, 82.1%; Intervention content and materials (ICM), n = 52, 92.9%; Intervention delivery, location, and interventionists (IDLI), n = 52, 92.9%.

Intervention effectiveness in terms of sex/gender

Semiquantitative analysis.

First, 41 studies found that the intervention had the same effect on both girls’ and boys’ PA. In 27 studies, the intervention effect was significantly positive for girls and boys of the intervention group; in two studies (Fairclough et al., 2016; Ha et al., 2015), the control group was favored (Fig. 4); in 12 studies, no intervention effect could be found on girls’ and boys’ PA. The two studies favoring the control group had “detailed” ratings on the sex/gender checklist 10.0% of the time, less than those with a positive effect (15.2%; n = 41); studies with no significant intervention effect had the highest proportion of “detailed” ratings, with 18.3% (n = 22). Studies favoring the intervention group were more likely to provide information about considering sex/gender (74.4%) than studies with a negative effect (12.6%) or no effect (13.3%). About one-third of the studies with the same effect for girls and boys were rated as “detailed” on Participant flow (PF) and Discussion (D), and 85.4% (n = 35) of the 41 studies were rated as “detailed” on Statistical results (SR). On all other items, the consideration of sex/gender was only occasionally rated as “detailed.”

Fig. 4 Sex/gender assessment: semiquantitative analysis of all 56 school physical activity (PA) studies

Different intervention effects for girls and boys were found in eight intervention studies. Four studies found no effect for girls and a significant positive effect for boys (Christiansen et al., 2017; Haerens et al., 2006; Loucaides, Jago, & Charalambous, 2009; McKenzie et al., 2004); one found no effect for girls and a negative effect for boys (Elder, McKenzie, Arredondo, & Cre, 2011); one found no effect for boys and a negative effect for girls (Verloigne et al., 2012); and two found a positive intervention effect for girls and no effect for boys (Bleeker, Beyler, James-Burdumy, & Fortson, 2015; Verstraete, Cardon, De Clercq, & De Bourdeaudhuij, 2006). In this group of studies, none was rated as “detailed” on any of the first seven items (DU, BI, TCL, MI, SSR, ICM, IDLI) except Bleeker et al. (2015) on item BI. Overall, these studies received 14 (17.5%) ratings of “detailed,” 19 (23.8%) ratings of “basic,” and 45 (56.3%) ratings of “no information provided”; just two studies received the rating “poor” (Christiansen et al., 2017; McKenzie et al., 2004). On items TCL, ICM, and IDLI, no information about sex/gender was provided by any of these studies. On item SR, all studies considered sex/gender when reporting the statistical results except Bleeker et al. (2015), which was rated as “basic” in this regard. Overall, studies that were more successful for girls than for boys (significant positive effect for girls or negative effect for boys) had ratings of “detailed” more often (20.0% in both cases) than studies that were more successful for boys than for girls (significant positive effect for boys, 17.5%; negative effect for girls, 10.0%).

Among all the studies that considered only a single sex/gender, as noted above, only seven studies with girls as the target group met the inclusion criteria. A significant positive intervention effect was reported in four of these studies (Carlin, Murphy, Nevill, & Gallagher, 2018; Fairclough & Stratton, 2006; Guagliano, Lonsdale, Kolt, & Roser, 2015; Schneider et al., 2007) but not in the other three. Items MI, SSR, PF, and SR were excluded from consideration because these are not relevant to studies of a single sex/gender. Only the three studies with no intervention effect were rated as “detailed” on Discussion (D) because of their consideration of sex/gender in the discussion (Dewar et al., 2014; Dudley, Okely, Pearson, & Peat, 2010; Okely et al., 2017). Among the 13 ratings (31.0%) in the basic category, nearly half were on item BI, where six out of seven studies provided sex/gender background information regarding the research question; the most frequently mentioned background statement was that girls are significantly less physically active than boys (Carlin et al., 2018; Dewar et al., 2014; Dudley et al., 2010; Guagliano et al., 2015). On 26 occasions (61.9%), no information was provided about the consideration of sex/gender. Overall, in this group of studies, sex/gender was considered less frequently than in all other studies, regardless of the effectiveness of the studies.

Meta-analyses

An overview of the two calculated meta-analyses is provided in Table 3, including the effect size statistic, the heterogeneity statistic, the analysis of publication bias, and the subgroup analyses.

Table 3 Random effects model of Hedges’ g for school-based physical activity (PA) interventions and subgroup analyses

Intervention effects in girls

Ten studies provided sufficient data to be included in the meta-analysis. The overall pooled effect size was significantly positive and small, and heterogeneity was high (see Table 3). These results suggest that girls exposed to the PA intervention treatment participated in more PA than those in the control condition. The effects from the included studies were extremely inconsistent, ranging from g = 0.006 (61) to g = 1.592 (59) (Table 3). Of the ten included studies, five (Carlin et al., 2018; De Barros et al., 2009; Dudley et al., 2010; Parrish, Okely, Batterham, Cliff, & Magee, 2016) reported a small effect and two (55, 62) reported a large effect.

To explore whether the subgroups moderated the average intervention effect, a series of subgroup analyses was performed. Excluding outliers resulted in a slightly smaller effect size and reduced heterogeneity. In the subgroup analysis by study design, the subgroups did not differ significantly (p = 0.197); RCTs showed a small effect size and no heterogeneity. When only cluster RCTs were analyzed, the effect size was higher, with high heterogeneity. As for the study sample, the assessment of Hedges’ g resulted in a low effect size for both studies with subjects of mixed sex/gender and those with girls only. Studies with mixed-sex/gender samples exhibited higher heterogeneity than single-sex/gender studies. The subgroup analysis of the PA measurement showed that assessing PA during the whole day produced a low effect size of Hedges’ g and high heterogeneity. In comparison, measuring only school-based PA resulted in a low effect size but higher heterogeneity. The inspection of the funnel plot indicated more positive than negative comparisons and therefore some possible publication bias; also, Egger’s regression test was significant.

Intervention effects in boys

Five studies were eligible for inclusion in the meta-analysis (De Barros et al., 2009; Grydeland et al., 2013; Haerens et al., 2006; Parrish et al., 2016; Verstraete et al., 2006). The average treatment effect was significant but small, and heterogeneity was low. These results suggest that boys exposed to PA interventions participated in more PA than those in the control condition (Table 3). Excluding the comparisons with the highest and lowest effect sizes (Haerens et al., 2006; Verstraete et al., 2006) resulted in a slightly smaller effect size and no heterogeneity (Table 3). No study was located outside the 95% confidence interval with regard to the overall effect.

The subgroup analysis by study design among boys did not reveal any significant difference (p = 0.792). RCTs resulted in a small effect size and no heterogeneity. For cluster RCTs, the effect size was small without heterogeneity. When we analyzed PA over the whole school day, a small effect size was found with no heterogeneity. There were no studies of boys only. The visual inspection of the funnel plot was balanced, and Egger’s regression test was not significant (Table 3).

Discussion

This systematic review assessed the consideration given to sex/gender factors in the development, implementation, and evaluation stages of 56 school-based intervention studies that aimed to promote PA in children and adolescents. In all studies, sex/gender was considered only rudimentarily across all items of the sex/gender checklist, regardless of the effectiveness of the intervention. Additionally, the meta-analyses examining the intervention effects for girls and boys separately revealed that the interventions were successful in both girls and boys, but with small significant effects and high heterogeneity.

Most children and adolescents of all ages and from all social classes attend school. Therefore, the school offers an important setting to promote PA, not only through physical education but also during recess, regular classes, or after-school programs (WHO, 2018b). Positive significant intervention effects were achieved only for girls in 3.6% of the studies and only for boys in 7.1% of the studies. In 48.2% of the studies, positive intervention effects on the PA levels of both girls and boys were found. No effect on either sex/gender occurred in 21.4% of the studies. Negative effects were found in 7.2% of the studies. Single-sex/gender studies with no effect accounted for 5.4% of all studies, and those with a positive effect for 7.1%.

Overall, sex/gender aspects received minimal consideration regardless of whether the studies had the same intervention effect on girls and boys, had different effects on girls and boys, or included girls only in their sample (Love et al., 2017). Only the statistical analyses addressed sex/gender in greater detail. These findings lead us to conclude that research studies are more likely to consider sex/gender in their analyses of intervention effectiveness and discussions than in the planning, design, development, and implementation of the study. Notably, studies of girls only that found a positive intervention effect (n = 4) considered sex/gender even less frequently than those with different intervention effects. These four studies did not provide information about consideration of sex/gender in 70.8% of the ratings over all items of the checklist (Carlin et al., 2018; Fairclough & Stratton, 2006; Guagliano et al., 2015; Schneider et al., 2007). One likely reason for this omission is that explicit discussion of comparisons with the opposite sex/gender may seem unnecessary in studies where all subjects are of the same sex.

Studies with different intervention effects on girls and boys were rated very similarly based on the checklist. None of these studies gave any information regarding the consideration of sex/gender on the checklist items that describe the theoretical and conceptual linkages with sex/gender, the measurement instruments, or how sex/gender was considered in study sample recruitment, intervention content and materials, or the selection of people carrying out the intervention. This means that in these studies, sex/gender was not considered in either the planning or the implementation of the intervention. Only in the results of the intervention did differences emerge, and they were then discussed by 88.0% of the studies. These findings indicate strongly that sex/gender should be taken into account at earlier stages of the study (i.e., in planning and implementation). All studies that found significant positive effects only in girls addressed sex/gender issues in the discussion; in contrast, among the studies that identified significant positive effects only in boys, just one-quarter considered issues of sex/gender when discussing the results. In other words, if an intervention is effective only with girls, the difference by sex/gender attracts researchers’ attention more strongly than if it is effective only with boys. This could be because PA is generally more prevalent among boys than among girls (Guthold et al., 2020; WHO, 2018a), with the result that intervention programs that improve PA only among girls highlight the differential impact by sex/gender most vividly.

The results of the meta-analyses showed that the interventions were successful with both girls and boys, even though the effect sizes were small and the heterogeneity between studies was very high throughout all studies. The meta-analysis of the effects on girls revealed a publication bias, in that the analyzed intervention studies are very different with regard to the implementation, measurement methods, and statistical analyses. Nevertheless, the results of our meta-analyses show that interventions conducted in a school context can increase PA among girls and boys. This finding indicates that such interventions are generally useful, although the validity of the meta-analysis was limited (Love et al., 2019).

Even if a PA intervention seems to work for both girls and boys, however, it is necessary to consider the target and the components of an intervention more carefully so as to ensure effectiveness for girls and boys, because there is always a risk of reinforcing inequalities. Sex/gender must therefore always be taken into account; otherwise, unintended disadvantages or reinforcements of inequalities may result (Nieuwenhoven & Klinge, 2010; Verscheure & Amade-Escot, 2007).

Only further replication, with documentation of the content, components, and implementation of the intervention, can determine whether the sex/gender of teachers or caregivers has an influence on the promotion of PA, or whether girls and boys should be educated separately. Sigmund, El Ansari, & Sigmundova (2012) was the only study that received a rating of “detailed” on the item Intervention delivery, location and interventionists (IDLI), because it reported that girls and boys played separately and/or together and that they chose different activity types, equipment, and content during co-educational teaching. This intervention was effective for both girls and boys. Because only one study illustrates this pattern, further research would be needed to find out whether this feature of the study (i.e., permitting girls and boys to play in different ways) was a reason why the intervention had a positive impact on both girls and boys. On the item Intervention content and materials (ICM), no study received a “detailed” rating and four were rated as “basic” (Engelen et al., 2013; Fairclough et al., 2016; Okely et al., 2017; Sigmund et al., 2012) because they considered sex/gender only in a limited way, for example by providing different materials for boys and girls (Engelen et al., 2013). The effects of these differences between the materials were not reported, however, so we cannot draw conclusions about the importance of the materials used. For example, the color and the language of the materials used in the interventions might be relevant. Information is also needed on how the children were addressed, whether boys and girls were addressed equally or separately, and whether the interests and needs of both girls and boys were considered when developing the program. Based on this information, a further step can then establish what works for boys, what works for girls, and what works for everyone.

Another way to increase PA for both girls and boys, or at least to determine more clearly what interventions work for each sex/gender, could be to adopt school PA policies that contain sex/gender considerations (McKenzie, 2019).

Strengths and limitations

To the best of our knowledge, this review paper is the first to systematically analyze the consideration of sex/gender in intervention studies intended to promote PA in a school context, in relation to the effectiveness of the interventions. Another unique strength of this study was the use of the sex/gender checklist, which provided detailed information on the extent to which sex/gender was considered in each study and permitted comparison with the narrative interpretation of effects. The use of the PRISMA statement is another strength, as it ensured the methodological quality of the systematic review. Moreover, the meta-analyses provided further insights into the effectiveness of interventions in the school context with regard to sex/gender.

One limitation of this systematic review is that it encompassed only English-language articles. Furthermore, the checklist assesses whether sex/gender was discussed, but not the quality or extent of the discussion. In addition, it is not possible to assess whether sex/gender was not considered at all in a particular intervention study or whether it was just not reported; the inability to make this distinction could introduce a bias into the results. Another limitation is that only a small number of studies could be included in the meta-analysis, limiting its generalizability. Since no sex/gender-diverse participants were included in the studies examined in this systematic review, we were limited to binary sex/gender characterization.

Conclusion

In general, we found insufficient consideration of sex/gender in school-based intervention studies designed to increase PA among children and adolescents. Studies that found significant positive intervention effects did not differ in their extent of consideration of sex/gender from those that did not find significant intervention effects, nor did studies that found the same effect on girls and boys differ from those that reported different effects on girls and boys. Current research shows a clear difference between girls and boys in PA and sedentary behavior (Kalman et al., 2015; WHO, 2018a). These differences in behavior can have severe health consequences (Biddle et al., 2019; Janssen & Leblanc, 2010). Only by better understanding the differences and similarities in the PA and sedentary behavior of girls and boys can we contribute to enhancing positive behaviors and counteracting the physical inactivity pandemic. To this end, clear documentation of relevant sex/gender aspects is crucial, both in the design, implementation, and evaluation of intervention programs and in the conduct of systematic reviews.