Introduction

Secular trends in children’s time use demonstrate that the greatest percentage of boys’ and girls’ daily time can be ascribed to leisure-time activities (Hofferth 2009). Therefore, children’s and adolescents’ free time provides a tremendous opportunity to enhance their health and well-being. Leisure-time physical activity (PA) is defined as discretionary or recreational time for hobbies, sports and exercise, and includes PA that is freely chosen for intrinsic satisfaction (Rossman and Schlatter 2008). It mostly reflects the wishes and inherent abilities of girls and boys (Aaltonen et al. 2016). Therefore, encouragement of leisure-time PA is suitable for the long-term promotion of an active lifestyle. Furthermore, leisure-time PA is associated with higher school engagement, lower levels of school-related stress, better academic achievement, positive affect and lower levels of depression and psychological distress (Badura et al. 2016; Bélair et al. 2018; Kleppang et al. 2019; White et al. 2018).

Nevertheless, the overall amount of moderate to vigorous PA in leisure time decreases by approximately ten minutes per day with each year of age between 9 and 16 years due to a decrease in unorganized or organized PA (Commonwealth Scientific Industrial Research Organisation (CSIRO) et al. 2007).

Moreover, lower leisure-time PA levels are reported for girls compared to boys (Klinker et al. 2014a; Klinker et al. 2014b; Nilsson et al. 2009). Boys accumulate 10.72 more minutes of moderate to vigorous PA per day than girls during leisure time (p < .05) (Klinker et al. 2014b). In addition, types of activity are highly gendered. For example, a recent Norwegian study reported that girls tended to participate in dancing, gymnastics, exercising to music, jumping or rope skipping, whereas boys participated more frequently in team handball, climbing, swimming/water play, mountain hiking or soccer (Resaland et al. 2019). Socialization into gender-typed PA begins in early life. Furthermore, girls and boys respond differently to interventions promoting overall and leisure-time PA. For example, it has been shown that girls are more likely than boys to be interested in smartphone apps to seek health information and are more likely to participate in Web-based PA interventions (Guertler et al. 2015).

Sociocultural influences play an important role with regard to PA preferences (Downward and Riordan 2007; Humphreys and Ruseski 2007). Differences in leisure-time PA levels between boys and girls may partly be explained by the expectancy-value model (Eccles 1983; Eccles and Harold 1991). This model focuses on the environmental factors and socializing individuals through which stereotypes and norms affect individuals. It assumes two core variables that determine behaviour: success expectancies and subjective task value. In particular, gender identity and sport sex/gender stereotyping may affect the amount of perceived competence and subjective value of leisure-time PA (Guillet et al. 2006; Slater and Tiggemann 2011).

Despite sex/gender differences in leisure-time PA, sex/gender has not been widely considered in systematic reviews when appraising existing evidence on the effects of interventions aiming to promote leisure-time PA. There are no existing guidelines in the context of PA promotion that encompass the implementation and assessment of the effectiveness of sex/gender inclusivity in reviews. Few reviews in the context of health promotion have considered sex/gender in their analyses and reported sex/gender data (e.g. sex/gender background information, sex/gender inclusivity of intervention delivery, location, or interventionists) (Petkovic et al. 2018). However, there is a need to evaluate sex/gender aspects in more detail in systematic reviews. In relation to leisure-time PA, which reflects a voluntary behaviour, only one review has specifically investigated the effects of interventions promoting leisure-time PA in children and adolescents (De Meester et al. 2009). Nevertheless, the authors of this review did not consider or report any sex/gender aspects of intervention studies.

Therefore, the main objectives of this review are to evaluate the effects of interventions on children’s and adolescents’ leisure-time PA for boys and girls separately and to appraise the extent to which the studies have taken sex/gender into account. To reach this aim, all primary studies included in the review will be assessed on a recently developed sex/gender checklist (Demetriou et al. 2019).

Methods

The current study is part of the collaborative genEffects project that evaluates the sex/gender effects of interventions on girls’ and boys’ PA and sedentary behaviour. The genEffects systematic review on sex/gender is reported according to the PRISMA guidelines. As genEffects is a very broad systematic review covering a wide range of highly heterogeneous intervention studies, the included studies were split by domain of PA and sedentary behaviour (i.e. overall PA, school PA, active transport, leisure-time PA and sedentary behaviour). The results with regard to overall PA, active transport and sedentary behaviour are presented elsewhere (Marzi et al. 2020; Schulze et al. 2020; Vondung et al. 2020). The current analyses of the genEffects systematic review were conducted to enable a meta-analytical assessment of interventions aiming to promote leisure-time PA (after school and on weekends) in children and adolescents. Only primary studies reporting on leisure-time PA as the main outcome were included in these analyses. Owing to the low heterogeneity of these studies, we were able to conduct a meta-analysis for boys and girls separately. The protocol for the genEffects project has been published previously (Demetriou et al. 2019) and is also registered (ref CRD42018109528). There were no protocol amendments, except that the GRADE framework was not used.

Search strategy and eligibility criteria

For the genEffects systematic review, a comprehensive literature search was conducted using 11 electronic databases (Cochrane Central Register of Trials (CENTRAL); U.S. National Library of Medicine (ClinicalTrials.gov); Ovid Embase; Epistemonikos; EBSCO ERIC; WHO International Clinical Trials Registry Platform (ICTRP); Ovid Medline; ProQuest Dissertations & Theses Global; EBSCO PsycINFO; EBSCO SPORTDiscus; Clarivate Web of Science) in August 2018. The search strategy was based on Cochrane standards and is included for Ovid MEDLINE as Online Resource 1.

Included intervention studies met the following criteria:

  1. Participants: healthy children and adolescents within the average age range of 3 to 19 years

  2. Intervention: the aim of the intervention had to be promotion of leisure-time PA (after school and/or on weekends)

  3. Study design: randomized controlled trials (parallel group or cluster-randomized) and controlled trials

  4. Comparator: active control group, other than PA or sedentary behaviour, or control group with no intervention

  5. Outcome: leisure-time PA assessed by any type of measure (subjective/objective); additionally, all intervention studies had to (1) report sex/gender disaggregated PA at baseline and/or follow up, and/or (2) report that there were no differences in outcome when looking at sex/gender

  6. Publication: English language peer-reviewed journal articles published after year 2000

Study selection and data extraction

Study selection for the genEffects systematic review was performed by two independent reviewers using Covidence systematic review software (Veritas Health Innovation, Melbourne, Australia. Available at www.covidence.org). After de-duplication, titles and abstracts were screened, and articles of potential or indeterminate relevance were retrieved for full-text screening against the eligibility criteria. All conflicts were resolved by a senior, third reviewer.

For each included study, study details were extracted in terms of information about general study characteristics (country, design, name of intervention program), sample size for intervention and control groups stratified by sex/gender and dropout rates, details about intervention content as well as intervention approaches and settings. Additionally, extraction forms contained information about interventions’ main outcomes, measurement points and instruments, and statistical approaches including confounders taken into account. This information was necessary to analyse the effectiveness of the interventions promoting leisure-time PA. For additional information, study protocols and supplementary materials were used, and in the case of missing information, the author(s) of the articles were contacted (maximum of two contact attempts).

Quality assessment and risk of bias

Risk of bias assessment was carried out independently by two reviewers using the Cochrane risk of bias tool for randomized trials, version 1 (Higgins et al. 2011). Using the seven domains of the tool, primary studies were assessed for selection, performance, attrition, detection, reporting, and ‘other’ bias. For ‘other’ bias, we assessed baseline differences between intervention and control arm as well as seasonal differences in measurement points. Each domain was judged as ‘low’, ‘high’ or ‘unclear’ risk of bias, with the last category indicating either lack of information or uncertainty about the potential bias. Discrepancies were resolved through discussion or adjudication by a third reviewer. The Review Manager 5 (RevMan 5) (The Nordic Cochrane Centre 2014) tool was used to assess risk of bias of included studies.

Sex/gender assessment

All included studies were assessed with the sex/gender checklist (Demetriou et al. 2019). Each item of the checklist was rated as ‘detailed’, ‘basic’, ‘no information provided’, ‘poor’ or ‘not relevant’.

Data synthesis and statistical analyses

Meta-analysis

Two meta-analyses were conducted, one for girls and one for boys, using a random-effects model (Borenstein et al. 2010) in Comprehensive Meta-Analysis software (Version 3, Biostat, Englewood, NJ). Heterogeneity was evaluated using Cochran’s Q-statistic and I² (Higgins et al. 2003). The effect size used was Hedges’ g, which is the standardized mean difference adjusted for small sample sizes (k < 20). Effect sizes of g = 0.20 are interpreted as small, g = 0.50 as moderate and g = 0.80 as large (Cohen 1988). A positive effect size was interpreted as the intervention group having higher leisure-time PA scores. The meta-analytical effect size estimates were based on baseline and post-intervention means, standard deviations and sample sizes. If data were available in other formats, data transformations were applied if possible (Higgins and Green 2011). Hedges’ g was then calculated by dividing the between-group difference of mean change from baseline by the pooled standard deviation of change for the groups, assuming a correlation of r = 0.5 between baseline and post-intervention (Higgins and Green 2011; Morris 2008). Where two or more measures of PA were used, the pooled effect size was calculated to include only one effect size per study (using the method from Borenstein et al. (2009)). A single study reported effects from a 3-arm intervention that included one control group and two intervention arms (Loucaides et al. 2009). This study was considered as two separate comparisons (control vs. intervention 1 and control vs. intervention 2) in all subsequent analyses. When key information for calculating Hedges’ g was missing, studies were eliminated from the analyses.
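The effect size calculation and pooling described above can be illustrated with a minimal Python sketch. It is not the Comprehensive Meta-Analysis implementation: it assumes the standard textbook formulas for the standard deviation of a change score, the small-sample correction factor, and DerSimonian-Laird random-effects pooling. All function names and input numbers are hypothetical and serve only as an illustration.

```python
import math


def sd_change(sd_pre, sd_post, r=0.5):
    """SD of the pre-to-post change score for an assumed pre/post correlation r."""
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)


def hedges_g_change(intervention, control, r=0.5):
    """Hedges' g and its variance for the between-group difference in mean change.

    Each group is a dict with keys n, mean_pre, sd_pre, mean_post, sd_post.
    """
    n1, n2 = intervention["n"], control["n"]
    change1 = intervention["mean_post"] - intervention["mean_pre"]
    change2 = control["mean_post"] - control["mean_pre"]
    s1 = sd_change(intervention["sd_pre"], intervention["sd_post"], r)
    s2 = sd_change(control["sd_pre"], control["sd_post"], r)
    df = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    d = (change1 - change2) / s_pooled
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    return j * d, j**2 * var_d


def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling with Cochran's Q and I² (in %)."""
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return {"g": pooled, "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
            "Q": q, "I2": i2, "tau2": tau2}


# Hypothetical study: minutes of leisure-time PA per day (illustrative values only)
girls_int = {"n": 50, "mean_pre": 30, "sd_pre": 10, "mean_post": 36, "sd_post": 10}
girls_ctl = {"n": 50, "mean_pre": 30, "sd_pre": 10, "mean_post": 32, "sd_post": 10}
g, var_g = hedges_g_change(girls_int, girls_ctl)
print(round(g, 3), round(var_g, 4))
print(random_effects([g, 0.10, 0.45], [var_g, 0.02, 0.05]))
```

With r = 0.5 and equal baseline and post-intervention standard deviations, the standard deviation of change equals the baseline standard deviation, which makes the role of the assumed correlation easy to check by hand.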

Randomized controlled trials, including cluster randomized trials with a pre-post control group design from the single-sex/gender studies and sex/gender disaggregated studies, were included in the meta-analysis. Non-randomized controlled studies were excluded from the meta-analyses since they should be analysed separately (Reeves et al. 2019). All included cluster randomized controlled trials were assessed for a unit-of-analysis error and their handling of adjusting for the clustering effect in the analyses (Campbell et al. 2004; Eldridge et al. 2004).

Additional subgroup analyses were conducted according to the mixed model (Borenstein et al. 2009). The subgroups were target (mixed sex/gender studies vs. single sex/gender studies) and study design (randomized controlled trials vs. cluster randomized controlled trials). Two sensitivity analyses concerning outliers were conducted, excluding (1) the studies with the highest and lowest effect size and (2) the studies whose Hedges’ g was not located within the 95% confidence interval of the random-effects model.
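The two outlier rules can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the software used in the review: the pooled confidence interval is obtained with a DerSimonian-Laird estimator, rule (2) is applied literally to the point estimates, and the study names and values are hypothetical.

```python
import math


def pool(studies):
    """DerSimonian-Laird pooled g and 95% CI for (name, g, variance) tuples."""
    gs = [g for _, g, _ in studies]
    vs = [v for _, _, v in studies]
    w = [1 / v for v in vs]
    fixed = sum(wi * g for wi, g in zip(w, gs)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, gs))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(gs) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in vs]
    g_pooled = sum(wi * g for wi, g in zip(w_star, gs)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return g_pooled, g_pooled - 1.96 * se, g_pooled + 1.96 * se


def drop_extremes(studies):
    """Rule (1): remove the studies with the highest and the lowest effect size."""
    ranked = sorted(studies, key=lambda s: s[1])
    return ranked[1:-1]


def drop_outside_ci(studies):
    """Rule (2): keep only studies whose g lies within the pooled 95% CI."""
    _, lower, upper = pool(studies)
    return [s for s in studies if lower <= s[1] <= upper]


# Hypothetical comparisons: (name, Hedges' g, variance of g)
studies = [("A", 0.1, 0.01), ("B", 0.2, 0.01), ("C", 0.3, 0.01), ("D", 1.5, 0.01)]
print(pool(studies))
print(pool(drop_extremes(studies)))
print(pool(drop_outside_ci(studies)))
```

Comparing the three pooled estimates shows how strongly a single extreme comparison can drive both the average effect and the width of its confidence interval.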

Indications of publication bias were investigated indirectly by visual inspection of the funnel plot for the effect size. Furthermore, Egger’s test of the intercept (Egger et al. 1997) was conducted.
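As a rough illustration of the regression behind Egger’s test, the sketch below regresses the standardized effect (g divided by its standard error) on precision (the inverse of the standard error) using ordinary least squares. An intercept far from zero suggests funnel plot asymmetry. The function name and the input values are hypothetical, and the p-value, which would come from a t distribution with k − 2 degrees of freedom, is deliberately left out.

```python
import math


def egger_intercept(effects, variances):
    """OLS regression of standardized effect on precision (needs at least 3 studies)."""
    se = [math.sqrt(v) for v in variances]
    x = [1 / s for s in se]                    # precision
    y = [g / s for g, s in zip(effects, se)]   # standardized effect
    k = len(x)
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid_var = sum((yi - intercept - slope * xi) ** 2
                    for xi, yi in zip(x, y)) / (k - 2)
    se_intercept = math.sqrt(resid_var * (1 / k + x_bar**2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "slope": slope, "t": t, "df": k - 2}


# Hypothetical effect sizes and variances (illustrative values only)
result = egger_intercept([0.45, 0.30, 0.22, 0.15, 0.12],
                         [0.090, 0.050, 0.030, 0.015, 0.010])
print(result)
```

With few comparisons, as in the present meta-analyses, the test has low power, so the intercept is best read alongside the funnel plot rather than on its own.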

Descriptive analysis

A descriptive analysis was conducted to analyse whether the sex/gender-related effects of the included intervention studies were related to the ratings of the sex/gender checklist. Some studies reported more than one outcome for leisure-time PA (e.g. light PA and moderate PA). These studies reported different effects on boys and girls in different PA outcomes. Thus, we conducted the analysis on the level of the PA outcomes (see Online Resource 4). Owing to missing statistical data in some primary studies (e.g. reporting only ‘not significant’ as a result), we were not able to analyse PA outcomes that show an effect in the same direction. Instead, the study results were divided into three groups: (1) PA outcomes with same/similar significant intervention effects for boys and girls; (2) PA outcomes with no significant intervention effects for boys and girls; and (3) PA outcomes with different intervention effects for boys and girls. Studies that reported more than one PA outcome with different sex/gender-related results were assigned to more than one of these three groups (see Online Resource 4). In every group for all PA outcomes, sex/gender considerations were specified by calculating the sum of ratings for ‘detailed’, ‘basic’, ‘no information provided’, ‘poor’ and ‘not relevant’ for every item of the checklist and by calculating the average number (M) of each rating per grade over all studies in each of the three groups. By applying these analyses, we were able to compare the degree of sex/gender consideration between studies that were or were not effective for both boys and girls, with studies that revealed different effects for boys and girls, respectively. For single sex/gender studies we compared PA outcomes that were effective with outcomes that were not effective.
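The averaging step can be sketched in a few lines of Python. The sketch assumes that each PA outcome carries a group label and one rating per checklist item; the group labels, the data layout and the example ratings are hypothetical and only show how the average number (M) of each rating per group could be obtained.

```python
from collections import Counter, defaultdict

RATINGS = ("detailed", "basic", "no information provided", "poor", "not relevant")


def mean_ratings_per_group(outcomes):
    """Average number of each rating per PA outcome, separately for each group.

    outcomes: iterable of (group_label, list_of_item_ratings) pairs.
    """
    counts = defaultdict(Counter)
    n_outcomes = Counter()
    for group, ratings in outcomes:
        n_outcomes[group] += 1
        counts[group].update(ratings)
    return {
        group: {rating: counts[group][rating] / n for rating in RATINGS}
        for group, n in n_outcomes.items()
    }


# Hypothetical ratings for three PA outcomes (shortened to four checklist items)
outcomes = [
    ("same effects", ["detailed", "basic", "no information provided", "detailed"]),
    ("same effects", ["detailed", "basic", "basic", "poor"]),
    ("different effects", ["basic", "no information provided",
                           "no information provided", "detailed"]),
]
for group, means in mean_ratings_per_group(outcomes).items():
    print(group, means)
```

Because a study with several PA outcomes can contribute to more than one group, the unit of analysis in this sketch is the outcome rather than the study.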

Results

Study selection (flow chart)

In total, 31 articles reporting 31 unique studies with 44 outcomes for leisure-time PA and 20,088 participants were included in this analysis and publication. Originally, in the genEffects systematic review we identified 24,878 references through the electronic database search, leading to the inclusion of 217 unique studies (reported in 244 articles) (Fig. 1).

Fig. 1 PRISMA flow diagram

Characteristics of included studies and study participants

A table including relevant characteristics of included studies is presented in Online Resource 2. Of the 31 included studies, ten reported results for PA after school during the week and 12 reported weekend PA. Nine studies investigated both after-school PA and weekend PA, with six reporting results separately, and three not distinguishing between after-school and weekend PA. Of the 44 included outcomes for leisure-time PA, 25 assessed PA objectively, 17 subjectively and two used both objective and subjective measurements. Average sample size was 648 participants for total sample at baseline, ranging from 29 (Hardman et al. 2009) to 2848 (de Meij et al. 2011). The duration of the intervention programs of the included studies ranged from one month (Vasickova et al. 2013) to three years (Cronholm et al. 2017). Average intervention length was 11.2 (±9.1) months. In the studies including boys and girls, on average 49% of the study population were boys. Fifteen studies reported sex/gender results separately (disaggregated) and one study analysed sex/gender using an interaction analysis. Ten studies reported no significant sex/gender differences without reporting statistical data (‘tested’) and five studies were single sex/gender studies (four studies enrolling only girls and one study enrolling only boys). No studies enrolled or identified gender-diverse participants.

Risk of bias within studies

Overall, 77% of included studies were judged to be at high risk of bias on at least one domain (Fig. 2, Online Resource 3). The domain rated as having the lowest risk of bias was selective reporting, with all studies at low risk. The risk of bias domain that was judged to have the highest number of high-risk studies was blinding of participants and personnel (14 studies; 45%). The majority of high-risk judgements (nine studies; 29%) of the ‘other’ domain were caused by baseline imbalance of outcome variables or timing of outcome assessments.

Fig. 2 Risk of bias of included studies

Meta-analysis

Overall, ten studies were eliminated from the analyses because key information for calculating Hedges’ g was missing.

Effects for girls

For the meta-analysis regarding effects for girls, nine comparisons (eight studies out of 19 studies reporting effects for girls with a total number of 2709 girls) provided sufficient data for inclusion (three single sex/gender and five sex/gender disaggregated studies; see Fig. 3 and Table 1). Heterogeneity analysis indicated significant between-study variance (Q = 27.383; I² = 70.79%, p < .01). The meta-analysis revealed a significant small positive treatment effect (k = 9, g = 0.220, 95%CI = 0.078 to 0.3888, p = .003) for interventions on leisure-time PA in girls. The inspection of the funnel plot indicated more positive than negative comparisons and therefore some possible publication bias. Egger’s regression test was significant (z = 2.646; p = .014, two-tailed). These results indicated a high probability of publication bias.

Fig. 3 Results of sex/gender checklist

Table 1 Standardized mean difference random effects (Hedges’ g) of physical activity interventions on leisure-time physical activity levels in girls only

Effects for boys

For the meta-analysis regarding effects for boys, seven comparisons (six studies out of 16 studies reporting effects for boys with a total number of 2275 boys) were eligible for inclusion (one single sex/gender and five disaggregated studies; see Fig. 4 and Table 2). Heterogeneity analyses indicated significant between-study variance (Q = 13.0122, I² = 54.26%, p < .05). The average treatment effect was significant but small (k = 7, g = 0.193, 95%CI = 0.030 to 0.356, p = .020). The visual inspection of the funnel plot indicated more studies on the right side than the left. Egger’s regression test was non-significant (z = 1.658; p = .145). These results indicated a low probability of publication bias.

Fig. 4 Study outcomes with same/similar effects in girls and boys (with or without significant intervention effects) compared to study outcomes with different intervention effects for boys and girls; N=33 outcomes

Table 2 Standardized mean difference random effects (Hedges’ g) of physical activity interventions on leisure-time physical activity levels in boys only

Sensitivity analysis

Sensitivity analysis of the meta-analysis regarding effects for girls showed that removing outliers (two comparisons of Loucaides et al. 2009) resulted in a significant reduction to a small positive effect (k = 7, g = 0.107, 95%CI = 0.036 to 0.177, p = .003). When the studies with the highest (Loucaides et al. 2009) and lowest (Llaurado et al. 2018) effect size were removed, the overall effect was also significantly reduced to a small positive effect (k = 7, g = 0.174, 95%CI = 0.042 to 0.306, p = .010). Heterogeneity was reduced when the outliers were removed (Q = 4.550, I² < 0.001%; p < .001; Q = 13.910, I² = 56.87%, p < .05). The subgroups by study design did not differ significantly (p = .106); randomized controlled trials resulted in a small effect size of g = 0.386 (k = 6; 95%CI = 0.149 to 0.623; p = .001) and high heterogeneity (I² = 80.81%). When only cluster randomized controlled trials were analysed, the effect size was lower (k = 3, g = 0.102, 95%CI = −0.148 to 0.352, p = .425) with no heterogeneity (I² = 0.00%). Assessment of Hedges’ g resulted in a higher effect size for mixed sex/gender studies (k = 6, g = 0.386, 95%CI = 0.149 to 0.623, p = .001) compared to single sex/gender studies (k = 3, g = 0.102, 95%CI = −0.148 to 0.352, p = .425). However, the difference in effect sizes between mixed sex/gender studies and single sex/gender studies was not significant (p = .106). Studies using mixed sex/gender designs resulted in higher heterogeneity (I² = 80.81%) than single sex/gender studies (I² < 0.01%).

Sensitivity analysis of the meta-analysis regarding the effects for boys showed that the effect size of one study was not included in the 95% confidence interval of the overall effect size (Bronikowski and Bronikowska 2011). Once the study was removed, the average treatment effect was reduced but remained a significant small positive effect (k = 6, g = 0.102, 95%CI = 0.010 to 0.194, p = .029). When the studies with the highest (Bronikowski and Bronikowska 2011) and lowest (Haerens et al. 2006) effect size were removed, the overall effect was reduced but remained significant (k = 5, g = 0.150, 95%CI = 0.033 to 0.267, p = .012). The subgroup analysis of the study design showed that randomized controlled trials produced a small effect size (k = 6, g = 0.241, 95%CI = 0.006 to 0.477; p = .045) and high heterogeneity (I² = 61.85%). The only cluster randomized controlled trial resulted in a small effect size (k = 1, g = 0.137, 95%CI = −0.314 to 0.588, p = .551) and no heterogeneity (I² = 0.00%). Accounting for the target, the assessment of Hedges’ g resulted in a higher effect size for mixed sex/gender studies (k = 6, g = 0.241, 95%CI = 0.006 to 0.477, p = .045) compared to single sex/gender studies (k = 1, g = 0.137, 95%CI = −0.314 to 0.588, p = .551). However, the difference in effect sizes between mixed sex/gender studies and single sex/gender studies was not significant (p = .688). Studies using mixed sex/gender designs resulted in higher heterogeneity (I² = 61.85%) than single sex/gender studies (I² < 0.001%).

Sex/gender checklist

The results of the sex/gender assessment are presented in Fig. 3. Eight studies (26%) were judged to be ‘poor’ for at least one item of the sex/gender checklist. All studies achieved a ‘basic’ rating for at least one item of the checklist. Only one study did not achieve a single ‘detailed’ rating. In one study, eight out of ten items were judged ‘basic’ or ‘detailed’ (Sigmund et al. 2012). A ‘detailed’ reporting of sex/gender aspects was mostly realized in the statistical results section (26 studies, 84%). Three items were mostly rated ‘basic’: definition and use of sex and/or gender terminology (20 studies, 65%), participant flow (16 studies, 52%) and discussion (16 studies, 52%). The majority of judgements were ‘no information provided’ for sex/gender background information (15 studies, 48%), theoretical and/or conceptual linkages with sex/gender (30 studies, 97%), measurement instruments (29 studies, 94%), study sample recruitment (25 studies, 81%), intervention content and material (28 studies, 90%) and intervention delivery, location and interventionists (27 studies, 87%). Overall, judgement for sex/gender aspects across all items was mostly ‘no information provided’ (53%). Nevertheless, 23% and 16% of all ratings were ‘basic’ or ‘detailed’, respectively.

Intervention effectiveness in terms of sex/gender

Descriptive analysis

We analysed the relationship of intervention effects with regard to sex/gender by considering the results of the sex/gender checklist that indicates the extent to which studies have taken sex/gender into account (see Online Resource 4).

For nine PA outcomes, significant intervention effects were found with no differences between boys and girls, and for 19 PA outcomes no significant intervention effects were reported in either boys or girls. Furthermore, six PA outcomes revealed different intervention effects in boys and girls. Qualitative analyses considering the sex/gender checklist showed that there were no differences in how often considerations of sex/gender were rated as ‘poor’ or ‘basic’ in PA outcomes with regard to their intervention effects (significant effects in boys and girls; no intervention effects; significant effects only in boys or only in girls) (Fig. 4). Nevertheless, PA outcomes with same/similar significant intervention effects for boys and girls were more often rated as ‘detailed’ (M=2.0) and less often rated as ‘no information provided’ (M=5.0) compared to PA outcomes with no significant effects in boys and girls (M=1.4 and 5.6, respectively) and PA outcomes with different effects in boys and girls (M=1.3 and 5.8, respectively). In particular, PA outcomes with same/similar significant effects for boys and girls were more often rated ‘detailed’ with regard to theoretical and/or conceptual linkages with sex/gender, measurement instruments, intervention delivery, location and interventionists and participant flow.

Five single sex/gender-studies reporting ten PA outcomes were included in analyses, eight for girls and two for boys, respectively. One PA outcome (mean steps per day for girls) showed a significant increase for the intervention compared to the control group (Hardman et al. 2009). This study showed fewer items that were rated as ‘detailed’ (M = 0.0) and more often items that were rated as ‘no information provided’ (M = 4.0) than studies reporting PA outcomes with no significant intervention effect (M = 1.0 and 2.8 respectively). There were no differences in ‘basic’ or ‘poor’ sex/gender assessment between PA outcomes with significant intervention effects (Mbasic = 2.0; Mpoor = 0.0) and PA outcomes with no significant intervention effects (Mbasic = 2.0; Mpoor = 0.3; see Online Resource 4: Summary of all tables).

Discussion

This review included 31 studies with 44 outcomes measuring a wide range of leisure time PA outcomes by any type of measure (subjective/objective). Most study outcomes resulted in similar intervention effects for boys and girls (28 out of 34 outcomes). The study outcomes with the same/similar significant effects for boys and girls reported on sex/gender aspects in more detail. Overall, the quality of reporting sex/gender aspects was low. The meta-analysis for a subsample of studies showed that interventions had a significant but small effect on leisure-time PA of girls and boys (Figs. 5 and 6).

Fig. 5 Forest plot of overall effect size (Hedges’ g) and summary of effect sizes of each individual comparison for PA interventions on leisure time PA in girls only. Note: Favours A represents control

Fig. 6 Forest plot of overall effect size (Hedges’ g) and summary of effect sizes of each individual comparison for PA interventions on leisure time PA in boys only. Note: Favours A represents control

Our meta-analysis on a small subsample of included studies indicated that interventions on leisure-time PA showed small but significant effects in girls and boys. Nevertheless, single sex/gender interventions showed slightly lower intervention effects compared to mixed sex/gender interventions. However, this difference was not significant and only one study on boys could be included. Our tentative findings could be explained by considering previous research indicating that boys and girls tended to accrue more moderate-to-vigorous PA in coeducational than in unisex classes (Hannon and Ratliffe 2005; Van Acker et al. 2010). Boys and girls reported that they have more fun and a higher social motivation in coeducational classes compared to unisex classes (Ronspies 2011). Nevertheless, as shown in another review on equity effects of children’s PA interventions (Steel et al. 2014), there is no clear evidence on the comparative effectiveness of targeted interventions focusing on a specific high-risk subgroup (like girls) and universally targeted interventions. In the present review, heterogeneity analysis revealed significant heterogeneity in mixed sex/gender studies but not in single sex/gender studies, in boys as well as in girls. Furthermore, results should be interpreted carefully because confidence intervals, especially for effects of mixed sex/gender studies, were wide. Thus, further research is needed to understand whether targeted or non-targeted interventions are more effective in terms of promotion of leisure-time PA (Love et al. 2017).

Our review identified that studies reporting the same or similar significant effects in boys and girls were more often rated as ‘detailed’ with regard to sex/gender consideration across all items of the checklist compared with interventions with no significant intervention effects in both boys and girls and interventions with different effects in boys and girls. This could be an indication that the consideration of sex/gender during intervention planning, development, delivery and analyses could lead to similar significant effects in boys and girls in these interventions. Interventionists might be more aware of sex/gender issues in these intervention studies. In particular, the items theoretical and/or conceptual linkages with sex/gender, measurement instruments, intervention delivery, location and interventionists and participant flow were taken into account more strongly in interventions with equal significant effectiveness.

As our results showed, it is important to consider theoretical and/or conceptual linkages with sex/gender to ensure that interventions show significant effects for both boys and girls. In the current systematic review, in the study of Sigmund et al. (2012) sex/gender background information as conceptual linkages was used to design sex/gender-specific intervention strategies to enhance children’s and adolescents’ leisure-time PA. In this study, similar significant intervention effects were found in boys and girls. Basically, it is important to start with a theory and then analyse the data to avoid exploratory research conclusions with false leads. It has been shown that analyses without an underlying theory lead to incorrect results (Armstrong 1970). Consequently, it seems important not to consider sex/gender aspects only using subgroup analyses, but to use a priori sex/gender theories to avoid false conclusions. To conclude, it is important to report the theoretical and conceptual linkages with sex/gender to reveal connections between underlying theories and the success of the intervention.

Because our review showed that more detailed reporting of sex/gender aspects in terms of measurement instruments led to same/similar significant intervention effects for both boys and girls, the application of measurement instruments that are sex/gender invariant is important. As a positive example out of the included studies in this review, Sigmund et al. (2012) used relative energy expenditure values for group comparison of girls and boys with different body weights to consider sex/gender differences. In previous research, it has been reported that the Yamax pedometer underestimated the number of steps taken at lower walking speed. Consequently, lower step counts of girls could be a result of underestimation because girls have a smaller stride length, resulting in slower walking speeds (Rowlands and Eston 2005). As in the example from Sigmund et al. (2012), it is possible to consider sex/gender-specific characteristics (e.g. weight, height or BMI) to minimize bias regarding various types of measurement instruments.

Additionally, taking gender-sensitive intervention delivery, location and interventionists into account was shown to result in same/similar significant results for both boys and girls. For example, the included study of van Nassau et al. (2014) reported considering sex/gender aspects regarding the person carrying out the intervention. In particular, all measurements of boys were done by male research assistants, whereas all measurements of girls were performed by female research assistants. Furthermore, the study of Sigmund et al. (2012) describes that girls and boys separately chose the type, equipment and content of activities during co-educational teaching, so that the specific needs of girls and boys were considered. In addition, the intervention was delivered by male and female research assistants. Thus, to reach boys and girls equally, sex/gender considerations regarding the time and location of intervention delivery and the inclusion of male and female interventionists seem to be necessary. This is because the absence of sex/gender considerations in research limits the external validity of findings and their applicability to people regardless of sex/gender (Heidari et al. 2016).

Interventions in our review reporting PA outcomes with same/similar significant intervention effects in both boys and girls reported participant flow more frequently than those reporting PA outcomes with no significant effects in boys and girls or PA outcomes with different effects in boys and girls. As a positive example among the studies included in this review, O'Dwyer et al. (2012) presented the number of participants at baseline, post-intervention and follow-up, disaggregated for girls and boys. For conducting sex/gender-based analyses, it is important to report the flow of participants according to sex/gender (e.g. recruited, enrolled, completed). As girls have a higher dropout rate from sports participation, the sex/gender distribution might be equal at recruitment but not at post-intervention or follow-up measurement (Silva et al. 2019).

Implications for research and practice

Results indicated greater improvement in leisure-time PA for both boys and girls in mixed sex/gender interventions. There is a need to address the inconsistent use of the terms sex and gender, the insufficient consideration and reporting of sex/gender in developing and implementing interventions (e.g. 90% of included studies reported no sex/gender inclusive intervention content and material) and the lack of robust sex/gender analysis in health research. This review demonstrates a need for continued efforts to improve appropriate consideration and reporting of sex/gender during all steps of intervention planning, development, delivery and analysis. Although a variety of initiatives (e.g. Canadian Institute of Health Research, the Gender Policy Committee of the European Association of Science Editors) have attempted to increase the degree to which sex/gender is considered in studies, no appropriate guidelines encompassing sex/gender in interventions and systematic reviews in the context of leisure-time PA exist (Canadian Institute of Health Research (CIHR)-Institute of Gender and Health 2012; Canadian Institute of Health Research (CIHR) 2014; Dunn et al. 2016). It still seems important to consider sex/gender aspects to reduce the sex/gender gap in leisure-time PA, given the existing differences between the leisure-time activity preferences of boys and girls. While girls spend more time on sitting activities (schoolwork, housework, studying), boys devote more time to active participation in leisure-time PA (Videnovic et al. 2010). Accordingly, boys show higher leisure-time PA levels than girls, especially during adolescence (Klinker et al. 2014a; Klinker et al. 2014b; Nilsson et al. 2009). As boys and girls respond differently to interventions promoting leisure-time PA (Guertler et al. 2015), the newly developed sex/gender checklist can help researchers establish sex/gender guidelines on the development, implementation and appraisal of leisure-time PA promotion.

Strengths and limitations

To the best of our knowledge, our systematic review is the first to systematically assess how sex/gender aspects are considered in interventions aiming to promote leisure-time PA in children and/or adolescents. No previous review has used a comprehensive checklist to appraise the extent to which studies have taken sex/gender into account, or systematically analysed effectiveness with regard to sex/gender. Furthermore, through our inclusive approach to PA promotion activities, which was not limited to behavioural and cognitive strategies, we were able to highlight a range of different programs to improve leisure-time PA in children and adolescents. Another strength of the systematic review was the use of the PRISMA statement to improve the reporting quality.

However, this work has some limitations. The review is limited to English-language articles and did not include interventions published in other languages. Furthermore, the search was limited to peer-reviewed journal articles, and thus results of intervention studies published in other types of literature (such as dissertations) were excluded. Regarding the consideration of sex/gender aspects in the primary studies, we were not able to differentiate whether these aspects were neglected or merely reported in a fragmentary or insufficient manner. This can lead to bias and an underestimation of sex/gender considerations in primary studies. It is also worth mentioning that conclusions should be interpreted carefully because of the significant heterogeneity of studies. Additionally, based on the available primary data, we were not able to determine whether the interventions contributed to gender equity. We only analysed whether boys and girls benefited similarly from the intervention, regardless of their starting levels of leisure-time PA. Thus, even if they benefited equally, levels of leisure-time PA can still be unequal at the end of the intervention. Finally, our work is also limited to a binary characterization of gender (boys and girls) because none of the included studies included gender-diverse participants.

Conclusion

The findings of the current systematic review showed significant but small effects of interventions aiming to improve leisure-time PA for both boys and girls. Despite low levels of PA during leisure time in children and adolescents and different levels of PA in boys and girls, the current systematic review confirms that sex/gender aspects have rarely been considered in previous interventions aiming to increase children’s and adolescents’ leisure-time PA. Additionally, the review revealed that leisure-time PA interventions with low quality of sex/gender reporting tended to have significant effects only in boys or only in girls, and thus may have contributed to the manifestation of sex/gender inequalities in leisure-time PA. The findings can be of interest to stakeholders and health promoters as well as researchers and policy makers who seek to promote leisure-time PA while fostering sex/gender equity in leisure-time PA.