Commentary

The recently published systematic review by De Meester and colleagues [1] provides an excellent summary of the effectiveness of interventions to promote physical activity among European teenagers. The authors, however, based their conclusions solely on p-values, whereas effect sizes are more appropriate for drawing conclusions about differences in the effectiveness of interventions. Therefore, complementing the highly appreciated work done by De Meester and colleagues, I added effect sizes to provide an even more detailed insight into the effectiveness of interventions to promote physical activity among European teenagers.

Traditionally, null hypothesis testing resulted in reporting p-values as the key result of studies in the behavioural sciences. Nowadays, however, reporting effect sizes is deemed increasingly important [2]. Volker [3] outlines the arguments for reporting effect sizes. In short, p-values are not particularly informative for determining whether a statistically significant effect is meaningful and substantive. They are merely "a conditional probability indicative of the probability of a result at least as extreme as the obtained difference assuming that the null hypothesis is true." [3] Therefore, effect sizes (i.e., Cohen's d) are reported in this commentary to gain a more detailed insight into the meaningfulness and substantiveness of the results of the studies included by De Meester and colleagues [1].

As correctly argued by De Meester and colleagues [1], performing a meta-analysis is problematic because of the heterogeneity of the outcome measures of the included studies. Nevertheless, effect sizes still provide the opportunity to quantify the intervention outcomes instead of having only a conditional probability. It needs to be stressed, however, that an effect size on its own is not equal to the public health impact of an intervention. According to the RE-AIM model, the public health impact of an intervention can only be evaluated by assessing five dimensions: Reach, Efficacy, Adoption, Implementation and Maintenance (hence the acronym RE-AIM) [4]. Thus, the public health impact of an intervention that has a small effect but reaches a large group of people can still be high. Furthermore, when looking at the potential of interventions, it is important not to lose sight of the quality assessment of the studies (irrespective of the intervention outcomes).
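To illustrate this point, the minimal sketch below (in Python, with hypothetical group means, standard deviations and sample sizes) shows that the same standardized mean difference can be far from statistically significant in a small sample yet highly significant in a large one, while Cohen's d is identical in both cases.

```python
# Illustration (hypothetical numbers, not taken from the reviewed studies) of why
# a p-value alone says little about the size of an effect: the p-value depends
# heavily on sample size, whereas Cohen's d does not.
from math import sqrt
from scipy.stats import ttest_ind_from_stats

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d based on the pooled standard deviation of two groups."""
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Identical means and standard deviations; only the sample size differs.
for n in (15, 150):
    d = cohens_d(mean1=55.0, sd1=10.0, n1=n, mean2=50.0, sd2=10.0, n2=n)
    _, p = ttest_ind_from_stats(mean1=55.0, std1=10.0, nobs1=n,
                                mean2=50.0, std2=10.0, nobs2=n)
    print(f"n per group = {n:3d}: d = {d:.2f}, p = {p:.3g}")

# With 15 participants per group the difference is not significant (p of about 0.18),
# with 150 per group it is (p < .001), while d = 0.50 in both cases.
```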

Table 1 gives an overview of the effect sizes of the intervention outcomes. Effect sizes were calculated using the formulas described by Lipsey and Wilson [5]. As recommended by Morris [6], effect sizes were based on the pooled pre- and post-test standard deviation to obtain a more precise effect size estimate. Odds ratios were converted into Cohen's d, as described by Chinn [7]. Three studies did not report sufficient information to calculate effect sizes; one of the authors of these studies was able to provide the required information. In line with Cohen's classification [8, 9], effect sizes were divided into five levels: trivial (Cohen's d ≤ 0.2), small (0.2 < d ≤ 0.5), moderate (0.5 < d ≤ 0.8), large (0.8 < d ≤ 1.3), and very large (d > 1.3).
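For readers who wish to reproduce such calculations, the sketch below illustrates the two conversions in Python, under the assumption that the standard forms of these formulas were used; the way the standard deviation is pooled and the example numbers are hypothetical and may differ in detail from those underlying Table 1.

```python
# Sketch of the effect size calculations described above: Cohen's d for a
# pre-post control design (cf. Lipsey and Wilson [5] and Morris [6]) and the
# conversion of an odds ratio to Cohen's d (Chinn [7]). All numbers are hypothetical.
from math import log, pi, sqrt

def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two groups."""
    return sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def pre_post_control_d(pre_t, post_t, pre_c, post_c, sd_pooled):
    """Cohen's d for a pre-post control design: the difference in mean change
    between the intervention and control group, standardized by a pooled SD."""
    return ((post_t - pre_t) - (post_c - pre_c)) / sd_pooled

def odds_ratio_to_d(odds_ratio: float) -> float:
    """Convert an odds ratio to Cohen's d: d = ln(OR) * sqrt(3) / pi (Chinn [7])."""
    return log(odds_ratio) * sqrt(3) / pi

# Hypothetical example: the intervention group improves from 40 to 48 min/day of
# moderate to vigorous physical activity, the control group from 41 to 42 min/day,
# with a pooled standard deviation of 12 min/day.
d = pre_post_control_d(pre_t=40, post_t=48, pre_c=41, post_c=42, sd_pooled=12.0)
print(f"d = {d:.2f}")                      # d = 0.58, a moderate effect
print(f"d = {odds_ratio_to_d(2.0):.2f}")   # an odds ratio of 2.0 corresponds to d of about 0.38
```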

Table 1 Effect sizes of intervention outcomes

The effect sizes in Table 1 clearly demonstrate the differences between effect sizes and p-values. For example, in the study by Christodoulos and colleagues [10], the difference between the intervention group and the control group in total moderate to vigorous physical activity did not reach statistical significance. The effect size for this outcome measure, however, was very large (d = 2.79). This demonstrates the potential of this intervention with regard to behaviour change, despite the small sample size in this specific study. As another example, the study by Hill and colleagues [11] shows that although the conditions did not differ significantly, the effect sizes indicate a clear difference between the impact of these interventions (ranging from 0.18 to 0.45).

Taking into account the effect sizes reported in Table 1, the conclusion of De Meester and colleagues [1] that school-based interventions generally lead to short-term improvements in physical activity levels still holds. There were, however, large differences between interventions with regard to effect sizes. These differences should be taken into account when judging the potential of interventions. The evidence with regard to family involvement is therefore inconclusive, and recommendations regarding family involvement are premature and should be interpreted with caution. In contrast to De Meester and colleagues [1], the evidence provided by the effect sizes is not inconclusive with regard to a multi-component approach: interventions that included environmental components (as identified in the published review) generally resulted in larger effect sizes. This supports the assumption that a multi-component approach produces synergistic results. With regard to interventions aimed at multiple behaviours, it can be concluded that these interventions resulted in smaller effects on physical activity. There appeared to be no differences in effect sizes related to the quality assessment of the studies (as assessed in the published review). Nevertheless, once homogeneous outcome measures become available in future studies, meta-analyses are needed to fully warrant strong conclusions with regard to potential moderators of effect sizes (e.g., methodological quality).

Conclusions

Based on the evidence identified by the review of De Meester and colleagues [1] and the effect sizes reported in this commentary, a detailed insight into the effectiveness of interventions to promote physical activity among European teenagers is provided. In summary, the main findings based on this evidence are:

(1) School-based interventions generally lead to short-term improvements in physical activity levels, but there were large differences between interventions with regard to effect sizes.

(2) A multi-component approach (including environmental components) generally resulted in larger effect sizes, thereby providing evidence for the assumption that a multi-component approach should produce synergistic results.

(3) Interventions that aimed to affect additional health behaviours besides physical activity appeared to be less effective with regard to physical activity.