Background

It has been suggested that exercise is a promising intervention in overweight and obese children and adolescents [1]. Potential benefits include, but are not limited to, improvements in (1) cardiovascular fitness, (2) muscular strength and (3) vascular function [1]. In addition, exercise may reduce body fat and increase lean body mass [1], thereby reducing the risk of overweight and obesity in adulthood [2] and the subsequent premature morbidity and mortality associated with these conditions [3].

Body mass index (BMI) is the most common method used to assess overweight and obesity in children and adolescents. Previous systematic reviews, with or without meta-analysis, have generally focused on multiple lifestyle interventions, for example, diet and exercise, in the prevention and treatment of overweight and obesity in children and adolescents [4–29]. Consequently, the independent effects of an intervention such as exercise on BMI measures cannot be elucidated. From the investigative team’s perspective, this is important to know when attempting to develop effective interventions for treating overweight and obese children and adolescents. For the five systematic reviews with meta-analyses that have included a focus on exercise [4, 12, 19, 28, 29], four of five (80%) reported a non-significant change in BMI among male and female children and adolescents [4, 12, 19, 28]. However, all five suffer from one or more of the following potential limitations: (1) inclusion of a small number of studies with exercise as the only intervention [4, 12, 19], (2) inclusion of non-randomized trials [12, 29], and (3) inclusion of children and adolescents who were not overweight or obese [12, 28, 29]. Furthermore, using the Assessment of Multiple Systematic Reviews (AMSTAR) instrument for assessing the methodological quality of systematic reviews [30], the overall quality score (0% to 100% with higher scores representing better quality) was only 45% [29], 55% [4, 28], 64% [19] and 82% [12] for these five meta-analyses. Finally, none of the reviews included BMI z-score [4, 12, 19, 28, 29], an outcome that has been suggested to be more valid than other BMI measures in children and adolescents [31]. It is critically important to develop a better understanding of the overall magnitude of, as well as potential factors associated with, exercise-induced changes in BMI in overweight and obese children and adolescents.
Given these limitations, the primary purpose of this study was to use the meta-analytic approach to examine the effects of exercise on BMI z-score in overweight and obese children and adolescents. A secondary purpose was to examine other selected variables that have been shown to be associated with cardiovascular as well as all-cause mortality: body weight, BMI in kg/m², BMI percentile, body fat (absolute and percent), fat-free mass, waist circumference, waist-to-hip ratio, resting systolic and diastolic blood pressure, total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), ratio of total cholesterol to high-density lipoprotein cholesterol (TC:HDL-C), low-density lipoprotein cholesterol (LDL-C), triglycerides (TG), non-high-density lipoprotein cholesterol (non-HDL-C), fasting glucose, fasting insulin, glycosylated hemoglobin, physical activity levels, maximum oxygen consumption (ml·kg⁻¹·min⁻¹), muscular strength, energy intake and energy expenditure [32].

Methods

This study was conducted and reported according to the general guidelines recommended by the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) Statement [33]. A PRISMA checklist indicating where these items are reported in the original Word document can be found in Additional file 1.

Study eligibility criteria

The a priori inclusion criteria for this meta-analysis were as follows: (1) randomized controlled trials with the unit of assignment at the participant level, (2) comparative control group (non-intervention, attention control, usual care, placebo), (3) exercise-only intervention group (no diet intervention) lasting ≥ 4 weeks, (4) overweight and obese children and adolescents 2 to 18 years of age, (5) studies published in full in any language and source (journal articles, dissertations, etc.) between January 1, 1990 and December 31, 2012, and (6) data available for BMI z-score or data to calculate BMI z-score. Studies were limited to randomized trials because randomization is the only way to control for confounders that are not known or measured, and because nonrandomized controlled trials tend to overestimate the effects of healthcare interventions [34, 35]. Four weeks was chosen as the lower cut point for intervention length based on previous research demonstrating improvements in adiposity over this period of time in 11-year-old girls [36]. Participants were limited to overweight and obese children and adolescents, as defined by the original study authors, because it has been shown that this population is at an increased risk for premature morbidity and mortality throughout their lifetime [37]. The year 1990 was chosen as the start point for searching in order to increase the chances of receiving data from investigators. The review protocol for this study is available from the corresponding author upon request.

Data sources

Studies up to December 31, 2012 were retrieved using the following 11 electronic databases: (1) Medline, (2) CINAHL, (3) Scopus, (4) Academic Search Complete, (5) Education Research Complete, (6) Web of Science, (7) SPORTDiscus, (8) ERIC, (9) LILACS, (10) Cochrane Central Register of Controlled Trials (CENTRAL) and (11) ProQuest. All electronic searches were conducted by the second author with assistance from a Health Sciences librarian at West Virginia University. While the search strategies used varied per the requirements of the different databases searched, keywords centered around the terms “exercise”, “overweight”, “obesity”, “children”, “adolescents” and “randomized”. The search strategies for all databases searched can be found in Additional file 2. After removing duplicates, the overall precision of the searches was calculated by dividing the number of studies included by the total number of studies screened [38]. The number needed to read (NNR) was then calculated as the inverse of the precision [38]. In addition to electronic database searches, cross-referencing for potentially eligible meta-analyses from retrieved reviews was also conducted. All studies were stored in Reference Manager, version 12.0.1 [39].
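As a rough illustration of the precision and NNR calculations described above, the following Python sketch may help. The function names and the example counts are hypothetical and are not taken from the study's search logs; only the two formulas (studies included divided by studies screened, and the inverse of that ratio) come from the text.

```python
def search_precision(n_included: int, n_screened: int) -> float:
    """Precision = studies included / studies screened (after removing duplicates)."""
    if n_screened <= 0:
        raise ValueError("n_screened must be positive")
    if not 0 <= n_included <= n_screened:
        raise ValueError("n_included must lie between 0 and n_screened")
    return n_included / n_screened


def number_needed_to_read(precision: float) -> float:
    """NNR = 1 / precision: records screened per study finally included."""
    if precision <= 0:
        raise ValueError("precision must be positive")
    return 1.0 / precision


# Hypothetical counts, for illustration only (not the study's actual numbers).
example_precision = search_precision(n_included=5, n_screened=2000)
example_nnr = number_needed_to_read(example_precision)
```

Applying `number_needed_to_read` to a precision of 0.0028 gives roughly 357.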

Study selection

All studies were selected by the first two authors, independent of each other. Disagreements regarding the final list of studies to include were resolved by consensus. If consensus could not be reached, the third author acted as an arbitrator. After an initial list of included studies was developed, the third author, an expert in exercise and overweight and obesity in children and adolescents, reviewed the list for completeness. All included studies as well as a list of excluded studies, including reasons for exclusion, were stored in Reference Manager (version 12.0.1) [39].

Data abstraction

Prior to data abstraction, a detailed codebook that could hold at least 242 items per study was developed by all three members of the research team in Microsoft Excel 2007 [40]. The major categories of variables that were coded included: (1) study characteristics, (2) subject characteristics, (3) exercise program characteristics, (4) primary outcomes and (5) secondary outcomes. The primary outcome for this study was BMI z-score. Secondary outcomes included body weight, BMI in kg/m², BMI percentile, body fat (absolute and percent), fat-free mass, waist circumference, waist-to-hip ratio, resting systolic and diastolic blood pressure, TC, HDL-C, TC:HDL-C, LDL-C, TG, non-HDL-C, fasting glucose, fasting insulin, glycosylated hemoglobin, physical activity levels, maximum oxygen consumption (ml·kg⁻¹·min⁻¹), muscular strength, energy intake and energy expenditure.

Based on abstracted data and similar to a previous study in children and adolescents [41], intensity of training was calculated as metabolic equivalents (METS) using the following categories: (1) low = 2.35, based on range of 1.8 to 2.9, (2) moderate = 4.45, based on a range of 3.0 to 5.9, (3) high = 7.5, based on a MET value greater than 5.9 [42]. In addition, the following calculations were made: (1) minutes of training per week (frequency × duration), (2) MET minutes per week (frequency × duration × METS), (3) total minutes over the entire intervention (length × frequency × duration), (4) total MET minutes over the entire intervention (length × frequency × duration × METS). Where possible, calculations were also adjusted for compliance, defined as the percentage of exercise sessions attended.
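The training-volume calculations listed above can be sketched in Python as follows. The function and dictionary names are hypothetical, and the compliance adjustment shown (multiplying volume by the fraction of sessions attended) is an assumption, since the text does not spell out how the adjustment was applied. The MET values per intensity category are those given in the text.

```python
from typing import Dict, Optional

# Representative MET values per intensity category, as stated in the text.
MET_BY_INTENSITY = {"low": 2.35, "moderate": 4.45, "high": 7.5}


def training_volume(
    length_weeks: float,
    sessions_per_week: float,
    minutes_per_session: float,
    intensity: str,
    compliance: Optional[float] = None,
) -> Dict[str, float]:
    """Return weekly and total exercise volume in minutes and MET minutes.

    compliance is the fraction (0 to 1) of sessions attended. Scaling volume
    by this fraction is an assumed form of the adjustment, not a confirmed one.
    """
    mets = MET_BY_INTENSITY[intensity]
    attended = 1.0 if compliance is None else compliance
    if not 0.0 <= attended <= 1.0:
        raise ValueError("compliance must be a fraction between 0 and 1")

    minutes_per_week = sessions_per_week * minutes_per_session * attended
    met_minutes_per_week = minutes_per_week * mets
    return {
        "minutes_per_week": minutes_per_week,
        "met_minutes_per_week": met_minutes_per_week,
        "total_minutes": minutes_per_week * length_weeks,
        "total_met_minutes": met_minutes_per_week * length_weeks,
    }


# Hypothetical program: 12 weeks, 3 sessions per week, 40 minutes, moderate.
example_volume = training_volume(12, 3, 40, "moderate")
```

For this hypothetical program the sketch yields 120 minutes per week and about 534 MET minutes per week.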

Missing primary outcome data were requested from the author(s). Multiple publication bias was avoided by only including data from the most recently published study. Data abstraction occurred using the same procedure as the selection of studies. Using Cohen’s kappa statistic [43], the overall agreement rate prior to correcting discrepant items was 0.93.
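For readers unfamiliar with the agreement statistic mentioned above, a minimal Python sketch of Cohen's kappa for two coders is given below. The function name and the toy codings are hypothetical; the study's actual item-level codings are not reproduced here.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)

    # Proportion of items on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance from each rater's marginal code frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical code throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example with four coded items (hypothetical data).
example_kappa = cohens_kappa([1, 1, 0, 0], [1, 1, 0, 1])
```

In the toy example the coders agree on three of four items, chance agreement is 0.5, and kappa is therefore 0.5.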

Risk of bias

The Cochrane Collaboration risk of bias instrument was used to assess bias across seven categories: (1) random sequence generation, (2) allocation concealment, (3) blinding of participants and personnel, (4) blinding of outcome assessment, (5) incomplete outcome data, (6) selective reporting and (7) whether or not participants were exercising regularly, as defined by the original study authors, prior to taking part in the study [44]. Each item was classified as having either a high, low, or unclear risk of bias [44]. Assessment for risk of bias was limited to the primary outcome of interest, changes in BMI z-score. Because it is impossible to blind participants to group assignment in exercise intervention protocols, all studies were considered to be at a high risk of bias with respect to the category “blinding of participants and personnel”. Based on previous research, no study was excluded based on the results of the risk of bias assessment [45]. All assessments were performed by the first two authors, independent of each other. Both authors then met and reviewed every item for agreement. Disagreements were resolved by consensus.

Statistical analysis

The a priori plan was to conduct a one-step individual participant data (IPD) meta-analysis [46]. However, because of (1) the inability to obtain IPD from all eligible studies, (2) the inability to resolve discrepancies between the IPD provided and data reported in the published studies, for example, final sample sizes and (3) the potential loss of power with fewer included studies at the IPD level, a post hoc decision was made to conduct an aggregate data meta-analysis, an approach similar to conducting a two-step meta-analysis with IPD [46].

Calculation of effect sizes for primary and secondary outcomes from each study

The primary outcome for this study was effect size (ES) changes in BMI z-score. This was calculated by subtracting the change score in the control group from the change score in the exercise group. Variances were calculated from the pooled standard deviations of change scores in the intervention and control groups. If change score standard deviations were not available, these were calculated from reported 95% confidence intervals (CI) or pre and post standard deviation (SD) values according to procedures developed by Follmann et al. [47]. Each ES was then weighted by the inverse of its variance [48]. With the exception of fasting insulin, all other secondary outcomes were calculated using the same approach as for BMI z-score. For fasting insulin, the standardized mean difference ES, adjusted for small sample bias, was calculated from each study in order to create a common metric for the pooling of findings [48]. This was calculated as the difference in change scores between the exercise and control groups divided by the pooled SD of the change scores [48]. For all ESs, the beneficial direction of effect was the natural direction of benefit (for example, negative values for decreases in BMI z-score and positive values for increases in maximum oxygen consumption).
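The effect size calculations described above might be sketched as follows in Python. This is an illustration under stated assumptions and not the authors' code: the function names are hypothetical, the variance and small-sample correction use standard textbook formulas for two independent groups, and the ES is computed as the exercise-group change minus the control-group change so that a larger decrease in the exercise group yields a negative value, consistent with the direction convention stated above.

```python
import math
from typing import Tuple


def pooled_sd(sd_ex: float, n_ex: int, sd_con: float, n_con: int) -> float:
    """Pooled SD of change scores across the exercise and control groups."""
    df = n_ex + n_con - 2
    return math.sqrt(((n_ex - 1) * sd_ex**2 + (n_con - 1) * sd_con**2) / df)


def raw_mean_difference(
    change_ex: float, sd_ex: float, n_ex: int,
    change_con: float, sd_con: float, n_con: int,
) -> Tuple[float, float]:
    """Return (ES, variance) in the outcome's original units."""
    es = change_ex - change_con
    sp = pooled_sd(sd_ex, n_ex, sd_con, n_con)
    variance = sp**2 * (1.0 / n_ex + 1.0 / n_con)
    return es, variance


def hedges_g(
    change_ex: float, sd_ex: float, n_ex: int,
    change_con: float, sd_con: float, n_con: int,
) -> Tuple[float, float]:
    """Return (g, variance): standardized difference with small-sample correction."""
    sp = pooled_sd(sd_ex, n_ex, sd_con, n_con)
    d = (change_ex - change_con) / sp
    df = n_ex + n_con - 2
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample correction factor
    var_d = (n_ex + n_con) / (n_ex * n_con) + d**2 / (2.0 * (n_ex + n_con))
    return j * d, j**2 * var_d


# Hypothetical trial: change in BMI z-score of -0.10 (exercise) vs -0.02 (control).
example_es, example_var = raw_mean_difference(-0.10, 0.20, 30, -0.02, 0.20, 30)
example_g, example_g_var = hedges_g(-0.10, 0.20, 30, -0.02, 0.20, 30)
```

Each ES would then be weighted by the inverse of its variance, that is, by `1 / example_var` in this hypothetical case.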

Pooled estimates for primary and secondary outcomes

Random-effects, method-of-moments models that incorporate heterogeneity into the overall estimate were used to pool results for BMI z-score and secondary outcomes from each study [49]. Multiple groups from the same study were analyzed independently and also collapsed so that only one ES represented each outcome from each study [50]. Ninety-five percent CIs that did not include zero were considered statistically significant. Secondary outcomes were only included if data for the primary outcome of interest, BMI z-score, were available. To enhance practical application, the number needed to treat (NNT) was calculated for any overall findings that were reported as statistically significant [51]. This was accomplished using the approach suggested by the Cochrane Collaboration and assuming a control group risk of 10% [52]. Based on the NNT for changes in BMI z-score, gross estimates were provided for the number of obese children and adolescents in the US who could benefit from exercise, based on 12.5 million obese children and adolescents [53], as well as the number of overweight and obese children worldwide who could benefit from exercise, based on 110 million overweight or obese children [54, 55]. It was assumed that none of the overweight and obese children and adolescents included in the original estimates were exercising regularly.
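A minimal Python sketch of random-effects pooling is shown below. It assumes that the method-of-moments model referred to above corresponds to the widely used DerSimonian-Laird estimator of between-study variance, which the text does not state explicitly. The function name, the returned keys and the example inputs are hypothetical.

```python
import math
from typing import Dict, Sequence


def random_effects_pool(
    effects: Sequence[float],
    variances: Sequence[float],
    z_crit: float = 1.96,
) -> Dict[str, float]:
    """Pool effect sizes with a method-of-moments (DerSimonian-Laird) tau-squared."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two effects with matching variances")

    # Fixed-effect (inverse-variance) weights and mean, used to compute Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1

    # Method-of-moments estimate of between-study variance, truncated at zero.
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau-squared to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)

    return {
        "pooled": pooled,
        "se": se,
        "ci_low": pooled - z_crit * se,
        "ci_high": pooled + z_crit * se,
        "q": q,
        "tau2": tau2,
    }


# Hypothetical two-study example chosen so the arithmetic is easy to follow.
example_pool = random_effects_pool([0.0, 2.0], [0.5, 0.5])
```

In the hypothetical example Q is 4 with 1 degree of freedom, tau-squared is 1.5, and the pooled estimate is 1.0 with a standard error of 1.0, so the 95% CI includes zero and the result would not be considered statistically significant.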

Stability and validity of changes in primary and secondary outcomes

Heterogeneity of results between studies was examined using Q and I² [56]. To determine treatment effects in a new trial, 95% prediction intervals (PI) were also calculated [57, 58]. Small-study effects (publication bias, etc.) were examined using the regression approach of Egger et al. [59, 60]. In order to examine the effects of each result from each study on the overall findings, results were analyzed with each study deleted from the model once. Cumulative meta-analysis, ranked by year, was used to examine the accumulation of evidence over time [61]. Post hoc, changes in BMI z-score were examined with two studies in which reductions in energy intake occurred deleted from the model [62, 63].
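The heterogeneity and prediction interval calculations mentioned above can be illustrated with the short Python sketch below. The function names are hypothetical. The prediction interval uses the commonly cited form based on the t distribution with k − 2 degrees of freedom, which is assumed here and not confirmed by the text; the caller supplies the critical t value (for example, about 2.262 for 9 degrees of freedom) so that the sketch needs only the standard library.

```python
import math
from typing import Tuple


def i_squared(q: float, k: int) -> float:
    """I-squared (%) from Cochran's Q and the number of effect sizes k."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def prediction_interval(
    pooled: float, se: float, tau2: float, t_crit: float
) -> Tuple[float, float]:
    """Approximate 95% PI for the effect in a new trial.

    t_crit is the two-sided 95% critical value of t with k - 2 degrees of
    freedom, supplied by the caller.
    """
    half_width = t_crit * math.sqrt(tau2 + se**2)
    return pooled - half_width, pooled + half_width


# Q = 21.5 across 10 effect sizes gives an I-squared of roughly 58%.
example_i2 = i_squared(21.5, 10)

# Hypothetical pooled estimate, standard error and tau-squared.
example_pi = prediction_interval(pooled=1.0, se=1.0, tau2=1.5, t_crit=2.0)
```

Because the PI adds the between-study variance to the squared standard error of the pooled estimate, it is always at least as wide as a CI computed with the same critical value.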

Moderator analysis for BMI z-score

Between-group differences (Qb) in BMI z-score for categorical variables were examined using mixed-effects ANOVA-like models for meta-analysis [64]. This consisted of a random-effects model for combining studies within each subgroup and a fixed-effect model across subgroups [64]. Study-to-study variance (tau-squared) was considered to be unequal for all subgroups. This value was computed within subgroups but not pooled across subgroups. Categorical variables planned a priori included: country in which the study was conducted (USA, other), type of control group (non-intervention, other), whether IPD was provided (yes, no), whether the study was funded (yes, no), power/sample size analysis provided (yes, no), adverse events (yes, no), risk of bias assessment (separate assessment of low, high or unclear risk according to sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective reporting, whether subjects were inactive prior to enrollment), gender and race/ethnicity. Using the categories yes, no or some, analyses were planned for the following variables: prescribed drugs, changes in exercise and/or physical activity levels beyond the exercise intervention, hyperlipidemia, type 1 diabetes, type 2 diabetes, hypertension, heart problems, metabolic syndrome, cancer, asthma and pubertal stage. In addition, type of exercise (aerobic, strength, both, other), exercise supervision (yes, no), setting in which exercise took place (facility, home, both), type of participation (self, group, both), type of analysis (analysis-by-protocol versus intention-to-treat) and intensity of exercise (low, moderate, high) were examined [65]. All moderator analyses were considered exploratory [66].

Meta-regression for changes in BMI z-score and potential covariates

Simple mixed-effects, method-of-moments meta-regression was used to examine the potential association between changes in BMI z-score and continuous variables [64]. Because missing data for different variables from different studies were expected, only simple meta-regression was planned and performed. Potential predictor variables, established a priori, included year of publication, percentage of dropouts, age in years, baseline BMI z-score, as well as the following exercise intervention characteristics: length of training (weeks), frequency of training (days per week), duration of training (minutes per session), total minutes per week (unadjusted and adjusted for compliance), MET minutes per week (unadjusted and adjusted for compliance), total minutes for the entire intervention period (unadjusted and adjusted for compliance), and compliance, defined as the percentage of exercise sessions attended. Similar to moderator analyses, all meta-regression tests were considered exploratory [66].

Results

Study characteristics

A general description of the characteristics of each study is shown in Table 1. Of the 4,999 citations reviewed, 10 studies representing 21 groups (11 exercise, 10 control) and final assessment of BMI z-score in 835 children and adolescents (456 exercise, 379 control), were included [62, 63, 67–74]. The precision of the searches was 0.0028 while the NNR was 357. A description of the search process, including the reasons for excluded studies, is shown in Figure 1 while a list of excluded studies, including the reasons for exclusion, is shown in Additional file 3. All studies were published in English-language journals between the years 2004 and 2012 [62, 63, 67–74]. Seven studies used a non-intervention control group [62, 63, 69–73] while the remaining three used some type of attention control [67, 68, 74]. For matching, seven studies did not match participants [62, 67, 69, 71–74] while the remaining three matched participants according to race and gender [68, 70] or age, gender and BMI [63]. For data analysis, four studies used the intention-to-treat approach [67, 68, 70, 74], another five appeared to use the per-protocol approach [63, 69, 71–73] and one used both [62]. Sample size justification was provided by five of the 10 studies [62, 67, 68, 70, 74] while all ten reported receiving some type of funding to conduct their study [62, 63, 67–74]. The dropout rate for the eight studies in which data were available [62, 63, 67, 68, 70, 71, 73, 74] ranged from 0% to 34% for the 9 exercise groups for which data were available (X̄ ± SD = 11 ± 12%, Mdn = 7%) and from 0% to 26% for the 8 control groups for which data were available (X̄ ± SD = 13 ± 10%, Mdn = 15%). Detailed data regarding the reasons for dropping out for each study are available upon request from the corresponding author.
For the three studies that reported sufficient data on adverse events [68, 70, 74], two reported no serious adverse events [70, 74] while one reported a foot fracture in one participant as well as several minor injuries [68].

Table 1 Characteristics of included studies*
Figure 1

Flow diagram for the selection of studies. *, number of reasons exceeds the number of studies because some studies were excluded for more than one reason.

Initial physical characteristics of the exercise and control groups are shown in Tables 1 and 2. For prior exercise, three studies reported that none of the participants were exercising regularly prior to enrollment [62, 68, 71], one reported that some were exercising regularly [69], while another reported that participants exceeded the guidelines for physical activity at baseline [70]. During the intervention period and when compared to the control group, one study reported a reduction in total daily physical activity in the exercise group [69]. Participants included those with and without cardiovascular disease risk factors [62, 63, 67–74].

Table 2 Initial physical characteristics of participants

Characteristics of the exercise programs for each group from each study are described in Table 1. As can be seen, the exercise interventions varied widely. Length of training for the 11 exercise groups ranged from 8 to 24 weeks (X̄ ± SD = 16 ± 6, Mdn = 13), frequency from 2 to 7 times per week (X̄ ± SD = 4 ± 1, Mdn = 4) and duration from 6 to 75 minutes per session (X̄ ± SD = 43 ± 22, Mdn = 40). Intensity of training was classified as moderate for 7 groups and high for 4. Seven of the ten studies focused primarily on aerobic types of activities [63, 67, 68, 70–72, 74], one on strength training [73] and two on both [62, 69]. Eight groups from seven studies participated in supervised exercise [62, 63, 68, 69, 71, 73, 74], two in unsupervised exercise [70, 72] and one in both [67].

For the four studies [62, 68, 73, 74] and five groups in which data were available, compliance, defined as the percentage of exercise sessions attended, ranged from 42% to 96% (X̄ ± SD = 78 ± 21, Mdn = 84). Total minutes per week of exercise ranged from 40 to 250 (X̄ ± SD = 143 ± 69, Mdn = 129) while MET minutes per week ranged from 180 to 1873 (X̄ ± SD = 821 ± 510, Mdn = 750). When adjusted for compliance for the four studies and five groups in which compliance data were available [62, 68, 73, 74], total minutes per week of exercise ranged from 85 to 168 (X̄ ± SD = 120 ± 31, Mdn = 115) while MET minutes per week ranged from 554 to 1260 (X̄ ± SD = 821 ± 274, Mdn = 787). Total minutes of training over the entire length of the interventions ranged from 780 to 6000 (X̄ ± SD = 2270 ± 1695, Mdn = 1760) while total MET minutes ranged from 6648 to 18881 (X̄ ± SD = 12805 ± 5222, Mdn = 13827).

Risk of bias assessment

Risk of bias results are shown in Figure 2 while results for each item from each study are shown in Additional file 4. As can be seen, there was a general lack of clear reporting for several potential risks of bias as well as an increased risk of bias for several variables.

Figure 2

Risk of bias. Pooled risk of bias results using the Cochrane Risk of Bias Assessment Instrument.

Primary outcome

BMI z-score

Overall, there was a statistically significant reduction in BMI z-score (Table 3 and Figure 3). This was equivalent to a relative exercise minus control group improvement of approximately 3%. Statistically significant but moderate heterogeneity was observed, and the 95% PI included zero. No small-study effects were observed as indicated by a lack of funnel plot asymmetry (Figure 4) as well as a 95% CI that included zero for Egger’s regression intercept test (β0, -1.6, 95% CI, -4.1 to 1.0) [59]. Improvements in BMI z-score remained statistically significant when data were collapsed so that only one ES represented each study (X̄, -0.06, 95% CI, -0.09 to -0.03; Q = 21.5, p = 0.01; I² = 58.2%). With each group deleted from the model once, results remained statistically significant across all deletions (Figure 5). The difference between the largest and smallest values with each group deleted was 0.007 (11.5%). Cumulative meta-analysis, ranked by year, demonstrated that results have been statistically significant since 2009 (Figure 6). The NNT was 107 (95% CI, 209 to 73) with an estimated 116,822 (95% CI, 59,809 to 171,233) obese US children and adolescents and approximately 1 million (95% CI, 0.5 to 1.5) overweight and obese children and adolescents worldwide experiencing improvements in their BMI z-score if they began and maintained a regular exercise program. Results remained statistically significant when the two studies in which energy intake decreased were deleted from the model (X̄, -0.06, 95% CI, -0.09 to -0.03; Q = 18.4, p = 0.02; I² = 56.5%).

Table 3 Changes in primary and secondary outcomes
Figure 3

Forest plot for changes in BMI z-score. Forest plot for point estimate changes in BMI z-score. The black squares represent the mean difference while the left and right extremes of the squares represent the corresponding 95% confidence intervals. The middle of the black diamond represents the overall mean difference while the left and right extremes of the diamond represent the corresponding 95% confidence intervals.

Figure 4

Funnel plot for changes in BMI z-score.

Figure 5

Influence analysis for changes in BMI z-score. Influence analysis for point estimate changes in BMI z-score with each corresponding study deleted from the model once. The black squares represent the mean difference while the left and right extremes of the squares represent the corresponding 95% confidence intervals. The middle of the black diamond represents the overall mean difference while the left and right extremes of the diamond represent the corresponding 95% confidence intervals. Results are ordered from smallest to largest reductions.

Figure 6

Cumulative meta-analysis for changes in BMI z-score. Cumulative meta-analysis, ordered by year, for point estimate changes in BMI z-score. The black squares represent the mean difference while the left and right extremes of the squares represent the corresponding 95% confidence intervals. The results of each corresponding study are pooled with all studies preceding it. The middle of the black diamond represents the overall mean difference while the left and right extremes of the diamond represent the corresponding 95% confidence intervals.

Moderator analyses for changes in BMI z-score for which sufficient data were available are shown in Additional file 5. As can be seen, no statistically significant between-group (Qb) differences were observed for any of the analyses.

Meta-regression analyses for changes in BMI z-score and selected covariates for which sufficient data were available are shown in Additional file 6. As can be seen, there was no statistically significant association between changes in BMI z-score and any of the covariates.

Secondary outcomes

Changes in secondary outcomes are shown in Table 3. As can be seen, there were statistically significant reductions for body weight, BMI in kg/m², BMI percentile, fat mass and percent body fat. These were equivalent to relative improvements of approximately 1%, 2%, 1%, 2% and 3%, respectively, for body weight, BMI in kg/m², BMI percentile, fat mass and percent body fat. In addition, improvements were also observed for TG, fasting insulin, VO₂max in ml·kg⁻¹·min⁻¹, and energy intake. These were equivalent to relative improvements of approximately 13% and 7%, respectively, for TG and VO₂max in ml·kg⁻¹·min⁻¹. There was also a statistically significant reduction of approximately 14% for energy intake. Based on the I² statistic, no between-study heterogeneity was observed for body weight, BMI percentile, TG, fasting insulin and energy intake while a very low amount was observed for fat mass. Between-study heterogeneity was categorized as moderate for BMI in kg/m², percent body fat and VO₂max in ml·kg⁻¹·min⁻¹. Statistically significant 95% PI were limited to improvements in body weight and fasting insulin. No small-study effects were observed for any of the secondary outcomes. In addition, all results remained statistically significant when ES were collapsed so that only one ES represented each study. No statistically significant differences were observed for fat-free mass, waist circumference, waist-to-hip ratio, resting systolic and diastolic blood pressure, TC, HDL-C, ratio of TC to HDL-C, LDL-C, TG, non-HDL-C and fasting glucose. Insufficient data were available to calculate and pool changes in glycosylated hemoglobin, physical activity levels during the intervention period, muscular strength and energy expenditure.

Discussion

Overall findings

The primary purpose of this study was to use the aggregate data meta-analytic approach to determine the effects of exercise (aerobic, strength training or both) on BMI z-score in overweight and obese children and adolescents. The overall findings suggest that exercise improves BMI z-score in children and adolescents. This interpretation is supported by (1) non-overlapping 95% CI, (2) sensitivity of results with each study deleted from the model once, (3) cumulative meta-analysis, (4) absence of small-study effects, and (5) number of overweight and obese children in the US as well as worldwide who might improve their BMI z-score by initiating and maintaining a regular exercise program. In addition, the fact that statistically significant improvements were observed in both BMI z-score and percent body fat is encouraging given that BMI z-score is not the most sensitive measure of adiposity and changes may be observed in body composition variables such as percent body fat but not BMI.

While random-effects models that incorporate heterogeneity into the analysis were used, unidentified, moderate heterogeneity based on a fixed-effect model was observed for changes in BMI z-score. Consequently, it is possible that some groups may see greater or no improvements in BMI z-score. However, the existence of heterogeneity in meta-analysis is not only common [75], but also relevant, as there is no need to combine studies exactly alike since their findings, within statistical error, would be the same [76]. Caution may also be warranted with respect to the current findings given that PI for estimating the expected results of a new trial included zero for changes in BMI z-score. However, these values should not be confused with CI since PI are based on a random mean effect while CI are not [57].

No significant differences in BMI z-score were observed for any of the moderator or meta-regression analyses conducted. However, absence of evidence is not necessarily evidence of absence [77]. The lack of statistically significant and practically relevant findings may be especially important given the small number of studies and participants included in many of the covariate analyses. The former notwithstanding, the fact that no statistically significant differences in BMI z-score existed between those studies that supplied IPD versus those that did not suggests a lack of bias between the two.

The improvements in several body composition variables (body weight, BMI in kg/m², percent body fat) observed in the current study are in agreement as well as disagreement with previous meta-analyses in which all or a majority of the participants were overweight and obese children and adolescents. For example, the statistically significant improvements observed for BMI in kg/m² for the current meta-analysis were also reported in another meta-analysis (-0.35 kg/m², 95% CI, -0.12 to -0.58) [29]. The improvements observed for percent body fat and body weight in the current study are in agreement with a previous meta-analysis with respect to percent body fat (SMD, -0.4, 95% CI, -0.7 to -0.1) but not body weight (-2.7 kg, 95% CI, -6.1 to 0.8) [4]. Potential reasons for this latter discrepancy may include differing inclusion criteria and, possibly more importantly, differing sample sizes.

The results of this study compare favorably with the results of a recent meta-analysis on dietary interventions alone as well as combined lifestyle interventions in which the majority of the children and adolescents were overweight or obese [19]. Specifically, no statistically significant differences were observed for combined measures of adiposity for dietary interventions (SMD, -0.22, 95% CI, -0.56 to 0.11) while a statistically significant improvement was observed for combined lifestyle interventions targeting the family (SMD, -0.64, 95% CI, -0.88 to -0.39) but not children (SMD, -0.17, 95% CI, -0.40 to 0.05) [19].

The results of this study also compare favorably with pharmacologic interventions in overweight and obese children and adolescents. For example, statistically significant improvements have been observed in BMI in kg/m² for sibutramine (-2.4 kg/m², 95% CI, -3.1 to -1.8) and orlistat (-0.7 kg/m², 95% CI, -1.2 to -0.3) [19]. While the findings for sibutramine are greater than the statistically significant findings in the current meta-analysis, as judged by non-overlapping CI between the two, no such between-intervention differences existed between exercise and orlistat [19]. In addition, pharmacologic interventions in overweight and obese children and adolescents need to be used with consideration for potential side-effects. Furthermore, there are numerous other potential benefits of exercise, some of which were observed in the current meta-analysis, for example, increases in cardiorespiratory fitness, which cannot be achieved with pharmacologic interventions such as sibutramine and orlistat.
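The comparison by non-overlapping CI described above can be sketched in a few lines of Python. This is only an illustration of the heuristic, not the procedure used in this or the cited meta-analysis. Because the exercise CI is not restated in this section, the example compares the two drug intervals quoted above; the function name `intervals_overlap` is hypothetical. Note that non-overlap is a conservative criterion: overlapping intervals do not by themselves show that two effects are equal.

```python
def intervals_overlap(ci_a, ci_b):
    """Return True if two confidence intervals share at least one point.

    Each interval is a pair of bounds in either order; the bounds are
    sorted first so that (-1.8, -3.1) and (-3.1, -1.8) are treated alike.
    """
    a_low, a_high = sorted(ci_a)
    b_low, b_high = sorted(ci_b)
    return a_low <= b_high and b_low <= a_high


# 95% CIs for change in BMI (kg/m^2) as quoted in the text [19].
sibutramine_ci = (-3.1, -1.8)
orlistat_ci = (-1.2, -0.3)

# The two drug intervals do not overlap, so this prints False.
print(intervals_overlap(sibutramine_ci, orlistat_ci))
```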

Implications for research

The results of the current systematic review with meta-analysis have at least seven implications for future research. First, based on the Cochrane Risk of Bias Instrument [44], future randomized controlled trials need to do a better job of reporting information on several potential sources of bias. This includes complete information on (1) allocation concealment, (2) blinding of outcome assessors, (3) attrition, including reasons, according to each group and (4) the physical activity levels of the participants prior to study enrollment. While all of the included studies were also considered to be at a high risk of bias for the blinding of participants and personnel category [62, 63, 67–74], it is important to realize that it is impossible to blind participants to the exercise intervention. Therefore, the best that one can probably do is blind participants to group assignment.

Second, only one study appeared to use both per-protocol and intention-to-treat approaches in the analysis of its data [62]. It is suggested that future studies include both. As a result, one may gain a better understanding of not only the efficacy (per-protocol analysis), but also the effectiveness (intention-to-treat analysis) of exercise for improving BMI z-score and other measures of adiposity in overweight and obese children and adolescents [78].

Third, given the paucity of data that were available for adverse events and the cost-effectiveness of the interventions employed, there is a need for future studies to collect and report this information. The inclusion of such is critical for those involved in deciding which interventions to recommend over others.

Fourth, it is suggested that investigators collect and report complete information on the exercise intervention(s) used. This includes data on the length, frequency, intensity and duration of exercise as well as the mode(s) used and compliance with the exercise protocol. Also, the setting in which exercise takes place (for example, home versus facility-based) and supervision status (supervised versus unsupervised) should be reported. Data should also be collected and reported on the total physical activity levels of all groups, including control groups, during the study. The rationale for this suggestion is based on the possibility that physical activity levels beyond any intervention(s) may increase or decrease in the intervention and/or control groups. For example, one of the included studies reported that total daily physical activity in the exercise group, when compared to the control group, decreased during the study [69]. Changes such as these may negatively impact one or more of the outcomes under investigation.

Fifth, the PI reported for changes in BMI z-score may be beneficial for future researchers interested in conducting randomized controlled intervention trials addressing the effects of exercise on BMI z-score in overweight and obese children and adolescents. This is important for ensuring that appropriate power is obtained for the variable(s) of interest.
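As a rough illustration of how an anticipated effect feeds into power planning, the sketch below applies the standard normal-approximation sample size formula for comparing two group means. It assumes that PI denotes the prediction interval and that a planner would take candidate effect sizes from within it. The function name `n_per_group` and every numeric value are hypothetical placeholders, not results from the current meta-analysis.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-arm trial.

    Normal-approximation formula for a two-sided test of a difference
    in means: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta) ** 2.
    delta: anticipated between-group difference (hypothetical here).
    sd:    assumed common standard deviation of the outcome.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Hypothetical candidate effects in standard deviation units, chosen
# only for illustration; they are NOT values from this meta-analysis.
for delta in (0.25, 0.50):
    print(delta, n_per_group(delta, sd=1.0))
```

Smaller anticipated effects require markedly larger samples, which is why a plausible range of effects, rather than a single point estimate, may be a useful planning input.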

Sixth, since the dose–response effects of exercise on measures of adiposity in overweight and obese children and adolescents remain elusive, it is suggested that future randomized controlled trials address this issue. The determination of such is critical for the development of optimal exercise programs for reducing measures of adiposity in overweight and obese children and adolescents.

Seventh, the a priori plan of the current meta-analysis was to conduct a one-step IPD meta-analysis [46]. However, because of (1) the inability to obtain IPD from all eligible studies, (2) the inability to resolve discrepancies between the IPD provided and data reported in the published studies, for example, final sample sizes and (3) the potential loss of power with fewer included studies at the IPD level, a post hoc decision was made to conduct an aggregate data meta-analysis, an approach similar to conducting a two-step meta-analysis with IPD [46]. While some may consider IPD to be the gold standard [46, 79], primarily because of the potential to conduct covariate analyses at the participant level, this has to be weighed against the reality of trying to obtain valid IPD from all eligible studies as well as the fact that causal inferences based on covariate analyses, whether conducted using IPD or aggregate data, cannot be made given that studies are not randomly assigned to covariates [66, 80]. In addition, the time and costs associated with conducting an IPD meta-analysis are substantially greater than conducting an aggregate data meta-analysis. For example, in 1997, Steinberg et al. estimated the cost of an IPD meta-analysis of 12 ovarian cancer studies to be 5.3 times higher than conducting an aggregate data meta-analysis ($259,300 versus $48,665) [81]. However, it has been suggested that the real costs may be 8 times greater since the research team continued to work on the project after the money ran out [80]. Furthermore, support for the use of IPD is not always well grounded. For example, when examining for overall effects, the primary purpose of meta-analysis [76], studies claiming the superiority of IPD over aggregate data meta-analysis have been based on comparisons of a different number of studies between the two [82, 83]. In contrast, when identical or a nearly identical number of studies were included, the overall results were similar [81, 84, 85].
Finally, while the use of IPD has increased in recent years [86], the aggregate data approach is still the most common approach used when conducting a meta-analysis. Thus, while the research team agrees that an IPD meta-analysis may be the best approach in an ideal world, such an approach may not be appropriate in most of the real-world situations that applied meta-analysts currently face. It is suggested that future investigators planning an IPD meta-analysis think very carefully about not only whether such an approach is feasible, but also what potential gain, if any, will be derived when compared to conducting an aggregate data meta-analysis.
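The cost comparison attributed to Steinberg et al. [81] can be checked with simple arithmetic. The sketch below assumes that the larger figure ($259,300) is the IPD cost and the smaller ($48,665) the aggregate data cost, in keeping with the statement that IPD meta-analysis is the more expensive approach. The function name `cost_ratio` is hypothetical.

```python
def cost_ratio(ipd_cost_usd, aggregate_cost_usd):
    """Return how many times more the IPD approach cost than the aggregate one."""
    return ipd_cost_usd / aggregate_cost_usd


# Figures quoted in the text [81]; which figure belongs to which
# approach is an assumption based on the surrounding argument.
ratio = cost_ratio(259_300, 48_665)
print(f"{ratio:.1f}")  # 5.3
```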

Implications for practice

The results of the current meta-analysis in overweight and obese children and adolescents have important implications for practice. For example, exercise appears to improve BMI z-score as well as several other body composition variables (body weight, BMI in kg/m², BMI percentile, fat mass, percent body fat) and cardiovascular disease risk factors (TG, fasting insulin, VO2max in ml·kg⁻¹·min⁻¹). While the exact dose–response effects of exercise could not be determined in the current meta-analysis and despite the lack of reporting for adverse events, it would appear plausible to suggest that overweight and obese children and adolescents follow the current guidelines for exercise in youth [87]. These include 60 or more minutes of exercise per day, most of which should be either moderate or vigorous intensity aerobic exercise (brisk walking, running, cycling, etc.), including at least 3 days per week of vigorous intensity exercise, for example, running versus walking [87]. Included in the 60 minutes per day should be muscle strengthening exercises (pushups, weight training, etc.) at least 3 days per week as well as bone strengthening exercises, for example jumping rope, at least 3 days per week [87]. These activities should be (1) enjoyable, (2) age appropriate, (3) gender appropriate and (4) varied. For previously inactive children and adolescents who are overweight or obese, a gradual increase in the volume of activity should be encouraged until these thresholds can be met. In addition, since changes in all measures of adiposity were 3% or less, the use of additional interventions such as a reduction in energy intake, along with exercise, may yield greater improvements.

Potential strengths and limitations of current study

Strengths

There are at least four potential strengths of the current meta-analysis. First, to the best of the investigative team’s knowledge, this is the first meta-analysis to focus on BMI z-score, a metric suggested to be superior to other BMI measures [31], as the primary outcome with respect to the effects of exercise in overweight and obese children and adolescents. Thus, this adds important information regarding the magnitude of benefit that exercise can provide for improving adiposity in overweight and obese children and adolescents. Second, the inclusion of the NNT provides practical information to aid decision-makers in deciding what treatments to recommend or prioritize over others when attempting to reduce adiposity in overweight and obese children and adolescents. Third, while gross estimates were provided and are prone to error, the absolute number of overweight and/or obese children and adolescents who could improve their BMI z-scores by participating in a regular exercise program can help decision-makers allocate the resources necessary for accomplishing such. Fourth, the calculation and inclusion of PI can aid investigators when planning future randomized controlled trials on this topic.

Potential limitations

The results of the current meta-analysis should be viewed with respect to the following nine potential limitations. First, the many different exercise modes and various intensities used in the interventions could have affected the current findings. Second, while no statistically significant association was observed between baseline BMI z-score and changes in BMI z-score, results may have nevertheless been affected by the fact that studies included overweight to morbidly obese children and adolescents [62, 63, 67–74]. Third, while we were unable to examine for such, the exercise response could have been affected by the fact that the studies included both healthy populations as well as those with cardiovascular disease risk factors [62, 63, 67–74]. Fourth, while no statistically significant association was observed between compliance or dropouts and changes in BMI z-score, it is still possible that the various levels of compliance and dropout rates could have affected the current findings.

Fifth, because studies are not randomly assigned to covariates in meta-analysis, covariate analyses are considered to be observational in nature. Consequently, the results of the moderator and meta-regression analyses conducted in this or any other meta-analysis do not support causal inferences [66]. Sixth, because a large number of statistical tests were conducted and no adjustments were made for such, some statistically significant findings could have been nothing more than the play of chance. However, as suggested by Rothman [88], no adjustment was made for multiple tests because of the concern about missing possibly important findings. Seventh, while estimates regarding the number of overweight and/or obese children who could improve their BMI z-scores in the US and worldwide were provided, it is important to understand that these were gross estimates. Most notably, it was assumed that none of the overweight and obese children and adolescents were exercising. Consequently, the estimates provided may be inflated. Eighth, the results for the secondary outcomes included in the current meta-analysis may be biased since they were only included if BMI z-score was included as an outcome. Ninth, like any meta-analysis, the results of the current investigation may be prone to ecological fallacy and/or Simpson’s paradox [80].
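The concern raised in the sixth limitation can be made concrete with the usual family-wise error calculation. The sketch below assumes independent tests, each conducted at the same alpha level, which is a simplification that is unlikely to hold exactly for the correlated outcomes in a meta-analysis. The function name `familywise_error_rate` and the test counts are hypothetical and do not reflect the number of tests conducted in the current study.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across n independent tests.

    Assumes every null hypothesis is true and that tests are independent:
    FWER = 1 - (1 - alpha) ** n_tests.
    """
    return 1 - (1 - alpha) ** n_tests


# Hypothetical numbers of tests, for illustration only.
for n_tests in (1, 10, 20):
    print(n_tests, round(familywise_error_rate(n_tests), 3))
```

Under these assumptions the chance of at least one spurious finding rises quickly with the number of tests, which is the trade-off weighed against the risk of missing possibly important findings when no adjustment is made.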

Conclusions

Exercise improves BMI z-score in overweight and obese children and adolescents. However, based on risk of bias assessments as well as other observed factors, additional, well-designed randomized controlled trials on this topic are needed.

Authors’ information

GAK has more than 20 years of successful experience in the design and conduct of all aspects of meta-analysis, including the effects of chronic exercise in overweight and obese children, adolescents and adults. KSK has more than 16 years of successful experience in conducting meta-analysis, including the effects of chronic exercise in overweight and obese children, adolescents and adults. RRP has been a leading authority for more than 30 years on the effects of exercise in overweight and obese children and adolescents.