Background

The World Health Organization (WHO) defines physical activity “as any bodily movement produced by skeletal muscles that requires energy expenditure” [1]. Physical activity is therefore not limited to sports; it also includes, for example, walking, running, swimming, gymnastics, dance, ball games, and martial arts. In recent years, several organizations have published or updated their guidelines on physical activity. For example, the Physical Activity Guidelines for Americans, 2nd edition, provides information and guidance on the types and amounts of physical activity that provide substantial health benefits [2]. The evidence on the health benefits of regular physical activity is well established, as are the risks of sedentary behaviour [2]. The benefits of exercise are dose dependent: people who achieve cumulative levels several times higher than the current recommended minimum level have a significant reduction in the risk of breast cancer, colon cancer, diabetes, ischemic heart disease, and ischemic stroke events [3]. Benefits of physical activity have been reported for numerous outcomes such as mortality [4, 5], cognitive and physical decline [5,6,7], glycaemic control [8, 9], pain and disability [10, 11], muscle and bone strength [12], depressive symptoms [13], and functional mobility and well-being [14, 15]. The overall benefits of exercise apply to all bodily systems, including the immunological [16], musculoskeletal [17], respiratory [18], and hormonal [19] systems. In the cardiovascular system specifically, exercise increases fatty acid oxidation, cardiac output, vascular smooth muscle relaxation, endothelial nitric oxide synthase expression, and nitric oxide availability, and improves plasma lipid profiles [15], while at the same time reducing resting heart rate and blood pressure, aortic valve calcification, and vascular resistance [20].

However, the degree of the benefits highlighted above varies considerably depending on individual fitness levels, types of populations, age groups, and the intensity of different physical activities/exercises [21]. The majority of guidelines in different countries recommend a goal of 150 min/week of moderate-intensity aerobic physical activity (or the equivalent of 75 min of vigorous-intensity activity) [22], with differences for cardiovascular disease [23], obesity prevention [24], or age groups [25].

There is a plethora of systematic reviews published by the Cochrane Library critically evaluating the effectiveness of physical activity/exercise for various health outcomes. Cochrane systematic reviews (CSRs) are known to be a source of high-quality evidence. Given the negative lifestyle changes and the rising burden of diseases related to physical inactivity, it is both timely and relevant to evaluate the current knowledge and to determine the quality of the evidence base and the magnitude of the effect sizes. This overview identifies the breadth and scope to which CSRs have appraised the evidence for exercise on health outcomes, which will help in directing future guidelines and identifying current gaps in the literature.

The objectives of this research were to: a. answer the following research question: in children, adolescents and adults (both healthy and medically compromised), what are the effects (and adverse effects) of exercise/physical activity in improving various health outcomes (e.g., pain, function, quality of life) reported in CSRs; b. estimate the magnitude of the effects by pooling the results quantitatively; c. evaluate the strength and quality of the existing evidence; and d. create recommendations for future researchers, patients, and clinicians.

Methods

Our overview was registered with PROSPERO (CRD42019120295) on 10th January 2019. The Cochrane Handbook for Systematic Reviews of Interventions and the Preferred Reporting Items for Overviews of Reviews were adhered to while writing and reporting this overview [26, 27].

Search strategy and selection criteria

We followed the practical guidance for conducting overviews of reviews of health care interventions [28] and searched the Cochrane Database of Systematic Reviews (CDSR), 2019, Issue 1, on the Cochrane Library for relevant papers using the search strategy: (health) and (exercise or activity or physical). The decision to seek CSRs only was based on three main considerations. First, high quality (CSRs are considered to be the ‘gold methodological standard’) [29,30,31]. Second, data saturation (there is enough high-quality evidence to reach meaningful conclusions based on CSRs only). Third, including non-CSRs would have substantially increased the issue of overlapping reviews (also affecting data robustness and the credibility of conclusions). One reviewer carried out the searches. We imported all identified references into the reference manager software EndNote (X8). The study screening and selection process was performed independently by two reviewers. Any disagreements were resolved by discussion between the authors, with a third overview author acting as an arbiter if necessary.

We included CSRs of randomised controlled trials (RCTs) involving both healthy individuals and medically compromised patients of any age and gender. Only CSRs assessing exercise or physical activity as a stand-alone intervention were included. This included interventions that could initially be taught by a professional or involve ongoing supervision (the WHO definition). Complex interventions, e.g., those assessing both exercise/physical activity and behavioural changes, were excluded if the health effects of the interventions could not be attributed distinctly to exercise.

Any type of control was admissible. Reviews evaluating any type of health-related outcome measure were deemed eligible. However, we excluded protocols and/or CSRs that had been withdrawn from the Cochrane Library, as well as reviews with no included studies.

Data analysis

Three authors (HM, ALN, NK) independently extracted relevant information from all the included studies using a custom-made data collection form. The methodological quality of the included CSRs was independently evaluated by the same reviewers using the AMSTAR-2 tool [32]. Any disagreements on data extraction or CSR quality were resolved by discussion. The entire dataset was validated by three authors (PP, MS, DP) and any discrepant opinions were settled through discussion.

The results of CSRs are presented in a narrative fashion using descriptive tables. Where feasible, we presented outcome measures across CSRs. Data from the subset of homogeneous outcomes were pooled quantitatively using the approach previously described by Bellou et al. and Posadzki et al. [33, 34]. For mortality and quality of life (QOL) outcomes, we calculated the number of participants and RCTs involved in the meta-analysis and the summary effect sizes [with 95% confidence intervals (CI)] using a random-effects model. For binary outcomes, we considered relative risks (RRs) as surrogate measures of the corresponding odds ratio (OR) or risk ratio/hazard ratio (HR). To stabilise the variance and normalise the distributions, we transformed RRs into their natural logarithms before pooling the data (a variation was allowed; however, it did not change the interpretation of the results) [35]. The standard error (SE) of the natural logarithm of the RR was derived from the corresponding CIs, which were either provided in the study or calculated with standard formulas [36]. Binary outcomes reported as risk difference (RD) were also meta-analysed if two or more estimates were available. For continuous outcomes, we only meta-analysed estimates that were available as standardised mean difference (SMD); estimates reported as mean differences (MD) for QOL are presented separately in supplementary Table 9. To estimate the overall effect size, each study was weighted by the reciprocal of its variance. Random-effects meta-analysis using the DerSimonian and Laird method [37] was applied to individual CSR estimates to obtain a pooled summary estimate for RR or SMD. The 95% prediction interval (PI) was also calculated (where ≥3 studies were available); it further accounts for between-study heterogeneity and estimates the uncertainty around the effect that would be anticipated in a new study evaluating the same association.
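The pooling steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Stata code used for this overview: the function names are hypothetical, the 95% CI is assumed to be symmetric on the log scale, the prediction interval uses the common t-based formula with k - 2 degrees of freedom, and the critical t value is supplied by the caller (for example from a statistical table). The input estimates in the usage section are invented for illustration only.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% CI


def log_rr_and_se(rr, ci_low, ci_high):
    """Return ln(RR) and its SE recovered from a 95% CI (assumed symmetric on the log scale)."""
    log_rr = math.log(rr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    return log_rr, se


def dersimonian_laird(effects, ses):
    """Pool estimates (ln RR or SMD) with a DerSimonian-Laird random-effects model."""
    k = len(effects)
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments estimate of the between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return {"pooled": pooled, "se": se_pooled, "tau2": tau2, "q": q, "k": k}


def confidence_interval(pooled, se_pooled):
    """95% CI of the pooled estimate on the analysis scale."""
    return pooled - Z_975 * se_pooled, pooled + Z_975 * se_pooled


def prediction_interval(pooled, se_pooled, tau2, t_crit):
    """95% PI; t_crit is the 0.975 quantile of Student's t with k - 2 df (caller-supplied)."""
    half_width = t_crit * math.sqrt(tau2 + se_pooled ** 2)
    return pooled - half_width, pooled + half_width


if __name__ == "__main__":
    # Hypothetical (RR, lower, upper) triples, not taken from any included CSR
    estimates = [(0.80, 0.65, 0.98), (0.95, 0.80, 1.13), (0.70, 0.50, 0.98)]
    logs, ses = zip(*(log_rr_and_se(*e) for e in estimates))
    res = dersimonian_laird(logs, ses)
    lo, hi = confidence_interval(res["pooled"], res["se"])
    # With k = 3 estimates, df = 1 and the tabulated critical value is 12.706
    p_lo, p_hi = prediction_interval(res["pooled"], res["se"], res["tau2"], 12.706)
    print("Pooled RR:", math.exp(res["pooled"]))
    print("95% CI:", math.exp(lo), math.exp(hi))
    print("95% PI:", math.exp(p_lo), math.exp(p_hi))
```

Working on the log scale and exponentiating only at the end keeps the weights and intervals consistent with the inverse-variance weighting described in the text; for SMDs the same functions apply without the log transformation.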
The I-squared statistic was used to measure between-study heterogeneity; its thresholds (small, substantial and considerable) were interpreted considering the size and direction of effects and the p-value from Cochran’s Q test (p < 0.1 considered significant) [38]. Wherever possible, we calculated the median effect size (with interquartile range [IQR]) of each CSR to interpret the direction and magnitude of the effect size. Sub-group analyses were planned for type and intensity of the intervention; age group; gender; type and/or severity of the condition; risk of bias in RCTs; and the overall quality of the evidence (Grading of Recommendations Assessment, Development and Evaluation (GRADE) criteria). To assess overlap, we calculated the corrected covered area (CCA) [39]. All statistical analyses were conducted in Stata statistical software version 15.2 (StataCorp LLC, College Station, Texas, USA).
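As a rough illustration of the two summary measures named above, the following Python sketch computes I-squared from Cochran's Q and the corrected covered area from a review-by-trial citation matrix. The function names and the example data are hypothetical; the CCA formula used here, (N - r) / (r × c - r), is the one commonly attributed to Pieper et al., where N is the total number of included publications counted across reviews, r the number of unique primary publications, and c the number of reviews.

```python
def i_squared(q, k):
    """I-squared (%) from Cochran's Q and the number of pooled estimates k."""
    df = k - 1
    if q <= 0 or df <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def corrected_covered_area(review_to_trials):
    """CCA (%) for a mapping of review id -> iterable of primary study ids."""
    c = len(review_to_trials)  # number of reviews (columns)
    per_review = [set(trials) for trials in review_to_trials.values()]
    n = sum(len(trials) for trials in per_review)  # occurrences incl. double counting
    r = len(set().union(*per_review))  # unique primary studies (rows)
    if c < 2 or r == 0:
        raise ValueError("CCA needs at least two reviews and one primary study")
    return (n - r) / (r * c - r) * 100.0


if __name__ == "__main__":
    # Hypothetical citation matrix: two reviews sharing one trial
    matrix = {"review_A": ["t1", "t2", "t3"], "review_B": ["t3", "t4"]}
    print("CCA (%):", corrected_covered_area(matrix))
    print("I-squared (%):", i_squared(q=12.0, k=10))
```

A CCA close to zero means that the reviews rarely share primary studies, so pooled review-level estimates are less likely to count the same trial more than once.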

Results

The searches generated 280 potentially relevant CSRs. After removal of duplicates and screening, a total of 150 CSRs met our eligibility criteria [40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189] (Fig. 1). Reviews were published between September 2002 and December 2018. A total of 130 CSRs employed meta-analytic techniques and 20 did not. The total number of RCTs in the CSRs amounted to 2888, with 485,110 participants (mean = 3234, SD = 13,272). Participant age ranged from 3 to 87 years and the gender distribution was inestimable. The main characteristics of the included reviews are summarised in supplementary Table 1. Supplementary Table 2 summarises the effects of physical activity/exercise on health outcomes. Conclusions from CSRs are listed in supplementary Table 3. Adverse effects are listed in supplementary Table 4. Supplementary Table 5 presents a summary of withdrawals/non-adherence. The methodological quality of CSRs is presented in supplementary Table 6. Supplementary Table 7 summarises studies assessed at low risk of bias (by the authors of CSRs). GRADE ratings of each review’s main comparison are listed in supplementary Table 8.

Fig. 1 Study selection process

There were 54 separate populations/conditions and a considerable range of interventions, comparators, co-interventions, and outcome measures. For a detailed description of the interventions, please refer to the supplementary tables. The most commonly measured outcomes were function in 112 (75%), QOL in 83 (55%), adverse events (AEs) in 70 (47%), costs in 47 (31%), pain in 41 (27%), mental health in 35 (23%), strength in 30 (20%), mortality in 28 (19%), and disability in 14 (9%) CSRs.

There was a 13% reduction in mortality following exercise when compared with various controls (risk ratio (RR) 0.87 [95% CI 0.78, 0.96]; I2 = 26.6% [PI 0.70, 1.07]; median effect size (MES) = 0.93 [interquartile range (IQR) 0.81, 1.00]; 10 CSRs, 187 RCTs, 27,671 participants) (Table 1). This reduction was smaller in ‘other groups’ of patients than in patients with cardiovascular diseases (CVD): RR 0.97 [95% CI 0.65, 1.45] versus 0.85 [0.76, 0.96], respectively. The effects of exercise were not intensity or frequency dependent. Sessions more than 3 times per week exerted a smaller reduction in mortality than sessions less than 3 times per week: RR 0.87 [95% CI 0.78, 0.98] versus 0.63 [0.39, 1.00]. Subgroup analyses by risk of bias (ROB) in RCTs showed that RCTs at low ROB exerted smaller reductions in mortality than RCTs at an unclear or high ROB: RR 0.90 [95% CI 0.78, 1.02] versus 0.72 [0.42, 1.22] versus 0.86 [0.69, 1.06], respectively. CSRs with moderate quality of evidence (GRADE) showed slightly smaller reductions in mortality than CSRs that relied on very low to low quality evidence: RR 0.88 [95% CI 0.79, 0.98] versus 0.70 [0.47, 1.04].
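The arithmetic behind the headline figure can be made explicit with a short Python sketch. The helper names are hypothetical and the inputs are the pooled mortality estimates reported above; the sketch converts the RR to a relative reduction and checks whether each interval contains the null value of 1.

```python
def relative_reduction_percent(rr):
    """Relative risk reduction (%) implied by a risk ratio."""
    return (1.0 - rr) * 100.0


def interval_excludes_null(low, high, null=1.0):
    """True if the interval [low, high] does not contain the null value."""
    return not (low <= null <= high)


if __name__ == "__main__":
    # Pooled mortality estimates as reported in the text
    print(round(relative_reduction_percent(0.87)))
    print(interval_excludes_null(0.78, 0.96))  # 95% CI
    print(interval_excludes_null(0.70, 1.07))  # 95% PI
```

Read this way, the confidence interval describes the uncertainty of the average effect, whereas the prediction interval describes the range of effects that might plausibly be seen in a new study, which is why the two can lead to different statements about the null value.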

Table 1 Quantitative evidence synthesis for mortality outcomes

Exercise also improved QOL when compared with various controls (standardised mean difference (SMD) 0.18 [95% CI 0.08, 0.28]; I2 = 74.3% [PI -0.18, 0.53]; MES = 0.20 [IQR 0.07, 0.39]; 15 CSRs, 408 RCTs, 32,984 participants) (Table 2). These improvements were greater for health-related QOL than for overall QOL: SMD 0.30 [95% CI 0.21, 0.39] versus 0.06 [-0.08, 0.20], respectively. Again, the effects of exercise were duration and frequency dependent. For instance, sessions of more than 90 min exerted a greater improvement in QOL than sessions of up to 90 min: SMD 0.24 [95% CI 0.11, 0.37] versus 0.22 [-0.30, 0.74]. Subgroup analyses by the type of condition showed that the magnitude of effect was the largest among patients with mental health conditions, followed by CVD and cancer. Physical activity exerted negative effects on QOL in patients with respiratory conditions (2 CSRs, 20 RCTs with 601 patients; SMD -0.97 [95% CI -1.43, 0.57]; I2 = 87.8%; MES = -0.46 [IQR -0.97, 0.05]). Subgroup analyses by risk of bias (ROB) in RCTs showed that RCTs at low or unclear ROB exerted greater improvements in QOL than RCTs at a high ROB: SMD 0.21 [95% CI 0.10, 0.31] versus 0.17 [0.03, 0.31]. Similarly, CSRs with moderate to high quality of evidence showed slightly greater improvements in QOL than CSRs that relied on very low to low quality evidence: SMD 0.19 [95% CI 0.05, 0.33] versus 0.15 [-0.02, 0.32]. Please also see supplementary Table 9 for further studies reporting QOL outcomes as mean differences (not quantitatively synthesised herein).

Table 2 Quantitative evidence synthesis for quality of life outcomes

Adverse events (AEs) were reported in 100 (66.6%) CSRs and not reported in 50 (33.3%). The number of AEs ranged from 0 to 84 in the CSRs. The number was inestimable in 83 (55.3%) CSRs. Ten (6.6%) reported no occurrence of AEs. Mild AEs were reported in 28 (18.6%) CSRs, moderate in 9 (6%) and serious/severe in 20 (13.3%). There were 10 deaths and, in the majority of instances, causality was not attributed to exercise. For this outcome, we were unable to pool the data as effect sizes were too heterogeneous (Table 3).

Table 3 Quantitative evidence synthesis of AEs and adherence outcomes

In 38 CSRs, the total number of trials reporting withdrawals/non-adherence was inestimable. Withdrawals/non-adherence were reported in different ways, such as adherence or attrition (high in 23.3% of CSRs), and with various effect estimates including percentages, ranges, total numbers, MD, RD, RR, OR, mean and SD. The overall pooled estimates are reported in Table 3.

Across the 16 domains of the AMSTAR-2 tool, 1876 (78.1%) ratings were ‘yes’, 76 (3.1%) ‘partial yes’, 375 (15.6%) ‘no’, and 25 (1%) ‘not applicable’. Ninety-six CSRs (64%) were scored as ‘no’ on reporting sources of funding for the included studies, followed by 88 (58.6%) that failed to explain the selection of study designs for inclusion. One CSR (0.6%) was judged as ‘no’ for reporting any potential sources of conflict of interest, including any funding for conducting the review, and one (0.6%) for performing study selection in duplicate.

In 102 (68%) CSRs, there was predominantly a high risk of bias in RCTs. In 9 (6%) CSRs, this was reported as a range, e.g., low or unclear, or low to high. Two CSRs used different terminology, i.e., moderate methodological quality, and the risk of bias was inestimable in one CSR. Sixteen (10.6%) CSRs did not identify any RCTs at low risk of bias for random sequence generation, 28 (18.6%) for allocation concealment, 28 (18.6%) for performance bias, 84 (54%) for detection bias, 35 (23.3%) for attrition bias, 18 (12%) for reporting bias, and 29 (19.3%) for other bias.

In 114 (76%) CSRs, limitation of studies was the main reason for downgrading the quality of the evidence followed by imprecision in 98 (65.3%) and inconsistency in 68 (45.3%). Publication bias was the least frequent reason for downgrading in 26 (17.3%) CSRs. Ninety-one (60.7%) CSRs reached equivocal conclusions, 49 (32.7%) reviews reached positive conclusions and 10 (6.7%) reached negative conclusions (as judged by the authors of CSRs).

Discussion

In this overview of CSRs, we found a large body of evidence on the beneficial effects of physical activity/exercise on health outcomes in a wide range of heterogeneous populations. Our data show a 13% reduction in mortality rates among 27,671 participants, and a small improvement in QOL and health-related QOL following various modes of physical activity/exercise. This means that both healthy individuals and medically compromised patients can significantly improve function and physical and mental health, or reduce pain and disability, by exercising more [190]. In line with previous findings [191,192,193,194], where a dose-specific reduction in mortality has been found, our data show a greater reduction in mortality in studies with longer follow-up (> 12 months) than in those with shorter follow-up (< 12 months). Interestingly, we found a consistent pattern in the findings: the higher the quality of evidence and the lower the risk of bias in primary studies, the smaller the reductions in mortality. This pattern is observational in nature and cannot be over-generalised; however, it might mean less certainty in the estimates measured. Furthermore, we found that the magnitude of the effect size was the largest among patients with mental health conditions. A possible mechanism of action may involve elevated levels of brain-derived neurotrophic factor or beta-endorphins [195].

We found the issue of poor reporting or underreporting of adherence/withdrawals in over a quarter of CSRs (25.3%). This is crucial both for improving the accuracy of the estimates at the RCT level as well as maintaining high levels of physical activity and associated health benefits at the population level.

Even the most promising interventions are not entirely risk-free, and some minor AEs such as post-exercise pain and soreness or discomfort related to physical activity/exercise have been reported. These were typically transient, resolved within a few days, and were comparable between exercise and various control groups. Worryingly, however, poor reporting or underreporting of AEs was observed in one third of the CSRs. Transparent reporting of AEs is crucial for identifying patients at risk and mitigating any potential negative or unintended consequences of the interventions.

High risk of bias of the RCTs evaluated was evident in more than two thirds of the CSRs. For example, more than half of reviews identified high risk of detection bias as a major source of bias suggesting that lack of blinding is still an issue in trials of behavioural interventions. Other shortcomings included insufficiently described randomisation and allocation concealment methods and often poor outcome reporting. This highlights the methodological challenges in RCTs of exercise and the need to counterbalance those with the underlying aim of strengthening internal and external validity of these trials.

Overall, high risk of bias in the primary trials was the main reason for downgrading the quality of the evidence using the GRADE criteria. Imprecision was frequently an issue, meaning that the effective sample size was often small and studies were underpowered to detect between-group differences. Pooling overly heterogeneous results often led to inconsistent findings and an inability to draw any meaningful conclusions. Indirectness and publication bias were less common reasons for downgrading. However, with regard to the latter, the generally accepted minimum number of 10 studies needed to quantitatively estimate funnel plot asymmetry was not present in 69 (46%) CSRs.

Strengths of this research are the inclusion of a large number of ‘gold standard’ systematic reviews, robust screening, data extraction and critical methodological appraisal. Nevertheless, some weaknesses need to be highlighted when interpreting the findings of this overview. For instance, some of these CSRs analysed the same primary studies (RCTs) but arrived at slightly different conclusions. Using the Pieper et al. [39] formula, the amount of overlap ranged from 0.01% for AEs to 0.2% for adherence, which indicates slight overlap. All CSRs are vulnerable to publication bias [196]; hence the conclusions generated by them may be false-positive. Also, exercise was sometimes part of a complex intervention, and the effects of physical activity could not be distinguished from co-interventions. Often there were confounding effects of diet, educational, behavioural or lifestyle interventions; selection and measurement bias were inevitably inherited in this overview too. Also, including CSRs only might lead to selection bias, and excluding reviews published before 2000 might limit the overall completeness and applicability of the evidence. A future update should consider these limitations and, in particular, the inclusion of non-CSRs.

Conclusions

Trialists must improve the quality of primary studies. At the same time, strict compliance with reporting standards should be enforced. Authors of CSRs should better explain eligibility criteria and report sources of funding for the primary studies. Physical activity levels remain insufficient worldwide amongst all age groups, and scalable interventions aimed at increasing physical activity levels should be prioritized [197]. Hence, policymakers and practitioners need to design and implement comprehensive and coordinated strategies aimed at targeting physical activity programs/interventions, health promotion and disease prevention campaigns at local, regional, national, and international levels [198].