Systematic Review to Inform a World Health Organization (WHO) Clinical Practice Guideline: Benefits and Harms of Structured Exercise Programs for Chronic Primary Low Back Pain in Adults

Purpose: Evaluate benefits and harms of structured exercise programs for chronic primary low back pain (CPLBP) in adults to inform a World Health Organization (WHO) standard clinical guideline.
Methods: We searched for randomized controlled trials (RCTs) in electronic databases (inception to 17 May 2022). Eligible RCTs targeted structured exercise programs compared to placebo/sham, usual care, or no intervention (including comparison interventions where the attributable effect of exercise could be isolated). We extracted outcomes, appraised risk of bias, conducted meta-analyses where appropriate, and assessed certainty of evidence using GRADE.
Results: We screened 2503 records (after initial screening through Cochrane RCT Classifier and Cochrane Crowd) and 398 full-text RCTs. Thirteen RCTs rated with overall low or unclear risk of bias were synthesized. Assessing individual exercise types (predominantly very low certainty evidence), pain reduction was associated with aerobic exercise and Pilates vs. no intervention, and motor control exercise vs. sham. Improved function was associated with mixed exercise vs. usual care, and Pilates vs. no intervention. Temporary increased minor pain was associated with mixed exercise vs. no intervention, and yoga vs. usual care. Little to no difference was found for other comparisons and outcomes. When pooling exercise types, exercise vs. no intervention probably reduces pain in adults (8 RCTs, SMD = − 0.33, 95% CI − 0.58 to − 0.08) and functional limitations in adults and older adults (8 RCTs, SMD = − 0.31, 95% CI − 0.57 to − 0.05) (moderate certainty evidence).
Conclusions: With moderate certainty, structured exercise programs probably reduce pain and functional limitations in adults and older people with CPLBP.
Supplementary Information: The online version contains supplementary material available at 10.1007/s10926-023-10124-4.


Introduction
Exercise therapy or structured exercise programs are widely used to manage low back pain (LBP). Exercise therapy is defined as "a series of specific movements with the aim of training or developing the body by a routine practice or as physical training to promote good physical health" [1] with a goal to reduce pain and functional limitations. Exercise therapies are prescribed or planned by health practitioners and include conducting postures, movements, and/or activities (e.g., strengthening, stretching, aerobic exercise) at varying dosages (duration, frequency, intensity) [2]. For people with chronic primary LBP (CPLBP), exercise therapy may improve musculoskeletal function, while also benefiting most other body systems and mental wellbeing [3]. In turn, this may reduce pain and functional limitations, and improve emotional and psychological wellbeing [2]. Exercise therapy is accessible globally.
Hayden and colleagues published a Cochrane review (2021) (literature search date ending 28 April 2018) to assess the impact of exercise therapy on pain and functional limitations for the management of chronic LBP in adults compared to placebo, no treatment, or usual care (pooled together), or other conservative treatments (249 randomized controlled trials (RCTs); 24,486 participants) [2] and a network meta-analysis comparing different types of exercise treatments [4]. They concluded with moderate certainty that exercise reduces pain and functional limitations when compared to no treatment, usual care, or sham, but not when compared to other conservative treatments [2].
To develop clinical practice guideline recommendations for the management of CPLBP in adults, the WHO commissioned the current systematic review to update the evidence and expand the aims of Hayden et al.'s previously published Cochrane review [2] by assessing additional important outcomes, conducting additional subgroup analyses, and disaggregating pairwise findings by exercise type (compared to no treatment, placebo/sham, or usual care).
The objectives of this systematic review of RCTs were to determine: (1) the benefits and harms of structured exercise programs compared to placebo/sham, usual care, or no intervention for the management of CPLBP in adults, including older adults (aged ≥ 60 years); and (2) whether the benefits and harms of structured exercise programs vary by age, gender/sex, presence of leg pain, race/ethnicity, or national economic development of the countries where the RCTs were conducted.

Methods
This systematic review was conducted as part of a series of reviews to inform a WHO clinical practice guideline on the management of CPLBP in adults. The development of this guideline was ongoing at the time of submission of this manuscript. The review was conducted in collaboration with the Cochrane 'exercise treatment for chronic low back pain' collaborative review team, led by Prof. Jill Hayden [5]. The methods are detailed in the methodology article of this series [6].
Briefly, we updated and expanded the scope of the previously published Cochrane review [2]. The current review differs from Hayden et al.'s in the following ways: 1) we updated the literature search to include RCTs published from 28 April 2018 through 17 May 2022; 2) we assessed additional outcomes identified as critical by the WHO Guideline Development Group (GDG); 3) we conducted additional subgroup analyses (e.g., age, gender/sex); 4) we analyzed and reported the results separately for different exercise types, specifically comparing the effects of each exercise intervention to its respective comparator; 5) we did not assess 'other conservative treatment' comparisons (e.g., exercise vs. manual therapy); 6) we excluded RCTs of multimodal interventions where the specific effects of exercise could not be isolated; 7) we excluded RCTs judged to have high risk of bias from our primary analyses (although we included all RCTs, irrespective of risk of bias, in a supplementary analysis); and 8) the eligibility criteria for the population of interest differed to some degree. For example, we did not exclude RCTs of participants who had specific pathologies (e.g., disc herniation, lumbar spinal stenosis, and spondylolisthesis) provided all other eligibility criteria were satisfied. We also did not exclude RCTs of surgical populations if time since surgery was at least 12 months and participants had no history of fusion and/or disc replacement surgery.
We registered our review protocol with PROSPERO (International Prospective Register of Systematic Reviews) (CRD42022314576) on 7 March 2022.
In collaboration with the Cochrane review team, we modified the original search strategy using a detailed search optimization process [7]. The updated strategy was approved by a Cochrane musculoskeletal (MSK) literature search specialist. We searched MEDLINE (Ovid), CENTRAL (Cochrane Library, Wiley), and Embase (Elsevier) with no date or language restrictions up until 17 May 2022 (see Online Resource 1). Retrieved citations were de-duplicated against the search results of the previous Cochrane review update.
We included RCTs that compared structured exercise programs to placebo/sham, usual care, and no intervention (including comparison interventions where the attributable effect of exercise could be isolated, i.e., exercise + medication vs. same medication alone) in adults (aged ≥ 20 years) with CPLBP. Eligible interventions included all types of exercise with no exclusions based on setting, mode of delivery (e.g., in-person vs. telehealth, group vs. individual, home vs. clinic or community) or degree of personalization (standardized vs. individualized). Individuals may have been given verbal or written exercise instructions (e.g., handbook). Eligible exercise interventions, considered as separate exercise types, included, but were not limited to, aerobic exercise; muscle strength training; stretching, flexibility or mobilizing exercises; yoga; core strengthening; motor control exercise; functional restoration exercise (not including multimodal programs of exercise with other interventions, such as psychological supports); Pilates; Tai Chi; Qigong; and mixed exercise therapies (i.e., two or more types of exercise in which one did not clearly predominate).
In addition to the main critical outcomes assessed for all reviews in this series (pain, function, health-related quality of life (HRQoL), harms, psychological functioning, and social participation including work), we also assessed additional critical outcomes requested by the WHO GDG for this review: the change in use of medications, burden related to the intervention or comparator (e.g., ease of access to the intervention, time burden of the intervention), performance-based physical functioning, and falls (older adults aged ≥ 60 years only). We reported outcomes based on post-intervention follow-up intervals including: (1) immediate term (closest to 2 weeks after the intervention period); (2) short term (closest to 3 months after the intervention period); (3) intermediate term (closest to 6 months after the intervention period); (4) long term (closest to 12 months after the intervention period); and (5) extra-long term (more than 12 months after the intervention period).
We assessed between-group differences to determine the magnitude of the effect of an intervention and to assess its effectiveness [8,9] (details in the methodology article in this series) [6]. Briefly, we considered a mean difference (MD) of ≥ 10% of the scale range or ≥ 10% difference in risk for dichotomous outcomes to be a minimally important difference (MID) [10,11]. If the standardized mean difference (SMD) was calculated, SMD ≥ 0.2 was considered a MID [12].
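The MID thresholds described above can be illustrated with a minimal sketch. The function names are hypothetical, and the sketch assumes that only the magnitude of the point estimate is compared with the threshold; the review's own judgements also take confidence intervals and certainty of evidence into account. The example values are point estimates reported elsewhere in this article.

```python
def mid_threshold_for_md(scale_min: float, scale_max: float,
                         fraction: float = 0.10) -> float:
    """Smallest mean difference treated as a MID: 10% of the scale range."""
    return fraction * (scale_max - scale_min)


def meets_mid(point_estimate: float, threshold: float) -> bool:
    # Assumption: compare magnitude only; direction (benefit vs. harm)
    # is interpreted separately from this check.
    return abs(point_estimate) >= threshold


SMD_MID = 0.2  # threshold used when a standardized mean difference is calculated

pain_threshold = mid_threshold_for_md(0, 10)      # pain scale 0 to 10
function_threshold = mid_threshold_for_md(0, 24)  # function scale 0 to 24

print(meets_mid(-2.10, pain_threshold))      # Pilates, pain, immediate term
print(meets_mid(-2.30, function_threshold))  # mixed exercise, function, short term
print(meets_mid(-0.33, SMD_MID))             # pooled exercise, pain (SMD)
```

Under these assumptions, an MD of − 2.30 on a 0 to 24 scale falls just short of the 2.4-point threshold, while the other two estimates exceed their thresholds.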
Pairs of reviewers independently screened studies for eligibility, and critically appraised risk of bias (ROB) using the Cochrane ROB 1 tool [13], modified from the Cochrane Back and Neck Methods Guidelines [14]. One reviewer extracted data for all included RCTs, which was then verified by a second reviewer. Any disagreements were resolved by consensus between paired reviewers or with a third reviewer, when necessary. Forms and guidance for screening, risk of bias assessment, and data extraction were adapted from those developed by Hayden et al. in the conduct of the 'exercise for chronic low back pain' collaborative review, in which members of our team participated [5]. The forms were completed using DistillerSR [15], a web-based electronic systematic review software application.
In our primary synthesis, our analyses were conducted according to exercise type (e.g., aerobic exercise, yoga). In addition to the subgroup analyses conducted for all reviews in this series (age, gender/sex, presence of leg pain, race/ethnicity, and national economic development of country where RCT was conducted), we aimed to perform subgroup analyses according to exercise dosage and intensity, and to conduct a sensitivity analysis by removing RCTs rated as unclear ROB.
We conducted random-effects meta-analyses and narrative synthesis where meta-analysis was not appropriate [16], and graded the certainty of evidence using Grading of Recommendations Assessment, Development and Evaluation (GRADE) [17]. The comparisons involving no intervention and interventions where the attributable effect of exercise could be isolated were combined in meta-analyses. Meta-analyses were conducted using R statistical packages [18,19], and GRADE Evidence Profiles and GRADE Summary of Findings tables were developed using GRADEpro software [20].
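For readers unfamiliar with random-effects pooling, the following is a minimal sketch of one common estimator (DerSimonian-Laird). It is illustrative only: this section does not state which between-study variance estimator or confidence interval method the R packages were configured to use, and the function names and input values below are hypothetical, not data from the included RCTs.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study effects with DerSimonian-Laird random-effects weights.

    Returns (pooled effect, standard error, between-study variance tau^2).
    """
    w = [1.0 / v for v in variances]                  # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def wald_ci(estimate, se, z=1.96):
    """Normal-approximation 95% confidence interval."""
    return estimate - z * se, estimate + z * se


# Hypothetical SMDs and their variances from two trials (not review data).
pooled, se, tau2 = dersimonian_laird([0.0, -1.0], [0.1, 0.1])
lower, upper = wald_ci(pooled, se)
print(round(pooled, 2), round(se, 2), round(tau2, 2))
print(round(lower, 2), round(upper, 2))
```

When the studies disagree more than their sampling error would predict, tau^2 becomes positive, the weights become more equal across studies, and the confidence interval widens relative to a fixed-effect analysis.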
Following completion of our primary synthesis, the WHO commissioned a supplementary evidence synthesis to further inform the formulation of recommendations by the GDG. In the supplementary evidence synthesis, we synthesized the 13 RCTs (judged as low or unclear ROB) included in our primary evidence synthesis along with 55 additional RCTs originally excluded from our synthesis due to high ROB. These studies were identified as having been published in the period 28 April 2018 (search end date of Hayden's previously published Cochrane review [2]) to 17 May 2022. We included all 13 trials from the primary synthesis (from database inception through 17 May 2022) in this supplementary synthesis since no differences in the magnitude or directions of the effect estimates were observed in a sensitivity analysis where RCTs published on or before 28 April 2018 were excluded.
In the supplementary evidence synthesis (see Online Resource 8), we included RCTs that compared any structured exercise program or exercise type to the same comparisons as in our primary synthesis. The outcomes assessed were pain, function, and harms only. The key differences between the primary and supplementary evidence syntheses are summarized in Table 1.
The WHO was provided with the primary and supplementary evidence syntheses to support the GDG in formulation of recommendations. The GDG may have also considered other aligned evidence when formulating its recommendations (currently under development).
Regarding unpublished RCTs, we identified 185 RCTs (registrations and published protocols) in the WHO ICTRP. Of these, 14 authors could not be contacted because an email address could not be located. Thus, 171 authors were contacted and 164 received our invitation to respond to a REDCap survey [34,35] consisting of our specific queries. Of these, 32% (53/164) responded; 19 reported that their RCT would not meet our inclusion criteria; 26 reported their RCT was ongoing; and 8 provided citations, which we confirmed were already included in our review. Thus, we did not include any unpublished RCTs in our review.
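The author-contact counts reported above can be cross-checked with simple arithmetic. The variable names in this short sketch are illustrative; all numbers are taken from the preceding paragraph.

```python
identified = 185          # registrations and published protocols in the WHO ICTRP
no_email = 14             # authors with no locatable email address
contacted = identified - no_email

received = 164            # authors who received the survey invitation
responses = {
    "would not meet inclusion criteria": 19,
    "RCT ongoing": 26,
    "provided citations already included": 8,
}
responded = sum(responses.values())
response_rate_pct = round(100 * responded / received)

print(contacted, responded, response_rate_pct)
```

The three response categories sum to the 53 responders, and 53 of 164 rounds to the reported 32%.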

Certainty of Evidence
The certainty of the evidence ranged from very low (for outcomes assessed with the individual exercise types) to moderate (for outcomes assessed after pooling exercise types). Certainty of evidence was downgraded due to ROB, inconsistency, indirectness, and/or imprecision of the effect estimates (see Online Resources 5, 6 and 7). For results reported as a MD, lower or negative values refer to reduced pain, functional limitations, depression, or fear avoidance; higher or positive values refer to improved HRQoL and self-efficacy.

Older Adults
Due to very low certainty evidence from 1 RCT [33], in older adults in the immediate term, it is uncertain whether mixed exercise worsens HRQoL PCS (scale 0 to 100, 0 = poor quality of life) (MD = − 6.56, 95% CI − 13.03 to − 0.10) (plot 4.1.3.

All Adults
For outcomes that are based on RCTs of older adults only, results are reported under older adults below.

Older Adults
Due to very low certainty evidence from one RCT of older adults [24], it is uncertain whether mixed exercise makes little or no difference to pain (scale 0 to 10, 0 = no pain) in the immediate (MD = − 0.

Pilates Exercises Versus Comparison Interventions With Isolated Exercise Effects
Due to very low certainty evidence from one RCT [25], it is uncertain whether Pilates reduces pain (scale 0 to 10, 0 = no pain) in the immediate term (MD = − 2.10, 95% CI − 3.

Yoga Versus Usual Care
The evidence is based on one RCT [32] and is very low certainty for all outcomes and time points. The results in this section are narratively synthesized (no forest plots).

Pooled Analysis of All Exercise Types Versus Comparison Interventions With Isolated Exercise Effects
We conducted a post hoc analysis by pooling all exercise types since only 1-3 RCTs were identified for each exercise type and none on their own showed a clear benefit. To be included in this analysis, data from two or more of the eight exercise types had to be available per comparison, outcome, and time point. Otherwise, findings of the individual eight exercise types have been reported in the eight previous comparisons.

Subgroup, Sensitivity and Supplementary Evidence Analyses
For the primary evidence synthesis, we did not conduct subgroup analysis for exercise dosage or intensity because there were too few RCTs (1-3) per comparison with little variation in dosage or intensity between RCTs. Additionally, we did not conduct sensitivity analyses removing the overall unclear ROB RCTs as most were given this rating (11/13, 85%).
In the subgroup and/or sensitivity analyses conducted in both the primary and supplementary evidence syntheses, for all comparisons and outcomes, subgroup differences could not be explained and/or the differences between subgroups would likely not result in different recommendations for different subgroups.This was mostly due to the low or very low certainty evidence and the absence of or unimportant differences between the intervention and comparison groups (see Online Resources 7 and 8).

Discussion
The evidence regarding the benefits and harms of structured exercise programs for CPLBP in adults is based on 13 RCTs deemed as low or unclear ROB with a total of 1362 participants. Of these, two RCTs (n = 252) assessed adults aged ≥ 60 years. The eight exercise types assessed were aerobic exercise, core strengthening, muscle strengthening, mixed exercise, Pilates, stretching/flexibility/mobilizing exercise, yoga, and motor control exercise. Most of the RCTs (11/13, 85%) were rated as unclear overall ROB (concerns primarily with performance and detection bias). The certainty for the evidence related to individual exercise types was low or very low. Pain reduction was associated with aerobic exercise vs. no intervention in the immediate and short terms, Pilates vs. no intervention in the immediate term, and motor control exercise vs. sham in the immediate and long terms. Improved function was associated with mixed exercise vs. usual care, and Pilates vs. no intervention in the immediate term. Temporary increased minor pain was associated with mixed exercise vs. no intervention, and yoga vs. usual care; no harms were reported with Pilates vs. no intervention. Little to no differences were found for other comparisons and outcomes.
When pooling all exercise types together based on the 13 RCTs, we found moderate certainty evidence indicating that in the immediate term, exercise (including aerobic, motor control, Pilates, yoga, core strengthening, and mixed exercise) improves pain in adults, and function in adults and older adults. Little or no difference was found between groups for the other outcomes (HRQoL, depression, self-efficacy, catastrophizing, fear avoidance, and performance-based physical functioning in older adults). Taken together, the findings from our primary synthesis, supplementary synthesis, and the work by Hayden et al. [2,4] are consistent.
Our systematic review has several strengths. First, our international team had clinical and methodological expertise regarding LBP, systematic reviews, evidence syntheses, and answering important public health questions from the WHO. Second, our review process involved conducting comprehensive literature searches without any language restrictions. Third, during the screening and ROB assessments, a core team member (with the most expertise and reliability in screening and ROB evaluations) was involved in each screening and ROB pair. Fourth, our ROB assessments did not rely on summary scores or the number of items at ROB. Instead, we created supplementary guidance forms based on the ROB 1 criteria [13,14], which allowed reviewers to consider critical flaws in the studies [6]. Our use of these forms resulted in high agreement on ROB judgements. Fifth, we maintained transparency throughout the review process, providing detailed ROB assessments and footnotes for grading the certainty of the evidence (see Online Resources 2, 5, 8). These notes give readers a better understanding of our judgements and allow them to reach their own conclusions.
Our review has some limitations. One limitation is that we did not search the grey literature, which could introduce publication bias as studies published in peer-reviewed journals tend to report larger intervention effects than those in the grey literature [92]. We tried to mitigate this by searching for unpublished RCTs in the WHO ICTRP registry and contacting authors of unpublished RCTs. Moreover, unpublished studies are known to represent a small proportion of studies and rarely impact results and conclusions [93]. However, it may be important to include such studies in limited scenarios or where there are potential conflicts of interest in published research [93].
We identified several key gaps in the evidence across different exercise comparisons: 1) lack of studies examining the effects of exercise on anxiety symptoms and social participation (including work); 2) inability to assess whether the benefits or harms of exercise interventions vary by gender/sex or race/ethnicity; 3) insufficient studies to evaluate the impact of leg pain/symptoms on exercise benefits or harms, as well as differences in higher versus lower income countries; 4) inability to examine the influence of intervention-level characteristics, such as exercise specificity, tailored approaches, supervision level, and group versus individual delivery, on benefits and harms; 5) limited evidence on the benefits or harms of specific exercise types in older adults, including aerobic exercise, core strengthening, muscle strength training, Pilates, stretching, flexibility or mobilizing exercises, yoga, and motor control exercises; 6) few studies assessing the impact of exercise on quality of life and psychological outcomes (depression, fear avoidance, catastrophizing, self-efficacy, anxiety), with comparatively less evidence available for older individuals; 7) limited understanding of the effects of exercise in vulnerable populations, such as older adults and those in low-income settings, who are more likely to experience persistent disability from low back pain.
Additionally, exercise's effects are modest, suggesting a need for multifaceted interventions.

Conclusion
When assessing individual exercise types, based on low or very low certainty evidence, pain reduction was associated with aerobic exercise, Pilates and motor control exercise; improved function was associated with mixed exercise and Pilates. A temporary increase in minor pain was associated with mixed exercise and yoga. Little to no difference was found for other comparisons and outcomes. When pooling exercise types, based on moderate certainty evidence, exercise was shown to be beneficial in improving pain and function in adults and older adults. Exercise prescription should be considered based on patient preferences, availability of exercise type, costs, and other contextual factors. Harms should be further investigated systematically.

Fig. 2 Any exercise versus comparison interventions where the attributable effect of exercise could be isolated for pain in the immediate term (closest to 2 weeks)

Table 2 Number of included RCTs by comparison and outcome
Bold values: majority of studies are in this category; italic values: some studies
a Included comparison interventions where the attributable effect of exercise could be isolated (i.e., combined exercise with treatment B versus treatment B alone)
b One RCT reported two intervention groups: 1) hamstring static stretching + physiotherapy vs. physiotherapy, 2) hamstring strengthening in lengthened position + physiotherapy vs. physiotherapy
c One RCT included adults aged ≥ 60 years

It is uncertain whether mixed exercise makes little or no difference to functional limitations (scale 0 to 24, 0 = no functional limitations) in the short (MD = − 2.30, 95% CI − 4.92 to 0.32) (plot 4.2.2.2), or intermediate term (MD = − 2.50, 95% CI − 5.19 to 0.19) (plot 4.2.2.3).