Background

Low back pain (LBP) affects approximately 70% of the adult population in the western world at some point in their lives [1, 2]. The economic burden for society is high [1, 3], and LBP is ranked as one of the three most burdensome conditions in terms of years lived with disability [4]. In the acute phase of LBP, lasting up to 6 weeks, many individuals suffer considerable pain and disability. The prognosis for acute LBP is nevertheless favorable: most symptoms resolve within 6 weeks in 70–80% of affected patients, whether or not any intervention is given [5,6,7,8]. A small proportion, about 5%, suffer persistent or recurrent pain leading to prolonged disability, which may further impede recovery [6]. Many patients with acute LBP are managed in primary care settings and may visit a physiotherapist [3]. Physiotherapists offer many interventions for this patient group, of which one of the most widely used is exercise therapy [9,10,11]. This practice is concordant with clinical guidelines for the management of low back pain in primary care [12].

International, evidence-based, clinical practice guidelines exist that can aid the physiotherapist in choosing appropriate interventions for patients with acute LBP [13,14,15]. However, recommendations in various guidelines, produced in different countries, differ and are not always consistent with results from systematic reviews [13, 16,17,18]. This may seem odd since systematic reviews of randomized controlled trials (RCTs) provide top-level evidence and should be the foundation for any high-quality evidence-based guideline [19]. However, in the development of clinical guidelines, evidence for the effectiveness and safety of interventions covers only two of the aspects that are considered. Other aspects such as costs, feasibility, patient preferences, and availability are usually also considered when translating the evidence into recommendations for clinical practice. A mix of populations (acute, sub-acute, and chronic LBP populations) and different definitions of duration are other factors that may explain why recommendations vary [13, 14, 16]. We know that intervention effects differ depending on whether the pain is acute, sub-acute, or chronic [17, 18, 20], and that acute pain differs from chronic pain [21].

The most uniform recommendation in international guidelines is that “first line care” for patients with acute LBP should be to give reassurance, advice to stay active in daily life, and, if necessary, pain medication [12, 13]. Exercise therapy, spinal manipulative therapy, mobilization, and acupuncture are other interventions recommended in some, but not all, clinical practice guidelines [12, 13, 16], typically only if first line care has not improved symptoms. For these interventions, there are discrepancies regarding when, how, and whether they should be used in acute LBP [12, 13, 15, 22].

Within the umbrella term exercise therapy, several types exist. From a physiotherapist's perspective, the following types can be distinguished: McKenzie therapy, stabilization exercises (also called motor control exercise), strengthening (or resistance) exercises, stretching exercises, and aerobic exercises. These types differ in one important aspect: the hypothesized underlying mechanism of effect. Some rest on a rather solid physiological theory, for example aerobic exercise, while others rest on a more conceptual theory shaped by the people who introduced them, for example McKenzie therapy. Still, all fit within the umbrella term exercise therapy.

Despite guidelines recommending different interventions, several systematic reviews show that interventions for patients with acute LBP rarely yield clinically relevant effects compared with placebo treatment [17, 23, 24]. This seems to be the case not only for physiotherapeutic interventions but for pharmacological treatment as well [25]. Exercise therapy, frequently used in clinical physiotherapy practice [9,10,11], is no exception to this uncertainty about clinically relevant effects [7, 17, 26, 27]. Many systematic reviews conclude that exercise therapy is effective for patients with acute LBP, but that the evidence is inconclusive [20, 28]. Although systematic reviews may be well conducted, the often low methodological quality of many of the included studies reduces the confidence we may have in conclusions regarding exercise therapy and its clinically relevant effects [14].

The limitations and issues described imply a need to summarize and synthesize the findings from existing systematic reviews on exercise therapy, and to assess the overall certainty of evidence for effect of this common physiotherapeutic intervention for acute LBP. To the best of our knowledge, no systematic review of systematic reviews on this topic has been published. The aim of this systematic review of systematic reviews was to assess the overall certainty of evidence for the effects of exercise therapy provided by physiotherapists in comparison with other interventions, on pain, disability, recurrence, and adverse effects in adult patients with acute LBP.

Methods

Protocol and registration

We conducted this systematic review of systematic reviews according to a protocol registered in PROSPERO (CRD46146), available at: https://www.crd.york.ac.uk/PROSPERO/display_record.php?RecordID=46146. The protocol was not published in any peer-reviewed journal. The development of the protocol was guided by the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) 2015 statement [29]. Conduct and reporting followed Smith et al.’s [30] methodological recommendations for conducting a systematic review of systematic reviews of healthcare interventions, the Cochrane Collaboration’s recommendations for conducting an overview of systematic reviews [31], Lunny et al.’s [32, 33] comprehensive publications with recommendations for conducting an overview, and the PRISMA statement [34] (Additional file 1).

Eligibility criteria

Exercise therapy is defined as “a regimen or plan of physical activities designed and prescribed for specific therapeutic goals, with the purpose to restore normal musculoskeletal function or to reduce pain caused by diseases or injuries” [35]. In this systematic review, interventions were classified as exercise therapy if they could be carried out in physiotherapy and if no other component dominated the intervention. For example, the concept of McKenzie therapy involves repeated exercises following a directional preference, but also sometimes spinal mobilization or manipulation [36].

Inclusion criteria and a priori determined definitions for clinical relevance are presented in Table 1.

Table 1 Inclusion criteria for the current systematic review, including cut-offs for clinical relevance

Excluded populations were patients with acute LBP related to pregnancy, infection, malignancy, metastasis, osteoporosis, rheumatoid arthritis, fracture, inflammatory process, or radiculopathy (neurologic signs).

Search methods

Search strategy

We designed a comprehensive search strategy with support from a medical librarian. To reach an optimal strategy, we took guidance from previously published search strategies in Cochrane Reviews on low back pain and exercise therapy. We used a wide search strategy to avoid missing relevant systematic reviews not indexed correctly in the databases. Precision of search, calculated as eligible SRs/total records, and number needed to read (NNR), calculated as 1/precision of search, were used to present the search result. No language restrictions were applied. We combined search terms and MeSH terms in a search strategy developed for PubMed, and adapted this strategy for the other databases. Search strategies are presented in Additional file 2.
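The two search metrics defined above can be illustrated with a minimal sketch. The function names and the record counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the counts from this review.

```python
def search_precision(eligible_reviews: int, total_records: int) -> float:
    """Precision of search: eligible systematic reviews / total records."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return eligible_reviews / total_records


def number_needed_to_read(precision: float) -> float:
    """NNR: the reciprocal of the search precision."""
    if precision <= 0:
        raise ValueError("precision must be positive")
    return 1.0 / precision


# Hypothetical example: 20 eligible reviews among 2000 retrieved records.
precision = search_precision(20, 2000)
nnr = number_needed_to_read(precision)
print(f"precision = {precision:.2%}, NNR = {nnr:.0f}")
```

A low precision (and correspondingly high NNR) is the expected cost of a deliberately wide search.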

Electronic searches

We searched PubMed, Cochrane Library, CINAHL, PEDro, Web of Science, Open Grey, and PROSPERO for systematic reviews from inception to 1 March 2017. The latter three were explicitly searched for grey literature, including conference abstracts and study protocols. We also searched reference lists published on the website of the McKenzie Institute International [36]. We performed a supplementary search in PubMed in September 2019. As 83% of the included systematic reviews were indexed in PubMed in the original search, we limited the update search to this database.

Other sources

We scrutinized the reference lists of included systematic reviews for additional potentially relevant studies. We contacted authors by email if the full text was not available.

Selection of systematic reviews

Two reviewers (MK and AB or SB) independently screened titles and abstracts retrieved from the searches and assessed these for eligibility against the predetermined inclusion criteria (PICOS). We retrieved all titles and abstracts meeting the inclusion criteria in full text. Two independent reviewers (MK, SB or AB) read these full text articles to assess eligibility. Disagreements between reviewers were resolved by consensus.

Overlap

We calculated total overlap (RCTs in included reviews), and overlap for each time point, outcome, and type of exercise therapy, following the formula proposed by Pieper et al. [39]. We present overlap as a percentage and as corrected covered area (CCA). Interpretation of CCA, expressed as a percentage: 0–5 = slight overlap, 6–10 = moderate overlap, 11–15 = high overlap, and > 15 = very high overlap.
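As an illustration of the overlap calculation, the sketch below computes the CCA as (N − r) / (r × c − r), where N is the total number of primary-study inclusions across reviews (duplicates counted), r is the number of unique primary studies, and c is the number of reviews. The review and trial identifiers are hypothetical, and the handling of non-integer percentages at the band boundaries is an assumption.

```python
def corrected_covered_area(inclusion: dict[str, set[str]]) -> float:
    """CCA as a proportion; `inclusion` maps review id -> primary study ids."""
    c = len(inclusion)                                 # number of reviews
    unique_studies = set().union(*inclusion.values())
    r = len(unique_studies)                            # unique primary studies
    n = sum(len(s) for s in inclusion.values())        # inclusions incl. duplicates
    if c < 2 or r == 0:
        raise ValueError("need at least two reviews and one primary study")
    return (n - r) / (r * c - r)


def interpret_cca(cca: float) -> str:
    """Map a CCA proportion to the overlap bands used in the text."""
    percent = cca * 100
    if percent <= 5:
        return "slight"
    if percent <= 10:
        return "moderate"
    if percent <= 15:
        return "high"
    return "very high"


# Hypothetical citation matrix: three reviews, four unique trials.
example = {
    "review_A": {"trial_1", "trial_2", "trial_3"},
    "review_B": {"trial_2", "trial_3"},
    "review_C": {"trial_3", "trial_4"},
}
cca = corrected_covered_area(example)
print(f"CCA = {cca:.3f} ({interpret_cca(cca)} overlap)")
```

When every review includes the same single trial, N = c and r = 1, so the CCA equals 1.0.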

Assessment of methodological quality of included reviews

We used A MeaSurement Tool to Assess systematic Reviews (AMSTAR) to assess the methodological quality of the included systematic reviews [40]. AMSTAR has been shown to be a valid and reliable tool to assess methodological quality of systematic reviews [40, 41]. During the process of doing this review, an updated version, AMSTAR 2, was published [42]. Because our assessment was already in progress and the original AMSTAR has more documented evidence of reliability and validity in the literature, we chose to continue to use this tool. Two reviewers (MK, AB or SB) independently performed this assessment. Before the actual assessment, a pilot test was carried out by five reviewers (MK, SB, AB, ML, and LN) to evaluate interrater reliability for each of the eleven questions. The result showed good interrater reliability (87% agreement, Fleiss' kappa 0.58), resembling earlier tests [41]. A second pilot test was carried out by MK, AB, and SB on another review to examine whether agreement had improved. The second test showed 100% agreement. Disagreements in the assessments were resolved in a consensus dialog after comparing discrepancies between assessors.
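For readers unfamiliar with the interrater statistic mentioned above, the following is a minimal sketch of Fleiss' kappa for a fixed number of raters per item. The ratings are hypothetical. The mean pairwise agreement returned alongside kappa is one common way to express percent agreement; how the percentage reported here was computed is not stated, so that correspondence is an assumption.

```python
def fleiss_kappa(counts: list[list[int]]) -> tuple[float, float]:
    """Return (mean pairwise agreement, Fleiss' kappa).

    counts[i][j] = number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of raters")

    # Observed agreement per item, then averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# Hypothetical pilot: five raters answer four yes/no items; columns are [yes, no].
pilot = [[5, 0], [4, 1], [0, 5], [5, 0]]
agreement, kappa = fleiss_kappa(pilot)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than raw agreement because it discounts the agreement expected by chance given how often each category is used.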

Data extraction

One reviewer (MK) extracted data from the included reviews and another reviewer (AB or SB) checked the extraction for accuracy. We extracted the data into a purpose-built data extraction form, adapted from a Cochrane form [43]. We extracted data primarily at the systematic review level, but supplemented this, when necessary, by extracting data at the RCT level. We verified all point estimates at the RCT level. If there were any discrepancies between an RCT and the systematic review in which it was included, we used data from the RCT. We only extracted data from populations with acute LBP. Each conclusion from the reviews and the RCTs on which this conclusion was based was extracted to enable an overall estimate of the evidence.

Data synthesis

We synthesized the data quantitatively when possible, and otherwise qualitatively. We present the findings from the systematic reviews in summary of findings (SoF) tables for each outcome, type of exercise therapy, and time point. We present continuous data with weighted mean difference (WMD) and 95% confidence interval (CI), and dichotomous data with risk ratio (RR) and 95% CI. We used GRADEpro to create the SoF tables [44].
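As a worked illustration of the dichotomous effect measure, the sketch below computes a risk ratio with a 95% CI from a 2×2 table using the standard large-sample formula on the log scale. The event counts are hypothetical, and this is not necessarily the exact routine used by the software or by the included reviews.

```python
import math


def risk_ratio(events_tx: int, n_tx: int, events_ctrl: int, n_ctrl: int,
               z: float = 1.96) -> tuple[float, float, float]:
    """Return (RR, lower, upper) for treatment versus control."""
    if min(events_tx, events_ctrl) <= 0:
        raise ValueError("zero cells need a continuity correction (not handled here)")
    rr = (events_tx / n_tx) / (events_ctrl / n_ctrl)
    # Standard error of log(RR).
    se_log = math.sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctrl - 1 / n_ctrl)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical trial: 10/50 recurrences with exercise, 20/50 with control.
rr, lower, upper = risk_ratio(10, 50, 20, 50)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

For a risk ratio, the null value is 1, so an interval that excludes 1 corresponds to a statistically significant difference.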

We considered meta-analyses feasible if clinical homogeneity in the comparisons was present, meaning that interventions, comparisons, time points, and outcomes were similar. Clinical homogeneity was assessed by at least two authors (MK and AK or SB) and agreed upon in consensus discussions. We extracted data for the meta-analyses from the original studies. To enable comparison, we rescaled the data for pain to 0–100 points (mm); e.g., a numeric pain rating score of 3 on a scale from 0 to 10 was rescaled to 30. The rescaling of the data meant that we could use WMD also for the aggregated effects, which is easier to interpret than the standardized mean difference (SMD). Effects were estimated using the inverse variance heterogeneity model, which is a robust estimation method for handling issues of underestimation of the statistical error and overconfident estimates [45]. We defined statistical significance as the 95% confidence interval not including zero. We used the free meta-analysis software MetaXL 5.3 for the statistical analyses [46].
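The rescaling and pooling steps can be sketched as follows. This is an illustration only: the pooling function uses inverse-variance weights for the point estimate and inflates the variance with a DerSimonian-Laird between-study variance, in the spirit of the inverse variance heterogeneity model. It is not a reimplementation of MetaXL, and the trial data are hypothetical.

```python
import math


def rescale_to_100(score: float, scale_min: float, scale_max: float) -> float:
    """Linearly rescale a pain score to a 0-100 scale."""
    return (score - scale_min) * 100.0 / (scale_max - scale_min)


def pool_inverse_variance(effects: list[float], variances: list[float],
                          z: float = 1.96) -> tuple[float, float, float]:
    """Return (pooled mean difference, lower, upper)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    norm = [w / total for w in weights]              # weights summing to 1
    pooled = sum(w * e for w, e in zip(norm, effects))

    # Cochran's Q and the DerSimonian-Laird estimate of tau^2.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    c = total - sum(w * w for w in weights) / total
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0

    # Variance of the pooled estimate, allowing for heterogeneity.
    variance = sum(w * w * (v + tau2) for w, v in zip(norm, variances))
    se = math.sqrt(variance)
    return pooled, pooled - z * se, pooled + z * se


def is_significant(lower: float, upper: float) -> bool:
    """Significant if the confidence interval excludes zero."""
    return lower > 0 or upper < 0


# Hypothetical mean differences (mm on a 0-100 scale) and their variances.
md, lower, upper = pool_inverse_variance([-10.0, -5.0], [4.0, 4.0])
print(rescale_to_100(3, 0, 10), md, lower, upper, is_significant(lower, upper))
```

Keeping the inverse-variance weights while widening the interval means that heterogeneity makes the pooled estimate less certain without shifting weight toward smaller trials.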

Assessment of certainty of evidence

To evaluate certainty in the overall body of evidence, we used the GRADE approach [47]. Certainty of evidence refers to how certain it is that the true effect of an intervention lies within a chosen range or on one side of a specified threshold [48]. In this systematic review, we used either 95% confidence intervals as the chosen range or the established minimal important difference (MID) as the specified threshold. When available, we used the GRADE and risk of bias assessments made by the authors of the included reviews, for each outcome, comparison, and time point [33]. When not available, we applied GRADE and appraised the potential limitations due to risk of bias, inconsistency, imprecision, and indirectness ourselves, based on the original studies. We did not assess publication bias due to the small number of studies in most comparisons.

Results

Search results

The searches retrieved 2602 records. After screening of titles and abstracts, 134 full-text assessments were carried out. We included 24 systematic reviews with a total of 572 RCTs (overlap not accounted for). The number needed to read was 103 and the precision of the search was 0.97%. Six of the 24 reviews were Cochrane reviews. Of the 572 RCTs, 25 publications reporting findings from 21 RCTs with a total of 2685 participants examined exercise therapy for acute LBP. Data for a total of 69 comparisons were extracted from the reviews. Eleven RCTs, with a total of 1397 participants, were included in meta-analyses. Overlap was high, 76%, with a corrected covered area of 0.14. The flowchart in Fig. 1 illustrates the selection process. The supplementary search performed in September 2019 did not result in any additional reviews that met the inclusion criteria.

Fig. 1 Flow diagram of the selection process

Description of included reviews

The included reviews were published between 1993 and 2018. Included RCTs in the reviews with data for an acute population were published between 1982 and 2013. Characteristics of the included reviews are presented in Table 2. Excluded reviews are presented in Additional file 3, with reason for exclusion.

Table 2 Characteristics of included systematic reviews and their included randomized controlled trials

Exercise therapy

Stabilization exercise and McKenzie therapy were two specific types of exercise therapy in the included reviews that were possible to classify and assess separately. When a mix of different types of exercise was used as the intervention, we used the term general exercise therapy. For stabilization exercise (including co-contraction of multifidi and transversus abdominis muscles or facilitation of abdominal and/or lumbar extensor muscles, initially at low levels of contraction with progression), various terms were used in the reviews: stabilization exercises [82], specific stabilization exercises [68], specific spinal stabilization exercises [78], segmental stabilizing exercises [28], or motor control exercises [17, 84]. McKenzie therapy [70], directional preference management [88], directional preference exercise [87], McKenzie approach [81], and McKenzie method [76, 89] were used to describe the concept officially named The McKenzie Method® of Mechanical Diagnosis and Therapy® (2018).

Comparisons

The following interventions were used as comparators: usual care, advice, educational booklet, general practitioner management, medical management, spinal manipulative therapy, manual therapy, NSAID, activity of daily life, no treatment, bed rest, and sham ultrasound.

Treatment duration and frequency

Treatment periods for the exercise therapy groups ranged from 3 days to 8 weeks. Frequency ranged from one to three visits per week. Additional home exercise frequency ranged from three times per day to once every hour.

Outcomes

Different outcomes were reported in the systematic reviews: pain, bothersomeness, function, functional status, disability, recurrence, patient satisfaction, global improvement, time to recovery, mobility, loss of work days, back to work, muscle thickness, sick leave, activity of daily living, and adverse effects.

Outcome measures

Pain was measured using the Visual Analogue Scale (VAS) or the Numeric Rating Scale (NRS). Disability was measured using the Roland Morris Disability Questionnaire (RMDQ) or the Oswestry Disability Index (ODI). Data were often transformed and presented with MD or SMD, with 95% CI or standard deviation (SD). Recurrence was measured as frequency or number of patients affected, and transformed, when possible, to risk ratio (RR) with 95% CI. Otherwise (more frequently in older systematic reviews), outcome measures from original RCTs were presented with or without a transformation to a simple negative or positive effect, with or without statistical significance (p value).

Time points

Twenty-three reviews reported post-treatment (closest to 1 week) outcomes, 19 short-term follow-up (closest to 12 weeks), two intermediate-term follow-up (closest to 26 weeks), and 20 long-term follow-up (closest to 52 weeks).

Methodological quality of included reviews

Results of the AMSTAR quality assessment are presented in Table 3. The median overall AMSTAR score (of a maximum of 11) was 6.5 (range 2–11). Five of 24 reviews were assessed as being of high quality (AMSTAR score 9–11) [17, 23, 69, 85, 86], nine of moderate quality (AMSTAR score 5–8) [28, 64, 70, 73, 76, 80, 81, 88, 89], and ten of low quality (AMSTAR score 0–4) [49, 57, 62, 63, 68, 71, 78, 82, 84, 87]. The main limitation was failure to report conflicts of interest (in either the SRs or their included RCTs), followed by failure to report on publication bias and on a pre-determined strategy before conducting the review. Reviews produced by Cochrane groups were of higher methodological quality, median 10 (range 8–11), than non-Cochrane reviews, median 4.5 (range 2–8). Older reviews were more often of lower methodological quality: those conducted in the 1990s had a median of 4 (range 3–4), those in the 2000s a median of 6 (range 3–10), and those in the 2010s a median of 8.5 (range 2–11).
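The score bands used above (0–4 low, 5–8 moderate, 9–11 high) and the median summary can be expressed as a short sketch. The review labels and scores are hypothetical and do not correspond to the reviews in Table 3.

```python
from statistics import median


def amstar_band(score: int) -> str:
    """Classify an AMSTAR total score (0-11) into the bands used in the text."""
    if not 0 <= score <= 11:
        raise ValueError("AMSTAR total score must be between 0 and 11")
    if score >= 9:
        return "high"
    if score >= 5:
        return "moderate"
    return "low"


# Hypothetical scores for four reviews.
scores = {"review_a": 10, "review_b": 6, "review_c": 3, "review_d": 7}
bands = {name: amstar_band(score) for name, score in scores.items()}
overall_median = median(scores.values())
print(bands, overall_median)
```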

Table 3 AMSTAR quality assessment of included reviews

Quality assessment of the included RCTs, as assessed by the authors of the included reviews

Four RCTs [27, 59, 65, 77] were consistently assessed as being of high quality, two RCTs of moderate quality [54, 61], and six RCTs [50, 51, 55, 56, 58, 75] of low quality. The assessment of the remaining nine RCTs [52, 53, 60, 66, 67, 72, 74, 79, 83] varied between low and high quality. The main limitations of the RCTs were small sample sizes and lack of blinding of participants, intervention providers, and outcome assessors.

Review conclusions for acute populations

Twenty-one of the 24 included reviews concluded that there was no difference in effects and three reviews made no conclusive statement about the difference in effect between exercise therapy and any comparator for the acute population. Three reviews concluded that there were positive effects of exercise therapy, but only for the outcome recurrence at long-term follow-up. However, this was based on the same, single RCT [72]. Of the RCTs included in the reviews, four showed results in favor of exercise therapy for some outcomes, 14 resulted in no difference, and three RCTs showed results in favor of the comparator.

Outcomes

Findings are summarized below and presented in detail in SoF tables 4-12 (Additional file 4).

Pain

General exercise therapy

Twelve reviews [17, 23, 49, 57, 62,63,64, 69, 73, 76, 80, 86], including eight RCTs [50,51,52, 59, 61, 66, 75, 77] of low to high quality, addressed effects of general exercise therapy on pain. Overlap for the various time points ranged from 75 to 100%, with corrected covered areas of 0.25–0.70.

We were able to pool data for one comparison. Meta-analysis of four RCTs [51, 59, 66, 75] comparing general exercise therapy with usual care showed no significant difference in post-treatment effects on pain (Fig. 2).

Fig. 2 Post-treatment effects on pain of general exercise therapy versus usual care

No important difference in effects of general exercise therapy on pain was reported for any comparison or time point (SoF table 4). Evidence ranging from very low to moderate certainty suggests that general exercise therapy probably results in little or no important difference in pain, at any time point, when compared with any of the investigated control interventions.

Stabilization exercise

Seven reviews [17, 28, 68, 71, 78, 82, 84], including three RCTs [72, 77, 83] of low to high quality, addressed effects of stabilization exercise on pain. Overlap was 67% with a corrected covered area of 0.39. No important differences in effects of stabilization exercise on post-treatment or short-term pain were reported (SoF table 5). Intermediate- or long-term effects were not reported. Evidence ranging from low to moderate certainty suggests no important difference in pain at post-treatment and short-term, when comparing stabilization exercise with other exercise therapies. The evidence is very uncertain whether stabilization exercise plus medical management reduces post-treatment pain when compared with medical management alone.

McKenzie therapy

Thirteen reviews [23, 49, 62,63,64, 69, 70, 76, 81, 86,87,88,89], including seven RCTs [27, 53, 60, 65, 67, 74, 79] of low or high quality, addressed effects of McKenzie therapy on pain. Overlap was 75% with corrected covered areas of 0.24–0.45.

We were able to pool data for four comparisons. No significant difference was seen in post-treatment or short-term pain when McKenzie therapy was compared with usual care (Fig. 3a–b), or in post-treatment pain when McKenzie therapy was compared with spinal manipulative therapy (Fig. 3d). A significant difference was seen between McKenzie therapy and an educational booklet: total MD −11.30 (95% CI −18.15 to −4.45) (Fig. 3c). However, the effect did not exceed the MID of 15 mm on the VAS. Findings for other comparisons are presented in SoF table 6.
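The distinction between statistical significance and clinical relevance in the educational booklet comparison can be made explicit with a small sketch. It uses the pooled estimate and MID quoted above; comparing the absolute point estimate with the MID is an assumption about how the threshold was applied, and the function name is hypothetical.

```python
def classify_effect(point: float, lower: float, upper: float,
                    mid: float) -> tuple[bool, bool]:
    """Return (statistically significant, clinically important).

    Significant: the 95% CI excludes zero.
    Clinically important: the absolute point estimate reaches the MID.
    """
    significant = lower > 0 or upper < 0
    important = abs(point) >= mid
    return significant, important


# McKenzie therapy vs. educational booklet, post-treatment pain (0-100 mm).
significant, important = classify_effect(-11.30, -18.15, -4.45, mid=15.0)
print(f"significant: {significant}, exceeds MID: {important}")
```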

Fig. 3 a McKenzie vs. usual care, post treatment. b McKenzie vs. usual care, short term. c McKenzie vs. educational booklet, post treatment. d McKenzie vs. spinal manipulative therapy, post treatment

At intermediate and long term, no important difference in effects of McKenzie therapy on pain was reported compared with usual care [69], educational booklet [69], spinal manipulative therapy [65], or NSAID [54]. Evidence ranging from very low to moderate certainty suggests no important difference in pain at any time point, when comparing McKenzie therapy with any of the control interventions.

Disability

General exercise therapy

Nine reviews [23, 49, 62,63,64, 69, 73, 80, 86], including six RCTs [50,51,52, 59, 66, 75] of predominantly low to moderate quality, addressed effects of general exercise therapy on disability. Overlap was 100% with corrected covered areas of 0.33–0.40.

We were able to pool data for three comparisons. Meta-analysis of three RCTs [52, 59, 66] of general exercise therapy versus usual care showed a statistically significant difference in post-treatment effects on disability in favor of usual care: MD 2.62 (95% CI 0.52 to 4.72) (Fig. 4a). However, this effect did not exceed the MID. Meta-analysis of two RCTs [59, 66] on short-term effects and of two RCTs [59, 66] on long-term effects on disability of general exercise therapy versus usual care showed no significant difference (Fig. 4b, c, SoF table 7).

Fig. 4 a General exercise therapy vs. usual care, post treatment. b General exercise therapy vs. usual care, short term. c General exercise therapy vs. usual care, long term

No important difference in effects of general exercise therapy on post-treatment disability was reported compared with sham ultrasound [69], spinal manipulative therapy [69], hot pack [69], or NSAID [64]. In comparison with sham ultrasound [69], hot-pack [69], bed rest, or usual care [64], no important difference in effects of general exercise therapy on short-term disability was reported. None of the included reviews reported intermediate-term effects. In comparison with sham ultrasound [69] or bed rest [64], no important difference in long-term effects of general exercise therapy on disability was reported. Evidence of low to moderate certainty suggests no important difference in disability at any time point, when comparing general exercise therapy and usual care.

Stabilization exercise

Seven reviews [17, 28, 68, 71, 78, 82, 84], including three RCTs [72, 77, 83] of low to high quality, addressed effects of stabilization exercise on disability. Overlap was 67% with a corrected covered area of 0.39. No important difference in effects of stabilization exercise in post-treatment, short-term, or long-term disability was reported (SoF table 8). No intermediate-term effects were reported. Evidence of very low to low certainty suggests no important difference in disability at any time point, when stabilization exercise is compared with any of the control interventions examined. The evidence is very uncertain whether stabilization exercise plus medical management reduces post-treatment disability when compared with medical management alone.

McKenzie therapy

Seven reviews [23, 69, 70, 76, 87,88,89], including seven RCTs [27, 60, 65, 67, 74, 79, 83] ranging from low to high quality, addressed effects of McKenzie therapy on disability. Overlap was 71–75% with corrected covered areas of 0.25–0.42.

We were able to pool data for four comparisons. No important differences were seen in post-treatment or short-term disability when McKenzie therapy was compared with usual care, educational booklet, or spinal manipulative therapy (Fig. 5a–d).

Fig. 5 a McKenzie vs. usual care, post treatment. b McKenzie vs. educational booklet, post treatment. c McKenzie vs. spinal manipulative therapy, post treatment. d McKenzie vs. usual care, short term

Findings for other comparisons are presented in SoF table 9. In comparison with educational booklet [70] or NSAID [54], no important difference in intermediate-term effects of McKenzie therapy on disability was reported. In comparison with usual care [69], educational booklet [69], spinal manipulative therapy [23], or NSAID [54], no important difference in long-term effects of McKenzie therapy on disability was reported. Evidence of very low to moderate certainty suggests that there is no difference in disability at any time point, between McKenzie therapy and usual care, spinal manipulative therapy, or NSAID. Evidence of moderate certainty suggests that McKenzie therapy likely does not reduce disability, at any time point, when compared with an educational booklet.

Recurrence

General exercise therapy

None of the included reviews addressed post-treatment, short-term, or intermediate-term effects on recurrence. Three reviews [64, 76, 85] addressed long-term effects of general exercise therapy on recurrence. Two of those [64, 85] reported results from the same RCT [59]. The third review [76] included one RCT [61] measuring long-term effects on recurrence, but this was not reported in the review. The RCTs were of moderate to high quality. Overlap was 50% with a corrected covered area of 0.25.

No significant effects of general exercise therapy on recurrence in comparison with sham ultrasound, usual care, or ice-pack were reported for any time point (SoF table 10). Evidence of moderate certainty suggests that general exercise therapy is not more effective in preventing recurrence than placebo or usual care in the long term. The evidence is very uncertain whether general exercise therapy reduces recurrence when compared with ice-pack.

Stabilization exercise

None of the included reviews reported post-treatment, short-term, or intermediate-term effects of stabilization exercise on recurrence. Eight reviews [17, 28, 68, 71, 78, 82, 84, 85] addressed long-term effects of stabilization exercise on recurrence, all based on one RCT [72]. Overlap was 100% with a corrected covered area of 1.0. Quality assessment of the RCT varied from low to high quality, affecting the level of evidence for stabilization exercise in the reviews, which ranged from very low to moderate evidence. Systematic reviews of low methodological quality tended to overestimate the quality of this RCT. Recurrence was measured as number of persons with recurrence in all eight reviews, and also as frequency of recurrence in one review.

Stabilization exercise plus medical management versus medical management alone resulted in lower relative risk for recurrence, RR 0.36 (95% CI 0.18 to 0.72), while there was no significant difference in recurrence frequency, MD −1.40 (95% CI −3.16 to 0.36) (SoF table 11). The evidence is very uncertain whether stabilization exercise plus medical management reduces the long-term risk for recurrence when compared with medical management alone.

McKenzie therapy

None of the included reviews reported any post-treatment or short-term effects of McKenzie therapy on recurrence. One review [70] addressed intermediate-term effects on recurrence but did not present results that were available in an included RCT [54], assessed as of moderate quality in the review [70]. Three reviews [70, 76, 85] addressed long-term effects on recurrence, in which three RCTs [53, 54, 65], of low to high quality, presented results for McKenzie therapy. Overlap was 50% with a corrected covered area of 0.25.

There was no difference in intermediate or long-term recurrence frequency with McKenzie therapy versus NSAID [54], and no long-term effects on recurrence compared with simple back educational interventions (SoF table 12). The evidence is very uncertain whether McKenzie therapy reduces the intermediate or long-term risk for recurrence when compared with simple back education or NSAID.

Adverse effects

Adverse effects were addressed in ten (42%) of the reviews [17, 23, 28, 69, 73, 80, 81, 85, 86, 88]. No review reported any adverse effects specific to the acute population. Of the included RCTs, two [54, 79] addressed adverse effects, but neither reported any adverse effects.

Discussion

The main finding of this systematic review is that moderate-certainty evidence suggests no superior effect of exercise therapy versus any comparator, for any of the examined outcomes, at any time point. Low-certainty evidence suggests that McKenzie therapy may be superior, with a small effect, to a simple educational booklet for pain at post-treatment follow-up, but there were no other differences between McKenzie therapy and other interventions. Very low-certainty evidence suggests that stabilization exercise together with medical management may be superior to medical management alone for recurrence at long-term follow-up, but since the evidence is of very low certainty, we cannot draw any firm conclusions. There were no other differences between stabilization exercise and other interventions.

Adverse effects were rarely addressed in the reviews or in the included RCTs and no adverse effects were reported, indicating a possibility that they were underreported. Mild reactions with increased back pain and muscle soreness were reported in one review [69], but it was unclear in that review whether they were in the acute population or the chronic population.

The body of evidence from 24 systematic reviews consistently shows that exercise therapy in the acute phase of LBP does not yield any clinically important difference compared with any other treatment, for most outcomes and most time points. The most likely explanation for the lack of effect is the generally good prognosis (natural course) of acute LBP. The lack of long-term effect might be explained by factors such as insufficient treatment duration, frequency, or intensity of the exercise protocols. Physiology tells us that the effect of 1 to 8 weeks of exercise may not remain 1 year later, unless exercise is maintained [90]. However, relevant post-treatment and short-term effects on pain or disability are also lacking. It therefore seems justified to conclude that the role of exercise therapy in acute LBP is very limited, at best.

The lack of effect of McKenzie therapy has been attributed in previous reviews [70, 76] to the improper use of the method, i.e., without addressing the patient’s directional preference. However, the synthesized data in our systematic review do not support any clinically relevant effect, even when this issue has been addressed. A recent review, not included in our systematic review, showed a significant difference in effects on pain and disability between RCTs that adhered to McKenzie core principles and non-adherent RCTs, but no difference between adherent RCTs and other comparators [91].

For stabilization exercise, there is no convincing benefit over other types of exercise therapy.

The most recommended “first line” care (advice to stay active and reassurance of a favorable prognosis) [12, 13] was rarely used as comparison. Instead, a wide variety of interventions, such as spinal manipulative therapy, ice-pack, hot-pack, NSAID, educational booklets, manual therapy, or medical management with prescribed bed rest, were used as comparison.

Not all included RCTs point in the direction of no effect of exercise therapy. Small RCTs tended to favor exercise therapy, suggesting a potential publication bias. When compared with spinal manipulative therapy, the results often pointed in opposite directions. In contrast, larger trials with low risk of bias pointed in the direction of no effect or no minimal important difference when exercise was compared with less strenuous interventions. The included reviews follow the same pattern: higher-quality reviews report no difference, while lower-quality reviews suggest a positive effect of exercise therapy. Lower-quality reviews typically highlight a marginal effect of exercise therapy rather than stating that the effects are not clinically relevant.

The most prominent issues with regard to risk of bias and the resulting uncertainty of the evidence are the small number of RCTs in the included reviews and the small sample size of those RCTs, resulting in a lack of power to detect statistically significant differences. Lack of blinding of patients, intervention providers (which is difficult to achieve with these interventions), and outcome assessors was a potential study limitation that further reduces our confidence in the effects of exercise therapy.

Overlap is an important issue to describe and consider when producing a systematic review of systematic reviews [39]. Our systematic review showed a high overlap, which we handled by presenting it as a percentage and a corrected covered area for each outcome and each comparison. This minimized the risk of bias and enabled us to judge the overall certainty of evidence for the broader term of exercise therapy and the two more specific types, i.e., stabilization exercise and McKenzie therapy.
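To illustrate the two overlap measures, the following minimal sketch computes the corrected covered area using the commonly cited definition CCA = (N − r) / (rc − r), where N is the total number of RCT inclusions across reviews, r is the number of unique RCTs, and c is the number of reviews. The citation matrix is hypothetical and is not data from this review. The percentage overlap is assumed here to mean the share of RCTs included in more than one review, which may differ from the definition applied in the analysis.

```python
from collections import Counter


def overlap_measures(citation_matrix):
    """Return (percentage overlap, corrected covered area).

    citation_matrix maps each review to the set of RCTs it includes.
    Percentage overlap is ASSUMED to be the share of unique RCTs that
    appear in more than one review.
    """
    # Count how many reviews include each RCT.
    counts = Counter(rct for rcts in citation_matrix.values() for rct in rcts)
    r = len(counts)              # unique RCTs (rows)
    c = len(citation_matrix)     # reviews (columns)
    n = sum(counts.values())     # total inclusions, duplicates counted
    if r == 0 or c < 2:
        raise ValueError("need at least one RCT and two reviews")
    pct_overlap = 100 * sum(1 for k in counts.values() if k > 1) / r
    cca = (n - r) / (r * c - r)
    return pct_overlap, cca


# Hypothetical citation matrix (illustration only).
example = {
    "review_A": {"rct_1", "rct_2"},
    "review_B": {"rct_2", "rct_3"},
    "review_C": {"rct_3"},
    "review_D": {"rct_2", "rct_4"},
}

pct, cca = overlap_measures(example)
print(f"overlap = {pct:.0f}%, CCA = {cca:.2f}")
```

In this hypothetical matrix, two of the four RCTs appear in more than one review, and three of the seven inclusions are repeats, so the CCA is 3 / 12.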

We are not aware of any other systematic review of systematic reviews addressing exercise therapy for acute LBP. Swinkels et al. [92] addressed the effect of exercise therapy for nonspecific LBP in their overview. That overview included four reviews, of which two [57, 69] were included in our systematic review. The other two reviews addressed non-acute populations. Swinkels et al. [92] concluded, based on the study by Hayden et al. [69], that exercise therapy is as effective as either no treatment or other non-exercise interventions at short-, intermediate-, and long-term follow-up. We do not disagree with that conclusion but conclude, based on the studies included in our systematic review, that exercise therapy does not result in any minimal important difference in effect compared with other interventions. Maher et al. [93] concluded, also based on the study by Hayden et al. [69], that high-quality evidence exists for no difference between exercise therapy versus sham treatment or other conservative treatments. While we agree with their conclusion, our analysis only supports moderate certainty of evidence for this comparison, suggesting that future studies may change our confidence in the estimate of effects.

An updated publication of recommendations in international clinical guidelines showed that exercise therapy is recommended for acute LBP in three of 14 guidelines and that the other 11 guidelines provided inconsistent recommendations on exercise therapy for acute LBP [94]. The Danish guidelines [95] recommend exercise therapy based on low-quality evidence from seven RCTs, including a population with acute LBP (in their definition up to 12 weeks’ duration). The authors made the recommendation based on a trend in the results favoring exercise therapy. Five of these RCTs are included in our systematic review. Our findings do not support these recommendations. However, recommendations in guidelines are based on more aspects than solely evidence from systematic reviews. Patients’ preferences, clinicians’ experiences, costs, availability, and safety are examples of other aspects that are considered and which could explain the discrepancy between recommendations and evidence.

Some of the limitations in this systematic review may have introduced potential biases. The low methodological quality of some of the included reviews and their underlying RCTs contributes to the low certainty of evidence. For most outcomes and time points, the total number of participants was low. The inclusion of the same RCTs in many of the reviews caused a high overlap, and the varying results of review authors’ methodological quality assessments of some RCTs are a further cause for concern. Furthermore, our meta-analyses were based on aggregate data, which entails a potential for ecological fallacy. A difference in outcomes can be significant in several subgroups, but when combined, this difference may disappear or even reverse; a fallacy known as Simpson’s paradox [96]. Our overall GRADE assessment was based on a combination of assessments made by the systematic review authors and ourselves. This combination may entail inconsistency in assessments, as reliability between the assessments made by the authors of the systematic reviews and those made by our research group is unknown.
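The reversal described as Simpson’s paradox can be made concrete with a small numerical sketch. All counts below are hypothetical and chosen only to show the arithmetic; they are not data from any included RCT, and the subgroup labels are assumptions for illustration.

```python
# Hypothetical (recovered, total) counts per arm within each subgroup.
subgroups = {
    "mild":   {"exercise": (18, 20), "comparator": (64, 80)},
    "severe": {"exercise": (40, 80), "comparator": (8, 20)},
}


def rate(recovered, total):
    """Proportion recovered in one arm."""
    return recovered / total


def pooled_rate(arm):
    """Proportion recovered in one arm after collapsing the subgroups."""
    recovered = sum(arms[arm][0] for arms in subgroups.values())
    total = sum(arms[arm][1] for arms in subgroups.values())
    return recovered / total


for name, arms in subgroups.items():
    print(name, rate(*arms["exercise"]), rate(*arms["comparator"]))
print("pooled", pooled_rate("exercise"), pooled_rate("comparator"))
```

In this example the exercise arm has the higher recovery proportion within each subgroup, yet the lower proportion once the subgroups are pooled, because the two arms contain very different mixes of mild and severe cases.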

We have followed available methodological guidance for conducting a systematic review of systematic reviews, but the guidance is evolving and several strategies are available. The choice of strategy will have an impact on the results and conclusion. Overlap could have been minimized by excluding reviews with the same RCTs included. Another possible strategy would have been to exclude reviews of lower methodological quality. However, then we would not have obtained a complete picture of the overall certainty of the evidence from all available reviews.

Implications for practice

LBP is among the most common reasons for which patients consult a physiotherapist or general practitioner in primary care [97]. It is important to provide accurate, timely, and effective management for this condition. The findings of this systematic review of systematic reviews do not suggest any benefit of using exercise therapy in the acute phase of LBP. None of the exercise types resulted in any effects, in any of the comparisons, which exceeded the established minimal important difference for pain and function [37]. This was true when compared with placebo (sham ultrasound), with less strenuous interventions (advice to stay active, reassurance of an optimistic prognosis, and educational booklet), and with other forms of exercise therapy. This is important knowledge that the physiotherapist and general practitioner need to adopt in their clinical practice. Exercise is still used by physiotherapists for acute LBP, although not to as great an extent as for subacute or chronic LBP [9]. Our findings imply that physiotherapists and general practitioners should be more reluctant to provide exercise therapy for acute LBP, and should instead more strongly stress the good prognosis and provide reassurance and advice to stay active. They also need to communicate this knowledge to their patients with acute LBP so that patient and therapist can make an informed treatment decision together. Good patient–therapist communication is essential to achieve a collaborative rehabilitation and engage patients in their treatment [98]. In accordance with the principles of evidence-based practice, the physiotherapist and general practitioner should integrate their patient’s preferences and values with their own clinical expertise and the research findings, to determine if and when exercise therapy could, or should, be the intervention of choice.

Future research

This systematic review of systematic reviews reveals many areas in which there is room for improvement in terms of rigorous conduct of RCTs that would enhance certainty of the evidence. Such improvement is necessary if we are to come to a more certain conclusion regarding the effects or non-effects of exercise therapy for acute LBP. Increasing sample size to reach sufficient power, standardizing outcomes and outcome measures, choosing relevant time points and relevant comparisons, and improving the reporting of conflicts of interest are some issues that would strengthen the certainty of evidence. Presenting study findings with minimally important differences and confidence intervals would enhance applicability of the findings.
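As a rough illustration of what sufficient power implies, the sketch below applies the standard normal-approximation formula for comparing two means with equal group sizes, n = 2(z₁₋α/₂ + z₁₋β)²(SD/MID)² per group. The minimal important difference of 15 points and the standard deviation of 25 points on a 0–100 scale are hypothetical values chosen for illustration only; they are not taken from this review or from reference [37].

```python
from math import ceil
from statistics import NormalDist


def n_per_group(mid, sd, alpha=0.05, power=0.80):
    """Participants per arm needed to detect a mean difference of `mid`.

    Normal approximation for a two-sided test with equal group sizes;
    ignores dropout and clustering.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sd / mid) ** 2)


# Hypothetical inputs: MID = 15 points, SD = 25 points (0-100 scale).
print(n_per_group(mid=15, sd=25))
```

Because the required sample size scales with the square of SD/MID, a smaller target difference or a more variable outcome raises the required sample size sharply.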

Assigning more weight to results from studies with low risk of bias or excluding studies with high risk of bias are strategies that could resolve discrepancies in existing reviews and their included RCTs. Our systematic review found more systematic reviews than RCTs. In view of the many systematic reviews published and the large extent of overlap, the need to conduct another systematic review is limited, unless new RCTs are published. However, it is questionable whether it would be worthwhile to invest public funding in new trials on exercise therapy for acute LBP. Small trials with a high risk of bias are expected to overestimate the true effect. If 24 systematic reviews and 21 RCTs do not show a clinically important effect of exercise therapy for acute LBP, scarce resources for research might be better spent on prevention or treatment of chronic LBP. Reducing the enormous burden of chronic LBP seems a priority world-wide [99].

We found the GRADE approach challenging to apply and believe it would benefit from further development and guidance for use in systematic reviews of systematic reviews, to facilitate assessment of the certainty of the evidence. GRADE was developed to assess certainty of evidence based on risk of bias and other criteria in primary studies, whereas in a systematic review of systematic reviews, the unit of analysis is the included reviews. Methodological quality of the underlying RCTs is an important component of the GRADE assessment, and we had to rely on satisfactory quality appraisal and reporting by the review authors. However, we found considerable variation in the quality appraisal, and hence GRADE assessment, among the included reviews. We attempted to use Pollock et al.’s algorithm for assigning GRADE levels [100], but did not find it suitable for this systematic review of systematic reviews. Greater consistency is needed with regard to how systematic review authors extract and present data and assess evidence, so that systematic review authors can rely on the underpinning data without having to go back to the original RCTs. Using GRADE for each outcome, time point, and intervention is feasible as long as the overlap is controlled in systematic reviews of systematic reviews.

Ethical considerations

When more systematic reviews than RCTs are conducted in a certain field, we need to consider other approaches to get answers. Doing more underpowered or biased RCTs is likely to further increase inconsistency and heterogeneity. Comparing one intervention to another, where neither intervention has any superior effect compared with no treatment or sham, will not increase the certainty of evidence. Maybe the right question to ask is why large RCTs are still missing, despite three decades of systematic reviews based on RCTs. Faas et al. [74] studied 493 participants in 1993 and since then, no other RCT has succeeded in matching this number of participants. Why? Patients are often the ones who participate in the underpowered and biased studies and in the end the ones who receive the interventions. It does not seem appropriate or ethical to continue including patients in trials that will not adequately answer the research question.

Conclusions

The findings of this systematic review of systematic reviews suggest that there is very low- to moderate-certainty evidence that exercise therapy of any type may result in little or no important difference in pain or disability in adult patients with acute LBP, compared with other interventions, at any of the follow-up points reported. It is uncertain whether stabilization exercise in the acute phase reduces the risk of recurrence. Contradictory findings were seen in some small RCTs of low methodological quality. Adverse effects seem rare, but the total sample is too small to draw firm conclusions.

Knowledge about the certainty of evidence for the effectiveness of exercise therapy is important for the physiotherapist in clinical primary care practice and should be used to inform treatment decisions. The knowledge should be communicated to the patient, together with other treatment options, so that a fully informed, joint decision about treatment can be made.