Background

Non-specific low back pain (LBP) can be described as low back pain without underlying cause or disease, and has a lifetime prevalence of 80% [1, 2]. Point prevalence ranges from 12% to 33%, with 90% of acute episodes recovering within six weeks [1, 3]. However, 62% of people experiencing their first episode of LBP will develop chronic symptoms lasting longer than one year, with 16% of people still sick listed from work at 6 months [4]. The UK health service spends more than £1 billion on related costs, including hospital and GP appointments and physiotherapy treatments, with similarly high costs seen in other developed countries [5, 6]. LBP is a major cause of long term sickness amongst the workforce, and has been estimated to cost UK employers as much as £624 million per year, with 119 million work days lost each year [7, 8].

In the UK patients with LBP are routinely referred to physiotherapy [6]. Treatment can involve a number of different techniques, including spinal manipulation, mobilisation, advice, general exercises and specifically tailored exercises [9]. It has been claimed that there is a link between dysfunction within the activation and timing of local spinal stabilisation muscles and back pain [10–12]. Consequently, a therapeutic exercise regime aimed at these muscles was developed, designed to ‘retrain’ motor skills and the activation dysfunction [12]. Despite doubts raised about this link between back pain and muscle activation, and about the effectiveness of such an exercise regime (known as stabilisation or ‘core stability’ exercises), it has grown in popularity and now ranks as the most common form of physiotherapy treatment in the UK for back pain [9, 13–15].

A 2008 systematic review by May and Johnson, which included 18 trials up to 2006, concluded that specific stabilisation exercises may be beneficial over no treatment, but went on to report that they were unlikely to produce an outcome better than any other form of exercise [13]. It has been suggested that the median survival time of a systematic review is 5.5 years, with 23% of systematic reviews being out of date within two years of publication [16]. Since 2006 there has been considerable growth in the evidence base, with a large number of new trials being published. In total there have been seven systematic reviews that have looked at stabilisation exercises [13, 17–22], three of which performed a meta-analysis [17, 21, 22]. Macedo et al [17] included studies published up to June 2008 and concluded that stabilisation exercises were no better than general exercise. In 2012 Wang et al [21] carried out a systematic review and also concluded there was no significant difference between ‘core stability’ and general exercises. However, Wang et al narrowly defined ‘core stability’ exercises as “exercises performed on unstable surfaces”, rather than using a broader definition based upon specific muscle activation. Furthermore, they only included randomised controlled trials (RCTs) that specifically compared the intervention with general exercise, rather than with any other alternative treatment, and only included people suffering back pain for more than three months. Consequently only five articles fulfilled their inclusion criteria [21]. Our systematic review uses a broader definition and comparison, similar to May and Johnson [13], and found 19 further articles to add to the original 18 [13]. In contrast to these results, more recently Byström et al [22] reported that stabilisation exercises were more favourable than general exercises. They searched the literature up to October 2011, but did not limit their participants to non-specific back pain and had far stricter inclusion criteria. Our review included 15 articles beyond those in Byström et al, thus providing the justification for a more up-to-date review.

This systematic review and meta-analysis was conducted to update the 2008 data by May and Johnson [13]. The primary aim of this analysis is to systematically review the most up-to-date literature to determine whether stabilisation (or ‘core stability’) exercises are an effective therapeutic treatment compared to an alternative treatment for people with non-specific low back pain. The secondary aim is to determine if stabilisation exercises are as effective as other forms of exercise, and to evaluate findings by meta-analysis if appropriate. This systematic review update followed the recommendations of the PRISMA statement [23].

Methods

Search strategy

An electronic database search of titles and abstracts, covering October 2006 to October 2013, was conducted on the following databases: (1) PubMed, (2) the Cumulative Index to Nursing and Allied Health Literature (CINAHL), (3) the Allied and Complementary Medicine Database (AMED), (4) the Physiotherapy Evidence Database (PEDro), (5) The Cochrane Library. Specific search strategies depended on the particular database being searched. For the keywords and the PubMed search strategy used see Table 1. Hand searches of the reference lists of included articles were also performed.

Table 1 PubMed search strategy

Study selection

For inclusion the studies had to meet the following criteria.

Participants

Adults recruited from the general population with non-specific low back pain of any duration. Low back pain was defined as, but not restricted to, pain and/or stiffness between the lower rib and buttock crease with or without leg pain. Studies of participants with specific pathology, such as systemic inflammatory diseases, prolapsed disc, spondylolisthesis, pregnancy related pain, fractures, tumours, infections or osteoporosis, were excluded.

Interventions

Primary intervention arm of stabilisation, or ‘core stability’, exercises defined as: facilitation of deep muscles of the spine (primarily transversus abdominis or multifidus) at low level, integrated into exercise, progressing into functional activity, according to Richardson et al [12], Norris et al [24] or O’Sullivan et al [25]. Comparison groups of any other intervention, placebo or control were considered appropriate.

Outcomes

Included studies were required to report an outcome measure of pain and/or functional disability.

Study design

Studies had to be full randomised controlled trials (RCTs), published in English, in a peer reviewed journal. Non-randomised and quasi-randomised studies were excluded.

Study selection

One reviewer (BS) conducted the electronic database searches and screened the titles and abstracts. Full copies of potentially eligible papers were retrieved and independently screened by two reviewers (BS and CL). Initial percentage agreement was 68%, and Cohen’s kappa was κ = 0.29, which is considered poor to fair agreement [26–28]. Disagreements were resolved by consensus without the need for a third reviewer (SM), who was available. Initial disagreements were due to: intervention criteria [29–33], study population [34, 35], study design [36–39] and duplication of results from another publication being missed [40, 41].
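As an illustration of how percentage agreement and Cohen’s kappa relate, the following minimal Python sketch computes both from a two-by-two table of include/exclude decisions. The counts used in the example are hypothetical and are not the screening data of this review, which are not reported at that level of detail; the function name `cohen_kappa` is likewise only illustrative.

```python
def cohen_kappa(both_include, only_a, only_b, both_exclude):
    """Return (observed agreement, Cohen's kappa) for two reviewers.

    Arguments are the four cells of the include/exclude agreement table:
    both include, only reviewer A includes, only reviewer B includes,
    both exclude.
    """
    n = both_include + only_a + only_b + both_exclude
    if n == 0:
        raise ValueError("no decisions to compare")
    observed = (both_include + both_exclude) / n
    # Marginal probability that each reviewer votes 'include'
    a_include = (both_include + only_a) / n
    b_include = (both_include + only_b) / n
    # Agreement expected by chance alone
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return observed, (observed - expected) / (1 - expected)


# Hypothetical counts for illustration only (not the review's data)
observed, kappa = cohen_kappa(both_include=20, only_a=5, only_b=10, both_exclude=15)
print(f"agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

Because kappa discounts the agreement expected by chance, it can be low even when raw percentage agreement looks moderate, which is the pattern reported above.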

Our review excluded participants with specific pathology, so all three reviewers (BS, CL and SM) verified any exclusion of studies from the 2008 review [13].

Data extraction

We extracted the following data from the included articles: study design, participant information, interventions and setting, follow-up period and outcome data [42]. These data were then compiled into a standard table by one reviewer (BS) and then independently checked and verified by a second reviewer (SM). Disagreements were resolved through consensus. A third reviewer (CL) was available in the event of an agreement not being reached, but was not required. Of the included articles three had inconsistencies within their text, figures or tables with regards to their results [33, 43, 44]. All were contacted by e-mail, and all gave clarification. One study published median outcome scores, and the authors were contacted and provided mean outcome data [45]. Effectiveness was judged for short term (≤3 months from randomisation), medium term (>3 and <12 months) and long term (≥12 months), as recommended by the 2009 Updated Method Guidelines for Systematic Reviews in the Cochrane Back Review Group and in keeping with the original 2008 systematic review [13, 46].

Data from the 2008 review was taken directly from the published review [13].

Quality assessment

Studies meeting the inclusion criteria were assessed for methodological quality and risk of bias using the PEDro scale [47]. The 11 item PEDro scale was developed by Verhagen et al using the Delphi consensus technique to develop a list of criteria thought by experts in the field to measure methodological quality [48]. The PEDro scale consists of the following items: (1) Were eligibility criteria specified? (2) Were all subjects randomly allocated? (3) Were allocations concealed? (4) Were the groups similar at baseline? (5) Was there blinding of all subjects? (6) Was there blinding of all therapists? (7) Was there blinding of all assessors? (8) Were there measures of at least one key outcome for more than 85% of the subjects initially allocated to groups? (9) Did all subjects for whom outcome measures were available receive the treatment or control condition as allocated or, where this was not the case, were data for at least one key outcome analysed by “intention to treat”? (10) Were the results of between group statistical comparisons reported for at least one key outcome? (11) Did the study have both point measures and measures of variability for at least one key outcome? [47]. Items 2 – 9 refer to the internal validity of a paper, and items 10 and 11 refer to the statistical analysis, ensuring sufficient data to enable appropriate interpretation of the results. Item 1 is related to the external validity and is therefore not included in the total PEDro score [49].

All included articles were already scored within the PEDro database, and these data were extracted from the PEDro website [50]. Based upon the original 2008 paper and precedent within the literature, studies scoring ≥6 out of 10 were considered to be high quality [13, 51].

Statistical analysis

Pain and disability mean scores, along with their measures of variability (standard deviation or 95% confidence interval), were transformed to a score ranging from 0 to 100 [52]. All data analyses were performed using the OpenMetaAnalyst software [53]. Statistical between study heterogeneity was assessed with the I² statistic, and this review considered 25% low, 50% moderate and 75% high [54]. If trials were considered sufficiently homogeneous then outcome data were pooled according to outcome (pain or disability), methodological quality (PEDro scores <6, or ≥6) and follow-up period. Due to the inherent heterogeneity in low back pain within the literature, the DerSimonian and Laird random effects model was used [55].
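The pooling steps described above can be sketched in a few lines of Python. This is a minimal illustration of the standard DerSimonian and Laird estimator and the I² statistic, not a reproduction of the OpenMetaAnalyst implementation used in this review; the function names, the returned dictionary keys, and the use of a normal-approximation 95% interval (z = 1.96) are assumptions made for the example.

```python
import math


def rescale_to_100(value, scale_min, scale_max):
    """Map a score on [scale_min, scale_max] onto a 0 to 100 scale."""
    return (value - scale_min) / (scale_max - scale_min) * 100.0


def dersimonian_laird(effects, std_errors):
    """Pool study mean differences with a DerSimonian-Laird random effects model.

    effects    -- per-study mean differences (already on a common scale)
    std_errors -- per-study standard errors of those differences
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    k = len(effects)
    # Inverse-variance (fixed effect) weights and pooled mean
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    fixed_mean = sum(w * y for w, y in zip(weights, effects)) / total_w
    # Cochran's Q and its degrees of freedom
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    # Method-of-moments estimate of between-study variance, truncated at zero
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I-squared: share of total variation attributed to heterogeneity (percent)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random effects weights add tau2 to each within-study variance
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    total_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / total_re
    se_pooled = math.sqrt(1.0 / total_re)
    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se_pooled,
        "ci_high": pooled + 1.96 * se_pooled,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical example: a score of 12 on a 0 to 24 questionnaire, and two
# made-up studies with very different effects (not data from this review).
print(rescale_to_100(12, 0, 24))
print(dersimonian_laird([0.0, 10.0], [1.0, 1.0]))
```

When the studies agree closely, the between-study variance estimate falls to zero and the result coincides with a fixed effect analysis; when they disagree, the pooled confidence interval widens.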

Sensitivity analysis

The robustness of our results was tested through a sensitivity analysis. We compared the results obtained using solely high quality studies with those obtained using studies of low, medium and high quality combined.

Results

Study identification

The initial database search produced 2,076 citations, of which 41 were appropriate for full text review (see Figure 1 for the study selection process).

Figure 1 Study selection process.

After full text review 23 articles were excluded. The reasons were: participants not meeting criteria [34, 35, 56, 57], intervention not meeting criteria [29, 31, 32, 58–61], study design not meeting criteria [36–39, 62–64], duplication of results from other included studies [40, 41, 65] and no appropriate outcome measures [66]. That left a total of 18 studies for inclusion [33, 43–45, 67–80]. Of the 18, two were separate publications of different treatment groups of the same larger study [71, 72]. Of note is that Franca et al [71] did not perform an intention to treat analysis, and so has a lower PEDro score than Franca et al [72]. However, as both had a PEDro score ≥ 6 this does not affect the pooling of both comparisons within the data synthesis. Therefore, a total of 17 separate trials were included.

Of the 18 studies included in the 2008 review, seven were rejected for this review: five due to this review only including patients with non-specific back pain [25, 81–84], one because it was a pilot study [85] and one due to inappropriate outcomes [86]. That resulted in 12 studies being drawn from the 2008 review (one from two publications) [87–99], with 29 studies in total included for this updated review.

Characteristics of included studies

A summary of the characteristics of the included studies, along with the main results, is shown in Table 2. The study populations were heterogeneous with regard to duration of symptoms and presence of leg symptoms. Ten of the studies specified participants having back pain lasting more than three months [43, 67, 69, 71–73, 76–78], with two studies specifying pain lasting three or more months [68, 70], two specifying more than two months [33, 45], one specifying any length of time [75], and two studies not detailing their criteria [44, 74]. Four studies included participants with or without leg pain [67–70], eight excluded participants with leg pain [33, 44, 45, 71–73, 77, 78] and six were not clear on their inclusion criteria with regard to leg pain [43, 74–76, 79, 80].

Table 2 Characteristics of included studies

Stabilisation exercises were the sole intervention for the majority of the studies, with five being individually treated [68, 70–72, 75, 79] and nine being in a class setting [33, 44, 45, 64, 73, 76–78, 80]. Three studies combined stabilisation exercises with other forms of treatment, such as general exercises [43, 67] and electrotherapy treatment [74].

Thirteen studies used a visual analogue scale to measure pain [43–45, 67, 70–75, 77, 78, 80], whilst four used an ordinal numerical rating scale [62, 63, 70, 73]. Four studies used the Roland-Morris Disability Questionnaire (RMDQ) to measure disability [62, 63, 70, 73], whilst 12 measured disability using the Oswestry Disability Index (ODI) [33, 43–45, 64, 71–74, 77–80]. Two studies also included the Fear-Avoidance Beliefs Questionnaire (FABQ) as an outcome measure [77, 79].

Sixteen studies recorded short term follow-up, with 14 measuring pain and disability [43–45, 68, 70–74, 76–80], one just pain [67] and one just disability [33]. Seven studies recorded medium term follow-up, with six recording outcomes for pain and disability [45, 68–70, 76, 77], and one just pain [75]. Six studies recorded long term follow-up, with five recording pain and disability [45, 68–70, 76], and one just pain [79]. Two further studies went on to record extra long term follow-up of disability and pain [45, 69].

For the characteristics of the 12 included studies from the 2008 review, please refer to the original review [13].

Study quality and bias

The PEDro scores ranged from 4 to 9 [47], with a mean score of 6.6 (please refer to the PEDro website for score breakdowns). All participants were randomly allocated and all studies provided adequate results and analysis (items 10 and 11). Only five studies failed to conceal allocation [43, 67, 73, 74, 78] and one study failed to assess baseline comparability [75]. No study blinded therapists, and only three blinded their participants [68, 75, 80]. The lower scoring studies were mainly marked down on blinding of assessors, adequate follow-up, intention to treat analysis and concealed allocation. Across all studies, the greatest possible source of bias was related to blinding. Eleven publications scored ≥6 [33, 45, 68–72, 76–80], along with seven from the 2008 review, totalling 18 studies of high quality [88–91, 93, 95, 98].

Data synthesis

Four studies from the 2008 review had insufficient data to enable their inclusion into a meta-analysis [89, 92, 94, 96], one of which was a high quality paper [89]. Twenty-two studies remained, 17 of high quality, which were considered suitably similar to warrant quantitative analysis and synthesis. Too few studies (only two of high quality) provided data ≥18 months to warrant pooling of data results for extra long term.

Pain

Twenty-two studies, with 2,258 participants, provided post treatment effect on pain. Combining the results of high quality studies demonstrated a statistically significant benefit (mean difference) of stabilisation exercises for low back pain in the short, medium and long term of -7.93 (95% CI -11.74 to -4.12), -6.10 (95% CI -10.54 to -1.65) and -6.39 (95% CI -10.14 to -2.65) (Figure 2) respectively, when compared with any alternative treatment or control. However, the difference between groups was clinically insignificant, with the minimal clinically important difference (MCID) for pain being suggested as 24 to 40 [100]. Between study heterogeneity was high to moderate (I² = 67%, 50% and 45% respectively).

Figure 2 Forest plot of stabilisation versus alternative intervention: pain - long term. *Negative values favour stabilisation intervention, positive favour control.

Subgroup analysis of stabilisation exercises versus other forms of exercise demonstrated a statistically significant short and medium term benefit, with a mean difference of -7.75 (95% CI -12.23 to -3.27) and -4.24 (95% CI -8.27 to -0.21) respectively. The difference between groups was clinically insignificant [100]. In the long term there was no statistically or clinically significant difference: -3.06 (95% CI -6.74 to 0.63) (Figure 3). Between study heterogeneity was high to negligible (I² = 66%, 0% and 0% respectively).

Figure 3 Forest plot of stabilisation versus other exercises: pain - long term. *Negative values favour stabilisation intervention, positive favour control.

Combining the results of all studies for the sensitivity analysis provided very similar results (Additional file 1).

Disability

Twenty-four studies, with 2,359 participants, provided post treatment effect on disability. Combining the results of high quality studies demonstrated a statistically significant benefit (mean difference) of stabilisation exercises for low back pain in the short and long term of -3.61 (95% CI -6.53 to -0.70) and -3.92 (95% CI -7.25 to -0.59) respectively (Figure 4), when compared with any alternative treatment or control. However, the difference between groups was clinically insignificant, with MCID for RMDQ 17 to 21 and 8 to 17 for ODI (converting all to 0 – 100 scale) [100]. There was no statistically or clinically significant difference in the medium term: -2.31 (95% CI -5.85 to 1.23). Between study heterogeneity was high to moderate (I² = 83%, 65% and 56% respectively).

Figure 4 Forest plot of stabilisation versus alternative intervention: disability - long term. *Negative values favour stabilisation intervention, positive favour control.

Subgroup analysis of stabilisation exercises versus other forms of exercise demonstrated a statistically significant short and medium term benefit, but no clinically significant difference, with a mean difference of -3.63 (95% CI -6.69 to -0.58) and -3.56 (95% CI -6.47 to -0.66) respectively. There was no statistically or clinically significant long term benefit: -1.89 (95% CI -5.10 to 1.33) (Figure 5). Between study heterogeneity was high to negligible (I² = 82%, 0% and 0% respectively).

Figure 5 Forest plot of stabilisation versus other exercises: disability - long term. *Negative values favour stabilisation intervention, positive favour control.

Combining the results of all studies for the sensitivity analysis provided results that were less favourable for stabilisation exercises for short to medium term, with similar long term results (Additional file 1).

Two high quality studies featured FABQ as an outcome measure. FABQ (physical activity) (0-24) and FABQ (work) (0-42) for Marshall and Kennedy [77] at short term follow-up had a non-significant mean difference of 2.2 (95% CI −1.3 to 5.6) and 2.3 (95% CI −1.8 to 6.5) respectively in favour of stabilisation exercises, when compared to stationary bike exercises. There was a non-significant medium term mean difference of −2.0 (95% CI −5.1 to 1.0) and −2.7 (95% CI −7.6 to 2.1) respectively in favour of the stationary bike. Short term mean difference for FABQ (physical activity) for Unsgaard-Tøndel et al [79] was non-significant at -1.58 (95% CI -4.00 to 0.84) and -0.18 (95% CI -2.42 to 2.07) in favour of sling and general exercises, respectively. Mean difference for FABQ (work) was non-significant at -0.40 (95% CI -3.81 to 3.01) in favour of slings and 0.25 (95% CI -2.74 to 3.24) in favour of stabilisation exercises, when compared to general exercises.

Discussion

Summary of main findings

The objective of this systematic review was to evaluate the current evidence for the benefit of stabilisation (or ‘core stability’) exercises for low back pain. The overall results of the meta-analysis indicate a trend favouring core stability exercises which is not regarded as clinically significant, when compared with any alternative treatment or control. The minimal clinically important difference (MCID) for pain has been suggested as 24 to 40, with 17 to 21 for RMDQ and 8 to 17 for ODI (converting all to 0 – 100 scale) [100]. Any reduction in favour of stabilisation exercises was potentially meaningless, with mean change scores for pain (7.93, 6.10 and 6.39) and disability (3.61, 2.31 and 3.92) falling well below these MCID levels.
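The distinction drawn here between statistical and clinical significance can be made explicit with a short Python sketch. The pooled estimates and MCID ranges are those reported in this review; the decision rule (a confidence interval excluding zero for statistical significance, and a point estimate reaching the lower bound of the MCID range for clinical importance) and the choice of the ODI lower bound of 8 as the most lenient disability threshold are assumptions made for illustration.

```python
def interpret(mean_diff, ci_low, ci_high, mcid):
    """Return (statistically significant, clinically important) for one estimate."""
    statistically_significant = ci_low > 0 or ci_high < 0  # 95% CI excludes zero
    clinically_important = abs(mean_diff) >= mcid
    return statistically_significant, clinically_important


PAIN_MCID = 24       # lower bound of the 24 to 40 range on the 0 to 100 scale
DISABILITY_MCID = 8  # lower bound of the ODI range (8 to 17), the most lenient threshold

# Pooled high quality results versus any alternative treatment or control:
# (mean difference, 95% CI lower limit, 95% CI upper limit)
pain = {
    "short": (-7.93, -11.74, -4.12),
    "medium": (-6.10, -10.54, -1.65),
    "long": (-6.39, -10.14, -2.65),
}
disability = {
    "short": (-3.61, -6.53, -0.70),
    "medium": (-2.31, -5.85, 1.23),
    "long": (-3.92, -7.25, -0.59),
}

for outcome, results, mcid in (("pain", pain, PAIN_MCID),
                               ("disability", disability, DISABILITY_MCID)):
    for term, (md, low, high) in results.items():
        significant, important = interpret(md, low, high, mcid)
        print(f"{outcome:10s} {term:6s} significant={significant} "
              f"clinically important={important}")
```

Under this rule none of the pooled estimates reaches even the most lenient MCID threshold, and the confidence limits furthest from zero also fall short of it.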

The overall results of the subgroup meta-analysis suggest that stabilisation (or ‘core stability’) exercises for low back pain offer very minimal benefit in the short and medium term when compared with other forms of exercise, with mean change scores for pain (7.75 and 4.24) and disability (3.63 and 3.56) also falling well below the clinically significant level. There was no significant benefit in the long term, for pain or disability, when compared with any other form of exercise. Results were trending towards stabilisation exercises, but were not significant, and any benefit would be clinically insignificant, being largely below the MCID level.

In the subgroup analysis of long term follow-up for stabilisation exercises versus other forms of exercise, heterogeneity was negligible (I² = 0%). Therefore, our finding that stabilisation exercises offer no benefit over alternative forms of exercise in the long term can be considered robust.

Whilst not statistically significant, both studies that used FABQ as an outcome found a trend towards worse scores with stabilisation exercises, compared with stationary bikes, sling exercises and general exercises [77, 79]. The rehabilitation strategy surrounding stabilisation exercises has been challenged, and it has been suggested that it could encourage unhealthy thoughts and beliefs on pain and movement [101].

Limitations of included studies

For the meta-analysis of pain and disability for stabilisation versus any alternative treatment or placebo, high to moderate heterogeneity existed. I² scores of pain for short, medium and long term were 67%, 50% and 45%, and of disability, 83%, 65% and 56% respectively. The high heterogeneity is possibly due to the different comparisons being made between trials, and this reduces the robustness of our short to medium term results. Overall, the interventions were applied to a wide variety of patients, including patients from low, medium or high socio-economic groups, unemployed or employed, having had investigations or no investigations, patients with or without leg pain, patients with acute or chronic symptoms, and studies that included or excluded patients classed as ‘distressed’. Patients that have high levels of fear avoidance scores are likely to have poor outcomes and compliance with biomedical models of pain and treatments, such as stabilisation exercises, and would likely do better with a biopsychosocial approach [101]. Cairns et al. [88], for example, excluded patients that were ‘distressed’, which perhaps biases results in favour of stabilisation exercises. This compares with Ferreira et al [70], whose participants were from low socio-economic groups, who are more likely to develop chronic pain states with worse outcomes, and would perhaps bias results in favour of alternative treatment protocols [102]. Furthermore, differences existed in how the treatments were delivered: class settings only, one to one treatment only, class/one to one treatment with home exercises or just home exercises, plus different amounts of therapist contact time.

The studies included within the main meta-analysis had PEDro scores of ≥6, and as such were considered to have low bias. However, the main source of bias within the studies was blinding. No study blinded the therapist and few studies blinded the participants. Given that the pain and disability rating scales were self-reported by patients, it is possible that this could overestimate the treatment effect sizes. However, blinding in active physiotherapy studies is difficult to achieve.

One of the limitations with long term follow-up of RCTs, particularly with exercise interventions, is the attrition rate. An uneven dropout has the risk of overestimating the effect size of treatment groups. For example, Ferreira et al. [70] had an uneven dropout rate, with 9% for the general exercise group and 19% for the stabilisation group. This could easily bias the results in favour of the stabilisation group.

Limitations of this review

An extensive literature search was carried out, with two reviewers screening full texts independently for inclusion and the data extraction independently checked. This minimised bias within this review process; however, no attempt was made to source unpublished studies, nor studies published in any language other than English. It is thought that identifying unpublished trials minimises publication bias [103]. However, this approach has been questioned by others, who suggest that truly unpublished trials frequently have poor methodology, and ones with better methodology often eventually become published [104]. It is not possible to know if the inclusion, if available, of any unpublished trials would considerably alter our conclusions, or if this truly is a weakness of this review.

Comparison with other reviews

Our main findings differ very little from the 2008 review [13]; however, firmer conclusions about stabilisation exercises can be drawn from our review. In the 2008 review the majority of the studies favouring stabilisation exercises combined the exercises with some other form of treatment, implying that it was the package of care that was effective rather than stabilisation exercises alone. In our updated review the majority of the studies used stabilisation exercises as sole treatment, and as such the data synthesis looks more closely at stabilisation exercises as sole treatment.

Our findings were similar to the Wang et al. [21] review, which also concluded some short term benefit to pain and disability for stabilisation exercises over general exercise, with no long term benefit to pain. No comparison for long term follow-up for disability was made, and no attempt at analysing results against MCID was made. Of their five included articles we included three in our review. One was excluded for duplicating results from another included study, which was included in both our review and that of Wang et al. It is therefore possible that their meta-analysis double counts these results [40, 79]. The other study, which we excluded during initial screening [105], looked at a relaxation yoga programme with meditation, chanting and counselling, and clearly does not match our intervention definition. It is perhaps questionable that this study was included within Wang et al’s review [21].

Our findings differ from the Byström et al [22] review, which concluded long term benefit to disability in favour of stabilisation exercises over general exercises, and benefit to pain at intermediate term. The differences may be due to our inclusion of a further 15 publications; their inclusion within the analysis of studies with high risk of bias, defined by means of a PEDro score of less than 6; or their use of a fixed effects meta-analysis model for pooled analysis where heterogeneity, as measured by I², was less than 50%. Choosing between fixed and random effects models solely on the basis of the test for heterogeneity is considered incorrect; the choice should be based upon which model best fits the distribution of effect sizes [106]. We used a random effects model on all analyses, since there is inherent heterogeneity in low back pain within the literature. Using a fixed effects model incorrectly could overestimate the pooled effect sizes and underestimate the confidence interval width, thus reducing reliability of results [106].
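The effect of model choice on confidence interval width can be illustrated with a minimal Python sketch that computes the pooled standard error under a fixed effect model and under a DerSimonian and Laird random effects model. The two studies in the example are hypothetical and deliberately discordant; they are not data from this review or from Byström et al, and the function name is illustrative.

```python
import math


def pooled_standard_errors(effects, std_errors):
    """Return (fixed effect SE, random effects SE) of the pooled estimate."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    fixed_mean = sum(w * y for w, y in zip(weights, effects)) / total_w
    # Cochran's Q and the method-of-moments between-study variance (tau squared)
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    fixed_se = math.sqrt(1.0 / total_w)
    random_se = math.sqrt(1.0 / sum(1.0 / (se ** 2 + tau2) for se in std_errors))
    return fixed_se, random_se


# Hypothetical, deliberately heterogeneous studies (illustration only)
fixed_se, random_se = pooled_standard_errors([0.0, 10.0], [1.0, 1.0])
print(f"fixed effect 95% CI half-width:   {1.96 * fixed_se:.2f}")
print(f"random effects 95% CI half-width: {1.96 * random_se:.2f}")
```

With heterogeneous studies the fixed effect interval is much narrower than the random effects interval, which is the underestimation of confidence interval width referred to above; with homogeneous studies the two coincide.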

Conclusion

The results of this current systematic review suggest that stabilisation exercises improve low back pain symptoms, but no better than any other form of active exercise in the long term. The low levels of heterogeneity and the large number of available studies of high methodological quality at long term follow-up strengthen our current findings. There is a trend towards worse fear avoidance scores with stabilisation exercises.

This review cannot recommend stabilisation exercises for low back pain in preference to other forms of general exercise, and further research is unlikely to considerably alter this conclusion.