Background

Low back pain (LBP) is related to disability and work absence and accounts for high economic costs in western societies [1]. The management of LBP comprises a range of different intervention strategies including surgery, drug therapy, and non-medical interventions. In recent years, a large number of randomized controlled trials (RCTs) have been published, and these have been summarized in systematic reviews. Most of these systematic reviews focus on a single intervention and describe its effectiveness for the different types of LBP. The current study presents an up-to-date overview of the literature on physical and rehabilitation medicine in patients with chronic LBP. The physical and rehabilitation medicine interventions include exercise therapy, back schools, transcutaneous electrical nerve stimulation (TENS), superficial heat or cold, low-level laser therapy (LLLT), individual patient education, massage, behavioural treatment, lumbar supports, traction, and multidisciplinary rehabilitation. This systematic review provides an overview of these physical and rehabilitation medicine interventions applied in chronic LBP patients and of their effectiveness.

Criteria for considering studies for this review

A study must fulfil the following inclusion criteria to be included in this review.

Types of studies

Only RCTs were included.

Types of participants

The study population should consist of adults, older than 18 years, with non-specific chronic LBP that persisted for 12 weeks or more.

Randomized controlled trials (RCTs) including subjects with specific LBP caused by pathological entities, such as vertebral spinal stenosis, ankylosing spondylitis, scoliosis, and coccydynia were excluded. The diagnosis for these specific entities had to be confirmed by means of an MRI or another diagnostic tool. Trials on post-partum LBP or pelvic pain due to pregnancy as well as post-operative studies and prevention studies were also excluded.

Types of interventions

Randomized controlled trials (RCTs) studying the following physical and rehabilitation interventions were included in this overview: exercise therapy, back schools, transcutaneous electrical nerve stimulation (TENS), superficial heat or cold, low-level laser therapy (LLLT), individual patient education, massage, behavioural treatment, lumbar supports, traction, and multidisciplinary rehabilitation.

Exercise therapy was defined as “a series of specific movements with the aim of training or developing the body by a routine practice or physical training to promote good physical health” [2].

A back school was defined as an educational and skills acquisition program, including exercises, in which all lessons were given to groups of patients and supervised by a paramedical therapist or medical specialist [3].

All standard modes of transcutaneous electrical nerve stimulation (TENS) were considered in this review. TENS is a non-invasive therapeutic modality. TENS units stimulate peripheral nerves via skin surface electrodes at well-tolerated intensities and are capable of being self-administered [4].

Superficial heat or cold included all kinds of heat or cold therapies, such as ice, cold towels, cold gel packs, ice packs, and ice massage; hot water bottles, heated stones, soft-heated packs filled with grain, poultices, hot towels, hot baths, saunas, steam, heat wraps, heat pads, electric heat pads, and infrared heat lamps [5]. Spa therapy (balneotherapy) was excluded.

Low-level laser therapy (LLLT) is a light source that generates pure light of a single wavelength with non-thermal effects [6]. For this intervention, all types of LLLT, including all wavelengths, are included.

Patient education was defined as “a systematic experience, in a one-to-one situation, that consists of one or more methods, such as the provision of information and advice and behaviour modification techniques, which influence the way the patient experiences his illness and/or his knowledge and health behaviour, aimed at improving or maintaining or learning to cope with a condition” [7].

Massage was defined as soft tissue manipulation using the hands or a mechanical device [8].

Behavioural treatments included operant, cognitive, and respondent treatments or a combination of these treatments. Each of these focuses on the modification of one of the three response systems that characterize emotional experiences: behaviour, cognition, and physiological reactivity [9].

Lumbar supports included any type of lumbar support, flexible or rigid, used for the treatment of chronic non-specific LBP [10].

The intervention traction included any type of traction, such as mechanical traction, manual traction (unspecific or segmental traction), computerized traction, auto traction, underwater traction, bed rest traction, inverted traction, continuous traction, and intermittent traction [11].

Finally, the multidisciplinary treatment included multidisciplinary bio-psychosocial rehabilitation with minimally one physical dimension and one of the other dimensions (psychological or social or occupational) [12].

For all types of interventions, additional treatments were allowed, provided that the intervention of interest was the main contrast between the intervention groups included in the study.

Types of outcome measures

The following self-reported outcome measures were assessed in this review: pain intensity (e.g. visual analogue scale (VAS), McGill pain questionnaire), back-specific disability (e.g. Roland Morris, Oswestry Disability Index), perceived recovery (e.g. overall improvement), return to work (e.g. return to work status, sick leave days), and side effects. The primary outcomes for this overview were pain and physical functional status. Studies with a follow-up less than one day were excluded.

Search methods for identification of studies

Existing Cochrane reviews of the 11 interventions were screened for studies fulfilling the inclusion criteria. Additionally, a search was conducted in MEDLINE, EMBASE, CINAHL, CENTRAL, and PEDro up to 22 December 2008. The searches were updated from the last date of the literature search in the Cochrane reviews.

References from the relevant studies were screened, and experts were approached in order to identify any additional primary studies not identified in the previous steps. The language was limited to English, Dutch, and German, because these were the languages that the review authors were able to read and understand. The search strategy outlined by the Cochrane Back Review Group (CBRG) was followed. Two reviewers working independently of each other conducted the electronic searches.

Methods of the review

Study selection

Three authors (MM and SR/TK) independently screened the abstracts and titles retrieved by the search strategy and applied the inclusion criteria to all these abstracts. Full text of the article was obtained if the abstract seemed to fulfil the inclusion criteria or if eligibility of the study was unclear. All full text articles were compiled and screened on inclusion criteria by the two authors, independently. Any disagreements between the authors were resolved by discussion and consensus. A third author was consulted if disagreements persisted.

Assessment of risk of bias in included studies

Two reviewers (MM, SMR) independently conducted the risk of bias assessment. Risk of bias of the individual studies was assessed using the criteria list advised by the CBRG, which consists of 11 items. Items were scored as positive if they fulfilled the criteria, negative when there was a clear risk of bias, and inconclusive if there was insufficient information. Differences in assessment were discussed during a consensus meeting. A total score was computed, and high quality was defined as fulfilling six or more (more than 50%) of the internal validity criteria (range 0–11).
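The scoring rule described above can be sketched in a few lines. This is a minimal illustration, assuming each study's assessment is stored as a mapping from item name to one of three labels; the item names, the label strings, and the function names are hypothetical, and only the 11-item count and the threshold of six come from the text.

```python
from typing import Dict

N_ITEMS = 11                 # length of the CBRG criteria list, per the text
HIGH_QUALITY_THRESHOLD = 6   # "six or more" items fulfilled, per the text


def total_score(assessment: Dict[str, str]) -> int:
    """Count items scored 'positive'; other labels add nothing to the score."""
    if len(assessment) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(assessment)}")
    return sum(1 for label in assessment.values() if label == "positive")


def is_high_quality(assessment: Dict[str, str]) -> bool:
    """Apply the review's threshold to the total score."""
    return total_score(assessment) >= HIGH_QUALITY_THRESHOLD


# Hypothetical study: 6 positive, 3 negative, and 2 inconclusive items.
labels = ["positive"] * 6 + ["negative"] * 3 + ["inconclusive"] * 2
example = {f"item_{i + 1}": label for i, label in enumerate(labels)}
```

Under this rule an inconclusive item counts the same as a negative one, so poor reporting alone can place a study below the threshold.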

Data extraction

The same two review authors who performed the risk of bias assessment conducted the data extraction, independently from one another. Data were extracted onto a standardized web-based form. The following data were extracted from the studies: (1) characteristics of the studies: number of participants, gender, age, setting, and duration of complaints; (2) characteristics of the interventions: the type, frequency, duration, co-interventions, and control intervention; (3) characteristics of the outcomes: outcome measures, instruments, and scores (e.g. mean, median, standard deviation, and confidence interval).

Data analysis and statistical analysis

Comparison therapies were combined into main clusters of presumed effectiveness (no treatment/waiting list controls, other interventions). Separate analyses were planned for: (1) each type of intervention, (2) each type of control, (3) each main outcome measure, and (4) time of follow-up (post-treatment; short-term (closest to 3 months), intermediate (closest to 6 months), and long-term (closest to 12 months) follow-up).

If trials reported outcomes only as graphs, the mean scores and standard deviations were estimated from these graphs (Supplementary material).

For continuous data, results are presented as weighted mean differences (WMD). All scales were converted to 100-point scales. For dichotomous data, a relative risk (RR) was calculated, and the event was defined as the number of subjects recovered. A test for heterogeneity was calculated using the Q-test (Chi-square) and I². Confidence intervals (95%CI) were calculated for each effect. A random effects model was used and funnel plots were examined for publication bias.
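The calculations named above can be illustrated with a short sketch. The text states only that a random effects model was used; the DerSimonian-Laird estimator below is an assumption chosen because it is a common default, not a confirmed detail of this review, and all function names are hypothetical. Inputs are per-study mean differences and their standard errors, already expressed on the 100-point scale.

```python
import math
from typing import Dict, List


def to_100_point(score: float, scale_min: float, scale_max: float) -> float:
    """Linearly rescale a score to 0-100. For mean differences and standard
    deviations only the factor 100 / (scale_max - scale_min) applies."""
    return (score - scale_min) / (scale_max - scale_min) * 100.0


def relative_risk(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Risk of the event (here: recovery) in group A relative to group B."""
    return (events_a / n_a) / (events_b / n_b)


def pool_random_effects(diffs: List[float], ses: List[float]) -> Dict[str, float]:
    """Pool mean differences with DerSimonian-Laird weights (an assumption)."""
    if not diffs or len(diffs) != len(ses):
        raise ValueError("diffs and ses must be non-empty and of equal length")
    w = [1.0 / se ** 2 for se in ses]                  # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * d for wi, d in zip(w, diffs)) / sw
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, diffs))  # Cochran's Q
    df = len(diffs) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0      # I² in percent
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]      # random-effects weights
    pooled = sum(wi * d for wi, d in zip(w_re, diffs)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {
        "wmd": pooled,
        "ci_low": pooled - 1.96 * se_pooled,
        "ci_high": pooled + 1.96 * se_pooled,
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }
```

When the studies agree closely, the between-study variance is estimated as zero and the result matches a fixed-effect analysis; when they disagree, the confidence interval widens, which is the pattern visible in several of the pooled estimates reported below.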

If standard deviations were not reported, we calculated them from the reported confidence intervals where possible. If only the standard deviation of the baseline score was reported, this value was carried forward. Finally, if none of these data were reported, the standard deviation was estimated from the study data (population and score) of other studies.
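The first of these steps, recovering a standard deviation from a confidence interval, can be sketched as follows. The formula assumes the interval belongs to a single group mean and was built with the normal approximation (z = 1.96 for a 95%CI); the text does not state which formula was used, so this is an illustration of the standard approach rather than the authors' exact procedure. The function name is hypothetical.

```python
import math


def sd_from_ci(lower: float, upper: float, n: int, z: float = 1.96) -> float:
    """Standard deviation implied by a confidence interval around a group mean.

    The interval width equals 2 * z * SD / sqrt(n), so
    SD = sqrt(n) * (upper - lower) / (2 * z).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if upper < lower:
        raise ValueError("upper must not be smaller than lower")
    return math.sqrt(n) * (upper - lower) / (2.0 * z)
```

For small groups a t-distribution quantile would be more appropriate than 1.96; it can be passed through the `z` parameter.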

To correct for bias introduced by “double-counting” of subjects of trials that had two control groups in the same meta-analysis, the number of subjects of these trials was divided by two.
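The effect of this correction on a study's weight can be shown with a small sketch. It assumes that the arm entering two comparisons is the one whose sample size is halved, which is one reading of the sentence above and is not spelled out in the text; the function names and the example numbers are hypothetical.

```python
import math


def se_mean_difference(sd_a: float, n_a: float, sd_b: float, n_b: float) -> float:
    """Standard error of the difference between two independent group means."""
    return math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)


def halve_shared_arm(n_shared: int) -> float:
    """Split a shared arm over two comparisons (may give a non-integer n)."""
    return n_shared / 2.0


# Hypothetical trial: the shared arm has 50 subjects, each comparator arm has 50.
se_uncorrected = se_mean_difference(10.0, 50, 10.0, 50)
se_corrected = se_mean_difference(10.0, halve_shared_arm(50), 10.0, 50)
```

Halving the sample size raises the standard error of each comparison, so the trial receives less inverse-variance weight and its subjects are not counted twice in the pooled estimate.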

Quality of evidence

The Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) approach was used to evaluate the overall quality of evidence and the strength of the recommendations [13]. Quality of evidence for a specific outcome was based upon five principal measures: (1) limitations (due to, for example, study design), (2) consistency of results, (3) indirectness (e.g. generalizability of the findings), (4) precision (e.g. sufficient data), and (5) other considerations, such as reporting bias. The overall quality was considered to be high when RCTs with a low risk of bias provided consistent, sufficient, and precise results for a particular outcome; however, the quality of the evidence was downgraded by one level when one of the factors described above was not met. The following grades of evidence were applied:

High quality:

Further research is very unlikely to change our confidence in the estimate of effect.

Moderate quality:

Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.

Low quality:

Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.

Very low quality:

We are very uncertain about the estimate.

To improve the readability of this review, a GRADE table was completed only when we completed a meta-analysis.
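The downgrading rule described in this section can be written as a small function. This is a deliberately simplified sketch: it starts at high quality and moves down one level per unmet factor, as the text states, whereas actual GRADE judgements involve reviewer discretion and need not follow a strict count. The factor labels and the function name are hypothetical.

```python
from typing import Iterable

# Ordered from best to worst, as listed in the text.
GRADE_LEVELS = ("high", "moderate", "low", "very low")


def grade_quality(unmet_factors: Iterable[str]) -> str:
    """Start at 'high' and drop one level per distinct unmet factor,
    stopping at 'very low'."""
    downgrades = len(set(unmet_factors))
    index = min(downgrades, len(GRADE_LEVELS) - 1)
    return GRADE_LEVELS[index]
```

For example, `grade_quality(["limitations", "imprecision"])` returns `"low"`, and any list with three or more distinct factors returns `"very low"`.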

Results

Description of studies

From the 11 existing Cochrane reviews, a total of 114 full text articles were screened for eligibility. Of these 114 articles, 58 studies fulfilled the inclusion criteria and were included. Additionally, 1,825 new relevant titles and abstracts were identified and screened for potential inclusion (Fig. 1). Of these, 127 full text articles were evaluated, of which a total of 35 studies fulfilled the inclusion criteria.

Fig. 1

Flow diagram of systematic review inclusion and exclusion of articles for non-medical treatments for chronic low back pain

After removing duplicates, 83 studies were included, comprising the following subjects: exercise therapy [14–50] (n = 37), back schools [51–55] (n = 5), TENS [56–61] (n = 6), low-level laser therapy [62–64] (n = 3), massage [65–67] (n = 3), behavioural treatment [68–88] (n = 21), patient education [89] (n = 1), traction [90] (n = 1), and multidisciplinary treatment [91–96] (n = 6).

Multiple publications were found for Bendix et al. [15, 92, 97], Gudavalli et al. [26, 98, 99], Härkäpää et al. [96, 100–102], Niemistö et al. [35, 103], Smeets et al. [41, 104], and Tavafian et al. [54, 105]. Information from all publications was used for assessment of risk of bias and data extraction, but only the first or most prominent publication was used for citation of these studies.

The study characteristics of all included studies are presented in Tables 1.1 to 1.9 in Supplementary material. A total of 8,816 patients were included. Most patients were included in the exercise studies (n = 3,957), followed by the behavioural studies (n = 2,062) and the multidisciplinary studies (n = 1,229). A total of 50 studies (60%) reported on the outcome pain intensity, measured with a VAS or numerical rating scale (NRS). In total, 11 studies (13.3%) did not report on the outcome pain.

Risk of bias assessment

The results of the risk of bias assessment are shown in Table 1. All studies were described as randomized; however, the method of randomization was explicit in only 56.6% (n = 47) of the studies. Only 28 studies (33.7%) met six or more of the criteria, which was our preset threshold for low risk of bias. Only the criteria regarding the baseline characteristics, timing of outcome measures, and description of dropouts were met by 50% or more of the included randomized trials. Compliance with the interventions was clearly acceptable in only 37.3% (n = 31) of the studies.

Table 1 Risk of bias of studies investigating non-medical interventions for chronic low back pain

Effects of intervention

The effectiveness of exercise therapy

Exercise therapy versus waiting list controls/no treatment

Eight studies [14, 23, 24, 36, 40, 41, 43, 48] were identified as comparing some type of exercise therapy to waiting list controls or no treatment. Five studies reported post-treatment data only, because after the treatment period the waiting list controls also received the treatment. Only two studies [14, 41] had intermediate or long-term follow-up.

All studies reported data that could be used in the statistical pooling. The pooled mean difference of the five studies reporting post-treatment pain intensity was not statistically significant (−4.51 [95%CI −9.49; 0.47]). The WMD for post-treatment improvement in disability was −3.63 [95%CI −8.89; 1.63]. The pooled WMD for pain intensity at intermediate follow-up was −16.46 [95%CI −44.48; 11.57]. Only one study (102 people) reported intermediate outcomes for disability and long-term outcomes for pain intensity and disability. There were no differences between the group receiving exercise therapy and the waiting list control group [41].

Therefore, there is low quality evidence (serious limitations, imprecision) that there is no statistically significant difference in pain reduction and improvement of disability between exercise therapy and no treatment/waiting list controls.

Exercise therapy versus usual care/advice to stay active

A total of six studies [28, 35, 45, 49, 50, 106] investigated the effect of exercise therapy compared to usual care. Four of these studies had an intermediate or long-term follow-up. Statistical pooling of three studies [49, 50, 106] showed a significant decrease in pain intensity and disability in favour of the exercise group (WMD −9.23 [95%CI −16.02; −2.43] and −12.35 [95%CI −23.00; −1.69], respectively). One study [49] reported on pain and disability at short-term follow-up, and found no statistically significant differences between the exercise group and the control group receiving home exercises. Two studies [35, 106] showed a statistically significant pooled WMD for disability at intermediate follow-up of −5.43 [95%CI −9.54; −1.32]. One study [35] found a statistically significant difference at intermediate follow-up for pain relief for the exercise group compared to the usual care group. Three studies [45, 103, 106] reported on pain and/or disability at long-term follow-up. The pooled WMD for pain was not statistically significant (−4.94 [95%CI −10.45; 0.58]); the WMD for disability was statistically significant in favour of the exercise group (WMD −3.17 [95%CI −5.96; −0.38]).

One study [28] reported recovery at post-treatment and during intermediate and long-term follow-up. There was a statistically significant difference between the groups at 3 and 6 months follow-up in favour of the exercise group compared with usual care (p < 0.001). As many as 80% of the patients in the exercise group regarded themselves as recovered at 3 months follow-up versus 47% in the usual care group.

There is low quality evidence (serious limitations, imprecision) for the effectiveness of exercise therapy compared to usual care on pain intensity and disability.

Exercise therapy versus back school/education

Four studies, three with a high risk of bias, were identified [19, 25, 39, 44]. Post-treatment results for disability were reported in two studies, with a significant pooled WMD of −11.20 [95%CI −16.78; −5.62]. One study reported on pain post-treatment and found no statistically significant difference between the two intervention groups [44]. The pooled mean differences for pain and disability at 3 months follow-up were −7.63 [95%CI −17.20; 1.93] and −2.55 [95%CI −10.07; 4.97], respectively.

Two studies [19, 25] reported intermediate outcomes on pain and three studies [19, 25, 39] reported on disability. The pooled WMDs showed no statistically significant differences between the groups: –5.58 [95%CI −16.65; 5.48] and −4.42 [95%CI −9.90; 1.05], respectively. Only one study (n = 346) reported long-term outcomes, and these were not statistically significantly different between the groups [25].

The data provided very low quality evidence (serious limitations, imprecision, and inconsistency) that there was no statistically significant difference in effect on pain and disability at short- and intermediate follow-up for exercise therapy compared to back school/education.

Exercise therapy versus behavioural treatment

Three studies, one with a low risk of bias, were identified comparing exercise therapy with a behavioural treatment [17, 41, 43]. Two studies reported post-treatment pain and disability and the pooled WMDs were 1.21 [95%CI −5.42; 7.84] and 0.34 [95%CI −2.64; 3.31], respectively.

All three studies reported intermediate and long-term follow-up on pain intensity and disability. For intermediate follow-up the pooled WMDs for pain and disability were −2.23 [95%CI −7.58; 3.12] and 1.97 [95%CI −3.55; 7.48], respectively. Long-term results showed a pooled WMD for pain intensity of −0.88 [95%CI −6.34; 4.58] and a pooled WMD for disability of 2.77 [95%CI −3.43; 8.96].

There is low quality evidence (serious limitations, imprecision) that there are no statistically significant differences between exercise therapy and behavioural therapy on pain intensity and disability at short- and long-term follow-up.

Exercise therapy versus TENS/laser therapy/ultrasound/massage

Five studies, two with a low risk of bias, were identified comparing exercise therapy with passive therapies, such as TENS, low-level laser therapy, ultrasound, and thermal therapy [16, 18, 27, 30, 49]. The pooled WMD for post-treatment pain intensity was −9.33 [95%CI −18.80; 0.13] and for post-treatment disability −2.59 [95%CI −8.03; 2.85]. Two studies [18, 49] reported on short-term pain intensity and disability and the pooled mean differences were 1.72 [95%CI −6.05; 9.50] and 1.02 [95%CI −0.38; 2.42], respectively. One study with a low risk of bias [30] reported intermediate and long-term outcomes, and found a statistically significant difference for pain intensity of 16.8 and 21.2 points, respectively, in favour of exercise therapy. A statistically significant difference was also found for disability.

Low quality evidence (serious limitations, inconsistency, and imprecision) was provided that there is no statistically significant difference in effect between exercise therapy compared to TENS/laser/ultrasound/massage on the outcomes pain and disability at short-term follow-up.

Exercise therapy versus manual therapy/manipulation

Five studies, two with a low risk of bias, were identified comparing exercise treatment with spinal manipulation or manual therapy [21, 25, 26, 34, 47]. Post-treatment data were available for three studies. The pooled WMDs for pain intensity and disability were 5.67 [95%CI 1.99; 9.35] and 2.16 [95%CI –0.96; 5.28], respectively. One study reported a statistically significant difference in global perceived effect post-treatment [21] in favour of spinal manipulation. Two studies reported short-term effects on pain intensity and disability and the pooled WMDs were −1.33 [95%CI –10.11; 7.79] and 0.29 [95%CI −3.15; 3.72], respectively [25, 26]. Intermediate results on pain and disability were reported by three studies [21, 25, 26] and the pooled WMDs were −0.49 [95%CI –12.22; 11.23] and 2.38 [95%CI –5.16; 9.93], respectively. All studies reported long-term results on disability and the pooled WMD was −0.70 [95%CI −3.14; 1.74]. Four studies reported long-term results on pain intensity and the pooled WMD was 2.09 [95%CI −2.94; 7.13]. Global perceived effect was reported by one study during intermediate and long-term follow-up. No statistically significant between-group differences were found in this study [21].

The data provided low quality evidence (inconsistency, imprecision) that there was no statistically significant difference in effect (pain intensity and disability) for exercise therapy compared to manual therapy/manipulation at short- and long-term follow-up.

Exercise therapy versus psychotherapy

One study with a high risk of bias was identified [32]. Post-treatment results showed a statistically significant difference in disability scores between the two groups in favour of the exercise group. No post-treatment differences between the groups were found for pain intensity. At 6 months follow-up, both disability and pain intensity scores were lower in the exercise group than in the psychotherapy group, but the differences were not statistically significant.

Exercise therapy versus other forms of exercise therapy

A total of 11 studies compared different exercise interventions with each other [20, 21, 29, 31, 33, 37–39, 42, 46, 48]. Data from these studies could not be pooled because of the heterogeneity of the types of interventions.

Two studies found statistically significant differences between different exercise interventions. One study [42], with a high risk of bias, reported a statistically significant difference in pain relief at 3 months follow-up for an aerobic exercise training program compared with a lumbar flexion exercise program of 3 months. One large trial [21] with a low risk of bias (n = 240) compared a general exercise program (strengthening and stretching) with a motor control exercise program (improving function of specific trunk muscles) of 12 weeks. The motor control exercise group had slightly but statistically significantly better outcomes (mean adjusted between-group difference: function 2.9, global perceived effect 1.7) than the general exercise group at 8 weeks. Similar group outcomes were found at 6 and 12 months follow-up.

A total of eight studies did not find any statistically significant differences between the various exercise interventions [20, 29, 31, 33, 37, 38, 46, 48]. Sherman et al. [39] compared a 12-week yoga (viniyoga) program with a 12-week conventional exercise class program. Back-related function in the yoga group was superior to the exercise group at 12 weeks.

The effectiveness of back school

Back school versus waiting list controls/no treatment/usual care

Three studies compared back school with waiting list controls, no treatment, and a usual care clinic group [52, 54, 55]. Pain post-treatment was reported by two studies [52, 55] and the pooled WMD was −4.64 [95%CI −13.65; 4.37]. Disability post-treatment was only reported by Ribeiro et al. [55] and showed no statistically significant difference between the two groups. Two studies [54, 55] reported short-term follow-up data on disability and the pooled WMD was −13.04 [95%CI −37.04; 10.95] in favour of the back school intervention. One study [55] with a low risk of bias reported on pain intensity at short-term follow-up and found no statistically significant difference between the two intervention groups. One study [54] with a high risk of bias reported on disability at intermediate and long-term follow-up, and no significant differences were found at either time point between the back school group and the clinic group.

Due to serious limitations, inconsistency, and imprecision, low quality evidence was provided that there is no statistically significant short-term difference in treatment effect on pain and disability for a back school treatment compared to waiting list controls/no treatment/usual care.

Back school versus active treatment

Two studies, one with a low risk of bias, were identified comparing a back school treatment with an active treatment [19, 53]. The pooled WMDs for pain intensity and disability at short-term follow-up were 4.75 [95%CI −2.13; 11.63] and 0.12 [95%CI −2.37; 2.61], respectively. At intermediate follow-up, the pooled WMDs for pain intensity and disability were −2.16 [95%CI −13.03; 8.71] and 0.05 [95%CI −3.59; 3.69], respectively.

Low quality evidence (serious limitations, inconsistency, and imprecision) was provided that there is no statistically significant difference in effect for back school treatment compared to active treatments on pain and disability at short-term and intermediate follow-up.

Back school versus education/information

One study [51] with a high risk of bias was identified comparing back school with the provision of instructional material. At 6 months follow-up, there was a statistically significant difference in pain intensity and disability in favour of the back school group. At long-term follow-up (12 months), there was still a significant difference between the two intervention groups in favour of the back school group for disability, but not for pain intensity.

The effectiveness of transcutaneous electrical nerve stimulation (TENS)

TENS versus sham treatment

Five studies, two with a low risk of bias, compared the effectiveness of TENS with sham TENS or sham percutaneous electrical nerve stimulation (PENS). Four studies [18, 56, 59, 61] described post-treatment results on pain and the pooled WMD was −4.47 [95%CI −12.84; 3.89]. The pooled WMD of post-treatment disability of two studies [18, 61] was −1.36 [95%CI −4.38; 1.66]. Ghoname et al. [56] reported on disability and found no significant difference between the TENS and sham-PENS group. The study of Jarzem et al. [59] with a low risk of bias, compared TENS with sham-TENS and demonstrated a significant carry-over effect with conventional TENS having a greater effect on pain intensity than sham-TENS.

Two studies [18, 58] found no statistically significant difference between the TENS and sham TENS groups at short-term follow-up.

The data provided low quality evidence (serious limitations, heterogeneity) that there is no statistically significant difference on post-treatment pain intensity and disability between TENS and sham-TENS.

TENS versus PENS/acupuncture

Four studies, all with a high risk of bias, compared the effectiveness of TENS with acupuncture or PENS [56–58, 60]. Post-treatment results of two studies [56, 60] showed a pooled WMD for pain intensity of 16.64 [95%CI 5.86; 27.41], in favour of the control group. Outcomes on short-term pain intensity were reported in three studies [57, 58, 60]. The pooled WMD was 6.51 [95%CI −0.41; 13.44] in favour of the PENS/acupuncture intervention. One study [58], with a high risk of bias, reported no statistically significant difference on short-term disability.

Very low quality evidence (serious limitations, inconsistency, and imprecision) was provided that PENS/acupuncture is more effective than TENS for post-treatment and short-term pain relief.

TENS versus active treatments

Two studies, one of which had a high risk of bias, compared the effectiveness of TENS with active treatments [18, 56]. Ghoname et al. [56] found no statistically significant difference in pain intensity post-treatment between the two intervention groups. Deyo et al. [18] reported no statistically significant difference in pain intensity, disability, and recovery at short-term follow-up between TENS and exercise therapy.

Conventional TENS versus biphasic new wave TENS

One study [58] with a high risk of bias investigated the effectiveness of conventional TENS compared to biphasic new wave TENS for the outcomes of pain and disability post-treatment and at short-term follow-up. No statistically significant differences were found for either outcome measure at either time point.

The effectiveness of low-level laser therapy (LLLT)

Low-level laser therapy versus sham treatment

One study [64] with a low risk of bias, compared low-level laser therapy treatment with sham laser therapy treatment in elderly patients over 60 years. The study provided low quality evidence that LLLT was more effective in pain relief at intermediate follow-up (44.7%) compared with sham LLLT (15.2%).

Low-level laser therapy + exercise versus sham LLLT + exercise

Results on pain and disability at post-treatment were reported by one study [62] and no difference was found between the intervention groups on both outcome measures.

Two studies [62, 63] reported on pain intensity and disability at short-term (3 months) follow-up. The pooled analysis of these two small trials (n = 61) showed a significant difference in pain relief (WMD −13.57 [95%CI −26.67; −0.47]). No difference was found in disability between those who received LLLT plus exercise and those who received sham LLLT plus exercise (WMD −5.42 [95%CI −23.55; 12.71]).

Very low quality evidence was provided (serious limitations, inconsistency, and imprecision) for the effectiveness of LLLT + exercise compared to sham LLLT + exercise on pain intensity at short-term follow-up, but not for disability.

Low-level laser therapy versus exercise

One study [27] compared the effectiveness of LLLT with exercise therapy post-treatment. No statistically significant difference was found between both therapy groups on pain level and disability.

The effectiveness of patient education

Patient education versus active non-educational interventions

Three studies [25, 39, 51], one with a low risk of bias, compared the effectiveness of patient education with physiotherapy [25], Swedish Back School [51] and exercise/yoga exercises [39].

Sherman et al. [39] compared the effectiveness of yoga exercises and conventional exercises with education on the outcome disability. Post-treatment, there was a statistically significant difference between the yoga exercise group and the education group in favour of the yoga group (WMD −3.4 [95%CI −5.1; −1.6]). No statistically significant difference was found between the conventional exercise group and the education group.

Pain and disability at short-term follow-up were reported by Goldby et al. [25] and no significant difference between the education group and the exercise group was found for both outcome measures at this time point.

Two studies [25, 51] reported on pain intensity at intermediate follow-up and the WMD was −9.20 [95%CI −23.55; 22.45].

Disability at intermediate follow-up was reported by three studies [25, 39, 51]; the pooled WMD was 3.16 [95%CI –3.97; 10.29]. Long-term follow-up data on pain intensity and disability were reported by two studies [25, 51] and the pooled WMDs were –5.54 [95%CI –15.80; 5.12] and –0.96 [95%CI –4.80; 2.88], respectively.

Due to serious limitations, inconsistency, and imprecision, low quality evidence is provided that there is no difference in effect on pain and disability at intermediate and long-term follow-up for patient education compared to active non-educational interventions.

Patient education: focus on anatomy versus focus on neurosystem

One study [89] with a high risk of bias compared one-on-one education focused on anatomy with one-on-one education focused on the neurosystem in 58 patients who presented themselves at private rehabilitation clinics. Fifteen weekdays after the first session, a significant reduction in disability was found in the group with focus on the neurosystem compared to the control group. However, no differences in pain perception were found.

The effectiveness of massage therapy

Three studies [65–67] with a high risk of bias compared massage therapy with relaxation therapy [65, 67] and acupuncture massage [66]. Post-treatment, there was no statistically significant reduction in pain intensity in the massage group compared to the control group; the pooled WMD was –0.93 [95%CI –8.51].

Low quality evidence (serious limitations, imprecision) was provided that there was no statistically significant difference in effect of massage therapy compared to passive interventions on pain intensity post-treatment.

The effectiveness of traction

One study [90] (n = 42) with a high risk of bias compared motorized traction treatment plus standard physiotherapy with standard physiotherapy only. No statistically significant differences were found on pain intensity, disability, and recovery at post-treatment and after 3 months follow-up between both intervention groups.

The effectiveness of behavioural treatment

A total of 21 randomized trials were identified investigating the effectiveness of behavioural treatment in patients with chronic low back pain.

Behavioural treatment versus no treatment/waiting list controls/placebo

A total of 12 studies, of which 3 studies [41, 74, 79] had a low risk of bias, were identified comparing some type of behavioural treatment to waiting list controls, no treatment, or a placebo treatment.

Respondent therapy (progressive relaxation)

Three studies [82, 83, 85] compared progressive relaxation (respondent therapy) with waiting list controls or placebo. The pooled WMD post-treatment for pain intensity was –19.74 [95%CI –34.32; −5.16] and –5.24 [95%CI –8.42; −2.06] for disability. No short- or long-term results were reported in these studies.

Respondent therapy (EMG biofeedback)

A total of four studies [70, 76, 79, 82] were identified comparing EMG biofeedback (respondent therapy) with waiting list controls or placebo. The WMD for pain intensity of the three studies of which the data could be pooled was –8.67 [95%CI –13.59; −3.74]. Disability data were only available of 2 studies and the pooled WMD post-treatment was –7.33 [95%CI –21.38; 6.73].

Operant therapy

Four studies [41, 43, 74, 84], of which three could be pooled, were identified comparing operant therapy with waiting list controls. Post-treatment there was a significant reduction in pain intensity compared to the waiting list controls (WMD –7.00 [95%CI −12.33; −1.67]). The pooled WMD for disability was –2.87 [95%CI −7.15; 1.41]. No short- or long-term results were reported in these studies. The study of Kole-Snijders [74], with a low risk of bias, showed a significant decrease in negative affect, motoric behaviour and coping control in the operant behavioural treatment group compared to the waiting list control group at post-treatment.

Combined respondent and cognitive therapy

Four studies were identified comparing a combination of respondent and cognitive behavioural treatment with waiting list controls. The WMDs for post-treatment pain intensity and disability were –12.74 [95%CI –24.10; −1.37] and –2.60 [95%CI –6.48; 1.27], respectively. No short- or long-term results were reported in these studies.

Cognitive therapy

Two studies [69, 85] were identified comparing the post-treatment effectiveness of cognitive treatment with waiting list controls. The pooled WMD for pain intensity was –12.67 [95%CI –20.26; −5.08]. Post-treatment disability was only described by Turner et al. in 1993 and a significantly decreased pain intensity between pre- and post-treatment was found for the patients in the cognitive behavioural group, but not for the waiting list control group. One study [69] with a high risk of bias reported on pain intensity at 3 months follow-up and found no statistically significant difference between the internet-based cognitive therapy group and the waiting list controls. One study [72] with a high risk of bias reported on the intermediate follow-up effects of cognitive therapy compared to waiting list controls. No statistically significant differences were found for pain intensity and disability between the two intervention groups at 6 months follow-up.

Summarized, there is low quality evidence (serious limitations, inconsistency) provided for the effectiveness of behavioural therapy compared to no treatment/waiting list controls/placebo for pain intensity and disability at short-term follow-up.

Behavioural treatment in addition to another treatment versus the other treatment alone

Seven studies compared one type of behavioural treatment plus an additional treatment with the additional treatment alone [41, 43, 68, 77, 78, 81, 87]. Three studies [41, 43, 77], one with a low risk of bias, compared operant therapy plus exercise/physiotherapy with exercise/physiotherapy alone and the WMD for pain intensity and disability post-treatment were –8.06 [95%CI –23.02; 6.91] and –1.43 [95%CI –3.68; 0.82], respectively. At intermediate follow-up the WMD for pain and disability were respectively 0.40 [95%CI −5.00; 5.80] and 1.26 [95%CI −1.78; 4.29]. Four other studies [68, 77, 78, 81] compared the effectiveness of cognitive therapy in combination with a standard inpatient program, physiotherapy, and usual GP care with these treatments alone. The post-treatment WMD for pain and disability were –0.03 [95%CI –6.72; 6.65] and –3.88 [95%CI –8.65; 0.89], respectively.

The pooled WMDs at intermediate follow-up showed no statistically significant differences on pain intensity and disability (4.49 [95%CI −1.53; 10.50] and 1.29 [95%CI –4.34; 6.91], respectively).

One study compared a combination of respondent (biofeedback) and physiotherapy with physiotherapy alone [87]. A significant difference in favour of the combination group was found for pain intensity post-treatment, but also after 6 weeks and 6 months.

We found a total post-treatment WMD for pain intensity and disability of –2.33 [95%CI –6.59; 1.93] and –1.82 [95%CI –3.88; 0.24], respectively. At 6 months follow-up the total WMDs for pain intensity and disability were –0.72 [95%CI –8.13; 6.69] and 1.39 [95%CI −0.80; 3.59], respectively.

Three studies [41, 43, 77] reported on the long-term outcomes pain and disability. Three studies compared a combination of operant behavioural treatment with exercise therapy/physiotherapy with exercise/physiotherapy alone. The WMDs for pain intensity and disability were –1.23 [95%CI −7.29; 4.83] and 0.87 [95%CI –2.32; 4.06], respectively. One study also compared a combination of cognitive treatment with physiotherapy with physiotherapy alone. We found a non-significant total WMD for long-term pain intensity and disability of –0.16 [95%CI –6.03; 5.70] and 0.85 [95%CI –2.28; 3.98], respectively.

Smeets et al. [41] compared operant therapy in combination with exercise with exercise therapy alone and was the only study reporting on the outcome recovery. No significant differences were found post-treatment and at 6-months follow-up. However, a statistically significant difference in favour of the exercise group was found at 12 months follow-up.

Only two studies [68, 81] reported on return to work and sick leave. Altmaier et al. [68] found that 48% in the behavioural treatment group had returned to work after 6 months, compared to 67% in the control group. However, this difference was not statistically significant. Schweikert et al. [81] reported on the costs due to sick leave. During follow-up, the costs were lower in the cognitive behavioural group than in the usual care group.

Summarized, there is low to moderate quality evidence (serious limitations, inconsistency) provided for not finding an effect of behavioural therapy in addition to another treatment compared to the other treatment alone in pain intensity and disability at short- and long-term follow-up.

Behavioural treatment versus other kinds of treatment

A total of six studies compared some kind of behavioural treatment with another treatment. Two studies [41, 43] compared operant behavioural treatment with exercise therapy, one study [88] compared operant therapy with physiotherapy, one study [75] compared respondent therapy (muscle relaxation) with self-hypnosis, one study [73] compared cognitive treatment with usual GP care, and one study [71] compared operant therapy and respondent therapy (biofeedback) with education. All studies reported on pain intensity, four studies reported on disability, and two studies reported on global recovery.

Post-treatment pain intensity was reported by four studies and the WMD for operant treatment was –1.61 [95%CI –6.83; 3.60] and for respondent (biofeedback) therapy –11.33 [95%CI –22.81; 0.16; Q = 0.23, df 1]. The total non-significant WMD for post-treatment pain intensity was –2.91 [95%CI −7.96; 2.13].

Disability post-treatment was reported by three studies, all comparing operant therapy with exercise therapy/physiotherapy and the total WMD was –0.32 [95%CI –3.32; 2.68].

Short-term follow-up results were reported by four studies [71, 73, 75, 88]. The WMD for pain intensity for operant therapy was –1.86 [95%CI –9.97; 6.25], for respondent therapy (biofeedback) –5.03 [95%CI −18.15; 8.10] and the total WMD for pain intensity was –5.00 [95%CI –10.08; 0.07]. Disability was reported by two studies [73, 88], of which one had a low risk of bias, and the total WMD for disability at short-term follow-up was –0.84 [95%CI −5.23; 3.64].

Three studies, comparing an operant therapy with exercise/physiotherapy reported on the intermediate outcomes pain and disability and the WMDs were –0.11 [95%CI –7.64; 7.42] and –0.28 [95%CI –4.16; 3.60], respectively.

Four studies, two of which had a low risk of bias, reported on pain and disability at 12 months follow-up [41, 43, 73, 88]. The significant WMD for pain intensity was –6.05 [95%CI –10.70; −1.40] and the WMD for disability was –2.04 [95%CI –5.19; 1.10].

Global perceived effect was reported by van der Roer et al. [88] and by Smeets et al. [41] and both studies did not find statistically significant differences between operant behavioural treatment and exercise/physiotherapy, at post-treatment and at 3, 6, and 12 months follow-up.

Summarized, there is low to moderate quality evidence (serious limitations, and inconsistency) provided that there is no difference in effect in pain intensity and disability at short- and long-term follow-up for behavioural therapy compared to other kinds of treatment.

Comparison among different types of behavioural treatment

Cognitive versus operant

One study [77] (n = 20) with a high risk of bias compared cognitive to operant therapy. All groups in this study also received a physiotherapy back-education and exercise program. The operant therapy group reported a significantly greater improvement in general function status, but not in pain intensity.

Cognitive versus respondent therapy

Two studies (n = 67) with a high risk of bias compared cognitive to respondent therapy consisting of progressive muscle relaxation training [83, 85]. The pooled WMD (n = 67) for post-treatment pain intensity was –3.02 [95%CI −13.55; 7.52] and for disability 2.31 [95%CI −1.42; 6.04]. Only one study (n = 33) reported on long-term pain and disability, and these outcomes were not statistically significantly different between the groups [85].

Due to serious limitations and imprecision, low quality evidence is provided that there is no difference in effect at post-treatment on pain and disability for cognitive compared to respondent therapy.

Operant therapy versus respondent

One study with a high risk of bias compared operant therapy (relaxation training) with respondent biofeedback therapy [71]. No statistically significant differences were found on short- and long-term (4 years) follow-up.

Cognitive-behavioural versus cognitive

Only one study (n = 33) with a high risk of bias included a comparison between groups receiving cognitive-behavioural therapy and cognitive therapy [85]. The cognitive-behavioural therapy consisted of cognitive therapy plus progressive muscle relaxation and imagery. There were neither post-treatment nor long-term statistically significant differences between the groups on any of the outcome measures (global improvement, disability, and pain intensity).

Cognitive-behavioural versus operant therapy

Two studies, one with a low risk of bias, were identified [74, 84]. One study compared cognitive-behavioural therapy to operant therapy and found statistically significant better post-treatment results on pain behaviour, and physical functioning with operant therapy, but no differences between the groups after 6 and 12-month follow-up [84]. The second study reported better pain control post-treatment with cognitive-behavioural therapy, but no other post-treatment or long-term differences [74].

Cognitive-behavioural versus respondent therapy

One study (n = 28) with a high risk of bias was identified [76]. Cognitive-behavioural therapy was compared to EMG biofeedback. No significant differences were found between the groups for pain or any of the outcome measures in the behavioural domain, at either post-treatment or 6-month follow-up.

Operant therapy: in vivo exposure versus graded activity

One study (n = 85) with a low risk of bias compared an exposure in vivo treatment with a graded activity program [86]. No significant differences on pain intensity and disability at post-treatment or 6-month follow-up were identified between both intervention groups.

Cognitive-behavioural treatment: group or individual therapy

One study compared the effectiveness of cognitive-behavioural group treatment with individual treatment [80]. No significant effects of group membership (individual vs. group) on pain intensity and disability were demonstrated post-treatment and at 6 months follow-up.

The effectiveness of multidisciplinary treatment

Multidisciplinary treatment versus no treatment/waiting list controls

Three studies were identified comparing a multidisciplinary treatment with no treatment or waiting list controls [92, 93, 96]. Jackel et al. [93] reported on post-treatment pain intensity and found a statistically significant difference in favour of the multidisciplinary treatment compared to the waiting list controls.

Two studies [92, 96] reported on short-term pain intensity and the significant pooled WMD was –9.47 [95%CI −13.87; −5.07; Q = 0.11, df 1] and the pooled WMD for disability was –8.84 [95%CI –18.49; 0.82; Q = 2.51, df 1]. Long-term outcomes revealed no statistically significant differences between a multidisciplinary rehabilitation and no treatment. The long-term non-significant WMDs for pain intensity and disability were –9.27 [95%CI –27.86; 9.12; Q = 6.71, df 1] and –0.77 [95%CI −4.62; 3.08; Q = 0.46, df 1], respectively. Therefore, there is moderate quality evidence for the effectiveness of multidisciplinary treatment on short-term pain intensity compared to no treatment/waiting list controls and there is moderate quality evidence for not finding an effect on disability and on long-term outcomes.
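To make the reported heterogeneity statistics easier to interpret, the sketch below converts Cochran's Q and its degrees of freedom into the I² statistic and, for the two-study case (df = 1), into an approximate p-value. The review reports only Q and df; I² and the p-values are derived here for illustration and are not results reported by the authors. The function names are hypothetical.

```python
import math


def i_squared(q, df):
    """I² (%) from Cochran's Q: the share of variability beyond chance."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def q_pvalue_df1(q):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom.

    Valid only for df = 1, i.e. when exactly two studies are pooled.
    """
    return math.erfc(math.sqrt(q / 2.0))


# Q values reported above (all with df = 1)
reported_q = {
    "short-term pain": 0.11,
    "short-term disability": 2.51,
    "long-term pain": 6.71,
    "long-term disability": 0.46,
}

heterogeneity = {
    label: (i_squared(q, 1), q_pvalue_df1(q)) for label, q in reported_q.items()
}

for label, (i2, p) in heterogeneity.items():
    print(f"{label}: I2 = {i2:.1f}%, p = {p:.3f}")
```

On this reading, the long-term pain comparison (Q = 6.71) shows substantial heterogeneity, whereas the short-term pain comparison (Q = 0.11) shows none beyond chance.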

One study [92] reported on sick leave and found a statistically significant difference at 4-months follow-up between the treated and the non-treated group; the median days of sick leave in the intervention group was 10 days compared to 122 days in the control group.

Multidisciplinary treatment versus other kinds of active treatment

Four studies [15, 91, 94, 95] were identified comparing a multidisciplinary treatment with inpatient exercises [91], physiotherapy [94], usual care [95], and exercise therapy [15].

One study reported on post-treatment disability and found no significant difference between both intervention groups [95].

Short-term pain intensity was reported in two studies [15, 91] and the statistically significant pooled WMD was –11.55 [95%CI –19.68; −3.43]. One study [15] reported on functional outcome and found a significant difference between both groups in favour of the multidisciplinary treatment at short-term follow-up.

Only one study [94] with a low risk of bias reported on intermediate pain intensity and disability and no statistically significant differences between the two groups were found.

Two studies [91, 94] reported on long-term pain intensity and we found a non-significant pooled WMD of –3.34 [95%CI –11.64; 4.97]. Only one study [94], with a low risk of bias, reported on long-term (12 and 24 months) disability and found no statistically significant difference between multidisciplinary treatment and physiotherapy.

One study [15] with a low risk of bias, reported on work-readiness and found a highly significant difference between the multidisciplinary intervention and the exercise intervention; 75% of the patients in the multidisciplinary group achieved work-readiness at 4 months compared to 42% in the active treatment group. Another study with a low risk of bias reported on sick leave and found no significant difference between both intervention groups, 1 and 2 years after rehabilitation [94].

One study [15] with a low risk of bias reported on pain, disability, and return to work after 5 years follow-up. No significant differences were found on pain intensity; however, patients in the multidisciplinary treatment group showed a lower disability level compared to the patients in the exercise group.

Summarized, there is moderate evidence for the effectiveness of multidisciplinary treatment compared to other kinds of active treatment on pain intensity at short-term follow-up and there is also moderate evidence that there is no statistically significant difference on pain intensity at long-term follow-up.

Outpatient versus inpatient multidisciplinary treatment

One study [96] (n = 316) with a high risk of bias compared a 3-week inpatient back school rehabilitation program with a 15-session outpatient back school rehabilitation program. No statistically significant differences were found between both intervention groups at short-term as well as on the long-term follow-up.

Discussion

In this review, 83 RCTs were included that evaluated the effectiveness of physical and rehabilitation interventions for non-specific chronic LBP.

The effectiveness of physical and rehabilitation treatment strategies

No significant treatment effects of exercise therapy compared to no treatment/waiting list controls were found on pain intensity and disability. However, compared to usual care, pain intensity and disability were significantly reduced by exercise therapy at short-term follow-up.

We found no difference in effectiveness of TENS and sham TENS and there were also no differences between TENS and active treatments. All types of behavioural therapy were more effective in reducing pain intensity than waiting list controls, but it is unknown whether this also applies to back-specific function. Additionally, there are some indications that the addition of behavioural components can reduce sick leave and costs due to sick leave. However, further research is encouraged to confirm these findings. Finally, multidisciplinary treatment was found to be more effective in reducing pain intensity compared to no treatment/waiting list controls and active treatments (e.g. exercise therapy, physiotherapy, and usual care), and sick leave is reduced at short-term follow-up.

Adverse events were not reported in any of the included studies.

None of the significant differences found in this overview study reached a difference larger than 10%, where in most studies a difference of 15–20% is defined as clinically relevant. Therefore, the differences found in this overview must be regarded as small and not clinically relevant.

Of particular note is the heterogeneity in some of the analyses among the studies. This heterogeneity could have been caused by differences in interventions, differences in control groups, duration of the intervention, and the risk of bias of the different studies. Therefore, the results of the meta-analyses with heterogeneity should be interpreted with some caution.
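The caution about heterogeneous meta-analyses can be illustrated with a minimal sketch of inverse-variance pooling, comparing a fixed-effect estimate with a DerSimonian-Laird random-effects estimate. The two study effects and standard errors below are hypothetical and are not taken from any included trial, and the section does not state which pooling model the review used; the function pool is likewise an assumption for illustration.

```python
import math


def pool(effects, ses, z_crit=1.96):
    """Inverse-variance pooling: fixed effect and DerSimonian-Laird random effects."""
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    se_fixed = math.sqrt(1.0 / sum_w)

    # Cochran's Q and the method-of-moments between-study variance tau^2
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_w_re = sum(w_re)
    random = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum_w_re
    se_random = math.sqrt(1.0 / sum_w_re)

    return {
        "fixed": (fixed, fixed - z_crit * se_fixed, fixed + z_crit * se_fixed),
        "random": (random, random - z_crit * se_random, random + z_crit * se_random),
        "q": q,
        "tau2": tau2,
    }


# Hypothetical example: two trials with discordant mean differences
result = pool(effects=[-10.0, -2.0], ses=[4.0, 4.0])
print(result)
```

In this hypothetical case both models give the same point estimate, but the random-effects interval is wider and includes zero while the fixed-effect interval does not, which is why pooled results with heterogeneity warrant a more careful reading.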

This review showed that behavioural therapy has an effect on pain intensity. This is notable because the aim of behavioural therapy is not to treat pain, but to modify one of the three response systems (behavioural, cognition, and physiological reactivity [107]). The decrease in pain intensity might be related to the combination of different treatment strategies applied in a great number of the included studies.

Notably, no studies were identified on the effectiveness of lumbar support for the treatment of chronic LBP, and few studies were found for massage therapy and traction. Therefore, further research is encouraged to identify the effectiveness of these interventions.

Two earlier reviews on the described interventions included chronic low back patients only: behavioural therapy and TENS [4, 9]. Because we applied strict criteria for “chronic low back pain”, not all studies included in those reviews were included in our overview. When we compare the results of the study from Ostelo et al. on behavioural treatment to ours, it is apparent that Ostelo et al. found strong evidence in favour of a combined respondent-cognitive therapy for a medium positive effect on pain, while we found low quality evidence for the effectiveness of behavioural therapy compared to no treatment/waiting list controls/placebo for pain intensity and disability at short term [9]. This difference is probably caused by the different inclusion criteria used on chronic LBP and the different methods used to define the level of evidence. If we compare the conclusion of the Cochrane review from Khadilkar et al. on the effectiveness of TENS versus placebo with our overview, the conclusions drawn are very comparable; both conclude that TENS is not supported compared to placebo in the management of chronic LBP [4].

Methodological considerations

The methodological quality of the studies was generally poor. Many methodological criteria regarding the internal validity of the studies were not fulfilled. Only two studies fulfilled all 11 items [33, 63]. Blinding of the patient and blinding of the care provider were not properly conducted in many studies. Blinding of patients is also difficult in many RCTs investigating the effectiveness of exercise therapy, back schools, education, behavioural treatment, and multidisciplinary rehabilitation. The quality of future RCTs in the field of back pain should be improved to reduce bias in systematic reviews and overviews, as it has been demonstrated that statistical pooling of low quality trials results in overestimation of treatment effects.

Overall, the quality of the evidence provided by the meta-analyses in this overview study was low. In most analyses there were serious limitations regarding the methodological quality, and in most analyses there was imprecision because of sparse data and wide confidence intervals. Additionally, in some analyses there was inconsistency because of heterogeneity. Therefore, further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.

Strengths and limitations

Several biases can be introduced by the literature search and selection procedure. We might have missed relevant unpublished trials, which are more likely to be small studies with non-significant results, leading to publication bias. Screening references of identified trials and systematic reviews may result in an overrepresentation of positive studies in the review, because trials with a positive result are more likely to be referred to in other publications, leading to reference bias. Studies not published in English, Dutch, or German were not included in this review. It is not clear whether a language restriction is associated with bias [108].

Subgroups were pooled because of their clinical homogeneity. However, methodological heterogeneity occurred in some of the comparisons between different intervention strategies.

Only a small number of the studies were rated as high quality and this may have led to an overestimation of effect. Also, studies may lack information to assess quality and clinical relevance. The only outcome measure used in the majority of studies was pain intensity, limiting the ability to report on other important outcomes. Because of the relatively small number of studies pooled within the different subgroups, it was not possible to conduct a sensitivity analysis. However, with the GRADE method applied, we have tried to account for the risk of bias found in the different studies.

Implications for practice

The most promising physical and rehabilitation interventions for chronic LBP patients are multidisciplinary treatment and behavioural treatment. All types of behavioural therapy were more effective in reducing pain intensity than waiting list controls. Multidisciplinary treatment was found to be more effective in reducing pain intensity compared to no treatment/waiting list controls and active treatments (e.g. exercise therapy, physiotherapy, and usual care), and sick leave is reduced at short-term follow-up. Additionally, there are some indications that the addition of behavioural components can reduce sick leave and costs due to sick leave. Exercise therapy also reduced pain intensity and disability significantly compared to usual care.

Finally, there appeared to be insufficient data to draw firm conclusions on the clinical effect of back schools, low-level laser therapy, patient education, massage, traction, superficial heat/cold, and lumbar supports.

Because of the lack of evidence and the conflicting evidence on the effectiveness of different interventions discussed in this review, only multidisciplinary treatment, behavioural treatment, and exercise therapy should be provided as conservative treatments in daily practice in the treatment of chronic LBP.

Implications for research

To conclude, we identified 83 RCTs that evaluated treatment effects for patients with chronic non-specific LBP. Most of the studies included in this review showed methodological deficiencies.

Future research should focus on high-quality RCTs with sufficient sample size to be able to draw firm conclusions. Interventions under study should be the ones which seem to be promising, but where evidence is still unclear or insufficient, such as multidisciplinary treatments, education, and exercise. For example, a large high-quality study comparing exercise therapy and education with a wait and see approach could give evidence for the effectiveness of exercise therapy compared to education or a wait and see approach, and of education compared to a wait and see approach. Additionally, comparing the multidisciplinary approach to exercise therapy alone could give insight into the additional value of the multidisciplinary approach, which is probably more expensive than a single exercise program. This also implies that cost-effectiveness studies are needed to make a cost–benefit consideration.

Finally, research should also focus on identifying specific subgroups of LBP patients for whom a certain intervention is most effective. Some patients might respond better to exercise therapy than others, but insight into the patient characteristics of such a subgroup is still lacking.