Background

Well-conducted randomised clinical trials (RCTs) generally provide the strongest evidence for the effectiveness of treatments. RCTs of treatments for non-specific low back pain have not found evidence that any treatment is clearly superior [1–3]. Yet low back pain symptoms tend to improve in RCTs regardless of the treatment provided. This improvement seems to follow a pattern common to all treatment arms: rapid early improvement within the first 6 weeks, followed by a plateau over the following 12 months [4]. The pattern is explained at least partly by the ‘natural history’ of the condition (i.e. the propensity for symptoms to improve without treatment); when treatment is provided, the course of symptoms is referred to as the ‘clinical course’. The clinical course of back pain has been assessed in observational (cohort) studies [5, 6] and was also found to follow a pattern of general improvement that starts rapidly and plateaus over time. Although this suggests a similarity between RCTs and cohort studies, there is no clear evidence for it from direct comparison. More importantly, it is not clear whether the size of overall symptom improvement is the same in these two groups of studies. Evidence from direct comparisons is limited and has mainly compared RCTs with non-randomised trials and with observational studies that included comparator groups [7].

There is an assumption that the course of symptoms in RCTs differs from that in cohort studies. It has been suggested that mere participation in a trial influences the course of symptoms [8, 9]. This might be explained by benefits perceived by participants, assumed to be related to the intensive assessment and monitoring in trials. The so-called ‘Hawthorne effect’ has been cited as an example of how individuals change their behaviour because of the attention they receive from researchers [10–12]. Although this effect is expected to apply to all studies, it might be relatively more pronounced in RCTs than in cohort studies.

Another issue is whether participants in RCTs differ in some way from the average person presenting for care in usual clinical practice, and specifically whether their willingness to be randomly allocated to a treatment or a placebo makes them different from the average patient to whom the results of RCTs will be applied. If so, participants in RCTs may be less representative of the average patient than participants in observational studies, in which patients are not randomised.

It is therefore important to establish whether the pattern and size of back pain symptom improvement are similar in these two types of studies. Such a comparison would test the assumption that mere willingness to enrol in an RCT and be randomised to treatment influences the clinical course of symptoms, which would have potentially important implications for interpreting the results of RCTs and their generalizability to clinical practice.

The aim of this systematic review and meta-analysis was to compare changes in low back pain symptoms over time in RCT participants with those of participants in observational cohort studies.

Methods

Criteria for inclusion

Included were studies (RCTs and prospective observational cohort studies) of primary care treatments for low back pain (LBP) (e.g. analgesia, exercises, manipulation therapy) among individuals aged 18 years or over. Studies had to provide baseline and follow-up data on the designated primary outcome measure, pain intensity, measured on a Numerical Rating Scale (NRS) or Visual Analogue Scale (VAS). Only studies published in English were included. Studies conducted among patients with specific LBP (e.g. cancer or inflammatory arthritis), post-operative or post-traumatic back pain, or back pain associated with pregnancy or labour were excluded.

Searching and selection of studies

To meet the specific aims of the study, the literature search did not need to be exhaustive, but it had to provide a sufficiently large pool of studies. The Cochrane Central Register of Controlled Trials (CENTRAL) was therefore chosen as a sufficient data source for RCTs. This search was an update (up to April 2012) of a strategy previously used and described elsewhere [4]. For observational studies, a literature search covering the same time period was conducted in the AMED, EMBASE, MEDLINE and CINAHL databases using the keywords ‘low back pain’, ‘back pain’, ‘spinal pain’, ‘primary care’, ‘general practice’, ‘population’, ‘cohort’, ‘observational’, ‘prognosis’, ‘predictor’ and ‘course’. The detailed search strategy is shown in Additional file 1. Reference lists of relevant systematic reviews and of included cohort studies were also hand-searched to identify additional eligible studies.

The literature search was conducted by MA; screening of citations and abstracts, and selection of RCTs and cohort studies by applying the inclusion criteria, was conducted by MA, DVdW and KPJ.

Data extraction

The extracted data included:

  1. Study characteristics (publication year, country of study, clinical setting, study design, sample size).

  2. Participants’ characteristics (mean age; % female; duration of symptoms).

  3. Interventions: name, dose and provider.

  4. Outcome: baseline and follow-up mean scores (and baseline standard deviation (SD)) for pain intensity.

Analysis

First, RCTs as a single group were compared with observational studies. Second, RCTs were sub-grouped into efficacy and pragmatic trials: trials that included a placebo, sham or no-treatment arm were classified as efficacy trials, and trials that included a comparator arm of usual care or a waiting list were classified as pragmatic trials. Each RCT subgroup was compared separately with observational studies. Finally, to compare groups that were similar with regard to the type of treatment, a separate analysis compared cohort studies with RCT arms that received ‘usual care’.

Where necessary, pain intensity scores were converted to a 0–100 scale (least to most severe) by multiplication (e.g. scores on a 0–10 scale were multiplied by 10). Meta-analysis using a random-effects model was performed in STATA/IC 11 to compute pooled mean pain intensity scores (with 95% confidence intervals) at baseline and follow-up, separately for RCT treatment arms and for observational studies. The commonly used follow-up times of 6, 13, 27 and 52 weeks were selected for comparison. Data reported at other time points were assigned to a selected point if they fell within a three-week range of it.
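The rescaling and time-point assignment described above can be sketched in Python. This is illustrative only: the function names are hypothetical, the maximum of each study's original scale is assumed to be known, and "within a three-week range" is interpreted here as within ±3 weeks of a selected point, which the text does not state explicitly.

```python
"""Sketch: rescale pain scores to 0-100 and assign reported follow-up
times to the analysis points (6, 13, 27 and 52 weeks).

Assumptions (not stated in the source): each study's scale maximum is
known, and "within a three-week range" means within +/-3 weeks.
"""

SELECTED_WEEKS = (6, 13, 27, 52)
WINDOW_WEEKS = 3  # assumed tolerance around each selected point


def rescale_to_100(score: float, scale_max: float) -> float:
    """Convert a score on a 0..scale_max scale to 0..100 by multiplication."""
    if scale_max <= 0:
        raise ValueError("scale_max must be positive")
    return score * (100.0 / scale_max)


def assign_time_point(week: float):
    """Return the nearest selected week if it lies within the window,
    otherwise None (the data point is not used at any selected time)."""
    nearest = min(SELECTED_WEEKS, key=lambda w: abs(w - week))
    return nearest if abs(nearest - week) <= WINDOW_WEEKS else None


print(rescale_to_100(7, 10))   # 0-10 NRS score of 7 -> 70.0
print(assign_time_point(12))   # -> 13
print(assign_time_point(26))   # -> 27
print(assign_time_point(40))   # -> None (too far from 27 and 52)
```

Because the selected points are at least 7 weeks apart, ±3-week windows never overlap, so each reported time maps to at most one analysis point.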

To compare the size of improvement in outcome scores between RCTs and observational studies, the standardized mean change (SMC) [13] was calculated for each RCT treatment arm and each observational study by subtracting the follow-up mean outcome score from the baseline mean score and dividing by the standard deviation (SD) of the baseline scores. Pooled SMCs were calculated using random-effects meta-analysis. SMCs over 0.8 were considered large, 0.5–0.8 moderate and less than 0.5 small [14]. The 95% confidence intervals for SMCs were calculated using the formula described by Hozo et al. [15]. The variance (squared standard deviation, σ²) of the response size was calculated using the following formula [15]:

$$\sigma^{2} = \frac{2(1-\rho)}{n}\cdot\frac{n-1}{n-3}\left[1 + \frac{n}{2(1-\rho)}\,\delta^{2}\right] - \frac{\delta^{2}}{\left[c(n-1)\right]^{2}}$$

where $c(n-1) \approx 1 - 3/[4(n-1) - 1]$, $\rho$ is the population correlation between baseline and follow-up scores, estimated as 0.5, $n$ is the sample size and $\delta$ is the SMC. Heterogeneity of the studies’ estimates was assessed using the I² statistic [16], which describes the percentage of total variation across studies that is due to between-study heterogeneity rather than chance: 0% indicates no observed heterogeneity, and values approaching 100% indicate that almost all of the variation reflects differences between studies. Meta-regression analyses were conducted to test the significance of the difference in the size of SMCs between RCTs and observational studies at the selected follow-up points.
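The SMC and its variance can be sketched in Python from per-arm summary data (baseline mean, follow-up mean, baseline SD and n). This is a minimal sketch, not the authors' code: the function names are hypothetical, ρ = 0.5 follows the text, and the normal-approximation interval δ ± 1.96·σ is an assumption, since the exact confidence-interval procedure is only referenced, not spelled out.

```python
import math


def hedges_c(m: float) -> float:
    """Small-sample correction factor c(m) ~= 1 - 3 / (4m - 1), used with m = n - 1."""
    return 1.0 - 3.0 / (4.0 * m - 1.0)


def smc(baseline_mean: float, followup_mean: float, baseline_sd: float) -> float:
    """Standardized mean change: (baseline - follow-up) / baseline SD.
    Positive values mean pain intensity decreased (improvement)."""
    return (baseline_mean - followup_mean) / baseline_sd


def smc_variance(delta: float, n: int, rho: float = 0.5) -> float:
    """Variance of the SMC following the formula in the text (requires n > 3)."""
    term = (2.0 * (1.0 - rho) / n) * ((n - 1.0) / (n - 3.0))
    term *= 1.0 + (n / (2.0 * (1.0 - rho))) * delta ** 2
    return term - delta ** 2 / hedges_c(n - 1.0) ** 2


def smc_ci(delta: float, n: int, rho: float = 0.5, z: float = 1.96):
    """Assumed normal-approximation 95% CI: delta +/- z * sqrt(variance)."""
    se = math.sqrt(smc_variance(delta, n, rho))
    return delta - z * se, delta + z * se


def magnitude(delta: float) -> str:
    """Thresholds used in the review: >0.8 large, 0.5-0.8 moderate, <0.5 small."""
    d = abs(delta)
    if d > 0.8:
        return "large"
    if d >= 0.5:
        return "moderate"
    return "small"


d = smc(60, 35, 20)   # baseline 60, follow-up 35, baseline SD 20 (0-100 scale)
print(d)              # -> 1.25
print(magnitude(d))   # -> large
print(smc_ci(d, 100))
```

With ρ fixed at 0.5, the variance depends only on δ and n, which is why the review could compute it for every arm even though baseline–follow-up correlations are rarely reported.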

Results

Included studies

The updated search for RCTs yielded a total of 1134 citations, of which papers for 70 RCTs (165 treatment arms) satisfied the inclusion criteria and provided pain intensity data usable for analysis (Figure 1). The search for observational studies yielded a total of 653 citations (Figure 2), of which 15 papers provided pain intensity data usable for analysis. Relevant data for a further four papers were obtained by contacting the authors, allowing analysis of pain intensity data from papers for a total of 19 observational studies.

Figure 1

Identification and inclusion of RCTs in the systematic review.

Figure 2

Identification and inclusion of observational cohort studies in the systematic review.

Characteristics of study setting and population

The included RCTs and observational studies and their population characteristics are listed in Tables 1 and 2. The studies were conducted in more than 13 countries, including the USA, Australia and European countries, over a period spanning two decades. The two groups were comparable in age distribution, gender composition and mean baseline pain intensity (Table 3). RCTs appeared to include a larger percentage of participants described as having chronic low back pain than observational studies (57% in RCTs vs 11% in cohorts). However, these figures need to be interpreted with caution, as observational studies more often included a mixture of patients with acute and chronic back pain (19% in RCTs vs 63% in cohorts).

Table 1 Characteristics of included observational cohort studies (n = 19)
Table 2 Characteristics of included RCTs (n = 70)
Table 3 Comparison of population characteristics of included RCTs and observational cohort studies

The settings of RCTs included general practice (18 RCTs), occupational health care departments (15 RCTs) and physiotherapy departments (19 RCTs). Eight trials were conducted in the general population and 10 in mixed settings. Thirteen RCTs (34 treatment arms) were classified by one of the authors (MA) as efficacy trials and the remaining 57 (131 treatment arms) as pragmatic trials. Eight RCTs included ‘usual care’ arms. The 19 observational studies included consulters in general practice (11 studies) and in other allied primary care services, such as chiropractic clinics and physiotherapy departments, as well as cohorts sampled from the general population (two studies). All participants were described in the papers as receiving ‘usual’ or ‘standard’ care.

The course of pain intensity scores over time

Pooled mean pain intensity scores at baseline and follow-up for RCTs and observational studies are presented in Figure 3 and Table 4. They show a similar pattern of symptom change over time in both groups, characterised by a substantial, rapid early improvement in mean pain intensity within the first 13 weeks of follow-up, followed by a smaller further improvement up to 52 weeks.

Figure 3

Pooled mean pain intensity scores (95% confidence interval) for the included RCTs and observational cohort studies from baseline to 52 week follow up.

Table 4 Pooled mean pain intensity scores (95% CI) for included RCTs and observational cohort studies using random effects meta-analysis

Regarding the size of symptom change over time, pooled SMCs (Table 5) confirm the substantial improvement in pain in both groups, ranging from 0.9 to 1.2 for RCTs and from 1.0 to 1.2 for observational studies, all within the range defined as large (over 0.8).

Table 5 Pooled estimates of SMCs (95% confidence interval) for pain intensity for included RCTs and observational cohort studies

There was large between-study variation in the size of pain improvement from baseline within both observational studies and RCT treatment arms, as demonstrated by the high I² values (99%).
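Random-effects pooling and the I² statistic can be illustrated with a minimal sketch. The text states only that a random-effects model was fitted in STATA/IC 11; the DerSimonian–Laird moment estimator of the between-study variance τ² used here is an assumption (a common default), and the function name `random_effects` is hypothetical. Inputs are per-study estimates (e.g. SMCs or mean scores on the 0–100 scale) and their within-study variances.

```python
import math


def random_effects(estimates, variances):
    """DerSimonian-Laird random-effects pooling (assumed estimator).

    Returns (pooled, ci_low, ci_high, tau2, i2_percent)."""
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two studies")
    # Fixed-effect (inverse-variance) weights and estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q and degrees of freedom.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    # Moment estimate of between-study variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sw_star
    se = math.sqrt(1.0 / sw_star)
    # I^2: share of total variation attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2, i2


pooled, lo, hi, tau2, i2 = random_effects([0.5, 1.5], [0.01, 0.01])
print(f"pooled={pooled:.2f}, tau2={tau2:.2f}, I2={i2:.0f}%")
# -> pooled=1.00, tau2=0.49, I2=98%
```

The toy input shows how two precise but discordant estimates produce an I² close to 100%, the situation reported here: the random-effects weights then become nearly equal, because τ² dominates each study's own variance.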

Meta-regression analysis showed no statistically significant difference in the change in pain intensity (SMC) between all RCTs and observational studies at any follow-up point. There was also no statistically significant difference in the change in pain intensity when each type of RCT (pragmatic and efficacy) was compared separately with observational studies. Comparing cohort studies with the usual care arms of RCTs also did not show any difference in the pattern or course of LBP between these groups.
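The review used meta-regression for these comparisons but does not give the model specification. As a simpler stand-in, and explicitly not the authors' method, the sketch below compares two independently pooled subgroup estimates (e.g. SMCs of RCT arms vs cohort studies at one follow-up point) with a two-sided z-test. It assumes DerSimonian–Laird pooling within each subgroup; `dl_pool` and `subgroup_difference` are hypothetical names.

```python
import math


def dl_pool(y, v):
    """DerSimonian-Laird pooled estimate and its variance (assumed estimator)."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance; zero when there is only one study (c == 0).
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    ws = [1.0 / (vi + tau2) for vi in v]
    return sum(wi * yi for wi, yi in zip(ws, y)) / sum(ws), 1.0 / sum(ws)


def subgroup_difference(y_a, v_a, y_b, v_b):
    """Two-sided z-test for the difference between two pooled subgroups.

    Returns (difference, z, p_value)."""
    est_a, var_a = dl_pool(y_a, v_a)
    est_b, var_b = dl_pool(y_b, v_b)
    diff = est_a - est_b
    z = diff / math.sqrt(var_a + var_b)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return diff, z, p


# Hypothetical SMCs and variances, for illustration only.
rct_smc, rct_var = [1.0, 1.2, 0.9], [0.02, 0.03, 0.02]
cohort_smc, cohort_var = [1.1, 1.0], [0.02, 0.04]
print(subgroup_difference(rct_smc, rct_var, cohort_smc, cohort_var))
```

A subgroup z-test and a meta-regression with a single binary covariate (RCT vs cohort) address closely related questions; meta-regression extends more naturally to further covariates such as follow-up time.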

Discussion

This study directly compared the course of non-specific low back pain symptoms in observational studies with that in RCTs of primary care treatments for back pain. The results showed no significant difference in either the size of symptom improvement or its pattern over time.

A separate comparison of observational studies with efficacy RCTs also failed to show any difference in the size of symptom improvement. This comparison tested the assumption that, compared with pragmatic RCTs, efficacy RCTs are characterised by a higher level of attention and adherence to the treatment protocol, as well as by stricter criteria for patient selection and inclusion [111, 112]. Guidelines and tools are available to describe clinical trials as efficacy or pragmatic; some are intended to inform trial design [111], others to support systematic reviews [112]. RCTs, however, are very rarely purely pragmatic or purely efficacy trials; they can often be placed along a continuum between these two ends, and most include features of both, with one or the other possibly dominating. To satisfy the specific aims of our study, which relate to the care and attention received in studies, we classified trials that included placebo, sham or no-treatment arms as efficacy trials.

A separate comparison between observational studies and the ‘usual care’ arms of RCTs was assumed to compare groups receiving similar types of treatment. This comparison also failed to show any difference in the pattern or size of the clinical course of symptoms in these groups. This echoes our previous finding of no significant difference in the pattern or size of symptom improvement between usual care and active treatment arms in RCTs [4].

One finding of this study was the large heterogeneity among cohort studies and RCT arms. Conducting meta-analysis in the presence of large heterogeneity is potentially problematic. Using a random-effects model mitigates this problem to an extent, but not completely. For this reason, the results of the meta-analysis need to be interpreted within the specific context and aim of this study, namely to examine the general trend of the clinical course of symptoms. The heterogeneity could be explained by a number of methodological and clinical characteristics; formally studying these potential sources of heterogeneity is important but beyond the aims of this study.

Meta-analyses comparing RCTs and observational studies have been conducted with varying aims, including comparing treatment effects [111], adverse effects of treatments [112, 113] and prognostic factors [114]. However, although the clinical course of low back pain has been studied in observational studies [10, 11], we are not aware of a direct comparison with the clinical course of symptoms in RCTs. Furlan et al. [12] compared matched pairs of RCTs and non-randomised studies and included cohort studies, but only those that had comparison groups. More importantly, the main aim of Furlan et al.’s work was to compare RCTs with non-randomised studies with regard to methodological quality rather than to study the clinical course of symptoms.

A number of factors have been suggested to influence the course of symptoms in clinical trials, relating to the participants (e.g. cultural background, health literacy) [115–117], the practitioner or researcher (e.g. communication skills and experience with the treatment) [115, 118] and the characteristics of the treatment (e.g. invasiveness, physical contact and psychological component) [119]. Another suggested factor is enrolment in a trial per se. This is assumed to relate to the actual and perceived extensive care and attention provided in a trial (the ‘Hawthorne effect’ or ‘care effect’), or to the strict adherence to the treatment protocol that is specific to trials (the ‘protocol effect’). Such effects are assumed to contribute to extra improvement among participants in clinical trials compared with other studies or usual clinical practice [5].

The clinical course of back pain in observational studies might simply represent an extension of our earlier findings in RCTs [4], namely an average ‘general response to health care’ that dominates any individual responses to treatments and overwhelms any additional effect of being in a trial, in an observational study or, indeed, in usual routine care. It is true that RCTs provide specific treatments, whereas observational studies do not specify particular treatments; indeed, none of the observational studies included in our review evaluated a specific treatment. However, the conservative treatments for non-specific low back pain investigated in RCTs are not new but are already available in clinical practice [1, 3]. This might mean that participants in back pain RCTs do not generally have high expectations of novel or large effects.

Alternatively, RCTs and observational studies may differ in the care and attention provided, but the effect of these differences on the clinical course of symptoms may lie in outcomes other than those captured by pain intensity, specifically outcomes that may represent components of a ‘trial effect’. Measuring such outcomes was beyond the scope of this paper.

Participants in observational studies are arguably similar to patients presenting in usual clinical practice. Our findings therefore suggest that RCT participants do not differ from the average patient with regard to the clinical course of LBP. This challenges the assumption that participants in clinical trials are somehow different from the average patient, or that the course of their symptoms is to some extent influenced by mere participation in the trial. In other words, our findings support the generalizability of trial findings to patients in usual clinical practice. The findings also cast doubt on the assumed effect of mere participation in a trial, although our study did not specifically aim to study this effect.

Limitations

A large number of observational studies and RCTs on a wide range of treatments for non-specific low back pain were included to study the overall size of change in pain symptoms over time. The study, however, has a number of limitations.

For the literature search, we adopted the strategy used in a previous study, conducted and published by the same group, that examined the course of LBP in RCTs [4], updating the search of the CENTRAL database. Although this might have limited the number of RCTs included in the study, it is unlikely that the number of trials missed was large enough to affect the study outcome. Adopting the same strategy also allows continuity of comparison between the two studies.

Also, as the aim of the study was to investigate the overall clinical course of LBP rather than to estimate the effectiveness of a particular treatment, exhaustive inclusion of all trials of back pain treatments was not required. The aim was to obtain a large and representative pool of clinical trials that varied sufficiently in the types of treatment to meet the objectives of this review, and the CENTRAL database satisfied this aim. As no comparable database exists for observational cohort studies, a different search strategy was used for this group of studies.

The numbers of included RCTs and observational studies were not comparable, which might raise the concern that the outcome of the comparison is inaccurate. Although this concern is arguably valid, the comparisons with smaller subgroups of RCTs (efficacy RCTs and usual care arms) involved more comparable numbers. The results of these comparisons were consistent with those of the comparison between all RCTs and all cohort studies, which should help alleviate this concern.

The focus of our study was on pain intensity measured on a Numerical Rating Scale (NRS) or Visual Analogue Scale (VAS), because data on other outcome measures, such as functional disability, were insufficient to allow a satisfactory comparison. Restricting meta-analysis to a single outcome measure is common in systematic reviews of observational studies for the same reason [11]. Excluding studies that did not provide data relevant to our analysis might have influenced the results. However, we have no evidence to suggest that this led to systematic exclusion of studies with either large or small improvements in symptoms. In a previous review, we found that the overall course of symptoms measured with functional disability outcomes (Roland Morris Disability Questionnaire, RMDQ, and Oswestry Disability Index, ODI) was similar to that measured with pain intensity [4].

Conclusion

The course of back pain symptoms in observational studies follows a pattern that is similar to that in RCTs, notably in the size of the average improvement in pain intensity over time. This suggests that, in both types of studies, a general improvement in back pain symptoms and comparable responses to nonspecific effects related to seeking and receiving care occur regardless of the study design.