Interventions for promoting evidence-based guideline-consistent surgery in low back pain: a systematic review and meta-analysis of randomised controlled trials

We examined the effectiveness of interventions that promote guideline-adherent surgery and surgical referral for low back pain via systematic review and meta-analysis. Five databases (10 September 2021), Google Scholar and the reference lists of relevant systematic reviews were searched, and forward and backward citation tracking of included studies was performed. Randomised controlled/clinical trials in adults with low back pain of interventions to optimise surgery rates, referrals to surgery or secondary referral were included. Bias was assessed using the Cochrane RoB 2 tool and evidence certainty via Grading of Recommendations Assessment, Development and Evaluation (GRADE). A random-effects meta-analysis with a Paule–Mandel estimator was used to calculate the odds ratio, and the Hartung–Knapp–Sidik–Jonkman method was used to calculate its 95% confidence interval. Of 886 records, 6 studies (N = 258,329 participants) were included; cluster sizes ranged from 4 to 54. Five studies were rated as low risk of bias and one as having some concerns. Two studies reporting spine surgery referral or rates could only be pooled via combination of p values and gave evidence for a reduction (p = 0.021, Fisher's method, risk of bias: low). This did not persist with sensitivity analysis (p = 0.053). For secondary referral, meta-analysis revealed a non-significant odds ratio of 1.07 (95% CI [0.55, 2.06], I² = 73.0%, n = 4 studies, GRADE evidence certainty: very low). Few RCTs exist for interventions to improve guideline-adherent spine surgery rates or referral. Clinician education in isolation may not be effective. Future RCTs should consider organisational and/or policy level interventions. PROSPERO registration: CRD42020215137.


Strengths and limitations of this study
• Strength: systematic review and meta-analytic approach.
• Strength: the analytical methods explore the heterogeneity in the main data estimates, by sensitivity analyses to examine outliers, outcome type, and assumptions on ICC values.
• Strength: implemented the most appropriate adjustment for the sample sizes of the included cluster RCTs for meta-analysis and used more efficient meta-analytic methods accounting for the smaller number of included studies.
• Limitation: limited evidence base and the differences of the interventions within included studies.
• Limitation: the outcomes examining secondary (specialist) referral are likely not solely surgical referral.

Introduction
Low back pain (LBP) is the leading cause of disability worldwide [1]. It is estimated that up to 90% of people will suffer from LBP during their lifetime [2]. The societal cost of LBP is reported to be in excess of US$100 billion per year in the USA [3], more than AU$9 billion per year in Australia [4] and approximately €50 billion per year in Germany [5]. The development and adoption of evidence-based clinical guidelines has been shown to reduce costs and may lead to improved patient outcomes: randomised controlled trials have reported either significant [6][7][8] or non-significant [9][10][11] reductions in costs favouring guideline-adherent approaches. Patient outcomes improved in some [10,12,13], but not all [14,15], randomised controlled studies trialling guideline-consistent approaches.
International evidence-based clinical practice guidelines provide recommendations on specialist referral for back pain. The majority (69%) of international evidence-based clinical guidelines recommend specialist referral, including to a neurosurgeon, in cases of serious radiculopathy, whereas 54% further recommend specialist referral if there is no improvement over time [16]. Conversely, neurosurgical referral is not recommended in patients with non-specific LBP [16]. Recommendations regarding the use of surgery for non-specific and radicular LBP are debated and, consequently, 47% of guidelines do not provide any recommendation for or against. Of those that do provide recommendations, half are against the use of surgery for non-specific LBP [16].
Currently, clinical practice guidelines are not always followed. For example, increasing rates of surgical procedures for LBP have been observed internationally [17][18][19], despite no underlying change in the prevalence of the condition. Further, investigations of referral of patients to secondary specialist care show that these referrals can occur unnecessarily: three prior studies noted unnecessary or inappropriate referral in 10% (14 of 139) [20], 19% (10 of 54) [21], and 41% (12 of 29) [22] of the acute LBP patients included. Beyond the additional cost to healthcare systems of secondary care consultation, surgical management of LBP carries more potential harms than conservative management [18]. Overall, the data suggest that there is potential for cost savings and harm reduction by changing clinical practice in primary care so that surgery and surgical referral become guideline-concordant. To date, no systematic review has assessed this topic.
Our aim was to systematically review and meta-analyse the randomised controlled or clinical trials (RCTs) that examined the effect of implementing evidence-based guidelines on surgery rates or surgical/secondary referral for low back pain. In addition, to potentially inform future work, we also collated information on prospective non-randomised interventional studies relevant to the review.

Materials and methods
This review was completed in accordance with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [23]. The review was registered prospectively with PROSPERO (CRD42020215137). The statistical code, data and extraction sheet are available at https://www.doi.org/10.17605/OSF.IO/APMH2. To locate additional relevant records, we also searched Google Scholar and included the reference lists of relevant systematic reviews (identified via the Cochrane Database of Systematic Reviews and Google Scholar) published in the last 10 years. The reference lists of included studies were checked for potentially relevant articles. In addition, forward citation tracking of included studies was performed by adding articles that cited the included studies in Web of Science to screening. Furthermore, reference lists (double screened) of studies excluded solely for not being an RCT (e.g. interrupted time series analyses, controlled before and after studies) were also screened for potentially relevant articles.

Study selection
All results of the search were screened to exclude duplicates. Two independent reviewers (KE, XC), who were blinded to each other's assessments, screened the titles and abstracts of the remaining studies against predetermined eligibility criteria. The full-text reports of articles deemed potentially eligible after this first screening were screened again by both independent reviewers using the same inclusion and exclusion criteria. Any disagreements were adjudicated by SDT and discussed with the project team as necessary.
Participants: adults (≥ 18 yrs of age) with LBP. See Supplemental Data 1 for further detail on definitions.
Interventions: No limitation was placed on interventions, and these were classified according to procedures [24] used by the Cochrane Effective Practice and Organisation of Care (EPOC) group into professional, financial, organisational, patient oriented, structural or regulatory interventions.
Comparators: per Cochrane EPOC procedures [24], eligible comparators were a no-intervention control group, a standard practice control group and/or untargeted activity.
Outcomes: Surgical referral (including referral to secondary care if deemed to have surgical component) or surgery rates were included outcomes. See Supplemental Data 1 for further detail on definitions.
Study design: only full-text peer-reviewed journal articles (grey literature excluded) with a parallel-arm (individually or cluster-randomised) randomised controlled or clinical trial design were eligible.

Data collection and data items
Data extraction was completed by two independent assessors (xxxx). Extracted information included: publication information (i.e. author, title, year, journal), study design, cluster details for cluster randomised trials (e.g. number of clusters), number of participants, participant characteristics (e.g. age and sex), intervention details (e.g. duration, type, frequency), outcome measures (e.g. surgical rates), study funding, and author conflict of interest. Where available, we extracted participants' pain intensity, disability, and any adverse events from included trials. Extracted data were the number or percentage of surgery use or referrals and the total number of participants or appointment sessions. When data were presented in figures only, data were extracted by generating a screenshot and using ImageJ (version 1.48v, https://imagej.nih.gov/ij/) to measure the length (in pixels) of the axes to calibrate, and then the length in pixels of the data points of interest [25]. In all instances where data required for meta-analysis were not available, authors were contacted a minimum of three times over a four-week period to request the information. Agreement between the data extracted by the two independent assessors was evaluated using custom spreadsheets set up in Google Sheets. Any discrepancies were discussed by the assessors, with disagreements adjudicated by xxxx.
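The pixel-based extraction from figures described above amounts to a linear calibration of an axis. The following is a minimal sketch in Python, assuming a linear (not logarithmic) axis; the function name and the example numbers are hypothetical and are not taken from the review or from any included study.

```python
def pixels_to_value(point_px, axis_px, axis_min, axis_max):
    """Convert a pixel length measured from the axis origin into a data value.

    point_px: length in pixels from the axis origin to the data point
    axis_px:  length in pixels of the whole axis (from axis_min to axis_max)
    Assumes a linear axis scale (an assumption of this sketch).
    """
    if axis_px <= 0:
        raise ValueError("axis length in pixels must be positive")
    return axis_min + (point_px / axis_px) * (axis_max - axis_min)


# Hypothetical example: a y-axis running from 0% to 20% measures 400 px,
# and the top of a bar sits 130 px above the origin.
value = pixels_to_value(130, 400, 0.0, 20.0)
print(round(value, 2))  # 6.5
```

Measuring the full axis first, as the text describes, is what makes the result independent of screenshot resolution.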

Risk of bias in individual studies
The Cochrane Collaboration Risk of Bias Tool version 2 [26] was used to assess study quality. Further detail is provided in Supplemental Data 1. Two assessors (KE, XC) evaluated this independently and any disagreements were adjudicated by SDT.

Certainty of evidence
The assessment of certainty in the body of evidence was conducted in accordance with the Grading of Recommendations Assessment, Development and Evaluation (GRADE) guidelines [27] (see Supplementary Table 2) and performed by one person (SDT).

Statistical analysis
For statistical analysis, we created two categories of comparators: (1) multifaceted intervention and (2) no intervention control group, standard practice control group according to the EPOC guidelines [24]. The primary analysis approach was a random-effects meta-analysis of odds ratios (ORs; as they are invariant to baseline risk [28]) with a Paule–Mandel estimator for the between-study variance τ², an SSW estimator for the overall effect with weights that depended only on the studies' effective sample sizes, and a 95% CI for the overall effect based on the Hartung–Knapp–Sidik–Jonkman method. We used this method as it outperforms the standard random-effects method and other methods [29]. Cluster RCTs were handled by calculating a design effect to correct for clustering of the trials (see Supplemental Data 1 for further detail on this approach). Measures of heterogeneity used were Cochran's Q, the resulting chi-squared statistic, and I². 95% prediction intervals were to be used to assess the amount of heterogeneity if there were at least 10 studies in the meta-analysis [30]. Publication bias (or small study bias) was to be assessed via funnel plots, Egger's test, and trim and fill methods if at least 10 studies were included in the meta-analysis [31]. We performed sensitivity analysis via outlier identification and influence analysis [32].
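The design-effect correction for cluster RCTs mentioned above can be illustrated with a short sketch. It uses the standard formula, design effect = 1 + (m − 1) × ICC, where m is the mean cluster size, and divides both events and totals by that factor to obtain effective counts. The counts, the mean cluster size and all function names are hypothetical; the ICC of 0.015 is the value the review assumes for studies without a published estimate. The confidence interval here is a plain single-trial Wald interval, not the pooled Hartung–Knapp–Sidik–Jonkman interval, and the review's exact procedure is described in its Supplemental Data 1.

```python
import math


def design_effect(mean_cluster_size, icc):
    """Standard design effect for a cluster-randomised trial."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def adjust_arm(events, total, mean_cluster_size, icc):
    """Divide events and total by the design effect (effective counts)."""
    de = design_effect(mean_cluster_size, icc)
    return events / de, total / de


def odds_ratio_ci(e1, n1, e2, n2, z=1.96):
    """Odds ratio with a Wald 95% CI computed on the log scale."""
    a, b, c, d = e1, n1 - e1, e2, n2 - e2
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical trial: 120/2000 referrals (intervention), 150/2000 (control),
# mean cluster size 50, assumed ICC 0.015.
raw = odds_ratio_ci(120, 2000, 150, 2000)
e1, n1 = adjust_arm(120, 2000, 50, 0.015)
e2, n2 = adjust_arm(150, 2000, 50, 0.015)
adjusted = odds_ratio_ci(e1, n1, e2, n2)

print("design effect:", round(design_effect(50, 0.015), 3))
print("raw OR and CI:", [round(x, 3) for x in raw])
print("adjusted OR and CI:", [round(x, 3) for x in adjusted])
```

When both arms share the same design effect, the point estimate is unchanged and only the interval widens, which is the intended behaviour: clustering reduces the information in the data without shifting the estimate.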
If meta-analysis of effect size estimates was not possible due to missing data, we followed Cochrane Handbook recommendations [33] and performed a combination of P values. This can be considered where there is no or minimal information reported beyond P values and the direction of effect [33]. We used Fisher's method to combine the P values and performed a sensitivity analysis with Stouffer's method [33,34]. All calculations and graphics were performed with the software R [35] and the extension packages meta [36], dmetar [37] and metap [38].
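As an illustration of the two combination rules named above, the sketch below implements Fisher's method (the statistic −2 Σ ln p follows a chi-squared distribution with 2k degrees of freedom under the null) and Stouffer's method (the sum of normal quantiles divided by √k). It is written in Python rather than R, uses only the standard library, assumes one-sided P values pointing in the same direction, and uses hypothetical input values; it is not the review's own code, which is available from the OSF repository cited in the methods.

```python
import math
from statistics import NormalDist


def fisher_combined(p_values):
    """Fisher's method: combined P value from -2 * sum(ln p), chi-squared, 2k df."""
    if not p_values or any(not 0 < p < 1 for p in p_values):
        raise ValueError("P values must lie strictly between 0 and 1")
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # statistic / 2
    # Closed-form chi-squared survival function for even df = 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stouffer_combined(p_values):
    """Stouffer's method (unweighted): combine normal quantiles of the P values."""
    if not p_values or any(not 0 < p < 1 for p in p_values):
        raise ValueError("P values must lie strictly between 0 and 1")
    nd = NormalDist()
    z = sum(nd.inv_cdf(1 - p) for p in p_values) / math.sqrt(len(p_values))
    return 1 - nd.cdf(z)


# Hypothetical one-sided P values from two studies (not the review's data).
p_vals = [0.01, 0.40]
print("Fisher:  ", round(fisher_combined(p_vals), 4))
print("Stouffer:", round(stouffer_combined(p_vals), 4))
```

Fisher's method is more sensitive to a single very small P value, whereas Stouffer's method averages evidence across studies, so the two can fall on opposite sides of a fixed alpha level for the same inputs.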

Study selection
A PRISMA flow chart of the systematic review process is shown in Fig. 1. There were 886 records included in title and abstract screening, of which 88 were identified as potentially meeting the eligibility criteria and included in the full-text screening. The examination of full texts resulted in 82 studies being excluded (Supplementary Table 3) and six studies [8,11,39–42] being included (Tables 1 and 2). All six were eligible for quantitative analysis; two examined spine surgery referral or rates and four reported referral to secondary care. One publication required extraction from an image [41]. For this publication [41], the authors were contacted to request data, but the lead author had died and data were no longer available.

Study characteristics
Population: The sample sizes included in intervention phases of the studies ranged from 1101 to 245,710 and the total number of patients included in the review was 258,329. Cluster sizes ranged from 4 through to 54, with the number of clusters in one study [8] being unclear. Attempts to contact authors for further information were not successful. Two studies [8,11] examined patients with acute LBP, and two studies examined patients where the majority [39] or all [40] had less than three months of pain. In two studies [41,42], the duration of pain was unclear.
Interventions: All six studies [8,11,39–42] incorporated education and/or workshop components for clinicians; three studies [8,40,42] incorporated audit and/or feedback components and three studies [11,40,42] incorporated some form of passive dissemination of materials to clinicians. One study [40] implemented changes in (electronic) medical records systems to remind clinicians of the guideline-based approach. One US-based study [41] further worked with hospital administrators, educating them regarding the negative impact on profits of non-adherent approaches, and also implemented shared decision-making with patients.
Comparator: Five studies [8,11,39,41,42] used a no-intervention control for clinicians and one study [40] used passive dissemination of guidelines. One study implemented patient education materials in a control group [8].
Outcomes: Two studies [39,41] reported data on spine surgery consultations [39] or spine surgery rates [41], and four studies reported referral more widely to specialist or secondary care but did not specifically report surgical consultation rates [8,11,40,42]. Three studies reported a significant reduction in surgery rates [41] or secondary referral rates [11,40]; one study [8] reported a change in specialist referral rates (increase in control, decrease in intervention), but it was not significant (Fig. 2). Two studies reported no significant change in spine surgeon visits [39], referral rates [11] or specialist consultations [42].
Study design: All included studies were cluster randomised controlled trials. Four studies had public or not-for-profit funding [8,39,41,42], one study [40] received funding from a medical insurer foundation and professional/community organisations, and one study [11] did not report funding sources.

Risk of bias and GRADE assessment
Five studies [11,39–42] were rated as low risk of bias and one [8] as having some concerns (Table 3). Considering individual domains, there was low risk of bias for the measurement of outcome, and some concerns in the remaining domains (Table 3). GRADE assessment was not implemented for the surgical referral or rate outcome as meta-analyses could not be conducted. For secondary referral, the certainty of the evidence was rated for meta-analytic outcomes as very low. Main reasons for downgrading the evidence were study quality, imprecision and inconsistency. Publication bias and/or small study bias could not be assessed because fewer than 10 studies were included [31], but the majority of studies were funded by not-for-profit entities, leading us to expect that publication bias is unlikely to impact the evidence.
[Figure caption; opening words missing] Table 1 for more detail on the included studies. There was very low-quality evidence for no effect of interventions on changing secondary referral rates. The outcome spine surgery referral or rates could only be assessed via meta-analysis of p-values and thus a forest plot was not possible (see "Results"). For this outcome, there was evidence for benefit of a reduction of surgery referral (p = 0.021, n = 2 studies, Risk of Bias: low), but this effect did not remain significant on sensitivity analysis. The number of events relative to the total participants are adjusted for the intraclass cluster coefficient during the analysis and displayed here.
For surgical referral or rates, we performed a combination of P values as only Cherkin et al. [39] reported an effect size (adjusted OR: 1.11; 95% CI [0.53, 2.32]) whereas the other study [41] did not. In this analysis, there was evidence for benefit of a reduction of surgery referral in at least one study (p = 0.021, Fisher's method, n = 2 studies, risk of bias: low; GRADE assessment not possible as meta-analysis unable to be performed). However, sensitivity analysis with Stouffer's method did not show a significant effect (p = 0.053) when applying an alpha level of 0.05.
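When a study reports only an odds ratio with its confidence interval, an approximate P value can be back-calculated on the log scale. The sketch below applies this standard Wald approximation to the adjusted OR quoted above for Cherkin et al. [39]. It is an illustration under the assumption of a symmetric normal interval on the log-odds scale; the review does not state that it derived its P values this way, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def p_from_or_ci(odds_ratio, lower, upper, level=0.95):
    """Approximate z statistic, SE and two-sided P value from an OR and its CI.

    Assumes the interval is a Wald interval, symmetric on the log scale.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% CI
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p_two_sided = 2 * (1 - nd.cdf(abs(z)))
    return z, se, p_two_sided


# Adjusted OR and 95% CI as reported for Cherkin et al. [39] in the text.
z, se, p = p_from_or_ci(1.11, 0.53, 2.32)
print("z =", round(z, 3), "SE =", round(se, 3), "two-sided p =", round(p, 2))
```

The result (a two-sided P value of roughly 0.78) agrees with the interval comfortably spanning 1. Whether one-sided or two-sided values enter a combination of P values is a separate choice that this sketch does not settle.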
For studies reporting the outcome of secondary referral, we performed a meta-analysis with four studies (Fig. 2) [8,11,40,42]. A conservative ICC value of 0.015 was assumed for studies that reported no estimate; for the other studies, the published estimate was used. Overall, meta-analysis revealed a non-significant OR of 1.07 (95% CI [0.55, 2.06]; I² = 73.0%, 95% CI [23.9%, 90.4%]; n = 4 studies; GRADE: very low). We performed sensitivity analyses on secondary referral (Table 4). First, we checked whether there were potential outliers or influential studies and what impact the removal of these studies would have on the overall summary effect size. We identified two influential studies [11,40]. Removing these studies shifted the estimate slightly in favour of the control group (OR 1.13; 95% CI [0.63, 2.05]; I² = 0%, 95% CI not estimable; n = 2 studies). A further sensitivity analysis checked whether the assumed value of the ICC changed the results, which was not the case. One further sensitivity analysis included only trials with a low risk of bias. This did not alter the results (Table 4). A sensitivity analysis for the study [41] where some data extraction occurred from an image is not applicable, as the analysis was performed using the p-value for this study (which was extracted from the text). We conclude that the results of the main analysis are robust to the sensitivity analyses performed.
Table 3 (rows recovered from extraction; column headings and remaining rows not recovered)
[11], 2001: Low, Low, Low, Low, Low, Low, Low
Riis [40], 2016: Low, Low, Low, Low, Low, Low, Low
Schectman [8], 2003: Some concerns, Some concerns, Some concerns, Some concerns, Low, Some concerns, Some concerns

Discussion
This review examined the effects of various interventions on optimising guideline-adherent surgery referral and rates for low back pain patients. To the best of our knowledge, our study is the first systematic review to consider interventions to improve guideline-adherent surgery and surgical referral in back pain patients. All six included studies were cluster RCTs examining clinician level interventions.
There was some evidence for impact of the interventions on spine surgery rates or referral (two studies included), but this effect did not persist with sensitivity analysis. In considering referral to secondary care (four studies) we found very low-quality evidence that the interventions had no impact on the outcomes of interest. Individually, two studies [40,41] showed a significant impact on the outcomes of interest here whereas the remainder did not.

Informing future efforts for guideline-adherent surgical referral for back pain
Clinician education and/or workshops were a key component of the interventions in all studies [8,11,39–41] with two studies [8,40] implementing a passive dissemination component (i.e. providing clinicians with information on guidelines), and the same two studies [8,40] incorporating clinical audit and feedback components. Overall, on the basis of the data and results of quantitative analysis, we argue that clinician education is, in isolation, unlikely to be an effective approach. Notably, the two studies showing significant effects on the number of back surgeries [41] and referral to secondary care [40] also implemented organisational changes. Goldberg et al. [41], a US-based study, also implemented shared decision-making with patients and further worked with hospital administrators to educate them regarding the negative impact on profits of non-guideline-adherent approaches. The other study [40] implemented changes in (electronic) medical records systems to remind clinicians of the guideline-based approach. This suggests that organisational change interventions may be more effective.
To inform discussion of potentially effective interventions, as part of this systematic review, we also collated and extracted studies that were prospective interventional designs but excluded solely for being non-randomised (Supplementary Table 4). However, this only identified one relevant study. This controlled before-and-after study [43], conducted in a region of Denmark, implemented structural changes to patient flow with the introduction of non-surgical multidisciplinary spine clinics prior to patient referral to surgical care. This study showed a 52% reduction in the rates of lumbar disc surgeries per 100,000 inhabitants, compared with a 12% increase in the remainder of the country. This further highlights the potential of structural and policy change as effective interventions.

Implementation of behaviour change in clinical practice
The implementation of such organisational and structural changes across an organisation is an important consideration for effective transition to guideline-based behaviours. Behaviour change approaches for implementing evidence-based practice rely on the modification of the behaviour of all parties within an organisational structure. This includes clinicians, managers and administrators and requires an understanding of the determinants of current and desired behaviours [44]. Behaviour change is not a simple process and requires the application of evidence-based behaviour change principles to enhance efficacy [45]. It is rare for a single approach to be adequate; interventions typically require multiple elements that focus on different aspects of current behaviour. Future work should consider adopting a framework (such as the Theoretical Domains Framework [46]) to guide the design of interventions which consider the cognitive, affective, social, and environmental influences on behaviour within the context of the modified behaviours that are intended to be adopted [46,47].

Evidence gaps and recommendations for future primary research
The findings of this systematic review highlight important considerations for future work. Pragmatically, cluster randomised controlled trials are likely the most feasible high-quality study design to implement in an organisational setting, as opposed to trials where patients are randomised to different interventions. Controlled before-and-after and interrupted time series designs are often easier to perform in an organisational setting, but randomised designs are necessary to provide high-quality evidence for guiding practice in the future and informing guidelines. Importantly, both of these study designs allow opportunities for the thorough consideration of organisation-wide behavioural change factors and transparent reporting of design principles. Based on our current findings, we recommend implementing future cluster randomised controlled trials of interventions that include key components of organisational (e.g. structured patient flow, changes to referral requirements and/or changes to electronic medical records systems) and ideally policy (e.g. funding model) change. Beyond this, studies evaluating the effect of organisational behavioural change on guideline adherence should report surgery rates and referral for surgical consultation. Of the six studies identified in this systematic review, only two studies [39,41] reported these important outcomes. As recommended in Cochrane guidelines [48], future studies should report the total number of surgeries or referrals relative to the total number of appointments or patients seen to allow for appropriate meta-analysis. Furthermore, future cluster randomised controlled trials should always [48] report the intra-cluster correlation coefficient (ICC) for their study. The ICC is a measure of how similar patients within the clusters are [49], and this information is necessary [48] for appropriate meta-analysis of cluster RCTs.

Strengths and limitations of the evidence included and of review processes
Strengths of the present study include the systematic review and meta-analytic approach applied. Further, the analytical methods explore the heterogeneity in the main data estimates, by sensitivity analyses to examine outliers, outcome type, and assumptions on ICC values. We implemented the most appropriate adjustment for the sample sizes of the included cluster RCTs for meta-analysis [48]. Finally, we used more efficient [29] (i.e. better coverage probability and less bias estimating the between-study variance τ²) meta-analytic methods. A strength of the included evidence is that it stems from randomised controlled trials, the highest level of primary evidence for the research question.
Limitations include the lack of eligible studies published to date for inclusion and the differences of the interventions within included studies. The limited pool of studies prevented further investigation of sub-groups and heterogeneity, for example via meta-regression. Of the included studies, some did not report the data in a way which enabled meta-analysis of odds ratios and only pooling of p values could be performed. This limits the quality and utility of the evidence we were able to glean from the literature. It was not possible to provide a GRADE assessment where the analysis was based on combination of p values. The GRADE approach is based on estimated effect sizes [27] and, given that meta-analysis of effect sizes was not performed, it is impossible to rate p values with this framework. Due to the low number of studies, we could not provide a prediction interval for the meta-analysis, as it does not provide a useful estimate of the amount of heterogeneity if the number of included studies is low (< 10) [50]. Subgroup analyses on the types of interventions were not possible; such analyses might have explained why few studies showed a beneficial impact of the intervention on the outcome of interest. Further, the outcomes examining secondary (specialist) referral are likely not solely surgical referral. Whilst we argue the majority will be referral to surgical care, this nonetheless limits our ability to comment on surgical referral rates. In terms of review processes, we did not include grey literature (e.g. conference abstracts) in our review; Cochrane guidelines [33] regard this as highly desirable but not mandatory.

Conclusion
There is a limited evidence base for interventions that improve guideline-adherent surgery rates or referral in back pain. Based on the current evidence, it is not possible to provide a clear recommendation for a best-practice approach. Clinician education may, in isolation, be less effective and organisational and/or policy change interventions may be more effective. Future RCTs should ideally assess these intervention components separately to better inform implementation efforts. We also provide recommendations for improving the reporting of cluster RCTs in the future.