INTRODUCTION

Despite increasing US expenditures for low back pain (LBP), patient outcomes have deteriorated.1 The current epidemic of opioid addiction and deaths illustrates the urgency of finding safer and more effective approaches for chronic pain.2, 3 The traditional view of LBP as a largely biomedical problem4 is being supplanted by the biopsychosocial model that acknowledges that while pain usually has an underlying biological basis, psychosocial factors (e.g., pain beliefs/cognitions, distress, coping behaviors, and social factors) also significantly influence the experience and impact of pain.5, 6 This broader conceptualization provides a clear rationale for incorporating cognitive behavioral principles into the management of distressed and disabled patients with chronic LBP to minimize pain-related disability.

A promising strategy for categorizing and treating patients considering both their physical and psychosocial characteristics, the STarT Back approach, was developed and evaluated in England.7 This approach improved patients’ physical function and satisfaction with care while reducing costs.8,9,10,11 This strategy uses patient responses to a nine-item “STarT Back tool” questionnaire to allocate patients to a low-, medium-, or high-risk subgroup according to their risk of persistent disabling back pain. Patients in each subgroup are then recommended evidence-based treatments matched to their prognostic profile.7, 12, 13 Patients found to have at least four out of the five “psychosocial” risk factors (high pain bothersomeness, fear, worry, catastrophizing, depression) are “high risk” and those with relatively few (0–3) physical or psychosocial risk factors are “low risk.” The remaining patients with significant pain and/or activity limitations but fewer psychosocial risk factors are “medium risk.” This tool has been validated with primary care adults with non-specific LBP.12 The success of this strategy in England has generated great interest in other developed countries, providing new hope that meaningful improvements in primary care for LBP are within reach.14,15,16,17,18,19
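The allocation rule described above can be sketched in a few lines of Python. This is a minimal illustration based only on the description in this paragraph, not the official scoring instructions for the tool; the function name `start_back_subgroup` and its argument names are hypothetical, and the split into four physical and five psychosocial items is inferred from the nine-item total.

```python
def start_back_subgroup(total_score: int, psychosocial_score: int) -> str:
    """Allocate a patient to a risk subgroup from STarT Back tool counts.

    total_score: number of risk factors endorsed across all nine items (0-9).
    psychosocial_score: number endorsed among the five psychosocial items (0-5).
    Thresholds follow the description in the text; names are illustrative.
    """
    if not 0 <= psychosocial_score <= 5:
        raise ValueError("psychosocial_score must be between 0 and 5")
    # The remaining four items are assumed to be the physical items.
    if not psychosocial_score <= total_score <= psychosocial_score + 4:
        raise ValueError("total_score is inconsistent with psychosocial_score")

    if psychosocial_score >= 4:
        return "high"    # at least four of the five psychosocial risk factors
    if total_score <= 3:
        return "low"     # relatively few (0-3) risk factors overall
    return "medium"      # everyone else


if __name__ == "__main__":
    for total, psych in [(2, 1), (5, 2), (6, 4)]:
        print(total, psych, start_back_subgroup(total, psych))
```

Because a psychosocial count of four or more already implies a total of at least four, the order of the two checks does not change the result; the high-risk check is placed first only for readability.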

Prior to widespread implementation, it would be valuable to know if the STarT Back strategy can be successfully translated to other settings. To determine if the STarT Back risk stratification strategy would succeed in the USA, we conducted the Matching Appropriate Treatments to Consumer Healthcare needs (MATCH) cluster randomized trial. This trial evaluated the effects of incorporating the STarT Back strategy into primary care practices within an integrated healthcare system. The goal was to give primary care providers (PCPs) and physical therapists the knowledge, tools, and confidence they needed to provide their patients with a broader understanding of their LBP, reassurance about their likely prognosis, and treatment options that matched the patients’ prognostic profile. We hypothesized this intervention would improve patient outcomes by promoting increased use of matched treatment options for patients in each subgroup, as determined by the STarT Back tool. We believe this is the first randomized and controlled evaluation of the STarT Back approach in the USA.

METHODS

Design and Setting

The trial design has been reported in detail.20 In brief, MATCH was a pragmatic, cluster randomized trial with two parallel arms,21 each with a baseline data collection period. Six primary care clinics were randomized in a 1:1 ratio to intervention or control (Fig. 1). PCPs were either MDs (84%) or physician assistants/nurse practitioners (16%); 85% had practiced over 5 years; and 62% were female. Data were collected from patients and electronic health records (EHR). The trial was conducted in Group Health (GH), an integrated healthcare delivery system in Washington State serving over 600,000 members. GH is now called Kaiser Permanente Washington.

Fig. 1 Flow of patients through trial.

Clinical leaders at GH partnered with the research team to evaluate the effect of stratified care in its primary care clinics. The intervention was incorporated into a mandatory care improvement activity, fully supported by clinical and administrative leadership.

Participating Clinics (Clusters) and Patients

One clinic from each of three pairs of large primary care clinics (with onsite physical therapy) near Seattle, matched on geographic and socioeconomic characteristics, was randomized to the intervention and the other to control. Control clinics received no intervention. Pre-intervention levels of patient outcomes were measured in all six clinics. The intervention was then implemented in intervention clinics over 6 months, after which patient outcomes were again assessed in both intervention and control clinics.

During the trial, all patients 18+ years of age identified in the EHR as having received a primary diagnosis consistent with non-specific LBP (e.g., lumbago, back pain not otherwise specified) were eligible to participate. To maintain broad applicability, we only excluded patients with specific causes of pain (e.g., pregnancy, disc herniation, vertebral fracture, spinal stenosis) or with job injuries, which were seen in the Occupational Medicine clinic.

Randomization and Blinding

Prior to the intervention, the trial biostatistician randomly assigned one clinic in each of the three geographic and sociodemographic matched pairs of clinics to the intervention by computer-generated random number. All eligible patients seen in the intervention clinics were considered to have received the intervention. Researchers did not inform patients that their clinics were participating in an intervention. Interviewers were blinded to patients’ clinics.

The Intervention

The intervention was implemented in the 3 intervention clinics from April–September 2014. Key components of the intervention were incorporating the original version of the STarT Back tool8 into the EHR, identifying recommended treatment options available from GH for patients in each risk subgroup, and training the primary care teams and physical therapists (Appendix 1 online).20

Outcomes

Patient outcome data were collected by telephone interviewers during the pre-intervention (November 2013–April 2014) and post-intervention (December 2014–August 2016) periods. Interviews occurred 2 weeks (range, 1 to 3 weeks) after the LBP visit (baseline) and again 2 and 6 months later. Primary outcomes were LBP-related physical function in the previous week (measured with the modified Roland-Morris Disability Questionnaire (RMDQ)22) and LBP severity during the previous week (measured on a 0-to-10 scale where 0 represents “no pain” and 10 “pain as bad as it could be”).23 Secondary outcomes included patient outcomes (depression, anxiety, fear of movement, global improvement, self-efficacy, satisfaction, and work productivity and activity impairment20) and actual healthcare utilization from the automated EHR (e.g., lumbar imaging, physical therapy, complementary and alternative medical therapies, cognitive behavioral therapy, opioids, spinal injections, and spine surgeon consultations). Because the intervention targeted PCPs, we could not identify adverse effects.

Data Collection

The pre-intervention period was devoted to measuring changes in patient outcomes 2 and 6 months after LBP visits in the intervention and control clinics. Because collecting baseline data during the visit was not feasible, we mailed patients letters shortly after the visits explaining GH was conducting a study to improve LBP care and that we would call to invite their participation. Patients not wishing to be contacted were provided a phone number to opt out. Research specialists called patients between 1 and 3 weeks after their visit to explain the study, answer questions, confirm eligibility and obtain verbal informed consent to complete a baseline and two follow-up interviews. Patients were paid $20 for each questionnaire. Trained interviewers used computer-assisted telephone interviewing to minimize errors and missing data. Similar methods were used to collect post-intervention data. Because we did not meet our recruitment goal during the pre-intervention period, we increased post-intervention recruitment to maintain overall statistical power (see “Sample Size” section). We also improved the recruitment letter, increased staffing, and lengthened the recruitment period. The mean interval between visit date and baseline data collection was 12.7 (SD = 7.1) days.

Sample Size

A priori sample size calculations were performed targeting 80% power to detect a 1.5-point difference in 6-month LBP-related change in patient function (RMDQ) pre- and post-intervention between control and intervention clinics (0 point difference in the low risk subgroups and 2.5 points difference in medium- and high-risk subgroups) and 0.9 points on LBP pain severity score (0 point difference in low risk subgroups and 1.5 points difference in medium- and high-risk subgroups). We planned for a sample size of 1760 participants balanced equally between the pre- and post-intervention periods and the control and intervention clinics allowing for a loss to follow-up rate of 20%.20 Because we recruited only 603 participants (goal was 880) during the pre-intervention period, we determined that we would need a sample size of 1334 during the post-intervention period to maintain 80% power. The final numbers recruited were 603 participants in the pre-intervention period (546 with complete follow-up) and 1098 in the post-intervention period (1008 with complete follow-up). Our post hoc calculation of power based on the observed data (accounting for imbalance between intervention arms) found we had 80% power to detect a difference between trial arms of 1.5 points on the change in RMDQ score before and after the intervention. We assumed no correlation of outcomes within provider or clinic.
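The overall target differences quoted above are consistent with a weighted average of the subgroup-specific targets. The short check below makes that arithmetic explicit. It is only a sketch: the 40% low-risk share is not stated in the text but is the value implied by the quoted numbers, and the function name `blended_difference` is hypothetical.

```python
def blended_difference(share_low: float, diff_low: float, diff_medium_high: float) -> float:
    """Overall expected difference as a share-weighted average of subgroup differences."""
    return share_low * diff_low + (1.0 - share_low) * diff_medium_high


# Assumption: 40% of patients are low risk (inferred, not stated in the text).
SHARE_LOW = 0.40

rmdq_overall = blended_difference(SHARE_LOW, 0.0, 2.5)  # RMDQ target difference
pain_overall = blended_difference(SHARE_LOW, 0.0, 1.5)  # pain severity target difference

# Planned allocation: 1760 participants split evenly over 2 periods x 2 arms.
planned_total = 1760
per_period = planned_total // 2
per_cell = planned_total // 4

if __name__ == "__main__":
    print(f"RMDQ overall difference: {rmdq_overall:.2f}")
    print(f"Pain overall difference: {pain_overall:.2f}")
    print(f"Planned per period: {per_period}, per period-by-arm cell: {per_cell}")
```

Under this assumed share, the blended values agree with the 1.5-point and 0.9-point overall targets, and the per-period figure agrees with the stated pre-intervention goal of 880.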

Statistical Methods

We first estimated the change in mean score by clinic assignment between the pre- and post-intervention periods. We then compared these differences to estimate the change attributable to the intervention (i.e., we made inferences on the interaction between clinic assignment and intervention period). We used a linear mixed effects model with random effects24 for patient participants (repeated outcome measurements on participants at 2 and 6 months post-LBP visit) and clinic (randomization at clinic level) to account for correlation within individuals and clinics. The primary analysis time-point was 6 months following the LBP visit. To account for potential confounding variables, we adjusted for participant-level baseline covariates shown to be associated with LBP physical function and pain intensity, as well as variables that were imbalanced at baseline at the patient level between intervention and control arms: sex, age, education, race, employment, function (RMDQ), and pain intensity. Risk-subgroup-specific estimates and secondary outcomes were calculated using an identical framework to that described above with one exception. For binary secondary outcomes, we used generalized linear mixed models (GLMM)25 with logit and/or log link functions to estimate odds ratios and/or relative risks instead of mean change scores. We assumed the standard alpha level of 0.05 for a two-sided test.
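One plausible way to write the model described above is given below. This is a sketch under stated assumptions rather than the exact specification used in the trial: all symbols are introduced here for illustration, and any terms for follow-up time (2 versus 6 months) are omitted for brevity.

```latex
% i = participant, j = clinic, t = follow-up occasion (2 or 6 months)
% A_j = 1 if clinic j was assigned to the intervention, 0 otherwise
% P_i = 1 if participant i was enrolled in the post-intervention period, 0 otherwise
% x_i = baseline covariates (sex, age, education, race, employment, RMDQ, pain intensity)
\[
  Y_{ijt} = \beta_0 + \beta_1 A_j + \beta_2 P_i + \beta_3 \,(A_j \times P_i)
            + \boldsymbol{\gamma}^{\top}\mathbf{x}_i + u_j + v_i + \varepsilon_{ijt},
\]
\[
  u_j \sim N(0, \sigma_u^2), \qquad
  v_i \sim N(0, \sigma_v^2), \qquad
  \varepsilon_{ijt} \sim N(0, \sigma^2).
\]
% The interaction coefficient is the difference in pre-to-post change between arms:
\[
  \beta_3 = \bigl(\mu_{\text{int,post}} - \mu_{\text{int,pre}}\bigr)
          - \bigl(\mu_{\text{ctl,post}} - \mu_{\text{ctl,pre}}\bigr).
\]
```

In this notation, the random intercepts $u_j$ and $v_i$ capture correlation within clinics and within participants, respectively, and $\beta_3$ is the quantity on which inference about the intervention is made. For binary outcomes, the same linear predictor would sit inside a logit or log link.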

We used the same analytic approach with EHR data to evaluate the effect of the intervention on healthcare utilization for LBP. We examined if the use of recommended treatments for patients at medium and high risk of persistent disabling pain increased and the use of treatments generally not recommended for non-specific LBP decreased. The primary analyses included all eligible patients (not just those providing patient data). We also analyzed data for the subset of patients who participated in the telephone questionnaires. Comparison of the data from these two populations allowed us to determine the representativeness of participants. We also examined the frequency with which STarT Back risk scores were recorded in the EHR. See ref. 20 for more detail.

RESULTS

Patient Recruitment and Follow-up

Figure 1 presents the flow diagram showing the six clinics in this cluster RCT and the flow of trial participants. Because we included a pre-intervention “baseline” period, we present flow data for both the pre- and post-intervention periods as well as for the total. A total of 2138 LBP patients visited the intervention clinics and 2571 the control clinics. Characteristics of intervention and control patients were similar both pre- and post-intervention. Overall, 36% of eligible patients provided baseline data on the telephone. Participating patients were slightly older than non-participants (mean ages of 57.1 and 54.8, respectively) and more likely to be white (83.0 and 77.0%, respectively) but did not differ by gender. Follow-up rates were 93% at 2 months and 91% at 6 months. Participation and follow-up rates were similar in the intervention and control clinics.
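The participation and follow-up percentages above can be cross-checked against the counts reported in the "Sample Size" section. The sketch below assumes that the 2138 and 2571 patients who visited the clinics form the denominator for the participation rate and that "complete follow-up" corresponds to the 6-month interview; both are assumptions made for this check, as are the variable names.

```python
# Counts taken from the text.
visited_intervention = 2138
visited_control = 2571
recruited_pre, recruited_post = 603, 1098
complete_pre, complete_post = 546, 1008

# Assumption: all patients who visited are the eligible denominator.
eligible = visited_intervention + visited_control
participants = recruited_pre + recruited_post
completed = complete_pre + complete_post

participation_rate = participants / eligible
followup_rate = completed / participants

if __name__ == "__main__":
    print(f"Eligible: {eligible}, participants: {participants}, completed: {completed}")
    print(f"Participation: {participation_rate:.1%}")
    print(f"Complete follow-up: {followup_rate:.1%}")
```

Under these assumptions, the computed rates round to the reported 36% participation and 91% follow-up at 6 months.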

Patient Characteristics

Reflecting the GH membership, participants had relatively high levels of education and income and were primarily white and non-Hispanic (Table 1). About half the participants were over 60 years old. Participants had moderately high levels of functional disability and pain severity; 56% had episodes lasting less than 3 months, 48% reported leg pain, and about 30% were using opioids for their pain. Data from the STarT Back tool showed that 41% were categorized at low risk, 37% at medium risk, and 22% at high risk of persistent disabling pain. The STarT Back tool successfully predicted the prognoses of the three risk groups (i.e., the high-risk group had the worst outcomes and the low-risk group had the best outcomes).26 Participants’ characteristics were similar in the intervention and control arms, showing no evidence of selection bias.

Table 1 Baseline characteristics of study participants in the control (n = 3) and intervention (n = 3) clinics

Effect of the Intervention

Patient Outcomes

At 6 months, there were no statistically significant differences between participants in the intervention and control arms for either primary patient outcome overall or within risk subgroup (Table 2) or for secondary patient outcomes (Table 3). The absolute magnitudes of the between group differences were small and for the primary outcome measures slightly favored the control group. Similar results were found at 2 months (Appendix 2 online).

Table 2 Primary outcomes main analysis and by risk subgroup at 6-month follow-up
Table 3 Secondary patient outcomes at 6-month follow-up

Healthcare Utilization

STarT Back tool data were available for about 50% of LBP visits during the 6-month intervention period, decreasing to about 40% over the ensuing 20 months. Among the 32 PCPs in the intervention clinics who saw at least ten patients with LBP during both the intervention period and post-intervention period, the median percentage of visits with a STarT Back tool score in the EHR was 47% (range 23–71%) during the intervention period and 42% (range 8–71%) during the post-intervention period. Thus, the tool continued to be used for patients of all PCPs, at least occasionally, long after the intervention period ended.

Despite PCP or nursing staff entry of the STarT Back tool data for almost half of the visits for LBP, knowledge of the patients’ risk subgroup did not affect the type or frequency of healthcare provided (Table 4). Specifically, there was no evidence that the intervention strategies used in the MATCH trial increased the use of treatments recommended for medium- and high-risk patients (e.g., physical therapy, complementary and alternative medicine, or cognitive behavioral therapy), or decreased the use of non-recommended tests or treatments (i.e., imaging, opioid medications, spine injections, surgical referrals) for LBP patients at any risk level. There was no evidence of any changes in the tests or treatments recommended by clinicians in the intervention clinics for any of the patient risk subgroups (Table 5). A pre-specified secondary analysis restricted to patients providing telephone outcome data showed similar results.

Table 4 Pre-post proportion and odds ratio (OR) for selected health services for low back pain between the control and intervention arms during the 6 months after visit
Table 5 Pre- versus post-intervention odds ratios for selected health services for LBP between the control and intervention groups during the 6 months after an index visit, overall and by risk subgroup

DISCUSSION

The MATCH trial is the first major evaluation of the implementation of an adaptation of the STarT Back risk stratification strategy in the USA. Although the intervention resulted in use of the STarT Back tool for approximately half of patient visits for LBP, it did not change PCP treatment decisions. Another recent cluster randomized controlled trial evaluated use of a multifaceted strategy (including embedding the STarT Back tool in the EHR) to implement LBP guidelines in Danish general practices.27 That trial found lower secondary care referral rates in the intervention clinics (5.0%) than in the control clinics (10.5%), but no improvement in patient outcomes.

There are many reasons complex interventions such as the one evaluated in this trial could fail to improve patient outcomes, including unacceptability to clinicians, inadequate leadership and system support, ineffective implementation, and inadequate potency. Although a comprehensive evaluation of the implementation process found high levels of clinician engagement and system support (submitted), there were limitations in our intervention that could explain why PCP behavior did not change, most notably: (1) we did not conduct feedback audits to encourage clinician adherence to matching treatments to patient subgroups and (2) compared with the English studies,8, 9 our matched treatment options were more numerous, less familiar to clinicians, and more difficult to access, thereby placing a greater burden on our PCPs. We also used a different recruitment strategy than the English studies. Differences between the study population in England9 and our study population could also explain outcome differences. For example, although the patient populations were similar in age, gender, employment, risk subgroup proportions, and pain severity, US patients had substantially higher baseline levels of LBP-related physical disability (RMDQ scores of 11.8 versus 8.4, respectively).

We designed our intervention20 to be as potent as possible without making it impossible to implement in primary care clinics. Even if our intervention had improved outcomes, it may not have been feasible to implement in most US primary care settings. The high levels of burnout among PCPs and the continued turmoil in US healthcare28 make complex changes in clinical practice difficult.

Major strengths of the MATCH trial include randomization of matched pairs of clinics to serve as intervention or control clinics, adequate sample sizes and power to detect meaningful differences, high follow-up rates, and an adaptive and pragmatic intervention design including substantial PCP and physical therapist training, training modules based on requests of primary care teams, and inclusion of the whole primary care team.20 Limitations include participation by fewer than half of all LBP patients, who may not be representative of all patients; the need to defer baseline data collection until 2 weeks after the PCP visit, thereby missing any early treatment effects; and the restriction to a single socioeconomically homogeneous integrated healthcare system.

CONCLUSIONS

In contrast to the positive results of implementing a risk stratification strategy to improve primary care for LBP in England,15, 16 our adaptation of that strategy to the different circumstances in our setting did not change healthcare utilization or improve patient outcomes. This illustrates the risk of failure when complex interventions developed and found effective in one setting are implemented in a different setting even with strong system support and substantial resources devoted to adapting the intervention to local needs and circumstances.

To increase their chances of success, future initiatives to implement complex interventions in primary care should include simple, easily implemented, and supported treatment recommendations; automatic alerts in the EHR that make it easy for clinicians to remember to collect risk-stratification information and recommend appropriate matched treatments to their patients; and regular feedback to clinicians on their adherence to the matched treatment recommendations for patients at each risk stratum. Given the limited ability of primary care clinicians to take on new responsibilities, however, innovative approaches (e.g., expanded nurse role) may be necessary to promote the clinical changes necessary to improve patient outcomes.