Oedema on STIR modified the effect of amoxicillin as treatment for chronic low back pain with Modic changes—subgroup analysis of a randomized trial

Objective To evaluate potential MRI-defined effect modifiers of amoxicillin treatment in patients with chronic low back pain and type 1 or 2 Modic changes (MCs) at the level of a previous lumbar disc herniation (index level). Methods In a prospective trial (AIM), 180 patients (25–64 years; mean age 45; 105 women) were randomised to receive amoxicillin or placebo for 3 months. Primary outcome was the Roland-Morris Disability Questionnaire (RMDQ) score (0–24 scale) at 1 year. Mean RMDQ score difference between the groups at 1 year defined the treatment effect; 4 RMDQ points defined the minimal clinically important effect. Predefined baseline MRI features of MCs at the index level(s) were investigated as potential effect modifiers. The predefined primary hypothesis was a better effect of amoxicillin when short tau inversion recovery (STIR) shows more MC-related high signal. To evaluate this hypothesis, we pre-constructed a composite variable with three categories (STIR1/2/3). STIR3 implied MC-related STIR signal increases with volume ≥ 25% and height > 50% of vertebral body and maximum intensity increase ≥ 25% and presence on both sides of the disc. As pre-planned, interaction with treatment was analysed using ANCOVA in the per protocol population (n = 155). Results The STIR3 composite group (n = 41) and STIR signal volume ≥ 25% alone (n = 45) modified the treatment effect of amoxicillin. As hypothesised, STIR3 patients reported the largest effect (− 5.1 RMDQ points; 95% CI − 8.2 to − 1.9; p for interaction = 0.008). Conclusions Predefined subgroups with abundant MC-related index-level oedema on STIR modified the effect of amoxicillin. This finding needs replication and further support. Key Points • In the primary analysis of the AIM trial, the effect of amoxicillin in patients with chronic low back pain and type 1 or 2 MCs did not reach the predefined cut-off for clinical importance. • In the present MRI subgroup analysis of AIM, predefined subgroups with abundant MC-related oedema on STIR reported an effect of amoxicillin. • This finding requires replication and further support. Supplementary Information The online version contains supplementary material available at 10.1007/s00330-020-07542-w.


Introduction
Modic changes (MCs) are magnetic resonance imaging (MRI) findings of vertebral bone marrow changes extending from the endplate. MCs are defined as type 1 (oedema type), 2 (fatty type), and 3 (sclerotic type) based on their intensity on T1-and T2-weighted MRI [1,2]. However, the MC intensity and type depend on MRI scanning parameters and magnetic field strength [3,4], and different MC types may represent stages of a common biological process [5]. This process may involve inflammation, fibrosis, high bone turnover, fatty infiltration, and sclerosis [1,5,6]. MCs were related to low back pain (LBP) in some studies but not in others [7][8][9]. The evidence for an association between MCs and LBP is stronger for type 1 than for type 2 or 3 MCs [10][11][12][13][14][15]. Proposed explanations for MCs include endplate damage, autoimmunity, and occult discitis [5]. Vertebral bone oedema resembling type 1 MCs is a common MRI finding in spondylodiscitis [16], and one theory is that MCs develop adjacent to a low-grade discitis region caused by haematogenic spread of Cutibacterium acnes bacteria to a previously disrupted, neo-vascularised lumbar disc [17,18].
A previous trial reported a substantial effect of antibiotic treatment for chronic LBP with type 1 MCs on 0.2-T MRI [19]. Some of these MCs might have appeared as type 2 on 1.5-T MRI [4]. The recent AIM (Antibiotics In Modic changes) trial applied 1.5-T MRI [20] and reported a small but not clinically important effect of amoxicillin in chronic LBP patients with type 1 MCs (− 2.3 points on the Roland-Morris Disability Questionnaire (RMDQ)) and no effect for type 2 MCs (− 0.1 RMDQ points) [21]. Thus, the type 1 (oedema type) group tended to have a larger effect. More bone marrow oedema may be associated with worse pain and disability [22][23][24] and might indicate more severe disease with a larger potential for improvement. Therefore, we hypothesised a larger effect of amoxicillin in subgroups with more versus less MC-related oedema. It is relevant to assess subgroup effects across both MC types because both can contain inflammatory changes [25] and oedema [26]. Short tau inversion recovery (STIR) series are ideal for highlighting oedema. STIR suppresses a high signal from fat and often shows oedema in MCs classified as type 2 on T1/T2 series without fat suppression [26]. Although STIR is more sensitive to bone marrow oedema than standard T1/T2 series [26,27], STIR can likewise not separate infectious from noninfectious causes [28,29]. This subgroup study of the AIM trial included both STIR and standard fast spin echo T1/T2 sequences. We aimed to evaluate potential MRI-defined effect modifiers of amoxicillin treatment in patients with chronic LBP and type 1 or 2 MCs at the level of a previous lumbar disc herniation.

Materials and methods
The AIM trial included 180 patients from six hospital outpatient clinics in Norway from June 2015 to September 2017 [20,21]. All the eligibility criteria are detailed in the Appendix, Table A1. The inclusion criteria were age 18-65 years, LBP for more than 6 months with a mean intensity of at least 5 on three 0-10 numerical rating scales, lumbar disc herniation on MRI in the preceding 2 years, and type 1 or type 2 MCs (with height ≥ 10% of the vertebral height and diameter > 5 mm) at the previously herniated disc level. Trial flow chart, trial methods, and baseline characteristics are published [21]. The trial, this study, and statistical analysis plans are registered at ClinicalTrials.gov (identifier: NCT02323412).

Randomisation, treatment, and outcome measures
Patients were randomised to receive oral amoxicillin 750 mg or placebo (maize starch) three times daily for 3 months. The amoxicillin and placebo tablets had identical encapsulation, containers, and labelling. A third-party statistician used Stata 13 (StataCorp) to create randomisation lists. Allocation was stratified by prior disc surgery (yes/no) and MC type (type 1 (n = 118) or type 2 (n = 62) only) at the previously herniated disc level(s) with a 1:1:1:1 allocation and random block sizes of four and six [21]. The allocation sequence was concealed and centrally administered. Care providers gave patients a prescription with a computer-generated allocation number to be used at dedicated pharmacies. All care providers, research staff, statisticians, and patients were blinded to treatment allocation during data collection.
The primary endpoint was the RMDQ score (0-24 scale) at 1 year [30,31]. The minimal clinically important difference in mean RMDQ score at 1 year between treatment groups (treatment effect) was predefined as 4 [20,21]. The outcome at the end of the treatment period (3 months) was not primary because long-term improvement is desirable and the antibiotic group in the prior trial improved by 4.5 RMDQ points (0-23 scale) from the end of the treatment period to 1 year [19]. The Oswestry Disability Index (ODI) 2.0 (0-100 scale) [32] and LBP intensity (0-10 numeric rating scale) were secondary outcomes [21].

MRI assessment
Baseline MRI of the lumbar spine was performed at six centres using identical protocols and the same type of 1.5-T scanner with the same software version (Magnetom Avanto B19; Siemens). This MRI included sagittal T1-and T2-weighted fast spin echo ('T1/T2') and sagittal STIR images. The integrated spine array coil was used, but no surface coils. Echo time (ms)/repetition time (ms) was 11/575 for T1, 87/3700 for T2, and 70/5530 for STIR. Echo train length was 5 for T1, 17 for T2, and 20 for STIR. Matrix was 384 × 269 for T1/T2 and 320 × 224 for STIR. The inversion time for STIR was 160 ms. Slice thickness/spacing was 4 mm/0.4 mm and field of view was 300 mm × 300 mm for all three sequences. Other MRI parameters were also identical between centres [33].
Three radiologists (A.E., N.V., and P.M.K.) who were blinded to clinical outcomes and treatment allocation independently rated MRI findings [33]. All had > 10 years of experience in musculoskeletal MRI. The same three radiologists interpreted the MRIs from all study centres. On T1/T2, they rated primary (most extensive) and secondary MC types as type 1 (hypointense on T1, hyperintense on T2), type 2 (hyperintense on T1, iso-or hyperintense on T2), and type 3 (hypointense on T1 and T2) [33]. They evaluated the largest height and volume of MCs on T1/T2 and the largest height, volume and intensity of any MC-related STIR signal increase (Table 1), defined as a visible increase compared with normal vertebral bone marrow, located in or abutting a region with MC on T1/T2 or located and shaped as an MC [33].
We based the conclusive MRI findings on the radiologists' majority rating or mean value of measurements made by two of them (A.E. and P.M.K. if all three agreed there was a lesion to measure). The inter-rater reliability of the MRI evaluations was previously reported [33]. Fleiss' kappa values [34] for overall inter-rater agreement (mean values across four MRI, magnetic resonance imaging; MCs, Modic changes; STIR, short tau inversion recovery; T1/T2, T1-and T2-weighted fast spin echo images a STIR intensity < 25% included MCs with no conclusive STIR signal increase (or decreased STIR signal) and thus no conclusive measured STIR intensity values. These values had likely been < 25% if measured, since the STIR intensity was < 20% for > 90% of intensity measurements reported by only one radiologist (as the other radiologists found no visual signal increase and therefore did not measure STIR intensity). To enhance clinical credibility, images were reviewed to ensure that intensity ≥ 25% implied visually convincing hyper-intensity endplates L4-S1) were 0.88/0.81 for presence of any MCs/ type 1 MCs, 0.64/0.69 for MC height/volume, 0.86 for presence of a STIR signal increase, and 0.51/0.56 (0.40/0.40 at L5/ S1 inferior to disc) for STIR signal height/volume. For maximum MC-related STIR signal intensity on a 0-100% scale (0% = normal vertebral body; 100% = cerebrospinal fluid), largest mean of differences and widest 95% limits of agreement were 0.9% and ± 7.6%, respectively.

Predefined hypotheses and potential effect modifiers
In the AIM trial protocol, we hypothesised a better effect of amoxicillin when: -STIR shows more MC-related high signal (primary hypothesis) -MCs contain more type 1 than type 2 or are larger (explorative hypothesis) The nine variables described in Table 1 were predefined as potential effect modifiers in the statistical analysis plan. All concerned MCs at the index level(s) with prior disc herniation because this level was hypothesised to contain low-grade discitis that was the target for treatment. One composite and four underlying variables concerned STIR signal extent and intensity. The STIR composite variable had three categories (STIR1/2/3) and was used to assess the primary hypothesis. STIR3 implied MC-related STIR signal increase with volume ≥ 25% and height > 50% of the vertebral body, maximum intensity increase ≥ 25%, and presence on both sides of the disc. One composite and three underlying variables concerned MC extent and type 1 degree on T1/T2. We constructed each composite variable by clinically plausible grouping of the underlying variables [35]. We did so before analysing the effect of any MRI variable on the outcome but were not blinded to the distribution of the variables in our sample.

Analyses
The baseline properties of the randomised groups were characterised using descriptive methods.
All pre-planned analyses are described in Table 2. Each effect modifier was analysed using ANCOVA with the outcome (RMDQ, ODI, or LBP intensity) at 1 year as the dependent variable and the randomisation group, effect modifier, and their interaction as independent variables adjusted for the baseline values of the outcome. Additionally, we adjusted for age and prior disc surgery because contamination during surgery is a potential cause of discitis. Supporting the use of ANCOVA, Levene's test indicated homogeneity of variances, and QQ plots indicated normally distributed residuals and outcome variables without extreme outliers. If one or both composite MRI variables modified the treatment effect, we would include both in the same model to assess their independence [36]. Post hoc analyses, marked as such throughout the manuscript, are also described in Table 2.
The primary analyses were performed on the predefined per protocol (PP) population described in the Appendix, page 2. The intention to treat (ITT) population was used for supportive analyses. Missing outcome values were imputed using multiple imputation (details in footnote, Table 2).
We used a Bonferroni-corrected alpha of 0.05/6 (0.008) when testing the primary hypothesis (ranked as hypothesis  [20]. Otherwise, an alpha of 0.05 was applied to minimise type 2 errors [37]. Analyses were performed using Stata 16 (StataCorp), and figures were made using MATLAB 9.5 (MathWorks) or Stata 16.

Power calculation
The AIM trial was designed with 90% power in each MC type group [20]; 80% power in the total sample would have required 50 patients or 200 patients in a subgroup study with two equally large subgroups [38]. In this study with three subgroup categories, 80% power would have required > 200 patients. Adding covariates in the analyses improved the power [39], but our sample was still small.

Primary hypothesis-STIR
As hypothesised, the STIR3 group (n = 41) reported the largest effect of amoxicillin; the difference in mean RMDQ score at 1 year between those receiving amoxicillin and those receiving placebo (PP analysis) was − 5.1 (95% CI − 8.2 to − 1.9, p for interaction = 0.008) (Fig. 1). The corresponding difference was − 0.7 (95% CI − 3.1 to 1.7) in the STIR2 group and 1.1 (95% CI − 2.1 to 4.4) in the STIR1 group. The treatment effect in the STIR3 group was − 4.8 points (95% CI − 7.9 to − 1.8; p for interaction = 0.014) in the ITT analysis and − 4.5 RMDQ points (95% CI − 7.9 to − 1.1, p for interaction = STIR volume ≥ 25% of the vertebral body (n = 45) also significantly modified the treatment effect (Fig. 1). In the STIR3 and STIR volume ≥ 25% groups, the effect of amoxicillin was > 4 RMDQ points (cut-off for clinical importance) and also evident for ODI (Fig. 2) but not for LBP intensity (Fig. 3).

Explorative hypothesis-T1/T2
The results for the composite MC variable based on T1/ T2 did not reach statistical significance or clinical importance (Fig. 1). Two underlying subgroups significantly modified the treatment effect in favour of amoxicillin: MC volume ≥ 25% and MC height > 50% of the vertebral body (Fig. 1). The treatment effect within these subgroups did not exceed the threshold for clinical importance.

Post hoc analyses of STIR3
The baseline characteristics of the STIR3 patients were similar in both treatment groups (Appendix, Table A5). The STIR3 patients and total sample had similar mean baseline RMDQ scores (amoxicillin/placebo 12.8/12.6 vs. 12.7/12.8) and prior disc surgery rates (16% vs. 21%).
The treatment effect of amoxicillin in the STIR3 group was present at 3 months and remained until the end of the study at 1 year (Fig. 4), but the change in RMDQ score varied considerably between patients (Fig. 5).
Bangs blinding index for STIR3 patients was − 0.01 in the amoxicillin group, indicating perfect blinding, and 0.61 in the placebo group, indicating un-blinding (Appendix , Table A4).

Discussion
To our knowledge, this was the first study to investigate STIRbased effect modifiers of a treatment for chronic LBP with MCs. These effect modifiers were defined by MC-related high signal on STIR at the previously herniated index level(s) hypothesised to contain a low-grade discitis. The STIR3 and STIR volume ≥ 25% groups with abundant high signal modified the treatment effect of amoxicillin. STIR3 patients reported the largest effect (− 5.1 RMDQ points; 95% CI − 8.2 to − 1.9; p for interaction = 0.008). Subgroups based on T1/T2 features of MCs did not report a clinically important effect of amoxicillin. All subgroups were small, and the findings must be interpreted with caution.

Credibility of results
We consider the subgroup effect of STIR3 to have overall moderate credibility based on the criteria predefined in the statistical analysis plan [36] and the results of the post hoc analyses, but some criteria were not fulfilled ( Table 4). The interaction of STIR3 and STIR volume ≥ 25% with treatment was found for the related outcomes RMDQ and ODI but not for LBP, and the finding has not yet been replicated in other studies. No tissue samples were taken, and STIR findings alone are not diagnostic for infection [28,29]. Further data are needed to link extensive oedema on STIR to low-grade disc infection.
Post hoc responder analyses supported an effect of amoxicillin in the STIR3 group, but the estimates showed wide CIs (Appendix, Tables A2-A3). Similar to patients with verified Cutibacterium acnes discitis [40], STIR3 patients improved most during antibiotic treatment (Fig. 4). The clinical course in the STIR3 placebo group is difficult to evaluate for credibility because the natural course in untreated groups with abundant MC-related oedema on STIR is unknown. The STIR3 placebo group reported almost no improvement during the 1-year follow-up (Fig. 4). Placebo groups and sick-listed  [19,41,42], 1.9 points for ODI [43], and 0-2.2 points for LBP [19,[41][42][43].
Incomplete blinding may have contributed to lack of improvement in our placebo group and an overestimated effect of amoxicillin. All patients were blinded to treatment allocation, but placebo patients still tended to suspect they were not on active treatment (Table A4). This might be due to a lack of treatment effect or lack of side effects [44]. AIM patients with little improvement at 3 months and no side effects were less likely to report at 1 year that they had received antibiotics [21]. The precise impact of incomplete blinding on outcome is unclear [45].
As hypothesised, amoxicillin had the largest effect on RMDQ and ODI in the assumed 'worst' category of all STIR variables (Figs. 1 and 2). This was not the case in the type 1 major category on T1/T2, and the effect of placebo in the group 'MC volume <10%' is difficult to explain and may be spurious (Fig. 1). Thus, the results for STIR variables appear more credible than the explorative T1/T2 results. Below, we further discuss the STIR3 results that correspond to our predefined primary hypothesis.

STIR3 results-interpretation and implications
The effect of amoxicillin was larger for STIR3 patients than for the original type 1 and type 2 only MC groups (− 2.3 and − 0.1 RMDQ points, respectively) in the primary analysis of AIM [19]. The disability at baseline was similar, not worse in the STIR3 group as expected, and cannot explain the difference. These findings make it relevant to examine patients with extensive MC-related oedema on STIR as a separate subgroup in future treatment studies.

Roland-Morris Disability
Questionnaire score (0-24 scale), Oswestry Disability Index score (0-100 scale), and low back pain intensity (0-10 numerical rating scale) from baseline to 1 year in each treatment group. Higher scores imply worse disability/ pain. The results are from post hoc analyses It remains unclear why the treatment effect (RMDQ difference) at 1 year was smaller in the STIR3 group (5.1; 0-24 scale) than in the prior cohort with type 1 MCs (8.3; 0-23 scale) [19]. Baseline MC oedema cannot be compared because the prior trial applied 0.2-T MRI without STIR. Both cohorts had MCs at the level of a prior disc herniation. STIR3 patients had slightly lower baseline RMDQ scores than the previous cohort, less than 13 vs. 15 [19]; baseline LBP scores were similar, above 6.
The STIR3 results were consistent with the hypothesis that some MCs with abundant oedema on STIR might represent low-grade discitis. Importantly, spondylodiscitis was an exclusion criterion and was not suspected on MRI. The STIR3 findings were credible by most of the predefined criteria (Table 4) and the treatment course mirrored that of Cutibacterium acnes discitis.
However, replication of our findings is essential. The effect of amoxicillin in the STIR3 group varied greatly (Fig. 5) and was not evident for LBP intensity. The CI overlapped with the cut-off for clinical importance, bacterial infection was not verified, and un-blinding may have occurred. Additionally, adverse events and antibiotic resistance are potential harms [21].
The present findings support the use of STIR to evaluate MCs. They also motivate further studies of MC-related oedema on STIR in relation to possible biological markers of infection that are currently being investigated by our research group. To achieve an optimal classification of MCs for potential clinical use, MC characteristics not studied here, such as diffusion parameters [46], contrast enhancement, and bone turnover [6], should also be investigated.
Further work is needed to quantify STIR findings. The composite STIR variable was based on both visual assessments and several time-consuming manual measurements. The visually estimated STIR signal volume alone yielded similar results and might be more applicable in a clinical setting. However, precise measurements are preferable in research and may become more feasible with advanced automated techniques [47,48]. To date, few studies have quantified spinal oedema on MRI [29,[49][50][51].

Strengths and limitations
The strengths of this study include predefined hypotheses, standardised MRI techniques, and MRI ratings by three experienced radiologists [52]. Additionally, potential effect modifiers were defined and categorised before analysing their modifying impact.
Subgroup studies of clinical trials often have limited statistical power and generalisability [36,53,54]. This phenomenon also applies to our study. Our pre-decision in the statistical analysis plan to perform primary PP analyses is debatable, although we also present secondary ITT results. The decision implied that we focused primarily on the effect of amoxicillin and secondarily on the effect of allocating patients to receive amoxicillin [55,56]. The impact of MRI findings on the treatment effect in patients who did not follow their assigned treatment seemed less relevant to study. PP analyses can create prognostic differences between the treatment groups [55]. However, ITT analyses supported the PP findings.
We evaluated the MRIs at inclusion before defining the subgroups, and the definitions of the MRI subgroups were partly dependent on the MRI data [57]. Furthermore, the composite MRI variables had not been validated. The ratings of STIR signal height and volume were less reliable at L5/S1 inferior to the disc [33]. However, the conclusive rating based on multiple observers' evaluations was likely more reliable than each observer's rating [52]. When an MC contained both type 1 and another type, we classified the type 1 part as primary or secondary, but we did not measure its exact size or intensity. Our results may not apply to low-field or 3-T MRI. However, they are likely valid with similar 1.5-T MRI protocols [3] and STIR works well with both low-and high-field scanners [58,59].

Conclusion
Predefined subgroups with chronic LBP, an index level with prior disc herniation, and abundant MC-related index-level oedema on STIR modified the treatment effect of amoxicillin. Funding This study has received funding by the South East Norway Regional Health Authority (grant no. 2015-090) and the Western Norway Regional Health Authority (grant nos. HV 911891 and HV 911938), which had no part in the planning, performing, or reporting of the study.

Compliance with ethical standards
Guarantor The scientific guarantor of this publication is Per Martin Kristoffersen.

Conflict of interest
The authors of this manuscript declare no relationships with any companies, whose products or services may be related to the subject matter of the article.
Statistics and biometry One of the authors has significant statistical expertise.
Informed consent Written informed consent was obtained from all patients in this study. Yes, at baseline.
(2) Is the effect suggested by comparisons within rather than between studies?
Yes, within this study.
(3) Was the hypothesis specified a priori? Yes, in the trial protocol and statistical analysis plan (4) Was the direction of the subgroup effect specified a priori? Yes, and the a priori specified direction was verified (5) Was the subgroup effect one of a small number of hypothesised effects tested?
Yes, the hypothesis was predefined as primary hypothesis for this study and pre-ordered as hypothesis six (F) in trial protocol (6) Does the interaction test suggest a low likelihood that chance explains the apparent subgroup effect?
Yes, p = 0.008, equal to the Bonferroni adjusted alpha of 0.05/6; this strengthened the result, as the six tests were not independent (7) Is the significant subgroup effect independent?
Probably. The subgroup effect only changed from 5.

Methodology
• prospective • randomised controlled trial (subgroup analysis) • multicentre study Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.