Background

Diffusion-weighted magnetic resonance imaging (DWI) is based on the random motion or “self-diffusion” of water molecules in a tissue, which depends on its histology [1]. The apparent diffusion coefficient (ADC) is a measure of diffusion calculated from the DWI images [1,2,3]. Thus, the ADC adds information on function (diffusion) not revealed by imaging of anatomy and histology. DWI was first successfully used to evaluate brain tissue and is now also used in other soft tissue organs, especially for cancer imaging. In the vertebral bone marrow, DWI and ADC measurements are not part of standard clinical imaging protocols but have been applied to study inflammatory and infectious disorders and to differentiate benign from malignant compression fractures [3,4,5,6,7,8,9].

Modic changes (MCs) are magnetic resonance imaging (MRI) findings of vertebral bone marrow changes extending from the endplate. They are divided into type 1 (oedema type), 2 (fatty type) and 3 (sclerotic type) based upon T1- and T2-weighted images [10]. Type 1 MCs were related to back pain in some studies [11,12,13], but the clinical significance of MCs is uncertain [14]. There are limited data on ADC measurements in MCs [4,5,6,15], but ADC values have been used to help distinguish type 1 MCs from infectious spondylitis [4, 5] and inflammatory spondyloarthritis [6]. In patients with MCs, DWI remains a research tool and as yet has no role in routine imaging. Further research on the relevance of ADC measurements in MCs requires reproducible and valid measurements. We expect valid ADC values to differ according to MC type, since the underlying histology differs [16]. The aim of this study was to evaluate ADC values in MCs for interobserver reproducibility and relation to MC type.

Methods

This cross-sectional study was based on baseline MRI of 90 consecutive patients aged 25–63 years (mean age 44 years; 54 women) with chronic low back pain and MCs who were included in the Norwegian AIM (Antibiotics In Modic changes) study, which comprised 180 patients. The current sample size (n = 90) was based on a power calculation (see below). Eligibility criteria and methodology of the AIM study have been published previously [17, 18]. In short, all AIM patients had type 1 and/or type 2 MCs, with height ≥ 10% of vertebral body height and diameter > 5 mm, at the level of a previous lumbar disc herniation [17]. The present analysis included any type of MC of that size at any level Th12-S1 with or without disc herniation. Patients with prior low back surgery, except surgery for disc herniation performed more than 1 year earlier, were excluded from the AIM study. None had lumbar metal implants. All patients gave written informed consent prior to inclusion. The study was approved by the Regional Committees for Medical Research Ethics in South East Norway (ref. no. 2014/158). The current report follows guidelines for reporting reliability and agreement studies [19].

Images

The patients included in the present analysis underwent MRI of the lumbar spine during the initial phase of AIM from 2015 to 2016 at five centres using identical protocols and 1.5 T scanners (Magnetom Avanto B19, Siemens Healthineers, Erlangen, Germany). This study applied sagittal ADC maps and T1- and T2-weighted non-fat saturated fast spin-echo images (‘T1/T2’). Gradient-echo diffusion-weighted echo-planar imaging with fat saturation was performed. The system software generated ADC maps based on b values of 50, 400 and 800 s/mm² (recommended by the vendor) and three orthogonal directions of diffusion sensitization (see protocol details in Table 1).

Table 1 DWI with sagittal ADC maps of the lumbar spine

For T1/T2, slice thickness/interslice gap was 4 mm/0.4 mm, matrix 384 × 269, field of view 300 mm × 300 mm, echo time (ms)/repetition time (ms) 11/575 (T1) and 87/3700 (T2), and echo train length 5 (T1) and 17 (T2). All images were stored and evaluated at a single centre using Agfa Impax 6.5 (Agfa HealthCare, Mortsel, Belgium).

Evaluation

Two radiologists (A, B), with 6 (A) and more than 10 (B) years of experience, independently evaluated levels Th12-S1 (12 endplates) using all sagittal slices. The observers were aware that patients had chronic low back pain but were otherwise blinded to clinical findings. They cross-navigated between the ADC map and the T1/T2 images to ensure that ADC was measured in an MC-related area. MCs were defined based on T1/T2 images [10, 20] (Table 2). We excluded MCs with height < 10% of vertebral body height or diameter ≤ 5 mm according to one or both radiologists.

Table 2 Description of magnetic resonance imaging variables

For each MC, ADC was measured in the MC-related area that was most intense on the ADC map, in normal vertebral body marrow, and in cerebrospinal fluid (CSF) using a circular region of interest (ROI) with predefined size (Fig. 1, Table 2). To limit variation in ADC measurements, we did not use freely shaped ROIs. If the MC area had uniform intensity on the ADC map, ADC was measured in the area where the MC had the largest height on T1/T2.

Fig. 1 Measurements of ADC values. (a–d) A 50-year-old woman with chronic low back pain. ADC maps (a, c) and corresponding T2-weighted fast spin-echo images (b, d) showing MCs at the L4/L5 level. ADC measurements (Avg GY corresponding to mean 10⁻⁶ mm²/s) included (a) highest mean ADC value in the MC region (1655 in a 41.8 mm² ROI) and (c) mean ADC in normal vertebral body marrow (215.9 in a 94 mm² ROI) and in CSF (3125 in a 41.8 mm² ROI). Midsagittal images were used for measurements in CSF at the level of the MC and close to the endplate in normal vertebral body marrow near the MC. ADC, apparent diffusion coefficient. MC, Modic change. ROI, region of interest. CSF, cerebrospinal fluid

The following ADC variables were analysed (Table 2): (a) the MC-related ADC value (10⁻⁶ mm²/s) (MC-ADC), (b) MC-ADC in percent (MC-ADC%) where 0% = ADC in normal vertebral body marrow and 100% = ADC in CSF, and (c) MC-ADC divided by the vertebral body ADC (MC-ADC-ratio).
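The three variables can be illustrated with a minimal Python sketch. It assumes that MC-ADC% is a linear rescaling between the marrow and CSF values, which is the natural reading of the definition above but is not spelled out in this text; the function name `adc_variables` is hypothetical. The input values are the ROI means shown in Fig. 1.

```python
def adc_variables(mc_adc, marrow_adc, csf_adc):
    """Return (MC-ADC, MC-ADC%, MC-ADC-ratio) from three mean ROI values.

    All inputs are mean ADC values in 10^-6 mm^2/s.
    Assumption: MC-ADC% is a linear rescaling with 0% at the marrow
    value and 100% at the CSF value.
    """
    if marrow_adc == 0 or csf_adc == marrow_adc:
        raise ValueError("reference ADC values must be non-zero and distinct")
    # Percent scale: 0% = normal vertebral body marrow, 100% = CSF
    mc_adc_percent = 100.0 * (mc_adc - marrow_adc) / (csf_adc - marrow_adc)
    # Ratio of lesion ADC to normal vertebral body marrow ADC
    mc_adc_ratio = mc_adc / marrow_adc
    return mc_adc, mc_adc_percent, mc_adc_ratio


# ROI means from the example in Fig. 1 (MC 1655, marrow 215.9, CSF 3125)
value, percent, ratio = adc_variables(1655.0, 215.9, 3125.0)
print(f"MC-ADC {value:.0f}, MC-ADC% {percent:.1f}, MC-ADC-ratio {ratio:.2f}")
```

Both derived variables depend on two or three separate ROI measurements, so any measurement error in the marrow or CSF ROI propagates into them.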

Prior to any ADC measurements, as part of a previous study [20], MC type was independently assessed by three radiologists (B, C, D), each with more than 10 years’ spine MRI experience. MCs were classified in types 1, 2 and 3 [10] (Table 2).

Mixed MC types were classified as primary (most extensive) type/secondary type, i.e., MC types 1/2, 1/3, 2/1, 3/1, 2/3 and 3/2. Finally, in the present study, MCs were grouped into a type 1 group (any MC containing type 1), type 2 group (pure type 2 MCs) or type 3 group (MC types 3, 3/2, and 2/3).

Conclusive MC type was based on the agreement of at least two of the three observers B, C, and D. If all three disagreed, MC type was decided in consensus with observer A. The conclusive value for ADC variables was the mean of the values reported by observers A and B. The height of the MC into the vertebral body was measured in mm in our previous study [20] and is reported here as the mean of the values reported by observers C and D.
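The grouping and majority rules described above can be sketched as follows. This is an illustrative sketch only; the function names, the string labels for MC types, and the `consensus` argument are hypothetical and not taken from the study's own software.

```python
from collections import Counter


def conclusive_type(ratings, consensus=None):
    """Return the MC type reported by at least two of three observers.

    ratings: three type labels, e.g. ["1", "1/2", "1"].
    consensus: label from a consensus reading, used only when all
    three observers disagree (hypothetical argument).
    """
    if len(ratings) != 3:
        raise ValueError("expected ratings from exactly three observers")
    label, count = Counter(ratings).most_common(1)[0]
    if count >= 2:
        return label
    if consensus is None:
        raise ValueError("all observers disagree; a consensus label is needed")
    return consensus


def type_group(mc_type):
    """Map a pure or mixed MC type label to the type group (1, 2 or 3)."""
    parts = mc_type.split("/")
    if "1" in parts:
        return 1  # any MC containing type 1
    if parts == ["2"]:
        return 2  # pure type 2
    return 3      # types 3, 3/2 and 2/3


def conclusive_adc(value_a, value_b):
    """Conclusive ADC value: mean of the values from observers A and B."""
    return (value_a + value_b) / 2
```

For example, `type_group(conclusive_type(["2/1", "2/1", "2"]))` places a predominantly fatty MC with a type 1 component in the type 1 group.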

Pilot study

Prior to this study, observers A, B, and C performed a pilot study on 10 patients, to determine the ADC evaluation criteria and align the measurements. The pilot study patients were not included in the present study.

Hypothesis

A priori, we hypothesized that ADC values were higher in the type 1 MC group vs the type 3 group and higher in the type 3 group vs the pure type 2 group. Rationale: Compared to type 2 and 3, type 1 MCs are likely to contain more inflammatory oedema, favouring motion of water molecules and increasing the ADC value. Trabecular thickening/sclerosis restricts water motion, and less trabecular thickening in type 1 vs type 3 MCs [21] also suggests higher ADC values in type 1. The large hydrophobic fatty cells in type 2 MCs may restrict water motion and thereby reduce ADC values more than does the fibrovascular granulation tissue with inflammatory cells in type 1 MCs [10, 16] and the trabecular thickening in type 3 MCs [21]. In an MC containing type 1 but also type 2 and/or 3, we expected type 1 to contribute the highest ADC value.

Statistical analyses

The reproducibility analysis was restricted to MCs extending from one of the four lowest endplates (L4-S1), because of low prevalence (< 10%) of MCs at the other endplates [22]. Interobserver reliability at each endplate was assessed by Cohen’s kappa (MC presence and type) and intraclass correlation coefficients (ICCs) (ADC variables). We used 2-way random effects, absolute-agreement, average-measures ICCs. ADC variables were also analysed using Bland-Altman plots with mean of differences ± 1.96 SD (limits of agreement, LoA) at each endplate and pooled across all four endplates L4-S1. We further calculated the proportion of differences exceeding 50% of the observers’ mean value for each ADC variable across L4-S1. We used 50% as cut-off because LoA were 5% ± 45% for ADC in vertebral bone marrow in a prior intra-rater study [23]. Interpretation of Cohen’s kappa: 0.00–0.20 poor; 0.21–0.40 fair; 0.41–0.60 moderate; 0.61–0.80 good; 0.81–1.00 very good reliability [22]. ICC values were regarded as indicating poor (< 0.50), moderate (0.50–0.75), good (0.76–0.90) or excellent (> 0.90) reliability [24].
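The agreement statistics named above can be sketched in plain Python. The ICC function uses the standard two-way ANOVA mean-squares formula for an absolute-agreement, average-measures ICC; it is a sketch under that assumption and not the MedCalc implementation used in the study. All function names are hypothetical and the readings are made up for illustration.

```python
from statistics import mean, stdev


def icc_a_k(ratings):
    """Two-way random effects, absolute-agreement, average-measures ICC.

    ratings: one row per subject, each row holding one value per rater.
    Computed from the mean squares of a two-way ANOVA without replication.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)


def bland_altman(a, b):
    """Return (mean difference, half-width of the 95% limits of agreement)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs), 1.96 * stdev(diffs)


def share_exceeding(a, b, fraction=0.5):
    """Proportion of pairs whose absolute difference exceeds
    `fraction` of the pairwise mean of the two observers."""
    flags = [abs(x - y) > fraction * (x + y) / 2 for x, y in zip(a, b)]
    return sum(flags) / len(flags)


# Made-up MC-ADC readings (10^-6 mm^2/s) for two observers, illustration only
obs_a = [1650.0, 820.0, 430.0, 1210.0, 960.0, 300.0]
obs_b = [1590.0, 900.0, 410.0, 1300.0, 700.0, 520.0]
bias, half_width = bland_altman(obs_a, obs_b)
print(f"LoA: {bias:.0f} ± {half_width:.0f}")
print(f"ICC(A,k): {icc_a_k(list(zip(obs_a, obs_b))):.2f}")
print(f"share of differences > 50%: {share_exceeding(obs_a, obs_b):.2f}")
```

The ICC relates between-subject variance to total variance, whereas the LoA are expressed in the measurement's own units, so the two can give different impressions of the same data.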

The relation between each ADC variable and MC type group was analysed using conclusive ADC values from MCs extending from one of the 12 endplates Th12-S1. Linear mixed-effects models were fitted using the ADC variable as dependent variable, MC type group as fixed effect, and endplate and patient as random effects. In each MC type group, the model returned a predicted mean value of the ADC variable that had been adjusted for data dependency between MCs at different endplates within the same patient. We also assessed the ability of each ADC variable to discriminate between the MC type groups by calculating the area under the receiver operating characteristic curve (AUC). We graded the discriminatory ability as low (AUC 0.5 to < 0.7), moderate (0.7 to < 0.9), or high (0.9–1.0) [25].
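The AUC for a pair of MC type groups can be illustrated with a short sketch. It uses the equivalence between the AUC and the Mann-Whitney statistic (the probability that a value from one group exceeds a value from the other), which is a standard identity and not necessarily how MedCalc computes it. The function names are hypothetical and the ADC values are made up.

```python
def auc(group_high, group_low):
    """AUC as the probability that a value from `group_high` exceeds a
    value from `group_low` (ties count one half); equal to the
    Mann-Whitney U statistic divided by the number of pairs."""
    wins = 0.0
    for x in group_high:
        for y in group_low:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(group_high) * len(group_low))


def grade_auc(value):
    """Grade discriminatory ability with the cut-offs stated in the text."""
    if value >= 0.9:
        return "high"
    if value >= 0.7:
        return "moderate"
    if value >= 0.5:
        return "low"
    return "below chance"


# Made-up MC-ADC values (10^-6 mm^2/s) for two type groups, illustration only
type1 = [1400.0, 1150.0, 980.0, 1620.0, 760.0]
type2 = [520.0, 610.0, 830.0, 450.0, 700.0]
area = auc(type1, type2)
print(f"AUC {area:.2f} ({grade_auc(area)})")
```

This pairwise AUC ignores the clustering of MCs within patients, which the mixed-effects models account for.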

Mixed-effects models were fitted in R 4.0 (R Foundation, Vienna, Austria), using normality plots of standardized residuals and fitted values to assess model assumptions. All other analyses were performed using MedCalc 17.6 (MedCalc Software, Ostend, Belgium). Plots were made using Matlab 9.5 (MathWorks, Natick, Massachusetts, United States) and MedCalc 17.6. The significance level was 0.05.

Sample size

Previously reported ADC values (recalculated to 10⁻⁶ mm²/s) were 624–1800 (SD 120–316) in type 1 MCs [4, 5] and 500 (SD 160) in type 2 MCs [4]. Assuming SD 300 for ADC in both of two MC groups, 36 MCs in each group are sufficient to detect a mean ADC difference of 200 between the groups (β = 0.2, two-sided α = 0.05). We needed 31 MCs at a given endplate to get a precision of ±0.10 (95% confidence level) for an ICC of 0.85. We expected 90 patients to have enough MCs to compare the ADC variables between the three MC type groups and to estimate their reliability.
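The group size for the comparison of means can be checked with the usual normal-approximation formula for two independent groups with a common SD. The source does not state which formula or software was used, so this is an assumption; the function name `n_per_group` is hypothetical. With the stated inputs it gives 36 per group, in line with the text.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.05, beta=0.2):
    """Observations per group for a two-sided comparison of two means
    with a common SD, using the normal approximation:
    n = 2 * ((z_(1-alpha/2) + z_(1-beta)) * sd / delta)^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# SD 300 and a mean difference of 200 (both in 10^-6 mm^2/s)
print(n_per_group(sd=300, delta=200))
```

The calculation treats MCs as independent observations; clustering of MCs within patients would increase the number needed.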

Results

We included MCs from all 90 patients, 224 MCs in total (Table 3). These were 111 type 1 group MCs (any type 1), 91 type 2 group MCs (pure type 2), and 22 type 3 group MCs (20 type 2/3, 2 type 3/2, 0 type 3). Mean MC height was 10.7 mm (SD 3.6 mm), and height was ≥ 7 mm in 85% of the MCs (191/224). For reproducibility analyses, 201 MCs at L4-S1 were included.

Table 3 Distribution of Modic types across the lumbar spine

Interobserver reproducibility

The interobserver reliability was very good (kappa 0.85–0.96) for MC presence but varied from moderate to very good (kappa 0.41–0.81) for MC type group (Additional file 1, Table A1). It was good to excellent (ICC 0.84–0.98) for the three ADC variables (Table 4).
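For readers less familiar with the kappa statistic reported here, a minimal sketch of unweighted Cohen's kappa for one pair of observers follows, with the interpretation bands from the Methods. The function names are hypothetical and the ratings are made up; they are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters rating the same items."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from the raters' marginal frequencies
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def grade_kappa(kappa):
    """Interpretation bands for kappa as stated in the Methods."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"


# Made-up MC type group ratings for eight endplates, illustration only
rater_1 = ["1", "1", "2", "2", "3", "1", "2", "2"]
rater_2 = ["1", "1", "2", "1", "3", "1", "2", "3"]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa {kappa:.2f} ({grade_kappa(kappa)})")
```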

Table 4 Interobserver reliability for ADC variables

For MC-ADC, values (10⁻⁶ mm²/s) from both observers ranged from 108 to 2029 (mean 913) across the 201 MCs L4-S1. Widest LoA were 20 ± 407 (at L4/L5 inferior to disc) and narrowest LoA were 12 ± 254 (at L5/S1 inferior to disc) (Fig. 2).

Fig. 2 Bland-Altman plots for MC-ADC. The figure shows results for two radiologists who measured MC-ADC in a total of 201 MCs at the four endplates L4-S1. MC, Modic change. ADC, apparent diffusion coefficient. MC-ADC, ADC in MC

MC-ADC% ranged from 6 to 76 (mean 25.6) and had widest and narrowest LoA of 1.6 ± 18.8 and 1.4 ± 10.4 (Fig. 3).

Fig. 3 Bland-Altman plots for MC-ADC%. The figure shows results for two radiologists who measured MC-ADC% in a total of 201 MCs at the four endplates L4-S1. MC, Modic change. ADC, apparent diffusion coefficient. MC-ADC%, ADC in MC in percent (0% = vertebral body, 100% = cerebrospinal fluid)

MC-ADC-ratio ranged from 0.5 to 15.6 (mean 4.7) with widest and narrowest LoA 0.3 ± 4.3 and 0.2 ± 3.9 (Fig. 4).

Fig. 4 Bland-Altman plots for MC-ADC-ratio. The figure shows results for two radiologists who measured MC-ADC-ratio in a total of 201 MCs at the four endplates L4-S1. MC, Modic change. ADC, apparent diffusion coefficient. MC-ADC-ratio, ADC in MC divided by ADC in normal vertebral body marrow

Pooled LoA across L4-S1 were 7 ± 316 for MC-ADC (10⁻⁶ mm²/s), 1.2 ± 13.8 for MC-ADC%, and 0.4 ± 4.0 for MC-ADC-ratio. The upper border of these LoA reached 35, 59, and 94% of the mean value for MC-ADC (913), MC-ADC% (25.6) and MC-ADC-ratio (4.7), respectively, across all 201 MCs L4-S1.
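The percentages above follow from dividing the upper limit of agreement (mean difference plus half-width) by the mean value of each variable. A short sketch of that arithmetic, using only the figures reported in this paragraph (the function name is hypothetical):

```python
def upper_loa_percent(bias, half_width, mean_value):
    """Upper limit of agreement expressed as a percentage of the mean."""
    return 100.0 * (bias + half_width) / mean_value


# (mean difference, LoA half-width, mean value) for the pooled L4-S1 data
pooled = {
    "MC-ADC": (7.0, 316.0, 913.0),
    "MC-ADC%": (1.2, 13.8, 25.6),
    "MC-ADC-ratio": (0.4, 4.0, 4.7),
}
for name, (bias, half_width, mean_value) in pooled.items():
    print(f"{name}: {upper_loa_percent(bias, half_width, mean_value):.0f}%")
```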

The difference between the two observers was > 50% of their pairwise mean in 18 (9%) of the 201 MCs for MC-ADC, 41 MCs (20%) for MC-ADC%, and 34 MCs (17%) for MC-ADC-ratio.

Reproducibility parameters for ADC values in CSF and normal vertebral body marrow are shown in Additional file 1, Table A2.

ADC values by MC type group

Unadjusted mean values of the three MC related ADC variables are shown in Table 5.

Table 5 Unadjusted mean for ADC variables by Modic type group

After adjustment for data dependency within patients in the linear mixed-effects models, the predicted means for the ADC variables were higher in the type 1 vs type 3 MC group and in the type 3 vs type 2 MC group (p values from ≤ 0.001 to 0.02) (Fig. 5).

Fig. 5 ADC variables according to Modic type group. The left panel shows predicted means from linear mixed-effects analyses for three ADC variables in each of three Modic type groups including a total of 224 MCs Th12-S1 in 90 patients. The right panel shows regression coefficients for the Modic type 1 and type 3 groups using the type 2 group as reference. ADC, apparent diffusion coefficient. MC, Modic change. MC-ADC, ADC in MC. MC-ADC%, ADC in MC in percent (0% = vertebral body, 100% = cerebrospinal fluid). MC-ADC-ratio, ADC in MC divided by ADC in normal vertebral body marrow

Predicted means for the type 1 vs type 3 vs type 2 group were 1201 vs 796 vs 576 for MC-ADC (10⁻⁶ mm²/s), 36 vs 21 vs 14 for MC-ADC%, and 5.9 vs 4.2 vs 3.1 for MC-ADC-ratio.

The ability to discriminate between the MC type groups was moderate to high for MC-ADC and MC-ADC% (AUC 0.73–0.91) and low to moderate for MC-ADC-ratio (AUC 0.67–0.85) (Fig. 6).

Fig. 6 Ability of ADC variables to discriminate between Modic type groups. The figure shows receiver operating characteristic curves and AUC values describing the ability of each ADC variable to discriminate between the Modic type groups for 224 MCs Th12-S1 in 90 patients. MC-ADC and MC-ADC% discriminated better between MC type 1 and type 2, and between type 1 and type 3, than did MC-ADC-ratio (p = 0.005 to p < 0.001). The ability to discriminate between type 3 and type 2 did not differ between the three variables. ADC, apparent diffusion coefficient. MC, Modic change. AUC, area under the curve. MC-ADC, ADC in MC. MC-ADC%, ADC in MC in percent (0% = vertebral body, 100% = cerebrospinal fluid). MC-ADC-ratio, ADC in MC divided by ADC in normal vertebral body marrow

Supplementary ADC data are found in Additional file 1, Table A3 and Fig. A1.

Discussion

This study provides new data on interobserver reproducibility for ADC values in MCs. We found relatively better reproducibility for MC-ADC than for MC-ADC% and MC-ADC-ratio. To our knowledge, this is also the first study to show higher ADC values for a type 1 vs a type 3 MC group and for a type 3 vs a pure type 2 MC group, supporting our hypothesis based on histology of MCs [10, 16, 21].

We tested the hypothesis of ADC differences between MC types to assess the construct validity of the ADC variables [26, 27]. ADC maps cannot replace images used to discriminate between MC types. The discriminative ability still supports the validity of MC-ADC and MC-ADC% and weakens the validity of MC-ADC-ratio. To evaluate how well ADC values represent actual diffusion (criterion validity), one could perform DWI of phantoms with defined diffusion characteristics [28,29,30].

Pooled LoA suggested that 95% of differences in MC-ADC between observers can be expected to fall within 7 ± 316 (10⁻⁶ mm²/s). This is relevant when different observers measure MC-ADC in the same patient. ICC ≥ 0.95 indicated that MC-ADC distinguished well between the patients [31], although the upper LoA reached 35% of the mean across L4-S1. The ICC quantifies the between-subject variability in relation to the measurement error [31, 32]. Thus, the high ICC values (≥ 0.95) may reflect the large variability in MC-ADC values between the patients in our sample. In more homogeneous samples the ICC will be lower.

No previous study has reported specifically on the reproducibility of ADC values in MCs. A study of ADC measurements in active spondyloarthritis foci and type 1 MCs [6] reported interobserver ICCs of 0.89–0.98. Other studies on ADC values in bone marrow lacked interobserver data [23, 33]. Our interobserver LoA for ADC in normal lumbar bone marrow (4% ± 56%) (Additional file 1, Table A2) were only slightly wider than previously reported intraobserver LoA (5% ± 45%) [23]. ADC values have been found to be less reproducible in bone marrow than in soft tissues [23]. Thus, our results seem to agree with relevant prior studies.

Standardized ROIs, pilot testing, and clear instructions for where to measure probably reduced the variability of the ADC measurements. MC-ADC implied a single measurement, avoiding variation from measurements in CSF and normal bone marrow. This may partly explain the relatively better reproducibility for MC-ADC compared to MC-ADC% and MC-ADC-ratio. We included the two latter variables since it had been found useful in prior studies of ADC values to standardize lesion values against normal tissue values [34,35,36]. However, in our study this approach added variability. Compared to MC-ADC, MC-ADC-ratio also discriminated less well between the MC type groups. MC-ADC seems more feasible, reproducible, and promising for use in further research.

In line with our results, Belykh et al. found higher mean ADC (10⁻⁶ mm²/s) in type 1 vs type 2 MCs (498 vs 223, p < 0.001) [15]. Prior statistical comparisons of ADC values between all three MC type groups are lacking. In a study with 20 MCs, mean ADC (recalculated to 10⁻⁶ mm²/s) was descriptively reported to be 624, 500, and 756 in type 1, 2, and 3 MCs, respectively [4]. Thus, ADC values differed between studies. Our mean ADC value of 1226 in type 1 MCs was midway in the range of previous values (498 to 1800) [4, 5, 15], and close to what was found in spondyloarthritis foci (1240) [36].

Many factors can affect ADC values in MCs, such as MRI technique (sequence parameters, b values, fat suppression) [37,38,39,40], ROI size and location, type of ADC measure (mean, percentile, histogram), and the definition of MC type (e.g., pure, mixed). Lack of information on mixed MC types, ROI size, and exact location of the ROI in the MC further complicates a comparison of the ADC values [5, 15].

Strengths and limitations

Strengths of this study are standardized MRI methodology, well-defined criteria for measuring ADC, and a large enough sample size to compare ADC values between MC type groups. A limitation is that our type 3 MC group was dominated by type 2/3 MCs, which may have reduced its ADC values. Partial volume effect can bias ADC measurements in MCs. This was likely a minor issue in our study, since 85% of the MCs appeared clearly larger (based on height ≥ 7 mm) than the slice thickness applied (4 mm) and at least as large as the ROI used (diameter 7 mm). The interobserver reliability for MC type group varied, reflecting difficulties in assessing signal intensities in MCs, especially in mixed MC types, which were prevalent (Table 3). The observers were experienced and had performed a pilot study. Interobserver differences may be larger between less experienced radiologists. We did not assess intraobserver agreement, which is often better than interobserver agreement [41,42,43,44].

The ADC maps showed some noise and distortion (Fig. 1), which are common problems in spine DWI [40]. The single-shot echo-planar imaging method applied is prone to susceptibility artefacts, which can influence ADC values. The DWI sequence (3 min 48 s) was part of an extensive MRI protocol where each sequence had been shortened to reduce total scan time and make the protocol feasible at all study centres. Longer acquisition time could have been used to improve the ADC maps [45, 46]. New DWI methods like RESOLVE (readout segmentation of long variable echo-trains) can also provide better image quality but were not available to us at the time [47]. The DWI method we used should be possible to apply at most MRI centres. Importantly, we used T1/T2 images as anatomical references when measuring ADC, and the modest quality of the ADC maps hardly affected the overall results.

Implications

Our findings have some implications for future research. Firstly, MC-ADC may be preferable when all study participants undergo identical DWI protocols. Secondly, the intraobserver repeatability of the ADC variables and their reproducibility with other and improved DWI protocols should be clarified. Finally, the clinical relevance of measuring ADC in MCs is unknown and should be investigated, especially in the most inflammatory type 1 MC group. In inflammatory lesions of spondyloarthritis and sacroiliitis, ADC measures were related to disease activity [36, 48].

Conclusions

ADC values of MCs had overall moderate interobserver reproducibility and they differed between MC types as hypothesized. The reproducibility was best for MC-ADC, measured in an ROI of predefined size, without standardization against normal bone marrow or CSF. This variable appears feasible, reliable, and valid to use in further research.