Introduction

Spondyloarthritis (SpA) encompasses a group of immune-mediated inflammatory diseases characterised by spinal pain, stiffness and damage which commonly affect young people and have poor long-term health outcomes [1]. Diagnosis of SpA is often difficult due to the complex nature of pain in young patients [2], and delays in diagnosis and treatment are common [3]. Identification of bone marrow oedema on MRI is of importance for showing inflammation of the sacroiliac joints and supports diagnosis of axial SpA [4,5,6,7]. This directly influences the decision to treat patients with disease-modifying or biologic drugs [5].

Unfortunately, the definition of active inflammation on MRI is based on subjective criteria and is heavily dependent on the expertise and opinion of the scan reader [8,9,10,11]. ‘Conventional’ MR images used to detect inflammation—typically short inversion time inversion recovery (STIR) and T1-weighted spin echo images—produce complex image contrast that depends on multiple tissue properties, including T1, T2, proton density, perfusion and diffusion [11,12,13], which may confound the identification and quantification of oedema. These factors can lead to a lack of consistency between observers and scanners/hospitals [7, 14]. Therefore, there is a need for a method which can simply and objectively assess skeletal inflammation on MRI scans to support diagnostic and therapeutic decisions.

Previous studies have investigated the use of diffusion-weighted imaging (DWI) and chemical shift-encoded MRI (CSE-MRI) as objective methods for assessing bone marrow oedema, with promising initial results [12, 13, 15]. Using DWI, it has been shown that apparent diffusion coefficient (ADC) measurements are increased in areas of marrow oedema, probably due to an expansion of the extracellular space [13, 15,16,17]. Using CSE-MRI, it has been shown that proton density fat fraction (PDFF) measurements are reduced in areas of oedema compared with normal marrow, due to increased water content [12]. CSE-MRI can also be used to quantify the severity of fat metaplasia, defined as a focal increase in fat content in areas of previous inflammation (with diagnostic and prognostic significance) [12]. Previous studies measuring ADC in subchondral bone have typically relied on manual placement of regions of interest (ROIs) within the subchondral bone [15, 16, 18], which introduces substantial methodological subjectivity. Furthermore, these studies have relied on mean ADC measurements, which may perform poorly in patients with mixed active and chronic inflammation due to neutralisation of opposing effects [19]. There is currently no validated tool for quantifying PDFF in the sacroiliac joints.

We describe a new analysis tool which enables a more complete and consistent assessment of subchondral bone and derives a series of histographic parameters from both ADC and PDFF maps, aiming to isolate and separately quantify the active and chronic components of the inflammatory process. We aimed to demonstrate proof-of-principle for this tool in a prospective study of young people with SpA.

Methods

This study received ethical approval from the Queen Square Research Ethics Committee, London, UK (Research Ethics Committee reference 15/LO/1475). All participants gave written informed consent prior to study entry.

Study design and participants

A prospective cross-sectional study was performed at a single specialist tertiary referral centre for adolescents and young adults with inflammatory arthritis. Fifty-three consecutive patients meeting the eligibility criteria (mean age, 18 years; age range, 12–23 years) were prospectively recruited between July 2016 and December 2018 (31 males, mean age 18 years, and 22 females, mean age 17 years). Patients were included if they were referred for an MRI scan of the sacroiliac joints for suspicion of sacroiliitis or for monitoring of known sacroiliitis and were excluded if they had a contraindication to MRI scanning. All patients with known, pre-existing sacroiliitis had a clinical diagnosis of either non-radiographic axial SpA or enthesitis-related arthritis [18,19,20,21]. The sample size was fixed and based on logistical constraints. Patients were classified according to the presence or absence of bone marrow oedema and fat metaplasia using established criteria, based on conventional MRI scans, as described below.

Image acquisition

All subjects underwent both quantitative and conventional MRI scans on the same visit. Quantitative CSE-MR images were acquired on a 3-T Philips Ingenia scanner (Ingenia, Philips) using an investigational version of the Philips mDixon Quant acquisition and post-processing pipeline, as described previously [12]. The images were acquired using a multi-echo gradient echo acquisition with bipolar readout (TE1 1.17 ms, ΔTE 1.6 ms, TR 25 ms, flip angle 3°, matrix size 320 × 320, pixel spacing 1.76 × 1.76 mm, bandwidth 394 Hz/Px) and PDFF maps were generated using complex fitting incorporating T2* decay and a 10-peak model of human adipose tissue [12]. Images were acquired coronal to the long axis of the sacroiliac joint [12]. DW images were acquired on a 1.5-T Siemens Avanto scanner (Avanto, Siemens) using b values of 0, 50, 100, 300 and 600 s/mm² with spectrally attenuated inversion recovery (SPAIR) fat suppression and echo planar imaging readout (TE = 89 ms, TR = 3600 ms, 4 averages, 8 mm slices, matrix size 144 × 192, FOV 237 × 316 mm, bandwidth 1447 Hz/Px), with images acquired axial to the sacroiliac joint [13, 15]. Conventional MRI consisted of T2-weighted STIR images, T1-weighted turbo spin echo images and fat-suppressed post-contrast T1-weighted turbo spin echo images acquired coronal to the sacroiliac joint (see Supplementary Information for sequence parameters) [11, 12].
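The text does not state how the ADC maps were computed from the five b values. As an illustration only, the sketch below assumes the commonly used mono-exponential model S(b) = S0·exp(−b·ADC) and estimates ADC for a single pixel by a log-linear least-squares fit. The function names `fit_adc` and `synthetic_signal` are hypothetical and are not part of the scanner software or of the BEACH tool.

```python
from math import exp, log
from typing import Sequence

# b values (s/mm^2) listed in the acquisition protocol above.
B_VALUES = (0.0, 50.0, 100.0, 300.0, 600.0)


def synthetic_signal(s0: float, adc: float, b_values: Sequence[float] = B_VALUES) -> list:
    """Generate noise-free signals under the ASSUMED mono-exponential model."""
    return [s0 * exp(-b * adc) for b in b_values]


def fit_adc(signals: Sequence[float], b_values: Sequence[float] = B_VALUES) -> float:
    """Estimate ADC (mm^2/s) as minus the least-squares slope of ln(signal) against b."""
    if len(signals) != len(b_values):
        raise ValueError("one signal is required per b value")
    if any(s <= 0 for s in signals):
        raise ValueError("signals must be positive to take logarithms")
    logs = [log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    # Ordinary least-squares slope of ln(S) on b.
    sxy = sum((b - mean_b) * (l - mean_log) for b, l in zip(b_values, logs))
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    return -sxy / sxx
```

For example, `fit_adc(synthetic_signal(1000.0, 1e-3))` recovers an ADC close to 1 × 10⁻³ mm²/s. Real pixels are noisy and may deviate from mono-exponential decay at low b values because of perfusion effects, so this should be read as a simplified model rather than the processing used in the study.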

Image analysis

Histographic parameters were obtained from the PDFF and ADC maps using an in-house software tool known as BEACH (Bone Edema and Adiposity Characterisation with Histograms), as shown in Figs. 1, 2 and 3, and as described in detail in the Supplementary Information. This method generates a series of histographic parameters for both ADC and PDFF.

Fig. 1

Definition of polygonal ROIs on subchondral bone. The observer is asked to define the line of the sacroiliac joint and ‘anchor lines’ are added to define the angles made by the joint with the anterior and posterior cortex of the bone, thus enabling the automatically propagated ROIs to better fit the subchondral bone

Fig. 2

Examples of histograms generated using the BEACH tool. Conventional MR images (a–c), PDFF maps (d–f) and PDFF histograms (g–i) are shown. In the normal patient’s histogram (g), PDFF values are clustered around 50%, corresponding to normal marrow. In the patient with inflammation, a number of low-PDFF pixels have emerged in the histogram (h). In the patient with fat metaplasia, there is an upward shift in PDFF values, with a large number of high-PDFF pixels (i)

Fig. 3

Examples of ADC histograms in patients with sacroiliitis (a) and control patients (b). The red lines indicate the 10th, 25th, 50th, 75th and 90th percentiles of the ADC distribution

The BEACH tool operates as follows. The observer is prompted to define the line of the sacroiliac joint using a single series of connected straight lines, forming an open polygon (Fig. 1). ‘Anchor lines’ are used to define the angle made by the joint with the cortical surface, at both the top and bottom of the joint, enabling the shape of the polygonal ROIs to be closely matched to subchondral bone. The software automatically generates a pair of polygonal ROIs in the subchondral bone on either side of the joint (Fig. 1, Supplementary Figure S1). This is repeated for both sacroiliac joints, covering the entire fibrocartilaginous part of the joint. For the ADC maps, all slices where the fibrocartilaginous joint was visible were included, whereas alternate slices were used for the PDFF maps due to the smaller slice thickness. For each patient, pixel values from the total volume of defined subchondral bone (i.e. from all ROIs) are analysed histographically. For both PDFF and ADC, we measured the 10th, 25th, 50th, 75th and 90th percentiles of the distribution (designated PDFF10, PDFF25, etc. and ADC10, ADC25, etc., as shown in Figs. 2 and 3). For each quantitative score, the mean of the two observers’ measurements was used for analysis.
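The histographic step described above can be sketched as follows. This is an illustrative reimplementation, not the BEACH source code: it assumes that the pixel values inside each ROI have already been extracted as plain lists of numbers, and it uses linear interpolation between order statistics, which is one of several common percentile definitions. The names `percentile`, `histogram_parameters` and `average_observers` are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence

PERCENTILES = (10, 25, 50, 75, 90)


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0-100) of pre-sorted values by linear interpolation."""
    if not sorted_values:
        raise ValueError("no pixel values supplied")
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def histogram_parameters(rois: Iterable[Iterable[float]], prefix: str) -> Dict[str, float]:
    """Pool pixel values from every ROI and summarise the pooled distribution."""
    pooled: List[float] = sorted(value for roi in rois for value in roi)
    params = {f"{prefix}{q}": percentile(pooled, q) for q in PERCENTILES}
    params[f"{prefix}mean"] = sum(pooled) / len(pooled)
    return params


def average_observers(first: Dict[str, float], second: Dict[str, float]) -> Dict[str, float]:
    """Average each parameter over the two observers' measurements."""
    return {name: (first[name] + second[name]) / 2.0 for name in first}
```

Calling `histogram_parameters(rois, "ADC")` on the ROIs drawn by one observer returns a dictionary with keys such as `ADC10` and `ADC90`. Pooling all ROIs before taking percentiles, rather than averaging per-ROI percentiles, mirrors the statement that pixel values from the total volume of defined subchondral bone are analysed together.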

The BEACH analysis was performed by two radiology residents (NS and AD, with 2 and 1 year of experience in MR imaging) who received a detailed training session (from TB). Both residents were blinded to all clinical information and to the qualitative radiological scores.

Scoring of conventional MRI

Each subject’s set of conventional MR images was scored by two experienced musculoskeletal radiologists (KR and MHC) with 10 and over 25 years of MRI experience, both blinded to clinical diagnosis, to treatment and to the quantitative image data. Images were read on a research workstation. For each patient, observers assigned a qualitative score between 0 and 72 for the extent/severity of bone marrow oedema [22]. The patient was deemed to have active inflammation if the mean bone marrow oedema score from the two readers was ≥ 2, as per the Assessment of SpondyloArthritis international Society (ASAS) criteria [9, 23, 24]. Structural lesions consisting of fat metaplasia, erosions and joint ankylosis were assessed using a structural visual scoring system [25]. Patients with a score of ≥ 3 were deemed to be positive for the presence of fat metaplasia [25, 26].

Clinical scores

Symptoms were assessed using a dedicated research questionnaire (see Supplementary Information). We report here the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) and Bath Ankylosing Spondylitis Functional Index (BASFI), in addition to C-reactive protein (CRP) and erythrocyte sedimentation rate (ESR).

Statistical analysis

Quantitative parameters derived from ADC and PDFF maps were compared between groups with and without inflammation/fat metaplasia using logistic regression (α = 0.05) and receiver-operating characteristic (ROC) analyses. The optimal operating point for the ROC analysis was defined as the cut-point with the highest positive likelihood ratio (LR+) producing sensitivity and specificity greater than 70%. ROC-AUC values for percentile measurements were compared against the median using the method of DeLong et al. [27], implemented using the roccomp function in Stata (α = 0.05). To evaluate whether combinations of parameters could improve prediction, multiple logistic regression was performed using combinations of ADC-based and PDFF-based parameters. Likelihood ratio testing was used to test whether combinations of explanatory variables provided an improved fit. Linear regression was used to evaluate the relationship between the qualitative scores and the best-performing qMRI parameters from the ROC analysis. Spearman correlation was used to evaluate the relationship between clinical scores and radiological scores. Inter- and intra-observer variability was assessed using the Bland-Altman 95% limits of agreement and the intra-class correlation coefficient.
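The operating-point rule stated above can be sketched in a few lines. The example is illustrative and is not the Stata code used in the study: it assumes that higher parameter values indicate disease (as for ADC90 and inflammation), treats a case as positive when its value is at or above the cutoff, and uses LR+ = sensitivity / (1 − specificity). The function names are hypothetical, and the DeLong comparison of AUC values is not implemented here.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (cutoff, sensitivity, specificity)


def operating_points(values: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Sensitivity and specificity at every observed value used as a cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are required")
    points = []
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
        points.append((cutoff, tp / positives, tn / negatives))
    return points


def optimal_cutoff(values: Sequence[float], labels: Sequence[int],
                   minimum: float = 0.70) -> Optional[Tuple[float, float, float, float]]:
    """Highest-LR+ cutoff among those with sensitivity and specificity above `minimum`.

    Returns (cutoff, sensitivity, specificity, LR+), or None if no cutoff qualifies.
    """
    best = None
    for cutoff, sens, spec in operating_points(values, labels):
        if sens <= minimum or spec <= minimum:
            continue
        lr_plus = float("inf") if spec == 1.0 else sens / (1.0 - spec)
        if best is None or lr_plus > best[3]:
            best = (cutoff, sens, spec, lr_plus)
    return best


def roc_auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive case outranks a negative one (ties count half)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A return value of `None` from `optimal_cutoff` corresponds to a parameter for which no cutoff meets the pre-specified 70% thresholds.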

Results

Detection of inflammation

Fifteen of 53 patients (28.3%) had sufficient bone marrow oedema to meet the ASAS criteria for active inflammation. The inflamed group included 12 males and 3 females. The gender difference between the inflamed and uninflamed groups was not significant (p = 0.065). There was no significant age difference between the inflamed and uninflamed groups (p = 0.43).

Comparisons of quantitative parameters between inflamed and uninflamed SIJs are shown in Figs. 4 and 5, and the results of the corresponding logistic regression and ROC analyses are shown in Table 1.

Fig. 4

ADC as an inflammatory marker. Representative BEACH parameters (ADCmedian and ADC90) are compared between inflamed and uninflamed groups. The displayed p values were obtained by logistic regression. ROC curves for all relevant parameters are shown in the bottom right

Fig. 5

PDFF as an inflammatory marker. Representative BEACH parameters (PDFFmedian and PDFF10) are compared between inflamed and uninflamed groups. The displayed p values were obtained by logistic regression. ROC curves for all relevant parameters are shown in the bottom right

Table 1 Comparison of inflammatory parameters between inflamed and non-inflamed patients. ADC75, ADC90, etc. refer to the 75th and 90th percentiles of ADC measurements in the defined ROI. Estimates from each group are displayed as mean (95% CI). Odds ratio (OR) and p values (*) were derived from logistic regression. The highest ROC AUC value for the evaluation of inflammation is shown in italics. Sensitivity and specificity values for the optimal cutoff values (far right) are provided in the main “Results” section. The right-hand p values (**) relate to the comparison of ROC AUC with the median value

All ADC-based parameters were associated with significantly increased odds of inflammation. Parameters which sampled the upper end of the ADC distribution (i.e. ADC75 and ADC90) performed best for distinguishing inflamed from uninflamed SIJs; ADC90 produced an AUC value of 0.819 (0.676–0.962; p = 0.072 when compared with ADCmedian). The optimal cutoff for ADC90 was 986 × 10⁻⁶ mm²/s (sensitivity 71.4%, specificity 81.6%). Cutoffs for ADCmean, ADCmedian and ADC75 did not meet pre-specified performance thresholds.

PDFF-based parameters performed poorly as measures of inflammation with no significant difference between inflamed and uninflamed SIJs. Nonetheless, performance increased for parameters sampling the lower end of the distribution (AUC = 0.657 for PDFF10, 0.514 for PDFFmedian).

Detection of fat metaplasia

Thirty of 53 patients (56.6%) met the criteria for fat metaplasia. Patients with fat metaplasia were significantly older than those without fat metaplasia (mean ages (95% CI) were 19.6 (18.5–20.7) and 17.6 (16.5–18.7) respectively (p = 0.046)). There was no significant difference in gender between patients with and without fat metaplasia (p = 0.56).

Comparisons of quantitative parameters between patients with and without fat metaplasia are shown in Fig. 6, and the results of the corresponding logistic regression and ROC analyses are shown in Table 2.

Fig. 6

PDFF as a structural marker. Representative BEACH parameters (PDFFmedian and PDFF90) are compared between patients with and without fat metaplasia. The displayed p values were obtained by logistic regression. ROC curves for all relevant parameters are shown in the bottom right

Table 2 Comparison of structural parameters between patients with and without fat metaplasia. PDFF75, PDFF90, etc. refer to the 75th and 90th percentiles of PDFF measurements in the defined ROI. Estimates are displayed as mean (95% CI). Odds ratio (OR) and p values (*) were derived from logistic regression. The highest ROC AUC value for the evaluation of fat metaplasia is shown in italics. Sensitivity and specificity values for the optimal cutoff values (far right) are provided in the main “Results” section. The right-hand p values (**) relate to the comparison of ROC AUC with the median value

PDFF-based parameters were associated with increased odds of fat metaplasia, and the separation between patients with and without fat metaplasia was improved for parameters which specifically sampled the upper end of the PDFF distribution (i.e. PDFF75 and PDFF90). The best performing parameter, PDFF90, had an AUC of 0.780 (0.656–0.903; p = 0.263 when compared with PDFFmedian). The optimal operating point for PDFF90 was 55.7%, producing a sensitivity of 70% and a specificity of 73.9%.

There were no significant differences in ADCmean or ADCmedian between patients with and without fat metaplasia.

Prediction of inflammation and fat using combinations of parameters

Multiple logistic regression using both ADC90 and PDFF90 or ADC90 and PDFFmedian as predictor variables did not significantly improve the model fit compared with simple logistic regression using ADC90 as a single predictor (p = 0.41 and 0.81, respectively). Similarly, the combination of PDFF90 and ADC90 or PDFF90 and ADCmedian did not improve the model fit compared with using PDFF90 alone (p = 0.86 and 0.73, respectively).
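For readers unfamiliar with the nested-model comparison reported above, the likelihood ratio test can be sketched as follows. The sketch assumes that the two logistic models have already been fitted and that their maximised log-likelihoods are available; because each comparison adds exactly one predictor, the test statistic is referred to a chi-squared distribution with one degree of freedom, whose upper tail equals erfc(√(x/2)). The function name and any log-likelihood values passed to it are hypothetical and are not taken from the study.

```python
from math import erfc, sqrt


def likelihood_ratio_p(loglik_reduced: float, loglik_full: float) -> float:
    """p value for adding ONE predictor to a nested model (1 degree of freedom).

    loglik_reduced: maximised log-likelihood of the single-predictor model.
    loglik_full:    maximised log-likelihood of the model with the extra predictor.
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    # Guard against tiny negative statistics caused by numerical fitting error.
    statistic = max(statistic, 0.0)
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    return erfc(sqrt(statistic / 2.0))
```

A large p value, as reported for all four comparisons, indicates that the additional parameter does not explain significantly more of the variation in inflammation or fat metaplasia status than the single best parameter alone.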

Relationship between BEACH parameters and qualitative MRI scores

The relationship between visual scores of inflammation/fat metaplasia and qMRI parameters is shown in Supplementary Figure S2. There were significant positive relationships between ADC90 and the visual inflammation score (slope = 15.33, p < 0.0001) and between PDFF90 and the fat metaplasia score (slope = 1.05, p < 0.0001).

Relationship between MRI and symptoms

Scatterplots showing the relationship between BASDAI scores and visual and quantitative scores of inflammation and fat metaplasia are shown in Supplementary Figure S3.

There was no significant correlation between visual scores of inflammation and any clinical score (p = 0.45, 0.48, 0.14 and 0.49 for BASDAI, BASFI, CRP and ESR, respectively) or between ADC90 and clinical scores (p = 0.48, 0.37, 0.19 and 0.63, respectively).

There was a significant negative relationship between fat metaplasia visual scores and clinical symptoms (p = 0.004 and 0.006 for BASDAI and BASFI), and a similar relationship was observed for the corresponding qMRI parameter PDFF90 (p = 0.03 and 0.01 for BASDAI and BASFI). There was no significant relationship between either visual or quantitative fat metaplasia scores and CRP or ESR (all p > 0.05).

Inter- and intra-observer agreement

Inter- and intra-observer agreement statistics for qMRI parameters and visual scores are shown in Table 3. Inter-observer and intra-observer agreement were excellent for all assessed qMRI parameters. Inter-observer agreement was excellent for visual inflammation scores, although the 95% limits of agreement (0.6 ± 6.4) were relatively wide compared with the ASAS definition of active inflammation (score of ≥ 2 diagnostic for active inflammation). Inter-observer agreement was poorer for fat metaplasia scores with an ICC value of 0.544.

Table 3 Inter-observer and intra-observer variability statistics for selected (most relevant) parameters. The intra-class correlation coefficient and Bland-Altman limits of agreement are shown
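The Bland-Altman limits of agreement quoted above (for example 0.6 ± 6.4 for the visual inflammation score) take the form bias ± 1.96 × SD of the paired differences between observers. A minimal sketch of that calculation is given below, assuming one measurement per patient from each observer; the function name is hypothetical and the intra-class correlation coefficient is not computed here.

```python
from math import sqrt
from typing import Sequence, Tuple


def bland_altman_limits(observer_a: Sequence[float],
                        observer_b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    if len(observer_a) != len(observer_b) or len(observer_a) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [a - b for a, b in zip(observer_a, observer_b)]
    bias = sum(differences) / len(differences)
    # Sample variance of the paired differences (n - 1 in the denominator).
    variance = sum((d - bias) ** 2 for d in differences) / (len(differences) - 1)
    half_width = 1.96 * sqrt(variance)
    return bias, bias - half_width, bias + half_width
```

Narrow limits relative to a clinically meaningful threshold indicate that the two observers can be used interchangeably; limits that are wide relative to the ASAS cutoff of 2 indicate that reader disagreement could change the classification of a patient.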

Discussion

We describe a quantitative, partially automated method for measurement of bone marrow oedema and fat metaplasia based on histographic analysis of quantitative MR images. We show that histogram-based qMRI parameters enable separation of patients according to the presence of oedema and fat metaplasia, both of which are of importance for the diagnosis and management of SpA. The proposed tool offers a simple and potentially repeatable means to quantify inflammation and fat and could be incorporated into picture archiving and communication systems (PACS) relatively easily. Such a tool could be of value for monitoring inflammation over time and for guiding clinical decisions around initiation and changes of biologic and other therapies. Importantly, ADC-based and PDFF-based parameters provide discrete information regarding oedema and fat metaplasia and could therefore inform on the relative burden of active inflammation versus structural damage.

We found that ADC measurements produced superior performance to PDFF measurements for separating patients with and without inflammation. This suggests that increases in diffusivity are an important part of the inflammatory process in the bone marrow, rather than changes in water content per se. However, previous studies have shown substantial differences in PDFF between normal and inflamed marrow [12], and it may be that the discrepant observations in this study are due to the variability in the composition of normal bone marrow [28]. This could be investigated further by comparing the composition of the inflamed subchondral bone with normal bone marrow.

Our results showed that PDFF90 enabled separation of patients with and without fat metaplasia. Fat metaplasia can contribute to diagnosis [6, 7] and is also a prognostic factor, since patients with fat metaplasia are more likely to fuse their sacroiliac joints [29,30,31].

Interestingly, the 90th percentiles of ADC and PDFF yielded more accurate separation of inflamed and non-inflamed joints and joints with and without fat metaplasia compared with simple averages, although this difference did not reach statistical significance. This suggests that percentiles measuring the extremes of the distribution might be better ‘targeted’ to areas of oedema (for ADC) or fat metaplasia (for PDFF) than mean or median measurements, which may be ‘contaminated’ by non-inflamed or non-fatty sites, respectively, to a greater extent.

Importantly, the inter- and intra-observer agreement for both ADC- and PDFF-based parameters was good or excellent. Inter-observer agreement was excellent for visual scoring of bone marrow oedema, but substantially poorer for scoring of fat metaplasia. Given the known inconsistencies in radiologists’ interpretation in spondyloarthritis in clinical practice [14], a more consistent measurement could be a major advantage. However, formal studies are needed to assess repeatability and reproducibility across sites and MRI vendors.

We did not find a strong relationship between inflammation on MRI and symptoms in this study, likely reflecting the complex and multidimensional nature of pain in SpA [32]. There was a negative relationship between the severity of fat metaplasia and symptom scores. This suggests that fat metaplasia, a post-inflammatory phenomenon [30, 33], is more common in patients already on treatment with well-controlled symptoms.

A strength of our study is that the control subjects (i.e. those without inflammation) were patients in whom MRI was clinically indicated and who were thus likely to have either biomechanical back pain or quiescent inflammatory arthritis. Consequently, the reported statistics for separating patients with and without inflammation are likely to be realistic in a real-world clinical setting (this point is emphasised in the QUADAS-2 quality criteria [34]). By contrast, the use of healthy controls can artificially inflate sensitivity and specificity statistics and give a misleading impression of diagnostic performance. An additional strength is that the histographic parameters used are relatively simple and likely to offer superior performance to more complex metrics based on maximum likelihood estimation. Nonetheless, future work could explore the use of more complex analysis methods, such as Gaussian mixture modelling, to identify discrete subpopulations of pixels within the ROI.

A limitation of this study is that the diagnostic performance reported is unlikely to be sufficient for use in current clinical practice. This may be partially due to variations in the composition of normal marrow in young patients, where the marrow may be partially ossified and contain varying proportions of water and fat. This factor may bias ADC and PDFF measurements and could have weakened the separation of inflamed and non-inflamed patients. In the future, the BEACH tool could be extended to isolate ossified bone, potentially improving performance. Similarly, the proportion of red and yellow marrow in ossified bone may vary between individuals. The use of variable thresholds depending on the composition of the normal ‘background’ marrow might help to improve the technique for detecting inflammation. ADC measurements can also suffer from poor reproducibility across sites, partly due to the difficulty of achieving high-quality fat suppression [19]. A final limitation is that the proposed tool is only partially automated; further methodological development is required to achieve full automation.

In conclusion, we describe a method for quantifying bone marrow oedema and fat metaplasia in patients with SpA, based on histographic analysis. ADC-based parameters can objectively differentiate patients with bone marrow oedema from those without, whilst PDFF-based parameters can differentiate patients with fat metaplasia from those without. Histographic analysis might improve performance compared with simple averages such as the mean and median and offers excellent agreement within and between observers.