Introduction

Inflammation of the synovial membrane (synovitis) is common in osteoarthritis (OA), with MRI-detected synovitis occurring in up to 90% of OA knees [1, 2]. It can be detected, both histologically and on imaging, from the early stages of the disease [3]. Strong cross-sectional associations exist between the presence of synovitis and the severity of knee pain [2]. Longitudinal associations have been demonstrated between the presence and severity of synovitis and both symptomatic and structural OA progression [4,5,6]. There is therefore a strong rationale for therapeutic targeting of synovitis to provide disease modification, particularly in patients with mild to moderate disease where disease-modifying and regenerative approaches are targeted [7].

Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) aims to characterise the uptake and washout of gadolinium-based contrast agents (GBCA) in tissues of interest, providing biomarkers of tissue perfusion, capillary permeability and blood and interstitial volume. These parameters are known to change in the synovium in OA [8]. DCE-MRI has been used to assess synovitis in early-phase clinical trials of rheumatoid arthritis and has demonstrated superiority over semiquantitative assessments in this setting [9, 10]. The promise of DCE-MRI in OA has been illustrated by several studies demonstrating changes in DCE-MRI biomarkers following intra-articular corticosteroid treatment with improved responsiveness compared to alternative semiquantitative and qualitative assessments of synovitis [11, 12].

DCE-MRI biomarkers are of particular interest in early-phase experimental medicine studies which aim to establish early proof-of-concept evidence of efficacy of novel treatments, streamline the treatment development process and reduce late-stage failure rates. They could improve outcome assessment in studies of synovitis-targeted therapies by quantifying response to treatment and are likely to be more robust than relying on qualitative or semiquantitative assessment. There may also be a role in selecting which patients are suitable for entry into studies of synovitis-targeted treatments.

However, to increase confidence in the utility of DCE-MRI biomarkers in these settings, technical and clinical validation is essential [13]. This includes an assessment of test-retest repeatability, ability to discriminate between knee OA and normal ageing and expected changes over relevant follow-up periods.

Therefore, the purpose of this study was to evaluate the test-retest repeatability, ability to discriminate between osteoarthritic and healthy participants and sensitivity to change over 6 months, of DCE-MRI biomarkers in knee OA.

Methods

Participants

The study was approved by the local research ethics committee, and written informed consent was given by all participants. This was a single-centre, prospective experimental feasibility study where DCE-MRI was the intervention.

Participants with mild-moderate knee osteoarthritis (OA) were recruited from specialist orthopaedic knee clinics at a university teaching hospital. Healthy volunteers (HV) approximately matched for age were recruited via paper and electronic advertisement materials and from a register of healthy individuals who had agreed to be contacted about research studies. Inclusion criteria for OA participants were (i) age 40–60 years, (ii) body mass index (BMI) of ≤ 35 kg/m², (iii) clinical diagnosis of knee OA per the American College of Rheumatology criteria and (iv) mild-moderate radiographic OA defined as Kellgren-Lawrence grade 2 or 3 on a postero-anterior fixed flexion knee radiograph taken using a positioning device (SynaFlexer; BioClinica) with medial compartment predominant disease [14,15,16]. Exclusion criteria were any history of previous lower limb fracture, previous knee surgery (including arthroscopy), history of inflammatory arthritis or contraindication to MRI or GBCA administration (e.g. pacemaker, renal failure). For HV participants, inclusion criteria were (i) age 40–60 years, (ii) no current or significant previous symptoms of knee pain or stiffness and (iii) BMI ≤ 35 kg/m². At each study visit, participants completed the Knee injury and Osteoarthritis Outcome Score (KOOS) to assess symptoms and had their BMI recorded. No disease-modifying intervention was received by any participant during the study follow-up period.

Image acquisition

Participants underwent MRI of a single knee (most symptomatic knee in OA participants, randomly selected knee in HV using a random number generator [www.random.org]) on a 3 T platform (GE 750; GE Healthcare) using an 8-channel transmit/receive knee coil (InVivo). Imaging was performed at baseline and 6-month follow-up. A subset of participants (10 OA, 6 HV) was asked to return for imaging at 1-month post baseline for assessment of test-retest repeatability. Participants were supine and their knee was positioned in the coil with padding and foot support to minimise subject motion.

All MRI sequence parameters are provided in Table 1 with further details in the Supplementary Materials.

Table 1 Selected MRI sequence parameters

Image analysis—pharmacokinetic modelling

Voxelwise pharmacokinetic modelling of DCE-MRI data was performed on registered images (Supplementary Materials) using the extended Tofts compartmental model [17] with a population-averaged arterial input function (AIF) [18]. All AIFs were corrected for individual patient haematocrit [19]. GBCA concentration was estimated from the change in signal relaxation due to the presence of GBCA (gadoterate [Dotarem]; Guerbet) compared to the native T1 values using a relaxivity of 3.5 L·mmol⁻¹·s⁻¹ [20]. Native T1 values were calculated from the variable flip angle images acquired before the contrast agent injection [21]. The biomarkers extracted were Ktrans (units min⁻¹), the volume transfer constant for contrast agent between blood plasma and extravascular extracellular space; vp, the fractional volume of blood plasma; ve, the fractional volume of extravascular extracellular space; and IAUC60 (mM·s), the initial area under the contrast agent concentration-time curve in the 60 s following contrast agent arrival in the tissue.
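As an illustration of the quantities named above, the following is a minimal sketch of the extended Tofts model in its commonly written form, C_t(t) = vp·C_p(t) + Ktrans·∫ C_p(τ)·exp(−(Ktrans/ve)(t − τ)) dτ, together with the conversion from T1 change to GBCA concentration and a trapezoidal IAUC. It is not the study pipeline: the function names, the constant plasma curve and the sampling grid are hypothetical, and the forward model is only evaluated here, not fitted to data.

```python
import math

R1_RELAXIVITY = 3.5  # L.mmol^-1.s^-1, the relaxivity quoted in the text


def gbca_concentration(t1_native_s, t1_post_s, relaxivity=R1_RELAXIVITY):
    """GBCA concentration (mmol/L) from the change in relaxation rate 1/T1."""
    return (1.0 / t1_post_s - 1.0 / t1_native_s) / relaxivity


def extended_tofts(times_min, cp, ktrans, ve, vp):
    """Tissue concentration curve for a sampled plasma curve cp.

    times_min: sample times in minutes; ktrans in min^-1; ve and vp are fractions.
    The convolution integral is approximated with the trapezoidal rule.
    """
    kep = ktrans / ve  # efflux rate constant (min^-1)
    ct = []
    for i, t in enumerate(times_min):
        integral = 0.0
        for j in range(1, i + 1):
            dt = times_min[j] - times_min[j - 1]
            f_prev = cp[j - 1] * math.exp(-kep * (t - times_min[j - 1]))
            f_curr = cp[j] * math.exp(-kep * (t - times_min[j]))
            integral += 0.5 * (f_prev + f_curr) * dt
        ct.append(vp * cp[i] + ktrans * integral)
    return ct


def iauc(times_s, conc_mm, arrival_s, window_s=60.0):
    """Trapezoidal area (mM.s) under the curve from arrival to arrival + window.

    Only sampling intervals lying fully inside the window are counted.
    """
    area = 0.0
    for k in range(1, len(times_s)):
        t0, t1 = times_s[k - 1], times_s[k]
        if t0 >= arrival_s and t1 <= arrival_s + window_s:
            area += 0.5 * (conc_mm[k - 1] + conc_mm[k]) * (t1 - t0)
    return area


# Hypothetical example: constant plasma concentration of 1 mM for 20 min.
times = [0.05 * k for k in range(401)]
plasma = [1.0] * len(times)
curve = extended_tofts(times, plasma, ktrans=0.1, ve=0.2, vp=0.05)
```

With a constant plasma concentration the curve starts at vp·C_p and approaches (vp + ve)·C_p, which gives a simple sanity check on the implementation before it is used inside a voxelwise fit.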

Image analysis—region of interest definition

Two alternative methods of region of interest (ROI) definition were evaluated involving manual and semiautomatic approaches. Manual segmentation of the synovium was performed on the post-contrast 3D fat-suppressed spoiled gradient echo (FS SPGR) sequence by a musculoskeletal radiologist with 6 years’ experience (J.M.), with definition of seven synovial ROIs: suprapatellar, Hoffa’s fat pad, medial and lateral perimeniscal, intercondylar notch, medial and lateral posterior femoral condyles (Fig. 1). Anatomical definitions of synovial ROIs are provided in Table 2. The manual segmentation was intended to provide a rough estimation of where the synovium was located, rather than a detailed slice-by-slice manual segmentation.

Fig. 1

3D rendering of synovial regions of interest with anterior (a), posterior (b), medial (c) and lateral (d) views. 3D rendering of femur, tibia and patella (grey) provided for reference. ROI Key: green—suprapatellar, yellow—Hoffa fat pad, red—medial perimeniscal, blue—lateral perimeniscal, purple—intercondylar notch, pink—posterior medial femoral condyle, orange—posterior lateral femoral condyle

Table 2 Anatomical definition of synovial regions of interest

For semiautomatic segmentation, enhancing voxels were defined by subtracting the pre-contrast 3D FS SPGR sequence from the matching post-contrast sequence using a shuffle transform [22]. For a given voxel in the post-contrast image, the shuffle transform searches the corresponding voxel and a defined neighbourhood around it (for this study the adjacent 3 × 3 voxels) in the pre-contrast image and uses the pre-contrast value that minimises the absolute difference in signal intensity. This improves the quality of the subtracted images and is also robust to residual motion artefact following image registration (Fig. 2). The shuffle-subtracted images were then converted to binary enhancing masks using Otsu thresholding [23]. The intersection between this binary mask and the manual segmentation was termed the ‘volume of enhancing pannus’ (VEP) mask. The VEP mask was used for the extraction of median DCE-MRI biomarker values for each synovial ROI and for the whole joint (all ROIs combined). In addition, the VEP mask was used to create an estimate of volume of synovial tissue (VEP, measured in mL) by multiplying the number of voxels included in the VEP mask by the voxel size.
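The semiautomatic steps described above can be sketched on a small 2D example. This is an illustrative reading of the method rather than the study code: it works on nested lists in plain Python, treats the neighbourhood as a 3 × 3 in-plane window, and uses a simple histogram-based Otsu threshold; the function names, the bin count and the toy images are all assumptions.

```python
def shuffle_subtract(pre, post, radius=1):
    """For each post-contrast pixel, subtract the pre-contrast value in the
    (2*radius+1)^2 neighbourhood that gives the smallest absolute difference."""
    rows, cols = len(post), len(post[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            best = None
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    diff = post[r][c] - pre[rr][cc]
                    if best is None or abs(diff) < abs(best):
                        best = diff
            out_row.append(best)
        out.append(out_row)
    return out


def otsu_threshold(values, bins=64):
    """Threshold that maximises the between-class variance of a histogram."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    total_sum = sum(h * c for h, c in zip(hist, centres))
    best_var, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += hist[i] * centres[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        between = w0 * w1 * (sum0 / w0 - (total_sum - sum0) / w1) ** 2
        if between > best_var:
            best_var, best_t = between, lo + (i + 1) * width
    return best_t


def vep_mask(shuffled, manual_mask):
    """Intersection of the Otsu-thresholded enhancing mask and the manual ROI."""
    threshold = otsu_threshold([v for row in shuffled for v in row])
    return [[bool(m) and v > threshold for v, m in zip(s_row, m_row)]
            for s_row, m_row in zip(shuffled, manual_mask)]


def vep_volume_ml(mask, voxel_volume_mm3):
    """Number of masked voxels times voxel volume, converted from mm^3 to mL."""
    return sum(v for row in mask for v in row) * voxel_volume_mm3 / 1000.0


# Toy images: a bright edge shifts one column between scans (residual motion),
# and two pixels in column 0 enhance after contrast.
pre = [[10, 10, 50, 10, 10, 10] for _ in range(4)]
post = [[10, 10, 10, 50, 10, 10] for _ in range(4)]
post[1][0] = post[2][0] = 90
manual = [[1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0],
          [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]
shuffled = shuffle_subtract(pre, post)
mask = vep_mask(shuffled, manual)
```

In this toy case simple subtraction would leave a +40/−40 artefact along the shifted edge, whereas the shuffle-subtracted image is zero there and retains only the two enhancing pixels, one of which falls inside the rough manual mask.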

Fig. 2

Example of the use of shuffle transform to improve quality of subtracted image compared to simple subtraction of registered images. a pre-contrast 3D FS SPGR, b post-contrast 3D FS SPGR, c simple subtraction (following intensity-based registration), d shuffle subtraction. Improved subtraction quality is seen when the shuffle transform is used

Segmentation was repeated for all baseline visits by the original observer, with an interval of > 6 months between analyses, and by an independent second observer (T.R., a radiology resident with 4 years’ experience) to enable assessment of intra- and inter-observer reproducibility of DCE-MRI biomarkers.

Image analysis—semiquantitative grading

Semiquantitative grading of synovitis was performed using the MRI Osteoarthritis Knee Score (MOAKS) by a musculoskeletal radiologist with 6 years’ experience (J.M.) [24]. MOAKS grades synovitis in two ways: signal alterations in Hoffa’s fat pad (Hoffa synovitis) and degree of suprapatellar joint effusion (effusion-synovitis). Both are scored on a 4-point ordinal scale (0–3). The intra- and inter-reader reproducibility of MOAKS has previously been published [24].

Statistics (see Supplementary Material for detail)

Test-retest repeatability was assessed using baseline and 1-month whole joint data with calculation of the intraclass correlation coefficient (ICC). Intra- and inter-observer reproducibility was assessed using the root-mean-square coefficient of variation (RMSCV) and the concordance correlation coefficient (CCC). We also calculated the smallest detectable difference (SDD), representing the magnitude of change that would give 95% confidence of a change being genuine rather than due to measurement noise, assuming identical measurement conditions. This is defined as 2.77 (√2 × 1.96) times the test-retest within-subject standard deviation or within-subject coefficient of variation (dependent on correlation between magnitude and variability of the biomarker) and is also known as the repeatability coefficient (RC) [25].
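The repeatability quantities above can be sketched for paired measurements as follows. This assumes the common formulation for two measurements per subject, wSD = √(Σd²/2n), and Lin's concordance correlation coefficient with 1/n variances; the study's exact computations are described in its Supplementary Material and may differ, and the function names and example values are hypothetical.

```python
import math


def within_subject_sd(test, retest):
    """Within-subject SD for two measurements per subject: sqrt(sum(d^2) / 2n)."""
    n = len(test)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(test, retest)) / (2 * n))


def smallest_detectable_difference(test, retest):
    """SDD (repeatability coefficient) = 1.96 * sqrt(2) * wSD, about 2.77 * wSD."""
    return 1.96 * math.sqrt(2) * within_subject_sd(test, retest)


def rms_cv(first, second):
    """Root-mean-square coefficient of variation across subjects.

    For a pair, the SD is |difference| / sqrt(2) and the CV is SD / pair mean.
    """
    squared = []
    for a, b in zip(first, second):
        pair_sd = abs(a - b) / math.sqrt(2)
        squared.append((pair_sd / ((a + b) / 2)) ** 2)
    return math.sqrt(sum(squared) / len(squared))


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient (population variances)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical test-retest values for three subjects.
baseline = [1.0, 2.0, 3.0]
one_month = [1.2, 1.8, 3.2]
sdd = smallest_detectable_difference(baseline, one_month)
```

A 6-month change in an individual would then be flagged as exceeding measurement noise when its absolute value is larger than `sdd`.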

Discrimination between OA and HV participants was assessed using baseline data. Descriptive statistics were calculated for each group, and the standardised mean difference (SMD) was estimated for each DCE-MRI biomarker by dividing the difference in mean between the two groups by the pooled standard deviation.
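A minimal sketch of the SMD calculation described above is given here. The text specifies only a pooled standard deviation; weighting the group variances by their degrees of freedom (as in Cohen's d) is an assumption, as are the function name and the example values.

```python
import math
import statistics


def standardised_mean_difference(group_a, group_b):
    """Difference in group means divided by the pooled standard deviation.

    The pooled SD weights each sample variance by its degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical biomarker values for two small groups.
smd = standardised_mean_difference([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```

Because the pooled SD is sensitive to extreme values in small groups, a single outlying participant can change the SMD noticeably, which is relevant when results are reported with and without an outlier.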

Six-month changes in each biomarker were assessed using descriptive statistics. The number of participants with changes in each biomarker greater than the SDD was calculated.

No formal sample size calculation was performed for this feasibility study. All statistical analyses were performed in R version 3.6.1 [26].

Results

Participants

Fourteen OA and six HV participants were recruited. Baseline characteristics are provided in Table 3. Eight OA and six HV participants completed the 1-month visit. Twelve OA and five HV participants completed the 6-month visit. The reasons for loss to follow-up were inability to schedule the MR examination in the appropriate time window (n = 2) and participant withdrawal (n = 1).

Table 3 Participant characteristics at baseline. The Knee injury and Osteoarthritis Outcome Score (KOOS) is a validated patient-reported outcome measure which assesses knee pain and symptoms as well as the impact on knee-related activities of daily living, sports and recreation and quality of life. Scores range from 0 to 100, with 0 indicating the most severe symptoms possible and 100 indicating no symptoms

Test-retest repeatability

Repeatability metric values for each parameter are provided in Table 4. Variabilities of Ktrans, IAUC60 and VEP were not significantly correlated with the value of the biomarker, so the within-subject standard deviation (wSD) and absolute SDD values are presented. Variabilities of vp and ve were significantly correlated with biomarker value, so the within-subject coefficient of variation (wCV) and percentage SDD values are presented. Kendall’s τ correlation coefficients for baseline and 1-month biomarker values are provided in Supplementary Table 1. Due to the poor repeatability of vp and ve and the presence of physiologically implausible values (e.g. ve greater than 1), these biomarkers were not used for further analyses. Repeatability of biomarker measurements from semiautomatic segmentation (VEP mask) was better than that of measurements derived from manual segmentation for Ktrans and IAUC60. Measurements derived from semiautomatic segmentation were therefore preferred for all subsequent analyses.

Table 4 Repeatability and reproducibility metrics for DCE-MRI biomarkers

Intra- and inter-observer reproducibility

Intra- and inter-observer reproducibility was best for Ktrans derived from semiautomatic segmentation (both RMSCV 2.1%, CCC [95% CI] 1.00 [1.00, 1.00]). Ktrans and IAUC60 derived from semiautomatic segmentation demonstrated improved reproducibility compared to manual segmentation. All intra- and inter-observer reproducibility data are provided in Table 4.

Discriminative ability

Baseline between-group differences for the whole joint are illustrated in Fig. 3. Plots for individual ROIs are provided in Supplementary Figure 1. One HV participant had much higher values of Ktrans and IAUC60 than other HV participants (> 5 SD greater than mean HV value excluding this participant) across all ROIs. On further investigation, it was determined that this HV had taken part in karate practice the night before each of the three study visits and also had an undisclosed history of gout (never having affected the knee). Possible explanations considered for these values were that they represented part of the normal range of healthy values, or that the presence of gout or recent intense physical activity had confounded measurement. This participant’s data were not excluded because the participant met the pre-specified inclusion criteria, but, where appropriate, additional exploratory analyses excluding this participant’s data are reported.

Fig. 3

Baseline DCE-MRI biomarker values. Black dots are median values, with interquartile range error bars. The outlier HV is indicated with a triangle symbol (all other participants are circles)

SMDs between OA and HV groups were 0.94, 0.54 and 0.50 for Ktrans, IAUC60 and VEP respectively. Excluding the outlier HV case, SMDs were 1.34 for Ktrans and 1.12 for IAUC60. Visual analysis of plots for individual synovial ROIs (Supplementary Figure 1) revealed the highest between-group differences for the intercondylar notch and medial and lateral perimeniscal ROIs for Ktrans and IAUC60. The largest between-group difference and between-subject variability for VEP were seen in the suprapatellar ROI, as would be expected given the distensibility of the suprapatellar pouch to accommodate varying degrees of joint effusion. Discriminative ability was better in all cases for measurements derived from semiautomatic segmentation than for manual segmentation-derived measurements.

Sensitivity to change over 6 months

Changes in DCE-MRI biomarkers over 6 months are summarised in Fig. 4, with data for all synovial ROIs provided in Supplementary Figure 2.

Fig. 4

Top panel: Change in DCE-MRI biomarker values over time. Individual participant trajectories are displayed by partially transparent coloured circles and lines. Black circles and lines represent group median values (with IQR error bars). As for Fig. 3, the outlier HV is identified with a triangle symbol. Bottom panel: Waterfall plots of individual participant 6-month change in DCE-MRI biomarker values, ordered along the x-axis by magnitude of 6-month change value. Dotted lines represent +/-SDD

For Ktrans, 5 out of 12 OA and 1 out of 5 HV participants had 6-month changes exceeding the SDD. For both IAUC60 and VEP, 1 out of 12 OA and no HV participants had changes exceeding the SDD. Using biomarkers extracted from manual segmentation rather than semiautomatic segmentation, 2 out of 12 OA participants and no HV participants had 6-month changes in Ktrans exceeding the SDD, and no participants had 6-month changes in IAUC60 greater than the SDD. Representative images of participants with changes greater than the SDD are provided in Fig. 5.

Fig. 5

Example post-contrast 3D FS SPGR images overlaid with Ktrans data from participants with increases (a) and decreases (b) in Ktrans at 6 months exceeding the SDD. In a, note extruded medial meniscus with cuff of adjacent synovitis (white arrow). At 6 months, the synovitis has increased both in amount and intensity. In b, note distension of suprapatellar pouch (white arrow) and synovitis adjacent to the anterior horn of lateral meniscus (white arrowhead) at baseline, with marked reduction at 6 months

A comparison of 6-month changes in Ktrans and semiquantitative MOAKS synovitis score (sum of effusion-synovitis and Hoffa synovitis scores, scale 0–6) is provided in Fig. 6. There was limited concordance between participants with changes in Ktrans exceeding the SDD and participants with changes in MOAKS synovitis score.

Fig. 6

Individual participant 6-month changes in Ktrans plotted against change in MOAKS synovitis score. Dotted lines are +/- SDD values for Ktrans. The green cross highlights the OA participant depicted in Fig. 5a who had discordant changes: an increase in Ktrans exceeding the SDD, but a reduction in MOAKS synovitis score. The outlier HV is identified with a triangle symbol

Discussion

This study suggests that Ktrans is the optimum of the evaluated DCE-MRI biomarkers for use in experimental medicine studies, with the best test-retest repeatability, best discrimination between OA and HV participants and greatest sensitivity to change as judged by the number of participants showing detectable changes over a 6-month period.

Several previous studies have used DCE-MRI to evaluate synovitis in knee OA, including describing cross-sectional associations with symptoms and longitudinal association with response to treatment [12, 27]. Novel contributions of the current work include (1) improved synovial segmentation leading to more precise parameter estimates, (2) assessment of test-retest repeatability which is required for the interpretation of change at an individual level, (3) assessment of inter-observer reproducibility and (4) comparison of DCE-MRI biomarker values between OA and healthy knees which is needed to assess discriminative validity and also to inform effect size estimations for interventional studies.

Biomarkers that assess the intensity of synovitis (Ktrans and IAUC60) performed better than VEP, which reflects the extent of synovitis, across all assessment domains. This finding agrees with a previous knee OA study which suggested improved sensitivity to change of ‘intensive’ vs ‘extensive’ biomarkers of synovitis [12]. One possible explanation for the superiority of intensive biomarkers is the fact that synovial tissue may enhance despite not being actively inflamed, for example in areas of fibrosis related to previous inflammation [3]. The extensive biomarker can therefore be hypothesised to measure both active and inactive disease. However, such areas are likely to demonstrate different kinetic characteristics to areas of active inflammation, allowing intensive biomarkers to more accurately reflect disease activity at the time of the scan.

DCE-MRI biomarkers derived from semiautomatic segmentation performed better than those derived from manual segmentation across the majority of assessment domains. Previous studies have demonstrated reduction in time taken for analysis with semiautomatic approaches but with similar repeatability and reproducibility to manual approaches [28, 29]. One plausible explanation for the demonstrated superiority of our semiautomatic approach is the fact that we used shuffle subtraction prior to our thresholding step, in contrast to approaches which attempt to threshold from the post-contrast images alone.

Interestingly, test-retest repeatability metrics for manual synovial segmentation were better than those for the semiautomatic approach. This probably relates to the fact that the manual segmentation was created to provide a rough mask of the location of the synovium which is then used by the semiautomatic method to identify enhancing voxels within the masked region. It is relatively straightforward for an expert radiologist to provide this initial rough mask, as evidenced by the good intra- and inter-observer reproducibility of manual segmentation. However, the manual method does not capture the variability in the volume of actual enhancing synovial tissue, in contrast to the semiautomatic method. The volume of enhancing synovial tissue (rather than the approximate region within which it is located) is more likely to undergo biological variation during the test-retest interval. Intra-observer reproducibility was similar for the two methods, but with superior inter-observer reproducibility for semiautomatic segmentation.

The design of our study assumes a natural history of OA with negligible change over one month (repeatability), but with the possibility of disease progression over 6 months. This is a short interval relative to the conventional concept of OA as a slowly progressive condition developing and progressing over years. However, experimental medicine studies are typically of short duration and so, to be useful in this setting, an imaging biomarker has to be sensitive enough to detect changes over short intervals. We therefore chose a 6-month interval as a reasonable trade-off between the requirements of experimental medicine studies and the expected relatively slow change in disease.

There was a wide range of 6-month changes in DCE-MRI biomarkers in both positive and negative directions in OA participants. This may reflect the fluctuating nature of synovitis in OA, which is well recognised clinically [30]. Several participants demonstrated 6-month changes greater than the SDD (particularly for Ktrans), suggesting that sensitivity to change is adequate for experimental medicine studies performed over this interval. A possible counter-argument is that this sensitivity to change indicates that the background variability is too high to expect to be able to detect additive effects of therapy. Moreover, more participants demonstrated significant decreases rather than significant increases in Ktrans, likely related to regression to the mean. However, it should be noted that the majority of participants did not demonstrate significant reductions in DCE-MRI biomarkers and typically had higher values than age-matched controls, suggesting that there is potential for improvement in these biomarkers with treatment. Moreover, the group mean 6-month change in DCE-MRI biomarkers for OA participants was close to 0, after adjustment for baseline values (data not shown). This suggests that the effects of treatment may be detectable at a group level as well as at an individual level.

Our results suggest that DCE-MRI biomarkers are likely to be of use in experimental medicine studies featuring putative anti-inflammatory and immunomodulatory disease-modifying treatments. The data presented can be used to inform sample size calculation for further interventional studies. For example, using the observed standard deviation of 6-month change in Ktrans in this study (~ 0.015 min⁻¹), a group-averaged reduction of 50% of the difference between OA and HV mean values (~ 0.01 min⁻¹) could be detected with 80% power and a type I error rate of 5% (one-sided) with a sample size of 24 participants per group, assuming an active treatment vs placebo repeated-measures study design. This is a clinically feasible reduction relative to a previous study of change in Ktrans following intra-articular steroid administration [12].

Limitations of this study include the long test-retest interval (1 month) relative to the time over which clinical fluctuations in synovitis occur in OA. Therefore, the measured variability is likely to include contributions from both methodological and biological sources, and true methodological variability is likely to be lower. A second limitation is that the results presented are from a single centre and obtained with meticulous quality control; therefore, extrapolation to multi-centre studies should be done with caution. However, previous work suggests that DCE-MRI biomarkers can be used in such a setting with appropriate training, calibration and quality control [31]. In particular, the use of a semiautomated pipeline as described in this study for defining the synovial ROI is likely to improve robustness in the multi-centre setting compared with manual methods [32]. Finally, the number of included participants was low. While this was to some extent limited intentionally to mimic the conditions of an experimental medicine study, it does limit the precision of biomarker performance metric estimates. There is no ‘magic number’ of participants required for a repeatability study [25]. However, we would contend that the uncertainty in our repeatability estimates is low enough to allow them to be used for sample size calculation and interpretation of change at the individual level in future interventional studies.

In conclusion, this study has assessed the test-retest repeatability, discrimination between OA and ‘normal’ tissue characteristics and sensitivity to change of DCE-MRI biomarkers. Ktrans demonstrates the best performance across these domains and is therefore the most likely to be useful in experimental medicine studies and other future therapeutic trials.