Introduction

Functional imaging enables the detection and monitoring of early changes in the tumour microenvironment induced by targeted therapies [1] and conventional chemoradiotherapy [2]. Examples include measurements of elevated glucose metabolism with 18F-fluorodeoxyglucose (FDG) positron emission tomography-computed tomography (PET-CT) [3], altered cellular structure and density with diffusion-weighted imaging [4], presence of hypoxia with oxygen-enhanced magnetic resonance imaging (MRI) or hypoxia PET methods [5, 6], and altered perfusion with dynamic contrast-enhanced MRI (DCE-MRI) [7]. In these techniques, early change in measurements following therapy can be used to identify new drug targets (such as metabolism or angiogenesis) and to quantify the ‘biological response’ to these therapies. Furthermore, these biological responses may predict subsequent ‘clinical response’ as defined by RECIST and other evaluation systems [8].

Measurements derived from MRI and other modalities are termed ‘imaging biomarkers’. In oncology, these data provide non-invasive, serial, three-dimensional, whole tumour assessment of the microenvironment and can demonstrate differential biological responses in multiple lesions within the same patient [9, 10]. When used to full potential, such data can provide substantial insight into tumour biology and therapy mechanism of action [1, 2]. However, few imaging biomarkers are translated into tools used for decision-making in the clinic [9].

Measurement precision is an under-reported but important step in biomarker translation [11]. It can be quantified by single-centre repeatability of a biomarker value [9] and these data can be used for two purposes. Firstly, they provide a literature benchmark to assess the technical performance of the biomarker against other studies. Secondly, the size of the treatment effect can be compared with measures of repeatability to determine if therapy-induced change in a biomarker exceeds the amount expected through daily variation [12, 13].

In this study, we evaluated measures of DCE-MRI biomarker repeatability in patients with non-small cell lung cancer (NSCLC) treated with radiotherapy-based regimens. DCE-MRI is a technique that has been widely applied in the clinical trial setting to evaluate changes in blood flow and permeability, induced by anti-vascular drugs [7] and has also been used to assess similar changes in the tumour microvasculature following standard-of-care treatment. Radiotherapy plays an important role in the management of nearly all patients with lung cancer [14]. Although DCE-MRI is known to be feasible in NSCLC patients [15], have acceptable inter and intra-rater reliability [16] and be capable of detecting microenvironmental change following (chemo)-radiotherapy [17], the repeatability of these biomarkers is unknown in this setting.

The primary aim was to estimate DCE-MRI biomarker repeatability in NSCLC. The secondary aims were to demonstrate how repeatability estimates can identify which target lesions exhibit early biological response and to compare this assessment of therapeutic response with those obtained from conventional cohort statistics.

Materials and methods

Institutional review board approval was obtained at The Christie NHS Foundation Trust (reference: 15/NW/0264). All patients provided written informed consent.

Study design

Patients with NSCLC eligible for curative-intent radiotherapy either alone or with induction or concurrent chemotherapy were recruited prospectively from a single-centre between September 2015 and April 2017. Inclusion criteria were age ≥ 18 years old, histology or cytology confirmed NSCLC, Eastern Cooperative Oncology Group performance score of ≤ 2 and serum creatinine < 120 μmol/L. Coexistent chronic obstructive pulmonary disease was allowed only where patients had a forced expiratory volume in 1 s greater than 1 litre, or > 40% predicted value on lung function testing. Patients with distant metastasis were included only if eligible for curative-intent therapy. Exclusion criteria were standard contraindications to MRI.

Patients underwent DCE-MRI scanning at baseline prior to treatment. A retest, and repeat baseline were performed when logistically possible. Radiation treatment was delivered with intensity-modulated radiotherapy on a linear accelerator (55 Gy in 20 daily fractions or 60 Gy in 30 daily fractions). Patients underwent a final DCE-MRI scan at 2 weeks into therapy [18].

MRI data acquisition

MRI data were acquired on a 1.5-T whole body scanner (Philips Achieva, Philips Medical Systems) using Sense XL Torso (DCE-MRI) coils. Initial localiser and anatomic scans were performed. Then DCE-MRI data were acquired in the coronal plane.

Pre-contrast T1 was estimated using a coronal 3D T1 Fast Field Echo variable flip angle protocol (repetition time 3.3 ms; echo time 1.43 ms; α = 2°/4°/7°/10°). The field of view was 450 × 450 × 205 mm. 41 slices were acquired (5 mm thickness). The in-plane resolution was 4.69 × 4.69 mm (from a 96 × 96 acquisition matrix). The dynamic series had the same T1-weighted acquisition, matched for the field of view with a 10° dynamic acquisition over 75 timepoints (temporal resolution: 4.9 s). After 6 measurements, a bolus of 0.05 mmol/kg gadoterate meglumine (Dotarem®, Guerbet) was injected at 1.5 mL/s followed by saline into a forearm vein using a Spectris MR (Medrad Inc) power injector.

MRI biomarker derivation

The tumour region of interest (ROI) was defined by a board-certified radiologist (J.P.B.O’C., 18 years’ experience), using Jim 6 software (Xinapse systems) on coronal postcontrast gadolinium T1-weighted images. Lesions measuring 2.5 cm or greater in the largest axial dimension were included. Non-linear diffeomorphic image registration (stnava.github.io/ANTs) corrected for breathing and patient motion during the dynamic series. Tumour ROIs were then transferred to dynamic series and checked for spatial accuracy. Lesion whole tumour volume (WTV) was calculated in cm3. Quality control steps were performed by an independent observer (R.A.L.), including checking for protocol adherence, a positive control assessment of T1, the presence of a valid uptake curve post-gadolinium in the tumour and the presence of motion following motion correction.

An input function was measured for each patient from the thoracic aorta. The volume transfer constant (Ktrans), fractional volume of the extravascular extracellular space (ve), and fractional blood plasma volume (vp) were calculated from the extended Tofts version of the Kety tracer kinetic model [19]. The model-free initial area under the gadolinium contrast agent concentration-time curve up to 60 s post-injection (IAUC60) was also calculated [20]. All parameters were calculated voxel-wise for each individual tumour. Median values of IAUC60, Ktrans, ve and T1, and the mean vp across voxels were determined from the enhancing portion of each tumour ROI [21], defined by IAUC60 > 0 (one-sided paired sample t-test, p < 0.05) [22]. Analysis was performed using in-house software [23].

Evaluation of biomarker distributions

Data distributions of each MRI biomarker (WTV, T1 and DCE-MRI metrics) both at pre-treatment (hereafter termed ‘baseline’) and at week 2 were evaluated with histograms. Normality was determined by examining the cohort distribution of average values at baseline (mean or median). The Shapiro-Wilks test was used to identify departures from the normal distribution. Parameters exhibiting non-normal distribution and/or evident variability in measurement error (see below) underwent a logarithmic transformation.

Estimates of biomarker repeatability

Biomarker repeatability refers to the magnitude of its measurement error under a set of repeatable conditions and was estimated following recommendations by the Quantitative Imaging Biomarker Alliance [11]. Details of the statistical model specification can be found in the Supplementary materials and methods.

In brief, Bland-Altman plots [24] were used to represent the mean of the replicates for each lesion and compare their difference, thus illustrating trends in the variability of the biomarker over the range of measurement values. When the variability was not constant, a logarithmic transformation of the data was performed. This variability is designated as within-lesion variability (noted below by \({\sigma }_{w}^{2}\)), and repeatability metrics are derived from it by specifying and fitting a random effects model [25].

The repeatability coefficient (RC) is defined as the least significant difference between any two repeated measurements taken under identical conditions at a two-sided significance of \(\alpha =\) 0.05 [11]. Parameter changes exceeding the +RC threshold or falling below the −RC threshold (limits of agreement, LOA) can be regarded as real changes, not attributed to measurement error, with a 95% confidence level (hereafter the term ‘real change’ is used for this specific interpretation).

When normal distributions cannot be assumed or the within-lesion variance varies with the magnitude of the biomarker, the within-lesion coefficient of variation (wCV) is usually reported, defined as the ratio of the within-lesion standard deviation by the mean of the replicates and expressed in %.

We calculated the asymmetric LOA: where MRI biomarkers are lognormally distributed, the standard deviation of the parameter varies proportionally with the mean, and the \({wCV}\) can be also expressed as

$${wCV}=\sqrt{{e}^{{\sigma }_{{{{\mathrm{ln}}}}}^{2}}-1}$$

where \({\sigma }_{{{{\mathrm{ln}}}}\,}^{2}\) is the within-lesion variance of the log-transformed data [13]. In the logarithm scale, \({RC}\) can be calculated as \({\widehat{{RC}}}_{{{{\mathrm{ln}}}}}=1.96\sqrt{2{\hat{\sigma }}_{{{{\mathrm{ln}}}}}^{2}}\) [13]. This provides asymmetric LOA when back-transformed to the original scale:

$${{RC}}_{L}=(\exp \left(-{\widehat{{RC}}}_{{{{\mathrm{ln}}}}}\right)-1)\,{{{{\rm{and}}}}}\,{{RC}}_{U}=(\exp \left(+{\widehat{{RC}}}_{{{{\mathrm{ln}}}}}\right)-1)$$

Biomarker estimates of cohort mean treatment effect with mixed-effects modelling

For each biomarker, response to therapy was examined in the cohort by calculating percentage change from baseline to week 2 (on the log-transformed data), by fitting a mixed-effects model where random effects account for clustering of lesions within the patients, and scan timepoints (pre-treatment, week 2) incorporated as fixed effects (see Supplementary materials and methods). A significant treatment effect over time is determined when the parameter for the fixed effect is estimated to be significantly different from zero. For patients with two baseline scans, the average value of the two scans determined the baseline biomarker value.

Identifying individual lesions exhibiting significant treatment effect

For each MRI biomarker, and each lesion, waterfall plots were used to represent the percentage change from the pre-treatment values to week 2 [7]. We determined the frequency of individual lesion changes exceeding the asymmetric RC LOA obtained from the repeatability analyses. In addition, where available, literature-based thresholds were applied. We used published thresholds of 40% change in either direction for median Ktrans [26]. For WTV we derived thresholds based on 1D RECIST [8] and 2D World Health Organisation (WHO) size-based changes [27], with an increase of 72.8% and a decrease of 65.7% indicating the change (details in Supplementary materials and methods).

Statistical analyses

All statistical tests were performed using Stata/BE 17.0 (StataCorp LLC). A significance level of 0.05 was used for all tests. A career statistician (N.P.) was the statistics guarantor.

Results

Overall, 14 patients were recruited (mean age, 67 years +/− 12, 8 men). They had 22 tumours between them, comprising 12 primary lung cancers, 8 metastatic lymph nodes in the thorax and 2 distant metastases within the field of view. Details of the patients, their demographics and their lesions are presented in Table 1. All patients had either completed induction chemotherapy or were not eligible for chemotherapy, except for patient 4 who underwent concurrent chemoradiotherapy. Therefore, changes in MRI biomarkers were considered largely due to radiation. Sample images are shown in Fig. 1.

Table 1 Patient demographics, disease stage, treatment, and collection of MRI data
Fig. 1
figure 1

Example images of primary lung cancer in a patient who underwent MRI on all three occasions

Evaluation of biomarker distributions

Histograms revealed skewed, distributions of all MRI biomarker parameters that deviated from normality except for pre-treatment median IAUC60 and pre-treatment median ve (Fig. 2A, B and Supplementary Table S1). Bland-Altman plots in test-retest baseline MRI biomarker data (Fig. 2C) showed that measurement error was only constant over the range of biomarker values for median IAUC60. Consequently, all biomarkers underwent a logarithmic transformation.

Fig. 2
figure 2

Histogram distributions of the cohort parameter values for each of the six MRI biomarkers at (A) pre-treatment baseline and (B) week 2 of treatment. Measurement error of the test-retest replicates pre-treatment is shown in C with Bland-Altman plots, where the top and bottom dotted lines indicate LOA

Estimates of biomarker repeatability

Absolute values of the baseline pre-treatment biomarkers are shown for each parameter in Fig. 3. The wCV ranged from 9.16% for WTV to 42.44% for mean vp (Table 2). Asymmetric LOA values are also listed. Data for primary tumours only and nodal lesions only were similar to the repeatability of the overall cohort and are shown in Supplementary Tables S2 and S3).

Fig. 3
figure 3

Line plot showing MRI test-retest pre-treatment biomarker values for each lesion. Solid lines represent tumour lesions; dashed lines represent nodal lesions; dotted lines represent metastases

Table 2 Repeatability parameters for each MRI biomarker

Biomarker estimates of cohort mean treatment effect with mixed-effects modelling

The cohort-level change from baseline to week 2 showed significant change with an increase in median IAUC60, median Ktrans, median ve and reduction in WTV. In distinction, mean vp and median T1 showed no significant change (Table 3). Data for primary tumours only and nodal lesions only were similar to the overall cohort change data and are shown in Supplementary Tables S4 and S5.

Table 3 Cohort-level mean change in MRI biomarkers from baseline to week 2 of treatment

Identifying individual lesions exhibiting significant treatment effect: asymmetric RC LOA

The frequency of each DCE-MRI biomarker showing real change from baseline to week 2, using the asymmetric RC LOA, is shown in Fig. 4. Biomarkers exhibiting cohort-level significant change in mean values had three differing patterns of changes in individual lesions:

  1. a.

    Same-direction change in a minority of lesions: median IAUC60 and median Ktrans had real increase in two and four individual lesions, respectively, in the same direction as the significant overall cohort effect

  2. b.

    Lack of any change: median ve did not have real change in any individual lesion, despite a significant overall cohort effect

  3. c.

    Divergent change: WTV showed a significant cohort decrease whereas individual lesions showed diverse real changes with 7/22 reductions and 2/22 increases, while 13/22 lesions did not change beyond the RC LOA.

Fig. 4
figure 4

Percentage change in each individual lesion for all MRI parameters. In all cases the asymmetrical (dashed line) LOA are included, parallel to the x-axis. Additional literature-based thresholds are added for median Ktrans and WTV (continuous line)

Finally, biomarkers with no statistical change at the cohort level showed no real change in any individual lesions (median T1) or just one lesion with change (mean vp).

Identifying individual lesions exhibiting significant treatment effect: literature thresholds

The frequency of lesions showing real change was compared for literature threshold methods versus asymmetric RC LOA methods. The literature threshold for Ktrans of 40% change to indicate a biological response was close to the asymmetric RC LOA estimate. Therefore, a similar number of lesions (here, 5 rather than 4) were deemed to exhibit real change.

However, for WTV the RECIST-based assessment of size change was substantially greater than the asymmetric RC LOA (−65.7% and +72.8% compared to −22.37% and +28.82%). Consequently, RECIST-based assessment meant that only one lesion had real change—a reduction—differing substantially from the asymmetric RC LOA method of assessment and contrary to the cohort direction of change.

Discussion

In this study, we sought to define DCE-MRI biomarker precision by measuring repeatability. We also sought to determine if repeatability data could enable statistical analysis of biological response on a per-lesion level, augmenting traditional cohort-based evaluation of treatment effects.

We used a small but representative functional imaging dataset to address these questions. All MRI biomarker data were either non-normally distributed, prone to measurement error dependent on the magnitude of the biomarker, or both. This required data logarithmic transformation and determined which statistical analyses were appropriate for repeatability assessment. Our findings can be summarised in four key messages.

Firstly, MRI repeatability was reported using wCV and %RC values to enable literature comparison. The most commonly cited biomarker Ktrans had a wCV of 16.45% which is comparable to literature wCV values of 20.3% in pelvic tumours of gynaecological origin [28], 11.0% and 20.1% in prostate tumours [29, 30], 15.6% in primary renal cancer [31], and 15.4% and 24% in colorectal liver metastases [18, 32]. This suggests that Ktrans repeatability is acceptable in patients with NSCLC with thoracic lesions. These wCV values are similar to those obtained from biomarkers from other imaging modalities such as [18F]FDG PET-CT SUVmax in multiple studies [33].

Secondly, repeatability can be compared between different biomarkers derived from the same data with equal constraints (such as size variation and tumour motion). We measured DCE-MRI biomarkers including model-free IAUC60 and extended Tofts model parameters (Ktrans, ve, vp), as well as native T1 and WTV. We found negligible differences in the precision of IAUC60 and Ktrans—the two most widely used DCE-MRI biomarkers—consistent with some literature [34] despite other studies reporting two-fold differences [32]. As with previous studies [18, 35] we found that WTV was highly repeatable with WTV of less than 10% wCV but that vp precision was very poor, approaching 50% wCV.

Thirdly, cohort-level analysis can identify biomarkers that change with therapy and mixed-effects modelling increases data inclusion in studies where patients have variable numbers of lesions. Since cohort changes were significant for some MRI biomarkers (median IAUC60, median Ktrans, median ve and WTV) but not others (mean vp and median T1), even small studies can identify the best candidate biomarkers for use in subsequent larger studies.

Fourthly, repeatability data provides complimentary information on biological response by identifying the frequency of real change in individual lesions. Our data show that some biomarkers (here, ve) can exhibit significant cohort change in the absence of any individual lesion changing beyond the asymmetric RC LOA. In distinction, other biomarkers (here, median IAUC60 and median Ktrans) exhibit significant cohort change as well as showing significant changes in some individual lesions. Furthermore, other biomarkers (here, WTV) can show significant change at the cohort level but have individual lesions that change in variable manner (increase, no change or decrease). Finally, different methods exist for identifying which lesions exhibit real change (literature thresholds instead of asymmetric RC LOA) and the choice of method can substantially alter the interpretation of the frequency of real change in a cohort.

Limitations of the study include analysing one dataset and using single-centre data, but these limitations are typical to most studies in the literature that deploy DCE-MRI biomarkers. Specific findings will vary from those from other small studies but the principles—that data need to be evaluated for normality, that precision can be measured and can improve evaluation of biological response—are generalisable. In theory data from a research study such as presented here could be extrapolated to patients undergoing functional imaging as part of routine clinical practice. However, further work is required to determine if other authors obtain similar data to that reported here.

Biomarker studies that detect early pathophysiological change following therapy may offer insight into personalised therapy. However, the predictive relationship between this biomarker change and subsequent clinical response must be further evaluated in larger studies.

Collectively, data from our study suggest that measuring biomarker repeatability not only provides useful information regarding precision but also advances the understanding of early pathophysiological changes in the tumour microenvironment.