Comparing the interobserver reproducibility of different regions of interest on multi-parametric renal magnetic resonance imaging in healthy volunteers, patients with heart failure and renal transplant recipients

Objective To assess interobserver reproducibility of different regions of interest (ROIs) on multi-parametric renal MRI using commercially available software. Materials and methods Healthy volunteers (HV), patients with heart failure (HF) and renal transplant recipients (Tx) were recruited. Localiser scans, T1 mapping and pseudo-continuous arterial spin labelling (pCASL) were performed. HV and Tx also underwent diffusion-weighted imaging to allow calculation of apparent diffusion coefficient (ADC). For T1, pCASL and ADC, ROIs were drawn for whole kidney (WK), cortex (Cx), user-defined representative cortex (rep-Cx) and medulla. Intraclass correlation coefficient (ICC) and coefficient of variation (CoV) were assessed. Results Forty participants were included (10 HV, 10 HF and 20 Tx). The ICC for renal volume was 0.97 and CoV 6.5%. For T1 and ADC, WK, Cx, and rep-Cx were highly reproducible with ICC ≥ 0.76 and CoV < 5%. However, cortical pCASL results were more variable (ICC > 0.86, but CoV up to 14.2%). While reproducible, WK values were derived from a wide spread of data (ROI standard deviation 17% to 55% of the mean value for ADC and pCASL, respectively). Renal volume differed between groups (p < 0.001), while mean cortical T1 values were greater in Tx compared to HV (p = 0.009) and HF (p = 0.02). Medullary T1 values were also higher in Tx than HV (p = 0.03), while medullary pCASL values were significantly lower in Tx compared to HV and HF (p = 0.03 for both). Discussion Kidney volume calculated by manually contouring a localiser scan was highly reproducible between observers and detected significant differences across patient groups. For T1, pCASL and ADC, Cx and rep-Cx ROIs are generally reproducible with advantages over WK values. Electronic supplementary material The online version of this article (10.1007/s10334-019-00809-4) contains supplementary material, which is available to authorized users.


Introduction
Functional renal imaging is a burgeoning field of research that has the potential to translate into meaningful clinical applications for patients with kidney disease [1].

Multi-parametric magnetic resonance imaging (MRI) allows acquisition of multiple sequences with the potential to provide information on structure, tissue composition, perfusion and the physiology of renal function in a single scan [2]. However, the clinical utility of each sequence, and indeed the potential additive benefit of their use together, are yet to be proven. The immediate research priority in renal MRI is the standardisation and harmonisation of image acquisition across research sites and MRI vendors. This 'ground-up' approach is driven by international, independently funded working groups, including PARENCHIMA [2], a subsidiary of the European Cooperation in Science and Technology (COST) Action group, and the UK Renal Imaging Network (UKRIN), amongst others. As image acquisition is standardised, scientific scrutiny must also be applied to the methods of analysis. Many of the MRI sequences employed produce quantitative results from modelling that depends on measurements made using other sequences [3], and the resultant values vary depending on whether whole kidney, renal cortex or renal medulla is selected [4]. Numerous analytic approaches have been reported to date, and the optimal technique in terms of time and clinical relevance is not yet known. In addition, the absence of commercially available analysis software designed specifically for the requirements of renal MRI leads to the use of in-house bespoke software, which renders external validation of results challenging.
Our centre has an active renal MRI research group, with current projects exploring the clinical implications of multi-parametric renal MRI across healthy volunteers [5] as well as patients with heart failure, chronic kidney disease (CKD) [6] and renal transplants. We aim to compare different regions of interest (ROIs) and their interobserver reproducibility using commercially available analysis software in healthy and patient populations, including native and transplant kidneys, across selected MRI sequences.

Study population and clinical parameters
Patients were recruited from nephrology and cardiology clinics, and from general advertisement, for the renal transplant (Tx), heart failure (HF) and healthy volunteer (HV) cohorts, respectively. For Tx and HF patients, the scans were acquired as baseline imaging for two separate ongoing clinical studies (ClinicalTrials.gov: NCT03705091 and NCT03485092). Basic biometric parameters were measured and serum creatinine was measured in accredited clinical biochemical laboratories. Estimated glomerular filtration rate (eGFR) was derived using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation [7]. All participants gave written informed consent and regional ethics committee approval was granted; the study was conducted in agreement with the Declaration of Helsinki.
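The eGFR derivation mentioned above can be sketched as follows. This is a minimal illustration that assumes reference [7] denotes the 2009 CKD-EPI creatinine equation; the function names and the creatinine unit conversion helper are hypothetical and are not taken from the study.

```python
def umol_l_to_mg_dl(creatinine_umol_l):
    """Convert serum creatinine from micromol/L to mg/dL (1 mg/dL = 88.4 micromol/L)."""
    return creatinine_umol_l / 88.4


def egfr_ckd_epi_2009(scr_mg_dl, age_years, female, black=False):
    """eGFR in ml/min/1.73 m^2 from the 2009 CKD-EPI creatinine equation.

    Assumption: this is the equation meant by 'CKD-EPI equation [7]';
    the paper does not state which version was applied.
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine threshold
    alpha = -0.329 if female else -0.411    # exponent applied below the threshold
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


# Hypothetical participant: 55-year-old man with creatinine 110 micromol/L
example_egfr = egfr_ckd_epi_2009(umol_l_to_mg_dl(110.0), 55, female=False)
```

The result is expressed per 1.73 m² of body surface area, as is conventional for this equation.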

MRI acquisition
MRI was performed on a Siemens MAGNETOM Prisma 3T scanner (Siemens Healthcare, Erlangen, Germany) using an 18-channel phased array coil anteriorly and a 32-channel spine coil posteriorly. Scans for renal volume, perfusion and T1 were acquired from all patients (Fig. 1). Motion correction and fitting of the T1 map was performed using a phase-sensitive inversion recovery reconstruction implemented in the vendor software (Siemens, VE11C, MyoMaps) [11].
• Arterial spin labelling A pseudo-continuous arterial spin labelling (pCASL) scan [12] with a 3D turbo gradient spin-echo (TGSE) readout was acquired during free breathing [13]. The prototype sequence comprises a slice-selective presaturation pulse to suppress the signal from preceding excitations and a frequency-offset-corrected inversion (FOCI) pulse positioned over the imaging region. This is followed by the pCASL slice-selective labelling pulse. For background saturation, four non-selective hyperbolic secant pulses are applied, interspersed with three slice-selective saturation pulses, positioned superior to the labelling plane. The pCASL labelling plane was positioned in a transverse oblique slice of thickness 10 mm perpendicular to the aorta and superior to the kidneys to label the blood in the descending aorta (Supplementary Material Fig. 1). The start time of the pCASL labelling was 3000 ms and the pCASL duration was 1500 ms with a flip angle of 28°. The presaturation pulses and FOCI pulse were positioned in a transverse slab covering the kidneys. The pulses to suppress inflowing arterial blood were applied in a slab superior to the labelling plane. Images were obtained in a coronal oblique orientation covering the whole kidney volume. A low-resolution pCASL scan with one measurement was acquired to confirm that the positioning of the labelling plane was appropriate to produce signal in the perfusion-weighted image. This was followed by a higher resolution scan with parameters as given in Tables 1, 2. The sequence acquires label and control images and a reference proton density-weighted (M0) image.
Perfusion maps were produced using inline software. In-plane 2D motion correction is applied retrospectively to the proton density-weighted (M0), label and control images. Label and control images are subtracted to create perfusion-weighted images. Maps of perfusion rate (f) are calculated pixel by pixel using the motion-corrected proton density-weighted (M0) and perfusion-weighted (ΔM) images according to an equation in which f is the perfusion rate in ml/100 g/min; t is the time between labelling and imaging (3000 ms); τ is the duration of the labelling pulse (1500 ms); Δt is the arterial transit time, assumed to be 750 ms; α is the labelling efficiency, assumed to be 0.98; λ is the blood-tissue water partition coefficient, assumed to be 0.9 ml/100 g; T1blood is the longitudinal relaxation time of arterial blood; and T1′ is the apparent longitudinal relaxation time of tissue. A fixed T1blood = T1′ = 1250 ms was assumed in calculating the perfusion maps.
• DWI For the Tx and HV cohorts, DWI was performed using a single-shot spin-echo echo-planar imaging sequence with 17 slices positioned in a coronal oblique plane. Images were acquired at 10 b-values (0, 50, 100, 150, 200, 250, 300, 500, 750, 1000 s/mm²) for four diffusion directions, averaged to give a 4-scan trace. Spectral attenuated inversion recovery (SPAIR) fat suppression was used and images were acquired during free breathing, with an acquisition time of 1 min 46 s. Apparent diffusion coefficient (ADC) maps were created using the vendor software, performing a mono-exponential fit to the ten b-values [14].
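The mono-exponential ADC fit can be illustrated for a single pixel with the short sketch below. It assumes the model S(b) = S0·exp(−b·ADC) and uses an unweighted log-linear least-squares fit; the vendor software may fit differently (for example with a non-linear or weighted fit), and the function name and the synthetic signal are hypothetical.

```python
import math

# b-values from the DWI protocol described above, in s/mm^2
B_VALUES = [0, 50, 100, 150, 200, 250, 300, 500, 750, 1000]


def fit_adc(b_values, signals):
    """Fit ln(S) = ln(S0) - b * ADC by ordinary least squares.

    Returns (adc, s0) with adc in mm^2/s. Signals must be positive.
    Assumption: unweighted log-linear fit, not necessarily the vendor's method.
    """
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxy = sum((b - mean_b) * (l - mean_log) for b, l in zip(b_values, logs))
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    slope = sxy / sxx
    intercept = mean_log - slope * mean_b
    return -slope, math.exp(intercept)


# Synthetic, noise-free pixel (illustrative values only, not study data)
true_adc = 2.0e-3
signals = [1000.0 * math.exp(-b * true_adc) for b in B_VALUES]
adc, s0 = fit_adc(B_VALUES, signals)
```

Applying the same fit to every pixel of the trace-weighted images would give an ADC map; on noisy data the log transform gives more weight to low-signal (high b-value) points, which is one reason other fitting schemes are sometimes preferred.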

MRI analysis
Interobserver variability was compared across different methods of image analysis. For kidney volume, the renal contours were drawn around the whole kidney (excluding the renal pelvis) on the first and last slices containing renal tissue. Contours were then added to every alternate slice in between. This initial total kidney volume (with linear interpolation for non-contoured slices) was recorded ('alternate slices') before contours were drawn on the remaining slices and the resultant volume noted ('every slice'). For pCASL and DWI, a single slice was chosen for analysis. ROIs were drawn manually around the whole kidney (WK), cortex (Cx), an area of user-defined representative cortex (rep-Cx), within the cortex at the superior and inferior poles (sup-Cx and inf-Cx, respectively) and in a representative area of medulla (Med) (Fig. 2). Corticomedullary differentiation was assessed by the ratio of Cx to Med. Each cohort was analysed by a pair of independent observers from a pool of four clinicians and one physicist, all with local training in renal MRI analysis (SAS and LZ analysed HV, SAS and MMYL analysed HF, and KAG and AJR analysed Tx). Image analysis was performed using the commercially available software cvi42 version 5.9.4 (Circle Cardiovascular Imaging, Calgary, Canada).
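The difference between the 'alternate slices' and 'every slice' volumes can be illustrated with a small sketch. It assumes that volume is the sum of contoured slice areas multiplied by the slice spacing, with missing slices filled by linear interpolation between the nearest contoured neighbours; how cvi42 computes volume internally is not described here, and the areas and spacing below are invented for illustration.

```python
def kidney_volume_ml(areas_mm2, slice_spacing_mm):
    """Volume from per-slice contour areas (mm^2); None marks a non-contoured slice.

    Non-contoured slices are filled by linear interpolation between the
    nearest contoured slices. The first and last slices must be contoured.
    Assumption: simple summation of area x spacing, not the vendor algorithm.
    """
    filled = list(areas_mm2)
    known = [i for i, a in enumerate(filled) if a is not None]
    for lo, hi in zip(known, known[1:]):
        for i in range(lo + 1, hi):
            frac = (i - lo) / (hi - lo)
            filled[i] = filled[lo] + frac * (filled[hi] - filled[lo])
    return sum(filled) * slice_spacing_mm / 1000.0  # mm^3 to ml


# Hypothetical contour areas for seven slices, 5 mm apart
every_slice = [400.0, 1500.0, 2300.0, 2600.0, 2200.0, 1400.0, 500.0]
alternate_slices = [a if i % 2 == 0 else None for i, a in enumerate(every_slice)]

vol_every = kidney_volume_ml(every_slice, 5.0)
vol_alternate = kidney_volume_ml(alternate_slices, 5.0)
```

Because a kidney outline is convex along its length, interpolated areas tend to fall slightly below the true contour areas, which is consistent with the direction of the small volume difference reported between the two approaches.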

Statistical analysis
Descriptive statistics are reported as mean and standard deviation or median and range/interquartile range (IQR) for normally distributed and skewed data, respectively. Paired t-tests were used to compare kidney volume techniques and results were displayed graphically using a Bland-Altman plot [15]. Pearson correlation coefficient was used to quantify linear relationships between continuous variables. A total of 12 participants are required to detect a correlation coefficient of 0.8 with 90% power and alpha 0.05. Our decision to include 40 participants yields a power > 99.9% to detect a correlation coefficient of 0.8 at alpha 0.05. Interobserver reproducibility was measured using coefficient of variation (CoV) (calculated by the standard deviation divided by the mean) and intraclass correlation coefficient (ICC) (two-way random, average measures).
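The sample size statement can be checked with a short calculation. The sketch below assumes a two-sided test of a Pearson correlation against zero using the Fisher z approximation; the method actually used for the power calculation is not stated, so this is one plausible reconstruction and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def n_for_correlation(r, power=0.90, alpha=0.05):
    """Participants needed to detect correlation r (two-sided, Fisher z approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    effect = math.atanh(r)  # Fisher z transform of r
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


def power_for_correlation(r, n, alpha=0.05):
    """Approximate power to detect correlation r with n participants."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(math.atanh(r) * math.sqrt(n - 3) - z_alpha)


required_n = n_for_correlation(0.8, power=0.90, alpha=0.05)
power_at_40 = power_for_correlation(0.8, 40, alpha=0.05)
```

Under these assumptions the calculation gives 12 participants for 90% power and a power above 99.9% with 40 participants, in line with the figures quoted above.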

Participant demographics
A total of 40 participants were included: ten healthy volunteers, ten patients with heart failure (with reduced ejection fraction of ≤ 40%) and 20 renal transplant recipients. Clinical characteristics are shown in Table 1.

Renal volume
Calculation of renal volume was possible in 39 patients (98%) (one patient did not have appropriate TrueFISP images). Renal volume was, on average, 1.6 ml lower when contours were drawn on alternate slices as opposed to every slice (p < 0.001) (Fig. 3). There was no interobserver difference in renal volume with either approach (p = 0.56 for alternate slices and p = 0.89 for every slice). Tables 2 and 3 show the results and interobserver reproducibility for renal volume, respectively.

T1, pCASL, ADC: comparison of different ROIs
T1, pCASL and ADC sequences were acquired in 39, 39 and 28 patients, respectively. Image quality was acceptable in all but two pCASL acquisitions in the Tx group, which were excluded from further analysis. Table 2 shows the mean results for each sequence depending on whether ROIs were drawn for WK, Cx, rep-Cx, sup-Cx, inf-Cx and Med. The standard deviation in Table 2 represents the spread of mean values obtained. Table 3 shows the interobserver reproducibility for each ROI by sequence and participant group. For T1 and ADC, WK, Cx and rep-Cx were highly reproducible (ICC ≥ 0.76; CoV < 5%). For pCASL, Cx and rep-Cx were less readily reproducible (ICC > 0.86 but CoV up to 14.2%). The reproducibility of the Med ROI was excellent for T1, but less good for pCASL and ADC (Table 3). Table 4 shows the spread of data within each ROI by reporting the mean ROI standard deviation as a proportion of the mean value. The spread of data from WK ROIs was higher than that from cortex-specific ROIs, even when the mean value for each was similar (Table 2).
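The two reproducibility measures reported in Table 3 can be sketched for a pair of observers as follows. Two assumptions are made that the text does not confirm: the CoV is computed per participant (SD of the two readings divided by their mean) and then averaged, and the two-way random, average-measures ICC is taken in its absolute-agreement form, ICC(2,k) in the Shrout and Fleiss notation. The readings are invented for illustration.

```python
from statistics import mean, stdev


def interobserver_cov_percent(obs_a, obs_b):
    """Mean per-participant CoV (%): SD of the two readings / their mean."""
    covs = [stdev([a, b]) / mean([a, b]) for a, b in zip(obs_a, obs_b)]
    return 100.0 * mean(covs)


def icc_2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, average measures.

    ratings: one row per participant, one column per observer.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between observers
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


# Hypothetical cortical T1 readings (ms) from two observers, not study data
observer_1 = [1510.0, 1622.0, 1580.0, 1701.0, 1495.0]
observer_2 = [1498.0, 1640.0, 1571.0, 1689.0, 1512.0]

cov = interobserver_cov_percent(observer_1, observer_2)
icc = icc_2k([list(pair) for pair in zip(observer_1, observer_2)])
```

The two measures answer different questions: ICC expresses observer disagreement relative to the spread between participants, whereas CoV expresses it relative to the measured value, which is why a ROI can combine a high ICC with a comparatively high CoV, as seen for cortical pCASL.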

Comparison between participant groups
There was a significant difference in kidney volume between groups (F = 13.2, p < 0.001) with the greatest renal volume in Tx, then HF and then HV (Table 2). Mean T1 values also differed between participant groups (WK: F = 7.9, p = 0.001, Cx: F = 6.9, p = 0.003, rep-Cx: F = 7.1, p = 0.003). However, on paired comparisons, there was no difference in T1 results between HV and HF cohorts, while mean cortical T1 values were 122.4 ms (p = 0.009) and 84.7 ms (p = 0.02) greater in the Tx group compared to HV and HF groups, respectively. Medullary T1 values were also higher in Tx than HV (mean difference 129.1 ms, p = 0.03). There were no differences between groups on any cortical ROI for pCASL or ADC. Medullary pCASL values were significantly lower in the Tx group compared to HV (mean difference −35.7 ml/min/100 g, p = 0.03) and HF (mean difference −48.4 ml/min/100 g, p = 0.03).

Discussion
This study provides evidence to support the reproducibility of certain analysis techniques for renal MRI using commercially available analysis software. This is an essential step to allow studies exploring the clinical significance of functional renal MRI to report with confidence. Our data show that measurement of renal volume by contouring a localiser image is highly reproducible between observers. Contouring alternate slices, as opposed to every slice, results in a small reduction in measured volume with the advantage of improved efficiency. We believe the 1.6 ml (0.8%) mean difference in volume from contouring alternate slices is clinically insignificant, but we would nevertheless advise consistency with whichever approach is chosen. Whilst automated contouring and volume calculation is being utilised by some centres [16] and is likely to improve time efficiency, this approach has yet to be externally validated and made widely available. For T1, pCASL and ADC, WK ROIs are highly reproducible and commonly reported, but the mean value is derived from an unduly wide range of values, as evidenced by the fact that, on average, the ROI SD represented between 17 and 55% of the mean value in our cohort. We would argue that this summary statistic is a crude representation of the tissue physiology we hope to describe, and that cortical values may have more biological relevance without an unacceptable reduction in reproducibility. Indeed, for ADC, the correlation with renal function of cortical ROIs was stronger than for WK. When drawing a small ROI of representative cortex, prespecifying its location to be at either the superior (sup-Cx) or inferior (inf-Cx) pole did not improve reproducibility compared to a user-defined location and reduced the correlation with total cortex (Cx) for T1 and pCASL.
Furthermore, sup-Cx and inf-Cx are theoretically more susceptible to artefact from respiratory movement in native kidneys compared to regions of lateral/medial cortex that would move in plane. We therefore advise that either Cx or rep-Cx be used preferentially whenever cortical values are reported. Drawing an ROI for rep-Cx is likely to reduce analysis time compared to whole cortex and, in this small sample, the correlation between eGFR and ADC was greatest when rep-Cx was used. However, this is balanced against the lower ICC for rep-Cx than for Cx. Further studies are required to distinguish their benefits and we suggest that either Cx or rep-Cx can be used to report cortical values in the interim. Nevertheless, development of a harmonised approach across centres is vital to allow broader use of renal MRI in research and clinical settings [1]. While there was a significant correlation between ADC and eGFR, there was no association of renal volume, T1 or pCASL with renal function. Although this may generate scepticism with regard to the clinical relevance of these sequences, the development of MRI biomarkers is intended to provide physiologic and prognostic information additional to existing clinical measures, and further studies are needed to clarify this. We performed a limited comparison of medullary values. Future studies may wish to analyse the medulla in more detail. Recent studies have reported measures of corticomedullary differentiation (CMD) using T1 and ADC and their correlation with clinical parameters [17][18][19]. These studies were well conducted, but there is a risk of over-interpreting the significance of corticomedullary findings. Loss of CMD is a well-established, non-specific finding in CKD that is detectable on ultrasound, computed tomography and MRI [20]. Any observed association between eGFR and CMD on T1 or ADC may underplay the utility of MRI as a functional measurement and may instead detect a crude structural change that is prevalent in CKD, and which can be measured in simpler ways.
The study is strengthened by its multi-parametric protocol across both healthy and diseased populations, including native and transplant kidneys, yielding clinically meaningful results. The study has a number of limitations. Whilst we have shown these analyses to be reproducible, the clinical significance of any approach is not yet established. We did not assess R2* [also known as blood-oxygen-level-dependent (BOLD) imaging]. This parameter is recommended for inclusion in multi-parametric renal MRI protocols and its inclusion in this study would have been advantageous [1]. Only two observers reported each ROI for comparison of interobserver reproducibility. Kidney volume measurements were not compared with established 3D contrast-enhanced techniques, and further studies are required to assess the clinical relevance of kidney volume as measured by this approach. The current pCASL sequences utilise a fixed T1 value. We accept there may be advantages to using a measured T1 and we are exploring this for future studies. Other centres have developed efficient and accurate analysis methods, often using in-house developed software, which we are unable to replicate. For instance, a technique that uses a histogram to numerically segregate cortical from medullary values has been reported [4]. These analysis strategies require bespoke software, which generally relies upon precise harmonisation of acquisition parameters to allow use outside the centre in which it was developed. Nevertheless, comparison of results generated using this technique with the approaches detailed here would be interesting. The use of commercially available software in this study is a strength. However, the licence carries a cost and the software used is designed for cardiovascular analysis, such that we have applied many of the modules outside their intended use. There is an urgent need for widely available software that is specifically designed for multi-parametric renal MRI analysis to advance the research and clinical application of renal MRI.

Conclusion
There are numerous strategies to analyse multi-parametric renal MRI, with many centres using in-house bespoke software. The optimal approach is not yet known. These results provide justification for one approach using commercially available software. We suggest that kidney volume can be calculated by contouring alternate slices, rather than every slice, of a localiser scan, although validation against 3D volume techniques is still required. For T1, pCASL and ADC, we suggest that whole kidney values, while highly reproducible, be used with caution given that the results represent a central value from an extremely wide range. Instead, manually delineated cortex or a small ROI of user-defined representative cortex can be used interchangeably in both native and transplant kidneys, with acceptable interobserver reproducibility. Clinical correlation of the results generated from this approach is eagerly awaited.