Observer variability of reference tissue selection for relative cerebral blood volume measurements in glioma patients

Objectives To assess observer variability of different reference tissues used for relative CBV (rCBV) measurements in DSC-MRI of glioma patients. Methods In this retrospective study, three observers measured rCBV in DSC-MR images of 44 glioma patients on two occasions. rCBV was calculated as the CBV in the tumour hotspot divided by the CBV of a reference tissue at the contralateral side, which serves as normalization. One observer annotated the tumour hotspot, which was kept constant for all measurements. All observers annotated eight reference tissues of normal white and grey matter. Observer variability was evaluated using the intraclass correlation coefficient (ICC), coefficient of variation (CV) and Bland-Altman analyses. Results For intra-observer variability, the ICC ranged from 0.50–0.97 (fair–excellent) for all reference tissues. The CV ranged from 5.1–22.1 % for all reference tissues and observers. For inter-observer variability, the ICC for all pairwise observer combinations ranged from 0.44–0.92 (poor–excellent). The CV ranged from 8.1–31.1 %. Centrum semiovale was the only reference tissue that showed excellent intra- and inter-observer agreement (ICC>0.85) and lowest CVs (<12.5 %). Bland-Altman analyses showed that mean differences for centrum semiovale were close to zero. Conclusion Selecting contralateral centrum semiovale as reference tissue for rCBV provides the lowest observer variability. Key Points • Reference tissue selection for rCBV measurements adds variability to rCBV measurements. • rCBV measurements vary depending on the choice of reference tissue. • Observer variability of reference tissue selection varies between poor and excellent. • Centrum semiovale as reference tissue for rCBV provides the lowest observer variability.


Introduction
T2*-weighted dynamic susceptibility contrast-enhanced MR imaging (DSC-MRI) has been shown to be useful in evaluating brain neoplasms. With DSC-MRI, T2*-weighted echo planar images are acquired that measure the signal intensity change over time after the injection of a bolus of a paramagnetic contrast agent. The change in relaxation rate (ΔR2*) can be calculated from the signal intensity and is proportional to the concentration of contrast agent in the tissue. Cerebral blood volume (CBV) is a parameter that can be measured with DSC-MRI, and is proportional to the area under the curve of ΔR2*(t). Studies have used CBV to differentiate between low- and high-grade gliomas [1,2], to differentiate tumour progression from pseudo-progression [3], and to assess treatment response to anti-angiogenic drug therapy [4][5][6].
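The relation between signal intensity, ΔR2*(t) and CBV described above can be illustrated with a minimal sketch. It assumes the commonly used relation ΔR2*(t) = -ln(S(t)/S0)/TE, which is not stated in this paper, and a simple trapezoidal integration. The signal values, echo time and sampling interval are hypothetical, and no leakage correction or arterial input function is applied.

```python
import math


def delta_r2_star(signal, baseline, echo_time_s):
    """Convert a signal-time curve S(t) to dR2*(t) = -ln(S(t) / S0) / TE."""
    return [-math.log(s / baseline) / echo_time_s for s in signal]


def area_under_curve(values, dt_s):
    """Trapezoidal integral of a uniformly sampled curve."""
    return sum((a + b) / 2.0 * dt_s for a, b in zip(values, values[1:]))


# Hypothetical signal intensities of one voxel during bolus passage (a.u.).
signal = [100.0, 100.0, 80.0, 60.0, 75.0, 90.0, 100.0]
baseline = 100.0     # assumed pre-bolus signal S0
echo_time_s = 0.03   # illustrative echo time
dt_s = 1.5           # illustrative temporal resolution

curve = delta_r2_star(signal, baseline, echo_time_s)
# Proportional to CBV only; this is not an absolute blood volume.
cbv_estimate = area_under_curve(curve, dt_s)
```

Because the result is only proportional to CBV, it is meaningful mainly as a ratio to the same quantity measured in another tissue.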
CBV measurements may show high variability. Values vary due to different image acquisition protocols and postprocessing methods and also due to physiological differences in the patients such as cardiac output and haematocrit values. This makes it difficult to compare CBV among patients and studies. Therefore, CBV is typically normalized to a reference tissue. The relative CBV (rCBV) is calculated by the CBV in the region of interest (e.g. tumour hotspot) divided by the CBV of an internal reference tissue for normalization. A typical reference tissue is the contralateral normal-appearing white matter (NAWM) or the normal-appearing grey matter (NAGM).
Different reference tissues are used in the literature, including normal-appearing white or grey matter (contralateral or ipsilateral) [7], the contralateral NAWM [2,8], the contralateral NAGM [9], contralateral thalamus [10] or contralateral centrum semiovale [11]. Most studies do not describe their exact location or give an exact definition, but it is known for instance that CBV values of grey matter are higher than CBV values of white matter [12].
The rCBV is subject to observer variability when the regions of interest (ROIs) are manually annotated. Variability can be reduced by the use of (semi-)automated methods [13][14][15], but these methods are not commonly available, and manual annotation of the reference tissue by an experienced radiologist is still common practice [16,17], for which a reliable and reproducible reference ROI is necessary. Wetzel et al. [18] investigated the observer variability of annotating the tumour hotspot while keeping the internal reference tissue constant. The authors did not study the observer variability of annotating the reference tissue.
Thus, the purpose of this study was to assess the observer variability of rCBV measurements depending on the choice of the reference tissue that is used for normalization in DSC-MRI of glioma patients.

Patient selection
For this retrospective study informed consent was waived. Between 2006 and 2008 our institution participated in a European project called eTumour. Patients presenting with symptoms suggestive of a brain tumour, or with a newly diagnosed and untreated brain tumour, were prospectively included. One day before surgery they underwent conventional MR, MR spectroscopy and MR perfusion (DSC-MRI) imaging. Our institution included 98 subjects with brain tumours. Further details of the eTumour project can be found in Julia-Sape et al. [19].
The criteria for selecting patients from the eTumour database were as follows: (1) a histopathologically confirmed diagnosis of glioma was available; (2) subjects had not undergone surgical resection, biopsy or radiation therapy before DSC-MR imaging was performed; and (3) subjects were 18 years or older. Subjects were excluded if the DSC-MRI was technically inadequate due to motion and susceptibility artifacts.
All tumours were located supratentorially.

Observers
One observer (MTHO, with 3 years of experience) annotated the tumour hotspot according to the method described by Wetzel et al. [18] and did not participate in annotating the reference tissues.
Three observers (FJAM and BMG, certified neuroradiologists with 10 and 30 years of experience, respectively, and EJS, a radiology resident) independently performed all measurements for the reference tissues. The observers were blinded to patient history and diagnosis. The observers underwent a training session with five training cases, which were excluded from the final evaluation in order to limit performance bias.

Image analysis
The DSC-MR images were processed on a dedicated in-house developed workstation (Cirrus Brain MR, version 7335, Radboudumc, Nijmegen, The Netherlands). Processing consisted of image registration of DSC-MRI to the T1w image, followed by calculation of the CBV perfusion map [20]. The Weisskoff correction method was used to correct for T1 leakage effects [21,22]. The DSC-MR image, corresponding CBV map and conventional MR images (T1w before and after contrast, T2w or FLAIR) were made available in the workstation to the observers.
One observer (MTHO) defined the tumour hotspot following Wetzel et al. [18] by annotating four to six circular ROIs of 25 mm² in the area of the tumour hotspot and selecting the ROI with the highest CBV value. This region was then kept constant for all observers in the subsequent normalization step. Defining the tumour hotspot was done in a separate session and care was taken to avoid areas of necrosis, cysts, or non-tumour macro-vessels.
To evaluate the influence of normalization on the calculated rCBV, all observers were asked to place a reference ROI of approximately 25 mm² in a homogenous region at the contralateral side. Large vessels and tumour-suspicious regions were avoided. rCBV was calculated by dividing the tumour CBV by the CBV of the reference tissue. Figure 1 shows an example of ROI placement in the reference regions.
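The hotspot selection and normalization steps can be sketched as follows. This is a minimal illustration and not the workstation's implementation: the function names and all CBV values are hypothetical, and each ROI is represented only by its mean CBV.

```python
def select_hotspot(candidate_roi_cbv):
    """Return the highest mean CBV among the candidate tumour ROIs."""
    if not candidate_roi_cbv:
        raise ValueError("at least one candidate ROI is required")
    return max(candidate_roi_cbv)


def relative_cbv(tumour_cbv, reference_cbv):
    """Normalize the tumour CBV by the CBV of the reference tissue."""
    if reference_cbv <= 0:
        raise ValueError("reference CBV must be positive")
    return tumour_cbv / reference_cbv


# Hypothetical mean CBV values (arbitrary units) of four candidate ROIs.
candidates = [4.2, 5.0, 3.8, 4.6]
hotspot = select_hotspot(candidates)

# The same hotspot normalized to two hypothetical reference tissues.
rcbv_white_matter = relative_cbv(hotspot, 2.0)
rcbv_grey_matter = relative_cbv(hotspot, 4.0)
```

With a fixed hotspot, every difference in rCBV between observers or sessions comes from the denominator, which is the quantity this study varies.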

Statistical analysis
Statistical analyses were performed with IBM SPSS Statistics version 20 (SPSS Inc., Chicago, IL, USA). The mean and standard deviation of the rCBV measurements were determined per reference tissue, evaluation session and observer. Normal distribution was tested using the Shapiro-Wilk test. A paired t-test was used to compare the rCBV measurements between the two evaluations for every observer and all reference tissues, with p<0.05 considered statistically significant.
For the statistical analyses of the observer variability, the coefficient of variation (CV), intraclass correlation coefficient (ICC) and Bland-Altman analyses were used. The CV was calculated for the rCBV for every reference tissue, and every observer. The ICC is reported with a 95 % confidence interval where ICC <0.4 was considered poor agreement, ICC 0.40-0.59 was considered fair agreement, ICC 0.60-0.74 was considered good agreement, and ICC >0.74 was considered excellent agreement [23]. Bland-Altman analyses were expressed as the mean difference, standard deviation and 95 % limits of agreement.
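The agreement measures named above can be sketched in a few lines. The ICC categories follow the thresholds given in the text, and the Bland-Altman limits use the usual mean difference ± 1.96 SD. The CV formula is an assumption: the text does not state how the CV was computed, so the sketch uses the SD of each pair of measurements divided by the pair mean, averaged over patients.

```python
import statistics


def classify_icc(icc):
    """Map an ICC value to the agreement categories used in the text."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


def bland_altman(first, second):
    """Return mean difference, SD of differences and 95 % limits of agreement."""
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    limits = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return mean_diff, sd_diff, limits


def mean_pairwise_cv_percent(first, second):
    """Assumed CV definition: per-patient SD / mean of the two values, averaged."""
    cvs = []
    for a, b in zip(first, second):
        pair_mean = (a + b) / 2.0
        cvs.append(100.0 * statistics.stdev([a, b]) / pair_mean)
    return statistics.mean(cvs)


# Hypothetical rCBV values of five patients measured in two sessions.
session_1 = [2.1, 3.4, 1.8, 4.0, 2.9]
session_2 = [2.3, 3.1, 1.9, 4.4, 2.8]

mean_diff, sd_diff, limits = bland_altman(session_1, session_2)
cv_percent = mean_pairwise_cv_percent(session_1, session_2)
```

Other CV definitions, such as a root-mean-square within-subject CV, would give different numbers for the same data.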

Intra-observer variability
The observers repeated the rCBV measurements after 2 weeks or longer (up to a month) to assess the intra-observer variability. The measurements were made on the same dataset but in a different random order of presentation to limit recall bias. The statistical analyses were calculated for rCBV measurements between the two evaluations of an observer, for each reference tissue. A two-way mixed ICC model, with absolute agreement, single measures and a 95 % confidence interval was used.
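A single-measures, absolute-agreement ICC can be computed from the mean squares of a two-way analysis of variance. The sketch below uses the ICC(A,1) formula of McGraw and Wong, which is an assumption about what the statistics package computes; it gives the point estimate only and omits the confidence interval. The example data are hypothetical.

```python
def icc_absolute_agreement_single(ratings):
    """ICC(A,1) for a table with one row per patient and one column per rating."""
    n = len(ratings)       # number of patients
    k = len(ratings[0])    # number of sessions or observers
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    return (ms_rows - ms_error) / denominator


# Hypothetical rCBV values: five patients, two sessions of one observer.
repeated = [[2.1, 2.3], [3.4, 3.1], [1.8, 1.9], [4.0, 4.4], [2.9, 2.8]]
icc = icc_absolute_agreement_single(repeated)
```

Absolute agreement penalizes systematic offsets: if the second session were always exactly one unit higher than the first, the value would drop below 1 even though the ranking of patients is identical.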

Inter-observer variability
The statistical analyses were calculated for every pairwise combination of observers to assess the inter-observer variability. Only the rCBV measurements of the first evaluation for every reference tissue and observer were used. A two-way random ICC model, with absolute agreement, single measures and a 95 % confidence interval was used.
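The pairwise design can be sketched as a loop over all observer combinations. Three observers give three pairs, and any agreement statistic can be computed per pair; the mean difference is used here only to keep the example short. The observer labels and rCBV values are hypothetical.

```python
from itertools import combinations
import statistics

# Hypothetical first-session rCBV values of four patients per observer.
first_session = {
    "observer 1": [2.1, 3.4, 1.8, 4.0],
    "observer 2": [2.3, 3.1, 1.9, 4.4],
    "observer 3": [2.0, 3.6, 1.7, 3.9],
}


def pairwise_mean_differences(measurements):
    """Return the mean rCBV difference for every pair of observers."""
    results = {}
    for a, b in combinations(sorted(measurements), 2):
        diffs = [x - y for x, y in zip(measurements[a], measurements[b])]
        results[(a, b)] = statistics.mean(diffs)
    return results


pairs = pairwise_mean_differences(first_session)
```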

Results
The mean and standard deviation of the rCBV measurements for all observers are shown in Table 1. There was no statistically significant difference between the rCBV measurements of the two evaluations for any tissue, except for observer 1 for NAWM in the slice of the tumour.

Intra-observer variability
The ICC, CV and Bland-Altman analysis for intra-observer variability are summarized in Table 2. The ICC ranged from 0.50 to 0.97 for all tissues, indicating fair to excellent agreement. The averaged CV for a reference tissue ranged from 5.1 % to 22.1 % for all reference tissues and observers. Centrum semiovale (range ICC 0.88-0.97) was the only reference tissue that showed excellent agreement (ICC >0.74) for all observers. Centrum semiovale showed the lowest averaged CVs for all observers (range 5.1-9.0 %). Results of the Bland-Altman analysis showed that the mean differences for centrum semiovale and putamen were close to zero. Bland-Altman plots of centrum semiovale and putamen for intra-observer variability are shown in Fig. 2. Figure 3 illustrates the effect of ROI placement on the rCBV for centrum semiovale, the frontal NAWM and the parietal NAWM. The effects for this example are shown in Table 3. The difference in rCBV values between two ROIs is 21 % for frontal NAWM and 30 % for parietal NAWM, compared to 3 % for centrum semiovale.

Inter-observer variability
The ICC, CV and Bland-Altman analysis for inter-observer variability are summarized in Table 4. The ICC for all pairwise observer combinations ranged from 0.44 to 0.92, and the CV ranged from 8.1 % to 31.1 %. Centrum semiovale showed the lowest averaged CVs. Results of the Bland-Altman analysis showed that the mean differences for centrum semiovale and putamen were close to zero. Bland-Altman plots of centrum semiovale and putamen for inter-observer variability are shown in Fig. 4.

Discussion
Several factors may influence rCBV values [24], including contrast agent characteristics, acquisition technique and data pre- and post-processing. A low variability of rCBV measurements is not only important for accurate tumour grading and treatment monitoring, it also enables comparisons of values across studies and patient populations. In this study we have shown that selecting the contralateral centrum semiovale as reference tissue for rCBV measurements in DSC-MRI of glioma patients provides the lowest intra- and inter-observer variability. We assessed the observer variability of reference tissue selection. In total, eight regions of interest in NAWM and NAGM were annotated as reference tissue. Overall, a wide variability in observer agreement of the rCBV measurements was observed in our study.
The centrum semiovale is easier to annotate than the frontal or parietal NAWM, areas which are hindered by partial volume effects of WM and GM, pronounced T2-shortening effects of the cortical vessels (mainly GM) and distortion artifacts due to the frontal sinus. The centrum semiovale is a large homogenous area of WM that is mostly visible in only one or two axial slices, and suffers less from the problems described for the frontal or parietal NAWM. This explains the excellent intraclass correlation coefficient and the low coefficient of variation.
The putamen is a well-defined homogenous area of subcortical GM and could potentially also be a good reference tissue. Thalamus is less suited as a reference tissue because it is more heterogeneous with nuclei and suffers from pronounced T2-shortening effects from vessels in the near vicinity. Putamen showed good to excellent intra- and inter-observer agreement, and showed the lowest averaged CVs for all observers of the GM reference tissues. However, centrum semiovale ICC and CV for intra- and inter-observer agreement were slightly better compared to putamen.
Table 2 Intra-observer agreement: intraclass correlation coefficient, coefficient of variation, and Bland-Altman analysis
Table 3 Effects of region of interest (ROI) placement in white matter (WM) for the example in Fig. 2. The difference in rCBV values between two ROIs is 21 % for frontal WM and 30 % for parietal WM, compared to 3 % for centrum semiovale
Our data showed distortion artifacts due to the GE-EPI sequence, which is common near brain-bone-air interfaces, mainly in the frontal lobe and in the area of the putamen [24]. Despite these distortions, rCBV in putamen could still be calculated (see Fig. 5). Distortions in these areas can be decreased by changing the phase encoding order from posterior to anterior instead of anterior to posterior. Further research on the reproducibility of the DSC-MR images over time is needed to investigate the distortion artifacts near the putamen and to investigate the effect of distortions on the calculation of rCBV.
Besides artifacts (like distortion artifacts and pronounced T2-shortening effects), insufficient Z-coverage can be a problem in DSC-MRI (which was not the case in our study), in which case the centrum semiovale may fall outside the scanned volume. If centrum semiovale and putamen are not available for assessment, then normal-appearing white matter in the slice of the tumour also showed good ICC for intra- and inter-observer variability. However, it also showed an average CV of >20 % for inter-observer variability, which is not preferable. Since artifacts are the most common hindrance to normalization, we advise selecting normal-appearing white matter far away from the sinuses and mastoid to avoid distortion artifacts and to stay away from vessels to avoid pronounced T2-shortening effects.
Only one related work was found that assessed the observer agreement of selecting reference tissues. Wetzel et al. [18] investigated the observer agreement of the tumour ROI and used one pixel in NAWM as reference tissue of which the exact location was not described. To analyse the precision of measurements of NAWM they selected ten ROIs close to the initial reference ROI in NAWM and showed a CV of 20 % of repeated measurements in NAWM. In our study, if NAWM is selected as reference tissue then the results showed a higher overall CV, from 23.9 % for NAWM in the slice of the tumour, 26.4 % for parietal NAWM up to 27.7 % for NAWM by choice, except for the centrum semiovale, which showed the lowest CV (range 8.1-12.5 %). Another explanation for the differences in CV (besides partial volume effects of WM and GM, pronounced T2-shortening effects or distortion artifacts) could be the size of the ROI. The use of only one pixel as reference tissue by Wetzel et al. [18] is a limitation and can explain the higher CV in their study compared to our CV for centrum semiovale since centrum semiovale is an easy to annotate homogenous area and not hindered by the problems described above. The size of the ROI is still a matter of debate, and ranges in the literature from 3.2 mm² [18] to 50 mm² [3], and even up to 432 mm² [1]. That is, size ranged from 1 pixel [18] up to 100 pixels [1]. We decided to use 25 mm² ROIs since these ROIs can be easily placed in cortical GM, but also in thalamus and putamen without partial volume averaging within the NAWM.
We chose to use circular ROIs with a fixed diameter instead of freehand ROIs for the reference tissue. In a previous preliminary study [25], we showed that the freehand ROIs were larger than the ROIs used in the current study and therefore showed lower CV. However, freehand ROIs showed lower agreement (lower ICC), because it is difficult to draw the same freehand ROI twice or by different observers. We therefore recommend using ROIs with fixed diameters.
Table 4 Inter-observer agreement: intraclass correlation coefficient, coefficient of variation, and Bland-Altman analysis
Our goal was to assess only the observer agreement when selecting the reference tissue, and therefore the tumour hotspot ROI was kept fixed throughout the experiments. Based on our study we cannot assess the overall influence if the observers were allowed to choose both the tumour hotspot and the reference tissue. Wetzel et al. [18] showed that the inter-observer CV for determining the tumour hotspot ROI with a fixed reference tissue ROI is 30 %. Our study showed that the CV for the reference tissue with a fixed tumour hotspot ROI ranged from 8.3 % to 31.1 %. A study should be performed to assess the overall influence if both can freely be selected. An accepted target for measurement error in multicentre studies is a CV that is less than 20 %, according to the Quantitative Imaging Biomarkers Alliance (QIBA) [26]. Our study showed a lower overall inter-observer agreement than intra-observer agreement in the rCBV measurements, which is in concordance with other studies [27][28][29][30][31][32]. However, these variabilities are difficult to compare to our results because different tumours, body parts, modalities, methods of dynamic acquisitions and pharmacokinetic models were used.

Limitations
One limitation of our study is that we did not use vessel segmentation in the analyses. It is known that GE-EPI sequences are more weighted towards the macrovasculature. Large vessels are pronounced due to T2 shortening outside the vessel lumen, which results in an overestimation of rCBV in cortical grey matter and nearby white matter [12]. To minimize these macrovessel signals in gradient echo images, vessel segmentation techniques can be used during post-processing [33].
Fig. 4 Inter-observer Bland-Altman plots. Bland-Altman plots were used to analyse the agreement between two observers. The difference between the first measurement of two observers was plotted on the y-axis and the mean of the two evaluations was plotted on the x-axis. The solid (black) line represents the mean value for the data points and the dashed (red) line represents the 1.96*SD
Another limitation is that our results only apply to rCBV. Care must be taken when extrapolating the results to other perfusion parameters like cerebral blood flow, to spin-echo acquisitions or to other perfusion methods (like arterial spin labeling or T1 dynamic contrast-enhanced MR perfusion).

Conclusion
Our findings show that the observer variability of rCBV measurements can vary between poor and excellent, depending on the chosen reference tissue in NAWM or NAGM. Contralateral centrum semiovale as the internal reference standard for rCBV showed the lowest observer variability.
Funding The authors state that this work has not received any funding.

Compliance with ethical standards
Guarantor The scientific guarantor of this publication is Rashindra Manniesing.