The effect of gadolinium-based contrast-agents on automated brain atrophy measurements by FreeSurfer in patients with multiple sclerosis

Objective To determine whether reliable brain atrophy measures can be obtained from post-contrast 3D T1-weighted images in patients with multiple sclerosis (MS) using FreeSurfer.
Methods Twenty-two patients with MS were included, in whom 3D T1-weighted MR images were obtained during the same scanner visit, with the same acquisition protocol, before and after administration of gadolinium-based contrast agents (GBCAs). Two FreeSurfer versions (v.6.0.1 and v.7.1.1) were applied to calculate grey matter (GM) and white matter (WM) volumes and global and regional cortical thickness. The consistency between measures obtained in pre- and post-contrast images was assessed by intra-class correlation coefficient (ICC), the difference was investigated by paired t-tests, and the mean percentage increase or decrease was calculated for total WM and GM volume, total deep GM and thalamus volume, and mean cortical thickness.
Results Good to excellent reliability was found between all investigated measures, with ICC ranging from 0.926 to 0.996, all p values < 0.001. GM volumes and cortical thickness measurements were significantly higher in post-contrast images by 3.1 to 17.4%, while total WM volume decreased significantly by 1.7% (all p values < 0.001).
Conclusion The consistency between values obtained from pre- and post-contrast images was excellent, suggesting it may be possible to extract reliable brain atrophy measurements from T1-weighted images acquired after administration of GBCAs, using FreeSurfer. However, absolute values were systematically different between pre- and post-contrast images, meaning that such images should not be compared directly. Potential systematic effects, possibly dependent on GBCA dose or the delay time after contrast injection, should be investigated.
Trial registration ClinicalTrials.gov identifier: NCT00360906.
Key Points
• The influence of gadolinium-based contrast agents (GBCAs) on atrophy measurements is still largely unknown and challenges the use of a considerable source of historical and prospective real-world data.
• In 22 patients with multiple sclerosis, the consistency between brain atrophy measurements obtained from pre- and post-contrast images was excellent, suggesting it may be possible to extract reliable atrophy measurements in T1-weighted images acquired after administration of GBCAs, using FreeSurfer.
• Absolute values were systematically different between pre- and post-contrast images, meaning that such images should not be compared directly, and measurements extracted from certain regions (e.g., the temporal pole) should be interpreted with caution.
Supplementary Information The online version contains supplementary material available at 10.1007/s00330-021-08405-8.

Introduction
Grey matter (GM) atrophy measured on MRI in persons with multiple sclerosis (MS) reflects irreversible neuroaxonal loss and neurodegenerative changes in the CNS [1]. The degree of GM atrophy has been shown to consistently correlate with physical [2,3] and cognitive [4] disability, and is regarded as a promising neurodegenerative biomarker. Furthermore, as the demand for neuroprotective interventions increases, GM atrophy is an easily available outcome measure [5][6][7].
There are a number of available methods and software to measure GM atrophy. Although FreeSurfer requires substantial processing time, making it less suitable for clinical practice, it is one of the most commonly used automated methods in research, especially for cortical parcellation and thickness estimation. FreeSurfer is publicly available and widely validated [8][9][10][11][12].
Due to the high tissue contrast [16][17][18] in unenhanced three-dimensional (3D) T1-weighted images, this image type is commonly used by brain segmentation software and required by FreeSurfer, as well as other software with similar purposes. However, unenhanced 3D T1-weighted images are not mandatory in suggested standardised brain MRI protocols for MS [19] and may not be routinely included. Instead, post-contrast T1-weighted images are often prioritised, especially in clinical settings. In case of ongoing inflammation, the intravenously administered contrast agent leaks into the brain parenchyma in locations where the blood-brain barrier (BBB) is disrupted [20]. These post-contrast images are valuable both in baseline and follow-up examinations, as they can unequivocally detect lesions with active inflammation [19].
The contrast agent used is almost universally gadolinium-based, consisting of a central paramagnetic Gd3+ ion chelated to a carrier molecule to prevent the toxicity of free Gd3+, while still maintaining its paramagnetic properties. Gadolinium-based contrast agents (GBCAs) shorten both the longitudinal (T1) and transverse (T2) relaxation times [21], leaving areas in which GBCAs accumulate as bright or hyperintense compared to surrounding tissue on T1-weighted images.
The use of GBCAs has increased over the last three decades [22], making up a considerable source of historical and prospective real-world data. However, the value of such data for brain atrophy measurements depends on our ability to correctly interpret the data in automated image analyses. The influence of GBCAs on atrophy measurements is still largely unknown and has previously been investigated in only a few studies using different image analysis techniques [23,24]. In this study, our aim is to validate the use of post-contrast T1-weighted images for volume and cortical thickness measurements and to provide guidelines on how to interpret results from clinically relevant and commonly considered measures. To do so, total WM and GM volume, total deep GM and thalamus volume, and mean cortical thickness measures were obtained in pre- and post-contrast images by FreeSurfer and compared.

Participants
The patients included in this study participated in a 10-year follow-up visit following a multi-centre, randomised, placebo-controlled trial of ω-3 fatty acids in MS (the OFAMS-study), which has previously been described in detail [25]. A total of 85 of the 92 persons with relapsing-remitting MS (RRMS) [26] originally enrolled in the OFAMS-study participated in the 10-year follow-up visit and underwent clinical, biochemical, and radiological examinations at their local study site.
The study was approved by the Regional Committee for Medical and Health Research Ethics in Western Norway Regional Health Authority (ClinicalTrials.gov identifier: NCT00360906). All participants gave their written informed consent.

MRI data acquisition
Imaging at the 10-year follow-up visit was performed at the different study sites, on a 3-Tesla (T) MRI scanner if available, alternatively using a 1.5 T MRI scanner, with a standard head coil. The acquisition included a post-contrast sagittal 3D T1-weighted sequence; acquisition details across sites are provided in Table 1. Furthermore, a sagittal T2-weighted 3D fluid-attenuated inversion recovery (FLAIR) sequence was acquired according to locally optimised protocols. The full MRI protocol provided to the study sites is available in eAppendix 1. The study sites were encouraged to include the same 3D T1-weighted sequence before contrast-agent administration, if possible. For the present study, only the subset of participants who underwent 3D T1-weighted MR imaging both before and after injection of GBCAs, during the same scanner visit, and with the exact same acquisition protocol, was included.

MRI data processing
Lesion segmentation and lesion filling
Lesion segmentation was done on FLAIR images using the Lesion Segmentation Tool (LST) (version 2.0.15; http://applied-statistics.de/lst.html) [27]. The lesion probability map in FLAIR space was brought to T1-weighted space by FLIRT linear registration of the FLAIR image to the T1 image, using 7 degrees of freedom, correlation ratio as the cost function, and trilinear interpolation. Afterwards, a threshold of 0.1 was used to binarise the lesion probability map. To optimise the lesion filling, gadolinium-enhancing regions (both lesions and other regions) were first removed by applying an upper-intensity threshold at the 98th percentile. Next, the FMRIB Software Library (FSL) (version 5.0.10; http://www.fmrib.ox.ac.uk/fsl) was used to fill in lesional voxels in the T1-weighted images using the lesion_filling tool [28], and these filled lesions were pasted into the original post-contrast 3D T1-weighted images.
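The two thresholding steps described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the function names are hypothetical, images are represented as flat lists of voxel values, and the strict comparison at 0.1 and the linear-interpolation definition of the percentile are assumptions, since the text does not specify them.

```python
def binarise(prob_map, threshold=0.1):
    """Return 1 where the lesion probability exceeds the threshold, else 0.

    Assumption: a strict '>' comparison is used at the threshold.
    """
    return [1 if p > threshold else 0 for p in prob_map]


def percentile(values, q):
    """q-th percentile using linear interpolation between closest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def below_upper_threshold(intensities, q=98):
    """Mask (1 = keep) of voxels at or below the q-th intensity percentile.

    Voxels above the cutoff stand in for gadolinium-enhancing regions
    that are excluded before lesion filling.
    """
    cutoff = percentile(intensities, q)
    return [1 if v <= cutoff else 0 for v in intensities]


lesion_mask = binarise([0.0, 0.05, 0.1, 0.5])
keep_mask = below_upper_threshold(list(range(101)))
```

In practice these operations would be applied to 3D image arrays rather than lists; the list form is used here only to keep the sketch dependency-free.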

Morphological reconstruction
Cortical reconstruction and parcellation for cortical volume and thickness measurement and subcortical segmentation were performed with FreeSurfer, a freely available software package for academic use, available through online download (http://surfer.nmr.mgh.harvard.edu/). The findings presented here were obtained using FreeSurfer version 7.1.1; highly comparable findings obtained using FreeSurfer version 6.0.1 are presented in Table e1. The technical details of FreeSurfer procedures have been previously described [29,30] and are briefly summarised in eAppendix 2.
Quality control was performed by visual inspection, and any segmentation errors were recorded for each patient. In cases where only specific anatomical regions were incorrectly segmented, we chose not to apply any corrections for these errors in our analyses. The Desikan-Killiany atlas [31] was used to extract cortical thickness measures (mean cortical thickness, left and right hemisphere) and to study regional differences in cortical thickness between pre- and post-contrast images, across subjects, by creating a heat map. Furthermore, total cerebral GM and WM volume and total deep GM and thalamus volume (left and right hemisphere) were obtained.

MRI quality control tool
To investigate potential root causes of any observed segmentation differences, both pre- and post-contrast T1-weighted images were analysed using the MRI Quality Control Tool (MRIQC) [32]. MRIQC is open-source software that extracts no-reference image quality metrics (IQMs) from structural and functional MRI data [32]. Using a segmentation into GM, WM, and CSF by FSL-FAST [33], MRIQC calculates tissue-specific signal-to-noise ratio (SNR) values as well as the contrast-to-noise ratio (CNR) between GM and WM. Additionally, based on these values obtained from MRIQC, the contrast ratio (CR) between white and grey matter was calculated.

Statistical analysis
Statistical analyses were performed using the Statistical Product and Service Solutions (SPSS) for macOS (Version 25; SPSS). Data were visually and statistically examined using the Kolmogorov-Smirnov test for normality. To assess the agreement between volume and thickness measurements obtained before and after GBCA administration, the intra-class correlation coefficient (ICC) was determined, based on a mean rating (k = 2), consistency, two-way mixed model. Scatterplots were created to visualise the agreement. To assess whether any systematic differences in structural measurements or IQMs were present between pre- and post-contrast measurements, paired t-tests were performed. Furthermore, boxplots were made to illustrate any differences, and Bland-Altman plots were created to identify fixed or proportional bias [34]. As an exploratory analysis, paired t-tests were used to investigate a possible systematic difference between field strengths (1.5 and 3 T).
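To make the ICC model concrete, the following is a minimal sketch of the consistency-type, average-measures ICC for a two-way model, computed from the mean squares of a two-way ANOVA as (MS_rows - MS_error) / MS_rows. The analyses in this study were run in SPSS; the function name and the toy values below are illustrative assumptions only.

```python
def icc_consistency_average(ratings):
    """Consistency ICC for the average of k measurements (two-way model).

    ratings: one row per subject, each row holding the k measurements of
    that subject (here k = 2: pre- and post-contrast).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into subjects, conditions, error.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows


# Toy data: every post-contrast value is shifted by the same constant.
toy = [[1, 2], [2, 3], [3, 4], [4, 5]]
icc_toy = icc_consistency_average(toy)
```

Because a constant offset between the two conditions is absorbed by the condition (column) term, the consistency ICC of the toy data is 1 despite the systematic difference. This is why a high ICC and a significant paired t-test can coexist.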

Results
Pre- and post-contrast T1-weighted images were obtained with the exact same acquisition protocol in a total of 23 patients. One patient was excluded due to a large image artifact, causing segmentation errors. Table 2 provides an overview of the demographic and clinical characteristics of the patient group.

Quality control of FreeSurfer segmentations
All 22 pairs of pre- and post-contrast T1-weighted images finished the fully automated FreeSurfer pipeline (i.e., no hard failures). The most common soft failures (i.e., failures that do not disrupt the pipeline, but may need modification) are summarised in Table 3.

Volume and cortical thickness measurements before and after administration of GBCAs
The mean values of MRI measurements obtained before and after GBCAs are summarised in Table 4 and Fig. 4. Briefly, a mean increase in GM volumes and cortical thickness measures was observed in post-contrast images, while a mean decrease was observed in total WM volume. The results of the exploratory analysis subdivided according to field strength are presented in Table e2, showing no clear systematic differences between field strengths.

Consistency of measurements obtained before and after administration of GBCAs
A high degree of reliability was found between the measurements obtained pre- and post-contrast, for all volumes and cortical thickness measures assessed. All ICC values (Table 4) were above 0.92, with the lowest values in the thalami, and above 0.96 for all larger structures, all p values < 0.001. The consistency between the measurements is demonstrated in Fig. 5.

Difference in measurements before and after administration of GBCAs
GM volumes and mean cortical thickness were significantly higher after administration of GBCAs, in all investigated structures.

Table 3 Most common soft failures and their occurrence
1. Soft failure: [description not recoverable]. Occurrence: Found in all scans, both pre- and post-contrast, but more frequently and to a more severe degree in post-contrast images.
2. Soft failure: The pial surface failing to follow the white surface, causing "looping" errors (Fig. 1a) and subsequent incorrect enlargement of the cortical volume and thickness. Occurrence: Found in all scans, both pre- and post-contrast, but more frequently and to a more severe degree in post-contrast images.
3. Soft failure: The constructed surface border between WM and GM (the white surface) failing to follow the intensity gradient correctly in the temporal poles, resulting in a suboptimal segmentation (Fig. 3). Occurrence: Found to a moderate degree in two post-contrast images, and to a minor degree in a total of eight patients, in the post-contrast image in all eight, and in the pre-contrast image in three of those eight.

Figure 6 shows heatmaps visualising the difference in cortical thickness between pre- and post-contrast images, demonstrating the general increase in thickness measured in post-contrast T1-weighted images. However, in a few exceptions, most prominently the temporal pole, the parahippocampal, and the entorhinal gyrus in the temporal lobe, cortical thickness decreased.
While GM volumes and cortical thickness measurements were higher after administration of GBCAs, total WM volume was significantly lower. Figure e1 in the supplementary material shows the constructed Bland-Altman plots, revealing systematic differences, but no proportional bias.
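The distinction between fixed and proportional bias in a Bland-Altman analysis can be sketched as follows. This is an illustrative, assumed implementation (the function name and toy values are hypothetical, and the study's plots were not produced with this code): the mean difference estimates the fixed bias, and the slope of the differences regressed on the pairwise means indicates proportional bias.

```python
from statistics import mean, stdev


def bland_altman(pre, post):
    """Return (bias, limits_of_agreement, slope) for paired measurements.

    bias  : mean of post - pre (fixed bias)
    limits: bias +/- 1.96 standard deviations of the differences
    slope : least-squares slope of the differences on the pairwise means;
            a slope near zero suggests no proportional bias
    """
    diffs = [b - a for a, b in zip(pre, post)]
    means = [(a + b) / 2 for a, b in zip(pre, post)]
    bias = mean(diffs)
    sd = stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)

    m_bar = mean(means)
    sxx = sum((m - m_bar) ** 2 for m in means)
    sxy = sum((m - m_bar) * (d - bias) for m, d in zip(means, diffs))
    return bias, limits, sxy / sxx


# Toy data with a constant offset: fixed bias of 2, no proportional bias.
bias, limits, slope = bland_altman([10, 20, 30, 40], [12, 22, 32, 42])
```

A pattern like the toy data (non-zero bias, zero slope) corresponds to the situation reported here: a systematic difference without proportional bias.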
IQMs are reported in Table 5. The CNR was not significantly different between pre- and post-contrast images. Tissue-specific SNRs were significantly lower in post-contrast images, for both GM (p < 0.01) and WM (p < 0.0001). The CR between WM and GM was significantly higher in post-contrast images (p < 0.006).

Discussion
Our results demonstrate that, using FreeSurfer, reliable GM volume and cortical thickness measurements may be obtained from post-contrast 3D T1-weighted images. Despite systematic overestimation of the GM, high consistency was observed between all investigated MRI brain measurements obtained before and after administration of GBCAs. To our knowledge, this is one of the very few studies investigating the effect of GBCAs on volume measures in MS patients and the first using FreeSurfer. In our study, when investigating the consistency between the measures obtained before and after administration of GBCAs, a good to excellent [35] reliability was found between all investigated measures. This is in agreement with previous studies investigating the whole brain [36], upper cervical cord area [37], and GM and WM measurements [23] using SIENAX [23,36], volBrain, and FSL-Anat [23] and may imply that reliable atrophy measurements acquired from post-contrast images are possible across segmentation techniques.
Consistently, total GM, deep GM, and thalamic volume were between 3.06 and 17.39% higher in post-contrast images, and the same tendency was found for mean cortical thickness. Simultaneously, total WM volume was 1.74% lower in post-contrast images. The differences were systematic across all investigated measurements and exhibited no proportional bias. Inspecting cortical segmentations in more detail, we produced heatmaps highlighting within-subject cortical thickness differences in smaller cortical regions (Fig. 6). While smaller regions almost inevitably produce more variability than the larger regions that were the main focus of this work, these inspections showed that cortical thickness overestimation was a brain-wide phenomenon and that the overestimation in post-contrast images was not tied to large errors in any specific region but instead occurred throughout the brain.
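As an illustration of how such percentage differences and the accompanying paired comparison can be computed, consider the sketch below. It assumes that the percentage change is calculated per subject and then averaged, which the text does not state explicitly; the function names and toy values are hypothetical, and only the t statistic (not the p value) is computed.

```python
from math import sqrt
from statistics import mean, stdev


def mean_percent_change(pre, post):
    """Mean of the per-subject percentage change from pre to post.

    Assumption: change is expressed relative to the pre-contrast value.
    """
    return mean(100 * (b - a) / a for a, b in zip(pre, post))


def paired_t_statistic(pre, post):
    """t statistic of a paired t-test on post - pre (n - 1 degrees of freedom)."""
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Toy volumes in arbitrary units, not study data.
pct = mean_percent_change([100, 200], [110, 210])
t_stat = paired_t_statistic([10, 20, 30], [11, 22, 33])
```

A positive mean percentage change with a large t statistic would correspond to the systematic post-contrast increase described for the GM measures, and a negative one to the decrease in total WM volume.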
These systematic differences in measured volumes and cortical thicknesses between pre- and post-contrast images mean that they should not be compared directly. Another study, using synthetic tissue mapping to measure brain tissue fractions [24], found a 1.1% increase in total WM fraction and a 0.7% decrease in GM fraction in post-contrast images. Due to the methodological differences between that study and ours, it is difficult to assess the reason for the discrepancy in findings.
We could not identify any definite reason for the differences between pre- and post-contrast images. However, when visually inspecting images separately, some recurring soft failures in the FreeSurfer pipeline were found. First, the pial surface often expanded into extraparenchymal tissue, including components of dura or blood vessels as part of the cortex (Fig. 1b). These errors have been shown in areas where the dura or other structures, like venous sinuses, lie tangentially in close proximity to the cortex or deep GM structures, leading to larger thickness and volume variability (Fig. 2) [38]. In the FreeSurfer processing stream, the failure to remove enough extraparenchymal tissue happens in the preliminary skull stripping step [39], and the accuracy of the pial surface can be improved by manually erasing the incorporated dura or blood vessels before rerunning analyses [40].
Another recurring soft failure concerned the pial surface. In the surface-based cortical reconstruction, the border between white and grey matter (the white surface) is delineated, following T1 intensity gradients. The pial surface is then grown from the white surface, which serves as a reference point [41]. In all images, but more frequently and severely in post-contrast images, the pial surface failed to follow the white surface, causing "looping" errors (Fig. 1a) and a subsequent incorrect enlargement of the cortical volume and thickness. To improve pial surface accuracy, it is recommended to check for any mistakes in the white surface, and possibly apply manual edits before rerunning analyses [40].
Although most cortical regions demonstrated an increase in cortical thickness in post-contrast images, there were a few exceptions, particularly in the medial part of the temporal lobe. In the entorhinal and parahippocampal gyrus, as well as in the temporal poles, the measured cortical thickness was in some patients thinner after GBCA administration. These regions have in common that they are relatively small and structurally complex, and on visual inspection of the errors, the constructed white surface did not correctly follow the intensity gradients, causing considerable errors in the white surface, and subsequently the pial surface, leaving out parts of the temporal pole (Fig. 3). Challenges in reconstructing parts of the temporal cortex are consistent with previous studies [31,38,40,42], leading to increased variability of the local cortical thickness measurements [38].
The soft failures in the FreeSurfer pipeline occurred more often in post-contrast images in our data. This may be caused by the higher intensity in extraparenchymal structures in close proximity to the cortex or subcortical structures, causing disturbance and challenges in correctly separating different tissue types. Skull stripping errors and other soft failures could in some selected regions be identified as the direct cause of increased cortical thickness or GM volume in post-contrast images. It is, however, uncertain whether these errors can explain the systematic increase in almost all GM structures and the overall decrease in WM volume. Even in the absence of active lesions and GBCA leakage through disruptions in the BBB, GBCAs can still be expected to be present in the brain capillary network [24]. This presence may shorten the overall T1 relaxation time in all tissues, and possibly also affect intensity borders. In our MRIQC analyses, there was no difference in CNR between pre- and post-contrast images, indicating that the separation of GM and WM tissue distributions was similar in pre- and post-contrast images. It should, however, be noted that extracting reliable noise estimates from parallel imaging is challenging.
Systematic effects dependent on the type of GBCA used, dosage, and delay time after administration are likely. In the data retrospectively used in the present study, these factors were not standardised, nor always stated, making them difficult to correct for. To further conclude on the reliability of post-contrast measurements, it is necessary for future research to investigate the possible systematic effects dependent on these variables.
This study is not without limitations. For a multicentre study, the number of patients included was limited, and patients were scanned on different scanners with varying sequence parameters and field strength. Furthermore, some details of the MRI protocol that may affect brain measurements (e.g., head coils [43,44]) were in some cases neither stated nor retrospectively retrievable, making it difficult to evaluate the effect of these factors. Nonetheless, because the effect of field strength on atrophy measures has been studied before [45], we explored the results for 1.5 T and 3 T scanners separately. No systematic differences between the two field strengths regarding the different variabilities in the pre- and post-contrast images emerged, which could be due to small patient numbers and variable acquisition settings. Considering all these aspects, the fact that consistency between measurements before and after GBCA administration was observed across the different scanners suggests that this behaviour is largely systematic. Future studies should investigate the effect of field strength and of other aspects of image acquisition more systematically. Image analyses in this study were performed by FreeSurfer, while there are multiple other software packages that have been used in the MS literature. To focus the present work, we chose FreeSurfer because it allows both volumetric and cortical thickness measurements and has been widely used before in MS [46][47][48][49][50].
Finally, we did not perform any pre-processing to remove high-intensity regions (except for those in WM lesions, filled in the lesion filling process) from the post-gadolinium T1-weighted images. Future work should investigate whether removal or replacement of those regions, perhaps similar to the procedure followed as part of the lesion-filling process in the present work, could reduce the observed overestimation of grey matter.

Conclusion
This study has demonstrated that reliable atrophy measurements may be obtained by FreeSurfer from post-contrast 3D T1-weighted images. A good to excellent consistency was observed between all investigated GM and WM measurements derived from images acquired before and after GBCA administration. However, due to the systematic overestimation of the GM in post-contrast images, measurements acquired from pre- and post-contrast images should not be compared directly, and measurements extracted from certain regions (e.g., the temporal pole) should be interpreted with caution. Furthermore, possible systematic effects dependent on GBCA dose and delay time after injection should be investigated.