Grey matter (GM) atrophy measured on MRI in persons with multiple sclerosis (MS) reflects irreversible neuroaxonal loss and neurodegenerative changes in the CNS [1]. The degree of GM atrophy has consistently been shown to correlate with physical [2, 3] and cognitive [4] disability, and GM atrophy is regarded as a promising biomarker of neurodegeneration. Furthermore, as the demand for neuroprotective interventions increases, GM atrophy offers a readily available outcome measure [5,6,7].

Several methods and software packages are available to measure GM atrophy. Although FreeSurfer requires substantial processing time, which makes it less suitable for clinical practice, it is one of the most commonly used automated methods in research, especially for cortical parcellation and thickness estimation. FreeSurfer is publicly available and widely validated [8,9,10,11,12], and many key papers on GM atrophy in MS have used it [13,14,15].

Owing to their high tissue contrast [16,17,18], unenhanced three-dimensional (3D) T1-weighted images are commonly used by brain segmentation software and are required by FreeSurfer and other software with similar purposes. However, unenhanced 3D T1-weighted images are not mandatory in suggested standardised brain MRI protocols for MS [19] and may not be routinely acquired. Instead, post-contrast T1-weighted images are often prioritised, especially in clinical settings. In the case of ongoing inflammation, the intravenously administered contrast agent leaks into the brain parenchyma where the blood–brain barrier (BBB) is disrupted [20]. Post-contrast images are therefore valuable in both baseline and follow-up examinations, as they can unequivocally detect lesions with active inflammation [19].

The contrast agent used is almost universally gadolinium-based: a central paramagnetic Gd³⁺ ion is chelated to a carrier molecule, which prevents the toxicity of free Gd³⁺ while preserving its paramagnetic properties. Gadolinium-based contrast agents (GBCAs) shorten both the longitudinal (T1) and transverse (T2) relaxation times [21], so that areas in which GBCAs accumulate appear bright (hyperintense) relative to surrounding tissue on T1-weighted images.
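The T1 shortening described above is commonly approximated by a linear relaxivity model, in which the longitudinal relaxation rate R1 = 1/T1 increases in proportion to the local contrast-agent concentration C: R1,post = R1,native + r1·C, where r1 is the agent's relaxivity. The sketch below simply evaluates that model; the input numbers are illustrative assumptions, not values measured or used in this study.

```python
def t1_with_contrast(t1_native_s: float, r1_relaxivity: float, conc_mmol_per_l: float) -> float:
    """Post-contrast T1 (seconds) under a linear relaxivity model.

    Assumes R1_post = R1_native + r1 * C, where R1 = 1/T1 (1/s),
    r1 is the relaxivity in L/(mmol*s) and C the concentration in mmol/L.
    """
    rate_native = 1.0 / t1_native_s                               # native relaxation rate R1 (1/s)
    rate_post = rate_native + r1_relaxivity * conc_mmol_per_l     # rate rises linearly with C
    return 1.0 / rate_post                                        # convert rate back to T1


# Illustrative (hypothetical) inputs: native T1 = 1.0 s, r1 = 4.0 L/(mmol*s), C = 0.1 mmol/L
print(round(t1_with_contrast(1.0, 4.0, 0.1), 3))  # 0.714
```

Under this model any non-zero concentration shortens T1, and the effect grows with both relaxivity and concentration.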

The use of GBCAs has increased over the last three decades [22], so post-contrast images constitute a considerable source of historical and prospective real-world data. However, the value of such data for brain atrophy measurements depends on our ability to interpret them correctly in automated image analyses. The influence of GBCAs on atrophy measurements is still largely unknown and has been investigated in only a few studies, using different image analysis techniques [23, 24]. In this study, we aimed to validate the use of post-contrast T1-weighted images for volume and cortical thickness measurements and to provide guidelines on how to interpret results for clinically relevant and commonly used measures. To this end, total WM and GM volume, total deep GM and thalamus volume, and mean cortical thickness were obtained by FreeSurfer from pre- and post-contrast images and compared.

Materials and methods


The patients included in this study participated in a 10-year follow-up visit following a multi-centre, randomised, placebo-controlled trial of ω-3 fatty acids in MS (the OFAMS-study), which has previously been described in detail [25]. A total of 85 of the 92 persons with relapsing–remitting MS (RRMS) [26] originally enrolled in the OFAMS-study participated in the 10-year follow-up visit and underwent clinical, biochemical, and radiological examinations at their local study site.

The study was approved by the Regional Committee for Medical and Health Research Ethics in Western Norway Regional Health Authority (ClinicalTrials.gov identifier: NCT00360906). All participants gave written informed consent.

MRI data and analysis

MRI data acquisition

Imaging at the 10-year follow-up visit was performed at the different study sites with a standard head coil, on a 3-Tesla (T) MRI scanner if available and otherwise on a 1.5 T scanner. The acquisition included a post-contrast sagittal 3D T1-weighted sequence; acquisition details across sites are provided in Table 1. In addition, a sagittal T2-weighted 3D fluid-attenuated inversion recovery (FLAIR) sequence was acquired according to locally optimised protocols. The full MRI protocol provided to the study sites is available in eAppendix 1. The study sites were encouraged to acquire the same 3D T1-weighted sequence before contrast-agent administration, if possible. The present study included only the subset of participants who underwent 3D T1-weighted imaging both before and after GBCA injection, during the same scanner visit and with exactly the same acquisition protocol.
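The inclusion criterion above (same visit, identical protocol, both pre- and post-contrast scans present) amounts to a simple pairing filter. The sketch below illustrates it on hypothetical scan records; the `T1Scan` record, its fields, and the protocol tuple contents are assumptions for illustration, not the study's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class T1Scan:
    subject: str
    session: str          # scanner visit identifier (hypothetical field)
    post_contrast: bool   # True if acquired after GBCA injection
    protocol: tuple       # acquisition parameters (hypothetical), e.g. (field_T, TR_ms, TE_ms, flip_deg)


def matched_pairs(scans):
    """Return {(subject, session): (pre, post)} for visits that have both a pre- and a
    post-contrast 3D T1 scan acquired with an identical protocol."""
    by_visit = {}
    for scan in scans:
        by_visit.setdefault((scan.subject, scan.session), {})[scan.post_contrast] = scan
    pairs = {}
    for visit, by_phase in by_visit.items():
        pre, post = by_phase.get(False), by_phase.get(True)
        if pre is not None and post is not None and pre.protocol == post.protocol:
            pairs[visit] = (pre, post)
    return pairs


scans = [
    T1Scan("A1", "v10", False, (3.0, 2300, 2.98, 9)),
    T1Scan("A1", "v10", True, (3.0, 2300, 2.98, 9)),    # kept: same visit, same protocol
    T1Scan("B2", "v10", False, (1.5, 2000, 3.1, 8)),
    T1Scan("B2", "v10", True, (1.5, 1900, 3.1, 8)),     # excluded: TR differs
    T1Scan("C3", "v10", True, (3.0, 2300, 2.98, 9)),    # excluded: no pre-contrast scan
]
print(sorted(matched_pairs(scans)))  # [('A1', 'v10')]
```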

Table 1 Details on MRI acquisition per protocol

MRI data processing

Lesion segmentation and lesion filling

Lesion segmentation was performed on FLAIR images using the Lesion Segmentation Tool (LST, version 2.0.15) [27]. The lesion probability map was brought from FLAIR space to T1-weighted space using FLIRT linear registration of the FLAIR image to the T1-weighted image, with 7 degrees of freedom, correlation ratio as the cost function, and trilinear interpolation. The registered lesion probability map was then binarised at a threshold of 0.1. To optimise lesion filling, gadolinium-enhancing regions (both lesions and other regions) were first removed by applying an upper-intensity threshold at the 98th percentile. Next, the lesion_filling tool [28] of the FMRIB Software Library (FSL, version 5.0.10) was used to fill lesional voxels in the T1-weighted images, and these filled lesions were pasted into the original post-contrast 3D T1-weighted images.
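The two thresholding steps described above can be sketched on flattened voxel arrays as follows. This is a minimal illustration, not the study's pipeline: registration (FLIRT) and filling (FSL lesion_filling) are left to those tools; treating probabilities `>= 0.1` as lesion and computing the 98th percentile within a brain mask are our assumptions; and the text does not specify exactly how the enhancing voxels were combined with the lesion mask, so the sketch only produces the two masks.

```python
import statistics


def binarise_lesion_map(prob_map, threshold=0.1):
    """Binarise a (registered) lesion probability map given as a flat list of voxels.
    Assumption: voxels with probability >= threshold count as lesion."""
    return [1 if p >= threshold else 0 for p in prob_map]


def enhancing_mask(intensities, brain_mask, percentile=98):
    """Flag voxels inside brain_mask whose intensity exceeds the given percentile
    of the in-mask intensity distribution (candidate gadolinium-enhancing voxels)."""
    inside = [v for v, m in zip(intensities, brain_mask) if m]
    # quantiles(n=100) returns the 1st..99th percentiles, so index percentile-1 is the cutoff
    cutoff = statistics.quantiles(inside, n=100, method="inclusive")[percentile - 1]
    return [1 if m and v > cutoff else 0 for v, m in zip(intensities, brain_mask)]


print(binarise_lesion_map([0.0, 0.05, 0.1, 0.5, 0.9]))  # [0, 0, 1, 1, 1]
voxels = list(range(1, 101))                             # toy intensities 1..100
print(sum(enhancing_mask(voxels, [1] * 100)))            # 2 (voxels 99 and 100)
```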

Morphological reconstruction

Cortical reconstruction and parcellation for cortical volume and thickness measurement and subcortical segmentation were performed with FreeSurfer, a freely available software package for academic use that can be downloaded online. The findings presented here were obtained using FreeSurfer version 7.1.1; highly comparable findings obtained using FreeSurfer version 6.0.1 are presented in Table e1. The technical details of the FreeSurfer procedures have been described previously [29, 30] and are briefly summarised in eAppendix 2.

Quality control was performed by visual inspection, and any segmentation errors were recorded for each patient. Where only specific anatomical regions were incorrectly segmented, we chose not to correct these errors in our analyses.

The Desikan-Killiany atlas [31] was used to extract cortical thickness measures (mean cortical thickness, left and right hemisphere) and to study regional differences in cortical thickness between pre- and post-contrast images across subjects, visualised as a heat map. In addition, total cerebral GM and WM volume and total deep GM and thalamus volume (left and right hemisphere) were obtained.
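As an illustration of how such global measures can be read from FreeSurfer output, the sketch below parses the `# Measure` header lines and the subcortical table rows of a stats file such as `aseg.stats`; to our understanding, mean thickness is reported as a `MeanThickness` measure in `lh.aparc.stats` and `rh.aparc.stats`. The layout assumed here reflects our understanding of FreeSurfer 7 stats files and should be checked against real output (FreeSurfer's own `asegstats2table` and `aparcstats2table` scripts serve the same purpose); the sample values are made up.

```python
def parse_stats(text, parse_table=True):
    """Parse FreeSurfer-style stats text into (measures, volumes_mm3).

    measures:    from lines like '# Measure Cortex, CortexVol, <description>, <value>, <unit>'
    volumes_mm3: from aseg-style table rows, assuming the column order
                 Index SegId NVoxels Volume_mm3 StructName ... (use parse_table=False
                 for aparc.stats files, whose rows follow a different layout)
    """
    measures, volumes = {}, {}
    for line in text.splitlines():
        if line.startswith("# Measure"):
            fields = [f.strip() for f in line[len("# Measure"):].split(",")]
            measures[fields[1]] = float(fields[-2])   # measure name -> value (second-to-last field)
        elif parse_table and line.strip() and not line.startswith("#"):
            cols = line.split()
            if len(cols) >= 5:
                volumes[cols[4]] = float(cols[3])     # StructName -> Volume_mm3
    return measures, volumes


sample = """\
# Measure Cortex, CortexVol, Total cortical gray matter volume, 476540.5, mm^3
# Measure CerebralWhiteMatter, CerebralWhiteMatterVol, Total cerebral white matter volume, 450000.0, mm^3
# ColHeaders  Index SegId NVoxels Volume_mm3 StructName normMean normStdDev normMin normMax normRange
  1  10  7899  7899.4  Left-Thalamus  90.1  10.2  40.0  120.0  80.0
"""
measures, volumes = parse_stats(sample)
print(measures["CerebralWhiteMatterVol"] / 1000)  # 450.0 (mL)
print(volumes["Left-Thalamus"])                   # 7899.4
```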

MRI quality control tool

To investigate potential root causes of any observed segmentation differences, both pre- and post-contrast T1-weighted images were analysed using the MRI Quality Control Tool (MRIQC) [32]. MRIQC is open-source software that extracts no-reference image quality metrics (IQMs) from structural and functional MRI data [32]. Using a segmentation into GM, WM, and CSF by FSL-FAST [33], MRIQC calculates tissue-specific signal-to-noise ratio (SNR) values as well as the contrast-to-noise ratio (CNR) between GM and WM. In addition, the contrast ratio (CR) between WM and GM was calculated from the values obtained with MRIQC.
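For orientation, the sketch below computes simple versions of these metrics from intensity samples in each tissue class. The CNR follows the definition given in the MRIQC documentation as we understand it, |μ_GM − μ_WM| / sqrt(σ_air² + σ_GM² + σ_WM²); the SNR here is simply the mean divided by the sample standard deviation (MRIQC's own estimators differ in detail); and the CR is taken as mean WM intensity over mean GM intensity, which is our assumption, since the exact CR formula is not given in the text.

```python
import math
import statistics


def tissue_snr(values):
    """Tissue SNR: mean intensity divided by the sample standard deviation."""
    return statistics.fmean(values) / statistics.stdev(values)


def cnr(gm, wm, sigma_air):
    """GM/WM contrast-to-noise ratio with background (air) noise sigma_air."""
    noise = math.sqrt(sigma_air ** 2 + statistics.stdev(gm) ** 2 + statistics.stdev(wm) ** 2)
    return abs(statistics.fmean(gm) - statistics.fmean(wm)) / noise


def contrast_ratio(gm, wm):
    """Assumed WM/GM contrast ratio: mean WM intensity over mean GM intensity."""
    return statistics.fmean(wm) / statistics.fmean(gm)


gm = [40.0, 42.0, 38.0, 40.0]   # toy intensities, not study data
wm = [80.0, 82.0, 78.0, 80.0]
print(round(tissue_snr(gm), 2), round(cnr(gm, wm, 1.0), 2), contrast_ratio(gm, wm))  # 24.49 15.89 2.0
```

Because WM is brighter than GM on T1-weighted images, a larger CR under this definition means a larger WM/GM intensity ratio, whereas the CNR additionally accounts for noise.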

Statistical analysis

Statistical analyses were performed using Statistical Product and Service Solutions (SPSS, version 25) for macOS. Normality was examined visually and with the Kolmogorov–Smirnov test. To assess the agreement between volume and thickness measurements obtained before and after GBCA administration, the intra-class correlation coefficient (ICC) was determined using a two-way mixed model with the consistency definition and mean rating (k = 2). Scatterplots were created to visualise the agreement. Paired t-tests were performed to assess whether systematic differences in structural measurements or IQMs were present between pre- and post-contrast images. Furthermore, boxplots were made to illustrate any differences, and Bland–Altman plots were created to identify fixed or proportional bias [34]. In an exploratory analysis, paired t-tests were used to investigate a possible systematic difference between field strengths (1.5 and 3 T).
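The ICC variant described above (two-way mixed model, consistency, average of k = 2 measurements, often written ICC(3,k) or ICC(C,k)) can be computed from a two-way ANOVA as (MS_subjects − MS_error) / MS_subjects. A minimal sketch in plain Python, intended to illustrate the formula rather than replace SPSS:

```python
import statistics


def icc_consistency_average(pre, post):
    """ICC(3,k): two-way mixed, consistency, average measures, for k = 2 conditions (pre, post)."""
    data = [list(pair) for pair in zip(pre, post)]   # rows = subjects, columns = conditions
    n, k = len(data), 2
    grand = statistics.fmean(v for row in data for v in row)
    row_means = [statistics.fmean(row) for row in data]
    col_means = [statistics.fmean(col) for col in zip(*data)]
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between-subject variation
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between-condition (systematic) shift
    ss_error = ss_total - ss_rows - ss_cols                  # residual variation
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows


# A constant offset between conditions does not lower the consistency ICC:
print(icc_consistency_average([1, 2, 3, 4], [2, 3, 4, 5]))  # 1.0
```

This is why high ICCs can coexist with systematic pre/post differences: the consistency definition removes the between-condition mean shift (the `ss_cols` term), so that shift has to be tested separately, for example with a paired t-test.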

Results
Pre- and post-contrast T1-weighted images were obtained with exactly the same acquisition protocol in a total of 23 patients. One patient was excluded because a large image artifact caused segmentation errors. Table 2 provides an overview of the demographic and clinical characteristics of the patient group.

Table 2 Demographic and clinical characteristics

Quality control of FreeSurfer segmentations

All 22 pairs of pre- and post-contrast T1-weighted images finished the fully automated FreeSurfer pipeline (i.e., no hard failures). The most common soft failures (i.e., failures that do not disrupt the pipeline, but may need modification) are summarised in Table 3.

Table 3 Summary of the most common soft failures
Fig. 1

Post-contrast T1-weighted MRI showing the border between WM and GM (white surface, yellow) and the border between GM and CSF (pial surface, red). a Axial slice showing a moderate pial surface “looping error” (white arrow). b Sagittal slice showing a typical skull stripping failure: a moderate error in which the pial surface expands into the dura and the sagittal sinus (white arrow)

Fig. 2

T1-weighted MRI showing the segmentation of the left thalamus in pre- and post-contrast images in two different patients (subject E3 (a–d) and subject C1 (e–h)). a–d Axial slices demonstrating the typical quality of thalamus segmentations. In post-contrast images (c, d), the medial border of the left thalamus is slightly overestimated (arrow) compared with pre-contrast images (a, b) (arrowhead), most likely due to hyperintense signal from extraparenchymal structures in the midline. e–h Axial slices demonstrating a more severe overestimation of the medial border of the left thalamus (arrow) in post-contrast images (g, h) compared with pre-contrast images (e, f) (arrowhead). Again, the medial border is overestimated due to the inclusion of extraparenchymal hyperintense structures, in this case the internal cerebral vein

Fig. 3

Pre-contrast (a–c) and post-contrast (d–f) T1-weighted images obtained from the same patient (subject A3) in the same MRI session. b and e show the white surface, the border between white and grey matter as automatically constructed by FreeSurfer (yellow). c and f show the pial surface, the automatically constructed border between grey matter and cerebrospinal fluid (red), which is derived from the white surface. The figure demonstrates a typical failure of moderate degree, in which the white surface fails to include parts of the temporal poles in the post-contrast image (e) (arrow), with subsequent errors in the pial surface (f) (arrowhead)

Volume and cortical thickness measurements before and after administration of GBCAs

The mean values of MRI measurements obtained before and after GBCA administration are summarised in Table 4 and Fig. 4. Briefly, mean GM volumes and cortical thickness measures increased in post-contrast images, whereas mean total WM volume decreased. The results of the exploratory analysis subdivided by field strength are presented in Table e2 and show no clear systematic differences between field strengths.

Table 4 MRI measurement values
Fig. 4

Boxplots of MRI measurements obtained before (yellow) and after (red) GBCA administration, in mL (a, c, d) and mm (b)

Consistency of measurements obtained before and after administration of GBCAs

A high degree of reliability was found between the pre- and post-contrast measurements for all volumes and cortical thickness measures assessed. All ICC values (Table 4) were above 0.92, with the lowest values in the thalami, and above 0.96 for all larger structures (all p < 0.001). The consistency between the measurements is shown in Fig. 5.

Fig. 5

Scatterplots of global (a) and regional (b) MRI measurements obtained before and after GBCA administration. The green lines indicate identity lines

Difference in measurements before and after administration of GBCAs

GM volumes and mean cortical thickness were significantly higher after administration of GBCAs, in all investigated structures (Table 4, Figs. 4 and 5).

Figure 6 shows heatmaps of the difference in cortical thickness between pre- and post-contrast images, demonstrating the general increase in thickness measured in post-contrast T1-weighted images. There were, however, a few exceptions in which cortical thickness decreased, most prominently the temporal pole and the parahippocampal and entorhinal gyri in the temporal lobe.
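The quantity shown in such heatmaps is a per-subject, per-region signed difference (post minus pre) in cortical thickness. A minimal sketch with made-up values for two hypothetical subjects and two Desikan-Killiany regions (labels follow FreeSurfer's aparc naming); averaging across subjects to flag regions that are thinner on average is our illustrative addition, not a step described in the text.

```python
def thickness_differences(pre, post):
    """Signed post-minus-pre cortical thickness difference (mm) per subject and region.
    pre/post: {subject: {region: thickness_mm}} with identical keys."""
    return {
        subj: {region: post[subj][region] - t for region, t in regions.items()}
        for subj, regions in pre.items()
    }


def regions_thinner_on_average(diffs):
    """Regions whose mean difference across subjects is negative (thinner post-contrast)."""
    regions = next(iter(diffs.values())).keys()
    return sorted(r for r in regions if sum(d[r] for d in diffs.values()) / len(diffs) < 0)


# Hypothetical subjects S1, S2 with made-up thickness values (mm)
pre = {"S1": {"temporalpole": 3.5, "precuneus": 2.3}, "S2": {"temporalpole": 3.4, "precuneus": 2.2}}
post = {"S1": {"temporalpole": 3.0, "precuneus": 2.5}, "S2": {"temporalpole": 3.5, "precuneus": 2.4}}
diffs = thickness_differences(pre, post)
print(regions_thinner_on_average(diffs))  # ['temporalpole']
```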

Fig. 6

Heatmaps demonstrating the difference (mm) in cortical thickness in the left (a) and right (b) hemisphere after administration of GBCAs. Brown colours indicate an increase and purple colours a decrease in cortical thickness (colour range from -1.6 mm to +1.6 mm). Letters in subject names indicate the MRI scanner (a–g)

While GM volumes and cortical thickness measurements were higher after administration of GBCAs, total WM volume was significantly lower. Figure e1 in the supplementary material shows the constructed Bland–Altman plots, revealing systematic differences, but no proportional bias.
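The quantities behind a Bland–Altman analysis, together with the paired t statistic used to test for systematic differences, can be sketched as below. Following common practice, we take the fixed bias as the mean difference, the limits of agreement as bias ± 1.96 SD, and proportional bias as a non-zero slope when regressing the differences on the pair means; p-values would require the t distribution (e.g. from a statistics package) and are omitted. The values are toy numbers, not study data.

```python
import math
import statistics


def bland_altman(pre, post):
    """Fixed bias, 95% limits of agreement, paired t statistic and proportional-bias slope."""
    diffs = [b - a for a, b in zip(pre, post)]          # post minus pre
    means = [(a + b) / 2 for a, b in zip(pre, post)]
    n = len(diffs)
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    # Ordinary least-squares slope of differences on means; close to 0 means no proportional bias
    mx = statistics.fmean(means)
    slope = (sum((x - mx) * (y - bias) for x, y in zip(means, diffs))
             / sum((x - mx) ** 2 for x in means))
    return {
        "bias": bias,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
        "t": bias / (sd / math.sqrt(n)),                 # paired t statistic, df = n - 1
        "slope": slope,
    }


res = bland_altman([10, 20, 30, 40], [11, 22, 31, 42])
print(res["bias"], round(res["t"], 3))  # 1.5 5.196
```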

IQMs are reported in Table 5. The CNR was not significantly different between pre- and post-contrast images. Tissue-specific SNRs were significantly lower in post-contrast images, for both GM (p < 0.01) and WM (p < 0.0001). The CR between WM and GM was significantly higher in post-contrast images (p < 0.006).

Table 5 Image quality metrics obtained by MRI Quality Control Tool

Discussion
Our results demonstrate that reliable GM volume and cortical thickness measurements may be obtained with FreeSurfer from post-contrast 3D T1-weighted images. Despite a systematic overestimation of GM, high consistency was observed between all investigated MRI brain measurements obtained before and after administration of GBCAs.

To our knowledge, this is one of very few studies investigating the effect of GBCAs on volume measures in MS patients, and the first using FreeSurfer. Good to excellent [35] reliability was found between all investigated measures obtained before and after administration of GBCAs. This agrees with previous studies of the whole brain [36], upper cervical cord area [37], and GM and WM measurements [23] using SIENAX [23, 36], volBrain, and FSL-Anat [23], and may imply that reliable atrophy measurements can be obtained from post-contrast images across segmentation techniques.

Total GM, deep GM, and thalamic volumes were consistently between 3.06 and 17.39% higher in post-contrast images, and the same tendency was found for mean cortical thickness. Conversely, total WM volume was 1.74% lower in post-contrast images. The differences were systematic across all investigated measurements and exhibited no proportional bias. To inspect cortical segmentations in more detail, we produced heatmaps of within-subject cortical thickness differences in smaller cortical regions (Fig. 6). Although smaller regions almost inevitably show more variability than the larger regions that were the main focus of this work, these inspections showed that cortical thickness overestimation was a brain-wide phenomenon: the overestimation in post-contrast images was not tied to large errors in any specific region but occurred throughout the brain.

These systematic differences in measured volumes and cortical thickness between pre- and post-contrast images mean that the two should not be compared directly. Another study, which used synthetic tissue mapping to measure brain tissue fractions [24], found a 1.1% increase in total WM fraction and a 0.7% decrease in GM fraction in post-contrast images. Owing to the methodological differences between that study and ours, the reason for the discrepancy in findings is difficult to assess.

We could not identify a definite reason for the differences between pre- and post-contrast images. However, visual inspection of the images revealed some recurring soft failures in the FreeSurfer pipeline. First, the pial surface often expanded into extraparenchymal tissue, including components of the dura or blood vessels as part of the cortex (Fig. 1b). Such errors have been reported in areas where the dura or other structures, such as the venous sinuses, lie tangentially and in close proximity to the cortex or deep GM structures, leading to greater variability in thickness and volume (Fig. 2) [38]. In the FreeSurfer processing stream, the failure to remove enough extraparenchymal tissue occurs in the preliminary skull stripping step [39], and the accuracy of the pial surface can be improved by manually erasing the incorporated dura or blood vessels before rerunning analyses [40].

Another recurring soft failure concerned the pial surface. In the surface-based cortical reconstruction, the border between white and grey matter (the white surface) is delineated by following T1 intensity gradients. The pial surface is then grown outward from the white surface, which serves as its reference [41]. In all images, but more frequently and severely in post-contrast images, the pial surface failed to follow the white surface, causing “looping” errors (Fig. 1a) and a consequent incorrect enlargement of cortical volume and thickness. To improve pial surface accuracy, it is recommended to check the white surface for errors and, if needed, apply manual edits before rerunning analyses [40].

Although most cortical regions showed increased cortical thickness in post-contrast images, there were a few exceptions, particularly in the medial part of the temporal lobe. In the entorhinal and parahippocampal gyri, as well as in the temporal poles, measured cortical thickness was lower after GBCA administration in some patients. These regions are relatively small and structurally complex. On visual inspection of the errors, the constructed white surface did not correctly follow the intensity gradients, causing considerable errors in the white surface and, subsequently, in the pial surface, leaving out parts of the temporal pole (Fig. 3). Difficulties in reconstructing parts of the temporal cortex are consistent with previous studies [31, 38, 40, 42] and increase the variability of local cortical thickness measurements [38].

In our data, soft failures in the FreeSurfer pipeline occurred more often in post-contrast images. This may be caused by the higher signal intensity of extraparenchymal structures close to the cortex or subcortical structures, which makes it harder to separate the different tissue types correctly.

In some selected regions, skull stripping errors and other soft failures could be identified as the direct cause of increased cortical thickness or GM volume in post-contrast images. It is, however, uncertain whether these errors can explain the systematic increase in almost all GM structures and the overall decrease in WM volume. Even in the absence of active lesions and GBCA leakage through disruptions of the BBB, GBCAs can still be expected to be present in the brain capillary network [24]. This may shorten the overall T1 relaxation time in all tissues and possibly also affect intensity borders. In our MRIQC analyses, CNR did not differ between pre- and post-contrast images, indicating that the separation of the GM and WM intensity distributions was similar in both. It should, however, be noted that extracting reliable noise estimates from images acquired with parallel imaging is challenging.

Systematic effects dependent on the type of GBCA used, dosage, and delay time after administration are likely. In the data retrospectively used in the present study, these factors were not standardised, nor always stated, making them difficult to correct for. To further conclude on the reliability of post-contrast measurements, it is necessary for future research to investigate the possible systematic effects dependent on these variables.

This study is not without limitations. For a multicentre study, the number of included patients was small, and patients were scanned on different scanners with varying sequence parameters and field strengths. Furthermore, some details of the MRI protocol that may affect brain measurements (e.g., head coils [43, 44]) were in some cases neither stated nor retrospectively retrievable, making it difficult to evaluate their effect. Nonetheless, because the effect of field strength on atrophy measures has been studied before [45], we explored the results for 1.5 T and 3 T scanners separately. No systematic differences between the two field strengths emerged in the differences between pre- and post-contrast measurements, which could be due to the small number of patients and the variable acquisition settings. Considering all these aspects, the fact that consistency between measurements before and after GBCA administration was observed across the different scanners suggests that this behaviour is largely systematic. Future studies should investigate the effects of field strength and other aspects of image acquisition more systematically. Image analyses in this study were performed with FreeSurfer, although multiple other software packages have been used in the MS literature. To keep the present work focused, we chose FreeSurfer because it provides both volumetric and cortical thickness measurements and has been widely used in MS [46,47,48,49,50].

Finally, we did not perform any pre-processing to remove high-intensity regions (except for those in WM lesions, filled in the lesion filling process) from the post-gadolinium T1-weighted images. Future work should investigate whether removal or replacement of those regions, perhaps similar to the procedure followed as part of the lesion-filling process in the present work, could reduce the observed overestimation of grey matter.

Conclusion
This study has demonstrated that reliable atrophy measurements may be obtained by FreeSurfer from post-contrast 3D T1-weighted images. Good to excellent consistency was observed between all investigated GM and WM measurements derived from images acquired before and after GBCA administration. However, because GM is systematically overestimated in post-contrast images, measurements acquired from pre- and post-contrast images should not be compared directly, and measurements extracted from certain regions (e.g., the temporal pole) should be interpreted with caution. Furthermore, possible systematic effects of GBCA dose and delay time after injection should be investigated.