Deep learning for glioblastoma segmentation using preoperative magnetic resonance imaging identifies volumetric features associated with survival

Wan, Yizhou; Rahmat, Roushanak; Price, Stephen J.

doi:10.1007/s00701-020-04483-7

Deep learning for glioblastoma segmentation using preoperative magnetic resonance imaging identifies volumetric features associated with survival

Original Article - Brain Tumors
Open access
Published: 13 July 2020

Volume 162, pages 3067–3080, (2020)
Cite this article

Download PDF

You have full access to this open access article

Acta Neurochirurgica Aims and scope Submit manuscript

Deep learning for glioblastoma segmentation using preoperative magnetic resonance imaging identifies volumetric features associated with survival

Download PDF

12 Citations
17 Altmetric
Explore all metrics

Abstract

Background

Measurement of volumetric features is challenging in glioblastoma. We investigate whether volumetric features derived from preoperative MRI using a convolutional neural network–assisted segmentation is correlated with survival.

Methods

Preoperative MRI of 120 patients were scored using Visually Accessible Rembrandt Images (VASARI) features. We trained and tested a multilayer, multi-scale convolutional neural network on multimodal brain tumour segmentation challenge (BRATS) data, prior to testing on our dataset. The automated labels were manually edited to generate ground truth segmentations. Network performance for our data and BRATS data was compared. Multivariable Cox regression analysis corrected for multiple testing using the false discovery rate was performed to correlate clinical and imaging variables with overall survival.

Results

Median Dice coefficients in our sample were (1) whole tumour 0.94 (IQR, 0.82–0.98) compared to 0.91 (IQR, 0.83–0.94 p = 0.012), (2) FLAIR region 0.84 (IQR, 0.63–0.95) compared to 0.81 (IQR, 0.69–0.8 p = 0.170), (3) contrast-enhancing region 0.91 (IQR, 0.74–0.98) compared to 0.83 (IQR, 0.78–0.89 p = 0.003) and (4) necrosis region were 0.82 (IQR, 0.47–0.97) compared to 0.67 (IQR, 0.42–0.81 p = 0.005). Contrast-enhancing region/tumour core ratio (HR 4.73 [95% CI, 1.67–13.40], corrected p = 0.017) and necrotic core/tumour core ratio (HR 8.13 [95% CI, 2.06–32.12], corrected p = 0.011) were independently associated with overall survival.

Conclusion

Semi-automated segmentation of glioblastoma using a convolutional neural network trained on independent data is robust when applied to routine clinical data. The segmented volumes have prognostic significance.

Segmentation of glioblastomas in early post-operative multi-modal MRI with deep neural networks

Article Open access 02 November 2023

Deep Convolutional Neural Networks Using U-Net for Automatic Brain Tumor Segmentation in Multimodal MRI Volumes

Brain Tumor Segmentation and Survival Prediction Using Automatic Hard Mining in 3D CNN Architecture

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

The median survival for glioblastoma remains approximately 12–15 months despite surgery and chemoradiation [1]. Extent of resection (EOR) is the only modifiable prognostic factor [1,2,3]. Other prognostic variables such as age, O⁶-methylguanine–DNA methyltransferase (MGMT) methylation and Karnofsky performance status (KPS) are not modifiable, and mutation status cannot be assessed preoperatively [1, 4]. Combining clinical variables with MRI features may be able to better prognosticate patients and stratify them for clinical trials—especially trials at the time of surgery when molecular markers are unknown [5, 6].

Radiogenomic analysis has linked MRI-based tumour subregions such as fluid-attenuated inversion recovery (FLAIR) volumes with genetic signatures of invasiveness and reduced survival [7]. Reproducible and accurate descriptors of imaging features are needed to identify prognostic biomarkers [8]. The Visually Accessible Rembrandt Images (VASARI) variables is the largest standardised dataset of imaging variables based on preoperative MRI [8, 9]. Qualitative VASARI descriptors when combined with manually segmented volumetric variables are independently associated with survival [9].

Manual segmentation of tumour volumes is time-consuming and suffers from high interobserver and intraobserver variability [10]. This limits their use in clinical practice. Computer-assisted methods reduce segmentation time and demonstrate good agreement with manual ground truth segmentations [10,11,12].

Convolutional neural networks (CNNs) are the state-of-the art computer-assisted method for glioblastoma segmentation [13]. They outperform alternative methods which use independent decision classifiers to extract texture and intensity features [14]. A pre-annotated dataset is used to train the CNN architecture to perform a series of mathematical convolutions through interdependent multiple layers. This determines the relationship between the input images and output images. The CNN can then be validated on different test datasets.

DeepMedic is a 3-dimensional (3D) CNN ranked highly in the Brain Tumour Segmentation Challenge (BRATS) [14]. DeepMedic assigns classes to each voxel independently using intensity and local feature information across image planes through two 11-layer convolutional pathways [14]. Each pathway samples different resolutions to lower computational cost [14]. DeepMedic has been shown to have robust segmentation accuracy of tumour subregions when tested and trained on MRI images performed at multiple institutions, with different protocols [15].

Automated segmentation of glioblastoma subregions has moderate agreement with their corresponding VASARI-derived semi-quantitative measures. However, measurement of certain tumour subregions is less accurate than other regions such as necrosis compared to the contrast-enhancing region (CER) [8]. In clinical practice, manual correction of segmentations generated from deep learning is more accurate than when the automated segmentations are used as full replacement for manual segmentations [15]. Generating curated training data for each dataset is also labour-intensive and time-consuming, requiring manual segmentations for images in each data cohort. Using publicly available data such as BRATS to train CNNs may enable these automated methods to be applicable across different datasets and institutions. We therefore aim to investigate whether DeepMedic trained on BRATS can be used for transfer learning, utilising automated segmentations as priors for manual correction. Previous studies have not examined the prognostic value of CNN-assisted volumetric measurements in combination with known prognostic variables. We test the validity of our semi-automated segmentation approach by correlating volumetric features of the resulting segmentations with survival.

Methods

Patient characteristics

All patients (≥ 18 years) diagnosed with histologically confirmed primary glioblastoma were identified from July 2016 to January 2018. Patients with previous glioma or cranial surgery were excluded.

Clinical characteristics were collected from electronic patient records. Motor deficit was defined as reduced power in any modality, and sensory deficit as reduced sensation or paraesthesia in any modality. Speech problems can present as receptive or expressive dysphasia. The operative records were used to obtain the following factors: American Association of Anesthesiologists (ASA) grade, use of 5-aminoleuvenic acid (5-ALA) and/or neurostimulation/awake surgery. Postoperative neurological deficit was recorded within 1 week from surgery.

Presence of isocitrate dehydrogenase (IDH) mutation and MGMT promoter methylation were recorded. MGMT promoter methylation was determined by pyrosequencing of the differentially methylated region 2 using a 10% cutoff value [16].

Patients were treated with either adjuvant chemoradiation (Stupp regimen), radiotherapy for symptom stabilisation or supportive care. The date of death was obtained from national patient records. The date of last follow-up was the time of query of NHS Spine (22/01/2019).

Patient demographics and imaging were anonymised prior to recording of research data. This study was approved by the local research ethics committee (Study ID: PRE.2017.040).

Image preprocessing

MRI protocols are shown in Supplementary Table A1. Each patient’s images were resampled to 1 mm³. T1-weighted, T2-weighted and T2-FLAIR images were co-registered to T1C image using the FSL linear image registration tool (FLIRT) with a mutual information algorithm and 6-degrees of freedom [17].

Brain extraction was performed using brain extraction tool (BET) for the T1C image [18]. Masks were edited and applied to the other sequences using voxel multiplication.

Processed images were registered to the same atlas (SRI24) used for BRATS with a 12 degree of freedom affine registration, mutual information algorithm with FSL FLIRT [17]. Noise reduction was performed using smallest univalue segment assimilating nucleus (SUSAN) [19]. Images were denoised and normalised to zero mean unit variance.

Tumour segmentation

The BRATS 2017 dataset contains 285 mixed-grade gliomas with expert-annotated manual segmentations of the (1) necrotic core (NC), (2) CER, (3) non-enhancing tumour (NET) and (4) peritumoural oedema (PTE) [20]. The definitions for manual segmentation are found in the Supplementary Data. DeepMedic can detect three tumour subregions: (1) FLAIR region, (2) CER and (3) NC region. Following the procedure from BRATS, automated segmentations of the whole-tumour (WT) region were created from T2-weighted sequences and FLAIR sequences whilst the CER region and NC regions were created from T1-weighted images. PTE and NET were manually delineated from the FLAIR region [15].

The primary architecture of DeepMedic consists of two main parallel pathways, each consists of four feature extraction layers with 53 kernels for feature extraction, as well as two fully connected layers and one final classification layer. The multi-scale processing of different input channels is handled using the dual pathway to achieve a large receptive field for the final classification, whilst the cost computation remains low. The first pathway operates on the original image, and the second one operates on a down-sampled version [14, 21].

To apply transfer learning on our dataset, DeepMedic was firstly trained on randomly chosen scans from BRATS dataset. The dataset was divided into 107 patients for training (n = 66,340 images) and 50 patients for validation (n = 31,000 images). This trained model was further tested on 50 further patients from BRATS prior to application to our test dataset. As we are evaluating the validity of applying a CNN trained on a different dataset to our test dataset, one author (YW) manually segmented our test images by correcting the automated segmentation labels derived from DeepMedic using 3D Slicer (Harvard Medical School) (Fig. 1) [22].

DeepMedic incorporates data augmentation to increase input data volumes as well as their complexity via reflection with respect to the mid-sagittal plane. Data shuffle is performed at the start of each epoch to avoid overfitting [23]. The hyper parameters remained the same as the original DeepMedic network proposed [14]. The network is regularised using dropout, 35 epochs and 5 batch sizes, with 5-fold cross-validation. The loss function used was negative log-likelihood.

Training was performed using an implementation of DeepMedic on Tensorflow, using an NVIDIA Titan Xp graphics card [24].

The delimitation of residue contrast-enhancing tumour was performed using T1 subtraction maps (ΔT1 map). This method improves the delineation of tumour by subtraction of contrast enhancement from blood products [25]. Their use has validated residue tumour volume (RTV) as a predictor of survival [25]. ΔT1 maps were created by voxel-by-voxel subtraction of the pre-processed T1 image from the T1C image (Fig. 2).

The comparative metrics used to assess the quality of the segmentations were the Dice coefficient and volumes of the segmentations. The Dice coefficient is a value between zero and one which presents the degree of overlap between two segmentations (one represents perfect overlap) [11]. For our test dataset, the post-manually edited tumour subregion segmentations were considered the ground truth for comparison with the automated labels. Differences between the automated and manually corrected segmentations volumes of each tumour subregion were also compared. For the BRATS test data (n = 50), we also computed the Dice coefficient between the automated labels generated from our model and the expert-annotated manual ground truths.

Statistical analysis

VASARI features were scored on preoperative imaging (Supplementary Table A2). Spearman rank correlation was used to compare the proportions of each subregion; EOR was calculated based on volumetric segmentations; EOR = CER − RTV/CER × 100%.

The primary outcome was overall survival (OS): difference between the date of death and the date of surgery. Cox proportional hazards models were used to identify significant factors associated with OS. Median follow-up time was calculated using the reverse KM method [26]. The Student t test and Wilcoxon rank-sum test were used to compare baseline characteristics between groups for continuous variables. The chi-square test was used for comparing categorical variables.

Variables with p < 0.2 in the bivariable regression models were included in the multivariable models. The Akaike information criterion (AIC) was used as a measure to compare the quality of the models. AIC scores compare the relative performance of a model based on the number of parameters and goodness of fit. The model with the lower AIC score explains the greatest variation using the least number of independent variables [27]. Multiple testing was controlled for using the false discovery rate. A false discovery rate adjusted p value (q-value) is the percentage of significant tests which will result in a false positive. Given the exploratory nature of the study, a threshold of 0.1 was chosen (sensitivity analysis showed that lowering the threshold to 0.05 did not change the results of significant variables). Statistical analysis used Stata version 14 (StataCorp. College Station, Texas).

Results

Patient, clinical and treatment characteristics

One hundred twenty cases were included (Table 1). Median follow-up time was 19.9 months (95% CI, 17.4–21.9 months). The median OS was 8.8 months (95% CI, 7.3–12.0 months). Median survival of patients undergoing resection was 8 months longer than patients undergoing biopsy (12.8 months vs 4.7 months, p < 0.001).

Table 1 Baseline characteristics of all patients (n = 120)

Full size table

The median age was 65.2 [57.1–70.6] years and 69 (57.5%) of patients were male. Complete resection of enhancing tumour (CRET) was achieved in 59 (49.2%) patients and partial resection of enhancing tumour (PRET) achieved in 16 (13.3%) patients. A biopsy was performed on 45 (37.5%) patients. The median EOR was high, 99% (IQR, 50.9–100%) with a median RTV of 0.47 cm³ (IQR, 0.09–0.73 cm³) for PRET tumours.

Due to differences between baseline characteristics, the resection and biopsy groups were analysed separately. Resection patients had larger CER and NC volumes whilst biopsy patients had a larger NET volume (Supplementary Table B1).

Comparison between automated and manually corrected segmentations

The network was able to detect WT, CER and NC in all patients in the BRATS test data and in 118 patients (98%) of our test dataset. Failure to localise the tumour was caused by inaccurate labelling due to T2 hyperintensities on the FLAIR sequence in two patients. The time taken for full manual segmentation in each of these two patients was approximately 40 min. Time taken for manual correction of the automated labels in the remaining patients was approximately 15 min. The training and test times were approximately 41 h (2475 min) and 36 min respectively in our data. Therefore, considering training, testing and correction time, our semi-automated segmentation method can reduce segmentation time by approximately 38 h (2289 min) compared to manual segmentation.

Comparison of segmentation outcomes between our BRATS test data (n = 50) and our sample data (n = 120) showed similar network performance in the two datasets. Median Dice coefficients in our sample were (1) WT 0.94 (IQR, 0.82–0.98), (2) FLAIR region 0.84 (IQR, 0.63–0.95) and (3) CER 0.91 (IQR, 0.74–0.98) and NC were 0.82 (IQR, 0.47–0.97). We observed significantly different Dice coefficients for WT 0.91 (IQR, 0.83–0.94 p = 0.012), CER 0.83 (IQR, 0.78–0.89 p = 0.003) and NC 0.67 (IQR, 0.42–0.81 p = 0.005) but not FLAIR region 0.81 (IQR, 0.69–0.8 p = 0.170) in the BRATS test data (Fig. 3).

The median tumour volume for post-corrected segmentations were WT 77.0 cm³ (IQR, 43.6–115.6), FLAIR region 41.5 cm³ (IQR, 26.0–68.7), CER 8.2 cm³ (IQR, 2.9–14.9) and NC 7.2 cm³ (IQR, 2.7–16.2). There were significant differences between automated segmentation volumes for WT 83.1 cm³ (IQR, 53.1–121.9 p < 0.001), FLAIR region 61.5 cm³ (IQR, 36.8–90.9 p < 0.001) and CER 9.3 cm³ (IQR, 4.1–17.3 p < 0.001) but not NC 6.2 cm³ (IQR, 1.6–17.7 p = 0.209) (Fig. 4).

Relationship between volumetric subregions

There was a positive correlation between the volume of tumour core and individual tumour subregions (p < 0.001), shown in Fig. 5.

PTE volume positively correlated with CER volume r = 0.42 (p < 0.001) and NC volume r = 0.37 (p < 0.001) but not NET volume r = 0.06 (p = 0.508). There was a negative correlation between CER volume and NET volume r = − 0.31 (p < 0.001) as well as between CER volume and NC volume r = 0.37 (p < 0.001). NET volume was not correlated with NC volume r = −0.107 (p = 0.247).

Subregion volumes were normalised by dividing by TC volume. TC volume positively correlated with NET/TC r = 0.22 (p < 0.05). There was a negative correlation between TC and CER/TC r = − 0.35 (p < 0.001) and TC with PTE/TC r = − 0.48 (p < 0.001). NC/TC did not correlate with TC volume r = 0.13 (p = 0.158).

PTE/NC ratio correlated with CER/TC r = 0.27 (p = 0.003) and was independent of NET/TC r = 0.10 (p = 0.285). NER/TC was independent of CER/TC r = − 0.16 (p = 0.080) and negatively correlated with NET/TC r = − 0.49 (p < 0.001).

Volumetric features associated with overall survival

Cox regression models were constructed for volumetric variables which were not significantly correlated with each other (Supplementary Table C1, C3 and C5). Analysis was performed for the entire cohort and for biopsy and resection patients separately (Table 2).

Table 2 Cox regression analysis

Full size table

For the whole cohort, clinical variables associated with OS included the following: age (HR 1.05 [95% CI, 1.02–1.08], p = 0.003), postoperative deficit (HR 3.42 [95% CI, 1.69–6.90], p = 0.001), CRET (HR 0.11 [95% CI, 0.03–0.38], p < 0.001), PRET (HR 0.23 [95% CI, 0.06–0.93], p < 0.039) and adjuvant chemoradiotherapy (HR 0.15 [95% CI, 0.06–0.33], p < 0.001). Occipital lobe location (HR 10.05 [95% CI, 1.27–79.66]. p = 0.029), speech motor cortex location (HR 2.63 [95% CI, 1.02–6.75], p = 0.045) and visual cortex location (HR 0.18 [95% CI, 0.05–0.61], p = 0.006) were associated with OS. Volumetric variables which were significantly associated with OS were as follows: CER/TC (HR 4.73 [95% CI, 1.67–13.40], p = 0.003) and NC/TC (HR 8.13 [95% CI, 2.06–32.12], p = 0.003. At a corrected critical value of q = 0.022, both CER/TC (q = 0.017) and NC/TC (q = 0.011) were significantly associated with OS.

The significant variables associated with OS for resection patients were as follows: age (HR 1.08 [95% CI, 1.03–1.14), p = 0.002), KPS, EOR (HR 3.69 × 10⁻⁴ [95% CI, 1.51 × 10⁻⁷–0.90], p = 0.047) and adjuvant chemoradiotherapy (HR 0.15 [95% CI, 0.03–0.85), p = 0.031). The only significant volumetric feature was PTE/NC (HR 1.05 [95% CI, 1.01–1.09], p = 0.020) but this was not significant after adjustment for multiple testing (adjusted critical q > 0.004).

For biopsy patients, the following volumetric variables were associated with OS: NET volume (HR 0.01 [95% CI, 0.002–0.23], p = 0.005), NET/WT (HR 0.01 [95% CI, 0.002–0.23], p = 0.005) and CER/TC (HR 13.88 [95% CI, 1.56–123.36], p = 0.018). Only NET/WT was significantly associated with OS following correction for multiple testing (q < 0.038).

MGMT methylation and IDH mutation were not significantly associated with OS in bivariable or multivariable analysis.

Discussion

In this study, we integrated a deep learning network into a clinically applicable processing pipeline for semi-automated measurement of glioblastoma volumetric features from preoperative MRI. A CNN was trained on publicly available BRATS data before testing on our routine clinical dataset. Final segmentation labels were generated by manual correction of the automated segmentations. Performance of the network was comparable between our clinical dataset and BRATS testing dataset when measured using the Dice coefficient. We further assessed the validity of our segmentation approach by evaluating the prognostic performance of volumetric features. Higher CER/TC and NC/TC were independently associated with higher risk of death overall. NET/WT was associated with lower risk of death in biopsy patients.

We demonstrate the possibility of transfer learning by segmenting tumour regions on a heterogeneous clinical dataset trained and tested on an independent dataset using multimodal imaging. A major advantage of our method is that it is applicable to data from different scanners and institutions. Furthermore, curating a large sample of uniform images for deep learning training is time- and labour-intensive [28]. Even when training time is considered, use of our computer-assisted segmentation method can reduce annotation time by 20 min per patient compared to fully manual segmentation. Training on an external dataset also increases the generalizability of the method and means data does not need to be split into training and validation cohorts, reducing sample sizes.

Deep learning methods can provide quantitative imaging-based prognostic biomarkers that outperform semi-quantitative estimates. In previous studies, tumour size has been investigated as a potential prognostic marker [29]. Tumour dimensions can be estimated using long axis diameter, cuboid, spheroid and ellipsoid formulas [30]. These methods are subjective and difficult to reproduce, leading to conflicting associations with survival [31, 32]. Formula-based estimates also have poor accordance with volumetric measurements [30].

We did not perform full manual segmentation as an approximate for ground truth to compare with our automated segmentations as the goal of our study was to determine if deep learning generated automated segmentations could be used to aid manual segmentation rather than as a replacement. Despite being trained on BRATS data, our network was able to detect the tumour in nearly all our patients and showed comparable segmentation accuracy in our dataset when evaluated against BRATS test data.

Our final segmentations were derived from corrections performed on the automated labels so it is expected that the label overlap for each subregion is significantly higher in our dataset compared to the BRATS data where no bias exists due to the manual segmentations being performed independently. Nonetheless, the relative accuracy of the segmentation labels across different tumour subregions was similar across the two test datasets. For example, the necrosis subregion had the highest proportion of misclassified voxels whilst the CER had the least proportion of misclassified voxels. This indicates that our manual corrections were dependent on the accuracy of the automated labels.

Few studies have compared accuracy of automated and manual segmentations of tumour subregions such as necrosis [15]. It is important to measure these subregions as they may have prognostic significance. Our data highlights the importance of choosing clinically relevant metrics to compare automated and manual segmentations. Two segmentations may have high Dice overlap but significantly different volumes if a smaller volume is entirely within the larger volume. There were significant differences between the automated and corrected segmentations for all tumour subregions apart from necrosis. Necrosis may be particularly challenging to segment using either automated or manual methods due to its heterogeneous and dispersed nature within the tumour core [15].

Necrosis was one of the earliest imaging markers found to have prognostic value [32]. Descriptive studies classifying tumours by estimating necrotic proportions have not been consistently prognostic [1]. In concordance with previous studies, we do not find a consistent association between necrosis volume measured by VASARI score and OS [33]. Instead, higher relative proportion of necrosis is associated with worse prognosis. NC/TC has been previously associated with worse survival [30, 34]. However, this study only included patients suitable for CRET, raising the possibility of selection bias. Furthermore, they did not control for qualitative imaging features described in the VASARI feature set in their multivariable analysis [34]. Hypoxia may select for quiescent stem-like cells around the NC which resist apoptosis and undergo proliferation [35].

Established prognostic variables including EOR calculated using our segmentations are associated with survival. The median OS in our cohort is only 9 months because compared to previous cohorts, we did not exclude patients undergoing biopsy [1, 30]. The median OS of 13 months in our resection patients is comparable to previous studies [1]. By including both resection and biopsy patients, our sample is representative of the heterogeneous cohorts of glioblastomas encountered in routine clinical practice. IDH mutation was not found to be associated with improved survival but this finding may be limited by the small number of IDH-mutant patients in our cohort. Our finding that MGMT methylation was not associated with improved survival is difficult to interpret. Not all studies demonstrate an association between MGMT methylation and improved OS, with several studies including a randomised control trial showing no association with survival [4, 36]. The reasons for this may be due to assay variability and difficulties correlating promoter methylation with protein expression [37].

The association between FLAIR proportion relative to other subregions and survival is unclear [38]. PTE/TC was not found to be associated with OS in a previous study [34]. Multiparametric biopsies have shown high levels of viable tumour cells within the non-enhancing region [39]. However, unlike our study, these studies did not differentiate oedema from non-enhancing tumour. The Dice coefficient between the automated and corrected FLAIR region segmentations was not significantly higher for our sample compared to BRATS as we were able to manually delineate NET from the FLAIR region.

Differentiating non-enhancing tumour from peritumoural oedema is important as each subregion may yield different prognostic information. The non-enhancing tumour within the FLAIR abnormality may represent lower grade disease [40]. We show that non-enhancing tumour proportion relative to the whole-tumour volume rather than FLAIR volume or peritumoural oedema volume was associated with improved survival in the biopsy group. This suggests that non-enhancing tumour was differentiated from peritumoural oedema by manual segmentation. Although our network was not able to detect non-enhancing tumour, we have shown high segmentation accuracy for the FLAIR region which can aid the manual delineation of the non-enhancing tumour.

Non-enhancing tumour variables were not prognostic in resection patients. This may be because the resection patients had significantly smaller volumes of non-enhancing tumour compared to the biopsy patients. The difficulty in delineating oedema from non-enhancing tumour may result in overlap between non-enhancing tumour and oedema in resection patients. We have shown that diffusion MRI signatures have higher sensitivity for invasive tumour can also be segmented using CNN [21, 41]. These biomarkers may be correlated with the FLAIR region to improve identification of infiltrative tumour.

CER/TC was associated with survival, independent of RTV and other prognostic factors. In radiogenomic studies, the CER correlates with genes involved in angiogenesis and hypoxia [42]. The relationship between the NC and the CER may not be linear; we show that NC/TC is independent of CER/TC and core tumour volume. This suggests that some tumours have relatively greater proportions of necrosis for a given tumour volume.

CER volume was negatively associated with survival in a cohort of resected glioblastomas but the accuracy of the volumetric measurements was limited by digitised hard-copy imaging [43]. CER volume has also been associated with worse survival when adjusted for other VASARI variables but in this study adjuvant treatment received was not controlled for [9]. We found that CER/TC rather than CER volume was significantly associated with survival. CER/TC may be a more accurate prognostic measure because it relates CER to the core tumour volume rather than whole-tumour volume as the oedema component may be affected by factors such as steroid use.

Limitations to our study include its retrospective nature. It is one of the largest studies investigating volumetric features and prognosis incorporating quantitative measurement of postoperative imaging and clinical variables. Future work to evaluate our approach should quantitatively compare segmentation measurements with manual segmentations from multiple observers to assess inter-rater variability [28]. In addition, an independent dataset is necessary for us to compare the relative prognostic performance of automated segmentations against manual segmentations and determine the reproducibility of our segmentations. Finally, volumetric features may be combined with texture- and shape-based analysis of tumour subregions to develop improved prognostic models for glioblastoma patients [44, 45].

Conclusions

Using a CNN with a transfer learning approach, we have shown that volumetric measurements of glioblastoma tumour subregions can be measured from preoperative MRI with high accuracy. The CNN can be integrated into a radiological workflow to significantly shorten segmentation time.

Tumours with greater proportions of necrosis and contrast enhancement are independently associated with worse survival whilst non-enhancing tumour proportion is associated with improved survival. With further validation, we may be able to use volumetric features from routine clinical imaging for patient prognostication and stratification into clinical trials.

Data availability

Code can be made available by request to the authors

Abbreviations

AIC:: Akaike information criterion
5-ALA:: 5-Aminoleuvenic acid
ASA:: American Association of Anesthesiologists
BRATS:: Multimodal brain tumour segmentation challenge
CER:: Contrast-enhancing region
CRET:: Complete resection of enhancing tumour
CNN:: Convolutional neural network
EOR:: Extent of resection
FOV:: Field of view
FLAIR:: Fluid attenuated inversion recovery
HR:: Hazard ratio
IDH:: Isocitrate dehydrogenase
KPS:: Karnofsky Performance Scale
PTE:: Peritumoural oedema
MGMT:: O⁶-Methylguanine–DNA methyltransferase
MRI:: Magnetic resonance imaging
NC:: Necrotic core
NET:: Non-enhancing tumour
OS:: Overall survival
PACS:: Picture archiving system
PRET:: Partial resection of enhancing tumour
RTV:: Residue tumour volume
SVZ:: Subventricular zone
TC:: Tumour core
VASARI:: Visually Accessible Rembrandt Images
WT:: Whole tumour

References

Lacroix M, Abi-Said D, Fourney DR et al (2001) A multivariate analysis of 416 patients with glioblastoma multiforme: prognosis, extent of resection, and survival. J Neurosurg 95(2):190–198
Article CAS Google Scholar
Chaichana KL, Cabrera-Aldana EE, Jusue-Torres I, Wijesekera O, Olivi A, Rahman M, Quinones-Hinojosa A (2014) When gross total resection of a glioblastoma is possible, how much resection should be achieved? World Neurosurg 82(1):e257–e265
Article Google Scholar
Sanai N, Polley M-Y, McDermott MW, Parsa AT, Berger MS (2011) An extent of resection threshold for newly diagnosed glioblastomas: clinical article. J Neurosurg 115(1):3–8
Article Google Scholar
Zhao Y-H, Wang Z-F, Cao C-J, Weng H, Xu C-S, Li K, Li J-L, Lan J, Zeng X-T, Li Z-Q (2018) The clinical significance of O6-methylguanine-DNA methyltransferase promoter methylation status in adult patients with glioblastoma: a meta-analysis. Front Neurol. https://doi.org/10.3389/fneur.2018.00127
Komotar RJ, Starke RM, Connolly ES, Sisti MB (2010) Evaluating the benefit of repeat surgery for recurrent glioblastoma multiforme. Neurosurgery 67(6):N16–N17
Article Google Scholar
Park JK, Hodges T, Arko L et al (2010) Scale to predict survival after surgery for recurrent glioblastoma multiforme. J Clin Oncol 28(24):3838–3843
Article Google Scholar
Zinn PO, Majadan B, Sathyan P, Singh SK, Majumder S, Jolesz FA, Colen RR (2011) Radiogenomic mapping of edema/cellular invasion MRI-phenotypes in glioblastoma multiforme. PLoS One. https://doi.org/10.1371/journal.pone.0025451
Rios Velazquez E, Meier R, Dunn WD Jr, Alexander B, Wiest R, Bauer S, Gutman DA, Reyes M, Aerts HJWL (2015) Fully automatic GBM segmentation in the TCGA-GBM dataset: prognosis and correlation with VASARI features. Sci Rep 5:16822
Article CAS Google Scholar
Wangaryattawanich P, Hatami M, Wang J et al (2015) Multicenter imaging outcomes study of the Cancer Genome Atlas glioblastoma patient cohort: imaging predictors of overall and progression-free survival. Neuro-Oncol 17(11):1525–1537
Article Google Scholar
Meier R, Porz N, Knecht U, Loosli T, Schucht P, Beck J, Slotboom J, Wiest R, Reyes M (2017) Automatic estimation of extent of resection and residual tumor volume of patients with glioblastoma. J Neurosurg 127(4):798–806
Article Google Scholar
Porz N, Bauer S, Pica A, Schucht P, Beck J, Verma RK, Slotboom J, Reyes M, Wiest R (2014) Multi-modal glioblastoma segmentation: man versus machine. PLoS One 9(5):e96873
Article Google Scholar
Porz N, Habegger S, Meier R et al (2016) Fully automated enhanced tumor compartmentalization: man vs. machine reloaded. PLoS One 11(11):e0165302
Article Google Scholar
Menze BH, Jakab A, Bauer S et al (2015) The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans Med Imaging 34(10):1993–2024
Article Google Scholar
Kamnitsas K, Ferrante E, Parisot S, Ledig C, Nori AV, Criminisi A, Rueckert D, Glocker B (2016) DeepMedic for brain tumor segmentation. In: Crimi A, Menze B, Maier O, Reyes M, Winzeck S, Handels H (eds) Brainlesion Glioma Mult. Scler. Stroke Trauma. Brain Inj. Springer International Publishing, Cham, pp 138–149
Google Scholar
Perkuhn M, Stavrinou P, Thiele F, Shakirin G, Mohan M, Garmpis D, Kabbasch C, Borggrefe J (2018) Clinical evaluation of a multiparametric deep learning model for glioblastoma segmentation using heterogeneous magnetic resonance imaging data from clinical routine. Invest Radiol 53(11):647–654
Article Google Scholar
Malley DS, Hamoudi RA, Kocialkowski S, Pearson DM, Collins VP, Ichimura K (2011) A distinct region of the MGMT CpG island critical for transcriptional regulation is preferentially methylated in glioblastoma cells and xenografts. Acta Neuropathol (Berl) 121(5):651–661
Article CAS Google Scholar
Jenkinson M, Smith S (2001) A global optimisation method for robust affine registration of brain images. Med Image Anal 5(2):143–156
Article CAS Google Scholar
Smith SM (2002) Fast robust automated brain extraction. Hum Brain Mapp 17(3):143–155
Article Google Scholar
Smith SM, Brady JM (1997) SUSAN—a new approach to low level image processing. Int J Comput Vis 23(1):45–78
Article Google Scholar
Bakas S, Reyes M, Jakab A, et al (2018) Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. ArXiv181102629 Cs Stat
Rahmat R, Saednia K, Haji Hosseini Khani MR, Rahmati M, Jena R, Price SJ (2020) Multi-scale segmentation in GBM treatment using diffusion tensor imaging. Comput Biol Med 123:103815
Fedorov A, Beichel R, Kalpathy-Cramer J et al (2012) 3D slicer as an image computing platform for the quantitative imaging network. Magn Reson Imaging 30(9):1323–1341
Article Google Scholar
Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012) Improving neural networks by preventing co-adaptation of feature detectors. ArXiv12070580 Cs
Abadi M, Barham P, Chen J et al (2016) TensorFlow: a system for large-scale machine learning. 12th USENIX Symp Oper Syst Des Implement 16:265–283
Google Scholar
Ellingson BM, Abrey LE, Nelson SJ et al (2018) Validation of postoperative residual contrast-enhancing tumor volume as an independent prognostic factor for overall survival in newly diagnosed glioblastoma. Neuro-Oncol 20(9):1240–1250
Article CAS Google Scholar
Schemper M, Smith TL (1996) A note on quantifying follow-up in studies of failure time. Control Clin Trials 17(4):343–346
Article CAS Google Scholar
Vrieze SI (2012) Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychol Methods 17(2):228–243
Article Google Scholar
Shapey J, Wang G, Dorent R et al (2019) An artificial intelligence framework for automatic segmentation and volumetry of vestibular schwannomas from contrast-enhanced T1-weighted and high-resolution T2-weighted MRI. J Neurosurg. https://doi.org/10.3171/2019.9.JNS191949
Reeves GI, Marks JE (1979) Prognostic significance of lesion size for glioblastoma multiforme. Radiology 132(2):469–472
Article CAS Google Scholar
Henker C, Kriesen T, Glass Ä, Schneider B, Piek J (2017) Volumetric quantification of glioblastoma: experiences with different measurement techniques and impact on survival. J Neuro-Oncol 135(2):391–402
Article CAS Google Scholar
Barajas RF, Phillips JJ, Parvataneni R et al (2012) Regional variation in histopathologic features of tumor specimens from treatment-naive glioblastoma correlates with anatomic and physiologic MR imaging. Neuro-Oncol 14(7):942–954
Article CAS Google Scholar
Hammoud MA, Sawaya R, Shi W, Thall PF, Leeds NE (1996) Prognostic significance of preoperative MRI scans in glioblastoma multiforme. J Neuro-Oncol 27(1):65–73
Article CAS Google Scholar
Gutman DA, Cooper LAD, Hwang SN et al (2013) MR imaging predictors of molecular profile and survival: multi-institutional study of the TCGA glioblastoma data set. Radiology 267(2):560–569
Article Google Scholar
Henker C, Hiepel MC, Kriesen T et al (2019) Volumetric assessment of glioblastoma and its predictive value for survival. Acta Neurochir 161(8):1723–1732
Article Google Scholar
Raza SM, Lang FF, Aggarwal BB, Fuller GN, Wildrick DM, Sawaya R (2002) Necrosis and Glioblastoma: a friend or a foe? A review and a hypothesis. Neurosurgery 51(1):2–13
Article Google Scholar
Clarke JL, Iwamoto FM, Sul J et al (2009) Randomized phase II trial of chemoradiotherapy followed by either dose-dense or metronomic temozolomide for newly diagnosed glioblastoma. J Clin Oncol 27(23):3861–3867
Article CAS Google Scholar
Weller M, Stupp R, Reifenberger G, Brandes AA, van den Bent MJ, Wick W, Hegi ME (2010) MGMT promoter methylation in malignant gliomas: ready for personalized medicine? Nat Rev Neurol 6(1):39–51
Article CAS Google Scholar
Lasocki A, Gaillard F, Tacey M, Drummond K, Stuckey S (2016) Incidence and prognostic significance of non-enhancing cortical signal abnormality in glioblastoma. J Med Imaging Radiat Oncol 60(1):66–73
Article Google Scholar
Eidel O, Burth S, Neumann J-O et al (2017) Tumor infiltration in enhancing and non-enhancing parts of glioblastoma: a correlation with histopathology. PLoS One. https://doi.org/10.1371/journal.pone.0169292
Lasocki A, Gaillard F (2019) Non-contrast-enhancing tumor: a new frontier in glioblastoma research. Am J Neuroradiol https://doi.org/10.3174/ajnrA6025
Price SJ, Jena R, Burnet NG, Hutchinson PJ, Dean AF, Peña A, Pickard JD, Carpenter TA, Gillard JH (2006) Improved delineation of glioma margins and regions of infiltration with the use of diffusion tensor imaging: an image-guided biopsy study. Am J Neuroradiol 27(9):1969–1974
CAS Google Scholar
Diehn M, Nardini C, Wang DS, McGovern S, Jayaraman M, Liang Y, Aldape K, Cha S, Kuo MD (2008) Identification of noninvasive imaging surrogates for brain tumor gene-expression modules. Proc Natl Acad Sci U S A 105(13):5213–5218
Article CAS Google Scholar
Iliadis G, Kotoula V, Chatzisotiriou A, Televantou D, Eleftheraki AG, Lambaki S, Misailidou D, Selviaridis P, Fountzilas G (2012) Volumetric and MGMT parameters in glioblastoma patients: survival analysis. BMC Cancer 12:3
Article CAS Google Scholar
Czarnek N, Clark K, Peters KB, Mazurowski MA (2017) Algorithmic three-dimensional analysis of tumor shape in MRI improves prognosis of survival in glioblastoma: a multi-institutional study. J Neuro-Oncol. https://doi.org/10.1007/s11060-016-2359-7
Li C, Wang S, Yan J-L, Torheim T, Boonzaier NR, Sinha R, Matys T, Markowetz F, Price SJ (2019) Characterizing tumor invasiveness of glioblastoma using multiparametric magnetic resonance imaging. J Neurosurg. https://doi.org/10.3171/2018.12.JNS182926

Download references

Funding

Stephen J. Price is funded by a National Institute for Health Research (NIHR), Career Development Fellowship, for this research project. This presents independent research funded by the National Institute for Health Research (NIHR). This report is an independent research supported by the NIHR Brain Injury MedTech Co-operative based at Cambridge University Hospitals NHS Foundation Trust and the University of Cambridge.

Author information

Authors and Affiliations

Division of Neurosurgery, Department of Clinical Neurosciences, University of Cambridge, Level 3 A Block Box 165, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
Yizhou Wan, Roushanak Rahmat & Stephen J. Price

Authors

Yizhou Wan
View author publications
You can also search for this author in PubMed Google Scholar
Roushanak Rahmat
View author publications
You can also search for this author in PubMed Google Scholar
Stephen J. Price
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Yizhou Wan contributed to study design, and performed the data collection and analysis. He also performed the statistical analysis.

Roushanak Rahmat contributed to study implementation and performed the convolutional neural network training and testing.

Stephen J. Price performed the surgeries and provided clinical and imaging data.

All authors contributed to writing, revisions and editing.

Corresponding author

Correspondence to Yizhou Wan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval

This study was approved by the local research ethics committee (Study ID: PRE.2017.040). The study was performed in accordance with the ethical standards as laid down in the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

Consent to participate

Patient demographics and imaging were anonymised prior to recording of research data for this retrospective study. Informed consent from individual patients was not required.

Consent to publish

This retrospective anonymised study did not require consent to publish from individual participants.

Disclaimer

The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on Brain Tumors

Electronic supplementary material

ESM 1

(DOC 238 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Wan, Y., Rahmat, R. & Price, S.J. Deep learning for glioblastoma segmentation using preoperative magnetic resonance imaging identifies volumetric features associated with survival. Acta Neurochir 162, 3067–3080 (2020). https://doi.org/10.1007/s00701-020-04483-7

Download citation

Received: 29 April 2020
Accepted: 02 July 2020
Published: 13 July 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s00701-020-04483-7

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Deep learning for glioblastoma segmentation using preoperative magnetic resonance imaging identifies volumetric features associated with survival

Abstract

Background

Methods

Results

Conclusion

Similar content being viewed by others

Segmentation of glioblastomas in early post-operative multi-modal MRI with deep neural networks

Deep Convolutional Neural Networks Using U-Net for Automatic Brain Tumor Segmentation in Multimodal MRI Volumes

Brain Tumor Segmentation and Survival Prediction Using Automatic Hard Mining in 3D CNN Architecture

Introduction

Methods

Patient characteristics

Image preprocessing

Tumour segmentation

Statistical analysis

Results

Patient, clinical and treatment characteristics

Comparison between automated and manually corrected segmentations

Relationship between volumetric subregions

Volumetric features associated with overall survival

Discussion

Conclusions

Data availability

Abbreviations

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Ethics approval

Consent to participate

Consent to publish

Disclaimer

Additional information

Publisher’s note

Electronic supplementary material

ESM 1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation