Introduction

The median survival for glioblastoma remains approximately 12–15 months despite surgery and chemoradiation [1]. Extent of resection (EOR) is the only modifiable prognostic factor [1,2,3]. Other prognostic variables such as age, O6-methylguanine–DNA methyltransferase (MGMT) methylation and Karnofsky performance status (KPS) are not modifiable, and mutation status cannot be assessed preoperatively [1, 4]. Combining clinical variables with MRI features may improve patient prognostication and stratification for clinical trials, especially trials at the time of surgery when molecular markers are unknown [5, 6].

Radiogenomic analysis has linked MRI-based tumour subregions such as fluid-attenuated inversion recovery (FLAIR) volumes with genetic signatures of invasiveness and reduced survival [7]. Reproducible and accurate descriptors of imaging features are needed to identify prognostic biomarkers [8]. The Visually Accessible Rembrandt Images (VASARI) feature set is the largest standardised set of imaging variables based on preoperative MRI [8, 9]. Qualitative VASARI descriptors, when combined with manually segmented volumetric variables, are independently associated with survival [9].

Manual segmentation of tumour volumes is time-consuming and suffers from high interobserver and intraobserver variability [10]. This limits its use in clinical practice. Computer-assisted methods reduce segmentation time and demonstrate good agreement with manual ground truth segmentations [10,11,12].

Convolutional neural networks (CNNs) are the state-of-the-art computer-assisted method for glioblastoma segmentation [13]. They outperform alternative methods which use independent decision classifiers to extract texture and intensity features [14]. A pre-annotated dataset is used to train the CNN architecture to perform a series of mathematical convolutions through multiple interdependent layers. This determines the relationship between the input images and output images. The CNN can then be validated on different test datasets.

DeepMedic is a 3-dimensional (3D) CNN ranked highly in the Brain Tumour Segmentation Challenge (BRATS) [14]. DeepMedic assigns classes to each voxel independently using intensity and local feature information across image planes through two 11-layer convolutional pathways [14]. Each pathway samples different resolutions to lower computational cost [14]. DeepMedic has been shown to have robust segmentation accuracy of tumour subregions when tested and trained on MRI images performed at multiple institutions, with different protocols [15].

Automated segmentation of glioblastoma subregions has moderate agreement with their corresponding VASARI-derived semi-quantitative measures. However, some tumour subregions are measured less accurately than others; for example, necrosis is measured less accurately than the contrast-enhancing region (CER) [8]. In clinical practice, manual correction of segmentations generated from deep learning is more accurate than using the automated segmentations as a full replacement for manual segmentations [15]. Generating curated training data for each dataset is also labour-intensive and time-consuming, requiring manual segmentations for images in each data cohort. Using publicly available data such as BRATS to train CNNs may enable these automated methods to be applicable across different datasets and institutions. We therefore aim to investigate whether DeepMedic trained on BRATS can be used for transfer learning, utilising automated segmentations as priors for manual correction. Previous studies have not examined the prognostic value of CNN-assisted volumetric measurements in combination with known prognostic variables. We test the validity of our semi-automated segmentation approach by correlating volumetric features of the resulting segmentations with survival.

Methods

Patient characteristics

All patients (≥ 18 years) diagnosed with histologically confirmed primary glioblastoma were identified from July 2016 to January 2018. Patients with previous glioma or cranial surgery were excluded.

Clinical characteristics were collected from electronic patient records. Motor deficit was defined as reduced power in any modality, and sensory deficit as reduced sensation or paraesthesia in any modality. Speech deficit was defined as receptive or expressive dysphasia. The operative records were used to obtain the following factors: American Society of Anesthesiologists (ASA) grade, use of 5-aminolevulinic acid (5-ALA) and/or neurostimulation/awake surgery. Postoperative neurological deficit was recorded within 1 week of surgery.

Presence of isocitrate dehydrogenase (IDH) mutation and MGMT promoter methylation were recorded. MGMT promoter methylation was determined by pyrosequencing of the differentially methylated region 2 using a 10% cutoff value [16].

Patients were treated with either adjuvant chemoradiation (Stupp regimen), radiotherapy for symptom stabilisation or supportive care. The date of death was obtained from national patient records. The date of last follow-up was the time of query of NHS Spine (22/01/2019).

Patient demographics and imaging were anonymised prior to recording of research data. This study was approved by the local research ethics committee (Study ID: PRE.2017.040).

Image preprocessing

MRI protocols are shown in Supplementary Table A1. Each patient’s images were resampled to 1 mm3. T1-weighted, T2-weighted and T2-FLAIR images were co-registered to the contrast-enhanced T1-weighted (T1C) image using the FSL linear image registration tool (FLIRT) with a mutual information algorithm and 6 degrees of freedom [17].

Brain extraction was performed on the T1C image using the FSL brain extraction tool (BET) [18]. Masks were edited and applied to the other sequences using voxel multiplication.

Processed images were registered to the same atlas (SRI24) used for BRATS using a 12-degrees-of-freedom affine registration with a mutual information algorithm in FSL FLIRT [17]. Noise reduction was performed using the smallest univalue segment assimilating nucleus (SUSAN) algorithm [19]. Images were then normalised to zero mean and unit variance.
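The final normalisation step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `zscore_normalise` is hypothetical, the image is represented as a flat list of voxel intensities, and computing the statistics only inside the brain mask is an assumption that the text does not specify.

```python
from statistics import fmean, pstdev


def zscore_normalise(intensities, brain_mask):
    """Rescale voxels inside the brain mask to zero mean and unit variance.

    `intensities` and `brain_mask` are flat, equal-length sequences.
    Voxels outside the mask are set to 0.0. Restricting the statistics
    to the mask is an assumption of this sketch.
    """
    if len(intensities) != len(brain_mask):
        raise ValueError("image and mask must have the same number of voxels")
    inside = [v for v, m in zip(intensities, brain_mask) if m]
    mean = fmean(inside)
    std = pstdev(inside)  # population standard deviation of in-mask voxels
    if std == 0:
        raise ValueError("constant image: cannot scale to unit variance")
    return [(v - mean) / std if m else 0.0
            for v, m in zip(intensities, brain_mask)]


# Toy example: three brain voxels and one background voxel.
normalised = zscore_normalise([10.0, 20.0, 30.0, 99.0], [1, 1, 1, 0])
```

In practice the same arithmetic would be applied to a 3D array per sequence; the flat-list form is used here only to keep the sketch dependency-free.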

Tumour segmentation

The BRATS 2017 dataset contains 285 mixed-grade gliomas with expert-annotated manual segmentations of the (1) necrotic core (NC), (2) CER, (3) non-enhancing tumour (NET) and (4) peritumoural oedema (PTE) [20]. The definitions for manual segmentation are found in the Supplementary Data. DeepMedic can detect three tumour subregions: (1) FLAIR region, (2) CER and (3) NC region. Following the procedure from BRATS, automated segmentations of the whole-tumour (WT) region were created from T2-weighted sequences and FLAIR sequences whilst the CER region and NC regions were created from T1-weighted images. PTE and NET were manually delineated from the FLAIR region [15].

The primary architecture of DeepMedic consists of two main parallel pathways, each consisting of four feature extraction layers with 5³ kernels, followed by two fully connected layers and one final classification layer. The dual pathway handles multi-scale processing of the input channels, achieving a large receptive field for the final classification whilst keeping the computational cost low. The first pathway operates on the original image, and the second operates on a down-sampled version [14, 21].

To apply transfer learning to our dataset, DeepMedic was first trained on randomly chosen scans from the BRATS dataset. The dataset was divided into 107 patients for training (n = 66,340 images) and 50 patients for validation (n = 31,000 images). This trained model was then tested on a further 50 patients from BRATS prior to application to our test dataset. As we are evaluating the validity of applying a CNN trained on a different dataset to our test dataset, one author (YW) manually segmented our test images by correcting the automated segmentation labels derived from DeepMedic using 3D Slicer (Harvard Medical School) (Fig. 1) [22].

Fig. 1

Example segmentation showing tumour subregions and imaging sequences. a T1-post-contrast. b T2 fluid-attenuated inversion recovery (FLAIR). c Whole-tumour mask. 1 = contrast-enhancing region = blue; 2 = necrotic core; 3 = peritumoural oedema. R = right. L = left

DeepMedic incorporates data augmentation via reflection with respect to the mid-sagittal plane, which increases the volume and variability of the input data. Data were shuffled at the start of each epoch to reduce overfitting [23]. The hyperparameters were the same as those of the original DeepMedic network [14]. The network was regularised using dropout and trained for 35 epochs with a batch size of 5, using 5-fold cross-validation. The loss function was negative log-likelihood.

Training was performed using an implementation of DeepMedic on Tensorflow, using an NVIDIA Titan Xp graphics card [24].

Residual contrast-enhancing tumour was delineated using T1 subtraction maps (ΔT1 maps). This method improves the delineation of tumour by separating contrast enhancement from blood products [25]. Its use has validated residual tumour volume (RTV) as a predictor of survival [25]. ΔT1 maps were created by voxel-by-voxel subtraction of the pre-processed T1 image from the T1C image (Fig. 2).

Fig. 2

Processing pipeline for segmenting residual enhancing tumour (radiological orientation). T1C = T1 contrast; T2 FLAIR = T2 fluid-attenuated inversion recovery; RTV = residual tumour volume

The comparative metrics used to assess the quality of the segmentations were the Dice coefficient and the volumes of the segmentations. The Dice coefficient is a value between zero and one which represents the degree of overlap between two segmentations (one represents perfect overlap) [11]. For our test dataset, the manually corrected tumour subregion segmentations were considered the ground truth for comparison with the automated labels. The volumes of the automated and manually corrected segmentations of each tumour subregion were also compared. For the BRATS test data (n = 50), we also computed the Dice coefficient between the automated labels generated from our model and the expert-annotated manual ground truths.
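As an illustration of the overlap metric, the Dice coefficient between two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The sketch below is not taken from the study's code: the function name `dice_coefficient` is hypothetical, masks are assumed to be flattened to equal-length sequences of 0/1 labels, and returning 1.0 when both masks are empty is a convention chosen here.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


# Toy example: the corrected label is a strict subset of the automated label,
# so the overlap is high even though the two volumes differ (4 vs 3 voxels).
automated = [1, 1, 1, 1, 0, 0]
corrected = [0, 1, 1, 1, 0, 0]
dice = dice_coefficient(automated, corrected)  # 2 * 3 / (4 + 3)
```

The toy example also shows why volumes are reported alongside Dice: one mask nested inside another can score well on overlap while the measured volumes still differ.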

Statistical analysis

VASARI features were scored on preoperative imaging (Supplementary Table A2). Spearman rank correlation was used to compare the proportions of each subregion. EOR was calculated from the volumetric segmentations as EOR = (CER − RTV)/CER × 100%.
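A short worked example of the EOR calculation is given below. It reads the formula as the resected fraction of the preoperative contrast-enhancing volume, (CER − RTV)/CER, expressed as a percentage. The function name `extent_of_resection` and the input volumes are hypothetical and chosen only for illustration.

```python
def extent_of_resection(cer_volume_cm3, rtv_cm3):
    """Extent of resection (%) from preoperative CER volume and residual volume.

    EOR = (CER - RTV) / CER * 100, with both volumes in the same units.
    """
    if cer_volume_cm3 <= 0:
        raise ValueError("preoperative CER volume must be positive")
    if rtv_cm3 < 0:
        raise ValueError("residual tumour volume cannot be negative")
    return (cer_volume_cm3 - rtv_cm3) / cer_volume_cm3 * 100.0


# Illustrative values only: a 10.0 cm3 enhancing tumour with 0.5 cm3 residual.
eor = extent_of_resection(10.0, 0.5)  # 95 % of enhancing tumour resected
```

With no residual enhancing tumour (RTV = 0) the function returns 100%, which corresponds to complete resection of enhancing tumour.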

The primary outcome was overall survival (OS), defined as the time between the date of surgery and the date of death. Cox proportional hazards models were used to identify significant factors associated with OS. Median follow-up time was calculated using the reverse Kaplan–Meier method [26]. The Student t test and Wilcoxon rank-sum test were used to compare baseline characteristics between groups for continuous variables. The chi-square test was used for comparing categorical variables.

Variables with p < 0.2 in the bivariable regression models were included in the multivariable models. The Akaike information criterion (AIC) was used to compare the quality of the models. AIC scores compare the relative performance of models based on the number of parameters and goodness of fit. The model with the lower AIC score explains the greatest variation using the fewest independent variables [27]. Multiple testing was controlled for using the false discovery rate. A false discovery rate adjusted p value (q-value) is the expected proportion of false positives among all tests declared significant at that threshold. Given the exploratory nature of the study, a threshold of 0.1 was chosen (sensitivity analysis showed that lowering the threshold to 0.05 did not change which variables were significant). Statistical analysis used Stata version 14 (StataCorp, College Station, Texas).
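The text does not name the procedure used to control the false discovery rate. The sketch below assumes the widely used Benjamini–Hochberg step-up procedure, in which the k-th smallest of m p-values is compared with the critical value (k/m) × FDR. The function name `benjamini_hochberg` and the p-values are hypothetical and are not results from this study.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Benjamini-Hochberg step-up procedure (assumed, not confirmed by the text).

    Returns a list of booleans, True where the corresponding test is
    declared significant at the given false discovery rate.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose p-value is at or below its critical value (k/m)*fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Made-up p-values for five candidate variables.
p_values = [0.003, 0.003, 0.2, 0.6, 0.04]
at_10_percent = benjamini_hochberg(p_values, fdr=0.1)
at_5_percent = benjamini_hochberg(p_values, fdr=0.05)
```

Running the procedure at both thresholds mirrors the kind of sensitivity analysis described above: a variable that is significant at 0.1 but not at 0.05 would be flagged as threshold-dependent.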

Results

Patient, clinical and treatment characteristics

One hundred twenty cases were included (Table 1). Median follow-up time was 19.9 months (95% CI, 17.4–21.9 months). The median OS was 8.8 months (95% CI, 7.3–12.0 months). Median survival of patients undergoing resection was 8 months longer than patients undergoing biopsy (12.8 months vs 4.7 months, p < 0.001).

Table 1 Baseline characteristics of all patients (n = 120)

The median age was 65.2 [57.1–70.6] years and 69 (57.5%) patients were male. Complete resection of enhancing tumour (CRET) was achieved in 59 (49.2%) patients and partial resection of enhancing tumour (PRET) in 16 (13.3%) patients. A biopsy was performed in 45 (37.5%) patients. The median EOR was high at 99% (IQR, 50.9–100%), with a median RTV of 0.47 cm3 (IQR, 0.09–0.73 cm3) for PRET tumours.

Due to differences between baseline characteristics, the resection and biopsy groups were analysed separately. Resection patients had larger CER and NC volumes whilst biopsy patients had a larger NET volume (Supplementary Table B1).

Comparison between automated and manually corrected segmentations

The network was able to detect WT, CER and NC in all patients in the BRATS test data and in 118 patients (98%) of our test dataset. Failure to localise the tumour was caused by inaccurate labelling due to T2 hyperintensities on the FLAIR sequence in two patients. The time taken for full manual segmentation in each of these two patients was approximately 40 min. Time taken for manual correction of the automated labels in the remaining patients was approximately 15 min. The training and test times were approximately 41 h (2475 min) and 36 min respectively in our data. Therefore, considering training, testing and correction time, our semi-automated segmentation method can reduce segmentation time by approximately 38 h (2289 min) compared to manual segmentation.

Comparison of segmentation outcomes between our BRATS test data (n = 50) and our sample data (n = 120) showed similar network performance in the two datasets. Median Dice coefficients in our sample were WT 0.94 (IQR, 0.82–0.98), FLAIR region 0.84 (IQR, 0.63–0.95), CER 0.91 (IQR, 0.74–0.98) and NC 0.82 (IQR, 0.47–0.97). In the BRATS test data, Dice coefficients were significantly different for WT 0.91 (IQR, 0.83–0.94; p = 0.012), CER 0.83 (IQR, 0.78–0.89; p = 0.003) and NC 0.67 (IQR, 0.42–0.81; p = 0.005) but not for the FLAIR region 0.81 (IQR, 0.69–0.8; p = 0.170) (Fig. 3).

Fig. 3

Box and whisker plots (IQR) of WT, FLAIR, CER and NC regions Dice coefficients between the BRATS test data (BRATS) and sample data (Test). CER = contrast-enhancing region; FLAIR = fluid-attenuated inversion recovery; NC = necrotic core; WT = whole tumour

The median tumour volumes for the corrected segmentations were WT 77.0 cm3 (IQR, 43.6–115.6), FLAIR region 41.5 cm3 (IQR, 26.0–68.7), CER 8.2 cm3 (IQR, 2.9–14.9) and NC 7.2 cm3 (IQR, 2.7–16.2). The automated segmentation volumes differed significantly from these for WT 83.1 cm3 (IQR, 53.1–121.9; p < 0.001), FLAIR region 61.5 cm3 (IQR, 36.8–90.9; p < 0.001) and CER 9.3 cm3 (IQR, 4.1–17.3; p < 0.001) but not for NC 6.2 cm3 (IQR, 1.6–17.7; p = 0.209) (Fig. 4).

Fig. 4

Box and whisker plots (IQR) comparing tumour subregion volume between corrected and automated segmentations for WT, FLAIR, CER and NC regions. CER = contrast-enhancing region; FLAIR = fluid-attenuated inversion recovery; NC = necrotic core; WT = whole tumour

Relationship between volumetric subregions

There was a positive correlation between the volume of tumour core and individual tumour subregions (p < 0.001), shown in Fig. 5.

Fig. 5

Scatterplots showing correlation between tumour subregions and tumour core volume. The correlation was strongest for contrast enhancing and necrosis (r = 0.65) volume (r = 0.42) (b and d) whilst there was a weaker correlation between oedema (r = 0.37) and non-enhancing (r = 0.36) volume (a and c)

PTE volume positively correlated with CER volume r = 0.42 (p < 0.001) and NC volume r = 0.37 (p < 0.001) but not NET volume r = 0.06 (p = 0.508). There was a negative correlation between CER volume and NET volume r = − 0.31 (p < 0.001) as well as between CER volume and NC volume r = 0.37 (p < 0.001). NET volume was not correlated with NC volume r = −0.107 (p = 0.247).

Subregion volumes were normalised by dividing by TC volume. TC volume positively correlated with NET/TC r = 0.22 (p < 0.05). There was a negative correlation between TC and CER/TC r = − 0.35 (p < 0.001) and TC with PTE/TC r = − 0.48 (p < 0.001). NC/TC did not correlate with TC volume r = 0.13 (p = 0.158).
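The normalisation and rank correlation described here can be sketched as follows. This is a dependency-free illustration, not the study's Stata analysis: the helper names `average_ranks`, `pearson` and `spearman_rho` are hypothetical, the volumes are made up, and Spearman's coefficient is computed as the Pearson correlation of average ranks (ties share the mean of their ranks).

```python
from math import sqrt
from statistics import fmean


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up volumes (cm3) for four tumours.
tc_volume = [10.0, 20.0, 30.0, 40.0]
cer_volume = [6.0, 10.0, 12.0, 12.0]
# Normalise the subregion volume by the tumour core (TC) volume.
cer_over_tc = [cer / tc for cer, tc in zip(cer_volume, tc_volume)]
rho = spearman_rho(tc_volume, cer_over_tc)
```

In this toy data the CER volume grows more slowly than the tumour core, so the CER/TC ratio falls as TC volume rises and the rank correlation is negative, the same direction as the relationship reported above.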

PTE/NC ratio correlated with CER/TC r = 0.27 (p = 0.003) and was independent of NET/TC r = 0.10 (p = 0.285). NER/TC was independent of CER/TC r = − 0.16 (p = 0.080) and negatively correlated with NET/TC r = − 0.49 (p < 0.001).

Volumetric features associated with overall survival

Cox regression models were constructed for volumetric variables which were not significantly correlated with each other (Supplementary Table C1, C3 and C5). Analysis was performed for the entire cohort and for biopsy and resection patients separately (Table 2).

Table 2 Cox regression analysis

For the whole cohort, clinical variables associated with OS included the following: age (HR 1.05 [95% CI, 1.02–1.08], p = 0.003), postoperative deficit (HR 3.42 [95% CI, 1.69–6.90], p = 0.001), CRET (HR 0.11 [95% CI, 0.03–0.38], p < 0.001), PRET (HR 0.23 [95% CI, 0.06–0.93], p < 0.039) and adjuvant chemoradiotherapy (HR 0.15 [95% CI, 0.06–0.33], p < 0.001). Occipital lobe location (HR 10.05 [95% CI, 1.27–79.66], p = 0.029), speech motor cortex location (HR 2.63 [95% CI, 1.02–6.75], p = 0.045) and visual cortex location (HR 0.18 [95% CI, 0.05–0.61], p = 0.006) were associated with OS. Volumetric variables which were significantly associated with OS were as follows: CER/TC (HR 4.73 [95% CI, 1.67–13.40], p = 0.003) and NC/TC (HR 8.13 [95% CI, 2.06–32.12], p = 0.003). At a corrected critical value of q = 0.022, both CER/TC (q = 0.017) and NC/TC (q = 0.011) were significantly associated with OS.

The significant variables associated with OS for resection patients were as follows: age (HR 1.08 [95% CI, 1.03–1.14], p = 0.002), KPS, EOR (HR 3.69 × 10⁻⁴ [95% CI, 1.51 × 10⁻⁷–0.90], p = 0.047) and adjuvant chemoradiotherapy (HR 0.15 [95% CI, 0.03–0.85], p = 0.031). The only significant volumetric feature was PTE/NC (HR 1.05 [95% CI, 1.01–1.09], p = 0.020) but this was not significant after adjustment for multiple testing (adjusted critical q > 0.004).

For biopsy patients, the following volumetric variables were associated with OS: NET volume (HR 0.01 [95% CI, 0.002–0.23], p = 0.005), NET/WT (HR 0.01 [95% CI, 0.002–0.23], p = 0.005) and CER/TC (HR 13.88 [95% CI, 1.56–123.36], p = 0.018). Only NET/WT was significantly associated with OS following correction for multiple testing (q < 0.038).

MGMT methylation and IDH mutation were not significantly associated with OS in bivariable or multivariable analysis.

Discussion

In this study, we integrated a deep learning network into a clinically applicable processing pipeline for semi-automated measurement of glioblastoma volumetric features from preoperative MRI. A CNN was trained on publicly available BRATS data before testing on our routine clinical dataset. Final segmentation labels were generated by manual correction of the automated segmentations. Performance of the network was comparable between our clinical dataset and BRATS testing dataset when measured using the Dice coefficient. We further assessed the validity of our segmentation approach by evaluating the prognostic performance of volumetric features. Higher CER/TC and NC/TC were independently associated with higher risk of death overall. NET/WT was associated with lower risk of death in biopsy patients.

We demonstrate the possibility of transfer learning by segmenting tumour regions on a heterogeneous clinical dataset using a network trained and tested on an independent dataset with multimodal imaging. A major advantage of our method is that it is applicable to data from different scanners and institutions. Furthermore, curating a large sample of uniform images for deep learning training is time- and labour-intensive [28]. Even when training time is considered, use of our computer-assisted segmentation method can reduce annotation time by 20 min per patient compared to fully manual segmentation. Training on an external dataset also increases the generalisability of the method and means that the data do not need to be split into training and validation cohorts, which would reduce the sample size available for analysis.

Deep learning methods can provide quantitative imaging-based prognostic biomarkers that outperform semi-quantitative estimates. In previous studies, tumour size has been investigated as a potential prognostic marker [29]. Tumour dimensions can be estimated using long axis diameter, cuboid, spheroid and ellipsoid formulas [30]. These methods are subjective and difficult to reproduce, leading to conflicting associations with survival [31, 32]. Formula-based estimates also have poor accordance with volumetric measurements [30].

We did not perform full manual segmentation as an approximation of ground truth for comparison with our automated segmentations, as the goal of our study was to determine whether automated segmentations generated by deep learning could be used to aid manual segmentation rather than replace it. Despite being trained on BRATS data, our network was able to detect the tumour in nearly all our patients and showed comparable segmentation accuracy in our dataset when evaluated against BRATS test data.

Our final segmentations were derived from corrections performed on the automated labels, so the label overlap for each subregion is expected to be significantly higher in our dataset than in the BRATS data, where the manual segmentations were performed independently and no such bias exists. Nonetheless, the relative accuracy of the segmentation labels across different tumour subregions was similar across the two test datasets. For example, the necrosis subregion had the highest proportion of misclassified voxels whilst the CER had the lowest proportion of misclassified voxels. This indicates that our manual corrections were dependent on the accuracy of the automated labels.

Few studies have compared accuracy of automated and manual segmentations of tumour subregions such as necrosis [15]. It is important to measure these subregions as they may have prognostic significance. Our data highlights the importance of choosing clinically relevant metrics to compare automated and manual segmentations. Two segmentations may have high Dice overlap but significantly different volumes if a smaller volume is entirely within the larger volume. There were significant differences between the automated and corrected segmentations for all tumour subregions apart from necrosis. Necrosis may be particularly challenging to segment using either automated or manual methods due to its heterogeneous and dispersed nature within the tumour core [15].

Necrosis was one of the earliest imaging markers found to have prognostic value [32]. Descriptive studies classifying tumours by estimating necrotic proportions have not been consistently prognostic [1]. In concordance with previous studies, we do not find a consistent association between necrosis volume measured by VASARI score and OS [33]. Instead, higher relative proportion of necrosis is associated with worse prognosis. NC/TC has been previously associated with worse survival [30, 34]. However, this study only included patients suitable for CRET, raising the possibility of selection bias. Furthermore, they did not control for qualitative imaging features described in the VASARI feature set in their multivariable analysis [34]. Hypoxia may select for quiescent stem-like cells around the NC which resist apoptosis and undergo proliferation [35].

Established prognostic variables including EOR calculated using our segmentations are associated with survival. The median OS in our cohort is only 9 months because, unlike previous cohorts, we did not exclude patients undergoing biopsy [1, 30]. The median OS of 13 months in our resection patients is comparable to previous studies [1]. By including both resection and biopsy patients, our sample is representative of the heterogeneous cohorts of glioblastomas encountered in routine clinical practice. IDH mutation was not found to be associated with improved survival but this finding may be limited by the small number of IDH-mutant patients in our cohort. Our finding that MGMT methylation was not associated with improved survival is difficult to interpret. Not all studies demonstrate an association between MGMT methylation and improved OS, with several studies including a randomised controlled trial showing no association with survival [4, 36]. This may be due to assay variability and difficulties correlating promoter methylation with protein expression [37].

The association between FLAIR proportion relative to other subregions and survival is unclear [38]. PTE/TC was not found to be associated with OS in a previous study [34]. Multiparametric biopsies have shown high levels of viable tumour cells within the non-enhancing region [39]. However, unlike our study, these studies did not differentiate oedema from non-enhancing tumour. The Dice coefficient between the automated and corrected FLAIR region segmentations was not significantly higher for our sample compared to BRATS as we were able to manually delineate NET from the FLAIR region.

Differentiating non-enhancing tumour from peritumoural oedema is important as each subregion may yield different prognostic information. The non-enhancing tumour within the FLAIR abnormality may represent lower grade disease [40]. We show that non-enhancing tumour proportion relative to the whole-tumour volume rather than FLAIR volume or peritumoural oedema volume was associated with improved survival in the biopsy group. This suggests that non-enhancing tumour was differentiated from peritumoural oedema by manual segmentation. Although our network was not able to detect non-enhancing tumour, we have shown high segmentation accuracy for the FLAIR region which can aid the manual delineation of the non-enhancing tumour.

Non-enhancing tumour variables were not prognostic in resection patients. This may be because the resection patients had significantly smaller volumes of non-enhancing tumour compared to the biopsy patients. The difficulty in delineating oedema from non-enhancing tumour may result in overlap between non-enhancing tumour and oedema in resection patients. We have shown that diffusion MRI signatures, which have higher sensitivity for invasive tumour, can also be segmented using a CNN [21, 41]. These biomarkers may be correlated with the FLAIR region to improve identification of infiltrative tumour.

CER/TC was associated with survival, independent of RTV and other prognostic factors. In radiogenomic studies, the CER correlates with genes involved in angiogenesis and hypoxia [42]. The relationship between the NC and the CER may not be linear; we show that NC/TC is independent of CER/TC and core tumour volume. This suggests that some tumours have relatively greater proportions of necrosis for a given tumour volume.

CER volume was negatively associated with survival in a cohort of resected glioblastomas but the accuracy of the volumetric measurements was limited by digitised hard-copy imaging [43]. CER volume has also been associated with worse survival when adjusted for other VASARI variables but in this study adjuvant treatment received was not controlled for [9]. We found that CER/TC rather than CER volume was significantly associated with survival. CER/TC may be a more accurate prognostic measure because it relates CER to the core tumour volume rather than whole-tumour volume as the oedema component may be affected by factors such as steroid use.

Limitations of our study include its retrospective nature. Nevertheless, it is one of the largest studies investigating volumetric features and prognosis that incorporates quantitative measurement of postoperative imaging and clinical variables. Future work to evaluate our approach should quantitatively compare segmentation measurements with manual segmentations from multiple observers to assess inter-rater variability [28]. In addition, an independent dataset is necessary for us to compare the relative prognostic performance of automated segmentations against manual segmentations and determine the reproducibility of our segmentations. Finally, volumetric features may be combined with texture- and shape-based analysis of tumour subregions to develop improved prognostic models for glioblastoma patients [44, 45].

Conclusions

Using a CNN with a transfer learning approach, we have shown that volumetric measurements of glioblastoma tumour subregions can be obtained from preoperative MRI with high accuracy. The CNN can be integrated into a radiological workflow to significantly shorten segmentation time.

Tumours with greater proportions of necrosis and contrast enhancement are independently associated with worse survival whilst non-enhancing tumour proportion is associated with improved survival. With further validation, we may be able to use volumetric features from routine clinical imaging for patient prognostication and stratification into clinical trials.