Introduction

Glioblastoma (GBM) represents 48.3% of all malignant primary brain tumors [1]. Despite advances in both treatment and biological understanding, prognosis remains poor. Other than the modest benefit demonstrated by the addition of temozolomide to radiotherapy, and of tumor treating fields (TTFields) therapy to chemoradiotherapy, modern-day regimens have not significantly improved overall survival in the past 40 years [2,3,4,5]. According to a National Cancer Database study, long-term survivorship (over three years) in those with GBM is only ~9% [6].

Extent of resection, age at diagnosis, Karnofsky performance status (KPS), O-6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status, and presence of an IDH1 or IDH2 mutation are well-validated prognostic factors [7,8,9]. More recently, sex has been shown to be an important prognostic factor for GBM, with better survival outcomes observed in females [6, 10]. Males also have a higher incidence of GBM than females [1]. Transcriptome analysis has suggested the existence of sex-specific molecular subtypes of GBM, indicating that the biological differences in disease likely extend beyond basic hormonal differences [11].

Currently, two nomograms have been developed for predicting 6-, 12-, and 24-month survival in GBM patients generally and in isocitrate dehydrogenase (IDH) wildtype GBM patients specifically [12, 13]. These nomograms use various demographic and biological factors, including patient sex, as survival predictor variables. We hypothesized that a sex-specific analysis may result in a more accurate survival prediction nomogram, because sex was found to be a significant predictor of survival in those analyses. The purpose of this study was to develop and independently validate sex-specific nomograms for estimation of individualized survival probabilities for GBM patients. We utilized data from 2 independent, recent, and non-overlapping NRG Oncology (formerly RTOG) clinical trials, NRG/RTOG 0525 and NRG/RTOG 0825 [14, 15].

Methods

Study population

Exempt approval was obtained from the University Hospitals Institutional Review Board (IRB) for all analyses presented. De-identified data were provided by NRG Oncology for the clinical trials NRG/RTOG 0525 and NRG/RTOG 0825, for which written informed consent was obtained from each study subject under IRB-approved protocols at each participating NRG study site [14, 15]. NRG/RTOG 0525 enrolled patients from January 2006 through June 2008; NRG/RTOG 0825 from April 2009 through May 2011. The two trials included information on 831 and 620 randomized patients with newly-diagnosed GBM, respectively. For each patient, the following variables were obtained: survival/follow-up time in months, survival status (dead or alive), progression-free survival time in months, progression-free survival status (no progression or progressed/dead), age at diagnosis (continuous), race (white, black, or other), sex (male or female), KPS (70, 80, 90, or 100), extent of resection (total/gross, subtotal, or other), MGMT promoter methylation status (promoter unmethylated or methylated), total number (0, 1, or ≥ 2) of comorbidities (heart problems, lung problems, high blood pressure, bleeding problems, circulation problems, diabetes, kidney/urine problems, stroke, thyroid problems, seizure, psychological problems), location of tumor within brain (frontal, temporal, parietal, occipital or multiple), laterality (right, left or bilateral) and use of corticosteroids (yes/no; patients had to have received a stable or decreasing dose for the 5 days before study registration). The "other" category of extent of resection included unknown, biopsy, debulking, craniotomy, etc. Overall, 88 patients with unknown MGMT promoter methylation status and 6 with unknown laterality were excluded from this analysis.

Statistical analysis

Descriptive statistics were used to assess any differences in patient characteristics and prognostic factors by sex, using t-tests for continuous variables and chi-square tests for categorical variables. Non-parametric equivalents were used as appropriate. The analyses were performed using NRG/RTOG 0525 as the training dataset and NRG/RTOG 0825 as the validation dataset. Both overall survival (OS) and progression-free survival (PFS) were examined in each trial dataset using the Kaplan–Meier method and were compared by sex using the log-rank test. Upon examination of the Schoenfeld residuals by sex, the proportional hazards assumption was not violated for any of the analyses by sex.
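The Kaplan–Meier estimate and the two-group log-rank comparison described above can be illustrated with a minimal Python sketch. This is not the code used in the study (the analyses were run in R); the function names and the toy follow-up times are assumptions made for illustration, and the p-value uses the 1-degree-of-freedom chi-square approximation.

```python
from collections import Counter
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up in months; events: 1 = event observed, 0 = censored.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # everyone leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns the chi-square (1 df) p-value."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({u for u, e, _ in data if e}):
        n = sum(1 for u, _, _ in data if u >= t)                  # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)     # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)            # events, both groups
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    return erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


# Toy data (made up): months of follow-up and event indicators.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve, median_survival(curve))
```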

In the initial phase of nomogram development, to select prognostic factors, we fit a multivariable Cox proportional hazards model by sex for both OS and PFS to the training set (0525). A previous publication found Cox models to be superior for survival prediction on these datasets and reported a multivariable Cox model with sex as a variable [12]. In the first step, a model was fit that included every candidate survival predictor variable. In each subsequent step, variables were removed one at a time (backward selection), the model with the smallest Akaike information criterion (AIC) score was chosen, and the model was refit with the remaining variables. This process was repeated until removing any further variable would increase the AIC score. Criterion-based methods such as AIC are preferred because they involve a wider search and compare models in a preferable manner [16, 17]. The proportional hazards and linearity assumptions were examined using Schoenfeld and Martingale residuals. None of the variables included in the final model appeared to violate these assumptions. The variables retained by each sex-specific Cox model on the training set (NRG/RTOG 0525) were used as the survival predictors to build nomograms for OS and PFS, and the final selected models were independently validated using the data from NRG/RTOG 0825.
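The backward selection loop described above can be sketched in Python as follows. The sketch is generic: `aic_of` stands in for "refit the Cox model on this set of variables and return its AIC", and the toy scoring function and its per-variable log-likelihood gains are invented for illustration only. They are not values from this study.

```python
def backward_select(candidates, aic_of):
    """Backward elimination by AIC.

    candidates: iterable of variable names in the full model.
    aic_of: callable taking a frozenset of names and returning the AIC of
            the model refit on exactly those variables.
    Returns (sorted list of retained variables, AIC of the final model).
    """
    current = frozenset(candidates)
    current_aic = aic_of(current)
    while current:
        # Refit once per variable, each time with that variable removed.
        trials = [(aic_of(current - {name}), name) for name in sorted(current)]
        best_aic, dropped = min(trials)
        if best_aic >= current_aic:
            break  # removing any variable would not lower the AIC
        current, current_aic = current - {dropped}, best_aic
    return sorted(current), current_aic


# Hypothetical log-likelihood gain contributed by each variable (made up).
GAIN = {"age": 12.0, "kps": 6.0, "mgmt": 15.0, "race": 0.3, "laterality": 0.1}


def toy_aic(variables):
    """AIC = 2k - 2 * log-likelihood for the toy model."""
    log_likelihood = -500.0 + sum(GAIN[name] for name in variables)
    return 2 * len(variables) - 2 * log_likelihood


selected, final_aic = backward_select(GAIN, toy_aic)
print(selected, final_aic)
```

In this toy setup a variable is dropped only when its log-likelihood gain is smaller than the AIC penalty of one parameter, which is the trade-off the criterion encodes.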

Calibration of the final models by sex for both OS and PFS, in both the training and validation datasets, was visually evaluated by assigning all patients to quintiles of the nomogram-predicted survival probabilities and plotting the mean nomogram-predicted survival probability against the Kaplan–Meier estimated survival for each quintile. A user-friendly online application to obtain individualized predicted survival probabilities by sex was developed and can be found at https://npatilshinyappcalculator.shinyapps.io/SexDifferencesInGBM/. All analyses were performed using R v3.6.0 (http://www.r-project.org/), and the online application was developed using R Shiny.
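The quintile-based calibration check can be sketched as below. This is an illustrative Python version under stated assumptions, not the study's R code: the function names are hypothetical, the predicted probabilities are assumed to come from an already fitted model, and the toy data are made up.

```python
def km_survival_at(times, events, horizon):
    """Kaplan-Meier estimate of survival at `horizon` (same time units as `times`)."""
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
    return surv


def calibration_by_quintile(predicted, times, events, horizon, groups=5):
    """Return [(mean predicted survival, Kaplan-Meier observed survival)] per group.

    Patients are ranked by predicted survival probability at `horizon` and
    split into `groups` bins of (nearly) equal size.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue  # fewer patients than groups
        mean_predicted = sum(predicted[i] for i in idx) / len(idx)
        observed = km_survival_at(
            [times[i] for i in idx], [events[i] for i in idx], horizon
        )
        rows.append((mean_predicted, observed))
    return rows


# Toy data (made up): predicted 12-month survival, follow-up in months, event flag.
predicted = [0.1, 0.1, 0.3, 0.3, 0.5, 0.5, 0.7, 0.7, 0.9, 0.9]
times = [6, 6, 6, 20, 6, 20, 6, 20, 20, 20]
events = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
for mean_predicted, observed in calibration_by_quintile(predicted, times, events, 12):
    print(round(mean_predicted, 2), round(observed, 2))
```

Plotting each pair against the diagonal gives the visual check described above: points near the diagonal indicate that predicted and observed survival agree within that quintile.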

Results

Patient characteristics

In both trials, treatment either did not affect primary outcomes (OS and PFS) or the outcomes did not reach the prespecified improvement target; therefore, the data from both studies were used in this analysis (1,359 patients in total across both trials). The comparison of patient characteristics between the trials is shown in Supplemental Table 1. Table 1 shows the patient characteristics by sex and by trial. The proportions of males and females were similar in both trials (57.7% vs 60.3% males and 42.3% vs 39.7% females for NRG/RTOG 0525 and NRG/RTOG 0825, respectively). Males tended to have higher KPS scores, poorer OS, poorer PFS, and more cardiac comorbidities. Tumor location and laterality did not differ significantly by sex. Extent of resection (EOR) also did not differ significantly by sex. The largest group of patients included in this analysis had no comorbidities (45.9%), and there was no significant difference in total number of comorbidities by sex (Table 1).

Table 1 Patient characteristics by NRG Oncology Trial and sex

Survival by the Kaplan–Meier method

Kaplan–Meier curves were generated for OS and PFS for both NRG/RTOG 0525, the training dataset (Fig. 1 Panels A and B) and NRG/RTOG 0825 (Fig. 1 Panels C and D), the validation dataset. In the training dataset, females had a median survival of 17.9 months (16.4–20.1), which differed significantly from male OS of 13.8 months (12.4–14.9) (log rank p = 0.003). Males also had poorer PFS of 5.8 months (5.4–6.4) compared to female PFS of 6.4 months (5.8–8.3) but this was not significant (log rank p = 0.06). In the validation dataset, females had a significantly greater median survival of 16.9 months (15.2–19.8) compared to male median survival of 15.7 months (14.5–16.6, log rank p = 0.03). The PFS was significantly different between females (10.3 months, 8.7–12.3) and males (8.9 months, 7.8–9.9, log rank p = 0.03). These differences in the median survival were unadjusted estimates.

Fig. 1

Kaplan–Meier Survival Results by Sex for Overall and Progression-Free Survival Using Training (NRG/RTOG 0525) (A and B) and Validation (NRG/RTOG 0825) (C and D) datasets

Sex differences in survival

The overall Cox model by sex with the variables selected in the final model is shown in Table 2 for OS and Supplemental Table 4 for PFS. Based on the AIC criterion, age at diagnosis, KPS, MGMT status and location of tumor were common significant predictors of survival for both sexes. Extent of resection and use of corticosteroids were significant predictors of OS for males only. For both sexes, tumors in the frontal lobe had significantly better survival than tumors involving multiple sites, whereas there was no difference in survival between tumors at other sites and tumors involving multiple sites. Age and MGMT status were also significant predictors of PFS for both sexes.

Table 2 Final Multivariable Cox Proportional Hazards Results for Overall Survival by Sex using the Training Dataset (NRG/RTOG 0525)

Nomograms

Calibration curves were drawn for both training (NRG/RTOG 0525) and validation (NRG/RTOG 0825) datasets for predicted 6-, 12-, and 24-month overall survival by sex (Supplemental Figs. 1 and 2). The curves show three lines: blue (observed survival rates), gray (ideal survival rates), and black (optimism/bias/overfitting-corrected survival rates). For 12-month and 24-month survival, the observed and optimism-corrected lines are nearly identical, showing near-perfect calibration for OS. A sex-specific nomogram was developed for OS (Figs. 2 and 3). All nomograms were developed using NRG/RTOG 0525 as the training data and validated with NRG/RTOG 0825. The calibration curves for the validation dataset were plotted using the parameters of the model fit to the training dataset. The final multivariable model for the validation dataset is shown in Supplemental Table 3. The calibration curves for PFS were not as accurate as those for OS (Supplemental Figs. 3 and 4). In addition, progression was determined by the site investigator rather than by centrally reviewed PFS standards, which reduces the validity of this measure. For these reasons, we did not validate or construct nomograms for PFS.
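For readers unfamiliar with how a Cox-based nomogram turns predictor values into points and a survival probability, the following Python sketch shows one common construction. All numbers in it are hypothetical placeholders (they are not the coefficients in Table 2 or the scales in Figs. 2 and 3), and the variable names are assumptions made for illustration. The predictor with the largest effect range is assigned 100 points, and total points map back to the linear predictor, so that survival is S0(t) raised to exp(linear predictor).

```python
from math import exp

# Hypothetical log hazard ratios and predictor ranges, for illustration only.
COEFS = {"age": 0.03, "kps_below_90": 0.25, "mgmt_unmethylated": 0.70}
RANGES = {"age": (20, 80), "kps_below_90": (0, 1), "mgmt_unmethylated": (0, 1)}
BASELINE_S12 = 0.85  # hypothetical 12-month survival of the lowest-risk patient


def points_scale(coefs, ranges):
    """Points per unit of linear predictor; the widest predictor spans 100 points."""
    spans = [abs(coefs[k]) * (ranges[k][1] - ranges[k][0]) for k in coefs]
    return 100.0 / max(spans)


def nomogram_points(patient, coefs=COEFS, ranges=RANGES):
    """Points per predictor, measured from that predictor's lowest-risk value."""
    scale = points_scale(coefs, ranges)
    points = {}
    for name, beta in coefs.items():
        low, high = ranges[name]
        reference = low if beta >= 0 else high  # value with the lowest hazard
        points[name] = scale * beta * (patient[name] - reference)
    return points


def predicted_survival(total_points, baseline=BASELINE_S12,
                       coefs=COEFS, ranges=RANGES):
    """Survival at the baseline's time horizon for a given total point score."""
    linear_predictor = total_points / points_scale(coefs, ranges)
    return baseline ** exp(linear_predictor)


# Hypothetical patient: higher total points correspond to worse predicted survival.
patient = {"age": 65, "kps_below_90": 1, "mgmt_unmethylated": 0}
points = nomogram_points(patient)
total = sum(points.values())
print(points, total, predicted_survival(total))
```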

Fig. 2

Final nomogram of Overall Survival for Males built on training data (NRG/RTOG 0525) and independently validated on NRG/RTOG 0825

Fig. 3

Final nomogram of Overall Survival for Females built on training data NRG/RTOG 0525 and independently validated on NRG/RTOG 0825

Discussion

In this study, we sought to develop and independently validate sex-specific individual prognostic nomograms for patients with newly-diagnosed GBM. Our analysis includes a large group of GBM patients from 2 modern clinical trials. In the original NRG/RTOG 0525 and 0825 clinical trials, OS and PFS were not significantly different between the treatment and control arms [14, 15]. This allowed us to train models on 0525 and externally validate them using data from 0825 with no further adjustment for treatment arm. For OS in the male and female calibration curves, the ideal, bias-corrected, and observed curves tracked closely with each other for both training and validation data. This suggests that the nomogram is resistant to possible batch effects and overfitting. In addition, the use of backward selection based on AIC to retain only the most important variables guards against overfitting from excess variables. In contrast, the calibration curves for PFS were not as strong, so we did not develop nomograms for PFS.

Interestingly, the factors that contribute to PFS and OS differ between males and females. Based on the final selected variables, age at diagnosis, KPS score, MGMT promoter methylation status, extent of resection, use of corticosteroids, and location of the tumor in the brain are the significant predictors of OS for males. However, extent of resection was not a significant predictor of OS for females, likely due to the very low sample size for females with ‘Other’ resection (Table 1). For PFS, age at diagnosis, MGMT promoter methylation status and extent of resection were significant survival predictors for males. In females, however, KPS score was significant and extent of resection was not a significant predictor of PFS. As with OS, the inconclusive p-values for some variables were likely due to very low sample sizes for both sexes.

While some of the variables for OS are the same for both males and females, the relative importance of these factors in terms of total points on the nomogram is different. The total point distributions for age at diagnosis, MGMT promoter methylation status and KPS are significantly higher for males than for females, indicating worse survival for males. This finding is similar to what has been reported earlier with these datasets, although those results were not stratified by sex [12]. However, there are some differences with respect to factors affecting survival by sex. Interestingly, the impact of extent of resection differs between males and females, although this could be due to the lower sample size in females. Maximal extent of resection is currently equally indicated regardless of sex. It should be noted that extent of resection is a complex and somewhat subjective variable that incorporates the abilities of the treating neurosurgeon, tumor size, tumor location as it relates to proximity to eloquent cerebral cortex and other intracranial structures, dominant vs non-dominant laterality and the patient’s general medical risks. Moreover, extent of resection generally does not consider resected or residual non-contrast-enhancing disease.

Location of the tumor in the brain also had a different impact on OS and PFS between males and females. While tumors in the frontal lobe had a significantly better survival probability than tumors involving multiple sites for both sexes, tumors at the other locations did not have any advantage over tumors in multiple sites. Further research is needed to validate this finding and to translate it to clinical relevance, as we did not see a similar association in the validation dataset. Additionally, the total number of comorbidities was not found to be significant for either sex, possibly because a large number of patients included in these trials did not have any comorbidity or because only a small number of patients had each comorbidity (Table 1). We examined the univariate association of each comorbidity with OS by sex and found that none of the comorbidities were significant, except lung disease, which was marginally significant (Supplemental Table 2). The impact of these comorbidities on survival should be investigated in future trials with a larger sample size.

The primary limitations of our work include demographic differences between the two NRG clinical trials and between the trial participants and the population of GBM patients as a whole. While the patient demographics across both NRG trials are similar, race distribution, extent of resection patterns, and number of comorbidities varied between the studies. NRG/RTOG 0825, the validation set, had more white patients, more gross total resections, and fewer patient comorbidities. All of these factors have been repeatedly shown to be prognostic for GBM survival [6, 8, 12]. Moreover, in both the training (NRG/RTOG 0525) and validation (NRG/RTOG 0825) datasets, white patients were disproportionately represented compared to the distribution of GBM in the larger US population [7]. This may be the reason race was not found to be a significant factor. The patients in both trials may not be fully representative of the entire GBM population due to trial eligibility requirements. NRG/RTOG 0525 and 0825 had KPS cutoffs at 60 and 70, respectively, and required adequate hematological, renal, and hepatic function [14, 15]. As such, the nomograms may not be predictive of survival in patients whose clinical characteristics differ from the inclusion criteria of these clinical trials. The presence of an IDH mutation defines a separate entity from IDH-wildtype glioblastoma and is prognostic of survival outcomes. However, these studies predated routine testing of this biomarker, and hence IDH mutation status was not available for the trials used in this study [9, 18]. Because IDH mutation occurs in only a small proportion of GBMs, these nomograms would still be applicable to the majority of patients [19]. Caution should be used when applying these nomograms to patients who are demographically or medically different from the population included in this analysis. Finally, PFS should not be presumed to be a reliable endpoint: progression in these older NRG/RTOG trials was determined by site investigators rather than by central review, and may have included instances of pseudoprogression.

The differences in the nomograms by sex shown here indicate that the prognosis of females and males may be different and that these nomograms are useful tools for estimating patient-level survival probabilities. To facilitate clinical use of these nomograms, free software for their implementation is provided (https://npatilshinyappcalculator.shinyapps.io/SexDifferencesInGBM/). This tool will be useful to health care providers in determining individualized survival probabilities by sex. Further research should be done to better characterize the exact biological mechanisms underlying sex differences in GBM.