Survival prediction of glioblastoma patients—are we there yet? A systematic review of prognostic modeling for glioblastoma and its clinical potential

Glioblastoma is associated with a poor prognosis. Even though survival statistics are well described at the population level, it remains challenging to predict the prognosis of an individual patient despite the increasing number of prognostic models. The aim of this study is to systematically review the literature on prognostic modeling in glioblastoma patients. A systematic literature search was performed, following the PRISMA guidelines, to identify all relevant studies that developed a prognostic model for predicting overall survival in glioblastoma patients. Participants, type of input, algorithm type, validation, and testing procedures were reviewed per prognostic model. Among 595 citations, 27 studies were included for qualitative review. The included studies developed and evaluated a total of 59 models, of which only seven were externally validated in a different patient cohort. The predictive performance among these studies varied widely according to the AUC (0.58–0.98), accuracy (0.69–0.98), and C-index (0.66–0.70). Three studies deployed their model as an online prediction tool, all of which were based on a statistical algorithm. The increasing performance of survival prediction models will aid personalized clinical decision-making in glioblastoma patients. The scientific realm is gravitating towards the use of machine learning models developed on high-dimensional data, often with promising results. However, none of these models has been implemented into clinical care. To facilitate the clinical implementation of high-performing survival prediction models, future efforts should focus on harmonizing data acquisition methods, improving model interpretability, and externally validating these models in a multicentered, prospective fashion. Supplementary Information The online version contains supplementary material available at 10.1007/s10143-020-01430-z.


Introduction
Glioblastoma is the most common and aggressive type of primary brain tumor in adults [1][2][3]. It has one of the highest mortality rates among human tumors, with a median survival of 12 to 15 months after diagnosis despite improved standard of care, defined as maximal safe resection followed by radiotherapy plus concomitant and adjuvant temozolomide [1,4].
In recent years, prognostic models are increasingly being developed to predict survival of the individual glioblastoma patient [8,9]. These prognostic models utilize a wide range of statistical and machine learning algorithms to analyze heterogeneous data sources and predict individual patient survival. This systematic review aims to synthesize the current trends and provide an outlook concerning the possibility of clinical use of prognostic glioblastoma models and the future direction of survival modeling in glioblastoma patients.

Search strategy
A search was performed in the Embase, Medline Ovid (PubMed), Web of Science, Cochrane CENTRAL, and Google Scholar databases according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (S1). A professional librarian was consulted for constructing the search syntax with the use of keywords for glioblastoma, prognostic modeling, and overall survival, as well as their synonyms (S1). All prognostic models concerning survival in glioblastoma patients were included in our search syntax. Prediction model studies on glioblastoma patients with overall survival as the primary outcome were included. Predictor finding studies were excluded. These studies focus on characterizing the association between individual variables and the outcome at the cohort level (e.g., identifying risk factors of survival within a population), whereas prediction model studies seek to develop a model that predicts survival as accurately as possible in the individual patient utilizing the optimal combination of variables. No restrictions were applied with regard to participant characteristics, format of the input data, type of algorithm, or validation and testing procedures. Case reports and articles written in languages other than Dutch or English were excluded. No restrictions based on the date of publication were used. This systematic search was complemented by screening the references of included articles to identify additional publications. Titles and abstracts of retrieved articles were screened by two independent authors. Two authors (IRT, SK) read the full texts of the potentially eligible articles independently. Discrepancies were resolved by discussion including a third reviewer (JS).

Data extraction
From all included studies, we extracted the year of publication, name of first author, title, abstract, source of data, selection criteria, events per variable, events, sample size, type of input, hyperparameter tuning, number of predictors in the best performing model, definition of overall survival, algorithm type, validation and testing procedure, performance metric, and model performance. To ensure a systematic approach to assessing validation of prognostic modeling in glioblastoma, all the extracted variables were based on the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) checklist [10]. The Prediction model Risk Of Bias ASsessment Tool (PROBAST) was used to assess the risk of bias in all included studies [11].

Results
The search identified 595 unique studies. After screening titles and abstracts, 112 studies were selected for full-text review (S2). Of these, 27 articles met our inclusion criteria and were included in the qualitative synthesis. A total of 59 models were presented, of which the best performing model of each study was included in this review. Two included models used the same database [14,36]. Yet, both were included in this systematic review because different predictors and algorithms were used to develop the models. The included prognostic models were developed between 2010 and 2019. General characteristics and model characteristics for each included study are presented in Supplementary S3. An overview of observations in all identified glioblastoma prognostic models is visualized in Figs. 1, 2, and 3.

Online prediction tools
Three studies have translated their model into an online prediction tool, making the models more actionable and useful for individual prognostication in glioblastoma patients [26,30,38]. Although these three models included radiographic features, such as tumor size and extension, these features have to be interpreted or measured manually by a human expert and inserted into the model. Therefore, none of the online prediction tools used raw MRI data; all exclusively used structured clinical parameters. Although studies have developed deep learning models utilizing unstructured data, e.g., genomics [26] or MRI data [38], none of these models has been translated to an actionable clinical prediction tool yet.

Discussion
This systematic review demonstrates the lack of widespread validation and clinical use of the existing glioblastoma models. Despite the increasing development of survival prediction models for glioblastoma patients, only seven models have been validated retrospectively in an external patient cohort [14-16, 18, 23, 26, 32], and none has been validated prospectively. Furthermore, three models, all of which were developed from a statistical algorithm, have been deployed as a publicly available prediction tool [26,30,38], but none has been implemented as a standardized tool to guide clinical decision-making. Lastly, no trend in performance was seen over time despite machine learning methods increasingly being used, and no prognostic glioblastoma study to date has had consequences for clinical decision-making.
Prognostic models have the potential to help tailor clinical management to the needs of the individual glioblastoma patient by providing a personalized risk-benefit analysis. The rise of machine learning algorithms, including deep learning, enables the use of high-dimensional data, such as free text and imaging, to improve the accuracy and performance of prediction models. The increasing use of machine learning for the analysis of unstructured, high-dimensional data parallels the current trends in predictive modeling in medicine [44,45]. Neurosurgical examples include machine learning algorithms for glioblastoma, deep brain stimulation, traumatic brain injury, stroke, and spine surgery [38,[44][45][46][47][48][49][50][51][52][53]. Deep learning algorithms are also increasingly being used to further improve the WHO 2016 classification of high-grade gliomas via histological and biomolecular variables for more concise diagnosis and classification of gliomas [54][55][56].
Furthermore, deep learning algorithms are also frequently used in radiotherapeutic research for automated skull stripping, automated segmentation, or delineation of resection cavities for stereotactic radiosurgery [57][58][59][60]. Despite the ubiquity of high-performing models in clinical research, none has been translated to the clinical realm and integrated in clinical decision-making. Few prognostic models are put to practice throughout all medical specialties. Specifically, for other diseases, such as breast cancer, where radiological and genetic factors are better known than in glioblastoma, more than sixty models were found to prognosticate breast cancer survival [61]. Nevertheless, this has led to little consequence for clinical care [62]. Seemingly, clinical implementation is not a matter of robustness of evidence or notoriety of the disease. Moons et al. also parallel our finding that robust validation studies are missing for most prognostic models and that most validation studies include a relatively small patient cohort, thus not helping the model's generalizability [62]. This raises the question: what needs to happen before prognostic models can be used in the clinic?

Definitions of the reported discrimination metrics:
ROC curve: a probability curve of true-positive rates against the false-positive rates at different cutoff points in outcome [1].
Area under the ROC curve (AUC): distinguishes the discriminative potential of the algorithm. The threshold is 0.5 (no discriminative ability) [41].
Harrell's C-statistic: quantifies how well the algorithm discriminates among subjects with different event times; Harrell's approach utilizes a normalizing constant for right-censored data [1][2][3].
Concordance index: quantifies how well the algorithm discriminates among subjects with different event times; utilizes a normalizing constant for left-censored data [1][2][3].
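To make the concordance index concrete, the sketch below computes Harrell's C-index for right-censored survival data in pure Python. The cohort and risk scores are illustrative toy values, not data from any of the reviewed models.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: observed follow-up times (months)
    events: 1 if the event (death) was observed, 0 if censored
    risk_scores: higher score = predicted higher risk (shorter survival)
    """
    concordant, permissible = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if the patient with the shorter
            # follow-up time experienced the event; censored patients
            # contribute only as the longer-surviving member of a pair.
            if times[i] < times[j] and events[i] == 1:
                permissible += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties count half
    return concordant / permissible

# Toy cohort: three observed deaths and one censored patient,
# with risk scores that rank the patients perfectly.
times = [5, 10, 12, 20]
events = [1, 1, 0, 1]
scores = [0.9, 0.6, 0.5, 0.2]
print(concordance_index(times, events, scores))  # 1.0
```

A value of 0.5 corresponds to random predictions, which is why the reviewed C-indices of 0.66 to 0.70 indicate modest but real discriminative ability.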

Computational challenges
First, machine learning algorithms are accompanied by unique computational challenges which could limit the clinical implementation of these models. Due to the high dimensionality of the input data, machine learning models are inherently limited in their generalizability. Computer vision models developed on single-institutional data perform poorly on data from different institutions when different scanning parameters, image features, and other formatting methods are used [63]. In contrast, prediction models that exclusively use structured clinical parameters, such as age, gender, and the presence of comorbidity, are more generalizable and implementable across institutions, as this information is less subject to institution-specific data acquisition methods [26,30,38]. This could be one of the reasons that the three currently available prediction tools all exclusively use structured clinical information. Therefore, if unstructured data and machine learning algorithms are to be incorporated in clinical prognostic models at a multicenter level, harmonization and standardization procedures are required between institutions. Furthermore, most machine learning algorithms accommodate primarily binary or continuous outcomes, whereas survival data are typically right-censored, meaning the value of an observation is only partially known (i.e., the patient is known to have survived at least beyond a specific follow-up time). Approaches considered for handling such data include discarding censored observations or including each censored observation twice, once as an event and once as event-free, both of which bias the risk estimates. In addition, novel approaches have been introduced, such as modifying specific machine learning algorithms or weighting estimates by the amount of censoring in the sample [64][65][66].
This highlights the need for translating existing machine learning algorithms to alternatives that can accommodate time-to-event survival data as well.
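One widely used translation is discrete-time survival modeling: each patient is expanded into one row per follow-up interval with a binary event label, after which any standard classifier can be trained without discarding censored patients. The sketch below illustrates the idea with a hypothetical expansion function and toy follow-up data; the interval length and variable names are illustrative assumptions, not part of any reviewed model.

```python
def expand_person_periods(times, events, interval=6):
    """Discretize right-censored survival data into person-period rows.

    Each patient contributes one row per interval survived; the binary
    label is 1 only in the interval where the event occurred. Censored
    patients contribute rows only up to their last follow-up, so no
    death is imputed beyond what was observed.
    """
    rows = []  # (patient_id, period_index, event_in_period)
    for pid, (t, e) in enumerate(zip(times, events)):
        n_periods = int(t // interval) + 1
        for k in range(n_periods):
            period_end = (k + 1) * interval
            event_here = 1 if (e == 1 and t < period_end) else 0
            rows.append((pid, k, event_here))
            if event_here:
                break  # no rows after the observed event
    return rows

# Patient 0 dies at month 8; patient 1 is censored at month 14.
rows = expand_person_periods([8, 14], [1, 0], interval=6)
for row in rows:
    print(row)
```

The resulting rows can be fed to any binary classifier, with the period index as an additional feature, so that the classifier learns interval-specific hazards rather than a single binary outcome.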

Clinical challenges
Machine learning algorithms are also accompanied by unique clinical challenges which could limit their clinical implementation. As medical computational science progresses, critical unanswered questions arise: to what extent should a medical professional rely on technology, and how does one intercept an inevitable predictive miscalculation of the algorithm? First, the black-box nature of many algorithms, e.g., the hidden layers in neural networks, substantially reduces the interpretability of a potentially high-performing prediction model and thereby limits its clinical deployment. However, this is not a new phenomenon: therapeutic measures can be implemented in clinical care based on studies demonstrating their safety and efficacy, yet without the underlying mechanisms being fully clarified. In addition, algorithms learn from real-world data, and therefore real-world disparities could propagate into the developed models. This could potentially sustain or amplify existing healthcare disparities, such as ethnic or racial biases [67]. If survival prediction models were to be clinically deployed, should these algorithms be used as a directive or supportive application in clinical decision-making? There is insufficient information and experience to date to fully answer this question. Liability can become an issue if a misprediction, e.g., a false positive, is made and clinical decisions are influenced by it; is the clinician responsible for the algorithm's fault? Therefore, medical professionals should be cautious when relying on technology and attempt to further understand predictive algorithms and their inevitable limitations.

Standardizing model evaluation prior to clinical implementation
These computational and clinical challenges have led to the development of standardized methods for assessing predictive models prior to clinical implementation, both in the diagnostic and prognostic realms. The use of methylome data in neuro-oncology [68] holds promise for clinical application, as recommended by the iMPACT-now guidelines [69]. This study performed a prospective external validation in five different centers to test the accuracy of the model. Additionally, the model was tested in two different labs for technical robustness. This could be the most important step towards clinical deployment of prognostic glioblastoma models as well. Moreover, prognostic radiomics models for glioblastoma patients demonstrate significant potential to achieve noninvasive pathological diagnosis and prognostic stratification [18,23]. Yet, technical challenges need to be overcome before implementation of these models is realized, namely access to larger image datasets, common criteria for feature definitions and image acquisition, and wide-scale validation of a single radiomics model [70]. Another neuro-oncological model that is widely used for research purposes is the ds-GPA, which functions as a diagnosis-specific prognostic assessment for brain metastases [71]. Sperduto et al. reported the shortcomings of the ds-GPA but, more importantly, also reported possible consequences for clinical decision-making per prognostic score [71]. This offers clinicians insight into the utility of the ds-GPA. A caveat of prognostic glioblastoma models is the lack of appraisal of the collected clinical data. Moreover, the FDA considers appraisal of data and subsequent analysis of all gathered data a prerequisite for clinical deployment of software as a medical device. Concisely, datasets that could be considered "pivotal" for demonstrating superior performance, safety, and specific risk definitions should be identified before clinical deployment or introduction into guidelines [72].

Limitations
There are several limitations to the current systematic review. First, a preferred quantitative meta-analysis was not possible due to the methodological heterogeneity across studies, and differences in model performance should therefore be interpreted with caution. Furthermore, publication bias can influence the results: high-performing models are more likely to be published, and because low-performing models remain unpublished, common bottlenecks may remain unexposed, resulting in a duplication of futile efforts. To the best of our knowledge, there are no previous systematic reviews presenting a general overview of the emerging field of survival modeling in glioblastoma patients.

Future directions
As the current field of survival modeling in glioblastoma patients gravitates towards high-dimensional models, future research efforts should focus on harmonization and standardization to increase the volume of available training data, the accuracy of developed models, and the generalizability of their associated prediction tools. At present, studies mainly report prediction performance, yet there are many secondary characteristics that determine whether or not a model can be implemented in clinical practice. Therefore, future studies should concentrate not only on model performance but also on secondary metrics relevant for clinical deployment, such as interpretability and ease of use [38]. Moreover, future research should focus on clinical utility, i.e., explaining how or when clinicians should alter the treatment plan of the glioblastoma patient. Lastly, considering the ethical and clinical implications in parallel to its development could ensure a safe and sound implementation of this rapidly emerging technology.

Conclusion
The use of machine learning algorithms in prognostic survival models for glioblastoma has increased progressively in recent years. Yet, no machine learning model has led to an actionable prediction tool to date. For successful translation of a tool to the clinic, multicentered standardization and harmonization of data are needed. Future studies should focus not only on model performance, but also on secondary model characteristics, such as interpretability and ease of use, and the accompanying ethical challenges, to ensure a safe and effective implementation in clinical care.
Funding Open access funding provided by Leiden University Medical Center (LUMC). This study did not receive funding from internal or external sources.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Ethical approval The manuscript is a systematic review, so it does not require authorization of ethical committee (ethical approval).
Informed consent The manuscript is a systematic review, so it does not require informed consent.
Consent for publication All authors have approved this manuscript and agree with its submission to Neurosurgical Review.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.