Glioblastoma is one of the most common and aggressive brain tumors and carries a poor prognosis. A prognostication model for glioblastoma could improve the standard of care. To our knowledge, no prior study has used ensemble learning with a population database to predict multiple binary glioblastoma survival outcomes.
We used ensemble learning to design, build, and test a glioblastoma prognostication system for short-, intermediate-, and long-term survival based on various clinical features. We used the population database SEER, which covers 17 registries. The most important prognostic features were identified and used as a clinical feature set. The statistical feature set was determined using random forests. The accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUROC), positive predictive value (PPV), and negative predictive value (NPV) were reported.
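As a reminder of how the six reported metrics relate to one another, the following is a minimal sketch that computes them for a binary outcome. It is not the authors' code: the function names, the toy labels and scores, the 0.5 decision threshold, and the choice of which survival class counts as "positive" are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics from true and predicted binary labels (1 = positive).

    Assumes both classes occur in y_true and in y_pred, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # recall on the positive class
        "specificity": tn / (tn + fp),  # recall on the negative class
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case is scored above
    a random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 positive and 4 negative cases with model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
area = auroc(y_true, scores)
```

Accuracy, sensitivity, specificity, PPV, and NPV all depend on the chosen decision threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason the two kinds of metric are usually reported together.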
Statistically determined feature sets had the best performance. All the top models for short-, intermediate-, and long-term survival were random forests. For short-term survival, the top model had AUROC = 0.937, accuracy = 86%, specificity = 88%, sensitivity = 85%, NPV = 85%, and PPV = 87%. For long-term survival, the top model had AUROC = 0.893, accuracy = 81%, specificity = 79%, sensitivity = 83%, NPV = 82%, and PPV = 79%. The top intermediate-term survival model had AUROC \(\ge\) 0.780, and its other metrics were at least 70%.
Our ensemble models performed well, with AUROCs as high as 0.94, highlighting the importance of class balancing, ensemble techniques, and statistical feature selection. After external validation, our models could potentially be used by clinicians.
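The abstract does not say which balancing method was used, so the following is only an illustrative sketch of one simple option, random oversampling of the minority class. The function name `random_oversample`, the seed, and the toy data are hypothetical and are not taken from the study.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(
    X: Sequence[Sequence[float]], y: Sequence[int], seed: int = 0
) -> Tuple[List[Sequence[float]], List[int]]:
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Indices of the original rows that carry this label.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Hypothetical imbalanced toy set: four rows of class 0, one row of class 1.
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
```

Balancing of this kind is normally applied to the training split only, so that the test metrics still reflect the original class distribution.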
The data belong to the Surveillance, Epidemiology, and End Results Program and were accessed after the submission of a signed Data User Agreement.
This research received no funding.
Conflict of interest
The authors declare that they have no conflict of interest.
Analysis of the SEER data does not require Institutional Review Board approval or informed consent.
The code used to generate the above models will be made available upon reasonable request.
Cite this article
Samara, K.A., Al Aghbari, Z. & Abusafia, A. GLIMPSE: a glioblastoma prognostication model using ensemble learning—a surveillance, epidemiology, and end results study. Health Inf Sci Syst 9, 5 (2021). https://doi.org/10.1007/s13755-020-00134-4
- Ensemble learning
- Feature selection
- Survival prediction
- Machine learning