GLIMPSE: a glioblastoma prognostication model using ensemble learning—a surveillance, epidemiology, and end results study



Glioblastoma is among the most common and aggressive brain tumors worldwide and carries a poor prognosis. A glioblastoma prognostication model has the potential to improve the standard of care for this cancer. No previous paper has applied ensemble learning to a population database to predict multiple binary glioblastoma survival outcomes.


We used ensemble learning to design, build, and test a glioblastoma prognostication system for short-, intermediate-, and long-term survival based on various clinical features. We used the population database SEER, which covers 17 different registries. The most important prognostic features were identified and used as a clinical feature set. The statistical feature set was determined using random forests. Accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUROC), positive predictive value (PPV), and negative predictive value (NPV) were reported.
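As an illustration of the reported metrics, the following is a minimal sketch of how they can be computed for one binary survival outcome. It is not the authors' code: the function names (`binary_metrics`, `auroc`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example.

```python
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for a binary outcome (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Toy data (hypothetical): 1 = outcome occurred, scores = model probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
area = auroc(y_true, scores)
```

The pairwise AUROC is quadratic in the number of samples, which is fine for illustration; a rank-based formulation would be preferable on a registry-sized dataset.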


Statistically determined feature sets had the best performance. All the top models for short-, intermediate-, and long-term survival were random forests. For short-term survival, the top model had AUROC = 0.937, accuracy = 86%, specificity = 88%, sensitivity = 85%, NPV = 85%, and PPV = 87%. For long-term survival, the top model had AUROC = 0.893, accuracy = 81%, specificity = 79%, sensitivity = 83%, NPV = 82%, and PPV = 79%. For intermediate-term survival, the top model had AUROC \(\ge\) 0.780, with all other metrics at least 70%.


Our ensemble models performed well, achieving AUROCs as high as 0.94, which highlights the importance of class balancing, ensemble techniques, and statistical feature selection. After external validation, our models could potentially be used by clinicians.
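The abstract does not state which balancing technique was used. As a stand-in, the sketch below shows plain random oversampling of the minority class, one of the simplest balancing schemes. The function name `random_oversample`, the fixed seed, and the toy rows are assumptions for illustration only and should not be read as the authors' method.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    seed: int = 0,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter(labels)
    target = max(counts.values())

    out_rows = list(rows)
    out_labels = list(labels)
    for cls, n in counts.items():
        # Rows belonging to this class in the ORIGINAL data only.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy training split (hypothetical): six negatives, two positives.
train_rows = [[float(i)] for i in range(8)]
train_labels = [0, 0, 0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = random_oversample(train_rows, train_labels)
balanced_counts = Counter(balanced_labels)
```

Balancing of this kind is normally applied to the training split only, so that the held-out data keep their natural class proportions when the metrics are computed.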


Data availability

The data belong to the Surveillance, Epidemiology, and End Results (SEER) Program and were accessed after submission of a signed Data User Agreement.




This research received no funding.

Author information



Corresponding author

Correspondence to Kamel A. Samara.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

Analysis of SEER data does not require Institutional Review Board approval or informed consent.

Code availability

The code used to generate the above models will be made available upon reasonable request.



Cite this article

Samara, K.A., Al Aghbari, Z. & Abusafia, A. GLIMPSE: a glioblastoma prognostication model using ensemble learning—a surveillance, epidemiology, and end results study. Health Inf Sci Syst 9, 5 (2021).



  • Glioblastoma
  • Prognosis
  • SEER
  • Ensemble learning
  • Feature selection
  • Survival prediction
  • Machine learning