Introduction

Glioblastoma multiforme (GBM) is one of the most malignant gliomas of the central nervous system (CNS) [1]. Treatment outcomes remain disappointing, with a median survival of only 15 months despite multimodal treatment [2, 3]. According to the literature, GBM has distinctive biological characteristics, including high heterogeneity, diffuse invasiveness, and the capacity to resist conventional therapies. In addition, biological barriers such as the blood–brain barrier (BBB) make this tumor difficult to treat [4]. Hence, identifying the key genes associated with GBM prognosis may facilitate the development of new methods for the clinical treatment of GBM [5].

Over the last decade, increasing attention has been paid to clarifying the origin, genomic landscape, and gene expression profile of GBM by identifying specific molecular markers and pathways involved in this pathology [6]. The advent of large-scale transcriptomic analyses in various cancers has greatly increased our understanding of tumor biology and possible cancer therapy approaches [4]. Accordingly, in recent years, a growing number of studies have used gene expression patterns to propose biomarkers and treatment strategies for GBM [7]. However, most of this information has not been translated into clinical practice for GBM patients [7].

Vast quantities of genomic data are now being deposited in public repositories, such as ArrayExpress (https://www.ebi.ac.uk/arrayexpress/), The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov), the Chinese Glioma Genome Atlas (CGGA, http://www.cgga.org.cn), and Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/). These genomic data are used by researchers around the world to discover new genes of interest in GBM tumors. Several studies have analyzed numerous mRNA expression datasets and identified gene signature panels to improve the prognostic and predictive assessment of GBM tumors [8–262]. However, there is no consensus in the literature on the top gene sets that could eventually be used in clinical practice.

Considering the current state of knowledge, we considered that a systematic survey of the literature was urgently required to identify genes whose expression could be predictive of GBM survival. Subsequently, to determine the top genes whose expression could be of interest in clinical practice, we assessed the biological pathways and protein-protein interaction (PPI) networks associated with these genes via bioinformatic analyses.

Materials and methods

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline [263] was followed to conduct the review (Supplementary File 1, Table S1). PubMed, Scopus, Cochrane, and Web of Science databases were searched for relevant studies published between 24 February 2003 and 1 January 2024. The search combined the terms (“gene expression” or “expressed genes” or “mRNA” or “RNA-Seq”) and (“survival” or “prognostic” or “biomarker”) and (“Glioblastoma multiforme (GBM)” or “high-grade glioma”). The full search strategy is reported in Supplementary File 1, Table S2.

Inclusion criteria were: (a) clinical studies with human participants, (b) bioinformatics analysis studies, (c) full-text articles, (d) published in the English language, (e) published in peer-reviewed journals, and (f) reporting genes related to GBM. Exclusion criteria were: (a) reviews, letters to the editor, and abstracts, (b) duplicate publications, (c) plasma biomarker studies, (d) studies based on immunohistochemical (IHC) or Western blot analysis, (e) cell line studies, (f) studies that did not observe a significant correlation between mRNA expression and overall survival, (g) studies of recurrent glioblastoma, (h) studies of pseudogenes, and (i) animal studies and analyses of progression-free survival (PFS).

Data extraction and quality assessment

Two independent authors (PA and TA) assessed all relevant articles and extracted the data. For each study, the following items were extracted: first author, publication year, country, mRNA, increased expression, decreased expression, public gene databases, detection method, short survival, long survival, and area under the curve (AUC) for gene panels. The Newcastle–Ottawa Scale (NOS) for case-control studies was used to evaluate the quality of the eligible articles. The NOS covers three perspectives: selection of the study groups, comparability of the groups, and ascertainment of the exposure or outcome of interest. Each study can obtain a maximum of nine stars [264]. Studies scoring at or above the median NOS value were considered high quality (low risk of bias), and those scoring below the median value were considered low quality (high risk of bias). A summary of the quality evaluation method is presented in Table 1.
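The median-based quality split can be illustrated with a minimal sketch. The study names and star totals below are hypothetical, and the sketch assumes that a study scoring exactly at the median counts as high quality, which is how the Results apply the rule (scores ≥ 7 are treated as low risk of bias).

```python
from statistics import median

def classify_by_median(scores):
    """Label a study 'high' quality if its NOS score is at or above the median."""
    cutoff = median(scores.values())
    labels = {study: ("high" if s >= cutoff else "low")
              for study, s in scores.items()}
    return cutoff, labels

# Hypothetical NOS star totals (0-9) for five made-up studies
nos_scores = {"study_A": 8, "study_B": 7, "study_C": 6, "study_D": 9, "study_E": 5}
cutoff, labels = classify_by_median(nos_scores)
print(cutoff, labels)
```

Because the cutoff is the sample median, the two groups are roughly balanced by construction, so the labels describe relative rather than absolute quality.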

Table 1 Checklist for quality evaluation and scoring of studies based on the NOS

Bioinformatic and statistical analysis

Protein–protein interaction (PPI) network and signaling pathways analysis

All 613 genes (with p < 0.05) obtained from this review were included in the bioinformatic analysis. The PPI network was constructed with Cytoscape software (version 3.9.0; https://cytoscape.org/). The most important nodes of the PPI network were identified with the cytoHubba plug-in. Five well-known centrality indices (degree, stress, betweenness, closeness, and radiality) were used to rank the network nodes. The top 10% of genes were determined for each metric, and the genes common to all five metrics were then identified. Finally, among these common genes, proteins with high degree centrality were selected as the most important ones for investigating their association with survival in GBM patients. In addition, the top 10 genes ranked by degree were identified.
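The selection rule described above (top 10% per metric, then the intersection across metrics) can be sketched as follows. This is an illustration only: the centrality scores are assumed to have been exported already (for example from cytoHubba), the gene names and values are hypothetical, and rounding the 10% cutoff up and breaking ties by input order are assumptions that the text does not specify.

```python
import math

def top_fraction(scores, fraction=0.10):
    """Return the genes in the top `fraction` of a {gene: score} mapping."""
    k = max(1, math.ceil(len(scores) * fraction))  # assumption: round the cutoff up
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def common_top_genes(metrics, fraction=0.10):
    """Intersect the top-`fraction` gene sets across all centrality metrics."""
    tops = [top_fraction(scores, fraction) for scores in metrics.values()]
    return set.intersection(*tops)

# Hypothetical scores for five made-up genes under the five metrics
metrics = {
    "degree":      {"GENE_A": 12, "GENE_B": 10, "GENE_C": 4, "GENE_D": 3, "GENE_E": 1},
    "stress":      {"GENE_A": 900, "GENE_B": 300, "GENE_C": 950, "GENE_D": 10, "GENE_E": 5},
    "betweenness": {"GENE_A": 0.40, "GENE_B": 0.10, "GENE_C": 0.30, "GENE_D": 0.05, "GENE_E": 0.0},
    "closeness":   {"GENE_A": 0.80, "GENE_B": 0.70, "GENE_C": 0.60, "GENE_D": 0.50, "GENE_E": 0.40},
    "radiality":   {"GENE_A": 0.90, "GENE_B": 0.85, "GENE_C": 0.70, "GENE_D": 0.60, "GENE_E": 0.50},
}

# With only five genes, a 40% cutoff keeps two genes per metric
print(common_top_genes(metrics, fraction=0.4))
```

Requiring membership in all five lists is deliberately strict: a gene that ranks highly on degree but poorly on betweenness is dropped, which favors nodes that are central under every definition.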

A pathway analysis using the GO Biological Process (GOBP) 2023 database through the Enrichr tool (https://maayanlab.cloud/Enrichr/, accessed on 23 March 2024) was then performed to further specify the mechanisms involved in cancer, such as cell proliferation, differentiation, apoptosis, mitosis, angiogenesis, and stemness. Only GOBP terms with an adjusted p-value < 0.01 in the Enrichr analysis were used.
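To make the adjusted p-value threshold concrete, the sketch below shows one common way such values are obtained: an over-representation p-value from the hypergeometric tail (equivalent to a one-sided Fisher exact test), followed by Benjamini-Hochberg adjustment. The text does not describe Enrichr's internal computation, so this should be read as an assumption about the general technique rather than a reproduction of the tool; the function names, term sizes, and background size of 20,000 genes are hypothetical.

```python
from math import comb

def hypergeom_pvalue(overlap, list_size, term_size, background):
    """P(X >= overlap) when list_size genes are drawn from a background
    that contains term_size genes annotated to the term."""
    upper = min(list_size, term_size)
    total = comb(background, list_size)
    tail = sum(comb(term_size, k) * comb(background - term_size, list_size - k)
               for k in range(overlap, upper + 1))
    return tail / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical terms: (genes from a 31-gene list in the term, term size)
terms = {"term_1": (8, 150), "term_2": (2, 400), "term_3": (5, 60)}
raw = [hypergeom_pvalue(hit, 31, size, 20000) for hit, size in terms.values()]
adj = benjamini_hochberg(raw)
kept = [name for name, q in zip(terms, adj) if q < 0.01]
print(kept)
```

The adjustment matters because more than a thousand terms are tested at once; filtering on the raw p-value would admit many terms by chance alone.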

Survival analysis and validation of the gene expression in the GEPIA2 and CGGA datasets

To confirm the reliability of the genes identified from the PPI network, Kaplan-Meier curves were created using the GEPIA2 (http://gepia2.cancer-pku.cn) and the CGGA (http://www.cgga.org.cn) databases. CGGA contains two glioma data sets, namely mRNAseq_325 and mRNAseq_693; the primary GBM samples from both were considered. To determine differences in overall survival (OS) between GBM patients with low and high gene expression, Kaplan-Meier analysis of OS was performed with GEPIA2, using the TCGA gene expression dataset, and with the CGGA online applications. Kaplan–Meier curves were generated with the median expression (50% cutoff) separating the high- and low-expressing groups. Hazard ratios were estimated by Cox proportional hazards regression with 95% confidence intervals. P < 0.05 was considered statistically significant in the GEPIA2 and CGGA validation cohorts.
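The median split and the Kaplan-Meier estimate can be sketched in a few lines. The analyses themselves were run in the GEPIA2 and CGGA web tools, so the code below is only an illustration of the estimator: the patient data are hypothetical, samples exactly at the median are assigned to the low group by assumption, and neither the log-rank test nor the Cox model is shown.

```python
from statistics import median

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.
    events[i] is 1 for death and 0 for censoring."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def split_by_median(expression):
    """Indices of samples above the median (high) and at or below it (low)."""
    cut = median(expression)
    high = [i for i, x in enumerate(expression) if x > cut]
    low = [i for i, x in enumerate(expression) if x <= cut]
    return high, low

# Hypothetical toy cohort: survival in months, event 1 = death, 0 = censored
times = [20, 6, 15, 4, 9, 24]
events = [1, 1, 0, 1, 1, 1]
expression = [2.1, 8.4, 3.3, 9.0, 7.5, 1.2]

high, low = split_by_median(expression)
km_high = kaplan_meier([times[i] for i in high], [events[i] for i in high])
km_low = kaplan_meier([times[i] for i in low], [events[i] for i in low])
print(km_high)
print(km_low)
```

A median cutoff guarantees two groups of similar size, but it is not necessarily the cutoff that best separates survival, which is one reason results can differ between cohorts.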

Results

Descriptive statistics

The workflow of the literature selection process is shown in Fig. 1. In brief, 4104 articles were found via an initial literature search of the databases, and 1296 studies were excluded owing to duplication. After screening the titles and abstracts, 2371 studies were excluded as not relevant to the purpose of this systematic review based on their methods. Subsequently, 255 studies were enrolled; the characteristics of each study are shown in Supplementary Table 2. Among these, 161 studies were conducted in China [16, 18, 20–21, 23–29, 31, 34, 36–37, 40–44, 47–48, 50, 53, 55, 64, 66–69, 71–74, 80, 82–87, 89, 92–95, 97–98, 102–105, 107, 109–113, 115–118, 120–124, 126–127, 135–137, 139–144, 146–152, 154–155, 157, 160–164, 166, 168–170, 172–173, 177–179, 184–190, 193–195, 197–197, 206–208, 212–213, 215–221, 223–226, 229, 231–246, 244, 246–257, 259–261] and 23 in the USA [10, 13–14, 32, 45–46, 58, 63, 75, 77, 88, 106, 108, 114, 133–134, 159, 165, 182, 192, 228, 243, 245]. The remaining studies were conducted in India [9, 19, 30, 33, 59, 101, 130, 132, 196, 205], Taiwan [61, 79, 96, 145, 156, 175–176, 242], Germany [38, 51, 65, 90, 180, 230], Japan [22, 57, 76, 129, 258], Republic of Korea [62, 167, 171, 209, 241], UK [39, 60, 131, 191], Spain [15, 119, 153, 227], Lithuania [49, 99, 125, 138], Italy [100, 183, 203], France [8, 12, 158], Slovenia [11, 81, 128], Switzerland [35, 52], Sweden [78, 174], Turkey [211, 214], Russia [222], Finland [17], Netherlands [204], Hungary [54], Canada [70], Brazil [210], Iran [262], and Austria [181]. Among the 255 studies, 192 and 37 used the TCGA and the CGGA datasets, respectively.
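The counts quoted in this paragraph can be checked with simple arithmetic. The sketch below uses only the numbers given above; the variable names are illustrative, and the final difference presumably corresponds to exclusions at the full-text stage shown in Fig. 1, which the text does not itemize.

```python
identified = 4104                  # records found by the database search
duplicates = 1296                  # removed as duplicates
excluded_on_title_abstract = 2371  # removed at title/abstract screening
included = 255                     # studies enrolled in the review

screened = identified - duplicates
full_text = screened - excluded_on_title_abstract
not_itemized = full_text - included  # exclusions not broken down in the text

print(screened, full_text, not_itemized)
```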

Fig. 1 Flowchart of the selection process

Among the 720 gene entries reported across the studies, 613 unique genes were identified whose expression was associated with overall survival in GBM, of which 107 were described in two or more studies. See Supplementary Table 2 for details about the number of studies that described each gene, whether it was found to be upregulated or downregulated, and the databases used.

NOS assessment

The risk of bias evaluation of the included studies according to the NOS for case-control studies is shown in Supplementary Table 2 and Supplementary File 1, Table S4. Based on the NOS, the median score of the included studies was 7. Among the 255 studies, the 131 studies that scored ≥ 7 were considered to present a low risk of bias. The remaining 124 studies scored < 7 and were considered to have a high risk of bias.

Bioinformatic analysis

After removing duplicates, 613 genes were included in the bioinformatic analysis (Supplementary File 2). A PPI network was built using the STRING database and the Cytoscape application, with an input of 613 genes (Fig. 2). The network was analyzed, and the nodes were ranked based on centrality parameters. The PPI network contains 602 nodes and 5570 edges. Top genes based on degree, stress, betweenness centrality, closeness centrality, and radiality were selected and organized into 5 groups (Table 2). By considering the degree of connectivity in the PPI network, as described in the Materials and methods section, 31 important genes were identified (IL6, EGFR, STAT3, MMP9, CD44, FN1, CD4, TGFB1, CXCL8, CCL2, IL10, ICAM1, IL1A, CD274, KDR, SPP1, ITGB2, CDKN2A, PARP1, MYD88, AGT, NOTCH1, SERPINE1, TNFRSF1A, CDK1, CAV1, ITGB3, CDK4, FOXO3, MDM2, and PROM1) (Table 2). In addition, the top 10 genes with the highest node degree score were identified as hub genes, as shown in Fig. 3.

Fig. 2 The 613 differentially expressed genes were input into the STRING database for PPI network analysis, yielding a PPI network of 602 nodes and 5570 edges, with a PPI enrichment p-value < 1.0 × 10^-16. The network was constructed in Cytoscape based on the PPI correlations from the STRING database

Table 2 The 31 most important genes related to GBM survival

Fig. 3 The top 10 genes in the PPI network, in terms of degree ranking, were regarded as hub genes. The node color changes gradually from yellow to red in ascending order of gene degree

In the validation step (Fig. 4 and Table 2), three genes (FN1, CXCL8, and TNFRSF1A) in the GEPIA2 dataset, eleven genes (IL6, STAT3, MMP9, FN1, CD4, CCL2, IL10, ICAM1, KDR, MYD88, and MDM2) in the mRNAseq_325 cohort of the CGGA, and three genes (FN1, NOTCH1, and CDKN2A) in the mRNAseq_693 cohort of the CGGA were significantly associated with overall survival in GBM patients.
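The overlap between the three validation cohorts can be tallied directly from the gene lists reported above. The sketch below is a bookkeeping aid only; the dictionary keys are illustrative labels for the cohorts.

```python
from collections import Counter

# Genes reported above as significantly associated with overall survival
significant = {
    "GEPIA2 (TCGA)": {"FN1", "CXCL8", "TNFRSF1A"},
    "CGGA mRNAseq_325": {"IL6", "STAT3", "MMP9", "FN1", "CD4", "CCL2",
                         "IL10", "ICAM1", "KDR", "MYD88", "MDM2"},
    "CGGA mRNAseq_693": {"FN1", "NOTCH1", "CDKN2A"},
}

# Number of cohorts in which each gene reached significance
cohorts_per_gene = Counter(g for genes in significant.values() for g in genes)
in_all = sorted(g for g, n in cohorts_per_gene.items() if n == len(significant))
validated_anywhere = sorted(cohorts_per_gene)

print(in_all, len(validated_anywhere))
```

Run on these lists, the tally shows that FN1 is the only gene significant in all three cohorts and that 15 of the 31 candidate genes were significant in at least one cohort.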

Fig. 4 (a–q) Kaplan-Meier analysis of overall survival for GBM patients based on low and high gene expression, using GEPIA2 with the TCGA cohort (a, FN1; b, CXCL8; c, TNFRSF1A), the mRNAseq_325 cohort of the CGGA (d, IL6; e, STAT3; f, MMP9; g, FN1; h, CD4; i, CCL2; j, IL10; k, ICAM1; l, KDR; m, MYD88; n, MDM2), and the mRNAseq_693 cohort of the CGGA (o, FN1; p, NOTCH1; q, CDKN2A). The red line represents samples with high expression of the gene, and the blue line represents samples with low expression. Among the 31 genes tested, p < 0.05 was considered statistically significant

The list of the top 31 genes was used as input for the enrichment analysis. As a result, 1271 GOBP terms were found, of which 11 were considered. The complete list of significantly enriched GOBP terms and related genes is given in Table 3.

Table 3 The top enriched gene ontology biological process terms

Discussion

To the best of our knowledge, this systematic literature review is the most comprehensive review of gene expression for predicting GBM overall survival outcomes. The 31 most important genes (IL6, EGFR, STAT3, MMP9, CD44, FN1, CD4, TGFB1, CXCL8, CCL2, IL10, ICAM1, IL1A, CD274, KDR, SPP1, ITGB2, CDKN2A, PARP1, MYD88, AGT, NOTCH1, SERPINE1, TNFRSF1A, CDK1, CAV1, ITGB3, CDK4, FOXO3, MDM2, and PROM1) were considered as candidate biomarkers for GBM survival. Our analyses showed that they could all be considered biomarkers. Based on the search strategy (Supplementary File 1, Table S2), this review aimed to identify all relevant studies that reported genes significantly related to overall survival in GBM patients. However, some relevant reports on this topic might have been missed due to limitations in the search strategy [265, 266]. In future studies, given the well-established heterogeneity of GBM, the prognostic value of specific genes should be assessed with consideration for GBM molecular subtypes, to ensure a comprehensive understanding of their impact and to pave the way for precision medicine [266].

Detection of specific gene expression in GBM tumors may be used to diagnose GBM or to enable clinicians to select the most effective treatment. As there was heterogeneity among the studied genes, bioinformatic analyses were performed to compile these data. The results identified 31 key genes, which had high weight and good topological properties (degree, stress, betweenness, closeness, and radiality) in the pathogenic networks. In addition, these genes were validated by RT-qPCR assays or bioinformatic analysis of datasets. In this study, according to five typical nodal metrics, we found the 31 most important genes related to the survival of patients with GBM. However, there is currently no consensus on how to use these metrics for the interpretation of biological networks [267]. Therefore, these findings require further investigation.

Identification of survival-associated genes in GBM has been ongoing over the past decade. However, the gene lists identified by researchers [8–262] differ considerably; only 107 of the 720 reported genes were common to two or more studies. These differences can be attributed to two major factors. First, researchers have analyzed GBM datasets from various cohorts worldwide. Second, studies have analyzed different types of datasets, obtained using different approaches such as PCR or next-generation sequencing. Due to technical limitations and cohort specificity, the expression profiles of the same genes identified from different datasets may be inconsistent.

To improve prognostic and predictive power for survival in GBM patients, researchers [69, 91–92, 94–95, 97, 109, 131, 137–138, 144, 147, 172, 186, 189, 195, 197, 200, 222, 242, 251, 253] identified panels of 2, 3, 4, 6, 7, 8, 13, or 14 genes using mRNA expression datasets. They established risk score models that performed well in survival prediction. Patients in the high-risk group had significantly poorer survival than those in the low-risk group. In these studies, the AUC for 1-year overall survival prediction ranged from 0.587 [195] to 0.905 [69]. This difference may be due to the use of different databases and to cohort specificities. The 31-mRNA panel obtained in this study is proposed as a candidate predictor of OS in glioblastoma across cohorts.
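A typical gene-panel risk score is a weighted sum of expression values, with patients split into high- and low-risk groups at the median score. The sketch below illustrates that general idea together with a simple AUC computed as the probability that a patient who died scores higher than one who survived. It is not the model of any cited study: the gene names, coefficients, and patients are hypothetical, and the AUC here ignores censoring, whereas published 1-year AUCs usually come from time-dependent ROC analysis.

```python
from statistics import median

def risk_scores(patients, coefficients):
    """Linear risk score per patient: sum of coefficient * expression."""
    return [sum(coefficients[g] * p[g] for g in coefficients) for p in patients]

def auc(scores, died):
    """Probability that a patient who died has a higher score than one who
    survived (ties count one half); equals the area under the ROC curve."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical two-gene panel and four made-up patients
coefficients = {"GENE_X": 0.5, "GENE_Y": -0.25}
patients = [
    {"GENE_X": 4.0, "GENE_Y": 2.0},
    {"GENE_X": 1.0, "GENE_Y": 4.0},
    {"GENE_X": 3.0, "GENE_Y": 1.0},
    {"GENE_X": 2.0, "GENE_Y": 2.0},
]
died_within_1y = [1, 0, 0, 1]

scores = risk_scores(patients, coefficients)
cutoff = median(scores)
high_risk = [i for i, s in enumerate(scores) if s > cutoff]
print(scores, high_risk, auc(scores, died_within_1y))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which helps put the reported range of 0.587 to 0.905 in context.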

The present study showed that ten hub genes (IL6, EGFR, STAT3, MMP9, CD44, FN1, CD4, TGFB1, CXCL8, and CCL2) with higher node degree in the PPI network have been predicted to be survival biomarkers for GBM patients, and some have been experimentally validated. These hub genes can be proposed as candidate biomarkers and therapeutic targets for future research in patients with GBM. In addition, this study showed that five hub genes (IL6, NOTCH1, TGFB1, EGFR, and KDR) were involved in most of the pathways, and they can be further investigated for biological discoveries. Moreover, we noticed that the cell proliferation, apoptotic process, cell migration, and cell differentiation terms contain many of these genes (see Table 3). In addition, various studies have shown that overexpression of these five genes leads to increased cell proliferation and invasion and to inhibition of apoptosis in glioblastoma tumors, and is associated with poor patient survival [268,269,270,271]. Therefore, these GOBP terms may exert a synergistic effect on the survival of GBM patients, which could provide clues to therapeutic strategies for this disease.

One might ask how robust the hub genes obtained from this study are. Various computer modeling algorithms and prediction methods have been and are being developed and used to predict outcomes in medical research. Each modeling approach has its strengths and weaknesses, and no single approach is best for all cases [272]. The best modeling approach is uncertain and may be obtained by combining more than one model, and research in this field continues [272]. Changing the prediction method changes which variables are most important for predicting outcomes [272]. We identified two hub gene groups that were associated with overall survival in GBM patients. The ten and five hub genes were ranked by degree and by pathway analysis, respectively. When a different gene selection criterion is applied, the number of genes in the two top-ranking lists will also change [273]. In this study, the two methods yielded different top-ranking gene lists because of their different approaches. Interestingly, the two lists of hub genes have three genes in common (IL6, TGFB1, and EGFR), which were selected by both methods and can be considered the most important genes for the prediction of survival.
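The overlap between the two hub gene lists can be verified with a short check. Both lists are copied from the Discussion above; the variable names are illustrative.

```python
# Ten hub genes selected by node degree
hubs_by_degree = ["IL6", "EGFR", "STAT3", "MMP9", "CD44",
                  "FN1", "CD4", "TGFB1", "CXCL8", "CCL2"]
# Five hub genes selected by pathway involvement
hubs_by_pathway = ["IL6", "NOTCH1", "TGFB1", "EGFR", "KDR"]

pathway_set = set(hubs_by_pathway)
# Keep the order of the degree list so the output is deterministic
shared = [g for g in hubs_by_degree if g in pathway_set]
pathway_only = [g for g in hubs_by_pathway if g not in hubs_by_degree]

print(shared, pathway_only)
```

The check returns IL6, EGFR, and TGFB1 as shared genes, and shows that NOTCH1 and KDR are prominent in the pathway analysis without being among the ten highest-degree nodes.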

In this study, the lists of five and ten hub genes were selected by two different methods; therefore, the two groups differ [272, 273]. The TCGA and the CGGA databases are among the world's largest and most comprehensive public gene expression databases for GBM patients. Hence, these databases were used to validate our results. In the validation analysis, we used GEPIA2 and the mRNAseq_325 and mRNAseq_693 cohorts of the CGGA. Only the FN1 gene was significant in all three cohorts. Although the mRNAseq_693 cohort includes more patients with Grade 4 glioma than the mRNAseq_325 cohort, only three genes were significant, compared with eleven genes in the other CGGA cohort (Table 2). The differences seen between the three cohorts may be due to genetic differences between the populations.

Study consistency

Of the 255 manuscripts, all studies were prospective. No randomized trial was found. In total, 107 mRNAs (14.9%; 107/720) were reported in two or more studies. However, a large number of the reported genes have not been validated; hence, high-quality evidence was lacking in this study. Based on the NOS, 124 studies were rated as fair quality and 131 studies as high quality. Types of studies and datasets were not consistently reported, resulting in a potential bias.

In this systematic review, only genes with a significant correlation between expression and overall survival were considered. Previous studies have found that gene expression levels are associated with prognosis and that some genes can be used to predict the survival risk of GBM patients. However, some studies conflict regarding the significance of the association between gene expression and overall survival. These conflicts seem to depend on the GBM sample size, the heterogeneity of GBM, the datasets used, and the methodologies employed. These factors may explain why the validation step did not yield significant results for all genes.

Study quality

Several factors may affect the quality of the included studies: the small sample size in PCR-based studies, the high number of single-center studies, the variety of databases, and the high proportion of studies from China (161/255), the USA (23/255), and India (10/255). The heterogeneity of the studies reduced the quality of the data. In some of the datasets and GBM samples, the type and severity of GBM disease were not specified. In addition, some studies lacked validation of their candidate genes in a GBM patient cohort. Therefore, further research with large sample sizes and validation in GBM patients is warranted.

Strengths, limitations, and future perspectives

To the best of our knowledge, the current study is the first that systematically reviewed published data on gene expression related to the survival of GBM patients. Of note, the major strength of the current systematic review is that bioinformatic analyses were performed, which added new information to the previously studied gene expression on this topic. The findings reported here provide a better view of gene expression biomarkers in predicting the prognosis of patients with GBM.

There are some limitations in our work. Firstly, the search strategy was restricted to English-language literature; hence, qualified studies published in other languages may have been excluded. Secondly, the methods used in the included studies were highly heterogeneous. In particular, (1) disease severity varied among studies, and (2) age- and gender-related changes in GBM patients were not considered. Thirdly, overall survival is associated with multiple factors, such as poor immune response, which were not considered in this study. Additionally, the role of gene expression in various biological processes and the potential application of these molecules as gene therapies were not completely clarified. Hence, future studies are required to clarify the biological roles of these mRNAs and to investigate the possibility of their clinical use in GBM patients.

Conclusion

Our review suggests that the current evidence for gene expression associated with GBM survival is highly variable. At present, no clear conclusions can be drawn from this systematic review for application in clinical practice. The key recommendation from this study is that strategies and guidelines for genetic data sharing be developed in this field, so that shared data can be used to answer important questions. Moreover, in the future, a combination of significant gene expression signatures could be applied to identify a powerful and independent predictor of outcome in GBM patients.