Introduction

Glioma is a highly aggressive tumor and the most common primary intracranial malignancy1,2. The World Health Organization (WHO) published an updated tumor taxonomy in 2021 that bases GBM classification on clinical and molecular pathological features. Low-grade gliomas (LGG) of grades 1 and 2 are defined as slow-growing tumors with low infiltration of the brain parenchyma, whereas high-grade gliomas (HGG) of grades 3 and 4 are poorly differentiated or undifferentiated tumors with high infiltration of the brain parenchyma. Grade 4 tumors, also known as glioblastoma (GBM), are the most malignant and aggressive CNS tumors1,3,4. Surgical resection, targeted agents and chemotherapy are currently the most widely applied treatment options; however, the prognosis of GBM remains unfavorable, with a median survival time of only 14–16 months5. Immunotherapy, an emerging second-line treatment option, has proven successful in other cancers6,7. In GBM, high infiltration results in extensive heterogeneity in the immune composition of the tumor microenvironment, which explains the inability of immunotherapy to achieve breakthroughs8,9.

Numerous investigations have proposed molecular classifications of tumors based on The Cancer Genome Atlas (TCGA), using the expression of signature genes to define cancer subtypes10,11,12,13. These studies contributed to the development of anti-tumor targeted agents14,15. Molecular subtyping studies have also classified GBM according to different features10,16. Liu et al. identified two subtypes in TCGA's low-grade glioma dataset based on cytomorphologic biomarkers of GBM10. In addition, Munquad et al. used high-throughput transcriptome and methylome data to develop a deep learning model based on convolutional neural networks for recognizing GBM subtypes17. Delineating GBM molecular subtypes could improve personalized treatment selection for patients18. The early study by Verhaak et al. characterized four GBM subtypes, Proneural, Neural, Classical, and Mesenchymal, which exhibited abnormal expression or copy number variation (CNV) of PDGFRA, IDH1, EGFR, and NF1, respectively19. However, systematic and in-depth insight into current GBM subtyping is still lacking. A deeper delineation of GBM molecular subtypes could offer a robust basis for therapeutic guidance and immunotherapy development.

In this research, we present the first classification of GBM molecular subtypes in TCGA using the Bayesian non-negative matrix factorization (BayesNMF) algorithm and consensus clustering. This clustering method previously helped to identify six expression subtypes of head and neck squamous cell carcinoma20. Genomic and proteomic data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC), cell line data from the Dependency Map, and proteomic data from The Cancer Proteome Atlas (TCPA) demonstrated the consistency and reproducibility of the discovered molecular subtypes. Because effective natural antitumor immunity requires cytolytic immune responses, simple expression metrics of the effector molecules that mediate cytolytic activity (PRF1, GZMA) can quantify the cytolytic activity of GBM, allowing us to identify molecular subtypes that are sensitive to spontaneous cytolytic activity21. The molecular and pathological features of each subtype were explored based on expression profiles, and response to immunotherapy was assessed by TIDE. Our analysis identified novel molecular subtypes in GBM, and LASSO logistic regression was performed on the expression profiles of specific subtypes to build subtype-specific prediction models, laying the foundation for a deeper characterization of cancer pathogenesis.

Materials and methods

TCGA GBM data

For this study, we collected GBM data from multiple databases. RNA sequencing (RNA-seq) data for the GBM sequencing project (TCGA-GBM) were obtained from the TCGA database (https://portal.gdc.cancer.gov/), together with the clinical data and somatic mutation data (MAF file format) of the TCGA-GBM samples. The SangerBox database (http://sangerbox.com/login.html)22 was used to screen the samples, and those with missing status were eliminated. For the RNA-seq data, Fragments Per Kilobase of exon model per Million mapped fragments (FPKM) values were first converted into Transcripts Per Million (TPM) and then transformed as log2(TPM + 1). After processing, RNA-seq data from 153 samples were included in this study.
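
The FPKM-to-TPM conversion and log transformation described above can be sketched as follows. This is a minimal illustration in Python rather than the pipeline actually used; the function name and the toy values are hypothetical, and the sketch assumes one dictionary of FPKM values per sample.

```python
import math

def fpkm_to_log2_tpm(fpkm_by_gene):
    """Convert one sample's FPKM values to log2(TPM + 1).

    fpkm_by_gene: dict mapping gene symbol -> FPKM for a single sample.
    """
    total = sum(fpkm_by_gene.values())
    if total <= 0:
        raise ValueError("sample has no expressed genes")
    # TPM rescales FPKM so that the values of each sample sum to one million.
    tpm = {gene: value / total * 1e6 for gene, value in fpkm_by_gene.items()}
    # The +1 pseudocount keeps unexpressed genes at exactly 0 after the log.
    return {gene: math.log2(value + 1.0) for gene, value in tpm.items()}

# Toy sample with made-up values (not TCGA data).
sample = {"EGFR": 30.0, "IDH1": 10.0, "NF1": 0.0, "PDGFRA": 60.0}
log_tpm = fpkm_to_log2_tpm(sample)
```

Because TPM is normalized within each sample, the conversion is applied to every sample independently before samples are compared.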

Genomic, proteomic and copy number variation data for GBM

Genomics and proteomics data for 99 GBM samples were sourced from the Clinical Proteomic Tumor Analysis Consortium (CPTAC, https://cptac-data-portal.georgetown.edu/datasets) database. Copy number variant data were extracted from the cBioPortal for Cancer Genomics (https://www.cbioportal.org/) database. Reverse phase protein array (RPPA) data were sourced from The Cancer Proteome Atlas (TCPA, https://www.tcpaportal.org/tcpa/download.html). In addition, the omics data and CRISPR knockout data for Cancer Cell Line Encyclopedia (CCLE) GBM cell line samples were obtained from the DepMap (https://depmap.org/portal/; DepMap Public 21Q3 dataset). Out of 1377 cell lines, we used 47 GBM cell lines.

Vulnerability analysis of different subtype-specific cancers

The CERES score obtained from the Cancer Dependency Map (DepMap) was used as a measure of cancer cell line vulnerability. CERES estimates the dependency of each tested cell line on a given gene knockout. Typically, the score is interpreted as follows: a score of 0 indicates that the gene is not essential in a given cell line, while -1 indicates a high dependence23.

Identification of molecular subtypes in GBM by Bayesian non-negative matrix factorization (BayesNMF)

To identify novel molecular subtypes in GBM, the expression profiles of the 153 TCGA-GBM samples were partitioned by consensus clustering based on BayesNMF, following the method of Tan et al.24. The top 25% most variable genes across samples were retained, and their expression matrix was defined as R. R was then transformed into R*, the fold change of each gene relative to its median expression across samples. The distance matrix 1-C (where Cij is the Spearman correlation between the gene expression profiles of sample i and sample j) and the consensus matrix Mk were determined. BayesNMF was run to select the optimal number of clusters K and to assign samples to clusters. The resulting subtypes were compared with the Proneural, Neural, Classical, and Mesenchymal subtypes described by Verhaak et al.19.
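
The preprocessing steps that precede BayesNMF (variable-gene selection, median centering and the Spearman distance 1-C) can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the implementation of Tan et al.: expression is assumed to be on the log scale, so subtracting the median yields a log fold change, and all function names and toy values are hypothetical. The BayesNMF and consensus steps themselves are not shown.

```python
import statistics

def top_variable_genes(expr, fraction=0.25):
    """expr: dict gene -> list of log-expression values, one per sample."""
    ranked = sorted(expr, key=lambda g: statistics.pvariance(expr[g]), reverse=True)
    return ranked[:max(1, int(len(ranked) * fraction))]

def median_center(values):
    """On the log scale, subtracting the median gives a log fold change."""
    med = statistics.median(values)
    return [v - med for v in values]

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    # Undefined (division by zero) if either profile is constant.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_distance_matrix(expr, genes):
    """Return 1 - C, where C[i][j] is the Spearman correlation of samples i, j."""
    n_samples = len(expr[genes[0]])
    centered = {g: median_center(expr[g]) for g in genes}
    # Spearman correlation is the Pearson correlation of the ranked profiles.
    profiles = [ranks([centered[g][j] for g in genes]) for j in range(n_samples)]
    return [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n_samples)]
            for i in range(n_samples)]

# Toy matrix: 3 variable genes and 9 flat genes across 4 samples (made-up values).
expr = {"GENE_A": [5.0, 1.0, 5.0, 1.0],
        "GENE_B": [1.0, 6.0, 1.0, 6.0],
        "GENE_C": [2.0, 2.0, 9.0, 9.0]}
expr.update({f"FLAT_{i}": [4.0] * 4 for i in range(9)})
genes = top_variable_genes(expr)
dist = spearman_distance_matrix(expr, genes)
```

Samples with identical rank profiles have distance 0 and samples with exactly reversed profiles have distance 2, so the matrix can be passed to any distance-based clustering routine.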

Single-sample gene set enrichment analysis (ssGSEA) in molecular subtypes

To examine the biological differences among molecular subtypes, the h.all.v7.5.1.symbols.gmt gene set was obtained from the Molecular Signatures Database (MSigDB, https://www.gsea-msigdb.org/gsea/msigdb), and ssGSEA26,27 was performed with the GSVA package25 in R. Features of the molecular subtypes were defined based on pathway activity. Proliferative and immune features of the subtypes and the expression levels of the T cell effector molecules PRF1 and GZMA were assessed based on pan-cancer markers from previous studies28.
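
One simple way to turn PRF1 and GZMA expression into a single cytolytic activity value is the geometric mean of the two genes, a definition commonly used in the literature. The text above does not state the exact formula applied here, so the geometric mean, the 0.01 offset, the function names and the toy values in the following Python sketch are all assumptions made for illustration.

```python
import math
import statistics
from collections import defaultdict

def cytolytic_score(prf1_tpm, gzma_tpm, offset=0.01):
    """Geometric mean of PRF1 and GZMA expression on the TPM scale.

    The offset (an assumption) avoids a zero score when one gene is undetected.
    """
    return math.sqrt((prf1_tpm + offset) * (gzma_tpm + offset))

def median_score_by_subtype(samples):
    """samples: iterable of (subtype, PRF1 TPM, GZMA TPM) tuples."""
    grouped = defaultdict(list)
    for subtype, prf1, gzma in samples:
        grouped[subtype].append(cytolytic_score(prf1, gzma))
    return {subtype: statistics.median(scores) for subtype, scores in grouped.items()}

# Made-up values for two hypothetical subtypes (not study data).
toy = [("S2", 1.0, 1.0), ("S2", 2.0, 2.0), ("S3", 8.0, 18.0), ("S3", 10.0, 10.0)]
medians = median_score_by_subtype(toy)
```

Summarizing by the median per subtype keeps the comparison robust to a few samples with extreme expression.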

CPTAC cohort validation

GBM samples from the CPTAC cohort were assigned to molecular subtypes by consensus clustering with the BayesNMF implementation, which is based on a normal likelihood and exponential priors and uses an efficient Gibbs sampler to approximate the posterior density of the NMF factors29. Using the h.all.v7.5.1.symbols.gmt gene set, ssGSEA was performed on the clustered subtypes to explore differences in biological pathways and to assess the reproducibility and robustness of the novel GBM subtypes.

Copy number variation analysis in TCGA and CPTAC cohorts

To uncover the mutational heterogeneity of cancer and to select new cancer driver genes, the somatic mutation data obtained from the TCGA cohort were analyzed. We determined representative mutated genes in GBM and genes with remarkable somatic copy-number alterations (SCNA) in novel subtypes of GBM using MutSig2CV30,31 and Genomic Identification of Significant Targets in Cancer (GISTIC) 2.032.

Identification of subtypes for immunotherapy

To identify molecular subtypes of GBM that could benefit from immunotherapy, we calculated the Tumor Immune Dysfunction and Exclusion (TIDE) score. Initially, a cohort treated with temozolomide (accession number GSE84010) was retrieved from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/). Based on the subtype marker genes, the samples in GSE84010 were assigned to GBM molecular subtypes by BayesNMF-based consensus clustering. ssGSEA was performed again to verify the reproducibility of the subtypes. The TIDE scores of the TCGA cohort samples were then obtained from the Tumor Immune Dysfunction and Exclusion website (http://tide.dfci.harvard.edu/) to compare TIDE scores among subtypes, with lower TIDE scores representing greater benefit from immunotherapy33. CIBERSORT is a tool that deconvolutes an expression matrix into the proportions of 22 human immune cell subtypes based on linear support vector regression34. The proportions of immune cells in the TCGA-GBM cohort were computed with CIBERSORT.

Construction of subtype prediction model

The protein data from the TCGA-GBM cohort and TCPA were divided into a training set and a test set at a ratio of 8:2. The cv.glmnet() function of the glmnet package35 in R was used to construct the LASSO logistic regression model with marker genes, and the model with the minimum prediction error rate was selected by choosing the optimal lambda value.
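
The 8:2 split and the sparsity-inducing effect of the LASSO penalty can be illustrated with the small Python sketch below. It is not the glmnet procedure: it uses plain proximal gradient descent with a fixed lambda instead of cv.glmnet()'s cross-validated coordinate descent, and the function names, the toy data and the value lam=0.1 are assumptions chosen for illustration.

```python
import math
import random

def split_train_test(samples, train_fraction=0.8, seed=0):
    """Shuffle once, then cut at the requested fraction (8:2 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def soft_threshold(value, amount):
    """Shrink value toward zero; this step is what produces exact zeros."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_l1_logistic(X, y, lam, step=0.5, n_iter=2000):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n, p = len(X), len(X[0])
    weights, intercept = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            error = sigmoid(intercept + sum(w * x for w, x in zip(weights, row))) - label
            grad_b += error / n
            for j in range(p):
                grad_w[j] += error * row[j] / n
        intercept -= step * grad_b  # the intercept is not penalized
        weights = [soft_threshold(w - step * g, step * lam)
                   for w, g in zip(weights, grad_w)]
    return weights, intercept

def predict(weights, intercept, row):
    return 1 if sigmoid(intercept + sum(w * x for w, x in zip(weights, row))) > 0.5 else 0

# Toy data: feature 0 tracks the label, feature 1 is uninformative (made-up values).
X = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0], [1.0, 0.0],
     [-1.0, 1.0], [-1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0], [-1.0, 0.0]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

train_idx, test_idx = split_train_test(range(len(X)))
weights, intercept = fit_l1_logistic([X[i] for i in train_idx],
                                     [y[i] for i in train_idx], lam=0.1)
test_accuracy = sum(predict(weights, intercept, X[i]) == y[i]
                    for i in test_idx) / len(test_idx)
```

In practice lambda would be chosen by cross-validation on the training set, and the features whose coefficients remain non-zero form the reduced marker panel.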

Cell culture

The GBM cell line U87 and the normal cell line HEB were obtained from ATCC. Cells were cultured at 37 °C and 5% CO2 in Dulbecco's Modified Eagle Medium (DMEM; Gibco) supplemented with 10% fetal bovine serum, 1% glutamine, and 1% antibiotic/antifungal solution.

Quantitative real-time polymerase chain reaction (qRT-PCR)

Total RNA was extracted using Trizol reagent (Invitrogen) and reverse transcribed to cDNA with the Takara PrimeScript™ RT Reagent Kit according to the manufacturer's instructions. qRT-PCR was then performed with Takara TB Green™ Premix Ex Taq™ II according to the manufacturer's instructions under the following reaction conditions: pre-denaturation at 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 30 s. Finally, the 2^-ΔΔCt method was used to calculate the relative expression of the target genes with GAPDH as the internal reference gene. The primer sequences of the target genes were as follows: CDAN1: GCAGGATCAACCCAACTCCG (F), and CTCGCTCCTCTTGCAGACTTC (R); UBE2M: ATGAGGGCTTCTACAAGAGTGG (F), and ATTGTCTCACACTTCACCTTGG (R).
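
The 2^-ΔΔCt calculation can be written out as a short worked example. The Python sketch below uses made-up Ct values; the function and argument names are hypothetical, with the reference gene playing the role of GAPDH and the control sample the role of the normal cell line.

```python
def relative_expression(ct_target_case, ct_ref_case, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference gene), computed per sample.
    ddCt = dCt(case) - dCt(control).
    """
    delta_case = ct_target_case - ct_ref_case
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(delta_case - delta_ctrl)

# Made-up Ct values: the target crosses threshold two cycles earlier in the
# case sample than in the control, at equal reference-gene Ct.
fold_change = relative_expression(ct_target_case=24.0, ct_ref_case=18.0,
                                  ct_target_ctrl=26.0, ct_ref_ctrl=18.0)
```

A difference of two cycles corresponds to a four-fold change, because each PCR cycle ideally doubles the amount of product.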

Statistical analysis

Statistical analyses were conducted in R (version 4.1.0). The two-sided Wilcoxon rank sum test and the chi-square test were used. Differences were considered statistically significant at P < 0.05.

Results

Four novel molecular subtypes in GBM classified at the genomic level

The workflow of this study is summarized in Supplementary Fig. 1. We aimed to classify novel molecular subtypes in GBM at the genomic level using a consensus clustering approach based on BayesNMF (Supplementary Table 1). Our findings indicated the existence of four novel molecular subtypes (S1, S2, S3, and S4) (Fig. 1A). To further explore the properties of these subtypes, we compared them with the subtypes of Verhaak et al.19 (Proneural, Neural, Mesenchymal, Classical). The classification by Verhaak et al. provided the basis and perspective for the molecular classification of GBM, but the study by Wan et al. demonstrated that the original Verhaak 840-gene set was unable to robustly cluster the larger cohorts now available on different platforms36. We observed that 46.2% of the samples in S1 were of the Neural subtype, 82.8% of the samples in S2 were of the Proneural subtype, 74% of the samples in S3 were of the Mesenchymal subtype, and 57.9% of the samples in S4 were of the Classical subtype. Furthermore, S2 was closely related to the Proneural subtype and S3 to the Mesenchymal subtype, with 70.6% of Proneural samples mapping to S2 and 66.1% of Mesenchymal samples mapping to S3 (Fig. 1B). These results indicate that our classification of GBM is highly consistent with previous classifications, with each subtype associated with a specific phenotype.

Figure 1

Four novel GBM molecular subtypes. (A): Heat map of marker gene expression in GBM subtypes. (B): Comparison of the four novel GBM molecular subtypes with the Proneural, Neural, Classical, and Mesenchymal subtypes. The number in each box is the sample count, and the percentage in the middle is the proportion of those samples among all samples; the percentage on the right is the proportion within the corresponding row; the lower percentage is the proportion within the corresponding column.
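
The three percentages reported in each box of panel B can be reproduced from paired subtype labels as in the following Python sketch. The assignment of one classification to rows and the other to columns, the function name and the toy labels are assumptions for illustration and do not reproduce the study counts.

```python
from collections import Counter

def crosstab_percentages(pairs):
    """pairs: iterable of (row_label, column_label), one per sample."""
    counts = Counter(pairs)
    row_totals, col_totals = Counter(), Counter()
    for (row, col), n in counts.items():
        row_totals[row] += n
        col_totals[col] += n
    total = sum(counts.values())
    return {
        (row, col): {
            "n": n,
            "pct_all": 100 * n / total,            # share of all samples
            "pct_row": 100 * n / row_totals[row],  # share within the row
            "pct_col": 100 * n / col_totals[col],  # share within the column
        }
        for (row, col), n in counts.items()
    }

# Made-up labels: (novel subtype, previously described subtype).
pairs = ([("S2", "Proneural")] * 3 + [("S2", "Classical")]
         + [("S3", "Proneural")] + [("S3", "Mesenchymal")] * 3)
table = crosstab_percentages(pairs)
```

The row and column percentages answer different questions: the first gives the composition of a novel subtype, the second gives where a previously described subtype ends up.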

Next, we examined the biological differences among the four molecular subtypes by performing ssGSEA. The results indicated that S1 exhibited a low proliferative profile, S2 and S4 exhibited a high proliferative profile, and S3 exhibited a high immune profile. In terms of survival trend, the prognosis of S2 and S4 was worse than that of S1 and S3 (Supplementary Fig. 2). Moreover, S2 exhibited low immune and inflammatory features (Fig. 2A). Genomic features of the four molecular subtypes were characterized using the TCGA pan-cancer genomic features: the four subtypes showed distinct mutational profiles, and S3 exhibited an overall higher Aneuploidy Score and Stromal Fraction (Fig. 2B, C). Furthermore, to explore the association between the four subtypes and driver events (point mutations, insertions/deletions and copy number variants), we analyzed cancer driver genes and copy number variants. Our results indicated that S2 was enriched for IDH1 mutation and TP53 mutation and loss, S3 was enriched for NF1 deletion, and S4 was enriched for EGFR amplification/mutation (Fig. 2D).

Figure 2

Biological pathway analysis and copy number variation analysis in GBM subtypes. (A): ssGSEA results showing the biological pathway differences in GBM subtypes. (B): Number of Segments, Aneuploidy Score, and Stromal Fraction in GBM subtypes. (C): Waterfall diagram demonstrating the variation differences of 17 genes in GBM subtypes. (D): Comparison of CNV differences in GBM subtypes.

The immune landscape in S1-4

We next examined the immune landscape in S1-4. S2 showed lower Macrophage Regulation, Lymphocyte Infiltration Signature Score, IFN-gamma Response, and TGF-beta Response. In contrast, S3 exhibited higher Macrophage Regulation, Lymphocyte Infiltration Signature Score, IFN-gamma Response, and TGF-beta Response (Fig. 3A). Important molecules in the immune system, including immune checkpoint molecules and human leukocyte antigen (HLA) genes encoding MHC proteins, showed higher expression levels in S3 than in the other three molecular subtypes (Supplementary Fig. 3A, B). According to the pan-cancer immune characteristics, the highest levels of T Cells Follicular Helper and Th2 Cells were found in S2, and the highest levels of Dendritic Cells, Monocytes, and Neutrophils were found in S3 (Fig. 3B). We also evaluated the immune score, the stromal score37, and the overall scores of immune cell types related to adaptive immunity and to innate immunity38 among the four molecular subtypes and found that all four scores were the highest in S3 (Supplementary Fig. 4A, B). These results indicated that S3 had higher immunogenicity and was more easily recognized by the immune system to elicit an immune response.

Figure 3

Immune-related scores and immune cell scores in GBM subtypes. (A): Scores of Macrophage Regulation, Lymphocyte Infiltration Signature Score, IFN-gamma Response, TGF-beta Response in GBM subtypes calculated by ssGSEA. (B): Immune cell score in GBM computed by CIBERSORT. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.

Subtype-specific cancer vulnerability and validation of specific gene expression levels

Glioma cell lines in CCLE were first classified into specific subtypes using subtype-specific marker genes (Supplementary Tables 1 and 2). Due to the small number of high-grade glioma cell lines, only three subtypes, S2 (n = 4), S3 (n = 36), and S4 (n = 6), were eventually mapped (Supplementary Fig. 5A). We then selected the GBM somatic copy number variant genes identified in this study (Supplementary Table 3). Among them, genes with subtype-specific cancer vulnerability had to fulfill two conditions: first, the median CERES of the gene had to be less than -0.5 in the subtype, reflecting that the cell lines of that subtype depend on the gene; and second, the median CERES of the gene in the specific subtype had to be lower than that in the remaining cell lines by 0.3. Only two genes (CDAN1 and UBE2M) fulfilled these conditions. However, the differences in these genes were non-significant due to the small sample size (Supplementary Fig. 5B).
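
The two CERES conditions above amount to a simple filter over genes, sketched here in Python. The data layout, the function name and the toy scores are hypothetical, and the sketch reads "lower by 0.3" as "lower by at least 0.3", which is an assumption.

```python
import statistics

def subtype_specific_dependencies(ceres, subtype_of, subtype,
                                  essential_cutoff=-0.5, min_gap=0.3):
    """Return genes on which one subtype depends more than the other cell lines.

    ceres: dict gene -> dict cell_line -> CERES score.
    subtype_of: dict cell_line -> subtype label.
    """
    hits = []
    for gene, scores in ceres.items():
        inside = [s for line, s in scores.items() if subtype_of[line] == subtype]
        outside = [s for line, s in scores.items() if subtype_of[line] != subtype]
        if not inside or not outside:
            continue
        median_in = statistics.median(inside)
        median_out = statistics.median(outside)
        # Condition 1: the subtype depends on the gene (median below the cutoff).
        # Condition 2: the dependency is stronger than in the remaining lines.
        if median_in < essential_cutoff and median_out - median_in >= min_gap:
            hits.append(gene)
    return hits

# Made-up scores for four hypothetical cell lines.
subtype_of = {"LINE_1": "S2", "LINE_2": "S2", "LINE_3": "S3", "LINE_4": "S3"}
ceres = {
    "GENE_X": {"LINE_1": -0.9, "LINE_2": -0.7, "LINE_3": -0.2, "LINE_4": -0.1},
    "GENE_Y": {"LINE_1": -0.9, "LINE_2": -0.8, "LINE_3": -0.8, "LINE_4": -0.7},
    "GENE_Z": {"LINE_1": -0.1, "LINE_2": -0.2, "LINE_3": -0.9, "LINE_4": -0.8},
}
s2_hits = subtype_specific_dependencies(ceres, subtype_of, "S2")
```

GENE_Y is excluded because it is essential in every line rather than in one subtype, which is the case the second condition is designed to remove.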

To validate CDAN1 and UBE2M, we examined the mRNA expression levels of these two genes in the GBM cell line U87 based on qRT-PCR. As shown in Supplementary Fig. 5C, we observed that the mRNA expression levels of CDAN1 and UBE2M were significantly upregulated in GBM cells compared to normal control cells. These results indirectly confirm the validity of our identified GBM subtypes.

Proteomics revealed specific protein regulation in S2-4

To further explore protein expression features in the four GBM subtypes, GBM samples in CPTAC were clustered as S1 (N = 15), S2 (N = 18), S3 (N = 35), and S4 (N = 21) based on the marker genes of S1-4. Consistent with the ssGSEA results in TCGA, the S2 and S4 subtypes in CPTAC exhibited a high proliferative profile and the S3 subtype exhibited a high immune profile, which supported our conjecture that S3 may be an immune-infiltrated type (Fig. 4A). Because there was no significantly activated pathway in S1, subsequent analyses focused on protein regulation in S2, S3, and S4. GISTIC results showed SCNAs of EGFR and CDKN2A in S2, S3, and S4. Specifically, CDKN2A showed more CNV deletion, whereas EGFR showed more amplification (Fig. 4B). We then compared the protein abundance of EGFR and CDKN2A and found that CDKN2A protein abundance was high in S2 and S3 but low in S4, whereas EGFR protein abundance was low in S2 and S3 but high in S4 (Fig. 4C). Based on these results, we hypothesized that CNV loss of CDKN2A and CNV gain of EGFR in S4 may be one of the main reasons for the decreased expression of CDKN2A and the increased expression of EGFR.

Figure 4

Proteomics data analysis in CPTAC GBM. (A): Molecular subtypes of GBM in the CPTAC GBM cohort based on marker genes. (B): Histogram of CNV information for CDKN2A and EGFR in S2-4. (C): Protein abundance of CDKN2A and EGFR in GBM subtypes. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.

Our initial GSVA results indicated that the S2 and S4 subtypes had a high proliferative profile and S3 had a high immune profile. To further investigate the potential biological differences, we performed a more in-depth characterization at the proteomic level. In the CPTAC cohort, CNV deletion of PD-L1 was more common in S3 and S4, whereas CNV amplification of PD-L1 was more common in S2. The highest levels of PD-L1 expression and phosphorylation were observed in S3 (Fig. 5A). Among S2-S4, EGFR had the highest CNV gain in S4, the highest CNV loss in S2, and intermediate CNV gain and loss in S3, and the highest levels of EGFR expression, protein abundance and phosphorylation were observed in S4 (Fig. 5B). In the TCGA cohort, the CNV pattern of PD-L1 in S2-S4 differed markedly from that of EGFR. The CNV rate of PD-L1 in S2 was low, and the CNV events in S4 were almost all losses. EGFR had a higher CNV rate in S4 than in S2 and S3, and the CNV events in each subtype were gains. S2 had the lowest expression levels of both PD-L1 and EGFR, S3 had the highest expression level of PD-L1, and S4 had the highest expression level of EGFR (Fig. 5C). Based on previous studies, we compared the Proliferation scores and the T cell effector signature (IFN-gamma Response) in S1-4. High Proliferation scores were observed in S2 and S4, and the lowest IFN-gamma Response scores were observed in S2 (Fig. 5D). The highest PRF1 and GZMA protein abundance was observed in S3 (Fig. 5E). Thus, S2, S3 and S4 were heterogeneous in the genomic features of immune molecules, which may lead to different responses to immunotherapy.

Figure 5

Genomic and proteomic analysis in the CPTAC GBM cohort. (A): PD-L1 expression, CNV, protein abundance, and phosphorylation levels in GBM subtypes. (B): EGFR expression, CNV, protein abundance, and phosphorylation levels in GBM subtypes. (C): PD-L1 and EGFR expression, CNV in TCGA-GBM cohort. (D): GBM subtype proliferation score and immune score in TCGA-GBM cohort. (E): Protein abundance of T cell effectors PRF1 and GZMA in the CPTAC GBM cohort.

S3 was a potential molecular subtype for immune checkpoint therapy response

A GBM sequencing cohort treated with temozolomide, GSE84010, was retrieved from the GEO database. Based on the expression matrix of marker genes, the samples could be classified as S1, S2, S3 and S4. The GSEA results were consistent with the trend of sample characteristics in the TCGA cohort, with S2 and S4 exhibiting high proliferative properties and S3 exhibiting high immune properties (Fig. 6A). In comparison with the subtypes identified by Verhaak et al.19, the majority of cases in S2 belonged to the Proneural subtype, the majority of cases in S3 belonged to the Mesenchymal subtype, and the majority of cases in S4 belonged to the Classical subtype (Fig. 6B). These results were highly consistent with those in the TCGA cohort, further indicating that the novel GBM subtypes in this study were robust. To identify the subtypes responding to immunotherapy, TIDE score analysis was conducted. Because S1 lacked significantly activated pathways and typical molecular typing features, we focused on the immunotherapy response in S2, S3, and S4. Samples in S4 had the highest TIDE scores, indicating the least sensitivity to immunotherapy (Fig. 6C). The lowest percentage of response to immunotherapy in S4 was also validated (Fig. 6D). In contrast, TIDE scores were lower in S3, indicating a greater benefit from immunotherapy (Fig. 6C, D). These results suggested that the S3 subtype was potentially a key molecular subtype for indicating response to immunotherapy in GBM. Since S4 had the lowest response rate to immunotherapy, we examined the GBM cohort (GSE84010 dataset) treated with bevacizumab in addition to radiotherapy/temozolomide and analyzed the survival differences among the four subtypes after removing samples that could not be subclassified. S4 had a significantly better prognosis than the other three subtypes in the bevacizumab group, whereas in the placebo group there was no significant difference in prognosis among the four subtypes, suggesting that S4 is the most suitable subtype for bevacizumab (Supplementary Fig. 6).

Figure 6

Immunotherapy response prediction. (A): Comparison of subtype classification and biological activity of samples in GSE84010. (B): Comparison of GBM subtypes with Proneural, Neural, Classical, and Mesenchymal subtypes in GSE84010. (C): TIDE scores of GBM subtypes in GSE84010. (D): Immunotherapy response rate of GBM subtypes in GSE84010.

Biomarkers for the identification of S2, S3 and S4 subtypes

Our results confirmed that the BayesNMF-based approach to constructing a novel subtype classification in GBM was feasible. However, because this analysis identified more than 400 marker genes across the four subtypes, the marker gene set needed to be reduced. Based on the marker genes or marker proteins of S2, S3 and S4, we constructed LASSO logistic regression models in the TCGA-GBM cohort and the RPPA cohort to predict the S2, S3 and S4 subtypes, selecting the model with the smallest prediction error rate at the optimal lambda value (Supplementary Table 4). In the TCGA-GBM cohort, the LASSO logistic regression model with 13 genes reached an accuracy of 96.7% in predicting the S2 subtype, the model with 17 genes reached an accuracy of 86.7% in predicting the S3 subtype, and the model with 14 genes reached an accuracy of 93.3% in predicting the S4 subtype. In the RPPA cohort, the LASSO logistic regression model with 17 marker proteins reached an accuracy of 84.6% in predicting the S2 subtype, and the model with 18 marker proteins reached an accuracy of 84.6% in predicting the S4 subtype (Table 1). These results indicated that the S2, S3 and S4 subtypes could be accurately predicted using marker gene expression data or protein expression data.

Table 1 Gene expression model and protein expression model for predicting S2, S3 and S4 molecular subtypes.

Discussion

Previous GBM studies identified somatic mutations in the IDH gene and common deletions of chromosome arms 1p and 19q, which determined the relevant subtypes39. Verhaak et al.19 also identified four GBM subtypes (Proneural, Neural, Classical, Mesenchymal). In 2021, WHO updated its classification, refining the clinicopathological classification and molecular characterization of GBM1. However, GBM is an “immune-cold” tumor, and its strong heterogeneity and high plasticity have made it challenging to accurately identify molecular subtypes40,41,42,43,44. Unclear molecular mechanisms hinder effective treatment and prognosis prediction for GBM patients5.

In this study, we identified four novel molecular subtypes of GBM, S1, S2, S3, and S4, by consensus hierarchical clustering based on the BayesNMF method. The identity of the classical subtype is defined by the most common genomic aberration in GBM. The majority of cases in S2 belonged to the Proneural subtype, the majority of cases in S3 belonged to the Mesenchymal subtype, and the majority of cases in S4 belonged to the Classical subtype. S1 was dominated by the Neural and Classical subtypes, and the high proportion of both prevented us from clearly defining S1. However, the pathways enriched in S1 indicated that its metabolism is active and its proliferative activity is very low. S2 was enriched for IDH1 mutations and TP53 mutations and deletions, and the most typical features of the Proneural subtype are point mutations in IDH1 and TP53 mutations with loss of heterozygosity19. High abundance of Dendritic Cells, Monocytes, and Neutrophils was observed in S3. Dendritic Cells are specialized antigen-presenting immune cells and important regulators of the innate immune response45. In an inflammatory environment, Monocytes give rise to Monocyte-derived Dendritic Cells (MoDC)46. In GBM, Neutrophils regulate the abundance of T cells and of tumor-associated macrophages/Monocytes47. Reduced Dendritic Cell activity induces T cell depletion48. Thus, the high abundance of Dendritic Cells, Monocytes, and Neutrophils in S3 formed a pro-immune active environment. High levels of EGFR expression and CNV amplification were observed in the Classical subtype19, and EGFR amplification and high protein abundance were detected in S4. Highly proliferative and aggressive GBM in which wtEGFR is typically expressed is an essential cause of relapse after treatment49. We observed that S4 exhibited a higher proliferative profile in which proliferation-associated signaling pathways were activated. When the four molecular subtypes were compared with the classification of Verhaak et al., each Verhaak subtype was specifically reflected in the new molecular subtypes, which supports the rationality of this classification of GBM.

S3 was the most immunocompetent of the four subtypes, showing high macrophage regulation, lymphocyte infiltration signature score, and IFN-gamma response. Macrophages are the major immune cells that express PD-L1 in tumors, and tumors rich in PD-L1 TAM exhibit an activated immune state with high levels of immune-related gene expression that may contribute to ICI therapy50,51,52,53. High levels of IFN-γ-related gene expression signatures are an important feature of tumors that respond to PD-1 checkpoint blockade54. Moreover, the immune score, adaptive immunity score, innate immunity score, and HLA molecules encoding MHC proteins also showed the highest levels in S3, indicating that S3 had higher immunogenicity and was more easily recognized by the immune system to elicit an immune response. Down-regulation of MHC class I molecules on tumor cells is an important mechanism of immune escape and acquired ICI resistance. Increasing the expression of MHC class I proteins helps to improve or restore anti-tumor cell immunity and thereby obtain clinical benefit55. These pieces of evidence all support a favorable response of S3 to ICI therapy. Our TIDE analysis also indicated that S3 had the highest response rate to ICI treatment. To unblock immune checkpoints in tumors, immune cross-presentation in the organism is crucial. Hammerich et al.56 noted that recruitment and induction of cross-priming Dendritic Cells in tumors could enable anti-tumor killing T cell responses and immune checkpoint blockade. Intra-tumor Dendritic Cells enhance T cell responses and glioma rejection57. DCs in the TME play a key role in mediating the response to ICI drugs58. Quantitative expansion and activation of DCs promote the effectiveness of ICI treatment response59. A high abundance of Dendritic Cells was found in S3. The TIDE score indicated that the S3 subtype was less prone to immune escape after receiving immunotherapy.

Interestingly, the lowest TIDE score and the highest immunotherapy response were observed in S1. In our study, S1 did not have significantly activated biological pathways and showed low proliferative properties, which might account for its high response to immunotherapy.

We recognize some limitations of our study. The cohorts studied consisted predominantly of white patients, so the results may not necessarily apply to other populations, and validation in larger and more diverse cohorts is needed. GBM also has intratumor heterogeneity, which needs to be dissected by single-cell sequencing to enable a more complete and accurate intratumor classification. Animal experiments are also a necessary next step.

Collectively, we report a new classification of GBM that divides it into four subtypes, each with its own specific molecular features and a different response rate to immunotherapy. In addition, highly accurate prediction models were built for the particularly important subtypes, providing a reference for specific and potentially targetable markers of each subtype.