1 Introduction

Neuroblastoma (NB) is an embryonal tumor of the autonomic nervous system that is thought to arise from immature, incompletely differentiated precursor cells of neural-crest tissues [1]. As expected for a disease of developing tissues, NB typically arises in infants and toddlers; the average age at diagnosis is 17 months [2, 3]. NB is highly heterogeneous: its clinical course ranges from spontaneous regression to highly invasive metastatic disease, resulting in wide variation in prognosis [4]. Despite significant advances in treatment in recent years, particularly in high-risk patients, in whom the 5-year survival rate has increased from less than 20% to more than 50%, the overall prognosis remains unsatisfactory [5]. Long-term survival in high-risk NB patients remains low, and treatment-related late effects and chronic health problems significantly affect survivors' quality of life. Current treatments include surgery, chemotherapy, and radiotherapy, but no effective curative therapy exists for patients with recurrence, so significant clinical challenges remain [4]. A previous study showed that, despite improved survival in high-risk neuroblastoma, most patients with recurrence still lack curative options and survivors face long-term treatment-related late effects [6]. Krystal and Foster emphasized that decades of clinical trials and research on treatments including surgery, cytotoxic chemotherapy, radiotherapy, and autologous stem cell transplant have significantly improved the prognosis of patients with NB [7]. However, these treatments have had limited success in improving long-term survival.

Biomarker discovery based on sequencing data has become an important component of precision medicine [8]. In cancer research, biomarkers are used not only for diagnosis but also for predicting treatment response and monitoring disease progression. For example, Zhu et al. noted that biomarkers identified from large-scale transcriptomic data can be used for pan-cancer diagnosis and prognosis, facilitating the application of precision medicine to cancer treatment [9]. Sarhadi and Armengol noted that biomarkers discovered through sequencing data can support risk assessment, diagnosis, prognostic evaluation, and the prediction and monitoring of treatment response in precision medicine for cancer [10]. Shaikh et al. predicted potential biomarkers in a breast cancer study using computational methods such as text mining, network analysis, next-generation technologies, machine learning, and deep learning [11]. Demir Karaman and Işık identified prognostic biomarkers for multiple cancer types by analyzing multi-omic data [12]. Addanki et al. noted that liquid biopsy analytes, including circulating tumor DNA (ctDNA) and circulating tumor cells (CTCs), are biomarkers identified from sequencing data that can be used for precision medicine in cancer [13].

This study analyzed neuroblastoma sample data from the ArrayExpress database using WGCNA and identified key gene modules and hub genes related to NB prognosis. Based on these hub genes, a survival prognosis model was constructed. Through these analytical methods, biomarkers related to NB prognosis were systematically identified, providing new ideas and directions for subsequent clinical applications and the improvement of personalized treatment plans.

2 Materials and methods

2.1 Data sources

NB data under accession number E-MTAB-8248 were collected from the ArrayExpress database (https://www.ebi.ac.uk/biostudies/arrayexpress) as the training set, which included a total of 223 cancer samples. The neuroblastoma dataset under accession number GSE49710 was retrieved from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) database as the test set, which contained a total of 498 cancer samples.

2.2 Weighted gene co-expression network analysis (WGCNA)

WGCNA was conducted using the WGCNA package in R [14]. INSS stage was used as the trait for WGCNA on the training set. The soft threshold intercept height (the scale-free fit R^2 cutoff) was set to 0.9, which yielded an optimal soft threshold of β = 7. The minimum module size was 100 genes and the module merge height was 0.2. The module most strongly correlated with the trait was defined as the specificity module.
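As a minimal illustration of the soft-thresholding step, the Python sketch below raises the absolute Pearson correlation between each pair of genes to the power β to obtain an adjacency value. It assumes an unsigned network, which the text does not specify; the gene names, the toy expression values, and the function names are hypothetical, and the analysis itself was run with the WGCNA package in R.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-zero variance)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=7):
    """Unsigned adjacency a_ij = |cor(gene_i, gene_j)| ** beta.

    expr maps a gene name to its expression values across samples.
    The unsigned form is an assumption; beta = 7 is the power reported in the text.
    """
    genes = list(expr)
    adjacency = {g: {} for g in genes}
    for i, gene_i in enumerate(genes):
        for gene_j in genes[i:]:
            if gene_i == gene_j:
                value = 1.0
            else:
                value = abs(pearson(expr[gene_i], expr[gene_j])) ** beta
            adjacency[gene_i][gene_j] = value
            adjacency[gene_j][gene_i] = value
    return adjacency


# Hypothetical toy data: three genes measured in four samples.
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}
toy_adjacency = soft_threshold_adjacency(toy_expr, beta=7)
```

Raising the correlation to a power suppresses weak correlations far more than strong ones: in the toy data a correlation of 0.8 between geneA and geneC shrinks to about 0.21, while the perfect correlation between geneA and geneB stays at 1.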

2.3 Functional enrichment analysis

The hub genes screened by WGCNA were subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analysis using the Database for Annotation, Visualization, and Integrated Discovery (DAVID, https://david.ncifcrf.gov/) [15, 16]. Significantly enriched functional terms and pathways (Count ≥ 2, P value < 0.05) were retained.
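Over-representation of a term among the hub genes can be assessed with a hypergeometric (one-sided Fisher) test. The Python sketch below is a simplified stand-in for illustration only: DAVID reports its own modified Fisher exact statistic, and the background size, term size, hit count, and function names used here are assumptions rather than values from this study.

```python
from math import comb


def hypergeom_enrichment_p(hits, list_size, term_size, background_size):
    """Upper-tail probability P(X >= hits) under the hypergeometric distribution.

    hits: genes in the input list annotated to the term
    list_size: number of genes in the input list
    term_size: number of background genes annotated to the term
    background_size: total number of background genes
    """
    upper = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def passes_screen(count, p_value, min_count=2, alpha=0.05):
    """Apply the screening thresholds stated in the text (Count >= 2, P < 0.05)."""
    return count >= min_count and p_value < alpha


# Hypothetical example: 5 of 60 input genes fall in a term of 200 genes,
# against an assumed background of 20000 genes.
example_p = hypergeom_enrichment_p(hits=5, list_size=60, term_size=200, background_size=20000)
example_keep = passes_screen(5, example_p)
```

With these assumed numbers only about 0.6 genes would be expected in the term by chance, so observing 5 gives a small P value and the term passes the screen.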

2.4 Construction of prognostic model

Univariate Cox regression selects variables that are closely associated with patients' clinical outcomes, which improves the clinical relevance of the resulting model. The hub genes obtained from the WGCNA screening were therefore combined with survival data and analyzed by univariate Cox regression to identify genes significantly associated with survival. Subsequently, Lasso regression was used to further compress the screened genes and remove redundant and irrelevant variables, thereby reducing the risk of overfitting. Lasso regression was performed using the R package glmnet [17], and the best model was determined by tenfold cross-validation. Finally, three genes associated with NB prognosis were identified by stepwise multivariate Cox regression analysis: DHFR, GMPS, and E2F3. The risk score formula was RiskScore = 0.522 * DHFR + 0.569 * GMPS + 0.408 * E2F3.
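The risk score is a weighted sum of the expression values of the three genes. A minimal Python sketch of the formula is given below; the coefficients are those reported above, whereas the function name and the example expression values are hypothetical, and the expression values are assumed to be on the same normalized scale as the data used to fit the model.

```python
# Coefficients reported in the text for the three-gene model.
COEFFICIENTS = {"DHFR": 0.522, "GMPS": 0.569, "E2F3": 0.408}


def risk_score(expression):
    """Return 0.522*DHFR + 0.569*GMPS + 0.408*E2F3 for one sample.

    expression maps gene symbol to that sample's expression value.
    """
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical sample; the values are illustrative, not study data.
sample = {"DHFR": 2.0, "GMPS": 1.0, "E2F3": 0.5}
sample_score = risk_score(sample)
```

Because all three coefficients are positive, higher expression of any of the three genes raises the score.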

2.5 Model validation

Each sample in the training set was assigned a risk score, and receiver operating characteristic (ROC) curve analysis was conducted using the R package pROC to assess the classification performance of prognostic prediction at 1, 3, and 5 years [18]. The samples were categorized into high-risk and low-risk groups, overall survival was compared between the two groups, and the robustness of the model was verified in the GEO test set.
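The classification step can be sketched as follows. For a given time horizon, each patient is labeled by whether death was observed by that time, and the AUC is the probability that a patient with an event has a higher risk score than a patient without one. This Python sketch is a simplified stand-in for the pROC analysis; dropping patients censored before the horizon is an assumption, and all names and toy data are hypothetical.

```python
def labels_at_horizon(times, events, horizon):
    """Return (kept_indices, labels) for a binary outcome at `horizon`.

    times: follow-up time per patient, in the same unit as horizon
    events: 1 if death was observed, 0 if the patient was censored
    """
    kept, labels = [], []
    for i, (t, e) in enumerate(zip(times, events)):
        if e == 1 and t <= horizon:
            kept.append(i)
            labels.append(1)  # death observed by the horizon
        elif t > horizon:
            kept.append(i)
            labels.append(0)  # known to be alive at the horizon
        # Censored before the horizon: outcome unknown, dropped (assumption).
    return kept, labels


def auc(scores, labels):
    """AUC as the fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy cohort: follow-up in years, event indicator, risk score.
toy_times = [0.5, 2.0, 4.0, 6.0, 0.8, 7.0]
toy_events = [1, 1, 1, 0, 0, 0]
toy_scores = [2.1, 1.7, 1.2, 0.6, 1.0, 0.9]

kept_3y, labels_3y = labels_at_horizon(toy_times, toy_events, horizon=3)
auc_3y = auc([toy_scores[i] for i in kept_3y], labels_3y)
```

Repeating the last two lines with horizons of 1, 3, and 5 years gives one AUC per time point, which mirrors the structure of the analysis described above.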

2.6 Gene set enrichment analysis (GSEA)

GSEA (FDR < 0.05) of the training set samples was conducted using GSEA_4.2.2 software. The c2.cp.kegg (http://www.gsea-msigdb.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/7.4/c2.cp.kegg.v7.4.symbols.gmt) and c5.go gene sets (http://www.gsea-msigdb.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/7.4/c5.go.v7.4.symbols.gmt) were obtained from the Molecular Signatures Database portal (MSigDB, https://www.gsea-msigdb.org/gsea/msigdb/), and KEGG and GO gene set enrichment analyses were performed to identify differences in biological functions and signaling pathways between the two risk groups.

2.7 Immune infiltration analysis

CIBERSORT is an algorithm that infers immune cell composition from complex tumor samples; it estimates the proportions of different immune cells in the tumor microenvironment to help understand the role of the immune system in tumorigenesis and progression [19]. The ESTIMATE algorithm, implemented in the “estimate” package, estimates the stromal and immune scores of a tumor tissue sample, which reflect the relative amounts of stromal and immune components in the sample [20]. We used both algorithms to assess differences in the extent of immune cell infiltration between patients in different NB risk groups. In addition, Spearman correlation analysis was performed to assess the correlation of immune infiltration scores with the prognostic biomarkers and the risk score.
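Spearman correlation is the Pearson correlation computed on ranks, which makes it suitable for monotonic but non-linear relationships such as those between a risk score and an infiltration score. The Python sketch below illustrates the calculation with average ranks for ties; the function names and toy values are hypothetical and are not taken from the study data.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x = sum(rx) / n
    mean_y = sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values: risk scores and immune scores for four samples.
toy_risk = [1.8, 1.2, 0.9, 0.4]
toy_immune = [100.0, 250.0, 300.0, 500.0]
toy_rho = spearman(toy_risk, toy_immune)
```

In the toy data the immune score falls steadily as the risk score rises, so the coefficient is -1 even though the relationship is not linear.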

2.8 Drug sensitivity and immunotherapy correlation

Drug sensitivities of patients in the training set were predicted using the R package oncoPredict [21], and the half-maximal inhibitory concentration (IC50) of drugs was calculated to screen for drugs significantly related to the risk score. Differences in immunotherapy efficacy between the two risk groups were assessed using TIDE software (http://tide.dfci.harvard.edu/) [22].

3 Results

3.1 WGCNA results

First, weighted gene co-expression network analysis was performed in the E-MTAB-8248 cohort. In constructing the scale-free network, the soft threshold intercept height was set to 0.9, which gave an optimal soft threshold of β = 7 (Fig. 1A). After merging similar gene modules, 14 gene modules were obtained (Fig. 1B). Pearson correlation analysis was then performed between INSS stage, the trait of interest, and the module eigengene (ME) of each gene module. MEturquoise, a module of 3621 genes, showed the highest and most significant positive correlation with INSS stage (r = 0.32, p = 9e–07) (Fig. 1C). With gene significance (GS) ≥ 0.2 and |module membership (MM)| ≥ 0.8 as screening conditions, 60 significant hub genes were finally obtained (Fig. 1D).
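The hub gene screen is a simple two-threshold filter on the per-gene statistics of the selected module. A minimal Python sketch is shown below; it applies the thresholds literally as stated (GS ≥ 0.2 without an absolute value, |MM| ≥ 0.8), and the gene names and statistics are hypothetical placeholders rather than results from this study.

```python
def select_hub_genes(gene_stats, gs_cutoff=0.2, mm_cutoff=0.8):
    """Keep genes with GS >= gs_cutoff and |MM| >= mm_cutoff.

    gene_stats maps gene name to a (GS, MM) tuple, where GS is the gene-trait
    correlation and MM is the gene-module eigengene correlation.
    """
    return [
        gene
        for gene, (gs, mm) in gene_stats.items()
        if gs >= gs_cutoff and abs(mm) >= mm_cutoff
    ]


# Hypothetical statistics for four genes in one module.
toy_stats = {
    "gene1": (0.31, 0.91),   # passes both thresholds
    "gene2": (0.25, 0.62),   # MM too low
    "gene3": (0.12, 0.88),   # GS too low
    "gene4": (0.22, -0.85),  # passes: |MM| is used
}
toy_hubs = select_hub_genes(toy_stats)
```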

Fig. 1

Results of weighted gene co-expression network analysis. A Soft threshold screening plot; the smallest power with R^2 higher than 0.9, β = 7, was selected as the soft threshold for constructing the network topology. B Gene dendrogram and module plot; the upper half of the plot is the gene tree, in which each branch represents a gene, and below it each gene is assigned to a module before and after merging. C Module-trait correlation plot; the horizontal axis shows the traits, the vertical axis shows the modules, the numbers in the squares are correlation coefficients, and the numbers in parentheses are the significance P values, with red representing a positive correlation and blue a negative correlation. D GS-MM plot for green module; the horizontal axis is MM (module membership), the correlation of each gene with the module eigengene; the vertical axis is GS (gene significance), the correlation of each gene in the module with the trait

3.2 Biological functions of hub genes

To understand the biological functions and pathways associated with the hub genes, the 60 genes were subjected to GO and KEGG functional enrichment analysis, and significantly enriched functional terms and pathways were screened (Count ≥ 2, P value < 0.05). The hub genes were largely implicated in the GO Biological Process (BP) terms mitotic spindle assembly checkpoint, cell division, mitotic spindle organization, mitotic cell cycle, and cell cycle (Fig. 2A). The enriched cellular component (CC) terms included kinetochore, nucleoplasm, and nucleus, among others (Fig. 2B). In addition, molecular function (MF) enrichment analysis revealed that these genes were enriched in multiple terms such as ATP binding, DNA binding, and microtubule binding (Fig. 2C). Finally, KEGG enrichment analysis demonstrated that the hub genes were significantly enriched in the cell cycle pathway (Fig. 2D). These results suggest that these pathways influence both the onset and development of NB.

Fig. 2

Biological functions of hub genes. A Bubble plot of the top 10 biological process terms. B Bubble plot of the top 10 cellular component terms. C Bubble plot of the top 10 molecular function terms. D Bubble plot of the top 10 significant KEGG terms. In all plots, the horizontal axis is GeneRatio, the ratio of the number of genes enriched in a term to the total number of input genes; the size of the circle represents the number of genes enriched in the term; and the color represents the significance of the term, with blue to red representing increasing significance

3.3 Biomarker-based prognostic modeling to predict NB prognosis

To narrow down the candidate genes for the model, the 60 genes obtained from WGCNA were combined with the survival data from the E-MTAB-8248 cohort and subjected to univariate Cox analysis, which showed that all 60 genes were closely related to survival. These 60 prognostically significant genes were further compressed using Lasso regression to reduce the number of genes in the prognostic model. After tenfold cross-validation, the model was optimal at lambda = 0.037, at which point 6 genes were included in the model (Fig. 3A). Finally, three prognosis-related genes in NB, DHFR, GMPS, and E2F3, were identified by multivariate Cox analysis. The model was formulated as RiskScore = 0.522*DHFR + 0.569*GMPS + 0.408*E2F3. In the E-MTAB-8248 cohort, the AUC values of RiskScore for predicting 1-year, 3-year, and 5-year prognosis were all greater than 0.8 (0.89, 0.82, and 0.82, respectively) (Fig. 3B). Patients in this cohort were categorized into a high RiskScore group (102 samples) and a low RiskScore group (121 samples) using the optimal cutpoint method of the survminer package. Patients with a high RiskScore had poorer survival rates and shorter survival times than patients with a low RiskScore (Fig. 3C, D). The prognostic value of RiskScore was then validated in the validation set GSE49710, grouped in the same way, and the results followed the trend seen in the training set. The AUC values of RiskScore for predicting 1-year, 3-year, and 5-year prognosis in GSE49710 were 0.86, 0.85, and 0.83, respectively, again all greater than 0.8, indicating that RiskScore had strong predictive value (Fig. 3E). Similarly, patients in the high RiskScore group had poorer overall survival rates and shorter survival times than those in the low RiskScore group (Fig. 3F, G).

Fig. 3

Biomarker-based prognostic model predicting the prognosis of NB. A Lasso Cox analysis. The left panel plots the Lasso penalty parameter, with log(lambda) on the horizontal axis and the cross-validation error on the vertical axis, and marks the location where the cross-validation error is minimized. The right panel shows the Lasso regression coefficient paths, with log(lambda) on the horizontal axis and the gene coefficients on the vertical axis, demonstrating how the coefficients of the different variables change with the λ penalty; min lambda = 0.037 determines the position on the log(lambda) axis, and the top axis shows the number of featured genes at that position, 6. B ROC curve of RiskScore predicting the prognosis of NB patients in the training set (E-MTAB-8248). C KM survival curves of patients in the high and low RiskScore groups in the training set (E-MTAB-8248). D Scatterplot of the distribution of RiskScore values and survival status of NB samples in the training set (E-MTAB-8248). E ROC curve of RiskScore predicting the prognosis of NB patients in the validation set (GSE49710). F KM survival curves of patients in the high and low RiskScore groups in the validation set (GSE49710). G Scatterplot of the distribution of RiskScore values and survival status of NB samples in the validation set (GSE49710)

3.4 RiskScore was an independent prognostic factor for NB

Age and RiskScore were significantly correlated with overall survival (OS) in NB and acted as prognostic risk factors, as shown by univariate Cox analysis (Fig. 4A). Multivariate Cox analysis showed that RiskScore could serve as an independent prognostic factor for NB (Fig. 4B). In addition, we examined the distribution of patients’ age, stage (INSS), and tert_expression in the high and low RiskScore subgroups. Patients with NB in the high RiskScore subgroup were older, had a higher stage (INSS), and exhibited higher TERT expression (Fig. 4C–E).

Fig. 4

RiskScore is an independent prognostic factor for NB. A Forest plot of univariate Cox analysis combining age, stage (INSS), tert_expression, and RiskScore. B Forest plot of multivariate Cox analysis combining age and RiskScore. C Boxplot of age for NB patients in the high and low RiskScore subgroups. D Boxplot of stage (INSS) for NB patients in the high and low RiskScore subgroups. E Boxplot of tert_expression for NB patients in the high and low RiskScore subgroups

3.5 Abnormal signaling pathways in high and low RiskScore groups

Abnormal signaling pathways in the high and low RiskScore groups were compared using the c2.cp.kegg and c5.go gene sets as references. In the high RiskScore group, DNA REPLICATION, CHROMOSOME ORGANIZATION, REGULATION OF DNA REPLICATION, DNA REPAIR, RIBOSOME BIOGENESIS, SPLICEOSOME, NUCLEOTIDE EXCISION REPAIR, CELL CYCLE, BASE EXCISION REPAIR, and HOMOLOGOUS RECOMBINATION were more active (Fig. 5). Many of these pathways are related to the cell cycle, suggesting that activation of cell cycle pathways in patients with a high RiskScore may accelerate cell division and proliferation.

Fig. 5

GSEA results for samples in high and low RiskScore groupings

3.6 Differences in immune infiltration in RiskScore groupings

The infiltration of five immune cell types differed significantly between the RiskScore groups: Monocytes, T cells CD4 memory resting, Dendritic cells resting, T cells gamma delta, and Neutrophils (Fig. 6A). RiskScore showed a significant positive correlation with Plasma cells and Neutrophils and a significant negative correlation with T cells gamma delta and T cells CD4 memory resting (Fig. 6B). Meanwhile, DHFR, GMPS, and E2F3 showed consistent patterns of correlation with the infiltration levels of the 22 immune cell types, mainly a positive correlation with Neutrophils and a negative correlation with T cells CD4 memory resting and T cells gamma delta (Fig. 6B). We also observed that the low RiskScore subgroup exhibited higher ImmuneScore and ESTIMATEScore overall (Fig. 6C). Finally, RiskScore and the expression levels of DHFR, GMPS, and E2F3 all showed a significant negative correlation with ImmuneScore (Fig. 6D).

Fig. 6

Differences in immune infiltration in RiskScore subgroups. A Boxplot of the infiltration of 22 immune cell types in the RiskScore subgroups. B Heatmap of the correlation of RiskScore and the expression levels of DHFR, GMPS, and E2F3 with the infiltration levels of the 22 immune cell types. C StromalScore, ImmuneScore, and ESTIMATEScore of samples in the RiskScore groups. D Heatmap of the correlation of RiskScore and the expression levels of DHFR, GMPS, and E2F3 with StromalScore, ImmuneScore, and ESTIMATEScore

3.7 Assessment of chemotherapy and immunotherapy indications for patients in RiskScore subgroups

We also examined the predicted sensitivity to chemotherapeutic small-molecule drugs in the high and low RiskScore subgroups. We found a significant correlation between 133 drugs and RiskScore (|cor| ≥ 0.3, p < 0.05). Of these, the top 10 positively correlated drugs were Doramapimod_1042, BMS-754807_2171, SB216763_1025, KU-55933_1030, NU7441_1038, AZD8055_1059, AZD6482_2169, JQ1_2172, Axitinib_1021, and PF-4708671_1129, and the top 10 negatively correlated drugs were Dihydrorotenone_1827, Gallibiscoquinazole_1830, Fulvestrant_1816, Acetalax_1804, GDC0810_1925, BDP-00009066_1866, Alpelisib_1560, OF-1_1853, Oxaliplatin_1089, and VE-822_1613 (Fig. 7A). Specifically, patients with a low RiskScore were more sensitive to Axitinib_1021, SB216763_102, SB216763_1025, NU7441_1038, Doramapimod_1042, PF-4708671_1129, AZD6482_2169, and JQ1_2172. Patients with a high RiskScore were more sensitive to Oxaliplatin_1089, Alpelisib_1560, VE-822_1613, Acetalax_1804, Fulvestrant_1816, Gallibiscoquinazole_1830, OF-1_1853, and GDC0810_1925 (Fig. 7B). In addition, patients in the high RiskScore group exhibited higher TIDE scores, suggesting that these patients may not be suitable for immunotherapy (Fig. 7C).

Fig. 7

Assessment of chemotherapy and immunotherapy indications in RiskScore grouped patients. A Lollipop chart showing the 20 drugs most significantly correlated with RiskScore, with drugs on the left side of the horizontal line showing a significant negative correlation with RiskScore and drugs on the right side of the horizontal line showing a significant positive correlation with RiskScore. B IC50 values of the 20 drugs in the high and low RiskScore groups. C Boxplot of TIDE scores of patients in the RiskScore groups

4 Discussion

This study identified key gene modules related to NB by WGCNA and further analyzed the role of these genes in disease progression. In particular, the MEturquoise module was significantly correlated with stage (INSS) (r = 0.32, p = 9e–07), indicating the importance of these genes in the development of NB. Systematically identifying key molecular markers related to NB prognosis allows reliable prognostic models to be constructed, which in turn can better promote the precision treatment of NB.

In NB research, the identification and functional characterization of hub genes is an important step in understanding tumor mechanisms. The results of GO and KEGG functional enrichment analyses showed that the hub genes were mainly involved in several key biological processes and signaling pathways. Cell division is the fundamental process of tumor cell proliferation. Tumor cells form masses and invade surrounding tissues by continuously undergoing uncontrolled cell division. For example, the CDC20 gene promotes prostate cancer cell growth and invasion by stabilizing β-catenin in cancer stem cell-like cells [23]. Mitotic spindle organization plays a key role in ensuring proper chromosome segregation during mitosis. Abnormalities in spindle organization may lead to chromosomal instability, which is a critical mechanism in cancer development. Studies have shown that the clustering of multiple spindle poles in cancer cells is closely correlated with the three-dimensional structure and invasive ability of these cells [24]. Uncontrolled cell cycle regulation is one of the main causes of tumorigenesis. Cancer cells evade normal growth control mechanisms by altering the function of cell cycle checkpoints, resulting in uncontrolled proliferation. NuMA protein establishes spindle bipolarity during mitosis by organizing microtubule asters, which is essential for the normal progression of cell division [25]. KEGG analysis showed that the hub genes were primarily involved in cell cycle, DNA replication, and cancer signaling pathways. These pathways play important roles in tumorigenesis and progression. Abnormalities in cell cycle and DNA replication pathways can lead to unlimited cell proliferation, whereas activation of cancer signaling pathways promotes tumor cell survival, proliferation, and invasion. Overexpressed Cyclin D1 plays a critical part in the adaptation of cancer cells to the mechanical stresses within the tumor [26].
Taken together, hub genes in neuroblastoma may promote tumorigenesis and progression by regulating several key biological processes and signaling pathways. These findings could deepen the current understanding of neuroblastoma mechanisms and provide potential targets for developing new strategies for NB treatment.

In the prognostic model of NB, the DHFR, GMPS, and E2F3 genes play key roles. DHFR is involved in DNA synthesis and repair in proliferating cells by catalyzing the reduction of dihydrofolate to tetrahydrofolate, and its high expression has been associated with many types of cancer, including breast and lung cancer. For example, DHFR is a prognostic factor in ovarian cancer, in which low levels of DHFR expression were associated with longer patient survival. In addition, inhibition of DHFR modulates anti-tumor immunity, suggesting that DHFR inhibitors have potential for combating immunotherapeutic resistance in non-small cell lung cancer [27]. GMPS functions in guanosine monophosphate synthesis, and its aberrant expression is closely related to tumor growth. GMPS has been shown to be part of a prognostic model constructed from metabolism-related genes in pancreatic cancer [28]. In ovarian cancer, GMPS, PR, CD40, and p21 were identified as important prognostic markers [29]. E2F3 is a transcription factor that promotes rapid tumor cell growth by regulating cell cycle progression and DNA synthesis. E2F3 is overexpressed in many types of cancer, including prostate and bladder cancers. Higher expression of E2F3 in colon cancer was significantly associated with worse overall survival, suggesting that it may serve as a prognostic biomarker [30]. In addition, E2F3 promotes colon cancer progression by activating the STAT3 pathway and enhancing the stemness of colon cancer cells [31]. Therefore, the three genes identified in our study may be important drivers of NB; their high expression may be associated with poor patient prognosis, and they may serve as potential therapeutic targets. The RiskScore formed by combining these genes is able to predict the 1-, 3-, and 5-year survival of patients, which has important clinical applications.
Prognostic models constructed using these genes may evaluate patients’ prognosis more accurately and help physicians develop personalized treatment plans.

5 Conclusion

In conclusion, the roles of DHFR, GMPS, and E2F3 in neuroblastoma and their application in a prognostic model provide new perspectives and methods for precision tumor therapy, promote a better understanding of the molecular mechanisms of NB, and may facilitate the development of new therapeutic strategies.

6 Limitations

The present study has limitations. First, although hub genes in neuroblastoma were identified and a prognostic model was constructed using WGCNA, this is a predictive study based on computational analysis of sequencing data from patient samples. The performance and clinical value of the neuroblastoma prognostic model still need to be validated with large samples from multiple clinical centers. Second, the hub genes identified in this study have been little studied in neuroblastoma; their functions remain unclear and need to be explored through cellular and animal experiments.