Introduction

Pediatric acute myeloid leukemia (P-AML) accounts for 15–20% of all childhood acute leukemia and can be classified based on morphology, lineage, and genetics [1]. Over the past three decades, the overall survival (OS) of children with AML has improved dramatically: the present 5-year survival rate ranges from 65 to 75%, and the initial complete remission rate is around 80% after induction chemotherapy [2,3,4,5,6]. However, it remains a catastrophic disease, with around 40% of patients relapsing [6], and efforts to develop novel targeted therapies and cell therapies to improve OS in these patients are ongoing.

The prognosis of P-AML is determined by a variety of cytogenetic and molecular traits [7]. Clinical protocol design has placed a strong emphasis on risk-stratified therapy for P-AML, in order to intensify treatment for high-risk groups while minimizing therapeutic intensity for lower-risk groups. Different pediatric cooperative clinical trial groups employ different risk variables for stratification [8]. Historically, in the Children's Oncology Group (COG) AAML0531 trial, risk groups and treatments were assigned based on baseline genetic prognostic indicators and the disease response following induction therapy [9]. The three-year event-free survival (EFS) in the low-risk, intermediate-risk, and high-risk groups of the COG AAML0531 classification was 64.0%, 46%, and 27%, respectively, showing substantial survival disparities across the groups. However, 60% of pediatric AML cases lack chromosomal abnormalities that stratify prognosis, while 20% of them lack all recognized markers [10]. In addition, it is generally difficult to quantify how these mutational signatures interact to affect survival. Measurement of measurable residual disease (MRD) by multidimensional flow cytometry has allowed the categorization of individuals without genetic anomalies associated with treatment results; however, the accuracy of MRD measurement depends closely on the choice of methodology and antigens [10].

With the advancement of new molecular biology technologies, risk-stratification systems have begun to incorporate elements from high-throughput sequencing, such as somatic mutation profiling by genome sequencing and gene expression profiling based on microarray or RNA sequencing [11,12,13,14,15]. One of the most recent risk categorization methods for pediatric AML is used in the ongoing COG phase III study AAML1831 [8]; its high-risk group was further expanded with 6 alterations by using whole-genome sequencing data from two sequential COG phase III trials (NCT01407757 and NCT01371981) in de novo pediatric AML patients to interrogate structural and molecular alterations in the associated genes [16]. As Lamble et al. reported, the number of patients assigned to the allogeneic donor stem cell transplantation cohort could have been increased by this expanded COG risk assessment algorithm (expanded_COG_AAML1831) [16]. The 2017 European LeukemiaNet (ELN) risk stratification system combines cytogenetic abnormalities and genetic mutations to guide the risk stratification of AML patients and is routinely used in clinical practice for adult AML patients [17]. Major strides have recently been made in our understanding of AML pathophysiology, including the identification of the molecular etiology of the disease. The updated 2022 ELN guideline showed better performance in stratifying survival between adult patients with intermediate- or high-risk AML treated with induction chemotherapy [18, 19]. However, the distinct molecular profiles of AML in pediatric and adult patients limit the application of models established in adult cohorts to the pediatric population [20].

Most of the above-mentioned models are difficult to apply widely in the clinical environment: some include too many genes for a practical assay, some involve interactions between the incorporated molecular markers, and some show mediocre risk stratification effectiveness in P-AML. New molecular biomarkers are therefore needed for better prognostic classification and, ultimately, better therapeutic targets in P-AML [21]. In this study, we aimed to establish a prognostic score for P-AML based on gene expression profiles from public databases and to validate its stability and forecasting performance, which may provide guidance for the choice of therapy and follow-up in P-AML.

Results

Patient characteristics

Clinical characteristics of the P-AML patients in TARGET 256, the training dataset (TARGET 145) and the validation datasets (TARGET 111, AAML1031 and JAPAN P-AML) can be found in Table S1. Based on the data from these cohorts, we established and validated a prognostic model for pediatric AML. We also explored the potential clinical significance of the model and compared its prognostic ability with several existing models. The workflow of the study is presented in Fig. 1.

Fig. 1 Overall design of this study

The distribution of age at diagnosis, gender, FAB (Leukemia French American British Morphology Code), treatment protocol and vital status of pediatric AML patients in TARGET 256 is shown in Fig. 2a and Table S1. The median age of all evaluated patients was 10.5 years (range, 10 days–22 years) and 135 patients (52.73%) were male. A total of 159 out of 256 patients (62.11%) received AAML0531 therapy, 56 (21.88%) received AAML03P1 therapy, and 41 (16.02%) received CCG-2961 therapy. In CCG-2961, the type or timing of induction therapy was randomized [22]; gemtuzumab ozogamicin (GO) was administered in a single-arm pilot in AAML03P1 [23] and in a randomized fashion in AAML0531 [9]. Relapse as the first event occurred in 149 (58.20%) patients, and 47 (18.36%) patients had measurable residual disease (MRD1), defined as > 0.02% disease detected in the bone marrow by central difference from normal (ΔN) flow cytometry analysis after the first course of induction chemotherapy [24, 25]. Hematopoietic stem cell transplantation (HSCT) was performed in 32 patients (12.50%).

Fig. 2 Clinical characteristics of the P-AML patients in TARGET 256. a The distribution of age at diagnosis (top panel), gender, FAB (Leukemia French American British Morphology Code), treatment protocol and vital status of pediatric AML patients in TARGET 256 (middle panel); b Clinical characteristics with significant differences between TARGET 145 and TARGET 111 (p < 0.01)

There were significant differences in COG, expanded_COG_AAML1831 and 2022ELN risk-group distribution, treatment protocol, GO treatment and FLT3_ITD (FLT3 Internal Tandem Duplication present) distribution between the TARGET 145 and TARGET 111 cohorts (all p < 0.01, Fig. 2b). Compared to TARGET 145, more individuals in TARGET 111 received the CCG-2961 treatment protocol (22.52% vs. 11.03%, p < 0.01), more individuals were FLT3_ITD carriers (26.13% vs. 7.59%, p < 0.001), and fewer individuals received GO treatment (42.34% vs. 62.07%, p < 0.001). Among the treatment subgroups in TARGET 256, P-AML patients who received the AAML03P1 protocol had the best prognosis (Fig. S1). These data show that TARGET 145 and TARGET 111 are two distinct datasets with clinical differences, akin to real-world scenarios of training and validation datasets.

Construction of the prognostic risk score

Our clinical hypothesis is that gene RNA expression levels may serve as reliable and convenient prognostic indicators. To investigate this, we conducted univariate Cox regression analysis of 57,599 genes in the TARGET 145 dataset. As a result, 5,943 genes were identified as potential biomarkers for predicting patient-specific OS (based on a threshold of p < 0.05 for individual gene analyses). Among these 5,943 genes, LASSO-Cox regression analysis identified the 14 genes most relevant to OS prognosis (Fig. 3a, b). Of these, the expression levels of 11 genes (dichotomized by their median expression levels) were individually correlated with either a longer or a shorter OS according to the K-M survival curves (Fig. S2) and were subsequently subjected to multivariate Cox regression analysis. Notably, five genes were independent predictors of OS (Fig. 3c, p < 0.05). A five-gene risk score (P-AML-5G) was then established by integrating the expression levels (normalized counts) and the coefficients derived from the multivariable Cox regression analysis, with the following formula: Risk Value = [0.00024149 × COL23A1 expression] + [0.00029096 × TTC38 expression] + [0.00054436 × RNFT1 expression] - [0.000674 × ZNF775 expression] + [0.0001792 × CRNDE expression]. ROC curve analysis showed that the areas under the ROC curve (AUCs) of the P-AML-5G score for 1-, 3- and 5-year OS were 0.86, 0.78 and 0.80, respectively (Fig. 3d). The optimal cutoff risk value (1.676) was then used to divide patients into high- and low-risk groups. Patients in the low-risk group had a significantly longer OS than those in the high-risk group (p < 0.001) (Fig. 3e). Patients in the low-risk group also had a significantly longer event-free survival (EFS) than those in the high-risk group (p < 0.001) (Fig. S3a). AUCs for 1-, 3- and 5-year EFS were 0.71, 0.70 and 0.71, respectively (Fig. S3b).

A heatmap shows the differential expression of the five genes between the high- and low-risk groups (Fig. 3f). The expression of COL23A1, TTC38, RNFT1 and CRNDE was significantly increased in the high-risk group compared with the low-risk group, while the expression of ZNF775 was significantly reduced in the high-risk group (Fig. S4).
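As an illustration of how the reported formula and cutoff could be applied to a single patient, the following minimal Python sketch computes the risk value and assigns a risk group. The coefficients and the cutoff (1.676) are taken from the text; the function names, the example expression values and the choice to treat a value exactly equal to the cutoff as low-risk are assumptions made here for illustration only and are not part of the study.

```python
# Coefficients and cutoff as reported in the text; the rest is illustrative.
COEFFICIENTS = {
    "COL23A1": 0.00024149,
    "TTC38": 0.00029096,
    "RNFT1": 0.00054436,
    "ZNF775": -0.000674,  # the only gene entering the score with a negative weight
    "CRNDE": 0.0001792,
}
CUTOFF = 1.676


def risk_value(expression):
    """Sum of normalized counts weighted by the Cox coefficients."""
    missing = sorted(set(COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing genes: {missing}")
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)


def risk_group(expression, cutoff=CUTOFF):
    """Assumption: values strictly above the cutoff are called high-risk."""
    return "high" if risk_value(expression) > cutoff else "low"


# Hypothetical normalized counts for one patient (not real data).
example_patient = {
    "COL23A1": 1000.0,
    "TTC38": 2000.0,
    "RNFT1": 3000.0,
    "ZNF775": 500.0,
    "CRNDE": 4000.0,
}
print(risk_value(example_patient), risk_group(example_patient))
```

Because the score is a plain weighted sum, the expression values passed in must be on the same scale as the normalized counts used to derive the coefficients; otherwise the cutoff is not meaningful.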

Fig. 3 Prognostic model construction. a Each curve represents the coefficient trajectory of one prognosis-related gene variable. The lowest partial likelihood deviance was observed when fourteen variables remained; b A coefficient profile was generated for screening variables. The first dotted vertical line marks the optimal log (Lambda) value; c A total of five genes were identified by multivariate Cox regression analysis to construct the prognostic model; d ROC analysis of the gene signature for prediction of OS risk at 1, 3, and 5 years in the TARGET 145 dataset; e Kaplan–Meier curves of OS based on the prognostic model (p < 0.001); f Expression profiles of the 5 genes in the high-risk group and the low-risk group. OS: overall survival; ROC: receiver operating characteristic

Prognostic value validation

Further validation is essential to demonstrate the stability and feasibility of the model. As expected, patients in the high-risk group had significantly shorter OS than patients in the low-risk group in TARGET 256 (p < 0.001; HR = 3.74, 95% CI [2.52, 5.57]) (Fig. 4a). Considering the treatment subgroups in the TARGET 145 dataset, the P-AML-5G risk groups stratified patients well among those who received AAML03P1 therapy (p < 0.001; Fig. S5a) and AAML0531 therapy (p < 0.001; Fig. S5b), whereas stratification was unsatisfactory for patients who received CCG-2961 therapy (p = 0.31; Fig. S5c). Therefore, patients receiving the AAML0531 and AAML03P1 protocols in the TARGET 111 dataset were kept for internal model validation (N = 86, hereafter TARGET validation). The OS and EFS of the patients in the high-risk group were significantly shorter than those of the patients in the low-risk group (OS: p = 0.005; HR = 2.63, 95% CI [1.30, 5.26]; EFS: p < 0.001; HR = 3.10, 95% CI [1.73, 5.56]) (Fig. 4b and Fig. S6). We further demonstrated the prognostic generality of the score in the AAML1031 and JAPAN P-AML datasets with univariate Cox regression analysis (HR = 1.377, 95% CI [1.117, 1.697], p = 0.003; HR = 1.756, 95% CI [1.103, 2.793], p = 0.018, respectively). A notable survival contrast was observed between the high- and low-risk groups, as determined by the optimal split point identified in each of these datasets (p = 0.001; HR = 1.55, 95% CI [1.19, 2.01]; p = 0.005; HR = 2.94, 95% CI [1.33, 6.53], respectively) (Fig. 4c-d). Compared with the bone marrow of healthy children, the RNA levels of COL23A1 and CRNDE were significantly upregulated in the bone marrow of pediatric patients participating in the AAML1031 study (all log2(fold change) > 3, p < 0.001) (Fig. S7). These results show that our model has prognostic capability in multiple pediatric AML validation datasets and that two genes in the model are significantly differentially expressed between patients and controls.

Fig. 4 Prognosis model validation in internal and external datasets. In a TARGET 256, b TARGET validation, c AAML1031 and d JAPAN P-AML, patients with high-risk scores had worse OS than patients with low-risk scores (all p < 0.01)

Screening for prognostic factors for OS in the P-AML dataset

To investigate whether the P-AML-5G model is an independent prognostic factor for P-AML, we performed univariate Cox regression analysis to screen prognostic factors for OS, using the P-AML-5G groups, the COG risk system and other clinical and/or genetic markers. Nine variables, including the P-AML-5G groups, treatment protocol, inv(16), MinusX, Cytogenetic Complexity, FLT3_ITD, WT1 mutation, COG risk stratification system and CBFB-MYH11 fusion, were potential risk factors affecting OS in TARGET 145 (p < 0.01, Table 1). Furthermore, multivariate survival analysis using the above variables found that the P-AML-5G grouping was an independent prognostic factor for OS of the P-AML patients (p < 0.05, Table 2). In the TARGET validation dataset, the P-AML-5G groups, PB, COG risk stratification system and MRD1 were associated with OS (p < 0.05) (Table 1), while the P-AML-5G groups and the COG risk system were independent prognostic factors (Table 2). Hence, the P-AML-5G groups offer independent prognostic information for the overall survival of pediatric AML across multiple datasets.

Table 1 Univariable Cox proportional hazards regression analysis of P-AML-5G
Table 2 Multivariable Cox Proportional Hazards Regression Analysis of P-AML-5G

Clinical significance of the P-AML-5G model

Pre-treatment risk stratification, post-treatment MRD status, and the identification of suitable patients for HSCT are critical indicators or events in the clinical management of P-AML. Therefore, we also investigated the correlation between the P-AML-5G risk groups and these indicators to explore the potential clinical significance of P-AML-5G. According to the Sankey diagram of the P-AML-5G groups and COG risk groups, P-AML-5G assigned 100% (8/8) of COG adverse patients, 25.00% (15/60) of COG favorable and 65.20% (45/69) of COG intermediate patients to the high-risk group in the TARGET 145 dataset. In the TARGET validation dataset, P-AML-5G classified 30.00% (9/30) of COG favorable, 47.80% (11/23) of COG intermediate, and 43.80% (7/16) of COG adverse patients to the high-risk group (Fig. S8).

Based on the above observations, we performed Kaplan–Meier survival analysis for OS of the COG risk groups stratified by the P-AML-5G groups. We found that the P-AML-5G groups provided prognostic information beyond that provided by COG. Notably, within the intermediate-risk groups, patients with high P-AML-5G scores had a worse prognosis than patients with low P-AML-5G scores in TARGET 145, TARGET validation and AAML1031 (p < 0.001, p = 0.011 and p = 0.021, respectively; Fig. 5a-c).

Fig. 5 The P-AML-5G groups provided re-stratification value in the heterogeneous intermediate risk group defined by the traditional COG risk system in a TARGET 145, b TARGET validation and c AAML1031

Considering the missing rate of MRD1 and the low frequency of individuals receiving HSCT, the impact of MRD1 and HSCT on the P-AML-5G risk groups was explored in the TARGET 256 dataset. MRD1 status was missing for 28.9% (74/256) of the patients. In the low- and high-risk categories, 16.9% (25/148) and 20.4% (22/108) of subjects, respectively, were MRD1-positive. Although MRD1 had no impact within the P-AML-5G high-risk group (log-rank test, p = 0.22), MRD1-positive individuals in the P-AML-5G low-risk group showed a trend toward worse prognosis than low-risk/MRD1-negative patients, which did not reach statistical significance (log-rank test, p = 0.09) (Fig. S9a). Including the four patients with incomplete HSCT information, 10.8% (16/148) and 14.8% (16/108) of individuals in the low- and high-risk categories, respectively, underwent HSCT. According to the K-M survival curves, HSCT showed a trend toward improved OS among the P-AML-5G high-risk patients (log-rank test, p = 0.20) (Fig. S9b).

In the TARGET 256 dataset, a total of 42 samples were obtained from patients' peripheral blood and were analyzed separately. In these peripheral blood samples, the model showed significant risk stratification for OS and a consistent trend for EFS (Fig. S10) (p = 0.001; HR = 4.76, 95% CI [1.67, 12.5]; p = 0.10; HR = 1.85, 95% CI [0.870, 3.85], respectively).

The above results suggest that the 5-gene prognostic model, which uses a single assay, might enhance the current COG risk stratification system, which relies on multiple tests, and has the potential to improve risk assessment for pediatric AML patients.

Comparison with existing AML risk stratification tools

To demonstrate the clinical applicability of the P-AML-5G model, we compared its prognostic capability with previous models across multiple datasets. As shown in Table 3 and Fig. 6, compared to the COG, LSC17 and LSC6 models, the P-AML-5G groups had the highest C-index in TARGET 145 and in its GO treatment and chemotherapy-only treatment subgroups (0.71, 0.758 and 0.648, respectively; Fig. 6a-c). In TARGET validation and its GO treatment subgroup, the COG risk system had the best C-index values (0.66 and 0.693) (Fig. 6d-e). However, subgroup analysis showed that the P-AML-5G groups had the highest C-index in the chemotherapy-only treatment subgroups of both the TARGET 145 and TARGET validation datasets (0.648 and 0.678; Table 3, Fig. 6c, f). Thus, the P-AML-5G groups outperformed existing clinical prognostic tools and published RNA-seq based models in the chemotherapy-only subgroup of both the training and validation cohorts, which may have potential for precision treatment in pediatric AML.

Table 3 Comparison of P-AML-5G with pre-existing AML risk classification tools
Fig. 6 Model comparison of P-AML-5G with pre-existing AML risk classification tools. a-c Compared with pre-existing AML risk classification tools, the P-AML-5G groups had the highest C-index value in TARGET 145 and two treatment subgroups; d-e The COG risk system presented the highest C-index value in TARGET validation and the GO treatment subgroup of TARGET validation; f The P-AML-5G groups had the highest C-index value in the chemotherapy-only treatment subgroup of the TARGET validation dataset; g The P-AML-5G groups had the highest C-index value in the whole-genome sequenced subgroup

Excluding individuals with missing values for any of the three risk assignment tools (COG, 2022ELN, expanded_COG_AAML1831) from the combined dataset TARGET 256 left a total of 150 pediatric AML cases (hereafter TARGET WGS). The P-AML-5G groups showed good model performance with the highest C-index value (0.712), outperforming COG, 2022ELN, expanded_COG_AAML1831 and the two LSC models (Fig. 6g).

Discussion

Prognostic guidance systems need to be adjusted as our knowledge of AML biology, cytogenetic and molecular characterization, and the availability of novel treatment agents expand. This study established a five-gene transcriptional signature, independent of cytogenetics and genetic mutations, which may identify at the time of diagnosis pediatric AML patients who will have negative outcomes. The model was successfully validated in an internal validation set as well as in two external validation sets. Notably, one of the external validation sets came from a Japanese population, whose genetic background is distinct from that of the model's training cohort. In both the training and internal validation cohorts, the risk model retained predictive value in multivariable analysis and improved risk prediction in the setting of the traditional cytomolecular COG classification, especially in the chemotherapy-only treatment subgroup.

A strong predictive biomarker of treatment outcomes would, in theory, identify early enough in the course of treatment a group of patients whose risk of recurrence and treatment resistance is high enough to warrant consideration of alternative therapies. In the training TARGET 145 and combined TARGET 256 datasets, the P-AML-5G risk groups performed well in recognizing high-risk patients from several clinical trials without the aid of predefined prognostic factors or the COG risk system. Since the inclusion of the training dataset TARGET 145 in the combined cohort might unfairly bias the results towards the "trained" dataset, we further validated the independent prognostic value of the score in the TARGET validation dataset. It should be highlighted that the TARGET dataset is enriched in poorly performing AML and that the training and validation sets have very different clinical characteristics. Meanwhile, our model also showed promising predictive capability for OS in a subset of peripheral blood samples, suggesting clinical value in enabling risk assessment at an earlier stage and with lower invasiveness. However, because only 42 peripheral blood samples were available in this study, further comprehensive investigations in larger cohorts are warranted.

With respect to model comparison using C-index values, we had expected the P-AML-5G groups to show the best predictive power in the TARGET validation dataset as well, which would have been particularly interesting given the clinical convenience of RNA-seq assays compared with cytological and mutation assays. That this was not the case may be due to the small sample size of TARGET validation, its differences in clinical characteristics from the training dataset, and the fact that it includes patients on two treatment regimens, which also reflects the high heterogeneity of P-AML patients at both the genomic and transcriptomic levels. Nevertheless, P-AML-5G is well suited for future research and clinical use because of several properties. Firstly, the P-AML-5G groups augment the existing standard COG risk system with new data, especially for re-stratifying the heterogeneous population now categorized as intermediate risk (over 40% of the combined cohort). Secondly, among patients receiving chemotherapy only, the P-AML-5G groups showed the best prognostic prediction performance. This result further suggests that a simple and straightforward gene expression assay panel may perform better in certain AML molecular subgroups or treatment subgroups. Thirdly, a high volume of bone marrow is needed for the many stages of testing in the COG's current methodology, whereas a gene-expression assay panel is much more economical, saves material and time, and facilitates application in less developed areas. Based on the above discussion, clinical use of the P-AML-5G model, either alone or in conjunction with the traditional COG system, is consistent with the movement toward better risk stratification for P-AML.

With data from 150 whole genome-sequenced patients, we were able for the first time to compare the P-AML-5G groups with several of the latest risk stratification systems for pediatric or adult AML. The ELN risk stratification system is universally accepted for the risk stratification of adult AML patients [26]; however, its application in P-AML was not ideal in previous studies [27, 28] or in our analysis. The fact that the ELN risk stratifications are unquantified, together with broad molecular differences between age cohorts, may account for this result. Notably, we were unable to find a satisfactory prognostic value for the expanded_COG_AAML1831. Possible reasons are the small sample size of the whole-genome sequenced individuals in this study or the complexity of the risk assignment for carriers with co-occurring mutations spanning risk categories (6 in 150 individuals within expanded_COG_AAML1831 groups) [29].

In addition to baseline genetic characterization, the response to the first course of treatment and the evaluation of early MRD are emphasized in individual risk assignment [24]. Although MRD1 did not hold independent prognostic implications for P-AML in this study, it was one of the prognostic indicators in the TARGET validation dataset. It was intriguing that the MRD1-positive patients in the low-risk P-AML-5G group showed a trend toward poorer prognosis than the low-risk/MRD1-negative patients. Combining P-AML-5G with the MRD status after the initial induction therapy might help identify patients at higher risk within the P-AML-5G low-risk group and prevent a worse prognosis. From the perspective of screening HSCT candidates, patients in the high-risk group identified by our model might benefit from HSCT, which is consistent with previous findings [30, 31]. The risk groups determined by P-AML-5G showed a significant 61% difference in survival probability in the TARGET discovery dataset and a 44% difference in the TARGET validation dataset. These findings highlight the potential relevance of P-AML-5G in guiding HSCT treatment decisions for patients in the intermediate-risk category.

Expression levels of the five feature genes in the model (ZNF775, RNFT1, CRNDE, COL23A1, and TTC38) were significantly correlated with the prognosis of the P-AML patients. We also observed markedly increased expression of CRNDE and COL23A1 in the bone marrow of patients compared with that of healthy children. This novel finding provides additional evidence for the significance of these genes in pediatric AML. It expands their potential application in disease prognosis and underscores the importance of further research on their functional mechanisms. Zinc Finger Protein 775 (ZNF775) has been predicted to enable DNA-binding transcription factor activity and was found to have strong predictive value for OS in hepatocellular carcinoma patients [32]; however, the functional role of this protein has not been explored in detail. Collagen Type XXIII Alpha 1 Chain (COL23A1) is reportedly associated with prostate cancer recurrence and distant metastases [33] and with clinical stage in thyroid carcinoma [34]. Further investigation of urine samples from prostate cancer patients before and after prostatectomy suggested that collagen XXIII may have application as a biomarker in human fluids. To our knowledge, no reports on the function of Ring Finger Protein, Transmembrane 1 (RNFT1) and Tetratricopeptide Repeat Domain 38 (TTC38) or their role in cancer have been published so far. Notably, colorectal neoplasia differentially expressed (CRNDE) is a well-known long non-coding RNA that is considered to play crucial roles in the development of multiple cancers [35]. Moreover, it is highly expressed in AML [35, 36], has functional roles in AML proliferation [37] and has been indicated by previous analyses to have prognostic value in AML [36, 38]. Studies focusing on the detailed mechanisms of CRNDE in P-AML pathology might help to identify a promising therapeutic strategy.

Taken together, we propose a robust P-AML-5G prognostic model specific for pediatric AML, built specifically from pediatric AML outcome data. It has the potential to refine the traditional COG risk categorization, identify patients at high risk and, by reducing the size and complexity of the genetic marker panel without sacrificing predictive capacity, facilitate clinical application and the development of innovative treatment options.

Materials and methods

Samples and datasets

The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) AML series consists of clinical data and RNA-sequencing data from 256 peripheral blood or bone marrow samples of children, adolescents, and young adults with de novo AML enrolled on biology studies and clinical trials managed through the Children’s Oncology Group, hereafter referred to as “TARGET 256”. OS is defined as the time from study entry until death. EFS is defined as the time from study entry until death, induction failure or relapse. Overall, the mean follow-up time was 4.15 ± 2.84 years for TARGET 256. A subset of samples from the TARGET AML project was randomly chosen and provided to the Genomic Data Commons (GDC) database for harmonization (N = 145), facilitating integration and analysis of multiple datasets for GDC researchers. This dataset served as the training cohort for our study (https://portal.gdc.cancer.gov/projects/TARGET-AML, collected on 2021-10-21). The remaining 111 patients from TARGET, gathered from the NCI’s data portal on 2021-10-21 (https://target.nci.nih.gov/dataMatrix/TARGET_DataMatrix.html), were used as the validation set, simulating two distinct sets with clinical differences, akin to real-world scenarios of training and validation sets. Individual-level whole-genome sequencing data of 150 samples in the TARGET cohort were obtained from dbGaP (accession number phs000465). RNA-sequencing and clinical data of 139 patients in the Japan P-AML dataset deposited in the European Genome-Phenome Archive (EGAS00001003701) [39] were extracted for external validation, hereafter referred to as “JAPAN P-AML”. In addition, RNA-sequencing and clinical data from 943 pediatric patients and 70 normal bone marrow samples, obtained from the COG study AAML1031 [40], were collected for external validation (collected on 2023-11-03; 923 out of 943 patients had survival data available). These data, deposited in the GDC, were also used to examine the expression differences of prognostic genes between patients and healthy children.

Prognostic score construction and validation for AML

Using the survival information, genes related to the prognosis of P-AML patients in TARGET 145 were screened by univariate Cox proportional hazards regression analysis of normalized counts from the DESeq2 tool [41] (p < 0.05). LASSO regression analysis with 1000 iterations was conducted to identify the most stable gene set [42]. Subsequently, Kaplan–Meier survival analysis and multivariable Cox regression analysis were performed, and a prognostic risk score formula was established as a linear combination of expression levels weighted by the regression coefficients derived from the multivariable Cox regression analysis: Risk score = expression of gene 1 × β1 + expression of gene 2 × β2 + ⋯ + expression of gene n × βn, where the β values are the regression coefficients derived from the multivariable Cox regression analysis of the dataset. The optimal cutoff of the risk score was determined via the maxstat package in R [43, 44], dividing patients into high-risk and low-risk groups. TARGET 111 and JAPAN P-AML were used for prognostic value validation. All patients in each dataset were scored using the formula, and the optimal cutoff risk score was used to divide the patients into high- and low-risk groups. Kaplan–Meier survival analysis was applied to assess the prognostic value of the derived risk groups.
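To make the idea of an "optimal cutoff" concrete, the sketch below scans candidate cutoffs and keeps the one that maximizes a two-group log-rank chi-square statistic. This is a simplified stand-in written in plain Python, not the maxstat implementation: maxstat works with standardized rank statistics and also adjusts the p-value for the multiple cutoffs tested, which is omitted here. The function names, the `min_prop` parameter and the toy data are hypothetical.

```python
def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)  # at risk, both groups
        n1 = sum(1 for x, g in zip(times, in_high) if x >= t and g)  # at risk, high group
        d = sum(1 for x, e in zip(times, events) if x == t and e)  # events at t
        d1 = sum(1 for x, e, g in zip(times, events, in_high) if x == t and e and g)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the number of events in the high group.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_cutoff(scores, times, events, min_prop=0.1):
    """Return (cutoff, statistic) maximizing the log-rank statistic.

    Patients with score > cutoff form the high-risk group. Cutoffs that
    leave fewer than min_prop of the patients in either group are skipped.
    """
    n = len(scores)
    best = (None, -1.0)
    for c in sorted(set(scores))[:-1]:  # the largest score would leave no high group
        in_high = [s > c for s in scores]
        k = sum(in_high)
        if k < min_prop * n or n - k < min_prop * n:
            continue
        stat = logrank_chi2(times, events, in_high)
        if stat > best[1]:
            best = (c, stat)
    return best


# Toy data (hypothetical): risk scores, follow-up in months, event flag (1 = death).
scores = [0.2, 0.5, 0.9, 1.4, 1.8, 2.1, 2.6, 3.0]
times = [60, 55, 58, 50, 12, 9, 7, 4]
events = [0, 0, 0, 1, 1, 1, 1, 1]
cutoff, stat = best_cutoff(scores, times, events)
print(cutoff, stat)
```

Because the cutoff is chosen to maximize the separation, the resulting unadjusted log-rank p-value is optimistic in the dataset where the cutoff was selected, which is one reason validation in independent cohorts matters.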

The traditional COG risk system used in the clinical trials on which the TARGET patients were enrolled (COG) [9, 45], expanded COG AAML1831 risk stratification system (expanded_COG_AAML1831) [16], 2022 ELN risk stratification system (2022ELN) [18] and two leukemic stem cell (LSC) score based models established for adult (LSC17) [46] or pediatric (LSC6) [14] AML, were selected for model comparison (references and detailed models can be found in Table S2; definition of risk-groups for COG, expanded_COG_AAML1831 and 2022ELN Classification systems can be found in Table S3).

Statistical analysis

The chi-square test was applied to compare categorical variables, and the two-tailed Student's t test or Wilcoxon test was used for quantitative variables. Kaplan–Meier curves were plotted to estimate overall survival, and the log-rank test was performed to evaluate the statistical significance of differences in survival. Variables identified as significant in the univariate Cox analysis were entered into the multivariate Cox proportional hazards regression analysis to identify independent prognostic factors using the Forward Stepwise (conditional LR) method. ROC curves (receiver operating characteristic curves), the area under the curve (AUC) and Harrell's concordance index (C-index) [47] were computed with the survival package in R to determine predictive values. Differentially expressed genes (DEGs) between pediatric patients and normal bone marrow samples in the COG AAML1031 study were analyzed with the R package DESeq2 [41]. All statistical analyses were carried out using R (3.5.2) software, and a two-sided p < 0.05 was considered statistically significant.
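For readers unfamiliar with the C-index used in the model comparison, the following minimal Python sketch shows the basic pairwise definition of Harrell's concordance index for right-censored data. It is an illustration only and is not the code used in this study, where the survival package in R was used; that implementation may handle tied survival times differently. The function name and the toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an event strictly
    before the observed time of patient j. The pair is concordant when
    the patient with the shorter survival has the higher risk score;
    tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up time, event flag (1 = death), risk score.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_scores = [3.0, 2.0, 2.5, 1.0]
c_index = harrell_c_index(toy_times, toy_events, toy_scores)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. When the predictor is a dichotomized risk group rather than a continuous score, many pairs have tied predictor values and contribute 0.5 each, which limits the attainable C-index.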