Introduction

Standard induction treatment for patients with AML consists of cytarabine (ara-C) and daunorubicin, with or without etoposide (ADE or DA standard chemotherapy)1,2. Despite major advances in AML treatment, the development of drug resistance remains one of the major causes of treatment failure and relapse in AML patients1,2,3. Previous studies have shown that genes involved in the metabolism of, or targeted by, ADE chemotherapeutic agents (PK/PD genes) are associated with the development of drug resistance and poor outcomes; however, these genes have been predominantly studied in isolation3,4,5. Because these drugs are administered concomitantly as an induction regimen, we reasoned that a comprehensive and systematic transcriptomic evaluation of genes of pharmacological significance to ara-C, daunorubicin and etoposide would help provide a drug resistance score predictive of treatment outcomes in AML patients. To this end, we cataloged 67 genes involved in the metabolism or transport of ara-C, daunorubicin or etoposide, together with their potential drug targets. These genes can contribute to the emergence of drug resistance through various mechanisms, such as: (1) reduced cellular uptake due to low levels of uptake transporters; (2) increased efflux due to high expression of efflux transporters; (3) decreased expression or activity of enzymes responsible for the activation of pro-drugs; (4) increased expression or activity of enzymes responsible for drug inactivation; (5) alterations in the expression or function of the molecular targets of the drugs. Although these key players are well established as determinants of drug pharmacokinetics or pharmacodynamics, a comprehensive transcriptomic evaluation using machine learning tools to develop a drug resistance signature has not been performed in AML.
Previously, a Least Absolute Shrinkage and Selection Operator (LASSO)-based regression analysis was used to define a leukemic stemness score, consisting of the expression levels of 17 genes, that was predictive of outcome6. Follow-up work defined a pediatric leukemic stemness score consisting of 6 genes in AML7. In ALL, LASSO analysis has been used to define prognostic risk factors8.

In this study, we evaluated the transcriptome of 67 pharmacologically relevant genes (listed in Supplementary Table S1) in pediatric AML patients treated on the AML02 multi-center clinical trial. We applied LASSO penalized regression to clinical outcome data to examine the significance of these genes and developed an ADE-Resistance Score (ADE-RS5) that was further validated in 10 independent AML cohorts. Recently, our group developed a six-gene leukemic stem cell (pLSC6) score that was associated with risk groups and patient outcomes in pediatric AML7. We further combined the pLSC6 and ADE-RS5 score groups to capture both disease aggressiveness, as implied by the stemness score, and drug resistance, as reflected by the resistance score, and evaluated the integrated groups across 10 cohorts of pediatric and adult AML patients totaling 3634 individuals.

Results

Expression of five pharmacological genes defines a drug resistance score of prognostic value in AML02 discovery cohort

A LASSO-penalized Cox regression model relating the mRNA expression levels of 67 genes to EFS in 163 patients (model-development cohort) treated on the multi-site AML02 trial identified five genes that had non-zero coefficients in at least 950 of 1000 leave-10%-out cross-validation replications of the analysis (Fig. 1 and Supplementary Fig. 1). This rigorous model-development process defined a five-gene ADE-Resistance Score (ADE-RS5), which was computed for each patient as the sum of gene expression levels weighted by the regression coefficients, as defined in the following equation:

$$\begin{array}{l}{\rm{ADE}}\text{-}{\rm{RS5}}=\left({DCTD}\times 0.128\right)+\left({TOP2A}\times (-0.0993)\right)+\left({ABCC1}\times 0.212\right)\\\qquad\qquad\qquad\quad+\left({MPO}\times (-0.113)\right)+\left({CBR1}\times (-0.126)\right)\end{array}$$
(1)
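As a minimal sketch of how Eq. (1) could be applied to a single patient, the following Python snippet computes the weighted sum. It assumes log2-transformed expression values as input (as described in Methods); the function name `ade_rs5` and the example expression values are hypothetical and are not taken from the study or its published R scripts.

```python
# Regression coefficients copied from Eq. (1).
ADE_RS5_WEIGHTS = {
    "DCTD": 0.128,
    "TOP2A": -0.0993,
    "ABCC1": 0.212,
    "MPO": -0.113,
    "CBR1": -0.126,
}


def ade_rs5(log2_expr):
    """Return the ADE-RS5 score for one patient.

    log2_expr: dict mapping gene symbol -> log2-transformed expression
    (assumption: same scale as used for model development).
    """
    missing = [gene for gene in ADE_RS5_WEIGHTS if gene not in log2_expr]
    if missing:
        raise KeyError("missing expression values for: " + ", ".join(missing))
    return sum(weight * log2_expr[gene] for gene, weight in ADE_RS5_WEIGHTS.items())


# Hypothetical expression values, for illustration only.
example_patient = {"DCTD": 8.0, "TOP2A": 9.0, "ABCC1": 7.5, "MPO": 12.0, "CBR1": 8.5}
example_score = ade_rs5(example_patient)
print(round(example_score, 4))  # -0.7067
```

Positive weights (DCTD, ABCC1) raise the score with higher expression, whereas negative weights (TOP2A, MPO, CBR1) lower it.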
Fig. 1: Overall study design.

Each unit increase in ADE-RS5 was associated with a 7.32-fold increase in the rate of EFS events (p < 0.00001, 95% CI = 3.75–14.28) in a simple single-predictor Cox regression model. Dichotomization by recursive partitioning classified patients into two groups: low ADE-RS5 (n = 98 patients, 60%) and high ADE-RS5 (n = 65 patients, 40%). Although the ADE-RS5 score groups did not differ by age, gender, race, risk group, FLT3-ITD status or WBC count at diagnosis, a difference in the distribution of cytogenetics was observed (Supplementary Table 2, which summarizes patient characteristics by score group). Within the discovery cohort, a high ADE-RS5 score was a significant predictor of higher MRD1 positivity (OR = 2.39, 95% CI = 1.23–4.63, p = 0.013; Fig. 2A), lower EFS (HR = 4.07, 95% CI = 2.43–6.84, p < 0.0001; Fig. 2B) and lower OS (HR = 4.54, 95% CI = 2.42–8.49, p < 0.0001; Fig. 2B).
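To illustrate the per-unit interpretation of the continuous score, and assuming the usual log-linear form of a single-predictor Cox model (the symbol Δ is introduced here only for this illustration), two patients whose ADE-RS5 scores differ by Δ would have an estimated hazard ratio of

$$\mathrm{HR}(\Delta)=\exp\left(\Delta\,\ln 7.32\right)=7.32^{\Delta},\qquad \mathrm{HR}(0.5)=\sqrt{7.32}\approx 2.71$$

so a half-unit difference in score would correspond to a roughly 2.7-fold difference in the rate of EFS events under this model.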

Fig. 2: ADE-resistance score predicts AML outcome in discovery cohort.

High ADE-resistance score (ADE-RS5) significantly predicts MRD1 positivity (A) and lower EFS and OS probability (B) in the AML02 discovery cohort (n = 163). Association of the four-group classification based on integration of pLSC6 and ADE-RS5 scores (LSC6/RS5) with MRD1 (C) and with EFS and OS probability (D) in the AML02 cohort. Forest plots showing results of multivariable Cox regression analysis of the association of ADE-RS5 and the integrated pLSC6/ADE-RS5 score groups with EFS (E and G) and OS (F and H) after adjusting for risk group assignment, diagnostic WBC count, FLT3 status and age. For integrated LSC6/RS5 scores: Group 1 = both LSC6 and ADE-RS5 scores are low; Group 2 = low LSC6 score and high ADE-RS5 score; Group 3 = high LSC6 score and low ADE-RS5 score; Group 4 = both LSC6 and ADE-RS5 scores are high.

Integration of ADE-RS5 score with previously established pLSC6 score in AML02 discovery cohort

We previously developed a clinically significant leukemic stemness score in pediatric AML and designated it pLSC6 (derived from expression levels of DNMT3B, GPR56, CD34, SOCS2, SPINK2, and FAM30A). Patients in the low pLSC6 score group previously showed better outcome than those in the high pLSC6 group7. We therefore tested ADE-RS5 within the pLSC6 score groups. Within the low-pLSC6 score group (N = 97), a high ADE-RS5 score was significantly associated with lower EFS (HR = 4.25; 95%CI = 1.08–10.04, P = 0.0009; Supplementary Fig. 2A) and OS (HR = 4.96; 95%CI = 1.57–15.64, P = 0.0063; Supplementary Fig. 2B) compared with patients with a low ADE-RS5 score. Similarly, within the high-pLSC6 score group (N = 66), patients with a high ADE-RS5 score experienced lower EFS (HR = 1.98; 95%CI = 1.02–3.86, P = 0.044; Supplementary Fig. 2A) and OS (HR = 2.12; 95%CI = 0.99–4.52, P = 0.053; Supplementary Fig. 2B) compared with the low ADE-RS5 group. Based on these results, ADE-RS5 further enhanced the prognostic value of pLSC6 in predicting poor outcomes in patients with AML, and we therefore integrated the pLSC6 and ADE-RS5 scores to classify patients into four groups (Fig. 1):

Group-1: Low: patients with low pLSC6 and low ADE-RS5 scores

Group-2: Low/High: patients with low pLSC6 and high ADE-RS5 scores

Group-3: High/Low: patients with high pLSC6 and low ADE-RS5 scores

Group-4: High: patients with high pLSC6 and high ADE-RS5 scores
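The four-group assignment listed above can be sketched as a simple lookup. This is an illustrative Python snippet only; the function name `integrated_group` and the boolean inputs are hypothetical, and the low/high calls for each score are assumed to have been made beforehand using the cut-offs described in Methods.

```python
# Lookup keyed by (pLSC6 is high, ADE-RS5 is high) -> integrated group number.
GROUP_BY_SCORE_STATUS = {
    (False, False): 1,  # low pLSC6, low ADE-RS5
    (False, True): 2,   # low pLSC6, high ADE-RS5
    (True, False): 3,   # high pLSC6, low ADE-RS5
    (True, True): 4,    # high pLSC6, high ADE-RS5
}


def integrated_group(plsc6_high, ade_rs5_high):
    """Return the integrated pLSC6/ADE-RS5 group (1 to 4) for one patient."""
    return GROUP_BY_SCORE_STATUS[(bool(plsc6_high), bool(ade_rs5_high))]


print(integrated_group(plsc6_high=False, ade_rs5_high=True))  # 2
```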

Patient characteristics by the four-group assignment for the discovery cohort (AML02 trial) are summarized in Supplementary Table 2; initial risk group assignment, cytogenetic features and FLT3-ITD status were the diagnostic features that differed among the groups. Patients in the integrated Group 4 (both pLSC6 and ADE-RS5 scores high) and Group 3 (high pLSC6 and low ADE-RS5) experienced greater MRD1 positivity (72% and 64%, Fig. 2C) compared with Groups 1 and 2 (19% and 17%), implying that MRD1 positivity might be driven by a high pLSC6 score.

With respect to survival outcomes, patients in Group 4 had lower EFS (HR = 8.89, p < 0.0001) and OS (HR = 12.68, p < 0.0001) compared with patients in Group 1 (Fig. 2D). Patients in Groups 2 and 3 showed intermediate outcomes that were significantly poorer than those of Group 1 (all p < 0.005, Fig. 2D).

In multivariable analysis adjusting for diagnostic risk group assignment, WBC count, FLT3 status and age, ADE-RS5 remained an independent predictor of lower EFS and OS (EFS: HR = 3.78, 95%CI = 2.24–6.41, p < 0.001; OS: HR = 4.32, 95%CI = 2.28–8.2, P < 0.001; Fig. 2E, F). Furthermore, in an integrated score group analysis with Group 1 as the reference group, significantly poorer outcome was observed in Groups 2, 3 and 4 for EFS (Group 2 vs. 1, HR = 4.68, p < 0.001; Group 3 vs. 1, HR = 3.22, p = 0.01; and Group 4 vs. 1, HR = 7.26, p < 0.001; Fig. 2G) and OS (Group 2 vs. 1, HR = 5.75, p = 0.003; Group 3 vs. 1, HR = 3.73, p = 0.032; and Group 4 vs. 1, HR = 9.72, p < 0.001; Fig. 2H) in the AML02 cohort.

Validation of transcriptomic based prognostic scores in >3000 patients from independent pediatric and adult clinical trials

We analyzed pLSC6, ADE-RS5 and integrated scores after combining all the pediatric cohorts (4 different trials, total n = 1861) and all the adult cohorts (5 different trials, total n = 1669). The distribution of patient characteristics by pLSC6, ADE-RS5 and integrated pLSC6/ADE-RS5 scores across pediatric and adult validation cohorts is provided in Table 1. Overall, consistent with our previous report, the pLSC6 score group was significantly associated with patients’ risk group assignment, cytogenetics and FLT3 status. ADE-RS5 was associated with these factors and, in addition, with age in the combined pediatric cohort and with gender in the combined adult cohort.

Table 1 Distribution of patient characteristics in combined pediatric and adult AML validation cohorts (N = 1861 pediatric and 1669 adult AML patients)

In the combined pediatric cohort (n = 1861), EFS and OS showed significant and consistent associations with ADE-RS5 (EFS: HR = 1.38 and OS: HR = 1.6, both p < 0.001; Fig. 3A) and pLSC6 (EFS: HR = 1.9 and OS: HR = 2.1, both p < 0.001; Fig. 3B). In the integrated pLSC6/ADE-RS5 group analysis with Group 1 as the reference, both EFS and OS were inferior in the other groups (EFS: HR = 1.31, p = 0.005 (Group 2 vs. Group 1), HR = 1.99, p < 0.001 (Group 3 vs. Group 1) and HR = 2.13, p < 0.001 (Group 4 vs. Group 1); OS: HR = 1.54 (Group 2 vs. Group 1), HR = 2.18 (Group 3 vs. Group 1) and HR = 2.62 (Group 4 vs. Group 1), all p < 0.001; Fig. 3C). Endpoint associations are also summarized in Table 1. In multivariable analysis adjusting for cytogenetic risk group, age and WBC count in the combined pediatric AML patient population, ADE-RS5 was not significantly associated with EFS (p = 0.3) or OS (p = 0.06); however, pLSC6 showed consistent, significant associations with EFS (p < 0.001) and OS (p < 0.001) (Supplementary Fig. 3A, B). For the integrated pLSC6/ADE-RS5 score, Groups 3 and 4 consistently showed significant associations with poor EFS and OS compared with Group 1 after adjusting for age, risk group and WBC count (Supplementary Fig. 3C). Given that MRD after induction I holds prognostic value in driving clinical decisions, we analyzed pLSC6 and ADE-RS5 scores against MRD1 data, which were available in 3 of the 4 cohorts. ADE-RS5, pLSC6 and integrated score groups showed consistent and significant associations with MRD1 (MRD1 positivity: ADE-RS5, high vs. low, 38% vs. 21%; pLSC6, high vs. low, 43% vs. 18%; integrated score groups, 16% of Group 1, 22% of Group 2, 34% of Group 3 and 50% of Group 4 patients were MRD1 positive; all p < 0.0001; Fig. 4A, C, E, respectively).
Individual and integrated score groups remained significant predictors of MRD1 in multivariable logistic regression models after adjusting for age, risk group, WBC count and FLT3 status (ADE-RS5: OR = 1.68, p < 0.001; pLSC6: OR = 2.32, p < 0.001; integrated score groups, using Group 1 as reference: Group 2, OR = 1.24, p = 0.26; Group 3, OR = 1.84, p < 0.001; Group 4, OR = 3.25, p < 0.001; Fig. 4B, D, F, respectively).

Fig. 3: ADE-RS5, pLSC6 and integrated score groups predict EFS and OS in large pediatric and adult AML validation cohorts.

Association of ADE-RS5 (A), pLSC6 (B), and the integrated LSC6/ADE-RS5 four score groups (C) with EFS and OS in the combined pediatric AML validation cohorts from multiple multi-site clinical trials (N = 1861, 4 trials). Association of ADE-RS5 (D), pLSC6 (E), and the integrated LSC6/ADE-RS5 four score groups (F) with OS in the combined adult AML validation cohorts from multiple multi-site clinical trials (N = 1669 patients, 5 cohorts).

Fig. 4: ADE-RS5, pLSC6 and integrated score groups predict MRD after induction 1 in large pediatric AML validation cohorts.

Association of ADE-RS5 (A), pLSC6 (C), and the integrated LSC6/ADE-RS5 four score groups (E) with MRD1 in 1507 pediatric AML patients (COG-cohort1, COG-cohort2, and AML08 datasets). Forest plots showing results of multivariable logistic regression analysis of the association of ADE-RS5 (B), pLSC6 (D), and the integrated score groups (F) with MRD1 after adjusting for risk group assignment, diagnostic WBC count, FLT3 status and age. * MRD1 data were not available from the pediatric GSE17855 dataset.

Within adult AML, we investigated 1669 patients from 5 different cohorts (OS data were available in all cohorts, but EFS data were available in only 3 cohorts). In the combined analysis of patients from the 5 trials, OS showed significant and consistent associations with the ADE-RS5 score (HR = 1.32, p < 0.001; Fig. 3D), the pLSC6 score (HR = 1.99, p < 0.001; Fig. 3E) and the integrated pLSC6/ADE-RS5 score groups (Group 2 vs. 1: HR = 1.25, p = 0.013; Group 3 vs. 1: HR = 2.35, p < 0.001; Group 4 vs. 1: HR = 1.99, p < 0.001; Fig. 3F). In multivariable analysis adjusting for risk group, age and FLT3-ITD mutation, ADE-RS5, pLSC6 and the integrated pLSC6/ADE-RS5 score groups remained significant predictors of OS (Supplementary Fig. 3D–F). For cohorts with EFS data available (n = 1306), consistent significant associations were observed for ADE-RS5 (HR = 1.23, p = 0.001), pLSC6 (HR = 1.86, p < 0.001) and the integrated score groups (Group 3 vs. 1 and Group 4 vs. 1, p < 0.001) (Supplementary Fig. 4A, C, E). In multivariable analysis adjusting for age, risk group and FLT3-ITD mutation, pLSC6 and the integrated score groups remained significant predictors of EFS (Supplementary Fig. 4B, D, F).

Age-stratified analysis of adults younger than 65 years and elderly patients ≥65 years old showed that pLSC6 (pLSC6 low vs. high: <65 yrs, HR = 2.06, P < 0.00001; ≥65 yrs, HR = 2.02, P < 0.00001; Supplementary Fig. 5A, C) and ADE-RS5 (ADE-RS5 low vs. high: <65 yrs, HR = 1.37, P < 0.00001; ≥65 yrs, HR = 1.21, p = 0.093; Supplementary Fig. 5E, F) were associated with OS, although the ADE-RS5 association did not reach statistical significance in patients ≥65 years old. The integrated scores remained a significant predictor of OS in the two age groups (Supplementary Fig. 5I, K). In multivariable analysis adjusting for risk group assignment and FLT3-ITD mutation, pLSC6 and the integrated scores remained significant independent predictors of OS in both age groups (Supplementary Fig. 5). Given that the cytogenetically normal (CN) subgroup constitutes a significant proportion of AML patients and experiences highly heterogeneous responses, we evaluated ADE-RS5, pLSC6 and integrated scores within this subgroup in all 9 cohorts as well as in an additional cohort of CN patients from the GSE71014 dataset. Consistent with the results from the whole cohort, patients with CN-AML and high-pLSC6/high ADE-RS5 scores experienced significantly lower EFS and OS than those in the low-pLSC6/low ADE-RS5 score group in pediatric and adult cohorts (Supplementary Fig. 6A, B). In multivariable analysis adjusting for age, WBC count at diagnosis and FLT3 status, pLSC6, ADE-RS5 and integrated score groups remained significant independent predictors of outcomes in pediatric and adult CN patients (Supplementary Fig. 6A, B).

Additionally, hematopoietic stem cell transplant (HSCT) can have a significant impact on outcome, and we previously showed that patients with a high pLSC6 score do not benefit from HSCT in the AML02 cohort7. Although HSCT information was not available in all cohorts, we evaluated HSCT as a time-dependent variable for pLSC6, ADE-RS5 and the integrated score in the 4 cohorts with available data. As shown in Supplementary Fig. 7, the score groups remained significant predictors of EFS and OS.

In addition to the analysis performed in the combined cohorts for pediatric and adult AML, each cohort was evaluated individually. Figure 5 summarizes the association of the pLSC6 and ADE-RS5 scores in individual cohorts with EFS (5 pediatric and 3 adult AML cohorts, total N = 3330) and OS (5 pediatric and 5 adult AML cohorts, total N = 3693). Consistent with the results from the discovery cohort, pLSC6 was significantly associated with EFS (Fig. 5A) and OS (Fig. 5B) in all individual cohorts tested, with a common effect of HR = 1.95, 95%CI = 1.78–2.14, p < 0.00001 for association with EFS, and HR = 2.06, 95%CI = 1.88–2.26, P < 0.00001 for association with OS. ADE-RS5 was significantly associated with EFS in all cohorts (p < 0.01) except for the AML08 (p = 0.07) and Leucegene (p = 0.55) cohorts, and with OS in all cohorts (p < 0.01) except for the AML08 (p = 0.12), Beat AML (p = 0.8) and Leucegene (p = 0.68) cohorts, with a common effect of HR = 1.34, 95%CI = 1.23–1.46, p < 0.00001 for association with EFS, and HR = 1.45, 95%CI = 1.32–1.59, p < 0.00001 for association with OS (Fig. 5C, D). Figure 5E–J shows the results for the integrated LSC6/ADE-RS5 score (Groups 2–4 vs. Group 1), again showing that Group 4 had the worst outcome compared with Group 1.

Fig. 5: Meta-analysis forest plots for ADE-RS5, pLSC6 and integrated score groups in 10 AML cohorts.

Meta-analysis of EFS in 8 pediatric and adult AML datasets and OS in 10 pediatric and adult AML datasets by pLSC6 (A, B), ADE-RS5 (C, D), and integrated LSC6/ADE-RS5 score groups: group 1 vs. group 2 (E, F), group 1 vs. group 3 (G, H), and group 1 vs. group 4 (I, J).

Discussion

Cytarabine, daunorubicin and etoposide (ADE) are commonly used for induction of remission and intensification in pediatric AML. A combination of cytarabine and anthracyclines is the mainstay of treatment in adults. However, development of chemotherapeutic resistance is a major cause of AML treatment failure3,5. In recent years, significant effort has been devoted to transcriptomics-based prognostic factors, including the leukemic stemness score (LSC17) reported in 2016 in adult AML6. Our group previously leveraged the leukemia stemness genes identified by Ng et al.6 and, using outcome data from pediatric AML, developed a pediatric leukemic stemness score composed of 6 genes7. In addition to leukemic stemness, which defines disease aggressiveness, development of drug resistance is an inherent clinical challenge. In this study, we used a similar strategy to define a chemotherapeutic resistance score focused on key genes of pharmacological relevance (pharmacokinetics/pharmacodynamics) to ADE. After running LASSO regression on these genes, we defined an ADE-RS5 score that was computed for each patient from the expression levels of five genes multiplied by their regression coefficients. These five genes were: (i) deoxycytidylate deaminase (DCTD), a deaminase involved in ara-CMP to ara-UMP conversion; (ii) ATP Binding Cassette Subfamily C Member 1 (ABCC1), an efflux transporter implicated in daunorubicin and etoposide efflux; (iii) Myeloperoxidase (MPO), involved in etoposide-catechol to quinone conversion9 and also a myeloid cell-specific marker10; (iv) Topoisomerase II alpha (TOP2A), a target of daunorubicin and etoposide11; and (v) Carbonyl Reductase 1 (CBR1), involved in the reduction of daunorubicin to daunorubicinol12,13.
Drug metabolism is a highly complex process involving influx and efflux transporters, activating and inactivating enzymes, and the dynamic interactions among them, which makes it very challenging to study all of these components simultaneously. Thus, alternative approaches such as the one used here provide some insight into drug responsiveness governed by pharmacological genes. To the best of our knowledge, this is one of the first studies to apply this approach to establish a drug resistance score that holds prognostic value and is predictive of survival outcomes.

Further, the previously established pLSC6 score and the newly developed ADE-RS5 score were evaluated as prognostic factors in 9 independent pediatric and adult AML cohorts totaling more than 3000 patients. The pLSC6 score was validated in each cohort, within the cytogenetically normal group, and within patients younger than 65 years as well as those 65 years and older. This is in contrast to a recent observation in which LSC17 was not associated with EFS and OS in patients ≥60 years of age14.

Furthermore, the ADE-RS5 score predicted outcome within the low and high pLSC6 groups, indicating that it offers additional prognostic value beyond that captured by the pLSC6 score alone. Thus, a four-group classifier system (Group 1 to Group 4) was developed for patients. Integrated stemness and drug-resistance score groups predicted outcome in both pediatric and adult AML patients, within different cytogenetic subgroups, and within CN-AML. Group 1, representing patients with low LSC6 and low ADE-RS5 scores, had the most favorable outcome, and Group 4, with both scores high, had the poorest outcomes. In addition, pLSC6, ADE-RS5 and the integrated score groups were significant and independent predictors of poor outcomes after adjusting for risk group assignment, age, FLT3-ITD mutation and WBC count at diagnosis. ADE-RS5 was not validated in the Beat AML and Leucegene cohorts, which we believe may be due to the older age of the patients, different frequencies of cytogenetic risk categories, treatment regimens without etoposide, and the potential effect of transplant. Expression levels of all LSC17 genes were not available in every cohort because of the type of array used; however, we evaluated LSC17 groups as previously described, and LSC17 remained a significant predictor of OS. Combining ADE-RS5 with LSC17 showed the added value of ADE-RS5 in predicting survival (Supplementary Fig. 8). Although this is one of the few studies with large numbers of patient samples across multiple cohorts, it has some limitations: non-uniform treatment protocols across the cohorts; continued updates to the AML classification, resulting in changes to the initial risk group classification in older trials; variability in post-induction treatment protocols across trials and centers; lack of EFS data and time to transplant in some adult cohorts; and lack of mechanistic studies supporting the functional relevance of some of the genes that are part of the score.

In conclusion, this report highlights the significant prognostic value of multi-gene transcriptomics-based scores that assess disease aggressiveness through the pLSC6 score and drug resistance through the ADE-RS5 score. Our analysis reveals that the pLSC6 stemness score is a significant predictor of outcome and is associated with high-risk group features; the ADE-RS5 drug resistance score adds further value, supporting the clinical utility of testing both simultaneously to optimize treatment strategies. One notable aspect of this study is the evaluation of nine entirely independent clinical cohorts, including both pediatric and adult AML patients from various countries. That pLSC6 requires evaluation of only 6 genes highlights its simplicity for clinical use. Future clinical translation of these results can be accelerated by a simple method for quantifying the 11 genes, such as RT-PCR or a NanoString-based assay; we have previously shown consistency of the pLSC6 score across three platforms (U133A, RNAseq and RT-PCR)7. Future work is focused on developing a web-based tool that will allow other investigators to use our signatures to predict treatment outcomes and refine patient classification.

Methods

Patient cohorts- AML02 discovery cohort

For this study, we included 163 patients treated on the multicenter AML02 clinical trial (ClinicalTrials.gov Identifier: NCT00136084). Patients with acute promyelocytic leukemia or Down’s syndrome were excluded. Patient characteristics, risk group assignment and definitions of clinical endpoints, including minimal residual disease after the induction I course of treatment (MRD1), event-free survival (EFS) and overall survival (OS), have been previously described15. Gene expression profiling of leukemic blasts obtained at diagnosis in the AML02 discovery cohort was performed using GeneChip® Human Genome U133A [Affymetrix, Santa Clara, CA] as described previously16. The MAS 5.0 algorithm was used to obtain normalized gene expression signals. Expression data for 67 genes of relevance to ADE pharmacology (listed in Supplementary Table S1) were extracted and log2 transformed before the analysis.

Validation cohorts

AML patient cohorts with both gene expression data from diagnostic specimens and clinical outcome data available were included in the validation studies. Patients diagnosed with myelodysplastic syndrome (MDS), myelodysplastic syndrome refractory anemia with excess blasts (MDS-RAEB), Down’s syndrome-related AML or acute promyelocytic leukemia (APL; FAB-M3), as well as data from specimens not obtained at diagnosis or missing survival data, were excluded from the study. The validation cohorts are summarized below and listed in Fig. 1 (additional details are provided in the Supplementary Material). All the cohorts were evaluated for association between transcriptomic scores and clinical outcome endpoints individually as well as combined into pediatric and adult AML datasets. Use of data and/or specimens was approved by the respective protocol or institutional Institutional Review Boards, and informed consent was obtained from parents/guardians or patients, and assent from the patients, as appropriate, in accordance with the approved clinical trial protocols and the Declaration of Helsinki. The study was approved by the University of Florida Institutional Review Board.

Pediatric AML-Children’s Oncology Group (COG) AAML0531 and AAML03P1

This dataset included 601 pediatric AML patients treated under the COG AAML0531 (NCT00372593; N = 531)17 and AAML03P1 (NCT0070174; N = 70)18 trials. Details on the clinical trials and outcomes have been previously published17,19. The RNAseq and clinical outcome data were provided by COG or downloaded from the TARGET-AML project dataset (https://ocg.cancer.gov/programs/target/projects/acute-myeloid-leukemia).

Pediatric AML-Children’s Oncology Group (COG) AAML1031

This dataset included 941 pediatric AML patients treated under the COG AAML1031 trial (NCT01371981). RNAseq and clinical outcome data were provided by COG or obtained from the TARGET-AML project (https://ocg.cancer.gov/programs/target/projects/acute-myeloid-leukemia). Details on the clinical trial and outcome have been previously published20.

Pediatric AML-AML08 cohort

This dataset included 122 pediatric AML patients treated under the multi-center AML08 clinical trial (NCT00703820)21. RNA samples from diagnosis were available for these 122 patients, and gene expression data for the 11 genes of interest were generated using a TaqMan-based assay as detailed in the Supplementary Material. Details on the clinical trial and outcome have been previously published21.

Pediatric AML-GSE17855 cohort

For this cohort, data from 197 pediatric AML patients (after applying the exclusion criteria listed above) were included in this study. Patients received treatment on 8 different trials. Expression data generated using the U133 Plus array were downloaded from the Gene Expression Omnibus (GEO) database (GSE17855).

Adult AML-GSE68833-The Cancer Genome Atlas (TCGA) cohort

This dataset included 165 adult AML patients with publicly available clinical and gene expression data. U133-Plus microarray gene expression data was downloaded for this group of patients from Gene Expression Omnibus database (GSE68833). RNA-Seq gene expression data for 153 patients was also available for this cohort.

Adult AML-GSE37642

This dataset included 374 adult AML patients treated in the German AMLCG-1999 trial22 with publicly available gene-expression data generated using U133A array23.

Adult AML-GSE6891

This dataset included 417 adult AML patients treated according to multiple sequential HOVON trials of the Dutch-Belgian Hemato-Oncology Cooperative Group and the Swiss Group for Clinical Cancer Research, with publicly available gene expression data generated using the U133 Plus array.

Adult AML-BeatAML

Clinical data was downloaded from http://www.vizome.org/aml/ and merged with clinical data downloaded from C-bioportal-OHSU24. After applying exclusion criteria indicated above, 198 patients were included in the current study.

Adult AML-Leucegene AML cohort

This dataset included 515 adult patients with newly diagnosed AML who were treated with intensive induction chemotherapy (7 + 3 based regimens) in Quebec (Canada) between 2001 and 2019. Diagnostic bone marrow or peripheral blood samples were collected and stored by the Quebec leukemia cell bank (bclq.org). Gene expression data was generated with whole transcriptome sequencing using an Illumina HiSeq 2000 sequencing system as part of the Leucegene project (leucegene.ca) and clinical data was collected and validated by the Quebec leukemia cell bank (details in supplementary material).

Adult AML-GSE71014- Cytogenetically normal AML dataset

This dataset included cytogenetically normal AML (CN-AML) patients (n = 104) treated at the National Taiwan University Hospital (NTUH)25 with gene expression and clinical data available (HumanHT-12 V4.0 expression bead chip).

All gene expression data were log2 transformed before analysis. RNA-Seq data were normalized as reads per kilobase of transcript per million mapped reads (RPKM) or transcripts per million (TPM). We used log2 (RPKM + 1) or log2 (TPM + 1) values for subsequent statistical analysis. Supplementary Table 2 provides a list of probe/assay IDs for the 11 genes that constitute the pLSC6 and ADE-RS5 scores.
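The log2 (RPKM + 1) or log2 (TPM + 1) transformation described above can be sketched as follows. This is an illustrative Python snippet only; the helper name `log2_plus_one` and the example values are hypothetical.

```python
import math


def log2_plus_one(value):
    """Return log2(value + 1) for a non-negative RPKM or TPM value.

    The pseudocount of 1 keeps genes with zero reads defined (they map to 0).
    """
    if value < 0:
        raise ValueError("normalized expression values must be non-negative")
    return math.log2(value + 1.0)


# Hypothetical TPM values for one gene across three samples.
example_tpm = [0.0, 3.0, 127.0]
example_log2 = [log2_plus_one(v) for v in example_tpm]
print(example_log2)  # [0.0, 2.0, 7.0]
```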

Clinical Outcome endpoint definitions

Minimal residual disease after the induction I course of treatment (MRD1) was defined as one or more leukemic cells per 1000 mononuclear cells (≥0.1%). Event-free survival (EFS) was defined in the AML02 discovery cohort as the time from study enrollment to induction failure, relapse, second malignancy, refusal of therapy, removal from therapy because of unacceptable toxicity, or death, with patients who had not experienced any of these events censored at last follow-up. The definition of EFS in the other clinical trials is described in the respective clinical trial outcome reports cited above or in the supplemental information. Overall survival (OS) was defined as the time from study enrollment to death, with living patients censored at last follow-up.

Development of ADE-RS score

We fitted a least absolute shrinkage and selection operator (LASSO) Cox regression model, as implemented in the glmnet package of the R 3.6.0 statistical software (www.r-project.org), to the gene expression levels (67 genes of pharmacological relevance to ADE) and the EFS data of patients from the AML02 discovery cohort. To evaluate the variability and reproducibility of the LASSO Cox regression model estimates, we repeated the LASSO Cox regression fitting process for each of 1000 leave-10%-out cross-validation evaluations. Genes with non-zero coefficient estimates in at least 950 of these 1000 evaluations were retained. The final model coefficients were obtained by averaging the coefficient estimates obtained across the set of cross-validation evaluations. We further used a recursive partitioning survival model, as implemented in the rpart package, to dichotomize ADE-resistance scores into “low” and “high” score groups (60% as low and 40% as high).
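The gene-retention and coefficient-averaging steps described above can be sketched in Python as follows. This is a simplified illustration rather than the study's R implementation: it assumes the per-replicate LASSO Cox coefficients have already been estimated elsewhere (for example with glmnet), it averages over all replicates (one possible reading of the averaging step), and the function names, gene labels and toy values are hypothetical. The toy example uses 4 replicates with a threshold of 3 in place of 1000 replicates with a threshold of 950.

```python
def stable_genes(replicate_coefs, min_count):
    """Return genes with a non-zero coefficient in at least min_count replicates.

    replicate_coefs: list of dicts, one per cross-validation replicate,
    each mapping gene symbol -> estimated coefficient.
    """
    counts = {}
    for coefs in replicate_coefs:
        for gene, beta in coefs.items():
            if beta != 0.0:
                counts[gene] = counts.get(gene, 0) + 1
    return sorted(gene for gene, n in counts.items() if n >= min_count)


def average_coefficients(replicate_coefs, genes):
    """Average each retained gene's coefficient over all replicates.

    Assumption: replicates in which a gene was dropped contribute 0.
    """
    n_rep = len(replicate_coefs)
    return {g: sum(c.get(g, 0.0) for c in replicate_coefs) / n_rep for g in genes}


# Toy stand-in for the 1000 leave-10%-out replicates (hypothetical values).
toy_replicates = [
    {"GENE_A": 0.2, "GENE_B": 0.0, "GENE_C": -0.1},
    {"GENE_A": 0.4, "GENE_B": 0.1, "GENE_C": -0.3},
    {"GENE_A": 0.3, "GENE_B": 0.0, "GENE_C": 0.0},
    {"GENE_A": 0.3, "GENE_B": 0.0, "GENE_C": -0.2},
]
retained = stable_genes(toy_replicates, min_count=3)
final_coefs = average_coefficients(toy_replicates, retained)
print(retained)  # ['GENE_A', 'GENE_C']
```

With the study's settings, the corresponding call would use the 1000 replicate coefficient sets and `min_count=950`.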

Integrated pLSC6/ADE-RS5 score groups

The pLSC6 score was generated from the expression levels of six genes (DNMT3B, GPR56, CD34, SPINK2, SOCS2 and FAM30A) multiplied by their regression coefficients, as defined previously7. Patients were classified into low or high pLSC6 groups as defined previously. Based on the combination of the pLSC6 and ADE-RS5 score group designations, patients were further grouped as described in the Results section. Associations of pLSC6, ADE-RS5 and integrated score groups with clinical outcome endpoints were analyzed at the individual cohort level in the pediatric AML datasets, which included COG-cohort 1 (N = 601), COG-cohort 2 (N = 941), AML08 (N = 122) and GSE17855 (N = 197), and in the combined pediatric cohort totaling 1861 patients. Similarly, we analyzed the adult AML validation datasets individually in the GSE68833-TCGA (N = 165), GSE37642 (N = 374), GSE6891 (N = 417), Beat-AML (N = 198) and Leucegene (N = 515) cohorts, as well as in the combined cohort totaling 1669 adult AML patients.

Statistical analysis

Survival analyses were performed using the survival and survminer packages in R 3.6.0. EFS and OS probabilities were estimated using the Kaplan-Meier method, and Cox proportional hazards models were used to compare the survival curves of patients within ADE-RS5, pLSC6 and integrated pLSC6/ADE-RS5 score groups (Groups 1–4) as well as to assess the association between each individual prognostic factor and survival outcomes. Multivariable Cox proportional hazards models were used to evaluate the independent prognostic effect of the study covariables. Wilcoxon rank-sum or Kruskal-Wallis tests were used for continuous variable comparisons between/among patient subgroups. Chi-square or Fisher’s exact tests were used to test associations between categorical variables. For the meta-analysis, HRs and their 95% CIs were obtained from Cox proportional hazards models with or without adjustment for known factors in individual cohorts. The overall HRs were estimated by meta-analysis (meta_6.1-0) with a fixed effect model. The overall HRs were also provided with random effects, allowing for heterogeneity among cohorts. Heterogeneity can be evaluated by the I2 statistic. All analyses were conducted in R statistical software, versions 3.6.0 and 4.2.1 (R Foundation for Statistical Computing, Vienna, Austria), and a two-tailed P value less than 0.05 was deemed statistically significant. The R scripts are available at GitHub (https://github.com/Abdelrahman-Elsayed/kit-nfold-cv-glmnet/blob/master/kit-nfold-cv-glmnet-v0.R). A stepwise model development flow chart is provided in the Supplementary Material.
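The fixed effect pooling of cohort-level HRs described above can be illustrated with a minimal inverse-variance sketch in Python. This is not the meta package implementation used in the study; it assumes that each cohort's standard error can be recovered from a symmetric (on the log scale) 95% CI, and the function names and input values are hypothetical.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% CI


def log_hr_and_se(hr, ci_low, ci_high):
    """Convert an HR and its 95% CI to (log HR, standard error)."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_975)
    return math.log(hr), se


def fixed_effect_pool(cohorts):
    """Inverse-variance fixed effect pooling of (hr, ci_low, ci_high) tuples."""
    estimates = [log_hr_and_se(*c) for c in cohorts]
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Cochran's Q and the I2 statistic (percentage of variation due to heterogeneity).
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "hr": math.exp(pooled),
        "ci_low": math.exp(pooled - Z_975 * pooled_se),
        "ci_high": math.exp(pooled + Z_975 * pooled_se),
        "q": q,
        "i2": i2,
    }


# Hypothetical cohort-level results (HR, lower CI, upper CI), for illustration only.
hypothetical_cohorts = [(1.8, 1.2, 2.7), (2.2, 1.6, 3.025)]
pooled_result = fixed_effect_pool(hypothetical_cohorts)
print(pooled_result)
```

Cohorts with narrower confidence intervals receive larger weights, so the pooled HR lies closer to the more precisely estimated cohort.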