Abstract
Juvenile myelomonocytic leukemia (JMML) is a rare heterogeneous hematological malignancy of early childhood characterized by causative RAS pathway mutations. Classifying patients with JMML using global DNA methylation profiles is useful for risk stratification. We implemented machine learning algorithms (decision tree, support vector machine, and naïve Bayes) to produce a DNA methylation-based classification according to recent international consensus definitions using a well-characterized pooled cohort of patients with JMML (n = 128). DNA methylation was originally categorized into three subgroups: high methylation (HM), intermediate methylation (IM), and low methylation (LM), which is a trichotomized classification. We also dichotomized the subgroups as HM/IM and LM. The decision tree model showed high concordances with 450k-based methylation [82.3% (106/128) for the dichotomized and 83.6% (107/128) for the trichotomized subgroups, respectively]. With an independent cohort (n = 72), we confirmed that these models using both the dichotomized and trichotomized classifications were highly predictive of survival. Our study demonstrates that machine learning algorithms can generate clinical parameter-based models that predict the survival outcomes of patients with JMML and high accuracy. These models enabled us to rapidly and effectively identify candidates for augmented treatment following diagnosis.
Similar content being viewed by others
Introduction
Juvenile myelomonocytic leukemia (JMML) is a rare myelodysplastic/myeloproliferative neoplasm that occurs during infancy and early childhood. It is characterized by excessive myelomonocytic cell proliferation and granulocyte–macrophage colony-stimulating factor hypersensitivity1,2,3,4,5. More than 90% of patients with JMML harbor mutually exclusive somatic and/or germline mutations in canonical RAS pathway genes, including PTPN11, NF1, NRAS, KRAS, and CBL6.
Three independent studies described DNA methylation subgroups in JMML based on genome-wide DNA methylation data7,8,9. These studies demonstrated that high methylation subgroups were highly correlated with most of the established JMML risk factors, including older age10,11, higher hemoglobin F (HbF)10,11, lower platelet count10,11, PTPN11/NF1 mutations6,12, the presence of secondary genetic events13,14, LIN28B overexpression15, and an AML-like expression profile16, and the high methylation subgroups were also associated with poor survival17,18. Recently, a joint analysis of the above three groups established an international consensus definition for the DNA methylation subgroups of JMML19.
Methylation subgrouping will be an important parameter for stratifying patients with JMML for treatment in upcoming clinical trials. However, the high cost and relatively long turnaround time will likely limit the clinical implementation of DNA methylation analysis to a few developed countries. Thus, the development of an inexpensive and rapid predictor of DNA methylation profiling is likely to benefit patients with JMML.
Decision trees, support vector machine (SVM), and naïve Bayes models are widely used supervised machine learning techniques. A decision tree is an intuitive approach of classification using the standard classification and a regression tree algorithm to select a split with the best optimization criterion, which is then recursively repeated for the two child nodes20. SVM classifies data by determining the linear decision boundary, also known as the hyperplane, that separates all data points of one class from another. The best hyperplane for SVM is the one with the largest margin between the two classes21. Naïve Bayes is a family of classifiers that implements Bayesian techniques to form a simple network based on previously established probabilities22.
In the present study, we performed a preliminary clinical parameter-based prediction of DNA methylation classification using a large cohort of patients with JMML via an international collaboration. We utilized the above-mentioned three machine learning algorithms to classify patients into two or three subgroups in accordance with DNA methylation profiles and assessed the clinical utility of these algorithms for predicting prognosis in an independent cohort. Implementation of these models will allow providers to stratify patients with JMML in accordance with recently published DNA methylation classifications.
Results
Training data
We collected previously published clinical data and 450k array DNA methylation data from 130 patients with JMML from two of our prior studies (93 from Nagoya University, Japan; 37 from University of California, San Francisco, USA)8,9. After excluding two patients with incomplete clinical information, we implemented three supervised machine learning algorithms, i.e., a decision tree, SVM, and a naïve Bayes model, to predict the dichotomized methylation profiles for the remaining 128 patients with JMML (Fig. 1A). The baseline characteristics of the aggregated cohort are summarized in Table 1. The concordance rate was 82.8% for the decision tree, 83.6% for the SVM, and 82.8% for the naïve Bayes model (Fig. 1B). The decision tree used age, HbF, and PTPN11 mutation status to stratify patients. Based on the classification by the decision tree, we also compared the baseline characteristics between patients with clinically predicted LM (c-LM) and clinically predicted HM/IM (c-HM/IM). Univariable logistic regression models showed associations between clinical parameters and the dichotomized outcome (HM/IM vs. LM). The models indicated that the following factors were associated with the 450k array-based methylation classification: older age (≥ 24 months of age) (P = 6.7 × 10−8), age-adjusted HbF elevation (P = 8.2 × 10−8), the presence of CBL mutations (P = 6.8 × 10−4), PTPN11 mutations (P = 6.0 × 10−8), NF1 mutations (P = 0.017), and NRAS mutations (P = 0.003). Lower platelet count and KRAS mutations were not significantly associated factors (P = 0.67 and 0.98, respectively) (Table 2).
We also implemented the three algorithms to mimicking three DNA methylation subgroups. There was a high concordance rate between predicted and actual methylation classification profiles in the decision tree (82.8%) and the SVM (83.6%), although the naïve Bayes model failed to classify the patients into three subgroups (Fig. 2A). The decision tree in this analysis used the following variables: age, HbF, monosomy of chromosome 7 (monosomy 7), platelet counts, and PTPN11, CBL, and KRAS mutations (Fig. 2B).
Survival analysis
In survival analyses, the incidence rate of death was 10.6 [95% confidence interval (CI) 7.9–14.2] per 100 patient-years and that of transplantation was 55.2 (95% CI 45.3–67.1) per 100 patient-years in the training cohort, while it was 10.0 (95% CI 6.1–16.3) and 111 (95% CI 86.1–143.4) respectively, in the validation cohort. The median transplantation-free survival (TFS) was 6.5 and 6.0 months in the training and validation cohorts, respectively.
Patients with array-based LM had significantly higher overall survival (OS) and TFS than those with array-based HM/IM (P = 0.001 and 8.35 × 10−12, respectively) (Fig. 3A,B). Based on the decision tree algorithm, patients with c-LM (n = 73) had higher OS and TFS than those with c-HM/IM (n = 55) (P = 0.015 and 7.54 × 10−14, respectively) (Fig. 3C,D). The SVM classified patients into a c-LM subgroup with 76 patients and a c-HM/IM subgroup with 52 patients; it showed a higher OS and TFS in the c-LM group (P = 0.004 and 7.15 × 10−14, respectively) (Supplementary Fig. S1A,B). The naïve Bayes model classified patients into a c-LM subgroup with 73 patients and a c-HM/IM subgroup with 55 patients, and showed a higher OS and TFS in the c-LM group (P = 0.015 and 7.54 × 10−14, respectively) (Supplementary Fig. S2A,B).
Next, we implemented these algorithms to analyze the survival of patients in these three methylation subgroups. Figure 4A,B show the Kaplan–Meier estimates comparing array-based HM, IM, and LM. Both OS and TFS were significantly different across the array-based methylation subgroups (P = 0.002 and 1.25 × 10−13, respectively). Regarding the preliminary clinically predicted methylation subgroups, both OS and TFS were significantly different across the subgroups in the decision tree (P = 0.0026 and 1.74 × 10−13, respectively) (Fig. 4C,D) and the SVM (P = 6.05 × 10−5 and 1.51 × 10−13, respectively) (Supplementary Fig. S3A,B). The naïve Bayes model failed to classify patients into three subgroups.
Validation using an independent cohort
We applied the clinical parameter-based prediction models to an independent cohort (n = 72) (Table 1). Using dichotomized methylation classification, preliminary clinically predicted methylation classification showed a statistically significant difference between c-LM and c-HM/IM in both OS and TFS based on the decision tree algorithm (P = 0.042 and 0.007, respectively) (Fig. 3E,F), the SVM (P = 0.17 and 0.025, respectively) (Supplementary Fig. S1C,D), and the naïve Bayes model (P = 0.020 and 0.001, respectively) (Supplementary Fig. S2C,D). The clinically predicted trichotomized DNA methylation subgroup (cLM, c-IM, and c-HM) analyses showed a significant difference in TFS (P = 0.0019) but not in OS (P = 0.46) in the decision tree algorithm (Fig. 4E,F), as it did in the SVM (P = 0.0068 for TFS and 0.36 for OS, respectively) (Supplementary Fig. S3C,D).
Comparison of prognostic performance by the models
The prognostic performance of each model was compared using Harrel’s C statistic (Table 3). Compared to array-based methylation classification, the C-statistics for the models derived from machine learning algorithms were comparable for both OS and TFS in the training cohort. In the validation cohort, the decision tree-based reclassification model was comparable to the other models for both OS and TFS.
Discussion
In the present study, we implemented machine learning algorithms to develop preliminary clinical parameter-based prediction models of methylation profiles using an international cohort of patients with JMML and demonstrated a high concordance between DNA methylation and clinically predicted DNA methylation subgroups. We also assessed the validity of the models for assessing survival in an independent cohort. To our knowledge, this was the first study to develop clinical parameter-based models to predict an international consensus definition of DNA methylation subgroups in JMML19.
Notably, we developed prognostic models based on methylation classification, which may have the advantage of being able to predict high-risk JMML regardless of future changes in treatment, while the prognosis-based approach may not be applicable in future patients due to medical advances. In this regard, we believe methylation-based prediction would be a better approach to predict patients’ prognosis.
The models consisted of well-known clinical predictors, and the mechanisms for each parameter can be accounted for by prior knowledge4,11. In our clinical models, age, age-adjusted HbF elevation, and PTPN11 mutation were specified as classifiers to dichotomize the methylation profile, while age, age-adjusted HbF elevation, platelet count, PTPN11 and CBL mutations, and monosomy 7 were utilized as classification factors for the three methylation subgroups. Among the three machine learning algorithms we tested, namely, decision tree, SVM, and naïve Bayes models, decision tree algorithms were more intuitive than the others and were in line with our clinical knowledge as well as previous studies. Furthermore, the discrimination performance of the decision tree algorithm-based model was comparable to the array-based model in the training cohort and the other models in the validation cohort. Although the concordance between predicted and array-based classification was quite high for both dichotomous and trichotomous methylation profiles, dichotomized methylation enabled us to stratify patients with JMML simply and effectively for both OS and TFS.
Our models enabled us to effectively dichotomize patients with JMML (c-HM/IM and c-LM). In the training cohort, the 2-year TFS was nearly 50% in the c-LM group. The models were also capable of stratifying patients for augmented treatment for c-HM/IM patients. There is also clinical utility in identifying the patients most likely to survive in the absence of hematopoietic stem cell transplantation, in particular in developing countries.
The decision tree algorithm successfully stratified patients with JMML into three groups, as did the consensus classification in the previous study. However, the decision tree algorithm fitted to the validation cohort for categorization into three methylation groups showed a significant difference in TFS but not in OS. This likely reflects the shorter duration of follow-up in the validation cohort. Using a consensus classification, TFS in the HM and IM groups was equally poor in the European, American, and Japanese cohorts19. These findings suggest that the HM and IM groups may be biologically similar to one another.
There were some limitations in this study. First, the sample size in our study was relatively small. However, this international collaborative study contained a substantial number of patients with JMML despite the rarity of the disease, and machine learning algorithms enabled us to enhance the statistical power as well as the precision of prediction. Second, because of the lack of methylation profile data in the validation cohort, we have not been able to test whether the risk classifier established in this study can reproduce the methylation profile itself, and validation is needed in future studies.
In conclusion, we successfully developed preliminary clinical parameter-based DNA methylation prediction models and tested them in an independent cohort. These models stratified patients based on the necessity for more intensive treatment, including transplantation. With additional validation, these models will be helpful for patients with JMML worldwide.
Methods
Study design
We developed prediction models of methylation classification using machine learning algorithms in a training cohort and assessed the models’ survival analysis efficacy using a validation cohort. We implemented three machine learning algorithms to dichotomize the subjects. Then, we derived classification models from the training cohort using machine learning algorithms and implemented these models to predict OS and TFS in a validation cohort to assess efficacy. We developed and validated the prediction models based on the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) Statement. TRIPOD is intended to help readers better understand the study design, conduct, analysis, and interpretation of data and assess the validity, transportability, and application of study results23,24. All patients’ parents or legal guardians provided informed consent according to the Declaration of Helsinki. Institutional ethics committees approved the storage and collection of patient specimens. The ethics committee of the Nagoya University Graduate School of Medicine approved this study.
Data sources
We used data from recently published international consensus definitions for the training cohort19. We also used data from 72 patients from Japan and the United States with available clinical information, which had not previously undergone methylation analysis for the validation cohort to examine the validity of the clinical model to predict OS and TFS.
Statistical analysis
We compared the mutation frequency and other clinical features between the disease groups using Fisher’s exact test. OS and TFS were calculated using the Kaplan–Meier method. For the TFS analysis, transplantation and death from any cause were censored as events.
We implemented three different supervised machine learning algorithms (decision tree, SVM, and naïve Bayes model) to differentiate the methylation profiles. Features were extracted as older age (> 2 years old), male sex, decreased platelet count (< 33,000), elevated age-adjusted HbF, monosomy 7, and mutations in PTPN11, CBL, NRAS, and KRAS and were dichotomized (as in Supplementary Data). Standard methods (fitcsvm, fitcnb, fitctree, and predict) from the Matlab®2020b Statistics and Machine Learning Toolbox with default parameter settings were used to derive the naïve Bayes classifier, the SVM-based classifier, and the decision tree. Tenfold cross-validation was employed to internally validate the models. First, we dichotomized the methylation profiles into two groups: HM/IM and LM. Second, we implemented each model to differentiate the methylation profiles into three categories: HM, IM, and LM. We first developed the decision tree algorithm as our primary analysis since it is an intuitive approach. Survival differences were tested using the log-rank test. All statistical analyses were performed using Stata 17.0 MP software (Stata Corp., College Station, TX, USA), and the machine learning algorithms were developed using MATLAB software (The MathWorks, Inc. Natick, MA, USA).
Data availability
The data underlying this article cannot be shared publicly to protect the study participants’ privacy.
References
Aricò, M., Biondi, A. & Pui, C. H. Juvenile myelomonocytic leukemia. Blood 90, 479–488 (1997).
Niemeyer, C. M. & Kratz, C. Juvenile myelomonocytic leukemia. Curr. Oncol. Rep. 5, 510–515 (2003).
Loh, M. L. Recent advances in the pathogenesis and treatment of juvenile myelomonocytic leukaemia. Br. J. Haematol. 152, 677–687 (2011).
Niemeyer, C. M. et al. Chronic myelomonocytic leukemia in childhood: A retrospective analysis of 110 cases. Blood 89, 3534–3543 (1997).
Arber, D. A. et al. The 2016 revision to the World Health Organization classification of myeloid neoplasms and acute leukemia. Blood 127, 2391–2405 (2016).
Niemeyer, C. M. & Flotho, C. Juvenile myelomonocytic leukemia: Who’s the driver at the wheel? Blood 133, 1060–1070 (2019).
Lipka, D. B. et al. RAS-pathway mutation patterns define epigenetic subclasses in juvenile myelomonocytic leukemia. Nat. Commun. 8, 2126 (2017).
Stieglitz, E. et al. Genome-wide DNA methylation is predictive of outcome in juvenile myelomonocytic leukemia. Nat. Commun. 8, 2127 (2017).
Murakami, N. et al. Integrated molecular profiling of juvenile myelomonocytic leukemia. Blood 131, 1576–1586 (2018).
Passmore, S. J. et al. Pediatric myelodysplasia: A study of 68 children and a new prognostic scoring system. Blood 85, 1742–1750 (1995).
Locatelli, F. & Niemeyer, C. M. How I treat juvenile myelomonocytic leukemia. Blood 125, 1083–1090 (2015).
Yoshida, N. et al. Correlation of clinical features with the mutational status of GM-CSF signaling pathway-related genes in juvenile myelomonocytic leukemia. Pediatr. Res. 65, 334–340 (2009).
Sakaguchi, H. et al. Exome sequencing identifies secondary mutations of SETBP1 and JAK3 in juvenile myelomonocytic leukemia. Nat. Genet. 45, 937–941 (2013).
Stieglitz, E. et al. Subclonal mutations in SETBP1 confer a poor prognosis in juvenile myelomonocytic leukemia. Blood 125, 516–524 (2015).
Helsmoortel, H. H. et al. LIN28B overexpression defines a novel fetal-like subgroup of juvenile myelomonocytic leukemia. Blood 127, 1163–1172 (2016).
Bresolin, S. et al. Gene expression-based classification as an independent predictor of clinical outcome in juvenile myelomonocytic leukemia. J. Clin. Oncol. 28, 1919–1927 (2010).
Luna-Fineman, S. et al. Myelodysplastic and myeloproliferative disorders of childhood: A study of 167 patients. Blood 93, 459–466 (1999).
Locatelli, F. et al. Hematopoietic stem cell transplantation (HSCT) in children with juvenile myelomonocytic leukemia (JMML): Results of the EWOG-MDS/EBMT trial. Blood 105, 410–419 (2005).
Schönung, M. et al. International consensus definition of DNA methylation subgroups in juvenile myelomonocytic leukemia. Clin. Cancer Res. 27, 158–168 (2021).
Breiman, L. Classification and Regression Trees (Chapman & Hall, 2017).
Noble, W. S. What is a support vector machine? Nat. Biotechnol. 24, 1565–1567 (2006).
Jensen, F. An Introduction to Bayesian Networks (Springer, 1996).
Moons, K. G. M. et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): Explanation and elaboration. Ann. Intern. Med. 162, W1–W73 (2015).
Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. Ann. Intern. Med. 162, 55–63 (2015).
Acknowledgements
The authors acknowledge all clinicians, patients, and their families.
Funding
The Japan Society for the Promotion of Science KAKENHI Grant Number 18K07816 supported this study.
Author information
Authors and Affiliations
Contributions
T.I. analyzed data and wrote the paper. J.M., M.W., H.K., and N.M. performed research and analyzed data. Y.O. performed research, analyzed data, and wrote the paper. S.K., Y.T., and M.L. collected clinical samples and analyzed data. E.S. and H.M. designed and analyzed data, led the project, and wrote the paper. All authors reviewed and approved the final version of the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Imaizumi, T., Meyer, J., Wakamatsu, M. et al. Clinical parameter-based prediction of DNA methylation classification generates a prediction model of prognosis in patients with juvenile myelomonocytic leukemia. Sci Rep 12, 14753 (2022). https://doi.org/10.1038/s41598-022-18733-4
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-022-18733-4
- Springer Nature Limited