Introduction

Acute myeloid leukemia (AML) is a common hematological malignancy, comprising approximately 15% of leukemia cases. AML is driven by a series of genetic and epigenetic events, with induction chemotherapy and hematopoietic stem cell transplantation remaining the standard treatments [1, 2]. Despite a relatively high response rate, the prognosis of AML patients varies. Risk factors for poor prognosis in AML include clinical demographics, blood cell counts, serum oncological indicators, protein expression, and gene profiles [3, 4]. However, there is still an urgent need for new noninvasive, objective, and time-efficient methods to identify patients with poor prognosis and to aid the choice of therapy.

Chest computed tomography (CT) is the most widely used imaging method for routinely assessing lung conditions in all malignant diseases, including AML. However, it is challenging to predict prognostic status from the traditional radiological evaluation of CT images in AML. CT-based body composition evaluation, for example of myosarcopenia, refers to the computational extraction and analysis of imaging features from clinically acquired radiological images. Sarcopenia (loss of lean muscle mass) is an important cause of longer hospital stays and premature mortality in patients with nonmalignant disease [5]. Recently, the preliminary prognostic impact of sarcopenia has also been highlighted in patients with AML [6,7,8] and acute lymphoblastic leukemia [9], demonstrating the potential for imaging-based prognostic prediction in leukemia. Notably, these studies were all based on abdominal CT imaging. Unfortunately, AML patients do not routinely undergo abdominal CT scans unless they have coexisting abdominal diseases. Instead, in many institutions, a pretreatment chest CT is performed for AML patients prior to hospitalization. Nevertheless, whether CT-derived features can be quantified from pretreatment chest CT to predict the prognosis of AML patients is still unknown.

Methods

Patients

Through an evaluation of our institutional medical database from January 2010 to February 2020, data were retrospectively collected for patients with de novo AML diagnosed in accordance with WHO criteria in ten participating hospitals. Details of the patient recruitment process and exclusion criteria are shown in Fig. 1. All 952 patients were randomly allocated to a training cohort (476 patients) or a validation cohort (476 patients). The general information of patients in these two cohorts is presented in Table 1.
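The random allocation described above can be sketched as follows. This is only an illustration: the source does not state how the randomization was carried out, so the function name `split_cohort`, the use of a seeded shuffle, and the seed value are all assumptions.

```python
import random


def split_cohort(patient_ids, n_training, seed=0):
    """Randomly split patient IDs into a training and a validation cohort.

    The seeded shuffle is an assumed procedure for illustration; the study
    does not report the exact randomization method.
    """
    shuffled = list(patient_ids)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_training], shuffled[n_training:]


# 952 patients split into two cohorts of 476, matching the cohort sizes above.
training, validation = split_cohort(range(952), n_training=476)
```

Every patient lands in exactly one cohort, so the two lists are disjoint and together cover the full set of IDs.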

Table 1 Clinical and body composition status data of 952 leukemia cases

Acquisition and retrieval procedure of CT images and radiological and body composition feature extraction

CT image acquisition, the image retrieval procedure, the algorithms for radiological and body composition feature extraction, and the intra-observer (reader 1 twice) and inter-observer (reader 1 vs. reader 2) reproducibility evaluation were performed as previously described [10]. Briefly, CT-derived features evaluated by reader 1, including skeletal muscle index (SMI), skeletal muscle radiation attenuation (SM-RA), liver CT value (liver_CTV; Hounsfield units, HU), visceral or subcutaneous fat index (VFI or SFI), spleen CT value (spleen_CTV), and subcutaneous fat CT value (SF_CTV), were obtained from the initial chest CT images at the level of the fourth thoracic vertebra (T4) and were used to build models to predict the prognosis of AML.

Development of an individualized prediction model

The multivariable Cox regression method, which is suitable for the regression of medical data, was applied to the candidate clinical and radiological predictors. Briefly, LASSO regression and clinical experience were used to screen for correlated factors [11]. Then, a prognostic model for predicting the 1-, 2-, 3-, and 5-year OS probabilities was developed by Cox regression. Predictive models were also constructed using an ensemble machine learning method and a deep learning algorithm. Details of the machine learning and deep learning methods can be found in the supplementary methods.
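As a rough illustration of how a fitted Cox model yields 1-, 2-, 3-, and 5-year OS probabilities, the sketch below applies the standard proportional-hazards relation S(t | x) = S0(t)^exp(lp), where lp is the linear predictor. All coefficients, baseline survival values, and names (`HYPOTHETICAL_COEFS`, `HYPOTHETICAL_BASELINE_SURVIVAL`, `predict_os`) are placeholders invented for this example. They are not the fitted values of the model in this study.

```python
import math

# Hypothetical log-hazard-ratio coefficients (NOT the fitted values of this study).
HYPOTHETICAL_COEFS = {"age": 0.02, "spleen_ctv": -0.01, "sf_ctv": 0.015}

# Hypothetical baseline survival S0(t) at 1, 2, 3 and 5 years (NOT from this study).
HYPOTHETICAL_BASELINE_SURVIVAL = {1: 0.90, 2: 0.80, 3: 0.75, 5: 0.70}


def linear_predictor(covariates, coefs):
    """Sum of coefficient * covariate value over all model terms."""
    return sum(coefs[name] * covariates[name] for name in coefs)


def predict_os(covariates,
               coefs=HYPOTHETICAL_COEFS,
               baseline=HYPOTHETICAL_BASELINE_SURVIVAL):
    """Return predicted OS probability per horizon: S0(t) ** exp(lp)."""
    hazard_ratio = math.exp(linear_predictor(covariates, coefs))
    return {year: s0 ** hazard_ratio for year, s0 in baseline.items()}


# Covariates are assumed to be centered, so all zeros is the reference patient.
reference = predict_os({"age": 0, "spleen_ctv": 0, "sf_ctv": 0})
older = predict_os({"age": 10, "spleen_ctv": 0, "sf_ctv": 0})
```

With a positive coefficient for age, the older hypothetical patient receives a lower predicted OS probability at every horizon than the reference patient. A nomogram is a graphical form of the same calculation.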

Performance, validation and clinical use of the nomogram

The calibration of the nomogram was evaluated by calibration curves, and its discrimination was quantified with Harrell’s C-index.
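For readers unfamiliar with Harrell's C-index, a minimal pure-Python sketch is given below. It is a simplified version: pairs with tied survival times are ignored, ties in risk score count as half-concordant, and the function name `harrell_c_index` is chosen for this example. It is not the implementation used in the study.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: observed follow-up times
    events: 1 if death was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died strictly before
            # patient j's observed time (whether j died or was censored).
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: scores perfectly ordered against survival time.
c_perfect = harrell_c_index([1, 2, 3, 4], [1, 1, 1, 0], [4, 3, 2, 1])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.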

The performance of the nomogram was tested in the validation cohort with a series of indicators [12]. The concordance index (C-index) and the area under the time-dependent receiver operating characteristic (ROC) curve (AUC) were used to evaluate the discriminative ability of the new model. The evaluation was supplemented with two further indicators, net reclassification improvement (NRI) and integrated discrimination improvement (IDI), to increase the accuracy and comprehensiveness of the comparisons. The consistency between the survival probabilities predicted by the nomogram and the observed outcomes was evaluated with calibration plots. Finally, decision-curve analysis (DCA) was performed to evaluate the clinical validity of the model. The total scores on the nomogram were divided into high- and low-risk groups by X-tile software. The MSN risk groups were identified by combining the CT-MSF nomogram risk and the ELN risk.
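The quantity plotted in a decision curve is the net benefit at a given threshold probability. The sketch below uses the standard binary-outcome definition, net benefit = TP/n - (FP/n) * pt/(1 - pt). It assumes a fixed time horizon and ignores censoring, so it is a simplification of a survival DCA such as the one in this study. The function names and toy data are invented for the example.

```python
def net_benefit(predicted_risk, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    outcomes: 1 if the event occurred by the chosen horizon, else 0.
    Censoring is ignored in this simplified binary version.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    pairs = list(zip(predicted_risk, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    return net_benefit([1.0] * len(outcomes), outcomes, threshold)


# Toy data (hypothetical): four patients, two of whom had the event.
risks = [0.9, 0.8, 0.2, 0.1]
events = [1, 0, 1, 0]
nb_model = net_benefit(risks, events, threshold=0.15)
nb_all = net_benefit_treat_all(events, threshold=0.15)
```

A model is considered useful at a threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.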

Statistical analysis

All statistical analyses were performed using IBM SPSS Statistics software (version 27.0, SPSS, Chicago, IL, USA), R software (version 4.0.3; http://www.Rproject.org) and X-tile software (version 3.6.1; http://tissuearray.org/). A two-sided P value of < 0.05 was considered statistically significant. *, P < 0.05. **, P < 0.01. ***, P < 0.001.

Data availability

The data and code of this study are available from the corresponding author upon request.

Results

Enrollment, characteristics and CT-derived features of the AML patients

A total of 952 AML patients from ten different hospitals (H1-10) were retrospectively included in this study. These patients were randomly assigned to the training cohort or the validation cohort (Fig. 1A). The characteristics of these patients are presented in Fig. 1B and Table 1. No significant difference was observed between the two cohorts in any parameter. Image pre-processing and analysis and the construction of the prediction models were performed as shown in Fig. 1C.

Fig. 1

Enrollment, clinicopathological characteristics and body composition assessment of AML patients. (A) Flow-chart demonstrating the exclusion criteria and the patient recruitment process with the reason for exclusion. (B) Sankey diagram showing the scmap cluster projection of the key characteristics of 952 patients used in this study, including hospital, gender, age composition, risk, treatment response and BMT status. (C) Workflow of the body composition feature- and image-based analysis in this study. VFA, visceral fat area; MFA, muscle fat area; SMA, skeletal muscle area; SFA, subcutaneous fat area; Liver_CTV, CT value of liver parenchyma; VFI, visceral fat index; MFI, muscle fat index; SMI, skeletal muscle index; Spleen_CTV, spleen CT value; SF_CTV, subcutaneous fat CT value; myosarco, myosarcopenia. Risk, ELN risk; BMT, bone marrow transplantation; OS, overall survival; CR, Cox proportional-hazards regression; RSF, random survival forest; DL, deep learning.

Establishment and evaluation of the CT-MSF model

To identify prognostic factors for AML, we performed LASSO regression analysis with clinicopathological characteristics and CT-derived features as variables. The LASSO regression analysis indicated that myosarcopenia, bone marrow transplantation (BMT), ELN risk, spleen CT value (spleen_CTV), subcutaneous fat CT value (SF_CTV), age, red blood cell distribution width (RDW), high-density lipoprotein (HDL) and triglyceride (TG) were independent predictors of overall survival (OS) (Fig. 2A, supplementary Table 1). Using multivariate Cox regression analysis, we established a new model integrating myosarcopenia, spleen_CTV and SF_CTV (CT-MSF). The CT-MSF nomogram achieved C-indexes of 0.735 and 0.690 in the training and validation cohorts, respectively, which were significantly higher than the C-indexes of 0.557 and 0.557 for the ELN risk model (P < 0.05). Moreover, this predictive model achieved AUCs of 0.717, 0.794, 0.796 and 0.792 for predicting the 1-, 2-, 3- and 5-year OS probabilities in the validation cohort, respectively. The CT-MSF model (red) had significantly better predictive performance than the traditional ELN risk model (blue) (Fig. 2B-E). Interestingly, the CT-MSF nomogram (cyan) added more benefit than the ELN risk model (purple) (Figure S1). The calibration curves of the nomogram for the probability of OS showed good agreement between prediction and observation in the validation cohort (Figure S2). The net reclassification improvement (NRI) and integrated discrimination improvement (IDI) also indicated that the CT-MSF model achieved satisfactory efficiency (supplementary Table 2). Machine learning and deep learning models were also constructed but did not perform better than the traditional Cox method described above (data not shown). Collectively, we identified independent predictors and developed a CT-MSF nomogram that predicts the prognosis of AML with high accuracy.

Fig. 2

Development and validation of the CT-MSF prediction model. (A) CT-based MSF nomogram for the prognosis of AML. The CT-based MSF model was developed in the training cohort, incorporating CT-derived parameters including myosarco, Spleen_CTV, and SF_CTV. Myosarco, myosarcopenia. Spleen_CTV, spleen CT value. SF_CTV, subcutaneous fat CT value. RDW, red blood cell distribution width. HDL, high-density lipoprotein. TG, triglyceride. Risk1, ELN high risk. Risk2, ELN intermediate risk. Risk3, ELN low risk. BMT 0, no BMT. BMT 1, with BMT. Myosarco 0, no sarcopenia. Myosarco 1, myosarcopenia. Myosarco 2, myosteatosis. Myosarco 3, sarcopenia. (B-E) Receiver operating characteristic (ROC) curves for validation of the CT-MSF model (red) and the ELN risk model (blue) at 1 year (B), 2 years (C), 3 years (D) and 5 years (E). The areas under the curves (AUCs) for both models in the validation cohort of 476 patients are shown. Nomogram, the CT-MSF model. Risk, the ELN risk model.

We next assessed whether the CT-MSF model could be used to predict overall survival by applying the model to the whole dataset. Based on the scores generated by our model, all cases were classified into CT-MSF high-risk and CT-MSF low-risk groups (CT-MSF risk) using the X-tile method. As expected, patients in the high-risk group had shorter survival than low-risk patients (p < 0.0001, log-rank test) (Figure S3A, supplementary Table 3). We also observed that the CT-MSF high-risk patients had a higher cumulative hazard than the low-risk group (Figure S3B).
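The two-group log-rank comparison mentioned above can be sketched in pure Python as follows. This is a textbook form of the test (observed minus expected deaths in one group, with the hypergeometric variance, referred to a chi-square distribution with one degree of freedom). The function name `logrank_test` and the toy data are invented for the example, and the study's own analysis was run in statistical software rather than with this code.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = sum(1 for ti, _, _ in pooled if ti >= t)
        at_risk_a = sum(1 for ti, _, g in pooled if ti >= t and g == 0)
        deaths = sum(1 for ti, e, _ in pooled if ti == t and e == 1)
        deaths_a = sum(1 for ti, e, g in pooled if ti == t and e == 1 and g == 0)
        observed_a += deaths_a
        # Expected deaths in group A if both groups shared one hazard.
        expected_a += deaths * at_risk_a / at_risk
        if at_risk > 1:
            frac_a = at_risk_a / at_risk
            variance += (deaths * frac_a * (1 - frac_a)
                         * (at_risk - deaths) / (at_risk - 1))
    if variance == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (hypothetical): group A dies earlier than group B.
chi2, p = logrank_test([1, 2], [1, 1], [3, 4], [1, 1])
```

The cumulative hazard curves in Figure S3B summarize the same event counts in a different form; they are not computed here.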

Stratification of AML patients by a combination of the CT-MSF and ELN models

Next, we plotted the overall survival of this cohort by the combination of risk subgroups determined by the CT-MSF model and the ELN risk model to stratify these 952 patients. Interestingly, the survival curves fell mainly into 3 new groups. Patients with CT-MSF low risk and ELN low or intermediate risk formed a new MSF-ELN (MSN) low-risk group with better survival. Patients with CT-MSF low risk and ELN high risk formed the new MSN intermediate-risk group, which had an intermediate survival time. Notably, all CT-MSF high-risk patients fell into the new MSN high-risk group and had worse survival, regardless of their ELN risk (Fig. 3). Taken together, these results indicate that our CT-MSF model could further stratify patients when combined with the ELN risk model.
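The grouping rule described in this paragraph can be written as a small lookup function. The function name `msn_risk` and the string labels are chosen for this sketch. The rule itself follows the text above.

```python
def msn_risk(ct_msf_risk, eln_risk):
    """Combine CT-MSF risk and ELN risk into the MSN risk group.

    ct_msf_risk: "low" or "high"
    eln_risk: "low", "intermediate" or "high"
    """
    if ct_msf_risk not in {"low", "high"}:
        raise ValueError(f"unknown CT-MSF risk: {ct_msf_risk!r}")
    if eln_risk not in {"low", "intermediate", "high"}:
        raise ValueError(f"unknown ELN risk: {eln_risk!r}")
    # CT-MSF high risk maps to MSN high risk regardless of ELN risk.
    if ct_msf_risk == "high":
        return "high"
    # CT-MSF low risk with ELN high risk forms the MSN intermediate group.
    if eln_risk == "high":
        return "intermediate"
    # CT-MSF low risk with ELN low or intermediate risk forms the MSN low group.
    return "low"


example = msn_risk("low", "high")
```

The six possible input combinations therefore collapse into three MSN groups.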

Fig. 3

Overall survival of 952 patients stratified by the combination of the CT-MSF model and the ELN risk model. Low, low risk. Inter, intermediate risk. Hi, high risk. *, P < 0.05. ***, P < 0.001.

To clarify whether the effect of intensive and low-intensity treatment could be predicted by this new MSN risk score, we conducted a prognostic analysis of patients who received different treatments. The results showed that the overall survival of patients with low and intermediate MSN risk was not affected by the choice of chemotherapy (Figure S4A and B). In contrast, among the MSN high-risk patients, the survival time of patients receiving intensive treatment was significantly longer than that of patients receiving low-intensity treatment (P < 0.05, Figure S4C). Our data indicate that the new MSN stratification system combining the CT-MSF model and the ELN risk model could aid in choosing therapy.

Discussion

Our findings provide preliminary data to support the inclusion of CT-derived body composition parameters, such as sarcopenia, as biomarkers of prognosis in AML. Recent studies have reported that sarcopenia has adverse implications, including increased disease severity, longer hospital stays, more complications, earlier postoperative recurrence, and poor prognosis [6,7,8]. Our findings were generally in line with these prior studies. Moreover, our model included Spleen_CTV and SF_CTV in addition to sarcopenic features. Consistently, these parameters are known to be related to prognosis in AML. A lower Spleen_CTV reflects decreased spleen density due to multiple complex causes, including systemic inflammation, abnormal fat deposition, tumor cell infiltration, and spleen tissue cell damage [13]. An increased SF_CTV may have various causes, with systemic inflammation as the most common contributor. Therefore, myosarcopenia, Spleen_CTV and SF_CTV can be used to predict the overall treatment response; they represent multiscale biological characteristics associated with metabolic status, gene status, and various pathological conditions, and could potentially predict prognosis in patients with AML.

Our model predicted prognosis better than the ELN risk model alone, as evidenced by the AUC values, C-index, NRI, IDI and DCA in the validation cohort. We speculate that the improved model performance in our study may be due to our research strategy. First, we enrolled 952 patients from ten hospitals, and pretreatment chest CT images are commonly available in all hospitals in China. Moreover, the CT images used in our cohort were obtained from different scanners with the same non-enhanced scanning protocol, which largely accounted for imaging variability. Finally, in addition to Cox regression, we also tried other algorithms, including an ensemble machine learning classification method and deep learning, to select the best method. Our study showed that the Cox model with an optimization algorithm has better expressiveness and is not inferior in performance to any of the other models.

In addition to CT-derived features, our MSF nomogram also includes clinical characteristics that have yet to be included in prognostic prediction models of AML, namely RDW, HDL and TG. RDW is traditionally considered a marker for the differential diagnosis of anemia, and it has been reported as a prognostic factor in AML [14]. Increased RDW is related to oxidative stress, poor nutritional status and older age, and may also suggest a proinflammatory state [15]. HDL and TG are both indicators of lipid metabolism, and HDL levels have been suggested to be associated with the pathogenesis of AML [16]. Dysregulated lipid metabolism has been reported to be involved in the pathogenesis of AML, and several key enzymes involved in lipid synthesis, such as HMGCR, have been explored as targets for cancer treatment [16,17,18]. Consistently, these three traits serve as independent and important contributing factors in our nomogram.

However, there were limitations to this study. First, though we adopted a multicenter design in the present study, potential selection bias was unavoidable. In addition, the retrospective nature limits our exploration of the imaging and pathological correlation. Moreover, our sample size was still modest. In the future, a larger independent, prospective, multicenter study is needed for validation.

Conclusions

In summary, we presented an MSF nomogram utilizing chest CT images to predict prognosis in patients with AML. The combination of the MSF nomogram and ELN risk yielded an MSN stratification system that could stratify AML patients. Moreover, this MSN stratification system could be used to assist in the choice of therapy to achieve better outcomes. The information from the current study could be used to assist clinicians in selecting optimal therapies for personalized treatment of AML.