Introduction

Ewing sarcoma (ES) is a highly aggressive bone sarcoma that can occur at any age, with a peak incidence in children and young adults. The most powerful adverse prognostic factor across treatment strategies in ES is the presence of metastases, with the lungs and bone being the most common metastatic sites. At presentation, approximately 20–25% of patients have metastases, usually affecting the lungs (70–80%) and the bone (40–45%) [1, 2]. The overall survival (OS) of patients with localized disease has improved remarkably with multimodal treatment, with a 10-year rate of 55–65%. However, the outcome of patients with pulmonary metastases remains suboptimal, with a 2- to 10-year event-free survival (EFS) of 30–36%, and is still a major concern [1, 3].

Computed tomography (CT) is widely used for diagnosing pulmonary metastases as a non-invasive and reliable method. However, because micro-metastases cannot be identified by current radiological techniques, the accuracy of detecting pulmonary metastases at the time of diagnosis needs to be improved [4]. Radiomics analysis, which refers to extracting quantitative features from medical images and converting them into high-dimensional data, has been applied in various types of tumors for diagnosis and prognosis and as an imaging biomarker of treatment response [5,6,7]. Few studies have employed radiomics to identify patients at risk of developing pulmonary metastases [8, 9], and Dai et al. used multiparametric magnetic resonance imaging (MRI)-based radiomic analysis to distinguish Ewing sarcoma from osteosarcoma [10]. However, to the best of our knowledge, no studies have reported the use of radiomics to predict pulmonary metastases in ES.

This study aimed to develop and validate radiomics models based on CT to predict pulmonary metastases in patients with ES, which could be a potential tool for guiding more personalized medicine.

Materials and methods

Patients

This retrospective study was approved by the Ethics Committee of our hospital, and the need for written informed consent from patients was waived. The inclusion criteria were as follows: (a) ES confirmed histopathologically by biopsy or tumor resection; (b) availability of initial CT images, including plain and CT-enhanced (CTE) images of the primary bone lesion, and chest CT at presentation. The exclusion criteria were as follows: (a) poor-quality images that were inadequate for the subsequent analysis; (b) less than 2 years of follow-up after diagnosis. The patients were randomly divided into a training cohort and a validation cohort at a ratio of 8:2 [11]. The training cohort was used for feature selection and model building, and the validation cohort was used for evaluation. Figure 1 shows the flow diagram of the recruitment process.
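As an illustration of the 8:2 random division described above, a minimal Python sketch follows. The function name `split_cohort`, the fixed seed and the integer patient identifiers are assumptions made for this example; this section does not state whether the split was stratified by outcome or which software performed it.

```python
import random

def split_cohort(patient_ids, train_fraction=0.8, seed=0):
    """Randomly split patient IDs into training and validation cohorts.

    The seed is an assumption added so that the split is reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]

# Hypothetical identifiers 0..142 standing in for the 143 patients.
train_ids, val_ids = split_cohort(range(143))
print(len(train_ids), len(val_ids))  # 114 29
```

With 143 patients, an 8:2 ratio rounds to 114 training and 29 validation cases in this sketch; the cohort sizes actually used are reported in the tables rather than in the text.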

Fig. 1

Flowchart of the patient selection process in our study

All patients were followed up for more than 2 years after diagnosis, with enhanced chest CT performed once every 3 months. Patients who developed pulmonary metastases of ES within 2 years after diagnosis were defined as the presence (MT) group, and those without any suspicious nodules on chest CT images during the 2-year follow-up were defined as the absence (non-MT) group. The MT group comprised patients with multiple nodules at presentation or with nonspecific nodules that increased in number or size, as detected by chest CT during the 2-year follow-up. The potential clinical risk factors for pulmonary metastases were obtained from medical records as follows: gender, age, tumor location (pelvic bone, long bone or other locations), lactate dehydrogenase (LDH) level, alkaline phosphatase (ALP) level, major length of the tumor and therapeutic method.

Imaging acquisition and analysis

The CT and CTE images were obtained from the picture archiving and communication system (PACS). CT images were acquired using multidetector row CT (MDCT) systems (Brilliance iCT, Philips Healthcare, Best, the Netherlands; Light Speed Volume CT, GE Healthcare, Waukesha, USA) with the following parameters: 120 kV, 240 to 260 mAs, collimation of 64 × 0.6 mm, and slice thickness of 5 mm. CTE acquisition was performed after a 70-s delay following intravenous administration of 1.5 mL/kg iodinated contrast (100 mL of 370 mg I/mL iopromide; Bayer Schering Pharma, Berlin, Germany) through the antecubital vein at a rate of 2.5 mL/s by using an automatic pump injector (Ulrich CT Plus 150, Ulrich Medical).

The regions of interest (ROIs) were manually delineated along the boundary of each tumor on each slice by two experienced radiologists (YL and PY, with 6 and 10 years of CT experience, respectively) using the uAI Research Portal [12] (Shanghai United Imaging Intelligence Co., Ltd, Shanghai, China). The ROIs were delineated on both the CT and CTE images by consensus of the two radiologists.

Radiomics feature extraction and selection

Z-score normalization was applied to the CT images before feature extraction to reduce variation in image intensity. A total of 2600 radiomics features were extracted using Python (https://www.python.org), based on the algorithms provided in Pyradiomics [13] (version 2.1.1). The features fell into three groups: intensity, tumor shape and texture features. Maximum absolute scaling was applied to normalize the features before feature selection and model building. A univariate method, analysis of variance (ANOVA), and a multivariate method, the least absolute shrinkage and selection operator (LASSO) algorithm, were applied to select the most valuable radiomics features [14].
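The normalization and univariate screening steps can be sketched in plain Python as below. This is an illustrative sketch only: the function names and toy values are assumptions, the population standard deviation is an arbitrary choice for the z-score, and the LASSO step is not shown. It is not the Pyradiomics implementation.

```python
from statistics import mean, pstdev

def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation.
    Assumes the values are not all identical."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def max_abs_scale(values):
    """Divide by the largest absolute value so that results lie in [-1, 1].
    Assumes at least one value is non-zero."""
    peak = max(abs(v) for v in values)
    return [v / peak for v in values]

def anova_f(group_a, group_b):
    """One-way ANOVA F statistic for a single feature across two groups."""
    groups = [group_a, group_b]
    n_total = sum(len(g) for g in groups)
    grand = mean([v for g in groups for v in g])
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = len(groups) - 1, n_total - len(groups)
    return (between / df_between) / (within / df_within)

# Toy feature values for hypothetical non-MT and MT cases.
print(max_abs_scale([-4, 2, 1]))        # [-1.0, 0.5, 0.25]
print(anova_f([1, 2, 3], [4, 5, 6]))    # 13.5
```

A larger F statistic indicates that the feature separates the two groups better relative to its within-group spread, which is the basis for ranking features in a univariate screen.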

Model construction

After feature selection, radiomics and clinical-radiomics models were built to predict pulmonary metastases of ES. The radiomics model incorporated only radiomics features; the clinical risk factor that differed significantly between groups (therapeutic method, P<0.05) was then added by stepwise multiple logistic regression to build the clinical-radiomics model. Three classifiers with high stability were investigated: Gaussian process (GP), logistic regression (LR) and partial least squares discriminant analysis (PLS-DA).
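To make the idea of a clinical-radiomics model concrete, the sketch below fits a logistic regression to two inputs, a radiomics score and a binary clinical factor, by plain batch gradient descent. Everything in it is an assumption for illustration: the toy data, the learning rate, the number of epochs and the function names. It does not reproduce the stepwise procedure or the software used in the study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    """Predicted probability of the positive class (here, MT) for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

# Toy rows: [scaled radiomics score, binary clinical factor]; labels 1 = MT.
X = [[-1.0, 0.0], [-0.5, 1.0], [0.5, 0.0], [1.0, 1.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)
print([round(p, 2) for p in probs])
```

In this toy setting the radiomics score alone separates the classes, so adding the clinical factor changes the predictions only slightly.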

Model assessment

The training cohort was used for feature selection and model construction, and the validation cohort was used for model evaluation. The predictive performance of the different models was assessed using receiver operating characteristic (ROC) curves. The area under the curve (AUC), accuracy (ACC), sensitivity, and specificity were reported for the radiomics and clinical-radiomics models. The AUCs of the models were compared using the DeLong test.
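The four reported metrics can be computed from predicted scores as in the following sketch. The function names, the toy labels and scores, and the 0.5 cut-off are assumptions for illustration; the cut-off used in the study is not stated in this section, and the DeLong comparison is not shown.

```python
def auc(labels, scores):
    """AUC as the probability that a positive case scores higher than a
    negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_metrics(labels, scores, threshold=0.5):
    """Accuracy, sensitivity and specificity at a given score cut-off."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Toy example: 1 = MT, 0 = non-MT.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(auc(labels, scores))
print(confusion_metrics(labels, scores))
```

In the toy example, eight of the nine positive-negative pairs are ranked correctly, giving an AUC of 8/9, and two of three cases in each class are classified correctly at the 0.5 cut-off.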

Statistical analysis

All statistical analyses were performed using SPSS 22.0 (IBM Corp, NY, USA), with P < 0.05 considered statistically significant. Continuous clinical risk factors were compared using the t-test, and categorical variables were compared using the χ2 test or Fisher’s exact test.

Results

Clinical characteristics

A total of 143 patients were included in this study, 76 with pulmonary metastases (MT) and 67 without (non-MT). The clinical characteristics are shown in Table 1.

Among all cases, 85 lesions were located in the pelvic bone, 39 in the bones of the extremities, and 19 in other areas (7 in the clavicle, 5 in the scapula, 3 in the spine, 3 in the ribs, and 1 in the calcaneus). No significant differences in gender, age, tumor location, LDH level, ALP level or major length were found between the MT and non-MT groups (P>0.05); only the therapeutic method differed significantly (P<0.05).

Predictive performance of radiomics model

Using the ANOVA and LASSO methods, 10, 27 and 15 optimal features were ultimately selected for the CT model, the CTE model and the model combining CT and CTE (ComB), respectively.

Among the radiomics models, the ComB model had the highest AUCs in the validation cohort, at 0.829, 0.852 and 0.848 with the GP, LR and PLS-DA classifiers, respectively. The AUC, ACC, sensitivity and specificity of the CT, CTE, and ComB models are presented in Table 2. Figure 2 shows the calibration curves of the radiomics models, which indicated good calibration in the training and validation cohorts. The decision curve analysis (DCA) of the three radiomics models with the different classifiers is shown in Fig. 3; the three radiomics models provided similar net benefit across threshold probabilities.
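For readers unfamiliar with decision curve analysis, the net benefit plotted on the y-axis is conventionally TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. A minimal sketch under that standard definition follows; the function names and toy data are assumptions for illustration, and this is not the code used to produce Fig. 3.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating cases whose predicted probability is at or
    above the threshold (standard decision curve definition)."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def treat_all_net_benefit(labels, threshold):
    """Reference strategy that treats every patient as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy example: 1 = MT, 0 = non-MT.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
for pt in (0.25, 0.5):
    print(pt, net_benefit(labels, probs, pt), treat_all_net_benefit(labels, pt))
```

A decision curve repeats this calculation over a range of thresholds; a model is useful at a given threshold when its net benefit exceeds both the treat-all reference and zero (treat none).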

Table 1 The clinical characteristics of patients
Table 2 Performance of the different radiomics models in the training and validation cohorts
Fig. 2

Calibration curves of the three radiomics models (CT, CTE and ComB models) in the training and validation cohorts. (a)-(c): calibration curves in the training cohort for the Gaussian process, logistic regression and PLS-DA based radiomics models, respectively; (d)-(f): calibration curves in the validation cohort for the Gaussian process, logistic regression and PLS-DA based radiomics models, respectively

Fig. 3

The decision curve analysis (DCA) of the three radiomics models in the training and validation cohorts. (a)-(c): DCA in the training cohort for the Gaussian process, logistic regression and PLS-DA based radiomics models, respectively; (d)-(f): DCA in the validation cohort for the Gaussian process, logistic regression and PLS-DA based radiomics models, respectively. The y-axis measures the net benefit

Predictive performance of clinical-radiomics model

With the GP and LR classifiers, the CTE_clinical model had the highest AUC in the validation cohort (0.843 for both), whereas with the PLS-DA classifier the ComB_clinical model achieved the highest AUC (0.843). Table 3 shows the AUC, ACC, sensitivity and specificity of the three clinical-radiomics models. The ROC curves of the radiomics and clinical-radiomics models are shown in Fig. 4. There were no statistically significant differences between the AUC values of the models.

Table 3 Performance of the different clinical-radiomics models in the training and validation cohorts

The calibration curves of the clinical-radiomics models showed good calibration in the training and validation cohorts (Fig. 5). The DCAs of the three clinical-radiomics models indicated that these models provided similar net benefit across threshold probabilities with the GP, LR and PLS-DA classifiers (Fig. 6).

Fig. 4

The ROC curves of the radiomics and clinical-radiomics models. (a)-(c): ROC curves of the radiomics and clinical-radiomics models in the training cohort with the GP, LR and PLS-DA classifiers, respectively; (d)-(f): ROC curves of the radiomics and clinical-radiomics models in the validation cohort with the GP, LR and PLS-DA classifiers, respectively

Fig. 5

Calibration curves of the three clinical-radiomics models (CT_clinical, CTE_clinical and ComB_clinical models) in the training and validation cohorts. (a)-(c): calibration curves in the training cohort for the Gaussian process, logistic regression and PLS-DA based clinical-radiomics models, respectively; (d)-(f): calibration curves in the validation cohort for the Gaussian process, logistic regression and PLS-DA based clinical-radiomics models, respectively

Fig. 6

The decision curve analysis (DCA) of the three clinical-radiomics models in the training and validation cohorts. (a)-(c): DCA in the training cohort for the Gaussian process, logistic regression and PLS-DA based clinical-radiomics models, respectively; (d)-(f): DCA in the validation cohort for the Gaussian process, logistic regression and PLS-DA based clinical-radiomics models, respectively. The y-axis measures the net benefit

Discussion

To the best of our knowledge, this study is the first to predict the risk of developing pulmonary metastases within a 2-year follow-up period after diagnosis in patients with ES on the basis of CT radiomics features. The clinical-radiomics models based on combined features and the radiomics models demonstrated similar performance, and the sensitivity and specificity of both types of model were high (Tables 2 and 3).

Among the clinical risk factors, gender, age, tumor location, LDH level, ALP level and the major length of the tumor were not found to be prognostic factors for the occurrence of pulmonary metastases, with no statistically significant difference between the MT and non-MT groups. By contrast, the therapeutic method was correlated with pulmonary metastases. Previous studies have reported that gender is not a risk factor for pulmonary metastases of ES and that larger tumors have a higher chance of pulmonary metastases, whereas the results for age and primary location are controversial. Li et al. found that surgery was a protective factor against pulmonary metastases and that chemotherapy did not differ significantly between the MT and non-MT groups [15,16,17,18].

In the past decades, radiomics has emerged as a promising methodology and has been widely used in the diagnosis, differentiation, staging and monitoring of tumors, owing to progress in the high-throughput extraction and screening of large numbers of features. In prior studies, radiomic analysis has been identified as a powerful method for predicting pulmonary metastases of soft tissue and bone sarcomas [19,20,21].

To predict pulmonary metastases in ES, radiomics features based on CT were used to develop and validate three different radiomics models, which achieved high performance. Among the three models, the ComB model presented the highest performance in the validation set with the LR classifier, but no statistically significant difference was observed between the AUCs of the models. Three well-known classifiers were used in this study: GP, LR and PLS-DA. As the most widely used method in previous radiomics studies, LR is a classification algorithm suitable for analyzing large sets of features in small samples; it is designed to avoid overfitting and to predict the class probability of a given categorical dependent variable [22]. GP is a nonparametric method, based on the Laplace approximation, that is used for classification and regression, and it can handle problems such as the curse of dimensionality, complex data types and the insufficient capacity of classical linear methods [23, 24]. The PLS-DA method has been applied in previous studies and has been shown to be able to deal with high-dimensional radiomics datasets [25].

Furthermore, three clinical-radiomics models were established for patients with ES. The CTE_clinical and ComB_clinical models offered favorable predictive ability (AUC = 0.843/0.843 in the validation cohort) for pulmonary metastases, and there were no statistically significant differences between the AUCs of the clinical-radiomics models. The predictive ability of the three clinical-radiomics models was not markedly enhanced relative to that of the radiomics models, which may be partly due to the relatively small sample size and to only one clinical risk factor being included in the clinical-radiomics models.

This retrospective study has several limitations. First, owing to the low incidence of ES, the patient population was relatively small, all patients came from a single center, and internal rather than external validation was used. Therefore, large-scale studies with external validation are required before widespread implementation of the models in clinical practice. Second, most cases were diagnosed from surgical specimens, but a small fraction of the pathological diagnoses were obtained by biopsy, which could introduce some bias. Third, multiparametric MR images with better soft-tissue resolution should be used in future studies to improve the precision and robustness of the models. Fourth, manual segmentation is time consuming and its results may not be reproducible. However, manual segmentation is regarded as the gold standard, and it has been applied in many previous studies and yielded excellent results. Fifth, the relationship between the radiomic analysis of the primary tumor and the likelihood of developing pulmonary metastases remains unclear and requires further research.

In summary, the CT-based radiomics model was effective in predicting pulmonary metastases in patients with ES and could thus be a potential tool for accurate risk stratification and precision medicine.