Introduction

Colorectal cancer (CRC) is the most common tumor in the digestive system, and its mortality rate ranks third among cancer-related mortality in the world [1]. Moreover, rectal cancer accounts for about one-third of all CRC cases [2]. The primary treatments for rectal cancer include chemotherapy, radiotherapy, and surgery, though treatment options are determined by tumor stage. Early-stage rectal cancer can be treated directly by surgery, whereas locally advanced rectal cancer (LARC) requires neoadjuvant radiochemotherapy before surgery. Accurate preoperative staging of rectal cancer is essential for achieving precision treatment. Thus, it is vital to be able to precisely identify preoperative staging.

Magnetic resonance imaging (MRI) provides good soft-tissue contrast. The National Comprehensive Cancer Network [3] and the European Society for Medical Oncology [4] recommend MRI as the preferred imaging exam for rectal cancer. At present, MRI has been widely used in the evaluation of preoperative T stages of rectal cancer. However, it is not without limitations. Conventional MRI staging is easily influenced by clinical experience and the individual perspective of observers and its reproducibility and accuracy remains unsatisfactory [5,6,7]. In addition, conventional MRI has difficulty in distinguishing the inflammatory reaction around the tumor and tumor invasion, resulting in staging wrong [8]. Accurate staging is an imperative prerequisite to individualized therapy. The stratification according to accurate staging could avoid unnecessary chemoradiotherapy and their side effects, such as toxicity of chemotherapeutics, radiation enteritis, and rectal wall fibrosis and can reduce the financial burden of patients. Accurate staging has particular significance for clinical practice. Thus, there is a critical need to develop a method to provide this important information.

Radiomics can extract large amounts of quantified features from medical imaging data to provide mineable high-dimensional data. It deeply analyzes the clinicopathological information contained in large amounts of data [9, 10] and has been applied to tumor staging [11,12,13], predicting treatment response [14,15,16,17], and assessing the efficacy after chemoradiotherapy [18,19,20] in rectal cancer patients. Although some studies have explored the application of radiomics in T staging of rectal cancer, most of these studies used radiomics features alone; few have incorporated the clinical factors with the radiomics features and lack preoperative experimental validation. In this study, we introduced clinical factors and incorporated with radiomics features, analyzed from different aspects, constructed a multi-scale nomogram model to predict T staging in patients with rectal cancer, and validated the model in a prospective cohort, with the goal of providing a meaningful predictor and supporting for the individualized treatment plan, so as to make patients get more benefit from treatment.

Materials and methods

Patients

We searched our retrospective database for consecutive patients who received rectal MRI examinations in our hospital from August 2012 to December 2018. Inclusion criteria were patients who (1) underwent surgery and pathologically diagnosed with rectal adenocarcinoma; (2) received no treatment before surgery; (3) underwent MRI within 2 weeks before surgery; and (4) had complete clinical and pathological data. Exclusion criteria were as follows: (1) preoperative therapy (radiotherapy, chemotherapy or chemoradiotherapy); (2) simultaneous existence of other malignant tumors; and (3) poor image quality or artifacts. Finally, 268 patients were eligible and randomly distributed into two datasets at a proportion of 7:3. The study was approved by the Ethics Committee and the need to obtain informed consent was waived. Figure 1 presents the patient recruitment process.

Fig. 1
figure 1

Patient selection process

Baseline clinicopathologic data, including age, gender, preoperative carbohydrate antigen 199 (CA199), carcinoembryonic antigen (CEA), tumor diameter, location (distance from the anal verge), differentiation, and postoperative T staging, were derived from medical records.

Pathologic evaluation

The specimens were fixed in formalin for 48 h, and two pathologists evaluated the tissue sections stained with hematoxylin–eosin (H–E). The pathological diagnosis was performed according to the 8th edition of the American Joint Committee on Cancer staging standard [21]. T1 tumors invade submucosa, T2 tumors invade muscularis propria, T3 tumors penetrate the muscularis propria to reach the subserosal layer or invade the adjacent rectal tissue without peritoneal covering, and T4 tumors penetrate the serosal membrane or directly invade other organs or tissues. T1–T2 stages were classified as early stage and T3–T4 stages as local advanced stage.

MRI image acquisition and regions of interest segmentation

Imaging data were collected using a GE Discovery MR750w 3.0 T MRI scanner with the phased-array body coil. Patients fasted for 4 h and emptied the bowel contents before scanning. The axial and coronal MRI sequences were perpendicular and parallel to the long axis of the rectal tumor, respectively. The regions of interest were manually outlined using each slice of the axial T2-weighted images to cover the entire tumor (Fig. 2a and b). Intestinal contents and air were excluded. The procedure was performed using the MITK software (MITK Workbench 2018.04.2, http://mitk.org/wiki/MITK). Thirty cases of MRI images were randomly selected, and two radiologists with 9 (doctor 1) and 11 years (doctor 2) of MRI interpretation experience independently outlined the tumors without knowing the pathological results. Doctor 1 repeated the process 1 week later. The intra-class correlation coefficient (ICC) was used to assess the inter-/intra-observer variability. An ICC above 0.75 was considered to have good reproducibility. Doctor 1 finished the delineation for the remaining images. The formulation of ICC is shown in Supplementary Material 1.

Fig. 2
figure 2

Tumor segmentation on rectal MRI. a A 49-year-old man with early-stage rectal cancer. b A 73-year-old man with LARC

Radiomics feature extraction

Pyradiomics [22] (versions 3.0, https://www.radiomics.io/pyradiomics.html) were used to extract radiomics features. For the original MRI images, a Laplacian of Gaussian (LoG) filter with different σ parameters and a wavelet filter were used for preprocessing to obtain filtered images. Then original and filtered images were taken to extract features. Each patient obtained 960 features from their images and these features were normalized using a Z-score transformation. Supplementary Material 2 lists the features.

Feature selection and radiomics signature building

After extracting features, a least absolute shrinkage and selection operator (LASSO) logistic regression was performed to identify the optimal predictive features from the training dataset. Then the optimal features weighted by corresponding coefficients were linearly combined to obtain a radiomics score (Rad-score) [23].

Radiomics nomogram construction and performance evaluation

The Rad-score and clinicopathological factors, including age, gender, tumor location, maximum diameter, differentiation, CEA, and CA199, were considered as possible predictive factors. Significant predictors were chosen via univariate and multivariate logistic regression of the training dataset. Based on Akaike’s information criterion, we used the likelihood ratio test to make a backward stepwise selection. A radiomics nomogram grounded on the multivariate logistic analysis was built for the training dataset.

The predictive ability of the nomogram was quantified through the area under the curve (AUC) of a receiver operator characteristic (ROC) curve. A calibration curve [24] was chosen to evaluate the calibration performance via bootstrapping with 1000 resamplings. The Hosmer–Lemeshow (H–L) test was used to evaluate goodness of fit of the nomogram.

Validation of the nomogram

In the validation dataset, each patient received a Rad-score and the performance of the nomogram was validated. The discrimination was evaluated by the AUC, and calibration was verified by the calibration curve and H–L test.

Clinical use

The clinical utility of the nomogram was assessed via quantifying the net benefits at different thresholds in decision curve analysis (DCA) [25] for the training and validation datasets.

Statistical analysis

A Mann–Whitney U test or an independent samples t-test was used for continuous variables comparison, while Chi-squared tests for categorical variables. R software (version 3.6.3; https://www.r-project.org/) was applied to statistical analysis. P < 0.05 was considered statistically significant. Supplementary Material 3 describes the packages used in R.

Results

Patient characteristics

The median age was 59.84 in the training dataset and 60.20 in the validation dataset. Males had a preponderance in both datasets (69.7% and 70.0%, respectively). The moderately differentiated patients accounted large proportion in two datasets (70.2% and 70.0%, respectively). No significant differences were found among clinical factors between the two datasets. The training dataset contained 133 LARC patients and the validation dataset contained 60 LARC patients. There was no difference in the proportion of LARC patients in the two datasets (Table 1).

Table 1 Characteristics of patients

MRI evaluation of T staging

Two radiologists evaluated the T staging through MRI images without knowing the pathological results. When they had different opinions, the final results were determined through discussion. The AUC was used to evaluate the performance of the radiologists' subjective judgment of T staging through MRI images. The AUC for the training dataset was 0.689 (95% CI 0.619–0.759) and 0.666 (95% CI 0.552–0.781) for the validation dataset (Fig. 3a and b).

Fig. 3
figure 3

The ROC of the training (a) and validation (b) datasets for the MRI

Repeatability of ROI segmentation, feature selection, and radiomics signature construction

The evaluation of intra-/inter-observer repeatability of ROI segmentation was through calculating ICC. The intra- and inter-observer ICC were 0.816–0.947 and 0.778–0.925, respectively, which indicated satisfying repeatability of the segmentation. In the training dataset, the LASSO logistic regression identified nine potential radiomics features with nonzero coefficients (Fig. 4a and b). There were three shape-based features, one LoG filter feature and five wavelet filter features. In order to explore the contribution of the shape-based features, we built two Rad-scores: Rad-score 1 consisted of all nine features and Rad-score 2 consisted of the remaining features after removing the shape-based features. Then the AUC was used to evaluate the performance of Rad-scores. The AUC of Rad-score 1 was 0.872 (95% CI 0.821–0.922) for the training dataset and 0.807 (95% CI 0.705–0.909) for the validation dataset. The AUC of Rad-score 2 was 0.867 (95% CI 0.815–0.919) for the training dataset and 0.786 (95% CI 0.671–0.901) for the validation dataset. The performance of Rad-score 1 was better than Rad-score 2, and we chose Rad-score 1 as the final Rad-score. The Rad-score calculation formula was as follows:

Fig. 4
figure 4

The LASSO logistic regression model for radiomics feature selection. a The tuning parameter (λ) selection was based on a tenfold cross-validation in the LASSO model. The minimum criteria and one standard error of the minimum criteria (1-SE criteria) were used to place the vertical lines at the optimum values. A lambda of 0.068, with log(λ) = − 2.688 was chosen (1-SE criteria) for selecting features. b The LASSO coefficient profiles. The value selected by cross-validation was applied for drawing the vertical line and nine nonzero coefficients were shown

Rad-score = 1.087604 + 0.090569 × log-sigma-5–0-mm-3D_firstorder_Kurtosis + 0.204631 × original_shape_Maximum 2D Diameter Row – 0.222923 × original_shape_Sphericity + 0.116812 × original_shape_Minor Axis Length + 0.140837 × wavelet-HLL_glcm_Difference Entropy + 0.379834 × wavelet-HLL_glszm_Size Zone NonUniformity + 0.017470 × wavelet-HLH_glszm_Zone Entropy – 0.041454 × wavelet-HHL_firstorder_Mean + 0.004327 × wavelet-LLL_glcm_Informational Measure of Correlation 1

The Rad-scores of LARC patients were generally higher than those who had early-stage rectal cancer. A significant difference was found between LARC and early-stage rectal cancer Rad-scores (mean ± standard deviation) in both the training dataset (1.402 ± 0.868 vs 0.208 ± 0.680, respectively, P < 0.001) and the validation dataset (1.476 ± 0.836 vs 0.488 ± 0.782, respectively, P < 0.001).

Individualized radiomics nomogram development and validation

Univariate logistic regression analysis indicated that Rad-score, tumor diameter, tumor location, CEA, and CA199 level had predictive value for LARC. Further multivariate logistic regression analysis confirmed CEA level and Rad-score as independent predictive factors (Table 2). Thus, we developed a nomogram that combined Rad-score and CEA (Fig. 5a). The AUC of the nomogram was 0.882 (95% CI 0.835–0.930) for the training dataset and 0.846 (95% CI 0.757–0.936) for the validation dataset (Fig. 5b and c).

Table 2 Risk factors for patients with LARC
Fig. 5
figure 5

Radiomics nomogram for predicting LARC. The nomogram combining CEA and Rad-score was built from the training dataset (a). The ROC curves of the training (b) and validation (c) datasets for the radiomics nomogram

The calibration curve (Fig. 6a and b) and the H–L test results (P = 0.768 for the training dataset and 0.638 for the validation dataset) indicated good consistency between observation and prediction.

Fig. 6
figure 6

The calibration curves of the training (a) and validation (b) datasets

Clinical utility

The DCA for radiomics nomogram indicated that when the probability of LARC was between 0.23–0.99 and 0.27–0.99 in the training (Fig. 7a) and validation datasets (Fig. 7b), respectively, using the nomogram to evaluate pathological LARC had added benefit compared with treating all patients as early or local advanced stage.

Fig. 7
figure 7

The DCA of the training (a) and validation (b) datasets

Prospective trial analysis

The radiomics nomogram was used to perform prospective analysis on another 32 patients with rectal cancer, including 18 males and 14 females. Table 3 presents the patient characteristics.

Table 3 Characteristics of patients in the prospective study

The AUC of the radiomics nomogram was 0.859 (95% CI 0.730–0.987) (Fig. 8a). The DCA showed that when the probability of LARC ranged from 0.06 to 0.97 (Fig. 8b), using the nomogram to evaluate pathological LARC had added benefit compared with treating all patients as early or local advanced stage, which suggested the nomogram had good clinical utility. The prospective analysis showed that this nomogram has the satisfying predictive ability in prospective conditions.

Fig. 8
figure 8

The ROC (a) and DCA (b) of the nomogram

Discussion

In our study, we explored the performance of a radiomics-based nomogram for the preoperative individual prediction of T stage in rectal cancer patients. The nomogram incorporated radiomics features and CEA levels. The results showed that this nomogram had good accuracy and clinical utility in both the retrospective analysis and prospective study. This indicated that radiomics-based nomograms can improve preoperative T stage prediction strategies in rectal cancer patients, which has important implications for clinical decision making.

Computed tomography (CT), MRI, and positron emission tomography/CT are commonly used imaging methods in clinical routine work. MRI is the preferred imaging exam for rectal cancer T staging [3, 4]; however, there are some drawbacks that limit its application in preoperative assessment. Conventional MRI has difficulty in distinguishing tumor infiltration from fibrosis [26]. Most importantly, the accuracy of conventional MRI in the diagnosis of rectal cancer T staging is influenced by subjectivity [27].

Radiomics is a promising field that focuses on enormous quantitative information extraction from medical images and deeply explore the potential connections related to tumor occurrence and development [28, 29]. It can provide evidence to support clinical decision making and thus has high clinical significance [14, 19, 30]. In this study, we constructed and validated a radiomics nomogram, which combined both the radiomics signature and clinicopathologic risk factors for personal prediction of T staging in rectal cancer patients. We used axial T2-weighted images to extract features of the entire tumor and the independent predictors of LARC were selected out by LASSO logistic method. This method allows us to combine radiomics features into a radiomics signature [31,32,33]. Multi-factor analysis that incorporates individual factors into a factor panel has been widely used in recent studies [34,35,36]. For example, Wang et al. [34] constructed an MRI-based radiomics model to predict the muscle-invasive status of bladder cancer and confirmed that the radiomics could be an efficient tool for preoperative prediction. Similarly, Xu et al. [35] developed a radiomics nomogram to predict intracerebral hematoma expansion and found that the nomogram could serve as a convenient measurement. Pan et al. [36] used the LASSO logistic method to identify optimal radiomics features for preoperative classification of ovarian cystadenoma. The results showed that imaging biomarkers could classify ovarian serous cystadenoma and mucinous cystadenoma. In our study, we incorporated the Rad-score and CEA levels into a nomogram for preoperative prediction of LARC. The nomogram exhibited good discrimination in retrospective cohort studies and was confirmed in the prospective pilot analysis as well. Furthermore, our nomogram performed better than previous studies. Ma et al. [11] used MRI radiomics derived from T2-weighted images to predict pathological characteristics of rectal cancer. They extracted 1029 radiomics features and used the LASSO method to select optimal features. Eleven features were found to be related to T staging and the accuracy of T stage prediction was 0.762. Yin et al. [12] collected data from 115 rectal cancer patients in their study. Texture features based on apparent diffusion coefficient (ADC) maps and diffusion-weighted images (DWI) were used to predict different stages of rectal cancer, which resulted in an AUC of 0.793. Finally, Sun et al. [13] explored the role of radiomics features in identifying tumor characteristics of rectal cancer. They demonstrated that it is feasible to use MRI-based radiomics features to predict T staging.

In our study, we constructed the nomogram based on the T2-weighted imaging, which is a routine sequence in rectal scanning. T2-weighted imaging has better stability in appearance. High-resolution T2-weighted imaging can provide clear visualization of rectal layers by offering good contrast between the tumor and surrounding tissue, making it one of the most important sequences for staging of rectal cancer [37, 38]. Contrast-enhanced T1-weighted images are obtained by scanning after intravenous injection of a contrast agent to determine angiogenesis inside tumors. However, for staging of rectal cancer, contrast-enhanced T1-weighted imaging is not recommended as a routine sequence [39]. In our study, we extracted features from original and filtered images, including wavelet filter and LoG filter images. The features derived from the wavelet filter images accounted for the majority of the final Rad-score. Wavelet filters have advantages in signal denoising and have been commonly applied in recent studies [40,41,42]. One prospective study that predicted tumor grading of rectal carcinoma had features obtained primarily from wavelet filter images in their radiomics signature [40]. In the study by Hamerla et al. [41], the most relevant features in their radiomics model for predicting pathological response were also extracted from wavelet filter images. Furthermore, Liang et al. [42] constructed a prediction model for metachronous liver metastases in rectal cancer patients, and a large proportion of the selected features was obtained from wavelet filter images.

In addition to radiomics features, we introduced CEA into the nomogram. It is one of the commonly used tumor markers in clinical practice [43]. CEA is a cell surface glycoprotein and is expressed at low levels in normal tissue but is up-regulated in most CRC lesions [44, 45]. Previous studies have demonstrated that CEA is correlated with rectal cancer staging [46,47,48]. In this study, we analyzed the performance of CEA in predicting LARC. After univariate and multivariate logistic regression, we found that CEA was a predictive factor for LARC, fitting the results of previous studies. The model constructed by combining CEA with a radiomics signature showed good accuracy and calibration ability.

Some limitations exist in this study. First, the patients recruited in the study were from a single center, and therefore, our study lacks external validation. Second, the sample size was relatively limited. Third, our images were all scanned by one 3.0 T MRI scanner, which may lead to a limitation in model generalization. In the future, multi-center, large-scale, and multi-vendor studies are needed to overcome these limitations.

In conclusion, we constructed a radiomics-based nomogram and validated its prediction ability and clinical utility. The nomogram is a novel method to predict LARC and can provide support for clinical decision making.