Background

Colorectal cancer (CRC) is the third most common malignant tumor worldwide [1]. According to the latest data, reported by the Cancer Statistics of China in 2015, CRC ranks fifth in both morbidity and mortality [2]. Rectal cancer accounts for 30–35% of all CRC cases, and most rectal cancers are adenocarcinomas. Selecting the optimal therapy for patients with rectal cancer is a complex process [3, 4], and accurate preoperative staging is an essential step for guiding treatment decisions, including surgery or neoadjuvant chemoradiotherapy (nCRT). Surgical excision is regarded as the standard treatment strategy for early rectal cancer (T1–2 and N0), whereas locally advanced rectal cancer (T3–4 and/or N1) is treated with nCRT followed by total mesorectal excision [3]. Generally, the pathological type, degree of differentiation, depth of infiltration, and presence or absence of regional lymph node metastasis reflect the degree of tumor invasiveness and predict the prognosis [3]. Therefore, a deeper understanding of tumor pathological features is critical for formulating the clinical treatment plan and predicting the prognosis. Moreover, high-resolution magnetic resonance imaging (MRI) has a pivotal role in the pretreatment assessment of rectal cancer because high-resolution T2-weighted images (T2WIs) offer better diagnostic performance in the staging of rectal cancer [3].

Recently, radiomics analysis has been developed and validated as an advanced tool for assessing tumor heterogeneity. Radiomics is a noninvasive method that involves high-quality image acquisition, volume of interest (VOI) segmentation, high-throughput quantitative feature extraction, high-dimensional feature reduction, and the establishment of diagnostic, prognostic, or predictive models. Radiomics models, which make use of medical images and clinical data, have potential in clinical decision-making [5]. Radiomics has been used to evaluate several kinds of tumors in previous studies and is being increasingly implemented [5,6,7,8,9]. MRI-based radiomics models have been employed to distinguish cancer from benign tissue and to reflect the histological characteristics of rectal cancer [10,11,12,13]. Therefore, the purpose of the present study was to investigate the value of an MRI-based radiomics model derived from high-resolution T2WI in identifying specific pathological features of rectal cancer and to build a set of radiomics prediction models.

Methods

Participants

This retrospective study was approved by the local institutional review board (Committee on Ethics of Biomedicine, Second Military Medical University), and the requirement for written informed consent was waived. Between March 2017 and September 2018, 182 consecutive patients with previously untreated rectal lesions identified by colonoscopy were enrolled in this study. All patients underwent rectal MRI examination and postoperative pathological examination. The exclusion criteria were as follows: chemotherapy or radiotherapy before or after MRI (n = 20), poor image quality (n = 6), and distant metastases (n = 4). Therefore, 152 patients were included in the final analysis.

Magnetic resonance imaging

All patients were scanned on a 3 T MRI (MAGNETOM Skyra, Siemens Healthcare, Erlangen, Germany) using an 18-channel pelvic phased-array coil. Every patient fasted for 4 h prior to the scan. Transversal high-resolution T2-weighted turbo spin echo images were acquired with the following parameters: TR/TE = 4000/108 ms, FOV = 180 × 180 mm², matrix = 320 × 320, slice thickness = 3 mm, gap = 0 mm, acceleration factor = 3, echo train length = 16, and acquisition time = 4 min 10 s. All patients underwent surgery at a time interval of 8.9 ± 5.8 (range, 2–28) days after the MRI examination.

Pathological evaluation

The tissue sections were subjected to hematoxylin and eosin staining. All lymph nodes in the mesorectum were retrieved from the surgical specimens to ensure that at least 12 lymph nodes per patient were collected. The final histopathological reports detailed the tumor TN staging, histological grade, and circumferential resection margin (CRM). All TN statuses were determined according to the American Joint Committee on Cancer staging system, eighth edition [14, 15]. The patients were divided into two groups according to different pathological criteria. Histological grade: high-to-moderate and poor differentiation; T stage: T1–2 and T3–4 stages; and N stage: N0 and N1–2 stages.

Feature selection

The radiomics features were extracted from the VOIs, as confirmed by a radiologist (with 8 years of experience in radiology), on high-resolution T2WI using a radiomics analysis platform [Radcloud, Huiying Medical Technology (Beijing, China) Co., Ltd.] (Fig. 1). A total of 1029 high-throughput features, based on feature classes and filter classes, were automatically extracted by the platform. The platform feature extraction is based on the “pyradiomics” package in Python (version 2.1.2, https://pyradiomics.readthedocs.io/).

Fig. 1

Example image for rectal cancer contouring. a The outline of ROI on one slice of axial T2-weighted MR image. b Sagittal reconstruction. c Coronal reconstruction. d Volume rendering

To minimize MRI intensity variations, we normalized the image intensity using the following formula, where x indicates the original intensity, f(x) indicates the normalized intensity, μ refers to the mean value, σ indicates the standard deviation, and s is an optional scaling factor that is set to 1 by default.

$$ f(x)=\frac{s\left(x-{\mu}_x\right)}{\sigma_x} $$
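As a minimal sketch of the normalization formula above, the following Python snippet applies it to a flat list of voxel intensities. The function name `normalize_intensity`, the toy intensities, and the use of the population standard deviation are assumptions made for illustration; the platform's actual implementation may differ in these details.

```python
from statistics import fmean, pstdev


def normalize_intensity(values, scale=1.0):
    """Apply f(x) = s * (x - mu) / sigma to a flat list of voxel intensities."""
    mu = fmean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("constant image: standard deviation is zero")
    return [scale * (v - mu) / sigma for v in values]


# Hypothetical voxel intensities from one image.
voxels = [100.0, 120.0, 140.0, 160.0, 180.0]
normalized = normalize_intensity(voxels)
```

After normalization with s = 1, the intensities have a mean of 0 and a standard deviation of 1, which makes feature values more comparable across scans.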

First, to ensure the robustness of the image features, an intraclass correlation coefficient threshold of 0.6 was set for the test–retest analysis. Then, from the robust features, those that best predicted the classification were selected using the least absolute shrinkage and selection operator (LASSO) method. In the LASSO method, leave-one-out cross-validation was used to select the optimal regularization parameter alpha, defined as the value that minimized the mean squared error averaged over all patients. Features with nonzero coefficients at the optimal alpha were retained.
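The LASSO selection step can be illustrated with a small, self-contained sketch. This is not the study's code, which presumably relied on a library implementation. It is a plain coordinate-descent LASSO without an intercept, with leave-one-out cross-validation over a hypothetical grid of alpha values and a toy data set. All function names, the grid, and the data are assumptions.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_fit(X, y, alpha, n_sweeps=100):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w


def loo_mse(X, y, alpha):
    """Mean squared error over leave-one-out folds for a given alpha."""
    errors = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        w = lasso_fit(X_train, y_train, alpha)
        pred = sum(xj * wj for xj, wj in zip(X[i], w))
        errors.append((y[i] - pred) ** 2)
    return sum(errors) / len(errors)


# Toy data (hypothetical): feature 0 carries the signal, feature 1 barely matters.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [-2.1, -1.9, 1.9, 2.1]

alphas = [0.01, 0.1, 0.3, 1.0]  # hypothetical candidate grid
best_alpha = min(alphas, key=lambda a: loo_mse(X, y, a))
coefs = lasso_fit(X, y, best_alpha)
selected = [j for j, c in enumerate(coefs) if c != 0.0]  # features to retain
```

On this toy data, fitting with alpha = 0.3 shrinks the coefficient of feature 0 to about 1.7 and sets the coefficient of feature 1 to exactly zero, which is the mechanism by which LASSO discards weakly informative features.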

Prediction model analysis

The machine learning analysis was based on the “scikit-learn” package in Python (version 0.21.3, https://scikit-learn.org/stable/). The original data set was randomly divided into a training set (70%) and a test set (30%). To reduce the impact of class imbalance in histological grade and N stage, the synthetic minority oversampling technique (SMOTE) was applied to the training set. The multilayer perceptron (MLP), logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), and K-nearest neighbor (KNN) classifiers were trained using fivefold cross-validation to build the prediction models (the parameters of the six classifiers are shown in Table 1). The independent test set was then used to evaluate the performance of the models. The experiment used the mean model as the final model for the test set. The performance of models for the statistically significant pathological features was assessed using sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC). P value < 0.05 was considered statistically significant.
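The oversampling idea can be sketched as follows. This is a simplified illustration of the SMOTE principle (interpolating between a minority sample and one of its nearest minority neighbors), not the implementation used in the study, which the text does not specify. The function name `smote`, its parameters, and the toy feature vectors are assumptions. Only the training set is oversampled, so the test set keeps its original class distribution.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by neighbor interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Rank the remaining minority samples by squared distance to the base.
        others = [s for i, s in enumerate(minority) if i != base_idx]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical training-set feature vectors for the two classes.
train_majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
                  [0.3, 0.2], [0.2, 0.4], [0.4, 0.1]]
train_minority = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.3]]

n_needed = len(train_majority) - len(train_minority)
balanced_minority = train_minority + smote(train_minority, n_needed)
```

Because each synthetic sample lies on a segment between two real minority samples, the new points stay within the range of the observed minority class instead of being simple duplicates.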

Table 1 Supplemental data (parameters)

Results

Patient demographics

Among the 152 patients with rectal cancer, 94 were male and 58 were female, with a mean age of 58.9 ± 8.3 years (range 24–78). The pathological features of rectal cancer are presented in Table 2. None of them had positive CRM.

Table 2 Pathological characteristics of the patients

Diagnostic performance of radiomics

A total of 1029 features were extracted from preoperative high-resolution T2WI. These features can be classified into three categories: (I) intensity statistics features, such as peak value, mean value, and variance, which quantitatively describe the distribution of voxel intensity in MR images; (II) shape features, such as volume, surface area, and sphericity, which reflect the three-dimensional shape and size of the outlined region; and (III) texture features, including the gray-level co-occurrence matrix, gray-level run length matrix, and gray-level size zone matrix, which quantify the heterogeneity of the selected region. Additionally, Laplacian of Gaussian, exponential, logarithmic, square, square root, and wavelet filters can be used to calculate image intensity and texture features. The wavelet filters used were wavelet-LHL, wavelet-LHH, wavelet-HLL, wavelet-LLH, wavelet-HLH, wavelet-HHH, wavelet-HHL, and wavelet-LLL. Then, 15, 11, and 11 features related to the degree of differentiation, T stage, and N stage, respectively, were obtained (Table 3). These radiomics features were used for subsequent prediction model building, and the cutoff value was selected according to the Youden index to determine the corresponding sensitivity and specificity. The AUC was used to assess the predictive ability of the model, and the results are presented in Tables 4 and 5.
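The cutoff selection and AUC computation can be sketched in a few lines of Python. The snippet below is an illustration under stated assumptions rather than the study's code: the labels and scores are invented, and the function names `roc_points`, `youden_cutoff`, and `auc` are hypothetical. The Youden index is J = sensitivity + specificity − 1, and the AUC is computed as the probability that a positive case receives a higher score than a negative case.

```python
def roc_points(labels, scores):
    """Return (threshold, sensitivity, specificity) for each distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores)):
        # A case is called positive when its score is at or above the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        points.append((t, tp / pos, tn / neg))
    return points


def youden_cutoff(labels, scores):
    """Pick the threshold that maximizes J = sensitivity + specificity - 1."""
    return max(roc_points(labels, scores), key=lambda p: p[1] + p[2] - 1)


def auc(labels, scores):
    """AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs: 1 = positive class, 0 = negative class.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.5, 0.7, 0.8, 0.9]

cutoff, sensitivity, specificity = youden_cutoff(labels, scores)
area = auc(labels, scores)
```

For this toy example, the Youden-optimal cutoff is 0.4, giving a sensitivity of 100% and a specificity of 75%, and the AUC is 0.9.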

Table 3 Radiomics features
Table 4 Training set
Table 5 Test set

For the degree of differentiation, the SVM classifier provided the best discrimination capability for the prediction model with an AUC of 0.862 (95% CI, 0.750–0.967; sensitivity, 83.3%; specificity, 85.0%). As for the T stage, the MLP classifier provided the best discrimination capability with an AUC of 0.809 (95% CI, 0.690–0.905; sensitivity, 76.2%; specificity, 74.1%). Moreover, the RF classifier showed a good diagnostic performance for the N stage with an AUC of 0.746 (95% CI, 0.622–0.872; sensitivity, 79.3%; specificity, 72.2%) (Fig. 2).

Fig. 2

Receiver operating characteristic (ROC) curves of the prediction model for the statistically significant prognostic factors. ROC curves of SVM classifier for pathological differentiation: (a1) training set (AUC, 0.871; std., 0.037; sensitivity, 80.6%; specificity, 89.2%); (a2) test set (AUC, 0.862; 95% CI, 0.750–0.967; sensitivity, 83.3%; specificity, 85.0%). ROC curves of MLP classifier for T stage: (b1) training set (AUC, 0.824; std., 0.087; sensitivity, 80.4%; specificity, 90.0%); (b2) test set (AUC, 0.809; 95% CI, 0.690–0.905; sensitivity, 76.2%; specificity, 74.1%). ROC curves of RF classifier for N stage: (c1) training set (AUC, 0.794; std., 0.100; sensitivity, 100.0%; specificity, 95.4%); (c2) test set (AUC, 0.746; 95% CI, 0.622–0.872; sensitivity, 79.3%; specificity, 72.2%)

Discussion

This study indicated that the high-resolution T2WI–based radiomics machine learning model could not only discriminate the degree of pathological differentiation and T stage but also showed good diagnostic performance for N stage.

Recent studies have shown that radiomics is important in identifying tumor heterogeneity in several kinds of tumors [5,6,7,8,9] and may serve as a complementary tool for preoperative tumor staging in rectal cancer [10,11,12,13]. Patients with rectal cancer require a comprehensive staging evaluation to guide the choice of treatment, with the aim of avoiding undertreatment and minimizing overtreatment. Therefore, high-resolution T2WIs were used in the present study to explore the significance of an MRI-based radiomics model in the preoperative diagnosis of rectal cancer.

According to the NCCN guidelines, the degree of differentiation, T stage, and N stage are powerful prognostic factors for patients with rectal cancer [3]. Several studies showed a statistically significant correlation between the apparent diffusion coefficient value, derived from diffusion-weighted images, and tumor differentiation grade [16, 17]; however, some studies showed contradictory results [18, 19]. In this study, radiomics and tumor differentiation grade showed a statistically significant correlation. The ROC curve of the SVM classifier showed an AUC of 0.862 (test set), suggesting that the SVM model can be used to distinguish poorly differentiated lesions from highly/moderately differentiated lesions.

Although high-resolution MRI is recommended for the T staging of patients with rectal cancer, the accuracy of staging is still unsatisfactory. Reported accuracies vary widely between studies, ranging from 44 to 100% [20, 21]. Stage T2 lesions can be differentiated from T3 lesions by identifying a smooth outer tumor border within the rectal wall, with no invasion into the fat surrounding the rectum. The difficulty in differentiating tumor infiltration from fibrosis, which is due to inflammation and blood vessel invasion, limits the ability to distinguish stage T2 tumors from early-stage T3 tumors [15]. In this study, the ROC curve of the MLP classifier showed an AUC of 0.809 (test set), suggesting that the MRI-based radiomics model can be used to distinguish T3–4 lesions from T1–2 lesions. These results could be explained by the fact that higher T-stage tumors show greater heterogeneity of cell morphology and histology, higher cell density, and smaller interstitium.

Accurate preoperative diagnosis of lymph node metastasis is another important factor for treatment selection. Although the accuracy of T staging is considerably high, the prediction of N staging remains difficult [22]. Using morphological criteria only does not improve the prediction accuracy of lymph node metastasis in rectal cancer [10]. This limitation is aggravated by the lack of consensus on appropriate criteria to assess lymph node involvement [20]. The reported accuracy of routine MRI for lymph node staging varied widely, ranging from 43 to 85% [23], suggesting that the MRI criteria for detecting lymph node metastasis are not reliable. However, the ROC curves of RF classifier showed an AUC of 0.746 (test set), which was partially consistent with the results of Huang’s study [24]. The study found radiomics signatures and other risk factors could conveniently facilitate the individualized preoperative prediction of lymph node metastasis in patients with CRC. Therefore, the RF model might reflect the aggressiveness of particular tumor tissue.

This study had several limitations. First, it was a retrospective study prone to selection bias, and the exclusion of patients with distant metastases limited its application. Hence, more patients should be included to validate the results. Second, due to the relatively small sample size, some lesions were nonuniformly distributed. Further studies are needed to broaden the application of radiomics for these lesions. Finally, radiomics is a relatively new image analysis approach; the MRI scanning parameters and machine learning models are not yet standardized. Large prospective multicenter trials are necessary to fully evaluate the role of radiomics in assessing the pathological features of rectal cancer.

Conclusions

In conclusion, this study demonstrated that the high-resolution T2WI–based radiomics showed good classification performance related to tumor pathological features in patients with rectal cancer. Thus, radiomics may serve as a good alternative for evaluating the pathological features of rectal cancer and can add a further dimension to the predictive power of imaging.