ABSTRACT
The classification of extramural vascular invasion (EMVI) status using baseline magnetic resonance imaging (MRI) in rectal cancer has gained significant attention as EMVI is an important prognostic marker. Likewise, the accurate prediction of which patients will achieve a complete response (CR) from primary staging MRI assists clinicians in determining subsequent treatment plans. Most studies have utilised radiomics-based methods, which require manually annotated segmentations and handcrafted features and tend to generalise poorly. We retrospectively collected data from 509 patients across 9 centres and propose a fully automated pipeline for EMVI status classification and CR prediction with diffusion-weighted imaging (DWI) and T2-weighted imaging. We applied nnUNet, a self-configuring deep learning model, for tumour segmentation and used the multi-level image features it learned to train a classification model, named MLNet. This ensures a more comprehensive representation of tumour features, in terms of both fine-grained detail and global context. On external validation, MLNet yielded AUCs similar to internal validation and outperformed 3D ResNet10, a ten-layer deep neural network designed for analysing spatiotemporal data, on both the CR and EMVI tasks. For CR prediction, MLNet outperformed the current state-of-the-art model using imaging and clinical features on the same external cohort. Our study demonstrates that incorporating multi-level image representations learned by a deep learning based tumour segmentation model on primary staging MRI improves EMVI classification and CR prediction, with good generalisation to external data. We observed variations in the contributions of individual feature maps to the different classification tasks. This pipeline has the potential to be applied in clinical settings, particularly for EMVI classification.
Introduction
Over the last two decades, advancements in imaging technologies have made stage-specific and personalized treatment of rectal cancer possible1,2,3,4. Magnetic Resonance Imaging (MRI) is the routine modality used to stratify patients into low, intermediate and high risk groups based on key risk factors such as tumour (T) stage, nodal (N) stage and involvement of the mesorectal fascia5,6,7. In addition, recent guidelines8 have also acknowledged extramural vascular (or venous) invasion (EMVI) (see Fig. S2 for EMVI visualisation) as an independent poor prognostic factor that should be taken into account for baseline staging and risk stratification. EMVI is defined as the spread of malignant cells beyond the rectal wall into adjacent perirectal blood vessels and is an important risk factor for local recurrence, distant metastasis and impaired overall survival9,10.
In addition to primary staging and risk stratification, MRI also plays an increasingly important role in assessing response to neoadjuvant treatment11,12. High-risk (locally advanced) patients typically undergo radiotherapy or combined chemoradiotherapy (CRT) to induce tumour downsizing and downstaging prior to surgery. As a result of CRT, up to 27% of patients may achieve a complete response (CR)13. Organ-preserving (watch and wait) treatment may be offered as an alternative to standard resection for these patients, provided that they can be accurately selected. This option has been associated with favourable long-term oncological outcomes and improved quality of life14. The combination of digital rectal examination, endoscopy and MRI including diffusion-weighted imaging (DWI) has been shown to yield good diagnostic performance in identifying a CR after completion of CRT15. In addition to assessing response after completion of CRT, recent studies16 have focused on early response prediction using imaging biomarkers derived from baseline MRI (including DWI) scans. Predicting response before the start of treatment could create new opportunities to further personalise neoadjuvant treatment schemes depending on the anticipated response. Recent studies17,18,19,20 demonstrate reasonable results for predicting risk factors such as EMVI and response to CRT by combining Artificial Intelligence (AI) techniques with MRI to develop prognostic image biomarker models. So far, these models have mostly used combinations of clinical and/or radiomics features, which require manual MRI delineation, feature extraction, and feature selection steps. Ao et al. 17 assessed preoperative EMVI using quantitative Dynamic Contrast-Enhanced MRI and DWI parameters, achieving an area under the ROC curve (AUC) of 0.856 with 84 patients from a single centre. Shu et al. 18 proposed an EMVI prediction model using multiparametric MRI including T2-weighted images (T2W), T1-weighted images (T1W), and DWI, with an AUC of 0.835 on 317 patients from a single-centre dataset without external validation.
Regarding CR prediction, Bourbonne et al. 21 concluded in their recent review that substantial efforts have been made to improve the quality of published radiomics models. As of the 14th of November 2022, there were 36 studies concerning MRI-only radiomics, with reported AUCs ranging from 0.70 to 0.95, and most were retrospective studies based on pre-CRT MRI only. Moreover, in most studies the tumour volumes were delineated manually by radiologists, which hinders the implementation of fully automated classification models. Some studies applied deep learning (DL) techniques. Unlike radiomics, which uses handcrafted, quantifiable features, DL is able to extract features automatically from images. Zhu et al. 22 proposed a DL model to predict response by training on Apparent Diffusion Coefficient (ADC) patches delineated by radiologists. Their DL model achieved an AUC of 0.851 (95% CI: 0.789–0.914), again based on data from a single centre. Jin et al. 23 presented a multi-task deep learning approach consisting of two Siamese sub-networks that are joined at multiple layers. The multi-task model utilises both pre- and post-treatment multiparametric MRI (DWI, T2W, T1W, T1-weighted with contrast enhancement (T1W + C)), achieving an AUC of 0.95 in two independent cohorts. However, when the same model was trained by Wichtmann et al. 24 in a multi-centre (4 centres) scenario, it showed an AUC of only 0.60 using the combination of pre- and post-therapeutic T2W, DWI, and ADC maps as input. Wichtmann et al. 24 thus demonstrated the current challenge of constructing deep learning models from multi-institutional medical data: data from different origins can contain significant variations depending on specific parametrisation, creating a domain shift problem observed across multiple medical imaging modalities25,26.
In the management of rectal cancer using AI, there is a lack of multi-centre studies validating the generalisability of models and their feasibility for automated implementation in clinical settings. In this study, we introduce a fully automated deep learning pipeline. The pipeline consists of nnUNet27, a self-configuring DL tumour segmentation model, and a classification model, named MLNet, that utilises the multi-level image representations learned by nnUNet. To validate the pipeline, we used a multi-centre dataset comprising 509 patients from 9 medical centres in the Netherlands. The proposed automated pipeline aims to classify EMVI status and predict treatment response using primary staging MRI, thereby providing potential additional value to the preoperative clinical workflow.
Results
Characteristics of cases
We used a dataset collected as part of a previously published multi-centre study, which included the baseline staging MRI (DWI and T2W) of 509 patient cases (obtained from one university hospital, seven large teaching hospitals and one comprehensive cancer centre from the Southern and Northern parts of the Netherlands) with locally advanced rectal cancer undergoing neoadjuvant CRT. Further in- and exclusion criteria were as described by Schurink et al. 28. Baseline staging variables cT-stage (cT1-2, cT3, cT4) and cN-stage (cN0, cN1, cN2) were derived from the original staging reports, which were produced by a multitude of readers. The data were grouped into mrEMVI+ and mrEMVI- cases based on clinical assessment by an expert radiologist (D.M.J.L.) with >10 years of dedicated experience in rectal MRI. In total, there were 304 EMVI+ cases and 205 EMVI- cases. Additionally, the data were divided into complete response of the primary tumour (CR) and non-complete response (non-CR) groups. CR was defined as either a complete pathological response after surgery (pCR = ypT0) or a sustained clinical complete response (cCR) with no evidence of luminal regrowth on repeated follow-up MRI and endoscopy for a period longer than 2 years. There were 368 non-CR cases and 141 CR cases. Lymph nodes were not taken into account. The characteristics of the rectal cancer cases used in our study are summarised in Table 1. There were no significant differences in basic demographic features or tumour characteristics between the development cohort and the external validation cohort (all p values > 0.05).
Tumour segmentation
In the first part of our proposed automated pipeline (see Fig. 1), we trained 2 nnUNet models, with DWI and with DWI + T2W, separately. The Dice similarity score (Dice) was used to measure segmentation performance. The mean Dice (mDice) of the 4-fold cross-validation for DWI and DWI + T2W was 0.75 and 0.76, respectively. The mDices on external validation were 0.73 (DWI) and 0.74 (DWI + T2W), see Table 2. Adding T2W increased the mDice for both cross-validation and external validation by 1%. The Dice difference between DWI and DWI + T2W segmentation was not significant (p = 0.31 for internal validation, p = 0.61 for external validation). As the boxplot in Fig. 2b shows, nnUNet struggled to segment some cases (Dice < 0.20) and failed to delineate several hard samples (Dice = 0.00). Figure 3 illustrates the segmentation performance for cases I−IV from the external data; their corresponding Dice scores can be found in Table 2.
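For reference, the Dice score used throughout can be computed as in the following minimal NumPy sketch (the evaluation code actually used is in the MLNet repository):

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    if denom == 0:  # both masks empty: define as perfect agreement
        return 1.0
    return 2.0 * intersection / (denom + eps)

# toy 3D masks
pred = np.zeros((4, 4, 4), dtype=np.uint8)
gt = np.zeros((4, 4, 4), dtype=np.uint8)
pred[1:3, 1:3, 1:3] = 1   # 8 voxels
gt[1:3, 1:3, 1:4] = 1     # 12 voxels, 8 overlapping
print(round(dice_score(pred, gt), 2))  # 2*8/(8+12) = 0.8
```

The handling of the empty-versus-empty case is a convention; a Dice of 0.00 as reported above corresponds to predicted and reference masks with no overlap at all.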
After the tumour segmentation training, feature maps from the 4 stages of nnUNet were inferred. Visualisations of the feature maps from different stages for cases I−IV can be seen in Fig. 4. Stages 1−2 represented more superficial, finer features, whereas stages 3−4 showed coarser, more abstract image representations. nnUNet failed to delineate the rectal tumour for case IV (Dice 0.00), but its feature maps were still able to capture the tumoural regions.
EMVI classification and Complete Response prediction
Tables 3–4 and Fig. 5 show the EMVI classification and CR prediction results using MLNet with DWI only and with DWI + T2W on the external validation. Multivariate analysis using logistic regression was also performed for both the EMVI and CR tasks to compare the predictive effect of clinical factors, including age, gender, and T and N staging, with MLNet. For EMVI classification, MLNet with DWI alone showed better classification power, with an AUC of 0.76 (0.66−0.84) on external validation (Table 3, Fig. 5a) and 0.76 (0.66−0.85) on internal validation (Table S4). The addition of T2W increased the internal AUC to 0.78 (0.68−0.87) (Table S6); however, the external AUC declined to 0.73 (0.62−0.83), suggesting signs of overfitting. For CR prediction, in contrast, the combination of DWI and T2W demonstrated superior performance, yielding an AUC of 0.66 (0.55−0.77) in the external cohort (Table 4, Fig. 5b) and 0.65 (0.52−0.77) on the internal validation set (Table S7). These results outperformed the DWI-only pipeline, which produced an external AUC of 0.62 (0.49−0.73) and an internal AUC of 0.62 (0.50−0.74) (Table S5). MLNet demonstrated superior performance on both the EMVI and CR tasks compared with the multivariate analysis in the external and development cohorts (Tables S8−10).
Tables 5–6 and Fig. 6 show the ablation analysis of EMVI classification and CR prediction using ResNet10, MLNet and the individual stages of feature maps with DWI only in the external cohort; the results for the internal cohort can be found in Tables S4–5. For EMVI classification (Table 5, Fig. 6a), features extracted from the first and second stages, which encompass finer details and more information-rich attributes, played a more pivotal role in the model’s decision. In particular, the network using features from the first stage alone achieved noteworthy performance, yielding an AUC of 0.79 (0.70−0.87) and surpassing MLNet, which incorporated representations from all four stages and achieved an AUC of 0.76 (0.66−0.84). In contrast, for CR prediction (Table 6, Fig. 6b), features from the third and fourth stages, characterised by coarser semantic attributes, had a more substantial impact on the final decision. Nevertheless, MLNet exhibited the best performance on the CR task. Similar patterns were also observed in the ablation analysis using both T2W and DWI; see Tables S2–3 and Fig. S3 for the external validation and Tables S6−7 for the internal validation.
AI explainability
To explore the interpretability of the classification models, we show the attention maps of the networks in the ablation analysis for both the EMVI and response tasks using Grad-CAM++29 (see Fig. 7). In case I, all the classification networks, including 3D ResNet10, successfully concentrated on the tumoural and surrounding regions for both tasks. In cases II and III, MLNet effectively directed its attention to the tumour and peri-tumoural areas in both the EMVI and CR experiments. When using features exclusively from the first stage, the model focused selectively on tumour-related areas during EMVI classification but failed to maintain this focus in the CR task. Furthermore, with the progression to coarser features (stages 3−4), the network lost its ability to focus on the tumour. For cases II and III, in the response prediction task, features from a single stage alone appeared insufficient to guide the model to concentrate on the rectal tumour regions. MLNet highlighted the tumour and peri-tumoural areas in case IV despite the failure of tumour segmentation (Dice = 0.00). Overall, we observed that by injecting the four-stage feature maps from the segmentation network, MLNet was effectively guided to focus on tumoural and peri-tumoural regions across all four cases in the classification tasks. In some specific cases, features from a single stage alone were also capable of localising the tumour and its neighbouring regions.
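The study uses Grad-CAM++; the simpler original Grad-CAM, which the ++ variant refines with higher-order gradient weighting, can be sketched in plain PyTorch with forward/backward hooks. The toy 3D CNN below is purely illustrative, not the MLNet architecture:

```python
import torch
import torch.nn as nn

# toy 3D CNN standing in for the classification backbone (hypothetical)
model = nn.Sequential(
    nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, 1),
)
target_layer = model[0]

# hooks capture the target layer's activations and their gradients
acts, grads = {}, {}
target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.randn(1, 1, 16, 16, 16)
score = model(x).squeeze()
model.zero_grad()
score.backward()

# Grad-CAM: channel weights = spatially averaged gradients, then a
# ReLU-ed weighted sum of the activation maps
w = grads["g"].mean(dim=(2, 3, 4), keepdim=True)   # (1, 8, 1, 1, 1)
cam = torch.relu((w * acts["a"]).sum(dim=1))       # (1, 16, 16, 16)
cam = cam / (cam.max() + 1e-8)                     # normalise to [0, 1]
print(cam.shape)  # torch.Size([1, 16, 16, 16])
```

The resulting map, upsampled to the input resolution, gives the kind of attention overlay shown in Fig. 7.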
Discussion
We have proposed a fully automated pipeline for rectal tumour segmentation, the classification of EMVI status and the prediction of treatment outcome (complete response to CRT). The pipeline consists of nnUNet and MLNet, a lightweight CNN. nnUNet was trained to achieve automated tumour segmentation and to extract features at different scales from baseline MRI. MLNet, which fuses the inferred segmentation features into 3D ResNet10, was trained to classify EMVI status and to predict treatment response. The nnUNet model demonstrated favourable rectal tumour segmentation performance and generalisation capability in a multi-centre setting, achieving an mDice of 0.73 (0.74) on external validation and 0.75 (0.76) on cross-validation using DWI (DWI + T2W).
For EMVI classification, the AUC was 0.76 (0.66−0.84) on the external validation dataset. With only the finest feature map (stage 1), the external-validation AUC reached 0.79 (0.70−0.87) using DWI alone. For CR prediction, MLNet achieved an AUC of 0.66 (0.55−0.77), outperforming the current state-of-the-art by Schurink et al. 28 on the same external cohort. Schurink et al. 28 developed a clinical-imaging model to predict CR; their best-performing model, using non-imaging (weeks to surgery) and advanced staging variables (tumour height, T and N staging, invasion depth and tumour length), achieved an AUC of 0.60 (0.53−0.76). As with other radiomics-based or tumour-centred crop-based models, one limitation of the study by Schurink et al. 28 was that manually annotated segmentations by experienced radiologists were required for both the development and test cohorts. To address this, Jin et al. 23 proposed a multi-task learning model for both segmentation and pCR assessment using pre- and post-treatment multiparametric MRI, showing state-of-the-art results on single-centre data. Such a multi-task network has two drawbacks. Firstly, it requires both pre- and post-treatment MRIs, whereas pre-treatment prediction of response is potentially beneficial for personalising neoadjuvant strategies; moreover, the rectal tumour is easier to visualise on pre-treatment than on post-treatment MRIs, where high signal areas are frequently less noticeable and may be distributed throughout the fibrosis30. Secondly, training such a heavy multi-task model is computationally expensive. In our study, only baseline MRI was used, and training and inference of the lightweight MLNet were significantly faster than for the multi-task network. Some studies have also proposed machine learning based automated workflows. Defeudis et al. 31 demonstrated automated radiomics models predicting pCR after nCRT in LARC using DWI and T2W performed before CRT.
Their AUC reached 0.81 (0.60–0.89) on external validation data. However, the main limitation is that they excluded all cases with an automated segmentation Dice lower than 0.2, as they could not guarantee that the radiomics features came from the targeted tumour regions with such poor segmentation results; consequently, the prediction AUC is biased towards high-Dice cases. In a multi-centre setting, predicted masks with a Dice lower than 0.2 occur frequently due to data heterogeneity. MLNet addresses this problem by injecting tumour representations derived from different levels of nnUNet. For instance, in case IV, even though the segmentation network failed to contour the rectal tumour, MLNet was capable of capturing hidden features for CR prediction.
Most radiomics models consider only the tumour core region. However, peri-tumoural regions also potentially contain useful information. Delli Pizzi et al. 32 presented an MRI-based machine learning model using clinical features and radiomics features extracted from both the “tumour core” (the whole rectal tumour manually segmented on pre-treatment T2W) and the “tumour border” (the most peripheral portion of the tumour core and the surrounding tissues). By adding “tumour border” features, the machine learning model outperformed the model using “tumour core” regions only, demonstrating that peri-tumoural tissues contain meaningful features for identifying treatment responders. Rectal cancer arises in close association with white adipose tissue (mesorectal fat). Nutrient supply and catabolite drainage to and from the normal rectal wall and rectal tumours must travel through the mesorectal fat by way of vessels and lymphatics19, indicating that the mesorectal fat and the structures within it contain potential predictive information. Jayaprakasam et al. 33 extracted radiomics features from the mesorectal fat of patients with LARC to predict pathological complete responders (accuracy 83.9%) and local (accuracy 78.3%) or distant recurrence (accuracy 87.0%). Their study further demonstrated the potential predictive value of peri-tumoural regions. MLNet takes not only the peri-tumoural regions but also the global context into consideration: the original MRI is included in the input, which allows the incorporation of global information, while local information is highlighted by injecting the multi-level feature maps.
In our ablation study, we also observed that feature maps extracted from different stages contributed differently to the EMVI classification and CR prediction tasks. The reason might be that finer features are more crucial for morphological prognostic factors like EMVI, whereas for more challenging and intricate tasks like CR prediction, the integration of multi-level features is more beneficial.
There are some limitations to this study. First of all, despite MLNet outperforming the current state-of-the-art28, the sensitivity (0.61) and positive predictive value (PPV) (0.44) were comparatively low, indicating that MLNet’s ability to correctly identify responders was limited, which hinders the implementation of the pipeline in the clinical workflow. The relatively low response rate (around 30%) could be one contributor to the low sensitivity and PPV. Even though we applied a weighted loss, the data imbalance can still limit model performance. To address this issue, Generative Adversarial Networks (GANs) could be used to generate synthetic data for the responder class34. Additionally, there is currently no standardised protocol for MRI evaluation of treatment response in locally advanced rectal cancer, which can lead to variability in the labelling of treatment response across different centres35. Secondly, all manual segmentation was based on DWI; T2W was then downsampled into the DWI domain, which led to information loss in T2W. Thirdly, the standard of reference for EMVI was based on radiologist assessment using MRI rather than pathology, considering that the patients underwent CRT and post-CRT EMVI status would no longer be representative of the baseline setting. Fourthly, the dataset in our study was collected over a long time frame, between February 2008 and March 2018, from different centres. The resulting quality variations negatively affect model performance. We only used the nnUNet pre-processing module to deal with data heterogeneity; other state-of-the-art methods could be adopted. Modanwal et al. 36 proposed a CycleGAN-based method for MRI normalisation; their model can successfully learn a bidirectional mapping and normalise between MRIs produced by different vendors.
Fifthly, although we have data from 9 centres, the total number of samples is only 509. Besides the sample size limitation, the inclusion of a solely Dutch patient cohort may impact the generalisability of the findings; more diverse patient cohorts would be beneficial. Sixthly, some studies23 have shown that clinical features such as carcinoembryonic antigen (CEA) level can improve model performance. Collecting CEA and other clinical features could be useful for the MLNet model, and integrating other modalities such as endoscopic imaging could further enhance performance. Last but not least, the retrospective nature of the study is also a limitation. A prospective cohort from multiple centres could further demonstrate the performance of the model.
Methods
Dataset
Patient data were retrospectively collected if they satisfied the following criteria: (1) biopsy-proven rectal adenocarcinoma; (2) non-metastasised; (3) available pre-treatment MRI (T2W and high b-value DWI); (4) routine long-course neoadjuvant treatment including radiotherapy with a total dose of 50.0−50.4 Gy and concurrent capecitabine-based chemotherapy; (5) final treatment consisting of surgery or watch and wait with more than 2 years of clinical follow-up. The study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Review Board (IRB) of the Netherlands Cancer Institute. Each participating centre reviewed the study protocol and provided approval. Informed consent was waived by the IRB and by each participating centre during local ethical review and approval due to the retrospective nature of the study. 670 patients were initially collected and 161 patients were excluded (see Fig. S1). Data from the remaining 509 patients were obtained using 25 scanners, with 94 protocols for DWI and 112 for T2W (see Table S1). For DWI, b-values ranged from 600 to 1200 s/mm². A semi-automatic level-tracing algorithm was first used to segment all high b-value DWIs. A board-certified radiologist with >10 years of experience in rectal MRI then manually adjusted the segmentations slice by slice, taking the anatomical information from T2W into consideration and taking care to exclude the rectal lumen and any non-tumour perirectal tissues. The same expert radiologist reported the mrEMVI status for each patient. We split patients into training, validation and external testing sets centre-wise. To allow a fair comparison of CR classification performance with the current state-of-the-art, we kept the same external cohort (3 centres) as Schurink et al. 28. Of the remaining 6 centres, 2 were randomly chosen as the internal validation set.
Segmentation
nnUNet, proposed by Isensee et al. 27, is a deep learning-based segmentation approach that automatically configures itself for any new task, including preprocessing, network design, training, and post-processing. nnUNet has shown great performance over 23 public datasets used in international biomedical segmentation competitions27. It has a state-of-the-art preprocessing technique, which automatically generates a dataset fingerprint containing all relevant parameters and properties. In addition, the networks are trained with a deep supervision strategy37. Deep supervision provides supervision signals to hidden layers and propagates them to lower layers, instead of supervising only at the output layer38. In nnUNet, deep supervision downsamples the ground truth masks to different scales with tri-linear interpolation so that they correspond to the output at each upsampling stage. The final segmentation loss is then the weighted combination of the losses at each of these upsampling stages. Deep supervision allows gradients to be fed deeper into the network and facilitates the training of all layers. All the feature maps at the different stages are inferred after segmentation training for further use in the classification tasks.
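The deep supervision scheme can be sketched as follows. This uses binary cross-entropy for brevity, whereas nnUNet combines Dice and cross-entropy losses; the halved per-scale weights mirror nnUNet's default weighting:

```python
import torch
import torch.nn.functional as F

def deep_supervision_loss(outputs, gt, weights=None):
    """Weighted sum of losses over multi-scale decoder outputs.

    outputs: list of logits, finest first, e.g. shapes
             (B, 1, D, H, W), (B, 1, D/2, H/2, W/2), ...
    gt:      full-resolution binary mask of shape (B, 1, D, H, W)
    """
    if weights is None:
        # halve the weight at each coarser scale, then normalise
        weights = [2.0 ** -i for i in range(len(outputs))]
        weights = [w / sum(weights) for w in weights]
    total = 0.0
    for w, out in zip(weights, outputs):
        # downsample the ground truth to this decoder output's scale
        gt_s = F.interpolate(gt, size=out.shape[2:], mode="trilinear",
                             align_corners=False)
        total = total + w * F.binary_cross_entropy_with_logits(out, gt_s)
    return total

# toy example: two decoder scales
gt = torch.randint(0, 2, (1, 1, 8, 8, 8)).float()
outs = [torch.randn(1, 1, 8, 8, 8), torch.randn(1, 1, 4, 4, 4)]
loss = deep_supervision_loss(outs, gt)
```

Because every decoder scale receives its own loss term, gradients flow directly into the intermediate layers rather than only from the final output.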
Classification
The second part of our automated pipeline is a lightweight CNN, modified on top of a 3D ResNet1039. Different backbones were compared in the external cohort, and 3D ResNet10 outperformed all other 3D ResNet backbones (see Table S11 and Fig. S4). The original MRI was fed into the model as input; experiments using segmentation features as input without the original MRIs underperformed MLNet (see Table S12). Additionally, rather than replacing the residual blocks with skip connections, feature maps of the different stages inferred from the segmentation network were injected into the classification network as prior knowledge. The feature injection was done by concatenation (Fig. 1b). For the ablation analysis, only the original MRI was used as input for the 3D ResNet10. Single-stage representations were injected into the StageN (N = 1, 2, 3, 4) models, with the original MRI again serving as the input. Multivariate analysis was conducted with logistic regression using the development cohort (412 patients, 6 centres) and externally validated on the same data as the other models in the ablation analysis.
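The concatenation-based injection can be sketched as follows (an illustrative module, not the published MLNet code; the channel counts are hypothetical):

```python
import torch
import torch.nn as nn

class FeatureInjectionBlock(nn.Module):
    """Concatenate a segmentation feature map with the classifier's
    intermediate features before the next convolution."""

    def __init__(self, cls_ch, seg_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(cls_ch + seg_ch, out_ch, 3, padding=1),
            nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, cls_feat, seg_feat):
        # resize the injected map if the spatial grids differ
        if seg_feat.shape[2:] != cls_feat.shape[2:]:
            seg_feat = nn.functional.interpolate(
                seg_feat, size=cls_feat.shape[2:], mode="trilinear",
                align_corners=False)
        return self.conv(torch.cat([cls_feat, seg_feat], dim=1))

block = FeatureInjectionBlock(cls_ch=16, seg_ch=32, out_ch=16)
cls_feat = torch.randn(1, 16, 8, 8, 8)     # classifier features
seg_feat = torch.randn(1, 32, 16, 16, 16)  # e.g. a stage-1 nnUNet feature map
out = block(cls_feat, seg_feat)
print(out.shape)  # torch.Size([1, 16, 8, 8, 8])
```

Concatenation (rather than addition) keeps the segmentation-derived channels distinct, letting the following convolution learn how much weight to give the injected prior.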
Experiment
For the segmentation part, we trained a 4-fold nnUNet and then inferred the predicted masks and the corresponding feature maps of the 4 stages (from fine to coarse; see Fig. 4 for feature visualisation) for both the 4-fold validation and the external validation. After segmentation training, we split the development data (6 centres, 412 patients) into a training set (4 centres, 317 patients) and an internal validation set (2 centres, 95 patients). MLNet, as well as the other models in the ablation analysis, was constructed using the training cohort, internally validated, and further validated on the external validation cohort. The pipeline was implemented in PyTorch40. Both nnUNet and MLNet were trained on an NVIDIA RTX 2080 Ti GPU. During the training of nnUNet, all hyperparameters were automatically configured. During the training of MLNet, the batch size was set to 4 and the initial learning rate to 1e-4. Weighted binary cross-entropy was used as the loss function and Adam41 as the optimiser. Additionally, sharpness-aware minimisation (SAM)42, which simultaneously minimises the loss value and loss sharpness, was adopted. To avoid overfitting, the training patience was set to 10, and the model with the best loss on the internal validation set was saved.
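The SAM update can be sketched as a two-pass step in PyTorch (a minimal illustration of the procedure in ref. 42, not the exact training code; in practice one would use a tested SAM wrapper):

```python
import torch

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    """One sharpness-aware minimisation step: perturb the weights
    towards higher loss, recompute the gradient there, then update
    from the original weights."""
    # first forward/backward: gradient at the current weights
    loss = loss_fn(model(x), y)
    base_opt.zero_grad()
    loss.backward()

    grad_norm = torch.norm(torch.stack(
        [p.grad.norm() for p in model.parameters() if p.grad is not None]))
    eps = {}
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)  # ascend to the "sharp" point
            p.add_(e)
            eps[p] = e

    # second forward/backward: gradient at the perturbed weights
    loss2 = loss_fn(model(x), y)
    base_opt.zero_grad()
    loss2.backward()

    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)  # restore the original weights
    base_opt.step()    # update using the sharpness-aware gradient
    return loss.item()

model = torch.nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x, y = torch.randn(8, 4), torch.randn(8, 1)
l = sam_step(model, torch.nn.functional.mse_loss, x, y, opt)
```

The second gradient is evaluated at the perturbed (worst-case nearby) weights, so the final step seeks flat minima, which tends to improve generalisation, a desirable property in a multi-centre setting.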
Statistical analysis
Statistical analysis was performed using Python 3.8.15; for further information, see the MLNet GitHub repository. The Dice is used to evaluate tumour segmentation performance. The AUC, sensitivity, specificity, PPV, Negative Predictive Value (NPV) and F1 score are used to evaluate the EMVI classification and CR prediction results. All the metrics are shown in Eqs. 1–7. The operating points for distinguishing between EMVI+ and EMVI-, and between CR and non-CR, were generated using the maximum Youden index on the internal validation cohort, and the same thresholds were applied to the external set. 95% confidence intervals were generated with the bootstrap method with 10,000 replications43. Differences in cohort characteristics were compared with the Kruskal–Wallis test. The Mann–Whitney U test was used to compare the metrics of the different methods. All statistical analyses were two-sided and a p value less than 0.05 was regarded as statistically significant. All the metrics in our study are listed as follows:
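In standard form, with X and Y denoting the predicted and reference segmentation masks, and with the equation numbering assumed to follow the order in which the metrics are listed above:

```latex
\begin{align}
\mathrm{Dice} &= \frac{2\,|X \cap Y|}{|X| + |Y|} \tag{1}\\
\mathrm{AUC} &= \frac{\sum_{ins_i \in \mathrm{positive\ class}} Rank_{ins_i} \;-\; \frac{M(M+1)}{2}}{M \cdot N} \tag{2}\\
\mathrm{Sensitivity} &= \frac{TP}{TP + FN} \tag{3}\\
\mathrm{Specificity} &= \frac{TN}{TN + FP} \tag{4}\\
\mathrm{PPV} &= \frac{TP}{TP + FP} \tag{5}\\
\mathrm{NPV} &= \frac{TN}{TN + FN} \tag{6}\\
\mathrm{F1} &= \frac{2\,TP}{2\,TP + FP + FN} \tag{7}
\end{align}
```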
Where TP is true positive, TN is true negative, FN is false negative and FP is false positive. For the AUC calculation, M and N are the numbers of positive and negative samples, \({Rank}_{ins_i}\) is the rank of sample \(i\) when all samples are sorted by predicted score, and \({\sum}_{ins_i\in positive\,class}{Rank}_{ins_i}\) sums the ranks of the positive cases.
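The rank-based AUC formula can be checked with a small example (illustrative NumPy code, not the study's evaluation script):

```python
import numpy as np

def auc_by_ranks(scores, labels):
    """AUC via the rank-sum (Mann-Whitney) formula: rank all samples
    by score, sum the ranks of the positives, and normalise by the
    number of positive/negative pairs."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    M, N = labels.sum(), (~labels).sum()
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    # tie correction: replace ranks of tied scores with their mean
    for s in np.unique(scores):
        mask = scores == s
        ranks[mask] = ranks[mask].mean()
    pos_rank_sum = ranks[labels].sum()
    return (pos_rank_sum - M * (M + 1) / 2) / (M * N)

scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 0]   # every positive outranks every negative
print(auc_by_ranks(scores, labels))  # 1.0
```

This is numerically identical to counting the fraction of positive/negative pairs in which the positive sample receives the higher score.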
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
The original data are private and not publicly available, to guarantee protection of patients’ privacy. All data supporting the findings can be provided upon reasonable request to the corresponding author for non-commercial and academic purposes. Excel files containing the raw data included in the main figures and tables can be found in the Source Data File of the article. We provide the full source code of this study to facilitate reproducibility.
Code availability
The code for this paper is available at https://github.com/Liiiii2101/MLNet. It should be used for academic purposes only.
References
Arnold, D. & Stein, A. Personalized treatment of colorectal cancer. Oncol. Res. Treat. 35, 42–48 (2012).
Balyasnikova, S. & Brown, G. Optimal imaging strategies for rectal cancer staging and ongoing management. Curr. Treat. Options Oncol. 17, 1–11 (2016).
Brouwer, N. P. et al. An overview of 25 years of incidence, treatment and outcome of colorectal cancer patients. Int. J. Cancer 143, 2758–2766 (2018).
Andrei, P. et al. Integrated approaches for precision oncology in colorectal cancer: the more you know, the better. Semin. Cancer Biol. 84, 199–213 (2022).
Jhaveri, K. S. & Hosseini-Nik, H. MRI of rectal cancer: an overview and update on recent advances. Am. J. Roentgenol. 205, W42–W55 (2015).
Horvat, N., Carlos Tavares Rocha, C., Clemente Oliveira, B., Petkovska, I. & Gollub, M. J. MRI of rectal cancer: tumor staging, imaging techniques, and management. Radiographics 39, 367–387 (2019).
Bates, D. D. et al. MRI for rectal cancer: staging, mrCRM, EMVI, lymph node staging and post-treatment response. Clin. Colorectal Cancer 21, 10–18 (2022).
Beets-Tan, R. G. et al. Magnetic resonance imaging for clinical management of rectal cancer: updated recommendations from the 2016 European Society of Gastrointestinal and Abdominal Radiology (ESGAR) consensus meeting. Eur. Radiol. 28, 1465–1475 (2018).
Yu, J. et al. Prognostic aspects of dynamic contrast-enhanced magnetic resonance imaging in synchronous distant metastatic rectal cancer. Eur. Radiol. 27, 1840–1847 (2017).
Zech, C. J. MRI of extramural venous invasion in rectal cancer: a new marker for patient prognosis? Radiology 289, 686–687 (2018).
Lambregts, D. M., Boellaard, T. N. & Beets-Tan, R. G. Response evaluation after neoadjuvant treatment for rectal cancer using modern MR imaging: a pictorial review. Insights Imaging 10, 1–14 (2019).
Fernandes, M. C., Gollub, M. J. & Brown, G. The importance of MRI for rectal cancer evaluation. Surg. Oncol. 43, 101739 (2022).
Maas, M. et al. Long-term outcome in patients with a pathological complete response after chemoradiation for rectal cancer: a pooled analysis of individual patient data. Lancet Oncol. 11, 835–844 (2010).
Roh, M. S. et al. Preoperative multimodality therapy improves disease-free survival in patients with carcinoma of the rectum: NSABP R-03. J. Clin. Oncol. 27, 5124 (2009).
López-Campos, F. et al. Watch and wait approach in rectal cancer: Current controversies and future directions. World J. Gastroenterol. 26, 4218 (2020).
Mahadevan, L. S. et al. Imaging predictors of treatment outcomes in rectal cancer: an overview. Crit. Rev. Oncol. Hematol. 129, 153–162 (2018).
Ao, W. et al. Preoperative prediction of extramural venous invasion in rectal cancer by dynamic contrast-enhanced and diffusion weighted MRI: a preliminary study. BMC Med. Imaging 22, 1–12 (2022).
Shu, Z. et al. Multiparameter MRI-based radiomics for preoperative prediction of extramural venous invasion in rectal cancer. Eur. Radiol. 32, 1–12 (2022).
Shaish, H. et al. Radiomics of MRI for pretreatment prediction of pathologic complete response, tumor regression grade, and neoadjuvant rectal score in patients with locally advanced rectal cancer undergoing neoadjuvant chemoradiation: an international multicenter study. Eur. Radiol. 30, 6263–6273 (2020).
Petresc, B. et al. Pre-treatment T2-WI based radiomics features for prediction of locally advanced rectal cancer non-response to neoadjuvant chemoradiotherapy: a preliminary study. Cancers 12, 1894 (2020).
Bourbonne, V. et al. Radiomics approaches for the prediction of pathological complete response after neoadjuvant treatment in locally advanced rectal cancer: ready for prime time? Cancers 15, 432 (2023).
Zhu, H.-T., Zhang, X.-Y., Shi, Y.-J., Li, X.-T. & Sun, Y.-S. A deep learning model to predict the response to neoadjuvant chemoradiotherapy by the pretreatment apparent diffusion coefficient images of locally advanced rectal cancer. Front. Oncol. 10, 574337 (2020).
Jin, C. et al. Predicting treatment response from longitudinal images using multi-task deep learning. Nat. Commun. 12, 1851 (2021).
Wichtmann, B. D. et al. Are we there yet? The value of deep learning in a multicenter setting for response prediction of locally advanced rectal cancer to neoadjuvant chemoradiotherapy. Diagnostics 12, 1601 (2022).
AlBadawy, E. A., Saha, A. & Mazurowski, M. A. Deep learning for segmentation of brain tumors: Impact of cross‐institutional training and testing. Med. Phys. 45, 1150–1158 (2018).
Pooch, E. H., Ballester, P. & Barros, R. C. Can we trust deep learning based diagnosis? The impact of domain shift in chest radiograph classification. In Thoracic Image Analysis, Lecture Notes in Computer Science, Vol. 12502 (Springer, Cham, 2020).
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J. & Maier-Hein, K. H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 203–211 (2021).
Schurink, N. W. et al. Development and multicenter validation of a multiparametric imaging model to predict treatment response in rectal cancer. Eur. Radiol. 33, 8889–8898 (2023).
Chattopadhay, A., Sarkar, A., Howlader, P. & Balasubramanian, V. N. Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) 839–847 (2018).
van Heeswijk, M. M. et al. Automated and semiautomated segmentation of rectal tumor volumes on diffusion-weighted MRI: can it replace manual volumetry? Int. J. Radiat. Oncol. Biol. Phys. 94, 824–831 (2016).
Defeudis, A. et al. MRI-based radiomics to predict response in locally advanced rectal cancer: Comparison of manual and automatic segmentation on external validation in a multicentre study. Eur. Radiol. Exp. 6, 19 (2022).
Delli Pizzi, A. et al. MRI-based clinical-radiomics model predicts tumor response before treatment in locally advanced rectal cancer. Sci. Rep. 11, 5379 (2021).
Jayaprakasam, V. S. et al. MRI radiomics features of mesorectal fat can predict response to neoadjuvant chemoradiation therapy and tumor recurrence in patients with locally advanced rectal cancer. Eur. Radiol. 32, 971–980 (2022).
Lee, J. & Park, K. GAN-based imbalanced data intrusion detection system. Pers. Ubiquitous Comput. 25, 121–128 (2021).
Patel, U. B. et al. Magnetic resonance imaging–detected tumor response for locally advanced rectal cancer predicts survival outcomes: MERCURY experience. J. Clin. Oncol. 29, 3753–3760 (2011).
Modanwal, G., Vellal, A. & Mazurowski, M. A. Normalization of breast MRIs using cycle-consistent generative adversarial networks. Comput. Methods Prog. Biomed. 208, 106225 (2021).
Wang, L., Lee, C.-Y., Tu, Z. & Lazebnik, S. Training deeper convolutional networks with deep supervision. Preprint at arXiv:1505.02496 (2015).
Hesamian, M. H., Jia, W., He, X. & Kennedy, P. Deep learning techniques for medical image segmentation: achievements and challenges. J. Digit. Imaging 32, 582–596 (2019).
Hara, K., Kataoka, H. & Satoh, Y. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6546–6555 (2018).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32 (2019).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at arXiv:1412.6980 (2014).
Foret, P., Kleiner, A., Mobahi, H. & Neyshabur, B. Sharpness-aware minimization for efficiently improving generalization. Preprint at arXiv:2010.01412 (2020).
Efron, B. & Tibshirani, R. J. An Introduction to the Bootstrap (CRC Press, 1994).
Acknowledgements
This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement No 857894.
Author information
Contributions
S.B. and L.C. designed the project. D.M.J.L. performed the acquisition and annotation of data; L.C. analysed the data. L.C. and S.B. proposed the model. L.C. drafted the manuscript. S.B. supervised the project. D.M.J.L., M.M., R.G.H.B.T. and G.L.B. provided project administration and resources. E.H.P.P. and C.G. provided critical feedback. All authors approved the final version of this article.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Cai, L., Lambregts, D.M.J., Beets, G.L. et al. An automated deep learning pipeline for EMVI classification and response prediction of rectal cancer using baseline MRI: a multi-centre study. npj Precis. Onc. 8, 17 (2024). https://doi.org/10.1038/s41698-024-00516-x