Introduction

Rectal cancer accounts for almost half of all cases of colorectal cancer, which ranks third in incidence and second in mortality worldwide [1]. Comprehensive treatment is necessary, as the majority of patients are diagnosed with locally advanced rectal cancer (LARC). Multimodal therapy for patients with LARC includes neoadjuvant chemoradiation (nCRT) or total neoadjuvant therapy (TNT), total mesorectal excision (TME), and adjuvant chemotherapy (AC) [2,3,4]. With the improvement and standardization of multimodal treatment, approximately 15–27% of patients who receive nCRT and TME achieve a pathological complete response (pCR, ypT0N0) confirmed on pathological examination, and these patients have better long-term survival than patients without a pathological complete response (non-pCR) [5,6,7].

Given the high rate of pCR, a “wait-and-watch” strategy was proposed for patients with rectal cancer who achieve a clinical complete response (cCR, ycT0N0), meaning that these patients receive strict follow-up instead of radical surgery, which can cost the patient the anus and carries serious morbidity [8]. This approach has been shown to be feasible and beneficial [9,10,11]. Therefore, omitting radical surgery after nCRT in cCR patients might be a safe option, with a prognosis similar to that of pCR patients.

Currently, the tools for response reassessment include clinical examination and symptoms, digital rectal examination (DRE), endoscopy, biopsy, magnetic resonance imaging (MRI), transrectal ultrasound (TRUS), positron emission tomography-computed tomography (PET-CT), and carcinoembryonic antigen (CEA). MRI reassessment mainly refers to T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI), and dynamic contrast-enhanced (DCE) imaging. DRE, endoscopy, and MRI have been the main reassessment tools used to evaluate response when implementing the wait-and-watch strategy [12]. However, the independent and integrated value of reassessment tools in predicting pCR has been either undervalued or overvalued [12]. Prospective or even retrospective studies on combinations of reassessment tools for pCR prediction are lacking, and there is no worldwide consensus on the matter. Although various state-of-the-art tools that attempt to predict pCR, such as ctDNA, circulating tumor cells (CTC), exosome or bioptic ncRNA, and intelligent models, are increasingly promoting the wait-and-watch strategy, most of them are still under study and none has entered clinical routine because of unsatisfactory predictive accuracy or high cost [13]. Many studies have reported that a longer time interval to surgery, of 9–12 weeks or even more, could yield a better pCR rate or better survival outcomes [14,15,16,17]. However, whether the timing of the response reassessment tools after nCRT needs to be adjusted to attain higher predictive accuracy is still uncertain. Therefore, in this study, we evaluate and compare the value of common response reassessment tools for a wait-and-watch strategy in patients with rectal cancer, and we examine the influence of timing on the efficacy of these tools. To our knowledge, this is the first study that focuses on both the tools and the timing of reassessment for a wait-and-watch strategy in rectal cancer.

Patients and methods

Patient selection

A total of 364 patients were diagnosed with LARC from January 2013 to December 2021 at the Peking University Cancer Hospital and Institute. Ethics approval was granted by the Peking University Cancer Hospital Ethics Committee. The inclusion criteria were as follows: (1) locally advanced rectal adenocarcinoma confirmed pathologically by endoscopic biopsy before multimodal therapy; (2) long-course nCRT (without induction or consolidative chemotherapy) followed by response reassessment, and then either surgery or a wait-and-watch strategy. The exclusion criteria were as follows: (1) distant metastasis or other diagnosed tumors; (2) multimodal therapy received outside our center; (3) absence of baseline information. A total of 250 patients were eligible for further analysis, and their baseline characteristics are shown in Table 1. The study workflow is shown in Fig. 1.

Table 1 Patient demographics and pathological complete response
Fig. 1 The study workflow. Of the 364 patients identified, 250 were eligible for analysis. Abbreviations: nCRT: neoadjuvant chemoradiation therapy; pCR: pathological complete response

Multimodal therapy

Multimodal therapy procedures were discussed by a multidisciplinary team and performed by a fixed oncological team. The 7th edition of the American Joint Committee on Cancer TNM system was used for tumor staging. A long-course regimen of 22-fraction intensity-modulated radiation therapy (IMRT) with concurrent capecitabine at a dose of 825 mg/m² orally twice per day was delivered [18], with a total dose of 50.6 Gy to the gross tumor volume (GTV) and 41.8 Gy to the clinical target volume (CTV), administered 5 times per week over 30 days [19, 20]. Induction or consolidative chemotherapy was not used in this study group. Following nCRT, a wait-and-watch strategy was implemented when the response reassessment tools established a cCR or near-cCR. A non-cCR led to radical surgery. Patients who had no residual tumor cells in the pathological specimen were considered pCR. Patients in the wait-and-watch group who showed no sign of relapse within the first two years were also considered pCR analogues for analysis [21].

Response reassessment

The response reassessment tools applied included clinical examination and symptom inquiry, DRE, endoscopy, endoscopic biopsy, MRI, PET-CT, and laboratory assays including AFP, CEA, CA199, CA72.4, CA242, CA125, and NSE. The tools for evaluation were planned at the end of nCRT on the basis of multi-disciplinary treatment (MDT) discussion. At least three different reassessment tools were applied to reassess each patient's response to nCRT. In our practice, the definition of the rectum follows the international Delphi consensus, in which the sigmoid take-off visualized on imaging marks the upper rectal boundary [22]. The dentate line was considered the surgical definition of the lower rectal boundary. All patients were diagnosed using endoscopy with pathological confirmation from biopsy, and the distance from the lower edge of the tumor to the anus was measured and used in our analysis.

Based on the reassessment results, a specific therapeutic regimen was decided and developed in the MDT meeting. MRI with T2WI, DWI and DCE was performed with a 3.0 T scanner in our center. Prospective reports with written records, including endoscopy, pathology, MRI, and examination, were available and were further verified by specialists and MDT discussions. cCR, near-cCR, or non-cCR had been assigned during the treatment period and was further quantified in this study. A quantified score was introduced to grade the reassessment reports, based on previous studies by Maas [10, 12], Habr-Gama [23], MSKRS [24], and ESMO [25], as well as on our practice and other research [26]. In addition, ulcers, irregularity and other related abnormalities on endoscopy were considered possible non-cCR instead of near-cCR. “Normal,” “alleviated,” and “unalleviated” symptoms after nCRT were assigned 0, 1, and 2, respectively. Similarly, for the DRE, “the tumor negative,” “nodularity or abnormity,” and “nodularity with blood” were assigned 0, 1, and 2, respectively. In general, a variety of intricate results were simplified into three levels and quantified as “0,” “1,” and “2,” representing “cCR,” “near-cCR or possible non-cCR,” and “non-cCR,” respectively. The score details are shown in Table 2, and the definition of cCR or near-cCR was determined according to the scores in Table 3. The time interval to reassessment (TTR) after the end of nCRT was recorded, including time to clinical examination (TTC), time to DRE (TTD), time to endoscopy (TTE), time to biopsy (TTB), time to MRI (TTM), and time to laboratory biomarker assay (TTA). A time interval > 4 weeks was recommended, and the actual timing was mainly determined by appointment availability. The schematic overview is shown in Fig. 2.

Table 2 Quantified score assignment for reassessment results
Table 3 Performance of reassessment tool combinations in predicting complete response
Fig. 2 The schematic overview. Patients with locally advanced rectal cancer received routine multimodal therapy consisting of assessment, nCRT, response reassessment, and surgery or wait-and-watch. Patients with a predicted cCR were recommended to accept the wait-and-watch strategy. The response reassessment tools include MRI, endoscopy, biopsy, clinical symptoms, DRE, and blood assay. Abbreviations: nCRT: neoadjuvant chemoradiation; TTM: time to MRI; TTE: time to endoscopy; TTB: time to biopsy; TTC: time to clinical symptom; TTD: time to digital rectal examination; TTA: time to assay of blood; TTS: time to surgery
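
The score quantification described in this section can be illustrated with a minimal Python sketch. Only the 0/1/2 codes for symptoms and DRE are taken from the text. The function names, the dictionary layout, and above all the cutoff value are hypothetical, because the actual cumulative-score thresholds are defined in Table 3, which is not reproduced here.

```python
from typing import Dict

# Ordinal codes stated in the text:
# 0 = cCR, 1 = near-cCR or possible non-cCR, 2 = non-cCR.
SYMPTOM_SCORE: Dict[str, int] = {
    "normal": 0,
    "alleviated": 1,
    "unalleviated": 2,
}
DRE_SCORE: Dict[str, int] = {
    "tumor negative": 0,
    "nodularity or abnormity": 1,
    "nodularity with blood": 2,
}

# Hypothetical container mapping each tool to its score table.
SCORE_MAPS: Dict[str, Dict[str, int]] = {
    "symptoms": SYMPTOM_SCORE,
    "dre": DRE_SCORE,
}


def cumulative_score(findings: Dict[str, str],
                     score_maps: Dict[str, Dict[str, int]]) -> int:
    """Sum the ordinal scores of the tools that were actually performed."""
    return sum(score_maps[tool][finding] for tool, finding in findings.items())


def predicts_ccr(findings: Dict[str, str],
                 score_maps: Dict[str, Dict[str, int]],
                 cutoff: int) -> bool:
    """Predict cCR or near-cCR when the cumulative score is at or below `cutoff`.

    ASSUMPTION: `cutoff` is a placeholder; the real thresholds are in Table 3.
    """
    return cumulative_score(findings, score_maps) <= cutoff


patient = {"symptoms": "alleviated", "dre": "tumor negative"}
print(cumulative_score(patient, SCORE_MAPS))        # 1
print(predicts_ccr(patient, SCORE_MAPS, cutoff=1))  # True
```

Summing ordinal codes keeps the rule transparent and lets a combination of tools be evaluated simply by restricting `findings` to the tools in that combination.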

Follow-up

Postoperative follow-up was performed every 3 months for 2 years and every 6 months for the following 3 years. Overall survival (OS) was measured from the start of nCRT to the endpoint, including local relapse, distant metastasis and death, and was otherwise censored at the last follow-up visit.

Statistics

Descriptive statistics were calculated for demographic data, including medians with interquartile ranges (IQRs) and percentages. The Shapiro–Wilk test and the Levene test were used to test normality and homogeneity of variance. Student’s t-test, the Mann–Whitney U test, or the Kruskal–Wallis test was used as appropriate for continuous variables, and Pearson’s chi-square test for categorical variables. Kaplan–Meier curves and log-rank tests were used for the analysis of OS. The sensitivity, specificity, true positive (TP), false positive (FP), false negative (FN), true negative (TN), accuracy, Youden index, and area under the curve (AUC) were recorded or calculated. The cutoff for continuous variables was based on the median or on the upper left corner point of the receiver operating characteristic (ROC) curve. In the analyses of reassessment timing, the ROC cutoff was used to highlight the difference in accuracy on either side of it. All tests were two-sided, and P < 0.05 was considered significant. All statistical analyses and plots were performed with R software (Version 4.2.0, R Foundation for Statistical Computing).
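
For readers less familiar with the performance measures listed above, the following Python sketch shows how they follow from the four confusion-matrix counts. It is illustrative only: the study's analyses were run in R, the counts below are made up, and reading the "upper left corner point" as the ROC point closest to (false positive rate 0, sensitivity 1) is an assumption about one common convention.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Confusion:
    tp: int  # predicted CR, truly pCR
    fp: int  # predicted CR, truly non-pCR
    fn: int  # predicted non-CR, truly pCR
    tn: int  # predicted non-CR, truly non-pCR


def sensitivity(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def youden_index(c: Confusion) -> float:
    # Youden's J = sensitivity + specificity - 1
    return sensitivity(c) + specificity(c) - 1.0


def upper_left_cutoff(points: List[Tuple[float, float, float]]) -> float:
    """Return the cutoff whose ROC point lies closest to the upper left corner.

    Each entry is (cutoff, sensitivity, specificity). The squared distance to
    the corner (FPR = 0, TPR = 1) is (1 - sensitivity)^2 + (1 - specificity)^2.
    """
    best = min(points, key=lambda p: (1 - p[1]) ** 2 + (1 - p[2]) ** 2)
    return best[0]


# Hypothetical counts, not taken from the study.
c = Confusion(tp=30, fp=10, fn=20, tn=40)
print(round(sensitivity(c), 3), round(specificity(c), 3))  # 0.6 0.8
print(round(accuracy(c), 3), round(youden_index(c), 3))    # 0.7 0.4

# Hypothetical ROC points: (cutoff in days, sensitivity, specificity).
print(upper_left_cutoff([(50, 0.9, 0.4), (95, 0.7, 0.8), (120, 0.3, 0.95)]))  # 95
```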

Results

Patient demographics

Overall, 250 patients with LARC were eligible for the cohort analysis, including 244 (97.60%) patients who underwent radical surgery and 6 (2.40%) patients who followed a wait-and-watch strategy. The demographic characteristics are shown in Table 1. Of the 244 patients with radical surgery, 180 (180/244, 73.77%) were pCR and 64 (64/244, 26.23%) were non-pCR. Of the 6 patients with a wait-and-watch strategy, 4 (4/6, 66.67%) were relapse-free at 2 years after being considered cCR, and 2 (2/6, 33.33%) relapsed within 2 years and thus received surgery. The four relapse-free patients were considered equivalent to pCR patients in this study. There were no significant differences in baseline characteristics (age, BMI, nCRT span, time to surgery, gender, cT, cN) between pCR and non-pCR patients. ycT and ycN were significantly associated with pCR (p < 0.001 and p = 0.008, respectively), with higher ycT and ycN associated with a lower likelihood of pCR. The median time to surgery (TTS) was 96 days. The median follow-up was 42 months.

Efficacy of reassessment tools in predicting pCR

The quantified and simplified reassessment score is shown in Table 2, and the cumulative score that was considered cCR or near-cCR is shown in Table 3. The AUC results for the reassessment tools and their combinations are shown in Fig. 3 and Supplementary Fig. 1. Of the 250 patients, all 250 (100.00%) underwent clinical symptom examination and DRE, 241 (96.40%) received an MRI reassessment, 60 (24.00%) received endoscopy, and 50 (20.00%) underwent biopsy. Clinical symptoms and DRE had quite low accuracy (56.00% and 45.20%, respectively) and AUC (0.476 and 0.490, respectively). Although MRI interpreted through multidisciplinary team (MDT) discussion had a higher accuracy (75.93%), its AUC (0.423) was still low. All the tumor biomarkers from peripheral blood had quite low accuracy and AUC. For instance, CEA had an accuracy of 63.60%, an AUC of 0.593, and a Youden index of 0.178. In contrast, endoscopy, biopsy, and their combination had relatively high accuracies (85.00%, 62.00%, 62.00%, respectively) and AUCs (0.700, 0.693, 0.714, respectively). Moreover, the combination of MRI and endoscopy and the combination of MRI, endoscopy, and biopsy also obtained a high accuracy (79.66% and 74.00%, respectively) and AUC (0.687 and 0.714, respectively), which indicated that adding biopsy to MRI and endoscopy did not improve accuracy. Likewise, although MRI did not improve the AUC when added to the combination of endoscopy and biopsy (0.714 and 0.714, respectively), it increased the accuracy from 62.00% to 74.00%. However, DRE and clinical symptoms, when combined with other tools, failed to improve the accuracy and AUC, and performed even worse in combinations (Table 3).
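
Because the reassessment scores are ordinal (0, 1, 2), the AUC reported above can be understood as the probability that a randomly chosen pCR patient receives a more favorable score than a randomly chosen non-pCR patient, with ties counted as one half. The sketch below illustrates that definition in Python with invented scores; it is not the study's R code, and inverting the score as `2 - score` so that higher values point toward pCR is an assumption made for the example.

```python
from typing import Sequence


def auc_from_scores(positive: Sequence[float], negative: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 if the positive case has the higher value, 0.5 on a tie,
    and 0 otherwise (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for p in positive:
        for n in negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive) * len(negative))


# Hypothetical reassessment scores (0 = cCR ... 2 = non-cCR), not study data.
scores_pcr = [0, 0, 1, 2]
scores_non_pcr = [1, 2, 2, 0]

# Invert so that a higher value means "more likely pCR".
favorable_pcr = [2 - s for s in scores_pcr]
favorable_non_pcr = [2 - s for s in scores_non_pcr]

print(auc_from_scores(favorable_pcr, favorable_non_pcr))  # 0.65625
```

With only three score levels, ties are frequent, so the AUC of a single tool can stay close to 0.5 even when its accuracy at one particular cutoff looks reasonable.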

Fig. 3 Receiver operating characteristic curves of the reassessment tools. (A) The ROC and AUC of endoscopy; (B) the ROC and AUC of biopsy; (C) the ROC and AUC of endoscopy combined with biopsy; (D) the ROC and AUC of the combination of endoscopy, biopsy, and MRI. Abbreviations: ROC: receiver operating characteristic curve; AUC: area under the curve

Impact of timing on the predictive efficacy of the reassessment tools

Time to reassessment (TTR) after nCRT was recorded and analyzed in Table 4, and included TTC, TTD, TTE, TTB, and TTM, with their distributions shown in Supplementary Fig. 2. The threshold for each indicator was determined by a median cutoff or an ROC cutoff. The results were then divided into two groups based on the time threshold, and the accuracy on each side of the threshold was calculated. The true and false predictions in the divided groups are shown in Fig. 4. For TTC, the time to reassessment by clinical examination, which mainly covered symptoms, there was no significant difference at any threshold, suggesting that a clinical examination can be performed at any time. Similarly, for TTM, the time to reassessment by MRI, the accuracy before and after 55 days was 72.57% and 79.07%, respectively, with a p-value of 0.239. Therefore, the timing of MRI made no significant difference to reassessment efficacy.

Table 4 The influence of time to reassessment after nCRT on the prediction accuracy
Fig. 4 True or false prediction at different timings of the response reassessment tools. Patients were divided into a true prediction group and a false prediction group according to the outcome of each reassessment tool. The distributions of the two groups over the time interval to reassessment are shown separately; there was no difference between the two groups when time was analyzed as a whole. The correlation of CR prediction with the time to reassessment of (A) MRI, (B) endoscopy, (C) examination of symptoms, (D) biopsy, (E) DRE. The values “1” and “2” are the groups’ categorical variables, where “1” indicates “False Prediction” and “2” indicates “True Prediction.” A difference in distribution appeared for biopsy at timings longer than 100 days. There was no difference in distribution at any time point for MRI, DRE and examination of symptoms. Detailed p values are shown in Table 4. Abbreviations: CR: complete response; DRE: digital rectal examination

However, there were significant differences in TTD, TTE and TTB. For TTD, the time to reassessment by DRE, the accuracy before and after the median cutoff of 91 days was 38.84% and 51.56%, respectively, with a p-value of 0.002, while before and after the ROC cutoff of 95 days the accuracy was 38.73% and 53.70%, respectively, with a p-value of 0.001. Therefore, performing reassessment with a DRE after 95 days could improve its accuracy. For TTE, the time to reassessment by endoscopy, the accuracy before and after the ROC cutoff of 105 days was 73.91% and 42.86%, respectively, with a p-value of 0.033. For TTB, the time to reassessment by biopsy, the accuracy before and after the ROC cutoff of 100 days was 51.28% and 100%, respectively, with a p-value of 0.003, meaning that it was better to perform biopsy after 100 days, as this improved its accuracy by a large margin. In summary, time to reassessment significantly affected the accuracy of reassessment, with a TTR over 100 days obtaining greater accuracy in predicting pCR.
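
The timing comparison reported in this section amounts to splitting patients at a day threshold, tabulating correct and incorrect predictions on each side, and testing the resulting 2x2 table. The Python sketch below shows one way to do this under stated assumptions: the records are invented, patients assessed exactly on the threshold day are placed in the earlier group, and Pearson's chi-square is computed without continuity correction. The text does not say which of these choices the R analysis made.

```python
import math
from typing import List, Tuple


def split_by_threshold(records: List[Tuple[int, bool]],
                       threshold_days: int) -> Tuple[int, int, int, int]:
    """Count (early_correct, early_wrong, late_correct, late_wrong).

    Each record is (days from end of nCRT to reassessment, prediction correct?).
    ASSUMPTION: days <= threshold_days counts as "early".
    """
    early_correct = early_wrong = late_correct = late_wrong = 0
    for days, correct in records:
        if days <= threshold_days:
            if correct:
                early_correct += 1
            else:
                early_wrong += 1
        else:
            if correct:
                late_correct += 1
            else:
                late_wrong += 1
    return early_correct, early_wrong, late_correct, late_wrong


def pearson_chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical records, not study data.
records = ([(60, True)] * 10 + [(60, False)] * 20
           + [(110, True)] * 20 + [(110, False)] * 10)

a, b, c, d = split_by_threshold(records, threshold_days=100)
print(a / (a + b), c / (c + d))  # accuracy before and after the threshold
chi2, p = pearson_chi_square_2x2(a, b, c, d)
print(round(chi2, 3))            # 6.667
```

When a cell count is small, as with the biopsy subgroup after 100 days, an exact test may be more appropriate than the chi-square approximation.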

Discussion

Many tools for, or indicators of, response reassessment after nCRT in LARC are available to establish whether a wait-and-watch strategy is appropriate, including common clinical examination, advanced assays, and model prediction. This study aimed to optimize the tools and timing of response reassessment for a wait-and-watch strategy in post-nCRT LARC patients. We found that the combination of endoscopy, biopsy and MRI is an optimal routine combination of reassessment tools for a wait-and-watch strategy after nCRT in patients with LARC, while DRE and clinical symptom outcomes did not add value to any combination of tools. In addition, within the routine scheme of a TTS of 12 weeks or longer, we found that these reassessment tools achieved higher accuracy when used more than 100 days after the end of nCRT. Our results may help practitioners choose the optimal reassessment tools and their proper timing when establishing a wait-and-watch strategy in patients with LARC.

Common examinations consist of symptom inquiry, physical examination with a DRE, non-invasive imaging with MRI [27] and PET-CT [28], invasive examination with endoscopy [29] and biopsy, and peripheral blood biomarker tests [30,31,32]. Although there are some emerging and cutting-edge techniques for reassessment, such as exosomal proteins [33] or ncRNA [34], bioptic ncRNA [13, 35], blood tests for ctDNA [36,37,38], probe-based confocal laser endomicroscopy (pCLE) [39], and MRI-related indicators such as diffusion kurtosis imaging (DKI) and intravoxel incoherent motion (IVIM) [40], none of them has been convincing enough to enter clinical routine. Moreover, the ability of some common examinations, such as DRE [41] and biopsy [42], and of emerging tools to independently predict pCR is still disputed. Therefore, reassessment tools and their combinations are still in development. In 2011, Maas et al. proposed a panel of endoscopy with biopsy and MRI-DWI [10]. In 2013, Habr-Gama et al. proposed a panel of endoscopy, DRE, MRI, TRUS, and PET-CT [23]. In 2015, a nationwide clinical trial in the US employed the Memorial Sloan Kettering Regression Schema (MSKRS), which included a DRE, endoscopy and MRI [24]. In 2017, the ESMO clinical practice guidelines mentioned reassessment methods including DRE, endoscopy-biopsy, MRI, TRUS, and CEA [25]. Thus, there is no worldwide consensus on which tools or combinations are optimal for a wait-and-watch strategy.

In this study, our results showed the low efficiency of clinical examination, including DRE and symptom evaluation, and of common laboratory tumor biomarkers, such as CEA and CA199; combining them with other tools still did not improve performance, which was in line with previous results [43]. Regarding DRE, it has been reported that it should not be used as the sole means of reassessment [41]. As part of a combination, however, it has been included in many reassessment schemas [12, 23, 25]. Nevertheless, our results showed that DRE failed to improve efficacy, or even decreased it, when added to any combination of assessment tools. Therefore, DRE can help choose between specific surgical options, but it is not optimal for predicting CR for a wait-and-watch strategy. Moreover, biopsy as an additional reassessment method during an endoscopy procedure had an efficacy equivalent to that of endoscopy, and their combination was slightly better but still quite limited. Finally, MRI combined with endoscopy and biopsy showed the best performance, with an accuracy of 74% and an AUC of 0.714 (0.546–0.882).

In many developing countries, districts, and primary hospitals, it is difficult to perform advanced assays or new techniques to improve the prediction of CR after nCRT in patients with LARC. Hence, it is imperative to optimize the standardization and scheduling of current multimodal therapy. Strict standardization and definition of cCR by response reassessment tools could help improve their predictive performance [26]. Careful arrangement of the multimodal therapy schedule is important as well, and part of that schedule is the time at which each examination or therapy is performed. In this study, we found that the time interval from the end of nCRT to reassessment was an important factor in reassessment efficacy. TTD > 95 days, TTB > 100 days, and TTE < 105 days obtained a higher accuracy, whereas the predictive accuracy of MRI was insensitive to fluctuation in the time interval. It is understandable why timing could affect the accuracy of reassessment tools. In post-nCRT LARC patients, time is needed for the tumor residuals to fade away, and a scar or nodule may still be present in the lesion if it is assessed early. Thus, untimely reassessment might diminish the accuracy of predicting pCR. For example, a lesion that could have been considered CR might be judged tumor positive by DRE, endoscopy, or biopsy because a mass that was shrinking and about to disappear could still be felt or seen. One important note is that a clinical endoscopic CR report was mainly based on a diminished lesion or a remaining lesion scar. A shallow ulcer was not considered CR or near-CR at that time [26, 44]. However, many recent data have shown that a shallow ulcer should also be considered near-CR, a view shared by Chinese and international experts [24, 45]. We also calculated the diagnostic accuracy of endoscopy with ulcers defined as near-CR and found a better diagnostic accuracy of endoscopy after a longer waiting time, which was in accordance with our biopsy results. Those data were not included in our results because we were not able to retrieve the depth of the ulcers in order to differentiate them. In general, it is recommended that the time interval from nCRT to reassessment be longer than 100 days. To our knowledge, within the routine scheme of a time interval of 12 weeks or longer until surgery, this is the first study to report the importance of the timing of reassessment tools for a wait-and-watch strategy in post-nCRT LARC patients.

Considering the limited sample size and the retrospective nature of the study, and in order to achieve more objective outcomes, we confined the analysis to patients receiving nCRT; those receiving TNT were therefore not included. In addition, our study had some limitations: 1) it is a retrospective single-center study, and certain reassessment tools were applied to a small number of patients, which might introduce bias; 2) the reasons why patients had earlier or postponed reassessment and surgery were not recorded in detail. For instance, some patients who underwent multimodal therapy after 2020 (18.80%) might have been affected by the COVID-19 epidemic, which delayed therapy or reassessment; 3) reassessment tools such as endoscopy, MRI and DRE were not applied simultaneously in some patients, and DRE and clinical examination were usually performed before endoscopy and MRI. Therefore, many potential biases might be present and should be addressed in future prospective cohort studies. Last but not least, we combined the wait-and-watch patients (n = 6) and surgery patients (n = 244) for clinicopathologic analysis, which may introduce heterogeneity into the CR group. However, wait-and-watch patients with a 2-year sustained cCR are often considered pCR analogues in practice and in the literature [21], because patients who are recurrence-free in the first 2 years have a very low probability of recurrence (3.85%) after 2 years [11].

In conclusion, we found that response reassessment by endoscopy and biopsy in post-nCRT LARC patients was more effective than MRI, DRE, clinical symptoms and common peripheral blood tumor markers. DRE and symptom assessment, as indicators within a panel, failed to improve the efficacy of predicting pCR, and thus perhaps should no longer be used as part of reassessment combinations. Although MRI also failed to improve the AUC when added to the combination of endoscopy and biopsy, it increased the accuracy. Therefore, at present, the combination of endoscopy, biopsy and MRI is recommended as a suitable reassessment combination when medical resources are limited. In addition, the time interval from the end of nCRT to reassessment has a significant influence on predictive accuracy. Performing response reassessment with endoscopy and biopsy 100 days after nCRT could obtain higher accuracy. As for MRI, its timing can be more flexible.