The introduction of preoperative, rather than postoperative, adjuvant chemoradiation therapy (CRT) has led to a reduction in local recurrence rates and has become standard of care for patients with locally advanced rectal cancer.1 In 10–24% of patients, no residual tumor is found at histology after surgery.2 These complete responders are known to have a very good prognosis, in terms of overall and disease-free survival.2 A complete response also raises the hotly debated question of whether surgery is still necessary for these patients, especially because total mesorectal excision (TME) carries a risk of morbidity and even mortality, as well as the potential risk of a permanent colostomy. Recently, a more conservative treatment has been advocated in patients who show a good or complete response to neoadjuvant treatment. In 2006, Habr-Gama et al. presented the long-term results of a prospective trial that investigated a “wait-and-see” policy in a carefully selected group of patients with clinical and radiological evidence of a complete response after neoadjuvant CRT. Results at 5-year follow-up were favorable for the nonsurgical group, with an overall and disease-free survival of 93% and 85%, respectively.3 To safely omit surgery, it is essential to accurately select the right candidates, i.e., the true complete responders. This selection is mainly performed using digital examination, endoscopy, and biopsy, but these methods are not infallible. The role of imaging for restaging after CRT has been the subject of several studies, and all suggest that neither MRI, endorectal ultrasound, nor 18F-fluorodeoxyglucose-positron emission tomography (FDG-PET) is sufficiently accurate for identifying the true complete responders, with positive predictive values ranging from 17–50%.4–9 The use of these modalities for selection of patients would consequently put them at risk for undertreatment.

Diffusion-weighted MRI (DWI) is a functional MR imaging technique that uses differences in the extracellular movement of water protons to discriminate between tissues of varying cellularity. In tissues with normal cellularity, water protons can diffuse relatively freely, which results in a loss of signal on DWI. Conversely, in tissues with increased cellularity (tumor), the diffusion of water is restricted, so that a high signal persists on DWI. In many reports, DWI has shown promise for identification of malignant tumors, and recent studies on rectal cancer have indicated that DWI also may be useful for response evaluation after chemoradiation treatment.8,10–15 In 2009, Kim et al. showed in a study of 40 patients that DWI in addition to standard MRI significantly improved the performance of radiologists in selecting complete responders compared with standard MRI only.8
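The contrast mechanism described above can be illustrated with a minimal sketch, assuming the commonly used mono-exponential signal model S(b) = S0 · exp(−b · ADC). The function name `dwi_signal` and the two ADC values are hypothetical and chosen only for illustration; they were not measured in this study, and T2 effects are ignored.

```python
import math

def dwi_signal(b_value, adc, s0=1.0):
    """Relative DWI signal under a mono-exponential model: S(b) = S0 * exp(-b * ADC).

    b_value is in s/mm^2, adc in mm^2/s; s0 is the signal without diffusion weighting.
    """
    return s0 * math.exp(-b_value * adc)

# Hypothetical ADC values, for illustration only (not taken from the study data).
ADC_RESTRICTED = 0.9e-3  # mm^2/s, assumed for densely cellular tissue
ADC_FREE = 1.6e-3        # mm^2/s, assumed for tissue with freer water diffusion

for b in (0, 1000):
    restricted = dwi_signal(b, ADC_RESTRICTED)
    free = dwi_signal(b, ADC_FREE)
    print(f"b={b:>4} s/mm^2  restricted={restricted:.3f}  free={free:.3f}")
```

Under these assumed values, both tissues have the same signal at b0, whereas at b1000 the tissue with restricted diffusion retains roughly twice the signal of the tissue with freer diffusion. This is the behavior that the visual assessment of high b-value images relies on.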

The purpose of our study was to evaluate the accuracy of DWI in addition to a standard restaging MRI for selection of complete responders after chemoradiation for locally advanced rectal cancer in a larger and multicenter study setting.

Methods and Materials

Patients

This study retrospectively evaluated 120 consecutive patients who were treated for locally advanced rectal cancer in three university hospitals between 2005 and 2009. Due to the retrospective nature of the study, informed consent was not required. Ninety-three patients were men and 27 were women. Median age was 67 (range 22–89) years. Inclusion criteria consisted of (1) biopsy-proven rectal cancer, (2) locally advanced disease as determined on primary staging MRI (T3-4 tumor, tumor involvement of the mesorectal fascia, and/or positive nodal status), (3) preoperative treatment consisting of a long course of neoadjuvant chemoradiation treatment, and (4) availability of posttreatment MR imaging, including DWI. Exclusion criteria consisted of (1) nonresectable disease and (2) insufficient MR image quality (e.g., due to metal or motion artefacts). All patients underwent a long course of preoperative chemoradiation, consisting of capecitabine and/or oxaliplatin, combined with 50.4–55 Gy of radiation. After a 5–10-week time interval, all patients underwent a second, restaging MRI, including DWI, for response evaluation. Patients were then referred for further treatment.

MR Imaging

In each participating center, imaging was performed at 1.5T using a phased array body coil. The MR protocol consisted of standard T2-weighted fast spin echo sequences (as described in literature) in three orthogonal directions (sagittal, axial, and coronal), with an in-plane resolution ranging from 0.42–2.56 mm² and a slice thickness of 4–5 mm.16 An additional diffusion-weighted echo planar imaging sequence was acquired with b-values ranging from 0 s/mm² (lowest) to 1000 s/mm² (highest), an in-plane resolution of 7.8–9.6 mm², and a slice thickness of 4–5 mm, as described in previous reports from the participating centers.17–19

Image Evaluation

All images were independently analyzed by three readers, who were blinded to all clinical information, other imaging results, and histopathology. Reader 1 (RGHB) was a highly specialized gastrointestinal (GI) radiologist with 13 years of experience in reading pelvic MRI. Reader 2 (FCHB) was a GI radiologist with 3 years of experience in reading pelvic MRI. Reader 3 (VV) was a GI radiologist with 2 years of experience in reading pelvic MRI and 5 years of experience in reading DWI images in head and neck cancer, abdominal cancer, and lymphoma. The three readers first evaluated the standard postchemoradiation (restaging) MR images and scored the likelihood of a complete response of the primary tumor using a confidence level score (0 = definitely residual tumor, 1 = probably residual tumor, 2 = possibly residual tumor/possibly complete response, 3 = probably complete response, 4 = definitely complete response). The pre-CRT images were at the readers’ disposal to identify the primary tumor, as is the case in daily clinical practice. Subsequently, the confidence level-based scoring of the restaging MRI was repeated after addition of the b1000 DWI images.

Imaging Criteria

On standard MRI, a normalized rectal wall without any detectable wall thickening was considered a definite criterion for a complete response (Fig. 1). A solid residual mass with intermediate signal intensity on T2-weighted MRI was considered a definite criterion for residual tumor (Fig. 2). Hypointense signal intensity changes indicated fibrosis, in which case undetermined scores were assigned (Fig. 3).18 On the diffusion images, residual high signal intensity at the location of the primary tumor was considered a criterion for residual tumor, whereas the absence of increased signal on DWI was indicative of a complete response (Fig. 3). The readers assigned a confidence level 2 score (equivocal score) when they were not able to differentiate between a complete response and residual tumor.

Fig. 1

Standard T2-weighted images of a female patient with a tumor (T) in the mid-rectum, before (a) and after (b) preoperative chemoradiation therapy. After chemoradiation, the tumor has completely disappeared and a normalized rectal wall can be visualized (arrowheads). This feature was considered strongly predictive for a complete tumor response

Fig. 2

Standard T2-weighted images of a male patient with a tumor (T) in the rectum, before (a) and after (b) preoperative chemoradiation therapy. After chemoradiation, a solid residual tumor mass is still visualized (arrow). This feature was considered strongly predictive for the presence of residual tumor

Fig. 3

Standard T2-weighted images of two patients with a tumor (T) in the rectum before (a, d) and after chemoradiation treatment (b, e). In both cases, the tumor bed has become fibrotic after chemoradiation (arrowheads), which makes it difficult to discriminate between residual tumor and a complete response. In the upper patient, there is still a clear high signal intensity area on DWI (arrow in c), which was confirmed to be a ypT2 residual tumor at histology. In the lower patient, no high signal is shown on DWI (f) and a complete tumor response (ypT0) was confirmed at histology

Reference Standard

Histopathologic evaluation of the surgical resection specimen, according to the TNM staging system, served as the reference standard. The tumor regression grade (TRG) was evaluated according to the method of Mandard.20 The response of the primary tumor to chemoradiation was graded as follows: “pathologic complete response” (= ypT0/TRG 1, no residual tumor cells) or “residual tumor” (= ypT1-4 / TRG 2-5, varying from limited tumor cells to a solid residual tumor mass). Eight patients did not undergo surgery, due to strong clinical evidence of a complete response (repeated negative sigmoidoscopy and biopsies after CRT). For these eight patients, a local and distant recurrence-free follow-up period of >24 months was considered a surrogate endpoint for a complete response.

Statistical Analysis

Statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS, version 16.0, SPSS Inc., Chicago, IL). Receiver operating characteristic (ROC) curve analyses were performed to evaluate the diagnostic performance of (1) standard MRI only and (2) standard MRI + DWI for identification of a complete response. Corresponding areas under the ROC curve (AUC), sensitivities, specificities, positive predictive values (PPV), and negative predictive values (NPV) with 95% confidence intervals (CI) were calculated. For these analyses, it had been decided at the start of the study to dichotomize the confidence level scores between 2 (possibly residual tumor/possibly complete response) and 3 (probably complete response), so that scores of 3–4 counted as a predicted complete response and scores of 0–2 as predicted residual tumor. Differences in diagnostic performance between standard MRI only and the combination of standard MRI + DWI were analyzed by comparing the ROC curves according to the method described by DeLong et al.21 P values < 0.05 were considered statistically significant. Weighted kappa values with quadratic kappa weighting (0–0.2 poor, 0.21–0.4 fair, 0.41–0.6 moderate, 0.61–0.8 good and 0.81–1 excellent agreement) were calculated to evaluate interobserver variability.22
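The dichotomization and the derived accuracy measures can be sketched as follows. This is a minimal illustration in plain Python, not the SPSS procedure used in the study; the function names and the eight example patients are hypothetical, and the AUC shown is the simple empirical (rank-based) estimate, with neither confidence intervals nor the DeLong comparison.

```python
def dichotomize(scores, threshold=3):
    """Confidence scores of 3-4 count as predicted complete response, 0-2 as residual tumor."""
    return [s >= threshold for s in scores]

def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def diagnostic_metrics(predicted, truth):
    """Sensitivity, specificity, PPV and NPV for predicting a complete response."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }

def empirical_auc(scores, truth):
    """Probability that a complete responder scores higher than a patient
    with residual tumor; ties count as one half."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example data: confidence scores (0-4) and histology (True = ypT0).
example_scores = [4, 3, 2, 3, 1, 0, 2, 0]
example_truth = [True, True, True, False, False, False, False, False]

metrics = diagnostic_metrics(dichotomize(example_scores), example_truth)
auc = empirical_auc(example_scores, example_truth)
print(metrics, round(auc, 3))
```

Note that the AUC uses the full five-point scale, whereas sensitivity, specificity, PPV, and NPV depend on the chosen cutoff between scores 2 and 3.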

Results

Patient and Treatment Characteristics

A total of 79 patients underwent low anterior resection, 25 had abdominoperineal resection, 4 had more extended surgery, and 4 had local excision (transanal endoscopic microsurgery). At histology, 17 patients had ypT0, 11 had ypT1, 25 had ypT2, 55 had ypT3 and 4 had ypT4 status. Ten patients had mucinous type adenocarcinoma. The median time interval between the restaging MRI and surgery was 15 (range 0–61) days. The eight patients who did not undergo surgery had a median local and distant recurrence-free follow-up of 42.5 (range 26–73) months; these patients were therefore considered complete responders. Altogether, 25 patients had a complete response and 95 had residual tumor. Of the patients with residual tumor, 63 were yN0, 22 were yN1, and 10 were yN2 status. Of the patients with a complete tumor response, 23 were yN0, 1 was yN1, and 1 was yN2 status. There were no significant differences in patient characteristics, gender, or age distribution between the separate centers.

Diagnostic Performance for Selection of Complete Responders

ROC curves for the selection of complete responders are displayed in Fig. 4. Corresponding accuracy figures and AUCs with 95% confidence intervals are provided in Table 1. For the highly expert reader 1, AUC improved from 0.76 for standard MRI to 0.8 for standard MRI + DWI (P = 0.39). For the less experienced reader 2, AUC improved from 0.68 on standard MRI to 0.8 after addition of DWI (P = 0.02). For reader 3, AUC improved from 0.58 on standard MRI to 0.78 after addition of DWI (P = 0.002).

Fig. 4

Receiver operating characteristic curves and areas under the curve (AUC) of the three readers for identification of a complete tumor response after CRT using only standard MRI and standard MRI + DWI, respectively. Diagnostic performance improved significantly (*) for reader 2 (P = 0.02) and reader 3 (P = 0.002). For reader 1, there was no significant improvement (P = 0.39)

Table 1 Diagnostic performance for the prediction of a complete response (ypT0)

Number of Equivocal (Confidence Level 2) Scores

When using only standard MRI without DWI, readers 1, 2, and 3 assigned a confidence level score of 2 (possibly residual tumor/possibly complete response) to 31, 7, and 41 patients, respectively. After addition of DWI, the number of equivocal scores decreased to 2, 4, and 2 for the three readers, respectively. This resulted in a reduced number of false negatives for prediction of a complete tumor response, ranging from 9–12 for the three readers on standard MRI + DWI compared with 15–25 on standard MRI only. The number of false positives remained similar, ranging from 2–8 on standard MRI and from 3–10 after addition of DWI.
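The link between these false-negative counts and sensitivity can be made explicit with a short worked calculation. The sketch below uses only the 25 complete responders and the false-negative ranges reported in this section; the function name is hypothetical, and the range endpoints are not attributed to individual readers.

```python
N_COMPLETE_RESPONDERS = 25  # complete responders reported in the Results

def sensitivity_from_false_negatives(false_negatives, n_positive=N_COMPLETE_RESPONDERS):
    """Sensitivity = true positives / all complete responders."""
    return (n_positive - false_negatives) / n_positive

# False-negative ranges across the three readers, as reported above.
fn_ranges = {
    "standard MRI": (15, 25),
    "standard MRI + DWI": (9, 12),
}

for label, (fn_low, fn_high) in fn_ranges.items():
    best = sensitivity_from_false_negatives(fn_low)
    worst = sensitivity_from_false_negatives(fn_high)
    print(f"{label}: sensitivity {worst:.0%} to {best:.0%}")
```

With 9–12 false negatives among 25 complete responders, sensitivity for standard MRI + DWI works out to 52–64%.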

Interobserver Agreement

Kappa values for the interobserver agreement between the three readers are displayed in Table 2. Interobserver agreement improved from fair agreement (κ 0.2–0.32) on standard MRI to moderate agreement (κ 0.51–0.55) after addition of DWI.

Table 2 Interobserver agreement between the three readers
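For readers unfamiliar with the agreement statistic, the following is a minimal sketch of a quadratically weighted kappa for two readers scoring on the five-point confidence scale (0–4). It is an illustration in plain Python under the standard definition, not the SPSS routine used in the study; the function names and example ratings are hypothetical, and the labels follow the agreement bands given in the Statistical Analysis section.

```python
def quadratic_weighted_kappa(ratings_a, ratings_b, n_categories=5):
    """Cohen's kappa with quadratic disagreement weights for two raters.

    Ratings are integers in 0..n_categories-1. The result is undefined
    (division by zero) if both raters use a single identical category throughout.
    """
    n = len(ratings_a)
    k = n_categories
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j]
    return 1.0 - weighted_observed / weighted_expected

def agreement_label(kappa):
    """Map a kappa value to the agreement bands used in this study."""
    if kappa <= 0.2:
        return "poor"
    if kappa <= 0.4:
        return "fair"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "good"
    return "excellent"

# Hypothetical ratings of six patients by two readers.
reader_x = [0, 1, 2, 3, 4, 2]
reader_y = [0, 2, 2, 4, 4, 1]
kappa_xy = quadratic_weighted_kappa(reader_x, reader_y)
print(round(kappa_xy, 2), agreement_label(kappa_xy))
```

Quadratic weighting penalizes a disagreement of several scale points much more heavily than a disagreement of one point, which suits an ordinal confidence scale.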

Discussion

The findings of this study indicate that the diagnostic performance for predicting a pathologic complete tumor response after chemoradiation improved for the combination of standard MRI + DWI (AUC 0.78–0.8) compared with standard MRI only (AUC 0.58–0.76). With the addition of DWI, sensitivity for identification of a complete response improved by 16–52% for the three readers. Moreover, it resulted in a substantial reduction in the number of equivocal scores and an improved interobserver agreement.

Of interest is the improved sensitivity for the combination of MRI + DWI; i.e., it resulted in less overestimation of tumor in patients with a complete tumor response. This is mainly because on the restaging MRI without DWI many interpretation difficulties were observed when the primary tumor bed had become fibrotic as a result of the radiation treatment. In these cases, it is difficult to differentiate small areas of residual tumor from mere fibrosis and readers tend to overestimate the presence of tumor (Fig. 3).23–26 Apparently, this is where the functional information from DWI proves beneficial. Areas of fibrosis typically have a low cellular density, which results in low signal intensity on high b-value (b1000) diffusion images.27 Conversely, residual tumor areas have a relatively high cellular density and show high signal on DWI, which stands out within the low signal of the surrounding tissue/fibrosis. This is the reason why small areas of residual tumor are better depicted on DWI.8,27 Nevertheless, interpretation errors were still observed with DWI, resulting in a suboptimal sensitivity of 52–64%. When the signal of the normal rectal wall is not fully suppressed on DWI, which often occurs when the rectal wall is collapsed, high signal at the location of the initial tumor area may erroneously be interpreted as residual tumor, resulting in overstaging errors. In addition, some imaging artifacts may occur on DWI, particularly around air-tissue interfaces. It is relevant to recognize these shortcomings of DWI and initiate teaching courses in which radiologists will be trained in the interpretation of DWI and will become familiar with its pitfalls. Specificity for MRI and DWI is >90%, indicating that the residual tumors are accurately detected and the risk for undertreatment will be <10%. Although DWI allows detection of even small (2–5 mm) tumor volumes, the challenge will remain the detection of microscopically small clusters of residual tumor cells, which are difficult to detect, even at histology, and are currently beyond the detection level of any available imaging modality, including DWI.

The addition of DWI improved the performance of all readers, although this benefit was not significant for reader 1. His extensive experience of 13 years in interpreting rectal cancer MRI may explain why reader 1 was already more accurate with the use of only standard MRI (AUC 0.76). This exceptionally high level of expertise does not reflect common daily practice. Our study, however, shows that for radiologists in general centers with expertise levels like those of the other two readers, DWI can be of clear value. Furthermore, all readers showed a reduction in equivocal (confidence level 2) scores after addition of DWI, indicating that it raised their confidence in the discrimination between complete responders and residual tumor. This also explains the better interobserver agreement between the readers after addition of DWI.

So far, the largest body of evidence for response evaluation exists for 18FDG-PET. Changes in FDG uptake, in particular early (±2 weeks) after onset of treatment, have proven useful for prediction of response.4,28,29 PET is, however, less reliable in identifying the complete tumor responders after completion of chemoradiation: up to 55% of the residual tumors are overlooked and patients are erroneously interpreted as complete responders.5,6,28 In a recent study by Janssen et al., only one of six complete responders as identified on FDG-PET corresponded with a true complete response at histology.4 When using FDG-PET for treatment planning, the main risk would be an undertreatment of these patients. In our DWI-MRI study, the presence of residual tumor was underestimated in fewer than 10% of the cases, indicating that, compared with PET, there is a considerably smaller risk for undertreatment.

To the best of our knowledge, this is the largest and only multicenter study to investigate the value of DWI for identifying complete tumor responders after CRT for rectal cancer. It confirms previous findings of a smaller, single-center study by Kim et al.8 Previous studies also have shown promise for quantitative DWI measurements of the “apparent diffusion coefficient” (ADC) (performed before, during, and/or after chemoradiation treatment) to predict the degree of response to therapy.8,10–15,17,30,31 In our study, we only focused on qualitative, visual evaluation of DWI and did not quantitatively measure ADC. This is a more convenient approach, because a visual analysis is more practical and less time-consuming for a busy radiology practice. Furthermore, ADC values are dependent on technical variations among DWI sequences generated by different MR equipment, so ADC data from multiple centers may be less suitable for pooled analysis. Visual evaluation of DWI images is less subject to technical variations, and pooling of these data was feasible because all three participating centers acquired a DWI sequence with equal (b1000) diffusion weighting. Nevertheless, we acknowledge that small variations between the participating centers may have introduced some bias.
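For context, the quantitative ADC measurement that was not performed in this study is commonly derived from a mono-exponential signal model. The relation below is given as general background under that assumption; the symbols S(b), S_0, b_1, and b_2 are introduced here for illustration and do not appear elsewhere in this report.

```latex
% Mono-exponential diffusion model (assumed): signal at diffusion weighting b
S(b) = S_0 \, e^{-b \cdot \mathrm{ADC}}

% ADC estimated from two acquisitions with b-values b_1 < b_2
\mathrm{ADC} = \frac{1}{b_2 - b_1} \, \ln\!\left( \frac{S(b_1)}{S(b_2)} \right)

% With b_1 = 0 and b_2 = 1000 s/mm^2, this reduces to
\mathrm{ADC} = \frac{1}{1000\ \mathrm{s/mm^2}} \, \ln\!\left( \frac{S_0}{S(1000)} \right)
```

Because the estimate depends on the measured signals at each b-value, differences in sequence parameters between scanners can shift the resulting ADC values, which is consistent with the concern about pooling ADC data across centers.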

In the current study, we only focused on response assessment of the primary tumor and not the lymph nodes. The prevalence of a positive lymph node status in case of a complete response of the primary tumor after CRT is very low and was only 2 of 25 (8%) in the present study. Nevertheless, to safely offer patients a wait-and-see policy after CRT, we have to ensure that both the primary tumor and all metastatic nodes have undergone a complete regression (ypT0N0). Although standard MRI is known to be inaccurate for the primary staging of rectal cancer nodes,32,33 there is evidence that after chemoradiation, MRI performs considerably better. High NPVs ranging from 81–100% have been reported, suggesting that the ypN0 patients can already be accurately selected and the addition of functional techniques, such as DWI, may not even be necessary.7,19,23,34–36 Furthermore, the only study to focus specifically on DWI for staging of rectal cancer nodes after CRT already showed good results for standard MRI only (NPV 94–95%) and reported no clear benefit after addition of DWI (NPV 92–93%). The main role of DWI for lymph node evaluation was that it improved the number of detected nodes (both benign and malignant), because nodes were more easily detected on DWI due to their high signal intensity compared with the suppressed background signal of surrounding tissues.19

Clinical Impact

A wait-and-see approach3 or local excision37 for patients with a good response after chemoradiation is at present still debatable. Initiating and performing large patient studies to prove their efficacy is difficult, partly because clinicians are not convinced that safe selection of the right patients can be done. Therefore, one of the most important cornerstones to make implementation of such minimally invasive treatments possible is a precise selection of the eligible patients. Our goal was to assess whether MR imaging can be beneficial in this regard. Because of its reported promise in cancer imaging, we particularly looked at the potential of diffusion-weighted MRI. Moreover, DWI is a noninvasive technique that does not require the use of ionizing radiation or contrast agents and can easily be added to any standard MRI protocol. Our results suggest that, by combining morphological with functional imaging information, MRI + DWI can significantly improve sensitivity for selection of complete responders. Furthermore, specificity is >90%, which indicates that the risk for underestimation of residual tumor can be brought to <10%. As an adjunct to clinical tools (digital examination, endoscopy, and biopsy), the combined use of MRI + DWI seems promising to enable a more precise selection of patients eligible to undergo minimally invasive treatments. The current results are still premature for clinical decision-making, but their promise warrants further large, prospective patient studies.

In conclusion, this study shows that the addition of diffusion-weighted imaging to a standard, restaging MRI improves the performance and confidence of radiologists in selecting the patients with a pathological complete tumor response after chemoradiation for locally advanced rectal cancer. The combination of MRI + DWI could be of additional value for the clinical assessment of these patients.