Introduction

Neoadjuvant chemotherapy (NAC) is defined as the administration of chemotherapy to treat invasive breast cancer before local treatment (i.e. surgery). Although in breast cancer patients NAC was primarily used for treatment of locally advanced disease, its use has been extended to early-stage breast cancer in order to enable breast-conserving therapy for patients who would otherwise undergo mastectomy. In a large meta-analysis by Mauri et al., no significant differences in survival or overall disease progression were observed between patients receiving adjuvant or neoadjuvant chemotherapy [1].

Some physicians prefer NAC to adjuvant chemotherapy because of the ability to assess tumour response in vivo. In this situation, reliable assessment of pathological tumour response to NAC is vital in order to select the most appropriate surgical plan. An imaging modality that could assess tumour response to NAC would be beneficial, provided it could detect any residual disease present. This could result in a surgical treatment plan more tailored to the individual patient. In addition, pathological complete response (pCR, i.e. the absence of any residual invasive tumour cells) on NAC has been shown to be a prognostic factor for better overall survival, disease-free survival and recurrence-free survival [2]. In the future, this latter information might also guide further adjuvant treatment recommendations.

Many examinations have been proposed to evaluate residual disease and/or complete response to therapy (such as clinical examination, mammography and ultrasound), but their accuracy was only modest [3]. Parallel to these findings, (contrast-enhanced) magnetic resonance imaging (MRI) of the breast proved to be superior to mammography and ultrasound with respect to assessing tumour extent, the presence of additional foci (i.e. multifocality and/or multicentricity) and the presence of contralateral breast tumours [4, 5]. Therefore, MRI might be a promising imaging tool to assess therapy response in the NAC setting and for assessing pCR.

Several reviews have recently been published that assessed the ability of breast MRI to predict pCR in patients receiving NAC [6–8]. However, pCR is achieved by only a minority of patients and, as a consequence, the role of breast MRI in neoadjuvant chemotherapy has not been investigated for the substantial number of patients who still have some degree of residual disease after treatment. In this systematic review, we aimed to address the role of breast MRI in assessing both residual disease extent and pCR after NAC, that is, in all patients receiving neoadjuvant chemotherapy for invasive breast cancer.

Materials and methods

For this systematic review, Embase, the Cochrane library, MEDLINE and citations as provided by PubMed were searched until 1 July 2012, using the search terms breast, breast neoplasms, breast neoplasm, breast cancer, breast carcinoma and breast lesion combined with the search terms magnetic resonance imaging, MRI, MR mammography and neoadjuvant therapy, neoadjuvant chemotherapy, neoadjuvant systemic therapy, neoadjuvant, chemotherapy, primary therapy and initial therapy. Only original articles were considered for inclusion (i.e. no reviews, brief communications or letters to the editor). References of all retrieved articles were manually searched for additional relevant manuscripts. Studies found through these search terms were assessed for potential eligibility by reading the abstracts first and then applying inclusion and exclusion criteria.

Only studies in which breast MRI was performed at baseline and prior to surgery (but after completion of neoadjuvant chemotherapy) were included. In addition, the ability of MRI to assess pCR was one of our study aims. Rates of pCR to NAC may vary, depending on the treatment regimen used: 6–15 % in anthracycline-based therapies, up to 30 % when adding taxanes [9]. Therefore, in order to have some reliable information on the ability of MRI to assess residual disease and also pCR, eligible studies should have a sufficiently large study population. To be eligible for this review, a study had to include at least 25 patients (in the final analysis) with newly diagnosed, histologically proven breast cancer undergoing neoadjuvant chemotherapy who were imaged using clinical MRI scanners (i.e. minimum 1.5 T).

Studies were not excluded if other imaging modalities were performed parallel to MRI in order to evaluate treatment response.

After this initial assessment, the publications were summarised separately by two radiologists using a standard extraction form. When discrepancies were encountered, consensus opinion was reached afterwards. Extracted data included: first author, year of publication, study design (retrospective or prospective), blinding procedures, population size, mean patient age and range, magnetic field strength, contrast agent/dose used, breast cancer stage at inclusion, tumour histology, breast cancer subtypes, chemotherapy regimen, imaging response assessment [World Health Organisation (WHO) criteria, Response Evaluation Criteria in Solid Tumours (RECIST) criteria or other] and histopathological response assessment.

While scoring the extraction forms in consensus, some studies were excluded if the study outcome proved not to contain information on residual disease evaluation by MRI. All reported P-values ≤0.05 were considered statistically significant. The large heterogeneity observed in the included studies precluded us from pooling data (see also ‘Discussion’ section), which is why we chose to use descriptive statistics in this review. Since this was a systematic review, no approval from our institutional review board was necessary.

Results

In the primary literature search, 3,119 potential studies were identified, of which 515 were duplicates across the various searches, leaving 2,604 studies after the primary search. After reading the abstracts, 2,444 were excluded from further evaluation, leaving 160 studies to be analysed using the inclusion and exclusion criteria. Two additional studies were identified by manually searching the references of these manuscripts. In this analysis, another 98 studies did not comply with our eligibility criteria and were subsequently excluded, leaving 64 studies to be reviewed using the extraction form and consensus reading. This led to the exclusion of another 29 studies, because they did not address the topic of residual disease assessment with MRI after neoadjuvant chemotherapy. Therefore, a total of 35 studies were eligible for this systematic review [10–45]. Figure 1 presents a more detailed overview of the study selection process.

Fig. 1 Detailed overview of study selection
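The counts reported for the study selection can be checked with simple arithmetic. The following is only a minimal sketch: the numbers are taken from the text, while the variable names are illustrative and not part of the original review.

```python
# Counts as reported in the text; variable names are illustrative only.
identified = 3119               # hits from the primary literature search
duplicates = 515                # same study found in more than one search
excluded_on_abstract = 2444     # excluded after reading the abstract
added_from_references = 2       # found by manual reference search
excluded_on_eligibility = 98    # did not meet the eligibility criteria
excluded_at_extraction = 29     # no data on residual disease assessment by MRI

after_deduplication = identified - duplicates
after_abstract_screening = after_deduplication - excluded_on_abstract
after_eligibility_check = (
    after_abstract_screening + added_from_references - excluded_on_eligibility
)
included = after_eligibility_check - excluded_at_extraction

print(after_deduplication, after_abstract_screening,
      after_eligibility_check, included)
```

Each intermediate value corresponds to one of the totals stated in the paragraph above (2,604, 160, 64 and 35).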

The majority of studies (27) were prospective in design. A total of 2,359 patients were included in the studies (mean 65.5 patients per study, range 30–216). Median age of patients was 48.0 years (range 23–82 years). Three studies were performed on a 3-T MRI scanner, four on both 1.5- and 3-T scanners, and the remaining studies on 1.5-T MRI scanners. In all studies, a commercially available gadolinium-based contrast agent was used for breast MRI at regular clinical administration doses. Interestingly, there was a remarkable heterogeneity in breast cancer stages and subtypes, neoadjuvant chemotherapy regimens, and methods used for assessing response in both imaging and histopathological analyses (Tables 1 and 2). This heterogeneity precluded us from further pooling of data in a meta-analysis.

Table 1 Overview of included studies
Table 2 Overview of included studies

Seventeen studies calculated correlation coefficients for the comparison of MRI tumour measurements with histopathological results [13, 14, 17, 21, 22, 25, 27, 28, 31–35, 37, 39, 40, 43]. Correlation coefficients varied from poor to excellent, with a median value of 0.698 (range 0.21–0.982, Table 3). Two studies reported non-significant correlation coefficients. In the relatively small (n = 59) retrospective study by Guarneri et al., the correlation coefficients were similar for MRI (0.53) and ultrasound (0.66) when compared to histopathology, and both were non-significant [35]. In the study of 86 women by Nakahara et al., the correlation coefficient of all included patients was remarkably low (0.21), but rose to a strikingly high and statistically significant value of 0.92 when only triple-negative breast cancer types were analysed [31]. A weak correlation coefficient (0.30) was also presented by Chen et al. However, when four cases with a size discrepancy larger than 5 cm were excluded (all HER2-negative tumours presenting as non-mass-like enhancement), the correlation coefficient increased remarkably to 0.76 (P < 0.001) [39]. Furthermore, both overestimation and underestimation were observed in multiple studies [10, 13, 15, 16, 20, 22, 24, 32, 37].

Table 3 Correlation coefficients of MRI and histopathological tumour measurements
Table 4 Diagnostic accuracies of MRI for predicting pathologic complete response

Although correlation coefficients are useful tools to describe MRI’s ability to assess response to NAC, they can mask the truth: a consistent trend across measurements could result in excellent correlation between histopathological results and MRI measurements, yet the actual estimation of pCR might not be accurate. Therefore, the variation of size evaluation between MRI and pathology yields additional information. Bhattacharyya et al. reported an overestimation of >10 mm in 4 of 32 cases [24]. Belli et al. described a mean overestimation and underestimation of 2.1 and 2.0 mm, respectively [20]. These numbers were 20.2 and 13.8 mm in the study by Denis et al. [15]. Partridge et al. found the smallest deviation, with a mean overestimation by MRI of only 0.9 mm [13]. The studies by Guarneri et al. and Lyou et al. found a mean size difference of 1.6 and 6.0 mm, respectively [35, 38].

With respect to pCR prediction with MRI, sensitivity was considered to be the proportion of patients with pCR who were correctly classified by MRI as complete responders. Specificity was considered to be the proportion of patients with non-pCR who were correctly classified by MRI as non-responders. To illustrate, a sensitivity of 62 % in these studies meant that in 62 out of 100 patients with pCR, MRI correctly identified the complete response (i.e. MRI did not show any residual enhancement). Eight studies calculated the diagnostic accuracies for MRI in predicting pCR (Table 4) [12, 18, 24, 25, 29, 34, 43, 45]. Two studies reported diagnostic accuracies for diffusion-weighted MR imaging, and both reported a sensitivity of 100 % [43, 45]. Specificity was 70 % and 91 %. Only one study evaluated the potential of MR spectroscopy parameters (total choline-containing compounds, tCho) and reported a sensitivity of 53 % and a specificity of 70 % [43]. The remaining studies used dynamic, contrast-enhanced MR imaging. Median (and range) sensitivity and specificity were 42 % (25–92 %) and 89 % (50–97 %), respectively. If reported, median (and range) positive predictive value (PPV) and negative predictive value (NPV) were 64 % (50–73 %) and 87 % (71–96 %), respectively.
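The definitions above, with pCR treated as the positive finding, can be written out as a short calculation. This is a minimal sketch only: the function name and the patient counts are hypothetical and are not taken from any of the included studies.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy measures with pCR as the positive class.

    tp: pCR at pathology, no residual enhancement on MRI
    fp: residual disease at pathology, no residual enhancement on MRI
    fn: pCR at pathology, residual enhancement on MRI
    tn: residual disease at pathology, residual enhancement on MRI
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of pCR patients found by MRI
        "specificity": tn / (tn + fp),  # share of non-pCR patients found by MRI
        "ppv": tp / (tp + fp),          # MRI-predicted pCR confirmed at pathology
        "npv": tn / (tn + fn),          # MRI-predicted residual disease confirmed
    }


# Hypothetical cohort of 100 patients, 20 of whom achieved pCR.
example = diagnostic_accuracy(tp=10, fp=5, fn=10, tn=75)
for name, value in example.items():
    print(f"{name}: {value:.1%}")
```

In this invented cohort MRI finds only half of the complete responders, while most patients with residual disease are classified correctly, which is the same pattern of moderate sensitivity and high specificity described in the text.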

Interestingly, only three studies compared diagnostic accuracy of MRI and ultrasound for assessing residual disease [19, 21, 25]. In these studies, ultrasound was less accurate than MRI. MRI was also more accurate than mammography in assessing residual disease, but only one study performed this comparison [19]. Finally, MRI was more accurate than physical examination for assessing residual disease, which was examined by three studies [19, 21, 44].

Discussion

In this systematic review, we aimed to analyse the available data on MRI accuracy for assessing residual disease and pCR after neoadjuvant chemotherapy in breast cancer patients.

Many studies compared the measured tumour diameter or volumes on MRI with pathological results as the gold standard. Correlation coefficients for these comparisons were good to excellent, but both overestimation and underestimation of the MRI size measurements were frequently observed. Although these correlation coefficients were good, it does not necessarily mean that agreement between these measurements is good. The majority of studies did not investigate the agreement between these measurements, for instance by using Bland-Altman plots [46].
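The difference between correlation and agreement can be illustrated with a small numerical sketch. The tumour sizes below are invented for illustration, and the helper functions are hypothetical; the limits of agreement are computed as bias ± 1.96 standard deviations of the paired differences, which is the usual Bland-Altman convention.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bland_altman(x, y):
    """Return bias and 95 % limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Invented example: MRI overestimates every tumour by roughly 10 mm.
pathology_mm = [5, 10, 15, 20, 25, 30]
mri_mm = [16, 19, 26, 29, 36, 39]

r = pearson_r(mri_mm, pathology_mm)
bias, lower, upper = bland_altman(mri_mm, pathology_mm)
print(f"r = {r:.3f}, bias = {bias:.1f} mm, "
      f"limits of agreement = {lower:.1f} to {upper:.1f} mm")
```

In this invented data set the correlation coefficient is above 0.99, yet MRI overestimates each tumour by about 10 mm. A correlation coefficient alone would not reveal this systematic difference, whereas the bias and limits of agreement do.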

Contrast-enhanced breast MRI is superior to other imaging modalities to assess breast tumour extent and the presence of multicentricity or multifocality [4, 5]. However, overestimation of tumour extent is a well-known phenomenon in (preoperative) breast MRI [47, 48] and was also observed in the NAC setting. Confounding factors in overestimating tumour size might be: reactive inflammation caused by tumour response and healing, surrounding sclerosis and necrosis, multiple scattered lesions and presence of accompanying ductal carcinoma in situ [19, 20, 22]. In theory, overestimation of tumour size by MRI could result in an altered surgical treatment plan for the individual patient, with the risk of achieving wider resection margins (with poorer cosmetic results) or performing unnecessary mastectomy (where breast-conserving therapy would have been possible).

In contrast to overestimation of tumour size by MRI, underestimation was also observed in the NAC setting. Causes might be antivascular effects of docetaxel (resulting in less tumour enhancement), lack of inflammatory response surrounding the tumour in docetaxel patients, more extensive ductal carcinoma in situ components and partial volume effects in very small foci of residual disease [13, 15, 21]. Underestimation of residual disease could lead to positive resection margins with viable residual tumour cells, necessitating re-surgery. In addition, positive resection margins are associated with an increased long-term risk of disease recurrence in patients who have undergone breast-conserving therapy [49]. Straver et al. attempted to create an MRI-based model that could help in surgical decision making. In this study, MRI underestimated tumour size >20 mm in 17 % of the patients, in 13 % leading to an incorrect decision to perform breast-conserving surgery. From their study, they concluded that baseline tumour size, tumour size reduction and cancer subtype should be taken into account in the optimal selection of patients eligible for breast-conserving therapy [30].

Seven studies provided an overview of the diagnostic accuracy of MRI to predict pCR. Median sensitivity and specificity were 42 % and 89 %, respectively. If reported, median PPV and NPV were 64 % and 87 %, respectively. In a meta-analysis of 25 studies, Yuan et al. showed that pooled weighted estimates of sensitivity and specificity of MRI for demonstrating pCR were 63 % (range 56–70 %) and 91 % (range 89–92 %) [6]. These findings were concordant with the observations made by Wu et al., who performed a meta-analysis of 34 studies [7]. They concluded that the sensitivity and specificity of contrast-enhanced breast MRI to predict pCR were 68 % and 91 %, respectively. These values are slightly discrepant with the observations of our current review. The variation in these findings, especially in specificity, might be explained by the differences in included studies, since these recent publications additionally included studies with smaller populations and lower MRI field strengths. Since pCR is only achieved in up to 30 % of patients, we think that a minimum number of ten patients is too low to accurately use pCR as a study outcome. Therefore, we chose a minimum number of 25 patients to be included in our final analysis. Despite the differences in these reviews and our current study, the diagnostic accuracy of breast MRI to predict pCR seems to have a high specificity and NPV versus only moderate sensitivity and PPV. Nevertheless, the varying results in the separate studies (and their range) of the recent reviews show that MRI's accuracy for assessing pCR is still under debate and that it is too early to use it as a decision-making tool in studies that investigate other treatment strategies after pCR besides surgery.

Although the number of studies was small, contrast-enhanced MRI outperformed physical examination, ultrasound and mammography in accurately assessing residual disease. In physical examination, this is most likely explained by fibrosis surrounding the tumour bed as a result of the therapy. This fibrotic tissue remains hard, and as such it could lead to misinterpretation of residual disease. These fibrotic changes can also be observed in mammography and ultrasound and cannot be easily discerned from residual tumour tissue, but can be excluded by MRI since this fibrotic tissue does not show any enhancement after contrast administration. In addition, the diagnostic accuracy of mammography is strongly dependent on breast density, being lower in breasts with extremely dense fibroglandular tissue [50]. If performed in the right period of the menstrual cycle (day 3–14 in premenopausal women), the accuracy of breast MRI was less influenced by breast density [51].

Almost all papers used contrast-enhanced breast MRI for the evaluation of residual disease and pCR after NAC. Some studies additionally investigated the ability of diffusion-weighted imaging (DWI) to assess pCR after NAC. In DWI, MRI is used to assess the Brownian motion of water molecules within a tissue of interest. In the cell-rich environment of tumours, the motion of these molecules is restricted, and this restriction can be measured with DWI. It results in increased signal intensity on so-called diffusion-weighted images, with corresponding low values for the apparent diffusion coefficient (ADC). Woodhams et al. demonstrated that the sensitivity, specificity and accuracy of DWI for assessing pCR were higher than those of contrast-enhanced breast MRI, but the differences observed were not significant, with P-values of 0.31, 0.08 and 0.06, respectively [26]. In the study by Park et al., DWI was compared to positron emission tomography-computed tomography (PET-CT) and showed a slightly higher area under the receiver operating characteristic curve (AUC) for predicting pCR, although this difference was not statistically significant in their population of 34 patients, of whom 7 achieved pCR after therapy [45]. In the study by Shin et al., three different MRI techniques were compared: dynamic contrast-enhanced MRI, DWI and MR spectroscopy. They concluded that the change in ADC after treatment was the most accurate predictor of pCR. With an AUC of 0.96, they found that the optimal cutoff value for percentage ADC change was 40.7 %, yielding a sensitivity of 100 % and a specificity of 91 % [43].
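How a percentage ADC change might be compared against such a cutoff is sketched below. Only the 40.7 % cutoff comes from the text [43]. The function names, the ADC values and the decision rule (an increase at or above the cutoff is read as predicted pCR) are assumptions made for illustration, not the method of the cited study.

```python
ADC_CHANGE_CUTOFF_PERCENT = 40.7  # cutoff reported by Shin et al. [43]


def percent_adc_change(adc_before, adc_after):
    """Relative ADC change after treatment, in percent of the baseline value."""
    return (adc_after - adc_before) / adc_before * 100.0


def predicts_pcr(adc_before, adc_after, cutoff=ADC_CHANGE_CUTOFF_PERCENT):
    # Assumption: an ADC increase at or above the cutoff is read as predicted pCR.
    return percent_adc_change(adc_before, adc_after) >= cutoff


# Hypothetical ADC values (in units of 10^-3 mm^2/s) for two patients.
print(predicts_pcr(adc_before=1.0, adc_after=1.5))  # 50 % increase
print(predicts_pcr(adc_before=1.0, adc_after=1.2))  # about 20 % increase
```

Under these assumptions the first hypothetical patient lies above the cutoff and the second below it.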

This review has some important limitations. First, publication bias is a limitation that merits attention in every systematic review. Small studies with less favourable results tend to be published less frequently or not at all. With this potential bias in mind, one should realise that the current positive findings on MRI accuracy after NAC might be overestimated.

Second, the lack of study uniformity prevented us from performing a meta-analysis. Therefore, we chose to perform a systematic review of the selected studies and provide a descriptive presentation of the observed findings instead of performing a meta-analysis that uses statistical models to adjust for this heterogeneity to some extent. Variations in study aim, chemotherapy regimens, response assessment criteria in both imaging and pathological analysis, patient populations and breast cancer subtypes precluded us from drawing more definitive conclusions. For example, Chen et al. showed in their study that MRI can predict pCR accurately in HER2-positive patients, but a high false-negative rate was observed in HER2-negative patients, especially when they received anti-angiogenic drugs [23]. Loo et al. showed in their study of 118 patients that response monitoring after NAC is effective in triple-negative or HER2-positive breast cancer subtypes, but is inaccurate in ER-positive/HER2-negative subtypes [36]. In their 2011 publication, Chen et al. demonstrated that MR imaging accuracy was higher for HER2-positive cancer types than for HER2-negative tumours (88 % versus 82 %). In the same study, they showed that the average size discrepancy in cases with Ki-67 staining of <10 % was greater than in cases with Ki-67 staining of >40 % [39]. The choice of chemotherapeutic regimen can also influence MRI accuracy. Denis et al. showed that MRI frequently underestimated residual tumour size in taxane-containing treatments, most likely because of the antivascular effects of these drugs, resulting in less enhancement on contrast-enhanced MRI [15].

Third, the method of evaluating treatment response in imaging and pathology is important. Although no significant differences between WHO and RECIST criteria in imaging response assessment were observed in other cancer types, multiple response criteria were used in the selected studies [52, 53]. Similarly, there are no widely accepted response assessment criteria for pathology. The most important issue in this assessment is the extent of residual DCIS. Whether or not DCIS is included in the analysis might partly explain the differences observed in MRI over- and underestimation. From a clinical point of view, it would be most interesting to assess MRI accuracy if DCIS were included in the definition of pCR, since DCIS should also be excised during surgery and identification of DCIS extension by MRI remains challenging [54].

Fourth, the population size of the majority of studies was relatively small. Only four studies had a population size >100 subjects [25, 30, 36, 44] and most of the studies were single-centre studies. Larger sample sizes in future multicentre studies would reduce statistical noise and allow the true accuracy of MRI in the NAC setting to be assessed with greater confidence.

In summary, breast MRI accuracy for assessing residual disease after neoadjuvant chemotherapy is good, but multiple factors, such as cancer subtype and treatment regimen, can influence MRI accuracy and should be considered in clinical decision making. Both overestimation and underestimation can be observed and might have important clinical impact. Clinical decision making based on MRI results should therefore be made prudently with these limitations in mind. Regardless of the many potential confounders described in this review, we feel that assessment of NAC response with MRI is promising and ready for more multicentre studies that are able to address these shortcomings.